Randomly Selected Data
The fast generation of discrete random variables with arbitrary frequency distributions is
discussed. The proposed method is related to rejection techniques but differs from them in
that all samples comprising the input data contribute to the samples in the target distribution.
The software implementation of the method requires at most two memory references and a
comparison. The method features good accuracy and modest storage requirements. It is
particularly useful in small computers with limited memory capacity.
Key Words and Phrases: random number generation, probability, arbitrary distributions,
statistical tests
CR Categories: 3.24, 5.5
INTRODUCTION
METHOD
The new method is related to rejection techniques but differs from them in that all
numbers generated are used. A number is either accepted or replaced with an
"alias" number.
Copyright © 1977, Association for Computing Machinery, Inc. General permission to
republish, but not for profit, all or part of this material is granted provided that ACM's copyright
notice is given and that reference is made to the publication, to its date of issue, and to the
fact that reprinting privileges were granted by permission of the Association for Computing
Machinery.
Author's address: Department of Electrical Engineering, University of Witwatersrand, 1 Jan
Smuts Ave., Johannesburg 2001, South Africa.
ACM Transactions on Mathematical Software, Vol. 3, No. 3, September 1977, Pages 253-256.
254 • A.J. Walker
C SUBROUTINE TO COMPUTE THE ALIAS AND CUTOFF VALUES
C FOR THE DESIRED PROBABILITY DISTRIBUTION.
C ON ENTRY, ARRAY E CONTAINS THE DESIRED PROBABILITY
C VALUES. N IS THE NUMBER OF VARIABLES IN THE
C DISTRIBUTION.
C ON EXIT, ARRAYS IA AND F CONTAIN THE ALIAS AND
C CUTOFF VALUES, RESPECTIVELY. ARRAY P CONTAINS
C THE RECONSTRUCTED PROBABILITIES.
      SUBROUTINE ARBRAN(B, E, N, IA, F, P)
      REAL B(N), E(N), F(N), P(N)
      INTEGER IA(N)
      ERROR = .1E-5
      AN = FLOAT(N)
C INITIALISE ARRAYS IA, F, B
      DO 10 I = 1, N
      IA(I) = I
      F(I) = 0.0
      B(I) = E(I) - 1.0/AN
   10 CONTINUE
C FIND THE LARGEST POSITIVE AND NEGATIVE DIFFERENCES
C AND THEIR POSITIONS IN ARRAY B
      DO 50 I = 1, N
      C = 0.0
      D = 0.0
      DO 30 J = 1, N
      IF (B(J).GT.C) GO TO 20
      C = B(J)
      K = J
      GO TO 30
   20 IF (B(J).LT.D) GO TO 30
      D = B(J)
      L = J
   30 CONTINUE
C TEST WHETHER THE SUM OF DIFFERENCES IN ARRAY B HAS
C BECOME INSIGNIFICANT.
      SUM = 0.0
      DO 40 M = 1, N
      SUM = SUM + ABS(B(M))
   40 CONTINUE
      IF (SUM.LT.ERROR) GO TO 60
C ASSIGN THE ALIAS AND CUTOFF VALUES.
      IA(K) = L
      F(K) = 1.0 + C*AN
      B(K) = 0.0
      B(L) = C + D
   50 CONTINUE
C COMPUTATION OF ALIAS AND CUTOFF VALUES COMPLETE.
C NOW RECONSTRUCT THE PROBABILITIES.
   60 DO 80 I = 1, N
      P(I) = F(I)/AN
      DO 70 J = 1, N
      IF (IA(J).EQ.I) P(I) = P(I) + (1.0 - F(J))/AN
   70 CONTINUE
   80 CONTINUE
      RETURN
      END
Fig. 1. Subroutine for finding the alias and cutoff values for a given probability distribution
An Efficient Method for Generating Discrete Random Variables • 255
C A SUBROUTINE TO GENERATE INTEGER RANDOM
C VARIABLES WITH PRESCRIBED PROBABILITY
C DISTRIBUTION.
C ON ENTRY, UA AND UB ARE UNCORRELATED RANDOM
C VARIABLES UNIFORMLY DISTRIBUTED OVER (0,1).
C ARRAYS IA AND F CONTAIN THE DESIRED ALIAS
C AND CUTOFF VALUES, RESPECTIVELY. N IS THE
C NUMBER OF VARIABLES IN THE DISTRIBUTION.
C ON EXIT, IX IS THE RETURNED RANDOM VARIABLE.
      SUBROUTINE GETONE(UA, UB, IA, F, N, IX)
      REAL F(N)
      INTEGER IA(N)
      AN = FLOAT(N)
C CONVERT UA TO AN INTEGER VARIABLE
      IX = INT(UA*AN) + 1
C COMPARE WITH THE SELECTED CUTOFF.
      IF (UB.GT.F(IX)) IX = IA(IX)
      RETURN
      END
Fig. 2. Subroutine for generating a sample IX with the required frequency distribution
\[
Y = \begin{cases}
X & \text{with probability } F(X) \\
A(X) & \text{with probability } 1 - F(X)
\end{cases}
\]
where A(X) is an alias. The functions A(X) and F(X) are chosen according to
the algorithm shown in Figure 1 and ensure that $\Pr(Y = j) = p_j$, j = 1 to n.
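The accept-or-alias rule can be illustrated with a minimal Python sketch that mirrors the GETONE subroutine of Figure 2. The function name `get_one`, the 0-based indexing, and the small tables `ALIAS` and `CUTOFF` are assumptions made for this illustration; the table values were worked out by hand for the example distribution (0.1, 0.2, 0.3, 0.4) and are not taken from the paper.

```python
# Illustrative tables (0-based) for the target distribution (0.1, 0.2, 0.3, 0.4).
# These values are an assumption for this example, derived by hand.
ALIAS = [3, 2, 2, 3]
CUTOFF = [0.4, 0.8, 0.0, 0.0]


def get_one(ua, ub, alias, cutoff):
    """Return one sample, given two uniform variates ua and ub in [0, 1)."""
    n = len(alias)
    ix = int(ua * n)        # X: uniformly selected cell, 0 .. n-1
    if ub > cutoff[ix]:     # reject X with probability 1 - F(X) ...
        ix = alias[ix]      # ... and return its alias A(X) instead
    return ix
```

For example, `get_one(0.0, 0.3, ALIAS, CUTOFF)` keeps cell 0 because 0.3 does not exceed the cutoff 0.4, whereas `get_one(0.0, 0.5, ALIAS, CUTOFF)` returns the alias, cell 3. The sketch assumes `ua` is strictly less than 1; a value of exactly 1 would index past the end of the tables.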
After the desired probability values have been entered into array E, the differences
between the desired distribution and the uniform distribution, E(I) - 1/N, are found
and stored in array B. This array is searched for the largest negative and
positive differences, C and D, respectively, and their positions, K and L,
respectively. B(K) is then set to zero and B(L) is assigned the sum of C and D. IA(K)
is assigned the value of L, and F(K) is set to 1 + N*C, that is, C normalized by the
uniform probability 1/N and added to unity. The search and assignment are repeated,
at most N times, until the sum of the absolute differences in B falls below a small
error tolerance.
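The construction described above can be sketched in Python as follows. This is a loose transcription of Figure 1, not a replacement for it: the function name `build_alias_table`, the 0-based indices, and the use of `min`/`max` for the search (which may break ties differently from the FORTRAN loop) are assumptions of this sketch. The tolerance matches the value of ERROR in Figure 1.

```python
def build_alias_table(probs, error=1e-6):
    """Return (alias, cutoff) lists for the given probabilities (sum to 1)."""
    n = len(probs)
    alias = list(range(n))               # IA: each cell starts as its own alias
    cutoff = [0.0] * n                   # F
    diff = [p - 1.0 / n for p in probs]  # B: desired minus uniform
    for _ in range(n):
        # Stop once the remaining differences are insignificant.
        if sum(abs(b) for b in diff) < error:
            break
        k = min(range(n), key=lambda j: diff[j])  # largest negative difference
        l = max(range(n), key=lambda j: diff[j])  # largest positive difference
        c, d = diff[k], diff[l]
        alias[k] = l             # the deficit of cell k is made up by cell l
        cutoff[k] = 1.0 + c * n  # share of cell k that is kept for k itself
        diff[k] = 0.0
        diff[l] = c + d          # what cell l still has to give or receive
    return alias, cutoff
```

For the distribution (0.1, 0.2, 0.3, 0.4) the sketch yields aliases `[3, 2, 2, 3]` and cutoffs of approximately `[0.4, 0.8, 0.0, 0.0]`. Cells that are never selected as K keep a cutoff of zero and alias to themselves, so they always return their own index.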
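A confirmation of the method's working may be required. This consists of the
following operation. Find all values X = x_1, x_2, ..., x_m such that j is an alias
of each x_i. Then
\[
\Pr(Y = j) = \Bigl[ F_j + \sum_{i=1}^{m} \bigl( 1 - F_{x_i} \bigr) \Bigr] / n ,
\]
where F_j denotes F(j). This is the reconstruction carried out at the end of the
subroutine in Figure 1.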
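The confirmation step can be sketched in Python as a single pass over the tables, which accumulates the same sums as the nested reconstruction loops in Figure 1. The name `reconstruct` and the example tables (0-based, for the distribution 0.1, 0.2, 0.3, 0.4) are assumptions made for this illustration.

```python
def reconstruct(alias, cutoff):
    """Rebuild Pr(Y = j) for every j from the alias and cutoff tables."""
    n = len(alias)
    probs = [0.0] * n
    for x in range(n):
        probs[x] += cutoff[x] / n                  # cell x kept: F(x) / n
        probs[alias[x]] += (1.0 - cutoff[x]) / n   # cell x replaced by its alias
    return probs


# Illustrative tables, assumed for this example.
example_alias = [3, 2, 2, 3]
example_cutoff = [0.4, 0.8, 0.0, 0.0]
example_probs = reconstruct(example_alias, example_cutoff)
```

With these tables `example_probs` agrees with (0.1, 0.2, 0.3, 0.4) to within rounding error. A cell whose alias is itself contributes its full 1/n to its own probability, as it does in Figure 1.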
DISCUSSION
The method is considered to be particularly well suited for use in the small general
purpose computer with limited memory capacity because of the modest storage
requirements. Implementation is simple; only two random numbers, one discrete
and one continuous, at most two memory references, and a comparison are required
to produce a new sample with the required frequency distribution.
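As a rough check of this cost claim, the following Python sketch draws each sample from one discrete and one continuous random number, at most two table look-ups, and one comparison, and then tallies the observed frequencies. The tables, the function name `empirical_frequencies`, and the seed are assumptions chosen for illustration (target distribution 0.1, 0.2, 0.3, 0.4).

```python
import random

# Illustrative tables (0-based), assumed for this example.
ALIAS = [3, 2, 2, 3]
CUTOFF = [0.4, 0.8, 0.0, 0.0]


def empirical_frequencies(n_samples, seed=1977):
    """Draw n_samples values and return the observed relative frequencies."""
    rng = random.Random(seed)
    n = len(ALIAS)
    counts = [0] * n
    for _ in range(n_samples):
        ix = rng.randrange(n)           # discrete uniform random number
        if rng.random() > CUTOFF[ix]:   # continuous one, first look-up, comparison
            ix = ALIAS[ix]              # second look-up, only on rejection
        counts[ix] += 1
    return [c / n_samples for c in counts]
```

With a few hundred thousand samples the returned frequencies should lie close to the target probabilities, with deviations that shrink roughly as the inverse square root of the sample count.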
Where it is desired to generate discrete random variables from unbounded
distributions, the method may be used to handle the bulk of the distribution and a
standard computational subroutine used to handle the tail.
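One possible reading of this bulk-and-tail arrangement is sketched below in Python for a geometric distribution, Pr(K = k) = (1 - q) q^k. Everything specific here is an assumption of the sketch rather than the paper's prescription: the choice of distribution, the single extra cell that carries the whole tail mass, the inversion formula used as the "standard computational subroutine", and all function names.

```python
import math
import random


def build_alias_table(probs, error=1e-6):
    """Alias and cutoff tables, following the scheme of Figure 1 (0-based)."""
    n = len(probs)
    alias, cutoff = list(range(n)), [0.0] * n
    diff = [p - 1.0 / n for p in probs]
    for _ in range(n):
        if sum(abs(b) for b in diff) < error:
            break
        k = min(range(n), key=lambda j: diff[j])
        l = max(range(n), key=lambda j: diff[j])
        c, d = diff[k], diff[l]
        alias[k], cutoff[k] = l, 1.0 + c * n
        diff[k], diff[l] = 0.0, c + d
    return alias, cutoff


def geometric_tables(q, bulk_size):
    """Tables for cells 0..bulk_size-1 plus one cell holding the tail mass."""
    probs = [(1.0 - q) * q ** k for k in range(bulk_size)]
    probs.append(q ** bulk_size)   # Pr(K >= bulk_size)
    return build_alias_table(probs)


def sample_geometric(q, bulk_size, alias, cutoff, rng):
    """Draw one geometric variate: table look-up for the bulk, inversion for the tail."""
    ix = rng.randrange(bulk_size + 1)
    if rng.random() > cutoff[ix]:
        ix = alias[ix]
    if ix < bulk_size:
        return ix                  # bulk: answered directly by the tables
    # Tail: the geometric law is memoryless, so shift a fresh variate
    # obtained by inversion; u lies in (0, 1] so the logarithm is finite.
    u = 1.0 - rng.random()
    return bulk_size + int(math.log(u) / math.log(q))
```

The tail routine is reached only with probability q^bulk_size, so for a moderate table size nearly all samples cost just the table look-ups and one comparison.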
Although it has been assumed that X is uniformly distributed, which is the most
common case, the method may be extended quite easily to cover any given
distribution for X.
ACKNOWLEDGMENTS
The author is most grateful for the helpful comments and suggestions given by
Professor D.M. Hawkins, University of Witwatersrand, in the preparation of this
paper.