An Efficient Method for Generating Discrete Random Variables With General Distributions


ALASTAIR J. WALKER
University of Witwatersrand, South Africa

The fast generation of discrete random variables with arbitrary frequency distributions is
discussed. The proposed method is related to rejection techniques but differs from them in
that all samples comprising the input data contribute to the samples in the target distribution.
The software implementation of the method requires at most two memory references and a
comparison. The method features good accuracy and modest storage requirements. It is
particularly useful in small computers with limited memory capacity.
Key Words and Phrases: random number generation, probability, arbitrary distributions,
statistical tests
CR Categories: 3.24, 5.5

INTRODUCTION

Commonly used criteria [2, 3] for judging the performance of random number
generators include speed, precision, memory requirements, generality of the technique,
and ease of implementation. The method proposed by Marsaglia [1] has all
these attributes but must be used with care in a computer with limited memory
capacity since the number of memory locations required for the storage of constants
appears to be an order of magnitude greater than the number of variables
to be generated. In comparison, the number of memory locations required for the
storage of constants by the method proposed in this contribution is only twice the
number of variables used. There is no restriction on the other desirable features
given above.
These principles have already been used to advantage in the design and construction
of a high-speed digital hardware-implemented pseudorandom number
generator with an arbitrary discrete frequency distribution [5] and as a generator
of uniformly distributed random variables with floating point representation [4].

METHOD
The new method is related to rejection techniques but differs from them in that all
numbers generated are used. A number is either accepted or replaced with an
"alias" number.

Copyright © 1977, Association for Computing Machinery, Inc. General permission to repub-
lish, but not for profit, all or part of this material is granted provided that ACM's copyright
notice is given and that reference is made to the publication, to its date of issue, and to the
fact that reprinting privileges were granted by permission of the Association for Computing
Machinery.
Author's address: Department of Electrical Engineering, University of Witwatersrand, 1 Jan
Smuts Ave., Johannesburg 2001, South Africa.
ACM Transactions on Mathematical Software, Vol. 3, No. 3, September 1977, Pages 253-256.

C     SUBROUTINE TO COMPUTE THE ALIAS AND CUTOFF VALUES
C     FOR THE DESIRED PROBABILITY DISTRIBUTION.
C     ON ENTRY, ARRAY E CONTAINS THE DESIRED PROBABILITY
C     VALUES. N IS THE NUMBER OF VARIABLES IN THE
C     DISTRIBUTION.
C     ON EXIT, ARRAYS IA AND F CONTAIN THE ALIAS AND
C     CUTOFF VALUES, RESPECTIVELY. ARRAY P CONTAINS
C     THE RECONSTRUCTED PROBABILITIES.
      SUBROUTINE ARBRAN(B, E, N, IA, F, P)
      REAL B(N), E(N), F(N), P(N)
      INTEGER IA(N)
      ERROR = .1E-5
      AN = FLOAT(N)
C     INITIALISE ARRAYS IA,F,B
      DO 10 I = 1, N
         IA(I) = I
         F(I) = 0.0
         B(I) = E(I) - 1.0/AN
   10 CONTINUE
C     FIND THE LARGEST POSITIVE AND NEGATIVE DIFFERENCES
C     AND THEIR POSITIONS IN ARRAY B
      DO 50 I = 1, N
         C = 0.0
         D = 0.0
         DO 30 J = 1, N
            IF (B(J).GT.C) GO TO 20
            C = B(J)
            K = J
            GO TO 30
   20       IF (B(J).LT.D) GO TO 30
            D = B(J)
            L = J
   30    CONTINUE
C     TEST WHETHER THE SUM OF DIFFERENCES IN ARRAY B HAS
C     BECOME INSIGNIFICANT.
         SUM = 0.0
         DO 40 M = 1, N
            SUM = SUM + ABS(B(M))
   40    CONTINUE
         IF (SUM.LT.ERROR) GO TO 60
C     ASSIGN THE ALIAS AND CUTOFF VALUES.
         IA(K) = L
         F(K) = 1.0 + C*AN
         B(K) = 0.0
         B(L) = C + D
   50 CONTINUE
C     COMPUTATION OF ALIAS AND CUTOFF VALUES COMPLETE.
C     NOW RECONSTRUCT THE PROBABILITIES.
   60 DO 80 I = 1, N
         P(I) = F(I)/AN
         DO 70 J = 1, N
            IF (IA(J).EQ.I) P(I) = P(I) + (1.0 - F(J))/AN
   70    CONTINUE
   80 CONTINUE
      RETURN
      END
Fig. 1. Subroutine for finding the alias and cutoff values for a given probability distribution

C     A SUBROUTINE TO GENERATE INTEGER RANDOM
C     VARIABLES WITH PRESCRIBED PROBABILITY
C     DISTRIBUTION.
C     ON ENTRY, UA AND UB ARE UNCORRELATED RANDOM
C     VARIABLES UNIFORMLY DISTRIBUTED OVER (0,1).
C     ARRAYS IA AND F CONTAIN THE DESIRED ALIAS
C     AND CUTOFF VALUES, RESPECTIVELY. N IS THE
C     NUMBER OF VARIABLES IN THE DISTRIBUTION.
C     ON EXIT, IX IS THE RETURNED RANDOM VARIABLE.
      SUBROUTINE GETONE(UA, UB, IA, F, N, IX)
      REAL F(N)
      INTEGER IA(N)
      AN = FLOAT(N)
C     CONVERT UA TO AN INTEGER VARIABLE
      IX = INT(UA*AN) + 1
C     COMPARE WITH THE SELECTED CUTOFF.
      IF (UB.GT.F(IX)) IX = IA(IX)
      RETURN
      END
Fig. 2. Subroutine for generating a sample IX with the required frequency distribution

It is required to produce a random or pseudorandom integer Y whose probability
distribution is $\Pr(Y = j) = p_j$, j = 1 to n. We have available a random integer
X which is uniformly distributed over the range 1 to m, i.e. $\Pr(X = j) = q = m^{-1}$.
The method requires that m = n. It may be convenient to have m > n, i.e. to
have the range of X larger than that of Y, and in this case we redefine the range of
Y to be 1 to m by setting $p_{n+1} = \cdots = p_m = 0$.
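The padding step can be illustrated with a few lines of Python. This is only a sketch: the function name pad_probabilities is ours, and choosing m as a power of two (so that X could be formed from a fixed number of random bits) is an assumption made for the usage line, not something the text prescribes.

```python
def pad_probabilities(p, m):
    """Extend the probability list p to length m by appending zeros.

    Hypothetical helper: outcomes n+1..m get probability 0, so they are
    never returned even though X may take those values.
    """
    n = len(p)
    if m < n:
        raise ValueError("m must be at least the number of outcomes n")
    return list(p) + [0.0] * (m - n)


# Three outcomes padded to m = 4, assuming X is drawn from two random bits.
padded = pad_probabilities([0.5, 0.3, 0.2], 4)
```

The padded list is then treated exactly like any other distribution over 1 to m.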
The method consists of setting

\[
Y = \begin{cases} X & \text{with probability } F(X) \\ A(X) & \text{with probability } 1 - F(X) \end{cases}
\]

where A(X) is an alias. The functions A(X) and F(X) are chosen according to
the algorithm shown in Figure 1 and ensure that $\Pr(Y = j) = p_j$, j = 1 to n.
After the desired probability values have been entered into array E, the differences
between the desired distribution and the uniform distribution
are found and stored in array B. This array is searched for the largest negative and
positive differences, C and D, respectively, and their positions, K and L, respectively.
B(K) is then set to zero and B(L) is assigned the sum of C and D. IA(K)
is assigned the value of L and F(K) is the normalized value of C added to unity,
that is, F(K) = 1 + nC.
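The table construction just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than a transcription of Figure 1: the names build_alias_table, alias, cutoff and tol are ours, indices run from 0 instead of 1, ties are broken by taking the first extreme entry, and double precision replaces the REAL arrays.

```python
def build_alias_table(p, tol=1e-6):
    """Return (alias, cutoff) lists for the probability list p.

    Hypothetical helper following the prose description of Figure 1,
    with 0-based indices.
    """
    n = len(p)
    alias = list(range(n))          # IA: each index starts as its own alias
    cutoff = [0.0] * n              # F: cutoff values, initially zero
    b = [pj - 1.0 / n for pj in p]  # B: desired minus uniform probability

    for _ in range(n):
        lo = min(range(n), key=b.__getitem__)  # position K of most negative value C
        hi = max(range(n), key=b.__getitem__)  # position L of most positive value D
        c, d = b[lo], b[hi]
        # Stop once the remaining differences are negligible.
        if sum(abs(x) for x in b) < tol:
            break
        alias[lo] = hi
        cutoff[lo] = 1.0 + c * n
        b[lo] = 0.0
        b[hi] = c + d
    return alias, cutoff


alias, cutoff = build_alias_table([0.5, 0.3, 0.1, 0.1])
```

For the probabilities (0.5, 0.3, 0.1, 0.1) this gives the 0-based aliases (1, 1, 0, 0) and cutoffs of approximately (0.8, 0, 0.4, 0.4). Index 1 keeps a zero cutoff and remains its own alias, so it is always returned when X selects it.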
A confirmation of the method's working may be required. This consists of the
following operation. Find all values $X = x_1, x_2, \ldots, x_k$ such that j is an alias
of $x_i$. Then

\[
\Pr(Y = j) = \Bigl[ F_j + \sum_{i=1}^{k} \bigl(1 - F_{x_i}\bigr) \Bigr] \Big/ n ,
\]

where $F_j$ denotes F(j). This operation is implemented at the end of the algorithm shown in Figure 1.
The reconstructed probabilities are stored in array P at the end of computation.
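The confirmation step can be tried on a small case with the Python sketch below. The function name reconstruct and the example tables are our own: the tables were worked by hand for the probabilities (0.5, 0.3, 0.1, 0.1) with 0-based indices. A single pass over the aliases is used here in place of the double loop of Figure 1, and gives the same sums.

```python
def reconstruct(alias, cutoff):
    """Rebuild Pr(Y = j) from alias and cutoff tables (0-based indices).

    Hypothetical helper: each index j receives cutoff[j] / n directly, plus
    (1 - cutoff[x]) / n from every x whose alias is j.
    """
    n = len(alias)
    p = [f / n for f in cutoff]
    for x, j in enumerate(alias):
        p[j] += (1.0 - cutoff[x]) / n
    return p


# Hand-worked example tables (an assumption for illustration only).
example_alias = [1, 1, 0, 0]
example_cutoff = [0.8, 0.0, 0.4, 0.4]
probabilities = reconstruct(example_alias, example_cutoff)
```

The result agrees with (0.5, 0.3, 0.1, 0.1) up to floating point rounding. An index that is its own alias, such as index 1 here, contributes its full weight 1/n to itself.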
The practical implementation requires the generation of a pair (X, U) where
U is a continuous random variable which has a uniform probability distribution
over the range (0, 1) and is independent of X. We then set


\[
Y = \begin{cases} X & \text{if } U \le F(X) \\ A(X) & \text{if } U > F(X). \end{cases}
\]

A Fortran implementation of this procedure is given in Figure 2.
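For readers who prefer a modern notation, the sampling step of Figure 2 can be sketched in Python as follows. The names get_one and sample are ours, indices are 0-based (so the "+ 1" of the Fortran version disappears), and the example tables are hand-worked for the probabilities (0.5, 0.3, 0.1, 0.1); none of this is part of the original listing.

```python
import random


def get_one(ua, ub, alias, cutoff):
    """Map two uniforms ua, ub in [0, 1) to an index with the tabulated distribution."""
    n = len(alias)
    x = int(ua * n)                 # uniform index in 0..n-1
    if ub > cutoff[x]:              # same comparison as in Figure 2
        x = alias[x]
    return x


def sample(alias, cutoff, rng):
    """Draw one value using a random.Random instance rng."""
    return get_one(rng.random(), rng.random(), alias, cutoff)


# Hand-worked example tables (an assumption for illustration only).
ALIAS = [1, 1, 0, 0]
CUTOFF = [0.8, 0.0, 0.4, 0.4]

rng = random.Random(0)
counts = [0, 0, 0, 0]
for _ in range(10000):
    counts[sample(ALIAS, CUTOFF, rng)] += 1
```

With enough draws the relative frequencies in counts should approach (0.5, 0.3, 0.1, 0.1). Each draw costs one table lookup for the cutoff, one comparison, and at most one further lookup for the alias.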

DISCUSSION

The method is considered to be particularly well suited for use in the small general
purpose computer with limited memory capacity because of the modest storage
requirements. Implementation is simple; only two random numbers, one discrete
and one continuous, at most two memory references, and a comparison are required
to produce a new sample with the required frequency distribution.
Where it is desired to generate discrete random variables from unbounded distributions,
the method may be used to handle the bulk of the distribution and a
standard computational subroutine used to handle the tail.
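One possible reading of this suggestion is sketched below in Python for a geometric distribution, $\Pr(Y = k) = s(1 - s)^{k-1}$, k = 1, 2, .... Everything specific here is our assumption: the choice of distribution, the names build_alias_table and make_geometric_sampler, the device of giving the whole tail one extra table entry, and the use of inversion as the tail routine.

```python
import math
import random


def build_alias_table(p, tol=1e-6):
    """Alias and cutoff lists for probabilities p (0-based; hypothetical helper)."""
    n = len(p)
    alias = list(range(n))
    cutoff = [0.0] * n
    b = [pj - 1.0 / n for pj in p]
    for _ in range(n):
        lo = min(range(n), key=b.__getitem__)   # most negative difference
        hi = max(range(n), key=b.__getitem__)   # most positive difference
        c, d = b[lo], b[hi]
        if sum(abs(x) for x in b) < tol:
            break
        alias[lo] = hi
        cutoff[lo] = 1.0 + c * n
        b[lo] = 0.0
        b[hi] = c + d
    return alias, cutoff


def make_geometric_sampler(s, bulk_size):
    """Return draw(rng) for a geometric variable on 1, 2, ... with parameter s.

    Outcomes 1..bulk_size are table entries; one extra entry stands for
    the whole tail Y > bulk_size.
    """
    bulk = [s * (1.0 - s) ** (k - 1) for k in range(1, bulk_size + 1)]
    tail_mass = (1.0 - s) ** bulk_size
    alias, cutoff = build_alias_table(bulk + [tail_mass])
    n = bulk_size + 1

    def draw(rng):
        x = int(rng.random() * n)
        if rng.random() > cutoff[x]:
            x = alias[x]
        if x < bulk_size:
            return x + 1                         # bulk outcome, shifted to 1-based
        # Tail: Y - bulk_size is again geometric, so sample it by inversion.
        u = 1.0 - rng.random()                   # lies in (0, 1], avoids log(0)
        return bulk_size + 1 + int(math.log(u) / math.log(1.0 - s))

    return draw


draw = make_geometric_sampler(0.3, 10)
first_values = [draw(random.Random(1)) for _ in range(3)]
```

Only draws that land on the extra entry, a fraction $(1 - s)^{10}$ of them in this example, pay for the logarithms. All other draws use the table alone.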
Although it has been assumed that X is uniformly distributed, which is the most
common case, the method may be extended quite easily to cover any given distribution
for X.

ACKNOWLEDGMENTS
The author is most grateful for the helpful comments and suggestions given by
Professor D.M. Hawkins, University of Witwatersrand, in the preparation of this
paper.

REFERENCES
1. MARSAGLIA, G. Generating discrete random variables in a computer. Comm. ACM 6, 1
(1963), 37-38.
2. RAMBERG, J.S., AND SCHMEISER, B.W. An approximate method for generating symmetric
random variables. Comm. ACM 15, 11 (Nov. 1972), 987-990.
3. SOBOLEWSKI, J.S., AND PAYNE, W.H. Pseudo-noise with arbitrary amplitude distribution.
IEEE Trans. Computers C-21 (1972), 337-345.
4. WALKER, A.J. Fast generation of uniformly distributed pseudorandom numbers with
floating point representation. Electron. Lett. 10, 25/26 (1974), 553-554.
5. WALKER, A.J. New fast method for generating discrete random numbers with arbitrary
frequency distributions. Electron. Lett. 10, 8 (1974), 127-128.

Received April 1975
