Lecture 27: Accept/Reject Sampling
Email: nzabaras@gmail.com
URL: https://www.zabaras.com/
October 9, 2020
Following closely:
C. Robert, G. Casella, Monte Carlo Statistical Methods (Ch. 1, 2, 3.1 & 3.2) (Google Books, slides, video)
J. S. Liu, Monte Carlo Strategies in Scientific Computing (Chapters 1 & 2)
J-M Marin and C. P. Robert, Bayesian Core (Chapter 2)
Statistical Computing & Monte Carlo Methods, A. Doucet (course notes, 2007)
Goals
The goals for today's lecture include:
The accept/reject (rejection sampling) algorithm and its acceptance probability
Rejection sampling for the Gamma distribution using a Cauchy-type envelope
An alternative rejection sampling scheme (Beskos et al., 2005)
Mixture methods for the generation of random variables
Rejection sampling in Bayesian inference and its behavior in high dimensions
Adaptive rejection sampling and Monahan's accept/reject method
The Accept/Reject algorithm proceeds as follows:

Set i = 1.
Repeat until i = N:
  Sample y ~ q(y) and u ~ U(0,1).
  If u ≤ π*(y) / (M′ q*(y)), then accept (set x^(i) = y) and increment the counter i.
  Otherwise, reject.

The probability that a proposed sample is accepted is

Pr(Y is accepted) = ∫_X π*(y) dy / (M′ ∫_X q*(y) dy) ≡ γ
The number of trials until the first acceptance is geometric with mean 1/γ; thus the observed number of trials is an unbiased estimate of 1/Pr(Y is accepted).
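As a concrete illustration, here is a minimal Python sketch of this loop; the standard-normal target, Cauchy proposal, bound M′ = 1.6, and the helper names are illustrative assumptions, not from the slides.

```python
import numpy as np

def accept_reject(pi_star, q_star, sample_q, M, N, rng=None):
    """Minimal accept/reject sampler: draws N samples from pi(x) ∝ pi_star(x)
    using a proposal with (unnormalized) density q_star and sampler sample_q,
    assuming pi_star(x) <= M * q_star(x) for all x."""
    rng = np.random.default_rng() if rng is None else rng
    samples, trials = [], 0
    while len(samples) < N:
        y = sample_q(rng)                       # y ~ q
        u = rng.random()                        # u ~ U(0,1)
        trials += 1
        if u <= pi_star(y) / (M * q_star(y)):   # accept with prob pi*/(M' q*)
            samples.append(y)
    return np.asarray(samples), trials

# Example: standard normal target with a standard Cauchy proposal.
# sup_x pi*(x)/q*(x) = sqrt(2*pi/e) ≈ 1.52, so M = 1.6 is a valid bound.
pi_star = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
q_star = lambda x: 1.0 / (np.pi * (1.0 + x**2))
sample_q = lambda rng: rng.standard_cauchy()
x, trials = accept_reject(pi_star, q_star, sample_q, M=1.6, N=10_000)
print(trials / len(x))   # ≈ 1.6, i.e. M' (expected number of trials per sample)
```

The printed ratio of trials to accepted samples should be close to M′, in line with the geometric argument above.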
Example (Gamma distribution): to sample from 𝒢𝒶(a, λ) with non-integer a > 1, one may use a 𝒢𝒶(k, λ−1) proposal with k = ⌊a⌋ and consider the ratio 𝒢𝒶(x | a, λ) / 𝒢𝒶(x | k, λ−1). This ratio is maximized at x = a − k; thus M = 𝒢𝒶(a−k | a, λ) / 𝒢𝒶(a−k | k, λ−1), with k = ⌊a⌋.
Since we know the CDF of h(x) (a Cauchy-type density), we can easily sample from it. To use it as a proposal distribution, we scale it so that it is nowhere less than the Gamma density. One can show that for a > 1 and x ≥ 0 the following inequality holds:
In our notation:

π*(x) = x^{a−1} e^{−x},   f(x) = (1/Γ(a)) x^{a−1} e^{−x} ≤ (1/Γ(a)) e^{−(a−1)} (a−1)^{a−1} / [1 + (x − (a−1))² / (2a−1)] ≡ g(x)

q*(x) = 1 / [1 + (x − (a−1))² / (2a−1)],   M* = e^{−(a−1)} (a−1)^{a−1},

so that π*(x) ≤ M* q*(x), with g(x) = (1/Γ(a)) e^{−(a−1)} (a−1)^{a−1} π √(2a−1) · h(x | a−1, c = 1/√(2a−1)).

The expected number of trials is

1/γ = M* ∫ q*(x) dx / ∫ π*(x) dx = e^{−(a−1)} (a−1)^{a−1} π √(2a−1) / Γ(a)
Thus we have shown that:
f(x) ≤ K h(x | b = a−1, c = 1/√(2a−1)),   where   K = (1/Γ(a)) e^{−(a−1)} (a−1)^{a−1} π √(2a−1).
U. Dieter & J. Ahrens, Acceptance Rejection Techniques for Sampling from the Beta and Gamma Distributions, 1974
1. Set b ← a − 1, A ← a + b, and s ← √A.   // b = a−1, A = 2a−1, s = √(2a−1)
2. Generate u ~ U(0,1) and set t ← s·tan(π(u − 1/2)) and x ← b + t.
3. If x < 0, go to 2.
4. Generate u′. If u′ > exp(b ln(x/b) − t + ln(1 + t²/A)), go to Step 2. Otherwise deliver x.
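A possible Python sketch of these four steps (the function name and the 𝒢𝒶(3.7, 1) test case are illustrative assumptions, not part of the original algorithm):

```python
import numpy as np

def gamma_cauchy_ar(a, n, rng=None):
    """Draws n samples from Ga(a, 1), a > 1, by rejection from the Cauchy-type
    envelope above (steps 1-4 of the Ahrens/Dieter-style algorithm)."""
    rng = np.random.default_rng() if rng is None else rng
    b, A = a - 1.0, 2.0 * a - 1.0
    s = np.sqrt(A)
    out = []
    while len(out) < n:
        t = s * np.tan(np.pi * (rng.random() - 0.5))   # Cauchy(0, s) variate
        x = b + t
        if x < 0:
            continue                                   # step 3: reject negative x
        u = rng.random()
        # step 4: accept with probability exp(b*ln(x/b) - t) * (1 + t^2/A)
        if np.log(u) <= b * np.log(x / b) - t + np.log1p(t * t / A):
            out.append(x)
    return np.asarray(out)

x = gamma_cauchy_ar(a=3.7, n=50_000)
print(x.mean(), x.var())   # both should be close to a for Ga(a, 1)
```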
[Figure: the Gamma density 𝒢𝒶(x | a, λ = 1) = (1/Γ(a)) x^{a−1} e^{−x} together with its dominating envelope K h(x | b = a−1, c = 1/√(2a−1)).]
The expected number of trials per sample is 1/γ = (1/Γ(a)) e^{−(a−1)} (a−1)^{a−1} π √(2a−1).
It decreases from π = 3.14159 for a = 1 to √π = 1.77245 as a → ∞.
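A quick numeric check of this formula (evaluated in log space for stability):

```python
import numpy as np
from math import lgamma, pi

# Expected trials 1/gamma = e^{-(a-1)} (a-1)^{a-1} * pi * sqrt(2a-1) / Gamma(a),
# which falls from pi toward sqrt(pi) as a grows.
def expected_trials(a):
    return np.exp(-(a - 1) + (a - 1) * np.log(a - 1) - lgamma(a)) * pi * np.sqrt(2 * a - 1)

for a in [1.001, 2.0, 5.0, 20.0, 100.0]:
    print(a, expected_trials(a))
print(np.sqrt(pi))   # limiting value as a -> infinity
```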
Alternative Rejection Sampling (RS) Algorithm
In the standard accept/reject algorithm, the candidate is sampled before 𝑢.
This is not necessary.
(Beskos et al., 2005): Let (Yₙ, Iₙ)_{n≥1} be a sequence of i.i.d. random variables in X × {0,1} such that Y₁ ~ q and

Pr(I₁ = 1 | Y₁ = y) = π*(y) / (C q*(y)),   ∀ y ∈ X.

Define τ = min{i ≥ 1 : Iᵢ = 1}; then Y_τ ~ π.
This scheme does not assume any order for the simulation of 𝑌 and 𝐼 and,
besides the conditional property given in the proposition, does not restrict the
construction of 𝐼.
This result is useful if we can construct conditions for the acceptance or
rejection of the current proposed element 𝑌 from minimal information about it.
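As a toy illustration of this idea, the sketch below (the specific target and all helper names are assumptions, not from Beskos et al.) samples from π*(y) = exp(−y²/2) on [0,1] with a uniform proposal (q* = 1, C = 1); the indicator I is resolved by refining alternating-series lower/upper bounds on exp(−y²/2) only as far as needed, so the decision uses partial information about Y while Pr(I = 1 | Y = y) remains exactly π*(y)/(C q*(y)).

```python
import numpy as np

def accept_lazily(y, u, max_terms=50):
    """Resolve the Bernoulli event {u < exp(-y^2/2)} from partial information:
    for t = y^2/2 <= 1/2 the alternating partial sums of exp(-t) bracket the
    true value, so we refine only until u falls outside the current bracket."""
    t = 0.5 * y * y
    s, term = 1.0, 1.0                     # s = running partial sum of exp(-t)
    for k in range(1, max_terms):
        term *= t / k
        s += -term if k % 2 == 1 else term
        if k % 2 == 1 and u <= s:          # s is a lower bound on exp(-t): accept
            return True
        if k % 2 == 0 and u >= s:          # s is an upper bound on exp(-t): reject
            return False
    return u < np.exp(-t)                  # fallback, essentially never reached

def truncated_normal_01(n, rng=None):
    """Rejection sampler for pi*(y) = exp(-y^2/2) on [0, 1] with a uniform
    proposal (q* = 1, C = 1); Pr(I = 1 | Y = y) = pi*(y)/(C q*(y)) as required."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    while len(out) < n:
        y, u = rng.random(), rng.random()
        if accept_lazily(y, u):
            out.append(y)
        # on rejection we simply move on to the next i.i.d. pair (Y, I)
    return np.asarray(out)

draws = truncated_normal_01(20_000)
print(draws.mean())   # ≈ 0.46 for a standard normal truncated to [0, 1]
```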
A. Beskos and G. Roberts, The Annals of Applied Probability, Vol 15(4) (2005) pp. 2422–2444.
Alternative Rejection Sampling (RS) Algorithm
The proof is given in (Beskos et al., 2005) and can be summarized in the
following steps. Let (S, 𝒮) be a sufficiently regular measurable space.

Step 1.  P(I₁ = 1) = ∫_S P(I₁ = 1 | Y₁ = y) q(y) dy = ∫_S [π*(y) / (C q*(y))] q(y) dy = ∫_S π*(y) dy / (C ∫_S q*(y) dy) ≡ γ
Step 2. For any 𝐹 ∈ 𝒮, we have:
P(Y_τ ∈ F) = P(Y_τ ∈ F, I₁ = 1) + P(Y_τ ∈ F | I₁ = 0) P(I₁ = 0)
P(Y_τ ∈ F) = ∫_F P(I₁ = 1 | Y₁ = y) q(y) dy + P(Y_τ ∈ F)(1 − γ)
P(Y_τ ∈ F) = γ π(F) + P(Y_τ ∈ F)(1 − γ)   ⟹   P(Y_τ ∈ F) = π(F)
Mixture Methods for the Generation of Random Variables
Consider an infinite mixture with weights pᵢ given by geometric probabilities and with mixture components πᵢ that are all equal to π(∙):
π(x) = Σ_{i=1}^∞ pᵢ πᵢ(x),   pᵢ = p(1 − p)^{i−1},   and   p = ∫_𝒳 π*(y) dy / (M′ ∫_𝒳 q*(y) dy)
The element identifier I ~ 𝒢ℯℴ(p) is generated not by discrete sampling but by a sequential search that tests {I = 1}, {I = 2}, … until a test is accepted. The draw x ~ π_I(x) = π(x) is then obtained automatically as a by-product of the determination of I.
Instead of simulating from the geometric distribution 𝒢ℯℴ(p) directly, which is impossible since p is unknown, one simulates an event which admits this probability distribution (see Peterson and Kronmal, 1982).
Arthur V. Peterson, Jr. and Richard A. Kronmal, On Mixture Methods for the Computer Generation of Random
Variables, The American Statistician, Vol. 36, No. 3, Part 1 (Aug., 1982), pp. 184-191
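The geometric structure of the trial index can be checked empirically; in the sketch below (the Beta(2,2) target and uniform proposal are assumptions chosen for illustration), the index of the first accepted trial is recorded over many runs and compared with 𝒢ℯℴ(p).

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: Beta(2, 2) with density 6x(1-x) on [0, 1]; proposal: Uniform(0, 1).
# The tightest bound is M' = 1.5, so p = 1/M' = 2/3 (both densities normalized).
pi_star = lambda x: 6.0 * x * (1.0 - x)
M = 1.5

def accepted_index(rng):
    """Trial number I at which the first acceptance occurs (one rejection run)."""
    i = 1
    while True:
        y = rng.random()                          # y ~ q = Uniform(0, 1)
        if rng.random() <= pi_star(y) / M:        # q*(y) = 1
            return i
        i += 1

idx = np.array([accepted_index(rng) for _ in range(20_000)])
print(idx.mean(), M)                      # mean of Geo(p) is 1/p = M'
print((idx == 1).mean(), 1.0 / M)         # P(I = 1) = p
print((idx == 2).mean(), (1 - 1 / M) / M) # P(I = 2) = (1 - p) p
```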
If we use q(x) = q*(x) = (1/√(2π)) e^{−x²/2} (normalized):
The likelihood is often bounded, so one can use the rejection procedure with the prior π(θ) as the proposal. Samples are accepted with probability

∫_X π*(x) dx / (M ∫_X q*(x) dx) = ∫ π(θ) f(x|θ) dθ / (M ∫ π(θ) dθ) = ∫ π(θ) f(x|θ) dθ / M
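A minimal sketch of this idea on an assumed conjugate toy model (θ ~ 𝒩(0,1) prior, x | θ ~ 𝒩(θ,1) likelihood, one observation; none of this is taken from the slide): the prior is the proposal and M = sup_θ f(x|θ) bounds the likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (assumed for illustration): theta ~ N(0, 1), x | theta ~ N(theta, 1).
x_obs = 1.3
likelihood = lambda th: np.exp(-0.5 * (x_obs - th) ** 2) / np.sqrt(2 * np.pi)
M = 1.0 / np.sqrt(2 * np.pi)              # sup_theta f(x_obs | theta)

def posterior_sample(n):
    """Rejection sampling from p(theta | x) ∝ p(theta) f(x | theta), using the
    prior as proposal and accepting with probability f(x | theta) / M."""
    out = []
    while len(out) < n:
        th = rng.standard_normal()            # theta ~ prior
        if rng.random() <= likelihood(th) / M:
            out.append(th)
    return np.asarray(out)

th = posterior_sample(20_000)
# For this model the exact posterior is N(x_obs/2, 1/2), and the acceptance
# rate equals the marginal likelihood divided by M.
print(th.mean(), th.var())                # ≈ 0.65 and 0.5
```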
Example: the target is

π(x) = π*(x) = 𝒩(0, I_d) = (1/(2π)^{d/2}) exp(−(1/2) Σ_{i=1}^d xᵢ²)

and the proposal is

q*(x) = 𝒩(0, σ² I_d) = (1/(2πσ²)^{d/2}) exp(−(1/(2σ²)) Σ_{i=1}^d xᵢ²).

Note that:

π*(x) / q*(x) = σ^d exp(−(1/2) Σ_{i=1}^d xᵢ² (1 − 1/σ²)) ≤ σ^d   for σ ≥ 1,   so we take M = σ^d.

Pr(Proposal Accepted) = Z / (M ∫_X q*(y) dy) = σ^{−d} → 0 as d → ∞   (here Z = ∫ π*(x) dx = 1):

the efficiency of rejection sampling decreases exponentially with the dimensionality d.
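A quick empirical check of this exponential degradation (the value σ = 1.2 is an arbitrary choice): the observed acceptance rate should track σ^{−d}.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n_trials = 1.2, 200_000

for d in [1, 2, 5, 10, 20, 50]:
    y = sigma * rng.standard_normal((n_trials, d))        # y ~ N(0, sigma^2 I_d)
    # log of pi*(y) / (M q*(y)) with M = sigma^d, so the ratio is <= 1:
    log_ratio = -0.5 * np.sum(y**2, axis=1) * (1.0 - 1.0 / sigma**2)
    accept = np.log(rng.random(n_trials)) <= log_ratio
    print(d, accept.mean(), sigma ** (-d))                # empirical vs sigma^{-d}
```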
Adaptive Rejection Sampling (Gilks & Wild): rejection sampling for a log-concave target f(x), using bounds on h(x) = log f(x) built from a current set of support points Sₙ:

h̲_n(x) ≤ h(x) = log f(x) ≤ h̄_n(x)
f̲_n(x) = e^{h̲_n(x)} ≤ f(x) ≤ e^{h̄_n(x)} = f̄_n(x)
f̲_n(x) ≤ f(x) ≤ f̄_n(x) = ω_n g_n(x),

where ω_n is the normalization constant of f̄_n(x) and g_n(x) is a density easy to sample from.
At iteration n ≥ 1: sample Y ~ g_n and U ~ U(0,1); if U ≤ f(Y)/f̄_n(Y), accept Y. Otherwise, update the set of support points, S_{n+1} = S_n ∪ {Y}, rebuild the envelopes, and repeat.
Gilks, W.R. and Wild, P., Adaptive rejection sampling for Gibbs sampling, Applied Statistics, Vol. 41 (1992), pp. 337-348.
Thus π(a | x_{1:n}, y_{1:n}, b) and similarly π(b | x_{1:n}, y_{1:n}, a) are log-concave, and adaptive rejection sampling can be applied.
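Below is a simplified, tangent-based sketch of adaptive rejection sampling on a bounded support (not the full Gilks-Wild implementation; the helper name, the bounded support, and the truncated-normal test case are assumptions): tangents of h at the current support points form the upper hull, candidates are drawn from the resulting piecewise-exponential envelope, and rejected points are added to the support set.

```python
import numpy as np

def ars_sample(h, dh, L, R, x_init, n_samples, rng=None):
    """Simplified tangent-based adaptive rejection sampler on a bounded support
    [L, R] for a strictly log-concave (possibly unnormalized) log-density h
    with derivative dh. Illustrative helper, not the Gilks-Wild code."""
    rng = np.random.default_rng() if rng is None else rng
    S = sorted(x_init)                                # current support points S_n
    samples = []
    while len(samples) < n_samples:
        xs = np.asarray(S)
        hs, ds = h(xs), dh(xs)
        # Intersections z_j of consecutive tangents delimit the hull segments.
        z = (hs[1:] - hs[:-1] + ds[:-1] * xs[:-1] - ds[1:] * xs[1:]) / (ds[:-1] - ds[1:])
        knots = np.concatenate(([L], z, [R]))
        a, b = hs - ds * xs, ds                       # hull on segment j: a_j + b_j x
        # Area under exp(a_j + b_j x) on each segment (piecewise exponential g_n).
        areas = np.empty(len(xs))
        for j in range(len(xs)):
            lo, hi = knots[j], knots[j + 1]
            if abs(b[j]) < 1e-12:
                areas[j] = np.exp(a[j]) * (hi - lo)
            else:
                areas[j] = np.exp(a[j]) * (np.exp(b[j] * hi) - np.exp(b[j] * lo)) / b[j]
        j = rng.choice(len(xs), p=areas / areas.sum())  # pick a hull segment
        lo, hi, u = knots[j], knots[j + 1], rng.random()
        if abs(b[j]) < 1e-12:                           # sample within the segment
            y = lo + u * (hi - lo)
        else:                                           # inverse CDF of exp(b_j x)
            y = np.log(np.exp(b[j] * lo) + u * (np.exp(b[j] * hi) - np.exp(b[j] * lo))) / b[j]
        if np.log(rng.random()) <= h(y) - (a[j] + b[j] * y):
            samples.append(y)                           # accept
        else:
            S = sorted(S + [y])                         # reject: refine the envelope
    return np.asarray(samples)

# Example: standard normal truncated to [-4, 4] (log-concave up to a constant).
draws = ars_sample(lambda x: -0.5 * x**2, lambda x: -x, -4.0, 4.0,
                   x_init=[-2.0, 0.5, 2.0], n_samples=2000)
print(draws.mean(), draws.std())   # roughly 0 and 1
```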
Monahan's Accept/Reject Method

Monahan's method generates a random variable X with CDF F(x) = H(G(x)) / H(1), where G is a CDF and H(x) = Σ_{n≥1} (−1)^{n+1} aₙ xⁿ, with 1 ≥ a₁ ≥ a₂ ≥ ⋯ ≥ 0.
J. F. Monahan, Extension of von Neumann's method for generating random variables, Mathematics of Computation, 33(147) (1979), 1065-1069.
Example: F(x) = 1 − cos(πx/2), 0 ≤ x ≤ 1.

To derive this note that:

cos x = Σ_{i=0}^∞ (−1)^i x^{2i} / (2i)!

so that

1 − cos(πx/2) = 1 − Σ_{i=0}^∞ (−1)^i (πx/2)^{2i} / (2i)! = Σ_{i=1}^∞ (−1)^{i+1} (π/2)^{2i} x^{2i} / (2i)!
= (π²/8) [ x² − (π²/48) x⁴ + ⋯ + (−1)^{i+1} (π^{2i−2} / (2^{2i−3} (2i)!)) x^{2i} + ⋯ ]

(for the denominator H(1) use x = 1: H(1) = 1 − cos(π/2) = 1).

Thus:   G(x) = x²,   H(x) = (π²/8) [ x − (π²/48) x² + ⋯ + (−1)^{i+1} (π^{2i−2} / (2^{2i−3} (2i)!)) xⁱ + ⋯ ].
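The rearrangement above is easy to verify numerically; the short check below compares partial sums of H(x²) against 1 − cos(πx/2):

```python
import numpy as np
from math import factorial, pi

def H(u, n_terms=12):
    """Partial sum of H(u) = (pi^2/8) * sum_i (-1)^{i+1} pi^{2i-2}/(2^{2i-3}(2i)!) u^i."""
    s = 0.0
    for i in range(1, n_terms + 1):
        s += (-1) ** (i + 1) * pi ** (2 * i - 2) / (2 ** (2 * i - 3) * factorial(2 * i)) * u**i
    return (pi**2 / 8.0) * s

x = np.linspace(0.0, 1.0, 5)
print(H(x**2))                    # should match F(x) = 1 - cos(pi x / 2)
print(1.0 - np.cos(pi * x / 2))   # note also H(1) = 1 - cos(pi/2) = 1
```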
Monahan’s Accept/Reject Method
F(x) = P(X ≤ x) = H(G(x)) / H(1),   where G is a CDF and H(x) = Σ_{n≥1} (−1)^{n+1} aₙ xⁿ, such that 1 ≥ a₁ ≥ a₂ ≥ ⋯ ≥ 0.
Repeat
  Generate X ~ G and set K = 1
  Repeat
    Generate U ~ G and V ~ U[0,1]
    If U ≤ X and V ≤ a_{K+1}/a_K, then K ← K + 1; otherwise stop (exit the inner loop)
Until K odd; return X
This can be shown simply using P(X ≤ x, Aₙ, A^c_{n+1}) = aₙ G(x)ⁿ − a_{n+1} G(x)^{n+1}, which at x = ∞ gives aₙ − a_{n+1}, so that

P(Accept X) = P(K odd) = a₁ − a₂ + a₃ − a₄ + ⋯ = Σ_{n≥1} aₙ (−1)^{n−1} = H(1).

Moreover,

P(X ≤ x | X returned) = P(X ≤ x, X returned) / P(X returned) = Σ_{n=1,3,5,…} P(X ≤ x, Aₙ, A^c_{n+1}) / P(X returned)

= [a₁ G(x) − a₂ G(x)² + a₃ G(x)³ − a₄ G(x)⁴ + ⋯] / H(1) = Σ_{n≥1} (−1)^{n+1} aₙ G(x)ⁿ / H(1),

so that

F(x) = P(X ≤ x | X returned) = H(G(x)) / H(1).