
Rejection Sampling

Prof. Nicholas Zabaras

Email: nzabaras@gmail.com
URL: https://www.zabaras.com/

October 9, 2020



Contents

• Accept/Reject algorithm, adaptive rejection for the Gamma, alternative rejection sampling, mixture methods, high dimensions, rejection sampling from the posterior

• Envelope rejection methods

• Monahan's accept/reject method

Following closely:
• C. Robert, G. Casella, Monte Carlo Statistical Methods (Ch. 1, 2, 3.1 & 3.2) (google books, slides, video)
• J. S. Liu, MC Strategies in Scientific Computing (Chapters 1 & 2)
• J-M Marin and C. P. Robert, Bayesian Core (Chapter 2)
• Statistical Computing & Monte Carlo Methods, A. Doucet (course notes, 2007)
Goals

• The goals for today's lecture include:

  • Understand rejection sampling and the accept/reject algorithm

  • Understand the envelope rejection method and Monahan's accept/reject method



Rejection Sampling

• We would like to sample from $\pi(x)$ defined on $\mathcal{X}$, known only up to a proportionality constant, $\pi \propto \pi^*$.

• The method relies on samples generated from a proposal distribution $q(x)$ on $\mathcal{X}$. $q$ might also be known only up to a normalizing constant, $q \propto q^*$.

• We need $q(x)$ to dominate $\pi(x)$, i.e.

$$ M = \sup_{x \in \mathcal{X}} \frac{\pi^*(x)}{q^*(x)} < +\infty $$

• This implies that $\pi^*(x) > 0 \Rightarrow q^*(x) > 0$, but also that the tails of $q^*(x)$ must be thicker than the tails of $\pi^*(x)$.



Accept/Reject Algorithm

• More generally, we would like to sample from $\pi(x)$, but it is easier to sample from a proposal distribution $q(x)$.

• $q(x)$ satisfies $\pi^*(x) \le M' q^*(x)$ for all $x$, for some $M' \ge M$.

• Procedure (see the sketch below):
  • Sample $y$ from $q(y)$ and $u$ from $\mathcal{U}[0,1]$
  • Accept (set $x = y$) with probability $\pi^*(y)/\big(M' q^*(y)\big)$, i.e. if $u \le \pi^*(y)/\big(M' q^*(y)\big)$
  • Reject otherwise and repeat.

• The accepted $x^{(i)}$ are samples from $\pi(x)$!

• If $M'$ is too large, we will rarely accept samples.
B. D. Flury, Rejection Sampling made easy, SIAM review, 1990
J. Halton, Reject the rejection technique, J. Scientific Computing, 1992
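A minimal Python sketch of this accept/reject loop (my own illustration, not from the slides). The unnormalized standard-normal target and the Cauchy proposal are hypothetical choices made only for the example; any pair satisfying the bound would do.

```python
import numpy as np

rng = np.random.default_rng(0)

def pi_star(x):
    # Unnormalized target: a standard normal without its 1/sqrt(2*pi) constant.
    return np.exp(-0.5 * x**2)

def q_star(x):
    # Unnormalized Cauchy proposal (the 1/pi constant is dropped on purpose).
    return 1.0 / (1.0 + x**2)

# Bound: pi_star(x)/q_star(x) = exp(-x^2/2)(1+x^2) is maximized at x = +/-1,
# giving M = 2*exp(-1/2) ~= 1.213; any M' >= M works.
M_prime = 1.3

def rejection_sample(n):
    samples = []
    trials = 0
    while len(samples) < n:
        y = rng.standard_cauchy()          # draw from the proposal q
        u = rng.uniform()                  # draw u ~ U[0, 1]
        trials += 1
        if u <= pi_star(y) / (M_prime * q_star(y)):
            samples.append(y)              # accept: y is an exact draw from pi
    return np.array(samples), trials

x, trials = rejection_sample(10_000)
print("empirical mean/std:", x.mean(), x.std())   # should be close to 0 and 1
print("acceptance rate:", len(x) / trials)
```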



Rejection Sampling

[Figure: the scaled proposal $M' q^*(y)$ envelopes the unnormalized target $\pi^*(y)$; a candidate $y$ with vertical coordinate $M' q^*(y)\,u$ is accepted if it falls below $\pi^*(y)$ and rejected otherwise.]

• Set $i = 1$
• Repeat until $i = N$
  • Sample $y \sim q(y)$ and $u \sim \mathcal{U}(0,1)$
  • If $u \le \dfrac{\pi^*(y)}{M' q^*(y)}$, then accept (set $x^{(i)} = y$) and increment the counter $i$
  • Otherwise, reject

• The distribution $\pi(y)$ needs to be known only up to a normalizing constant:

$$ \pi(y) = \frac{\pi^*(y)}{Z}, \qquad Z = \int_{\mathcal{X}} \pi^*(y)\, dy $$
Accept/Reject Efficiency

• The number of trials before a candidate sample is accepted follows a geometric distribution.

• Indeed, note that:

$$ \Pr\left(k\text{-th proposal is accepted}\right) = (1-\gamma)^{k-1}\,\gamma, \qquad \gamma = \Pr(Y \text{ is accepted}) = \frac{\int_{\mathcal{X}} \pi^*(y)\,dy}{M' \int_{\mathcal{X}} q^*(y)\,dy} $$

• The mean of the geometric distribution is $1/\gamma$; thus the number of trials before success is an unbiased estimate of $1/\Pr(Y \text{ is accepted})$.



Example: Sampling from the Gamma

• Consider sampling from the Gamma distribution $\mathcal{G}a(x|a,\lambda) = \frac{1}{\Gamma(a)}\lambda^a x^{a-1}e^{-\lambda x}$.

• One can show that if $X_i \sim \mathcal{E}xp(\lambda)$ i.i.d. and $Y = X_1 + \cdots + X_k$, then $Y \sim \mathcal{G}a(y|k,\lambda)$.

• This transformation cannot be used for non-integer $a$. In this case, we can sample from $\mathcal{G}a(x|a,\lambda)$ using accept/reject with a proposal $q(x) = \mathcal{G}a(x\,|\,k = \lfloor a \rfloor, \lambda - 1)$. With this choice

$$ \frac{p(x)}{q(x)} = \frac{x^{a-1}\lambda^a e^{-\lambda x}\,\Gamma(k)}{x^{k-1}(\lambda-1)^k e^{-(\lambda-1)x}\,\Gamma(a)} = \frac{\Gamma(\lfloor a \rfloor)\,\lambda^a}{\Gamma(a)\,(\lambda-1)^k}\, x^{a-k} e^{-x} $$

• This ratio is maximized at $x = a - \lfloor a \rfloor$. Thus $M = \dfrac{\mathcal{G}a(a-k\,|\,a,\lambda)}{\mathcal{G}a(a-k\,|\,k,\lambda-1)}$, $k = \lfloor a \rfloor$ (a Python sketch follows below).

Link to MatLab implementation


From PMTK3
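A small Python sketch of this construction (my own illustration, not the linked PMTK3 code). It draws from the $\mathcal{G}a(\lfloor a\rfloor, \lambda-1)$ proposal via the sum-of-exponentials trick and assumes $\lambda > 1$ and non-integer $a > 1$.

```python
import numpy as np
from math import gamma as gamma_fn, floor

rng = np.random.default_rng(1)

def gamma_pdf(x, a, lam):
    # Gamma density Ga(x | a, lam) with rate parameterization.
    return lam**a * x**(a - 1) * np.exp(-lam * x) / gamma_fn(a)

def sample_gamma_ar(a, lam, n):
    """Accept/reject for Ga(a, lam) with proposal Ga(k = floor(a), lam - 1)."""
    assert a > 1 and a != floor(a) and lam > 1
    k = floor(a)
    # Bound M: the ratio p(x)/q(x) is maximized at x = a - k.
    x_star = a - k
    M = gamma_pdf(x_star, a, lam) / gamma_pdf(x_star, k, lam - 1)
    out = []
    while len(out) < n:
        # Proposal draw: the sum of k iid Exp(lam - 1) variables is Ga(k, lam - 1).
        y = rng.exponential(scale=1.0 / (lam - 1), size=k).sum()
        u = rng.uniform()
        if u <= gamma_pdf(y, a, lam) / (M * gamma_pdf(y, k, lam - 1)):
            out.append(y)
    return np.array(out)

a, lam = 5.7, 2.0
x = sample_gamma_ar(a, lam, 20_000)
print("sample mean vs a/lam:  ", x.mean(), a / lam)
print("sample var  vs a/lam^2:", x.var(), a / lam**2)
```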



Acceptance-Rejection for the Gamma

• We revisit sampling from $\mathcal{G}a(x|a,\lambda=1) \equiv f(x) = \frac{1}{\Gamma(a)}x^{a-1}e^{-x}$, $a > 1$. A suitable proposal is the Cauchy $h(x) = \dfrac{\sqrt{c}/\pi}{1 + c(x-b)^2}$. The CDF is given as $H(x) = \dfrac{1}{2} + \dfrac{1}{\pi}\arctan\!\big(\sqrt{c}\,(x-b)\big)$. The inverse is $x = \dfrac{1}{\sqrt{c}}\tan\!\big(\pi(H(x) - 0.5)\big) + b$.

• Since we know the CDF of $h(x)$, we can sample easily from it. To use it as a proposal distribution, we rescale it to be sure that it is nowhere less than the Gamma.

• You can show that for $a > 1$, $x \ge 0$ the following inequality holds:

$$ f(x) = \frac{1}{\Gamma(a)}x^{a-1}e^{-x} \;\le\; \frac{1}{\Gamma(a)}\,\frac{e^{-(a-1)}(a-1)^{a-1}}{1 + \dfrac{\big(x-(a-1)\big)^2}{2a-1}} \;\equiv\; g(x) $$

In our notation:

$$ \pi^*(x) = x^{a-1}e^{-x}, \qquad q^*(x) = \frac{1}{1 + \dfrac{\big(x-(a-1)\big)^2}{2a-1}}, \qquad M^* = e^{-(a-1)}(a-1)^{a-1}, $$

$$ \frac{1}{\gamma} = \frac{M^*\int q^*\,dx}{\int \pi^*\,dx} = \frac{e^{-(a-1)}(a-1)^{a-1}\,\pi\sqrt{2a-1}}{\Gamma(a)}. $$

• Thus we have shown that:

$$ f(x) \le K\, h\!\left(x \;\middle|\; b = a-1,\; c = \frac{1}{2a-1}\right), \qquad \text{where } K = \frac{1}{\Gamma(a)}\, e^{-(a-1)}(a-1)^{a-1}\,\pi\sqrt{2a-1}. $$
 U. Dieter & J. Ahrens, Acceptance Rejection Techniques for Sampling from the Beta and Gamma Distributions, 1974



Adaptive Rejection for the Gamma

The overall accept/reject algorithm now takes the form (a Python sketch follows the steps):

1. Set $b \leftarrow a-1$, $A \leftarrow a+b$, and $s \leftarrow \sqrt{A}$.   // $b = a-1$, $A = 2a-1$, $s = \sqrt{2a-1}$

2. Generate $u \sim \mathcal{U}(0,1)$. Set $t \leftarrow s\,\tan\!\big(\pi(u - 0.5)\big)$ and $x \leftarrow b + t$.   // $x$ = sample from the Cauchy

3. If $x < 0$ go to 2.

4. Generate $u'$. If $u' > \exp\!\Big(b\,\ln\dfrac{x}{b} - t + \ln\!\Big(1 + \dfrac{t^2}{A}\Big)\Big)$ go to Step 2. Otherwise deliver $x$.
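A Python sketch of these four steps (my transcription of the algorithm above, for $\lambda = 1$); dividing the output by $\lambda$ would give a general rate parameter.

```python
import numpy as np

rng = np.random.default_rng(2)

def gamma_cauchy_ar(a, n):
    """Cauchy-envelope accept/reject for Ga(a, 1), a > 1 (Dieter & Ahrens, 1974)."""
    assert a > 1
    b = a - 1.0            # location of the Cauchy envelope
    A = 2.0 * a - 1.0      # a + b
    s = np.sqrt(A)         # Cauchy scale
    out = []
    while len(out) < n:
        # Step 2: draw from Cauchy(b, s) by inverting its CDF.
        u = rng.uniform()
        t = s * np.tan(np.pi * (u - 0.5))
        x = b + t
        # Step 3: the Gamma has support x >= 0.
        if x < 0.0:
            continue
        # Step 4: accept with probability f(x) / (K h(x)).
        u2 = rng.uniform()
        if u2 <= np.exp(b * np.log(x / b) - t + np.log1p(t * t / A)):
            out.append(x)
    return np.array(out)

a = 4.3
x = gamma_cauchy_ar(a, 20_000)
print("sample mean vs a:", x.mean(), a)   # mean of Ga(a, 1) is a
print("sample var  vs a:", x.var(), a)    # variance of Ga(a, 1) is a
```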

 U. Dieter & J. Ahrens, Acceptance Rejection Techniques for Sampling from the Beta and Gamma Distributions, 1974



Adaptive Rejection for the Gamma

• We revisit sampling from $\mathcal{G}a(x|a,\lambda) = \frac{1}{\Gamma(a)}x^{a-1}\lambda^a e^{-\lambda x}$, $a > 1$. A suitable proposal distribution is the Cauchy $h(x) = \dfrac{\sqrt{c}/\pi}{1 + c(x-b)^2}$. We rescale it to be sure that it is nowhere less than the Gamma.

[Figure: the scaled Cauchy envelope $K\,h\!\left(x\,\middle|\,b = a-1,\ c = \frac{1}{2a-1}\right)$ plotted above the target $\mathcal{G}a(x|a,\lambda=1) = \frac{1}{\Gamma(a)}x^{a-1}e^{-x}$.]

• The expected number of trials per sample is $\dfrac{1}{\gamma} = \dfrac{1}{\Gamma(a)}\, e^{-(a-1)}(a-1)^{a-1}\,\pi\sqrt{2a-1}$. It decreases from $\pi \approx 3.14159$ for $a = 1$ to $\sqrt{\pi} \approx 1.77245$ for $a \to \infty$.
 U. Dieter & J. Ahrens, Acceptance Rejection Techniques for Sampling from the Beta and Gamma Distributions, 1974
Alternative Rejection Sampling (RS) Algorithm

• In the standard accept/reject algorithm, the candidate is sampled before $u$. This is not necessary.

• (Beskos et al., 2005): Let $(Y_n, I_n)_{n \ge 1}$ be a sequence of i.i.d. random variables in $\mathcal{X} \times \{0,1\}$ such that $Y_1 \sim q$ and

$$ \Pr(I_1 = 1 \mid Y_1 = y) = \frac{\pi^*(y)}{C q^*(y)} \quad \forall y \in \mathcal{X}. $$

Define $\tau = \min\{i \ge 1 : I_i = 1\}$; then $Y_\tau \sim \pi$.

• This scheme does not assume any order for the simulation of $Y$ and $I$ and, besides the conditional property given in the proposition, does not restrict the construction of $I$.

• This result is useful if we can construct conditions for the acceptance or rejection of the current proposed element $Y$ from minimal information about it.

A. Beskos and G. Roberts, The Annals of Applied Probability, Vol 15(4) (2005) pp. 2422–2444.
Alternative Rejection Sampling (RS) Algorithm

• The proof is given in (Beskos et al., 2005) and can be summarized in the following steps. Let $(S, \mathcal{S})$ be a sufficiently regular measurable space.

• Step 1.

$$ p(I_1 = 1) = \int_S P(I_1 = 1 \mid Y_1 = y)\, q(y)\, dy = \int_S \frac{\pi^*(y)}{C q^*(y)}\, q(y)\, dy = \frac{\int_S \pi^*(y)\, dy}{C \int_S q^*(y)\, dy} \equiv \gamma $$

• Step 2. For any $F \in \mathcal{S}$, we have:

$$ p(Y_\tau \in F) = p(Y_\tau \in F, I_1 = 1) + P(Y_\tau \in F \mid I_1 = 0)\, P(I_1 = 0) $$

or, using the independence of $(Y_n, I_n)_{n \ge 1}$ and the definition of $\tau = \min\{i \ge 1 : I_i = 1\}$,

$$ p(Y_\tau \in F) = \int_F P(I_1 = 1 \mid Y_1 = y)\, q(y)\, dy + P(Y_\tau \in F)(1 - \gamma) $$

$$ p(Y_\tau \in F) = \int_F \frac{\pi^*(y)}{C q^*(y)}\, q(y)\, dy + P(Y_\tau \in F)(1 - \gamma) \;\Rightarrow\; p(Y_\tau \in F) = \gamma \int_F \pi(y)\, dy + P(Y_\tau \in F)(1 - \gamma) $$

$$ p(Y_\tau \in F) = \gamma\, \pi(F) + P(Y_\tau \in F)(1 - \gamma) \;\Rightarrow\; P(Y_\tau \in F) = \pi(F) $$

A. Beskos and G. Roberts, The Annals of Applied Probability, Vol 15(4) (2005) pp. 2422–2444.
Mixture Methods for the Generation of Random Variables

• Consider an infinite mixture with mixture weights $p_i$ that are geometric probabilities and mixture elements $\pi_i$ that are all equal to $\pi(\cdot)$:

$$ \pi(x) = \sum_{i=1}^{\infty} p_i\, \pi(x), \qquad p_i = p(1-p)^{i-1}, \qquad p = \frac{\int_{\mathcal{X}} \pi^*(y)\, dy}{M' \int_{\mathcal{X}} q^*(y)\, dy} $$

• The element identifier $I \sim \mathcal{G}eo(p)$ is generated not by discrete sampling but by a sequential search that tests for $\{I = 1\}, \{I = 2\}, \ldots$ until a test is accepted. $x \sim \pi_I(x) = \pi(x)$ is generated automatically as a by-product of the determination of $I$.

• Instead of simulating from the geometric distribution $\mathcal{G}eo(p)$ directly, which is impossible since $p$ involves the unknown normalizing constants, one simulates an element which admits this probability distribution (see Peterson and Kronmal, 1982).
Arthur V. Peterson, Jr. and Richard A. Kronmal, On Mixture Methods for the Computer Generation of Random
Variables, The American Statistician, Vol. 36, No. 3, Part 1 (Aug., 1982), pp. 184-191



Mixture Methods for the Generation of Random Variables

• Consider the truncated Cauchy $\pi(x) = \pi^*(x) = \dfrac{2}{\pi}\,\dfrac{1}{1+x^2}\,\mathbb{I}(|x| < 1)$. Use $q^*(x) = \dfrac{2}{\pi}\,\mathbb{I}(|x| < 1)$. Note that $M = \sup_{x \in \mathcal{X}} \dfrac{\pi^*(x)}{q^*(x)} = 1$ and the probability of acceptance is $p = \dfrac{\int_{\mathcal{X}} \pi^*(y)\, dy}{M \int_{\mathcal{X}} q^*(y)\, dy} = \dfrac{\pi}{4}$.

• The AR algorithm proceeds as follows (a Python sketch follows below):
  AR1. [Prepare for the acceptance-rejection comparison]
    a. Generate $x \sim \mathcal{U}(-1, +1)$
    b. Generate $u \sim \mathcal{U}(0, 1)$
  AR2. [Acceptance-rejection step]
    a. If $u < 1/(1 + x^2)$, return $x$.
    b. Otherwise, repeat steps 1 and 2 until acceptance.

• The mixture representation of this AR algorithm is $\pi(x) = \sum_{i=1}^{\infty} p_i\, \pi(x)$, $p_i = p(1-p)^{i-1} = \dfrac{\pi}{4}\left(1 - \dfrac{\pi}{4}\right)^{i-1}$, where $p_i = P\{\text{the number of times that steps 1 and 2 are executed} = i\}$.

• For the AR method $\pi_i(\cdot) = \pi(\cdot)$ for all $i$, and thus the algorithm does not use in any computations the value of the chosen mixture element $I$.

Arthur V. Peterson, Jr. and Richard A. Kronmal, On Mixture Methods for the Computer Generation of Random
Variables, The American Statistician, Vol. 36, No. 3, Part 1 (Aug., 1982), pp. 184-191
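A minimal Python sketch of AR1-AR2 above (my own illustration), which also records how many passes through steps 1-2 were needed, i.e. a draw of the mixture index $I \sim \mathcal{G}eo(\pi/4)$.

```python
import numpy as np

rng = np.random.default_rng(3)

def truncated_cauchy_ar():
    """One draw from pi(x) = (2/pi) / (1 + x^2) on (-1, 1); also returns the
    number of executions of steps AR1-AR2 (the mixture index I ~ Geo(pi/4))."""
    i = 0
    while True:
        i += 1
        x = rng.uniform(-1.0, 1.0)      # AR1a: candidate from q
        u = rng.uniform()               # AR1b
        if u < 1.0 / (1.0 + x * x):     # AR2a: accept
            return x, i

draws = [truncated_cauchy_ar() for _ in range(50_000)]
xs = np.array([d[0] for d in draws])
counts = np.array([d[1] for d in draws])
print("acceptance prob (empirical vs pi/4):", 1.0 / counts.mean(), np.pi / 4)
print("P(|x| < 0.5) empirical vs exact:", (np.abs(xs) < 0.5).mean(),
      (4 / np.pi) * np.arctan(0.5))   # exact = 2 * (2/pi) * arctan(0.5)
```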



Accept/Reject - No Need of Normalizing Constants

• The target $\pi$ is given by $\pi(x) \propto \pi^*(x) = e^{-x^2/2}\, m(x)$, with $m(x) \le M$ $\forall x \in \mathcal{X}$.

• If we use $q(x) = q^*(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ (normalized), then we have:

$$ \frac{\pi^*(x)}{q^*(x)} \le \sqrt{2\pi}\, M = M' \qquad \text{and} \qquad \Pr(Y \text{ accepted}) = \frac{\int_{\mathcal{X}} \pi^*(y)\, dy}{M'} $$

• If instead we use $q^*(x) = e^{-x^2/2}$, then we have:

$$ \frac{\pi^*(x)}{q^*(x)} \le M \qquad \text{and} \qquad \Pr(Y \text{ accepted}) = \frac{\int_{\mathcal{X}} \pi^*(y)\, dy}{M \int_{\mathcal{X}} q^*(y)\, dy} = \frac{\int_{\mathcal{X}} \pi^*(y)\, dy}{M \sqrt{2\pi}} = \frac{\int_{\mathcal{X}} \pi^*(y)\, dy}{M'} $$

 Once again we see that we do not need the normalizing constant of 𝑞 ∗ .



Accept/Reject – Bayesian Estimation Example

• Consider a Bayesian model: prior $\pi(\theta)$ and likelihood $f(x|\theta)$.

• The posterior distribution is given by:

$$ \pi(\theta|x) = \frac{f(x|\theta)\,\pi(\theta)}{\int f(x|\theta)\,\pi(\theta)\, d\theta} \;\propto\; \pi^*(\theta|x), \qquad \text{where } \pi^*(\theta|x) = f(x|\theta)\,\pi(\theta) $$

• We can use the prior distribution as a candidate distribution, $q(\theta) = q^*(\theta) = \pi(\theta)$, as long as

$$ \sup_{\theta} \frac{\pi^*(\theta|x)}{q^*(\theta)} = \sup_{\theta} f(x|\theta) \le M, \qquad \text{i.e. } M = f(x|\theta_{MLE}) $$

• The likelihood is often bounded, so one can use the rejection procedure. Samples are accepted with probability

$$ \frac{\int_{\Theta} \pi^*(\theta)\, d\theta}{M \int_{\Theta} q^*(\theta)\, d\theta} = \frac{\int_{\Theta} \pi(\theta)\, f(x|\theta)\, d\theta}{M \int_{\Theta} \pi(\theta)\, d\theta} = \frac{\int_{\Theta} \pi(\theta)\, f(x|\theta)\, d\theta}{M} $$
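A small Python sketch of rejection sampling from a posterior with the prior as the proposal. The model below (a single Gaussian observation with known variance and a Gaussian prior) is a hypothetical choice for illustration; its MLE is simply $\theta_{MLE} = x$, so $M = f(x|\theta_{MLE})$ is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical model: x | theta ~ N(theta, sigma^2), prior theta ~ N(0, tau^2).
sigma, tau = 1.0, 2.0
x_obs = 1.5

def likelihood(theta):
    return np.exp(-0.5 * (x_obs - theta) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)

# The likelihood is maximized at theta = x_obs, so M = f(x | theta_MLE).
M = likelihood(x_obs)

def posterior_rejection(n):
    out = []
    while len(out) < n:
        theta = rng.normal(0.0, tau)             # propose from the prior
        u = rng.uniform()
        if u <= likelihood(theta) / M:           # accept w.p. f(x|theta)/M
            out.append(theta)
    return np.array(out)

samples = posterior_rejection(20_000)
# For this conjugate model the exact posterior is Gaussian; compare moments.
post_mean = x_obs * tau**2 / (tau**2 + sigma**2)
post_var = sigma**2 * tau**2 / (tau**2 + sigma**2)
print("mean:", samples.mean(), "exact:", post_mean)
print("var: ", samples.var(), "exact:", post_var)
```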



Rejection Sampling in High-Dimensions

• Consider the following target distribution:

$$ \pi(\boldsymbol{x}) = \pi^*(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}_d) = \frac{1}{(2\pi)^{d/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{d} x_i^2} $$

• We take the following reasonably good proposal distribution ($\sigma > 1$):

$$ q^*(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{0}, \sigma^2 \boldsymbol{I}_d) = \frac{1}{(2\pi\sigma^2)^{d/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{d} x_i^2} $$

• Note that:

$$ \frac{\pi^*(\boldsymbol{x})}{q^*(\boldsymbol{x})} = \sigma^d\, e^{-\frac{1}{2}\sum_{i=1}^{d} x_i^2 (1 - \sigma^{-2})} \le \sigma^d = M $$

$$ \Pr\left(\text{Proposal Accepted}\right) = \frac{Z}{M \int_{\mathcal{X}} q^*(\boldsymbol{y})\, d\boldsymbol{y}} = \frac{1}{\sigma^d} \to 0 \quad \text{when } d \to \infty $$

• The efficiency of RS decreases exponentially with dimensionality (a numerical illustration follows below).
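A short Python check of this $\sigma^{-d}$ decay (my own illustration): run rejection sampling of $\mathcal{N}(\boldsymbol{0},\boldsymbol{I}_d)$ with the $\mathcal{N}(\boldsymbol{0},\sigma^2\boldsymbol{I}_d)$ proposal and $M = \sigma^d$, and compare the empirical acceptance rate with $\sigma^{-d}$.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma = 1.2

def acceptance_rate(d, n_trials=200_000):
    """Empirical acceptance rate: target N(0, I_d), proposal N(0, sigma^2 I_d), M = sigma^d."""
    y = rng.normal(0.0, sigma, size=(n_trials, d))
    # log[pi(y) / (M q(y))] simplifies to -0.5 * sum_i y_i^2 * (1 - sigma^{-2});
    # the d*log(sigma) terms from q and from M = sigma^d cancel exactly.
    log_ratio = -0.5 * (y ** 2).sum(axis=1) * (1.0 - 1.0 / sigma ** 2)
    u = rng.uniform(size=n_trials)
    return float((np.log(u) <= log_ratio).mean())

for d in [1, 2, 5, 10, 20, 50]:
    print(d, "empirical:", acceptance_rate(d), "theory sigma^-d:", sigma ** (-d))
```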



Sampling From an Arbitrary PDF

• Inverting the CDF:

  • Not practical in high dimensions

  • Often the CDF is not known (e.g. when the density is known only up to a normalizing factor)

• Rejection sampling:

  • It only requires $\pi$ or $q$ to be known up to a normalizing constant

  • We need to have a proposal density $q^*(x)$ from which we can draw samples easily. This is often not feasible in high dimensions.

  • We need to find a bounding constant

$$ M \ge \frac{\pi^*(x)}{q^*(x)}, \quad \forall x $$

  • How do you construct $q^*(x)$ automatically?



Envelope Rejection Method

• Assume we have:

$$ q_L^*(x) \le \pi^*(x) \le M' q^*(x) $$

• We can modify the accept/reject algorithm as follows (a cheap-to-evaluate lower envelope $q_L^*$ lets step II accept without evaluating $\pi^*$):

  I. Sample $Y \sim q$ and $u \sim \mathcal{U}(0,1)$.

  II. If $u \le \dfrac{q_L^*(Y)}{M' q^*(Y)}$, then return $Y$;

  III. Otherwise, accept $Y$ if $u \le \dfrac{\pi^*(Y)}{M' q^*(Y)}$; otherwise return to step I.
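A minimal Python sketch of steps I-III (my own illustration). The target, proposal, bound and lower envelope are hypothetical choices: target $\pi^*(x) = e^{-x^2/2}$, Cauchy proposal $q^*(x) = 1/(1+x^2)$ with $M' = 1.3$, and the simple lower envelope $q_L^*(x) = \max(0, 1 - x^2/2) \le e^{-x^2/2}$, which avoids calling the (pretend-expensive) target for most accepted draws.

```python
import numpy as np

rng = np.random.default_rng(6)

def pi_star(x):                  # unnormalized target (imagine this is expensive)
    return np.exp(-0.5 * x * x)

def q_star(x):                   # unnormalized Cauchy proposal
    return 1.0 / (1.0 + x * x)

def q_lower(x):                  # lower envelope: 1 - t <= exp(-t) with t = x^2/2
    return max(0.0, 1.0 - 0.5 * x * x)

M_prime = 1.3                    # pi*(x) <= M' q*(x) for all x

def envelope_rejection(n):
    out, target_evals = [], 0
    while len(out) < n:
        y = rng.standard_cauchy()                    # step I
        u = rng.uniform()
        if u <= q_lower(y) / (M_prime * q_star(y)):  # step II: squeeze accept
            out.append(y)
        else:                                        # step III: full test
            target_evals += 1
            if u <= pi_star(y) / (M_prime * q_star(y)):
                out.append(y)
    return np.array(out), target_evals

x, evals = envelope_rejection(20_000)
print("mean/std (should be ~0, ~1):", x.mean(), x.std())
print("target evaluations per accepted sample:", evals / len(x))
```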



Log-Concave Densities

• Consider the class of univariate log-concave densities, i.e. we have:

$$ \frac{\partial^2 \log \pi(x)}{\partial x^2} \le 0, \qquad \pi(x) = \frac{f(x)}{\int_{\mathcal{X}} f(x)\, dx} $$

• The idea is to construct automatically piecewise linear upper and lower bounds for the (log of the) target PDF. Let $h(x) = \log f(x)$.

  1. Define the line $L_{i,i+1}$ through $\big(x_i, h(x_i)\big)$ and $\big(x_{i+1}, h(x_{i+1})\big)$, as shown in the figure.

  2. Define, on $[x_i, x_{i+1}] \subset [x_0, x_{n+1}]$,

$$ \overline{h}_n(x) = \min\{L_{i-1,i}(x),\, L_{i+1,i+2}(x)\}, \qquad \underline{h}_n(x) = L_{i,i+1}(x), $$

and on $[x_0, x_{n+1}]^c$: $\underline{h}_n(x) = -\infty$ and $\overline{h}_n(x) = \min\{L_{0,1}(x),\, L_{n,n+1}(x)\}$, such that

$$ \underline{h}_n(x) \le h(x) = \log f(x) \le \overline{h}_n(x). $$
Log-Concave Densities

• Consider the class of univariate log-concave densities, i.e. we have:

$$ \frac{\partial^2 \log \pi(x)}{\partial x^2} \le 0, \qquad \pi(x) = \frac{f(x)}{\int_{\mathcal{X}} f(x)\, dx} $$

• The idea is to construct automatically piecewise linear upper and lower bounds for the target PDF:

$$ \underline{h}_n(x) \le h(x) = \log f(x) \le \overline{h}_n(x) \;\Rightarrow\; \underline{f}_n(x) = e^{\underline{h}_n(x)} \le f(x) \le e^{\overline{h}_n(x)} = \overline{f}_n(x), $$

$$ \underline{f}_n(x) \le f(x) \le \overline{f}_n(x) = w_n\, g_n(x), $$

where $w_n$ is the normalization constant of $\overline{f}_n(x)$ and $g_n(x)$ is a (piecewise exponential) density that is easy to sample from.



Adaptive Rejection Sampling

• Let $S_n$ be a set of points $x_i$, $i = 0, 1, \ldots, n+1$. Initialize $n = 0$ and $S_0$.

• At iteration $n \ge 1$:

  • Generate $Y \sim g_n(x)$, $u \sim \mathcal{U}[0,1]$

  • If $u \le \dfrac{\underline{f}_n(Y)}{w_n\, g_n(Y)}$, then accept $Y$;   // Squeezing test

  • Otherwise, if $u \le \dfrac{f(Y)}{w_n\, g_n(Y)}$, then return $Y$;   // Rejection step

  • Otherwise, update $S_{n+1} = S_n \cup \{Y\}$.

Gilks, WR, Wild, P, "Adaptive rejection sampling for Gibbs sampling“ Applied Statistics, Vol. 41 (1992), pp. 337-348. Get it from
JSTOR



Adaptive Rejection Sampling: Example

• Consider $n$ data points $(x_i, Y_i)$ with integer-valued $Y_i$, where

$$ Y_i \mid x_i \sim \mathcal{P}oisson\big(\exp(a + b x_i)\big) = \frac{e^{-e^{a + b x_i}}\, \big(e^{a + b x_i}\big)^{y_i}}{y_i!} $$

• Consider as the prior the following:

$$ \pi(a, b) = \mathcal{N}(a; 0, \sigma^2)\, \mathcal{N}(b; 0, \sigma^2) $$

• We have for the posterior:

$$ \pi(a, b \mid x_{1:n}, y_{1:n}) \propto \exp\left(a \sum_i y_i + b \sum_i y_i x_i - e^a \sum_i e^{x_i b} - \frac{a^2}{2\sigma^2} - \frac{b^2}{2\sigma^2}\right) $$

$$ \text{Full Conditional:} \quad \log \pi(a \mid x_{1:n}, y_{1:n}, b) = a \sum_i y_i - e^a \sum_i e^{x_i b} - \frac{a^2}{2\sigma^2} + \big(\text{non-}a\text{-dependent terms}\big) $$

$$ \frac{\partial^2 \log \pi(a \mid x_{1:n}, y_{1:n}, b)}{\partial a^2} = -e^a \sum_i e^{x_i b} - \sigma^{-2} < 0 $$

• Thus $\pi(a \mid x_{1:n}, y_{1:n}, b)$ and similarly $\pi(b \mid x_{1:n}, y_{1:n}, a)$ are log-concave, and adaptive rejection sampling can be applied.
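A quick numerical sanity check of this log-concavity in Python (synthetic data and hypothetical values of $b$ and $\sigma$, purely for illustration): evaluate the full conditional's second derivative in $a$ on a grid and confirm it is negative everywhere.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic covariates and counts (hypothetical values, just for the check).
x = rng.uniform(-1.0, 1.0, size=50)
y = rng.poisson(np.exp(0.5 + 0.8 * x))
b, sigma = 0.8, 10.0

def log_cond_a(a):
    # log pi(a | x, y, b) up to an additive constant.
    return a * y.sum() - np.exp(a) * np.exp(x * b).sum() - a**2 / (2 * sigma**2)

def d2_log_cond_a(a):
    # Analytic second derivative: -e^a * sum_i e^{x_i b} - 1/sigma^2.
    return -np.exp(a) * np.exp(x * b).sum() - 1.0 / sigma**2

grid = np.linspace(-3.0, 3.0, 61)
assert np.all(d2_log_cond_a(grid) < 0)          # log-concave on the grid

# Cross-check the analytic curvature against a finite-difference estimate.
h = 1e-4
fd = (log_cond_a(grid + h) - 2 * log_cond_a(grid) + log_cond_a(grid - h)) / h**2
print("max abs difference analytic vs finite-difference:",
      np.abs(fd - d2_log_cond_a(grid)).max())
```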



Monahan's Accept/Reject Method

• Assume that we want to sample from a CDF of the form:

$$ F(x) = P(X \le x) = \frac{H\big(G(x)\big)}{H(1)} $$

• Here $G(x)$ is a given CDF and

$$ H(x) = \sum_{n \ge 1} (-1)^{n-1} a_n x^n, \qquad \text{with } 1 \ge a_1 \ge a_2 \ge \ldots \ge 0 $$

• We want to achieve this by taking samples only from $G$ and $\mathcal{U}[0,1]$.

J. F. Monahan, Extension of von Neumann's method for generating random variables, Mathematics of Computation, 33(147) (1979) 1065-1069.



Monahan's Accept/Reject Method

• For example, let

$$ F(x) \;=\; \frac{\dfrac{\pi^2}{8}x^2 - \dfrac{\pi^4}{384}x^4 + \cdots + (-1)^{i+1}\dfrac{\pi^{2i}}{2^{2i}(2i)!}\,x^{2i} + \cdots}{\dfrac{\pi^2}{8} - \dfrac{\pi^4}{384} + \cdots + (-1)^{i+1}\dfrac{\pi^{2i}}{2^{2i}(2i)!} + \cdots} \;=\; 1 - \cos\frac{\pi x}{2}, \qquad 0 \le x \le 1. $$

• To derive this, note that

$$ \cos\frac{\pi x}{2} = \sum_{i=0}^{\infty} (-1)^i \frac{(\pi x/2)^{2i}}{(2i)!} \;\Rightarrow\; 1 - \cos\frac{\pi x}{2} = \sum_{i=1}^{\infty} (-1)^{i+1}\frac{\pi^{2i}}{2^{2i}(2i)!}\,\big(x^2\big)^i, $$

and that the denominator equals $1 - \cos\dfrac{\pi}{2} = 1$ (use $x = 1$ in the numerator series for the denominator).

• Dividing numerator and denominator by the leading coefficient $\pi^2/8$, so that $a_1 = 1$:

$$ \text{Thus:} \quad G(x) = x^2, \qquad H(x) = x - \frac{\pi^2}{48}x^2 + \cdots + (-1)^{i+1}\frac{\pi^{2i-2}}{2^{2i-3}(2i)!}\,x^i + \cdots $$
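A short Python check (my own illustration) that the truncated series $H(G(x))/H(1)$ with these coefficients indeed reproduces $F(x) = 1 - \cos(\pi x / 2)$, and that the ratios $a_{i+1}/a_i$ used by the algorithm below are valid probabilities.

```python
import numpy as np
from math import factorial, pi

def a(i):
    # Coefficients a_i = pi^(2i-2) / (2^(2i-3) * (2i)!), so that a_1 = 1.
    return pi ** (2 * i - 2) / (2 ** (2 * i - 3) * factorial(2 * i))

def H(u, n_terms=20):
    # Alternating series H(u) = sum_{i>=1} (-1)^(i-1) a_i u^i.
    return sum((-1) ** (i - 1) * a(i) * u ** i for i in range(1, n_terms + 1))

x = np.linspace(0.0, 1.0, 11)
approx = H(x ** 2) / H(1.0)           # H(G(x)) / H(1) with G(x) = x^2
exact = 1.0 - np.cos(pi * x / 2.0)
print(np.abs(approx - exact).max())   # should be at machine-precision level

# The ratios a_{i+1}/a_i are decreasing probabilities (<= 1):
print([a(i + 1) / a(i) for i in range(1, 5)])
```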
Monahan's Accept/Reject Method

$$ F(x) = P(X \le x) = \frac{H\big(G(x)\big)}{H(1)}, \quad G \text{ a CDF}, \quad H(x) = \sum_{n \ge 1} (-1)^{n-1} a_n x^n, \ \text{ s.t. } 1 \ge a_1 \ge a_2 \ge \ldots \ge 0 $$

• The sampling algorithm is as follows:

Repeat
  • Generate $X \sim G$ and set $K \leftarrow 1$
  • Repeat
    • Generate $U \sim G$ and $V \sim \mathcal{U}[0,1]$
    • If $U \le X$ and $V \le \dfrac{a_{K+1}}{a_K}$, then $K \leftarrow K + 1$; otherwise stop.
Until $K$ odd; return $X$


Monahan's Accept/Reject Method

• We define the event

$$ A_n : \; X = \max\{X, U_1, U_2, \ldots, U_n\} \ \text{ and } \ Z_1 = Z_2 = \cdots = Z_n = 1 $$

• The $U_i$'s are the random variables generated in the inner loop of the algorithm, and the $Z_i$'s are Bernoulli random variables equal to consecutive values of the test $V \le \dfrac{a_{K+1}}{a_K}$.

• We can show:

$$ P(X \le x, A_n) = a_n G(x)^n $$

• Notice that (the CDF of the max function was derived in the previous lecture):

$$ P(X \le x, A_n) = \big[1 \cdot G(x)\big]\left[\frac{a_2}{a_1}\, G(x)\right]\cdots\left[\frac{a_n}{a_{n-1}}\, G(x)\right] = a_n G(x)^n \qquad (\text{note } a_1 = 1) $$

• Using $A_{n+1} \subset A_n$, we can now show that:

$$ P\big(X \le x, A_n, A_{n+1}^c\big) = P(X \le x, A_n) - P(X \le x, A_{n+1}) = a_n G(x)^n - a_{n+1} G(x)^{n+1} $$
Monahan's Accept/Reject Method

• The probability that $X$ is accepted is:

$$ P(\text{Accept } X) = P(K \text{ odd}) = \sum_{n \ge 1} a_n (-1)^{n-1} = H(1) $$

• This can be shown simply using $P\big(X \le \infty, A_n, A_{n+1}^c\big) = a_n G(\infty)^n - a_{n+1} G(\infty)^{n+1} = a_n - a_{n+1}$:

$$ P(\text{Accept } X) = P(K \text{ odd}) = (a_1 - a_2) + (a_3 - a_4) + \cdots = \sum_{n \ge 1} a_n (-1)^{n-1} = H(1) $$

• The returned $X$ then has distribution:

$$ F(x) = P(X \le x \mid X \text{ returned}) = \frac{P(X \le x, X \text{ returned})}{P(X \text{ returned})} $$



Monahan's Accept/Reject Method

• The returned $X$ then has distribution:

$$ P(X \le x \mid X \text{ returned}) = \frac{P(X \le x, X \text{ returned})}{P(X \text{ returned})} = \frac{\sum_{n = 1,3,5,\ldots} P\big(X \le x, A_n, A_{n+1}^c\big)}{P(X \text{ returned})} $$

$$ = \frac{\big[a_1 G(x) - a_2 G(x)^2\big] + \big[a_3 G(x)^3 - a_4 G(x)^4\big] + \cdots}{H(1)} = \frac{\sum_{n \ge 1} a_n G(x)^n (-1)^{n-1}}{H(1)} = \frac{H\big(G(x)\big)}{H(1)} $$

$$ F(x) = P(X \le x \mid X \text{ returned}) = \frac{H\big(G(x)\big)}{H(1)} $$

