Introduction To Markov Chain Monte Carlo (MCMC) and Its Role in Modern Bayesian Analysis
Phil Gregory
March 2010
Outline
1. Bayesian primer
5. Conclusions
What is Bayesian Probability Theory (BPT)?
Product rule: p(A,B|C) = p(A|C) p(B|A,C)
                       = p(B|C) p(A|B,C)
Bayes' theorem: p(A|B,C) = p(A|C) p(B|A,C) / p(B|C)
p(Hi|D,I) = p(Hi|I) p(D|Hi,I) / p(D|I)
The left-hand side is the posterior probability that Hi is true, given the new data D and prior information I; the denominator p(D|I) is the normalizing constant.
Every item to the right of the
vertical bar | is assumed to be true
The likelihood p(D|Hi,I), also written as L(Hi), stands for the probability that we would have obtained the data D that we did, if Hi is true.
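To make the notation concrete, here is a minimal numerical sketch (not from the talk; the priors and likelihood values are invented for illustration) of Bayes' theorem applied to a discrete hypothesis space:

```python
import numpy as np

# Illustrative numbers only: two hypotheses with assumed priors p(Hi|I)
# and assumed likelihoods p(D|Hi,I).
prior = np.array([0.5, 0.5])
likelihood = np.array([0.01, 0.08])

# Bayes' theorem: posterior proportional to prior * likelihood,
# normalized by p(D|I) = sum_i p(Hi|I) p(D|Hi,I).
evidence = np.sum(prior * likelihood)          # p(D|I)
posterior = prior * likelihood / evidence      # p(Hi|D,I)

print(posterior)                               # [0.111..., 0.888...]
```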
outline
As a theory of extended logic, BPT can be used to find optimal answers to well-posed scientific questions for a given state of knowledge, in contrast to a numerical recipe approach.
Two basic problems
1. Model selection (discrete hypothesis space)
“Which one of 2 or more models (hypotheses) is most probable
given our current state of knowledge?”
e.g.
• Hypothesis or model M0 asserts that the star has no planets.
• Hypothesis M1 asserts that the star has 1 planet.
• Hypothesis Mi asserts that the star has i planets.
[Figure: probability scale from 0 (false) to 1 (true); the realm of science and inductive logic lies between these two deductive extremes.]
2. Parameter estimation (continuous hypothesis space)
Calculation of a simple likelihood p(D|M,X,I)
Let d_i represent the i-th measured data value. We model d_i by
d_i = f_i(X) + e_i,
where f_i(X) is the model prediction for the i-th data value for the current choice of parameters X, and e_i represents the error component in the measurement.
[Figure: Gaussian probability density p(D_i|M,X,I) centered on the model prediction f_i(X); the probability of datum d_i is proportional to the height of the curve at a distance e_i from the prediction.]
For independent Gaussian errors the likelihood is
p(D|M,X,I) = (2π)^(-N/2) [Π_{i=1}^{N} σ_i]^(-1) exp[ -Σ_{i=1}^{N} (d_i - f_i(X))² / (2σ_i²) ]
The exponent is -χ²/2, where
χ² = Σ_{i=1}^{N} (d_i - f_i(X))² / σ_i²
is the familiar χ² statistic used in least-squares.
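A minimal sketch of how this likelihood might be evaluated in practice; the function and argument names are assumptions, not part of the talk:

```python
import numpy as np

def log_likelihood(X, d, sigma, model):
    """Gaussian log-likelihood ln p(D|M,X,I) for independent errors.

    X     : model parameters (passed through to `model`)
    d     : measured data values d_i
    sigma : known noise standard deviations sigma_i (same length as d)
    model : function returning the predictions f_i(X)
    """
    f = model(X)
    chi2 = np.sum((d - f) ** 2 / sigma ** 2)      # the familiar chi-squared
    norm = -0.5 * len(d) * np.log(2 * np.pi) - np.sum(np.log(sigma))
    return norm - 0.5 * chi2
```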
Prior for a scale parameter P:
p(P|M,I) dP = dP / [P ln(P_max/P_min)]
p(ln P|M,I) d ln P = d ln P / ln(P_max/P_min)
or equivalently, for a frequency parameter f,
p(ln f|M,I) d ln f = d ln f / ln(f_max/f_min)
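A short illustrative sketch of evaluating and sampling such a scale-invariant prior; the bounds P_min and P_max below are assumed placeholder values:

```python
import numpy as np

P_min, P_max = 1.0, 1.0e4            # assumed prior range for the scale parameter P
log_range = np.log(P_max / P_min)

def log_prior(P):
    """ln p(P|M,I) for the prior dP / [P ln(P_max/P_min)]."""
    if P_min <= P <= P_max:
        return -np.log(P) - np.log(log_range)
    return -np.inf                    # zero probability outside the prior range

# Drawing samples: uniform in ln P is equivalent to this prior on P.
rng = np.random.default_rng(1)
P_samples = np.exp(rng.uniform(np.log(P_min), np.log(P_max), size=1000))
```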
Data
To test this prediction, a new spectrometer was mounted on the James
Clerk Maxwell telescope on Mauna Kea and the spectrum shown below
was obtained. The spectrometer has 64 frequency channels.
Questions of interest
p(D|M1,T,I) = (2π)^(-N/2) σ^(-N) exp[ -Σ_{i=1}^{N} (d_i - T f_i)² / (2σ²) ]
Simple nonlinear model with a single parameter α
The Bayesian posterior density for a nonlinear model with single parameter,
α, for 4 simulated data sets of different size ranging from N = 5 to N = 80.
The N = 5 case has the broadest distribution and exhibits 4 maxima.
Asymptotic theory says that the maximum likelihood estimator becomes more unbiased, more normally distributed, and of smaller variance as the sample size becomes larger.
Integration not minimization
p(T|D,M1,I) = ∫dν0 ∫dσ_L ∫ds p(T, ν0, σ_L, s|D, M1, I)
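With MCMC samples of the joint posterior in hand, this marginalization reduces to simply ignoring the nuisance-parameter columns of the chain. A minimal sketch, with the file name and column ordering assumed for illustration:

```python
import numpy as np

# Hypothetical array of post-burn-in MCMC samples of the joint posterior,
# one row per iteration; columns assumed ordered as (T, nu0, sigma_L, s).
chain = np.load("chain.npy")

T_samples = chain[:, 0]     # marginalizing = ignoring the other columns

# Histogram estimate of the marginal posterior p(T|D,M1,I),
# plus a summary mean and 68% credible interval.
density, edges = np.histogram(T_samples, bins=50, density=True)
T_mean = T_samples.mean()
T_68 = np.percentile(T_samples, [16, 84])
```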
Chapters
1. Role of probability theory in science
2. Probability theory as extended logic
3. The how-to of Bayesian inference
4. Assigning probabilities
5. Frequentist statistical inference
6. What is a statistic?
7. Frequentist hypothesis testing
8. Maximum entropy probabilities
9. Bayesian inference (Gaussian errors)
10. Linear model fitting (Gaussian errors)
11. Nonlinear model fitting
12. Markov chain Monte Carlo
13. Bayesian spectral analysis
14. Bayesian inference (Poisson sampling)
The MCMC employs a Markov chain random walk, whereby the new sample in parameter space, designated X_{t+1}, depends on the previous sample X_t according to an entity called the transition probability or kernel, p(X_{t+1}|X_t). The transition kernel is assumed to be time-independent.
Compute the Metropolis-Hastings acceptance ratio
r = [p(Y|D,I) q(X_t|Y)] / [p(X_t|D,I) q(Y|X_t)]
and draw a uniform random number U in (0,1).
- If U ≤ r, then set X_{t+1} = Y; otherwise set X_{t+1} = X_t.
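A minimal sketch of one such step in Python, assuming a symmetric Gaussian proposal (so the q terms cancel) and a user-supplied log_posterior function; the names and step size are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_step(X_t, log_posterior, step=0.1):
    """One Metropolis step with a symmetric Gaussian proposal, so q(X_t|Y)/q(Y|X_t) = 1."""
    Y = X_t + step * rng.normal(size=X_t.shape)      # proposal Y ~ q(Y|X_t)
    log_r = log_posterior(Y) - log_posterior(X_t)    # ln r (proposal terms cancel)
    if np.log(rng.uniform()) <= log_r:               # U <= r, tested in log space
        return Y                                     # accept: X_{t+1} = Y
    return X_t                                       # reject: X_{t+1} = X_t
```

Iterating this step produces the chain {X_t}; after burn-in, the samples are distributed according to the target posterior.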
[Figure: MCMC parameter samples (P1 versus P2) for a Kepler model with 2 planets.]
Target posterior: p({X_α}|D,M,I)
Related algorithms:
• Parallel tempering
• Simulated annealing
• Genetic algorithm
• Differential evolution
• Quasi-Monte Carlo
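Parallel tempering, which the conclusions below rely on, runs several chains against tempered versions of the posterior and occasionally swaps their states. A compact sketch under simplifying assumptions (one scalar parameter per chain, an assumed β ladder, user-supplied log_likelihood and log_prior):

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.array([1.0, 0.5, 0.25, 0.1])   # assumed tempering ladder (beta = 1 is the target)

def pt_sweep(X, log_likelihood, log_prior, step=0.1):
    """One parallel-tempering sweep for scalar states X[j], one per beta.

    Chain j targets the tempered posterior  prior(X) * likelihood(X)**beta_j.
    """
    # Within-chain Metropolis updates.
    for j, beta in enumerate(betas):
        Y = X[j] + step * rng.normal()
        log_r = (log_prior(Y) + beta * log_likelihood(Y)
                 - log_prior(X[j]) - beta * log_likelihood(X[j]))
        if np.log(rng.uniform()) <= log_r:
            X[j] = Y
    # Propose a swap between a random pair of adjacent chains.
    j = rng.integers(len(betas) - 1)
    log_r = (betas[j] - betas[j + 1]) * (log_likelihood(X[j + 1]) - log_likelihood(X[j]))
    if np.log(rng.uniform()) <= log_r:
        X[j], X[j + 1] = X[j + 1], X[j]
    return X
```

Chains at small β explore the parameter space broadly while the β = 1 chain samples the full posterior; the swaps let the cold chain escape local maxima.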
Calculation of p(D|M0,I)
Model M0 assumes the spectrum is consistent with noise and has no free parameters, so we can write
p(D|M0,s,I) = (2π)^(-N/2) (σ² + s²)^(-N/2) exp[ -Σ_{i=1}^{N} (d_i - 0)² / (2(σ² + s²)) ]
Model parameters:
{v (km s⁻¹), FWHM (km s⁻¹), T_J (K), (N/Z)_A (cm⁻²), (N/Z)_A (cm⁻²), T_K (K), ν_UL (MHz), FWHM_UL (km s⁻¹), T_UL (K), ds96, ds242, s (K)}
Conclusions
1. For Bayesian parameter estimation, MCMC provides a powerful means of computing the integrals required to obtain the posterior probability density function (PDF) for each model parameter.
2. Even though we demonstrated the performance of an MCMC for a
simple spectral line problem with only 4 parameters, MCMC
techniques are really most competitive for models with a much larger
number of parameters m ≥ 15.
3. Markov chain Monte Carlo analysis produces samples in model
parameter space in proportion to the posterior probability distribution.
This is fine for parameter estimation.
For model selection we need to determine the proportionality constant in order to evaluate the marginal likelihood p(D|Mi,I) for each model. This is a much more difficult problem, still in search of two good solutions for large m; we need two to know whether either is valid.
One solution is to use the MCMC results from all the parallel tempering chains spanning a wide range of β values; however, this becomes computationally very intensive for m > 17.
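A hedged sketch of that idea via thermodynamic integration, ln p(D|M,I) = ∫₀¹ ⟨ln L⟩_β dβ, approximated from the per-chain average log-likelihoods; the β ladder and averages below are placeholder values, not results from the talk:

```python
import numpy as np

# betas    : inverse temperatures of the parallel tempering chains (0 < beta <= 1)
# mean_lnL : post-burn-in average of ln p(D|X,M,I) within each chain
# Both would come from the actual MCMC run; the values below are placeholders only.
betas = np.array([1e-4, 1e-3, 1e-2, 0.1, 0.3, 0.6, 1.0])
mean_lnL = np.array([-260.0, -240.0, -190.0, -150.0, -135.0, -128.0, -125.0])

# Thermodynamic integration: ln p(D|M,I) = integral_0^1 <ln L>_beta d(beta),
# approximated by the trapezoidal rule over the beta ladder.
ln_marginal_likelihood = np.trapz(mean_lnL, betas)
print(ln_marginal_likelihood)
```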
For a copy of this talk please Google Phil Gregory
Let θ_ij represent the i-th iteration of the j-th of m independent simulations. Extract the last h post-burn-in iterations from each simulation.
Mean within-chain variance:
W = [1 / (m(h-1))] Σ_{j=1}^{m} Σ_{i=1}^{h} (θ_ij - θ̄_j)²
Between-chain variance:
B = [h / (m-1)] Σ_{j=1}^{m} (θ̄_j - θ̿)²
Estimated variance:
V̂(θ) = (1 - 1/h) W + (1/h) B
Gelman-Rubin statistic:
R̂ = sqrt( V̂(θ) / W )
The Gelman-Rubin statistic should be close to 1.0 (e.g., < 1.05).
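A small sketch of this computation for one parameter, assuming `chains` is an (m, h) array of post-burn-in samples from the m simulations:

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin statistic for an (m, h) array: m chains, h post-burn-in draws each."""
    m, h = chains.shape
    chain_means = chains.mean(axis=1)                 # theta_bar_j
    grand_mean = chain_means.mean()                   # overall mean of the chain means

    W = np.sum((chains - chain_means[:, None]) ** 2) / (m * (h - 1))   # within-chain variance
    B = h * np.sum((chain_means - grand_mean) ** 2) / (m - 1)          # between-chain variance

    V_hat = (1.0 - 1.0 / h) * W + B / h               # estimated variance V_hat(theta)
    return np.sqrt(V_hat / W)                         # should be close to 1.0 (e.g., < 1.05)
```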