Bayesian Inference: A Practical Primer
Tom Loredo
Department of Astronomy, Cornell University
loredo@spacenet.tn.cornell.edu
http://www.astro.cornell.edu/staff/loredo/bayes/
Outline
• Parametric Bayesian inference
– Probability theory
– Parameter estimation
– Model uncertainty
• Bayesian calculation
– Asymptotics: Laplace approximations
– Quadrature
– Posterior sampling and MCMC
Bayesian Statistical Inference:
Quantifying Uncertainty
[Figure: inference concerns a quantity x with a single, uncertain value; statistical inference quantifies that uncertainty.]
Relationships between probability and frequency were
demonstrated mathematically (laws of large numbers,
Bayes's theorem).

Frequentist view: Probability ≡ frequency.
Interpreting Abstract Probabilities
Symmetry/Invariance/Counting
• Resolve possibilities into equally plausible “microstates”
using symmetries
• Count microstates in each possibility
Probability ≠ Frequency!
Bayesian Probability:
A Thermal Analogy
Intuitive notion → Quantification → Calibration
Bayes’s Theorem:
$$P(H_i|D,I) = \frac{P(H_i|I)\,P(D|H_i,I)}{P(D|I)}$$
posterior ∝ prior × likelihood
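A minimal numerical illustration of the update, with hypothetical numbers (two exclusive, exhaustive hypotheses):

```python
# Bayes's theorem for two exclusive, exhaustive hypotheses H0, H1.
# The prior and likelihood values are hypothetical, chosen only to
# illustrate posterior ∝ prior × likelihood.
prior = {"H0": 0.5, "H1": 0.5}          # P(Hi | I)
likelihood = {"H0": 0.2, "H1": 0.8}     # P(D | Hi, I)

# P(D | I) by the law of total probability
evidence = sum(prior[h] * likelihood[h] for h in prior)

# P(Hi | D, I) = P(Hi | I) P(D | Hi, I) / P(D | I)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
print(posterior)  # {'H0': 0.2, 'H1': 0.8}
```

The denominator P(D|I) is just the normalization, which is why the proportionality statement suffices in practice.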
Marginalization:
Note that for exclusive, exhaustive $\{B_i\}$,
$$\sum_i P(A, B_i|I) = \sum_i P(B_i|A,I)\,P(A|I) = P(A|I)$$
$$\qquad\qquad = \sum_i P(B_i|I)\,P(A|B_i,I)$$
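A quick check of the identity with a hypothetical two-member partition:

```python
# Marginalization over an exclusive, exhaustive partition {B1, B2}.
# The probabilities are hypothetical, chosen only to illustrate
# P(A|I) = sum_i P(B_i|I) P(A|B_i, I).
p_B = [0.3, 0.7]           # P(B_i | I), sums to 1
p_A_given_B = [0.9, 0.4]   # P(A | B_i, I)

p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
print(p_A)  # 0.3*0.9 + 0.7*0.4 = 0.55
```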
Parameter Estimation
I = Model M with parameters θ (+ any add’l info)
Hi = statements about θ; e.g. “θ ∈ [2.5, 3.5],” or “θ > 0”
Probability for any such statement can be found using a
probability density function (PDF) for θ:
$$P(\theta \in [\theta, \theta+d\theta]\,|\cdots) = f(\theta)\,d\theta \equiv p(\theta|\cdots)\,d\theta$$
Summaries of posterior:
• “Best fit” values: mode, posterior mean
• Uncertainties: Credible regions
• Marginal distributions:
– Interesting parameters ψ, nuisance parameters φ
– Marginal dist'n for ψ:
$$p(\psi|D,M) = \int d\phi\; p(\psi,\phi|D,M)$$
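A sketch of marginalization by brute-force gridding; the correlated two-parameter Gaussian posterior below is an illustrative assumption, not from the text:

```python
import numpy as np

# Marginalize a two-parameter posterior p(psi, phi | D, M) over the
# nuisance parameter phi by direct summation on a grid.
# The correlated-Gaussian joint is a hypothetical example.
psi = np.linspace(-5.0, 5.0, 201)
phi = np.linspace(-5.0, 5.0, 201)
dpsi, dphi = psi[1] - psi[0], phi[1] - phi[0]

PSI, PHI = np.meshgrid(psi, phi, indexing="ij")
joint = np.exp(-0.5 * (PSI**2 - PSI * PHI + PHI**2))  # unnormalized
joint /= joint.sum() * dpsi * dphi                    # normalize

marginal = joint.sum(axis=1) * dphi   # ∫ dphi p(psi, phi | D, M)
total = (marginal * dpsi).sum()
print(total)  # ≈ 1.0: the marginal is itself normalized
```

Grids work fine here (m = 2); the quadrature and MCMC methods later in the outline exist precisely because this approach fails in high dimensions.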
$H_i$ = statements about $\psi$

The model likelihood is the averaged parameter likelihood:
$$L(M_i) = \langle L(\theta_i)\rangle$$
[Figure: predictive distributions $P(D|H)$ for a simple vs. a complicated hypothesis, compared at $D_{obs}$.]

[Figure: likelihood of width $\delta\theta$ under a prior of width $\Delta\theta$; the Ockham factor is roughly $\delta\theta/\Delta\theta$.]
$$p(D|M_i) = \int d\theta_i\; p(\theta_i|M_i)\,L(\theta_i)$$
– Parameter estimation:
$$p(\theta|D,M) = \frac{p(\theta|M)\,L(\theta)}{\int d\theta\; p(\theta|M)\,L(\theta)}$$
– Model Comparison:
$$O \propto \frac{\int d\theta_1\; p(\theta_1|M_1)\,L(\theta_1)}{\int d\theta_2\; p(\theta_2|M_2)\,L(\theta_2)}$$
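The odds ratio can be sketched numerically. The flat priors, Gaussian likelihood (peak at 1.0, width 0.1), and parameter ranges below are hypothetical, chosen to expose the Ockham penalty a needlessly broad prior pays:

```python
import numpy as np

# Evidence p(D|Mi) = ∫ dθ p(θ|Mi) L(θ) on a grid, then the odds.
# Hypothetical setup: M1 predicts θ ∈ [0, 2]; M2 hedges with θ ∈ [-10, 10].
def evidence(theta_lo, theta_hi, n=20001):
    theta = np.linspace(theta_lo, theta_hi, n)
    prior = 1.0 / (theta_hi - theta_lo)                # flat p(θ|M)
    like = np.exp(-0.5 * ((theta - 1.0) / 0.1) ** 2)   # L(θ), width 0.1
    return np.sum(prior * like) * (theta[1] - theta[0])

B12 = evidence(0.0, 2.0) / evidence(-10.0, 10.0)
print(B12)  # ≈ 10: the ratio of prior ranges, since both cover the peak
```

Both models fit the data equally well at their best; the factor of ten is purely M2 spreading its prior over parameter values the data rule out.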
• Evaluate $S(D_{obs})$; decide whether to reject $H_0$ based on, e.g., $\int_{S > S_{obs}} dS\; p(S|H_0)$
Crucial Distinctions
The role of subjectivity:
BI exchanges (implicit) subjectivity in the choice of null
& statistic for (explicit) subjectivity in the specification
of alternatives.
BI (Bayesian inference) is a problem-solving approach.
FS (frequentist statistics) is a solution-characterization approach.
Infer $\mu$: $x_i = \mu + \epsilon_i$;
$$p(x_i|\mu,M) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]$$
[Figure: joint sampling distribution $p(x_1, x_2|\mu)$ in the $(x_1, x_2)$ plane.]
68% confidence region: $\bar{x} \pm \sigma/\sqrt{N}$
Infer $\mu$: flat prior;
$$L(\mu) \propto \exp\left[-\frac{(\bar{x}-\mu)^2}{2(\sigma/\sqrt{N})^2}\right]$$
[Figure: joint distribution $p(x_1, x_2|\mu)$ and the likelihood $L(\mu)$.]
68% credible region: $\bar{x} \pm \sigma/\sqrt{N}$
$$\frac{\displaystyle\int_{\bar{x}-\sigma/\sqrt{N}}^{\bar{x}+\sigma/\sqrt{N}} d\mu\, \exp\left[-\frac{(\bar{x}-\mu)^2}{2(\sigma/\sqrt{N})^2}\right]}{\displaystyle\int_{-\infty}^{\infty} d\mu\, \exp\left[-\frac{(\bar{x}-\mu)^2}{2(\sigma/\sqrt{N})^2}\right]} \approx 0.683$$
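Substituting $z = (\mu-\bar{x})/(\sigma/\sqrt{N})$ reduces the ratio to the standard-normal probability within ±1, which the error function gives directly:

```python
from math import erf, sqrt

# The credible-region ratio is the standard-normal mass within one
# standard deviation: erf(1/√2), independent of x̄, σ, and N.
frac = erf(1.0 / sqrt(2.0))
print(frac)  # 0.6826894921370859 ≈ 0.683
```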
Frequentist integrals:
$$\int dx_1\, p(x_1|\theta) \int dx_2\, p(x_2|\theta) \cdots \int dx_N\, p(x_N|\theta)\; f(D)$$
Bayesian integrals:
$$\int d^m\theta\; g(\theta)\, p(\theta|M)\, L(\theta)$$
Bayes Factors:
$$\int d\theta\; p(\theta|M)\,L(\theta) \approx p(\hat\theta|M)\,L(\hat\theta)\,(2\pi)^{m/2}\,|I|^{-1/2}$$
where $I = -\left.\dfrac{\partial^2 \ln[p(\theta|M)L(\theta)]}{\partial\theta^2}\right|_{\hat\theta}$, the information matrix.

Marginals:
Profile likelihood $L_p(\theta) \equiv \max_\phi L(\theta,\phi)$
$$\rightarrow\quad p(\theta|D,M) \;\overset{\sim}{\propto}\; L_p(\theta)\,|I(\theta)|^{-1/2}$$
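A one-dimensional sketch of the Laplace approximation. The flat prior on [0, 10] and Gaussian likelihood (mode 4.0, width 0.5) are hypothetical choices for which the approximation is essentially exact, so it can be checked against the analytic integral:

```python
import numpy as np

# Laplace approximation (m = 1): ∫ dθ p(θ|M) L(θ)
#   ≈ p(θ̂|M) L(θ̂) (2π)^{1/2} |I|^{-1/2}
# with I the negative second derivative of ln[p(θ|M)L(θ)] at the mode.
prior_density = 0.1          # hypothetical flat prior: p(θ|M) = 1/10 on [0, 10]
theta_hat, s = 4.0, 0.5      # mode and width of a Gaussian likelihood

def log_q(theta):            # ln[p(θ|M) L(θ)]
    return np.log(prior_density) - 0.5 * ((theta - theta_hat) / s) ** 2

# Information "matrix" by central finite differences at the mode
h = 1e-4
I = -(log_q(theta_hat + h) - 2.0 * log_q(theta_hat) + log_q(theta_hat - h)) / h**2

laplace = np.exp(log_q(theta_hat)) * np.sqrt(2.0 * np.pi) / np.sqrt(I)
exact = prior_density * s * np.sqrt(2.0 * np.pi)   # analytic Gaussian integral
print(laplace, exact)  # the two agree closely
```

For a genuinely non-Gaussian integrand the approximation instead carries a relative error of order 1/N in the sample size.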
Quadrature:
$$\int d\theta\, f(\theta) \approx \sum_i w_i\, f(\theta_i) + O(n^{-2}) \text{ or } O(n^{-4})$$
Monte Carlo:
$$\int d\theta\, g(\theta)\,p(\theta) \approx \frac{1}{n}\sum_{\theta_i \sim p(\theta)} g(\theta_i) + O(n^{-1/2})$$
[$\sim O(n^{-1})$ with quasi-MC]
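The contrasting error rates are easy to see in one dimension. The integrand g(θ) = θ² (exact integral 1/3) and uniform p(θ) below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Error scaling for ∫_0^1 g(θ) dθ: trapezoid quadrature (weights w_i)
# converges at O(n^-2); Monte Carlo with θ_i ~ Uniform(0,1) at O(n^-1/2).
g = lambda t: t**2
exact = 1.0 / 3.0

for n in (100, 10000):
    theta = np.linspace(0.0, 1.0, n)
    h = theta[1] - theta[0]
    # Composite trapezoid rule: half-weight endpoints
    quad = h * (g(theta).sum() - 0.5 * (g(theta[0]) + g(theta[-1])))
    # Monte Carlo average of g at random draws from p(θ)
    mc = g(rng.uniform(0.0, 1.0, n)).mean()
    print(n, abs(quad - exact), abs(mc - exact))
```

The quadrature error shrinks by ~10⁴ when n grows by 100; the MC error shrinks only by ~10. The catch, motivating the adaptive methods below, is that product quadrature grids grow exponentially with dimension while MC does not.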
Subregion-Adaptive Quadrature/MC:
Concentrate points where most of the probability lies
via recursion
Adaptive quadrature: Use a pair of lattice rules (for
error estim’n), subdivide regions w/ large error
• ADAPT (Genz & Malik) at GAMS (gams.nist.gov)
• BAYESPACK (Genz; Genz & Kass)—many methods
Automatic; regularly used up to m ≈ 20
Adaptive Monte Carlo: Build the importance sampler
on-the-fly (e.g., VEGAS, miser in Numerical Recipes)
Rejection Method:
[Figure: candidate points scattered uniformly under a comparison function enveloping $P(\theta)$; points falling below $P(\theta)$ are accepted.]
Hard to find an efficient comparison function if $m \gtrsim 6$.
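A one-dimensional sketch of the rejection method, using the simplest possible comparison function (a constant) and a hypothetical target, a standard normal restricted to [-5, 5]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Rejection method for an unnormalized target P(θ) — here a standard
# normal on [-5, 5] (an illustrative choice) — with a constant
# comparison function c ≥ P(θ) everywhere.
def target(theta):
    return np.exp(-0.5 * theta**2)   # unnormalized P(θ), max 1 at θ = 0

c = 1.0
n = 100_000
theta = rng.uniform(-5.0, 5.0, n)    # propose uniformly in θ
height = rng.uniform(0.0, c, n)      # uniform heights under c
samples = theta[height < target(theta)]   # keep points under the curve

print(samples.mean(), samples.std())  # ≈ 0 and ≈ 1 for a unit normal
# Acceptance rate ≈ √(2π)/10 ≈ 25% here; for a comparable envelope in
# m dimensions it decays exponentially — hence the m ≳ 6 rule of thumb.
```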
Then
$$p(\theta|D,M) = \frac{e^{-\Lambda(\theta)}}{Z}, \qquad Z \equiv \int d\theta\; e^{-\Lambda(\theta)}$$
Bayesian integration looks like problems addressed in
computational statmech and Euclidean QFT!
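A minimal Metropolis sampler exploiting the analogy: only the "energy" Λ(θ) is ever evaluated, and the unknown normalization Z cancels in the acceptance ratio. The quadratic Λ below (so the target is a unit normal) is a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)

# Metropolis MCMC for p(θ|D,M) = e^{-Λ(θ)}/Z; Z is never computed.
def Lam(theta):
    return 0.5 * theta**2            # illustrative "energy"

theta, chain = 0.0, []
for _ in range(50_000):
    prop = theta + rng.normal(0.0, 1.0)        # symmetric proposal
    # Accept with probability min(1, e^{Λ(θ) - Λ(prop)}); Z cancels
    if rng.uniform() < np.exp(min(0.0, Lam(theta) - Lam(prop))):
        theta = prop
    chain.append(theta)

samples = np.array(chain[5_000:])              # discard burn-in
print(samples.mean(), samples.var())           # ≈ 0 and ≈ 1
```

Posterior summaries (means, credible regions, marginals) then become simple averages over the chain.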
Bayesian Benefits:
• Rigorous foundations, consistent & simple interpretation
Bayesian Challenges:
• More complicated problem specification
(≥ 2 alternatives; priors)
[Figure: sampling distribution $P(D|H_0)$ with a 95% region; $D_{obs}$ falls in the tail.]

[Figure: $P(D|H)$ for alternatives $H_0$, $H_1$, $H_2$, compared at $D_{obs}$.]
" #
r2
· ¸
π(θ)J(θ) −1 X 2
p(B, θ|D, I) ∝ exp − exp (B µ − B̂ µ )
σN 2σ 2 2σ 2 µ
Y
where J(θ) = λµ (θ)−1/2
µ
Model Comparison:
• Bayesian model comparison is asymptotically consistent. Popular
frequentist procedures (e.g., the χ² test, the asymptotic likelihood
ratio (∆χ²), AIC) are not.