A Gentle Tutorial in Bayesian Statistics PDF
Warning
This tutorial should be accessible even if some of the equations look hard.
Outline
The need for (statistical) modelling; two examples (a linear model / tractography).
Introduction to statistical inference (frequentist).
Introduction to the Bayesian approach to parameter estimation.
More examples and Bayesian inference in practice.
Conclusions.
[Figure: scatter plot of a response (y) against an explanatory variable (x), motivating the linear model example]
An Example in DW-MRI
Suppose that we are interested in tractography. We use the diffusion tensor to model local diffusion within a voxel. The (model) assumption is that local diffusion can be modelled with a 3D Gaussian distribution whose variance-covariance matrix is proportional to the diffusion tensor, D.
The resulting diffusion-weighted signal $S_i$ along a gradient direction $\mathbf{g}_i$ with $b$-value $b_i$ is modelled as
$$S_i = S_0 \exp\{-b_i\, \mathbf{g}_i^T D\, \mathbf{g}_i\}, \qquad D = \begin{pmatrix} D_{11} & D_{12} & D_{13} \\ D_{21} & D_{22} & D_{23} \\ D_{31} & D_{32} & D_{33} \end{pmatrix} \qquad (1)$$
$S_0$ is the signal with no diffusion-weighting gradients applied (i.e. $b_0 = 0$). The eigenvectors of $D$ define an orthogonal coordinate system and give the orientation of the ellipsoid axes; the eigenvalues of $D$ give the lengths of these axes. If we sort the eigenvalues by magnitude, we can derive the orientation of the major axis of the ellipsoid and the orientations of the minor axes.
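For illustration, the eigen-decomposition just described can be computed with NumPy; the tensor values below are made up for the sketch, not taken from the tutorial:

```python
import numpy as np

# Hypothetical symmetric diffusion tensor (units mm^2/s); values are
# illustrative only, not from the tutorial.
D = np.array([[1.7e-3, 0.1e-3, 0.0],
              [0.1e-3, 0.4e-3, 0.0],
              [0.0,    0.0,    0.3e-3]])

# eigh returns eigenvalues in ascending order for a symmetric matrix
evals, evecs = np.linalg.eigh(D)

# Sort descending: the eigenvector of the largest eigenvalue is the major
# axis of the ellipsoid, i.e. the local fibre orientation used in tractography.
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

major_axis = evecs[:, 0]   # orientation of the ellipsoid's major axis
print(evals)               # axis lengths, largest first
print(major_axis)
```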
Although this may look a bit complicated, it can actually be written in terms of a linear model.
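A minimal sketch of that linear-model formulation, with made-up $S_0$, $b$-values, and gradient directions: taking logs of equation (1) gives $\ln S_i = \ln S_0 - b_i\,\mathbf{g}_i^T D\,\mathbf{g}_i$, which is linear in $\ln S_0$ and the six distinct elements of $D$, so the tensor can be fitted by ordinary least squares:

```python
import numpy as np

# Ground-truth tensor and baseline signal (hypothetical values)
D_true = np.diag([1.5e-3, 0.5e-3, 0.5e-3])
S0 = 100.0

# One b = 0 image plus seven diffusion-weighted directions (unit vectors)
g = np.array([[1, 0, 0],
              [1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], float)
g /= np.linalg.norm(g, axis=1, keepdims=True)
b = np.array([0.0] + [1000.0] * 7)

# Noise-free signals from the model S_i = S0 * exp(-b_i g_i^T D g_i)
S = S0 * np.exp(-b * np.einsum('ij,jk,ik->i', g, D_true, g))

# Taking logs gives a linear model in (ln S0, Dxx, Dyy, Dzz, Dxy, Dxz, Dyz):
#   ln S_i = ln S0 - b_i (gx^2 Dxx + gy^2 Dyy + gz^2 Dzz
#                         + 2 gx gy Dxy + 2 gx gz Dxz + 2 gy gz Dyz)
gx, gy, gz = g.T
X = np.column_stack([np.ones(len(b)),
                     -b * gx**2, -b * gy**2, -b * gz**2,
                     -2 * b * gx * gy, -2 * b * gx * gz, -2 * b * gy * gz])
beta, *_ = np.linalg.lstsq(X, np.log(S), rcond=None)

S0_hat = np.exp(beta[0])
Dxx, Dyy, Dzz, Dxy, Dxz, Dyz = beta[1:]
D_hat = np.array([[Dxx, Dxy, Dxz],
                  [Dxy, Dyy, Dyz],
                  [Dxz, Dyz, Dzz]])
```

With noise-free data and enough independent directions the fit recovers `D_true` exactly; with real, noisy signals the same design matrix gives the least-squares tensor estimate.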
Sometimes we are interested in (not necessarily linear) functions of the parameters, e.g. $\theta_1 + \theta_2$ or $\theta_1/(1-\theta_1) - \theta_2/(1-\theta_2)$. Whilst in some cases the frequentist approach offers a solution that is approximate rather than exact, there are others where it cannot provide one, or where it is very hard to do so.
Bayesian Inference
When drawing inference within a Bayesian framework, the data are treated as a fixed quantity and the parameters are treated as random variables. This allows us to assign probabilities to parameters (and models), making the inferential framework far more intuitive and more straightforward (at least in principle!).
$$\pi(\theta \mid y) = \frac{\pi(y \mid \theta)\,\pi(\theta)}{\pi(y)} = \frac{\pi(y \mid \theta)\,\pi(\theta)}{\int \pi(y \mid \theta')\,\pi(\theta')\,d\theta'}$$
Everything is assigned a distribution (prior, posterior); we are allowed to incorporate prior information about the parameter, which is then updated using the likelihood function, leading to the posterior distribution, which tells us everything we need to know about the parameter.
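As a self-contained illustration of this prior-to-posterior updating (a conjugate Beta-Binomial example with assumed numbers, not taken from the tutorial):

```python
# Prior theta ~ Beta(a, b); data: k successes in n Bernoulli trials;
# posterior: theta | y ~ Beta(a + k, b + n - k).
a, b = 2.0, 2.0          # prior pseudo-counts (assumed values)
k, n = 7, 10             # observed data (assumed values)

a_post, b_post = a + k, b + n - k

prior_mean = a / (a + b)                       # 0.5
mle = k / n                                    # 0.7 (likelihood alone)
posterior_mean = a_post / (a_post + b_post)    # 9/14, between the two
print(prior_mean, mle, posterior_mean)
```

The posterior mean sits between the prior mean and the maximum-likelihood estimate, showing how the likelihood updates the prior.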
Although Bayesian inference has been around for a long time, it is only in the last two decades that it has really revolutionized the way we do statistical modelling. In principle Bayesian inference is straightforward and intuitive, but when it comes to computation it can be very hard to implement. Thanks to computational developments such as Markov chain Monte Carlo (MCMC), doing Bayesian inference is now a lot easier.
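A minimal sketch of what MCMC does: a random-walk Metropolis sampler for a toy Beta-Binomial posterior (the data, flat prior, and proposal scale below are assumptions for illustration, not the tutorial's code):

```python
import math
import random

random.seed(0)

# Target: p(theta | y) proportional to theta^k (1 - theta)^(n - k)
# with a flat prior on (0, 1); here k = 7 successes out of n = 10.
k, n = 7, 10

def log_post(theta):
    """Unnormalized log posterior; -inf outside the support."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

theta, samples = 0.5, []
for _ in range(20000):
    prop = theta + random.gauss(0.0, 0.1)       # random-walk proposal
    # Accept with probability min(1, post(prop) / post(theta))
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

burned = samples[2000:]                          # discard burn-in
est = sum(burned) / len(burned)
print(est)   # close to the exact posterior mean (k + 1) / (n + 2) = 8/12
```

The sampler never needs the normalizing constant $\pi(y)$: only ratios of the unnormalized posterior enter the accept/reject step, which is what makes MCMC practical.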
[Figures: a sequence of posterior densities for theta on [0, 1], showing how the posterior is updated and sharpens as data are incorporated]
$$\pi(M_1 \mid y) = \frac{\pi(y \mid M_1)\,\pi(M_1)}{\pi(y)}, \qquad B_{12} = \frac{\int \pi(y \mid \theta_1, M_1)\,\pi(\theta_1)\,d\theta_1}{\int \pi(y \mid \theta_2, M_2)\,\pi(\theta_2)\,d\theta_2}$$
Bayesian model comparison does not depend on the particular parameter values used by each model; instead, it considers the probability of the model averaged over all possible parameter values. This is similar to a likelihood-ratio test, but instead of maximizing the likelihood we average over the parameters.
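A toy illustration of this averaging (an assumed Bernoulli example, not from the tutorial): compare a model with a free success probability against one that fixes it, using marginal likelihoods that happen to be available in closed form:

```python
import math

# Compare, for k successes in n Bernoulli trials:
#   M1: theta ~ Uniform(0, 1)   (free parameter, averaged out)
#   M2: theta fixed at 0.5      (no free parameter)
k, n = 7, 10   # assumed data

# Marginal likelihood under M1: integrate the binomial likelihood over the
# uniform prior -> C(n, k) * Beta(k + 1, n - k + 1) = 1 / (n + 1).
m1 = 1.0 / (n + 1)

# Marginal likelihood under M2: nothing to integrate out.
m2 = math.comb(n, k) * 0.5 ** n

bayes_factor = m1 / m2
print(bayes_factor)   # < 1 here: the data mildly favour the simpler M2
```

Note that M1 is penalized automatically: averaging over its whole prior spreads its predictive mass over many datasets, which is the built-in Occam's razor of the Bayes factor.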
Suppose that we have some measurements (intensities) for each voxel. We could fit the two different models (on the same dataset). Question: how do we tell which model fits the data best, taking into account the uncertainty associated with the parameters in each model? Answer: calculate the Bayes factor!
28 / 29
Suppose that we have some measurements (intensities) for each voxel. We could t the two dierent models (on the same dataset). Question: How do we tell which model ts the data best taking into account the uncertainty associated with the parameters in each model? Answer: Calculate the Bayes factor!
28 / 29
Conclusions
Quantification of the uncertainty in both parameter estimation and model choice is essential in any modelling exercise. A Bayesian approach offers a natural framework for dealing with parameter and model uncertainty. It offers much more than a single best fit or any sort of sensitivity analysis. There is no free lunch, unfortunately: to do fancy things, one often has to write one's own computer programs. Software available: R, WinBUGS, BayesX . . .