
Bayesian Statistics: An Introduction Slide 1

An Introduction to Using WinBUGS


for Cost-Effectiveness Analyses
in Health Economics

Dr. Christian Asseburg


Centre for Health Economics
University of York, UK

ca505@york.ac.uk

Part 1

Bayesian Statistics: An Introduction


Bayesian Statistics: An Introduction Slide 2

Talk overview

● Foundations of Bayesian statistics


● Comparison between Frequentist and Bayesian
approaches
● Calculations and computer implementations
● Example from health economics
● Questions and discussion

Bayesian Statistics: An Introduction Slide 3

History of Bayesian Statistics


The reverend Thomas Bayes (1702-1761) proved a special
case of what is now known as Bayes' Theorem.

Pierre-Simon Laplace (1749-1827) proved a more general


version of Bayes' Theorem and used it for various
applications.

The relevance of Bayes' Theorem to statistics, however, was not appreciated until the 20th century. [Portrait: Revd. Thomas Bayes]

The Frequentist paradigm was the mainstay of probability theory during the 19th and 20th centuries, with important contributions by e.g. Jerzy Neyman, Egon Pearson, John Venn, R.A. Fisher, and Richard von Mises.

Frequentist tools such as hypothesis testing and confidence intervals have


allowed many advances in statistics. Bayesian equivalents exist, but they often
require more computations – it was during the last two decades of increasing
availability of computing resources that Bayesian statistics gained ground.

Bayesian Statistics: An Introduction Slide 4

Bayes' Theorem
Bayes' Theorem can be derived easily from the expression of the joint
probability of two events A and B:

Let p(A) denote the probability that event A will occur, let p(B) denote the
probability that event B will occur, and let p(A,B) denote the probability that
both of the events occur.

Then p(A,B) = p(A)·p(B|A) = p(B)·p(A|B)

Bayes' Theorem states simply that

p(B|A) = p(B)·p(A|B) / p(A)

Bayesian Statistics: An Introduction Slide 5

Priors and Posteriors (1)


Of course, Bayes' Theorem as a way to relate the conditional probabilities
of two events is valid both in Frequentist as well as in Bayesian statistics.

However, in Bayesian statistics it is also applied to unknown parameters x


directly:

p  B⋅p  A∣B
p  B∣A=
p  A

p  x ⋅p data∣x
p  x∣data=
p data

Bayesian Statistics: An Introduction Slide 6

Priors and Posteriors (2)


Unknown parameter(s): x

Data (known): data

Probability of data given x: p(data|x)

“Prior” probability of x: p(x)

“Posterior” probability of x: p(x|data)

p  x ⋅p data∣x
p  x∣data=
p data

Bayesian Statistics: An Introduction Slide 7

Priors and Posteriors (2)


Unknown parameter(s): x

Data (known): data

Probability of data given x: p(data|x)

“Prior” probability of x: p(x)

“Posterior” probability of x: p(x|data)

p(x|data) = p(x)·p(data|x) / p(data), where p(data|x) is the “likelihood”

Bayesian Statistics: An Introduction Slide 8

Priors and Posteriors (2)


Unknown parameter(s): x

Data (known): data

Probability of data given x: p(data|x)

“Prior” probability of x: p(x)

“Posterior” probability of x: p(x|data)

p  x ⋅p data∣x
p  x∣data=
p data
The denominator is a constant and can usually be ignored.

Bayesian Statistics: An Introduction Slide 9

Priors and Posteriors (3)


Bayes' Theorem is thus used to combine data with a prior belief on an
unknown quantity, resulting in a posterior belief on the unknown quantity.

This approach has been compared to the task of learning in humans, where
experience supports a constant updating of a person's belief system.

“Prior” probability of x: p(x)

“Posterior” probability of x: p(x|data)

p  x ⋅p data∣x
p  x∣data=
p data

Bayesian Statistics: An Introduction Slide 10

Definition of “Probability”
FREQUENTIST

BAYESIAN

Bayesian Statistics: An Introduction Slide 11

Definition of “Probability”
FREQUENTIST
The “probability” of an event A occurring (or of a quantity taking a value in
a given interval) is a frequency. Imagine many (hypothetical or actual)
circumstances in which the data have been observed. The proportion of
circumstances in which event A occurs (out of all circumstances) is the
“probability” of A. This probability is objective.

BAYESIAN

Bayesian Statistics: An Introduction Slide 12

Definition of “Probability”
FREQUENTIST
The “probability” of an event A occurring (or of a quantity taking a value in
a given interval) is a frequency. Imagine many (hypothetical or actual)
circumstances in which the data have been observed. The proportion of
circumstances in which event A occurs (out of all circumstances) is the
“probability” of A. This probability is objective.

BAYESIAN

The “probability” of an event A occurring (or of a quantity taking a value in


a given interval) is a degree of belief. The degree of belief in A may
change when we are confronted with new data. The “probability” of A is a
numerical representation of this degree of belief.
If you and I (and everyone else) agree on the belief in event A, we define
an objective probability, otherwise we define a subjective probability.

Bayesian Statistics: An Introduction Slide 13

What is fixed, what is random? (1)


FREQUENTIST

BAYESIAN

Bayesian Statistics: An Introduction Slide 14

What is fixed, what is random? (1)


FREQUENTIST
There is a fixed, but unknown value for each parameter. The data are
an instance of many possible data that could have been collected. A
Frequentist statistician evaluates how likely the given data are according
to different hypothetical values for the unknown quantities. Thus,
statements about the probability of observing the data given different
hypothetical parameter values are summarised in a confidence interval.

BAYESIAN

Bayesian Statistics: An Introduction Slide 15

What is fixed, what is random? (1)


FREQUENTIST
There is a fixed, but unknown value for each parameter. The data are
an instance of many possible data that could have been collected. A
Frequentist statistician evaluates how likely the given data are according
to different hypothetical values for the unknown quantities. Thus,
statements about the probability of observing the data given different
hypothetical parameter values are summarised in a confidence interval.

BAYESIAN

The value for each parameter is unknown. The data are known, they
have been observed. A Bayesian statistician evaluates how likely different
values for the underlying quantities are, given the observed data. Thus,
statements can be made about the probability of the unknown quantity
taking a value in a certain credibility interval.

Bayesian Statistics: An Introduction Slide 16

What is fixed, what is random? (2)


FREQUENTIST
A 95% confidence interval for a quantity x:

BAYESIAN

Bayesian Statistics: An Introduction Slide 17

What is fixed, what is random? (2)


FREQUENTIST
A 95% confidence interval for a quantity x:

“If new data are collected many times and


confidence intervals are calculated, then 95%
of these confidence intervals contain the true
value of x.”

BAYESIAN

Bayesian Statistics: An Introduction Slide 18

What is fixed, what is random? (2)


FREQUENTIST
A 95% confidence interval for a quantity x:

“If new data are collected many times and


confidence intervals are calculated, then 95%
of these confidence intervals contain the true
value of x.”

BAYESIAN

A 95% credibility interval for a quantity x:

“The probability that the value of x lies


between 2.5 and 4.7 is 95%, given the
observed data and the prior belief.”

Bayesian Statistics: An Introduction Slide 19

Hypothesis testing
FREQUENTIST
Given two hypotheses, H0 and H1, ...

BAYESIAN

Bayesian Statistics: An Introduction Slide 20

Hypothesis testing
FREQUENTIST
Given two hypotheses, H0 and H1, calculate the probability of observing
the data (or more extreme data) if H0 is true. If this probability is low
(p‑value), reject H0.

BAYESIAN

Given two hypotheses, H0 and H1, ...

Bayesian Statistics: An Introduction Slide 21

Hypothesis testing
FREQUENTIST
Given two hypotheses, H0 and H1, calculate the probability of observing
the data (or more extreme data) if H0 is true. If this probability is low
(p‑value), reject H0.

BAYESIAN

Given two hypotheses, H0 and H1, calculate the probability of each of


them, given the data and the priors. Favour the hypothesis that has the
higher probability.

Bayesian Statistics: An Introduction Slide 22

Hypothesis testing
FREQUENTIST
Given two hypotheses, H0 and H1, calculate the probability of observing
the data (or more extreme data) if H0 is true. If this probability is low
(p‑value), reject H0.
Because a hypothesis is either true or false (this is just not known)
and only the likelihood of observing the data is calculated, a
Frequentist cannot assign a probability to each hypothesis.
BAYESIAN

Given two hypotheses, H0 and H1, calculate the probability of each of


them, given the data and the priors. Favour the hypothesis that has the
higher probability.
The probability of each of the hypotheses being true can be
calculated. Relative statements (e.g. “H0 is twice as likely as H1”)
can be made.

Bayesian Statistics: An Introduction Slide 23

A Simple Example (1)


In roulette, a spin of the wheel results in a red or a black
number (or 0). In one hour, the roulette wheel resulted in
25 red and 15 black numbers. What is the probability z that
this wheel gives a red number?

Bayesian Statistics: An Introduction Slide 24

A Simple Example (1)


In roulette, a spin of the wheel results in a red or a black
number (or 0). In one hour, the roulette wheel resulted in
25 red and 15 black numbers. What is the probability z that
this wheel gives a red number?

FREQUENTIST

The probability of observing 25 red and 15 black numbers can be


described by a Binomial distribution with 25 successes out of 40.

The sample proportion of success is 25/40, or 0.625. Using the central


limit theorem, an approximate confidence interval for a proportion can be
found. The sampling distribution is summarised by its mean (0.625) and
standard deviation (0.0765), and these are used to obtain a 95%
confidence interval for the mean of a normal distribution. After correcting
for the discrete nature of the data, the confidence interval for z
is found: [0.46, 0.79].
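For readers who want to reproduce this number, a minimal Python sketch (SciPy assumed available) of the normal approximation with a 1/(2n) continuity correction; the exact correction used on the slide is not stated, so the last digit may differ.

from scipy.stats import norm

r, n = 25, 40
p_hat = r / n                             # 0.625
se = (p_hat * (1 - p_hat) / n) ** 0.5     # approx. 0.0765
zcrit = norm.ppf(0.975)                   # approx. 1.96
cc = 1 / (2 * n)                          # continuity correction (assumed choice)
print(round(p_hat - (zcrit * se + cc), 2),
      round(p_hat + (zcrit * se + cc), 2))   # approx. 0.46 0.79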

Bayesian Statistics: An Introduction Slide 25

A Simple Example (2)


In roulette, a spin of the wheel results in a red or a black
number (or 0). In one hour, the roulette wheel resulted in
25 red and 15 black numbers. What is the probability z that
this wheel gives a red number?

BAYESIAN

The probability of observing 25 red and 15 black numbers can be


described by a Binomial distribution with 25 successes out of 40.

The prior probability for z is assumed to be Beta(1,1).

Bayes' Theorem is used to calculate the posterior probability of z.


(See next slide)

The 95% credibility interval for z is [0.47, 0.76].

Bayesian Statistics: An Introduction Slide 26

A Simple Example (3)


In roulette, a spin of the wheel results in a red or a black
number (or 0). In one hour, the roulette wheel resulted in
25 red and 15 black numbers. What is the probability z that
this wheel gives a red number?

BAYESIAN

Bayes' theorem: p(z|data) ∝ p(z)·p(data|z)


(The denominator of Bayes' Theorem, p(data), is a constant and can
usually be ignored.)

p(data|z) = Binomial (25 out of 40 with prob. z)

p(z) = Beta(1, 1)

Bayesian Statistics: An Introduction Slide 27

A Simple Example (4)


In roulette, a spin of the wheel results in a red or a black
number (or 0). In one hour, the roulette wheel resulted in
25 red and 15 black numbers. What is the probability z that
this wheel gives a red number?

BAYESIAN

Bayes' theorem: p(z|data) ∝ p(z)·p(data|z)

p(z|data) ∝ [1/B(1,1)] · z^(1−1) · (1−z)^(1−1) · [40!/(25!·(40−25)!)] · z^25 · (1−z)^(40−25)

p(z|data) ∝ z^25 · (1−z)^15
So p(z|data) = Beta(26,16), and the credibility interval can be calculated
easily by looking up the cumulative probabilities.
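The same interval can be read off the Beta(26,16) quantiles directly; a minimal Python sketch, assuming SciPy is available:

from scipy.stats import beta

posterior = beta(26, 16)                        # Beta(26,16) posterior for z
lower, upper = posterior.ppf([0.025, 0.975])    # 2.5% and 97.5% quantiles
print(round(lower, 2), round(upper, 2))         # approx. 0.47 0.76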

Bayesian Statistics: An Introduction Slide 28

A Simple Example (5)


In roulette, a spin of the wheel results in a red or a black
number (or 0). In one hour, the roulette wheel resulted in
25 red and 15 black numbers. What is the probability z that
this wheel gives a red number?

BAYESIAN

In this simple example, when the prior is from a particular family (Beta) and the likelihood of the data is also from a particular family (Binomial), the posterior also belongs to a particular family of distributions (Beta). The Beta prior and the Binomial likelihood are then called conjugate.

This is a special case – usually the Bayesian posterior distributions


cannot be calculated analytically, and numerical methods are required to
approximate the posterior distribution.

Bayesian Statistics: An Introduction Slide 29

A Simple Example (6)


In roulette, a spin of the wheel results in a red or a black
number (or 0). In one hour, the roulette wheel resulted in
25 red and 15 black numbers. What is the probability z that
this wheel gives a red number?

BAYESIAN

Different choices of prior distributions lead to different posterior


distributions and thus to different credibility intervals.

Prior Data Posterior 95% credibility interval


Beta(1,1) 25 out of 40 Beta(26,16) [0.47, 0.76]

Beta(50,50) 25 out of 40 Beta(75,65) [0.45, 0.62]

Beta(26,16) 25 out of 40 Beta(51,31) [0.52, 0.72]

Bayesian Statistics: An Introduction Slide 30

A Simple Example (7)


Different choices of prior distributions lead to different posterior
distributions and thus to different credibility intervals.

Prior Data Posterior 95% credibility interval


Beta(1,1) 25 out of 40 Beta(26,16) [0.47, 0.76]

Bayesian Statistics: An Introduction Slide 31

A Simple Example (7)


Different choices of prior distributions lead to different posterior
distributions and thus to different credibility intervals.

Prior Data Posterior 95% credibility interval


Beta(1,1) 25 out of 40 Beta(26,16) [0.47, 0.76]
Beta(50,50) 25 out of 40 Beta(75,65) [0.45, 0.62]

Bayesian Statistics: An Introduction Slide 32

A Simple Example (7)


Different choices of prior distributions lead to different posterior
distributions and thus to different credibility intervals.

Prior Data Posterior 95% credibility interval


Beta(1,1) 25 out of 40 Beta(26,16) [0.47, 0.76]
Beta(50,50) 25 out of 40 Beta(75,65) [0.45, 0.62]
Beta(26,16) 25 out of 40 Beta(51,31) [0.52, 0.72]

Bayesian Statistics: An Introduction Slide 33

Priors - again...
Different choices of prior distributions lead to different posterior
distributions and thus to different credibility intervals.

Prior Data Posterior 95% credibility interval


Beta(1,1) 25 out of 40 Beta(26,16) [0.47, 0.76]
Beta(50,50) 25 out of 40 Beta(75,65) [0.45, 0.62]
Beta(26,16) 25 out of 40 Beta(51,31) [0.52, 0.72]

So how does one choose the “right”


prior?

Bayesian Statistics: An Introduction Slide 34

Controversy regarding priors


There is no “right” prior.

A good prior choice may be obvious, for example when earlier


studies on a model quantity can be used.

The influence of the prior on the model output can be minimised by


choosing an “uninformative” prior or a “reference prior”.

If different stakeholders are involved, whose prior opinions on a model


quantity differ, each of them may propose a prior. The model can then
be run in turn for each prior. Afterwards, it may be possible to
reconcile the different posterior opinions.

In general, if the prior choice makes a difference to the model's output,


then more data should be collected. A good modelling application should
either have an informative prior or be robust to prior choice.

Bayesian Statistics: An Introduction Slide 35

Model Selection
In Bayesian statistics, it is relatively straightforward to evaluate different
explanations for a data-set (nested models or totally different models).
The models are all evaluated simultaneously, together with additional
parameters mi for the probabilities of each of the models.

The posteriors for the parameters mi summarise how well each of the
competing models fits the data. Depending on the model application,
one most suitable model may be found, or predictions can be made from
all models simultaneously, using the posterior values for mi as weights
(model averaging).

In Frequentist statistics, it is relatively easy to evaluate nested models –


but the evaluation of other competing models is not straightforward.

Bayesian Statistics: An Introduction Slide 36

Summary
                 FREQUENTIST                         BAYESIAN

Probability      Frequency                           Belief

Statements       Probability of observing data       Probability of model quantity

Objectivity      Result depends only on data         Result depends on prior and
                                                     data – subjective

Computation      Often feasible                      Often complicated

Flexibility      Some applications require normal    No intrinsic limitations
                 or other simplifying assumptions

Model selection  Sometimes possible                  Straightforward

Bayesian Statistics: An Introduction Slide 37

Interval.......

...... any questions? .......

Good Bayesian text book that starts with a comparison of Bayesian and
Frequentist methods:

D'Agostini, G: Bayesian Reasoning in Data Analysis.


World Scientific Publishing, Singapore, 2003.

Why use Bayesian methods in health economics? E.g. B Luce, Y Shih,


K Claxton: International Journal of Technology Assessment in Health
Care 17/1, 2001, pp 1-5.

Bayesian Statistics: An Introduction Slide 38

Calculating Bayesian Posteriors


Models that can be solved analytically (such as the simple example
before) are rare and require conjugacy. Most multi-dimensional models
do not fall in this class.

The problem is that, for each possible set of parameter values, Bayes'
Theorem gives the posterior probability, but if the parametric form of the
distribution cannot be recognised, there is no obvious method for
calculating e.g. its mean value, or for sampling from it.

Therefore, a Bayesian model usually requires numerical methods for


calculating the posteriors of interest. Any algorithm that generates
samples from a distribution that is defined by its probability density
function could be used.

p∣data∝ p ⋅pdata∣

Bayesian Statistics: An Introduction Slide 39

Calculating Bayesian Posteriors


p ∣data∝ p ⋅p data∣
The most commonly used algorithms for sampling from the Bayesian
posterior fall in two groups:

Metropolis-Hastings: some algorithms in this class are Markov chain


Monte Carlo (MCMC), e.g. Gibbs sampling or Reversible Jump.
These algorithms work well when the posterior model
space can be written as a product, such that factors
correspond to subspaces.
Sequential Importance Sampling.
Very suitable for posteriors that can be written as products,
such that factors correspond to individual data.

Bayesian Statistics: An Introduction Slide 40

Calculating Bayesian Posteriors


p ∣data∝ p ⋅p data∣
The most commonly used algorithms for sampling from the Bayesian
posterior fall in two groups:

Metropolis-Hastings: some algorithms in this class are Markov chain


Monte Carlo (MCMC), e.g. Gibbs sampling or Reversible Jump.

p ∣data∝ p a  p data∣a ⋅p b  p data∣b 

Sequential Importance Sampling.

p ∣data∝ p ⋅p data 1∣⋅p data 2∣⋅...

Bayesian Statistics: An Introduction Slide 41

Markov chain Monte Carlo (1)


MCMC generates samples from the posterior space M by defining a
chain C={x1, x2, x3, ...} in M.

At each step i in the chain, candidate values x* are generated randomly


for each of the parameters. (The proposal distribution may depend on
the current values, xi.)

The posterior probabilities are calculated for both xi and x*. Depending
on the likelihood of x* relative to xi, an acceptance probability is
calculated, and the chain either moves to x* (xi+1=x*) or stays at its
current value (xi+1=xi).

Ergodic theory ensures that, in the limit, the distribution of the values of
C converges to the posterior distribution of interest. The beginning of the
chain is discarded because the initial values dominate it (“burn-in”).
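To make this concrete, here is a minimal random-walk Metropolis sketch in Python for the earlier roulette example (Beta(1,1) prior, 25 red out of 40 spins); the proposal standard deviation and chain length are arbitrary illustrative choices.

import numpy as np
from scipy.stats import binom, beta

rng = np.random.default_rng(1)

def log_post(z):
    # log prior + log likelihood, up to the constant p(data)
    if z <= 0 or z >= 1:
        return -np.inf
    return beta(1, 1).logpdf(z) + binom(40, z).logpmf(25)

chain = [0.5]                                  # arbitrary starting value
for i in range(20000):
    z_cur = chain[-1]
    z_new = z_cur + rng.normal(0, 0.1)         # random-walk proposal
    # accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_post(z_new) - log_post(z_cur):
        chain.append(z_new)
    else:
        chain.append(z_cur)

samples = np.array(chain[5000:])               # discard burn-in
print(np.percentile(samples, [2.5, 97.5]))     # close to [0.47, 0.76]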

Bayesian Statistics: An Introduction Slide 42

Markov chain Monte Carlo (2)


Illustration:

At step i in the MCMC, the chain


may jump to the candidate value
θ* or stay at the current value θi.
This depends on the posterior
probabilities for these two points
in parameter space.

Bayesian Statistics: An Introduction Slide 43

Markov chain Monte Carlo (2)


Illustration:

At step i in the MCMC, the chain


may jump to the candidate value
θ* or stay at the current value θi.
This depends on the posterior
probabilities for these two points
in parameter space.

In the long run, the


distribution of points in the
chain approximates the
posterior distribution.

Bayesian Statistics: An Introduction Slide 44

Markov chain Monte Carlo (3)


To sample from a multi-dimensional posterior (e.g. the posterior of a model
with several unknown parameters), parameters can be grouped together
(block sampling). Blocks are chosen such that calculations can be simplified.

At each iteration, a new candidate is suggested for one block (and


parameters in the other blocks retain their current value). The candidate
values for that block are either accepted or rejected. Then the same is done
for the next parameter block, etc.

Example
1. Suggest a candidate for a.
(In this example, a* is accepted.)
2.

3.

Bayesian Statistics: An Introduction Slide 45

Markov chain Monte Carlo (3)


To sample from a multi-dimensional posterior (e.g. the posterior of a model
with several unknown parameters), parameters can be grouped together
(block sampling). Blocks are chosen such that calculations can be simplified.

At each iteration, a new candidate is suggested for one block (and


parameters in the other blocks retain their current value). The candidate
values for that block are either accepted or rejected. Then the same is done
for the next parameter block, etc.

Example
1. Suggest a candidate for a.
(In this example, a* is accepted.)
2. Suggest a candidate for b.
(In this example, b* is accepted.)
3.

Bayesian Statistics: An Introduction Slide 46

Markov chain Monte Carlo (3)


To sample from a multi-dimensional posterior (e.g. the posterior of a model
with several unknown parameters), parameters can be grouped together
(block sampling). Blocks are chosen such that calculations can be simplified.

At each iteration, a new candidate is suggested for one block (and


parameters in the other blocks retain their current value). The candidate
values for that block are either accepted or rejected. Then the same is done
for the next parameter block, etc.

Example
1. Suggest a candidate for a.
(In this example, a* is accepted.)
2. Suggest a candidate for b.
(In this example, b* is accepted.)
3. The chain moves to [a*, b*].

Bayesian Statistics: An Introduction Slide 47

Gibbs sampling
Gibbs sampling is a special case of MCMC. Here, the posterior
parameter space is divided into blocks of parameters, such that for each
block, the conditional posterior probabilities are known.

Then, at each step i in the chain, candidate values x* are generated


randomly from the conditional posterior probability for each parameter
block, given the current values of the other parameters in the model.

Because x* is a draw from the conditional posterior probability, the


calculation of the MCMC acceptance probability always gives 1. Thus,
the chain always moves to x* (xi+1=x*). The sampler thus converges
more quickly.
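As a concrete illustration of sampling from known full conditionals, here is a small Gibbs sketch in Python for a toy model that is not from the talk: normal data with unknown mean and variance under conjugate Normal and Inverse-Gamma priors; the data and prior settings are made up.

import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(3.0, 2.0, size=50)        # made-up data
n, ybar = len(y), y.mean()

# Priors: mu ~ Norm(0, 100), sigma2 ~ Inverse-Gamma(1, 1)
mu0, tau2 = 0.0, 100.0
a0, b0 = 1.0, 1.0

mu, sigma2 = 0.0, 1.0                    # arbitrary initial values
draws = []
for i in range(5000):
    # full conditional for mu given sigma2: Normal
    prec = n / sigma2 + 1 / tau2
    mean = (n * ybar / sigma2 + mu0 / tau2) / prec
    mu = rng.normal(mean, prec ** -0.5)
    # full conditional for sigma2 given mu: Inverse-Gamma
    a = a0 + n / 2
    b = b0 + 0.5 * np.sum((y - mu) ** 2)
    sigma2 = 1 / rng.gamma(a, 1 / b)
    draws.append((mu, sigma2))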

Bayesian Statistics: An Introduction Slide 48

Difficulties with MCMC


Unfortunately, the Markov chain Monte Carlo algorithms do not always
work well, and some care is needed when checking for convergence to
the posterior distribution of interest.

The most common problems are:


Bad “mixing”: The chain does not move
well because the candidate acceptance
rate is too low.
Cause: The candidate generator often suggests candidates that are too unlikely compared to the current value.

Bayesian Statistics: An Introduction Slide 49

Difficulties with MCMC


Unfortunately, the Markov chain Monte Carlo algorithms do not always
work well, and some care is needed when checking for convergence to
the posterior distribution of interest.

The most common problems are:


Bad “mixing”

Trends in the chain: The exploration of


posterior model space is slow and the
chain seems to have a direction.
Cause: The candidate generator
suggests candidates too close to the
current values.

Bayesian Statistics: An Introduction Slide 50

Difficulties with MCMC


Unfortunately, the Markov chain Monte Carlo algorithms do not always
work well, and some care is needed when checking for convergence to
the posterior distribution of interest.

The most common problems are:


Bad “mixing”

Trends in the chain

Poor coverage of posterior probability:


The chain seems to mix well, but it is stuck
at a local maximum of posterior probability.
The samples thus do not exhaust the
posterior model space.
Cause: Inappropriate candidate generator.

Bayesian Statistics: An Introduction Slide 51

Difficulties with MCMC


Unfortunately, the Markov chain Monte Carlo algorithms do not always
work well, and some care is needed when checking for convergence to
the posterior distribution of interest.

The most common problems are:


Bad “mixing”

Trends in the chain

Poor coverage of posterior probability


Because of these difficulties, generating samples from a Bayesian
posterior requires a lot of attention to detail and can often not be fully
automated.
Diagnostic criteria exist to aid in detecting convergence and good mixing
of the MCMC sampler.

Bayesian Statistics: An Introduction Slide 52

Sequential Importance Sampling


This algorithm is very suitable for data that can be obtained sequentially,
for example to monitor an industrial process. SIS (usually implemented
as a particle filter) can also be applied to more general problems.

In SIS, the posterior distribution of interest is approximated by a “swarm”


of particles, where each particle is one possible realisation of the model.
For example, in a model with two parameter values, a and b, a particle
could be the pair (a=4.5, b=-2).

The posterior density function is split into factors, and at each step in the
algorithm, all particles are resampled based on weights. These weights
are derived from the factors that make up the pdf. For example, the first
step might weight the particle sample according to the Bayesian prior.
The second step might weight the updated set of particles according to
the factor that corresponds to the first datum. The next resampling may
take into account the next datum, etc, until the data are used up.
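A bare-bones Python sketch of this weight-and-resample loop, reusing the roulette example and treating each spin as one datum; the particle count is arbitrary and no rejuvenation step is included.

import numpy as np

rng = np.random.default_rng(0)
spins = np.array([1] * 25 + [0] * 15)         # 1 = red, 0 = black (order made up)
rng.shuffle(spins)

particles = rng.uniform(0, 1, size=5000)      # draws from the Beta(1,1) prior on z
for datum in spins:
    # weight each particle by the likelihood of this single observation
    weights = np.where(datum == 1, particles, 1 - particles)
    weights /= weights.sum()
    # weighted resampling: particles with high weight are duplicated
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]

print(np.percentile(particles, [2.5, 97.5]))  # roughly the Beta(26,16) interval (noisy)

Because no new particles are ever created, this sketch also runs straight into the particle-depletion problem discussed on a later slide.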

Bayesian Statistics: An Introduction Slide 53

Sequential Importance Sampling


This algorithm is very suitable for data that can be obtained sequentially,
for example to monitor an industrial process. SIS (usually implemented
as a particle filter) can also be applied to more general problems.

p ∣data∝ p ⋅p data 1∣⋅p data 2∣⋅...

Prior

a=2.5
a=3.1
a=-1
a=4
a=2.7
a=1.7
...

Bayesian Statistics: An Introduction Slide 54

Sequential Importance Sampling


This algorithm is very suitable for data that can be obtained sequentially,
for example to monitor an industrial process. SIS (usually implemented
as a particle filter) can also be applied to more general problems.

p ∣data∝ p ⋅p data 1∣⋅p data 2∣⋅...

a=2.5 1.5 Weights due to the first datum


a=3.1 0.1
a=-1 5.5
a=4 0.0
a=2.7 1.3
a=1.7 3.4
...

Bayesian Statistics: An Introduction Slide 55

Sequential Importance Sampling


This algorithm is very suitable for data that can be obtained sequentially,
for example to monitor an industrial process. SIS (usually implemented
as a particle filter) can also be applied to more general problems.

p ∣data∝ p ⋅p data 1∣⋅p data 2∣⋅...

Weighted resampling...

a=2.5 1.5 a=-1


a=3.1 0.1 a=-1
a=-1 5.5 a=-1
a=4 0.0 a=-1
a=2.7 1.3 a=1.7
a=1.7 3.4 a=1.7
...

Bayesian Statistics: An Introduction Slide 56

Sequential Importance Sampling


This algorithm is very suitable for data that can be obtained sequentially,
for example to monitor an industrial process. SIS (usually implemented
as a particle filter) can also be applied to more general problems.

p ∣data∝ p ⋅p data 1∣⋅p data 2∣⋅...

again, calculate weights and resample...

a=2.5 1.5 a=-1 2.5 a=-1


a=3.1 0.1 a=-1 2.5 a=-1
a=-1 5.5 a=-1 2.5 a=-1
a=4 0.0 a=-1 2.5 a=-1
a=2.7 1.3 a=1.7 0.1 a=-1
a=1.7 3.4 a=1.7 0.1 a=1.7
...

Bayesian Statistics: An Introduction Slide 57

Sequential Importance Sampling


This algorithm is very suitable for data that can be obtained sequentially,
for example to monitor an industrial process. SIS (usually implemented
as a particle filter) can also be applied to more general problems.

p ∣data∝ p ⋅p data 1∣⋅p data 2∣⋅...

When all the data are used up, the final swarm of particles is a sample from the posterior distribution. Because of its sequential structure, SIS is often used with time-series data.

a=2.5 1.5 a=-1 2.5 a=-1
a=3.1 0.1 a=-1 2.5 a=-1
a=-1 5.5 a=-1 2.5 a=-1
a=4 0.0 a=-1 2.5 a=-1
a=2.7 1.3 a=1.7 0.1 a=-1
a=1.7 3.4 a=1.7 0.1 a=1.7
...

Bayesian Statistics: An Introduction Slide 58

Difficulties with SIS


The main problem with SIS is particle depletion: At each resampling
step, the number of different particles is reduced, and no new particles
are created. Because the number of particles is finite, eventually there
are many identical particles.

Different solutions have been suggested, usually based on randomly


generating new particles at each step that are slightly different from the
existing particles but not too different to break the ergodic properties of
the sampler. Other methods are being explored – this is an area of
active research.

Bayesian Statistics: An Introduction Slide 59

Comparison MCMC and SIS


                    MCMC                                 SIS

Sampling            Chain generates samples one by one   All samples are generated at once

Data                Required from the start              Can be added sequentially

Computational cost  10,000's of iterations               10,000's of particles

Challenges          Convergence and mixing               Particle depletion

Uses                Very versatile                       “Live” time-series

Bayesian Statistics: An Introduction Slide 60

Implementations
For MCMC, many ready-made implementations exist. A good place to
start is the package OpenBUGS (ongoing development of WinBUGS),
which implements the Gibbs and other samplers. With a familiar
Windows interface and a very general symbolic language to specify
models, OpenBUGS can solve most classes of Bayesian models.

R offers several add-on packages with MCMC capabilities, as well as an


interface to OpenBUGS, called BRugs.

In terms of speed and efficiency, it may be best to hand-code the MCMC


sampler directly in Fortran, C or another suitable language.

For SIS, I am not aware of any ready-made packages, but there are
ongoing developments.

Bayesian Statistics: An Introduction Slide 61

Hands-on Example
Here I demonstrate the use of OpenBUGS. I've made up this example –
but the basic approach carries through to real applications in health
economics.

8 RCTs have been carried out to investigate the effectiveness of


treatments A and B (observing the number of symptom-free patients
after 1 year). Treatment A costs SEK 10,000, whereas treatment B costs
SEK 14,000. QALY values are given by a probability distribution.

The trial data is summarised as follows:


nA 120 15 84 398 80 40 97 121
rA 65 9 39 202 45 17 48 63
nB 120 16 45 402 77 20 100 115
rB 81 15 29 270 52 12 68 80

QALY symptom-free: Beta(9,1) QALY with symptoms: Beta(5,5)

Bayesian Statistics: An Introduction Slide 62

Statistical model
An evidence synthesis model is required to combine the information
from the 8 RCTs.

Let's choose a random-baselines, random-effects model. We model


trial outcomes on the log-odds probability scale, with the treatment
effect being additive on the log-odds scale.

Letting i denote a trial, we have:

Probability with treatment A (baseline): logit(pAi) = μi
Log-odds treatment effect: ti
Probability with treatment B: logit(pBi) = μi + ti

Bayesian Statistics: An Introduction Slide 63

Statistical model (2)


Random-baselines, random-effects model on the log-odds scale:

Probability with treatment A (baseline): logit(pAi) = μi
Log-odds treatment effect: ti
Probability with treatment B: logit(pBi) = μi + ti

We need to relate the trial-specific parameters μi and ti to their underlying values M and T. On the log-odds scale, these are usually assumed to be normally distributed.

Random baseline: μi ~ Norm(M, σM²)
Random treatment effect: ti ~ Norm(T, σT²)

Bayesian Statistics: An Introduction Slide 64

Statistical model (3)


Next, the model requires a sampling distribution: Given a set of values
for the unknown parameters, how likely is an observed datum?

The model yields a probability and we have binomial data, so the only
sensible choice is
rAi ~ Binom(pAi, nAi)    rBi ~ Binom(pBi, nBi)
By now we have 20 unknown parameters (8 ti, 8 μi, M, σM, T and σT).

So far we have made arbitrary choices in model design – we could just


as well have chosen a fixed-effects model (with fewer parameters) or
designed something more complicated.

Bayesian Statistics: An Introduction Slide 65

Statistical model (4)


Now, because this is a Bayesian model, we need priors for the unknown
parameters M, σM, T and σT.

If we already know something about the parameters we could add this


knowledge as prior information.

For example, there may be further information on the baseline


probability of symptom-free days – we could express this through the
prior on M, if we consider the information relevant.

Otherwise, we choose “sensible” priors that carry little information, along the lines of: M and T lie on the real line, and we know little about them, so let's pick a Normal prior with mean 0 and large variance.

Bayesian Statistics: An Introduction Slide 66

Statistical model (5)


Model equations:
logit(pAi) = μi
logit(pBi) = μi + ti
μi ~ Norm(M, σM²)    ti ~ Norm(T, σT²)

Sampling distribution:
rAi ~ Binom(pAi, nAi)    rBi ~ Binom(pBi, nBi)

Priors:
M ~ Norm(0, 10000)    (log-odds probabilities)
T ~ Norm(0, 10000)
σM ~ Unif(0, 2)    (log-odds standard deviations)
σT ~ Unif(0, 2)

Bayesian Statistics: An Introduction Slide 67

Statistical model (6)


Model equations:
logit(pAi) = μi
logit(pBi) = μi + ti
μi ~ Norm(M, σM²)    ti ~ Norm(T, σT²)

Sampling distribution:
rAi ~ Binom(pAi, nAi)    rBi ~ Binom(pBi, nBi)

The model equations and the sampling distribution are common to


the Frequentist and the Bayesian approaches. If you already have a
Frequentist model, then you (should) already have specified these.

Bayesian Statistics: An Introduction Slide 68

Statistical model (7)


Priors do not occur in the Frequentist setting, so you probably have to
make them up.
In this example, the priors are meant to be uninformative, i.e. they are
supposed to add no information to the result. It is good practice to test
this by changing the priors a little bit and observing the impact on the
results of your model.

Priors:
M ~ Norm(0, 10000)    (log-odds probabilities)
T ~ Norm(0, 10000)
σM ~ Unif(0, 2)    (log-odds standard deviations)
σT ~ Unif(0, 2)

Bayesian Statistics: An Introduction Slide 69

OpenBUGS
Let us fit this Bayesian model using OpenBUGS.

The OpenBUGS syntax is relatively straightforward and similar to R.

model {
  for (i in 1:N) {
    logit(pA[i]) <- mu[i]
    logit(pB[i]) <- mu[i] + t[i]
    rA[i] ~ dbin(pA[i], nA[i])
    rB[i] ~ dbin(pB[i], nB[i])
    mu[i] ~ dnorm(M, precM)
    t[i]  ~ dnorm(T, precT)
  }
  M ~ dnorm(0, 0.0001)
  T ~ dnorm(0, 0.0001)
  precM <- 1/pow(sigmaM, 2)
  precT <- 1/pow(sigmaT, 2)
  sigmaM ~ dunif(0, 2)
  sigmaT ~ dunif(0, 2)
}

(On the slide, the model equations from the previous slides are shown alongside the corresponding lines of code.)
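For readers working outside the BUGS family, here is a rough sketch of the same random-baselines, random-effects model in Python using PyMC; the argument names follow recent PyMC releases and are an assumption, as are the sampler settings.

import numpy as np
import pymc as pm

nA = np.array([120, 15, 84, 398, 80, 40,  97, 121])
rA = np.array([ 65,  9, 39, 202, 45, 17,  48,  63])
nB = np.array([120, 16, 45, 402, 77, 20, 100, 115])
rB = np.array([ 81, 15, 29, 270, 52, 12,  68,  80])

with pm.Model():
    M = pm.Normal("M", mu=0, sigma=100)                  # variance 10000, as on the slide
    T = pm.Normal("T", mu=0, sigma=100)
    sigmaM = pm.Uniform("sigmaM", lower=0, upper=2)
    sigmaT = pm.Uniform("sigmaT", lower=0, upper=2)
    mu = pm.Normal("mu", mu=M, sigma=sigmaM, shape=8)    # random baselines
    t = pm.Normal("t", mu=T, sigma=sigmaT, shape=8)      # random treatment effects
    pA = pm.math.invlogit(mu)
    pB = pm.math.invlogit(mu + t)
    pm.Binomial("rA", n=nA, p=pA, observed=rA)
    pm.Binomial("rB", n=nB, p=pB, observed=rB)
    idata = pm.sample(draws=10000, tune=2000, chains=3)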

Bayesian Statistics: An Introduction Slide 70

OpenBUGS (2)
The data are specified in a separate section so that they can be entered
or changed easily.

model {
  for (i in 1:N) {
    logit(pA[i]) <- mu[i]
    logit(pB[i]) <- mu[i] + t[i]
    rA[i] ~ dbin(pA[i], nA[i])
    rB[i] ~ dbin(pB[i], nB[i])
    mu[i] ~ dnorm(M, precM)
    t[i]  ~ dnorm(T, precT)
  }
  M ~ dnorm(0, 0.0001)
  T ~ dnorm(0, 0.0001)
  precM <- 1/pow(sigmaM, 2)
  precT <- 1/pow(sigmaT, 2)
  sigmaM ~ dunif(0, 2)
  sigmaT ~ dunif(0, 2)
}

#data
list(N=8,
     nA=c(120,15,84,398, 80,40, 97,121),
     rA=c( 65, 9,39,202, 45,17, 48, 63),
     nB=c(120,16,45,402, 77,20,100,115),
     rB=c( 81,15,29,270, 52,12, 68, 80))

Bayesian Statistics: An Introduction Slide 71

OpenBUGS (3)
The OpenBUGS window can look like this.

Bayesian Statistics: An Introduction Slide 72

OpenBUGS (4)
In this example, OpenBUGS explores the model's posterior reasonably well.

The three colours denote


three chains that are run in
parallel.

Note that there is no


evidence that initial values
are influencing the chains.

Also, each chain appears to


“wiggle” quite well and the
three chains overlap,
indicating that they are
exploring the same posterior
space (as they should).

Bayesian Statistics: An Introduction Slide 73

OpenBUGS (5)
Here's a screenshot from another model, in which the sampler did not
converge (just to give you an idea of what to look for...).

Poor mixing: Three chains (which should all explore


the same parameter space) are far from each other,
and trends are evident.

Bayesian Statistics: An Introduction Slide 74

Model convergence
WinBUGS and OpenBUGS provide a few formal diagnostics to check for
convergence and performance of the sampler, for example the Brooks-
Gelman-Rubin diagram and plots of within-chain autocorrelation.
High “MC_error” can also indicate convergence problems.

To avoid convergence problems, bear in mind the following.

1. It is difficult to fit many unknown parameters to little data.

2. If you can exploit conjugate distributions, do so.

3. Sometimes you can get around convergence problems by re-writing


your model equations without changing the underlying model.

4. If high within-chain autocorrelation is the only problem, you can thin


the posterior samples and only keep every nth draw.

Bayesian Statistics: An Introduction Slide 75

Example continued
When you are satisfied with the posterior sampling, you can generate
any desired summary statistic for your posterior.

For example, here's an overview of the posterior means and credibility


intervals.

In this example, the treatment effect T is positive (on the log-odds scale), i.e. treatment B gives a higher probability of being symptom-free.

Bayesian Statistics: An Introduction Slide 76

Example continued
When you are satisfied with the posterior sampling, you can generate
any desired summary statistic for your posterior.

MC_error is another
indication for how well the
sampler performed.

The sampling error should


be much smaller than the
estimated posterior standard
deviation.

Bayesian Statistics: An Introduction Slide 77

Example continued
When you are satisfied with the posterior sampling, you can generate
any desired summary statistic for your posterior.

The 95% credibility interval for the treatment efficacy parameter T is [0.481, 0.9601], i.e. treatment B has a clearly positive effect on the probability of being symptom-free after 1 year.

Bayesian Statistics: An Introduction Slide 78

Example continued
But what is the probability P that treatment B is the cost-effective choice
at a willingness-to-pay (WTP) of λ=SEK 50,000?

This probability P can be calculated from the model parameters as


follows.

Let's assume that the underlying baseline and treatment effect would apply to the target population, i.e. in the target population logit(pA) = M and logit(pB) = M + T. We calculate the net benefit (NB) of the treatments, using the costs C and the utilities U:

NBA = [pA·Ufree + (1−pA)·Usymptoms]·λ − CA
NBB = [pB·Ufree + (1−pB)·Usymptoms]·λ − CB

The probability P is given by P = Pr(NBB > NBA).

Bayesian Statistics: An Introduction Slide 79

Example continued
But what is the probability P that treatment B is the cost-effective choice
at a willingness-to-pay (WTP) of λ=SEK 50,000?

We can calculate this directly in WinBUGS, adding a few more lines of


code.

model {
  ...
  logit(PA) <- M
  logit(PB) <- M + T
  Uf ~ dbeta(9,1)
  Us ~ dbeta(5,5)
  NBA <- (PA*Uf + (1-PA)*Us)*WTP - CA
  NBB <- (PB*Uf + (1-PB)*Us)*WTP - CB
  P <- step(NBB - NBA)
}

Numerically, we simply look at all the draws from the posterior and check which of them fulfill the condition. This proportion is the posterior probability P. There is no need for any further tests.
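The same check can be done outside BUGS on exported posterior draws; a small NumPy sketch, where the draws for M and T are placeholders standing in for the real MCMC output:

import numpy as np

rng = np.random.default_rng(0)
# Placeholder posterior draws for illustration only; in practice, load the
# draws of M and T exported from OpenBUGS/WinBUGS.
M_draws = rng.normal(0.05, 0.10, size=30000)
T_draws = rng.normal(0.70, 0.12, size=30000)

WTP, CA, CB = 50_000, 10_000, 14_000
Uf = rng.beta(9, 1, size=30000)               # QALY weight if symptom-free
Us = rng.beta(5, 5, size=30000)               # QALY weight with symptoms

pA = 1 / (1 + np.exp(-M_draws))               # inverse logit
pB = 1 / (1 + np.exp(-(M_draws + T_draws)))
NBA = (pA * Uf + (1 - pA) * Us) * WTP - CA
NBB = (pB * Uf + (1 - pB) * Us) * WTP - CB
P = np.mean(NBB > NBA)                        # posterior probability that B is cost-effective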

Bayesian Statistics: An Introduction Slide 80

Example continued
But what is the probability P that treatment B is the cost-effective choice
at a willingness-to-pay (WTP) of λ=SEK 50,000?

Here's the posterior summary for our new quantities.

There is a lot of overlap between the net benefits for treatment A


(95%-CI [SEK 16240, 33200]) and B (95%-CI [SEK 15580, 31090]).

Accordingly, the probability that treatment B is cost-effective is


estimated at 0.3498.

Bayesian Statistics: An Introduction Slide 81

Example continued
But what is the probability P that treatment B is the cost-effective choice
at a willingness-to-pay (WTP) of λ=SEK 50,000?

Here's the posterior summary for our new quantities.

In this example, I defined the probability P by checking a condition,


rather than as a stochastic variable with its own prior and probability
density. This is why the confidence intervals for P do not mean
anything.

Bayesian Statistics: An Introduction Slide 82

Example continued
OpenBUGS can also produce graphical output for all quantities
of interest.

For example, here are the posterior densities for the net benefits NBA
and NBB.
[Figure: posterior density plots P(NBA) and P(NBB), each based on a sample of 30,000 draws, with NBA and NBB (in SEK) on the x-axes.]

They both show quite wide distributions and their support on the x-axes
overlaps substantially.

Bayesian Statistics: An Introduction Slide 83

Bayesian decision theory


With a probabilistic net benefit function, Bayesian decision theory can be
applied to optimise management decisions.

Bayesian models can thus directly feed in to management processes.

In a Frequentist model, it is not generally possible to find a probability


distribution for an unknown parameter – because a Frequentist
calculates the likelihood of observing the data and uses this to make
inferences on the model parameters.

Frequentist models do not offer an obvious way for calculating quantities


that are derived from individual parameter values (such as the
probability P or a net benefit) – this makes them less amenable to
management processes.

Bayesian Statistics: An Introduction Slide 84

Summary
Frequentist modelling centres on the likelihood function, i.e. how likely
are the data given a particular model.

Bayesian modelling centres on the probabilities of models and model


parameters, by combining the likelihood of the data with prior
probabilities of the unknown parameters.

Both can be used equally well to fit models and to make inferences on
model parameters.

However, only Bayesian statistics is capable of assigning probabilities


to model quantities. This makes it possible to calculate derived
quantities and their uncertainties.

Numerical methods for fitting Bayesian models require some care and
experience.

