Bayesian Econometrics Introduction
Daniel Buncic
Sveriges Riksbank
Homepage
www.danielbuncic.com
Outline/Table of Contents
Introduction, Intuition and Background
  History of Probability
  Different views of Probability
  Bayesian view of Probability
  Bayesian statistical modelling
  Some General Examples
  Main differences between Bayesian and Frequentist views
  Relation to other Estimation Methods
Some Model Based Examples
  Example 1: Binary Model
  Example 2: Poisson Model
  Example 3: Normal Model
  Example 4: Regression Model
References
Daniel Buncic (Sveriges Riksbank) Topic 1: Bayesian Econometrics August 20, 2018 : 2/107
History of Probability
$$ P(A_i \mid A_j) = \frac{P(A_i \cap A_j)}{P(A_j)}. \qquad (2) $$
Different views of Probability
What is Probability?
There exists no agreement on a formal definition or interpretation of probability!
all that we have is a set of rules (axioms) that define the mathematical
properties of probability and its building blocks
3) subjective interpretation
– probability as meaning the “degree of belief” in an event held by an individual,
– however it is formulated, i.e., based on past information, combinatorial probabilities, etc.
the ”degree of belief” view is thus a quite general concept and allows for a
”subjective” as well as a ”classical” definition of probability.
”classical” definition of probability was formulated by Jacob Bernoulli in Ars
Conjectandi (1713) and Abraham De Moivre in The Doctrine of Chances
(1718)
the conditions for each trial or experiment have to remain the same for a
”relative frequency” view of probability to be valid
what does $\lim_{n\to\infty}$ mean operationally? if only a finite number of trials can
ever be conducted, then the “relative frequency” view of probability cannot
be made operational
the “frequentist” interpretation of probability cannot be applied to situations
where there is no precedent for the event that one is interested in, or when the
conditions of the experiment change
the ”combinatorial” interpretation is not feasible when there is an infinite set
of possible equally likely outcomes
Bayesian view of Probability
most people are likely to use probability within a relative frequency context of
observed outcomes from the past
but probability is very often also based on the subjective view, that is, the
”degree of belief” view, where a probability can be assigned to a (personal or
individual) proposition without ever having observed it
“degrees of belief” can be mapped onto probabilities if they satisfy simple
rules of consistency, which are known as the Cox axioms (Cox, 1946)
the Cox axioms of probability ensure that if two people make the same prior
assumptions and are given the same data, then they will draw identical
conclusions
this more general view of probability as a way to quantify beliefs is known as the
Bayesian viewpoint or as the subjective interpretation of probability
$$ L(\theta|x) = p(x|\theta) = \prod_{i=1}^{N} p(x_i|\theta) $$
$$ P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^C)P(A^C)} \qquad (7) $$
Bayes’ Theorem follows directly from the conditional probability corollary in (2).
By definition we have
P(B|A)P(A) = P(B ∩ A)
and also
Bayesian statistical modelling
for all θ, where C (x1 , x2 ) is constant with respect to θ, then the information content
in the two likelihoods L(θ|x1 ) and L(θ|x2 ) with respect to θ is the same.
The Likelihood Principle states that all of the relevant information that is available in
a sample of data is contained in its likelihood function.
where
– ∝ is the ”proportional to” operator
– p(θ|x) is the posterior probability of θ after observing the data
– L(θ|x) ≡ p(x|θ) is the likelihood function which is equal to the PDF of x given θ
– π(θ) is the prior
– p(x) ≠ 0 is the marginal density of the data, does not depend on θ and is assumed
to exist
The relation in (9) is the key relation and describes how we proceed under a
Bayesian modelling approach
This can be summarised as follows:
1) given your current knowledge (or information available to you), form a prior
belief on the proposition of interest, express it with π(θ)
2) collect sample data and formulate a (parametric) statistical model, formulate
it as L(θ|x)
3) update your prior beliefs expressed by π(θ) with the new ”information” that
is available and contained in the likelihood L(θ|x) using the rule in (9) as:
p(θ|x) ∝ L(θ|x)π(θ)
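The three steps above can be sketched numerically on a grid of θ values. This is a minimal illustration only: the Bernoulli likelihood, the data (7 successes in 10 trials), the flat prior and the names `grid_posterior`, `successes` and `trials` are assumptions made for the example, not part of the slides.

```python
def grid_posterior(likelihood, prior, grid):
    """Approximate p(theta|x), proportional to L(theta|x) * pi(theta), on an equally spaced grid."""
    kernel = [likelihood(t) * prior(t) for t in grid]
    step = grid[1] - grid[0]
    norm = sum(kernel) * step  # Riemann-sum approximation to p(x)
    return [k / norm for k in kernel]


# Hypothetical data: 7 successes in 10 Bernoulli trials, flat prior on (0, 1).
successes, trials = 7, 10
grid = [i / 1000 for i in range(1, 1000)]
posterior = grid_posterior(
    lambda t: t**successes * (1 - t) ** (trials - successes),  # step 2: likelihood
    lambda t: 1.0,                                             # step 1: prior
    grid,
)
# Step 3 is the normalised product; with a flat prior its mode equals the MLE.
posterior_mode = grid[posterior.index(max(posterior))]
```

With a flat prior the grid posterior peaks at the sample proportion; swapping in a different `prior` function shows how the prior shifts the result.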
Some General Examples of Bayesian Probability
$$
\begin{aligned}
P(D=1|T=1) &= \frac{P(T=1, D=1)}{P(T=1)} \\
&= \frac{P(T=1|D=1)P(D=1)}{P(T=1|D=1)P(D=1) + P(T=1|D=0)P(D=0)} \\
&= \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \\
&= 0.161
\end{aligned}
$$
Thus, despite the positive test result, the (posterior) probability that Bob has the
disease given that he tested positive is only 16.1%.
So this result is very different from the typical frequentist interpretation.
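The calculation above can be reproduced with a few lines of Python. The function and argument names (`posterior_prob`, `sensitivity`, `false_positive_rate`) are illustrative labels for the three numbers used on the slide, not terminology from the source.

```python
def posterior_prob(prior, sensitivity, false_positive_rate):
    """P(D=1|T=1) from Bayes' Theorem with the partition {D=1, D=0}."""
    numerator = sensitivity * prior                      # P(T=1|D=1) P(D=1)
    evidence = numerator + false_positive_rate * (1.0 - prior)  # P(T=1)
    return numerator / evidence


p_disease = posterior_prob(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p_disease, 3))  # 0.161
```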
Main differences between Bayesian and Frequentist views
Relation to other Estimation Methods
Some Model Based Examples
Example 1: Binary Model
Binary Model
Suppose our variable of interest is Y which is Binary so that
Frequentist MLE
The MLE would maximise the likelihood function L(θ|y) = p(y|θ) w.r.t. θ.
[Figure 1: Likelihood function L(θ|y) plotted against θ ∈ [0, 1]; the vertical axis is scaled by 10^{-31}.]
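Analytically, we would get the MLE of θ by setting the derivative of L(θ|y) = p(y|θ)
w.r.t. θ to zero. As always with MLE, we actually work with the log of the likelihood
and set the derivative of ln(L(θ|y)) to zero, that is:
$$
\begin{aligned}
\frac{\partial \ln(L(\theta|y))}{\partial\theta} &= 0 \\
\frac{\partial\,[N\bar{y}\ln(\theta) + N(1-\bar{y})\ln(1-\theta)]}{\partial\theta} &= 0 \\
\frac{N\bar{y}}{\theta} &= \frac{N(1-\bar{y})}{(1-\theta)} \\
N\bar{y} - N\bar{y}\theta &= \theta N - \theta N\bar{y} \\
\hat{\theta}_{MLE} &= \bar{y} \\
&= 100/130 = 0.7692.
\end{aligned}
$$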
The variance of θ̂MLE can be found as the inverse of information matrix I(θ), where
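$$
\begin{aligned}
I(\theta) &= -E\left[\frac{\partial^2\,[N\bar{y}\ln(\theta) + N(1-\bar{y})\ln(1-\theta)]}{\partial\theta^2}\right] \\
&= -E\left[-N\bar{y}\theta^{-2} - N(1-\bar{y})(1-\theta)^{-2}\right] \\
&= N\theta^{-1} + N(1-\theta)^{-1} \\
&= N[\theta^{-1} + (1-\theta)^{-1}] \\
&= N[\theta(1-\theta)]^{-1}
\end{aligned}
$$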
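As a numerical check of the two results above, the following sketch computes the MLE and the variance estimate obtained by inverting I(θ) at the MLE. It assumes the data behind the value 100/130 are 100 successes in N = 130 trials; the function name `bernoulli_mle` is hypothetical.

```python
import math


def bernoulli_mle(successes, n):
    """Return the MLE of theta and its estimated variance 1 / I(theta_hat)."""
    theta_hat = successes / n
    information = n / (theta_hat * (1.0 - theta_hat))  # I(theta) = N [theta (1 - theta)]^{-1}
    return theta_hat, 1.0 / information


# Assumed data: 100 successes out of N = 130 trials.
theta_hat, var_hat = bernoulli_mle(100, 130)
std_err = math.sqrt(var_hat)
```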
Bayesian
Under a Bayesian approach we need to specify a prior π(θ) and then compute the
posterior p(θ|y) from the relation
$$
\begin{aligned}
p(\theta|y) &= \frac{p(y|\theta)\pi(\theta)}{p(y)} \\
&= \frac{L(\theta|y)\pi(\theta)}{p(y)} \qquad (14)
\end{aligned}
$$
We know that θ ∈ [0, 1]. If we are uncertain about what the most likely value of θ is
before seeing the data, we can assign an uninformative prior, so that θ ∈ [0, 1] has
equal probability of falling anywhere in this interval.
We can let θ ∼ Uniform (a, b), with a = 0 and b = 1.
With a = 0 and b = 1 we get the Uniform prior for θ as π(θ) = 1. From (14) we can
then form the posterior
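$$
\begin{aligned}
p(\theta|y) &= \frac{L(\theta|y)\pi(\theta)}{p(y)} && (16) \\
&= \frac{\theta^{N\bar{y}}(1-\theta)^{N(1-\bar{y})}}{p(y)} && (17) \\
&\propto \underbrace{\theta^{N\bar{y}}(1-\theta)^{N(1-\bar{y})}}_{\text{kernel}} && (18)
\end{aligned}
$$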
A trick that we will frequently employ is to look at the numerator of (17) only,
without considering the marginal density of the data, p(y) at all.
the right hand side of (18) is often referred to as the kernel of a density
the kernel of a density determines its shape, hence all that matters
– but kernel does not integrate to 1 as a proper PDF should, but to some other
constant which we can call C, ie.
$$ \int_{\theta} \theta^{N\bar{y}}(1-\theta)^{N(1-\bar{y})}\,d\theta = C \neq 1. $$
p(y) = C (19)
once we know what kind of posterior density we are dealing with, we can work
out the normalising constant C easily.
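$$ p(z|\alpha,\beta) = \frac{1}{B(\alpha,\beta)}\underbrace{z^{\alpha-1}(1-z)^{\beta-1}}_{\text{kernel}}, \quad \forall \alpha,\beta > 0, \qquad (20) $$
$$ B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} $$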
and the term p(y) = C from (19) is
$$ \int_x x^{\alpha-1}(1-x)^{\beta-1}\,dx = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} = B(\alpha,\beta). $$
From our earlier numerical values we have:
α = N ȳ + 1 = 101
β = N (1 − ȳ) + 1 = 31
α + β = N ȳ + 1 + N (1 − ȳ) + 1 = N + 2 = 132
so that
$$ C = \frac{\Gamma(101)\Gamma(31)}{\Gamma(132)} = \frac{100! \times 30!}{131!} = 2.9221 \times 10^{-32} \quad \text{(pretty small number!)} $$
$$ p(\theta|y) = \frac{L(\theta|y)}{p(y)} = \frac{L(\theta|y)}{C} $$
thus, posterior is just a re-scaled version of the likelihood function with the
same shape as the likelihood function!
this is due to the prior being π(θ) = 1.
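the maximum value of L(θ|y) was 3.1696 × 10^{-31} (see Figure (1))
the maximum of the posterior density (attained at the mode of θ|y) thus has to be
$$ \max_{\theta}\,p(\theta|y) = \frac{\max_{\theta} L(\theta|y)}{C} = \frac{3.1696 \times 10^{-31}}{2.9221 \times 10^{-32}} \approx 10.8469 $$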
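The normalising constant and the height of the posterior at its mode can be checked numerically. The sketch below works on the log scale with `math.lgamma` to avoid overflowing the factorials; it assumes 100 successes in N = 130 trials (the values implied by α = 101 and β = 31), and `log_beta` is a helper name introduced here.

```python
import math


def log_beta(a, b):
    """ln B(a, b) = ln Gamma(a) + ln Gamma(b) - ln Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


n, successes = 130, 100                          # assumed data behind the slide values
alpha, beta = successes + 1, n - successes + 1   # 101 and 31
C = math.exp(log_beta(alpha, beta))              # normalising constant p(y)

theta_mle = successes / n
max_likelihood = theta_mle**successes * (1 - theta_mle) ** (n - successes)
max_posterior = max_likelihood / C               # posterior density at its mode
```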
[Figure 2: Posterior p(θ|y) and prior π(θ) plotted against θ ∈ [0, 1].]
We know that
p(θ|y) = Beta(α, β) (22)
where
α = N ȳ + 1 and β = N (1 − ȳ) + 1.
From the moments of a Beta distribution we know that, if θ|y ∼ Beta(α, β), then
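$$ E(\theta|y) = \frac{\alpha}{\alpha+\beta}, $$
$$ \mathrm{Var}(\theta|y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}, \quad \text{and} $$
$$ \mathrm{mode}(\theta|y) = \frac{\alpha-1}{\alpha+\beta-2}, \quad \text{if } \alpha,\beta > 1. $$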
To get a Bayesian point estimate (at the centre) of θ|y from (22), we can look at
any of the three well-known measures of central tendency, ie.,
1) the mode
2) the mean
3) or the median of the posterior (no closed form for the Beta PDF)
Which one we end up using depends on our loss function.
If a quadratic loss function is used, we get the standard result that the mean of the
posterior should be used, which is
$$
\begin{aligned}
E(\theta|y) &= \frac{\alpha}{\alpha+\beta} \\
&= \frac{N\bar{y}+1}{N\bar{y}+1+N(1-\bar{y})+1} \\
&= \frac{N\bar{y}+1}{N+2} \\
&= 101/132 \\
&= 0.76515.
\end{aligned}
$$
(see also Chapter 3 in Koop et al. (2007) for various types of loss functions)
$$ \mathrm{Var}(\theta|y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{-(N\bar{y}+1)\,(N(\bar{y}-1)-1)}{(N\bar{y}-N(\bar{y}-1)+2)^2\,(N\bar{y}-N(\bar{y}-1)+3)} $$
which is 0.001351.
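The posterior mean, variance and mode quoted above follow directly from the Beta moment formulas. A short sketch, with `beta_summary` as an illustrative helper name, evaluates them for the Beta(101, 31) posterior:

```python
def beta_summary(alpha, beta):
    """Mean, variance and mode of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total**2 * (total + 1))
    # The mode formula is only valid when both parameters exceed 1.
    mode = (alpha - 1) / (total - 2) if alpha > 1 and beta > 1 else None
    return mean, var, mode


post_mean, post_var, post_mode = beta_summary(101, 31)
```

Under the Uniform prior the posterior mode coincides with the MLE ȳ = 100/130, while the posterior mean 101/132 is pulled slightly towards 1/2.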
Use of priors
In the previous example we used a Uniform prior for θ
so effectively imposed no prior information on θ
we may, however, want to express our prior beliefs more accurately through a
more informative prior
all the prior in this example has to satisfy is the requirement that θ ∈ [0, 1] and
θ is continuous, so any density that satisfies these can be used, e.g.,
– Beta, Minimax, Noncentral Beta, Standard Power distribution etc. (see Leemis
and McQueston (2008) for more details on these distributions)
the Beta distribution (and Noncentral Beta also) are particularly interesting
because they are known to be conjugate prior distributions for our Binary
model here.
where
$$ B(\alpha=1,\beta=1) = \frac{\Gamma(1)\,\Gamma(1)}{\Gamma(2)} = 1. $$
Using the more general (conjugate) Beta prior for the parameter of interest θ in this
set-up allows us to work with a model that still yields a Beta distribution as the
posterior.
this is the main advantage of using conjugate priors, posteriors are available
in closed form (ie., analytically)
the Beta distribution is very flexible and can take on many different shapes (see
Figure (3)) and has the Normal density as a special limiting case as β → ∞.
it can thus replicate many different prior assumptions, where one can put flat
or uninformative priors as well as highly informative ones over a given region of
importance to the investigator
for example, for AR(1) models, we can use the Beta prior to restrain the
parameter interval to [0, 1] with a peak at a value of say around 0.9, if we
know (or believe) a series has a fairly large persistence
[Figure 3: (a) Beta density; (b) Symmetric Beta density (α = β). Both panels plot p(x|θ) against x.]
$$ \bar{\alpha} = N\bar{y} + \alpha_0 \quad \text{and} \quad \bar{\beta} = N(1-\bar{y}) + \beta_0, $$
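The update rule for the posterior hyperparameters can be sketched in a few lines of Python. This is an illustration only: the function name `beta_binomial_update` is hypothetical, the data (100 successes in N = 130 trials) are the values assumed from the earlier numerical example, and the Beta(9, 1) prior in the second call is an invented example of an informative prior.

```python
def beta_binomial_update(alpha0, beta0, successes, n):
    """Posterior Beta hyperparameters for Bernoulli data under a Beta(alpha0, beta0) prior."""
    alpha_bar = successes + alpha0        # N * ybar + alpha0
    beta_bar = (n - successes) + beta0    # N * (1 - ybar) + beta0
    return alpha_bar, beta_bar


# Uniform prior Beta(1, 1) reproduces the earlier posterior Beta(101, 31).
flat = beta_binomial_update(1, 1, 100, 130)
# Hypothetical informative prior Beta(9, 1).
informative = beta_binomial_update(9, 1, 100, 130)
```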
Example 2: Poisson Model
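$$
\begin{aligned}
&= \prod_{i=1}^{N} \frac{\theta^{y_i}\exp\{-\theta\}}{y_i!} \\
L(\theta|y) &= \underbrace{\prod_{i=1}^{N}\frac{1}{y_i!}}_{=K}\;\theta^{N\bar{y}}\exp\{-\theta N\}.
\end{aligned}
$$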
Frequentist MLE
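The ML estimate is again obtained by setting the derivative of
ln(L(θ|y)) = ln(p(y|θ)) w.r.t. θ to zero, that is:
$$
\begin{aligned}
\frac{\partial \ln(p(y|\theta))}{\partial\theta} &= 0 \\
\frac{\partial\,[N\bar{y}\ln(\theta) - N\theta + \ln(K)]}{\partial\theta} &= 0 \\
\theta N &= N\bar{y} \\
\hat{\theta}_{MLE} &= \bar{y}
\end{aligned}
$$
where ȳ is as before and $K = \prod_{i=1}^{N} \frac{1}{y_i!}$ is some constant that does not depend on θ.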
$$
\begin{aligned}
&= N E[\bar{y}]\,\theta^{-2} \\
&= N\theta^{-1}
\end{aligned}
$$
since $E(\bar{y}) = N^{-1}\sum_{i=1}^{N} E(y_i) = \theta$, because E(y) = θ = Var(y) for a Poisson RV.
So Var(θ̂MLE ) = θ/N , where we would again replace θ by a consistent estimator such
as its MLE θ̂MLE to get an estimate of Var(θ̂MLE ) as θ̂MLE /N .
Bayesian
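$$ p(z|\alpha,\beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\,z^{(\alpha-1)}\exp\{-z/\beta\}, \quad \forall \alpha,\beta > 0 $$
where $\Gamma(\alpha) = \int_0^{\infty} z^{(\alpha-1)}\exp\{-z\}\,dz = (\alpha-1)!$ (the factorial form holds for integer α) as before for the Beta distribution.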
in the Gamma Class of densities, α and β are commonly referred to as shape
and scale parameters
$$ \bar{\alpha} = N\bar{y} + 1 \quad \text{and} \quad \bar{\beta} = \frac{1}{N}. $$
A Numerical Example
Suppose we have a sample of N = 100 and compute $\sum_{i=1}^{N} y_i = 201$ as the values
that we observe.
then, the MLE of θ is just 2.01.
posterior density p(θ|y) under π(θ) = 1 is proportional to $\theta^{N\bar{y}}\exp\{-\theta N\}$, ie.,
$$ \theta|y \sim \mathrm{Gamma}\left(N\bar{y}+1, \frac{1}{N}\right) $$
we would compute
mode(θ|y) = (ᾱ − 1) β̄
= (N ȳ) /N
= ȳ = 2.01
E(θ|y) = ᾱβ̄
= (N ȳ + 1)/N = 2.02.
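The two point estimates above can be reproduced directly from the Gamma(shape, scale) posterior. The helper name `gamma_posterior_flat_prior` is illustrative; the inputs are the slide's N = 100 and Σyᵢ = 201.

```python
def gamma_posterior_flat_prior(total_count, n):
    """Shape and scale of theta|y for Poisson data under the flat prior pi(theta) = 1."""
    shape = total_count + 1   # alpha_bar = N * ybar + 1
    scale = 1.0 / n           # beta_bar = 1 / N
    return shape, scale


shape, scale = gamma_posterior_flat_prior(total_count=201, n=100)
posterior_mode = (shape - 1) * scale   # equals ybar
posterior_mean = shape * scale         # equals (N * ybar + 1) / N
```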
Example 2: Poisson Model (adapted from Hoff (2009) pages 43-50)
Figure 5: Plot of observed Poisson (Count) data and posterior density p(θ|y) for N = 100
and ȳ = 2.01.
where we are again using the notation α0 , β0 for the hyperparameters of the
Gamma(α0 , β0 ) prior.
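$$ \propto \theta^{N\bar{y}+\alpha_0-1}\exp\{-\theta\,(N + 1/\beta_0)\}. \qquad (26) $$
The kernel of the posterior p(θ|y) in (26) can be recognised as a Gamma(ᾱ, β̄)
density with
$$ \bar{\alpha} = N\bar{y} + \alpha_0 \quad \text{and} \quad \bar{\beta} = 1/(N + 1/\beta_0). $$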
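The conjugate update in (26) can be sketched as a small function. This is an illustration under stated assumptions: `gamma_poisson_update` is a hypothetical name, β is treated as a scale parameter as in the Gamma density used in these slides, and the Gamma(2, 1) prior is an invented example.

```python
def gamma_poisson_update(alpha0, beta0, total_count, n):
    """Posterior (shape, scale) for Poisson data under a Gamma(alpha0, beta0) prior."""
    alpha_bar = total_count + alpha0       # N * ybar + alpha0
    beta_bar = 1.0 / (n + 1.0 / beta0)     # 1 / (N + 1 / beta0)
    return alpha_bar, beta_bar


# Hypothetical Gamma(2, 1) prior combined with the slide data N = 100, sum(y) = 201.
alpha_bar, beta_bar = gamma_poisson_update(alpha0=2, beta0=1.0, total_count=201, n=100)
posterior_mean = alpha_bar * beta_bar
```

As β₀ grows the prior flattens and β̄ approaches 1/N, the flat-prior value found earlier.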
Figure 6: Plots of prior π(θ), likelihood function L(θ|y) and posterior p(θ|y) for three different prior
hyperparameter values α0 and β0, and N = 100, ȳ = 2.01.
Example 3: Normal Model
with $(\mu, \sigma^2) \in \mathbb{R} \times \mathbb{R}_+$.
Frequentist MLE
The MLE for µ and σ 2 is obtained as before from
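$$
\begin{aligned}
\frac{\partial \ln L(\mu,\sigma^2|y)}{\partial\mu} = \frac{2}{2\sigma^2}\sum_{i=1}^{N}(y_i-\mu) &= 0 \\
\sum_{i=1}^{N} y_i &= N\mu \\
N^{-1}\sum_{i=1}^{N} y_i &= \mu
\end{aligned}
$$
so $\hat{\mu}_{MLE} = \bar{y}$.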
For σ 2 we get
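$$
\begin{aligned}
\frac{\partial \ln L(\mu,\sigma^2|y)}{\partial\sigma^2} = -\frac{N}{2}\left(\sigma^2\right)^{-1} + \frac{1}{2}\left(\sigma^2\right)^{-2}\sum_{i=1}^{N}(y_i-\mu)^2 &= 0 \\
N\left(\sigma^2\right)^{-1} &= \left(\sigma^2\right)^{-2}\sum_{i=1}^{N}(y_i-\mu)^2 \\
N\sigma^2 &= \sum_{i=1}^{N}(y_i-\mu)^2
\end{aligned}
$$
so $\hat{\sigma}^2_{MLE} = N^{-1}\sum_{i=1}^{N}(y_i-\hat{\mu}_{MLE})^2$.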
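Both closed-form estimators are easy to evaluate. The following sketch uses a small made-up sample; `normal_mle` is an illustrative name.

```python
def normal_mle(y):
    """Closed-form ML estimates (mu_hat, sigma2_hat) for i.i.d. Normal data."""
    n = len(y)
    mu_hat = sum(y) / n
    # Note the divisor N (not N - 1): the MLE of sigma^2 is biased in small samples.
    sigma2_hat = sum((yi - mu_hat) ** 2 for yi in y) / n
    return mu_hat, sigma2_hat


# Hypothetical sample, chosen so the arithmetic is easy to follow.
mu_hat, sigma2_hat = normal_mle([2.0, 4.0, 4.0, 6.0])
```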
the inverse of the Fisher Information I(θ), where I(θ) = −E(H) and the Hessian H
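$$
\begin{aligned}
H &= \frac{\partial^2 \ln(L(\theta|y))}{\partial\theta\,\partial\theta'} \\
&= \begin{bmatrix}
\dfrac{\partial^2 \ln(L(\theta|y))}{\partial\mu\,\partial\mu} & \dfrac{\partial^2 \ln(L(\theta|y))}{\partial\sigma^2\,\partial\mu} \\[2ex]
\dfrac{\partial^2 \ln(L(\theta|y))}{\partial\mu\,\partial\sigma^2} & \dfrac{\partial^2 \ln(L(\theta|y))}{\partial\sigma^2\,\partial\sigma^2}
\end{bmatrix} \\
&= \begin{bmatrix}
-N/\sigma^2 & -\sum_{i=1}^{N}(y_i-\mu)/\sigma^4 \\[1ex]
-\sum_{i=1}^{N}(y_i-\mu)/\sigma^4 & N/(2\sigma^4) - \sum_{i=1}^{N}(y_i-\mu)^2/\sigma^6
\end{bmatrix}.
\end{aligned}
$$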
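$$
\begin{aligned}
I(\theta) &= -E(H) \\
&= \begin{bmatrix}
N/\sigma^2 & \sum_{i=1}^{N}E(y_i-\mu)/\sigma^4 \\[1ex]
\sum_{i=1}^{N}E(y_i-\mu)/\sigma^4 & -N/(2\sigma^4) + \sum_{i=1}^{N}E(y_i-\mu)^2/\sigma^6
\end{bmatrix} \\
&= \begin{bmatrix}
N/\sigma^2 & 0 \\
0 & -N/(2\sigma^4) + N/\sigma^4
\end{bmatrix} \\
&= \begin{bmatrix}
N/\sigma^2 & 0 \\
0 & N/(2\sigma^4)
\end{bmatrix}
\end{aligned}
$$
so that
$$ \mathrm{Var}(\hat{\theta}_{MLE}) = I(\theta)^{-1} = \begin{bmatrix} \sigma^2/N & 0 \\ 0 & 2\sigma^4/N \end{bmatrix}. $$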
Bayesian
Under a Bayesian setting, we again need to specify a prior, but now for the joint
parameter vector θ, ie., π(θ) = π(µ, σ²). A few options exist here:
one could use a conjugate prior for θ (we will see that later)
another common one is the following combination for µ and σ 2 (related to
Jeffreys’ prior which will be discussed later as well)
– assume µ and σ 2 are independent, hence
π(θ) = π(µ, σ 2 )
= π(µ|σ 2 )π(σ 2 )
= π(µ)π(σ 2 ).
– then set a flat prior for µ, ie., π(µ) ∝ 1 (uninformative and improper prior for µ)
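– set also a flat prior for the log of σ², that is, let φ = ln σ², then π(φ) ∝ 1.
– noting that $\pi(\sigma^2) = \underbrace{\pi(\phi)}_{\propto 1}\left|\frac{\partial\phi}{\partial\sigma^2}\right| = \pi(\phi)\frac{1}{\sigma^2}$ (from RV transform), we get
$$
\begin{aligned}
\pi(\theta) &= \pi(\mu,\sigma^2) \\
&= \pi(\mu|\sigma^2)\,\pi(\sigma^2) \\
&= \pi(\mu)\,\pi(\phi)\,\frac{1}{\sigma^2} \\
&\propto \frac{1}{\sigma^2}. \qquad (27)
\end{aligned}
$$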
Given π(θ) in (27) and data density p (y|θ), the posterior becomes
where we can drop the terms involving $(2\pi)^{-N/2}$ as they do not depend on
θ = (µ, σ²).
How do we use the posterior for the joint parameter vector θ in (28)?
Since we want to find the (marginal) posterior for µ and σ 2 , given the data, we need
to integrate out the other parameters that are not of interest to us, that is,
$$ p(\mu|y) = \int_{\sigma^2} p(\theta|y)\,d\sigma^2 $$
and
$$ p(\sigma^2|y) = \int_{\mu} p(\theta|y)\,d\mu $$
in this setting.
Before we do that, note that
$$
\begin{aligned}
\sum_{i=1}^{N}(y_i-\mu)^2 &= \sum_{i=1}^{N}\left[(y_i-\bar{y}) - (\mu-\bar{y})\right]^2, \quad \text{where } \bar{y} = N^{-1}\sum_{i=1}^{N} y_i \\
&= \sum_{i=1}^{N}\left[(y_i-\bar{y})^2 - 2(y_i-\bar{y})(\mu-\bar{y}) + (\mu-\bar{y})^2\right] \\
&= \underbrace{\sum_{i=1}^{N}(y_i-\bar{y})^2}_{=SSE} - 2(\mu-\bar{y})\underbrace{\sum_{i=1}^{N}(y_i-\bar{y})}_{=0} + \sum_{i=1}^{N}(\mu-\bar{y})^2 \\
&= SSE + N(\mu-\bar{y})^2,
\end{aligned}
$$
with
$$ SSE = \underbrace{(N-1)}_{=\nu}\,s^2 = \nu s^2, $$
and
Note that SSE is a function of the data only and hence does not depend on
parameter vector of interest θ!
Combining (28) and (29) we can re-express (28) as
$$ p(\theta|y) \propto \left(\sigma^2\right)^{-(N/2+1)}\exp\left\{-\frac{1}{\sigma^2}\,\frac{SSE + N(\mu-\bar{y})^2}{2}\right\} \qquad (30) $$
Rather than using the relation in (28) to integrate out the unwanted parameter, this
is commonly done on the relation in (30)!
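$$
\begin{aligned}
p(\sigma^2|y) &= \int_{\mu} p(\mu,\sigma^2|y)\,d\mu \\
&\propto \int_{\mu} \underbrace{\left(\sigma^2\right)^{-(N/2+1)}\exp\left\{-\frac{SSE}{2\sigma^2}\right\}}_{\text{does not depend on } \mu}\exp\left\{-\frac{N(\mu-\bar{y})^2}{2\sigma^2}\right\}d\mu \\
&\propto \left(\sigma^2\right)^{-(N/2+1)}\exp\left\{-\frac{SSE}{2\sigma^2}\right\}\int_{\mu}\underbrace{\exp\left\{-\frac{N(\mu-\bar{y})^2}{2\sigma^2}\right\}}_{\text{kernel of Normal}(\bar{y},\,\sigma^2/N)}d\mu
\end{aligned}
$$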
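Note that
$$ \int_{\mu}\exp\left\{-\frac{N(\mu-\bar{y})^2}{2\sigma^2}\right\}d\mu = \left(2\pi\sigma^2/N\right)^{1/2} $$
so that
$$
\begin{aligned}
p(\sigma^2|y) &\propto \left(\sigma^2\right)^{-(N/2+1)}\exp\left\{-\frac{SSE}{2\sigma^2}\right\}\left(2\pi\frac{\sigma^2}{N}\right)^{1/2} \\
&\propto \left(\sigma^2\right)^{-\left(\frac{N-1}{2}+1\right)}\exp\left\{-\frac{SSE}{2\sigma^2}\right\}. \qquad (31)
\end{aligned}
$$
The marginal posterior in (31) can be recognised as an Inverse Gamma density with
α = (N − 1)/2 and β = SSE/2, that is,
$$ \sigma^2|y \sim \mathrm{InvGam}\left(\frac{N-1}{2}, \frac{SSE}{2}\right). \qquad (32) $$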
Note: The reason why we are using β = 1 in the Inverse of an Inverse Gamma RV
transformation is to avoid having to re-define the scale parameter as β = 1/b as was
done in the distributions notes.
The last relation in 4) will generate a Chi2 RV with ν degrees of freedom if we draw
from 2*randg(α,T,1), with α = ν/2 ⇔ ν = 2α.
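The same construction can be sketched in Python. The slides use `randg`; the version below swaps in the standard library's `random.gammavariate(alpha, beta)`, which takes a shape `alpha` and a scale `beta`. The function name `chi2_draws`, the seed and the choice ν = 9 are assumptions made for illustration.

```python
import random


def chi2_draws(nu, size, rng):
    """Chi2(nu) draws generated as 2 * Gamma(shape = nu / 2, scale = 1)."""
    return [2.0 * rng.gammavariate(nu / 2.0, 1.0) for _ in range(size)]


rng = random.Random(0)                      # fixed seed so the run is reproducible
draws = chi2_draws(nu=9, size=20000, rng=rng)
sample_mean = sum(draws) / len(draws)       # should be close to E[Chi2(nu)] = nu
```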
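$$
\begin{aligned}
\sigma^2\,|\,y &\sim \mathrm{InvGam}\left(\frac{\overbrace{N-1}^{\nu}}{2}, \frac{\overbrace{SSE}^{\nu s^2}}{2}\right) \\
\frac{2\sigma^2}{SSE}\,\Big|\,y &\sim \mathrm{InvGam}\left(\frac{N-1}{2}, 1\right) \\
\frac{SSE}{2\sigma^2}\,\Big|\,y &\sim \mathrm{Gamma}\left(\frac{N-1}{2}, 1\right) \\
\frac{SSE}{\sigma^2}\,\Big|\,y &\sim \mathrm{Gamma}\left(\frac{N-1}{2}, 2\right) \\
\frac{SSE}{\sigma^2}\,\Big|\,y &\sim \mathrm{Chi2}(N-1)
\end{aligned}
$$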
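$$ \frac{(N-1)\,s^2}{\sigma^2}\,\Big|\,y \sim \mathrm{Chi2}(N-1) $$
$$ s^2\,|\,y \sim \frac{\sigma^2}{(N-1)}\,\mathrm{Chi2}(N-1). $$
Also, since E(z) = N − 1, when Z ∼ Chi2(N − 1), we get the standard result that
$$ E(s^2|y) = \frac{\sigma^2}{(N-1)}\underbrace{E[\mathrm{Chi2}(N-1)]}_{=N-1} = \sigma^2, $$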
To obtain a point estimate, we can again use the mean (or mode or median). Since
the marginal posterior for σ 2 , given the data y is:
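$$ \sigma^2|y \sim \mathrm{InvGam}\left(\alpha = \frac{N-1}{2}, \beta = \frac{SSE}{2}\right) $$
and $E(z) = \frac{\beta}{\alpha-1}$ if RV Z ∼ InvGam(α, β), we get
$$ E(\sigma^2|y) = \frac{SSE/2}{\frac{N-1}{2}-1} = \frac{SSE}{(N-3)} $$
as the posterior mean and with $\mathrm{mode}(z) = \frac{\beta}{\alpha+1}$.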
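The posterior mean and mode of σ² can be evaluated from the Inverse Gamma formulas above. In this sketch the name `inv_gamma_summary` and the inputs N = 20 and SSE = 38 are hypothetical.

```python
def inv_gamma_summary(n, sse):
    """Posterior mean and mode of sigma^2 | y ~ InvGam((N - 1) / 2, SSE / 2)."""
    alpha, beta = (n - 1) / 2.0, sse / 2.0
    # The mean only exists for alpha > 1, i.e. N > 3.
    mean = beta / (alpha - 1.0) if alpha > 1.0 else None   # SSE / (N - 3)
    mode = beta / (alpha + 1.0)                            # SSE / (N + 1)
    return mean, mode


# Hypothetical sample summary: N = 20 observations with SSE = 38.
post_mean, post_mode = inv_gamma_summary(n=20, sse=38.0)
mle = 38.0 / 20   # sigma2_hat_MLE = SSE / N, for comparison
```

For these inputs the mode SSE/(N + 1) lies below the MLE SSE/N and the mean SSE/(N − 3) lies above it; all three converge as N grows.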
Notice here that mode(σ²|y) ≠ max(L(θ|y)) because π(θ) ∝ 1/σ² and not
just a constant, so exact results differ because of the prior on θ.
But as N → ∞, this difference disappears and $\mathrm{mode}(\sigma^2|y) \to \hat{\sigma}^2_{MLE}$.
$$ p(\mu|y) \propto \Gamma(\alpha)\,\beta^{-\alpha} $$
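$$
\begin{aligned}
&= SSE^{-N/2}\left[1 + \frac{N(\mu-\bar{y})^2}{SSE}\right]^{-N/2} && (40) \\
&\propto \left[1 + \frac{N(\mu-\bar{y})^2}{SSE}\right]^{-N/2} \\
&= \left[1 + \frac{1}{(N-1)}\frac{N(\mu-\bar{y})^2}{s^2}\right]^{-(N-1+1)/2} \\
&= \left[1 + \frac{1}{\nu}\left(\frac{\mu-\bar{y}}{s/\sqrt{N}}\right)^2\right]^{-(\nu+1)/2} && (41)
\end{aligned}
$$
where ν = (N − 1).
The kernel in (41) is that of a (non-standard) Student's t distribution with ν degrees
of freedom, location ȳ and squared scale s²/N.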
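Defining $\omega = \sqrt{N}(\mu-\bar{y})/s$ we can compute the (standard) Student's t distribution
for RV ω with PDF
$$
\begin{aligned}
p(\omega|y) &= p(\mu|y)\left|\frac{\partial\mu}{\partial\omega}\right| \\
&\propto \left[1 + \frac{1}{\nu}\omega^2\right]^{-(\nu+1)/2}\frac{s}{\sqrt{N}} \\
&\propto \left[1 + \frac{1}{\nu}\omega^2\right]^{-(\nu+1)/2}
\end{aligned}
$$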
E(µ|y) = ȳ
mode(µ|y) = ȳ
median(µ|y) = ȳ
so that
µ|y ∼ Normal(ȳ, s²/N).
Note that since the prior on µ was ∝ 1, the posterior (Bayesian) estimate is the same as
the MLE (µ̂MLE = ȳ).
Example 4: Plain Vanilla Regression Model
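where $\varepsilon_i \sim \mathrm{Normal}(0, \sigma^2)$.
$$
\underset{(N\times 1)}{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix},\quad
\underset{(N\times p)}{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{bmatrix},\quad
\underset{(p\times 1)}{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix},\quad
\underset{(N\times 1)}{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}
$$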
with $(\beta, \sigma^2) \in \mathbb{R}^p \times \mathbb{R}_+$.
Note that the first column of X is frequently a column of ones corresponding to the
intercept term in the regression model, but this is not important here.
In general, a random vector Z(k×1) is said to be Multivariate Normal distributed with
1) mean vector µ(k×1) and
2) co-variance matrix Σ(k×k)
if its PDF is
$$ p(Z|\mu,\Sigma) = (2\pi)^{-k/2}\det(\Sigma)^{-1/2}\exp\left\{-\frac{1}{2}(Z-\mu)'\,\Sigma^{-1}(Z-\mu)\right\} $$
$$ Z \sim \mathrm{MNormal}(\mu, \Sigma) $$
Frequentist MLE
The Likelihood function (joint density) is simply (where Σ = σ 2 I)
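$$
\begin{aligned}
L(\beta,\sigma^2|y,X) &= (2\pi)^{-N/2}\det\left(\sigma^2 I\right)^{-1/2}\exp\left\{-\frac{1}{2}(y-X\beta)'\left(\sigma^2 I\right)^{-1}(y-X\beta)\right\} \\
&= (2\pi\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)\right\}
\end{aligned}
$$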
and
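$$
\begin{aligned}
\frac{\partial \ln(L(\beta,\sigma^2|y,X))}{\partial\sigma^2} &= 0 \\
-\frac{N}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}(y-X\beta)'(y-X\beta) &= 0
\end{aligned}
$$
so that $\hat{\sigma}^2_{MLE} = N^{-1}\left(y - X\hat{\beta}_{MLE}\right)'\left(y - X\hat{\beta}_{MLE}\right)$.
The variance/covariance matrix for $\hat{\theta}_{MLE} = (\hat{\beta}_{MLE}, \hat{\sigma}^2_{MLE})$ is obtained as before from
the inverse of the Information matrix, where I(θ) = −E( · | X), ie., conditional on X: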
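$$
\begin{aligned}
I(\theta) &= -E\left[\frac{\partial^2 \ln(L(\beta,\sigma^2|y,X))}{\partial\theta\,\partial\theta'}\,\Big|\,X\right] \\
&= -E\left[\begin{pmatrix}
-\sigma^{-2}(X'X) & -\sigma^{-4}X'\varepsilon \\
-\sigma^{-4}\varepsilon'X & \frac{1}{2}N\sigma^{-4} - \varepsilon'\varepsilon\,\sigma^{-6}
\end{pmatrix}\,\Big|\,X\right] \\
&= \begin{pmatrix}
\sigma^{-2}(X'X) & 0 \\
0 & (N/2)\,\sigma^{-4}
\end{pmatrix}
\end{aligned}
$$
yielding
$$ \mathrm{Var}(\hat{\theta}_{MLE}) = I(\theta)^{-1} = \begin{pmatrix} \sigma^2(X'X)^{-1} & 0 \\ 0 & 2\sigma^4/N \end{pmatrix}. $$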
Bayesian
Under a Bayesian setting, using again the same (Jeffreys) prior $\pi(\theta) = \pi(\beta, \sigma^2) \propto 1/\sigma^2$ as in Example 3: The Normal Model, we get:
\[
p(\theta \mid y, X) \propto (\sigma^2)^{-(N/2+1)} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) \right\} \tag{43}
\]
and we can again seek an expression for $(y - X\beta)'(y - X\beta)$ that relates $y$ to the OLS (and here also MLE) estimate $\hat\beta = (X'X)^{-1}(X'y)$.
Re-writing the term with $A = y - X\hat\beta$ and $B = X(\beta - \hat\beta)$:
\[
\begin{aligned}
(y - X\beta)'(y - X\beta) &= [\underbrace{(y - X\hat\beta)}_{=A} - \underbrace{X(\beta - \hat\beta)}_{=B}]' \, [\underbrace{(y - X\hat\beta)}_{=A} - \underbrace{X(\beta - \hat\beta)}_{=B}] \\
&= (A - B)'(A - B) \\
&= A'A - A'B - B'A + B'B
\end{aligned}
\]
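Because the hat (or projection) matrix $H = X(X'X)^{-1}X'$ is a symmetric and idempotent matrix, we have $H = H'$ and $HH = H$, so that
\[
(I - H)'X = X - HX = X - X(X'X)^{-1}X'X = 0.
\]
Since $y - X\hat\beta = (I - H)y$, the cross terms vanish, $(y - X\hat\beta)'X(\beta - \hat\beta) = y'(I - H)'X(\beta - \hat\beta) = 0$, which leaves
\[
(y - X\beta)'(y - X\beta) = SSE + (\beta - \hat\beta)'(X'X)(\beta - \hat\beta), \qquad SSE = (y - X\hat\beta)'(y - X\hat\beta).
\]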
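A quick numerical check of this orthogonality, and of the resulting split of $(y - X\beta)'(y - X\beta)$ into SSE plus a quadratic form in $\beta - \hat\beta$, can be sketched for an intercept-plus-slope design. The data and the trial value of β are made up for illustration.

```python
def sum_sq(x, y, b0, b1):
    """(y - X b)'(y - X b) for X = [1, x] and b = (b0, b1)."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))

# OLS estimate from the normal equations X'X b = X'y (2x2 case).
det = n * sxx - sx * sx
b0_hat = (sxx * sy - sx * sxy) / det
b1_hat = (n * sxy - sx * sy) / det
sse = sum_sq(x, y, b0_hat, b1_hat)

# An arbitrary trial coefficient vector beta.
b0, b1 = 0.5, 2.3
d0, d1 = b0 - b0_hat, b1 - b1_hat
# (beta - beta_hat)' X'X (beta - beta_hat) with X'X = [[n, sx], [sx, sxx]].
quad = n * d0 * d0 + 2.0 * sx * d0 * d1 + sxx * d1 * d1

lhs = sum_sq(x, y, b0, b1)
rhs = sse + quad

# Cross term (y - X beta_hat)' X (beta - beta_hat): residuals are orthogonal to X.
resid = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]
cross = d0 * sum(resid) + d1 * sum(r * xi for r, xi in zip(resid, x))
```

Up to floating-point rounding, `lhs` and `rhs` agree and `cross` is zero for any choice of the trial β.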
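We then proceed to find $p(\sigma^2 \mid y, X)$ as with the Normal model, i.e., form
\[
\begin{aligned}
p(\sigma^2 \mid y, X) &= \int_\beta p(\beta, \sigma^2 \mid y, X)\, d\beta \\
&\propto (\sigma^2)^{-(N/2+1)} \exp\left\{ -\frac{SSE}{2\sigma^2} \right\} \\
&\quad \times \int_\beta \exp\left\{ -\frac{1}{2\sigma^2} [(\beta - \hat\beta)'(X'X)(\beta - \hat\beta)] \right\} d\beta
\end{aligned} \tag{46}
\]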
where the integrand in (46) is the kernel of a Multivariate Normal with mean $\hat\beta$ and covariance matrix $\sigma^2 (X'X)^{-1}$, so it will integrate to
\[
(2\pi)^{p/2} \det\left( \sigma^2 (X'X)^{-1} \right)^{1/2} = (2\pi\sigma^2)^{p/2} \det(X'X)^{-1/2}.
\]
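As a sanity check, consider the scalar case $p = 1$ with a single regressor $x$ (an illustrative special case, not taken from the original slides). Then $X'X = x'x = \sum_i x_i^2$ is a scalar and the integral reduces to the familiar univariate Gaussian integral:

```latex
\[
\int_{-\infty}^{\infty} \exp\left\{ -\frac{x'x}{2\sigma^2} (\beta - \hat\beta)^2 \right\} d\beta
= \left( \frac{2\pi\sigma^2}{x'x} \right)^{1/2}
= (2\pi\sigma^2)^{1/2} (x'x)^{-1/2},
\]
```

which agrees with the general expression evaluated at $p = 1$.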
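We then get, after dropping the $(2\pi)^{p/2}$ and the $\det(X'X)^{-1/2}$ constants:
\[
\begin{aligned}
p(\sigma^2 \mid y, X) &\propto (\sigma^2)^{-(N/2+1)} \exp\left\{ -\frac{SSE}{2\sigma^2} \right\} \times (\sigma^2)^{p/2} \\
&\propto (\sigma^2)^{-\left( \frac{N-p}{2} + 1 \right)} \exp\left\{ -\frac{SSE}{2\sigma^2} \right\}
\end{aligned} \tag{47}
\]
where the expression in (47) is the kernel of an InvGam$(\alpha, \beta)$ density with $\alpha = (N - p)/2$ and $\beta = SSE/2$.
So the result for the marginal posterior $p(\sigma^2 \mid y, X)$ from the regression model is analogous to the one we obtained earlier for the Normal model.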
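Draws from this marginal posterior are easy to generate, because the reciprocal of a Gamma variate is Inverse-Gamma distributed. The sketch below assumes the InvGam$(\alpha, \beta)$ kernel $x^{-(\alpha+1)} \exp(-\beta/x)$, which is the form appearing in (47). The function names and the values N = 50, p = 2 and SSE = 12 are hypothetical.

```python
import random

def invgam_mean(alpha, beta):
    """Mean of InvGam(alpha, beta), which exists only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("mean undefined for alpha <= 1")
    return beta / (alpha - 1.0)

def draw_sigma2(n_obs, p, sse, n_draws, seed=0):
    """Draw from p(sigma^2 | y, X) = InvGam((N - p)/2, SSE/2)."""
    rng = random.Random(seed)
    alpha, beta = (n_obs - p) / 2.0, sse / 2.0
    # If G ~ Gamma(shape=alpha, rate=beta) then 1/G ~ InvGam(alpha, beta);
    # gammavariate takes a scale parameter, i.e. 1/rate.
    return [1.0 / rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_draws)]

# Hypothetical inputs: N = 50 observations, p = 2 regressors, SSE = 12.0.
draws = draw_sigma2(n_obs=50, p=2, sse=12.0, n_draws=20000)
mc_mean = sum(draws) / len(draws)
exact_mean = invgam_mean((50 - 2) / 2.0, 12.0 / 2.0)  # equals SSE / (N - p - 2)
```

The Monte Carlo average `mc_mean` should be close to `exact_mean`, which gives a simple check that the parameterisation has been coded correctly.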
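The marginal posterior for $\beta$ is obtained by integrating $\sigma^2$ out of the joint posterior in (46), i.e., compute
\[
p(\beta \mid y, X) \propto \int_{\sigma^2} (\sigma^2)^{-(N/2+1)} \exp\left\{ -\frac{1}{2\sigma^2} [SSE + (\beta - \hat\beta)'(X'X)(\beta - \hat\beta)] \right\} d\sigma^2 \tag{48}
\]
As before, we can recognise the kernel in (48) to be that of an InvGam$(\alpha, \beta)$, where
\[
\alpha = N/2 \quad \text{and} \quad \beta = [SSE + (\beta - \hat\beta)'(X'X)(\beta - \hat\beta)]/2
\]
(on the left-hand side $\beta$ is the InvGam scale parameter, on the right-hand side it is the coefficient vector), so that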
\[
\begin{aligned}
p(\beta \mid y, X) &\propto \Gamma(\alpha)\, \beta^{-\alpha} \\
&= \Gamma(N/2) \left\{ [SSE + (\beta - \hat\beta)'(X'X)(\beta - \hat\beta)]/2 \right\}^{-N/2} \\
&\propto \left[ 1 + \frac{1}{\nu} (\beta - \hat\beta)' \Sigma^{-1} (\beta - \hat\beta) \right]^{-(\nu+p)/2}
\end{aligned} \tag{50}
\]
where $\nu = (N - p)$, $SSE = \nu s^2$, $\Sigma = s^2 (X'X)^{-1}$ and $p = \dim(\beta)$, i.e., the number of regressors.
The expression in (50) can be recognised as a Multivariate Student's t distribution, denoted by $\mathrm{Mt}(\nu, \hat\beta, \Sigma)$, with $\nu$ degrees of freedom and location and scale being $\hat\beta$ and $\Sigma$, respectively, i.e.,
\[
p(\beta \mid y, X) = \frac{\Gamma((\nu + p)/2)}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}} \det(\Sigma)^{-1/2} \left[ 1 + \frac{1}{\nu} (\beta - \hat\beta)' \Sigma^{-1} (\beta - \hat\beta) \right]^{-(\nu+p)/2}
\]
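Since $\Sigma^{-1} = (X'X)/s^2$ and $SSE = \nu s^2$, the quadratic form satisfies $\frac{1}{\nu}(\beta - \hat\beta)'\Sigma^{-1}(\beta - \hat\beta) = \lVert X(\beta - \hat\beta) \rVert^2 / SSE$, so the kernel in (50) can be evaluated without inverting any matrix. A minimal sketch of that evaluation on the log scale follows. The function name `log_kernel_beta` and the toy numbers are assumptions, and the normalising constant is deliberately left out.

```python
import math

def log_kernel_beta(X, beta, beta_hat, sse):
    """Log of the unnormalised Mt kernel: -(nu + p)/2 * log(1 + q / nu),
    where q / nu = ||X (beta - beta_hat)||^2 / SSE."""
    n, p = len(X), len(beta)
    diff = [b - bh for b, bh in zip(beta, beta_hat)]
    # X (beta - beta_hat), one entry per observation (X is a list of rows).
    x_diff = [sum(row[j] * diff[j] for j in range(p)) for row in X]
    q_over_nu = sum(v * v for v in x_diff) / sse
    # nu + p = N, so the exponent -(nu + p)/2 equals -N/2.
    return -0.5 * n * math.log1p(q_over_nu)

# Toy design: intercept plus one regressor, N = 5.
X = [[1.0, float(i)] for i in range(5)]
beta_hat = [1.06, 1.97]  # hypothetical OLS estimate
sse = 0.091              # hypothetical sum of squared residuals
at_mode = log_kernel_beta(X, beta_hat, beta_hat, sse)
away = log_kernel_beta(X, [0.5, 2.3], beta_hat, sse)
```

The log kernel is zero at $\beta = \hat\beta$ and negative elsewhere, reflecting that the posterior mode coincides with the OLS estimate.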