Lecture 31: Bayesian Logistic Regression
The instructor of this course owns the copyright of all the course materials. This lecture
material was distributed only to the students attending the course MTH511a: “Statistical
Simulation and Data Analysis” of IIT Kanpur, and should not be distributed in print or
through electronic media without the consent of the instructor. Students can make their own
copies of the course materials for their use.
A popular Bayesian model is Bayesian logistic regression. In this lecture we
present the model and use it to analyze the Titanic dataset from the exam.
Let $x_i \in \mathbb{R}^p$ be the vector of covariates for the $i$th observation and $\beta \in \mathbb{R}^p$ be the corresponding vector of regression coefficients. Suppose response $y_i$ is a realization of $Y_i$ with
$$Y_i \mid x_i, \beta \sim \text{Bern}(p_i) \quad \text{where} \quad p_i = \frac{\exp(x_i^T \beta)}{1 + \exp(x_i^T \beta)}.$$
Since this is a Bayesian model, we also assume that $\beta$ has the following prior distribution:
$$\beta \sim N_p(0, I_p).$$
Our goal is to find the posterior distribution and report the posterior mean and credible
intervals of $\beta$. In order to do this, the first thing we do is write down the posterior
distribution:
$$\pi(\beta \mid y) \propto \pi(\beta) \prod_{i=1}^{n} f(y_i \mid \beta) \propto e^{-\beta^T \beta / 2} \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}.$$
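For the implementation it is convenient to work on the log scale. Taking logs and using $1 - p_i = 1/(1 + e^{x_i^T \beta})$, the log posterior is, up to an additive constant,
$$\log \pi(\beta \mid y) = -\frac{\beta^T \beta}{2} + \sum_{i=1}^{n}\left\{ y_i\, x_i^T \beta - \log\left(1 + e^{x_i^T \beta}\right)\right\} = -\frac{\beta^T \beta}{2} - \sum_{i=1}^{n}\log\left(1 + e^{-x_i^T \beta}\right) - \sum_{i=1}^{n}(1 - y_i)\, x_i^T \beta\,,$$
where the second form is the one implemented in the logf function below.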
The posterior is $p$-dimensional, so here we need to sample from a $p$-dimensional distribution. Our proposal distribution will be
$$q(\beta^* \mid \beta, h) = \prod_{k=1}^{p} q(\beta_k^* \mid \beta_k).$$
So each component is given its own proposal value, independent of the others. We will
use normal distributions for all components, with different step sizes $h_1, \dots, h_p$. So we propose from
$$N_p\left(\beta_t,\;
\begin{bmatrix}
h_1 & 0 & \cdots & 0 \\
0 & h_2 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & h_p
\end{bmatrix}\right).$$
Note that this is a symmetric proposal, so the MH ratio simplifies. Since we already
know the MLE of the logistic regression model, we can start from the MLE solution!
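Since the proposal covariance is diagonal, drawing $\beta^*$ amounts to $p$ independent normal draws, which in R is a single vectorized rnorm call. A minimal sketch (h here is the hypothetical vector of step variances, so the standard deviations are sqrt(h); in the function below the argument prop.sd plays the role of sqrt(h)):

# one proposal draw from N_p(beta, diag(h));
# rnorm recycles the vector of standard deviations component-wise
prop <- rnorm(p, mean = beta, sd = sqrt(h))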
###########################################
## Bayesian logistic regression
## with MH implementation
###########################################
# log posterior (unnormalized), using the rearranged form above
logf <- function(beta)
{
  one.minus.yx <- (1 - y)*X
  -sum(beta^2)/2 - sum(log(1 + exp(-X%*%beta))) - sum(one.minus.yx%*%beta)
}

# MH sampler with componentwise normal proposals.
# (The function header and initialization here are a reconstruction:
# the signature matches the calls below, and the chain starts at the
# MLE as suggested in the text. Assumes X contains its own intercept.)
bayes_logit_mh <- function(y, X, N, prop.sd)
{
  p <- dim(X)[2]
  beta.mat <- matrix(0, nrow = N, ncol = p)
  # start the chain at the MLE
  beta <- as.numeric(coef(glm(y ~ X - 1, family = binomial)))
  beta.mat[1, ] <- beta
  accept <- 0
  for(i in 2:N)
  {
    # draw from the symmetric (normal) proposal density
    prop <- rnorm(p, mean = beta, sd = prop.sd)
    # log MH ratio; the proposal terms cancel by symmetry
    log.rat <- logf(prop) - logf(beta)
    if(log(runif(1)) < log.rat)
    {
      beta <- prop
      accept <- accept + 1
    }
    beta.mat[i, ] <- beta
  }
  print(paste("Acceptance Prob = ", accept/N))
  return(beta.mat)
}
When we run the above function, it automatically prints the acceptance probability.
We now load the dataset.
titanic <- read.csv("https://dvats.github.io/assets/titanic.csv")
y <- titanic[,1]
X <- as.matrix(titanic[, -1])
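Since the sampler starts at the MLE, we can also compute it once for reference. A minimal sketch, assuming the first column of X in the csv is already an intercept column (so no intercept is added in the formula):

# MLE of the logistic regression coefficients, for later comparison
mle <- coef(glm(y ~ X - 1, family = binomial))
mle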
First we will try to find a proposal variance $h$ that works reasonably well to give about
23% acceptance. We will do this by running the sampler for short runs ($10^3$ steps). To
start, we set all the proposal variances to be the same.
## acceptance is too low. we want 23%
## so decrease proposal variance
chain <- bayes_logit_mh(y = y, X = X, N = 1e3, prop.sd = .35)
#[1] "Acceptance Prob = 0"
Notice that our first short runs did not work well since the acceptance rate was too low.
This means we were proposing large jumps, and it would be better to take smaller jumps
to increase the acceptance rate. When we reduced the proposal sd to .0065, we got a decent
acceptance rate. Now we run the same sampler for longer ($10^5$ steps) and print diagnostics.
# will now run the chain much longer for 10^5
# takes a few seconds
chain <- bayes_logit_mh(y = y, X = X, N = 1e5, prop.sd = .0065)
# trace plots of each component
plot.ts(chain)
par(mfrow = c(2,3))
# all ACF plots
for(i in 1:dim(chain)[2])
{
acf(chain[,i], main = paste("ACF of Comp ", i))
}
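The figures below also show estimated marginal posterior densities. A minimal sketch of how such plots can be produced from the chain:

par(mfrow = c(2,3))
# kernel density estimate of each marginal posterior
for(i in 1:dim(chain)[2])
{
  plot(density(chain[,i]), main = paste("Density of Comp ", i))
}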
[Figure: trace plots of the six components (Series 1–6) over the $10^5$ iterations.]
[Figure: ACF plots of components 1–6, lags 0 to 50.]
[Figure: estimated marginal posterior density plots of the six components.]
What we see is that although components 3 and 6 are well estimated with sample
size $10^5$, the other four are mixing very poorly. This is because we chose the same
proposal variance for each component, which is not ideal here. We will now give each
component a different proposal variance and run the sampler again for $10^5$ steps.
# We saw above that some components are OK, but four components are
# moving very slowly. This is because we were using the same proposal
# variance for each component, which is not adequate here.
# Below we use a different proposal variance for each component.
# (The sds here are illustrative placeholders, larger for the four
# slowly mixing components; tune them to get roughly 23% acceptance.)
chain <- bayes_logit_mh(y = y, X = X, N = 1e5,
                        prop.sd = c(.05, .05, .0065, .05, .05, .0065))
# trace plots of each component
plot.ts(chain)
par(mfrow = c(2,3))
# all ACF plots
for(i in 1:dim(chain)[2])
{
  acf(chain[,i], main = paste("ACF of Comp ", i))
}
[Figure: trace plots of the six components (Series 1–6) for the run with componentwise proposal variances.]
[Figure: ACF plots of components 1–6, lags 0 to 50, for the second run.]
[Figure: estimated marginal posterior density plots of the six components for the second run.]
The estimated density plots, ACFs, and trace plots are much better!
Thus, we see that MCMC, although powerful, can be difficult to tune. However, once
you make it work, it works reasonably well.
2 Questions to think about
• Try to implement MCMC for the Bayesian regression model.
• Obtain the posterior mean and quantiles for the example implemented above; a
sketch follows below. How do the final estimates compare to the MLE estimates?
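As a starting point for the second question, a minimal sketch using the chain from the long run (again assuming X carries its own intercept column):

# posterior means and 95% credible intervals from the MCMC output
colMeans(chain)
apply(chain, 2, quantile, probs = c(0.025, 0.975))

# MLE estimates for comparison
coef(glm(y ~ X - 1, family = binomial))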