
MTH 511a - 2020: Lecture 31

Instructor: Dootika Vats

The instructor of this course owns the copyright of all the course materials. This lecture
material was distributed only to the students attending the course MTH511a: “Statistical
Simulation and Data Analysis” of IIT Kanpur, and should not be distributed in print or
through electronic media without the consent of the instructor. Students can make their own
copies of the course materials for their use.
A popular Bayesian model is the Bayesian logistic regression model. In this lecture we
will present the model and analyze the Titanic dataset from the exam.

1 Bayesian logistic regression


Consider a Bayesian logistic regression model. For $i = 1, \dots, n$, let
$$x_i = (1, x_{i2}, \dots, x_{i(p-1)})^T$$
be the vector of covariates for the $i$th observation and $\beta \in \mathbb{R}^p$ be the corresponding vector of regression coefficients. Suppose response $y_i$ is a realization of $Y_i$ with
$$Y_i \mid x_i, \beta \sim \text{Bern}(p_i) \quad \text{where} \quad p_i = \frac{\exp(x_i^T\beta)}{1 + \exp(x_i^T\beta)}.$$
Since this is a Bayesian model, we also assume that $\beta$ has the following prior distribution:
$$\beta \sim N_p(0, I_p).$$

Our goal is to find the posterior distribution and report the posterior mean and credible intervals of $\beta$. In order to do this, the first thing we do is write down the posterior distribution:
$$\pi(\beta \mid y) \propto \pi(\beta) \prod_{i=1}^{n} f(y_i \mid \beta) \propto e^{-\beta^T\beta/2} \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}.$$
The posterior is $p$-dimensional, so here we need to sample from a $p$-dimensional distribution. Our proposal distribution will be
$$q(\beta^* \mid \beta, h^2) = \prod_{k=1}^{p} q(\beta_k^* \mid \beta_k).$$
So each component is given its own proposal value, independent of the others. We will use normal distributions for all components, with different step sizes $h_1, \dots, h_p$. That is, we propose from
$$N_p\left(\beta_t,\ \begin{pmatrix} h_1 & \cdots & 0 & 0 \\ 0 & h_2 & 0 & 0 \\ \vdots & & \ddots & 0 \\ 0 & 0 & 0 & h_p \end{pmatrix}\right).$$

Note that this is a symmetric proposal, so the MH ratio simplifies. Since we already know how to find the MLE of the logistic regression model, we can start from the MLE solution!
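Concretely, since the proposal is symmetric, $q(\beta^* \mid \beta) = q(\beta \mid \beta^*)$, the proposal densities cancel and the acceptance probability is just a ratio of unnormalized posteriors:
$$\alpha(\beta, \beta^*) = \min\left\{1,\ \frac{\pi(\beta^* \mid y)}{\pi(\beta \mid y)}\right\}.$$
In the code we work on the log scale, accepting when $\log U < \log \pi(\beta^* \mid y) - \log \pi(\beta \mid y)$ for $U \sim U(0, 1)$.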
###########################################
## Bayesian logistic regression
## with MH implementation
###########################################

# log of the (unnormalized) posterior
# note: y and X are taken from the global environment
logf <- function(beta)
{
  one.minus.yx <- (1 - y)*X
  # log prior + log likelihood, up to additive constants
  -sum(beta^2)/2 - sum(log(1 + exp(-X%*%beta))) - sum(one.minus.yx%*%beta)
}

bayes_logit_mh <- function(y, X, N = 1e4, prop.sd = .35)
{
  p <- dim(X)[2]

  # starting value is the MLE (glm adds no intercept since X
  # already carries an intercept column)
  foo <- glm(y ~ X - 1, family = binomial("logit"))$coef
  beta <- as.matrix(foo, ncol = 1)

  beta.mat <- matrix(0, nrow = N, ncol = p)
  beta.mat[1, ] <- as.numeric(beta)
  accept <- 0

  for(i in 2:N)
  {
    # symmetric proposal density; prop.sd can be a scalar or a
    # vector of length p (one step size per component)
    prop <- rnorm(p, mean = beta, sd = prop.sd)

    # log of the MH ratio (logf uses the global y and X)
    log.rat <- logf(prop) - logf(beta)
    if(log(runif(1)) < log.rat)
    {
      beta <- prop
      accept <- accept + 1
    }
    beta.mat[i, ] <- beta
  }
  print(paste("Acceptance Prob = ", accept/N))
  return(beta.mat)
}

When we run the above function, it automatically prints the acceptance probability.
We now load the dataset.
titanic <- read.csv("https://dvats.github.io/assets/titanic.csv")

y <- titanic[,1]
X <- as.matrix(titanic[, -1])
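Before tuning, a quick sanity check of the data objects can help. This is a minimal sketch; it assumes, as the indexing above implies, that the first column of the csv is the binary response and the remaining columns (including an intercept column of 1s) form the design matrix.

# quick look at the data objects
dim(X)    # n observations by p columns of the design matrix
head(X)   # if the file ships an intercept column, the first column is all 1s
table(y)  # response should be coded 0/1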

First we will try to find a proposal variance $h$ that works reasonably well, giving about 23% acceptance. We will do this by running the sampler for short ($10^3$) runs. First we set all proposal variances to be the same.
## acceptance is too low. we want 23%
## so decrease proposal variance
chain <- bayes_logit_mh(y = y, X = X, N = 1e3, prop.sd = .35)
#[1] "Acceptance Prob = 0"

# still too low
chain <- bayes_logit_mh(y = y, X = X, N = 1e3, prop.sd = .1)
#[1] "Acceptance Prob = 0"

# now it's better
chain <- bayes_logit_mh(y = y, X = X, N = 1e3, prop.sd = .0065)
#[1] "Acceptance Prob = 0.218"

Notice our first two runs did not work well since the acceptance rate was too low. This means we were proposing jumps that were too large; smaller jumps would increase acceptance. When we reduced the proposal sd to .0065 we got a decent acceptance rate. Now we will run this same sampler for longer ($10^5$ steps) and print diagnostics.
# will now run the chain much longer for 10^5
# takes a few seconds
chain <- bayes_logit_mh(y = y, X = X, N = 1e5, prop.sd = .0065)

# all trace plots

plot.ts(chain)

par(mfrow = c(2,3))
# all ACF plots
for(i in 1:dim(chain)[2])
{
acf(chain[,i], main = paste("ACF of Comp ", i))
}

# all density plots
for(i in 1:dim(chain)[2])
{
plot(density(chain[,i]), main = paste("Density of Comp ", i))
}
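Alongside these plots, mixing can also be quantified numerically with the effective sample size of each component; here is a minimal sketch using the coda package (not part of the lecture code):

# effective sample size of each component (larger is better)
library(coda)
effectiveSize(as.mcmc(chain))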

[Figure: trace plots of Series 1-6 against Time.]

[Figure: ACF plots of Components 1-6, lags 0 to 50.]

[Figure: estimated density plots of Components 1-6, each based on N = 100000 samples.]

What we see is that although components 3 and 6 are well estimated with sample size $10^5$, the other four components are mixing very poorly. This is because we chose the same proposal variance for each component, which is not ideal here. We will now give each component a different proposal variance (passed as a vector of step sizes, which rnorm recycles componentwise) and run the sampler again for $10^5$ steps.

# we see above that some components are ok, but 4 components are
# moving very slowly. This is because we are using the same proposal
# variance for each component, which is not adequate here.
# Below now I use different proposal variances for different
# components.

chain <- bayes_logit_mh(y = y, X = X, N = 1e5,
                        prop.sd = c(.08, .08, .0065, .03, .03, .0065))

# all trace plots
plot.ts(chain, main = "Trace plots")

par(mfrow = c(2,3))
# all ACF plots
for(i in 1:dim(chain)[2])
{
acf(chain[,i], main = paste("ACF of Comp ", i))
}

# all density plots
for(i in 1:dim(chain)[2])
{
plot(density(chain[,i]), main = paste("Density of Comp ", i))
}

[Figure: trace plots of Series 1-6 against Time.]

[Figure: ACF plots of Components 1-6, lags 0 to 50.]

[Figure: estimated density plots of Components 1-6, each based on N = 100000 samples.]

The estimated density plots, ACF plots, and trace plots are much better!
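Since the chain now mixes well, here is a minimal sketch of how one might report the posterior means and 95% credible intervals of $\beta$ from the stored samples; the burn-in of 1000 steps is an arbitrary illustrative choice, and comparing the results to the MLE is left as an exercise below.

# posterior summaries from the MCMC output
burn <- 1000                      # discard an initial stretch (arbitrary choice)
post <- chain[-(1:burn), ]
colMeans(post)                    # posterior mean of each component
apply(post, 2, quantile, probs = c(.025, .975))  # 95% credible intervals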
Thus, we see that MCMC, although powerful, can be difficult to tune. However, once you make it work, it works reasonably well.

2 Questions to think about
• Try and implement MCMC for the Bayesian regression model.
• Obtain posterior mean and quantiles for the above implemented example. How
do the final estimates compare to the MLE estimates?
