Package 'BayesLogit': R Topics Documented
P(y_i = 1 \mid x_i, \beta) = (1 + \exp(-x_i^T \beta))^{-1}.
Instead of representing data as a collection of binary outcomes, one may record the average response y_i at each unique x_i given a total number of n_i observations at x_i. We follow this method of encoding data.
Value
logit returns a list.
beta A samp x P array; the posterior sample of the regression coefficients.
w A samp x N' array; the posterior sample of the latent variable. WARNING: N' may be less than N if data is combined.
y The response matrix; different from the input if data is combined.
X The design matrix; different from the input if data is combined.
n The number of samples at each observation; different from the input if data is combined.
References
Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference for logistic models using
Polya-Gamma latent variables. http://arxiv.org/abs/1205.0310
Nicholas G. Polson and James G. Scott. Default Bayesian analysis for multi-way tables: a data-augmentation approach. http://arxiv.org/pdf/1109.4180
See Also
rpg, logit.EM, mlogit
Examples
## From UCI Machine Learning Repository.
data(spambase);
## A subset of the data.
sbase = spambase[seq(1,nrow(spambase),10),];
X = model.matrix(is.spam ~ word.freq.free + word.freq.1999, data=sbase);
y = sbase$is.spam;
## Run logistic regression.
output = logit(y, X, samp=1000, burn=100);
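The posterior draws can be summarized directly with base R; a minimal sketch, assuming output$beta is the samp x P array described under Value:
## Posterior means and 95% credible intervals for the coefficients
## (a sketch using base R; output$beta is a samp x P matrix here).
colMeans(output$beta);
apply(output$beta, 2, quantile, probs=c(0.025, 0.975));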
logit.combine Collapse Data for Binomial Logistic Regression
Description
Collapse data for binomial logistic regression.
Usage
logit.combine(y, X, n=rep(1,length(y)))
Arguments
y An N dimensional vector; y_i is the average response at x_i.
X An N x P dimensional design matrix; x_i is the ith row.
n An N dimensional vector; n_i is the number of observations at each x_i.
Details
Logistic regression is a classification mechanism. Given the binary data {y_i} and the p-dimensional predictor variables {x_i}, one wants to forecast whether a future data point y* observed at the predictor x* will be zero or one. Logistic regression stipulates that the statistical model for observing a success=1 or failure=0 is governed by

P(y_i = 1 \mid x_i, \beta) = (1 + \exp(-x_i^T \beta))^{-1}.
Instead of representing data as a collection of binary outcomes, one may record the average response y_i at each unique x_i given a total number of n_i observations at x_i.

Thus, when a predictor is repeated, the two responses may be collapsed into a single observation representing multiple trials. This function collapses data in this way.
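As a toy illustration of this collapsing (a sketch, not taken from the package examples): two identical predictor rows merge into one observation with two trials.
## Rows 1 and 2 of X are identical; logit.combine merges them into a
## single observation with n = 2 and average response (1 + 0) / 2 = 0.5.
y = c(1, 0, 1);
X = matrix(c(1, 1,
             1, 1,
             1, 2), ncol=2, byrow=TRUE);
new.data = logit.combine(y, X);
new.data$n  ## one entry of 2 and one of 1
new.data$y  ## averages 0.5 and 1 at the unique rows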
Value
logit.combine returns a list.
y The new response.
X The new design matrix.
n The number of samples at each revised observation.
See Also
logit, logit.EM, mlogit
Examples
## From UCI Machine Learning Repository.
data(spambase);
## A subset of the data.
sbase = spambase[seq(1,nrow(spambase),10),];
X = model.matrix(is.spam ~ word.freq.free + word.freq.1999, data=sbase);
y = sbase$is.spam;
## Actually unnecessary as logit.EM automatically tries to compress.
new.data = logit.combine(y, X)
mode.spam = logit.EM(new.data$y, new.data$X, new.data$n)
mode.spam
logit.EM Logistic Regression Expectation Maximization
Description
Expectation maximization for logistic regression.
Usage
logit.EM(y, X, n=rep(1,length(y)), tol=1e-9, max.iter=100)
Arguments
y An N dimensional vector; y_i is the average response at x_i.
X An N x P dimensional design matrix; x_i is the ith row.
n An N dimensional vector; n_i is the number of observations at each x_i.
tol Threshold at which algorithm stops.
max.iter Maximum number of iterations.
Details
Logistic regression is a classification mechanism. Given the binary data {y_i} and the p-dimensional predictor variables {x_i}, one wants to forecast whether a future data point y* observed at the predictor x* will be zero or one. Logistic regression stipulates that the statistical model for observing a success=1 or failure=0 is governed by

P(y_i = 1 \mid x_i, \beta) = (1 + \exp(-x_i^T \beta))^{-1}.
Instead of representing data as a collection of binary outcomes, one may record the average response y_i at each unique x_i given a total number of n_i observations at x_i. We follow this method of encoding data.
A non-informative prior is used.
Value
beta The posterior mode.
iter The number of iterations.
References
Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference for logistic models using
Polya-Gamma latent variables. http://arxiv.org/abs/1205.0310
Nicholas G. Polson and James G. Scott. Default Bayesian analysis for multi-way tables: a data-augmentation approach. http://arxiv.org/pdf/1109.4180
See Also
rpg, logit, mlogit
Examples
## From UCI Machine Learning Repository.
data(spambase);
## A subset of the data.
sbase = spambase[seq(1,nrow(spambase),10),];
X = model.matrix(is.spam ~ word.freq.free + word.freq.1999, data=sbase);
y = sbase$is.spam;
## Run logistic regression.
output = logit.EM(y, X);
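Because the prior is non-informative, the EM mode should essentially agree with the maximum likelihood estimate; a rough check against glm (a sketch, not from the package examples):
## With a flat prior the posterior mode coincides with the MLE, so the
## two columns below should match closely.
glm.fit = glm(y ~ X - 1, family=binomial);
cbind(em=output$beta, mle=coef(glm.fit));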
mlogit Bayesian Multinomial Logistic Regression
Description
Run a Bayesian multinomial logistic regression.
Usage
mlogit(y, X, n=rep(1,nrow(as.matrix(y))),
m.0=array(0, dim=c(ncol(X), ncol(y))),
P.0=array(0, dim=c(ncol(X), ncol(X), ncol(y))),
samp=1000, burn=500)
Arguments
y An N x (J-1) dimensional matrix; y_ij is the average response for category j at x_i.
X An N x P dimensional design matrix; x_i is the ith row.
n An N dimensional vector; n_i is the total number of observations at each x_i.
m.0 A P x (J-1) matrix with the beta_j's prior means.
P.0 A P x P x (J-1) array of matrices with the beta_j's prior precisions.
samp The number of MCMC iterations saved.
burn The number of MCMC iterations discarded.
Details
Multinomial logistic regression is a classification mechanism. Given the multinomial data {y_i} with J categories and the p-dimensional predictor variables {x_i}, one wants to forecast the category of a future data point y* at the predictor x*. Multinomial logistic regression stipulates that the statistical model for observing a draw of category j after rolling the multinomial die n_i = 1 time is governed by

P(y_i = j \mid x_i, \beta, n_i = 1) = \frac{\exp(x_i^T \beta_j)}{\sum_{k=1}^{J} \exp(x_i^T \beta_k)}.

Instead of representing data as the total number of responses in each category, one may record the average number of responses in each category and the total number of responses n_i at x_i. We follow this method of encoding data.

We assume that beta_J = 0 for purposes of identification!

You may use mlogit for binary logistic regression with a normal prior.
Value
mlogit returns a list.
beta A samp x P x (J-1) array; the posterior sample of the regression coefficients.
w A samp x N' x (J-1) array; the posterior sample of the latent variable. WARNING: N' may be less than N if data is combined.
y The response matrix; different from the input if data is combined.
X The design matrix; different from the input if data is combined.
n The number of samples at each observation; different from the input if data is combined.
References
Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference for logistic models using
Polya-Gamma latent variables. http://arxiv.org/abs/1205.0310
See Also
rpg, logit.EM, logit
Examples
## Use the iris dataset.
data(iris)
N = nrow(iris)
P = ncol(iris)
J = nlevels(iris$Species)
X = model.matrix(Species ~ ., data=iris);
y.all = model.matrix(~ Species - 1, data=iris);
y = y.all[,-J];
out = mlogit(y, X, samp=1000, burn=100);
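Fitted category probabilities can be recovered from the output; a minimal sketch (an illustration, not from the package examples), appending the constraint beta_J = 0 for the reference category:
## Posterior-mean coefficients (P x J-1), then a softmax over categories
## with the identifying restriction beta_J = 0 appended as a zero column.
beta.hat = apply(out$beta, c(2, 3), mean);
eta = cbind(X %*% beta.hat, 0);
probs = exp(eta) / rowSums(exp(eta));  ## N x J fitted probabilities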
rain Tokyo Rainfall
Description
The rain data has 366 binomial responses representing the days on which it rained over the course of two years in Tokyo.
Format
A list with three components: y, the response representing the number of times it rained on the ith day of the year in Tokyo from 1983 through 1984; X, a matrix of ones; and n, the number of observations for each day. n is two for all days except Feb. 29.
Details
This is the infamous Tokyo rainfall data from Kitagawa (1987).
References
Genshiro Kitagawa. Non-Gaussian State-Space Modeling of Non-stationary Time Series. Journal
of the American Statistical Association (1987).
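A minimal usage sketch (not part of the original manual), assuming the list components described under Format and assuming logit accepts an n argument analogous to logit.EM's; if y is stored as counts, it is converted to the average response that logit expects:
## Fit the binomial logit model to the Tokyo rainfall data.
data(rain);
out = logit(rain$y / rain$n, rain$X, n=rain$n, samp=1000, burn=100);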
rks The Kolmogorov-Smirnov distribution
Description
Generate a random variate from the Kolmogorov-Smirnov distribution.
This is not directly related to the Polya-Gamma technique, but it is a nice example of using an
alternating sum to generate a random variate.
Usage
rks(N=1)
Arguments
N The number of random variates to generate.
Details
The density function of the KS distribution is
f(x) = 8 \sum_{n=1}^{\infty} (-1)^{n+1} n^2 x \, e^{-2 n^2 x^2}.
We follow Devroye (1986) p. 161 to generate random draws from KS.
References
L. Devroye. Non-Uniform Random Variate Generation, 1986.
Examples
X = rks(1000)
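A quick sanity check (a sketch, not from the original examples): compare the empirical CDF of the draws with the truncated series form of the KS CDF, F(x) = 1 - 2 \sum_{n \ge 1} (-1)^{n+1} e^{-2 n^2 x^2}.
## Truncated series approximation to the KS distribution function.
ks.cdf = function(x, K=100) {
  n = 1:K;
  1 - 2 * sum((-1)^(n+1) * exp(-2 * n^2 * x^2));
}
X = rks(10000);
mean(X <= 1.0);  ## empirical CDF at 1.0
ks.cdf(1.0);     ## series value; the two should be close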
rpg The Polya-Gamma Distribution
Description
Generate a random variate from the Polya-Gamma distribution.
Usage
rpg(num=1, h=1, z=0.0)
rpg.gamma(num=1, h=1, z=0.0, trunc=200)
rpg.devroye(num=1, n=1, z=0.0)
rpg.alt(num=1, h=1, z=0.0)
rpg.sp(num=1, h=1, z=0.0, track.iter=FALSE)
Arguments
You may call rpg when n and z are vectors.
num The number of random variates to simulate.
n Shape parameter; a positive integer.
h Shape parameter; h >= 1 if not using the sum of gammas method.
z Parameter associated with tilting.
trunc The number of elements used in the sum of gammas approximation.
track.iter Whether to track the number of proposals made before accepting.
Details
A random variable X with distribution PG(n, z) is generated by

X \sim \sum_{k=1}^{\infty} \frac{G(n, 1)}{2 \pi^2 (k - 1/2)^2 + z^2 / 2},

where the G(n, 1) terms are independent gamma random variables.
The density for X may be derived from z and PG(n, 0) as

p(x \mid n, z) \propto \exp(-x z^2 / 2) \, p(x \mid n, 0).

Thus PG(n, z) is an exponentially tilted PG(n, 0).
Two different methods for generating this random variable are implemented. In general, you may
use rpg.gamma to generate an approximation of PG(n,z) using the sum of Gammas representation
above. When n is a natural number you may use rpg.devroye to sample PG(n, z). The latter method is fast.
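The sum of gammas representation is straightforward to approximate by truncation; a hand-rolled sketch (an illustration, not the package's implementation):
## Truncated sum of gammas draw from PG(n, z); compare with rpg.gamma.
pg.approx = function(n, z, trunc=200) {
  k = 1:trunc;
  g = rgamma(trunc, shape=n, rate=1);
  sum(g / (2 * pi^2 * (k - 1/2)^2 + z^2 / 2));
}
draws = replicate(1000, pg.approx(1, 0));
mean(draws);  ## should be near E[PG(1, 0)] = 1/4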
Value
This function returns num Polya-Gamma samples.
References
Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference for logistic models using
Polya-Gamma latent variables. http://arxiv.org/abs/1205.0310
See Also
logit.EM, logit, mlogit
Examples
h = c(1, 2, 3);
z = c(4, 5, 6);
## Devroye-like method -- only use if h contains integers, preferably small integers.
X = rpg.devroye(100, h, z);
h = c(1.2, 2.3, 3.2);
z = c(4, 5, 6);
## Sum of gammas method -- this is slow.
X = rpg.gamma(100, h, z);
## Hybrid method -- automatically chooses best procedure.
X = rpg(100, h, z);
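A quick moment check (a sketch, not from the original examples), using the known mean E[PG(h, z)] = h / (2 z) tanh(z / 2):
## The sample mean of the draws should be close to the theoretical mean.
h = 1; z = 2;
X = rpg(10000, h, z);
mean(X);                    ## sample mean
h / (2 * z) * tanh(z / 2);  ## theoretical mean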
spambase Spambase Data
Description
The spambase data has 57 real valued explanatory variables which characterize the contents of an email and one binary response variable indicating if the email is spam. There are 4601 observations.
Format
A data frame: the first column is a binary response variable indicating if the email is spam. The remaining 57 columns are real valued explanatory variables.
Details
Of the 57 explanatory variables, 48 describe word frequency, 6 describe character frequency, and 3
describe sequences of capital letters.
word.freq.<word> A continuous explanatory variable describing the frequency with which the word <word> appears; measured in percent.
char.freq.<char> A continuous explanatory variable describing the frequency with which the character <char> appears; measured in percent.
capital.run.length.<oper> A statistic involving the length of consecutive capital letters.
Use names to see the specific words, characters, or statistics for each respective class of variable.
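For instance (a brief sketch), the word-frequency columns can be listed with:
data(spambase);
names(spambase)[grep("^word.freq", names(spambase))];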
References
Mark Hopkins, Erik Reeber, George Forman, and Jaap Suermondt of Hewlett-Packard Labs (1999).
Spambase Data Set. http://archive.ics.uci.edu/ml/datasets/Spambase
Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.