Beta Regression Models
Francisco Cribari-Neto
Achim Zeileis
Universität Innsbruck
Abstract
This introduction to the R package betareg is a (slightly) modified version of Cribari-Neto and Zeileis (2010), published in the Journal of Statistical Software. A follow-up paper with various extensions is Grün, Kosmidis, and Zeileis (2012), a slightly modified version of which is also provided within the package as vignette("betareg-ext", package = "betareg").
The class of beta regression models is commonly used by practitioners to model variables that assume values in the standard unit interval (0, 1). It is based on the assumption
that the dependent variable is beta-distributed and that its mean is related to a set of
regressors through a linear predictor with unknown coefficients and a link function. The
model also includes a precision parameter which may be constant or depend on a (potentially different) set of regressors through a link function as well. This approach naturally
incorporates features such as heteroskedasticity or skewness which are commonly observed
in data taking values in the standard unit interval, such as rates or proportions. This paper
describes the betareg package which provides the class of beta regressions in the R system
for statistical computing. The underlying theory is briefly outlined, and the implementation is discussed and illustrated in various replication exercises.
1. Introduction
How should one perform a regression analysis in which the dependent variable (or response
variable), y, assumes values in the standard unit interval (0, 1)? The usual practice used to be
to transform the data so that the transformed response, say ỹ, assumes values in the real line and then apply a standard linear regression analysis. A commonly used transformation is the logit, ỹ = log(y/(1 − y)). This approach, nonetheless, has shortcomings. First, the regression parameters are interpretable in terms of the mean of ỹ, and not in terms of the mean of y (given Jensen's inequality). Second, regressions involving data from the unit interval such
as rates and proportions are typically heteroskedastic: they display more variation around
the mean and less variation as we approach the lower and upper limits of the standard unit
interval. Finally, the distributions of rates and proportions are typically asymmetric, and
thus Gaussian-based approximations for interval estimation and hypothesis testing can be
quite inaccurate in small samples. Ferrari and Cribari-Neto (2004) proposed a regression
model for continuous variates that assume values in the standard unit interval, e.g., rates,
proportions, or concentration indices. Since the model is based on the assumption that the
response is beta-distributed, they called their model the beta regression model. In their model,
the regression parameters are interpretable in terms of the mean of y (the variable of interest)
Figure 1: Probability density functions for beta distributions with varying parameters μ = 0.10, 0.25, 0.50, 0.75, 0.90 and φ = 5 (left) and φ = 100 (right).
and the model is naturally heteroskedastic and easily accommodates asymmetries. A variant of
the beta regression model that allows for nonlinearities and variable dispersion was proposed
by Simas, Barreto-Souza, and Rocha (2010). In particular, in this more general model, the
parameter accounting for the precision of the data is not assumed to be constant across
observations but it is allowed to vary, leading to the variable dispersion beta regression model.
The chief motivation for the beta regression model lies in the flexibility delivered by the
assumed beta law. The beta density can assume a number of different shapes depending on
the combination of parameter values, including left- and right-skewed or the flat shape of the
uniform density (which is a special case of the more general beta density). This is illustrated
in Figure 1 which depicts several different beta densities. Following Ferrari and Cribari-Neto (2004), the densities are parameterized in terms of the mean μ and the precision parameter φ; all details are explained in the next section. The evident flexibility makes the beta distribution
an attractive candidate for data-driven statistical modeling.
The idea underlying beta regression models dates back to earlier approaches such as Williams
(1982) or Prentice (1986). The initial motivation was to model binomial random variables with
extra variation. The model postulated for the (discrete) variate of interest included a more
flexible variation structure determined by independent beta-distributed variables which are
related to a set of independent variables through a regression structure. However, unlike the
more recent literature, the main focus was to model binomial random variables. Our interest
in what follows will be more closely related to the recent literature, i.e., modeling continuous
random variables that assume values in (0, 1), such as rates, proportions, and concentration
or inequality indices (e.g., Gini).
In this paper, we describe the betareg package which can be used to perform inference in both
fixed and variable dispersion beta regressions. The package is implemented in the R system for
statistical computing (R Development Core Team 2009) and available from the Comprehensive
R Archive Network (CRAN) at http://CRAN.R-project.org/package=betareg. The initial
version of the package was written by Simas and Rocha (2006) up to version 1.2 which was
orphaned and archived on CRAN in mid-2009. Starting from version 2.0-0, Achim Zeileis
took over maintenance after rewriting/extending the package's functionality.
The paper unfolds as follows: Section 2 outlines the theory underlying the beta regression
model before Section 3 describes its implementation in R. Sections 4 and 5 provide various
empirical applications: The former focuses on illustrating various aspects of beta regressions
in practice while the latter provides further replications of previously published empirical
research. Finally, Section 6 contains concluding remarks and directions for future research
and implementation.
2. Beta regression
The class of beta regression models, as introduced by Ferrari and Cribari-Neto (2004), is
useful for modeling continuous variables y that assume values in the open standard unit
interval (0, 1). Note that if the variable takes on values in (a, b) (with a < b known) one can model (y − a)/(b − a). Furthermore, if y also assumes the extremes 0 and 1, a useful transformation in practice is (y · (n − 1) + 0.5)/n where n is the sample size (Smithson and Verkuilen 2006).
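Both transformations can be applied in a couple of lines of base R; the helper names below are illustrative, not part of betareg:

```r
## Map a variable observed on (a, b) onto the unit interval.
rescale01 <- function(y, a, b) (y - a) / (b - a)

## Smithson-Verkuilen squeeze for data that include 0 and/or 1:
## shrinks all observations slightly toward 0.5 so they fall in (0, 1).
squeeze01 <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y <- c(0, 0.25, 0.8, 1)
squeeze01(y)  # 0.125 0.3125 0.725 0.875
```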
The beta regression model is based on an alternative parameterization of the beta density in terms of the variate mean and a precision parameter. The beta density is usually expressed as

f(y; p, q) = Γ(p + q) / {Γ(p) Γ(q)} · y^(p−1) (1 − y)^(q−1),   0 < y < 1,

where p, q > 0 and Γ(·) is the gamma function.¹ Ferrari and Cribari-Neto (2004) proposed a different parameterization by setting μ = p/(p + q) and φ = p + q:

f(y; μ, φ) = Γ(φ) / {Γ(μφ) Γ((1 − μ)φ)} · y^(μφ−1) (1 − y)^((1−μ)φ−1),   0 < y < 1,

with 0 < μ < 1 and φ > 0. We write y ~ B(μ, φ). Here, E(y) = μ and VAR(y) = μ(1 − μ)/(1 + φ). The parameter φ is known as the precision parameter since, for fixed μ, the larger φ the smaller the variance of y; φ⁻¹ is a dispersion parameter.
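In base R terms, B(μ, φ) corresponds to dbeta() with shape1 = μφ and shape2 = (1 − μ)φ; a quick sanity check of the moment formulas (a sketch, independent of betareg):

```r
## Density of B(mu, phi) expressed through the standard shape parameters.
dbeta_muphi <- function(y, mu, phi) {
  dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

mu <- 0.3; phi <- 5
set.seed(1)
x <- rbeta(1e5, mu * phi, (1 - mu) * phi)
mean(x)  # close to E(y) = mu = 0.3
var(x)   # close to mu * (1 - mu) / (1 + phi) = 0.035
```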
Let y₁, …, yₙ be a random sample such that yᵢ ~ B(μᵢ, φ), i = 1, …, n. The beta regression model is defined as

g(μᵢ) = xᵢ⊤β = ηᵢ,

where β = (β₁, …, βₖ)⊤ is a k × 1 vector of unknown regression parameters (k < n), xᵢ = (xᵢ₁, …, xᵢₖ)⊤ is the vector of k regressors (or independent variables or covariates) and ηᵢ is a linear predictor (i.e., ηᵢ = β₁xᵢ₁ + ⋯ + βₖxᵢₖ; usually xᵢ₁ = 1 for all i so that the model has an intercept). Here, g(·) : (0, 1) → ℝ is a link function, which is strictly increasing and twice differentiable. The main motivation for using a link function in the regression structure
¹ A beta regression model based on this parameterization was proposed by Vasconcellos and Cribari-Neto (2005). We shall, however, focus on the parameterization indexed by the mean and a precision parameter.
is twofold. First, both sides of the regression equation assume values in the real line when a link function is applied to μᵢ. Second, there is an added flexibility since the practitioner can choose the function that yields the best fit. Some useful link functions are: logit g(μ) = log{μ/(1 − μ)}; probit g(μ) = Φ⁻¹(μ), where Φ(·) is the standard normal distribution function; complementary log-log g(μ) = log{− log(1 − μ)}; log-log g(μ) = − log{− log(μ)}; and Cauchy g(μ) = tan{π(μ − 0.5)}. Note that the variance of y is a function of μ which renders the regression model based on this parameterization naturally heteroskedastic. In particular,

VAR(yᵢ) = μᵢ(1 − μᵢ)/(1 + φ) = g⁻¹(xᵢ⊤β)[1 − g⁻¹(xᵢ⊤β)]/(1 + φ).   (1)

The log-likelihood is ℓ(β, φ) = Σⁿᵢ₌₁ ℓᵢ(μᵢ, φ), where

ℓᵢ(μᵢ, φ) = log Γ(φ) − log Γ(μᵢφ) − log Γ((1 − μᵢ)φ) + (μᵢφ − 1) log yᵢ + {(1 − μᵢ)φ − 1} log(1 − yᵢ).   (2)

Notice that μᵢ = g⁻¹(xᵢ⊤β) is a function of β, the vector of regression parameters. Parameter estimation is performed by maximum likelihood (ML).
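To make Equation 2 concrete, the log-likelihood can be coded and maximized directly with optim(); this is only a didactic sketch of the ML task (with a logit mean link and log-parameterized φ), not the actual betareg internals:

```r
## Negative log-likelihood of the fixed-dispersion beta regression,
## logit mean link; par = c(beta, log(phi)) so that phi stays positive.
negll <- function(par, y, X) {
  k <- ncol(X)
  beta <- par[1:k]
  phi <- exp(par[k + 1])
  mu <- plogis(drop(X %*% beta))
  -sum(lgamma(phi) - lgamma(mu * phi) - lgamma((1 - mu) * phi) +
       (mu * phi - 1) * log(y) + ((1 - mu) * phi - 1) * log(1 - y))
}

## Simulated example with true beta = c(-1, 2) and phi = 20.
set.seed(123)
n <- 200
x <- runif(n)
mu <- plogis(-1 + 2 * x)
y <- rbeta(n, mu * 20, (1 - mu) * 20)
X <- cbind(1, x)
fit <- optim(c(0, 0, log(10)), negll, y = y, X = X, method = "BFGS")
c(fit$par[1:2], exp(fit$par[3]))  # roughly -1, 2, 20
```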
An extension of the beta regression model above which was employed by Smithson and Verkuilen (2006) and formally introduced (along with further extensions) by Simas et al. (2010) is the variable dispersion beta regression model. In this model the precision parameter is not constant for all observations but instead modeled in a similar fashion as the mean parameter. More specifically, yᵢ ~ B(μᵢ, φᵢ) independently, i = 1, …, n, and

g₁(μᵢ) = η₁ᵢ = xᵢ⊤β,   (3)
g₂(φᵢ) = η₂ᵢ = zᵢ⊤γ,   (4)

where zᵢ is a vector of regressors for the precision equation with associated coefficients γ. A natural choice of residuals in this model is the Pearson residual

rᵢ = (yᵢ − μ̂ᵢ) / VAR̂(yᵢ)^(1/2),   (5)

where VAR̂(yᵢ) = μ̂ᵢ(1 − μ̂ᵢ)/(1 + φ̂ᵢ), μ̂ᵢ = g₁⁻¹(xᵢ⊤β̂), and φ̂ᵢ = g₂⁻¹(zᵢ⊤γ̂). Espinheira et al. (2008b) proposed standardized weighted residuals, the second variant of which is

rᵢ = (yᵢ* − μ̂ᵢ*) / {v̂ᵢ(1 − hᵢᵢ)}^(1/2),   (6)

where yᵢ* = log{yᵢ/(1 − yᵢ)}, μᵢ* = ψ(μᵢφᵢ) − ψ((1 − μᵢ)φᵢ), vᵢ = ψ′(μᵢφᵢ) + ψ′((1 − μᵢ)φᵢ), ψ(·) denotes the digamma function, and hᵢᵢ is the i-th diagonal element of a hat matrix (see Espinheira et al. 2008b for details). Similarly, deviance residuals can be defined in the standard way via signed contributions to the excess likelihood.
3. Implementation in R
To turn the conceptual model from the previous section into computational tools in R, it helps
to emphasize some properties of the model: It is a standard maximum likelihood (ML) task
for which there is no closed-form solution but numerical optimization is required. Furthermore, the model shares some properties (such as linear predictor, link function, dispersion
parameter) with generalized linear models (GLMs; McCullagh and Nelder 1989), but it is not
a special case of this framework (not even for fixed dispersion). There are various models with implementations in R that have similar features; here, we specifically reuse some of the ideas employed for generalized count data regression by Zeileis, Kleiber, and Jackman (2008).
The main model-fitting function in betareg is betareg() which takes a fairly standard approach for implementing ML regression models in R: formula plus data is used for model and
data specification, then the likelihood and corresponding gradient (or estimating function)
is set up, optim() is called for maximizing the likelihood, and finally an object of S3 class
betareg is returned for which a large set of methods to standard generics is available. The
workhorse function is betareg.fit() which provides the core computations without formula-related data pre- and post-processing. Update: Recently, betareg() has been extended to optionally include an additional Fisher scoring iteration after the optim() optimization in order to improve the ML result (or apply a bias correction or reduction).
The model-fitting function betareg() and its associated class are designed to be as similar
as possible to the standard glm() function (R Development Core Team 2009) for fitting
GLMs. An important difference is that there are potentially two equations for mean and
precision (Equations 3 and 4, respectively), and consequently two regressor matrices, two
linear predictors, two sets of coefficients, etc. In this respect, the design of betareg() is
similar to the functions described by Zeileis et al. (2008) for fitting zero-inflation and hurdle
models which also have two model components. The arguments of betareg() are
betareg(formula, data, subset, na.action, weights, offset,
link = "logit", link.phi = NULL, control = betareg.control(...),
model = TRUE, y = TRUE, x = FALSE, ...)
where the first line contains the standard model-frame specifications (see Chambers and Hastie
1992), the second line has the arguments specific to beta regression models and the arguments
in the last line control some components of the return value.
If a formula of type y ~ x1 + x2 is supplied, it describes yᵢ and xᵢ for the mean equation of the beta regression (3). In this case a constant φ is assumed, i.e., zᵢ = 1 and g₂ is the identity link, corresponding to the basic beta regression model as introduced in Ferrari
and Cribari-Neto (2004). However, a second set of regressors can be specified by a two-part
formula of type y ~ x1 + x2 | z1 + z2 + z3 as provided in the Formula package (Zeileis
and Croissant 2010). This model has the same mean equation as above but the regressors zᵢ in the precision equation (4) are taken from the ~ z1 + z2 + z3 part. The default link function in this case is the log link g₂(φ) = log(φ). Consequently, y ~ x1 + x2 and y ~ x1 + x2 | 1 correspond to equivalent beta likelihoods but use different parameterizations for φᵢ: simply φᵢ = φ in the former case and log(φᵢ) = γ₁ in the latter case. The link for the φᵢ precision equation can be changed by link.phi in both cases, where "identity", "log", and "sqrt" are admissible values. The default for the μᵢ mean equation is always the logit link but all link functions for the binomial family in glm() are allowed as well as the log-log link: "logit", "probit", "cloglog", "cauchit", "log", and "loglog".
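For illustration, a sketch using the GasolineYield data shipped with betareg (the data set and the variables yield, batch, and temp come from the package; this is not yet the paper's full analysis):

```r
library("betareg")
data("GasolineYield", package = "betareg")

## Mean equation only: constant precision phi (identity link for phi).
gy <- betareg(yield ~ batch + temp, data = GasolineYield)

## Two-part formula: temp also enters the precision equation (log link).
gy_vd <- betareg(yield ~ batch + temp | temp, data = GasolineYield)

coef(gy, model = "precision")    # single (phi) coefficient
coef(gy_vd, model = "precision") # intercept and temp on the log scale
```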
ML estimation of all parameters employing analytical gradients is carried out using R's optim() with control options set in betareg.control(). All of optim()'s methods are available but the default is "BFGS", which is typically regarded to be the best-performing method (Mittelhammer, Judge, and Miller 2000, Section 8.13) with the most effective updating formula of all quasi-Newton methods (Nocedal and Wright 1999, p. 197). Starting
values can be user-supplied, otherwise the starting values are estimated by a regression
of g₁(yᵢ) on xᵢ. The starting values for φ's intercept are chosen as described in Ferrari and Cribari-Neto (2004, p. 805), corresponding to a constant φᵢ (plus a link transformation, if any). All further coefficients of the precision equation (if any) are initially set to zero. The covariance matrix estimate is derived analytically as in Simas et al. (2010). However, by setting hessian = TRUE
the numerical Hessian matrix returned by optim() can also be obtained. Update: In recent versions of betareg, the optim() optimization is still performed, but optionally it may be complemented by a subsequent additional Fisher scoring iteration to improve the result.
The returned fitted-model object of class betareg is a list similar to glm objects. Some of
its elements such as coefficients or terms are lists with a mean and precision component,
respectively. A set of standard extractor functions for fitted model objects is available for
objects of class betareg, including the usual summary() method that includes partial Wald
tests for all coefficients. No anova() method is provided, but the general coeftest() and
waldtest() from lmtest (Zeileis and Hothorn 2002), and linear.hypothesis() from car
can be used for Wald tests while lrtest() from lmtest provides for likelihood-ratio tests of
nested models. See Table 1 for a list of all available methods. Most of these are standard in
base R; however, methods for a few less standard generics are also provided. Specifically, there
are tools related to specification testing and computation of sandwich covariance matrices
Function              Description
print()               simple printed display with coefficient estimates
summary()             standard regression output (coefficient estimates, standard errors, partial Wald tests); returns an object of class summary.betareg containing the relevant summary statistics (which has a print() method)
coef()                extract coefficients of model (full, mean, or precision components), a single vector of all coefficients by default
vcov()                associated covariance matrix (with matching names)
predict()             predictions (of means μᵢ, linear predictors η₁ᵢ, precision parameter φᵢ, or variances μᵢ(1 − μᵢ)/(1 + φᵢ)) for new data
fitted()              fitted means for observed data
residuals()           extract residuals (deviance, Pearson, response, or different weighted residuals, see Espinheira et al. 2008b), defaulting to standardized weighted residuals 2 from Equation 6
estfun()              compute empirical estimating functions (or score functions), evaluated at observed data and estimated parameters (see Zeileis 2006b)
bread()               extract bread matrix for sandwich estimators (see Zeileis 2006b)
terms()               extract terms of model components
model.matrix()        extract model matrix of model components
model.frame()         extract full original model frame
logLik()              extract fitted log-likelihood
plot()                diagnostic plots of residuals, predictions, leverages, etc.
hatvalues()           hat values (diagonal of hat matrix)
cooks.distance()      (approximation of) Cook's distance
gleverage()           compute generalized leverage (Wei, Hu, and Fung 1998; Rocha and Simas 2010)
coeftest()            partial Wald tests of coefficients
waldtest()            Wald tests of nested models
linear.hypothesis()   Wald tests of linear hypotheses
lrtest()              likelihood ratio tests of nested models
AIC()                 compute information criteria (AIC, BIC, …)

Table 1: Functions and methods for objects of class betareg. The first four blocks refer to methods, the last block contains generic functions whose default methods work because of the information supplied by the methods above.
as discussed by Zeileis (2004, 2006b) as well as a method to a new generic for computing
generalized leverages (Wei et al. 1998).
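Many of the methods in Table 1 can be exercised with any fitted betareg model; for example (a sketch reusing the GasolineYield data from the package):

```r
library("betareg")
data("GasolineYield", package = "betareg")

gy <- betareg(yield ~ batch + temp, data = GasolineYield)

summary(gy)       # coefficient tables for mean and precision components
head(fitted(gy))  # fitted means for the observed data
logLik(gy)        # fitted log-likelihood
vcov(gy)[1:2, 1:2]  # corner of the covariance matrix estimate
AIC(gy)           # information criterion via the logLik() method
```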
4. Beta regression in practice

We illustrate the usage of betareg in practice by replicating analyses from the original papers that suggested the methodology. More specifically, we
estimate and compare various flavors of beta regression models for the gasoline yield data
of Prater (1956), see Figure 2, and for the household food expenditure data taken from
Griffiths, Hill, and Judge (1993), see Figure 4. Further pure replication exercises are provided
in Section 5.
Figure 2: Gasoline yield data from Prater (1956): Proportion of crude oil converted to gasoline explained by temperature (in degrees Fahrenheit) at which all gasoline has vaporized and
given batch (indicated by gray level). Fitted curves correspond to beta regressions gy_loglog
with log-log link (solid, red) and gy_logit with logit link (dashed, blue). Both curves were
evaluated at varying temperature with the intercept for batch 6 (i.e., roughly the average
intercept).
[Figure 3: Diagnostic plots — Pearson residuals and Cook's distance vs. observation number, generalized leverage vs. predicted values, Pearson residuals vs. linear predictor, and deviance residuals.]
Ferrari and Cribari-Neto (2004) decided to refit the model excluding this observation. While this does not change
the coefficients in the mean model very much, the precision parameter increases clearly.
R> gy_logit4 <- update(gy_logit, subset = -4)
R> coef(gy_logit, model = "precision")
(phi)
440.2784
R> coef(gy_logit4, model = "precision")
(phi)
577.7907
Figure 4:
Household food expenditure data from Griffiths et al. (1993): Proportion of
household income spent on food explained by household income and number of persons in
household (indicated by gray level). Fitted curves correspond to beta regressions fe_beta
with fixed dispersion (long-dashed, blue), fe_beta2 with variable dispersion (solid, red), and
the linear regression fe_lin (dashed, black). All curves were evaluated at varying income
with the intercept for mean number of persons (= 3.58).
Call:
betareg(formula = I(food/income) ~ income + persons, data = FoodExpenditure)
Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max 
-2.7818 -0.4445  0.2024  0.6852  1.8755 
Coefficients (mean model with logit link):
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.622548   0.223854  -2.781 0.005418 ** 
income      -0.012299   0.003036  -4.052 5.09e-05 ***
persons      0.118462   0.035341   3.352 0.000802 ***
Phi coefficients (precision model with identity link):
Estimate Std. Error z value Pr(>|z|)
8.08
         df       AIC
fe_beta   4 -76.11667
fe_beta2  5 -80.18198
Thus, there is evidence for variable dispersion and model fe_beta2 seems to be preferable.
As visualized in Figure 4, it describes a similar relationship between response and explanatory
variables although with a somewhat shrunken income slope.
²In R, the BIC can be computed by means of AIC() when log(n) is supplied as the penalty term k.
The improvement of the model fit can also be brought out graphically by comparing absolute raw residuals (i.e., |yᵢ − μ̂ᵢ|) from both models as in Figure 5.
R> plot(abs(residuals(gy_loglog, type = "response")),
+
abs(residuals(gy_logit, type = "response")))
R> abline(0, 1, lty = 2)
This shows that there are a few observations clearly above the diagonal (where the log-log-link
fits better than the logit link) whereas there are fewer such observations below the diagonal.
A different diagnostic display that is useful in this situation (and is employed by Cribari-Neto and Lima 2007) is a plot of predicted values (μ̂ᵢ) vs. observed values (yᵢ) for each model. This can be created by plot(gy_logit, which = 6) and plot(gy_loglog, which = 6), respectively.
In principle, the link function g₂ in the precision equation could also influence the model fit. However, as the best-fitting model gy_loglog has a constant φ, all links g₂ lead to equivalent estimates of φ and thus to equivalent fitted log-likelihoods. However, the link function can have consequences in terms of the inference about φ and in terms of convergence of the optimization. Typically, a log link leads to somewhat improved quadratic approximations of the likelihood and fewer iterations in the optimization. For example, refitting gy_loglog with g₂(φ) = log(φ) converges more quickly:
R> gy_loglog2 <- update(gy_loglog, link.phi = "log")
R> summary(gy_loglog2)$iterations
optim scoring 
    21      2 
Figure 5: Scatterplot comparing the absolute raw residuals from beta regression models with log-log link (x-axis) and logit link (y-axis).
with a lower number of iterations than for gy_loglog which had 51 iterations.
Figure 6: Reading skills data from Smithson and Verkuilen (2006): Linearly transformed
reading accuracy by IQ score and dyslexia status (control, blue vs. dyslexic, red). Fitted curves
correspond to beta regression rs_beta (solid) and OLS regression with logit-transformed
dependent variable rs_ols (dashed).
z test of coefficients:

                  Estimate Std. Error z value  Pr(>|z|)    
(Intercept)        1.12323    0.15089  7.4441 9.758e-14 ***
dyslexia          -0.74165    0.15145 -4.8969 9.736e-07 ***
iq                 0.48637    0.16708  2.9109 0.0036034 ** 
dyslexia:iq       -0.58126    0.17258 -3.3681 0.0007568 ***
(phi)_(Intercept)  3.30443    0.22650 14.5890 < 2.2e-16 ***
(phi)_dyslexia     1.74656    0.29398  5.9410 2.832e-09 ***
(phi)_iq           1.22907    0.45957  2.6744 0.0074862 ** 
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This shows that precision increases with iq and is lower for controls while in the mean equation
there is a significant interaction between iq and dyslexia. As Figure 6 illustrates, the beta
regression fit does not differ much from the OLS fit for the dyslexics group (with responses
close to 0.5) but fits much better in the control group (with responses close to 1).
The estimates above replicate those in Table 5 of Smithson and Verkuilen (2006), except for
the signs of the coefficients of the dispersion submodel which they defined in the opposite
way. Note that their results have been obtained with numeric rather than analytic standard
errors; hence, hessian = TRUE is set above for replication. The results are also confirmed by
Espinheira, Ferrari, and Cribari-Neto (2008a), who have also concluded that the dispersion
is variable. As pointed out in Section 4.2, to formally test equidispersion against variable
dispersion lrtest(rs_beta, . ~ . | 1) (or the analogous waldtest()) can be used.
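Since the ReadingSkills data ship with betareg, this test can be reproduced directly (a sketch following the model specification implied by the output above; lrtest() is provided by lmtest):

```r
library("betareg")
library("lmtest")
data("ReadingSkills", package = "betareg")

## Mean: dyslexia * iq; precision: dyslexia + iq (as in the output above).
rs_beta <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq,
  data = ReadingSkills, hessian = TRUE)

## H0: constant phi (equidispersion) vs. variable dispersion.
lrtest(rs_beta, . ~ . | 1)
```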
Smithson and Verkuilen (2006) also consider two other psychometric applications of beta regressions, the data for which are also provided in the betareg package: see ?MockJurors and
?StressAnxiety. Furthermore, demo("SmithsonVerkuilen2006", package = "betareg")
is a complete replication script with comments.
Figure 7: Structural change tests for artificial data y1 with change in μ (left) and y2 with change in φ (right).
R> y2 <- c(rbeta(100, 0.3 * 4, 0.7 * 4), rbeta(100, 0.3 * 8, 0.7 * 8))
To capture instabilities in the parameters over time (i.e., the ordering of the observations),
the generalized empirical fluctuation processes can be derived via
R> library("strucchange")
R> y1_gefp <- gefp(y1 ~ 1, fit = betareg)
R> y2_gefp <- gefp(y2 ~ 1, fit = betareg)
and visualized by
R> plot(y1_gefp, aggregate = FALSE)
R> plot(y2_gefp, aggregate = FALSE)
The resulting Figure 7 (replicating Figure 4 from Zeileis 2006a) shows two 2-dimensional fluctuation processes: one for y1 (left) and one for y2 (right). Both fluctuation processes behave
as expected: There is no excessive fluctuation of the process pertaining to the parameter that
remained constant while there is a clear peak at about the time of the change in the parameter
with the shift. In both series the structural change is significant due to the crossing of the red
boundary that corresponds to the 5% critical value. For further details see Zeileis (2006a).
6. Summary
This paper addressed the R implementation of the class of beta regression models available
in the betareg package. We have presented the fixed and variable dispersion beta regression
models, described how one can model rates and proportions using betareg and presented
several empirical examples reproducing previously published results. Future research and
implementation shall focus on the situation where the data contain zeros and/or ones (see
e.g., Kieschnick and McCullough 2003). An additional line of research and implementation
is that of dynamic beta regression models, such as the class of ARMA models proposed by
Rocha and Cribari-Neto (2009).
Acknowledgments
FCN gratefully acknowledges financial support from CNPq/Brazil. Both authors are grateful
to A.B. Simas and A.V. Rocha for their work on the previous versions of the betareg package
(up to version 1.2). Furthermore, detailed and constructive feedback from two anonymous
reviewers, the associate editor, as well as from B. Grün was very helpful for enhancing both software and manuscript.
References
Breusch TS, Pagan AR (1979). A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica, 47, 1287–1294.
Chambers JM, Hastie TJ (eds.) (1992). Statistical Models in S. Chapman & Hall, London.
Cribari-Neto F, Lima LB (2007). A Misspecification Test for Beta Regressions. Technical
report.
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. URL http://www.jstatsoft.org/v34/i02/.
Espinheira PL, Ferrari SLP, Cribari-Neto F (2008a). Influence Diagnostics in Beta Regression. Computational Statistics & Data Analysis, 52(9), 4417–4431.
Espinheira PL, Ferrari SLP, Cribari-Neto F (2008b). On Beta Regression Residuals. Journal of Applied Statistics, 35(4), 407–419.
Ferrari SLP, Cribari-Neto F (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.
Griffiths WE, Hill RC, Judge GG (1993). Learning and Practicing Econometrics. John Wiley
& Sons, New York.
Grün B, Kosmidis I, Zeileis A (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. URL http://www.jstatsoft.org/v48/i11/.
Kieschnick R, McCullough BD (2003). Regression Analysis of Variates Observed on (0, 1): Percentages, Proportions and Fractions. Statistical Modelling, 3(3), 193–213.
Koenker R (1981). A Note on Studentizing a Test for Heteroscedasticity. Journal of Econometrics, 17, 107–112.
McCullagh P, Nelder JA (1989). Generalized Linear Models. 2nd edition. Chapman & Hall,
London.
Mittelhammer RC, Judge GG, Miller DJ (2000). Econometric Foundations. Cambridge
University Press, New York.
Nocedal J, Wright SJ (1999). Numerical Optimization. Springer-Verlag, New York.
Ospina R, Cribari-Neto F, Vasconcellos KLP (2006). Improved Point and Interval Estimation for a Beta Regression Model. Computational Statistics & Data Analysis, 51(2), 960–981.
Prater NH (1956). Estimate Gasoline Yields from Crudes. Petroleum Refiner, 35(5), 236–238.
Prentice RL (1986). Binary Regression Using an Extended Beta-Binomial Distribution, With Discussion of Correlation Induced by Covariate Measurement. Journal of the American Statistical Association, 81(394), 321–327.
Ramsey JB (1969). Tests for Specification Error in Classical Linear Least Squares Regression Analysis. Journal of the Royal Statistical Society B, 31, 350–371.
R Development Core Team (2009). R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http:
//www.R-project.org/.
Rocha AV, Cribari-Neto F (2009). Beta Autoregressive Moving Average Models. Test, 18, 529–545.
Rocha AV, Simas AB (2010). Influence Diagnostics in a General Class of Beta Regression Models. Test, 20, 95–119.
Simas AB, Barreto-Souza W, Rocha AV (2010). Improved Estimators for a General Class of Beta Regression Models. Computational Statistics & Data Analysis, 54(2), 348–366.
Simas AB, Rocha AV (2006). betareg: Beta Regression. R package version 1.2, URL http://CRAN.R-project.org/src/contrib/Archive/betareg/.
Smithson M, Verkuilen J (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.
Vasconcellos KLP, Cribari-Neto F (2005). Improved Maximum Likelihood Estimation in a New Class of Beta Regression Models. Brazilian Journal of Probability and Statistics, 19(1), 13–31.
Wei BC, Hu YQ, Fung WK (1998). Generalized Leverage and Its Applications. Scandinavian Journal of Statistics, 25(1), 25–37.
Williams DA (1982). Extra Binomial Variation in Logistic Linear Models. Applied Statistics, 31(2), 144–148.
Zeileis A (2004). Econometric Computing with HC and HAC Covariance Matrix Estimators. Journal of Statistical Software, 11(10), 1–17. URL http://www.jstatsoft.org/v11/i10/.
Zeileis A (2006a). Implementing a Class of Structural Change Tests: An Econometric Computing Approach. Computational Statistics & Data Analysis, 50(11), 2987–3008.
Zeileis A (2006b). Object-Oriented Computation of Sandwich Estimators. Journal of Statistical Software, 16(9), 1–16. URL http://www.jstatsoft.org/v16/i09/.
Zeileis A, Croissant Y (2010). Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34(1), 1–13. URL http://www.jstatsoft.org/v34/i01/.
Zeileis A, Hothorn T (2002). Diagnostic Checking in Regression Relationships. R News, 2(3), 7–10. URL http://CRAN.R-project.org/doc/Rnews/.
Zeileis A, Kleiber C, Jackman S (2008). Regression Models for Count Data in R. Journal of Statistical Software, 27(8), 1–25. URL http://www.jstatsoft.org/v27/i08/.
Zeileis A, Leisch F, Hornik K, Kleiber C (2002). strucchange: An R Package for Testing for Structural Change in Linear Regression Models. Journal of Statistical Software, 7(2), 1–38. URL http://www.jstatsoft.org/v07/i02/.
Affiliation:
Francisco Cribari-Neto
Departamento de Estatística, CCEN
Universidade Federal de Pernambuco
Cidade Universitária
Recife/PE 50740-540, Brazil
E-mail: cribari@ufpe.br
URL: http://www.de.ufpe.br/~cribari/
Achim Zeileis
Department of Statistics
Universität Innsbruck
Universitätsstr. 15
6020 Innsbruck, Austria
E-mail: Achim.Zeileis@R-project.org
URL: http://statmath.wu.ac.at/~zeileis/