Journal of Statistical Computation and Simulation, 2014
Vol. 84, No. 1, 186–203, http://dx.doi.org/10.1080/00949655.2012.700456

On testing inference in beta regressions


Francisco Cribari-Neto* and Marcela P.F. Queiroz
Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária,
Recife/PE 50740–540, Brazil
*Corresponding author. Email: cribari@de.ufpe.br

(Received 10 April 2012; final version received 2 June 2012)

This article deals with testing inference in the class of beta regression models with varying dispersion. We
focus on inference in small samples. We perform a numerical analysis in order to evaluate the sizes and
powers of different tests. We consider the likelihood ratio test, two adjusted likelihood ratio tests proposed
by Ferrari and Pinheiro [Improved likelihood inference in beta regression, J. Stat. Comput. Simul. 81
(2011), pp. 431–443], the score test, the Wald test and bootstrap versions of the likelihood ratio, score and
Wald tests. We perform tests on the parameters that index the mean submodel and also on the parameters
in the linear predictor of the precision submodel. Overall, the numerical evidence favours the bootstrap
tests. It is also shown that the score test is considerably less size-distorted than the likelihood ratio and
Wald tests. An application that uses real (not simulated) data is presented and discussed.

Keywords: beta regression; bootstrap; likelihood ratio test; profile likelihood; score test; Wald test

1. Introduction

Oftentimes, researchers in different fields need to model the dependence of a continuous ran-
dom variable that assumes values in the standard unit interval on a set of explanatory variables
(covariates, regressors). This is usually the case when the interest lies in explaining the behaviour
of rates and proportions. Ferrari and Cribari-Neto [1] proposed a beta regression model that is
useful for such modelling. They introduced a new parameterization of the beta density
and developed a model in which the mean response is related to a linear predictor that includes
covariates and regression parameters. Their model can be used with continuous responses that
assume values in (0, 1) and, more generally, with those that take values in (a, b), where a and b are
known constants (a < b). In the latter case, one models (y − a)/(b − a). The chief assumption in
their modelling strategy is that the variable of interest is beta-distributed. Their model includes
a precision parameter and is similar to a generalized linear model [2]. An extension of the beta
regression for situations in which the precision parameter is not constant but varies across the
observations was developed by Simas et al. [3].
After the point estimation of the parameters that index the model, the interest usually lies in
testing hypotheses about such parameters. Unlike the linear regression model, there is no exact
test available. Testing inference is typically performed with the aid of three tests that rely on a large
sample approximation, namely: the likelihood ratio, score and Wald tests. The null distributions
of the three test statistics can be approximated by their limiting counterpart, which is $\chi^2_r$, where

r is the number of restrictions imposed by the null hypothesis under test. Since the tests are
performed using asymptotic (approximate) critical values, size distortions are likely to take place
when the sample size is not large. It should also be noted that when there are nuisance parameters
the likelihood ratio test is based on the profile likelihood function, which does not enjoy the same
properties as the likelihood function. As a result, the likelihood ratio test tends to be considerably
size-distorted.
Several corrections and adjustments to the likelihood ratio and score test statistics were proposed
in the literature, such as the likelihood ratio tests based on modified profile likelihood functions [4–
9] and Bartlett-corrected likelihood ratio and score tests (see [10]). There are also tests based on the
signed and adjusted signed likelihood ratio test statistics [11,12]; these tests can be used when the
parameter of interest is scalar. Some of the corrections require orthogonality between the parameter
of interest and the nuisance parameter or are very hard to obtain under non-orthogonality. A
simpler alternative was proposed by Skovgaard [13]. He derived two adjusted likelihood ratio
test statistics that can be easily obtained in a wide variety of models, even when the parameter
of interest is vector-valued and is not orthogonal to the nuisance parameter. His correction was
computed in the context of nonlinear exponential family models by Ferrari and Cysneiros [14] and
by Melo et al. [15] for a new class of models introduced by the authors. Skovgaard’s adjustment
for testing inference in the class of beta regressions was derived by Ferrari and Pinheiro [16].
Their numerical results show that the adjusted tests outperform the likelihood ratio test in finite
samples.
An alternative strategy is to use data resampling to estimate the null distributions of the test
statistics. This can be accomplished by using the parametric bootstrap [17]. A large number of
artificial samples is generated imposing the null hypothesis, the test statistic is computed for each
of them and all bootstrap test statistics are used to estimate the null distribution of the test statistic,
from which we obtain a critical value for the test.
Our chief goal in this paper is to consider and numerically evaluate different testing strategies
in varying dispersion beta regressions. We consider testing inference based on the likelihood
ratio, score and Wald tests, the bootstrap versions of these three tests, and also based on the two
adjusted likelihood ratio tests of Ferrari and Pinheiro [16]. We perform tests on the parameters
that index the mean submodel (i.e. on the mean effects) and also tests on the parameters that define
the linear predictor of the precision submodel. Our numerical evidence shows that the bootstrap
tests outperform the competition and that the score test is considerably less size-distorted than
the likelihood ratio and Wald tests. Indeed, the Wald test can display very large size distortions.
Our results also show that testing inference on the parameters that index the precision submodel
is usually less accurate than that performed on the mean submodel parameters, in the sense of
typically displaying larger type II error frequency.
The paper unfolds as follows. Section 2 introduces the beta regression model. The different
tests are presented in Section 3. The numerical results are presented and discussed in Section 4.
Section 5 contains an application that uses real (not simulated) data. Finally, Section 6 offers some
concluding remarks.

2. The beta regression model

Ferrari and Cribari-Neto [1] expressed the beta density in such a way that it is indexed by the
random variable mean (μ) and a precision parameter (φ). Using their parameterization, we say
that a variate y is beta-distributed, denoted by y ∼ B(μ, φ), if its density function is given by
$$
f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1} (1-y)^{(1-\mu)\phi-1}, \qquad (1)
$$

where y ∈ (0, 1), μ ∈ (0, 1) and φ > 0.
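Under this parameterization, E(y) = μ and Var(y) = μ(1 − μ)/(1 + φ), and density (1) corresponds to the usual two-shape beta law with shape1 = μφ and shape2 = (1 − μ)φ. As a minimal illustration in R (the function name below is ours, not from any package):

```r
# Density (1) evaluated via the usual two-shape beta density:
# y ~ B(mu, phi) corresponds to shape1 = mu * phi, shape2 = (1 - mu) * phi.
dbeta_mu_phi <- function(y, mu, phi) {
  dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

# Example: mean 0.7 and precision 50, so Var(y) = 0.7 * 0.3 / 51.
dbeta_mu_phi(0.65, mu = 0.7, phi = 50)
```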


Let y1 , . . . , yn be independent random variables such that yt ∼ B(μt , φ), t = 1, . . . , n, where μt
is the mean of the tth variate and φ is the precision parameter. In the regression model introduced
by Ferrari and Cribari-Neto [1] the tth response mean (μt ) is linked to a linear predictor (ηt ) that
involves independent variables and parameters as follows:


$$
g(\mu_t) = \eta_t = \sum_{i=1}^{k} x_{ti}\,\beta_i = x_t^\top \beta, \qquad (2)
$$

where β = (β1, . . . , βk)⊤ is a vector of unknown regression parameters (β ∈ ℝ^k), xt = (xt1, . . . , xtk)⊤ is a vector of k covariates (k < n), which are taken to be fixed, and g : (0, 1) → ℝ is a strictly increasing and twice-differentiable link function. The most commonly used link functions are logit, probit, log–log, complementary log–log and Cauchy.
The model was extended by Simas et al. [3] to allow the precision parameter to vary across
observations. They included a separate submodel for the precision parameter:


$$
h(\phi_t) = \sum_{j=1}^{q} z_{tj}\,\gamma_j = \vartheta_t,
$$

where γ = (γ1, . . . , γq)⊤ is a vector of unknown parameters (γ ∈ ℝ^q), zt1, . . . , ztq are observations on q covariates (q < n − k), which are assumed fixed and known, and h : (0, ∞) → ℝ is a strictly increasing and twice-differentiable link function. The most obvious link functions for this submodel are the log, h(φt) = log(φt), and square root, h(φt) = √φt, links.
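In practice, such a varying dispersion model can be fitted in R with the betareg package [22], whose two-part formula separates the mean and precision submodels; a brief sketch, with hypothetical data frame and variable names:

```r
library(betareg)

# Mean submodel (logit link) and precision submodel (log link).
# 'dat', 'y', 'x1', 'x2' and 'z1' are hypothetical placeholders.
fit <- betareg(y ~ x1 + x2 | z1, data = dat,
               link = "logit", link.phi = "log")
summary(fit)
```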
In order to simplify the notation in what follows, let y*t = log{yt/(1 − yt)} and y†t = log(1 − yt). Notice that the first two moments of such variates are μ*t = E(y*t) = ψ(μtφt) − ψ((1 − μt)φt), μ†t = E(y†t) = ψ((1 − μt)φt) − ψ(φt), v*t = Var(y*t) = ψ′(μtφt) + ψ′((1 − μt)φt), v†t = Var(y†t) = ψ′((1 − μt)φt) − ψ′(φt) and ct = Cov(y*t, y†t) = −ψ′((1 − μt)φt), where ψ(·) is the digamma function and ψ′(·) its derivative, the trigamma function.
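These quantities involve only the digamma and trigamma functions, both built into R; a small sketch computing them for given μt and φt (the function name is ours):

```r
# Moments of y* = log(y / (1 - y)) and y-dagger = log(1 - y), y ~ B(mu, phi).
# digamma() is psi(.) and trigamma() is its derivative psi'(.).
ystar_moments <- function(mu, phi) {
  list(mu_star = digamma(mu * phi) - digamma((1 - mu) * phi),
       mu_dag  = digamma((1 - mu) * phi) - digamma(phi),
       v_star  = trigamma(mu * phi) + trigamma((1 - mu) * phi),
       v_dag   = trigamma((1 - mu) * phi) - trigamma(phi),
       c_t     = -trigamma((1 - mu) * phi))
}

ystar_moments(mu = 0.7, phi = 50)
```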
Parameter estimation is typically performed by maximum likelihood. The log-likelihood
function based on a sample of n independent beta responses is

$$
\ell(\beta, \gamma) = \sum_{t=1}^{n} \ell_t(\mu_t, \phi_t),
$$

where

$$
\ell_t(\mu_t, \phi_t) = \log\Gamma(\phi_t) - \log\Gamma(\mu_t\phi_t) - \log\Gamma((1-\mu_t)\phi_t) + (\mu_t\phi_t - 1)\, y_t^* + (\phi_t - 2)\, y_t^\dagger.
$$

The score function U = (U⊤β(β, γ), U⊤γ(β, γ))⊤ is obtained by differentiating the log-likelihood function with respect to the model parameters (β, γ):

$$
U_\beta(\beta, \gamma) = X^\top \Phi T (y^* - \mu^*)
$$

and

$$
U_\gamma(\beta, \gamma) = Z^\top H \{ M (y^* - \mu^*) + (y^\dagger - \mu^\dagger) \},
$$

where X is an n × k matrix whose tth row is x⊤t = (xt1, . . . , xtk) and Z is an n × q matrix with tth row given by z⊤t = (zt1, . . . , ztq). The following are diagonal matrices: T = diag{1/g′(μ1), . . . , 1/g′(μn)}, H = diag{1/h′(φ1), . . . , 1/h′(φn)}, M = diag{μ1, . . . , μn}, Φ = diag{φ1, . . . , φn}. Additionally, y* = (y*1, . . . , y*n)⊤, y† = (y†1, . . . , y†n)⊤, μ* = (μ*1, . . . , μ*n)⊤ and μ† = (μ†1, . . . , μ†n)⊤.
The maximum-likelihood estimators of β and γ , say β̂ and γ̂ , solve the system of equations
given by Uβ(β, γ) = Uγ(β, γ) = 0. Since the estimators cannot be expressed in closed form, the maximum-likelihood estimates are typically computed by numerically maximizing the
log-likelihood function with the aid of a Newton (e.g. Newton–Raphson) or quasi-Newton (e.g.
Broyden–Fletcher–Goldfarb–Shanno) algorithm; see [18] for details on nonlinear optimization.
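As a sketch of this numerical maximization under logit and log links (X and Z are design matrices whose first columns are ones; the starting values and names are illustrative, not the authors' implementation):

```r
# Log-likelihood of the varying dispersion beta regression,
# theta = c(beta, gamma), logit link for mu and log link for phi.
loglik <- function(theta, y, X, Z) {
  k   <- ncol(X)
  mu  <- plogis(drop(X %*% theta[1:k]))
  phi <- exp(drop(Z %*% theta[-(1:k)]))
  sum(dbeta(y, mu * phi, (1 - mu) * phi, log = TRUE))
}

# Quasi-Newton (BFGS) maximization; fnscale = -1 turns optim's
# minimizer into a maximizer. y, X and Z are assumed available.
fit <- optim(rep(0, ncol(X) + ncol(Z)), loglik, y = y, X = X, Z = Z,
             method = "BFGS", control = list(fnscale = -1))
fit$par  # maximum-likelihood estimates (beta_hat, gamma_hat)
```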
The matrix of second-order log-likelihood derivatives can be expressed as

$$
J = J(\beta, \gamma) = \begin{pmatrix} J_{\beta\beta} & J_{\beta\gamma} \\ J_{\gamma\beta} & J_{\gamma\gamma} \end{pmatrix},
$$

where

$$
J_{\beta\beta} = X^\top \Phi \{\Phi T V^* + S T^2 (y^* - M^*)\} T X,
$$

$$
J_{\beta\gamma} = J_{\gamma\beta}^\top = -X^\top \{(y^* - M^*) - \Phi (M V^* + C)\} T H Z
$$

and

$$
J_{\gamma\gamma} = Z^\top \{H (M^2 V^* + 2MC + V^\dagger) + \{M (y^* - M^*) + (y^\dagger - M^\dagger)\} Q H^2\} H Z.
$$

The following are diagonal matrices: V* = diag{v*1, . . . , v*n}, V† = diag{v†1, . . . , v†n}, M* = diag{μ*1, . . . , μ*n}, M† = diag{μ†1, . . . , μ†n}, C = diag{c1, . . . , cn}, S = diag{g″(μ1), . . . , g″(μn)} and Q = diag{h″(φ1), . . . , h″(φn)}.
Fisher’s information matrix can be written as

$$
K = K(\beta, \gamma) = \begin{pmatrix} K_{\beta\beta} & K_{\beta\gamma} \\ K_{\gamma\beta} & K_{\gamma\gamma} \end{pmatrix},
$$

where

$$
K_{\beta\beta} = X^\top \Phi T V^* \Phi T X, \qquad K_{\beta\gamma} = K_{\gamma\beta}^\top = X^\top \Phi T (M V^* + C) H Z
$$

and

$$
K_{\gamma\gamma} = Z^\top H (M^2 V^* + 2MC + V^\dagger) H Z.
$$

It is noteworthy that, unlike what we observe in the class of generalized linear models [2], the
parameters β and γ are not orthogonal.
Under the standard regularity conditions,

$$
\begin{pmatrix} \hat\beta \\ \hat\gamma \end{pmatrix} \sim N_{k+q}\left( \begin{pmatrix} \beta \\ \gamma \end{pmatrix}, \; K^{-1} \right),
$$

approximately, where K⁻¹ is the inverse of Fisher’s information matrix.
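In R, a betareg fit carries the empirical counterparts of this approximation: coef() returns (β̂, γ̂) and vcov() an estimate of K⁻¹, from which Wald-type z statistics follow directly; a sketch assuming a fitted object `fit`:

```r
# Asymptotic standard errors and Wald-type z statistics
# (these are the z tests reported by summary() for betareg fits).
theta_hat <- coef(fit)              # (beta_hat, gamma_hat)
se        <- sqrt(diag(vcov(fit)))  # sqrt of the diagonal of the K^{-1} estimate
z         <- theta_hat / se         # z statistic for H0: parameter = 0
p_value   <- 2 * pnorm(abs(z), lower.tail = FALSE)
```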

3. Hypothesis tests

Let yt, t = 1, . . . , n, be independent beta responses and let θ = (μ⊤, φ⊤)⊤ be the parameter vector that indexes the beta regression model. We shall partition θ as θ = (ω⊤, ψ⊤)⊤, where ω = (ω1, . . . , ωr)⊤ is the parameter of interest and ψ = (ψ1, . . . , ψs)⊤ is the nuisance parameter (k + q = r + s). The interest lies in testing H0 : ω = ω(0) against a two-sided alternative hypothesis, where ω(0) is a given r-vector. Let ℓ(ω, ψ) denote the log-likelihood function for the parameters ω and ψ. We partition Fisher’s information matrix and its inverse as

$$
K = \begin{pmatrix} K_{\omega\omega} & K_{\omega\psi} \\ K_{\psi\omega} & K_{\psi\psi} \end{pmatrix} \quad \text{and} \quad K^{-1} = \begin{pmatrix} K^{\omega\omega} & K^{\omega\psi} \\ K^{\psi\omega} & K^{\psi\psi} \end{pmatrix},
$$

and likewise for J and J⁻¹.


The likelihood ratio, Wald and score test statistics are given by

$$
LR = 2\{\ell(\hat\omega, \hat\psi) - \ell(\omega^{(0)}, \tilde\psi)\},
$$

$$
W = (\hat\omega - \omega^{(0)})^\top (\hat K^{\omega\omega})^{-1} (\hat\omega - \omega^{(0)})
$$

and

$$
S_R = \tilde U_\omega^\top \tilde K^{\omega\omega} \tilde U_\omega,
$$

respectively, where hats (tildes) indicate evaluation at the unrestricted (restricted) maximum-
likelihood estimator. Under mild regularity conditions and when the null hypothesis is true, the distributions of all three test statistics converge to $\chi^2_r$. The tests are then performed using approximate (asymptotic) $\chi^2$ critical values. Since they are based on a limiting approximation, size
distortions may occur, especially when the sample size is small. Several finite-sample corrections
were proposed in the literature; see, e.g. [10].
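For instance, a likelihood ratio test that a mean submodel coefficient is zero can be sketched by fitting the unrestricted and restricted models and referring LR to the χ² distribution with one degree of freedom (data frame and variable names hypothetical):

```r
# LR test of H0: beta_2 = 0 (r = 1) via two nested betareg fits.
fit_full <- betareg(y ~ x1 + x2 | z1, data = dat)
fit_null <- betareg(y ~ x1 | z1, data = dat)

LR <- 2 * (as.numeric(logLik(fit_full)) - as.numeric(logLik(fit_null)))
pchisq(LR, df = 1, lower.tail = FALSE)  # asymptotic chi-square p-value
```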
It is important to remark that when there are nuisance parameters (i.e. inference is carried out on
a subset of the parameter vector), the resulting inference is based on the profile likelihood function,
which does not enjoy all of the desirable properties of the likelihood function. For instance, the
score mean is not zero and the information equality does not hold. Hypothesis tests based on
the profile likelihood function can also display considerable size distortions in small samples,
especially when there are several nuisance parameters. A number of finite sample adjustments
were proposed in the literature [4–9].
When the parameter of interest is scalar, one can test the null hypothesis using the signed
likelihood ratio statistic:

$$
R = \mathrm{sign}(\hat\omega - \omega)\,[2\{\ell(\hat\omega, \hat\psi) - \ell(\omega, \hat\psi_\omega)\}]^{1/2}.
$$

Here, ψ̂ω denotes the maximum-likelihood estimator of ψ for fixed ω. The null distribution of such a test statistic can be approximated by the standard normal distribution with an error of order n^{-1/2}. The approximation can, nonetheless, be poor in small samples.
Barndorff-Nielsen [11,12] introduced the following modified signed likelihood ratio statistic:

$$
R^* = R + R^{-1} \log\left(\frac{u}{R}\right), \qquad (3)
$$

where u involves sample space derivatives. Under H0, his test statistic is normally distributed with an error of order n^{-3/2}. (Notice the gain in precision relative to the standard signed statistic.) It should be pointed out that this statistic may be difficult to obtain in many situations, since to that end one needs to compute sample space derivatives.
Skovgaard [19] obtained an approximation to Barndorff-Nielsen’s statistic by approximating
the sample space derivatives. His results were later generalized to cover situations in which the
parameter of interest is vector-valued; see [13]. Skovgaard’s test statistic can be expressed as

$$
LR_c = LR - 2 \log \xi,
$$

where

$$
\xi = \frac{\{|\hat K|\,|\tilde K|\,|\tilde J_{\psi\psi}|\}^{1/2}}
{|\bar{\Upsilon}|\,\big|[\bar{\Upsilon}^{-1} \hat K \hat J^{-1} \bar{\Upsilon} \tilde K^{-1}]_{\psi\psi}\big|^{1/2}}
\;
\frac{\{\tilde U^\top \bar{\Upsilon}^{-1} \hat K \hat J^{-1} \bar{\Upsilon} \tilde K^{-1} \tilde U\}^{r/2}}
{LR^{\,r/2-1}\; \tilde U^\top \bar{\Upsilon}^{-1} \bar{\upsilon}}.
$$

Here, Ῡ and ῡ are obtained, respectively, from

$$
\Upsilon = E_{\theta_1}[U(\theta_1)\, U^\top(\theta)]
$$

and

$$
\upsilon = E_{\theta_1}[U(\theta_1)\,(\ell(\theta_1) - \ell(\theta))],
$$

by replacing θ1 with θ̂ and θ with θ̃ after the expected values are computed. It is possible to define an asymptotically equivalent test statistic as

$$
LR_{c2} = LR \left(1 - \frac{1}{LR} \log \xi \right)^2.
$$

This statistic has the advantage of always being positive. It also enjoys another desirable property: it reduces to (R*)² when r = 1, where R* is as in Equation (3).
The adjusted statistics proposed by Skovgaard [13] were obtained for the class of varying
dispersion beta regression models by Ferrari and Pinheiro [16]. They showed that

$$
\bar{\Upsilon} = \begin{pmatrix}
X^\top \hat\Phi \hat T \hat V^* \tilde\Phi \tilde T X & X^\top \hat\Phi \hat T (\tilde M \hat V^* + \hat C) \tilde H Z \\[4pt]
Z^\top \hat H (\hat M \hat V^* + \hat C) \tilde\Phi \tilde T X & Z^\top \hat H [\hat M \hat V^* \tilde M + (\hat M + \tilde M) \hat C + \hat V^\dagger] \tilde H Z
\end{pmatrix}
$$

and

$$
\bar{\upsilon} = \begin{pmatrix}
X^\top \hat\Phi \hat T [\hat V^* (\hat\Phi \hat M - \tilde\Phi \tilde M) + \hat C (\hat\Phi - \tilde\Phi)]\,\imath \\[4pt]
Z^\top \hat H [(\hat M \hat V^* + \hat C)(\hat\Phi \hat M - \tilde\Phi \tilde M) + (\hat M \hat C + \hat V^\dagger)(\hat\Phi - \tilde\Phi)]\,\imath
\end{pmatrix}.
$$

Here, ı denotes the n-vector of ones. The numerical evidence presented by the authors suggests
that testing inference based on the modified statistics is typically more accurate in small samples
than that based on the usual likelihood ratio statistic.
A different approach that can be used to improve the testing inference accuracy in small samples
is to base the inference not on asymptotic (approximate) critical values but on critical values
estimated using data resampling, i.e. by using the bootstrap method.
Let y = (y1 , . . . , yn ) be a random sample from a given population. The bootstrap method [17]
applied to hypothesis testing consists of obtaining a large number (say, B) of artificial samples,
using data resampling in a way that the restriction under test holds, computing the test statistic for
each artificial sample and then using the B bootstrap test statistics to estimate the null distribution
of the statistic computed using the original sample. Such an empirical distribution can be used,
for instance, to obtain a critical value for the test. The bootstrap test is then performed using the
test statistic computed using y and the critical value (for a given nominal level) obtained from the
bootstrap resampling. Beran [20] showed that the bootstrap delivers asymptotic refinements when
applied to test statistics that are asymptotically pivotal (i.e. whose asymptotic null distributions are
free of unknown parameters). According to Davison and Hinkley [21, p. 156], when the number
of bootstrap samples is large (e.g. 999), there is no noticeable loss in power.
We shall consider bootstrap testing inference in beta regressions as an alternative to the standard
likelihood ratio, score and Wald tests and also as an alternative to the analytical approach proposed
by Ferrari and Pinheiro [16]. Let y1, . . . , yn be random variables such that yt ∼ B(μt, φt), t = 1, . . . , n. Also, let (μ⊤, φ⊤)⊤ be the parameter vector that indexes the beta regression model. Our interest lies in testing restrictions on a subset of the parameter vector. The bootstrap algorithm can be outlined as follows:

(1) Compute the test statistic using the original sample, τ;
(2) Generate a bootstrap sample y∗ = (y1∗, . . . , yn∗), where each yt∗ is an independent draw from B(μ̃t, φ̃t), with μ̃t = g⁻¹(x⊤t β̃) and φ̃t = h⁻¹(z⊤t γ̃). Here, β̃ and γ̃ are the restricted maximum-likelihood estimates of β and γ, respectively;
(3) Fit the model using the bootstrap (artificial) responses and compute the test statistic τ∗;
(4) Perform Steps 2 and 3 a large number of times, say B;
(5) Compute the 1 − α quantile (say, ẑ1−α) of the B bootstrap test statistics τ1∗, . . . , τB∗, where α is the test nominal level;
(6) Reject the null hypothesis if τ > ẑ1−α, i.e. if the test statistic computed using the original sample (Step 1) is greater than the bootstrap critical value.
The rejection rule in Step 6 uses the bootstrap critical value. Alternatively, the test can be performed using the bootstrap p-value, which is given by

$$
p^* = \frac{\#\{\tau_b^* > \tau\}}{B},
$$

where # denotes set cardinality, τb∗ is the test statistic computed using the bth artificial sample (b = 1, . . . , B) and τ is the test statistic computed using the original sample (Step 1). We reject the null hypothesis if p∗ < α, i.e. if the bootstrap p-value is smaller than the test nominal level.
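A compact R sketch of the scheme above for the likelihood ratio statistic; the two formulas encode the unrestricted and restricted (null) models, the response variable is assumed to be named y, and all names are illustrative:

```r
lr_stat <- function(full, null)
  2 * (as.numeric(logLik(full)) - as.numeric(logLik(null)))

# Parametric bootstrap LR test: resample responses from the restricted fit.
boot_lr_test <- function(f_full, f_null, dat, B = 1000) {
  fit_full <- betareg(f_full, data = dat)
  fit_null <- betareg(f_null, data = dat)        # imposes H0
  tau   <- lr_stat(fit_full, fit_null)
  mu_r  <- predict(fit_null, type = "response")  # restricted mu_t
  phi_r <- predict(fit_null, type = "precision") # restricted phi_t
  tau_b <- replicate(B, {
    dat$y <- rbeta(nrow(dat), mu_r * phi_r, (1 - mu_r) * phi_r)
    lr_stat(betareg(f_full, data = dat), betareg(f_null, data = dat))
  })
  mean(tau_b > tau)                              # bootstrap p-value p*
}
```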

4. Numerical results

In what follows, we shall report simulation results on the finite-sample performance of the like-
lihood ratio (LR), score (SR ) and Wald (W ) tests in beta regressions. We shall also report results
regarding their bootstrap variants (LR∗ , SR∗ and W ∗ , respectively) and the two modified likelihood
ratio tests (LRc and LRc2 ). The numerical results were obtained using the following beta regression
model:

$$
\log\left(\frac{\mu_t}{1 - \mu_t}\right) = \beta_1 + \sum_{i=2}^{k} \beta_i x_{ti},
$$

$$
\log(\phi_t) = \gamma_1 + \sum_{j=2}^{q} \gamma_j z_{tj},
$$

t = 1, . . . , n.
The covariate values (xti and ztj, with xti = ztj for all i = j) were obtained as random standard uniform draws. The sample sizes are n = 20, 30, 40. We generated 10 values for xti = ztj and
replicated them to get covariate values for the three sample sizes (twice, three times and four
times, respectively). This was done so that the ratio between the maximal and minimal precisions
(λ = max φt / min φt ) would remain constant as the sample size changes. All results were obtained
using 5000 Monte Carlo replications and 1000 bootstrap replications. (Notice that each numer-
ical experiment entails a total of five million replications.) The tests were carried out at three
different nominal levels, namely: 10%, 5% and 1%. In each Monte Carlo replication, we gener-
ated the response values as yt ∼ B(μt , φt ), t = 1, . . . , n. When generating the bootstrap responses
(y1 , . . . , yn ), we imposed the null hypothesis under test.
At the outset, we consider hypothesis testing inference on β. We set k = q = 4 and tested H0 : β4 = 0 (r = 1), H0 : β3 = β4 = 0 (r = 2) and H0 : β2 = β3 = β4 = 0 (r = 3) against two-sided alternatives. When testing only one restriction (r = 1), the parameter values are β1 = 1.5, β2 = −4.0, β3 = 3.5, β4 = 0, γ1 = 1.0, γ2 = 2.8, γ3 = −1.2 and γ4 = 2.0. When r = 2, we used β1 = 1.5, β2 = −4.0, β3 = β4 = 0. Finally, when r = 3, data generation was carried out using β1 = 1.5, β2 = β3 = β4 = 0.
Tables 1–3 contain the null rejection rates (%) of the different tests. The size distortions decrease
as the sample size increases, as expected. We also notice that, in general, the finite-sample per-
formances of the likelihood ratio, score and Wald tests deteriorate when we increase the number
of restrictions under test (from r = 1 up to r = 3); this holds true especially for the likelihood
ratio and Wald tests. It is also noteworthy that the modified likelihood ratio tests outperform the
unmodified test.
Table 1. Null rejection rates (%) when testing H0 : β4 = 0 (r = 1).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 21.5 16.0 15.1 14.1 9.9 8.0 4.9 2.6 2.1
SR 12.0 11.1 11.4 5.8 5.3 5.3 0.7 0.6 1.0
W 30.7 20.3 18.4 23.2 14.1 11.3 13.2 6.2 3.9
LRc 15.5 11.4 10.6 9.0 5.5 5.2 2.7 1.3 1.1
LRc2 16.8 12.0 10.9 10.0 6.0 5.5 3.4 1.4 1.2
LR∗ 10.0 10.3 10.3 5.0 5.0 5.2 1.0 0.8 1.1
SR∗ 9.2 9.5 10.6 4.7 4.8 5.0 0.9 0.8 1.1
W∗ 10.5 10.7 10.4 5.1 5.1 5.0 1.0 0.9 1.2

Table 2. Null rejection rates (%) when testing H0 : β3 = β4 = 0 (r = 2).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 25.1 18.8 15.4 16.7 10.9 8.2 6.4 3.6 2.3
SR 11.4 10.8 10.2 4.9 5.3 4.5 0.5 0.7 0.7
W 40.5 26.2 20.9 32.0 19.1 13.2 20.3 8.9 5.3
LRc 14.8 10.9 9.7 8.9 6.0 5.0 2.6 1.0 1.0
LRc2 15.9 11.2 10.0 9.8 6.4 5.1 3.1 1.2 1.0
LR∗ 9.5 10.1 10.1 4.7 5.6 5.1 1.0 0.9 1.1
SR∗ 9.5 10.1 10.0 4.7 5.2 5.1 0.9 1.0 1.2
W∗ 9.4 10.1 9.9 4.7 5.2 5.0 1.2 0.9 1.0

Table 3. Null rejection rates (%) when testing H0 : β2 = β3 = β4 = 0 (r = 3).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 27.7 19.3 16.7 19.0 11.9 9.2 7.1 3.9 2.5
SR 8.6 9.3 9.5 3.5 4.1 4.1 0.3 0.4 0.8
W 48.9 31.7 24.8 40.8 23.5 17.3 29.0 12.7 7.5
LRc 13.5 10.8 9.9 8.1 5.7 4.9 2.4 1.1 1.0
LRc2 15.2 11.3 10.1 9.1 5.9 5.1 2.8 1.4 1.1
LR∗ 9.9 9.9 9.8 4.9 5.5 4.8 1.1 1.2 1.2
SR∗ 9.7 9.9 10.1 4.9 5.1 4.9 1.1 0.9 1.3
W∗ 10.6 10.0 10.4 5.2 5.2 4.9 1.2 0.9 1.1
For instance, when n = 30, r = 1 and α = 10%, the null rejection rate of the standard likelihood ratio (LR) test is 16% whereas those of LRc and LRc2 are 11.4% and 12%,
respectively. On the other hand, these modified tests are typically outperformed by the score test,
especially when the sample contains only 20 observations. As for the bootstrap tests, they clearly
display superior finite-sample behaviour relative to the corresponding asymptotic tests. The boot-
strap resampling leads to substantial gains in the control of the tests' type I error probabilities. For
example, when r = 2, n = 40 and α = 5%, the rejection rates of the likelihood ratio test and its
bootstrap variant are 8.2% and 5.1%, respectively. The null rejection rates of the Wald (score) and
bootstrap Wald (bootstrap score) tests are 13.2% and 5.0% (4.5% and 5.1%). It is thus clear that
the likelihood ratio and Wald tests perform considerably better when they are based on a critical
value obtained from a bootstrap resampling scheme instead of a χ 2 critical value.
It is noteworthy that the Wald test was the worst performer; it is quite liberal (oversized,
anticonservative) in small samples. For instance, when n = 20, r = 3 and α = 10%, the test null
rejection rate is around 50%. Even with 40 observations in the sample and only one restriction in
the null hypothesis (r = 1), the size distortion is considerable: the test null rejection rate at the
10% nominal level exceeds 18%. We also note that the three bootstrap tests display good control
of the type I error probability and outperform the modified likelihood ratio tests when n = 20;
they display slightly superior behaviour when n = 30 and perform equally well when n = 40.
The bootstrap tests are thus an appealing alternative to the modifications to the likelihood ratio
test proposed by Ferrari and Pinheiro [16].
We have also carried out simulations in which data generation was performed under the alternative hypothesis (H1 : β4 = δ (r = 1), H1 : β3 = β4 = δ (r = 2) and H1 : β2 = β3 = β4 = δ (r = 3), for different values of δ) in order to evaluate the tests' type II error probabilities. The likelihood ratio (original and modified), score and Wald tests were carried out using exact critical values obtained from the size simulations, so that we compare powers of tests with correct sizes. Since the same approach cannot be applied to the bootstrap tests (given that they must be based on bootstrap critical values), we shall only consider situations in which such tests displayed minor size distortions. Table 4 contains the non-null rejection rates of the different tests when r = 1, n = 40 and α = 5%. It is clear that all tests become more powerful as |δ| increases, as expected. The modified likelihood ratio tests outperform the competition when δ > 0. We also note that the use of bootstrap resampling did not yield loss in power.
We shall now consider tests on γ , the parameter vector that indexes the precision submodel. The
following null hypotheses were considered: H0 : γ4 = 0 (r = 1), H0 : γ3 = γ4 = 0 (r = 2) and
H0 : γ2 = γ3 = γ4 = 0 (r = 3); they were tested against two-sided alternatives. The parameter
values were fixed at β1 = 3.5, β2 = −3.0, β3 = 2.4, β4 = −2.1, in all situations, and γ1 = 5.0,
γ2 = −2.0, γ3 = 1.0 and γ4 = 0.0, when r = 1; γ1 = 5.0, γ2 = −2.0, γ3 = γ4 = 0.0, when r = 2;
and γ1 = 5.0, γ2 = γ3 = γ4 = 0.0, when r = 3. The tests' null rejection rates are presented in
Tables 5–7. Again, we note that the use of bootstrap resampling reduces size distortions and also

Table 4. Non-null rejection rates (%) when testing H0 : β4 = 0 (r = 1), n = 40, α = 5%.

δ −2.0 −1.0 −0.5 0.5 1.0 2.0

LR 99.6 57.8 18.8 13.1 34.9 66.8
SR 99.3 55.5 17.2 15.7 36.4 62.6
W 99.6 57.4 19.1 12.2 34.5 69.2
LRc 99.5 55.4 17.4 16.2 40.6 73.9
LRc2 99.5 55.5 17.3 15.9 40.1 73.5
LR∗ 99.6 58.6 19.3 13.4 35.1 67.3
SR∗ 99.4 55.4 17.2 16.0 36.9 63.3
W∗ 99.6 59.1 20.2 11.9 33.7 68.3
Table 5. Null rejection rates (%) when testing H0 : γ4 = 0 (r = 1).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 23.3 17.5 14.7 15.7 10.2 8.4 6.3 2.7 2.0
SR 8.5 9.4 8.9 3.9 4.2 4.6 0.8 0.9 0.8
W 39.2 27.2 20.4 31.1 19.7 14.0 19.3 8.6 5.5
LRc 19.2 12.7 11.3 11.8 6.4 5.8 3.8 1.4 1.3
LRc2 20.7 13.2 11.5 12.9 6.8 6.0 4.3 1.5 1.4
LR∗ 10.8 10.4 10.6 5.6 5.2 5.2 1.3 1.0 1.1
SR∗ 10.1 10.5 9.6 4.9 5.1 5.1 0.9 1.2 1.0
W∗ 11.7 10.9 10.6 6.1 5.4 5.5 1.5 0.9 1.3

Table 6. Null rejection rates (%) when testing H0 : γ3 = γ4 = 0 (r = 2).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 26.7 19.1 15.5 17.5 11.9 9.0 7.0 3.9 2.5
SR 8.7 9.4 9.0 3.6 4.7 4.5 0.5 0.9 1.0
W 48.0 31.0 24.2 39.6 23.2 16.6 26.8 12.4 7.0
LRc 14.2 11.6 10.1 8.2 6.3 5.2 2.3 1.5 1.0
LRc2 16.4 12.2 10.5 9.4 6.7 5.3 2.8 1.6 1.1
LR∗ 10.0 10.7 9.9 5.0 5.8 5.0 1.1 1.3 1.1
SR∗ 10.0 10.4 9.8 4.8 5.4 5.0 1.3 1.2 0.9
W∗ 10.6 10.8 10.2 5.1 6.1 4.7 1.1 1.4 1.2

Table 7. Null rejection rates (%) when testing H0 : γ2 = γ3 = γ4 = 0 (r = 3).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 28.1 20.7 16.8 19.0 12.6 9.7 6.8 3.7 2.9
SR 8.4 10.1 9.8 4.4 5.0 4.8 0.9 1.0 1.0
W 54.3 36.4 27.2 45.4 27.9 19.6 31.1 14.9 9.0
LRc 11.4 10.1 10.1 6.0 5.0 5.2 1.7 1.0 1.0
LRc2 13.3 11.0 10.7 7.5 5.6 5.4 2.0 1.1 1.0
LR∗ 9.1 10.4 10.2 4.5 4.9 5.5 0.7 1.0 1.0
SR∗ 9.4 10.7 10.2 4.8 5.5 5.0 1.1 1.1 1.2
W∗ 9.8 9.9 10.4 4.9 4.8 5.5 1.0 0.8 1.3

that the modified likelihood ratio tests are less size-distorted than the original likelihood ratio test,
LRc2 usually being more liberal than LRc . It should be noted, however, that the modified likelihood
ratio tests are typically outperformed by the score (SR ) test. It is also important to notice that the
Wald test was once again very liberal, displaying large size distortions. Indeed, its null rejection rate at the 5% nominal level exceeded 45% when n = 20 and r = 3. This is a very large size distortion. Additionally, we note that the use of bootstrap resampling considerably improves the finite sample performance of the score, likelihood ratio and Wald tests, especially the latter two.
Table 8 contains the tests' estimated powers corresponding to r = 1, n = 30 and α = 5%. At the outset, we note that the tests become more powerful as the true parameter value moves away from the value specified in the null hypothesis, as expected.

Table 8. Non-null rejection rates (%) when testing H0 : γ4 = 0 (r = 1), n = 30, α = 5%.

δ −5.0 −4.0 −2.0 −1.0 −0.5 0.5 1.0 2.0

LR 88.02 75.14 28.80 11.16 6.86 5.30 8.28 21.06
SR 84.56 73.36 31.56 13.52 8.02 4.40 6.02 15.72
W 85.00 69.42 22.50 8.92 6.08 5.60 8.90 19.92
LRc 86.74 72.26 25.40 9.72 6.52 6.80 10.68 25.52
LRc2 86.82 72.34 25.56 9.88 6.56 6.74 10.30 24.92
LR∗ 89.18 76.54 29.94 11.42 7.12 5.26 8.54 20.84
SR∗ 83.46 72.76 31.92 13.68 8.18 4.44 6.08 16.18
W∗ 90.94 76.54 25.00 9.60 6.56 5.70 9.04 20.14

When δ > 0, the modified likelihood ratio tests are slightly more powerful than the original likelihood ratio test and its bootstrap variant. Power simulations for testing inference on γ were carried out using δ = −4 and δ = −5.
We note that the power simulations carried out to evaluate the small sample performance of
tests on the parameters that index the precision submodel used true parameter values that are
more distant from the ones specified in the null hypothesis than when testing restrictions on the
parameters in the mean submodel. It follows that testing inference on γ is typically not as accurate as that performed on β as far as the probability of making a type II error is concerned.
In the previous simulations, we increased the number of restrictions under test (r) and examined the tests' finite sample performances. In what follows, we shall consider an increase in the number
of nuisance parameters. Our interest lies in testing H0 : βk = 0 against a two-sided alternative,
where k denotes the number of parameters that index the mean submodel. Data generation was
performed according to the following beta regression model:

$$
\log\left(\frac{\mu_t}{1 - \mu_t}\right) = \beta_1 + \sum_{i=2}^{k} \beta_i x_{ti},
$$

$$
\log(\phi_t) = \gamma_1 + \gamma_2 z_{t2},
$$

where t = 1, . . . , n and xti = ztj for all i = j. Notice, for instance, that when k = 2, we test
H0 : β2 = 0. Thus, the null hypothesis only imposes one restriction. The true parameter values
are: (i) β1 = 1.0 and β2 = 0 when k = 2; (ii) β1 = 1.0, β2 = 0.5 and β3 = 0 when k = 3; and (iii)
β1 = 1.0, β2 = 0.5, β3 = −2.2 and β4 = 0 when k = 4. In all scenarios, γ1 = 2.0 and γ2 = 4.5.
Tables 9–11 contain the tests null rejection rates for inference on the mean submodel parameters.
Such rates approach the corresponding nominal levels as the sample size increases, as expected.
It is noteworthy that the finite sample behaviour of the likelihood ratio, score and Wald tests deteriorates as the number of nuisance parameters increases. In particular, we note that the score test's small sample performance becomes poorer when the number of nuisance parameters increases, unlike what was observed when we increased the number of parameters of interest. When k = 2, the score test is competitive with the modified likelihood ratio tests and also with the bootstrap tests. However, it becomes liberal when k is increased to 3 and 4, being nonetheless less
oversized than the likelihood ratio and Wald tests. We also note that the modified likelihood ratio
and bootstrap tests displayed similar finite sample behaviour. Finally, we note that the Wald test
was again considerably oversized, but less so than in the previous setting, that is, the Wald test is
more liberal when there are several parameters of interest than when there are several nuisance
parameters.
We have also evaluated the tests' performances when inference is carried out on γ and the number of nuisance parameters is increased. The null hypothesis of interest is H0 : γq = 0, where q is the number of regression parameters in the precision submodel.

Table 9. Null rejection rates (%) when testing H0 : β2 = 0 (k = 2).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 12.4 11.8 10.8 6.6 6.1 5.7 1.8 1.3 1.2
SR 9.9 10.3 9.5 4.6 4.8 4.9 0.6 0.6 0.9
W 14.7 13.3 11.9 8.9 7.3 6.6 3.0 2.3 1.9
LRc 9.2 9.6 9.0 4.4 4.6 4.9 0.9 0.8 1.0
LRc2 9.4 9.8 9.1 4.6 4.7 4.9 1.0 0.9 1.1
LR∗ 8.9 9.8 9.3 4.5 4.6 5.0 1.0 1.0 1.2
SR∗ 9.1 9.8 9.3 4.5 4.7 5.0 0.9 1.0 1.1
W∗ 9.2 9.7 9.3 4.3 4.6 5.0 1.0 1.1 1.2

Table 10. Null rejection rates (%) when testing H0 : β3 = 0 (k = 3).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 16.8 13.6 12.7 10.5 7.8 7.2 3.1 2.1 1.8
SR 12.5 11.6 11.0 6.1 5.8 5.9 0.8 0.9 1.1
W 20.3 16.4 14.5 14.1 9.9 8.8 6.6 3.5 2.9
LRc 10.1 10.0 9.9 5.1 4.9 5.4 1.1 0.9 1.1
LRc2 10.7 10.3 10.1 5.5 5.1 5.5 1.2 1.0 1.2
LR∗ 10.2 10.1 10.0 5.2 4.7 5.6 1.1 1.0 1.2
SR∗ 9.8 10.2 9.9 5.0 5.0 5.5 1.0 1.1 1.1
W∗ 10.5 10.0 10.1 5.1 4.9 5.4 1.1 0.9 1.3

Table 11. Null rejection rates (%) when testing H0 : β4 = 0 (k = 4).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 15.6 13.5 12.8 8.8 7.1 6.7 2.5 1.6 1.7
SR 13.0 12.0 11.9 6.6 5.7 5.8 1.0 0.8 1.2
W 17.7 15.0 13.7 11.2 8.2 7.5 4.0 2.3 2.2
LRc 9.8 9.9 10.1 4.7 4.6 4.8 0.7 0.7 1.0
LRc2 10.5 10.1 10.3 4.9 4.7 4.9 0.8 0.7 1.1
LR∗ 9.4 10.1 10.4 4.7 4.8 5.1 0.7 0.8 1.1
SR∗ 9.5 10.1 10.4 4.8 4.8 5.0 0.9 0.8 1.1
W∗ 9.3 9.8 10.4 4.4 4.6 5.1 0.6 0.8 1.1

The mean submodel used in the simulations was

$$
\log\left(\frac{\mu_t}{1 - \mu_t}\right) = \beta_1 + \beta_2 x_{t2},
$$

and the precision submodel was

$$
\log(\phi_t) = \gamma_1 + \sum_{j=2}^{q} \gamma_j z_{tj},
$$

where t = 1, . . . , n and j = 1, . . . , q. The true values of β1 and β2 are 1.5 and −2.8, respectively.
When q = 2, γ1 = 3.9 and γ2 = 0. When q = 3, we have that γ1 = 3.9, γ2 = 5.0 and γ3 = 0.
Table 12. Null rejection rates (%) when testing H0 : γ2 = 0 (q = 2).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 13.4 12.0 12.3 7.6 6.5 6.6 1.8 1.7 1.2
SR 9.8 9.5 10.2 4.1 4.6 4.7 0.5 0.9 0.7
W 16.5 14.7 13.9 10.2 8.5 8.1 3.7 2.8 2.0
LRc 10.5 10.0 10.7 5.0 5.4 5.5 1.0 1.3 0.9
LRc2 10.6 10.0 10.8 5.1 5.5 5.6 1.1 1.4 0.9
LR∗ 10.4 9.9 10.7 5.1 5.3 5.6 1.1 1.6 1.0
SR∗ 10.6 10.0 10.7 5.3 5.2 5.2 1.2 1.5 1.1
W∗ 9.7 10.2 11.2 4.8 5.3 5.5 1.3 1.2 0.9

Table 13. Null rejection rates (%) when testing H0 : γ3 = 0 (q = 3).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 16.4 13.7 11.6 9.5 7.8 6.2 3.0 1.7 1.6
SR 7.1 8.4 8.1 3.0 3.7 3.4 0.4 0.6 0.5
W 25.8 19.4 15.5 18.6 12.9 9.2 8.7 4.5 2.9
LRc 12.0 11.1 9.5 6.1 5.5 4.8 1.4 1.0 1.1
LRc2 12.5 11.3 9.7 6.4 5.6 4.9 1.5 1.1 1.1
LR∗ 10.1 10.6 9.4 5.3 5.1 4.8 1.2 1.0 1.0
SR∗ 9.9 10.4 9.5 4.9 5.2 4.5 1.2 0.9 0.7
W∗ 10.8 10.7 9.6 5.3 5.2 4.6 1.3 1.0 1.0

Table 14. Null rejection rates (%) when testing H0 : γ4 = 0 (q = 4).

α = 10% α = 5% α = 1%
n 20 30 40 20 30 40 20 30 40

LR 18.6 14.0 13.2 11.3 8.1 7.0 3.3 2.0 1.9
SR 9.8 9.4 9.8 5.3 4.5 4.9 1.2 1.1 1.1
W 27.4 19.5 16.4 19.9 12.3 9.6 9.7 5.1 3.3
LRc 13.0 10.4 10.2 7.2 5.5 5.4 2.0 1.4 1.2
LRc2 14.9 11.1 10.5 8.3 5.8 5.6 2.5 1.5 1.3
LR∗ 10.5 10.0 9.6 5.2 5.0 5.0 1.3 1.3 1.2
SR∗ 10.3 9.6 10.2 5.3 4.8 5.2 1.3 1.2 1.2
W∗ 10.8 9.9 9.3 5.2 5.3 4.7 1.2 1.1 1.0

Finally, when q = 4, the precision submodel parameters are fixed at the following values: γ1 = 3.9,
γ2 = 5.0, γ3 = −1.0 and γ4 = 0. The null rejection rates of the different tests are displayed in
Tables 12–14.
It is important to notice that here the performance of the score test does not deteriorate when
the number of nuisance parameters is increased. Its finite sample behaviour is close to those of the
bootstrap and modified likelihood ratio tests. We also note that when q > 2, the modified likelihood
ratio tests are outperformed by the bootstrap tests, especially when n = 20. Once again, the Wald
test is considerably liberal. For instance, when q = 4, n = 20 and α = 10% its null rejection rate
is nearly 28%. Overall, the bootstrap tests display superior finite sample behaviour.
The numerical evidence presented in this section leads to several important conclusions. First,
the modified likelihood ratio tests outperform the standard likelihood ratio test. Second, the
score test oftentimes performs better than the modified likelihood ratio tests, especially when the
dimensions of the parameter of interest and nuisance parameter are small. Third, the best overall
small sample performance was achieved by the bootstrap tests. Fourth, the Wald test can be very liberal in small samples, i.e. it can display a type I error frequency well above the nominal level. This is a serious shortcoming and practitioners should be careful when basing their inferences on the Wald test, especially when the sample contains few observations. It is noteworthy that the standard z test statistics output by betareg (the R package that is commonly used in beta regression analyses) are Wald test statistics; for details on betareg, see [22]. Practitioners should be aware that such z tests can be considerably inaccurate in small samples. In particular, some regressors are likely to be incorrectly included in the mean and precision submodels when such tests are carried out with small sample sizes.

5. An application

In what follows, we shall present an empirical illustration of the testing inferences considered in
the previous sections. To that end we shall use a data set analysed by Smithson and Verkuilen
[23] that contains 44 observations on reading accuracy of dyslexic and non-dyslexic Australian
children. The variable of interest (y) is the reading accuracy index of each child. The covariates are: dyslexia versus non-dyslexia status (x1), non-verbal IQ converted to z-scores (x2) and an interaction variable (x3 = x1 x2). The participants (19 dyslexics and 25 controls) were students from primary schools in the Australian Capital Territory. The ages of the 44 children range from 8 years 5 months to 12 years 3 months. The covariate x1 is a dummy variable, which equals 1 if the child is dyslexic and −1 otherwise. The observed scores were linearly transformed from
their original scale to the open unit interval (0, 1); see [23]. The mean accuracy was 0.900 for
non-dyslexic readers and 0.606 for dyslexic children. The scores ranged from 0.459 to 0.990, and
averaged 0.773. At the outset, we tested the null hypothesis of constant dispersion (i.e. that the
precision parameter is the same for all observations) which was rejected by all tests at the 1%
nominal level. There is thus evidence of varying dispersion. The beta regression model estimated
by the authors and by Espinheira et al. [24,25] was

$$
\log\left(\frac{\mu_t}{1 - \mu_t}\right) = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \beta_3 x_{t1} x_{t2},
$$

$$
\log(\phi_t) = \gamma_0 + \gamma_1 x_{t1} + \gamma_2 x_{t2},
$$

t = 1, . . . , 44. We shall refer to this model as ‘Model 1’. The parameter estimates along with the
corresponding standard errors and tests p-values are given in Table 15. Since the sample size is
small, we shall use 10% as the nominal level for all testing inferences. We note that all covariates in the mean submodel are statistically significant.

Table 15. Point estimates, standard errors, and p-values; Model 1.

β0 β1 β2 β3 γ0 γ1 γ2

Estimate 1.1230 −0.7420 0.4860 −0.5810 3.3040 1.7470 1.2290
Std. error 0.1430 0.1430 0.1330 0.1330 0.2230 0.2620 0.2670
p-values:
LR 0.0000 0.0000 0.0066 0.0023 0.0000 0.0000 0.0105
SR 0.0000 0.0022 0.0415 0.0318 0.0000 0.0001 0.1354
W 0.0000 0.0000 0.0003 0.0000 0.0000 0.0000 0.0000
LRc 0.0000 0.0000 0.0133 0.0049 0.0000 0.0000 0.0179
LRc2 0.0000 0.0000 0.0129 0.0048 0.0000 0.0000 0.0179
LR∗ 0.0020 0.0000 0.0120 0.0080 0.0000 0.0000 0.0270
SR∗ 0.0020 0.0010 0.0390 0.0370 0.0000 0.0000 0.1210
W∗ 0.0000 0.0000 0.0010 0.0000 0.0000 0.0000 0.0010
Nonetheless, in the precision submodel, the score tests (standard and bootstrap) indicate that x2 is not statistically significant, that is, the null
hypothesis H0 : γ2 = 0 is not rejected.
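This data set is shipped with the betareg package as ReadingSkills, so Model 1 can be reproduced along the following lines (note that the package's default factor coding for dyslexia may differ from the ±1 coding above, which changes individual coefficient values but not the fitted model):

```r
library(betareg)
data("ReadingSkills", package = "betareg")

# Model 1: dyslexia, iq and their interaction in the mean submodel;
# dyslexia and iq in the precision submodel (logit and log links).
m1 <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq,
              data = ReadingSkills)
summary(m1)  # the reported z tests are Wald tests
```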
Given that the score test and its bootstrap variant performed well in our Monte Carlo simulations,
we searched for an alternative model to represent the data. We started by removing x2 from the precision submodel, but this led all tests to not reject the null hypotheses H0 : β2 = 0 and H0 : β3 = 0, i.e. they indicated that x2 and x3 should be removed from the mean submodel, which can be due to model misspecification. The reduced model, however, displayed a poor fit. We then
considered the following augmented model:

$$
\log\left(\frac{\mu_t}{1 - \mu_t}\right) = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \beta_3 x_{t1} x_{t2},
$$

$$
\log(\phi_t) = \gamma_0 + \gamma_1 x_{t1} + \gamma_2 x_{t2} + \gamma_3 x_{t2}^2,
$$

t = 1, . . . , 44. That is, we included x2² as a regressor in the precision submodel. We shall refer to
this model as ‘Model 2’.
The point estimates, standard errors and tests p-values obtained for the new fit (‘Model 2’)
are given in Table 16. Notice that the score and bootstrap score tests' p-values for testing the exclusion of x2² from the precision submodel are 0.0691 and 0.0800, respectively. We thus reject the null hypothesis H0 : γ3 = 0 at the 10% nominal level. The corresponding p-values for testing H0 : γ2 = 0 are 0.0682 and 0.0520. All tests indicate that all covariates in the precision submodel
are statistically significant at the 10% nominal level.
It is now important to evaluate the goodness-of-fit of the two estimated beta regression models,
namely Models 1 and 2. At the outset, we carried out the RESET mis-specification test [26].
The null hypothesis under test is that the model is correctly specified, which is tested against the alternative hypothesis of model misspecification. We used η̂t² as the testing variable in the mean submodel. The null hypothesis of correct model specification is not rejected at the usual nominal levels for either model.
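A sketch of this RESET-type check for the mean submodel, assuming the Model 1 fit m1 from above (lrtest() from the lmtest package works with betareg objects):

```r
library(lmtest)

# Add the squared fitted linear predictor of the mean submodel as a
# testing variable and test its exclusion; non-rejection is consistent
# with correct model specification.
rs      <- ReadingSkills
rs$eta2 <- predict(m1, type = "link")^2
m1_aug  <- betareg(accuracy ~ dyslexia * iq + eta2 | dyslexia + iq,
                   data = rs)
lrtest(m1, m1_aug)
```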
Table 17 presents the pseudo-R² and also the AIC and BIC values for the two models. We notice that the pseudo-R² of Model 1 is slightly higher, but that the Akaike and Bayesian information criteria favour Model 2.

Table 16. Point estimates, standard errors, and p-values; Model 2.

β0 β1 β2 β3 γ0 γ1 γ2 γ3

Estimate 1.040 −0.676 0.768 −0.838 2.576 1.770 1.918 1.218
Std. error 0.147 0.147 0.088 0.088 0.297 0.254 0.249 0.206
p-values:
LR 0.0000 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001
SR 0.0001 0.0041 0.0436 0.0429 0.0002 0.0014 0.0682 0.0691
W 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
LRc 0.0000 0.0004 0.0000 0.0000 0.0000 0.0000 0.0001 0.0015
LRc2 0.0000 0.0003 0.0000 0.0000 0.0000 0.0000 0.0001 0.0015
LR∗ 0.0120 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0020
SR∗ 0.0150 0.0020 0.0440 0.0360 0.0000 0.0010 0.0520 0.0800
W∗ 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0010

Table 17. Pseudo-R2 and model selection criteria.

Pseudo-R2 AIC BIC

Model 1 0.5756 −117.8040 −105.3144
Model 2 0.5144 −131.3091 −117.0356
[Figure 1 near here: two panels (Model 1 and Model 2) plotting absolute values of residuals against normal quantiles, with simulated envelopes.]

Figure 1. Simulated envelope plots for Models 1 and 2.

Figure 1 contains the simulated envelope plots for both models. It is noteworthy that these plots indicate that Model 2 yields a slightly better fit.
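The comparison in Table 17 can be sketched directly from the fitted objects; m2 below is a hypothetical fit of Model 2, with I() protecting the squared term in the precision submodel:

```r
# Model 2: iq^2 added to the precision submodel.
m2 <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq + I(iq^2),
              data = ReadingSkills)

AIC(m1, m2)  # information criteria: smaller is better
BIC(m1, m2)  # pseudo-R^2 is part of summary(m1) and summary(m2)
```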

6. Concluding remarks

The class of beta regressions is useful for modelling responses that assume values in the standard
unit interval such as rates and proportions. The model is indexed by mean and precision parameters
which can be modelled using regressors and link functions. Hypothesis testing inference is usually
based on the likelihood ratio test. Such a test uses an asymptotic (approximate) critical value and
may display size distortions in small samples. Ferrari and Pinheiro [16] proposed two modified likelihood ratio test statistics. Their numerical evidence shows that the modified tests outperform the standard likelihood ratio test in finite samples since they display smaller size distortions. In
this paper, we considered an alternative testing strategy based on bootstrap resampling. The main
idea is to obtain a set of artificial samples using the parametric bootstrap and use them to estimate
the test statistic null distribution. The test is then based on a critical value obtained from such an
estimated distribution, thus avoiding the use of asymptotic critical values. The bootstrap approach
is applied to the likelihood ratio, score and Wald tests. Extensive Monte Carlo simulations were
carried out to evaluate the relative merits of each test. The numerical evidence we present yields
several interesting and important conclusions. First, the Wald test tends to be severely oversized
in small samples, especially when used to test more than one restriction. For instance, the Wald
test null rejection rate is nearly 50% when testing three joint restrictions using only 20 observations
at the 10% nominal level. This is a huge size distortion. A direct implication of this result is that
practitioners will likely incorrectly conclude that some regressors are statistically significant when
inference is based on Wald tests (e.g. standard z tests) with small sample sizes. We recommend
that testing inference should not be based on Wald tests (including standard z tests) unless the
sample size is large.
The second important implication of the numerical evidence we present is that the score test
displays much better control of the type I error frequency relative to the likelihood ratio and Wald
tests. For instance, our numerical evidence includes a setting in which the score test null rejection
rate at the 10% nominal level and with 30 data points equals 10.1% whereas those of the likelihood
ratio and Wald tests are 20.7% and 36.4%, respectively.
Third, the finite sample behaviour of the score, likelihood ratio and Wald tests is greatly
improved when inference is based on bootstrap resampling, thus avoiding the use of asymptotic
critical values. The improvement in the type I error control can be substantial. As an illustration,
in one of our Monte Carlo experiments (r = 3, n = 20) the null rejection rate of the Wald test was
reduced from over 54% to 9.8% at the 10% nominal level by basing the inference on a bootstrap
critical value.
Fourth, the modified likelihood ratio tests proposed by Ferrari and Pinheiro [16] consistently
outperform the likelihood ratio test, as expected. Such modified tests, however, are slightly out-
performed by the standard score test in several settings. Overall, the bootstrap tests display better
control of the type I error probability than the two modified likelihood ratio tests. As an illustra-
tion, the sizes of the two modified tests are around 13% and 15% at the 10% nominal level in one of
our numerical exercises (q = 4 and n = 20) whereas the corresponding null rejection rates of the
bootstrap likelihood ratio, score and Wald tests are around 10.5%.
Finally, our results reveal that testing inference on the parameters that index the precision
submodel is usually less accurate than that performed on the mean submodel parameters in the
sense of typically displaying larger type II error frequency.

Acknowledgements
We gratefully acknowledge financial support from CNPq/Brazil.

References

[1] S.L.P. Ferrari and F. Cribari-Neto, Beta regression for modelling rates and proportions, J. Appl. Stat. 31 (2004),
pp. 799–815.
[2] P. McCullagh and J.A. Nelder, Generalized Linear Models, 2nd ed., Chapman and Hall, London, 1989.
[3] A.B. Simas, W. Barreto-Souza, and A.V. Rocha, Improved estimators for a general class of beta regression models,
Comput. Statist. Data Anal. 54 (2010), pp. 348–366.
[4] O.E. Barndorff-Nielsen, On a formula for the distribution of the maximum likelihood estimator, Biometrika 70
(1983), pp. 343–365.
[5] O.E. Barndorff-Nielsen, Adjusted versions of profile likelihood and directed likelihood, and extended likelihood, J.
R. Stat. Soc. B 56 (1994), pp. 125–140.
[6] D.R. Cox and N. Reid, Parameter orthogonality and approximate conditional inference, J. R. Stat. Soc. B 49 (1987),
pp. 1–39.
[7] D.R. Cox and N. Reid, A note on the difference between profile and modified profile likelihood, Biometrika 79 (1992),
pp. 408–411.
[8] P. McCullagh and R. Tibshirani, A simple method for the adjustment of profile likelihood, J. R. Stat. Soc. B 52 (1990),
pp. 325–344.
[9] S.E. Stern, A second-order adjustment to the profile likelihood in the case of a multidimensional parameter of interest,
J. R. Stat. Soc. B 59 (1997), pp. 653–665.
[10] F. Cribari-Neto and G.M. Cordeiro, On Bartlett and Bartlett-type corrections, Econometric Rev. 15 (1996), pp. 339–367.
[11] O.E. Barndorff-Nielsen, Inference on full or partial parameters based on the standardized signed log likelihood
ratio, Biometrika 73 (1986), pp. 307–322.
[12] O.E. Barndorff-Nielsen, Modified signed log likelihood ratio, Biometrika 78 (1991), pp. 557–563.
[13] I.M. Skovgaard, Likelihood asymptotics, Scand. J. Statist. 28 (2001), pp. 3–32.
[14] S.L.P. Ferrari and A.H.M.A. Cysneiros, Skovgaard’s adjustment to likelihood ratio tests in exponential family
nonlinear models, Statist. Probab. Lett. 78 (2008), pp. 3049–3057.
[15] T.F.N. Melo, K.L.P. Vasconcellos, and A.J. Lemonte, Some restriction tests in a new class of regression models for
proportions, Comput. Statist. Data Anal. 53 (2009), pp. 3972–3979.
[16] S.L.P. Ferrari and E.C. Pinheiro, Improved likelihood inference in beta regression, J. Stat. Comput. Simul. 81 (2011),
pp. 431–443.
[17] B. Efron, Bootstrap methods: Another look at the jackknife, Ann. Statist. 7 (1979), pp. 1–26.
[18] J. Nocedal and S.J. Wright, Numerical Optimization, Springer-Verlag, New York, 1999.
[19] I.M. Skovgaard, An explicit large-deviation approximation to one-parameter tests, Bernoulli 2 (1996), pp. 145–165.
[20] R. Beran, Prepivoting test statistics: A bootstrap view of asymptotic refinements, J. Amer. Statist. Assoc. 83 (1988),
pp. 687–697.
[21] A.C. Davison and D.V. Hinkley, Bootstrap Methods and Their Application, Cambridge University Press, Cambridge,
1997.
[22] F. Cribari-Neto and A. Zeileis, Beta regression in R, J. Statist. Softw. 34 (2010), pp. 1–24.
[23] M. Smithson and J. Verkuilen, A better lemon squeezer? Maximum-likelihood regression with beta-distributed
dependent variables, Psychol. Methods 11 (2006), pp. 54–71.
[24] P.L. Espinheira, S.L.P. Ferrari, and F. Cribari-Neto, Influence diagnostics in beta regression, Comput. Statist. Data
Anal. 52 (2008), pp. 4417–4431.
[25] P.L. Espinheira, S.L.P. Ferrari, and F. Cribari-Neto, On beta regression residuals, J. Appl. Stat. 35 (2008), pp. 407–419.
[26] F. Cribari-Neto and L.B. Lima, A misspecification test for beta regressions, Working Paper, Department of Statistics,
Federal University of Pernambuco, Recife, Brazil, 2007.
