
The Annals of Statistics
1981, Vol. 9, No. 6, 1218-1228

BOOTSTRAPPING REGRESSION MODELS

BY D. A. FREEDMAN¹

University of California, Berkeley
The regression and correlation models are considered. It is shown that the bootstrap approximation to the distribution of the least squares estimates is valid, and some error bounds are given.

1. Introduction. This paper, a sequel to Bickel and Freedman (1981a), will develop some asymptotic theory for applications of Efron's (1977) bootstrap to regression. Autoregressive models and linear econometric models may be considered elsewhere. Here, only multiple linear regression models will be considered:

(1.1) $\quad Y(n) = X(n)\beta + \varepsilon(n)$.

In this equation, $\beta$ is a $p \times 1$ vector of unknown parameters, to be estimated from the data; $Y(n)$ is an $n \times 1$ data vector; $X(n)$ is an $n \times p$ data matrix, of full rank $p \le n$; and $\varepsilon(n)$ is an $n \times 1$ vector of unobservables. The models differ in the stochastic mechanism supposed to have generated the data. However, in all cases $\varepsilon$ is supposed random. Throughout this paper, $p$ is fixed and $n$ is large. The case where $p$ and $n$ are both large will be considered in a future paper, Bickel and Freedman (1981b). Attention is restricted to the conventional least squares estimate $\hat\beta(n)$ of $\beta$, given by

$\hat\beta(n) = \{X(n)^T X(n)\}^{-1} X(n)^T Y(n)$.


How close is $\hat\beta(n)$ to $\beta$? The object of this paper is to compare the bootstrap approximation with the standard asymptotics. Under mild conditions, the bootstrap is valid. However, details depend on the model. To fix ideas, it may be useful to review two different kinds of models, "regression" and "correlation", and to indicate the results for each. Naturally, the mathematics may apply in other cases as well. In the regression model, $X$ is fixed and the errors are "homoscedastic." In the correlation model, $X$ is random and the errors are in general "heteroscedastic": the conditional distribution of the errors given $X$ depends on $X$. (As it turns out, only the behavior of the conditional second moment counts.) In order to succeed, the bootstrap simulation must reflect the relevant features of the stochastic model assumed to have generated the data. This point is also discussed by Efron (1977, Section 7), and in the jackknife context by Hinkley (1977).

The regression model. This is appropriate if, for example, the basic source of uncertainty is measurement error. An instance is the weighing designs used in precision calibration work at the National Bureau of Standards. The main assumptions are as follows.

(1.2) $\quad$ The matrix $X(n)$ is not random.

Received August 1980; revised April 1981.
¹ Research partially supported by NSF Grant MCS-80-02535. I worked on this paper while enjoying the hospitality of the ETH, Zurich.
AMS 1980 subject classifications. Primary 62E20; secondary 62G05, 62G15.
Key words and phrases. Regression, correlation, least squares, bootstrap, Wasserstein metrics.

(1.3) $\quad$ The components $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ of $\varepsilon(n)$ are independent, with common distribution $F$ having mean 0 and finite variance $\sigma^2$; both $F$ and $\sigma^2$ are unknown.

The $Y$-vector is considered to be the observed value of the random vector $X(n)\beta + \varepsilon(n)$. Then $\hat\beta(n)$ has mean $\beta$ and variance-covariance matrix $\sigma^2\{X(n)^T X(n)\}^{-1}$. Suppose

(1.4) $\quad \frac{1}{n} X(n)^T X(n) \to V$, which is positive definite.

Suppose also that the elements of $X(n)$ are uniformly small by comparison with $\sqrt{n}$. Then $\sqrt{n}\{\hat\beta(n) - \beta\}$ is asymptotically normal, with mean 0 and variance-covariance matrix $\sigma^2 V^{-1}$. In particular, the distribution of the pivotal quantity $\{X(n)^T X(n)\}^{1/2}\{\hat\beta(n) - \beta\}/\sigma$ is asymptotically normal with mean 0 and variance-covariance matrix $I_{p\times p}$, the $p \times p$ identity matrix.
NOTATION. $X^TX$ is positive definite, so it has a unique positive definite square root; this is $(X^TX)^{1/2}$. "Positive definite" is taken in the strict sense.
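To make the notation concrete, here is a minimal numerical sketch (Python with numpy; the design matrix below is a generic stand-in, not an example from the paper): the unique positive definite square root comes from the spectral decomposition of $X^TX$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # a generic 50 x 3 full-rank design (illustrative)
G = X.T @ X                    # X^T X, positive definite in the strict sense

# Spectral theorem: if G = Q diag(w) Q^T with w > 0, then the unique
# positive definite square root is G^{1/2} = Q diag(sqrt(w)) Q^T.
w, Q = np.linalg.eigh(G)
G_half = Q @ np.diag(np.sqrt(w)) @ Q.T

assert np.allclose(G_half @ G_half, G)          # (G^{1/2})^2 recovers G
assert np.all(np.linalg.eigvalsh(G_half) > 0)   # the root is positive definite
```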

The correlation model. This is appropriate if, for example, it is desired to estimate the regression plane for a certain population on the basis of a simple random sample, and to quantify the sampling error in the estimate. Now $X$ must be considered random, and $\varepsilon$ may be related to $X$. The $i$th row in the data array $(X, Y)$ will be denoted $(X_i, Y_i)$, representing the measurements on the $i$th subject in the sample; so $(X_i, Y_i)$ is a random $(p+1)$-dimensional row vector. Potentially, there are infinitely many such vectors; the rows of $X(n)$ consist of the first $n$ of the $X_i$'s, and $Y(n)$ consists of the first $n$ of the $Y_i$'s.

(1.5) $\quad$ The vectors $(X_i, Y_i)$ are assumed independent, with common (unknown) distribution $\mu$ in $R^{p+1}$; and $E\{\|(X_i, Y_i)\|^4\} < \infty$, where $\|\cdot\|$ is Euclidean length.

By convention, $X_i$ is a row vector. Let $\Sigma = E(X_i^T X_i)$, the $p \times p$ variance-covariance matrix of any row of $X$. Assume

(1.6) $\quad \Sigma$ is positive definite.

The $p$-vector $\beta$ of parameters is defined as the vector $\gamma$ which minimizes $E(\|Y_i - X_i\gamma\|^2)$; equivalently, $Y_i - X_i\beta$ is orthogonal to $X_i$:

(1.7) $\quad E(X_{ij}\varepsilon_i) = 0$ for $j = 1, \ldots, p$, where $\varepsilon_i = Y_i - X_i\beta$.

Of course, $\varepsilon(n)$ in (1.1) consists of the first $n$ of the $\varepsilon_i$'s. Relationship (1.7) entails

(1.8) $\quad \beta = \Sigma^{-1} E(X_i^T Y_i)$.

If the nonnegative definite matrix $M$ is defined by

(1.9) $\quad M_{jk} = E(X_{ij} X_{ik} \varepsilon_i^2)$,

then the asymptotics in the model can be summarized as follows: $X(n)^T\varepsilon(n)/\sqrt{n}$ is asymptotically normal, with mean 0 and variance-covariance matrix $M$. Since

(1.10) $\quad \sqrt{n}\{\hat\beta(n) - \beta\} = \{\tfrac{1}{n} X(n)^T X(n)\}^{-1} \cdot \tfrac{1}{\sqrt{n}} X(n)^T \varepsilon(n)$

and $\frac{1}{n}X(n)^TX(n) \to \Sigma$ a.e., it follows that $\sqrt{n}\{\hat\beta(n) - \beta\}$ is asymptotically normal, with mean 0 and variance-covariance matrix $\Sigma^{-1}M\Sigma^{-1}$.
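For readers who want to compute with these quantities: the following sketch estimates $\Sigma$, $M$, and the sandwich variance $\Sigma^{-1}M\Sigma^{-1}$ by plug-in. It is an illustration under the stated model, not a procedure from the paper, and all names are invented for the example.

```python
import numpy as np

def sandwich_variance(X, Y):
    """Plug-in estimate of Sigma^{-1} M Sigma^{-1} from (1.9)-(1.10).

    X: n x p array whose rows are the X_i; Y: length-n array.
    Illustrative sketch only, not the paper's bootstrap procedure.
    """
    n = X.shape[0]
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]  # least squares estimate
    resid = Y - X @ beta_hat                         # residuals, stand-ins for eps_i
    Sigma_hat = X.T @ X / n                          # estimates Sigma = E(X_i^T X_i)
    M_hat = (X * resid[:, None] ** 2).T @ X / n      # estimates M_jk = E(X_ij X_ik eps_i^2)
    S_inv = np.linalg.inv(Sigma_hat)
    return S_inv @ M_hat @ S_inv   # asymptotic variance of sqrt(n){beta_hat - beta}
```

In the homoscedastic case (1.11) below, the returned matrix is approximately $\sigma^2\Sigma^{-1}$.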


The $\Sigma$ in the correlation model plays the role of $V$ in the regression model. However, the asymptotics are quite different, unless the correlation model is "homoscedastic," which can be interpreted mathematically as follows:

(1.11) $\quad E(\varepsilon_i^2 \mid X_i) = \sigma^2$ a.e., where $\sigma^2 = E(\varepsilon_i^2)$.

Then $M = \sigma^2\Sigma$; as a result, in the homoscedastic case, the correlation model has the same asymptotics as the regression model. Perhaps surprisingly, the condition $E(\varepsilon_i \mid X_i) = 0$ does not seem to be needed, or even $E(\varepsilon_i) = 0$.

Bootstrapping. The object of this paper is to show that if the simulations are done in a manner consistent with the model, the bootstrap will give the same asymptotic results as classical methods.

In the regression model, it is appropriate to resample the centered residuals. More specifically, the observable column $n$-vector $\hat\varepsilon(n)$ of residuals is given by $\hat\varepsilon(n) = Y(n) - X(n)\hat\beta(n)$. However, $\hat\mu_n = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i(n)$ need not vanish, for the column space of $X$ need not include the constant vectors. Let $\tilde F_n$ be the empirical distribution of $\hat\varepsilon(n)$, centered at the mean, so $\tilde F_n$ puts mass $1/n$ at $\hat\varepsilon_i(n) - \hat\mu_n$ and $\int x \, d\tilde F_n = 0$. Given $Y(n)$, let $\varepsilon_1^*, \ldots, \varepsilon_n^*$ be conditionally independent, with common distribution $\tilde F_n$; let $\varepsilon^*(n)$ be the $n$-vector whose $i$th component is $\varepsilon_i^*$; and let $Y^*(n) = X(n)\hat\beta(n) + \varepsilon^*(n)$. Informally, $\varepsilon^*$ is obtained by resampling the centered residuals. And $Y^*$ is generated from the data, using the regression model with $\hat\beta$ as the vector of parameters and $\tilde F_n$ as the distribution of the disturbance terms $\varepsilon$. Now imagine giving the starred data $(X, Y^*)$ to another statistician, and asking for an estimate of the parameter vector. The least squares estimate is $\beta^* = (X^TX)^{-1}X^TY^*$. The bootstrap principle is that the distribution of $\sqrt{n}(\beta^* - \hat\beta)$, which can be computed directly from the data, approximates the distribution of $\sqrt{n}(\hat\beta - \beta)$. As will be shown in Section 2 below, this approximation is likely to be very good, provided $n$ is large and $\sigma^2 p \cdot \mathrm{trace}(X^TX)^{-1}$ is small.

What happens if the residuals are not centered before resampling? Suppose the constant vectors are neither included in nor orthogonal to the column space of $X$. Then the distribution of $\sqrt{n}(\beta^* - \hat\beta)$ incorporates a bias term which is random (depending on $\varepsilon_1, \ldots, \varepsilon_n$) and which in general has a nondegenerate normal limiting distribution. This is so despite the fact that the empirical distribution of the uncentered residuals converges to $F$. In short, without centering, the bootstrap will usually fail. Efron (1977) asks about this issue in his Section 7. This completes a sketch of the bootstrap for the regression model; details will be found in Section 2.

Turning now to the correlation model, there is in general some dependence between $\varepsilon_i$ and $X_i$: "heteroscedasticity." So it is inappropriate to resample the residuals, for that obliterates the dependence. Instead, it is necessary to resample the vectors. More specifically, let $\mu_n$ be the empirical distribution of the $(X_i, Y_i)$ for $i = 1, \ldots, n$. Thus, $\mu_n$ is a probability on $R^{p+1}$, putting mass $1/n$ at each vector $(X_i, Y_i)$. Given $\{X(n), Y(n)\}$, let $(X_i^*, Y_i^*)$ be independent, with common distribution $\mu_n$, for $i = 1, \ldots, m$. Informally, this amounts to taking a resample of size $m$ from the $n$ observed vectors. The technical advantages of letting $m \neq n$ are seen in Bickel and Freedman (1981a). Informally, data from a small sample can be used to judge the likely performance of a larger sample. Remember that $\hat\beta(n)$ minimizes $\frac{1}{n}\sum_{i=1}^n \{Y_i - X_i\gamma\}^2$ over $\gamma$. Thus, $\hat\beta(n)$ is to $\mu_n$ as $\beta$ is to the true law $\mu$ of $(X_i, Y_i)$. Let $\beta^*(m)$ be the least squares estimate based on the resample:

(1.12) $\quad \beta^*(m) = \{X^*(m)^T X^*(m)\}^{-1} X^*(m)^T Y^*(m)$.

In Section 3, it will be shown that the conditional law of $\sqrt{m}\{\beta^*(m) - \hat\beta(n)\}$ must be close to the unconditional law of $\sqrt{n}\{\hat\beta(n) - \beta\}$, i.e., the bootstrap approximation is valid. Notice that in the correlation model, unlike the regression model, the first $m$ rows of the starred design matrix are random. Notice too that in the correlation model, the residuals should not be centered before resampling: they are already orthogonal to $X$.
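Both resampling schemes just sketched are easy to express in code. The following is a minimal illustration (Python with numpy; the function names, seed, and loop structure are assumptions of the example, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def ols(X, Y):
    # Least squares estimate {X^T X}^{-1} X^T Y.
    return np.linalg.lstsq(X, Y, rcond=None)[0]

def residual_bootstrap(X, Y, B=1000):
    """Regression model: resample the *centered* residuals (Section 2)."""
    n = X.shape[0]
    beta_hat = ols(X, Y)
    resid = Y - X @ beta_hat
    centered = resid - resid.mean()        # draws from F-tilde_n; centering matters
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        eps_star = rng.choice(centered, size=n, replace=True)
        Y_star = X @ beta_hat + eps_star   # starred data from the fitted model
        draws[b] = ols(X, Y_star)
    return beta_hat, draws

def pairs_bootstrap(X, Y, m=None, B=1000):
    """Correlation model: resample whole vectors (X_i, Y_i) (Section 3)."""
    n = X.shape[0]
    m = n if m is None else m              # the resample size m may differ from n
    beta_hat = ols(X, Y)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=m)   # sample rows with replacement
        draws[b] = ols(X[idx], Y[idx])
    return beta_hat, draws
```

In either case the spread of the starred estimates around $\hat\beta$, rescaled by $\sqrt{n}$ (or $\sqrt{m}$), approximates the sampling law of $\sqrt{n}(\hat\beta - \beta)$; Sections 2 and 3 make this precise.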


2. The regression model. Assume the regression model, with (1.1-3). Let $\Psi_n(F)$ be the distribution of $\sqrt{n}\{\hat\beta(n) - \beta\}$ when $F$ is the law of the $\varepsilon$'s. So $\Psi_n(F)$ is a probability in $R^p$. For the next theorem, let $G$ be an alternative law for the $\varepsilon$'s: assume $G$ also has mean 0 and finite variance. In applications, $G$ will be the centered empirical distribution of the residuals. Bickel and Freedman (1981a) will be abbreviated B & F, due to frequent citation.
DEFINITION 2.1. Let $d_\ell^p$ be the Mallows metric for probabilities in $R^p$, relative to the Euclidean norm $\|\cdot\|$. Thus, if $\mu$ and $\nu$ are probabilities in $R^p$, $d_\ell^p(\mu, \nu)^\ell$ is the infimum of $E\{\|U - V\|^\ell\}$ over all pairs of random vectors $U$ and $V$, where $U$ has law $\mu$ and $V$ has law $\nu$. Abbreviate $d_\ell$ for $d_\ell^1$. For details, see Section 8 of B & F. Only $\ell = 1$ or 2 are of present interest.

NOTATION. In the present paper, $p$ is the dimension of a linear space; in Section 8 of B & F, however, $p$ is the index of an $L_p$ space.
THEOREM 2.1. $\quad d_2^p\{\Psi_n(F), \Psi_n(G)\}^2 \le n \cdot \mathrm{trace}\{X(n)^TX(n)\}^{-1} \cdot d_2(F, G)^2$.

PROOF. Let $A(n) = \{X(n)^TX(n)\}^{-1}X(n)^T$. Then $\Psi_n(F)$ is the law of $\sqrt{n}\,A(n)\varepsilon(n)$, where $\varepsilon(n)$ is an $n$-vector of independent random variables $\varepsilon_i$ having common law $F$. Likewise for $G$. Now use Lemma 8.9 of B & F, observing that $A(n)A(n)^T = \{X(n)^TX(n)\}^{-1}$. Also see (8.2) of B & F. □
To proceed, let $F_n$ be the empirical distribution of $\varepsilon_1, \ldots, \varepsilon_n$; let $\hat F_n$ be the empirical distribution of the residuals $\hat\varepsilon_1(n), \ldots, \hat\varepsilon_n(n)$ from the original regression on $n$ data vectors; and let $\tilde F_n$ be $\hat F_n$ centered at its mean $\hat\mu_n = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i(n)$. Since $\hat\varepsilon(n) = Y(n) - X(n)\hat\beta(n)$,

(2.1) $\quad \hat\varepsilon(n) - \varepsilon(n) = -P(n)\varepsilon(n)$,

where $P(n) = X(n)\{X(n)^TX(n)\}^{-1}X(n)^T$ is the projection matrix onto the column space of $X(n)$.

LEMMA 2.1. $\quad E\{d_2(F_n, \hat F_n)^2\} \le \sigma^2 p/n$.

PROOF. A routine computation starting from (2.1) shows

(2.2) $\quad E\{\|\hat\varepsilon(n) - \varepsilon(n)\|^2\} = \sigma^2 p$.

But

(2.3) $\quad d_2(F_n, \hat F_n)^2 \le \frac{1}{n}\sum_{i=1}^n \{\hat\varepsilon_i(n) - \varepsilon_i\}^2 = \frac{1}{n}\|\hat\varepsilon(n) - \varepsilon(n)\|^2$. □

LEMMA 2.2. $\quad E\{d_2(F_n, \tilde F_n)^2\} \le \sigma^2(p+1)/n$.

PROOF. Write $\bar\varepsilon_n = \frac{1}{n}\sum_{i=1}^n \varepsilon_i$ for the mean of $F_n$. Two applications of Lemma 8.8 of B & F show that

(2.4) $\quad d_2(F_n, \tilde F_n)^2 = \bar\varepsilon_n^2 - (\bar\varepsilon_n - \hat\mu_n)^2 + d_2(F_n, \hat F_n)^2$.

Now use the present Lemma 2.1.

REMARK. The negative term in (2.4) is a bit disconcerting. However, it is small. To see this, let the $n \times 1$ column vector $v(n)$ be identically 1. Using (2.1), $\bar\varepsilon_n - \hat\mu_n = \frac{1}{n}v(n)^T\{\varepsilon(n) - \hat\varepsilon(n)\} = \frac{1}{n}v(n)^T P(n)\varepsilon(n)$, which is of order $1/\sqrt{n}$ in probability.


As will now be shown, these results imply the validity of the bootstrap approximation, in probability, assuming for example (1.4). Recall $\Psi_n$ from the beginning of the section. Now

(2.5) $\quad E[d_2^p\{\Psi_n(\tilde F_n), \Psi_n(F)\}^2] \le n \cdot \mathrm{trace}\{X(n)^TX(n)\}^{-1} \cdot E\{d_2(\tilde F_n, F)^2\}$

and, because $d_2$ is a metric,

(2.6) $\quad \tfrac{1}{2}\, d_2(\tilde F_n, F)^2 \le d_2(\tilde F_n, F_n)^2 + d_2(F_n, F)^2$.

The first term goes to 0 in probability by Lemma 2.2; the second, by Lemma 8.4 of B & F; and $n \cdot \mathrm{trace}\{X(n)^TX(n)\}^{-1} = O(1)$ by (1.4). Of course, condition (1.4) can be weakened appreciably, and $p$ can be allowed to go to infinity slowly: Bickel and Freedman (1981b).

Rather than pursuing this, a theorem for convergence a.e. will be given. Consider $X(n)$ as the first $n$ rows of an infinite sequence of rows. Likewise, consider the disturbances $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ as the first $n$ of an infinite sequence of independent random variables with common distribution function $F$. The original regression problem has now been embedded into an infinite sequence of such problems. Allow the resample size $m$ to differ from $n$. For motivation, consider again the weighing designs used in precision calibration. The error distribution depends in principle only on the apparatus and procedures used, not on the specific weights. Thus, it may be possible to use data from one design to assess the probable accuracy of another.

This simulation will now be spelled out in more detail. Recall that $\hat\beta(n)$ is the estimate of $\beta$ based on the first $n$ data points. The starred data is generated by the recipe

$Y^*(m) = X(m)\hat\beta(n) + \varepsilon^*(m)$ $\quad$ (dimensions: $m\times 1$, $(m\times p)(p\times 1)$, $m\times 1$),

the $\varepsilon_1^*, \ldots, \varepsilon_m^*$ being independent with common distribution $\tilde F_n$, the empirical distribution of the residuals from the original data set, but centered at the mean $\hat\mu_n$. Now $\beta^*(m)$ is the parameter estimate based on the starred data:

$\beta^*(m) = [X(m)^TX(m)]^{-1} X(m)^T Y^*(m)$.

The starred residuals are

$\hat\varepsilon^*(m) = Y^*(m) - X(m)\beta^*(m)$.

The theoretical variance $\sigma^2 = E\{\varepsilon_i^2\}$ is estimated from the $n$ original data vectors by

(2.7) $\quad \hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n \{\hat\varepsilon_i(n) - \hat\mu_n\}^2$, where $\hat\mu_n = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i(n)$.

Likewise, the variance estimate from the $m$ starred vectors is

(2.8) $\quad \sigma^{*2} = \frac{1}{m}\sum_{i=1}^m \{\hat\varepsilon_i^*(m) - \hat\mu_m^*\}^2$, where $\hat\mu_m^* = \frac{1}{m}\sum_{i=1}^m \hat\varepsilon_i^*(m)$.

In principle, the starred data, as well as $\beta^*(m)$ and $\sigma^*$, depend on $n$; this is suppressed in the notation. The estimates $\hat\sigma^2$ are slightly biased, but this is immaterial for present purposes.

The next result is a special case of results in Lai, Robbins and Wei (1979), which gives further references. Write $\varepsilon(n)$ for the $n \times 1$ column vector $\varepsilon_1, \ldots, \varepsilon_n$. Likewise, write $\hat\varepsilon(n)$ for the $n \times 1$ column vector of residuals from the regression on the first $n$ data points.
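A hedged computational sketch of this recipe, allowing the resample size $m$ to differ from $n$ (the design for the starred regression is supplied by the caller; all names and the numpy machinery are assumptions of the example, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)

def pivot_bootstrap(X, Y, Xm, B=1000):
    """Bootstrap the pivot {X(m)^T X(m)}^{1/2} {beta*(m) - beta_hat(n)} / sigma*.

    X, Y: the n original data; Xm: the m x p design for the starred
    regression (take Xm = X when m = n). Sketch only, assuming (1.1-4).
    """
    n, p = X.shape
    m = Xm.shape[0]
    beta_n = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ beta_n
    centered = resid - resid.mean()               # F-tilde_n
    # symmetric square root of Xm^T Xm, as in the pivot
    w, Q = np.linalg.eigh(Xm.T @ Xm)
    root = Q @ np.diag(np.sqrt(w)) @ Q.T
    pivots = np.empty((B, p))
    for b in range(B):
        eps_star = rng.choice(centered, size=m, replace=True)
        Y_star = Xm @ beta_n + eps_star           # starred data
        beta_star = np.linalg.lstsq(Xm, Y_star, rcond=None)[0]
        r_star = Y_star - Xm @ beta_star
        sigma_star = np.sqrt(np.mean((r_star - r_star.mean()) ** 2))   # (2.8)
        pivots[b] = root @ (beta_star - beta_n) / sigma_star
    return pivots
```

By Theorem 2.2 below, the returned pivots should be approximately standard normal in $R^p$ when $m$ and $n$ are large.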
LEMMA 2.3. $\quad \frac{1}{n}X(n)^T\varepsilon(n) \to 0$ a.e. and $\hat\beta(n) \to \beta$ a.e.

PROOF. Use Kolmogorov's inequality, along the subsequence of powers of 2. □


LEMMA 2.4. $\quad \frac{1}{n}\|\hat\varepsilon(n) - \varepsilon(n)\|^2 \to 0$ a.e.

PROOF. As is easily seen from (2.1),

(2.9) $\quad \frac{1}{n}\|\hat\varepsilon(n) - \varepsilon(n)\|^2 = \{\tfrac{1}{n}\varepsilon(n)^TX(n)\} \cdot \{\tfrac{1}{n}X(n)^TX(n)\}^{-1} \cdot \{\tfrac{1}{n}X(n)^T\varepsilon(n)\}$.

But the first and third factors go to 0 a.e. by Lemma 2.3; the middle factor goes to $V^{-1}$ by assumption (1.4). □

LEMMA 2.5. $\quad d_2(F_n, \tilde F_n) \to 0$ a.e.

PROOF. Use (2.3), (2.4) and Lemma 2.4. □

LEMMA 2.6. $\quad d_2(\tilde F_n, F) \to 0$ a.e.

PROOF. Use Lemma 2.5, and Lemma 8.4 of B & F. □

LEMMA 2.7. Let $u_i$ and $v_i$ be real numbers, $1 \le i \le n$. Let

$\bar u = \frac{1}{n}\sum_{i=1}^n u_i$ and $s_u^2 = \frac{1}{n}\sum_{i=1}^n (u_i - \bar u)^2$,

and likewise for $v$. Then

$(s_u - s_v)^2 \le \frac{1}{n}\sum_{i=1}^n (u_i - v_i)^2$.

PROOF. Clearly, $s_u = \|u - \bar u\|/\sqrt{n}$, where $\bar u$ also denotes the corresponding constant vector, and likewise for $v$; so

$(s_u - s_v)^2 \le \frac{1}{n}\|(u - \bar u) - (v - \bar v)\|^2 = \frac{1}{n}[\|u - v\|^2 - n(\bar u - \bar v)^2] \le \frac{1}{n}\|u - v\|^2$. □
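A quick numerical sanity check of Lemma 2.7 (illustrative only; the vectors below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.normal(size=1000)
v = u + rng.normal(scale=0.1, size=1000)
s_u, s_v = u.std(), v.std()          # numpy's std uses the 1/n convention, as in the lemma
lhs = (s_u - s_v) ** 2
rhs = np.mean((u - v) ** 2)
assert lhs <= rhs                    # the lemma's inequality
```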

The next theorem is the a.e. justification of the bootstrap asymptotics. The behavior of the pivot will be considered in more detail in Bickel and Freedman (1981b).

THEOREM 2.2. Assume the regression model, with (1.1-4). Along almost all sample sequences, given $Y_1, \ldots, Y_n$, as $m$ and $n$ tend to $\infty$:
a) the conditional distribution of $\sqrt{m}\{\beta^*(m) - \hat\beta(n)\}$ converges weakly to normal, with mean 0 and variance-covariance matrix $\sigma^2 V^{-1}$;
b) the conditional distribution of $\sigma^*$ converges to point mass at $\sigma$;
c) the conditional distribution of the pivot $\{X(m)^TX(m)\}^{1/2}\{\beta^*(m) - \hat\beta(n)\}/\sigma^*$ converges to standard normal in $R^p$.
PROOF. Claim (a) is immediate from Theorem 2.1 and Lemma 2.6. Indeed, in that theorem, one puts $m$ for $n$ and $\tilde F_n$ for $G$. Now

$m \cdot \mathrm{trace}\{X(m)^TX(m)\}^{-1} \cdot d_2(\tilde F_n, F)^2 = \mathrm{trace}[\{\tfrac{1}{m}X(m)^TX(m)\}^{-1}] \cdot d_2(\tilde F_n, F)^2 \to 0$ a.e.,

because $\frac{1}{m}X(m)^TX(m) \to V$ positive definite, by assumption (1.4). By construction, $X(n)$ is the first $n$ rows of $X(n+1)$, so (1.4) entails that the elements of $X(n)$ are uniformly $o(\sqrt{n})$, and $X(n)^T\varepsilon(n)/\sqrt{n}$ is asymptotically normal.

Claim (b). Recall $\hat\sigma_n$ from (2.7). It will be shown that

(2.10) $\quad \hat\sigma_n \to \sigma$ a.e.
To argue this, introduce

$s_n^2 = \frac{1}{n}\sum_{i=1}^n (\varepsilon_i - \bar\varepsilon_n)^2$.

Clearly, $s_n \to \sigma$ a.e. In view of Lemmas 2.7 and 2.4,

$(\hat\sigma_n - s_n)^2 \le \frac{1}{n}\sum_{i=1}^n \{\hat\varepsilon_i(n) - \varepsilon_i\}^2 \to 0$ a.e.

This proves (2.10). Next, let

$\tilde\sigma^{*2} = \frac{1}{m}\sum_{i=1}^m (\varepsilon_i^* - \bar\varepsilon_m^*)^2$, where $\bar\varepsilon_m^* = \frac{1}{m}\sum_{i=1}^m \varepsilon_i^*$.

Recall from (2.8) that $\sigma^{*2}$ is the variance of the residuals in the starred regression. Now

$E\{(\sigma^* - \tilde\sigma^*)^2 \mid Y_1, \ldots, Y_n\} \le E[\tfrac{1}{m}\sum_{i=1}^m \{\hat\varepsilon_i^*(m) - \varepsilon_i^*\}^2 \mid Y_1, \ldots, Y_n]$ $\quad$ (by Lemma 2.7)
$\le \hat\sigma_n^2\, p/m$ $\quad$ (by (2.2) applied to the starred regression)
$\to 0$ a.e. $\quad$ (by (2.10)).

What remains is to show that the conditional law of $\tilde\sigma^{*2}$ is nearly point mass at $\sigma^2$. This follows from results in Section 8 of B & F. Indeed, condition on $Y_1, \ldots, Y_n$. By Lemma 8.6 of B & F,

$d_1(\tfrac{1}{m}\sum_{i=1}^m \varepsilon_i^{*2},\ \tfrac{1}{m}\sum_{i=1}^m \varepsilon_i^2) \le d_1(\varepsilon_1^{*2}, \varepsilon_1^2)$.

(Both sides of the display are random; for the distance computed is between the conditional distribution of the starred quantity and the unconditional distribution of the unstarred quantity.) Now $\varepsilon_1^*$ has conditional law $\tilde F_n$; and $\varepsilon_1$ has law $F$; and $d_2(\tilde F_n, F) \to 0$ a.e. by Lemma 2.6. So $d_1(\varepsilon_1^{*2}, \varepsilon_1^2) \to 0$ a.e. by Lemma 8.5 of B & F, with $\phi(\varepsilon) = \varepsilon^2$. In short, the conditional law of $\frac{1}{m}\sum_{i=1}^m \varepsilon_i^{*2}$ differs little from the unconditional law of $\frac{1}{m}\sum_{i=1}^m \varepsilon_i^2$, and must therefore concentrate near $\sigma^2$. Likewise, the conditional law of $\frac{1}{m}\sum_{i=1}^m \varepsilon_i^*$ concentrates near 0.

Claim (c). This is immediate from (a) and (b). □

To conclude this section, consider the bootstrap when the uncentered residuals are resampled. Let $v(m)$ be a column ($m \times 1$) vector of 1's. Applying (1.10) to the starred regression,

(2.11) $\quad E[\sqrt{m}\{\beta^*(m) - \hat\beta(n)\} \mid Y_1, \ldots, Y_n] = \hat\mu_n \cdot \{\tfrac{1}{m}X(m)^TX(m)\}^{-1} \cdot \tfrac{1}{\sqrt{m}}X(m)^Tv(m)$,

where by (2.1)

$\hat\mu_n = \frac{1}{n}\, v(n)^T \{I_{n\times n} - P(n)\}\, \varepsilon(n)$.

Now $\hat\mu_n$ is scalar, $E(\hat\mu_n) = 0$, and

(2.12) $\quad E(n\hat\mu_n^2) = \frac{\sigma^2}{n} \|\{I_{n\times n} - P(n)\}\, v(n)\|^2$.

It is easy to find a sequence of designs $X(n)$ for which (1.4) holds and $\frac{1}{n}X(n)^Tv(n)$ converges to a limit $L$; then the right side of (2.12) converges to $\sigma^2(1 - L^TV^{-1}L)$. Assume $L \neq 0$ and $L^TV^{-1}L < 1$; i.e., $v$ has a nontrivial projection into the column space of $X$, and this projection is substantially shorter than $v$. If $m$ is of order $n$, the right side of (2.11) converges to a proper Gaussian limit. If $m$ dominates $n$, the right side of (2.11) blows up.
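A small simulation makes the failure visible. The design below is a hypothetical one-column example whose column space neither contains nor is orthogonal to the constant vector; the numbers and names are for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = np.linspace(1.0, 2.0, n)
X = x[:, None]                  # one regressor, no intercept: constants are
                                # neither in col(X) nor orthogonal to it
Y = 2.0 * x + rng.normal(size=n)

beta_n = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ beta_n          # the residual mean mu_n need not vanish here

def mean_shift(pool, B=2000):
    # average of sqrt(n) * (beta* - beta_n) over B bootstrap replications
    total = 0.0
    for _ in range(B):
        Y_star = X @ beta_n + rng.choice(pool, size=n, replace=True)
        total += np.sqrt(n) * (np.linalg.lstsq(X, Y_star, rcond=None)[0][0] - beta_n[0])
    return total / B

print("centered resampling:  ", mean_shift(resid - resid.mean()))  # near 0
print("uncentered resampling:", mean_shift(resid))                 # generally not near 0:
                                                                   # the bias term of (2.11)
```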

3. The correlation model. In this section, the object is to justify the bootstrap in the correlation model, by a straightforward application of the machinery in Section 8 of B & F. The following lemma will be useful. To state it, let $\mu_n$ and $\mu$ be probabilities on $R^{p+1}$ for which the fourth power of the Euclidean norm is integrable. A typical point in $R^{p+1}$ will be written $(x, y)$, where $x \in R^p$ is viewed as a row vector, and $y \in R^1$. Assume

$\Sigma(\mu) = \int x^Tx\, \mu(dx, dy)$ is positive definite,

and let

$\beta(\mu) = \Sigma(\mu)^{-1} \int x^Ty\, \mu(dx, dy)$; $\quad \varepsilon(\mu, x, y) = y - x\beta(\mu)$.

LEMMA 3.1. If $d_4^{p+1}(\mu_n, \mu) \to 0$, then
a) $\Sigma(\mu_n) \to \Sigma(\mu)$ and $\beta(\mu_n) \to \beta(\mu)$;
b) the $\mu_n$-law of $\varepsilon(\mu_n, x, y)x$ converges to the $\mu$-law of $\varepsilon(\mu, x, y)x$ in $d_2^p$;
c) the $\mu_n$-law of $\varepsilon(\mu_n, x, y)^2$ converges to the $\mu$-law of $\varepsilon(\mu, x, y)^2$ in $d_1$.
PROOF. Claim (a) is immediate from Lemma 8.3c of B & F.

Claim (b). Weak convergence is easy, and then Lemma 8.3a of B & F can be used. Here is a sketch of the argument:

$\|\varepsilon(\mu_n, x, y)x\|^2 = \varepsilon(\mu_n, x, y)^2\,\|x\|^2 = y^2\|x\|^2 - 2y\,x\beta(\mu_n)\,\|x\|^2 + \beta(\mu_n)^Tx^Tx\beta(\mu_n)\,\|x\|^2$.

Integrate with respect to $\mu_n$, and use claim (a).

Claim (c). First, the $\mu_n$-law of $\varepsilon(\mu_n, x, y)$ converges to the $\mu$-law of $\varepsilon(\mu, x, y)$ in $d_2$, by the previous argument. Then use Lemma 8.5 of B & F with $\phi(\varepsilon) = \varepsilon^2$. □

Now return to the correlation model described in Section 1. The original $n$ data vectors are $(X_i, Y_i)$ for $i = 1, \ldots, n$; these are independent, with common distribution $\mu$; their empirical distribution is $\mu_n$. Both $\mu$ and $\mu_n$ are probabilities on $R^{p+1}$.

LEMMA 3.2. $\quad d_4^{p+1}(\mu_n, \mu) \to 0$ a.e. as $n \to \infty$.

PROOF. This is a special case of Lemma 8.4 of B & F; the variables are $L_4$ by (1.5). □

Turn now to bootstrapping. Given $\{X(n), Y(n)\}$, the resampled vectors $(X_i^*, Y_i^*)$ are independent, with common distribution $\mu_n$, for $i = 1, \ldots, m$. Let $X^*(m)$ be the $m \times p$ matrix whose $i$th row is $X_i^*$; and $Y^*(m)$ is the $m \times 1$ column vector of the $Y_i^*$'s. The least squares estimate based on the original data is $\hat\beta(n)$; on the starred data, $\beta^*(m)$: see (1.12). In the original data, the vector of unobservable disturbances is $\varepsilon(n)$, see (1.7); the observable residuals are

(3.1) $\quad \hat\varepsilon(n) = Y(n) - X(n)\hat\beta(n)$.

In the starred data, the $m \times 1$ column vector of disturbances is $\varepsilon^*(m)$, with

(3.2) $\quad \varepsilon^*(m) = Y^*(m) - X^*(m)\hat\beta(n)$.

The $m \times 1$ column vector of residuals is $\hat\varepsilon^*(m)$, with

(3.3) $\quad \hat\varepsilon^*(m) = Y^*(m) - X^*(m)\beta^*(m)$.

The next result shows that the asymptotics estimated from the bootstrap are correct. Recall that $\Sigma$ is the variance-covariance matrix of $X_i$; and $M$ was defined in (1.9). The dependence of the starred data and $\beta^*(m)$ on $n$ is suppressed in the notation.
THEOREM 3.1. Assume the correlation model, with conditions (1.5-6). Along almost all sample sequences, given $(X_i, Y_i)$ for $1 \le i \le n$, as $m$ and $n$ go to infinity:
a) $\frac{1}{m}X^*(m)^TX^*(m)$ converges in conditional probability to $\Sigma$;
b) the conditional law of $\sqrt{m}\{\beta^*(m) - \hat\beta(n)\}$ goes weakly to normal, with mean 0 and variance-covariance matrix $\Sigma^{-1}M\Sigma^{-1}$.

PROOF. As in (1.10),

(3.4) $\quad \sqrt{m}\{\beta^*(m) - \hat\beta(n)\} = W^*(m)^{-1}Z^*(m)$,

where

(3.5) $\quad W^*(m) = \frac{1}{m}X^*(m)^TX^*(m) = \frac{1}{m}\sum_{i=1}^m X_i^{*T}X_i^*$

and

(3.6) $\quad Z^*(m) = \frac{1}{\sqrt{m}}X^*(m)^T\varepsilon^*(m) = \frac{1}{\sqrt{m}}\sum_{i=1}^m X_i^{*T}\varepsilon_i^*$.

Here, $W^*(m)$ is a $p \times p$ matrix; $Z^*(m)$ is a $p \times 1$ column vector. The corresponding unstarred quantities are

(3.7) $\quad W(m) = \frac{1}{m}X(m)^TX(m) = \frac{1}{m}\sum_{i=1}^m X_i^TX_i$;

(3.8) $\quad Z(m) = \frac{1}{\sqrt{m}}X(m)^T\varepsilon(m) = \frac{1}{\sqrt{m}}\sum_{i=1}^m X_i^T\varepsilon_i$.

Now $W^*(m)$ is a vector sum in $R^{p\times p}$; condition it on $\{X(n), Y(n)\}$. By Lemma 8.6 of B & F,

$d_2^{p\times p}\{W^*(m), W(m)\} \le d_2^{p\times p}\{X_1^{*T}X_1^*,\ X_1^TX_1\}$.

Again, both sides of the display are random variables: for the distance is computed between the conditional distribution of the starred quantity and the unconditional distribution of the unstarred quantity. The right hand side of the display goes to 0 a.e. as $n \to \infty$; this follows from Lemma 3.2, and Lemma 8.5 of B & F; the relevant $\phi$ is $\phi(x, y) = x^Tx$ from $R^{p+1}$ to $R^{p\times p}$. In other words, the conditional law of $W^*(m)$ is close to the unconditional law of $W(m)$, but the latter concentrates near $\Sigma$, the variance-covariance matrix of $X_i$. This proves:

(3.9) $\quad$ The conditional law of $W^*(m)$ concentrates near $\Sigma$.

Likewise, $Z^*(m)$ is a vector sum in $R^p$. Condition it on $\{X(n), Y(n)\}$ and use Lemma 8.7 of B & F to obtain

$d_2^p\{Z^*(m), Z(m)\}^2 \le d_2^p\{X_1^{*T}\varepsilon_1^*,\ X_1^T\varepsilon_1\}^2$.

The right hand side goes to 0 a.e. as $n \to \infty$. Indeed, Lemma 3.2 shows $\mu_n \to \mu$ a.e. in $d_4^{p+1}$; then use Lemma 3.1b. In other words, the conditional law of $Z^*(m)$ is close to the unconditional law of $Z(m)$, and the latter is essentially multivariate Gaussian, with mean 0 and variance-covariance matrix $M$ defined by (1.9). This proves:

(3.10) $\quad$ The conditional law of $Z^*(m)$ is essentially multivariate normal, with mean 0 and variance-covariance matrix $M$.

To complete the argument, combine (3.9) and (3.10). □
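To connect Theorem 3.1 with computation, here is a hedged simulation sketch: the covariance of the pairs-bootstrap draws is compared with the plug-in sandwich matrix $\Sigma^{-1}M\Sigma^{-1}$. The data-generating process and all names are assumptions of the example, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, m, B = 500, 2, 500, 2000
X = rng.normal(size=(n, p))
eps = rng.normal(size=n) * (0.5 + X[:, 0] ** 2)   # heteroscedastic: eps depends on X
Y = X @ np.array([1.0, -1.0]) + eps

beta_n = np.linalg.lstsq(X, Y, rcond=None)[0]

# pairs bootstrap: conditional law of sqrt(m){beta*(m) - beta_hat(n)}
draws = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=m)
    draws[b] = np.linalg.lstsq(X[idx], Y[idx], rcond=None)[0]
boot_cov = np.cov(np.sqrt(m) * (draws - beta_n), rowvar=False)

# plug-in sandwich Sigma^{-1} M Sigma^{-1}, with M from (1.9)
resid = Y - X @ beta_n
Sigma_hat = X.T @ X / n
M_hat = (X * resid[:, None] ** 2).T @ X / n
S_inv = np.linalg.inv(Sigma_hat)
print(boot_cov)                   # the two matrices should roughly agree
print(S_inv @ M_hat @ S_inv)
```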

In the correlation model, $(X^TX)^{1/2}(\hat\beta - \beta)/\hat\sigma$ is not pivotal. However, bootstrapping it may be of interest. The only new issue is $\hat\sigma$. As before, let

$\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i(n)^2$.

This estimates $\sigma^2 = E(\varepsilon_i^2)$ from the data, as the mean square of the residuals: see (3.1). The corresponding estimate based on the starred data is

(3.11) $\quad \sigma^{*2} = \frac{1}{m}\sum_{i=1}^m \hat\varepsilon_i^*(m)^2$,

where the starred residuals are defined in (3.3).


THEOREM 3.2. Assume the correlation model, with conditions (1.5-6). Along almost all sample sequences, given $(X_i, Y_i)$ for $1 \le i \le n$, as $m$ and $n$ tend to infinity, the conditional law of $\sigma^*$ converges weakly to point mass at $\sigma$.

PROOF. Using (2.9) on the starred regression,

$\|\hat\varepsilon^*(m) - \varepsilon^*(m)\|^2 = Z^*(m)^T\, W^*(m)^{-1}\, Z^*(m)$.

Now use (3.9-10) to conclude:

(3.12) $\quad$ The conditional law of $\frac{1}{m}\|\hat\varepsilon^*(m) - \varepsilon^*(m)\|^2$ concentrates near 0.

For the definition of $\sigma^*$, see (3.11). Let

$\tilde\sigma^{*2} = \frac{1}{m}\sum_{i=1}^m \varepsilon_i^{*2}$

be the average of the squares of the starred disturbances (as opposed to residuals); see (3.2-3). Now

$(\sigma^* - \tilde\sigma^*)^2 \le \frac{1}{m}\|\hat\varepsilon^*(m) - \varepsilon^*(m)\|^2$,

so it remains only to show that $\tilde\sigma^{*2}$ is nearly $\sigma^2$. Condition the $\varepsilon_i^*$ on $\{X(n), Y(n)\}$. In view of Lemma 8.6 of B & F,

$d_1(\tfrac{1}{m}\sum_{i=1}^m \varepsilon_i^{*2},\ \tfrac{1}{m}\sum_{i=1}^m \varepsilon_i^2) \le d_1(\varepsilon_1^{*2}, \varepsilon_1^2)$.

But the right hand side tends to 0 a.e. by Lemmas 3.2 and 3.1c. In other words, the conditional law of $\frac{1}{m}\sum_{i=1}^m \varepsilon_i^{*2}$ is close to the unconditional law of $\frac{1}{m}\sum_{i=1}^m \varepsilon_i^2$. And the latter concentrates near $\sigma^2$. □

In particular, as $m$ and $n$ tend to $\infty$, the conditional law of $\{X^*(m)^TX^*(m)\}^{1/2}\{\beta^*(m) - \hat\beta(n)\}/\sigma^*$ converges to the appropriate limit: normal with mean 0 and variance-covariance matrix $\Sigma^{-1/2}M\Sigma^{-1/2}/\sigma^2$. In the homoscedastic case, this is just the $p \times p$ identity matrix $I_{p\times p}$: see (1.11).

What is the role of the 4th moment condition in (1.5)? To secure the conventional asymptotics, the following conditions seem to be needed:

$E(\|X_i\|^2) < \infty$ and $E(Y_i^2) < \infty$ and $E(\|X_i\varepsilon_i\|^2) < \infty$.

Preliminary calculations suggest that under these minimal conditions, the bootstrap will be valid in probability; convergence a.e. can be secured by requiring $E\{\|X_i\varepsilon_i\|^{2+\delta}\} < \infty$. Convergence a.e. under the minimal conditions seems to be quite a delicate question.

REFERENCES

BICKEL, P. and FREEDMAN, D. (1981a). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196-1217.
BICKEL, P. and FREEDMAN, D. (1981b). More on bootstrapping regression models. Technical report, Statistics Department, University of California, Berkeley.
BILLINGSLEY, P. (1979). Probability and Measure. Wiley, New York.
EFRON, B. (1977). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1-26.
HINKLEY, D. (1977). On jackknifing in unbalanced situations. Technometrics 19 285-292.
LAI, T., ROBBINS, H. and WEI, C. (1979). Strong consistency of least squares estimates in multiple regression. J. Multivariate Anal. 9 343-361.
DEPARTMENT OF STATISTICS
UNIVERSITY OF CALIFORNIA
BERKELEY, CALIFORNIA 94720
