3 The Basic Linear Model: Finite Sample Results
a vector of coefficients which are assumed unknown. In matrix notation:
$$
\underset{T\times 1}{y} \;=\; \underset{T\times k}{X}\,\underset{k\times 1}{\beta} \;+\; \underset{T\times 1}{u}.
$$
Assumptions:
The crucial assumption is A1. It states that the expected value of the errors is zero conditional on all entries in the matrix of regressors. A necessary condition is that the errors and the regressors be uncorrelated. Since this is the most relevant condition in practice, we shall loosely refer to Assumption A1 as requiring the regressors and errors to be uncorrelated, i.e. whatever is left unexplained (even if it does truly affect the dependent variable) is uncorrelated with what is used to explain the dependent variable. Then a unit change in, say, $x_i$ has an effect of $\beta_i$ on $y$ holding everything else constant, since the change that is used to explain $y$ is not associated (correlated) with changes in the unexplained part (the residual). Sufficient conditions for A1 to hold are:
With cross-section data: the data $(y_i, x_{1i}, \ldots, x_{ki})$ for units (individuals) $i$ ($i = 1, \ldots, n$) are a random sample from some large population (i.e., the units are independent draws) and, denoting the population relation as $y = X\beta + u$, we have $E(u \mid X) = 0$.
For time series data, A1 implies that the residuals are uncorrelated
with the regressors at all leads and lags. This is a very strong
assumption which almost never holds and is referred to as having
strongly exogenous regressors.
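As a rough numerical illustration of why A1 matters, the following sketch (a hypothetical design, using numpy; not taken from the notes) simulates one case where the error is independent of the regressor and one where it is correlated with it. The least-squares slope is centred on the true coefficient only in the first case.

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta, n_rep = 200, 1.0, 2000
est_exog, est_endog = [], []

for _ in range(n_rep):
    x = rng.normal(size=T)

    # Case 1: E(u|x) = 0, so A1 holds.
    u = rng.normal(size=T)
    y = beta * x + u
    est_exog.append(x @ y / (x @ x))          # OLS slope (no intercept)

    # Case 2: the error is correlated with the regressor, violating A1.
    u_bad = 0.5 * x + rng.normal(size=T)
    y_bad = beta * x + u_bad
    est_endog.append(x @ y_bad / (x @ x))

print("mean OLS slope, A1 holds   :", np.mean(est_exog))   # close to 1.0
print("mean OLS slope, A1 violated:", np.mean(est_endog))  # close to 1.5
```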
A4 implies that $E(u_t^2) = \sigma^2$ and, hence, that the errors are homoskedastic, i.e. they have the same variance across time or across units. This is a strong assumption which rarely holds, and we shall consider methods to deal with this fact later. Assumption A5 implies that $E(u_t u_s) = 0$ for $t \neq s$ and, hence, that the errors are uncorrelated. For cross-section data, this assumption is redundant since it is implied by the requirement of a random sample. With time series data, this assumption rules out temporal or serial correlation.
Assumptions A1-A5 imply that
$$
V(u) = V(u \mid X) = V(y \mid X) = \sigma^2 I,
$$
which we label spherical errors for reasons that will be explained later.
We then have the following result about the variance of the least-squares
estimates conditional on the values of the regressors:
$$
V(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1},
$$
since
$$
\begin{aligned}
V(\hat{\beta} \mid X) &= \mathrm{Var}\!\left((X'X)^{-1}X'y \mid X\right) \\
&= (X'X)^{-1}X'\,\mathrm{Var}(y \mid X)\,X(X'X)^{-1} \\
&= (X'X)^{-1}X'\,\mathrm{Var}(u \mid X)\,X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.
\end{aligned}
$$
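As a check on this formula, the sketch below (a hypothetical design, using numpy; not part of the notes) holds $X$ fixed, redraws spherical errors many times, and compares the Monte Carlo covariance of the least-squares estimates with $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k, sigma2 = 100, 3, 2.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # fixed regressors
beta = np.array([1.0, 0.5, -0.3])

# Theoretical conditional variance: sigma^2 (X'X)^{-1}
V_theory = sigma2 * np.linalg.inv(X.T @ X)

# Monte Carlo: redraw spherical errors many times and re-estimate beta
draws = []
for _ in range(20000):
    y = X @ beta + np.sqrt(sigma2) * rng.normal(size=T)
    draws.append(np.linalg.solve(X.T @ X, X.T @ y))
V_mc = np.cov(np.array(draws), rowvar=False)

print(np.round(V_theory, 4))
print(np.round(V_mc, 4))      # should be close to V_theory
```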
Lemma. Under A1-A5, the variance of the estimate $\hat{\beta}_j$ of a single coefficient satisfies
$$
V(\hat{\beta}_j \mid X) = \frac{\sigma^2}{\sum_{t=1}^{T}(x_{jt} - \bar{x}_j)^2\,(1 - R_j^2)},
$$
where $\bar{x}_j = T^{-1}\sum_{t=1}^{T} x_{jt}$ and $R_j^2$ is the $R^2$ from a regression of $x_j$ on all the other regressors (one of which is assumed to be a constant).

Proof: Separating out the $j$-th regressor, we have
$$
y = x_j \beta_j + X_{-j}\beta_{-j} + u, \tag{1}
$$
where
$$
X_{-j} = (x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k)
$$
and similarly
$$
\beta_{-j} = (\beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_k)'.
$$
Partialling out $X_{-j}$, the least-squares estimate of $\beta_j$ can be written as $\hat{\beta}_j = (x_j^{*\prime} x_j^{*})^{-1} x_j^{*\prime} y^{*}$, where $y^{*} = (I - P_{X_{-j}})y$ and $x_j^{*} = (I - P_{X_{-j}})x_j$. Hence,
$$
\begin{aligned}
V(\hat{\beta}_j \mid X) &= \frac{\sigma^2}{x_j^{*\prime} x_j^{*}} \\
&= \frac{\sigma^2}{x_j'(I - P_{X_{-j}})'(I - P_{X_{-j}})x_j} \\
&= \frac{\sigma^2}{x_j'(I - P_{X_{-j}})x_j} \\
&= \frac{\sigma^2}{\sum_{t=1}^{T}(x_{jt} - \bar{x}_j)^2\,(1 - R_j^2)},
\end{aligned}
$$
using the fact that
$$
R_j^2 = 1 - \frac{x_j'(I - P_{X_{-j}})x_j}{\sum_{t=1}^{T}(x_{jt} - \bar{x}_j)^2}.
$$
This lemma is useful since it shows under what conditions a coefficient can be estimated precisely. Two factors contribute to the precision. The first is that the regressor associated with the coefficient should be highly variable (or have high leverage). The second is that the correlation between this regressor and the others should be as small as possible. The intuition behind the second factor is that if two regressors are highly correlated it does not really matter which one is assigned the weight in obtaining a fitted value for the dependent variable; the least-squares method is in some sense "undecided" as to which particular weights to assign, hence the imprecision.
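The two precision factors can be checked numerically. The sketch below (a hypothetical two-regressor design, using numpy; not part of the notes) computes $V(\hat{\beta}_j \mid X)$ both directly from $\sigma^2(X'X)^{-1}$ and from the formula in the lemma, for a weakly and a strongly correlated pair of regressors; the two numbers agree and both grow as the correlation rises.

```python
import numpy as np

def var_beta_j(X, sigma2, j):
    """Two equivalent ways to compute V(beta_j_hat | X) under A1-A5."""
    direct = sigma2 * np.linalg.inv(X.T @ X)[j, j]

    # Regress x_j on the remaining columns (including the constant)
    xj = X[:, j]
    X_other = np.delete(X, j, axis=1)
    fitted = X_other @ np.linalg.lstsq(X_other, xj, rcond=None)[0]
    ssr = np.sum((xj - fitted) ** 2)
    sst = np.sum((xj - xj.mean()) ** 2)
    R2_j = 1.0 - ssr / sst
    via_lemma = sigma2 / (sst * (1.0 - R2_j))
    return direct, via_lemma

rng = np.random.default_rng(2)
T, sigma2 = 200, 1.0
x1 = rng.normal(size=T)
for rho in (0.1, 0.95):                       # low vs high collinearity
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=T)
    X = np.column_stack([np.ones(T), x1, x2])
    print(rho, var_beta_j(X, sigma2, j=1))    # both numbers agree; larger when rho is high
```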
Consider any other linear unbiased estimator $\tilde{\beta} = Ly$; unbiasedness requires $LX = I_k$, so that $V(\tilde{\beta} \mid X) = \sigma^2 LL'$. Define
$$
Z = L - (X'X)^{-1}X',
$$
so that, using $LX = I_k$,
$$
ZZ' = \left[L - (X'X)^{-1}X'\right]\left[L' - X(X'X)^{-1}\right] = LL' - (X'X)^{-1}.
$$
That is,
$$
V(\tilde{\beta} \mid X) - V(\hat{\beta} \mid X) = \sigma^2 ZZ' \geq 0
$$
(a positive semi-definite matrix). The equality holds if and only if $\tilde{\beta} = \hat{\beta}$, since
$$
ZZ' = 0 \;\Rightarrow\; Z = 0 \;\Rightarrow\; L = (X'X)^{-1}X' \;\Rightarrow\; \tilde{\beta} = \hat{\beta}.
$$
Hence, $\hat{\beta}$ has the smallest conditional variance among all linear unbiased estimators of $\beta$.
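The argument can also be verified numerically. In the sketch below (a hypothetical design, using numpy; not part of the notes), an alternative linear unbiased estimator $\tilde{\beta} = Ly$ is built by adding to the least-squares weights a matrix $Z$ with $ZX = 0$, and the difference $V(\tilde{\beta} \mid X) - V(\hat{\beta} \mid X) = \sigma^2 ZZ'$ is checked to be positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k, sigma2 = 50, 3, 1.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])

XtX_inv = np.linalg.inv(X.T @ X)
L_ols = XtX_inv @ X.T                          # OLS weights: (X'X)^{-1} X'

# Any Z with Z X = 0 gives another linear unbiased estimator L = L_ols + Z,
# since (L_ols + Z) X = I_k still holds.
P = X @ XtX_inv @ X.T
Z = rng.normal(size=(k, T)) @ (np.eye(T) - P)  # rows orthogonal to the columns of X
L = L_ols + Z
print(np.allclose(L @ X, np.eye(k)))           # unbiasedness: L X = I_k

# Variance difference sigma^2 (L L' - (X'X)^{-1}) = sigma^2 Z Z' is PSD
diff = sigma2 * (L @ L.T - XtX_inv)
print(np.allclose(diff, sigma2 * Z @ Z.T))
print(np.all(np.linalg.eigvalsh(diff) >= -1e-10))   # all eigenvalues nonnegative
```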
3.5 Estimate of $\sigma^2$
If we knew the true errors $u_t$, then we could use $T^{-1}\sum_{t=1}^{T} u_t^2$ as an estimate of $\sigma^2$, since
$$
E\!\left[T^{-1}\sum_{t=1}^{T} u_t^2\right] = T^{-1}\sum_{t=1}^{T} E\!\left(u_t^2\right) = \sigma^2.
$$
We don't know the errors, so we proxy them with the estimated residuals. Consider
$$
\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T} \hat{u}_t^2
$$
as an estimate of $\sigma^2$.
Properties:
1) $\hat{\sigma}^2$ is a biased estimate of $\sigma^2$, since
$$
E\!\left[T^{-1}\sum_{t=1}^{T}\hat{u}_t^2\right] = \frac{(T-k)}{T}\,\sigma^2.
$$
Proof. Using $\hat{u} = (I - P_X)u$,
$$
\begin{aligned}
E\!\left[\hat{u}'\hat{u} \mid X\right] &= E\!\left[u'(I - P_X)'(I - P_X)u \mid X\right] \\
&= E\!\left[u'(I - P_X)u \mid X\right] \\
&= E\!\left[\mathrm{tr}\left\{u'(I - P_X)u\right\} \mid X\right] \\
&= E\!\left[\mathrm{tr}\left\{(I - P_X)uu'\right\} \mid X\right] \\
&= \mathrm{tr}\left[(I - P_X)\,E(uu' \mid X)\right] \\
&= \mathrm{tr}\left[I - P_X\right]\sigma^2 \\
&= (T - k)\,\sigma^2,
\end{aligned}
$$
since
$$
\begin{aligned}
\mathrm{tr}\left[I - P_X\right] &= \mathrm{tr}(I) - \mathrm{tr}\left[P_X\right] \\
&= T - \mathrm{tr}\left[X(X'X)^{-1}X'\right] \\
&= T - \mathrm{tr}\left[(X'X)^{-1}X'X\right] \\
&= T - \mathrm{tr}\left[I_k\right] = T - k.
\end{aligned}
$$
Recall that
$$
\hat{u} = (I - P_X)\,u
$$
and hence
$$
V(\hat{u} \mid X) = (I - P_X)\,\mathrm{Var}(u \mid X)\,(I - P_X)' = \sigma^2 (I - P_X).
$$
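The downward bias of $\hat{\sigma}^2$ is easy to see by simulation. The sketch below (a hypothetical design, using numpy; not part of the notes) compares the Monte Carlo mean of $\hat{u}'\hat{u}/T$ with $(T-k)\sigma^2/T$; dividing by $T-k$ instead of $T$ removes the bias.

```python
import numpy as np

rng = np.random.default_rng(4)
T, k, sigma2 = 40, 4, 2.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
beta = np.zeros(k)

sig2_hat = []
for _ in range(20000):
    u = np.sqrt(sigma2) * rng.normal(size=T)
    y = X @ beta + u
    u_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # residuals (I - P_X) u
    sig2_hat.append(u_hat @ u_hat / T)

print(np.mean(sig2_hat))          # approx. (T - k) * sigma2 / T
print((T - k) * sigma2 / T)       # = 1.8 in this design
# Dividing u_hat'u_hat by T - k instead of T removes the bias.
```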
3.6 Readings
Wooldridge, pp.22-59, 84-105, 714-737, 747-757, 799-805.
3.7 Exercises
1) Let $\hat{\beta}_j$ be the estimate of a single coefficient from a multiple regression. Let $R_j^2$ be the $R^2$ from a regression of $x_j$ on all regressors except $x_j$ itself (and assume that one of these regressors is a constant), and let $\bar{x}_j = T^{-1}\sum_{t=1}^{T} x_{jt}$. Then, under A1-A5, show that
$$
V(\hat{\beta}_j \mid X) = \frac{\sigma^2}{\sum_{t=1}^{T}(x_{jt} - \bar{x}_j)^2\,(1 - R_j^2)}.
$$