
Dukpa Kim

ECON 504; Econometrics for MAE II


Department of Economics
University of Michigan

3 The Basic Linear Model; Finite Sample Results
So far, we have not considered any model with specific conditions. Our results pertained only to the mechanics of least-squares projections and their properties, obtained by simply labelling one variable as the dependent one and the others as explanatory ones. We looked at a method to describe the dependent variable in terms of some linear combination of the explanatory ones. For this reason, we can so far only discuss issues of fit and not causal interpretations.
Of central importance in the use of regression analysis is the issue of causal interpretation. What we seek is to answer a question of the type:

Holding everything else constant, what is the effect of changing the value of some explanatory variable (a regressor $x_i$) on the dependent variable $y$?

To provide such a causal interpretation, we need to specify a structure or model with specific assumptions that will ensure we can obtain a valid answer.

3.1 Model and assumptions


The specification is that some scalar random variable $y_t$ is generated by the following model:
$$y_t = x_t'\beta + u_t, \qquad t = 1, \ldots, T,$$
where
$$x_t' = (x_{1t}, \ldots, x_{kt})$$
is some vector of regressors (or explanatory variables) and
$$\beta' = (\beta_1, \ldots, \beta_k)$$
is a vector of coefficients which are assumed unknown. In matrix notation:
$$\underset{T \times 1}{y} = \underset{T \times k}{X}\,\underset{k \times 1}{\beta} + \underset{T \times 1}{u}.$$

Remark 1 We will often assume that one of the regressors is a constant.

The least-squares estimate of $\beta$ is given by:
$$\hat{\beta} = (X'X)^{-1}X'y,$$
which is linear in the data $y$.
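As a quick numerical sketch (the simulated data, sample size, and coefficient values below are purely illustrative choices, not part of the notes), the estimate can be computed directly from this formula:

```python
import numpy as np

rng = np.random.default_rng(0)           # illustrative simulated data
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])  # constant + 2 regressors
beta = np.array([1.0, 2.0, -0.5])        # "true" coefficients, chosen for the example
u = rng.standard_normal(T)
y = X @ beta + u

# Least-squares estimate beta_hat = (X'X)^{-1} X'y, computed via a linear solve
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                          # close to (1.0, 2.0, -0.5)
```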

3.2 Unbiasedness of the Least-Squares Estimate $\hat{\beta}$


Definition 1 (Unbiasedness) A statistic $T$ is an unbiased estimator of $\theta$ if $E(T) = \theta$.

We want to show the unbiasedness of the least-squares estimates and for that we make the following assumptions:

Assumptions:

A1) $E[u_t \mid X] = 0$ for all $t$;

A2) The model relating $y$ and $X$ is linear and given by $y = X\beta + u$;

A3) $T \geq k$ and $\mathrm{rank}(X) = k$.

The crucial assumption is A1. It states that the expected value of the errors is zero conditional on all entries in the matrix of regressors. A necessary condition is that the errors and the regressors be uncorrelated. Since this is the most relevant condition in practice, we shall loosely refer to Assumption A1 as requiring the regressors and errors to be uncorrelated, i.e. whatever is left unexplained (even if it does truly affect the dependent variable) is uncorrelated with what is used to explain the dependent variable. Then a unit change in, say, $x_i$ has an effect of $\beta_i$ on $y$ holding everything else constant, since the change used to explain $y$ is not associated (correlated) with changes in the unexplained part (the residual). Sufficient conditions for A1 to hold are:

With cross-section data. The data $(y_i, x_{1i}, \ldots, x_{ki})$ for units (individuals) $i$ ($i = 1, \ldots, n$) are a random sample from some large population (i.e., the units are independent draws) and, denoting the population relation as $y = X\beta + u$, we have $E(u \mid X) = 0$.

For time series data, A1 implies that the errors are uncorrelated with the regressors at all leads and lags. This is a very strong assumption which almost never holds and is referred to as having strongly exogenous regressors.

Remark 2 Note that A1 implies $E(u) = 0$, since $E(u) = E(E(u \mid X))$.

Assumption A2 states that the true relationship between the dependent variable and the regressors is linear. For example, a violation would occur if the true relationship is non-linear, say $y = x_1\beta_1 + x_1^2\beta_2 + u$, and we specify only $x_1$ as the regressor, even if $E(u \mid x_1) = 0$. Assumption A3 is simply a requirement to have well-defined parameter estimates.
Under these assumptions, the least-squares estimates are unbiased. To prove this, note that
$$\begin{aligned}
E(\hat{\beta} \mid X) &= E[(X'X)^{-1}X'y \mid X] \\
&= (X'X)^{-1}X'E[y \mid X] \\
&= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'E(u \mid X) && \text{given A2} \\
&= \beta && \text{given A1}
\end{aligned}$$
Since $E[E(\hat{\beta} \mid X)] = E(\hat{\beta})$, we have $E(\hat{\beta}) = \beta$. What allows this result is basically the fact that the conditional expectation does not depend on $X$. This type of argument will be used often.
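A small Monte Carlo sketch (illustrative only; the design below is simulated, with $X$ held fixed across replications to mimic conditioning on $X$) makes the unbiasedness result concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k, reps = 50, 3, 20_000
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])  # fixed design (condition on X)
beta = np.array([1.0, 2.0, -0.5])
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)    # (X'X)^{-1} X', reused in every replication

estimates = np.empty((reps, k))
for r in range(reps):
    u = rng.standard_normal(T)                # errors satisfy E[u | X] = 0 by construction
    estimates[r] = XtX_inv_Xt @ (X @ beta + u)

print(estimates.mean(axis=0))                 # averages close to beta = (1.0, 2.0, -0.5)
```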

3.3 The Variance of the Least-Squares Estimate $\hat{\beta}$


To make statements about the variance of the least-squares estimate, we
impose the following additional assumptions:

A4) $E(u_t^2 \mid X) = \sigma^2$ for all $t$;

A5) $E(u_t u_s \mid X) = 0$ for all $t \neq s$.

A4 implies that $E(u_t^2) = \sigma^2$ and, hence, that the errors are homoskedastic, i.e. they have the same variance across time or across units. This is a strong assumption which rarely holds and we shall consider methods to deal with this fact later. Assumption A5 implies that $E(u_t u_s) = 0$ and, hence, that the errors are uncorrelated. For cross-section data, this assumption is redundant since it is implied by the requirement of a random sample. With time series data, this assumption implies no temporal or serial correlation.
Assumptions A1-A5 imply that
$$V(u) = V(u \mid X) = V(y \mid X) = \sigma^2 I,$$
which we label spherical errors for reasons that will be explained later. We then have the following result about the variance of the least-squares estimates conditional on the values of the regressors.

Lemma 1 Under A1-A5:
$$V(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}$$
since
$$\begin{aligned}
V(\hat{\beta} \mid X) &= \mathrm{Var}\big[(X'X)^{-1}X'y \mid X\big] \\
&= (X'X)^{-1}X'\,\mathrm{Var}(y \mid X)\,X(X'X)^{-1} \\
&= (X'X)^{-1}X'\,\mathrm{Var}(u \mid X)\,X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.
\end{aligned}$$
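Continuing the illustrative simulation from above (a hypothetical design with $\sigma = 1.5$; all values are chosen for the example), the empirical covariance of $\hat{\beta}$ across replications can be checked against $\sigma^2(X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k, reps, sigma = 50, 3, 50_000, 1.5
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])   # fixed design
beta = np.array([1.0, 2.0, -0.5])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, k))
for r in range(reps):
    u = sigma * rng.standard_normal(T)        # homoskedastic, serially uncorrelated errors (A4, A5)
    draws[r] = XtX_inv @ (X.T @ (X @ beta + u))

print(np.cov(draws, rowvar=False))            # empirical covariance of beta_hat across draws
print(sigma**2 * XtX_inv)                     # theoretical V(beta_hat | X) = sigma^2 (X'X)^{-1}
```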

To derive the unconditional variance, we use the following result:

Lemma 2 Let $z$ be a scalar random variable and $W$ a matrix of random variables, then: $V(z) = V(E(z \mid W)) + E(V(z \mid W))$.

Proof:
$$\begin{aligned}
E(V(z \mid W)) &= E\left[E\left[(z - E(z \mid W))^2 \mid W\right]\right] \\
&= E\left[E\left[(z - E(z) + E(z) - E(z \mid W))^2 \mid W\right]\right] \\
&= E\left[E\left[(z - E(z))^2 \mid W\right]\right] \\
&\quad + 2E\left[E\left[(z - E(z))(E(z) - E(z \mid W)) \mid W\right]\right] \\
&\quad + E\left[E\left[(E(z) - E(z \mid W))^2 \mid W\right]\right]
\end{aligned}$$
Now, we have
$$E\left[E\left[(z - E(z))^2 \mid W\right]\right] = E\left[(z - E(z))^2\right] = V(z)$$
$$E\left[E\left[(E(z) - E(z \mid W))^2 \mid W\right]\right] = E\left[(E(z) - E(z \mid W))^2\right] = V(E(z \mid W))$$
and
$$E\left[E\left[(z - E(z))(E(z) - E(z \mid W)) \mid W\right]\right] = -E\left[(E(z \mid W) - E(z))^2\right] = -V(E(z \mid W))$$
so that $E(V(z \mid W)) = V(z) - 2V(E(z \mid W)) + V(E(z \mid W)) = V(z) - V(E(z \mid W))$, and the result follows.


Since $E(\hat{\beta} \mid X) = \beta$, this implies
$$V(\hat{\beta}) = \sigma^2 E[(X'X)^{-1}].$$

It is useful to derive an expression for the conditional variance of the estimate of a single coefficient, say $\hat{\beta}_j$. We have the following result:

Lemma 3 Let $R_j^2$ be the $R^2$ from a regression of $x_j$ on all regressors except $x_j$ (and assume that one of these regressors is a constant) and let $\bar{x}_j = T^{-1}\sum_{t=1}^{T} x_{jt}$. Then under A1-A5:
$$V(\hat{\beta}_j \mid X) = \frac{\sigma^2}{\sum_{t=1}^{T}(x_{jt} - \bar{x}_j)^2 \,(1 - R_j^2)}.$$

Proof: Write the regression as
$$y = x_j\beta_j + X_{-j}\beta_{-j} + u \qquad (1)$$
where
$$X_{-j} = (x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k)$$
and similarly
$$\beta_{-j}' = (\beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_k).$$
Using the Frisch-Waugh-Lovell Theorem, the least-squares estimate of $\beta_j$ from (1) is identical to the least-squares estimate of $\beta_j$ obtained from the regression
$$\tilde{y} = \beta_j\tilde{x}_j + u$$
where $\tilde{y} = (I - P_{X_{-j}})y$ and $\tilde{x}_j = (I - P_{X_{-j}})x_j$. Hence,
$$\begin{aligned}
V(\hat{\beta}_j \mid X) &= \frac{\sigma^2}{\tilde{x}_j'\tilde{x}_j} \\
&= \frac{\sigma^2}{x_j'(I - P_{X_{-j}})'(I - P_{X_{-j}})x_j} \\
&= \frac{\sigma^2}{x_j'(I - P_{X_{-j}})x_j} \\
&= \frac{\sigma^2}{\sum_{t=1}^{T}(x_{jt} - \bar{x}_j)^2 \,(1 - R_j^2)}
\end{aligned}$$
using the fact that
$$R_j^2 = 1 - \frac{x_j'(I - P_{X_{-j}})x_j}{\sum_{t=1}^{T}(x_{jt} - \bar{x}_j)^2}.$$
This lemma is useful since it shows under what conditions a coefficient can be estimated precisely. Two factors contribute to the precision. The first is that the regressor associated with the coefficient should be highly variable (or have high leverage). The second factor is that the correlation between this regressor and the others be as small as possible. The intuition behind the second factor is that if two regressors are highly correlated, it does not really matter which one is assigned the weight in obtaining a fitted value for the dependent variable. The least-squares method is in some sense “undecided” as to which particular weights to assign, hence the imprecision.
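The following sketch (simulated data; the correlation levels and sample size are illustrative) checks the formula in Lemma 3 against the general expression $\sigma^2[(X'X)^{-1}]_{jj}$ and shows how a high correlation between regressors inflates the variance:

```python
import numpy as np

rng = np.random.default_rng(3)
T, sigma = 200, 1.0
x1 = rng.standard_normal(T)
for rho in (0.0, 0.95):                       # low vs. high correlation between x1 and x2
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(T)
    X = np.column_stack([np.ones(T), x1, x2])

    # Conditional variance of beta_hat_1 from the general formula sigma^2 (X'X)^{-1}
    v_direct = sigma**2 * np.linalg.inv(X.T @ X)[1, 1]

    # Same quantity via Lemma 3: regress x1 on (constant, x2) to get R_1^2
    Z = np.column_stack([np.ones(T), x2])
    x1_fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
    R2_1 = 1 - np.sum((x1 - x1_fit)**2) / np.sum((x1 - x1.mean())**2)
    v_lemma = sigma**2 / (np.sum((x1 - x1.mean())**2) * (1 - R2_1))

    print(rho, v_direct, v_lemma)             # the two expressions agree; variance rises with rho
```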

3.4 Gauss-Markov theorem


Theorem 1 $\hat{\beta}$ is the Best Linear Unbiased Estimator (BLUE) of $\beta$, i.e., if $\tilde{\beta}$ is any other linear unbiased estimator, then
$$V(\tilde{\beta}) - V(\hat{\beta}) \geq 0$$
in the sense that the difference is a positive semi-definite matrix.

Proof. We show that $V(\tilde{\beta} \mid X) - V(\hat{\beta} \mid X) \geq 0$, which implies the result since $V(\tilde{\beta}) - V(\hat{\beta}) = E[V(\tilde{\beta} \mid X)] - E[V(\hat{\beta} \mid X)]$, as both estimates are assumed unbiased. Let $\tilde{\beta} = Ly$ be a linear unbiased estimator. Then $E(\tilde{\beta} \mid X) = LE[y \mid X] = LX\beta = \beta$ for all $\beta$. Hence,
$$LX = I.$$

Define
$$Z = L - (X'X)^{-1}X'$$
so that
$$ZZ' = \left[L - (X'X)^{-1}X'\right]\left[L - (X'X)^{-1}X'\right]' = LL' - (X'X)^{-1}$$
(since $LX = I$). Hence,
$$V(\tilde{\beta} \mid X) = \mathrm{Var}(Ly \mid X) = \sigma^2 LL' = \sigma^2\left[ZZ' + (X'X)^{-1}\right] = V(\hat{\beta} \mid X) + \sigma^2 ZZ'.$$
That is:
$$V(\tilde{\beta} \mid X) - V(\hat{\beta} \mid X) = \sigma^2 ZZ' \geq 0.$$
The equality holds iff $\tilde{\beta} = \hat{\beta}$ since
$$ZZ' = 0 \;\Rightarrow\; Z = 0 \;\Rightarrow\; L = (X'X)^{-1}X' \;\Rightarrow\; \tilde{\beta} = \hat{\beta}.$$
Hence,
$$V(\tilde{\beta}) - V(\hat{\beta}) = E[V(\tilde{\beta} \mid X)] - E[V(\hat{\beta} \mid X)] = \sigma^2 E[ZZ'] \geq 0.$$
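As one illustration of the theorem (a hypothetical comparison with simulated data, not taken from the notes), consider the linear unbiased estimator that applies least squares to only the first half of the sample; its conditional variance exceeds that of OLS by a positive semi-definite matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k, sigma = 100, 3, 1.0
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])

# OLS uses L = (X'X)^{-1} X'; the alternative uses only the first half of the
# sample and zero weight on the rest (it still satisfies L_alt X = I, hence unbiased).
L_ols = np.linalg.solve(X.T @ X, X.T)
X1 = X[: T // 2]
L_alt = np.hstack([np.linalg.solve(X1.T @ X1, X1.T), np.zeros((k, T - T // 2))])

print(np.allclose(L_ols @ X, np.eye(k)), np.allclose(L_alt @ X, np.eye(k)))  # both satisfy LX = I

V_ols = sigma**2 * L_ols @ L_ols.T            # = sigma^2 (X'X)^{-1}
V_alt = sigma**2 * L_alt @ L_alt.T            # = sigma^2 (X1'X1)^{-1}
print(np.linalg.eigvalsh(V_alt - V_ols))      # eigenvalues nonnegative (up to rounding): PSD difference
```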

3.5 Estimate of $\sigma^2$

If we knew the true errors $u_t$, then we could use $T^{-1}\sum_{t=1}^{T} u_t^2$ as an estimate of $\sigma^2$ since
$$E\left[T^{-1}\sum_{t=1}^{T} u_t^2\right] = T^{-1}\sum_{t=1}^{T} E(u_t^2) = \sigma^2.$$
We don't know the errors, so we proxy them with the estimated residuals. Consider
$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\hat{u}_t^2$$
as an estimate of $\sigma^2$.
Properties:

1) $\hat{\sigma}^2$ is a biased estimate of $\sigma^2$ since
$$E\left[T^{-1}\sum_{t=1}^{T}\hat{u}_t^2\right] = \frac{(T - k)}{T}\,\sigma^2.$$
Proof.
$$\begin{aligned}
E[\hat{u}'\hat{u} \mid X] &= E\left[u'(I - P_X)'(I - P_X)u \mid X\right] \\
&= E\left[u'(I - P_X)u \mid X\right] \\
&= E\left[\mathrm{tr}\{u'(I - P_X)u\} \mid X\right] \\
&= \mathrm{tr}\left[(I - P_X)E(uu' \mid X)\right] \\
&= \mathrm{tr}\left[(I - P_X)\right]\sigma^2 \\
&= (T - k)\,\sigma^2.
\end{aligned}$$
In the derivations above we have used the symmetry and idempotency of projection matrices and the following steps to justify the last equality:
$$\begin{aligned}
\mathrm{tr}\left[I - P_X\right] &= \mathrm{tr}(I) - \mathrm{tr}\left[P_X\right] \\
&= T - \mathrm{tr}\left[X(X'X)^{-1}X'\right] \\
&= T - \mathrm{tr}\left[(X'X)^{-1}X'X\right] \\
&= T - \mathrm{tr}\left[I_k\right] = T - k.
\end{aligned}$$
Since $E[\hat{u}'\hat{u} \mid X]$ does not depend on $X$, we also have $E(\hat{u}'\hat{u}) = (T - k)\sigma^2$.
So, an unbiased estimate of $\sigma^2$ is
$$s^2 = \frac{1}{T - k}\,\hat{u}'\hat{u} = \frac{1}{T - k}\left(y - X\hat{\beta}\right)'\left(y - X\hat{\beta}\right).$$
Remark 3 The intuition behind the fact that a scaling by $T - k$ is needed to get an unbiased estimate is that there are only $T - k$ independent residuals (since $X'\hat{u} = 0$ sets $k$ linear combinations of the residuals to zero).
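A short simulation sketch (all values illustrative; the design is simulated) shows the downward bias of $\hat{\sigma}^2 = T^{-1}\hat{u}'\hat{u}$ and the unbiasedness of $s^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
T, k, reps, sigma = 30, 5, 50_000, 2.0
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
beta = rng.standard_normal(k)
P = X @ np.linalg.solve(X.T @ X, X.T)          # projection matrix P_X

ssr = np.empty(reps)
for r in range(reps):
    u = sigma * rng.standard_normal(T)
    y = X @ beta + u
    u_hat = y - P @ y                          # residuals (I - P_X) y
    ssr[r] = u_hat @ u_hat

print((ssr / T).mean())                        # biased: close to sigma^2 (T - k)/T = 4 * 25/30
print((ssr / (T - k)).mean())                  # unbiased: close to sigma^2 = 4
```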

2) The estimated residuals are correlated by construction.

We have
$$\hat{u} = (I - P_X)u$$
and
$$V(\hat{u} \mid X) = (I - P_X)\,\mathrm{Var}(u \mid X)\,(I - P_X)' = \sigma^2(I - P_X)$$
using the symmetry and idempotency of $P_X$.
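A quick numerical check (simulated data, illustrative only) confirms that the residuals equal $(I - P_X)u$ and that $I - P_X$ is symmetric and idempotent, so the residual covariance $\sigma^2(I - P_X)$ has non-zero off-diagonal entries even when the errors themselves are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(6)
T, k, sigma = 40, 3, 1.0
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
beta = np.array([1.0, 0.5, -1.0])
u = sigma * rng.standard_normal(T)
y = X @ beta + u

M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)   # annihilator I - P_X
u_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # OLS residuals

print(np.allclose(u_hat, M @ u))                    # residuals equal (I - P_X) u
print(np.allclose(M @ M, M), np.allclose(M, M.T))   # idempotent and symmetric
print(sigma**2 * M[:3, :3])                         # corner of sigma^2 (I - P_X): off-diagonals nonzero
```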

3.6 Readings
Wooldridge, pp.22-59, 84-105, 714-737, 747-757, 799-805.

Davidson and MacKinnon, pp.86-118.

3.7 Exercises
1) Let $\hat{\beta}_j$ be the estimate of a single coefficient from a multiple regression. Let $R_j^2$ be the $R^2$ from a regression of $x_j$ on all regressors except $x_j$ (and assume that one of these regressors is a constant) and let $\bar{x}_j = T^{-1}\sum_{t=1}^{T} x_{jt}$. Then under A1-A5, show that
$$V(\hat{\beta}_j \mid X) = \frac{\sigma^2}{\sum_{t=1}^{T}(x_{jt} - \bar{x}_j)^2 \,(1 - R_j^2)}.$$

Comment on the implications of this result.

2) Do Problem 3.3 on page 106, Wooldridge.

3) Do Problem 3.5 on page 107, Wooldridge.

4) Do Problem 3.6 on page 107, Wooldridge.

5) Do Problem 3.10 on page 108, Wooldridge.

6) Do Problem 3.13 on page 109, Wooldridge.

7) Do Problem C3.4 on page 111, Wooldridge.

8) Do Problem C3.8 on page 112, Wooldridge.
