EC771 S2012 nn02
Chapters 4, 8:
Properties of LS and IV estimators
Consider choosing γ to minimize the mean squared error of the linear predictor x'γ:

MSE = E[(y − x'γ)²],

which is minimized with respect to γ by the solution of

E[xy] = E[xx']γ,

which leads to the same least squares estimator as that for β above. Thus, the least squares estimator solves the optimal linear predictor problem even without specification of the form of the conditional mean function E[y|x], since we did not use the assumption of linearity in the above derivation.
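A minimal numerical sketch of this point (not part of the original notes; the data-generating process and variable names are invented): solving the sample analogue of E[xx']γ = E[xy] reproduces the OLS coefficient vector even though E[y|x] is deliberately nonlinear.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.normal(size=(n, 1))
X = np.column_stack([np.ones(n), x])              # include a constant
y = np.exp(0.5 * x[:, 0]) + rng.normal(size=n)    # E[y|x] is nonlinear by construction

# Sample analogue of the moment condition E[xx'] gamma = E[xy]
gamma = np.linalg.solve(X.T @ X, X.T @ y)

# Ordinary least squares gives the same vector: OLS is the sample linear projection
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(gamma, b))                      # True
```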
b = (X'X)^{-1} X'y
  = (X'X)^{-1} X'(Xβ + ε)
  = β + (X'X)^{-1} X'ε
Taking expectations of this expression, we find that E[b|X] = E[b] = β, since by the assumption of orthogonal regressors (exogeneity), the expectation of X'ε is zero. For any particular set of observations X, the least squares estimator has expectation β. If we average over the possible values of X (in the case where X is stochastic), the unconditional mean of b is β as well.
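A small simulation sketch (illustrative only; the true β and the design are invented for the example) showing that the average of b over repeated samples is close to β:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
beta = np.array([1.0, -2.0, 0.5])                  # "true" coefficients (arbitrary)

draws = np.empty((reps, beta.size))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    eps = rng.normal(size=n)                        # exogenous disturbance, E[X'eps] = 0
    y = X @ beta + eps
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)    # b = (X'X)^{-1} X'y

print(draws.mean(axis=0))                           # close to beta: b is unbiased under exogeneity
```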
Consider any other linear unbiased estimator b0 = Cy of β, and define

D = C − (X'X)^{-1}X',

so that Dy = (b0 − b). Then

Var[b0|X] = σ² [(D + (X'X)^{-1}X')(D + (X'X)^{-1}X')'].

Since unbiasedness of b0 requires

CX = DX + (X'X)^{-1}(X'X) = I,

it follows that DX = 0, so the cross-product terms vanish and

Var[b0|X] = σ²(X'X)^{-1} + σ²DD'.

Because DD' is positive semidefinite, the conditional variance of any other linear unbiased estimator exceeds that of b by a positive semidefinite matrix: b is the best linear unbiased estimator (the Gauss–Markov result).
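As an illustrative check (not in the original notes), one can build an arbitrary D with DX = 0, form the alternative linear unbiased estimator b0 = [D + (X'X)^{-1}X']y, and verify that its conditional variance exceeds that of OLS by exactly σ²DD', a positive semidefinite matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma2 = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T                                  # projection onto col(X)
A = rng.normal(size=(k, n))
D = A @ (np.eye(n) - P)                                # any such D satisfies D X = 0
print(np.allclose(D @ X, 0))                           # True

V_ols = sigma2 * XtX_inv                               # Var[b|X]
C = D + XtX_inv @ X.T                                  # alternative estimator matrix
V_alt = sigma2 * C @ C.T                               # Var[b0|X]
diff = V_alt - V_ols
print(np.allclose(diff, sigma2 * D @ D.T))             # True: difference is sigma^2 D D'
print(np.all(np.linalg.eigvalsh(diff) >= -1e-10))      # positive semidefinite
```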
F_{K−1, n−K} = [R²/(K − 1)] / [(1 − R²)/(n − K)]

which will have a Fisher F distribution under that null hypothesis (all slope coefficients jointly zero). In later discussions, we will consider alternative joint hypotheses on combinations or subsets of the regression parameters.
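An illustrative computation (the data and coefficient values are invented): the same F statistic can be built directly from the regression R² and compared with the F(K − 1, n − K) reference distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, K = 120, 4                                        # K includes the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
R2 = 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

F = (R2 / (K - 1)) / ((1 - R2) / (n - K))            # test: all slopes jointly zero
p = stats.f.sf(F, K - 1, n - K)                      # upper-tail p-value
print(F, p)
```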
Collinearity
For a set of J linear functions of the parameters, f(β) = Cβ, the estimated covariance matrix of f(b) is C[s²(X'X)^{-1}]C'.
If any of the J functions are nonlinear, the unbiasedness of b may not carry over to f(b). Nevertheless, f(b) will be a consistent estimator of f(β), with a consistent estimate of the asymptotic covariance matrix. This is the rationale for the widely employed delta method, implemented in Stata as the testnl command.
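A minimal sketch of the delta method (the nonlinear function, a ratio of two slopes, is chosen purely for illustration): the variance of f(b) is approximated by G V̂(b) G', where G is the Jacobian of f evaluated at b.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - X.shape[1])
V = s2 * np.linalg.inv(X.T @ X)                      # s^2 (X'X)^{-1}

# Nonlinear function of the coefficients: f(b) = b1 / b2
f = b[1] / b[2]
G = np.array([0.0, 1.0 / b[2], -b[1] / b[2] ** 2])   # Jacobian of f at b
var_f = G @ V @ G                                    # delta-method variance
print(f, np.sqrt(var_f))                             # point estimate and std. error
```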
Asymptotic efficiency
E[ε_t | x_{t−s}] = 0  ∀ s ≥ 0.
This states that the disturbance at period t is an innovation, uncorrelated with the past history of the x process. It cannot be assumed uncorrelated with the future of the process, since it will become part of those future values. We further must assume that the series in x are stationary (at least in terms of covariance stationarity), so that they have finite, non-time-varying second moments which depend only on the temporal displacement between their values, and that the autocorrelation of the series is damped (so that the dependence between observations declines with the temporal displacement, and sample estimates of the autocovariance function will be suitable estimates of their population counterparts). The combination of these conditions is equivalent to stating that the regressors are stationary and ergodic. Under these conditions, consistency of the OLS estimator can be proven.
y = Xβ + u,   E(uu') = Ω                                   (1)

with typical row

yi = Xi β + ui                                             (2)

Homoskedasticity: Ω = σ² I

Heteroskedasticity:
Ω = \begin{pmatrix} \sigma_1^2 & & & & 0 \\ & \ddots & & & \\ & & \sigma_i^2 & & \\ & & & \ddots & \\ 0 & & & & \sigma_n^2 \end{pmatrix}

Serial correlation:
Ω = \begin{pmatrix} \sigma^2 & & & & \\ \sigma_{21} & \sigma^2 & & & \\ \sigma_{31} & \sigma_{32} & \sigma^2 & & \\ \vdots & \vdots & & \ddots & \\ \sigma_{n1} & \sigma_{n2} & \cdots & & \sigma^2 \end{pmatrix}

Clustering:
Ω = \begin{pmatrix} \Sigma_1 & & & & 0 \\ & \ddots & & & \\ & & \Sigma_m & & \\ & & & \ddots & \\ 0 & & & & \Sigma_M \end{pmatrix}
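Purely as a presentational aid (not from the original notes; the block sizes, variances, and the AR(1) pattern used for the serial-correlation case are invented), the three non-spherical patterns can be built explicitly as n×n matrices:

```python
import numpy as np
from scipy.linalg import block_diag, toeplitz

n, sigma2 = 6, 1.5

# Homoskedasticity: Omega = sigma^2 * I
omega_homo = sigma2 * np.eye(n)

# Heteroskedasticity: distinct variances on the diagonal, zeros elsewhere
omega_het = np.diag(np.array([0.5, 1.0, 2.0, 0.8, 1.7, 3.0]))

# Serial correlation: sigma^2 on the diagonal, nonzero covariances off it
# (here an AR(1)-type pattern, one possible example)
rho = 0.6
omega_ar1 = sigma2 * toeplitz(rho ** np.arange(n))

# Clustering: block-diagonal, one full block Sigma_m per cluster
Sigma_1 = np.array([[1.0, 0.4], [0.4, 1.0]])
Sigma_2 = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.2]])
Sigma_3 = np.array([[0.9]])
omega_cluster = block_diag(Sigma_1, Sigma_2, Sigma_3)
print(omega_cluster.shape)                          # (6, 6)
```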
QZZ = E(Zi'Zi)

and let û denote the IV residuals,

û ≡ y − X β̂IV.

Then the IV estimator is asymptotically distributed as β̂IV ∼ N(β, V(β̂IV)), where

V(β̂IV) = (1/n) σ² (QXZ QZZ^{-1} QXZ')^{-1}.

Replacing QXZ, QZZ and σ² with their sample estimates

Q̂XZ = (1/n) X'Z,   Q̂ZZ = (1/n) Z'Z,   σ̂² = û'û / n,

we obtain the estimated asymptotic variance–covariance matrix of the IV estimator:

V̂(β̂IV) = σ̂² (X'Z (Z'Z)^{-1} Z'X)^{-1}.
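A numerical sketch of these formulas (the data-generating process and instrument names are invented; the estimator is written in its generalized IV/2SLS form):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2000
z1, z2 = rng.normal(size=n), rng.normal(size=n)        # excluded instruments
v = rng.normal(size=n)
x_endog = 0.8 * z1 + 0.5 * z2 + v                      # endogenous regressor
u = 0.7 * v + rng.normal(size=n)                       # correlated with x_endog
y = 1.0 + 2.0 * x_endog + u

X = np.column_stack([np.ones(n), x_endog])
Z = np.column_stack([np.ones(n), z1, z2])

# beta_IV = (X'Pz X)^{-1} X'Pz y, with Pz = Z(Z'Z)^{-1}Z'
ZtZ_inv = np.linalg.inv(Z.T @ Z)
XtPzX = X.T @ Z @ ZtZ_inv @ Z.T @ X
beta_iv = np.linalg.solve(XtPzX, X.T @ Z @ ZtZ_inv @ Z.T @ y)

u_hat = y - X @ beta_iv
sigma2_hat = u_hat @ u_hat / n
V_hat = sigma2_hat * np.linalg.inv(XtPzX)              # sigma^2 (X'Z(Z'Z)^{-1}Z'X)^{-1}
print(beta_iv, np.sqrt(np.diag(V_hat)))                # estimates and std. errors
```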
E(gi(β)) = 0

Each of the L moment equations corresponds to a sample moment, and we write these L sample moments as

g(β̂) = (1/n) Σ_{i=1}^{n} gi(β̂) = (1/n) Σ_{i=1}^{n} Zi'(yi − Xi β̂) = (1/n) Z'û.

The intuition behind GMM is to choose an estimator for β that solves g(β̂) = 0.
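Continuing the illustrative sketches above (again with invented data), in the exactly identified case (L = K) the IV estimator sets the sample moment vector exactly to zero:

```python
import numpy as np

rng = np.random.default_rng(21)
n = 1000
z = rng.normal(size=n)
v = rng.normal(size=n)
x = 0.9 * z + v                                        # single endogenous regressor
y = 1.0 + 2.0 * x + 0.6 * v + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])                   # L = K: exactly identified

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)            # (Z'X)^{-1} Z'y
u_hat = y - X @ beta_iv
g_bar = Z.T @ u_hat / n                                # (1/n) Z'u_hat
print(g_bar)                                           # numerically zero: g(beta_hat) = 0
```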
V(β̂GMM) = (1/n) (QXZ' W QXZ)^{-1} (QXZ' W S W QXZ) (QXZ' W QXZ)^{-1}        (4)
The efficient GMM estimator is the GMM estimator with an optimal weighting matrix W, one which minimizes the asymptotic variance of the estimator. This is achieved by choosing W = S^{-1}. Substituting this into Equation (3) and Equation (4), we obtain the efficient GMM estimator, with asymptotic variance

V(β̂EGMM) = (1/n) (QXZ' S^{-1} QXZ)^{-1}.

Under conditional homoskedasticity, S = σ² QZZ, so a natural estimate of S is

Ŝ = σ̂² (1/n) Z'Z.
Finally, if we now set

Ŵ = Ŝ^{-1} = [σ̂² (1/n) Z'Z]^{-1}

and substitute into the formula for the asymptotic variance of the efficient GMM estimator, we find that it reduces to the formula for the asymptotic variance of the IV estimator. In effect, under the assumption of conditional homoskedasticity, the (efficient) iterated GMM estimator is the IV estimator, and the iterations converge after one step. It is worth noting that the IV estimator is not the only such efficient GMM estimator under conditional homoskedasticity.
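As an illustrative numerical check (continuing the invented example above), two-step GMM with the homoskedastic weighting matrix Ŵ = [σ̂²(1/n)Z'Z]^{-1} reproduces the IV/2SLS estimate exactly, even in the over-identified case:

```python
import numpy as np

rng = np.random.default_rng(31)
n = 2000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
v = rng.normal(size=n)
x = 0.8 * z1 + 0.5 * z2 + v
y = 1.0 + 2.0 * x + 0.7 * v + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])              # over-identified: L = 3 > K = 2

# Step 1: IV/2SLS, used to estimate sigma^2
ZtZ_inv = np.linalg.inv(Z.T @ Z)
XtPzX = X.T @ Z @ ZtZ_inv @ Z.T @ X
beta_iv = np.linalg.solve(XtPzX, X.T @ Z @ ZtZ_inv @ Z.T @ y)
sigma2 = (y - X @ beta_iv) @ (y - X @ beta_iv) / n

# Step 2: GMM with the homoskedastic weighting matrix W = [sigma^2 (1/n) Z'Z]^{-1}
W = np.linalg.inv(sigma2 * Z.T @ Z / n)
A = X.T @ Z / n                                        # sample Q_XZ'
beta_gmm = np.linalg.solve(A @ W @ A.T, A @ W @ (Z.T @ y / n))
print(np.allclose(beta_gmm, beta_iv))                  # True: identical under this W
```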
Instead of treating σ̂² as a parameter to be estimated in a second stage, what if we return to the GMM criterion function and minimize by simultaneously choosing β̂ and σ̂²? The estimator that solves this minimization problem is in fact the Limited Information Maximum Likelihood estimator (LIML). In effect, under conditional homoskedasticity, the continuously updated GMM estimator is the LIML estimator. Calculating the LIML estimator does not require numerical optimization methods; it can be calculated as the solution to an eigenvalue problem. The latest version of ivreg2 (Baum, Schaffer and Stillman) supports LIML and k-class estimation methods.