Chapter 6 - Endogeneity
Set of Exercises
Exercise 1. Regression towards the mean
In a famous exercise (“Regression towards Mediocrity in Hereditary Stature”, 1886), Galton
regressed the heights of a sample of sons on the heights of their fathers. He found a coefficient
strictly less than 1. He interpreted this as evidence of “regression towards the mean”.
Regression fallacy: Let us first consider the model:
$$y_{is} = \alpha + \beta y_{i,s-1} + u_{is}, \qquad s = 1, \dots, S, \quad i = 1, \dots, N$$
where s indexes the generations (grandfather, father, son) and i indexes individuals.
We assume that, for all s, $E(u_{is} \mid y_{1,s-1}, \dots, y_{N,s-1}) = 0$. Moreover, the variance of $u_{is}$ is
constant and equal to $\sigma^2$.
Lastly, we denote $\bar{y}_s = \frac{1}{N} \sum_{i=1}^{N} y_{is}$.
1. Compute $E(y_{is} - \bar{y}_s \mid y_{1,s-1}, \dots, y_{N,s-1})$ as a function of $y_{1,s-1}, \dots, y_{N,s-1}$, and give a precise
meaning to Galton’s interpretation.
3. Show that:
$$0 \le \operatorname*{plim}_{N \to \infty} \tilde{\beta} \le 1$$
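The model of question 1 can be explored numerically. The sketch below is only an illustration under assumed values (α = 35, β = 0.5, σ = 2, N = 200,000, and normally distributed fathers' heights, none of which come from the exercise). It regresses each son's deviation from his generation's mean on the father's deviation, and compares the average deviation of tall fathers with that of their sons.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative values (not given in the exercise).
alpha, beta, sigma, N = 35.0, 0.5, 2.0, 200_000

# Fathers' heights: any cross-section would do; a normal draw is assumed here.
y_prev = rng.normal(loc=alpha / (1 - beta), scale=3.0, size=N)
# Sons' heights generated from y_is = alpha + beta * y_{i,s-1} + u_is.
y_curr = alpha + beta * y_prev + rng.normal(scale=sigma, size=N)

# Deviations from each generation's mean.
d_prev = y_prev - y_prev.mean()
d_curr = y_curr - y_curr.mean()

# OLS slope of the son's deviation on the father's deviation
# (no intercept is needed because both variables are demeaned).
slope = (d_prev @ d_curr) / (d_prev @ d_prev)
print(f"slope of son's deviation on father's deviation: {slope:.3f}")

# Fathers well above the mean, and the average deviation of their sons.
tall = d_prev > 3.0
print(f"mean deviation, tall fathers: {d_prev[tall].mean():.2f}")
print(f"mean deviation, their sons:   {d_curr[tall].mean():.2f}")
```

In this simulated sample the slope is close to the assumed β, so with |β| < 1 the sons of unusually tall fathers are, on average, closer to their generation's mean than their fathers were.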
A modern researcher estimates the following regression for a sample of N countries indexed
by i:
$$\Delta \ln GDP_{it} = \delta + \beta \ln GDP_{i,\mathrm{init}} + u_i$$
where $\Delta \ln GDP_{it}$ is the average growth rate of GDP of country i over a period, and
$\ln GDP_{i,\mathrm{init}}$ is the (log) GDP at the beginning of that period. She estimates a negative β by
OLS and concludes that this is evidence of convergence across countries. What is your opinion?
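A small simulation can help in forming an opinion. The sketch below assumes, purely for illustration, that log GDP follows the same first-order model as in the Galton example, with a persistence parameter `rho = 0.8` and a shock variance chosen so that the cross-country dispersion is the same at the start and at the end of the period. The names `rho`, `sd_stationary` and `sd_shock` are hypothetical and not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values chosen for illustration only.
N, rho = 50_000, 0.8
sd_stationary = 1.0
# Shock size that keeps the cross-sectional variance constant over time.
sd_shock = sd_stationary * np.sqrt(1 - rho**2)

ln_gdp_init = rng.normal(scale=sd_stationary, size=N)
ln_gdp_end = rho * ln_gdp_init + rng.normal(scale=sd_shock, size=N)
growth = ln_gdp_end - ln_gdp_init

# OLS of growth on initial log GDP, with an intercept.
X = np.column_stack([np.ones(N), ln_gdp_init])
delta_hat, beta_hat = np.linalg.lstsq(X, growth, rcond=None)[0]

sd_init, sd_end = ln_gdp_init.std(), ln_gdp_end.std()
print(f"beta_hat = {beta_hat:.3f}")
print(f"sd of log GDP: initial = {sd_init:.3f}, final = {sd_end:.3f}")
```

Under these assumptions the estimated β is negative (close to `rho - 1`) even though the dispersion of log GDP across countries does not shrink, which suggests that a negative OLS coefficient alone need not indicate convergence.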
A researcher does just that (with a valid instrument), and finds an estimate:
α̃ ≈ 1.2α̂
Endogeneity and measurement error: To try to explain this result, we consider a second
(augmented) model:
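$$\begin{aligned}
y_i &= \delta + \alpha x_i + u_i \\
x_i &= \mu + \beta z_i + v_i \\
u_i &= z_i + \eta_i \\
\tilde{y}_i &= y_i + \varepsilon_i \\
\tilde{x}_i &= x_i + \nu_i
\end{aligned} \tag{2}$$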
In addition to the assumptions in (1), it is assumed here that $\varepsilon_i$ and $\nu_i$ are iid with zero mean,
uncorrelated, and that $E(\varepsilon_i z_i) = E(\varepsilon_i v_i) = E(\nu_i z_i) = E(\nu_i v_i) = E(\nu_i \eta_i) = 0$. In addition, $\nu_i$
has variance $\sigma_\nu^2$. Importantly, $\tilde{y}_i$ and $\tilde{x}_i$ are the only observed variables in this model.
3. Does the strategy of the first paragraph yield a consistent estimator of α in this case?
4. In the dataset used in this exercise, measurement error is thought to
account for around 10% of the variance of education (measured by the “number of
years of schooling” variable). What is your opinion of the measurement error explanation?
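The order of magnitude in question 4 can be checked with a rough simulation. The sketch below isolates the measurement error channel only: it assumes classical measurement error in the regressor, an error term uncorrelated with the true regressor, and a hypothetical valid instrument `w`. These are simplifying assumptions for illustration and not the full model (2). The noise variance is set so that it makes up 10% of the variance of the observed regressor.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design: w is an assumed valid instrument, alpha is arbitrary.
N, alpha = 500_000, 1.0
w = rng.normal(size=N)
x = w + rng.normal(size=N)             # true regressor, variance 2
y = alpha * x + rng.normal(size=N)     # error uncorrelated with x (assumption)

# Measurement error accounting for 10% of the variance of the observed x.
share = 0.10
sigma_nu2 = share / (1 - share) * 2.0
x_tilde = x + rng.normal(scale=np.sqrt(sigma_nu2), size=N)

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))

alpha_ols = cov(x_tilde, y) / cov(x_tilde, x_tilde)  # attenuated by noise in x
alpha_iv = cov(w, y) / cov(w, x_tilde)               # instrument removes the noise
ratio = alpha_iv / alpha_ols
print(f"OLS = {alpha_ols:.3f}, IV = {alpha_iv:.3f}, IV/OLS = {ratio:.3f}")
```

Under these assumptions the OLS estimate is scaled down by roughly 1 − 0.10 = 0.9, so the ratio of the IV to the OLS estimate is about 1/0.9 ≈ 1.11. That figure can be compared with the factor of 1.2 reported above.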
1. Show that:
$$\operatorname*{plim}_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} z_i x_i = 0$$
2. Show that:
$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} z_i x_i \xrightarrow{d} N(\mu, V)$$
3. Show that:
$$\hat{\beta} - \beta \xrightarrow{d} \left(E(z_i^2) + a\right)^{-1} b$$
where a and b are jointly normally distributed with mean zero and covariance matrix:
$$S = E(g_i g_i'), \quad \text{where } g_i = \begin{pmatrix} z_i u_i \\ z_i v_i \end{pmatrix}$$
Hint: apply the CLT for a vector of random variables. You may also use that $z_n \xrightarrow{d} z$
implies $a(z_n) \xrightarrow{d} a(z)$ under suitable regularity conditions on the function a.
4. Is β̂ consistent?
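The limits in questions 1 to 3 can be illustrated by simulation. The model for this exercise is not reproduced above, so the sketch below assumes one specification that is consistent with those limits: an outcome equation y_i = β x_i + u_i, a first stage x_i = z_i/√N + v_i whose strength vanishes with the sample size, standard normal z_i, and errors u_i and v_i with correlation 0.5. All of these choices, and the helper names `iv_errors` and `iqr`, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed illustrative values (not from the exercise).
beta, corr_uv, reps = 1.0, 0.5, 2000

def iv_errors(N):
    """Return beta_hat - beta over `reps` simulated samples of size N."""
    errors = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=N)
        v = rng.normal(size=N)
        u = corr_uv * v + np.sqrt(1 - corr_uv**2) * rng.normal(size=N)
        x = z / np.sqrt(N) + v          # assumed first stage, shrinking with N
        y = beta * x + u
        errors[r] = (z @ y) / (z @ x) - beta   # simple IV estimator minus beta
    return errors

def iqr(a):
    """Interquartile range, a spread measure robust to heavy tails."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

iqr_small = iqr(iv_errors(100))
iqr_large = iqr(iv_errors(2500))
print(f"IQR of beta_hat - beta: N=100 -> {iqr_small:.2f}, N=2500 -> {iqr_large:.2f}")
```

If the estimator were consistent at the usual rate, the spread would shrink by a factor of about five between N = 100 and N = 2500. Under the assumed design it stays of the same order, which is what a non-degenerate limiting distribution for β̂ − β would suggest.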
3. Let yi be a test score of child i, ci be class size, ei be enrollment at the beginning of the
academic year, and let c̃i = f (ei ) be the class size as predicted by ei and Maimonides’
rule. Draw the function f for enrollments up to ei = 121 children.
5. Are ci and c̃i strongly correlated? You may refer to the introduction and the Figures
in the paper.
where xi includes individual and school characteristics, but does not include ei. What
is the main assumption I am making? Comment.
8. Do you see a potential identification problem with including ei in the regression (through
the term γei )? Show that the graphs suggest that the parameter of interest is identified.
Hint: You may want to read the last part of the introduction again.
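For question 3, the shape of f can be tabulated before drawing it. The sketch below assumes the usual statement of Maimonides' rule, namely a cap of 40 pupils per class with the cohort split into the fewest equal-sized classes that respect the cap. The cap value and the function name `predicted_class_size` are assumptions here, since the rule itself is not spelled out above.

```python
def predicted_class_size(enrollment: int, cap: int = 40) -> float:
    """Average class size when a cohort is split into the fewest classes of at most `cap` pupils."""
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes

# Evaluate f just before and just after each multiple of the assumed cap.
for e in (1, 40, 41, 80, 81, 120, 121):
    print(e, predicted_class_size(e))
```

Under this assumption the predicted class size rises one-for-one with enrollment up to the cap and then drops sharply each time enrollment crosses a multiple of 40, which gives f a sawtooth shape.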