15 Instrumental Variables

1) Endogeneity occurs when an independent variable is correlated with the error term, biasing ordinary least squares estimates. 2) Instrumental variables (IV) address endogeneity by using a variable correlated with the endogenous independent variable but uncorrelated with the error term. 3) Two-stage least squares (2SLS) estimation first predicts the endogenous variable using the instrument, then runs a regression on the predicted values to obtain consistent estimates.


Instrumental Variables

Ani Katchova

© 2020 by Ani Katchova. All rights reserved.


Outline
• Endogeneity problem
• Instrumental variables
• IV estimation
• 2SLS estimation
• Testing for endogeneity

Endogeneity problem
• The endogeneity problem arises when an independent variable is correlated with the error term.
• Endogeneity is a frequent problem in economics and econometrics.
• Sources of endogeneity:
  • Omitted variables: relevant independent variables are not observed and end up in the error term, so the error term is correlated with the included independent variables.
  • Measurement error: it can cause correlation between the mismeasured variable and the error term.
• Solutions for endogeneity:
  • Find and include the unobserved variable in the model.
  • Find and include a proxy variable in the model.
  • Use a fixed effects estimator with panel data, which eliminates individual-specific effects.
  • Use the instrumental variables (IV) method, which replaces the endogenous variable with a predicted value that contains only exogenous information.
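
A small simulation can make the omitted-variable source concrete. This is only an illustrative sketch, not part of the original slides: the variable `ability`, the coefficients, and the sample size are made-up assumptions. The omitted variable drives both the regressor and the error term, so the OLS slope lands well above the true value of 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

ability = rng.normal(size=n)            # unobserved variable (hypothetical)
x = 0.8 * ability + rng.normal(size=n)  # regressor depends on the omitted variable
u = 1.0 * ability + rng.normal(size=n)  # error term absorbs the omitted variable
y = 1.0 + 0.5 * x + u                   # true slope is 0.5


def ols_slope(x, y):
    """Slope of a simple regression of y on x: cov(x, y) / var(x)."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


cov_xu = np.cov(x, u, ddof=1)[0, 1]
beta_ols = ols_slope(x, y)
print(f"cov(x, u) = {cov_xu:.3f}")
print(f"OLS slope = {beta_ols:.3f} (true slope is 0.5)")
```

In this setup the bias equals cov(x, u) / var(x), so a stronger link between the omitted variable and the regressor produces a larger distortion.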

Instrumental variables - definition
• An instrumental variable (or instrument or IV) is a variable that is
used in a regression model to correct for the endogeneity problem.
• Dependent variable $y$
• Endogenous variable $x$ that is correlated with the error term $u$
• Instrument $z$ is a variable that is related to the endogenous variable $x$ but does not belong in the model for $y$ and is not correlated with the error term.

Regression model – OLS estimation
• Regression model: $y = \beta_0 + \beta_1 x + u$
• If $x$ is exogenous (not correlated with the error term), $\operatorname{cov}(x, u) = 0$.
• $\operatorname{cov}(x, u) = \operatorname{cov}(x, y - \beta_0 - \beta_1 x) = \operatorname{cov}(x, y) - \beta_1 \operatorname{var}(x) = 0$
• $\hat{\beta}_1^{OLS} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})(x - \bar{x})}$
• If $x$ is exogenous, then $\hat{\beta}_1^{OLS}$ will be unbiased and consistent.

Regression model – IV estimation
• Regression model: $y = \beta_0 + \beta_1 x + u$
• If $x$ is endogenous (correlated with the error term), $\operatorname{cov}(x, u) \neq 0$.
• Find an instrument $z$ that is not correlated with the error term $u$: $\operatorname{cov}(z, u) = 0$
• $\operatorname{cov}(z, u) = \operatorname{cov}(z, y - \beta_0 - \beta_1 x) = \operatorname{cov}(z, y) - \beta_1 \operatorname{cov}(z, x) = 0$
• $\hat{\beta}_1^{IV} = \frac{\operatorname{cov}(z, y)}{\operatorname{cov}(z, x)} = \frac{\sum (z - \bar{z})(y - \bar{y})}{\sum (z - \bar{z})(x - \bar{x})}$
• The coefficient estimated using the above IV formula will be consistent, although it is generally biased in finite samples.
• If $x$ is exogenous, it can serve as its own instrument, $z = x$, and the IV estimate $\hat{\beta}_1^{IV}$ would be identical to the OLS estimate $\hat{\beta}_1^{OLS}$.
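
The covariance-ratio formula can be tried on simulated data. The sketch below is an illustration under made-up assumptions (the data-generating process, the coefficients, and the helper name `iv_slope` are hypothetical and not from the slides). It also uses the point that passing $x$ as its own instrument reproduces the OLS slope.

```python
import numpy as np


def iv_slope(z, x, y):
    """Simple IV estimator: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)                # instrument, independent of the error term
e = rng.normal(size=n)                # common shock that makes x endogenous
x = 0.7 * z + e + rng.normal(size=n)  # x is related to z and to the shock
u = e + rng.normal(size=n)            # cov(x, u) != 0 while cov(z, u) = 0
y = 1.0 + 0.5 * x + u                 # true slope is 0.5

beta_ols = iv_slope(x, x, y)  # x as its own instrument gives the OLS slope
beta_iv = iv_slope(z, x, y)
print(f"OLS slope: {beta_ols:.3f}")
print(f"IV slope:  {beta_iv:.3f} (true slope is 0.5)")
```

With a large sample the IV slope should sit close to 0.5 while the OLS slope stays noticeably above it, which is the pattern the slide describes.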
IV properties
• An instrument $z$ should have three properties:
  1) The instrument $z$ does not appear in the original regression model:
     $y = \beta_0 + \beta_1 x + u$
  2) The instrument $z$ is correlated with the endogenous variable $x$, so $\operatorname{cov}(z, x) \neq 0$:
     $x = \delta_0 + \delta_1 z + v$, where $\delta_1 \neq 0$.
  3) The instrument $z$ is uncorrelated with the error term $u$:
     $\operatorname{cov}(z, u) = 0$

2SLS – two stage least squares
• Regression model – OLS estimation: $y = \beta_0 + \beta_1 x + u$
• If $x$ is endogenous, the coefficient $\hat{\beta}_1$ estimated with OLS will be biased.
• 2SLS – first stage: $x = \delta_0 + \delta_1 z + v$ is a regression of the endogenous variable $x$ on the instrument $z$.
• Get predicted values $\hat{x} = \hat{\delta}_0 + \hat{\delta}_1 z$. The predicted value $\hat{x}$ contains only exogenous information from the instrument $z$.
• 2SLS – second stage: $y = \beta_0 + \beta_1 \hat{x} + u$. Regress the dependent variable $y$ on the predicted values $\hat{x}$.
• The coefficient $\hat{\beta}_1$ estimated with 2SLS will be consistent because $\hat{x}$ is exogenous and uncorrelated with the error term $u$.
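
The two stages can be written out by hand with ordinary least squares. This is a minimal sketch on simulated data, assuming one endogenous regressor and one instrument; the function names `ols` and `two_stage_least_squares` and the data-generating process are hypothetical, chosen only for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for a design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(z, x, y):
    """Manual 2SLS with one endogenous regressor x and one instrument z."""
    ones = np.ones(len(z))
    Z = np.column_stack([ones, z])
    delta = ols(Z, x)              # first stage: regress x on z
    x_hat = Z @ delta              # predicted values of x
    beta = ols(np.column_stack([ones, x_hat]), y)  # second stage: y on x_hat
    return beta, x_hat


# Simulated data (assumed for illustration): true slope is 0.5
rng = np.random.default_rng(2)
n = 50_000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.7 * z + e + rng.normal(size=n)
y = 1.0 + 0.5 * x + e + rng.normal(size=n)

beta_2sls, x_hat = two_stage_least_squares(z, x, y)
print(f"2SLS intercept: {beta_2sls[0]:.3f}, slope: {beta_2sls[1]:.3f}")
```

The coefficients from this manual second stage are the 2SLS estimates, but the standard errors that a plain OLS routine would report for that stage are not the right ones.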
2SLS – standard errors
• The standard errors from the second stage regression need to be corrected.
• In OLS, $\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_x}$. In 2SLS, $\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_x R^2_{x,z}}$.
• $\sigma^2$ is the variance of the error term $u$. $SST_x$ is the total variation in $x$.
• $R^2_{x,z}$ is the $R^2$ from the regression of $x$ on $z$.
• $\operatorname{var}(\hat{\beta}_1^{2SLS}) = \frac{\operatorname{var}(\hat{\beta}_1^{OLS})}{R^2_{x,z}}$ and $\operatorname{se}(\hat{\beta}_1^{2SLS}) = \frac{\operatorname{se}(\hat{\beta}_1^{OLS})}{\sqrt{R^2_{x,z}}}$
• The variance of coefficients using the 2SLS estimation will be higher than the variance of coefficients using the OLS estimation, because the R-squared is less than 1.
• A weaker relationship between $x$ and $z$ results in a lower $R^2_{x,z}$ and a higher variance of the 2SLS coefficients, leading to less significance.
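
The extra factor $R^2_{x,z}$ can be traced to the second-stage regressor. The following is a short derivation sketch, assuming a single instrument, a first stage with an intercept, and treating the variance formulas as large-sample approximations; the symbol $SST_{\hat{x}}$ (total variation in the fitted values) is introduced here only for illustration.

```latex
\begin{aligned}
\operatorname{var}(\hat{\beta}_1^{2SLS}) &= \frac{\sigma^2}{SST_{\hat{x}}},
  \qquad SST_{\hat{x}} = \sum (\hat{x} - \bar{x})^2 \\
R^2_{x,z} &= \frac{SST_{\hat{x}}}{SST_x}
  \;\Rightarrow\; SST_{\hat{x}} = R^2_{x,z}\, SST_x \\
\operatorname{var}(\hat{\beta}_1^{2SLS}) &= \frac{\sigma^2}{SST_x\, R^2_{x,z}}
  = \frac{\operatorname{var}(\hat{\beta}_1^{OLS})}{R^2_{x,z}}
\end{aligned}
```

Taking square roots gives the standard-error relationship, which is why the divisor there is $\sqrt{R^2_{x,z}}$ rather than $R^2_{x,z}$.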

IV versus 2SLS estimation
• If there is one endogenous variable and one instrument, then the 2SLS estimates (replacing $x$ with $\hat{x}$ based on $z$) will be the same as the IV estimates ($\operatorname{cov}(z, y) / \operatorname{cov}(z, x)$).
• The 2SLS estimation can also be used if there is more than one endogenous variable and at least as many instruments.
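
The equality in the one-instrument case follows from a few lines of algebra. This is a sketch using sample covariances, where $\hat{\delta}_1$ is the first-stage slope:

```latex
\begin{aligned}
\hat{x} &= \hat{\delta}_0 + \hat{\delta}_1 z,
  \qquad \hat{\delta}_1 = \frac{\operatorname{cov}(z, x)}{\operatorname{var}(z)} \\
\hat{\beta}_1^{2SLS} &= \frac{\operatorname{cov}(\hat{x}, y)}{\operatorname{var}(\hat{x})}
  = \frac{\hat{\delta}_1 \operatorname{cov}(z, y)}{\hat{\delta}_1^2 \operatorname{var}(z)}
  = \frac{\operatorname{cov}(z, y)}{\hat{\delta}_1 \operatorname{var}(z)}
  = \frac{\operatorname{cov}(z, y)}{\operatorname{cov}(z, x)}
  = \hat{\beta}_1^{IV}
\end{aligned}
```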

IV example
• Model for log wages ($\mathit{lwage}$) explained by education ($\mathit{educ}$), which is endogenous. The father's education ($\mathit{fatheduc}$) will serve as an instrument for education.
• $\mathit{fatheduc}$ is a good instrument for $\mathit{educ}$ because it has the three properties:
  1) The instrument $\mathit{fatheduc}$ does not appear in the original regression model:
     $\mathit{lwage} = \beta_0 + \beta_1 \mathit{educ} + u$
  2) The instrument $\mathit{fatheduc}$ is correlated with the endogenous variable $\mathit{educ}$, so $\operatorname{cov}(\mathit{fatheduc}, \mathit{educ}) \neq 0$:
     $\mathit{educ} = \delta_0 + \delta_1 \mathit{fatheduc} + v$, where $\delta_1 \neq 0$.
  3) The instrument $\mathit{fatheduc}$ is uncorrelated with the error term $u$:
     $\operatorname{cov}(\mathit{fatheduc}, u) = 0$
• Other potential instruments: number of siblings, college proximity when 16 years old, month of birth.

IV estimation example
• Model for log wages ($\mathit{lwage}$) explained by education ($\mathit{educ}$), which is endogenous. The father's education ($\mathit{fatheduc}$) is an instrument for education.
• Regression model: $\mathit{lwage} = \beta_0 + \beta_1 \mathit{educ} + u$
• $\hat{\beta}_1^{OLS} = \frac{\operatorname{cov}(\mathit{educ}, \mathit{lwage})}{\operatorname{var}(\mathit{educ})} = \frac{\sum (\mathit{educ} - \overline{\mathit{educ}})(\mathit{lwage} - \overline{\mathit{lwage}})}{\sum (\mathit{educ} - \overline{\mathit{educ}})(\mathit{educ} - \overline{\mathit{educ}})} = 0.109$
• $\hat{\beta}_1^{IV} = \frac{\operatorname{cov}(\mathit{fatheduc}, \mathit{lwage})}{\operatorname{cov}(\mathit{fatheduc}, \mathit{educ})} = \frac{\sum (\mathit{fatheduc} - \overline{\mathit{fatheduc}})(\mathit{lwage} - \overline{\mathit{lwage}})}{\sum (\mathit{fatheduc} - \overline{\mathit{fatheduc}})(\mathit{educ} - \overline{\mathit{educ}})} = 0.059$
• The coefficient using IV estimation is lower than the coefficient using OLS estimation. One additional year of education is associated with a 10.9% increase in wages using OLS but only a 5.9% increase in wages using IV.
• The OLS and IV estimates for $\beta_1$ appear to be different from each other, so perhaps $\mathit{educ}$ is endogenous.

OLS and 2SLS example
• Regression model – OLS estimation: $\mathit{lwage} = \beta_0 + \beta_1 \mathit{educ} + u$. Get $\hat{\beta}_1^{OLS}$.
• Education is an endogenous variable, and father's education is the instrument.
• 2SLS estimation:
  • First stage: $\mathit{educ} = \delta_0 + \delta_1 \mathit{fatheduc} + v$, get predicted values $\widehat{\mathit{educ}}$.
  • Second stage: $\mathit{lwage} = \beta_0 + \beta_1 \widehat{\mathit{educ}} + u$. Get $\hat{\beta}_1^{2SLS}$.
• The 2SLS first stage regresses education on father's education, giving predicted values for education $\widehat{\mathit{educ}}$. The 2SLS second stage regresses lwage on $\widehat{\mathit{educ}}$.
OLS and 2SLS estimation
VARIABLES    OLS estimation      2SLS estimation – first stage    2SLS estimation – second stage
             lwage               educ                             lwage
educ         0.109*** (0.014)
educ_hat                                                          0.059* (0.035)
fatheduc                         0.269*** (0.029)
Constant     -0.185 (0.185)      10.237*** (0.276)                0.441 (0.446)
R-squared    0.12                0.17                             0.09

• Using the OLS estimation, one additional year of education is associated with a 10.9% increase in wages.
• Using the 2SLS estimation, one additional year of education is associated with a 5.9% increase in wages, which is a lower effect and less significant.
• The IV and 2SLS estimations give the same coefficient of 0.059.

2SLS – endogenous variable vs predicted values using instrument
• The first few observations for $\mathit{educ}$ and $\widehat{\mathit{educ}}$:

educ    fatheduc    educ_hat
12      7           12.12
12      7           12.12
12      7           12.12
12      7           12.12
14      14          14.01
12      7           12.12
16      7           12.12
12      3           11.05

• $\widehat{\mathit{educ}}$ is only based on the exogenous information coming from $\mathit{fatheduc}$.
• $\widehat{\mathit{educ}}$ is not a whole number.
• If a variable is binary (0 or 1), the predicted values below 0.5 can be replaced by 0 and above 0.5 can be replaced by 1.
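
The predicted values can be reproduced approximately from the first-stage coefficients reported in the regression table (constant 10.237, slope 0.269 on fatheduc). This is a rough sketch; because those coefficients are rounded, the results may differ from the listed values in the second decimal place. The function name `educ_hat` is chosen here for illustration.

```python
# First-stage coefficients as reported (rounded) in the regression table.
DELTA0 = 10.237  # constant
DELTA1 = 0.269   # coefficient on fatheduc


def educ_hat(fatheduc):
    """Predicted education from the first stage: delta0 + delta1 * fatheduc."""
    return DELTA0 + DELTA1 * fatheduc


for f in (7, 14, 3):
    print(f"fatheduc = {f:2d} -> educ_hat = {educ_hat(f):.2f}")
```

Every observation with the same father's education gets the same predicted value, regardless of the person's actual education.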

2SLS – standard errors
• The R-squared of the 2SLS first stage regression of $\mathit{educ}$ on $\mathit{fatheduc}$ is $R^2_{x,z} = 0.17$. The R-squared is not very high.
• From the regression output, $\operatorname{se}(\hat{\beta}_1^{OLS}) = 0.014$ and $\operatorname{se}(\hat{\beta}_1^{2SLS}) = 0.034$. The 2SLS coefficient has a higher standard error and is less significant.
• The exact relationship for the standard errors is:
  $\operatorname{se}(\hat{\beta}_1^{2SLS}) = \frac{\operatorname{se}(\hat{\beta}_1^{OLS})}{\sqrt{R^2_{x,z}}} = \frac{0.014}{\sqrt{0.17}} = 0.034$
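
The arithmetic is quick to verify. The snippet below only plugs in the two numbers quoted on this slide; the variable names are chosen here for illustration.

```python
import math

se_ols = 0.014         # standard error of the OLS coefficient on educ
r2_first_stage = 0.17  # R-squared from regressing educ on fatheduc

# se(2SLS) = se(OLS) / sqrt(R^2 of the first stage)
se_2sls = se_ols / math.sqrt(r2_first_stage)
print(round(se_2sls, 3))  # 0.034
```

Dividing by 0.17 instead of its square root would give about 0.082, which does not match the reported value.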

Multiple regression model – IV estimation
• Multiple regression model:
  $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + u_1$
• Here $y_2$ is the endogenous variable that is correlated with the error term $u_1$, and $z_1$ and $z_2$ are exogenous variables.
• Find two instruments $z_3$ and $z_4$ for the endogenous variable $y_2$ that are uncorrelated with the error term.
• The exogeneity conditions for the instruments are:
  • $\operatorname{cov}(z_3, u_1) = \operatorname{cov}(z_3, y_1 - \beta_0 - \beta_1 y_2 - \beta_2 z_1 - \beta_3 z_2) = 0$
  • $\operatorname{cov}(z_4, u_1) = \operatorname{cov}(z_4, y_1 - \beta_0 - \beta_1 y_2 - \beta_2 z_1 - \beta_3 z_2) = 0$
  • $E(u_1) = E(y_1 - \beta_0 - \beta_1 y_2 - \beta_2 z_1 - \beta_3 z_2) = 0$
• These equations are solved to obtain the IV coefficients $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\beta}_3$.

Multiple regression model – 2SLS
• Multiple regression model:
  $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + u_1$
• Here $y_2$ is the endogenous variable that is correlated with the error term $u_1$, and $z_1$ and $z_2$ are exogenous variables.
• Find two instruments $z_3$ and $z_4$ for the endogenous variable $y_2$.
• The 2SLS first stage reduced form equation is:
  $y_2 = \delta_0 + \delta_1 z_1 + \delta_2 z_2 + \delta_3 z_3 + \delta_4 z_4 + v_2$
• Obtain fitted values: $\hat{y}_2 = \hat{\delta}_0 + \hat{\delta}_1 z_1 + \hat{\delta}_2 z_2 + \hat{\delta}_3 z_3 + \hat{\delta}_4 z_4$
• The 2SLS second stage is to estimate the structural model where the endogenous variable $y_2$ is replaced by $\hat{y}_2$:
  $y_1 = \beta_0 + \beta_1 \hat{y}_2 + \beta_2 z_1 + \beta_3 z_2 + u_1$
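
The same two stages carry over to the multiple regression case with matrices. This is a minimal sketch on simulated data; the function name `tsls`, the coefficients, and the data-generating process are assumptions made for illustration. Note that the first stage includes the exogenous regressors $z_1$ and $z_2$ as well as the instruments $z_3$ and $z_4$.

```python
import numpy as np


def tsls(y1, y2, exog, instruments):
    """2SLS with one endogenous regressor y2.

    exog: (n, k) included exogenous variables (z1, z2).
    instruments: (n, m) excluded instruments (z3, z4).
    Returns coefficients ordered [constant, y2, exog...].
    """
    n = len(y1)
    const = np.ones((n, 1))
    Z = np.hstack([const, exog, instruments])      # all exogenous variables
    delta = np.linalg.lstsq(Z, y2, rcond=None)[0]  # first stage (reduced form)
    y2_hat = Z @ delta                             # fitted values
    X_hat = np.hstack([const, y2_hat[:, None], exog])
    return np.linalg.lstsq(X_hat, y1, rcond=None)[0]  # second stage


# Simulated data (assumed): true coefficients are [2.0, 0.5, 1.0, -1.0]
rng = np.random.default_rng(3)
n = 50_000
z1, z2, z3, z4 = rng.normal(size=(4, n))
e = rng.normal(size=n)  # shock shared by y2 and u1, making y2 endogenous
y2 = 1.0 + 0.5 * z1 - 0.3 * z2 + 0.6 * z3 + 0.4 * z4 + e + rng.normal(size=n)
u1 = e + rng.normal(size=n)
y1 = 2.0 + 0.5 * y2 + 1.0 * z1 - 1.0 * z2 + u1

beta = tsls(y1, y2, np.column_stack([z1, z2]), np.column_stack([z3, z4]))
print(np.round(beta, 3))
```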
IV properties
• The instruments $z_3$ and $z_4$ should have three properties:
  1) The instruments do not appear in the original regression model:
     $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + u_1$
  2) The instruments are correlated with the endogenous variable $y_2$, so $\operatorname{cov}(z_3, y_2) \neq 0$ and $\operatorname{cov}(z_4, y_2) \neq 0$:
     $y_2 = \delta_0 + \delta_1 z_1 + \delta_2 z_2 + \delta_3 z_3 + \delta_4 z_4 + v_2$, where $\delta_3 \neq 0$ and $\delta_4 \neq 0$.
  3) The instruments are uncorrelated with the error term $u_1$:
     $\operatorname{cov}(z_3, u_1) = 0$ and $\operatorname{cov}(z_4, u_1) = 0$.

IV and 2SLS discussion
• The IV estimation is equivalent to the 2SLS estimation.
• The 2SLS estimation works because the endogenous variable $y_2$ is replaced in the second stage by $\hat{y}_2$, which contains only exogenous information from instruments and exogenous variables, but not the endogenous part that is correlated with the error term.

2SLS example
• Structural equation model:
  $\mathit{lwage} = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{exper}^2 + u_1$
• Here $\mathit{educ}$ is endogenous and $\mathit{exper}$ and $\mathit{exper}^2$ are exogenous.
• Find two instruments $\mathit{fatheduc}$ and $\mathit{motheduc}$ for $\mathit{educ}$.
• 2SLS first stage - estimate the reduced form equation:
  • $\mathit{educ} = \delta_0 + \delta_1 \mathit{exper} + \delta_2 \mathit{exper}^2 + \delta_3 \mathit{fatheduc} + \delta_4 \mathit{motheduc} + v_2$
  • Obtain the predicted values $\widehat{\mathit{educ}}$, which contain only exogenous information.
• 2SLS second stage - estimate the structural equation replacing $\mathit{educ}$ with $\widehat{\mathit{educ}}$:
  $\mathit{lwage} = \beta_0 + \beta_1 \widehat{\mathit{educ}} + \beta_2 \mathit{exper} + \beta_3 \mathit{exper}^2 + u_1$

2SLS example
VARIABLES    OLS                   2SLS – first stage    2SLS – second stage
             lwage                 educ                  lwage
educ         0.108*** (0.014)
educ_hat                                                 0.061* (0.031)
exper        0.042*** (0.013)      0.045 (0.040)         0.044*** (0.013)
expersq      -0.0008** (0.0004)    -0.001 (0.001)        -0.0009** (0.0004)
fatheduc                           0.190*** (0.034)
motheduc                           0.158*** (0.036)
Constant     -0.522*** (0.199)     9.103*** (0.427)      0.048 (0.400)

• 2SLS estimation – estimate the 2SLS first stage for education, get predicted values educ_hat, and use them instead of educ in the 2SLS second stage.
• The coefficient on education goes down from 0.108 using OLS to 0.061 using 2SLS.
• One additional year of education is associated with a 10.8% increase in wages using OLS, and with a 6.1% increase in wages using 2SLS. The effect is smaller and less significant using 2SLS after correcting for the endogeneity.
2SLS example
VARIABLES    2SLS – second stage,        2SLS – second stage,
             correct standard errors     incorrect standard errors
             lwage                       lwage
educ_hat     0.061* (0.031)              0.061* (0.033)
exper        0.044*** (0.013)            0.044*** (0.014)
expersq      -0.0009** (0.0004)          -0.0009** (0.0004)
Constant     0.048 (0.400)               0.048 (0.420)

• If estimating the second stage of 2SLS, the standard errors need to be corrected.
• In OLS, $\operatorname{var}(\hat{\beta}) = \frac{\sigma^2}{SST_x}$. In 2SLS, $\operatorname{var}(\hat{\beta}) = \frac{\sigma^2}{SST_x R^2_{x,z}}$.
• The standard error on the coefficient on education differs between the two columns (0.031 with correct standard errors vs 0.033 with incorrect standard errors).
• Many software packages provide the corrected standard errors.
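
One common way to obtain corrected standard errors is to keep the second-stage coefficients but compute the residuals with the actual endogenous variable rather than its predicted values. The sketch below illustrates this on simulated data with one instrument and homoskedastic errors; the function name `tsls_with_se` and the data-generating process are assumptions, and this is not necessarily the exact procedure behind the table above.

```python
import numpy as np


def tsls_with_se(z, x, y):
    """2SLS coefficients with naive and corrected standard errors."""
    n = len(y)
    ones = np.ones(n)
    Z = np.column_stack([ones, z])
    X = np.column_stack([ones, x])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # first stage
    X_hat = np.column_stack([ones, x_hat])
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]   # second stage
    xtx_inv = np.linalg.inv(X_hat.T @ X_hat)
    dof = n - X.shape[1]
    resid_naive = y - X_hat @ beta    # residuals a manual second stage reports
    resid_correct = y - X @ beta      # residuals using the actual x
    se_naive = np.sqrt(np.diag(xtx_inv) * (resid_naive @ resid_naive) / dof)
    se_correct = np.sqrt(np.diag(xtx_inv) * (resid_correct @ resid_correct) / dof)
    return beta, se_naive, se_correct


# Simulated data (assumed): true slope is 0.5
rng = np.random.default_rng(4)
n = 20_000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.7 * z + e + rng.normal(size=n)
y = 1.0 + 0.5 * x + e + rng.normal(size=n)

beta, se_naive, se_correct = tsls_with_se(z, x, y)
print("slope:", round(beta[1], 3))
print("naive se:", round(se_naive[1], 4), "corrected se:", round(se_correct[1], 4))
```

The coefficients are identical in both versions; only the estimate of the error variance changes, and whether the corrected standard error ends up larger or smaller than the naive one depends on the data.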
Testing for endogeneity
• Structural equation model: $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + u_1$
• Testing for endogeneity of $y_2$.
• Find two instruments $z_3$ and $z_4$ for $y_2$.
• Estimate the reduced form equation:
  $y_2 = \delta_0 + \delta_1 z_1 + \delta_2 z_2 + \delta_3 z_3 + \delta_4 z_4 + v_2$
  • Obtain the residuals $\hat{v}_2$, which would contain the endogenous information.
  • The predicted values $\hat{y}_2$ contain only the exogenous information.
  • So the endogenous variable is broken down into an exogenous part $\hat{y}_2$ and an endogenous part $\hat{v}_2$: $y_2 = \hat{y}_2 + \hat{v}_2$.
• Estimate the structural equation with the residuals $\hat{v}_2$ included:
  $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + \gamma_1 \hat{v}_2 + u_1$
• H0: $\gamma_1 = 0$ (exogeneity)
• Ha: $\gamma_1 \neq 0$ (endogeneity)
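
The steps of this test can be sketched with plain least squares. The code below is an illustration on simulated data in which $y_2$ is endogenous by construction; the function names `ols_fit` and `endogeneity_test` and the data-generating process are assumptions. The usual OLS t-statistic on the residual term is used as the test statistic for H0.

```python
import numpy as np


def ols_fit(X, y):
    """OLS coefficients, standard errors, and residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    cov = np.linalg.inv(X.T @ X) * (resid @ resid) / dof
    return beta, np.sqrt(np.diag(cov)), resid


def endogeneity_test(y1, y2, exog, instruments):
    """Return (gamma_hat, se, t_stat) for the first-stage residual term."""
    n = len(y1)
    const = np.ones((n, 1))
    Z = np.hstack([const, exog, instruments])
    _, _, v_hat = ols_fit(Z, y2)                         # reduced form residuals
    X = np.hstack([const, y2[:, None], exog, v_hat[:, None]])
    beta, se, _ = ols_fit(X, y1)                         # structural eq. + v_hat
    return beta[-1], se[-1], beta[-1] / se[-1]


# Simulated data (assumed): y2 is endogenous through the shared shock e
rng = np.random.default_rng(5)
n = 5_000
z1, z2, z3, z4 = rng.normal(size=(4, n))
e = rng.normal(size=n)
y2 = 1.0 + 0.5 * z1 - 0.3 * z2 + 0.6 * z3 + 0.4 * z4 + e + rng.normal(size=n)
y1 = 2.0 + 0.5 * y2 + 1.0 * z1 - 1.0 * z2 + e + rng.normal(size=n)

gamma_hat, gamma_se, t_stat = endogeneity_test(
    y1, y2, np.column_stack([z1, z2]), np.column_stack([z3, z4])
)
print(f"gamma = {gamma_hat:.3f}, se = {gamma_se:.3f}, t = {t_stat:.2f}")
```

A large absolute t-statistic leads to rejecting H0, which is evidence of endogeneity; a small one means exogeneity cannot be rejected.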

Testing for endogeneity example
• Structural equation model: $\mathit{lwage} = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{exper}^2 + u_1$
• Testing for endogeneity of $\mathit{educ}$.
• Find two instruments $\mathit{fatheduc}$ and $\mathit{motheduc}$ for $\mathit{educ}$.
• Estimate the reduced form equation:
  • $\mathit{educ} = \delta_0 + \delta_1 \mathit{exper} + \delta_2 \mathit{exper}^2 + \delta_3 \mathit{fatheduc} + \delta_4 \mathit{motheduc} + v_2$
  • Obtain the residuals $\hat{v}_2$, which would contain the endogenous information.
  • The predicted values $\widehat{\mathit{educ}}$ contain only the exogenous information.
• Estimate the structural equation with the residuals $\hat{v}_2$ included:
  $\mathit{lwage} = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{exper}^2 + \gamma_1 \hat{v}_2 + u_1$
• H0: $\gamma_1 = 0$ (exogeneity)
• Ha: $\gamma_1 \neq 0$ (endogeneity)

Testing for endogeneity
VARIABLES    Structural model     Reduced form model    Structural model with residuals
             lwage                educ                  lwage
educ         0.107*** (0.014)                           0.061** (0.031)
exper        0.042*** (0.013)     0.045 (0.040)         0.044*** (0.013)
expersq      -0.001** (0.0004)    -0.001 (0.001)        -0.001** (0.0003)
fatheduc                          0.190*** (0.033)
motheduc                          0.157*** (0.035)
vhat                                                    0.058* (0.034)
Constant     -0.522*** (0.199)    9.103*** (0.427)      0.048 (0.395)

• Estimate the reduced form model for education, obtain the residuals, and include them in the structural model for lwage.
• The coefficient on the residual vhat is significant at the 10% level, so the variable education is treated as endogenous.
• Instrumental variables need to be used to correct for the endogeneity.
Review questions
• Describe the three properties of a good instrument.
• Describe the IV estimator.
• Describe the 2SLS procedure with first and second stage estimation.
• Describe the test for endogeneity of an independent variable.
