Stata IV Simple Example
ivregress, ivreg2
Instrumental variable (IV) regression is a statistical method for estimating a causal effect when
the explanatory variable of interest is endogenous, that is, correlated with the error term. A
common cause is a hidden confounder: something you can’t measure directly, but you know it’s
there, messing with your results. So you bring in an instrumental variable, a kind of secret
agent that moves your variable of interest but has no direct line to the outcome, to help you
uncover the true effect.
Imagine you’re studying the effect of a new counseling program (treatment) on reducing
stress levels (outcome) among social workers. However, you suspect that those who choose to
participate might already be more motivated or less stressed, which could bias your results.
Step 1: Find Your Instrument. You need an instrumental variable that (a) is related to the
likelihood of participating in the program (relevance) and (b) affects stress levels only through
participation, so it is unrelated to the unmeasured confounders (exclusion). Let’s say you find
that social workers who live closer to the counseling center are more likely to participate.
Proximity to the center becomes your instrumental variable.
Step 2: First Stage Regression. You first regress the treatment (participation in the program)
on the instrumental variable (proximity), plus any exogenous control variables. The predicted
values of treatment from this regression depend only on proximity, so, provided the instrument
is valid, they are free from the bias of the unmeasured confounder (motivation or initial stress
levels).
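Step 3: Second Stage Regression. Next, you use these predicted values from the first stage as
your ‘clean’ treatment variable to predict the outcome (stress levels). The coefficient on the
predicted treatment is the effect of the counseling program on stress levels, without the bias
introduced by the unmeasured confounder. In practice, let an IV command such as ivregress or
ivreg2 run both stages: a hand-made second stage gives the right coefficient but wrong standard
errors, because its residuals are computed from the predicted rather than the actual treatment.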
Suppose you have data on social workers’ stress levels and their participation in the
counseling program. You also know how far each social worker lives from the center.
1. First Stage: You find that living closer to the center significantly predicts higher
participation.
2. Second Stage: Using the predicted participation from the first stage, you find that
participating in the counseling program leads to lower stress levels.
By using the instrumental variable of proximity, you’ve managed to isolate the effect of the
counseling program on stress levels, accounting for the potential bias of self-selection into the
program.
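To see the whole recipe at work, here is a minimal sketch with simulated data. Everything in it is hypothetical: the variable names (motivation, distance, participate, stress), the coefficients, and the seed are assumptions made up for illustration, with a true program effect of -5. Because motivated workers both participate more and feel less stressed, naive OLS should overstate the stress reduction (a coefficient further below -5), while the manual two-stage regression and ivregress 2sls should both land near -5. The two-argument runiform() assumes Stata 14 or later.

```stata
* Simulated (hypothetical) data for the counseling example
clear
set seed 20240101
set obs 5000

* Unobserved confounder: motivation (left out of every regression below)
generate double motivation = rnormal()
* Instrument: distance to the counseling center in km (hypothetical)
generate double distance = runiform(0, 20)
* Motivated workers and those living closer are more likely to participate
generate byte participate = (0.5*motivation - 0.1*distance + rnormal()) > -1
* True effect of participation on stress is -5; motivation also lowers stress
generate double stress = 50 - 5*participate - 3*motivation + rnormal(0, 5)

* Naive OLS: biased, because motivation is omitted
regress stress participate

* Step 2 by hand: first stage, keep the fitted values
regress participate distance
predict double part_hat, xb

* Step 3 by hand: second stage on the fitted values
* (the point estimate is 2SLS, but these standard errors are not valid)
regress stress part_hat
scalar b_manual = _b[part_hat]

* Both stages in one command, with correct standard errors
ivregress 2sls stress (participate = distance)
scalar b_2sls = _b[participate]
display "manual: " scalar(b_manual) "   ivregress: " scalar(b_2sls)
```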
Tests
You can load the nlswork.dta dataset from the default Stata Press website using
the webuse command:
Stata
webuse nlswork, clear
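For example, let’s say we want to estimate the effect of years of education (grade) on log
wages (ln_wage), instrumenting education with msp, an indicator equal to 1 if the woman is
married with her spouse present. Here msp only demonstrates the syntax; it is a questionable
instrument, since marital status may affect wages directly.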
Stata
ssc install ivreg2, replace
ssc install ranktest, replace   // ivreg2 depends on ranktest
ivreg2 ln_wage (grade = msp)
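ln_wage (dependent variable):
o The coefficient on grade is approximately 0.2313 and is statistically significant
(p-value < 0.001). Because the outcome is in logs, one more year of schooling raises
wages by about 0.23 log points, or roughly 26% (exp(0.2313) - 1 ≈ 0.26), provided the
instrument is valid.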
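Identification Tests:
o The Anderson canonical correlation LM statistic tests for underidentification (null:
the instrument is not correlated with grade). The p-value is 0.0001, so we reject the
null: the model is not underidentified.
o The Cragg-Donald Wald F statistic tests for weak identification. It has no p-value;
instead, compare it with the Stock-Yogo critical values that ivreg2 prints beneath it
(or the rule of thumb F > 10). A value above the critical value suggests the instrument
is not weak.
o Because the equation is exactly identified (one instrument for one endogenous
regressor), there are no overidentifying restrictions to test: the Sargan statistic is
reported as 0.000 with no p-value.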
Instrumentation:
o grade is instrumented by msp.
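The quantities discussed above can be reproduced with a few extra lines. This sketch assumes ivreg2 and its dependency ranktest are already installed from SSC. It converts the log-wage coefficient into an approximate percentage effect and computes the first-stage F statistic of the excluded instrument by hand; with a single endogenous regressor this statistic is closely related to the Cragg-Donald F, and the usual rule of thumb treats values below 10 as a warning sign of a weak instrument.

```stata
* Assumes: ssc install ivreg2 and ssc install ranktest have been run
webuse nlswork, clear

* Just-identified IV; the first option also prints the first-stage regression
ivreg2 ln_wage (grade = msp), first

* Approximate percentage effect implied by a log-wage coefficient
display "Implied % change in wage per extra grade: " 100*(exp(_b[grade]) - 1)

* First-stage F test of the excluded instrument, on the IV estimation sample
regress grade msp if e(sample)
test msp
display "First-stage F = " r(F)
```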
The basic command for ivreg2 is as follows. Please replace y with your dependent
variable, x1 with your endogenous regressor, z1 with your instrument, and x2 with other
control variables.
Stata
ivreg2 y (x1 = z1) x2
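To add exogenous control variables, list them outside the parentheses, for example right after
the dependent variable. They enter both stages and serve as their own instruments. For instance,
to control for total work experience (ttl_exp) and job tenure (tenure):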
Stata
ivreg2 ln_wage ttl_exp tenure (grade = msp)
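ivreg2 is a user-written command; Stata’s built-in ivregress 2sls fits the same model. The sketch below (assuming ivreg2 is installed) fits the controlled specification both ways, checks that the grade coefficients agree, and then calls estat firststage, which summarizes the first-stage regression after ivregress.

```stata
* Assumes ivreg2 is installed (ssc install ivreg2)
webuse nlswork, clear

* User-written ivreg2
ivreg2 ln_wage ttl_exp tenure (grade = msp)
scalar b_ivreg2 = _b[grade]

* Built-in equivalent: two-stage least squares with ivregress
ivregress 2sls ln_wage ttl_exp tenure (grade = msp)
scalar b_ivregress = _b[grade]
display "ivreg2: " scalar(b_ivreg2) "   ivregress: " scalar(b_ivregress)

* First-stage summary statistics for the instrument
estat firststage
```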
The Wu-Hausman F test and the Durbin-Wu-Hausman chi-squared test check whether a regressor
(in this case, grade) is in fact endogenous. You can run them with the ivendog command
(developed by Baum et al. 2007), installed from SSC and called right after ivreg2.
Stata
ssc install ivendog, replace
ivendog
1. Wu-Hausman F Test:
o Null Hypothesis (H0): The regressor (in this case, grade) is exogenous (i.e.,
not correlated with the error term).
o The test statistic is 8.77348, and the associated p-value is 0.00306.
o Since the p-value is less than 0.05, we reject the null hypothesis.
o Interpretation: there is evidence that grade is endogenous (correlated with the
error term) in the regression model.
2. Durbin-Wu-Hausman Chi-Square Test:
o This test is another way to assess endogeneity.
o Null Hypothesis (H0): The regressor (again, grade) is exogenous.
o The test statistic is 8.77170, and the associated p-value is also 0.00306.
o Similar to the Wu-Hausman F test, the p-value is less than 0.05, leading us to
reject the null hypothesis.
o Conclusion: The evidence supports the idea that grade is endogenous.
In summary, both tests indicate that grade is likely endogenous in your regression model. Both
tests compare the OLS and IV estimates: under the null they differ only by sampling error, so
rejecting the null means OLS is inconsistent and the IV estimate is preferred, provided the
instrument itself is valid. The endogeneity points to omitted variables (for example,
unobserved ability) or other sources of correlation between grade and the error term.
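The logic of the Durbin-Wu-Hausman test can be seen in a regression-based (control function) version, sketched here for the specification with ttl_exp and tenure; the names touse and v_hat are illustrative. The first-stage residual captures the part of grade that is not explained by the exogenous variables, so a significant coefficient on it in the augmented OLS regression signals that grade is endogenous. The built-in alternative after ivregress is estat endog, which reports the Durbin chi-squared and Wu-Hausman F statistics; ivreg2 also offers an endog() option for an endogeneity test.

```stata
webuse nlswork, clear
* Common estimation sample for every regression below
generate byte touse = !missing(ln_wage, grade, msp, ttl_exp, tenure)

* First stage with all exogenous variables; keep the residual
regress grade msp ttl_exp tenure if touse
predict double v_hat if touse, residuals

* Augmented OLS: a significant coefficient on v_hat signals endogeneity of grade
regress ln_wage grade ttl_exp tenure v_hat if touse
test v_hat

* Built-in Durbin and Wu-Hausman tests after ivregress
ivregress 2sls ln_wage ttl_exp tenure (grade = msp) if touse
estat endog
```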
The Pagan-Hall general test statistic is used to test for heteroskedasticity in the context of
instrumental variables (IV) estimation. You can perform this test by simply putting the
ivhettest command (developed by Schaffer 2023) after ivreg2 command.
Stata
ssc install ivhettest, replace
ivhettest
Null Hypothesis (H0): The disturbance (error term) is homoskedastic (i.e., the
variance of the error term is constant across observations).
Since the p-value is 0.5596, well above 0.05, we fail to reject the null hypothesis: the test
finds no statistical evidence of heteroskedasticity in the model. Failing to reject does not
prove homoskedasticity, however.
In simpler terms, the test detects no relationship between the variance of the error term and
the levels of the instruments (ivhettest’s default indicator variables), so on the basis of this
test alone there is no strong reason to adjust for heteroskedasticity. Many applied researchers
report robust standard errors anyway.
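If you prefer not to rely on the homoskedasticity assumption, ivreg2 can report robust standard errors. Because nlswork contains repeated observations on the same women (identified by idcode), clustering by person is a natural choice here. This is a sketch assuming ivreg2 is installed. The estimator is still 2SLS, so the point estimates do not change; only the standard errors and test statistics do, and the identification and overidentification tests switch to their robust counterparts (Kleibergen-Paap rk statistics and Hansen’s J).

```stata
* Assumes ivreg2 is installed (ssc install ivreg2)
webuse nlswork, clear

* Default (homoskedastic) standard errors
ivreg2 ln_wage ttl_exp tenure (grade = msp)
scalar b_default = _b[grade]

* Heteroskedasticity-robust standard errors
ivreg2 ln_wage ttl_exp tenure (grade = msp), robust

* Repeated observations per woman (idcode): cluster-robust standard errors
ivreg2 ln_wage ttl_exp tenure (grade = msp), cluster(idcode)
scalar b_cluster = _b[grade]
```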
Reference
andrewproctor.github.io/assets/StataSeminar4.pdf
mayoral.iae-csic.org/IV_2015/StataIV_baum.pdf
fmwww.bc.edu/EC-C/S2014/823/EC823.S2014.nn02.slides.pdf