Format Sample - ECON
To begin, run three regressions:
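The following table summarizes these regressions.
Table 1: Birth Weight and Smoking
            (1)           (2)           (3)
smoker      -253.2***     -217.6***     -175.4***
            (26.81)       (26.11)       (26.83)
unmarried                               -187.1***
                                        (27.68)
R2          0.028         0.072         0.087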
(a) What is the value of the estimated effect of smoking on birth weight in each of the regressions?
(Answer)
See Table 1 above.
(b) Construct a 95% confidence interval for the effect of smoking on birth weight, using each of
the regressions.
(Answer)
From (1) the 95% CI is -305.8 to -200.7
From (2) the 95% CI is -268.8 to -166.4
From (3) the 95% CI is -228.0 to -122.8
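The intervals above follow the usual large-sample rule, estimate ± 1.96 × standard error. The short Python sketch below reproduces them from the rounded coefficients and standard errors on smoker reported in Table 2; the function name conf_int_95 is only an illustrative choice. Because the inputs are rounded, the last digit can differ slightly from the intervals reported in the answer.

```python
# Large-sample 95% confidence interval: estimate +/- 1.96 * standard error.
Z_95 = 1.96  # two-sided 5% critical value of the standard normal distribution


def conf_int_95(coef, se):
    """Return the (lower, upper) bounds of the 95% confidence interval."""
    margin = Z_95 * se
    return coef - margin, coef + margin


# (coefficient on smoker, standard error) for regressions (1)-(3),
# copied from the rounded values in Table 2.
smoker_estimates = {
    1: (-253.2, 26.81),
    2: (-217.6, 26.11),
    3: (-175.4, 26.83),
}

for reg, (coef, se) in smoker_estimates.items():
    lower, upper = conf_int_95(coef, se)
    print(f"From ({reg}) the 95% CI is {lower:.1f} to {upper:.1f}")
```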
(c) Does the coefficient on smoker in regression (1) suffer from omitted variable bias? Explain.
(Answer)
Yes, it seems so. The coefficient falls by roughly 30% in magnitude when additional regressors are
added to (1). This change is substantively large and large relative to the standard error in (1).
(d) Does the coefficient on smoker in regression (2) suffer from omitted variable bias? Explain.
(Answer)
Yes, it seems so. The coefficient falls by roughly 20% in magnitude when unmarried is added as an
additional regressor. This change is substantively large and large relative to the standard error in
(2).
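The "roughly 30%" and "roughly 20%" figures in (c) and (d) can be checked from the rounded smoker coefficients in Table 2. The sketch below assumes that (c) compares regression (1) with regression (3), which is the comparison that yields about 30%, and that (d) compares (2) with (3). Expressing the change in units of the standard error is an informal yardstick, not a formal test; the helper names are illustrative.

```python
def pct_fall_in_magnitude(before, after):
    """Percentage by which the absolute value of a coefficient shrinks."""
    return 100 * (abs(before) - abs(after)) / abs(before)


def change_in_se_units(before, after, se_before):
    """Change in the coefficient, measured in standard errors of `before`."""
    return abs(after - before) / se_before


# Coefficients on smoker and their standard errors (rounded, from Table 2).
b1, se1 = -253.2, 26.81  # regression (1)
b2, se2 = -217.6, 26.11  # regression (2)
b3 = -175.4              # regression (3)

print(f"(1) vs (3): {pct_fall_in_magnitude(b1, b3):.1f}% fall, "
      f"{change_in_se_units(b1, b3, se1):.1f} standard errors of (1)")
print(f"(2) vs (3): {pct_fall_in_magnitude(b2, b3):.1f}% fall, "
      f"{change_in_se_units(b2, b3, se2):.1f} standard errors of (2)")
```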
iv. A family advocacy group notes that the large coefficient suggests that public policies
that encourage marriage will lead, on average, to healthier babies. Do you agree? (Hint: Review
the discussion of control variables in Section 7.5. Discuss some of the various factors that
unmarried may be controlling for and how this affects the interpretation of its coefficient.)
(Answer)
As the question suggests, unmarried is a control variable that captures the effects of several
factors that differ between married and unmarried mothers such as age, education, income,
diet and other health factors, and so forth.
(f) Consider the various other control variables in the data set. Which do you think should be included
in the regression? Using a table like Table 7.1, examine the robustness of the confidence interval
you constructed in (b). What is a reasonable 95% confidence interval for the effect of smoking on
birth weight?
(Answer)
I have added an additional regression to the table that includes age and educ (years of education).
The coefficient on smoker is very similar to its value in regression (3). See Table 2 below.
Table 2: Birth Weight and Smoking
            (1)           (2)           (3)           (4)
            birthweight   birthweight   birthweight   birthweight
smoker      -253.2***     -217.6***     -175.4***     -177.0***
            (26.81)       (26.11)       (26.83)       (27.33)
unmarried                               -187.1***     -199.3***
                                        (27.68)       (30.99)
age                                                   -2.494
                                                      (2.445)
educ                                                  0.238
                                                      (5.533)
R2          0.028         0.072         0.087         0.087
E7.2 In the empirical exercises on earnings and height until last week, you estimated a relatively large and
statistically significant effect of a worker’s height on his or her earnings. One explanation for this result
is omitted variable bias: Height is correlated with an omitted factor that affects earnings. For example,
Case and Paxson (2008) suggest that cognitive ability (or intelligence) is the omitted factor. The
mechanism they describe is straightforward: Poor nutrition and other harmful environmental
The mechanism described in the problem explains why height (X) and cognitive ability (the
omitted variable) are correlated and why cognitive ability is a determinant of earnings (Y). The
mechanism suggests that height and cognitive ability are positively correlated and that cognitive
ability has a positive effect on earnings. Thus, X will be positively correlated with the error term,
leading to a positive bias in the estimated coefficient.
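The sign argument above can be written compactly using the standard large-sample expression for omitted variable bias in a regression with a single regressor. This is a sketch added for illustration: the symbols \rho_{Xu} (the correlation between X and the regression error u), \sigma_u, and \sigma_X are notation assumed here and do not appear in the answer itself.

```latex
\hat{\beta}_1 \;\xrightarrow{\;p\;}\; \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}
```

If height is positively correlated with cognitive ability and cognitive ability raises earnings, then \rho_{Xu} > 0, so the OLS coefficient on height converges to a value larger than \beta_1.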
If the mechanism described above is correct, the estimated effect of height on earnings should
disappear if a variable measuring cognitive ability is included in the regression. Unfortunately, there is
not a direct measure of cognitive ability in the dataset, but the dataset does include “years of education”
for each individual. Because students with higher cognitive ability are more likely to attend school
longer, years of education might serve as a control variable for cognitive ability; in this case, including
education in the regression will eliminate, or at least attenuate, the omitted variable bias problem.
Use the years of education variable, educ, to construct four indicator (dummy) variables for whether a
worker has less than a high school diploma (lt_hs = 1 if educ < 12, 0 otherwise), a high school diploma
(hs = 1 if educ = 12, 0 otherwise), some college (some_col = 1 if 12 < educ < 16, 0 otherwise), or a
bachelor’s degree or higher (college = 1 if educ ≥ 16, 0 otherwise).
(Answer)
Generate the dummy variables as follows:
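One way to build the four indicators is sketched below in plain Python. This is an illustration of the definitions in the question, not the commands originally used for this answer; the function name education_dummies and the sample values of educ are hypothetical.

```python
def education_dummies(educ):
    """Map years of education to the four indicator variables."""
    return {
        "lt_hs": int(educ < 12),          # less than a high school diploma
        "hs": int(educ == 12),            # high school diploma
        "some_col": int(12 < educ < 16),  # some college
        "college": int(educ >= 16),       # bachelor's degree or higher
    }


# Hypothetical values of educ, for illustration only.
sample_educ = [10, 12, 14, 16, 18]

for years in sample_educ:
    dummies = education_dummies(years)
    # The categories are mutually exclusive and exhaustive,
    # so exactly one indicator equals 1 for each worker.
    assert sum(dummies.values()) == 1
    print(years, dummies)
```

Because the four indicators always sum to one, only three of them can enter a regression that also has an intercept; the exercise leaves out college, which becomes the reference category.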
(b) Focusing first on women only, run a regression of (1) earnings on height and (2) earnings on height,
including lt_hs, hs, and some_col as control variables.
(Answer)
i. Compare the estimated coefficient on height in regression (1) and (2). Is there a large change
in the coefficient? Has it changed in a way consistent with the cognitive ability explanation?
Explain.
(Answer)
The estimated coefficient on height falls by approximately 75%, from 511 to 135, when the
education variables are added as control variables in the regression. This is consistent with
positive omitted variable bias in (1).
iii. Test the joint null hypothesis that the coefficients on the education variables are equal to zero.
(Answer)
Run the following test immediately after estimating regression (2).
The F-statistic is 578, and the corresponding p-value is ≈ 0.00. Therefore, the null hypothesis
that the coefficients on the education variables are jointly equal to zero is rejected at the 1%
significance level.
iv. Discuss the values of the estimated coefficients on lt_hs, hs, and some_col. (Each of the
estimated coefficients is negative, and the coefficient on lt_hs is more negative than the
coefficient on hs, which in turn is more negative than the coefficient on some_col. Why?
What do the coefficients measure?)
(Answer)
The coefficients measure the effect of education on earnings relative to the omitted category,
which is college. Thus, the estimated coefficient on the “Less than High School” regressor
implies that a worker with less than a high school education on average earns $31,858 less per
year than a college graduate; a worker with a high school education on average earns $20,418
less per year than a college graduate; and a worker with some college on average earns $12,649
less per year than a college graduate.
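Since every coefficient is measured against the same omitted category, the gap between any two education groups is the difference between their coefficients, holding height constant. The sketch below works this out for the women's regression (2) using the coefficients reported in Table 3; the helper name earnings_gap is illustrative.

```python
# Coefficients from regression (2) for women (Table 3): average annual
# earnings relative to college graduates, holding height constant.
coef_vs_college = {
    "lt_hs": -31857.8,
    "hs": -20417.9,
    "some_col": -12649.1,
    "college": 0.0,  # omitted (reference) category
}


def earnings_gap(group, reference):
    """Average earnings of `group` minus those of `reference`."""
    return coef_vs_college[group] - coef_vs_college[reference]


# Each step up the education ladder, compared with the step below it.
ladder = ["lt_hs", "hs", "some_col", "college"]
for lower, higher in zip(ladder, ladder[1:]):
    gap = earnings_gap(higher, lower)
    print(f"{higher} vs {lower}: {gap:,.1f} dollars per year")
```

The coefficients become less negative with each additional level of education because each group sits closer to the college reference group in average earnings.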
i. The estimated coefficient on height falls by approximately 43%, from 1307 to 745. This is
consistent with positive omitted variable bias in the simple regression (1).
ii. The same answer as (b).
iii. The F-statistic is 500.9, and the corresponding p-value is ≈ 0.00. Therefore, the null hypothesis
that the coefficients on the education variables are jointly equal to zero is rejected at the 1%
significance level.
iv. The coefficients measure the effect of education on earnings relative to the omitted
category, which is college. Thus, the estimated coefficient on the “Less than High School”
regressor implies that a worker with less than a high school education on average earns
$31,400 less per year than a college graduate; a worker with a high school education on
average earns $20,346 less per year than a college graduate; and a worker with some college on
average earns $12,611 less per year than a college graduate.
The following table summarizes all the regressions we ran in this question.
Table 3: Earnings, Height and Education
            (1)           (2)           (3)           (4)
            Women         Women         Men           Men
height      511.2***      135.1         1306.9***     744.7***
            (97.58)       (92.32)       (98.86)       (92.26)
lt_hs                     -31857.8***                 -31400.5***
                          (835.0)                     (869.7)
hs                        -20417.9***                 -20345.9***
                          (637.8)                     (701.6)
some_col                  -12649.1***                 -12610.9***
                          (716.6)                     (797.8)
R2          0.003         0.138         0.021