Tutorial Session 11 - Heteroscedasticity Solution
Heteroscedasticity occurs when the error term does not have constant variance. It is a violation
of multiple linear regression assumption M6. When M6 is violated, the OLS estimators are still
unbiased (provided the other assumptions hold) but are no longer BLUE: they do not have the
minimum variance among the class of linear unbiased estimators, and the usual OLS standard
errors are no longer valid.
2. What are the null and alternative hypotheses for testing for the presence of heteroscedasticity?
Why? Explain.
The null hypothesis for testing for the presence of heteroscedasticity is that the error term has
constant variance (or $H_0: \mathrm{Var}(\varepsilon_i) = \sigma^2$). The alternative hypothesis is that the error term has
non-constant variance (or $H_1: \mathrm{Var}(\varepsilon_i) \neq \sigma^2$). The reason for this is that if we can reject the
null hypothesis that the error term has constant variance, then we can conclude that it has
non-constant variance, meaning that heteroscedasticity is present in our data.
3. Suppose you are interested in explaining variation in Body Mass Index in a nationally
representative sample of 12,486 men and women. You estimate the sample regression
function (with D = 1 if female and 0 otherwise) as follows (with standard errors in
parentheses):
a) Which variable do you suspect could be responsible for heteroscedasticity in this model?
Why? Explain.
Age is the most likely source of heteroscedasticity in this model, if any is present. The
hypothesis would be that the variance of BMI increases as people get older.
b) How would you use Weighted Least Squares to account for heteroscedasticity if it is
present? Explain.
To use Weighted Least Squares you need to know the exact form of heteroscedasticity.
Assume the form of heteroscedasticity is $h(x) = \sigma^2 \mathit{Age}_i$. As age increases, the
variance of the error term increases proportionally, i.e. it is linearly related to age.
Because we have assumed the exact form of heteroscedasticity, we can use weighted least
squares. Estimate the model obtained by dividing every term of the original regression,
including the constant, by $\sqrt{\mathit{Age}_i}$.
The error term of the weighted least squares model is now homoscedastic.
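The transformation can be written out as a short derivation. The estimated equation for this question is not reproduced above, so the regressors are written generically as $x_{1i}, \dots, x_{ki}$; these symbols and the coefficient names $\beta_j$ are assumptions made for illustration only.

```latex
\begin{align*}
\mathit{BMI}_i &= \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i,
  \qquad \mathrm{Var}(\varepsilon_i \mid x) = \sigma^2 \mathit{Age}_i \\
\frac{\mathit{BMI}_i}{\sqrt{\mathit{Age}_i}}
  &= \beta_0 \frac{1}{\sqrt{\mathit{Age}_i}}
   + \beta_1 \frac{x_{1i}}{\sqrt{\mathit{Age}_i}} + \dots
   + \beta_k \frac{x_{ki}}{\sqrt{\mathit{Age}_i}}
   + \frac{\varepsilon_i}{\sqrt{\mathit{Age}_i}} \\
\mathrm{Var}\!\left(\frac{\varepsilon_i}{\sqrt{\mathit{Age}_i}} \,\middle|\, x\right)
  &= \frac{\sigma^2 \mathit{Age}_i}{\mathit{Age}_i} = \sigma^2
\end{align*}
```

The transformed regression has no separate constant: the term $1/\sqrt{\mathit{Age}_i}$ carries $\beta_0$.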
4. Suppose you are interested in explaining variation in Vacation Spending in a sample of
10 000 high-school and university graduates. You estimate the sample regression function
(with D =1 if university graduate and 0 otherwise) (standard errors in parentheses)
$$\widehat{\mathit{VacationSpending}}_i = 109.82 + 0.12\,\mathit{Income}_i + 317.39\,\mathit{UniversityGraduate}_i$$
b) Suppose you know that heteroscedasticity takes the form $h(x) = \mathit{Income}_i^2 \cdot \sigma^2$. How
would you use Weighted Least Squares to correct for the heteroscedasticity? Explain.
To use Weighted Least Squares you need to know the exact form of heteroscedasticity.
In this case the form of heteroscedasticity is $h(x) = \mathit{Income}_i^2 \cdot \sigma^2$, so the variance of
the error term increases in proportion to the square of income. Because we know the exact
form of heteroscedasticity, we can use weighted least squares. Estimate the model obtained
by dividing every term of the original regression, including the constant, by $\mathit{Income}_i$.
The reason that weighted least squares works is that the variance of the transformed error
term is now homoscedastic.
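Written out for this model, the weighting works as follows. This is a sketch of the standard derivation; the coefficient names $\beta_0, \beta_1, \beta_2$ are labels introduced here, not notation from the question.

```latex
\begin{align*}
\mathit{VacationSpending}_i
  &= \beta_0 + \beta_1 \mathit{Income}_i + \beta_2 \mathit{UniversityGraduate}_i + \varepsilon_i,
  \qquad \mathrm{Var}(\varepsilon_i \mid x) = \sigma^2 \mathit{Income}_i^2 \\
\frac{\mathit{VacationSpending}_i}{\mathit{Income}_i}
  &= \beta_0 \frac{1}{\mathit{Income}_i} + \beta_1
   + \beta_2 \frac{\mathit{UniversityGraduate}_i}{\mathit{Income}_i}
   + \frac{\varepsilon_i}{\mathit{Income}_i} \\
\mathrm{Var}\!\left(\frac{\varepsilon_i}{\mathit{Income}_i} \,\middle|\, x\right)
  &= \frac{\sigma^2 \mathit{Income}_i^2}{\mathit{Income}_i^2} = \sigma^2
\end{align*}
```

Note that in the transformed regression $\beta_1$, the income slope of the original model, appears as the constant, while $\beta_0$ becomes the coefficient on $1/\mathit{Income}_i$.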
5. You are interested in estimating the money demand function in several countries. You obtain
cross-sectional data for 39 countries for the year 2015, and estimate the following model:
------------------------------------------------------------------------------
       lnmon |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   1.227289    .0273449   44.88    0.000    1.171776    1.282802
         cpi |   .0429665    .0316949    1.36    0.184   -.0213776    .1073105
       _cons |   -2.27753    .3034308   -7.51    0.000   -2.893527   -1.661532
------------------------------------------------------------------------------
Variables: lnmon   the natural log of the broad money stock per capita (in US $)
           gdp     GDP per capita (in US $)
           cpi     consumer inflation rate (% change)
You also generated the following graph:
[Graph: vertical axis labelled "Squared" (ticks from 0 to 1000); horizontal axis gdp (ticks from 2 to 8)]
c) To remedy the potential problem, you attempt the following transformation:
. g gdpsqrt=sqrt(gdp)
. g lnmont=lnmon/gdpsqrt
. g gdpt=gdp/gdpsqrt
. g cpit=cpi/gdpsqrt
. g xt=1/gdpsqrt
and you regress the transformed variables lnmont on gdpt, cpit and xt.
What is the assumption that you have made regarding the nature of the pattern/structure
of the problem suspected?
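The assumption is that the error variance is proportional to X (gdp), i.e.
$\mathrm{Var}(\varepsilon_i) = \sigma^2 \mathit{gdp}_i$. Dividing every variable, including the constant (xt), by the
square root of gdp then gives a transformed error term with constant variance $\sigma^2$.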
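This assumption can be checked numerically on simulated data. The following is a minimal sketch in Python rather than Stata; the function names, the two gdp values (2 and 8), the sample size and $\sigma = 1$ are all illustrative assumptions and are not taken from the 39-country data set.

```python
import random
import statistics


def simulate_errors(x_values, n_per_value, sigma=1.0, seed=0):
    """Draw errors with Var(e | x) = sigma^2 * x for each x in x_values."""
    rng = random.Random(seed)
    return {
        x: [rng.gauss(0.0, sigma * x ** 0.5) for _ in range(n_per_value)]
        for x in x_values
    }


def variance_by_x(errors, transform=False):
    """Variance of the errors at each x, optionally after dividing by sqrt(x)."""
    result = {}
    for x, draws in errors.items():
        scaled = [e / x ** 0.5 for e in draws] if transform else draws
        result[x] = statistics.pvariance(scaled)
    return result


# Hypothetical gdp values chosen to match the range on the graph's axis.
errors = simulate_errors(x_values=[2.0, 8.0], n_per_value=20000)
raw = variance_by_x(errors)
weighted = variance_by_x(errors, transform=True)

# Raw error variance grows with x, so this ratio should be close to 8 / 2 = 4.
raw_ratio = raw[8.0] / raw[2.0]
# After dividing by sqrt(x) the variance is constant, so this ratio should be close to 1.
weighted_ratio = weighted[8.0] / weighted[2.0]
print(round(raw_ratio, 2), round(weighted_ratio, 2))
```

If the true variance were instead proportional to the square of gdp, dividing by the square root of gdp would leave the second ratio well above 1, which is why the chosen weight has to match the assumed form of heteroscedasticity.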