04 Violation of Assumptions
Multicollinearity
Multicollinearity is a phenomenon that may be observed in multiple linear regression. It results in:
1. High standard errors (small t-ratios) for the parameter estimates together with a high R².
2. Wide confidence intervals for the parameter estimates.
The variance of the OLS estimator of β_j is

Var(\hat{\beta}_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}

where R_j^2 is the squared multiple correlation coefficient from regressing X_j on the other elements of X, and

SST_j = \sum_i (X_{ij} - \bar{X}_j)^2, \quad \forall j = 2, 3, \ldots, k.

If X_j were uncorrelated with the other regressors (R_j^2 = 0), this variance would be \sigma^2 / SST_j. Based on this fact, we use the ratio of Var(\hat{\beta}_j) to \sigma^2 / SST_j to obtain a measure of the degree of multicollinearity in the model, which is known as the Variance Inflation Factor (VIF_j), i.e.

VIF_j = \frac{Var(\hat{\beta}_j)}{\sigma^2 / SST_j} = \frac{1}{1 - R_j^2} \ge 1, \quad \forall j = 2, 3, \ldots, k.
There is a positive relationship between R_j^2 and VIF_j, which is depicted in the following table:
R_j^2    VIF_j
0.1      1.11
0.2      1.25
0.5      2
0.8      5
0.9      10
0.95     20

where VIF_j = 1/(1 - R_j^2).
Detection of Multicollinearity
1. The estimates are unstable: even though R² is high, the individual parameter estimates are imprecise (statistically insignificant). In fact, if R_j^2 = 1 the parameter estimates are indeterminate.
2. Deletion/addition of an explanatory variable causes large changes in the estimates of the
remaining coefficients.
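In practice the VIFs can be obtained directly after an OLS regression. A minimal Stata sketch (consistent with the Stata commands used later in these notes; y, x2 and x3 are illustrative variable names):

regress y x2 x3
estat vif        // reports VIF_j = 1/(1 - R_j^2) for each regressor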
Suppose that Y = 2 + 3X1 + X2 and that X2 = 2X1 − 1. There is no disturbance term in the equation for Y, but that is not important. Suppose that we have the six observations shown below.
X1   X2   Y     ΔX1   ΔX2   ΔY
10   19   51     -     -     -
11   21   56     1     2     5
12   23   61     1     2     5
13   25   66     1     2     5
14   27   71     1     2     5
15   29   76     1     2     5
The three variables are plotted as line graphs below. Looking at the data, it is impossible to tell
whether the changes in Y are caused by changes in X1, by changes in X2, or jointly by changes in
both X1 and X2.
The OLS estimator of the coefficient on X1 is

\hat{\beta}_1 = \frac{Cov(X_1, Y)\,Var(X_2) - Cov(X_2, Y)\,Cov(X_1, X_2)}{Var(X_1)\,Var(X_2) - [Cov(X_1, X_2)]^2}

Substituting X_2 = 2X_1 - 1, so that Cov(X_2, Y) = 2Cov(X_1, Y), Var(X_2) = 4Var(X_1) and Cov(X_1, X_2) = 2Var(X_1),

\hat{\beta}_1 = \frac{4\,Cov(X_1, Y)\,Var(X_1) - 2\,Cov(X_1, Y)\cdot 2\,Var(X_1)}{4[Var(X_1)]^2 - [2\,Var(X_1)]^2} = \frac{0}{0}
It turns out that both the numerator and the denominator are equal to zero. The regression
coefficient is not defined.
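One way to see this in practice is to feed the six observations to the software. A hedged Stata sketch (variable names are illustrative); the regression output will note that one of the regressors is omitted because of perfect collinearity:

clear
input x1 x2 y
10 19 51
11 21 56
12 23 61
13 25 66
14 27 71
15 29 76
end
regress y x1 x2   // one of x1, x2 is dropped: they are perfectly collinear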
It is unusual for there to be an exact relationship among the explanatory variables in a regression. When this occurs, it is typically because there is a logical error in the specification.
Possible measures for alleviating multicollinearity
What can we do about this problem if it is encountered?
Note that:
1. Multicollinearity does not cause the regression coefficients to be biased.
2. The standard errors and t tests remain valid.
The problem is that the standard errors are larger than they would be in the absence of multicollinearity.
How can we reduce the variances?
We might be able to reduce them by bringing more variables into the model and thereby reducing the population variance of the disturbance term.
Recall: the population variance of \hat{\beta}_j is

Var(\hat{\beta}_j) = \frac{\sigma_u^2}{n\,Var(X_j)} \times \frac{1}{1 - R_j^2}
1) Reduce σ_u^2 by including further relevant variables in the model.
2) Increase the number of observations.
Time series: decrease the time interval (from yearly to quarterly etc.)
Cross section: Increase the sample size.
3) Increase Var(Xj).
4) Combine or drop the correlated variables.
a) If the correlated variables are similar conceptually, it may be reasonable to combine them into some overall index; or
b) drop some of the correlated variables, if they have insignificant coefficients.
Dropping variables is dangerous because some of the variables dropped may truly belong in the model, and their omission may cause omitted variable bias.
5) Empirical restriction
Use extraneous information, if available, concerning the coefficient of one of the variables.
For example, suppose that Y in

Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon

is the demand for a category of consumer expenditure, X2 is aggregate disposable personal income, and X3 is a price index for the category.
To fit a model of this type you would use time series data. If X2 and X3 are highly correlated,
which is often the case with time series variables, the problem of multicollinearity might be
eliminated in the following way.
Obtain data on income and expenditure on the category from a household survey and regress Y'
on X'. (The ' marks are to indicate that the data are household data, not aggregate data.)
Y' = \beta_1' + \beta_2' X_2' + \varepsilon'

This is a simple regression because there will be relatively little variation in the price paid by households in cross-section data.
Let b_2' denote the estimate of \beta_2' from this household regression. Now substitute b_2' for \beta_2 in the time series model: subtract b_2' X_2 from both sides of

Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon

and regress

Z = Y - b_2' X_2 on X_3.

This is a simple regression, so multicollinearity has been eliminated.
There are some problems with this technique.
1. The coefficients may be conceptually different in time series and cross-section
contexts.
2. Since we subtract the estimated income component b_2' X_2, not the true income component \beta_2 X_2, from Y when constructing Z, we have introduced an element of measurement error into the dependent variable.
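A sketch of the procedure in Stata is given below; the file names (household.dta, series.dta) and variable names (y_hh, x2_hh, y, x2, x3) are purely illustrative assumptions:

* Step 1: cross-section (household) regression of expenditure on income
use household.dta, clear
regress y_hh x2_hh
scalar b2_cs = _b[x2_hh]       // save the estimated income coefficient b2'

* Step 2: impose this estimate in the time series model
use series.dta, clear
generate z = y - b2_cs*x2      // Z = Y - b2'X2
regress z x3                   // simple regression of Z on the price index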
6. Theoretical restriction
Use a theoretical restriction, i.e., a hypothesized relationship among the parameters of the regression model.
It will be explained using an educational attainment model as an example. Suppose that we
hypothesize that highest grade obtained, Y, depends on Ability, X2 and highest education level
completed by the respondent's mother and father, X3 and X4, respectively.
Because X3 and X4 tend to be highly correlated, one of them may turn out to be insignificant; mother's education is generally held to be at least as important as, if not more important than, father's education for educational attainment, so such an outcome is unexpected.
Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon

If theory suggests the restriction \beta_3 = \beta_4, therefore

Y = \beta_1 + \beta_2 X_2 + \beta_3 (X_3 + X_4) + \varepsilon

and we regress Y on X_2 and the combined variable (X_3 + X_4).
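In Stata the restriction is imposed simply by regressing on the sum of the two collinear variables; a minimal sketch (variable names as in the example above):

generate x34 = x3 + x4    // combined parental education, imposing b3 = b4
regress y x2 x34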
Multiple Regression Analysis: Heteroskedasticity
What is Heteroskedasticity?
The assumption of homoskedasticity implies that, conditional on the X variables, the variance of the unobserved error u is constant.
If this is not true, that is, if the variance of u is different for different values of the X's, then the errors are heteroskedastic.
Example: in a savings equation, heteroskedasticity is present if the variance of the
unobserved factors affecting savings increases with income.
Homoskedasticity is needed to justify the usual t tests, F tests, and confidence intervals for OLS estimation of the linear regression model, even with large sample sizes.
Example of Heteroskedasticity
[Figure: conditional densities f(Y|X) around the regression line E(Y|X) = β0 + β1X at X1, X2 and X3; the spread of Y increases with X.]
Why Worry About Heteroskedasticity?
Consider again the multiple linear regression model:

Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + u

Recall: \hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k are unbiased under the first four Gauss-Markov assumptions.
The homoskedasticity assumption plays no role in showing that OLS is unbiased.
Heteroskedasticity therefore does not cause bias in the OLS estimators of the \beta_j.
Why introduce it as one of the Gauss-Markov assumptions?
Recall 1: without the homoskedasticity assumption, the usual estimators of Var(\hat{\beta}_j) are biased.
Given that OLS standard errors are based on these variances, they are invalid for constructing
confidence intervals and t statistics.
The OLS t and F statistics do not have t and F distributions in the presence of
heteroskedasticity.
The statistics we use to test hypotheses under the Gauss-Markov assumptions are therefore invalid in the presence of heteroskedasticity.
Recall 2: The Gauss-Markov theorem says that OLS is BLUE, and that result relies on the homoskedasticity assumption.
If Var(u|X) is not constant, OLS is no longer BLUE.
In addition, OLS is no longer asymptotically efficient in the class of estimators described in
Theorem 5.3.
It is possible to find estimators that are more efficient than OLS in the presence of heteroskedasticity (although this requires knowing the form of the heteroskedasticity).
Detecting Heteroskedasticity
Y      X      (Y−Ȳ)    (X−X̄)    (X−X̄)(Y−Ȳ)   (X−X̄)²   Ŷ      û      û²
19.9 22.3 -3.66 -2.95 10.78 8.70 20.90 -1.00 1.00
31.2 32.3 7.65 7.05 53.90 49.70 29.90 1.30 1.70
31.8 36.6 8.25 11.35 93.58 128.82 33.76 -1.96 3.85
12.1 12.1 -11.46 -13.15 150.63 172.92 11.73 0.37 0.14
40.7 42.3 17.15 17.05 292.32 290.70 38.89 1.81 3.28
6.1 6.2 -17.46 -19.05 332.52 362.90 6.42 -0.32 0.10
38.6 44.7 15.05 19.45 292.63 378.30 41.05 -2.45 5.99
25.5 26.1 1.95 0.85 1.65 0.72 24.32 1.18 1.39
10.3 10.3 -13.26 -14.95 198.16 223.50 10.11 0.19 0.04
38.8 40.2 15.25 14.95 227.91 223.50 37.00 1.80 3.24
8 8.1 -15.56 -17.15 266.77 294.12 8.13 -0.13 0.02
33.1 34.5 9.55 9.25 88.29 85.56 31.87 1.23 1.50
33.5 38 9.95 12.75 126.80 162.56 35.02 -1.52 2.31
13.1 14.1 -10.46 -11.15 116.57 124.32 13.53 -0.43 0.18
14.8 16.4 -8.76 -8.85 77.48 78.32 15.60 -0.80 0.63
21.6 24.1 -1.96 -1.15 2.25 1.32 22.52 -0.92 0.85
29.3 30.1 5.75 4.85 27.86 23.52 27.92 1.38 1.91
25 28.3 1.45 3.05 4.41 9.30 26.30 -1.30 1.68
17.9 18.2 -5.66 -7.05 39.87 49.70 17.21 0.69 0.47
19.8 20.1 -3.76 -5.15 19.34 26.52 18.92 0.88 0.77
Means: Ȳ = 23.56, X̄ = 25.25;  Σ(X−X̄)(Y−Ȳ) = 2423.73,  Σ(X−X̄)² = 2695.05,  Σû² = 31.07
b1 = 0.899327   (slope: Σ(X−X̄)(Y−Ȳ) / Σ(X−X̄)²)
b0 = 0.852005   (intercept: Ȳ − b1·X̄)
[Figure: scatter plot of Y against X, and plot of the residuals û against X; the spread of the residuals increases with X, suggesting heteroskedasticity.]
Detection of heteroskedasticity
One of the tests regresses the absolute value of the residuals on the square root of X:

|\hat{u}_i| = \alpha_1 + \alpha_2 \sqrt{X_i} + v_i

For our data in levels, all the tests except the RESET reject the null hypothesis of homoskedasticity.
For the log-linear model, all the tests except the RESET and the Goldfeld-Quandt test reject the hypothesis that the errors are homoskedastic.
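The common tests are available after an OLS regression in Stata. A hedged sketch (y and x are illustrative names; the last three lines reproduce the auxiliary regression of the absolute residuals on the square root of X given above):

regress y x
estat hettest              // Breusch-Pagan test for heteroskedasticity
estat imtest, white        // White's general test
predict uhat, residuals
generate absu = abs(uhat)
generate sqrtx = sqrt(x)
regress absu sqrtx         // regression of |uhat| on sqrt(x)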
A valid estimator of Var(\hat{\beta}_j) under heteroskedasticity is

\widehat{Var}(\hat{\beta}_j) = \frac{\sum_i \hat{r}_{ij}^2 \hat{u}_i^2}{SSR_j^2}

where \hat{r}_{ij} is the ith residual from regressing X_j on all the other independent variables, and SSR_j is the sum of squared residuals from this regression.
Now that we have a consistent estimate of the variance, the square root can be used as a
standard error for inference
Typically call these robust standard errors
Sometimes the estimated variance is corrected for degrees of freedom by multiplying by n/(n
– k – 1); as n → ∞ it’s all the same, though
Important to remember that these robust standard errors only have asymptotic justification –
with small sample sizes t statistics formed with robust standard errors will not have a
distribution close to the t, and inferences will not be correct
In Stata, robust standard errors are easily obtained using the robust option of reg
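For example, a minimal sketch (y, x1 and x2 are illustrative names):

regress y x1 x2, vce(robust)   // heteroskedasticity-robust (White) standard errors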
Robust LM Statistic
Run OLS on the restricted model and save the residuals û.
Regress each of the excluded variables on all of the included variables (q different regressions) and save each set of residuals r̂1, r̂2, ..., r̂q.
Regress a variable defined to be equal to 1 on the products r̂1û, r̂2û, ..., r̂qû, with no intercept.
The LM statistic is n – SSR1, where SSR1 is the sum of squared residuals from this final
regression
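A sketch of these steps in Stata for q = 2 excluded variables (x3 and x4; all names are illustrative):

regress y x1 x2                       // restricted model
predict uhat, residuals
regress x3 x1 x2
predict r1, residuals
regress x4 x1 x2
predict r2, residuals
generate r1u = r1*uhat
generate r2u = r2*uhat
generate one = 1
regress one r1u r2u, noconstant       // final regression with no intercept
display "robust LM = " e(N) - e(rss)  // n - SSR1, compare with chi-squared(2)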
Effects/consequences of heteroskedasticity
Heteroskedasticity implies that the variance of the error terms is not constant. Thus,

Var(u_i) = E[u_i^2] = \sigma_i^2
The main effects of this are:
1. The least squares estimates are still unbiased, but they are inefficient, and
2. The estimates of the variance are biased–thus the tests of significance based on them are
invalid.
Consider Y_i = \beta_0 + \beta_1 X_i + u_i, with Var(u_i) = \sigma_i^2.
The OLS estimator of the slope parameter is

\hat{\beta}_1 = \beta_1 + \frac{\sum_i x_i u_i}{\sum_i x_i^2}, \quad where \; x_i = X_i - \bar{X}.

Taking expectations,

E[\hat{\beta}_1] = \beta_1 + \frac{\sum_i x_i E[u_i]}{\sum_i x_i^2} = \beta_1

so the estimator remains unbiased. Its variance, however, is

Var(\hat{\beta}_1) = E[(\hat{\beta}_1 - \beta_1)^2] = \frac{1}{(\sum_i x_i^2)^2} E\left[\left(\sum_i x_i u_i\right)^2\right]
 = \frac{1}{(\sum_i x_i^2)^2} [x_1^2 E(u_1^2) + x_2^2 E(u_2^2) + \cdots + x_n^2 E(u_n^2)]
 = \frac{\sum_i x_i^2 \sigma_i^2}{(\sum_i x_i^2)^2}

(the cross-product terms vanish because the errors are uncorrelated).
If we know the functional form of the variance up to a multiplicative constant (which in practice we usually do not), we can make the necessary transformations and obtain efficient estimators of \beta, known as weighted least squares - we leave the details for graduate school.
Conclusion:
If the error terms are not homoskedastic
a) OLS estimators are unbiased
b) The OLS estimators are less efficient (i.e., have higher variance) than the WLS.
While it is always possible to estimate robust standard errors for the OLS estimates, if we know something about the specific form of the heteroskedasticity we can obtain estimators that are more efficient than OLS.
The basic idea is going to be to transform the model into one that has homoskedastic errors –
called weighted least squares
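For instance, if we are willing to assume that Var(u_i) is proportional to X_i (an assumption made here purely for illustration), WLS amounts to OLS with analytic weights 1/X_i in Stata:

regress y x [aweight = 1/x]   // WLS under the assumed variance function Var(u_i) proportional to x_i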
Autocorrelation
The term autocorrelation may be defined as correlation between members of a series of observations ordered in time (as in time series data) or in space (as in cross-sectional data).
Autocorrelation versus Serial Correlation: Similarities and differences.
Consider a two-variable linear regression model of the form:

Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t, \quad t = 1, 2, \ldots, T

Autocorrelation is present in the errors if

E[\varepsilon_t \varepsilon_{t-s}] \ne 0 \; for \; s \ge 1, \quad that is, \quad Cov[\varepsilon_t, \varepsilon_{t'}] \ne 0 \; for \; t \ne t'.

The autocovariance at lag s is defined by

\gamma_s = E[\varepsilon_t \varepsilon_{t-s}] \quad for \; s = 0, \pm 1, \pm 2, \ldots

At lag zero, i.e., s = 0, we have \gamma_0 = E[\varepsilon_t \varepsilon_t] = E[\varepsilon_t^2] = \sigma^2, i.e., a constant variance.
The autocorrelation coefficient at lag s is defined by:

\rho_s = \gamma_s / \gamma_0

Note that \rho_0 = 1.
The γ's and the ρ's are symmetric in s (the lag) and do not depend on t (time), i.e., they only
depend on the lags.
The covariance matrix of the \varepsilon_t's is:

E[\varepsilon \varepsilon'] =
\begin{pmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{T-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{T-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{T-1} & \gamma_{T-2} & \gamma_{T-3} & \cdots & \gamma_0
\end{pmatrix}

Since \rho_s = \gamma_s / \gamma_0 and letting \gamma_0 = \sigma^2, a constant, we can write the matrix in terms of the autocorrelations as:

E[\varepsilon \varepsilon'] = \sigma^2
\begin{pmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{T-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{T-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{T-1} & \rho_{T-2} & \rho_{T-3} & \cdots & 1
\end{pmatrix}
Given that there is temporal dependence in the errors, how do we best model it? There are three main types of time series process that we could use: autoregressive (AR), moving average (MA), and mixed (ARMA) processes.
Example: monthly data are used to estimate a model that explains the demand for ice cream. The weather will be an important factor hidden in the error term \varepsilon_t: positive and negative residuals group together.
[Figure: scatter plot of ice cream consumption (cons) against time.]
use "C:\Users\6440\Documents\DocA\Econometrics\Econ352\201819\Econ2061\DataVerbeek\icecream.dta"
tsset time
twoway (scatter cons time)
reg cons price income
predict chat
twoway (scatter cons time) (line chat time)
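To see the clustering of positive and negative residuals described above, the same Stata session can be extended as follows (uhat is an illustrative name for the saved residuals):

predict uhat, residuals
twoway (line uhat time), yline(0)   // runs of same-signed residuals suggest autocorrelation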
This is a very general process and may be difficult to estimate. However, we can usually be fairly confident of capturing realistic dynamics in the residuals by considering a low-order AR process. Hence, we will confine our attention to the AR(1) model.
First Order Autocorrelation
Autocorrelation can take many forms; the most popular is the first-order autoregressive process. The model is

Y_t = \beta_1 + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + \varepsilon_t

with error terms following the relation

\varepsilon_t = \rho \varepsilon_{t-1} + u_t

where u_t is an error term with mean zero and constant variance \sigma_u^2, and no serial correlation. The parameters \rho and \sigma_u^2 are unknown and, along with \beta, we may wish to estimate them. Note that the statistical properties of u_t are the same as those assumed for \varepsilon_t in the standard case: thus if \rho = 0, \varepsilon_t = u_t and the standard Gauss-Markov conditions are satisfied.
To derive the covariance matrix of \varepsilon, we need to make an assumption about the distribution of the initial period error, \varepsilon_1. We assume that:
1. \varepsilon_1 has mean zero and the same variance as all the other \varepsilon_t's.
2. The process has been operating for a long period in the past and |\rho| < 1.
3. When |\rho| < 1 is satisfied, we say the first-order autoregressive process is stationary.
4. A stationary process is such that the mean, variances and covariances of \varepsilon_t do not change over time. Imposing stationarity, it easily follows from
E[\varepsilon_t] = \rho E[\varepsilon_{t-1}] + E[u_t] \;\Rightarrow\; E[\varepsilon_t] = 0

Moreover,

E[\varepsilon_t^2] = E[(\rho \varepsilon_{t-1} + u_t)^2] = \rho^2 E[\varepsilon_{t-1}^2] + \sigma_u^2

If we let \sigma_\varepsilon^2 = E[\varepsilon_t^2], we have

\sigma_\varepsilon^2 = \rho^2 \sigma_\varepsilon^2 + \sigma_u^2 \;\Rightarrow\; \sigma_\varepsilon^2 = \frac{\sigma_u^2}{1 - \rho^2}
The nondiagonal elements in the variance-covariance matrix of \varepsilon follow from

Cov[\varepsilon_t, \varepsilon_{t-1}] = E[(\rho \varepsilon_{t-1} + u_t)\varepsilon_{t-1}] = E[\rho \varepsilon_{t-1}^2 + u_t \varepsilon_{t-1}]
 = \rho E[\varepsilon_{t-1}^2] + E[u_t \varepsilon_{t-1}] = \rho \sigma_\varepsilon^2

The covariance between error terms two periods apart is

Cov[\varepsilon_t, \varepsilon_{t-2}] = E[(\rho \varepsilon_{t-1} + u_t)\varepsilon_{t-2}]
 = \rho E[\varepsilon_{t-1}\varepsilon_{t-2}] + E[u_t \varepsilon_{t-2}] = \rho \cdot \rho \sigma_\varepsilon^2 = \rho^2 \sigma_\varepsilon^2

and in general we have, for non-negative values of s,

Cov[\varepsilon_t, \varepsilon_{t-s}] = \rho^s \sigma_\varepsilon^2
Thus for 0 < |\rho| < 1 all elements in \varepsilon are mutually correlated, with a covariance that decreases as the distance s between them increases. The covariance matrix of \varepsilon is a full matrix (a matrix without zero elements). Given our model
Y_t = \beta_1 + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + \varepsilon_t

with \varepsilon_t = \rho \varepsilon_{t-1} + u_t, where u_t satisfies the Gauss-Markov conditions, a transformation that leads to an error term

u_t = \varepsilon_t - \rho \varepsilon_{t-1}

will generate homoskedastic, non-autocorrelated errors. To do so, take

Y_t = \beta_1 + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + \varepsilon_t,

lag it by one period and multiply by \rho to get

\rho Y_{t-1} = \rho \beta_1 + \rho \beta_2 X_{2,t-1} + \cdots + \rho \beta_k X_{k,t-1} + \rho \varepsilon_{t-1},

then subtract from Y_t to get

Y_t - \rho Y_{t-1} = \beta_1(1 - \rho) + \beta_2(X_{2t} - \rho X_{2,t-1}) + \cdots + \beta_k(X_{kt} - \rho X_{k,t-1}) + \varepsilon_t - \rho \varepsilon_{t-1}, i.e.,

Y_t - \rho Y_{t-1} = \beta_1(1 - \rho) + \beta_2(X_{2t} - \rho X_{2,t-1}) + \cdots + \beta_k(X_{kt} - \rho X_{k,t-1}) + u_t
Note: this transformation cannot be applied to the first observation (because Y_0 and X_0 are not observed). The information in this first observation is lost, and OLS on the transformed model produces only an approximate GLS estimator. With large T, the loss of a single observation will typically not have a large impact on the results.
We can rescue the first observation by noting that \varepsilon_1 is uncorrelated with all the u_t's, t = 2, \ldots, T. However, the variance of \varepsilon_1 (= \sigma_u^2/(1 - \rho^2)) is much larger than the variance of the transformed errors (u_2, \ldots, u_T), particularly when \rho is close to unity. To obtain homoskedastic and non-autocorrelated errors in a transformed model that includes the first observation, this first observation should be transformed by multiplying it by \sqrt{1 - \rho^2}, i.e.,

\sqrt{1 - \rho^2}\, Y_1 = \sqrt{1 - \rho^2}\, \beta_1 + \sqrt{1 - \rho^2}\, \beta_2 X_{21} + \cdots + \sqrt{1 - \rho^2}\, \beta_k X_{k1} + \sqrt{1 - \rho^2}\, \varepsilon_1

OLS applied to the complete set of transformed variables leads to BLUE estimators, known as the GLS estimator.
Early work (Cochrane and Orcutt, 1949) dropped the first (transformed) observation to estimate
β from the remaining T − 1 transformed observations.
The estimator that uses all transformed observations is sometimes called the Prais–Winsten
(1954) estimator.
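Both estimators are available in Stata through the prais command; continuing the ice cream example from above as an illustration:

prais cons price income          // Prais-Winsten: keeps the transformed first observation
prais cons price income, corc    // Cochrane-Orcutt: drops the first observation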
Unknown ρ
When \rho is unknown, it can be estimated from the OLS residuals \hat{\varepsilon}_t by regressing \hat{\varepsilon}_t on \hat{\varepsilon}_{t-1}:

\hat{\rho} = \frac{\sum_{t=2}^{T} \hat{\varepsilon}_t \hat{\varepsilon}_{t-1}}{\sum_{t=2}^{T} \hat{\varepsilon}_{t-1}^2}
This estimator is typically biased, but under weak regularity conditions it is a consistent estimator of \rho. Using \hat{\rho} instead of \rho to compute the feasible (estimated) GLS (EGLS) estimator, the BLUE property is no longer retained. Under the same conditions as before, however, the EGLS estimator is asymptotically equivalent to the GLS estimator. That is, for large sample sizes we can ignore the fact that \rho is estimated.
In the Cochrane-Orcutt procedure, which is implemented in many software packages, \rho and \beta are estimated iteratively until convergence.
Clearly, an R² close to zero in this auxiliary regression implies that lagged residuals do not explain current residuals, and a simple way to test \rho = 0 is by computing (T - 1)R², which is asymptotically \chi^2(1) under the null.
If the model of interest includes Y_{t-1} (or other explanatory variables that may be correlated with lagged error terms), these tests are still appropriate provided that the regressors X_t are included in the auxiliary regression.
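This auxiliary-regression (Breusch-Godfrey) test is available after an OLS regression in Stata; a sketch using the ice cream example from above:

regress cons price income
estat bgodfrey, lags(1)   // LM test of no first-order autocorrelation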
The Durbin–Watson Test
A popular test for first-order autocorrelation is the Durbin-Watson test (Durbin and Watson, 1950), whose small-sample distribution is derived under a restrictive set of conditions.
Two important assumptions:
a) the X's are deterministic (no Y_{t-1} in the model), and
b) the regression contains an intercept term.
The simplest and most commonly used model is one where the errors \varepsilon_t and \varepsilon_{t-1} have correlation \rho.
Think of testing hypotheses about \rho on the basis of \hat{\rho}, the correlation between the least squares residuals \hat{\varepsilon}_t and \hat{\varepsilon}_{t-1}. A commonly used statistic for this purpose (which is related to \hat{\rho}) is the Durbin-Watson (DW) statistic, denoted d. It is defined as

d = \frac{\sum_{t=2}^{T} (\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^{T} \hat{\varepsilon}_t^2}

where \hat{\varepsilon}_t is the estimated residual for period t. We can write d as

d = \frac{\sum \hat{\varepsilon}_t^2 + \sum \hat{\varepsilon}_{t-1}^2 - 2\sum \hat{\varepsilon}_t \hat{\varepsilon}_{t-1}}{\sum \hat{\varepsilon}_t^2}

Since \sum \hat{\varepsilon}_t^2 and \sum \hat{\varepsilon}_{t-1}^2 are approximately equal if the sample is large, we have

d \approx \frac{2\sum \hat{\varepsilon}_t^2 - 2\sum \hat{\varepsilon}_t \hat{\varepsilon}_{t-1}}{\sum \hat{\varepsilon}_t^2}
 = 2 - 2\frac{\sum \hat{\varepsilon}_t \hat{\varepsilon}_{t-1}}{\sum \hat{\varepsilon}_t^2}
 = 2(1 - \hat{\rho})
There are tables to test the hypothesis of zero autocorrelation against the hypothesis of first-order
positive autocorrelation. (For negative autocorrelation we interchange dL and dU)
If d < dL, we reject the null hypothesis of no autocorrelation.
If d > dU we do not reject the null hypothesis.
If dL < d < dU the test is inconclusive.
The upper bound dU of the DW statistic is a good approximation to its distribution when the regressors change slowly; since economic time series typically change slowly, one can use dU as the correct significance point.
The significance points in the DW tables are tabulated for testing \rho = 0 against \rho > 0. If d > 2 and we wish to test the hypothesis \rho = 0 against \rho < 0, we consider 4 - d and refer to the tables as if we were testing for positive autocorrelation.
Although we have said that d \approx 2(1 - \hat{\rho}), this approximation is valid only in large samples.
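After tsset-ing the data and running the OLS regression, Stata reports the statistic directly; a sketch using the ice cream example:

regress cons price income
estat dwatson    // Durbin-Watson d, to be compared with the dL and dU bounds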
DYNAMIC MODELS
So far we have considered purely static models of the form:

Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t
What implications does this have for the underlying behaviour of economic agents?
What is the effect on Y of a change in X?
If X increases by 1 unit in period t, then Y will increase by β2 units instantaneously. This is
unrealistic. In the real world adjustment does not take place immediately—there are lags
involved. These lags may be due to:
i. time to feed through the economy
ii. lack of information
iii. search cost
iv. adjustment costs
Suppose it takes three periods for a change in X to feed through into Y, given by a model of the form:

Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \varepsilon_t
The effect on Y of a one-time change in X is as follows:
Period Effect (Cumulative)
t β0
t+1 β0+ β1
t+2 β0+ β1+ β2
This type of model is known as a DISTRIBUTED LAG MODEL. In general we have a model of the form:

Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + \varepsilon_t
    = \alpha + \sum_{i=0}^{k} \beta_i X_{t-i} + \varepsilon_t
The easiest way to estimate a Distributed Lag (DL) model is by OLS, but note that with k lags estimation is based on T - k observations, since the first k observations are lost when the lagged variables are constructed.
Example: t Xt Xt-1 Xt-2
1 12 - -
2 10 12 -
3 14 10 12
4 15 14 10
Suppose we have an infinite distributed lag model in a single explanatory variable of the following form:

Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \varepsilon_t
where the lag length is not defined, that is, we do not specify how far into the past we want to go. One alternative way of estimating such a model is to use the KOYCK approach to the distributed lag model.
If the β's are all of the same sign, Koyck assumes that they decline or decay geometrically as
follows:
\beta_i = \beta_0 \lambda^i, \quad i = 0, 1, 2, \ldots

where \lambda, such that 0 < \lambda < 1, is known as the rate of decline (or decay) of the distributed lag, and 1 - \lambda is known as the speed of adjustment.
By assuming \lambda < 1, we attach smaller weights to the distant \beta's than to the current ones, and we also ensure that the sum of the \beta's, which gives the long-run multiplier, is finite, namely:

\sum_{i=0}^{\infty} \beta_i = \frac{\beta_0}{1 - \lambda}
Using the Koyck assumption, the infinite distributed lag model can be written as:

Y_t = \alpha + \beta_0 X_t + \beta_0 \lambda X_{t-1} + \beta_0 \lambda^2 X_{t-2} + \cdots + \varepsilon_t

Still, this equation is not amenable to easy estimation. Now lag this equation by one period and multiply both sides of the resulting model by \lambda to obtain:

\lambda Y_{t-1} = \lambda \alpha + \beta_0 \lambda X_{t-1} + \beta_0 \lambda^2 X_{t-2} + \beta_0 \lambda^3 X_{t-3} + \cdots + \lambda \varepsilon_{t-1}

Now subtract this equation from the original equation to obtain:

Y_t - \lambda Y_{t-1} = \alpha(1 - \lambda) + \beta_0 X_t + (\varepsilon_t - \lambda \varepsilon_{t-1})

Or, rearranging,

Y_t = \alpha(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + u_t

where u_t = \varepsilon_t - \lambda \varepsilon_{t-1}, a moving average of \varepsilon_t and \varepsilon_{t-1}.
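Because the transformed equation contains only X_t and the lagged dependent variable, it can be estimated directly; a minimal Stata sketch (y and x are illustrative names, and the data are assumed to be tsset):

regress y x l.y   // the coefficient on l.y estimates lambda, that on x estimates beta0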
AUTOREGRESSIVE MODELS
These are models containing lagged dependent variables as regressors, for example:

Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \beta X_t + \varepsilon_t

An AR(1) model may be written as:

Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \varepsilon_t

which does not depend on X_t.
The use of OLS on autoregressive models is biased in finite samples, i.e., E[\hat{\alpha}_1] \ne \alpha_1, but this bias disappears as the sample size gets larger. That is, OLS is consistent.
NB. A special case of the AR(1) model is the random walk, obtained by setting \alpha_0 = 0 and \alpha_1 = 1, i.e.,

Y_t = Y_{t-1} + \varepsilon_t

If \alpha_0 \ne 0, we have a random walk with drift:

Y_t = \alpha_0 + Y_{t-1} + \varepsilon_t
These models are non-stationary models. The condition for stationarity is: |α1| < 1.
Why may we be interested in such dynamic models? What is the motivation or justification for
their use?
THE ADAPTIVE EXPECTATIONS MODEL
Consider the following model:
Consider the following model:

Y_t = \beta_0 + \beta_1 X_t^* + u_t

where X_t^* is the equilibrium/desired/long-run/expected/normal value of X_t.
Now X_t^* is not observable, but X_t is observable, so we need some way of relating X_t^* to X_t. We might suggest the following adjustment mechanism:

X_t^* - X_{t-1}^* = \gamma (X_t - X_{t-1}^*)
where \gamma is an adjustment parameter; typically we would expect 0 < \gamma \le 1, i.e., expectations are revised every period by a fraction \gamma of the difference between the current value X_t and the previous period's expectation.
How can we use this information? Solve for \gamma X_t in X_t^* - X_{t-1}^* = \gamma (X_t - X_{t-1}^*) to get

\gamma X_t = X_t^* - (1 - \gamma) X_{t-1}^*
Lag Y_t = \beta_0 + \beta_1 X_t^* + u_t by one period and multiply by (1 - \gamma) to get

(1 - \gamma) Y_{t-1} = (1 - \gamma)\beta_0 + (1 - \gamma)\beta_1 X_{t-1}^* + (1 - \gamma) u_{t-1}
Subtract this from Y_t = \beta_0 + \beta_1 X_t^* + u_t to get

Y_t - (1 - \gamma) Y_{t-1} = [1 - (1 - \gamma)]\beta_0 + \beta_1[X_t^* - (1 - \gamma) X_{t-1}^*] + u_t - (1 - \gamma) u_{t-1}
Y_t - (1 - \gamma) Y_{t-1} = \gamma \beta_0 + \beta_1[X_t^* - (1 - \gamma) X_{t-1}^*] + u_t - (1 - \gamma) u_{t-1}

But from the adjustment mechanism \gamma X_t = X_t^* - (1 - \gamma) X_{t-1}^*, so substituting gives

Y_t - (1 - \gamma) Y_{t-1} = \gamma \beta_0 + \gamma \beta_1 X_t + u_t - (1 - \gamma) u_{t-1}
Or,

Y_t = \gamma \beta_0 + \gamma \beta_1 X_t + (1 - \gamma) Y_{t-1} + \nu_t

where \nu_t = u_t - (1 - \gamma) u_{t-1}, which is a moving average error term depending on (1 - \gamma).
The Durbin-Watson statistic for testing first-order autocorrelation is no longer appropriate if the model contains a lagged dependent variable as a regressor (i.e., Y_{t-1}). In that case Durbin's h statistic can be used:
h = \hat{\rho}\, \sqrt{\frac{T}{1 - T \cdot \widehat{Var}(\hat{\alpha}_2)}}

or, using d \approx 2(1 - \hat{\rho}),

h = \left(1 - \frac{d}{2}\right) \sqrt{\frac{T}{1 - T \cdot \widehat{Var}(\hat{\alpha}_2)}}

where \hat{\alpha}_2 is the estimated coefficient of Y_{t-1}.
NB. This test statistic cannot be used if T \cdot \widehat{Var}(\hat{\alpha}_2) \ge 1. Alternatively, one can use the Breusch-Godfrey test for (higher-order) autocorrelation, also known as the Lagrange multiplier test.
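Both options are available in Stata after regressing on the lagged dependent variable; a sketch (y and x are illustrative names, data tsset):

regress y x l.y
estat durbinalt           // Durbin's alternative test, valid with a lagged dependent variable
estat bgodfrey, lags(1)   // Breusch-Godfrey LM test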