Chapter Four Violations of Basic Classical Assumptions: Y and The Random Error Term U
In the preceding chapters we assumed that the random error term u t is distributed with mean zero and var(u t ) = σ², and that the errors corresponding to different observations are uncorrelated.
Now, we address the following ‘what if’ questions in this chapter. What if the error variance is
not constant over all observations? What if the different errors are correlated? What if the
explanatory variables are correlated? We need to ask whether and when such violations of the
basic classical assumptions are likely to occur. What are the consequences of such violations for the
least squares estimators? How do we detect the presence of autocorrelation, heteroscedasticity, or
multicollinearity? What are the remedial measures? In the subsequent sections, we attempt to
answer such questions.
4.1 Heteroscedasticity
4.1.1 The Nature of Heteroscedasticity
In the classical linear regression model, one of the basic assumptions is that the probability
distribution of the disturbance term remains the same over all observations of X; i.e. the variance of
each u i is the same for all values of the explanatory variable. Symbolically,
var(u i) = E(u i²) = σ u², a constant for all i = 1, 2, …, n.
4.1.2 Graphical Representation of Heteroscedasticity and Homoscedasticity
The assumption of homoscedasticity states that the variation of each u i around its zero mean
does not depend on the value of the explanatory variable. The variance of each u i remains the same
irrespective of the value of X; i.e. σ u² ≠ f(X i).
If σ u² is not constant but its value depends on the value of X, it means that σ ui² = f(X i). Such
dependency is depicted diagrammatically in the following figures. Three cases of
heteroscedasticity are shown, all indicated by an increasing or decreasing dispersion of the observations around the
regression line.
In panel (a), σ u² seems to increase with X. In panel (b), the error variance appears greater in X’s
middle range, tapering off toward the extremes. Finally, in panel (c), the variance of the error
term is greater for low values of X, declining and leveling off rapidly as X increases.
The pattern of heteroscedasticity would depend on the signs and values of the coefficients of the
relationship σ ui² = f(X i), but the u i’s are not observable. As such, in applied research we make
convenient assumptions that the heteroscedasticity is of one of the forms:
i. σ ui² = K²X i²
ii. σ ui² = K²X i
iii. σ ui² = K/X i, etc.
4.1.3. Reasons for Heteroscedasticity
There are several reasons why the variances of u i may be variable. Some of these are:
1. Error learning model: it states that as people learn, their errors of behavior become smaller over
time. In this case σ i² is expected to decrease.
Example: as the number of hours of typing practice increases, the average number of typing
errors as well as their variance decreases.
2. As data collection techniques improve, σ ui² is likely to decrease. Thus banks that have
sophisticated data processing equipment are likely to commit fewer errors in the monthly or
quarterly statements of their customers than banks without such facilities.
3. Heteroscedasticity can also arise as a result of the presence of outliers. An outlier is an
observation that is much different (either very small or very large) in relation to the other
observations in the sample.
4.1.4 Consequences of Heteroscedasticity
1. The OLS estimators are still unbiased.
Since α̂ = Ȳ − β̂X̄ = α + βX̄ + Ū − β̂X̄, taking expectations gives:
E(α̂) = α + βX̄ + E(Ū) − E(β̂)X̄ = α
i.e., the least squares estimators are unbiased even under the condition of heteroscedasticity. This is
because we do not make use of the assumption of homoscedasticity here.
2. Variance of OLS coefficients will be incorrect
Under homoscedasticity, var(β̂) = σ²ΣK i² = σ²/Σx i², but under the heteroscedastic assumption we shall
have: var(β̂) = ΣK i²·var(Y i) = ΣK i²σ ui² ≠ σ²ΣK i²
σ ui² is no longer a finite constant figure; rather, it tends to change with the value of X and hence cannot be taken outside the summation sign.
3. OLS estimators shall be inefficient: In other words, the OLS estimators do not have the
smallest variance in the class of unbiased estimators and, therefore, they are not efficient in either
small or large samples. Under the heteroscedastic assumption:
var(β̂) = ΣK i²σ ui² = Σ(x i/Σx i²)²σ ui² = Σx i²σ ui²/(Σx i²)² − − − − − − − − − 3.11
Under homoscedasticity, var(β̂) = σ²/Σx i² − − − − − − − − − − − − − − − − − − − 3.12
These two variances are different. This implies that, under the heteroscedastic assumption, although
the OLS estimator is unbiased, it is inefficient: its variance is larger than necessary.
To see the consequence of using (3.12) instead of (3.11), let us assume that:
σ ui2 = K iσ 2
where K i are some non-stochastic constant weights. This assumption merely states that the
heteroscedastic variances are proportional to σ², the K i being the factors of proportionality. If, on average, K i is
greater than 1, then var(β̂) under heteroscedasticity will be greater than its variance under
homoscedasticity. As a result, the true standard error of β̂ shall be underestimated. As such, the
t-value associated with it will be overestimated, which might lead to the conclusion that in a
specific case at hand β̂ is statistically significant (which in fact may not be true). Moreover, if
we proceed with our model under the false belief of homoscedasticity of the error variance, our
inference and prediction about the population coefficients would be incorrect.
4.1.5. Detecting Heteroscedasticity
We have observed that heteroscedasticity has serious consequences for the OLS estimates. As
such, it is desirable to examine whether or not the regression model is in fact homoscedastic.
There are two methods of testing or detecting heteroscedasticity. These are:
i. Informal method
ii. Formal method
i. Informal Method
This method is called informal because it does not undertake the formal testing procedures such
as the t-test, F-test and the like. It is a test based on the nature of the graph. In this method, to check
whether a given data set exhibits heteroscedasticity or not, we look at whether there is a systematic
relation between the squared residuals e i² and the mean value of Y, i.e. Ŷ i, or with X i. In the figure
below, e i² are plotted against Ŷ i (or X i). In fig (a), we see there is no systematic pattern between
the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures (b) to
(e), however, exhibit definite patterns. For instance, (c) suggests a linear relationship, whereas (d)
and (e) indicate a quadratic relationship between e i² and Ŷ i.
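As a hedged illustration of this informal check, the following Python sketch simulates a data set whose error spread grows with X, fits a line by OLS, and plots the squared residuals against the fitted values. All variable names and numbers here are invented for illustration only.

```python
# Informal graphical check for heteroscedasticity: plot squared residuals
# against fitted values.  All data below are simulated for illustration only.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(50, 300, size=100)        # hypothetical income values
u = rng.normal(0, 0.05 * x)               # error spread grows with X (heteroscedastic)
y = 10 + 0.6 * x + u                      # hypothetical consumption function

slope, intercept = np.polyfit(x, y, 1)    # simple OLS fit
y_hat = intercept + slope * x
e_sq = (y - y_hat) ** 2                   # squared residuals

plt.scatter(y_hat, e_sq)
plt.xlabel("fitted values Y_hat")
plt.ylabel("squared residuals e^2")
plt.title("A systematic pattern suggests heteroscedasticity")
plt.show()
```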
ii. Formal Methods
a. Goldfeld-Quandt test
This popular method is applicable if one assumes that the heteroscedastic variance σ i2 is
positively related to one of the explanatory variables in the regression model. For simplicity,
consider the usual two-variable model:
Y i = α + βX i + U i
and suppose that σ i² is positively related to X i, say σ i² = σ²X i² (form (i) above). If this assumption is appropriate, it would mean that σ i² would be larger, the larger the value of X i. If
that turns out to be the case, heteroscedasticity is most likely to be present in the model. To test
this explicitly, Goldfeld and Quandt suggest the following steps:
Step 1: Order or rank the observations according to the values of X i beginning with the lowest
X value
Step 2: Omit c central observations, where c is specified a priori, and divide the remaining (n − c) observations into two groups of (n − c)/2 observations each.
Step 3: Fit separate OLS regressions to the first (n − c)/2 observations and the last (n − c)/2 observations, and obtain the respective residual sums of squares RSS1 and RSS2, RSS1 representing the RSS from the regression corresponding to the smaller X i values (the small variance group) and RSS2 that from the larger X i values (the large variance group). These RSS each have (n − c)/2 − K, that is (n − c − 2K)/2, degrees of freedom (df), where K is the number of parameters to be estimated, including the intercept term.
Step 4: Compute
λ = (RSS2/df) / (RSS1/df)
If the U i are assumed to be normally distributed (which we usually do), and if the assumption of homoscedasticity is valid, then it can be shown that λ follows the F distribution with numerator and denominator df each equal to (n − c − 2K)/2:
λ = [RSS2/((n − c − 2K)/2)] / [RSS1/((n − c − 2K)/2)] ~ F((n − c)/2 − K, (n − c)/2 − K)
If in application the computed λ (= F) is greater than the critical F at the chosen level of
significance, we can reject the hypothesis of homoscedasticity, i.e. we can say that
heteroscedasticity is very likely.
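Below is a minimal Python sketch of these four steps using numpy and scipy; the data are simulated placeholders rather than the Table 3.1 figures, and the helper names (rss_of_ols, goldfeld_quandt) are made up for this illustration.

```python
# A sketch of the Goldfeld-Quandt steps with numpy/scipy; the data below are
# simulated placeholders, not the chapter's Table 3.1.
import numpy as np
from scipy import stats

def rss_of_ols(y, x):
    """Residual sum of squares from an OLS fit of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def goldfeld_quandt(y, x, c):
    """Return (lambda, p_value) for H0: homoscedasticity, ordering by x."""
    order = np.argsort(x)                    # Step 1: rank observations by X
    y, x = y[order], x[order]
    n, k = len(y), 2                         # k = parameters incl. intercept
    m = (n - c) // 2                         # Step 2: drop c central observations
    rss1 = rss_of_ols(y[:m], x[:m])          # Step 3: small-X group
    rss2 = rss_of_ols(y[-m:], x[-m:])        #         large-X group
    df = m - k
    lam = (rss2 / df) / (rss1 / df)          # Step 4: lambda ~ F(df, df) under H0
    return lam, stats.f.sf(lam, df, df)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(80, 270, 30))
y = 10 + 0.7 * x + rng.normal(0, 0.08 * x)   # variance rises with X
lam, p = goldfeld_quandt(y, x, c=4)
print(f"lambda = {lam:.2f}, p-value = {p:.3f}")
```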
Example: to illustrate the Goldfeld-Quandt test, we present in Table 3.1 data on consumption
expenditure in relation to income for a cross-section of 30 families. Suppose we postulate that
consumption expenditure is linearly related to income but that heteroscedasticity is present in the
data. We further postulate that the nature of the heteroscedasticity is as given in equation (3.15) above.
The necessary reordering of the data for the application of the test is also presented in Table 3.1.
Table 3.1 Hypothetical data on consumption expenditure Y($) and income X($); the first pair of columns is in the original order, the second pair gives the same data ranked by X values.
Y X Y X
55 80 55 80
65 100 70 85
70 85 75 90
80 110 65 100
79 120 74 105
84 115 80 110
98 130 84 115
95 140 79 120
90 125 90 125
75 90 98 130
74 105 95 140
110 160 108 145
113 150 113 150
125 165 110 160
108 145 125 165
115 180 115 180
140 225 130 185
120 200 135 190
145 240 120 200
130 185 140 205
152 220 144 210
144 210 152 220
175 245 140 225
180 260 137 230
135 190 145 240
140 205 175 245
178 265 189 250
191 270 180 260
137 230 178 265
189 250 191 270
Dropping the middle four observations, the OLS regressions based on the first 13 and the last 13
observations and their associated residual sums of squares are shown next (standard errors in
parentheses). Regression based on the first 13 observations:
Yi = 3.4094 + 0.6968 X i + ei
(8.7049) (0.0744)
R 2 = 0.8887
RSS1 = 377.17
df = 11
Regression based on the last 13 observations
Yi = −28.0272 + 0.7941X i + ei
(30.6421) (0.1319)
R 2 = 0.7681
RSS 2 = 1536.8
df = 11
From these results we obtain:
λ = (RSS 2/df) / (RSS 1/df) = (1536.8/11) / (377.17/11) = 4.07
The critical F-value for 11 numerator and 11 denominator df at the 5% level is 2.82. Since
the estimated F (= λ) value exceeds the critical value, we may conclude that there is
heteroscedasticity in the error variance. However, if the level of significance is fixed at 1%, we
may not reject the assumption of homoscedasticity (why?). Note that the p-value of the observed
λ is 0.014.
There are also other tests of heteroscedasticity, such as Spearman’s rank correlation test, the Breusch-Pagan-Godfrey test and White’s general heteroscedasticity test. Read about them by yourself.
4.1.6 Remedial Measures for Heteroscedasticity
Consider the model Y = α + βX i + U i, with var(u i) = σ i², E(u i) = 0 and E(u iu j) = 0 for i ≠ j.
If we apply OLS to the above model, it will result in inefficient estimates, since var(u i) is not
constant.
The remedial measure is transforming the above model so that the transformed model satisfies all
the assumptions of the classical regression model including homoscedasticity. Applying OLS to
the transformed variables is known as the method of Generalized Least Squares (GLS). In short
GLS is OLS on the transformed variables that satisfy the standard least squares assumptions.
The estimators thus obtained are known as GLS estimators, and it is these estimators that are
BLUE.
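As a rough illustration of GLS by transformation, the sketch below assumes the heteroscedasticity takes form (i) above (σ ui² proportional to X i²), so that dividing the whole model by X i yields a homoscedastic error; this is equivalent to weighted least squares with weights 1/X i². The data and names are invented for the example.

```python
# GLS by transformation under the assumption sigma_ui^2 = K^2 * X_i^2 (form (i)).
# Dividing Y_i = alpha + beta*X_i + U_i through by X_i gives
# Y_i/X_i = alpha*(1/X_i) + beta + U_i/X_i, whose error is homoscedastic.
import numpy as np

def wls_estimates(y, x):
    """OLS on the transformed model; returns (alpha_hat, beta_hat) of the original model."""
    y_star = y / x
    Z = np.column_stack([1.0 / x, np.ones_like(x)])   # regressors: 1/X and a constant
    coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    alpha_hat, beta_hat = coef
    return alpha_hat, beta_hat

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 3 + 2 * x + rng.normal(0, 0.5 * x)     # simulated data: error sd proportional to X
print(wls_estimates(y, x))                  # should be close to (3, 2)
```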
4.2 Autocorrelation
4.2.1 The Nature of Autocorrelation
In our discussion of the simple and multiple regression models, one of the assumptions of the
classical model is that cov(u i, u j) = E(u iu j) = 0, which implies that successive values of the
disturbance term U are temporally independent, i.e. the disturbance occurring at one point of
observation is not related to any other disturbance. This means that when observations are made
over time, the effect of a disturbance occurring at one period does not carry over into another
period.
If the above assumption is not satisfied, that is, if the value of U in any particular period is
correlated with its own preceding value(s), we say there is autocorrelation of the random
variables. Hence, autocorrelation is defined as a ‘correlation’ between members of series of
observations ordered in time or space.
4.2.2 Graphical Representation of Autocorrelation
Since autocorrelation is correlation between members of series of observations ordered in time,
we will see graphically the trend of the random variable by plotting time horizontally and the
random variable (U i ) vertically.
Consider the following figures
[Figure: panels (a) to (e) plot the disturbance term U i against time t.]
Figures (a) to (d) above show systematic patterns among the U’s, indicating autocorrelation: figure (a) shows a cyclical pattern, figures (b) and (c) suggest an upward and a downward linear trend, and (d) indicates a quadratic trend in the disturbance terms. Figure (e) indicates no systematic pattern, supporting the non-autocorrelation assumption of the classical linear regression model. We can also show autocorrelation graphically by plotting successive values of the random disturbance term against each other, vertically (u i) and horizontally (u j).
There are several reasons why serial correlation or autocorrelation arises. Some of these are:
a. Cyclical fluctuations
Time series such as GNP, price indices, production, employment and unemployment exhibit
business cycles. Starting at the bottom of a recession, when economic recovery starts, most of these
series move upward. In this upswing, the value of a series at one point in time is greater than its
previous value. Thus, there is a momentum built into them, and it continues until something
happens (e.g. an increase in interest rates or taxes) to slow them down. Therefore, in regressions involving
time series data, successive observations are likely to be interdependent.
b. Specification Bias
Let us see one by one how such specification biases cause autocorrelation.
i. Exclusion of variables: as we have discussed in chapter one, there are several sources of
the random disturbance term (u i). One of these is the exclusion of variable(s) from the model.
In that case the error term will show a systematic change as the excluded variable changes. For example,
suppose the correct demand model is given by:
y t = α + β 1x 1t + β 2x 2t + β 3x 3t + U t − − − − − − − − − − − − 3.21
where y = quantity of beef demanded, x 1 and x 2 are other determinants of beef demand (for instance, the price of beef and consumer income), x 3 = price of
pork and t = time. Now, suppose we run the following regression instead of (3.21):
y t = α + β 1x 1t + β 2x 2t + V t − − − − − − − − − − − − ------3.22
Now, if equation 3.21 is the ‘correct’ model or true relation, running equation 3.22 is
tantamount to letting V t = β 3x 3t + U t. And to the extent the price of pork affects the
consumption of beef, the error or disturbance term V will reflect a systematic pattern, thus
creating autocorrelation. A simple test of this would be to run both equation 3.21 and equation
3.22 and see whether autocorrelation, if any, observed in equation 3.22 disappears when
equation 3.21 is run. The actual mechanics of detecting autocorrelation will be discussed
later.
ii. Incorrect functional form: This is also one source of autocorrelation in the error term.
Suppose the ‘true’ or correct model in a cost-output study is:
Marginal cost i = β 1 + β 2 output i + β 3 output i² + U i --------------3.23
but for some reason we incorrectly fit the following model:
Marginal cost i = α 1 + α 2 output i + V i --------------3.24
The marginal cost curve corresponding to the ‘true’ model is shown in the figure below along
with the ‘incorrect’ linear cost curve.
As the figure shows, between points A and B the linear marginal cost curve will
consistently overestimate the true marginal cost, whereas outside these points it will
consistently underestimate the true marginal cost. This result is to be expected because the
disturbance term V i is, in fact, equal to β 3 output i² + U i, and hence will catch the systematic
effect of the (output)² term on marginal cost. In this case, V i will reflect autocorrelation
because of the use of an incorrect functional form.
iii. Neglecting lagged terms from the model: If the dependent variable of a regression
model is affected by the lagged value of itself or of an explanatory variable, and this lagged term is not
included in the model, the error term of the incorrect model will reflect a systematic pattern
which indicates autocorrelation in the model. Suppose the correct model for consumption
expenditure is:
expenditure is:
C t = α + β 1 yt + β 2 y t −1 + U t -----------------------------------3.25
But again for some reason we incorrectly regress:
C t = α + β 1 yt + Vt ---------------------------------------------3.26
Autocorrelation, as stated earlier, is a kind of lag correlation between successive values of the same
variable. Thus, we treat autocorrelation in the same way as correlation in general. A simple
case of linear correlation is termed here autocorrelation of first order. In other words, if the
value of U in any particular period depends on its own value in the preceding period alone, we
say that the U’s follow a first-order autoregressive scheme AR(1) (or first-order Markov scheme),
i.e. u t = f(u t−1) ------------------------- - -------------3.28
If the value of U in any particular period depends on its own values in the two preceding periods, i.e.
u t = f(u t−1, u t−2) ---------------------------------- 3.29
this form of autocorrelation is called a second-order autoregressive scheme, and so on.
Generally, when autocorrelation is present, we assume the simplest form, first-order autocorrelation:
u t = ρu t −1 + vt --------------------------------------------3.30
where ρ is the coefficient of autocorrelation and v is a random variable satisfying all the basic
assumptions of ordinary least squares:
E(v) = 0, E(v²) = σ v² and E(v iv j) = 0 for i ≠ j
The above relationship states the simplest possible form of autocorrelation. If we apply OLS to the
model given in (3.30), we obtain:
ρ̂ = Σu tu t−1 / Σu²t−1 (summation from t = 2 to n) --------------------------------3.31
Given that for large samples: Σu t2 ≈ Σu t2−1 , we observe that coefficient of autocorrelation ρ
represents a simple correlation coefficient r.
ρ̂ = Σu tu t−1 / Σu²t−1 = Σu tu t−1 / √(Σu t² · Σu²t−1) = r(u t, u t−1) ---------------------------3.32
⇒ −1 ≤ ρ̂ ≤ 1, since −1 ≤ r ≤ 1 ---------------------------------------------3.33
This proves the statement “we can treat autocorrelation in the same way as correlation in
general”. From our statistics background we know that:
Our objective here is to obtain the value of u t in terms of the autocorrelation coefficient ρ and the random
variable v t. The complete form of the first-order autoregressive scheme may be derived as
follows:
U t = f(U t−1) = ρU t−1 + v t
U t−1 = f(U t−2) = ρU t−2 + v t−1
U t−2 = f(U t−3) = ρU t−3 + v t−2
...
U t−r = f(U t−(r+1)) = ρU t−(r+1) + v t−r
We now substitute successively:
U t = ρU t−1 + v t
= ρ(ρU t−2 + v t−1) + v t, since U t−1 = ρU t−2 + v t−1
= ρ²U t−2 + ρv t−1 + v t
= ρ²(ρU t−3 + v t−2) + ρv t−1 + v t
= ρ³U t−3 + ρ²v t−2 + ρv t−1 + v t
In this way, if we continue the substitution process for r periods (assuming that r is very large), we shall
obtain:
U t = v t + ρv t−1 + ρ²v t−2 + ρ³v t−3 + − − − − − − − − -------------3.35
since ρ^r → 0 as r grows, because |ρ| < 1. That is,
u t = Σρ^r v t−r (summation over r = 0, 1, 2, …) -----------------------------------------------------------3.36
Now, using this value of u t , let’s compute its mean, variance and covariance
1. To obtain mean:
E(U t) = E(Σρ^r v t−r) = Σρ^r E(v t−r) = 0, since E(v t−r) = 0 ----------3.37
In other words, we found that the mean of autocorrelated U’s turns out to be zero.
2. To obtain variance
By the definition of variance,
var(U t) = E(U t²) = E(Σρ^r v t−r)² = Σ(ρ^r)²E(v t−r²) = Σ(ρ^r)²var(v t−r)
(the cross-product terms drop out because E(v iv j) = 0 for i ≠ j), so that
var(U t) = σ v²(1 + ρ² + ρ⁴ + …) = σ v²/(1 − ρ²) --------------------------------(3.38); since |ρ| < 1
Thus, the variance of the autocorrelated u t is σ v²/(1 − ρ²), which is a constant. The
variance of U t therefore depends on the nature of the variance of v t: if v t is homoscedastic, U t
is homoscedastic, and if v t is heteroscedastic, U t is heteroscedastic.
3. To obtain covariance:
cov(U t, U t−1) = E[(U t − E(U t))(U t−1 − E(U t−1))] = E(U tU t−1) ------------------------------------------------------------------------ (3.39)
where U t = v t + ρv t−1 + ρ²v t−2 + ........ and U t−1 = v t−1 + ρv t−2 + ρ²v t−3 + ........
E(U tU t−1) = E[(v t + ρv t−1 + ρ²v t−2 + ......)(v t−1 + ρv t−2 + ρ²v t−3 + ......)]
= ρE(v²t−1 + ρ²v²t−2 + ......) + E(cross-product terms)
= ρ(σ v² + ρ²σ v² + ...... + 0)
= ρσ v²(1 + ρ² + ρ⁴ + ......)
= ρσ v²/(1 − ρ²), since |ρ| < 1 --------------------------------------------------------3.40
∴ cov(U t, U t−1) = ρσ v²/(1 − ρ²) = ρσ u² ……………………………………………….3.41
Similarly, cov(U t, U t−2) = ρ²σ u² ….........................................................................3.42
cov(U t, U t−3) = ρ³σ u² ….........................................................................3.43
In summary, under the first-order autoregressive scheme:
U t ~ N(0, σ v²/(1 − ρ²)) and E(U tU t−r) ≠ 0 --------------------------------3.44
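A quick simulation can be used to check equations 3.38 and 3.41 numerically; the sketch below generates a long AR(1) error series and compares the sample variance and first-order autocovariance with the theoretical values (the choices of ρ and σ v are arbitrary).

```python
# Check equations 3.38 and 3.41 by simulation: for u_t = rho*u_{t-1} + v_t,
# var(u) should be close to sigma_v^2/(1 - rho^2) and cov(u_t, u_{t-1}) close
# to rho times that variance.  Illustrative only.
import numpy as np

rng = np.random.default_rng(3)
rho, sigma_v, n = 0.6, 1.0, 200_000

v = rng.normal(0, sigma_v, n)
u = np.empty(n)
u[0] = v[0]
for t in range(1, n):
    u[t] = rho * u[t - 1] + v[t]      # first-order autoregressive scheme

var_u = u.var()
cov_u = np.cov(u[1:], u[:-1])[0, 1]
print("var(u):     simulated %.3f  theoretical %.3f" % (var_u, sigma_v**2 / (1 - rho**2)))
print("cov(u,u-1): simulated %.3f  theoretical %.3f" % (cov_u, rho * sigma_v**2 / (1 - rho**2)))
```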
4.2.3 Consequences of Autocorrelation
We have seen that the ordinary least squares technique is based on basic assumptions, some of which
concern the mean, variance and covariance of the disturbance term.
Naturally, therefore, if these assumptions do not hold, on whatsoever account, the
estimators derived by the OLS procedure may not be efficient. Now we are in a position to examine
the effect of autocorrelation on the OLS estimators. The following are the effects on the estimators if the OLS
method is applied in the presence of autocorrelation in the given data.
1. The OLS estimators are still unbiased.
We know that β̂ = β + Σk iu i, so that E(β̂) = β + Σk iE(u i) = β, since the proof of unbiasedness does not require the assumption of non-autocorrelated disturbances.
2. The variance of the OLS estimates is underestimated.
The variance of the estimate β̂ in the simple regression model will be biased downwards (i.e.
underestimated) when the u’s are positively autocorrelated.
If var(β̂) is underestimated, SE(β̂) is also underestimated, and this makes the t-ratio large. Such an inflated t-ratio may lead us to declare a coefficient statistically significant when in fact it is not.
3. The usual t and F tests of significance are therefore no longer valid and, if applied, are likely to give misleading conclusions.
4. The wrong testing procedure will in turn lead to wrong predictions and inferences about the characteristics
of the population.
4.2.4 Detecting Autocorrelation
There are two methods that are commonly used to detect the existence or absence of
autocorrelation in the disturbance terms. These are:
1. Graphic method
Recall from section 4.2.2 that autocorrelation can be presented in graphs in two ways.
Detection of autocorrelation using graphs is based on these two ways.
Given data on economic variables, autocorrelation can be detected from the data using graphs by
the following two procedures.
a. Apply OLS to the given data (whether they are autocorrelated or not) and obtain the residuals. Plot e t on the horizontal axis and e t+1 on the vertical axis, i.e. plot the pairs
(e 1, e 2), (e 2, e 3), (e 3, e 4), ..., (e n−1, e n). If, on plotting, it is found that most of the points fall
in quadrants I and III, as shown in fig (a) below, we say that the given data are
autocorrelated and the type of autocorrelation is positive autocorrelation. If most of the
points fall in quadrants II and IV, as shown in fig (b) below, the autocorrelation is said to
be negative. But if the points are scattered equally in all the quadrants, as shown in fig (c)
below, then we say there is no autocorrelation in the given data.
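The quadrant comparison in procedure (a) can also be done numerically, as in the following sketch: it counts how many consecutive residual pairs share the same sign (quadrants I and III) versus opposite signs (quadrants II and IV). The residual series used is simulated for illustration.

```python
# Numerical version of graphical procedure (a): count consecutive residual
# pairs (e_{t-1}, e_t) by quadrant.  The residual series is simulated.
import numpy as np

def quadrant_counts(e):
    """Return (# pairs in quadrants I and III, # pairs in quadrants II and IV)."""
    prev, curr = e[:-1], e[1:]
    same_sign = int(np.sum(prev * curr > 0))      # quadrants I and III
    opposite_sign = int(np.sum(prev * curr < 0))  # quadrants II and IV
    return same_sign, opposite_sign

rng = np.random.default_rng(4)
e = np.empty(100)
e[0] = rng.normal()
for t in range(1, 100):
    e[t] = 0.7 * e[t - 1] + rng.normal()          # positively autocorrelated residuals
print(quadrant_counts(e))   # far more pairs in I & III suggests positive autocorrelation
```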
2. Formal testing method
Different econometricians and statisticians suggest different types of testing methods. But, the
most frequently and widely used testing method by researchers is the following.
A. The Durbin-Watson d test: The most celebrated test for detecting serial correlation is the one
developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson d
statistic, which is defined as:
d = Σ(t=2 to n) (e t − e t−1)² / Σ(t=1 to n) e t² ------------------------------------3.47
Note that, in the numerator of d statistic the number of observations is n − 1 because one
observation is lost in taking successive differences.
The d statistic rests on the following assumptions:
1. The regression model includes an intercept term. If such a term is not present, as in the case
of regression through the origin, it is essential to rerun the regression including the
intercept term to obtain the RSS.
2. The explanatory variables, the X’s, are non-stochastic, or fixed in repeated sampling.
3. The disturbances U t are generated by the first-order autoregressive scheme:
u t = ρu t−1 + ε t
4. The regression model does not include the lagged value of the dependent variable Y as one
of the explanatory variables. Thus, the test is inapplicable to models of the following
type:
y t = β 1 + β 2X 2t + β 3X 3t + ....... + β kX kt + γy t−1 + U t
where y t−1 is the one-period lagged value of y. Such models are known as autoregressive
models. If the d test is applied mistakenly, the value of d in such cases will often be around
2, which is the value of d in the absence of first-order autocorrelation. Durbin developed
the so-called h-statistic to test serial correlation in such autoregressive models.
In using the Durbin-Watson test, it is, therefore, important to note that it cannot be applied if any of the above assumptions is violated.
From equation 3.47, the value of d = Σ(t=2 to n) (e t − e t−1)² / Σ(t=1 to n) e t².
Expanding the square in the numerator of the above equation, we obtain:
d = [Σ(t=2 to n) e t² + Σ(t=2 to n) e²t−1 − 2Σe te t−1] / Σe t² ------------------3.48
However, for large samples Σ(t=2 to n) e t² ≅ Σ(t=2 to n) e²t−1, because in both cases one observation is lost.
Thus,
d ≈ [2Σe t² − 2Σe te t−1] / Σe t² = 2(1 − Σe te t−1/Σe t²)
but ρ̂ = Σe te t−1/Σe t² from equation 3.31, so that
d ≈ 2(1 − ρ̂)
Hence:
if ρ̂ = 0, d ≅ 2
if ρ̂ = 1, d ≅ 0
if ρ̂ = −1, d ≅ 4
Whenever, therefore, the calculated value of d turns out to be sufficiently close to 2, we accept the
null hypothesis, and if it is close to zero or four, we reject the null hypothesis that there is no
autocorrelation.
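The following sketch computes the d statistic of equation 3.47 directly from a residual series and compares it with the approximation 2(1 − ρ̂); the residuals shown are placeholders, and in practice one would use the residuals from the actual OLS run.

```python
# Durbin-Watson d statistic (equation 3.47) and the approximation d ~ 2(1 - rho_hat).
# The residual series below is a placeholder; use the residuals of your own OLS run.
import numpy as np

def durbin_watson_d(e):
    """d = sum_{t=2..n}(e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

e = np.array([0.5, 0.8, 0.3, -0.2, -0.6, -0.4, 0.1, 0.7, 0.9, 0.2])
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
print("d =", round(durbin_watson_d(e), 3), "  2(1 - rho_hat) =", round(2 * (1 - rho_hat), 3))
```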
However, because the exact sampling distribution of d is not known, there exist ranges of values within which
we can either accept or reject the null hypothesis. We do not have a unique critical value of the d-statistic; instead we have d L (a lower bound) and d U (an upper bound) of critical values of d for accepting or
rejecting the null hypothesis. For the two-tailed Durbin-Watson test, we can set five regions for the
values of d graphically (read it).
The mechanics of the D.W test are as follows, assuming that the assumptions underlying the
test are fulfilled.
1. Obtain the computed value of d using the formula given in equation 3.47.
2. For the given sample size and given number of explanatory variables, find the critical d L
and d U values.
3. If the computed d is smaller than d L (or larger than 4 − d L), reject the null hypothesis of no autocorrelation; if it lies between d U and (4 − d U), do not reject it.
4. If, however, the value of d lies between d L and d U or between (4 − d U) and (4 − d L), the
D.W test is inconclusive.
Solution: First compute (4 − d L ) and (4 − d U ) and compare the computed value
of d with d L , d U , (4 − d L ) and (4 − d U )
(4 − d L ) =4-1.37=2.63
(4 − dU ) =4-1.5=2.50
Example: given the following data on X and Y, test for the presence of autocorrelation using the Durbin-Watson d test.
X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11
Solution:
1. Regress Y on X, i.e. Y t = α + βX t + U t:
β̂ = Σxy/Σx² = 255/280 = 0.91
Ŷ t = −0.29 + 0.91X t
2. Compute the residuals e t = Y t − Ŷ t and obtain:
d = Σ(e t − e t−1)²/Σe t² = 60.213/41.767 = 1.442
3. The values of d L and d U at the 5% level of significance, with n = 15 and one explanatory variable, are d L = 1.08 and d U = 1.36, so that (4 − d U) = 2.64.
Since d* = 1.442 lies in the region d U < d < 4 − d U, we accept H0. This implies that the data are not autocorrelated.
Although the D.W test is extremely popular, it has one great drawback: if the computed d falls in the
inconclusive zone or region, one cannot conclude whether autocorrelation does or does not exist.
Several authors have proposed modifications of the D.W test.
In many situations, however, it has been found that the upper limit d U is approximately the true
significance limit. Thus, the modified D.W test is based on d U: in case the estimated d value lies
in the inconclusive zone, one can use the following modified d test procedure. Given the level of
significance α:
1. H0: ρ = 0 versus H1: ρ > 0: reject H0 at level α if d < d U.
2. H0: ρ = 0 versus H1: ρ < 0: reject H0 at level α if (4 − d) < d U.
3. H0: ρ = 0 versus H1: ρ ≠ 0: reject H0 at level 2α if d < d U or (4 − d) < d U.
4.2.5 Remedial Measures for Autocorrelation
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to
seek remedial measures. The remedy however depends on what knowledge one has about the
nature of the interdependence among the disturbances. This means the remedy depends on
whether the coefficient of autocorrelation is known or not known.
A. When ρ is known: when the structure of the autocorrelation is known, i.e. ρ is known, the
appropriate corrective procedure is to transform the original model or data so that the error term of
the transformed model or data is not autocorrelated.
B. When ρ is not known: we first estimate the coefficient of autocorrelation and then apply the appropriate
measure accordingly.
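To make the known-ρ case concrete, the sketch below applies the generalised (quasi-) difference transformation Y t − ρY t−1 = α(1 − ρ) + β(X t − ρX t−1) + v t and runs OLS on the transformed variables; the data and the value of ρ are invented for illustration, and the unknown-ρ case is indicated only in a comment.

```python
# Generalised (quasi-) difference transformation when rho is known:
# Y_t - rho*Y_{t-1} = alpha*(1 - rho) + beta*(X_t - rho*X_{t-1}) + v_t.
# OLS on the transformed variables is GLS.  Data and rho are invented here.
import numpy as np

def quasi_difference_ols(y, x, rho):
    """OLS on the rho-differenced model; returns (alpha_hat, beta_hat)."""
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]
    Z = np.column_stack([np.ones_like(x_star), x_star])
    (a_star, beta_hat), *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    alpha_hat = a_star / (1 - rho)          # recover the intercept of the original model
    return alpha_hat, beta_hat

rng = np.random.default_rng(5)
x = np.arange(50, dtype=float)
u = np.empty(50)
u[0] = rng.normal()
for t in range(1, 50):
    u[t] = 0.5 * u[t - 1] + rng.normal()    # AR(1) disturbances with rho = 0.5
y = 4 + 1.5 * x + u
print(quasi_difference_ols(y, x, rho=0.5))  # should be close to (4, 1.5)

# When rho is unknown, a two-step (Cochrane-Orcutt style) approach estimates
# rho_hat = sum(e_t * e_{t-1}) / sum(e_{t-1}^2) from the OLS residuals and then
# applies the same transformation with rho_hat.
```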
4.3 Multicollinearity
4.3.1 The Nature of Multicollinearity
Originally, multicollinearity meant the existence of a “perfect” or exact, linear relationship
among some or all explanatory variables of a regression model. For k-variable regression
involving explanatory variables x1 , x 2 ,......, x k , an exact linear relationship is said to exist if the
following condition is satisfied.
λ 1x 1 + λ 2x 2 + ....... + λ kx k = 0 − − − − − − (1)
where λ1 , λ2 ,.....λk are constants such that not all of them are simultaneously zero.
Today, however, the term multicollinearity is used in a broader sense to include the case of
perfect multicollinearity, as shown by (1), as well as the case where the x-variables are
intercorrelated but not perfectly so, as follows:
λ 1x 1 + λ 2x 2 + ....... + λ kx k + v i = 0 − − − − − − (2)
where v i is a stochastic error term.
2. Constraints on the model or in the population being sampled. For example, in the regression of electricity consumption on income (x1) and house size
(x2), there is a physical constraint in the population in that families with higher incomes
generally have larger homes than families with lower incomes.
3. Overdetermined model: This happens when the model has more explanatory variables
than the number of observations. This could happen in medical research where there may
be a small number of patients about whom information is collected on a large number of
variables.
1. If multicollinearity is perfect, the regression coefficients are indeterminate and their standard errors are infinite.
Proof: Consider the two explanatory variable model in deviation form:
y i = β̂ 1x 1i + β̂ 2x 2i + e i
Recall the formulas of βˆ1 and βˆ 2 from our discussion of multiple regression.
β̂ 1 = [Σx 1y·Σx 2² − Σx 2y·Σx 1x 2] / [Σx 1²·Σx 2² − (Σx 1x 2)²]
β̂ 2 = [Σx 2y·Σx 1² − Σx 1y·Σx 1x 2] / [Σx 1²·Σx 2² − (Σx 1x 2)²]
Now assume that x 2i = λx 1i, where λ is a non-zero constant. Substituting this into the above β̂ 1 and β̂ 2 formulas:
β̂ 1 = [Σx 1y·Σ(λx 1)² − Σ(λx 1)y·Σx 1(λx 1)] / [Σx 1²·Σ(λx 1)² − (Σx 1·λx 1)²]
= [λ²Σx 1y·Σx 1² − λ²Σx 1y·Σx 1²] / [λ²(Σx 1²)² − λ²(Σx 1²)²] = 0/0 ⇒ indeterminate.
Applying the same procedure, we obtain a similar result (an indeterminate value) for β̂ 2. Likewise, the variance
var(β̂ 1) = σ²Σx 2² / [Σx 1²·Σx 2² − (Σx 1x 2)²] = σ²λ²Σx 1² / [λ²(Σx 1²)² − λ²(Σx 1²)²] = σ²λ²Σx 1²/0 ⇒ infinite.
These are the consequences of perfect multicollinearity. One may raise the question of the
consequences of less-than-perfect correlation. In cases of near or high multicollinearity, one is
likely to encounter the following consequences.
2. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the regression
coefficients are determinate.
Proof: Consider the two explanatory variables model above in deviation form.
If we assume x 2 = λx 1, this indicates perfect correlation between x 1 and x 2, because the change
in x 2 is completely due to the change in x 1. Instead of exact multicollinearity, we may have:
x 2i = λx 1i + v i, where λ ≠ 0 and v i is a stochastic error term such that Σx 1iv i = 0.
In this case x 2 is not only determined by x 1, but is also affected by other factors captured in v i (the stochastic error
term).
Substituting x 2i = λx 1i + v i into the formula for β̂ 1 above, the denominator becomes
Σx 1²·Σx 2² − (Σx 1x 2)² = Σx 1²(λ²Σx 1² + Σv i²) − (λΣx 1²)² = Σx 1²·Σv i² ≠ 0,
so β̂ 1 (and, by the same argument, β̂ 2) has a determinate value. This proves that if we have less than perfect multicollinearity the OLS coefficients are
determinate.
The implication of the indeterminacy of the regression coefficients in the case of perfect
multicollinearity is that it is not possible to observe the separate influences of x 1 and x 2. But
such an extreme case is not very frequent in practical applications. Most data exhibit less than
perfect multicollinearity.
3. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the OLS estimators
retain the property of BLUE.
Explanation:
Note: while we were proving the BLUE property of the OLS estimators in the simple and multiple
regression models, we did not make use of the assumption of no multicollinearity. Hence, as long as the
basic assumptions needed to prove the BLUE property are not violated, the OLS estimators are BLUE whether
multicollinearity exists or not.
4. Although BLUE, the OLS estimators have large variances and covariances.
var(β̂ 1) = σ²Σx 2² / [Σx 1²·Σx 2² − (Σx 1x 2)²]
Dividing the numerator and the denominator by Σx 2²:
var(β̂ 1) = σ² / [Σx 1² − (Σx 1x 2)²/Σx 2²] = σ² / [Σx 1²(1 − r 12²)]
where r 12² is the square of the correlation coefficient between x 1 and x 2, i.e. r 12² = (Σx 1x 2)²/(Σx 1²Σx 2²).
As r 12 tends to 1, or as collinearity increases, the variance of the estimator increases, and in the limit it becomes infinite. Similarly,
cov(β̂ 1, β̂ 2) = −r 12σ² / [(1 − r 12²)√(Σx 1²Σx 2²)] (why?)
As r 12 increases toward one, the covariance of the two estimators increases in absolute value. The
speed with which variances and covariances increase can be seen with the variance-inflating
factor (VIF), which is defined as:
VIF = 1/(1 − r 12²)
The VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As
r 12² approaches 1, the VIF approaches infinity. That is, as the extent of collinearity increases, the
variance of an estimator increases, and in the limit the variance becomes infinite. As can be seen,
if there is no multicollinearity between x 1 and x 2, the VIF will be 1.
Using this definition, we can express var(β̂ 1) and var(β̂ 2) in terms of the VIF:
var(β̂ 1) = (σ²/Σx 1²)·VIF and var(β̂ 2) = (σ²/Σx 2²)·VIF
which shows that the variances of β̂ 1 and β̂ 2 are directly proportional to the VIF.
5. Because of the large variances of the estimators, which mean large standard errors, the
confidence intervals tend to be much wider, leading to the acceptance of the “zero null hypothesis”
(i.e. that the true population coefficient is zero) more readily.
6. Because of the large standard errors of the estimators, the computed t-ratios will be very small,
leading one or more of the coefficients to be statistically insignificant when tested
individually.
7. Although the t-ratios of one or more of the coefficients are very small (which makes the
coefficients statistically insignificant individually), R², the overall measure of goodness of fit, can
be very high.
Example: if y = α + β 1x 1 + β 2x 2 + .... + β kx k + v i, then in cases of high collinearity it is possible to find that one or more of the partial slope
coefficients are individually statistically insignificant on the basis of the t-test, while the R² in such
situations may be very high, say in excess of 0.9. In such a case, on the basis of the F test one can
convincingly reject the hypothesis that β 1 = β 2 = − − − = β k = 0. Indeed, this is one of the
signals of multicollinearity: insignificant t-values but a high overall R² (i.e. a significant F-value).
8. The OLS estimators and their standard errors can be sensitive to small changes in the data.
existence of multicollinearity because multicollinearity can also exist even if the correlation
coefficient is low.
However, the combination of all these criteria should help the detection of multicollinearity.
the model. Note that, according to Klein's rule of thumb, multicollinearity
may be a troublesome problem only if the R² obtained from an auxiliary regression is greater than
the overall R², that is, the one obtained from the regression of Y on all the regressors.
CI = √(maximum eigenvalue / minimum eigenvalue) = √k
Decision rule: if k is between 100 and 1000 there is moderate to strong multicollinearity, and if it
exceeds 1000 there is severe multicollinearity. Alternatively, if CI (= √k) is between 10 and 30,
there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity.
Example: if k = 123,864 and CI = 352, this suggests the existence of severe multicollinearity.
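A hedged sketch of how the condition number k and condition index CI might be computed from the eigenvalues of X'X (with columns scaled to unit length, a common convention) is given below; the data matrix is simulated.

```python
# Condition number k and condition index CI from the eigenvalues of X'X,
# with each column scaled to unit length (a common convention).  Simulated data.
import numpy as np

def condition_measures(X):
    Xs = X / np.linalg.norm(X, axis=0)          # scale columns to unit length
    eigvals = np.linalg.eigvalsh(Xs.T @ Xs)     # eigenvalues of the (symmetric) X'X
    k = eigvals.max() / eigvals.min()           # condition number
    return k, np.sqrt(k)                        # (k, condition index)

rng = np.random.default_rng(6)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.05, size=100)  # nearly collinear with x1
X = np.column_stack([np.ones(100), x1, x2])
k, ci = condition_measures(X)
print(f"k = {k:,.0f}, CI = {ci:.1f}")           # large values signal multicollinearity
```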
var(β̂ j) = (σ²/Σx j²)·[1/(1 − R j²)] = (σ²/Σx j²)·VIF j
where R j² is the R² in the auxiliary regression of X j on the remaining (k − 2) regressors and VIF is the
variance inflation factor.
Some authors therefore use the VIF as an indicator of multicollinearity: the larger the value
of VIF j, the more “troublesome” or collinear is the variable X j. However, how high should the VIF
be before a regressor becomes troublesome? As a rule of thumb, if the VIF of a variable exceeds 10
(this will happen if R j² exceeds 0.9), the variable is said to be highly collinear.
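The VIF itself can be computed by running each auxiliary regression directly, as in the following sketch with simulated data; statsmodels also provides a ready-made variance_inflation_factor function, but the manual version makes the definition VIF j = 1/(1 − R j²) explicit.

```python
# VIF_j = 1/(1 - R_j^2), with R_j^2 from the auxiliary regression of X_j on the
# other regressors.  The regressor matrix X is simulated and excludes the constant.
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of X."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=200)   # highly correlated with x1
x3 = rng.normal(size=200)                          # unrelated regressor
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 2) for j in range(X.shape[1])])   # large VIFs flag collinearity
```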
Other authors use the measure of tolerance to detect multicollinearity. It is defined as
TOL j = (1 − R j²) = 1/VIF j
Clearly, TOL j = 1 if X j is not correlated with the other regressors, whereas it is zero if it is
perfectly related to the other regressors. The VIF (or tolerance) as a measure of
collinearity is not free of criticism. As we have seen earlier, var(β̂ j) = (σ²/Σx j²)·VIF j depends
on three factors: σ², Σx j² and the VIF. A high VIF can be counterbalanced by a low
σ² or a high Σx j². To put it differently, a high VIF is neither necessary nor sufficient to get high
variances and high standard errors. Therefore, high multicollinearity, as measured by a high VIF,
may not necessarily cause high standard errors.