LEC11
MULTICOLLINEARITY
Chapter 8
MULTICOLLINEARITY: NATURE
One of the assumptions of the CLRM was that there is no exact linear
relationship between the independent variables of a regression model.
Informally, no exact linear relationship, or no collinearity, means that a
variable, say X2, cannot be expressed as an exact linear function of
another variable, say X3. Consider the model

Y = β1 + β2 X2 + β3 X3 + e

β2 is the impact of X2 on Y holding all other factors constant. If X2 is
related to X3, then β2 also captures the impact of changes in X3. In other
words, interpretation of the parameters becomes difficult.
When the explanatory variables are very highly correlated with each
other (correlation coefficients either very close to 1 or to -1) then the
problem of multicollinearity occurs.
MULTICOLLINEARITY: NATURE
An exact linear relationship is said to exist if the following condition is
satisfied:

λ2 X2 + λ3 X3 = 0

where λ2 and λ3 are constants and not all of them are zero simultaneously.

Consider the data:

X2: 1 2 3 4 5 6
X3: 2 4 6 8 10 12

In our case the equation λ2 X2 + λ3 X3 = 0 can be satisfied for non-zero
values of both λ2 and λ3. We have X3 = 2 X2, that is, λ2 = -2 and λ3 = 1.
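As a quick check, here is a minimal Python sketch (numpy is assumed to be available; the data are those of the example above) that verifies the condition λ2 X2 + λ3 X3 = 0 with λ2 = -2 and λ3 = 1:

import numpy as np

# Data from the example above: X3 = 2 * X2 for every observation.
X2 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X3 = np.array([2, 4, 6, 8, 10, 12], dtype=float)

# With lambda2 = -2 and lambda3 = 1, the exact-collinearity condition
# lambda2*X2 + lambda3*X3 = 0 holds for every observation.
lam2, lam3 = -2.0, 1.0
print(lam2 * X2 + lam3 * X3)   # [0. 0. 0. 0. 0. 0.]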
EXACT LINEAR RELATIONSHIP: CONSEQUENCE
In the case of perfect multicollinearity, we cannot get the inverse of XᵀX
because the matrix becomes singular, so the OLS estimator
β̂ = (XᵀX)⁻¹ Xᵀy cannot be computed.
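The following minimal Python sketch (numpy assumed; the design matrix is built from the example data and is purely illustrative) shows that a perfectly collinear regressor makes XᵀX singular, so its inverse does not exist:

import numpy as np

X2 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X3 = 2 * X2                                      # perfectly collinear with X2
X = np.column_stack([np.ones_like(X2), X2, X3])  # design matrix with intercept

XtX = X.T @ X
print(np.linalg.det(XtX))    # 0.0: X'X is singular

try:
    np.linalg.inv(XtX)       # OLS needs (X'X)^-1, which does not exist here
except np.linalg.LinAlgError as err:
    print("Cannot invert X'X:", err)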
NEAR-EXACT LINEAR RELATIONSHIP
A near-exact linear relationship is said to exist if the following
condition is satisfied:

λ2 X2 + λ3 X3 + v = 0

where λ2 and λ3 are constants, not all of them zero simultaneously, and v is
a random error term.

In our example, let X3* = 3 X2 + v, where v takes the values 4, 1, 3, 1, 0:

X2    X3 (= 3 X2)    X3* (= X3 + v)
3     9              13 (+4)
5     15             16 (+1)
6     18             21 (+3)
8     24             25 (+1)
10    30             30 (+0)

X3* is not an exact linear combination of X2, but the two variables are very
highly correlated (correlation coefficient = 0.9959).

MULTICOLLINEARITY: NATURE
Points to remember
Multicollinearity, as we have defined it, refers only to linear relationships
among the explanatory variables; it does not rule out nonlinear relationships
among them.
THEORETICAL CONSEQUENCES OF MULTICOLLINEARITY
When multicollinearity is severe, the F test will in most cases reject the
null hypothesis that the partial slope coefficients are jointly
(simultaneously) equal to zero, but individual t tests will show that none
or very few of the partial slope coefficients are statistically different
from zero.
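A small simulated illustration of this symptom, sketched in Python with statsmodels (the sample size, noise level, and coefficient values are assumptions chosen only for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50

# Two regressors that are almost perfectly collinear.
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.01, size=n)
y = 1.0 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

print(res.rsquared)               # high R-squared
print(res.fvalue, res.f_pvalue)   # overall F test: strongly significant
print(res.pvalues[1:])            # individual t tests: typically insignificant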
Subsidiary, or auxiliary regressions.
One way of finding out which X variable is highly collinear with the other
X variables is to run a set of auxiliary regressions, regressing each X in
turn on the remaining Xs:

Regress X2 on the remaining Xs
Regress X3 on the remaining Xs
...
Regress Xk on the remaining Xs

From each auxiliary regression obtain the R² and test its significance with
the following F statistic:
F = [R² / (k − 1)] / [(1 − R²) / (n − k)]
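A minimal Python sketch of one auxiliary regression, assuming numpy and statsmodels are available; the helper name auxiliary_r2_and_F is hypothetical, and the F statistic follows the formula above:

import numpy as np
import statsmodels.api as sm

def auxiliary_r2_and_F(X, j):
    # Regress column j of X on the remaining columns (plus an intercept)
    # and return the auxiliary R^2 together with
    # F = [R^2/(k - 1)] / [(1 - R^2)/(n - k)],
    # where n = number of observations, k = number of estimated parameters.
    n = X.shape[0]
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    res = sm.OLS(target, sm.add_constant(others)).fit()
    r2 = res.rsquared
    k = others.shape[1] + 1          # slopes plus the intercept
    F = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    return r2, F

# Example: with X holding the regressors column by column,
# auxiliary_r2_and_F(X, 0) gives R^2 and F for the first regressor.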
DETECTION OF MULTICOLLINEARITY: VARIANCE INFLATION FACTOR
Run the auxiliary regression and plug the value of R² into the following
formula. The variance inflation factor is:

VIF = 1 / (1 − R²)
Variance inflation factors range from 1 upwards. The numerical value of the
VIF tells us by how much the variance of a coefficient is inflated relative
to the case of no multicollinearity. For example, a VIF of 1.9 tells you
that the variance of a particular coefficient is 90% bigger than what you
would expect if there were no multicollinearity.
A rule of thumb for interpreting the variance inflation factor:
1 = not correlated.
Between 1 and 5 = moderately correlated.
Greater than 5 = highly correlated.
The higher the VIF, the less reliable the regression results are going to
be. In general, a VIF above 10 indicates high correlation and is cause for
concern; some authors suggest a more conservative threshold of 2.5.
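In practice the VIFs can be computed directly; for example, statsmodels provides variance_inflation_factor. A short Python sketch with simulated, purely illustrative data:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + rng.normal(scale=0.5, size=n)    # correlated with x2
X = sm.add_constant(np.column_stack([x2, x3]))

# variance_inflation_factor regresses column j on all the other columns
# and returns 1 / (1 - R_j^2).
for j in (1, 2):
    print("VIF for regressor", j, "=", round(variance_inflation_factor(X, j), 2))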
REMEDIAL MEASURES
Dropping a Variable(s) from the Model
Faced with severe multicollinearity, the simplest solution might seem to be
to drop one or more of the collinear variables.
But this remedy can be worse than the disease (multicollinearity). When
formulating an economic model, we base the model on some theoretical
considerations.
Suppose that we are modelling the demand for chicken which, theoretically,
depends among other things on disposable income, the price of chicken, the
price of beef, and the price of goat. Following economic theory, we expect
all three prices to have some effect on the demand for chicken, since the
three meat products are to some extent competing products.
Suppose that, because of collinearity, we fail to separate the influence of
the prices of beef and goat on the quantity of chicken demanded. Dropping
those variables from the model, however, will lead to what is known as
model specification error.
The best practical advice is not to drop a variable from an economically
viable model just because the collinearity problem is serious.
REMEDIAL MEASURES
PRIOR INFORMATION ABOUT SOME PARAMETERS
Sometimes a particular phenomenon, such as a demand function, is
investigated time and again.
From prior studies it is possible that we can have some knowledge of the
values of one or more parameters. This knowledge can be profitably used in
the current sample.
Consider the following demand function, where quantity demanded depends on
price and income, price and income are collinear in our sample, and earlier
research has found an income coefficient of 0.9 in a similar regression.
We can use this information by imposing the prior value: subtract
0.9 × income from the dependent variable and regress the result on the
remaining regressors, estimating only the remaining parameters.
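A minimal Python sketch of this idea, assuming statsmodels and using simulated, purely illustrative data and variable names (quantity, price, income); the prior value 0.9 is taken from the example above:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 80

# Illustrative data: price and income are collinear in the sample.
income = rng.normal(loc=50, scale=5, size=n)
price = 0.2 * income + rng.normal(scale=0.5, size=n)
quantity = 10 - 2.0 * price + 0.9 * income + rng.normal(size=n)

# Impose the prior estimate (income coefficient = 0.9): move its contribution
# to the left-hand side and estimate only the remaining parameters.
y_star = quantity - 0.9 * income
res = sm.OLS(y_star, sm.add_constant(price)).fit()
print(res.params)   # intercept and price coefficient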
Ridge regression