
VIOLATION OF CLRM ASSUMPTIONS:

MULTICOLLINEARITY

Chapter 8
MULTICOLLINEARITY: NATURE

 One of the assumptions of the CLRM is that there is no exact linear
relationship between the independent variables of a regression model.
 Informally, no exact linear relationship or no collinearity means that a
variable, say X2, cannot be expressed as an exact linear function of
another variable, say X3.

Y = β1 + β2X2 + β3X3 + e

 β2 is the impact of X2 on Y holding all other factors constant. If X2 is
related to X3, then β2 also captures the impact of changes in X3. In other
words, interpretation of the parameters becomes difficult.
 When the explanatory variables are very highly correlated with each
other (correlation coefficients either very close to 1 or to -1), the
problem of multicollinearity occurs.
MULTICOLLINEARITY: NATURE

 Perfect linear relationship: Assume we have the following model:

Y = β1 + β2X2 + β3X3 + e

 Where the sample values for X2 and X3 are:

X2: 1   2   3   4   5    6
X3: 2   4   6   8   10   12

 From the above table, we see that X3 = 2X2.

 Therefore, although it seems that there are two explanatory variables, in
fact there is only one. This is because X3 is an exact linear function of X2,
or because X2 and X3 are perfectly collinear.
PERFECT LINEAR RELATIONSHIP: DEFINITION

 In the presence of multicollinearity, we cannot disentangle the
individual effects of X2 and X3 on Y.
 Exact linear relationship: For a k-variable regression model
involving the explanatory variables X1, X2, ..., Xk (X1 is the intercept
and has a value of 1), an exact linear relationship is said to exist if
the following condition is satisfied:

λ1X1 + λ2X2 + ... + λkXk = 0

 Where λ1, λ2, ..., λk are constants and not all of them are zero
simultaneously.
 In our case the equation λ2X2 + λ3X3 = 0 can be satisfied for non-zero
values of both λ2 and λ3.
 We have:

−2X2 + X3 = 0, that is λ2 = −2 and λ3 = 1.
EXACT LINEAR RELATIONSHIP: CONSEQUENCE

 Under perfect multicollinearity, the OLS estimators simply do not exist.
 If you try to estimate an equation in EViews or any other computer
package and your equation specification suffers from perfect
multicollinearity, you will get an error message that the regression
cannot be run.

 The OLS estimator is β̂ = (XᵀX)⁻¹XᵀY. In case of perfect
multicollinearity, we cannot get the inverse of (XᵀX), because it
becomes singular.
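The failure can be seen numerically. Below is a minimal sketch, assuming numpy is available (the library choice is mine, not the slides'), that builds the X matrix from the earlier example with X3 = 2X2 and shows that XᵀX cannot be inverted.

```python
# Perfect collinearity: with X3 = 2*X2, the matrix X'X is rank deficient,
# so its inverse does not exist and the OLS formula cannot be evaluated.
import numpy as np

x2 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x3 = 2 * x2                                  # exact linear dependence
X = np.column_stack([np.ones(6), x2, x3])    # constant, X2, X3

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))            # 2, not 3: one column is redundant
print(np.linalg.det(XtX))                    # zero (up to rounding): singular

# Depending on rounding, np.linalg.inv(XtX) either raises LinAlgError
# ("Singular matrix") or returns numerically meaningless, huge values.
```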
NEAR-EXACT LINEAR RELATIONSHIP
 A near-exact linear relationship is said to exist if the following
condition is satisfied:

λ1X1 + λ2X2 + ... + λkXk + v = 0

 Where λ1, λ2, ..., λk are constants, not all of them zero
simultaneously, and v is a random error term.
 In our example, X3 = 2X2 + v shows that X3 is not an exact linear
combination of X2, because X3 is also determined by the stochastic
error term v.
 That is, λ2 = −2 and λ3 = 1 alone do not satisfy the relationship; we
also need to know the value of the random error term.
MULTICOLLINEARITY: NATURE

 Points to remember
 Multicollinearity, as we have defined it, refers only to linear
relationships among the variables of a regression model. But variables
may be nonlinearly related, as in:

Y = β1 + β2X + β3X² + β4X³ + e

 The variables X² and X³ are functionally related to X, but the
relationship is nonlinear; thus the model does not violate the
assumption of no multicollinearity.
MULTICOLLINEARITY: EXAMPLE

X2      X3 = 3X2      X3* = 3X2 + v
3       9             13  (+4)
5       15            16  (+1)
6       18            21  (+3)
8       24            25  (+1)
10      30            30  (+0)

X2 and X3 are perfectly collinear, while X2 and X3* are very highly, though
not perfectly, correlated (correlation coefficient 0.9959).
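A quick check of the table, as a sketch assuming numpy (not part of the original slides): X3 is exactly three times X2, while X3* adds a small disturbance, so X2 and X3* are highly but not perfectly correlated.

```python
# Correlations for the example table above (numpy assumed).
import numpy as np

x2      = np.array([3, 5, 6, 8, 10], dtype=float)
x3      = 3 * x2                               # perfectly collinear with X2
x3_star = x3 + np.array([4, 1, 3, 1, 0.0])     # near-exact collinearity

print(np.corrcoef(x2, x3)[0, 1])       # 1.0: perfect collinearity
print(np.corrcoef(x2, x3_star)[0, 1])  # very close to 1: near-exact collinearity
```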
THEORETICAL CONSEQUENCES OF MULTICOLLINEARITY

 CASE 1: PERFECT LINEAR RELATIONSHIP BETWEEN VARIABLES


 If multicollinearity is perfect (exact linear relationship), the
partial regression coefficients of the variables are
indeterminate and their standard errors are infinite.
 If X2 and X3 are perfectly collinear, there is no way that X3 can be
kept constant when X2 changes. As long as we fail to separate the
individual effects of X2 and X3 on Y, we cannot get a unique solution
for the individual regression coefficients.
THEORETICAL CONSEQUENCES OF MULTICOLLINEARITY

 CASE 2: NEAR PERFECT LINEAR RELATIONSHIP BETWEEN VARIABLES


 If multicollinearity is less than perfect, the regression
coefficients, although determinate, possess large standard errors
(in relation to the coefficients themselves), which means that the
coefficients cannot be estimated with great precision.
 The effect of multicollinearity is to make it hard to obtain
estimates of coefficients with small standard errors.
 However, having a small number of observations has a similar effect.
For this reason, multicollinearity is sometimes regarded as essentially
a small-sample problem, and some economists use the parallel term
micronumerosity for the problem of too few observations.
PRACTICAL CONSEQUENCES OF MULTICOLLINEARITY

In case of near-perfect or high multicollinearity, one is likely to face
the following problems:
1. Although BLUE, the OLS estimates have large variances and covariances,
making precise estimation difficult.
2. Because of 1, the confidence intervals for the parameter estimates
become wide, leading to accepting the null hypothesis that the true
coefficient is zero more readily.
3. Also because of 1, the t statistics of one or more coefficients tend to
be statistically insignificant.
4. Although the t statistics of one or more coefficients are statistically
insignificant, R², the overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small
changes in the data (the sketch below illustrates points 1 and 5).
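The following simulation is a rough sketch assuming numpy (the data, coefficients, and function name are illustrative, not from the slides). It compares OLS with weakly and highly correlated regressors: under high collinearity the standard errors blow up and the estimates become fragile.

```python
# Illustrative simulation: true model Y = 1 + 2*X2 + 3*X3 + error,
# estimated with X3 either weakly or very strongly related to X2.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)

def ols_with_x3_noise(noise_scale):
    # Smaller noise_scale -> X3 closer to X2 -> stronger collinearity.
    x3 = x2 + rng.normal(scale=noise_scale, size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.sum((y - X @ b) ** 2) / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se

for scale in (1.0, 0.01):   # low vs. high collinearity
    b, se = ols_with_x3_noise(scale)
    print(f"noise {scale}: coefficients {b.round(2)}, std. errors {se.round(2)}")
```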
DETECTION OF MULTICOLLINEARITY
 High R² but few significant t ratios.
 This is the "classic" symptom of multicollinearity. If R² is high, say, in
excess of 0.8, the F test in most cases will reject the null hypothesis that
the partial slope coefficients are jointly or simultaneously equal to zero.
But individual t tests will show that none or very few of the partial slope
coefficients are statistically different from zero.
 Subsidiary, or auxiliary, regressions.
 One way of finding out which X variable is highly collinear with the other
X variables in the model is to regress each X variable on the remaining X
variables and to compute the corresponding R².
 Consider the regression of Y on six explanatory variables, X2 through X7.
 If this regression shows that the R² is high but very few X coefficients
are individually statistically significant, we then look for the "culprit,"
the variable(s) that may be a perfect or near-perfect linear combination of
the other X's.
DETECTION OF MULTICOLLINEARITY

 Regress X2 on the remaining X's and obtain its R²
 Regress X3 on the remaining X's and obtain its R²
 Regress X4 on the remaining X's and obtain its R²
 Regress X5 on the remaining X's and obtain its R²
 Regress X6 on the remaining X's and obtain its R²
 Regress X7 on the remaining X's and obtain its R²

For each auxiliary regression, compute

F = [R²/(k − 1)] / [(1 − R²)/(n − k)]

where R² is that auxiliary regression's R², k is the number of parameters it
estimates (including the constant), and n is the sample size.
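A concrete sketch of this diagnostic, assuming numpy (the data are simulated and the function name is mine; treating k as the number of parameters in each auxiliary regression is my reading of the formula above):

```python
# Auxiliary regressions: regress each X on the remaining X's, then compute
# F = [R²/(k-1)] / [(1-R²)/(n-k)] for each auxiliary regression.
import numpy as np

def auxiliary_diagnostics(X):
    """X: (n, p) array of explanatory variables, constant excluded."""
    n, p = X.shape
    results = []
    for j in range(p):
        y = X[:, j]                                   # the X being "explained"
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b = np.linalg.lstsq(Z, y, rcond=None)[0]
        resid = y - Z @ b
        r2 = 1 - resid.var() / y.var()
        k = Z.shape[1]                                # parameters incl. constant
        F = (r2 / (k - 1)) / ((1 - r2) / (n - k))
        results.append((j, r2, F))
    return results

rng = np.random.default_rng(1)
x2 = rng.normal(size=100)
x3 = 2 * x2 + rng.normal(scale=0.1, size=100)   # nearly collinear with X2
x4 = rng.normal(size=100)
for j, r2, F in auxiliary_diagnostics(np.column_stack([x2, x3, x4])):
    print(f"X{j + 2}: auxiliary R^2 = {r2:.3f}, F = {F:.1f}")
```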
DETECTION OF MULTICOLLINEARITY: VARIANCE INFLATION FACTOR
 Run the auxiliary regression for each explanatory variable and plug its R²
into the following formula.
 The variance inflation factor: VIF = 1 / (1 − R²).
 Variance inflation factors range from 1 upwards. The numerical value of the
VIF tells us (in decimal form) by what percentage the variance is inflated for
each coefficient. For example, a VIF of 1.9 tells you that the variance of a
particular coefficient is 90% bigger than what you would expect if there were
no multicollinearity.
 A rule of thumb for interpreting the variance inflation factor:
1 = not correlated.
Between 1 and 5 = moderately correlated.
Greater than 5 = highly correlated.
 What is known is that the more your VIF increases, the less reliable your
regression results are going to be. In general, a VIF above 10 indicates high
correlation and is cause for concern. Some authors suggest a more
conservative level of 2.5 or above.
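A minimal sketch of the VIF computation, assuming the statsmodels package is available (the library is my choice and the data are simulated for illustration):

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the
# other explanatory variables.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x2 = rng.normal(size=200)
x3 = x2 + rng.normal(scale=0.2, size=200)   # highly correlated with X2
x4 = rng.normal(size=200)

exog = sm.add_constant(np.column_stack([x2, x3, x4]))
for idx, name in zip(range(1, 4), ["X2", "X3", "X4"]):
    print(name, round(variance_inflation_factor(exog, idx), 2))
# X2 and X3 show large VIFs; X4 stays near 1.
```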
REMEDIAL MEASURES

 Suppose that, on the basis of one or more of the diagnostic tests, we
find that a particular model is plagued by multicollinearity.
 What solution(s), if any, can be used to reduce the severity of the
collinearity problem, if not eliminate it completely?
 Unfortunately, as in the case of collinearity diagnostics, there is
no silver bullet; there are only a few rules of thumb.
 What can be done if multicollinearity is serious? We have two choices:
(1) do nothing, or (2) follow some rules of thumb.

REMEDIAL MEASURES
 Dropping a Variable(s) from the Model
 Faced with severe multicollinearity, the simplest solution might seem to be
to drop one or more of the collinear variables.
 But this remedy can be worse than the disease (multicollinearity). When
formulating an economic model, we base the model on some theoretical
considerations.
 Suppose that we are modelling the demand for chicken, which theoretically,
among other things, depends on disposable income, the price of chicken, the
price of beef, and the price of goat. In this example, following economic
theory, we expect all three prices to have some effect on the demand for
chicken, since the three meat products are to some extent competing products.
 Suppose we fail to separate the influence of the prices of beef and goat on
the quantity of chicken demanded. But dropping those variables from the model
will lead to what is known as a model specification error.
 The best practical advice is not to drop a variable from an economically
viable model just because the collinearity problem is serious.
REMEDIAL MEASURES

 Acquiring Additional Data or a New Sample


 Since multicollinearity is a sample feature, it is possible that in
another sample involving the same variables, collinearity may not be as
serious as in the first sample.
 The important practical question is whether we can obtain another
sample, for the collection of data can be costly.
 Sometimes just acquiring additional data, that is, increasing the
sample size, can reduce the severity of the collinearity problem.
 But if cost constraints are not very prohibitive, getting more data,
i.e., increasing the sample size, is certainly feasible.
RETHINKING THE MODEL
 Sometimes a model chosen for empirical analysis is not carefully
thought out.
 Maybe some important variables are omitted, or maybe the

functional form of the model is incorrectly chosen.


 Thus, the demand function for chicken can be estimated in a log-linear
specification or in a linear-in-variables specification.
 If collinearity is high in the log-linear specification, the demand
function can be estimated using a linear-in-variables (LIV) model instead.
 It is possible that in the LIV specification the extent of collinearity
may not be as high as in the log-linear specification.

PRIOR INFORMATION ABOUT SOME PARAMETERS
 Sometimes a particular phenomenon, such as a demand function, is
investigated time and again.
 From prior studies it is possible that we can have some knowledge of the
values of one or more parameters. This knowledge can be profitably used in
the current sample.
 Consider the following demand function, where we find collinearity between
price (X2) and income (X3), but prior research has found an income coefficient
of about 0.9 in similar regressions:

Y = β1 + β2X2 + β3X3 + e

 We can use this information for β3 and estimate:

Y − 0.9X3 = β1 + β2X2 + e

 Assuming that the prior information is correct, we have resolved the
collinearity problem, for income no longer appears as a regressor on the
right-hand side of the above model.
 We now have only one explanatory variable and no question of collinearity
arises.
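A small sketch of this approach, assuming numpy (the linear demand form, the variable roles, and the simulated data are my assumptions, since the slide's original equation was not preserved):

```python
# Impose the prior beta3 = 0.9: move the income term to the left-hand side
# and regress (Y - 0.9*X3) on X2 alone. Data are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n = 100
x3 = rng.normal(size=n)                          # income
x2 = 0.8 * x3 + rng.normal(scale=0.1, size=n)    # price, collinear with income
y = 1.0 - 1.5 * x2 + 0.9 * x3 + rng.normal(scale=0.5, size=n)

y_star = y - 0.9 * x3                            # use the prior information
Z = np.column_stack([np.ones(n), x2])            # only one regressor remains
b = np.linalg.lstsq(Z, y_star, rcond=None)[0]
print(b)   # estimates of beta1 and beta2; collinearity is no longer an issue
```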
TRANSFORMATION OF VARIABLES
 Transformation of Variables
 Occasionally, transformation of variables included in the model can

minimize, if not solve, the problem of collinearity.


 Consider the following estimated regression of imports on GNP and the
consumer price index (CPI):

Imports = b1 + b2·GNP + b3·CPI
t =        N.A.    (1.232)    (1.844)        R² = 0.9894

Where Imports is in billions of dollars, GNP is in billions of dollars, and
CPI is the consumer price index.
 In theory, imports are positively related to GNP (a measure of income)
and to domestic prices.
income) and domestic prices.


To resolve the collinearity, consider the following transformation: deflate
both imports and GNP by the CPI, i.e., regress real imports (Imports/CPI) on
real GNP (GNP/CPI).

This regression shows that real imports are statistically significantly and
positively related to real income, the estimated t value being highly
significant. The trick is to transform nominal values into real values.
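A sketch of the transformation, assuming numpy (the series below are simulated purely for illustration, not the slide's data): deflate nominal imports and nominal GNP by the CPI and regress real imports on real GNP.

```python
# Regress real imports (Imports/CPI) on real GNP (GNP/CPI) instead of
# regressing nominal imports on GNP and CPI jointly.
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(30)
cpi = 100 * 1.03 ** t                                   # rising price level
gnp = 500 * 1.05 ** t * np.exp(rng.normal(scale=0.01, size=30))
imports = 0.1 * gnp * np.exp(rng.normal(scale=0.02, size=30))

real_imports = imports / cpi
real_gnp = gnp / cpi
Z = np.column_stack([np.ones(30), real_gnp])
b = np.linalg.lstsq(Z, real_imports, rcond=None)[0]
print(b)   # real imports rise with real income; the common CPI trend is gone
```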
OTHER REMEDIES
 Combining time series and cross-sectional data.
 Factor or principal component analysis, and
 Ridge regression.

These methods will not be covered; they are perhaps beyond the scope of this
course.
