
STK 310 (Multicollinearity (MC))

Some ideas:
MC is a problem of an exact or near-exact linear relationship between the explanatory (feature)
variables. It is not a problem in the structure of the error terms.
Is this really a problem?

Consider the model:

Yi = β̂1 + β̂2 X2i + β̂3 X3i + ûi    (1)

Ŷi = β̂1 + β̂2 X2i + β̂3 X3i    (2)

or in deviation form (yi = Yi − Ȳ, etc.):

yi = β̂2 x2i + β̂3 x3i + ûi    (3)

ŷi = β̂2 x2i + β̂3 x3i    (4)

It is known that:

var(β̂2) = σ² / (Σ x2i² (1 − r23²)) = σ²/Σ x2i² × 1/(1 − r23²)    (5)

var(β̂3) = σ² / (Σ x3i² (1 − r23²)) = σ²/Σ x3i² × 1/(1 − r23²)    (6)

cov(β̂2, β̂3) = −r23 σ² / ((1 − r23²) √(Σ x2i² Σ x3i²))    (7)

where r23 is the correlation coefficient between X2 and X3 .


What happens if X2 and X3 are independent?
What happens if X2 and X3 are perfectly linearly related?
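The two questions above can be answered numerically. Below is a minimal sketch (the values of r23 are illustrative, not from the notes) of how the factor 1/(1 − r23²) in equation (5) behaves as r23 moves from 0 towards 1:

```python
import numpy as np

# Sketch: how var(beta_hat_2) grows with the correlation r23 between X2 and X3.
# The factor 1/(1 - r23^2) is the multiplier appearing in equation (5).
for r23 in [0.0, 0.5, 0.9, 0.99]:
    vif = 1.0 / (1.0 - r23**2)
    print(f"r23 = {r23:4.2f}  ->  variance multiplied by {vif:7.2f}")
# At r23 = 0 (independent regressors) the factor is 1: no inflation.
# As r23 -> 1 (perfect linear relation) the factor diverges: var(beta_hat) blows up.
```

So with independent regressors the variances reduce to the simple-regression form, while under perfect linear dependence the variances (and the covariance in equation (7)) are undefined.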

Matrix results (considering equations (1) and (2) above):

y = Xβ̂ + û    (8)

β̂ = (X′X)⁻¹X′y    (9)

cov(β̂) = σ²(X′X)⁻¹    (10)

In order to determine β̂ or cov(β̂) we need to invert X′X, which requires |X′X| ≠ 0.


What happens with |X 0 X| when X2 and X3 are perfectly linearly related?
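A small numerical sketch of this question, using hypothetical data in which X3 is an exact multiple of X2:

```python
import numpy as np

# Sketch (hypothetical data): X3 = 2*X2 exactly, so X2 and X3 are perfectly
# linearly related. Then X'X is singular, |X'X| = 0, and the inverse required
# in equations (9)-(10) does not exist: beta_hat cannot be computed.
X2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X3 = 2.0 * X2                                # exact linear dependence
X = np.column_stack([np.ones_like(X2), X2, X3])

XtX = X.T @ X
print("det(X'X) =", np.linalg.det(XtX))      # zero up to floating-point rounding
print("rank(X)  =", np.linalg.matrix_rank(X))  # 2, not 3: one column is redundant
```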


Sources of MC? See textbook/class discussion for more detail.

1. Data collection method

2. Constraints on the model

3. Model specification

4. Overdetermined model

Consequences of MC? See textbook/class discussion for more detail.

1. Although the OLS estimators remain BLUE, their variances are inflated.
   Intuition? What is the impact of inflated variances?

var(β̂j) = σ²/Σ xji² × VIFj    (11)

VIFj = 1/(1 − Rj²)    (12)

where
β̂j = the estimated partial regression coefficient of variable Xj
Rj² = the R² of the regression of Xj on the remaining (k − 2) explanatory variables
Σ xji² = Σ (Xji − X̄j)²

Sometimes we use the tolerance: TOLj = 1/VIFj.
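The quantities in equations (11)-(12) can be computed directly via the auxiliary-regression route. A minimal sketch with simulated data (the data-generating values here are illustrative assumptions, not from the notes):

```python
import numpy as np

# Sketch: compute VIF_j of equation (12) by regressing X3 on the other
# explanatory variable(s) and plugging the resulting R^2 into 1/(1 - R^2).
rng = np.random.default_rng(0)
n = 200
X2 = rng.normal(size=n)
X3 = 0.95 * X2 + 0.1 * rng.normal(size=n)    # strongly collinear with X2

# Auxiliary regression of X3 on X2 (with intercept) gives R_j^2.
A = np.column_stack([np.ones(n), X2])
coef, *_ = np.linalg.lstsq(A, X3, rcond=None)
resid = X3 - A @ coef
R2 = 1.0 - resid.var() / X3.var()

VIF = 1.0 / (1.0 - R2)
TOL = 1.0 / VIF
print(f"R2 = {R2:.3f}, VIF = {VIF:.1f}, TOL = {TOL:.3f}")
```

A VIF this large means var(β̂j) is many times what it would be with orthogonal regressors, which is exactly the inflation equation (11) describes.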

2. Confidence intervals tend to be much wider, e.g. β̂j ± t(α/2) se(β̂j).

3. t-ratios tend to be lower (statistically insignificant).

4. Low t-values but high R2

5. OLS estimators and standard errors sensitive to small changes in the data.


Detection of MC

1. High R2 but few significant t ratios.

2. High pairwise correlation between explanatory variables

3. Examination of partial correlation

4. Auxiliary regressions

5. Eigenvalues and condition index

6. Variance inflation factors (and tolerance)

7. Plots
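Detection method 5 (eigenvalues and condition index) can be sketched numerically. The cutoff of 30 used below is a common textbook rule of thumb, and the simulated data are illustrative assumptions:

```python
import numpy as np

# Sketch: condition index = sqrt(lambda_max / lambda_min) from the eigenvalues
# of X'X. A condition index above roughly 30 is commonly read as a sign of
# severe multicollinearity (rule of thumb, not a hard threshold).
rng = np.random.default_rng(1)
n = 100
X2 = rng.normal(size=n)
X3 = X2 + 0.01 * rng.normal(size=n)          # near-exact linear relation
X = np.column_stack([np.ones(n), X2, X3])

eigvals = np.linalg.eigvalsh(X.T @ X)        # eigenvalues of the symmetric X'X
condition_index = np.sqrt(eigvals.max() / eigvals.min())
print(f"condition index = {condition_index:.1f}")
```

A tiny smallest eigenvalue signals a near-singular X′X, which ties this diagnostic back to the determinant question raised earlier.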

Remedial measures

1. Use a priori information

2. Extending/expanding data

3. Removing variables

4. Transformation of variables

5. Ridge regression, factor analysis regression, etc.
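Remedial measure 5 can be illustrated with the closed-form ridge estimator β̂(ridge) = (X′X + λI)⁻¹X′y. This is a minimal sketch with simulated, perfectly collinear data; the penalty λ = 1 is an arbitrary illustrative choice, not a recommended value:

```python
import numpy as np

# Sketch of ridge regression: adding lambda*I to X'X makes the matrix
# invertible even when |X'X| = 0, at the cost of some bias in beta_hat.
rng = np.random.default_rng(2)
n = 50
X2 = rng.normal(size=n)
X3 = 2.0 * X2                                # perfectly collinear: OLS breaks down
y = 1.0 + X2 + X3 + rng.normal(size=n)
X = np.column_stack([np.ones(n), X2, X3])

lam = 1.0                                    # hypothetical ridge penalty
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print("ridge estimates:", beta_ridge)        # finite even though |X'X| = 0
```

Note that ridge cannot recover β2 and β3 separately here (no method can, since X3 = 2X2), but the combined effect β̂2 + 2β̂3 is estimated sensibly.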
