3.0 ErrorVar and OLSvar-1
MFE 402
Dan Yavorsky
Topics for Today
• Residuals
• Projections
• R Squared
• CEF Error Variance
• Variance of the OLS Estimator
Residuals
OLS Fitted Values and Residuals
Define the fitted values and residuals:

Ŷᵢ = Xᵢ′β̂  and  êᵢ = Yᵢ − Ŷᵢ = Yᵢ − Xᵢ′β̂

Thus, assuming a linear CEF (or best linear predictor of Y, or best linear approximation to the CEF), we have:

Yᵢ = Xᵢ′β̂ + êᵢ

Note that the residuals êᵢ are distinct from the CEF errors eᵢ = Yᵢ − Xᵢ′β: the residuals are constructed from the estimate β̂, while the errors depend on the unknown β.

[Figure: scatterplot with fitted regression line; points on the line are fitted values, points above the line have positive residuals, points below have negative residuals]
Two Algebraic Properties of Residuals
1. The sample correlation between the regressors and the residuals is zero:

∑ᵢ₌₁ⁿ Xᵢ êᵢ = 0

2. When Xᵢ contains a constant, so that the model has an intercept, then

∑ᵢ₌₁ⁿ êᵢ = 0

These two properties:
• are the first-order conditions when solving for the OLS estimator
• offer a nice parallel to the population moment conditions 𝔼[e] = 0 and 𝔼[Xe] = 0
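These properties are easy to verify numerically. A minimal sketch in R using simulated data (the variables here are illustrative, not from the slides; this simulated example is reused in the sketches below):

set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2*x + rnorm(n)   # simulated DGP with an intercept
fit  <- lm(y ~ x)
ehat <- resid(fit)

sum(ehat)      # property 2: residuals sum to zero (intercept included)
sum(x * ehat)  # property 1: residuals are orthogonal to the regressor

Both sums are zero up to floating-point error.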
Residuals in Matrix Notation
Recall the matrix notation for the model and least squares estimator:

Y = Xβ + e,  β̂ = (X′X)⁻¹X′Y

Then:

Y = Xβ̂ + ê  or  ê = Y − Xβ̂

and the two algebraic properties above can be written compactly as

X′ê = 0
Projection Matrices
Visualizing Least Squares as Projection

[Figure: the vector Y is projected onto the plane spanned by the regressors X1 and X2; the projection is the fitted vector Ŷ, and the residual vector ê is orthogonal to that plane]
Projection Matrix
P = X(X′X)⁻¹X′

PY = X(X′X)⁻¹X′Y = Xβ̂ = Ŷ

• P is symmetric (P′ = P)
• P is idempotent (PP = P)
• P has k eigenvalues equal to 1 and n − k equal to 0
• trace(P) = k
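These four properties can be checked directly. A sketch in R, continuing the simulated example (X here stacks an intercept column with the regressor):

X <- cbind(1, x)                       # n x k design matrix, k = 2
P <- X %*% solve(t(X) %*% X) %*% t(X)

all.equal(P, t(P))       # symmetric
all.equal(P, P %*% P)    # idempotent
sum(diag(P))             # trace equals k = 2
all.equal(as.vector(P %*% y), unname(fitted(fit)))  # PY gives the fitted values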
Annihilator Matrix
M = Iₙ − P = Iₙ − X(X′X)⁻¹X′

The matrices P and M make it easy to show that the decomposition Y = Ŷ + ê into fitted values Ŷ and residuals ê is orthogonal:

Ŷ′ê = (PY)′(MY) = Y′PMY = 0  since  PM = P(Iₙ − P) = P − P² = 0

It follows that

Y′Y = Ŷ′Ŷ + 2Ŷ′ê + ê′ê = Ŷ′Ŷ + ê′ê  or  ∑ᵢ₌₁ⁿ Yᵢ² = ∑ᵢ₌₁ⁿ Ŷᵢ² + ∑ᵢ₌₁ⁿ êᵢ²

When the model includes an intercept, the same decomposition holds in deviations from the mean:

∑ᵢ₌₁ⁿ (Yᵢ − Ȳ)² = ∑ᵢ₌₁ⁿ (Ŷᵢ − Ȳ)² + ∑ᵢ₌₁ⁿ êᵢ²

where the three terms are the total sum of squares (TSS), the regression sum of squares (SSR), and the sum of squared errors (SSE), respectively.
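Both sum-of-squares identities can be verified numerically (continuing the simulated example):

yhat <- fitted(fit)
ybar <- mean(y)

sum(y^2) - (sum(yhat^2) + sum(ehat^2))                    # ~ 0
sum((y - ybar)^2) - (sum((yhat - ybar)^2) + sum(ehat^2))  # TSS - (SSR + SSE) ~ 0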
Coefficient of Determination (R²)

R² = SSR/TSS = 1 − SSE/TSS

“The fraction of the sample variance of Y which is explained by the least squares fit.”

Notice that adding regressors never decreases R², since SSE cannot increase when the model becomes more flexible.

A “high” value of R² is sometimes used to claim that a regression model is “valid”, or correctly specified, or highly accurate for prediction; none of these are necessarily true.
Error Variance
Unconditional Error Variance
An important measure of the dispersion about the CEF is the unconditional (on X) variance of the CEF error e:

σ² = var(e) = 𝔼[(e − 𝔼[e])²] = 𝔼[e²] = 𝔼[(Y − m(X))²]

σ² measures the amount of variation in Y which is not “explained” by the CEF 𝔼[Y|X].
Adding Regressors Changes the Regression Variance
Y = m(X) + e
  = explained + unexplained

σ² = 𝔼[e²]

Enlarging the set of regressors in X enlarges the explained component, so the error variance σ² (weakly) decreases as regressors are added.

A natural moment estimator of σ² uses the errors:

σ̃² = (1/n) ∑ᵢ₌₁ⁿ eᵢ²

But the errors eᵢ are not observed, so we first estimate them with the residuals êᵢ:

σ̂² = (1/n) ∑ᵢ₌₁ⁿ êᵢ²
σ̂² ≤ σ̃²

Because β̂ minimizes the sum of squared residuals over all coefficient vectors, σ̂² can be no larger than σ̃², which suggests that σ̂² is biased downward as an estimator of σ².

σ̂² is biased

Using ê = MY = M(Xβ + e) = Me and properties of the trace operator:

σ̂² = (1/n) ê′ê = (1/n) e′Me = (1/n) tr(e′Me) = (1/n) tr(Mee′)

𝔼[σ̂²|X] = (1/n) tr(𝔼[Mee′|X]) = (1/n) tr(M 𝔼[ee′|X])
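The inequality σ̂² ≤ σ̃² is easiest to see in a simulation, where the true errors are known (a sketch using the simulated example, whose DGP is y = 1 + 2x + e):

e <- y - (1 + 2*x)         # true errors, observable only in a simulation
sig2tilde <- mean(e^2)     # infeasible estimator based on the errors
sig2hat   <- mean(ehat^2)  # feasible estimator based on the residuals
sig2hat <= sig2tilde       # TRUE: OLS minimizes the sum of squared residuals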
σ̂² is biased (Cont.)

Under homoskedasticity, 𝔼[ee′|X] = σ²Iₙ and tr(M) = n − k, so

𝔼[σ̂²|X] = (1/n) tr(M 𝔼[ee′|X]) = (1/n) tr(M) σ² = ((n − k)/n) σ²

The “fix” is to propose an unbiased estimator s²:

s² = (n/(n − k)) σ̂² = (1/(n − k)) ∑ᵢ₌₁ⁿ êᵢ²

Terminology:
• The summary() command in R calls √s² the Residual Standard Error
• Some textbooks call √s² the Standard Error of the Regression
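In R, s² can be computed by hand and compared with what summary() reports (continuing the simulated example):

k  <- length(coef(fit))      # number of coefficients, here 2
s2 <- sum(ehat^2) / (n - k)  # unbiased estimator of the error variance
sqrt(s2)                     # the "Residual standard error"
summary(fit)$sigma           # summary() reports the same value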
Adjusted R²

Define σ̂²_Y = (1/n) ∑ᵢ₌₁ⁿ (Yᵢ − Ȳ)². Then

R² = 1 − SSE/TSS = 1 − σ̂²/σ̂²_Y

Since σ̂²_Y is biased for the variance of Y and σ̂² is biased for the error variance, an “adjusted R²” measure was proposed using unbiased estimators, often denoted R̄²:

R̄² = 1 − s²/s²_Y = 1 − ((n − 1)/(n − k)) ∑ᵢ₌₁ⁿ êᵢ² / ∑ᵢ₌₁ⁿ (Yᵢ − Ȳ)²

where s²_Y is the sample variance of Y: s²_Y = (n − 1)⁻¹ ∑ᵢ₌₁ⁿ (Yᵢ − Ȳ)²
Conditional Error Variance
Conditional Error Variance
σ²(x) = var(Y|X = x) = 𝔼[(Y − 𝔼[Y|X = x])² | X = x]

Equivalently, in terms of the CEF error e = Y − m(X):

var(e|X = x) = 𝔼[e²|X = x] = 𝔼[(Y − 𝔼[Y|X = x])² | X = x] = σ²(x)

Writing u = e/σ(X), a normalized error with conditional mean 0 and variance 1, the model can be expressed as

Y = m(X) + σ(X)u

Most econometric studies focus on m(x) and either ignore σ(x), treat it as a constant (σ(x) = σ), or treat it as a nuisance parameter.
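A small sketch of this skedastic form in R (the DGP below is illustrative):

set.seed(2)
xh <- rnorm(200)
u  <- rnorm(200)                # normalized error: mean 0, variance 1
yh <- 1 + 2*xh + exp(xh/2) * u  # Y = m(X) + sigma(X)u with sigma(x) = exp(x/2)

fit_h <- lm(yh ~ xh)
plot(xh, resid(fit_h))          # residual spread grows with x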
Homoskedasticity & Heteroskedasticity

Two terms are used to summarize assumptions about the conditional variance:

• Homoskedasticity: the conditional variance is constant, σ²(x) = σ² for all x
• Heteroskedasticity: the conditional variance σ²(x) depends on x

The homoskedasticity assumption is best viewed as a convenience, useful for:

• Simplifying calculations
• Teaching and learning
• Understanding a specific, unusual, and exceptional special case
Variance of OLS Estimator
Variance of the OLS Estimator

For the linear regression model Y = Xβ + e with 𝔼[e|X] = 0, write β̂ − β = (X′X)⁻¹X′e. Then, conditional on X,

V_β̂ = var(β̂|X) = (X′X)⁻¹ (X′DX) (X′X)⁻¹  where  D = var(e|X)
Variance of the OLS Estimator (cont.)

Writing σᵢ² = σ²(Xᵢ), the conditional variance matrix of the errors is diagonal:

D = var(e|X) = 𝔼[ee′|X] = diag(σ₁², σ₂², …, σₙ²)

Because:
• the iᵗʰ diagonal element of D is 𝔼[eᵢ²|X] = 𝔼[eᵢ²|Xᵢ] = σᵢ²
• the ijᵗʰ off-diagonal element of D is 𝔼[eᵢeⱼ|X] = 𝔼[eᵢ|Xᵢ]𝔼[eⱼ|Xⱼ] = 0, using independence across observations
Variance of the Estimator Under Homoskedasticity
Under homoskedasticity, σᵢ² = σ² for every observation, so

D = σ² Iₙ

and the sandwich form collapses:

V_β̂ = (X′X)⁻¹ X′𝔼[ee′|X] X(X′X)⁻¹ = (X′X)⁻¹ X′(σ²Iₙ)X (X′X)⁻¹ = σ²(X′X)⁻¹
Variance of the Estimator Under Homoskedasticity (one X)
Define the sample variance of X as s²_X = (n − 1)⁻¹ ∑ᵢ₌₁ⁿ (Xᵢ − X̄)².

Then for a simple linear regression model (X is a scalar random variable), the variance of the slope coefficient estimator is:

var(β̂₁|X) = σ² / ((n − 1) s²_X)

This equation makes it clear that the standard deviation of the slope estimator:
• increases with the error variance σ²
• decreases with the sample size n
• decreases with the variance of the regressor s²_X
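The formula can be checked against vcov() (a sketch, reusing the simulated example, which is homoskedastic by construction):

sX2 <- var(x)          # sample variance of the regressor
s2 / ((n - 1) * sX2)   # variance formula, with s^2 estimating sigma^2
vcov(fit)["x", "x"]    # the (slope, slope) entry matches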
Variance of the Estimator Under Homoskedasticity (one X): Graphically
The red side of each arrow indicates an increase in the corresponding parameter (i.e., either σ², n, or s²_X).

[Figure: three panels showing how the sampling variability of the fitted line responds to increases in s_X, n, and σ²]
Gauss-Markov Theorem
Under the linear regression model with homoskedastic errors, the OLS estimator β̂ is the Best (lowest variance) Linear Unbiased Estimator (BLUE).

A new paper by Hansen (2022) shows β̂ is BUE, best among all unbiased estimators rather than just the linear ones; future textbooks might call it the Gauss-Markov-Hansen Theorem!
Gauss-Markov Theorem Proof

Consider any linear estimator β̃ = A′Y, where A = A(X) is an n × k matrix that may depend on X. Unbiasedness requires

𝔼[β̃|X] = A′𝔼[Y|X] = A′Xβ = β  ⇒  A′X = Iₖ

Under homoskedasticity (D = σ²Iₙ),

var(β̃|X) = var(A′Y|X) = A′DA = σ²A′A

Using A′X = Iₖ,

A′A − (X′X)⁻¹ = A′A − A′X(X′X)⁻¹X′A = A′(Iₙ − P)A = A′MA ≥ 0

so var(β̃|X) ≥ σ²(X′X)⁻¹ = var(β̂|X): no linear unbiased estimator improves on OLS.
OLS Covariance Matrix Estimation Under Homoskedasticity
Under the assumption of homoskedasticity, the variance-covariance matrix of the OLS estimator is:

V⁰_β̂ = σ²(X′X)⁻¹

The most common estimator of V⁰_β̂ replaces σ² with its unbiased estimator s²:

V̂⁰_β̂ = s²(X′X)⁻¹

This estimator is itself unbiased for V⁰_β̂:

𝔼[V̂⁰_β̂ |X] = 𝔼[s²|X] (X′X)⁻¹ = σ²(X′X)⁻¹ = V⁰_β̂
OLS Covariance Matrix Estimation Under Heteroskedasticity
Without homoskedasticity the sandwich form does not collapse:

V_β̂ = (X′X)⁻¹ (X′DX) (X′X)⁻¹ = (X′X)⁻¹ (∑ᵢ₌₁ⁿ XᵢXᵢ′ 𝔼[eᵢ²|X]) (X′X)⁻¹

The “ideal” estimator replaces 𝔼[eᵢ²|X] with the squared errors eᵢ²:

V̂ᶦᵈᵉᵃˡ_β̂ = (X′X)⁻¹ (∑ᵢ₌₁ⁿ XᵢXᵢ′ eᵢ²) (X′X)⁻¹

It is infeasible because the errors are unobserved; feasible versions replace eᵢ² with the squared residuals êᵢ². Given any estimator V̂_β̂, the standard error of the jᵗʰ coefficient is

s(β̂ⱼ) = √[V̂_β̂]ⱼⱼ
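A sketch of a feasible sandwich estimator (the HC0 form) in R, applied to the heteroskedastic simulation above; the sandwich package’s vcovHC() provides this and refined variants:

Xh <- cbind(1, xh)
eh <- resid(fit_h)
XtXinv <- solve(t(Xh) %*% Xh)
meat   <- t(Xh) %*% diag(eh^2) %*% Xh  # sum of Xi Xi' * ehat_i^2
Vhat   <- XtXinv %*% meat %*% XtXinv   # sandwich estimator of V_beta
sqrt(diag(Vhat))                       # heteroskedasticity-robust standard errors

# compare: sandwich::vcovHC(fit_h, type = "HC0")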
Computation
Computation in R: lm()
Abridged summary() output for an lm() fit with regressor exper[sam]:

Residuals:
    Min      1Q  Median      3Q     Max
-2.3583 -0.4215  0.0042  0.4718  2.3569

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.876515   0.067631  42.532   <2e-16 ***
exper[sam]  0.004776   0.004335   1.102    0.272
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Computation in R: ŷ and ê
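A minimal sketch of two equivalent routes to ŷ and ê (continuing the simulated example):

# via lm() accessors
yhat <- fitted(fit)
ehat <- resid(fit)

# via matrix algebra
betahat <- solve(t(X) %*% X) %*% t(X) %*% y
yhat2   <- X %*% betahat
ehat2   <- y - yhat2
all.equal(as.vector(yhat2), unname(yhat))  # TRUE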
Computation in R: s(β̂)
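A sketch of computing the homoskedastic standard errors by hand and comparing with summary() (continuing the simulated example):

s2    <- sum(ehat^2) / (n - k)
Vhat0 <- s2 * solve(t(X) %*% X)            # s^2 (X'X)^{-1}
sqrt(diag(Vhat0))                          # standard errors s(beta_j)
summary(fit)$coefficients[, "Std. Error"]  # same values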
Computation in R: R² and R̄²

Computing R² directly from the sums of squares (SSE and TSS computed earlier):

1 - SSE/TSS
            [,1]
[1,] 0.004542129

or equivalently from the (biased) variance estimators:

sig2hat <- t(ehat) %*% ehat / n
sigYtilde <- sum((y - ybar)^2) / n
1 - sig2hat/sigYtilde
            [,1]
[1,] 0.004542129
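For R̄², a sketch using the unbiased estimators s² and s²_Y; summary() reports the same quantity:

k   <- length(coef(fit))
s2  <- sum(ehat^2) / (n - k)
s2Y <- var(y)               # sample variance of Y, unbiased
1 - s2/s2Y                  # adjusted R-squared
summary(fit)$adj.r.squared  # same value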