
Week 3: Error Variance & OLS Variance

MFE 402

Dan Yavorsky

1
Topics for Today

• Residuals
• Projections
• R Squared
• CEF Error Variance
• Variance of the OLS Estimator

2
Residuals
OLS Fitted Values and Residuals

Define:

• Fitted values: Ŷi = Xi′𝛽̂

• Residuals: êi = Yi − Ŷi

Thus, assuming a linear CEF (or best linear predictor of Y, or best linear approximation
to the CEF), we have:

Yi = Xi′𝛽 + ei   or   Yi = Xi′𝛽̂ + êi

Note that:

• The error ei is unobservable
• The residual êi is a statistic (a function of the data)
• We will use êi as an estimator of ei, hence the hat notation

3
Visualizing Residuals

[Figure: scatter plot with the fitted regression line; points above the line have positive residuals, points below the line have negative residuals.]

4
Two Algebraic Properties of Residuals

1. The regressors and the residuals are orthogonal in the sample:

\sum_{i=1}^{n} X_i \hat{e}_i = 0

2. When Xi contains a constant, so that the model has an intercept, then

\sum_{i=1}^{n} \hat{e}_i = 0

Notice that these:

• are the first-order conditions when solving for the OLS estimator
• offer a nice parallel to the moment conditions 𝔼[e] = 0 and 𝔼[Xe] = 0
5
Residuals in Matrix Notation

Recall the matrix notation for the model and least squares estimator:

Y = X𝛽 + e and 𝛽 ̂ = (X′ X)-1 X′ Y

Then:

Y = X𝛽 ̂ + e ̂ or e ̂ = Y - X𝛽 ̂

and the conditions from the previous slide are simply:

X′ e ̂ = 0

6
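A quick numerical check of X′ê = 0, a minimal sketch using simulated data (the variable names and data here are illustrative, not the CPS example used later in these slides):

# Minimal sketch: verify that the residuals are orthogonal to the regressors.
set.seed(1)
n <- 100
x <- cbind(1, rnorm(n))                  # intercept plus one regressor
y <- x %*% c(1, 2) + rnorm(n)            # simulated outcome
betahat <- solve(crossprod(x), crossprod(x, y))
ehat <- y - x %*% betahat                # residuals
t(x) %*% ehat                            # both entries are ~0 (rounding error only)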
Projection Matrices
Visualizing Least Squares as Projection
Y

^
e

X2

^
Y

X1
7
Projection Matrix

Define the n × n projection matrix P:

P = X(X′ X)-1 X′

This is sometimes called the “hat” matrix because

PY = X(X′ X)-1 X′ Y = X𝛽 ̂ = Ŷ

Many important properties:

• P is symmetric (P′ = P)
• P is idempotent (PP = P)
• P has k eigenvalues equaling 1 and n - k equaling 0
• trace(P) = k
8
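A short sketch checking these properties numerically (X here is an arbitrary simulated n × k design matrix, not course data):

# Minimal sketch: build P and verify symmetry, idempotency, trace, and eigenvalues.
set.seed(1)
X <- cbind(1, rnorm(100), rnorm(100))        # n = 100, k = 3
P <- X %*% solve(crossprod(X)) %*% t(X)
max(abs(P - t(P)))                           # ~0: P is symmetric
max(abs(P %*% P - P))                        # ~0: P is idempotent
sum(diag(P))                                 # 3, i.e. trace(P) = k
round(eigen(P, symmetric = TRUE)$values, 8)  # k ones and n - k zeros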
Annihilator Matrix

Define the n × n annihilator matrix M:

M = In - P = In - X(X′ X)-1 X′

It gets its name from the calculation of MX:

MX = (In - X(X′X)-1 X′)X = X - X(X′X)-1 X′X = X - XIk = 0

Useful relationships with M are:

ê = MY = Y − PY = Y − Ŷ
ê = MY = M(X𝛽 + e) = MX𝛽 + Me = Me

M is symmetric, idempotent, and it has trace(M) = n - k


9
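And a matching sketch for M (same illustrative simulated X as above, with a simulated Y):

# Minimal sketch: M annihilates X, maps Y to the residuals, and has trace n - k.
set.seed(1)
X <- cbind(1, rnorm(100), rnorm(100))
Y <- X %*% c(1, 2, -1) + rnorm(100)
M <- diag(100) - X %*% solve(crossprod(X)) %*% t(X)
max(abs(M %*% X))                        # ~0: MX = 0
ehat <- Y - X %*% solve(crossprod(X), crossprod(X, Y))
max(abs(M %*% Y - ehat))                 # ~0: MY equals the residuals
sum(diag(M))                             # 97, i.e. trace(M) = n - k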
R-Squared
Analysis of Variance

The matrices P and M make it easy to show that the decomposition of Y = Ŷ + e ̂ into fitted
values Ŷ and residuals e ̂ is orthogonal:

Ŷ ′ e ̂ = (PY)′ (MY) = Y′ PMY = Y′ (P - P)Y = 0

It follows that

Y'Y = \hat{Y}'\hat{Y} + 2\hat{Y}'\hat{e} + \hat{e}'\hat{e} = \hat{Y}'\hat{Y} + \hat{e}'\hat{e}   or that   \sum_{i=1}^{n} Y_i^2 = \sum_{i=1}^{n} \hat{Y}_i^2 + \sum_{i=1}^{n} \hat{e}_i^2

Replacing Y with the demeaned vector (Y − 1_n\bar{Y}) and noting the cross product is again zero gives

\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{TSS} = \underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{SSR} + \underbrace{\sum_{i=1}^{n} \hat{e}_i^2}_{SSE}

10
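A quick numerical check of the decomposition on simulated data (the model needs an intercept for the demeaned version to hold):

# Minimal sketch: verify TSS = SSR + SSE for a regression with an intercept.
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
TSS <- sum((y - mean(y))^2)
SSR <- sum((fitted(fit) - mean(y))^2)
SSE <- sum(resid(fit)^2)
c(TSS, SSR + SSE)                        # equal up to rounding error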
Coefficient of Determination (R2 )

A commonly reported statistic is the Coefficient of Determination (or R2 ):

R^2 = \frac{SSR}{TSS} = 1 - \frac{SSE}{TSS}

“The fraction of the sample variance of Y which is explained by the least squares fit.”

Notice:

• Minimizing SSE is the same as maximizing R2


• R2 (weakly) increases as more regressors are included in a regression model

A “high” value of R2 is sometimes used to claim that a regression model is “valid” or correctly
specified or highly accurate for prediction – none of these are necessarily true.

11
Error Variance
Unconditional Error Variance

An important measure of the dispersion about the CEF function is the unconditional
(on X) variance of the CEF error e:

\sigma^2 = var(e) = \mathbb{E}\left[(e - \mathbb{E}[e])^2\right] = \mathbb{E}[e^2] = \mathbb{E}\left[(Y - m(X))^2\right]

Econometricians have several names for this:

• Variance of the regression error


• Regression variance
• Error variance

𝜎2 measures the amount of variation in Y which is not “explained” by the CEF 𝔼[Y|X].

12
Adding Regressors Changes the Regression Variance

We can think of Y as the combination of an “explained” (by X) portion and an unexplained (by X) portion:

Y = m(X) + e
= explained + unexplained

Changing the conditioning information (the X’s)

• changes the CEF m(X)
• and thus changes the error e
• and thus changes the error variance 𝜎²

The relationship is monotonic: conditioning on more information (weakly) decreases 𝜎².


13
Estimate the Error Variance

The unconditional error variance is a moment:

𝜎2 = 𝔼[e2 ]

So a natural (analog, plug-in, or method of moments) estimator would be:

\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} e_i^2

But the errors ei are not observed, so we first estimate them with the residuals êi:

\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{e}_i^2
14
𝜎̂ 2 ≤ 𝜎̃ 2

The feasible estimator 𝜎̂² is (weakly) smaller than the idealized estimator 𝜎̃².

Rewrite the feasible estimator as:

\hat{\sigma}^2 = n^{-1}\hat{e}'\hat{e} = n^{-1}(Me)'(Me) = n^{-1}e'Me

Then take the difference:

\tilde{\sigma}^2 - \hat{\sigma}^2 = n^{-1}e'e - n^{-1}e'Me = n^{-1}e'(I_n - M)e = n^{-1}e'Pe

Since P is positive semi-definite, the quadratic form e'Pe ≥ 0, which implies 𝜎̂² ≤ 𝜎̃².

15


𝜎̂ 2 is biased

Recall two special properties of the trace operator:

• tr(AB) = tr(BA) when dim(A) = dim(B′)
• tr(A) = \sum_{i=1}^{k} \lambda_i for a square k × k matrix A with eigenvalues \lambda_1, …, \lambda_k

Then we can show:

\hat{\sigma}^2 = \frac{1}{n} e'Me = \frac{1}{n}\,\mathrm{tr}(e'Me) = \frac{1}{n}\,\mathrm{tr}(Mee')

Taking the conditional expected value:

\mathbb{E}[\hat{\sigma}^2 \mid X] = \frac{1}{n}\,\mathrm{tr}(\mathbb{E}[Mee' \mid X]) = \frac{1}{n}\,\mathrm{tr}(M\,\mathbb{E}[ee' \mid X])

16
𝜎̂ 2 is biased (Cont.)

Under an assumption of homoskedasticity, 𝔼[ee′ |X] = 𝜎2 In so that

\mathbb{E}[\hat{\sigma}^2 \mid X] = \frac{1}{n}\,\mathrm{tr}(M\,\mathbb{E}[ee' \mid X]) = \frac{1}{n}\,\mathrm{tr}(M)\,\sigma^2 = \frac{n-k}{n}\,\sigma^2

The “fix” is to propose an unbiased estimator s²:

s^2 = \frac{n}{n-k}\,\hat{\sigma}^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat{e}_i^2

Terminology:

• The summary() command in R reports s = √s² as the Residual Standard Error
• Some textbooks call s the Standard Error of the Regression
17
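A quick check of the terminology with a fitted lm object (illustrative simulated data): summary() reports s = √s², not s² itself.

# Minimal sketch: relate s^2 to the "Residual standard error" printed by summary().
set.seed(1)
x <- rnorm(60)
y <- 1 + 2 * x + rnorm(60)
fit <- lm(y ~ x)
n <- length(y); k <- 2
s2 <- sum(resid(fit)^2) / (n - k)
c(sqrt(s2), summary(fit)$sigma)          # identical: both equal s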
Adjusted R2
Define \hat{\sigma}_Y^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2. Then

R^2 = 1 - \frac{SSE}{TSS} = 1 - \frac{\hat{\sigma}^2}{\hat{\sigma}_Y^2}

Since 𝜎̂²Y is biased for the variance of Y and 𝜎̂² is biased for the error variance, an
“adjusted R²” measure was proposed using unbiased estimators, often denoted R̄²:

\bar{R}^2 = 1 - \frac{s^2}{s_Y^2} = 1 - \left(\frac{n-1}{n-k}\right)\frac{\sum_{i=1}^{n}\hat{e}_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}

where s²Y is the sample variance of Y: s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2
18
Conditional Error Variance
Conditional Error Variance

First, consider the conditional variance of Y given X = x:

\sigma^2(x) = var(Y \mid X = x) = \mathbb{E}\left[(Y - \mathbb{E}[Y \mid X = x])^2 \mid X = x\right]

The conditional variance is a function of the conditioning variables (the X’s), much like how 𝔼[Y|X] = m(X).

Next, consider the conditional variance of e given X = x:

var(e \mid X = x) = \mathbb{E}[e^2 \mid X = x] = \mathbb{E}\left[(Y - \mathbb{E}[Y \mid X = x])^2 \mid X = x\right] = \sigma^2(x)

So we see that var(e|X = x) = var(Y|X = x) = 𝜎2 (x)


19
Mean-Variance Representation of the CEF

𝜎²(x) is in a different unit of measurement than Y. To convert it to the same unit of measure, define the conditional standard deviation: 𝜎(x) = √𝜎²(x).

Now consider the re-scaled error u = e/𝜎(x). Notice:

𝔼[u|X] = 𝔼[e/𝜎(x)|X] = (1/𝜎(x))𝔼[e|X] = 0
var(u|X) = 𝔼[u²|X] = 𝔼[e²/𝜎²(x)|X] = (1/𝜎²(x))𝔼[e²|X] = 1

So we can write the CEF Model in a mean-variance representation:

Y = m(X) + 𝜎(X)u

Most econometric studies focus on m(x) and either ignore 𝜎(x), treat it as a constant
(𝜎(x) = 𝜎), or treat it as a nuisance parameter.
20
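A small simulation sketch of the mean-variance representation, with an illustrative (made-up) CEF m(x) = 1 + 2x and conditional standard deviation 𝜎(x) = exp(x/2):

# Minimal sketch: simulate Y = m(X) + sigma(X) * u with E[u|X] = 0 and var(u|X) = 1.
set.seed(1)
n <- 500
x <- runif(n, 0, 2)
m <- 1 + 2 * x                           # assumed CEF (illustrative)
sigma_x <- exp(x / 2)                    # assumed conditional std dev (illustrative)
u <- rnorm(n)                            # mean 0, variance 1
y <- m + sigma_x * u
plot(x, y, pch = 20, col = "grey50")     # spread of Y around the CEF grows with x
lines(sort(x), 1 + 2 * sort(x), lwd = 2) # the CEF itself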
Homoskedasticity & Heteroskedasticity

Two terms are used to summarize assumptions about the conditional variance:

• The error is homoskedastic if the conditional variance does not depend on X: 𝜎²(x) = 𝜎²
• The error is heteroskedastic if the conditional variance depends on X: 𝜎²(x)

Heteroskedasticity is the “more correct” model specification!

Homoskedasticity is useful for:

• Simplifying calculations
• Teaching and learning
• Understanding a specific, unusual, and exceptional special case.

21
Variance of OLS Estimator
Variance of the OLS Estimator

Define the k × k conditional variance-covariance matrix of the OLS estimator to be:

V_{\hat{\beta}} = var(\hat{\beta} \mid X) = var((X'X)^{-1}X'Y \mid X)
               = var((X'X)^{-1}X'(X\beta + e) \mid X)
               = var((X'X)^{-1}X'e \mid X)
               = ((X'X)^{-1}X')\, var(e \mid X)\, ((X'X)^{-1}X')'
               = (X'X)^{-1}X'\, \mathbb{E}[ee' \mid X]\, X(X'X)^{-1}

22
Variance of the OLS Estimator (cont.)

Let’s explore the “meat” of the sandwich. Define D:

D = var(e \mid X) = \mathbb{E}[ee' \mid X] =
\begin{pmatrix}
\sigma_1^2(x) & 0 & \cdots & 0 \\
0 & \sigma_2^2(x) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2(x)
\end{pmatrix}

This is because, under random sampling,

• the ith diagonal element of D is 𝔼[ei²|X] = 𝔼[ei²|Xi] = 𝜎i²(x)
• the ijth off-diagonal element of D is 𝔼[ei ej|X] = 𝔼[ei|Xi]𝔼[ej|Xj] = 0

23
Variance of the Estimator Under Homoskedasticity

Under an assumption of homoskedasticity, we have 𝔼[e2i |X] = 𝜎2 for i = 1, … , n

Then D simplifies to:

D =
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
= \sigma^2
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
= \sigma^2 I_n

And the variance-covariance matrix of the OLS estimator simplifies to

V𝛽 ̂ = (X′ X)-1 X′ 𝔼[ee′ |X] X(X′ X)-1 = (X′ X)-1 X′ 𝜎2 In X(X′ X)-1 = 𝜎2 (X′ X)-1

24
Variance of the Estimator Under Homoskedasticity (one X)

Define the sample variance of X as s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.

Then for a simple linear regression model (X is a scalar random variable, plus an intercept), the variance of the OLS slope estimator is:

var(\hat{\beta}_1 \mid X) = \frac{\sigma^2}{(n-1)s_X^2}

This equation makes it clear that the sampling variance of the slope estimator:

• increases when the error variance 𝜎² increases
• decreases when the sample size n increases
• decreases when the spread of the X values s²X increases

25
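A quick sketch confirming that the slope entry of 𝜎²(X′X)⁻¹ equals 𝜎²/((n − 1)s²X) (illustrative simulated regressor; 𝜎² set to 1 for the comparison):

# Minimal sketch: compare the [2,2] element of sigma^2 (X'X)^{-1} with the scalar formula.
set.seed(1)
n <- 80
x1 <- rnorm(n)
X <- cbind(1, x1)
sigma2 <- 1                              # assumed error variance (illustrative)
V <- sigma2 * solve(crossprod(X))        # sigma^2 (X'X)^{-1}
c(V[2, 2], sigma2 / ((n - 1) * var(x1))) # the two slope variances agree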
Variance of the Estimator Under Homoskedasticity (one X): Graphically

The red side of each arrow indicates an increase in the parameter (i.e., either 𝜎², n, or s²X).

Relative to the top-left plot, each plot has an increase in var(𝛽̂₁).

[Figure: panels of scatter plots with fitted regression lines, varying s_X, n, and 𝜎².]

26
Gauss-Markov Theorem

For the homoskedastic Linear Regression Model

Y = X𝛽 + e with 𝔼[e|X] = 0 and 𝔼[ee′ |X] = 𝜎2 In

the OLS estimator 𝛽 ̂ is the Best (lowest variance) Linear Unbiased Estimator (BLUE).

In other words, suppose 𝛽̃ = A′Y is unbiased; then var(𝛽̃|X) ≥ 𝜎²(X′X)⁻¹.

A new paper by Hansen (2022) shows 𝛽 ̂ is BUE – future textbooks might call it the
Gauss-Markov-Hansen Theorem!

27
Gauss-Markov Theorem Proof

\mathbb{E}[\tilde{\beta} \mid X] = A'\mathbb{E}[Y \mid X] = A'X\beta \text{ for all } \beta \quad\Rightarrow\quad A'X = I_k

var(\tilde{\beta} \mid X) = var(A'Y \mid X) = A'DA = \sigma^2 A'A

What’s left to show is that A′ A ≥ (X′ X)-1

Set C = A − X(X′X)⁻¹ and notice that X′C = 0. Then:

A'A - (X'X)^{-1} = (C + X(X'X)^{-1})'(C + X(X'X)^{-1}) - (X'X)^{-1}
                 = C'C + C'X(X'X)^{-1} + (X'X)^{-1}X'C + (X'X)^{-1}X'X(X'X)^{-1} - (X'X)^{-1}
                 = C'C
                 ≥ 0

28
OLS Covariance Matrix Estimation Under Homoskedasticity

Under the assumption of homoskedasticity, the var-cov matrix of the OLS estimator is:

V^0_{\hat{\beta}} = \sigma^2(X'X)^{-1}

The most common estimator for V^0_{\hat{\beta}} replaces 𝜎² with its unbiased estimator s²:

\hat{V}^0_{\hat{\beta}} = s^2(X'X)^{-1}

\hat{V}^0_{\hat{\beta}} is conditionally unbiased for V^0_{\hat{\beta}} under homoskedasticity:

\mathbb{E}[\hat{V}^0_{\hat{\beta}} \mid X] = \mathbb{E}[s^2 \mid X](X'X)^{-1} = \sigma^2(X'X)^{-1} = V^0_{\hat{\beta}}

29
OLS Covariance Matrix Estimation Under Heteroskedasticity

Without the assumption of homoskedasticity, the var-cov matrix of 𝛽 ̂ is

V_{\hat{\beta}} = (X'X)^{-1}(X'DX)(X'X)^{-1} = (X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i'\, \mathbb{E}[e_i^2 \mid X]\right)(X'X)^{-1}

An idealized estimator would be:

\hat{V}^{ideal}_{\hat{\beta}} = (X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i'\, e_i^2\right)(X'X)^{-1}

Two feasible estimators (called White, Eicker-White, robust, or heteroskedasticity-consistent) are:

\hat{V}^{HC0}_{\hat{\beta}} = (X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i'\, \hat{e}_i^2\right)(X'X)^{-1}

\hat{V}^{HC1}_{\hat{\beta}} = \frac{n}{n-k}(X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i'\, \hat{e}_i^2\right)(X'X)^{-1}

30
Standard Errors of the OLS Estimator

A standard error s(𝛽̂) for an estimator 𝛽̂ is an estimator of the standard deviation of the distribution of 𝛽̂.

When 𝛽 is a vector with estimator 𝛽̂ and variance-covariance matrix estimator V̂_𝛽̂, the standard errors are the square roots of the diagonal elements of V̂_𝛽̂:

s(\hat{\beta}_j) = \sqrt{[\hat{V}_{\hat{\beta}}]_{jj}}

31
Computation
Computation in R: lm()

dat <- read.table("support/cps09mar.txt")
exper <- dat[,1] - dat[,4] - 6
lwage <- log( dat[,5]/(dat[,6]*dat[,7]) )
sam <- dat[,11]==4 & dat[,12]==7 & dat[,2]==0

out <- lm(lwage[sam] ~ exper[sam])
summary(out)

Call:
lm(formula = lwage[sam] ~ exper[sam])

Residuals:
    Min      1Q  Median      3Q     Max
-2.3583 -0.4215  0.0042  0.4718  2.3569

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.876515   0.067631  42.532   <2e-16 ***
exper[sam]  0.004776   0.004335   1.102    0.272
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7122 on 266 degrees of freedom
Multiple R-squared:  0.004542,  Adjusted R-squared:  0.0007998
F-statistic: 1.214 on 1 and 266 DF,  p-value: 0.2716

32
Computation in R: ŷ and ê

y <- matrix(lwage[sam], ncol=1)
x <- cbind(1, exper[sam])

xxi <- solve(crossprod(x))
xy <- crossprod(x,y)
betahat <- xxi %*% xy

yhat <- x %*% betahat  # fitted values
ehat <- y - yhat       # residuals

# check y = yhat + resids
sum(y - (yhat + ehat))
[1] 1.110223e-16

# check sum(resids)=0
sum(ehat)
[1] 4.241052e-13

33
Computation in R: s(𝛽̂)

n <- nrow(y)
k <- ncol(x)

# residual standard error
s2 <- (1/(n-k)) * t(ehat) %*% ehat
s2 <- as.vector(s2)
sqrt(s2)
[1] 0.712242

# std err (homosk)
V0 <- s2*xxi
sqrt(diag(V0))
[1] 0.067631401 0.004335196

# std err (heterosk)
u <- x*(ehat %*% matrix(1, ncol=k))
VHC0 <- xxi %*% (t(u) %*% u) %*% xxi
VHC1 <- (n / (n-k)) * VHC0

sqrt(diag(VHC0))
[1] 0.071346291 0.004295331
sqrt(diag(VHC1))
[1] 0.071614008 0.004311449

34
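For comparison, the same heteroskedasticity-robust standard errors can be obtained from the sandwich package, if it is installed (a sketch, not part of the course code):

# Sketch (assumes the sandwich package is available); results should match
# sqrt(diag(VHC0)) and sqrt(diag(VHC1)) computed above.
library(sandwich)
sqrt(diag(vcovHC(out, type = "HC0")))
sqrt(diag(vcovHC(out, type = "HC1")))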
Computation in R: R² and R̄²

# R-squared
ybar <- mean(y)
TSS <- sum((y - ybar)^2)
SSE <- t(ehat) %*% ehat

1 - SSE/TSS
     [,1]
[1,] 0.004542129

sig2hat <- t(ehat) %*% ehat / n
sigYtilde <- sum((y - ybar)^2) / n

1 - sig2hat/sigYtilde
     [,1]
[1,] 0.004542129

# adjusted R-squared
1 - s2/var(y)
     [,1]
[1,] 0.0007998062

35
