
Part III

Multiple Regression Analysis

As of Sep 16, 2021

Seppo Pynnönen, Econometrics I

1 Multiple Regression Analysis

    Estimation
        Matrix form
    Goodness-of-Fit
        R-square
        Adjusted R-square
    Expected values of the OLS estimators
        Irrelevant variables in a regression
        Omitted variable bias
    Variances of OLS estimators
        Multicollinearity
        Variances of misspecified models
        Estimating error variance σ_u²
        Standard error of β̂k
    The Gauss-Markov Theorem


The general linear regression with k explanatory variables is an extension of the simple regression:

y = β0 + β1 x1 + · · · + βk xk + u.  (1)

Because

∂y/∂xj = βj,  j = 1, . . . , k,  (2)

coefficient βj measures the marginal effect of variable xj: the amount by which y is expected to change when xj increases by one unit while the other variables are held constant (ceteris paribus).


Multiple regression opens up several additional options to enrich the analysis and make modeling more realistic compared to simple regression.


Example 1
Consider the consumption function C = f(Y), where Y is income. Suppose the assumption is that as income grows, the marginal propensity to consume decreases.

In simple regression we could try to fit a level-log model or a log-log model.

One further possibility is

β1 = β1l + β1q Y,

where according to our hypothesis β1q < 0. Thus the consumption function becomes

C = β0 + (β1l + β1q Y)Y + u
  = β0 + β1l Y + β1q Y² + u.

This is a multiple regression model with x1 = Y and x2 = Y².


Example 1 continues . . .
This simple example demonstrates that we can meaningfully enrich simple regression analysis (even though we have essentially only two variables, C and Y) and at the same time obtain a meaningful interpretation for the above polynomial model.
Technically, considering the simple regression

C = β0 + β1 Y + v,

the extension

C = β0 + β1l Y + β1q Y² + u

means that we have extracted the quadratic term Y² out of the error term v of the simple regression.
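Estimating such a polynomial model is still ordinary linear regression, because the model is linear in the parameters. A minimal sketch in R with simulated data (all numeric values are made up purely for illustration):

# Quadratic consumption function: C = beta0 + beta1l*Y + beta1q*Y^2 + u,
# with beta1q < 0 so the marginal propensity to consume decreases in Y.
set.seed(1)
n <- 200
Y <- runif(n, 10, 100)                         # income
C <- 5 + 0.9 * Y - 0.002 * Y^2 + rnorm(n, sd = 2)

fit <- lm(C ~ Y + I(Y^2))                      # x1 = Y, x2 = Y^2
coef(fit)                                      # the Y^2 coefficient should be negative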


Estimation


In order to estimate the model we replace classical assumption 4 by:

4. The explanatory variables are linearly independent in the mathematical sense. That is, no column of the data matrix X (see below) is a linear combination of the other columns.

The key assumption (Assumption 5) in terms of the conditional expectation becomes

E[u | x1, . . . , xk] = 0.  (3)


Given observations (yi, xi1, . . . , xik), i = 1, . . . , n (n = the number of observations), the estimation method again is OLS, which produces estimates β̂0, β̂1, . . . , β̂k by minimizing

Σ_{i=1}^{n} (yi − β0 − β1 xi1 − · · · − βk xik)²  (4)

with respect to the parameters.

Again the first order conditions set the (k + 1) partial derivatives equal to zero. The solution is straightforward, although the explicit form of the estimators becomes complicated.



Using matrix algebra simplifies the notation in multiple regression considerably.

Denote the observation vector on y as

y = (y1, . . . , yn)′,  (5)

where the prime denotes transposition.

In the same manner denote the data matrix on the x-variables, augmented with ones in the first column, as the n × (k + 1) matrix

X = ⎡ 1  x11  x12  · · ·  x1k ⎤
    ⎢ 1  x21  x22  · · ·  x2k ⎥
    ⎢ ⋮    ⋮    ⋮          ⋮  ⎥ ,  (6)
    ⎣ 1  xn1  xn2  · · ·  xnk ⎦

where k < n (the number of observations, n, is larger than the number of x-variables, k).

Then we can present the whole set of regression equations for the sample,

y1 = β0 + β1 x11 + · · · + βk x1k + u1
y2 = β0 + β1 x21 + · · · + βk x2k + u2
 ⋮                                      (7)
yn = β0 + β1 xn1 + · · · + βk xnk + un

in matrix form as

⎡ y1 ⎤   ⎡ 1  x11  x12  · · ·  x1k ⎤ ⎡ β0 ⎤   ⎡ u1 ⎤
⎢ y2 ⎥   ⎢ 1  x21  x22  · · ·  x2k ⎥ ⎢ β1 ⎥   ⎢ u2 ⎥
⎢  ⋮ ⎥ = ⎢ ⋮    ⋮    ⋮          ⋮  ⎥ ⎢  ⋮ ⎥ + ⎢  ⋮ ⎥   (8)
⎣ yn ⎦   ⎣ 1  xn1  xn2  · · ·  xnk ⎦ ⎣ βk ⎦   ⎣ un ⎦

or shortly

y = Xβ + u,  (9)

where β = (β0, β1, . . . , βk)′.

The normal equations from the first order conditions of minimizing (4) are in matrix form simply

X′Xβ̂ = X′y,  (10)

which gives the explicit solution for the OLS estimator of β as

β̂ = (X′X)⁻¹X′y,  (11)

where β̂ = (β̂0, β̂1, . . . , β̂k)′.
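As a quick numerical sketch (simulated data), (11) can be computed directly and compared with R's lm(); solving the normal equations (10) is numerically preferable to inverting X′X:

# Numerical check of (11): beta-hat solves the normal equations X'X b = X'y.
set.seed(42)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                                # n x (k+1), column of ones first
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))  # solves X'X b = X'y
cbind(manual = beta_hat, lm = coef(lm(y ~ x1 + x2)))    # the two columns agree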


The fitted model is

ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik,  (12)

i = 1, . . . , n.

The residual for observation i is again defined as

ûi = yi − ŷi.  (13)

Again it holds that

ȳ = ŷ̄  (14)

and

ȳ = β̂0 + β̂1 x̄1 + · · · + β̂k x̄k,  (15)

where ȳ, ŷ̄, and x̄j, j = 1, . . . , k, are the sample averages of the variables.


Exploring further (Wooldridge 5e, p. 74)

The OLS fitted regression explaining college GPA in terms of high school GPA and ACT score is

clGPA-hat = 1.29 + 0.454 hsGPA + 0.0094 ACT.

If in the sample the average high school GPA is 3.4 and the average ACT is 24.2, what is the average college GPA in the sample?
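By property (15), the sample averages satisfy the fitted equation, so the average college GPA is

clGPA-bar = 1.29 + 0.454 × 3.4 + 0.0094 × 24.2 ≈ 1.29 + 1.544 + 0.227 ≈ 3.06.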


Remark 1
For example, if you fit the simple regression ỹ = β̃0 + β̃1 x1, where β̃0 and β̃1 are the OLS estimators, and fit a multiple regression ŷ = β̂0 + β̂1 x1 + β̂2 x2, then generally β̃1 ≠ β̂1 unless β̂2 = 0, or x1 and x2 are uncorrelated.


Example 2
Consider the hourly wage example and enhance the model to

log(w) = β0 + β1 x1 + β2 x2 + β3 x3 + u,  (16)

where w = hourly wage, x1 = years of education (educ), x2 = years of labor market experience (exper), and x3 = years with the current employer (tenure).

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.284360   0.104190   2.729  0.00656 **
educ        0.092029   0.007330  12.555  < 2e-16 ***
exper       0.004121   0.001723   2.391  0.01714 *
tenure      0.022067   0.003094   7.133 3.29e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4409 on 522 degrees of freedom
Multiple R-squared: 0.316, Adjusted R-squared: 0.3121
F-statistic: 80.39 on 3 and 522 DF, p-value: < 2.2e-16
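The output above is R's summary() of an lm() fit. A sketch of a call that should reproduce it, assuming the data are Wooldridge's wage1 (available, e.g., in the 'wooldridge' R package):

# Reproducing Example 2 (assuming Wooldridge's wage1 data):
library(wooldridge)                    # provides the wage1 data set
data("wage1")
fit <- lm(log(wage) ~ educ + exper + tenure, data = wage1)
summary(fit)                           # should match the output shown above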

Goodness-of-Fit


Again, in the same manner as with the simple regression, we have

Σ_{i=1}^{n} (yi − ȳ)² = Σ_{i=1}^{n} (ŷi − ȳ)² + Σ_{i=1}^{n} (yi − ŷi)²  (17)

or

SST = SSE + SSR,  (18)

where

ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik.  (19)


R² = SSE/SST = 1 − SSR/SST.  (20)

Again, as in the case of the simple regression, R² can be shown to be the squared correlation coefficient between the actual yi and the fitted ŷi. This correlation is called the multiple correlation,

R = Σ(yi − ȳ)(ŷi − ŷ̄) / (√Σ(yi − ȳ)² · √Σ(ŷi − ŷ̄)²).  (21)

Recall that ŷ̄ = ȳ.

Remark 2
R² never decreases when an explanatory variable is added to the model!
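A quick numerical illustration in R (simulated data) of the relation between (20) and (21):

# R^2 equals the squared correlation between y and the fitted values.
set.seed(7)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

summary(fit)$r.squared       # R^2 from the summary
cor(y, fitted(fit))^2        # the same number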


The adjusted R-square is defined as

R̄² = 1 − s_u²/s_y²,  (22)

where

s_u² = (1/(n − k − 1)) Σ_{i=1}^{n} (yi − ŷi)² = (1/(n − k − 1)) Σ_{i=1}^{n} ûi²  (23)

is an estimator of the error variance σ_u² = var[ui], s_y² = (1/(n − 1)) Σ_{i=1}^{n} (yi − ȳ)² is the sample variance of y, and ûi = yi − ŷi are the residuals with ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik the fitted values.
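Since s_u² = SSR/(n − k − 1) and s_y² = SST/(n − 1), an equivalent form of (22) is

R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1),

which shows how R̄² penalizes additional regressors: unlike R², it increases only if a new variable reduces s_u², and it can even become negative.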

Expected values of the OLS estimators


Under the classical assumptions 1–5 (with Assumption 5 updated to multiple regression) we can show that the estimators of the regression coefficients are unbiased.

Theorem 1
Under the classical assumptions the OLS estimators of the regression coefficients βj, j = 0, 1, . . . , k, are unbiased, i.e.,

E[β̂j] = βj,  j = 0, 1, . . . , k.  (24)


Using matrix notation, the proof of Theorem 1 is pretty straightforward. To do this, write

β̂ = (X′X)⁻¹X′y
  = (X′X)⁻¹X′(Xβ + u)
  = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u      (25)
  = β + (X′X)⁻¹X′u.

The expected value of β̂ is

E[β̂] = β + E[(X′X)⁻¹X′u] = β  (26)

because of Assumption 5: E[u|X] = 0 implies cov[u, X] = 0 and therefore E[(X′X)⁻¹X′u] = 0. Thus, the OLS estimators of the regression coefficients are unbiased.
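A small Monte Carlo sketch in R illustrates Theorem 1 (simulated data; X is held fixed across replications, as in the derivation above):

# Monte Carlo illustration of unbiasedness: the average of the OLS
# estimates over many replications is close to the true beta.
set.seed(123)
n <- 50; reps <- 5000
X <- cbind(1, rnorm(n), rnorm(n))          # fixed design matrix
beta <- c(1, 2, -0.5)

est <- replicate(reps, {
  y <- X %*% beta + rnorm(n)               # new error draw each replication
  drop(solve(crossprod(X), crossprod(X, y)))   # OLS estimate (11)
})
rowMeans(est)                              # approximately (1, 2, -0.5)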



Remark 3
If z = (z1, . . . , zk)′ is a random vector, then E[z] = (E[z1], . . . , E[zk])′. That is, the expected value of a vector is the vector of the individual expected values.

Irrelevant variables in a regression

Suppose the correct model is

y = β0 + β1 x1 + u,  (27)

but we estimate the model as

y = β0 + β1 x1 + β2 x2 + u.  (28)

Thus, β2 = 0 in reality. The OLS estimation results yield

ŷ = β̂0 + β̂1 x1 + β̂2 x2.  (29)

By Theorem 1, E[β̂j] = βj; thus in particular E[β̂2] = β2 = 0, implying that the inclusion of extra variables in a regression does not bias the results. However, as will be seen later, it decreases the accuracy of estimation by increasing the variance of the OLS estimates.
Omitted variable bias

Suppose now as an example that the correct model is

y = β0 + β1 x1 + β2 x2 + u,  (30)

but we misspecify the model as

y = β0 + β1 x1 + v,  (31)

where the omitted variable is embedded into the error term, v = β2 x2 + u.

The OLS estimator of β1 for specification (31) is

β̃1 = Σ(xi1 − x̄1)yi / Σ(xi1 − x̄1)².  (32)


From Equation (32) we have

β̃1 = β1 + Σ_{i=1}^{n} ai vi,  (33)

where

ai = (xi1 − x̄1) / Σ(xi1 − x̄1)².  (34)


Thus, because E[vi] = E[β2 xi2 + ui] = β2 xi2,

E[β̃1] = β1 + Σ ai E[vi]
       = β1 + β2 Σ ai xi2      (35)
       = β1 + β2 Σ(xi1 − x̄1)xi2 / Σ(xi1 − x̄1)²,

i.e.,

E[β̃1] = β1 + β2 Σ(xi1 − x̄1)xi2 / Σ(xi1 − x̄1)²,  (36)

implying that β̃1 is biased for β1 unless x1 and x2 are uncorrelated (or β2 = 0). This is called the omitted variable bias.


The direction of the omitted variable bias is as follows:

           cor[x1, x2] > 0    cor[x1, x2] < 0
β2 > 0     positive bias      negative bias
β2 < 0     negative bias      positive bias
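The upper-left cell of the table can be illustrated with a short simulation in R (parameter values are made up): with β2 > 0 and cor[x1, x2] > 0, the short regression of y on x1 alone overestimates β1.

# Omitted variable bias: beta2 > 0 and cor(x1, x2) > 0 => positive bias.
set.seed(99)
n <- 1000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)            # positively correlated with x1
y <- 1 + x1 + x2 + rnorm(n)          # true beta1 = 1, beta2 = 1

coef(lm(y ~ x1 + x2))["x1"]          # close to the true value 1
coef(lm(y ~ x1))["x1"]               # well above 1: positive omitted variable bias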


Remark 4
The omitted variable bias is due to the fact that the error term of the misspecified model is correlated with the explanatory variables. That is, if the true model is y = β0 + β1 x1 + β2 x2 + u but we estimate the model y = β0 + β1 x1 + v (so that v = β2 x2 + u), then E[v|x1] = E[β2 x2 + u|x1] = β2 E[x2|x1] + E[u|x1] = β2 E[x2|x1] ≠ 0 if x2 and x1 are correlated, i.e., the crucial assumption E[v|x1] = 0 is violated in the misspecified model. (Note: we assume that in the true model E[u|x1, x2] = 0, which also implies E[u|x1] = 0.)


Exploring further (Wooldridge 5e, p. 67)

A simple model to explain city murder rates (murdrate) in terms of the probability of conviction (prbconv) and average sentence length (avgsen) is

murdrate = β0 + β1 prbconv + β2 avgsen + u.

What are some factors contained in u? Do you think the key assumption E[u|prbconv, avgsen] = 0 is likely to hold?


Variances of OLS estimators


Write the regression model

yi = β0 + β1 xi1 + · · · + βk xik + ui,  (37)

i = 1, . . . , n, in the matrix form

y = Xβ + u.  (38)

Then we can write the OLS estimators compactly as

β̂ = (X′X)⁻¹X′y.  (39)


Under the classical assumptions 1–5, and assuming X fixed, we can show that the variance-covariance matrix of β̂ is

cov[β̂] = σ_u²(X′X)⁻¹.  (40)

The variances of the individual coefficients are obtained from the main diagonal of this matrix and can be shown to be of the form

var[β̂j] = σ_u² / [(1 − Rj²) Σ_{i=1}^{n} (xij − x̄j)²],  (41)

j = 1, . . . , k, where Rj² is the R-square from regressing xj on the other explanatory variables and the constant term.
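The factor 1/(1 − Rj²) in (41) is known as the variance inflation factor (VIF). A sketch of computing it directly in R (simulated data):

# VIF for x1: 1 / (1 - R_1^2), where R_1^2 comes from regressing x1
# on the other explanatory variables.
set.seed(11)
n <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)     # strong collinearity with x1

R2_1 <- summary(lm(x1 ~ x2))$r.squared
1 / (1 - R2_1)                          # large VIF = inflated var of beta-hat_1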

Multicollinearity

In terms of linear algebra, we say that vectors x1, x2, . . . , xk are linearly independent if

a1 x1 + · · · + ak xk = 0

holds only if a1 = · · · = ak = 0.

Otherwise x1, . . . , xk are linearly dependent. In such a case some aℓ ≠ 0 and we can write

xℓ = c1 x1 + · · · + cℓ−1 xℓ−1 + cℓ+1 xℓ+1 + · · · + ck xk,

where cj = −aj/aℓ; that is, xℓ can be represented as a linear combination of the other variables.

In statistics, the multiple correlation measures the degree of linear dependence. If the variables are perfectly linearly dependent, that is, if for example xj is a linear combination of the other variables, then the multiple correlation Rj = 1.

Perfect linear dependence is rare between random variables. However, particularly between macroeconomic variables, dependencies are often high.


From the variance equation (41) we see that var[β̂j] → ∞ as Rj² → 1. That is, the more linearly dependent the explanatory variables are, the larger the variance becomes. This implies that the coefficient estimates become increasingly unstable. High (but not perfect) correlation between two or more explanatory variables is called multicollinearity.


Symptoms of multicollinearity:
1 High correlations between explanatory variables.
2 R² is relatively high, but the coefficient estimates tend to be insignificant (see the section on hypothesis testing).
3 Some coefficients have the wrong sign, while others are at the same time unreasonably large.
4 Coefficient estimates change considerably from one model alternative to another.


Example 3
Variable Et denotes expenditures in a sample of Toyota Mark II cars at time point t, Mt denotes the mileage, and At the age.

Consider the model alternatives:

Model A: Et = α0 + α1 At + u1t
Model B: Et = β0 + β1 Mt + u2t
Model C: Et = γ0 + γ1 Mt + γ2 At + u3t


Example 3 continues . . .
Estimation results (t-values in parentheses):

Variable    Model A     Model B     Model C
Constant    -626.24     -796.07        7.29
            (-5.98)     (-5.91)      (0.06)
Age            7.35                   27.58
             (22.16)                 (9.58)
Miles                     53.45     -151.15
                         (18.27)    (-7.06)
df               55          55          54
R̄²            0.897       0.856       0.946
σ̂             368.6       437.0       268.3

Findings: A priori, the coefficients α1, β1, γ1, and γ2 should all be positive. However, in Model C the Miles coefficient is γ̂1 = −151.15 (!!?), even though in Model B β̂1 = 53.45. The correlation between age and mileage is rA,M = 0.996!


Remedies:
In the collinearity problem the issue is that there is not enough information to reliably identify each variable's contribution as an explanatory variable in the model. Thus, in order to alleviate the problem:
1 Use non-sample information, if available, to impose restrictions between coefficients.
2 Increase the sample size if possible.
3 Drop the most collinear variables (on the basis of Rj²).
4 If a linear combination (usually a sum) of the most collinear variables is meaningful, replace the collinear variables by the linear combination.


Remark 5
Multicollinearity is not always harmful.
If an explanatory variable is not correlated with the rest of the explanatory variables, multicollinearity among those variables (provided that it is not perfect) does not harm the variance of the slope coefficient estimator of the uncorrelated explanatory variable. If xj is not correlated with any of the other regressors, Rj² = 0, and hence, as is seen from equation (41), the factor (1 − Rj²) drops out of the variance var[β̂j] of its slope coefficient estimator.

Another case where multicollinearity (again excluding the perfect case) is not a problem is when we are purely interested in the predictions of the model. Predictions are affected essentially only by the usefulness of the available information.


Exploring further (Wooldridge 5e, p. 93)

Consider a model explaining final exam score by class attendance (number of classes attended). To control for student abilities and efforts outside the classroom, additional explanatory variables like cumulative GPA, SAT score, and a measure of high school performance are included. Someone says, "You cannot learn much from this exercise because cumulative GPA, SAT score, and high school performance are likely to be highly collinear." What should be your response?


Sometimes collinearity arises from purely technical reasons.


Example 4 (Wooldridge 5e, Exercise 3.5)
In a study relating college grade point average (gpa) to time spent in various activities, you distribute a survey to several students. The students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of these four categories, so that for each student the hours in the four activities must sum to 168.
(i) In the model

gpa = β0 + β1 study + β2 sleep + β3 work + β4 leisure + u,

does it make sense to hold sleep, work, and leisure fixed while changing study?
(ii) Explain why this model violates Assumption 4.
(iii) How would you reformulate the model so that its parameters have a useful interpretation and it satisfies Assumption 4?
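The violation in (ii) can also be seen mechanically: because the four time-use variables sum to 168 for every student, the columns of the data matrix are linearly dependent and X′X is singular. A sketch in R with made-up survey data; lm() reports the aliased coefficient as NA:

# Perfect collinearity: study + sleep + work + leisure = 168 for everyone.
set.seed(3)
n <- 30
study <- runif(n, 10, 60)
sleep <- runif(n, 40, 70)
work <- runif(n, 0, 40)
leisure <- 168 - study - sleep - work       # exact linear dependency
gpa <- 2 + 0.02 * study + rnorm(n, sd = 0.3)

coef(lm(gpa ~ study + sleep + work + leisure))  # one coefficient is NA (aliased)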

Variances of misspecified models

Consider again, as in (30), the regression model

y = β0 + β1 x1 + β2 x2 + u.  (42)

Suppose the following models are estimated by OLS:

ŷ = β̂0 + β̂1 x1 + β̂2 x2  (43)

and

ỹ = β̃0 + β̃1 x1.  (44)

Then

var[β̂1] = σ_u² / [(1 − r12²) Σ(xi1 − x̄1)²]  (45)

and

var[β̃1] = σ_u² / Σ(xi1 − x̄1)²,  (46)

where r12 is the sample correlation between x1 and x2.

Thus var[β̃1] ≤ var[β̂1], and the inequality is strict if r12 ≠ 0.

In summary (assuming r12 ≠ 0):
1 If β2 ≠ 0, then β̃1 is biased, β̂1 is unbiased, and var[β̃1] < var[β̂1].
2 If β2 = 0, then both β̃1 and β̂1 are unbiased, but var[β̃1] < var[β̂1].


Estimating error variance σ_u²

An unbiased estimator of the error variance var[u] = σ_u² is

σ̂_u² = (1/(n − k − 1)) Σ_{i=1}^{n} ûi²,  (47)

where

ûi = yi − β̂0 − β̂1 xi1 − · · · − β̂k xik.  (48)

The term n − k − 1 in (47) is the degrees of freedom (df). It can be shown that

E[σ̂_u²] = σ_u²,  (49)

i.e., σ̂_u² is an unbiased estimator of σ_u². σ̂_u is called the standard error of the regression.
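In R, σ̂_u is reported as the "residual standard error" in the lm() summary and returned by sigma(). A quick check of (47) on simulated data:

# sigma-hat_u^2 = SSR / (n - k - 1); sigma(fit) returns sigma-hat_u.
set.seed(5)
n <- 80
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 - x2 + rnorm(n, sd = 2)
fit <- lm(y ~ x1 + x2)

sum(resid(fit)^2) / df.residual(fit)   # sigma-hat_u^2 with df = n - k - 1
sigma(fit)^2                           # the same value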


Standard error of β̂k

The standard deviation of β̂j is the square root of (41), i.e.,

σ_β̂j = √var[β̂j] = σ_u / √[(1 − Rj²) Σ(xij − x̄j)²].  (50)

Substituting σ_u by its estimate σ̂_u = √σ̂_u² gives the standard error of β̂j:

se(β̂j) = σ̂_u / √[(1 − Rj²) Σ(xij − x̄j)²].  (51)
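Formula (51) can be checked numerically in R (simulated data): reconstruct se(β̂1) from σ̂_u, R1², and the variation in x1, and compare with the standard error reported by summary():

# Manual reconstruction of se(beta-hat_1) via (51).
set.seed(8)
n <- 120
x1 <- rnorm(n); x2 <- 0.6 * x1 + rnorm(n)
y <- 1 + x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

R2_1 <- summary(lm(x1 ~ x2))$r.squared
se_manual <- sigma(fit) / sqrt((1 - R2_1) * sum((x1 - mean(x1))^2))
c(manual = se_manual, lm = coef(summary(fit))["x1", "Std. Error"])  # equal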

The Gauss-Markov Theorem


Theorem 2 (Gauss-Markov)

Under the classical assumptions 1–5, β̂0, β̂1, . . . , β̂k are the best linear unbiased estimators (BLUEs) of β0, β1, . . . , βk, respectively.

BLUE:
Best: the variance of the OLS estimator is smallest among all linear unbiased estimators of βj.
Linear: β̂j = Σ_{i=1}^{n} wij yi.
Unbiased: E[β̂j] = βj.

Example 5 (Wooldridge 5e Exercise 3.13 (our Exercise 2.1))

(i) Consider the simple regression model y = β0 + β1 x + u under the classical (Gauss-Markov) assumptions 1–5. For some function g(x), for example g(x) = x² or g(x) = log(1 + x²), define zi = g(xi). Define a slope estimator as

β̃1 = Σ_{i=1}^{n} (zi − z̄)yi / Σ_{i=1}^{n} (zi − z̄)xi.

Show that β̃1 is linear and unbiased. (Remember, because E[u|x] = 0, you can treat both xi and zi as nonrandom in your derivation.)

(ii) Show that

var[β̃1] = σ² Σ_{i=1}^{n} (zi − z̄)² / [Σ_{i=1}^{n} (zi − z̄)xi]².

(iii) Show that under the Gauss-Markov assumptions var[β̂1] ≤ var[β̃1], where β̂1 is the OLS estimator. (Hint: the Cauchy-Schwarz inequality (Σ ai bi)² ≤ (Σ ai²)(Σ bi²).)
