As of Sep 16, 2021: Seppo Pynnönen, Econometrics I
Part III: Multiple Regression Analysis

Estimation
  Matrix form
Goodness-of-Fit
  R-square
  Adjusted R-square
Multicollinearity
y = β0 + β1 x1 + · · · + βk xk + u. (1)
Because
∂y/∂xj = βj   (2)
j = 1, . . . , k, the coefficient βj measures the marginal effect of variable xj: the amount y is expected to change when xj increases by one unit while the other variables are held constant (ceteris paribus).
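To make the ceteris paribus interpretation concrete, here is a minimal illustrative sketch (not part of the lecture) of fitting such a model by OLS in Python; the simulated data, coefficient values, and the use of the statsmodels package are assumptions for illustration only:

import numpy as np
import statsmodels.api as sm

# Hypothetical simulated data: y = 1 + 2*x1 - 0.5*x2 + u
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept column plus the regressors
res = sm.OLS(y, X).fit()
print(res.params)  # each slope estimates the ceteris paribus (marginal) effect of its regressor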
Example 1
Consider the consumption function C = f(Y), where Y is income. Suppose the assumption is that as income grows the marginal propensity to consume decreases. Modeling the marginal propensity as

β1 = β1l + β1q Y,

and substituting it into C = β0 + β1 Y + u gives the quadratic specification

C = β0 + β1l Y + β1q Y² + u,

where a decreasing marginal propensity to consume corresponds to β1q < 0.
Example 1 continues . . .
This simple example demonstrates that we can meaningfully enrich simple regression analysis (even though we have essentially only two variables, C and Y) and at the same time obtain a meaningful interpretation of the above polynomial model.

Technically, considering the simple regression

C = β0 + β1 Y + v,

the extension

C = β0 + β1l Y + β1q Y² + u

means that we have extracted the quadratic term Y² out of the error term v of the simple regression.
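A similar minimal sketch (hypothetical simulated income data; statsmodels assumed) of estimating the quadratic specification by simply adding Y² as a second regressor:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
Y = rng.uniform(10, 100, size=300)                       # income (hypothetical units)
C = 5 + 0.9 * Y - 0.003 * Y**2 + rng.normal(scale=2, size=300)

X = sm.add_constant(np.column_stack([Y, Y**2]))          # regressors: Y and Y^2
res = sm.OLS(C, X).fit()
print(res.params)  # a negative coefficient on Y^2 is consistent with a decreasing marginal propensity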
Estimation
The key assumption is again that the error term has zero conditional mean given the explanatory variables:

E[u|x1, . . . , xk] = 0.   (3)
Matrix form
Collect the sample values of the dependent variable into the vector y = (y1, . . . , yn)′.   (5)
Then we can present the whole set of regression equations for the
sample
y1 = β0 + β1 x11 + · · · + βk x1k + u1
y2 = β0 + β1 x21 + · · · + βk x2k + u2
⋮                                       (7)
yn = β0 + β1 xn1 + · · · + βk xnk + un

in the matrix form as

\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
\qquad (8)
\]
or shortly
y = Xβ + u, (9)
where
β = (β0, β1, . . . , βk)′.
The OLS estimator β̂ solves the normal equations

X′X β̂ = X′y.   (10)
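A minimal numerical sketch of solving the normal equations (10) directly (simulated data and hypothetical coefficient values; NumPy assumed):

import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # constant plus k regressors
beta = np.array([1.0, 2.0, -1.0, 0.5])                        # hypothetical true coefficients
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                  # solves X'X b = X'y
print(beta_hat)
print(np.linalg.lstsq(X, y, rcond=None)[0])                   # same solution via least squares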
The fitted values are ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik, i = 1, . . . , n, and the residual for observation i is again defined as ûi = yi − ŷi.

Again the following hold:

ȳ = ŷ¯   (14)

and

ȳ = β̂0 + β̂1 x̄1 + · · · + β̂k x̄k,   (15)

where ȳ, ŷ¯, and x̄j, j = 1, . . . , k, are the sample averages of the variables.
If in the sample the average high school GPA is 3.4 and the average ACT
is 24.2, what is the average college GPA in the sample?
Remark 1
For example, if you fit the simple regression ỹ = β̃0 + β̃1 x1, where β̃0 and β̃1 are the OLS estimators, and fit a multiple regression ŷ = β̂0 + β̂1 x1 + β̂2 x2, then generally β̃1 ≠ β̂1 unless β̂2 = 0, or x1 and x2 are uncorrelated.
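A small simulated illustration of Remark 1 (hypothetical data in which x1 and x2 are correlated; statsmodels assumed):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)               # x2 correlated with x1
y = 1 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

simple = sm.OLS(y, sm.add_constant(x1)).fit()
multiple = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(simple.params[1])    # beta_tilde_1: absorbs part of x2's effect
print(multiple.params[1])  # beta_hat_1: close to the true value 1.0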
Example 2
Consider the hourly wage example and enhance the model as

log(w) = β0 + β1 x1 + β2 x2 + β3 x3,   (16)
Goodness-of-Fit
The total variation of y again decomposes into explained and residual parts,

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ ûi²,

or

SST = SSE + SSR,   (18)

where

ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik.   (19)
R-square
R² = SSE / SST = 1 − SSR / SST.   (20)
Again, as in the case of the simple regression, R² can be shown to be the squared correlation coefficient between the actual yi and the fitted ŷi. This correlation is called the multiple correlation. (Recall that ŷ¯ = ȳ.)
Remark 2
R² never decreases when an explanatory variable is added to the model!
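A quick simulated check of Remark 2 (hypothetical data; the added regressor is pure noise; statsmodels assumed):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                        # an irrelevant regressor

r2_small = sm.OLS(y, sm.add_constant(x1)).fit().rsquared
r2_big = sm.OLS(y, sm.add_constant(np.column_stack([x1, noise]))).fit().rsquared
print(r2_small, r2_big)                           # r2_big >= r2_small, even though 'noise' is irrelevant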
Adjusted R-square
R̄² = 1 − su² / sy²,   (22)

where

su² = Σ(yi − ŷi)² / (n − k − 1) = Σ ûi² / (n − k − 1)   (23)

and sy² = Σ(yi − ȳ)² / (n − 1) is the sample variance of y.
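A minimal sketch of computing R̄² from (22)–(23) and checking it against a library value (hypothetical simulated data; statsmodels assumed):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, k = 120, 2
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
s2_u = np.sum(res.resid**2) / (n - k - 1)      # equation (23)
s2_y = np.sum((y - y.mean())**2) / (n - 1)     # sample variance of y
print(1 - s2_u / s2_y, res.rsquared_adj)       # the two values coincide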
Solving the normal equations (10) gives the OLS estimator β̂ = (X′X)⁻¹X′y, and substituting y = Xβ + u,

β̂ = (X′X)⁻¹X′y
  = (X′X)⁻¹X′(Xβ + u)
  = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u   (25)
  = β + (X′X)⁻¹X′u.
Remark 3
If z = (z1, . . . , zk)′ is a random vector, then E[z] = (E[z1], . . . , E[zk])′. That is, the expected value of a vector is the vector whose components are the individual expected values.
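Combining (25) with Remark 3 and the zero conditional mean assumption (equation (3), in matrix form E[u|X] = 0) gives the unbiasedness of OLS; written out as a one-line derivation:

\[
\mathrm{E}\bigl[\hat{\beta}\mid X\bigr]
  \;=\; \beta + (X'X)^{-1}X'\,\mathrm{E}[u\mid X]
  \;=\; \beta .
\]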
Consider the two models

y = β0 + β1 x1 + u   (27)

and

y = β0 + β1 x1 + β2 x2 + u.   (28)
Suppose the true model is

y = β0 + β1 x1 + β2 x2 + u,   (30)

but the simple regression

y = β0 + β1 x1 + v   (31)

is fitted. The OLS slope of the simple regression can be written as β̃1 = Σ ai yi, where

ai = (xi1 − x̄1) / Σ(xi1 − x̄1)².   (34)

Taking expectations under the true model (30),

E[β̃1] = β1 + β2 [Σ(xi1 − x̄1)xi2] / [Σ(xi1 − x̄1)²],   (36)

implying that β̃1 is biased for β1 unless x1 and x2 are uncorrelated (or β2 = 0). This is called the omitted variable bias.
Remark 4
The omitted variable bias is in fact due to the error term of the misspecified model being correlated with the explanatory variable. That is, if the true model is y = β0 + β1 x1 + β2 x2 + u but we estimate the model y = β0 + β1 x1 + v (so that v = β2 x2 + u), then E[v|x1] = E[β2 x2 + u|x1] = β2 E[x2|x1] + E[u|x1] = β2 E[x2|x1] ≠ 0 if x2 and x1 are correlated, i.e., the crucial assumption E[v|x1] = 0 is violated in the misspecified model. (Note: we assume that in the true model E[u|x1, x2] = 0, which also implies E[u|x1] = 0.)
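A small simulation sketch of the omitted variable bias and of the sample analogue of (36) (hypothetical data; statsmodels assumed):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)           # correlated with x1, omitted below
y = 1 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

short = sm.OLS(y, sm.add_constant(x1)).fit()                          # omits x2
long = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()    # includes x2

# Sample analogue of the bias term in (36)
bias = long.params[2] * np.sum((x1 - x1.mean()) * x2) / np.sum((x1 - x1.mean())**2)
print(short.params[1], long.params[1] + bias)   # equal up to floating point error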
What are some factors contained in u? Do you think the key assumption E[u|prbconv, avgsen] = 0 is likely to hold?
Consider again the model in matrix form, y = Xβ + u.   (38)
Multicollinearity
Perfect collinearity is ruled out: the explanatory variables

x1, x2, . . . , xk

must be linearly independent, i.e.,

a1 x1 + · · · + ak xk = 0

holds only if

a1 = · · · = ak = 0.
From the variance equation (41), var[β̂j] = σu² / [(1 − Rj²) Σ(xij − x̄j)²], where Rj² is the R-square from regressing xj on the other explanatory variables, we see that var[β̂j] → ∞ as Rj² → 1.

That is, the more the explanatory variables are linearly dependent, the larger the variance becomes.

This implies that the coefficient estimates become increasingly unstable.

High (but not perfect) correlation between two or more explanatory variables is called multicollinearity.
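A minimal sketch of the factor 1/(1 − Rj²) by which near-collinearity inflates var[β̂j] (often called the variance inflation factor); hypothetical data with two nearly collinear regressors, statsmodels assumed:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)        # nearly collinear with x1
x3 = rng.normal(size=n)                     # unrelated to x1 and x2

X = np.column_stack([x1, x2, x3])
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    r2_j = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared   # R_j^2
    print(f"x{j + 1}: R_j^2 = {r2_j:.3f}, 1/(1 - R_j^2) = {1 / (1 - r2_j):.1f}")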
Symptoms of multicollinearity:
1 High correlations between explanatory variables.
2 R² is relatively high, but the coefficient estimates tend to be insignificant (see the section on hypothesis testing).
3 Some coefficients have the wrong sign while others are at the same time unreasonably large.
4 Coefficient estimates change markedly from one model alternative to another.
Example 3
Variable Et denotes the expenditure costs of a sample of Toyota Mark II cars at time point t, Mt the mileage, and At the age.
Model A: Et = α0 + α1 At + u1t
Model B: Et = β0 + β1 Mt + u2t
Model C: Et = γ0 + γ1 Mt + γ2 At + u3t
Example 3 continues . . .
Estimation results: (t-values in parentheses)
Findings:
A priori, the coefficients α1, β1, γ1, and γ2 should be positive. However, γ̂2 = −151.15 (!!?), while β̂1 = 53.45. The correlation rA,M = 0.996!
Remedies:
In the collinearity problem the issue is that there is not enough information in the sample to reliably identify each variable's contribution as an explanatory variable in the model.
Thus, in order to alleviate the problem:
1 Use non-sample information, if available, to impose restrictions between coefficients.
2 Increase the sample size if possible.
3 Drop the most collinear variables (on the basis of Rj²).
4 If a linear combination (usually a sum) of the most collinear variables is meaningful, replace the collinear variables by the linear combination.
Remark 5
Multicollinearity is not always harmful.
If an explanatory variable is not correlated with the rest of the explanatory variables, multicollinearity among those other variables (provided that it is not perfect) does not harm the variance of the slope coefficient estimator of the uncorrelated explanatory variable.
If xj is not correlated with any of the other predictors, Rj² = 0 and hence, as is seen from equation (41), the factor (1 − Rj²) drops out of its slope coefficient estimator's variance var[β̂j].
(i) Does it make sense to hold sleep, work, and leisure fixed while changing study?
(ii) Explain why this model violates Assumption 4.
(iii) How would you reformulate the model so that its parameters have a useful interpretation and it satisfies Assumption 4?
Consider the model

y = β0 + β1 x1 + β2 x2 + u,   (42)

and the fitted regressions

ŷ = β̂0 + β̂1 x1 + β̂2 x2   (43)

and

ỹ = β̃0 + β̃1 x1.   (44)

Then

var[β̂1] = σu² / [(1 − r12²) Σ(xi1 − x̄1)²]   (45)

and

var[β̃1] = σu² / Σ(xi1 − x̄1)²,   (46)

where r12 is the sample correlation between x1 and x2.
Thus var[β̃1] ≤ var[β̂1], and the inequality is strict if r12 ≠ 0.

In summary (assuming r12 ≠ 0):

1 If β2 ≠ 0, then β̃1 is biased, β̂1 is unbiased, and var[β̃1] < var[β̂1].
2 If β2 = 0, then both β̃1 and β̂1 are unbiased, but var[β̃1] < var[β̂1].
The error variance σu² is estimated by

σ̂u² = Σ ûi² / (n − k − 1),   (47)

where

ûi = yi − β̂0 − β̂1 xi1 − · · · − β̂k xik.   (48)

The term n − k − 1 in (47) is the degrees of freedom (df).

It can be shown that

E[σ̂u²] = σu²,   (49)

i.e., σ̂u² is an unbiased estimator of σu².

σ̂u is called the standard error of the regression.
Substituting σu by its estimate σ̂u = √(σ̂u²) gives the standard error of β̂j:

se(β̂j) = σ̂u / √[(1 − Rj²) Σ(xij − x̄j)²].   (51)
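A minimal sketch (hypothetical simulated data; statsmodels assumed) computing σ̂u² and se(β̂j) from (47) and (51) and comparing with the library's reported standard error:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([1.5, -0.7]) + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
sigma2_hat = np.sum(res.resid**2) / (n - k - 1)     # equation (47)

j = 0                                               # look at the first slope coefficient
xj = X[:, j]
r2_j = sm.OLS(xj, sm.add_constant(np.delete(X, j, axis=1))).fit().rsquared
se_j = np.sqrt(sigma2_hat) / np.sqrt((1 - r2_j) * np.sum((xj - xj.mean())**2))  # equation (51)
print(se_j, res.bse[j + 1])                          # matches the library's standard error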
Theorem 2 (Gauss-Markov)
Under the classical assumptions 1–5, β̂0, β̂1, . . . , β̂k are the best linear unbiased estimators (BLUEs) of β0, β1, . . . , βk, respectively.

BLUE:
Best: the variance of the OLS estimator is the smallest among all linear unbiased estimators of βj.
Linear: β̂j = Σ_{i=1}^{n} wij yi.
Unbiased: E[β̂j] = βj.
(i) Show that β̃1 is linear and unbiased. (Remember: because E[u|x] = 0, you can treat both xi and zi as nonrandom in your derivation.)

(ii) Show that

var[β̃1] = σ² Σ_{i=1}^{n} (zi − z̄)² / [Σ_{i=1}^{n} (zi − z̄)zi]².

(iii) Show that under the Gauss-Markov assumptions var[β̂1] ≤ var[β̃1], where β̂1 is the OLS estimator. (Hint: Cauchy-Schwarz inequality: (Σ ai bi)² ≤ (Σ ai²)(Σ bi²).)
P P