Econometrics - Exercise set 1 (solution)
Exercise 1
Answer the following questions and state the differences between simple- and multiple linear regression:
1) State the simple linear regression model and explain the variables
True model: 𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝜀𝑖
Estimated model: 𝑦𝑖 = 𝑏0 + 𝑏1 𝑥𝑖 + 𝑒𝑖
Where:
• 𝑦𝑖 = outcome for observation 𝑖
• 𝛽0 (estimated by 𝑏0) = intercept
• 𝛽1 (estimated by 𝑏1) = slope: the expected change in 𝑦 from a one-unit change in 𝑥
• 𝑥𝑖 = explanatory variable for each observation 𝑖
• 𝜀𝑖 (estimated by the residual 𝑒𝑖) = error term (variation in 𝑦 that cannot be explained by the model)
2) State the multiple linear regression model and explain the variables
True model (matrix notation): 𝑦 = 𝑋𝛽 + 𝜀
Estimated model: 𝑦 = 𝑋𝑏 + 𝑒
Where:
• 𝑦 = 𝑛 × 1 vector of outcomes
• 𝑏 = 𝑘 × 1 vector of parameter estimates, 𝑏 = (𝑏1, 𝑏2, … , 𝑏𝑘)′, estimating 𝛽 = (𝛽1, 𝛽2, … , 𝛽𝑘)′
• 𝑋 = 𝑛 × 𝑘 matrix containing the values of each explanatory variable (incl. a column of ones for the intercept):

    ( 1  𝑥21  ⋯  𝑥𝑘1
      1  𝑥22  ⋯  𝑥𝑘2
      ⋮   ⋮   ⋱   ⋮
      1  𝑥2𝑛  ⋯  𝑥𝑘𝑛 )
• 𝑒 = residual term, 𝑛 × 1 vector (column vector)

Homoskedasticity: the variance of the error term is the same across all observations, 𝑣𝑎𝑟(𝜀𝑖) = 𝜎², giving the covariance matrix:

    𝑣𝑎𝑟(𝜀) = 𝜎²𝐼𝑛 = ( 𝜎²  0  ⋯  0
                      0  𝜎²  ⋯  0
                      ⋮   ⋮   ⋱  ⋮
                      0   0  ⋯  𝜎² )
Heteroskedasticity: the variance of the error term differs across the values of one (or more) explanatory variable(s), 𝑣𝑎𝑟(𝜀𝑖) = 𝜎𝑖², with 𝜎𝑖² ≠ 𝜎𝑗²:

    𝑣𝑎𝑟(𝜀) = ( 𝜎1²  0  ⋯  0
               0  𝜎2²  ⋯  0
               ⋮   ⋮   ⋱  ⋮
               0   0  ⋯  𝜎𝑛² )
Exogenous: A variable which is not determined within the model (e.g. explanatory variables). By
assumption of strict exogeneity 𝐸[𝜀𝑖 |𝑋] = 0.
Endogenous: A variable determined within the model (e.g. the outcome variable, or other economic
explanatory variables such as years of education).
Exercise 2
State the seven Gauss-Markov assumptions and give a brief explanation of each assumption.
Assumption 1: Fixed regressors. All explanatory variables are fixed (non-stochastic), 𝑟𝑎𝑛𝑘(𝑋) = 𝑘 ≤ 𝑛, and there is no perfect multicollinearity.
Assumption 2: Random disturbances, zero mean. The error term is randomly distributed with mean = 0.
Hence, 𝐸[𝜖𝑖 |𝑋] = 0.
Assumption 3: Homoskedasticity. All disturbances have the same variance, 𝑣𝑎𝑟(𝜀𝑖) = 𝜎²:

    𝑣𝑎𝑟(𝜀) = 𝜎²𝐼𝑛 = ( 𝜎²  0  ⋯  0
                      0  𝜎²  ⋯  0
                      ⋮   ⋮   ⋱  ⋮
                      0   0  ⋯  𝜎² )
Assumption 4: No correlation. The off-diagonal elements of the covariance matrix of the disturbances (see above) are all equal to zero. In other words, 𝑐𝑜𝑣(𝜀𝑖, 𝜀𝑗) = 𝐸[𝜀𝑖𝜀𝑗] = 0 for 𝑖 ≠ 𝑗.
Assumption 5: Constant parameters. The elements of 𝛽 and 𝜎 are fixed unknown parameters, and 𝜎 > 0.
Assumption 6: Linear model. The outcome variable 𝑦 is a linear function of 𝛽 and 𝜀, and has been generated by the data generating process (DGP):
𝑦 = 𝑋𝛽 + 𝜀
Assumption 7: Normality. The disturbances are jointly normally distributed. Hence, 𝜖𝑖 ~ 𝑁(0, 𝜎 2 ) and the
sample is randomly drawn from the population.
Exercise 3
Give a brief explanation of the difference between the “True model” and the “estimated model”.
The true model contains the true parameter values (𝛽) that generate the outcome 𝑦. There is no way to know
the exact value of these parameters.
The true model encompasses the entire population, not only a sample of the population.
The estimated model is the model which minimizes the sum of squared residuals and is used to obtain 𝑏 - our estimate of 𝛽. We can compute 𝑏 using OLS, which gives us a best-fit line. The estimated model is based on a sample that is (ideally) representative of the population; the more representative the sample, the more precise our estimates.
Exercise 4
Given the multiple linear regression,
𝑦 = 𝑋𝛽 + 𝜀
Where 𝛽 is a vector of coefficients, 𝑋 is the matrix of explanatory variables (the design matrix), and 𝜀 is the error term with 𝜀 ~ 𝑁(0, 𝜎²𝐼).
The Ordinary Least Squares (OLS) function (also called our estimated model) is then given as,
𝑦 = 𝑋𝑏 + 𝑒
4.1
Derive the objective function and find the minimizer
𝑒 = 𝑦 − 𝑋𝑏

Our objective is to minimize the sum of squared residuals. Hence, our objective function is:

    𝑆(𝑏) = ∑ᵢ₌₁ⁿ (𝑦𝑖 − 𝑏1 − 𝑏2𝑥2𝑖 − 𝑏3𝑥3𝑖 − ⋯ − 𝑏𝑘𝑥𝑘𝑖)²
         = 𝑒′𝑒 = (𝑦 − 𝑋𝑏)′(𝑦 − 𝑋𝑏)
         = 𝑦′𝑦 − 𝑦′𝑋𝑏 − 𝑏′𝑋′𝑦 + 𝑏′𝑋′𝑋𝑏

Since 𝑦′𝑋𝑏 is a scalar, 𝑦′𝑋𝑏 = (𝑦′𝑋𝑏)′ = 𝑏′𝑋′𝑦, which we can use to rewrite the objective function as:

    𝑆(𝑏) = 𝑦′𝑦 − 2𝑏′𝑋′𝑦 + 𝑏′𝑋′𝑋𝑏
4.2
Show that 𝒃 = (𝑿′𝑿)⁻¹𝑿′𝒚 becomes the OLS estimate of 𝜷

Differentiate 𝑆(𝑏) with respect to 𝑏:

    ∂𝑆(𝑏)/∂𝑏 = ∂/∂𝑏 (𝑦′𝑦 − 2𝑏′𝑋′𝑦 + 𝑏′𝑋′𝑋𝑏) = −2𝑋′𝑦 + 2𝑋′𝑋𝑏

Then, solve the first-order condition ∂𝑆(𝑏)/∂𝑏 = 0:

    −2𝑋′𝑦 + 2𝑋′𝑋𝑏 = 0
    2𝑋′𝑋𝑏 = 2𝑋′𝑦
    𝑋′𝑋𝑏 = 𝑋′𝑦

By Assumption 1, 𝑋 has full column rank, so 𝑋′𝑋 is invertible. Premultiplying both sides by (𝑋′𝑿)⁻¹ gives:

    𝑏 = (𝑋′𝑋)⁻¹𝑋′𝑦
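As a quick numerical check, the formula can be applied directly in R; this is a minimal sketch with simulated data (all variable names are illustrative), and it reproduces the coefficients returned by lm():

```r
# Simulate a small dataset from the DGP y = X*beta + eps
set.seed(1)
n  <- 100
x2 <- rnorm(n)
x3 <- rnorm(n)
X  <- cbind(1, x2, x3)          # n x k design matrix (incl. intercept column)
beta <- c(2, 0.5, -1)
y  <- X %*% beta + rnorm(n)

# OLS by the formula b = (X'X)^(-1) X'y
b <- solve(t(X) %*% X) %*% t(X) %*% y

# The same estimates via R's built-in least-squares fit
coef(lm(y ~ x2 + x3))
```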
Exercise 5*
Imagine we want to estimate the wage for our future job given some observable factors. Econometrics gives us various ways of doing so. Here are some guidelines one might want to consider when building a model and estimating it with Ordinary Least Squares (OLS). These will be tested in the problems below.
- Analyzing the dataset. What factors are important and what do we want to estimate?
- Can we do better?
For the exercise, use the dataset called wage1.RData. To load the dataset into RStudio, use the following code:
load("wage1.RData")
5.1
What is the mean, variance and quantiles of the variable wage?
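These statistics can be obtained directly in R; a minimal sketch, assuming the loaded object is a data frame named wage1 containing a numeric variable wage:

```r
load("wage1.RData")      # loads the data frame (assumed to be named wage1)

mean(wage1$wage)         # sample mean of wage
var(wage1$wage)          # sample variance of wage
quantile(wage1$wage)     # quantiles: min, 25%, median, 75%, max
```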
5.2
Estimate Model 1 with OLS in RStudio. What do we observe with Model 1?
Interpretation:
• Educ: an additional year of education is expected to increase wages by 0.599 units - significant at the 1% level
• Exper: an additional year of experience is expected to increase wages by 0.022 units - note that this coefficient is only significant at a 10% significance level
• Tenure: an additional year of tenure is expected to increase wages by 0.169 units - highly significant
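Model 1 is not restated here, but based on the interpretation above it regresses wage on education, experience and tenure; a sketch in R (the specification and variable names educ, exper, tenure are assumed to follow the wage1 dataset):

```r
load("wage1.RData")

# Model 1 (assumed spec): wage on education, experience and tenure
model1 <- lm(wage ~ educ + exper + tenure, data = wage1)
summary(model1)          # coefficients, standard errors and significance stars
```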
5.3
Given the factors available in the dataset, your knowledge of econometrics and the estimates in 5.2, could we do better?
We could introduce additional (relevant) covariates. Based on our knowledge of economics, we could include
𝑒𝑥𝑝𝑒𝑟 2 as there is a non-linear relationship between wages and experience. Moreover, we could include
female and non-white because of gender- and racial earning gaps.
As the distribution of income is notoriously skewed, we also estimate an alternative to Model 1 where we log-transform the dependent variable.

Both proposed models outperform Model 1 in terms of adjusted R-squared.
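The two proposed extensions can be sketched in R as follows (assuming the dataset contains the dummy variables female and nonwhite, as in the standard wage1 data):

```r
load("wage1.RData")

# Model 2: add exper^2 plus gender and race dummies
model2 <- lm(wage ~ educ + exper + I(exper^2) + tenure + female + nonwhite,
             data = wage1)

# Model 3: same covariates, log-transformed dependent variable
model3 <- lm(log(wage) ~ educ + exper + I(exper^2) + tenure + female + nonwhite,
             data = wage1)

# Compare fit via adjusted R-squared
summary(model2)$adj.r.squared
summary(model3)$adj.r.squared
```

Note that comparing R-squared between the wage and log(wage) models is only indicative, since the dependent variables differ.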