Econometrics 7

This lecture introduces econometrics, focusing on the simple regression model, which explains how one variable (y) varies with another (x). It covers the estimation of Ordinary Least Squares (OLS) parameters, the derivation of the estimators, and the interpretation of fitted values and residuals using examples such as wage and education. The lecture also discusses the numerical properties of the OLS estimators, including that the residuals sum to zero and that the covariance between the regressors and the residuals is zero.

Lecture 7

Defining Econometrics, Economic Models, and Econometric Models

Introduction to Econometrics
BSc Eco 2023, Spring 2025
Instructor: Sunaina Dhingra
Email-id: sunaina@jgu.edu.in
Lecture Date: 24th February
Revision of Estimation and Interpretation of OLS Estimators
The Simple Regression Model
• Definition of the simple regression model
• “Explains variable y in terms of variable x”
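The slide's equation did not survive extraction; in its standard form (Wooldridge, Ch. 2) the model is

$y = \beta_0 + \beta_1 x + u$,

where $\beta_0$ is the intercept, $\beta_1$ the slope parameter, and $u$ the error term (disturbance) capturing all factors affecting y other than x.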

The Simple Regression Model
• Interpretation of the simple linear regression model
• Explains how y varies with changes in x

• The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.
The Simple Regression Model
• Example: Soybean yield and fertilizer: $yield = \beta_0 + \beta_1 fertilizer + u$, where u contains factors such as land quality and rainfall

• Example: A simple wage equation: $wage = \beta_0 + \beta_1 educ + u$, where u contains factors such as experience and innate ability
The Simple Regression Model
• When is there a causal interpretation?
• Conditional mean independence assumption

• Example: wage equation

The Simple Regression Model
• Population regression function (PRF)
• The conditional mean independence assumption implies that the population regression function is linear (see the equation below)

• This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.
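The PRF equation, in its standard form:

$E(y \mid x) = \beta_0 + \beta_1 x$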

The Simple Regression Model
• Deriving the ordinary least squares estimates
• In order to estimate the regression model one needs data
• A random sample of n observations, $\{(x_i, y_i): i = 1, \dots, n\}$, drawn from the population

The Simple Regression Model
• Deriving the ordinary least squares (OLS) estimators
• Defining regression residuals

• Minimize the sum of the squared regression residuals

• OLS estimators
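The slide's equations did not survive extraction; in standard notation (Wooldridge, Ch. 2) they are

$\hat u_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$ (regression residual)

$\min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} \hat u_i^2$ (minimization problem)

$\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad \hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}$ (OLS estimators, derived on the following slides)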

The Simple Regression Model
• OLS fits a regression line through the data points as well as possible
Estimation of OLS parameters
• The Wage1 dataset is used to estimate the wage-education model

PRF: $wage_i = \beta_0 + \beta_1 education_i + \mu_i$ -------(1)

• The model is estimated using a sample from the population

SRF: $wage_i = \hat\beta_0 + \hat\beta_1 education_i + \hat\mu_i$ -------(2)

$wage_i = \widehat{wage}_i + \hat\mu_i$

• $\widehat{wage}_i$ is the estimated conditional mean value of $wage_i$

• The parameters of this regression model are estimated using the Ordinary Least Squares (OLS) method
Estimation of OLS parameters (contd.)

$\hat\mu_i = wage_i - \hat\beta_0 - \hat\beta_1 education_i$ -------(3)

• As the sum of the disturbances equals zero, their mean also equals zero:

$E(\mu) = \bar\mu = 0$ -------(4)

• Because the average residual is zero (positive and negative residuals cancel), minimizing the plain sum of residuals is uninformative; we therefore square the residuals, sum them, and minimize that sum:

$\min \sum_{i=1}^{n} \hat\mu_i^2 = \min \sum_{i=1}^{n} \left(wage_i - \hat\beta_0 - \hat\beta_1 education_i\right)^2$ -------(5)

• The estimators $\hat\beta_0$ and $\hat\beta_1$ are obtained by minimizing the sum of squared residuals.
Estimation of OLS parameters (contd.)

$\min \sum_{i=1}^{n} \hat\mu_i^2 = \min \sum_{i=1}^{n} \left(wage_i - \hat\beta_0 - \hat\beta_1 education_i\right)^2$ -------(5)

• Take the first-order condition (FOC) of equation 5 with respect to $\hat\beta_0$ and set it equal to 0:

$-2 \sum_{i=1}^{n} \left(wage_i - \hat\beta_0 - \hat\beta_1 education_i\right) = 0$ -------(6)

• Take the FOC of equation 5 with respect to $\hat\beta_1$ and set it equal to 0:

$-2 \sum_{i=1}^{n} education_i \left(wage_i - \hat\beta_0 - \hat\beta_1 education_i\right) = 0$ -------(7)
Estimation of OLS parameters (contd.)
• Solving 6 and 7 simultaneously, we get the OLS estimators $\hat\beta_0$ and $\hat\beta_1$:

$-2 \sum_{i=1}^{n} \left(wage_i - \hat\beta_0 - \hat\beta_1 education_i\right) = 0$ -------(6)

$-2 \sum_{i=1}^{n} education_i \left(wage_i - \hat\beta_0 - \hat\beta_1 education_i\right) = 0$ -------(7)

$\hat\beta_0 = \overline{wage} - \hat\beta_1 \overline{education}$ ------(8)

$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (education_i - \overline{education})(wage_i - \overline{wage})}{\sum_{i=1}^{n} (education_i - \overline{education})^2}$ ------(9)

• These estimators are also referred to as least-squares estimators


Fitted Values and Residuals
• Thus, we can write the OLS estimators for any y and x as

$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ ------(10)

$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}$ ------(11)

• Equation 11 can be rewritten as

$\hat\beta_1 = \dfrac{Cov(x, y)}{\hat\sigma_x^2}$ ------(12)

• Because the sample variance in the denominator is always positive, the covariance and the slope have the same sign. Thus, the sign of the covariance determines the expected direction in which x affects y.
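As a check on equation 12, a minimal STATA sketch (not the lecture's Do file; it assumes the Wage1 dataset is already in memory) computes the slope as sample covariance over sample variance and compares it with the regress output:

correlate wage education, covariance
matrix C = r(C)                      // C[1,2] = Cov(wage, education), C[2,2] = Var(education)
scalar b1 = C[1,2] / C[2,2]          // slope as covariance over variance (eq. 12)
summarize wage, meanonly
scalar ybar = r(mean)
summarize education, meanonly
scalar b0 = ybar - b1 * r(mean)      // intercept from eq. 10
display "b0 = " b0 "    b1 = " b1
regress wage education               // coefficients should match b0 and b1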
Fitted Values and Residuals (contd.)
• Predicted y: for any given value of $x_i$, using the estimated $\hat\beta_0$ and $\hat\beta_1$ values we get

$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ -----(13)

$y_i = \hat y_i + \hat u_i$ -----(14)

• The fitted regression line is called the line of best fit

• The OLS residual associated with each observation i, $\hat u_i$, is

$\hat u_i = y_i - \hat y_i$ -----(15)

• If $\hat u_i$ is positive, the line underpredicts $y_i$; if $\hat u_i$ is negative, the line overpredicts $y_i$
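A small STATA sketch (my own illustration, not the lecture's Do file; it assumes Wage1 is loaded) that produces the fitted values of equation 13 and the residuals of equation 15:

regress wage education
predict wagehat, xb            // fitted values, eq. 13
predict uhat, resid            // residuals, eq. 15
list wage wagehat uhat in 1/5
* positive uhat: the line underpredicts wage; negative uhat: it overpredicts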
The Simple Regression Model
• Example of a simple regression
• CEO salary and return on equity: $salary = \beta_0 + \beta_1 roe + u$ (salary in thousands of dollars, roe in percent)

• Fitted regression: $\widehat{salary} = 963.191 + 18.501\, roe$ (consistent with the fitted values in the CEO table below)

• Causal interpretation? Only if $E(u \mid roe) = 0$, i.e., other determinants of salary are unrelated to roe
The Simple Regression Model
• Example of a simple regression
• Wage and education: $wage = \beta_0 + \beta_1 educ + u$

• Fitted regression: $\widehat{wage} = -0.90 + 0.54\, educ$ (see the STATA results below)

• Causal interpretation? Only if $E(u \mid educ) = 0$, e.g., average ability does not vary with education
The Simple Regression Model
• Example of a simple regression
• Voting outcomes and campaign expenditures (two parties)

• Fitted regression

• Causal interpretation?

STATA Results
• The Wage1 dataset is used to estimate the wage-education model

$\widehat{wage}_i = -0.91 + 0.54\, education_i$

• Interpretation: each additional year of education raises the predicted hourly wage by about $0.54; the intercept is the predicted wage at zero years of education

STATA Result 1: Estimation of OLS Regression Line (Figure 1, the line of best fit, is not reproduced)

. reg wage education

      Source |       SS           df       MS      Number of obs   =       526
-------------+----------------------------------   F(1, 524)       =    103.36
       Model |  1179.73204         1  1179.73204   Prob > F        =    0.0000
    Residual |  5980.68225       524  11.4135158   R-squared       =    0.1648
-------------+----------------------------------   Adj R-squared   =    0.1632
       Total |  7160.41429       525  13.6388844   Root MSE        =    3.3784

------------------------------------------------------------------------------
        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
------------------------------------------------------------------------------

Source: Author’s estimation using Wage1 dataset in STATA, refer Do file for command
Algebraic Properties: Numerical & Statistical
• Algebraic properties are classified into numerical properties and statistical properties

• Numerical Properties:
• Numerical properties always hold, because they follow from the differential calculus used to derive the OLS first-order conditions

• If the technique is used correctly, the $\hat\beta_0$ and $\hat\beta_1$ estimates will satisfy the numerical properties regardless of how the data were generated

• These properties hold for any sample of data.


Numerical Property 1
• Numerical Property 1: The sum, and hence the sample average, of the OLS residuals is zero

$\sum_{i=1}^{n} \hat u_i = 0$ -------(1)

• Equation 1 follows from the first-order condition with respect to $\hat\beta_0$:

$-2 \sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0$ -------(2)

$\bar{\hat\mu} = \dfrac{\sum_{i=1}^{n} \hat u_i}{n} = 0$ if equation 1 holds true
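A quick STATA check of Property 1 on the Wage1 data (a sketch, assuming the dataset is loaded):

regress wage education
capture drop uhat
predict uhat, resid
summarize uhat           // the mean is zero up to rounding error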
Numerical Property 2
• Numerical Property 2: The sample covariance between the regressors and the OLS residuals is zero

$Cov(\hat\mu, x) = E[(\hat\mu - E(\hat\mu))(x - E(x))]$ -------(3)

$= E[\hat\mu (x - E(x))]$ as $E(\hat\mu) = 0$

$Cov(\hat\mu, x) = E[\hat\mu x]$ -----(4)

• The covariance is therefore zero when

$\sum_{i=1}^{n} \hat\mu_i x_i = 0$ -------(5)

• Equation 5 is exactly the first-order condition with respect to $\hat\beta_1$, which holds for any x:

$-2 \sum_{i=1}^{n} x_i \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0$ -------(6)
Numerical Property 2: Example
• The Wage1 dataset is used to estimate the wage-education model

Source: Author's estimation using Wage1 dataset in STATA, refer Do file for command

• $Cov(\hat\mu, education) = 0$
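This can be verified with a short STATA sketch (hypothetical commands, not the lecture's Do file; Wage1 assumed loaded):

regress wage education
capture drop uhat
predict uhat, resid
correlate uhat education, covariance   // off-diagonal entry is zero up to rounding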
Numerical Property 3
• The point $(\bar x, \bar y)$ is always on the OLS regression line

$\overline{wage} = \hat\beta_0 + \hat\beta_1 \overline{education}$ ------(7)

• In the equation above, if we plug in the mean of education, $\overline{education}$, for education, then the predicted value of wage is its mean, i.e., $\overline{wage}$

• These properties can be used to write each $y_i$ as its fitted value plus its residual, as given below:

$y_i = \hat y_i + \hat u_i$ ------(8)
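A sketch verifying Property 3 in STATA (assuming Wage1 is loaded): plugging the mean of education into the fitted line returns the mean of wage.

regress wage education
summarize education, meanonly
scalar predbar = _b[_cons] + _b[education] * r(mean)   // fitted line at mean education
summarize wage, meanonly
display predbar, r(mean)       // the two values coincide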
The Simple Regression Model
• This table presents fitted values and residuals for 15 CEOs (salary in thousands of dollars).

obsno    roe    salary   salaryhat     uhat
  1      14.1    1095    1224.058   -129.058
  2      10.9    1001    1164.854   -163.854
  3      23.5    1122    1397.969   -275.969
  4       5.9     578    1072.348   -494.348
  5      13.8    1368    1218.508    149.492
  6      20.0    1145    1333.215   -188.215
  7      16.4    1078    1266.611   -188.611
  8      16.3    1094    1264.761   -170.761
  9      10.5    1237    1157.454     79.546
 10      26.3     833    1449.773   -616.773
 11      25.9     567    1442.372   -875.372
 12      26.8     933    1459.023   -526.023
 13      14.8    1339    1237.009    101.991
 14      22.3     937    1375.768   -438.768
 15      56.3    2011    2004.808      6.192

• For example, the 12th CEO's predicted salary is $526,023 higher than their actual salary.

• By contrast, the 5th CEO's predicted salary is $149,492 lower than their actual salary.
Measure of Goodness of Fit (R²)

Introduction
• Goodness of fit measures how well the independent variable explains the dependent variable y

• From Numerical Property 1: the average of the residuals is zero because the sum of the residuals is zero

$\bar{\hat\mu} = \dfrac{\sum_{i=1}^{n} \hat\mu_i}{n} = 0 \text{ if } \sum_{i=1}^{n} \hat\mu_i = 0$ -------(1)

• Actual $y_i$ consists of a fitted value and a residual

$y_i = \hat y_i + \hat\mu_i$ -------(2)

• From eq. 2, the sample average of the fitted values equals the sample average of y (summing eq. 2, dividing by n, and plugging in eq. 1):

$\bar{\hat y} = \bar y$ --------(3)
Introduction (contd.)

• The covariance between the residuals $\hat\mu_i$ and x is 0

$Cov(\hat\mu_i, x_i) = E[\hat\mu_i x_i] = 0$ as $\sum_{i=1}^{n} \hat\mu_i x_i = 0$ -----(4)

• The covariance between the fitted values and the residuals is 0

$Cov(\hat y_i, \hat\mu_i) = E[\hat\mu_i \hat y_i] = E[\hat\mu_i (\hat\beta_0 + \hat\beta_1 x_i)] = 0$ -------(5)

since $\sum_{i=1}^{n} \hat\mu_i (\hat\beta_0 + \hat\beta_1 x_i) = \hat\beta_0 \sum_{i=1}^{n} \hat\mu_i + \hat\beta_1 \sum_{i=1}^{n} \hat\mu_i x_i = 0$
Variation in a regression model

• Total sum of squares (SST): measures the total sample variation in $y_i$

Total variation: $SST = \sum_{i=1}^{n} (y_i - \bar y)^2$ -------(6)

• Explained sum of squares (SSE): measures the sample variation in $\hat y_i$

Explained variation: $SSE = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$ -------(7)

• Residual sum of squares (SSR): measures the sample variation in $\hat\mu_i$

Unexplained variation: $SSR = \sum_{i=1}^{n} \hat u_i^2$ -------(8)
Variation in a regression model (contd.)

• The total variation in y can be written as the sum of the explained and the unexplained variation

SST = SSE + SSR ---------(9)

• Dividing the equation above throughout by SST:

1 = SSE/SST + SSR/SST --------(10)


Coefficient of determination (R²)
• It is the ratio of the explained variation (SSE) to the total variation (SST)

• It is the fraction of the sample variation in y that is explained by x

$R^2 = \dfrac{SSE}{SST}$ -----(11)

• The value of R² is always between zero and one, because SSE can be no greater than SST

$R^2 = 1 - \dfrac{SSR}{SST}$ -----(12)

• If the regression model fits well, then SSR/SST is close to zero and R² is close to one

• If the regression model fits badly, then SSR/SST is close to one and R² is close to zero
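In STATA, the decomposition and R² can be recovered from the stored results after regress (a sketch; note that STATA's e(mss), the "Model" sum of squares, is the lecture's SSE, and e(rss), the "Residual" sum of squares, is the lecture's SSR):

regress wage education
display "SSE = " e(mss)
display "SSR = " e(rss)
display "SST = " e(mss) + e(rss)
display "R2  = " e(mss) / (e(mss) + e(rss))   // matches e(r2) and the reported R-squared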
STATA Results
• The Wage1 dataset is used to estimate the wage-education model
STATA Result 1: Regression Result

. reg wage education

      Source |       SS           df       MS      Number of obs   =       526
-------------+----------------------------------   F(1, 524)       =    103.36
       Model |  1179.73204         1  1179.73204   Prob > F        =    0.0000
    Residual |  5980.68225       524  11.4135158   R-squared       =    0.1648
-------------+----------------------------------   Adj R-squared   =    0.1632
       Total |  7160.41429       525  13.6388844   Root MSE        =    3.3784

------------------------------------------------------------------------------
        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
------------------------------------------------------------------------------

R² = 100 × 0.1648 = 16.48%: share of the sample variation in wage explained by education; unexplained: 83.52%

Source: Author's estimation using Wage1 dataset in STATA, refer Do file for command

• A low R² value does not mean the OLS regression equation is useless

• Using R-squared as the main gauge of success for an econometric analysis can lead to trouble
The Simple Regression Model
• Goodness of fit
• How well does the explanatory variable explain the dependent variable?

• Measures of variation: SST (total), SSE (explained), and SSR (residual), as defined in equations 6-8 above
The Simple Regression Model
• Decomposition of total variation: SST = SSE + SSR

• Goodness-of-fit measure (R-squared): $R^2 = SSE/SST = 1 - SSR/SST$
The Simple Regression Model
• CEO salary and return on equity

• Voting outcomes and campaign expenditures

• Caution: A high R-squared does not necessarily mean that the regression has a causal interpretation!
The Simple Regression Model
• Expected values and variances of the OLS estimators
• The estimated regression coefficients are random variables because they are calculated from a random sample

• The question is what the estimators estimate on average, and how large their variability is in repeated samples
Assumptions and Unbiasedness
Property of OLS Estimators
SLR.1: Linear in parameters

Source: Wooldridge, Chapter 2

• We need linearity in parameters, i.e., the β's should have power 1, but the equation need not be linear in the variables y and x
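The assumption's equation did not survive extraction; SLR.1 states that in the population the model is

$y = \beta_0 + \beta_1 x + u$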
SLR.2: Random Sampling

Source: Wooldridge, Chapter 2

• As the primary goal is to draw conclusions about the population, the sample that
we used must be drawn at random from the population

• If the sample is not drawn at random, it may not be representative of the population, and the conclusions that we draw from it may be biased
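The slide's box was lost in extraction; the standard statement of SLR.2 is: we have a random sample $\{(x_i, y_i): i = 1, \dots, n\}$ of size n following the population model of SLR.1.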
The Simple Regression Model
• Discussion of random sampling: Wage and education
• The population consists, for example, of all workers of country A
• In the population, there is a linear relationship between wages and years
of education.
• Draw completely randomly a worker from the population
• The wage and the years of education of the worker drawn are random
because one does not know beforehand which worker is drawn.
• Throw that worker back into the population and repeat the random draw n
times.
• The wages and years of education of the sampled workers are used to
estimate the linear relationship between wages and education.

SLR.3: Sample Variation in the explanatory variable
Source: Wooldridge, Chapter 2

• This assumption requires that x varies in the sample

• Assumption SLR.3 fails if the sample standard deviation of $x_i$ is zero; otherwise, it holds.

• If $\sum_{i=1}^{n} (x_i - \bar x)^2 = 0$, the denominator of $\hat\beta_1$ is zero, and neither $\hat\beta_1$ nor $\hat\beta_0$ can be computed
Source: Author’s estimation using Wage1 dataset in STATA, refer Do file for command
SLR.4: Zero Conditional Mean

Source: Wooldridge, Chapter 2

• It states that the average value of the disturbance, conditional on x, equals zero for all
possible values of the explanatory variable.

• It also implies that the cloud of data is centered on a straight line at every possible value of x
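In symbols (the slide's equation was lost in extraction):

$E(u \mid x) = 0$ for all values of x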
Statistical Property 1: Unbiasedness
• Bias means that the estimator's expected value does not equal the true value in the population

• In that sense, biased estimators are not even "correct on average"

• The estimators are unbiased if

$E(\hat\beta) = \beta$ -------(1)

• Unbiasedness does not imply that $\hat\beta$ for every possible sample equals its population value

• It only means that, on average, the $\hat\beta$'s are neither systematically larger nor smaller than the population value
Theorem 1: Unbiasedness of OLS Estimators
Source: Wooldridge, Chapter 2

• Under assumptions SLR.1 through SLR.4, the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are unbiased, i.e., $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$: on average they equal the population parameters
• In the real world, simple linear regression estimators will very often be biased
• Unbiasedness (Theorem 1) only holds if $E(u|x) = 0$, i.e., if SLR.4 is true

Interpretation of unbiasedness
• The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw.
• However, on average, they will be equal to the values that characterize the true relationship between y and x in the
population.
• “On average” means: if sampling were repeated, i.e., if drawing the random sample and doing the estimation were repeated many times.
• In a given sample, estimates may differ considerably from true values.
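A minimal STATA simulation sketch of this repeated-sampling idea (my own illustration, not part of the lecture): in a population where SLR.1-SLR.4 hold by construction and the true slope is 0.5, the OLS slope estimates from many random samples average to roughly 0.5.

capture program drop olssim
program define olssim, rclass
    drop _all
    set obs 100
    generate x = rnormal(10, 2)
    generate u = rnormal(0, 1)      // E(u|x) = 0 holds by construction
    generate y = 1 + 0.5*x + u      // true beta0 = 1, beta1 = 0.5
    regress y x
    return scalar b1 = _b[x]
end

set seed 12345
simulate b1 = r(b1), reps(1000) nodots: olssim
summarize b1                        // mean of b1 is close to the true value 0.5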
Failure of SLR.4 may lead to biased estimates

• Usually SLR.4 fails to hold due to:

❑ Reverse causality

❑ Wrong functional form

❑ Expected values of y conditional on x do not really fall on a straight line

❑ Error in measurement of x variable

❑ Omitted variables
