Metrics 2019 Lec3
Zhaopeng Qu
Nanjing University
9/26/2019
Section 1
Section 2
$$\beta_1 = \frac{\Delta TestScore}{\Delta ClassSize}$$
BUT the average test score in district $i$ does not depend only on the average class size.
It also depends on other factors, such as
Student background
Quality of the teachers
School's facilities
Quality of textbooks ...
So the equation describing the linear relation between Test Score and Class Size is better written as

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
where
$Y_i$ is the dependent variable (Test Score)
$X_i$ is the independent variable or regressor (Class Size or Student-Teacher Ratio)
$\beta_0 + \beta_1 X_i$ is the population regression line or the population regression function
This is the relationship that holds between $Y$ and $X$ on average over the population. (Sound familiar? Recall the concept of the CEF.)
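To make the model concrete, here is a minimal simulation sketch in Python (NumPy assumed available). The parameter values $\beta_0 = 700$, $\beta_1 = -2$, the class-size range, and the noise scale are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 420                                       # number of districts (illustrative)
beta0, beta1 = 700.0, -2.0                    # hypothetical population parameters
class_size = rng.uniform(15, 30, n)           # X_i: student-teacher ratio
u = rng.normal(0, 10, n)                      # u_i: all other factors (background, teachers, ...)
test_score = beta0 + beta1 * class_size + u   # Y_i = beta0 + beta1 * X_i + u_i
```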
The estimators of the slope and intercept are those that minimize the sum of the squares of $\hat{u}_i$, thus

$$\arg\min_{b_0, b_1} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$
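As a sanity check, this minimization problem can be handed to a generic numerical optimizer, which lands on the same solution as the closed-form OLS formulas derived below. A minimal sketch, assuming NumPy and SciPy are available and reusing the simulated data from above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(15, 30, 420)                  # simulated class sizes (as above)
y = 700.0 - 2.0 * x + rng.normal(0, 10, 420)  # simulated test scores

def ssr(b):
    """Sum of squared residuals as a function of the candidate (b0, b1)."""
    b0, b1 = b
    return np.sum((y - b0 - b1 * x) ** 2)

res = minimize(ssr, x0=[0.0, 0.0])            # numerical argmin over (b0, b1)
print(res.x)                                  # approximately (700, -2)
```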
Step 2, for $\beta_1$:

$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = 0$$
Optimization

$$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} \hat{u}_i^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$$

$$\Rightarrow \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} b_0 - \sum_{i=1}^{n} b_1 X_i = 0$$

$$\Rightarrow \frac{1}{n} \sum_{i=1}^{n} Y_i - \frac{1}{n} \sum_{i=1}^{n} b_0 - b_1 \frac{1}{n} \sum_{i=1}^{n} X_i = 0$$

$$\Rightarrow \bar{Y} - b_0 - b_1 \bar{X} = 0$$

OLS estimator of $\beta_0$:

$$b_0 = \hat{\beta}_0 = \bar{Y} - b_1 \bar{X}$$
$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} \hat{u}_i^2 = -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$$

Substituting $b_0 = \bar{Y} - b_1 \bar{X}$:

$$\Rightarrow \sum_{i=1}^{n} X_i \left[ Y_i - (\bar{Y} - b_1 \bar{X}) - b_1 X_i \right] = 0$$

$$\Rightarrow \sum_{i=1}^{n} X_i \left[ (Y_i - \bar{Y}) - b_1 (X_i - \bar{X}) \right] = 0$$

$$\Rightarrow \sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - b_1 \sum_{i=1}^{n} X_i (X_i - \bar{X}) = 0$$

Thus, since $\sum \bar{X}(Y_i - \bar{Y}) = 0$ and $\sum \bar{X}(X_i - \bar{X}) = 0$, we can replace $X_i$ by $(X_i - \bar{X})$:

$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) - b_1 \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X}) = 0$$

OLS estimator of $\beta_1$:

$$b_1 = \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}$$
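In code, the two closed-form estimators are one line each. A minimal sketch with NumPy, again on simulated data; `np.polyfit` is used only as an independent cross-check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(15, 30, 420)
y = 700.0 - 2.0 * x + rng.normal(0, 10, 420)

# Closed-form OLS: slope from demeaned cross-products, then the intercept
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta0_hat, beta1_hat)                   # approximately 700 and -2

# Cross-check: np.polyfit with degree 1 returns [slope, intercept]
print(np.polyfit(x, y, 1))
```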
The two first-order conditions

$$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = 0$$

$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = 0$$

reduce to

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$$

$$\sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$$
OLS Estimation: Simple Regression
Then we have

$$\sum_{i=1}^{n} \hat{u}_i = 0$$

$$\sum_{i=1}^{n} \hat{u}_i X_i = 0$$
Some Algebraic Properties of $\hat{u}$

The residuals have zero sample mean:

$$\bar{\hat{u}} = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$$

and are uncorrelated with the regressor:

$$\sum_{i=1}^{n} \hat{u}_i X_i = 0$$
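Both properties follow mechanically from the first-order conditions, so they hold exactly (up to floating-point error) in any OLS fit. A quick check on the simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(15, 30, 420)
y = 700.0 - 2.0 * x + rng.normal(0, 10, 420)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x                       # OLS residuals

print(np.sum(u_hat))                          # ~0: first-order condition for b0
print(np.sum(u_hat * x))                      # ~0: first-order condition for b1
```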
Decompose $Y_i$ into the fitted value plus the residual: $Y_i = \hat{Y}_i + \hat{u}_i$
The total sum of squares (TSS): $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
The explained sum of squares (ESS): $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
The sum of squared residuals (SSR): $SSR = \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$
And

$$TSS = ESS + SSR$$

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

So $0 \leq R^2 \leq 1$.
It may seem that the bigger the $R^2$, the better the regression.
But actually we DON'T care much about $R^2$ in causal inference.
The standard error of the regression (SER) is $s_{\hat{u}}$, where

$$s_{\hat{u}}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$$
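The decomposition, $R^2$, and SER are a few lines of NumPy. This sketch verifies $TSS = ESS + SSR$ and computes $R^2$ both ways on the simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(15, 30, 420)
y = 700.0 - 2.0 * x + rng.normal(0, 10, 420)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum(u_hat ** 2)

print(np.isclose(tss, ess + ssr))             # True: TSS = ESS + SSR
print(ess / tss, 1 - ssr / tss)               # two equivalent expressions for R^2
print(np.sqrt(ssr / (len(y) - 2)))            # SER, ~10 (the simulated sd of u)
```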
Section 3
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

and $E[u_i \mid X_i] = 0$, which implies

$$Cov[u_i, X_i] = E[u_i X_i] = 0$$
Section 4
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})(X_i - \bar{X})}$$
1 Assumption 1: The conditional mean of $u_i$ given $X_i$ is zero: $E[u_i \mid X_i] = 0$
2 Assumption 2: $(X_i, Y_i)$, $i = 1, ..., n$, are i.i.d. draws from their joint distribution
3 Assumption 3: Large outliers are unlikely: $X_i$ and $Y_i$ have nonzero finite fourth moments
If the 3 least squares assumptions hold, the OLS estimators will be
unbiased
consistent
normally distributed in large samples
Recall:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

To take the expectation of $\hat{\beta}_0$, remember we have

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

$$\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{u}$$
So take the expectation of $\hat{\beta}_1$:

$$E[\hat{\beta}_1] = E\left[ \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$
Continued

$$E[\hat{\beta}_1] = E\left[ \frac{\sum (X_i - \bar{X})\left( \beta_0 + \beta_1 X_i + u_i - (\beta_0 + \beta_1 \bar{X} + \bar{u}) \right)}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

$$= E\left[ \frac{\sum (X_i - \bar{X})\left( \beta_1 (X_i - \bar{X}) + (u_i - \bar{u}) \right)}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

$$= \beta_1 + E\left[ \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

Because $\sum (X_i - \bar{X})(u_i - \bar{u}) = \sum (X_i - \bar{X}) u_i$, so

$$E[\hat{\beta}_1] = \beta_1 + E\left[ \frac{\sum (X_i - \bar{X}) u_i}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

$$= \beta_1 + E\left[ \frac{\sum (X_i - \bar{X}) E(u_i \mid X_1, ..., X_n)}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

by the law of iterated expectations. Under Assumption 1, $E(u_i \mid X_1, ..., X_n) = 0$, so $E[\hat{\beta}_1] = \beta_1$: OLS is unbiased.
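Unbiasedness is a statement about the average of $\hat{\beta}_1$ across repeated samples, which a small Monte Carlo makes visible. A sketch, with a DGP that satisfies $E[u_i \mid X_i] = 0$ by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
beta1 = -2.0

estimates = []
for _ in range(5000):                         # many independent samples
    x = rng.uniform(15, 30, 100)
    y = 700.0 + beta1 * x + rng.normal(0, 10, 100)  # E[u|X] = 0 holds
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

print(np.mean(estimates))                     # ~ -2: E[beta1_hat] = beta1
```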
Notation: $\hat{\beta}_1 \xrightarrow{p} \beta_1$ or $\text{plim}\, \hat{\beta}_1 = \beta_1$, so

$$\text{plim}\, \hat{\beta}_1 = \text{plim}\left[ \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

$$\text{plim}\, \hat{\beta}_1 = \text{plim}\left[ \frac{\frac{1}{n-1} \sum (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n-1} \sum (X_i - \bar{X})(X_i - \bar{X})} \right] = \text{plim}\left( \frac{s_{xy}}{s_x^2} \right)$$

where $s_{xy}$ and $s_x^2$ are the sample covariance and sample variance.
For continuous $g$,

$$\text{plim}(g(X)) = g(\text{plim}(X))$$

Example:

$$\text{plim}(X + Y) = \text{plim}(X) + \text{plim}(Y)$$

$$\text{plim}\left( \frac{X}{Y} \right) = \frac{\text{plim}(X)}{\text{plim}(Y)} \quad \text{if } \text{plim}(Y) \neq 0$$
Since $s_{xy} \xrightarrow{p} Cov(X_i, Y_i)$ and $s_x^2 \xrightarrow{p} Var(X_i)$,

$$\text{plim}\, \hat{\beta}_1 = \frac{Cov(X_i, Y_i)}{Var(X_i)}$$

$$= \frac{Cov(X_i, (\beta_0 + \beta_1 X_i + u_i))}{Var(X_i)}$$

$$= \frac{Cov(X_i, \beta_0) + \beta_1 Cov(X_i, X_i) + Cov(X_i, u_i)}{Var(X_i)}$$

$$= \beta_1 + \frac{Cov(X_i, u_i)}{Var(X_i)}$$

Under Assumption 1, $Cov(X_i, u_i) = 0$, so $\text{plim}\, \hat{\beta}_1 = \beta_1$: OLS is consistent.
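The last line is worth simulating: when $Cov(X_i, u_i) = 0$ the slope estimate converges to $\beta_1$ as $n$ grows, and otherwise it converges to $\beta_1 + Cov(X_i, u_i)/Var(X_i)$. A sketch, where `corr` is a made-up knob that injects correlation between $X_i$ and $u_i$:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(n, corr=0.0):
    x = rng.normal(0, 1, n)
    u = corr * x + rng.normal(0, 1, n)        # corr != 0 breaks Cov(X_i, u_i) = 0
    y = 1.0 + 2.0 * x + u                     # true beta1 = 2
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

for n in (100, 10_000, 1_000_000):
    print(n, ols_slope(n))                    # -> 2 as n grows: consistency

print(ols_slope(1_000_000, corr=0.5))         # -> 2.5 = beta1 + Cov(X,u)/Var(X)
```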
Recall that for the sample mean, $E(\bar{Y}) = \mu_Y$ and, in large samples,

$$\bar{Y} \sim N\left( \mu_Y, \frac{\sigma_Y^2}{n} \right)$$

The OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ have similar sampling distributions when the three least squares assumptions hold.
$$\hat{\beta}_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})} = \beta_1 + \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}$$

where

$$\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n} (X_i - \bar{X}) u_i$$
Then, since $\bar{X} \xrightarrow{p} \mu_X$ in large samples, we have

$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) \cong \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X) u_i$$

Then, letting $v_i = (X_i - \mu_X) u_i$,

$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X) u_i = \frac{1}{n} \sum_{i=1}^{n} v_i = \bar{v}$$
By the Central Limit Theorem,

$$\frac{\bar{v} - 0}{\sigma_{\bar{v}}} \xrightarrow{d} N(0, 1) \quad \text{or} \quad \bar{v} \xrightarrow{d} N\left( 0, \frac{\sigma_v^2}{n} \right)$$
The denominator converges in probability to the population variance:

$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X}) \xrightarrow{p} Var(X_i)$$
Combining numerator and denominator:

$$\bar{v} \xrightarrow{d} N\left( 0, \frac{\sigma_v^2}{n} \right)$$

$$\Rightarrow \frac{\bar{v}}{Var[X_i]} \xrightarrow{d} N\left( 0, \frac{\sigma_v^2}{n[Var(X_i)]^2} \right)$$

$$\Rightarrow \hat{\beta}_1 - \beta_1 \xrightarrow{d} N\left( 0, \frac{\sigma_v^2}{n[Var(X_i)]^2} \right)$$
where

$$\sigma_{\hat{\beta}_1}^2 = \frac{\sigma_v^2}{n[Var(X_i)]^2} = \frac{Var[(X_i - \mu_X) u_i]}{n[Var(X_i)]^2}$$
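This variance formula can be checked by simulation: draw many samples, compute $\hat{\beta}_1$ in each, and compare the Monte Carlo variance with $Var[(X_i - \mu_X)u_i] / (n[Var(X_i)]^2)$. A sketch with made-up parameters ($X_i \sim N(2, 1)$, $u_i \sim N(0, 10^2)$, independent):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, mu_x, beta1 = 200, 20_000, 2.0, -2.0

b1s = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu_x, 1.0, n)
    u = rng.normal(0, 10.0, n)
    y = 700.0 + beta1 * x + u
    b1s[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Theoretical asymptotic variance: Var[(X - mu_X)u] / (n * Var(X)^2), with Var(X) = 1
var_v = np.var((rng.normal(mu_x, 1.0, 10**6) - mu_x) * rng.normal(0, 10.0, 10**6))
print(b1s.var(), var_v / (n * 1.0**2))        # both ~0.5
```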
Properties of the OLS Estimators
Variation of X
In Summary