Metrics 2019 Lec3
Zhaopeng Qu
Nanjing University
9/26/2019
Section 1
Section 2
$$\beta_1 = \frac{\Delta TestScore}{\Delta ClassSize}$$
BUT the average test score in district $i$ does not depend only on the average class size.
It also depends on other factors, such as
Student background
Quality of the teachers
School's facilities
Quality of textbooks ...
So the equation describing the linear relation between Test Score and Class Size is better written as

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
where
$Y_i$ is the dependent variable (Test Score)
$X_i$ is the independent variable or regressor (Class Size or Student-Teacher Ratio)
$\beta_0 + \beta_1 X_i$ is the population regression line or the population regression function
This is the relationship that holds between $Y$ and $X$ on average over the population. (Sound familiar? Recall the concept of the CEF.)
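To make the model concrete, here is a minimal simulation sketch in Python (NumPy assumed available). The parameter values $\beta_0 = 700$, $\beta_1 = -2$, the class-size range, and the noise scale are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 420                                       # number of districts (illustrative)
beta0, beta1 = 700.0, -2.0                    # hypothetical population parameters
class_size = rng.uniform(15, 30, n)           # X_i: student-teacher ratio
u = rng.normal(0, 10, n)                      # u_i: all other factors (background, teachers, ...)
test_score = beta0 + beta1 * class_size + u   # Y_i = beta0 + beta1 * X_i + u_i
```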
The estimators of the slope and intercept are those that minimize the sum of the squares of $\hat{u}_i$, thus

$$\arg\min_{b_0, b_1} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$
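As a sanity check, this minimization problem can be handed to a generic numerical optimizer, which lands on the same solution as the closed-form OLS formulas derived below. A minimal sketch, assuming NumPy and SciPy are available and reusing the simulated data from above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(15, 30, 420)                  # simulated class sizes (as above)
y = 700.0 - 2.0 * x + rng.normal(0, 10, 420)  # simulated test scores

def ssr(b):
    """Sum of squared residuals as a function of the candidate (b0, b1)."""
    b0, b1 = b
    return np.sum((y - b0 - b1 * x) ** 2)

res = minimize(ssr, x0=[0.0, 0.0])            # numerical argmin over (b0, b1)
print(res.x)                                  # approximately (700, -2)
```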
Step 2, for $\beta_1$:

$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = 0$$
Optimization

$$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} \hat{u}_i^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$$

$$\Rightarrow \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} b_0 - \sum_{i=1}^{n} b_1 X_i = 0$$

$$\Rightarrow \frac{1}{n} \sum_{i=1}^{n} Y_i - \frac{1}{n} \sum_{i=1}^{n} b_0 - b_1 \frac{1}{n} \sum_{i=1}^{n} X_i = 0$$

$$\Rightarrow \bar{Y} - b_0 - b_1 \bar{X} = 0$$

OLS estimator of $\beta_0$:

$$b_0 = \hat{\beta}_0 = \bar{Y} - b_1 \bar{X}$$
$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} \hat{u}_i^2 = -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$$

Substituting $b_0 = \bar{Y} - b_1 \bar{X}$:

$$\Rightarrow \sum_{i=1}^{n} X_i \left[ Y_i - (\bar{Y} - b_1 \bar{X}) - b_1 X_i \right] = 0$$

$$\Rightarrow \sum_{i=1}^{n} X_i \left[ (Y_i - \bar{Y}) - b_1 (X_i - \bar{X}) \right] = 0$$

$$\Rightarrow \sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - b_1 \sum_{i=1}^{n} X_i (X_i - \bar{X}) = 0$$

Thus, since $\sum \bar{X}(Y_i - \bar{Y}) = 0$ and $\sum \bar{X}(X_i - \bar{X}) = 0$, we can replace $X_i$ by $(X_i - \bar{X})$:

$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) - b_1 \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X}) = 0$$

OLS estimator of $\beta_1$:

$$b_1 = \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}$$
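In code, the two closed-form estimators are one line each. A minimal sketch with NumPy, again on simulated data; `np.polyfit` is used only as an independent cross-check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(15, 30, 420)
y = 700.0 - 2.0 * x + rng.normal(0, 10, 420)

# Closed-form OLS: slope from demeaned cross-products, then the intercept
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta0_hat, beta1_hat)                   # approximately 700 and -2

# Cross-check: np.polyfit with degree 1 returns [slope, intercept]
print(np.polyfit(x, y, 1))
```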
The two first-order conditions

$$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = 0$$

$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = 0$$

reduce to

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$$

$$\sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$$
OLS Estimation: Simple Regression
Then we have

$$\sum_{i=1}^{n} \hat{u}_i = 0$$

$$\sum_{i=1}^{n} \hat{u}_i X_i = 0$$
Some Algebraic Properties of $\hat{u}$

The residuals have zero sample mean:

$$\bar{\hat{u}} = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$$

and are uncorrelated with the regressor:

$$\sum_{i=1}^{n} \hat{u}_i X_i = 0$$
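Both properties follow mechanically from the first-order conditions, so they hold exactly (up to floating-point error) in any OLS fit. A quick check on the simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(15, 30, 420)
y = 700.0 - 2.0 * x + rng.normal(0, 10, 420)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x                       # OLS residuals

print(np.sum(u_hat))                          # ~0: first-order condition for b0
print(np.sum(u_hat * x))                      # ~0: first-order condition for b1
```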
Decompose $Y_i$ into the fitted value plus the residual: $Y_i = \hat{Y}_i + \hat{u}_i$
The total sum of squares (TSS): $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
The explained sum of squares (ESS): $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
The sum of squared residuals (SSR): $SSR = \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$
And

$$TSS = ESS + SSR$$

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

So $0 \leq R^2 \leq 1$.
It may seem that the bigger the $R^2$, the better the regression.
But actually we DON'T care much about $R^2$ in causal inference.
The standard error of the regression (SER) is $s_{\hat{u}}$, where

$$s_{\hat{u}}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$$
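The decomposition, $R^2$, and SER are a few lines of NumPy. This sketch verifies $TSS = ESS + SSR$ and computes $R^2$ both ways on the simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(15, 30, 420)
y = 700.0 - 2.0 * x + rng.normal(0, 10, 420)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum(u_hat ** 2)

print(np.isclose(tss, ess + ssr))             # True: TSS = ESS + SSR
print(ess / tss, 1 - ssr / tss)               # two equivalent expressions for R^2
print(np.sqrt(ssr / (len(y) - 2)))            # SER, ~10 (the simulated sd of u)
```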
Section 3
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

and $E[u_i \mid X_i] = 0$, which implies

$$Cov[u_i, X_i] = E[u_i X_i] = 0$$
Section 4
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})(X_i - \bar{X})}$$
1 Assumption 1: The conditional mean of $u_i$ given $X_i$ is zero: $E[u_i \mid X_i] = 0$
2 Assumption 2: $(X_i, Y_i)$, $i = 1, ..., n$, are i.i.d. draws from their joint distribution
3 Assumption 3: Large outliers are unlikely: $X_i$ and $Y_i$ have nonzero finite fourth moments
If the 3 least squares assumptions hold, the OLS estimators will be
unbiased
consistent
normally distributed in large samples
Recall:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

To take the expectation of $\hat{\beta}_0$, remember we have

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

$$\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{u}$$
So take the expectation of $\hat{\beta}_1$:

$$E[\hat{\beta}_1] = E\left[ \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$
Continued

$$E[\hat{\beta}_1] = E\left[ \frac{\sum (X_i - \bar{X})\left( \beta_0 + \beta_1 X_i + u_i - (\beta_0 + \beta_1 \bar{X} + \bar{u}) \right)}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

$$= E\left[ \frac{\sum (X_i - \bar{X})\left( \beta_1 (X_i - \bar{X}) + (u_i - \bar{u}) \right)}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

$$= \beta_1 + E\left[ \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

Because $\sum (X_i - \bar{X})(u_i - \bar{u}) = \sum (X_i - \bar{X}) u_i$, so

$$E[\hat{\beta}_1] = \beta_1 + E\left[ \frac{\sum (X_i - \bar{X}) u_i}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

$$= \beta_1 + E\left[ \frac{\sum (X_i - \bar{X}) E(u_i \mid X_1, ..., X_n)}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

by the law of iterated expectations. Under Assumption 1, $E(u_i \mid X_1, ..., X_n) = 0$, so $E[\hat{\beta}_1] = \beta_1$: OLS is unbiased.
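Unbiasedness is a statement about the average of $\hat{\beta}_1$ across repeated samples, which a small Monte Carlo makes visible. A sketch, with a DGP that satisfies $E[u_i \mid X_i] = 0$ by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
beta1 = -2.0

estimates = []
for _ in range(5000):                         # many independent samples
    x = rng.uniform(15, 30, 100)
    y = 700.0 + beta1 * x + rng.normal(0, 10, 100)  # E[u|X] = 0 holds
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

print(np.mean(estimates))                     # ~ -2: E[beta1_hat] = beta1
```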
Notation: $\hat{\beta}_1 \xrightarrow{p} \beta_1$ or $\text{plim}\, \hat{\beta}_1 = \beta_1$, so

$$\text{plim}\, \hat{\beta}_1 = \text{plim}\left[ \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})(X_i - \bar{X})} \right]$$

$$\text{plim}\, \hat{\beta}_1 = \text{plim}\left[ \frac{\frac{1}{n-1} \sum (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n-1} \sum (X_i - \bar{X})(X_i - \bar{X})} \right] = \text{plim}\left( \frac{s_{xy}}{s_x^2} \right)$$

where $s_{xy}$ and $s_x^2$ are the sample covariance and sample variance.
For continuous $g$,

$$\text{plim}(g(X)) = g(\text{plim}(X))$$

Example:

$$\text{plim}(X + Y) = \text{plim}(X) + \text{plim}(Y)$$

$$\text{plim}\left( \frac{X}{Y} \right) = \frac{\text{plim}(X)}{\text{plim}(Y)} \quad \text{if } \text{plim}(Y) \neq 0$$
Since $s_{xy} \xrightarrow{p} Cov(X_i, Y_i)$ and $s_x^2 \xrightarrow{p} Var(X_i)$,

$$\text{plim}\, \hat{\beta}_1 = \frac{Cov(X_i, Y_i)}{Var(X_i)}$$

$$= \frac{Cov(X_i, (\beta_0 + \beta_1 X_i + u_i))}{Var(X_i)}$$

$$= \frac{Cov(X_i, \beta_0) + \beta_1 Cov(X_i, X_i) + Cov(X_i, u_i)}{Var(X_i)}$$

$$= \beta_1 + \frac{Cov(X_i, u_i)}{Var(X_i)}$$

Under Assumption 1, $Cov(X_i, u_i) = 0$, so $\text{plim}\, \hat{\beta}_1 = \beta_1$: OLS is consistent.
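The last line is worth simulating: when $Cov(X_i, u_i) = 0$ the slope estimate converges to $\beta_1$ as $n$ grows, and otherwise it converges to $\beta_1 + Cov(X_i, u_i)/Var(X_i)$. A sketch, where `corr` is a made-up knob that injects correlation between $X_i$ and $u_i$:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(n, corr=0.0):
    x = rng.normal(0, 1, n)
    u = corr * x + rng.normal(0, 1, n)        # corr != 0 breaks Cov(X_i, u_i) = 0
    y = 1.0 + 2.0 * x + u                     # true beta1 = 2
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

for n in (100, 10_000, 1_000_000):
    print(n, ols_slope(n))                    # -> 2 as n grows: consistency

print(ols_slope(1_000_000, corr=0.5))         # -> 2.5 = beta1 + Cov(X,u)/Var(X)
```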
Recall that for the sample mean, $E(\bar{Y}) = \mu_Y$ and, in large samples,

$$\bar{Y} \sim N\left( \mu_Y, \frac{\sigma_Y^2}{n} \right)$$

The OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ have similar sampling distributions when the three least squares assumptions hold.
$$\hat{\beta}_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})} = \beta_1 + \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}$$

where

$$\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n} (X_i - \bar{X}) u_i$$
Then, since $\bar{X} \xrightarrow{p} \mu_X$ in large samples, we have

$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) \cong \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X) u_i$$

Then, letting $v_i = (X_i - \mu_X) u_i$,

$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X) u_i = \frac{1}{n} \sum_{i=1}^{n} v_i = \bar{v}$$
By the Central Limit Theorem,

$$\frac{\bar{v} - 0}{\sigma_{\bar{v}}} \xrightarrow{d} N(0, 1) \quad \text{or} \quad \bar{v} \xrightarrow{d} N\left( 0, \frac{\sigma_v^2}{n} \right)$$
The denominator converges in probability to the population variance:

$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X}) \xrightarrow{p} Var(X_i)$$
Combining numerator and denominator:

$$\bar{v} \xrightarrow{d} N\left( 0, \frac{\sigma_v^2}{n} \right)$$

$$\Rightarrow \frac{\bar{v}}{Var[X_i]} \xrightarrow{d} N\left( 0, \frac{\sigma_v^2}{n[Var(X_i)]^2} \right)$$

$$\Rightarrow \hat{\beta}_1 - \beta_1 \xrightarrow{d} N\left( 0, \frac{\sigma_v^2}{n[Var(X_i)]^2} \right)$$
where

$$\sigma_{\hat{\beta}_1}^2 = \frac{\sigma_v^2}{n[Var(X_i)]^2} = \frac{Var[(X_i - \mu_X) u_i]}{n[Var(X_i)]^2}$$
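This variance formula can be checked by simulation: draw many samples, compute $\hat{\beta}_1$ in each, and compare the Monte Carlo variance with $Var[(X_i - \mu_X)u_i] / (n[Var(X_i)]^2)$. A sketch with made-up parameters ($X_i \sim N(2, 1)$, $u_i \sim N(0, 10^2)$, independent):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, mu_x, beta1 = 200, 20_000, 2.0, -2.0

b1s = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu_x, 1.0, n)
    u = rng.normal(0, 10.0, n)
    y = 700.0 + beta1 * x + u
    b1s[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Theoretical asymptotic variance: Var[(X - mu_X)u] / (n * Var(X)^2), with Var(X) = 1
var_v = np.var((rng.normal(mu_x, 1.0, 10**6) - mu_x) * rng.normal(0, 10.0, 10**6))
print(b1s.var(), var_v / (n * 1.0**2))        # both ~0.5
```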
Properties of the OLS Estimators
Variation of X
In Summary