STAT 3008 Applied Regression Analysis Tutorial 1 - Term 2, 2019-20
ZHAN Zebang
22 January 2020
1 Regression Problems
• Regression studies dependence: does the response Y depend on the predictors (explanatory
variables) X?
• A scatterplot helps identify the mean function, the variance function and separated points.
Mean function: $E(Y \mid X = x) = f(x)$.
Variance function: $\mathrm{Var}(Y \mid X = x) = h(x)$.
Separated points: a leverage point is extreme horizontally (and has a larger impact on the fit); an outlier is extreme vertically.
• A null plot is a scatterplot with a constant mean function, a constant variance function and
no separated points.
2 Linear Regression
• Model setting: $Y = X\beta + e$, where $E(e) = 0$, $\mathrm{Var}(e) = \sigma^2 I_n$.
Simple linear regression: $y_i = \beta_0 + \beta_1 x_i + e_i$;
Multiple linear regression ($p > 1$): $y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + e_i$,
where $E(e_i) = 0$, $\mathrm{Var}(e_i) = \sigma^2$, and the $e_i$'s are uncorrelated.
• Ordinary least squares (OLS) method: the parameter estimates are obtained by minimizing
the sum of squared vertical distances:
$$g(\beta) = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})]^2 \;\Rightarrow\; \hat{\beta} = \arg\min_{\beta} g(\beta).$$
Define the fitted value of $y_i$ as $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_p x_{pi}$,
the residual $\hat{e}_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_p x_{pi})$,
and the residual sum of squares $\mathrm{RSS} = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_p x_{pi})]^2$.
$E(\mathrm{RSS}) = (n-k)\sigma^2 \;\Rightarrow\; \hat{\sigma}^2 = \dfrac{\mathrm{RSS}}{n-k}$ is an unbiased estimator of $\sigma^2$, where $k$ is the number
of regression coefficients in the model.
• For simple linear regression ($p = 1$), denote the sample means and sums of squares by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad SXX = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2,$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad SYY = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2,$$
$$SXY = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}. \quad \text{Then}$$
$$\frac{\partial g(\beta_0, \beta_1)}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i), \qquad \frac{\partial g(\beta_0, \beta_1)}{\partial \beta_1} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)x_i.$$
Set $\dfrac{\partial g(\beta_0, \beta_1)}{\partial \beta_0}\Big|_{\hat{\beta}_0, \hat{\beta}_1} = \dfrac{\partial g(\beta_0, \beta_1)}{\partial \beta_1}\Big|_{\hat{\beta}_0, \hat{\beta}_1} = 0$
$$\Rightarrow\; \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{SXY}{SXX}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$
$$E(\mathrm{RSS}) = (n-2)\sigma^2 \;\Rightarrow\; \hat{\sigma}^2 = \frac{\mathrm{RSS}}{n-2} = \cdots = \frac{1}{n-2}\left(SYY - \frac{SXY^2}{SXX}\right).$$
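The closed-form SLR estimates above can be checked numerically against the matrix solution $\hat{\beta} = (X^\top X)^{-1} X^\top y$. A minimal sketch (the data below are made up purely for illustration):

```python
# Closed-form OLS for simple linear regression, checked against
# the matrix solution beta_hat = (X'X)^{-1} X'y.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)    # illustrative data: beta0 = 2, beta1 = 1.5

xbar, ybar = x.mean(), y.mean()
SXX = np.sum((x - xbar) ** 2)
SXY = np.sum((x - xbar) * (y - ybar))
SYY = np.sum((y - ybar) ** 2)

beta1_hat = SXY / SXX                      # slope: SXY / SXX
beta0_hat = ybar - beta1_hat * xbar        # intercept: ybar - beta1_hat * xbar

# Residual sum of squares, two equivalent forms
RSS = np.sum((y - beta0_hat - beta1_hat * x) ** 2)
RSS_alt = SYY - SXY ** 2 / SXX
sigma2_hat = RSS / (n - 2)                 # unbiased estimator of sigma^2 (k = 2)

# Matrix form: X = [1, x], beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
beta_mat = np.linalg.solve(X.T @ X, X.T @ y)

assert np.allclose([beta0_hat, beta1_hat], beta_mat)
assert np.isclose(RSS, RSS_alt)
```

The two RSS expressions agree because $\mathrm{RSS} = SYY - SXY^2/SXX$ in simple linear regression, which is exactly the identity used in the $\hat{\sigma}^2$ formula above.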
Example 2. (Alternative form of SLR) For a data set with observations $\{(x_i, y_i), i = 1, \ldots, n\}$, consider the regression model $y_i = \alpha_0 + \alpha_1 (x_i - \bar{x}) + e_i$, where $E(e_i) = 0$, $\mathrm{Var}(e_i) = \sigma^2$, and the $e_i$'s are uncorrelated. Find the least squares estimates of $\alpha_0$, $\alpha_1$ and $\sigma^2$.
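A sketch for Example 2, not a substitute for the derivation: centering the predictor makes the two columns of the design orthogonal, so the same minimization as above should give $\hat{\alpha}_0 = \bar{y}$ and $\hat{\alpha}_1 = SXY/SXX$. The snippet below only verifies this claim numerically on made-up data.

```python
# Numeric check for the centered SLR model y_i = a0 + a1*(x_i - xbar) + e_i.
# Claimed least-squares solutions: a0_hat = ybar, a1_hat = SXY / SXX.
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 5, n)
y = 1.0 + 0.8 * x + rng.normal(0, 0.5, n)   # illustrative data

xbar, ybar = x.mean(), y.mean()
xc = x - xbar                               # centered predictor, sums to zero
SXX = np.sum(xc ** 2)
SXY = np.sum(xc * (y - ybar))

# Direct OLS on the centered design Z = [1, x - xbar]
Z = np.column_stack([np.ones(n), xc])
a0_hat, a1_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

assert np.isclose(a0_hat, ybar)             # alpha0_hat = ybar
assert np.isclose(a1_hat, SXY / SXX)        # alpha1_hat = SXY / SXX

RSS = np.sum((y - a0_hat - a1_hat * xc) ** 2)
sigma2_hat = RSS / (n - 2)                  # k = 2 regression coefficients
```

Because $\sum_i (x_i - \bar{x}) = 0$, $Z^\top Z$ is diagonal, which is why the two estimates decouple so cleanly.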
Example 3. (OLS method) For a data set with observations $\{(x_i, y_i), i = 1, \ldots, n\}$, consider the regression model $y_i = \beta_1 x_i^2 + e_i$, where $E(e_i) = 0$, $\mathrm{Var}(e_i) = \sigma^2$, and the $e_i$'s are uncorrelated. Find the least squares estimates of $\beta_1$ and $\sigma^2$.
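A sketch for Example 3, again only a numerical check on made-up data: with a single coefficient and no intercept, setting $dg/d\beta_1 = 0$ should give $\hat{\beta}_1 = \sum_i x_i^2 y_i / \sum_i x_i^4$, and $\hat{\sigma}^2 = \mathrm{RSS}/(n-1)$ since $k = 1$.

```python
# Numeric check for the no-intercept model y_i = b1 * x_i^2 + e_i.
# Minimizing sum (y_i - b1*x_i^2)^2 gives b1_hat = sum(x^2 * y) / sum(x^4).
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.uniform(-2, 2, n)
y = 0.7 * x ** 2 + rng.normal(0, 0.3, n)    # illustrative data

z = x ** 2                                   # the single regressor
b1_hat = np.sum(z * y) / np.sum(z ** 2)      # = sum(x^2 * y) / sum(x^4)

# Compare with the least-squares solution on the one-column design
b1_mat = np.linalg.lstsq(z.reshape(-1, 1), y, rcond=None)[0][0]
assert np.isclose(b1_hat, b1_mat)

RSS = np.sum((y - b1_hat * z) ** 2)
sigma2_hat = RSS / (n - 1)                   # k = 1 regression coefficient
```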