36-401 Modern Regression HW #2 Solutions: Problem 1 (36 Points Total)
DUE: 9/15/2017
(a) (12 pts.)

In Lecture Notes 4 we derived the following estimators for the simple linear regression model:
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}, \qquad \hat{\beta}_1 = \frac{c_{XY}}{s_X^2},$$

where

$$c_{XY} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) \qquad \text{and} \qquad s_X^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$
Since the formula for $\hat{\beta}_0$ depends on $\hat{\beta}_1$, we will calculate $\operatorname{Var}(\hat{\beta}_1)$ first. Some simple algebra shows we can rewrite $\hat{\beta}_1$ as

$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})\epsilon_i}{s_X^2}.$$
Now, treating the $X_i$'s as fixed, we have

$$\begin{aligned}
\operatorname{Var}(\hat{\beta}_1) &= \operatorname{Var}\!\left(\beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})\epsilon_i}{s_X^2}\right) \\
&= \operatorname{Var}\!\left(\frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})\epsilon_i}{s_X^2}\right) \\
&= \frac{\frac{1}{n^2}\sum_{i=1}^n (X_i - \bar{X})^2 \operatorname{Var}(\epsilon_i)}{s_X^4} \\
&= \frac{\frac{\sigma^2}{n^2}\sum_{i=1}^n (X_i - \bar{X})^2}{s_X^4} \\
&= \frac{\frac{\sigma^2}{n}\, s_X^2}{s_X^4} \\
&= \frac{\sigma^2}{n \cdot s_X^2}.
\end{aligned}$$
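As a quick sanity check, the following simulation (an illustrative sketch with arbitrary constants, holding the design fixed across trials) compares the empirical variance of $\hat{\beta}_1$ to the theoretical value $\sigma^2/(n \cdot s_X^2)$:

```r
# Monte Carlo check: Var(beta1_hat) should equal sigma^2 / (n * s_X^2)
set.seed(42)
n <- 50; sigma <- 2
X <- runif(n)                          # design held fixed across trials
sX2 <- mean((X - mean(X))^2)           # (1/n) * sum((Xi - Xbar)^2)
slopes <- replicate(1e4, {
  Y <- 1 + 2 * X + rnorm(n, 0, sigma)
  cov(X, Y) / var(X)                   # c_XY / s_X^2 (the 1/(n-1) factors cancel)
})
var(slopes)          # empirical variance of beta1_hat
sigma^2 / (n * sX2)  # theoretical variance
```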
Thus, $\operatorname{Var}(\hat{\beta}_0)$ is given by

$$\begin{aligned}
\operatorname{Var}(\hat{\beta}_0) &= \operatorname{Var}\!\left(\bar{Y} - \hat{\beta}_1 \bar{X}\right) \\
&= \operatorname{Var}(\bar{Y}) + \bar{X}^2 \operatorname{Var}(\hat{\beta}_1) - 2\bar{X}\operatorname{Cov}\!\left(\frac{1}{n}\sum_{i=1}^n Y_i,\; \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2}\right) \\
&= \frac{\sigma^2}{n} + \bar{X}^2 \operatorname{Var}(\hat{\beta}_1) - \frac{2\bar{X}}{n\sum_{i=1}^n (X_i - \bar{X})^2}\operatorname{Cov}\!\left(\sum_{i=1}^n Y_i,\; \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})\right) \\
&= \frac{\sigma^2}{n} + \bar{X}^2 \operatorname{Var}(\hat{\beta}_1) - \frac{2\bar{X}}{n\sum_{i=1}^n (X_i - \bar{X})^2}\sum_{i=1}^n (X_i - \bar{X})\operatorname{Cov}(Y_i, Y_i) \\
&= \frac{\sigma^2}{n} + \bar{X}^2 \operatorname{Var}(\hat{\beta}_1) - \frac{2\bar{X}\sigma^2}{n\sum_{i=1}^n (X_i - \bar{X})^2}\underbrace{\sum_{i=1}^n (X_i - \bar{X})}_{=0} \\
&= \frac{\sigma^2}{n} + \bar{X}^2 \operatorname{Var}(\hat{\beta}_1) \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2 \bar{X}^2}{n \cdot s_X^2} \\
&= \frac{\sigma^2\left(s_X^2 + \bar{X}^2\right)}{n \cdot s_X^2} \\
&= \frac{\sigma^2 \sum_{i=1}^n X_i^2}{n^2 \cdot s_X^2}.
\end{aligned}$$
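The last step uses the identity $s_X^2 + \bar{X}^2 = \frac{1}{n}\sum_{i=1}^n X_i^2$, which is easy to verify numerically (with arbitrary data):

```r
# Check of the final algebraic step: s_X^2 + Xbar^2 = (1/n) * sum(Xi^2)
set.seed(1)
X <- rnorm(10)
sX2 <- mean((X - mean(X))^2)
all.equal(sX2 + mean(X)^2, mean(X^2))  # TRUE
```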
(b) (6 pts.)
$$\begin{aligned}
\sum_{i=1}^n \hat{\epsilon}_i &= \sum_{i=1}^n \left(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\right) \\
&= \sum_{i=1}^n \left(Y_i - (\bar{Y} - \hat{\beta}_1 \bar{X}) - \hat{\beta}_1 X_i\right) \\
&= \sum_{i=1}^n (Y_i - \bar{Y}) + \sum_{i=1}^n (\hat{\beta}_1 \bar{X} - \hat{\beta}_1 X_i) \\
&= \underbrace{\sum_{i=1}^n (Y_i - \bar{Y})}_{=0} - \hat{\beta}_1 \underbrace{\sum_{i=1}^n (X_i - \bar{X})}_{=0} \\
&= 0.
\end{aligned}$$
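Both sums in the last line vanish by the definitions of $\bar{Y}$ and $\bar{X}$. A quick empirical check in R, on hypothetical simulated data:

```r
# Residuals from any least-squares fit with an intercept sum to zero
set.seed(1)
x <- runif(20)
y <- 1 + 2 * x + rnorm(20)
sum(residuals(lm(y ~ x)))  # ~0, up to floating-point error
```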
(c) (12 pts.)
$$\begin{aligned}
\sum_{i=1}^n \hat{Y}_i \hat{\epsilon}_i &= \sum_{i=1}^n (\hat{\beta}_0 + \hat{\beta}_1 X_i)\hat{\epsilon}_i \\
&= \hat{\beta}_0 \underbrace{\sum_{i=1}^n \hat{\epsilon}_i}_{=0} + \hat{\beta}_1 \sum_{i=1}^n X_i \hat{\epsilon}_i \\
&= \hat{\beta}_1 \sum_{i=1}^n X_i \hat{\epsilon}_i - \hat{\beta}_1 \bar{X}\underbrace{\sum_{i=1}^n \hat{\epsilon}_i}_{=0} \\
&= \hat{\beta}_1 \sum_{i=1}^n (X_i - \bar{X})\hat{\epsilon}_i \\
&= \hat{\beta}_1 \sum_{i=1}^n (X_i - \bar{X})\left(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\right) \\
&= \hat{\beta}_1 \sum_{i=1}^n (X_i - \bar{X})\left(Y_i - (\bar{Y} - \hat{\beta}_1 \bar{X}) - \hat{\beta}_1 X_i\right) \\
&= \hat{\beta}_1 \sum_{i=1}^n (X_i - \bar{X})\left((Y_i - \bar{Y}) - \hat{\beta}_1 (X_i - \bar{X})\right) \\
&= \hat{\beta}_1 \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) - \hat{\beta}_1^2 \sum_{i=1}^n (X_i - \bar{X})^2 \\
&= n\hat{\beta}_1 c_{XY} - n\hat{\beta}_1^2 s_X^2 = n\hat{\beta}_1\left(c_{XY} - \hat{\beta}_1 s_X^2\right) = 0,
\end{aligned}$$

since $\hat{\beta}_1 s_X^2 = c_{XY}$.
Linear Algebra interpretation: The observed residuals are orthogonal to the fitted values.
Statistical interpretation: The observed residuals are linearly uncorrelated with the fitted values.
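Both identities from parts (b) and (c) are easy to confirm empirically on simulated data (a hypothetical example; any data set would do):

```r
# Empirical check: residuals are orthogonal to, and uncorrelated with, fitted values
set.seed(1)
x <- runif(20)
y <- 1 + 2 * x + rnorm(20)
fit <- lm(y ~ x)
sum(fitted(fit) * residuals(fit))  # ~0 (orthogonality)
cor(fitted(fit), residuals(fit))   # ~0 (no linear correlation)
```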
(d) (6 pts.)
Problem 2 [24 points]
(a) (8 pts.)
We compute the least squares estimate $\hat{\beta}_1$ by minimizing the empirical mean squared error via a first-derivative test.
$$\frac{\partial}{\partial \beta_1}\widehat{\mathrm{MSE}}(\beta_1) = \frac{\partial}{\partial \beta_1}\left(\frac{1}{n}\sum_{i=1}^n (Y_i - \beta_1 X_i)^2\right) = \frac{2}{n}\sum_{i=1}^n (Y_i - \beta_1 X_i)(-X_i).$$

Setting this derivative to zero and solving yields $\hat{\beta}_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}$. Furthermore,

$$\frac{\partial^2}{\partial \beta_1^2}\widehat{\mathrm{MSE}}(\beta_1) = \frac{\partial}{\partial \beta_1}\left(-\frac{2}{n}\sum_{i=1}^n (Y_i X_i - \beta_1 X_i^2)\right) = \frac{2}{n}\sum_{i=1}^n X_i^2 > 0,$$

so this critical point is indeed a minimizer.
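As a check, the closed form agrees with R's built-in no-intercept fit, `y ~ x - 1` (the data here are hypothetical):

```r
# The closed-form estimator matches R's regression-through-the-origin fit
set.seed(1)
x <- runif(30)
y <- 4 * x + rnorm(30)
sum(y * x) / sum(x^2)
coef(lm(y ~ x - 1))  # same value
```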
(b) (8 pts.)
"P #
n
i=1 Yi X i
E βb1 = E Pn 2
i=1 Xi
"P #
n
i=1P Xi (β1 Xi + i )
=E n 2
i=1 Xi
" P #
n Pn
β1 i=1 Xi2 + i=1 Xi i
=E Pn 2
i=1 Xi
" Pn #
Xi i
= E β1 + Pi=1 n 2
i=1 Xi
" n #
1 X
= β1 + P n 2E Xi i
i=1 Xi i=1
n
1 X
= β1 + P n Xi · E[i ]
i=1 Xi2 i=1
|{z}
=0
= β1
Thus, if the true model is linear and through the origin, then $\hat{\beta}_1$ is an unbiased estimator for $\beta_1$.
(c) (8 pts.)
If the true model is linear, but not necessarily through the origin, then the bias of the regression-through-the-origin estimator $\hat{\beta}_1$ is
$$\begin{aligned}
\operatorname{Bias}(\hat{\beta}_1) &= E[\hat{\beta}_1] - \beta_1 \\
&= E\!\left[\frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}\right] - \beta_1 \\
&= E\!\left[\frac{\sum_{i=1}^n X_i(\beta_0 + \beta_1 X_i + \epsilon_i)}{\sum_{i=1}^n X_i^2}\right] - \beta_1 \\
&= E\!\left[\frac{\beta_0 \sum_{i=1}^n X_i + \beta_1 \sum_{i=1}^n X_i^2 + \sum_{i=1}^n X_i \epsilon_i}{\sum_{i=1}^n X_i^2}\right] - \beta_1 \\
&= E\!\left[\beta_1 + \frac{\beta_0 \sum_{i=1}^n X_i + \sum_{i=1}^n X_i \epsilon_i}{\sum_{i=1}^n X_i^2}\right] - \beta_1 \\
&= \beta_1 + \frac{\beta_0 \sum_{i=1}^n X_i}{\sum_{i=1}^n X_i^2} + \frac{1}{\sum_{i=1}^n X_i^2}\sum_{i=1}^n X_i \underbrace{E[\epsilon_i]}_{=0} - \beta_1 \\
&= \frac{\beta_0 \sum_{i=1}^n X_i}{\sum_{i=1}^n X_i^2}.
\end{aligned}$$
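A quick Monte Carlo check of this bias formula, with arbitrary illustrative values for $\beta_0$, $\beta_1$, and the design:

```r
# Monte Carlo check of Bias(beta1_hat) = b0 * sum(x) / sum(x^2), design held fixed
set.seed(1)
n <- 30; b0 <- 2; b1 <- 3
x <- runif(n)
est <- replicate(1e4, {
  y <- b0 + b1 * x + rnorm(n)
  sum(y * x) / sum(x^2)        # regression-through-the-origin estimator
})
mean(est) - b1            # empirical bias
b0 * sum(x) / sum(x^2)    # theoretical bias
```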
Problem 3 [20 points total]
(a) (5 pts.)
set.seed(1)
n <- 100
X <- runif(n, 0, 1)
Y <- 5 + 3 * X + rnorm(n, 0, 1)
model <- lm(Y ~ X)                   # fit the simple linear regression
plot(X, Y)                           # scatterplot of the simulated data
abline(model, col = "red", lwd = 2)  # overlay the fitted line
Figure 1: One hundred data points with the simple linear regression fit
(b) (5 pts.)
n <- 100
betas <- rep(NA, 1000)
for (itr in 1:1000) {
  Y <- 5 + 3 * X + rnorm(n, 0, 1)   # fresh noise each trial; X as in part (a)
  betas[itr] <- lm(Y ~ X)$coefficients[2]
}
mean(betas)
## [1] 3.019629
Since 1000 is a reasonably large number of trials, we expect the mean of $\hat{\beta}_1^{(1)}, \ldots, \hat{\beta}_1^{(1000)}$ to be close to

$$E[\hat{\beta}_1] = E\!\left[E[\hat{\beta}_1 \mid X_1, \ldots, X_n]\right] = E[\beta_1] = E[3] = 3.$$
Figure 2: Histogram of the simulated slope estimates $\hat{\beta}_1^{(1)}, \ldots, \hat{\beta}_1^{(1000)}$
(c) (5 pts.)
n <- 100
betas <- rep(NA, 1000)
for (itr in 1:1000) {
  Y <- 5 + 3 * X + rcauchy(n)   # heavy-tailed Cauchy errors replace the Gaussian ones
  model <- lm(Y ~ X)
  betas[itr] <- model$coefficients[2]
}
par(mfrow = c(1,2))
hist(betas, xlab = expression(hat(beta)[1]), prob = FALSE, main = "", xlim = c(3-20,3+20),
breaks = 750)
abline(v = 3, col = "red", lwd = 2)
hist(betas, xlab = expression(hat(beta)[1]), prob = FALSE, main = "", breaks = 200)
Figure 3: Histogram of linear regression slope parameters for Cauchy data (Left: restricted to the window (−17, 23). Right: the full window.)
Notice that the distribution of $\hat{\beta}_1^{(1)}, \ldots, \hat{\beta}_1^{(1000)}$ still seems to be approximately centered around $\beta_1 = 3$, but the tails are now much fatter. In particular, from the plot on the right, we see that at least one trial of the experiment resulted in a value around $\hat{\beta}_1 \approx -600$.
(d) (5 pts.)
set.seed(1)
n <- 100
X <- runif(n, 0, 1)
W <- X + rnorm(n, 0, sqrt(2))         # observed proxy: X plus mean-zero noise
Y <- 5 + 3 * X + rnorm(n, 0, 1)
model <- lm(Y ~ W)                    # fit using the noisy W
plot(X, Y)
abline(model, col = "red", lwd = 2)   # overlay the fit of Y on W
Figure 4: One hundred observations of Y vs. X with the simple linear regression fit of Y on W
betas <- rep(NA, 1000)
for (itr in 1:1000) {
  W <- X + rnorm(n, 0, sqrt(2))   # fresh measurement error each trial
  Y <- 5 + 3 * X + rnorm(n, 0, 1)
  model <- lm(Y ~ W)              # regress Y on the noisy W, not the true X
  betas[itr] <- model$coefficients[2]
}
mean(betas)
## [1] 0.1198059
hist(betas, xlab = expression(hat(beta)[1]), prob = FALSE, main = "", breaks = 50)
From this, and Figure 5, we conclude that errors on the $X_i$'s bias $\hat{\beta}_1$ downwards, toward zero.
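This matches the classical errors-in-variables attenuation factor $\lambda = \operatorname{Var}(X)/\left(\operatorname{Var}(X) + \operatorname{Var}(W - X)\right)$; plugging in $\operatorname{Var}(X) = 1/12$ for a $\mathrm{Unif}(0,1)$ design and measurement-error variance 2:

```r
# Classical attenuation factor: lambda = Var(X) / (Var(X) + Var(noise))
lambda <- (1/12) / (1/12 + 2)  # Var(Unif(0,1)) = 1/12, noise variance = 2
3 * lambda                     # = 0.12, close to mean(betas) above
```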
Figure 5: Histogram of linear regression slope parameters for data with errors on the X's
Problem 4 [20 points total]
data(airquality)
(a) (5 pts.)
summary(airquality)
(b) (5 pts.)
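A scatterplot matrix of all six variables (shown as Figure 6) can be produced with a single call, for example:

```r
# Pairwise scatterplots of every variable in the airquality data set
pairs(airquality)
```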
(c) (5 pts.)
model <- lm(Ozone ~ Solar.R, data = airquality)   # Ozone regressed on solar radiation
summary(model)$coefficients
Figure 6: Scatterplot matrix of the variables in the airquality data set (Ozone, Solar.R, Wind, Temp, Month, Day)
Figure 7: Ozone vs. solar radiation observations in the **airquality** data set
Figure 8: Residuals vs. solar radiation for the regression of Ozone on Solar.R
(d) (5 pts.)
No, the standard regression assumptions do not hold. The residuals are not symmetric about zero, so the linear functional-form assumption is not suitable. Furthermore, the residuals are highly heteroskedastic.
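A minimal sketch of how such a residual plot can be reproduced, pulling the predictor from the fitted model frame so it stays aligned with the residuals after `lm` drops rows with missing values:

```r
# Diagnostic plot: residuals against the predictor
model <- lm(Ozone ~ Solar.R, data = airquality)
plot(model$model$Solar.R, residuals(model),
     xlab = "Solar Radiation", ylab = "Residuals")
abline(h = 0, lty = 2, col = "red")   # reference line at zero
```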