HW6
2024-10-21
Problem 7.5
a)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X2 1 4860.3 4860.3 48.0439 1.822e-08 ***
## X1 1 3896.0 3896.0 38.5126 2.008e-07 ***
## X3 1 364.2 364.2 3.5997 0.06468 .
## Residuals 42 4248.8 101.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call:
## lm(formula = Y ~ X2 + X1 + X3, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.3524 -6.4230 0.5196 8.3715 17.1601
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 158.4913 18.1259 8.744 5.26e-11 ***
## X2 -0.4420 0.4920 -0.898 0.3741
## X1 -1.1416 0.2148 -5.315 3.81e-06 ***
## X3 -13.4702 7.0997 -1.897 0.0647 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.06 on 42 degrees of freedom
## Multiple R-squared: 0.6822, Adjusted R-squared: 0.6595
## F-statistic: 30.05 on 3 and 42 DF, p-value: 1.542e-10
##
##
## F = 5.403859
## F* = 3.599735
H0 : B3 = 0 (reduced model, drop X3)
Ha : B3 != 0 (full model, keep X3)
Since F* < F, we cannot conclude with 97.5% confidence that X3 should be in the model. We fail to reject H0,
so we drop X3 at a = .025.
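The same partial F-test can be run directly with R's anova() on nested fits; a minimal sketch, assuming the data frame assembled in the code appendix:

```r
# anova() on nested models reports the partial F for X3; it should agree
# with F* = 3.5997 and p = 0.0647 above.
reduced <- lm(Y ~ X2 + X1, data = data)
full <- lm(Y ~ X2 + X1 + X3, data = data)
anova(reduced, full)
```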
b)
## The p-value for this test is: 0.06467813
Since 0.0647 > a = .025, this agrees with part a): we fail to reject H0.
Problem 7.14
a)
## [1] 0.6189843
## [1] 0.8016123
## [1] 0.4562661
X1 alone explains about 62% of the variation in Y. Once X2 is in the model, X1 explains about 80% of the
remaining variation; once X2 and X3 are both in, it explains about 46% of the remaining variation.
b)
## [1] 0.3635387
## [1] 0.05811392
## [1] 0.009034276
The effects are drastically different for X2. X2 alone explains only about 36% of the variation; once X1 is in
the model, it explains only about 6% of the remaining variation, and with X1 and X3 both in, only about 1%.
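For reference, a coefficient of partial determination can be computed directly from the error sums of squares of two nested fits. A minimal sketch for one of them, assuming the same data frame (the helper sse() is introduced here only for illustration):

```r
# R^2_{Y1|2} = (SSE(X2) - SSE(X1, X2)) / SSE(X2): the share of the error
# remaining after X2 that adding X1 removes.
sse <- function(fit) sum(resid(fit)^2)
sse_x2 <- sse(lm(Y ~ X2, data = data))
sse_x12 <- sse(lm(Y ~ X1 + X2, data = data))
(sse_x2 - sse_x12) / sse_x2
```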
Problem 7.18
a)
##
## Call:
## lm(formula = Ystar ~ X1star + X2star + X3star + 0, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.158723 -0.055550 0.004493 0.072402 0.148411
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## X1star -0.5907 0.1098 -5.378 2.91e-06 ***
## X2star -0.1106 0.1217 -0.909 0.3684
## X3star -0.2339 0.1219 -1.920 0.0615 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08597 on 43 degrees of freedom
## Multiple R-squared: 0.6822, Adjusted R-squared: 0.66
## F-statistic: 30.77 on 3 and 43 DF, p-value: 8.79e-11
b)
## [1] 0.05811392
## [1] 0.8016123
## [1] 0.09225135
## [1] 0.1274542
## [1] 0.627173
## [1] 0.09225135
Yes. That is the definition of the coefficient of multiple determination.
c)
transformed back B0 = 158.4913
transformed back B1 = -1.141612
transformed back B2 = -0.4420043
transformed back B3 = -13.47016
orig B0 = 158.4913
orig B1 = -1.141612
orig B2 = -0.4420043
orig B3 = -13.47016
They are the same.
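This is expected: the standardized coefficients transform back to the original scale by B1 = B1star * SY/SX1 (and similarly for B2 and B3), with B0 = Ybar - B1*X1bar - B2*X2bar - B3*X3bar, which is exactly the computation done in the appendix code.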
Problem 7.25
a)
## The fitted regression function is: Y hat = 4079.87 +X1* 0.0009354971
b)
## The fitted regression function is: Y hat = 4149.887 +X1* 0.0007870804 +X2* -13.16602 +X3* 623.5545
The coefficient of X1 is smaller in the MLR (0.000787) than in the SLR (0.000935).
c)
SSR(X1) = 136366.2
SSR(X1|X2) = 130697.2
They are not equal; the difference is about 5,700, modest relative to SSR(X1) but not negligible.
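SSR(X1|X2) is simply the drop in SSE from adding X1 to a model that already contains X2; a minimal sketch of both quantities, assuming a data frame holding the grocery data with columns Y, X1, X2:

```r
# SSR(X1) from the X1-only fit, and SSR(X1|X2) = SSE(X2) - SSE(X1, X2).
anova(lm(Y ~ X1, data = data))[1, "Sum Sq"]
sse <- function(fit) sum(resid(fit)^2)
sse(lm(Y ~ X2, data = data)) - sse(lm(Y ~ X1 + X2, data = data))
```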
d)
## Y X1 X2 X3
## Y 1.0000000 0.20766494 0.06002960 0.81057940
## X1 0.2076649 1.00000000 0.08489639 0.04565698
## X2 0.0600296 0.08489639 1.00000000 0.11337076
## X3 0.8105794 0.04565698 0.11337076 1.00000000
X1 and X2 have a correlation of only 0.085. Since the two predictors are nearly uncorrelated, adding X2 changes
the X1 results very little, which is why SSR(X1) and SSR(X1|X2) in part c) are so close.
Problem 7.26
a)
The fitted regression function is: Y hat = 119.9432 +X1* -1.520604 +X2* 7.084751
b)
The fitted regression function is: Y hat = 158.4913 +X1* -1.141612 +X2* -0.4420043 +X3* -13.47016
The coefficients are very different: adding X3 changes the X1 and X2 coefficients substantially, and the intercept
grows a lot. The impact is very large.
c)
SSR(X1) = 8275.389
SSR(X1|X3) = 3483.891
SSR(X2) = 4860.26
SSR(X2|X3) = 707.9971
Neither pair is equal: SSR(X1) is far larger than SSR(X1|X3), and SSR(X2) is far larger than SSR(X2|X3).
d)
## Y X1 X2 X3
## Y 1.0000000 -0.7867555 -0.6029417 -0.6445910
## X1 -0.7867555 1.0000000 0.5679505 0.5696775
## X2 -0.6029417 0.5679505 1.0000000 0.6705287
## X3 -0.6445910 0.5696775 0.6705287 1.0000000
The correlation between X2 and X3 is .6705.
The correlation between X1 and X3 is .5697.
These correlations are fairly high, which should make you consider whether all of these variables need to be included
(since they have a lot of overlapping predictive power).
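That overlap can be quantified by regressing each predictor on the other two; a minimal sketch, assuming the same data frame (a high R-squared means the other predictors already carry most of that variable's information):

```r
# R^2 of each predictor regressed on the remaining two.
summary(lm(X1 ~ X2 + X3, data = data))$r.squared
summary(lm(X2 ~ X1 + X3, data = data))$r.squared
summary(lm(X3 ~ X1 + X2, data = data))$r.squared
```

Appendix: R Markdown source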
# ---
# title: "HW6"
# output: pdf_document
# date: "2024-10-21"
# ---
# Problem 7.5
#
# a)
#
# ```{r,echo=FALSE}
# data <- read.delim(file = "http://www.cnachtsheim-text.csom.umn.edu/Kutner/Chapter%20%206%20Data%20Set
# Y<-data$V1
# X1<-data$V2
# X2<-data$V3
# X3<-data$V4
# data <- data.frame(Y, X2, X1, X3)
# mlr<-lm(Y ~ X2+X1+X3, data = data)
# sm = summary(mlr)
# avm = anova(mlr)
# avm
# sm
# err = sm$sigma
# MSE = err^2
# n = 46
# p = 4
# cat("\n\nF =",qf(.975,1,n-p))
# RSSX3 = avm[[3,2]]
# Fdrop = (RSSX3)/MSE
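# # F* is SSR(X3 | X2, X1) (row 3 of the sequential anova) divided by the full-model MSE, with (1, n-p) df.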
# cat("F* = ", Fdrop)
# ```
#
# H0 : B3 = 0 (reduced model, drop X3)
#
# Ha : B3 != 0 (full model, keep X3)
#
# Since F* < F, we cannot conclude with 97.5% confidence that X3 should be in the model. We fail to reject H0, so we drop X3 at a = .025.
#
# b)
#
# ```{r, echo=FALSE}
# pVal = 1-pf(Fdrop,1,42)
# cat("The p-value for this test is: ",pVal)
# ```
#
# Problem 7.14
#
# a)
#
# ```{r, echo=FALSE}
# data <- data.frame(Y, X1)
# mlr<-lm(Y ~ X1, data = data)
# sm = summary(mlr)
# avm = anova(mlr)
# r2.1 = sm$r.squared
#
# data <- data.frame(Y, X2, X1, X3)
# mlr<-lm(Y ~ X2+X1+X3, data = data)
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[2,2]]
# SSEX2 = avm[[1,2]]
# r2.2 = RSSX1/SSEX2
# data <- data.frame(Y, X2, X3, X1)
# mlr<-lm(Y ~ X2+X3+X1, data = data)
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[3,2]]
# SSEX23 = avm[[1,2]]+avm[[2,2]]
# r2.3 = RSSX1/SSEX23
# r2.1
# r2.2
# r2.3
# ```
#
# X1 alone explains about 62% of the variation in Y. Once X2 is in the model, X1 explains about 80% of the remaining variation; once X2 and X3 are both in, it explains about 46% of the remaining variation.
#
# b)
#
# ```{r, echo=FALSE}
# data <- data.frame(Y, X2)
# mlr<-lm(Y ~ X2, data = data)
# sm = summary(mlr)
# avm = anova(mlr)
# r2.1 = sm$r.squared
#
# data <- data.frame(Y, X1, X2, X3)
# mlr<-lm(Y ~ X1+X2+X3, data = data)
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[2,2]]
# SSEX2 = avm[[1,2]]
# r2.2 = RSSX1/SSEX2
# data <- data.frame(Y, X1, X3, X2)
# mlr<-lm(Y ~ X1+X3+X2, data = data)
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[3,2]]
# SSEX23 = avm[[1,2]]+avm[[2,2]]
# r2.3 = RSSX1/SSEX23
# r2.1
# r2.2
# r2.3
# ```
#
# The effects are drastically different for X2. X2 alone explains only about 36% of the variation; once X1 is in the model, it explains only about 6% of the remaining variation, and with X1 and X3 both in, only about 1%.
#
# Problem 7.18
#
# a)
#
# ```{r, echo=FALSE}
# Ybar = sum(Y)/n
# SY = sqrt(sum((Y-Ybar)^2)/(n-1))
# Ystar = ((Y-Ybar)/SY)/sqrt(n-1)
#
# X1bar = sum(X1)/n
# X2bar = sum(X2)/n
# X3bar = sum(X3)/n
# SX1 = sqrt(sum((X1-X1bar)^2)/(n-1))
# SX2 = sqrt(sum((X2-X2bar)^2)/(n-1))
# SX3 = sqrt(sum((X3-X3bar)^2)/(n-1))
# X1star = (X1-X1bar)/SX1/sqrt(n-1)
# X2star = (X2-X2bar)/SX2/sqrt(n-1)
# X3star = (X3-X3bar)/SX3/sqrt(n-1)
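# # Correlation transformation: each variable is centered, scaled by its sample sd, then divided by sqrt(n-1).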
#
# data <- data.frame(Ystar, X1star, X2star, X3star)
# mlr<-lm(Ystar ~ X1star+X2star+X3star+0, data = data)
# summary(mlr)
# ```
#
#
# b)
#
# ```{r,echo=FALSE}
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[2,2]]
# SSEX2 = avm[[1,2]]
# r12 = RSSX1/SSEX2
#
#
#
# data <- data.frame(Ystar, X2star, X1star, X3star)
# mlr<-lm(Ystar ~ X2star+X1star+X3star+0, data = data)
#
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[2,2]]
# SSEX2 = avm[[1,2]]
# r21 = RSSX1/SSEX2
#
#
#
# data <- data.frame(Ystar, X3star, X1star, X2star)
# mlr<-lm(Ystar ~ X3star+X1star+X2star+0, data = data)
#
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[2,2]]
# SSEX2 = avm[[1,2]]
# r31 = RSSX1/SSEX2
#
#
# data <- data.frame(Ystar, X1star, X3star, X2star)
# mlr<-lm(Ystar ~ X1star+X3star+X2star+0, data = data)
#
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[2,2]]
# SSEX2 = avm[[1,2]]
# r13 = RSSX1/SSEX2
#
#
# data <- data.frame(Ystar, X2star, X3star, X1star)
# mlr<-lm(Ystar ~ X2star+X3star+X1star+0, data = data)
#
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[2,2]]
# SSEX2 = avm[[1,2]]
# r23 = RSSX1/SSEX2
#
# data <- data.frame(Ystar, X3star, X2star, X1star)
# mlr<-lm(Ystar ~ X3star+X2star+X1star+0, data = data)
#
# sm = summary(mlr)
# avm = anova(mlr)
# RSSX1 = avm[[2,2]]
# SSEX2 = avm[[1,2]]
# r32 = RSSX1/SSEX2
#
#
#
# r12
# r21
# r23
# r32
# r31
# r13
# ```
#
# Yes. That is the definition of the coefficient of multiple determination.
#
# c)
#
# ```{r,echo=FALSE, results='asis'}
# data <- data.frame(Ystar, X1star, X2star, X3star)
# mlr<-lm(Ystar ~ X1star+X2star+X3star+0, data = data)
# sm = summary(mlr)
#
# b1star = sm$coefficients[[1]]
# b2star = sm$coefficients[[2]]
# b3star = sm$coefficients[[3]]
# b1 = b1star*SY/SX1
# b2 = b2star*SY/SX2
# b3 = b3star*SY/SX3
# b0 = Ybar-b1*X1bar-b2*X2bar-b3*X3bar
# cat("transformed back B0 = ", b0)
# cat("\n\ntransformed back B1 = ", b1)
# cat("\n\ntransformed back B2 = ", b2)
# cat("\n\ntransformed back B3 = ", b3)
#
# data <- data.frame(Y, X1, X2, X3)
# mlr<-lm(Y ~ X1+X2+X3, data = data)
# sm = summary(mlr)
# b0 = sm$coefficients[[1]]
# b1 = sm$coefficients[[2]]
# b2 = sm$coefficients[[3]]
# b3 = sm$coefficients[[4]]
# cat("\n\norig B0 = ", b0)
# cat("\n\norig B1 = ", b1)
# cat("\n\norig B2 = ", b2)
# cat("\n\norig B3 = ", b3)
# ```
#
# They are the same.
#
# Problem 7.25
#
# a)
#
# ```{r, echo=FALSE}
# data <- read.delim(file = "http://www.cnachtsheim-text.csom.umn.edu/Kutner/Chapter%20%206%20Data%20Set
#
# Y<-data$V1
# X1<-data$V2
# X2<-data$V3
# X3<-data$V4
# data <- data.frame(Y, X1)
# mlr<-lm(Y ~ X1, data = data)
# sm = summary(mlr)
# b0 = sm$coefficients[[1]]
# b1 = sm$coefficients[[2]]
# cat("The fitted regression function is: Y hat =",b0,"+X1*",b1)
# ```
#
# b)
#
# ```{r,echo=FALSE}
# r1 = sm$r.squared
# data <- data.frame(Y, X1, X2, X3)
# mlr2<-lm(Y ~ X1+X2+X3, data = data)
# sm2 = summary(mlr2)
# b0 = sm2$coefficients[[1]]
# b1 = sm2$coefficients[[2]]
# b2 = sm2$coefficients[[3]]
# b3 = sm2$coefficients[[4]]
# cat("The fitted regression function is: Y hat =",b0,"+X1*",b1,"+X2*",b2,"+X3*",b3)
# ```
#
# The coefficient of X1 is smaller in the MLR (0.000787) than in the SLR (0.000935).
#
# c)
#
# ```{r, echo=FALSE,results='asis'}
# data <- data.frame(Y,X1)
# mlr<-lm(Y ~ X1, data = data)
# avm = anova(mlr)
# RSSX1 = avm[[1,2]]
#
# data <- data.frame(Y, X2, X1)
# mlr<-lm(Y ~ X2+X1, data = data)
# avm = anova(mlr)
# RSSX21 = avm[[2,2]]
#
# cat("SSR(X1) = ",RSSX1)
# cat("SSR(X1|X2) = ",RSSX21)
# ```
#
# They are not equal; the difference is about 5,700, modest relative to SSR(X1) but not negligible.
#
# d)
#
# ```{r,echo=FALSE}
# data <- data.frame(Y, X1, X2, X3)
# cor(data)
# ```
#
# X1 and X2 have a correlation of only 0.085. Since the two predictors are nearly uncorrelated, adding X2 changes the X1 results very little, which is why SSR(X1) and SSR(X1|X2) in part c) are so close.
#
# Problem 7.26
#
# a)
#
# ```{r,echo=FALSE,results='asis'}
# data <- read.delim(file = "http://www.cnachtsheim-text.csom.umn.edu/Kutner/Chapter%20%206%20Data%20Set
# Y<-data$V1
# X1<-data$V2
# X2<-data$V3
# X3<-data$V4
# data <- data.frame(Y, X1, X2)
# mlr<-lm(Y ~ X1+X2, data = data)
# sm = summary(mlr)
# avm = anova(mlr)
# b0 = sm$coefficients[[1]]
# b1 = sm$coefficients[[2]]
# b2 = sm$coefficients[[3]]
# cat("The fitted regression function is: Y hat =",b0,"+X1*",b1,"+X2*",b2)
# ```
#
# b)
#
# ```{r,echo=FALSE,results='asis'}
# data <- data.frame(Y, X1, X2, X3)
# mlr2<-lm(Y ~ X1+X2+X3, data = data)
# sm2 = summary(mlr2)
# b0 = sm2$coefficients[[1]]
# b1 = sm2$coefficients[[2]]
# b2 = sm2$coefficients[[3]]
# b3 = sm2$coefficients[[4]]
# cat("The fitted regression function is: Y hat =",b0,"+X1*",b1,"+X2*",b2,"+X3*",b3)
# ```
#
# The coefficients are very different: adding X3 changes the X1 and X2 coefficients substantially, and the intercept grows a lot. The impact is very large.
#
# c)
#
# ```{r, echo=FALSE,results='asis'}
# data <- data.frame(Y,X1)
# mlr<-lm(Y ~ X1, data = data)
# avm = anova(mlr)
# SSRX1 = avm[[1,2]]
#
# data <- data.frame(Y, X3, X1)
# mlr<-lm(Y ~ X3+X1, data = data)
# avm = anova(mlr)
# SSRX31 = avm[[2,2]]
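# # Row 2 of the sequential anova for Y ~ X3 + X1 is SSR(X1 | X3).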
#
# cat("SSR(X1) = ",SSRX1)
# cat("\n\nSSR(X1|X3) = ",SSRX31)
#
# data <- data.frame(Y,X2)
# mlr<-lm(Y ~ X2, data = data)
# avm = anova(mlr)
# SSRX2 = avm[[1,2]]
#
# data <- data.frame(Y, X3, X2)
# mlr<-lm(Y ~ X3+X2, data = data)
# avm = anova(mlr)
# SSRX32 = avm[[2,2]]
#
# cat("\n\nSSR(X2) = ",SSRX2)
# cat("\n\nSSR(X2|X3) = ",SSRX32)
# ```
#
# Neither pair is equal: SSR(X1) is far larger than SSR(X1|X3), and SSR(X2) is far larger than SSR(X2|X3).
#
# d)
#
# ```{r,echo=FALSE}
# data <- data.frame(Y, X1, X2, X3)
# cor(data)
# ```
#
# The correlation between X2 and X3 is .6705.
#
# The correlation between X1 and X3 is .5697.
#
# These correlations are fairly high, which should make you consider whether all of these variables need to be included (since they have a lot of overlapping predictive power).