HW10 Solu F09
1) In this problem, both x and y are considered to be variables. When this is the case, we use
the correlation to measure the strength of the linear relationship between x and y.
a) ANS: R = .966. This can be read directly off the table given to you.
b) ANS: Yes, because the data follows a linear pattern and there are no outliers.
Reasoning: $R^2 = \dfrac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\left(\sum (x_i - \bar{x})^2\right)\left(\sum (y_i - \bar{y})^2\right)}$ is symmetrical in x and y, giving the same answer.
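The symmetry can be checked numerically. The sketch below is illustrative only: the data values and the helper name `r_squared` are hypothetical and are not taken from the homework.

```python
def r_squared(x, y):
    """Squared sample correlation between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy ** 2 / (s_xx * s_yy)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Swapping the roles of x and y leaves the value unchanged.
print(r_squared(x, y))
print(r_squared(y, x))
```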
f) Because you are asked to decide whether there is a positive linear association
between the two variables, the alternative hypothesis is: HA: ρ > 0.
• The general form of the rejection region for this alternative hypothesis is: Reject H0 if
$t \ge t_{\alpha,\,n-2}$.
• $t = \dfrac{R\sqrt{n-2}}{\sqrt{1-R^2}} = \dfrac{0.966\sqrt{18-2}}{\sqrt{1-0.934}} = 15.04$
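The test statistic in part f) can be reproduced with a few lines of Python. This is a minimal sketch: the function name `corr_t_stat` and its optional `r_squared` argument are assumptions made for illustration, added so that the rounded value R² = 0.934 used in the hand calculation can be passed in directly.

```python
import math

def corr_t_stat(r, n, r_squared=None):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom.

    r_squared may be passed explicitly to reproduce a hand calculation
    that used a rounded value of R^2; otherwise r**2 is used.
    """
    if r_squared is None:
        r_squared = r ** 2
    return r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# Values from part f): R = 0.966, n = 18, R^2 rounded to 0.934.
t = corr_t_stat(0.966, 18, r_squared=0.934)
print(round(t, 2))
```

Using the unrounded R² (0.966 squared) gives a slightly smaller statistic, so small differences from the hand calculation are rounding effects.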
2) Now we are asked to do a hypothesis test at a .05 significance level to determine whether a
linear relationship exists between stiffness (x) and thickness (y). In this case, I’m not specifying a
positive linear relationship.
• $t = \dfrac{R\sqrt{n-2}}{\sqrt{1-R^2}} = \dfrac{0.773\sqrt{6-2}}{\sqrt{1-0.597}} = 2.435$.
• The test statistic is not in the rejection region so we should fail to reject the null hypothesis.
• There is no statistical evidence at the .05 level of a linear relationship between stiffness and
thickness.
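The decision in problem 2 can be sketched as follows. The critical value 2.776 is an assumption here: it is the standard t-table entry for a two-sided test at the .05 level with n − 2 = 4 degrees of freedom, and it is not stated in the solution itself.

```python
import math

r, n = 0.773, 6          # values from problem 2
r_squared = 0.597        # rounded R^2 used in the hand calculation
t = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# Assumption: two-sided critical value t_{0.025, 4} = 2.776 from a t table.
t_crit = 2.776

# Two-sided test: reject H0 only if |t| reaches the critical value.
reject = abs(t) >= t_crit

print(round(t, 3), reject)
```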
3) a) The data provides strong statistical evidence of a linear relationship between x and y.
b) No, because the test statistic depends on both n and R. Because n = 5000 is so large, the
test statistic will also be large resulting in a small p-value. The value of R could be small
and we’d still have a small p-value.
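To see how the sample size drives the test statistic, the sketch below holds a weak correlation fixed and varies n. The value R = 0.05 and the comparison size n = 50 are hypothetical, chosen only for illustration.

```python
import math

def corr_t_stat(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical weak correlation, evaluated at two sample sizes.
r = 0.05
t_small = corr_t_stat(r, 50)     # small sample
t_large = corr_t_stat(r, 5000)   # n = 5000, as in problem 3

print(round(t_small, 2), round(t_large, 2))
```

The same R yields a much larger statistic at n = 5000, which is why a small p-value alone does not indicate a strong linear relationship.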
4) I have a slight preference for the second transformation because there appears to be an
outlier in the reciprocal transformation. Neither transformation has a very good standardized
residual plot.
8) Problem on page 4
a) In this problem, both transformations appear to satisfy the linear regression conditions
of linearity and equal variance. The point of this problem is that there isn’t necessarily one
best transformation.
b) When x' = x and y' = ln(y), the model is $Y = \alpha e^{\beta x}\,\varepsilon$.
c) You need to first find the confidence interval for $\mu_{Y' \cdot 200} = \mu_{\ln(Y) \cdot 200}$ and then take
exponentials to get a confidence interval for $\mu_{Y \cdot 200}$. Because only y was transformed
and not x, we don’t need to use ln(200) in our calculation.
Using the transformed data, the 95% confidence interval for $\mu_{Y' \cdot 200}$ is:
$\hat{\beta}'_0 + \hat{\beta}'_1 x^* \pm t_{\alpha/2,\,n-2} \cdot s_{\hat{\beta}'_0 + \hat{\beta}'_1 \cdot 200} = \hat{\beta}'_0 + \hat{\beta}'_1 x^* \pm t_{\alpha/2,\,n-2} \cdot s \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
$= (8.95782 + .00617 \times 200) \pm 1.984 \times .054 \times \sqrt{\dfrac{1}{384} + \dfrac{(200 - 215.4)^2}{1859614.2}}$
$= 10.1918 \pm 0.02408 = (10.16374,\ 10.21588)$. This gives us a confidence interval for the
expected value of ln(Y) when x = 200.
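The remaining step described in part c), taking exponentials of the endpoints, can be sketched as below. The endpoints are copied as printed in the interval above; the variable names are illustrative.

```python
import math

# Endpoints of the 95% CI for the mean of ln(Y) at x = 200, as printed above.
ln_lower, ln_upper = 10.16374, 10.21588

# exp is an increasing function, so applying it to each endpoint
# maps the interval on the log scale to an interval on the original scale.
lower, upper = math.exp(ln_lower), math.exp(ln_upper)

print(round(lower, 1), round(upper, 1))
```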
5) [problem on page 6]
b) 96.8%
c) 99.7%
d) The quadratic model should be used because the adjusted $R^2$ of .995 is larger for the
quadratic model than the adjusted $R^2$ of .960 for the linear model.
ii. We cannot predict values when x = 40.0 because 40.0 is outside the range of
observed x values.
f) We will use the model utility test to test for usefulness of the model. The hint that you
needed to use the model utility test for this problem is that you are told to use the output
from the quadratic model. The model utility test tells us whether or not at least one
predictor is useful, but not which one. Suppose we rejected the null hypothesis of the
model utility test, looked at the p-values for the individual predictors, and found that
only β1 had a statistically significant p-value (p-value < α); then the linear model would be
the best model for this data set.
• p-value = .000 < .05 (the p-value is read off the ANOVA table)
• Conclusion: There is statistical evidence that at least one of the predictors provides
useful information. So the linear or the quadratic model is useful in explaining the variability
in the moisture content, but we don’t know which based on this statistical test.
The p-value (which I meant to delete but forgot) is 0.0134 and since p-value < α =
.05, we should reject the null hypothesis.
Conclusion: The data provides statistical evidence at the .05 level that β2 ≠ 0.
From this we infer that the quadratic predictor appears to provide useful information.
h) For 2 confidence intervals to have joint confidence level of (1-α)100%, we require that
the confidence level of each CI be (1- α/2)100%.
For this problem, α = 0.08. Therefore, each CI must be a $\left(1 - \frac{.08}{2}\right)100\% = 96\%$ CI. In
other words, we have to look up $t_{\alpha/4,\,n-(k+1)}$ where α/4 = 0.02.
The CI for β2 is −0.254 ± 3.482 (0.048) = −0.254 ± 0.167
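The arithmetic in part h) can be sketched as follows. The function name `bonferroni_tail_area` is an assumption made for illustration; the estimate, standard error, and table value 3.482 are the ones used in the solution.

```python
def bonferroni_tail_area(alpha_joint, num_intervals):
    """One-sided tail area for each of several two-sided intervals."""
    alpha_each = alpha_joint / num_intervals   # e.g. 0.08 / 2 = 0.04
    return alpha_each / 2                      # split over the two tails

tail = bonferroni_tail_area(0.08, 2)

# From the solution: estimate -0.254, standard error 0.048,
# and table value t_{alpha/4, n-(k+1)} = 3.482.
estimate, std_err, t_crit = -0.254, 0.048, 3.482
margin = t_crit * std_err
ci = (estimate - margin, estimate + margin)

print(tail, round(margin, 3), ci)
```

Both endpoints of the resulting interval are negative, so the interval for β2 excludes zero.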
c) 78.3%
7) Coefficients

                              Unstandardized Coefficients
                              B         Std. Error    t         Sig.
Hours since detection         .0390     .0054         7.202     .000
Hours since detection ** 2    -.0010    .0001         -8.115    .000
(Constant)                    .0642     .0505         1.271     .211
c) You are told that n = 42. We want to test the following hypotheses:
• Conclusion: The data provides statistical evidence at the .05 level that the
coefficient of the linear term is less than .05.
d) If the significance level had been .01, then the rejection region would change to:
The conclusion when the significance level is .01 becomes: The data fails to
provide statistical evidence at the .01 significance level that the coefficient of the
linear term is less than .05.
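The two conclusions in parts c) and d) can be checked with the sketch below. Several items are assumptions: the hypotheses H0: β1 = .05 versus HA: β1 < .05 are inferred from the stated conclusions, the degrees of freedom are taken as n − (k + 1) = 39 for the quadratic model, and the critical values 1.685 and 2.426 are standard one-sided t-table entries for 39 degrees of freedom.

```python
b1, se_b1 = 0.0390, 0.0054   # linear-term estimate and standard error (table)
beta_null = 0.05             # assumed null value, inferred from the conclusion
n, k = 42, 2                 # n from part c); k = 2 predictors (x and x^2)
df = n - (k + 1)             # 39 degrees of freedom

# Test statistic for H0: beta1 = 0.05 versus HA: beta1 < 0.05.
t = (b1 - beta_null) / se_b1

# Assumption: one-sided critical values from a t table with 39 df.
t_crit_05, t_crit_01 = 1.685, 2.426

# Lower-tailed test: reject H0 when t falls at or below -t_crit.
reject_05 = t <= -t_crit_05
reject_01 = t <= -t_crit_01

print(df, round(t, 3), reject_05, reject_01)
```

Under these assumptions the statistic falls in the rejection region at the .05 level but not at the .01 level, which matches the two conclusions above.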