HW10 Solu F09

HW #10 - Solutions
1) In this problem, both x and y are considered to be variables. When this is the case, we use
the correlation to measure the strength of the linear relationship between x and y.
a) ANS: R = .966. This can be read directly off the table given to you.
b) ANS: Yes, because the data follows a linear pattern and there are no outliers.
c) $R^2 = .934$, so the answer is 93.4%.
d) Again the answer is 93.4%.

Reasoning: $R^2 = \frac{\left(\sum (x_i - \bar{x})(y_i - \bar{y})\right)^2}{\left(\sum (x_i - \bar{x})^2\right)\left(\sum (y_i - \bar{y})^2\right)}$ is symmetric in x and y, giving the
same answer.
e) The answer is still 93.4% because $R^2$ is a unit-free quantity.
f) Because you are asked to decide whether there is a positive linear association
between the two variables, the alternative hypothesis is: HA: ρ > 0.
• H0: ρ = 0 versus HA: ρ > 0
• The general form of the rejection region for this alternative hypothesis is: Reject H0 if
$t \ge t_{\alpha,\,n-2}$.
For this particular problem, we should reject H0 if $t \ge 2.583$.
• $t = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}} = \frac{0.966\sqrt{18-2}}{\sqrt{1-0.934}} = 15.04$
• Because $t = 15.04 \ge 2.583$, we will decide to reject the null hypothesis.

• Conclusion: The data provides statistical evidence at the .01 level that there is a
positive linear relationship between shear force and percent fiber dry weight for
asparagus plants. [Note, this is a statement about all asparagus plants, not just the 18
plants in the study.]
2) Now we are asked to do a hypothesis test at a .05 significance level to determine whether a
linear relationship exists between stiffness (x) and thickness (y). In this case, I’m not specifying a
positive linear relationship.
• $H_0: \rho = 0$ versus $H_1: \rho \ne 0$, where ρ = correlation between stiffness and thickness.

• We reject the null hypothesis if $|t| \ge t_{\alpha/2,\,n-2}$. Because n = 6 and α = .05,
$t_{\alpha/2,\,n-2} = t_{.025,4} = 2.776$. We should reject the null hypothesis if $|t| \ge 2.776$.
• $t = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}} = \frac{0.773\sqrt{6-2}}{\sqrt{1-0.597}} = 2.435$.
• The test statistic is not in the rejection region so we should fail to reject the null hypothesis.
• There is no statistical evidence at the .05 level of a linear relationship between stiffness and
thickness.
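The test statistics in problems 1 and 2 come from the same formula, so the arithmetic can be reproduced with one small function. This is a sketch, not part of the original solution: `correlation_t_statistic` is a name introduced here, the rounded $R^2$ values (.934 and .597) are passed in explicitly so the results match the hand calculations, and the critical values are the table values quoted in the solutions.

```python
import math


def correlation_t_statistic(r, n, r_squared=None):
    """t = R * sqrt(n - 2) / sqrt(1 - R^2).

    r_squared may be supplied separately when a rounded value was used by hand.
    """
    if r_squared is None:
        r_squared = r ** 2
    return r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)


# Problem 1: one-sided test, reject H0 if t >= 2.583.
t_problem_1 = correlation_t_statistic(0.966, 18, r_squared=0.934)
reject_problem_1 = t_problem_1 >= 2.583

# Problem 2: two-sided test, reject H0 if |t| >= 2.776.
t_problem_2 = correlation_t_statistic(0.773, 6, r_squared=0.597)
reject_problem_2 = abs(t_problem_2) >= 2.776

print(f"Problem 1: t = {t_problem_1:.2f}, reject H0: {reject_problem_1}")
print(f"Problem 2: t = {t_problem_2:.3f}, reject H0: {reject_problem_2}")
```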
3) a) The data provides strong statistical evidence of a linear relationship between x and y.
b) No, because the test statistic depends on both n and R. Because n = 5000 is so large, the
test statistic will also be large resulting in a small p-value. The value of R could be small
and we’d still have a small p-value.
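To see how much the sample size drives the test statistic in 3b), the sketch below plugs a deliberately small correlation into the same t formula with n = 5000. The value R = 0.05 is a hypothetical choice made for illustration; the problem does not state R.

```python
import math


def correlation_t_statistic(r, n):
    """t = R * sqrt(n - 2) / sqrt(1 - R^2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical weak correlation: R = 0.05 means R^2 = 0.0025,
# so only 0.25% of the variation is explained by the linear relationship.
weak_r = 0.05
t_large_n = correlation_t_statistic(weak_r, 5000)
t_small_n = correlation_t_statistic(weak_r, 50)

print(f"n = 5000: t = {t_large_n:.2f}")
print(f"n = 50:   t = {t_small_n:.2f}")
```

With n = 5000 the statistic is about 3.5, large enough to give a small p-value, even though the relationship itself is very weak. The same R with n = 50 gives a t statistic well below 1.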
4) I have a slight preference for the second transformation because there appears to be an
outlier in the reciprocal transformation. Neither transformation has a very good standardized
residual plot.
8) Problem on page 4
a) In this problem, both transformations appear to satisfy the linear regression conditions
of linearity and equal variance. The point of this problem is that there isn’t necessarily one
best transformation.
b) When x' = x and y' = ln(y), the model is $Y = \alpha e^{\beta x}\varepsilon$.
When x' = ln(x) and y' = ln(y), the model is $Y = \alpha x^{\beta}\varepsilon$.
c) You need to first find the confidence interval for $\mu_{Y' \cdot 200} = \mu_{\ln(Y) \cdot 200}$ and then take
exponentials to get a confidence interval for $\mu_{Y \cdot 200}$. Because only y was transformed
and not x, we don’t need to use ln(200) in our calculation.
Using the transformed data, the 95% confidence interval for $\mu_{Y' \cdot 200}$ is:
$$\hat\beta'_0 + \hat\beta'_1 x^* \pm t_{\alpha/2,\,n-2} \cdot s_{\hat\beta'_0 + \hat\beta'_1 x^*} = \hat\beta'_0 + \hat\beta'_1 x^* \pm t_{\alpha/2,\,n-2} \cdot s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
$$= (8.95782 + .00617 \times 200) \pm 1.984 \times .054 \times \sqrt{\frac{1}{384} + \frac{(200 - 215.4)^2}{1859614.2}}$$
$$= 10.1918 \pm 0.02408 = (10.16374,\ 10.21588).$$
This gives us a confidence interval for the
expected value of ln(Y) when x = 200.
Now take exponentials of both sides: $(e^{10.16712},\ e^{10.21588}) = (26033.0,\ 27333.8)$.
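The back-transformation step in part c) is just exponentiating each endpoint of the log-scale interval. A minimal sketch, using the two endpoints as they appear in the exponentiation step (the variable names are introduced here for illustration):

```python
import math

# Endpoints of the interval on the ln(Y) scale, as used in the exponentiation step.
log_scale_interval = (10.16712, 10.21588)

# exp() is increasing, so applying it to each endpoint preserves the ordering
# and gives the corresponding interval on the original Y scale.
original_scale_interval = tuple(math.exp(endpoint) for endpoint in log_scale_interval)

print(f"({original_scale_interval[0]:.1f}, {original_scale_interval[1]:.1f})")
```

Note that x = 200 is never log-transformed here, because only y was transformed in this model.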
5) [problem on page 6]
a) Yes, the points fit the line very well.
b) 96.8%
c) 99.7%
d) The quadratic model should be used because its adjusted $R^2$ of .995 is larger than the
adjusted $R^2$ of .960 for the linear model.
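The adjusted $R^2$ values in part d) are consistent with the $R^2$ values in parts b) and c). The sketch below assumes the standard definition, adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - (k + 1))$, and takes n = 6 from part h); that the software used exactly this formula is an assumption, and `adjusted_r_squared` is a name introduced here.

```python
def adjusted_r_squared(r_squared, n, k):
    """Standard adjusted R^2 for a model with k predictors fit to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - (k + 1))


# Linear model: R^2 = .968 with k = 1 predictor.
linear_adjusted = adjusted_r_squared(0.968, n=6, k=1)

# Quadratic model: R^2 = .997 with k = 2 predictors (x and the squared term).
quadratic_adjusted = adjusted_r_squared(0.997, n=6, k=2)

print(f"linear:    {linear_adjusted:.3f}")
print(f"quadratic: {quadratic_adjusted:.3f}")
```

The penalty for the extra predictor is small compared with the gain in $R^2$, which is why the quadratic model still comes out ahead.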
e) $\hat{y} = 1015.8655 + 6.1445x - 0.254(x - 14.4833)^2$
i. $\hat{y} = 1015.8655 + 6.1445(15) - 0.254(15 - 14.4833)^2 = 1107.965$
ii. We cannot predict values when x = 40.0 because 40.0 is outside the range of
observed x values.
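The prediction in part e) i. is a direct substitution into the fitted equation. A short sketch of the arithmetic, where `predict_moisture` and the constant names are introduced here for illustration:

```python
# Coefficients of the fitted quadratic model from part e).
INTERCEPT = 1015.8655
LINEAR_COEF = 6.1445
QUADRATIC_COEF = -0.254
X_CENTER = 14.4833  # value subtracted from x inside the squared term


def predict_moisture(x):
    """Fitted value from y-hat = b0 + b1*x + b2*(x - center)^2."""
    return INTERCEPT + LINEAR_COEF * x + QUADRATIC_COEF * (x - X_CENTER) ** 2


prediction_at_15 = predict_moisture(15)
print(f"y-hat at x = 15: {prediction_at_15:.3f}")
```

The function will happily return a number for x = 40.0 as well, but as part e) ii. says, that value would be an extrapolation and should not be used.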
f) We will use the model utility test to test for usefulness of the model. The hint that you
needed to use the model utility test for this problem is that you are told to use the output
from the quadratic model. The model utility test tells us whether or not at least one
predictor is useful but not which one. Suppose we rejected the null hypothesis of the
model utility test, looked at the p-values for the individual predictors, and found that
only β1 had a statistically significant p-value (p-value < α); then the linear model would be
the best model for this data set.
• Hypotheses: H0: β1 = β2 = 0 versus HA: at least one of β1 and β2 ≠ 0.
• p-value = .000 < .05 (the p-value is read off the ANOVA table)
• Decision: We reject the null hypothesis
• Conclusion: There is statistical evidence that at least one of the predictors provides
useful information. So the linear or quadratic model is useful in explaining the variability
in the moisture content, but we don’t know which based on this statistical test alone.
g) The appropriate hypotheses to test are: H0: β2 = 0 versus HA: β2 ≠ 0.
The p-value (which I meant to delete but forgot) is 0.0134 and since p-value < α =
.05, we should reject the null hypothesis.
Conclusion: The data provides statistical evidence at the .05 level that β2 ≠ 0.
From this we infer that the quadratic predictor appears to provide useful information.
h) For two confidence intervals to have a joint confidence level of at least (1 − α)100%, we require that
the confidence level of each CI be (1 − α/2)100%.
For this problem, α = 0.08. Therefore, each CI must be a $(1 - \frac{.08}{2})100\% = 96\%$ CI. In
other words, we have to look up $t_{\alpha/4,\,n-(k+1)}$ where α/4 = 0.02.
When n = 6, $t_{\alpha/4,\,n-(k+1)} = t_{.02,3} = 3.482$.

The CI for β1 is 6.144 ± 3.482 (0.204) = 6.144 ± 0.710
The CI for β2 is −0.254 ± 3.482 (0.048) = −0.254 ± 0.167
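The joint-interval calculation in part h) can be laid out step by step. This sketch hard-codes the table value 3.482 quoted in the solution rather than computing it, and the function and variable names are introduced here for illustration.

```python
def bonferroni_intervals(estimates, std_errors, t_critical):
    """Return (lower, upper) for each estimate using a shared critical value."""
    return [
        (estimate - t_critical * se, estimate + t_critical * se)
        for estimate, se in zip(estimates, std_errors)
    ]


joint_alpha = 0.08
num_intervals = 2

# Each interval gets an equal share of alpha, then half of that share goes in each tail.
per_interval_confidence = 1 - joint_alpha / num_intervals  # 0.96
upper_tail_area = joint_alpha / (2 * num_intervals)        # 0.02

t_critical = 3.482  # t_{.02, 3}, taken from the solution (n = 6, k = 2)

intervals = bonferroni_intervals(
    estimates=[6.144, -0.254],   # estimates of beta_1 and beta_2
    std_errors=[0.204, 0.048],
    t_critical=t_critical,
)

for name, (lower, upper) in zip(["beta_1", "beta_2"], intervals):
    print(f"{name}: ({lower:.3f}, {upper:.3f})")
```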
i) In calculus we learned that for the quadratic equation $y = \beta_0 + \beta_1 x + \beta_2 x^2$, the
interpretation of β1 is y’(0) = β1. So, the value of β1 is based on the behavior of the
curve around x = 0. Estimates of curves (and lines) are least variable around
the middle of the data set, and if x = 0 isn’t near the middle of the data set, the
estimate of β1 could vary a lot from sample to sample. By centering the data, we’ve
essentially made it so that x = 0 sits right in the middle of the
new centered x values. This decreases the variability of the estimate of β1 from sample
to sample.
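The effect described in part i) can be seen without simulation. In least squares, the variance of a coefficient estimate is σ² times the matching diagonal entry of $(X^T X)^{-1}$, so comparing that entry for the raw and centered squared term shows how much centering helps. The sketch below is an illustration only: the x values are hypothetical (the homework data are not reproduced here), and the function name is introduced for this example.

```python
def linear_coef_variance_factor(x, center_square):
    """Diagonal entry of (X'X)^-1 for the coefficient of x in
    y = b0 + b1*x + b2*q, where q = x^2 or q = (x - mean(x))^2.
    Var(b1-hat) equals sigma^2 times this value."""
    n = len(x)
    x_bar = sum(x) / n
    shift = x_bar if center_square else 0.0
    q = [(xi - shift) ** 2 for xi in x]
    columns = [[1.0] * n, list(x), q]

    # m = X'X: the symmetric 3x3 matrix of column dot products.
    m = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

    det = (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )
    # Entry (1, 1) of the inverse is the cofactor of m[1][1] divided by det.
    cofactor_11 = m[0][0] * m[2][2] - m[0][2] * m[2][0]
    return cofactor_11 / det


# Hypothetical x values whose middle (14.5) is far from x = 0.
x_values = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0]

raw_factor = linear_coef_variance_factor(x_values, center_square=False)
centered_factor = linear_coef_variance_factor(x_values, center_square=True)

print(f"raw x^2 term:      {raw_factor:.4f}")
print(f"centered x^2 term: {centered_factor:.4f}")
```

For these hypothetical values the variance multiplier for the estimate of β1 is roughly a hundred times smaller with the centered squared term, because β1 then describes the slope in the middle of the data rather than at x = 0.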
6) a) The cubic model
b) Because it has the largest adjusted $R^2$ value.
c) 78.3%
7) Coefficients

                               Unstandardized Coefficients
                               B         Std. Error    t         Sig.
Hours since detection          .0390     .0054         7.202     .000
Hours since detection ** 2    -.0010     .0001        -8.115     .000
(Constant)                     .0642     .0505         1.271     .211
a) $y = .0642 + .0390x - .0010x^2$
b) $\hat{\mu}_{Y \cdot 20} = .0642 + .0390(20) - .0010(20^2) = .4442$
c) You are told that n = 42. We want to test the following hypotheses:
• H0: β1 = .05 versus HA: β1 < .05

• Test Statistic: $t = \frac{.039 - .05}{.0054} = -2.037$
• Rejection Region: Reject H0 if $t \le -t_{.05,\,n-(k+1)}$, where $t_{.05,39} = 1.684$
• Decision: Because -2.037 < -1.684, we reject the null hypothesis
• Conclusion: The data provides statistical evidence at the .05 level that the
coefficient of the linear term is less than .05.
d) If the significance level had been .01, then the rejection region would change to:
Reject H0 if $t \le -t_{.01,\,n-(k+1)} = -2.423$.
Because t = −2.037 > −2.423, we would fail to reject H0.
The conclusion when the significance level is .01 becomes: The data fails to
provide statistical evidence at the .01 significance level that the coefficient of the
linear term is less than .05.
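Parts c) and d) use the same test statistic and differ only in the critical value, which the sketch below makes explicit. The critical values 1.684 and 2.423 are the table values quoted in the solution; `lower_tail_t_test` is a name introduced here for illustration.

```python
def lower_tail_t_test(estimate, null_value, std_error, t_critical):
    """Test H0: beta = null_value against HA: beta < null_value.

    Returns the t statistic and whether H0 is rejected (t <= -t_critical).
    """
    t_statistic = (estimate - null_value) / std_error
    return t_statistic, t_statistic <= -t_critical


# Coefficient and standard error of the linear term from the output table.
estimate, std_error = 0.039, 0.0054

t_05, reject_at_05 = lower_tail_t_test(estimate, 0.05, std_error, t_critical=1.684)
t_01, reject_at_01 = lower_tail_t_test(estimate, 0.05, std_error, t_critical=2.423)

print(f"t = {t_05:.3f}")
print(f"alpha = .05: reject H0 -> {reject_at_05}")
print(f"alpha = .01: reject H0 -> {reject_at_01}")
```

The statistic falls between the two cutoffs, which is why the decision flips when the significance level is tightened from .05 to .01.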
