ST350 NCSU Practice Problems Final Exam
2. (linear regression) In the setting of the previous problem, about what percent of the variation in number of
service calls is explained by the linear relation between number of service calls and number of machines?
a. 86% b. 93% c. 74% d. none of these e. can't tell from information given
3. (correlation) Outdoor temperature influences natural gas consumption for the purpose of heating a house.
The usual measure of the need for heating is heating degree days. The number of heating degree days for a
particular day is the number of degrees the average temperature for that day is below 65°F, where the
average temperature for a day is the mean of the high and low temperatures for that day. An average
temperature of 20°F, for example, corresponds to 45 heating degree days. A homeowner interested in
switching to solar heating panels collects the following data on her natural gas use for the months October
through June, where x is heating degree days per day for the month and y is gas consumption per day in
hundreds of cubic feet.
Month   Oct    Nov    Dec    Jan    Feb    Mar    Apr    May    June
x       15.6   26.8   37.8   36.4   35.5   18.6   15.3   7.9    0
y       5.2    6.1    8.7    8.5    8.8    4.9    4.5    2.5    1.1
If $\sum_{i=1}^{9} (x_i - \bar{x})(y_i - \bar{y}) = 291.31$, $s_x = 13.42$, $s_y = 2.74$, calculate the correlation coefficient r and interpret
its value; draw a scatterplot of the data.
4. (correlation) Each of the following statements contains a blunder. In each case explain what is wrong.
a. “There is a high correlation between the sex of American workers and their income.”
b. “We found a high correlation (r = 1.09) between students' ratings of faculty teaching and
ratings made by other faculty members.”
c. “The correlation between planting rate and yield of corn was found to be r = .23 bushel.”
6. (linear regression) Outdoor temperature influences natural gas consumption for the purpose of heating a
house. The usual measure of the need for heating is heating degree days. The number of heating degree
days for a particular day is the number of degrees the average temperature for that day is below 65°F, where
the average temperature for a day is the mean of the high and low temperatures for that day. An average
temperature of 20°F, for example, corresponds to 45 heating degree days. A homeowner interested in
switching to solar heating panels collects the following data on her natural gas use for the months October
through June, where the explanatory variable x is heating degree days per day for the month and the
response variable y is gas consumption per day in hundreds of cubic feet.
Month   Oct    Nov    Dec    Jan    Feb    Mar    Apr    May    June
x       15.6   26.8   37.8   36.4   35.5   18.6   15.3   7.9    0
y       5.2    6.1    8.7    8.5    8.8    4.9    4.5    2.5    1.1
Summary statistics are as follows: $\bar{x} = 21.54$, $\bar{y} = 5.59$, $s_x = 13.42$, $s_y = 2.74$, $r = .990$.
Calculate the least squares regression line $\hat{y} = b_0 + b_1 x$ of gas consumption y on heating degree days x.
Draw the regression line on a scatterplot of the data.
7. (confidence interval for mean) As the sample size n increases, the width of the confidence interval at a fixed
confidence level for the population mean tends to:
a. increase b. decrease c. stay the same
8. (confidence interval for mean) In a study to establish the absolute threshold of hearing, 71 male college
freshmen are asked to participate. Each subject is seated in a soundproof room and a 150 Hz tone is
presented at a large number of stimulus levels in a randomized order. The subject is instructed to press a
button if he detects the tone. The mean for the group was 21.6 db with s = 2.1. Estimate the mean absolute
threshold of all men 19-21 years of age with a 99% confidence interval.
9. (confidence interval for mean) A 95% confidence interval for the mean mileage of the 2005 Mazda 626s is
(23.8, 29.6). Then
a. At the 95% confidence level, we can conclude that the mean mileage for 1990 Mazda 626
exceeds 23.8 miles per gallon.
b. If the same data were used to construct a 99% confidence interval, the resulting interval
would have length shorter than the one given above.
c. If we repeatedly took samples of the same size, and each time computed a 95% confidence
interval, approximately 95% of the resulting intervals would include the true mean mileage
for 1990 Mazda 626
d. None of the above
10. (confidence interval for mean) A random sample of size 49 selected from a population had a sample standard
deviation $s = 8.726$. Suppose the confidence interval formed for $\mu$ was $63 < \mu < 69$.
a. What is the sample mean, $\bar{x}$?
b. What is the estimated standard deviation of $\bar{x}$?
c. What was the confidence coefficient used to form the confidence interval?
11. (confidence interval) A confidence coefficient of 0.99 can correctly be interpreted to mean that
a. 99% of the time in repeated sampling, intervals using an appropriate formula will contain
the sample value.
b. 99% of the time in repeated sampling, intervals using an appropriate formula will contain
the relevant population parameter.
c. 99% of the time in repeated sampling, intervals using an appropriate formula will contain
the sample value as the midpoint of the interval.
d. 99% of the time in repeated sampling, intervals using an appropriate formula will contain
the sample mean as their midpoints.
12. (correlation) If a pair of variables have a strong curvilinear (that is, nonlinear) relationship, which of the
following is true?
a. The correlation coefficient will be able to indicate that a nonlinear relationship is present
b. A scatter plot will not be needed to indicate that a nonlinear relationship is present
c. The correlation coefficient will not be able to indicate the relationship is nonlinear
d. The correlation coefficient will be exactly equal to zero
13. (confidence interval for mean) The minimum sample size needed to estimate a population mean within ± 5
at a 95% confidence level when the standard deviation is 40 is:
a. 62 b. 44 c. 1,537 d. 175 e. 246
14. (linear regression) If a residual plot exhibits a curved pattern in the residuals, this means that:
a. the x and y variables should be reversed so that a least squares line is appropriate for these data
b. a least squares line is not appropriate because there is a nonlinear relation between x and y
c. there is no significant relation between x and y
d. there is a problem with the accuracy of the data
Use the following Excel output to answer questions 15, 16, and 17 below.
Regression Statistics
Multiple R 0.8851
R Square 0.7835
Adjusted R Square 0.7474
Standard Error 5.4006
Observations 8
ANOVA
             df    SS         MS         F
Regression    1    633.242    633.242    21.711
Residual      6    175.000    29.167
Total         7    808.242
Use the following Excel output to answer questions 18, 19, and 20 below.
18. (linear regression) Given this information, what percent of the variation in the y variable is explained by the
variation of the independent variable?
a. About 75 percent
b. Approximately 57 percent
c. Can't be determined without having the actual data available
d. About 25 percent
19. (linear regression) Given this information, what was the sample size used in the study?
a. 8 b. 18 c. 9 d. 16
20. (linear regression) If the x variable increased by 2 units, then y would change by approximately
a. 9.64 b. 1.509 c. 0.1076 d. 9.64 e. 0.1076
21. (linear regression) If a least squares line were determined for the data in each scatterplot, which would have
the smallest sum of the squares of the residuals?
a. A b. B c. C d. D
[Figure: four scatterplots, labeled A, B, C, and D]
22. (multiple regression) For all students at Walden University, the prediction equation for y = college GPA
(range 0-4.0), $x_1$ = high school GPA (range 0-4.0), and $x_2$ = college board score (range 200-800) is
$\hat{y} = 0.20 + 0.50x_1 + 0.002x_2$.
a. Find the predicted college GPA for students having (i) high school GPA = 4.0 and college board score
= 800; (ii) $x_1 = 2.0$ and $x_2 = 200$.
b. A high school student trying to gain admission to Walden University re-takes the college board exam
during the summer after his senior year in high school. If he increases his college board score by 100
points, what will be the change in his predicted college GPA? Note that his high school GPA does not
change since he has already graduated from high school.
c. For all students with $x_2 = 500$, what is the predicted college GPA when high school GPA is the
explanatory variable?
23. (multiple regression) For a study of crime in the United States, data for each of the 50 states and
Washington, DC were collected on the violent crime rate (per 100,000 citizens), poverty rate (percent of the
population), single parent households (percent of all state households), and urbanization (percent of state
population living in urban areas). The multiple regression output is shown below where y = violent crime
rate, $x_1$ = poverty rate, $x_2$ = single parent households, and $x_3$ = urbanization.
ANOVA
              df    SS             MS          F           Significance F
Regression     3    2082535.142    694178.4    39.25579    7.46
Residual      47    831122.7794    17683.46
Total         50    2913657.922
a. What is the least squares prediction equation for the violent crime rate?
b. What is the sum of the squares of the residuals SSE for this multiple regression model?
c. What proportion of the variation in the violent crime rate is explained by poverty rate, single parent
households, and urbanization?
d. If the poverty rate increased by 1 percent, with single parent households and urbanization unchanged,
how would the violent crime rate change?
24. (linear regression) In 1998 sociologists were of the opinion that since 1975 there had been a decrease in the
difference in ages at first marriage of husbands and wives. We want to examine data to determine if this
decrease is significant. The following data summary and regression results were obtained, where the x
variable is year and the y variable is the age difference (husband age - wife age) at first marriage.
25. (linear regression) Over 6 decades the Gallup Organization has periodically asked the following question:
If your party nominated a generally well-qualified person for president who happened to be a
woman, would you vote for that person?
Below is a table showing the percentage answering “yes” and the year of the century (37 = 1937).
% Yes   92  82  78  80  76  73  66  53  57  55  57  54  52  48  33  33
Year    99  87  84  83  78  75  71  69  67  63  59  58  55  49  45  37
summary statistics:
$\bar{x} = 67.44$, $s_x = 16.7$, $\bar{y} = 61.81$, $s_y = 17.19$, $r = .971$
a. Determine the estimates $b_0$ and $b_1$ of the parameters $\beta_0$ and $\beta_1$ in the linear model
$y = \beta_0 + \beta_1 x + \varepsilon$, where x is year and y is the percentage who respond “yes”.
b. Use the least squares line to estimate the percentage of respondents that would say “yes” in 1997.
c. Determine the estimate $s_e$ of the standard deviation $\sigma$ of the error component $\varepsilon$
(note that the sum of squares of residuals $\sum_{i=1}^{16} (y_i - \hat{y}_i)^2 = 255.748$).
d. Calculate a 95% confidence interval for the slope $\beta_1$. (Note that $SE(b_1) = \frac{s_e}{\sqrt{n-1}\, s_x}$.)
e. Conduct an appropriate hypothesis test (use α = .05) to determine if the year of the century is useful for
predicting the percentage of respondents that would answer “yes” to the above question. State the
hypotheses, find the value of the test statistic, and state your conclusion based on the P-value or the
rejection region.
26. (confidence interval) A random sample of 100 individuals was taken to determine the true percentage of
people who smoke in a region of the eastern United States. Forty-six of them said “yes” when they were
asked if they smoked. A 95% confidence interval for the true proportion of nonsmokers is:
a. (.46, .54) b. (.36, .56) c. (.95, 1.0) d. (.44, .64) e. (.00, .95)
27. (sample size) The credit manager of a department store would like to know what proportion of the credit-
card customers take advantage of the store's deferred payment plan each year. She would like to estimate
this proportion within ± .10 at a 90% confidence level, but has no good idea about what this proportion
might be. How many customers should she sample?
a. 271 b. 41 c. 17 d. 68 e. cannot be determined
29. (hypothesis test) A professor claims that 70% of College of Business graduates earn more than $45,000 per
year. In a random sample of 300 graduates, 195 earn more than $45,000.
Perform a 2-tailed hypothesis test to test the professor's claim; compute the P-value for the test.
30. (hypothesis test) A candy company claims that in a large bag of St. Patrick's Day candy half the candies are
green and half are white. You select candies at random from a bag and discover that of the first 50 you eat,
only 15 are green. This makes you suspicious of the company's claim that half are green and half are white.
Perform a hypothesis test to test the company's claim.
31. (hypothesis test) In the 1980's it was generally believed that congenital abnormalities affected about 5% of
the nation's children. Some people believe that the increase in the number of chemicals in the environment
has led to an increase in the incidence of abnormalities. A recent study examined 408 children and found
that 48 of them showed signs of an abnormality. Is this evidence that the risk has increased?
i) Let p denote the proportion of children with genetic abnormalities. Choose the correct null and alternative
hypotheses.
A. $H_0: p = 0.1176$ vs $H_A: p \;\; 0.1176$;   B. $H_0: \hat{p} = .05$ vs $H_A: \hat{p} \;\; .05$
C. $H_0: p = .05$ vs $H_A: p \neq .05$;   D. $H_0: p = .05$ vs $H_A: p \;\; .05$
E. $H_0: p = .05$ vs $H_A: p \;\; .05$   F. $H_0: \hat{p} = 0.1176$ vs $H_A: \hat{p} \neq 0.1176$
32. (confidence interval) The minimum sample size needed to estimate a population mean within ± 5 at a 95%
confidence level when the standard deviation is 40 is:
a. 62 b. 44 c. 1,537 d. 175 e. 246
33. (multiple regression, global F-test, hypothesis test for individual coefficients) Suppose you fit the multiple
regression model $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$ to a set of $n = 20$ data points and found $R^2 = 0.90$,
$SSE = 3.87$, and $MSR = 16.635$.
33a. (global F-test) Perform a global F-test to test if there is sufficient evidence to indicate that the model
contributes information for predicting y by answering questions i) - iv) below.
iv) If the P-value = 0.005, what is the conclusion for this test?
a) Do not reject the null hypothesis. Conclude that there is not sufficient evidence that the model is
useful for predicting y.
b) Reject the null hypothesis. Conclude that there is not sufficient evidence that the model is useful
for predicting y.
c) Reject the null hypothesis. Conclude that at least one of $\beta_1$, $\beta_2$ is nonzero and that the model is
useful for predicting y.
d) Reject the null hypothesis. Conclude that both $\beta_1$ and $\beta_2$ are greater than 0 and that the model is
useful for predicting y.
e) Do not reject the null hypothesis. Conclude that since $\beta_1 = 0$ and $\beta_2 = 0$, then $\beta_0$ must be
significantly different from 0.
33b. (t-test for individual coefficient) What null and alternative hypotheses would you test to determine whether
$\beta_2$ is positive?
a) $H_0: \beta_2 = 0$, $H_a: \beta_2 \neq 0$   b) $H_0: \beta_2 = 0$, $H_a: \beta_2 \;\; 0$   c) $H_0: \beta_2 = 0$, $H_a: \beta_2 \;\; 0$
d) $H_0: \beta_2 \;\; 0$, $H_a: \beta_2 \neq 0$   e) $H_0: \beta_2 \;\; 0$, $H_a: \beta_2 \;\; 0$
33c. (t-test for individual coefficient) What null and alternative hypotheses would you test to determine whether
$\beta_2$ is negative?
a) $H_0: \beta_2 = 0$, $H_a: \beta_2 \neq 0$   b) $H_0: \beta_2 = 0$, $H_a: \beta_2 \;\; 0$   c) $H_0: \beta_2 = 0$, $H_a: \beta_2 \;\; 0$
d) $H_0: \beta_2 \;\; 0$, $H_a: \beta_2 \neq 0$   e) $H_0: \beta_2 \;\; 0$, $H_a: \beta_2 \;\; 0$
34. (multiple regression with interaction) Consider the following prediction equation with an interaction term:
$\hat{y} = 3.6 + 1.2x_1 + 2.4x_2 + 0.2x_1 x_2$. When $x_2$ is held fixed at 3, how much does the predicted value of y
change when $x_1$ increases by 1 unit?
a) 1.8 b) 10.8 c) 11.4 d) 4.2 e) 7.2
35. (multiple regression with interaction) Consider the printout for an interaction regression between a response
variable y and two explanatory variables $x_1$ and $x_2$.
36. (multiple regression, indicator variables) An elections officer wants to model voter turnout y in a precinct as
a function of type of election: national or state.
37. (multiple regression, indicator variables) An elections officer wants to model voter turnout y in a precinct as
a function of the type of precinct: urban, suburban, or rural. The elections officer uses the following
regression model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$ where
$x_1 = 1$ if urban, 0 if not,
$x_2 = 1$ if suburban, 0 if not.
***************************************************
SOLUTIONS
1. b. since $b = r\,\frac{s_y}{s_x} = .86 \times \frac{3.8}{2.1} = 1.56$.
2. c. since $r^2 = (.86)^2 = .74$.
3. $r = \frac{\sum_{i=1}^{9} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} = .990$. There is a strong positive linear relationship between heating degree days and gas
consumption.
[Figure: scatterplot of gas consumption (vertical axis, 0 to 9) versus heating degree days (horizontal axis, 0 to 40)]
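The value of r in solution 3 can be checked directly from the summary quantities quoted in problem 3. The following is a minimal sketch; the function name `correlation_from_summary` is only an illustrative choice and does not come from the course materials.

```python
# Summary values quoted in problem 3 (n = 9 months, October through June).
n = 9
sum_cross_products = 291.31   # sum of (x_i - x_bar) * (y_i - y_bar)
s_x = 13.42                   # sample standard deviation of x
s_y = 2.74                    # sample standard deviation of y


def correlation_from_summary(sum_cp, n, s_x, s_y):
    """Sample correlation: r = sum_cp / ((n - 1) * s_x * s_y)."""
    return sum_cp / ((n - 1) * s_x * s_y)


r = correlation_from_summary(sum_cross_products, n, s_x, s_y)
print(f"r = {r:.3f}")
```

Dividing by (n - 1) rather than n matches the sample standard deviations used in the problem.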
4. a. The correlation we are studying measures the linear relationship between 2 quantitative
variables; sex is a categorical variable.
b. $-1 \le r \le 1$ is violated.
c. r has no units.
5. c. The husband is 4 inches, or 4/2.7 = 1.5 standard deviations above the mean husband height.
The wife's height is predicted to be above average by .25(1.5) = .4 standard deviations, or .4 × 2.5
inches = 1 inch. (Recall $b = r(s_y/s_x)$.)
6. $b_1 = r\,\frac{s_y}{s_x}$; $b_0 = \bar{y} - b_1 \bar{x}$; $b_1 = .202$; $b_0 = \bar{y} - b_1 \bar{x} = 1.23$
[Figure: scatterplot of gas consumption per day versus heating degree days per day (0 to 40), with the predicted gas consumption per day (least squares line) overlaid]
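The slope and intercept in solution 6 follow from the summary statistics alone. A short sketch of that arithmetic is given here; the helper name `least_squares_from_summary` is hypothetical, and because the inputs are rounded summary values, the last digit of the intercept may differ slightly from the printed answer.

```python
# Summary statistics quoted in problem 6.
x_bar, y_bar = 21.54, 5.59   # means of x and y
s_x, s_y = 13.42, 2.74       # sample standard deviations
r = 0.990                    # correlation coefficient


def least_squares_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_from_summary(x_bar, y_bar, s_x, s_y, r)
print(f"y_hat = {b0:.2f} + {b1:.3f} x")
```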
7. b
8. $21.6 \pm 2.648 \cdot \frac{2.1}{\sqrt{71}}$
9. a and c
10. a. 66  b. (8.726)/7 = 1.2465  c. $t^*_{48} = 2.4066$; confidence coefficient .98
11. b 12. d 13. e 14. b 15. b 16. c 17. a 18. b 19. c 20. e 21. a
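Answers 8, 10, and 13 all rest on the same margin-of-error formula. The sketch below works through them under stated assumptions: the critical value 2.648 is taken from the answer key rather than computed, the function names are illustrative only, and the sample-size rule rounds up to the next whole observation.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(x_bar, s, n, t_star):
    """Interval x_bar +/- t_star * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Problem 8: t* = 2.648 (70 df, 99% confidence) is the table value used in the key.
low_8, high_8 = mean_confidence_interval(21.6, 2.1, 71, 2.648)

# Problem 10: work backwards from the interval (63, 69) with n = 49, s = 8.726.
x_bar_10 = (63 + 69) / 2               # midpoint of the interval
se_10 = 8.726 / math.sqrt(49)          # estimated standard deviation of x_bar
t_star_10 = (69 - x_bar_10) / se_10    # critical value implied by the margin


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)


# Problem 13: margin 5, standard deviation 40, 95% confidence.
n_13 = sample_size_for_mean(40, 5, 0.95)

print(f"8:  ({low_8:.2f}, {high_8:.2f})")
print(f"10: mean {x_bar_10}, SE {se_10:.4f}, t* {t_star_10:.4f}")
print(f"13: n = {n_13}")
```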
22. a. i) $\hat{y} = 0.20 + 0.50 \times 4.0 + 0.002 \times 800 = 3.8$; ii) $\hat{y} = 0.2 + 0.50 \times 2.0 + 0.002 \times 200 = 1.6$.
b. predicted college GPA will increase by $0.002 \times 100 = 0.2$.
c. $\hat{y} = 0.20 + 0.50x_1 + 0.002 \times 500 = 0.20 + 0.50x_1 + 1 = 1.2 + 0.50x_1$.
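The three parts of solution 22 are evaluations of one prediction equation, which can be sketched as a small function. The name `predicted_gpa` is an illustrative choice; the coefficients are those stated in problem 22.

```python
def predicted_gpa(hs_gpa, board_score):
    """Prediction equation from problem 22: y_hat = 0.20 + 0.50*x1 + 0.002*x2."""
    return 0.20 + 0.50 * hs_gpa + 0.002 * board_score


# Part a: the two requested predictions.
gpa_i = predicted_gpa(4.0, 800)
gpa_ii = predicted_gpa(2.0, 200)

# Part b: raising the board score by 100 points with high school GPA held fixed.
# The high school GPA value (3.0) is arbitrary; the change does not depend on it.
change_b = predicted_gpa(3.0, 600) - predicted_gpa(3.0, 500)

# Part c: with x2 fixed at 500 the equation reduces to 1.2 + 0.50*x1.
intercept_c = predicted_gpa(0.0, 500)

print(round(gpa_i, 2), round(gpa_ii, 2), round(change_b, 2), round(intercept_c, 2))
```

Because the model has no interaction term, the change in part b is the same for every high school GPA.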
23. a. predicted violent crime rate = -786.7533445 + 13.40434162*(poverty rate) + 33.02182927*(single parent)
+ 4.401587623*(urbanization)
b. SSE = 831,122.7794  c. $R^2 = 0.714749363$  d. the violent crime rate will increase by approx. 13.4 per
100,000 citizens.  e. analyze the model by examining the F test statistic for the F test; the hypotheses are
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$,
$H_A$: at least one of the $\beta_i$'s is not zero.
From the regression output: $F = \frac{MSR}{MSE} = \frac{694178.4}{17683.46} = 39.25579$; the "Significance F" (that is, the P-value) is
less than 0.05. Therefore, reject $H_0$ and conclude that at least one of the $\beta_i$'s is not zero.
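The quantities in solution 23 parts b, c, and e can all be recovered from the sums of squares and degrees of freedom in the ANOVA table of problem 23. This is a minimal sketch; the function name `anova_summary` and the dictionary keys are illustrative only.

```python
# Sums of squares and degrees of freedom from the ANOVA table in problem 23.
ss_regression = 2082535.142
ss_residual = 831122.7794    # this is SSE
df_regression = 3
df_residual = 47


def anova_summary(ssr, sse, df_r, df_e):
    """Mean squares, F statistic, and R^2 from an ANOVA decomposition."""
    msr = ssr / df_r
    mse = sse / df_e
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "R2": ssr / (ssr + sse),   # SST = SSR + SSE
    }


summary = anova_summary(ss_regression, ss_residual, df_regression, df_residual)
for name, value in summary.items():
    print(f"{name} = {value:.6f}")
```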
24. a. $b_1 = -0.024$ (approximately); this means that each year since 1975 the average difference (husband
age - wife age) has decreased by 0.024.
b. $t = \frac{b_1}{SE(b_1)} = \frac{-0.023956\ldots}{0.005503\ldots} = -4.35311$
c. iii. The P-value given in the Excel output is always for a 2-tail test; since we are conducting a 1-tail test
$H_a: \beta_1 < 0$, the P-value is $\frac{0.000255}{2} = 0.0001275$.
d. $(-0.0353697, -0.01254335)$ from the output; notice that the interval is entirely negative.
e. use $\hat{y}_{1998} \pm t^*_{n-2}\, SE(\hat{\mu}_{1998})$;
for the calculations below, note that from the output we have $s_e = 0.18662649$;
$\hat{y}_{1998} = 49.90213043 - 0.023956522(1998) = 2.037$;
$SE(\hat{\mu}_{1998}) = \sqrt{SE^2(b_1) \times (x^* - \bar{x})^2 + \frac{s_e^2}{n}} = \sqrt{(.005503315)^2 \times (1998 - 1986.5)^2 + \frac{0.18662649^2}{24}} = .07386889$;
$t^*_{n-2}\, SE(\hat{\mu}_{1998}) = 2.074(.07386889) = .153204$;
so $\hat{y}_{1998} \pm t^*_{n-2}\, SE(\hat{\mu}_{1998}) = 2.037 \pm .153204 \Rightarrow (1.883796, 2.190204)$
f. use $\hat{y}_{1998} \pm t^*_{n-2}\, SE(\hat{y}_{1998})$;
for the calculations below, note that from the output we have $s_e = 0.18662649$;
$\hat{y}_{1998} = 49.90213043 - 0.023956522(1998) = 2.037$;
$SE(\hat{y}_{1998}) = \sqrt{SE^2(b_1) \times (x^* - \bar{x})^2 + \frac{s_e^2}{n} + s_e^2} = \sqrt{(.005503315)^2 \times (1998 - 1986.5)^2 + \frac{0.18662649^2}{24} + 0.18662649^2} = .200714$;
$t^*_{n-2}\, SE(\hat{y}_{1998}) = 2.074(.200714) = .41628$;
so $\hat{y}_{1998} \pm t^*_{n-2}\, SE(\hat{y}_{1998}) = 2.037 \pm .41628 \Rightarrow (1.62072, 2.45328)$
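Parts e and f of solution 24 differ only in whether the extra $s_e^2$ term is added under the square root. The sketch below reproduces both intervals from the numbers quoted in the solution; the function name `interval_at` is hypothetical, and the critical value 2.074 (22 df, 95%) is taken from the solution rather than computed.

```python
import math

# Values quoted in solution 24 (read from the regression output).
b0, b1 = 49.90213043, -0.023956522   # intercept and slope
se_b1 = 0.005503315                  # standard error of the slope
s_e = 0.18662649                     # residual standard deviation
n = 24                               # number of years
x_bar = 1986.5                       # mean of the year variable
t_star = 2.074                       # t critical value for n - 2 = 22 df, 95%


def interval_at(x_new, for_individual=False):
    """Interval for the mean response, or for a single new response, at x_new."""
    y_hat = b0 + b1 * x_new
    variance = se_b1 ** 2 * (x_new - x_bar) ** 2 + s_e ** 2 / n
    if for_individual:
        variance += s_e ** 2   # extra term for predicting one new observation
    margin = t_star * math.sqrt(variance)
    return y_hat - margin, y_hat + margin


mean_interval = interval_at(1998)                              # part e
prediction_interval = interval_at(1998, for_individual=True)   # part f
print(mean_interval, prediction_interval)
```

The prediction interval is always the wider of the two, since it also accounts for the scatter of an individual observation about the line.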
25. a. $b_1 = r\,\frac{s_y}{s_x} = .971 \times \frac{17.19}{16.7} = .99949$; $b_0 = \bar{y} - b_1 \bar{x} = 61.81 - .99949(67.44) = -5.59358$;
b. $\hat{y}_{97} = -5.59358 + .99949(97) = 91.36$
c. $s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{255.748}{14}} = 4.274$;
d. $SE(b_1) = \frac{s_e}{\sqrt{n-1}\, s_x} = \frac{4.274}{\sqrt{15} \times 16.7} = .0661$;
confidence interval is $b_1 \pm t^*_{n-2}\, SE(b_1) = .99949 \pm 2.145(.0661) = .99949 \pm .14178 \Rightarrow (.85771, 1.14127)$
e. $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$;
test statistic $t = \frac{b_1}{SE(b_1)} = \frac{.99949}{.0661} = 15.12$;
for $\alpha = .05$, the rejection region is $t < -2.145$ and $t > 2.145$ ($n - 2 = 16 - 2 = 14$ df);
the P-value is 0 to nine decimal places.
Conclusion: since the test statistic is in the rejection region, reject $H_0: \beta_1 = 0$ and conclude that year is
useful for predicting the percentage of respondents that will answer “yes” to the question.
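The slope calculations in solution 25 (parts a, c, d, and e) chain together, so a short sketch of the sequence may help. It uses only the summary statistics and SSE given in problem 25; the critical value 2.145 (14 df) is taken from the solution, and the variable names are illustrative.

```python
import math

# Summary statistics and SSE quoted in problem 25.
n = 16
s_x, s_y, r = 16.7, 17.19, 0.971
sse = 255.748
t_star = 2.145   # t critical value for n - 2 = 14 df, 95% (from the solution)

b1 = r * s_y / s_x                          # slope estimate (part a)
s_e = math.sqrt(sse / (n - 2))              # residual standard deviation (part c)
se_b1 = s_e / (math.sqrt(n - 1) * s_x)      # standard error of the slope (part d)
slope_ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
t_stat = b1 / se_b1                         # test statistic for H0: beta1 = 0 (part e)
reject_h0 = abs(t_stat) > t_star            # two-sided test at alpha = .05

print(f"b1 = {b1:.5f}, s_e = {s_e:.3f}, SE(b1) = {se_b1:.4f}")
print(f"95% CI for slope: ({slope_ci[0]:.4f}, {slope_ci[1]:.4f})")
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Carrying full precision through each step gives values that agree with the printed solution to about three decimal places.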
26. d  27. d
28. $\hat{p} = \frac{40}{1000} = .04$; test statistic $z = \frac{.04 - .06}{\sqrt{\frac{.06 \times .94}{1000}}} = \frac{-.02}{.0075} = -2.67$;
P-value $= P(z < -2.67) = .0038$. Since the P-value is less than .05, reject $H_0$ and conclude that
$p < .06$.