Workshop 10 S1 2020 Solutions
This question uses data in the file Car Dealership.xlsx. A used car dealership is considering the factors
that determine the sale price of used Toyota Camry passenger vehicles. As a first attempt at
predicting the price ($), it is assumed that the main factor affecting the resale value is the distance
the car has travelled (i.e. the odometer reading (km)).
a) What do you expect the relationship to be between Price and Odometer Reading?
I would expect Price and Odometer reading to be negatively correlated, since a higher odometer
reading would tend to lower the resale price of used cars.
b) Use Excel to plot a scatter diagram of Price (Y) against Odometer Reading (X), with Price on the
vertical axis [Insert > Scatter and select unjoined dots].
Comment on how this visual relationship compares with your expectations.
[Figure: scatter plot of Price ($) on the vertical axis (0 to 20,000) against Odometer Reading (km) on the horizontal axis (0 to 250,000)]
The scatter graph of PRICE vs ODOMETER READING has the data points clustered around a straight
line indicating a linear relationship. Closer observation shows that although they are clustered
ETF/ETC5900 Business Statistics Week 10
around a straight line, they are not clustered closely together. This is especially true at higher
odometer readings, where the linear relationship appears weaker.
The data points spread from the top left corner to the bottom right corner, which indicates a
negative slope: as the odometer reading (km) increases, the price of the used car ($) decreases.
However, the slope of the line is not very steep, and the scatter points are not closely clustered
together, which suggests a weak relationship.
This weak relationship is surprising and not as expected. It could be partially due to the following:
· The sample is small, so we do not have enough information.
· The linear relationship is weak, as can be seen at higher odometer readings.
c) Based on the scatter plot, comment on whether it is appropriate to fit a regression line to the data.
Estimate a model for the relationship between Price and Odometer Reading, by using Excel to produce
the simple linear regression output.
The scatter plot shows a lot of random variation, but prices do tend to be higher for lower odometer
readings. It is not clear whether the underlying relationship is linear, but given the amount of
random variation and the small sample size, a simple linear model is a reasonable one to try.
A regression analysis was performed using Excel, with the following result:
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.441278
R Square            0.194727
Adjusted R Square   0.12152
Standard Error      7249.394
Observations        13
ANOVA
             df   SS         MS         F          Significance F
Regression   1    1.4E+08    1.4E+08    2.659956   0.131176221
Residual     11   5.78E+08   52553716
Total        12   7.18E+08
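The figures in this output are linked to each other, and it can help to see how. The following is a minimal Python sketch (not part of the Excel workflow) that rebuilds the regression statistics from the ANOVA table, assuming only the printed values of F, the residual mean square and the degrees of freedom. The variable names are illustrative.

```python
import math

# Values copied from the Excel output above.
n = 13                    # Observations
df_regression = 1         # df for Regression
df_residual = 11          # df for Residual
ms_residual = 52553716    # MS for Residual
f_stat = 2.659956         # F

# F = MS(Regression) / MS(Residual), so MS(Regression) can be recovered.
ms_regression = f_stat * ms_residual
ss_regression = ms_regression * df_regression
ss_residual = ms_residual * df_residual
ss_total = ss_regression + ss_residual

# R Square is the share of total variation explained by the regression.
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)

# Adjusted R Square penalises for the number of explanatory variables.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

# Standard Error is the square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(round(r_square, 6), round(multiple_r, 6))
print(round(adjusted_r_square, 5), round(standard_error, 3))
```

The results should agree with the Regression Statistics block up to rounding in the printed Excel values.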
d) Using the Excel output provided, state the estimated linear regression equation for this data, being
sure to define the variables.
e) Using a 5% level of significance, conduct a hypothesis test to determine if a linear relationship exists
between ODOMETER READING and SALE PRICE. Use the p-value approach. Remember also to
show ALL working, ALL steps AND interpret the conclusion in context of the question.
Step 1:
H0: β1 = 0
H1: β1 ≠ 0
Step 2:
α = 0.05
Step 3:
p-value = 0.131176
Step 4:
Reject H0 if p-value < α
Since 0.131176 > 0.05, we CANNOT reject the null hypothesis, H0
Step 5:
We CANNOT reject H0 at the 5% level of significance.
The sample DOES NOT provide sufficient evidence against H0.
There is insufficient evidence to conclude that a linear relationship exists between Odometer Reading
and Price. (Failing to reject H0 does not prove that no relationship exists.)
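For readers who want to see where the p-value comes from, here is a rough Python sketch using only the standard library. It assumes the usual simple-regression result that the slope's t statistic squared equals the ANOVA F statistic, and it integrates the t density numerically with Simpson's rule instead of calling a statistics package. The function names are illustrative, not Excel's.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on [0, |t_stat|] (steps must be even)."""
    t = abs(t_stat)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total   # P(-t < T < t), by symmetry
    return 1 - central_area

f_stat = 2.659956       # F from the ANOVA table
df_residual = 11        # residual degrees of freedom
t_stat = -math.sqrt(f_stat)   # negative because the estimated slope is negative

alpha = 0.05
p_value = two_tailed_p_value(t_stat, df_residual)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```

The computed p-value should match Excel's Significance F up to rounding, and the decision rule in Step 4 is the single comparison `p_value < alpha`.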
f) What is the slope of the estimated regression line? Provide an interpretation of this value.
Slope = ‒ 0.05106. For each additional kilometre on the odometer reading, the estimated sale price
decreases on average by about 5.1 cents.
In order to express this in more realistic terms, we could say that if a car has done an extra
10,000km, all other things being equal, we would expect the estimated sale price to decrease on
average by $510.60 (or $511)
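The arithmetic behind this interpretation can be sketched in a few lines of Python. The slope is taken from the answer above; the helper name `price_change` is made up for illustration.

```python
SLOPE = -0.05106   # estimated change in price ($) per additional kilometre

def price_change(extra_km, slope=SLOPE):
    """Estimated average change in sale price ($) for extra_km more kilometres."""
    return slope * extra_km

change_1km = price_change(1)
change_10000km = price_change(10_000)

print(f"{change_1km * 100:.1f} cents per km")
print(f"${change_10000km:.2f} per 10,000 km")
```

Scaling the slope this way only changes the units of the statement, not the fitted model.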
g) State the value of the intercept of the regression line. Give an interpretation of this value and
discuss whether it is meaningful in this case.
Intercept = 28876.87. If the odometer reading is 0km, this model would predict that the Sale Price
would be $28,876.87, on average.
So, the intercept can be interpreted as a prediction of the value of a new Camry.
However, 0km is well outside the range of the data (the lowest odometer reading in the sample is
53,365km), so this is not a valid way of predicting the price of a car that has not been
driven at all.
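One way to make the extrapolation warning concrete is to have the prediction carry a flag. This is only a sketch: the coefficients and the minimum odometer reading are the ones quoted above, the maximum reading is not stated in this solution so only the lower bound is checked, and `predict_price` is a hypothetical helper.

```python
INTERCEPT = 28876.87       # estimated intercept ($)
SLOPE = -0.05106           # estimated slope ($ per km)
MIN_OBSERVED_KM = 53_365   # lowest odometer reading in the sample

def predict_price(odometer_km):
    """Return (predicted price, extrapolating).

    extrapolating is True when odometer_km is below the smallest reading
    used to fit the model, where the prediction should not be trusted.
    """
    predicted = INTERCEPT + SLOPE * odometer_km
    extrapolating = odometer_km < MIN_OBSERVED_KM
    return predicted, extrapolating

price_new, flagged = predict_price(0)
print(price_new, flagged)
```

For an odometer reading of 0km the function returns the intercept itself, with the flag set to show that the value lies outside the observed data.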
h) Look again at the output. The first number at the top of the output is labelled “Multiple R”. This
number is the absolute value of the sample correlation coefficient r between the two variables. In
order to find the sign of the correlation coefficient, you must look at the sign of the slope.
So, state AND interpret the value of the correlation coefficient between odometer reading and
price?
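The rule described in the question (take Multiple R and give it the sign of the slope) can be written as a short Python sketch. The inputs are the Multiple R from the output and the slope from part f); the function name is illustrative.

```python
import math

def signed_correlation(multiple_r, slope):
    """Sample correlation r: magnitude from Multiple R, sign from the slope."""
    return math.copysign(multiple_r, slope)

r = signed_correlation(0.441278, -0.05106)
r_squared = r ** 2   # should agree with R Square in the output, up to rounding

print(r, round(r_squared, 6))
```

A value of about -0.44 would usually be read as a fairly weak negative linear association, which is in line with the loose scatter described in part b).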
Q10.2: Manatee.xlsx
This question uses data in the file Manatee.xlsx
a) What do you expect the relationship to be between Manatee deaths and the number of registered
powerboats?
I would expect Manatee deaths and the number of registered powerboats to be positively correlated,
since more registered powerboats would tend to increase the number of manatee deaths.
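[Figure: scatter plot of manatee deaths on the vertical axis (0 to 50) against number of registered powerboats in thousands on the horizontal axis (400 to 750)]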
The scatter graph of Manatee deaths vs Number of registered powerboats has the data points
clustered around a line, indicating a linear relationship. The data points spread from the bottom
left corner up to the top right corner, which indicates a positive slope: manatee deaths increase
as the number of registered powerboats increases. The slope of the line is
quite steep. Since the data points are closely clustered around a line, this suggests a strong
relationship, which is as expected.
c) Estimate a model for the relationship between NUMBER OF DEATHS and NUMBER OF
POWERBOATS, by using Excel to produce the simple linear regression output.
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.941477289
R Square            0.886379485
Adjusted R Square   0.876911109
Standard Error      4.276387771
Observations        14
ANOVA
             df   SS         MS         F          Significance F
Regression   1    1711.979   1711.979   93.61473   5.11E-07
Residual     12   219.4499   18.28749
Total        13   1931.429

                         Coefficients    Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept                -41.43043895    7.412217         -5.58948   0.000118   -57.5803    -25.2806
Powerboats (thousands)   0.124861692     0.012905         9.675471   5.11E-07   0.096744    0.152979
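As a cross-check on the coefficient table, the following Python sketch recomputes the t statistic and the 95% confidence interval for the slope. The coefficient and standard error are copied from the output. The critical value 2.178813 is an assumption taken from a standard t-table (two-tailed 5%, 12 degrees of freedom) and does not appear in the Excel output.

```python
coef = 0.124861692     # slope estimate for Powerboats (thousands)
std_err = 0.012905     # standard error of the slope

# ASSUMPTION: t critical value for 95% confidence with 12 df, from a t-table.
t_crit = 2.178813

t_stat = coef / std_err                # test statistic for H0: beta1 = 0
lower_95 = coef - t_crit * std_err     # lower confidence limit
upper_95 = coef + t_crit * std_err     # upper confidence limit

print(round(t_stat, 2), round(lower_95, 4), round(upper_95, 4))
```

The results should agree with the t Stat, Lower 95% and Upper 95% columns up to rounding of the printed standard error.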
Step 1:
H0: β1 ≤ 0
H1: β1 > 0
Step 2:
α = 0.05
Step 3:
p-value = 5.11E-07/2 = 2.56E-07 ≈ 0.0000
Step 4:
Reject H0 if p-value < α
Since 0.0000 < 0.05, we CAN reject the null hypothesis, H0
Step 5:
We CAN reject H0 at the 5% level of significance.
The sample DOES provide evidence against H0.
A positive linear relationship exists between Powerboat registrations and Manatee Deaths.
e) Look again at the output. The first number at the top of the output is labelled “Multiple R”. This
number is the absolute value of the sample correlation coefficient r between the two variables.
In order to find the sign of the correlation coefficient, you must look at the sign of the slope.
So, state AND interpret the value of the correlation coefficient between Powerboat registrations
and Manatee Deaths?
g) In 1998 there were 914,535 powerboats registered in Florida. According to the model, how many
deaths attributable to powerboats would you expect in 1998?
Predicted number of deaths = -41.43 + 0.1249 × 914.535 = 72.80 ≈ 73
(the number of powerboats is entered in thousands, so 914,535 boats is 914.535)
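The main thing to watch in this calculation is the unit of the explanatory variable. A small Python sketch, using the rounded coefficients from the working above and a hypothetical helper `predict_deaths`, makes the conversion to thousands explicit.

```python
INTERCEPT = -41.43   # rounded intercept from the regression output
SLOPE = 0.1249       # rounded slope, deaths per thousand registered powerboats

def predict_deaths(powerboats_registered):
    """Predicted manatee deaths for a given number of registered powerboats."""
    thousands = powerboats_registered / 1000   # model uses thousands of boats
    return INTERCEPT + SLOPE * thousands

predicted_1998 = predict_deaths(914_535)
print(round(predicted_1998, 2), round(predicted_1998))
```

Passing the raw count 914,535 straight into the equation without dividing by 1000 would give a prediction that is far too large.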
h) In fact, in 1998 in Florida there were 66 manatee deaths attributable to watercraft, while in 1999
there were 82. (See http://www.savethemanatee.org/newsprrecorddeaths.htm.)
Comment on these numbers in relation to your calculation.
The calculation for 1998 (in part g) applied the model outside the range of values of the
explanatory variable used in constructing the model. It is therefore not reliable.
Also, it is clear from the graph that the number of deaths each year can differ quite a lot from
the value estimated by the model. And this is borne out by the fact that the number of deaths
in 1998 (66) and 1999 (82) differed quite markedly although there was unlikely to be a big
change in powerboat numbers in one year.
Using a 10% level of significance, conduct a hypothesis test to determine if a negative linear
relationship exists between ODOMETER READING and SALE PRICE. Use the p-value approach.
Remember also to show ALL working, ALL steps AND interpret the conclusion in context of the
question.
Step 1:
H0: β1 ≥ 0
H1: β1 < 0
Step 2:
α = 0.10
Step 3:
p-value = 0.131176/2 = 0.0656
Step 4:
Reject H0 if p-value < α
Since 0.0656 < 0.10, we CAN reject the null hypothesis, H0
Step 5:
We CAN reject H0 at the 10% level of significance.
The sample DOES provide evidence against H0.
There is sufficient evidence that a negative linear relationship exists between Odometer Reading and Price.
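Halving the two-tailed p-value is valid here because the estimated slope is negative, which is the direction stated in H1. If the estimate pointed the other way, the one-tailed p-value would instead be 1 minus half the two-tailed value. A minimal Python sketch of that rule follows. The function name and the "less"/"greater" labels are illustrative choices, not Excel terms.

```python
def one_tailed_p_value(two_tailed_p, estimate, alternative):
    """Convert a two-tailed p-value for a slope into a one-tailed p-value.

    alternative is "less" for H1: beta1 < 0 or "greater" for H1: beta1 > 0.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    if alternative == "less":
        agrees = estimate < 0
    else:
        agrees = estimate > 0
    # Halve only when the estimate lies in the direction claimed by H1.
    return two_tailed_p / 2 if agrees else 1 - two_tailed_p / 2

alpha = 0.10
p_value = one_tailed_p_value(0.131176, -0.05106, "less")
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```

With the car data the slope estimate is negative, so the one-tailed p-value is half of Excel's Significance F, as in Step 3 above.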