Workshop 10 S1 2020 Solutions


ETF/ETC5900 Business Statistics Week 10

Instructions for Tutorial 10


Download files from the Week 10 section on Moodle
If you have not already done so, download the following files for use in this Week 10 tutorial. These
Excel files are found in the Tutorial Material folder.
• Car Dealership.xlsx
• Manatee.xlsx

Q10.1: Car Dealership.xlsx

This question uses data in the file Car Dealership.xlsx. A used car dealership is considering the factors
that determine the sale price of used Toyota Camry passenger vehicles. As a first attempt at
predicting the price ($), it is assumed that the main factor affecting the resale value is the distance
the car has travelled (i.e. the odometer reading (km)).

a) What do you expect the relationship to be between Price and Odometer Reading?

I would expect Price and Odometer reading to be negatively correlated, since a higher odometer
reading would tend to lower the resale price of used cars.

b) Use Excel to plot a scatter diagram of Price (Y) against Odometer Reading (X), with Price on the
vertical axis [Insert > Scatter and select unjoined dots].
Comment on how this visual relationship compares with your expectations.

[Scatter plot: Price of Used Camry. Vertical axis: Price ($), 0 to 40,000. Horizontal axis: Odometer Reading (km), 0 to 250,000.]

The scatter graph of PRICE vs ODOMETER READING has the data points clustered around a straight
line indicating a linear relationship. Closer observation shows that although they are clustered
around a straight line, they are not clustered closely together. This is especially true at higher
odometer readings, where the linear relationship is visibly weaker.
The data points spread from the top left corner to the bottom right corner, which indicates a
negative slope. This indicates that as odometer reading (km) increases, price of used car ($)
decreases.
However, the slope of the line is not very steep, and the points are not closely clustered around it,
which suggests a weak relationship.
The weakness of the relationship is surprising and not as expected. This could be partially due to the following:
· The sample is small, so we do not have enough information.
· The linear relationship is weak, as can be seen at higher odometer readings.

c) Based on the scatter plot, comment on whether it is appropriate to fit a regression line to the data.
Estimate a model for the relationship between Price and Odometer Reading, by using Excel to produce
the simple linear regression output.
The scatter plot shows a lot of randomness, but prices tend to be higher for lower odometer
readings. It is not clear whether the underlying relationship is linear, but given the amount of
randomness and the small sample size, a simple linear model is a reasonable one to try.
A regression analysis was performed using Excel, with the following result:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.441278
R Square             0.194727
Adjusted R Square    0.12152
Standard Error       7249.394
Observations         13

ANOVA
             df   SS         MS         F          Significance F
Regression    1   1.4E+08    1.4E+08    2.659956   0.131176221
Residual     11   5.78E+08   52553716
Total        12   7.18E+08

                   Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%   Lower 95.0%   Upper 95.0%
Intercept          28876.87       4516.154         6.394129   5.12E-05   18936.88211   38816.86    18936.88      38816.86
Odometer Reading   -0.05106       0.03131          -1.63094   0.131176   -0.11997598   0.017848    -0.11998      0.017848
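As a rough cross-check of the output above, the sketch below recomputes three of the reported figures from other entries in the same output. It assumes the standard simple linear regression identities (Standard Error is the square root of the residual MS, Adjusted R Square is corrected for one explanatory variable, and F equals the squared t Stat of the slope). The variable names are illustrative only.

```python
import math

# Values copied from the Excel SUMMARY OUTPUT for the Car Dealership regression.
n = 13                   # Observations
r_square = 0.194727      # R Square
ms_residual = 52553716   # MS in the Residual row of the ANOVA table
t_slope = -1.63094       # t Stat for Odometer Reading

# Standard Error of the regression = square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

# Adjusted R Square with k = 1 explanatory variable.
k = 1
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# In simple linear regression, F equals the square of the slope's t Stat.
f_from_t = t_slope ** 2

print(round(standard_error, 3))
print(round(adjusted_r_square, 5))
print(round(f_from_t, 4))
```

The printed values should agree with the Standard Error, Adjusted R Square and F entries in the output, up to rounding of the inputs.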

d) Using the Excel output provided, state the estimated linear regression equation for this data, being
sure to define the variables.

ŷ = 28876.87 − 0.05106x, where x is the odometer reading (km) and ŷ is
the estimated sale price ($).

e) Using a 5% level of significance, conduct a hypothesis test to determine if a linear relationship exists
between ODOMETER READING and SALE PRICE. Use the p-value approach. Remember also to
show ALL working, ALL steps AND interpret the conclusion in context of the question.

Step 1:
H0: β1 = 0
H1: β1 ≠ 0

Step 2:
α = 0.05

Step 3:
p-value = 0.131176

Step 4:
Reject H0 if p-value < α
Since 0.131176 > 0.05, we CANNOT reject the null hypothesis, H0

Step 5:
We CANNOT reject H0 at the 5% level of significance.
The sample DOES NOT provide sufficient evidence against H0.
There is insufficient evidence to conclude that a linear relationship exists between Odometer Reading and Price.
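The arithmetic behind Steps 3 and 4 can be sketched in a few lines. The p-value itself is read from the Excel output rather than computed here. The t statistic is rebuilt from the rounded coefficient and standard error, so it matches the reported t Stat only approximately. The function names are illustrative assumptions, not part of the tutorial.

```python
def t_statistic(coefficient, standard_error, hypothesised=0.0):
    """t Stat for testing H0: beta1 = hypothesised."""
    return (coefficient - hypothesised) / standard_error


def decide(p_value, alpha):
    """Decision rule from Step 4: reject H0 if p-value < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Slope estimate and its standard error, as reported in the output (rounded).
t = t_statistic(-0.05106, 0.03131)

# Two-tailed p-value for the slope, read directly from the output.
decision = decide(0.131176, 0.05)

print(round(t, 3), decision)
```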

f) What is the slope of the estimated regression line? Provide an interpretation of this value.

Slope = ‒ 0.05106. For each additional kilometre on the odometer reading, the estimated sale price
decreases on average by about 5.1 cents.

In order to express this in more realistic terms, we could say that if a car has done an extra
10,000km, all other things being equal, we would expect the estimated sale price to decrease on
average by $510.60 (or $511)
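The slope interpretation can be checked with a small sketch of the fitted line. The coefficients are taken from the Excel output. The 100,000 km reading in the usage line is a hypothetical value chosen for illustration, not one taken from the data file.

```python
INTERCEPT = 28876.87   # estimated intercept ($)
SLOPE = -0.05106       # estimated change in price ($) per additional km


def predicted_price(odometer_km):
    """Estimated sale price ($) from the fitted regression line."""
    return INTERCEPT + SLOPE * odometer_km


def price_change(extra_km):
    """Estimated change in sale price ($) for extra_km additional kilometres."""
    return SLOPE * extra_km


print(round(price_change(10_000), 2))      # change for an extra 10,000 km
print(round(predicted_price(100_000), 2))  # hypothetical 100,000 km reading
```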

g) State the value of the intercept of the regression line. Give an interpretation of this value and
discuss whether it is meaningful in this case.

Intercept = 28876.87. If the odometer reading is 0km, this model would predict that the Sales Price
would be $28876.87, on average.

So, the intercept can be interpreted as a prediction of the value of a new Camry.

However, 0 km is well outside the range of the data (the lowest odometer reading being
53,365 km), so this is not a valid way of predicting the price of a car that has not been
driven at all.

h) Look again at the output. The first number at the top of the output is labelled “Multiple R”. This
number is the absolute value of the sample correlation coefficient r between the two variables. In
order to find the sign of the correlation coefficient, you must look at the sign of the slope.
So, state AND interpret the value of the correlation coefficient between odometer reading and
price?

The coefficient of correlation is – 0.44


This indicates a weak, negative linear relationship between odometer readings and price.
[Note that the value of Multiple R is 0.441278, and Multiple R is always positive. The correlation
coefficient is obtained by applying the correct sign to Multiple R.]
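The note above amounts to copying the sign of the slope onto Multiple R. A minimal sketch, using the values from the two Excel outputs in this tutorial (the function name is an illustrative assumption):

```python
import math


def correlation_from_multiple_r(multiple_r, slope):
    """Sample correlation r: the magnitude of Multiple R with the sign of the slope."""
    return math.copysign(multiple_r, slope)


r_cars = correlation_from_multiple_r(0.441278, -0.05106)
r_manatee = correlation_from_multiple_r(0.941477289, 0.124861692)

print(round(r_cars, 2), round(r_manatee, 2))
```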

i) State the coefficient of determination and interpret this value.

Coefficient of determination = R² = 19.47%.

Approximately 19.47% of the variation in Price of Cars is explained by the variation in Odometer
Reading. The remaining 80.53% of variation in Price of Cars is left unexplained by the model.


Q10.2: Manatee.xlsx
This question uses data in the file Manatee.xlsx

It gives real data on Manatee deaths due to powerboat accidents.

a) What do you expect the relationship to be between Manatee deaths and the number of registered
powerboats?

I would expect Manatee deaths and the number of registered powerboats to be positively correlated,
since more registered powerboats would tend to increase the number of manatee deaths.

b) Use Excel to plot a scatter diagram.


Comment on how this visual relationship compares with your expectations.

[Scatter plot: Powerboat related Manatee Deaths. Vertical axis: Number of Manatee Deaths, 0 to 60. Horizontal axis: Number of Powerboats Registered ('000), 400 to 750.]

The scatter graph of Manatee deaths vs Number of registered powerboats has the data points
clustered around a line, indicating a linear relationship. The data points spread from the bottom
left corner up to the top right corner, which indicates a positive slope. This indicates that the
number of manatee deaths increases as the number of registered powerboats increases. The slope
of the line is quite steep. Since the data points are closely clustered around a line, this suggests
a strong relationship, which is as expected.

c) Estimate a model for the relationship between NUMBER OF DEATHS and NUMBER OF
POWERBOATS, by using Excel to produce the simple linear regression output.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.941477289
R Square             0.886379485
Adjusted R Square    0.876911109
Standard Error       4.276387771
Observations         14

ANOVA
             df   SS         MS         F          Significance F
Regression    1   1711.979   1711.979   93.61473   5.11E-07
Residual     12   219.4499   18.28749
Total        13   1931.429

                         Coefficients    Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept                -41.43043895    7.412217         -5.58948   0.000118   -57.5803    -25.2806
Powerboats (thousands)   0.124861692     0.012905         9.675471   5.11E-07   0.096744    0.152979
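The Lower 95% and Upper 95% columns can be reproduced as coefficient ± critical value × standard error. The sketch below does this for the slope, assuming a two-tailed 5% critical value of 2.1788 for 12 degrees of freedom, which is taken from standard t tables rather than from the Excel output. The names are illustrative.

```python
# Assumption: t critical value for a 95% interval with 12 residual df,
# looked up in a standard t table (not part of the Excel output).
T_CRIT_DF12 = 2.1788


def confidence_interval(coefficient, standard_error, t_critical):
    """Return (lower, upper) limits: coefficient -/+ t_critical * standard_error."""
    margin = t_critical * standard_error
    return coefficient - margin, coefficient + margin


# Slope for Powerboats (thousands) and its standard error, from the output.
lower, upper = confidence_interval(0.124861692, 0.012905, T_CRIT_DF12)

print(round(lower, 6), round(upper, 6))
```

Because the whole interval lies above zero, it is consistent with the very small p-value reported for the slope.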

d) Using a 5% level of significance, conduct a hypothesis test to determine if a positive linear
relationship exists between Manatee deaths and the number of registered powerboats. Use
the p-value approach. Remember also to show ALL working, ALL steps AND interpret the
conclusion in context of the question.
Step 1:
H0: β1 ≤ 0
H1: β1 > 0

Step 2:
α = 0.05

Step 3:
p-value = 5.11E-07/2 = 2.56E-07 ≈ 0.0000

Step 4:
Reject H0 if p-value < α
Since 0.0000 < 0.05, we CAN reject the null hypothesis, H0

Step 5:
We CAN reject H0 at the 5% level of significance.
The sample DOES provide evidence against H0.
A positive linear relationship exists between Powerboat registrations and Manatee Deaths.
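Halving the Excel p-value in Step 3 works because the estimated slope has the same sign as the alternative hypothesis. If the signs disagreed, the one-tailed p-value would be 1 minus half the two-tailed value. A minimal sketch of that rule follows. The function name and the "greater"/"less" labels are illustrative assumptions.

```python
def one_sided_p_value(two_sided_p, estimate, alternative):
    """Convert Excel's two-tailed p-value for a slope into a one-tailed p-value.

    alternative: "greater" for H1: beta1 > 0, "less" for H1: beta1 < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Does the sign of the estimate point in the direction claimed by H1?
    agrees = estimate > 0 if alternative == "greater" else estimate < 0
    return two_sided_p / 2 if agrees else 1 - two_sided_p / 2


# Manatee data: positive slope, H1: beta1 > 0.
p_manatee = one_sided_p_value(5.11e-07, 0.124861692, "greater")

# Car data: negative slope, H1: beta1 < 0.
p_cars = one_sided_p_value(0.131176, -0.05106, "less")

print(p_manatee, round(p_cars, 4))
```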

e) Look again at the output. The first number at the top of the output is labelled “Multiple R”. This
number is the absolute value of the sample correlation coefficient r between the two variables.
In order to find the sign of the correlation coefficient, you must look at the sign of the slope.
So, state AND interpret the value of the correlation coefficient between Powerboat registrations
and Manatee Deaths?

The coefficient of correlation is 0.94


This indicates a strong, positive linear relationship between Powerboat registrations and
Manatee Deaths.
[Note that the value of Multiple R is 0.94, and Multiple R is always positive. The correlation
coefficient is obtained by applying the correct sign to Multiple R.]

f) State the coefficient of determination and interpret this value.

Coefficient of determination = R² = 88.64%.


Approximately 88.64% of the variation in Manatee Deaths is explained by the regression of
Manatee Deaths on Powerboat registrations. The remaining 11.36% of variation in Manatee
Deaths is left unexplained by the model.
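The explained and unexplained shares can also be read off the ANOVA table, since R² is the regression sum of squares divided by the total sum of squares. A short sketch using the SS column from the Manatee output (variable names are illustrative):

```python
# Sums of squares from the ANOVA table of the Manatee regression.
ss_regression = 1711.979
ss_residual = 219.4499
ss_total = 1931.429

# Share of the variation in deaths explained by the model, and the remainder.
r_square = ss_regression / ss_total
unexplained = ss_residual / ss_total

print(f"{r_square:.2%} explained, {unexplained:.2%} unexplained")
```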

g) In 1998 there were 914,535 powerboats registered in Florida. According to the model, how many
deaths attributable to powerboats would you expect in 1998?

Predicted number of deaths = −41.43 + 0.1249 × 914.535 = 72.80 ≈ 73

[Remember the number of powerboat registrations is in thousands.]
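The unit conversion is the easiest step to get wrong, so the sketch below makes it explicit. It uses the rounded coefficients from the worked answer. The function name is an illustrative assumption.

```python
INTERCEPT = -41.43                  # rounded intercept from the output
SLOPE_PER_THOUSAND_BOATS = 0.1249   # rounded slope: deaths per 1,000 registered powerboats


def predicted_deaths(registered_boats):
    """Predicted manatee deaths for a raw count of registered powerboats."""
    boats_in_thousands = registered_boats / 1000  # the model's x is in thousands
    return INTERCEPT + SLOPE_PER_THOUSAND_BOATS * boats_in_thousands


prediction_1998 = predicted_deaths(914_535)
print(round(prediction_1998, 2), round(prediction_1998))
```

Using the unrounded coefficients from the output instead gives a slightly smaller figure, which still rounds to the same whole number of deaths.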

h) In fact, in 1998 in Florida there were 66 manatee deaths attributable to watercraft, while in 1999
there were 82. (See http://www.savethemanatee.org/newsprrecorddeaths.htm.)
Comment on these numbers in relation to your calculation.

The calculation for 1998 (in part g) used the model outside the range of values of the explanatory
variable used in constructing the model. It is therefore not reliable.
Also, it is clear from the graph that the number of deaths each year can differ quite a lot from
the value estimated by the model. And this is borne out by the fact that the number of deaths
in 1998 (66) and 1999 (82) differed quite markedly although there was unlikely to be a big
change in powerboat numbers in one year.
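Both points can be illustrated with a rough sketch. The exact minimum and maximum of the powerboat data are not quoted in this document, so the bounds below are an assumption: they are the horizontal axis limits of the scatter plot (400 to 750 thousand), within which the sample points lie. The comparison of the 1998 residual with the Standard Error from the output is only a rough gauge, since prediction uncertainty is larger still when extrapolating.

```python
# Assumption: axis limits of the scatter plot, in thousands of boats;
# the actual sample values lie somewhere inside this range.
PLOT_X_MIN, PLOT_X_MAX = 400, 750


def is_extrapolation(x, x_min, x_max):
    """True if x falls outside the range used to fit the model."""
    return not (x_min <= x <= x_max)


outside_range = is_extrapolation(914.535, PLOT_X_MIN, PLOT_X_MAX)

predicted_1998 = 72.80        # from part g)
observed_1998 = 66            # reported deaths in 1998
standard_error = 4.276387771  # Standard Error from the regression output

residual_1998 = observed_1998 - predicted_1998
residual_in_standard_errors = residual_1998 / standard_error

print(outside_range, round(residual_1998, 2), round(residual_in_standard_errors, 2))
```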


Further practice questions

Q10.3: Car Dealership.xlsx (continued)

Using a 10% level of significance, conduct a hypothesis test to determine if a negative linear
relationship exists between ODOMETER READING and SALE PRICE. Use the p-value approach.
Remember also to show ALL working, ALL steps AND interpret the conclusion in context of the
question.

Step 1:
H0: β1 ≥ 0
H1: β1 < 0

Step 2:
α = 0.10

Step 3:
p-value = 0.131176/2=0.0656

Step 4:
Reject H0 if p-value < α
Since 0.0656 < 0.10, we CAN reject the null hypothesis, H0

Step 5:
We CAN reject H0 at the 10% level of significance.
The sample DOES provide evidence against H0.
A negative linear relationship exists between Odometer Reading and Price.
