Core Data Analysis Worksheet 6
Questions
1. CORE, FUR1 2006 VCAA 7 MC
For a set of bivariate data, involving the variables and ,
Using this data, the equation of the least squares regression line that enables weight to be predicted from waist measurement is
When this equation is used to predict the weight of the man with a waist measurement of 80 cm, the residual value is closest to
A.
B.
C.
D.
E.
Part 1
Written as a percentage, the coefficient of determination is closest to
A.
B.
C.
D.
E.
Part 2
From the equation of the least squares regression line, it can be concluded that for these
jellyfish, on average
A. there is a 3.5 mm increase in diameter for each 1 mm increase in length.
B. there is a 3.5 mm increase in length for each 1 mm increase in diameter.
C. there is a 0.87 mm increase in diameter for each 1 mm increase in length.
D. there is a 0.87 mm increase in length for each 1 mm increase in diameter.
E. there is a 4.37 mm increase in diameter for each 1 mm increase in length.
Part 2
Using systolic blood pressure (systolic) as the dependent variable, and diastolic blood
pressure (diastolic) as the independent variable, a least squares regression line is fitted to
the data in Table 1.
The equation of the least squares regression line is closest to
A.
B.
C.
D.
E.
Part 3
From the fifteen blood pressure measurements for this person, it can be concluded that the percentage of the variation in systolic blood pressure that is explained by the variation in diastolic blood pressure is closest to
A.
B.
C.
D.
E.
6. CORE, FUR1 2012 VCAA 8 MC
The maximum wind speed and maximum temperature were recorded each day for a month.
The data is displayed in the scatterplot below and a least squares regression line has been fitted. The dependent variable is temperature. The independent variable is wind speed.
Given the information above, the least squares regression line predicting from is closest
to
A.
B.
C.
D.
E.
10. CORE, FUR1 2015 VCAA 9 MC
A least squares regression line has been fitted to the scatterplot above to enable distance, in kilometres, to be predicted from time, in minutes.
The equation of this line is closest to
A. distance time
B. time distance
C. distance time
D. time distance
E. distance time
11. CORE, FUR1 2016 VCAA 9-10 MC
The scatterplot below shows life expectancy in years (life expectancy) plotted against the Human Development Index (HDI) for a large number of countries in 2011.
A least squares line has been fitted to the data and the resulting residual plot is also shown.
Part 1
Given the information above, which one of the following statements is not true?
A. The value of the correlation coefficient is close to 0.94
B. 12.5% of the variation in life expectancy is not explained by the variation in the Human Development Index.
C. On average, life expectancy increases by 43.0 years for each 10-point increase in the Human Development Index.
D. Ignoring any outliers, the association between life expectancy and the Human Development Index can be described as strong, positive and linear.
E. Using the least squares line to predict the life expectancy in a country with a Human Development Index of 75 is an example of interpolation.
Part 2
In 2011, life expectancy in Australia was 81.8 years and the Human Development Index was 92.9
When the least squares line is used to predict life expectancy in Australia, the residual is closest to
A.
B.
C.
D.
E.
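As a quick cross-check for question 11 Part 1, statements A and B can be compared numerically. The sketch below assumes that the percentage of variation not explained equals 1 − r², expressed as a percentage, and that the association is positive, so the positive square root applies. The function name is illustrative only. The check shows whether the two statements agree with each other, not which statement is untrue.

```python
import math

def r_from_unexplained(percent_unexplained):
    """Correlation coefficient implied by the percentage of variation left unexplained.

    Assumes a positive association, so the positive square root is taken.
    """
    r_squared = 1 - percent_unexplained / 100
    return math.sqrt(r_squared)

# Statement B: 12.5% of the variation is not explained
r = r_from_unexplained(12.5)
print(round(r, 2))
```

The value rounds to 0.94, so statements A and B are consistent with each other.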
12. CORE, FUR1 2018 VCAA 10 MC
In a study of the association between a person’s height, in centimetres, and body surface area, in square metres, the following least squares line was obtained.
body surface area = –1.1 + 0.019 × height
Which one of the following is a conclusion that can be made from this least squares line?
A. An increase of 1 m² in body surface area is associated with an increase of 0.019 cm in height.
B. An increase of 1 cm in height is associated with an increase of 0.019 m² in body surface area.
C. The correlation coefficient is 0.019
D. A person’s body surface area, in square metres, can be determined by adding 1.1 cm to their height.
E. A person’s height, in centimetres, can be determined by subtracting 1.1 from their body surface area, in square metres.
13. CORE, FUR1 2010 VCAA 7-9 MC
The height (in cm) and foot length (in cm) for each of eight Year 12 students were recorded and displayed in the scatterplot below.
A least squares regression line has been fitted to the data as shown.
Part 1
By inspection, the value of the product-moment correlation coefficient for this data is closest to
A.
B.
C.
D.
E.
Part 2
The independent variable is foot length.
The equation of the least squares regression line is closest to
A. height = –110 + 0.78 × foot length.
B. height = 141 + 1.3 × foot length.
C. height = 167 + 1.3 × foot length.
D. height = 167 + 0.67 × foot length.
E. foot length = 167 + 1.3 × height.
Part 3
The plot of the residuals against foot length is closest to
14. CORE, FUR1 2017 VCAA 8-10 MC
The scatterplot below shows the wrist circumference and ankle circumference, both in centimetres, of 13 people. A least squares line has been fitted to the scatterplot with ankle circumference as the explanatory variable.
[Scatterplot: wrist circumference (cm) against ankle circumference (cm)]
Part 1
The equation of the least squares line is closest to
A. ankle = 10.2 + 0.342 × wrist
B. wrist = 10.2 + 0.342 × ankle
C. ankle = 17.4 + 0.342 × wrist
D. wrist = 17.4 + 0.342 × ankle
E. wrist = 17.4 + 0.731 × ankle
Part 2
When the least squares line on the scatterplot is used to predict the wrist circumference of the person with an ankle circumference of 24 cm, the residual will be closest to
A.
B.
C.
D.
E.
Part 3
The residuals for this least squares line have a mean of 0.02 cm and a standard deviation of 0.4 cm.
The value of the residual for one of the data points is found to be –0.3 cm.
The standardised value of this residual is
A.
B.
C.
D.
E.
15. CORE, FUR2 2018 VCAA 2
The congestion level in a city can be recorded as the percentage increase in travel time due to traffic congestion in peak periods (compared to non-peak periods).
This is called the percentage congestion level.
The percentage congestion levels for the morning and evening peak periods for 19 large cities are plotted on the scatterplot below.
a. Determine the median percentage congestion level for the morning peak period and the evening peak period.
Write your answers in the appropriate boxes provided below. (2 marks)
A least squares line is to be fitted to the data with the aim of predicting evening congestion level from morning congestion level.
The equation of this line is
evening congestion level = 8.48 + 0.922 × morning congestion level
b. Name the response variable in this equation. (1 mark)
c. Use the equation of the least squares line to predict the evening congestion level when the morning congestion level is 60%. (1 mark)
d. Determine the residual value when the equation of the least squares line is used to predict the evening congestion level when the morning congestion level is 47%.
Round your answer to one decimal place. (2 marks)
e. The value of the correlation coefficient is 0.92
What percentage of the variation in the evening congestion level can be explained by the variation in the morning congestion level?
Round your answer to the nearest whole number. (1 mark)
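The arithmetic behind question 14 Part 3 and question 15 parts c and e can be sketched in Python. This is a minimal illustration rather than an official solution, and the function names are made up for this example. It assumes the standardised residual is (residual − mean) ÷ standard deviation, and that the percentage of variation explained is r² × 100. Part d of question 15 also needs the actual evening congestion level read from the scatterplot, which is not reproduced here, so it is left out.

```python
def standardised_residual(residual, mean, sd):
    """Standardise a residual using the mean and standard deviation of all residuals."""
    return (residual - mean) / sd

def predict_evening(morning):
    """Evening congestion level (%) predicted from the morning level (%), using the given line."""
    return 8.48 + 0.922 * morning

def percent_explained(r):
    """Percentage of variation explained, assuming it equals r squared times 100."""
    return r ** 2 * 100

print(round(standardised_residual(-0.3, 0.02, 0.4), 1))  # question 14 Part 3
print(round(predict_evening(60), 1))                      # question 15 part c
print(round(percent_explained(0.92)))                     # question 15 part e
```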
The residual plot below was constructed to test the assumption of linearity for the relationship between the variables mean duration of warm spell and the mean surface temperature.
c. Explain why this residual plot supports the assumption of linearity for this relationship. (1 mark)
d. Write down the percentage of variation in the mean duration of a warm spell that is explained by the variation in mean surface temperature. Write your answer correct to the nearest per cent. (1 mark)
e. Describe the relationship between the mean duration of a warm spell and the mean surface temperature in terms of strength, direction and form. (2 marks)
17. CORE, FUR2 2010 VCAA 2
In the scatterplot below, average annual female income, in dollars, is plotted against average annual male income, in dollars, for 16 countries. A least squares regression line is fitted to the data.
The equation of the least squares regression line for predicting female income from male income is
female income = 13 000 + 0.35 × male income
a. What is the independent variable? (1 mark)
b. Complete the following statement by filling in the missing information.
From the least squares regression line equation it can be concluded that, for these countries, on average, female income increases by ______ for each $1000 increase in male income. (1 mark)
c.
i. Use the least squares regression line equation to predict the average annual female income (in dollars) in a country where the average annual male income is $15 000. (1 mark)
ii. The prediction made in part c.i. is not likely to be reliable.
Explain why. (1 mark)
18. CORE, FUR2 2016 VCAA 3
a. Use the scatterplot to describe the association between apparent temperature and actual temperature in terms of strength, direction and form. (1 mark)
b.
i. Determine the equation of the least squares line that can be used to predict the apparent temperature from the actual temperature.
Write the values of the intercept and slope of this least squares line in the appropriate boxes provided below.
Round your answers to two significant figures. (3 marks)
apparent temperature = ______ + ______ × actual temperature
ii. Interpret the intercept of the least squares line in terms of the variables apparent temperature and actual temperature. (1 mark)
c. The coefficient of determination for the association between the variables apparent temperature and actual temperature is 0.97
Interpret the coefficient of determination in terms of these variables. (1 mark)
d. The residual plot obtained when the least squares line was fitted to the data is shown below.
i. A residual plot can be used to test an assumption about the nature of the association between two numerical variables.
What is this assumption? (1 mark)
ii. Does the residual plot above support this assumption? Explain your answer. (1 mark)
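For question 17, parts b and c.i come down to substituting into the given equation, female income = 13 000 + 0.35 × male income. A minimal sketch, with an illustrative function name that is not part of the question:

```python
def predict_female_income(male_income):
    """Predicted average annual female income ($) from average annual male income ($)."""
    return 13_000 + 0.35 * male_income

# Part b: change in the prediction for a $1000 increase in male income
increase = predict_female_income(16_000) - predict_female_income(15_000)

# Part c.i: prediction for a male income of $15 000
prediction = predict_female_income(15_000)

print(round(increase), round(prediction))
```

Whether the $15 000 prediction is reliable (part c.ii) depends on the range of male incomes shown on the scatterplot, which the code cannot check.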
An equation of the least squares regression line for this data set is
rainfall × percentage of clear days
a. Draw this line on the scatterplot. (1 mark)
b. Use the equation of the least squares regression line to predict the rainfall for a month with
35% of clear days. Write your answer in mm correct to one decimal place. (1 mark)
c. The coefficient of determination for this data set is 0.8081.
i. Interpret the coefficient of determination in terms of the variables rainfall and
percentage of clear days. (1 mark)
ii. Determine the value of Pearson’s product moment correlation coefficient. Write your
answer correct to three decimal places. (2 marks)
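Part c.ii asks for the correlation coefficient given the coefficient of determination. A minimal sketch, assuming r is the square root of the coefficient of determination with the same sign as the slope of the least squares line. The sign of the slope is not legible in the equation above, so it is passed in as a parameter (slope_sign is a name invented for this example) and should be checked against the scatterplot.

```python
import math

def r_from_r_squared(r_squared, slope_sign):
    """Return r with the sign of the regression slope (slope_sign is +1 or -1)."""
    if slope_sign not in (1, -1):
        raise ValueError("slope_sign must be +1 or -1")
    return slope_sign * math.sqrt(r_squared)

# Magnitude of r for a coefficient of determination of 0.8081
print(round(r_from_r_squared(0.8081, 1), 3))
```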
20. CORE, FUR2 2014 VCAA 2
The scatterplot below shows the population and area (in square kilometres) of a sample of inner suburbs of a large city.
The equation of the least squares regression line for the data in the scatterplot is
b. Draw the least squares regression line on the scatterplot above.
(Answer on the scatterplot above.) (1 mark)
c. Interpret the slope of this least squares regression line in terms of the variables area and population. (2 marks)
d. Wiston is an inner suburb. It has an area of 4 km² and a population of 6690.
The correlation coefficient, r, is equal to 0.668
i. Calculate the residual when the least squares regression line is used to predict the population of Wiston from its area. (1 mark)
ii. What percentage of the variation in the population of the suburbs is explained by the variation in area?
Write your answer, correct to one decimal place. (1 mark)
21. CORE, FUR2 2012 VCAA 2
The maximum temperature and the minimum temperature at this weather station on each of the 30 days in November 2011 are displayed in the scatterplot below.
The equation of the least squares regression line for this data set is
b. Interpret the vertical intercept of the least squares regression line in terms of maximum temperature and minimum temperature. (1 mark)
c. Describe the relationship between the maximum temperature and the minimum temperature in terms of strength and direction. (1 mark)
d. Interpret the slope of the least squares regression line in terms of maximum temperature and minimum temperature. (1 mark)
e. Determine the percentage of variation in the maximum temperature that may be explained by the variation in the minimum temperature.
Write your answer, correct to the nearest percentage. (1 mark)
On the day that the minimum temperature was 11.1 °C, the actual maximum temperature was 12.2 °C.
f. Determine the residual value for this day if the least squares regression line is used to predict the maximum temperature.
Write your answer, correct to the nearest degree. (2 marks)
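For question 20 part d.ii, the percentage of variation explained follows from the correlation coefficient of 0.668. A minimal sketch, assuming the percentage is r² × 100 (the function name is illustrative). The residual parts (20 d.i and 21 f) need the regression equations, which are not reproduced in this copy, so they are not computed here.

```python
def percent_variation_explained(r):
    """Percentage of variation in the response explained by the explanatory variable."""
    return r ** 2 * 100

# Question 20 part d.ii, answer to one decimal place
print(round(percent_variation_explained(0.668), 1))
```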
VCE Mathematics examination questions reproduced by permission, VCAA. VCE is a registered trademark of the VCAA. The
VCAA does not endorse or make any warranties regarding this study resource. Current and past VCE exams and related
content can be accessed directly at www.vcaa.vic.edu.au.
Worked Solutions
4. CORE, FUR1 2010 VCAA 10 MC
b.
c.
d.
e.
c.
a.
d.
e. 18. CORE, FUR2 2016 VCAA 3
a.
17. CORE, FUR2 2010 VCAA 2
a.
b.i.
b.
a. a.
c.
b. ♦ Mean mark 41% (part (iii)).
b.
d.
e.
f.
MARKER'S COMMENT: Students had particular difficulty with this part, with many using the incorrect calculation of 12.2 - 11.1 = 1.1.
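The marker's comment contrasts a common error with the intended method. A residual is the actual value minus the value predicted by the regression line, not the actual maximum temperature minus the minimum temperature. The sketch below illustrates the difference. The regression equation is not reproduced in this copy, so the intercept and slope used here are hypothetical placeholders chosen only to make the example run. They are not the exam's values, and the printed residual is not the exam answer.

```python
def predict_max_temp(min_temp, intercept, slope):
    """Maximum temperature predicted from minimum temperature by a least squares line."""
    return intercept + slope * min_temp

def residual(actual, predicted):
    """Residual = actual value - predicted value."""
    return actual - predicted

# HYPOTHETICAL coefficients for illustration only (not taken from the exam)
INTERCEPT = 10.0
SLOPE = 0.5

predicted = predict_max_temp(11.1, INTERCEPT, SLOPE)
correct_method = residual(12.2, predicted)   # actual maximum minus predicted maximum
incorrect_method = 12.2 - 11.1               # the error described in the marker's comment

print(round(correct_method, 2), round(incorrect_method, 1))
```

With the exam's actual intercept and slope substituted for the placeholders, the same two function calls give the residual asked for in part f.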