S1 Correlation and Regression
1. A personnel manager wants to find out if a test carried out during an employee’s interview and a
skills assessment at the end of basic training is a guide to performance after working for the
company for one year.
The table below shows the results of the interview test of 10 employees and their performance
after one year.
Employee                            A   B   C   D   E   F   G   H   I   J
Interview test, x %                65  71  79  77  85  78  85  90  81  62
Performance after one year, y %    65  74  82  64  87  78  61  65  79  69
(a) Showing your working clearly, calculate the product moment correlation coefficient
between the interview test and the performance after one year.
(5)
The product moment correlation coefficient between the skills assessment and the performance
after one year is –0.156 to 3 significant figures.
(b) Use your answer to part (a) to comment on whether or not the interview test and skills
assessment are a guide to the performance after one year. Give clear reasons for your
answers.
(2)
(Total 7 marks)
2. A second hand car dealer has 10 cars for sale. She decides to investigate the link between the
age of the cars, x years, and the mileage, y thousand miles. The data collected from the cars are
shown in the table below.
Age, x (years)            2    2.5  3    4    4.5  4.5  5    3    6    6.5
Mileage, y (thousands)    22   34   33   37   40   45   49   30   58   58
(b) Find the equation of the least squares regression line in the form y = a + bx. Give the
values of a and b to 2 decimal places.
(4)
(d) Using your answer to part (b), find the mileage predicted by the regression line for a
5 year old car.
(2)
(Total 10 marks)
3. A young family were looking for a new 3 bedroom semi-detached house. A local survey
recorded the price x, in £1000, and the distance y, in miles, from the station of such houses. The
following summary statistics were provided:
$S_{xx} = 113573$, $S_{yy} = 8.657$, $S_{xy} = -808.917$
(a) Use these values to calculate the product moment correlation coefficient.
(2)
Another family asked for the distances to be measured in km rather than miles.
(c) State the value of the product moment correlation coefficient in this case.
(1)
(Total 4 marks)
[Figure: scatter diagrams, not reproduced; recoverable axis labels: x, u, s]
The student calculated the value of the product moment correlation coefficient for each of the
sets of data.
Write down, with a reason, which value corresponds to which scatter diagram.
(Total 6 marks)
If r > 0.5 they can score B1g in (b) for saying that it (skills test)
is not a good guide to performance but B0h since a second
acceptable comment about both tests is not possible.
Give B1 for one correct line, B1B1 for any 2.
If the only comment is the test(s) are a good guide: scores B0B0
If the only comment is the tests are not good: scores B1B0 (second line)
The third line is for a comment that suggests that the interview
test is OK but the skills test is not since one is positive and the
other is negative.
Treat 1st B1 as B1g and 2nd as B1h
An answer of “no” alone scores B0B0
NB
$S_{xx} = 60475 - \frac{(773)^2}{10} = 722.1$
$S_{yy} = 53122 - \frac{(724)^2}{10} = 704.4$
$S_{xy} = 56076 - \frac{773 \times 724}{10} = 110.8$
[7]
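As a check on the summary values quoted in the NB above, the calculation for question 1(a) can be sketched in Python. This is a minimal illustration rather than part of the mark scheme; the variable names are assumptions chosen here, and the data are those of the question 1 table.

```python
from math import sqrt

# Data from the question 1 table (percentages).
interview = [65, 71, 79, 77, 85, 78, 85, 90, 81, 62]    # x
performance = [65, 74, 82, 64, 87, 78, 61, 65, 79, 69]  # y
n = len(interview)

# Corrected sums of squares and products.
s_xx = sum(x * x for x in interview) - sum(interview) ** 2 / n
s_yy = sum(y * y for y in performance) - sum(performance) ** 2 / n
s_xy = (sum(x * y for x, y in zip(interview, performance))
        - sum(interview) * sum(performance) / n)

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)
print(f"S_xx = {s_xx:.1f}, S_yy = {s_yy:.1f}, S_xy = {s_xy:.1f}")
print(f"r = {r:.3f}")
```

The value obtained, about 0.155, is close to zero, as is the –0.156 quoted for the skills assessment, which is consistent with the scheme's comment that the tests are not good guides to performance.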
2. (a) $S_{xy} = 1818.5 - \frac{41 \times 406}{10} = 153.9$ (could be seen in (b)) AWRT 154 M1, A1
$S_{xx} = 188 - \frac{41^2}{10} = 19.9$ (could be seen in (b)) A1 3
10
M1 for correct attempt or expression for either
1st A1 for one correct
2nd A1 for both correct
(b) $b = \frac{153.9}{19.9} = 7.733668\ldots$ AWRT 7.73 M1, A1
$a = 40.6 - b \times 4.1 \ (= 8.89196\ldots)$ M1
$y = 8.89 + 7.73x$ A1 4
Ignore the epen marks for part (b); they should be awarded as
per this scheme.
1st M1 for $\frac{\text{their } S_{xy}}{\text{their } S_{xx}}$
(c) A typical car will travel 7700 miles every year B1ft 1
B1ft for their b × 1000 to at least 2 sf.
Accept “7.7 thousand” but value is needed
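The working for question 2 can be reproduced from the raw table with a short Python sketch. The function name `least_squares_line` and the variable names are assumptions made for this illustration; only the data and the form y = a + bx come from the question.

```python
# Data from the question 2 table.
age = [2, 2.5, 3, 4, 4.5, 4.5, 5, 3, 6, 6.5]       # x, years
mileage = [22, 34, 33, 37, 40, 45, 49, 30, 58, 58]  # y, thousand miles


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b x."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx                      # gradient
    a = sum(ys) / n - b * sum(xs) / n    # intercept: y-bar minus b times x-bar
    return a, b


a, b = least_squares_line(age, mileage)
predicted = a + b * 5  # part (d): a 5 year old car, in thousand miles

print(f"y = {a:.2f} + {b:.2f}x")
print(f"Predicted mileage at 5 years: {predicted * 1000:.0f} miles")
```

The gradient b is in thousand miles per year, so the interpretation in part (c) and the prediction in part (d) both need the factor of 1000 to be stated in miles.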
3. (a) $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-808.917}{\sqrt{113573 \times 8.657}}$ M1
$= -0.81579\ldots$ A1 2
M1 for knowing formula and clear attempt to sub in correct values
from question.
Root required for method.
Anything that rounds to –0.82 for A1.
Correct answer with no working award 2/2
(b) Houses are cheaper further away from the station or equivalent
statement B1 1
Context based on negative correlation only required.
Accept Houses are more expensive closer to the station or
equivalent statement.
Require ‘house prices’ or ‘station’ and clear correct comparison.
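Part (c) of question 3 rests on the fact that the product moment correlation coefficient is unchanged when one variable is rescaled by a positive constant. The sketch below illustrates this using the values substituted in part (a); the helper `pmcc` and the conversion factor are assumptions introduced here, not part of the scheme.

```python
from math import sqrt


def pmcc(s_xy, s_xx, s_yy):
    """Product moment correlation coefficient from summary statistics."""
    return s_xy / sqrt(s_xx * s_yy)


# Values substituted in the scheme for part (a).
s_xx, s_yy, s_xy = 113573, 8.657, -808.917
r_miles = pmcc(s_xy, s_xx, s_yy)

# Rescaling y by k (miles to km) multiplies S_xy by k and S_yy by k squared,
# so the factor cancels in the ratio.
k = 1.609344  # kilometres per mile
r_km = pmcc(k * s_xy, s_xx, k * k * s_yy)

print(f"r (miles) = {r_miles:.4f}")
print(f"r (km)    = {r_km:.4f}")
```

Both lines print the same value, which rounds to –0.82, so no multiplication of the part (a) answer is needed.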
1. Part (a) was answered very well and most candidates scored full marks here but responses to
part (b) were mixed. Some thought that because both values were similar, but one positive and
one negative, they “cancelled out” and others only commented on one of the tests or thought
that the correlation coefficients were between the two tests. However, a number of fully correct
solutions were seen.
2. The first two parts of this question were answered very well. There were few problems
encountered in parts (a) and (b) although a = 8.91 was a common error caused by using the
rounded value of b not a more accurate version. Most candidates adhered to the instruction to
give their answers to 2 decimal places. Problems started though when the candidates were asked
to interpret the equation. Many candidates simply said that mileage increases with age and few
who mentioned the 7.7 value remembered the thousands. A simple response such as “the annual
mileage is 7700 miles” or “each year a car travels 7700 miles” was rarely seen.
In part (d) most could substitute x = 5 into their equation but once again the “thousand” was
forgotten and the unlikely figure of 48 miles for a 5 year old car was all too common.
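The two slips described for question 2 can be made concrete with a small Python sketch. It starts from the values quoted in the mark scheme (S_xy = 153.9, S_xx = 19.9, means 4.1 and 40.6); the variable names are assumptions made for this illustration.

```python
# Values quoted in the mark scheme for question 2.
s_xy, s_xx = 153.9, 19.9
x_bar, y_bar = 4.1, 40.6

b = s_xy / s_xx  # full-precision gradient

# Intercept using the unrounded b, as the scheme intends.
a_full = y_bar - b * x_bar

# Intercept using b rounded to 2 decimal places first (the common error).
a_from_rounded_b = y_bar - round(b, 2) * x_bar

# Prediction for a 5 year old car: the line gives thousands of miles.
predicted_thousands = a_full + b * 5
predicted_miles = predicted_thousands * 1000

print(f"a with full-precision b: {a_full:.2f}")
print(f"a with rounded b:        {a_from_rounded_b:.2f}")
print(f"Predicted mileage:       {predicted_miles:.0f} miles")
```

Rounding b before computing a moves the intercept from 8.89 to 8.91, and omitting the conversion turns a prediction of roughly 47 600 miles into the implausible 48 miles.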
3. Most candidates had little trouble with part (a). In part (b) a sizeable minority correctly
identified negative correlation, but failed to put it into the context of the question. In part (c)
again a sizeable minority did not attempt this part or multiplied their answer for part (a) by some
arbitrary factor.
4. Most candidates were able to match the given values of the product moment correlation
coefficient with the correct diagram. However, relatively few were able to give acceptable
reasons based upon correlation rather than regression considerations. Even those candidates who
correctly referred to variables increasing and decreasing missed identifying u, v and s, t or found
it difficult to describe diagram B. There was a tendency to write about the points being in the
middle of the diagram without reference to their randomness or scattering.