S4 Correlation (3)
The table shows the length of time, to the nearest ms, that it takes a group of students to
respond to a stimulus.
(a) Give two reasons why a histogram is appropriate for displaying this data. Continuous data; classes are different widths
The bar to represent the students who took between 4 ms and 6 ms to respond is of width 1
cm and height 2.5 cm.
(b) Calculate the width and height of the bar representing students who took between 7.5 ms
and 8.5 ms to respond. Width = 0.5 cm, height = 4 cm
(c) Use linear interpolation to estimate the median time taken to respond.
$\frac{25 - 12}{15} \times 1 + 6 = 6.866\ldots$
(d) Calculate an estimate for the standard deviation.
$\sum ft = 330.25 \qquad \sum ft^2 = 2291.1875$
$\sigma = \sqrt{\frac{2291.1875}{50} - \left(\frac{330.25}{50}\right)^2} = 1.4824\ldots$
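The arithmetic in parts (c) and (d) can be checked with a short Python sketch. It uses only the figures quoted in the working. Reading them as 50 students in total, a median class with lower boundary 6 ms, width 1 ms and frequency 15, and 12 students below that class is an assumption, since the frequency table is not reproduced here. The function names are illustrative.

```python
from math import sqrt

def interpolated_median(lower_bound, class_width, class_freq, cum_freq_below, n):
    """Estimate the median of grouped data by linear interpolation."""
    return lower_bound + (n / 2 - cum_freq_below) / class_freq * class_width

def grouped_std_dev(sum_f, sum_ft, sum_ft2):
    """Standard deviation from totals: sqrt(mean of the squares - square of the mean)."""
    mean = sum_ft / sum_f
    return sqrt(sum_ft2 / sum_f - mean ** 2)

# Values assumed from the working shown above (the table itself is not available).
median = interpolated_median(lower_bound=6, class_width=1, class_freq=15,
                             cum_freq_below=12, n=50)
sigma = grouped_std_dev(sum_f=50, sum_ft=330.25, sum_ft2=2291.1875)

print(f"median is about {median:.3f} ms")
print(f"standard deviation is about {sigma:.4f} ms")
```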
2. Simon is researching the eating habits of the students in his school year.
He asks the first five people he sees on Monday morning.
(a) Write down the sampling technique Simon is using. Opportunity sampling
(b) State one advantage of this technique. Quick (and easy) to carry out.
(c) Suggest two improvements that Simon can make to the sampling technique.
Increase the number of people he asks.
Vary the time of day he asks people.
Your turn …
For each situation below, explain which quantity would be the explanatory variable, and which would be the
response variable.
1. The time spent practising the piano each week. Explanatory
The number of mistakes made in a test at the end of the week. Response
2. The age of a second hand car. Explanatory
The value of the second hand car. Response
3. The growth rate of a plant in an experiment. Response
The amount of sunlight falling on a plant in an experiment. Explanatory
Correlation describes the nature of the linear relationship between two variables.
[Figure: scatter diagrams]
In this example, the correlation between windmill activity and wind velocity does not
imply that wind is caused by windmills. It is rather the other way around, as suggested
by the fact that wind doesn’t need windmills to exist, while windmills need wind to
rotate. Wind can be observed in places where there are no windmills or non-rotating
windmills—and there are good reasons to believe that wind existed before the
invention of windmills!
Example 2
Since the 1950s, both the atmospheric CO2 level and obesity levels have increased sharply.
Hence, atmospheric CO2 causes obesity.
Richer populations tend to eat more food and consume more energy.
1. Ice cream sales and the number of shark attacks on swimmers are positively correlated. Can I
conclude that a rise in ice cream sales is going to cause more shark attacks?
2. Children with bigger feet spell better. So, a better ability to spell is caused by big feet.
3. The more firefighters fighting a fire, the bigger the fire is going to be. Therefore, firefighters cause fires.
4. People are taller today than 500 years ago. Health and diet have improved over the last 500 years.
So, better health and diet have caused people to become taller.
5. As the number of pirates has decreased, global warming has increased. So, global warming is caused
by a lack of pirates.
1. Of course ice cream does not cause shark attacks! Ice cream sales and shark attacks both increase
during warm weather. So, the two variables are positively correlated but there is no causal relationship
between the two!
2. A child’s shoe size and their ability to spell are both related to a child’s age. Children with bigger
feet spell better because they are older, their greater age bringing about bigger feet and, not quite so
certainly, better spelling. Thus the two variables are positively correlated and there is no causal
relationship.
4. It makes sense. It’s also why people live longer. We still need proof though!
5. No! We don’t need more pirates! These are completely unrelated and are a coincidence.
Jerry is studying visibility for Camborne using the large data set June 1987.
Jerry drew the following scatter diagram, Figure 2, and calculated some statistics using the June 1987 data
for Camborne from the large data set.
Jerry defines an outlier as a value that is more than 1.5 times the interquartile range above Q3 or more
than 1.5 times the interquartile range below Q1.
(a) Show that the point circled on the scatter diagram is an outlier for visibility.
(b) Interpret the correlation between the daily mean visibility and the daily maximum relative humidity.
2. Data for the daily mean windspeed (in knots) in Leuchars in July 1987 is taken from the large data
set.
3 4 5 5 5 5 5 5 5 5 6
6 6 7 7 7 8 8 8 8 9 9
9 9 10 11 11 12 15 16 19
(a) Calculate the median and the interquartile range. Median = 7, LQ = 5, UQ = 9, IQR = 4
An outlier is defined as a value which lies either more than 1.5 x the interquartile range above
the upper quartile or more than 1.5 x the interquartile range below the lower quartile.
(b) Determine whether there are any outliers in the data. 16 & 19 are the only outliers
(c) Draw a box plot for this data.
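The answers to (a) and (b) can be reproduced with a small Python sketch using the 31 windspeeds listed above. The quartile rule used here is an assumption (a common A-level convention): find the position k·n/4, round up to the next value if it is not a whole number, and otherwise average that value with the next one. Other conventions can give slightly different quartiles. The function name is illustrative.

```python
windspeeds = [3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6,
              6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9,
              9, 9, 10, 11, 11, 12, 15, 16, 19]

def quartile(sorted_data, quarters):
    """Return Q1, Q2 or Q3 (quarters = 1, 2 or 3) using the assumed position rule."""
    whole, remainder = divmod(quarters * len(sorted_data), 4)
    if remainder:
        # Position is not a whole number: round up to the next data value.
        # 0-based index `whole` is the (whole + 1)th value.
        return sorted_data[whole]
    # Position is a whole number: average that value and the one after it.
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

data = sorted(windspeeds)
q1, median, q3 = (quartile(data, k) for k in (1, 2, 3))
iqr = q3 - q1

# Outliers lie more than 1.5 x IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(median, q1, q3, iqr)
print(outliers)
```

Note that 15 sits exactly on the upper fence (9 + 1.5 x 4 = 15), so under a "more than" rule it is not counted as an outlier.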
The equation of the regression line of y on x is written in the form y = a + bx,
where a is the point of intercept with the y-axis and b is the gradient of the line (i.e. the amount by
which y increases for an increase of 1 in x).
For each point on the scatter diagram we can express y in terms of x as y = a + bx + e, where e is the vertical
distance from the line of best fit.
From the large data set, the daily maximum temperature and the daily total sunshine for
12 days in May in Heathrow in 2015 were recorded. The data was plotted on a scatter graph.
Interpolation is when you estimate the value of a dependent variable within the range of
observed data values.
Extrapolation is when you estimate a value outside the range of observed data values.
Extrapolated values can be unreliable and should be viewed with caution.
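As a minimal sketch of the distinction, the Python function below labels an estimate according to whether the new value of the explanatory variable lies inside the observed range. The function name and the sample values are hypothetical, chosen only for illustration.

```python
def kind_of_estimate(observed_x, new_x):
    """Label an estimate made at new_x as interpolation or extrapolation."""
    if min(observed_x) <= new_x <= max(observed_x):
        return "interpolation"
    return "extrapolation"

# Hypothetical observed values of the explanatory variable.
observed_x = [10, 14, 19, 23, 30]

print(kind_of_estimate(observed_x, 20))  # inside the range 10 to 30
print(kind_of_estimate(observed_x, 45))  # outside the range, so treat with caution
```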
The data in the table refer to a chain of shops. The figures reported are the number of sales staff and the
average daily takings in thousands of pounds.
17 39 32 17 25 43 25 32 48 10 48 42 36 30 19
7 17 10 5 7 15 11 13 19 3 17 15 14 12 8
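One way to describe the strength of the linear relationship in this table is the product moment correlation coefficient. The Python sketch below computes it from the two rows as printed. The rows are not labelled in the table, so they are called row_1 and row_2 here; this does not affect the result, because the coefficient is the same whichever row is treated as x. The function name is illustrative.

```python
from math import sqrt

row_1 = [17, 39, 32, 17, 25, 43, 25, 32, 48, 10, 48, 42, 36, 30, 19]
row_2 = [7, 17, 10, 5, 7, 15, 11, 13, 19, 3, 17, 15, 14, 12, 8]

def pmcc(u, v):
    """Product moment correlation coefficient r = S_uv / sqrt(S_uu * S_vv)."""
    n = len(u)
    s_uv = sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n
    s_uu = sum(a * a for a in u) - sum(u) ** 2 / n
    s_vv = sum(b * b for b in v) - sum(v) ** 2 / n
    return s_uv / sqrt(s_uu * s_vv)

r = pmcc(row_1, row_2)
print(f"r is about {r:.3f}")
```

A value of r close to 1 indicates strong positive linear correlation between the two rows.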
A company is introducing a job evaluation scheme. Points (x) will be awarded to each job based on the
qualifications and skills needed and the level of responsibility. Pay (£y) will then be allocated to each job
according to the number of points awarded.
Before the scheme is introduced, a random sample of 8 employees was taken and the linear regression
equation of pay on points was y = 4.5x – 47
(c) Explain why this model might not be appropriate for all jobs in the company.
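One possible line of reasoning for (c) can be explored by evaluating the regression equation y = 4.5x - 47 at a few points values. The values of x below are hypothetical, since the sampled data is not given; they are chosen only to show how the model behaves for small and large x.

```python
def predicted_pay(points):
    """Pay in pounds predicted by the regression line y = 4.5x - 47."""
    return 4.5 * points - 47

# Hypothetical points values, not taken from the sample of 8 employees.
for points in (5, 10, 20, 100):
    print(points, predicted_pay(points))
```

For small numbers of points the predicted pay is negative, which makes no sense, and jobs whose points fall outside the range covered by the sample would require extrapolation.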
To test the heating of tyre material, tyres are run on a test rig at chosen speeds under given
conditions of load, pressure and surrounding temperature. The following table gives values
of x, the test rig speed in miles per hour (mph), and the temperature, y °C, generated in the
shoulder of the tyre for a particular tyre material.
x (mph) 15 20 25 30 35 40 45 50
y (°C) 53 55 63 65 78 83 91 101
(a) Draw a scatter diagram to represent these data.
(b) Give a reason to support the fitting of a regression line of the form y = a + bx
through these points.
[Figure: scatter diagram axes, with Speed (mph) on the horizontal axis]
(d)
(e)
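A least squares regression line for the tyre data can be sketched in Python as follows. It assumes the line has the form y = a + bx, with x the speed in mph and y the temperature in °C as in the table, and uses the standard formulas b = S_xy / S_xx and a = mean(y) - b × mean(x). The function name is illustrative, and this is a check on the arithmetic rather than a model answer.

```python
speeds = [15, 20, 25, 30, 35, 40, 45, 50]    # x (mph)
temps = [53, 55, 63, 65, 78, 83, 91, 101]    # y (°C)

def least_squares_line(x, y):
    """Return (a, b) for the regression line of y on x, y = a + bx."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    b = s_xy / s_xx                    # gradient
    a = sum(y) / n - b * sum(x) / n    # intercept: the line passes through the mean point
    return a, b

a, b = least_squares_line(speeds, temps)
print(f"y = {a:.2f} + {b:.3f}x")
```

Estimates from this line are only interpolation for speeds between 15 mph and 50 mph.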