5.1 Representation of Data: Hard
Subject: Mathematics
Syllabus Code: 9709
Level: AS Level
Component: Probability and Statistics 1
Topic: 5.1 Representation of Data
Difficulty: Hard
Questions
1. The numbers of chocolate bars sold per day in a cinema over a period of 100 days are summarised
in the following table. (9709/51/M/J/20 number 7)
2. The annual salaries, in thousands of dollars, for 11 employees at each of two companies A and B
are shown below. (9709/53/M/J/20 number 6)
Company A 30 32 35 41 41 42 47 49 52 53 64
Company B 26 47 30 52 41 38 35 42 49 31 42
(a) Represent the data by drawing a back-to-back stem-and-leaf diagram with company A on the
left-hand side of the diagram.
(b) Find the median and the interquartile range of the salaries of the employees in company A.
A new employee joins company B. The mean salary of the 12 employees is now $38 500.
(c) Find the salary of the new employee.
3. The times, t minutes, taken by 150 students to complete a particular challenge are summarised in
the following cumulative frequency table. (9709/51/O/N/20 number 6)
4. A particular piece of music was played by 91 pianists and for each pianist, the number of incorrect
notes was recorded. The results are summarised in the table. (9709/53/O/N/20 number 7)
5. The distances, x m, travelled to school by 140 children were recorded. The results are summarised
in the table below. (9709/52/O/N/21 number 7)
Distance, x m x ≤ 200 x ≤ 300 x ≤ 500 x ≤ 900 x ≤ 1200 x ≤ 1600
Cumulative Frequency 16 46 88 122 134 140
(a) On the grid, draw a cumulative frequency graph to represent these results.
(b) Use your graph to estimate the interquartile range of these distances.
(c) Calculate estimates of the mean and standard deviation of the distances.
6. The times, t minutes, taken to complete a walking challenge by 250 members of a club are summarised
in the table. (9709/53/O/N/22 number 3)
7. The times, to the nearest minute, of 150 athletes taking part in a charity run are recorded. The
results are summarised in the table. (9709/51/O/N/23 number 4)
Time in minutes 101 − 120 121 − 130 131 − 135 136 − 145 146 − 160
Frequency 18 48 34 32 18
8. The heights, in cm, of the 11 players in each of two teams, the Aces and the Jets, are shown in the
following table. (9709/52/O/N/23 number 4)
Aces 180 174 169 182 181 166 173 182 168 171 164
Jets 175 174 188 168 166 174 181 181 170 188 190
(a) Draw a back-to-back stem-and-leaf diagram to represent this information with the Aces on the
left-hand side of the diagram.
(b) Find the median and interquartile range of the heights of the players in the Aces.
(c) Give one comment comparing the spread of the heights of the Aces with the spread of the
heights of the Jets.
9. The weights, x kg, of 120 students in a sports college are recorded. The results are summarised in
the following table. (9709/53/O/N/23 number 4)
(c) Calculate estimates for the mean and standard deviation of the weights of the 120 students.
10. Helen measures the lengths of 150 fish of a certain species in a large pond. These lengths, correct
to the nearest centimetre, are summarised in the following table. (9709/52/F/M/20 number 7)
Length (cm) 0 − 9 10 − 14 15 − 19 20 − 30
Frequency 15 48 66 21
11. The following table gives the weekly snowfall, in centimetres, for 11 weeks in 2018 at two ski resorts,
Dados and Linva. (9709/52/O/N/20 number 5)
Dados 6 8 12 15 10 36 42 28 10 22 16
Linva 2 11 15 16 0 32 36 40 10 12 9
12. The times taken, in minutes, by 360 employees at a large company to travel from home to work are
summarised in the following table. (9709/53/O/N/21 number 3)
Answers
1. The numbers of chocolate bars sold per day in a cinema over a period of 100 days are summarised
in the following table. (9709/51/M/J/20 number 7)
Notice how there are gaps between our classes. Let’s apply continuity correction to
get rid of the gaps,
No. of chocolate bars sold 0.5 − 10.5 10.5 − 15.5 15.5 − 30.5 30.5 − 50.5 50.5 − 60.5
No. of days 18 24 30 20 8
Note: To apply correction, add 0.5 to the upper bounds and subtract 0.5 from the
lower bounds.
Label the y-axis with frequency density and the x-axis with number of chocolate bars
sold. Plot frequency density against the number of chocolate bars sold, ensuring
that you use the correct class width for each class,
[Histogram: frequency density (y-axis) against number of chocolate bars sold (x-axis)]
(b) What is the greatest value of the interquartile range for this data?
IQR = q3 − q1
To find the greatest possible value of the interquartile range we need the largest
value of the upper quartile and the smallest value of the lower quartile. Let’s find
the largest value of the upper quartile,
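The upper quartile is at position
\[ \tfrac{3}{4}n = \tfrac{3}{4} \times 100 = 75 \]
The cumulative frequencies are 18, 42, 72, 92 and 100, so the 75th value lies in the class
31 − 50
The lower quartile is at position 25, which lies in the class
11 − 15
Take the largest value in the upper quartile class and the smallest value in the lower quartile class,
IQR = 50 − 11
IQR = 39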
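The class-finding step above can be checked with a short Python sketch. It assumes the original classes are 1 to 10, 11 to 15, 16 to 30, 31 to 50 and 51 to 60 (inferred by undoing the 0.5 correction in the table above); the function name `class_containing` is illustrative only.

```python
from itertools import accumulate

# Assumed original classes (inferred from the corrected table) and their frequencies.
classes = [(1, 10), (11, 15), (16, 30), (31, 50), (51, 60)]
freqs = [18, 24, 30, 20, 8]

def class_containing(position, classes, freqs):
    """Return the first class whose cumulative frequency reaches `position`."""
    for cls, cum in zip(classes, accumulate(freqs)):
        if position <= cum:
            return cls
    raise ValueError("position exceeds the total frequency")

n = sum(freqs)
q1_class = class_containing(n / 4, classes, freqs)
q3_class = class_containing(3 * n / 4, classes, freqs)

# Greatest possible IQR: top of the upper-quartile class minus
# bottom of the lower-quartile class.
greatest_iqr = q3_class[1] - q1_class[0]
print(q1_class, q3_class, greatest_iqr)
```

Running it should reproduce the classes 11 − 15 and 31 − 50 and the value 39 found above.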
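The formula for the mean of grouped data is,
\[ \bar{x} = \frac{\Sigma xf}{\Sigma f} \]
where x is the midpoint of each class and f is its frequency.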
Σxf = 2355
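Σx²f = 77917.5
Substitute into the formula for standard deviation and simplify,
\[ \sigma = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2} = \sqrt{\frac{77917.5}{100} - 23.55^2} \]
σ = 15.0
x̄ = 23.55, σ = 15.0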
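As a check on the arithmetic, here is a minimal Python sketch of the grouped mean and standard deviation, using the corrected class boundaries and frequencies from the table above. The function name `grouped_mean_sd` is an illustrative choice, not part of the original solution.

```python
from math import sqrt

# Corrected class boundaries and frequencies from the table in Question 1.
boundaries = [(0.5, 10.5), (10.5, 15.5), (15.5, 30.5), (30.5, 50.5), (50.5, 60.5)]
freqs = [18, 24, 30, 20, 8]

def grouped_mean_sd(boundaries, freqs):
    """Estimate mean and standard deviation using class midpoints."""
    mids = [(lo + hi) / 2 for lo, hi in boundaries]
    n = sum(freqs)
    sum_xf = sum(x * f for x, f in zip(mids, freqs))
    sum_x2f = sum(x * x * f for x, f in zip(mids, freqs))
    mean = sum_xf / n
    sd = sqrt(sum_x2f / n - mean ** 2)
    return sum_xf, sum_x2f, mean, sd

sum_xf, sum_x2f, mean, sd = grouped_mean_sd(boundaries, freqs)
print(sum_xf, sum_x2f, round(mean, 2), round(sd, 1))
```

The same function can be reused for any of the grouped tables in this worksheet by swapping in the relevant boundaries and frequencies.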
2. The annual salaries, in thousands of dollars, for 11 employees at each of two companies A and B
are shown below. (9709/53/M/J/20 number 6)
Company A 30 32 35 41 41 42 47 49 52 53 64
Company B 26 47 30 52 41 38 35 42 49 31 42
(a) Represent the data by drawing a back-to-back stem-and-leaf diagram with company A on the
left-hand side of the diagram.
We need to determine a scale for our stem. Our data ranges from 26 000 to 64 000,
so our stem will run from 2 to 6 to represent 20 000 to 60 000. Now let’s draw our
stem-and-leaf diagram, and don’t forget to add a key,
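Company A |   | Company B
          | 2 | 6
    5 2 0 | 3 | 0 1 5 8
9 7 2 1 1 | 4 | 1 2 2 7 9
      3 2 | 5 | 2
        4 | 6 |
Key: 0 | 3 | 1 represents $30 000 for company A and $31 000 for company B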
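Identify the data point in position 6,
q2 = $42 000
Now let’s find the interquartile range. The formula for interquartile range is,
IQR = q3 − q1
Find the data point in position 9,
q3 = $52 000
Find the data point in position 3,
q1 = $35 000
IQR = 52 000 − 35 000
IQR = $17 000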
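The position-based method can be sketched in Python as follows. This is a minimal illustration that assumes the quartiles sit at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4, which gives positions 3, 6 and 9 for 11 values; the function name `quartiles_by_position` is hypothetical.

```python
def quartiles_by_position(values):
    """Return (q1, q2, q3) at positions (n+1)/4, (n+1)/2 and 3(n+1)/4.

    Assumes n + 1 is divisible by 4, as it is for 11 values, so every
    position is a whole number and no averaging is needed.
    """
    data = sorted(values)
    n = len(data)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch only handles n + 1 divisible by 4")
    q1 = data[(n + 1) // 4 - 1]       # subtract 1 because lists are 0-indexed
    q2 = data[(n + 1) // 2 - 1]
    q3 = data[3 * (n + 1) // 4 - 1]
    return q1, q2, q3

company_a = [30, 32, 35, 41, 41, 42, 47, 49, 52, 53, 64]  # thousands of dollars
q1, q2, q3 = quartiles_by_position(company_a)
print(q1, q2, q3, q3 - q1)
```

Library routines such as `statistics.quantiles` use their own interpolation rules, so they are not guaranteed to match this hand method on every data set.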
A new employee joins company B. The mean salary of the 12 employees is now $38 500.
(c) Find the salary of the new employee.
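We are given that the new mean is $38 500 and the original number of employees is 11,
\[ 38500 = \frac{\Sigma x + x_{12}}{11 + 1} \]
\[ 38500 = \frac{\Sigma x + x_{12}}{12} \]
where Σx is the total salary of the original 11 employees and x12 is the salary of the new employee.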
Σx = 26000+47000+30000+52000+41000+38000+35000+42000+49000+31000+42000
Σx = 433000
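Substitute Σx = 433000 and solve for the new salary,
\[ x_{12} = 12 \times 38500 - 433000 = 29000 \]
Therefore, the final answer is,
$29 000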
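The same calculation as a short Python sketch, working in whole dollars. The variable names are illustrative.

```python
# Company B salaries in dollars (original 11 employees).
company_b = [26000, 47000, 30000, 52000, 41000, 38000,
             35000, 42000, 49000, 31000, 42000]
new_mean = 38500  # mean salary of the 12 employees, in dollars

total_before = sum(company_b)
total_after = new_mean * (len(company_b) + 1)

# The new employee accounts for the difference between the two totals.
new_salary = total_after - total_before
print(total_before, new_salary)
```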
3. The times, t minutes, taken by 150 students to complete a particular challenge are summarised in
the following cumulative frequency table. (9709/51/O/N/20 number 6)
Label the y-axis as cumulative frequency and the x-axis as times (t minutes). Plot
the upper bounds against the cumulative frequency. Join the plots to form an
s-shaped curve,
[Cumulative frequency graph: cumulative frequency (0 to 150) against time taken, t minutes (0 to 100)]
(b) 24% of the students take k minutes or longer to complete the challenge. Use your graph to
estimate the value of k.
This means 76% of the students take less than k minutes. Let’s find 76% of 150,
\[ \frac{76}{100} \times 150 = 114 \]
Draw construction lines at a cumulative frequency of 114 and read off the time,
k = 44 minutes
Σxf = 10 × 12 + 25 × 36 + 35 × 58 + 50 × 28 + 80 × 16
Σxf = 5730
Σx²f = 267150
The mean is,
\[ \bar{x} = \frac{5730}{150} = 38.2 \]
Substitute into the formula for standard deviation,
\[ \sigma = \sqrt{\frac{267150}{150} - 38.2^2} \]
Simplify,
σ = 17.9
x̄ = 38.2, σ = 17.9
4. A particular piece of music was played by 91 pianists and for each pianist, the number of incorrect
notes was recorded. The results are summarised in the table. (9709/53/O/N/20 number 7)
Notice how there are gaps between our classes. Let’s apply continuity correction to
get rid of the gaps,
Number of incorrect notes 0.5 − 5.5 5.5 − 10.5 10.5 − 20.5 20.5 − 40.5 40.5 − 70.5
Frequency 10 5 26 32 18
Note: To apply correction, add 0.5 to the upper bounds and subtract 0.5 from the
lower bounds.
Label the y-axis with frequency density and the x-axis with number of incorrect
notes. Plot frequency density against the number of incorrect notes, ensuring that
you use the correct class width for each class,
[Histogram: frequency density (y-axis) against number of incorrect notes (x-axis)]
(b) State which class interval contains the lower quartile and which class contains the upper
quartile. Hence find the greatest possible value of the interquartile range.
Let’s start by finding the class that contains the upper quartile,
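\[ \tfrac{3}{4}n = \tfrac{3}{4} \times 91 = 68.25 \]
The cumulative frequencies are 10, 15, 41, 73 and 91, so the 68.25th value lies in the class
21 − 40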
Now let’s find the class that contains the lower quartile,
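\[ \tfrac{1}{4}n = \tfrac{1}{4} \times 91 = 22.75 \]
The 22.75th value comes after the first 15 values and within the first 41, so the lower quartile lies in the class
11 − 20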
The formula for the interquartile range is,
IQR = q3 − q1
To find the greatest possible value of the interquartile range we have to find the
largest value of the upper quartile and smallest value of the lower quartile. The
upper quartile lies within 21 − 40 and the lower quartile lies within 11 − 20, so
IQR = 40 − 11
IQR = 29
Number of incorrect notes 0.5 − 5.5 5.5 − 10.5 10.5 − 20.5 20.5 − 40.5 40.5 − 70.5
Frequency 10 5 26 32 18
Σxf = 2448
Substitute into the formula for mean,
\[ \bar{x} = \frac{2448}{91} \]
x̄ = 26.9
5. The distances, x m, travelled to school by 140 children were recorded. The results are summarised
in the table below. (9709/52/O/N/21 number 7)
(a) On the grid, draw a cumulative frequency graph to represent these results.
Label the y-axis as cumulative frequency and the x-axis as distance (x m). Plot the
upper bounds against the cumulative frequency. Join the plots to form an s-shaped
curve,
[Cumulative frequency graph: cumulative frequency (0 to 140) against distance, x m (0 to 1600)]
(b) Use your graph to estimate the interquartile range of these distances.
The formula for interquartile range is,
IQR = q3 − q1
Draw construction lines at a cumulative frequency of 105 and read off the distance,
q3 = 685
Draw construction lines at a cumulative frequency of 35 and read off the distance,
q1 = 260
IQR = 685 − 260
IQR = 425 m
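A rough numerical cross-check of these graph readings is possible with linear interpolation between the plotted points. The sketch below assumes the graph starts at (0, 0) and joins the points with straight lines rather than a smooth curve, so its values will differ slightly from the readings 685 and 260 taken from the curve. The function name `value_at` is illustrative.

```python
# Upper bounds and cumulative frequencies from the table; (0, 0) is an
# assumed starting point for the graph.
upper_bounds = [0, 200, 300, 500, 900, 1200, 1600]
cum_freq = [0, 16, 46, 88, 122, 134, 140]

def value_at(target, xs, cfs):
    """Linearly interpolate the x-value at a given cumulative frequency."""
    points = list(zip(xs, cfs))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")

n = cum_freq[-1]
q1 = value_at(n / 4, upper_bounds, cum_freq)      # cumulative frequency 35
q3 = value_at(3 * n / 4, upper_bounds, cum_freq)  # cumulative frequency 105
print(round(q1, 1), round(q3, 1), round(q3 - q1, 1))
```

The straight-line estimate lands in the same region as the graph reading, which is a useful sanity check on a hand-drawn curve.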
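Let’s find Σxf using the midpoints of the classes,
Σxf = 70700
Σx²f = 50405000
Substitute into the formulas for mean and standard deviation and simplify,
\[ \bar{x} = \frac{70700}{140} = 505 \]
\[ \sigma = \sqrt{\frac{50405000}{140} - 505^2} = 324 \]
x̄ = 505, σ = 324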
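Because this table gives cumulative frequencies, the class frequencies have to be recovered by differencing before the midpoints can be used. A minimal Python sketch of that step, assuming the first class starts at 0 m:

```python
from math import sqrt

upper_bounds = [200, 300, 500, 900, 1200, 1600]   # distance x (m)
cum_freq = [16, 46, 88, 122, 134, 140]

# Assumption: the first class runs from 0 to 200.
lower_bounds = [0] + upper_bounds[:-1]

# Class frequency = this cumulative frequency minus the previous one.
freqs = [c - p for c, p in zip(cum_freq, [0] + cum_freq[:-1])]
mids = [(lo + hi) / 2 for lo, hi in zip(lower_bounds, upper_bounds)]

n = cum_freq[-1]
sum_xf = sum(x * f for x, f in zip(mids, freqs))
sum_x2f = sum(x * x * f for x, f in zip(mids, freqs))
mean = sum_xf / n
sd = sqrt(sum_x2f / n - mean ** 2)
print(freqs, sum_xf, sum_x2f, mean, round(sd))
```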
6. The times, t minutes, taken to complete a walking challenge by 250 members of a club are summarised
in the table. (9709/53/O/N/22 number 3)
Label the y-axis as cumulative frequency and the x-axis as time (t minutes). Plot the
upper bounds against the cumulative frequency. Join the plots to form an s-shaped
curve,
[Cumulative frequency graph: cumulative frequency (0 to 250) against time taken, t minutes (0 to 60)]
(b) Use your graph to estimate the 60th percentile of the data.
Draw construction lines at a cumulative frequency of 150 and read off the time,
t = 38 minutes
The formula for standard deviation is,
\[ \sigma = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2} \]
Σx²f = 333650
Simplify,
σ = 12.3
7. The times, to the nearest minute, of 150 athletes taking part in a charity run are recorded. The
results are summarised in the table. (9709/51/O/N/23 number 4)
Time in minutes 101 − 120 121 − 130 131 − 135 136 − 145 146 − 160
Frequency 18 48 34 32 18
Notice how there are gaps between our classes. Let’s apply continuity correction to
get rid of the gaps,
Time in minutes 100.5 − 120.5 120.5 − 130.5 130.5 − 135.5 135.5 − 145.5 145.5 − 160.5
Frequency 18 48 34 32 18
Note: To apply correction, add 0.5 to the upper bounds and subtract 0.5 from the
lower bounds.
Label the y-axis with frequency density and the x-axis with time (minutes). Plot
frequency density against the time, ensuring that you use the correct class width for
each class,
[Histogram: frequency density (y-axis) against time in minutes (x-axis)]
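The bar heights for this histogram follow from frequency density = frequency ÷ class width. A short Python sketch using the corrected boundaries from the table above (variable names are illustrative):

```python
# Corrected class boundaries (minutes) and frequencies from Question 7.
boundaries = [100.5, 120.5, 130.5, 135.5, 145.5, 160.5]
freqs = [18, 48, 34, 32, 18]

# Each class width is the gap between consecutive boundaries.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

# Frequency density = frequency / class width.
densities = [f / w for f, w in zip(freqs, widths)]

for lo, hi, d in zip(boundaries, boundaries[1:], densities):
    print(f"{lo} to {hi}: frequency density {d:.1f}")
```

Using the corrected boundaries matters here: the uncorrected class 131 − 135 looks 4 wide, but its true width is 5.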
(b) Calculate estimates for the mean and standard deviation of the times taken by the athletes.
Time in minutes 100.5 − 120.5 120.5 − 130.5 130.5 − 135.5 135.5 − 145.5 145.5 − 160.5
Frequency 18 48 34 32 18
Σxf = 19785
Now let’s find the standard deviation,
\[ \sigma = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2} \]
Simplify,
σ = 11.7
8. The heights, in cm, of the 11 players in each of two teams, the Aces and the Jets, are shown in the
following table. (9709/52/O/N/23 number 4)
Aces 180 174 169 182 181 166 173 182 168 171 164
Jets 175 174 188 168 166 174 181 181 170 188 190
(a) Draw a back-to-back stem-and-leaf diagram to represent this information with the Aces on the
left-hand side of the diagram.
We need to determine a scale for our stem. Our data ranges from 164 to 190 so our
stem will run from 16 to 19 to represent 160 to 190. Now let’s draw the stem-and-leaf
diagram, don’t forget to add the key,
   Aces |    | Jets
9 8 6 4 | 16 | 6 8
  4 3 1 | 17 | 0 4 4 5
2 2 1 0 | 18 | 1 1 8 8
        | 19 | 0
Key: 4 | 16 | 6 represents 164 cm for the Aces and 166 cm for the Jets
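To check the leaves, the diagram can be rebuilt programmatically. The sketch below is one possible way to do it, assuming the stem is the height with its units digit removed and the leaf is the units digit; the function name `back_to_back` is hypothetical.

```python
from collections import defaultdict

def back_to_back(left, right):
    """Return {stem: (left_leaves, right_leaves)}, leaves in ascending order.

    Stems with no data on either side are not generated.
    """
    rows = defaultdict(lambda: ([], []))
    for side, values in enumerate((left, right)):
        for v in sorted(values):
            rows[v // 10][side].append(v % 10)
    return dict(sorted(rows.items()))

aces = [180, 174, 169, 182, 181, 166, 173, 182, 168, 171, 164]
jets = [175, 174, 188, 168, 166, 174, 181, 181, 170, 188, 190]
rows = back_to_back(aces, jets)

for stem, (left_leaves, right_leaves) in rows.items():
    # Left-hand leaves are written outwards from the stem, so reverse them.
    left_text = " ".join(str(d) for d in reversed(left_leaves))
    right_text = " ".join(str(d) for d in right_leaves)
    print(f"{left_text:>9} | {stem} | {right_text}")
```

A key still has to be added by hand, since the code only arranges the leaves.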
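Identify the data point in position 6,
q2 = 173 cm
Now let’s find the interquartile range. The formula for interquartile range is,
IQR = q3 − q1
The upper quartile is the data point in position 9 and the lower quartile is the data point in position 3,
IQR = 181 − 168
IQR = 13 cm
q2 = 173 cm, IQR = 13 cm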
(c) Give one comment comparing the spread of the heights of the Aces with the spread of the
heights of the Jets.
If you look at the stem-and-leaf diagram above, you will notice that the Jets have
a wider range (166 to 190) than the Aces (164 to 182).
The Jets have a wider range of heights, (190 − 166) = 24, than the Aces, (182 − 164) = 18.
9. The weights, x kg, of 120 students in a sports college are recorded. The results are summarised in
the following table. (9709/53/O/N/23 number 4)
Label the y-axis as cumulative frequency and the x-axis as weights (x kg). Plot the
upper bounds against the cumulative frequency. Join the plots to form an s-shaped
curve,
[Cumulative frequency graph: cumulative frequency (0 to 120) against weight, x kg (40 to 100)]
(b) It is found that 35% of the students weigh more than W kg. Use your graph to estimate the
value of W .
This means that 65% of the students weigh less than W kg. Let’s find 65% of 120,
\[ \frac{65}{100} \times 120 = 78 \]
Draw construction lines at a cumulative frequency of 78 and read off the weight,
W = 76 kg
Σxf = 8545
Σx²f = 625062.5
Simplify,
σ = 11.8
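10. Helen measures the lengths of 150 fish of a certain species in a large pond. These lengths, correct
to the nearest centimetre, are summarised in the following table. (9709/52/F/M/20 number 7)
Notice how there are gaps between our classes. Let’s apply continuity correction to
get rid of the gaps, and let’s also find the cumulative frequency,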
Length (cm) 0 − 9.5 9.5 − 14.5 14.5 − 19.5 19.5 − 30.5
Cumulative Frequency 15 63 129 150
Label the y-axis as cumulative frequency and the x-axis as Length (cm). Plot the
cumulative frequency against the upper bounds of the length. Join the plots to form
an s-shaped curve,
[Cumulative frequency graph: cumulative frequency (0 to 150) against length in cm (0 to 30)]
(b) 40% of these fish have a length of d cm or more. Use your graph to estimate the value of d.
This means that 60% of these fish have a length less than d cm. Let’s find 60% of 150,
\[ \frac{60}{100} \times 150 = 90 \]
Draw construction lines at a cumulative frequency of 90 and read off the length,
d = 16.5 cm
Let’s start by drawing a table with the midpoints of the classes against their frequencies,
Mid Interval 4.75 12 17 25
Frequency 15 48 66 21
Σx²f = 39449.4375
Simplify,
σ² = 29.1
11. The following table gives the weekly snowfall, in centimetres, for 11 weeks in 2018 at two ski resorts,
Dados and Linva. (9709/52/O/N/20 number 5)
Dados 6 8 12 15 10 36 42 28 10 22 16
Linva 2 11 15 16 0 32 36 40 10 12 9
We need to determine a scale for our stem. Our data ranges from 0 to 42 so our
stem will run from 0 to 4 to represent 0 to 40. Now let’s draw the stem-and-leaf
diagram, don’t forget to add the key,
    Dados |   | Linva
      8 6 | 0 | 0 2 9
6 5 2 0 0 | 1 | 0 1 2 5 6
      8 2 | 2 |
        6 | 3 | 2 6
        2 | 4 | 0
Key: 6 | 0 | 2 represents 6 cm for Dados and 2 cm for Linva
q2 = 15 cm
Now let’s find the interquartile range. The formula for interquartile range is,
IQR = q3 − q1
Find the data point in position 9,
q3 = 28
Find the data point in position 3,
q1 = 10
IQR = 28 − 10
IQR = 18 cm
q2 = 15 cm, IQR = 18 cm
(c) The median, lower quartile and upper quartile of the weekly snowfall for Linva are 12, 9 and
32cm respectively. Use this information and your answers to part (b) to compare the central
tendency and the spread of the weekly snowfall in Dados and Linva.
To compare the spread, we need the interquartile range. Let’s find the interquartile
range for Linva,
IQR = q3 − q1
IQR = 32 − 9
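IQR = 23 cm
The median snowfall is higher in Dados (15 cm) than in Linva (12 cm), so Dados has more snow on
average. The interquartile range is larger in Linva (23 cm) than in Dados (18 cm), so the snowfall in
Linva is more spread out.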
12. The times taken, in minutes, by 360 employees at a large company to travel from home to work are
summarised in the following table. (9709/53/O/N/21 number 3)
Label the y-axis with frequency density and the x-axis with time taken (minutes).
Plot the frequency density against the classes, ensuring each class has its respective
class width,
[Histogram: frequency density (y-axis) against time taken in minutes (x-axis, 0 to 50)]
(b) Calculate an estimate of the mean time taken by an employee to travel to work.
Time, t minutes 0 ≤ t < 5 5 ≤ t < 10 10 ≤ t < 20 20 ≤ t < 30 30 ≤ t < 50
Frequency 23 102 135 76 24
Σxf = 5707.5
Substitute into the formula for mean,
\[ \bar{x} = \frac{5707.5}{360} \]
x̄ = 15.9 minutes