Topic 4A. Descriptive Statistics - Probability


International Baccalaureate

MATHEMATICS
Applications and Interpretation SL (and HL)
Lecture Notes

Christos Nikolaidis

TOPIC 4
STATISTICS AND PROBABILITY

4A. Descriptive statistics - Probability

4.1  BASIC CONCEPTS OF STATISTICS

4.2  MEASURES OF CENTRAL TENDENCY AND SPREAD

4.3  FREQUENCY TABLES – GROUPED DATA

4.4  REGRESSION

4.5  ELEMENTARY SET THEORY

4.6  PROBABILITY

4.7  CONDITIONAL PROBABILITY – INDEPENDENT EVENTS

4.8  TREE DIAGRAMS

4.9  DISTRIBUTIONS - DISCRETE RANDOM VARIABLES

4.10 BINOMIAL DISTRIBUTION – B(n,p)

4.11 NORMAL DISTRIBUTION – N(μ,σ²)

Only for HL

4.12 POISSON DISTRIBUTION – Po(m)

4.13 MARKOV CHAINS

October 2021

4.1 BASIC CONCEPTS OF STATISTICS

In Statistics we deal with data collection, presentation, analysis and
interpretation of results. Data can come from a

Population (the entire list of a specified group)

Sample (a subset of the Population)

We usually investigate a small sample of the population in order to
draw conclusions about the whole population.

Numerical data can be

Discrete (a finite or countable set), e.g. {10,20,30} or {0,1,2,3,…}
OR
Continuous (an interval), e.g. [40,100] or R

Data can be organized in several ways. We present some examples below.

Frequency table (the same data can also be shown in a pie chart):

Colored Balls   Frequency
Blue            13
Green           8
Red             10
Yellow          3


Bar graph (for discrete data): the frequencies above displayed as bars.

Colored Balls   Freq
Blue            13
Green           8
Red             10
Yellow          3

Histogram (for continuous data)

Age Frequency
[0,10) 7
[10,20) 5
[20,30) 1
[30,40) 3

Stem and leaf Diagram

Key: 1|3 represents 13

Data: 12, 14, 16, 16, 20, 21, 21, 21, 25, 32, 39, 40, 43, 44, 47, 48, 49, 53

Stem | Leaf
  1  | 2, 4, 6, 6
  2  | 0, 1, 1, 1, 5
  3  | 2, 9
  4  | 0, 3, 4, 7, 8, 9
  5  | 3


As far as sampling is concerned, it is crucial to select a sample
which is not biased. There are several sampling techniques which
address this bias.

Suppose that we have a population of 100,000 people and wish to
select a sample of 1000 people. If we select the first 1000 in a list,
or the youngest 1000, there is certainly a bias in our selection.

Simple random sampling:  We select 1000 people “out of a hat”;
                         each member has an equal probability of selection.

Systematic sampling:     Since 100000/1000 = 100 (= period),
                         we pick a random starting point (e.g. the 20th
                         person) and then every 100th person
                         (i.e. 20th, 120th, 220th, …).

Stratified sampling:     We divide the population into subgroups
                         (say men and women, or under and over
                         40 years old). We pick a sample from each group.

Quota sampling:          As in stratified sampling, but we pick samples
                         in proportion to the size of each subgroup in
                         the population.

There are advantages and disadvantages in each method. Simple
random sampling is fair but it may be very time consuming
compared to systematic sampling. In systematic sampling, though,
if there is a periodic pattern in the population there may be a bias.
Suppose that the 100,000 people are arranged in groups of 100. If the
first person of each group is the leader, then the sampling method of
selecting every 100th person may provide a sample of only leaders
or of no leaders at all.
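The two random sampling methods above can also be illustrated with a short
Python sketch (not in the syllabus; the population of 100,000 numbered people
below is hypothetical):

import random

population = list(range(1, 100001))   # hypothetical population of 100,000 people
sample_size = 1000
period = len(population) // sample_size          # 100000/1000 = 100

# Simple random sampling: every member has an equal chance of selection
simple_sample = random.sample(population, sample_size)

# Systematic sampling: random starting point, then every 100th person
start = random.randrange(period)                 # e.g. the 20th person
systematic_sample = population[start::period]    # 20th, 120th, 220th, ...

print(len(simple_sample), len(systematic_sample))  # 1000 1000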


4.2 MEASURES OF CENTRAL TENDENCY AND SPREAD


Consider the following numerical data1:

10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

The total number of entries is n=11.

In order to describe these data we use

 3 measures of central tendency


 3 measures of spread

The first three measures indicate a representative central value
which best describes the data, while the last three measures
indicate whether our data are close together or spread out.

 MEASURES OF CENTRAL TENDENCY (The 3 M’s)

A) MEAN = The sum of all values divided by n.

Here
   mean = (10+20+20+20+30+30+40+50+70+70+80)/11 = 440/11 = 40

B) MODE = the most frequent value


Here
mode = 20

C) MEDIAN = The value in the middle


(provided they have been placed in ascending order).

Here, it is the sixth number in the list

median = 30

1 This set of values is either a population or a sample.


NOTICE
 For the data 10, 20, 30
Median = 20
For the data 10, 20, 30, 40
Median = 25
That is, for an even number of data,
median = the mean of the two middle values

 The median is not the (n/2)-th entry as one would possibly expect;
  the median is the ((n+1)/2)-th entry.

  For example,
  if n=11, (n+1)/2 = 6, thus the median is the 6th entry. See the
  example above;
  if n=10, (n+1)/2 = 5.5, thus the median is the mean of the 5th and
  6th entries; for the 10 entries

10, 20, 30, 40, 50, 60, 70, 80, 90, 100

the median is the mean of 50 and 60. Hence median = 55

The median is also denoted by Q2 (the index 2 will be clarified soon)

 The mean is denoted by μ (or by x̄). In fact, we use

   the Greek letter μ for the whole population;
   the Latin letter x̄ for a sample of the population.

If our data are denoted by x1, x2, …, xn, the mean is given by

   μ = (x1 + x2 + x3 + ⋯ + xn)/n      or otherwise      μ = (Σxi)/n


EXAMPLE 1
Find
a) the integers a ≤ b ≤ c, given that mean=4, mode=5, median=5.

   The median implies that b=5. The mode implies that also c=5.
   Then (a+5+5)/3 = 4    a+10 = 12    a = 2

   Therefore, the numbers are 2, 5, 5.

b) the integers a ≤ b ≤ c ≤ d, given that mean=5, mode=7, median=6.

   The median implies that either b=c=6 or (b=5 and c=7).
   Since the mode is 7 we obtain b=5 and c=d=7.
   Then (a+5+7+7)/4 = 5    a+19 = 20    a = 1

   Therefore, the numbers are 1, 5, 7, 7.

 MEASURES OF SPREAD
We use the same set of data

10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

A) STANDARD DEVIATION

The standard deviation is perhaps the most “reliable” measure


for spread, as it takes all data into consideration. It measures
how far the entries are from the mean. It can be found by using
the GDC (directions will be given later on).

The standard deviation is denoted2 either by σ or by sn .

For our example the GDC gives σ = 22.96.

2 In fact,
the Greek letter σ is used for the whole population;
the Latin letter sn is used for a sample of the population


B) RANGE = (maximum value) - (minimum value)


Here
range = 80-10 = 70

C) INTERQUARTILE RANGE = IQR = Q3 – Q1


where
Q1 = LOWER QUARTILE = the median of the values before Q2
Q3 = UPPER QUARTILE = the median of the values after Q2

Here, before the median Q2=30, we have 5 numbers, hence


Q1=20 (this is the 3rd entry)
Also,
Q3=70 (it is the 3rd entry from the end)
Therefore,
IQR = 70-20 = 50

As the estimation of the values Q1, Q2, Q3 is quite tricky, let us see
some extra cases in the following example.

EXAMPLE 2

Remember that
 for the value of the median Q2 we consider the ((n+1)/2)-th entry;
 for the values of Q1 and Q3 we consider only the entries before
   and the entries after the median respectively.
a) For n=7 entries: 10, 20, 30, 40, 50, 60, 70
The median is Q2=40 (the 4th entry). Hence Q1=20, Q3=60.

b) For n=8 entries: 10, 20, 30, 40, 50, 60, 70, 80
The median is Q2=45 (the 4.5th entry). Hence Q1=25, Q3=65.

c) For n=9 entries: 10, 20, 30, 40, 50, 60, 70, 80, 90
The median is Q2=50 (the 5th entry). Hence Q1=25, Q3=75.

d) For n=10 entries: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
Then Q2=55 (the 5.5th entry). Hence Q1=30, Q3=80.


NOTICE
The square of the standard deviation is called variance. That is

   variance = σ² or sn²

For our example, σ² = 22.96² = 527.27

 USE OF GDC

We can use the GDC to easily obtain all these measures.


For Casio CFX we select
 MENU
 STAT
 Complete List 1 with values of x (our data)
 CALC
 (1VAR): We obtain all the statistics.

Notice that
The standard deviation in the GDC is denoted by σχ
The variance is not given; it is simply the square of σχ
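The GDC output can also be checked with a short Python script (not in the
syllabus). The sketch below uses the quartile convention of these notes,
i.e. Q1 and Q3 are the medians of the lower and upper halves of the ordered data:

from statistics import mean, median, mode, pstdev

data = sorted([10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80])
n = len(data)

# lower/upper halves exclude the middle entry when n is odd
half = n // 2
q1 = median(data[:half])
q3 = median(data[-half:])

print(mean(data))    # 40
print(median(data))  # 30
print(mode(data))    # 20
print(pstdev(data))  # 22.96...  (population standard deviation σ)
print(q3 - q1)       # IQR = 70 - 20 = 50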

 BOX AND WHISKER PLOT


Consider again the initial example
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

On an appropriate horizontal scale we mark 5 values:

   min, Q1, Q2, Q3, max

in the following way:

min Q1 Q2 Q3 max


This diagram is helpful, particularly when we have a large number


of entries. It shows the “density” of data within the whole range. In
fact, the box plot splits the whole range of data in 4 intervals.
Generally speaking, each interval contains 25% of the entries. Thus
the following conclusions can be drawn:

The lowest 25% is below Q1 The upper 25% is above Q3


The lowest 50% is below Q2 The upper 50% is above Q2
The middle 50% is between Q1 and Q3

 MORE DETAILS

1) Percentiles
The values Q1, Q2, Q3 are also called
Q1 : 25th-percentile
Q2 : 50th-percentile
Q3 : 75th-percentile

Other percentiles may also be defined in a similar way; we will give


further examples in the next paragraph.

2) Outliers

Very extreme values in a set of data (that is, very small or very
large values) may give a false impression of our data. They are known as
outliers. We agree that

an outlier is any value


below Q1 – 1.5×IQR
or above Q3 + 1.5×IQR,

Such a value is viewed as being too far from the central values to
be reasonable. In our example,

Q1 - 1.5×IQR = 20 - 1.5×50 = - 55

Q3 + 1.5×IQR = 70 + 1.5×50 = 145

i.e. there are no outliers.


 FORMULAS FOR VARIANCE AND STANDARD DEVIATION


(not in the syllabus)

The formulas are not in the syllabus. We give them just for
information.

If our data are x1, x2, …, xn

   the variance is given by            σ² = Σ(xi − μ)²/n

   the standard deviation is given by  σ = √( Σ(xi − μ)²/n )

For our example, since μ=40,

   variance = [(10-40)² + (20-40)² + (20-40)² + ⋯ + (80-40)²]/11 = 527.27

   standard deviation = √527.27 = 22.96

An alternative and more practical formula for the variance is given by

   σ² = (Σxi²)/n − μ²

For our example,

   variance = (10² + 20² + 20² + ⋯ + 80²)/11 − 40² = 527.27


4.3 FREQUENCY TABLES – GROUPED DATA

Consider again the numerical data:


10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
The total number of entries is n=11.

An alternative way of presentation is the frequency table:

Data Frequency
x f
10 1
20 3
30 2
40 1
50 1
70 2
80 1
n=11

Let us study again the basic measures for these data.

 MEASURES OF CENTRAL TENDENCY (The 3 M’s)

A) MEAN = The sum of all values divided by n.

The MEAN is given by

   mean = (1×10 + 3×20 + 2×30 + 1×40 + 1×50 + 2×70 + 1×80)/11 = 440/11 = 40

In general, given that fi is the frequency of the entry xi, the formula is

   μ = (f1x1 + f2x2 + f3x3 + ⋯)/n      or otherwise      μ = (Σfixi)/n


B) MODE = the most frequent value


It is very obvious now. The entry x of the highest frequency is
mode = 20

C) MEDIAN = The value in the middle

It is still the entry in position (n+1)/2, that is the 6th entry.

We can easily see that this is 30.

It helps here to add an extra column in the table above with the
so-called cumulative frequencies:

Data Frequency Cumulative


x f frequency (c.f.)
10 1 1
20 3 4
30 2 6
40 1 7
50 1 8
70 2 10
80 1 11
n=11

It simply gives the total number of entries up to each row. For


example, the total number of entries up to 20 is 1+3=4.
The MEDIAN, i.e. the 6th entry, is 30.

 MEASURES OF SPREAD

A) STANDARD DEVIATION

Again, it can be directly obtained by the GDC.

For our example the GDC gives σ = 22.96.

Thus the variance is σ2 = 527.27


B) RANGE = (maximum value of x) - (minimum value of x)


It is very obvious here
range = 80-10 = 70

C) INTERQUARTILE RANGE = IQR = Q3 – Q1


The cumulative frequency table helps here as well.

The median Q2=30 is in the 6th position.

Thus, before the median we have 5 entries. Since (5+1)/2 = 3,
   Q1=20 (this is the 3rd entry)
and
Q3=70 (this is the 3rd entry from the end)
Therefore,
IQR = 70-20 = 50

 USE OF GDC

We can use the GDC to easily obtain all these measures.


For Casio CFX we select
 MENU
 STAT
 Complete List 1 with values of x (our data)
List 2 with frequencies
 CALC
 SET: we check the first two lines
The first line is OK. (1Var XList :List1)
For the second line (1Var Freq :----), select between
F1: enter 1, if there are no frequencies
F2: enter List 2 to consider frequencies
 Go back (EXIT)
 1VAR: We obtain all the statistics.

Check the value of n first (number of entries), to ensure that


all data have been considered.


NOTICE (for the GDC)


 The variance is not given; it is simply the square of σχ
 Since the GDC gives minX,Q1,Med,Q3,maxX remember that

Range = maxX – minX      Interquartile Range = Q3 – Q1

The box and whisker plot uses exactly those 5 measures


 Extra information given:
Σx : the sum of all entries, i.e. x1+x2+x3+…
Σx²: the sum of the squares, i.e. x1²+x2²+x3²+…
sχ : it is known as unbiased st. deviation (not in the syllabus!)

 GROUPED DATA

Suppose that 100 students took an exam and obtained scores from
1 to 60 (full marks), according to the following table:

Score (x)      Midpoint (for x)   No of students (frequency f)   Cumulative frequency (cf)

0 < x ≤ 10            5                     8                               8
10 < x ≤ 20          15                    12                              20
20 < x ≤ 30          25                    10                              30
30 < x ≤ 40          35                    25                              55
40 < x ≤ 50          45                    35                              90
50 < x ≤ 60          55                    10                             100
                                         n=100

i.e. 8 students obtained scores from 1 up to 10, and so on.

 The mean and the standard deviation are still calculated as in a
   usual frequency table, but now x1, x2, x3, … are the midpoints of
   the intervals. For example,

   μ = (8×5 + 12×15 + 10×25 + 25×35 + 35×45 + 10×55)/100 = 34.7


These measures may also be obtained by the GDC, where the


LIST1 contains the midpoints of x. Here,
μ=34.7 σ =14.31
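A minimal Python sketch (not in the syllabus) that reproduces the grouped-data
mean and standard deviation from the midpoints and frequencies, in the same way
the GDC uses List 1 and List 2:

midpoints   = [5, 15, 25, 35, 45, 55]
frequencies = [8, 12, 10, 25, 35, 10]

n = sum(frequencies)                                         # 100
mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n

print(mean)             # 34.7
print(variance ** 0.5)  # about 14.3  (σ)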

 Moreover, instead of the mode we have the modal group here.


That is the interval of the highest frequency. In our example, the
modal group is 40 < x ≤ 50.

 For the median Q2 and the quartiles Q1 and Q3:


we need to draw the so-called cumulative frequency diagram
x-axis: values of x (we consider upper bounds of intervals)
y-axis: cumulative frequencies

   x (up to):   ≤10    ≤20    ≤30    ≤40    ≤50    ≤60
   y (c.f.):      8     20     30     55     90    100

From the diagram: Q1 = 25   Q2 = 38   Q3 = 46

For the estimation of Q1, Q2, Q3 follow


Step 1: Divide y-axis into four equal parts


(Here we divide at y=25, y=50, y=75)
Step 2: Draw three horizontal lines until you meet the curve
Step 3: Draw three vertical lines from the intersection points
Obtain Q1, Q2, Q3 on the x-axis (see above)

Below that graph we can easily draw the box and whisker plot:

Min=0 Q1=25 Q2=38 Q3=46 Max=60

 Remember that the values Q1, Q2, Q3 are also called


Q1 : 25th-percentile
Q2 : 50th-percentile
Q3 : 75th-percentile

In the same way we can find any percentile. For example, for the
40th-percentile
Estimate 40% of n: here 40% of 100 students is 40;
Draw a horizontal line at y=40 until you meet the curve;
Then draw a vertical line;
Hence
40th-percentile = 35.

In other words, 40% of the students have scores below 35.

 Let us check if there are outliers:


IQR = 46-25=21

Q1 - 1.5×IQR = 25 - 1.5×21 = - 6.5

Q3 + 1.5×IQR = 46 + 1.5×21 = 77.5

There are no scores lower than -6.5 or greater than 77.5, that is
there are no outliers.


4.4 REGRESSION

We have a list of paired data. For example

x 10 12 15 20 23 28 30
y 120 135 174 213 270 301 305

We assume that x is the independent variable, y is the dependent


variable. Let us also see these points (x,y) on a scatter diagram.
[scatter diagram of the seven points (x,y)]

The main question here is whether there is a linear relationship


between the values of x and the corresponding values of y.

There is a parameter r, called correlation coefficient3 that gives the


extent of this relationship. It takes values
-1 ≤ r ≤ 1

The closer r is to the ends ±1, the more our data are linearly related
(-1 implies a negative slope while +1 implies a positive slope).
The closer r is to 0, the less our data are linearly related.

There is also a line y=ax+b that best fits our data; it is known as
regression line. We can easily obtain these details by using a GDC.

3It is known as Pearson’s product-moment correlation coefficient


 USE OF GDC

For Casio CFX we select


 MENU
 STAT
 Complete List 1 with values of x; List 2 with values of y
 CALC
 REG
 X
 aX+b : look at the values of a,b,r.

For our example,

   r = 0.99 : there is a very strong correlation between x and y
   a = 9.83, b = 23.1 : the regression line is y = 9.83x + 23.1

[scatter diagram of the points together with the line y = 9.83x + 23.1]

By using the regression line y=f(x) we may predict values of y


corresponding to values of x that are not in the list. For example

   for x=18, we estimate y = 9.83×18 + 23.1 ≈ 200

   for x=40, we estimate y = 9.83×40 + 23.1 ≈ 416

Notice that x=18 is within the range of our list while x=40 is not.
f(18)=200 is known as interpolation, f(40)=416 as extrapolation.
In general, interpolations are more reliable than extrapolations.
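For readers who want to verify the GDC values, a short Python sketch (not in
the syllabus, and assuming the scipy library is available) that computes a, b
and r for the data above:

from scipy import stats

x = [10, 12, 15, 20, 23, 28, 30]
y = [120, 135, 174, 213, 270, 301, 305]

result = stats.linregress(x, y)
print(result.slope)      # a ≈ 9.83
print(result.intercept)  # b ≈ 23.1
print(result.rvalue)     # r ≈ 0.99

# interpolation at x=18 and extrapolation at x=40
print(result.slope * 18 + result.intercept)   # ≈ 200
print(result.slope * 40 + result.intercept)   # ≈ 416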

Notice. In order to predict a value of x corresponding to a given y


we do not use the same regression line. We find a new regression
line for x on y. In our example, the GDC gives x =0.0997y-1.92


 CHARACTERISTICS OF THE REGRESSION LINE y=ax+b


The regression line
 passes through the point M(x̄, ȳ), where
     x̄ = the mean of the values of x
     ȳ = the mean of the values of y
 separates the points into (almost) two halves: half of the points
   are above and half below the line.

The values of x̄, ȳ can also be obtained by the GDC (together with
other statistics). In the STAT mode, after inserting the values of x
and y, select

 CALC
 2VAR: We obtain all the statistics, separately for x’s and y’s

In our example
   x̄ = 19.7    ȳ = 216.9
Thus the line passes through the point M(19.7, 216.9).

 CHARACTERISTICS OF THE CORRELATION COEFFICIENT r

The correlation between x and y is characterised according to the


value of r as follows:

   -1    to -0.75 : strong negative correlation
   -0.75 to -0.5  : moderate negative correlation
   -0.5  to -0.25 : weak negative correlation
   -0.25 to  0.25 : very weak or no correlation
    0.25 to  0.5  : weak positive correlation
    0.5  to  0.75 : moderate positive correlation
    0.75 to  1    : strong positive correlation

To better understand the correlation coefficient r, let us see some


characteristic cases (find the results below in your GDC for
practice).


Data                          Results

x: 1, 2, 3, 4, 5              r = 1, perfect positive correlation
y: 2, 4, 6, 8, 10             Regression line: y = 2x

x: 1, 2, 3, 4, 5              r = -1, perfect negative correlation
y: 10, 8, 6, 4, 2             Regression line: y = -2x + 12

Let us slightly modify our data

x: 1, 2, 3, 4, 5              r = 0.98, strong positive correlation
y: 2, 3, 7, 8, 10             Regression line: y = 2.1x - 0.3

x: 1, 2, 3, 4, 5              r = -0.98, strong negative correlation
y: 10, 8, 7, 3, 2             Regression line: y = -2.1x + 12.3

and a final extreme case

x: 1, 2, 3, 4, 5              r = 0, no correlation at all
y: 8, 2, 5, 2, 8              Regression line: y = 5


 SPEARMAN’S RANK CORRELATION COEFFICIENT rs

The Spearman correlation coefficient is defined as the Pearson


correlation coefficient (seen above) between the rank variables.

Look at the example


x y
10 105
20 103
30 125
40 130
50 128

The Pearson correlation coefficient is r = 0.88.

Let us also observe the ranks of the data

x     y      rank of x   rank of y
10    105        1           2
20    103        2           1
30    125        3           3
40    130        4           5
50    128        5           4

The correlation coefficient between the last two columns is 0.8.

This is the Spearman correlation coefficient rs.


r = 0.88 rs = 0.8

Pearson's correlation coefficient (r) indicates the degree of linear


relationship between two variables.

Spearman's correlation coefficient (rs) indicates the degree of
monotonic relationship between the variables (either linear or not),
that is, to what extent y increases as x increases.

In the case of equal data values we use average ranks. For example, if
the first four values for y are equal, say 20, 20, 20, 20

instead of ranks 1, 2, 3, 4
we use the ranks 2.5, 2.5, 2.5, 2.5.


It is possible to have r ≠ 1 and rs = 1.

Look at one of the examples we saw earlier

x: 1, 2, 3, 4, 5              r = 0.98, strong positive correlation
y: 2, 3, 7, 8, 10             Regression line: y = 2.1x - 0.3

We write down the ranks for our data:

x     y      rank of x   rank of y
1     2          1           1
2     3          2           2
3     7          3           3
4     8          4           4
5     10         5           5

Here
r = 0.98
rs = 1 (perfect monotonic relationship)

since y always increases as x increases.

Spearman’s correlation coefficient is less sensitive to outliers than


Pearson’s product moment correlation coefficient.
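Both coefficients can be checked with a short Python sketch (not in the
syllabus, assuming scipy is available) for the first table of this paragraph:

from scipy import stats

x = [10, 20, 30, 40, 50]
y = [105, 103, 125, 130, 128]

r, _ = stats.pearsonr(x, y)     # Pearson's product-moment correlation r
rs, _ = stats.spearmanr(x, y)   # Spearman's rank correlation rs

print(round(r, 2))   # 0.88
print(round(rs, 2))  # 0.8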


4.5 ELEMENTARY SET THEORY

 BASIC NOTIONS
In elementary set theory, a set is just a collection of objects (or
elements). It is usually denoted by a capital letter. For example,

R = the set of real numbers


Q = the set of rational numbers

When listed, the elements of a set are separated by commas “,”


and included between the symbols { and }. For example,

N = {0,1,2,3,4,…} (i.e. the set of natural numbers)


Z = {…,-3,-2,-1,0,1,2,3,…} (i.e. the set of all integers)

or less popular sets, such as

A = {1,2,3} (it contains only 3 elements)


B = {a,b,c,d} (it contains 4 letters)
C = {Chris, Mary, Tom} (it contains 3 names)
etc

To declare that the element a is contained in set B we write

   a ∈ B

To declare that the element f is not contained in set B we write

   f ∉ B

The most trivial set is the empty set. It contains no elements; it is
denoted by { } or by the symbol ∅.

Let us consider the set A = {1,2,3}. The subsets of A are sets that
contain some (or none or all) elements of A. There are 8 subsets:

   ∅
   {1}, {2}, {3}
   {1,2}, {1,3}, {2,3}
   {1,2,3}


In general,
   if A contains n elements, there are 2ⁿ subsets.

Indeed, here, A contains 3 elements and possesses 2³ = 8 subsets.

If A = {1,2,3} and B = {1,2}, to declare that B is a subset of A, we write

   B ⊆ A

Do not forget that always

   ∅ ⊆ A   (The empty set is a subset of any set)
   A ⊆ A   (Any set is a subset of itself)

All subsets of A except A itself are also called proper subsets. To
emphasize that B is a proper subset of A we write

   B ⊂ A

 VENN DIAGRAMS
We usually refer to a large set S, called universal set, and consider
several subsets of S.
Let
S = { a,b,c,d,e,f,g,h,i,j }

be our universal set. We consider the subset


A = {a,b,c,d,e}

A helpful way to present this information is by using a Venn


diagram:

[Venn diagram: the elements a, b, c, d, e lie inside A; the elements f, g, h, i, j lie outside A]


If we also consider the subset


B = {d,e,f,g}
the Venn diagram becomes

[Venn diagram: d, e lie in the overlap of A and B; a, b, c in A only; f, g in B only; h, i, j outside both]

As we usually deal with large universal sets, in a Venn diagram we


are not interested so much for the elements themselves but only for
the number of elements in each region. In this case the Venn
diagram above takes the form

[Venn diagram with n(S)=10: region A only 3, region A∩B 2, region B only 2, and 3 elements outside]

We denote by
n(A) = the number of elements of set A

In our example
n(S) = 10
n(A)=5 n(B)=4

Notice that the number n(A)=5 does not appear on the Venn
diagram. The subset A consists of two regions of size 3 and 2, thus
n(A)=3+2=5


Now we can study some basic operations between sets. Let us refer
again to our example where S = { a,b,c,d,e,f,g,h,i,j } and
A = {a,b,c,d,e}
B = {d,e,f,g}

 THE COMPLEMENT OF A: A΄ (not A)


It contains the elements that are not in A.

[Venn diagram: the region outside A is shaded]

In our example A΄ = {f,g,h,i,j}

Sometimes the complement of A is also denoted by Ā.

 THE UNION OF A AND B: A∪B (A or B)

It contains all the elements that are either in A or in B.

[Venn diagram: both circles A and B are shaded]

In our example A∪B = {a,b,c,d,e,f,g}

 THE INTERSECTION OF A AND B: A∩B (A and B)

It contains the common elements of A and B.

[Venn diagram: only the overlap of A and B is shaded]

In our example A∩B = {d,e}


 A BASIC PROPERTY

   n(A∪B) = n(A) + n(B) – n(A∩B)

Indeed, in our example

   n(A∪B)=7, n(A)=5, n(B)=4, n(A∩B)=2

Notice that A∪B contains 7 elements, not 5+4=9, as in n(A)+n(B)
we count the common elements twice. Thus,

   7 = 5 + 4 – 2
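These set operations and the basic property can be illustrated with a short
Python sketch (not in the syllabus), using the sets of our example:

S = set("abcdefghij")     # universal set
A = set("abcde")
B = set("defg")

print(sorted(A | B))      # union A∪B: ['a','b','c','d','e','f','g']
print(sorted(A & B))      # intersection A∩B: ['d','e']
print(sorted(S - A))      # complement A΄: ['f','g','h','i','j']

# n(A∪B) = n(A) + n(B) - n(A∩B)
print(len(A | B) == len(A) + len(B) - len(A & B))   # True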

 MUTUALLY EXCLUSIVE SETS

If A∩B = ∅, then n(A∩B) = 0

[Venn diagram: A and B are disjoint circles]

In this case only

   n(A∪B) = n(A) + n(B)

and the two sets A and B are said to be mutually exclusive.


4.6 PROBABILITY

We start again with a universal set S. In probability theory this set


is known as the sample space; it contains all possible outcomes of a
game, or experiment, etc. The subsets A,B, … of the sample space S
are called events.

Consider the sample space S. The number of elements in S, that is
n(S), is denoted by TOTAL. The probability of some event A is
simply defined by

   P(A) = n(A)/TOTAL

For example, in the following Venn diagram, the sample space S
contains 100 elements, while the event A contains 30 elements.

[Venn diagram: n(S)=100, with 30 elements inside A and 70 outside]

   P(A) = n(A)/TOTAL = 30/100 = 0.3

In simple words, if we choose an element from S at random
(provided that every element is equally likely to be selected), the
probability that this element belongs to A is 30 out of 100, in other
words 30% (that is 0.3).

We understand that
   0 ≤ P(A) ≤ 1
Clearly
   P(∅) = 0 and P(S) = 1


 COMPLEMENTARY EVENTS
In our example above P(A΄) = 0.7
In general
P(A΄) = 1- P(A)

 COMBINED EVENTS
Remember the basic property for combined events

   n(A∪B) = n(A) + n(B) – n(A∩B)

If we divide all terms by the TOTAL we obtain

   P(A∪B) = P(A) + P(B) – P(A∩B)

For example, consider

[Venn diagram with n(S)=100: A only 20, A∩B 10, B only 30, outside 40]

Then
   P(A) = 0.3, P(B) = 0.4, P(A΄) = 0.7, P(B΄) = 0.6
Also
   P(A∩B) = 0.1, P(A∪B) = 0.6
Clearly
   P(A∪B) = P(A) + P(B) – P(A∩B)
   0.6 = 0.3 + 0.4 – 0.1

A Venn diagram may also contain probabilities instead of numbers
of elements. The Venn diagram above takes the form

[Venn diagram: A only 0.2, A∩B 0.1, B only 0.3, outside 0.4]


 MUTUALLY EXCLUSIVE EVENTS

[Venn diagram: A and B are disjoint circles]

We have seen that two events are mutually exclusive if

   A∩B = ∅ or equivalently n(A∩B) = 0;
equivalently, if
   P(A∩B) = 0
In this case only
   P(A∪B) = P(A) + P(B)

EXAMPLE 1
Given that P(A) = 0.5, P(B) = 0.3, P(A∪B) = 0.6, let us construct a
Venn diagram representing the combined events A and B.
Notice that
   P(A∪B) ≠ P(A) + P(B)
   0.6 ≠ 0.8
The difference implies the existence of an intersection; P(A∩B) = 0.2.
Starting from the intersection 0.2 we may easily complete the
following Venn diagram

[Venn diagram: A only 0.3, A∩B 0.2, B only 0.1, outside 0.4]

After completing the Venn diagram, we are in a position to answer
any probability question. For example
   P(A∩B΄) = 0.3    P(A΄∩B) = 0.1    P(A΄∩B΄) = 0.4
   P(A∪B΄) = 0.9    P(A΄∪B) = 0.7    P(A΄∪B΄) = 0.8


 TABLES
Another way to represent sets in order to find probabilities is the
tabular form below. It is appropriate when the sample space is
partitioned into disjoint subsets according to two different criteria; for
example MALE-FEMALE and SMOKERS-NON SMOKERS.

Let us consider the following group of 200 people

male female Total


smoker 40 20 60
non-smoker 80 60 140
Total 120 80 200

In order to find the probability of a group (or combination of


groups) we simply divide its size by 200, the total number of
people. Thus

If we select a person at random the probability that this person is

 male:             P(male) = 120/200 = 0.6
 female:           P(female) = 80/200 = 0.4
 smoker:           P(smoker) = 60/200 = 0.3
 non-smoker:       P(non-smoker) = 140/200 = 0.7
 male AND smoker:  P(male ∩ smoker) = 40/200 = 0.2
 male OR smoker:   P(male ∪ smoker) = 140/200 = 0.7

Notice: In the last probability, we consider the column of males and
the row of smokers, but the combination male-smoker is counted
only once. It holds again

   P(male ∪ smoker) = P(male) + P(smoker) – P(male ∩ smoker)


Some problems require particular techniques for counting the


appropriate group size. Tossing two dice is a characteristic example.

 TWO DICE

We toss two dice. There are 36 possible outcomes (combinations of
scores). A 6×6 table, with the score of the first die down the rows
and the score of the second die along the columns, helps to
visualize the 36 outcomes.

Notice that there is only one combination of two ones (the cell 1-1)
but two combinations of one and two (the cells 1-2 and 2-1).

We find the following probabilities:

   P(two sixes) = 1/36                (the single cell 6-6)

   P(at least one six) = 11/36        (last column and last row)

   P(exactly one six) = 10/36         (why?)

   P(same score) = 6/36               (the main diagonal: 1-1, 2-2, …)

   P(sum of scores = 9) = 4/36        (the cells 3-6, 4-5, 5-4, 6-3)

   P(sum of scores > 9) = 6/36        (the cells with sum 10, 11 or 12)

   P(sum of scores < 9) = 26/36       (all the remaining cells)
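All of these probabilities can be verified by simply listing the 36 outcomes;
a short Python sketch (not in the syllabus):

from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]  # 36 pairs

def prob(event):
    # fraction of the 36 equally likely outcomes satisfying the event
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

print(prob(lambda o: o == (6, 6)))        # 1/36   two sixes
print(prob(lambda o: 6 in o))             # 11/36  at least one six
print(prob(lambda o: o.count(6) == 1))    # 5/18   exactly one six (10/36)
print(prob(lambda o: o[0] == o[1]))       # 1/6    same score (6/36)
print(prob(lambda o: sum(o) == 9))        # 1/9    sum equal to 9 (4/36)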

4.7 CONDITIONAL PROBABILITY – INDEPENDENT EVENTS

Notice the following difference in notation

P(A) means “probability of A”

P(A|B) means “probability of A, given B”

Intuitively, we expect that

   “the probability that it will rain on some day”

is different from

   “the probability that it will rain on some day,
    given that this is a day of September”

In a more mathematical example, suppose that we pick a whole


number in the range 1-100. Let

A = “we pick 17”

Clearly P(A) = 1/100.

However, if we know the information

   B = “the number selected has two digits”

then P(A|B) = 1/90 (there are 90 two-digit numbers)

 FORMAL DEFINITION OF P(A|B)

The conditional probability is given by the formula

   P(A|B) = n(A∩B)/n(B)      or      P(A|B) = P(A∩B)/P(B)

We will clarify the definition by using Venn diagrams and Tables


 P(A|B) IN A VENN DIAGRAM

Let us consider the example

[Venn diagram with n(S)=100: A only 20, A∩B 10, B only 30, outside 40]

We know that P(A) = 30/100. What about P(A|B)?

We start with the given event B; now the total number is not 100,
the size of the whole sample space, but only 40, the size of B:

   P(A|B) = ? / 40      [the denominator 40 comes from the given event B]

How many elements of A are inside the given space B? Only 10.
Therefore,
   P(A|B) = 10/40

NOTICE
In fact, in the last result we apply the formula

   P(A|B) = n(A∩B)/n(B) = 10/40 = 0.25

If we divide both the numerator and the denominator by the
TOTAL number of the sample space we obtain the formal definition

   P(A|B) = P(A∩B)/P(B) = (10/100)/(40/100) = 0.25


 Similarly we obtain

   P(B|A) = 10/30      [the denominator 30 comes from the given event A]

 Similarly we obtain

   P(A΄|B) = 30/40      P(A|B΄) = 20/60      P(A΄|B΄) = 40/60

 P(A|B) IN A TABLE
Perhaps it is much easier to observe the conditional probability in
tables. Consider again the example

male female Total


smoker 40 20 60
non-smoker 80 60 140
Total 120 80 200

Observe the difference between the probabilities

   P(smoker)        the person is a smoker
   P(smoker|male)   the person is a smoker, given that the person is male

Clearly,
   P(smoker) = 60/200

   P(smoker|male) = 40/120      [the denominator 120 comes from the given: male]

 Similarly we obtain

   P(male|smoker) = 40/60       [given: smoker]

 Similarly we obtain

   P(female|smoker) = 20/60 ≈ 0.33      P(non-smoker|female) = 60/80 = 0.75


 INDEPENDENT EVENTS

The events A and B are said to be independent if

   P(A|B) = P(A)

In other words, the event B does not affect A;
the probability of A remains the same, whether B is given or not!

 Similarly, in this case it holds P(B|A) = P(B)
   That is, the event A does not affect B.

 In this case the definition P(A|B) = P(A∩B)/P(B) gives

   P(A∩B) = P(A|B)·P(B)    P(A∩B) = P(A)·P(B)

To summarize, A and B are independent when

   P(A|B) = P(A)            (1)
   P(B|A) = P(B)            (2)
   P(A∩B) = P(A)·P(B)       (3)

EXAMPLE 1

[Venn diagram with n(S)=120: A only 20, A∩B 10, B only 30, outside 60]

We can show in three different ways that A and B are independent

 P(A) = 30/120 = 1/4  and  P(A|B) = 10/40 = 1/4,     thus (1) holds
 P(B) = 40/120 = 1/3  and  P(B|A) = 10/30 = 1/3,     thus (2) holds
 P(A∩B) = 10/120 = 1/12 = P(A)·P(B) = (1/4)·(1/3),   thus (3) holds
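The three conditions can be checked with a short Python sketch (not in the
syllabus), using exact fractions for the region sizes of EXAMPLE 1:

from fractions import Fraction as F

# Region sizes from the Venn diagram: A only, A∩B, B only, outside
a_only, both, b_only, outside = 20, 10, 30, 60
total = a_only + both + b_only + outside      # 120

p_a = F(a_only + both, total)                 # 1/4
p_b = F(b_only + both, total)                 # 1/3
p_a_and_b = F(both, total)                    # 1/12
p_a_given_b = F(both, b_only + both)          # 1/4

print(p_a_given_b == p_a)        # True, condition (1)
print(p_a_and_b == p_a * p_b)    # True, condition (3)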


NOTICE
 Many students confuse the terms
   Mutually exclusive events and Independent events

   Remember
   Mutually exclusive events means A∩B = ∅
   Independent events means P(A∩B) = P(A)·P(B)

 Mind that
   P(A∪B) = P(A) + P(B) – P(A∩B)   holds in general
   P(A∩B) = P(A)·P(B)              holds for independent events

   In particular for independent events, it is sometimes useful to
   combine these two formulas in the following one

   P(A∪B) = P(A) + P(B) – P(A)·P(B)

 Sometimes we know beforehand that two events are
   independent. Thus, for their combination we can apply the
   formula P(A∩B) = P(A)·P(B).
   For example,
   we toss a die and a coin; find the probability that the die
   shows a SIX and the coin shows a HEAD.
   We call
      A = “the die shows a SIX”       P(A) = 1/6
      B = “the coin shows a HEAD”     P(B) = 1/2

   The events A and B are clearly independent and for their
   combination it holds

      P(A∩B) = P(A)·P(B) = (1/6)·(1/2) = 1/12


EXAMPLE 2
Let P(A)=0.4 and P(B)=0.3. Find P(A∪B) in the following cases
a) A and B are mutually exclusive
b) A and B are independent
c) P(A∩B) = 0.2
d) P(A|B) = 0.2
Solution
a) P(A∪B) = P(A) + P(B) = 0.4 + 0.3 = 0.7
b) P(A∪B) = P(A) + P(B) – P(A)·P(B) = 0.4 + 0.3 – (0.4)(0.3) = 0.58
c) P(A∪B) = P(A) + P(B) – P(A∩B) = 0.4 + 0.3 – 0.2 = 0.5
d) P(A|B) = P(A∩B)/P(B)    P(A∩B) = P(A|B)·P(B) = (0.2)(0.3) = 0.06
   Hence, P(A∪B) = P(A) + P(B) – P(A∩B) = 0.4 + 0.3 – 0.06 = 0.64

EXAMPLE 3
Let A and B be independent events with
   P(A) = 0.4 and P(A∪B) = 0.7.
Find P(B).
Solution
For independent events it holds
   P(A∪B) = P(A) + P(B) – P(A)·P(B)
    0.7 = 0.4 + P(B) – 0.4 P(B)
    0.3 = 0.6 P(B)
    P(B) = 0.5


4.8 TREE DIAGRAMS

Very often we have to estimate the probability in a sequence of


events under different scenarios. The best way to represent such a
problem is by a tree diagram.

PROBLEM 1. We play a game with two possible results.

For example we pick one of the following 10 letters

AAAA BBBBBB

The results are


A with probability 0.4
B with probability 0.6

We play the game twice. All possible scenarios are shown below; the
corresponding probabilities are shown on the branches of the tree:

[Tree diagram]
   A (0.4) → A (0.4) : scenario AA
   A (0.4) → B (0.6) : scenario AB
   B (0.6) → A (0.4) : scenario BA
   B (0.6) → B (0.6) : scenario BB

Next, for each scenario we multiply the corresponding probabilities

for AA: (0.4)x(0.4) = 0.16


for AB: (0.4)x(0.6) = 0.24,
etc


Thus, the final “picture” of the tree diagram is as follows

[Tree diagram with the resulting probabilities]
   A (0.4) → A (0.4) : AA  0.16
   A (0.4) → B (0.6) : AB  0.24
   B (0.6) → A (0.4) : BA  0.24
   B (0.6) → B (0.6) : BB  0.36

(notice that the sum of the resulting probabilities is 1).

Now any probability may be found by adding the relevant results.

Namely, the probability

 to obtain two A’s is 0.16

 to obtain two B’s is 0.36

 to obtain first A and then B is 0.24

 to obtain one A, one B is 0.24 + 0.24 = 0.48

 to obtain the same result is 0.16 + 0.36 = 0.52

(thus to obtain different results is 1- 0.52 = 0.48)

If we refer to the number of A’s, the probability

 to obtain no A is 0.36

 to obtain exactly one A is 0.24 + 0.24 = 0.48

 to obtain at least one A is 0.24 + 0.24 + 0.16 = 0.64

 to obtain at most one A is 0.24 + 0.24 + 0.36 = 0.84


PROBLEM 2. We play the previous game once more. According to
the result we play a different second game.

For example we pick one of the following 10 letters

AAAA BBBBBB

If the first result is A we pick one letter among CCC DDDDDDD

If the first result is B we pick one letter among CCCC DDDDDD

What is the probability to obtain C?

A tree diagram is particularly helpful in such a situation where the


second game depends on the first one:

[Tree diagram with the resulting probabilities]
   A (0.4) → C (0.3) : AC  0.12
   A (0.4) → D (0.7) : AD  0.28
   B (0.6) → C (0.4) : BC  0.24
   B (0.6) → D (0.6) : BD  0.36

(notice again that the sum of the resulting probabilities is 1).

Thus

the probability to obtain C is 0.12 + 0.24 = 0.36

the probability to obtain D is 0.28 + 0.36 = 0.64


It is worthwhile to mention the following probabilities:

 to obtain A and C. It is 0.12
   It is in fact P(A∩C) and refers to the first scenario

 to obtain A or C. It is 0.12 + 0.28 + 0.24 = 0.64
   It is in fact P(A∪C) and refers to the first three scenarios which
   contain either A or C (or both).

NOTICE

In the tree diagram above, the value 0.3 of the branch AC is in fact
the conditional probability

P(C|A) = Probability to obtain C, given that the first letter is A

In general, in a tree diagram

 the branches of the 1st column contain simple probabilities


of the form P(X)

 the branches of the 2nd column contain conditional probabilities


of the form P(Y|X)

 the results in the last column are combined probabilities
   of the form P(X∩Y)

[Generic tree diagram]
   A: P(A) → C: P(C|A)  gives  P(A∩C)
   A: P(A) → D: P(D|A)  gives  P(A∩D)
   B: P(B) → C: P(C|B)  gives  P(B∩C)
   B: P(B) → D: P(D|B)  gives  P(B∩D)


We may have more complicated tree diagrams, with more


branches per level, more levels, etc.

EXAMPLE 1.
We throw a die.
If we get 1 we stop.
If we get 2,3,4 or 5 we toss a coin.
If we get 6 we toss two coins.
Find the probability that only one head is obtained.
Solution.
For our convenience, we denote the results of the die by
A={1}, B={2,3,4,5}, C={6}

We construct the following tree diagram:

[Tree diagram]
   A (1/6) : stop
   B (4/6) : H (1/2)  or  T (1/2)
   C (1/6) : HH, HT, TH, TT (each coin branch has probability 1/2)

There are finally 7 scenarios (seven paths).
In 3 of them we have exactly one HEAD (the paths B-H, C-HT, C-TH).
We add the corresponding results:

   P(only one HEAD) = (4/6)(1/2) + (1/6)(1/2)(1/2) + (1/6)(1/2)(1/2)
                    = 4/12 + 1/24 + 1/24 = 5/12


 A TYPICAL EXAMPLE: COLORED BALLS IN A BOX

A box contains 10 balls: 6 BLACK and 4 WHITE.

We select two balls, one after the other. All possible outcomes are
clearly shown on the following tree diagram

[Tree diagram]
   B (6/10) → B (5/9) : BB  30/90
   B (6/10) → W (4/9) : BW  24/90
   W (4/10) → B (6/9) : WB  24/90
   W (4/10) → W (3/9) : WW  12/90

   P(both balls are BLACK) = (6/10)×(5/9) = 30/90 = 1/3

   P(only one ball is BLACK) = 2×(6/10)×(4/9) = 2×(24/90) = 8/15

   P(balls of same color) = (6/10)×(5/9) + (4/10)×(3/9) = 42/90 = 7/15

If we select 3 balls, we may follow the same rationale and answer
directly without drawing a tree diagram. Thus,

   P(all three balls are BLACK) = (6/10)×(5/9)×(4/8) = 1/6

   P(only one ball is BLACK) = 3×(6/10)×(4/9)×(3/8) = 3/10


 THE “REVERSE GIVEN”

Consider again the tree diagram of PROBLEM 2

[Tree diagram with the resulting probabilities]
   A (0.4) → C (0.3) : AC  0.12
   A (0.4) → D (0.7) : AD  0.28
   B (0.6) → C (0.4) : BC  0.24
   B (0.6) → D (0.6) : BD  0.36

We said that P(C|A) = 0.3 is shown on the tree (on the branch AC).

What about P(A|C)?

Notice the “reverse” chronological order:

given that the final result is C,


what is the probability that the first result was A?

This result is not shown on the tree diagram; it is estimated as
follows

   P(A|C) = 0.12/(0.12 + 0.24)      [combination AC, divided by all paths giving C]

Actually, it is the formula P(A|C) = P(A∩C)/P(C).
Therefore,

   P(A|C) = 0.12/0.36 ≈ 0.33      P(B|C) = 0.24/0.36 ≈ 0.67

   P(A|D) = 0.28/0.64 ≈ 0.44      P(B|D) = 0.36/0.64 ≈ 0.56

Remark: the formula that calculates the “reverse” probability is


known as Bayes’ Theorem.


EXAMPLE 2.
In a private school party, 30% of the students wear RED suits, 20%
wear GREEN suits and 50% wear BLUE suits. 25% of the RED
students, 35% of the GREEN students and 45% of the BLUE
students are MALE. Find the probability that a MALE student
wears a GREEN suit, that is
P(GREEN|MALE).
Solution.
Instead of applying the Bayes’ formula we will construct a tree
diagram to obtain the “inverse given” probability.

Notice that we do not complete all the probabilities on the tree
diagram, but only the “necessary” ones.

[Tree diagram]
   RED   (0.3) → MALE (0.25) : 0.075
   GREEN (0.2) → MALE (0.35) : 0.070
   BLUE  (0.5) → MALE (0.45) : 0.225
   (the FEMALE branches are not needed)

Therefore,

   P(GREEN|MALE) = 0.070/(0.075 + 0.070 + 0.225) = 0.07/0.37 ≈ 0.189

In other words, 18.9% of the MALE students wear GREEN suits.
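The same “reverse given” calculation can be written as a short Python sketch
(not in the syllabus), which mirrors the tree diagram above:

# Prior probabilities of the suits and P(MALE | suit) from the example
p_suit = {"RED": 0.3, "GREEN": 0.2, "BLUE": 0.5}
p_male_given_suit = {"RED": 0.25, "GREEN": 0.35, "BLUE": 0.45}

# P(suit ∩ MALE) for each branch of the tree
joint = {s: p_suit[s] * p_male_given_suit[s] for s in p_suit}   # 0.075, 0.07, 0.225

p_male = sum(joint.values())                                    # 0.37
p_green_given_male = joint["GREEN"] / p_male

print(round(p_green_given_male, 3))   # 0.189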


4.9 PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE X

Roughly speaking, a random variable X takes on some values in a


given domain at random!!! It may be

   Discrete                        OR   Continuous
   e.g. X ∈ {10,20,30}                  e.g. X ∈ [10,20]
        X ∈ {0,1,2,3,…}                       X ∈ R

A discrete variable takes on values in a finite or countable set,


while a continuous variable takes on values in some interval(s).

In this paragraph we only deal with discrete random variables.

 DISCRETE RANDOM VARIABLE

Let X be a variable which takes on the values

10, 20, 30
with probabilities
0.2, 0.3, 0.5

respectively. We often use a table

x 10 20 30

P(X=x) 0.2 0.3 0.5

Clearly

(i) all the probabilities are non-negative numbers; and


(ii) their sum is always 1.

Then we say that X is a discrete random variable.

To express that the probability that X=10 is 0.2 we write

P(X=10) = 0.2

Similarly, P(X=20) = 0.3 and P(X=30) = 0.5.


In general, for a discrete random variable X with

x x1 x2 x3 …

P(X=x) p1 p2 p3 …

it holds
(i) pi ≥ 0, for all i
(ii) Σpi = 1, i.e. p1 + p2 + p3 + ⋯ = 1

We write
   P(X=x1) = p1, P(X=x2) = p2, and so on.

(We also say that a probability function p: xi ↦ pi is defined).

 THE EXPECTED VALUE μ=E(X)

The mean μ, or otherwise the expected value E(X), is defined by

   Ε(Χ) = Σxipi = x1p1 + x2p2 + x3p3 + ⋯

For our example

x 10 20 30

P(X=x) 0.2 0.3 0.5

the expected value (otherwise the mean) is

   E(X) = 10×0.2 + 20×0.3 + 30×0.5 = 23

NOTICE: Explanation for μ=E(X)


In fact the mean here is not different than the mean in statistics

Consider the following ten numbers

10, 10, 20, 20, 20, 30, 30, 30, 30, 30

The probabilities to select 10, 20 or 30 are as in the table above.

The mean in statistics is also

   μ = (2×10 + 3×20 + 5×30)/10 = 10×(2/10) + 20×(3/10) + 30×(5/10) = 23


EXAMPLE 1
Consider
x 10 20 30
P(X=x) a b 0.5

Given that E(X)=23, find the values of a and b.


Solution.
We use two relations
   a + b + 0.5 = 1    a + b = 0.5
   10a + 20b + 30×0.5 = 23    10a + 20b = 8

The solution of the system is a = 0.2 and b = 0.3

The probability distribution applies in many betting games:

EXAMPLE 2
Consider again the same table above. But now we select one of the
numbers 10, 20, 30 at random.
If we select 10 we earn 6 points
If we select 20 we earn 1 point
If we select 30 we lose 2 points

What is the expected number of points in one game?


Solution.
We extend our table as follows

x 10 20 30
Profit 6 points 1 point -2 points
Prob 0.2 0.3 0.5

We estimate the expected profit:

   Expected profit = 6×0.2 + 1×0.3 − 2×0.5 = 0.5

That is, in each game we earn 0.5 points on average.


Explanation
In other words, if we play this game 10 times we expect to earn 5
points on average.
Indeed, if we play the game 10 times we expect to obtain
   2 times the number 10, that is 2×6 = 12 points
   3 times the number 20, that is 3×1 = 3 points
   5 times the number 30, that is 5×(-2) = -10 points
In total, 12 + 3 − 10 = 5 points

EXAMPLE 3
We throw two dice.
If we obtain TWO SIXES we earn 15€
If we obtain ONLY ONE SIX we earn 1€
If we obtain NO SIX we lose 1€

Find the expected profit in one game.


Solution.
Let us organize our data on a table

   Result   TWO SIXES   ONE SIX   NO SIX
   Profit      15€         1€       -1€
   Prob        1/36      10/36     25/36

The expected amount earned per game is

   Expected profit = 15×(1/36) + 1×(10/36) − 1×(25/36) = 0

This is a FAIR GAME! We expect neither to earn nor to lose!

Notice. If the first winning prize was not 15€ but 14€, the expected
profit would be -1/36.
In other words, if we play the game 36000 times (or otherwise bet
36000€) we expect to lose 1000€.


 MEDIAN-MODE
These measures, known from statistics, are defined analogously:
MODE = The value X=a of the highest probability
MEDIAN = The value X=m where the probability splits
in two equal parts (0.5-0.5)
Look at the examples below

   x        10   20   30            x        10   20   30
   P(X=x)   0.4  0.3  0.3           P(X=x)   0.2  0.3  0.5

   MODE = 10                        MODE = 30
   MEDIAN = 20                      MEDIAN = 25 (why?)

 VARIANCE (Only for HL)


We define
   Var(X) = E((X−μ)²)
that is
   Var(X) = (x1−μ)²·p1 + (x2−μ)²·p2 + (x3−μ)²·p3 + …

An equivalent definition is
   Var(X) = E(X²) − μ²
where
   E(X²) = x1²·p1 + x2²·p2 + x3²·p3 + …

EXAMPLE 4
Consider again the probability distribution

   x        10    20    30
   P(X=x)   0.2   0.3   0.5

We have seen that μ=Ε(Χ)=23. Therefore,

   Var(X) = (10−23)²×0.2 + (20−23)²×0.3 + (30−23)²×0.5 = 61
or
   E(X²) = 10²×0.2 + 20²×0.3 + 30²×0.5 = 590
   Var(X) = 590 − 23² = 61
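A short Python sketch (not in the syllabus) verifying E(X), Var(X) and the
equivalent formula E(X²) − μ²:

x = [10, 20, 30]
p = [0.2, 0.3, 0.5]

mean = sum(xi * pi for xi, pi in zip(x, p))                  # E(X)  = 23
var = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))     # Var(X) = 61
e_x2 = sum(xi ** 2 * pi for xi, pi in zip(x, p))             # E(X²) = 590

print(mean, round(var, 2), round(e_x2 - mean ** 2, 2))       # 23.0 61.0 61.0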


4.10 BINOMIAL DISTRIBUTION – B(n,p)

It is the distribution of a discrete random variable X which takes on


the values
0, 1, 2, 3, 4, … , n
with probability function

   p(x) = nCx × p^x × (1−p)^(n−x),   x = 0, 1, 2, 3, …, n

where n, p are two parameters. We will see that the binomial


distribution describes a certain type of problems.

Notice: the formula is not in the syllabus; results will be obtained
directly by the GDC. It is worth mentioning, though, how it works!

 DESCRIPTION OF THE PROBLEM


We deal with a game (or any experiment) with two outcomes

SUCCESS with probability p


FAILURE (with probability 1-p)

We play the game n times. Our parameters are

n = number of trials
p = probability of success
while
X counts the number of (possible) successes

We say that X follows a binomial distribution and write X∼B(n,p).

Since n is the number of trials, X can take on the values


0, 1, 2, 3, 4, …, n

The probabilities P(X=0), P(X=1), P(X=2), etc can be obtained by


the GDC.

(and also by the formula mentioned in the introduction, but as we


have said this formula is not in the syllabus).


 GDC
Our GDC (Casio) gives the results for a Binomial distribution

MENU – Statistics – DIST – BINOMIAL: We use Bpd or Bcd

For simplicity let us denote by

Bpd(x) the probability of exactly x successes


Bcd(x1 to x2) the probability from x1 up to x2 successes

The menu for both functions is


Data: always Variable
Numtrial: is the number of trials i.e. n
p: is the probability of success p (for each game)

Then for each value of x (or x1 to x2), EXE gives the result.

EXAMPLE 1
We toss a die 5 times. The success is to get a six. Then
   n=5 and p=1/6
We may have 0, 1, 2, 3, 4 or 5 successes.
The probability distribution for X is given by (results in 4dp)

x 0 1 2 3 4 5
GDC Bpd(0) Bpd(1) Bpd(2) Bpd(3) Bpd(4) Bpd(5)
P(X=x) 0.4019 0.4019 0.1608 0.0322 0.0032 0.0001

We can also answer the following questions:

Find the probability of Notation GDC Result


exactly 3 sixes P(X=3) Bpd(3) 0.0322
at most 3 sixes P(X≤3) Bcd(0 to 3) 0.9967
less than 3 sixes P(X<3) Bcd(0 to 2) 0.9645
more than 3 sixes P(X>3) Bcd(4 to 5) 0.0033
at least 3 sixes P(X≥3) Bcd(3 to 5) 0.0355


Remark for the formula (not in the syllabus but worth knowing)

The probability to obtain

   5 sixes in a row is (1/6)^5
   no six at all is (5/6)^5
   2 sixes and 3 no-sixes is 5C2 × (1/6)^2 × (5/6)^3

5C2 is the number of ways to have 2 sixes in 5 trials.

In general, the probability to obtain x sixes (and (5-x) no-sixes) is

   5Cx × (1/6)^x × (5/6)^(5-x)

In general, if we play n times a game with probability of success p,
the probability P(X=x) is given by the formula

   P(X=x) = nCx × p^x × (1−p)^(n−x)

According to the formula

   x         0          1          2          3         4         5
   P(X=x)    3125/6^5   3125/6^5   1250/6^5   250/6^5   25/6^5    1/6^5
             =0.4019    =0.4019    =0.1608    =0.0322   =0.0032   =0.0001

The table agrees with the results found by Bpd(x) above.
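The GDC values Bpd and Bcd can also be reproduced with a short Python sketch
(not in the syllabus, assuming scipy is available):

from scipy.stats import binom

n, p = 5, 1/6                      # five tosses, success = rolling a six

print(binom.pmf(3, n, p))          # P(X=3) ≈ 0.0322   (Bpd(3))
print(binom.cdf(3, n, p))          # P(X≤3) ≈ 0.9967   (Bcd(0 to 3))
print(1 - binom.cdf(3, n, p))      # P(X>3) ≈ 0.0033
print(binom.pmf(range(6), n, p))   # the whole table: 0.4019, 0.4019, 0.1608, ...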

 EXPECTED VALUE AND VARIANCE OF X


They are given by the formulae
E(X) = np Var(X) = np(1-p)

For our example above

   E(X) = 5×(1/6) = 5/6   and   Var(X) = 5×(1/6)×(5/6) = 25/36


EXAMPLE 2
A box contains 5 balls, 1 BLACK and 4 WHITE. We win if we select
a BLACK ball. We play this game 10 times.
Find
(a) The probability to win exactly 4 times
(b) The probability to win at most 4 times
(c) The probability to win at least once
(d) The expected number of winning games.
(e) The variance of the number of winning games.

Solution
The variable
   X = number of winning games
follows a binomial distribution with n=10 and p=1/5=0.2
[we may also write X∼B(10,0.2)]

(a) The probability to win exactly 4 times is Bpd(4)=0.088
(b) The probability to win at most 4 times is Bcd(0 to 4)=0.967
    [in fact P(X≤4) = P(X=0)+P(X=1)+P(X=2)+P(X=3)+P(X=4)]
(c) The probability to win at least once is Bcd(1 to 10) = 0.893
    [in fact P(X≥1) = 1−P(X=0) = 1−0.107 = 0.893]
(d) The expected number is E(X) = np = 10×0.2 = 2
(e) The variance is Var(X) = np(1−p) = 10×0.2×0.8 = 1.6

EXAMPLE 3

Let p=0.2 and n unknown. It is given that P(X=1) = 0.268. Find n.

Solution

We know that n must be an integer.

By trial and error on Numtrial we can see that, for Numtrial=10, Bpd(1)=0.268.

Hence n=10.


 MODE (mainly for HL)

We first check the expected number

 If the expected number is in decimal form,
   say n=20, p=1/6, so that E(X) = 20/6 ≈ 3.3,
   we check the nearest integer values 3 and 4:
      P(X=3) = 0.237
      P(X=4) = 0.202
   Hence the mode is 3 (it has the highest probability).

 If the expected number is a whole number,
   say n=60, p=1/6, so that E(X) = 60/6 = 10,
   we check the neighboring integer values 9, 10, 11:
      P(X=9) = 0.134
      P(X=10) = 0.137
      P(X=11) = 0.126
   Hence the mode is 10.

Notice: In some cases we may have two modes:
   For n=5 and p=1/6, it is E(X) = 5/6 ≈ 0.833. We check
      P(X=0) = 0.4019
      P(X=1) = 0.4019
   Hence there are two modes, 0 and 1.


4.11 NORMAL DISTRIBUTION – N(μ,σ²)

It is the distribution of a continuous random variable X with values
from -∞ to +∞. The parameters of this distribution are

μ = mean
σ = standard deviation.

The “behavior” of the probability is described by a function which
looks like

[bell-shaped curve, symmetric about μ, extending from -∞ to +∞]

Roughly speaking, there is a highly likely mean value μ and all the
other values of X spread out symmetrically about the mean. As we
move away from the mean (either to the left or to the right of the
mean) the probability decreases dramatically!

We say that X follows a normal distribution with mean μ and
standard deviation σ (or variance σ²) and we write X∼N(μ,σ²).

 DESCRIPTION OF THE PROBLEM IN GENERAL


It is the most “popular” distribution in nature. Random variables
which depend on many factors follow this distribution, for example

 Weight of people
 Height of people
 Time spent in a super market
 Weight of a pack of coffee labeled 500 g.


For example, suppose that for a Greek man

mean weight: μ=75kg st.dev. of the weights: σ=10kg

It is estimated4 that

   Percentage of the population     ranges between (in general)   (for our problem)
   about 68% of the population      μ-σ and μ+σ                    [65,85]
   about 95% of the population      μ-2σ and μ+2σ                  [55,95]
   about 99.7% of the population    μ-3σ and μ+3σ                  [45,105]

[bell curve with mean 75: about 68.3% of the area lies between x=65 and x=85]

NOTICE
 The whole area under the curve is 1 (i.e. 100%). The area before
the mean as well as the area after the mean is 0.5 (i.e. 50%)
 Theoretically, the distribution of X ranges between -∞ to +∞.
In practice, we may assume that almost the whole population
(in fact 99,7%) ranges between μ-3σ and μ+3σ.
 The standard deviation σ indicates the spread of the population.
For example, assume that
Greeks: μ=75 kg σ=10 kg
Italians: μ=75 kg σ=8 kg
This implies that both populations have the same mean but
Italians are closer to the mean than Greeks. In other words,
almost the whole population is between μ±3σ, namely
75±30 i.e. 45-105 kg for Greeks
75±24 i.e. 51-99 kg for Italians

4 We will explain in a while how we get the following estimations.


We will distinguish two types of problems. For both problems we


use GDC to find the results.
For Casio fx

MENU – STAT – DIST – NORM: We use Ncd or InvN


Data: always use Variable

In general, Ncd is used when we ask for a probability


InvN is used when we know the probability

 PROBLEM 1: FIND PROBABILITY (we use Ncd)


Consider again the example where
X = the weight of a Greek man
with μ=75 kg and σ=10 kg.
Find the probability that a Greek man weighs
(a) between 60 and 82 kg [that is P(60≤X≤82)]
(b) more than 82 kg [that is P(X≥82)]
(c) less than 60 kg [that is P(X≤60)]
Solution
We use Ncd in the GDC. We set σ=10, μ=75

   Question    Ncd                                Result (press EXE)
   (a)         Lower: 60,       Upper: 82         0.691
   (b)         Lower: 82,       Upper: 999999…    0.242
   (c)         Lower: -999999…, Upper: 60         0.067

Let us represent the information of this problem by a diagram:

[bell curve with mean 75: area 0.067 below 60, area 0.691 between 60 and 82, area 0.242 above 82]


NOTICE
 GDC gives some extra information below each result.
For question (a) it gives P(60≤X≤82)=0.691 and then
z:Low =-1.5 z:Up =0.7
the so-called standardized values of 60 and 82. They mean that

the lower bound 60 is 1.5 standard deviations below μ=75


the upper bound 82 is 0.7 standard deviations above μ=75

 The probability that some weight is within 1 st. deviation (i.e. 10 kg)
   of the mean μ=75, that is between 65 and 85 kg, is

   P(65≤X≤85) ≈ 0.683 (68.3%)

as we said in the introduction. Notice that z:Low = -1, z:Up = 1

 The probabilities above refer to one person only.


For example,
P(a person is between 60 and 82 kg) = 0.691,
P(a person is not between 60 and 82 kg) =1-0.691= 0.309

If we select two people,

   P(both between 60 and 82 kg) = (0.691)²
   P(none between 60 and 82 kg) = (0.309)²
   P(only one between 60 and 82 kg) = (0.691)(0.309)×2

 Combination of Normal and Binomial distributions:


We select 10 people. What is the probability that exactly three
of them are between 60 and 82 kg?
For one person the normal distribution gives p=0.691.
Then, a new variable Y (for the number of people) follows
binomial distribution B(n,p) with
n=10 and p=0.691.
and
P(Y=3) =0.0106


 PROBLEM 2: PROBABILITY IS GIVEN (we use InvN)


Now, for
X = the weight of a Greek man
with μ=75 kg and σ=10 kg they give us the information

the probability that somebody weighs less than a is 0.067


or
6.7% of the Greek men weigh less than a
Find a.
Using mathematical notation:
P(X≤a)=0.067, find a
Solution
Let us represent this information in a diagram

[bell curve: the shaded area to the left of a is 0.067]

We use InvN. We set the parameters σ=10, μ=75. Then

Tail: Left (it is the area before a)


Area: 0.067

Press EXE and obtain a=60 kg.

Notice for the tail

Tail: Left if it says less than a


Tail: Right if it says more than a

The area after point a above is 0.933. That is P(X≥a)=0.933. Then


Tail: Right
Area: 0.933
also gives a=60 kg.
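Both types of problems (Ncd and InvN) can be checked with a short Python
sketch (not in the syllabus, assuming scipy is available):

from scipy.stats import norm

mu, sigma = 75, 10     # weights of Greek men, in kg

# Ncd-type questions (find a probability)
print(norm.cdf(82, mu, sigma) - norm.cdf(60, mu, sigma))  # P(60≤X≤82) ≈ 0.691
print(1 - norm.cdf(82, mu, sigma))                        # P(X≥82)    ≈ 0.242
print(norm.cdf(60, mu, sigma))                            # P(X≤60)    ≈ 0.067

# InvN-type question (the probability is given)
print(norm.ppf(0.067, mu, sigma))                         # a ≈ 60 kg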


EXAMPLE 1
The mass of packets for a certain type of coffee is normally
distributed with a mean of 500 g and standard deviation of 15 g.

(a) Find the probability that a packet weighs more than 520 g
(b) The lightest 4% of the packets weigh less than a.
The heaviest 5% of the packets weigh more than b.
Find a and b.
(c) Find Q1, Q3, the lower and upper quartiles of the weights

Solution

(a) We use Ncd


P(X≥520) ≈ 0.091

(b) We use InvN


P(X≤a)=0.04, hence a ≈ 474 g
P(X≥b)=0.05, hence b ≈ 525 g

(c) In fact, this question looks like question (b).


We know that the “area” before Q1 is 0.25, while the area
before Q3 is 0.75.
We use InvN, tail: left
P(X≤Q1)=0.25    Q1 ≈ 490 g
P(X≤Q3)=0.75    Q3 ≈ 510 g

For the second result we can also use Tail: right, area = 0.25

For this question in particular, we may use

Tail: Central, Area =0.5

and obtain both results: Q1 ≈ 490 g and Q3 ≈ 510 g

In general, Tail: Central can be used when the values are


symmetrically before and after the mean.
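For checking, the whole example can be reproduced with the cdf and ppf functions used above (a sketch; the values agree with the GDC answers after rounding):

    from scipy.stats import norm

    mu, sigma = 500, 15                    # coffee packet mass in g

    print(1 - norm.cdf(520, mu, sigma))    # (a) P(X >= 520) ≈ 0.091
    print(norm.ppf(0.04, mu, sigma))       # (b) a ≈ 474 g  (lightest 4%)
    print(norm.ppf(0.95, mu, sigma))       #     b ≈ 525 g  (P(X <= b) = 0.95)
    print(norm.ppf(0.25, mu, sigma))       # (c) Q1 ≈ 490 g
    print(norm.ppf(0.75, mu, sigma))       #     Q3 ≈ 510 g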


EXAMPLE 2
The mass of packets for a certain type of coffee is normally
distributed with a mean of 500 g and standard deviation of 15 g.

Packets less than 475 g are rejected from the market.

(a) We select 2 packets. Find the probability that both are


rejected.
(b) We select 5 packets. Find the probability that at least one is
rejected.

Solution

The probability that a packet is rejected is


P(X<475) ≈ 0.0478
(a) (0.0478)² ≈ 0.00228
(b) Y follows binomial with n=5 and p=0.0478
P(Y≥1) ≈ 0.217

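A quick computational check of this example (same assumed Python/scipy tools as before):

    from scipy.stats import norm, binom

    mu, sigma = 500, 15
    p = norm.cdf(475, mu, sigma)           # probability a packet is rejected ≈ 0.0478

    print(p ** 2)                          # (a) both of 2 packets rejected ≈ 0.00228
    print(1 - binom.pmf(0, 5, p))          # (b) at least one of 5 rejected ≈ 0.217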

ONLY FOR HL


4.12 POISSON DISTRIBUTION – Po(m)

It is the distribution of a discrete random variable X which takes on


the values
0, 1, 2, 3, …
with probability function

e m m x
P(X  x)  x  0,1,2,3, …
x!

where m is a parameter.
We say that X follows a Poisson distribution and write X ~ Po(m).

Notice: the formula above is not in the syllabus but it is worth
mentioning! The results can be obtained directly by GDC.

• DESCRIPTION OF THE PROBLEM


In general, we study the number of incidents that occur within a
certain period (usually of time). For example

- Number of phone calls per minute in a call center.


- Number of accidents per hour in a certain area.
- Number of cars coming to a junction.
- Number of mistakes per page in a book
- Number of bacteria per cm3

We denote by
m = the mean number of incidents
(within a certain period)
while
X is the random variable for the possible number of incidents
(within the certain period)

Then, the probability that x incidents occur, that is


P(X=x)

can be obtained by GDC (or by the formula mentioned above).


• GDC
Our GDC (Casio) gives the results for a Poisson distribution

MENU – Statistics – DIST – POISSON: We use Ppd or Pcd


Data: always Variable
λ: is the mean m
x: is the value for X
Then EXE gives the result

For simplicity let us denote by

Ppd(x) the probability that exactly x incidents occur


Pcd(x-y) the probability from x up to y incidents

EXAMPLE 1

In a call center it has been noticed that 2 phone calls on average


occur per minute. Thus m=2 (for GDC: λ=2)
Then, the probability that exactly 3 phone calls occur in one minute is

P(X=3) = Ppd(3) = 0.180

Similarly,

P(X=0) ≈ 0.135, P(X=1) ≈ 0.271, and so on!

We obtain the following table for the distribution of X

X 0 1 2 3 …
P(X=x) 0.135 0.271 0.271 0.180 …

We can also obtain results like that:

P(X≤2) = Pcd(0-2) = 0.677

This is in fact,

P(X=0) + P(X=1) + P(X=2) = 0.135 + 0.271 + 0.271 = 0.677

Note: the formula e^(-2) · 2^3 / 3! ≈ 0.180 gives the same result.


Look also at the following results:

Find the probability that Notation GDC Result

exactly 3 phone calls occur P(X=3) Ppd(3) 0.180

at most 3 phone calls occur P(X≤3) Pcd(0-3) 0.857

less than 3 phone calls occur P(X<3) Pcd(0-2) 0.677

more than 3 phone calls occur P(X>3) Pcd(4-∞) 0.143

at least 3 phone calls occur P(X≥3) Pcd(3-∞) 0.323

For ∞ use a very large number, e.g. 999999

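All five rows of this table can be reproduced with the pmf, cdf and sf (survival function, i.e. P(X>k)) functions of scipy.stats.poisson; a minimal sketch with the same m=2 (scipy is our own choice of tool, not part of the syllabus):

    from scipy.stats import poisson

    m = 2                         # mean number of calls per minute

    print(poisson.pmf(3, m))      # P(X = 3)  ≈ 0.180   (Ppd(3))
    print(poisson.cdf(3, m))      # P(X <= 3) ≈ 0.857   (Pcd(0-3))
    print(poisson.cdf(2, m))      # P(X < 3)  ≈ 0.677   (Pcd(0-2))
    print(poisson.sf(3, m))       # P(X > 3)  ≈ 0.143   (Pcd(4-∞))
    print(poisson.sf(2, m))       # P(X >= 3) ≈ 0.323   (Pcd(3-∞))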
• MIND THE LENGTH OF THE PERIOD

Be careful about m. It is the mean number of incidents for the


period in question. For example, if
v = the frequency of incidents per minute
the mean number of incidents in t minutes is
m=vt
The following example helps!

EXAMPLE 2
Assume that the mean number of phone calls per minute is 2. Find
(a) The probability that 3 phone calls occur in one minute
(b) The probability that 3 phone calls occur in two minutes
(c) The probability that no phone calls occur in three minutes
Solution
The frequency of phone calls is v=2 (phone calls per minute)
(a) The mean number of phone calls per minute is m=2. Hence
P(X=3) = Ppd(3) = 0.180
(b) The mean number of phone calls per 2 minutes is m=4. Hence
P(X=3) = Ppd(3) = 0.195
(c) The mean number of phone calls per 3 minutes is m=6. Hence
P(X=0) = Ppd(0) = 0.00248
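A short check of this example with the mean rescaled as m = v·t (Python/scipy, our own illustration):

    from scipy.stats import poisson

    v = 2                            # calls per minute

    print(poisson.pmf(3, v * 1))     # (a) 3 calls in 1 minute  (m = 2) ≈ 0.180
    print(poisson.pmf(3, v * 2))     # (b) 3 calls in 2 minutes (m = 4) ≈ 0.195
    print(poisson.pmf(0, v * 3))     # (c) 0 calls in 3 minutes (m = 6) ≈ 0.00248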


• EXPECTED VALUE AND VARIANCE


The formulas are simple
E(X) = m
Var(X) = m

At least for E(X) it seems to be very reasonable that the expected


number of incidents is the mean number of incidents!

• MODE
We check the neighboring integer values of the mean m.
Look at the following two cases:
• Assume that the mean is m=4.3. We expect that the most likely
number of incidents is near 4.3. We check
P(X=4) = 0.193
P(X=5) = 0.166
Hence the mode is 4.
• Assume that the mean is m=5. We expect that the most likely
number of incidents is near 5.
We check
P(X=3) = 0.140     P(X=4) = 0.175
P(X=5) = 0.175     P(X=6) = 0.146
Hence we have two modes, 4 and 5.

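The same neighbor-checking can be done in a couple of lines (a sketch):

    from scipy.stats import poisson

    # m = 4.3: compare the integers around the mean
    print(poisson.pmf(4, 4.3), poisson.pmf(5, 4.3))   # 0.193 > 0.166  ->  mode is 4

    # m = 5: two neighboring values tie
    print(poisson.pmf(4, 5), poisson.pmf(5, 5))       # both ≈ 0.175   ->  modes 4 and 5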
• ASSUMPTIONS (a bit theoretical!)

The conditions that justify a Poisson distribution are the following

• The events within two disjoint intervals are independent.
• Events occur at a uniform average rate.

For example, if we study the number of phone calls in a call center
within a minute, we assume that the numbers of phone calls in
disjoint minutes are independent of each other, and also that the
phone calls occur at a uniform rate (we exclude extreme situations).


EXAMPLE 3
The mean number of phone calls in a call center is m=2. Find the
probability of the combined event that
3 phone calls occur in the first minute and
4 phone calls occur in the second minute
Solution
We have
P(X=3) = 0.1804 and P(X=4) = 0.0902
The two time intervals are disjoint, so the two counts are independent
and the probability is (0.1804)(0.0902) ≈ 0.0163

• THE SUM OF POISSON DISTRIBUTIONS


Suppose that X and Y are two independent variables such that

X follows Po(m)
Y follows Po(n)

Then X+Y follows Po(m+n)

EXAMPLE 4
The mean number of phone calls in the call center A is m=2, while
the mean number of phone calls in the call center B is n=3.
Find the probability that the total number of phone calls is 6.
Solution
X ~Po(2), Y ~ Po(3), therefore X+Y ~ Po(5)
Thus
P(X+Y=6) ≈ 0.146

NOTICE
The sum of the following combinations (where X+Y=6)

P(X=0)P(Y=6) + P(X=1)P(Y=5) + … + P(X=6)P(Y=0)

is also 0.146. (check!)

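A quick check of both the direct Po(5) value and the convolution sum (Python/scipy sketch):

    from scipy.stats import poisson

    # X ~ Po(2), Y ~ Po(3), independent  =>  X + Y ~ Po(5)
    print(poisson.pmf(6, 5))                                           # ≈ 0.146

    # the same result as a sum over all the ways to split the 6 calls
    print(sum(poisson.pmf(k, 2) * poisson.pmf(6 - k, 3) for k in range(7)))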

4.13 MARKOV CHAINS

A Markov chain is a model describing a sequence of possible events in
which the probability of each event depends only on the state
attained in the previous event.

Let us make it clear by presenting a simple example:

Consider two locations A and B. Each day the following transitions


in population occur:
For the population of A: 80% stays in A
20% moves to B
For the population of B: 70% stays in B
30% moves to A
In probability terms, this information is depicted in a transition
state diagram: a loop at A with probability 0.8, an arrow from A to B
with probability 0.2, a loop at B with probability 0.7, and an arrow
from B to A with probability 0.3.

or in the following table

    Prob      from A    from B
    to A       0.8       0.3
    to B       0.2       0.7

or, in mathematical terms, in the following transition matrix

        T = [0.8  0.3]
            [0.2  0.7]

Notice that
• the columns refer to “from” and the rows to “to”.
• the sum of the entries in each column is 1.


After two days, the probabilities for the transition of the population
can be shown in two tree diagrams, one starting from A and one starting
from B; each has two stages of branches (from A: stay 0.8 / move 0.2,
from B: move 0.3 / stay 0.7), giving the products 0.64, 0.16, 0.06, 0.14
(starting from A) and 0.24, 0.06, 0.21, 0.49 (starting from B).

Therefore, after two days, the probability to move


from A to A is 0.64+0.06 = 0.70
from A to B is 0.16+0.14 = 0.30
from B to A is 0.24+0.21 = 0.45
from B to B is 0.06+0.49 = 0.55

However, we can obtain all the results above in a much easier and
amazing way: by squaring the transition matrix T!

        T² = [0.8  0.3][0.8  0.3] = [0.70  0.45]
             [0.2  0.7][0.2  0.7]   [0.30  0.55]

Still,
• the columns refer to “from” and the rows to “to”.
• the sum of the entries in each column is 1.

The situation after 3 days is depicted in the following matrix

        T³ = [0.650  0.525]
             [0.350  0.475]

By using our GDC we may observe that for large powers (that is,
after many days) the result converges to the matrix

        T∞ = [0.6  0.6]
             [0.4  0.4]

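The matrix powers above are easy to verify with a short numpy sketch (numpy is our own choice of tool here; a matrix-capable GDC does the same job):

    import numpy as np

    T = np.array([[0.8, 0.3],
                  [0.2, 0.7]])

    print(np.linalg.matrix_power(T, 2))    # [[0.70 0.45] [0.30 0.55]]
    print(np.linalg.matrix_power(T, 3))    # [[0.650 0.525] [0.350 0.475]]
    print(np.linalg.matrix_power(T, 50))   # ≈ [[0.6 0.6] [0.4 0.4]]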

Notice.
The columns in T∞ are identical. This happens when the matrix is
regular:
    a square matrix A is regular if Aⁿ has only non-zero entries
    for some value of n. Then Aⁿ converges to a matrix A∞
    in which all columns are identical.

In our case T is regular as it has non-zero entries from the very


beginning!

Suppose that the total population in the two cities is 100 and the
initial distribution is 50-50. This can be depicted in the

        initial state vector  S0 = [50]
                                   [50]

After one day, the distribution in the two cities becomes

        S1 = T S0 = [0.8  0.3][50] = [55]    that is, 55 in A, 45 in B
                    [0.2  0.7][50]   [45]

After 2 days,     S2 = T S1 = T² S0 = [57.5 ]
                                      [42.5 ]

After 3 days,     S3 = T S2 = T³ S0 = [58.75]
                                      [41.25]

After many days,  S∞ = T∞ S0 = [60]
                               [40]

However,

If the initial state vector was S0 = [100], then still S∞ = T∞ S0 = [60]
                                     [  0]                          [40]

If the initial state vector was S0 = [  0], then still S∞ = T∞ S0 = [60]
                                     [100]                          [40]

In fact, any initial state vector S0 (with total 100) would result in

        S∞ = T∞ S0 = [60]
                     [40]

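The successive state vectors can be generated the same way; a numpy sketch:

    import numpy as np

    T = np.array([[0.8, 0.3],
                  [0.2, 0.7]])
    S0 = np.array([50, 50])                          # initial 50-50 split

    for n in (1, 2, 3, 50):
        print(n, np.linalg.matrix_power(T, n) @ S0)
    # n=1: [55 45]   n=2: [57.5 42.5]   n=3: [58.75 41.25]   n=50: ≈ [60 40]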

This means that after many steps, things tend to stabilize and the
distribution of the population reaches the so-called

 60 
steady state vector S= 
 40 
This also implies that if the initial distribution was already 60-40,
it would remain the same forever, from the very first day! Indeed,

 0.8 0.3  60   60 
TS =   =  =S
 0.2 0.7  40   40 

• THE PROBABILITY FOR THE MOTION OF A ROBOT!


Another interesting observation is the following:
If a robot starts from location A and the probabilities for its motion
are given in the transition diagram above, then the initial state
vector is
1
S0 =  
0
After one move the state of probabilities is

 0.8 0.3   1   0.8 


S1 = TS0 =    =   i.e. 0.8 to be in A, 0.2 to be in B
 0.2 0.7   0   0.2 
 0.7 
After 2 moves, S2 = TS1 = T 2 S0 =  
 0.3 
 0.65 
After 3 moves, S3 = TS2 = T 3S0 =  
 0.35 
 0.6 
After many days, S = T  S 0 =  
 0.4 

In this case, the steady state vector is

 0.6 
S= 
 0.4 

Thus, at any moment in the “future”, the probability for the robot
to be in location A is 0.6, while to be in location B is 0.4.

The following example is described by a non-regular transition


matrix but it is interesting to see how a Markov chain works.


EXAMPLE 1

A robot moves along a line of four positions, 1, 2, 3, 4.

It is initially located at position 2.

At each step the probability to move one position to the right is 0.6,
while to the left it is 0.4. The robot stops when it reaches position 1
or position 4.

What is the probability to finish at position 1?

This problem can be easily (and amazingly) solved as a Markov
chain process.

The probabilities of moving from any position i to any position j are


shown in the following transition matrix

 1 0.4 0 0 
 
0 0 0.4 0 
T =
 0 0.6 0 0 
 
 0 0 0.6 1 

For example,
the 1st column says: if the robot is situated at position 1, the
probability to move elsewhere is 0 as it stops at position 1.
the 2nd column says: if the robot is situated at position 2, the
probability to move to 1 is 0.4 while to 3 is 0.6.
and so on.

The initial state vector is


0
 
1
S0 =  
0
 
0

as the robot starts at position 2.


After one move the state of probabilities is

 1 0.4 0 0   0   0.4 
    
0 0 0.4 0   1   0 
S1 = TS0 = T =  =
 0 0.6 0 0   0   0.6 
    
 0 0 0.6 1   0   0 
 0.40 
 
2  0.24 
After 2 moves, S2 = TS1 = T S0 =
 0 
 
 0.36 

In other words,
the probability to finish at position 1 is 0.40,
the probability to be again in position 2 is 0.24,
there is no way to be at position 3 after 2 moves (can you think why?),
and the probability to finish at position 4 is 0.36.

So after 2 moves it is more likely to finish at position 1 than at position 4.

The transition matrix is not regular, as every power of it contains
zero entries (can you think why?).

However, it is interesting to see what happens after 10 moves (by


using our GDC for multiplication of matrices):

 0.5259 
 
10  0.0008 
S10 = T S0 =
 0 
 
 0.4733 
After 11 moves

 0.5262 
 
11  0 
S11 = T S0 =
 0.0005 
 
 0.4733 

Notice that it is impossible to be at position 3 after 10 moves or at
position 2 after 11 moves (why?)

We observe that, in the long run, the robot is more likely to finish
at position 1 (about 53%) than at position 4 (about 47%).

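A numpy sketch that reproduces the robot example (the power 100 simply stands for “after many moves”):

    import numpy as np

    T = np.array([[1, 0.4, 0,   0],
                  [0, 0,   0.4, 0],
                  [0, 0.6, 0,   0],
                  [0, 0,   0.6, 1]])
    S0 = np.array([0, 1, 0, 0])                  # robot starts at position 2

    print(np.linalg.matrix_power(T, 2)   @ S0)   # [0.40 0.24 0.   0.36]
    print(np.linalg.matrix_power(T, 10)  @ S0)   # ≈ [0.526 0.001 0.    0.473]
    print(np.linalg.matrix_power(T, 100) @ S0)   # ≈ [0.526 0.    0.    0.474]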

• MARKOV CHAINS AND EIGENVECTORS


In our original problem, for the transition matrix

 0.8 0.3 
T = 
 0.2 0.7 
we found a steady state vector
 60 
S= 
 40 
such that TS = S .
Think of this relation as

TS = 1S

Thus, 1 is an eigenvalue and S is a corresponding eigenvector!

It can be shown that any transition matrix has 1 as an eigenvalue.

Let us see what happens in our case.

        det [0.8-λ    0.3  ] = 0   ⇒   λ² - 1.5λ + 0.5 = 0
            [ 0.2    0.7-λ ]
                                   ⇒   λ = 1  or  λ = 0.5

For the eigenvalue λ = 1 we obtain the system:

        -0.2x + 0.3y = 0
         0.2x - 0.3y = 0     ⇒   0.2x = 0.3y   ⇒   x/y = 3/2

Thus, the corresponding eigenvectors are [3t].
                                         [2t]

Therefore, if the total population is 100,

the corresponding eigenvector is [60]
                                 [40]

[since 3t + 2t = 100 ⇒ t = 20].

If the total population is 1 (the case of the robot)

the corresponding eigenvector is [0.6]
                                 [0.4]
These are the steady state vectors for the corresponding populations.

For example, a rental car company with 100 cars, two car stations
A and B, and transition matrix T as above, should place 60 cars in
station A and 40 cars in station B, from the first day!

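The eigenvalue 1 and its eigenvector (and hence the steady state) can also be found numerically; a numpy sketch:

    import numpy as np

    T = np.array([[0.8, 0.3],
                  [0.2, 0.7]])

    eigvals, eigvecs = np.linalg.eig(T)
    print(eigvals)                          # [1.  0.5]

    v = eigvecs[:, np.argmax(eigvals)]      # eigenvector belonging to the eigenvalue 1
    print(100 * v / v.sum())                # scaled to a population of 100: [60. 40.]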
