Topic 4A. Descriptive Statistics - Probability


International Baccalaureate

MATHEMATICS
Applications and Interpretation SL (and HL)
Lecture Notes

Christos Nikolaidis

TOPIC 4
STATISTICS AND PROBABILITY

4A. Descriptive statistics - Probability

4.1  BASIC CONCEPTS OF STATISTICS

4.2  MEASURES OF CENTRAL TENDENCY AND SPREAD

4.3  FREQUENCY TABLES – GROUPED DATA

4.4  REGRESSION

4.5  ELEMENTARY SET THEORY

4.6  PROBABILITY

4.7  CONDITIONAL PROBABILITY – INDEPENDENT EVENTS

4.8  TREE DIAGRAMS

4.9  DISTRIBUTIONS - DISCRETE RANDOM VARIABLES

4.10 BINOMIAL DISTRIBUTION – B(n,p)

4.11 NORMAL DISTRIBUTION – N(μ,σ²)

Only for HL

4.12 POISSON DISTRIBUTION – Po(m)

4.13 MARKOV CHAINS

October 2021

4.1 BASIC CONCEPTS OF STATISTICS

In Statistics we deal with data collection, presentation, analysis and
interpretation of results. Data can come from a

Population (the entire list of a specified group)

Sample (a subset of the Population)

We usually investigate a small sample of the population in order to
draw conclusions about the whole population.

Numerical data can be

Discrete (a finite or countable set), e.g. {10,20,30} or {0,1,2,3,…}
OR
Continuous (an interval), e.g. [40,100] or R

Data can be organized in several ways. We present some examples below.

Frequency table (the same data can also be shown in a pie chart):

Colored Balls   Frequency
Blue            13
Green           8
Red             10
Yellow          3


Bar graph (for discrete data): the frequencies above displayed as bars.

Colored Balls   Freq
Blue            13
Green           8
Red             10
Yellow          3

Histogram (for continuous data)

Age Frequency
[0,10) 7
[10,20) 5
[20,30) 1
[30,40) 3

Stem and leaf Diagram

Key: 1|3 represents 13

Data: 12, 14, 16, 16, 20, 21, 21, 21, 25, 32, 39, 40, 43, 44, 47, 48, 49, 53

Stem | Leaf
  1  | 2, 4, 6, 6
  2  | 0, 1, 1, 1, 5
  3  | 2, 9
  4  | 0, 3, 4, 7, 8, 9
  5  | 3


As far as sampling is concerned, it is crucial to select a sample
which is not biased. There are several sampling techniques which
address this bias.

Suppose that we have a population of 100,000 people and wish to
select a sample of 1000 people. If we select the first 1000 in a list,
or the youngest 1000, there is certainly a bias in our selection.

Simple random sampling:  We select 1000 people “out of a hat”;
                         each member has an equal probability of selection.

Systematic sampling:     Since 100000/1000 = 100 (= period),
                         we pick a random starting point (e.g. the 20th
                         person) and then every 100th person
                         (i.e. 20th, 120th, 220th, …).

Stratified sampling:     We divide the population into subgroups
                         (say men and women, or under and over
                         40 years old). We pick a sample from each group.

Quota sampling:          As in stratified sampling, but we pick samples
                         in proportion to the size of each subgroup in
                         the population.

There are advantages and disadvantages in each method. Simple
random sampling is fair but it may be very time consuming
compared to systematic sampling. In systematic sampling, though,
if there is a periodic pattern in the population there may be a bias.
Suppose that the 100,000 people are arranged in groups of 100. If the
first person of each group is the leader, then the sampling method of
selecting every 100th person may provide a sample of only leaders
or of no leaders at all.
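The two random sampling methods above can also be illustrated with a short
Python sketch (not in the syllabus; the population of 100,000 numbered people
below is hypothetical):

import random

population = list(range(1, 100001))   # hypothetical population of 100,000 people
sample_size = 1000
period = len(population) // sample_size          # 100000/1000 = 100

# Simple random sampling: every member has an equal chance of selection
simple_sample = random.sample(population, sample_size)

# Systematic sampling: random starting point, then every 100th person
start = random.randrange(period)                 # e.g. the 20th person
systematic_sample = population[start::period]    # 20th, 120th, 220th, ...

print(len(simple_sample), len(systematic_sample))  # 1000 1000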


4.2 MEASURES OF CENTRAL TENDENCY AND SPREAD


Consider the following numerical data1:

10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

The total number of entries is n=11.

In order to describe these data we use

 3 measures of central tendency


 3 measures of spread

The first three measures indicate a representative central value
which best describes the data, while the last three measures
indicate whether our data are close together or spread out.

 MEASURES OF CENTRAL TENDENCY (The 3 M’s)

A) MEAN = The sum of all values divided by n.

Here
   mean = (10+20+20+20+30+30+40+50+70+70+80)/11 = 440/11 = 40

B) MODE = the most frequent value


Here
mode = 20

C) MEDIAN = The value in the middle


(provided they have been placed in ascending order).

Here, it is the sixth number in the list

median = 30

1 This set of values is either a population or a sample.


NOTICE
 For the data 10, 20, 30
Median = 20
For the data 10, 20, 30, 40
Median = 25
That is, for an even number of data,
median = the mean of the two middle values

 The median is not the (n/2)-th entry as one would possibly expect;
  the median is the ((n+1)/2)-th entry.

  For example,
  if n=11, (n+1)/2 = 6, thus the median is the 6th entry. See the
  example above;
  if n=10, (n+1)/2 = 5.5, thus the median is the mean of the 5th and
  6th entries; for the 10 entries

10, 20, 30, 40, 50, 60, 70, 80, 90, 100

the median is the mean of 50 and 60. Hence median = 55

The median is also denoted by Q2 (the index 2 will be clarified soon)

 The mean is denoted by μ (or by x̄). In fact, we use

   the Greek letter μ for the whole population;
   the Latin letter x̄ for a sample of the population.

If our data are denoted by x1, x2, …, xn, the mean is given by

   μ = (x1 + x2 + x3 + ⋯ + xn)/n      or otherwise      μ = (Σxi)/n


EXAMPLE 1
Find
a) the integers a ≤ b ≤ c, given that mean=4, mode=5, median=5.

   The median implies that b=5. The mode implies that also c=5.
   Then (a+5+5)/3 = 4    a+10 = 12    a = 2

   Therefore, the numbers are 2, 5, 5.

b) the integers a ≤ b ≤ c ≤ d, given that mean=5, mode=7, median=6.

   The median implies that either b=c=6 or (b=5 and c=7).
   Since the mode is 7 we obtain b=5 and c=d=7.
   Then (a+5+7+7)/4 = 5    a+19 = 20    a = 1

   Therefore, the numbers are 1, 5, 7, 7.

 MEASURES OF SPREAD
We use the same set of data

10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

A) STANDARD DEVIATION

The standard deviation is perhaps the most “reliable” measure


for spread, as it takes all data into consideration. It measures
how far the entries are from the mean. It can be found by using
the GDC (directions will be given later on).

The standard deviation is denoted2 either by σ or by sn .

For our example the GDC gives σ = 22.96.

2 In fact,
the Greek letter σ is used for the whole population;
the Latin letter sn is used for a sample of the population


B) RANGE = (maximum value) - (minimum value)


Here
range = 80-10 = 70

C) INTERQUARTILE RANGE = IQR = Q3 – Q1


where
Q1 = LOWER QUARTILE = the median of the values before Q2
Q3 = UPPER QUARTILE = the median of the values after Q2

Here, before the median Q2=30, we have 5 numbers, hence


Q1=20 (this is the 3rd entry)
Also,
Q3=70 (it is the 3rd entry from the end)
Therefore,
IQR = 70-20 = 50

As the estimation of the values Q1, Q2, Q3 is quite tricky, let us see
some extra cases in the following example.

EXAMPLE 2

Remember that
 for the value of the median Q2 we consider the ((n+1)/2)-th entry;
 for the values of Q1 and Q3 we consider only the entries before
   and the entries after the median respectively.
a) For n=7 entries: 10, 20, 30, 40, 50, 60, 70
The median is Q2=40 (the 4th entry). Hence Q1=20, Q3=60.

b) For n=8 entries: 10, 20, 30, 40, 50, 60, 70, 80
The median is Q2=45 (the 4.5th entry). Hence Q1=25, Q3=65.

c) For n=9 entries: 10, 20, 30, 40, 50, 60, 70, 80, 90
The median is Q2=50 (the 5th entry). Hence Q1=25, Q3=75.

d) For n=10 entries: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
Then Q2=55 (the 5.5th entry). Hence Q1=30, Q3=80.


NOTICE
The square of the standard deviation is called variance. That is

   variance = σ² or sn²

For our example, σ² = 22.96² = 527.27

 USE OF GDC

We can use the GDC to easily obtain all these measures.


For Casio CFX we select
 MENU
 STAT
 Complete List 1 with values of x (our data)
 CALC
 (1VAR): We obtain all the statistics.

Notice that
The standard deviation in the GDC is denoted by σχ
The variance is not given; it is simply the square of σχ
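The GDC output can also be checked with a short Python script (not in the
syllabus). The sketch below uses the quartile convention of these notes,
i.e. Q1 and Q3 are the medians of the lower and upper halves of the ordered data:

from statistics import mean, median, mode, pstdev

data = sorted([10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80])
n = len(data)

# lower/upper halves exclude the middle entry when n is odd
half = n // 2
q1 = median(data[:half])
q3 = median(data[-half:])

print(mean(data))    # 40
print(median(data))  # 30
print(mode(data))    # 20
print(pstdev(data))  # 22.96...  (population standard deviation σ)
print(q3 - q1)       # IQR = 70 - 20 = 50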

 BOX AND WHISKER PLOT


Consider again the initial example
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

On an appropriate horizontal scale we mark 5 values:

   min, Q1, Q2, Q3, max

in the following way:

min Q1 Q2 Q3 max


This diagram is helpful, particularly when we have a large number


of entries. It shows the “density” of data within the whole range. In
fact, the box plot splits the whole range of data in 4 intervals.
Generally speaking, each interval contains 25% of the entries. Thus
the following conclusions can be drawn:

The lowest 25% is below Q1 The upper 25% is above Q3


The lowest 50% is below Q2 The upper 50% is above Q2
The middle 50% is between Q1 and Q3

 MORE DETAILS

1) Percentiles
The values Q1, Q2, Q3 are also called
Q1 : 25th-percentile
Q2 : 50th-percentile
Q3 : 75th-percentile

Other percentiles may also be defined in a similar way; we will give


further examples in the next paragraph.

2) Outliers

Very extreme values in a set of data (that is, very small or very
large values) may give a false impression of our data. They are known as
outliers. We agree that

an outlier is any value


below Q1 – 1.5×IQR
or above Q3 + 1.5×IQR,

Such a value is viewed as being too far from the central values to
be reasonable. In our example,

Q1 - 1.5×IQR = 20 - 1.5×50 = - 55

Q3 + 1.5×IQR = 70 + 1.5×50 = 145

i.e. there are no outliers.


 FORMULAS FOR VARIANCE AND STANDARD DEVIATION


(not in the syllabus)

The formulas are not in the syllabus. We give them just for
information.

If our data are x1, x2, …, xn

   the variance is given by            σ² = Σ(xi − μ)²/n

   the standard deviation is given by  σ = √( Σ(xi − μ)²/n )

For our example, since μ=40,

   variance = [(10-40)² + (20-40)² + (20-40)² + ⋯ + (80-40)²]/11 = 527.27

   standard deviation = √527.27 = 22.96

An alternative and more practical formula for the variance is given by

   σ² = (Σxi²)/n − μ²

For our example,

   variance = (10² + 20² + 20² + ⋯ + 80²)/11 − 40² = 527.27


4.3 FREQUENCY TABLES – GROUPED DATA

Consider again the numerical data:


10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
The total number of entries is n=11.

An alternative way of presentation is the frequency table:

Data Frequency
x f
10 1
20 3
30 2
40 1
50 1
70 2
80 1
n=11

Let us study again the basic measures for these data.

 MEASURES OF CENTRAL TENDENCY (The 3 M’s)

A) MEAN = The sum of all values divided by n.

The MEAN is given by

   mean = (1×10 + 3×20 + 2×30 + 1×40 + 1×50 + 2×70 + 1×80)/11 = 440/11 = 40

In general, given that fi is the frequency of the entry xi, the formula is

   μ = (f1x1 + f2x2 + f3x3 + ⋯)/n      or otherwise      μ = (Σfixi)/n


B) MODE = the most frequent value


It is very obvious now. The entry x of the highest frequency is
mode = 20

C) MEDIAN = The value in the middle

It is still the entry in position (n+1)/2, that is the 6th entry.

We can easily see that this is 30.

It helps here to add an extra column in the table above with the
so-called cumulative frequencies:

Data Frequency Cumulative


x f frequency (c.f.)
10 1 1
20 3 4
30 2 6
40 1 7
50 1 8
70 2 10
80 1 11
n=11

It simply gives the total number of entries up to each row. For


example, the total number of entries up to 20 is 1+3=4.
The MEDIAN, i.e. the 6th entry, is 30.

 MEASURES OF SPREAD

A) STANDARD DEVIATION

Again, it can be directly obtained by the GDC.

For our example the GDC gives σ = 22.96.

Thus the variance is σ2 = 527.27


B) RANGE = (maximum value of x) - (minimum value of x)


It is very obvious here
range = 80-10 = 70

C) INTERQUARTILE RANGE = IQR = Q3 – Q1


The cumulative frequency table helps here as well.

The median Q2=30 is in the 6th position.

Thus, before the median we have 5 entries. Since (5+1)/2 = 3,
   Q1=20 (this is the 3rd entry)
and
Q3=70 (this is the 3rd entry from the end)
Therefore,
IQR = 70-20 = 50

 USE OF GDC

We can use the GDC to easily obtain all these measures.


For Casio CFX we select
 MENU
 STAT
 Complete List 1 with values of x (our data)
List 2 with frequencies
 CALC
 SET: we check the first two lines
The first line is OK. (1Var XList :List1)
For the second line (1Var Freq :----), select between
F1: enter 1, if there are no frequencies
F2: enter List 2 to consider frequencies
 Go back (EXIT)
 1VAR: We obtain all the statistics.

Check the value of n first (number of entries), to ensure that


all data have been considered.


NOTICE (for the GDC)


 The variance is not given; it is simply the square of σχ
 Since the GDC gives minX,Q1,Med,Q3,maxX remember that

Range = maxX – minX      Interquartile Range = Q3 – Q1

The box and whisker plot uses exactly those 5 measures


 Extra information given:
Σx : the sum of all entries, i.e. x1+x2+x3+…
Σx²: the sum of the squares, i.e. x1²+x2²+x3²+…
sχ : it is known as unbiased st. deviation (not in the syllabus!)

 GROUPED DATA

Suppose that 100 students took an exam and obtained scores from
1 to 60 (full marks), according to the following table:

Score (x)      Midpoint (for x)   No of students (frequency f)   Cumulative frequency (cf)

0 < x ≤ 10            5                     8                               8
10 < x ≤ 20          15                    12                              20
20 < x ≤ 30          25                    10                              30
30 < x ≤ 40          35                    25                              55
40 < x ≤ 50          45                    35                              90
50 < x ≤ 60          55                    10                             100
                                         n=100

i.e. 8 students obtained scores from 1 up to 10, and so on.

 The mean and the standard deviation are still calculated as in a
   usual frequency table, but now x1, x2, x3, … are the midpoints of
   the intervals. For example,

   μ = (8×5 + 12×15 + 10×25 + 25×35 + 35×45 + 10×55)/100 = 34.7


These measures may also be obtained by the GDC, where the


LIST1 contains the midpoints of x. Here,
μ=34.7 σ =14.31
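A minimal Python sketch (not in the syllabus) that reproduces the grouped-data
mean and standard deviation from the midpoints and frequencies, in the same way
the GDC uses List 1 and List 2:

midpoints   = [5, 15, 25, 35, 45, 55]
frequencies = [8, 12, 10, 25, 35, 10]

n = sum(frequencies)                                         # 100
mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n

print(mean)             # 34.7
print(variance ** 0.5)  # about 14.3  (σ)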

 Moreover, instead of the mode we have the modal group here.


That is the interval of the highest frequency. In our example, the
modal group is 40 < x ≤ 50.

 For the median Q2 and the quartiles Q1 and Q3:


we need to draw the so-called cumulative frequency diagram
x-axis: values of x (we consider upper bounds of intervals)
y-axis: cumulative frequencies

   x (up to):   ≤10    ≤20    ≤30    ≤40    ≤50    ≤60
   y (c.f.):      8     20     30     55     90    100

From the diagram: Q1 = 25   Q2 = 38   Q3 = 46

For the estimation of Q1, Q2, Q3 follow


Step 1: Divide y-axis into four equal parts


(Here we divide at y=25, y=50, y=75)
Step 2: Draw three horizontal lines until you meet the curve
Step 3: Draw three vertical lines from the intersection points
Obtain Q1, Q2, Q3 on the x-axis (see above)

Below that graph we can easily draw the box and whisker plot:

Min=0 Q1=25 Q2=38 Q3=46 Max=60

 Remember that the values Q1, Q2, Q3 are also called


Q1 : 25th-percentile
Q2 : 50th-percentile
Q3 : 75th-percentile

In the same way we can find any percentile. For example, for the
40th-percentile
Estimate 40% of n: here 40% of 100 students is 40;
Draw a horizontal line at y=40 until you meet the curve;
Then draw a vertical line;
Hence
40th-percentile = 35.

In other words, 40% of the students have scores below 35.

 Let us check if there are outliers:


IQR = 46-25=21

Q1 - 1.5×IQR = 25 - 1.5×21 = - 6.5

Q3 + 1.5×IQR = 46 + 1.5×21 = 77.5

There are no scores lower than -6.5 or greater than 77.5, that is
there are no outliers.


4.4 REGRESSION

We have a list of paired data. For example

x 10 12 15 20 23 28 30
y 120 135 174 213 270 301 305

We assume that x is the independent variable, y is the dependent


variable. Let us also see these points (x,y) on a scatter diagram.
[scatter diagram of the seven points (x,y)]

The main question here is whether there is a linear relationship


between the values of x and the corresponding values of y.

There is a parameter r, called correlation coefficient3 that gives the


extent of this relationship. It takes values
-1 ≤ r ≤ 1

The closer r is to the ends ±1, the more our data are linearly related
(-1 implies a negative slope while +1 implies a positive slope).
The closer r is to 0, the less our data are linearly related.

There is also a line y=ax+b that best fits our data; it is known as
regression line. We can easily obtain these details by using a GDC.

3It is known as Pearson’s product-moment correlation coefficient


 USE OF GDC

For Casio CFX we select


 MENU
 STAT
 Complete List 1 with values of x; List 2 with values of y
 CALC
 REG
 X
 aX+b : look at the values of a,b,r.

For our example,

   r = 0.99 : there is a very strong correlation between x and y
   a = 9.83, b = 23.1 : the regression line is y = 9.83x + 23.1

[scatter diagram of the points together with the line y = 9.83x + 23.1]

By using the regression line y=f(x) we may predict values of y


corresponding to values of x that are not in the list. For example

   for x=18, we estimate y = 9.83×18 + 23.1 ≈ 200

   for x=40, we estimate y = 9.83×40 + 23.1 ≈ 416

Notice that x=18 is within the range of our list while x=40 is not.
f(18)=200 is known as interpolation, f(40)=416 as extrapolation.
In general, interpolations are more reliable than extrapolations.
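For readers who want to verify the GDC values, a short Python sketch (not in
the syllabus, and assuming the scipy library is available) that computes a, b
and r for the data above:

from scipy import stats

x = [10, 12, 15, 20, 23, 28, 30]
y = [120, 135, 174, 213, 270, 301, 305]

result = stats.linregress(x, y)
print(result.slope)      # a ≈ 9.83
print(result.intercept)  # b ≈ 23.1
print(result.rvalue)     # r ≈ 0.99

# interpolation at x=18 and extrapolation at x=40
print(result.slope * 18 + result.intercept)   # ≈ 200
print(result.slope * 40 + result.intercept)   # ≈ 416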

Notice. In order to predict a value of x corresponding to a given y


we do not use the same regression line. We find a new regression
line for x on y. In our example, the GDC gives x =0.0997y-1.92


 CHARACTERISTICS OF THE REGRESSION LINE y=ax+b


The regression line
 passes through the point M(x̄, ȳ), where
     x̄ = the mean of the values of x
     ȳ = the mean of the values of y
 separates the points into (almost) two halves: half of the points
   are above and half below the line.

The values of x̄, ȳ can also be obtained by the GDC (together with
other statistics). In the STAT mode, after inserting the values of x
and y, select

 CALC
 2VAR: We obtain all the statistics, separately for x’s and y’s

In our example
   x̄ = 19.7    ȳ = 216.9
Thus the line passes through the point M(19.7, 216.9).

 CHARACTERISTICS OF THE CORRELATION COEFFICIENT r

The correlation between x and y is characterised according to the


value of r as follows:

   -1    to -0.75 : strong negative correlation
   -0.75 to -0.5  : moderate negative correlation
   -0.5  to -0.25 : weak negative correlation
   -0.25 to  0.25 : very weak or no correlation
    0.25 to  0.5  : weak positive correlation
    0.5  to  0.75 : moderate positive correlation
    0.75 to  1    : strong positive correlation

To better understand the correlation coefficient r, let us see some


characteristic cases (find the results below in your GDC for
practice).


Data                          Results

x: 1, 2, 3, 4, 5              r = 1, perfect positive correlation
y: 2, 4, 6, 8, 10             Regression line: y = 2x

x: 1, 2, 3, 4, 5              r = -1, perfect negative correlation
y: 10, 8, 6, 4, 2             Regression line: y = -2x + 12

Let us slightly modify our data

x: 1, 2, 3, 4, 5              r = 0.98, strong positive correlation
y: 2, 3, 7, 8, 10             Regression line: y = 2.1x - 0.3

x: 1, 2, 3, 4, 5              r = -0.98, strong negative correlation
y: 10, 8, 7, 3, 2             Regression line: y = -2.1x + 12.3

and a final extreme case

x: 1, 2, 3, 4, 5              r = 0, no correlation at all
y: 8, 2, 5, 2, 8              Regression line: y = 5


 SPEARMAN’S RANK CORRELATION COEFFICIENT rs

The Spearman correlation coefficient is defined as the Pearson


correlation coefficient (seen above) between the rank variables.

Look at the example


x y
10 105
20 103
30 125
40 130
50 128

The Pearson correlation coefficient is r = 0.88.

Let us also observe the ranks of the data

x     y      rank of x   rank of y
10    105        1           2
20    103        2           1
30    125        3           3
40    130        4           5
50    128        5           4

The correlation coefficient between the last two columns is 0.8.

This is the Spearman correlation coefficient rs.


r = 0.88 rs = 0.8

Pearson's correlation coefficient (r) indicates the degree of linear


relationship between two variables.

Spearman's correlation coefficient (rs) indicates the degree of
monotonic relationship between the variables (either linear or not),
that is, to what extent y increases as x increases.

In the case of equal data values we use average ranks. For example, if
the first four values for y are equal, say 20, 20, 20, 20

instead of ranks 1, 2, 3, 4
we use the ranks 2.5, 2.5, 2.5, 2.5.


It is possible to have r ≠ 1 and rs = 1.

Look at one of the examples we saw earlier

x: 1, 2, 3, 4, 5              r = 0.98, strong positive correlation
y: 2, 3, 7, 8, 10             Regression line: y = 2.1x - 0.3

We write down the ranks for our data:

x     y      rank of x   rank of y
1     2          1           1
2     3          2           2
3     7          3           3
4     8          4           4
5     10         5           5

Here
r = 0.98
rs = 1 (perfect monotonic relationship)

since y always increases as x increases.

Spearman’s correlation coefficient is less sensitive to outliers than


Pearson’s product moment correlation coefficient.
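Both coefficients can be checked with a short Python sketch (not in the
syllabus, assuming scipy is available) for the first table of this paragraph:

from scipy import stats

x = [10, 20, 30, 40, 50]
y = [105, 103, 125, 130, 128]

r, _ = stats.pearsonr(x, y)     # Pearson's product-moment correlation r
rs, _ = stats.spearmanr(x, y)   # Spearman's rank correlation rs

print(round(r, 2))   # 0.88
print(round(rs, 2))  # 0.8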


4.5 ELEMENTARY SET THEORY

 BASIC NOTIONS
In elementary set theory, a set is just a collection of objects (or
elements). It is usually denoted by a capital letter. For example,

R = the set of real numbers


Q = the set of rational numbers

When listed, the elements of a set are separated by commas “,”


and included between the symbols { and }. For example,

N = {0,1,2,3,4,…} (i.e. the set of natural numbers)


Z = {…,-3,-2,-1,0,1,2,3,…} (i.e. the set of all integers)

or less popular sets, such as

A = {1,2,3} (it contains only 3 elements)


B = {a,b,c,d} (it contains 4 letters)
C = {Chris, Mary, Tom} (it contains 3 names)
etc

To declare that the element a is contained in set B we write

   a ∈ B

To declare that the element f is not contained in set B we write

   f ∉ B

The most trivial set is the empty set. It contains no elements; it is
denoted by { } or by the symbol ∅.

Let us consider the set A = {1,2,3}. The subsets of A are sets that
contain some (or none or all) elements of A. There are 8 subsets:

   ∅
   {1}, {2}, {3}
   {1,2}, {1,3}, {2,3}
   {1,2,3}


In general,
   if A contains n elements, there are 2ⁿ subsets.

Indeed, here, A contains 3 elements and possesses 2³ = 8 subsets.

If A = {1,2,3} and B = {1,2}, to declare that B is a subset of A, we write

   B ⊆ A

Do not forget that always

   ∅ ⊆ A   (The empty set is a subset of any set)
   A ⊆ A   (Any set is a subset of itself)

All subsets of A except A itself are also called proper subsets. To
emphasize that B is a proper subset of A we write

   B ⊂ A

 VENN DIAGRAMS
We usually refer to a large set S, called universal set, and consider
several subsets of S.
Let
S = { a,b,c,d,e,f,g,h,i,j }

be our universal set. We consider the subset


A = {a,b,c,d,e}

A helpful way to present this information is by using a Venn


diagram:

[Venn diagram: the elements a, b, c, d, e lie inside A; the elements f, g, h, i, j lie outside A]


If we also consider the subset


B = {d,e,f,g}
the Venn diagram becomes

[Venn diagram: d, e lie in the overlap of A and B; a, b, c in A only; f, g in B only; h, i, j outside both]

As we usually deal with large universal sets, in a Venn diagram we


are not interested so much for the elements themselves but only for
the number of elements in each region. In this case the Venn
diagram above takes the form

[Venn diagram with n(S)=10: region A only 3, region A∩B 2, region B only 2, and 3 elements outside]

We denote by
n(A) = the number of elements of set A

In our example
n(S) = 10
n(A)=5 n(B)=4

Notice that the number n(A)=5 does not appear on the Venn
diagram. The subset A consists of two regions of size 3 and 2, thus
n(A)=3+2=5


Now we can study some basic operations between sets. Let us refer
again to our example where S = { a,b,c,d,e,f,g,h,i,j } and
A = {a,b,c,d,e}
B = {d,e,f,g}

 THE COMPLEMENT OF A: A΄ (not A)


It contains the elements that are not in A.

[Venn diagram: the region outside A is shaded]

In our example A΄ = {f,g,h,i,j}

Sometimes the complement of A is also denoted by Ā.

 THE UNION OF A AND B: A∪B (A or B)

It contains all the elements that are either in A or in B.

[Venn diagram: both circles A and B are shaded]

In our example A∪B = {a,b,c,d,e,f,g}

 THE INTERSECTION OF A AND B: A∩B (A and B)

It contains the common elements of A and B.

[Venn diagram: only the overlap of A and B is shaded]

In our example A∩B = {d,e}


 A BASIC PROPERTY

   n(A∪B) = n(A) + n(B) – n(A∩B)

Indeed, in our example

   n(A∪B)=7, n(A)=5, n(B)=4, n(A∩B)=2

Notice that A∪B contains 7 elements, not 5+4=9, as in n(A)+n(B)
we count the common elements twice. Thus,

   7 = 5 + 4 – 2
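These set operations and the basic property can be illustrated with a short
Python sketch (not in the syllabus), using the sets of our example:

S = set("abcdefghij")     # universal set
A = set("abcde")
B = set("defg")

print(sorted(A | B))      # union A∪B: ['a','b','c','d','e','f','g']
print(sorted(A & B))      # intersection A∩B: ['d','e']
print(sorted(S - A))      # complement A΄: ['f','g','h','i','j']

# n(A∪B) = n(A) + n(B) - n(A∩B)
print(len(A | B) == len(A) + len(B) - len(A & B))   # True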

 MUTUALLY EXCLUSIVE SETS

If A∩B = ∅, then n(A∩B) = 0

[Venn diagram: A and B are disjoint circles]

In this case only

   n(A∪B) = n(A) + n(B)

and the two sets A and B are said to be mutually exclusive.


4.6 PROBABILITY

We start again with a universal set S. In probability theory this set


is known as the sample space; it contains all possible outcomes of a
game, or experiment, etc. The subsets A,B, … of the sample space S
are called events.

Consider the sample space S. The number of elements in S, that is
n(S), is denoted by TOTAL. The probability of some event A is
simply defined by

   P(A) = n(A)/TOTAL

For example, in the following Venn diagram, the sample space S
contains 100 elements, while the event A contains 30 elements.

[Venn diagram: n(S)=100, with 30 elements inside A and 70 outside]

   P(A) = n(A)/TOTAL = 30/100 = 0.3

In simple words, if we choose an element from S at random
(provided that every element is equally likely to be selected), the
probability that this element belongs to A is 30 out of 100, in other
words 30% (that is 0.3).

We understand that
   0 ≤ P(A) ≤ 1
Clearly
   P(∅) = 0 and P(S) = 1


 COMPLEMENTARY EVENTS
In our example above P(A΄) = 0.7
In general
P(A΄) = 1- P(A)

 COMBINED EVENTS
Remember the basic property for combined events

   n(A∪B) = n(A) + n(B) – n(A∩B)

If we divide all terms by the TOTAL we obtain

   P(A∪B) = P(A) + P(B) – P(A∩B)

For example, consider

[Venn diagram with n(S)=100: A only 20, A∩B 10, B only 30, outside 40]

Then
   P(A) = 0.3, P(B) = 0.4, P(A΄) = 0.7, P(B΄) = 0.6
Also
   P(A∩B) = 0.1, P(A∪B) = 0.6
Clearly
   P(A∪B) = P(A) + P(B) – P(A∩B)
   0.6 = 0.3 + 0.4 – 0.1

A Venn diagram may also contain probabilities instead of numbers
of elements. The Venn diagram above takes the form

[Venn diagram: A only 0.2, A∩B 0.1, B only 0.3, outside 0.4]


 MUTUALLY EXCLUSIVE EVENTS

[Venn diagram: A and B are disjoint circles]

We have seen that two events are mutually exclusive if

   A∩B = ∅ or equivalently n(A∩B) = 0;
equivalently, if
   P(A∩B) = 0
In this case only
   P(A∪B) = P(A) + P(B)

EXAMPLE 1
Given that P(A) = 0.5, P(B) = 0.3, P(A∪B) = 0.6, let us construct a
Venn diagram representing the combined events A and B.
Notice that
   P(A∪B) ≠ P(A) + P(B)
   0.6 ≠ 0.8
The difference implies the existence of an intersection; P(A∩B) = 0.2.
Starting from the intersection 0.2 we may easily complete the
following Venn diagram

[Venn diagram: A only 0.3, A∩B 0.2, B only 0.1, outside 0.4]

After completing the Venn diagram, we are in a position to answer
any probability question. For example
   P(A∩B΄) = 0.3    P(A΄∩B) = 0.1    P(A΄∩B΄) = 0.4
   P(A∪B΄) = 0.9    P(A΄∪B) = 0.7    P(A΄∪B΄) = 0.8


 TABLES
Another way to represent sets in order to find probabilities is the
tabular form below. It is appropriate when the sample space is
partitioned into disjoint subsets according to two different criteria; for
example MALE-FEMALE and SMOKERS-NON SMOKERS.

Let us consider the following group of 200 people

male female Total


smoker 40 20 60
non-smoker 80 60 140
Total 120 80 200

In order to find the probability of a group (or combination of


groups) we simply divide its size by 200, the total number of
people. Thus

If we select a person at random the probability that this person is

 male:             P(male) = 120/200 = 0.6
 female:           P(female) = 80/200 = 0.4
 smoker:           P(smoker) = 60/200 = 0.3
 non-smoker:       P(non-smoker) = 140/200 = 0.7
 male AND smoker:  P(male ∩ smoker) = 40/200 = 0.2
 male OR smoker:   P(male ∪ smoker) = 140/200 = 0.7

Notice: In the last probability, we consider the column of males and
the row of smokers, but the combination male-smoker is counted
only once. It holds again

   P(male ∪ smoker) = P(male) + P(smoker) – P(male ∩ smoker)


Some problems require particular techniques for counting the


appropriate group size. Tossing two dice is a characteristic example.

 TWO DICE

We toss two dice. There are 36 possible outcomes (combinations of
scores). A 6×6 table, with the score of the first die down the rows
and the score of the second die along the columns, helps to
visualize the 36 outcomes.

Notice that there is only one combination of two ones (the cell 1-1)
but two combinations of one and two (the cells 1-2 and 2-1).

We find the following probabilities:

   P(two sixes) = 1/36                (the single cell 6-6)

   P(at least one six) = 11/36        (last column and last row)

   P(exactly one six) = 10/36         (why?)

   P(same score) = 6/36               (the main diagonal: 1-1, 2-2, …)

   P(sum of scores = 9) = 4/36        (the cells 3-6, 4-5, 5-4, 6-3)

   P(sum of scores > 9) = 6/36        (the cells with sum 10, 11 or 12)

   P(sum of scores < 9) = 26/36       (all the remaining cells)
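All of these probabilities can be verified by simply listing the 36 outcomes;
a short Python sketch (not in the syllabus):

from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]  # 36 pairs

def prob(event):
    # fraction of the 36 equally likely outcomes satisfying the event
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

print(prob(lambda o: o == (6, 6)))        # 1/36   two sixes
print(prob(lambda o: 6 in o))             # 11/36  at least one six
print(prob(lambda o: o.count(6) == 1))    # 5/18   exactly one six (10/36)
print(prob(lambda o: o[0] == o[1]))       # 1/6    same score (6/36)
print(prob(lambda o: sum(o) == 9))        # 1/9    sum equal to 9 (4/36)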

4.7 CONDITIONAL PROBABILITY – INDEPENDENT EVENTS

Notice the following difference in notation

P(A) means “probability of A”

P(A|B) means “probability of A, given B”

Intuitively, we expect that

   “the probability that it will rain on some day”

is different from

   “the probability that it will rain on some day,
    given that this is a day of September”

In a more mathematical example, suppose that we pick a whole


number in the range 1-100. Let

A = “we pick 17”

Clearly P(A) = 1/100.

However, if we know the information

   B = “the number selected has two digits”

then P(A|B) = 1/90 (there are 90 two-digit numbers)

 FORMAL DEFINITION OF P(A|B)

The conditional probability is given by the formula

   P(A|B) = n(A∩B)/n(B)      or      P(A|B) = P(A∩B)/P(B)

We will clarify the definition by using Venn diagrams and Tables


 P(A|B) IN A VENN DIAGRAM

Let us consider the example

[Venn diagram with n(S)=100: A only 20, A∩B 10, B only 30, outside 40]

We know that P(A) = 30/100. What about P(A|B)?

We start with the given event B; now the total number is not 100,
the size of the whole sample space, but only 40, the size of B:

   P(A|B) = ? / 40      [the denominator 40 comes from the given event B]

How many elements of A are inside the given space B? Only 10.
Therefore,
   P(A|B) = 10/40

NOTICE
In fact, in the last result we apply the formula

   P(A|B) = n(A∩B)/n(B) = 10/40 = 0.25

If we divide both the numerator and the denominator by the
TOTAL number of the sample space we obtain the formal definition

   P(A|B) = P(A∩B)/P(B) = (10/100)/(40/100) = 0.25


 Similarly we obtain

   P(B|A) = 10/30      [the denominator 30 comes from the given event A]

 Similarly we obtain

   P(A΄|B) = 30/40      P(A|B΄) = 20/60      P(A΄|B΄) = 40/60

 P(A|B) IN A TABLE
Perhaps it is much easier to observe the conditional probability in
tables. Consider again the example

male female Total


smoker 40 20 60
non-smoker 80 60 140
Total 120 80 200

Observe the difference between the probabilities

   P(smoker)        the person is a smoker
   P(smoker|male)   the person is a smoker, given that the person is male

Clearly,
   P(smoker) = 60/200

   P(smoker|male) = 40/120      [the denominator 120 comes from the given: male]

 Similarly we obtain

   P(male|smoker) = 40/60       [given: smoker]

 Similarly we obtain

   P(female|smoker) = 20/60 ≈ 0.33      P(non-smoker|female) = 60/80 = 0.75


 INDEPENDENT EVENTS

The events A and B are said to be independent if

   P(A|B) = P(A)

In other words, the event B does not affect A;
the probability of A remains the same, whether B is given or not!

 Similarly, in this case it holds P(B|A) = P(B)
   That is, the event A does not affect B.

 In this case the definition P(A|B) = P(A∩B)/P(B) gives

   P(A∩B) = P(A|B)·P(B)    P(A∩B) = P(A)·P(B)

To summarize, A and B are independent when

   P(A|B) = P(A)            (1)
   P(B|A) = P(B)            (2)
   P(A∩B) = P(A)·P(B)       (3)

EXAMPLE 1

[Venn diagram with n(S)=120: A only 20, A∩B 10, B only 30, outside 60]

We can show in three different ways that A and B are independent

 P(A) = 30/120 = 1/4  and  P(A|B) = 10/40 = 1/4,     thus (1) holds
 P(B) = 40/120 = 1/3  and  P(B|A) = 10/30 = 1/3,     thus (2) holds
 P(A∩B) = 10/120 = 1/12 = P(A)·P(B) = (1/4)·(1/3),   thus (3) holds
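The three conditions can be checked with a short Python sketch (not in the
syllabus), using exact fractions for the region sizes of EXAMPLE 1:

from fractions import Fraction as F

# Region sizes from the Venn diagram: A only, A∩B, B only, outside
a_only, both, b_only, outside = 20, 10, 30, 60
total = a_only + both + b_only + outside      # 120

p_a = F(a_only + both, total)                 # 1/4
p_b = F(b_only + both, total)                 # 1/3
p_a_and_b = F(both, total)                    # 1/12
p_a_given_b = F(both, b_only + both)          # 1/4

print(p_a_given_b == p_a)        # True, condition (1)
print(p_a_and_b == p_a * p_b)    # True, condition (3)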


NOTICE
 Many students confuse the terms
   Mutually exclusive events and Independent events

   Remember
   Mutually exclusive events means A∩B = ∅
   Independent events means P(A∩B) = P(A)·P(B)

 Mind that
   P(A∪B) = P(A) + P(B) – P(A∩B)   holds in general
   P(A∩B) = P(A)·P(B)              holds for independent events

   In particular for independent events, it is sometimes useful to
   combine these two formulas in the following one

   P(A∪B) = P(A) + P(B) – P(A)·P(B)

 Sometimes we know beforehand that two events are
   independent. Thus, for their combination we can apply the
   formula P(A∩B) = P(A)·P(B).
   For example,
   we toss a die and a coin; find the probability that the die
   shows a SIX and the coin shows a HEAD.
   We call
      A = “the die shows a SIX”       P(A) = 1/6
      B = “the coin shows a HEAD”     P(B) = 1/2

   The events A and B are clearly independent and for their
   combination it holds

      P(A∩B) = P(A)·P(B) = (1/6)·(1/2) = 1/12


EXAMPLE 2
Let P(A)=0.4 and P(B)=0.3. Find P(A∪B) in the following cases
a) A and B are mutually exclusive
b) A and B are independent
c) P(A∩B) = 0.2
d) P(A|B) = 0.2
Solution
a) P(A∪B) = P(A) + P(B) = 0.4 + 0.3 = 0.7
b) P(A∪B) = P(A) + P(B) – P(A)·P(B) = 0.4 + 0.3 – (0.4)(0.3) = 0.58
c) P(A∪B) = P(A) + P(B) – P(A∩B) = 0.4 + 0.3 – 0.2 = 0.5
d) P(A|B) = P(A∩B)/P(B)    P(A∩B) = P(A|B)·P(B) = (0.2)(0.3) = 0.06
   Hence, P(A∪B) = P(A) + P(B) – P(A∩B) = 0.4 + 0.3 – 0.06 = 0.64

EXAMPLE 3
Let A and B be independent events with
   P(A) = 0.4 and P(A∪B) = 0.7.
Find P(B).
Solution
For independent events it holds
   P(A∪B) = P(A) + P(B) – P(A)·P(B)
    0.7 = 0.4 + P(B) – 0.4 P(B)
    0.3 = 0.6 P(B)
    P(B) = 0.5


4.8 TREE DIAGRAMS

Very often we have to estimate the probability in a sequence of


events under different scenarios. The best way to represent such a
problem is by a tree diagram.

PROBLEM 1. We play a game with two possible results.

For example we pick one of the following 10 letters

AAAA BBBBBB

The results are


A with probability 0.4
B with probability 0.6

We play the game twice. All possible scenarios are shown below; the
corresponding probabilities are shown on the branches of the tree:

[Tree diagram]
   A (0.4) → A (0.4) : scenario AA
   A (0.4) → B (0.6) : scenario AB
   B (0.6) → A (0.4) : scenario BA
   B (0.6) → B (0.6) : scenario BB

Next, for each scenario we multiply the corresponding probabilities

for AA: (0.4)x(0.4) = 0.16


for AB: (0.4)x(0.6) = 0.24,
etc


Thus, the final “picture” of the tree diagram is as follows

[Tree diagram with the resulting probabilities]
   A (0.4) → A (0.4) : AA  0.16
   A (0.4) → B (0.6) : AB  0.24
   B (0.6) → A (0.4) : BA  0.24
   B (0.6) → B (0.6) : BB  0.36

(notice that the sum of the resulting probabilities is 1).

Now any probability may be found by adding the relevant results.

Namely, the probability

 to obtain two A’s is 0.16

 to obtain two B’s is 0.36

 to obtain first A and then B is 0.24

 to obtain one A, one B is 0.24 + 0.24 = 0.48

 to obtain the same result is 0.16 + 0.36 = 0.52

(thus to obtain different results is 1- 0.52 = 0.48)

If we refer to the number of A’s, the probability

 to obtain no A is 0.36

 to obtain exactly one A is 0.24 + 0.24 = 0.48

 to obtain at least one A is 0.24 + 0.24 + 0.16 = 0.64

 to obtain at most one A is 0.24 + 0.24 + 0.36 = 0.84


PROBLEM 2. We play the previous game once more. According to
the result we play a different second game.

For example we pick one of the following 10 letters

AAAA BBBBBB

If the first result is A we pick one letter among CCC DDDDDDD

If the first result is B we pick one letter among CCCC DDDDDD

What is the probability to obtain C?

A tree diagram is particularly helpful in such a situation where the


second game depends on the first one:

[Tree diagram with the resulting probabilities]
   A (0.4) → C (0.3) : AC  0.12
   A (0.4) → D (0.7) : AD  0.28
   B (0.6) → C (0.4) : BC  0.24
   B (0.6) → D (0.6) : BD  0.36

(notice again that the sum of the resulting probabilities is 1).

Thus

the probability to obtain C is 0.12 + 0.24 = 0.36

the probability to obtain D is 0.28 + 0.36 = 0.64


It is worthwhile to mention the following probabilities:

 to obtain A and C. It is 0.12
   It is in fact P(A∩C) and refers to the first scenario

 to obtain A or C. It is 0.12 + 0.28 + 0.24 = 0.64
   It is in fact P(A∪C) and refers to the first three scenarios which
   contain either A or C (or both).

NOTICE

In the tree diagram above, the value 0.3 of the branch AC is in fact
the conditional probability

P(C|A) = Probability to obtain C, given that the first letter is A

In general, in a tree diagram

 the branches of the 1st column contain simple probabilities


of the form P(X)

 the branches of the 2nd column contain conditional probabilities


of the form P(Y|X)

 the results in the last column are combined probabilities
   of the form P(X∩Y)

[Generic tree diagram]
   A: P(A) → C: P(C|A)  gives  P(A∩C)
   A: P(A) → D: P(D|A)  gives  P(A∩D)
   B: P(B) → C: P(C|B)  gives  P(B∩C)
   B: P(B) → D: P(D|B)  gives  P(B∩D)


We may have more complicated tree diagrams, with more


branches per level, more levels, etc.

EXAMPLE 1.
We throw a die.
If we get 1 we stop.
If we get 2,3,4 or 5 we toss a coin.
If we get 6 we toss two coins.
Find the probability that only one head is obtained.
Solution.
For our convenience, we denote the results of the die by
A={1}, B={2,3,4,5}, C={6}

We construct the following tree diagram:

[Tree diagram]
   A (1/6) : stop
   B (4/6) : H (1/2)  or  T (1/2)
   C (1/6) : HH, HT, TH, TT (each coin branch has probability 1/2)

There are finally 7 scenarios (seven paths).
In 3 of them we have exactly one HEAD (the paths B-H, C-HT, C-TH).
We add the corresponding results:

   P(only one HEAD) = (4/6)(1/2) + (1/6)(1/2)(1/2) + (1/6)(1/2)(1/2)
                    = 4/12 + 1/24 + 1/24 = 5/12


 A TYPICAL EXAMPLE: COLORED BALLS IN A BOX

A box contains 10 balls: 6 BLACK and 4 WHITE.

We select two balls, one after the other. All possible outcomes are
clearly shown on the following tree diagram

[Tree diagram]
   B (6/10) → B (5/9) : BB  30/90
   B (6/10) → W (4/9) : BW  24/90
   W (4/10) → B (6/9) : WB  24/90
   W (4/10) → W (3/9) : WW  12/90

   P(both balls are BLACK) = (6/10)×(5/9) = 30/90 = 1/3

   P(only one ball is BLACK) = 2×(6/10)×(4/9) = 2×(24/90) = 8/15

   P(balls of same color) = (6/10)×(5/9) + (4/10)×(3/9) = 42/90 = 7/15

If we select 3 balls, we may follow the same rationale and answer
directly without drawing a tree diagram. Thus,

   P(all three balls are BLACK) = (6/10)×(5/9)×(4/8) = 1/6

   P(only one ball is BLACK) = 3×(6/10)×(4/9)×(3/8) = 3/10


 THE “REVERSE GIVEN”

Consider again the tree diagram of PROBLEM 2

[Tree diagram with the resulting probabilities]
   A (0.4) → C (0.3) : AC  0.12
   A (0.4) → D (0.7) : AD  0.28
   B (0.6) → C (0.4) : BC  0.24
   B (0.6) → D (0.6) : BD  0.36

We said that P(C|A) = 0.3 is shown on the tree (on the branch AC).

What about P(A|C)?

Notice the “reverse” chronological order:

given that the final result is C,


what is the probability that the first result was A?

This result is not shown on the tree diagram; it is estimated as
follows

   P(A|C) = 0.12/(0.12 + 0.24)      [combination AC, divided by all paths giving C]

Actually, it is the formula P(A|C) = P(A∩C)/P(C).
Therefore,

   P(A|C) = 0.12/0.36 ≈ 0.33      P(B|C) = 0.24/0.36 ≈ 0.67

   P(A|D) = 0.28/0.64 ≈ 0.44      P(B|D) = 0.36/0.64 ≈ 0.56

Remark: the formula that calculates the “reverse” probability is


known as Bayes’ Theorem.


EXAMPLE 2.
In a private school party, 30% of the students wear RED suits, 20%
wear GREEN suits and 50% wear BLUE suits. 25% of the RED
students, 35% of the GREEN students and 45% of the BLUE
students are MALE. Find the probability that a MALE student
wears a GREEN suit, that is
P(GREEN|MALE).
Solution.
Instead of applying the Bayes’ formula we will construct a tree
diagram to obtain the “inverse given” probability.

Notice that we do not complete all the probabilities on the tree
diagram, but only the “necessary” ones.

[Tree diagram]
   RED   (0.3) → MALE (0.25) : 0.075
   GREEN (0.2) → MALE (0.35) : 0.070
   BLUE  (0.5) → MALE (0.45) : 0.225
   (the FEMALE branches are not needed)

Therefore,

   P(GREEN|MALE) = 0.070/(0.075 + 0.070 + 0.225) = 0.07/0.37 ≈ 0.189

In other words, 18.9% of the MALE students wear GREEN suits.
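The same “reverse given” calculation can be written as a short Python sketch
(not in the syllabus), which mirrors the tree diagram above:

# Prior probabilities of the suits and P(MALE | suit) from the example
p_suit = {"RED": 0.3, "GREEN": 0.2, "BLUE": 0.5}
p_male_given_suit = {"RED": 0.25, "GREEN": 0.35, "BLUE": 0.45}

# P(suit ∩ MALE) for each branch of the tree
joint = {s: p_suit[s] * p_male_given_suit[s] for s in p_suit}   # 0.075, 0.07, 0.225

p_male = sum(joint.values())                                    # 0.37
p_green_given_male = joint["GREEN"] / p_male

print(round(p_green_given_male, 3))   # 0.189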


4.9 PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE X

Roughly speaking, a random variable X takes on some values in a


given domain at random!!! It may be

   Discrete                        OR   Continuous
   e.g. X ∈ {10,20,30}                  e.g. X ∈ [10,20]
        X ∈ {0,1,2,3,…}                       X ∈ R

A discrete variable takes on values in a finite or countable set,


while a continuous variable takes on values in some interval(s).

In this paragraph we only deal with discrete random variables.

 DISCRETE RANDOM VARIABLE

Let X be a variable which takes on the values

10, 20, 30
with probabilities
0.2, 0.3, 0.5

respectively. We often use a table

x 10 20 30

P(X=x) 0.2 0.3 0.5

Clearly

(i) all the probabilities are non-negative numbers; and


(ii) their sum is always 1.

Then we say that X is a discrete random variable.

To express that the probability that X=10 is 0.2 we write

P(X=10) = 0.2

Similarly, P(X=20) = 0.3 and P(X=30) = 0.5.


In general, for a discrete random variable X with

x x1 x2 x3 …

P(X=x) p1 p2 p3 …

it holds
(i) pi ≥ 0, for all i
(ii) Σpi = 1, i.e. p1 + p2 + p3 + ⋯ = 1

We write
   P(X=x1) = p1, P(X=x2) = p2, and so on.

(We also say that a probability function p: xi ↦ pi is defined).

 THE EXPECTED VALUE μ=E(X)

The mean μ, or otherwise the expected value E(X), is defined by

   Ε(Χ) = Σxipi = x1p1 + x2p2 + x3p3 + ⋯

For our example

x 10 20 30

P(X=x) 0.2 0.3 0.5

the expected value (otherwise the mean) is

   E(X) = 10×0.2 + 20×0.3 + 30×0.5 = 23

NOTICE: Explanation for μ=E(X)


In fact the mean here is not different than the mean in statistics

Consider the following ten numbers

10, 10, 20, 20, 20, 30, 30, 30, 30, 30

The probabilities to select 10, 20 or 30 are as in the table above.

The mean in statistics is also

   μ = (2×10 + 3×20 + 5×30)/10 = 10×(2/10) + 20×(3/10) + 30×(5/10) = 23


EXAMPLE 1
Consider
x 10 20 30
P(X=x) a b 0.5

Given that E(X)=23, find the values of a and b.


Solution.
We use two relations
   a + b + 0.5 = 1    a + b = 0.5
   10a + 20b + 30×0.5 = 23    10a + 20b = 8

The solution of the system is a = 0.2 and b = 0.3

The probability distribution applies in many betting games:

EXAMPLE 2
Consider again the same table above. But now we select one of the
numbers 10, 20, 30 at random.
If we select 10 we earn 6 points
If we select 20 we earn 1 point
If we select 30 we lose 2 points

What is the expected number of points in one game?


Solution.
We extend our table as follows

x 10 20 30
Profit 6 points 1 point -2 points
Prob 0.2 0.3 0.5

We estimate the expected profit:

   Expected profit = 6×0.2 + 1×0.3 − 2×0.5 = 0.5

That is, in each game we earn 0.5 points on average.


Explanation
In other words, if we play this game 10 times we expect to earn 5
points on average.
Indeed, if we play the game 10 times we expect to obtain
   2 times the number 10, that is 2×6 = 12 points
   3 times the number 20, that is 3×1 = 3 points
   5 times the number 30, that is 5×(-2) = -10 points
In total, 12 + 3 − 10 = 5 points

EXAMPLE 3
We throw two dice.
If we obtain TWO SIXES we earn 15€
If we obtain ONLY ONE SIX we earn 1€
If we obtain NO SIX we lose 1€

Find the expected profit in one game.


Solution.
Let us organize our data on a table

   Result   TWO SIXES   ONE SIX   NO SIX
   Profit      15€         1€       -1€
   Prob        1/36      10/36     25/36

The expected amount earned per game is

   Expected profit = 15×(1/36) + 1×(10/36) − 1×(25/36) = 0

This is a FAIR GAME! We expect neither to earn nor to lose!

Notice. If the first winning prize was not 15€ but 14€, the expected
profit would be -1/36.
In other words, if we play the game 36000 times (or otherwise bet
36000€) we expect to lose 1000€.


 MEDIAN-MODE
These measures, known from statistics, are defined analogously:
MODE = The value X=a of the highest probability
MEDIAN = The value X=m where the probability splits
in two equal parts (0.5-0.5)
Look at the examples below

   x        10   20   30            x        10   20   30
   P(X=x)   0.4  0.3  0.3           P(X=x)   0.2  0.3  0.5

   MODE = 10                        MODE = 30
   MEDIAN = 20                      MEDIAN = 25 (why?)

 VARIANCE (Only for HL)


We define
   Var(X) = E((X−μ)²)
that is
   Var(X) = (x1−μ)²·p1 + (x2−μ)²·p2 + (x3−μ)²·p3 + …

An equivalent definition is
   Var(X) = E(X²) − μ²
where
   E(X²) = x1²·p1 + x2²·p2 + x3²·p3 + …

EXAMPLE 4
Consider again the probability distribution

   x        10    20    30
   P(X=x)   0.2   0.3   0.5

We have seen that μ=Ε(Χ)=23. Therefore,

   Var(X) = (10−23)²×0.2 + (20−23)²×0.3 + (30−23)²×0.5 = 61
or
   E(X²) = 10²×0.2 + 20²×0.3 + 30²×0.5 = 590
   Var(X) = 590 − 23² = 61
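A short Python sketch (not in the syllabus) verifying E(X), Var(X) and the
equivalent formula E(X²) − μ²:

x = [10, 20, 30]
p = [0.2, 0.3, 0.5]

mean = sum(xi * pi for xi, pi in zip(x, p))                  # E(X)  = 23
var = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))     # Var(X) = 61
e_x2 = sum(xi ** 2 * pi for xi, pi in zip(x, p))             # E(X²) = 590

print(mean, round(var, 2), round(e_x2 - mean ** 2, 2))       # 23.0 61.0 61.0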


4.10 BINOMIAL DISTRIBUTION – B(n,p)

It is the distribution of a discrete random variable X which takes on


the values
0, 1, 2, 3, 4, … , n
with probability function

   p(x) = nCx × p^x × (1−p)^(n−x),   x = 0, 1, 2, 3, …, n

where n, p are two parameters. We will see that the binomial


distribution describes a certain type of problems.

Notice: the formula is not in the syllabus; results will be obtained
directly by the GDC. It is worth mentioning, though, how it works!

 DESCRIPTION OF THE PROBLEM


We deal with a game (or any experiment) with two outcomes

SUCCESS with probability p


FAILURE (with probability 1-p)

We play the game n times. Our parameters are

n = number of trials
p = probability of success
while
X counts the number of (possible) successes

We say that X follows a binomial distribution and write X∼B(n,p).

Since n is the number of trials, X can take on the values


0, 1, 2, 3, 4, …, n

The probabilities P(X=0), P(X=1), P(X=2), etc can be obtained by


the GDC.

(and also by the formula mentioned in the introduction, but as we


have said this formula is not in the syllabus).


 GDC
Our GDC (Casio) gives the results for a Binomial distribution

MENU – Statistics – DIST – BINOMIAL: We use Bpd or Bcd

For simplicity let us denote by

Bpd(x) the probability of exactly x successes


Bcd(x1 to x2) the probability from x1 up to x2 successes

The menu for both functions is


Data: always Variable
Numtrial: is the number of trials i.e. n
p: is the probability of success p (for each game)

Then for each value of x (or x1 to x2), EXE gives the result.

EXAMPLE 1
We toss a die 5 times. The success is to get a six. Then
   n=5 and p=1/6
We may have 0, 1, 2, 3, 4 or 5 successes.
The probability distribution for X is given by (results in 4dp)

x 0 1 2 3 4 5
GDC Bpd(0) Bpd(1) Bpd(2) Bpd(3) Bpd(4) Bpd(5)
P(X=x) 0.4019 0.4019 0.1608 0.0322 0.0032 0.0001

We can also answer the following questions:

Find the probability of Notation GDC Result


exactly 3 sixes P(X=3) Bpd(3) 0.0322
at most 3 sixes P(X≤3) Bcd(0 to 3) 0.9967
less than 3 sixes P(X<3) Bcd(0 to 2) 0.9645
more than 3 sixes P(X>3) Bcd(4 to 5) 0.0033
at least 3 sixes P(X≥3) Bcd(3 to 5) 0.0355


Remark for the formula (not in the syllabus but worth knowing)

The probability to obtain

   5 sixes in a row is (1/6)^5
   no six at all is (5/6)^5
   2 sixes and 3 no-sixes is 5C2 × (1/6)^2 × (5/6)^3

5C2 is the number of ways to have 2 sixes in 5 trials.

In general, the probability to obtain x sixes (and (5-x) no-sixes) is

   5Cx × (1/6)^x × (5/6)^(5-x)

In general, if we play n times a game with probability of success p,
the probability P(X=x) is given by the formula

   P(X=x) = nCx × p^x × (1−p)^(n−x)

According to the formula

   x         0          1          2          3         4         5
   P(X=x)    3125/6^5   3125/6^5   1250/6^5   250/6^5   25/6^5    1/6^5
             =0.4019    =0.4019    =0.1608    =0.0322   =0.0032   =0.0001

The table agrees with the results found by Bpd(x) above.
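The GDC values Bpd and Bcd can also be reproduced with a short Python sketch
(not in the syllabus, assuming scipy is available):

from scipy.stats import binom

n, p = 5, 1/6                      # five tosses, success = rolling a six

print(binom.pmf(3, n, p))          # P(X=3) ≈ 0.0322   (Bpd(3))
print(binom.cdf(3, n, p))          # P(X≤3) ≈ 0.9967   (Bcd(0 to 3))
print(1 - binom.cdf(3, n, p))      # P(X>3) ≈ 0.0033
print(binom.pmf(range(6), n, p))   # the whole table: 0.4019, 0.4019, 0.1608, ...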

 EXPECTED VALUE AND VARIANCE OF X


They are given by the formulae
E(X) = np Var(X) = np(1-p)

For our example above

   E(X) = 5×(1/6) = 5/6   and   Var(X) = 5×(1/6)×(5/6) = 25/36


EXAMPLE 2
A box contains 5 balls, 1 BLACK and 4 WHITE. We win if we select
a BLACK ball. We play this game 10 times.
Find
(a) The probability to win exactly 4 times
(b) The probability to win at most 4 times
(c) The probability to win at least once
(d) The expected number of winning games.
(e) The variance of the number of winning games.

Solution
The variable
   X = number of winning games
follows a binomial distribution with n=10 and p=1/5=0.2
[we may also write X∼B(10,0.2)]

(a) The probability to win exactly 4 times is Bpd(4)=0.088
(b) The probability to win at most 4 times is Bcd(0 to 4)=0.967
    [in fact P(X≤4) = P(X=0)+P(X=1)+P(X=2)+P(X=3)+P(X=4)]
(c) The probability to win at least once is Bcd(1 to 10) = 0.893
    [in fact P(X≥1) = 1−P(X=0) = 1−0.107 = 0.893]
(d) The expected number is E(X) = np = 10×0.2 = 2
(e) The variance is Var(X) = np(1−p) = 10×0.2×0.8 = 1.6

EXAMPLE 3

Let p=0.2 and n unknown. It is given that P(X=1) = 0.268. Find n.

Solution

We know that n must be an integer.

By trial and error on Numtrial we can see that, for Numtrial=10, Bpd(1)=0.268.

Hence n=10.


 MODE (mainly for HL)

We first check the expected number

 If the expected number is in decimal form,
   say n=20, p=1/6, so that E(X) = 20/6 ≈ 3.3,
   we check the nearest integer values 3 and 4:
      P(X=3) = 0.237
      P(X=4) = 0.202
   Hence the mode is 3 (it has the highest probability).

 If the expected number is a whole number,
   say n=60, p=1/6, so that E(X) = 60/6 = 10,
   we check the neighboring integer values 9, 10, 11:
      P(X=9) = 0.134
      P(X=10) = 0.137
      P(X=11) = 0.126
   Hence the mode is 10.

Notice: In some cases we may have two modes:
   For n=5 and p=1/6, it is E(X) = 5/6 ≈ 0.833. We check
      P(X=0) = 0.4019
      P(X=1) = 0.4019
   Hence there are two modes, 0 and 1.


4.11 NORMAL DISTRIBUTION – N(μ,σ²)

It is the distribution of a continuous random variable X with values
from -∞ to +∞. The parameters of this distribution are

μ = mean
σ = standard deviation.

The “behavior” of the probability is described by a function which
looks like

[bell-shaped curve, symmetric about μ, extending from -∞ to +∞]

Roughly speaking, there is a highly likely mean value μ and all the
other values of X spread out symmetrically about the mean. As we
move away from the mean (either to the left or to the right of the
mean) the probability decreases dramatically!

We say that X follows a normal distribution with mean μ and
standard deviation σ (or variance σ²) and we write X∼N(μ,σ²).

 DESCRIPTION OF THE PROBLEM IN GENERAL


It is the most “popular” distribution in nature. Random variables
which depend on many factors follow this distribution, for example

 Weight of people
 Height of people
 Time spent in a super market
 Weight of a pack of coffee labeled 500 g.


For example, suppose that for a Greek man

mean weight: μ=75kg st.dev. of the weights: σ=10kg

It is estimated4 that

   Percentage of the population     ranges between (in general)   (for our problem)
   about 68% of the population      μ-σ and μ+σ                    [65,85]
   about 95% of the population      μ-2σ and μ+2σ                  [55,95]
   about 99.7% of the population    μ-3σ and μ+3σ                  [45,105]

[bell curve with mean 75: about 68.3% of the area lies between x=65 and x=85]

NOTICE
 The whole area under the curve is 1 (i.e. 100%). The area before
the mean as well as the area after the mean is 0.5 (i.e. 50%)
 Theoretically, the distribution of X ranges between -∞ to +∞.
In practice, we may assume that almost the whole population
(in fact 99,7%) ranges between μ-3σ and μ+3σ.
 The standard deviation σ indicates the spread of the population.
For example, assume that
Greeks: μ=75 kg σ=10 kg
Italians: μ=75 kg σ=8 kg
This implies that both populations have the same mean but
Italians are closer to the mean than Greeks. In other words,
almost the whole population is between μ±3σ, namely
75±30 i.e. 45-105 kg for Greeks
75±24 i.e. 51-99 kg for Italians

4 We will explain in a while how we get the following estimations.


We will distinguish two types of problems. For both problems we


use GDC to find the results.
For Casio fx

MENU – STAT – DIST – NORM: We use Ncd or InvN


Data: always use Variable

In general, Ncd is used when we ask for a probability


InvN is used when we know the probability

 PROBLEM 1: FIND PROBABILITY (we use Ncd)


Consider again the example where
X = the weight of a Greek man
with μ=75 kg and σ=10 kg.
Find the probability that a Greek man weighs
(a) between 60 and 82 kg [that is P(60≤X≤82)]
(b) more than 82 kg [that is P(X≥82)]
(c) less than 60 kg [that is P(X≤60)]
Solution
We use Ncd in the GDC. We set σ=10, μ=75

   Question    Ncd                                Result (press EXE)
   (a)         Lower: 60,       Upper: 82         0.691
   (b)         Lower: 82,       Upper: 999999…    0.242
   (c)         Lower: -999999…, Upper: 60         0.067

Let us represent the information of this problem by a diagram:

[bell curve with mean 75: area 0.067 below 60, area 0.691 between 60 and 82, area 0.242 above 82]


NOTICE
 GDC gives some extra information below each result.
For question (a) it gives P(60≤X≤82)=0.691 and then
z:Low =-1.5 z:Up =0.7
the so-called standardized values of 60 and 82. They mean that

the lower bound 60 is 1.5 standard deviations below μ=75


the upper bound 82 is 0.7 standard deviations above μ=75

 The probability that some weight is within 1 st. deviation (i.e. 10 kg)
   of the mean μ=75, that is between 65 and 85 kg, is

   P(65≤X≤85) ≈ 0.683 (68.3%)

as we said in the introduction. Notice that z:Low = -1, z:Up = 1

 The probabilities above refer to one person only.


For example,
P(a person is between 60 and 82 kg) = 0.691,
P(a person is not between 60 and 82 kg) =1-0.691= 0.309

If we select two people,

   P(both between 60 and 82 kg) = (0.691)²
   P(none between 60 and 82 kg) = (0.309)²
   P(only one between 60 and 82 kg) = (0.691)(0.309)×2

 Combination of Normal and Binomial distributions:


We select 10 people. What is the probability that exactly three
of them are between 60 and 82 kg?
For one person the normal distribution gives p=0.691.
Then, a new variable Y (for the number of people) follows
binomial distribution B(n,p) with
n=10 and p=0.691.
and
P(Y=3) =0.0106


 PROBLEM 2: PROBABILITY IS GIVEN (we use InvN)


Now, for
X = the weight of a Greek man
with μ=75 kg and σ=10 kg they give us the information

the probability that somebody weighs less than a is 0.067


or
6.7% of the Greek men weigh less than a
Find a.
Using mathematical notation:
P(X≤a)=0.067, find a
Solution
Let us represent this information in a diagram

[bell curve: the shaded area to the left of a is 0.067]

We use InvN. We set the parameters σ=10, μ=75. Then

Tail: Left (it is the area before a)


Area: 0.067

Press EXE and obtain a=60 kg.

Notice for the tail

Tail: Left if it says less than a


Tail: Right if it says more than a

The area after point a above is 0.933. That is P(X≥a)=0.933. Then


Tail: Right
Area: 0.933
also gives a=60 kg.
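Both types of problems (Ncd and InvN) can be checked with a short Python
sketch (not in the syllabus, assuming scipy is available):

from scipy.stats import norm

mu, sigma = 75, 10     # weights of Greek men, in kg

# Ncd-type questions (find a probability)
print(norm.cdf(82, mu, sigma) - norm.cdf(60, mu, sigma))  # P(60≤X≤82) ≈ 0.691
print(1 - norm.cdf(82, mu, sigma))                        # P(X≥82)    ≈ 0.242
print(norm.cdf(60, mu, sigma))                            # P(X≤60)    ≈ 0.067

# InvN-type question (the probability is given)
print(norm.ppf(0.067, mu, sigma))                         # a ≈ 60 kg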


EXAMPLE 1
The mass of packets for a certain type of coffee is normally
distributed with a mean of 500 g and standard deviation of 15 g.

(a) Find the probability that a packet weighs more than 520 g
(b) The lightest 4% of the packets weigh less than a.
The heaviest 5% of the packets weigh more than b.
Find a and b.
(c) Find Q1, Q3, the lower and upper quartiles of the weights

Solution

(a) We use Ncd


P(X≥520) ≈ 0.091

(b) We use InvN


P(X≤a)=0.04, hence a ≈ 474 g
P(X≥b)=0.05, hence b ≈ 525 g

(c) In fact, this question looks like question (b).


We know that the “area” before Q1 is 0.25, while the area
before Q3 is 0.75.
We use InvN, tail: left
P(X≤Q1)=0.25    Q1 ≈ 490 g
P(X≤Q3)=0.75    Q3 ≈ 510 g

For the second result we can also use Tail: right, area = 0.25

For this question in particular, we may use

Tail: Central, Area =0.5

and obtain both results: Q1 ≈ 490 g and Q3 ≈ 510 g

In general, Tail: Central can be used when the values are


symmetrically before and after the mean.
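For checking, the whole example can be reproduced with the cdf and ppf functions used above (a sketch; the values agree with the GDC answers after rounding):

    from scipy.stats import norm

    mu, sigma = 500, 15                    # coffee packet mass in g

    print(1 - norm.cdf(520, mu, sigma))    # (a) P(X >= 520) ≈ 0.091
    print(norm.ppf(0.04, mu, sigma))       # (b) a ≈ 474 g  (lightest 4%)
    print(norm.ppf(0.95, mu, sigma))       #     b ≈ 525 g  (P(X <= b) = 0.95)
    print(norm.ppf(0.25, mu, sigma))       # (c) Q1 ≈ 490 g
    print(norm.ppf(0.75, mu, sigma))       #     Q3 ≈ 510 g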


EXAMPLE 2
The mass of packets for a certain type of coffee is normally
distributed with a mean of 500 g and standard deviation of 15 g.

Packets less than 475 g are rejected from the market.

(a) We select 2 packets. Find the probability that both are


rejected.
(b) We select 5 packets. Find the probability that at least one is
rejected.

Solution

The probability that a packet is rejected is


P(X<475) ≈ 0.0478
(a) (0.0478)² ≈ 0.00228
(b) Y follows binomial with n=5 and p=0.0478
P(Y≥1) ≈ 0.217

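A quick computational check of this example (same assumed Python/scipy tools as before):

    from scipy.stats import norm, binom

    mu, sigma = 500, 15
    p = norm.cdf(475, mu, sigma)           # probability a packet is rejected ≈ 0.0478

    print(p ** 2)                          # (a) both of 2 packets rejected ≈ 0.00228
    print(1 - binom.pmf(0, 5, p))          # (b) at least one of 5 rejected ≈ 0.217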

ONLY FOR HL


4.12 POISSON DISTRIBUTION – Po(m)

It is the distribution of a discrete random variable X which takes on


the values
0, 1, 2, 3, …
with probability function

e m m x
P(X  x)  x  0,1,2,3, …
x!

where m is a parameter.
We say that X follows a Poisson distribution and write X ~ Po(m).

Notice: the formula above is not in the syllabus but it is worth
mentioning! The results can be obtained directly by GDC.

• DESCRIPTION OF THE PROBLEM


In general, we study the number of incidents that occur within a
certain period (usually of time). For example

- Number of phone calls per minute in a call center.


- Number of accidents per hour in a certain area.
- Number of cars coming to a junction.
- Number of mistakes per page in a book
- Number of bacteria per cm3

We denote by
m = the mean number of incidents
(within a certain period)
while
X is the random variable for the possible number of incidents
(within the certain period)

Then, the probability that x incidents occur, that is


P(X=x)

can be obtained by GDC (or by the formula mentioned above).


• GDC
Our GDC (Casio) gives the results for a Poisson distribution

MENU – Statistics – DIST – POISSON: We use Ppd or Pcd


Data: always Variable
λ: is the mean m
x: is the value for X
Then EXE gives the result

For simplicity let us denote by

Ppd(x) the probability that exactly x incidents occur


Pcd(x-y) the probability from x up to y incidents

EXAMPLE 1

In a call center it has been noticed that 2 phone calls on average


occur per minute. Thus m=2 (for GDC: λ=2)
Then, the probability that exactly 3 phone calls occur in one minute is

P(X=3) = Ppd(3) = 0.180

Similarly,

P(X=0) ≈ 0.135, P(X=1) ≈ 0.271, and so on!

We obtain the following table for the distribution of X

X 0 1 2 3 …
P(X=x) 0.135 0.271 0.271 0.180 …

We can also obtain results like that:

P(X≤2) = Pcd(0-2) = 0.677

This is in fact,

P(X=0) + P(X=1) + P(X=2) = 0.135 + 0.271 + 0.271 = 0.677

Note: the formula e^(-2) · 2^3 / 3! ≈ 0.180 gives the same result.


Look also at the following results:

Find the probability that Notation GDC Result

exactly 3 phone calls occur P(X=3) Ppd(3) 0.180

at most 3 phone calls occur P(X≤3) Pcd(0-3) 0.857

less than 3 phone calls occur P(X<3) Pcd(0-2) 0.677

more than 3 phone calls occur P(X>3) Pcd(4-∞) 0.143

at least 3 phone calls occur P(X≥3) Pcd(3-∞) 0.323

For ∞ use a very large number, e.g. 999999

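All five rows of this table can be reproduced with the pmf, cdf and sf (survival function, i.e. P(X>k)) functions of scipy.stats.poisson; a minimal sketch with the same m=2 (scipy is our own choice of tool, not part of the syllabus):

    from scipy.stats import poisson

    m = 2                         # mean number of calls per minute

    print(poisson.pmf(3, m))      # P(X = 3)  ≈ 0.180   (Ppd(3))
    print(poisson.cdf(3, m))      # P(X <= 3) ≈ 0.857   (Pcd(0-3))
    print(poisson.cdf(2, m))      # P(X < 3)  ≈ 0.677   (Pcd(0-2))
    print(poisson.sf(3, m))       # P(X > 3)  ≈ 0.143   (Pcd(4-∞))
    print(poisson.sf(2, m))       # P(X >= 3) ≈ 0.323   (Pcd(3-∞))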
• MIND THE LENGTH OF THE PERIOD

Be careful about m. It is the mean number of incidents for the


period in question. For example, if
v = the frequency of incidents per minute
the mean number of incidents in t minutes is
m=vt
The following example helps!

EXAMPLE 2
Assume that the mean number of phone calls per minute is 2. Find
(a) The probability that 3 phone calls occur in one minute
(b) The probability that 3 phone calls occur in two minutes
(c) The probability that no phone calls occur in three minutes
Solution
The frequency of phone calls is v=2 (phone calls per minute)
(a) The mean number of phone calls per minute is m=2. Hence
P(X=3) = Ppd(3) = 0.180
(b) The mean number of phone calls per 2 minutes is m=4. Hence
P(X=3) = Ppd(3) = 0.195
(c) The mean number of phone calls per 3 minutes is m=6. Hence
P(X=0) = Ppd(0) = 0.00248
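A short check of this example with the mean rescaled as m = v·t (Python/scipy, our own illustration):

    from scipy.stats import poisson

    v = 2                            # calls per minute

    print(poisson.pmf(3, v * 1))     # (a) 3 calls in 1 minute  (m = 2) ≈ 0.180
    print(poisson.pmf(3, v * 2))     # (b) 3 calls in 2 minutes (m = 4) ≈ 0.195
    print(poisson.pmf(0, v * 3))     # (c) 0 calls in 3 minutes (m = 6) ≈ 0.00248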


• EXPECTED VALUE AND VARIANCE


The formulas are simple
E(X) = m
Var(X) = m

At least for E(X) it seems to be very reasonable that the expected


number of incidents is the mean number of incidents!

• MODE
We check the neighboring integer values of the mean m.
Look at the following two cases:
• Assume that the mean is m=4.3. We expect that the most likely
number of incidents is near 4.3. We check
P(X=4) = 0.193
P(X=5) = 0.166
Hence the mode is 4.
• Assume that the mean is m=5. We expect that the most likely
number of incidents is near 5.
We check
P(X=3) = 0.140     P(X=4) = 0.175
P(X=5) = 0.175     P(X=6) = 0.146
Hence we have two modes, 4 and 5.

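The same neighbor-checking can be done in a couple of lines (a sketch):

    from scipy.stats import poisson

    # m = 4.3: compare the integers around the mean
    print(poisson.pmf(4, 4.3), poisson.pmf(5, 4.3))   # 0.193 > 0.166  ->  mode is 4

    # m = 5: two neighboring values tie
    print(poisson.pmf(4, 5), poisson.pmf(5, 5))       # both ≈ 0.175   ->  modes 4 and 5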
• ASSUMPTIONS (a bit theoretical!)

The conditions that justify a Poisson distribution are the following

• The events within two disjoint intervals are independent.
• Events occur at a uniform average rate.

For example, if we study the number of phone calls in a call center
within a minute, we assume that the numbers of phone calls in
disjoint minutes are independent of each other, and also that the
phone calls occur at a uniform rate (we exclude extreme situations).


EXAMPLE 3
The mean number of phone calls in a call center is m=2. Find the
probability of the combined event that
3 phone calls occur in the first minute and
4 phone calls occur in the second minute
Solution
We have
P(X=3) = 0.1804 and P(X=4) = 0.0902
The two time intervals are disjoint, so the two counts are independent
and the probability is (0.1804)(0.0902) ≈ 0.0163

• THE SUM OF POISSON DISTRIBUTIONS


Suppose that X and Y are two independent variables such that

X follows Po(m)
Y follows Po(n)

Then X+Y follows Po(m+n)

EXAMPLE 4
The mean number of phone calls in the call center A is m=2, while
the mean number of phone calls in the call center B is n=3.
Find the probability that the total number of phone calls is 6.
Solution
X ~Po(2), Y ~ Po(3), therefore X+Y ~ Po(5)
Thus
P(X+Y=6) ≈ 0.146

NOTICE
The sum of the following combinations (where X+Y=6)

P(X=0)P(Y=6) + P(X=1)P(Y=5) + … + P(X=6)P(Y=0)

is also 0.146. (check!)

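A quick check of both the direct Po(5) value and the convolution sum (Python/scipy sketch):

    from scipy.stats import poisson

    # X ~ Po(2), Y ~ Po(3), independent  =>  X + Y ~ Po(5)
    print(poisson.pmf(6, 5))                                           # ≈ 0.146

    # the same result as a sum over all the ways to split the 6 calls
    print(sum(poisson.pmf(k, 2) * poisson.pmf(6 - k, 3) for k in range(7)))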

4.13 MARKOV CHAINS

A Markov chain is a model describing a sequence of possible events in
which the probability of each event depends only on the state
attained in the previous event.

Let us make it clear by presenting a simple example:

Consider two locations A and B. Each day the following transitions


in population occur:
For the population of A: 80% stays in A
20% moves to B
For the population of B: 70% stays in B
30% moves to A
In probability terms, this information is depicted in a transition
state diagram: a loop at A with probability 0.8, an arrow from A to B
with probability 0.2, a loop at B with probability 0.7, and an arrow
from B to A with probability 0.3.

or in the following table

    Prob      from A    from B
    to A       0.8       0.3
    to B       0.2       0.7

or, in mathematical terms, in the following transition matrix

        T = [0.8  0.3]
            [0.2  0.7]

Notice that
• the columns refer to “from” and the rows to “to”.
• the sum of the entries in each column is 1.


After two days, the probabilities for the transition of the population
can be shown in two tree diagrams, one starting from A and one starting
from B; each has two stages of branches (from A: stay 0.8 / move 0.2,
from B: move 0.3 / stay 0.7), giving the products 0.64, 0.16, 0.06, 0.14
(starting from A) and 0.24, 0.06, 0.21, 0.49 (starting from B).

Therefore, after two days, the probability to move


from A to A is 0.64+0.06 = 0.70
from A to B is 0.16+0.14 = 0.30
from B to A is 0.24+0.21 = 0.45
from B to B is 0.06+0.49 = 0.55

However, we can obtain all the results above in a much easier and
amazing way: by squaring the transition matrix T!

        T² = [0.8  0.3][0.8  0.3] = [0.70  0.45]
             [0.2  0.7][0.2  0.7]   [0.30  0.55]

Still,
• the columns refer to “from” and the rows to “to”.
• the sum of the entries in each column is 1.

The situation after 3 days is depicted in the following matrix

        T³ = [0.650  0.525]
             [0.350  0.475]

By using our GDC we may observe that for large powers (that is,
after many days) the result converges to the matrix

        T∞ = [0.6  0.6]
             [0.4  0.4]

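The matrix powers above are easy to verify with a short numpy sketch (numpy is our own choice of tool here; a matrix-capable GDC does the same job):

    import numpy as np

    T = np.array([[0.8, 0.3],
                  [0.2, 0.7]])

    print(np.linalg.matrix_power(T, 2))    # [[0.70 0.45] [0.30 0.55]]
    print(np.linalg.matrix_power(T, 3))    # [[0.650 0.525] [0.350 0.475]]
    print(np.linalg.matrix_power(T, 50))   # ≈ [[0.6 0.6] [0.4 0.4]]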

Notice.
The columns in T∞ are identical. This happens when the matrix is
regular:
    a square matrix A is regular if Aⁿ has only non-zero entries
    for some value of n. Then Aⁿ converges to a matrix A∞
    in which all columns are identical.

In our case T is regular as it has non-zero entries from the very


beginning!

Suppose that the total population in the two cities is 100 and the
initial distribution is 50-50. This can be depicted in the

        initial state vector  S0 = [50]
                                   [50]

After one day, the distribution in the two cities becomes

        S1 = T S0 = [0.8  0.3][50] = [55]    that is, 55 in A, 45 in B
                    [0.2  0.7][50]   [45]

After 2 days,     S2 = T S1 = T² S0 = [57.5 ]
                                      [42.5 ]

After 3 days,     S3 = T S2 = T³ S0 = [58.75]
                                      [41.25]

After many days,  S∞ = T∞ S0 = [60]
                               [40]

However,

If the initial state vector was S0 = [100], then still S∞ = T∞ S0 = [60]
                                     [  0]                          [40]

If the initial state vector was S0 = [  0], then still S∞ = T∞ S0 = [60]
                                     [100]                          [40]

In fact, any initial state vector S0 (with total 100) would result in

        S∞ = T∞ S0 = [60]
                     [40]

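The successive state vectors can be generated the same way; a numpy sketch:

    import numpy as np

    T = np.array([[0.8, 0.3],
                  [0.2, 0.7]])
    S0 = np.array([50, 50])                          # initial 50-50 split

    for n in (1, 2, 3, 50):
        print(n, np.linalg.matrix_power(T, n) @ S0)
    # n=1: [55 45]   n=2: [57.5 42.5]   n=3: [58.75 41.25]   n=50: ≈ [60 40]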

This means that after many steps, things tend to stabilize and the
distribution of the population reaches the so-called

 60 
steady state vector S= 
 40 
This also implies that if the initial distribution was already 60-40,
it would remain the same forever, from the very first day! Indeed,

 0.8 0.3  60   60 
TS =   =  =S
 0.2 0.7  40   40 

• THE PROBABILITY FOR THE MOTION OF A ROBOT!


Another interesting observation is the following:
If a robot starts from location A and the probabilities for its motion
are given in the transition diagram above, then the initial state
vector is
1
S0 =  
0
After one move the state of probabilities is

 0.8 0.3   1   0.8 


S1 = TS0 =    =   i.e. 0.8 to be in A, 0.2 to be in B
 0.2 0.7   0   0.2 
 0.7 
After 2 moves, S2 = TS1 = T 2 S0 =  
 0.3 
 0.65 
After 3 moves, S3 = TS2 = T 3S0 =  
 0.35 
 0.6 
After many days, S = T  S 0 =  
 0.4 

In this case, the steady state vector is

 0.6 
S= 
 0.4 

Thus, at any moment in the “future”, the probability for the robot
to be in location A is 0.6, while to be in location B is 0.4.

The following example is described by a non-regular transition


matrix but it is interesting to see how a Markov chain works.


EXAMPLE 1

A robot moves along a line of four positions, 1, 2, 3, 4.

It is initially located at position 2.

At each step the probability to move one position to the right is 0.6,
while to the left it is 0.4. The robot stops when it reaches position 1
or position 4.

What is the probability to finish at position 1?

This problem can be easily (and amazingly) solved as a Markov
chain process.

The probabilities of moving from any position i to any position j are


shown in the following transition matrix

 1 0.4 0 0 
 
0 0 0.4 0 
T =
 0 0.6 0 0 
 
 0 0 0.6 1 

For example,
the 1st column says: if the robot is situated at position 1, the
probability to move elsewhere is 0 as it stops at position 1.
the 2nd column says: if the robot is situated at position 2, the
probability to move to 1 is 0.4 while to 3 is 0.6.
and so on.

The initial state vector is


0
 
1
S0 =  
0
 
0

as the robot starts at position 2.


After one move the state of probabilities is

 1 0.4 0 0   0   0.4 
    
0 0 0.4 0   1   0 
S1 = TS0 = T =  =
 0 0.6 0 0   0   0.6 
    
 0 0 0.6 1   0   0 
 0.40 
 
2  0.24 
After 2 moves, S2 = TS1 = T S0 =
 0 
 
 0.36 

In other words,
the probability to finish at position 1 is 0.40,
the probability to be again in position 2 is 0.24,
there is no way to be at position 3 after 2 moves (can you think why?),
and the probability to finish at position 4 is 0.36.

So after 2 moves it is more likely to finish at position 1 than at position 4.

The transition matrix is not regular, as every power of it contains
zero entries (can you think why?).

However, it is interesting to see what happens after 10 moves (by


using our GDC for multiplication of matrices):

 0.5259 
 
10  0.0008 
S10 = T S0 =
 0 
 
 0.4733 
After 11 moves

 0.5262 
 
11  0 
S11 = T S0 =
 0.0005 
 
 0.4733 

Notice that it is impossible to be at position 3 after 10 moves or at
position 2 after 11 moves (why?)

We observe that, in the long run, the robot is more likely to finish
at position 1 (about 53%) than at position 4 (about 47%).

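A numpy sketch that reproduces the robot example (the power 100 simply stands for “after many moves”):

    import numpy as np

    T = np.array([[1, 0.4, 0,   0],
                  [0, 0,   0.4, 0],
                  [0, 0.6, 0,   0],
                  [0, 0,   0.6, 1]])
    S0 = np.array([0, 1, 0, 0])                  # robot starts at position 2

    print(np.linalg.matrix_power(T, 2)   @ S0)   # [0.40 0.24 0.   0.36]
    print(np.linalg.matrix_power(T, 10)  @ S0)   # ≈ [0.526 0.001 0.    0.473]
    print(np.linalg.matrix_power(T, 100) @ S0)   # ≈ [0.526 0.    0.    0.474]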

• MARKOV CHAINS AND EIGENVECTORS


In our original problem, for the transition matrix

 0.8 0.3 
T = 
 0.2 0.7 
we found a steady state vector
 60 
S= 
 40 
such that TS = S .
Think of this relation as

TS = 1S

Thus, 1 is an eigenvalue and S is a corresponding eigenvector!

It can be shown that any transition matrix has 1 as an eigenvalue.

Let us see what happens in our case.

        det [0.8-λ    0.3  ] = 0   ⇒   λ² - 1.5λ + 0.5 = 0
            [ 0.2    0.7-λ ]
                                   ⇒   λ = 1  or  λ = 0.5

For the eigenvalue λ = 1 we obtain the system:

        -0.2x + 0.3y = 0
         0.2x - 0.3y = 0     ⇒   0.2x = 0.3y   ⇒   x/y = 3/2

Thus, the corresponding eigenvectors are [3t].
                                         [2t]

Therefore, if the total population is 100,

the corresponding eigenvector is [60]
                                 [40]

[since 3t + 2t = 100 ⇒ t = 20].

If the total population is 1 (the case of the robot)

the corresponding eigenvector is [0.6]
                                 [0.4]
These are the steady state vectors for the corresponding populations.

For example, a rental car company with 100 cars, two car stations
A and B, and transition matrix T as above, should place 60 cars in
station A and 40 cars in station B, from the first day!

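The eigenvalue 1 and its eigenvector (and hence the steady state) can also be found numerically; a numpy sketch:

    import numpy as np

    T = np.array([[0.8, 0.3],
                  [0.2, 0.7]])

    eigvals, eigvecs = np.linalg.eig(T)
    print(eigvals)                          # [1.  0.5]

    v = eigvecs[:, np.argmax(eigvals)]      # eigenvector belonging to the eigenvalue 1
    print(100 * v / v.sum())                # scaled to a population of 100: [60. 40.]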
