Topic 4A. Descriptive Statistics - Probability
MATHEMATICS
Applications and Interpretation SL (and HL)
Lecture Notes
Christos Nikolaidis
TOPIC 4
STATISTICS AND PROBABILITY
Only for HL
October 2021
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Discrete                        OR     Continuous
{10,20,30}                             [40,100]
{0,1,2,3,…}                            R
(finite or countable set)              (interval)
Colored Balls   Freq
Blue            13
Green           8
Red             10
Yellow          3
Age Frequency
[0,10) 7
[10,20) 5
[20,30) 1
[30,40) 3
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
Here
$$\text{mean} = \frac{10+20+20+20+30+30+40+50+70+70+80}{11} = \frac{440}{11} = 40$$
median = 30
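As a quick check of the two values above, here is a minimal Python sketch. It uses the standard `statistics` module; the variable names are only illustrative.

```python
from statistics import mean, median

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]

data_mean = mean(data)       # sum of the entries divided by n = 11
data_median = median(data)   # middle (6th) entry of the sorted list

# For an even number of entries, median() averages the two middle values.
even_median = median([10, 20, 30, 40])

print(data_mean, data_median, even_median)
```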
NOTICE
For the data 10, 20, 30
Median = 20
For the data 10, 20, 30, 40
Median = 25
That is, for an even number of data,
median = the mean of the two middle values
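The median is not the $\frac{n}{2}$-th entry, as one might expect, but the $\frac{n+1}{2}$-th entry
(for n=4 this is the 2.5th entry, that is, midway between the 2nd and 3rd values).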
10, 20, 30, 40, 50, 60, 70, 80, 90, 100
$$\mu = \frac{\sum x_i}{n}$$
EXAMPLE 1
Find
a) the integers a ≤ b ≤ c, given that mean=4, mode=5, median=5.
The median implies that b=5. The mode implies that also c=5.
Then $\frac{a+5+5}{3} = 4 \Rightarrow a+10 = 12 \Rightarrow a = 2$
MEASURES OF SPREAD
We use the same set of data
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
A) STANDARD DEVIATION
In fact,
the Greek letter σ is used for the whole population;
the Latin letter $s_n$ is used for a sample of the population.
As the estimation of the values Q1, Q2, Q3 is quite tricky, let us see
some extra cases in the following example.
b) For n=8 entries: 10, 20, 30, 40, 50, 60, 70, 80
The median is Q2=45 (the 4.5th entry). Hence Q1=25, Q3=65.
c) For n=9 entries: 10, 20, 30, 40, 50, 60, 70, 80, 90
The median is Q2=50 (the 5th entry). Hence Q1=25, Q3=75.
d) For n=10 entries: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
Then Q2=55 (the 5.5th entry). Hence Q1=30, Q3=80.
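The three cases above follow one convention: Q2 is the median of all entries, Q1 is the median of the entries below the median position and Q3 is the median of the entries above it. The sketch below implements that convention; the function name `quartiles` is hypothetical, and other software may use a different quartile rule.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # entries below the median position
    upper = data[half + n % 2:]    # entries above it (the middle entry is skipped when n is odd)
    return median(lower), median(data), median(upper)

print(quartiles([10, 20, 30, 40, 50, 60, 70, 80]))           # case b), n = 8
print(quartiles([10, 20, 30, 40, 50, 60, 70, 80, 90]))       # case c), n = 9
print(quartiles([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))  # case d), n = 10
```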
NOTICE
The square of the standard deviation is called the variance. That is
variance = $\sigma^2$ or $s_n^2$
USE OF GDC
Notice that
The standard deviation in the GDC is denoted by σx
The variance is not given; it is simply the square of σx
min Q1 Q2 Q3 max
MORE DETAILS
1) Percentiles
The values Q1, Q2, Q3 are also called
Q1 : 25th-percentile
Q2 : 50th-percentile
Q3 : 75th-percentile
2) Outliers
Such a value is viewed as being too far from the central values to
be reasonable. In our example,
Q1 - 1.5×IQR = 20 - 1.5×50 = - 55
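The lower bound computed above can be paired with the matching upper bound Q3 + 1.5×IQR. The sketch below assumes that an outlier is any value outside these two bounds; the names `lower_fence` and `upper_fence` are illustrative, not taken from the notes.

```python
data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]

q1, q3 = 20, 70                  # quartiles of the running example
iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this bound count as outliers
upper_fence = q3 + 1.5 * iqr     # values above this bound count as outliers

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```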
The formulas are not in the syllabus. We give them just for
information.
$$\sigma^2 = \frac{\sum x_i^2}{n} - \bar{x}^2$$
For our example,
$$\text{variance} = \frac{10^2+20^2+20^2+\cdots+80^2}{11} - 40^2 = 527.27$$
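The shortcut formula can be compared with a direct computation. This sketch uses `pvariance` and `pstdev` from the standard `statistics` module, which treat the data as a whole population (the σ of the notes, not a sample estimate).

```python
from statistics import pvariance, pstdev

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]
n = len(data)
x_bar = sum(data) / n

# Shortcut: mean of the squares minus the square of the mean.
variance_shortcut = sum(x * x for x in data) / n - x_bar ** 2

# Direct: mean of the squared deviations from the mean.
variance_direct = pvariance(data)
sigma = pstdev(data)

print(round(variance_shortcut, 2), round(variance_direct, 2), round(sigma, 2))
```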
Data Frequency
x f
10 1
20 3
30 2
40 1
50 1
70 2
80 1
n=11
$$\mu = \frac{f_1x_1 + f_2x_2 + f_3x_3 + \cdots}{n}$$
or otherwise
$$\mu = \frac{\sum f_i x_i}{n}$$
It helps here to add an extra column in the table above with the
so-called cumulative frequencies:
MEASURES OF SPREAD
A) STANDARD DEVIATION
USE OF GDC
GROUPED DATA
Suppose that 100 students took an exam and obtained scores from
1 to 60 (full marks), according to the following table:
Score x     Midpoint   Frequency   Cumulative frequency
[0,10)      5          8           8
[10,20)     15         12          20
[20,30)     25         10          30
[30,40)     35         25          55
[40,50)     45         35          90
[50,60]     55         10          100
                       n=100
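For grouped data the individual scores are unknown, so the usual estimate of the mean treats every score in a class as if it sat at the class midpoint. The sketch below rebuilds the cumulative frequencies and that estimate from the table; the resulting mean is an estimate under this midpoint assumption, not a value quoted in the notes.

```python
from itertools import accumulate

midpoints = [5, 15, 25, 35, 45, 55]
frequencies = [8, 12, 10, 25, 35, 10]

n = sum(frequencies)
cumulative = list(accumulate(frequencies))   # running totals of the frequencies

# Estimated mean: sum of f_i * x_i divided by n, with x_i the class midpoints.
estimated_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

print(n, cumulative, estimated_mean)
```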
x: up to 10 20 30 40 50 60
y: c.f 8 20 30 55 90 100
Below that graph we can easily draw a box-and-whisker plot:
In the same way we can find any percentile. For example, for the
40th-percentile
Estimate 40% of n: here 40% of 100 students is 40;
Draw a horizontal line at y=40 until you meet the curve;
Then draw a vertical line;
Hence
40th-percentile = 35.
There are no scores lower than -6.5 or greater than 77.5, that is
there are no outliers.
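Reading a percentile off the cumulative frequency graph can be imitated numerically. The sketch below joins the cumulative frequency points with straight segments (an assumption: the drawn curve is smooth, so graph readings may differ slightly) and adds the starting point (0, 0), which is also assumed. The function name `percentile` is hypothetical.

```python
upper_bounds = [0, 10, 20, 30, 40, 50, 60]   # x: "up to" values, with an assumed start at 0
cum_freq = [0, 8, 20, 30, 55, 90, 100]       # y: cumulative frequencies

def percentile(p):
    """Estimate the p-th percentile by linear interpolation between the plotted points."""
    target = p / 100 * cum_freq[-1]          # horizontal line at y = p% of n
    for i in range(1, len(cum_freq)):
        if target <= cum_freq[i]:
            x0, x1 = upper_bounds[i - 1], upper_bounds[i]
            y0, y1 = cum_freq[i - 1], cum_freq[i]
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)

print(percentile(25), percentile(40), percentile(75))
```

With straight segments the 40th percentile comes out as 34, close to the value 35 read from the curve.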
4.4 REGRESSION
x 10 12 15 20 23 28 30
y 120 135 174 213 270 301 305
[Figure: scatter diagram of y against x for the data above]
The closer the correlation coefficient r is to the ends ±1, the more strongly our data are linearly related
(-1 implies a negative slope while +1 implies a positive slope).
The closer r is to 0, the less our data are linearly related.
There is also a line y=ax+b that best fits our data; it is known as the
regression line. We can easily obtain these details by using a GDC.
USE OF GDC
[Figure: scatter diagram of the data with the regression line y = 9.83x + 23.1]
Notice that x=18 is within the range of our list while x=40 is not.
Estimating f(18)=200 is known as interpolation, estimating f(40)=416 as extrapolation.
In general, interpolations are more reliable than extrapolations.
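The notes obtain the regression line from a GDC. As an illustration only, the sketch below applies the standard least-squares formulas (assumed here to match what the calculator does) to the same seven points and evaluates the line at x=18 and x=40. All names are illustrative.

```python
xs = [10, 12, 15, 20, 23, 28, 30]
ys = [120, 135, 174, 213, 270, 301, 305]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

a = sxy / sxx                 # slope of the least-squares line
b = y_bar - a * x_bar         # intercept: the line passes through (x_bar, y_bar)
r = sxy / (sxx * syy) ** 0.5  # correlation coefficient

def f(x):
    """Value predicted by the regression line y = ax + b."""
    return a * x + b

print(round(a, 2), round(b, 1), round(f(18)), round(f(40)))
```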
CALC
2VAR: We obtain all the statistics, separately for x’s and y’s
In our example
$\bar{x}$ = 19.7, $\bar{y}$ = 216.9
Thus the line passes through the point M(19.7, 216.9).
[Five scatter diagrams, each with its data table]

x   1   2   3   4   5
y   2   4   6   8   10
r = 1: perfect positive correlation. Regression line: y=2x

x   1   2   3   4   5
y   10  8   6   4   2
r = -1: perfect negative correlation. Regression line: y=-2x+12

x   1   2   3   4   5
y   2   3   7   8   10
r = 0.98: strong positive correlation. Regression line: y=2.1x-0.3

x   1   2   3   4   5
y   10  8   7   3   2
r = -0.98: strong negative correlation. Regression line: y=-2.1x+12.3

x   1   2   3   4   5
y   8   2   5   2   8
r = 0: no correlation at all. Regression line: y=5
x    y     rank of x   rank of y
10   105   1           2
20   103   2           1
30   125   3           3
40   130   4           5
50   128   5           4
instead of ranks 1, 2, 3, 4
we use the ranks 2.5, 2.5, 2.5, 2.5.
[Scatter diagram: r = 0.98, strong positive correlation. Regression line: y=2.1x-0.3]
x   y    rank of x   rank of y
1   2    1           1
2   3    2           2
3   7    3           3
4   8    4           4
5   10   5           5
Here
r = 0.98
rs = 1 (perfect monotonic relationship)
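One common way to obtain the rank correlation is to replace each value by its rank and then compute the ordinary correlation coefficient of the ranks; tied values share the mean of the ranks they occupy. The sketch below follows that approach. The helper names `average_ranks`, `pearson` and `spearman` are hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (1) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson(xs, ys):
    """Correlation coefficient r of two equally long lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman(xs, ys):
    """Rank correlation: the correlation coefficient of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

r = pearson([1, 2, 3, 4, 5], [2, 3, 7, 8, 10])
rs = spearman([1, 2, 3, 4, 5], [2, 3, 7, 8, 10])
print(round(r, 2), round(rs, 2))
print(average_ranks([7, 7, 7, 7]))   # four tied values share the rank 2.5
```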
BASIC NOTIONS
In elementary set theory, a set is just a collection of objects (or
elements). It is usually denoted by a capital letter. For example,
a ∈ B
f ∉ B
Let us consider the set A = {1,2,3}. The subsets of A are sets that
contain some (or none or all) elements of A. There are 8 subsets:
∅
{1}, {2}, {3}
{1,2}, {1,3}, {2,3}
{1,2,3}
In general,
if A contains n elements, there are $2^n$ subsets.
B ⊆ A
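The count of subsets can be confirmed by listing them. This sketch builds every subset of a small set with `itertools.combinations`; the function name `subsets` is illustrative.

```python
from itertools import combinations

def subsets(elements):
    """List all subsets, from the empty set up to the full set."""
    items = list(elements)
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]

all_subsets = subsets([1, 2, 3])
print(len(all_subsets))   # 2**3 subsets, including the empty set
print(all_subsets)
```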
VENN DIAGRAMS
We usually refer to a large set S, called universal set, and consider
several subsets of S.
Let
S = { a,b,c,d,e,f,g,h,i,j }
[Venn diagram: the set A drawn as a region inside the universal set S]
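[Venn diagram of A and B inside S, where n(S) = 10: the regions contain 3 elements (A only), 2 elements (both A and B) and 2 elements (B only)]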
We denote by
n(A) = the number of elements of set A
In our example
n(S) = 10
n(A)=5 n(B)=4
Notice that the number n(A)=5 does not appear on the Venn
diagram. The subset A consists of two regions of size 3 and 2, thus
n(A)=3+2=5
Now we can study some basic operations between sets. Let us refer
again to our example where S = { a,b,c,d,e,f,g,h,i,j } and
A = {a,b,c,d,e}
B = {d,e,f,g}
[Three Venn diagrams illustrating operations on the sets A and B within S]
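The same example can be explored with Python's built-in `set` type. The operations shown (complement within S, intersection, union) are the standard ones; which of them the lost diagrams illustrated is an assumption.

```python
S = set("abcdefghij")   # universal set
A = set("abcde")
B = set("defg")

complement_A = S - A    # elements of S that are not in A
intersection = A & B    # elements in both A and B
union = A | B           # elements in A or B (or both)

print(sorted(complement_A), sorted(intersection), sorted(union))
# Counting rule: n(A or B) = n(A) + n(B) - n(A and B)
print(len(union), len(A) + len(B) - len(intersection))
```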
A BASIC PROPERTY
[Venn diagram: the sets A and B within S]
4.6 PROBABILITY
$$P(A) = \frac{n(A)}{\text{TOTAL}}$$
For example, in the following Venn diagram, the sample space S
contains 100 elements, while the event A contains 30 elements
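[Venn diagram: S contains 100 elements, 30 of them in A and 70 outside A]
$$P(A) = \frac{n(A)}{\text{TOTAL}} = \frac{30}{100} = 0.3$$
We understand that
0 ≤ P(A) ≤ 1
Clearly
P(∅)=0 and P(S) = 1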
COMPLEMENTARY EVENTS
In our example above P(A΄) = 0.7
In general
P(A΄) = 1- P(A)
COMBINED EVENTS
Remember the basic property for combined events
[Venn diagram: 20 elements in A only, 10 in both A and B, 30 in B only, 40 outside A and B]
Then
P(A) = 0.3, P(B) = 0.4, P(A΄)=0.7, P(B΄)=0.6
Also
P(A∩B)=0.1, P(A∪B)=0.6
Clearly
P(A∪B) = P(A) + P(B) – P(A∩B)
0.6 = 0.3 + 0.4 – 0.1
A Venn diagram may also contain probabilities instead of numbers
of elements. The Venn diagram above takes the form
[Venn diagram with probabilities: 0.2 in A only, 0.1 in both A and B, 0.3 in B only, 0.4 outside A and B]
EXAMPLE 1
Given that P(A) = 0.5, P(B) = 0.3, P(A∪B)=0.6, let us construct a
Venn diagram representing the combined events A and B
Notice that
P(A∪B) ≠ P(A)+P(B)
0.6 ≠ 0.8
The difference implies the existence of an intersection; P(A∩B)=0.2
Starting from the intersection 0.2 we may easily complete the
following Venn diagram
[Venn diagram with probabilities: 0.3 in A only, 0.2 in both A and B, 0.1 in B only, 0.4 outside A and B]
TABLES
Another way to represent sets in order to find probabilities is the
tabular form below. It is appropriate when the sample space is
partitioned into disjoint subsets according to two different criteria; for
example MALE-FEMALE and SMOKERS-NON SMOKERS.
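male is $P(\text{male}) = \frac{120}{200} = 0.6$
female is $P(\text{female}) = \frac{80}{200} = 0.4$
smoker is $P(\text{smoker}) = \frac{60}{200} = 0.3$
non-smoker is $P(\text{non-smoker}) = \frac{140}{200} = 0.7$
male AND smoker $P(\text{male} \cap \text{smoker}) = \frac{40}{200} = 0.2$
male OR smoker $P(\text{male} \cup \text{smoker}) = \frac{140}{200} = 0.7$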
TWO DICE
[Figure: 6×6 grid of dots, one die on each axis, showing the 36 possible outcomes]
Notice that there is only one combination of two ones (the first dot; 1-1)
but two combinations of one-two (1-2 and 2-1).
$P(\text{two sixes}) = \frac{1}{36}$ (the very last dot)
$P(\text{at least one six}) = \frac{11}{36}$ (last column and last row)
$P(\text{exactly one six}) = \frac{10}{36}$ (why?)
$P(\text{same score}) = \frac{6}{36}$ (the main diagonal: 1-1, etc)
$P(\text{sum of scores} = 9) = \frac{4}{36}$ (the dotted line)
$P(\text{sum of scores} > 9) = \frac{6}{36}$ (below the dotted line)
$P(\text{sum of scores} < 9) = \frac{26}{36}$ (above the dotted line)
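The counts above can be reproduced by listing all 36 ordered outcomes. This is a small sketch using exact fractions; the helper `prob` and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 ordered pairs (first die, second die)

def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

p_two_sixes = prob(lambda o: o == (6, 6))
p_at_least_one_six = prob(lambda o: 6 in o)
p_exactly_one_six = prob(lambda o: o.count(6) == 1)
p_same_score = prob(lambda o: o[0] == o[1])
p_sum_9 = prob(lambda o: sum(o) == 9)
p_sum_above_9 = prob(lambda o: sum(o) > 9)
p_sum_below_9 = prob(lambda o: sum(o) < 9)

print(p_at_least_one_six, p_exactly_one_six, p_sum_below_9)
```

Fractions are printed in lowest terms, so 10/36 appears as 5/18.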
is different than
Clearly $P(A) = \frac{1}{100}$.
then $P(A|B) = \frac{1}{90}$ (there are 90 two-digit numbers)
$$P(A|B) = \frac{n(A \cap B)}{n(B)} \quad \text{or} \quad P(A|B) = \frac{P(A \cap B)}{P(B)}$$
[Venn diagram: 20 elements in A only, 10 in both A and B, 30 in B only, 40 outside A and B]
We know that $P(A) = \frac{30}{100}$. What about P(A|B)?
We start with the given event B; now the total number is not 100,
the size of the whole sample space, but only 40, the size of B:
$P(A|B) = \frac{?}{40}$ (given B)
How many elements of A are inside the given space B? Only 10.
Therefore,
$P(A|B) = \frac{10}{40}$
NOTICE
In fact, in the last result we apply the formula
$$P(A|B) = \frac{n(A \cap B)}{n(B)} = \frac{10}{40} = 0.25$$
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{10/100}{40/100} = 0.25$$
Similarly we obtain
$P(B|A) = \frac{10}{30}$ (given A)
Similarly we obtain
$P(A'|B) = \frac{30}{40}$, $P(A|B') = \frac{20}{60}$, $P(A'|B') = \frac{40}{60}$
P(A|B) IN A TABLE
Perhaps it is much easier to observe the conditional probability in
tables. Consider again the example
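Clearly,
$P(\text{smoker}) = \frac{60}{200}$
$P(\text{smoker}|\text{male}) = \frac{40}{120}$ (given male)
Similarly we obtain
$P(\text{male}|\text{smoker}) = \frac{40}{60}$ (given smoker)
Similarly we obtain
$P(\text{female}|\text{smoker}) = \frac{20}{60} = 0.33$, $P(\text{non-smoker}|\text{female}) = \frac{60}{80} = 0.75$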
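The conditional probabilities above amount to changing the denominator from the grand total to the total of the given row or column. In the sketch below the four cell counts are a reconstruction from the fractions quoted in the text (for example 40 male smokers out of 120 males), not a copy of the original table, and the helper `n` is hypothetical.

```python
from fractions import Fraction

# Cell counts reconstructed from the fractions in the text (an assumption).
counts = {
    ("male", "smoker"): 40,
    ("male", "non-smoker"): 80,
    ("female", "smoker"): 20,
    ("female", "non-smoker"): 60,
}

def n(sex=None, habit=None):
    """Number of people matching the given criteria (None means 'any')."""
    return sum(c for (s, h), c in counts.items()
               if sex in (None, s) and habit in (None, h))

p_smoker = Fraction(n(habit="smoker"), n())
p_smoker_given_male = Fraction(n("male", "smoker"), n(sex="male"))
p_male_given_smoker = Fraction(n("male", "smoker"), n(habit="smoker"))
p_nonsmoker_given_female = Fraction(n("female", "non-smoker"), n(sex="female"))

print(p_smoker, p_smoker_given_male, p_male_given_smoker, p_nonsmoker_given_female)
```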
INDEPENDENT EVENTS
P(A|B) = P(A)
In this case the definition $P(A|B) = \frac{P(A \cap B)}{P(B)}$ gives
P(A∩B) = P(A)P(B)
To summarize
EXAMPLE 1
[Venn diagram: n(S) = 120, with 20 elements in A only, 10 in both A and B, 30 in B only and 60 outside A and B]
NOTICE
Many students confuse the terms
Mutually exclusive events and Independent events
Remember
Mutually exclusive events means A∩B = ∅
Independent events means P(A∩B) = P(A)P(B)
Mind that
P(A∪B) = P(A) + P(B) – P(A∩B) holds in general
P(A∩B) = P(A)P(B) holds for independent events
$$P(A \cap B) = P(A)P(B) = \frac{1}{6} \cdot \frac{1}{2} = \frac{1}{12}$$
EXAMPLE 2
Let P(A)=0.4 and P(B)=0.3. Find P(A∪B) in the following cases
a) A and B are mutually exclusive
b) A and B are independent
c) P(A∩B) = 0.2
d) P(A|B) = 0.2
Solution
a) P(A∪B) = P(A) + P(B) = 0.4 + 0.3 = 0.7
b) P(A∪B) = P(A) + P(B) - P(A)P(B) = 0.4+0.3–(0.4)(0.3) = 0.58
c) P(A∪B) = P(A) + P(B) – P(A∩B) = 0.4+0.3–0.2 = 0.5
d) $P(A|B) = \frac{P(A \cap B)}{P(B)} \Rightarrow P(A \cap B) = P(A|B)P(B) = (0.2)(0.3) = 0.06$
Hence, P(A∪B) = P(A)+P(B)–P(A∩B) = 0.4+0.3–0.06 = 0.64
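All four cases use the same addition rule; only the value of the intersection changes. A minimal sketch, with the hypothetical helper `union`:

```python
p_a, p_b = 0.4, 0.3

def union(p_intersection):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_intersection

case_a = union(0.0)          # mutually exclusive: the intersection is empty
case_b = union(p_a * p_b)    # independent: P(A and B) = P(A)P(B)
case_c = union(0.2)          # intersection given directly
case_d = union(0.2 * p_b)    # P(A and B) = P(A|B)P(B)

print(round(case_a, 2), round(case_b, 2), round(case_c, 2), round(case_d, 2))
```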
EXAMPLE 3
Let A and B be independent events with
P(A)=0.4 and P(A∪B)=0.7.
Find P(B).
Solution
For independent events it holds
P(A∪B) = P(A) + P(B) - P(A)P(B)
0.7 = 0.4 + P(B) – 0.4P(B)
0.3 = 0.6P(B)
P(B) = 0.5
AAAA BBBBBB
We play the game twice. All possible scenarios are shown below; the
corresponding probabilities are shown on the branches of the tree:
[Tree diagram]
1st game    2nd game    Scenario
A (0.4)     A (0.4)     AA
A (0.4)     B (0.6)     AB
B (0.6)     A (0.4)     BA
B (0.6)     B (0.6)     BB
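[Tree diagram with the probability of each scenario]
1st game    2nd game    Probability
A (0.4)     A (0.4)     0.4×0.4 = 0.16
A (0.4)     B (0.6)     0.4×0.6 = 0.24
B (0.6)     A (0.4)     0.6×0.4 = 0.24
B (0.6)     B (0.6)     0.6×0.6 = 0.36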
to obtain no A is 0.36
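Multiplying along the branches of the tree can be sketched as follows. The code assumes, as the tree does, that the second game has the same probabilities as the first (0.4 for A, 0.6 for B); the names are illustrative.

```python
from itertools import product

p = {"A": 0.4, "B": 0.6}

# Probability of each two-game scenario: multiply along the branches.
scenarios = {first + second: p[first] * p[second]
             for first, second in product("AB", repeat=2)}

p_no_a = scenarios["BB"]
p_at_least_one_a = 1 - p_no_a               # complement of "no A"
p_exactly_one_a = scenarios["AB"] + scenarios["BA"]

print({k: round(v, 2) for k, v in scenarios.items()})
print(round(p_at_least_one_a, 2), round(p_exactly_one_a, 2))
```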
AAAA BBBBBB
[Tree diagram]
1st branch    2nd branch    Probability
A (0.4)       C (0.3)       0.12
A (0.4)       D (0.7)       0.28
B (0.6)       C (0.4)       0.24
B (0.6)       D (0.6)       0.36
Thus
NOTICE
In the tree diagram above, the value 0.3 of the branch AC is in fact
the conditional probability
P(C|A).
[Tree diagram in general form]
1st branch    2nd branch    Outcome
P(A)          P(C|A)        P(A∩C)
P(A)          P(D|A)        P(A∩D)
P(B)          P(C|B)        P(B∩C)
P(B)          P(D|B)        P(B∩D)
EXAMPLE 1.
We throw a die.
If we get 1 we stop.
If we get 2,3,4 or 5 we toss a coin.
If we get 6 we toss two coins.
Find the probability that only one head is obtained.
Solution.
For our convenience, we denote the results of the die by
A={1}, B={2,3,4,5}, C={6}
[Tree diagram]
Die         Coins
A (1/6)     stop
B (4/6)     one coin: H (1/2) or T (1/2)
C (1/6)     two coins: HH, HT, TH or TT, each with probability (1/2)(1/2)
10 balls
We select two balls, one after the other. All possible outcomes are
clearly shown on the following tree diagram
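[Tree diagram]
1st ball    2nd ball    Probability
B (6/10)    B (5/9)     30/90
B (6/10)    W (4/9)     24/90
W (4/10)    B (6/9)     24/90
W (4/10)    W (3/9)     12/90
$$P(\text{both balls are BLACK}) = \frac{6}{10} \cdot \frac{5}{9} = \frac{30}{90} = \frac{1}{3}$$
$$P(\text{only one ball is BLACK}) = 2 \cdot \frac{6}{10} \cdot \frac{4}{9} = 2 \cdot \frac{24}{90} = \frac{8}{15}$$
$$P(\text{balls of same color}) = \frac{6}{10} \cdot \frac{5}{9} + \frac{4}{10} \cdot \frac{3}{9} = \frac{42}{90} = \frac{7}{15}$$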
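Selection without replacement can be checked by listing every ordered pair of distinct balls. The composition of the box (6 BLACK and 4 WHITE) is inferred from the fractions 6/10 and 4/10 on the tree, so treat it as an assumption; the helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

balls = ["B"] * 6 + ["W"] * 4                 # assumed contents: 6 black, 4 white
draws = list(permutations(range(10), 2))      # 90 ordered pairs of distinct balls

def prob(event):
    """Probability of an event given as a test on the two colours drawn."""
    hits = sum(1 for i, j in draws if event(balls[i], balls[j]))
    return Fraction(hits, len(draws))

p_both_black = prob(lambda first, second: first == "B" and second == "B")
p_one_black = prob(lambda first, second: (first == "B") != (second == "B"))
p_same_color = prob(lambda first, second: first == second)

print(p_both_black, p_one_black, p_same_color)
```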
[Tree diagram]
1st branch    2nd branch    Probability
A (0.4)       C (0.3)       0.12
A (0.4)       D (0.7)       0.28
B (0.6)       C (0.4)       0.24
B (0.6)       D (0.6)       0.36
We said that P(C|A) = 0.3 is shown on the tree (on the branch AC).
Actually, it is the formula $P(A|C) = \frac{P(A \cap C)}{P(C)}$
Therefore,
$P(A|C) = \frac{0.12}{0.36} = 0.33$, $P(B|C) = \frac{0.24}{0.36} = 0.67$
$P(A|D) = \frac{0.28}{0.64} = 0.44$, $P(B|D) = \frac{0.36}{0.64} = 0.56$
EXAMPLE 2.
In a private school party, 30% of the students wear RED suits, 20%
wear GREEN suits and 50% wear BLUE suits. 25% of the RED
students, 35% of the GREEN students and 45% of the BLUE
students are MALE. Find the probability that a MALE student
wears a GREEN suit, that is
P(GREEN|MALE).
Solution.
Instead of applying Bayes’ formula we will construct a tree
diagram to obtain the “inverse given” probability.
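[Tree diagram; each colour also has a FEMALE branch]
Suit           Given the suit    Probability
RED (0.3)      MALE (0.25)       0.075
GREEN (0.2)    MALE (0.35)       0.070
BLUE (0.5)     MALE (0.45)       0.225
Therefore,
$$P(\text{GREEN}|\text{MALE}) = \frac{0.070}{0.075+0.070+0.225} = \frac{0.07}{0.37} = 0.189$$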
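The tree computation can be written in a few lines: multiply along each MALE branch, add the products to get P(MALE), then divide. The dictionary names below are illustrative.

```python
p_colour = {"RED": 0.30, "GREEN": 0.20, "BLUE": 0.50}
p_male_given = {"RED": 0.25, "GREEN": 0.35, "BLUE": 0.45}

# Product along each branch: P(colour and MALE).
joint = {c: p_colour[c] * p_male_given[c] for c in p_colour}

p_male = sum(joint.values())                  # total probability of MALE
p_green_given_male = joint["GREEN"] / p_male  # the "inverse given" probability

print(round(p_male, 2), round(p_green_given_male, 3))
```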
Discrete              OR     Continuous
X ∈ {0,1,2,3,…}              X ∈ R
10, 20, 30
with probabilities
0.2, 0.3, 0.5
x 10 20 30
Clearly
P(X=10) = 0.2
x x1 x2 x3 …
P(X=x) p1 p2 p3 …
it holds
(i) $p_i \ge 0$, for all i
(ii) $\sum p_i = 1$, i.e. $p_1 + p_2 + p_3 + \cdots = 1$
We write
P(X=x1) = p1, P(X=x2) = p2, and so on.
x 10 20 30
EXAMPLE 1
Consider
x 10 20 30
P(X=x) a b 0.5
EXAMPLE 2
Consider again the same table above. But now we select one of the
numbers 10, 20, 30 at random.
If we select 10 we earn 6 points
If we select 20 we earn 1 point
If we select 30 we lose 2 points
x 10 20 30
Profit 6 points 1 point -2 points
Prob 0.2 0.3 0.5
Explanation
In other words, if we play this game 10 times we expect to earn 5
points on average.
Indeed, if we play the game 10 times we expect to obtain
2 times the number 10, that is 2×6=12 points
3 times the number 20, that is 3×1=3 points
5 times the number 30, that is 5×(-2)=-10 points
In total, 12+3-10 = 5 points
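The same reasoning per single game is the expected value: each profit multiplied by its probability, then added up. A minimal sketch with illustrative names:

```python
profits = [6, 1, -2]        # points earned for selecting 10, 20, 30
probs = [0.2, 0.3, 0.5]

# Expected profit per game: sum of (profit × probability).
expected_per_game = sum(x * p for x, p in zip(profits, probs))
expected_in_10_games = 10 * expected_per_game

print(round(expected_per_game, 2), round(expected_in_10_games, 2))
```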
EXAMPLE 3
We throw two dice.
If we obtain TWO SIXES we earn 15€
If we obtain ONLY ONE SIX we earn 1€
If we obtain NO SIX we lose 1€
Notice. If the first winning prize was not 15€ but 14€, the expected
profit would be $-\frac{1}{36}$
In other words, if we play the game 36000 times (or otherwise bet
36000€) we expect to lose 1000€.
MEDIAN-MODE
These measures, known from statistics, are defined analogously:
MODE = The value X=a of the highest probability
MEDIAN = The value X=m where the probability splits
in two equal parts (0.5-0.5)
Look at the examples below
x        10    20    30
P(X=x)   0.4   0.3   0.3
MODE = 10
MEDIAN = 20

x        10    20    30
P(X=x)   0.2   0.3   0.5
MODE = 30
MEDIAN = 25 (why?)
An equivalent definition is
$Var(X) = E(X^2) - \mu^2$
where
$E(X^2) = x_1^2 p_1 + x_2^2 p_2 + x_3^2 p_3 + \cdots$
EXAMPLE 4
Consider again the probability distribution
x 10 20 30
P(X=x) 0.2 0.3 0.5
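For this distribution the formula Var(X) = E(X²) - μ² can be evaluated directly. The sketch below is only a numerical illustration; it assumes μ is the expected value, the sum of each value times its probability.

```python
xs = [10, 20, 30]
ps = [0.2, 0.3, 0.5]

mu = sum(x * p for x, p in zip(xs, ps))          # expected value E(X)
e_x2 = sum(x * x * p for x, p in zip(xs, ps))    # E(X^2)
variance = e_x2 - mu ** 2                        # Var(X) = E(X^2) - mu^2
sigma = variance ** 0.5                          # standard deviation

print(round(mu, 2), round(e_x2, 2), round(variance, 2), round(sigma, 2))
```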
$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0,1,2,3,\ldots,n$$
n = number of trials
p = probability of success
while
X counts the number of (possible) successes
GDC
Our GDC (Casio) gives the results for a Binomial distribution
Then for each value of x (or x1 to x2), EXE gives the result.
EXAMPLE 1
We toss a die 5 times. The success is to get a six. Then
n=5 and $p=\frac{1}{6}$
We may have 0, 1, 2, 3, 4 or 5 successes.
The probability distribution for X is given by (results in 4dp)
x 0 1 2 3 4 5
GDC Bpd(0) Bpd(1) Bpd(2) Bpd(3) Bpd(4) Bpd(5)
P(X=x) 0.4019 0.4019 0.1608 0.0322 0.0032 0.0001
Remark for the formula (not in the syllabus but worth knowing)
$\binom{5}{2}$ (or 5C2) is the number of ways to have 2 sixes in 5 trials.
$$P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}$$
x        0                    1                    2                    3                   4                  5
P(X=x)   $\frac{3125}{6^5}$   $\frac{3125}{6^5}$   $\frac{1250}{6^5}$   $\frac{250}{6^5}$   $\frac{25}{6^5}$   $\frac{1}{6^5}$
         =0.4019              =0.4019              =0.1608              =0.0322             =0.0032            =0.0001
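The table can be regenerated from the formula with exact fractions. This sketch uses `math.comb` (available from Python 3.8) for the number of combinations; the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, Fraction(1, 6)
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(distribution):
    # Numerator over 6^5 = 7776, followed by the decimal value to 4 dp.
    print(x, prob * 6 ** 5, round(float(prob), 4))
```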
EXAMPLE 2
A box contains 5 balls, 1 BLACK and 4 WHITE. We win if we select
a BLACK ball. We play this game 10 times.
Find
(a) The probability to win exactly 4 times
(b) The probability to win at most 4 times
(c) The probability to win at least once
(d) The expected number of winning games.
(e) The variance of the number of winning games.
Solution
The variable
X = number of winning games
follows a binomial distribution with n=10 and $p=\frac{1}{5}=0.2$
[we may also write X ~ B(10,0.2)]
EXAMPLE 3
Solution
Hence n=10.
μ = mean
σ = standard deviation.
[Figure: bell-shaped curve over the x-axis from -∞ to +∞, centred at μ]
Roughly speaking, there is a highly likely mean value μ and all the
other values of X spread out symmetrically about the mean. As we
move away from the mean (either to the left or to the right of the
mean) the probability decreases dramatically!
Weight of people
Height of people
Time spent in a super market
Weight of a pack of coffee labeled 500 g.
It is estimated that
Percentage of the population       ranges between (in general)    for our problem
about 68% of the population        μ-σ and μ+σ                    [65,85]
about 95% of the population        μ-2σ and μ+2σ                  [55,95]
about 99.7% of the population      μ-3σ and μ+3σ                  [45,105]
[Figure: normal curve with mean 75; the area between 65 and 85 is 68.3%]
NOTICE
The whole area under the curve is 1 (i.e. 100%). The area before
the mean as well as the area after the mean is 0.5 (i.e. 50%)
Theoretically, the distribution of X ranges between -∞ to +∞.
In practice, we may assume that almost the whole population
(in fact 99.7%) ranges between μ-3σ and μ+3σ.
The standard deviation σ indicates the spread of the population.
For example, assume that
Greeks: μ=75 kg σ=10 kg
Italians: μ=75 kg σ=8 kg
This implies that both populations have the same mean but
Italians are closer to the mean than Greeks. In other words,
almost the whole population is between μ±3σ, namely
75±30 i.e. 45-105 kg for Greeks
75±24 i.e. 51-99 kg for Italians
[Figure: normal curve with mean 75; the area between 60 and 82 is shaded]
NOTICE
GDC gives some extra information below each result.
For question (a) it gives P(60≤X≤82)=0.691 and then
z:Low =-1.5 z:Up =0.7
the so-called standardized values of 60 and 82. They mean that
P(65≤X≤85) ≈ 0.683 (68.3%)
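The GDC results quoted here can be reproduced with `NormalDist` from Python's standard `statistics` module. The parameters μ=75 and σ=10 are inferred from the interval [65,85] and from the standardized values -1.5 and 0.7, so treat them as an assumption about the lost problem statement.

```python
from statistics import NormalDist

X = NormalDist(mu=75, sigma=10)

# Probability of an interval = difference of two cumulative probabilities.
p_60_82 = X.cdf(82) - X.cdf(60)

# Standardized values: how many standard deviations from the mean.
z_low = (60 - X.mean) / X.stdev
z_up = (82 - X.mean) / X.stdev

# The 68% - 95% - 99.7% rule for 1, 2 and 3 standard deviations.
within = [X.cdf(75 + k * 10) - X.cdf(75 - k * 10) for k in (1, 2, 3)]

print(round(p_60_82, 3), z_low, z_up)
print([round(p, 3) for p in within])
```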
[Figure: normal curve; the area to the left of a is 0.067]
EXAMPLE 1
The mass of packets for a certain type of coffee is normally
distributed with a mean of 500 g and standard deviation of 15 g.
(a) Find the probability that a packet weighs more than 520 g
(b) The lightest 4% of the packets weigh less than a.
The heaviest 5% of the packets weigh more than b.
Find a and b.
(c) Find Q1, Q3, the lower and upper quartiles of the weights
Solution
For the second result we can also use Tail: right, area = 0.25
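As the worked solution is not reproduced here, the following is only a sketch of how the three questions could be answered numerically, with `NormalDist` standing in for the GDC. `inv_cdf` takes the area of the left tail; a right-tail area of 0.25 corresponds to a left-tail area of 0.75.

```python
from statistics import NormalDist

W = NormalDist(mu=500, sigma=15)    # mass of a packet in grams

# (a) more than 520 g: complement of the area up to 520
p_more_than_520 = 1 - W.cdf(520)

# (b) lightest 4% below a (left tail 0.04); heaviest 5% above b (left tail 0.95)
a = W.inv_cdf(0.04)
b = W.inv_cdf(0.95)

# (c) quartiles: left tails 0.25 and 0.75
q1 = W.inv_cdf(0.25)
q3 = W.inv_cdf(0.75)

print(round(p_more_than_520, 3), round(a, 1), round(b, 1), round(q1, 1), round(q3, 1))
```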
EXAMPLE 2
The mass of packets for a certain type of coffee is normally
distributed with a mean of 500 g and standard deviation of 15 g.
Solution
ONLY FOR HL
$$P(X=x) = \frac{e^{-m} m^x}{x!}, \quad x = 0,1,2,3,\ldots$$
where m is a parameter.
We say that X follows a Poisson distribution and write X ~ Po(m).
We denote by
m = the mean number of incidents
(within a certain period)
while
X is the random variable for the possible number of incidents
(within the certain period)
GDC
Our GDC (Casio) gives the results for a Poisson distribution
EXAMPLE 1
Similarly,
X 0 1 2 3 …
P(X=x) 0.135 0.271 0.271 0.180 …
This is in fact,
Notice that the formula $\frac{e^{-2} 2^3}{3!} = 0.180$ gives the same result
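The values in the table can be recomputed from the formula. The mean m=2 is inferred from the expression e⁻² 2³/3! above; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """P(X = x) for X ~ Po(m): e^(-m) * m^x / x!"""
    return exp(-m) * m ** x / factorial(x)

table = [round(poisson_pmf(x, 2), 3) for x in range(4)]
print(table)
```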
EXAMPLE 2
Assume that the mean number of phone calls per minute is 2. Find
(a) The probability that 3 phone calls occur in one minute
(b) The probability that 3 phone calls occur in two minutes
(c) The probability that no phone calls occur in three minutes
Solution
The frequency of phone calls is v=2 (phone calls per minute)
(a) The mean number of phone calls per minute is m=2. Hence
P(X=3) = Ppd(3) = 0.180
(b) The mean number of phone calls per 2 minutes is m=4. Hence
P(X=3) = Ppd(3) = 0.195
(c) The mean number of phone calls per 3 minutes is m=6. Hence
P(X=0) = Ppd(0) = 0.00248
MODE
We check the neighboring integer values of the expected number m.
Look at the following two cases:
Assume that the mean is m=4.3. We expect that the most likely
number of incidents is near 4.3. We check
P(X=4) = 0.193
P(X=5) = 0.166
Hence the mode is 4.
Assume that the mean is m=5. We expect that the most likely
number of incidents is near 5.
We check
P(X=3) = 0.140 P(X=4) = 0.175
P(X=5) = 0.175 P(X=6) = 0.146
Hence we have two modes, 4 and 5.
EXAMPLE 3
The mean number of phone calls per minute in a call center is m=2. Find the
probability of the combined event that
3 phone calls occur in the first minute and
4 phone calls occur in the second minute
Solution
We have
P(X=3) = 0.1804 and P(X=4) = 0.0902
The time intervals are disjoint, so the two counts are independent and the probability is
P(3 calls, then 4 calls) = P(X=3)P(X=4) = (0.1804)(0.0902) = 0.0163
X follows Po(m)
Y follows Po(n)
EXAMPLE 4
The mean number of phone calls in the call center A is m=2, while
the mean number of phone calls in the call center B is n=3.
Find the probability that the total number of phone calls is 6.
Solution
X ~ Po(2), Y ~ Po(3), therefore X+Y ~ Po(5)
Thus
P(X+Y=6) = 0.146
NOTICE
The sum of the following combinations (where X+Y=6)
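The claim that X+Y follows Po(5) can be tested numerically: adding up P(X=k)·P(Y=6-k) over all combinations with X+Y=6 should give the same value as Po(5) at 6. The sketch assumes that X and Y are independent.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """P(X = x) for X ~ Po(m)."""
    return exp(-m) * m ** x / factorial(x)

# Direct: X + Y ~ Po(2 + 3)
direct = poisson_pmf(6, 5)

# Combination by combination: X = k calls in centre A and Y = 6 - k in centre B
by_combinations = sum(poisson_pmf(k, 2) * poisson_pmf(6 - k, 3) for k in range(7))

print(round(direct, 3), round(by_combinations, 3))
```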
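[Transition diagram: from A, stay in A with probability 0.8 or move to B with 0.2; from B, stay in B with probability 0.7 or move to A with 0.3]
Prob        from A    from B
to A        0.8       0.3
to B        0.2       0.7
$$T = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}$$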
Notice that
the columns refer to “from” and the rows to “to”.
the sum of the entries in each column is 1.
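[Tree diagrams for two steps]
Starting from A                   Starting from B
A → A → A   0.8×0.8 = 0.64        B → A → A   0.3×0.8 = 0.24
A → A → B   0.8×0.2 = 0.16        B → A → B   0.3×0.2 = 0.06
A → B → A   0.2×0.3 = 0.06        B → B → A   0.7×0.3 = 0.21
A → B → B   0.2×0.7 = 0.14        B → B → B   0.7×0.7 = 0.49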
However, we can obtain all the results above in a much easier and
amazing way: by squaring the transition matrix T!
$$T^3 = \begin{pmatrix} 0.650 & 0.525 \\ 0.350 & 0.475 \end{pmatrix}$$
By using our GDC we may observe that for large powers (that is
after many days) the result converges to the matrix
$$T^\infty = \begin{pmatrix} 0.6 & 0.6 \\ 0.4 & 0.4 \end{pmatrix}$$
Notice.
The columns in $T^\infty$ are identical. This happens when the matrix is
regular:
a square matrix A is regular if $A^n$ has no zero entries
from some power n onwards. Then $A^n$ converges to a matrix $A^\infty$
in which all columns are identical.
Suppose that the total population in the two cities is 100 and the
initial distribution is 50-50. This can be depicted in the
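initial state vector $S_0 = \begin{pmatrix} 50 \\ 50 \end{pmatrix}$
After 1 day, $S_1 = TS_0 = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix} \begin{pmatrix} 50 \\ 50 \end{pmatrix} = \begin{pmatrix} 55 \\ 45 \end{pmatrix}$, that is, 55 in A, 45 in B
After 2 days, $S_2 = TS_1 = T^2 S_0 = \begin{pmatrix} 57.5 \\ 42.5 \end{pmatrix}$
After 3 days, $S_3 = TS_2 = T^3 S_0 = \begin{pmatrix} 58.75 \\ 41.25 \end{pmatrix}$
After many days, $S_\infty = T^\infty S_0 = \begin{pmatrix} 60 \\ 40 \end{pmatrix}$
However,
If the initial state vector was $S_0 = \begin{pmatrix} 100 \\ 0 \end{pmatrix}$, then still $S_\infty = T^\infty S_0 = \begin{pmatrix} 60 \\ 40 \end{pmatrix}$
If the initial state vector was $S_0 = \begin{pmatrix} 0 \\ 100 \end{pmatrix}$, then still $S_\infty = T^\infty S_0 = \begin{pmatrix} 60 \\ 40 \end{pmatrix}$
In fact, any initial state vector $S_0$ would result in $S_\infty = T^\infty S_0 = \begin{pmatrix} 60 \\ 40 \end{pmatrix}$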
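The powers of T and the state vectors can be checked without a GDC. The sketch below uses plain Python lists and two small helper functions (`mat_mul` and `mat_pow` are hypothetical names); the power 50 stands in for "many days".

```python
def mat_mul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def mat_pow(P, n):
    """Multiply P by itself n times (n = 0 gives the identity matrix)."""
    size = len(P)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

T = [[0.8, 0.3],
     [0.2, 0.7]]
S0 = [[50], [50]]      # initial state vector as a column

T2 = mat_pow(T, 2)
T3 = mat_pow(T, 3)
states = {n: mat_mul(mat_pow(T, n), S0) for n in (1, 2, 3, 50)}

print([[round(v, 3) for v in row] for row in T3])
for n, S in states.items():
    print(n, [round(row[0], 2) for row in S])
```

Trying other initial vectors such as `[[100], [0]]` with a large power shows the same limiting distribution.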
This means that after many steps, things tend to stabilize and the
distribution of the population reaches the so-called
steady state vector $S = \begin{pmatrix} 60 \\ 40 \end{pmatrix}$
This also implies that if the initial distribution was already 60-40,
it would remain the same forever, from the very first day! Indeed,
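$$TS = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix} \begin{pmatrix} 60 \\ 40 \end{pmatrix} = \begin{pmatrix} 60 \\ 40 \end{pmatrix} = S$$
$$S = \begin{pmatrix} 0.6 \\ 0.4 \end{pmatrix}$$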
Thus, at any moment in the “future”, the probability for the robot
to be in location A is 0.6, while to be in location B is 0.4.
EXAMPLE 1
The probability to move to the right is 0.6, while to the left 0.4. It
stops when it reaches position 1 or position 4.
$$T = \begin{pmatrix} 1 & 0.4 & 0 & 0 \\ 0 & 0 & 0.4 & 0 \\ 0 & 0.6 & 0 & 0 \\ 0 & 0 & 0.6 & 1 \end{pmatrix}$$
For example,
the 1st column says: if the robot is situated at position 1, the
probability to move elsewhere is 0 as it stops at position 1.
the 2nd column says: if the robot is situated at position 2, the
probability to move to 1 is 0.4 while to 3 is 0.6.
and so on.
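$$S_1 = TS_0 = \begin{pmatrix} 1 & 0.4 & 0 & 0 \\ 0 & 0 & 0.4 & 0 \\ 0 & 0.6 & 0 & 0 \\ 0 & 0 & 0.6 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.4 \\ 0 \\ 0.6 \\ 0 \end{pmatrix}$$
After 2 moves, $S_2 = TS_1 = T^2 S_0 = \begin{pmatrix} 0.40 \\ 0.24 \\ 0 \\ 0.36 \end{pmatrix}$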
In other words,
the probability to finish at position 1 is 0.40,
the probability to be again in position 2 is 0.24,
there is no way to be in position 3 (can you think why?), the
probability to finish at position 4 is 0.36.
The transition matrix is not regular, as any power of it will contain zero
entries (can you think why?).
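After 10 moves
$$S_{10} = T^{10} S_0 = \begin{pmatrix} 0.5259 \\ 0.0008 \\ 0 \\ 0.4733 \end{pmatrix}$$
After 11 moves
$$S_{11} = T^{11} S_0 = \begin{pmatrix} 0.5262 \\ 0 \\ 0.0005 \\ 0.4733 \end{pmatrix}$$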
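The state vectors for this walk can be generated by applying T repeatedly. The sketch below assumes, as the computation of S1 suggests, that the robot starts at position 2; the helper names `step` and `state_after` are hypothetical.

```python
T = [[1, 0.4, 0,   0],
     [0, 0,   0.4, 0],
     [0, 0.6, 0,   0],
     [0, 0,   0.6, 1]]

def step(S):
    """One move: multiply the state vector by the transition matrix."""
    return [sum(T[i][j] * S[j] for j in range(len(S))) for i in range(len(T))]

def state_after(moves, S0):
    """State vector after the given number of moves."""
    S = list(S0)
    for _ in range(moves):
        S = step(S)
    return S

S0 = [0, 1, 0, 0]    # assumed start: position 2 with certainty

for moves in (1, 2, 10, 11):
    print(moves, [round(v, 4) for v in state_after(moves, S0)])
```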
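$$T = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}$$
we found a steady state vector
$$S = \begin{pmatrix} 60 \\ 40 \end{pmatrix}$$
such that TS = S.
Think of this relation as
TS = 1S
$$\det \begin{pmatrix} 0.8-\lambda & 0.3 \\ 0.2 & 0.7-\lambda \end{pmatrix} = 0 \Leftrightarrow \lambda^2 - 1.5\lambda + 0.5 = 0$$
λ = 1 or λ = 0.5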
For example, a rental car company with 100 cars, two car stations
A and B, and transition matrix T as above, should place 60 cars in
station A and 40 cars in station B, from the first day!
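For a 2×2 matrix the eigenvalues follow from the quadratic λ² - (trace)λ + (determinant) = 0, and the steady state is an eigenvector for λ=1 scaled to the required total. The sketch below carries out both steps for T; the variable names are illustrative and the scaling to 100 cars follows the rental example.

```python
from math import sqrt

a, b = 0.8, 0.3     # first row of T
c, d = 0.2, 0.7     # second row of T

trace = a + d
det = a * d - b * c

# Roots of lambda^2 - trace*lambda + det = 0
disc = sqrt(trace ** 2 - 4 * det)
eigenvalues = ((trace + disc) / 2, (trace - disc) / 2)

# Eigenvector for lambda = 1: (a - 1)x + b*y = 0, so x : y = b : (1 - a)
x, y = b, 1 - a
steady = (x / (x + y), y / (x + y))     # scaled so the entries add up to 1
cars = [100 * s for s in steady]        # scaled to a fleet of 100 cars

print([round(v, 3) for v in eigenvalues], [round(s, 3) for s in steady], [round(n) for n in cars])
```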