Probability Theory: Notation
In the world of statistics, the word that can most suitably replace probability is 'chance'. Probability can be regarded as a measure of the likelihood of an event occurring. There are two ways of assigning numerical values to probability: the classical approach and the empirical method.
Before probing into the core of this topic, it is most appropriate first to make the reader familiar with the different notations that will be used throughout it. It should be mentioned here that probability theory is quite closely linked to set theory. It has therefore been considered necessary to remind the reader of the equivalence of the terms of these two subjects.
Notation
S : the sample space, the set of all possible outcomes
n(A) : the number of outcomes in the event A
p(A) : the probability that the event A occurs
A′ : the complement of A ("A does not occur")
A ∪ B : the union of A and B ("A or B, or both, occur")
A ∩ B : the intersection of A and B ("A and B both occur")
∅ : the empty set (the impossible event)
Example
A six-sided unbiased (ordinary fair) die is tossed. What is the probability of obtaining a
multiple of 3 ?
Solution
A multiple of 3 can be obtained in two ways (a score of 3 or of 6) out of six equally likely outcomes, so the required probability is 2/6 = 1/3.
Turning to the empirical method: for example, let us try to find out whether a coin is unbiased. The most natural thing to do is to toss it a certain number of times n and record the number of 'heads', n(H), and the number of 'tails', n(T). A ratio n(H) : n(T) approximately equal to 1 : 1 would suggest that the coin is probably unbiased. On the other hand, if the coin is tossed 20 times, for example, and 15 heads are recorded, it cannot immediately be concluded that the coin is biased, because the value of n is too small to allow any hasty decision. It is expected that, as we increase the number of tosses, the value of n(H)/n will tend to the real value of p(H) for the coin.
Let us consider the following table, which gives the results of tosses of a coin
generated at random by a computer.
It can be noted that n(H)/n approaches 0.55 as the number of tosses increases indefinitely. We can, within some margin of error, conclude that the coin is slightly biased towards 'heads'.
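The relative-frequency argument above can be illustrated with a short simulation (a sketch in Python; the function name and the bias parameter are illustrative, not from the original text):

```python
import random

def relative_frequency_of_heads(n_tosses, p_heads=0.5, seed=0):
    """Toss a coin (biased towards heads when p_heads > 0.5) n_tosses
    times and return the relative frequency n(H)/n."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

# As n grows, n(H)/n settles near the true value of p(H).
for n in (20, 1000, 100000):
    print(n, relative_frequency_of_heads(n, p_heads=0.55))
```

With a small n (such as 20) the ratio can stray far from p(H), which is exactly why a run of 15 heads in 20 tosses proves nothing; with a large n it stabilises.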
There are cases where all outcomes are not equally likely to occur and also
where a good estimate of probability cannot be obtained because an experiment cannot be
repeated under identical conditions. An example would be to calculate the probability
that a shop will sell exactly 10 television sets on a particular day. In those cases, we are
forced to form a subjective probability, based on past experience, records, expert opinion
or other factors. This method obviously has a very large margin of error but it is
sometimes the only method available.
AXIOMS OF PROBABILITY
Note
Two events are said to be mutually exclusive if they have no intersection, that is, if they cannot both occur.
Proofs
1. p(S) = 1.
From the classical definition, p(S) = n(S)/n(S) = 1.
2. 0 ≤ p(A) ≤ 1
(i) p(A) ≥ 0
From the classical definition, p(A) = n(A)/n(S). Since n(A) and n(S) are both natural numbers, it goes without saying that p(A) ≥ 0.
(ii) p(A) ≤ 1
n(A) ≤ n(S)
⇒ n(A)/n(S) ≤ n(S)/n(S)
⇒ p(A) ≤ 1.
We conclude that probability can only take on values between 0 and 1 inclusive.
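The classical definition p(A) = n(A)/n(S) used in these proofs can be sketched for equally likely outcomes (a Python sketch; the function name is illustrative). Note that 0 ≤ p(A) ≤ 1 then holds automatically, exactly as the proof argues:

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """p(A) = n(A) / n(S), assuming equally likely outcomes."""
    S = set(sample_space)
    A = set(event) & S                  # only outcomes actually in S count
    return Fraction(len(A), len(S))

die = {1, 2, 3, 4, 5, 6}
p = classical_probability({3, 6}, die)  # multiples of 3, as in the earlier example
assert p == Fraction(1, 3)
assert 0 <= p <= 1                      # the second axiom, for free
```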
3. p(A ∪ B) = p(A) + p(B), for mutually exclusive events A and B.
[Venn diagram: disjoint sets A and B inside S, containing p and q outcomes respectively.]
Since A and B have no common outcome, n(A ∪ B) = p + q = n(A) + n(B). Dividing throughout by n(S) gives p(A ∪ B) = p(A) + p(B).
Now that we have proved the axioms, let us have a look at some further rules
of probability. Most of them are derived from the axioms themselves.
1. p(∅) = 0.
(i) p(∅) = n(∅)/n(S) = 0/n(S) = 0.
(ii) Since S and ∅ are complementary, they are also mutually exclusive. Using the third axiom of probability, we have
p(S ∪ ∅) = p(S) + p(∅)
Since S ∪ ∅ = S, it follows that
p(S) = p(S) + p(∅)
⇒ p(∅) = p(S) − p(S) = 0
2. p(A′) = 1 − p(A)
(Since A and A′ are mutually exclusive and A ∪ A′ = S, the third axiom gives p(A) + p(A′) = p(S) = 1.)
3. p(A) = p(A ∩ B) + p(A ∩ B′)
(It can easily be checked that n(A) = n(A ∩ B) + n(A ∩ B′).)
4. De Morgan's rules.
(i) p(A′ ∩ B′) = p((A ∪ B)′)
(ii) p(A′ ∪ B′) = p((A ∩ B)′)
5. p(A ∪ B) = p(A) + p(B) − p(A ∩ B)
This rule will also be proven by the use of a Venn diagram. From the diagram below, n(A) = p + q, n(B) = q + r and n(A ∩ B) = q.
We have that n(A) + n(B) − n(A ∩ B) = (p + q) + (q + r) − q = p + q + r.
Also, n(A ∪ B) = p + q + r.
Dividing throughout by n(S) gives p(A ∪ B) = p(A) + p(B) − p(A ∩ B).
[Venn diagram: overlapping sets A and B inside S, with p outcomes in A only, q in A ∩ B and r in B only.]
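This addition rule can be checked by direct counting with small sets (a Python sketch; the particular events chosen are illustrative):

```python
from fractions import Fraction

S = set(range(1, 7))         # a fair six-sided die
A = {2, 4, 6}                # even score
B = {2, 3, 5}                # prime score

def p(event):
    return Fraction(len(event & S), len(S))

# Note: in Python, | is set union and & is set intersection.
assert p(A | B) == p(A) + p(B) - p(A & B)
```

Counting confirms the proof: the overlap {2} would otherwise be counted twice, which is why p(A ∩ B) is subtracted once.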
CONDITIONAL PROBABILITY
Let us start with a real-life situation where the event A is defined as "Mauritius will produce 600 000 tonnes of sugar in 2004" and B is the event that "there will be a heavy cyclone during the year 2004". It is obvious that A and B each have their individual probabilities of occurring, but it should also be clear that the probability of A will change depending on whether B occurs, since a cyclone could bring about a drastic decrease in the number of tonnes of sugar produced. This is a case of conditional probability.
Notation
P(A | B ) means “the probability that A occurs given that B has already occurred”
[Remember that we are only calculating the probability that A occurs (nothing to do with B).]
Definition
P(A | B) = P(A ∩ B) / P(B)
The above formula can be explained by means of the following Venn diagram.
[Venn diagram: overlapping sets A and B inside S, with the region A ∩ B indicated by an arrow.]
If event B occurs, obviously its complement B ′ cannot occur any more. Thus, if any
subsequent event has to occur, it can only do so in B. Therefore, the initial sample space S has
now been reduced to the new sample space B. Now, what is the probability of A occurring in B?
The answer is clearly the part of A which also belongs to B, that is, A ∩ B (indicated by an arrow
in the above diagram).
It is worth noting that we do not always have to solve a problem strictly by theory; a logical approach may sometimes lead us to the solution more easily. Example 1 illustrates both approaches to the same problem.
Example 1
An ordinary fair (six-sided unbiased) die is tossed. If the score is even, what is the
probability that it is also prime?
Solution
Method 1 (Theory)
Let A be the event "the score is even" and B the event "the score is prime".
P(B ∩ A), the probability that the score is both even and prime, is 1/6 (only the score 2 qualifies).
Therefore, P(B | A) = P(B ∩ A) / P(A) = (1/6) / (3/6) = 1/3.
Method 2 (Logic)
We know that the score is even, so that any subsequent outcome must belong to the set {2, 4, 6}, which is the new sample space (see the discussion above).
The only prime number in this set is 2, and the probability that it occurs is clearly 1/3.
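Both methods of Example 1 can be mirrored in code (a Python sketch; the helper name is illustrative):

```python
from fractions import Fraction

S = set(range(1, 7))       # fair six-sided die
even = {2, 4, 6}
prime = {2, 3, 5}

def p(event, space):
    """Classical probability of an event within a given sample space."""
    return Fraction(len(set(event) & space), len(space))

via_formula = p(prime & even, S) / p(even, S)  # P(prime ∩ even) / P(even)
via_reduced_space = p(prime, even)             # restrict the sample space to {2, 4, 6}
assert via_formula == via_reduced_space == Fraction(1, 3)
```

The second computation is the "logic" method: the reduced sample space {2, 4, 6} replaces S, and the event is simply evaluated inside it.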
A problem on conditional probability can also be solved by means of a tree diagram as shown in
Example 2 below.
Example 2
A man goes to work on foot, by bus or by car with probabilities 0.5, 0.2 and 0.3 respectively. If he goes on foot, the probability that he arrives at work late is 0.4; if he goes by bus, the probability that he arrives late is 0.7; and if he goes by car, the probability that he arrives late is 0.5. Determine the probability that
(a) the man arrives at work late;
(b) he went by bus, given that he is late.
[To avoid any confusion, you may assume that being exactly on time is the same as being early.]
Solution
Note that there is no need to define an event E, for example, where E = “man is early” since that
event is simply the complement of “man is late”, hence, denoted by L ′ .
[Tree diagram: the first set of branches gives the means of transport (F, B, C); each then branches into late (L) or not late (L′).]
P(F) = 0.5: P(L | F) = 0.4, P(L′ | F) = 0.6
P(B) = 0.2: P(L | B) = 0.7, P(L′ | B) = 0.3
P(C) = 0.3: P(L | C) = 0.5, P(L′ | C) = 0.5
(a) The probability of the man being late, irrespective of the means of transport used, is given by
P(L) = P(F) P(L | F) + P(B) P(L | B) + P(C) P(L | C)
= (0.5)(0.4) + (0.2)(0.7) + (0.3)(0.5)
= 0.49.
(b) P(B | L) = P(B ∩ L) / P(L) = P(B) P(L | B) / [P(F) P(L | F) + P(B) P(L | B) + P(C) P(L | C)]
= 0.14 / 0.49 = 2/7. (This could be interpreted as the contribution of the bus to lateness.)
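The tree-diagram calculation of Example 2 amounts to the law of total probability followed by one application of the conditional-probability formula; a Python sketch (the dictionary keys are illustrative):

```python
from fractions import Fraction

# Branch probabilities of the tree in Example 2 (as exact fractions).
transport   = {"foot": Fraction(1, 2), "bus": Fraction(1, 5), "car": Fraction(3, 10)}
p_late_given = {"foot": Fraction(2, 5), "bus": Fraction(7, 10), "car": Fraction(1, 2)}

# (a) Total probability of being late: sum of P(means) * P(late | means) over branches.
p_late = sum(transport[t] * p_late_given[t] for t in transport)
assert p_late == Fraction(49, 100)      # 0.49

# (b) P(bus | late) = P(bus) * P(late | bus) / P(late)
p_bus_given_late = transport["bus"] * p_late_given["bus"] / p_late
assert p_bus_given_late == Fraction(2, 7)
```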
The next example shows the application of probability in the case of a contingency table.
A contingency table is just a table of frequencies representing two factors in terms of their
attributes.
Example 3
The following table shows the frequency distribution of grades obtained in Mathematics by students of the different sections of a certain form in a secondary school.

              A    B    C
Form V Red   15   25   40
Form V Blue  26   44   10

Find the probability that a student chosen at random from this form
(a) obtained an A
(b) is from Form V Blue
(c) is from Form V Red and obtained a C
(d) obtained an A given that he is from Form V Red
(e) is from Form V Blue given that he obtained a B.
Solution
It is useful first to complete the table with row and column totals.

              A    B    C   Total
Form V Red   15   25   40    80
Form V Blue  26   44   10    80
Total        41   69   50   160

(a) P[student obtained an A] = 41/160.
(b) P[student is from Form V Blue] = 80/160 = 1/2.
(c) P[student is from Form V Red and obtained a C] = 40/160 = 1/4.
(d) P[student obtained an A | he is from Form V Red] = 15/80 = 3/16.
(e) P[student is from Form V Blue | he obtained a B] = 44/69.
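The contingency-table probabilities can be computed directly from the frequencies (a Python sketch; the dictionary layout is illustrative):

```python
from fractions import Fraction

# Grade counts from the table (rows: section, columns: grade).
table = {
    "Red":  {"A": 15, "B": 25, "C": 40},
    "Blue": {"A": 26, "B": 44, "C": 10},
}
grand_total = sum(sum(row.values()) for row in table.values())           # 160

# (a) P[obtained an A]: column total over grand total.
p_A = Fraction(sum(row["A"] for row in table.values()), grand_total)     # 41/160

# (d) P[A | Form V Red]: the Red row becomes the reduced sample space.
p_A_given_red = Fraction(table["Red"]["A"], sum(table["Red"].values()))  # 15/80 = 3/16
```

Conditional probabilities from a contingency table reduce to reading the relevant row or column as the new sample space.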
Independence
Two events A and B are said to be independent if the occurrence of one does not affect the probability of the other, that is,
P(A | B) = P(A).
Substituting this into the definition of conditional probability,
P(A ∩ B) / P(B) = P(A)
⇒ P(A ∩ B) = P(A) P(B).
The above result is known as the multiplicative rule for two independent events.
Note that independent events are not mutually exclusive events! The difference is in fact very obvious: for mutually exclusive events there is no intersection, so that p(A ∩ B) = 0, whereas for independent events with non-zero probabilities p(A ∩ B) = p(A) p(B) > 0, so they must have an intersection (the above result says it all!).
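The contrast between independence and mutual exclusiveness can be seen on a die (a Python sketch with illustrative events):

```python
from fractions import Fraction

S = set(range(1, 7))
A = {2, 4, 6}        # even score
B = {1, 2, 3, 4}     # score at most 4
C = {1, 3, 5}        # odd score: mutually exclusive with A

def p(event):
    return Fraction(len(event & S), len(S))

# A and B are independent: P(A ∩ B) = P(A) P(B).
assert p(A & B) == p(A) * p(B)

# A and C are mutually exclusive, and therefore NOT independent.
assert p(A & C) == 0
assert p(A & C) != p(A) * p(C)
```

Here P(A) = 1/2 and P(B) = 2/3, while A ∩ B = {2, 4} has probability 1/3 = (1/2)(2/3), so knowing that the score is at most 4 does not change the chance that it is even.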