Probability Theory: Notation
In the world of statistics, the word that can most suitably replace probability is 'chance'. Probability can be regarded as a measure of the likelihood of an event occurring. There are two ways of assigning numerical values to probability: the classical approach and the empirical method.
Before probing into the core of this topic, it is most appropriate first to make the reader familiar with the different notations that will be used throughout it. It should be mentioned here that probability theory is quite closely linked to set theory. It has therefore been considered necessary to remind the reader of the equivalence of the terms of these two subjects.
Notation
S : the sample space, the set of all possible outcomes
n(A) : the number of outcomes in the event A
p(A) : the probability that the event A occurs
A′ : the complement of A ("A does not occur")
A ∪ B : the union of A and B ("A or B, or both, occur")
A ∩ B : the intersection of A and B ("A and B both occur")
∅ : the empty set (the impossible event)
Example
A six-sided unbiased (ordinary fair) die is tossed. What is the probability of obtaining a
multiple of 3 ?
Solution
A multiple of 3 can be obtained in two ways (a score of 3 or of 6) out of six equally likely outcomes, so the required probability is 2/6 = 1/3.
Turning to the empirical method: for example, let us try to find out whether a coin is unbiased. The most natural thing to do is to toss it a certain number of times n and record the number of 'heads', n(H), and the number of 'tails', n(T). A ratio n(H) : n(T) approximately equal to 1 : 1 would suggest that the coin is probably unbiased. On the other hand, if the coin is tossed 20 times, for example, and 15 heads are recorded, it cannot immediately be concluded that the coin is biased, because the value of n is too small to allow any hasty decision. It is expected that, as we increase the number of tosses, the value of n(H)/n will tend to the real value of p(H) for the coin.
Let us consider the following table, which gives the results of tosses of a coin
generated at random by a computer.
It can be noted that n(H)/n approaches 0.55 as the number of tosses increases indefinitely. We can, within some margin of error, conclude that the coin is slightly biased towards 'heads'.
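The relative-frequency argument above can be illustrated with a short simulation (a sketch in Python; the function name and the bias parameter are illustrative, not from the original text):

```python
import random

def relative_frequency_of_heads(n_tosses, p_heads=0.5, seed=0):
    """Toss a coin (biased towards heads when p_heads > 0.5) n_tosses
    times and return the relative frequency n(H)/n."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

# As n grows, n(H)/n settles near the true value of p(H).
for n in (20, 1000, 100000):
    print(n, relative_frequency_of_heads(n, p_heads=0.55))
```

With a small n (such as 20) the ratio can stray far from p(H), which is exactly why a run of 15 heads in 20 tosses proves nothing; with a large n it stabilises.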
There are cases where all outcomes are not equally likely to occur and also
where a good estimate of probability cannot be obtained because an experiment cannot be
repeated under identical conditions. An example would be to calculate the probability
that a shop will sell exactly 10 television sets on a particular day. In those cases, we are
forced to form a subjective probability, based on past experience, records, expert opinion
or other factors. This method obviously has a very large margin of error but it is
sometimes the only method available.
AXIOMS OF PROBABILITY
Note
Two events are said to be mutually exclusive if they have no intersection, that is, if they cannot both occur.
Proofs
1. p(S) = 1.
From the classical definition, p(S) = n(S)/n(S) = 1.
2. 0 ≤ p(A) ≤ 1
(i) p(A) ≥ 0
From the classical definition, p(A) = n(A)/n(S). Since n(A) and n(S) are both natural numbers, it goes without saying that p(A) ≥ 0.
(ii) p(A) ≤ 1
n(A) ≤ n(S)
⇒ n(A)/n(S) ≤ n(S)/n(S)
⇒ p(A) ≤ 1.
We conclude that probability can only take on values between 0 and 1 inclusive.
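The classical definition p(A) = n(A)/n(S) used in these proofs can be sketched for equally likely outcomes (a Python sketch; the function name is illustrative). Note that 0 ≤ p(A) ≤ 1 then holds automatically, exactly as the proof argues:

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """p(A) = n(A) / n(S), assuming equally likely outcomes."""
    S = set(sample_space)
    A = set(event) & S                  # only outcomes actually in S count
    return Fraction(len(A), len(S))

die = {1, 2, 3, 4, 5, 6}
p = classical_probability({3, 6}, die)  # multiples of 3, as in the earlier example
assert p == Fraction(1, 3)
assert 0 <= p <= 1                      # the second axiom, for free
```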
3. p(A ∪ B) = p(A) + p(B), for mutually exclusive events A and B.
[Venn diagram: disjoint sets A and B inside S, containing p and q outcomes respectively.]
Since A and B have no common outcome, n(A ∪ B) = p + q = n(A) + n(B). Dividing throughout by n(S) gives p(A ∪ B) = p(A) + p(B).
Now that we have proved the axioms, let us have a look at some further rules
of probability. Most of them are derived from the axioms themselves.
1. p(∅) = 0.
(i) p(∅) = n(∅)/n(S) = 0/n(S) = 0.
(ii) Since S and ∅ are complementary, they are also mutually exclusive. Using the third axiom of probability, we have
p(S ∪ ∅) = p(S) + p(∅)
Since S ∪ ∅ = S, it follows that
p(S) = p(S) + p(∅)
⇒ p(∅) = p(S) − p(S) = 0
2. p(A′) = 1 − p(A)
(Since A and A′ are mutually exclusive and A ∪ A′ = S, the third axiom gives p(A) + p(A′) = p(S) = 1.)
3. p(A) = p(A ∩ B) + p(A ∩ B′)
(It can easily be checked that n(A) = n(A ∩ B) + n(A ∩ B′).)
4. De Morgan's rules.
(i) p(A′ ∩ B′) = p((A ∪ B)′)
(ii) p(A′ ∪ B′) = p((A ∩ B)′)
5. p(A ∪ B) = p(A) + p(B) − p(A ∩ B)
This rule will also be proven by the use of a Venn diagram. From the diagram below, n(A) = p + q, n(B) = q + r and n(A ∩ B) = q.
We have that n(A) + n(B) − n(A ∩ B) = (p + q) + (q + r) − q = p + q + r.
Also, n(A ∪ B) = p + q + r.
Dividing throughout by n(S) gives p(A ∪ B) = p(A) + p(B) − p(A ∩ B).
[Venn diagram: overlapping sets A and B inside S, with p outcomes in A only, q in A ∩ B and r in B only.]
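This addition rule can be checked by direct counting with small sets (a Python sketch; the particular events chosen are illustrative):

```python
from fractions import Fraction

S = set(range(1, 7))         # a fair six-sided die
A = {2, 4, 6}                # even score
B = {2, 3, 5}                # prime score

def p(event):
    return Fraction(len(event & S), len(S))

# Note: in Python, | is set union and & is set intersection.
assert p(A | B) == p(A) + p(B) - p(A & B)
```

Counting confirms the proof: the overlap {2} would otherwise be counted twice, which is why p(A ∩ B) is subtracted once.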
CONDITIONAL PROBABILITY
Let us start with a real-life situation where the event A is defined as "Mauritius will produce 600 000 tonnes of sugar in 2004" and B is the event that "there will be a heavy cyclone during the year 2004". It is obvious that A and B each have their individual probabilities of occurring, but it should also be clear that the probability of A will change depending on whether B occurs, since a cyclone could bring about a drastic decrease in the number of tonnes of sugar produced. This is a case of conditional probability.
Notation
P(A | B ) means “the probability that A occurs given that B has already occurred”
[Remember that we are only calculating the probability that A occurs (nothing to do with B).]
Definition
P(A | B) = P(A ∩ B) / P(B)
The above formula can be explained by means of the following Venn diagram.
[Venn diagram: overlapping sets A and B inside S, with the region A ∩ B indicated by an arrow.]
If event B occurs, obviously its complement B ′ cannot occur any more. Thus, if any
subsequent event has to occur, it can only do so in B. Therefore, the initial sample space S has
now been reduced to the new sample space B. Now, what is the probability of A occurring in B?
The answer is clearly the part of A which also belongs to B, that is, A ∩ B (indicated by an arrow
in the above diagram).
It is worth noting that we do not always have to solve a problem strictly by theory; a logical approach may sometimes lead us to the solution more easily. Example 1 illustrates both approaches to the same problem.
Example 1
An ordinary fair (six-sided unbiased) die is tossed. If the score is even, what is the
probability that it is also prime?
Solution
Method 1 (Theory)
Let A be the event "the score is even" and B the event "the score is prime".
P(B ∩ A), the probability that the score is both even and prime, is 1/6 (only the score 2 qualifies).
Therefore, P(B | A) = P(B ∩ A) / P(A) = (1/6) / (3/6) = 1/3.
Method 2 (Logic)
We know that the score is even, so that any subsequent outcome must belong to the set {2, 4, 6}, which is the new sample space (see the discussion above).
The only prime number in this set is 2, and the probability that it occurs is clearly 1/3.
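Both methods of Example 1 can be mirrored in code (a Python sketch; the helper name is illustrative):

```python
from fractions import Fraction

S = set(range(1, 7))       # fair six-sided die
even = {2, 4, 6}
prime = {2, 3, 5}

def p(event, space):
    """Classical probability of an event within a given sample space."""
    return Fraction(len(set(event) & space), len(space))

via_formula = p(prime & even, S) / p(even, S)  # P(prime ∩ even) / P(even)
via_reduced_space = p(prime, even)             # restrict the sample space to {2, 4, 6}
assert via_formula == via_reduced_space == Fraction(1, 3)
```

The second computation is the "logic" method: the reduced sample space {2, 4, 6} replaces S, and the event is simply evaluated inside it.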
A problem on conditional probability can also be solved by means of a tree diagram as shown in
Example 2 below.
Example 2
A man goes to work on foot, by bus or by car with probabilities 0.5, 0.2 and 0.3 respectively. If he goes on foot, the probability that he arrives at work late is 0.4; if he goes by bus, the probability that he arrives late is 0.7; and if he goes by car, the probability that he arrives late is 0.5. Determine the probability that
(a) the man arrives at work late;
(b) he went by bus, given that he is late.
[To avoid any confusion, you may assume that being exactly on time is the same as being early.]
Solution
Note that there is no need to define an event E, for example, where E = “man is early” since that
event is simply the complement of “man is late”, hence, denoted by L ′ .
[Tree diagram: the first set of branches gives the means of transport (F, B, C); each then branches into late (L) or not late (L′).]
P(F) = 0.5: P(L | F) = 0.4, P(L′ | F) = 0.6
P(B) = 0.2: P(L | B) = 0.7, P(L′ | B) = 0.3
P(C) = 0.3: P(L | C) = 0.5, P(L′ | C) = 0.5
(a) The probability of the man being late, irrespective of the means of transport used, is given by
P(L) = P(F) P(L | F) + P(B) P(L | B) + P(C) P(L | C)
= (0.5)(0.4) + (0.2)(0.7) + (0.3)(0.5)
= 0.49.
(b) P(B | L) = P(B ∩ L) / P(L) = P(B) P(L | B) / [P(F) P(L | F) + P(B) P(L | B) + P(C) P(L | C)]
= 0.14 / 0.49 = 2/7. (This could be interpreted as the contribution of the bus to lateness.)
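The tree-diagram calculation of Example 2 amounts to the law of total probability followed by one application of the conditional-probability formula; a Python sketch (the dictionary keys are illustrative):

```python
from fractions import Fraction

# Branch probabilities of the tree in Example 2 (as exact fractions).
transport   = {"foot": Fraction(1, 2), "bus": Fraction(1, 5), "car": Fraction(3, 10)}
p_late_given = {"foot": Fraction(2, 5), "bus": Fraction(7, 10), "car": Fraction(1, 2)}

# (a) Total probability of being late: sum of P(means) * P(late | means) over branches.
p_late = sum(transport[t] * p_late_given[t] for t in transport)
assert p_late == Fraction(49, 100)      # 0.49

# (b) P(bus | late) = P(bus) * P(late | bus) / P(late)
p_bus_given_late = transport["bus"] * p_late_given["bus"] / p_late
assert p_bus_given_late == Fraction(2, 7)
```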
The next example shows the application of probability in the case of a contingency table.
A contingency table is just a table of frequencies representing two factors in terms of their
attributes.
Example 3
The following table shows the frequency distribution of grades obtained in Mathematics by students of the different sections of a certain form in a secondary school.

              A    B    C
Form V Red   15   25   40
Form V Blue  26   44   10

Find the probability that a student chosen at random from this form
(a) obtained an A
(b) is from Form V Blue
(c) is from Form V Red and obtained a C
(d) obtained an A given that he is from Form V Red
(e) is from Form V Blue given that he obtained a B.
Solution
It is useful first to complete the table with row and column totals.

              A    B    C   Total
Form V Red   15   25   40    80
Form V Blue  26   44   10    80
Total        41   69   50   160

(a) P[student obtained an A] = 41/160.
(b) P[student is from Form V Blue] = 80/160 = 1/2.
(c) P[student is from Form V Red and obtained a C] = 40/160 = 1/4.
(d) P[student obtained an A | he is from Form V Red] = 15/80 = 3/16.
(e) P[student is from Form V Blue | he obtained a B] = 44/69.
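The contingency-table probabilities can be computed directly from the frequencies (a Python sketch; the dictionary layout is illustrative):

```python
from fractions import Fraction

# Grade counts from the table (rows: section, columns: grade).
table = {
    "Red":  {"A": 15, "B": 25, "C": 40},
    "Blue": {"A": 26, "B": 44, "C": 10},
}
grand_total = sum(sum(row.values()) for row in table.values())           # 160

# (a) P[obtained an A]: column total over grand total.
p_A = Fraction(sum(row["A"] for row in table.values()), grand_total)     # 41/160

# (d) P[A | Form V Red]: the Red row becomes the reduced sample space.
p_A_given_red = Fraction(table["Red"]["A"], sum(table["Red"].values()))  # 15/80 = 3/16
```

Conditional probabilities from a contingency table reduce to reading the relevant row or column as the new sample space.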
Independence
Two events A and B are said to be independent if the occurrence of one does not affect the probability of the other, that is,
P(A | B) = P(A).
Substituting this into the definition of conditional probability,
P(A ∩ B) / P(B) = P(A)
⇒ P(A ∩ B) = P(A) P(B).
The above result is known as the multiplicative rule for two independent events.
Note that independent events are not mutually exclusive events! The difference is in fact very obvious: for mutually exclusive events there is no intersection, so that p(A ∩ B) = 0, whereas for independent events with non-zero probabilities p(A ∩ B) = p(A) p(B) > 0, so they must have an intersection (the above result says it all!).
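The contrast between independence and mutual exclusiveness can be seen on a die (a Python sketch with illustrative events):

```python
from fractions import Fraction

S = set(range(1, 7))
A = {2, 4, 6}        # even score
B = {1, 2, 3, 4}     # score at most 4
C = {1, 3, 5}        # odd score: mutually exclusive with A

def p(event):
    return Fraction(len(event & S), len(S))

# A and B are independent: P(A ∩ B) = P(A) P(B).
assert p(A & B) == p(A) * p(B)

# A and C are mutually exclusive, and therefore NOT independent.
assert p(A & C) == 0
assert p(A & C) != p(A) * p(C)
```

Here P(A) = 1/2 and P(B) = 2/3, while A ∩ B = {2, 4} has probability 1/3 = (1/2)(2/3), so knowing that the score is at most 4 does not change the chance that it is even.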