Unit 3 - Notes
(COQT111)
Unit 3
Foundation of Statistical
Inference
Table of Contents
Unit Overview
1. Basic Probability Concepts
2. Probability Distributions
3. Unit Summary
4. References
Unit Overview
Suppose a company surveys 200 customers in order to estimate the proportion of all
customers who favour a particular product. In this context, the proportion of the 200
surveyed customers who favour the product is expected to be representative of the
proportion of all customers who favour it. However, there is a degree of uncertainty
associated with any survey result. Determining the likelihood that a certain proportion of
the surveyed customers would favour the company's product is of great importance in
management-related decision-making. The task of calculating the likelihood that something
occurs belongs to the realm of probability, which is the focus of this unit.
Learning Outcomes
1. Basic Probability Concepts
• Probability is the likelihood (or chance) that a particular event will occur.
• An experiment is a process by which an outcome is obtained (e.g. rolling a die).
• The result of an experiment is called an outcome (e.g. obtaining a 2 after rolling a die).
• An event is any particular outcome or group of outcomes (e.g. obtaining {Head}, or
{Tail}, or {Head; Tail} after tossing a fair coin).
• The sample space is the set of all possible outcomes of an experiment (e.g. after
rolling a die, all possible outcomes are given by {1; 2; 3; 4; 5; 6}).
$$P(A) = \frac{r}{n}$$
Where: $A$ = event of a specific type
$r$ = number of outcomes of event $A$
$n$ = total number of possible outcomes
Probability values are always defined on a scale from 0 to 1 (i.e. $0 \le P(A) \le 1$). A
probability of 0 indicates that an event cannot occur (and a value near 0 that it is unlikely
to occur), while a probability of 1 indicates that an event is certain to occur (and a value
close to 1 that it is very likely to occur). Other probability values between 0 and 1
represent degrees of likelihood that an event will occur. The scale on which probability is
defined is illustrated in Figure 17.
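The ratio $P(A) = r/n$ can be sketched in a few lines of Python. This is only an illustration: the event chosen here (rolling an even number on a fair die) is an assumed example and does not come from the notes.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Hypothetical event A (an assumption for illustration): the roll is even.
event_a = {outcome for outcome in sample_space if outcome % 2 == 0}

r = len(event_a)        # number of outcomes that belong to event A
n = len(sample_space)   # total number of possible outcomes
p_a = Fraction(r, n)    # P(A) = r / n

print(p_a, float(p_a))  # 1/2 0.5
```

Because $r$ can never be negative or exceed $n$, the result always falls on the 0 to 1 scale described above.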
1.1. Types of Probability
There are two types of probability, namely subjective and objective probability. Subjective
probability is assigned based on personal feelings or insights. The determination of the
probability comes from an educated guess, expert opinion or just plain intuition (Wegner,
2016). Although not a scientific approach to probability, the subjective method is often based
on wisdom and experience. Suppose an experienced head of the mathematics department at
a college analyses students' performance in mathematics at the end of term 1 and takes note
of the poorly performing students. Drawing from experience, the head of department would
know the scope of the mathematics module and hence may be able to give an accurate
probability that a certain proportion of the poorly performing students will pass the
final examination. However, this approach to determining probabilities is not used extensively
in statistical analysis because it is difficult to statistically verify the correctness of the results.
When working with probabilities, it is important to understand some of their most basic
properties. Five of the most basic properties are as follows:
● A probability value always lies between 0 and 1 (i.e. $0 \le P(A) \le 1$). Note that 0 and 1
are included in the interval.
● If it is impossible for an event to occur, then $P(A) = 0$. For example, the probability
of a spaza shop with a capital investment of R8 000 making R80 000 profit in one day
is zero.
● If it is certain that an event will occur, then $P(A) = 1$. For example, the probability that
the human resource office at a company processes at least one leave application in a year is 1.
● The sum of the probabilities of all possible mutually exclusive events equals 1 (i.e. for $k$
mutually exclusive events that make up a sample space, $P(A_1) + P(A_2) + P(A_3) + \dots + P(A_k) = 1$).
For example, if a coin is tossed, there are two outcomes {Head, Tail}, and
$P(\text{Head}) + P(\text{Tail}) = \frac{1}{2} + \frac{1}{2} = 1$.
● Complementary probability: if $P(A)$ is the probability of event A occurring, then the
probability of event A not occurring (i.e. $\bar{A}$) is defined as $P(\bar{A}) = 1 - P(A)$. For example,
if there is a 60% chance that a salesperson will make R100 000,00 in one day, then
$P(\text{making R100 000,00}) = \frac{60}{100} = 0{,}6$ and $P(\text{not making R100 000,00}) = 1 - 0{,}6 = 0{,}4$.
For more information on types of probability, see Wegner (2016, pp. 107-109).
The following example will help us to understand the differences between these concepts.
Example 12: Consider the table below showing recruitment by sex at a new company.
a) Intersection of events
The intersection of two events 𝐴 and 𝐵 is the set of all outcomes that belong to both 𝐴 and
𝐵 simultaneously. It is written as 𝐴 ∩ 𝐵 (i.e. 𝐴 and B).
Figure 18: Venn diagram showing intersection of two events (𝐴 ∩ 𝐵)
To illustrate, we answer the following question: What is the probability that a randomly
selected employee will be female and belong to the security department?
Solution:
Let $A$ = event (Female) and $B$ = event (Security department). Then $A \cap B$ is the set of
all employees who are female and are recruited in the security department.
From Table 5, there are 4 employees out of 28 who are female and recruited in the security
department.
Thus $P(A \cap B) = P(\text{Female} \cap \text{Security}) = \frac{4}{28} = 0{,}143$
Figure 19: Venn diagram of female and security employees
b) Union of events
The union of two events 𝐴 and 𝐵 is the set of all outcomes that belong to either 𝐴 or 𝐵
or both. It is written as 𝐴 ∪ 𝐵.
To illustrate, we answer the following question: What is the probability that a randomly
selected employee will be female or will be an employee in the security department, or both?
Solution:
Let $A$ = event (Female) and $B$ = event (Security department). Then $A \cup B$ is the set of
all employees who are female, or employed in the security department, or both. From Table 5,
there are 13 female employees (including 4 security employees), 10 security employees
(including 4 female employees) and 4 employees who are both female and in the security
department. This means that there are 19 different employees (13 + 10 − 4) who are either
female or in security or both.
Thus $P(A \cup B) = P(\text{Female} \cup \text{Security}) = \frac{13 + 10 - 4}{28} = \frac{19}{28} = 0{,}679$
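The two worked answers above can be checked with a short Python sketch. It uses only the counts quoted in the solutions (28 employees, 13 female, 10 in security, 4 in both); the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the worked solutions for Example 12.
total = 28
female = 13
security = 10
female_and_security = 4

# Intersection: employees who are female AND in security.
p_intersection = Fraction(female_and_security, total)

# Union: subtract the overlap once so that the 4 female security
# employees are not counted twice.
p_union = Fraction(female + security - female_and_security, total)

print(round(float(p_intersection), 3))  # 0.143
print(round(float(p_union), 3))         # 0.679
```

Python prints a decimal point where these notes use a decimal comma, so 0.143 and 0.679 correspond to 0,143 and 0,679.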
c) Mutually exclusive
Events are mutually exclusive if they cannot occur together on a single trial of a random
experiment (i.e. not at the same point in time).
For example, we answer the question: what is the probability of randomly selecting an
employee who is both male and female?
Solution:
Let $A$ = event (Male) and $B$ = event (Female). Events A and B are mutually exclusive because
a randomly selected employee cannot be male and female at the same time.
Thus $P(A \cap B) = P(\text{Male} \cap \text{Female}) = \frac{0}{28} = 0$. A probability of zero means an impossible event.
Events are collectively exhaustive when the union of all possible events is equal to the sample
space. To illustrate we answer the question: what is the probability of selecting an employee
who is male or female from the sample of 28 employees (refer to table 5).
Solution:
Two events 𝐴 and 𝐵 are statistically independent if the occurrence of event 𝐴 has no effect on
the outcome of event 𝐵 and vice-versa.
It is also important to note the key difference between ‘mutually exclusive events’ and
‘statistically independent events’ to avoid confusion. When two events are mutually exclusive,
they cannot occur together. On the other hand, when two events are statistically independent,
they can occur together, but they do not have an influence on each other. Here is an example
for statistically independent events:
Notice here that the occurrence of the meeting in the boardroom does not influence the
weather and vice-versa. In fact, the two events can occur simultaneously.
For more information on basic probability concepts, see Wegner (2016, pp. 109-113).
Objective probabilities can be classified into three types namely: marginal probability, joint
probability, and conditional probability. We illustrate their differences using an example.
Example 13: Consider the information relating to numbers of people (grouped according to
gender) who have a particular blood group.
Table 6: Blood groups
● Joint probability $P(A \cap B)$: a joint probability is the probability that both event A and
event B will occur simultaneously on a single trial of a random experiment (Wegner,
2016). A joint event refers to the outcomes of two or more random variables occurring
together. It is the same as the intersection of two events in a Venn diagram. A cross
tabulation is used to find joint probabilities because it shows the outcomes of two
random variables. In Example 13, the probability of being female and belonging to blood
group O can be calculated as follows:
$$P(\text{Female and O}) = P(F \cap O) = \frac{27}{140} = 0{,}19$$
$$P(O \mid M) = \frac{\text{Number of males who have blood group O}}{\text{Total number of males}} = \frac{21}{70}$$
For more information relating to calculating objective probabilities, see Wegner (2016).
The probability rules are more useful when calculating probabilities of compound or multiple
events occurring simultaneously. Two probability rules are presented below:
● The addition rule: applies to both non-mutually exclusive and mutually exclusive
events. It relates to the union of events and is used to find the probability of either
event A or event B, or both, occurring in a single trial of a random experiment.
● The multiplication rule: applies to both statistically dependent and statistically
independent events. It relates to the intersection of events and is used to find the
probability of event A and event B occurring together in a single trial of a random experiment.
Non-mutually exclusive events are those events that can occur together in a single trial of a
random experiment. Therefore the probability of either event A or event B or both occurring
in a single trial of a random experiment is defined as:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
If a Venn diagram is used for illustration, the union of two non-mutually exclusive events is the
combined outcomes of the two overlapping events A and B. From example 13, if an individual
was selected randomly, what is the probability that the individual is either female or someone
with blood group AB, or both?
Solution:
Let $A$ = event (Female)
Let $B$ = event (Blood group AB)
Notice that the two events are not mutually exclusive, as they can occur at the same time.
Therefore: $P(A) = P(\text{Female}) = \frac{70}{140} = 0{,}5$
$P(B) = P(\text{AB}) = \frac{22}{140} = 0{,}157$
$P(A \cap B) = P(\text{Female with blood group AB}) = \frac{10}{140} = 0{,}0714$
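The solution stops at the three component probabilities. A minimal Python sketch of the final step follows, assuming the usual addition rule for non-mutually exclusive events, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, and using only the counts quoted above.

```python
from fractions import Fraction

total = 140                              # people in Example 13
p_female = Fraction(70, total)           # P(A)
p_ab = Fraction(22, total)               # P(B)
p_female_and_ab = Fraction(10, total)    # P(A and B)

# Addition rule: remove the overlap so it is counted only once.
p_female_or_ab = p_female + p_ab - p_female_and_ab

print(p_female_or_ab, round(float(p_female_or_ab), 3))  # 41/70 0.586
```

Under these assumptions the probability of selecting a female, or someone with blood group AB, or both, is about 0,586.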
Mutually exclusive events are those events that cannot occur together in a single trial of a
random experiment. For mutually exclusive events, there is no intersectional event, meaning
that $P(A \cap B) = 0$. Thus the probability of either event A or event B (but not both) occurring
in a single trial of a random experiment is defined as:
$$P(A \cup B) = P(A) + P(B)$$
If events are mutually exclusive, then the probability is the sum of only the two marginal
probabilities of events A and B. Using Venn diagrams, the union of two mutually exclusive
events is the sum of the outcomes of each of the two (non-overlapping) events A and B
separately. For example; what is the probability that a randomly selected individual is either
someone with blood group O or AB?
Solution:
The two events are mutually exclusive since they cannot occur simultaneously. Therefore
$P(A \cap B) = 0$.
Since: $P(A) = P(\text{Group O}) = \frac{50}{140} = 0{,}357$
$P(B) = P(\text{Group AB}) = \frac{22}{140} = 0{,}157$
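The last step of this solution can be sketched in Python, assuming the addition rule for mutually exclusive events reduces to $P(A \cup B) = P(A) + P(B)$ because the intersection is empty. Only the counts quoted above are used.

```python
from fractions import Fraction

total = 140                       # people in Example 13
p_group_o = Fraction(50, total)   # P(A)
p_group_ab = Fraction(22, total)  # P(B)

# No overlap to subtract: a person cannot have blood group O and AB at once.
p_o_or_ab = p_group_o + p_group_ab

print(p_o_or_ab, round(float(p_o_or_ab), 3))  # 18/35 0.514
```

Under these assumptions the probability that a randomly selected individual has blood group O or AB is about 0,514.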
The multiplication rule is used to find the joint probability of events A and B occurring
together in a single trial of a random experiment (i.e. the intersection of the two events). In
its general form, the rule assumes that the two events A and B are associated (i.e. they are
dependent events). The multiplication rule for dependent events is given by the following:
$$P(A \cap B) = P(A \mid B) \times P(B)$$
Example 14
Suppose there are 3 green marbles, 4 blue marbles, and 3 red marbles in a bag. What is the
probability of drawing 2 blue marbles from the bag if the first marble is not replaced before
the second marble is drawn?
Solution:
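The worked solution is not reproduced in these notes. The sketch below shows one way to reach the answer, assuming the multiplication rule for dependent events in the form P(first blue and second blue) = P(first blue) × P(second blue | first blue). The variable names are illustrative.

```python
from fractions import Fraction

green, blue, red = 3, 4, 3
total = green + blue + red  # 10 marbles in the bag

# First draw: 4 of the 10 marbles are blue.
p_first_blue = Fraction(blue, total)

# Second draw without replacement: 3 blue marbles remain among 9.
p_second_blue_given_first = Fraction(blue - 1, total - 1)

p_two_blue = p_first_blue * p_second_blue_given_first

print(p_two_blue, round(float(p_two_blue), 3))  # 2/15 0.133
```

The second factor changes because the first marble is not replaced, which is what makes the two draws dependent.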
If two events, A and B, are statistically independent (i.e. there is no association between the
two events) then the multiplication rule reduces to the product of the two marginal
probabilities only:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
Where: 𝑃(𝐴 ∩ 𝐵) = 𝑗𝑜𝑖𝑛𝑡 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝐴 𝑎𝑛𝑑 𝐵
𝑃(𝐴) = 𝑚𝑎𝑟𝑔𝑖𝑛𝑎𝑙 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝐴 𝑜𝑛𝑙𝑦
𝑃(𝐵) = 𝑚𝑎𝑟𝑔𝑖𝑛𝑎𝑙 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝐵 𝑜𝑛𝑙𝑦
The following test is used to check if two events are statistically independent. Two events are
statistically independent if the following is true:
$$P(A \mid B) = P(A)$$
This means that the prior occurrence of event B does not influence the outcome of event A.
Example 15
Suppose you throw two fair dice. What is the probability of obtaining a 3 on each die?
Solution:
The two events are independent: the outcome of the first event (throwing the first die) does
not influence the occurrence of the second event (throwing the second die).
$$P(\text{3 on 1st die and 3 on 2nd die}) = P(\text{3 on first die}) \times P(\text{3 on second die}) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$$
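As a cross-check on Example 15, the following Python sketch lists all 36 equally likely outcomes of two dice and confirms that the joint probability equals the product of the two marginal probabilities. Enumerating the outcomes is an illustrative approach and is not the method used in the notes.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# All 36 equally likely (first die, second die) pairs.
outcomes = list(product(faces, repeat=2))
n = len(outcomes)

p_first_is_3 = Fraction(sum(1 for first, _ in outcomes if first == 3), n)
p_second_is_3 = Fraction(sum(1 for _, second in outcomes if second == 3), n)
p_both_are_3 = Fraction(sum(1 for pair in outcomes if pair == (3, 3)), n)

print(p_both_are_3)                                  # 1/36
print(p_both_are_3 == p_first_is_3 * p_second_is_3)  # True
```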
A probability tree is a graphic way to apply probability rules where there are multiple events
that occur in sequence and these events can be represented by branches (similar to a tree).
Example 16: Two fair coins are tossed. Find the outcomes using a tree diagram.
What is the probability of obtaining two heads?
Solution:
The outcomes are {(H,H); (H,T); (T,H); (T,T)}
Probability $P(\text{Two Heads}) = P(H, H) = \frac{1}{4}$
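A probability tree can also be traced in code: each path through the tree is one outcome, and the path probability is the product of the branch probabilities along it. The Python sketch below does this for the two fair coins of Example 16. The dictionary layout is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities for a single toss of a fair coin.
branch = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Each (first toss, second toss) pair is one path through the tree.
# Multiply the branch probabilities along the path.
tree = {}
for first, second in product(branch, repeat=2):
    tree[(first, second)] = branch[first] * branch[second]

for path, probability in tree.items():
    print(path, probability)

print(tree[("H", "H")])    # 1/4
print(sum(tree.values()))  # 1
```

The four path probabilities add up to 1, as expected when the paths cover the whole sample space.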
Solving probability problems using tree diagrams can be challenging. These videos provide
more detailed explanations:
https://www.youtube.com/watch?v=kNOrDWm15bY
https://www.youtube.com/watch?v=z9FKoQ8a_1E
Probability calculations involve counting the number of event outcomes (𝑟) and the total
number of possible outcomes (𝑛) and expressing this as a ratio. Often the values for 𝑟 and 𝑛
cannot be counted because of the large number of possible outcomes involved. Counting
rules help to find values for 𝑟 and 𝑛. There are three basic counting rules namely: the
multiplication rule, the permutation rule, and the combination rule (Wegner, 2016).
For a single event, the total number of different ways (unique ways) in which 𝑛 objects (i.e.
the full sample space) can be arranged (ordered) is given by 𝑛! (read as ‘n factorial’).
𝑛! = 𝑛 × (𝑛 − 1) × (𝑛 − 2) × (𝑛 − 3) × … × 3 × 2 × 1 (note that 0! = 1)
For example, consider the number of unique arrangements of seven 100m athletes in a seven-
lane track. The number of different arrangements of the seven athletes will be:
7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040
For combined events, if a random process has 𝑛1 possible outcomes for event 1, 𝑛2 possible
outcomes for event 2, …, 𝑛𝑗 possible outcomes for event 𝑗, then the total number of possible
outcomes for the 𝑗 events collectively is:
𝑛1 × 𝑛2 × 𝑛3 × … × 𝑛𝑗
$${}_nP_r = \frac{n!}{(n - r)!}$$
Where: $r$ = number of objects selected at a time
Example 17: A Statistics debating team consists of 5 speakers. a) In how many ways can all 5
speakers be arranged in a row for a photo? b) How many ways can the captain and vice-captain
be chosen?
Solution:
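The worked solution is not reproduced in these notes. The sketch below is one reading of the question: part a) orders all 5 speakers, giving 5!, and part b) treats captain and vice-captain as distinct roles, so order matters and the permutation rule applies with n = 5 and r = 2. That interpretation is an assumption.

```python
import math

speakers = 5

# a) Arrange all 5 speakers in a row: 5! orderings.
photo_arrangements = math.factorial(speakers)

# b) Choose a captain and then a vice-captain (order matters): 5P2 = 5!/(5-2)!.
captain_and_vice = math.perm(speakers, 2)

print(photo_arrangements)  # 120
print(captain_and_vice)    # 20
```

`math.perm` needs Python 3.8 or later. On older versions the same value is `math.factorial(5) // math.factorial(3)`.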
c) Combination rule
A combination is a distinct way of selecting a subset of $r$ objects drawn from a larger
group of $n$ objects where the order of selecting objects is not important. Each separate
grouping of the subset of $r$ objects is called a combination. The number of ways of
selecting $r$ objects from $n$ objects, not considering the order of selection, is given
by:
$${}_nC_r = \frac{n!}{r!\,(n - r)!}$$
Where: $r$ = number of objects selected at a time
Example 18: How many ways can a basketball team of 5 players be chosen from 9 players?
Solution:
$${}_9C_5 = \frac{9!}{5!\,(9 - 5)!} = \frac{9!}{5!\,4!} = \frac{362880}{(120)(24)} = 126$$
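The result of Example 18 can be confirmed in Python in three ways: the built-in combination function, the factorial formula, and a brute-force count of every 5-player subset. The player labels 0 to 8 are placeholders.

```python
import math
from itertools import combinations

n_players, team_size = 9, 5

# Built-in combination count (Python 3.8+).
teams = math.comb(n_players, team_size)

# The same value from the factorial formula n! / (r! (n - r)!).
teams_by_formula = math.factorial(n_players) // (
    math.factorial(team_size) * math.factorial(n_players - team_size)
)

# Brute-force check: list every unordered group of 5 players.
teams_by_listing = len(list(combinations(range(n_players), team_size)))

print(teams, teams_by_formula, teams_by_listing)  # 126 126 126
```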
2. Probability Distributions
A probability distribution is a list of all the possible outcomes of a random variable and their
associated probabilities of occurrence (Wegner, 2016). The probability distribution for a
random variable X provides the possible outcomes for X, and the probabilities associated with
each possible value.
For more information relating to types of probability distributions, see Wegner (2016, p.133)
2.2. Discrete Probability Distributions
Discrete probability distributions assume that the outcomes of a random variable under study
can take on only specific values (normally integers). These distributions assign a probability to
each value of a discrete random variable X. Some examples of discrete probability distributions
are:
For each possible outcome of a discrete random variable in a sample space, there is a non-
zero probability. The zero probability is only assigned for values of the random variable outside
the sample space. As noted above, binomial probability distribution and Poisson probability
distribution are the commonly used discrete probability distribution functions.
For more information and detailed explanations of discrete probability distributions, please
click on the link below:
https://www.youtube.com/watch?v=UnzbuqgU2LE
A discrete random variable follows the binomial distribution if it satisfies the following four
conditions:
Before we calculate the probabilities for probability distributions, let us first discuss
the meanings of the following phrases that are often used in probability problem solving.
At least three ($X \ge 3$): This means that three is the minimum value. At least three
books means three or four or five or more books.
At most three ($X \le 3$): This means that three is the maximum value. At most three books
means no book or one book or two books or three books.
No more than three ($X \le 3$): This means that three is the maximum number. With regard
to number of books, it means three or two or one or zero books.
Less than three ($X < 3$): This means that three is not included and we are only interested in
the values smaller than three, that is, zero or one or two.
More than three ($X > 3$): This means that three is not included and we are only interested in
the values larger than three, that is, four, five, six, etc.
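Each phrase above selects a different set of values of $X$ whose probabilities are added. The Python sketch below shows this with a made-up probability distribution for the number of books (values 0 to 5). The distribution is purely hypothetical and serves only to show which values each phrase includes.

```python
from fractions import Fraction

# Hypothetical distribution of X = number of books (NOT from the notes).
pmf = {
    0: Fraction(2, 20),
    1: Fraction(4, 20),
    2: Fraction(6, 20),
    3: Fraction(4, 20),
    4: Fraction(3, 20),
    5: Fraction(1, 20),
}

p_at_least_3 = sum(p for x, p in pmf.items() if x >= 3)   # X >= 3
p_at_most_3 = sum(p for x, p in pmf.items() if x <= 3)    # X <= 3 (also "no more than three")
p_less_than_3 = sum(p for x, p in pmf.items() if x < 3)   # X < 3
p_more_than_3 = sum(p for x, p in pmf.items() if x > 3)   # X > 3

print(p_at_least_3, p_at_most_3, p_less_than_3, p_more_than_3)  # 2/5 4/5 3/5 1/5
```

"At least three" and "less than three" are complements, so their probabilities add up to 1. The same holds for "at most three" and "more than three".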
Once these four conditions are satisfied, the following binomial question can be answered.
Binomial question
What is the probability that 𝑥 successes will occur in a randomly drawn sample of 𝑛 objects?
The $x$ values (called the domain) represent the number of success outcomes that can be
observed in a sample of $n$ objects. It is also important to note that the success outcome is
always associated with the probability $p$. Thus the outcome that must be labelled as the
success outcome is identified from the binomial question.
Example 19: A surgery has a success rate of 85%. Suppose that the surgery is performed on
three patients. What is the probability that the surgery is successful on exactly 2 patients?
Therefore:
$$P(2) = {}_3C_2\, p^2 q^{n-2} = {}_3C_2\, (0{,}85)^2 (0{,}15)^1 = 0{,}325$$
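The calculation in Example 19 follows the pattern $P(x) = {}_nC_x\, p^x q^{n-x}$ with $q = 1 - p$. A minimal Python sketch of that pattern is given below. The function name `binomial_pmf` is illustrative and does not refer to any library.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    q = 1 - p
    return math.comb(n, x) * p**x * q**(n - x)

# Example 19: n = 3 patients, p = 0.85, exactly x = 2 successful surgeries.
p_two = binomial_pmf(2, 3, 0.85)
print(round(p_two, 3))  # 0.325

# The probabilities over x = 0, 1, 2, 3 should add up to 1 (up to rounding).
total = sum(binomial_pmf(x, 3, 0.85) for x in range(4))
print(round(total, 6))  # 1.0
```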
A measure of central location and a measure of dispersion can be calculated for any random
variable that follows a binomial distribution using the following formulae:
𝑀𝑒𝑎𝑛: 𝜇 = 𝑛𝑝
𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛: 𝜎 = √𝑛𝑝(1 − 𝑝)
From Example 19, calculate the mean of 𝑋.
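The worked answer is not reproduced in these notes. A short Python sketch follows, applying the two formulae above to Example 19 (n = 3, p = 0,85). The standard deviation is not asked for in the question and is included only for completeness.

```python
import math

n = 3      # number of patients (trials)
p = 0.85   # success rate of the surgery

mean = n * p                            # mu = n * p
std_dev = math.sqrt(n * p * (1 - p))    # sigma = sqrt(n * p * (1 - p))

print(round(mean, 2))     # 2.55
print(round(std_dev, 3))  # 0.618
```

On average, about 2,55 of every three surgeries would be expected to succeed.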
For more information relating to binomial probability distributions, see Wegner (2016)
● The occurrence or non-occurrence of the event over any interval is independent of the
occurrence or non-occurrence over any other interval.
● The probability of the occurrence of an event is the same for any two intervals of the
same length.
● The occurrences are uniformly distributed throughout the interval of time
Some of the examples of a Poisson process (also see Wegner, 2016) are:
From these examples, it can be seen that the number of occurrences of a given outcome of
the random variable, 𝑥, can take on any integer value from 0 to infinity (i.e. 0 ≤ 𝑥 < ∞).
Poisson question
To answer the Poisson question, the following Poisson probability distribution formula can
be used:
$$P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!} \quad \text{for } x = 0, 1, 2, 3, \ldots$$
Where: $\lambda$ = the mean number of occurrences of a given outcome of the random variable
$x$ = number of occurrences of a given outcome for which a probability is required
Example 20: The number of typographical errors in a PhD thesis is Poisson distributed with a
mean of 1,9 per 100 pages. If 100 pages of the thesis are selected randomly, what is the
probability that there are no typographical errors?
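The worked solution is not reproduced in these notes. The sketch below applies the Poisson formula $P(x) = e^{-\lambda}\lambda^{x}/x!$ with $\lambda = 1{,}9$ and $x = 0$. The function name `poisson_pmf` is illustrative and does not refer to any library.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean number
    of occurrences per interval is lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Example 20: mean of 1.9 errors per 100 pages, no errors observed.
p_zero = poisson_pmf(0, 1.9)
print(round(p_zero, 4))  # 0.1496
```

With $x = 0$ the formula reduces to $e^{-\lambda}$, so the probability of finding no typographical errors in 100 pages is about 0,1496.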
A measure of central location and dispersion can be calculated for any random variable that
follows a Poisson process using the following formula:
𝑀𝑒𝑎𝑛: 𝜇 = 𝜆
𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛: 𝜎 = √𝜆
For more information about binomial and Poisson probability distributions, see this video
https://www.youtube.com/watch?v=BR1nN8DW2Vg
For more information relating to binomial probability distributions, see Wegner (2016:
141-150)
3. Unit Summary
https://www.youtube.com/watch?v=UnzbuqgU2LE&t=1368s
4. References
Anderson, D., Sweeney, D., & Williams, T. (2011). Statistics for Business and Economics.
South-Western, Cengage Learning.
Doane, D., & Seward, L. (2011). Applied Statistics in Business and Economics (3rd ed.).
McGraw-Hill.
Groebner, D., Shannon, P., Fry, P., & Smith, K. (2011). Business Statistics: A Decision-Making
Approach. Pearson Education Inc.
Levine, D., Krehbiel, T., & Berenson, M. (2009). Business Statistics: A First Course (5th ed.).
Prentice-Hall, Inc.
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications (4th ed.).
Juta & Company.