Introduction to Discrete Probability
• The theory of probability is concerned with the averages of mass phenomena occurring either sequentially
or simultaneously, e.g., electron emission, arrival of telephone calls, birth and death, system failure, etc.
• It has been observed that certain averages in these and other fields approach a constant value as the number
of observations increases, and this value remains the same if the averages are evaluated over any
subsequence specified before the experiment is performed.
➢ For instance, the percentage occurrence of “heads” approaches 0.5 or some other constant, and the same average is
observed if, let's say, every fourth toss is considered.
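The stabilizing of averages described above can be illustrated with a small simulation. This is only a sketch: the fixed seed, the number of tosses, and the helper name `relative_frequency` are assumptions made for illustration, and a pseudo-random generator stands in for a physical coin.

```python
import random

def relative_frequency(tosses, step=1):
    """Fraction of heads among every `step`-th toss (step=1 uses all tosses)."""
    picked = tosses[::step]
    return sum(picked) / len(picked)

rng = random.Random(0)  # fixed seed (assumed) so the run is repeatable
tosses = [rng.random() < 0.5 for _ in range(100_000)]  # True means heads

freq_all = relative_frequency(tosses)
freq_every_fourth = relative_frequency(tosses, step=4)  # subsequence fixed in advance
print(freq_all, freq_every_fourth)
```

Both printed values should lie close to 0.5, which is the behavior the bullet describes for a subsequence specified before the experiment is performed.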
Introduction To Discrete Probability (AI121 - Discrete Mathematics, Spring 2025) 3
Introduction To Probability
Relative Frequency Interpretation
• The probability of an event $A$ is a number $P(A)$, assigned to the event $A$, and can be interpreted as follows:
➢ If the experiment is performed $n$ times and the event $A$ occurs $n_A$ times, then with a high degree of
certainty, the relative frequency, $n_A/n$, of occurrence of $A$ is close to $P(A)$, i.e.,
$$P(A) \approx \frac{n_A}{n} \qquad (1)$$
• The lack of precision in the above definition, due to phrases such as “with a high degree of certainty” and “close to”, cannot be avoided, which is why the theory
of probability, like any physical theory, is related to physical phenomena in inexact terms.
• In applying the theory of probability to real-world problems, the following steps must be clearly distinguished:
i. Determination of the probabilities of certain events by an inexact process.
❖ This process could be based on the relationship between probability and observation, as in (1).
❖ Or it could be based on some “reasoning” making use of certain symmetries, i.e., on the classical definition of probability.
Classical Definition of Probability
• If there are $N_A$ outcomes favorable to the event $A$, out of a total of $N$ “equally likely”
outcomes, then
$$P(A) = \frac{N_A}{N}$$
According to the classical definition, the probability of an event is determined a priori, i.e., without actual experimentation.
because the probability of each face is $1/6$ and the axioms tell us to add the probabilities.
• For example, if a fair die is rolled $n$ times, we predict that an event $A$ will show about $n\,P(A)$ times; this is an application of (1) in reverse.
Deductive reasoning: the act of drawing specific conclusions from general information, as opposed to inductive reasoning, in which conclusions are drawn by going from specific observations to general principles.
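The classical definition and the use of (1) in reverse can be sketched as follows. The event (an even face value), the number of rolls `n = 600`, and the helper name `classical_probability` are illustrative assumptions, not values taken from the slides.

```python
from fractions import Fraction

def classical_probability(outcomes, event):
    """Favorable outcomes divided by total outcomes, all assumed equally likely."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

die = range(1, 7)  # faces of a fair die
p_even = classical_probability(die, lambda face: face % 2 == 0)

n = 600  # hypothetical number of rolls
expected_even = n * p_even  # (1) in reverse: predicted count is about n * P(A)
print(p_even, expected_even)
```

Exact fractions are used so that the a priori value is not blurred by floating-point rounding.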
• Furthermore:
➢ Union of two events $A$ and $B$, denoted by $A \cup B$, is the event that occurs when $A$ or $B$ or both occur.
➢ Intersection of two events $A$ and $B$, denoted by $A \cap B$, is the event that occurs when both $A$ and $B$ occur.
Axioms of Probability
• The modern framework of probability is defined by the following axioms:
i. $P(A) \geq 0$,
ii. $P(S) = 1$, where $S$ is the probability space (the certain event),
iii. $P(A \cup B) = P(A) + P(B)$, if $A$ and $B$ are mutually exclusive.
Two events are called mutually exclusive if the occurrence of one of them excludes the occurrence of the other.
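As a minimal sketch, the three axioms can be checked on a finite probability space. The fair-die assignment, the events `A` and `B`, and the helper `make_probability` are assumptions chosen for illustration.

```python
from fractions import Fraction

def make_probability(pmf):
    """Return a function P(event) for events given as sets of outcomes."""
    def P(event):
        return sum((pmf[o] for o in event), Fraction(0))
    return P

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
P = make_probability(fair_die)
S = set(fair_die)  # the certain event

A, B = {1, 2}, {5, 6}  # mutually exclusive, since they share no outcome
axiom_1 = P(A) >= 0
axiom_2 = P(S) == 1
axiom_3 = not (A & B) and P(A | B) == P(A) + P(B)
print(axiom_1, axiom_2, axiom_3)
```

Here probabilities of events are obtained by adding the probabilities of their elementary outcomes, so the third check holds by construction; the sketch only makes the bookkeeping explicit.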
Axiomatic Definitions – Borel Fields
• Instead of considering all possible subsets of the probability space as events, we consider only those that
form a Borel field.
The main reason for not considering all subsets is that it becomes impossible to assign probabilities to all subsets satisfying the axioms when the experiment has infinitely many outcomes.
Field
• A field $\mathcal{F}$ is a non-empty class of sets such that:
i. if $A \in \mathcal{F}$, then $\bar{A} \in \mathcal{F}$,
ii. if $A \in \mathcal{F}$ and $B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$.
As an example of an experiment with infinitely many outcomes, consider the roll of a die that lands on a point on the table which is specified by the coordinates $(x, y)$. Then, the outcome is not only the face value but also one of the infinitely many
possible ordered pairs of the form $(x, y)$.
iii. The probabilities of these events, and the probabilities of other events given by
axiom III and its generalization and extension, to be given later.
Probabilities of Events
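• Repeated application of axiom III allows us to write
$$P(A \cup B \cup C) = P(A) + P(B) + P(C)$$
if the events $A$, $B$, $C$ are all (pairwise) mutually exclusive, because that renders the events $A \cup B$ and $C$
mutually exclusive, i.e.,
$$P\big((A \cup B) \cup C\big) = P(A \cup B) + P(C) = P(A) + P(B) + P(C)$$
• Extending the above for a finite sequence of mutually exclusive events $A_1, \dots, A_n$, we get the axiom IIIa as
$$P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n) \qquad (4)$$
• Generalizing (4), for an infinite sequence of mutually
exclusive events $A_1, A_2, \dots$,
$$P(A_1 \cup A_2 \cup \dots) = P(A_1) + P(A_2) + \dots$$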
• Similarly,
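• Let us now consider that the die is loaded in such a way that one face value appears twice as often as each of the other
face values, while the other face values are equally likely. Then, denoting the probability of each of the other five face values by $p$, the favored face value has probability $2p$, and since the probabilities must sum to one,
$$2p + 5p = 1 \quad\Longrightarrow\quad p = \frac{1}{7}$$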
• Now,
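The loaded-die assignment can be sketched numerically. Which face is favored is not specified here, so `favored_face=3` is a purely hypothetical choice, as is the event "odd face value" used to illustrate adding elementary probabilities.

```python
from fractions import Fraction

def loaded_die(favored_face, faces=range(1, 7)):
    """PMF in which `favored_face` is twice as likely as each other face."""
    weights = {f: (2 if f == favored_face else 1) for f in faces}
    total = sum(weights.values())  # 2 + 5 = 7 for a six-sided die
    return {f: Fraction(w, total) for f, w in weights.items()}

pmf = loaded_die(favored_face=3)  # face 3 is an assumed, illustrative choice
p_odd = sum(pmf[f] for f in (1, 3, 5))  # axiom III applied to elementary events
print(pmf[3], pmf[1], p_odd)
```

With this hypothetical choice, the probability of an odd face value comes out as 4/7 rather than the 1/2 of a fair die.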
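• A trivial partition of the probability space is given by an event and its complement, i.e., given a probability
space $S$ and an event $A$, we have
$$A \cup \bar{A} = S, \quad A \cap \bar{A} = \emptyset \quad\Longrightarrow\quad P(A) + P(\bar{A}) = 1$$
• Finally, for two events $A$ and $B$, not necessarily mutually exclusive, we note that
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$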
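The complement rule and the rule for two events that are not mutually exclusive can be checked on a fair die. The events `A` (even face value) and `B` (face value greater than three) are assumed examples, and equally likely faces are assumed throughout.

```python
from fractions import Fraction

S = set(range(1, 7))  # fair die, faces assumed equally likely

def P(event):
    """Classical probability of an event given as a set of faces."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # even face value
B = {4, 5, 6}  # face value greater than three

complement_ok = P(S - A) == 1 - P(A)
union_ok = P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B), complement_ok, union_ok)
```

Subtracting the probability of the intersection corrects for the faces 4 and 6, which would otherwise be counted twice.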
• As an illustration of probabilistic reasoning regarding such problems, let us discuss a famous problem
originating with the 1960s television game show “Let’s Make a Deal”, and named after its host, Monty Hall.
• Suppose that you are a game show contestant, where you have a chance to win a large prize, and are asked to
select one of three doors to open; the large prize, such as a sports car, is behind one of the three doors and
the other two doors contain, let’s say, goats.
• Once you select a door, the game show host, who knows what is behind each door, does the following:
➢ First, whether or not you selected the winning door, he opens one of the other two doors that he knows
is a losing door (selecting at random if both are losing doors).
➢ Then he asks you whether you would like to switch doors.
• Should you change doors or keep your original selection, or does it not matter?
Probabilistic Reasoning – The Monty Hall Three-Door Puzzle
• Probabilistic reasoning for the Monty Hall Three-Door Puzzle:
• The probability of selecting the correct door (before the host opens a door and asks you whether you want to
switch the doors) is 1/3, because the three doors are equally likely to be the correct door.
• Now, when the game show host opens one of the other doors, he will always open the door that the car is
not behind.
➢ As a result, the probability that the initially selected door is correct does not change once the game show host opens
one of the other doors.
➢ This means that the probability that you selected incorrectly, which is the probability that the car is behind one of
the two doors you did not select, which equals 2/3, does not change after the game show host opens one of the
other doors.
• So, if you selected incorrectly, when the game show host opens a door to show you that the car is not behind
it, the car is behind the other door.
➢ In this case, you will always win if your initial choice was incorrect and you change doors.
❖ So, by changing doors, the probability you win is 2/3.
• In other words, you should always change doors when given the chance to do so by the game show host.
➢ This doubles the probability that you will win.
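The reasoning above can be cross-checked by enumerating every scenario exactly. This is a sketch under the stated rules: the car is equally likely to be behind each door, and the host opens a losing door, choosing at random when two are available. The function name `win_probabilities` is an illustrative assumption.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def win_probabilities(choice=1):
    """Exact P(win) when staying and when switching, for a given first choice."""
    stay = switch = Fraction(0)
    for car in DOORS:  # the car is behind each door with probability 1/3
        openable = [d for d in DOORS if d != choice and d != car]
        for opened in openable:  # the host picks at random among losing doors
            weight = Fraction(1, 3) / len(openable)
            switched_to = next(d for d in DOORS if d not in (choice, opened))
            stay += weight * (choice == car)
            switch += weight * (switched_to == car)
    return stay, switch

stay, switch = win_probabilities()
print(stay, switch)  # 1/3 2/3
```

The enumeration gives the same answer for any initial choice, since the three doors play symmetric roles.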
Figure 3: Assuming that the contestant chooses door 1, the contestant wins in two out of three equally likely scenarios upon
switching, and wins in only one out of three scenarios upon staying with the initial choice of door 1. Hence, the probability of
winning the car upon switching has doubled to 2/3. The brown colored cell indicates the door the game show host opens for the
contestant. If the initially chosen door has the car behind it, the game show host is at liberty to choose either of the other two doors,
because both conceal goats.
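• The above queries are answered by defining a new law of probability, called conditional probability, defined
as
$$P(A\,|\,M) = \frac{P(A \cap M)}{P(M)}, \quad P(M) > 0$$
where the event $(A\,|\,M)$ is interpreted as the occurrence of $A$ given that $M$ has occurred.
➢ This new construct can be proved to be in harmony with the axioms of probability and hence is a
probability of a new kind:
i. Axiom I: $P(A\,|\,M) \geq 0$, because $P(A \cap M) \geq 0$ and $P(M) > 0$.
ii. Axiom II: $P(S\,|\,M) = P(S \cap M)/P(M) = P(M)/P(M) = 1$.
iii. Axiom III: if $A$ and $B$ are mutually exclusive, then so are $A \cap M$ and $B \cap M$, hence $P(A \cup B\,|\,M) = P(A\,|\,M) + P(B\,|\,M)$.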
• Now, if the event $M$ occurs, then the probability of two showing on the face increases
from $P(\{2\})$ to the following:
$$P(\{2\}\,|\,M) = \frac{P(\{2\} \cap M)}{P(M)}$$
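• As another example, consider a 4-bit string generated at random, such that all 16 4-bit strings are equally likely.
➢ Then, what is the probability that the 4-bit string contains at least two consecutive 0s, given that its first bit
is a 0 (assuming that 0 bits and 1 bits are equally likely)?
• The experiment is given by:
$$S = \{0000, 0001, 0010, \dots, 1111\}, \quad |S| = 16$$
• The events of interest, and the conditional probability of the event in question, are:
$$A = \{\text{strings with at least two consecutive 0s}\}, \quad M = \{\text{strings whose first bit is 0}\}, \quad P(A\,|\,M) = \frac{P(A \cap M)}{P(M)}$$
• Then:
$$A \cap M = \{0000, 0001, 0010, 0011, 0100\}, \quad P(A \cap M) = \frac{5}{16}, \quad P(M) = \frac{8}{16}, \quad P(A\,|\,M) = \frac{5/16}{8/16} = \frac{5}{8}$$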
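The 4-bit string example can be verified by listing all sixteen strings. The names `A` (at least two consecutive 0s) and `M` (first bit is 0) are labels assumed for this sketch.

```python
from fractions import Fraction
from itertools import product

S = ["".join(bits) for bits in product("01", repeat=4)]  # all 16 strings

def P(event):
    """Classical probability, all 16 strings assumed equally likely."""
    return Fraction(len(event), len(S))

A = {s for s in S if "00" in s}    # at least two consecutive 0s
M = {s for s in S if s[0] == "0"}  # first bit is 0

p_A_given_M = P(A & M) / P(M)
print(sorted(A & M), p_A_given_M)
```

Five of the eight strings that start with 0 contain two consecutive 0s, so the conditional probability is 5/8.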
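• For events $A_1, \dots, A_n$ forming a partition of $S$, and an arbitrary event $B$,
$$P(B) = P(B \cap A_1) + \dots + P(B \cap A_n) = P(B\,|\,A_1)P(A_1) + \dots + P(B\,|\,A_n)P(A_n)$$
which is called the total probability theorem; the sum over the mutually exclusive events $B \cap A_i$ is obtained by using axiom III.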
• Moreover,
• As a result,
and
which can also be seen as a consequence of the definitions in (12) and (9).
• Then,
i. represents ,
ii. is an empty set,
iii. represents ,
iv. represents ,
v. represents .
Random Variable
A random variable $\mathbf{x}$ is defined as a process of assigning a number $\mathbf{x}(\zeta)$ to every outcome $\zeta$ of an experiment, in such a way that the resulting
function satisfies the following two conditions:
i. The set $\{\mathbf{x} \leq x\}$ is an event for every $x$.
ii. $P\{\mathbf{x} = +\infty\} = P\{\mathbf{x} = -\infty\} = 0$.
Cumulative Distribution Function (CDF)
• Because the membership of the subset $\{\mathbf{x} \leq x\}$ is a function of the real number $x$, so is its probability, i.e.,
$P\{\mathbf{x} \leq x\}$.
Definition
The probability of the event $\{\mathbf{x} \leq x\}$ depends on
the number $x$ and is called the Cumulative Distribution
Function (CDF) of the random variable $\mathbf{x}$, denoted by $F_{\mathbf{x}}(x)$,
i.e.,
$$F_{\mathbf{x}}(x) = P\{\mathbf{x} \leq x\}$$
• Then, the CDF of the random variable $\mathbf{x}$ is shown in Figure 6.
Figure 6: CDF of the random variable $\mathbf{x}$, defined for the roll-of-a-die experiment.
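A staircase CDF of this kind can be sketched for the roll-of-a-die experiment. The mapping used here, in which the random variable equals the face value of a fair die, is an assumption; the random variable plotted in the figure may be defined differently.

```python
from fractions import Fraction

def die_cdf(x):
    """P{X <= x} for X equal to the face value of a fair die (assumed mapping)."""
    faces_at_most_x = [face for face in range(1, 7) if face <= x]
    return Fraction(len(faces_at_most_x), 6)

for x in (0.5, 1, 3.5, 6, 10):
    print(x, die_cdf(x))
```

The function is flat between face values and jumps by 1/6 at each of them, rising from 0 to 1.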
Properties of CDF
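• Using the notation
$$F_{\mathbf{x}}(x^+) = \lim_{\epsilon \to 0} F_{\mathbf{x}}(x + \epsilon), \quad F_{\mathbf{x}}(x^-) = \lim_{\epsilon \to 0} F_{\mathbf{x}}(x - \epsilon), \quad \epsilon > 0$$
1. $F_{\mathbf{x}}(+\infty) = 1$, and $F_{\mathbf{x}}(-\infty) = 0$.
Proof:
$$F_{\mathbf{x}}(+\infty) = P\{\mathbf{x} \leq +\infty\} = P(S) = 1, \quad F_{\mathbf{x}}(-\infty) = P\{\mathbf{x} = -\infty\} = 0$$
Consider a decreasing sequence of events, given by , where and is small. Then,
PDF is non-negative because CDF is monotonically non-decreasing.
• The above expression can be inverted to express the CDF in terms of the PDF $f_{\mathbf{x}}(x)$, i.e.,
$$F_{\mathbf{x}}(x) = \int_{-\infty}^{x} f_{\mathbf{x}}(u)\,du$$
$\delta(x)$ is also called the Dirac delta function, characterized by the following sifting theorem
$$\int_{-\infty}^{\infty} \phi(x)\,\delta(x - x_0)\,dx = \phi(x_0)$$
Figure 7: CDF of a random variable $\mathbf{x}$,
defined for the roll-of-a-die experiment.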
Figure 8: Dirac delta function.
Types Of Random Variables
• A random variable is called continuous if its CDF is a continuous function; its PDF then contains no impulses.
• A random variable is called mixed if its CDF is neither continuous nor discrete,
but a mixture of both.
➢ The resulting PDF is continuous with impulses at points of discontinuity in the CDF.
Discrete Random Variable – Uniform Distribution
• The simplest discrete random variable is given by the following PDF/PMF
$$P\{\mathbf{x} = x_i\} = \frac{1}{N}, \quad i = 1, 2, \dots, N$$
• A discrete uniform random variable models the probability of equally likely outcomes of an experiment.
• It must be noted that the experiment does not actually have to have only two outcomes for it to be specified
by Bernoulli trials; the event success is the occurrence of any chosen event, and failure is then the complement of
success.
➢ For instance, the roll-of-a-die experiment has six outcomes; however, with respect to an event $A$,
an instance of the experiment is a Bernoulli trial, in the sense that either $A$ occurs or it doesn't.
• A random variable mapping the outcomes of a Bernoulli trial on the real line is called a Bernoulli random
variable.
➢ Representing the probability of success with $p$ and that of failure with $q = 1 - p$, we observe that the PMF of a Bernoulli
random variable $\mathbf{x}$, taking the value $x_s$ on success and $x_f$ on failure, is given by
$$P\{\mathbf{x} = x_s\} = p, \quad P\{\mathbf{x} = x_f\} = q$$
➢ Oftentimes, binary values are used, i.e., $x_s = 1$ and $x_f = 0$.
Figure 12: PDF/PMF (blue) and CDF (maroon) of a Bernoulli random variable $\mathbf{x}$.
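A Bernoulli random variable with binary values can be sketched as below. The success probability `p = 1/3` and the function names are hypothetical, chosen only to make the PMF and the staircase CDF concrete.

```python
from fractions import Fraction

def bernoulli_pmf(k, p):
    """P{X = k} with success mapped to 1 and failure mapped to 0."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0

def bernoulli_cdf(x, p):
    """Staircase CDF with a jump of q = 1 - p at 0 and a jump of p at 1."""
    return sum(bernoulli_pmf(k, p) for k in (0, 1) if k <= x)

p = Fraction(1, 3)  # hypothetical success probability
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), bernoulli_cdf(0.5, p))
```

Between 0 and 1 the CDF stays at q, and it reaches 1 once both possible values have been passed.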
Combined Experiments
• So far, we have considered experiments of a single nature at a time, e.g., rolling of a die or tossing of a coin.
• However, if experiments of different kinds are performed simultaneously, these can be considered as a single
trial of a combined experiment.
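• As an example, consider rolling a fair die and tossing a fair coin at the same time, with the probability spaces
given by
$$S_1 = \{f_1, f_2, \dots, f_6\}, \quad S_2 = \{h, t\}$$
• Then, under the reasonable assumption that the outcomes of the two experiments are independent of each
other, we conclude that:
$$P\{\text{two on the die and heads on the coin}\} = \frac{1}{6} \cdot \frac{1}{2} = \frac{1}{12}$$
• However, the idea of independence used to reach the above conclusion does not agree with the definition of
independence of two events, as the events must belong to the same probability space.
• In the combined probability space $S = S_1 \times S_2$, whose twelve outcomes are the ordered pairs $f_i h$ and $f_i t$, the events “two” and “heads” are not elementary, but rather subsets of $S$ with two and six elements
respectively, i.e.,
$$\{\text{two}\} = \{f_2 h, f_2 t\}, \quad \{\text{heads}\} = \{f_1 h, f_2 h, \dots, f_6 h\}$$
• Hence,
$$P(\{\text{two}\} \cap \{\text{heads}\}) = P\{f_2 h\} = \frac{1}{12} = P\{\text{two}\}\,P\{\text{heads}\}$$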
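The combined die-and-coin experiment can be sketched as a Cartesian product of the two individual spaces. Treating the twelve ordered pairs as equally likely is an assumption that encodes the independence of a fair die and a fair coin; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["h", "t"]
S = set(product(die, coin))  # 12 ordered pairs, assumed equally likely

def P(event):
    """Classical probability on the combined space."""
    return Fraction(len(event), len(S))

two = {(f, c) for (f, c) in S if f == 2}      # two elements
heads = {(f, c) for (f, c) in S if c == "h"}  # six elements

print(P(two & heads), P(two) * P(heads))
```

Both events now live in the same probability space, so the product rule can be checked against the definition of independence of two events.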
➢ By virtue of the trials being independent, occurrence of the event success is independent in all trials, i.e., the
probability of success in a given trial does not depend on the outcome of any other trial.
❖ As an example, consider a sequence of independent tosses of a fair coin, in which the event success may be defined as the occurrence of heads.
❖ Then, the probability of success in a given toss is independent of the outcome of any other toss, and is given by $p = 1/2$.
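• For instance, in a sequence of $n$ independent tosses of a fair coin, the event
$$\{k \text{ heads in } n \text{ tosses}\}$$
can occur in $\binom{n}{k}$ mutually exclusive ways, one for each arrangement of the $k$ heads among the $n$ tosses,
where, using independence of trials, we note (by extending (14)) that each such arrangement has probability
$$p^k q^{n-k}$$
Arranging $k$ successes in $n$ independent trials is equivalent to selecting $k$ objects out of $n$ without paying attention to the order!
For exactly $k$ successes in $n$ trials, there have to be $n-k$ failures!
• Therefore,
$$P\{k \text{ successes in } n \text{ trials}\} = \binom{n}{k} p^k q^{n-k} \qquad (15)$$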
• A random variable $\mathbf{x}$ that gives the number of successes in a sequence of $n$ independent Bernoulli trials is
called a binomial random variable, whose PMF is given by (15), i.e.,
$$P\{\mathbf{x} = k\} = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \dots, n$$
Binomial distribution refers to the staircase function in which the jump discontinuities at $x = k$ are given by (15).
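The binomial PMF can be evaluated directly from the counting argument. The values `n = 3` and `p = 1/2` (three tosses of a fair coin) are assumed for illustration only.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P{k successes in n independent Bernoulli trials with success probability p}."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 3, Fraction(1, 2)  # three tosses of a fair coin (assumed example)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(pmf, sum(pmf))
```

The factor `comb(n, k)` counts the mutually exclusive arrangements, and the remaining factors give the probability of any single arrangement.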
• A random variable $\mathbf{x}$ that quantifies the number of trials needed to achieve the first success is called a geometric
random variable, whose probability mass function is given by
$$P\{\mathbf{x} = k\} = q^{k-1} p, \quad k = 1, 2, \dots$$
where $q = 1 - p$.
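A geometric random variable can be sketched in the same way: the first success on trial k requires k - 1 failures followed by one success. The success probability `p = 1/6` (waiting for a given face of a fair die) and the cutoff of 50 trials are assumptions for illustration.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P{first success occurs on trial k}, for k = 1, 2, ..."""
    return (1 - p)**(k - 1) * p

p = Fraction(1, 6)  # assumed: waiting for a given face of a fair die
partial = sum(geometric_pmf(k, p) for k in range(1, 51))
print(geometric_pmf(1, p), geometric_pmf(2, p), float(partial))
```

The partial sum over the first 50 trials equals one minus the probability of 50 consecutive failures, and it approaches 1 as more trials are included.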