PROBABILITY THEORY
LECTURE NOTES
[2017]
Table of Contents
SYLLABUS
CHAPTER 1
1.1 INTRODUCTION
1.2 BASIC PRINCIPLE OF COUNTING
1.3 PERMUTATIONS
1.4 COMBINATIONS
1.5 MULTINOMIAL COEFFICIENTS
SUMMARY
CHAPTER 2
2.1 SAMPLE SPACE AND EVENTS
2.2 AXIOMS OF PROBABILITY
2.3 SOME SIMPLE PROPOSITIONS
2.4 SAMPLE SPACES HAVING EQUALLY LIKELY OUTCOMES
SUMMARY
CHAPTER 3: CONDITIONAL PROBABILITY AND INDEPENDENCE
3.1 CONDITIONAL PROBABILITIES
3.2 BAYES' FORMULA
3.3 INDEPENDENT EVENTS
3.4 P(·|F) IS A PROBABILITY
SUMMARY
CHAPTER 4: RANDOM VARIABLES
4.1 RANDOM VARIABLES
4.2 DISTRIBUTION FUNCTIONS
4.3 DISCRETE RANDOM VARIABLES
4.4 EXPECTED VALUE
4.5 EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE
4.6 VARIANCE
4.7 THE BERNOULLI AND BINOMIAL RANDOM VARIABLES
4.7.1 Properties of Binomial Random Variables
4.8 THE POISSON RANDOM VARIABLE
SUMMARY
CHAPTER 5: CONTINUOUS RANDOM VARIABLES
5.1 INTRODUCTION
SYLLABUS
Midterms: 2 × 30%
Total: 100%
Rules:
Entrance to the classroom is permitted only every 15 minutes starting from 09:00, e.g., at 09:00, 09:15, 09:30, etc.
Textbooks:
1. Sheldon Ross, A First Course in Probability, 8th ed., Prentice Hall, 2010, USA.
2. Dimitri P. Bertsekas, Jon N. Tsitsiklis, Introduction to Probability, Athena Scientific, USA,
2000.
3. Hwei Hsu, Probability, Random Variables and Random Processes, Schaum’s Outline Series,
McGraw Hill, 1997.
4. Salih Çelebioğlu and Reşat Kasap (trans.), Olasılık ve İstatistiğe Giriş: Mühendisler ve Fenciler için [Introduction to Probability and Statistics for Engineers and Scientists], translated from the 4th edition, Nobel Akademik Yayıncılık, 2015 (Turkish translation of item 1).
5. A. Papoulis, S. U. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, USA.
Subjects:
CHAPTER 1
1.1 INTRODUCTION
Some motivating examples from engineering:
Communications: during wireless or wired communications, signals encounter noise, which has specific probabilistic features.
Radars and electromagnetics: the same kind of noise arises there.
Electronics: thermal noise, as well as noise from other sources, is probabilistic.
Combinations of these: for example, lifetime estimation of specific electronic components.
Probability deals with unpredictability and randomness, and probability theory is the branch
of mathematics that is concerned with the study of random phenomena. A random
phenomenon is one that, under repeated observation, yields different outcomes that are not
deterministically predictable. Examples of these random phenomena include the number of
electronic mail (e-mail) messages received by all employees of a company in one day, the
number of phone calls arriving at the university’s switchboard over a given period, the
number of components of a system that fail within a given interval, the number of bits correctly
received through the Internet, and the number of A's that a student can receive in one
academic year.
1.2 BASIC PRINCIPLE OF COUNTING
The basic principle of counting states that if one experiment can result in any of m possible outcomes and if another experiment can result in any of n possible outcomes, then there are mn possible outcomes of the two experiments.
More precisely: suppose that two experiments are to be performed. If experiment 1 can result in any one of m possible outcomes and if, for each outcome of experiment 1, there are n possible outcomes of experiment 2, then together there are mn possible outcomes of the two experiments.
Example 1.1: A small community consists of 10 women, each of whom has 3 children. If one
woman and one of her children are to be chosen as mother and child of the year, how many
different choices are possible?
Solution: We see from the basic principle that there are 10 × 3 = 30 possible choices.
The generalized basic principle of counting: if r experiments that are to be performed are such that the first one may result in any of n_1 possible outcomes; and if, for each of these n_1 possible outcomes, there are n_2 possible outcomes of the second experiment; and if, for each of the possible outcomes of the first two experiments, there are n_3 possible outcomes of the third experiment; and if ..., then there is a total of n_1 · n_2 ⋯ n_r possible outcomes of the r experiments.
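For small cases, the principle can be checked by direct enumeration. The following is a minimal Python sketch; the three outcome sets are made-up placeholders:

```python
# Brute-force check of the generalized counting principle.
from itertools import product

# Three "experiments" with n1 = 2, n2 = 3, n3 = 4 possible outcomes each.
outcomes1 = ["a", "b"]
outcomes2 = [1, 2, 3]
outcomes3 = ["w", "x", "y", "z"]

combined = list(product(outcomes1, outcomes2, outcomes3))
print(len(combined))                                      # 24
print(len(outcomes1) * len(outcomes2) * len(outcomes3))   # 24, as predicted
```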
1.3 PERMUTATIONS
How many different ordered arrangements of the letters a, b, and c are possible? By direct
enumeration we see that there are 6, namely, abc, acb, bac, bca, cab, and cba. Each
arrangement is known as a permutation.
Suppose now that we have n objects. Reasoning similar to that we have just used for the 3
letters then shows that there are
n(n − 1)(n − 2) ⋯ 3 · 2 · 1 = n!
different permutations of the n objects.
Example 1.3: A class in probability theory consists of 6 men and 4 women. An examination is given, and the students are ranked according to their performance. Assume that no two students obtain the same score.
(a) How many different rankings are possible?
(b) If the men are ranked just among themselves and the women just among themselves, how many different rankings are possible?
Solution. (a) Because each ranking corresponds to a particular ordered arrangement of the 10
people, the answer to this part is 𝟏𝟎! = 𝟑, 𝟔𝟐𝟖, 𝟖𝟎𝟎.
(b) Since there are 𝟔! possible rankings of the men among themselves and 𝟒! possible
rankings of the women among themselves, it follows from the basic principle that
there are (𝟔!)(𝟒!) = (𝟕𝟐𝟎)(𝟐𝟒) = 𝟏𝟕, 𝟐𝟖𝟎 possible rankings in this case.
Example 1.4: Ms. Jones has 10 books that she is going to put on her bookshelf. Of these, 4
are mathematics books, 3 are chemistry books, 2 are history books, and 1 is a language book.
Ms. Jones wants to arrange her books so that all the books dealing with the same subject are
together on the shelf. How many different arrangements are possible?
Solution. There are 𝟒! 𝟑! 𝟐! 𝟏! arrangements such that the mathematics books are first in
line, then the chemistry books, then the history books, and then the language book. Similarly,
for each possible ordering of the subjects, there are 𝟒! 𝟑! 𝟐! 𝟏! possible arrangements.
Hence, as there are 4! possible orderings of the subjects, the desired answer is
𝟒! 𝟒! 𝟑! 𝟐! 𝟏! = 𝟔𝟗𝟏𝟐.
Example 1.5: A chess tournament has 10 competitors, of which 4 are Russian, 3 are from the
United States, 2 are from Great Britain, and 1 is from Brazil. If the tournament result lists just
the nationalities of the players in the order in which they placed, how many outcomes are
possible?
Solution. There are

\frac{10!}{4!\,3!\,2!\,1!} = 12{,}600

possible outcomes.
1.4 COMBINATIONS
We are often interested in determining the number of different groups of r objects that could be formed from a total of n objects. For instance, how many different groups of 3 could be selected from the 5 items A, B, C, D,
and E? To answer this question, reason as follows: Since there are 5 ways to select the initial
item, 4 ways to then select the next item, and 3 ways to select the final item, there are thus 5 ·
4 · 3 ways of selecting the group of 3 when the order in which the items are selected is
relevant. However, since every group of 3 (say, the group consisting of items A, B, and C) will be counted 6 times (that is, all of the permutations ABC, ACB, BAC, BCA, CAB, and CBA
will be counted when the order of selection is relevant), it follows that the total number of
groups that can be formed is
𝟓 · 𝟒 · 𝟑/(𝟑 · 𝟐 · 𝟏) = 𝟏𝟎
In general, as n(n − 1) ⋯ (n − r + 1) counts the ways of selecting r items from n when the order of selection is relevant, and as each group of r items is counted r! times in this count, the number of different groups of r items that can be formed from a set of n items is

\binom{n}{r} = \frac{n(n-1)\cdots(n-r+1)}{r!} = \frac{n!}{(n-r)!\,r!}
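The identity can be verified numerically. A short Python sketch (math.comb assumes Python 3.8 or later):

```python
# Check C(n, r) = n! / ((n - r)! r!) against direct enumeration.
from itertools import combinations
from math import comb, factorial

n, r = 5, 3
by_formula = factorial(n) // (factorial(n - r) * factorial(r))
by_library = comb(n, r)
by_enumeration = len(list(combinations("ABCDE", r)))
print(by_formula, by_library, by_enumeration)  # 10 10 10
```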
Example 1.6: A committee of 3 is to be formed from a group of 20 people. How many different committees are possible?
Solution. There are

\binom{20}{3} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140

possible committees.
Example 1.7: From a group of 5 women and 7 men, how many different committees
consisting of 2 women and 3 men can be formed? What if 2 of the men are feuding and refuse
to serve on the committee together?
Solution. As there are \binom{5}{2} possible groups of 2 women and \binom{7}{3} possible groups of 3 men, it follows from the basic principle that there are

\binom{5}{2}\binom{7}{3} = 10 \cdot 35 = 350

possible committees consisting of 2 women and 3 men.
On the other hand, if 2 of the men refuse to serve on the committee together, then, because there are \binom{2}{0}\binom{5}{3} possible groups of 3 men not containing either of the feuding men and \binom{2}{1}\binom{5}{2} groups of 3 men containing exactly one of the feuding men, there are

\binom{2}{0}\binom{5}{3} + \binom{2}{1}\binom{5}{2} = 10 + 20 = 30

groups of 3 men not containing both of the feuding men. Since there are \binom{5}{2} = 10 ways to choose the 2 women, it follows that in this case there are 30 · 10 = 300 possible committees.
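A brute-force count confirms both answers. In the sketch below, the member names and the choice of which two men feud are arbitrary assumptions for illustration:

```python
# Exhaustive check of Example 1.7.
from itertools import combinations

women = ["W1", "W2", "W3", "W4", "W5"]
men = ["M1", "M2", "M3", "M4", "M5", "M6", "M7"]
feuding = {"M1", "M2"}  # assumed feuding pair

total = sum(1 for w in combinations(women, 2) for m in combinations(men, 3))
ok = sum(1 for w in combinations(women, 2)
         for m in combinations(men, 3)
         if not feuding.issubset(m))
print(total, ok)  # 350 300
```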
1.5 MULTINOMIAL COEFFICIENTS
In this section, we consider the following problem: A set of n distinct items is to be divided into r distinct groups of respective sizes n_1, n_2, ..., n_r, where \sum_{i=1}^{r} n_i = n. How many different divisions are possible? To answer this question, we note that there are \binom{n}{n_1} possible choices for the first group; for each choice of the first group, there are \binom{n - n_1}{n_2} possible choices for the second group; for each choice of the first two groups, there are \binom{n - n_1 - n_2}{n_3} possible choices for the third group; and so on. It then follows from the generalized version of the basic counting principle that there are

\binom{n}{n_1}\binom{n - n_1}{n_2}\cdots\binom{n - n_1 - \cdots - n_{r-1}}{n_r} = \frac{n!}{(n - n_1)!\,n_1!}\cdot\frac{(n - n_1)!}{(n - n_1 - n_2)!\,n_2!}\cdots\frac{(n - n_1 - \cdots - n_{r-1})!}{0!\,n_r!} = \frac{n!}{n_1!\,n_2!\cdots n_r!}

possible divisions.
Notation
If n_1 + n_2 + ⋯ + n_r = n, we define \binom{n}{n_1, n_2, \ldots, n_r} by

\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\,n_2!\cdots n_r!}

Thus, \binom{n}{n_1, n_2, \ldots, n_r} represents the number of possible divisions of n distinct objects into r distinct groups of respective sizes n_1, n_2, ..., n_r.
Example 1.8: A police department in a small city consists of 10 officers. If the department
policy is to have 5 of the officers patrolling the streets, 2 of the officers working full time at
the station, and 3 of the officers on reserve at the station, how many different divisions of the
10 officers into the 3 groups are possible?
Solution. There are

\binom{10}{5, 2, 3} = \frac{10!}{5!\,2!\,3!} = 2520

possible divisions.
Example 1.9: Ten children are to be divided into an A team and a B team of 5 each. The A
team will play in one league and the B team in another. How many different divisions are
possible?
Solution. Since each division corresponds to a choice of which 5 of the 10 children form the A team, there are

\binom{10}{5, 5} = \frac{10!}{5!\,5!} = 252

possible divisions.
Example 1.10: In order to play a game, 10 children divide themselves into two teams of 5 each. How many different divisions are possible?
Solution. Note that this example is different from Example 1.9 because now the order of the
two teams is irrelevant. That is, there is no A and B team, but just a division consisting of 2
groups of 5 each. Hence, the desired answer is 𝟏𝟎!/((𝟓! 𝟓!) 𝟐!) = 𝟏𝟐𝟔
The multinomial theorem states that

(x_1 + x_2 + \cdots + x_r)^n = \sum \binom{n}{n_1, n_2, \ldots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}

That is, the sum is over all nonnegative integer-valued vectors (n_1, n_2, ..., n_r) such that n_1 + n_2 + ⋯ + n_r = n.
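The multinomial coefficient is easy to compute directly from factorials. A small Python helper (a sketch) reproduces the counts of Examples 1.8 through 1.10:

```python
# Multinomial coefficient: divisions of n distinct items into labeled groups.
from math import factorial

def multinomial(n, parts):
    """n! / (n1! n2! ... nr!) for group sizes summing to n."""
    assert sum(parts) == n
    result = factorial(n)
    for k in parts:
        result //= factorial(k)
    return result

print(multinomial(10, [5, 2, 3]))    # 2520 (Example 1.8)
print(multinomial(10, [5, 5]))       # 252  (Example 1.9, labeled A/B teams)
print(multinomial(10, [5, 5]) // 2)  # 126  (Example 1.10, unlabeled teams)
```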
SUMMARY
The basic principle of counting states that if an experiment consisting of two phases is such that there are n possible outcomes of phase 1 and, for each of these n outcomes, there are m possible outcomes of phase 2, then there are n · m possible outcomes of the experiment.
For 0 ≤ i ≤ n, let

\binom{n}{i} = \frac{n!}{(n-i)!\,i!}

and let it equal 0 otherwise. This quantity represents the number of different subgroups of size i that can be chosen from a set of size n. It is often called the binomial coefficient because of its prominence in the binomial theorem, which states that

(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}
For nonnegative integers n_1, ..., n_r with n_1 + ⋯ + n_r = n,

\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\,n_2!\cdots n_r!}

is the number of divisions of n distinct items into r distinct groups of respective sizes n_1, n_2, ..., n_r.
CHAPTER 2
2.1 SAMPLE SPACE AND EVENTS
Consider an experiment whose set of all possible outcomes is known. This set of all possible
outcomes of an experiment is known as the sample space of the experiment and is denoted by
S. Following are some examples:
1. If the experiment consists of determining the sex of a newborn child, then
S = {g, b}
where the outcome g means that the child is a girl and b that it is a boy.
2. If the outcome of an experiment is the order of finish in a race among the 7 horses
having post positions 1, 2, 3, 4, 5, 6, and 7, then
S = {all 7! permutations of (1, 2, 3, 4, 5, 6, 7)}
The outcome (𝟐, 𝟑, 𝟏, 𝟔, 𝟓, 𝟒, 𝟕) means, for instance, that the number 2 horse comes in
first, then the number 3 horse, then the number 1 horse, and so on.
3. If the experiment consists of flipping two coins, then the sample space consists of the
following four points:
S = {(H, H), (H, T), (T, H), (T, T)}
The outcome will be (𝑯, 𝑯) if both coins are heads, (𝑯, 𝑻) if the first coin is heads
and the second tails, (𝑻, 𝑯) if the first is tails and the second heads, and (𝑻, 𝑻) if both
coins are tails.
4. If the experiment consists of tossing two dice, then the sample space consists of the 36
points
𝑺 = {(𝒊, 𝒋): 𝒊, 𝒋 = 𝟏, 𝟐, 𝟑, 𝟒, 𝟓, 𝟔}
where the outcome (𝒊, 𝒋) is said to occur if i appears on the leftmost die and j on the
other die.
5. If the experiment consists of measuring (in hours) the lifetime of a transistor, then the
sample space consists of all nonnegative real numbers; that is,
S = {x : 0 ≤ x < ∞}
Any subset E of the sample space is known as an event. In other words, an event is a set
consisting of possible outcomes of the experiment. If the outcome of the experiment is
contained in E, then we say that E has occurred. In Example 1, if E = {g}, then E is the event
that the child is a girl. In Example 3, if 𝑬 = {(𝑯, 𝑯), (𝑯, 𝑻)}, then E is the event that a head
appears on the first coin. In Example 4, 𝒊𝒇 𝑬 = {(𝟏, 𝟔), (𝟐, 𝟓), (𝟑, 𝟒), (𝟒, 𝟑), (𝟓, 𝟐), (𝟔, 𝟏)},
then E is the event that
the sum of the dice equals 7. In Example 5, if 𝑬 = {𝒙: 𝟎 < 𝒙 < 𝟓}, then E is the event that
the transistor does not last longer than 5 hours.
For any two events E and F, we define the new event E ∪ F, called the union of E and F, to consist of all outcomes that are either in E or in F or in both.
For any two events E and F, we may also define the new event EF, called the intersection of E and F, to consist of all outcomes that are both in E and in F.
For any event E, we define the new event 𝑬𝒄 , referred to as the complement of E, to consist of
all outcomes in the sample space S that are not in E.
A graphical representation that is useful for illustrating logical relations among events is the
Venn diagram.
De Morgan’s Laws:
\left(\bigcup_{i=1}^{n} E_i\right)^c = \bigcap_{i=1}^{n} E_i^c

\left(\bigcap_{i=1}^{n} E_i\right)^c = \bigcup_{i=1}^{n} E_i^c
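De Morgan's laws can be illustrated on concrete finite sets. In the Python sketch below, the sample space and the three events are arbitrary choices:

```python
# De Morgan's laws checked by brute force on small sets.
S = set(range(10))                        # sample space
events = [{1, 2, 3}, {2, 4, 6}, {0, 1, 9}]

union = set().union(*events)              # E1 ∪ E2 ∪ E3
inter = set(S)
for E in events:                          # E1 ∩ E2 ∩ E3
    inter &= E

complements = [S - E for E in events]     # the events E_i^c
print(S - union == set(S).intersection(*complements))  # True
print(S - inter == set().union(*complements))          # True
```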
2.2 AXIOMS OF PROBABILITY
One way of defining the probability of an event is in terms of its relative frequency. Such a
definition usually goes as follows: We suppose that an experiment, whose sample space is 𝑺,
is repeatedly performed under exactly the same conditions. For each event E of the sample
space 𝑺, we define 𝒏(𝑬) to be the number of times in the first n repetitions of the experiment
that the event E occurs. Then P(E), the probability of the event 𝑬, is defined as
P(E) = \lim_{n \to \infty} \frac{n(E)}{n}
Consider an experiment whose sample space is 𝑺. For each event 𝑬 of the sample space 𝑺, we
assume that a number 𝑷(𝑬) is defined and satisfies the following three axioms:
Axiom 1
𝟎 ≤ 𝑷(𝑬) ≤ 𝟏
Axiom 2
𝑷(𝑺) = 𝟏
Axiom 3
For any sequence of mutually exclusive events 𝑬𝟏 , 𝑬𝟐 , . .. (that is, events for which
𝑬𝒊 𝑬𝒋 = Ø when 𝒊 ≠ 𝒋),
P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)
Example 2.1: If a die is rolled and we suppose that all six sides are equally likely to appear, then we would have P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6. From Axiom 3, it would thus follow that the probability of rolling an even number would equal
P({2, 4, 6}) = P({2}) + P({4}) + P({6}) = 1/2
2.3 SOME SIMPLE PROPOSITIONS
We first note that, since E and E^c are always mutually exclusive and since E ∪ E^c = S, we have, by Axioms 2 and 3,
1 = P(S) = P(E ∪ E^c) = P(E) + P(E^c)
Proposition 1:
P(E^c) = 1 − P(E)
Proposition 2: If E ⊂ F, then P(E) ≤ P(F).
Proposition 3: P(E ∪ F) = P(E) + P(F) − P(EF).
[Venn diagram: E ∪ F split into three mutually exclusive regions: I (outcomes in E only), II (outcomes in both E and F), III (outcomes in F only).]
Example 2.2: J is taking two books along on her holiday vacation. With probability .5, she
will like the first book; with probability .4, she will like the second book; and with probability
.3, she will like both books. What is the probability that she likes neither book?
Solution. Let B_i denote the event that J likes book i, i = 1, 2. Then the probability that she likes at least one of the books is
P(B_1 ∪ B_2) = P(B_1) + P(B_2) − P(B_1 B_2) = 0.5 + 0.4 − 0.3 = 0.6
Because the event that J likes neither book is the complement of the event that she likes at least one of them, we obtain the result
P(B_1^c B_2^c) = P((B_1 ∪ B_2)^c) = 1 − P(B_1 ∪ B_2) = 0.4
We may also calculate the probability that any one of the three events E, F, and G occurs, namely,
P(E ∪ F ∪ G) = P[(E ∪ F) ∪ G] = P(E ∪ F) + P(G) − P[(E ∪ F)G]
Now, it follows from the distributive law that the events (E ∪ F)G and EG ∪ FG are equivalent; hence, from the preceding equations, we obtain
P(E ∪ F ∪ G) = P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG)
Proposition 4 (inclusion–exclusion identity):

P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots + (-1)^{r+1} \sum_{i_1 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r}) + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)

The summation \sum_{i_1 < \cdots < i_r} P(E_{i_1} \cdots E_{i_r}) is taken over all of the \binom{n}{r} possible subsets of size r of the set {1, 2, ..., n}.
2.4 SAMPLE SPACES HAVING EQUALLY LIKELY OUTCOMES
Consider an experiment whose sample space S is a finite set, say, S = {1, 2, ..., N}. Then it is often natural to assume that
P({i}) = 1/N, i = 1, 2, ..., N
From this equation, it follows from Axiom 3 that, for any event E,
P(E) = (number of outcomes in E) / N
In words, if we assume that all outcomes of an experiment are equally likely to occur, then the
probability of any event E equals the proportion of outcomes in the sample space that are
contained in E.
Example 2.3: If two dice are rolled, what is the probability that the sum of the upturned faces
will equal 7?
Solution. We shall solve this problem under the assumption that all of the 36 possible
outcomes are equally likely. Since there are 6 possible outcomes—namely,
(𝟏, 𝟔), (𝟐, 𝟓), (𝟑, 𝟒), (𝟒, 𝟑), (𝟓, 𝟐), and (𝟔, 𝟏)—that result in the sum of the dice being equal
to 7, the desired probability is 6/36 = 1/ 6.
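Because the 36 outcomes are equally likely, results like this can be verified by enumeration. A short Python sketch:

```python
# Enumerating the 36 equally likely outcomes of two dice.
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if sum(o) == 7]
print(Fraction(len(favorable), len(outcomes)))  # 1/6
```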
Example 2.4: If 3 balls are “randomly drawn” from a bowl containing 6 white and 5 black
balls, what is the probability that one of the balls is white and the other two black?
Solution. If we regard the order in which the balls are selected as being relevant, then the sample space consists of 11 · 10 · 9 = 990 outcomes. Furthermore, there are 6 · 5 · 4 = 120 outcomes in which the first ball selected is white and the other two are black; 5 · 6 · 4 = 120 outcomes in which the first is black, the second is white, and the third is black; and 5 · 4 · 6 = 120 in which the first two are black and the third is white. Hence, assuming that "randomly drawn" means that each outcome in the sample space is equally likely to occur, we see that the desired probability is

\frac{120 + 120 + 120}{990} = \frac{360}{990} = \frac{4}{11}

This problem could also have been solved by regarding the outcome as the unordered set of drawn balls, giving

\frac{\binom{6}{1}\binom{5}{2}}{\binom{11}{3}} = \frac{60}{165} = \frac{4}{11}
A similar sequential computation applies when 5 people are randomly chosen from a group of 20 consisting of 10 married couples: the probability that the chosen 5 include no married couple is

\frac{20 \cdot 18 \cdot 16 \cdot 14 \cdot 12}{20 \cdot 19 \cdot 18 \cdot 17 \cdot 16} \approx 0.52
Example 2.5: A committee of 5 is to be selected from a group of 6 men and 9 women. If the
selection is made randomly, what is the probability that the committee consists of 3 men and 2
women?
Solution. Because each of the \binom{15}{5} possible committees is equally likely to be selected, the desired probability is

\frac{\binom{6}{3}\binom{9}{2}}{\binom{15}{5}} = \frac{240}{1001}
Example 2.6: An urn contains n balls, one of which is special. If k of these balls are
withdrawn one at a time, with each selection being equally likely to be any of the balls that
remain at the time, what is the probability that the special ball is chosen?
Solution.
P\{\text{special ball is selected}\} = \frac{\binom{1}{1}\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{k}{n}
Example 2.7: A 5-card poker hand is said to be a full house if it consists of 3 cards of the
same denomination and 2 other cards of the same denomination (of course, different from the
first denomination). Thus, one kind of full house is three of a kind plus a pair. What is the
probability that one is dealt a full house?
Solution. Again, we assume that all \binom{52}{5} possible hands are equally likely. To determine the number of possible full houses, we first note that there are \binom{4}{2}\binom{4}{3} different combinations of, say, 2 tens and 3 jacks. Because there are 13 different choices for the denomination of the pair and, after a pair has been chosen, there are 12 other choices for the denomination of the remaining 3 cards, it follows that the probability of a full house is

\frac{13 \cdot 12 \cdot \binom{4}{2}\binom{4}{3}}{\binom{52}{5}} \approx 0.0014
Example 2.8: A poker hand consists of 5 cards. If the cards have distinct consecutive values
and are not all of the same suit, we say that the hand is a straight. For instance, a hand
consisting of the five of spades, six of spades, seven of spades, eight of spades, and nine of
hearts is a straight. What is the probability that one is dealt a straight?
Solution. We start by assuming that all \binom{52}{5} possible poker hands are equally likely. To determine the number of outcomes that are straights, let us first determine the number of possible outcomes for which the poker hand consists of an ace, two, three, four, and five (the suits being irrelevant). Since the ace can be any 1 of the 4 possible aces, and similarly for the two, three, four, and five, it follows that there are 4^5 outcomes leading to exactly one ace, two, three, four, and five. Hence, since in 4 of these outcomes all the cards will be of the same suit (such a hand is called a straight flush), it follows that there are 4^5 − 4 hands that make up a straight of the form ace, two, three, four, and five. Similarly, there are 4^5 − 4 hands that make up a straight of each of the other possible forms, up to ten, jack, queen, king, and ace. Thus, there are 10(4^5 − 4) hands that are straights, and it follows that the desired probability is

\frac{10(4^5 - 4)}{\binom{52}{5}} \approx 0.0039
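Both poker probabilities can be confirmed by exhaustively enumerating all \binom{52}{5} = 2,598,960 hands. A Python sketch (it may take a minute to run):

```python
# Exhaustive check of the full-house and straight probabilities.
from itertools import combinations
from collections import Counter

ranks = list(range(13))                  # 0 = ace, ..., 12 = king
deck = [(r, s) for r in ranks for s in range(4)]

def is_full_house(hand):
    counts = sorted(Counter(r for r, _ in hand).values())
    return counts == [2, 3]

def is_straight(hand):
    rs = sorted(r for r, _ in hand)
    consecutive = all(rs[i + 1] - rs[i] == 1 for i in range(4))
    high_ace = rs == [0, 9, 10, 11, 12]   # ten-J-Q-K-ace
    flush = len({s for _, s in hand}) == 1
    return (consecutive or high_ace) and not flush

total = full = straight = 0
for hand in combinations(deck, 5):
    total += 1
    full += is_full_house(hand)
    straight += is_straight(hand)
print(full / total, straight / total)    # ~0.0014, ~0.0039
```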
SUMMARY
Let 𝑆 denote the set of all possible outcomes of an experiment. 𝑆 is called the sample space of
the experiment. An event is a subset of 𝑆. If 𝐴𝑖 , 𝑖 = 1, … , 𝑛, are events, then ⋃𝑛𝑖=1 𝐴𝑖 , called
the union of these events, consists of all outcomes that are in at least one of the events 𝐴𝑖 ,
𝑖 = 1, … , 𝑛. Similarly, ⋂𝑛𝑖=1 𝐴𝑖 , sometimes written as 𝐴1 ⋯ 𝐴𝑛 , is called the intersection of the
events 𝐴𝑖 and consists of all outcomes that are in all of the events 𝐴𝑖 , 𝑖 = 1, … , 𝑛.
For any event 𝐴, we define 𝐴𝑐 to consist of all outcomes in the sample space that are not in 𝐴.
We call 𝐴𝑐 the complement of the event 𝐴. The event 𝑆 𝑐 , which is empty of outcomes, is
designated by Ø and is called the null set. If 𝐴𝐵 = Ø, then we say that 𝐴 and 𝐵 are mutually
exclusive.
For each event 𝐴 of the sample space 𝑆, we suppose that a number 𝑃(𝐴), called the
probability of 𝐴, is defined and is such that
i. 0 ≤ 𝑃(𝐴) ≤ 1
ii. 𝑃(𝑆) = 1
iii. For mutually exclusive events 𝐴𝑖 ,
P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)
𝑃(𝐴) represents the probability that the outcome of the experiment is in 𝐴. It can be shown
that
𝑃(𝐴𝑐 ) = 1 − 𝑃(𝐴)
P(A_1 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i_1 < i_2} P(A_{i_1} A_{i_2}) + \cdots + (-1)^{n+1} P(A_1 A_2 \cdots A_n)
If 𝑆 is finite and each one point set is assumed to have equal probability, then
P(A) = \frac{|A|}{|S|}
CHAPTER 3:
CONDITIONAL PROBABILITY AND
INDEPENDENCE
3.1 CONDITIONAL PROBABILITIES
Suppose that we toss 2 dice, and suppose that each of the 36 possible outcomes is equally
likely to occur and hence has probability 1/36 . Suppose further that we observe that the first
die is a 3. Then, given this information, what is the probability that the sum of the 2 dice
equals 8?
Given that the initial die is a 3, there can be at most 6 possible outcomes of our experiment, namely, (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6). Since each of these outcomes originally had the same probability of occurring, they should still have equal probabilities: the (conditional) probability of each of the outcomes (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6) is 1/6, whereas the (conditional) probability of the other 30 points in the sample space is 0. Hence, the desired probability (that the outcome is (3, 5)) will be 1/6.
If we let E and F denote, respectively, the event that the sum of the dice is 8 and the event that
the first die is a 3, then the probability just obtained is called the conditional probability that
E occurs given that F has occurred and is denoted by
𝑷(𝑬|𝑭)
Definition: If P(F) > 0, then

P(E|F) = \frac{P(EF)}{P(F)}
Example 3.1: A student is taking a one-hour-time-limit makeup examination. Suppose the probability that the student will finish the exam in less than x hours is x/2, for all 0 ≤ x ≤ 1. Given that the student is still working after 0.75 hour, what is the conditional probability that the full hour is used?
Solution. Let L_x denote the event that the student finishes the exam in less than x hours, 0 < x < 1, and let F be the event that the student uses the full hour. Because F is the event that the student is not finished in less than 1 hour,
P(F) = 1 − P(L_1) = 1 − 0.5 = 0.5
Now, the event that the student is still working at time 0.75 is the complement of the event L_{0.75}, so the desired probability is obtained from

P(F|L_{0.75}^c) = \frac{P(F L_{0.75}^c)}{P(L_{0.75}^c)} = \frac{P(F)}{1 - P(L_{0.75})} = \frac{0.5}{0.625} = 0.8
If each outcome of a finite sample space S is equally likely, then, conditional on the event that the outcome lies in a subset F ⊂ S, all outcomes in F become equally likely. In such cases, it
is often convenient to compute conditional probabilities of the form 𝑷(𝑬|𝑭) by using 𝑭 as the
sample space. Indeed, working with this reduced sample space often results in an easier and
better understood solution. Our next few examples illustrate this point.
Example 3.2: A coin is flipped twice. Assuming that all four points in the sample space
𝑺 = {(𝒉, 𝒉), (𝒉, 𝒕), (𝒕, 𝒉), (𝒕, 𝒕)} are equally likely, what is the conditional probability that
both flips land on heads, given that (a) the first flip lands on heads? (b) at least one flip lands
on heads?
Solution. Let 𝑩 = {(𝒉, 𝒉)} be the event that both flips land on heads; let
𝑭 = {(𝒉, 𝒉), (𝒉, 𝒕)} be the event that the first flip lands on heads; and let
𝑨 = {(𝒉, 𝒉), (𝒉, 𝒕), (𝒕, 𝒉)} be the event that at least one flip lands on heads. The probability
for (a) can be obtained from

P(B|F) = \frac{P(BF)}{P(F)} = \frac{1/4}{2/4} = \frac{1}{2}

For (b),

P(B|A) = \frac{P(BA)}{P(A)} = \frac{1/4}{3/4} = \frac{1}{3}
Example 3.3: A woman is known to have two children, what is the conditional probability
that they are both boys, given that she has at least one son?
Solution: Assume that the sample space S is given by S = {(b, b), (b, g), (g, b), (g, g)} and all outcomes are equally likely. Letting B = {(b, b)} denote the event that both children are boys and A = {(b, b), (b, g), (g, b)} the event that at least one of them is a boy, the desired probability is

P(B|A) = \frac{P(BA)}{P(A)} = \frac{1/4}{3/4} = \frac{1}{3}
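The reduced-sample-space computation can be mirrored directly in code. A small Python sketch:

```python
# Conditional probability by counting on the reduced sample space.
from fractions import Fraction

S = [("b", "b"), ("b", "g"), ("g", "b"), ("g", "g")]     # equally likely
at_least_one_boy = [o for o in S if "b" in o]
both_boys = [o for o in at_least_one_boy if o == ("b", "b")]
print(Fraction(len(both_boys), len(at_least_one_boy)))   # 1/3
```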
Fact: As P(E|F) = P(EF)/P(F), multiplying both sides by P(F) gives
P(EF) = P(E|F)P(F)
Example 3.4: Celine is undecided as to whether to take a French course or a chemistry course. She estimates that her probability of receiving an A grade would be 1/2 in a French course and 2/3 in a chemistry course. If she bases her decision on the flip of a fair coin, what is the probability that she gets an A in chemistry?
Solution: Let C be the event that Celine takes chemistry and A denote the event that she receives an A in whatever course she takes. The desired probability is P(CA), which is calculated by using the above fact as follows:

P(CA) = P(C)P(A|C) = \frac{1}{2}\cdot\frac{2}{3} = \frac{1}{3}
Example 3.5: Suppose that an urn contains 8 red balls and 4 white balls. We draw 2 balls
from the urn without replacement. (a) If we assume that at each draw each ball in the urn is
equally likely to be chosen, what is the probability that both balls drawn are red?
Solution. Let R_1 and R_2 denote, respectively, the events that the first and second balls drawn are red. Now, given that the first ball selected is red, there are 7 remaining red balls and 4 white balls, so P(R_2|R_1) = 7/11. As P(R_1) is clearly 8/12, the desired probability is

P(R_1 R_2) = P(R_1)P(R_2|R_1) = \frac{8}{12}\cdot\frac{7}{11} = \frac{14}{33}

Of course, this probability could also have been computed directly as

P(R_1 R_2) = \frac{\binom{8}{2}}{\binom{12}{2}} = \frac{28}{66} = \frac{14}{33}
A generalization of the above fact, which provides an expression for the probability of the intersection of an arbitrary number of events, is sometimes referred to as the multiplication rule:

P(E_1 E_2 E_3 \cdots E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1 E_2)\cdots P(E_n|E_1 \cdots E_{n-1})

To prove the multiplication rule, just apply the definition of conditional probability to its right-hand side. This gives

P(E_1)\cdot\frac{P(E_1 E_2)}{P(E_1)}\cdot\frac{P(E_1 E_2 E_3)}{P(E_1 E_2)}\cdots\frac{P(E_1 E_2 \cdots E_n)}{P(E_1 E_2 \cdots E_{n-1})} = P(E_1 E_2 \cdots E_n)
3.2 BAYES' FORMULA
Let E and F be events. We may express E as
E = EF ∪ EF^c
for, in order for an outcome to be in E, it must either be in both E and F or be in E but not in F. (See Figure 3.1.) As EF and EF^c are clearly mutually exclusive, we have
P(E) = P(EF) + P(EF^c) = P(E|F)P(F) + P(E|F^c)P(F^c) = P(E|F)P(F) + P(E|F^c)[1 − P(F)]
[Figure 3.1: Venn diagram showing E split into the mutually exclusive pieces EF^c and EF.]
Example 3.6: An insurance company believes that people can be divided into two classes:
those who are accident prone and those who are not. The company’s statistics show that an
accident-prone person will have an accident at some time within a fixed 1-year period with
probability .4, whereas this probability decreases to .2 for a person who is not accident prone.
If we assume that 30 percent of the population is accident prone, what is the probability that a
new policyholder will have an accident within a year of purchasing a policy?
Solution: Let A_1 denote the event that the policyholder will have an accident within a year of purchasing the policy, and let A denote the event that the policyholder is accident prone. Hence, the desired probability is given by
P(A_1) = P(A_1|A)P(A) + P(A_1|A^c)P(A^c) = (0.4)(0.3) + (0.2)(0.7) = 0.26
Example 3.6.b Suppose that a new policyholder has an accident within a year of purchasing a
policy. What is the probability that he or she is accident prone?
P(A|A_1) = \frac{P(A A_1)}{P(A_1)} = \frac{P(A)P(A_1|A)}{P(A_1)} = \frac{0.3 \times 0.4}{0.26} = \frac{6}{13}
Example 3.7: In answering a question on a multiple-choice test, a student either knows the
answer or guesses. Let 𝒑 be the probability that the student knows the answer and 𝟏 − 𝒑 be
the probability that the student guesses. Assume that a student who guesses at the answer will
be correct with probability 𝟏/𝒎, where 𝒎 is the number of multiple-choice alternatives.
What is the conditional probability that a student knew the answer to a question given that he
or she answered it correctly?
Solution. Let 𝑪 and 𝑲 denote, respectively, the events that the student answers the question
correctly and the event that he or she actually knows the answer. Now,
P(K|C) = \frac{P(KC)}{P(C)} = \frac{P(C|K)P(K)}{P(C|K)P(K) + P(C|K^c)P(K^c)} = \frac{p}{p + (1/m)(1-p)} = \frac{mp}{1 + (m-1)p}
For example, if 𝒎 = 𝟓 and 𝒑 = 𝟏/𝟐, then the probability that the student knew the answer
to a question he or she answered correctly is 𝟓/𝟔.
Example 3.8: A laboratory blood test is 95 percent effective in detecting a certain disease
when it is, in fact, present. However, the test also yields a “false positive” result for 1 percent
of the healthy persons tested. (That is, if a healthy person is tested, then, with probability . 𝟎𝟏,
the test result will imply that he or she has the disease.) If . 𝟓 percent of the population
actually has the disease, what is the probability that a person has the disease given that the test
result is positive?
Solution. Let 𝑫 be the event that the person tested has the disease and 𝑬 the event that the test
result is positive. Then the desired probability is
P(D|E) = \frac{P(DE)}{P(E)} = \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} = \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.01)(0.995)} = \frac{95}{294} \approx 0.323
Since 0.5 percent of the population actually has the disease, it follows that, on the average, 1
person out of every 200 tested will have it. The test will correctly confirm that this person has
the disease with probability .95. Thus, on the average, out of every 200 persons tested, the test
will correctly confirm that .95 person has the disease. On the other hand, however, out of
the(on the average) 199 healthy people, the test will incorrectly state that (199)(.01) of these
people have the disease. Hence, for every .95 diseased person that the test correctly states is
ill, there are (on the average) (199)(.01) healthy persons that the test incorrectly states are ill.
Thus, the proportion of time that the test result is correct when it states that a person is ill is
\frac{0.95}{0.95 + (199)(0.01)} = \frac{95}{294} \approx 0.323
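The Bayes computation in this example is mechanical and easy to package as a helper. A Python sketch using the numbers of Example 3.8:

```python
# P(disease | positive test) from the prior and the two test error rates.
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    joint_d = prior * p_pos_given_disease
    joint_h = (1 - prior) * p_pos_given_healthy
    return joint_d / (joint_d + joint_h)

print(posterior(0.005, 0.95, 0.01))  # ~0.323
```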
Example 3.9: Consider a medical practitioner pondering the following dilemma: “If I’m at
least 80 percent certain that my patient has this disease, then I always recommend surgery,
whereas if I’m not quite as certain, then I recommend additional tests that are expensive and
sometimes painful. Now, initially I was only 60 percent certain that Jones had the disease, so I
ordered the series A test, which always gives a positive result when the patient has the disease
and almost never does when he is healthy. The test result was positive, and I was all set to
recommend surgery when Jones informed me, for the first time, that he was diabetic. This
information complicates matters because, although it doesn’t change my original 60 percent
estimate of his chances of having the disease in question, it does affect the interpretation of
the results of the A test. This is so because the A test, while never yielding a positive result
when the patient is healthy, does unfortunately yield a positive result 30 percent of the time in
the case of diabetic patients who are not suffering from the disease. Now what do I do? More
tests or immediate surgery?”
Solution. Let D denote the event that Jones has the disease and E the event that the A test
result is positive. The desired conditional probability is then
P(D|E) = \frac{P(DE)}{P(E)} = \frac{P(D)P(E|D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} = \frac{0.6 \times 1}{1 \times 0.6 + 0.3 \times 0.4} \approx 0.833

Since the revised probability of the disease is still above 80 percent, the practitioner can recommend surgery.
Example 3.10: An investigator is 60 percent convinced of the guilt of a certain suspect. A new piece of evidence shows that the criminal possesses a certain characteristic, which the suspect also possesses, and 20 percent of the innocent population possesses this characteristic. How convinced of the guilt of the suspect should the investigator now be?
Solution. Letting G denote the event that the suspect is guilty and C the event that he possesses the characteristic of the criminal, we have

P(G|C) = \frac{P(GC)}{P(C)} = \frac{P(C|G)P(G)}{P(C|G)P(G) + P(C|G^c)P(G^c)} = \frac{1 \times 0.6}{1 \times 0.6 + 0.2 \times 0.4} \approx 0.882
3.3 INDEPENDENT EVENTS
In the special cases where P(E|F) does in fact equal P(E), we say that E is independent of F. That is, E is independent of F if knowledge that F has occurred does not change the probability that E occurs. Since P(E|F) = P(EF)/P(F), E is independent of F exactly when
P(EF) = P(E)P(F)
The fact that the above equation is symmetric in E and F shows that whenever E is independent of F, F is also independent of E. We thus have the following
Definition: Two events E and F are said to be independent if P(EF) = P(E)P(F). Two events E and F that are not independent are said to be dependent.
Example 3.11: A card is selected at random from an ordinary deck of 𝟓𝟐 playing cards. If 𝑬
is the event that the selected card is an ace and 𝑭 is the event that it is a spade, then 𝑬 and 𝑭
are independent. This follows because P(EF) = 1/52, whereas P(E) = 4/52 and P(F) = 13/52, so that P(E)P(F) = (4/52)(13/52) = 1/52.
Example 3.12: Two coins are flipped, and all 4 outcomes are assumed to be equally likely. If
E is the event that the first coin lands on heads and F the event that the second lands on tails,
then E and F are independent, since 𝑷(𝑬𝑭) = 𝑷({(𝑯, 𝑻)}) = 𝟏/𝟒, whereas 𝑷(𝑬) =
𝑷({(𝑯, 𝑯), (𝑯, 𝑻)}) = ½ and 𝑷(𝑭) = 𝑷({(𝑯, 𝑻), (𝑻, 𝑻)}) = ½.
Example 3.13: Suppose that we toss 2 fair dice. Let E1 denote the event that the sum of the
dice is 6 and F denote the event that the first die equals 4. Then
P(E_1 F) = P({(4, 2)}) = 1/36
whereas
P(E_1)P(F) = (5/36)(1/6) = 5/216
Hence, E1 and F are not independent. Intuitively, the reason for this is clear because if we are
interested in the possibility of throwing a 6 (with 2 dice), we shall be quite happy if the first
die lands on 4 (or, indeed, on any of the numbers 1, 2, 3, 4, and 5), for then we shall still have
a possibility of getting a total of 6.
Now, suppose that we let E_2 be the event that the sum of the dice equals 7. Is E_2 independent of F? The answer is yes, since
P(E_2 F) = P({(4, 3)}) = 1/36
whereas
P(E_2)P(F) = (6/36)(1/6) = 1/36
We leave it for the reader to present the intuitive argument why the event that the sum of the
dice equals 7 is independent of the outcome on the first die.
Proposition: If E and F are independent, then so are E and F^c.
Proof: Assume that E and F are independent. Since E = EF ∪ EF^c, and EF and EF^c are obviously mutually exclusive, we have that
P(E) = P(EF) + P(EF^c) = P(E)P(F) + P(EF^c)
then we get
P(EF^c) = P(E)[1 − P(F)] = P(E)P(F^c)
Definition: Three events E, F, and G are said to be independent if
P(EFG) = P(E)P(F)P(G)
𝑷(𝑬𝑭) = 𝑷(𝑬)𝑷(𝑭)
𝑷(𝑬𝑮) = 𝑷(𝑬)𝑷(𝑮)
𝑷(𝑭𝑮) = 𝑷(𝑭)𝑷(𝑮)
Example 3.14: An infinite sequence of independent trials is to be performed. Each trial results in a success with probability p and in a failure with probability 1 − p. What is the probability that (a) at least 1 success occurs in the first n trials; (b) exactly k successes occur in the first n trials; (c) all trials result in successes?
Solution.
(a) P{at least one success} = 1 − P{all n trials fail} = 1 − (1 − p)^n
(b) P\{\text{exactly } k \text{ successes}\} = \binom{n}{k} p^k (1-p)^{n-k}
(c) P\{\text{all trials are successes}\} = \lim_{n\to\infty} p^n = \begin{cases} 0, & \text{if } p < 1 \\ 1, & \text{if } p = 1 \end{cases}
Example 3.15: A system composed of n separate components is said to be a parallel system if it functions when at least one of the components functions. For such a system, if component i, independently of the other components, functions with probability p_i, i = 1, ..., n, what is the probability that the system functions?
Solution.
P{system functions} = 1 − P{all components fail}
= 1 - \prod_{i=1}^{n} (1 - p_i) \quad \text{by independence}
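The parallel-system formula is straightforward to evaluate. A Python sketch with arbitrary component reliabilities:

```python
# Probability that a parallel system functions (at least one component works).
from math import prod  # Python 3.8+

def parallel_system(ps):
    """1 minus the product of the independent failure probabilities."""
    return 1 - prod(1 - p for p in ps)

print(parallel_system([0.9, 0.8, 0.7]))  # 1 - 0.1*0.2*0.3 = 0.994
```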
3.4 P(·|F) IS A PROBABILITY
Conditional probabilities satisfy all of the properties of ordinary probabilities, as shown by the following proposition.
Proposition 5.1:
(a) 0 ≤ P(E|F) ≤ 1
(b) P(S|F) = 1
(c) If E_i, i = 1, 2, ..., are mutually exclusive events, then

P\left(\bigcup_{i=1}^{\infty} E_i \,\Big|\, F\right) = \sum_{i=1}^{\infty} P(E_i|F)
If we define 𝑸(𝑬) = 𝑷(𝑬|𝑭), then, from Proposition 5.1, 𝑸(𝑬) may be regarded as a
probability function on the events of 𝑺. Hence, all of the propositions previously proved for
probabilities apply to Q(E). For instance, we have
Q(E_1 ∪ E_2) = Q(E_1) + Q(E_2) − Q(E_1 E_2)
or, equivalently,
P(E_1 ∪ E_2|F) = P(E_1|F) + P(E_2|F) − P(E_1 E_2|F)
Also, if we define the conditional probability Q(E_1|E_2) by Q(E_1|E_2) = Q(E_1 E_2)/Q(E_2), then we have
Q(E_1) = Q(E_1|E_2)Q(E_2) + Q(E_1|E_2^c)Q(E_2^c)    (*)
Since

Q(E_1|E_2) = \frac{Q(E_1 E_2)}{Q(E_2)} = \frac{P(E_1 E_2|F)}{P(E_2|F)} = \frac{P(E_1 E_2 F)/P(F)}{P(E_2 F)/P(F)} = P(E_1|E_2 F)

equation (*) is equivalent to

P(E_1|F) = P(E_1|E_2 F)P(E_2|F) + P(E_1|E_2^c F)P(E_2^c|F)
Example 3.16: Consider Example 3.6, which is concerned with an insurance company which
believes that people can be divided into two distinct classes: those who are accident prone and
those who are not. During any given year, an accident-prone person will have an accident
with probability .4, whereas the corresponding figure for a person who is not prone to
accidents is .2. What is the conditional probability that a new policyholder will have an
accident in his or her second year of policy ownership, given that the policyholder has had an
accident in the first year?
Solution. If we let 𝑨 be the event that the policyholder is accident prone and we let 𝑨𝒊 ,
𝒊 = 𝟏, 𝟐, be the event that he or she has had an accident in the 𝒊-th year, then the desired
probability 𝑷(𝑨𝟐 |𝑨𝟏 ) may be obtained by conditioning on whether or not the policyholder is
accident prone, as follows:
P(A_2|A_1) = P(A_2|A A_1)P(A|A_1) + P(A_2|A^c A_1)P(A^c|A_1)
Now,

P(A|A_1) = \frac{P(A_1 A)}{P(A_1)} = \frac{P(A)P(A_1|A)}{P(A_1)}

However, P(A) is assumed to equal 3/10, and it was shown in Example 3.6 that P(A_1) = 0.26. Hence,

P(A|A_1) = \frac{(0.3)(0.4)}{0.26} = \frac{6}{13}, \qquad P(A^c|A_1) = 1 - \frac{6}{13} = \frac{7}{13}
Since a policyholder who is accident prone has probability 0.4 of an accident in any given year, and one who is not has probability 0.2,

P(A_2|A_1) = 0.4 \times \frac{6}{13} + 0.2 \times \frac{7}{13} \approx 0.29
Example 3.17: A female chimp has given birth. It is not certain, however, which of two male
chimps is the father. Before any genetic analysis has been performed, it is felt that the
probability that male number 𝟏 is the father is 𝒑 and the probability that male number 𝟐 is the
father is 𝟏 − 𝒑. DNA obtained from the mother, male number 1, and male number 2 indicate
that, on one specific location of the genome, the mother has the gene pair (𝑨, 𝑨), male
number 1 has the gene pair (𝒂, 𝒂), and male number 2 has the gene pair (𝑨, 𝒂). If a DNA test
shows that the baby chimp has the gene pair (𝑨, 𝒂), what is the probability that male number
1 is the father?
Solution: Let all probabilities be conditional on the event that the mother has the gene pair
(𝑨, 𝑨), male number 1 has the gene pair (𝒂, 𝒂), and male number 2 has the gene pair (𝑨, 𝒂).
Now, let M_1 and M_2 be the events that male number 1 and male number 2, respectively, is the father, and let B_{A,a} be the event that the baby chimp has the gene pair (A, a). Then P(M_1|B_{A,a}) is obtained as follows:

P(M_1|B_{A,a}) = \frac{P(B_{A,a}|M_1)P(M_1)}{P(B_{A,a}|M_1)P(M_1) + P(B_{A,a}|M_2)P(M_2)}
= \frac{1 \times p}{1 \times p + 0.5(1-p)} = \frac{2p}{1+p}
Because 2p/(1 + p) > p when p < 1, the information that the baby's gene pair is (A, a) increases the probability that male number 1 is the father. This result is intuitive because it is more likely that the baby would have gene pair (A, a) if M_1 is true than if M_2 is true (the respective conditional probabilities being 1 and 1/2).
SUMMARY
For events 𝐸 and 𝐹, the conditional probability of 𝐸 given that 𝐹 has occurred is denoted by
𝑃(𝐸|𝐹) and is defined by
P(E|F) = \frac{P(EF)}{P(F)}
The identity
P(E_1 E_2 \cdots E_n) = P(E_1)P(E_2|E_1)\cdots P(E_n|E_1 \cdots E_{n-1})
is known as the multiplication rule of probability.
Let the odds of an event H be defined as P(H)/P(H^c). A valuable identity is
\frac{P(H|E)}{P(H^c|E)} = \frac{P(H)P(E|H)}{P(H^c)P(E|H^c)}
shows that when new evidence 𝐸 is obtained, the value of the odds of 𝐻 becomes its old value
multiplied by the ratio of the conditional probability of the new evidence when 𝐻 is true to the
conditional probability when 𝐻 is not true.
Let F_i, i = 1, ..., n, be mutually exclusive events whose union is the entire sample space. The identity

P(F_j|E) = \frac{P(E|F_j)P(F_j)}{\sum_{i=1}^{n} P(E|F_i)P(F_i)}

is known as Bayes' formula.
If 𝑃(𝐸𝐹) = 𝑃(𝐸)𝑃(𝐹), then we say that the events 𝐸 and 𝐹 are independent. This condition
is equivalent to 𝑃(𝐸|𝐹) = 𝑃(𝐸) and to 𝑃(𝐹|𝐸) = 𝑃(𝐹). Thus, the events 𝐸 and 𝐹 are
independent if knowledge of the occurrence of one of them does not affect the probability of
the other.
The events E_1, ..., E_n are said to be independent if, for any subset E_{i_1}, ..., E_{i_r} of them,

P(E_{i_1} \cdots E_{i_r}) = P(E_{i_1}) \cdots P(E_{i_r})
CHAPTER 4:
RANDOM VARIABLES
4.1 RANDOM VARIABLES
Frequently, we are interested mainly in some function of the outcome as opposed to the actual
outcome itself. For instance, in tossing dice, we are often interested in the sum of
the two dice and are not really concerned about the separate values of each die. That
is, we may be interested in knowing that the sum is 7 and may not be concerned over
whether the actual outcome was (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), or (6, 1). Also, in
flipping a coin, we may be interested in the total number of heads that occur and not
care at all about the actual head–tail sequence that results. These quantities of
interest, or, more formally, these real-valued functions defined on the sample space,
are known as random variables.
Example 4.1: Suppose that our experiment consists of tossing 3 fair coins. If we let Y
denote the number of heads that appear, then Y is a random variable taking on one
of the values 0, 1, 2, and 3 with respective probabilities
P{Y = 0} = P{(T, T, T)} = 1/8
P{Y = 1} = P{(T, T, H), (T, H, T), (H, T, T)} = 3/8
P{Y = 2} = P{(T, H, H), (H, T, H), (H, H, T)} = 3/8
P{Y = 3} = P{(H, H, H)} = 1/8
Example 4.2: Three balls are to be randomly selected without replacement from an
urn containing 20 balls numbered 1 through 20. If we bet that at least one of the balls
that are drawn has a number as large as or larger than 17, what is the probability that
we win the bet?
Solution. Let X denote the largest number selected. Then X is a random variable taking on one of the values 3, 4, ..., 20. Furthermore, if we suppose that each of the \binom{20}{3} possible selections is equally likely to occur, then

P\{X = i\} = \frac{\binom{i-1}{2}}{\binom{20}{3}}, \qquad i = 3, 4, \ldots, 20

The above equation follows because the number of selections that result in the event {X = i} is just the number of selections that result in the ball numbered i and two of the balls numbered 1 through (i − 1) being chosen. Because there are clearly \binom{i-1}{2} such selections, we obtain the probabilities expressed above, from which we see that

P\{X = 20\} = \frac{\binom{19}{2}}{\binom{20}{3}} = \frac{3}{20} = 0.150

P\{X = 19\} = \frac{\binom{18}{2}}{\binom{20}{3}} = \frac{51}{380} \approx 0.134

P\{X = 18\} = \frac{\binom{17}{2}}{\binom{20}{3}} = \frac{34}{285} \approx 0.119

P\{X = 17\} = \frac{\binom{16}{2}}{\binom{20}{3}} = \frac{2}{19} \approx 0.105
Hence, since the event {X ≥ 17} is the union of the disjoint events {X = i}, i = 17, 18, 19, 20, it follows that the probability of our winning the bet is given by
P{X ≥ 17} ≈ 0.105 + 0.119 + 0.134 + 0.150 = 0.508
Example 4.3: Independent trials consisting of the flipping of a coin having probability
𝑝 of coming up heads are continually performed until either a head occurs or a total
of 𝑛 flips is made. If we let 𝑋 denote the number of times the coin is flipped, then 𝑋 is
a random variable taking on one of the values 1, 2, 3, . . . , 𝑛 with respective
probabilities
P\{X = 1\} = P\{H\} = p
P\{X = 2\} = P\{(T, H)\} = (1-p)p
⋮
P\{X = n-1\} = P\{(\underbrace{T, T, \ldots, T}_{n-2}, H)\} = (1-p)^{n-2} p
P\{X = n\} = P\{(\underbrace{T, T, \ldots, T}_{n-1})\} = (1-p)^{n-1}

where the last probability follows because {X = n} occurs exactly when the first n − 1 flips all land on tails.
Example 4.4: Three balls are randomly chosen from an urn containing 3 white, 3 red,
and 5 black balls. Suppose that we win $1 for each white ball selected and lose $1 for
each red ball selected. If we let X denote our total winnings from the experiment, then X is a random variable taking on the possible values 0, ±1, ±2, ±3 with respective probabilities

P\{X = 0\} = \frac{\binom{5}{3} + \binom{3}{1}\binom{3}{1}\binom{5}{1}}{\binom{11}{3}} = \frac{55}{165}

P\{X = 1\} = P\{X = -1\} = \frac{\binom{3}{1}\binom{5}{2} + \binom{3}{2}\binom{3}{1}}{\binom{11}{3}} = \frac{39}{165}

P\{X = 2\} = P\{X = -2\} = \frac{\binom{3}{2}\binom{5}{1}}{\binom{11}{3}} = \frac{15}{165}

P\{X = 3\} = P\{X = -3\} = \frac{\binom{3}{3}}{\binom{11}{3}} = \frac{1}{165}

As a check,

\sum_{i=1}^{3} P\{X = i\} = \frac{39 + 15 + 1}{165} = \frac{55}{165} = \frac{1}{3}
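The whole probability mass function can be checked by enumerating the \binom{11}{3} = 165 equally likely draws. A Python sketch:

```python
# Exhaustive check of the winnings distribution in Example 4.4.
from itertools import combinations
from collections import Counter
from fractions import Fraction

balls = ["w"] * 3 + ["r"] * 3 + ["k"] * 5      # 3 white, 3 red, 5 black
draws = list(combinations(range(11), 3))        # 165 equally likely draws
pmf = Counter()
for d in draws:
    x = sum(+1 if balls[i] == "w" else -1 if balls[i] == "r" else 0
            for i in d)
    pmf[x] += 1
for x in sorted(pmf):
    print(x, Fraction(pmf[x], len(draws)))      # matches the fractions above
```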
4.2 DISTRIBUTION FUNCTIONS
The cumulative distribution function (c.d.f.), or more simply the distribution function
of a random variable 𝑋, is defined for all real numbers 𝑏, −∞ < 𝑏 < ∞, by
𝐹 (𝑏 ) = 𝑃 {𝑋 ≤ 𝑏 }
In words, 𝐹(𝑏) denotes the probability that the random variable 𝑋 takes on a value
that is less than or equal to 𝑏. Some properties of the c.d.f., 𝐹 are,
1. F is a nondecreasing function; that is, if a < b, then F(a) ≤ F(b).
2. lim_{b→∞} F(b) = 1.
3. lim_{b→−∞} F(b) = 0.
4. F is right continuous.
Example 4.5: The distribution function of the random variable X is given by

F(x) = \begin{cases} 0, & x < 0 \\ x/2, & 0 \le x < 1 \\ 2/3, & 1 \le x < 2 \\ 11/12, & 2 \le x < 3 \\ 1, & 3 \le x \end{cases}

[Figure: graph of F(x), a ramp on [0, 1) with jumps at 1, 2, and 3.]
Compute (a) P{X < 3}, (b) P{X = 1}, (c) P{X > 1/2}, and (d) P{2 < X ≤ 4}.
Solution:
(a) P\{X < 3\} = \lim_{h \to 0^+} P\{X \le 3-h\} = \lim_{h \to 0^+} F(3-h) = \frac{11}{12}
(b) P\{X = 1\} = P\{X \le 1\} - P\{X < 1\} = F(1) - \lim_{h \to 0^+} F(1-h) = \frac{2}{3} - \frac{1}{2} = \frac{1}{6}
(c) P\{X > \tfrac{1}{2}\} = 1 - P\{X \le \tfrac{1}{2}\} = 1 - F(\tfrac{1}{2}) = \frac{3}{4}
(d) P\{2 < X \le 4\} = F(4) - F(2) = 1 - \frac{11}{12} = \frac{1}{12}
4.3 DISCRETE RANDOM VARIABLES
A random variable that can take on at most a countable number of possible values is
said to be discrete. For a discrete random variable X, we define the probability mass
function 𝑝(𝑎) of 𝑋 by
𝑝(𝑎) = 𝑃{𝑋 = 𝑎}
The probability mass function 𝑝(𝑎) is positive for at most a countable number of
values of a. That is, if X must assume one of the values x_1, x_2, ..., then
p(x_i) ≥ 0 for i = 1, 2, ...
p(x) = 0 for all other values of x
Since X must take on one of the values x_i, we have

\sum_{i=1}^{\infty} p(x_i) = 1
The cumulative distribution function F can be expressed in terms of p(a) by

F(a) = \sum_{\text{all } x \le a} p(x)
For instance, if X has a probability mass function given by
p(1) = 1/4, p(2) = 1/2, p(3) = 1/8, p(4) = 1/8
then its cumulative distribution function is

F(a) = \begin{cases} 0, & a < 1 \\ 1/4, & 1 \le a < 2 \\ 3/4, & 2 \le a < 3 \\ 7/8, & 3 \le a < 4 \\ 1, & 4 \le a \end{cases}
4.4 EXPECTED VALUE
One of the most important concepts in probability theory is that of the expectation of
a random variable. If 𝑋 is a discrete random variable having a probability mass
function 𝑝(𝑥), then the expectation, or the expected value, of 𝑋, denoted by 𝐸[𝑋], is
defined by
𝐸 [𝑋 ] = ∑ 𝑥 𝑝 (𝑥 )
𝑥: 𝑝(𝑥)>0
The expected value of 𝑋 is a weighted average of the possible values that 𝑋 can take
on, each value being weighted by the probability that 𝑋 assumes it. For instance, on
the one hand, if the probability mass function of 𝑋 is given by
1
𝑝 (0) = = 𝑝(1)
2
then
1 1 1
𝐸 [𝑋 ] = 0 ( ) + 1 ( ) =
2 2 2
On the other hand, if

p(0) = \frac{1}{3}, \qquad p(1) = \frac{2}{3}

then

E[X] = 0\left(\frac{1}{3}\right) + 1\left(\frac{2}{3}\right) = \frac{2}{3}

is a weighted average of the two possible values 0 and 1, where the value 1 is given twice as much weight as the value 0, since p(1) = 2p(0).
Example 4.6: Find 𝐸[𝑋], where 𝑋 is the outcome when we roll a fair die.
Solution. Since p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6, we obtain

E[X] = (1 + 2 + 3 + 4 + 5 + 6)\cdot\frac{1}{6} = \frac{7}{2}
Example 4.7: A school class of 120 students is driven in 3 buses for a school trip.
There are 36 students in one of the buses, 40 in another, and 44 in the third bus.
When the buses arrive, one of the 120 students is randomly chosen. Let 𝑋 denote the
number of students on the bus of that randomly chosen student, and find 𝐸[𝑋].
Solution. Since the randomly chosen student is equally likely to be any of the 120 students, it follows that
P{X = 36} = 36/120, P{X = 40} = 40/120, P{X = 44} = 44/120
Hence,

E[X] = 36\cdot\frac{36}{120} + 40\cdot\frac{40}{120} + 44\cdot\frac{44}{120} = \frac{1296 + 1600 + 1936}{120} = \frac{4832}{120} \approx 40.27
4.5 EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE
Suppose that we are given a discrete random variable along with its probability mass
function and that we want to compute the expected value of some function of 𝑋, say,
𝑔(𝑋). How can we accomplish this? One way is as follows: Since 𝑔(𝑋) is itself a
discrete random variable, it has a probability mass function, which can be determined
from the probability mass function of X. Once we have determined the probability
mass function of 𝑔(𝑋), we can compute 𝐸[𝑔(𝑋)] by using the definition of expected
value.
Example 4.8: Let X denote a random variable that takes on any of the values −1, 0, and 1 with respective probabilities
P{X = −1} = 0.2, P{X = 0} = 0.5, P{X = 1} = 0.3
Compute E[X²].
Solution. Let Y = X². Then the probability mass function of Y is given by
P{Y = 1} = P{X = −1} + P{X = 1} = 0.5
P{Y = 0} = P{X = 0} = 0.5
Hence,
E[X²] = E[Y] = 1(0.5) + 0(0.5) = 0.5
Note that
0.5 = E[X²] ≠ (E[X])² = (0.1)² = 0.01
Proposition 4.1: If X is a discrete random variable that takes on one of the values x_i, i ≥ 1, with respective probabilities p(x_i), then, for any real-valued function g,

E[g(X)] = \sum_i g(x_i) p(x_i)

Before proving this proposition, let us check that it is in accord with the results of the previous example. Applying it to that example yields
E[X²] = (−1)²(0.2) + 0²(0.5) + 1²(0.3) = 0.2 + 0.3 = 0.5
Example 4.9: A product that is sold seasonally yields a net profit of b dollars for each unit sold and a net loss of ℓ dollars for each unit left unsold when the season ends.
The number of units of the product that are ordered at a specific department store
during any season is a random variable having probability mass function p(i), i ≥ 0. If
the store must stock this product in advance, determine the number of units the
store should stock so as to maximize its expected profit.
Solution. Let X denote the number of units ordered. If s units are stocked, then the profit, call it P_s(X), can be expressed as

P_s(X) = \begin{cases} bX - (s-X)\ell, & \text{if } X \le s \\ sb, & \text{if } X > s \end{cases}

Hence, the expected profit equals

E[P_s(X)] = \sum_{i=0}^{s} [bi - (s-i)\ell]\, p(i) + \sum_{i=s+1}^{\infty} sb\, p(i) = sb + (b+\ell) \sum_{i=0}^{s} (i-s)\, p(i)

To determine the optimum value of s, let us investigate what happens to the profit when we increase s by 1 unit. By substitution, we see that the expected profit in this case is given by

E[P_{s+1}(X)] = b(s+1) + (b+\ell) \sum_{i=0}^{s+1} (i-s-1)\, p(i) = b(s+1) + (b+\ell) \sum_{i=0}^{s} (i-s-1)\, p(i)

Therefore,

E[P_{s+1}(X)] - E[P_s(X)] = b - (b+\ell) \sum_{i=0}^{s} p(i)

Thus, stocking s + 1 units is better than stocking s units whenever

\sum_{i=0}^{s} p(i) < \frac{b}{b+\ell} \qquad (4.1)

Because the left-hand side of (4.1) is increasing in s while the right-hand side is constant, the inequality will be satisfied for all values of s ≤ s*, where s* is the largest value of s satisfying (4.1). Since

E[P_0(X)] < \cdots < E[P_{s^*}(X)] < E[P_{s^*+1}(X)] > E[P_{s^*+2}(X)] > \cdots

it follows that stocking s* + 1 units of the product leads to a maximum expected profit.
Corollary 4.1: If a and b are constants, then E[aX + b] = aE[X] + b.
Proof:

E[aX + b] = \sum_{x:\,p(x)>0} (ax + b)\, p(x) = a \sum_x x\, p(x) + b \sum_x p(x) = aE[X] + b
The expected value of a random variable 𝑋, 𝐸[𝑋], is also referred to as the mean or
the first moment of 𝑋. The quantity 𝐸[𝑋 𝑛 ], 𝑛 ≥ 1, is called the n-th moment of 𝑋. By
the proposition above, we note that
E[X^n] = \sum_{x:\,p(x)>0} x^n p(x)
4.6 VARIANCE
Because we expect X to take on values around its mean E[X], a reasonable way of measuring the possible variation of X is to look at how far apart X is from its mean, on the average. One possible way to measure this variation would be to consider the quantity E[|X − μ|], where μ = E[X]. However, it turns out to be mathematically inconvenient to deal with this quantity, so a more tractable quantity is usually considered, namely, the expectation of the square of the difference between X and its mean.
Definition: If X is a random variable with mean μ, then the variance of X, denoted by Var(X), is defined by
Var(X) = E[(X − μ)²]
An alternative formula is derived as follows:

\mathrm{Var}(X) = \sum_x (x-\mu)^2 p(x) = \sum_x (x^2 - 2\mu x + \mu^2)\, p(x) = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2

That is,
Var(X) = E[X²] − (E[X])²
Example 4.10: Calculate 𝑉𝑎𝑟(𝑋 ) if 𝑋 represents the outcome when a fair die is rolled.
Solution. It was shown in Example 4.6 that E[X] = 7/2. Also,

E[X^2] = \frac{1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2}{6} = \frac{91}{6}

Hence,

\mathrm{Var}(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}
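Mean and variance of any finite pmf follow the same recipe. A Python sketch that reproduces E[X] = 7/2 and Var(X) = 35/12 for the fair die:

```python
# Exact mean and variance of a discrete pmf using rational arithmetic.
from fractions import Fraction

pmf = {i: Fraction(1, 6) for i in range(1, 7)}   # fair die
mean = sum(x * p for x, p in pmf.items())
var = sum(x**2 * p for x, p in pmf.items()) - mean**2
print(mean, var)  # 7/2 35/12
```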
A useful identity is that, for any constants a and b,
Var(aX + b) = a² Var(X)
To prove this equality, let μ = E[X] and note from Corollary 4.1 that E[aX + b] = aμ + b. Therefore,

\mathrm{Var}(aX + b) = E[(aX + b - a\mu - b)^2] = E[a^2(X-\mu)^2] = a^2 E[(X-\mu)^2] = a^2\,\mathrm{Var}(X)
Remark: The square root of Var(X) is called the standard deviation of X, and we denote it by SD(X). That is,

SD(X) = \sqrt{\mathrm{Var}(X)}
4.7 THE BERNOULLI AND BINOMIAL RANDOM VARIABLES
Suppose that a trial whose outcome can be classified as either a success or a failure is performed. If we let X = 1 when the outcome is a success and X = 0 when it is a failure, then X is said to be a Bernoulli random variable, with probability mass function
p(0) = P{X = 0} = 1 − p
p(1) = P{X = 1} = p
where p, 0 ≤ p ≤ 1, is the probability that the trial is a success.
Suppose now that 𝑛 independent trials, each of which results in a success with
probability 𝑝 and in a failure with probability 1 − 𝑝, are to be performed. If 𝑋
represents the number of successes that occur in the 𝑛 trials, then 𝑋 is said to be a
binomial random variable with parameters (𝑛, 𝑝). Thus, a Bernoulli random variable
is just a binomial random variable with parameters (1, 𝑝).
The probability mass function of a binomial random variable with parameters (n, p) is given by

p(i) = \binom{n}{i} p^i (1-p)^{n-i}, \qquad i = 0, 1, \ldots, n
Note that, by the binomial theorem, the probabilities sum to 1; that is,

\sum_{i=0}^{n} p(i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = [p + (1-p)]^n = 1
4.7.1 Properties of Binomial Random Variables
The mean and the second moment of a binomial random variable with parameters (n, p) are
E[X] = np
E[X²] = np[(n − 1)p + 1]
so that
Var(X) = E[X²] − (E[X])² = np[(n − 1)p + 1] − (np)² = np(1 − p)
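The moment formulas can be checked numerically against the pmf. A Python sketch with arbitrary parameters n = 10, p = 0.3 (math.comb assumes Python 3.8+):

```python
# Binomial pmf and its first two moments, checked numerically.
from math import comb

def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.3
mean = sum(i * binom_pmf(i, n, p) for i in range(n + 1))
second = sum(i**2 * binom_pmf(i, n, p) for i in range(n + 1))
print(mean, n * p)                        # both ~3.0
print(second - mean**2, n * p * (1 - p))  # both ~2.1
```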
4.8 THE POISSON RANDOM VARIABLE
A random variable X that takes on one of the values 0, 1, 2, ... is said to be a Poisson random variable with parameter λ, λ > 0, if its probability mass function is given by

p(i) = P\{X = i\} = e^{-\lambda}\,\frac{\lambda^i}{i!}, \qquad i = 0, 1, 2, \ldots
This defines a probability mass function, since

\sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1
The Poisson random variable has a tremendous range of applications in diverse areas
because it may be used as an approximation for a binomial random variable with
parameters (𝑛, 𝑝) when 𝑛 is large and 𝑝 is small enough so that 𝑛𝑝 is of moderate
size.
Some examples of random variables that generally obey the Poisson probability law are as follows:
1. The number of misprints on a page of a book
2. The number of people in a community who survive to age 100
3. The number of wrong telephone numbers that are dialed in a day
4. The number of packages of dog biscuits sold in a particular store each day
5. The number of customers entering a post office on a given day
6. The number of vacancies occurring during a year in the federal judicial system
7. The number of α-particles discharged in a fixed period of time from some radioactive material
Example 4.11: Suppose that the number of typographical errors on a single page of this book has a Poisson distribution with parameter λ = 1/2. Calculate the probability that there is at least one error on this page.
Solution. Letting X denote the number of errors on this page, we have
P\{X \ge 1\} = 1 - P\{X = 0\} = 1 - e^{-1/2} \approx 0.393
The expected value and the variance of a Poisson random variable are
both equal to its parameter 𝝀.
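Both the Poisson approximation to the binomial and the computation of Example 4.11 can be seen numerically. A Python sketch (the parameters n = 1000, p = 0.002 are arbitrary):

```python
# Poisson pmf and the Poisson approximation to the binomial.
from math import comb, exp, factorial

def poisson_pmf(i, lam):
    return exp(-lam) * lam**i / factorial(i)

n, p = 1000, 0.002            # n large, p small, lambda = np = 2
lam = n * p
for i in range(5):
    binom = comb(n, i) * p**i * (1 - p)**(n - i)
    print(i, round(binom, 5), round(poisson_pmf(i, lam), 5))  # nearly equal

# Example 4.11: P{at least one error} with lambda = 1/2
print(1 - poisson_pmf(0, 0.5))  # ~0.393
```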
SUMMARY
A random variable whose set of possible values is either finite or countably infinite is
called discrete. If X is a discrete random variable, then the function
p(x) = P{X = x}
is called its probability mass function. Also, the quantity

E[X] = \sum_{x:\,p(x)>0} x\, p(x)
is called the expected value of X. E[X] is also commonly called the mean or the
expectation of X.
If g is a real-valued function, then the expected value of g(X) is given by

E[g(X)] = \sum_{x:\,p(x)>0} g(x)\, p(x)
The variance, which is equal to the expected square of the difference between X and its expected value, is a measure of the spread of the possible values of X. A useful identity is
Var(X) = E[X²] − (E[X])²
We now note some common types of discrete random variables. The random variable X whose probability mass function is given by

p(i) = \binom{n}{i} p^i (1-p)^{n-i}, \qquad i = 0, \ldots, n

is said to be a binomial random variable with parameters n and p. Its mean and variance are
E[X] = np, Var(X) = np(1 − p)
The random variable X whose probability mass function is given by

p(i) = \frac{e^{-\lambda}\lambda^i}{i!}, \qquad i \ge 0

is said to be a Poisson random variable with parameter λ. Its mean and variance are both equal to λ:
E[X] = Var(X) = λ
An important property of the expected value is that the expected value of a sum of
random variables is equal to the sum of their expected values. That is,
E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]
CHAPTER 5.
CONTINUOUS RANDOM VARIABLES
5.1 INTRODUCTION
In this chapter, we consider random variables whose set of possible values is uncountable. We say that X is a continuous random variable if there exists a nonnegative function f, defined for all real x ∈ (−∞, ∞), having the property that, for any set B of real numbers,

P\{X \in B\} = \int_B f(x)\, dx

The function f is called the probability density function of the random variable X.
In words, the above states that the probability that X will be in B may be obtained by integrating the probability density function over the set B. Since X must assume some value, f must satisfy

1 = P\{X \in (-\infty, \infty)\} = \int_{-\infty}^{\infty} f(x)\, dx

All probability statements about X can be answered in terms of f. For instance, letting B = [a, b], we obtain

P\{a \le X \le b\} = \int_a^b f(x)\, dx

If we let a = b in the preceding, then

P\{X = a\} = \int_a^a f(x)\, dx = 0

In words, the probability that a continuous random variable assumes any fixed value is zero.
Example 5.1: Suppose that X is a continuous random variable whose probability density function is given by

f(x) = \begin{cases} C(4x - 2x^2), & 0 < x < 2 \\ 0, & \text{otherwise} \end{cases}

(a) What is the value of C? (b) Find P{X > 1}.
Solution. (a) Since f is a probability density function, we must have \int_{-\infty}^{\infty} f(x)\, dx = 1, implying that

C \int_0^2 (4x - 2x^2)\, dx = 1

or

C \left[ 2x^2 - \frac{2x^3}{3} \right]_{x=0}^{x=2} = 1

or

C = \frac{3}{8}

(b) Hence,

P\{X > 1\} = \int_1^{\infty} f(x)\, dx = \frac{3}{8} \int_1^2 (4x - 2x^2)\, dx = \frac{1}{2}
Example 5.2: The amount of time in hours that a computer functions before
breaking down is a continuous random variable with probability density function
given by
f(x) = \begin{cases} \lambda e^{-x/100}, & x \ge 0 \\ 0, & x < 0 \end{cases}
What is the probability that
(a) a computer will function between 50 and 150 hours before breaking down?
(b) it will function for fewer than 100 hours?
Solution. (a) Noting that
1 = \int_{-\infty}^{\infty} f(x)\, dx = \lambda \int_0^{\infty} e^{-x/100}\, dx
we obtain,
1 = -\lambda(100)\, e^{-x/100} \Big|_0^{\infty} = 100\lambda \quad \text{or} \quad \lambda = \frac{1}{100}
Hence, the probability that a computer will function between 50 and 150 hours
before breaking down is given by
P\{50 < X < 150\} = \int_{50}^{150} \frac{1}{100}\, e^{-x/100}\, dx = -e^{-x/100}\Big|_{50}^{150} = e^{-1/2} - e^{-3/2} \approx 0.384
(b) Similarly,
P\{X < 100\} = \int_0^{100} \frac{1}{100}\, e^{-x/100}\, dx = -e^{-x/100}\Big|_0^{100} = 1 - e^{-1} \approx 0.633
Example 5.3: The lifetime in hours of a certain kind of radio tube is a random
variable having a probability density function given by
f(x) = \begin{cases} 0, & x \le 100 \\ \dfrac{100}{x^2}, & x > 100 \end{cases}
What is the probability that exactly 2 of 5 such tubes in a radio set will have to be
replaced within the first 150 hours of operation? Assume that the events Ei, i = 1,
2, 3, 4, 5, that the ith such tube will have to be replaced within this time are
independent.
Solution. From the statement of the question, we have

P(E_i) = \int_{100}^{150} f(x)\, dx = 100 \int_{100}^{150} x^{-2}\, dx = \frac{1}{3}

Hence, from the independence of the events E_i, it follows that the desired probability is

\binom{5}{2}\left(\frac{1}{3}\right)^2 \left(\frac{2}{3}\right)^3 = \frac{80}{243}
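Both P(E_i) = 1/3 and the final binomial answer can be checked with crude numerical integration. A Python sketch using a midpoint rule (the step count is an arbitrary choice):

```python
# Numerical check of Example 5.3, with no external libraries.
from math import comb

def density(x):
    return 0.0 if x <= 100 else 100.0 / x**2

# Midpoint rule for P{100 < X < 150}.
n_steps = 10_000
h = 50.0 / n_steps
p_replace = sum(density(100 + (j + 0.5) * h) for j in range(n_steps)) * h
print(p_replace)                                        # ~1/3
print(comb(5, 2) * p_replace**2 * (1 - p_replace)**3)   # ~80/243 ~ 0.329
```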
The relationship between the cumulative distribution F and the probability density f is expressed by

F(a) = P\{X \in (-\infty, a]\} = \int_{-\infty}^{a} f(x)\, dx

Differentiating both sides of the preceding equation yields

\frac{d}{da} F(a) = f(a)

That is, the density is the derivative of the cumulative distribution function.
5.2 EXPECTATION AND VARIANCE OF CONTINUOUS RANDOM VARIABLES
Recall that for a discrete random variable X, the expected value is defined by

E[X] = \sum_x x\, P\{X = x\}

If X is a continuous random variable having probability density function f(x), then, analogously,

E[X] = \int_{-\infty}^{\infty} x f(x)\, dx
Example 5.4: Find E[X] when the density function of X is

f(x) = \begin{cases} 2x, & \text{if } 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}

Solution:

E[X] = \int x f(x)\, dx = \int_0^1 2x^2\, dx = \frac{2}{3}
Example 5.5: If the density function of X is

f(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}

find E[e^X].
Solution. Let Y = e^X. We first find F_Y, the distribution function of Y. For 1 ≤ x ≤ e,
F_Y(x) = P{Y ≤ x} = P{e^X ≤ x} = P{X ≤ log(x)} = ∫_0^{log(x)} f(y) dy = log(x)
Differentiating gives the density of Y:
f_Y(x) = 1/x,  1 ≤ x ≤ e
Hence,
E[e^X] = E[Y] = ∫_{−∞}^{∞} x f_Y(x) dx = ∫_1^e dx = e − 1
Alternatively, using the identity E[g(X)] = ∫ g(x) f(x) dx, we could have computed directly
E[e^X] = ∫_0^1 e^x dx = e − 1,  since f(x) = 1 for 0 < x < 1
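A Monte Carlo sketch (the sample size and seed are arbitrary choices) agrees with the value e − 1 obtained by both methods:

import random
from math import e, exp
from statistics import mean

random.seed(0)
estimate = mean(exp(random.random()) for _ in range(200_000))
print(estimate, e - 1)    # both close to 1.71828...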
Example 5.6: A stick of length 1 is split at a point U that is uniformly distributed over (0, 1). Determine the expected length of the piece that contains a given point p, 0 ≤ p ≤ 1.
Solution. Let L_p(U) denote the length of the substick that contains the point p, and
note that
L_p(U) = { 1 − U,  U < p
         { U,      U ≥ p
E[L_p(U)] = ∫_0^1 L_p(u) du = ∫_0^p (1 − u) du + ∫_p^1 u du
          = 1/2 − (1 − p)²/2 + 1/2 − p²/2
          = 1/2 + p(1 − p)
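The identity E[L_p(U)] = 1/2 + p(1 − p) can be checked by simulation; in the Python sketch below, the choice p = 0.3 is arbitrary:

import random
from statistics import mean

def substick_length(p, u):
    """Length of the piece containing the point p when the stick is split at u."""
    return 1 - u if u < p else u

random.seed(0)
p = 0.3
estimate = mean(substick_length(p, random.random()) for _ in range(200_000))
print(estimate, 0.5 + p * (1 - p))    # both approximately 0.71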
Example 5.7: Calculate Var(X) when the density function of X is
f(x) = { 2x,  if 0 ≤ x ≤ 1
       { 0,   otherwise
Solution. We first compute E[X²]:
E[X²] = ∫ x² f(x) dx
= ∫_0^1 2x³ dx = 1/2
Hence, since E[X] = 2/3 from Example 5.4,
Var(X) = 1/2 − (2/3)² = 1/18
For constants a and b,
Var(aX + b) = a² Var(X)
The proof mimics the one given for discrete random variables.
A random variable is said to be uniformly distributed over the interval (0, 1) if its
probability density function is given by
f(x) = { 1,  0 < x < 1
       { 0,  otherwise
In general, we say that 𝑋 is a uniform random variable on the interval (𝛼, 𝛽) if the
probability density function of 𝑋 is given by
f(x) = { 1/(β − α),  if α < x < β
       { 0,          otherwise
Since F(a) = ∫_{−∞}^a f(x) dx, it follows from the above equation that the
distribution function of a uniform random variable on the interval (𝛼, 𝛽) is given
by
F(a) = { 0,                a ≤ α
       { (a − α)/(β − α),  α < a < β
       { 1,                a ≥ β
[Figure: graphs of (a) the density f(a) and (b) the distribution function F(a) for a uniform (α, β) random variable.]
Example 5.8: Let 𝑋 be uniformly distributed over (𝛼, 𝛽). Find (a) 𝐸 [𝑋 ] and (b)
Var(𝑋).
Solution. (a)
E[X] = ∫_{−∞}^{∞} x f(x) dx = ∫_α^β x/(β − α) dx = (β² − α²)/(2(β − α)) = (β + α)/2
In words, the expected value of a uniform random variable is the midpoint of its interval.
(b) To find Var(X), we first calculate E[X²]:
E[X²] = ∫_α^β x²/(β − α) dx = (β³ − α³)/(3(β − α)) = (β² + αβ + α²)/3
Hence,
Var(X) = E[X²] − (E[X])² = (β² + αβ + α²)/3 − (α + β)²/4 = (β − α)²/12
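A quick simulation sketch (the endpoints α = 2 and β = 10 are arbitrary choices) confirms both formulas:

import random
from statistics import mean, pvariance

random.seed(0)
alpha, beta = 2.0, 10.0
xs = [random.uniform(alpha, beta) for _ in range(200_000)]
print(mean(xs), (alpha + beta) / 2)              # both near 6.0
print(pvariance(xs), (beta - alpha) ** 2 / 12)   # both near 5.33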
Example 5.9: If X is uniformly distributed over (0, 10), calculate the probability
that (a) 𝑋 < 3, (b) 𝑋 > 6, and (c) 3 < 𝑋 < 8.
Solution. (a) P{X < 3} = ∫_0^3 (1/10) dx = 3/10
(b) P{X > 6} = ∫_6^{10} (1/10) dx = 4/10
(c) P{3 < X < 8} = ∫_3^8 (1/10) dx = 1/2
Example 5.10: Buses arrive at a specified stop at 15-minute intervals starting at 7 A.M., that is, at 7, 7:15, 7:30, and so on. If a passenger arrives at the stop at a time that is uniformly distributed between 7 and 7:30, find the probability that he waits less than 5 minutes for a bus.
Solution. Let X denote the number of minutes past 7 that the passenger arrives at the stop. Since X is a uniform random variable over the interval (0, 30), the passenger will have to wait less than 5 minutes if (and only if) he arrives between 7:10 and 7:15 or between 7:25 and 7:30. Hence, the desired probability is
P{10 < X < 15} + P{25 < X < 30} = ∫_{10}^{15} (1/30) dx + ∫_{25}^{30} (1/30) dx = 1/3
We say that X is a normal random variable with parameters μ and σ² if its density function is given by
f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)},  −∞ < x < ∞
Example 5.11: Find (a) 𝐸 [𝑋 ] and (b) Var(𝑋) when 𝑋 is a normal random variable
with parameters 𝜇 and 𝜎 2 .
Solution: (a)
E[X] = (1/(σ√(2π))) ∫_{−∞}^{∞} x e^{−(x−μ)²/(2σ²)} dx
Writing 𝑥 as (𝑥 − µ) + µ yields
E[X] = (1/(σ√(2π))) ∫_{−∞}^{∞} (x − μ) e^{−(x−μ)²/(2σ²)} dx + μ · (1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(x−μ)²/(2σ²)} dx
With the substitution y = x − μ in the first integral, this becomes
E[X] = (1/(σ√(2π))) ∫_{−∞}^{∞} y e^{−y²/(2σ²)} dy + μ ∫_{−∞}^{∞} f(x) dx
where 𝑓(𝑥) is the normal density. By symmetry, the first integral must be 0, so
E[X] = μ ∫_{−∞}^{∞} f(x) dx = μ
(b) Since E[X] = μ, we have
Var(X) = E[(X − μ)²] = (1/(σ√(2π))) ∫_{−∞}^{∞} (x − μ)² e^{−(x−μ)²/(2σ²)} dx
Substituting y = (x − μ)/σ yields
Var(X) = (σ²/√(2π)) ∫_{−∞}^{∞} y² e^{−y²/2} dy
Integrating by parts (with u = y and dv = y e^{−y²/2} dy) gives
Var(X) = (σ²/√(2π)) [−y e^{−y²/2} |_{−∞}^{∞} + ∫_{−∞}^{∞} e^{−y²/2} dy]
= (σ²/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} dy = σ²
since the remaining integral equals √(2π).
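As a check (a sketch; the values μ = 3 and σ = 2 are arbitrary), sampling with random.gauss gives a sample mean near μ and a sample variance near σ²:

import random
from statistics import mean, pvariance

random.seed(0)
mu, sigma = 3.0, 2.0
xs = [random.gauss(mu, sigma) for _ in range(200_000)]
print(mean(xs))        # near μ = 3.0
print(pvariance(xs))   # near σ² = 4.0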
If X is normal with parameters μ and σ², then Z = (X − μ)/σ is normally distributed with parameters 0 and 1; such a random variable is said to be a standard normal random variable. It is customary to denote its distribution function by Φ(x); that is,
Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy
The values of Φ(x) for nonnegative x are given in a table. For negative values of x, Φ(x) can be obtained from the symmetry relation
Φ(−x) = 1 − Φ(x),  −∞ < x < ∞
Example 5.12: If X is a normal random variable with parameters μ = 3 and σ² = 9, find (a) P{2 < X < 5}, (b) P{X > 0}, and (c) P{|X − 3| > 6}.
Solution. (a) Writing Z = (X − 3)/3 for the standardized variable,
P{2 < X < 5} = P{(2 − 3)/3 < Z < (5 − 3)/3} = P{−1/3 < Z < 2/3}
= Φ(2/3) − Φ(−1/3) = Φ(2/3) − [1 − Φ(1/3)] ≈ 0.3779
(b)
P{X > 0} = P{(X − 3)/3 > (0 − 3)/3} = P{Z > −1} = 1 − Φ(−1) = Φ(1) ≈ 0.8413
(c)
P{|X − 3| > 6} = P{X > 9} + P{X < −3}
             = P{(X − 3)/3 > (9 − 3)/3} + P{(X − 3)/3 < (−3 − 3)/3}
             = P{Z > 2} + P{Z < −2} = 2[1 − Φ(2)] ≈ 0.0456
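Since Φ(x) = (1 + erf(x/√2))/2, the three answers can be reproduced in Python without a normal table (a standard-library sketch, not part of the original notes):

from math import erf, sqrt

def Phi(x):
    """Standard normal distribution function via the error function."""
    return (1 + erf(x / sqrt(2))) / 2

print(Phi(2/3) - Phi(-1/3))   # (a) -> about 0.378
print(1 - Phi(-1))            # (b) -> about 0.8413
print(2 * (1 - Phi(2)))       # (c) -> about 0.0455 (table value 0.0456)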
Example 5.13: The instructor often uses the test scores to estimate the normal
parameters 𝜇 and 𝜎 2 and then assigns the letter grade A to those whose test score
is greater than 𝜇 + 𝜎, B to those whose score is between 𝜇 and 𝜇 + 𝜎, C to those
whose score is between 𝜇 − 𝜎 and 𝜇, D to those whose score is between 𝜇 − 2𝜎
and 𝜇 − 𝜎, and F to those getting a score below 𝜇 − 2𝜎. (This strategy is
sometimes referred to as grading “on the curve.”) Since
P{X > μ + σ} = P{(X − μ)/σ > 1} = 1 − Φ(1) ≈ 0.1587
P{μ < X < μ + σ} = P{0 < (X − μ)/σ < 1} = Φ(1) − Φ(0) ≈ 0.3413
P{μ − σ < X < μ} = P{−1 < (X − μ)/σ < 0} = Φ(0) − Φ(−1) ≈ 0.3413
P{μ − 2σ < X < μ − σ} = P{−2 < (X − μ)/σ < −1} = Φ(2) − Φ(1) ≈ 0.1359
it follows that approximately 16 percent of the class receives an A, 34 percent a B, 34 percent a C, and 14 percent a D, while about 2 percent fail.
Example 5.14: A binary message is to be transmitted from location A to location B, with the value 2 being sent when the message is 1 and the value −2 when it is 0. If R denotes the value received at B, then R = x + N, where x is the value sent and N is the channel noise; the message is decoded as 1 if R ≥ 0.5 and as 0 otherwise. Because the channel noise is often normally distributed, we determine the error probabilities when N is a standard normal random variable.
Solution. Two types of errors can occur: One is that the message 1 can be
incorrectly determined to be 0, and the other is that 0 can be incorrectly
determined to be 1. The first type of error will occur if the message is 1 and
2 + 𝑁 <0.5, whereas the second will occur if the message is 0 and −2 + 𝑁 ≥
0.5. Hence,
P{error | message is 1} = P{2 + N < 0.5} = P{N < −1.5} = 1 − Φ(1.5) ≈ 0.0668
and
P{error | message is 0} = P{−2 + N ≥ 0.5} = P{N ≥ 2.5} = 1 − Φ(2.5) ≈ 0.0062
A continuous random variable whose probability density function is given, for some λ > 0, by
f(x) = { λe^{−λx},  if x ≥ 0
       { 0,         if x < 0
is said to be an exponential random variable with parameter λ. Its distribution function is
F(a) = P{X ≤ a} = ∫_0^a λe^{−λx} dx = −e^{−λx} |_0^a = 1 − e^{−λa},  a ≥ 0
Example 5.15: Let X be an exponential random variable with parameter λ, so that
f(x) = { λe^{−λx},  if x ≥ 0
       { 0,         if x < 0
Calculate (a) E[X] and (b) Var(X).
Solution. (a) For n > 0, we have
E[X^n] = ∫_0^∞ x^n λe^{−λx} dx
Integrating by parts (with u = x^n and dv = λe^{−λx} dx) yields
E[X^n] = −x^n e^{−λx} |_0^∞ + ∫_0^∞ e^{−λx} n x^{n−1} dx
= 0 + (n/λ) ∫_0^∞ λe^{−λx} x^{n−1} dx
= (n/λ) E[X^{n−1}]
Letting n = 1 gives
E[X] = 1/λ
Letting n = 2 gives
E[X²] = (2/λ) E[X] = 2/λ²
(b) Hence,
Var(X) = 2/λ² − (1/λ)² = 1/λ²
Thus, the mean of the exponential is the reciprocal of its parameter 𝜆, and the
variance is the mean squared.
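A simulation sketch with random.expovariate (the rate λ = 0.5 is an arbitrary choice) illustrates both formulas:

import random
from statistics import mean, pvariance

random.seed(0)
lam = 0.5
xs = [random.expovariate(lam) for _ in range(200_000)]
print(mean(xs), 1 / lam)           # both near 2.0 (= 1/λ)
print(pvariance(xs), 1 / lam**2)   # both near 4.0 (= 1/λ²)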
Example 5.16: Suppose that the length of a phone call in minutes is an exponential random variable with parameter λ = 1/10. Someone arrives immediately ahead of you at a public telephone booth. Find the probability that you will have to wait (a) more than 10 minutes, and (b) between 10 and 20 minutes.
Solution. Let X denote the length of the call made by the person in the booth. Then the desired probabilities are
(a)
P{X > 10} = 1 − F(10) = e^{−1} ≈ 0.368
(b)
P{10 < X < 20} = F(20) − F(10) = e^{−1} − e^{−2} ≈ 0.233
Example 5.17: Suppose that the number of miles that a car can run before its
battery wears out is exponentially distributed with an average value of 10,000
miles. If a person desires to take a 5000-mile trip, what is the probability that he
or she will be able to complete the trip without having to replace the car battery?
What can be said when the distribution is not exponential?
Solution. By the memoryless property of the exponential distribution, the remaining lifetime (in thousands of miles) of the battery is exponential with parameter λ = 1/10. Hence, the desired probability is
P{remaining lifetime > 5} = 1 − F(5) = e^{−5λ} = e^{−1/2} ≈ 0.607
However, if the lifetime distribution F is not exponential, then the relevant probability is
P{lifetime > t + 5 | lifetime > t} = (1 − F(t + 5)) / (1 − F(t))
where 𝑡 is the number of miles that the battery had been in use prior to the start
of the trip. Therefore, if the distribution is not exponential, additional information
is needed (namely, the value of 𝑡) before the desired probability can be
calculated.
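The memoryless property invoked above can itself be checked by simulation; in this Python sketch, the rate λ = 0.1 (lifetimes in thousands of miles) and the age t = 7 are arbitrary choices:

import random

random.seed(0)
lam, t = 0.1, 7.0
xs = [random.expovariate(lam) for _ in range(400_000)]
survivors = [x for x in xs if x > t]
conditional = sum(x > t + 5 for x in survivors) / len(survivors)
unconditional = sum(x > 5 for x in xs) / len(xs)
print(conditional, unconditional)   # both near e^(-0.5) ≈ 0.607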
A random variable whose density is given by
f(x) = (1/2) λe^{−λ|x|},  −∞ < x < ∞
is said to have a Laplace (or double exponential) distribution. Its distribution function is
F(x) = { (1/2) ∫_{−∞}^x λe^{λy} dy,                             x ≤ 0
       { (1/2) ∫_{−∞}^0 λe^{λy} dy + (1/2) ∫_0^x λe^{−λy} dy,   x > 0
     = { (1/2) e^{λx},        x ≤ 0
       { 1 − (1/2) e^{−λx},   x > 0
Example 5.18: Consider again the same example, which supposes that a binary
message is to be transmitted from A to B, with the value 2 being sent when the
message is 1 and −2 when it is 0. However, suppose now that, rather than being a
standard normal random variable, the channel noise 𝑁 is a Laplacian random
variable with parameter 𝜆 = 1. Suppose again that if 𝑅 is the value received at
location B, then the message is decoded as follows: if R ≥ 0.5, conclude that 1 was sent; if R < 0.5, conclude that 0 was sent.
In this case, where the noise is Laplacian with parameter λ = 1, the two types of
errors will have probabilities given by
P{error | message is 1} = P{N < −1.5} = (1/2) e^{−1.5} ≈ 0.1116
P{error | message is 0} = P{N ≥ 2.5} = (1/2) e^{−2.5} ≈ 0.041
On comparing this with the results of Example 5.14, we see that the error
probabilities are higher when the noise is Laplacian with 𝜆 = 1 than when it is a
standard normal variable.
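The comparison can be summarized in a few lines of Python (a sketch; Φ is expressed through math.erf):

from math import erf, exp, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return (1 + erf(x / sqrt(2))) / 2

# Message 1 fails when N < -1.5; message 0 fails when N >= 2.5.
print(1 - Phi(1.5), 1 - Phi(2.5))         # standard normal noise: ~0.0668, ~0.0062
print(0.5 * exp(-1.5), 0.5 * exp(-2.5))   # Laplacian noise, λ = 1: ~0.1116, ~0.0410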
Consider a positive continuous random variable X, with distribution function F and density f, that we interpret as the lifetime of some item. The hazard rate (or failure rate) function λ(t) of F is defined by
λ(t) = f(t) / (1 − F(t))
To interpret 𝜆(𝑡), suppose that the item has survived for a time 𝑡 and we desire
the probability that it will not survive for an additional time d𝑡. That is, consider
𝑃{𝑋 ∈ (𝑡, 𝑡 + d𝑡 )|𝑋 > 𝑡}. Now,
P{X ∈ (t, t + dt) | X > t} = P{X ∈ (t, t + dt), X > t} / P{X > t}
                          = P{X ∈ (t, t + dt)} / P{X > t}
                          ≈ f(t) dt / (1 − F(t)) = λ(t) dt
Thus, 𝜆(𝑡) represents the conditional probability intensity that a 𝑡-unit-old item
will fail.
For the exponential distribution,
λ(t) = f(t) / (1 − F(t)) = λe^{−λt} / e^{−λt} = λ
Thus, the failure rate function for the exponential distribution is constant. The
parameter 𝜆 is often referred to as the rate of the distribution.
It turns out that the failure rate function 𝜆(𝑡) uniquely determines the
distribution 𝐹. To prove this, note that, by definition,
λ(t) = ((d/dt) F(t)) / (1 − F(t))
Integrating both sides yields
log(1 − F(t)) = −∫_0^t λ(t) dt + k
or
1 − F(t) = e^k exp(−∫_0^t λ(t) dt)
Letting t = 0 shows that k = 0 (since F(0) = 0); thus,
F(t) = 1 − exp(−∫_0^t λ(t) dt)
For example, a random variable with a linear hazard rate function
λ(t) = a + bt
has, by the preceding formula, distribution function
F(t) = 1 − e^{−at − bt²/2}
and, upon differentiation, density
f(t) = (a + bt) e^{−(at + bt²/2)},  t ≥ 0
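The formula F(t) = 1 − exp(−∫_0^t λ(s) ds) can be tested numerically for this linear hazard rate; in the sketch below, the values a = 0.2, b = 0.05, and t = 3 are arbitrary, and the midpoint-rule integrator is a helper written here:

from math import exp

def F_from_hazard(hazard, t, n=10_000):
    """Distribution function recovered from a hazard rate function by numeric integration."""
    h = t / n
    cumulative = sum(hazard((k + 0.5) * h) for k in range(n)) * h
    return 1 - exp(-cumulative)

a, b, t = 0.2, 0.05, 3.0
print(F_from_hazard(lambda s: a + b * s, t))   # numeric reconstruction
print(1 - exp(-(a * t + b * t**2 / 2)))        # closed form above; the two agree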
Example 5.19: One often hears that the death rate of a person who smokes is, at
each age, twice that of a nonsmoker. What does this mean? Does it mean that a
nonsmoker has twice the probability of surviving a given number of years as does
a smoker of the same age?
Solution. If 𝜆𝑠 (𝑡) denotes the hazard rate of a smoker of age 𝑡 and 𝜆𝑛 (𝑡 ) that of a
nonsmoker of age t, then the statement at issue is equivalent to the statement
that
λ_s(t) = 2λ_n(t)
The probability that an A-year-old nonsmoker will survive until age B, A < B, is
P{A-year-old nonsmoker reaches age B}
  = P{nonsmoker's lifetime > B | nonsmoker's lifetime > A}
  = (1 − F_non(B)) / (1 − F_non(A))
  = exp{−∫_0^B λ_n(t) dt} / exp{−∫_0^A λ_n(t) dt}
  = exp{−∫_A^B λ_n(t) dt}
whereas the corresponding probability for a smoker is, by the same reasoning,
P{A-year-old smoker reaches age B} = exp{−∫_A^B λ_s(t) dt}
                                   = exp{−2 ∫_A^B λ_n(t) dt}
                                   = [exp{−∫_A^B λ_n(t) dt}]²
In other words, for two people of the same age, one of whom is a smoker and the
other a nonsmoker, the probability that the smoker survives to any given age is
the square (not one-half) of the corresponding probability for a nonsmoker. For
instance, if 𝜆𝑛 (𝑡) = 1/30, 50 ≤ 𝑡 ≤ 60, then the probability that a 50-year-old
nonsmoker reaches age 60 is 𝑒 −1/3 ≈ 0.7165, whereas the corresponding
probability for a smoker is e−2/3 ≈ 0.5134.
A random variable is said to have a gamma distribution with parameters (𝛼, 𝜆), λ
> 0, α > 0, if its density function is given by
f(x) = { λe^{−λx} (λx)^{α−1} / Γ(α),  x ≥ 0
       { 0,                           x < 0
where Γ(α), called the gamma function, is defined by
Γ(α) = ∫_0^∞ e^{−y} y^{α−1} dy
A random variable is said to have a Weibull distribution with parameters υ, α, and β if its distribution function is
F(x) = { 0,                          x ≤ υ
       { 1 − exp{−((x − υ)/α)^β},    x > υ
Differentiation yields the density
f(x) = { 0,                                              x ≤ υ
       { (β/α) ((x − υ)/α)^{β−1} exp{−((x − υ)/α)^β},    x > υ
A random variable is said to have a Cauchy distribution with parameter θ if its density is
f(x) = (1/π) · 1/(1 + (x − θ)²),  −∞ < x < ∞
A random variable is said to have a beta distribution with parameters (a, b) if its density is
f(x) = { x^{a−1} (1 − x)^{b−1} / B(a, b),  0 < x < 1
       { 0,                                otherwise
where
B(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx
Example 5.20: Let 𝑋 be uniformly distributed over (0, 1). We obtain the
distribution of the random variable 𝑌, defined by 𝑌 = 𝑋 𝑛 , as follows: For
0 ≤ 𝑦 ≤ 1,
𝐹𝑌 (𝑦) = 𝑃{𝑌 ≤ 𝑦}
= 𝑃 {𝑋 𝑛 ≤ 𝑦 }
= 𝑃{𝑋 ≤ 𝑦1/𝑛 }
= F_X(y^{1/n}) = y^{1/n}
Differentiating F_Y gives the density
f_Y(y) = { (1/n) y^{1/n − 1},  0 ≤ y ≤ 1
         { 0,                  otherwise
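A simulation sketch confirms the distribution function F_Y(y) = y^{1/n} (the choices n = 3 and y = 0.2 are arbitrary):

import random

random.seed(0)
n, y = 3, 0.2
samples = [random.random() ** n for _ in range(200_000)]
print(sum(s <= y for s in samples) / len(samples))   # empirical P{Y <= y}
print(y ** (1 / n))                                  # F_Y(y) = y^(1/n) ≈ 0.585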
Next, suppose Y = X², where X is a continuous random variable with distribution function F_X and density f_X. Then, for y ≥ 0,
F_Y(y) = P{Y ≤ y}
= 𝑃 {𝑋 2 ≤ 𝑦 }
= 𝑃{−√𝑦 ≤ 𝑋 ≤ √𝑦}
= 𝐹𝑋 (√𝑦) − 𝐹𝑋 (−√𝑦)
Differentiation yields
f_Y(y) = (1/(2√y)) [f_X(√y) + f_X(−√y)],  y ≥ 0
Similarly, if Y = |X|, then for y ≥ 0,
F_Y(y) = P{Y ≤ y} = P{|X| ≤ y} = P{−y ≤ X ≤ y} = F_X(y) − F_X(−y)
and differentiation gives
f_Y(y) = f_X(y) + f_X(−y),  y ≥ 0
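To illustrate the Y = X² formula concretely, the sketch below takes X to be standard normal (our choice of distribution, not the notes'), so that F_X is expressible through math.erf:

import random
from math import erf, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return (1 + erf(x / sqrt(2))) / 2

random.seed(0)
y = 1.5
samples = [random.gauss(0, 1) ** 2 for _ in range(200_000)]
print(sum(s <= y for s in samples) / len(samples))   # empirical P{X² <= y}
print(Phi(sqrt(y)) - Phi(-sqrt(y)))                  # formula: ≈ 0.7793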
More generally, suppose Y = g(X), where g is a strictly increasing, differentiable function. Then, for y in the range of g,
F_Y(y) = P{g(X) ≤ y} = P{X ≤ g^{−1}(y)} = F_X(g^{−1}(y))
Differentiation gives
f_Y(y) = f_X(g^{−1}(y)) · (d/dy) g^{−1}(y)
(For a strictly decreasing g, the derivative is replaced by its absolute value.)
When 𝑦 ≠ 𝑔(𝑥) for any 𝑥, then 𝐹𝑌 (𝑦) is either 0 or 1, and in either case 𝑓𝑌 (𝑦) =
0.
For instance, reconsidering Example 5.20 with g(x) = x^n, we have g^{−1}(y) = y^{1/n} and
(d/dy) g^{−1}(y) = (1/n) y^{1/n − 1}
Hence, from the theorem, we obtain
f_Y(y) = (1/n) y^{1/n − 1} f(y^{1/n}),  0 ≤ y ≤ 1
Similarly, if Y = X² where X is nonnegative (so that g(x) = x² is monotone on the range of X), the theorem gives
f_Y(y) = (1/(2√y)) f(√y)
SUMMARY
A random variable X is continuous if there is a nonnegative function f, called its probability density function, such that, for any set B,
P{X ∈ B} = ∫_B f(x) dx
If X is continuous, then its distribution function F is differentiable and
(d/dx) F(x) = f(x)
The expected value of a continuous random variable X is defined by
E[X] = ∫_{−∞}^{∞} x f(x) dx
A random variable X is said to be uniform over the interval (𝑎, 𝑏) if its probability
density function is given by
f(x) = { 1/(b − a),  a ≤ x ≤ b
       { 0,          otherwise
Its expected value and variance are
E[X] = (a + b)/2,   Var(X) = (b − a)²/12
A random variable X is said to be normal with parameters μ and σ² if its probability density function is
f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)},  −∞ < x < ∞
In this case, E[X] = μ and Var(X) = σ². If X is normal, then
Z = (X − μ)/σ
is normal with mean 0 and variance 1; such a random variable is called a standard normal random variable.
A random variable X is said to be exponential with parameter λ if its probability density function is
f(x) = { λe^{−λx},  x ≥ 0
       { 0,         otherwise
Its expected value and variance are
E[X] = 1/λ,   Var(X) = 1/λ²
A key property possessed only by exponential random variables is that they are
memoryless, in the sense that, for positive s and t,
P{X > s + t | X > t} = P{X > s}
If 𝑋 represents the life of an item, then the memoryless property states that, for
any 𝑡, the remaining life of a 𝑡-year-old item has the same probability distribution
as the life of a new item. Thus, one need not remember the age of an item to know
its distribution of remaining life.
The hazard rate function λ(t) of a nonnegative random variable with distribution function F and density f is defined by
λ(t) = f(t) / (1 − F(t)),  t ≥ 0
If X is exponential with parameter λ, then its hazard rate is constant:
λ(t) = λ,  t ≥ 0
A random variable X is said to have a gamma distribution with parameters (α, λ) if its probability density function is
f(x) = { λe^{−λx} (λx)^{α−1} / Γ(α),  x ≥ 0
       { 0,                           x < 0
where Γ(α) = ∫_0^∞ e^{−y} y^{α−1} dy is the gamma function.
The expected value and variance of a gamma random variable are, respectively,
E[X] = α/λ,   Var(X) = α/λ²
A random variable is said to have a beta distribution with parameters (𝑎, 𝑏) if its
probability density function is equal to
f(x) = { x^{a−1} (1 − x)^{b−1} / B(a, b),  0 < x < 1
       { 0,                                otherwise
where
B(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx
Its expected value and variance are
E[X] = a/(a + b),   Var(X) = ab / ((a + b)² (a + b + 1))