
YILDIZ TEKNİK ÜNİVERSİTESİ

PROBABILITY THEORY
LECTURE NOTES

[2017]

Table of Contents
SYLLABUS
CHAPTER 1
1.1 INTRODUCTION
1.2 BASIC PRINCIPLE OF COUNTING
1.3 PERMUTATIONS
1.4 COMBINATIONS
1.5 MULTINOMIAL COEFFICIENTS
SUMMARY
CHAPTER 2
2.1 SAMPLE SPACE AND EVENTS
2.2 AXIOMS OF PROBABILITY
2.3 SOME SIMPLE PROPOSITIONS
2.4 SAMPLE SPACES HAVING EQUALLY LIKELY OUTCOMES
SUMMARY
CHAPTER 3: CONDITIONAL PROBABILITY AND INDEPENDENCE
3.1 CONDITIONAL PROBABILITIES
3.2 BAYES' FORMULA
3.3 INDEPENDENT EVENTS
3.4 P(·|F) IS A PROBABILITY
SUMMARY
CHAPTER 4: RANDOM VARIABLES
4.1 RANDOM VARIABLES
4.2 DISTRIBUTION FUNCTIONS
4.3 DISCRETE RANDOM VARIABLES
4.4 EXPECTED VALUE
4.5 EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE
4.6 VARIANCE
4.7 THE BERNOULLI AND BINOMIAL RANDOM VARIABLES
4.7.1 Properties of Binomial Random Variables
4.8 THE POISSON RANDOM VARIABLE
SUMMARY
CHAPTER 5: CONTINUOUS RANDOM VARIABLES
5.1 INTRODUCTION
5.2 EXPECTATION AND VARIANCE OF CONTINUOUS RANDOM VARIABLES
5.3 THE UNIFORM RANDOM VARIABLE
5.4 NORMAL RANDOM VARIABLES
5.5 EXPONENTIAL RANDOM VARIABLES
5.6 OTHER CONTINUOUS DISTRIBUTIONS
5.6.1 The Gamma Distribution
5.6.2 The Weibull Distribution
5.6.3 The Cauchy Distribution
5.6.4 The Beta Distribution
5.7 THE DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE
SUMMARY

SYLLABUS

EHM2952 INTRODUCTION TO PROBABILITY


Evaluation:

Midterms 2×30%

Final Exam 40%

Total 100%

Rules:

Entrance to the classroom is permitted only at 15-minute intervals starting from 09:00, e.g., at
09:00, 09:15, 09:30, etc.

Textbooks:

1. Sheldon Ross, A First Course in Probability, (8th Ed.) Prentice Hall, 2010, USA.
2. Dimitri P. Bertsekas, Jon N. Tsitsiklis, Introduction to Probability, Athena Scientific, USA,
2000.
3. Hwei Hsu, Probability, Random Variables and Random Processes, Schaum’s Outline Series,
McGraw Hill, 1997.
4. Olasılık ve İstatistiğe Giriş, Mühendisler ve Fenciler İçin (Turkish translation of textbook 1, from its 4th edition),
Prof. Dr. Salih Çelebioğlu, Prof. Dr. Reşat Kasap, Nobel Akademik Yayıncılık, 2015.
5. A. Papoulis, S. U. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-
Hill, USA.
Subjects:

1. Introduction to probability, permutation, combination, relative frequency concept Text Book 1 Ch. 1


2. Axioms of probability, set theory Text Book 1 Ch. 2
3. Conditional probability, Bayes theorem Text Book 1 Ch. 3
4. Statistical independency, mutually exclusive events Text Book 1 Ch. 3
5. Discrete random variables, their probability mass and distribution functions Text Book 1 Ch. 4
6. Expected value and variance of discrete random variables Text Book 1 Ch. 4
7. Bernoulli, Binomial and Poisson random variables and their applications Text Book 1 Ch. 4
8. Continuous random variables, their prob. density and distribution functions Text Book 1 Ch. 5
9. Expected value and variance of continuous random variables Text Book 1 Ch. 5
10. Uniform, Gaussian (normal) and exponential RV, the distribution of a function of a RV Text Book
1 Ch. 5
11. Jointly distributed RV, their prob. density and distribution functions Text Book 1 Ch. 6
12. Density functions of independent random variables Text Book 1 Ch. 6
13. Concept of random process, types of random processes, measurement of process
parameters Text Book 2 Ch. 5


CHAPTER 1

1.1 INTRODUCTION

Many systems encountered in science and engineering require an understanding of probability


concepts because they possess random variations. These include messages arriving at a
switchboard; customers arriving at a restaurant, movie theater, or a bank; component failure in
a system; traffic arrival at a junction; and transaction requests arriving at a server. There are
many application areas in the field of electrical and electronics engineering. Some of these are

 Communications
o During wireless or wired communications, signals encounter noise, which has
specific probabilistic features.
 Radars and electromagnetics
o The same noise arises
 Electronics
o Thermal noise, as well as noise from other sources, is probabilistic.
 Combination of these
o Lifetime estimations of specific electronic components.

Probability deals with unpredictability and randomness, and probability theory is the branch
of mathematics that is concerned with the study of random phenomena. A random
phenomenon is one that, under repeated observation, yields different outcomes that are not
deterministically predictable. Examples of these random phenomena include the number of
electronic mail (e-mail) messages received by all employees of a company in one day, the
number of phone calls arriving at the university’s switchboard over a given period, the
number of components of a system that fail within a given interval, number of bits correctly
received through the internet, and the number of A’s that a student can receive in one
academic year.

1.2 BASIC PRINCIPLE OF COUNTING

 Basic to all our work

 States that if one experiment can result in any of m possible outcomes and if another
experiment can result in any of n possible outcomes, then there are mn possible
outcomes of the two experiments.

The Basic Principle of Counting

Suppose that two experiments are to be performed. Then if experiment 1 can result
in any one of m possible outcomes and if for each outcome of experiment 1 there are
n possible outcomes of experiment 2, then together there are mn possible outcomes
of the two experiments.

Example 1.1: A small community consists of 10 women, each of whom has 3 children. If one
woman and one of her children are to be chosen as mother and child of the year, how many
different choices are possible?

Solution: We see from the basic principle that there are 10 x 3 = 30 possible choices.

The Generalized Basic Principle of Counting

If r experiments that are to be performed are such that the first one may result in any
of n1 possible outcomes; and if, for each of these 𝒏𝟏 possible outcomes, there are 𝒏𝟐
possible outcomes of the second experiment; and if, for each of the possible
outcomes of the first two experiments, there are 𝒏𝟑 possible outcomes of the third
experiment; and if . . . , then there is a total of 𝒏𝟏 · 𝒏𝟐 · · · 𝒏𝒓 possible outcomes
of the r experiments.

Example 1.2: A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors,


and 2 seniors. A subcommittee of 4, consisting of 1 person from each class, is to be chosen.
How many different subcommittees are possible?

Solution: There are 𝟑 × 𝟒 × 𝟓 × 𝟐 = 𝟏𝟐𝟎 possible subcommittees
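
The counting principle is easy to verify by brute force. The following Python sketch (an
illustration added to these notes, not part of the original example) enumerates every
subcommittee of Example 1.2 with itertools.product and confirms the count 3 · 4 · 5 · 2 = 120.

    from itertools import product

    # Class sizes in Example 1.2: freshmen, sophomores, juniors, seniors.
    sizes = [3, 4, 5, 2]

    # A subcommittee picks exactly one person from each class;
    # itertools.product enumerates every such combination of choices.
    subcommittees = list(product(*[range(s) for s in sizes]))

    print(len(subcommittees))   # 120, in agreement with 3 * 4 * 5 * 2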


1.3 PERMUTATIONS

How many different ordered arrangements of the letters a, b, and c are possible? By direct
enumeration we see that there are 6, namely, abc, acb, bac, bca, cab, and cba. Each
arrangement is known as a permutation.

Suppose now that we have n objects. Reasoning similar to that we have just used for the 3
letters then shows that there are

𝒏(𝒏 − 𝟏)(𝒏 − 𝟐) · · · 𝟑 · 𝟐 · 𝟏 = 𝒏!

different permutations of the n objects.

Example 1.3: A class in probability theory consists of 6 men and 4 women. An examination
is given, and the students are ranked according to their performance. Assume that no two
students obtain the same score.

(a) How many different rankings are possible?

(b) If the men are ranked just among themselves and the women just among
themselves, how many different rankings are possible?

Solution. (a) Because each ranking corresponds to a particular ordered arrangement of the 10
people, the answer to this part is 𝟏𝟎! = 𝟑, 𝟔𝟐𝟖, 𝟖𝟎𝟎.

(b) Since there are 𝟔! possible rankings of the men among themselves and 𝟒! possible
rankings of the women among themselves, it follows from the basic principle that
there are (𝟔!)(𝟒!) = (𝟕𝟐𝟎)(𝟐𝟒) = 𝟏𝟕, 𝟐𝟖𝟎 possible rankings in this case.

Example 1.4: Ms. Jones has 10 books that she is going to put on her bookshelf. Of these, 4
are mathematics books, 3 are chemistry books, 2 are history books, and 1 is a language book.
Ms. Jones wants to arrange her books so that all the books dealing with the same subject are
together on the shelf. How many different arrangements are possible?

Solution. There are 𝟒! 𝟑! 𝟐! 𝟏! arrangements such that the mathematics books are first in
line, then the chemistry books, then the history books, and then the language book. Similarly,
for each possible ordering of the subjects, there are 𝟒! 𝟑! 𝟐! 𝟏! possible arrangements.


Hence, as there are 4! possible orderings of the subjects, the desired answer is
𝟒! 𝟒! 𝟑! 𝟐! 𝟏! = 𝟔𝟗𝟏𝟐.

Example 1.5: A chess tournament has 10 competitors, of which 4 are Russian, 3 are from the
United States, 2 are from Great Britain, and 1 is from Brazil. If the tournament result lists just
the nationalities of the players in the order in which they placed, how many outcomes are
possible?

Solution. There are

𝟏𝟎!/(𝟒! 𝟑! 𝟐! 𝟏!) = 𝟏𝟐𝟔𝟎𝟎

possible outcomes.
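
As a numerical check on Example 1.5, the sketch below (illustrative only; the label letters
R, U, G, B are an assumption made for the check) enumerates the distinct orderings of the
multiset of nationalities and compares the count with 10!/(4! 3! 2! 1!).

    from itertools import permutations
    from math import factorial

    nationalities = "RRRRUUUGGB"   # 4 Russian, 3 US, 2 British, 1 Brazilian

    # Distinct orderings of the multiset: deduplicate all 10! orderings with a set.
    distinct = set(permutations(nationalities))

    formula = factorial(10) // (factorial(4) * factorial(3) * factorial(2))
    print(len(distinct), formula)   # 12600 12600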

1.4 COMBINATIONS

For instance, how many different groups of 3 could be selected from the 5 items A, B, C, D,
and E? To answer this question, reason as follows: Since there are 5 ways to select the initial
item, 4 ways to then select the next item, and 3 ways to select the final item, there are thus 5 ·
4 · 3 ways of selecting the group of 3 when the order in which the items are selected is
relevant. However, since every group of 3—say, the group consisting of items A, B, and C
will be counted 6 times (that is, all of the permutations ABC, ACB, BAC,BCA, CAB, and CBA
will be counted when the order of selection is relevant), it follows that the total number of
groups that can be formed is

𝟓 · 𝟒 · 𝟑/(𝟑 · 𝟐 · 𝟏) = 𝟏𝟎

In general, as 𝒏(𝒏 − 𝟏) · · · (𝒏 − 𝒓 + 𝟏) represents the number of different ways that a


group of r items could be selected from n items when the order of selection is relevant, and as
each group of r items will be counted r! times in this count, it follows that the number of
different groups of r items that could be formed from a set of n items is

\binom{n}{r} = \frac{n(n-1)\cdots(n-r+1)}{r!} = \frac{n!}{(n-r)!\, r!}


Example 1.6: A committee of 3 is to be formed from a group of 20 people. How many


different committees are possible?

Solution. There are \binom{20}{3} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140 possible committees.

Example 1.7: From a group of 5 women and 7 men, how many different committees
consisting of 2 women and 3 men can be formed? What if 2 of the men are feuding and refuse
to serve on the committee together?

Solution. As there are \binom{5}{2} possible groups of 2 women and \binom{7}{3} possible groups of 3 men, it
follows from the basic principle that there are

\binom{5}{2}\binom{7}{3} = 350

possible committees consisting of 2 women and 3 men.

On the other hand, if 2 of the men refuse to serve on the committee together, then, because there
are \binom{2}{0}\binom{5}{3} possible groups of 3 men not containing either of the feuding men and \binom{2}{1}\binom{5}{2}
groups of 3 men containing exactly one of the feuding men, there are \binom{2}{0}\binom{5}{3} + \binom{2}{1}\binom{5}{2} = 30
groups of 3 men not containing both of the feuding men. Since there are \binom{5}{2} ways to choose
the 2 women, it follows that in this case there are 30 \binom{5}{2} = 300 possible committees.
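
The feuding-men count of Example 1.7 can be checked directly by filtering
itertools.combinations. The sketch below is only such a check; the labels 0 and 1 for the two
feuding men are an arbitrary assumption.

    from itertools import combinations
    from math import comb

    women, men = range(5), range(7)
    feuding = {0, 1}   # call the two feuding men 0 and 1

    # Committees of 2 women and 3 men in which the feuding pair do not both serve.
    count = sum(1
                for w in combinations(women, 2)
                for m in combinations(men, 3)
                if not feuding.issubset(m))

    print(count)                                                          # 300
    print(comb(5, 2) * (comb(2, 0)*comb(5, 3) + comb(2, 1)*comb(5, 2)))   # 300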

The Binomial Theorem


(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}

1.5 MULTINOMIAL COEFFICIENTS

In this section, we consider the following problem: A set of n distinct items is to be divided
into r distinct groups of respective sizes n_1, n_2, \ldots, n_r, where \sum_{i=1}^{r} n_i = n. How many
different divisions are possible? To answer this question, we note that there are \binom{n}{n_1} possible
choices for the first group; for each choice of the first group, there are \binom{n - n_1}{n_2} possible
choices for the second group; for each choice of the first two groups, there are \binom{n - n_1 - n_2}{n_3}
possible choices for the third group; and so on. It then follows from the generalized version of
the basic counting principle that there are

\binom{n}{n_1}\binom{n - n_1}{n_2} \cdots \binom{n - n_1 - n_2 - \cdots - n_{r-1}}{n_r}
= \frac{n!}{(n - n_1)!\, n_1!} \cdot \frac{(n - n_1)!}{(n - n_1 - n_2)!\, n_2!} \cdots \frac{(n - n_1 - n_2 - \cdots - n_{r-1})!}{0!\, n_r!}
= \frac{n!}{n_1!\, n_2! \cdots n_r!}

possible divisions.

Notation

If n_1 + n_2 + \cdots + n_r = n, we define \binom{n}{n_1, n_2, \ldots, n_r} by

\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}

Thus, \binom{n}{n_1, n_2, \ldots, n_r} represents the number of possible divisions of n distinct objects into r
distinct groups of respective sizes n_1, n_2, \ldots, n_r.

Example 1.8: A police department in a small city consists of 10 officers. If the department
policy is to have 5 of the officers patrolling the streets, 2 of the officers working full time at
the station, and 3 of the officers on reserve at the station, how many different divisions of the
10 officers into the 3 groups are possible?

Solution. There are 𝟏𝟎! / (𝟓! 𝟐! 𝟑!) = 𝟐𝟓𝟐𝟎 possible divisions.

Example 1.9: Ten children are to be divided into an A team and a B team of 5 each. The A
team will play in one league and the B team in another. How many different divisions are
possible?

Solution. There are 𝟏𝟎! / (𝟓! 𝟓!) = 𝟐𝟓𝟐 possible divisions.


Example 1.10: In order to play a game of basketball, 10 children at a playground divide


themselves into two teams of 5 each. How many different divisions are possible?

Solution. Note that this example is different from Example 1.9 because now the order of the
two teams is irrelevant. That is, there is no A and B team, but just a division consisting of 2
groups of 5 each. Hence, the desired answer is 𝟏𝟎!/((𝟓! 𝟓!) 𝟐!) = 𝟏𝟐𝟔
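
Since a multinomial coefficient is just n! divided by the factorials of the group sizes,
Examples 1.8–1.10 can be recomputed in a few lines. The helper below is an illustrative sketch,
not a function defined in the notes.

    from math import factorial

    def multinomial(n, *groups):
        """Number of divisions of n distinct items into groups of the given sizes."""
        assert sum(groups) == n
        result = factorial(n)
        for g in groups:
            result //= factorial(g)
        return result

    print(multinomial(10, 5, 2, 3))    # 2520 (Example 1.8)
    print(multinomial(10, 5, 5))       # 252  (Example 1.9: labeled A and B teams)
    print(multinomial(10, 5, 5) // 2)  # 126  (Example 1.10: unlabeled teams)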

THE MULTINOMIAL THEOREM

(x_1 + x_2 + \cdots + x_r)^n = \sum_{(n_1, \ldots, n_r):\; n_1 + n_2 + \cdots + n_r = n} \binom{n}{n_1, n_2, \ldots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}

That is, the sum is over all nonnegative integer-valued vectors (n_1, n_2, \ldots, n_r) such that
n_1 + n_2 + \cdots + n_r = n.

SUMMARY

The basic principle of counting states that if an experiment consisting of two phases is such
that there are 𝑛 possible outcomes of phase 1 and, for each of these 𝑛 outcomes, there are 𝑚
possible outcomes of phase 2, then there are 𝑛 ⋅ 𝑚 possible outcomes of the experiment.

There are 𝑛! = 𝑛(𝑛 − 1) ⋯ 3 ⋅ 2 ⋅ 1 possible linear orderings of n items. Let

\binom{n}{i} = \frac{n!}{(n-i)!\, i!}

when 0 ≤ 𝑖 ≤ 𝑛, and let it equal 0 otherwise. This quantity represents the number of different
subgroups of size i that can be chosen from a set of size n. It is often called a binomial
coefficient because of its prominence in the binomial theorem, which states that

(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}

For nonnegative numbers 𝑛1 , ⋯ , 𝑛𝑟 summing to n,

\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}


is the number of divisions of 𝑛 items into 𝑟 distinct nonoverlapping subgroups of sizes


𝑛1 , ⋯ , 𝑛𝑟 .


CHAPTER 2

2.1 SAMPLE SPACE AND EVENTS

Consider an experiment whose set of all possible outcomes is known. This set of all possible
outcomes of an experiment is known as the sample space of the experiment and is denoted by
S. Following are some examples:

1. If the outcome of an experiment consists in the determination of the sex of a newborn


child, then

𝑺 = {𝒈, 𝒃}

where the outcome g means that the child is a girl and b that it is a boy.

2. If the outcome of an experiment is the order of finish in a race among the 7 horses
having post positions 1, 2, 3, 4, 5, 6, and 7, then

𝑺 = {𝒂𝒍𝒍 𝟕! 𝒑𝒆𝒓𝒎𝒖𝒕𝒂𝒕𝒊𝒐𝒏𝒔 𝒐𝒇 (𝟏, 𝟐, 𝟑, 𝟒, 𝟓, 𝟔, 𝟕)}

The outcome (𝟐, 𝟑, 𝟏, 𝟔, 𝟓, 𝟒, 𝟕) means, for instance, that the number 2 horse comes in
first, then the number 3 horse, then the number 1 horse, and so on.

3. If the experiment consists of flipping two coins, then the sample space consists of the
following four points:

𝑺 = {(𝑯, 𝑯), (𝑯, 𝑻), (𝑻, 𝑯), (𝑻, 𝑻)}

The outcome will be (𝑯, 𝑯) if both coins are heads, (𝑯, 𝑻) if the first coin is heads
and the second tails, (𝑻, 𝑯) if the first is tails and the second heads, and (𝑻, 𝑻) if both
coins are tails.


4. If the experiment consists of tossing two dice, then the sample space consists of the 36
points

𝑺 = {(𝒊, 𝒋): 𝒊, 𝒋 = 𝟏, 𝟐, 𝟑, 𝟒, 𝟓, 𝟔}

where the outcome (𝒊, 𝒋) is said to occur if i appears on the leftmost die and j on the
other die.

5. If the experiment consists of measuring (in hours) the lifetime of a transistor, then the
sample space consists of all nonnegative real numbers; that is,

𝑺 = {𝒙: 𝟎 < 𝒙 < ∞}

Any subset E of the sample space is known as an event. In other words, an event is a set
consisting of possible outcomes of the experiment. If the outcome of the experiment is
contained in E, then we say that E has occurred. In Example 1, if 𝑬 = {𝒈}, then E is the event
that the child is a girl. In Example 3, if 𝑬 = {(𝑯, 𝑯), (𝑯, 𝑻)}, then E is the event that a head
appears on the first coin. In Example 4, 𝒊𝒇 𝑬 = {(𝟏, 𝟔), (𝟐, 𝟓), (𝟑, 𝟒), (𝟒, 𝟑), (𝟓, 𝟐), (𝟔, 𝟏)},
then E is the event that

the sum of the dice equals 7. In Example 5, if 𝑬 = {𝒙: 𝟎 < 𝒙 < 𝟓}, then E is the event that
the transistor does not last longer than 5 hours.

The event 𝑬 𝑼 𝑭 is called the union of the event E and the event F.

For any two events E and F, we may also define the new event 𝑬𝑭, called the intersection of
E and F

If 𝑬𝑭 = Ø, then E and F are said to be mutually exclusive.

For any event E, we define the new event 𝑬𝒄 , referred to as the complement of E, to consist of
all outcomes in the sample space S that are not in E.

If all of the outcomes in E are also in F, then we say that E is contained in F, or E is a subset
of F, and write E ⊂ F (or, equivalently, F ⊃ E, which we sometimes express by saying that F is
a superset of E).

A graphical representation that is useful for illustrating logical relations among events is the
Venn diagram.


Commutative Laws:   E ∪ F = F ∪ E        EF = FE

Associative Laws:   (E ∪ F) ∪ G = E ∪ (F ∪ G)        (EF)G = E(FG)

Distributive Laws:   (E ∪ F)G = EG ∪ FG        EF ∪ G = (E ∪ G)(F ∪ G)

De Morgan’s Laws:

\left(\bigcup_{i=1}^{n} E_i\right)^c = \bigcap_{i=1}^{n} E_i^c

\left(\bigcap_{i=1}^{n} E_i\right)^c = \bigcup_{i=1}^{n} E_i^c

2.2 AXIOMS OF PROBABILITY

One way of defining the probability of an event is in terms of its relative frequency. Such a
definition usually goes as follows: We suppose that an experiment, whose sample space is 𝑺,
is repeatedly performed under exactly the same conditions. For each event E of the sample
space 𝑺, we define 𝒏(𝑬) to be the number of times in the first n repetitions of the experiment
that the event E occurs. Then P(E), the probability of the event 𝑬, is defined as


P(E) = \lim_{n \to \infty} \frac{n(E)}{n}

Consider an experiment whose sample space is 𝑺. For each event 𝑬 of the sample space 𝑺, we
assume that a number 𝑷(𝑬) is defined and satisfies the following three axioms:

Axiom 1

𝟎 ≤ 𝑷(𝑬) ≤ 𝟏

Axiom 2

𝑷(𝑺) = 𝟏

Axiom 3

For any sequence of mutually exclusive events 𝑬𝟏 , 𝑬𝟐 , . .. (that is, events for which
𝑬𝒊 𝑬𝒋 = Ø when 𝒊 ≠ 𝒋),

P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)

Example 2.1: If a die is rolled and we suppose that all six sides are equally likely to appear,
then we would have 𝑷({𝟏}) = 𝑷({𝟐}) = 𝑷({𝟑}) = 𝑷({𝟒}) = 𝑷({𝟓}) = 𝑷({𝟔}) = 𝟏/𝟔.
From Axiom 3, it would thus follow that the probability of rolling an even number would
equal

𝑷({𝟐, 𝟒, 𝟔}) = 𝑷({𝟐}) + 𝑷({𝟒}) + 𝑷({𝟔}) = ½

2.3. SOME SIMPLE PROPOSITIONS

We first note that, since 𝑬 and 𝑬𝒄 are always mutually exclusive and since 𝑬 ∪ 𝑬𝒄 = 𝑺, we
have, by Axioms 2 and 3,

𝟏 = 𝑷(𝑺) = 𝑷(𝑬 𝑼 𝑬𝒄 ) = 𝑷(𝑬) + 𝑷(𝑬𝒄 )

Proposition 1:

𝑷(𝑬𝒄 ) = 𝟏 − 𝑷(𝑬)


Proposition 2:

If 𝑬 ⊂ 𝑭, then 𝑷(𝑬) ≤ 𝑷(𝑭)

Proposition 3:

𝑷(𝑬 𝑼 𝑭) = 𝑷(𝑬) + 𝑷(𝑭) – 𝑷(𝑬𝑭)

[Venn diagram: events E and F, with region I the part of E outside F, region II the intersection EF, and region III the part of F outside E.]

Example 2.2: J is taking two books along on her holiday vacation. With probability .5, she
will like the first book; with probability .4, she will like the second book; and with probability
.3, she will like both books. What is the probability that she likes neither book?

Solution. Let Bi denote the event that J likes book i, i = 1, 2. Then the probability that she
likes at least one of the books is

𝑷(𝑩𝟏 𝑼 𝑩𝟐 ) = 𝑷(𝑩𝟏 ) + 𝑷(𝑩𝟐 ) − 𝑷(𝑩𝟏 𝑩𝟐 ) = . 𝟓 + . 𝟒 − . 𝟑 = . 𝟔

Because the event that J likes neither book is the complement of the event that she likes at
least one of them, we obtain the result

P(B_1^c B_2^c) = P((B_1 \cup B_2)^c) = 1 - P(B_1 \cup B_2) = 0.4.

We may also calculate the probability that any one of the three events 𝑬, 𝑭, and 𝑮 occurs,
namely,

𝑷(𝑬 𝑼 𝑭 𝑼 𝑮) = 𝑷[(𝑬 𝑼 𝑭) 𝑼 𝑮]

which, by Proposition 3, equals

𝑷(𝑬 𝑼 𝑭) + 𝑷(𝑮) − 𝑷[(𝑬 𝑼 𝑭)𝑮]

Now, it follows from the distributive law that the events (𝑬 𝑼 𝑭)𝑮 and 𝑬𝑮 𝑼 𝑭𝑮 are
equivalent; hence, from the preceding equations, we obtain


𝑷(𝑬 𝑼 𝑭 𝑼 𝑮)

= 𝑷(𝑬) + 𝑷(𝑭) − 𝑷(𝑬𝑭) + 𝑷(𝑮) − 𝑷(𝑬𝑮 𝑼 𝑭𝑮)

= 𝑷(𝑬) + 𝑷(𝑭) − 𝑷(𝑬𝑭) + 𝑷(𝑮) − 𝑷(𝑬𝑮) − 𝑷(𝑭𝑮) + 𝑷(𝑬𝑮𝑭𝑮)

= 𝑷(𝑬) + 𝑷(𝑭) + 𝑷(𝑮) − 𝑷(𝑬𝑭) − 𝑷(𝑬𝑮) − 𝑷(𝑭𝑮) + 𝑷(𝑬𝑭𝑮)

Proposition 4

P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots
+ (-1)^{r+1} \sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r}) + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)

The summation \sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r}) is taken over all of the \binom{n}{r} possible subsets of
size r of the set \{1, 2, \ldots, n\}.

2.4 SAMPLE SPACES HAVING EQUALLY LIKELY OUTCOMES

Consider an experiment whose sample space 𝑺 is a finite set, say, 𝑺 = {𝟏, 𝟐, . . . , 𝑵}. Then it
is often natural to assume that

𝑷({𝟏}) = 𝑷({𝟐}) = · · · = 𝑷({𝑵})

which implies, from Axioms 2 and 3, that

P(\{i\}) = \frac{1}{N}, \qquad i = 1, 2, \ldots, N

From this equation, it follows from Axiom 3 that, for any event E,

𝑷(𝑬) = (𝒏𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒐𝒖𝒕𝒄𝒐𝒎𝒆𝒔 𝒊𝒏 𝑬)/(𝒏𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒐𝒖𝒕𝒄𝒐𝒎𝒆𝒔 𝒊𝒏 𝑺)


In words, if we assume that all outcomes of an experiment are equally likely to occur, then the
probability of any event E equals the proportion of outcomes in the sample space that are
contained in E.

Example 2.3: If two dice are rolled, what is the probability that the sum of the upturned faces
will equal 7?

Solution. We shall solve this problem under the assumption that all of the 36 possible
outcomes are equally likely. Since there are 6 possible outcomes—namely,
(𝟏, 𝟔), (𝟐, 𝟓), (𝟑, 𝟒), (𝟒, 𝟑), (𝟓, 𝟐), and (𝟔, 𝟏)—that result in the sum of the dice being equal
to 7, the desired probability is 6/36 = 1/ 6.
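
Because all 36 outcomes are equally likely, P(E) is just the fraction of outcomes lying in E, so
Example 2.3 can be confirmed by listing the sample space. The snippet below is a small
illustration of that check.

    from itertools import product

    # Sample space of two dice: all 36 ordered pairs (i, j).
    S = list(product(range(1, 7), repeat=2))

    E = [outcome for outcome in S if sum(outcome) == 7]
    print(len(E), len(S), len(E) / len(S))   # 6 36 0.1666...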

Example 2.4: If 3 balls are “randomly drawn” from a bowl containing 6 white and 5 black
balls, what is the probability that one of the balls is white and the other two black?

Solution. If we regard the order in which the balls are selected as being relevant, then the
sample space consists of 11 · 10 · 9 = 990 outcomes. Furthermore, there are 6 · 5 · 4 = 120
outcomes in which the first ball selected is white and the other two are black; 5 · 6 · 4 = 120
outcomes in which the first is black, the second is white, and the third is black; and 5 · 4 · 6 =
120 in which the first two are black and the third is white. Hence, assuming that “randomly
drawn” means that each outcome in the sample space is equally likely to occur, we see that
the desired probability is

(𝟏𝟐𝟎 + 𝟏𝟐𝟎 + 𝟏𝟐𝟎) / 𝟗𝟗𝟎 = 𝟒 / 𝟏𝟏

This problem could also have been solved by

\frac{\binom{6}{1}\binom{5}{2}}{\binom{11}{3}} = \frac{4}{11}



Example 2.5: A committee of 5 is to be selected from a group of 6 men and 9 women. If the
selection is made randomly, what is the probability that the committee consists of 3 men and 2
women?

Solution. Because each of the \binom{15}{5} possible committees is equally likely to be selected, the
desired probability is

\frac{\binom{6}{3}\binom{9}{2}}{\binom{15}{5}} = \frac{240}{1001}

Example 2.6: An urn contains n balls, one of which is special. If k of these balls are
withdrawn one at a time, with each selection being equally likely to be any of the balls that
remain at the time, what is the probability that the special ball is chosen?

Solution.

P\{\text{special ball is selected}\} = \frac{\binom{1}{1}\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{k}{n}

Example 2.7: A 5-card poker hand is said to be a full house if it consists of 3 cards of the
same denomination and 2 other cards of the same denomination (of course, different from the
first denomination). Thus, one kind of full house is three of a kind plus a pair. What is the
probability that one is dealt a full house?

Solution. Again, we assume that all \binom{52}{5} possible hands are equally likely. To determine the
number of possible full houses, we first note that there are \binom{4}{2}\binom{4}{3} different combinations of,
say, 2 tens and 3 jacks. Because there are 13 different choices for the kind of pair and, after a
pair has been chosen, there are 12 other choices for the denomination of the remaining 3
cards, it follows that the probability of a full house is

\frac{13 \cdot 12 \cdot \binom{4}{2}\binom{4}{3}}{\binom{52}{5}} \approx 0.0014

Example 2.8: A poker hand consists of 5 cards. If the cards have distinct consecutive values
and are not all of the same suit, we say that the hand is a straight. For instance, a hand
consisting of the five of spades, six of spades, seven of spades, eight of spades, and nine of
hearts is a straight. What is the probability that one is dealt a straight?

Solution. We start by assuming that all \binom{52}{5} possible poker hands are equally likely. To
determine the number of outcomes that are straights, let us first determine the number of
possible outcomes for which the poker hand consists of an ace, two, three, four, and five (the
suits being irrelevant). Since the ace can be any 1 of the 4 possible aces, and similarly for the
two, three, four, and five, it follows that there are 4^5 outcomes leading to exactly one ace,
two, three, four, and five. Hence, since in 4 of these outcomes all the cards will be of the same
suit (such a hand is called a straight flush), it follows that there are 4^5 - 4 hands that make
up a straight of the form ace, two, three, four, and five. Similarly, there are 4^5 - 4 hands that
make up a straight of the form ten, jack, queen, king, and ace. Thus, there are 10(4^5 - 4)
hands that are straights, and it follows that the desired probability is

\frac{10(4^5 - 4)}{\binom{52}{5}} \approx 0.0039

SUMMARY

Let 𝑆 denote the set of all possible outcomes of an experiment. 𝑆 is called the sample space of
the experiment. An event is a subset of 𝑆. If 𝐴𝑖 , 𝑖 = 1, … , 𝑛, are events, then ⋃𝑛𝑖=1 𝐴𝑖 , called
the union of these events, consists of all outcomes that are in at least one of the events 𝐴𝑖 ,
𝑖 = 1, … , 𝑛. Similarly, ⋂𝑛𝑖=1 𝐴𝑖 , sometimes written as 𝐴1 ⋯ 𝐴𝑛 , is called the intersection of the
events 𝐴𝑖 and consists of all outcomes that are in all of the events 𝐴𝑖 , 𝑖 = 1, … , 𝑛.

For any event 𝐴, we define 𝐴𝑐 to consist of all outcomes in the sample space that are not in 𝐴.
We call 𝐴𝑐 the complement of the event 𝐴. The event 𝑆 𝑐 , which is empty of outcomes, is
designated by Ø and is called the null set. If 𝐴𝐵 = Ø, then we say that 𝐴 and 𝐵 are mutually
exclusive.

For each event 𝐴 of the sample space 𝑆, we suppose that a number 𝑃(𝐴), called the
probability of 𝐴, is defined and is such that


i. 0 ≤ 𝑃(𝐴) ≤ 1
ii. 𝑃(𝑆) = 1
iii. For mutually exclusive events 𝐴𝑖 ,

P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)

𝑃(𝐴) represents the probability that the outcome of the experiment is in 𝐴. It can be shown
that

𝑃(𝐴𝑐 ) = 1 − 𝑃(𝐴)

A useful result is that

𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴𝐵)

which can be generalized to give

P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i A_j) + \cdots + (-1)^{r+1} \sum_{i_1 < i_2 < \cdots < i_r} P(A_{i_1} A_{i_2} \cdots A_{i_r})
+ \cdots + (-1)^{n+1} P(A_1 A_2 \cdots A_n)

If 𝑆 is finite and each one point set is assumed to have equal probability, then

P(A) = \frac{|A|}{|S|}

where |𝐸| denotes the number of outcomes in the event 𝐸.

𝑃(𝐴) can be interpreted either as a long-run relative frequency or as a measure of one’s


degree of belief.


CHAPTER 3:
CONDITIONAL PROBABILITY AND
INDEPENDENCE

3.1 CONDITIONAL PROBABILITIES

Suppose that we toss 2 dice, and suppose that each of the 36 possible outcomes is equally
likely to occur and hence has probability 1/36 . Suppose further that we observe that the first
die is a 3. Then, given this information, what is the probability that the sum of the 2 dice
equals 8?

Given that the initial die is a 3, there can be at most 6 possible outcomes of our experiment,
namely, (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6), the (conditional) probability of each of
the outcomes (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6) is 1/6, whereas the (conditional)
probability of the other 30 points in the sample space is 0. Hence, the desired probability will
be 1/6.

If we let E and F denote, respectively, the event that the sum of the dice is 8 and the event that
the first die is a 3, then the probability just obtained is called the conditional probability that
E occurs given that F has occurred and is denoted by

𝑷(𝑬|𝑭)

The following definition is made:

Definition:

If 𝑷(𝑭) > 𝟎, then

P(E|F) = \frac{P(EF)}{P(F)}


Example 3.1: A student is taking a one-hour-time-limit makeup examination. Suppose the


probability that the student will finish the exam in less than 𝒙 hours is 𝒙/𝟐, for all 𝟎 < 𝒙 <
𝟏. Then, given that the student is still working after 0.75 hour, what is the conditional
probability that the full hour is used?

Solution. Let 𝑳𝒙 denote the event that the student finishes the exam in less than 𝒙 hours,
𝟎 < 𝒙 < 𝟏, and let 𝑭 be the event that the student uses the full hour. Because 𝑭 is the event
that the student is not finished in less than 1 hour,

𝑷(𝑭) = 𝑷(𝑳𝒄𝟏 ) = 𝟏 − 𝑷(𝑳𝟏 ) = 𝟎. 𝟓

Now, the event that the student is still working at time . 𝟕𝟓 is the complement of the event
𝑳𝟎.𝟕𝟓 , so the desired probability is obtained from

P(F|L_{0.75}^c) = \frac{P(F L_{0.75}^c)}{P(L_{0.75}^c)} = \frac{P(F)}{1 - P(L_{0.75})} = \frac{0.5}{0.625} = 0.8

If each outcome of a finite sample space 𝑺 is equally likely, then, conditional on the event that
the outcome lies in a subset 𝑭 ⊂ 𝑺, all outcomes in 𝑭 become equally likely. In such cases, it
is often convenient to compute conditional probabilities of the form 𝑷(𝑬|𝑭) by using 𝑭 as the
sample space. Indeed, working with this reduced sample space often results in an easier and
better understood solution. Our next few examples illustrate this point.

Example 3.2: A coin is flipped twice. Assuming that all four points in the sample space
𝑺 = {(𝒉, 𝒉), (𝒉, 𝒕), (𝒕, 𝒉), (𝒕, 𝒕)} are equally likely, what is the conditional probability that
both flips land on heads, given that (a) the first flip lands on heads? (b) at least one flip lands
on heads?

Solution. Let 𝑩 = {(𝒉, 𝒉)} be the event that both flips land on heads; let
𝑭 = {(𝒉, 𝒉), (𝒉, 𝒕)} be the event that the first flip lands on heads; and let
𝑨 = {(𝒉, 𝒉), (𝒉, 𝒕), (𝒕, 𝒉)} be the event that at least one flip lands on heads. The probability
for (a) can be obtained from


P(B|F) = \frac{P(BF)}{P(F)} = \frac{P(\{(h,h)\})}{P(\{(h,h),(h,t)\})} = \frac{1/4}{2/4} = \frac{1}{2}

For (b), we have

P(B|A) = \frac{P(BA)}{P(A)} = \frac{P(\{(h,h)\})}{P(\{(h,h),(h,t),(t,h)\})} = \frac{1/4}{3/4} = \frac{1}{3}

Example 3.3: A woman is known to have two children. What is the conditional probability that
they are both boys, given that she has at least one son?

Solution: Assume that the sample space S is given by S = {(b, b), (b, g), (g, b), (g, g)} and that
all outcomes are equally likely. Letting E denote the event that both children are boys and F
the event that at least one of them is a boy, the desired probability is

P(E|F) = \frac{P(EF)}{P(F)} = \frac{P(\{(b,b)\})}{P(\{(b,b),(b,g),(g,b)\})} = \frac{1/4}{3/4} = \frac{1}{3}
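
Conditioning on F amounts to restricting attention to the outcomes in F and renormalizing,
which is easy to imitate computationally. The sketch below (an illustration, not part of the
original solution) recovers the answer of Example 3.3 by counting inside the reduced sample
space.

    from fractions import Fraction

    # Two children, all four sex orderings equally likely (Example 3.3).
    S = [("b", "b"), ("b", "g"), ("g", "b"), ("g", "g")]

    F = [w for w in S if "b" in w]                # at least one boy
    E_and_F = [w for w in F if w == ("b", "b")]   # both boys

    # With equally likely outcomes, P(E|F) = |EF| / |F|.
    print(Fraction(len(E_and_F), len(F)))         # 1/3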

Fact:

As P(E|F) = \frac{P(EF)}{P(F)}, it follows that

P(EF) = P(E|F)P(F)

Example 3.4: Celine is undecided as to whether to take a French course or a chemistry


course. She estimates that her probability of receiving an A grade would be 1/2 in a French
course and 2/3 in a chemistry course. If Celine decides to base her decision on the flip of a
fair coin, what is the probability that she gets an A in chemistry?

Solution. Let C denote the event that Celine takes chemistry and let A denote the event that she
receives an A in whatever course she takes; then the desired probability is 𝑷(𝑪𝑨), which is calculated by using
the above fact as follows:

𝑷(𝑪𝑨) = 𝑷(𝑪)𝑷(𝑨|𝑪) = (𝟏/𝟐) × (𝟐/𝟑) = 𝟏/𝟑

Example 3.5: Suppose that an urn contains 8 red balls and 4 white balls. We draw 2 balls
from the urn without replacement. (a) If we assume that at each draw each ball in the urn is
equally likely to be chosen, what is the probability that both balls drawn are red?


Solution. Let 𝑹𝟏 and 𝑹𝟐 denote, respectively, the events that the first and second balls drawn
are red. Now, given that the first ball selected is red, there are 7 remaining red balls and 4
white balls, so 𝑷(𝑹𝟐 |𝑹𝟏 ) = 𝟕/𝟏𝟏. As 𝑷(𝑹𝟏 ) is clearly 𝟖/𝟏𝟐 = 𝟐/𝟑, the desired probability is

𝑷(𝑹𝟏 𝑹𝟐 ) = 𝑷(𝑹𝟏 )𝑷(𝑹𝟐 |𝑹𝟏 )

= (𝟐/𝟑) × (𝟕/𝟏𝟏) = (𝟏𝟒/𝟑𝟑)

Of course, this probability could also have been computed by P(R_1 R_2) = \binom{8}{2} / \binom{12}{2} = \frac{14}{33}.

 A generalization of the above fact which provides an expression for the probability of
the intersection of an arbitrary number of events, is sometimes referred to as the
multiplication rule.

The multiplication rule:

𝑷(𝑬𝟏 𝑬𝟐 ⋯ 𝑬𝒏 ) = 𝑷(𝑬𝟏 )𝑷(𝑬𝟐 |𝑬𝟏 )𝑷(𝑬𝟑 |𝑬𝟏 𝑬𝟐 ) ⋯ 𝑷(𝑬𝒏 |𝑬𝟏 𝑬𝟐 ⋯ 𝑬𝒏−𝟏 )

To prove the multiplication rule, just apply the definition of conditional probability to its
right-hand side. This gives:

P(E_1) \cdot \frac{P(E_1 E_2)}{P(E_1)} \cdot \frac{P(E_1 E_2 E_3)}{P(E_1 E_2)} \cdots \frac{P(E_1 E_2 \cdots E_n)}{P(E_1 E_2 \cdots E_{n-1})} = P(E_1 E_2 \cdots E_n)

3.2 BAYES’ FORMULA

Let E and F be events. We may express E as

𝑬 = 𝑬𝑭 𝑼 𝑬𝑭𝒄

for, in order for an outcome to be in E, it must either be in both E and F or be in E but not in
F. (See Figure 3.1.) As EF and EFc are clearly mutually exclusive, we have,

𝑷(𝑬) = 𝑷(𝑬𝑭) + 𝑷(𝑬𝑭𝒄 )

= 𝑷(𝑬|𝑭)𝑷(𝑭) + 𝑷(𝑬|𝑭𝒄 )𝑷(𝑭𝒄 )

= 𝑷(𝑬|𝑭)𝑷(𝑭) + 𝑷(𝑬|𝑭𝒄 )[𝟏 − 𝑷(𝑭)]


[Figure 3.1: Venn diagram of E and F, showing the decomposition E = EF ∪ EFᶜ.]

Example 3.6: An insurance company believes that people can be divided into two classes:
those who are accident prone and those who are not. The company’s statistics show that an
accident-prone person will have an accident at some time within a fixed 1-year period with
probability .4, whereas this probability decreases to .2 for a person who is not accident prone.
If we assume that 30 percent of the population is accident prone, what is the probability that a
new policyholder will have an accident within a year of purchasing a policy?

Solution: Let A1 denote the event that the policyholder will have an accident within a year of
purchasing the policy, and let A denote the event that the policyholder is accident prone.
Hence, the desired probability is given by

𝑷(𝑨𝟏 ) = 𝑷(𝑨𝟏 |𝑨)𝑷(𝑨) + 𝑷(𝑨𝟏 |𝑨𝒄 )𝑷(𝑨𝒄 )

= (𝟎. 𝟒)(𝟎. 𝟑) + (𝟎. 𝟐)(𝟎. 𝟕) = 𝟎. 𝟐𝟔

Example 3.6.b Suppose that a new policyholder has an accident within a year of purchasing a
policy. What is the probability that he or she is accident prone?

Solution. The desired probability is

P(A|A_1) = \frac{P(AA_1)}{P(A_1)} = \frac{P(A_1|A)P(A)}{P(A_1)} = \frac{0.4 \times 0.3}{0.26} = \frac{6}{13}

Example 3.7: In answering a question on a multiple-choice test, a student either knows the
answer or guesses. Let 𝒑 be the probability that the student knows the answer and 𝟏 − 𝒑 be
the probability that the student guesses. Assume that a student who guesses at the answer will
be correct with probability 𝟏/𝒎, where 𝒎 is the number of multiple-choice alternatives.
What is the conditional probability that a student knew the answer to a question given that he
or she answered it correctly?


Solution. Let 𝑪 and 𝑲 denote, respectively, the events that the student answers the question
correctly and the event that he or she actually knows the answer. Now,

P(K|C) = \frac{P(KC)}{P(C)} = \frac{P(C|K)P(K)}{P(C|K)P(K) + P(C|K^c)P(K^c)} = \frac{p}{p + (1/m)(1-p)} = \frac{mp}{1 + (m-1)p}

For example, if 𝒎 = 𝟓 and 𝒑 = 𝟏/𝟐, then the probability that the student knew the answer
to a question he or she answered correctly is 𝟓/𝟔.

Example 3.8: A laboratory blood test is 95 percent effective in detecting a certain disease
when it is, in fact, present. However, the test also yields a “false positive” result for 1 percent
of the healthy persons tested. (That is, if a healthy person is tested, then, with probability . 𝟎𝟏,
the test result will imply that he or she has the disease.) If . 𝟓 percent of the population
actually has the disease, what is the probability that a person has the disease given that the test
result is positive?

Solution. Let 𝑫 be the event that the person tested has the disease and 𝑬 the event that the test
result is positive. Then the desired probability is

P(D|E) = \frac{P(DE)}{P(E)} = \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} = \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.01)(0.995)} = \frac{95}{294} \approx 0.323

Since 0.5 percent of the population actually has the disease, it follows that, on the average, 1
person out of every 200 tested will have it. The test will correctly confirm that this person has
the disease with probability .95. Thus, on the average, out of every 200 persons tested, the test
will correctly confirm that .95 person has the disease. On the other hand, however, out of
the (on the average) 199 healthy people, the test will incorrectly state that (199)(.01) of these
people have the disease. Hence, for every .95 diseased person that the test correctly states is
ill, there are (on the average) (199)(.01) healthy persons that the test incorrectly states are ill.
Thus, the proportion of time that the test result is correct when it states that a person is ill is

\frac{0.95}{0.95 + (199)(0.01)} = \frac{95}{294} \approx 0.323
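
Bayes' formula calculations such as Example 3.8 reduce to a single expression once the prior and
the two conditional probabilities are written down. The helper below is a minimal sketch using
the numbers of the example; the function name is an assumption, not notation from the notes.

    def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
        """P(disease | positive test) computed from Bayes' formula."""
        p_positive = (p_pos_given_disease * prior
                      + p_pos_given_healthy * (1 - prior))
        return p_pos_given_disease * prior / p_positive

    # Example 3.8: 0.5% prevalence, 95% detection rate, 1% false-positive rate.
    print(posterior(0.005, 0.95, 0.01))   # ~0.323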


Example 3.9: Consider a medical practitioner pondering the following dilemma: “If I’m at
least 80 percent certain that my patient has this disease, then I always recommend surgery,
whereas if I’m not quite as certain, then I recommend additional tests that are expensive and
sometimes painful. Now, initially I was only 60 percent certain that Jones had the disease, so I
ordered the series A test, which always gives a positive result when the patient has the disease
and almost never does when he is healthy. The test result was positive, and I was all set to
recommend surgery when Jones informed me, for the first time, that he was diabetic. This
information complicates matters because, although it doesn’t change my original 60 percent
estimate of his chances of having the disease in question, it does affect the interpretation of
the results of the A test. This is so because the A test, while never yielding a positive result
when the patient is healthy, does unfortunately yield a positive result 30 percent of the time in
the case of diabetic patients who are not suffering from the disease. Now what do I do? More
tests or immediate surgery?”

Solution. Let D denote the event that Jones has the disease and E the event that the A test
result is positive. The desired conditional probability is then

P(D|E) = \frac{P(DE)}{P(E)} = \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} = \frac{1 \times 0.6}{1 \times 0.6 + 0.3 \times 0.4} \approx 0.833

Example 3.10: At a certain stage of a criminal investigation, the inspector in charge is 60


percent convinced of the guilt of a certain suspect. Suppose, however, that a new piece of
evidence which shows that the criminal has a certain characteristic (such as left-handedness,
baldness, or brown hair) is uncovered. If 20 percent of the population possesses this
characteristic, how certain of the guilt of the suspect should the inspector now be if it turns
out that the suspect has the characteristic?

Solution. Letting G denote the event that the suspect is guilty and C the event that he
possesses the characteristic of the criminal, we have

P(G|C) = \frac{P(GC)}{P(C)} = \frac{P(C|G)P(G)}{P(C|G)P(G) + P(C|G^c)P(G^c)} = \frac{1 \times 0.6}{1 \times 0.6 + 0.2 \times 0.4} \approx 0.882


3.3 INDEPENDENT EVENTS

In the special cases where 𝑷(𝑬|𝑭) does in fact equal 𝑷(𝑬), we say that 𝑬 is independent of 𝑭.
That is, 𝑬 is independent of 𝑭 if knowledge that 𝑭 has occurred does not change the
probability that 𝑬 occurs.

Since 𝑷(𝑬|𝑭) = 𝑷(𝑬𝑭)/𝑷(𝑭), it follows that 𝑬 is independent of 𝑭 if

𝑷(𝑬𝑭) = 𝑷(𝑬)𝑷(𝑭)

The fact that the above equation is symmetric in 𝑬 and 𝑭 shows that whenever 𝑬 is
independent of 𝑭, 𝑭 is also independent of 𝑬. We thus have the following

Definition:

Two events 𝑬 and 𝑭 are said to be independent if 𝑷(𝑬𝑭) = 𝑷(𝑬)𝑷(𝑭) holds.

Two events 𝑬 and 𝑭 that are not independent are said to be dependent.

Example 3.11: A card is selected at random from an ordinary deck of 𝟓𝟐 playing cards. If 𝑬
is the event that the selected card is an ace and 𝑭 is the event that it is a spade, then 𝑬 and 𝑭
are independent. This follows because 𝑷(𝑬𝑭) = 𝟏/𝟓𝟐, whereas 𝑷(𝑬) = 𝟒/𝟓𝟐 and 𝑷(𝑭) =
𝟏𝟑/𝟓𝟐 .

Example 3.12: Two coins are flipped, and all 4 outcomes are assumed to be equally likely. If
E is the event that the first coin lands on heads and F the event that the second lands on tails,
then E and F are independent, since 𝑷(𝑬𝑭) = 𝑷({(𝑯, 𝑻)}) = 𝟏/𝟒, whereas 𝑷(𝑬) =
𝑷({(𝑯, 𝑯), (𝑯, 𝑻)}) = ½ and 𝑷(𝑭) = 𝑷({(𝑯, 𝑻), (𝑻, 𝑻)}) = ½.

Example 3.13: Suppose that we toss 2 fair dice. Let E1 denote the event that the sum of the
dice is 6 and F denote the event that the first die equals 4. Then

𝑷(𝑬𝟏 𝑭) = 𝑷({(𝟒, 𝟐)}) = 𝟏/𝟑𝟔

whereas

𝑷(𝑬𝟏 )𝑷(𝑭) = (𝟓/𝟑𝟔) × (𝟏/𝟔) = (𝟓/𝟐𝟏𝟔)


Hence, E1 and F are not independent. Intuitively, the reason for this is clear because if we are
interested in the possibility of throwing a 6 (with 2 dice), we shall be quite happy if the first
die lands on 4 (or, indeed, on any of the numbers 1, 2, 3, 4, and 5), for then we shall still have
a possibility of getting a total of 6.

Now, suppose that we let 𝑬𝟐 be the event that the sum of the dice equals 7. Is 𝑬𝟐 independent
of 𝑭? The answer is yes, since

𝑷(𝑬𝟐 𝑭) = 𝑷({(𝟒, 𝟑)}) = 𝟏/𝟑𝟔

whereas

𝑷(𝑬𝟐 )𝑷(𝑭) = (𝟏/𝟔) × (𝟏/𝟔)

We leave it for the reader to present the intuitive argument why the event that the sum of the
dice equals 7 is independent of the outcome on the first die.
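
Whether two events on the dice sample space are independent can be checked mechanically by
comparing P(EF) with P(E)P(F). The sketch below (an added illustration) repeats the comparison
for the events of Example 3.13.

    from fractions import Fraction
    from itertools import product

    S = list(product(range(1, 7), repeat=2))

    def prob(event):
        return Fraction(sum(1 for w in S if event(w)), len(S))

    F  = lambda w: w[0] == 4      # first die equals 4
    E1 = lambda w: sum(w) == 6    # sum of the dice equals 6
    E2 = lambda w: sum(w) == 7    # sum of the dice equals 7

    print(prob(lambda w: E1(w) and F(w)) == prob(E1) * prob(F))   # False: dependent
    print(prob(lambda w: E2(w) and F(w)) == prob(E2) * prob(F))   # True: independent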

Proposition 3.1. If 𝑬 and 𝑭 are independent, then so are 𝑬 and 𝑭𝒄 .

Proof: Assume that E and F are independent. Since 𝑬 = 𝑬𝑭 𝑼 𝑬𝑭𝒄 , and 𝑬𝑭 and 𝑬𝑭𝒄 are
obviously mutually exclusive, we have that

𝑷(𝑬) = 𝑷(𝑬𝑭) + 𝑷(𝑬𝑭𝒄 )

= 𝑷(𝑬)𝑷(𝑭) + 𝑷(𝑬𝑭𝒄 )

then we get

𝑷(𝑬𝑭𝒄 ) = 𝑷(𝑬)[𝟏 − 𝑷(𝑭)]

= 𝑷(𝑬)𝑷(𝑭𝒄 )

Definition:

Three events E, F, and G are said to be independent if

𝑷(𝑬𝑭𝑮) = 𝑷(𝑬)𝑷(𝑭)𝑷(𝑮)

𝑷(𝑬𝑭) = 𝑷(𝑬)𝑷(𝑭)

𝑷(𝑬𝑮) = 𝑷(𝑬)𝑷(𝑮)

𝑷(𝑭𝑮) = 𝑷(𝑭)𝑷(𝑮)

Example 3.14: An infinite sequence of independent trials is to be performed. Each trial


results in a success with probability p and a failure with probability 1 − p.What is the
probability that

(a) at least 1 success occurs in the first n trials;

(b) exactly k successes occur in the first n trials;

(c) all trials result in successes?

Solution. (a) Since the probability that no success occurs in the first n trials is (1 − p)^n, the
answer to part (a) is 1 − (1 − p)^n.

(b) P\{\text{exactly } k \text{ successes}\} = \binom{n}{k} p^k (1-p)^{n-k}

(c) P\{\text{all trials result in successes}\} = \lim_{n \to \infty} p^n = \begin{cases} 0, & \text{if } p < 1 \\ 1, & \text{if } p = 1 \end{cases}

Example 3.15: A system composed of n separate components is said to be a parallel system if


it functions when at least one of the components functions (see Figure below). For such a
system, if component i, which is independent of the other components, functions with
probability p_i, i = 1, . . . , n, what is the probability that the system functions?

Solution.

P\{\text{system functions}\} = 1 - P\{\text{system does not function}\}
= 1 - P\{\text{all components do not function}\}
= 1 - \prod_{i=1}^{n} (1 - p_i) \quad \text{(by independence)}
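
The complement trick above turns the reliability of a parallel system into a one-line product.
The sketch below is an illustration with made-up component probabilities, not data from the
notes.

    from math import prod

    def parallel_reliability(p):
        """Probability that a parallel system of independent components functions."""
        return 1 - prod(1 - p_i for p_i in p)

    # Hypothetical component reliabilities.
    print(parallel_reliability([0.9, 0.8, 0.7]))   # 0.994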


3.4 𝑷(. |𝑭) IS A PROBABILITY

Proposition

(a) 𝟎 ≤ 𝑷(𝑬|𝑭) ≤ 𝟏
(b) 𝑷(𝑺|𝑭) = 𝟏
(c) If 𝑬𝒊 , 𝒊 = 𝟏, 𝟐, ⋯ are mutually exclusive events, then

P\left(\bigcup_{i=1}^{\infty} E_i \,\Big|\, F\right) = \sum_{i=1}^{\infty} P(E_i|F)

For example, P(E|F) + P(Eᶜ|F) = 1, whereas, in general, P(E|F) + P(E|Fᶜ) ≠ 1.

If we define 𝑸(𝑬) = 𝑷(𝑬|𝑭), then, from the preceding proposition, 𝑸(𝑬) may be regarded as a
probability function on the events of 𝑺. Hence, all of the propositions previously proved for
probabilities apply to 𝑸(𝑬). For instance, we have

𝑸(𝑬𝟏 𝑼𝑬𝟐 ) = 𝑸(𝑬𝟏 ) + 𝑸(𝑬𝟐 ) − 𝑸(𝑬𝟏 𝑬𝟐 )

or, equivalently,

𝑷(𝑬𝟏 𝑼 𝑬𝟐 |𝑭) = 𝑷(𝑬𝟏 |𝑭) + 𝑷(𝑬𝟐 |𝑭) − 𝑷(𝑬𝟏 𝑬𝟐 |𝑭) (∗)

Also, if we define the conditional probability 𝑸(𝑬𝟏 |𝑬𝟐 ) by 𝑸(𝑬𝟏 |𝑬𝟐 ) = 𝑸(𝑬𝟏 𝑬𝟐 )/𝑸(𝑬𝟐 ),
then, we have

𝑸(𝑬𝟏) = 𝑸(𝑬𝟏 |𝑬𝟐 )𝑸(𝑬𝟐 ) + 𝑸(𝑬𝟏 |𝑬𝑪𝟐 )𝑸(𝑬𝑪𝟐 )

Since

Q(E_1|E_2) = \frac{Q(E_1 E_2)}{Q(E_2)} = \frac{P(E_1 E_2|F)}{P(E_2|F)} = \frac{P(E_1 E_2 F)/P(F)}{P(E_2 F)/P(F)} = P(E_1|E_2 F)

Equation (*) is equivalent to

𝑷(𝑬𝟏 |𝑭) = 𝑷(𝑬𝟏 |𝑬𝟐 𝑭)𝑷(𝑬𝟐 |𝑭) + 𝑷(𝑬𝟏 |𝑬𝑪𝟐 𝑭)𝑷(𝑬𝑪𝟐 |𝑭)

Example 3.16: Consider Example 3.6, which is concerned with an insurance company which
believes that people can be divided into two distinct classes: those who are accident prone and
those who are not. During any given year, an accident-prone person will have an accident
with probability .4, whereas the corresponding figure for a person who is not prone to
accidents is .2. What is the conditional probability that a new policyholder will have an
accident in his or her second year of policy ownership, given that the policyholder has had an
accident in the first year?

Solution. If we let 𝑨 be the event that the policyholder is accident prone and we let 𝑨𝒊 ,
𝒊 = 𝟏, 𝟐, be the event that he or she has had an accident in the 𝒊-th year, then the desired
probability 𝑷(𝑨𝟐 |𝑨𝟏 ) may be obtained by conditioning on whether or not the policyholder is
accident prone, as follows:

𝑷(𝑨𝟐 |𝑨𝟏 ) = 𝑷(𝑨𝟐 |𝑨𝑨𝟏 )𝑷(𝑨|𝑨𝟏 ) + 𝑷(𝑨𝟐 |𝑨𝒄 𝑨𝟏 )𝑷(𝑨𝒄 |𝑨𝟏 )

Now,

P(A|A_1) = \frac{P(A_1 A)}{P(A_1)} = \frac{P(A_1|A)P(A)}{P(A_1)}

However, P(A) is assumed to equal 3/10, and it was shown in Example 3.6 that P(A_1) = 0.26.
Hence,

P(A|A_1) = \frac{(0.4)(0.3)}{0.26} = \frac{6}{13}


Thus,

𝑷(𝑨𝒄 | 𝑨𝟏 ) = 𝟏 − 𝑷(𝑨|𝑨𝟏 ) = 𝟕/𝟏𝟑

Since 𝑷(𝑨𝟐 |𝑨𝑨𝟏 ) = 𝟎. 𝟒 and 𝑷(𝑨𝟐 |𝑨𝒄 𝑨𝟏 ) = 𝟎. 𝟐, it follows that


P(A_2|A_1) = 0.4 \times \frac{6}{13} + 0.2 \times \frac{7}{13} \approx 0.29

Example 3.17: A female chimp has given birth. It is not certain, however, which of two male
chimps is the father. Before any genetic analysis has been performed, it is felt that the
probability that male number 𝟏 is the father is 𝒑 and the probability that male number 𝟐 is the
father is 𝟏 − 𝒑. DNA obtained from the mother, male number 1, and male number 2 indicate
that, on one specific location of the genome, the mother has the gene pair (𝑨, 𝑨), male
number 1 has the gene pair (𝒂, 𝒂), and male number 2 has the gene pair (𝑨, 𝒂). If a DNA test
shows that the baby chimp has the gene pair (𝑨, 𝒂), what is the probability that male number
1 is the father?

Solution: Let all probabilities be conditional on the event that the mother has the gene pair
(𝑨, 𝑨), male number 1 has the gene pair (𝒂, 𝒂), and male number 2 has the gene pair (𝑨, 𝒂).
Now, let 𝑴𝟏 and 𝑴𝟐 be the events that male number 1 and male number 2, respectively, is the father, and
let 𝑩𝑨,𝒂 be the event that the baby chimp has the gene pair (𝑨, 𝒂). Then 𝑷(𝑴𝟏 |𝑩𝑨,𝒂 ) is
obtained as follows:

P(M_1|B_{A,a}) = \frac{P(M_1 B_{A,a})}{P(B_{A,a})} = \frac{P(B_{A,a}|M_1)P(M_1)}{P(B_{A,a}|M_1)P(M_1) + P(B_{A,a}|M_2)P(M_2)} = \frac{1 \times p}{1 \times p + 0.5(1-p)} = \frac{2p}{1+p}

Because 2p/(1 + p) > p when p < 1, the information that the baby’s gene pair is (A, a) increases
the probability that male number 1 is the father. This result is intuitive because it is more
likely that the baby would have gene pair (A, a) if M_1 is true than if M_2 is true (the respective
conditional probabilities being 1 and 1/2).

SUMMARY

For events 𝐸 and 𝐹, the conditional probability of 𝐸 given that 𝐹 has occurred is denoted by
𝑃(𝐸|𝐹) and is defined by

P(E|F) = \frac{P(EF)}{P(F)}


The identity

𝑃(𝐸1 𝐸2 ⋯ 𝐸𝑛 ) = 𝑃(𝐸1 )𝑃(𝐸2 |𝐸1 ) ⋯ 𝑃(𝐸𝑛 |𝐸1 ⋯ 𝐸𝑛−1 )

is known as the multiplication rule of probability.

A valuable identity is

𝑃(𝐸) = 𝑃(𝐸|𝐹)𝑃(𝐹) + 𝑃(𝐸|𝐹 𝑐 )𝑃(𝐹 𝑐 )

which can be used to compute 𝑃(𝐸) by “conditioning” on whether F occurs.

𝑃(𝐻)/ 𝑃(𝐻 𝑐 ) is called the odds of the event 𝐻. The identity

\frac{P(H|E)}{P(H^c|E)} = \frac{P(H)P(E|H)}{P(H^c)P(E|H^c)}

shows that when new evidence 𝐸 is obtained, the value of the odds of 𝐻 becomes its old value
multiplied by the ratio of the conditional probability of the new evidence when 𝐻 is true to the
conditional probability when 𝐻 is not true.

Let 𝐹𝑖 , 𝑖 = 1, ⋯ , 𝑛, be mutually exclusive events whose union is the entire sample space. The
identity

P(F_j|E) = \frac{P(E|F_j)P(F_j)}{\sum_{i=1}^{n} P(E|F_i)P(F_i)}

is known as Bayes’s formula. If the events 𝐹𝑖 , 𝑖 = 1, … , 𝑛, are competing hypotheses, then


Bayes’s formula shows how to compute the conditional probabilities of these hypotheses
when additional evidence 𝐸 becomes available.

If 𝑃(𝐸𝐹) = 𝑃(𝐸)𝑃(𝐹), then we say that the events 𝐸 and 𝐹 are independent. This condition
is equivalent to 𝑃(𝐸|𝐹) = 𝑃(𝐸) and to 𝑃(𝐹|𝐸) = 𝑃(𝐹). Thus, the events 𝐸 and 𝐹 are
independent if knowledge of the occurrence of one of them does not affect the probability of
the other.

The events 𝐸1 , … , 𝐸𝑛 are said to be independent if, for any subset 𝐸𝑖1 , … , 𝐸𝑖𝑟 of them,

𝑃(𝐸𝑖1 ⋯ 𝐸𝑖𝑟 ) = 𝑃(𝐸𝑖1 ) ⋯ 𝑃(𝐸𝑖𝑟 )


CHAPTER 4:
RANDOM VARIABLES

4.1 RANDOM VARIABLES

We are interested mainly in some function of the outcome as opposed to the actual
outcome itself. For instance, in tossing dice, we are often interested in the sum of
the two dice and are not really concerned about the separate values of each die. That
is, we may be interested in knowing that the sum is 7 and may not be concerned over
whether the actual outcome was (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), or (6, 1). Also, in
flipping a coin, we may be interested in the total number of heads that occur and not
care at all about the actual head–tail sequence that results. These quantities of
interest, or, more formally, these real-valued functions defined on the sample space,
are known as random variables.

Example 4.1: Suppose that our experiment consists of tossing 3 fair coins. If we let Y
denote the number of heads that appear, then Y is a random variable taking on one
of the values 0, 1, 2, and 3 with respective probabilities

𝑃{𝑌 = 0} = 𝑃{(𝑇, 𝑇, 𝑇)} = 1/8

𝑃{𝑌 = 1} = 𝑃{(𝑇, 𝑇, 𝐻), (𝑇, 𝐻, 𝑇), (𝐻, 𝑇, 𝑇)} = 3/8

𝑃{𝑌 = 2} = 𝑃{(𝑇, 𝐻, 𝐻), (𝐻, 𝑇, 𝐻), (𝐻, 𝐻, 𝑇)} = 3/8

𝑃{𝑌 = 3} = 𝑃{(𝐻, 𝐻, 𝐻)} = 1/8


Example 4.2: Three balls are to be randomly selected without replacement from an
urn containing 20 balls numbered 1 through 20. If we bet that at least one of the balls
that are drawn has a number as large as or larger than 17, what is the probability that
we win the bet?

Solution. Let 𝑋 denote the largest number selected. Then 𝑋 is a random variable
taking on one of the values 3, 4, . . . , 20. Furthermore, if we suppose that each of the
\binom{20}{3} possible selections are equally likely to occur, then

P\{X = i\} = \frac{\binom{i-1}{2}}{\binom{20}{3}}, \qquad i = 3, 4, \ldots, 20

The above equation follows because the number of selections that result in the event
{X = i} is just the number of selections in which the ball numbered i and two of the balls
numbered 1 through (i − 1) are chosen. Because there are clearly \binom{i-1}{2} such selections, we
obtain the probabilities expressed above, from which we see that

P\{X = 20\} = \frac{\binom{19}{2}}{\binom{20}{3}} = \frac{3}{20} = 0.150

P\{X = 19\} = \frac{\binom{18}{2}}{\binom{20}{3}} = \frac{51}{380} \approx 0.134

P\{X = 18\} = \frac{\binom{17}{2}}{\binom{20}{3}} = \frac{34}{285} \approx 0.119

P\{X = 17\} = \frac{\binom{16}{2}}{\binom{20}{3}} = \frac{2}{19} \approx 0.105

Hence, since the event {𝑋 ≥ 17} is the union of the disjoint events {𝑋 = 𝑖}, 𝑖 =
17, 18, 19, 20, it follows that the probability of our winning the bet is given by

𝑃{𝑋 ≥ 17} = .105 + .119 + .134 + .150 = .508
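
The probability mass function of X in Example 4.2 can be tabulated directly from the counting
formula; the sketch below (illustrative only) does so with math.comb and re-derives
P{X ≥ 17} ≈ 0.508.

    from math import comb

    n, k = 20, 3
    pmf = {i: comb(i - 1, 2) / comb(n, k) for i in range(k, n + 1)}

    print(sum(pmf.values()))                    # 1.0, so the probabilities sum to one
    print(sum(pmf[i] for i in range(17, 21)))   # ~0.508, the probability of winning the bet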

Example 4.3: Independent trials consisting of the flipping of a coin having probability
𝑝 of coming up heads are continually performed until either a head occurs or a total
of 𝑛 flips is made. If we let 𝑋 denote the number of times the coin is flipped, then 𝑋 is
a random variable taking on one of the values 1, 2, 3, . . . , 𝑛 with respective
probabilities

𝑃{𝑋 = 1} = 𝑃{𝐻} = 𝑝

𝑃{𝑋 = 2} = 𝑃{(𝑇, 𝐻)} = (1 − 𝑝)𝑝

𝑃{𝑋 = 3} = 𝑃{(𝑇, 𝑇, 𝐻)} = (1 − 𝑝)2 𝑝

\vdots

P\{X = n-1\} = P\{(\underbrace{T, T, \ldots, T}_{n-2}, H)\} = (1-p)^{n-2} p

P\{X = n\} = P\{(\underbrace{T, T, \ldots, T}_{n-1}, H)\} + P\{(\underbrace{T, T, \ldots, T}_{n})\} = (1-p)^{n-1}

As a check, note that

P\left(\bigcup_{i=1}^{n} \{X = i\}\right) = \sum_{i=1}^{n} P\{X = i\} = \sum_{i=1}^{n-1} p(1-p)^{i-1} + (1-p)^{n-1} = 1

Example 4.4: Three balls are randomly chosen from an urn containing 3 white, 3 red,
and 5 black balls. Suppose that we win $1 for each white ball selected and lose $1 for


each red ball selected. If we let X denote our total winnings from the experiment,
then 𝑋 is a random variable taking on the possible values 0, ±1, ±2, ±3 with
respective probabilities

𝑃{𝑋 = 0} = [\binom{5}{3} + \binom{3}{1}\binom{3}{1}\binom{5}{1}] / \binom{11}{3} = 55/165

𝑃{𝑋 = 1} = 𝑃{𝑋 = −1} = [\binom{3}{1}\binom{5}{2} + \binom{3}{2}\binom{3}{1}] / \binom{11}{3} = 39/165

𝑃{𝑋 = 2} = 𝑃{𝑋 = −2} = \binom{3}{2}\binom{5}{1} / \binom{11}{3} = 15/165

𝑃{𝑋 = 3} = 𝑃{𝑋 = −3} = \binom{3}{3} / \binom{11}{3} = 1/165

The probability that we win money is given by

∑_{𝑖=1}^{3} 𝑃{𝑋 = 𝑖} = 55/165 = 1/3

4.2 DISTRIBUTION FUNCTIONS

The cumulative distribution function (c.d.f.), or more simply the distribution function
of a random variable 𝑋, is defined for all real numbers 𝑏, −∞ < 𝑏 < ∞, by

𝐹 (𝑏 ) = 𝑃 {𝑋 ≤ 𝑏 }

In words, 𝐹(𝑏) denotes the probability that the random variable 𝑋 takes on a value
that is less than or equal to 𝑏. Some properties of the c.d.f., 𝐹 are,

1. 𝐹 is a non-decreasing function; that is if 𝑎 < 𝑏, then 𝐹(𝑎) ≤ 𝐹(𝑏)


2. lim𝑏→∞ 𝐹 (𝑏) = 1.

3. lim𝑏→−∞ 𝐹 (𝑏) = 0.

Example 4.5: The distribution function of the random variable 𝑋 is given by:

𝐹(𝑥) = 0,        𝑥 < 0
       𝑥/2,      0 ≤ 𝑥 < 1
       2/3,      1 ≤ 𝑥 < 2
       11/12,    2 ≤ 𝑥 < 3
       1,        3 ≤ 𝑥

A graph of 𝐹(𝑥) is presented in the figure below. Compute (a) 𝑃{𝑋 < 3}, (b)
𝑃{𝑋 = 1}, (c) 𝑃{𝑋 > 1/2}, and (d) 𝑃{2 < 𝑋 ≤ 4}.

Solution:

(a) 𝑃{𝑋 < 3} = lim_{ℎ→0⁺} 𝑃{𝑋 ≤ 3 − ℎ} = lim_{ℎ→0⁺} 𝐹(3 − ℎ) = 11/12

(b) 𝑃{𝑋 = 1} = 𝑃{𝑋 ≤ 1} − 𝑃{𝑋 < 1} = 𝐹(1) − lim_{ℎ→0⁺} 𝐹(1 − ℎ) = 2/3 − 1/2 = 1/6

(c) 𝑃{𝑋 > 1/2} = 1 − 𝑃{𝑋 ≤ 1/2} = 1 − 𝐹(1/2) = 3/4

(d) 𝑃{2 < 𝑋 ≤ 4} = 𝐹(4) − 𝐹(2) = 1 − 11/12 = 1/12


4.3 DISCRETE RANDOM VARIABLES

A random variable that can take on at most a countable number of possible values is
said to be discrete. For a discrete random variable X, we define the probability mass
function 𝑝(𝑎) of 𝑋 by

𝑝(𝑎) = 𝑃{𝑋 = 𝑎}

The probability mass function 𝑝(𝑎) is positive for at most a countable number of
values of 𝑎. That is, if 𝑋 must assume one of the values 𝑥1 , 𝑥2 , ⋯ , then

𝑝(𝑥𝑖 ) ≥ 0 for 𝑖 = 1, 2, ⋯

𝑝(𝑥) = 0 for all other values of 𝑥

Since 𝑋 must take on one of the values 𝑥𝑖 , we have

∑_{𝑖=1}^{∞} 𝑝(𝑥𝑖) = 1

For instance, if the probability mass function of 𝑋 is

𝑝(0) = ¼, 𝑝(1) = ½, 𝑝(2) = ¼

we can represent this function graphically as shown in Figure below.

The cumulative distribution function 𝐹 can be expressed in terms of 𝑝(𝑎) by


𝐹(𝑎) = ∑_{all 𝑥 ≤ 𝑎} 𝑝(𝑥)

If 𝑋 is a discrete random variable whose possible values are 𝑥1 , 𝑥2 , 𝑥3 , ⋯, where


𝑥1 < 𝑥2 < 𝑥3 < ···, then the distribution function 𝐹 of 𝑋 is a step function. That is,
the value of 𝐹 is constant in the intervals [𝑥𝑖−1 , 𝑥𝑖 ) and then takes a step (or jump) of
size 𝑝(𝑥𝑖 ) at 𝑥𝑖 . For instance, if 𝑋 has a probability mass function given by

𝑝(1) = 1/4,   𝑝(2) = 1/2,   𝑝(3) = 1/8,   𝑝(4) = 1/8

then its cumulative distribution function is

𝐹(𝑎) = 0,      𝑎 < 1
       1/4,    1 ≤ 𝑎 < 2
       3/4,    2 ≤ 𝑎 < 3
       7/8,    3 ≤ 𝑎 < 4
       1,      4 ≤ 𝑎

This step function can be depicted graphically, with a jump of size 𝑝(𝑥𝑖) at each value 𝑥𝑖.
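To make the step-function behaviour concrete, here is a small Python sketch (added to these notes as an illustration; it is not from the original text) that builds the cumulative distribution function from the probability mass function above.

    # pmf of X from the example above
    pmf = {1: 1/4, 2: 1/2, 3: 1/8, 4: 1/8}

    def cdf(a):
        # F(a) = sum of p(x) over all x <= a
        return sum(p for x, p in pmf.items() if x <= a)

    for a in [0.5, 1, 2.7, 3, 5]:
        print(a, cdf(a))  # 0.0, 0.25, 0.75, 0.875, 1.0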

4.4 EXPECTED VALUE

One of the most important concepts in probability theory is that of the expectation of
a random variable. If 𝑋 is a discrete random variable having a probability mass


function 𝑝(𝑥), then the expectation, or the expected value, of 𝑋, denoted by 𝐸[𝑋], is
defined by

𝐸[𝑋] = ∑_{𝑥: 𝑝(𝑥)>0} 𝑥 𝑝(𝑥)

The expected value of 𝑋 is a weighted average of the possible values that 𝑋 can take
on, each value being weighted by the probability that 𝑋 assumes it. For instance, on
the one hand, if the probability mass function of 𝑋 is given by

𝑝(0) = 1/2 = 𝑝(1)

then

𝐸[𝑋] = 0·(1/2) + 1·(1/2) = 1/2

On the other hand, if

𝑝(0) = 1/3,   𝑝(1) = 2/3

then

𝐸[𝑋] = 0·(1/3) + 1·(2/3) = 2/3

Example 4.6: Find 𝐸[𝑋], where 𝑋 is the outcome when we roll a fair die.

Solution. Since 𝑝(1) = 𝑝(2) = 𝑝(3) = 𝑝(4) = 𝑝(5) = 𝑝(6) = 1/6, we obtain

𝐸[𝑋] = 1(1/6) + 2 (1/6) + 3 (1/6) + 4 (1/6) + 5 (1/6) + 6 (1/6) = 7/2

that is, 𝐸[𝑋] = ∑_{𝑖} 𝑥𝑖 𝑝(𝑥𝑖) is simply the weighted average of the six equally likely faces.


Example 4.7: A school class of 120 students is driven in 3 buses for a school trip.
There are 36 students in one of the buses, 40 in another, and 44 in the third bus.
When the buses arrive, one of the 120 students is randomly chosen. Let 𝑋 denote the
number of students on the bus of that randomly chosen student, and find 𝐸[𝑋].

Solution. Since the randomly chosen student is equally likely to be any of the 120
students, it follows that

𝑃{𝑋 = 36} = 36/120, 𝑃 {𝑋 = 40} = 40/120, 𝑃{𝑋 = 44} = 44/120

Hence,

𝐸[𝑋] = 36 (3/10) + 40 (1/3) + 44 (11/30) = 1208/30 = 40.2667

However, the average number of students on a bus is 120/3 = 40.
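The weighted-average character of 𝐸[𝑋] is easy to check numerically. The following Python sketch is an added illustration (not part of the original notes) reproducing the bus example.

    # sizes of the three buses from Example 4.7
    sizes = [36, 40, 44]
    total = sum(sizes)  # 120 students

    # X = size of the bus carrying a randomly chosen student, so P{X = s} = s / total
    e_x = sum(s * (s / total) for s in sizes)
    plain_average = total / len(sizes)

    print(e_x)            # 40.266...
    print(plain_average)  # 40.0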

4.5 EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE

Suppose that we are given a discrete random variable along with its probability mass
function and that we want to compute the expected value of some function of 𝑋, say,
𝑔(𝑋). How can we accomplish this? One way is as follows: Since 𝑔(𝑋) is itself a
discrete random variable, it has a probability mass function, which can be determined
from the probability mass function of X. Once we have determined the probability
mass function of 𝑔(𝑋), we can compute 𝐸[𝑔(𝑋)] by using the definition of expected
value.

Example 4.8: Let X denote a random variable that takes on any of the values −1, 0,
and 1 with respective probabilities

𝑃{𝑋 = −1} = 0.2, 𝑃{𝑋 = 0} = 0.5, 𝑃{𝑋 = 1} = 0.3

Compute 𝐸[𝑋 2 ].

Solution. Let 𝑌 = 𝑋 2 . Then the probability mass function of 𝑌 is given by


𝑃{𝑌 = 1} = 𝑃{𝑋 = −1} + 𝑃{𝑋 = 1} = 0.5

𝑃{𝑌 = 0} = 𝑃{𝑋 = 0} = 0.5

Hence,

𝐸[𝑋 2 ] = 𝐸[𝑌] = 1(0.5) + 0(0.5) = 0.5

Note that

0.5 = 𝐸 [𝑋 2 ] ≠ (𝐸 [𝑋 ])2 = .01

Proposition: If X is a discrete random variable that takes on one of the


values 𝒙𝒊 , 𝒊 ≥ 𝟏, with respective probabilities 𝒑(𝒙𝒊 ), then, for any real-
valued function 𝒈,

𝑬[𝒈(𝑿)] = ∑_𝒊 𝒈(𝒙𝒊)𝒑(𝒙𝒊)

Before proving this proposition, let us check that it is in accord with the results of the
previous example. Applying it to that example yields

𝐸[𝑋²] = (−1)²(0.2) + 0²(0.5) + 1²(0.3)

= 1(0.2 + 0.3) + 0(0.5)

= 0.5

which is in agreement with the result obtained in Example 4.8.

Example 4.9: A product that is sold seasonally yields a net profit of 𝑏 dollars for each
unit sold and a net loss of 𝑙 dollars for each unit left unsold when the season ends.
The number of units of the product that are ordered at a specific department store
during any season is a random variable having probability mass function 𝑝(𝑖), 𝑖 ≥ 0. If
the store must stock this product in advance, determine the number of units the
store should stock so as to maximize its expected profit.


Solution. Let 𝑋 denote the number of units ordered. If 𝑠 units are stocked, then the
profit—call it 𝑃_𝑠(𝑋)—can be expressed as

𝑃_𝑠(𝑋) = { 𝑏𝑋 − (𝑠 − 𝑋)𝑙,   if 𝑋 ≤ 𝑠
         { 𝑠𝑏,              if 𝑋 > 𝑠

Hence, the expected profit equals

𝐸[𝑃_𝑠(𝑋)] = ∑_{𝑖=0}^{𝑠} [𝑏𝑖 − (𝑠 − 𝑖)𝑙] 𝑝(𝑖) + ∑_{𝑖=𝑠+1}^{∞} 𝑠𝑏 𝑝(𝑖)

= (𝑏 + 𝑙) ∑_{𝑖=0}^{𝑠} 𝑖𝑝(𝑖) − 𝑠𝑙 ∑_{𝑖=0}^{𝑠} 𝑝(𝑖) + 𝑠𝑏 [1 − ∑_{𝑖=0}^{𝑠} 𝑝(𝑖)]

= (𝑏 + 𝑙) ∑_{𝑖=0}^{𝑠} 𝑖𝑝(𝑖) − (𝑏 + 𝑙)𝑠 ∑_{𝑖=0}^{𝑠} 𝑝(𝑖) + 𝑠𝑏

= 𝑠𝑏 + (𝑏 + 𝑙) ∑_{𝑖=0}^{𝑠} (𝑖 − 𝑠)𝑝(𝑖)

To determine the optimum value of s, let us investigate what happens to the profit
when we increase s by 1 unit. By substitution, we see that the expected profit in this
case is given by

𝐸[𝑃_{𝑠+1}(𝑋)] = 𝑏(𝑠 + 1) + (𝑏 + 𝑙) ∑_{𝑖=0}^{𝑠+1} (𝑖 − 𝑠 − 1)𝑝(𝑖)

= 𝑏(𝑠 + 1) + (𝑏 + 𝑙) ∑_{𝑖=0}^{𝑠} (𝑖 − 𝑠 − 1)𝑝(𝑖)

Therefore,


𝐸[𝑃_{𝑠+1}(𝑋)] − 𝐸[𝑃_𝑠(𝑋)] = 𝑏 − (𝑏 + 𝑙) ∑_{𝑖=0}^{𝑠} 𝑝(𝑖)

Thus, stocking s + 1 units will be better than stocking s units whenever

∑_{𝑖=0}^{𝑠} 𝑝(𝑖) < 𝑏/(𝑏 + 𝑙)

Because the left-hand side of the above inequality is increasing in 𝑠 while the right-
hand side is constant, the inequality will be satisfied for all values of 𝑠 ≤ 𝑠∗, where 𝑠∗
is the largest value of 𝑠 satisfying it. Since

𝐸[𝑃_0(𝑋)] < ⋯ < 𝐸[𝑃_{𝑠∗}(𝑋)] < 𝐸[𝑃_{𝑠∗+1}(𝑋)] > 𝐸[𝑃_{𝑠∗+2}(𝑋)] > ⋯

it follows that stocking 𝑠∗ + 1 items will lead to a maximum expected profit.
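The stocking rule is easy to apply in practice: keep increasing 𝑠 as long as the cumulative probability ∑_{𝑖=0}^{𝑠} 𝑝(𝑖) stays below 𝑏/(𝑏 + 𝑙). The Python sketch below is an added illustration; the demand distribution and the profit/loss figures are hypothetical numbers chosen only for the example.

    # hypothetical demand pmf p(i), i = 0..5, and per-unit profit/loss
    pmf = [0.05, 0.15, 0.25, 0.25, 0.20, 0.10]
    b, loss = 3.0, 1.0             # profit per unit sold, loss per unit left unsold
    threshold = b / (b + loss)

    cum, s_star = 0.0, -1
    for s, p in enumerate(pmf):
        cum += p
        if cum < threshold:
            s_star = s             # largest s with cumulative probability below b/(b+l)

    print("stock", s_star + 1, "units")   # s* + 1 maximizes the expected profit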

Corollary: If a and b are constants, then


𝑬[𝒂𝑿 + 𝒃] = 𝒂𝑬[𝑿] + 𝒃

Proof:

𝐸 [𝑎𝑋 + 𝑏] = ∑(𝑎𝑥 + 𝑏)𝑝(𝑥)

= 𝑎∑(𝑥)𝑝(𝑥) + 𝑏∑𝑝(𝑥)

= 𝑎𝐸 [𝑋 ] + 𝑏

The expected value of a random variable 𝑋, 𝐸[𝑋], is also referred to as the mean or
the first moment of 𝑋. The quantity 𝐸[𝑋 𝑛 ], 𝑛 ≥ 1, is called the n-th moment of 𝑋. By
the proposition above, we note that

𝐸[𝑋^𝑛] = ∑_{𝑥: 𝑝(𝑥)>0} 𝑥^𝑛 𝑝(𝑥)


4.6 VARIANCE

We expect X to take on values around its mean 𝐸[𝑋], it would appear that a
reasonable way of measuring the possible variation of X would be to look at how far
apart X would be from its mean, on the average. One possible way to measure this
variation would be to consider the quantity 𝐸 [|𝑋 − 𝜇|], where 𝜇 = 𝐸[𝑋]. However,
it turns out to be mathematically inconvenient to deal with this quantity, so a more
tractable quantity is usually considered—namely, the expectation of the square of
the difference between X and its mean.

Definition: If 𝑿 is a random variable with mean 𝝁, then the variance of 𝑿,


denoted by 𝑽𝒂𝒓(𝑿), is defined by
𝑽𝒂𝒓(𝑿) = 𝑬[(𝑿 − 𝝁)𝟐 ]

𝑉𝑎𝑟(𝑋) = 𝐸[(𝑋 − 𝜇)²]

= ∑_𝑥 (𝑥 − 𝜇)² 𝑝(𝑥)

= ∑_𝑥 (𝑥² − 2𝜇𝑥 + 𝜇²) 𝑝(𝑥)

= ∑_𝑥 𝑥² 𝑝(𝑥) − 2𝜇 ∑_𝑥 𝑥𝑝(𝑥) + 𝜇² ∑_𝑥 𝑝(𝑥)

= 𝐸[𝑋²] − 2𝜇² + 𝜇²

= 𝐸[𝑋²] − 𝜇²

That is,

𝑽𝒂𝒓(𝑿) = 𝑬[𝑿𝟐 ] − (𝑬[𝑿])𝟐


Example 4.10: Calculate 𝑉𝑎𝑟(𝑋 ) if 𝑋 represents the outcome when a fair die is rolled.

Solution. It was shown in a previous example that 𝐸 [𝑋 ] = 7/2. Also,

𝐸[𝑋²] = 1²(1/6) + 2²(1/6) + 3²(1/6) + 4²(1/6) + 5²(1/6) + 6²(1/6) = 91/6.

Hence,

𝑉𝑎𝑟(𝑋) = 91/6 − (7/2)² = 35/12
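A minimal Python check of this computation (an added illustration, not from the original notes): compute 𝐸[𝑋] and 𝐸[𝑋²] for a fair die and apply Var(𝑋) = 𝐸[𝑋²] − (𝐸[𝑋])².

    from fractions import Fraction

    faces = range(1, 7)
    p = Fraction(1, 6)                   # fair die: p(x) = 1/6 for x = 1..6

    ex  = sum(x * p for x in faces)      # E[X]   = 7/2
    ex2 = sum(x * x * p for x in faces)  # E[X^2] = 91/6
    var = ex2 - ex ** 2                  # Var(X) = 35/12

    print(ex, ex2, var)                  # 7/2 91/6 35/12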

A useful identity is that, for any constants a and b,

𝑉𝑎𝑟(𝑎𝑋 + 𝑏) = 𝑎2 𝑉𝑎𝑟(𝑋 )

To prove this equality, let 𝜇 = 𝐸[𝑋] and note from the corollary above that 𝐸[𝑎𝑋 + 𝑏] =
𝑎𝜇 + 𝑏. Therefore,

𝑉𝑎𝑟(𝑎𝑋 + 𝑏) = 𝐸[(𝑎𝑋 + 𝑏 − 𝑎𝜇 − 𝑏)2 ]

= 𝐸[𝑎2 (𝑋 − 𝜇)2 ]

= 𝑎2 𝐸 [(𝑋 − 𝜇)2 ]

= 𝑎² 𝑉𝑎𝑟(𝑋)

The square root of 𝑉𝑎𝑟(𝑋) is called the standard deviation of 𝑋, and we
denote it by 𝑆𝐷 (𝑋 ). That is,

𝑆𝐷 (𝑋 ) = √𝑉𝑎𝑟(𝑋)

4.7 THE BERNOULLI AND BINOMIAL RANDOM VARIABLES

Suppose that a trial, or an experiment, whose outcome can be classified as either a


success or a failure is performed. If we let 𝑋 = 1 when the outcome is a success and
𝑋 = 0 when it is a failure, then the probability mass function of 𝑋 is given by

𝑝(0) = 𝑃{𝑋 = 0} = 1 − 𝑝

𝑝(1) = 𝑃{𝑋 = 1} = 𝑝

A random variable 𝑋 is said to be a Bernoulli random variable if its probability mass


function is given by above equation for some 𝑝 ∈ (0, 1).

Suppose now that 𝑛 independent trials, each of which results in a success with
probability 𝑝 and in a failure with probability 1 − 𝑝, are to be performed. If 𝑋
represents the number of successes that occur in the 𝑛 trials, then 𝑋 is said to be a
binomial random variable with parameters (𝑛, 𝑝). Thus, a Bernoulli random variable
is just a binomial random variable with parameters (1, 𝑝).

𝑝(𝑖) = \binom{𝑛}{𝑖} 𝑝^𝑖 (1 − 𝑝)^{𝑛−𝑖},   𝑖 = 0, 1, …, 𝑛

Note that, by the binomial theorem, the probabilities sum to 1; that is,

∑_{𝑖=0}^{𝑛} 𝑝(𝑖) = ∑_{𝑖=0}^{𝑛} \binom{𝑛}{𝑖} 𝑝^𝑖 (1 − 𝑝)^{𝑛−𝑖} = [𝑝 + (1 − 𝑝)]^𝑛 = 1

4.7.1 Properties of Binomial Random Variables

𝐸 [𝑋 ] = 𝑛𝑝

𝐸 [𝑋 2 ] = 𝑛𝑝[(𝑛 − 1)𝑝 + 1]

If 𝑿 is a binomial random variable with parameters 𝒏 and 𝒑, then

𝑬[𝑿] = 𝒏𝒑
𝐕𝐚𝐫(𝑿) = 𝒏𝒑(𝟏 − 𝒑)
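These moment formulas can be verified directly from the probability mass function. Below is a short Python sketch (added as an illustration; 𝑛 = 10 and 𝑝 = 0.3 are arbitrary example values) that builds the binomial pmf and checks 𝐸[𝑋] = 𝑛𝑝 and Var(𝑋) = 𝑛𝑝(1 − 𝑝).

    from math import comb

    def binom_pmf(i, n, p):
        # p(i) = C(n, i) * p^i * (1 - p)^(n - i)
        return comb(n, i) * p ** i * (1 - p) ** (n - i)

    n, p = 10, 0.3
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]

    mean = sum(i * pmf[i] for i in range(n + 1))
    var = sum(i * i * pmf[i] for i in range(n + 1)) - mean ** 2

    print(round(mean, 6), n * p)            # 3.0 3.0
    print(round(var, 6), n * p * (1 - p))   # 2.1 2.1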

4.8 THE POISSON RANDOM VARIABLE

A random variable X that takes on one of the values 0, 1, 2, . . . is said to be a Poisson


random variable with parameter 𝜆 if, for some 𝜆 > 0,


𝑝(𝑖) = 𝑃{𝑋 = 𝑖} = 𝑒^{−𝜆} 𝜆^𝑖 / 𝑖!,   𝑖 = 0, 1, 2, …

Equation above defines a probability mass function, since

∑_{𝑖=0}^{∞} 𝑝(𝑖) = 𝑒^{−𝜆} ∑_{𝑖=0}^{∞} 𝜆^𝑖/𝑖! = 𝑒^{−𝜆} 𝑒^{𝜆} = 1

The Poisson random variable has a tremendous range of applications in diverse areas
because it may be used as an approximation for a binomial random variable with
parameters (𝑛, 𝑝) when 𝑛 is large and 𝑝 is small enough so that 𝑛𝑝 is of moderate
size.

If 𝑛 independent trials, each of which results in a success with probability 𝑝, are


performed, then, when n is large and p is small enough to make 𝑛𝑝 moderate, the
number of successes occurring is approximately a Poisson random variable with
parameter 𝜆 = 𝑛𝑝. This value 𝜆, which is equal the expected number of successes, is
usually determined empirically.

Some examples of random variables that generally obey the Poisson probability law
are as follows:

1. The number of misprints on a page (or a group of pages) of a book

2. The number of people in a community who survive to age 100

3. The number of wrong telephone numbers that are dialed in a day

4. The number of packages of dog biscuits sold in a particular store each day

5. The number of customers entering a post office on a given day

6. The number of vacancies occurring during a year in the federal judicial system


7. The number of α-particles discharged in a fixed period of time from some


radioactive material

Example 4.11: Suppose that the number of typographical errors on a single page of
this book has a Poisson distribution with parameter 𝜆 = 1/2. Calculate the probability
that there is at least one error on this page.

Solution. Letting X denote the number of errors on this page, we have

𝑃{𝑋 ≥ 1} = 1 − 𝑃{𝑋 = 0} = 1 − 𝑒^{−1/2} ≈ 0.393

The expected value and the variance of a Poisson random variable are
both equal to its parameter 𝝀.
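The Poisson approximation to the binomial is easy to see numerically. The sketch below is an added Python illustration (the values 𝑛 = 1000, 𝑝 = 0.005 are arbitrary) comparing binomial probabilities with Poisson probabilities for 𝜆 = 𝑛𝑝 when 𝑛 is large and 𝑝 is small.

    from math import comb, exp, factorial

    n, p = 1000, 0.005       # many trials, small success probability
    lam = n * p              # lambda = np = 5

    for i in range(4):
        binomial = comb(n, i) * p ** i * (1 - p) ** (n - i)
        poisson = exp(-lam) * lam ** i / factorial(i)
        print(i, round(binomial, 5), round(poisson, 5))
    # the two columns agree to roughly three decimal places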

SUMMARY

A real-valued function defined on the outcome of a probability experiment is called a


random variable.

If X is a random variable, then the function F(x) defined by

𝐹(𝑥) = 𝑃{𝑋 ≤ 𝑥}

is called the distribution function of 𝑋. All probabilities concerning 𝑋 can be stated in


terms of 𝐹.

A random variable whose set of possible values is either finite or countably infinite is
called discrete. If X is a discrete random variable, then the function

𝑝(𝑥) = 𝑃{𝑋 = 𝑥}

is called the probability mass function of 𝑋. Also, the quantity 𝐸 [𝑋 ] defined by

𝐸[𝑋] = ∑_{𝑥: 𝑝(𝑥)>0} 𝑥𝑝(𝑥)


is called the expected value of X. E[X] is also commonly called the mean or the
expectation of X.

A useful identity states that, for a function g,

𝐸[𝑔(𝑋)] = ∑_{𝑥: 𝑝(𝑥)>0} 𝑔(𝑥)𝑝(𝑥)

The variance of a random variable 𝑋, denoted by Var(𝑋), is defined by

Var(𝑋) = 𝐸 [(𝑋 − 𝐸 [𝑋 ])2 ]

The variance, which is equal to the expected square of the difference between 𝑋 and
its expected value, is a measure of the spread of the possible values of 𝑋. A useful
identity is

Var(𝑋 ) = 𝐸[𝑋 2 ] − (𝐸 [𝑋 ])2

The quantity √Var(𝑋 ) is called the standard deviation of 𝑋.

We now note some common types of discrete random variables. The random
variable 𝑋 whose probability mass function is given by

𝑝(𝑖) = \binom{𝑛}{𝑖} 𝑝^𝑖 (1 − 𝑝)^{𝑛−𝑖},   𝑖 = 0, …, 𝑛

is said to be a binomial random variable with parameters 𝑛 and 𝑝. Such a random


variable can be interpreted as being the number of successes that occur when 𝑛
independent trials, each of which results in a success with probability 𝑝, are
performed. Its mean and variance are given by

𝐸 [𝑋 ] = 𝑛𝑝 Var(𝑋) = 𝑛𝑝(1 − 𝑝)

The random variable X whose probability mass function is given by


𝑝(𝑖) = 𝑒^{−𝜆} 𝜆^𝑖 / 𝑖!,   𝑖 ≥ 0

is said to be a Poisson random variable with parameter 𝜆. If a large number of


(approximately) independent trials are performed, each having a small probability of
being successful, then the number of successful trials that result will have a
distribution which is approximately that of a Poisson random variable. The mean and
variance of a Poisson random variable are both equal to its parameter λ. That is,

𝐸[𝑋] = 𝑉𝑎𝑟(𝑋) = 𝜆

An important property of the expected value is that the expected value of a sum of
random variables is equal to the sum of their expected values. That is,

𝐸[∑_{𝑖=1}^{𝑛} 𝑋𝑖] = ∑_{𝑖=1}^{𝑛} 𝐸[𝑋𝑖]


CHAPTER 5.
CONTINUOUS RANDOM VARIABLES

5.1 INTRODUCTION

We considered discrete random variables—that is, random variables whose set of


possible values is either finite or countably infinite. However, there also exist
random variables whose set of possible values is uncountable. Two examples are
the time that a train arrives at a specified stop and the lifetime of a transistor. Let
X be such a random variable. We say that X is a continuous random variable if
there exists a nonnegative function f, defined for all real x ∈ (−∞, ∞), having the
property that, for any set B of real numbers

𝑃{𝑋 ∈ 𝐵} = ∫_𝐵 𝑓(𝑥)𝑑𝑥        (∗)

The function f is called the probability density function of the random variable X.


In words, the above states that the probability that X will be in B may be obtained
by integrating the probability density function over the set B. Since X must
assume some value, f must satisfy

1 = 𝑃{𝑋 ∈ (−∞, ∞)} = ∫_{−∞}^{∞} 𝑓(𝑥)𝑑𝑥

All probability statements about X can be answered in terms of f . For instance,


from Equation (*), letting B = [a, b], we obtain

𝑃{𝑎 ≤ 𝑋 ≤ 𝑏} = ∫_𝑎^𝑏 𝑓(𝑥)𝑑𝑥

If we let a = b in the above equation, we get

𝑃{𝑋 = 𝑎} = ∫_𝑎^𝑎 𝑓(𝑥)𝑑𝑥 = 0

For a continuous random variable,

𝑃{𝑋 < 𝑎} = 𝑃{𝑋 ≤ 𝑎} = 𝐹(𝑎) = ∫_{−∞}^{𝑎} 𝑓(𝑥)𝑑𝑥

Example 5.1: Suppose that X is a continuous random variable whose probability


density function is given by

𝑓(𝑥) = { 𝐶(4𝑥 − 2𝑥²),   0 < 𝑥 < 2
       { 0,              otherwise

(a) What is the value of C?

(b) Find P{X > 1}.



Solution. (a) Since 𝑓 is a probability density function, we must have ∫_{−∞}^{∞} 𝑓(𝑥)𝑑𝑥 = 1, implying that

𝐶 ∫_0^2 (4𝑥 − 2𝑥²)𝑑𝑥 = 1

or

𝐶 [2𝑥² − 2𝑥³/3] |_{𝑥=0}^{𝑥=2} = 1

or

𝐶 = 3/8

Hence,

(b) 𝑃{𝑋 > 1} = ∫_1^∞ 𝑓(𝑥) 𝑑𝑥 = (3/8) ∫_1^2 (4𝑥 − 2𝑥²) 𝑑𝑥 = 1/2

Example 5.2: The amount of time in hours that a computer functions before
breaking down is a continuous random variable with probability density function
given by

𝜆𝑒 −𝑥/100 , 𝑥≥0
𝑓(𝑥) = {
0, 𝑥<0

What is the probability that

(a) a computer will function between 50 and 150 hours before breaking down?

(b) it will function for fewer than 100 hours?

Solution. (a) Since

1 = ∫_{−∞}^{∞} 𝑓(𝑥)𝑑𝑥 = 𝜆 ∫_0^∞ 𝑒^{−𝑥/100} 𝑑𝑥


we obtain,

1 = −100𝜆 𝑒^{−𝑥/100} |_0^∞ = 100𝜆,   or   𝜆 = 1/100

Hence, the probability that a computer will function between 50 and 150 hours
before breaking down is given by

𝑃{50 < 𝑋 < 150} = ∫_{50}^{150} (1/100) 𝑒^{−𝑥/100} 𝑑𝑥 = −𝑒^{−𝑥/100} |_{50}^{150} = 𝑒^{−1/2} − 𝑒^{−3/2} ≈ 0.383

(b) Similarly,

𝑃{𝑋 < 100} = ∫_0^{100} (1/100) 𝑒^{−𝑥/100} 𝑑𝑥 = −𝑒^{−𝑥/100} |_0^{100} = 1 − 𝑒^{−1} ≈ 0.632
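A quick numerical check of these two probabilities (an added Python sketch, not part of the original notes): integrate the density with a simple Riemann sum and compare with the closed-form answers.

    from math import exp

    lam = 1 / 100
    f = lambda x: lam * exp(-lam * x)   # density of the time to breakdown

    def prob(a, b, n=100_000):
        # midpoint Riemann sum of the density over [a, b]
        h = (b - a) / n
        return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

    print(round(prob(50, 150), 3))  # ≈ 0.383, i.e. e^(-1/2) - e^(-3/2)
    print(round(prob(0, 100), 3))   # ≈ 0.632, i.e. 1 - e^(-1)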

Example 5.3: The lifetime in hours of a certain kind of radio tube is a random
variable having a probability density function given by

𝑓(𝑥) = { 0,          𝑥 ≤ 100
       { 100/𝑥²,     𝑥 > 100

What is the probability that exactly 2 of 5 such tubes in a radio set will have to be
replaced within the first 150 hours of operation? Assume that the events Ei, i = 1,
2, 3, 4, 5, that the ith such tube will have to be replaced within this time are
independent.

Solution. From the statement of the problem, we have

𝑃{𝐸_𝑖} = ∫_0^{150} 𝑓(𝑥) 𝑑𝑥 = 100 ∫_{100}^{150} 𝑥^{−2} 𝑑𝑥 = 1/3

Hence, from the independence of the events Ei, it follows that the desired
probability is

\binom{5}{2} (1/3)² (2/3)³ = 80/243

The relationship between the cumulative distribution F and the probability


density f is expressed by

𝐹(𝑎) = 𝑃{𝑋 ∈ (−∞, 𝑎]} = ∫_{−∞}^{𝑎} 𝑓(𝑥) 𝑑𝑥

Differentiating both sides of the preceding equation yields

𝑑
𝐹 (𝑎 ) = 𝑓 (𝑎 )
𝑑𝑎

5.2 EXPECTATION AND VARIANCE OF CONTINUOUS RANDOM VARIABLES

We have defined the expected value of a discrete random variable X by

𝐸 [𝑋 ] = ∑ 𝑥𝑃{𝑋 = 𝑥}
𝑥

If X is a continuous random variable having probability density function f (x), then


the analogous definition is to define the expected value of X by


𝐸[𝑋] = ∫_{−∞}^{∞} 𝑥𝑓(𝑥) 𝑑𝑥

Example 5.4: Find E[X] when the density function of X is

2𝑥, if 0 ≤ 𝑥 ≤ 1
𝑓 (𝑥 ) = {
0, otherwise

Solution:

𝐸[𝑋] = ∫ 𝑥𝑓(𝑥)𝑑𝑥 = ∫_0^1 2𝑥² 𝑑𝑥 = 2/3

Example 5.5: The density function of X is given by

1, if 0 ≤ 𝑥 ≤ 1
𝑓 (𝑥 ) = {
0, otherwise

find 𝐸 [𝑒 𝑋 ].

Solution. Let 𝑌 = 𝑒^𝑋. We start by determining 𝐹_𝑌, the probability distribution


function of Y. Now, for 1 ≤ x ≤ e

𝐹𝑌 (𝑥) = 𝑃{𝑌 ≤ 𝑥}

= 𝑃{𝑒 𝑋 ≤ 𝑥} = 𝑃{𝑋 ≤ log(𝑥)}

= ∫_0^{log(𝑥)} 𝑓(𝑦)𝑑𝑦 = log(𝑥)

By differentiating FY(x), we can conclude that the probability density function of Y


is given by

𝑓_𝑌(𝑥) = 1/𝑥,   1 ≤ 𝑥 ≤ 𝑒

Hence,

𝐸[𝑒^𝑋] = 𝐸[𝑌] = ∫_{−∞}^{∞} 𝑥𝑓_𝑌(𝑥) 𝑑𝑥 = ∫_1^𝑒 𝑑𝑥 = 𝑒 − 1

Proposition: If X is a continuous random variable with


probability density function f (x), then, for any real-valued
function g,

𝑬[𝒈(𝑿)] = ∫_{−∞}^{∞} 𝒈(𝒙)𝒇(𝒙) 𝒅𝒙

An application of the above proposition to the previous example yields,

𝐸[𝑒^𝑋] = ∫_0^1 𝑒^𝑥 𝑑𝑥 = 𝑒 − 1,   since 𝑓(𝑥) = 1 for 0 < 𝑥 < 1

Example 5.6: A stick of length 1 is split at a point U that is uniformly distributed


over (0, 1). Determine the expected length of the piece that contains the point p, 0
≤ p ≤ 1.


Solution. Let Lp(U) denote the length of the substick that contains the point p, and
note that

1 − 𝑈, 𝑈<𝑝
𝐿𝑝 (𝑢) = {
𝑈, 𝑈≥𝑝

Hence, from the proposition

𝐸[𝐿_𝑝(𝑈)] = ∫_0^1 𝐿_𝑝(𝑢)𝑑𝑢 = ∫_0^𝑝 (1 − 𝑢) 𝑑𝑢 + ∫_𝑝^1 𝑢 𝑑𝑢

= 1/2 − (1 − 𝑝)²/2 + 1/2 − 𝑝²/2 = 1/2 + 𝑝(1 − 𝑝)

Since 𝑝(1 − 𝑝) is maximized when 𝑝 = ½, it is interesting to note that the


expected length of the substick containing the point 𝑝 is maximized when 𝑝 is the
midpoint of the original stick.

Corrolary: If 𝒂 and 𝒃 are constants, then


𝑬[𝒂𝑿 + 𝒃] = 𝒂𝑬[𝑿] + 𝒃

Var(𝑋) = 𝐸[(𝑋 − 𝜇)²]

Var(𝑋) = 𝐸[𝑋²] − (𝐸[𝑋])²

Example 5.7: Find Var(𝑋) when the density function of 𝑋 is

2𝑥, if 0 ≤ 𝑥 ≤ 1
𝑓 (𝑥 ) = { .
0, otherwise

Solution. We first compute 𝐸[𝑋 2 ].

𝐸[𝑋²] = ∫ 𝑥² 𝑓(𝑥)𝑑𝑥 = ∫_0^1 2𝑥³ 𝑑𝑥 = 1/2

Hence, since 𝐸[𝑋] = 2/3, we obtain

Var(𝑋) = 1/2 − (2/3)² = 1/18
2 3

It can be shown that, for constants 𝑎 and 𝑏,

Var(𝑎𝑋 + 𝑏) = 𝑎2 Var(𝑋)

The proof mimics the one given for discrete random variables.

5.3 THE UNIFORM RANDOM VARIABLE

A random variable is said to be uniformly distributed over the interval (0, 1) if its
probability density function is given by

1, 0<𝑥<1
𝑓 (𝑥 ) = {
0, otherwise

In general, we say that 𝑋 is a uniform random variable on the interval (𝛼, 𝛽) if the
probability density function of 𝑋 is given by

𝑓(𝑥) = { 1/(𝛽 − 𝛼),   if 𝛼 < 𝑥 < 𝛽
       { 0,            otherwise

Since 𝐹(𝑎) = ∫_{−∞}^{𝑎} 𝑓(𝑥) 𝑑𝑥, it follows from the above equation that the
distribution function of a uniform random variable on the interval (𝛼, 𝛽) is given
by

𝐹(𝑎) = { 0,                  𝑎 ≤ 𝛼
       { (𝑎 − 𝛼)/(𝛽 − 𝛼),    𝛼 < 𝑎 < 𝛽
       { 1,                  𝑎 ≥ 𝛽

Figure below presents a graph of 𝑓(𝑎) and 𝐹 (𝑎).


Graph of (a) 𝑓(𝑎) and (b) 𝐹 (𝑎) for a uniform (𝛼, 𝛽) random variable.

Example 5.8: Let 𝑋 be uniformly distributed over (𝛼, 𝛽). Find (a) 𝐸 [𝑋 ] and (b)
Var(𝑋).

Solution. (a)

𝐸[𝑋] = ∫_{−∞}^{∞} 𝑥𝑓(𝑥) 𝑑𝑥 = ∫_𝛼^𝛽 𝑥/(𝛽 − 𝛼) 𝑑𝑥

= (𝛽² − 𝛼²) / (2(𝛽 − 𝛼)) = (𝛽 + 𝛼)/2

(b) To find Var(𝑋), we first calculate 𝐸 [𝑋 2 ].

𝐸[𝑋²] = ∫_𝛼^𝛽 𝑥²/(𝛽 − 𝛼) 𝑑𝑥

= (𝛽³ − 𝛼³) / (3(𝛽 − 𝛼))

= (𝛽² + 𝛼𝛽 + 𝛼²)/3

Hence,

𝑉𝑎𝑟(𝑋) = (𝛽² + 𝛼𝛽 + 𝛼²)/3 − (𝛼 + 𝛽)²/4

= (𝛽 − 𝛼)²/12


Therefore, the variance of a random variable that is uniformly distributed over


some interval is the square of the length of that interval divided by 12.

Example 5.9: If X is uniformly distributed over (0, 10), calculate the probability
that (a) 𝑋 < 3, (b) 𝑋 > 6, and (c) 3 < 𝑋 < 8.

Solution. (a) 𝑃{𝑋 < 3} = ∫_0^3 (1/10) 𝑑𝑥 = 3/10

(b) 𝑃{𝑋 > 6} = ∫_6^{10} (1/10) 𝑑𝑥 = 4/10

(c) 𝑃{3 < 𝑋 < 8} = ∫_3^8 (1/10) 𝑑𝑥 = 1/2

Example 5.10: Buses arrive at a specified stop at 15-minute intervals starting at


7 A.M. That is, they arrive at 7, 7:15, 7:30, 7:45, and so on. If a passenger arrives at
the stop at a time that is uniformly distributed between 7 and 7:30, find the
probability that he waits

(a) less than 5 minutes for a bus;

(b) more than 10 minutes for a bus.

Solution. Let 𝑋 denote the number of minutes past 7 that the passenger arrives at
the stop. Since 𝑋 is a uniform random variable over the interval (0, 30), it follows
that the passenger will have to wait less than 5 minutes if (and only if) he arrives
between 7:10 and 7:15 or between 7:25 and 7:30. Hence, the desired probability
for part (a) is

𝑃{10 < 𝑋 < 15} + 𝑃{25 < 𝑋 < 30} = ∫_{10}^{15} (1/30) 𝑑𝑥 + ∫_{25}^{30} (1/30) 𝑑𝑥 = 1/3

Similarly, he would have to wait more than 10 minutes if he arrives between 7


and 7:05 or between 7:15 and 7:20, so the probability for part (b) is

P {0 < X < 5} + P {15 < X < 20} = 1/3
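The bus-waiting answers are also easy to confirm by simulation. The sketch below is an added Python illustration that draws arrival times uniformly on (0, 30) and estimates both probabilities.

    import random

    def wait(arrival):
        # buses leave at minutes 0, 15, 30 after 7:00
        return 15 - (arrival % 15)

    trials = 100_000
    arrivals = [random.uniform(0, 30) for _ in range(trials)]

    print(round(sum(wait(a) < 5 for a in arrivals) / trials, 2))   # ≈ 0.33
    print(round(sum(wait(a) > 10 for a in arrivals) / trials, 2))  # ≈ 0.33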


5.4 NORMAL RANDOM VARIABLES

We say that X is a normal random variable, or simply that 𝑋 is normally


distributed, with parameters 𝜇 and 𝜎 2 if the density of 𝑋 is given by

𝑓(𝑥) = (1/(𝜎√(2𝜋))) 𝑒^{−(𝑥−𝜇)²/(2𝜎²)},   −∞ < 𝑥 < ∞

This density function is a bell-shaped curve that is symmetric about 𝜇.

Normal density function: (a) 𝜇 = 0, 𝜎 = 1; (b) arbitrary 𝜇, 𝜎 2 .

The normal distribution was introduced by the French mathematician Abraham


DeMoivre in 1733, who used it to approximate probabilities associated with
binomial random variables when the binomial parameter n is large.

Example 5.11: Find (a) 𝐸 [𝑋 ] and (b) Var(𝑋) when 𝑋 is a normal random variable
with parameters 𝜇 and 𝜎 2 .

Solution: (a)


𝐸[𝑋] = (1/(𝜎√(2𝜋))) ∫_{−∞}^{∞} 𝑥 𝑒^{−(𝑥−𝜇)²/(2𝜎²)} 𝑑𝑥

Writing 𝑥 as (𝑥 − µ) + µ yields


𝐸[𝑋] = (1/(𝜎√(2𝜋))) ∫_{−∞}^{∞} (𝑥 − 𝜇) 𝑒^{−(𝑥−𝜇)²/(2𝜎²)} 𝑑𝑥 + 𝜇 (1/(𝜎√(2𝜋))) ∫_{−∞}^{∞} 𝑒^{−(𝑥−𝜇)²/(2𝜎²)} 𝑑𝑥

Letting 𝑦 = 𝑥 − 𝜇 in the first integral yields

𝐸[𝑋] = (1/(𝜎√(2𝜋))) ∫_{−∞}^{∞} 𝑦 𝑒^{−𝑦²/(2𝜎²)} 𝑑𝑦 + 𝜇 ∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥

where 𝑓(𝑥) is the normal density. By symmetry, the first integral must be 0, so


𝐸[𝑋] = 𝜇 ∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥 = 𝜇

(b) Since 𝐸[𝑋] = µ, we have that

𝑉𝑎𝑟(𝑋) = 𝐸[(𝑋 − 𝜇)²]

= (1/(𝜎√(2𝜋))) ∫_{−∞}^{∞} (𝑥 − 𝜇)² 𝑒^{−(𝑥−𝜇)²/(2𝜎²)} 𝑑𝑥

Substituting 𝑦 = (𝑥 − 𝜇)/𝜎 yields

𝑉𝑎𝑟(𝑋) = (𝜎²/√(2𝜋)) ∫_{−∞}^{∞} 𝑦² 𝑒^{−𝑦²/2} 𝑑𝑦

= (𝜎²/√(2𝜋)) [−𝑦 𝑒^{−𝑦²/2} |_{−∞}^{∞} + ∫_{−∞}^{∞} 𝑒^{−𝑦²/2} 𝑑𝑦]

by integration by parts. Hence,

𝑉𝑎𝑟(𝑋) = (𝜎²/√(2𝜋)) ∫_{−∞}^{∞} 𝑒^{−𝑦²/2} 𝑑𝑦 = 𝜎²


An important fact about the normal variables is that if 𝑋 is normally distributed


with parameters 𝜇 and 𝜎 2 , then 𝑌 = 𝛼𝑋 + 𝛽 is normally distributed with
parameters 𝛼𝜇 + 𝛽 and 𝛼 2 𝜎 2 .

If X is normally distributed with parameters 𝜇 and 𝜎 2 , then 𝑍 = (𝑋 − 𝜇)/𝜎 is


normally distributed with parameters 0 and 1. Such a random variable 𝑍 is said to
have a standard, or unit normal distribution.

The cumulative distribution function of a standard normal variable is denoted by


Φ(𝑥). That is,


Φ(𝑥) = (1/√(2𝜋)) ∫_{−∞}^{𝑥} 𝑒^{−𝑦²/2} 𝑑𝑦

The values of Φ(𝑥) for nonnegative x are given in a table. For negative values of 𝑥,
Φ(𝑥) can be obtained by

Φ(−𝑥) = 1 − Φ(𝑥) −∞<𝑥 <∞

Since 𝑍 = (𝑋 − 𝜇)/𝜎 is a standard normal random variable whenever 𝑋 is


normally distributed with parameters 𝜇 and 𝜎 2 , it follows that the distribution
function of 𝑋 can be expressed as

𝐹_𝑋(𝑎) = 𝑃{𝑋 ≤ 𝑎} = 𝑃((𝑋 − 𝜇)/𝜎 ≤ (𝑎 − 𝜇)/𝜎) = Φ((𝑎 − 𝜇)/𝜎)

Example 5.12: If X is a normal random variable with parameters 𝜇 = 3 and


𝜎 2 = 9, find (a) 𝑃{2 < 𝑋 < 5}; (b) 𝑃{𝑋 > 0}; (c) 𝑃{|𝑋 − 3| > 6}.

Solution. (a)

𝑃{2 < 𝑋 < 5} = 𝑃{(2 − 3)/3 < (𝑋 − 3)/3 < (5 − 3)/3}

= 𝑃{−1/3 < 𝑍 < 2/3}

= Φ(2/3) − Φ(−1/3) = Φ(2/3) − [1 − Φ(1/3)] ≈ 0.3779

(b)

𝑃{𝑋 > 0} = 𝑃{(𝑋 − 3)/3 > (0 − 3)/3} = 𝑃{𝑍 > −1} = 1 − Φ(−1) = Φ(1) ≈ 0.8413

(c)

𝑃{|𝑋 − 3| > 6} = 𝑃{𝑋 > 9} + 𝑃{𝑋 < −3}

= 𝑃{(𝑋 − 3)/3 > (9 − 3)/3} + 𝑃{(𝑋 − 3)/3 < (−3 − 3)/3}

= 1 − Φ(2) + Φ(−2) = 2[1 − Φ(2)] ≈ 0.0456
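The standard normal c.d.f. is available in Python through the error function, Φ(𝑥) = (1 + erf(𝑥/√2))/2. The sketch below is an added illustration that recomputes the three probabilities of Example 5.12 (small differences from the values above are only table rounding).

    from math import erf, sqrt

    def phi(x):
        # standard normal cdf: Phi(x) = (1 + erf(x / sqrt(2))) / 2
        return (1 + erf(x / sqrt(2))) / 2

    mu, sigma = 3, 3
    z = lambda a: (a - mu) / sigma

    print(round(phi(z(5)) - phi(z(2)), 4))          # (a) ≈ 0.378
    print(round(1 - phi(z(0)), 4))                  # (b) ≈ 0.841
    print(round((1 - phi(z(9))) + phi(z(-3)), 4))   # (c) ≈ 0.046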

Example 5.13: The instructor often uses the test scores to estimate the normal
parameters 𝜇 and 𝜎 2 and then assigns the letter grade A to those whose test score
is greater than 𝜇 + 𝜎, B to those whose score is between 𝜇 and 𝜇 + 𝜎, C to those
whose score is between 𝜇 − 𝜎 and 𝜇, D to those whose score is between 𝜇 − 2𝜎
and 𝜇 − 𝜎, and F to those getting a score below 𝜇 − 2𝜎. (This strategy is
sometimes referred to as grading “on the curve.”) Since

𝑃{𝑋 > 𝜇 + 𝜎} = 𝑃{(𝑋 – 𝜇)/𝜎 > 1} = 1 − Φ(1) ≈ 0.1587

𝑃{𝜇 < 𝑋 < 𝜇 + 𝜎} = 𝑃{0 < (𝑋 – 𝜇)/𝜎 < 1} = Φ(1) − Φ(0) ≈ 0.3413

𝑃{𝜇 − 𝜎 < 𝑋 < 𝜇} = 𝑃{−1 < (𝑋 – 𝜇)/𝜎 < 0} = Φ(0) − Φ(−1) ≈ 0.3413

𝑃{𝜇 − 2𝜎 < 𝑋 < 𝜇 − 𝜎 } = 𝑃{−2 < (𝑋 – 𝜇)/𝜎 < −1} = Φ(2) − Φ(1) ≈ 0.1359

𝑃{𝑋 < 𝜇 − 2𝜎} = 𝑃{(𝑋 – 𝜇)/𝜎 < −2} = Φ(−2) ≈ 0.0228

it follows that approximately 16 percent of the class will receive an A grade on


the examination, 34 percent a B grade, 34 percent a C grade, and 14 percent a D
grade; 2 percent will fail.


Example 5.14: Suppose that a binary message—either 0 or 1—must be


transmitted by wire from location A to location B. However, the data sent over the
wire are subject to a channel noise disturbance, so, to reduce the possibility of
error, the value 2 is sent over the wire when the message is 1 and the value −2 is
sent when the message is 0. If 𝑥, 𝑥 = ±2, is the value sent at location A, then R,
the value received at location B, is given by 𝑅 = 𝑥 + 𝑁, where 𝑁 is the channel
noise disturbance. When the message is received at location B, the receiver
decodes it according to the following rule:

If 𝑅 ≥ 0.5, then 1 is concluded.

If 𝑅 < 0.5, then 0 is concluded.

Because the channel noise is often normally distributed, we determine the error
probabilities when 𝑁 is a standard normal random variable.

Solution. Two types of errors can occur: One is that the message 1 can be
incorrectly determined to be 0, and the other is that 0 can be incorrectly
determined to be 1. The first type of error will occur if the message is 1 and
2 + 𝑁 <0.5, whereas the second will occur if the message is 0 and −2 + 𝑁 ≥
0.5. Hence,

𝑃{error | message is 1} = 𝑃{𝑁 < −1.5}

= 1 − Φ(1.5) ≈ 0.0668

and

𝑃{error|message is 0} = 𝑃{𝑁 ≥ 2.5}

= 1 − Φ(2.5) ≈ 0.0062

5.5 EXPONENTIAL RANDOM VARIABLES

A continuous random variable whose probability density function is given, for


some 𝜆 > 0, by


𝜆𝑒 −𝜆𝑥 , if 𝑥 ≥ 0
𝑓 (𝑥 ) = {
0, if 𝑥 < 0

is said to be an exponential random variable (or, more simply, is said to be


exponentially distributed) with parameter 𝜆. The cumulative distribution
function 𝐹(𝑎) of an exponential random variable is given by

𝐹 (𝑎 ) = 𝑃 {𝑋 ≤ 𝑎 }

= ∫_0^𝑎 𝜆𝑒^{−𝜆𝑥} 𝑑𝑥

= −𝑒^{−𝜆𝑥} |_0^𝑎

= 1 − 𝑒^{−𝜆𝑎},   𝑎 ≥ 0

Example 5.15: Let 𝑋 be an exponential random variable with parameter 𝜆.


Calculate (a) 𝐸[𝑋] and (b) Var(𝑋).

Solution. (a) Since the density function is given by

𝜆𝑒 −𝜆𝑥 , if 𝑥 ≥ 0
𝑓 (𝑥 ) = {
0, if 𝑥 < 0

we obtain, for 𝑛 > 0,



𝐸[𝑋^𝑛] = −𝑥^𝑛 𝑒^{−𝜆𝑥} |_0^∞ + ∫_0^∞ 𝑒^{−𝜆𝑥} 𝑛𝑥^{𝑛−1} 𝑑𝑥

= 0 + (𝑛/𝜆) ∫_0^∞ 𝜆𝑒^{−𝜆𝑥} 𝑥^{𝑛−1} 𝑑𝑥

= (𝑛/𝜆) 𝐸[𝑋^{𝑛−1}]

Letting 𝑛 = 1 and then 𝑛 = 2 gives

𝐸[𝑋] = 1/𝜆


𝐸[𝑋²] = (2/𝜆) 𝐸[𝑋] = 2/𝜆²

(b) Hence,

𝐕𝐚𝐫(𝑋) = 2/𝜆² − (1/𝜆)² = 1/𝜆²

Thus, the mean of the exponential is the reciprocal of its parameter 𝜆, and the
variance is the mean squared.

In practice, the exponential distribution often arises as the distribution of the


amount of time until some specific event occurs. For instance, the amount of time
(starting from now) until an earthquake occurs, or until a new war breaks out, or
until a telephone call you receive turns out to be a wrong number are all random
variables that tend in practice to have exponential distributions.

Example 5.16: Suppose that the length of a phone call in minutes is an


exponential random variable with parameter 𝜆 = 1/10 . If someone arrives
immediately ahead of you at a public telephone booth, find the probability that
you will have to wait

(a) more than 10 minutes;

(b) between 10 and 20 minutes.

Solution. Let 𝑋 denote the length of the call made by the person in the booth.
Then the desired probabilities are

(a)

𝑃{𝑋 > 10} = 1 − 𝐹(10)

= 𝑒 −1 ≈ 0.368

(b)


𝑃{10 < 𝑋 < 20} = 𝐹(20) − 𝐹(10)

= 𝑒 −1 − 𝑒 −2 ≈ 0.233
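With the stated rate 𝜆 = 1/10 per minute, both answers follow directly from 𝐹(𝑎) = 1 − 𝑒^{−𝜆𝑎}; a minimal Python sketch (added for illustration):

    from math import exp

    lam = 1 / 10                      # one call lasts 10 minutes on average
    F = lambda a: 1 - exp(-lam * a)   # exponential cdf

    print(round(1 - F(10), 3))        # P{X > 10}      ≈ 0.368
    print(round(F(20) - F(10), 3))    # P{10 < X < 20} ≈ 0.233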

We say that a nonnegative random variable 𝑋 is memoryless if

𝑃{𝑋 > 𝑠 + 𝑡 | 𝑋 > 𝑡} = 𝑃{𝑋 > 𝑠} for all 𝑠, 𝑡 ≥ 0

If we think of 𝑋 as being the lifetime of some instrument, the above equation


states that the probability that the instrument survives for at least 𝑠 + 𝑡 hours,
given that it has survived 𝑡 hours, is the same as the initial probability that it
survives for at least s hours. In other words, if the instrument is alive at age 𝑡, the
distribution of the remaining amount of time that it survives is the same as the
original lifetime distribution. (That is, it is as if the instrument does not
“remember” that it has already been in use for a time 𝑡.)

The above equation is equivalent to

𝑃{𝑋 > 𝑠 + 𝑡, 𝑋 > 𝑡} / 𝑃{𝑋 > 𝑡} = 𝑃{𝑋 > 𝑠}

or

𝑃{𝑋 > 𝑠 + 𝑡} = 𝑃{𝑋 > 𝑠} 𝑃{𝑋 > 𝑡}

Since the above equation is satisfied when 𝑋 is exponentially distributed (for
𝑒^{−𝜆(𝑠+𝑡)} = 𝑒^{−𝜆𝑠} 𝑒^{−𝜆𝑡}), it follows that exponentially distributed random variables
are memoryless. In fact, the exponential distribution is the only distribution that is
memoryless.

Example 5.17: Suppose that the number of miles that a car can run before its
battery wears out is exponentially distributed with an average value of 10,000
miles. If a person desires to take a 5000-mile trip, what is the probability that he
or she will be able to complete the trip without having to replace the car battery?
What can be said when the distribution is not exponential?


Solution. It follows by the memoryless property of the exponential distribution


that the remaining lifetime (in thousands of miles) of the battery is exponential
with parameter 𝜆 = 1/10 . Hence, the desired probability is

𝑃{remaining lifetime > 5} = 1 − 𝐹(5) = 𝑒^{−5𝜆} = 𝑒^{−1/2} ≈ 0.607

However, if the lifetime distribution 𝐹 is not exponential, then the relevant


probability is

𝑃{lifetime > 𝑡 + 5 | lifetime > 𝑡} = [1 − 𝐹(𝑡 + 5)] / [1 − 𝐹(𝑡)]

where 𝑡 is the number of miles that the battery had been in use prior to the start
of the trip. Therefore, if the distribution is not exponential, additional information
is needed (namely, the value of 𝑡) before the desired probability can be
calculated.

5.5.1 LAPLACE DISTRIBUTON

A variation of the exponential distribution is the distribution of a random variable


that is equally likely to be either positive or negative and whose absolute value is
exponentially distributed with parameter 𝜆 > 0. Such a random variable is
said to have a Laplace distribution, and its density is given by

𝑓(𝑥) = (1/2) 𝜆𝑒^{−𝜆|𝑥|},   −∞ < 𝑥 < ∞

Its distribution function is given by

𝐹(𝑥) = (1/2) ∫_{−∞}^{𝑥} 𝜆𝑒^{𝜆𝑡} 𝑑𝑡 = (1/2) 𝑒^{𝜆𝑥},                          𝑥 ≤ 0

𝐹(𝑥) = (1/2) ∫_{−∞}^{0} 𝜆𝑒^{𝜆𝑡} 𝑑𝑡 + (1/2) ∫_0^𝑥 𝜆𝑒^{−𝜆𝑡} 𝑑𝑡 = 1 − (1/2) 𝑒^{−𝜆𝑥},   𝑥 > 0

Example 5.18: Consider again the same example, which supposes that a binary
message is to be transmitted from A to B, with the value 2 being sent when the


message is 1 and −2 when it is 0. However, suppose now that, rather than being a
standard normal random variable, the channel noise 𝑁 is a Laplacian random
variable with parameter 𝜆 = 1. Suppose again that if 𝑅 is the value received at
location B, then the message is decoded as follows:

If R ≥ 0.5, then 1 is concluded.

If R < 0.5, then 0 is concluded.

In this case, where the noise is Laplacian with parameter λ = 1, the two types of
errors will have probabilities given by

𝑃{error|message 1 is sent} = 𝑃{𝑁 < −1.5}

= (1/2) 𝑒^{−1.5} ≈ 0.1116

𝑃{error|message 0 is sent} = 𝑃{𝑁 ≥ 2.5}

= (1/2) 𝑒^{−2.5} ≈ 0.041

On comparing this with the results of Example 5.14, we see that the error
probabilities are higher when the noise is Laplacian with 𝜆 = 1 than when it is a
standard normal variable.

5.5.2 Hazard Rate Functions

Consider a positive continuous random variable 𝑋 that we interpret as being the


lifetime of some item. Let 𝑋 have distribution function 𝐹 and density 𝑓. The
hazard rate (sometimes called the failure rate) function 𝜆(𝑡) of 𝐹 is defined by

𝜆(𝑡) = 𝑓(𝑡)/𝐹̄(𝑡),   where 𝐹̄(𝑡) = 1 − 𝐹(𝑡)

To interpret 𝜆(𝑡), suppose that the item has survived for a time 𝑡 and we desire
the probability that it will not survive for an additional time d𝑡. That is, consider
𝑃{𝑋 ∈ (𝑡, 𝑡 + d𝑡 )|𝑋 > 𝑡}. Now,


𝑃{𝑋 ∈ (𝑡, 𝑡 + 𝑑𝑡) | 𝑋 > 𝑡} = 𝑃{𝑋 ∈ (𝑡, 𝑡 + 𝑑𝑡), 𝑋 > 𝑡} / 𝑃{𝑋 > 𝑡}

= 𝑃{𝑋 ∈ (𝑡, 𝑡 + 𝑑𝑡)} / 𝑃{𝑋 > 𝑡}

≈ [𝑓(𝑡)/𝐹̄(𝑡)] 𝑑𝑡

Thus, 𝜆(𝑡) represents the conditional probability intensity that a 𝑡-unit-old item
will fail.

Suppose now that the lifetime distribution is exponential. Then, by the


memoryless property, it follows that the distribution of remaining life for a
𝑡 −year-old item is the same as that for a new item. Hence, 𝜆(𝑡) should be
constant. In fact, this checks out, since

𝜆(𝑡) = 𝑓(𝑡)/𝐹̄(𝑡) = 𝜆𝑒^{−𝜆𝑡}/𝑒^{−𝜆𝑡} = 𝜆

Thus, the failure rate function for the exponential distribution is constant. The
parameter 𝜆 is often referred to as the rate of the distribution.

It turns out that the failure rate function 𝜆(𝑡) uniquely determines the
distribution 𝐹. To prove this, note that, by definition,

𝜆(𝑡) = [(𝑑/𝑑𝑡) 𝐹(𝑡)] / [1 − 𝐹(𝑡)]

Integrating both sides yields

log(1 − 𝐹(𝑡)) = − ∫_0^𝑡 𝜆(𝑡) 𝑑𝑡 + 𝑘

or


1 − 𝐹(𝑡) = 𝑒^𝑘 exp(− ∫_0^𝑡 𝜆(𝑡) 𝑑𝑡)

Letting 𝑡 = 0 shows that 𝑘 = 0; thus,

𝐹(𝑡) = 1 − exp(− ∫_0^𝑡 𝜆(𝑡) 𝑑𝑡)

Hence, a distribution function of a positive continuous random variable can be


specified by giving its hazard rate function. For instance, if a random variable has
a linear hazard rate function—that is, if

𝜆(𝑡) = 𝑎 + 𝑏𝑡

then its distribution function is given by

𝐹(𝑡) = 1 − 𝑒^{−𝑎𝑡 − 𝑏𝑡²/2}

and differentiation yields its density, namely,

𝑓(𝑡) = (𝑎 + 𝑏𝑡) 𝑒^{−(𝑎𝑡 + 𝑏𝑡²/2)},   𝑡 ≥ 0

When 𝑎 = 0, the preceding equation is known as the Rayleigh density function.
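The relation 𝐹(𝑡) = 1 − exp(−∫_0^𝑡 𝜆(𝑠)𝑑𝑠) can be evaluated numerically for any hazard function. The Python sketch below is an added illustration (the values of 𝑎 and 𝑏 are arbitrary) that does this for the linear hazard 𝜆(𝑡) = 𝑎 + 𝑏𝑡 and compares with the closed form above.

    from math import exp

    a, b = 0.5, 0.2                    # hypothetical linear hazard rate: lam(t) = a + b*t
    lam = lambda t: a + b * t

    def F(t, n=10_000):
        # F(t) = 1 - exp(-integral_0^t lam(s) ds), midpoint rule for the integral
        h = t / n
        integral = sum(lam((k + 0.5) * h) for k in range(n)) * h
        return 1 - exp(-integral)

    for t in [0.5, 1.0, 2.0]:
        closed_form = 1 - exp(-(a * t + b * t ** 2 / 2))
        print(t, round(F(t), 6), round(closed_form, 6))   # the two columns agree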

Example 5.19: One often hears that the death rate of a person who smokes is, at
each age, twice that of a nonsmoker. What does this mean? Does it mean that a
nonsmoker has twice the probability of surviving a given number of years as does
a smoker of the same age?

Solution. If 𝜆𝑠 (𝑡) denotes the hazard rate of a smoker of age 𝑡 and 𝜆𝑛 (𝑡 ) that of a
nonsmoker of age t, then the statement at issue is equivalent to the statement
that

𝜆𝑠 (𝑡) = 2𝜆𝑛 (𝑡)

The probability that an A-year-old nonsmoker will survive until age B, A < B, is


𝑃{𝐴 − year − old nonsmoker reaches age 𝐵}

= 𝑃{nonsmoker’s lifetime > 𝐵|nonsmoker’s lifetime > 𝐴}

= [1 − 𝐹_non(𝐵)] / [1 − 𝐹_non(𝐴)]

= exp{− ∫_0^𝐵 𝜆_𝑛(𝑡) 𝑑𝑡} / exp{− ∫_0^𝐴 𝜆_𝑛(𝑡) 𝑑𝑡}

= exp{− ∫_𝐴^𝐵 𝜆_𝑛(𝑡) 𝑑𝑡}

whereas the corresponding probability for a smoker is, by the same reasoning,

𝑃{𝐴-year-old smoker reaches age 𝐵} = exp{− ∫_𝐴^𝐵 𝜆_𝑠(𝑡) 𝑑𝑡}

= exp{−2 ∫_𝐴^𝐵 𝜆_𝑛(𝑡) 𝑑𝑡} = [exp{− ∫_𝐴^𝐵 𝜆_𝑛(𝑡) 𝑑𝑡}]²

In other words, for two people of the same age, one of whom is a smoker and the
other a nonsmoker, the probability that the smoker survives to any given age is
the square (not one-half) of the corresponding probability for a nonsmoker. For
instance, if 𝜆𝑛 (𝑡) = 1/30, 50 ≤ 𝑡 ≤ 60, then the probability that a 50-year-old
nonsmoker reaches age 60 is 𝑒 −1/3 ≈ 0.7165, whereas the corresponding
probability for a smoker is e−2/3 ≈ 0.5134.

5.6 OTHER CONTINUOUS DISTRIBUTIONS

5.6.1 The Gamma Distribution

A random variable is said to have a gamma distribution with parameters (𝛼, 𝜆), λ
> 0, α > 0, if its density function is given by


𝑓(𝑥) = { 𝜆𝑒^{−𝜆𝑥} (𝜆𝑥)^{𝛼−1} / Γ(𝛼),   𝑥 ≥ 0
       { 0,                             𝑥 < 0

where Γ(𝛼), called the gamma function, is defined as


Γ(𝛼) = ∫_0^∞ 𝑒^{−𝑦} 𝑦^{𝛼−1} 𝑑𝑦

5.6.2 The Weibull Distribution

The Weibull distribution is widely used in engineering practice due to its


versatility. It was originally proposed for the interpretation of fatigue data, but
now its use has been extended to many other engineering problems. In particular,
it is widely used in the field of life phenomena as the distribution of the lifetime of
some object, especially when the “weakest link” model is appropriate for the
object. That is, consider an object consisting of many parts, and suppose that the
object experiences death (failure) when any of its parts fail. It has been shown
(both theoretically and empirically) that under these conditions a Weibull
distribution provides a close approximation to the distribution of the lifetime of
the item.

The Weibull distribution function has the form

𝐹(𝑥) = { 0,                               𝑥 ≤ 𝜐
       { 1 − exp{−((𝑥 − 𝜐)/𝛼)^𝛽},        𝑥 > 𝜐

A random variable whose cumulative distribution function is given by the above


equation is said to be a Weibull random variable with parameters 𝜈, 𝛼, and 𝛽.
Differentiation yields the density:

𝑓(𝑥) = { 0,                                                     𝑥 ≤ 𝜐
       { (𝛽/𝛼) ((𝑥 − 𝜐)/𝛼)^{𝛽−1} exp{−((𝑥 − 𝜐)/𝛼)^𝛽},          𝑥 > 𝜐


5.6.3 The Cauchy Distribution

A random variable is said to have a Cauchy distribution with parameter 𝜃, −∞ <


𝜃 < ∞, if its density is given by

𝑓(𝑥) = (1/𝜋) · 1/[1 + (𝑥 − 𝜃)²],   −∞ < 𝑥 < ∞

5.6.4 The Beta Distribution

A random variable is said to have a beta distribution if its density is given by

𝑓(𝑥) = { [1/𝐵(𝑎, 𝑏)] 𝑥^{𝑎−1} (1 − 𝑥)^{𝑏−1},   0 < 𝑥 < 1
       { 0,                                    otherwise

where

𝐵(𝑎, 𝑏) = ∫_0^1 𝑥^{𝑎−1} (1 − 𝑥)^{𝑏−1} 𝑑𝑥

5.7 THE DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE

Often, we know the probability distribution of a random variable and are


interested in determining the distribution of some function of it. For instance,
suppose that we know the distribution of 𝑋 and want to find the distribution of
𝑔(𝑋). To do so, it is necessary to express the event that 𝑔(𝑋) ≤ 𝑦 in terms of
𝑋 being in some set. We illustrate with the following examples.

Example 5.20: Let 𝑋 be uniformly distributed over (0, 1). We obtain the
distribution of the random variable 𝑌, defined by 𝑌 = 𝑋 𝑛 , as follows: For
0 ≤ 𝑦 ≤ 1,

𝐹𝑌 (𝑦) = 𝑃{𝑌 ≤ 𝑦}

= 𝑃 {𝑋 𝑛 ≤ 𝑦 }

= 𝑃{𝑋 ≤ 𝑦1/𝑛 }

= 𝐹_𝑋(𝑦^{1/𝑛}) = 𝑦^{1/𝑛}

since 𝑋 is uniform over (0, 1). Hence, the density function of 𝑌 is given by

𝑓_𝑌(𝑦) = { (1/𝑛) 𝑦^{1/𝑛 − 1},   0 ≤ 𝑦 ≤ 1
         { 0,                   otherwise

Example 5.21: If 𝑋 is a continuous random variable with probability density 𝑓𝑋 ,


then the distribution of 𝑌 = 𝑋 2 is obtained as follows: For 𝑦 ≥ 0,

𝐹𝑌 (𝑦) = 𝑃{𝑌 ≤ 𝑦}

= 𝑃 {𝑋 2 ≤ 𝑦 }

= 𝑃{−√𝑦 ≤ 𝑋 ≤ √𝑦}

= 𝐹𝑋 (√𝑦) − 𝐹𝑋 (−√𝑦)

Differentiation yields

𝑓_𝑌(𝑦) = (1/(2√𝑦)) [𝑓_𝑋(√𝑦) + 𝑓_𝑋(−√𝑦)]

Example 5.22: If X has a probability density 𝑓𝑋 , then 𝑌 = |𝑋| has a density


function that is obtained as follows: For 𝑦 ≥ 0,

𝐹𝑌 (𝑦) = 𝑃{𝑌 ≤ 𝑦}

= 𝑃{|𝑋 | ≤ 𝑦} = 𝑃{−𝑦 ≤ 𝑋 ≤ 𝑦}

= 𝐹𝑋 (𝑦) − 𝐹𝑋 (−𝑦)

Hence, on differentiation, we obtain

𝑓𝑌 (𝑦) = 𝑓𝑋 (𝑦) + 𝑓𝑋 (−𝑦), 𝑦≥0


Theorem: Let X be a continuous random variable having probability


density function fX. Suppose that g(x) is a strictly monotonic
(increasing or decreasing), differentiable (and thus continuous)
function of x. Then the random variable Y defined by Y = g(X) has a
probability density function given by
𝒇_𝒀(𝒚) = { 𝒇_𝑿[𝒈^{−𝟏}(𝒚)] |(𝒅/𝒅𝒚) 𝒈^{−𝟏}(𝒚)|,   if 𝒚 = 𝒈(𝒙) for some 𝒙
          { 𝟎,                                     if 𝒚 ≠ 𝒈(𝒙) for all 𝒙
where g-1(y) is defined to equal that value of x such that g(x) = y

Proof. Suppose that 𝑦 = 𝑔(𝑥) for some 𝑥. Then, with 𝑌 = 𝑔(𝑋),

𝐹𝑌 (𝑦) = 𝑃{𝑔(𝑋 ) ≤ 𝑦}

= 𝑃{𝑋 ≤ 𝑔−1 (𝑦)}

= 𝐹𝑋 (𝑔−1 (𝑦))

Differentiation gives

𝑓_𝑌(𝑦) = 𝑓_𝑋(𝑔^{−1}(𝑦)) (𝑑/𝑑𝑦) 𝑔^{−1}(𝑦)

When 𝑦 ≠ 𝑔(𝑥) for any 𝑥, then 𝐹𝑌 (𝑦) is either 0 or 1, and in either case 𝑓𝑌 (𝑦) =
0.

Example 5.23: Let 𝑋 be a continuous nonnegative random variable with density


function 𝑓 , and let 𝑌 = 𝑋 𝑛 . Find 𝑓𝑌 , the probability density function of 𝑌.

Solution. If 𝑔(𝑥) = 𝑥 𝑛 , then

𝑔−1 (𝑦) = 𝑦1/𝑛

and


(𝑑/𝑑𝑦) 𝑔^{−1}(𝑦) = (1/𝑛) 𝑦^{1/𝑛 − 1}

Hence, from the theorem, we obtain
Hence, from the theorem, we obtain

𝑓_𝑌(𝑦) = (1/𝑛) 𝑦^{1/𝑛 − 1} 𝑓(𝑦^{1/𝑛})

For 𝑛 = 2, this gives

𝑓_𝑌(𝑦) = (1/(2√𝑦)) 𝑓(√𝑦)
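As a concrete check of the theorem, the sketch below is an added Python illustration: it takes 𝑋 uniform on (0, 1) and 𝑔(𝑥) = 𝑥², and compares the formula 𝑓_𝑌(𝑦) = 𝑓(√𝑦)/(2√𝑦) with a Monte Carlo estimate of the density of 𝑌 = 𝑋².

    import random

    random.seed(0)
    n = 200_000
    ys = [random.random() ** 2 for _ in range(n)]   # Y = X^2, X uniform on (0, 1)

    # empirical density of Y on a few intervals vs. f_Y(y) = 1 / (2 * sqrt(y))
    for lo, hi in [(0.1, 0.2), (0.4, 0.5), (0.8, 0.9)]:
        empirical = sum(lo < y < hi for y in ys) / (n * (hi - lo))
        midpoint = (lo + hi) / 2
        print(round(empirical, 3), round(1 / (2 * midpoint ** 0.5), 3))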

SUMMARY

A random variable X is continuous if there is a nonnegative function f , called the


probability density function of 𝑋, such that, for any set 𝐵,

𝑃{𝑋 ∈ 𝐵} = ∫_𝐵 𝑓(𝑥) 𝑑𝑥

If 𝑋 is continuous, then its distribution function F will be differentiable and

𝑑
𝐹(𝑥) = 𝑓 (𝑥)
𝑑𝑥

The expected value of a continuous random variable 𝑋 is defined by

𝐸[𝑋] = ∫_{−∞}^{∞} 𝑥𝑓(𝑥) 𝑑𝑥

A useful identity is that, for any function 𝑔,

𝐸[𝑔(𝑋)] = ∫_{−∞}^{∞} 𝑔(𝑥)𝑓(𝑥) 𝑑𝑥

As in the case of a discrete random variable, the variance of 𝑋 is defined by

Var(𝑋 ) = 𝐸 [(𝑋 − 𝐸 [𝑋 ])2 ]


A random variable X is said to be uniform over the interval (𝑎, 𝑏) if its probability
density function is given by

𝑓(𝑥) = { 1/(𝑏 − 𝑎),   𝑎 ≤ 𝑥 ≤ 𝑏
       { 0,            otherwise

Its expected value and variance are

𝑎 + 𝑏
𝐸[𝑋] =
2

(𝑏 − 𝑎 )2
𝑉𝑎𝑟(𝑋 ) =
12

A random variable 𝑋 is said to be normal with parameters 𝜇 and 𝜎 2 if its


probability density function is given by

𝑓(𝑥) = (1/(𝜎√(2𝜋))) 𝑒^{−(𝑥−𝜇)²/(2𝜎²)},   −∞ < 𝑥 < ∞

If 𝑋 is normal with mean 𝜇 and variance 𝜎 2 , then 𝑍, defined by

𝑋 − 𝜇
𝑍 =
𝜎

is normal with mean 0 and variance 1. Such a random variable is said to be a


standard normal random variable. Probabilities about 𝑋 can be expressed in
terms of probabilities about the standard normal variable 𝑍.

When n is large, the probability distribution function of a binomial random


variable with parameters 𝑛 and 𝑝 can be approximated by that of a normal
random variable having mean 𝑛𝑝 and variance 𝑛𝑝(1 − 𝑝).

A random variable whose probability density function is of the form

𝜆𝑒 −𝜆𝑥 , 𝑥≥0
𝑓 (𝑥 ) = {
0, otherwise


is said to be an exponential random variable with parameter 𝜆. Its expected value


and variance are, respectively,

𝐸[𝑋] = 1/𝜆,   𝑉𝑎𝑟(𝑋) = 1/𝜆²

A key property possessed only by exponential random variables is that they are
memoryless, in the sense that, for positive 𝑠 and 𝑡,

𝑃 {𝑋 > 𝑠 + 𝑡|𝑋 > 𝑡} = 𝑃{𝑋 > 𝑠}

If 𝑋 represents the life of an item, then the memoryless property states that, for
any 𝑡, the remaining life of a 𝑡-year-old item has the same probability distribution
as the life of a new item. Thus, one need not remember the age of an item to know
its distribution of remaining life.

Let 𝑋 be a nonnegative continuous random variable with distribution function 𝐹


and density function 𝑓 . The function

𝜆(𝑡) = 𝑓(𝑡) / [1 − 𝐹(𝑡)],   𝑡 ≥ 0

is called the hazard rate, or failure rate, function of 𝐹. If we interpret 𝑋 as being


the life of an item, then, for small values of d𝑡, 𝜆(𝑡) d𝑡 is approximately the
probability that a 𝑡-unit-old item will fail within an additional time d𝑡. If 𝐹 is the
exponential distribution with parameter 𝜆, then

𝜆 (𝑡 ) = 𝜆 𝑡≥0

In addition, the exponential is the unique distribution having a constant failure


rate.

A random variable is said to have a gamma distribution with parameters 𝛼 and 𝜆


if its probability density function is equal to


𝑓(𝑥) = { 𝜆𝑒^{−𝜆𝑥} (𝜆𝑥)^{𝛼−1} / Γ(𝛼),   𝑥 ≥ 0
       { 0,                             𝑥 < 0

where Γ(𝛼), called the gamma function, is defined as


Γ(𝛼) = ∫_0^∞ 𝑒^{−𝑦} 𝑦^{𝛼−1} 𝑑𝑦

The expected value and variance of a gamma random variable are, respectively,

𝐸[𝑋] = 𝛼/𝜆,   Var(𝑋) = 𝛼/𝜆²

A random variable is said to have a beta distribution with parameters (𝑎, 𝑏) if its
probability density function is equal to

𝑓(𝑥) = { [1/𝐵(𝑎, 𝑏)] 𝑥^{𝑎−1} (1 − 𝑥)^{𝑏−1},   0 < 𝑥 < 1
       { 0,                                    otherwise

where

𝐵(𝑎, 𝑏) = ∫_0^1 𝑥^{𝑎−1} (1 − 𝑥)^{𝑏−1} 𝑑𝑥

The mean and variance of such a random variable are, respectively,

𝐸[𝑋] = 𝑎/(𝑎 + 𝑏),   Var(𝑋) = 𝑎𝑏 / [(𝑎 + 𝑏)²(𝑎 + 𝑏 + 1)]
