MATHEMATICAL STATISTICS

CHAPTER 1

INTRODUCTION TO STATISTICS

1.1 Basic Probability

This chapter is a reminder of some basics in probability and statistics. It contains definitions and examples that we
will use in the rest of these notes. Probability, or chance, can be measured on a scale which runs from zero to one,
where zero represents impossibility and one represents certainty.

1.1.1 Sample Space


A sample space, Ω, is the set of all possible outcomes of an experiment. The sample space can be classified into two main
categories: discrete, where the space contains a finite or countable number of distinct points, and continuous, where the
space contains an uncountable number of distinct sample points. An event E is defined to be a subset of the sample space, E ⊆ Ω.

EXAMPLE 1.1

A manufacturer has five seemingly identical computer terminals available for shipping. Unknown to her, two of the
five are defective. A particular order calls for two of the terminals and is filled by randomly selecting two of the five
that are available.
a List the sample space for this experiment.
b Let A denote the event that the order is filled with two non-defective terminals. List the sample points in A.
c List the possible outcomes for the event B that both terminals are defective.
d Let C represent the case where at least one of the terminals is defective.

Solution.
a Let the two defective terminals be labelled D1 and D2 and let the three good terminals be labelled G1 , G2 , and G3 .
Any single sample point will consist of a list of the two terminals selected for shipment. The simple events may be
denoted by
E1 = {D1, D2}, E2 = {D1, G1}, E3 = {D1, G2}, E4 = {D1, G3}, E5 = {D2, G1},
E6 = {D2, G2}, E7 = {D2, G3}, E8 = {G1, G2}, E9 = {G1, G3}, E10 = {G2, G3}.
Thus, the sample space Ω contains 10 sample points: Ω = {E1, E2, . . . , E10}.
b Event A = {E8 , E9 , E10 }.
c B = {E1 }.
d C = {E1 , E2 , E3 , E4 , E5 , E6 , E7 }
J

1.1.2 Probability Axioms


Suppose Ω is a sample space associated with an experiment. To every event A in Ω (A is a subset of Ω), we assign a
number, P (A), called the probability of A, so that the following axioms hold:
1. P(A) ≥ 0 for every event A.
2. P(Ω) = 1.
3. If A1, A2, A3, . . . form a sequence of pairwise mutually exclusive events in Ω (Ai ∩ Aj = ∅ for i ≠ j), then:

P(A1 ∪ A2 ∪ A3 ∪ . . . ) = Σ_{i=1}^{∞} P(Ai).

Other Consequences:

(i) P(Ā) = 1 − P(A); in particular, P(∅) = 0.


(ii) For any two events A1 and A2 we have the addition rule:

P (A1 ∪ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∩ A2 )

EXAMPLE 1.2

Following Example 1.1, evaluate:

a Assign probabilities to the simple events in such a way that the information about the experiment is used.
b Find the probabilities of the events A, B and C.
c Find the probabilities of A ∩ B, A ∪ C and B ∩ C.

Solution. a Because the terminals are selected at random, any pair of terminals is as likely to be selected as any other
pair. Thus, P (Ei ) = 1/10, for i = 1, 2, . . . , 10, is a reasonable assignment of probabilities.
b Since A = E8 ∪ E9 ∪ E10, then P(A) = P(E8) + P(E9) + P(E10) = 3/10.
Also, P (B) = 1/10 and P (C) = 7/10.
c Since A ∩ B = ∅, then P (A ∩ B) = 0.
A ∪ C = Ω, then P (A ∪ C) = 1.
B ∩ C = E1 , then P (B ∩ C) = 1/10.
J
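The finite, equally likely sample space of Examples 1.1 and 1.2 can be enumerated directly. The following minimal Python sketch (the labels D1, D2, G1, G2, G3 are just the names used above) lists the ten pairs and recovers P(A), P(B) and P(C) by counting:

```python
from itertools import combinations

terminals = ["D1", "D2", "G1", "G2", "G3"]          # two defective, three good
omega = list(combinations(terminals, 2))            # the 10 equally likely pairs

def prob(event):
    """Probability of an event = (# favourable pairs) / (# pairs in Omega)."""
    return len(event) / len(omega)

A = [e for e in omega if not any(t.startswith("D") for t in e)]  # no defectives
B = [e for e in omega if all(t.startswith("D") for t in e)]      # both defective
C = [e for e in omega if any(t.startswith("D") for t in e)]      # at least one defective

print(len(omega), prob(A), prob(B), prob(C))        # 10 0.3 0.1 0.7
```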

1.1.3 Conditional Probability

Suppose P(A2) ≠ 0. The conditional probability of the event A1, given that the event A2 has occurred, is
defined as:

P(A1 | A2) = P(A1 ∩ A2) / P(A2).

The conditional probability is undefined if P(A2) = 0. The conditional probability formula above yields the multiplication
rule:

P(A1 ∩ A2) = P(A1) P(A2 | A1) = P(A2) P(A1 | A2).    (1.1)

1.1.4 Independence

Suppose that events A1 and A2 are in sample space Ω, A1 and A2 are said to be independent if

P (A1 ∩ A2 ) = P (A1 )P (A2 ).

In the case of conditional probability, this implies P(A1 | A2) = P(A1) and P(A2 | A1) = P(A2). That means that
the knowledge of the occurrence of one of the events does not affect the likelihood of occurrence of the other. More
generally, A1, A2, . . . are pairwise independent if P(Ai ∩ Aj) = P(Ai)P(Aj) for all i ≠ j. They are mutually
independent if, for every finite subset of indices, P(∩_j Aj) = ∏_j P(Aj).

EXAMPLE 1.3

Referring again to Examples 1.1 and 1.2, evaluate the probability of the event A given B, and of B given C.
Solution.

P(A | B) = P(A ∩ B) / P(B) = 0.

P(B | C) = P(B ∩ C) / P(C) = (1/10) / (7/10) = 1/7.
J
Partition Law: Suppose B1, B2, . . . , Bk are mutually exclusive and exhaustive events (i.e. Bi ∩ Bj = ∅ for all i ≠ j,
and ∪_i Bi = Ω). Let A be any event, then

P(A) = Σ_{j=1}^{k} P(A | Bj) P(Bj).

Bayes' Law: Suppose B1, B2, . . . , Bk are mutually exclusive and exhaustive events and A is any event, then

P(Bj | A) = P(A | Bj) P(Bj) / P(A) = P(A | Bj) P(Bj) / Σ_i P(A | Bi) P(Bi).

EXAMPLE 1.4

(Cancer diagnosis) A screening programme for a certain type of cancer has reliabilities P(A|D) = 0.98,
P(A|D̄) = 0.05, where D is the event “disease is present” and A is the event “test gives a positive result”. It
is known that 1 in 10,000 of the population has the disease. Suppose that an individual’s test result is positive. What
is the probability that the person has the disease?

Solution. We require P (D|A). First, we need to find P (A):

P (A) = P (A|D)P (D) + P (A|D̄)P (D̄) = 0.98 × 0.0001 + 0.05 × 0.9999 = 0.050093.

By Bayes’ rule:

P(D|A) = P(A|D) P(D) / P(A) = (0.98 × 0.0001) / 0.050093 ≈ 0.002.

Therefore, the person is still very unlikely to have the disease even though the test is positive. J
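The same calculation is easy to check numerically. A minimal Python sketch (the variable names are ours; the prevalence and reliability figures are those quoted above):

```python
p_d = 0.0001            # prior: disease prevalence, P(D)
p_a_given_d = 0.98      # sensitivity, P(A | D)
p_a_given_not_d = 0.05  # false-positive rate, P(A | D-bar)

# Partition law: P(A) = P(A|D)P(D) + P(A|D-bar)P(D-bar)
p_a = p_a_given_d * p_d + p_a_given_not_d * (1 - p_d)

# Bayes' rule: P(D|A) = P(A|D)P(D) / P(A)
p_d_given_a = p_a_given_d * p_d / p_a

print(round(p_a, 6), round(p_d_given_a, 4))   # 0.050093 0.002
```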

EXAMPLE 1.5

(Bertrand’s Box Paradox) Three indistinguishable boxes contain black and white beads as shown: [ww], [wb], [bb].
A box is chosen at random and a bead chosen at random from the selected box. What is the probability that the
[wb] box was chosen, given that the selected bead was white?

Solution. Let E represent the event of choosing the [wb] box and W the event that the selected bead is white. By the
partition law: P(W) = 1 × (1/3) + (1/2) × (1/3) + 0 × (1/3) = 1/2. Then, using Bayes’ rule gives:

P(E|W) = P(E) P(W|E) / P(W) = ((1/3) × (1/2)) / (1/2) = 1/3.

This means, even though a bead from the selected box has been seen, the probability that the box is [wb] is still 1/3. J

1.2 Random Variables

A Random variable X is a real-valued function whose domain is a sample space. Given a random experiment
with sample space Ω, then X : Ω → R. The space of the r.v X is the set of real numbers A = {x : x = X(ω), ω ∈ Ω}.
Furthermore, for any event A ⊂ A there is an event Ψ ⊂ Ω such that P(A) = Pr{X ∈ A} = P(Ψ), where
Ψ = {ω : ω ∈ Ω, X(ω) ∈ A} and A = {x : x = X(ω), ω ∈ Ψ}; note that P(A) satisfies the probability axioms of Section 1.1.2.
Note: A r.v X is called discrete if it is defined on a discrete sample space (countable or finite), and it is called a continuous
r.v otherwise.

EXAMPLE 1.6

Toss a coin twice; the sample space is Ω = {HH, HT, TH, TT}. Suppose a r.v X represents the number of
heads. Then:

X(ω) = 0 if ω = TT;   X(ω) = 1 if ω = TH or HT;   X(ω) = 2 if ω = HH.    (1.2)

Therefore, the space of X is A = {x : x = 0, 1, 2}, and the probabilities of x = 0, 1, 2 are: Pr(X = 0) = 1/4,
Pr(X = 1) = 1/2 and Pr(X = 2) = 1/4.
Assume the event A = {x : x = 0, 1} ⊂ A; then P(A) = Pr(X ∈ A) = Pr(X = 0, 1) = Pr(X = 0) + Pr(X = 1) = 3/4.

EXAMPLE 1.7

Let A = {x : 0 < x < 2} be the sample space of a r.v X. For each event A ⊂ A, we define the probability set
function P(A) as

P(A) = ∫_{x∈A} (3/8) x^2 dx, x ∈ A, and P(A) = 0 elsewhere.    (1.3)

If A1 = {x : 0 < x < 1/2} and A2 = {x : 1/2 < x < 1}, find P(A1), P(A1^c), P(A2), P(A2^c), P(A1 ∩ A2) and P(A1 ∪ A2).

Solution.

P(A1) = ∫_{x∈A1} (3/8) x^2 dx = (3/8) ∫_0^{1/2} x^2 dx = 1/64.

P(A1^c) = 1 − P(A1) = 1 − 1/64 = 63/64.

P(A2) = ∫_{x∈A2} (3/8) x^2 dx = (3/8) ∫_{1/2}^{1} x^2 dx = 7/64.

P(A2^c) = 1 − P(A2) = 1 − 7/64 = 57/64.

Since A1 ∩ A2 = ∅, then P(A1 ∩ A2) = 0, and then

P(A1 ∪ A2) = P(A1) + P(A2) = 1/64 + 7/64 = 1/8.
J

EXAMPLE 1.8

Let A = {x : x = 1, 2, . . . } be the sample space of a r.v X. For each event A ⊂ A, we define the probability set
function P(A) as

P(A) = Pr(X ∈ A) = Σ_{x∈A} (1/2)^x, x ∈ A.

If A = {x : x = 1, 2}, B = {x : x = 2, 3}, C = {x : x = 1, 3, 5, . . . }, find P(A), P(B), P(C), P(A^c), P(B^c),
P(C^c), P(A ∩ B), and P(A ∪ B).

Solution.

P(A) = Σ_{x∈A} (1/2)^x = Σ_{x=1}^{2} (1/2)^x = 1/2 + 1/4 = 3/4  ⇒  P(A^c) = 1/4.

P(B) = Σ_{x∈B} (1/2)^x = Σ_{x=2}^{3} (1/2)^x = 1/4 + 1/8 = 3/8  ⇒  P(B^c) = 5/8.

P(C) = Σ_{x∈C} (1/2)^x = Σ_{x=1,3,5,...} (1/2)^x = 1/2 + 1/8 + 1/32 + · · · = (1/2) / (1 − 1/4) = 2/3  ⇒  P(C^c) = 1/3.

A ∩ B = {x : x = 2}  ⇒  P(A ∩ B) = Σ_{x∈A∩B} (1/2)^x = (1/2)^2 = 1/4.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 3/4 + 3/8 − 1/4 = 7/8.
J
CHAPTER 2

DISTRIBUTION OF RANDOM VARIABLES

2.1 The Probability Density Function (PDF)

Notationally, we will use an upper case letter, such as X or Y , to denote a random variable and a lower case letter, such
as x or y, to denote a particular value that a random variable may assume. For example, let X denote any one of the six
possible values that could be observed on the upper face when a die is tossed. After the die is tossed, the number actually
observed will be denoted by the symbol x. Note that X is a random variable, but the specific observed value, x, is not
random.
It is now meaningful to talk about the probability that X takes on the value x, denoted by Pr(X = x).
Definition: The probability that X takes on the value x, Pr(X = x), is defined as the sum of the probabilities of all
sample points in Ω that are assigned the value x. We will sometimes denote Pr(X = x) by p(x) or f(x), because p(x)
or f(x) is a function that assigns probabilities to each value x of the random variable X.
Definition: The probability distribution for a discrete variable X can be represented by a formula, a table, or a graph
that provides p(x) = Pr(X = x) for all x.

EXAMPLE 2.1

A supervisor in a manufacturing plant has three men and three women working for him. He wants to choose two
workers for a special job. Not wishing to show any biases in his selection, he decides to select the two workers at
random. Let X denote the number of women in his selection. Find the probability distribution for X.
Solution. The supervisor can select two workers from six in C(6, 2) = 15 ways. Hence Ω contains 15 sample points,
which we assume to be equally likely because random sampling was employed. Thus, Pr(Ei) = 1/15, i = 1, 2, . . . , 15.

The values of X that have nonzero probability are 0, 1, and 2. The number of ways of selecting X = 0 women is
C(3, 0) C(3, 2) = 1 × 3 = 3, so there are 3 sample points in the event X = 0, and

p(0) = Pr(X = 0) = C(3, 0) C(3, 2) / 15 = 3/15 = 1/5.

Similarly,

p(1) = Pr(X = 1) = C(3, 1) C(3, 1) / 15 = 9/15 = 3/5,

p(2) = Pr(X = 2) = C(3, 2) C(3, 0) / 15 = 3/15 = 1/5.

Notice that (X = 1) is by far the most likely outcome. This should seem reasonable since the number of women equals
the number of men in the original group. Therefore, we can write the probability function in the formula:

p(x) = C(3, x) C(3, 2 − x) / C(6, 2),   x = 0, 1, 2.

Notice that, since p(x) = Pr(X = x) is a probability function, this means that the sum of p(x) over the space is equal to
one. J
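As a quick numerical check of this hypergeometric-type formula, here is a short Python sketch using math.comb (the function name pmf and the keyword arguments are ours):

```python
from math import comb

def pmf(x, women=3, men=3, chosen=2):
    """P(X = x): x women among `chosen` workers drawn from `women` + `men`."""
    return comb(women, x) * comb(men, chosen - x) / comb(women + men, chosen)

probs = {x: pmf(x) for x in range(3)}
print(probs)                   # {0: 0.2, 1: 0.6, 2: 0.2}
print(sum(probs.values()))     # 1.0  (the pmf sums to one over its support)
```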

Theorem: If f(x) is a probability density function (pdf) for a discrete or continuous random variable X, then the
following properties should be satisfied:

1. f(x) ≥ 0, for all x ∈ A.

2. discrete: Σ_{x∈A} f(x) = 1;  continuous: ∫_{x∈A} f(x) dx = 1.

For any subset A of the sample points (A ⊂ A), the probability set function p(A) can be expressed in terms of
the pdf f(x) as:

p(A) = Pr(X ∈ A) = Σ_{x∈A} f(x) (discrete),   p(A) = Pr(X ∈ A) = ∫_{x∈A} f(x) dx (continuous).

EXAMPLE 2.2

Let X be a discrete r.v defined on a sample set A = {x : x = 0, 1, 2, 3}, and let f(x) be a function defined on
A as: f(x) = (1/8) C(3, x), x ∈ A. Examine whether f(x) is a pdf of X or not. If so, find p(A) and p(A^c), where
A = {x : x = 1, 2}.

Solution.

For the first condition:

f(x) > 0, ∀x ∈ A.

For the second condition:

Σ_{x∈A} f(x) = (1/8) Σ_{x=0}^{3} C(3, x) = (1/8) [C(3,0) + C(3,1) + C(3,2) + C(3,3)] = (1/8)(1 + 3 + 3 + 1) = 1.

This proves that f (x) is a pdf of X. Hence,

p(A) = Σ_{x∈A} f(x) = (1/8) Σ_{x=1}^{2} C(3, x) = (1/8)(3 + 3) = 3/4,

and p(A^c) = 1 − p(A) = 1 − 3/4 = 1/4. J

EXAMPLE 2.3

Let X be a r.v defined on a sample set A = {x : x = 1, 2, 3, . . . } and let f(x) be a function defined on A as:
f(x) = (1/2)^x, x ∈ A. Is f(x) a pdf of X? If so, evaluate the following probabilities: p(A), p(B), p(A ∩ B), p(A ∪ B) and
p(A|B), knowing that A = {x : x = 1, 2, 3} and B = {x : x = 1, 3, 5, . . . }.

Solution.

f(x) > 0, ∀x ∈ A = {x : x = 1, 2, 3, . . . }.

Σ_{x∈A} f(x) = Σ_{x=1}^{∞} (1/2)^x = 1/2 + 1/4 + 1/8 + · · · = (1/2) / (1 − 1/2) = 1.

Hence, f(x) is a pdf of X.

In order to evaluate the probabilities:

p(A) = Σ_{x∈A} f(x) = Σ_{x=1}^{3} (1/2)^x = 1/2 + 1/4 + 1/8 = 7/8.

p(B) = Σ_{x∈B} f(x) = Σ_{x=1,3,5,...} (1/2)^x = 1/2 + 1/8 + 1/32 + · · · = (1/2) / (1 − 1/4) = 2/3.

A ∩ B = {x : x = 1, 3}  ⇒  p(A ∩ B) = Σ_{x∈A∩B} f(x) = 1/2 + 1/8 = 5/8.

p(A ∪ B) = p(A) + p(B) − p(A ∩ B) = 7/8 + 2/3 − 5/8 = 11/12.

p(A|B) = p(A ∩ B) / p(B) = (5/8) / (2/3) = 15/16.

J

EXAMPLE 2.4

Let X be a r.v defined on a sample set A = {x : 2 ≤ x ≤ 4} and let f(x) be a function defined on A as:
f(x) = (1/8)(x + 1), x ∈ A. Examine whether f(x) is a pdf of X. If it is, find p(A), where A = {x : 1.5 ≤ x ≤ 2.5}.
Solution.

f(x) > 0, ∀x ∈ A = {x : 2 ≤ x ≤ 4}.

∫_{x∈A} f(x) dx = ∫_2^4 (1/8)(x + 1) dx = (1/16)(x + 1)^2 |_2^4 = (1/16)(25 − 9) = 1.

Hence, f(x) is a pdf of X.

p(A) = ∫_{x∈A} f(x) dx = ∫_{1.5}^{2} f(x) dx + ∫_{2}^{2.5} f(x) dx = 0 + ∫_{2}^{2.5} (1/8)(x + 1) dx = (1/16)(x + 1)^2 |_2^{2.5} = (12.25 − 9)/16 = 13/64.

EXAMPLE 2.5

Suppose that the function f(x) = e^{−x}, x ∈ A, is defined on a sample set A = {x : 0 < x < ∞} and that X is a
r.v. Show that f(x) is a pdf of X and evaluate p(A), p(B), p(A ∩ B) and p(A ∪ B), if A = {x : 0 < x < 3} and
B = {x : 1 < x < ∞}.
Solution.

f(x) > 0, ∀x ∈ A = {x : 0 < x < ∞}.

∫_{x∈A} f(x) dx = ∫_0^∞ e^{−x} dx = −e^{−x} |_0^∞ = −(e^{−∞} − e^0) = 1.

Hence, f(x) is a pdf of X.

For the probability evaluations:

p(A) = ∫_{x∈A} f(x) dx = ∫_0^3 e^{−x} dx = 1 − e^{−3}.

p(B) = ∫_{x∈B} f(x) dx = ∫_1^∞ e^{−x} dx = e^{−1}.

A ∩ B = {x : 1 < x < 3}  ⇒  p(A ∩ B) = ∫_1^3 e^{−x} dx = e^{−1} − e^{−3}.

p(A ∪ B) = p(A) + p(B) − p(A ∩ B) = 1 − e^{−3} + e^{−1} − e^{−1} + e^{−3} = 1.

J
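These exponential probabilities are easy to sanity-check by simulation. A minimal Python sketch (sample size, seed and variable names are ours), drawing from the standard exponential and estimating the probabilities by relative frequency:

```python
import math
import random

random.seed(0)
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]   # draws with pdf e^{-x}, x > 0

p_a = sum(x < 3 for x in xs) / n          # estimate of P(0 < X < 3)
p_b = sum(x > 1 for x in xs) / n          # estimate of P(X > 1)
p_ab = sum(1 < x < 3 for x in xs) / n     # estimate of P(1 < X < 3)

print(round(p_a, 3), round(1 - math.exp(-3), 3))    # both close to 0.950
print(round(p_b, 3), round(math.exp(-1), 3))        # both close to 0.368
print(round(p_a + p_b - p_ab, 3))                   # close to 1.0, as computed above
```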

EXAMPLE 2.6

Verify that the following functions are pdf’s of a r.v X, defined as:
1. f(x) = x^{−2}, A = {x : 1 < x < ∞}.
2. f(x) = (4/9) C(2, x) (1/2)^x, A = {x : x = 0, 1, 2}.
3. f(x) = 1 − |1 − x|, A = {x : 0 < x < 2}.
4. f(x) = 1 + x for −1 < x < 0, and f(x) = 1 − x for 0 ≤ x < 1.

Solution.
3. f(x) ≥ 0, ∀x ∈ A = {x : 0 < x < 2}. Writing out the absolute value,

f(x) = 1 − |1 − x| = 1 − (1 − x) = x,  if 1 − x ≥ 0 (i.e. x ≤ 1);
f(x) = 1 − |1 − x| = 1 + (1 − x) = 2 − x,  if 1 − x < 0 (i.e. x > 1).

Then,

∫_{x∈A} f(x) dx = ∫_0^1 x dx + ∫_1^2 (2 − x) dx = (1/2) x^2 |_0^1 + [−(1/2)(2 − x)^2]_1^2 = 1/2 + 1/2 = 1.
J

EXAMPLE 2.7

Find the constant c that makes each of the following functions a pdf of a r.v X:
1. f(x) = c(x + 1), x = 0, 1, 2, 3.
2. f(x) = c(x^{α−1} − x^{β−1}), 0 < x < 1, α > 1, β > 0.
3. f(x) = c(1 + x^2)^{−1}, −∞ < x < ∞.

Solution. Since f(x) is a pdf, it should satisfy the properties of the pdf, hence:
1.
1 = Σ_{x∈A} f(x) = Σ_{x=0}^{3} c(x + 1) = c(1 + 2 + 3 + 4) = 10c  ⇒  c = 1/10.

2.
1 = ∫_{x∈A} f(x) dx = ∫_0^1 c(x^{α−1} − x^{β−1}) dx = c [x^α/α − x^β/β]_0^1 = c (1/α − 1/β) = c (β − α)/(αβ)  ⇒  c = αβ/(β − α),  β ≠ α.

3.
1 = ∫_{−∞}^{∞} c(1 + x^2)^{−1} dx = c tan^{−1} x |_{−∞}^{∞} = c [tan^{−1}(∞) − tan^{−1}(−∞)] = 2c tan^{−1}(∞) = 2c (π/2) = cπ  ⇒  c = 1/π.
J

Definition: The Mode of the distribution is the value of x that maximises the pdf f(x) of a r.v X. Note that the mode
of a continuous r.v is a solution of f′(x) = 0 with f″(x) < 0. Also, the mode may not exist, or a distribution may
have more than one mode.

EXAMPLE 2.8

Find the mode for the following pdf’s:

1. f(x) = (1/2)^x, x = 1, 2, 3, . . . .

2. f(x) = 12x^2(1 − x), 0 < x < 1.

Solution.

1. f(x) = (1/2)^x, x = 1, 2, 3, . . . , is decreasing in x, so x = 1 is the mode.

2. f(x) = 12x^2(1 − x), 0 < x < 1  ⇒  f′(x) = 12(2x − 3x^2); set f′(x) = 0:

12(2x − 3x^2) = 0  ⇒  x = 0, 2/3.

Then, f″(x) = 12(2 − 6x) = 24(1 − 3x).

f″(0) = 24(1 − 0) = 24 > 0,

f″(2/3) = 24(1 − 3(2/3)) = −24 < 0,

hence x = 2/3 is the mode.

2.1.1 The Probability Density Function in n−Dimensional Space

Let X1, X2, . . . , Xn be n r.v’s (discrete or continuous) defined on an n-dimensional sample space A, and let
Pr(X1 = x1, X2 = x2, . . . , Xn = xn) = f(x1, x2, . . . , xn) be a function defined on A, such that:

1. f(x1, x2, . . . , xn) ≥ 0, ∀(x1, x2, . . . , xn) ∈ A.

2. discrete: Σ · · · Σ_{(x1,...,xn)∈A} f(x1, x2, . . . , xn) = 1;
   continuous: ∫ · · · ∫_{(x1,...,xn)∈A} f(x1, x2, . . . , xn) dx1 dx2 . . . dxn = 1.

Then the function f(x1, x2, . . . , xn) is called the pdf of the r.v’s X1, X2, . . . , Xn. Furthermore, for every event A ⊂ A, the
probability of A, p(A), can be expressed in terms of the pdf by:

p(A) = Pr{(X1, X2, . . . , Xn) ∈ A} = Σ · · · Σ_{(x1,...,xn)∈A} f(x1, x2, . . . , xn) (discrete),
p(A) = Pr{(X1, X2, . . . , Xn) ∈ A} = ∫ · · · ∫_{(x1,...,xn)∈A} f(x1, x2, . . . , xn) dx1 dx2 . . . dxn (continuous).

EXAMPLE 2.9

Let X and Y be discrete r.v’s defined on a sample space A = {(x, y) : x = 1, 2, 3; y = 1, 2}, and let f(x, y) be a
function defined on A by f(x, y) = (1/21)(x + y), (x, y) ∈ A.
1. Is f(x, y) a pdf of X and Y ?
2. If so, find p(A) and p(A^c), where A = {(x, y) : x = 1, 2; y = 1}.

Solution.

1. - f(x, y) > 0, ∀(x, y) ∈ A.

-
Σ_{(x,y)∈A} f(x, y) = (1/21) Σ_{x=1}^{3} Σ_{y=1}^{2} (x + y) = (1/21) Σ_{x=1}^{3} [(x + 1) + (x + 2)]
= (1/21) Σ_{x=1}^{3} (2x + 3) = (1/21)[5 + 7 + 9] = 1.

Then f(x, y) is a pdf of X and Y .

2.
p(A) = Σ_{(x,y)∈A} f(x, y) = (1/21) Σ_{x=1}^{2} Σ_{y=1}^{1} (x + y) = (1/21) Σ_{x=1}^{2} (x + 1) = (1/21)(2 + 3) = 5/21

⇒ p(A^c) = 1 − 5/21 = 16/21.
J

EXAMPLE 2.10

Let X and Y be two r.v’s defined on a sample space A = {(x, y) : x = 1, 2, . . . ; y = 0, 1, 2}, and let f(x, y) be a
function defined on A by

f(x, y) = C(2, y) (1/2)^{x+2}, (x, y) ∈ A.

1. Is f(x, y) a pdf of X and Y ?
2. Find p(A), where A = {(x, y) : x = 1, 3, 5, . . . ; y = 1}.

Solution.

1. - f(x, y) > 0, ∀(x, y) ∈ A.

-
Σ_{(x,y)∈A} f(x, y) = Σ_{x=1}^{∞} Σ_{y=0}^{2} C(2, y) (1/2)^{x+2} = Σ_{x=1}^{∞} (1/2)^{x+2} [C(2,0) + C(2,1) + C(2,2)]
= Σ_{x=1}^{∞} (1/2)^{x+2} (1 + 2 + 1) = 4 [(1/2)^3 + (1/2)^4 + (1/2)^5 + . . . ] = 4 (1/2)^3 / (1 − 1/2) = 1

Then f (x, y) is a pdf of X and Y .

2.
p(A) = Σ_{(x,y)∈A} f(x, y) = Σ_{x=1,step2}^{∞} Σ_{y=1}^{1} C(2, y) (1/2)^{x+2} = Σ_{x=1,step2}^{∞} C(2, 1) (1/2)^{x+2} = 2 Σ_{x=1,step2}^{∞} (1/2)^{x+2}
= 2 [(1/2)^3 + (1/2)^5 + (1/2)^7 + . . . ] = 2 (1/2)^3 / (1 − (1/2)^2) = 1/3.

EXAMPLE 2.11

Let X and Y be continuous r.v’s defined on a sample space A = {(x, y) : 0 < x < 2; 2 < y < 4}, and let f(x, y)
be a function defined on A as f(x, y) = (1/8)(6 − x − y), (x, y) ∈ A.

1. Is f(x, y) a pdf of X and Y ?

2. Find p(A), where A = {(x, y) : 0 < x < 1; 2 < y < 3}.

Solution.

1. - f(x, y) ≥ 0, ∀(x, y) ∈ A.
-
∫∫_{(x,y)∈A} f(x, y) = ∫_{x=0}^{2} ∫_{y=2}^{4} (1/8)(6 − x − y) dy dx = (1/8) ∫_0^2 [6y − xy − y^2/2]_2^4 dx
= (1/8) ∫_0^2 [(24 − 4x − 8) − (12 − 2x − 2)] dx = (1/8) ∫_0^2 (6 − 2x) dx = (1/8) [6x − x^2]_0^2 = (12 − 4)/8 = 1.
Then f(x, y) is a pdf of X and Y .

2.
p(A) = ∫∫_{(x,y)∈A} f(x, y) = ∫_{x=0}^{1} ∫_{y=2}^{3} (1/8)(6 − x − y) dy dx = · · · = 3/8.

EXAMPLE 2.12

X and Y are two r.v’s defined on a sample space A = {(x, y) : 0 < x < y < 1}, and f(x, y) = 2, (x, y) ∈ A, is
a function defined on A. Is f(x, y) a pdf of X and Y ?

Solution.

- f(x, y) = 2 > 0, ∀(x, y) ∈ A.

-
∫∫_{(x,y)∈A} f(x, y) = ∫_{x=0}^{1} ∫_{y=x}^{1} 2 dy dx = 2 ∫_0^1 (1 − x) dx = 2 [x − x^2/2]_0^1 = 2(1 − 1/2) = 1.

Then f (x, y) is a pdf of X and Y . J


Note: If f(x1, x2, . . . , xn) is a pdf of r.v’s X1, X2, . . . , Xn defined on a sample space A = {(x1, x2, . . . , xn) : −∞ <
xi < ∞; i = 1, 2, . . . , n}, and if the event A ⊂ A is A = {(x1, x2, . . . , xn) : ai < xi < bi; i = 1, 2, . . . , n}, then the
probability of A is:

p(A) = Pr{ai < xi < bi; i = 1, 2, . . . , n} = Σ_{x1=a1}^{b1} Σ_{x2=a2}^{b2} · · · Σ_{xn=an}^{bn} f(x1, x2, . . . , xn) (discrete),

p(A) = Pr{ai < xi < bi; i = 1, 2, . . . , n} = ∫_{a1}^{b1} ∫_{a2}^{b2} · · · ∫_{an}^{bn} f(x1, x2, . . . , xn) dx1 dx2 . . . dxn (continuous).

EXAMPLE 2.13

Find the constant c that makes the function f (x, y) = ce−x−y , 0 < x < y < ∞ a pdf of r.v’s X and Y .
Solution. Since f(x, y) is a pdf of X and Y , then by definition ∫∫_{(x,y)∈A} f(x, y) dx dy = 1:

1 = c ∫_{x=0}^{∞} ∫_{y=x}^{∞} e^{−x−y} dy dx = c ∫_0^∞ e^{−x} (∫_x^∞ e^{−y} dy) dx = c ∫_0^∞ e^{−x} (0 + e^{−x}) dx
= c ∫_0^∞ e^{−2x} dx = c [−(1/2) e^{−2x}]_0^∞ = c(0 + 1/2) = c/2.
Therefore, c = 2. J

2.2 Cumulative Distribution Function (CDF)

Definition: The cumulative distribution function (or the probability distribution) of a r.v X, denoted by F(x), is defined
as F(x) = Pr(X ≤ x), −∞ < x < ∞. Let X be a r.v with a pdf f(x) defined on a sample space A; we define:

F(x) = Pr(X ≤ x) = Σ_{t=−∞}^{x} f(t) (discrete),   F(x) = Pr(X ≤ x) = ∫_{t=−∞}^{x} f(t) dt (continuous).

EXAMPLE 2.14

Suppose that X has a pdf p(x), defined on x = 0, 1, 2 as:

p(x) = C(2, x) (1/2)^x (1/2)^{2−x}, x = 0, 1, 2.

Find F(x) for all x.


Solution. From the pdf p(x), we have: p(0) = 1/4, p(1) = 1/2, p(2) = 1/4. In order to find the distribution function
F (x), from definition of F (x) = Pr(X ≤ x), we should evaluate the probability for four regions of x:

−∞ < x < 0, 0 ≤ x < 1, 1 ≤ x < 2 and 2 ≤ x < ∞.



For x < 0: Because the only values of X that are assigned positive probabilities are 0, 1 and 2, and none of these
values is less than 0, then F(x) = 0, ∀ −∞ < x < 0.
For 0 ≤ x < 1:
F(x) = Pr(X ≤ x) = Pr(X = 0) = 1/4.
For 1 ≤ x < 2:
F(x) = Pr(X ≤ x) = Pr(X = 0) + Pr(X = 1) = 1/4 + 1/2 = 3/4.
For x ≥ 2:
F(x) = Pr(X ≤ x) = Pr(X = 0) + Pr(X = 1) + Pr(X = 2) = 1/4 + 1/2 + 1/4 = 1.
Therefore, the distribution function is:

F(x) = Pr(X ≤ x) = 0 for x < 0;  = 1/4 for 0 ≤ x < 1;  = 3/4 for 1 ≤ x < 2;  = 1 for x ≥ 2.

J
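A small Python sketch of this step-function cdf (the helper name cdf is ours); it simply accumulates the pmf values p(0) = 1/4, p(1) = 1/2, p(2) = 1/4 over the support points not exceeding x:

```python
from math import comb

pmf = {x: comb(2, x) * 0.5**2 for x in range(3)}   # {0: 0.25, 1: 0.5, 2: 0.25}

def cdf(x):
    """F(x) = Pr(X <= x) for the two-coin-toss pmf above."""
    return sum(p for k, p in pmf.items() if k <= x)

print([cdf(v) for v in (-1, 0, 0.5, 1, 1.7, 2, 5)])
# [0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0]
```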
Notes:
1. F(−∞) = lim_{x→−∞} F(x) = 0.

2. F(∞) = lim_{x→∞} F(x) = 1.

3. F(x) is a non-decreasing function of x: if x1 < x2 then F(x1) ≤ F(x2).

4. 0 ≤ F(x) ≤ 1, because 0 ≤ Pr(X ≤ x) ≤ 1.

5. For a continuous r.v X, the pdf can be evaluated in terms of the cdf as:

f(x) = (d/dx) F(x).

6. For a discrete (integer-valued) r.v X, the pdf can be evaluated in terms of the cdf as:

f(x) = Pr(X = x) = Pr(X ≤ x) − Pr(X ≤ x − 1) = F(x) − F(x − 1).

EXAMPLE 2.15

Let the r.v X have a pdf f(x) = x/6; x = 1, 2, 3. Find the cdf of X.

Solution.

F(x) = Pr(X ≤ x) = Σ_{τ=−∞}^{x} f(τ) = Σ_{τ=1}^{x} τ/6 = (1/6)(1 + 2 + · · · + x) = x(x + 1)/12.

∴ F(x) = 0 for x < 1;  = x(x + 1)/12 for 1 ≤ x < 3;  = 1 for x ≥ 3.
J

EXAMPLE 2.16

Let the r.v X have a pdf f(x) = 2x^{−3}; 1 < x < ∞. Find the cdf of X.
Solution.

F(x) = Pr(X ≤ x) = ∫_{τ=−∞}^{x} f(τ) dτ = ∫_1^x 2τ^{−3} dτ = [−τ^{−2}]_1^x = 1 − 1/x^2.

∴ F(x) = 0 for x ≤ 1, and F(x) = 1 − 1/x^2 for 1 ≤ x < ∞.
J
Remark: In order to evaluate the probability that a r.v X lies between a and b, we have two ways:
1. Using the pdf:

Pr(a ≤ X ≤ b) = Σ_{x=a}^{b} f(x) (discrete),   Pr(a ≤ X ≤ b) = ∫_a^b f(x) dx (continuous).

2. Using the cdf: for a continuous r.v,

Pr(a ≤ X ≤ b) = Pr(X ≤ b) − Pr(X ≤ a) = F(b) − F(a);

while for a discrete (integer-valued) r.v, Pr(a ≤ X ≤ b) = F(b) − F(a − 1).

EXAMPLE 2.17

Let the r.v X have a pdf f(x) = (1/2)^x; x = 1, 2, 3, . . . .
1. Find the cdf of X.
2. Evaluate Pr(2 ≤ X ≤ 4) using the pdf and the cdf.

Solution. 1.

F(x) = Σ_{τ=−∞}^{x} f(τ) = Σ_{τ=1}^{x} (1/2)^τ = 1/2 + (1/2)^2 + (1/2)^3 + · · · + (1/2)^x = (1/2)[1 − (1/2)^x] / (1 − 1/2) = 1 − (1/2)^x.

∴ F(x) = 0 for x < 1, and F(x) = 1 − (1/2)^x for 1 ≤ x < ∞.

2. Using the pdf:

Pr(2 ≤ X ≤ 4) = Σ_{x=2}^{4} (1/2)^x = 1/4 + 1/8 + 1/16 = 7/16.

Using the cdf (for this integer-valued r.v, Pr(2 ≤ X ≤ 4) = F(4) − F(1)):

Pr(2 ≤ X ≤ 4) = F(4) − F(1) = [1 − (1/2)^4] − [1 − (1/2)] = 1/2 − 1/16 = 7/16.
J
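Both routes can be checked numerically for this geometric-type pmf f(x) = (1/2)^x (the function names are ours):

```python
def pmf(x):
    return 0.5 ** x            # f(x) = (1/2)^x, x = 1, 2, 3, ...

def cdf(x):
    return 1 - 0.5 ** x        # F(x) = 1 - (1/2)^x for x >= 1

via_pdf = sum(pmf(x) for x in range(2, 5))   # sum over x = 2, 3, 4
via_cdf = cdf(4) - cdf(1)                    # F(4) - F(1) for an integer-valued r.v
print(via_pdf, via_cdf)                      # 0.4375 0.4375  (= 7/16)
```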
Definition: The Median of a r.v X is the value of x for which the cdf satisfies F(x) = 1/2.

EXAMPLE 2.18

Find the median of the following distributions, whose pdf’s are defined as follows:

1. f(x) = (1/2)^x, x = 1, 2, 3, . . . .

2. f(x) = x/6, x = 1, 2, 3.

3. f(x) = 3x^2, 0 < x < 1.

4. f(x) = (1/π)(1 + x^2)^{−1}, −∞ < x < ∞.

Solution. 1. From Example 2.17, F(x) = 1 − (1/2)^x. To evaluate the median, set F(x) = 1/2:

1 − (1/2)^x = 1/2  ⇒  (1/2)^x = 1/2  ⇒  x = 1.

The median is x = 1.

2. From Example 2.15, F(x) = x(x + 1)/12. To evaluate the median, set F(x) = 1/2:

x(x + 1)/12 = 1/2  ⇒  x(x + 1) = 6  ⇒  x^2 + x − 6 = 0  ⇒  x = −3, 2.

The median is x = 2, since −3 ∉ A.

3. F(x) = ∫_{τ=−∞}^{x} f(τ) dτ = ∫_0^x 3τ^2 dτ = τ^3 |_0^x = x^3, 0 < x < 1.
Set F(x) = 1/2  ⇒  x^3 = 1/2  ⇒  x = (1/2)^{1/3} is the median.

4. F(x) = ∫_{τ=−∞}^{x} f(τ) dτ = ∫_{−∞}^{x} (1/π)(1 + τ^2)^{−1} dτ = (1/π) tan^{−1} τ |_{−∞}^{x} = 1/2 + (1/π) tan^{−1} x, −∞ < x < ∞.
Set F(x) = 1/2  ⇒  1/2 + (1/π) tan^{−1} x = 1/2  ⇒  x = 0 is the median.
J

EXAMPLE 2.19

Find the constant c in the following cdf’s and find the pdf for each case.

1. X is a r.v with a cdf:

F(x) = 0 for x < 1;  = c[1 − (1/2)^x] for 1 ≤ x < ∞;  with F(x) → 1 as x → ∞.

2. X is a r.v with a cdf:

F(x) = 0 for x ≤ 0;  = cx(x + 1) for 0 < x < 3;  = 1 for x ≥ 3.
Solution.

1. Since F(x) is a cdf, then F(∞) = 1  ⇒  c[1 − (1/2)^∞] = 1  ⇒  c(1 − 0) = 1  ⇒  c = 1. Then:

F(x) = 0 for x < 1, and F(x) = 1 − (1/2)^x for 1 ≤ x < ∞.

Therefore, the pdf is f(x) = F(x) − F(x − 1):

f(x) = [1 − (1/2)^x] − [1 − (1/2)^{x−1}] = (1/2)^{x−1} − (1/2)^x = (1/2)^{x−1} (1 − 1/2)

∴ f(x) = (1/2)^x, x = 1, 2, 3, . . .

2. Since F(x) is a cdf and is continuous at x = 3, then F(3) = 1  ⇒  c · 3(3 + 1) = 1  ⇒  12c = 1  ⇒  c = 1/12. Then:

F(x) = 0 for x ≤ 0;  = (1/12) x(x + 1) for 0 < x < 3;  = 1 for x ≥ 3.

Therefore, the pdf is f(x) = F′(x):

∴ f(x) = (1/12)(2x + 1), 0 < x < 3.
J

EXAMPLE 2.20
Find the cdf of the r.v X which has the pdf f(x) = x for 0 < x < 1, and f(x) = 2 − x for 1 ≤ x < 2.
Solution.

For x ≤ 0: F(x) = 0.
For 0 < x < 1: F(x) = ∫_0^x f(τ) dτ = ∫_0^x τ dτ = (1/2) τ^2 |_0^x = (1/2) x^2.
For 1 ≤ x < 2: F(x) = ∫_0^1 f(τ) dτ + ∫_1^x f(τ) dτ = 1/2 + ∫_1^x (2 − τ) dτ = 1 − (1/2)(2 − x)^2.
For x ≥ 2: F(x) = 1.
J

2.2.1 The Cumulative Distribution Function in n−Dimensional Space

Let X1, X2, . . . , Xn be n r.v’s defined on an n-dimensional sample space A with pdf f(x1, x2, . . . , xn). We define the
cdf of X1, X2, . . . , Xn as:

F(x1, x2, . . . , xn) = Pr(X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn)
= Σ_{τ1=−∞}^{x1} Σ_{τ2=−∞}^{x2} · · · Σ_{τn=−∞}^{xn} f(τ1, τ2, . . . , τn) (discrete),
= ∫_{τ1=−∞}^{x1} ∫_{τ2=−∞}^{x2} · · · ∫_{τn=−∞}^{xn} f(τ1, τ2, . . . , τn) dτ1 dτ2 . . . dτn (continuous).

EXAMPLE 2.21

Let the pdf of the r.v’s X and Y be: f(x, y) = (x/6)(1/2)^y; x = 1, 2, 3, y = 1, 2, . . . . Find the cdf of X and Y .
Solution.

F(x, y) = Pr(X ≤ x, Y ≤ y) = Σ_{t=1}^{x} Σ_{s=1}^{y} (t/6)(1/2)^s = Σ_{t=1}^{x} (t/6) [1/2 + (1/2)^2 + · · · + (1/2)^y]
= Σ_{t=1}^{x} (t/6) · (1/2)[1 − (1/2)^y] / (1 − 1/2) = [1 − (1/2)^y] (1/6)(1 + 2 + · · · + x) = [1 − (1/2)^y] x(x + 1)/12.

∴ F(x, y) = 0 for x < 1 or y < 1;  = [x(x + 1)/12][1 − (1/2)^y] for 1 ≤ x < 3, 1 ≤ y < ∞;  = 1 for x ≥ 3, y → ∞.

EXAMPLE 2.22

Let X and Y be two r.v’s defined on a sample space A = {(x, y) : 0 < x < 2; 2 < y < 4}, and let f(x, y) =
(1/8)(6 − x − y) be a pdf of X and Y . Find the cdf.

Solution.

F(x, y) = Pr(X ≤ x, Y ≤ y) = ∫_{t=−∞}^{x} ∫_{s=−∞}^{y} f(t, s) ds dt = ∫_0^x ∫_2^y (1/8)(6 − t − s) ds dt
= (1/8) ∫_2^y [6t − t^2/2 − st]_0^x ds = (1/8) ∫_2^y (6x − x^2/2 − xs) ds = (1/8) [6xs − x^2 s/2 − x s^2/2]_2^y
= (1/8) [6xy − x^2 y/2 − x y^2/2 − 10x + x^2] = (x/16)(12y − xy − y^2 − 20 + 2x).

∴ F(x, y) = 0 for x ≤ 0 or y ≤ 2;  = (x/16)(12y − xy − y^2 − 20 + 2x) for 0 < x < 2, 2 < y < 4;  = 1 for x ≥ 2, y ≥ 4.

J
Note:
1. If X1, X2, . . . , Xn are continuous r.v’s, then the pdf is:

f(x1, x2, . . . , xn) = ∂^n F(x1, x2, . . . , xn) / (∂x1 ∂x2 . . . ∂xn).

2. For a 2-dimensional discrete sample space, the pdf can be written as:

f(x, y) = F(x, y) − F(x, y − 1) − F(x − 1, y) + F(x − 1, y − 1).



EXAMPLE 2.23

Let the r.v’s X, Y and Z have pdf f(x, y, z) = 6 e^{−(x+y+z)}, 0 < x < y < z < ∞. Find the cdf of X, Y and Z.

Solution.

F(x, y, z) = Pr(X ≤ x, Y ≤ y, Z ≤ z) = ∫_{t=−∞}^{x} ∫_{s=−∞}^{y} ∫_{q=−∞}^{z} f(t, s, q) dq ds dt = 6 ∫_0^x ∫_t^y ∫_s^z e^{−(t+s+q)} dq ds dt

∴ F(x, y, z) = 6 [ (1/2) e^{−(x+2y)} − e^{−(x+y+z)} − (1/6) e^{−3x} + (1/2) e^{−(2x+z)} − (1/2) e^{−2y} + e^{−(y+z)} − (1/2) e^{−z} + 1/6 ]
for 0 < x < y < z < ∞, with F(x, y, z) = 0 if x ≤ 0, y ≤ 0 or z ≤ 0, and F(x, y, z) → 1 as x, y, z → ∞.

2.3 Transformation of Variables (cdf technique)

Assume that X is a r.v defined on a sample space A with pdf f(x) and cdf F(x). Consider a new r.v Y as a function
of X, say Y = ψ(X), defined on a sample space B = {y : y = ψ(x), x ∈ A}. The aim now is to find the distribution
of the r.v Y ; let G(y) = Pr(Y ≤ y) and g(y) represent the cdf and the pdf of Y . If the function y = ψ(x) is a one-to-one
transformation that maps the space A onto the space B, then the inverse function x = ψ^{−1}(y)
exists. Therefore the cdf of Y (for an increasing ψ) can be written as:

G(y) = Pr(Y ≤ y) = Pr(ψ(X) ≤ y) = Pr(X ≤ ψ^{−1}(y)) = F(ψ^{−1}(y)), y ∈ B.

Hence, the pdf g(y) is:

g(y) = G(y) − G(y − 1) (discrete),   g(y) = G′(y) (continuous).

EXAMPLE 2.24

Let the r.v X have a pdf f(x) = 1/2, −1 < x < 1. Find the distribution of the r.v Y = X^2.

Solution. Case 1. Since we have the pdf f(x) = 1/2, −1 < x < 1, then:

F(x) = Pr(X ≤ x) = ∫_{−1}^{x} f(τ) dτ = 0 for x ≤ −1;  = (1/2)(x + 1) for −1 < x < 1;  = 1 for x ≥ 1.

Let G(y) and g(y) represent the cdf and the pdf of Y defined on the sample space B = {y : 0 < y < 1}, and

G(y) = Pr(Y ≤ y) = Pr(X^2 ≤ y) = Pr(−√y ≤ X ≤ √y) = Pr(X ≤ √y) − Pr(X ≤ −√y)
= F(√y) − F(−√y) = (1/2)(√y + 1) − (1/2)(−√y + 1) = √y.

∴ G(y) = 0 for y ≤ 0;  = √y for 0 < y < 1;  = 1 for y ≥ 1.

Then, the pdf is g(y) = G′(y) = 1/(2√y), 0 < y < 1.

Case 2. The pdf of the r.v X is f(x) = 1/2, −1 < x < 1. Let G(y) and g(y) be the cdf and pdf of Y = X^2 with sample space
B = {y : 0 < y < 1}. Then:

G(y) = Pr(Y ≤ y) = Pr(X^2 ≤ y) = Pr(−√y ≤ X ≤ √y) = ∫_{−√y}^{√y} f(x) dx = ∫_{−√y}^{√y} (1/2) dx = √y.

∴ G(y) = 0 for y ≤ 0;  = √y for 0 < y < 1;  = 1 for y ≥ 1.

EXAMPLE 2.25

The function f(x) = x/6, x = 1, 2, 3, is a pdf of a r.v X that is defined on the sample space A = {1, 2, 3}. Find the cdf of
the r.v Y = 2X + 1.
Solution. The cdf of the r.v X is:

F(x) = Pr(X ≤ x) = Σ_{τ=1}^{x} f(τ) = (1/6) Σ_{τ=1}^{x} τ = (1/6)(1 + 2 + · · · + x)

∴ F(x) = 0 for x < 1;  = (1/12) x(x + 1) for 1 ≤ x < 3;  = 1 for x ≥ 3.

Let G(y) be the cdf of the r.v Y = 2X + 1 defined on the sample space B = {3, 5, 7}; then:

G(y) = Pr(Y ≤ y) = Pr(2X + 1 ≤ y) = Pr(X ≤ (y − 1)/2) = F((y − 1)/2)
= (1/12) · ((y − 1)/2) · ((y − 1)/2 + 1)

∴ G(y) = 0 for y < 3;  = (1/48)(y^2 − 1) for 3 ≤ y < 7;  = 1 for y ≥ 7.

EXAMPLE 2.26

Find the distribution of the r.v Y = − ln X, where X is defined on sample space A = {x : 0 < x < 1} with pdf
f (x) = 1, x ∈ A.
Solution. In order to evaluate the cdf of the r.v X:

F(x) = Pr(X ≤ x) = ∫_0^x f(τ) dτ = ∫_0^x dτ = x.


0,
 x≤0
∴ F (x) = x, 0<x<1

1, x≥1

Assume G(y) and g(y) represent the cdf and pdf of r.v Y = − ln X defined on sample space B = {y : 0 < y < ∞},
G(y) = Pr(Y ≤ y) = Pr(− ln X ≤ y) = Pr(ln X ≥ y) = Pr(X ≥ e−y )
= 1 − Pr(X ≤ e−y ) = 1 − F e−y


0,
 y≤0
−y
∴ G(y) = 1 − e , 0 < y < ∞

1, y=∞

This leads to the pdf of r.v Y , g(y) = G0 (y) = e−y , y ∈ B J
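This transformation (the basis of inverse-transform sampling) is easy to verify by simulation. A minimal Python sketch (sample size, seed and names are ours) compares the empirical cdf of Y = −ln X, with X uniform on (0, 1), against G(y) = 1 − e^{−y}:

```python
import math
import random

random.seed(1)
n = 100_000
# Y = -ln X with X ~ Uniform(0,1); using 1 - random.random() keeps X in (0, 1]
ys = [-math.log(1.0 - random.random()) for _ in range(n)]

for y in (0.5, 1.0, 2.0):
    empirical = sum(v <= y for v in ys) / n
    print(y, round(empirical, 3), round(1 - math.exp(-y), 3))  # the two columns agree
```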

2.4 Mathematical Expectation

We have observed that the probability distribution for a random variable is a theoretical model for the empirical distri-
bution of data associated with a real population. If the model is an accurate representation of nature, the theoretical and
empirical distributions are equivalent. Consequently, we attempt to find the mean and the variance for a random variable
and thereby to acquire numerical descriptive measures, parameters, for the probability distribution.
Definition: Let X be a r.v with probability function f (x) and u(X) be a real-valued function of X. Then the expected
value of u(X) is given by:
E[u(X)] = Σ_{x=−∞}^{∞} u(x) f(x) (discrete),   E[u(X)] = ∫_{−∞}^{∞} u(x) f(x) dx (continuous).

EXAMPLE 2.27

Let the random variable X have a pdf f(x) = x/6, x = 1, 2, 3. Find E[X], E[X^2], E[X^3], E[(X − 1)^3].
Solution.
-
E[X] = Σ_x x f(x) = Σ_{x=1}^{3} x^2/6 = (1/6)(1^2 + 2^2 + 3^2) = 14/6 = 7/3.
-
E[X^2] = Σ_x x^2 f(x) = (1/6) Σ_{x=1}^{3} x^3 = (1/6)(1^3 + 2^3 + 3^3) = 36/6 = 6.
-
E[X^3] = Σ_x x^3 f(x) = (1/6) Σ_{x=1}^{3} x^4 = (1/6)(1^4 + 2^4 + 3^4) = 98/6 = 49/3.
-
E[(X − 1)^3] = Σ_x (x − 1)^3 f(x) = (1/6) Σ_{x=1}^{3} x(x − 1)^3 = (1/6)(0 + 2 + 24) = 26/6 = 13/3.
J

EXAMPLE 2.28

Let the random variable X have a pdf f(x) = (1/18)(x + 2), −2 < x < 4. Find E[3X] and E[(X + 2)^3].

Solution.

-
E[3X] = ∫_x 3x f(x) dx = ∫_{−2}^{4} 3x (1/18)(x + 2) dx = (1/6) ∫_{−2}^{4} (x^2 + 2x) dx = (1/6) [x^3/3 + x^2]_{−2}^{4}
= (1/6) [(64/3 + 16) − (−8/3 + 4)] = (1/6)(24 + 12) = 6.

-
E[(X + 2)^3] = ∫_{−2}^{4} (x + 2)^3 (1/18)(x + 2) dx = (1/18) ∫_{−2}^{4} (x + 2)^4 dx = (1/18) [(x + 2)^5/5]_{−2}^{4}
= (1/90)(6^5 − 0) = 86.4.

Notes:

If c is a constant, then E[c] = c.

If c is a constant and u is a function of X, then E[c u(X)] = c E[u(X)].

If c1, c2, . . . , cn are constants and u1, u2, . . . , un are functions, then E[ Σ_{i=1}^{n} ci ui(X) ] = Σ_{i=1}^{n} ci E[ui(X)].

EXAMPLE 2.29

Let the r.v X have a pdf f(x) = 2(1 − x), 0 < x < 1. Find E[X], E[X^2] and E[6X + 3X^2 − 4].

Solution. -
E[X] = ∫_{−∞}^{∞} x f(x) dx = 2 ∫_0^1 x(1 − x) dx = 2 [x^2/2 − x^3/3]_0^1 = 1/3.

-
E[X^2] = ∫_{−∞}^{∞} x^2 f(x) dx = 2 ∫_0^1 x^2 (1 − x) dx = 2 [x^3/3 − x^4/4]_0^1 = 1/6.

-
E[6X + 3X^2 − 4] = 6E[X] + 3E[X^2] − 4 = 6(1/3) + 3(1/6) − 4 = −3/2.
J

2.4.1 Some Special Mathematical Expectations


In this section, we will introduce some special mathematical expectations that are most commonly used in statistical
problems.

1. The Mean (or the expected value) of a r.v X is the mathematical expectation E[X], denoted by µ.

2. If X is a r.v with mean E(X) = µ, the Variance of the r.v X, denoted by σ^2 or Var(X), is defined to be the expected
value of (X − µ)^2. That is,
σ^2 = Var(X) = E[(X − µ)^2].
The standard deviation of X is the positive square root of σ^2.
Properties of Variance

i The variance σ^2 = E[(X − µ)^2] = E[X^2 − 2µX + µ^2] = E[X^2] − 2µ^2 + µ^2 = E[X^2] − µ^2.

ii If c is a constant, then Var(c) = 0.
iii If c is a constant and X is a r.v, then Var(cX) = c^2 Var(X).

3. The mathematical expectation µ′_r = E[X^r] is called the rth moment about the origin.

4. The mathematical expectation µ_r = E[(X − µ)^r] is called the rth moment about the mean.

5. The Moment Generating Function (mgf) of a r.v X is the expectation E[e^{tX}] (if it exists), denoted by M(t).
The reason the function M(t) is called the mgf can be explained by the following argument. We have

e^{tx} = 1 + tx + (tx)^2/2! + (tx)^3/3! + . . . .

Then, the expectation

E[e^{tX}] = Σ_x e^{tx} f(x) = Σ_x [1 + tx + (tx)^2/2! + (tx)^3/3! + . . . ] f(x)
= Σ_x f(x) + t Σ_x x f(x) + (t^2/2!) Σ_x x^2 f(x) + (t^3/3!) Σ_x x^3 f(x) + . . .
= 1 + t µ′_1 + (t^2/2!) µ′_2 + (t^3/3!) µ′_3 + . . . .

This argument involves an interchange of summations, which is justifiable if M(t) exists. Thus, E[e^{tX}] is a function
of all the moments µ′_k about the origin, for k = 1, 2, 3, . . . . In particular, µ′_k is the coefficient of t^k/k! in the series
expansion of M(t).
Notes:

i M(0) = 1.
ii If we can find E[e^{tX}], we can find any of the moments of X. If M(t) exists, then for any positive integer k,

d^k M(t)/dt^k |_{t=0} = M^{(k)}(0) = µ′_k.

In other words, if you find the k-th derivative of M(t) with respect to t and then set t = 0, the result will be µ′_k.

iii If we set ξ(t) = ln M(t), then

ξ′(t) = M′(t)/M(t)  ⇒  ξ′(0) = M′(0)/M(0) = E[X]/1 = µ,

ξ″(t) = [M(t) M″(t) − M′(t) M′(t)] / [M(t)]^2  ⇒  ξ″(0) = [M(0) M″(0) − (M′(0))^2] / [M(0)]^2 = E[X^2] − µ^2 = σ^2.

EXAMPLE 2.30

The probability distribution of a r.v Y is given in the following table. Find the mean, variance and standard deviation
of Y .

Probability distribution of Y
y:     0     1     2     3
p(y):  1/8   1/4   3/8   1/4

Solution. By definition:

µ = E[Y] = Σ_{y=0}^{3} y p(y) = 0(1/8) + 1(1/4) + 2(3/8) + 3(1/4) = 1.75

σ^2 = E[(Y − µ)^2] = Σ_{y=0}^{3} (y − µ)^2 p(y) = (0 − 1.75)^2(1/8) + (1 − 1.75)^2(1/4) + (2 − 1.75)^2(3/8) + (3 − 1.75)^2(1/4) = 0.9375

or

E[Y^2] = Σ_{y=0}^{3} y^2 p(y) = (0)^2(1/8) + (1)^2(1/4) + (2)^2(3/8) + (3)^2(1/4) = 4

∴ σ^2 = E[Y^2] − µ^2 = 4 − (1.75)^2 = 0.9375

and then

σ = +√σ^2 = √0.9375 ≈ 0.97
J
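The same three summaries can be reproduced in a few lines of Python (the variable names are ours):

```python
import math

dist = {0: 1/8, 1: 1/4, 2: 3/8, 3: 1/4}            # the table above

mean = sum(y * p for y, p in dist.items())
var = sum((y - mean) ** 2 * p for y, p in dist.items())
std = math.sqrt(var)

print(mean, var, round(std, 2))                     # 1.75 0.9375 0.97
```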

EXAMPLE 2.31

The manager of an industrial plant is planning to buy a new machine of either type A or type B. If t denotes the
number of hours of daily operation, the number of daily repairs Y1 required to maintain a machine of type A is
a random variable with mean and variance both equal to 0.10t. The number of daily repairs Y2 for a machine
of type B is a random variable with mean and variance both equal to 0.12t. The daily cost of operating A is
CA (t) = 10t + 30Y12 ; for B it is CB (t) = 8t + 30Y22 . Assume that the repairs take negligible time and that each
night the machines are tuned so that they operate essentially like new machines at the start of the next day. Which
machine minimizes the expected daily cost if a workday consists of (a) 10 hours and (b) 20 hours?

Solution. The expected daily cost for A is

E[CA(t)] = E[10t + 30Y1^2] = 10t + 30E[Y1^2]
= 10t + 30{Var(Y1) + (E[Y1])^2} = 10t + 30[0.10t + (0.10t)^2]
= 13t + 0.3t^2.

In this calculation, we used the known values for Var(Y1) and E(Y1) and the fact that Var(Y1) = E(Y1^2) − [E(Y1)]^2
to obtain E(Y1^2) = Var(Y1) + [E(Y1)]^2 = 0.10t + (0.10t)^2. Similarly,

E[CB(t)] = E[8t + 30Y2^2] = 8t + 30E[Y2^2]
= 8t + 30{Var(Y2) + (E[Y2])^2} = 8t + 30[0.12t + (0.12t)^2]
= 11.6t + 0.432t^2.

Thus, for scenario (a) where t = 10,

E[CA (10)] = 160 and E[CB (10)] = 159.2,

which results in the choice of machine B.


For scenario (b), t = 20 and
E[CA (20)] = 380 and E[CB (20)] = 404.8,

resulting in the choice of machine A.


In conclusion, machines of type B are more economical for short time periods because of their smaller hourly operating
cost. For long time periods, however, machines of type A are more economical because they tend to be repaired less
frequently. J

EXAMPLE 2.32

A retailer for a petroleum product sells a random amount X each day. Suppose that X, measured in thousands of
gallons, has the probability density function f(x) = (3/8)x^2, 0 ≤ x ≤ 2. The retailer’s profit turns out to be $100 for
each 1000 gallons sold if X ≤ 1, and $40 extra per 1000 gallons if X > 1. Find the retailer’s expected profit for any
given day.

Solution. Let p(x) denote the retailer’s daily profit. Then

p(x) = 100x for 0 ≤ x ≤ 1, and p(x) = 140x for 1 < x ≤ 2.

We want to find the expected profit; by definition, the expectation is:

E[p(X)] = ∫_x p(x) f(x) dx = ∫_0^1 100x (3/8)x^2 dx + ∫_1^2 140x (3/8)x^2 dx
= [300/(8·4)] x^4 |_0^1 + [420/(8·4)] x^4 |_1^2 = 206.25.

Thus, the retailer can expect a profit of $206.25 on the daily sale of this particular product. J

EXAMPLE 2.33

Find the mean and variance, if they exist, for each of the following distributions, with pdf’s:

1. f(x) = x/15, x = 1, 2, 3, 4, 5.
2. f(x) = (1/2)(x + 1), −1 < x < 1.
3. f(x) = x^{−2}, 1 < x < ∞.
4. f(x) = e^{−x}, 0 < x < ∞.
5. f(x) = (3/8)x^2, 0 ≤ x ≤ 2.

Solution.

1.
µ = E[X] = Σ_x x f(x) = Σ_{x=1}^{5} x^2/15 = (1/15)(1^2 + 2^2 + 3^2 + 4^2 + 5^2) = 55/15 = 11/3.
E[X^2] = Σ_x x^2 f(x) = Σ_{x=1}^{5} x^3/15 = (1/15)(1^3 + 2^3 + 3^3 + 4^3 + 5^3) = 225/15 = 15.
σ^2 = Var(X) = E[X^2] − µ^2 = 15 − 121/9 = 14/9.

2.
µ = E[X] = ∫_x x f(x) dx = ∫_{−1}^{1} x (1/2)(x + 1) dx = (1/2)[x^3/3 + x^2/2]_{−1}^{1} = (1/2)[(1/3 + 1/2) − (−1/3 + 1/2)] = 1/3.
E[X^2] = ∫_x x^2 f(x) dx = ∫_{−1}^{1} x^2 (1/2)(x + 1) dx = (1/2)[x^4/4 + x^3/3]_{−1}^{1} = 1/3.
σ^2 = Var(X) = E[X^2] − µ^2 = 1/3 − 1/9 = 2/9.

3.
µ = E[X] = ∫_x x f(x) dx = ∫_1^∞ x · x^{−2} dx = ∫_1^∞ (1/x) dx = ln x |_1^∞ = ∞ − 0 = ∞.
Therefore, the mean µ does not exist, hence the variance σ^2 does not exist either.

4.
µ = E[X] = ∫_x x f(x) dx = ∫_0^∞ x e^{−x} dx = [−x e^{−x} − e^{−x}]_0^∞ = (0 − 0) − (0 − 1) = 1.
E[X^2] = ∫_x x^2 f(x) dx = ∫_0^∞ x^2 e^{−x} dx = [−x^2 e^{−x} − 2x e^{−x} − 2e^{−x}]_0^∞ = 2.
σ^2 = E[X^2] − µ^2 = 2 − (1)^2 = 1.

5.
µ = E[X] = ∫_{−∞}^{∞} x f(x) dx = ∫_0^2 x (3/8)x^2 dx = (3/8)[x^4/4]_0^2 = 1.5.

E[X^2] = ∫_{−∞}^{∞} x^2 f(x) dx = ∫_0^2 x^2 (3/8)x^2 dx = (3/8)[x^5/5]_0^2 = 2.4.

σ^2 = Var(X) = E[X^2] − (E[X])^2 = 2.4 − (1.5)^2 = 0.15.

EXAMPLE 2.34

Let the r.v X have a pdf f(x) = (1/2)^x, x = 1, 2, 3, . . . ; then:
1. Find the mgf of X.
2. Evaluate the mean and variance of X using the mgf.

Solution.

1. The mgf M(t) is the expectation of the function e^{tX}, so

M(t) = E[e^{tX}] = Σ_x e^{tx} f(x) = Σ_{x=1}^{∞} e^{tx} (1/2)^x = Σ_{x=1}^{∞} (e^t/2)^x = e^t/2 + (e^t/2)^2 + (e^t/2)^3 + · · · = (e^t/2) / (1 − e^t/2)

∴ M(t) = e^t / (2 − e^t),  t < ln 2.

2. Check M(0) = e^0/(2 − e^0) = 1/(2 − 1) = 1. Now

M′(t) = [(2 − e^t)e^t − e^t(−e^t)] / (2 − e^t)^2 = (2e^t − e^{2t} + e^{2t}) / (2 − e^t)^2 = 2e^t / (2 − e^t)^2

M′(0) = 2e^0 / (2 − e^0)^2 = 2/(2 − 1)^2 = 2 = µ,

and

M″(t) = [(2 − e^t)^2 · 2e^t − 2e^t · 2(2 − e^t)(−e^t)] / (2 − e^t)^4 = [(2 − e^t) 2e^t + 4e^{2t}] / (2 − e^t)^3

M″(0) = [(2 − 1)·2 + 4] / (2 − 1)^3 = 6 = E[X^2],

then
σ^2 = E[X^2] − µ^2 = 6 − 4 = 2.

Alternatively, consider the function ξ(t) = ln M(t) = t − ln(2 − e^t); then

ξ′(t) = 1 + e^t/(2 − e^t)  ⇒  ξ′(0) = 1 + 1/(2 − 1) = 2 = µ,

and

ξ″(t) = [(2 − e^t)e^t − e^t(−e^t)] / (2 − e^t)^2 = 2e^t / (2 − e^t)^2  ⇒  ξ″(0) = 2/(2 − 1)^2 = 2 = σ^2.
J
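The derivatives of this mgf can also be checked symbolically. A short sketch using sympy (assuming that library is available; the symbol names are ours):

```python
import sympy as sp

t = sp.symbols("t")
M = sp.exp(t) / (2 - sp.exp(t))          # mgf of f(x) = (1/2)^x, x = 1, 2, ...

m1 = sp.diff(M, t).subs(t, 0)            # M'(0)  = E[X]
m2 = sp.diff(M, t, 2).subs(t, 0)         # M''(0) = E[X^2]

print(m1, m2, m2 - m1**2)                # 2 6 2  -> mean 2, variance 2
```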

EXAMPLE 2.35

Find the mgf of a r.v X that has a pdf f (x) = xe−x , 0 < x < ∞, then evaluate the mean and variance of X.

Solution. The mgf of the r.v X is the expectation of e^{tX}:

M(t) = E[e^{tX}] = ∫_x e^{tx} f(x) dx = ∫_0^∞ e^{tx} x e^{−x} dx = ∫_0^∞ x e^{−(1−t)x} dx = [ −x e^{−(1−t)x}/(1 − t) − e^{−(1−t)x}/(1 − t)^2 ]_0^∞

Therefore,
M(t) = 1/(1 − t)^2,  t < 1.

In order to evaluate µ and σ^2, we have the mgf M(t) = (1 − t)^{−2}:

M′(t) = 2(1 − t)^{−3}  ⇒  M′(0) = 2 = µ,

M″(t) = 6(1 − t)^{−4}  ⇒  M″(0) = 6 = E[X^2],

∴ σ^2 = E[X^2] − µ^2 = 6 − 4 = 2.

Alternatively, consider the function ξ(t) = ln M(t) = −2 ln(1 − t); then

ξ′(t) = 2/(1 − t)  ⇒  ξ′(0) = 2 = µ,

ξ″(t) = 2(1 − t)^{−2}  ⇒  ξ″(0) = 2 = σ^2.


J

EXAMPLE 2.36

A manufacturing company ships its product in two different sizes of truck trailers. Each shipment is made in a trailer
with dimensions 8 feet × 10 feet × 30 feet or 8 feet × 10 feet × 40 feet. If 30% of its shipments are made by using
30-foot trailers and 70% by using 40-foot trailers, find the mean volume shipped per trailer load. (Assume that the
trailers are always full.)

Solution. Assume that the volume of the 30-foot trailers is v1 and that of the 40-foot trailers is v2; then:

v1 = 8 × 10 × 30 = 2400 cubic feet,
v2 = 8 × 10 × 40 = 3200 cubic feet.

Since the probabilities of shipping with v1 and v2 are

p(v1) = 30% = 3/10,  p(v2) = 70% = 7/10,

the expected shipping volume is:

E[V] = Σ_{i=1}^{2} vi p(vi) = 2400 × (3/10) + 3200 × (7/10) = 2960 cubic feet.

J

EXAMPLE 2.37

In a gambling game a person draws a single card from an ordinary 52-card playing deck. A person is paid $15 for
drawing a jack or a queen and $5 for drawing a king or an ace. A person who draws any other card pays $4. If a
person plays this game, what is the expected gain?

Solution. Let the r.v X represent the outcome of the draw. Then, the player’s gain can be represented as:

g = 15 if x = J, Q;  g = 5 if x = K, A;  g = −4 if x = 2, 3, 4, 5, 6, 7, 8, 9, 10.

Since there are 52/4 = 13 card values and each value occurs on 4 of the 52 cards, the probability of drawing any particular value is 1/13, i.e.:

Probability distribution of X
x:     2     3    ...   10    J     Q     K     A
p(x):  1/13  1/13 ...   1/13  1/13  1/13  1/13  1/13

Then, the expected gain of the player is calculated by:

E[G] = Σ g p(x) = 9(−4)(1/13) + 2(5)(1/13) + 2(15)(1/13) = −36/13 + 10/13 + 30/13 = 4/13 ≈ 0.31.

EXAMPLE 2.38

A builder of houses needs to order some supplies that have a waiting time Y for delivery, with a continuous uniform
distribution over the interval from 1 to 4 days (p(y) = 1/3, 1 ≤ y ≤ 4). Because he can get by without them for 2
days, the cost of the delay is fixed at $100 for any waiting time up to 2 days. After 2 days, however, the cost of the
delay is $100 + $20 per day (prorated) for each additional day. That is, if the waiting time is 3.5 days, the cost of
the delay is $100 + $20(1.5) = $130. Find the expected value of the builder’s cost due to waiting for supplies.

Solution. Assume that the cost of waiting for the supplies is Wc, and Y is the r.v that represents the number of waiting
days; then:

Wc = 100 for 1 ≤ y ≤ 2, and Wc = 100 + 20(y − 2) for 2 < y ≤ 4.

Therefore, the expected value of the builder’s cost due to waiting for supplies is

E[Wc] = ∫ Wc p(y) dy = ∫_1^2 100 (1/3) dy + ∫_2^4 [100 + 20(y − 2)] (1/3) dy
= (100/3) y |_1^2 + (100/3) y |_2^4 + (20/3) [y^2/2 − 2y]_2^4 = 100/3 + 200/3 + 40/3 = 340/3 ≈ 113.33.

J

2.4.2 Tchebyshev’s Inequality


In order to find upper and lower bounds for certain probabilities, we will need to prove some theorems. These bounds
are not necessarily close to the exact probability.
Theorem: Let u(X) be a non-negative function of a r.v X with pdf f(x), −∞ < x < ∞. If E[u(X)] exists, then for
every positive constant c,

Pr[u(X) ≥ c] ≤ E[u(X)] / c.

Theorem (Tchebyshev Inequality): Let X be a r.v with mean µ and finite variance σ^2. Then, for any constant k > 0,

Pr(|X − µ| < kσ) ≥ 1 − 1/k^2,  or  Pr(|X − µ| ≥ kσ) ≤ 1/k^2.

Two important aspects of this result should be pointed out. First, the result applies for any probability distribution.
Second, the results of the theorem are very conservative in the sense that the actual probability that X is in the interval
µ ± kσ usually exceeds the lower bound for the probability, 1 − 1/k^2, by a considerable amount.
Proof: Apply the previous theorem with u(X) = (X − µ)^2 and c = k^2σ^2; then

Pr[(X − µ)^2 ≥ k^2σ^2] ≤ E[(X − µ)^2] / (k^2σ^2) = σ^2 / (k^2σ^2) = 1/k^2.

Since (x − µ)^2 ≥ k^2σ^2 ⇔ |x − µ| ≥ kσ, it follows that

Pr(|X − µ| ≥ kσ) ≤ 1/k^2.

EXAMPLE 2.39

Let the r.v X have a pdf f(x) = 1/(2√3), −√3 < x < √3. Find the exact values of Pr(|X − µ| ≥ (3/2)σ) and
Pr(|X − µ| ≥ 2σ), and then compare those results with their upper bounds.
Solution. First of all, we need to find the mean µ and the variance σ^2. Then

µ = E[X] = ∫_x x f(x) dx = (1/(2√3)) ∫_{−√3}^{√3} x dx = 0.

E[X^2] = ∫_x x^2 f(x) dx = (1/(2√3)) ∫_{−√3}^{√3} x^2 dx = 1.

σ^2 = E[X^2] − µ^2 = 1  ⇒  σ = 1.

The exact value of the probability:

Pr(|X − µ| ≥ (3/2)σ) = Pr(|X| ≥ 3/2) = 1 − Pr(|X| < 3/2) = 1 − Pr(−3/2 < X < 3/2)
= 1 − ∫_{−3/2}^{3/2} f(x) dx = 1 − (1/(2√3)) ∫_{−3/2}^{3/2} dx = 1 − √3/2 = 0.134.

To compare with the probability upper bound, we use the Tchebyshev inequality to find this upper bound for the probability
Pr(|X − µ| ≥ (3/2)σ):

Pr(|X| ≥ 3/2) ≤ (2/3)^2 = 4/9 = 0.44.

It is clear that the exact probability (0.134) is less than the upper bound (0.44).
For the next part, we do the same. The exact value of the probability:

Pr(|X − µ| ≥ 2σ) = Pr(|X| ≥ 2) = 1 − Pr(|X| < 2) = 1 − Pr(−2 < X < 2)
= 1 − ∫_{−2}^{2} f(x) dx = 1 − [∫_{−2}^{−√3} f(x) dx + ∫_{−√3}^{√3} f(x) dx + ∫_{√3}^{2} f(x) dx] = 1 − (0 + 1 + 0) = 0.

To compare with the probability upper bound, we use the Tchebyshev inequality to find this upper bound for the probability
Pr(|X − µ| ≥ 2σ):

Pr(|X| ≥ 2) ≤ 1/2^2 = 0.25.
It is clear that the exact probability (0) is less than the upper bound (0.25). J
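The gap between the exact probability and the Tchebyshev bound is easy to see by simulation. A minimal Python sketch for the uniform density on (−√3, √3) used above (names, seed and sample size are ours):

```python
import math
import random

random.seed(2)
n = 200_000
a = math.sqrt(3)
xs = [random.uniform(-a, a) for _ in range(n)]      # mean 0, variance 1

for k in (1.5, 2.0):
    exact = sum(abs(x) >= k for x in xs) / n        # empirical Pr(|X - mu| >= k*sigma)
    bound = 1 / k**2                                # Tchebyshev upper bound
    print(k, round(exact, 3), round(bound, 3))      # roughly: 1.5 0.134 0.444 and 2.0 0.0 0.25
```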

Note: We may have the mean µ and variance σ 2 for a distribution whose pdf is not available for some reason. In this
case, to find a certain probability, we use Tchebyshev inequality to find the upper or lower bound for this probability.

EXAMPLE 2.40

Let the r.v X have mean µ = 3 and variance σ^2 = 4. Use the Tchebyshev inequality to determine a lower bound for
Pr(−2 < X < 8).
Solution. To use the Tchebyshev inequality, we need to get to the form Pr(|X − µ| < kσ) ≥ 1 − 1/k^2. Then

Pr(−2 < X < 8) = Pr(−2 − 3 < X − µ < 8 − 3) = Pr(−5 < X − µ < 5) = Pr(|X − µ| < 5)
= Pr(|X − µ| < (5/2)σ) ≥ 1 − (2/5)^2 = 1 − 4/25 = 21/25 = 0.84.
J

EXAMPLE 2.41

The number of customers per day at a sales counter, Y , has been observed for a long period of time and found to
have mean 20 and standard deviation 2. The probability distribution of Y is not known. What can be said about the
probability that, tomorrow, Y will be greater than 16 but less than 24?
Solution. We want to find Pr(16 < Y < 24). From the Tchebyshev inequality we know, for any k ≥ 0, Pr(|Y − µ| <
kσ) ≥ 1 − 1/k^2, or

Pr((µ − kσ) < Y < (µ + kσ)) ≥ 1 − 1/k^2.

Because µ = 20 and σ = 2, it follows that µ − kσ = 16 and µ + kσ = 24 if k = 2. Thus

Pr(16 < Y < 24) = Pr(µ − 2σ < Y < µ + 2σ) ≥ 1 − 1/2^2 = 3/4.

In other words, tomorrow's customer total will be between 16 and 24 with a fairly high probability (at least 3/4).
Notice that if σ were 1, k would be 4, and

Pr(16 < Y < 24) = Pr(µ − 4σ < Y < µ + 4σ) ≥ 1 − 1/4^2 = 15/16.
Thus, the value of σ has considerable effect on probabilities associated with intervals. J
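The two bounds above come from the same one-line computation. A tiny Python helper (the function name is ours, and it assumes the interval is symmetric about the mean) makes the dependence on σ explicit:

```python
def chebyshev_lower_bound(a, b, mean, std):
    """Lower bound on Pr(a < Y < b) for an interval symmetric about `mean`."""
    k = (b - mean) / std          # assumes a and b are equidistant from the mean
    return 1 - 1 / k**2

print(chebyshev_lower_bound(16, 24, mean=20, std=2))   # 0.75
print(chebyshev_lower_bound(16, 24, mean=20, std=1))   # 0.9375
```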

EXAMPLE 2.42

Suppose that experience has shown that the length of time T (in minutes) required to conduct a periodic maintenance
check on a dictating machine follows a gamma distribution with mean µ = 6.2 and variance σ 2 = 12.4. A new
maintenance worker takes 22.5 minutes to check the machine. Does this length of time to perform a maintenance
check disagree with prior experience?

Solution. We know that µ = 6.2 and σ^2 = 12.4 ⇒ σ = √12.4 ≈ 3.52. We need to evaluate Pr(T ≥ 22.5) =
Pr(T − µ ≥ 22.5 − µ). Notice that t = 22.5 minutes exceeds the mean µ = 6.2 minutes by 16.3 minutes, or
k = 16.3/3.52 = 4.63 standard deviations. Then from Tchebyshev's theorem,

Pr(|T − 6.2| ≥ 16.3) = Pr(|T − µ| ≥ 4.63σ) ≤ 1/(4.63)^2 = 0.0466.

This probability is based on the assumption that the distribution of maintenance times has not changed from prior ex-
perience. Then, observing that Pr(T ≥ 22.5) is small, we must conclude either that our new maintenance worker has
generated by chance a lengthy maintenance time that occurs with low probability or that the new worker is somewhat
slower than preceding ones. Considering the low probability for Pr(T ≥ 22.5), we favour the latter view. J
