
Stochastic and Random Processes

(EC636)

by

Dr. Mohamed Elalem

Department of Computer Engineering

University of Tripoli

Notes on Probabilities and Stochastic Processes

Graduate Program of Computer Engineering

http://melalem.com/EC636.php

© M. Elalem
Lecture 1

Experiments, Models, and Probabilities

1.1 Introduction

• The real world exhibits randomness

– Today’s temperature

– Flip a coin, head or tail (H,T)?

– Walk to a bus station, how long do you wait for the arrival of a bus?

– Transmit a waveform through a channel, which one arrives at the receiver?

– Which one does the receiver identify as the transmitted signal?

• We create models to analyze, since real experiments are generally too complicated. For example, the waiting time depends on the following factors:

– The time of a day (is it rush hour?);

– The speed of each car that passed by while you waited;

– The weight, horsepower, and gear ratio of the bus;

– The psychological profile and work schedule of drivers;

– The status of all road construction within 100 miles of the bus stop.

• It should be apparent that it would be too difficult to analyze the effects of all these factors on the likelihood that you will wait less than 5 minutes for a bus. Therefore, it is necessary to create a model that captures the critical part of the actual physical experiment.

• Probability theory deals with the study of random phenomena, which under re-

peated experiments yield different outcomes that have certain underlying patterns

about them.

1.1.1 Review of Set Operation

• Sample space Ω: the set of all outcomes

• Set constructions for events E ⊂ Ω and F ⊂ Ω

– Union: E ∪ F = {s ∈ Ω : s ∈ E OR s ∈ F };

– Intersection: E ∩ F = {s ∈ Ω : s ∈ E AND s ∈ F };

– Complement: E^c = Ē = {s ∈ Ω : s ∉ E};

– Empty set: Φ = Ω^c = {}.

• Only the complement operation requires knowledge of the whole space Ω.

1.1.2 Several Definitions

• Disjoint: if A ∩ B = Φ, the empty set, then A and B are said to be mutually exclusive

(M.E), or disjoint.

• Exhaustive: the collection of events satisfies

A1 ∪ A2 ∪ · · · = Ω

• A partition of Ω is a collection of mutually exclusive subsets of Ω such that their

union is Ω (Partition is a stronger condition than Exhaustive.):

Ai ∩ Aj = φ for i ≠ j,  and  ∪_{i=1}^{n} Ai = Ω

1.1.3 De-Morgan’s Law

(A ∪ B)^c = A^c ∩ B^c,    (A ∩ B)^c = A^c ∪ B^c    (1.1)
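As a quick illustration, the identities above can be checked numerically. The following short Python sketch uses arbitrary example sets E and F inside a small finite Ω and verifies the set constructions and De Morgan's laws (Eq. 1.1):

```python
# Minimal sketch: set operations and De Morgan's laws on a small finite Omega.
# The sets Omega, E and F are arbitrary illustrative choices.
Omega = set(range(1, 11))      # Omega = {1, ..., 10}
E = {1, 2, 3, 4}               # example event E, a subset of Omega
F = {3, 4, 5, 6}               # example event F, a subset of Omega

union = E | F                  # E OR F
intersection = E & F           # E AND F
E_comp = Omega - E             # complement; note it needs knowledge of Omega
F_comp = Omega - F

# De Morgan's laws (Eq. 1.1)
assert Omega - (E | F) == E_comp & F_comp
assert Omega - (E & F) == E_comp | F_comp
print(union, intersection, E_comp)
```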

1.1.4 Sample Space, Events and Probabilities

• Outcome: an outcome of an experiment is any possible observations of that experi-

ment.

• Sample space: the sample space of an experiment is the set of all possible outcomes

of that experiment.

• Event: is a set of outcomes of an experiment.

• Event Space: is a collectively exhaustive, mutually exclusive set of events.

Sample Space and Event Space

– Sample space: contains all the details of an experiment. It is a set of all

outcomes, each outcome s ∈ S. Some examples:

∗ coin toss: S = {H, T }

∗ roll pair of dice: S = {(1, 1), · · · , (6, 6)}

∗ component life time: S = {t ∈ [0, ∞)} e.g., lifespan of a light bulb

∗ noise: S = {n(t); t: real}

– Event Space: is a set of events.

Example 1

coin toss 4 times:

The sample space consists of 16 four-letter words, with each letter either h (head)

or t (tail).

Let Bi = outcomes with i heads for i = 0, 1, 2, 3, 4. Each Bi is an event containing

one or more outcomes, say, B1 = {ttth, ttht, thtt, httt} contains four outcomes.

The set B = {B0, B1, B2, B3, B4} is an event space. It is not a sample space.

Example 2

Toss two dice; there are 36 elements in the sample space. If we define the events as the sums of the two dice,

Ω = {B2 , B3 , · · · , B12 }

there are 11 elements.

– Practical example, binary data transmit through a noisy channel, we are more

interested in the event space.

1.1.5 Probability Defined on Events

Often it is meaningful to talk about at least some of the subsets of S as events, for which we must have a mechanism to compute their probabilities.

Example 3

Consider the experiment where two coins are simultaneously tossed. The sample space is

S = {γ1 , γ2 , γ3, γ4 } where

γ1 = {H, H} γ2 = {H, T } γ3 = {T, H} γ4 = {T, T }

If we define

A = {γ1 , γ2 , γ3 }

The set A is the same as the event "at least one head has occurred" and qualifies as an event.

Probability measure: each event has a probability, P (E)

1.1.6 Definitions, Axioms and Theorems

• Definitions: establish the logic of probability theory

• Axioms: are facts that we have to accept without proof.

• Theorems are consequences that follow logically from definitions and axioms. Each

theorem has a proof that refers to definitions, axioms, and other theorems.

• There are only three axioms.

For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions, which act as the axioms of probability.

1- Probability is a nonnegative number

P (A) ≥ 0 (1.2)

2- Probability of the whole set is unity

P (Ω) = 1 (1.3)

3- For any countable collection A1 , A2 , · · · of mutually exclusive events

P (A1 ∪ A2 ∪ · · · ) = P (A1 ) + P (A2 ) + · · · (1.4)

(Note that (3) states that if A and B are mutually exclusive (M.E.) events, the

probability of their union is the sum of their probabilities.)

We will build our entire probability theory on these axioms.

1.1.7 Some Results Derived from the Axioms

The following conclusions follow from these axioms:

• Since A ∪ Ā = Ω, using (2), we have

P(A ∪ Ā) = P(Ω) = 1

But A ∩ Ā = φ, so using (3),

P(A ∪ Ā) = P(A) + P(Ā) = 1,  or  P(Ā) = 1 − P(A)

• Similarly, for any A, A ∩ {φ} = {φ}; hence it follows that P(A ∪ {φ}) = P(A) + P(φ).

But A ∪ {φ} = A, and thus

P{φ} = 0

• Suppose A and B are not mutually exclusive (M.E.). How does one compute P(A ∪ B)?

To compute this probability, we should re-express A ∪ B in terms of M.E. sets so that we can make use of the probability axioms. From the figure below,

A ∪ B = A ∪ ĀB

where A and ĀB are clearly M.E. events. Thus, using axiom (3),

P(A ∪ B) = P(A ∪ ĀB) = P(A) + P(ĀB)

To compute P(ĀB), we can express B as

B = B ∩ Ω = B ∩ (A ∪ Ā) = (B ∩ A) ∪ (B ∩ Ā) = BA ∪ BĀ

Thus

P(B) = P(BA) + P(BĀ)

Since BA = AB and BĀ = ĀB are M.E. events, we have

P(ĀB) = P(B) − P(AB)

Therefore

P(A ∪ B) = P(A) + P(B) − P(AB)

• Coin toss revisited:

γ1 = {H, H},  γ2 = {H, T},  γ3 = {T, H},  γ4 = {T, T}

Let A = {γ1, γ2}: the event that the first coin shows heads.

Let B = {γ1, γ3}: the event that the second coin shows heads.

P(A ∪ B) = P(A) + P(B) − P(AB) = 1/2 + 1/2 − 1/4 = 3/4

where A ∪ B denotes the event that at least one head appeared.
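As a small numerical check, the sketch below enumerates the four equally likely outcomes of the two-coin experiment with exact fractions and confirms P(A ∪ B) = P(A) + P(B) − P(AB) = 3/4:

```python
from fractions import Fraction

# Sample space of the two-coin toss; each outcome has probability 1/4.
outcomes = [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
prob = {w: Fraction(1, 4) for w in outcomes}

A = {w for w in outcomes if w[0] == 'H'}   # first coin shows heads
B = {w for w in outcomes if w[1] == 'H'}   # second coin shows heads

def P(event):
    return sum(prob[w] for w in event)

assert P(A | B) == P(A) + P(B) - P(A & B) == Fraction(3, 4)
print(P(A | B))   # 3/4
```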

1.1.8 Theorem

For an event space B = {B1, B2, · · · } and any event A, let Ci = A ∩ Bi. For i ≠ j, the events Ci and Cj are mutually exclusive, and

A = C1 ∪ C2 ∪ · · ·    so that    P(A) = Σ_i P(Ci)

Example 4

Toss a coin four times, and let A equal the set of outcomes with fewer than three heads:
A = {tttt, httt, thtt, ttht, ttth, hhtt, htht, htth, tthh, thth, thht}. Let {B0, B1, B2, B3, B4} denote the event space in which Bi = {outcomes with i heads}. Let Ci = A ∩ Bi (i = 0, 1, 2, 3, 4); the above theorem states that

A = C0 ∪ C1 ∪ C2 ∪ C3 ∪ C4

= (A ∩ B0 ) ∪ (A ∩ B1 ) ∪ (A ∩ B2 ) ∪ (A ∩ B3 ) ∪ (A ∩ B4 )

In this example, Bi ⊂ A, for i = 0, 1, 2. Therefore, A ∩ Bi = Bi for i = 0, 1, 2. Also

for i = 3, 4, A ∩ Bi = φ, so that A = B0 ∪ B1 ∪ B2 , a union of disjoint sets. In words,

this example states that the event "fewer than three heads" is the union of the events "zero heads", "one head", and "two heads".

Example 5

        V      F      D
L     0.30   0.15   0.12
B     0.20   0.15   0.08

A company has a model of telephone usage. It classifies all calls as L (long) or B (brief). It also observes whether calls carry voice (V), fax (F), or data (D). The sample space has six outcomes S = {LV, BV, LD, BD, LF, BF}. The probabilities can be represented in the table above. Note that {V, F, D} is an event space corresponding to {B1, B2, B3} in the previous theorem (and L plays the role of the event A). Thus, we can apply the theorem to find

P(L) = P(LV) + P(LD) + P(LF) = 0.57
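The same computation can be written as a short sketch (using the probabilities from the table above):

```python
# Outcome probabilities taken from the table above.
prob = {'LV': 0.30, 'LF': 0.15, 'LD': 0.12,
        'BV': 0.20, 'BF': 0.15, 'BD': 0.08}

# Event A = L (long call); the event space {V, F, D} plays the role of {B1, B2, B3}:
# P(L) = P(LV) + P(LF) + P(LD), by the theorem above.
P_L = sum(p for outcome, p in prob.items() if outcome.startswith('L'))
print(round(P_L, 2))   # 0.57
```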

1.1.9 Conditional Probability and Independence

In N independent trials, suppose NA , NB , NAB denote the number of times events A,

B and AB occur respectively. According to the frequency interpretation of probability,

for large N
P(A) = NA/N,   P(B) = NB/N,   P(AB) = NAB/N

Among the NA occurrences of A, only NAB of them are also found among the NB

occurrences of B. Thus the ratio

NAB/NB = (NAB/N)/(NB/N) = P(AB)/P(B)

is a measure of the event A given that B has already occurred. We denote this condi-

tional probability by P (A|B) = Probability of the event A given that B has occurred.

We define

P(A|B) = P(AB)/P(B)    (1.5)

provided P(B) ≠ 0. As we show below, the above definition satisfies all probability

axioms discussed earlier. We have

1. Non-negative:

P(A|B) = P(AB)/P(B) ≥ 0,  since P(AB) ≥ 0 and P(B) > 0

2. P(Ω|B) = P(ΩB)/P(B) = P(B)/P(B) = 1,  since ΩB = B

3. Suppose A ∩ C = φ; then

P(A ∪ C|B) = P((A ∪ C) ∩ B)/P(B) = P(AB ∪ CB)/P(B)

But AB ∩ CB = φ, hence P(AB ∪ CB) = P(AB) + P(CB), and

P(A ∪ C|B) = P(AB)/P(B) + P(CB)/P(B) = P(A|B) + P(C|B)

satisfying all probability axioms. Thus P(A|B) defines a legitimate probability measure.

1.1.10 Properties of Conditional Probability

1. If B ⊂ A, then AB = B, and

P(A|B) = P(AB)/P(B) = P(B)/P(B) = 1

since if B ⊂ A, then occurrence of B implies automatic occurrence of the event A. As an example, let A = {outcome is even}, B = {outcome is 2} in a dice tossing experiment. Then B ⊂ A and P(A|B) = 1.

2. If A ⊂ B, then AB = A, and

P(A|B) = P(AB)/P(B) = P(A)/P(B) > P(A)    (provided P(B) < 1)

In a dice experiment, A = {outcome is 2}, B = {outcome is even}, so that A ⊂ B. The statement that B has occurred (the outcome is even) makes the probability of "outcome is 2" greater than it is without that information.

3. We can use conditional probability to express the probability of a complicated event in terms of simpler related events: the Law of Total Probability. Let A1, A2, · · · , An be pairwise disjoint with union Ω. Thus Ai ∩ Aj = φ for i ≠ j, and

∪_{i=1}^{n} Ai = Ω

thus

B = BΩ = B(A1 ∪ A2 ∪ · · · ∪ An) = BA1 ∪ BA2 ∪ · · · ∪ BAn

But Ai ∩ Aj = φ ⇒ BAi ∩ BAj = φ, so that

P(B) = Σ_{i=1}^{n} P(BAi) = Σ_{i=1}^{n} P(B|Ai)P(Ai)

Next we introduce the notion of “independence” of events.

Independence: A and B are said to be independent events, if

P (AB) = P (A)P (B)

Notice that the above definition is a probabilistic statement, NOT a set-theoretic notion such as mutual exclusivity (independent and disjoint are not synonyms).

1.1.11 More on Independence

– Disjoint events have no common outcomes, and therefore P(AB) = 0. In general, independent does not mean disjoint, except when P(A) = 0 or P(B) = 0.

– Disjoint leads to probability sum, while independence leads to probability multi-

plication.

– Independent events with P(A) > 0 and P(B) > 0 cannot be mutually exclusive, since independence implies P(AB) = P(A)P(B) > 0; thus the event AB cannot be the null set.

– Suppose A and B are independent, then

P(A|B) = P(AB)/P(B) = P(A)P(B)/P(B) = P(A)

Thus if A and B are independent, the event that B has occurred does not shed

any more light into the event A. It makes no difference to A whether B has

occurred or not.

Example 6

A box contains 6 white and 4 black balls. Remove two balls at random without

replacement. What is the probability that the first one is white and the second

one is black?

Let W1 = “first ball removed is white” and B2 = “second ball removed is black”.

We need to find P (W1 ∩ B2 ) =?.

We have W1 ∩ B2 = W1 B2 = B2 W1 . Using the conditional probability rule,

P (W1 B2 ) = P (B2 W1 ) = P (B2 |W1 )P (W1 )

But
P(W1) = 6/(6 + 4) = 6/10 = 3/5

and
P(B2|W1) = 4/(5 + 4) = 4/9
and hence

P(W1 B2) = P(B2|W1)P(W1) = (4/9) · (3/5) = 4/15 ≈ 0.267
Are the events W1 and B2 independent? Our common sense says No. To verify

this we need to compute P (B2). Of course the fate of the second ball very much

depends on that of the first ball. The first ball has two options: W1 = “first ball

is white” or B1 = “first ball is black”. Note that W1 ∩ B1 = φ and W1 ∪ B1 = Ω.

Hence W1 together with B1 form a partition. Thus

P(B2) = P(B2|W1)P(W1) + P(B2|B1)P(B1) = (4/9) · (3/5) + (3/9) · (4/10) = 2/5
and
P(B2)P(W1) = (2/5) · (3/5) = 6/25 ≠ P(B2 W1) = 4/15
As expected, the events W1 and B2 are dependent.
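The exact probabilities of Example 6 can be reproduced with fractions, and the dependence of W1 and B2 verified, as in the following sketch:

```python
from fractions import Fraction as F

# Box with 6 white and 4 black balls; two balls drawn without replacement.
P_W1 = F(6, 10)              # P(first ball is white)
P_B1 = F(4, 10)              # P(first ball is black)
P_B2_given_W1 = F(4, 9)      # 4 black balls remain among the 9 left
P_B2_given_B1 = F(3, 9)      # 3 black balls remain among the 9 left

P_W1B2 = P_B2_given_W1 * P_W1                          # 4/15
P_B2 = P_B2_given_W1 * P_W1 + P_B2_given_B1 * P_B1     # total probability: 2/5

print(P_W1B2, P_B2, P_B2 * P_W1)    # 4/15, 2/5, 6/25
assert P_W1B2 != P_B2 * P_W1        # W1 and B2 are dependent
```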

1.2 Bayes’ Theorem

Since

P(AB) = P(A|B)P(B)

and similarly,

P(B|A) = P(BA)/P(A) = P(AB)/P(A)  ⇒  P(AB) = P(B|A)P(A)

We get

P (A|B)P (B) = P (B|A)P (A)

or

P(A|B) = [P(B|A)/P(B)] · P(A)    (1.6)

The above equation is known as Bayes’ theorem.

Although simple enough, Bayes theorem has an interesting interpretation: P (A)

represents the a-priori probability of the event A. Suppose B has occurred, and

assume that A and B are not independent. How can this new information be

used to update our knowledge about A? Bayes’ rule takes into account the new

information (“B has occurred”) and gives out the a-posteriori probability of A

given B.

We can also view the event B as new knowledge obtained from a fresh experiment.

We know something about A as P (A). The new information is available in terms of

B. The new information should be used to improve our knowledge/understanding

of A. Bayes theorem gives the exact mechanism for incorporating such new in-

formation.

A more general version of Bayes’ theorem involves partition of Ω as

P(Ai|B) = P(B|Ai)P(Ai)/P(B) = P(B|Ai)P(Ai) / [Σ_{j=1}^{n} P(B|Aj)P(Aj)]    (1.7)

In the above equation, Ai , i = [1, n] represent a set of mutually exclusive events with

associated a-priori probabilities P (Ai ), i = [1, n]. With the new information “B

has occurred”, the information about Ai can be updated by the n conditional

probabilities P (B|Aj ), j = [1, n].

Example 7

Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box

(B1 ) has 15 defective bulbs and the second 5. Suppose a box is selected at random

and one bulb is picked out.

(a) What is the probability that it is defective?

Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box

B2 has 195 good and 5 defective bulbs. Let D = “Defective bulb is picked out”.

Then,
P(D|B1) = 15/100 = 0.15,    P(D|B2) = 5/200 = 0.025
Since a box is selected at random, they are equally likely.

P (B1 ) = P (B2 ) = 1/2

Thus B1 and B2 form a partition, and using Law of Total Probability, we obtain

P(D) = P(D|B1)P(B1) + P(D|B2)P(B2) = 0.15 × 1/2 + 0.025 × 1/2 = 0.0875

Thus, there is about 9% probability that a bulb picked at random is defective.

(b) Suppose we test the bulb and it is found to be defective. What is the proba-

bility that it came from box 1? (P (B1 |D) =?)

P(B1|D) = P(D|B1)P(B1)/P(D) = (0.15 × 0.5)/0.0875 = 0.8571    (1.8)

Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on which box we picked? From (1.8), P(B1|D) = 0.8571 > 0.5, so it is indeed more likely at this point that we chose box 1 rather than box 2. (Recall that box 1's defective rate, 0.15, is six times that of box 2.)
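The same numbers follow from a few lines of Python; this sketch applies the law of total probability and then Bayes' theorem exactly as in the solution above:

```python
# Priors on the box choice and conditional probabilities of picking a defective bulb.
P_B1, P_B2 = 0.5, 0.5
P_D_given_B1 = 15 / 100      # 0.15
P_D_given_B2 = 5 / 200       # 0.025

# Law of total probability.
P_D = P_D_given_B1 * P_B1 + P_D_given_B2 * P_B2
# Bayes' theorem: posterior probability that the defective bulb came from box 1.
P_B1_given_D = P_D_given_B1 * P_B1 / P_D

print(P_D)            # 0.0875
print(P_B1_given_D)   # about 0.857
```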

Example 8

Suppose you have two coins, one biased, one fair, but you don’t know which coin is

which. Coin 1 is biased. It comes up heads with probability 3/4, while coin 2 will flip

heads with probability 1/2. Suppose you pick a coin at random and flip it. Let Ci

denote the event that coin i is picked. Let H and T denote the possible outcomes of the

flip. Given that the outcome of the flip is a head, what is P [C1 |H], the probability that

you picked the biased coin? Given that the outcome is a tail, what is the probability

P [C1 |T ] that you picked the biased coin?

Solution: First, we construct the sample tree as shown: To find the conditional

probabilities, we see

P(C1|H) = P(C1H)/P(H) = P(C1H)/[P(C1H) + P(C2H)] = (3/8)/(3/8 + 1/4) = 3/5

Similarly,

P(C1|T) = P(C1T)/P(T) = P(C1T)/[P(C1T) + P(C2T)] = (1/8)/(1/8 + 1/4) = 1/3

As we would expect, we are more likely to have chosen coin 1 when the first flip is

heads but we are more likely to have chosen coin 2 when the first flip is tails.

Lecture 2

Random Variables

2.1 Introduction

Let Ω be the sample space of a probability model, and X a function that maps every ζ ∈ Ω to a unique point x ∈ R, the set of real numbers. Since the outcome ζ is not certain, neither is the value X(ζ) = x. Thus if B is some subset of R, we may want to determine the probability of "X(ζ) ∈ B". To determine this probability, we can look at the set A = X⁻¹(B) ⊂ Ω, which contains all the outcomes ζ that map into B under the function X.

Obviously, if the set A = X −1 (B) is an event, the probability of A is well defined; in this

case we can say

probability of the event ”X(ζ) ∈ B“ = P (X −1(B)) = P (A)

However, X⁻¹(B) may not always be an event for every B, thus creating difficulties. The notion of a random variable (RV) makes sure that the inverse mapping always results in an event, so that we are able to determine the probability for any B ⊂ R.

Random Variable (RV ): A finite single valued function X(·) that maps the set of all

experimental outcomes Ω into the set of real numbers R is said to be a RV , if the set

{ζ|X(ζ) ≤ x} is an event for every x in R.

The random variable X is defined by the function X(ζ) that maps the sample outcome ζ to the corresponding value of the random variable X. That is,

{X = x} = {ζ ∈ Ω | X(ζ) = x}

Since all events have well-defined probabilities, the probability of the event {ζ|X(ζ) ≤ x} must depend on x. Denote

P{ζ|X(ζ) ≤ x} = FX(x) ≥ 0    (2.1)

The role of the subscript X is only to identify the actual RV . FX (x) is said to be the

Cumulative Distribution Function (CDF) associated with the RV X.

2.1.1 Properties of CDF

FX (+∞) = 1, FX (−∞) = 0

FX (+∞) = P {ζ|X(ζ) ≤ +∞} = P (Ω) = 1

FX (−∞) = P {ζ|X(ζ) ≤ −∞} = P (φ) = 0

if x1 < x2 , then FX (x1 ) ≤ FX (x2 )

If x1 < x2, then (−∞, x1] ⊂ (−∞, x2]. Consequently the event {ζ|X(ζ) ≤ x1} ⊂ {ζ|X(ζ) ≤ x2}, since X(ζ) ≤ x1 implies X(ζ) ≤ x2. As a result,

FX(x1) = P(X(ζ) ≤ x1) ≤ P(X(ζ) ≤ x2) = FX(x2)

implying that the probability distribution function is nonnegative and monotone non-

decreasing.

For all b > a,  FX(b) − FX(a) = P(a < X ≤ b)

To prove this theorem, express the event Eab = {a < X ≤ b} as part of a union of disjoint events, starting with the event Eb = {X ≤ b}. Note that Eb can be written as the union

Eb = {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} = Ea ∪ Eab

Note also that Ea and Eab are disjoint, so that P(Eb) = P(Ea) + P(Eab). Since P(Eb) = FX(b) and P(Ea) = FX(a), we can write FX(b) = FX(a) + P(a < X ≤ b), which completes the proof.

2.1.2 Additional Properties of a CDF

• If FX (x0 ) = 0 for some x0 , then FX (x) = 0, x ≤ x0 .

This follows, since FX (x0 ) = P (X(ζ) ≤ x0 ) = 0 implies {X(ζ) ≤ x0 } is the null set,

and for any x ≤ x0 , {X(ζ) ≤ x} will be a subset of the null set.

• P {X(ζ) > x} = 1 − FX (x)

We have {X(ζ) ≤ x} ∪ {X(ζ) > x} = Ω, and since the two events are mutually

exclusive, the above equation follows.

• P {x1 < X(ζ) ≤ x2 } = FX (x2 ) − FX (x1 ), x2 > x1

The events {X(ζ) ≤ x1 } and {x1 < X(ζ) ≤ x2 } are mutually exclusive and their union

represents the event {X(ζ) ≤ x2 }.

• P{X(ζ) = x} = FX(x) − FX(x⁻)

Let x1 = x − ε, ε > 0, and x2 = x. Then

lim_{ε→0} P{x − ε < X(ζ) ≤ x} = FX(x) − lim_{ε→0} FX(x − ε)

or

P{X(ζ) = x} = FX(x) − FX(x⁻)

FX(x0⁺), the limit of FX(x) as x → x0 from the right, always exists and equals FX(x0). However, the left limit FX(x0⁻) need not equal FX(x0). Thus FX(x) need not be continuous from the left. At a discontinuity point of the distribution, the left and right limits are different, and

P{X(ζ) = x0} = FX(x0) − FX(x0⁻)

Thus the only discontinuities of a distribution function are of the jump type. The CDF is continuous from the right. Keep in mind that the CDF always takes on the upper value at every jump in the staircase.

Example 1

X is a RV such that X(ζ) = c, ζ ∈ Ω. Find FX (x).

Solution: For x < c, {X(ζ) ≤ x} = φ, so that FX(x) = 0; and for x ≥ c, {X(ζ) ≤ x} = Ω, so that FX(x) = 1. (See Figure 2.1.)

Example 2

Figure 2.1: CDF for example 1

Toss a coin. Ω = {H, T}. Suppose the RV X is such that X(T) = 0 and X(H) = 1, with P(T) = q and P(H) = 1 − q. Find FX(x).

Solution:

• For x < 0, {X(ζ) ≤ x} = φ, so that FX (x) = 0.

• For 0 ≤ x < 1, {X(ζ) ≤ x} = {T}, so that FX(x) = P(T) = q.

• For x ≥ 1, {X(ζ) ≤ x} = {H, T } = Ω, so that FX (x) = 1. (see Figure 2.2)

Figure 2.2: CDF for example 2

• X is said to be a continuous-type RV if its distribution function FX (x) is continuous.

In that case FX (x− ) = FX (x) for all x, therefore, P {X = x} = 0.

• If FX (x) is constant except for a finite number of jump discontinuities (piecewise con-

stant; step-type), then X is said to be a discrete-type RV . If xi is such a discontinuity

point, then

pi = P{X = xi} = FX(xi) − FX(xi⁻)

For the above two examples, at a point of discontinuity we get

P {X = c} = FX (c) − FX (c− ) = 1 − 0 = 1

and

P {X = 0} = FX (0) − FX (0− ) = q − 0 = q

Example 3

A fair coin is tossed twice, and let the RV X represent the number of heads. Find FX (x).

Solution: In this case Ω = {HH, HT, T H, T T }, and X(HH) = 2, X(HT ) = 1, X(T H) = 1,

X(T T ) = 0

• For x < 0, {X(ζ) ≤ x} = φ, so that FX(x) = 0.

• For 0 ≤ x < 1, {X(ζ) ≤ x} = {TT}, so that FX(x) = P(TT) = P(T)P(T) = 1/4.

• For 1 ≤ x < 2, {X(ζ) ≤ x} = {TT, HT, TH}, so that FX(x) = P(TT, HT, TH) = 3/4.

• For x ≥ 2, {X(ζ) ≤ x} = Ω, so that FX(x) = 1.

(See Figure 2.3.) We can also compute

P{X = 1} = FX(1) − FX(1⁻) = 3/4 − 1/4 = 1/2
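As a sketch, the staircase CDF of Example 3 can be built directly from the pmf of X and evaluated at a few points:

```python
from fractions import Fraction as F

# pmf of X = number of heads in two fair coin tosses.
pmf = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}

def cdf(x):
    """F_X(x) = P(X <= x): add the masses at jump points not exceeding x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

print(cdf(-1), cdf(0), cdf(0.5), cdf(1), cdf(2))   # 0, 1/4, 1/4, 3/4, 1
print(cdf(1) - cdf(1 - 1e-9))                      # jump at x = 1: P(X = 1) = 1/2
```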

2.2 Probability Density Function (pdf )

The first derivative of the distribution function FX (x) is called the probability density func-

tion fX (x) of the RV X. Thus

fX(x) = dFX(x)/dx = lim_{∆x→0} [FX(x + ∆x) − FX(x)]/∆x ≥ 0    (2.2)

Figure 2.3: CDF for example 3

Equation (2.2) shows that fX (x) ≥ 0 for all x.

• Discrete RV: if X is a discrete type RV , then its density function has the general

form
fX(x) = Σ_i pi δ(x − xi)

where xi represent the jump-discontinuity points in FX (x). As Fig. 2.4 shows, fX (x)

represents a collection of positive discrete masses, and it is known as the probability

mass function (pmf) in the discrete case.

Figure 2.4: Discrete pmf

• If X is a continuous type RV , fX (x) will be a continuous function,

• We also obtain by integration

FX(x) = ∫_{−∞}^{x} fX(u) du

Since FX(+∞) = 1, this yields

∫_{−∞}^{+∞} fX(u) du = 1

which justifies its name as the density function.

• We also get (see Figure 2.5)

P{x1 < X ≤ x2} = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx

Thus the area under fX(x) in the interval (x1, x2) represents the probability in the above equation.

Figure 2.5: Continuous pdf

• Often, RVs are referred to by their specific density functions, both in the continuous and discrete cases, and in what follows we shall list a number of them in each category.

2.3 Continuous-type Random Variables

• Normal (Gaussian): X is said to be a normal or Gaussian RV if

fX(x) = [1/√(2πσ²)] exp[−(x − µ)²/(2σ²)]    (2.3)

This is a bell shaped curve, symmetric around the parameter µ, and its distribution

function is given by

FX(x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp[−(y − µ)²/(2σ²)] dy = Φ((x − µ)/σ)    (2.4)

where Φ(x) = ∫_{−∞}^{x} [1/√(2π)] exp(−y²/2) dy is called the standard normal CDF and is often tabulated. Figure 2.6 shows the pdf and cdf of the Normal distribution for different means and variances.

Figure 2.6: pdf and cdf of Normal distribution for different means and variances

P(a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ)

Q(x) = ∫_{x}^{∞} [1/√(2π)] exp(−y²/2) dy = 1 − Φ(x)
Q(x) is called the standard normal complementary CDF, and Q(x) = 1 − Φ(x). Since fX(x) depends on two parameters µ and σ², the notation X ∼ N(µ, σ²) is applied. If

Y = (X − µ)/σ ∼ N(0, 1)    (2.5)

Y is called the normalized Gaussian RV. Furthermore,

aX + b ∼ N(aµ + b, a²σ²)

so a linear transform of a Gaussian RV is still Gaussian. (A short numerical sketch for evaluating Φ and Q appears at the end of this section.)

• Uniform: X ∼ U(a, b), a < b, as shown in Figure 2.7, if

fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 elsewhere.    (2.6)

Figure 2.7: pdf and cdf of Uniform distribution

• Exponential: X ∼ E(λ) if

fX(x) = (1/λ) exp(−x/λ) for x ≥ 0, and 0 elsewhere.    (2.7)

Figure 2.8 shows the pdf and cdf of the Exponential distribution for different values of the parameter λ.

Figure 2.8: pdf and cdf of Exponential distribution

• Chi-square distribution with n degrees of freedom:

fX(x) = [1/(σⁿ 2^{n/2} Γ(n/2))] x^{n/2−1} exp(−x/(2σ²)) for x ≥ 0, and 0 elsewhere.    (2.8)

When the RV X is defined as X = Σ_{i=1}^{n} Xi², where the Xi, i = 1, …, n, are statistically independent and identically distributed (i.i.d.) Gaussian RVs ∼ N(0, σ²), then X has a chi-square distribution with n degrees of freedom. Γ(x) is called the Gamma function and is given by
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt,  p > 0

Γ(p) = (p − 1)! when p is a positive integer, and Γ(1/2) = √π.

• Rayleigh: X ∼ R(σ²), as shown in Figure 2.9, if

fX(x) = (x/σ²) exp(−x²/(2σ²)) for x ≥ 0, and 0 elsewhere.    (2.9)

Figure 2.9: pdf and cdf of Rayleigh distribution

Let Y = X1² + X2², where X1 and X2 ∼ N(0, σ²) are independent. Then Y is chi-square distributed with two degrees of freedom, hence the pdf of Y is

fY(y) = [1/(2σ²)] exp(−y/(2σ²)),  y ≥ 0

Now, suppose we define a new RV as R = √Y; then R is Rayleigh distributed.
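Numerical sketch for the Gaussian case: Φ and Q can be evaluated with Python's math.erf via Φ(x) = ½[1 + erf(x/√2)]. Here math.erf is the conventional error function, not the erf(·) notation used later in these notes.

```python
import math

def Phi(x):
    """Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    """Standard normal complementary CDF: Q(x) = 1 - Phi(x)."""
    return 1.0 - Phi(x)

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2) = Phi((b-mu)/sigma) - Phi((a-mu)/sigma)."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

print(Phi(0.0))                    # 0.5
print(Q(1.0))                      # about 0.1587
print(normal_prob(-1, 1, 0, 1))    # about 0.6827
```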

2.4 Discrete-type Random Variables

• Bernoulli: X takes values in {0, 1}, with P(X = 0) = q and P(X = 1) = p, where p = 1 − q.

• Binomial: X ∼ B(n, p)

P(X = k) = C(n, k) p^k q^{n−k},  k = 0, 1, 2, · · · , n

where C(n, k) denotes the binomial coefficient n!/((n − k)! k!).

• Poisson: X ∼ P(λ)

P(X = k) = e^{−λ} λ^k / k!,  k = 0, 1, 2, · · ·

• Uniform: X takes values in {1, 2, · · · , n}, and

P(X = k) = 1/n,  k = 1, 2, · · · , n

• Geometric: (number of coin tosses until the first head appears)

P(X = k) = (1 − p)^{k−1} p,  k = 1, 2, · · ·

where the parameter p ∈ (0, 1) is the probability of a head on each toss.

2.4.1 Example of Poisson RV

Example of Poisson Distribution: the probability model of a Poisson RV describes phenomena that occur randomly in time. While the time of each occurrence is completely random, there is a known average number of occurrences per unit time. Examples include the arrival of information requests at a WWW server, the initiation of telephone calls, etc.

For example, calls arrive at random times at a telephone switching office with an average rate of λ = 0.25 calls/second. The pmf of the number of calls that arrive in a T = 2 second interval is

P_K(k) = (0.5)^k e^{−0.5} / k!,  k = 0, 1, 2, · · ·,  and 0 otherwise.
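A short sketch evaluating this pmf for λT = 0.25 × 2 = 0.5:

```python
import math

def poisson_pmf(k, alpha):
    """P(K = k) for a Poisson RV with parameter alpha = lambda * T."""
    return math.exp(-alpha) * alpha**k / math.factorial(k)

alpha = 0.25 * 2    # 0.25 calls/second over a T = 2 second interval
for k in range(4):
    print(k, poisson_pmf(k, alpha))
# k = 0: ~0.607, k = 1: ~0.303, k = 2: ~0.076, k = 3: ~0.013
```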

2.4.2 Example of Binomial RV

Example of using Binomial Distribution: To communicate one bit of information reliably, we

transmit the same binary symbol 5 times. Thus, ”zero“ is transmitted as 00000 and ”one“

is transmitted as 11111. The receiver detects the correct information if three or more binary

symbols are received correctly. What is the information error probability P (E), if the binary

symbol error probability is q = 0.1?

In this case, we have five trials corresponding to five transmissions. On each trial, the

probability of a success is p = 1 − q = 0.9 (binary symmetric channel). The error event

occurs when the number of successes is strictly less than three:

P(E) = P(S0,5) + P(S1,5) + P(S2,5) = q^5 + 5pq^4 + 10p²q³ = 0.00856

By repeating the transmission (5 times), the probability of error is reduced from 0.1 to about 0.0086.
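The sum of the three binomial terms can be checked with a short sketch:

```python
from math import comb

p, q = 0.9, 0.1    # symbol success / error probabilities
n = 5              # number of repetitions

# Error event: strictly fewer than 3 of the 5 symbols are received correctly.
P_E = sum(comb(n, k) * p**k * q**(n - k) for k in range(3))
print(P_E)         # about 0.00856
```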

2.4.3 Bernoulli Trial Revisited

A Bernoulli trial consists of repeated independent and identical experiments, each of which has only two outcomes, A or Ā, with P(A) = p and P(Ā) = q. The probability of exactly k occurrences of A in n such trials is given by the Binomial distribution.

Let

Xk = "exactly k occurrences of A in n trials"    (2.10)

Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, · · · , n, either

X0 or X1 or X2 or · · · or Xn must occur in such an experiment. Thus

P (X0 ∪ X1 ∪ X2 ∪ · · · ∪ Xn ) = 1 (2.11)

But Xi , Xj are mutually exclusive. Thus


 
P(X0 ∪ X1 ∪ X2 ∪ · · · ∪ Xn) = Σ_{k=0}^{n} P(Xk) = Σ_{k=0}^{n} C(n, k) p^k q^{n−k}    (2.12)

From the binomial theorem,

(a + b)^n = Σ_{k=0}^{n} C(n, k) a^k b^{n−k}

so with a = p and b = q, Equation (2.12) equals (p + q)^n = 1, which agrees with Equation (2.11).

For a given n and p, what is the most likely value of k? The most probable value of k is the number which maximizes the Binomial pmf Pn(k). To obtain this value, consider the ratio

Pn(k − 1)/Pn(k) = [n! p^{k−1} q^{n−k+1} / ((n − k + 1)!(k − 1)!)] · [(n − k)! k! / (n! p^k q^{n−k})] = [k/(n − k + 1)] · (q/p)

Thus Pn(k) > Pn(k − 1) if k(1 − p) < (n − k + 1)p, or k < (n + 1)p. Thus, Pn(k) as a function of k increases until k = (n + 1)p, if this is an integer, or otherwise up to the largest integer kmax less than (n + 1)p; this value represents the most likely number of successes (or heads) in n trials.
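A quick numerical check of this result, using an arbitrary example with n = 10 and p = 0.3 (so (n + 1)p = 3.3 and the most likely value should be k = 3):

```python
from math import comb, floor

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
k_star = max(range(n + 1), key=lambda k: binom_pmf(n, p, k))
print(k_star, floor((n + 1) * p))   # both 3
```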

Example 4

In a Bernoulli experiment with n trials, find the probability that the number of occurrences

of A is between k1 and k2 .

Solution: With Xi, i = 0, 1, 2, · · · , n, as defined in Equation (2.10), clearly they are mutually exclusive events. Thus

P = P("occurrence of A is between k1 and k2")
  = P(Xk1 ∪ Xk1+1 ∪ · · · ∪ Xk2) = Σ_{k=k1}^{k2} P(Xk) = Σ_{k=k1}^{k2} C(n, k) p^k q^{n−k}    (2.13)

Example 5

Suppose 5, 000 components are ordered. The probability that a part is defective equals 0.1.

What is the probability that the total number of defective parts does not exceed 400?

Solution: Let

Yk = "k parts are defective among 5000 components"

using Equation (2.13), the desired probability is given by


 
P(Y0 ∪ Y1 ∪ · · · ∪ Y400) = Σ_{k=0}^{400} P(Yk) = Σ_{k=0}^{400} C(5000, k) (0.1)^k (0.9)^{5000−k}

The above equation has too many terms to compute. Clearly, we need a technique to compute

the above term in a more efficient manner.

2.4.4 Binomial Random Variable Approximations

Let X represent a Binomial RV; then

P(k1 < X < k2) = Σ_{k=k1}^{k2} P(Xk) = Σ_{k=k1}^{k2} C(n, k) p^k q^{n−k}    (2.14)

Since the binomial coefficient C(n, k) = n!/((n − k)! k!) grows quite rapidly with n, it is difficult to compute Equation (2.14) for large n. In this context, the Normal approximation is extremely useful.

Normal Approximation (DeMoivre-Laplace Theorem): Suppose n → ∞ with p held fixed. Then for k in the √(npq) neighborhood of np, we can approximate

C(n, k) p^k q^{n−k} ≈ [1/√(2πnpq)] exp[−(k − np)²/(2npq)]    (2.15)

Thus if k1 and k2 in Equation (2.14) are within or around the neighborhood of the interval (np − √(npq), np + √(npq)), we can approximate the summation in Equation (2.14) by an integration as

P(k1 < X < k2) ≈ ∫_{k1}^{k2} [1/√(2πnpq)] exp[−(x − np)²/(2npq)] dx
             = ∫_{x1}^{x2} [1/√(2π)] exp(−y²/2) dy    (2.16)

where

x1 = (k1 − np)/√(npq),   x2 = (k2 − np)/√(npq)

We can express Equation (2.16) in terms of the normalized integral that has been tabulated

extensively. See Figures 2.11 and 2.12.

erf(x) = [1/√(2π)] ∫_{0}^{x} e^{−y²/2} dy = −erf(−x)    (2.17)

(Note that this erf(·) is the notation of these notes; it equals Φ(x) − 1/2.)

For example, if x1 and x2 are both positive, we obtain

P (x1 < X < x2 ) = erf (x2 ) − erf (x1 )

Example 6

A fair coin is tossed 5,000 times. Find the probability that the number of heads is between 2,475 and 2,525.

Solution: We need P (2475 ≤ X ≤ 2525). Here n is large so that we can use the normal

Figure 2.10: pdf of Gaussian approximation.

approximation. In this case p = 1/2, so that np = 2500 and √(npq) ≈ 35. Since np − √(npq) ≈ 2465 and np + √(npq) ≈ 2535, the approximation is valid for k1 = 2475 and k2 = 2525. Thus

P(k1 < X < k2) ≈ ∫_{x1}^{x2} [1/√(2π)] exp(−y²/2) dy    (2.18)

here

x1 = (k1 − np)/√(npq) = −5/7,   x2 = (k2 − np)/√(npq) = 5/7

Since x1 < 0, from Figure 2.10, the above probability is given by

P(2475 ≤ X ≤ 2525) = erf(x2) − erf(x1) = erf(x2) + erf(|x1|) = 2 erf(5/7) ≈ 0.516

where we have used the table value erf(0.7) = 0.258.
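A sketch comparing the approximation with the exact binomial sum for this example; the helper erf_notes written here implements the erf(·) convention of Equation (2.17) (area under the standard normal from 0 to x), which differs from Python's math.erf, and the exact pmf is evaluated through logarithms to avoid underflow:

```python
import math
from math import lgamma

n, p = 5000, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))   # 2500 and about 35.36

def erf_notes(x):
    """erf(x) in the sense of Eq. (2.17): area under the standard normal from 0 to x."""
    return 0.5 * math.erf(x / math.sqrt(2.0))

x1 = (2475 - mu) / sigma
x2 = (2525 - mu) / sigma
approx = erf_notes(x2) - erf_notes(x1)          # about 0.52 (the table lookup gave 0.516)

def binom_pmf(k):
    """Exact Binomial(n, p) pmf, computed via logarithms."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

exact = sum(binom_pmf(k) for k in range(2475, 2526))
print(approx, exact)    # roughly 0.52 vs. 0.53
```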

Figure 2.11: The standard normal CDF Φ(z)

Figure 2.12: The standard normal complementary CDF Q(z)

