(EC636)
by M.Elalem
University of Tripoli
http://melalem.com/EC636.php
Lecture 1
Probabilities
1.1 Introduction
• Examples of random phenomena:
– Today's temperature;
– Walk to a bus station: how long do you wait for the arrival of a bus?
• We create models to analyze, since real experiments are generally too complicated. For example, the waiting time for a bus would depend on factors such as:
– The weight, horsepower, and gear ratio of the bus;
– The status of all road construction within 100 miles of the bus stop.
• It is apparent that it would be too difficult to analyze the effects of all these factors on the likelihood that you will wait less than 5 minutes for a bus. Therefore, it is necessary to create a model that captures the critical parts of the actual physical experiment.
• Probability theory deals with the study of random phenomena, which under repeated experiments yield different outcomes that nevertheless exhibit certain underlying patterns.
– Union: E ∪ F = {s ∈ Ω : s ∈ E OR s ∈ F};
– Intersection: E ∩ F = {s ∈ Ω : s ∈ E AND s ∈ F};
– Complement: E^c = Ē = {s ∈ Ω : s ∉ E}.
1.1.2 Several Definitions
• Disjoint: if A ∩ B = φ, the empty set, then A and B are said to be mutually exclusive (M.E.), or disjoint.
• Partition: the events A1, A2, · · · form a partition of Ω if they are mutually exclusive and their union is the whole sample space:
Ai ∩ Aj = φ for i ≠ j, and ∪_{i=1}^{n} Ai = Ω.
1.1.4 Sample Space, Events and Probabilities
• Outcome: an outcome is any possible observation or result of an experiment.
• Sample space: the sample space of an experiment is the set of all possible outcomes
of that experiment.
∗ noise: S = {n(t); t: real}
Example 1
Toss a coin four times. The sample space consists of 16 four-letter words, with each letter either h (head) or t (tail). An event consists of one or more outcomes; for example, the event B1 = {ttth, ttht, thtt, httt} ("exactly one head") contains four outcomes.
Example 2
Toss two dice; there are 36 elements in the sample space. If we define the events Bi = {the sum of the two dice equals i}, i = 2, 3, · · · , 12, then these events partition the sample space, and the corresponding event space is Ω = {B2, B3, · · · , B12}.
– Practical example: when binary data are transmitted through a noisy channel, we are more interested in events such as "a bit is received in error" than in individual noise waveforms.
Often it is meaningful to talk about at least some of the subsets of S as events, for which we can define probabilities.
Example 3
Consider the experiment where two coins are simultaneously tossed. The sample space is Ω = {γ1, γ2, γ3, γ4}, where γ1 = HH, γ2 = HT, γ3 = TH, γ4 = TT. If we define
A = {γ1, γ2, γ3},
the event A is the same as "head has occurred at least once" and qualifies as an event.
• Theorems are consequences that follow logically from definitions and axioms. Each
theorem has a proof that refers to definitions, axioms, and other theorems.
For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions, which act as the axioms of probability:
1- P(A) ≥ 0 (1.2)
2- P(Ω) = 1 (1.3)
3- For any countable collection A1, A2, · · · of mutually exclusive events,
P(∪_i Ai) = Σ_i P(Ai) (1.4)
(Note that (3) states that if A and B are mutually exclusive (M.E.) events, then P(A ∪ B) = P(A) + P(B).)
• For any event A, A ∪ Ā = Ω, so that P(A ∪ Ā) = P(Ω) = 1. Since A and Ā are M.E., axiom (3) gives P(A) + P(Ā) = 1, i.e., P(Ā) = 1 − P(A).
• Similarly, for any A, A ∩ φ = φ and A ∪ φ = A, where A and φ are M.E. Hence it follows that P(A ∪ φ) = P(A) + P(φ) = P(A), so that
P(φ) = 0.
• Suppose A and B are not mutually exclusive (M.E.). How does one compute P(A ∪ B)? We first express A ∪ B as a union of M.E. events, so that we can make use of the probability axioms. From the figure below,
A ∪ B = A ∪ ĀB,
where A and ĀB are clearly M.E. events. Thus, using axiom (3),
P(A ∪ B) = P(A ∪ ĀB) = P(A) + P(ĀB).
To compute P(ĀB), write
B = B ∩ Ω = B ∩ (A ∪ Ā) = (B ∩ A) ∪ (B ∩ Ā) = BA ∪ BĀ,
where BA and BĀ are M.E. events. Thus P(B) = P(BA) + P(BĀ), so that P(ĀB) = P(B) − P(AB). Therefore
P(A ∪ B) = P(A) + P(B) − P(AB).
For example, if P(A) = 1/2, P(B) = 1/2, and P(AB) = 1/4, then
P(A ∪ B) = P(A) + P(B) − P(AB) = 1/2 + 1/2 − 1/4 = 3/4.
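As a quick numerical check, here is a minimal Python sketch (assuming, for illustration, that A = "head on the first toss" and B = "head on the second toss" of two fair coins, which matches the numbers above):

    from itertools import product

    # The four equally likely outcomes of tossing two fair coins.
    outcomes = list(product("HT", repeat=2))

    A = {w for w in outcomes if w[0] == "H"}  # head on the first toss
    B = {w for w in outcomes if w[1] == "H"}  # head on the second toss

    def P(E):
        # Classical probability: all outcomes equally likely.
        return len(E) / len(outcomes)

    # Inclusion-exclusion: P(A U B) = P(A) + P(B) - P(AB).
    print(P(A | B))                # 0.75
    print(P(A) + P(B) - P(A & B))  # 0.75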
1.1.8 Theorem
For an event space B = {B1, B2, · · · } and any event A in the sample space, let Ci = A ∩ Bi. For i ≠ j, the events Ci and Cj are mutually exclusive, and A = C1 ∪ C2 ∪ · · ·.
Example 4
Toss a coin four times, and let A equal the set of outcomes with less than three heads:
A = {tttt, httt, thtt, ttht, ttth, hhtt, htht, htth, tthh, thth, thht}.
Let {B0, B1, B2, B3, B4} denote the event space in which Bi = {outcomes with exactly i heads}. Then
A = C0 ∪ C1 ∪ C2 ∪ C3 ∪ C4
= (A ∩ B0) ∪ (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3) ∪ (A ∩ B4),
where C3 = C4 = φ, since A contains no outcomes with three or more heads. In terms of the event space, this example states that the event "less than three heads" is the union of the events "zero heads", "one head", and "two heads".
Example 5
A company has a model of telephone usage. It classifies all calls as L (long) or B (brief). It also observes whether calls carry voice (V), fax (F), or data (D). The sample space has six outcomes, S = {LV, BV, LD, BD, LF, BF}, and the probability of each outcome can be arranged in a table with columns V, F, D and rows L, B (the numerical values are not reproduced here). The events {V, F, D} form an event space in the sense of the previous theorem (with L playing the role of the event A). Thus, we can apply the theorem to find
P(L) = P(LV) + P(LD) + P(LF).
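The calculation can be sketched in Python. The individual outcome probabilities below are hypothetical placeholders (the original table values are not reproduced here); only the structure of the computation matters:

    # Hypothetical outcome probabilities; the six values must sum to 1.
    p = {"LV": 0.30, "BV": 0.20, "LD": 0.12, "BD": 0.08, "LF": 0.15, "BF": 0.15}

    # {V, F, D} is an event space, so P(L) = P(LV) + P(LD) + P(LF).
    P_L = p["LV"] + p["LD"] + p["LF"]
    print(P_L)  # 0.57 with these placeholder values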
In N independent trials, suppose NA, NB, and NAB denote the number of times the events A, B, and AB occur, respectively. Then, for large N,
P(A) = NA/N, P(B) = NB/N, P(AB) = NAB/N.
Among the NA occurrences of A, only NAB of them are also found among the NB occurrences of B. Thus the ratio
NAB/NB = (NAB/N)/(NB/N) = P(AB)/P(B)
is a measure of the event A given that B has already occurred. We denote this conditional probability by P(A|B) = probability of the event A given that B has occurred.
We define
P(A|B) = P(AB)/P(B), (1.5)
provided P(B) ≠ 0. As we show below, the above definition satisfies all the probability axioms:
1. Non-negative:
P(A|B) = P(AB)/P(B) ≥ 0, since P(AB) ≥ 0 and P(B) > 0.
2.
P(Ω|B) = P(ΩB)/P(B) = P(B)/P(B) = 1, since ΩB = B.
3. Suppose A ∩ C = φ. Then AB and CB are also M.E., so
P(A ∪ C|B) = P((A ∪ C)B)/P(B) = P(AB)/P(B) + P(CB)/P(B) = P(A|B) + P(C|B).
Hence P(·|B) defines a legitimate probability measure.
Properties of conditional probability:
1. If B ⊂ A, then AB = B, and
P(A|B) = P(AB)/P(B) = P(B)/P(B) = 1.
2. If A ⊂ B, then AB = A, and
P(A|B) = P(AB)/P(B) = P(A)/P(B) > P(A).
In this case the occurrence of B makes A more likely. For example, in rolling a fair die, let A = {outcome is 2} and B = {outcome is even}; then P(A) = 1/6 while P(A|B) = 1/3. The statement that B has occurred (outcome is even) makes the probability of A larger than its unconditional value.
• We can use conditional probability to express the probability of a complicated event in terms of simpler related events: the Law of Total Probability. Let A1, A2, · · · , An be mutually exclusive events whose union is the whole sample space:
Ai ∩ Aj = φ for i ≠ j, and ∪_{i=1}^{n} Ai = Ω;
thus
P(B) = Σ_{i=1}^{n} P(BAi) = Σ_{i=1}^{n} P(B|Ai)P(Ai).
• Independence: the events A and B are said to be statistically independent if P(AB) = P(A)P(B). Notice that this definition is a probabilistic statement, NOT a set-theoretic notion such as mutual exclusiveness (independent and disjoint are not synonyms).
1.1.11 More on Independence
• If A and B are independent, the probability of the joint event reduces to a simple multiplication: P(AB) = P(A)P(B).
– Independent events (with P(A) > 0 and P(B) > 0) cannot be mutually exclusive: independence implies P(AB) = P(A)P(B) > 0, so the event AB cannot be the null set.
Also, if A and B are independent, P(A|B) = P(AB)/P(B) = P(A)P(B)/P(B) = P(A). Thus if A and B are independent, the event that B has occurred does not shed any more light on the event A; it makes no difference to A whether B has occurred or not.
Example 6
A box contains 6 white and 4 black balls. Remove two balls at random without
replacement. What is the probability that the first one is white and the second
one is black?
Let W1 = “first ball removed is white” and B2 = “second ball removed is black”.
We need P(W1 B2) = P(B2|W1)P(W1). But
P(W1) = 6/(6+4) = 6/10 = 3/5,
and, since after removing a white ball the box contains 5 white and 4 black balls,
P(B2|W1) = 4/(5+4) = 4/9,
and hence
P(W1 B2) = P(B2|W1)P(W1) = (3/5) · (4/9) = 4/15 ≈ 0.267.
Are the events W1 and B2 independent? Our common sense says no: the fate of the second ball very much depends on that of the first ball. To verify this we need to compute P(B2). The first ball has two options: W1 = "first ball removed is white" or B1 = "first ball removed is black". Since W1 and B1 form a partition, the law of total probability gives
P(B2) = P(B2|W1)P(W1) + P(B2|B1)P(B1) = (4/9) · (3/5) + (3/9) · (2/5) = 2/5,
and
P(B2)P(W1) = (2/5) · (3/5) = 6/25 ≠ 4/15 = P(B2 W1).
As expected, the events W1 and B2 are dependent.
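A short Monte Carlo sketch of the urn experiment above confirms these numbers:

    import random

    trials = 200_000
    n_W1 = n_B2 = n_both = 0
    for _ in range(trials):
        urn = ["w"] * 6 + ["b"] * 4
        random.shuffle(urn)
        first, second = urn[0], urn[1]   # two draws without replacement
        n_W1 += first == "w"
        n_B2 += second == "b"
        n_both += first == "w" and second == "b"

    print(n_W1 / trials)    # ~0.600 = P(W1) = 3/5
    print(n_B2 / trials)    # ~0.400 = P(B2) = 2/5
    print(n_both / trials)  # ~0.267 = P(W1 B2), not P(W1)P(B2) = 0.24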
• Bayes' theorem: since
P(A|B) = P(AB)/P(B) ⇒ P(AB) = P(A|B)P(B),
and similarly,
P(B|A) = P(BA)/P(A) = P(AB)/P(A) ⇒ P(AB) = P(B|A)P(A),
we get
P(A|B)P(B) = P(B|A)P(A),
or
P(A|B) = (P(B|A)/P(B)) · P(A). (1.6)
In Equation (1.6), P(A) represents the a-priori probability of the event A. Suppose B has occurred, and assume that A and B are not independent. How can this new information be used to update our knowledge about A? Bayes' rule takes into account the new information ("B has occurred") and gives out the a-posteriori probability of A given B.
We can also view the event B as new knowledge obtained from a fresh experiment; knowing B has occurred changes our assessment of the likelihood of A. Bayes' theorem gives the exact mechanism for incorporating such new information.
• A more general version of Bayes' theorem involves a partition A1, A2, · · · , An of Ω with associated a-priori probabilities P(Ai), i = 1, · · · , n. With the new information "B has occurred", the updated a-posteriori probabilities are
P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^{n} P(B|Aj)P(Aj).
Example 7
Two boxes B1 and B2 contain 100 and 200 light bulbs, respectively. The first box (B1) has 15 defective bulbs and the second (B2) has 5. Suppose a box is selected at random
and one bulb is picked out. (a) What is the probability that it is defective?
Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box
B2 has 195 good and 5 defective bulbs. Let D = “Defective bulb is picked out”.
Then,
P(D|B1) = 15/100 = 0.15, P(D|B2) = 5/200 = 0.025.
Since a box is selected at random, the boxes are equally likely: P(B1) = P(B2) = 1/2.
Thus B1 and B2 form a partition, and using the Law of Total Probability, we obtain
P(D) = P(D|B1)P(B1) + P(D|B2)P(B2) = 0.15 × (1/2) + 0.025 × (1/2) = 0.0875.
(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1? By Bayes' rule,
P(B1|D) = P(D|B1)P(B1)/P(D) = (0.15 × 0.5)/0.0875 ≈ 0.857.
Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on the fact that we might have picked up box 1? Indeed, since P(B1|D) ≈ 0.857 > 0.5, it is now more likely that we chose box 1 rather than box 2. (Recall that box 1 has six times the defective rate of box 2.)
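The same numbers follow from a direct Python transcription of the total-probability and Bayes steps above:

    # Prior: each box is equally likely to be selected.
    P_B1 = P_B2 = 0.5
    # Likelihoods: the defective rate of each box.
    P_D_given_B1 = 15 / 100   # 0.15
    P_D_given_B2 = 5 / 200    # 0.025

    # Law of total probability.
    P_D = P_D_given_B1 * P_B1 + P_D_given_B2 * P_B2
    print(P_D)                # 0.0875

    # Bayes' rule: a-posteriori probability of box 1 given a defective bulb.
    P_B1_given_D = P_D_given_B1 * P_B1 / P_D
    print(P_B1_given_D)       # ~0.857 > 0.5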
Example 8
Suppose you have two coins, one biased, one fair, but you don’t know which coin is
which. Coin 1 is biased. It comes up heads with probability 3/4, while coin 2 will flip
heads with probability 1/2. Suppose you pick a coin at random and flip it. Let Ci
denote the event that coin i is picked. Let H and T denote the possible outcomes of the
flip. Given that the outcome of the flip is a head, what is P [C1 |H], the probability that
you picked the biased coin? Given that the outcome is a tail, what is P[C1|T], the probability that you picked the biased coin?
Solution: First, we construct the sample tree of the experiment. To find the conditional probabilities, we apply Bayes' rule:
P[C1|H] = P[H|C1]P[C1]/P[H] = (3/4 × 1/2)/(3/4 × 1/2 + 1/2 × 1/2) = (3/8)/(5/8) = 3/5.
Similarly,
P[C1|T] = P[T|C1]P[C1]/P[T] = (1/4 × 1/2)/(1/4 × 1/2 + 1/2 × 1/2) = (1/8)/(3/8) = 1/3.
As we would expect, we are more likely to have chosen coin 1 when the first flip is
heads but we are more likely to have chosen coin 2 when the first flip is tails.
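A Monte Carlo sketch of this experiment (with the stated coin biases) reproduces both posteriors:

    import random

    trials = 200_000
    heads = tails = heads_c1 = tails_c1 = 0
    for _ in range(trials):
        coin = random.choice([1, 2])          # pick a coin at random
        p_head = 0.75 if coin == 1 else 0.5   # coin 1 is the biased one
        if random.random() < p_head:
            heads += 1
            heads_c1 += coin == 1
        else:
            tails += 1
            tails_c1 += coin == 1

    print(heads_c1 / heads)  # ~0.600 = P(C1|H) = 3/5
    print(tails_c1 / tails)  # ~0.333 = P(C1|T) = 1/3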
Lecture 2
Random Variables
2.1 Introduction
Let Ω be sample space of a probability model, and X a function that maps every ζ ∈ Ω, to
a unique point x ∈ R, the set of real numbers. Since the outcome ζ is not certain, neither is the value X(ζ) = x. Thus if B is some subset of R, we may want to determine the probability
of "X(ζ) ∈ B". To determine this probability, we can look at the set A = X^{-1}(B) ⊂ Ω, where A = X^{-1}(B) = {ζ ∈ Ω : X(ζ) ∈ B} is the inverse image of B under X. Obviously, if the set A = X^{-1}(B) is an event, the probability of A is well defined; in this case we define P("X(ζ) ∈ B") = P(A).
However, X^{-1}(B) may not always be an event for every B, thus creating difficulties. The notion of random variable (RV) makes sure that the inverse mapping always results in an event.
Random Variable (RV): A finite single-valued function X(·) that maps the set of all experimental outcomes Ω into the set of real numbers R is said to be a RV if the set {ζ : X(ζ) ≤ x} is an event for every x in R.
The random variable X is specified by the function X(ζ) that maps the sample outcome ζ to the corresponding real number x; accordingly, for any x,
{X = x} = {ζ ∈ Ω | X(ζ) = x}.
Since all events have well-defined probabilities, the probability of the event {ζ | X(ζ) ≤ x} must depend on x; denote it by
P{ζ | X(ζ) ≤ x} = FX(x).
The role of the subscript X is only to identify the actual RV. FX(x) is said to be the Cumulative Distribution Function (CDF) of the RV X. The CDF satisfies
FX(+∞) = 1, FX(−∞) = 0.
If x1 < x2, then (−∞, x1] ⊂ (−∞, x2]. Consequently the event {ζ | X(ζ) ≤ x1} is contained in the event {ζ | X(ζ) ≤ x2}, so that FX(x1) ≤ FX(x2), implying that the probability distribution function is nonnegative and monotone nondecreasing.
• Theorem: P(a < X ≤ b) = FX(b) − FX(a). To prove this theorem, express the event Eab = {a < X ≤ b} as part of a union of disjoint events. Starting with the event Eb = {X ≤ b}, note that Eb can be written as the union
Eb = {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} = Ea ∪ Eab.
Note also that Ea and Eab are disjoint, so that P(Eb) = P(Ea) + P(Eab). Since P(Eb) = FX(b) and P(Ea) = FX(a), we can write FX(b) = FX(a) + P(a < X ≤ b), which completes the proof.
• If FX(x0) = 0 for some x0, then FX(x) = 0 for every x ≤ x0. This follows, since FX(x0) = P(X(ζ) ≤ x0) = 0 implies {X(ζ) ≤ x0} is the null set, and hence {X(ζ) ≤ x}, which is contained in it for x ≤ x0, is also the null set.
• P{X(ζ) > x} = 1 − FX(x). We have {X(ζ) ≤ x} ∪ {X(ζ) > x} = Ω, and since the two events are mutually exclusive, P{X(ζ) ≤ x} + P{X(ζ) > x} = 1.
• P {x1 < X(ζ) ≤ x2 } = FX (x2 ) − FX (x1 ), x2 > x1
The events {X(ζ) ≤ x1} and {x1 < X(ζ) ≤ x2} are mutually exclusive, and their union is {X(ζ) ≤ x2}; hence FX(x1) + P{x1 < X(ζ) ≤ x2} = FX(x2), or
P{x1 < X(ζ) ≤ x2} = FX(x2) − FX(x1).
• FX(x0^+), the limit of FX(x) as x → x0 from the right, always exists and equals FX(x0). However, FX(x) need not be continuous from the left. At a discontinuity point of the distribution, the left and right limits are different, and the jump FX(x0) − FX(x0^−) = P(X = x0) > 0. Thus the only discontinuities of a distribution function are of the jump type, and the CDF is continuous from the right. Keep in mind that the CDF always takes on the upper value at every jump.
Example 1
Let X be the RV such that X(ζ) = c, a constant, for every ζ ∈ Ω. Find FX(x).
Solution: For x < c, {X(ζ) ≤ x} = φ, so that FX(x) = 0; and for x ≥ c, {X(ζ) ≤ x} = Ω, so that FX(x) = 1 (Figure 2.1).
Example 2
Figure 2.1: CDF for example 1
Toss a coin, so Ω = {H, T}, with P(T) = q and P(H) = 1 − q. Suppose the RV X is such that X(T) = 0 and X(H) = 1. Find FX(x).
Solution: For x < 0, {X(ζ) ≤ x} = φ, so FX(x) = 0; for 0 ≤ x < 1, {X(ζ) ≤ x} = {T}, so FX(x) = P(T) = q; and for x ≥ 1, {X(ζ) ≤ x} = Ω, so FX(x) = 1.
• If FX(x) has a jump discontinuity at a point xi, then
pi = P{X = xi} = FX(xi) − FX(xi^−).
For Example 1,
P{X = c} = FX(c) − FX(c^−) = 1 − 0 = 1,
and for Example 2,
P{X = 0} = FX(0) − FX(0^−) = q − 0 = q.
Example 3
A fair coin is tossed twice, and let the RV X represent the number of heads. Find FX (x).
Solution: X(TT) = 0, X(TH) = X(HT) = 1, and X(HH) = 2, so FX(x) is a staircase function with jumps at x = 0, 1, 2 (Figure 2.3). For instance,
P{X = 1} = FX(1) − FX(1^−) = 3/4 − 1/4 = 1/2.
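The staircase CDF can be tabulated directly; a small Python sketch of the two-toss experiment:

    from itertools import product

    outcomes = list(product("HT", repeat=2))  # 4 equally likely outcomes
    X = [w.count("H") for w in outcomes]      # number of heads per outcome

    def F(x):
        # FX(x) = P(X <= x) over the equally likely outcomes.
        return sum(v <= x for v in X) / len(X)

    print(F(-1), F(0), F(0.5), F(1), F(2))  # 0.0 0.25 0.25 0.75 1.0
    # Jump at x = 1: F(1) - F(1-) = 0.75 - 0.25 = 0.5 = P(X = 1).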
The first derivative of the distribution function FX(x) is called the probability density function (pdf) fX(x) of the RV X; that is, fX(x) = dFX(x)/dx ≥ 0.
Figure 2.3: CDF for example 3
• Discrete RV: if X is a discrete-type RV, then its density function has the general form
fX(x) = Σ_i pi δ(x − xi),
where the xi represent the jump-discontinuity points in FX(x). As Fig. 2.4 shows, fX(x) consists of a train of impulses located at the jump points xi.
Since FX(+∞) = 1, it follows that
∫_{−∞}^{+∞} fX(u) du = 1.
Moreover, P(x1 < X ≤ x2) = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx. Thus the area under fX(x) in the interval (x1, x2) represents the probability in the above equation.
• Often, RVs are referred to by their specific density functions, both in the continuous and discrete cases, and in what follows we shall list a number of them in each category.
• Normal (Gaussian): X ∼ N(µ, σ^2) if
fX(x) = (1/√(2πσ^2)) exp(−(x − µ)^2/(2σ^2)). (2.3)
This is a bell-shaped curve, symmetric around the parameter µ, and its distribution function is given by
FX(x) = ∫_{−∞}^{x} (1/√(2πσ^2)) exp(−(y − µ)^2/(2σ^2)) dy = Φ((x − µ)/σ), (2.4)
where Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−y^2/2) dy is called the standard normal CDF and is often tabulated. Figure 2.6 shows the pdf and cdf of the Normal distribution for different means and variances.
Figure 2.6: pdf and cdf of Normal distribution for different means and variances
For X ∼ N(µ, σ^2),
P(a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ).
The complementary function
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−y^2/2) dy = 1 − Φ(x)
is called the standard normal complementary CDF. Since FX(x) = Φ((x − µ)/σ), the normalized RV
Y = (X − µ)/σ ∼ N(0, 1). (2.5)
More generally, if X ∼ N(µ, σ^2), then
aX + b ∼ N(aµ + b, a^2σ^2).
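Numerically, Φ and Q can be evaluated with the standard error function in Python's math module, via the identity Φ(x) = (1 + erf(x/√2))/2 (math.erf uses the standard normalization, which differs from the tabulated function introduced later in Equation (2.17)); a minimal sketch:

    import math

    def Phi(x):
        # Standard normal CDF.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def Q(x):
        # Standard normal complementary CDF.
        return 1.0 - Phi(x)

    def normal_prob(a, b, mu, sigma):
        # P(a < X < b) for X ~ N(mu, sigma^2).
        return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

    print(normal_prob(-1.0, 1.0, 0.0, 1.0))  # ~0.6827 (one-sigma probability)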
• Uniform: X ∼ U(a, b), a < b, if fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 elsewhere. (2.6)
Figure 2.7: pdf and cdf of Uniform distribution
• Exponential: X ∼ E(λ) if
fX(x) = (1/λ) exp(−x/λ) for x ≥ 0, and 0 elsewhere. (2.7)
The accompanying figure shows the pdf and cdf of the Exponential distribution for different values of the parameter λ.
• Chi-square: when the RV X is defined as X = Σ_{i=1}^{n} Xi^2, where the Xi, i = 1, · · · , n, are statistically independent N(0, 1) RVs, X is said to have a chi-square distribution with n degrees of freedom, with pdf
fX(x) = (1/(2^(n/2) Γ(n/2))) x^(n/2 − 1) e^(−x/2), x ≥ 0.
Γ(x) is called the Gamma function and is given by
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0,
with
Γ(1/2) = √π.
Let Y = X1^2 + X2^2, where X1 and X2 are independent N(0, σ^2) RVs. Then Y is chi-square distributed with 2 degrees of freedom, and
fY(y) = (1/(2σ^2)) exp(−y/(2σ^2)), y ≥ 0.
Now, suppose we define a new RV as R = √Y; then R is Rayleigh distributed, with fR(r) = (r/σ^2) exp(−r^2/(2σ^2)), r ≥ 0.
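A simulation sketch (taking σ = 1 for concreteness) illustrates both facts:

    import math
    import random

    sigma = 1.0
    n = 200_000
    Y = [random.gauss(0, sigma)**2 + random.gauss(0, sigma)**2 for _ in range(n)]
    R = [math.sqrt(y) for y in Y]

    # For 2 degrees of freedom, P(Y <= y) = 1 - exp(-y / (2 sigma^2)).
    y0 = 2.0
    print(sum(y <= y0 for y in Y) / n)         # ~0.632
    print(1 - math.exp(-y0 / (2 * sigma**2)))  # 0.6321...

    # The mean of a Rayleigh RV is sigma * sqrt(pi/2) ~ 1.2533.
    print(sum(R) / n)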
2.4 Discrete-type Random Variables
• Bernoulli: X takes the values 0 and 1 with
P(X = 0) = q, P(X = 1) = p, p = 1 − q.
• Binomial: X ∼ B(n, p) if
P(X = k) = C(n, k) p^k q^(n−k), k = 0, 1, 2, · · · , n,
where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient.
• Poisson: X ∼ P(λ) if
P(X = k) = e^(−λ) λ^k/k!, k = 0, 1, 2, · · ·
• Discrete Uniform: X ∼ U(n) if
P(X = k) = 1/n, k = 1, 2, · · · , n.
• Geometric: X ∼ G(p) if
P(X = k) = (1 − p)^(k−1) p, k = 1, 2, · · · ,
where the parameter p ∈ (0, 1) is the probability of a head appearing on each individual toss.
The Poisson pmf models events that occur randomly in time. While the time of each occurrence is completely random, there is a known average number of occurrences per unit time. For example, calls arrive at random times at a telephone switching office with an average
of λ = 0.25 calls/second. The pmf of the number of calls K that arrive in a T = 2 second interval is, with λT = 0.5,
PK(k) = (0.5)^k e^(−0.5)/k!, k = 0, 1, 2, · · · ,
and PK(k) = 0 otherwise.
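The first few pmf values can be computed directly; a minimal Python sketch:

    import math

    a = 0.25 * 2  # lambda*T = 0.5 expected calls in the interval

    def poisson_pmf(k, lam):
        return lam**k * math.exp(-lam) / math.factorial(k)

    for k in range(4):
        print(k, poisson_pmf(k, a))
    # 0 0.6065..., 1 0.3033..., 2 0.0758..., 3 0.0126...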
Example: to improve reliability, we transmit the same binary symbol 5 times. Thus "zero" is transmitted as 00000 and "one" is transmitted as 11111. The receiver detects the correct information if three or more binary symbols are received correctly. What is the information error probability P(E), if the binary symbol error probability is q = 0.1?
In this case, we have five trials corresponding to the five transmissions, and on each trial the probability of a symbol error is q = 0.1. The information is in error when three or more of the five symbols are wrong, so
P(E) = Σ_{k=3}^{5} C(5, k) q^k (1 − q)^(5−k) = 0.0081 + 0.00045 + 0.00001 ≈ 0.0086.
By transmitting each symbol 5 times, the probability of error is reduced from 0.1 to about 0.0086.
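The binomial sum can be verified numerically; a short Python sketch:

    from math import comb

    q, n = 0.1, 5
    # The receiver errs when 3 or more of the 5 symbols are wrong.
    P_E = sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(3, n + 1))
    print(P_E)  # 0.00856, down from the single-transmission error 0.1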
A Bernoulli trial consists of repeated independent and identical experiments, each of which has only two outcomes, A or Ā, with P(A) = p and P(Ā) = q. The probability of exactly k occurrences of A in n trials is given by the binomial pmf
Pn(k) = C(n, k) p^k q^(n−k).
Let
Xk = "exactly k occurrences of A in n trials". (2.10)
Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, · · · , n, either X0 or X1 or · · · or Xn must occur, so that
P(X0 ∪ X1 ∪ X2 ∪ · · · ∪ Xn) = 1. (2.11)
For a given n and p, what is the most likely value of k? The most probable value of k is the number which maximizes the binomial pmf. To obtain this value, consider the ratio
Pn(k)/Pn(k − 1) = ((n − k + 1)/k) · (p/q).
Thus Pn(k) > Pn(k − 1) if k(1 − p) < (n − k + 1)p, i.e., if k < (n + 1)p. Thus Pn(k), as a function of k, increases until
k = (n + 1)p,
if (n + 1)p is an integer, or otherwise until the largest integer kmax less than (n + 1)p; this value represents the most likely number of occurrences of A in n trials.
Example 4
In a Bernoulli experiment with n trials, find the probability that the number of occurrences
of A is between k1 and k2 .
Solution: With the Xi, i = 0, 1, 2, · · · , n, as defined in Equation (2.10), clearly they are mutually exclusive, and
P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} Pn(k) = Σ_{k=k1}^{k2} C(n, k) p^k q^(n−k). (2.14)
Example 5
Suppose 5,000 components are ordered. The probability that a part is defective equals 0.1. What is the probability that the total number of defective parts does not exceed 400?
Solution: Let X denote the number of defective parts; then
P(X ≤ 400) = Σ_{k=0}^{400} C(5000, k) (0.1)^k (0.9)^(5000−k).
The above expression has too many terms to compute directly. Clearly, we need a technique to compute it in a more convenient manner.
In such cases the Gaussian approximation to the binomial pmf (the DeMoivre–Laplace theorem) is useful: for large n with npq ≫ 1,
Pn(k) ≃ (1/√(2πnpq)) exp(−(k − np)^2/(2npq)).
Thus if k1 and k2 in Equation (2.14) are within or around the neighborhood of the interval (np − √(npq), np + √(npq)), we can approximate the summation in Equation (2.14) by an integration:
P(k1 < X < k2) ≃ ∫_{k1}^{k2} (1/√(2πnpq)) exp(−(x − np)^2/(2npq)) dx
= ∫_{x1}^{x2} (1/√(2π)) exp(−y^2/2) dy, (2.16)
where
x1 = (k1 − np)/√(npq), x2 = (k2 − np)/√(npq).
We can express Equation (2.16) in terms of the normalized integral that has been tabulated:
erf(x) = (1/√(2π)) ∫_{0}^{x} exp(−y^2/2) dy = −erf(−x), (2.17)
so that P(k1 < X < k2) ≃ erf(x2) − erf(x1).
Example 6
A fair coin is tossed 5,000 times. Find the probability that the number of heads is between 2,475 and 2,525.
Solution: We need P(2475 ≤ X ≤ 2525). Here n is large, so we can use the normal approximation (Figure 2.10 shows the pdf of the Gaussian approximation). In this case p = 1/2, so that np = 2500 and √(npq) ≃ 35. Since np − √(npq) ≃ 2465 and np + √(npq) ≃ 2535, the approximation is valid for k1 = 2475 and k2 = 2525. Thus
P(k1 < X < k2) ≃ ∫_{x1}^{x2} (1/√(2π)) exp(−y^2/2) dy, (2.18)
where
x1 = (k1 − np)/√(npq) = −5/7, x2 = (k2 − np)/√(npq) = 5/7,
so that
P(2475 ≤ X ≤ 2525) = erf(x2) − erf(x1) = erf(x2) + erf(|x1|) = 2 erf(5/7) ≈ 0.525.
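The quality of the approximation can be checked against the exact binomial sum; a Python sketch (math.erf is the standard error function, so Φ(x) = (1 + erf(x/√2))/2):

    import math
    from math import comb

    n, p = 5000, 0.5
    k1, k2 = 2475, 2525

    # Exact binomial probability, feasible with exact integer arithmetic.
    exact = sum(comb(n, k) for k in range(k1, k2 + 1)) / 2**n

    # Gaussian approximation.
    mu, s = n * p, math.sqrt(n * p * (1 - p))
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    approx = Phi((k2 - mu) / s) - Phi((k1 - mu) / s)

    print(exact, approx)  # ~0.529 exact vs ~0.520 approximation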
Figure 2.11: The standard normal CDF Φ(z)
Figure 2.12: The standard normal complementary CDF Q(z)