Notes EC636
University of Tripoli
http://melalem.com/EC636.php
© M. Elalem
Lecture 1
Probabilities
1.1 Introduction
• Many everyday phenomena are random, for example:
– Today's temperature;
– If you walk to a bus station, how long do you wait for the arrival of a bus?
• We create models for analysis, since real experiments are generally too complicated to analyze exactly. For example, how long you wait for a bus depends on factors such as:
– The weight, horsepower, and gear ratio of the bus;
– The status of all road construction within 100 miles of the bus stop.
• It would clearly be too difficult to analyze the effects of all these factors on the likelihood that you will wait less than 5 minutes for a bus. Therefore, it is necessary to create a model that captures the critical part of the actual physical experiment.
• Probability theory deals with the study of random phenomena, which under repeated experiments yield different outcomes that nevertheless have certain underlying patterns.
Basic set operations on events (subsets of the sample space Ω):
– Union: E ∪ F = {s ∈ Ω : s ∈ E OR s ∈ F };
– Intersection: E ∩ F = {s ∈ Ω : s ∈ E AND s ∈ F };
– Complement: E^c = Ē = {s ∈ Ω : s ∉ E}.
1.1.2 Several Definitions
• Disjoint: if A ∩ B = Φ, the empty set, then A and B are said to be mutually exclusive
(M.E), or disjoint.
• Partition: the mutually exclusive sets A1, A2, · · · form a partition of Ω if they cover the sample space, i.e.
Ai ∩ Aj = φ for i ≠ j, and ∪_i Ai = Ω
1.1.4 Sample Space, Events and Probabilities
• Outcome: an outcome is the result of a single trial of an experiment.
• Sample space: the sample space of an experiment is the set of all possible outcomes
of that experiment.
∗ noise: S = {n(t); t: real}
Example 1
Toss a coin four times. The sample space consists of 16 four-letter words, with each letter either h (head) or t (tail). An event consists of one or more outcomes; for example, B1 = {ttth, ttht, thtt, httt} ("exactly one head") contains four outcomes.
Example 2
Toss two dice; there are 36 elements in the sample space. If we define the events Bi = {the sum of the two dice equals i}, the collection
{B2, B3, · · · , B12}
forms an event space (a partition of the sample space).
– Practical example: when binary data are transmitted through a noisy channel, we are more interested in events defined on the outcomes (e.g., an error occurring) than in the individual outcomes themselves.
Often it is meaningful to talk about at least some of the subsets of S as events, to which probabilities can be assigned.
Example 3
Consider the experiment where two coins are simultaneously tossed. The sample space is Ω = {γ1, γ2, γ3, γ4}, where γ1 = (H, H), γ2 = (H, T), γ3 = (T, H), γ4 = (T, T). If we define
A = {γ1, γ2, γ3}
then A is the same as "head has occurred at least once" and qualifies as an event.
• Theorems are consequences that follow logically from definitions and axioms. Each
theorem has a proof that refers to definitions, axioms, and other theorems.
For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions, which act as the axioms of probability:
1- P(A) ≥ 0 (1.2)
2- P(Ω) = 1 (1.3)
3- For any countable collection A1, A2, · · · of mutually exclusive events,
P(∪_i Ai) = Σ_i P(Ai)
(Note that (3) states that if A and B are mutually exclusive (M.E.) events, then P(A ∪ B) = P(A) + P(B).)
Some immediate consequences of the axioms:
• Since A and Ā are M.E. and A ∪ Ā = Ω, we have P(A) + P(Ā) = P(A ∪ Ā) = P(Ω) = 1, so that P(Ā) = 1 − P(A).
• Similarly, for any A, A ∩ {φ} = {φ}; hence it follows that P(A ∪ {φ}) = P(A) + P(φ). But A ∪ {φ} = A, and therefore
P(φ) = 0
• Suppose A and B are not mutually exclusive (M.E.). How does one compute P(A ∪ B)? To apply the axioms we must first re-express A ∪ B as a union of M.E. events. From the figure below,
A ∪ B = A ∪ ĀB
where A and ĀB are clearly M.E. events. Thus, using axiom (3),
P(A ∪ B) = P(A ∪ ĀB) = P(A) + P(ĀB)
To express P(ĀB) in terms of P(B) and P(AB), note that
B = B ∩ Ω = B ∩ (A ∪ Ā) = (B ∩ A) ∪ (B ∩ Ā) = BA ∪ BĀ
Thus P(B) = P(BA) + P(BĀ), so that P(ĀB) = P(B) − P(AB). Therefore
P(A ∪ B) = P(A) + P(B) − P(AB)
For example, with P(A) = P(B) = 1/2 and P(AB) = 1/4,
P(A ∪ B) = P(A) + P(B) − P(AB) = 1/2 + 1/2 − 1/4 = 3/4
1.1.8 Theorem
For an event space B = {B1, B2, · · · } and any event A in the sample space, let Ci = A ∩ Bi. For i ≠ j, the events Ci and Cj are mutually exclusive, and A = C1 ∪ C2 ∪ · · ·, so that P(A) = Σ_i P(A ∩ Bi).
Example 4
Coin tossing (four flips of a fair coin): let A equal the set of outcomes with fewer than three heads,
A = {tttt, httt, thtt, ttht, ttth, hhtt, htht, htth, tthh, thth, thht}
Let {B0, B1, B2, B3, B4} denote the event space in which Bi = {outcomes with exactly i heads}. With Ci = A ∩ Bi,
A = C0 ∪ C1 ∪ C2 ∪ C3 ∪ C4
= (A ∩ B0) ∪ (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3) ∪ (A ∩ B4)
This example states that the event "fewer than three heads" is the union of the events "zero heads", "one head", and "two heads" (the intersections with B3 and B4 are empty).
Example 5
A company has a model of telephone usage. It classifies all calls as L (long) or B (brief). It also observes whether calls carry voice (V), fax (F), or data (D). The sample space has six outcomes, S = {LV, BV, LD, BD, LF, BF}, and the probabilities can be arranged in a table with rows L, B and columns V, F, D. The events {V, F, D} form an event space in the sense of the previous theorem (and L plays the role of the event A). Thus we can apply the theorem to find
P(L) = P(LV) + P(LF) + P(LD)
In N independent trials, suppose NA, NB, and NAB denote the number of times the events A, B, and AB occur, respectively. According to the relative-frequency interpretation, for large N
P(A) ≈ NA/N,  P(B) ≈ NB/N,  P(AB) ≈ NAB/N
Among the NA occurrences of A, only NAB of them are also found among the NB occurrences of B. Thus the ratio
NAB/NB = (NAB/N)/(NB/N) = P(AB)/P(B)
is a measure of the event A given that B has already occurred. We denote this conditional probability by P(A|B) = probability of the event A given that B has occurred.
We define
P(A|B) = P(AB)/P(B) (1.5)
provided P(B) ≠ 0. As we show below, the above definition satisfies all probability axioms:
1. Non-negative:
P(A|B) = P(AB)/P(B) ≥ 0, since P(AB) ≥ 0 and P(B) > 0
2. Normalization:
P(Ω|B) = P(ΩB)/P(B) = P(B)/P(B) = 1, since ΩB = B
3. Suppose A ∩ C = φ; then, since AB and CB are also mutually exclusive,
P(A ∪ C|B) = P((A ∪ C)B)/P(B) = P(AB)/P(B) + P(CB)/P(B) = P(A|B) + P(C|B)
Hence P(·|B) satisfies all the axioms and is itself a legitimate probability measure.
Properties of conditional probability:
1. If B ⊂ A, then AB = B, and
P(A|B) = P(AB)/P(B) = P(B)/P(B) = 1
2. If A ⊂ B, then AB = A, and
P(A|B) = P(AB)/P(B) = P(A)/P(B) > P(A)
For example, roll a fair die and let A = {outcome is 2} and B = {outcome is even}, so that A ⊂ B. The statement that B has occurred (outcome is even) makes the probability of A increase from P(A) = 1/6 to P(A|B) = 1/3.
• Law of Total Probability: conditional probability lets us express the probability of a complicated event in terms of simpler related events. Let A1, A2, · · · , An be mutually exclusive events with
Ai ∩ Aj = φ for i ≠ j and ∪_{i=1}^{n} Ai = Ω
Then, since B = B ∩ Ω = ∪_{i=1}^{n} BAi with the BAi mutually exclusive,
P(B) = Σ_{i=1}^{n} P(BAi) = Σ_{i=1}^{n} P(B|Ai)P(Ai)
• Independence: events A and B are said to be statistically independent if
P(AB) = P(A)P(B)
Notice that the above definition is a probabilistic statement, NOT a set-theoretic notion such as mutual exclusiveness (independent and disjoint are not synonyms).
1.1.11 More on Independence
– If A and B are independent, P(AB) is obtained from P(A) and P(B) by a simple multiplication.
– Independent events cannot be mutually exclusive, since P (A) > 0, P (B) > 0, and
A, B independent implies P (AB) > 0, thus the event AB cannot be the null set.
Thus if A and B are independent, the event that B has occurred does not shed
any more light into the event A. It makes no difference to A whether B has
occurred or not.
Example 6
A box contains 6 white and 4 black balls. Remove two balls at random without
replacement. What is the probability that the first one is white and the second
one is black?
Let W1 = “first ball removed is white” and B2 = “second ball removed is black”.
We need P(W1 B2) = P(B2|W1)P(W1). But
P(W1) = 6/(6 + 4) = 6/10 = 3/5
and
P(B2|W1) = 4/(5 + 4) = 4/9
and hence
P(W1 B2) = P(B2|W1)P(W1) = (4/9) · (3/5) = 4/15 ≈ 0.267
Are the events W1 and B2 independent? Our common sense says No. To verify
this we need to compute P (B2). Of course the fate of the second ball very much
depends on that of the first ball. The first ball has two options: W1 = "first ball is white" or W̄1 = "first ball is black". Using the law of total probability,
P(B2) = P(B2|W1)P(W1) + P(B2|W̄1)P(W̄1) = (4/9)(3/5) + (3/9)(4/10) = 2/5
and
P(B2)P(W1) = (2/5)(3/5) = 6/25 ≠ P(B2 W1) = 4/15
As expected, the events W1 and B2 are dependent.
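The dependence of W1 and B2 can also be checked numerically. The following simulation is a sketch added for illustration (not part of the original notes); it draws two balls without replacement many times and estimates P(W1), P(B2) and P(W1 B2).

import random

def draw_two_balls():
    # 6 white ('W') and 4 black ('B') balls, drawn without replacement
    box = ['W'] * 6 + ['B'] * 4
    random.shuffle(box)
    return box[0], box[1]

N = 200_000
count_w1 = count_b2 = count_w1b2 = 0
for _ in range(N):
    first, second = draw_two_balls()
    count_w1 += (first == 'W')
    count_b2 += (second == 'B')
    count_w1b2 += (first == 'W' and second == 'B')

print("P(W1)    ~", count_w1 / N)        # theory: 3/5 = 0.6
print("P(B2)    ~", count_b2 / N)        # theory: 2/5 = 0.4
print("P(W1 B2) ~", count_w1b2 / N)      # theory: 4/15 ~ 0.267
print("P(W1)P(B2) =", (count_w1 / N) * (count_b2 / N))  # ~0.24, not 0.267 -> dependent

The estimate of P(W1 B2) is close to 4/15 and differs from P(W1)P(B2), confirming the dependence.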
• Bayes' theorem: since
P(A|B) = P(AB)/P(B) ⇒ P(AB) = P(A|B)P(B)
and similarly,
P(B|A) = P(BA)/P(A) = P(AB)/P(A) ⇒ P(AB) = P(B|A)P(A)
we get
P(A|B)P(B) = P(B|A)P(A)
or
P(A|B) = [P(B|A)/P(B)] · P(A) (1.6)
In Equation (1.6), P(A) represents the a-priori probability of the event A. Suppose B has occurred, and assume that A and B are not independent. How can this new information be used to update our knowledge about A? Bayes' rule takes into account the new information ("B has occurred") and gives out the a-posteriori probability of A given B.
We can also view the event B as new knowledge obtained from a fresh experiment; that knowledge should update our a-priori information about A. Bayes' theorem gives the exact mechanism for incorporating such new information.
More generally, let A1, A2, · · · , An be a partition of Ω with associated a-priori probabilities P(Ai), i = 1, · · · , n. With the new information "B has occurred", the a-posteriori probabilities are
P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^{n} P(B|Aj)P(Aj) (1.8)
Example 7
Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box (B1) has 15 defective bulbs and the second 5. Suppose a box is selected at random and one bulb is picked out.
(a) What is the probability that it is defective?
Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box
B2 has 195 good and 5 defective bulbs. Let D = “Defective bulb is picked out”.
Then,
P(D|B1) = 15/100 = 0.15,  P(D|B2) = 5/200 = 0.025
Since a box is selected at random, they are equally likely: P(B1) = P(B2) = 1/2.
Thus B1 and B2 form a partition, and using the Law of Total Probability, we obtain
P(D) = P(D|B1)P(B1) + P(D|B2)P(B2) = 0.15 × 1/2 + 0.025 × 1/2 = 0.0875
(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1, i.e., P(B1|D)?
Using Bayes' rule (1.8),
P(B1|D) = P(D|B1)P(B1) / P(D) = (0.15 × 1/2) / 0.0875 ≈ 0.857
Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on which box was picked? Since P(B1|D) ≈ 0.857 > 0.5, it is indeed more likely at this point that we chose box 1 rather than box 2. (Recall that box 1's proportion of defective bulbs is six times that of box 2.)
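The total-probability and Bayes computations of Example 7 are easy to reproduce. A minimal Python sketch (added for illustration; the numbers are those of the example):

# Total probability and Bayes' rule for Example 7 (two boxes of bulbs)
p_box = {"B1": 0.5, "B2": 0.5}           # box selected at random
p_def_given_box = {"B1": 15 / 100,       # P(D|B1) = 0.15
                   "B2": 5 / 200}        # P(D|B2) = 0.025

# Law of total probability: P(D) = sum_i P(D|Bi) P(Bi)
p_def = sum(p_def_given_box[b] * p_box[b] for b in p_box)
print("P(D) =", p_def)                   # 0.0875

# Bayes' rule: P(B1|D) = P(D|B1) P(B1) / P(D)
p_b1_given_def = p_def_given_box["B1"] * p_box["B1"] / p_def
print("P(B1|D) =", round(p_b1_given_def, 4))   # ~0.857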
Example 8
Suppose you have two coins, one biased, one fair, but you don't know which coin is which. Coin 1 is biased: it comes up heads with probability 3/4, while coin 2 comes up heads with probability 1/2. Suppose you pick a coin at random and flip it. Let Ci denote the event that coin i is picked. Let H and T denote the possible outcomes of the flip. Given that the outcome of the flip is a head, what is P[C1|H], the probability that you picked the biased coin? Given that the outcome is a tail, what is P[C1|T]?
Solution: First, we construct the sample tree (branch probabilities P[C1] = P[C2] = 1/2, P[H|C1] = 3/4, P[H|C2] = 1/2). To find the conditional probabilities, we see
P[C1|H] = P[H|C1]P[C1] / (P[H|C1]P[C1] + P[H|C2]P[C2]) = (3/4 · 1/2) / (3/4 · 1/2 + 1/2 · 1/2) = 3/5
Similarly,
P[C1|T] = P[T|C1]P[C1] / (P[T|C1]P[C1] + P[T|C2]P[C2]) = (1/4 · 1/2) / (1/4 · 1/2 + 1/2 · 1/2) = 1/3
As we would expect, we are more likely to have chosen coin 1 when the first flip is heads, but more likely to have chosen coin 2 when the first flip is tails.
Lecture 2
Random Variables
2.1 Introduction
Let Ω be sample space of a probability model, and X a function that maps every ζ ∈ Ω, to
a unique point x ∈ R, the set of real numbers. Since the outcome ζ is not certain, so is the
value X(ζ) = x. Thus if B is some subset of R, we may want to determine the probability
of "X(ζ) ∈ B". To determine this probability, we can look at the set A = X^{-1}(B) ⊂ Ω, the inverse image of B under the mapping X. Obviously, if the set A = X^{-1}(B) is an event, the probability of A is well defined; in this case P(X(ζ) ∈ B) = P(A).
However, X^{-1}(B) may not always be an event for every subset B, thus creating difficulties. The notion of random variable (RV) makes sure that the inverse mapping always results in an event.
Random Variable (RV): a finite single-valued function X(·) that maps the set of all experimental outcomes Ω into the set of real numbers R is said to be a RV if the set {ζ : X(ζ) ≤ x} is an event for every x ∈ R.
The random variable X is specified by the function X(ζ) that maps the sample outcome ζ to a real number; the notation {X = x} is shorthand for the event
{X = x} = {ζ ∈ Ω|X(ζ) = x}
Since all events have well-defined probability, the probability of the event {ζ|X(ζ) ≤ x} is well defined for every x; we denote it
FX(x) = P{X(ζ) ≤ x}
The role of the subscript X is only to identify the actual RV. FX(x) is said to be the cumulative distribution function (CDF) of the RV X, and it satisfies
FX(+∞) = 1, FX(−∞) = 0
If x1 < x2, then the interval (−∞, x1] ⊂ (−∞, x2]. Consequently the event {ζ|X(ζ) ≤ x1} ⊂ {ζ|X(ζ) ≤ x2}, so that FX(x1) ≤ FX(x2), implying that the probability distribution function is nonnegative and monotone nondecreasing.
• Theorem: P(a < X(ζ) ≤ b) = FX(b) − FX(a).
To prove this theorem, express the event Eab = {a < X ≤ b} as part of a union of disjoint events. Starting with the event Eb = {X ≤ b}, note that Eb can be written as the union
Eb = {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} = Ea ∪ Eab
Note also that Ea and Eab are disjoint, so that P(Eb) = P(Ea) + P(Eab). Since P(Eb) = FX(b) and P(Ea) = FX(a), we can write FX(b) = FX(a) + P(a < X ≤ b), which gives the result.
• If FX(x0) = 0 for some x0, then FX(x) = 0 for all x ≤ x0. This follows since FX(x0) = P(X(ζ) ≤ x0) = 0 implies {X(ζ) ≤ x0} is the null set, and hence so is {X(ζ) ≤ x} for every x ≤ x0.
• P{X(ζ) > x} = 1 − FX(x). We have {X(ζ) ≤ x} ∪ {X(ζ) > x} = Ω, and since the two events are mutually exclusive, P{X(ζ) ≤ x} + P{X(ζ) > x} = 1.
• P{x1 < X(ζ) ≤ x2} = FX(x2) − FX(x1), x2 > x1.
The events {X(ζ) ≤ x1} and {x1 < X(ζ) ≤ x2} are mutually exclusive and their union is {X(ζ) ≤ x2}, so FX(x2) = FX(x1) + P{x1 < X(ζ) ≤ x2}, which gives the result.
• FX(x0+), the limit of FX(x) as x → x0 from the right, always exists and equals FX(x0); however, FX need not be continuous from the left. At a discontinuity point of the distribution, the left and right limits are different, and
P{X = x0} = FX(x0) − FX(x0−)
Thus the only discontinuities of a distribution function are of the jump type. The CDF is continuous from the right. Keep in mind that the CDF always takes on the upper value at a jump point.
Example 1
Let X(ζ) = c, a constant for every outcome. Find FX(x).
Solution: For x < c, {X(ζ) ≤ x} = φ, so that FX(x) = 0; and for x ≥ c, {X(ζ) ≤ x} = Ω, so that FX(x) = 1 (Figure 2.1).
Example 2
Figure 2.1: CDF for example 1
Toss a coin and let Ω = {H, T} with P(T) = q. Suppose the RV X is such that X(T) = 0 and X(H) = 1. Find FX(x).
Solution: For x < 0, {X(ζ) ≤ x} = φ, so FX(x) = 0; for 0 ≤ x < 1, {X(ζ) ≤ x} = {T}, so FX(x) = P(T) = q; and for x ≥ 1, {X(ζ) ≤ x} = Ω, so FX(x) = 1.
• If FX(x) is constant except for a finite number of jump discontinuities (piecewise constant; discrete-type RV), and xi is such a jump point, then
pi = P{X = xi} = FX(xi) − FX(xi−)
For Example 1,
P{X = c} = FX(c) − FX(c−) = 1 − 0 = 1
and for Example 2,
P{X = 0} = FX(0) − FX(0−) = q − 0 = q
Example 3
A fair coin is tossed twice, and let the RV X represent the number of heads. Find FX (x).
Solution: X(TT) = 0, X(TH) = X(HT) = 1, X(HH) = 2, so P{X = 0} = 1/4, P{X = 1} = 1/2, P{X = 2} = 1/4, and FX(x) is the staircase function shown in Figure 2.3. For example, the jump at x = 1 is
P{X = 1} = FX(1) − FX(1−) = 3/4 − 1/4 = 1/2
The first derivative of the distribution function FX(x) is called the probability density function (pdf) of the RV X:
fX(x) = dFX(x)/dx
Figure 2.3: CDF for example 3
• Discrete RV: if X is a discrete-type RV, then its density function has the general form
fX(x) = Σ_i pi δ(x − xi)
where the xi represent the jump-discontinuity points in FX(x). As Fig. 2.4 shows, fX(x) for a discrete RV is a train of impulses located at those points.
Since FX(+∞) = 1, it follows that
∫_{−∞}^{+∞} fX(u) du = 1
Also, from the theorem above,
P(x1 < X ≤ x2) = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx
Thus the area under fX(x) in the interval (x1, x2) represents the probability in the above equation.
• Often, RV s are referred by their specific density functions - both in the continuous and
discrete cases - and in what follows we shall list a number of them in each category.
• Normal (Gaussian): X ∼ ℵ(µ, σ²) if
fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]  (2.3)
This is a bell-shaped curve, symmetric around the parameter µ, and its distribution function is given by
FX(x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(y − µ)²/(2σ²)] dy = Φ((x − µ)/σ)  (2.4)
where Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−y²/2) dy is called the standard normal CDF and is often tabulated. Figure 2.6 shows the pdf and cdf of the Normal distribution for different means and variances.
Figure 2.6: pdf and cdf of Normal distribution for different means and variances
For X ∼ ℵ(µ, σ²),
P(a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ)
The complementary CDF is
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−y²/2) dy = 1 − Φ(x)
Q(x) is called the standard normal complementary CDF, and Q(x) = 1 − Φ(x). Since
Y = (X − µ)/σ ∼ ℵ(0, 1)  (2.5)
probabilities for any normal RV can be computed from standard normal tables. More generally,
aX + b ∼ ℵ(aµ + b, a²σ²)
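Φ(x) and Q(x) are easy to evaluate numerically. The short Python sketch below is an addition to the notes; it builds Φ from the standard error function math.erf (for which Φ(x) = 0.5[1 + erf(x/√2)]) and computes P(a < X < b) for illustrative values of µ and σ chosen here, not taken from the notes.

import math

def Phi(x):
    # standard normal CDF via the standard error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    # standard normal complementary CDF
    return 1.0 - Phi(x)

def prob_interval(a, b, mu, sigma):
    # P(a < X < b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma)
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

mu, sigma = 0.0, 1.0                      # assumed example values
print(prob_interval(-1, 1, mu, sigma))    # ~0.6827 (one-sigma probability)
print(Q(3.0))                             # ~0.00135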
• Uniform: X ∼ U(a, b), a < b, if fX(x) = 1/(b − a) for a ≤ x ≤ b and 0 elsewhere.
Figure 2.7: pdf and cdf of Uniform distribution
• Exponential: X ∼ E(λ) if
fX(x) = { (1/λ) exp(−x/λ), x ≥ 0;  0, elsewhere }  (2.7)
Figure 2.8 shows the pdf and cdf of the Exponential distribution for different values of the parameter λ.
• Chi-square: when the RV X is defined as X = Σ_{i=1}^{n} Xi², where the Xi, i = 1, · · · , n, are statistically independent ℵ(0, 1) RVs, then X is said to have a chi-square distribution with n degrees of freedom. Γ(·) is the Gamma function, given by
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0, with Γ(1/2) = √π
Let Y = X1² + X2², where X1 and X2 ∼ ℵ(0, σ²) and are independent. Then Y is chi-square distributed with 2 degrees of freedom, and
fY(y) = (1/(2σ²)) exp(−y/(2σ²)), y ≥ 0
Now, suppose we define a new RV as R = √Y; then R is Rayleigh distributed.
2.4 Discrete-type Random Variables
• Bernoulli: X takes the values 0 and 1, with
P(X = 0) = q, P(X = 1) = p, p = 1 − q
• Binomial: X ∼ B(n, p) if
P(X = k) = C(n, k) p^k q^{n−k}, k = 0, 1, 2, · · · , n
where C(n, k) is the binomial coefficient.
• Poisson: X ∼ P(λ) if
P(X = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, · · ·
• Uniform (discrete):
P(X = k) = 1/n, k = 1, 2, · · · , n
• Geometric:
P(X = k) = (1 − p)^{k−1} p, k = 1, 2, · · ·
where the parameter p ∈ (0, 1) is the probability that a head appears on each individual toss.
• The Poisson RV is a good model for counting events that occur randomly in time. While the time of each occurrence is completely random, there is a known average number of occurrences per unit time; an example is the arrival of telephone calls at a switching office.
For example, calls arrive at random times at a telephone switching office with an average
of λ = 0.25 calls/second. The pmf of the number of calls that arrive in a T = 2 second interval is
PK(k) = { (0.5)^k e^{−0.5} / k!, k = 0, 1, 2, · · · ;  0, o.w. }
Example (repetition coding): to reduce the effect of channel errors, suppose we transmit the same binary symbol 5 times. Thus "zero" is transmitted as 00000 and "one" is transmitted as 11111. The receiver detects the correct information if three or more binary symbols are received correctly. What is the information error probability P(E), if the binary symbol error probability is 0.1?
In this case, we have five trials corresponding to five transmissions. On each trial the symbol is received in error with probability q = 0.1, independently of the other trials, so the number of symbol errors is binomial B(5, 0.1). An information error E occurs when three or more of the five symbols are in error; the dominant contribution is C(5, 3)(0.1)³(0.9)² = 0.0081.
By increasing the number of transmissions (5 times), the probability of error is reduced from 0.1 to about 0.0081.
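The exact error probability of the repetition code is a short binomial sum. The sketch below (an addition, not part of the notes) computes both the leading k = 3 term quoted above and the full sum P(3 or more symbol errors), which is only slightly larger.

from math import comb

q = 0.1              # binary symbol error probability
n = 5                # repetition factor

# information error: 3 or more of the 5 transmitted symbols received in error
p_error = sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(3, n + 1))
leading_term = comb(n, 3) * q**3 * (1 - q)**2

print("leading (k=3) term:", leading_term)   # 0.0081
print("exact P(E)        :", p_error)        # ~0.00856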
A Bernoulli trial consists of repeated independent and identical experiments, each of which has only two outcomes, A or Ā, with P(A) = p and P(Ā) = q. The probability of exactly k occurrences of A in n trials is the binomial pmf Pn(k) = C(n, k) p^k q^{n−k}. Let
Xk = {exactly k occurrences of A in n trials}  (2.10)
Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, · · · , n, one of the events X0, X1, · · · , Xn must occur, i.e.
P(X0 ∪ X1 ∪ X2 ∪ · · · ∪ Xn) = 1 (2.11)
For a given n and p, what is the most likely value of k? The most probable value of k is the number which maximizes the Binomial pmf. To obtain this value, consider the ratio
Pn(k)/Pn(k − 1) = (n − k + 1)p / (kq)
Thus Pn(k) > Pn(k − 1) if k(1 − p) < (n − k + 1)p, or k < (n + 1)p. Thus, Pn(k) as a function of k increases until
k = (n + 1)p
if it is an integer, or otherwise until the largest integer kmax less than (n + 1)p, and kmax represents the most likely number of occurrences of A in n trials.
Example 4
In a Bernoulli experiment with n trials, find the probability that the number of occurrences
of A is between k1 and k2 .
Solution: with Xi, i = 0, 1, 2, · · · , n as defined in Equation (2.10), clearly they are mutually exclusive events. Thus
P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} Pn(k) = Σ_{k=k1}^{k2} C(n, k) p^k q^{n−k}  (2.14)
Example 5
Suppose 5, 000 components are ordered. The probability that a part is defective equals 0.1.
What is the probability that the total number of defective parts does not exceed 400?
Solution: Let X denote the number of defective parts among the 5000, so X ∼ B(5000, 0.1) and
P(X ≤ 400) = Σ_{k=0}^{400} C(5000, k) (0.1)^k (0.9)^{5000−k}
The above equation has too many terms to compute directly. Clearly, we need a technique to compute this quantity in a more convenient manner; the Gaussian approximation to the Binomial turns out to be useful.
Thus if k1 and k2 in Equation (2.14) are within or around the neighborhood of the interval (np − √(npq), np + √(npq)), we can approximate the summation in Equation (2.14) by an integration as
P(k1 < X < k2) ≈ ∫_{k1}^{k2} (1/√(2πnpq)) exp[−(x − np)²/(2npq)] dx = ∫_{x1}^{x2} (1/√(2π)) exp(−y²/2) dy  (2.16)
where
x1 = (k1 − np)/√(npq),  x2 = (k2 − np)/√(npq)
We can express Equation (2.16) in terms of the normalized integral that has been tabulated:
erf(x) = (1/√(2π)) ∫_{0}^{x} e^{−y²/2} dy = −erf(−x)  (2.17)
Example 6
A fair coin is tossed 5, 000 times. Find the probability that the number of heads is between
2, 475 to 2, 525.
Solution: We need P (2475 ≤ X ≤ 2525). Here n is large so that we can use the normal
Figure 2.10: pdf of Gaussian approximation.
approximation. In this case p = 1/2, so that np = 2500 and √(npq) ≃ 35. Since np − √(npq) ≃ 2465 and np + √(npq) ≃ 2535, the approximation is valid for k1 = 2475 and k2 = 2525. Thus
P(k1 < X < k2) ≈ ∫_{x1}^{x2} (1/√(2π)) exp(−y²/2) dy  (2.18)
where
x1 = (k1 − np)/√(npq) = −5/7,  x2 = (k2 − np)/√(npq) = 5/7
so that
P(2475 ≤ X ≤ 2525) = erf(x2) − erf(x1) = erf(x2) + erf(|x1|) = 2 erf(5/7) ≈ 0.516
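The value 2·erf(5/7) can be reproduced numerically. The sketch below (added, not from the notes) implements the notes' tabulated function erf(x) = Φ(x) − 1/2 on top of Python's standard math.erf and evaluates the approximation of Example 6.

import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def erf_notes(x):
    # the normalized integral (2.17) used in the notes: Phi(x) - 1/2
    return Phi(x) - 0.5

n, p = 5000, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
k1, k2 = 2475, 2525
x1, x2 = (k1 - mu) / sigma, (k2 - mu) / sigma

approx = erf_notes(x2) - erf_notes(x1)       # = 2 * erf_notes(~5/7)
print("x2 =", round(x2, 4))                  # ~0.707 (close to 5/7)
print("normal approximation:", round(approx, 3))   # ~0.52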
Figure 2.11: The standard normal CDF Φ(z)
Figure 2.12: The standard normal complementary CDF Q(z)
Lecture 3
3.1 Mean of a RV
For a RV X, its pdf fX (x) represents complete information about it. Note that fX (x)
represents very detailed information, and quite often it is desirable to characterize the r.v
in terms of its average behavior. In this context, we will introduce two parameters - mean
and variance - that are universally used to represent the overall properties of the RV and its pdf.
Mean represents the average value of the RV in a very large number of trials. For a continuous RV it is defined as
µ = E[X] = ∫_{−∞}^{∞} x fX(x) dx
For example, for the exponential RV,
E[X] = ∫_{0}^{∞} (x/λ) e^{−x/λ} dx = λ ∫_{0}^{∞} y e^{−y} dy = λ
implying that the parameter λ represents the mean value of the exponential RV.
• For the normal RV,
X̄ = E[X] = (1/√(2πσ²)) ∫_{−∞}^{∞} x exp[−(x − µ)²/(2σ²)] dx = (1/√(2πσ²)) ∫_{−∞}^{∞} (y + µ) exp(−y²/(2σ²)) dy
= (1/√(2πσ²)) ∫_{−∞}^{∞} y exp(−y²/(2σ²)) dy + µ (1/√(2πσ²)) ∫_{−∞}^{∞} exp(−y²/(2σ²)) dy = µ  (3.5)
where the first integral in Equation (3.5) is zero (odd integrand) and the second is 1. Thus the first parameter in ℵ(µ, σ²) is in fact the mean of the Gaussian RV.
Given X ∼ fX(x), suppose Y = g(X) defines a new RV with pdf fY(y). Then, from the definition of the mean,
µY = E[Y] = ∫_{−∞}^{∞} y fY(y) dy
From the above, it appears that to determine E[Y] we need to determine fY(y). However, this is not the case if only E[Y] is the quantity of interest. Instead, we can obtain E[Y] as
µY = E[Y] = E[g(X)] = ∫_{−∞}^{∞} y fY(y) dy = ∫_{−∞}^{∞} g(x) fX(x) dx  (3.7)
In the discrete case,
µY = E[Y] = Σ_i g(xi) P(X = xi)  (3.8)
Therefore, fY(y) is not required to evaluate E[Y] for Y = g(X). As an example, we can determine the second moment of the Poisson RV directly: with g(X) = X²,
E[X²] = Σ_{k=0}^{∞} k² e^{−λ} λ^k / k! = λ² + λ
i.e., the Poisson second moment is λ² + λ.
3.2 Variance of a RV
The mean alone cannot truly represent the pdf of a RV. As an example to illustrate this, consider two Gaussian RVs X1 ∼ ℵ(0, 1) and X2 ∼ ℵ(0, 10). Both of them have the same mean. However, as Figure 3.1 shows, their pdfs are quite different: one is more concentrated around the mean, whereas the other one has a wider spread. Clearly, we need at least one additional parameter to measure this spread around the mean.
For a RV X with mean µ, X − µ represents the deviation of the RV from its mean. Since this deviation can be either positive or negative, consider the quantity (X − µ)², whose average value E[(X − µ)²] represents the average mean-square deviation of X around its mean. Define
σX² = E[(X − µ)²] > 0  (3.10)
Figure 3.1: pdfs of two zero-mean Gaussian RVs with σ² = 1 and σ² = 4 (both µ = 0)
σX² is known as the variance of the RV X, and its square root σX = √(E[(X − µ)²]) is known as the standard deviation of X. Note that the standard deviation represents the root-mean-square spread of the RV around its mean. Expanding the square,
σX² = E(X²) − [E(X)]²
For the Poisson RV, using E(X²) = λ² + λ and E(X) = λ,
σX² = E(X²) − [E(X)]² = (λ² + λ) − λ² = λ
Thus for a Poisson RV, mean and variance are both equal to its parameter λ.
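The equality of the Poisson mean and variance is easy to confirm by simulation. A minimal numpy sketch (an addition to the notes; λ = 3 is an arbitrary choice for the check):

import numpy as np

rng = np.random.default_rng(0)
lam = 3.0                                # arbitrary Poisson parameter
x = rng.poisson(lam, size=1_000_000)

print("sample mean    :", x.mean())      # ~ lambda
print("sample variance:", x.var())       # ~ lambda as well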
• The variance of the normal RV ℵ(µ, σ²) can be obtained as
Var(X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx  (3.13)
To simplify, we can use the normalization
∫_{−∞}^{∞} fX(x) dx = (1/√(2πσ²)) ∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx = 1
which gives
∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx = √(2π) σ
Differentiating both sides with respect to σ and rearranging,
∫_{−∞}^{∞} ((x − µ)²/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx = σ²
which is exactly Var(X) in Equation (3.13). Thus for a normal RV ℵ(µ, σ²),
Var(X) = σ²
and therefore the second parameter in ℵ(µ, σ²) in fact represents the variance. As Figure 4.1 shows, the larger the σ, the larger the spread of the pdf around its mean. Thus as the variance of a RV tends to zero, the pdf concentrates more and more around its mean.
3.3 Moments
The quantities
mn = E[X^n]
are known as the moments of the RV X, and
µn = E[(X − µ)^n]
are known as the central moments of X. Clearly, the mean µ = m1 and the variance σ² = µ2. It is easy to relate mn and µn. In fact, for n > 1, using the binomial expansion
(a + b)^n = Σ_{k=0}^{n} C(n, k) a^k b^{n−k}
we get
µn = E[(X − µ)^n] = E[ Σ_{k=0}^{n} C(n, k) X^k (−µ)^{n−k} ]
= Σ_{k=0}^{n} C(n, k) E(X^k) (−µ)^{n−k} = Σ_{k=0}^{n} C(n, k) mk (−µ)^{n−k}  (3.15)
Direct calculation is often a tedious procedure for computing the mean and variance, and in this context the notion of the characteristic function can be quite helpful. It is defined as
ΦX(ω) = E[e^{jωX}] = ∫_{−∞}^{∞} e^{jωx} fX(x) dx
For a discrete RV taking integer values,
ΦX(ω) = Σ_k e^{jkω} P(X = k)  (3.17)
• If X ∼ P(λ) (Poisson distribution), then its characteristic function is given by
ΦX(ω) = Σ_{k=0}^{∞} e^{jkω} e^{−λ} λ^k / k! = e^{−λ} Σ_{k=0}^{∞} (λe^{jω})^k / k! = e^{−λ} e^{λe^{jω}} = e^{λ(e^{jω} − 1)}  (3.18)
Moments are obtained by differentiating the characteristic function at ω = 0:
E(X) = (1/j) ∂ΦX(ω)/∂ω |_{ω=0}  (3.21)
E(X²) = (1/j²) ∂²ΦX(ω)/∂ω² |_{ω=0}  (3.22)
E(X^k) = (1/j^k) ∂^k ΦX(ω)/∂ω^k |_{ω=0}, k ≥ 1  (3.23)
We can use Equations (3.20)-(3.22) to compute the mean, variance and other higher order
moments of any RV X.
• If X ∼ P(λ), then from Equation (3.18),
∂ΦX(ω)/∂ω = e^{−λ} e^{λe^{jω}} λ j e^{jω}  (3.24)
so that from Equation (3.21),
E(X) = λ
which agrees with our earlier derivation in Equation (3.3). Differentiating Equation (3.24) once more,
∂²ΦX(ω)/∂ω² = e^{−λ} [ e^{λe^{jω}} (λ j e^{jω})² + e^{λe^{jω}} λ j² e^{jω} ]  (3.25)
so that from Equation (3.22),
E(X²) = λ² + λ
which again agrees with the results in Equation (3.3). Notice that, compared to the tedious direct calculation, the effort involved when using the characteristic function is very minimal.
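The differentiation can also be delegated to a computer algebra system. The sympy sketch below (an addition to the notes) reproduces E(X) = λ and E(X²) = λ² + λ directly from the Poisson characteristic function (3.18) and the moment formulas (3.21)-(3.22).

import sympy as sp

w, lam = sp.symbols('omega lambda', real=True, positive=True)
j = sp.I

Phi = sp.exp(lam * (sp.exp(j * w) - 1))          # Poisson characteristic function (3.18)

EX  = (sp.diff(Phi, w, 1) / j**1).subs(w, 0)     # first moment, Eq. (3.21)
EX2 = (sp.diff(Phi, w, 2) / j**2).subs(w, 0)     # second moment, Eq. (3.22)

print(sp.simplify(EX))     # lambda
print(sp.simplify(EX2))    # lambda**2 + lambda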
• We can use the characteristic function of the binomial RV B(n, p) in Equation (3.19), ΦX(ω) = (pe^{jω} + q)^n. Its first derivative is
∂ΦX(ω)/∂ω = n(pe^{jω} + q)^{n−1} p j e^{jω}
so that from Equation (3.21), E(X) = np, which is the same as the previous calculation. Differentiating once more and using Equation (3.22), we obtain the second moment of the binomial RV to be
E(X²) = n(n − 1)p² + np = n²p² + npq
Therefore, we obtain the variance of the binomial RV to be
σX² = E(X²) − [E(X)]² = n²p² + npq − n²p² = npq
• To obtain the characteristic function of the Gaussian RV X ∼ ℵ(µ, σ²), we can make use of the definition and complete the square in the exponent (let y − jσ²ω = z, so that y = z + jσ²ω):
ΦX(ω) = e^{jµω} (1/√(2πσ²)) ∫_{−∞}^{∞} e^{−(z + jσ²ω)(z − jσ²ω)/(2σ²)} dz
= e^{jµω} e^{−σ²ω²/2} (1/√(2πσ²)) ∫_{−∞}^{∞} e^{−z²/(2σ²)} dz = e^{jµω − σ²ω²/2}  (3.28)
Notice that the characteristic function of a Gaussian RV itself has the "Gaussian" bell shape. For a zero-mean Gaussian,
fX(x) = (1/√(2πσ²)) e^{−x²/(2σ²)},  ΦX(ω) = e^{−σ²ω²/2}
From Fig. 10, the reverse roles of σ² in fX(x) and ΦX(ω) are noteworthy (σ² vs. 1/σ²).
We conclude this section with a bound that estimates the dispersion of the r.v beyond a
certain interval centered around its mean. Since σ 2 measures the dispersion of the RV X
around its mean µ, we expect this bound to depend on σ 2 as well. Consider an interval of
width 2ε symmetrically centered around its mean µ, as shown in Figure 3.2. What is the probability that X falls outside this interval,
P(|X − µ| ≥ ε) = ?  (3.29)
Figure 3.2: Chebyshev inequality concept
It can be shown that
P(|X − µ| ≥ ε) ≤ σ²/ε²  (3.31)
Equation (3.31) is known as the Chebyshev inequality. Interestingly, to compute the above probability bound, the knowledge of fX(x) is not necessary; we only need σ², the variance of X. Setting ε = kσ in (3.31),
P(|X − µ| ≥ kσ) ≤ 1/k²  (3.32)
Thus with k = 3, we get the probability of X being outside the 3σ interval around its mean to be at most 0.111 for any RV. Obviously this cannot be a tight bound, since it must hold for all RVs. For a Gaussian RV, for example, the probability of being outside the 3σ interval is only about 0.0027, which is much smaller than the bound given by Equation (3.32). The Chebyshev inequality always holds, but the bound it provides may be quite loose.
Example 1
If the height X of a randomly chosen adult has expected value E[X] = 5.5 feet and standard
deviation σX = 1 foot, use the Chebyshev inequality to find an upper bound on P (X ≥ 11)
Solution:
P[X ≥ 11] = P[|X − µX| ≥ 5.5] ≤ Var[X]/5.5² = 0.033 ≃ 1/30
We can see that the Chebyshev inequality is a loose bound. In fact, P[X ≥ 11] is orders of magnitude lower than 1/30; otherwise, we would often expect to see a person over 11 feet tall.
Example 2
If X is uniformly distributed over the interval (0, 10), then, since E[X] = 5 and Var(X) = 25/3, it follows from the Chebyshev inequality that
P(|X − 5| > 4) ≤ σ²/ε² = (25/3) · (1/16) ≃ 0.52
Thus, although Chebyshev's inequality is correct, the upper bound that it provides is not particularly tight; the exact probability here is P(|X − 5| > 4) = 0.2.
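The gap between the Chebyshev bound and the exact probability for this uniform example can be checked by simulation. A short numpy sketch (an addition to the notes):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=1_000_000)

mu, var, eps = 5.0, 25.0 / 3.0, 4.0

exact = np.mean(np.abs(x - mu) > eps)   # Monte Carlo estimate of P(|X-5| > 4)
bound = var / eps**2                    # Chebyshev bound

print("estimated P(|X-5| > 4):", round(exact, 3))   # ~0.2
print("Chebyshev bound       :", round(bound, 3))   # ~0.52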
3.6 Functions of a Random Variable
Let X be a RV and let g(·) be a given function of x. Define the new random variable
Y = g(X)
Given fX(x) (or FX(x)), we wish to determine fY(y) (or FY(y)).
Example 3
Y = aX + b
Solution: Suppose first that a > 0. Then
FY(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a) = FX((y − b)/a)
and
fY(y) = (1/a) fX((y − b)/a)
On the other hand, if a < 0, then
FY(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X > (y − b)/a) = 1 − FX((y − b)/a)
and hence
fY(y) = −(1/a) fX((y − b)/a)
Therefore, we obtain (for all a ≠ 0)
fY(y) = (1/|a|) fX((y − b)/a)
Example 4
Y = X2
Solution:
FY(y) = P(Y ≤ y) = P(X² ≤ y)
If y < 0, the event {X² ≤ y} is empty, so
FY(y) = 0, y < 0
For y > 0, from Figure 12, the event {Y ≤ y} = {X² ≤ y} is equivalent to {x1 < X ≤ x2} with x1 = −√y and x2 = √y. Hence,
FY(y) = P(x1 < X ≤ x2) = FX(x2) − FX(x1) = FX(√y) − FX(−√y), y > 0
By direct differentiation,
fY(y) = (1/(2√y)) [fX(√y) + fX(−√y)] U(y)  (3.33)
If fX(x) is an even function, this reduces to
fY(y) = (1/√y) fX(√y) U(y)  (3.34)
In particular, if X ∼ ℵ(0, 1), so that
fX(x) = (1/√(2π)) e^{−x²/2}  (3.35)
and substituting this into Equation (3.33) or Equation (3.34), we obtain the pdf of Y = X² to be
fY(y) = (1/√(2πy)) e^{−y/2} U(y)  (3.36)
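The transformation result (3.36) can be checked by simulation: square a large number of ℵ(0, 1) samples and compare a histogram estimate of the density with the formula. The numpy sketch below is an addition to the notes.

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)
y = x**2

# histogram estimate of f_Y(y) compared with (3.36): exp(-y/2)/sqrt(2*pi*y)
edges = np.linspace(0.05, 4.0, 40)
hist, _ = np.histogram(y, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
f_theory = np.exp(-centers / 2) / np.sqrt(2 * np.pi * centers)

print("max abs deviation:", np.max(np.abs(hist - f_theory)))  # small (sampling error only)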
3.6.1 General Approach
As a general approach, given Y = g(X), first sketch the graph y = g(x), and determine the range space of y, say a < y < b.
• Next, determine whether there are discontinuities in the range space of y. If so, evaluate P(Y = yi) at those points.
• In the continuous region of y, evaluate
FY(y) = P(g(X) ≤ y)
and determine appropriate events in terms of the RV X for every y. Finally, we must evaluate
fY(y) = dFY(y)/dy, a < y < b
to obtain fY(y).
Alternatively, a continuous function g(x) with derivative ǵ(x) having a finite number of zeros gives, for each y, a finite number of solutions xi of the equation y = g(xi), and the pdf can be obtained directly as
fY(y) = Σ_i fX(xi) / |dy/dx|_i = Σ_i fX(xi) / |ǵ(xi)|  (3.37)
The summation index i in Equation (3.37) depends on y: for every y, the equation y = g(xi) must be solved to obtain the total number of solutions at that y, and the actual solutions x1, x2, · · ·, all in terms of y.
Example 5
Let Y = 1/X. Determine fY(y).
Solution: here, for each y, x1 = 1/y is the only solution of y = g(x) = 1/x, and
dy/dx = −1/x², so that |dy/dx|_{x=x1} = 1/x1² = y²
Thus, from Equation (3.37),
fY(y) = (1/y²) fX(1/y)  (3.38)
• Functions of a discrete-type RV: suppose X is a discrete-type RV with
P(X = xi) = pi, x = x1, x2, . . . , xi, · · · ,
and Y = g(X). Clearly Y is also of discrete-type, and when x = xi , yi = g(xi ), and for those
yi ,
P (Y = yi ) = P (X = xi ) = pi , y = y1 , y2 , · · · , yi , · · ·
Example 6
Suppose X ∼ P(λ), so that
P(X = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, · · ·
and define Y = X² + 1. Find the pmf of Y.
Solution: X takes the values 0, 1, 2, · · · , k, · · ·, so Y only takes the values 1, 2, 5, · · · , k² + 1, · · ·, and
P(Y = k² + 1) = P(X = k)
so that for j = k² + 1,
P(Y = j) = P(X = √(j − 1)) = e^{−λ} λ^{√(j−1)} / (√(j − 1))!, j = 1, 2, 5, · · · , k² + 1, · · ·
Lecture 4
In many experiments, the observations are expressible not as a single quantity, but as a family
of quantities. For example to record the height and weight of each person in a community
or the number of people and the total income in a family, we need two numbers. Let X and
Y denote two random variables (r.v) based on a probability model (Ω, F, P ). Then
P(x1 < X(ζ) < x2) = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx
and
P(y1 < Y(ζ) < y2) = FY(y2) − FY(y1) = ∫_{y1}^{y2} fY(y) dy
What about the probability that the pair of RVs (X, Y ) belongs to an arbitrary region D?
Towards this, we define the joint probability distribution function of X and Y to be
FXY(x, y) = P( (X(ζ) ≤ x) ∩ (Y(ζ) ≤ y) ) = P(X ≤ x, Y ≤ y) ≥ 0 (4.1)
4.1.1 Properties
1. FXY(−∞, y) = FXY(x, −∞) = 0, since {X(ζ) ≤ −∞} and {Y(ζ) ≤ −∞} are null events. Similarly, since (X(ζ) ≤ +∞, Y(ζ) ≤ +∞) = Ω, we get FXY(+∞, +∞) = P(Ω) = 1.
2.
P(x1 < X(ζ) ≤ x2, Y(ζ) ≤ y) = FXY(x2, y) − FXY(x1, y) (4.3)
P(X(ζ) ≤ x, y1 < Y(ζ) ≤ y2) = FXY(x, y2) − FXY(x, y1) (4.4)
To prove (4.3), note that {X ≤ x2, Y ≤ y} = {X ≤ x1, Y ≤ y} ∪ {x1 < X ≤ x2, Y ≤ y}, and the mutually exclusive property of the events on the right side gives
FXY(x2, y) = FXY(x1, y) + P(x1 < X(ζ) ≤ x2, Y(ζ) ≤ y)
which proves Equation (4.3). Similarly one can prove Equation (4.4).
3.
P(x1 < X(ζ) ≤ x2, y1 < Y(ζ) ≤ y2) = FXY(x2, y2) − FXY(x2, y1) − FXY(x1, y2) + FXY(x1, y1) (4.5)
This is the probability that (X, Y) belongs to the rectangle in Figure 4.1. To prove Equation (4.5), we can make use of the following identity involving mutually exclusive events on the right side:
{x1 < X ≤ x2, Y ≤ y2} = {x1 < X ≤ x2, Y ≤ y1} ∪ {x1 < X ≤ x2, y1 < Y ≤ y2}
This gives
P(x1 < X ≤ x2, y1 < Y ≤ y2) = P(x1 < X ≤ x2, Y ≤ y2) − P(x1 < X ≤ x2, Y ≤ y1)
and the desired result in Equation (4.5) follows by making use of Equation (4.3) with y = y2 and y = y1, respectively.
4.2 Joint Probability Density Function (Joint pdf)
The joint pdf is defined as
fXY(x, y) = ∂²FXY(x, y) / ∂x∂y (4.6)
so that
FXY(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY(u, v) dv du (4.7)
More generally, for any region R0 in the (x, y) plane,
P((X, Y) ∈ R0) = ∫∫_{(x,y)∈R0} fXY(x, y) dx dy (4.9)
In the context of several RVs, the statistics of each individual one are called marginal statistics. Thus FX(x) is the marginal probability distribution function of X, and fX(x) is the marginal pdf of X. It is interesting to note that all marginals can be obtained from the joint cdf or the joint pdf. In fact,
FX(x) = FXY(x, +∞),  FY(y) = FXY(+∞, y) (4.10)
Also,
fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy,  fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx (4.11)
To prove Equation (4.10), note that
(X ≤ x) = (X ≤ x) ∩ (Y ≤ +∞)
so that FX(x) = P(X ≤ x, Y ≤ +∞) = FXY(x, +∞), and similarly FY(y) = FXY(+∞, y), which proves Equation (4.10).
To prove Equation (4.11), we can make use of Equations (4.7) and (4.10), which give
FX(x) = FXY(x, +∞) = ∫_{u=−∞}^{x} ∫_{y=−∞}^{+∞} fXY(u, y) dy du
and differentiating with respect to x yields fX(x) = ∫_{−∞}^{+∞} fXY(x, y) dy.
If X and Y are discrete RVs, then pij = P(X = xi, Y = yj) represents their joint pmf, and the marginal pmfs are
P(X = xi) = Σ_j pij,  P(Y = yj) = Σ_i pij (4.14)
If the joint pmf is arranged in a table whose (i, j) entry is pij, then to obtain P(X = xi) from Equation (4.14) one needs to add up all the entries in the ith row.
4.3.1 Examples
As Equation (4.11) shows, the joint cdf and/or the joint pdf represent complete information about the RVs, and their marginal pdfs can be evaluated from the joint pdf. However, given only the marginals, the joint pdf cannot in general be recovered.
Example 1
Given
fXY(x, y) = { c (a constant), 0 < x < y < 1;  0, o.w. } (4.15)
Obtain the marginal pdf s fX (x) and fY (y).
Solution: It is given that the joint pdf fXY(x, y) is a constant c in the shaded region 0 < x < y < 1. Normalization determines the constant:
∫_{−∞}^{+∞} ∫_{−∞}^{+∞} fXY(x, y) dx dy = c ∫_{y=0}^{1} ( ∫_{x=0}^{y} dx ) dy = c ∫_{y=0}^{1} y dy = c/2 = 1 (4.16)
Thus c = 2. Moreover,
fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy = ∫_{y=x}^{1} 2 dy = 2(1 − x), 0 < x < 1
Similarly,
fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx = ∫_{x=0}^{y} 2 dx = 2y, 0 < y < 1
Clearly, in this case, given only fX(x) and fY(y) as above, it would not be possible to recover the original joint pdf in Equation (4.15).
Example 2
X and Y are said to be jointly normal (Gaussian) distributed if their joint pdf has the following form:
fXY(x, y) = [1 / (2π σX σY √(1 − ρ²))] exp{ −1/(2(1 − ρ²)) [ (x − µX)²/σX² − 2ρ(x − µX)(y − µY)/(σX σY) + (y − µY)²/σY² ] },
−∞ < x < ∞, −∞ < y < ∞, |ρ| < 1 (4.17)
By direct integration it can be shown that
fX(x) = (1/√(2πσX²)) exp[ −(x − µX)²/(2σX²) ] ∼ ℵ(µX, σX²)
and similarly
fY(y) = (1/√(2πσY²)) exp[ −(y − µY)²/(2σY²) ] ∼ ℵ(µY, σY²)
Following the above notation, we will denote Equation (4.17) as ℵ(µX, µY, σX², σY², ρ).
Once again, knowing the marginals in above alone doesn’t tell us everything about the
joint pdf in Equation (4.17). As we show below, the only situation where the marginal
pdf s can be used to recover the joint pdf is when the random variables are statistically
independent.
• For continuous RVs, X and Y are said to be statistically independent if
fXY(x, y) = fX(x) · fY(y) (4.18)
or, equivalently, FXY(x, y) = FX(x) · FY(y). For discrete RVs, if X and Y are independent, then we must have
P(X = xi, Y = yj) = P(X = xi) · P(Y = yj) ∀ i, j (4.20)
Equations (4.18)-(4.20) give us the procedure to test for independence. Given fXY (x, y),
obtain the marginal pdf s fX (x) and fY (y) and examine whether one of equations in
(4.18) or (4.20) is valid. If so, the RV s are independent, otherwise they are dependent.
• Returning to Example 1, we observe by direct verification that fXY(x, y) ≠ fX(x) · fY(y), so X and Y there are dependent.
• It is easy to see that such is the case in Example 2 also, unless ρ = 0; in other words, two jointly Gaussian RVs as in Equation (4.17) are independent if and only if the fifth parameter ρ = 0.
If X and Y are random variables and g(·) is a function of two variables, then
E[g(X, Y)] = Σ_y Σ_x g(x, y) · p(x, y)  (discrete case)
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fXY(x, y) dx dy  (continuous case)  (4.21)
• If X and Y are independent, then for any functions h(·) and g(·),
E[g(X)h(Y)] = E[g(X)] E[h(Y)]
In particular, E[XY] = E[X] E[Y].
Example 3
Random variables X1 and X2 are independent and identically distributed with probability density function
fX(x) = { 1 − x/2, 0 ≤ x ≤ 2;  0, o.w. }
Find the CDF of Z = max(X1, X2).
Solution: Let FX(x) denote the common CDF of X1 and X2; for 0 ≤ x ≤ 2, FX(x) = ∫_0^x (1 − u/2) du = x − x²/4. The CDF of Z = max(X1, X2) is found by observing that {Z ≤ z} = {X1 ≤ z, X2 ≤ z}, so by independence FZ(z) = FX(z)².
Thus, for 0 ≤ z ≤ 2,
FZ(z) = (z − z²/4)²
Example 4
Given
fXY(x, y) = { x y² e^{−y}, 0 < y < ∞, 0 < x < 1;  0, o.w. }
Determine whether X and Y are independent.
Solution:
fX(x) = ∫_0^∞ fXY(x, y) dy = x ∫_0^∞ y² e^{−y} dy = x [ −y² e^{−y} |_0^∞ + 2 ∫_0^∞ y e^{−y} dy ] = 2x, 0 < x < 1
Similarly,
fY(y) = ∫_0^1 fXY(x, y) dx = (y²/2) e^{−y}, 0 < y < ∞
In this case,
fX(x) · fY(y) = 2x · (y²/2) e^{−y} = x y² e^{−y} = fXY(x, y)
and hence X and Y are statistically independent.
4.6 Correlation and Covariance
• Covariance: given any two RVs X and Y, define
Cov(X, Y) = E[(X − µX)(Y − µY)]
By expanding and simplifying the right side of the above equation, we also get
Cov(X, Y) = E(XY) − µX µY (4.22)
• Correlation coefficient:
ρXY = Cov(X, Y) / √(Var(X) Var(Y)) = Cov(X, Y) / (σX σY),  −1 ≤ ρXY ≤ 1
so that
Cov(X, Y) = ρXY σX σY
If ρXY = 0, X and Y are said to be uncorrelated.
• Orthogonality: X and Y are said to be orthogonal if
E(XY) = 0
From the above, if either X or Y has zero mean, then orthogonality implies uncorrelatedness and vice versa.
Suppose X and Y are independent RV s,
E(XY ) = E(X)E(Y ),
therefore from Equation (4.22), we conclude that the random variables are uncorrelated.
Thus independence implies uncorrelatedness (ρXY = 0), but the converse is generally not true.
Example 5
Let Z = aX + bY. Find the variance of Z in terms of σX, σY and ρXY.
Solution: µZ = E[Z] = aµX + bµY, and
σZ² = Var(Z) = E[(Z − µZ)²] = E[ (a(X − µX) + b(Y − µY))² ] = a²σX² + 2ab ρXY σX σY + b²σY²
In particular, if X and Y are independent, then ρXY = 0, and the above equation reduces to
σZ² = a²σX² + b²σY²
Thus the variance of the sum of independent RV s is the sum of their variances (a = b = 1).
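The variance formula for Z = aX + bY is easy to verify numerically. The numpy sketch below is an addition to the notes; the constants a, b and the two independent Gaussian RVs are arbitrary choices for the check.

import numpy as np

rng = np.random.default_rng(3)
a, b = 2.0, -3.0                             # arbitrary constants

# independent X and Y, so rho_XY = 0
x = rng.normal(1.0, 2.0, size=1_000_000)     # sigma_X = 2
y = rng.normal(-1.0, 0.5, size=1_000_000)    # sigma_Y = 0.5
z = a * x + b * y

predicted = a**2 * 2.0**2 + b**2 * 0.5**2    # a^2 sigma_X^2 + b^2 sigma_Y^2 = 18.25
print("sample var(Z):", round(z.var(), 3))
print("predicted    :", predicted)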
4.7 Moments
The joint moments of X and Y are defined as
E[X^m Y^n] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^m y^n fXY(x, y) dx dy
4.8 Joint Characteristic Function
Following the one-random-variable case, we can define the joint characteristic function between two random variables, which will turn out to be useful for moment calculations:
ΦXY(ω1, ω2) = E[ e^{j(ω1 X + ω2 Y)} ]
From this and the two-dimensional inversion formula for Fourier transforms, it follows that
fXY(x, y) = (1/4π²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ΦXY(ω1, ω2) e^{−j(ω1 x + ω2 y)} dω1 dω2
Note that
ΦXY(ω1, 0) = ΦX(ω1),  ΦXY(0, ω2) = ΦY(ω2)
Also, if X and Y are independent,
ΦXY(ω1, ω2) = ΦX(ω1) ΦY(ω2)
Convolution: let Z = X + Y, where X and Y are independent RVs. Then
ΦZ(ω) = E[e^{jω(X+Y)}] = E[e^{jωX}] E[e^{jωY}] = ΦX(ω) ΦY(ω)
It is known that the density of Z equals the convolution of fX(x) and fY(y). From the above, the characteristic function of the convolution of two densities equals the product of their individual characteristic functions.
Example 6
Suppose X ∼ P(λ1) and Y ∼ P(λ2) are independent Poisson RVs, and let Z = X + Y. Then
ΦX(ω) = e^{λ1(e^{jω} − 1)},  ΦY(ω) = e^{λ2(e^{jω} − 1)}
so that
ΦZ(ω) = ΦX(ω) ΦY(ω) = e^{(λ1 + λ2)(e^{jω} − 1)}
i.e., Z ∼ P(λ1 + λ2): the sum of independent Poisson RVs is Poisson.
• Central Limit Theorem (CLT): let X1, X2, · · · , Xn be independent, identically distributed RVs with common mean µ and finite variance σ², and define the normalized sum
Y = (X1 + X2 + · · · + Xn − nµ) / (σ√n)
Then, as n → ∞,
Y → ℵ(0, 1)
The central limit theorem states that a large sum of independent random variables each
with finite variance tends to behave like a normal random variable. Thus the individual
pdf s become unimportant to analyze the collective sum behavior. If we model the noise
phenomenon as the sum of a large number of independent random variables (e.g.: electron
motion in resistor components), then this theorem allows us to conclude that noise behaves
like a Gaussian RV. This theorem holds for any distribution of the Xi's; herein lies its
power.
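A quick numerical illustration of the CLT (an addition to the notes): sum n uniform RVs, which are individually far from Gaussian, normalize as above, and check that the result behaves like ℵ(0, 1).

import numpy as np

rng = np.random.default_rng(4)
n, trials = 50, 100_000

# X_i uniform on (0,1): mu = 0.5, sigma^2 = 1/12
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)
x = rng.uniform(0.0, 1.0, size=(trials, n))
y = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

print("mean (expect ~0):", round(y.mean(), 3))
print("var  (expect ~1):", round(y.var(), 3))
print("P(Y <= 1), expect Phi(1) ~ 0.841:", round(np.mean(y <= 1.0), 3))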
Lecture 5
Stochastic Processes
• Definition: Stochastic Process: a stochastic process X(t) consists of an experiment with a probability measure P[·] defined on a sample space S and a function that assigns a time function x(t, s) to each outcome s in the sample space of the experiment.
• Definition: Sample Function: a sample function x(t, s) is the time function associated with the outcome s of an experiment.
Figure 5.1: Illustration of Stochastic Process.
• Discrete Value and Continuous Value Processes: X(t) is a discrete value process if the set of all possible values of X(t) at all times t is a countable set SX; otherwise, X(t) is a continuous value process.
• Discrete Time and Continuous Time Process: The stochastic process X(t) is a discrete
time process if X(t) is defined only for a set of time instants, tn = nT , where T is a
constant and n is an integer; otherwise X(t) is a continuous time process.
• Random variables from random processes: consider a sample function x(t, s); for a fixed time t1, each x(t1, s) is a sample value of a random variable. We use X(t1) for this random variable. The notation X(t) can refer to either the random process or the random variable that corresponds to the value of the process at time t.
Example: in an experiment consisting of repeated independent rolls of a fair die, let Xn denote the value of the roll at time n. What is the pmf of X3?
The random variable X3 is the value of the die roll at time 3. In this case,
PX3(x) = { 1/6, x = 1, 2, · · · , 6;  0, o.w. }
Independent, Identically Distributed (i.i.d) Random Sequences
An i.i.d random sequence is a random sequence Xn in which
· · · , X−2, X−1, X0, X1, X2, · · ·
are i.i.d random variables.
dent trials of an experiment at a constant rate. An i.i.d random sequence can be either
discrete value or continuous value. In the discrete case, each random variable Xi has pmf
PXi (x) = PX (x), while in the continuous case, each Xi has pdf fXi (x) = fX (x).
Theorem: let Xn denote an i.i.d random sequence. For a discrete value process, the joint pmf of the samples Xn1, · · · , Xnk is
PXn1,··· ,Xnk(x1, · · · , xk) = PX(x1)PX(x2) · · · PX(xk) = Π_{i=1}^{k} PX(xi)
Otherwise, for a continuous value process, the joint pdf of Xn1, · · · , Xnk is
fXn1,··· ,Xnk(x1, · · · , xk) = fX(x1)fX(x2) · · · fX(xk) = Π_{i=1}^{k} fX(xi)
• The Expected Value of a Process: the expected value of a stochastic process X(t) is the deterministic function
µX(t) = E[X(t)]
• The Autocorrelation Function of X(t) is
RX(t, τ) = E[X(t)X(t + τ)]
and the Autocovariance Function is CX(t, τ) = RX(t, τ) − µX(t)µX(t + τ). For a random sequence Xn, the autocorrelation and autocovariance are defined analogously as RX[m, k] = E[Xm Xm+k] and CX[m, k] = RX[m, k] − µX(m)µX(m + k), where m and k are integers.
Example 1
similarly,
RX(t, 2Ts) = (1/N) Σ_{i=1}^{N} a(i) a(i + 2)
and
CX(t, Ts) = (1/N) Σ_{i=1}^{N} (a(i) − µX)(a(i + 1) − µX)
Example 2
If R is a random variable, find the expected value of the rectified cosine X(t) = R| cos 2πf t|.
Example 3
The input to a digital filter is an i.i.d random sequence · · · , X−1 , X0 , X1 , · · · with E[Xi ] = 0
and V ar[Xi ] = 1. The output is also a random sequence · · · , Y−1 , Y0 , Y1 , · · · . The relation-
ship between the input sequence and the output sequence is expressed in the formula
Yn = Xn + Xn−1
Find the expected value function E[Yn ] and autocovariance function CY (m, k) of the output.
Solution: since E[Xn] = 0 for all n, E[Yn] = E[Xn] + E[Xn−1] = 0. To find CY[m, k], we observe that Xn being an i.i.d random sequence with E[Xn] = 0 and Var[Xn] = 1 implies
CX[m, k] = E[Xm Xm+k] = { 1, k = 0;  0, o.w. }
For any integer k, we can write
CY[m, k] = E[Ym Ym+k] = E[(Xm + Xm−1)(Xm+k + Xm+k−1)]
= CX[m, k] + CX[m, k − 1] + CX[m − 1, k + 1] + CX[m − 1, k]
We still need to evaluate the above expression for all k. For each value of k, some terms in the sum vanish:
When k = 0: CY[m, 0] = CX[m, 0] + CX[m − 1, 0] = 2
When k = 1: CY[m, 1] = CX[m, 0] = 1
When k = −1: CY[m, −1] = CX[m − 1, 0] = 1
When |k| ≥ 2: CY[m, k] = 0
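These autocovariance values can be checked by simulation. The numpy sketch below is an addition to the notes and assumes, as in the reconstruction above, the two-tap filter Yn = Xn + Xn−1 driven by a zero-mean, unit-variance i.i.d input.

import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)   # i.i.d., zero mean, unit variance

# assumed two-tap filter: Y_n = X_n + X_(n-1)
y = x[1:] + x[:-1]

def autocov(seq, k):
    # empirical C_Y[k] = E[Y_m Y_(m+k)] for a zero-mean sequence
    return np.mean(seq[:len(seq) - k] * seq[k:])

for k in range(4):
    print(f"C_Y[{k}] ~", round(autocov(y, k), 3))   # expect 2, 1, 0, 0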
5.4 Stationary Processes
In general, for the stochastic process, X(t), there is a random variable X(t1 ) at every time
instant t1 with pdf fX(t1 ) (x) which depends on t1 . For a special class of random process
known as stationary processes fX(t1 ) (X) does not depend on t1 . That is, for any two time
instants t1 and t1 + τ,
fX(t1)(x) = fX(t1+τ)(x) = fX(x)
• If X(t) is a stationary process, the expected value, the autocorrelation, and the autocovariance do not depend on the absolute time origin:
(a) µX(t) = µX
(b) RX(t, τ) = RX(0, τ) = RX(τ)
(c) CX(t, τ) = CX(0, τ) = CX(τ)
Example 4
At the receiver of an AM radio, the received signal contains a cosine carrier signal at the
carrier frequency fc with a random phase θ that is a sample value of the uniform (0, 2π) random variable. The received carrier signal is
X(t) = A cos(2πfc t + θ)
What are the expected value and autocorrelation of the process X(t)?
Solution: since θ is uniform on (0, 2π),
E[X(t)] = E[A cos(2πfc t + θ)] = (A/2π) ∫_0^{2π} cos(2πfc t + θ) dθ = 0
We will use the identity cos A cos B = [cos(A − B) + cos(A + B)]/2 to find the autocorrelation:
RX(t, τ) = E[A² cos(2πfc t + θ) cos(2πfc(t + τ) + θ)]
= (A²/2) cos(2πfc τ) + (A²/2) E[cos(2πfc(2t + τ) + 2θ)]
The second term is zero, being the expectation of a cosine with uniformly distributed phase. Thus RX(t, τ) = (A²/2) cos(2πfc τ) = RX(τ).
Therefore, X(t) has the properties of a stationary stochastic process.
X(t) is a wide sense stationary (WSS) stochastic process if and only if, for all t,
(a) E[X(t)] = µX (a constant), and
(b) RX(t, τ) = RX(0, τ) = RX(τ), i.e., the autocorrelation depends only on the time difference τ.
In Example 4, we observe that µX(t) = 0 and RX(t, τ) = (A²/2) cos 2πfc τ. Thus the random process X(t) is wide sense stationary.
Properties of WSS
The autocorrelation function of a wide sense stationary process has a number of important
properties:
1. RX (0) ≥ 0
2. RX (τ ) = RX (−τ )
3. RX (0) ≥ RX (τ )
The average power of a wide sense stationary process X(t) is RX (0) = E[X 2 (t)].
Example: using these properties, determine which of the following functions can be a valid autocorrelation function of a wide sense stationary process:
1. R1(τ) = e^{−|τ|}
2. R2(τ) = e^{τ²}
3. R3(τ) = e^{−τ} cos τ
4. R4(τ) = e^{−τ²} sin τ
Example 5
A simple model (in degrees Celsius) for the daily temperature process C(t) is
Cn = 16[1 − cos(2πn/365)] + 4Xn
where X1, X2, · · · is an i.i.d random sequence of ℵ(0, 1) random variables.
(a) What is the mean E[Cn]?
Solution: since E[Xn] = 0,
E[Cn] = 16[1 − cos(2πn/365)]
Example 6
Consider the temperature model
Cn = (1/2) Cn−1 + 4Xn,
where C0, X1, X2, · · · is an iid random sequence of ℵ(0, 1) random variables.
(a) What is the mean E[Cn]?
Solution:
a) Since C0, X1, X2, · · · all have zero mean,
E[Cn] = E[C0]/2^n + 4 Σ_{i=1}^{n} E[Xi]/2^{n−i} = 0
Electrical signals are usually represented as sample functions of wide sense stationary stochas-
tic processes. We use probability density functions and probability mass functions to describe
the amplitude characteristics of signals, and we use autocorrelation functions to describe the
time-varying nature of the signals. Practical equipment uses digital signal processing: the signal is sampled and each sample is quantized to a discrete random variable Qn. Here, we ignore quantization and analyze linear filtering of random processes.
The output w(t) of a linear time-invariant (LTI) filter with impulse response h(t) is related to the stochastic process v(t) at the input of the filter by the convolution:
w(t) = ∫_{−∞}^{∞} h(u) v(t − u) du = ∫_{−∞}^{∞} h(t − u) v(u) du
If the possible inputs to the filter are x(t), sample functions of a stochastic process X(t),
then the outputs, y(t), are sample functions of another stochastic process, Y (t). Because
y(t) is the convolution of x(t) and h(t), we adopt the following notation for the relationship
of Y (t) to X(t):
Y(t) = ∫_{−∞}^{∞} h(u) X(t − u) du = ∫_{−∞}^{∞} h(t − u) X(u) du
Similarly, the expected value of Y (t) is the convolution of h(t) and E[X(t)].
E[Y(t)] = E[ ∫_{−∞}^{∞} h(u) X(t − u) du ] = ∫_{−∞}^{∞} h(u) E[X(t − u)] du
If the input to an LTI filter with impulse response h(t) is a WSS process X(t), the output Y(t) has the following properties:
• Y(t) is a WSS process with expected value
µY = E[Y(t)] = µX ∫_{−∞}^{∞} h(u) du
• X(t) and Y(t) are jointly wide sense stationary and have input-output cross-correlation
RXY(τ) = ∫_{−∞}^{∞} h(u) RX(τ − u) du
Example 7
X(t), a wide sense stationary stochastic process with expected value µX = 10 volts, is the input to an LTI filter whose impulse response is h(t) = e^{t/0.2} for 0 ≤ t ≤ 0.1 s and h(t) = 0 otherwise. What is the expected value of the output process Y(t)?
Solution:
µY = µX ∫_{−∞}^{∞} h(t) dt = 10 ∫_{0}^{0.1} e^{t/0.2} dt = 2(e^{0.5} − 1) ≈ 1.3 volts
Suppose a WSS process X(t) is sampled at a rate of 1/Ts samples per second, producing the random sequence Xn = X(nTs). If X(t) is a wide sense stationary process with expected value µX and autocorrelation RX(τ), then Xn is a wide sense stationary random sequence with expected value µX and autocorrelation
RX[k] = RX(kTs)
When such a sequence is applied to a discrete-time LTI filter with impulse response hn, the output is a random sequence Yn related to the input Xn by the discrete-time convolution
Yn = Σ_{i=−∞}^{∞} hi Xn−i
If the input to a discrete-time LTI filter with impulse response hn is a wide sense stationary random sequence Xn, then:
• (a) Yn is a wide sense stationary random sequence with expected value
µY = E[Yn] = µX Σ_{n=−∞}^{∞} hn
• (b) Yn and Xn are jointly wide sense stationary with input-output cross-correlation
RXY[n] = Σ_{i=−∞}^{∞} hi RX[n − i]
Example 8
This example considers an M-tap averaging filter with hi = 1/M, i = 0, 1, · · · , M − 1. For the case M = 2 (so h0 = h1 = 0.5), find the following properties of the output random sequence Yn: the expected value µY and the autocorrelation RY[n].
Solution:
µY = µX (h0 + h1 ) = µX = 1.
The autocorrelation of the filter output is
RY[n] = Σ_{i=0}^{1} Σ_{j=0}^{1} (0.25) RX[n + i − j]
= (0.5)RX[n] + (0.25)RX[n − 1] + (0.25)RX[n + 1] = { 3, n = 0;  2, |n| = 1;  0.5, |n| = 2;  0, o.w. }
5.6 Power Spectral Density of a Continuous-Time Process
As you studied before, the functions g(t) and G(f) form a Fourier transform pair:
G(f) = ∫_{−∞}^{∞} g(t) e^{−j2πft} dt,  g(t) = ∫_{−∞}^{∞} G(f) e^{j2πft} df
The power spectral density function of the wide sense stationary stochastic process X(t) is
" Z #
T
1 h
2
i 1 2
SX (f ) = lim E XT (f ) = lim E X(t)e−j2πf t dt .
T →∞ 2T T →∞ 2T −T
Physically, Sx (f ) has units of watts/Hz = Joules. Both the autocorrelation function and
the power spectral density function convey information about the time structure of X(t).
SX(f) = ∫_{−∞}^{∞} RX(τ) e^{−j2πfτ} dτ,  RX(τ) = ∫_{−∞}^{∞} SX(f) e^{j2πfτ} df
For a wide sense stationary random process X(t), the power spectral density SX(f) is a real-valued function with the following properties:
1. SX(f) ≥ 0, for all f
2. ∫_{−∞}^{∞} SX(f) df = E[X²(t)] = RX(0)
3. SX(−f) = SX(f)
Example 9
A wide sense stationary process X(t) has autocorrelation function RX (τ ) = Ae−b|τ | where
b > 0. Derive the power spectral density function Sx (f ) and calculate the average power
E[X²(t)]. To find SX(f), we use a Fourier transform table, since RX(τ) is of the standard form a e^{−a|τ|}:
SX(f) = 2Ab / ((2πf)² + b²)
The average power is
E[X²(t)] = RX(0) = A e^{−b·0} = ∫_{−∞}^{∞} 2Ab / ((2πf)² + b²) df = A
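The transform pair RX(τ) = A e^{−b|τ|} ↔ SX(f) = 2Ab/((2πf)² + b²) is easy to check numerically. The sketch below (an addition to the notes, with A = 1 and b = 2 chosen arbitrarily) integrates SX(f) to recover the average power and evaluates the transform of RX(τ) at one test frequency.

import numpy as np

A, b = 1.0, 2.0

# S_X(f) = 2Ab / ((2*pi*f)^2 + b^2); its integral over f should equal R_X(0) = A
f = np.linspace(-200.0, 200.0, 2_000_001)
Sx = 2 * A * b / ((2 * np.pi * f) ** 2 + b ** 2)
print("integral of S_X(f) df ~", round(np.trapz(Sx, f), 4))   # ~1.0

# numerical Fourier transform of R_X(tau) at f0, compared with the closed form
tau = np.linspace(-20.0, 20.0, 400_001)
Rx = A * np.exp(-b * np.abs(tau))
f0 = 0.5
Sx_f0 = np.trapz(Rx * np.cos(2 * np.pi * f0 * tau), tau)
print(round(Sx_f0, 4), "vs", round(2 * A * b / ((2 * np.pi * f0) ** 2 + b ** 2), 4))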
Figure 5.3 displays three graphs for each of two stochastic processes. For each process, the
three graphs are the autocorrelation function, the power spectral density function, and one
Figure 5.3: Random processes V(t) and W(t) with autocorrelation functions RV(τ) = e^{−0.5|τ|}
and Rw (τ ) = e−2|τ | are examples of the process X(t) in above Example. These graphs show
Rv (τ ) and Rw (τ ), the power spectral density functions Sv (f ) and Sw (f ), and sample paths
sample function. For both processes, the average power is A = 1 watt. Note W (t) has a
narrower autocorrelation (less dependence between two values of the process with a given
time separation) and a wider power spectral density (more power at higher frequencies) than
V (t). The sample function w(t) fluctuates more rapidly with time than v(t).
The spectral analysis of a random sequence parallels the analysis of a continuous-time pro-
cess. A sample function of a random sequence is an ordered list of numbers. Each number in
the list is a sample value of a random variable. The discrete-time Fourier transform (DTFT)
is a spectral representation of an ordered set of numbers.
The sequence {· · · , X−2, X−1, X0, X1, X2, · · · } and the function X(φ) = Σ_{n=−∞}^{∞} Xn e^{−j2πφn} form a discrete-time Fourier transform pair.
The power spectral density function of the wide sense stationary random sequence Xn is
SX(φ) = Σ_{k=−∞}^{∞} RX[k] e^{−j2πφk},  RX[k] = ∫_{−1/2}^{1/2} SX(φ) e^{j2πφk} dφ
The properties of the power spectral density function of a random sequence are similar to
the properties of the power spectral density function of a continuous-time stochastic process.
1. SX(φ) ≥ 0, for all φ
2. ∫_{−1/2}^{1/2} SX(φ) dφ = E[Xn²] = RX[0]
3. SX (−φ) = SX (φ)
Example 10
The wide sense stationary random sequence Xn has zero expected value and autocorrelation
function
RX[n] = { σ²(2 − |n|)/4, n = −1, 0, 1;  0, o.w. }
Derive the power spectral density function of Xn .
Solution:
We have
SX(φ) = Σ_{n=−1}^{1} RX[n] e^{−j2πnφ}
= σ² [ ((2 − 1)/4) e^{j2πφ} + 2/4 + ((2 − 1)/4) e^{−j2πφ} ]
= (σ²/2) [1 + cos(2πφ)]