Notes EC636
University of Tripoli
http://melalem.com/EC636.php
© M. Elalem
Lecture 1
Probabilities
1.1 Introduction
• Many everyday phenomena are random, for example:
– Today's temperature;
– If you walk to a bus station, how long do you wait for the arrival of a bus?
• We create models for analysis, since real experiments are generally too complicated to analyze exactly. For example, how long you wait for a bus depends on factors such as:
– The weight, horsepower, and gear ratio of the bus;
– The status of all road construction within 100 miles of the bus stop.
• It would clearly be too difficult to analyze the effects of all these factors on the likelihood that you will wait less than 5 minutes for a bus. Therefore, it is necessary to create a model that captures the critical part of the actual physical experiment.
• Probability theory deals with the study of random phenomena, which under repeated experiments yield different outcomes that nevertheless have certain underlying patterns.
Basic set operations on events (subsets of the sample space Ω):
– Union: E ∪ F = {s ∈ Ω : s ∈ E OR s ∈ F };
– Intersection: E ∩ F = {s ∈ Ω : s ∈ E AND s ∈ F };
– Complement: E^c = Ē = {s ∈ Ω : s ∉ E}.
1.1.2 Several Definitions
• Disjoint: if A ∩ B = Φ, the empty set, then A and B are said to be mutually exclusive
(M.E), or disjoint.
• Partition: the mutually exclusive sets A1, A2, · · · form a partition of Ω if they cover the sample space, i.e.
Ai ∩ Aj = φ for i ≠ j, and ∪_i Ai = Ω
1.1.4 Sample Space, Events and Probabilities
• Outcome: an outcome is the result of a single trial of an experiment.
• Sample space: the sample space of an experiment is the set of all possible outcomes
of that experiment.
∗ noise: S = {n(t); t: real}
Example 1
Toss a coin four times. The sample space consists of 16 four-letter words, with each letter either h (head) or t (tail). An event consists of one or more outcomes; for example, B1 = {ttth, ttht, thtt, httt} ("exactly one head") contains four outcomes.
Example 2
Toss two dice; there are 36 elements in the sample space. If we define the events Bi = {the sum of the two dice equals i}, the collection
{B2, B3, · · · , B12}
forms an event space (a partition of the sample space).
– Practical example: when binary data are transmitted through a noisy channel, we are more interested in events defined on the outcomes (e.g., an error occurring) than in the individual outcomes themselves.
Often it is meaningful to talk about at least some of the subsets of S as events, to which probabilities can be assigned.
Example 3
Consider the experiment where two coins are simultaneously tossed. The sample space is Ω = {γ1, γ2, γ3, γ4}, where γ1 = (H, H), γ2 = (H, T), γ3 = (T, H), γ4 = (T, T). If we define
A = {γ1, γ2, γ3}
then A is the same as "head has occurred at least once" and qualifies as an event.
• Theorems are consequences that follow logically from definitions and axioms. Each
theorem has a proof that refers to definitions, axioms, and other theorems.
For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions, which act as the axioms of probability:
1- P(A) ≥ 0 (1.2)
2- P(Ω) = 1 (1.3)
3- For any countable collection A1, A2, · · · of mutually exclusive events,
P(∪_i Ai) = Σ_i P(Ai)
(Note that (3) states that if A and B are mutually exclusive (M.E.) events, then P(A ∪ B) = P(A) + P(B).)
Some immediate consequences of the axioms:
• Since A and Ā are M.E. and A ∪ Ā = Ω, we have P(A) + P(Ā) = P(A ∪ Ā) = P(Ω) = 1, so that P(Ā) = 1 − P(A).
• Similarly, for any A, A ∩ {φ} = {φ}; hence it follows that P(A ∪ {φ}) = P(A) + P(φ). But A ∪ {φ} = A, and therefore
P(φ) = 0
• Suppose A and B are not mutually exclusive (M.E.). How does one compute P(A ∪ B)? To apply the axioms we must first re-express A ∪ B as a union of M.E. events. From the figure below,
A ∪ B = A ∪ ĀB
where A and ĀB are clearly M.E. events. Thus, using axiom (3),
P(A ∪ B) = P(A ∪ ĀB) = P(A) + P(ĀB)
To express P(ĀB) in terms of P(B) and P(AB), note that
B = B ∩ Ω = B ∩ (A ∪ Ā) = (B ∩ A) ∪ (B ∩ Ā) = BA ∪ BĀ
Thus P(B) = P(BA) + P(BĀ), so that P(ĀB) = P(B) − P(AB). Therefore
P(A ∪ B) = P(A) + P(B) − P(AB)
For example, with P(A) = P(B) = 1/2 and P(AB) = 1/4,
P(A ∪ B) = P(A) + P(B) − P(AB) = 1/2 + 1/2 − 1/4 = 3/4
1.1.8 Theorem
For an event space B = {B1, B2, · · · } and any event A in the sample space, let Ci = A ∩ Bi. For i ≠ j, the events Ci and Cj are mutually exclusive, and A = C1 ∪ C2 ∪ · · ·, so that P(A) = Σ_i P(A ∩ Bi).
Example 4
Coin tossing (four flips of a fair coin): let A equal the set of outcomes with fewer than three heads,
A = {tttt, httt, thtt, ttht, ttth, hhtt, htht, htth, tthh, thth, thht}
Let {B0, B1, B2, B3, B4} denote the event space in which Bi = {outcomes with exactly i heads}. With Ci = A ∩ Bi,
A = C0 ∪ C1 ∪ C2 ∪ C3 ∪ C4
= (A ∩ B0) ∪ (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3) ∪ (A ∩ B4)
This example states that the event "fewer than three heads" is the union of the events "zero heads", "one head", and "two heads" (the intersections with B3 and B4 are empty).
Example 5
A company has a model of telephone usage. It classifies all calls as L (long) or B (brief). It also observes whether calls carry voice (V), fax (F), or data (D). The sample space has six outcomes, S = {LV, BV, LD, BD, LF, BF}, and the probabilities can be arranged in a table with rows L, B and columns V, F, D. The events {V, F, D} form an event space in the sense of the previous theorem (and L plays the role of the event A). Thus we can apply the theorem to find
P(L) = P(LV) + P(LF) + P(LD)
In N independent trials, suppose NA, NB, and NAB denote the number of times the events A, B, and AB occur, respectively. According to the relative-frequency interpretation, for large N
P(A) ≈ NA/N,  P(B) ≈ NB/N,  P(AB) ≈ NAB/N
Among the NA occurrences of A, only NAB of them are also found among the NB occurrences of B. Thus the ratio
NAB/NB = (NAB/N)/(NB/N) = P(AB)/P(B)
is a measure of the event A given that B has already occurred. We denote this conditional probability by P(A|B) = probability of the event A given that B has occurred.
We define
P(A|B) = P(AB)/P(B) (1.5)
provided P(B) ≠ 0. As we show below, the above definition satisfies all probability axioms:
1. Non-negative:
P(A|B) = P(AB)/P(B) ≥ 0, since P(AB) ≥ 0 and P(B) > 0
2. Normalization:
P(Ω|B) = P(ΩB)/P(B) = P(B)/P(B) = 1, since ΩB = B
3. Suppose A ∩ C = φ; then, since AB and CB are also mutually exclusive,
P(A ∪ C|B) = P((A ∪ C)B)/P(B) = P(AB)/P(B) + P(CB)/P(B) = P(A|B) + P(C|B)
Hence P(·|B) satisfies all the axioms and is itself a legitimate probability measure.
Properties of conditional probability:
1. If B ⊂ A, then AB = B, and
P(A|B) = P(AB)/P(B) = P(B)/P(B) = 1
2. If A ⊂ B, then AB = A, and
P(A|B) = P(AB)/P(B) = P(A)/P(B) > P(A)
For example, roll a fair die and let A = {outcome is 2} and B = {outcome is even}, so that A ⊂ B. The statement that B has occurred (outcome is even) makes the probability of A increase from P(A) = 1/6 to P(A|B) = 1/3.
• Law of Total Probability: conditional probability lets us express the probability of a complicated event in terms of simpler related events. Let A1, A2, · · · , An be mutually exclusive events with
Ai ∩ Aj = φ for i ≠ j and ∪_{i=1}^{n} Ai = Ω
Then, since B = B ∩ Ω = ∪_{i=1}^{n} BAi with the BAi mutually exclusive,
P(B) = Σ_{i=1}^{n} P(BAi) = Σ_{i=1}^{n} P(B|Ai)P(Ai)
• Independence: events A and B are said to be statistically independent if
P(AB) = P(A)P(B)
Notice that the above definition is a probabilistic statement, NOT a set-theoretic notion such as mutual exclusiveness (independent and disjoint are not synonyms).
1.1.11 More on Independence
– If A and B are independent, P(AB) is obtained from P(A) and P(B) by a simple multiplication.
– Independent events cannot be mutually exclusive, since P (A) > 0, P (B) > 0, and
A, B independent implies P (AB) > 0, thus the event AB cannot be the null set.
Thus if A and B are independent, the event that B has occurred does not shed
any more light into the event A. It makes no difference to A whether B has
occurred or not.
Example 6
A box contains 6 white and 4 black balls. Remove two balls at random without
replacement. What is the probability that the first one is white and the second
one is black?
Let W1 = “first ball removed is white” and B2 = “second ball removed is black”.
We need P(W1 B2) = P(B2|W1)P(W1). But
P(W1) = 6/(6 + 4) = 6/10 = 3/5
and
P(B2|W1) = 4/(5 + 4) = 4/9
and hence
P(W1 B2) = P(B2|W1)P(W1) = (4/9) · (3/5) = 4/15 ≈ 0.267
Are the events W1 and B2 independent? Our common sense says No. To verify
this we need to compute P (B2). Of course the fate of the second ball very much
depends on that of the first ball. The first ball has two options: W1 = "first ball is white" or W̄1 = "first ball is black". Using the law of total probability,
P(B2) = P(B2|W1)P(W1) + P(B2|W̄1)P(W̄1) = (4/9)(3/5) + (3/9)(4/10) = 2/5
and
P(B2)P(W1) = (2/5)(3/5) = 6/25 ≠ P(B2 W1) = 4/15
As expected, the events W1 and B2 are dependent.
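The dependence of W1 and B2 can also be checked numerically. The following simulation is a sketch added for illustration (not part of the original notes); it draws two balls without replacement many times and estimates P(W1), P(B2) and P(W1 B2).

import random

def draw_two_balls():
    # 6 white ('W') and 4 black ('B') balls, drawn without replacement
    box = ['W'] * 6 + ['B'] * 4
    random.shuffle(box)
    return box[0], box[1]

N = 200_000
count_w1 = count_b2 = count_w1b2 = 0
for _ in range(N):
    first, second = draw_two_balls()
    count_w1 += (first == 'W')
    count_b2 += (second == 'B')
    count_w1b2 += (first == 'W' and second == 'B')

print("P(W1)    ~", count_w1 / N)        # theory: 3/5 = 0.6
print("P(B2)    ~", count_b2 / N)        # theory: 2/5 = 0.4
print("P(W1 B2) ~", count_w1b2 / N)      # theory: 4/15 ~ 0.267
print("P(W1)P(B2) =", (count_w1 / N) * (count_b2 / N))  # ~0.24, not 0.267 -> dependent

The estimate of P(W1 B2) is close to 4/15 and differs from P(W1)P(B2), confirming the dependence.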
• Bayes' theorem: since
P(A|B) = P(AB)/P(B) ⇒ P(AB) = P(A|B)P(B)
and similarly,
P(B|A) = P(BA)/P(A) = P(AB)/P(A) ⇒ P(AB) = P(B|A)P(A)
we get
P(A|B)P(B) = P(B|A)P(A)
or
P(A|B) = [P(B|A)/P(B)] · P(A) (1.6)
In Equation (1.6), P(A) represents the a-priori probability of the event A. Suppose B has occurred, and assume that A and B are not independent. How can this new information be used to update our knowledge about A? Bayes' rule takes into account the new information ("B has occurred") and gives out the a-posteriori probability of A given B.
We can also view the event B as new knowledge obtained from a fresh experiment; that knowledge should update our a-priori information about A. Bayes' theorem gives the exact mechanism for incorporating such new information.
More generally, let A1, A2, · · · , An be a partition of Ω with associated a-priori probabilities P(Ai), i = 1, · · · , n. With the new information "B has occurred", the a-posteriori probabilities are
P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^{n} P(B|Aj)P(Aj) (1.8)
Example 7
Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box (B1) has 15 defective bulbs and the second 5. Suppose a box is selected at random and one bulb is picked out.
(a) What is the probability that it is defective?
Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box
B2 has 195 good and 5 defective bulbs. Let D = “Defective bulb is picked out”.
Then,
P(D|B1) = 15/100 = 0.15,  P(D|B2) = 5/200 = 0.025
Since a box is selected at random, they are equally likely: P(B1) = P(B2) = 1/2.
Thus B1 and B2 form a partition, and using the Law of Total Probability, we obtain
P(D) = P(D|B1)P(B1) + P(D|B2)P(B2) = 0.15 × 1/2 + 0.025 × 1/2 = 0.0875
(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1, i.e., P(B1|D)?
Using Bayes' rule (1.8),
P(B1|D) = P(D|B1)P(B1) / P(D) = (0.15 × 1/2) / 0.0875 ≈ 0.857
Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on which box was picked? Since P(B1|D) ≈ 0.857 > 0.5, it is indeed more likely at this point that we chose box 1 rather than box 2. (Recall that box 1's proportion of defective bulbs is six times that of box 2.)
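The total-probability and Bayes computations of Example 7 are easy to reproduce. A minimal Python sketch (added for illustration; the numbers are those of the example):

# Total probability and Bayes' rule for Example 7 (two boxes of bulbs)
p_box = {"B1": 0.5, "B2": 0.5}           # box selected at random
p_def_given_box = {"B1": 15 / 100,       # P(D|B1) = 0.15
                   "B2": 5 / 200}        # P(D|B2) = 0.025

# Law of total probability: P(D) = sum_i P(D|Bi) P(Bi)
p_def = sum(p_def_given_box[b] * p_box[b] for b in p_box)
print("P(D) =", p_def)                   # 0.0875

# Bayes' rule: P(B1|D) = P(D|B1) P(B1) / P(D)
p_b1_given_def = p_def_given_box["B1"] * p_box["B1"] / p_def
print("P(B1|D) =", round(p_b1_given_def, 4))   # ~0.857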
Example 8
Suppose you have two coins, one biased, one fair, but you don't know which coin is which. Coin 1 is biased: it comes up heads with probability 3/4, while coin 2 comes up heads with probability 1/2. Suppose you pick a coin at random and flip it. Let Ci denote the event that coin i is picked. Let H and T denote the possible outcomes of the flip. Given that the outcome of the flip is a head, what is P[C1|H], the probability that you picked the biased coin? Given that the outcome is a tail, what is P[C1|T]?
Solution: First, we construct the sample tree (branch probabilities P[C1] = P[C2] = 1/2, P[H|C1] = 3/4, P[H|C2] = 1/2). To find the conditional probabilities, we see
P[C1|H] = P[H|C1]P[C1] / (P[H|C1]P[C1] + P[H|C2]P[C2]) = (3/4 · 1/2) / (3/4 · 1/2 + 1/2 · 1/2) = 3/5
Similarly,
P[C1|T] = P[T|C1]P[C1] / (P[T|C1]P[C1] + P[T|C2]P[C2]) = (1/4 · 1/2) / (1/4 · 1/2 + 1/2 · 1/2) = 1/3
As we would expect, we are more likely to have chosen coin 1 when the first flip is heads, but more likely to have chosen coin 2 when the first flip is tails.
Lecture 2
Random Variables
2.1 Introduction
Let Ω be sample space of a probability model, and X a function that maps every ζ ∈ Ω, to
a unique point x ∈ R, the set of real numbers. Since the outcome ζ is not certain, so is the
value X(ζ) = x. Thus if B is some subset of R, we may want to determine the probability
of "X(ζ) ∈ B". To determine this probability, we can look at the set A = X^{-1}(B) ⊂ Ω, the inverse image of B under the mapping X. Obviously, if the set A = X^{-1}(B) is an event, the probability of A is well defined; in this case P(X(ζ) ∈ B) = P(A).
However, X^{-1}(B) may not always be an event for every subset B, thus creating difficulties. The notion of random variable (RV) makes sure that the inverse mapping always results in an event.
Random Variable (RV): a finite single-valued function X(·) that maps the set of all experimental outcomes Ω into the set of real numbers R is said to be a RV if the set {ζ : X(ζ) ≤ x} is an event for every x ∈ R.
The random variable X is specified by the function X(ζ) that maps the sample outcome ζ to a real number; the notation {X = x} is shorthand for the event
{X = x} = {ζ ∈ Ω|X(ζ) = x}
Since all events have well-defined probability, the probability of the event {ζ|X(ζ) ≤ x} is well defined for every x; we denote it
FX(x) = P{X(ζ) ≤ x}
The role of the subscript X is only to identify the actual RV. FX(x) is said to be the cumulative distribution function (CDF) of the RV X, and it satisfies
FX(+∞) = 1, FX(−∞) = 0
If x1 < x2, then the interval (−∞, x1] ⊂ (−∞, x2]. Consequently the event {ζ|X(ζ) ≤ x1} ⊂ {ζ|X(ζ) ≤ x2}, so that FX(x1) ≤ FX(x2), implying that the probability distribution function is nonnegative and monotone nondecreasing.
• Theorem: P(a < X(ζ) ≤ b) = FX(b) − FX(a).
To prove this theorem, express the event Eab = {a < X ≤ b} as part of a union of disjoint events. Starting with the event Eb = {X ≤ b}, note that Eb can be written as the union
Eb = {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} = Ea ∪ Eab
Note also that Ea and Eab are disjoint, so that P(Eb) = P(Ea) + P(Eab). Since P(Eb) = FX(b) and P(Ea) = FX(a), we can write FX(b) = FX(a) + P(a < X ≤ b), which gives the result.
• If FX(x0) = 0 for some x0, then FX(x) = 0 for all x ≤ x0. This follows since FX(x0) = P(X(ζ) ≤ x0) = 0 implies {X(ζ) ≤ x0} is the null set, and hence so is {X(ζ) ≤ x} for every x ≤ x0.
• P{X(ζ) > x} = 1 − FX(x). We have {X(ζ) ≤ x} ∪ {X(ζ) > x} = Ω, and since the two events are mutually exclusive, P{X(ζ) ≤ x} + P{X(ζ) > x} = 1.
• P{x1 < X(ζ) ≤ x2} = FX(x2) − FX(x1), x2 > x1.
The events {X(ζ) ≤ x1} and {x1 < X(ζ) ≤ x2} are mutually exclusive and their union is {X(ζ) ≤ x2}, so FX(x2) = FX(x1) + P{x1 < X(ζ) ≤ x2}, which gives the result.
• FX(x0+), the limit of FX(x) as x → x0 from the right, always exists and equals FX(x0); however, FX need not be continuous from the left. At a discontinuity point of the distribution, the left and right limits are different, and
P{X = x0} = FX(x0) − FX(x0−)
Thus the only discontinuities of a distribution function are of the jump type. The CDF is continuous from the right. Keep in mind that the CDF always takes on the upper value at a jump point.
Example 1
Let X(ζ) = c, a constant for every outcome. Find FX(x).
Solution: For x < c, {X(ζ) ≤ x} = φ, so that FX(x) = 0; and for x ≥ c, {X(ζ) ≤ x} = Ω, so that FX(x) = 1 (Figure 2.1).
Example 2
Figure 2.1: CDF for example 1
Toss a coin and let Ω = {H, T} with P(T) = q. Suppose the RV X is such that X(T) = 0 and X(H) = 1. Find FX(x).
Solution: For x < 0, {X(ζ) ≤ x} = φ, so FX(x) = 0; for 0 ≤ x < 1, {X(ζ) ≤ x} = {T}, so FX(x) = P(T) = q; and for x ≥ 1, {X(ζ) ≤ x} = Ω, so FX(x) = 1.
• If FX(x) is constant except for a finite number of jump discontinuities (piecewise constant; discrete-type RV), and xi is such a jump point, then
pi = P{X = xi} = FX(xi) − FX(xi−)
For Example 1,
P{X = c} = FX(c) − FX(c−) = 1 − 0 = 1
and for Example 2,
P{X = 0} = FX(0) − FX(0−) = q − 0 = q
Example 3
A fair coin is tossed twice, and let the RV X represent the number of heads. Find FX (x).
Solution: X(TT) = 0, X(TH) = X(HT) = 1, X(HH) = 2, so P{X = 0} = 1/4, P{X = 1} = 1/2, P{X = 2} = 1/4, and FX(x) is the staircase function shown in Figure 2.3. For example, the jump at x = 1 is
P{X = 1} = FX(1) − FX(1−) = 3/4 − 1/4 = 1/2
The first derivative of the distribution function FX(x) is called the probability density function (pdf) of the RV X:
fX(x) = dFX(x)/dx
Figure 2.3: CDF for example 3
• Discrete RV: if X is a discrete-type RV, then its density function has the general form
fX(x) = Σ_i pi δ(x − xi)
where the xi represent the jump-discontinuity points in FX(x). As Fig. 2.4 shows, fX(x) for a discrete RV is a train of impulses located at those points.
Since FX(+∞) = 1, it follows that
∫_{−∞}^{+∞} fX(u) du = 1
Also, from the theorem above,
P(x1 < X ≤ x2) = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx
Thus the area under fX(x) in the interval (x1, x2) represents the probability in the above equation.
• Often, RV s are referred by their specific density functions - both in the continuous and
discrete cases - and in what follows we shall list a number of them in each category.
• Normal (Gaussian): X ∼ ℵ(µ, σ²) if
fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]  (2.3)
This is a bell-shaped curve, symmetric around the parameter µ, and its distribution function is given by
FX(x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(y − µ)²/(2σ²)] dy = Φ((x − µ)/σ)  (2.4)
where Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−y²/2) dy is called the standard normal CDF and is often tabulated. Figure 2.6 shows the pdf and cdf of the Normal distribution for different means and variances.
Figure 2.6: pdf and cdf of Normal distribution for different means and variances
For X ∼ ℵ(µ, σ²),
P(a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ)
The complementary CDF is
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−y²/2) dy = 1 − Φ(x)
Q(x) is called the standard normal complementary CDF, and Q(x) = 1 − Φ(x). Since
Y = (X − µ)/σ ∼ ℵ(0, 1)  (2.5)
probabilities for any normal RV can be computed from standard normal tables. More generally,
aX + b ∼ ℵ(aµ + b, a²σ²)
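Φ(x) and Q(x) are easy to evaluate numerically. The short Python sketch below is an addition to the notes; it builds Φ from the standard error function math.erf (for which Φ(x) = 0.5[1 + erf(x/√2)]) and computes P(a < X < b) for illustrative values of µ and σ chosen here, not taken from the notes.

import math

def Phi(x):
    # standard normal CDF via the standard error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    # standard normal complementary CDF
    return 1.0 - Phi(x)

def prob_interval(a, b, mu, sigma):
    # P(a < X < b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma)
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

mu, sigma = 0.0, 1.0                      # assumed example values
print(prob_interval(-1, 1, mu, sigma))    # ~0.6827 (one-sigma probability)
print(Q(3.0))                             # ~0.00135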
• Uniform: X ∼ U(a, b), a < b, if fX(x) = 1/(b − a) for a ≤ x ≤ b and 0 elsewhere.
Figure 2.7: pdf and cdf of Uniform distribution
• Exponential: X ∼ E(λ) if
fX(x) = { (1/λ) exp(−x/λ), x ≥ 0;  0, elsewhere }  (2.7)
Figure 2.8 shows the pdf and cdf of the Exponential distribution for different values of the parameter λ.
• Chi-square: when the RV X is defined as X = Σ_{i=1}^{n} Xi², where the Xi, i = 1, · · · , n, are statistically independent ℵ(0, 1) RVs, then X is said to have a chi-square distribution with n degrees of freedom. Γ(·) is the Gamma function, given by
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0, with Γ(1/2) = √π
Let Y = X1² + X2², where X1 and X2 ∼ ℵ(0, σ²) and are independent. Then Y is chi-square distributed with 2 degrees of freedom, and
fY(y) = (1/(2σ²)) exp(−y/(2σ²)), y ≥ 0
Now, suppose we define a new RV as R = √Y; then R is Rayleigh distributed.
2.4 Discrete-type Random Variables
• Bernoulli: X takes the values 0 and 1, with
P(X = 0) = q, P(X = 1) = p, p = 1 − q
• Binomial: X ∼ B(n, p) if
P(X = k) = C(n, k) p^k q^{n−k}, k = 0, 1, 2, · · · , n
where C(n, k) is the binomial coefficient.
• Poisson: X ∼ P(λ) if
P(X = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, · · ·
• Uniform (discrete):
P(X = k) = 1/n, k = 1, 2, · · · , n
• Geometric:
P(X = k) = (1 − p)^{k−1} p, k = 1, 2, · · ·
where the parameter p ∈ (0, 1) is the probability that a head appears on each individual toss.
• The Poisson RV is a good model for counting events that occur randomly in time. While the time of each occurrence is completely random, there is a known average number of occurrences per unit time; an example is the arrival of telephone calls at a switching office.
For example, calls arrive at random times at a telephone switching office with an average
of λ = 0.25 calls/second. The pmf of the number of calls that arrive in a T = 2 second interval is
PK(k) = { (0.5)^k e^{−0.5} / k!, k = 0, 1, 2, · · · ;  0, o.w. }
Example (repetition coding): to reduce the effect of channel errors, suppose we transmit the same binary symbol 5 times. Thus "zero" is transmitted as 00000 and "one" is transmitted as 11111. The receiver detects the correct information if three or more binary symbols are received correctly. What is the information error probability P(E), if the binary symbol error probability is 0.1?
In this case, we have five trials corresponding to five transmissions. On each trial the symbol is received in error with probability q = 0.1, independently of the other trials, so the number of symbol errors is binomial B(5, 0.1). An information error E occurs when three or more of the five symbols are in error; the dominant contribution is C(5, 3)(0.1)³(0.9)² = 0.0081.
By increasing the number of transmissions (5 times), the probability of error is reduced from 0.1 to about 0.0081.
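The exact error probability of the repetition code is a short binomial sum. The sketch below (an addition, not part of the notes) computes both the leading k = 3 term quoted above and the full sum P(3 or more symbol errors), which is only slightly larger.

from math import comb

q = 0.1              # binary symbol error probability
n = 5                # repetition factor

# information error: 3 or more of the 5 transmitted symbols received in error
p_error = sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(3, n + 1))
leading_term = comb(n, 3) * q**3 * (1 - q)**2

print("leading (k=3) term:", leading_term)   # 0.0081
print("exact P(E)        :", p_error)        # ~0.00856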
A Bernoulli trial consists of repeated independent and identical experiments, each of which has only two outcomes, A or Ā, with P(A) = p and P(Ā) = q. The probability of exactly k occurrences of A in n trials is the binomial pmf Pn(k) = C(n, k) p^k q^{n−k}. Let
Xk = {exactly k occurrences of A in n trials}  (2.10)
Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, · · · , n, one of the events X0, X1, · · · , Xn must occur, i.e.
P(X0 ∪ X1 ∪ X2 ∪ · · · ∪ Xn) = 1 (2.11)
For a given n and p, what is the most likely value of k? The most probable value of k is the number which maximizes the Binomial pmf. To obtain this value, consider the ratio
Pn(k)/Pn(k − 1) = (n − k + 1)p / (kq)
Thus Pn(k) > Pn(k − 1) if k(1 − p) < (n − k + 1)p, or k < (n + 1)p. Thus, Pn(k) as a function of k increases until
k = (n + 1)p
if it is an integer, or otherwise until the largest integer kmax less than (n + 1)p, and kmax represents the most likely number of occurrences of A in n trials.
Example 4
In a Bernoulli experiment with n trials, find the probability that the number of occurrences
of A is between k1 and k2 .
Solution: with Xi, i = 0, 1, 2, · · · , n as defined in Equation (2.10), clearly they are mutually exclusive events. Thus
P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} Pn(k) = Σ_{k=k1}^{k2} C(n, k) p^k q^{n−k}  (2.14)
Example 5
Suppose 5, 000 components are ordered. The probability that a part is defective equals 0.1.
What is the probability that the total number of defective parts does not exceed 400?
Solution: Let X denote the number of defective parts among the 5000, so X ∼ B(5000, 0.1) and
P(X ≤ 400) = Σ_{k=0}^{400} C(5000, k) (0.1)^k (0.9)^{5000−k}
The above equation has too many terms to compute directly. Clearly, we need a technique to compute this quantity in a more convenient manner; the Gaussian approximation to the Binomial turns out to be useful.
Thus if k1 and k2 in Equation (2.14) are within or around the neighborhood of the interval (np − √(npq), np + √(npq)), we can approximate the summation in Equation (2.14) by an integration as
P(k1 < X < k2) ≈ ∫_{k1}^{k2} (1/√(2πnpq)) exp[−(x − np)²/(2npq)] dx = ∫_{x1}^{x2} (1/√(2π)) exp(−y²/2) dy  (2.16)
where
x1 = (k1 − np)/√(npq),  x2 = (k2 − np)/√(npq)
We can express Equation (2.16) in terms of the normalized integral that has been tabulated:
erf(x) = (1/√(2π)) ∫_{0}^{x} e^{−y²/2} dy = −erf(−x)  (2.17)
Example 6
A fair coin is tossed 5, 000 times. Find the probability that the number of heads is between
2, 475 to 2, 525.
Solution: We need P (2475 ≤ X ≤ 2525). Here n is large so that we can use the normal
Figure 2.10: pdf of Gaussian approximation.
approximation. In this case p = 1/2, so that np = 2500 and √(npq) ≃ 35. Since np − √(npq) ≃ 2465 and np + √(npq) ≃ 2535, the approximation is valid for k1 = 2475 and k2 = 2525. Thus
P(k1 < X < k2) ≈ ∫_{x1}^{x2} (1/√(2π)) exp(−y²/2) dy  (2.18)
where
x1 = (k1 − np)/√(npq) = −5/7,  x2 = (k2 − np)/√(npq) = 5/7
so that
P(2475 ≤ X ≤ 2525) = erf(x2) − erf(x1) = erf(x2) + erf(|x1|) = 2 erf(5/7) ≈ 0.516
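The value 2·erf(5/7) can be reproduced numerically. The sketch below (added, not from the notes) implements the notes' tabulated function erf(x) = Φ(x) − 1/2 on top of Python's standard math.erf and evaluates the approximation of Example 6.

import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def erf_notes(x):
    # the normalized integral (2.17) used in the notes: Phi(x) - 1/2
    return Phi(x) - 0.5

n, p = 5000, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
k1, k2 = 2475, 2525
x1, x2 = (k1 - mu) / sigma, (k2 - mu) / sigma

approx = erf_notes(x2) - erf_notes(x1)       # = 2 * erf_notes(~5/7)
print("x2 =", round(x2, 4))                  # ~0.707 (close to 5/7)
print("normal approximation:", round(approx, 3))   # ~0.52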
Figure 2.11: The standard normal CDF Φ(z)
Figure 2.12: The standard normal complementary CDF Q(z)
Lecture 3
3.1 Mean of a RV
For a RV X, its pdf fX (x) represents complete information about it. Note that fX (x)
represents very detailed information, and quite often it is desirable to characterize the r.v
in terms of its average behavior. In this context, we will introduce two parameters - mean
and variance - that are universally used to represent the overall properties of the RV and its pdf.
Mean represents the average value of the RV in a very large number of trials. For a continuous RV it is defined as
µ = E[X] = ∫_{−∞}^{∞} x fX(x) dx
For example, for the exponential RV,
E[X] = ∫_{0}^{∞} (x/λ) e^{−x/λ} dx = λ ∫_{0}^{∞} y e^{−y} dy = λ
implying that the parameter λ represents the mean value of the exponential RV.
• For the normal RV,
X̄ = E[X] = (1/√(2πσ²)) ∫_{−∞}^{∞} x exp[−(x − µ)²/(2σ²)] dx = (1/√(2πσ²)) ∫_{−∞}^{∞} (y + µ) exp(−y²/(2σ²)) dy
= (1/√(2πσ²)) ∫_{−∞}^{∞} y exp(−y²/(2σ²)) dy + µ (1/√(2πσ²)) ∫_{−∞}^{∞} exp(−y²/(2σ²)) dy = µ  (3.5)
where the first integral in Equation (3.5) is zero (odd integrand) and the second is 1. Thus the first parameter in ℵ(µ, σ²) is in fact the mean of the Gaussian RV.
Given X ∼ fX(x), suppose Y = g(X) defines a new RV with pdf fY(y). Then, from the definition of the mean,
µY = E[Y] = ∫_{−∞}^{∞} y fY(y) dy
From the above, it appears that to determine E[Y] we need to determine fY(y). However, this is not the case if only E[Y] is the quantity of interest. Instead, we can obtain E[Y] as
µY = E[Y] = E[g(X)] = ∫_{−∞}^{∞} y fY(y) dy = ∫_{−∞}^{∞} g(x) fX(x) dx  (3.7)
In the discrete case,
µY = E[Y] = Σ_i g(xi) P(X = xi)  (3.8)
Therefore, fY(y) is not required to evaluate E[Y] for Y = g(X). As an example, we can determine the second moment of the Poisson RV directly: with g(X) = X²,
E[X²] = Σ_{k=0}^{∞} k² e^{−λ} λ^k / k! = λ² + λ
i.e., the Poisson second moment is λ² + λ.
3.2 Variance of a RV
The mean alone cannot truly represent the pdf of a RV. As an example to illustrate this, consider two Gaussian RVs X1 ∼ ℵ(0, 1) and X2 ∼ ℵ(0, 10). Both of them have the same mean. However, as Figure 3.1 shows, their pdfs are quite different: one is more concentrated around the mean, whereas the other one has a wider spread. Clearly, we need at least one additional parameter to measure this spread around the mean.
For a RV X with mean µ, X − µ represents the deviation of the RV from its mean. Since this deviation can be either positive or negative, consider the quantity (X − µ)², whose average value E[(X − µ)²] represents the average mean-square deviation of X around its mean. Define
σX² = E[(X − µ)²] > 0  (3.10)
Figure 3.1: pdfs of two zero-mean Gaussian RVs with σ² = 1 and σ² = 4 (both µ = 0)
σX² is known as the variance of the RV X, and its square root σX = √(E[(X − µ)²]) is known as the standard deviation of X. Note that the standard deviation represents the root-mean-square spread of the RV around its mean. Expanding the square,
σX² = E(X²) − [E(X)]²
For the Poisson RV, using E(X²) = λ² + λ and E(X) = λ,
σX² = E(X²) − [E(X)]² = (λ² + λ) − λ² = λ
Thus for a Poisson RV, mean and variance are both equal to its parameter λ.
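The equality of the Poisson mean and variance is easy to confirm by simulation. A minimal numpy sketch (an addition to the notes; λ = 3 is an arbitrary choice for the check):

import numpy as np

rng = np.random.default_rng(0)
lam = 3.0                                # arbitrary Poisson parameter
x = rng.poisson(lam, size=1_000_000)

print("sample mean    :", x.mean())      # ~ lambda
print("sample variance:", x.var())       # ~ lambda as well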
• The variance of the normal RV ℵ(µ, σ²) can be obtained as
Var(X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx  (3.13)
To simplify, we can use the normalization
∫_{−∞}^{∞} fX(x) dx = (1/√(2πσ²)) ∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx = 1
which gives
∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx = √(2π) σ
Differentiating both sides with respect to σ and rearranging,
∫_{−∞}^{∞} ((x − µ)²/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx = σ²
which is exactly Var(X) in Equation (3.13). Thus for a normal RV ℵ(µ, σ²),
Var(X) = σ²
and therefore the second parameter in ℵ(µ, σ²) in fact represents the variance. As Figure 4.1 shows, the larger the σ, the larger the spread of the pdf around its mean. Thus as the variance of a RV tends to zero, the pdf concentrates more and more around its mean.
3.3 Moments
The quantities
mn = E[X^n]
are known as the moments of the RV X, and
µn = E[(X − µ)^n]
are known as the central moments of X. Clearly, the mean µ = m1 and the variance σ² = µ2. It is easy to relate mn and µn. In fact, for n > 1, using the binomial expansion
(a + b)^n = Σ_{k=0}^{n} C(n, k) a^k b^{n−k}
we get
µn = E[(X − µ)^n] = E[ Σ_{k=0}^{n} C(n, k) X^k (−µ)^{n−k} ]
= Σ_{k=0}^{n} C(n, k) E(X^k) (−µ)^{n−k} = Σ_{k=0}^{n} C(n, k) mk (−µ)^{n−k}  (3.15)
Direct calculation is often a tedious procedure for computing the mean and variance, and in this context the notion of the characteristic function can be quite helpful. It is defined as
ΦX(ω) = E[e^{jωX}] = ∫_{−∞}^{∞} e^{jωx} fX(x) dx
For a discrete RV taking integer values,
ΦX(ω) = Σ_k e^{jkω} P(X = k)  (3.17)
• If X ∼ P(λ) (Poisson distribution), then its characteristic function is given by
ΦX(ω) = Σ_{k=0}^{∞} e^{jkω} e^{−λ} λ^k / k! = e^{−λ} Σ_{k=0}^{∞} (λe^{jω})^k / k! = e^{−λ} e^{λe^{jω}} = e^{λ(e^{jω} − 1)}  (3.18)
Moments are obtained by differentiating the characteristic function at ω = 0:
E(X) = (1/j) ∂ΦX(ω)/∂ω |_{ω=0}  (3.21)
E(X²) = (1/j²) ∂²ΦX(ω)/∂ω² |_{ω=0}  (3.22)
E(X^k) = (1/j^k) ∂^k ΦX(ω)/∂ω^k |_{ω=0}, k ≥ 1  (3.23)
We can use Equations (3.20)-(3.22) to compute the mean, variance and other higher order
moments of any RV X.
• If X ∼ P(λ), then from Equation (3.18),
∂ΦX(ω)/∂ω = e^{−λ} e^{λe^{jω}} λ j e^{jω}  (3.24)
so that from Equation (3.21),
E(X) = λ
which agrees with our earlier derivation in Equation (3.3). Differentiating Equation (3.24) once more,
∂²ΦX(ω)/∂ω² = e^{−λ} [ e^{λe^{jω}} (λ j e^{jω})² + e^{λe^{jω}} λ j² e^{jω} ]  (3.25)
so that from Equation (3.22),
E(X²) = λ² + λ
which again agrees with the results in Equation (3.3). Notice that, compared to the tedious direct calculation, the effort involved when using the characteristic function is very minimal.
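The differentiation can also be delegated to a computer algebra system. The sympy sketch below (an addition to the notes) reproduces E(X) = λ and E(X²) = λ² + λ directly from the Poisson characteristic function (3.18) and the moment formulas (3.21)-(3.22).

import sympy as sp

w, lam = sp.symbols('omega lambda', real=True, positive=True)
j = sp.I

Phi = sp.exp(lam * (sp.exp(j * w) - 1))          # Poisson characteristic function (3.18)

EX  = (sp.diff(Phi, w, 1) / j**1).subs(w, 0)     # first moment, Eq. (3.21)
EX2 = (sp.diff(Phi, w, 2) / j**2).subs(w, 0)     # second moment, Eq. (3.22)

print(sp.simplify(EX))     # lambda
print(sp.simplify(EX2))    # lambda**2 + lambda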
• We can use the characteristic function of the binomial RV B(n, p) in Equation (3.19), ΦX(ω) = (pe^{jω} + q)^n. Its first derivative is
∂ΦX(ω)/∂ω = n(pe^{jω} + q)^{n−1} p j e^{jω}
so that from Equation (3.21), E(X) = np, which is the same as the previous calculation. Differentiating once more and using Equation (3.22), we obtain the second moment of the binomial RV to be
E(X²) = n(n − 1)p² + np = n²p² + npq
Therefore, we obtain the variance of the binomial RV to be
σX² = E(X²) − [E(X)]² = n²p² + npq − n²p² = npq
• To obtain the characteristic function of the Gaussian RV X ∼ ℵ(µ, σ²), we can make use of the definition and complete the square in the exponent (let y − jσ²ω = z, so that y = z + jσ²ω):
ΦX(ω) = e^{jµω} (1/√(2πσ²)) ∫_{−∞}^{∞} e^{−(z + jσ²ω)(z − jσ²ω)/(2σ²)} dz
= e^{jµω} e^{−σ²ω²/2} (1/√(2πσ²)) ∫_{−∞}^{∞} e^{−z²/(2σ²)} dz = e^{jµω − σ²ω²/2}  (3.28)
Notice that the characteristic function of a Gaussian RV itself has the "Gaussian" bell shape. For a zero-mean Gaussian,
fX(x) = (1/√(2πσ²)) e^{−x²/(2σ²)},  ΦX(ω) = e^{−σ²ω²/2}
From Fig. 10, the reverse roles of σ² in fX(x) and ΦX(ω) are noteworthy (σ² vs. 1/σ²).
We conclude this section with a bound that estimates the dispersion of the r.v beyond a
certain interval centered around its mean. Since σ 2 measures the dispersion of the RV X
around its mean µ, we expect this bound to depend on σ 2 as well. Consider an interval of
width 2ε symmetrically centered around its mean µ, as shown in Figure 3.2. What is the probability that X falls outside this interval,
P(|X − µ| ≥ ε) = ?  (3.29)
Figure 3.2: Chebyshev inequality concept
It can be shown that
P(|X − µ| ≥ ε) ≤ σ²/ε²  (3.31)
Equation (3.31) is known as the Chebyshev inequality. Interestingly, to compute the above probability bound, the knowledge of fX(x) is not necessary; we only need σ², the variance of X. Setting ε = kσ in (3.31),
P(|X − µ| ≥ kσ) ≤ 1/k²  (3.32)
Thus with k = 3, we get the probability of X being outside the 3σ interval around its mean to be at most 0.111 for any RV. Obviously this cannot be a tight bound, since it must hold for all RVs. For a Gaussian RV, for example, the probability of being outside the 3σ interval is only about 0.0027, which is much smaller than the bound given by Equation (3.32). The Chebyshev inequality always holds, but the bound it provides may be quite loose.
Example 1
If the height X of a randomly chosen adult has expected value E[X] = 5.5 feet and standard
deviation σX = 1 foot, use the Chebyshev inequality to find an upper bound on P (X ≥ 11)
Solution:
P[X ≥ 11] = P[|X − µX| ≥ 5.5] ≤ Var[X]/5.5² = 0.033 ≃ 1/30
We can see that the Chebyshev inequality is a loose bound. In fact, P[X ≥ 11] is orders of magnitude lower than 1/30; otherwise, we would often expect to see a person over 11 feet tall.
Example 2
If X is uniformly distributed over the interval (0, 10), then, since E[X] = 5 and Var(X) = 25/3, it follows from the Chebyshev inequality that
P(|X − 5| > 4) ≤ σ²/ε² = (25/3) · (1/16) ≃ 0.52
Thus, although Chebyshev's inequality is correct, the upper bound that it provides is not particularly tight; the exact probability here is P(|X − 5| > 4) = 0.2.
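The gap between the Chebyshev bound and the exact probability for this uniform example can be checked by simulation. A short numpy sketch (an addition to the notes):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=1_000_000)

mu, var, eps = 5.0, 25.0 / 3.0, 4.0

exact = np.mean(np.abs(x - mu) > eps)   # Monte Carlo estimate of P(|X-5| > 4)
bound = var / eps**2                    # Chebyshev bound

print("estimated P(|X-5| > 4):", round(exact, 3))   # ~0.2
print("Chebyshev bound       :", round(bound, 3))   # ~0.52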
3.6 Functions of a Random Variable
Let X be a RV and let g(·) be a given function of x. Define the new random variable
Y = g(X)
Given fX(x) (or FX(x)), we wish to determine fY(y) (or FY(y)).
Example 3
Y = aX + b
Solution: Suppose first that a > 0. Then
FY(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a) = FX((y − b)/a)
and
fY(y) = (1/a) fX((y − b)/a)
On the other hand, if a < 0, then
FY(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X > (y − b)/a) = 1 − FX((y − b)/a)
and hence
fY(y) = −(1/a) fX((y − b)/a)
Therefore, we obtain (for all a ≠ 0)
fY(y) = (1/|a|) fX((y − b)/a)
Example 4
Y = X2
Solution:
FY(y) = P(Y ≤ y) = P(X² ≤ y)
If y < 0, the event {X² ≤ y} is empty, so
FY(y) = 0, y < 0
For y > 0, from Figure 12, the event {Y ≤ y} = {X² ≤ y} is equivalent to {x1 < X ≤ x2} with x1 = −√y and x2 = √y. Hence,
FY(y) = P(x1 < X ≤ x2) = FX(x2) − FX(x1) = FX(√y) − FX(−√y), y > 0
By direct differentiation,
fY(y) = (1/(2√y)) [fX(√y) + fX(−√y)] U(y)  (3.33)
If fX(x) is an even function, this reduces to
fY(y) = (1/√y) fX(√y) U(y)  (3.34)
In particular, if X ∼ ℵ(0, 1), so that
fX(x) = (1/√(2π)) e^{−x²/2}  (3.35)
and substituting this into Equation (3.33) or Equation (3.34), we obtain the pdf of Y = X² to be
fY(y) = (1/√(2πy)) e^{−y/2} U(y)  (3.36)
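The transformation result (3.36) can be checked by simulation: square a large number of ℵ(0, 1) samples and compare a histogram estimate of the density with the formula. The numpy sketch below is an addition to the notes.

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)
y = x**2

# histogram estimate of f_Y(y) compared with (3.36): exp(-y/2)/sqrt(2*pi*y)
edges = np.linspace(0.05, 4.0, 40)
hist, _ = np.histogram(y, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
f_theory = np.exp(-centers / 2) / np.sqrt(2 * np.pi * centers)

print("max abs deviation:", np.max(np.abs(hist - f_theory)))  # small (sampling error only)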
3.6.1 General Approach
As a general approach, given Y = g(X), first sketch the graph y = g(x), and determine the range space of y, say a < y < b.
• Next, determine whether there are discontinuities in the range space of y. If so, evaluate P(Y = yi) at those points.
• In the continuous region of y, evaluate
FY(y) = P(g(X) ≤ y)
and determine appropriate events in terms of the RV X for every y. Finally, we must evaluate
fY(y) = dFY(y)/dy, a < y < b
to obtain fY(y).
Alternatively, a continuous function g(x) with derivative ǵ(x) having a finite number of zeros gives, for each y, a finite number of solutions xi of the equation y = g(xi), and the pdf can be obtained directly as
fY(y) = Σ_i fX(xi) / |dy/dx|_i = Σ_i fX(xi) / |ǵ(xi)|  (3.37)
The summation index i in Equation (3.37) depends on y: for every y, the equation y = g(xi) must be solved to obtain the total number of solutions at that y, and the actual solutions x1, x2, · · ·, all in terms of y.
Example 5
Let Y = 1/X. Determine fY(y).
Solution: here, for each y, x1 = 1/y is the only solution of y = g(x) = 1/x, and
dy/dx = −1/x², so that |dy/dx|_{x=x1} = 1/x1² = y²
Thus, from Equation (3.37),
fY(y) = (1/y²) fX(1/y)  (3.38)
• Functions of a discrete-type RV: suppose X is a discrete-type RV with
P(X = xi) = pi, x = x1, x2, . . . , xi, · · · ,
and Y = g(X). Clearly Y is also of discrete-type, and when x = xi , yi = g(xi ), and for those
yi ,
P (Y = yi ) = P (X = xi ) = pi , y = y1 , y2 , · · · , yi , · · ·
Example 6
Suppose X ∼ P(λ), so that
P(X = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, · · ·
and define Y = X² + 1. Find the pmf of Y.
Solution: X takes the values 0, 1, 2, · · · , k, · · ·, so Y only takes the values 1, 2, 5, · · · , k² + 1, · · ·, and
P(Y = k² + 1) = P(X = k)
so that for j = k² + 1,
P(Y = j) = P(X = √(j − 1)) = e^{−λ} λ^{√(j−1)} / (√(j − 1))!, j = 1, 2, 5, · · · , k² + 1, · · ·
Lecture 4
In many experiments, the observations are expressible not as a single quantity, but as a family
of quantities. For example to record the height and weight of each person in a community
or the number of people and the total income in a family, we need two numbers. Let X and
Y denote two random variables (r.v) based on a probability model (Ω, F, P ). Then
P(x1 < X(ζ) < x2) = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx
and
P(y1 < Y(ζ) < y2) = FY(y2) − FY(y1) = ∫_{y1}^{y2} fY(y) dy
What about the probability that the pair of RVs (X, Y ) belongs to an arbitrary region D?
Towards this, we define the joint probability distribution function of X and Y to be
FXY(x, y) = P( (X(ζ) ≤ x) ∩ (Y(ζ) ≤ y) ) = P(X ≤ x, Y ≤ y) ≥ 0 (4.1)
4.1.1 Properties
1. FXY(−∞, y) = FXY(x, −∞) = 0, since {X(ζ) ≤ −∞} and {Y(ζ) ≤ −∞} are null events. Similarly, since (X(ζ) ≤ +∞, Y(ζ) ≤ +∞) = Ω, we get FXY(+∞, +∞) = P(Ω) = 1.
2.
P(x1 < X(ζ) ≤ x2, Y(ζ) ≤ y) = FXY(x2, y) − FXY(x1, y) (4.3)
P(X(ζ) ≤ x, y1 < Y(ζ) ≤ y2) = FXY(x, y2) − FXY(x, y1) (4.4)
To prove (4.3), note that {X ≤ x2, Y ≤ y} = {X ≤ x1, Y ≤ y} ∪ {x1 < X ≤ x2, Y ≤ y}, and the mutually exclusive property of the events on the right side gives
FXY(x2, y) = FXY(x1, y) + P(x1 < X(ζ) ≤ x2, Y(ζ) ≤ y)
which proves Equation (4.3). Similarly one can prove Equation (4.4).
3.
P(x1 < X(ζ) ≤ x2, y1 < Y(ζ) ≤ y2) = FXY(x2, y2) − FXY(x2, y1) − FXY(x1, y2) + FXY(x1, y1) (4.5)
This is the probability that (X, Y) belongs to the rectangle in Figure 4.1. To prove Equation (4.5), we can make use of the following identity involving mutually exclusive events on the right side:
{x1 < X ≤ x2, Y ≤ y2} = {x1 < X ≤ x2, Y ≤ y1} ∪ {x1 < X ≤ x2, y1 < Y ≤ y2}
This gives
P(x1 < X ≤ x2, y1 < Y ≤ y2) = P(x1 < X ≤ x2, Y ≤ y2) − P(x1 < X ≤ x2, Y ≤ y1)
and the desired result in Equation (4.5) follows by making use of Equation (4.3) with y = y2 and y = y1, respectively.
4.2 Joint Probability Density Function (Joint pdf)
The joint pdf is defined as
fXY(x, y) = ∂²FXY(x, y) / ∂x∂y (4.6)
so that
FXY(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY(u, v) dv du (4.7)
More generally, for any region R0 in the (x, y) plane,
P((X, Y) ∈ R0) = ∫∫_{(x,y)∈R0} fXY(x, y) dx dy (4.9)
In the context of several RVs, the statistics of each individual one are called marginal statistics. Thus FX(x) is the marginal probability distribution function of X, and fX(x) is the marginal pdf of X. It is interesting to note that all marginals can be obtained from the joint cdf or the joint pdf. In fact,
FX(x) = FXY(x, +∞),  FY(y) = FXY(+∞, y) (4.10)
Also,
fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy,  fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx (4.11)
To prove Equation (4.10), note that
(X ≤ x) = (X ≤ x) ∩ (Y ≤ +∞)
so that FX(x) = P(X ≤ x, Y ≤ +∞) = FXY(x, +∞), and similarly FY(y) = FXY(+∞, y), which proves Equation (4.10).
To prove Equation (4.11), we can make use of Equations (4.7) and (4.10), which give
FX(x) = FXY(x, +∞) = ∫_{u=−∞}^{x} ∫_{y=−∞}^{+∞} fXY(u, y) dy du
and differentiating with respect to x yields fX(x) = ∫_{−∞}^{+∞} fXY(x, y) dy.
If X and Y are discrete RVs, then pij = P(X = xi, Y = yj) represents their joint pmf, and the marginal pmfs are
P(X = xi) = Σ_j pij,  P(Y = yj) = Σ_i pij (4.14)
If the joint pmf is arranged in a table whose (i, j) entry is pij, then to obtain P(X = xi) from Equation (4.14) one needs to add up all the entries in the ith row.
4.3.1 Examples
As Equation (4.11) shows, the joint cdf and/or the joint pdf represent complete information about the RVs, and their marginal pdfs can be evaluated from the joint pdf. However, given only the marginals, the joint pdf cannot in general be recovered.
Example 1
Given
fXY(x, y) = { c (a constant), 0 < x < y < 1;  0, o.w. } (4.15)
Obtain the marginal pdf s fX (x) and fY (y).
Solution: It is given that the joint pdf fXY(x, y) is a constant c in the shaded region 0 < x < y < 1. Normalization determines the constant:
∫_{−∞}^{+∞} ∫_{−∞}^{+∞} fXY(x, y) dx dy = c ∫_{y=0}^{1} ( ∫_{x=0}^{y} dx ) dy = c ∫_{y=0}^{1} y dy = c/2 = 1 (4.16)
Thus c = 2. Moreover,
fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy = ∫_{y=x}^{1} 2 dy = 2(1 − x), 0 < x < 1
Similarly,
fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx = ∫_{x=0}^{y} 2 dx = 2y, 0 < y < 1
Clearly, in this case, given only fX(x) and fY(y) as above, it would not be possible to recover the original joint pdf in Equation (4.15).
Example 2
X and Y are said to be jointly normal (Gaussian) distributed if their joint pdf has the following form:
fXY(x, y) = [1 / (2π σX σY √(1 − ρ²))] exp{ −1/(2(1 − ρ²)) [ (x − µX)²/σX² − 2ρ(x − µX)(y − µY)/(σX σY) + (y − µY)²/σY² ] },
−∞ < x < ∞, −∞ < y < ∞, |ρ| < 1 (4.17)
By direct integration it can be shown that
fX(x) = (1/√(2πσX²)) exp[ −(x − µX)²/(2σX²) ] ∼ ℵ(µX, σX²)
and similarly
fY(y) = (1/√(2πσY²)) exp[ −(y − µY)²/(2σY²) ] ∼ ℵ(µY, σY²)
Following the above notation, we will denote Equation (4.17) as ℵ(µX, µY, σX², σY², ρ).
Once again, knowing the marginals in above alone doesn’t tell us everything about the
joint pdf in Equation (4.17). As we show below, the only situation where the marginal
pdf s can be used to recover the joint pdf is when the random variables are statistically
independent.
• For continuous RVs, X and Y are said to be statistically independent if
fXY(x, y) = fX(x) · fY(y) (4.18)
or, equivalently, FXY(x, y) = FX(x) · FY(y). For discrete RVs, if X and Y are independent, then we must have
P(X = xi, Y = yj) = P(X = xi) · P(Y = yj) ∀ i, j (4.20)
Equations (4.18)-(4.20) give us the procedure to test for independence. Given fXY (x, y),
obtain the marginal pdf s fX (x) and fY (y) and examine whether one of equations in
(4.18) or (4.20) is valid. If so, the RV s are independent, otherwise they are dependent.
• Returning to Example 1, we observe by direct verification that fXY(x, y) ≠ fX(x) · fY(y), so X and Y there are dependent.
• It is easy to see that such is the case in Example 2 also, unless ρ = 0; in other words, two jointly Gaussian RVs as in Equation (4.17) are independent if and only if the fifth parameter ρ = 0.
If X and Y are random variables and g(·) is a function of two variables, then
E[g(X, Y)] = Σ_y Σ_x g(x, y) · p(x, y)  (discrete case)
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fXY(x, y) dx dy  (continuous case)  (4.21)
• If X and Y are independent, then for any functions h(·) and g(·),
E[g(X)h(Y)] = E[g(X)] E[h(Y)]
In particular, E[XY] = E[X] E[Y].
Example 3
Random variables X1 and X2 are independent and identically distributed with probability density function
fX(x) = { 1 − x/2, 0 ≤ x ≤ 2;  0, o.w. }
Find the CDF of Z = max(X1, X2).
Solution: Let FX(x) denote the common CDF of X1 and X2; for 0 ≤ x ≤ 2, FX(x) = ∫_0^x (1 − u/2) du = x − x²/4. The CDF of Z = max(X1, X2) is found by observing that {Z ≤ z} = {X1 ≤ z, X2 ≤ z}, so by independence FZ(z) = FX(z)².
Thus, for 0 ≤ z ≤ 2,
FZ(z) = (z − z²/4)²
Example 4
Given
fXY(x, y) = { x y² e^{−y}, 0 < y < ∞, 0 < x < 1;  0, o.w. }
Determine whether X and Y are independent.
Solution:
fX(x) = ∫_0^∞ fXY(x, y) dy = x ∫_0^∞ y² e^{−y} dy = x [ −y² e^{−y} |_0^∞ + 2 ∫_0^∞ y e^{−y} dy ] = 2x, 0 < x < 1
Similarly,
fY(y) = ∫_0^1 fXY(x, y) dx = (y²/2) e^{−y}, 0 < y < ∞
In this case,
fX(x) · fY(y) = 2x · (y²/2) e^{−y} = x y² e^{−y} = fXY(x, y)
and hence X and Y are statistically independent.
4.6 Correlation and Covariance
• Covariance: given any two RVs X and Y, define
Cov(X, Y) = E[(X − µX)(Y − µY)]
By expanding and simplifying the right side of the above equation, we also get
Cov(X, Y) = E(XY) − µX µY (4.22)
• Correlation coefficient:
ρXY = Cov(X, Y) / √(Var(X) Var(Y)) = Cov(X, Y) / (σX σY),  −1 ≤ ρXY ≤ 1
so that
Cov(X, Y) = ρXY σX σY
If ρXY = 0, X and Y are said to be uncorrelated.
• Orthogonality: X and Y are said to be orthogonal if
E(XY) = 0
From the above, if either X or Y has zero mean, then orthogonality implies uncorrelatedness and vice versa.
Suppose X and Y are independent RV s,
E(XY ) = E(X)E(Y ),
therefore from Equation (4.22), we conclude that the random variables are uncorrelated.
Thus independence implies uncorrelatedness (ρXY = 0), but the converse is generally not true.
Example 5
Let Z = aX + bY. Find the variance of Z in terms of σX, σY and ρXY.
Solution: µZ = E[Z] = aµX + bµY, and
σZ² = Var(Z) = E[(Z − µZ)²] = E[ (a(X − µX) + b(Y − µY))² ] = a²σX² + 2ab ρXY σX σY + b²σY²
In particular, if X and Y are independent, then ρXY = 0, and the above equation reduces to
σZ² = a²σX² + b²σY²
Thus the variance of the sum of independent RV s is the sum of their variances (a = b = 1).
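The variance formula for Z = aX + bY is easy to verify numerically. The numpy sketch below is an addition to the notes; the constants a, b and the two independent Gaussian RVs are arbitrary choices for the check.

import numpy as np

rng = np.random.default_rng(3)
a, b = 2.0, -3.0                             # arbitrary constants

# independent X and Y, so rho_XY = 0
x = rng.normal(1.0, 2.0, size=1_000_000)     # sigma_X = 2
y = rng.normal(-1.0, 0.5, size=1_000_000)    # sigma_Y = 0.5
z = a * x + b * y

predicted = a**2 * 2.0**2 + b**2 * 0.5**2    # a^2 sigma_X^2 + b^2 sigma_Y^2 = 18.25
print("sample var(Z):", round(z.var(), 3))
print("predicted    :", predicted)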
4.7 Moments
The joint moments of X and Y are defined as
E[X^m Y^n] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^m y^n fXY(x, y) dx dy
4.8 Joint Characteristic Function
Following the one-random-variable case, we can define the joint characteristic function between two random variables, which will turn out to be useful for moment calculations:
ΦXY(ω1, ω2) = E[ e^{j(ω1 X + ω2 Y)} ]
From this and the two-dimensional inversion formula for Fourier transforms, it follows that
fXY(x, y) = (1/4π²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ΦXY(ω1, ω2) e^{−j(ω1 x + ω2 y)} dω1 dω2
Note that
ΦXY(ω1, 0) = ΦX(ω1),  ΦXY(0, ω2) = ΦY(ω2)
Also, if X and Y are independent,
ΦXY(ω1, ω2) = ΦX(ω1) ΦY(ω2)
Convolution: let Z = X + Y, where X and Y are independent RVs. Then
ΦZ(ω) = E[e^{jω(X+Y)}] = E[e^{jωX}] E[e^{jωY}] = ΦX(ω) ΦY(ω)
It is known that the density of Z equals the convolution of fX(x) and fY(y). From the above, the characteristic function of the convolution of two densities equals the product of their individual characteristic functions.
Example 6
Suppose X ∼ P(λ1) and Y ∼ P(λ2) are independent Poisson RVs, and let Z = X + Y. Then
ΦX(ω) = e^{λ1(e^{jω} − 1)},  ΦY(ω) = e^{λ2(e^{jω} − 1)}
so that
ΦZ(ω) = ΦX(ω) ΦY(ω) = e^{(λ1 + λ2)(e^{jω} − 1)}
i.e., Z ∼ P(λ1 + λ2): the sum of independent Poisson RVs is Poisson.
• Central Limit Theorem (CLT): let X1, X2, · · · , Xn be independent, identically distributed RVs with common mean µ and finite variance σ², and define the normalized sum
Y = (X1 + X2 + · · · + Xn − nµ) / (σ√n)
Then, as n → ∞,
Y → ℵ(0, 1)
The central limit theorem states that a large sum of independent random variables each
with finite variance tends to behave like a normal random variable. Thus the individual
pdf s become unimportant to analyze the collective sum behavior. If we model the noise
phenomenon as the sum of a large number of independent random variables (e.g.: electron
motion in resistor components), then this theorem allows us to conclude that noise behaves
like a Gaussian RV. This theorem holds for any distribution of the Xi's; herein lies its
power.
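A quick numerical illustration of the CLT (an addition to the notes): sum n uniform RVs, which are individually far from Gaussian, normalize as above, and check that the result behaves like ℵ(0, 1).

import numpy as np

rng = np.random.default_rng(4)
n, trials = 50, 100_000

# X_i uniform on (0,1): mu = 0.5, sigma^2 = 1/12
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)
x = rng.uniform(0.0, 1.0, size=(trials, n))
y = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

print("mean (expect ~0):", round(y.mean(), 3))
print("var  (expect ~1):", round(y.var(), 3))
print("P(Y <= 1), expect Phi(1) ~ 0.841:", round(np.mean(y <= 1.0), 3))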
Lecture 5
Stochastic Processes
• Definition: Stochastic Process: a stochastic process X(t) consists of an experiment with a probability measure P[·] defined on a sample space S and a function that assigns a time function x(t, s) to each outcome s in the sample space of the experiment.
• Definition: Sample Function: a sample function x(t, s) is the time function associated with the outcome s of an experiment.
Figure 5.1: Illustration of Stochastic Process.
• Discrete Value and Continuous Value Processes: X(t) is a discrete value process if the set of all possible values of X(t) at all times t is a countable set SX; otherwise, X(t) is a continuous value process.
• Discrete Time and Continuous Time Process: The stochastic process X(t) is a discrete
time process if X(t) is defined only for a set of time instants, tn = nT , where T is a
constant and n is an integer; otherwise X(t) is a continuous time process.
• Random variables from random processes: consider a sample function x(t, s); for a fixed time t1, each x(t1, s) is a sample value of a random variable. We use X(t1) for this random variable. The notation X(t) can refer to either the random process or the random variable that corresponds to the value of the process at time t.
Example: in an experiment consisting of repeated independent rolls of a fair die, let Xn denote the value of the roll at time n. What is the pmf of X3?
The random variable X3 is the value of the die roll at time 3. In this case,
PX3(x) = { 1/6, x = 1, 2, · · · , 6;  0, o.w. }
Independent, Identically Distributed (i.i.d) Random Sequences
An i.i.d random sequence is a random sequence Xn in which
· · · , X−2, X−1, X0, X1, X2, · · ·
are i.i.d random variables.
dent trials of an experiment at a constant rate. An i.i.d random sequence can be either
discrete value or continuous value. In the discrete case, each random variable Xi has pmf
PXi (x) = PX (x), while in the continuous case, each Xi has pdf fXi (x) = fX (x).
Theorem: let Xn denote an i.i.d random sequence. For a discrete value process, the joint pmf of the samples Xn1, · · · , Xnk is
PXn1,··· ,Xnk(x1, · · · , xk) = PX(x1)PX(x2) · · · PX(xk) = Π_{i=1}^{k} PX(xi)
Otherwise, for a continuous value process, the joint pdf of Xn1, · · · , Xnk is
fXn1,··· ,Xnk(x1, · · · , xk) = fX(x1)fX(x2) · · · fX(xk) = Π_{i=1}^{k} fX(xi)
• The Expected Value of a Process: the expected value of a stochastic process X(t) is the deterministic function
µX(t) = E[X(t)]
• The Autocorrelation Function of X(t) is
RX(t, τ) = E[X(t)X(t + τ)]
and the Autocovariance Function is CX(t, τ) = RX(t, τ) − µX(t)µX(t + τ). For a random sequence Xn, the autocorrelation and autocovariance are defined analogously as RX[m, k] = E[Xm Xm+k] and CX[m, k] = RX[m, k] − µX(m)µX(m + k), where m and k are integers.
Example 1
similarly,
RX(t, 2Ts) = (1/N) Σ_{i=1}^{N} a(i) a(i + 2)
and
CX(t, Ts) = (1/N) Σ_{i=1}^{N} (a(i) − µX)(a(i + 1) − µX)
Example 2
If R is a random variable, find the expected value of the rectified cosine X(t) = R| cos 2πf t|.
Example 3
The input to a digital filter is an i.i.d random sequence · · · , X−1 , X0 , X1 , · · · with E[Xi ] = 0
and V ar[Xi ] = 1. The output is also a random sequence · · · , Y−1 , Y0 , Y1 , · · · . The relation-
ship between the input sequence and the output sequence is expressed in the formula
Yn = Xn + Xn−1
Find the expected value function E[Yn ] and autocovariance function CY (m, k) of the output.
Solution: since E[Xn] = 0 for all n, E[Yn] = E[Xn] + E[Xn−1] = 0. To find CY[m, k], we observe that Xn being an i.i.d random sequence with E[Xn] = 0 and Var[Xn] = 1 implies
CX[m, k] = E[Xm Xm+k] = { 1, k = 0;  0, o.w. }
For any integer k, we can write
CY[m, k] = E[Ym Ym+k] = E[(Xm + Xm−1)(Xm+k + Xm+k−1)]
= CX[m, k] + CX[m, k − 1] + CX[m − 1, k + 1] + CX[m − 1, k]
We still need to evaluate the above expression for all k. For each value of k, some terms in the sum vanish:
When k = 0: CY[m, 0] = CX[m, 0] + CX[m − 1, 0] = 2
When k = 1: CY[m, 1] = CX[m, 0] = 1
When k = −1: CY[m, −1] = CX[m − 1, 0] = 1
When |k| ≥ 2: CY[m, k] = 0
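These autocovariance values can be checked by simulation. The numpy sketch below is an addition to the notes and assumes, as in the reconstruction above, the two-tap filter Yn = Xn + Xn−1 driven by a zero-mean, unit-variance i.i.d input.

import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)   # i.i.d., zero mean, unit variance

# assumed two-tap filter: Y_n = X_n + X_(n-1)
y = x[1:] + x[:-1]

def autocov(seq, k):
    # empirical C_Y[k] = E[Y_m Y_(m+k)] for a zero-mean sequence
    return np.mean(seq[:len(seq) - k] * seq[k:])

for k in range(4):
    print(f"C_Y[{k}] ~", round(autocov(y, k), 3))   # expect 2, 1, 0, 0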
5.4 Stationary Processes
In general, for the stochastic process, X(t), there is a random variable X(t1 ) at every time
instant t1 with pdf fX(t1 ) (x) which depends on t1 . For a special class of random process
known as stationary processes fX(t1 ) (X) does not depend on t1 . That is, for any two time
instants t1 and t1 + τ,
fX(t1)(x) = fX(t1+τ)(x) = fX(x)
• If X(t) is a stationary process, the expected value, the autocorrelation, and the autocovariance do not depend on the absolute time origin:
(a) µX(t) = µX
(b) RX(t, τ) = RX(0, τ) = RX(τ)
(c) CX(t, τ) = CX(0, τ) = CX(τ)
Example 4
At the receiver of an AM radio, the received signal contains a cosine carrier signal at the
carrier frequency fc with a random phase θ that is a sample value of the uniform (0, 2π) random variable. The received carrier signal is
X(t) = A cos(2πfc t + θ)
What are the expected value and autocorrelation of the process X(t)?
Solution: since θ is uniform on (0, 2π),
E[X(t)] = E[A cos(2πfc t + θ)] = (A/2π) ∫_0^{2π} cos(2πfc t + θ) dθ = 0
We will use the identity cos A cos B = [cos(A − B) + cos(A + B)]/2 to find the autocorrelation:
RX(t, τ) = E[A² cos(2πfc t + θ) cos(2πfc(t + τ) + θ)]
= (A²/2) cos(2πfc τ) + (A²/2) E[cos(2πfc(2t + τ) + 2θ)]
The second term is zero, being the expectation of a cosine with uniformly distributed phase. Thus RX(t, τ) = (A²/2) cos(2πfc τ) = RX(τ).
Therefore, X(t) has the properties of a stationary stochastic process.
X(t) is a wide sense stationary (WSS) stochastic process if and only if, for all t,
(a) E[X(t)] = µX (a constant), and
(b) RX(t, τ) = RX(0, τ) = RX(τ), i.e., the autocorrelation depends only on the time difference τ.
In Example 4, we observe that µX(t) = 0 and RX(t, τ) = (A²/2) cos 2πfc τ. Thus the random process X(t) is wide sense stationary.
Properties of WSS
The autocorrelation function of a wide sense stationary process has a number of important
properties:
1. RX (0) ≥ 0
2. RX (τ ) = RX (−τ )
3. RX (0) ≥ RX (τ )
The average power of a wide sense stationary process X(t) is RX (0) = E[X 2 (t)].
Example: using these properties, determine which of the following functions can be a valid autocorrelation function of a wide sense stationary process:
1. R1(τ) = e^{−|τ|}
2. R2(τ) = e^{τ²}
3. R3(τ) = e^{−τ} cos τ
4. R4(τ) = e^{−τ²} sin τ
Example 5
A simple model (in degrees Celsius) for the daily temperature process C(t) is
Cn = 16[1 − cos(2πn/365)] + 4Xn
where X1, X2, · · · is an i.i.d random sequence of ℵ(0, 1) random variables.
(a) What is the mean E[Cn]?
Solution: since E[Xn] = 0,
E[Cn] = 16[1 − cos(2πn/365)]
Example 6
Consider the temperature model
Cn = (1/2) Cn−1 + 4Xn,
where C0, X1, X2, · · · is an iid random sequence of ℵ(0, 1) random variables.
(a) What is the mean E[Cn]?
Solution:
a) Since C0, X1, X2, · · · all have zero mean,
E[Cn] = E[C0]/2^n + 4 Σ_{i=1}^{n} E[Xi]/2^{n−i} = 0
Electrical signals are usually represented as sample functions of wide sense stationary stochas-
tic processes. We use probability density functions and probability mass functions to describe
the amplitude characteristics of signals, and we use autocorrelation functions to describe the
time-varying nature of the signals. Practical equipment uses digital signal processing: the signal is sampled and each sample is quantized to a discrete random variable Qn. Here, we ignore quantization and analyze linear filtering of random processes.
The output w(t) of a linear time-invariant (LTI) filter with impulse response h(t) is related to the stochastic process v(t) at the input of the filter by the convolution:
w(t) = ∫_{−∞}^{∞} h(u) v(t − u) du = ∫_{−∞}^{∞} h(t − u) v(u) du
If the possible inputs to the filter are x(t), sample functions of a stochastic process X(t),
then the outputs, y(t), are sample functions of another stochastic process, Y (t). Because
y(t) is the convolution of x(t) and h(t), we adopt the following notation for the relationship
of Y (t) to X(t):
Y(t) = ∫_{−∞}^{∞} h(u) X(t − u) du = ∫_{−∞}^{∞} h(t − u) X(u) du
Similarly, the expected value of Y (t) is the convolution of h(t) and E[X(t)].
E[Y(t)] = E[ ∫_{−∞}^{∞} h(u) X(t − u) du ] = ∫_{−∞}^{∞} h(u) E[X(t − u)] du
If the input to an LTI filter with impulse response h(t) is a WSS process X(t), the output Y(t) has the following properties:
• Y(t) is a WSS process with expected value
µY = E[Y(t)] = µX ∫_{−∞}^{∞} h(u) du
• X(t) and Y(t) are jointly wide sense stationary and have input-output cross-correlation
RXY(τ) = ∫_{−∞}^{∞} h(u) RX(τ − u) du
Example 7
X(t), a wide sense stationary stochastic process with expected value µX = 10 volts, is the input to an LTI filter whose impulse response is h(t) = e^{t/0.2} for 0 ≤ t ≤ 0.1 s and h(t) = 0 otherwise. What is the expected value of the output process Y(t)?
Solution:
µY = µX ∫_{−∞}^{∞} h(t) dt = 10 ∫_{0}^{0.1} e^{t/0.2} dt = 2(e^{0.5} − 1) ≈ 1.3 volts
Suppose a WSS process X(t) is sampled at a rate of 1/Ts samples per second, producing the random sequence Xn = X(nTs). If X(t) is a wide sense stationary process with expected value µX and autocorrelation RX(τ), then Xn is a wide sense stationary random sequence with expected value µX and autocorrelation
RX[k] = RX(kTs)
When such a sequence is applied to a discrete-time LTI filter with impulse response hn, the output is a random sequence Yn related to the input Xn by the discrete-time convolution
Yn = Σ_{i=−∞}^{∞} hi Xn−i
If the input to a discrete-time LTI filter with impulse response hn is a wide sense stationary random sequence Xn, then:
• (a) Yn is a wide sense stationary random sequence with expected value
µY = E[Yn] = µX Σ_{n=−∞}^{∞} hn
• (b) Yn and Xn are jointly wide sense stationary with input-output cross-correlation
RXY[n] = Σ_{i=−∞}^{∞} hi RX[n − i]
Example 8
This example considers an M-tap averaging filter with hi = 1/M, i = 0, 1, · · · , M − 1. For the case M = 2 (so h0 = h1 = 0.5), find the following properties of the output random sequence Yn: the expected value µY and the autocorrelation RY[n].
Solution:
µY = µX (h0 + h1 ) = µX = 1.
The autocorrelation of the filter output is
RY[n] = Σ_{i=0}^{1} Σ_{j=0}^{1} (0.25) RX[n + i − j]
= (0.5)RX[n] + (0.25)RX[n − 1] + (0.25)RX[n + 1] = { 3, n = 0;  2, |n| = 1;  0.5, |n| = 2;  0, o.w. }
5.6 Power Spectral Density of a Continuous-Time Process
As you studied before, the functions g(t) and G(f) form a Fourier transform pair:
G(f) = ∫_{−∞}^{∞} g(t) e^{−j2πft} dt,  g(t) = ∫_{−∞}^{∞} G(f) e^{j2πft} df
The power spectral density function of the wide sense stationary stochastic process X(t) is
" Z #
T
1 h
2
i 1 2
SX (f ) = lim E XT (f ) = lim E X(t)e−j2πf t dt .
T →∞ 2T T →∞ 2T −T
Physically, Sx (f ) has units of watts/Hz = Joules. Both the autocorrelation function and
the power spectral density function convey information about the time structure of X(t).
SX(f) = ∫_{−∞}^{∞} RX(τ) e^{−j2πfτ} dτ,  RX(τ) = ∫_{−∞}^{∞} SX(f) e^{j2πfτ} df
For a wide sense stationary random process X(t), the power spectral density SX(f) is a real-valued function with the following properties:
1. SX(f) ≥ 0, for all f
2. ∫_{−∞}^{∞} SX(f) df = E[X²(t)] = RX(0)
3. SX(−f) = SX(f)
Example 9
A wide sense stationary process X(t) has autocorrelation function RX (τ ) = Ae−b|τ | where
b > 0. Derive the power spectral density function Sx (f ) and calculate the average power
E[X²(t)]. To find SX(f), we use a Fourier transform table, since RX(τ) is of the standard form a e^{−a|τ|}:
SX(f) = 2Ab / ((2πf)² + b²)
The average power is
E[X²(t)] = RX(0) = A e^{−b·0} = ∫_{−∞}^{∞} 2Ab / ((2πf)² + b²) df = A
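The transform pair RX(τ) = A e^{−b|τ|} ↔ SX(f) = 2Ab/((2πf)² + b²) is easy to check numerically. The sketch below (an addition to the notes, with A = 1 and b = 2 chosen arbitrarily) integrates SX(f) to recover the average power and evaluates the transform of RX(τ) at one test frequency.

import numpy as np

A, b = 1.0, 2.0

# S_X(f) = 2Ab / ((2*pi*f)^2 + b^2); its integral over f should equal R_X(0) = A
f = np.linspace(-200.0, 200.0, 2_000_001)
Sx = 2 * A * b / ((2 * np.pi * f) ** 2 + b ** 2)
print("integral of S_X(f) df ~", round(np.trapz(Sx, f), 4))   # ~1.0

# numerical Fourier transform of R_X(tau) at f0, compared with the closed form
tau = np.linspace(-20.0, 20.0, 400_001)
Rx = A * np.exp(-b * np.abs(tau))
f0 = 0.5
Sx_f0 = np.trapz(Rx * np.cos(2 * np.pi * f0 * tau), tau)
print(round(Sx_f0, 4), "vs", round(2 * A * b / ((2 * np.pi * f0) ** 2 + b ** 2), 4))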
Figure 5.3 displays three graphs for each of two stochastic processes. For each process, the
three graphs are the autocorrelation function, the power spectral density function, and one
Figure 5.3: Random processes V(t) and W(t) with autocorrelation functions RV(τ) = e^{−0.5|τ|}
and Rw (τ ) = e−2|τ | are examples of the process X(t) in above Example. These graphs show
Rv (τ ) and Rw (τ ), the power spectral density functions Sv (f ) and Sw (f ), and sample paths
sample function. For both processes, the average power is A = 1 watt. Note W (t) has a
narrower autocorrelation (less dependence between two values of the process with a given
time separation) and a wider power spectral density (more power at higher frequencies) than
V (t). The sample function w(t) fluctuates more rapidly with time than v(t).
The spectral analysis of a random sequence parallels the analysis of a continuous-time pro-
cess. A sample function of a random sequence is an ordered list of numbers. Each number in
the list is a sample value of a random variable. The discrete-time Fourier transform (DTFT)
is a spectral representation of an ordered set of numbers.
The sequence {· · · , X−2, X−1, X0, X1, X2, · · · } and the function X(φ) = Σ_{n=−∞}^{∞} Xn e^{−j2πφn} form a discrete-time Fourier transform pair.
The power spectral density function of the wide sense stationary random sequence Xn is
SX(φ) = Σ_{k=−∞}^{∞} RX[k] e^{−j2πφk},  RX[k] = ∫_{−1/2}^{1/2} SX(φ) e^{j2πφk} dφ
The properties of the power spectral density function of a random sequence are similar to
the properties of the power spectral density function of a continuous-time stochastic process.
1. SX(φ) ≥ 0, for all φ
2. ∫_{−1/2}^{1/2} SX(φ) dφ = E[Xn²] = RX[0]
3. SX (−φ) = SX (φ)
Example 10
The wide sense stationary random sequence Xn has zero expected value and autocorrelation
function
RX[n] = { σ²(2 − |n|)/4, n = −1, 0, 1;  0, o.w. }
Derive the power spectral density function of Xn .
Solution:
We have
SX(φ) = Σ_{n=−1}^{1} RX[n] e^{−j2πnφ}
= σ² [ ((2 − 1)/4) e^{j2πφ} + 2/4 + ((2 − 1)/4) e^{−j2πφ} ]
= (σ²/2) [1 + cos(2πφ)]