(EC636)
by M.Elalem
University of Tripoli
http://melalem.com/EC636.php
Lecture 1
Probabilities
1.1 Introduction
• Examples of random phenomena:
– Today's temperature;
– Walk to a bus station: how long do you wait for the arrival of a bus?
• We create models to analyze, since real experiments are generally too complicated. For example, the waiting time for a bus would depend on factors such as:
– The weight, horsepower, and gear ratio of the bus;
– The status of all road construction within 100 miles of the bus stop.
• It is apparent that it would be too difficult to analyze the effects of all these factors on the likelihood that you will wait less than 5 minutes for a bus. Therefore, it is necessary to create a model that captures the critical parts of the actual physical experiment.
• Probability theory deals with the study of random phenomena, which under repeated experiments yield different outcomes that nevertheless exhibit certain underlying patterns.
– Union: E ∪ F = {s ∈ Ω : s ∈ E OR s ∈ F};
– Intersection: E ∩ F = {s ∈ Ω : s ∈ E AND s ∈ F};
– Complement: E^c = Ē = {s ∈ Ω : s ∉ E}.
1.1.2 Several Definitions
• Disjoint: if A ∩ B = φ, the empty set, then A and B are said to be mutually exclusive (M.E.), or disjoint.
• Partition: the events A1, A2, · · · form a partition of Ω if they are mutually exclusive and their union is the whole sample space:
Ai ∩ Aj = φ for i ≠ j, and ∪_{i=1}^{n} Ai = Ω.
1.1.4 Sample Space, Events and Probabilities
• Outcome: an outcome is any possible observation or result of an experiment.
• Sample space: the sample space of an experiment is the set of all possible outcomes
of that experiment.
∗ noise: S = {n(t); t: real}
Example 1
Toss a coin four times. The sample space consists of 16 four-letter words, with each letter either h (head) or t (tail). An event consists of one or more outcomes; for example, the event B1 = {ttth, ttht, thtt, httt} ("exactly one head") contains four outcomes.
Example 2
Toss two dice; there are 36 elements in the sample space. If we define the events Bi = {the sum of the two dice equals i}, i = 2, 3, · · · , 12, then these events partition the sample space, and the corresponding event space is Ω = {B2, B3, · · · , B12}.
– Practical example: when binary data are transmitted through a noisy channel, we are more interested in events such as "a bit is received in error" than in individual noise waveforms.
Often it is meaningful to talk about at least some of the subsets of S as events, for which we can define probabilities.
Example 3
Consider the experiment where two coins are simultaneously tossed. The sample space is Ω = {γ1, γ2, γ3, γ4}, where γ1 = HH, γ2 = HT, γ3 = TH, γ4 = TT. If we define
A = {γ1, γ2, γ3},
the event A is the same as "head has occurred at least once" and qualifies as an event.
• Theorems are consequences that follow logically from definitions and axioms. Each
theorem has a proof that refers to definitions, axioms, and other theorems.
For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions, which act as the axioms of probability:
1- P(A) ≥ 0 (1.2)
2- P(Ω) = 1 (1.3)
3- For any countable collection A1, A2, · · · of mutually exclusive events,
P(∪_i Ai) = Σ_i P(Ai) (1.4)
(Note that (3) states that if A and B are mutually exclusive (M.E.) events, then P(A ∪ B) = P(A) + P(B).)
• For any event A, A ∪ Ā = Ω, so that P(A ∪ Ā) = P(Ω) = 1. Since A and Ā are M.E., axiom (3) gives P(A) + P(Ā) = 1, i.e., P(Ā) = 1 − P(A).
• Similarly, for any A, A ∩ φ = φ and A ∪ φ = A, where A and φ are M.E. Hence it follows that P(A ∪ φ) = P(A) + P(φ) = P(A), so that
P(φ) = 0.
• Suppose A and B are not mutually exclusive (M.E.). How does one compute P(A ∪ B)? We first express A ∪ B as a union of M.E. events, so that we can make use of the probability axioms. From the figure below,
A ∪ B = A ∪ ĀB,
where A and ĀB are clearly M.E. events. Thus, using axiom (3),
P(A ∪ B) = P(A ∪ ĀB) = P(A) + P(ĀB).
To compute P(ĀB), write
B = B ∩ Ω = B ∩ (A ∪ Ā) = (B ∩ A) ∪ (B ∩ Ā) = BA ∪ BĀ,
where BA and BĀ are M.E. events. Thus P(B) = P(BA) + P(BĀ), so that P(ĀB) = P(B) − P(AB). Therefore
P(A ∪ B) = P(A) + P(B) − P(AB).
For example, if P(A) = 1/2, P(B) = 1/2, and P(AB) = 1/4, then
P(A ∪ B) = P(A) + P(B) − P(AB) = 1/2 + 1/2 − 1/4 = 3/4.
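As a quick numerical check, here is a minimal Python sketch (assuming, for illustration, that A = "head on the first toss" and B = "head on the second toss" of two fair coins, which matches the numbers above):

    from itertools import product

    # The four equally likely outcomes of tossing two fair coins.
    outcomes = list(product("HT", repeat=2))

    A = {w for w in outcomes if w[0] == "H"}  # head on the first toss
    B = {w for w in outcomes if w[1] == "H"}  # head on the second toss

    def P(E):
        # Classical probability: all outcomes equally likely.
        return len(E) / len(outcomes)

    # Inclusion-exclusion: P(A U B) = P(A) + P(B) - P(AB).
    print(P(A | B))                # 0.75
    print(P(A) + P(B) - P(A & B))  # 0.75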
1.1.8 Theorem
For an event space B = {B1, B2, · · · } and any event A in the sample space, let Ci = A ∩ Bi. For i ≠ j, the events Ci and Cj are mutually exclusive, and A = C1 ∪ C2 ∪ · · ·.
Example 4
Toss a coin four times, and let A equal the set of outcomes with less than three heads:
A = {tttt, httt, thtt, ttht, ttth, hhtt, htht, htth, tthh, thth, thht}.
Let {B0, B1, B2, B3, B4} denote the event space in which Bi = {outcomes with exactly i heads}. Then
A = C0 ∪ C1 ∪ C2 ∪ C3 ∪ C4
= (A ∩ B0) ∪ (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3) ∪ (A ∩ B4),
where C3 = C4 = φ, since A contains no outcomes with three or more heads. In terms of the event space, this example states that the event "less than three heads" is the union of the events "zero heads", "one head", and "two heads".
Example 5
A company has a model of telephone usage. It classifies all calls as L (long) or B (brief). It also observes whether calls carry voice (V), fax (F), or data (D). The sample space has six outcomes, S = {LV, BV, LD, BD, LF, BF}, and the probability of each outcome can be arranged in a table with columns V, F, D and rows L, B (the numerical values are not reproduced here). The events {V, F, D} form an event space in the sense of the previous theorem (with L playing the role of the event A). Thus, we can apply the theorem to find
P(L) = P(LV) + P(LD) + P(LF).
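The calculation can be sketched in Python. The individual outcome probabilities below are hypothetical placeholders (the original table values are not reproduced here); only the structure of the computation matters:

    # Hypothetical outcome probabilities; the six values must sum to 1.
    p = {"LV": 0.30, "BV": 0.20, "LD": 0.12, "BD": 0.08, "LF": 0.15, "BF": 0.15}

    # {V, F, D} is an event space, so P(L) = P(LV) + P(LD) + P(LF).
    P_L = p["LV"] + p["LD"] + p["LF"]
    print(P_L)  # 0.57 with these placeholder values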
In N independent trials, suppose NA, NB, and NAB denote the number of times the events A, B, and AB occur, respectively. Then, for large N,
P(A) = NA/N, P(B) = NB/N, P(AB) = NAB/N.
Among the NA occurrences of A, only NAB of them are also found among the NB occurrences of B. Thus the ratio
NAB/NB = (NAB/N)/(NB/N) = P(AB)/P(B)
is a measure of the event A given that B has already occurred. We denote this conditional probability by P(A|B) = probability of the event A given that B has occurred.
We define
P(A|B) = P(AB)/P(B), (1.5)
provided P(B) ≠ 0. As we show below, the above definition satisfies all the probability axioms:
1. Non-negative:
P(A|B) = P(AB)/P(B) ≥ 0, since P(AB) ≥ 0 and P(B) > 0.
2.
P(Ω|B) = P(ΩB)/P(B) = P(B)/P(B) = 1, since ΩB = B.
3. Suppose A ∩ C = φ. Then AB and CB are also M.E., so
P(A ∪ C|B) = P((A ∪ C)B)/P(B) = P(AB)/P(B) + P(CB)/P(B) = P(A|B) + P(C|B).
Hence P(·|B) defines a legitimate probability measure.
Properties of conditional probability:
1. If B ⊂ A, then AB = B, and
P(A|B) = P(AB)/P(B) = P(B)/P(B) = 1.
2. If A ⊂ B, then AB = A, and
P(A|B) = P(AB)/P(B) = P(A)/P(B) > P(A).
In this case the occurrence of B makes A more likely. For example, in rolling a fair die, let A = {outcome is 2} and B = {outcome is even}; then P(A) = 1/6 while P(A|B) = 1/3. The statement that B has occurred (outcome is even) makes the probability of A larger than its unconditional value.
• We can use conditional probability to express the probability of a complicated event in terms of simpler related events: the Law of Total Probability. Let A1, A2, · · · , An be mutually exclusive events whose union is the whole sample space:
Ai ∩ Aj = φ for i ≠ j, and ∪_{i=1}^{n} Ai = Ω;
thus
P(B) = Σ_{i=1}^{n} P(BAi) = Σ_{i=1}^{n} P(B|Ai)P(Ai).
• Independence: the events A and B are said to be statistically independent if P(AB) = P(A)P(B). Notice that this definition is a probabilistic statement, NOT a set-theoretic notion such as mutual exclusiveness (independent and disjoint are not synonyms).
1.1.11 More on Independence
• If A and B are independent, the probability of the joint event reduces to a simple multiplication: P(AB) = P(A)P(B).
– Independent events (with P(A) > 0 and P(B) > 0) cannot be mutually exclusive: independence implies P(AB) = P(A)P(B) > 0, so the event AB cannot be the null set.
Also, if A and B are independent, P(A|B) = P(AB)/P(B) = P(A)P(B)/P(B) = P(A). Thus if A and B are independent, the event that B has occurred does not shed any more light on the event A; it makes no difference to A whether B has occurred or not.
Example 6
A box contains 6 white and 4 black balls. Remove two balls at random without
replacement. What is the probability that the first one is white and the second
one is black?
Let W1 = “first ball removed is white” and B2 = “second ball removed is black”.
We need P(W1 B2) = P(B2|W1)P(W1). But
P(W1) = 6/(6+4) = 6/10 = 3/5,
and, since after removing a white ball the box contains 5 white and 4 black balls,
P(B2|W1) = 4/(5+4) = 4/9,
and hence
P(W1 B2) = P(B2|W1)P(W1) = (3/5) · (4/9) = 4/15 ≈ 0.267.
Are the events W1 and B2 independent? Our common sense says no: the fate of the second ball very much depends on that of the first ball. To verify this we need to compute P(B2). The first ball has two options: W1 = "first ball removed is white" or B1 = "first ball removed is black". Since W1 and B1 form a partition, the law of total probability gives
P(B2) = P(B2|W1)P(W1) + P(B2|B1)P(B1) = (4/9) · (3/5) + (3/9) · (2/5) = 2/5,
and
P(B2)P(W1) = (2/5) · (3/5) = 6/25 ≠ 4/15 = P(B2 W1).
As expected, the events W1 and B2 are dependent.
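A short Monte Carlo sketch of the urn experiment above confirms these numbers:

    import random

    trials = 200_000
    n_W1 = n_B2 = n_both = 0
    for _ in range(trials):
        urn = ["w"] * 6 + ["b"] * 4
        random.shuffle(urn)
        first, second = urn[0], urn[1]   # two draws without replacement
        n_W1 += first == "w"
        n_B2 += second == "b"
        n_both += first == "w" and second == "b"

    print(n_W1 / trials)    # ~0.600 = P(W1) = 3/5
    print(n_B2 / trials)    # ~0.400 = P(B2) = 2/5
    print(n_both / trials)  # ~0.267 = P(W1 B2), not P(W1)P(B2) = 0.24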
• Bayes' theorem: since
P(A|B) = P(AB)/P(B) ⇒ P(AB) = P(A|B)P(B),
and similarly,
P(B|A) = P(BA)/P(A) = P(AB)/P(A) ⇒ P(AB) = P(B|A)P(A),
we get
P(A|B)P(B) = P(B|A)P(A),
or
P(A|B) = (P(B|A)/P(B)) · P(A). (1.6)
In Equation (1.6), P(A) represents the a-priori probability of the event A. Suppose B has occurred, and assume that A and B are not independent. How can this new information be used to update our knowledge about A? Bayes' rule takes into account the new information ("B has occurred") and gives out the a-posteriori probability of A given B.
We can also view the event B as new knowledge obtained from a fresh experiment; knowing B has occurred changes our assessment of the likelihood of A. Bayes' theorem gives the exact mechanism for incorporating such new information.
• A more general version of Bayes' theorem involves a partition A1, A2, · · · , An of Ω with associated a-priori probabilities P(Ai), i = 1, · · · , n. With the new information "B has occurred", the updated a-posteriori probabilities are
P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^{n} P(B|Aj)P(Aj).
Example 7
Two boxes B1 and B2 contain 100 and 200 light bulbs, respectively. The first box (B1) has 15 defective bulbs and the second (B2) has 5. Suppose a box is selected at random
and one bulb is picked out. (a) What is the probability that it is defective?
Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box
B2 has 195 good and 5 defective bulbs. Let D = “Defective bulb is picked out”.
Then,
P(D|B1) = 15/100 = 0.15, P(D|B2) = 5/200 = 0.025.
Since a box is selected at random, the boxes are equally likely: P(B1) = P(B2) = 1/2.
Thus B1 and B2 form a partition, and using the Law of Total Probability, we obtain
P(D) = P(D|B1)P(B1) + P(D|B2)P(B2) = 0.15 × (1/2) + 0.025 × (1/2) = 0.0875.
(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1? By Bayes' rule,
P(B1|D) = P(D|B1)P(B1)/P(D) = (0.15 × 0.5)/0.0875 ≈ 0.857.
Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on the fact that we might have picked up box 1? Indeed, since P(B1|D) ≈ 0.857 > 0.5, it is now more likely that we chose box 1 rather than box 2. (Recall that box 1 has six times the defective rate of box 2.)
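The same numbers follow from a direct Python transcription of the total-probability and Bayes steps above:

    # Prior: each box is equally likely to be selected.
    P_B1 = P_B2 = 0.5
    # Likelihoods: the defective rate of each box.
    P_D_given_B1 = 15 / 100   # 0.15
    P_D_given_B2 = 5 / 200    # 0.025

    # Law of total probability.
    P_D = P_D_given_B1 * P_B1 + P_D_given_B2 * P_B2
    print(P_D)                # 0.0875

    # Bayes' rule: a-posteriori probability of box 1 given a defective bulb.
    P_B1_given_D = P_D_given_B1 * P_B1 / P_D
    print(P_B1_given_D)       # ~0.857 > 0.5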
Example 8
Suppose you have two coins, one biased, one fair, but you don’t know which coin is
which. Coin 1 is biased. It comes up heads with probability 3/4, while coin 2 will flip
heads with probability 1/2. Suppose you pick a coin at random and flip it. Let Ci
denote the event that coin i is picked. Let H and T denote the possible outcomes of the
flip. Given that the outcome of the flip is a head, what is P [C1 |H], the probability that
you picked the biased coin? Given that the outcome is a tail, what is P[C1|T], the probability that you picked the biased coin?
Solution: First, we construct the sample tree of the experiment. To find the conditional probabilities, we apply Bayes' rule:
P[C1|H] = P[H|C1]P[C1]/P[H] = (3/4 × 1/2)/(3/4 × 1/2 + 1/2 × 1/2) = (3/8)/(5/8) = 3/5.
Similarly,
P[C1|T] = P[T|C1]P[C1]/P[T] = (1/4 × 1/2)/(1/4 × 1/2 + 1/2 × 1/2) = (1/8)/(3/8) = 1/3.
As we would expect, we are more likely to have chosen coin 1 when the first flip is
heads but we are more likely to have chosen coin 2 when the first flip is tails.
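A Monte Carlo sketch of this experiment (with the stated coin biases) reproduces both posteriors:

    import random

    trials = 200_000
    heads = tails = heads_c1 = tails_c1 = 0
    for _ in range(trials):
        coin = random.choice([1, 2])          # pick a coin at random
        p_head = 0.75 if coin == 1 else 0.5   # coin 1 is the biased one
        if random.random() < p_head:
            heads += 1
            heads_c1 += coin == 1
        else:
            tails += 1
            tails_c1 += coin == 1

    print(heads_c1 / heads)  # ~0.600 = P(C1|H) = 3/5
    print(tails_c1 / tails)  # ~0.333 = P(C1|T) = 1/3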
Lecture 2
Random Variables
2.1 Introduction
Let Ω be sample space of a probability model, and X a function that maps every ζ ∈ Ω, to
a unique point x ∈ R, the set of real numbers. Since the outcome ζ is not certain, neither is the value X(ζ) = x. Thus if B is some subset of R, we may want to determine the probability
of "X(ζ) ∈ B". To determine this probability, we can look at the set A = X^{-1}(B) ⊂ Ω, where A = X^{-1}(B) = {ζ ∈ Ω : X(ζ) ∈ B} is the inverse image of B under X. Obviously, if the set A = X^{-1}(B) is an event, the probability of A is well defined; in this case we define P("X(ζ) ∈ B") = P(A).
However, X^{-1}(B) may not always be an event for every B, thus creating difficulties. The notion of random variable (RV) makes sure that the inverse mapping always results in an event.
Random Variable (RV): A finite single-valued function X(·) that maps the set of all experimental outcomes Ω into the set of real numbers R is said to be a RV if the set {ζ : X(ζ) ≤ x} is an event for every x in R.
The random variable X is specified by the function X(ζ) that maps the sample outcome ζ to the corresponding real number x; accordingly, for any x,
{X = x} = {ζ ∈ Ω | X(ζ) = x}.
Since all events have well-defined probabilities, the probability of the event {ζ | X(ζ) ≤ x} must depend on x; denote it by
P{ζ | X(ζ) ≤ x} = FX(x).
The role of the subscript X is only to identify the actual RV. FX(x) is said to be the Cumulative Distribution Function (CDF) of the RV X. The CDF satisfies
FX(+∞) = 1, FX(−∞) = 0.
If x1 < x2, then (−∞, x1] ⊂ (−∞, x2]. Consequently the event {ζ | X(ζ) ≤ x1} is contained in the event {ζ | X(ζ) ≤ x2}, so that FX(x1) ≤ FX(x2), implying that the probability distribution function is nonnegative and monotone nondecreasing.
• Theorem: P(a < X ≤ b) = FX(b) − FX(a). To prove this theorem, express the event Eab = {a < X ≤ b} as part of a union of disjoint events. Starting with the event Eb = {X ≤ b}, note that Eb can be written as the union
Eb = {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} = Ea ∪ Eab.
Note also that Ea and Eab are disjoint, so that P(Eb) = P(Ea) + P(Eab). Since P(Eb) = FX(b) and P(Ea) = FX(a), we can write FX(b) = FX(a) + P(a < X ≤ b), which completes the proof.
• If FX(x0) = 0 for some x0, then FX(x) = 0 for every x ≤ x0. This follows, since FX(x0) = P(X(ζ) ≤ x0) = 0 implies {X(ζ) ≤ x0} is the null set, and hence {X(ζ) ≤ x}, which is contained in it for x ≤ x0, is also the null set.
• P{X(ζ) > x} = 1 − FX(x). We have {X(ζ) ≤ x} ∪ {X(ζ) > x} = Ω, and since the two events are mutually exclusive, P{X(ζ) ≤ x} + P{X(ζ) > x} = 1.
• P {x1 < X(ζ) ≤ x2 } = FX (x2 ) − FX (x1 ), x2 > x1
The events {X(ζ) ≤ x1} and {x1 < X(ζ) ≤ x2} are mutually exclusive, and their union is {X(ζ) ≤ x2}; hence FX(x1) + P{x1 < X(ζ) ≤ x2} = FX(x2), or
P{x1 < X(ζ) ≤ x2} = FX(x2) − FX(x1).
• FX(x0^+), the limit of FX(x) as x → x0 from the right, always exists and equals FX(x0). However, FX(x) need not be continuous from the left. At a discontinuity point of the distribution, the left and right limits are different, and the jump FX(x0) − FX(x0^−) = P(X = x0) > 0. Thus the only discontinuities of a distribution function are of the jump type, and the CDF is continuous from the right. Keep in mind that the CDF always takes on the upper value at every jump.
Example 1
Let X be the RV such that X(ζ) = c, a constant, for every ζ ∈ Ω. Find FX(x).
Solution: For x < c, {X(ζ) ≤ x} = φ, so that FX(x) = 0; and for x ≥ c, {X(ζ) ≤ x} = Ω, so that FX(x) = 1 (Figure 2.1).
Example 2
Figure 2.1: CDF for example 1
Toss a coin, so Ω = {H, T}, with P(T) = q and P(H) = 1 − q. Suppose the RV X is such that X(T) = 0 and X(H) = 1. Find FX(x).
Solution: For x < 0, {X(ζ) ≤ x} = φ, so FX(x) = 0; for 0 ≤ x < 1, {X(ζ) ≤ x} = {T}, so FX(x) = P(T) = q; and for x ≥ 1, {X(ζ) ≤ x} = Ω, so FX(x) = 1.
• If FX(x) has a jump discontinuity at a point xi, then
pi = P{X = xi} = FX(xi) − FX(xi^−).
For Example 1,
P{X = c} = FX(c) − FX(c^−) = 1 − 0 = 1,
and for Example 2,
P{X = 0} = FX(0) − FX(0^−) = q − 0 = q.
Example 3
A fair coin is tossed twice, and let the RV X represent the number of heads. Find FX (x).
Solution: X(TT) = 0, X(TH) = X(HT) = 1, and X(HH) = 2, so FX(x) is a staircase function with jumps at x = 0, 1, 2 (Figure 2.3). For instance,
P{X = 1} = FX(1) − FX(1^−) = 3/4 − 1/4 = 1/2.
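The staircase CDF can be tabulated directly; a small Python sketch of the two-toss experiment:

    from itertools import product

    outcomes = list(product("HT", repeat=2))  # 4 equally likely outcomes
    X = [w.count("H") for w in outcomes]      # number of heads per outcome

    def F(x):
        # FX(x) = P(X <= x) over the equally likely outcomes.
        return sum(v <= x for v in X) / len(X)

    print(F(-1), F(0), F(0.5), F(1), F(2))  # 0.0 0.25 0.25 0.75 1.0
    # Jump at x = 1: F(1) - F(1-) = 0.75 - 0.25 = 0.5 = P(X = 1).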
The first derivative of the distribution function FX(x) is called the probability density function (pdf) fX(x) of the RV X; that is, fX(x) = dFX(x)/dx ≥ 0.
Figure 2.3: CDF for example 3
• Discrete RV: if X is a discrete-type RV, then its density function has the general form
fX(x) = Σ_i pi δ(x − xi),
where the xi represent the jump-discontinuity points in FX(x). As Fig. 2.4 shows, fX(x) consists of a train of impulses located at the jump points xi.
Since FX(+∞) = 1, it follows that
∫_{−∞}^{+∞} fX(u) du = 1.
Moreover, P(x1 < X ≤ x2) = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx. Thus the area under fX(x) in the interval (x1, x2) represents the probability in the above equation.
• Often, RVs are referred to by their specific density functions, both in the continuous and discrete cases, and in what follows we shall list a number of them in each category.
• Normal (Gaussian): X ∼ N(µ, σ^2) if
fX(x) = (1/√(2πσ^2)) exp(−(x − µ)^2/(2σ^2)). (2.3)
This is a bell-shaped curve, symmetric around the parameter µ, and its distribution function is given by
FX(x) = ∫_{−∞}^{x} (1/√(2πσ^2)) exp(−(y − µ)^2/(2σ^2)) dy = Φ((x − µ)/σ), (2.4)
where Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−y^2/2) dy is called the standard normal CDF and is often tabulated. Figure 2.6 shows the pdf and cdf of the Normal distribution for different means and variances.
Figure 2.6: pdf and cdf of Normal distribution for different means and variances
For X ∼ N(µ, σ^2),
P(a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ).
The complementary function
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−y^2/2) dy = 1 − Φ(x)
is called the standard normal complementary CDF. Since FX(x) = Φ((x − µ)/σ), the normalized RV
Y = (X − µ)/σ ∼ N(0, 1). (2.5)
More generally, if X ∼ N(µ, σ^2), then
aX + b ∼ N(aµ + b, a^2σ^2).
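Numerically, Φ and Q can be evaluated with the standard error function in Python's math module, via the identity Φ(x) = (1 + erf(x/√2))/2 (math.erf uses the standard normalization, which differs from the tabulated function introduced later in Equation (2.17)); a minimal sketch:

    import math

    def Phi(x):
        # Standard normal CDF.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def Q(x):
        # Standard normal complementary CDF.
        return 1.0 - Phi(x)

    def normal_prob(a, b, mu, sigma):
        # P(a < X < b) for X ~ N(mu, sigma^2).
        return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

    print(normal_prob(-1.0, 1.0, 0.0, 1.0))  # ~0.6827 (one-sigma probability)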
• Uniform: X ∼ U(a, b), a < b, if fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 elsewhere. (2.6)
Figure 2.7: pdf and cdf of Uniform distribution
• Exponential: X ∼ E(λ) if
fX(x) = (1/λ) exp(−x/λ) for x ≥ 0, and 0 elsewhere. (2.7)
The accompanying figure shows the pdf and cdf of the Exponential distribution for different values of the parameter λ.
• Chi-square: when the RV X is defined as X = Σ_{i=1}^{n} Xi^2, where the Xi, i = 1, · · · , n, are statistically independent N(0, 1) RVs, X is said to have a chi-square distribution with n degrees of freedom, with pdf
fX(x) = (1/(2^(n/2) Γ(n/2))) x^(n/2 − 1) e^(−x/2), x ≥ 0.
Γ(x) is called the Gamma function and is given by
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0,
with
Γ(1/2) = √π.
Let Y = X1^2 + X2^2, where X1 and X2 are independent N(0, σ^2) RVs. Then Y is chi-square distributed with 2 degrees of freedom, and
fY(y) = (1/(2σ^2)) exp(−y/(2σ^2)), y ≥ 0.
Now, suppose we define a new RV as R = √Y; then R is Rayleigh distributed, with fR(r) = (r/σ^2) exp(−r^2/(2σ^2)), r ≥ 0.
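A simulation sketch (taking σ = 1 for concreteness) illustrates both facts:

    import math
    import random

    sigma = 1.0
    n = 200_000
    Y = [random.gauss(0, sigma)**2 + random.gauss(0, sigma)**2 for _ in range(n)]
    R = [math.sqrt(y) for y in Y]

    # For 2 degrees of freedom, P(Y <= y) = 1 - exp(-y / (2 sigma^2)).
    y0 = 2.0
    print(sum(y <= y0 for y in Y) / n)         # ~0.632
    print(1 - math.exp(-y0 / (2 * sigma**2)))  # 0.6321...

    # The mean of a Rayleigh RV is sigma * sqrt(pi/2) ~ 1.2533.
    print(sum(R) / n)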
2.4 Discrete-type Random Variables
• Bernoulli: X takes the values 0 and 1 with
P(X = 0) = q, P(X = 1) = p, p = 1 − q.
• Binomial: X ∼ B(n, p) if
P(X = k) = C(n, k) p^k q^(n−k), k = 0, 1, 2, · · · , n,
where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient.
• Poisson: X ∼ P(λ) if
P(X = k) = e^(−λ) λ^k/k!, k = 0, 1, 2, · · ·
• Discrete Uniform: X ∼ U(n) if
P(X = k) = 1/n, k = 1, 2, · · · , n.
• Geometric: X ∼ G(p) if
P(X = k) = (1 − p)^(k−1) p, k = 1, 2, · · · ,
where the parameter p ∈ (0, 1) is the probability of a head appearing on each individual toss.
The Poisson pmf models events that occur randomly in time. While the time of each occurrence is completely random, there is a known average number of occurrences per unit time. For example, calls arrive at random times at a telephone switching office with an average
of λ = 0.25 calls/second. The pmf of the number of calls K that arrive in a T = 2 second interval is, with λT = 0.5,
PK(k) = (0.5)^k e^(−0.5)/k!, k = 0, 1, 2, · · · ,
and PK(k) = 0 otherwise.
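The first few pmf values can be computed directly; a minimal Python sketch:

    import math

    a = 0.25 * 2  # lambda*T = 0.5 expected calls in the interval

    def poisson_pmf(k, lam):
        return lam**k * math.exp(-lam) / math.factorial(k)

    for k in range(4):
        print(k, poisson_pmf(k, a))
    # 0 0.6065..., 1 0.3033..., 2 0.0758..., 3 0.0126...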
Example: to improve reliability, we transmit the same binary symbol 5 times. Thus "zero" is transmitted as 00000 and "one" is transmitted as 11111. The receiver detects the correct information if three or more binary symbols are received correctly. What is the information error probability P(E), if the binary symbol error probability is q = 0.1?
In this case, we have five trials corresponding to the five transmissions, and on each trial the probability of a symbol error is q = 0.1. The information is in error when three or more of the five symbols are wrong, so
P(E) = Σ_{k=3}^{5} C(5, k) q^k (1 − q)^(5−k) = 0.0081 + 0.00045 + 0.00001 ≈ 0.0086.
By transmitting each symbol 5 times, the probability of error is reduced from 0.1 to about 0.0086.
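The binomial sum can be verified numerically; a short Python sketch:

    from math import comb

    q, n = 0.1, 5
    # The receiver errs when 3 or more of the 5 symbols are wrong.
    P_E = sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(3, n + 1))
    print(P_E)  # 0.00856, down from the single-transmission error 0.1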
A Bernoulli trial consists of repeated independent and identical experiments, each of which has only two outcomes, A or Ā, with P(A) = p and P(Ā) = q. The probability of exactly k occurrences of A in n trials is given by the binomial pmf
Pn(k) = C(n, k) p^k q^(n−k).
Let
Xk = "exactly k occurrences of A in n trials". (2.10)
Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, · · · , n, either X0 or X1 or · · · or Xn must occur, so that
P(X0 ∪ X1 ∪ X2 ∪ · · · ∪ Xn) = 1. (2.11)
For a given n and p, what is the most likely value of k? The most probable value of k is the number which maximizes the binomial pmf. To obtain this value, consider the ratio
Pn(k)/Pn(k − 1) = ((n − k + 1)/k) · (p/q).
Thus Pn(k) > Pn(k − 1) if k(1 − p) < (n − k + 1)p, i.e., if k < (n + 1)p. Thus Pn(k), as a function of k, increases until
k = (n + 1)p,
if (n + 1)p is an integer, or otherwise until the largest integer kmax less than (n + 1)p; this value represents the most likely number of occurrences of A in n trials.
Example 4
In a Bernoulli experiment with n trials, find the probability that the number of occurrences
of A is between k1 and k2 .
Solution: With the Xi, i = 0, 1, 2, · · · , n, as defined in Equation (2.10), clearly they are mutually exclusive, and
P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} Pn(k) = Σ_{k=k1}^{k2} C(n, k) p^k q^(n−k). (2.14)
Example 5
Suppose 5,000 components are ordered. The probability that a part is defective equals 0.1. What is the probability that the total number of defective parts does not exceed 400?
Solution: Let X denote the number of defective parts; then
P(X ≤ 400) = Σ_{k=0}^{400} C(5000, k) (0.1)^k (0.9)^(5000−k).
The above expression has too many terms to compute directly. Clearly, we need a technique to compute it in a more convenient manner.
In such cases the Gaussian approximation to the binomial pmf (the DeMoivre–Laplace theorem) is useful: for large n with npq ≫ 1,
Pn(k) ≃ (1/√(2πnpq)) exp(−(k − np)^2/(2npq)).
Thus if k1 and k2 in Equation (2.14) are within or around the neighborhood of the interval (np − √(npq), np + √(npq)), we can approximate the summation in Equation (2.14) by an integration:
P(k1 < X < k2) ≃ ∫_{k1}^{k2} (1/√(2πnpq)) exp(−(x − np)^2/(2npq)) dx
= ∫_{x1}^{x2} (1/√(2π)) exp(−y^2/2) dy, (2.16)
where
x1 = (k1 − np)/√(npq), x2 = (k2 − np)/√(npq).
We can express Equation (2.16) in terms of the normalized integral that has been tabulated:
erf(x) = (1/√(2π)) ∫_{0}^{x} exp(−y^2/2) dy = −erf(−x), (2.17)
so that P(k1 < X < k2) ≃ erf(x2) − erf(x1).
Example 6
A fair coin is tossed 5,000 times. Find the probability that the number of heads is between 2,475 and 2,525.
Solution: We need P(2475 ≤ X ≤ 2525). Here n is large, so we can use the normal approximation (Figure 2.10 shows the pdf of the Gaussian approximation). In this case p = 1/2, so that np = 2500 and √(npq) ≃ 35. Since np − √(npq) ≃ 2465 and np + √(npq) ≃ 2535, the approximation is valid for k1 = 2475 and k2 = 2525. Thus
P(k1 < X < k2) ≃ ∫_{x1}^{x2} (1/√(2π)) exp(−y^2/2) dy, (2.18)
where
x1 = (k1 − np)/√(npq) = −5/7, x2 = (k2 − np)/√(npq) = 5/7,
so that
P(2475 ≤ X ≤ 2525) = erf(x2) − erf(x1) = erf(x2) + erf(|x1|) = 2 erf(5/7) ≈ 0.525.
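The quality of the approximation can be checked against the exact binomial sum; a Python sketch (math.erf is the standard error function, so Φ(x) = (1 + erf(x/√2))/2):

    import math
    from math import comb

    n, p = 5000, 0.5
    k1, k2 = 2475, 2525

    # Exact binomial probability, feasible with exact integer arithmetic.
    exact = sum(comb(n, k) for k in range(k1, k2 + 1)) / 2**n

    # Gaussian approximation.
    mu, s = n * p, math.sqrt(n * p * (1 - p))
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    approx = Phi((k2 - mu) / s) - Phi((k1 - mu) / s)

    print(exact, approx)  # ~0.529 exact vs ~0.520 approximation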
Figure 2.11: The standard normal CDF Φ(z)
Figure 2.12: The standard normal complementary CDF Q(z)