Chapter One 1-Random Variables: Dr. Mahmood
1- Random Variables
A random variable, usually written X, is a variable whose possible values are numerical
outcomes of a random phenomenon. There are two types of random variables, discrete
and continuous. All random variables have a cumulative distribution function. It is a
function giving the probability that the random variable X is less than or equal to x, for
every value x.
A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, …. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family and the number of defective light bulbs in a box of ten.
The probability distribution of a discrete random variable is a list of probabilities
associated with each of its possible values. It is also sometimes called the probability
function or the probability mass function.
When the sample space Ω has a finite number of equally likely outcomes, the discrete uniform probability law applies, and the probability of any event A is given by:

P(A) = (Number of elements of A) / (Number of elements of Ω)
This distribution may also be described by the probability histogram. Suppose a random
variable X may take k different values, with the probability that X = 𝑥𝑖 defined to be
P(X = 𝑥𝑖 ) =𝑃𝑖 . The probabilities 𝑃𝑖 must satisfy the following:
∑ᵢ₌₁ᵏ Pᵢ = 1
Example
Suppose a variable X can take the values 1, 2, 3, or 4. The probabilities associated
with each outcome are described by the following table:
Outcome 1 2 3 4
Probability 0.1 0.3 0.4 0.2
A continuous random variable is one which takes an infinite number of possible values.
Continuous random variables are usually measurements. Examples include height,
weight and the amount of sugar in an orange. A continuous random variable is not
defined at specific values. Instead, it is defined over an interval of values, and is
represented by the area under a curve. The curve, which represents a function p(x),
must satisfy the following: p(x) ≥ 0 for every value of x, and the total area under the curve equals 1. A common example is the normal (Gaussian) density:

h = (1/(σ√2π)) e^(−0.5·((x−μ)/σ)²)
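As a quick numerical check, the Python sketch below evaluates the normal density above and verifies that the area under the curve is approximately 1, a property every probability density must satisfy; the values μ = 0, σ = 1 and the helper name normal_pdf are only illustrative choices.

import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Normal density: (1/(sigma*sqrt(2*pi))) * exp(-0.5*((x - mu)/sigma)**2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Approximate the total area under the curve with a simple Riemann sum over [-6, 6].
dx = 0.001
area = sum(normal_pdf(-6 + i * dx) * dx for i in range(int(12 / dx)))
print(round(area, 4))   # ~1.0, as required of any probability density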
2- Joint Probability:
Joint probability is the probability of event Y occurring at the same time as event X. Its notation is P(X ∩ Y) or P(X, Y), which reads: the joint probability of X and Y. If X and Y are independent, then

P(X, Y) = P(X) × P(Y)

and in general the joint probabilities must sum to one:

∑ₓ ∑ᵧ f(x, y) = 1
Example:
For discrete random variables, let P(X) be the probability of rolling a four on one die and P(Y) the probability of rolling a four on a second, independent die. Find P(X, Y).
Solution:
We have P(X) = P(Y) = 1/6, so
P(X, Y) = P(X) × P(Y) = (1/6) × (1/6) = 1/36 ≈ 0.0278 = 2.8%
3- Conditional Probabilities:
Conditional probability arises when events are dependent. We use the symbol "|" to mean "given", and we write:

P(A | B) = P(A ∩ B) / P(B)
Example: A box contains 5 green pencils and 7 yellow pencils. Two pencils are chosen
at random from the box without replacement. What is the probability they are different
colors?
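As an added numerical check (not the author's worked solution), the sketch below computes the probability exactly and confirms it by enumerating all ordered pairs of draws from the 5-green / 7-yellow box described above.

from fractions import Fraction
from itertools import permutations

# Box with 5 green (G) and 7 yellow (Y) pencils; two drawn without replacement.
pencils = ['G'] * 5 + ['Y'] * 7

# Exact answer by direct conditional-probability reasoning:
p_exact = Fraction(5, 12) * Fraction(7, 11) + Fraction(7, 12) * Fraction(5, 11)

# Cross-check by enumerating every ordered pair of distinct pencils.
pairs = list(permutations(range(len(pencils)), 2))
favourable = sum(1 for i, j in pairs if pencils[i] != pencils[j])
p_enum = Fraction(favourable, len(pairs))

print(p_exact, float(p_exact))   # 35/66, approximately 0.530
assert p_exact == p_enum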
4- Bayes’ Theorem
P(A | B) = P(A ∩ B) / P(B),   P(B) ≠ 0
P(B | A) = P(A ∩ B) / P(A),   P(A) ≠ 0
P(A ∩ B) = P(A | B) × P(B) = P(B | A) × P(A)
so that Bayes' theorem follows as P(A | B) = P(B | A) × P(A) / P(B).
6- Venn Diagram:
A Venn diagram is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set. The points inside a curve labelled S represent elements of the set S, while points outside the boundary represent elements not in the set S. Fig. 5 shows the sets A = {1, 2, 3}, B = {4, 5} and U = {1, 2, 3, 4, 5, 6}.
[Fig. 5: Venn diagram of A = {1, 2, 3} and B = {4, 5} inside the universal set U = {1, 2, 3, 4, 5, 6}.]
From the adjoining Venn diagram of Fig. 6, find the following sets:
(𝐵 ∪ 𝐶)′….
Solution:
𝑋 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
𝐵′ = {1, 3, 7, 8, 9, 10},
(𝐶 − 𝐴) = {3, 4, 6, 7, 10},
(𝐵 − 𝐶) = {1, 2, 4, 7, 10},
(𝐴 ∪ 𝐵) = {1, 2, 3, 4, 5, 6},
(𝐴 ∩ 𝐵) = {4, 5},
8- Self-information:
The self-information of a message xi with probability P(xi) is I(xi) = −log₂ P(xi), measured in bits when the logarithm is taken to base 2. A nat (sometimes also nit or nepit) is the natural unit of information or entropy, based on natural logarithms and powers of e rather than the powers of 2 and base-2 logarithms which define the bit.
Example 1:
A fair die is thrown; find the amount of information gained if you are told that a 4 will appear.
Solution:
P(1) = P(2) = … = P(6) = 1/6
I(4) = −log₂(1/6) = −ln(1/6)/ln 2 = 2.585 bits
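A minimal sketch of the same calculation in Python; the helper name self_information is an illustrative choice.

import math

def self_information(p):
    # Self-information in bits of an outcome with probability p: I = -log2(p)
    return -math.log2(p)

# Fair die: each face has probability 1/6.
print(round(self_information(1 / 6), 4))   # ~2.585 bits, matching the worked example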
Example 2:
A biased coin has P(Head)=0.3. Find the amount of information gained if you are told
that a tail will appear.
Solution:
P(tail) = 1 − P(Head) = 1 − 0.3 = 0.7
I(tail) = −log₂(0.7) = −ln(0.7)/ln 2 = 0.5146 bits
HW
A communication system source emits the following messages with the corresponding probabilities: A = 1/2, B = 1/4, C = 1/8. Calculate the information conveyed by each source output.
[Figure: self-information I(xi) in bits plotted against the probability P(xi), falling from large values near P(xi) = 0 to zero at P(xi) = 1.]
The source entropy (the average information per symbol) is

H(X) = − ∑ᵢ₌₁ⁿ P(xᵢ) log₂ P(xᵢ)   bits/symbol

For a binary source with probabilities P(0_T) and P(1_T) = 1 − P(0_T), this becomes:
Hb(X) = −[P(0_T) log₂ P(0_T) + (1 − P(0_T)) log₂(1 − P(0_T))]   bits/symbol
If P(0_T) = 0.2, then P(1_T) = 1 − 0.2 = 0.8; substituting into the above equation gives:
Hb(X) = −[0.2 log₂(0.2) + 0.8 log₂(0.8)] = 0.722 bits/symbol
For a binary source, if P(0_T) = P(1_T) = 0.5, then the entropy is:
Hb(X) = −[0.5 log₂(0.5) + 0.5 log₂(0.5)] = −log₂(1/2) = log₂(2) = 1 bit
For any non-binary source, if all messages are equiprobable, then P(xᵢ) = 1/n, so that:
H(X) = H(X)max = −[(1/n) log_a(1/n)] × n = −log_a(1/n) = log_a n   bits/symbol (for a = 2)
which is the maximum value of the source entropy. Also, H(X) = 0 if one of the messages is a certain event, i.e. P(x) = 1.
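A short sketch that evaluates the entropy expressions above numerically; the helper name entropy is an illustrative choice, and the probability lists are the ones used in the text.

import math

def entropy(probs, base=2):
    # H = -sum p*log(p); terms with p = 0 contribute nothing.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(round(entropy([0.2, 0.8]), 3))   # binary source with P(0_T) = 0.2 -> ~0.722 bits/symbol
print(entropy([0.5, 0.5]))             # equiprobable binary source -> 1.0 bit/symbol
print(round(entropy([0.25] * 4), 3))   # 4 equiprobable messages -> log2(4) = 2 bits/symbol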
τ̄ = ∑ᵢ₌₁ⁿ τᵢ P(xᵢ)

where τ̄ is the average time duration of the symbols and τᵢ is the time duration of the symbol xᵢ. The average source entropy rate is then R(X) = H(X)/τ̄ bits/sec.
Example 1:
A source produces dots ‘.’ and dashes ‘-’ with P(dot) = 0.65. The time duration of a dot is 200 ms and that of a dash is 800 ms. Find the average source entropy rate.
Solution:
𝑃(𝑑𝑎𝑠ℎ) = 1 − 𝑃(𝑑𝑜𝑡) = 1 − 0.65 = 0.35
𝐻(𝑋) = −[0.65log 2 (0.65) + 0.35log 2 (0.35)] = 0.934 𝑏𝑖𝑡𝑠/𝑠𝑦𝑚𝑏𝑜𝑙
𝜏̅ = 0.2 × 0.65 + 0.8 × 0.35 = 0.41 𝑠𝑒𝑐
R(X) = H(X)/τ̄ = 0.934/0.41 = 2.278 bps
Example 2:
A discrete source emits one of five symbols once every millisecond. The symbol
probabilities are 1/2, 1/4, 1/8, 1/16 and 1/16 respectively. Calculate the information rate.
Solution:
H = ∑ᵢ₌₁⁵ Pᵢ log₂(1/Pᵢ)
H = (1/2) log₂ 2 + (1/4) log₂ 4 + (1/8) log₂ 8 + (1/16) log₂ 16 + (1/16) log₂ 16 = 1.875 bits/symbol
Since one symbol is emitted every millisecond, the symbol rate is r = 1000 symbols/sec, so the information rate is R = r H = 1000 × 1.875 = 1875 bits/sec.
A source produces dots and dashes; the probability of the dot is twice the probability
of the dash. The duration of the dot is 10msec and the duration of the dash is set to
three times the duration of the dot. Calculate the source entropy rate.
Properties of 𝑰(𝒙𝒊 , 𝒚𝒋 ):
Example:
Show that I(X, Y) is zero for extremely noisy channel.
Solution:
For an extremely noisy channel, yⱼ gives no information about xᵢ: the receiver cannot decide anything about xᵢ, as if we transmit a deterministic signal xᵢ but the receiver receives a noise-like signal yⱼ that has no correlation with xᵢ. Then xᵢ and yⱼ are statistically independent, so that P(xᵢ ∣ yⱼ) = P(xᵢ) and P(yⱼ ∣ xᵢ) = P(yⱼ) for all i and j. Then:
I(xᵢ, yⱼ) = log₂[P(xᵢ ∣ yⱼ)/P(xᵢ)] = log₂ 1 = 0 for all i and j, and therefore I(X, Y) = 0.
Marginal entropy is a term usually used to denote both the source entropy H(X), defined as before, and the receiver entropy H(Y), given by:

H(Y) = − ∑ⱼ₌₁ᵐ P(yⱼ) log₂ P(yⱼ)   bits/symbol
𝐻( 𝑌 ∣ 𝑋 ) = 𝐻(𝑋, 𝑌) − 𝐻(𝑋)
𝐻( 𝑋 ∣ 𝑌 ) = 𝐻(𝑋, 𝑌) − 𝐻(𝑌)
where H(X ∣ Y) is the losses entropy.
Also we have:
𝐼(𝑋, 𝑌) = 𝐻(𝑋) − 𝐻(𝑋 ∣ 𝑌)
𝐼(𝑋, 𝑌) = 𝐻(𝑌) − 𝐻(𝑌 ∣ 𝑋)
H(X, Y) = −[0.5 ln(0.5) + 0.25 ln(0.25) + 0.125 ln(0.125) + 2 × 0.0625 ln(0.0625)]/ln 2 = 1.875 bits/symbol
3- H(Y ∣ X) = H(X, Y) − H(X) = 1.875 − 1.06127 = 0.813 bits/symbol
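A small sketch that evaluates these relations numerically; the joint matrix P_xy below is a hypothetical example chosen only to illustrate the definitions, not a distribution taken from these notes.

import math

def H(probs):
    # Entropy in bits of a list of probabilities.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(x_i, y_j) (rows are x, columns are y).
P_xy = [[0.30, 0.10],
        [0.05, 0.55]]

p_x = [sum(row) for row in P_xy]                   # marginal P(x_i)
p_y = [sum(col) for col in zip(*P_xy)]             # marginal P(y_j)
H_xy = H([p for row in P_xy for p in row])         # joint entropy H(X,Y)
H_x, H_y = H(p_x), H(p_y)

H_y_given_x = H_xy - H_x                           # H(Y|X) = H(X,Y) - H(X)
H_x_given_y = H_xy - H_y                           # H(X|Y) = H(X,Y) - H(Y), the losses entropy
I_xy = H_x - H_x_given_y                           # transinformation I(X,Y)

print(round(H_x, 4), round(H_y, 4), round(H_xy, 4))
print(round(H_y_given_x, 4), round(H_x_given_y, 4), round(I_xy, 4))
# I(X,Y) = H(Y) - H(Y|X) gives the same value:
assert abs(I_xy - (H_y - H_y_given_x)) < 1e-12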
1- Channel:
In telecommunications and computer networking, a communication channel
or channel, refers either to a physical transmission medium such as a wire, or to
a logical connection over a multiplexed medium such as a radio channel. A channel is
used to convey an information signal, for example a digital bit stream, from one or
several senders (or transmitters) to one or several receivers. A channel has a certain
capacity for transmitting information, often measured by its bandwidth in Hz or its data
rate in bits per second.
2- Symmetric channel:
For example, the conditional probability matrices of the following channel types show:
a- P(Y ∣ X) = [0.9  0.1
              0.1  0.9]
is a BSC (binary symmetric channel), because it is a square matrix and the 1st row is a permutation of the 2nd row.

b- P(Y ∣ X) = [0.9   0.05  0.05
              0.05  0.9   0.05
              0.05  0.05  0.9]
is a TSC (ternary symmetric channel), because it is a square matrix and each row is a permutation of the others.

c- P(Y ∣ X) = [0.8  0.1  0.1
              0.1  0.8  0.1]
is non-symmetric since it is not square, although each row is a permutation of the others.

d- P(Y ∣ X) = [0.8  0.1  0.1
              0.1  0.7  0.2
              0.1  0.1  0.8]
is non-symmetric although it is square, since the 2nd row is not a permutation of the other rows.
[Figure: binary symmetric channel (BSC) diagram; each input passes to the corresponding output with probability 1−P and crosses over with probability P.]
Pr( 𝑌 = 0 ∣ 𝑋 = 0 ) = 1 − 𝑃
Pr( 𝑌 = 0 ∣ 𝑋 = 1 ) = 𝑃
Pr( 𝑌 = 1 ∣ 𝑋 = 0 ) = 𝑃
Pr( 𝑌 = 1 ∣ 𝑋 = 1 ) = 1 − 𝑃
              y1      y2      y3
         x1 [1−2Pe    Pe      Pe   ]
P(Y ∣ X) = x2 [ Pe    1−2Pe    Pe   ]
         x3 [ Pe      Pe     1−2Pe ]
The TSC is symmetric but not very practical, since in practice x1 and x3 are not affected as much as x2. In fact the interference between x1 and x3 is much less than the interference between x1 and x2 or between x2 and x3.
[Figure: TSC channel diagram; each input xi reaches its own output yi with probability 1−2Pe and each of the other outputs with probability Pe.]
Hence the more practical, but nonsymmetric, channel has the transition probabilities:

              y1      y2      y3
         x1 [1−Pe     Pe      0   ]
P(Y ∣ X) = x2 [ Pe    1−2Pe    Pe  ]
         x3 [ 0       Pe     1−Pe ]

where x1 interferes with x2 exactly as x2 interferes with x3, but x1 and x3 do not interfere.
[Figure: channel diagram of the practical nonsymmetric channel; x1 → y1 and x3 → y3 with probability 1−Pe, x2 → y2 with probability 1−2Pe, and adjacent crossovers with probability Pe.]
The Discrete Memoryless Channel (DMC) has an input X and an output Y. At any given time t, the channel output Y = y depends only on the input X = x at that time, and not on the past history of the input. A DMC is represented by the conditional probability of the output Y = y given the input X = x, P(Y ∣ X).
[Figure: DMC block diagram: input X → channel P(Y ∣ X) → output Y.]
The Binary Erasure Channel (BEC) model is widely used to represent channels or links that "lose" data. Prime examples of such channels are Internet links and routes. A BEC has a binary input X and a ternary output Y.
[Figure: BEC diagram; each binary input passes to the corresponding output with probability 1−P and to the erasure output with probability P.]
Note that for the BEC, the probability of “bit error” is zero. In other words, the
following conditional probabilities hold for any BEC model:
Pr( 𝑌 = "𝑒𝑟𝑎𝑠𝑢𝑟𝑒" ∣ 𝑋 = 0 ) = 𝑃
Pr( 𝑌 = "𝑒𝑟𝑎𝑠𝑢𝑟𝑒" ∣ 𝑋 = 1 ) = 𝑃
Pr( 𝑌 = 0 ∣ 𝑋 = 0 ) = 1 − 𝑃
Pr( 𝑌 = 1 ∣ 𝑋 = 1 ) = 1 − 𝑃
Pr( 𝑌 = 0 ∣ 𝑋 = 1 ) = 0
Pr( 𝑌 = 1 ∣ 𝑋 = 0 ) = 0
5- Special Channels:
a- Lossless channel: It has only one nonzero element in each column of the
transitional matrix P(Y∣X).
              y1    y2    y3    y4    y5
         x1 [3/4   1/4    0     0     0]
P(Y ∣ X) = x2 [ 0     0   1/3   2/3    0]
         x3 [ 0     0     0     0     1]
6- Shannon’s theorem:
The Shannon-Hartley theorem states that the channel capacity is given by:
C = B log₂(1 + S/N)
Where C is the capacity in bits per second, B is the bandwidth of the channel in Hertz,
and S/N is the signal-to-noise ratio.
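A minimal sketch of the Shannon–Hartley formula in Python; the 3 kHz bandwidth and 30 dB signal-to-noise ratio are illustrative assumptions, not values from these notes.

import math

def shannon_capacity(bandwidth_hz, snr_linear):
    # Channel capacity C = B*log2(1 + S/N) in bits per second.
    return bandwidth_hz * math.log2(1 + snr_linear)

snr_db = 30
snr = 10 ** (snr_db / 10)                    # convert dB to a linear power ratio
print(round(shannon_capacity(3000, snr)))    # ~29902 bits per second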
Physically, the channel capacity is the maximum amount of information each symbol can carry to the receiver. Sometimes this capacity is also expressed in bits/sec if related to the rate of producing symbols r: C (bits/sec) = r × C (bits/symbol).
But we have
P(xᵢ, yⱼ) = P(xᵢ) P(yⱼ ∣ xᵢ); substituting into the above equation yields:
H(Y ∣ X) = − ∑ᵢ₌₁ⁿ P(xᵢ) ∑ⱼ₌₁ᵐ P(yⱼ ∣ xᵢ) log₂ P(yⱼ ∣ xᵢ) = ∑ᵢ₌₁ⁿ P(xᵢ) K
where K = − ∑ⱼ₌₁ᵐ P(yⱼ ∣ xᵢ) log₂ P(yⱼ ∣ xᵢ) is a constant, independent of the row number i (each row being a permutation of the others), so that the equation becomes:
H(Y ∣ X) = K ∑ᵢ₌₁ⁿ P(xᵢ) = K
Example 10:
Find the channel capacity for the channel having the following transition matrix:

P(Y ∣ X) = [0.7  0.3
            0.1  0.9]

Solution: First note that the channel is not symmetric, since the 1st row is not a permutation of the 2nd row.

P(X, Y) = P(X) × P(Y ∣ X) = [0.6034  0.2586
                             0.0138  0.1242]

We have

H(Y ∣ X) = − ∑ᵢ ∑ⱼ P(xᵢ, yⱼ) log₂ P(yⱼ ∣ xᵢ) = 0.8244 bits/symbol
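As a numerical cross-check of the conditional entropy quoted above (an added sketch, not the author's worked method), the code below evaluates H(Y ∣ X) from the joint and transition matrices of Example 10.

import math

# Joint matrix P(X,Y) and transition matrix P(Y|X) from Example 10.
P_xy = [[0.6034, 0.2586],
        [0.0138, 0.1242]]
P_y_given_x = [[0.7, 0.3],
               [0.1, 0.9]]

# Conditional entropy H(Y|X) = -sum_i sum_j P(x_i, y_j) * log2 P(y_j | x_i)
H_y_given_x = -sum(P_xy[i][j] * math.log2(P_y_given_x[i][j])
                   for i in range(2) for j in range(2))
print(round(H_y_given_x, 4))   # 0.8244 bits/symbol, as quoted above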
Review questions:
A binary source sends x1 with a probability of 0.4 and x2 with a probability of 0.6 through a channel with error probabilities of 0.1 for x1 and 0.2 for x2. Determine:
1- Source entropy.
2- Marginal entropy.
3- Joint entropy.
4- Conditional entropy H(Y ∣ X).
5- Losses entropy H(X ∣ Y).
6- Transinformation.
Solution:
1- The channel diagram:
[Channel diagram: P(x1) = 0.4, P(x2) = 0.6; x1 → y1 with probability 0.9, x1 → y2 with 0.1, x2 → y2 with 0.8, x2 → y1 with 0.2.]
or P(Y ∣ X) = [0.9  0.1
              0.2  0.8]
The source entropy is H(X) = − ∑ᵢ₌₁ⁿ P(xᵢ) log₂ P(xᵢ) = −[0.4 log₂(0.4) + 0.6 log₂(0.6)] = 0.971 bits/symbol.
3- H(X, Y) = − ∑ᵢ₌₁ⁿ ∑ⱼ₌₁ᵐ P(xᵢ, yⱼ) log₂ P(xᵢ, yⱼ), where P(xᵢ, yⱼ) = P(xᵢ) P(yⱼ ∣ xᵢ).
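As a numerical sketch of the six requested quantities, using the standard definitions given earlier (an added check, not the author's worked solution); the helper name H is an illustrative choice.

import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [0.4, 0.6]
P_y_given_x = [[0.9, 0.1],
               [0.2, 0.8]]

# Joint and marginal distributions.
P_xy = [[p_x[i] * P_y_given_x[i][j] for j in range(2)] for i in range(2)]
p_y = [sum(P_xy[i][j] for i in range(2)) for j in range(2)]

H_x = H(p_x)                                   # 1- source entropy
H_y = H(p_y)                                   # 2- marginal (receiver) entropy
H_xy = H([p for row in P_xy for p in row])     # 3- joint entropy
H_y_given_x = H_xy - H_x                       # 4- conditional entropy H(Y|X)
H_x_given_y = H_xy - H_y                       # 5- losses entropy H(X|Y)
I_xy = H_x - H_x_given_y                       # 6- transinformation

for name, value in [("H(X)", H_x), ("H(Y)", H_y), ("H(X,Y)", H_xy),
                    ("H(Y|X)", H_y_given_x), ("H(X|Y)", H_x_given_y), ("I(X,Y)", I_xy)]:
    print(name, round(value, 4))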
2- Cascading of Channels
If two channels are cascaded, then the overall transition matrix is the product of the two
transition matrices.
p(z/x) = p(y/x) · p(z/y)
(n × k)   (n × m)   (m × k)
 matrix    matrix    matrix
[Figure: cascade of Channel 1 (n inputs, m outputs) followed by Channel 2 (m inputs, k outputs).]
Example:
Find the transition matrix p( z / x) for the cascaded channel shown.
[Figure: cascaded channel diagram corresponding to the transition matrices below.]
p(y/x) = [0.8  0.2   0        p(z/y) = [0.7  0.3
          0.3   0   0.7],                1    0
                                         1    0 ]

p(z/x) = p(y/x) · p(z/y) = [0.76  0.24
                            0.91  0.09]
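A minimal sketch that reproduces the matrix product p(z/x) = p(y/x) · p(z/y) for this example.

# Transition matrices of the two cascaded channels from the example above.
p_y_given_x = [[0.8, 0.2, 0.0],
               [0.3, 0.0, 0.7]]        # 2x3
p_z_given_y = [[0.7, 0.3],
               [1.0, 0.0],
               [1.0, 0.0]]             # 3x2

# p(z/x) = p(y/x) . p(z/y)  (ordinary matrix multiplication)
p_z_given_x = [[sum(p_y_given_x[i][m] * p_z_given_y[m][j] for m in range(3))
                for j in range(2)] for i in range(2)]
print([[round(v, 2) for v in row] for row in p_z_given_x])   # [[0.76, 0.24], [0.91, 0.09]]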
1- Sampling theorem:
i- A band-limited signal of finite energy, which has no frequency components higher than W Hz, is completely described by specifying the values of the signal at instants of time separated by 1/(2W) seconds, and
ii- A band limited signal of finite energy, which has no frequency components higher
than W Hz, may be completely recovered from the knowledge of its samples taken
at the rate of 2W samples per second.
When the sampling rate is chosen as fs = 2fm, each spectral replicate is separated from each of its neighbours by a frequency band exactly equal to fs hertz, and the analog waveform can theoretically be completely recovered from the samples by the use of filtering. It should be clear that if fs > 2fm, the replications move farther apart in frequency, making it easier to perform the filtering operation.
When the sampling rate is reduced, such that 𝑓𝑠 < 2𝑓𝑚 , the replications will overlap, as
shown in figure below, and some information will be lost. This phenomenon is called
aliasing.
Example: Find the Nyquist rate and Nyquist interval for the following signals.
i- m(t) = sin(500πt)/(πt)
ii- m(t) = (1/2π) cos(4000πt) cos(1000πt)
Solution:
i- m(t) = sin(500πt)/(πt) contains frequencies up to 500π/(2π) = 250 Hz, so the Nyquist rate is 2 × 250 = 500 Hz and the Nyquist interval is 1/500 = 2 msec.

ii- m(t) = (1/2π)[½{cos(4000πt − 1000πt) + cos(4000πt + 1000πt)}]
        = (1/4π){cos(3000πt) + cos(5000πt)}

Then the highest frequency is fmax = 5000π/(2π) = 2500 Hz, so
Nyquist rate = 2 fmax = 5000 Hz, and Nyquist interval = 1/(2 fmax) = 1/(2 × 2500) = 0.2 msec.
H. W:
Find the Nyquist interval and Nyquist rate for the following:
i- (1/2π) cos(400πt) · cos(200πt)
ii- (1/π) sin(πt)
2- Source coding:
An important problem in communications is the efficient representation of data
generated by a discrete source. The process by which this representation is
accomplished is called source encoding. An efficient source encoder must satisfy two functional requirements:
We have:
max[𝐻(𝑌)] = 𝑙𝑜𝑔2 𝑚
A code efficiency can therefore be defined as:
η = H(Y)/max[H(Y)] × 100%
The overall code length, LC, can be defined as the average code word length:
LC = ∑ⱼ₌₁ᵐ P(xⱼ) lⱼ
then:
1- LC = log₂ n bits/message if n = 2^r (n = 2, 4, 8, 16, … and r is an integer), which gives η = 100%.
2- LC = ⌊log₂ n⌋ + 1 bits/message if n ≠ 2^r, which gives η < 100%.
Example
For ten equiprobable messages coded in a fixed-length code, LC = ⌊log₂ 10⌋ + 1 = 4 bits/message and η = (log₂ 10 / 4) × 100% ≈ 83%.
c- For three throws of a die, the possible messages are n = 6 × 6 × 6 = 216, with equal probabilities.
C(a)= 0 l(a)=1
C(b)= 10 l(b)=2
C(c)= 11 l(c)=2
The major property that is usually required from any variable-length code is that of unique decodability. For example, the above code C for the alphabet X = {a, b, c} is easily shown to be uniquely decodable. However, consider instead a code with C(a) = 0, C(b) = 1 and C(c) = 01; such a code is not uniquely decodable, even though the codewords are all different. If the source decoder observes 01, it cannot determine whether the source emitted (a b) or (c).
When message probabilities are not equal, then we use variable length codes. The
following properties need to be considered when attempting to use variable length
codes:
1) Unique decoding:
Example
Consider a 4 alphabet symbols with symbols represented by binary digits as
follows:
A 0
B 01
C 11
D 00
If we receive the code word 0011 it is not known whether the transmission was DC
or AAC . This example is not, therefore, uniquely decodable.
2) Instantaneous decoding:
Example
Consider a 4 alphabet symbols with symbols represented by binary digits as
follows:
A 0
B 10
C 110
D 111
This code can be instantaneously decoded since no complete codeword is a prefix of a
larger codeword. This is in contrast to the previous example where A is a prefix of both
B and D . This example is also a ‘comma code’ as the symbol zero indicates the end
of a codeword except for the all ones word whose length is known.
Example
Consider a 4 alphabet symbols with symbols represented by binary digits as follows:
A 0
B 01
C 011
D 111
The code is identical to the previous example but the bits are time reversed. It is still
uniquely decodable but no longer instantaneous, since early codewords are now prefixes
of later ones.
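A small sketch that tests the prefix (instantaneous decoding) property for the three codes above; the helper name is_prefix_free is an illustrative choice.

def is_prefix_free(codewords):
    # A code is instantaneously decodable iff no codeword is a prefix of another.
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "01", "11", "00"]))    # False: '0' is a prefix of '01' and '00'
print(is_prefix_free(["0", "10", "110", "111"]))  # True: the comma-code example
print(is_prefix_free(["0", "01", "011", "111"]))  # False: uniquely decodable but not instantaneous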
Shannon Code
For messages x1, x2, x3, …, xn with probabilities p(x1), p(x2), p(x3), …, p(xn), arranged in descending order of probability, the code word length li is the smallest integer satisfying li ≥ −log₂ p(xi), and the code word Ci is formed from the first li digits of the binary expansion of the cumulative probability Fi = ∑ₖ₌₁ⁱ⁻¹ p(xₖ), i.e.

Ci = Fi × 2^li   (truncated to an integer and written with li binary digits)
Example
Develop the Shannon code for the following set of messages,
p( x) [0.3 0.2 0.15 0.12 0.1 0.08 0.05]
then find:
(a) Code efficiency,
(b) p(0) at the encoder output.
Solution
xi    p(xi)   li   Fi     Ci      0i
x1    0.3     2    0      00      2
x2    0.2     3    0.3    010     2
x3    0.15    3    0.5    100     2
x4    0.12    4    0.65   1010    2
x5    0.1     4    0.77   1100    2
x6    0.08    4    0.87   1101    1
x7    0.05    5    0.95   11110   1
H(X) = − ∑ᵢ₌₁⁷ p(xᵢ) log₂ p(xᵢ) = 2.6029 bits/message
LC = ∑ᵢ₌₁⁷ lᵢ p(xᵢ) = 3.1 bits/message
η = H(X)/LC × 100% = (2.6029/3.1) × 100% = 83.965%
p(0) = ∑ᵢ 0ᵢ p(xᵢ)/LC = 1.87/3.1 = 0.603
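A sketch of the Shannon coding rule in Python, assuming the length rule li = ⌈−log₂ p(xi)⌉ and codewords taken from the binary expansion of the cumulative probability, as described above; it reproduces the lengths and codewords of this example.

import math

def shannon_code(probs):
    # Shannon coding: l_i = ceil(-log2 p_i); C_i = first l_i bits of the binary
    # expansion of the cumulative probability F_i (symbols sorted by decreasing p).
    probs = sorted(probs, reverse=True)
    codes, F = [], 0.0
    for p in probs:
        l = math.ceil(-math.log2(p))
        bits, frac = "", F
        for _ in range(l):                 # binary expansion of F_i to l_i digits
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codes.append((p, l, bits))
        F += p
    return codes

for p, l, c in shannon_code([0.3, 0.2, 0.15, 0.12, 0.1, 0.08, 0.05]):
    print(p, l, c)
# Average length sum(p*l) = 3.1 and efficiency 2.6029/3.1 = 83.96%, as in the example.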
Example
Repeat the previous example using ternary coding.
Solution
For ternary coding (r = 3):
1) lᵢ = −log₃ p(xᵢ) if p(xᵢ) ∈ {1/3, 1/9, 1/27, …}
2) otherwise, lᵢ is the smallest integer satisfying lᵢ ≥ −log₃ p(xᵢ).
xi    p(xi)   li   Fi     Ci     0i
x1    0.3     2    0      00     2
x2    0.2     2    0.3    02     1
x3    0.15    2    0.5    11     0
x4    0.12    2    0.65   12     0
x5    0.1     3    0.77   202    1
x6    0.08    3    0.87   212    0
x7    0.05    3    0.95   221    0
H(X) = − ∑ᵢ₌₁⁷ p(xᵢ) log₃ p(xᵢ) = 1.642 ternary unit/message
LC = ∑ᵢ₌₁⁷ lᵢ p(xᵢ) = 2.23 ternary unit/message
η = H(X)/LC × 100% = (1.642/2.23) × 100% = 73.632%
p(0) = ∑ᵢ 0ᵢ p(xᵢ)/LC = (0.6 + 0.2 + 0.1)/2.23 = 0.404
Shannon- Fano Code:
In Shannon–Fano coding, the symbols are arranged in order from most probable to
least probable, and then divided into two sets whose total probabilities are as close
as possible to being equal. All symbols then have the first digits of their codes
assigned; symbols in the first set receive "0" and symbols in the second set receive
"1". As long as any sets with more than one member remain, the same process is
repeated on those sets, to determine successive digits of their codes.
Example:
Five symbols have the following frequencies and probabilities. Design a suitable Shannon-Fano binary code, and calculate the average code length, source entropy and efficiency.
L = ∑ⱼ P(xⱼ) lⱼ = 2.28 bits/symbol
𝐻(𝑌) = 2.18567𝑏𝑖𝑡𝑠/𝑠𝑦𝑚𝑏𝑜𝑙
η = H(Y)/L × 100 = (2.18567/2.28) × 100 = 95.86%
Example
Develop the Shannon - Fano code for the following set of messages,
p( x) [0.35 0.2 0.15 0.12 0.1 0.08] then find the code efficiency.
Solution
xi    p(xi)   Code   li
x1    0.35    00     2
x2    0.2     01     2
x3    0.15    100    3
x4    0.12    101    3
x5    0.10    110    3
x6    0.08    111    3
H(X) = − ∑ᵢ₌₁⁶ p(xᵢ) log₂ p(xᵢ) = 2.396 bits/symbol
LC = ∑ᵢ₌₁⁶ lᵢ p(xᵢ) = 2.45 bits/symbol
η = H(X)/LC × 100% = (2.396/2.45) × 100% = 97.796%
Example
Repeat the previous example using r = 3.
Solution
xi    p(xi)   Code   li
x1    0.35    0      1
x2    0.2     10     2
x3    0.15    11     2
x4    0.12    20     2
x5    0.10    21     2
x6    0.08    22     2
LC = ∑ᵢ₌₁⁶ lᵢ p(xᵢ) = 1.65 ternary unit/symbol
H(X) = − ∑ᵢ₌₁⁶ p(xᵢ) log₃ p(xᵢ) = 1.512 ternary unit/symbol
η = H(X)/LC × 100% = (1.512/1.65) × 100% = 91.636%
The Huffman coding algorithm comprises two steps, reduction and splitting. These steps can be summarized as follows:
1) Reduction
a) List the symbols in descending order of probability.
b) Combine the two symbols of lowest probability into a single symbol whose probability is the sum of the two.
c) Repeat the reduction until only two symbols remain.
2) Splitting
Assign the bits 0 and 1 to the two remaining symbols, then work backwards through the reductions, appending a 0 and a 1 to the codewords of the two symbols combined at each stage, until every original symbol has a complete codeword.
η = (2.12193/2.2) × 100 = 96.45%
The average code word length is still 2.2 bits/symbol, but the variances of the codes are different!
Example
Symbol       A     B     C     D     E     F     G     H
Probability  0.10  0.18  0.40  0.05  0.06  0.10  0.07  0.04

Huffman reduction (at each stage the two lowest probabilities are combined and the members of the combined pair are assigned the bits 0 and 1):

C   0.40  0.40  0.40  0.40  0.40  0.40  0.60
B   0.18  0.18  0.18  0.19  0.23  0.37  0.40
A   0.10  0.10  0.13  0.18  0.19  0.23
F   0.10  0.10  0.10  0.13  0.18
G   0.07  0.09  0.10  0.10
E   0.06  0.07  0.09
D   0.05  0.06
H   0.04

li (for A B C D E F G H):   3  3  1  5  4  4  4  5
H(X) = − ∑ᵢ₌₁⁸ p(xᵢ) log₂ p(xᵢ) = 2.552 bits/symbol
LC = ∑ᵢ₌₁⁸ lᵢ p(xᵢ) = 2.61 bits/symbol
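A minimal sketch of binary Huffman coding using Python's heapq; individual codewords may differ from the table above when probabilities tie, but the average code word length comes out the same, 2.61 bits/symbol.

import heapq
from itertools import count

def huffman_code(probs):
    # Binary Huffman coding: repeatedly combine the two least probable entries,
    # prefixing their codewords with 0 and 1. Returns {symbol: codeword}.
    tie = count()                                  # tie-breaker so heap tuples stay comparable
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)        # two smallest probabilities
        p1, _, group1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = {"A": 0.10, "B": 0.18, "C": 0.40, "D": 0.05,
         "E": 0.06, "F": 0.10, "G": 0.07, "H": 0.04}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(round(avg_len, 2))   # 2.61 bits/symbol, matching LC above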
Data Compression:
In computer science and information theory, data compression, source coding, or bit-
rate reduction involves encoding information using fewer bits than the original
representation. Compression can be either lossy or lossless.
The input message to the RLE (run-length encoding) encoder is of variable length while the output code word is fixed-length, unlike the Huffman code where the input is fixed-length and the output is of variable length.
The idea of error detection and/or correction is to add extra bits to the digital
message that enable the receiver to detect or correct errors with limited
capabilities. These extra bits are called parity bits. If we have k bits, r parity bits
are added, then the transmitted digits are:
𝑛 = 𝑟+𝑘
Here n is the code word length, and the code is denoted as (n, k). The efficiency or code rate is equal to k/n.
Ideally, FEC codes can be used to generate encoding symbols that are transmitted
in packets in such a way that each received packet is fully useful to a receiver to
reassemble the object regardless of previous packet reception patterns. The most common applications of FEC are:
Compact Disc (CD) applications, digital audio and video, the Global System for Mobile communications (GSM), and mobile communications.
2t = dmin − 1
t = (dmin − 1)/2
The possible error correction is t = (dmin − 1)/2 = 1.
It is a linear block code (systematic code). In this code, an extra bit is added for every k information bits, and hence the code rate (efficiency) is k/(k + 1). At the
receiver if the number of 1’s is odd then the error is detected. The minimum
Hamming distance for this category is dmin =2, which means that the simple
parity code is a single-bit error-detecting code; it cannot correct any error. There
are two categories in this type: even parity (ensures that a code word has an even
number of 1's) and odd parity (ensures that a code word has an odd number of
1's) in the code word.
The above table can be repeated with odd parity-check code of (5, 4) as follow:
Data word Code word Data word Code word
0010 00100 0110 01101
1010 10101 1000 10000
Note:
Error detection was used in early ARQ (Automatic Repeat reQuest) systems.
If the receiver detects an error, it asks the transmitter (through another backward
channel) to retransmit.
The sender calculates the parity bit to be added to the data word to form a code word. At the receiver, a syndrome is calculated and passed to the decision logic analyzer. If the syndrome is 0, there is no error in the received codeword and the data portion of the received codeword is accepted as the data word; if the syndrome is 1, the data portion of the received codeword is discarded and the data word is not created, as shown in the figure below.
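A minimal sketch of an even parity-check code and its one-bit syndrome, following the description above; the function names are illustrative.

def encode_even_parity(data_bits):
    # Append one parity bit so the code word has an even number of 1's.
    parity = sum(data_bits) % 2
    return data_bits + [parity]

def syndrome(code_word):
    # 0 if the received word has even parity (assumed error free), 1 otherwise.
    return sum(code_word) % 2

code = encode_even_parity([0, 1, 1, 0])
print(code, syndrome(code))            # [0, 1, 1, 0, 0] 0 -> accepted
corrupted = code[:]
corrupted[2] ^= 1                      # flip one bit in the channel
print(corrupted, syndrome(corrupted))  # syndrome 1 -> data word discarded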
The repetition code is one of the most basic error-correcting codes. The idea of
the repetition code is to just repeat the message several times. The encoder is a simple device that repeats each message bit r times.
For example, if we have a (3, 1) repetition code, then encoding the signal
m=101001 yields a code c=111000111000000111.
Suppose we received a (3, 1) repetition code and we are decoding the signal c = 110001111. The decoded message is m = 101. An (r, 1) repetition code has an error-correcting capability of t = ⌊(r − 1)/2⌋ (i.e. it will correct up to ⌊(r − 1)/2⌋ errors in any code word). In other words dmin = r, so the correction capability increases with r. Although this code is very simple, it is also inefficient and wasteful: even using only a (2, 1) repetition code would mean doubling the required bandwidth, which means doubling the cost.
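A minimal sketch of (r, 1) repetition encoding and majority-vote decoding, reproducing the two examples above.

def repetition_encode(bits, r=3):
    # (r, 1) repetition code: repeat every message bit r times.
    return [b for bit in bits for b in [bit] * r]

def repetition_decode(code, r=3):
    # Majority-vote decoding of each block of r received bits.
    return [1 if sum(code[i:i + r]) > r // 2 else 0
            for i in range(0, len(code), r)]

print(repetition_encode([1, 0, 1, 0, 0, 1]))             # bits of 111000111000000111
print(repetition_decode([1, 1, 0, 0, 0, 1, 1, 1, 1]))    # [1, 0, 1], as in the example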
Linear block codes extend the parity-check code by using a larger number of parity bits to either detect more than one error or correct one or more errors. An (n, k) binary block code selects 2^k codewords from the 2^n possibilities to form the code, such that each k-bit information block is uniquely mapped to one of these 2^k codewords. In linear codes the sum of any two codewords is a codeword. The code is said to be linear if, and only if, the sum Vᵢ (+) Vⱼ is also a code vector, where Vᵢ and Vⱼ are codeword vectors and (+) represents modulo-2 addition.
Hamming codes are a family of linear error-correcting codes that generalize the Hamming (7, 4) code, and were invented by Richard Hamming in 1950. Hamming codes can detect up to two-bit errors or correct one-bit errors without detection of uncorrected errors. Hamming codes are perfect codes; that is, they achieve the highest possible rate for codes with their block length and a minimum distance of three.
In the codeword, there are k data bits and 𝑟 = 𝑛 − 𝑘 redundant (check) bits,
giving a total of n codeword bits. 𝑛 = 𝑘 + 𝑟
1. r parity bits are added to a k-bit data word, forming a code word of n bits.
This table describes which parity bits cover which transmitted bits in the encoded word. For example, p2 provides an even parity for bits 2, 3, 6, and 7. It also details, by reading a column, which transmitted bit is covered by which parity bits. For example, d1 is covered by p1 and p2 but not p3. This table will have a striking resemblance to the parity-check matrix (H).
A = p1 ⊕ d1 ⊕ d2 ⊕ d4
B = p2 ⊕ d1 ⊕ d3 ⊕ d4
C = p3 ⊕ d2 ⊕ d3 ⊕ d4
Example:
Suppose we want to transmit the data 1011 over noisy communication channel.
Determine the Hamming code word.
Solution:
The first step is to calculate the parity bit values and place them in the corresponding positions:
p1 = d1 ⊕ d2 ⊕ d4 = 1 ⊕ 0 ⊕ 1 = 0, p2 = d1 ⊕ d3 ⊕ d4 = 1 ⊕ 1 ⊕ 1 = 1, p3 = d2 ⊕ d3 ⊕ d4 = 0 ⊕ 1 ⊕ 1 = 0,
so the transmitted code word is p1 p2 d1 p3 d2 d3 d4 = 0110011.
Suppose the following noise is added to the code word, then the received code
becomes as:
Hamming matrices:
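As a hedged sketch of the usual Hamming (7, 4) matrices and syndrome decoding (assuming parity bits in positions 1, 2 and 4, consistent with the parity equations above), the code below builds the parity-check matrix H, encodes the data word 1011 and uses the syndrome to locate a single-bit error.

# Standard Hamming(7,4) with bit order p1 p2 d1 p3 d2 d3 d4 (positions 1..7).
# Column j of H is the binary representation of position j, so the syndrome
# directly gives the position of a single-bit error.
H = [[1, 0, 1, 0, 1, 0, 1],   # checks positions 1, 3, 5, 7 (p1 group)
     [0, 1, 1, 0, 0, 1, 1],   # checks positions 2, 3, 6, 7 (p2 group)
     [0, 0, 0, 1, 1, 1, 1]]   # checks positions 4, 5, 6, 7 (p3 group)

def hamming_encode(d1, d2, d3, d4):
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming_correct(r):
    # Return the corrected code word; a nonzero syndrome gives the error position.
    syndrome = sum(2 ** k for k, row in enumerate(H)
                   if sum(a * b for a, b in zip(row, r)) % 2)
    if syndrome:
        r = r[:]
        r[syndrome - 1] ^= 1           # flip the bit at the indicated position
    return r

code = hamming_encode(1, 0, 1, 1)      # data word 1011 from the example -> 0110011
print(code)
received = code[:]
received[4] ^= 1                       # introduce a single-bit error at position 5
print(hamming_correct(received) == code)   # True: the error is located and corrected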