Part 3 Information and Quantification
Information Theory / CE231 Lecturer Ali M. Alsahlany
Lecture Outline:
• Introduction
• General model of communication system
• Information Source
• Self Information
• Entropy
• Information Rate
• Joint Entropy
• Conditional Entropy
• Mutual Information
Lecture 3: Information and Quantification
Information Theory
Information Theory is concerned with the theoretical limitations and potentials of systems that communicate.
E.g., "What is the best compression or communication rate we can achieve?"
Communication is sending information from one place and/or time to another place and/or time over a medium
that might have errors.
General model of communication system
• Channel: telephone line, high-frequency radio link, space communication link, biological organism (sending a message from brain to foot, or from ear to brain).
• Some signal path with a time-varying frequency response, cross-talk, thermal noise, impulsive switch noise, etc.
• Noise: represents our imperfect understanding of the universe. Thus, we treat it as random, though it often obeys some rules, such as a probability distribution.
• Receiver: the destination of the transmitted information: a person, computer, disk, analog radio or TV, the Internet.
(Figure: block diagram of the general communication system, with noise entering the channel.)
Information Source
Assume:
- The information source generates symbols from a given alphabet S = {s0, s1, ..., sK−1}, where symbol sk occurs with probability pk.
The amount of information gained from knowing that the source produced the symbol sk is related to pk as follows:
- If pk = 1, there is no uncertainty about the occurrence of the event and no gain of information; i.e., there is no need for communication because the receiver already knows everything.
- As pk decreases, the uncertainty increases; the reception of sk corresponds to some gain of information.
Self-information: a function that measures the amount of information gained after observing the symbol sk:
I(sk) = log_b [1 / P(sk)] = −log_b P(sk),   where log_a(p) = ln(p) / ln(a)
The unit of information depends on the base b of the logarithm: b = 2 gives bits, b = e ≈ 2.718 gives nats, and b = 10 gives Hartleys.
The amount of information in bits about a symbol is closely related to its probability of occurrence
A low probability event contains a lot of information and vice versa
- The information obtained from the occurrence of two independent events is the sum of the information obtained from the occurrence of the individual events:
I(AB) = log_b [1 / P(AB)]
      = log_b [1 / (P(A) P(B))]
      = log_b [1 / P(A)] + log_b [1 / P(B)]
      = I(A) + I(B)
∴ I(AB) = I(A) + I(B)
Example 1: Let H and T be the outcomes of a coin flip. Calculate the self-information for the following cases:
(a) Fair coin with P(H) = P(T) = 0.5:   I(H) = I(T) = 1 bit
(b) Unfair coin with P(H) = 1/8, P(T) = 7/8:   I(H) = 3 bits, I(T) = 0.193 bits
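As a quick numerical check, the short Python sketch below evaluates the self-information formula for Example 1; it is an added illustration, and the helper name self_information is an arbitrary choice.

```python
import math

def self_information(p: float, base: float = 2) -> float:
    """Self-information I = -log_base(p) of an event with probability p."""
    return -math.log(p, base)

# Example 1 values (in bits)
print(self_information(0.5))    # 1.0   -> fair coin, I(H) = I(T) = 1 bit
print(self_information(1 / 8))  # 3.0   -> unfair coin, I(H) = 3 bits
print(self_information(7 / 8))  # 0.193 -> unfair coin, I(T) ≈ 0.193 bits
```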
Information Source / Self Information
Example 2: A source puts out one of five possible messages m1, ..., m5 during each message interval, with probabilities P1 = 1/2, P2 = 1/4, P3 = 1/8, P4 = 1/16 and P5 = 1/16. What is the information content of each of these messages in bits?
I(m1) = log2 [1 / P(m1)] = −log2 P(m1) = −log2(1/2) = 1 bit
I(m2) = log2 [1 / P(m2)] = −log2 P(m2) = −log2(1/4) = 2 bits
I(m3) = log2 [1 / P(m3)] = −log2 P(m3) = −log2(1/8) = 3 bits
I(m4) = log2 [1 / P(m4)] = −log2 P(m4) = −log2(1/16) = 4 bits
I(m5) = log2 [1 / P(m5)] = −log2 P(m5) = −log2(1/16) = 4 bits
Exercise 1: For 128 equally likely and independent messages, find the information content (in bits) of each message.
Solution:
I(m) = log2 [1 / P(m)] = −log2 P(m) = log2(128) = 7 bits
Homework 1: Suppose that, in sizing up the data storage requirements for a word-processing system to be used in the production of a book, it is required to calculate the information capacity. The book consists of 450 pages with 500 words per page, each word containing 5 symbols chosen at random from a 37-ary alphabet (26 letters, 10 numerical digits and one blank space). Calculate the information capacity of the book.
Information Source / Entropy
Entropy: It is the average number of bits per symbol required to describe a source
H = ∑i=1..N Pi log_b [1 / P(si)] = −∑i=1..N Pi log_b Pi
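A minimal Python sketch of this definition (an added illustration, not part of the original slides; the function name entropy is assumed):

```python
import math

def entropy(probs, base: float = 2) -> float:
    """H = -sum(p * log_base(p)) over the symbol probabilities, skipping p = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Entropy of the five-message source of Example 2: 1.875 bits/symbol
print(entropy([1/2, 1/4, 1/8, 1/16, 1/16]))
```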
Properties of Entropy
H is a non-negative quantity: H ≥ 0.
If all a priori probabilities are equally likely (Pi = 1/N for all N symbols), then the entropy is maximum and given by:
H = log_b N
The proof: if all a priori probabilities are equally likely (Pi = 1/N for all N symbols), then
H = −∑i=1..N Pi log_b(Pi) = −(1/N) ∑i=1..N log_b(1/N)
  = −(1/N) [N log_b(1/N)]
  = −log_b(1/N) = log_b N
∴ 0 ≤ H ≤ log_b N
i.e., you need log2 N bits to represent a variable that can take one of N values, if N is a power of 2.
If these values are equally probable, the entropy equals exactly this number of bits (log2 N).
If one of the events is more probable than the others, observing that event is less informative. Conversely, rarer events provide more information when observed. Since less probable events are observed more rarely, the net effect is that the entropy of non-uniformly distributed data is less than log2 N.
Entropy is zero when one outcome is certain, so entropy measures the disorder or uncertainty of a message.
Example 4: Find and plot the entropy of the binary code in which the probability of occurrence for
the symbol 1 is P and for the symbol 0 is 1-P
H = −∑i=1..2 Pi log2 Pi = −P log2 P − (1 − P) log2(1 − P)   (using v·log v → 0 as v → 0)
P = 0 ⇒ H = 0 bit/symbol
P = 1 ⇒ H = 0 bit/symbol
P = 1/2 ⇒ H = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1/2 + 1/2 = 1 bit/symbol
P = 1/4 ⇒ H = −(1/4) log2(1/4) − (3/4) log2(3/4) = 0.8113 bits/symbol
(Plot: H versus P, rising from 0 at P = 0 to a maximum of 1 bit/symbol at P = 1/2 and back to 0 at P = 1.)
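The binary entropy values above can be checked with the following Python sketch (an added illustration; the name binary_entropy is assumed):

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), taking 0*log2(0) = 0."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, binary_entropy(p))   # 0, 0.8113, 1.0, 0 bits/symbol
```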
Example 5: Calculate the average information in bits/character in English assuming each letter is
equally likely
H = ∑i=1..26 (1/26) log2 26 = log2 26 = 4.7 bits/character
Exercise 2: For a uniform distribution P(xi) = 1/N, find the average information.
Solution:
H(X) = ∑i=1..N (1/N) I(xi) = ∑i=1..N (1/N) [−log2(1/N)] = (1/N) · N · log2 N
     = log2 N
Homework 3: A source sends two symbols, A and B. If the entropy is H(X) = 0.6 and the self-information of one of the symbols is 0.3, find the probabilities of A and B.
Information Rate
The information rate R is the average number of bits of information produced per second. It is given by
Information rate: R = rH
where r is the symbol (message) rate. The units combine as
R = r [symbols/second] × H [bits/symbol], giving R in bits/second.
Example 6: A PCM source transmits four samples (messages) at a rate of 2 samples/second. The probabilities of occurrence of these 4 samples (messages) are p1 = p4 = 1/8 and p2 = p3 = 3/8. Find the information rate of the source.
Solution:
H = p1 log2(1/p1) + p2 log2(1/p2) + p3 log2(1/p3) + p4 log2(1/p4)
  = (1/8) log2 8 + (3/8) log2(8/3) + (3/8) log2(8/3) + (1/8) log2 8 = 1.8 bits/message
R = rH = 2 [messages/second] × 1.8 [bits/message] = 3.6 bits/second
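As a check of Example 6, a small Python sketch (an added illustration; entropy is the assumed helper from before) computes H ≈ 1.81 bits/message and R ≈ 3.62 bits/second, which the slide rounds to 1.8 and 3.6:

```python
import math

def entropy(probs, base: float = 2) -> float:
    return -sum(p * math.log(p, base) for p in probs if p > 0)

probs = [1/8, 3/8, 3/8, 1/8]   # message probabilities of Example 6
r = 2                          # messages per second
H = entropy(probs)             # ≈ 1.81 bits/message
R = r * H                      # ≈ 3.62 bits/second
print(H, R)
```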
In the example discussed above there are four samples (levels). These four levels can be coded using binary PCM with 2 bits per level. (Table: the four messages assigned the 2-bit codes 00, 01, 10, 11.)
Since each binary digit can convey at most 1 bit of information, the above coding scheme (2 bits per message at 2 messages per second) is capable of conveying 4 bits of information per second. But in Example 6 we obtained an information rate of only 3.6 bits per second. This shows that the information-carrying ability of binary PCM is not completely utilized by the transmission scheme of Example 6. The situation is improved in the next example.
Example 7: In the transmission scheme of Example 6, calculate the information rate if all messages are equally likely.
Solution: Since the messages are equally likely, their probabilities are p1 = p2 = p3 = p4 = 1/4.
H = log2 4 = 2 bits/message
R = rH = 2 [messages/second] × 2 [bits/message] = 4 bits/second
Just before this example we saw that binary-coded PCM with 2 bits per message is capable of conveying 4 bits of information per second. Here that rate is actually reached because all the messages are equally likely. Thus, with binary PCM coding, the maximum information rate is achieved when all messages are equally likely.
Joint Entropy
The joint entropy H(X, Y) represents the amount of information needed on average to specify the values of two discrete random variables; it is the entropy of the pairing (X, Y):
H(X, Y) = −∑j ∑i p(xi, yj) log2 p(xi, yj)
Example 8: Let X represent whether it is sunny or rainy in a particular town on a given day. Let Y represent whether it is above 70 degrees (hot) or below 70 degrees (cool). Compute the entropy of the joint distribution P(X, Y) given by
P(sunny, hot) = 1/2
P(sunny, cool) = 1/4
P(rainy, hot) = 1/4
P(rainy, cool) = 0
Answer:
H(X, Y) = −[ (1/2) log2(1/2) + (1/4) log2(1/4) + (1/4) log2(1/4) + 0·log2(0) ]
        = −[ −1/2 − 1/2 − 1/2 − 0 ] = 3/2 bits/symbol
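The same result can be reproduced with a short Python sketch (an added illustration; joint_entropy is an assumed name):

```python
import math

def joint_entropy(joint):
    """H(X,Y) = -sum of p*log2(p) over all cells of the joint table, skipping zeros."""
    return -sum(p * math.log2(p) for row in joint for p in row if p > 0)

# Example 8: rows = X (sunny, rainy), columns = Y (hot, cool)
P = [[0.5, 0.25],
     [0.25, 0.0]]
print(joint_entropy(P))   # 1.5 bits/symbol
```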
Homework 4: For a discrete memoryless channel the joint probability p(x, y) is tabulated as follows:
p(x, y)    y = 0    y = 1    y = 2
x = 0       3/24     2/24     1/24
x = 1       2/24     5/24     2/24
x = 2       6/24     1/24     2/24
Find E(X), E(Y), H(X), H(Y), and H(X,Y)
Conditional Entropy
Given a pair of random variables (X, Y), the conditional entropies H(X/Y) and H(Y/X) are defined as
H(X/Y) = −∑j=1..m ∑i=1..n p(xi, yj) log2 p(xi/yj)
H(Y/X) = −∑j=1..m ∑i=1..n p(xi, yj) log2 p(yj/xi)
Example 9: For a discrete memoryless channel the joint probability p(x, y) is tabulated as follows:
p(x, y)    y = 0    y = 1
x = 0       1/2      1/4
x = 1        0       1/4
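As an added illustration of the two definitions above, the Python sketch below computes H(X/Y) and H(Y/X) directly from the Example 9 joint table (the variable names P, p_x, p_y are assumptions):

```python
import math

# Joint distribution of Example 9: rows are x = 0, 1; columns are y = 0, 1
P = [[1/2, 1/4],
     [0.0, 1/4]]

p_x = [sum(row) for row in P]         # marginal p(x) = (3/4, 1/4)
p_y = [sum(col) for col in zip(*P)]   # marginal p(y) = (1/2, 1/2)

# H(X|Y) = -sum p(x,y) log2 p(x|y), with p(x|y) = p(x,y)/p(y)
H_x_given_y = -sum(P[i][j] * math.log2(P[i][j] / p_y[j])
                   for i in range(2) for j in range(2) if P[i][j] > 0)
# H(Y|X) = -sum p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x)
H_y_given_x = -sum(P[i][j] * math.log2(P[i][j] / p_x[i])
                   for i in range(2) for j in range(2) if P[i][j] > 0)

print(H_x_given_y, H_y_given_x)   # 0.5 bits and about 0.689 bits
```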
Chain Rule: H(X, Y) = H(X) + H(Y/X) = H(Y) + H(X/Y)
Homework 6: A transmitter produces three symbols A, B, and C with the symbol probabilities P(k) and conditional probabilities P(j/k) tabulated below.
P(k):    A: 11/30    B: 7/12    C: 1/20
P(j/k)     j = A    j = B    j = C
k = A        0       4/5      1/5
k = B       1/2      1/2       0
k = C       1/2      2/5      1/10
Mutual Information
Mutual information I(X; Y): consider the set of transmitted symbols x1, x2, ..., xn; after the channel, the receiver may observe the symbols y1, y2, ..., ym. Theoretically, if the noise and jamming are zero, then the set of y's equals the set of x's and m = n. However, due to noise and jamming there is a conditional probability p(yj/xi).
The mutual information is the statistical average of I(xi; yj) over all the pairs, i = 1, 2, ..., n and j = 1, 2, ..., m:
I(X; Y) = ∑j=1..m ∑i=1..n p(xi, yj) I(xi; yj)
        = ∑j=1..m ∑i=1..n p(xi, yj) log2 [ p(xi/yj) / p(xi) ]
        = ∑j=1..m ∑i=1..n p(xi, yj) log2 [ p(yj/xi) / p(yj) ]   bits
(Figure: noisy channel mapping the transmitted symbols x1, ..., xn to the received symbols y1, ..., ym.)
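A compact Python sketch of the last form of the formula (an added illustration; mutual_information is an assumed name). It uses the equivalent identity p(x, y) log2[ p(x, y) / (p(x) p(y)) ]; applied to the Example 9 joint table it gives about 0.311 bits.

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum over nonzero cells of p(x,y) * log2( p(x,y) / (p(x)*p(y)) )."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (p_x[i] * p_y[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Example 9 joint table: I(X;Y) = H(X) - H(X|Y) ≈ 0.811 - 0.5 ≈ 0.311 bits
print(mutual_information([[1/2, 1/4],
                          [0.0, 1/4]]))
```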
Mutual information I(X; Y) is a measure of the mutual dependence between two variables; it quantifies the amount of information obtained about one random variable by observing the other.
Example 11: Data X is transmitted over the channel P(Y/X) and received as Y.
(Figure: X → channel P(Y/X), with noise entering the channel → Y.)
• H(X) is the information generated by the source and then sent.
• H(Y) is the received data: a mix of the part of H(X) that comes from the transmitter with H(Y/X), the noise from the channel.
• H(Y/X) is the noise added by the channel.
The joint probabilities p(X, Y) for this example are:
p(X, Y)      Y1         Y2
X1           0.5        0.25
X2            0         0.125
X3           0.0625     0.0625
b. H(X, Y) = −∑j ∑i p(Xi, Yj) log2 p(Xi, Yj)
Then I(X1; Y2) = log2 [ p(X1, Y2) / (p(X1) p(Y2)) ] = log2 [ 0.25 / (0.75 × 0.4375) ] = −0.3923 bits, which means that Y2 gives ambiguity about X1.
e. I(X; Y) = H(X) − H(X/Y) = 1.06127 − 0.8863 = 0.1749 bits/symbol
f. To draw the channel model we must find the p(Y/X) matrix from p(X, Y):
p(Yj/Xi) = p(Xi, Yj) / p(Xi)
p(Y/X):
         Y1                          Y2
X1       (0.5)/(0.75) = 2/3          (0.25)/(0.75) = 1/3
X2       0                           (0.125)/(0.125) = 1
X3       (0.0625)/(0.125) = 1/2      (0.0625)/(0.125) = 1/2
Each row sums to one (unit row summation).
(Channel diagram: X1 → Y1 with probability 2/3, X1 → Y2 with 1/3, X2 → Y2 with 1, X3 → Y1 with 1/2, X3 → Y2 with 1/2.)
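To verify the numbers in this example, here is a small Python sketch (an added illustration) that starts from the joint table recovered above, derives the p(Y/X) matrix, and recomputes I(X; Y) ≈ 0.1749 bits/symbol:

```python
import math

# Joint distribution p(X, Y) of Example 11 (rows X1..X3, columns Y1, Y2)
P = [[0.5, 0.25],
     [0.0, 0.125],
     [0.0625, 0.0625]]

p_x = [sum(row) for row in P]         # (0.75, 0.125, 0.125)
p_y = [sum(col) for col in zip(*P)]   # (0.5625, 0.4375)

# Channel matrix p(y|x) = p(x, y) / p(x); each row sums to one
p_y_given_x = [[pxy / p_x[i] for pxy in row] for i, row in enumerate(P)]
print(p_y_given_x)                    # [[2/3, 1/3], [0, 1], [1/2, 1/2]]

# I(X;Y) = H(X) - H(X|Y)
H_x = -sum(p * math.log2(p) for p in p_x)                               # ≈ 1.0613 bits
H_x_given_y = -sum(P[i][j] * math.log2(P[i][j] / p_y[j])
                   for i in range(3) for j in range(2) if P[i][j] > 0)  # ≈ 0.8864 bits
print(H_x - H_x_given_y)              # ≈ 0.1749 bits/symbol
```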
Solution:
(a) H(X) = (2/3) log2(3/2) + (1/3) log2 3 = 0.918 bits = H(Y).
(b) H(X/Y) = (1/3) H(X/Y=0) + (2/3) H(X/Y=1) = 0.667 bits = H(Y/X).
(c) H(X, Y) = 3 × (1/3) log2 3 = 1.585 bits.
(d) H(Y) − H(Y/X) = 0.251 bits.
(e) I(X; Y) = H(Y) − H(Y/X) = 0.251 bits.
(f) A Venn diagram illustrates the relationships among H(X), H(Y), H(X, Y), the conditional entropies, and the mutual information.