
Kraft’s inequality

An instantaneous code (prefix code, tree code) with the codeword lengths $l_1, \ldots, l_N$ exists if and only if
$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$

Proof: Suppose that we have a tree code. Let $l_{\max} = \max\{l_1, \ldots, l_N\}$.

Expand the tree so that all branches have the depth $l_{\max}$. A codeword at depth $l_i$ has $2^{l_{\max}-l_i}$ leaves underneath itself at depth $l_{\max}$. The sets of leaves under codewords are disjoint. The total number of leaves under codewords is less than or equal to $2^{l_{\max}}$. Thus we have
$$\sum_{i=1}^{N} 2^{l_{\max}-l_i} \le 2^{l_{\max}} \quad\Rightarrow\quad \sum_{i=1}^{N} 2^{-l_i} \le 1$$
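As a quick numerical companion to the inequality, here is a minimal Python sketch (the function names are my own, not from the lecture) that computes the Kraft sum for a list of codeword lengths and checks whether an instantaneous code with those lengths can exist.

```python
def kraft_sum(lengths):
    """Return the Kraft sum: sum of 2**(-l) over the codeword lengths."""
    return sum(2.0 ** (-l) for l in lengths)

def prefix_code_exists(lengths):
    """True if an instantaneous (prefix) code with these lengths can exist."""
    return kraft_sum(lengths) <= 1.0 + 1e-12   # small tolerance for float rounding

# Lengths 1, 2, 3, 3 give 1/2 + 1/4 + 1/8 + 1/8 = 1, so a prefix code exists;
# lengths 1, 2, 2, 2 give 1/2 + 3/4 = 1.25 > 1, so no prefix code is possible.
print(kraft_sum([1, 2, 3, 3]), prefix_code_exists([1, 2, 3, 3]))   # 1.0 True
print(kraft_sum([1, 2, 2, 2]), prefix_code_exists([1, 2, 2, 2]))   # 1.25 False
```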
Kraft’s inequality, cont.

Conversely, given codeword lengths $l_1, \ldots, l_N$ that fulfill Kraft's inequality, we can always construct a tree code.

Start with a complete binary tree where all the leaves are at depth $l_{\max}$.

Assume, without loss of generality, that the codeword lengths are sorted in increasing order.

Choose a free node at depth $l_1$ for the first codeword and remove all its descendants. Do the same for $l_2$ and codeword 2, etc., until we have placed all codewords.
Kraft’s inequality, cont.

Obviously we can place a codeword at depth $l_1$.

In order for the algorithm to work, there must in every step $i$ be free leaves at the maximum depth $l_{\max}$. The number of remaining leaves is
$$2^{l_{\max}} - \sum_{j=1}^{i-1} 2^{l_{\max}-l_j} = 2^{l_{\max}}\Big(1 - \sum_{j=1}^{i-1} 2^{-l_j}\Big) > 2^{l_{\max}}\Big(1 - \sum_{j=1}^{N} 2^{-l_j}\Big) \ge 0$$
where we used the fact that Kraft's inequality is fulfilled.

This shows that there are free leaves in every step. Thus, we can construct a tree code with the given codeword lengths.
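The construction described above translates directly into code. Below is a minimal Python sketch (my own helper, not part of the lecture) that follows the same idea: handle the lengths in increasing order and give each codeword the next free node at its depth, which removes all of that node's descendants from further use.

```python
def construct_prefix_code(lengths):
    """Build binary codewords with the given lengths (must satisfy Kraft's inequality).

    Mirrors the tree argument: lengths are processed in increasing order and each
    codeword occupies the next free node at its depth, blocking its whole subtree.
    """
    if sum(2.0 ** (-l) for l in lengths) > 1.0 + 1e-12:
        raise ValueError("lengths violate Kraft's inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [None] * len(lengths)
    next_node = 0      # next free node at the current depth, as an integer
    prev_len = 0
    for i in order:
        l = lengths[i]
        next_node <<= (l - prev_len)              # descend to depth l
        codewords[i] = format(next_node, "0{}b".format(l))
        next_node += 1                            # skip this codeword's subtree
        prev_len = l
    return codewords

print(construct_prefix_code([2, 1, 3, 3]))   # ['10', '0', '110', '111']
```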
Kraft-McMillan’s inequality

Kraft's inequality can be shown to be fulfilled for all uniquely decodable codes, not just prefix codes. It is then called Kraft-McMillan's inequality: A uniquely decodable code with the codeword lengths $l_1, \ldots, l_N$ exists if and only if
$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$

Consider $\big(\sum_{i=1}^{N} 2^{-l_i}\big)^n$, where $n$ is an arbitrary positive integer.
$$\Big(\sum_{i=1}^{N} 2^{-l_i}\Big)^n = \sum_{i_1=1}^{N} \cdots \sum_{i_n=1}^{N} 2^{-(l_{i_1} + \cdots + l_{i_n})}$$
Kraft-McMillan’s inequality, cont.

$l_{i_1} + \cdots + l_{i_n}$ is the length of $n$ codewords from the code. The smallest value this exponent can take is $n$, which would happen if all codewords had the length 1. The largest value the exponent can take is $nl$, where $l$ is the maximal codeword length. The summation can then be written as
$$\Big(\sum_{i=1}^{N} 2^{-l_i}\Big)^n = \sum_{k=n}^{nl} A_k\, 2^{-k}$$
where $A_k$ is the number of combinations of $n$ codewords that have the combined length $k$. The number of possible binary sequences of length $k$ is $2^k$. Since the code is uniquely decodable, we must have
$$A_k \le 2^k$$
in order to be able to decode.


Kraft-McMillan’s inequality, cont.

We have
$$\Big(\sum_{i=1}^{N} 2^{-l_i}\Big)^n \le \sum_{k=n}^{nl} 2^k\, 2^{-k} = nl - n + 1$$
which gives us
$$\sum_{i=1}^{N} 2^{-l_i} \le (n(l-1)+1)^{1/n}$$
This is true for all $n$, including when we let $n$ tend to infinity, which finally gives us
$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$

The converse of the inequality has already been proven, since we know
that we can construct a prefix code with the given codeword lengths if
they fulfill Kraft’s inequality, and all prefix codes are uniquely decodable.
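To see numerically why letting $n$ tend to infinity forces the sum down to 1: the right-hand side $(n(l-1)+1)^{1/n}$ grows only polynomially in $n$ before the $n$-th root, so it approaches 1, whereas any Kraft sum strictly greater than 1 would grow exponentially when raised to the power $n$. A tiny check (with $l = 8$ chosen just for illustration):

```python
# The bound (n(l-1)+1)**(1/n) tends to 1 as n grows.
l = 8   # assumed maximal codeword length, picked only for this illustration
for n in (1, 10, 100, 1000, 10000):
    print(n, (n * (l - 1) + 1) ** (1.0 / n))
# n=1 gives 8.0, n=100 about 1.068, n=10000 about 1.0011 -- approaching 1.
```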
Instantaneous codes

One consequence of Kraft-McMillan's inequality is that there is nothing to gain by using codes that are uniquely decodable but not instantaneous.

Given a uniquely decodable but not instantaneous code with codeword lengths $l_1, \ldots, l_N$, Kraft's inequality tells us that we can always construct an instantaneous code with exactly the same codeword lengths. This new code will have the same rate as the old code, but it will be easier to decode.
Performance measure

We measure how good a code is with the mean data rate $R$ (more often just data rate or rate).
$$R = \frac{\text{average number of bits per codeword}}{\text{average number of symbols per codeword}} \quad [\text{bits/symbol}]$$
Since we're doing data compression, we want the rate to be as low as possible.

If we initially assume that we have a memoryless source $X_j$ and code one symbol at a time using a tree code, then $R$ is given by
$$R = \bar{l} = \sum_{i=1}^{L} p_i \cdot l_i \quad [\text{bits/symbol}]$$
where $L$ is the alphabet size and $p_i$ the probability of symbol $i$.

$\bar{l}$ is the mean codeword length [bits/codeword].
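For a concrete feel for the formula, here is a small Python sketch (the probabilities and lengths are made up for illustration) that evaluates $R = \bar{l}$ for a 4-symbol memoryless source:

```python
def mean_codeword_length(probs, lengths):
    """Mean codeword length l-bar = sum_i p_i * l_i, in bits per symbol."""
    assert len(probs) == len(lengths)
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.5, 0.25, 0.125, 0.125]   # made-up symbol probabilities
lengths = [1, 2, 3, 3]              # codeword lengths of a prefix code
print(mean_codeword_length(probs, lengths))   # 1.75 bits/symbol
```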
Theoretical lower bound

Suppose that we have a memoryless source $X_j$ and that we code one symbol at a time with a prefix code. Then the mean codeword length $\bar{l}$ (which is equal to the rate) is bounded by
$$\bar{l} \ge -\sum_{i=1}^{L} p_i \cdot \log_2 p_i = H(X_j)$$

$H(X_j)$ is called the entropy of the source.


Theoretical lower bound, cont.

Consider the difference between the entropy and the mean codeword length
$$\begin{aligned}
H(X_j) - \bar{l} &= -\sum_{i=1}^{L} p_i \log p_i - \sum_{i=1}^{L} p_i\, l_i = \sum_{i=1}^{L} p_i \Big(\log \frac{1}{p_i} - l_i\Big) \\
&= \sum_{i=1}^{L} p_i \Big(\log \frac{1}{p_i} - \log 2^{l_i}\Big) = \sum_{i=1}^{L} p_i \log \frac{2^{-l_i}}{p_i} \\
&\le \frac{1}{\ln 2} \sum_{i=1}^{L} p_i \Big(\frac{2^{-l_i}}{p_i} - 1\Big) = \frac{1}{\ln 2} \Big(\sum_{i=1}^{L} 2^{-l_i} - \sum_{i=1}^{L} p_i\Big) \\
&\le \frac{1}{\ln 2}(1 - 1) = 0
\end{aligned}$$
where we used the fact that $\ln x \le x - 1$ and Kraft's inequality.
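The bound is met with equality exactly when every probability is a negative power of two, so that $l_i = -\log_2 p_i$ is an integer and Kraft's inequality holds with equality. A quick numerical check of the bound (a sketch with the same made-up source as before):

```python
from math import log2

def entropy(probs):
    """H(X) = -sum_i p_i log2 p_i, skipping zero-probability symbols."""
    return -sum(p * log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # dyadic probabilities
lengths = [1, 2, 3, 3]              # l_i = -log2 p_i
l_bar = sum(p * l for p, l in zip(probs, lengths))
print(entropy(probs), l_bar)        # 1.75 1.75 -- here l_bar equals H(X)
```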
Shannon’s information measure

We want a measure $I$ of information that is connected to the probabilities of events.

Some desired properties:
- The information $I(A)$ of an event $A$ should only depend on the probability $P(A)$ of the event.
- The lower the probability of the event, the larger the information should be.
- If the probability of an event is 1, the information should be 0.
- Information should be a continuous function of the probability.
- If the independent events $A$ and $B$ both happen, the information should be the sum of the informations, $I(A) + I(B)$.
This implies that information should be a logarithmic measure.
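A one-line numerical check of the additivity property (with made-up probabilities), showing why the logarithm does the job: for independent events the joint probability is a product, and the logarithm turns that product into a sum.

```python
from math import log2

p_a, p_b = 0.5, 0.25        # made-up probabilities of two independent events
i_a = -log2(p_a)            # 1 bit
i_b = -log2(p_b)            # 2 bits
i_ab = -log2(p_a * p_b)     # 3 bits = i_a + i_b, as the last property requires
print(i_a, i_b, i_ab)
```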
Information Theory

The information $I(A;B)$ that is given about an event $A$ when event $B$ happens is defined as
$$I(A;B) \triangleq \log_b \frac{P(A|B)}{P(A)}$$
where we assume that $P(A) \ne 0$ and $P(B) \ne 0$. In the future we assume, unless otherwise specified, that $b = 2$. The unit of information is then called bit. (If $b = e$ the unit is called nat.)

$I(A;B)$ is symmetric in $A$ and $B$:
$$I(A;B) = \log \frac{P(A|B)}{P(A)} = \log \frac{P(AB)}{P(A)P(B)} = \log \frac{P(B|A)}{P(B)} = I(B;A)$$
Therefore the information is also called mutual information.
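To make the symmetry concrete, here is a tiny sketch (the probabilities are invented for illustration) that evaluates $I(A;B)$ both ways:

```python
from math import log2

p_a, p_b, p_ab = 0.4, 0.5, 0.3            # P(A), P(B), P(A and B), made up

i_ab = log2(p_ab / (p_a * p_b))           # log P(AB) / (P(A)P(B))
i_ba = log2((p_ab / p_a) / p_b)           # log P(B|A) / P(B) -- same value
print(i_ab, i_ba)                         # both about 0.585 bits
```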
Information Theory, cont.

We further have that
$$-\infty \le I(A;B) \le -\log P(A)$$
with "equality" to the left if $P(A|B) = 0$ and equality to the right if $P(A|B) = 1$.

$I(A;B) = 0$ means that the events $A$ and $B$ are independent.

$-\log P(A)$ is the amount of information that needs to be given in order for us to determine that event $A$ has happened.
$$I(A;A) = \log \frac{P(A|A)}{P(A)} = -\log P(A)$$
We define the self-information of the event $A$ as
$$I(A) \triangleq -\log P(A)$$
Information Theory, cont.

If we apply the definitions to the events $\{X = x\}$ and $\{Y = y\}$ we get
$$I(X = x) = -\log p_X(x)$$
and
$$I(X = x; Y = y) = \log \frac{p_{X|Y}(x|y)}{p_X(x)}$$
These are real functions of the random variable $X$ and the random variable $(X, Y)$, so the mean values are well defined.
$$H(X) \triangleq E\{I(X = x)\} = -\sum_{i=1}^{L} p_X(x_i) \log p_X(x_i)$$
is called the entropy (or uncertainty) of the random variable $X$.
Information Theory, cont.

$$I(X;Y) \triangleq E\{I(X = x; Y = y)\} = \sum_{i=1}^{L} \sum_{j=1}^{M} p_{XY}(x_i, y_j) \log \frac{p_{X|Y}(x_i|y_j)}{p_X(x_i)}$$
is called the mutual information between the random variables $X$ and $Y$.


Information Theory, cont.

If $(X, Y)$ is viewed as one random variable we get
$$H(X, Y) = -\sum_{i=1}^{L} \sum_{j=1}^{M} p_{XY}(x_i, y_j) \log p_{XY}(x_i, y_j)$$
This is called the joint entropy of $X$ and $Y$.

It then follows that the mutual information can be written as
$$\begin{aligned}
I(X;Y) &= E\Big\{\log \frac{p_{X|Y}}{p_X}\Big\} = E\Big\{\log \frac{p_{XY}}{p_X\, p_Y}\Big\} \\
&= E\{\log p_{XY} - \log p_X - \log p_Y\} \\
&= E\{\log p_{XY}\} - E\{\log p_X\} - E\{\log p_Y\} \\
&= H(X) + H(Y) - H(X, Y)
\end{aligned}$$
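The identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$ is easy to evaluate numerically. A small Python sketch with an invented $2 \times 2$ joint distribution:

```python
from math import log2

def H(probs):
    """Entropy of a list of probabilities, in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A made-up joint pmf p_XY(x_i, y_j); rows are values of X, columns values of Y.
p_xy = [[0.3, 0.2],
        [0.1, 0.4]]
p_x = [sum(row) for row in p_xy]               # marginal of X
p_y = [sum(col) for col in zip(*p_xy)]         # marginal of Y
h_xy = H([p for row in p_xy for p in row])     # joint entropy H(X,Y)
mutual_info = H(p_x) + H(p_y) - h_xy           # I(X;Y), about 0.125 bits here
print(H(p_x), H(p_y), h_xy, mutual_info)
```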
Information Theory, cont.

Useful inequality:
$$\log r \le (r - 1) \log e$$
with equality if and only if $r = 1$. This can also be written
$$\ln r \le r - 1$$
If $X$ takes values in $\{x_1, x_2, \ldots, x_L\}$ we have that
$$0 \le H(X) \le \log L$$
with equality to the left if and only if there is an $i$ such that $p_X(x_i) = 1$, and with equality to the right if and only if $p_X(x_i) = 1/L$ for all $i = 1, \ldots, L$.
Information Theory, cont.

Proof (left inequality):
$$-p_X(x_i) \cdot \log p_X(x_i) \begin{cases} = 0, & p_X(x_i) = 0 \\ > 0, & 0 < p_X(x_i) < 1 \\ = 0, & p_X(x_i) = 1 \end{cases}$$
Thus we have that $H(X) \ge 0$ with equality if and only if $p_X(x_i)$ is either 0 or 1 for each $i$, but this means that $p_X(x_i) = 1$ for exactly one $i$.
Information Theory, cont.

Proof (right inequality):
$$\begin{aligned}
H(X) - \log L &= -\sum_{i=1}^{L} p_X(x_i) \log p_X(x_i) - \log L \\
&= \sum_{i=1}^{L} p_X(x_i) \log \frac{1}{L \cdot p_X(x_i)} \\
&\le \sum_{i=1}^{L} p_X(x_i)\Big(\frac{1}{L \cdot p_X(x_i)} - 1\Big) \log e \\
&= \Big(\sum_{i=1}^{L} \frac{1}{L} - \sum_{i=1}^{L} p_X(x_i)\Big) \log e \\
&= (1 - 1) \log e = 0
\end{aligned}$$
with equality if and only if $p_X(x_i) = \frac{1}{L}$ for all $i = 1, \ldots, L$.
Information Theory, cont.
The conditional entropy of $X$ given the event $Y = y_j$ is
$$H(X|Y = y_j) \triangleq -\sum_{i=1}^{L} p_{X|Y}(x_i|y_j) \log p_{X|Y}(x_i|y_j)$$
We have that
$$0 \le H(X|Y = y_j) \le \log L$$
The conditional entropy of $X$ given $Y$ is defined as
$$H(X|Y) \triangleq E\{-\log p_{X|Y}\} = -\sum_{i=1}^{L} \sum_{j=1}^{M} p_{XY}(x_i, y_j) \log p_{X|Y}(x_i|y_j)$$
We have that
$$0 \le H(X|Y) \le \log L$$
Information Theory, cont.

We also have that
$$\begin{aligned}
H(X|Y) &= -\sum_{i=1}^{L} \sum_{j=1}^{M} p_{XY}(x_i, y_j) \log p_{X|Y}(x_i|y_j) \\
&= -\sum_{i=1}^{L} \sum_{j=1}^{M} p_Y(y_j)\, p_{X|Y}(x_i|y_j) \log p_{X|Y}(x_i|y_j) \\
&= -\sum_{j=1}^{M} p_Y(y_j) \sum_{i=1}^{L} p_{X|Y}(x_i|y_j) \log p_{X|Y}(x_i|y_j) \\
&= \sum_{j=1}^{M} p_Y(y_j)\, H(X|Y = y_j)
\end{aligned}$$
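A numerical check of this identity (reusing the invented $2 \times 2$ joint pmf from the earlier sketch), comparing the weighted average with the double-sum definition:

```python
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

p_xy = [[0.3, 0.2],                     # made-up joint pmf, rows x_i, columns y_j
        [0.1, 0.4]]
p_y = [sum(col) for col in zip(*p_xy)]  # marginal of Y

# H(X|Y) as the weighted average sum_j p_Y(y_j) H(X | Y = y_j).
h_weighted = 0.0
for j, py in enumerate(p_y):
    cond = [p_xy[i][j] / py for i in range(len(p_xy))]   # p_X|Y(. | y_j)
    h_weighted += py * H(cond)

# H(X|Y) directly from the double-sum definition on the previous slide.
h_direct = -sum(p_xy[i][j] * log2(p_xy[i][j] / p_y[j])
                for i in range(len(p_xy)) for j in range(len(p_y))
                if p_xy[i][j] > 0)

print(h_weighted, h_direct)   # both about 0.875 bits
```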
Information Theory, cont.

We have
$$p_{X_1 X_2 \ldots X_N} = p_{X_1} \cdot p_{X_2|X_1} \cdots p_{X_N|X_1 \ldots X_{N-1}}$$
which leads to the chain rule
$$H(X_1 X_2 \ldots X_N) = H(X_1) + H(X_2|X_1) + \cdots + H(X_N|X_1 \ldots X_{N-1})$$

We also have that
$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$
Information Theory, cont.

Other interesting inequalities:
$$H(X|Y) \le H(X)$$
with equality if and only if $X$ and $Y$ are independent.
$$I(X;Y) \ge 0$$
with equality if and only if $X$ and $Y$ are independent.

If $f(X)$ is a function of $X$, we have
$$H(f(X)) \le H(X)$$
$$H(f(X)|X) = 0$$
$$H(X, f(X)) = H(X)$$
Entropy for sources

The entropy, or entropy rate, of a stationary random source $X_n$ is defined as
$$\lim_{n \to \infty} \frac{1}{n} H(X_1 \ldots X_n) = \lim_{n \to \infty} \frac{1}{n}\big(H(X_1) + H(X_2|X_1) + \cdots + H(X_n|X_1 \ldots X_{n-1})\big) = \lim_{n \to \infty} H(X_n|X_1 \ldots X_{n-1})$$

For a memoryless source, the entropy rate is equal to $H(X_n)$.


Entropy for Markov sources

The entropy rate of a stationary Markov source $X_n$ of order $k$ is given by
$$H(X_n|X_{n-1} \ldots X_{n-k})$$
The entropy rate of the state sequence $S_n$ is the same as the entropy rate of the source
$$H(S_n|S_{n-1} S_{n-2} \ldots) = H(S_n|S_{n-1}) = H(X_n \ldots X_{n-k+1}|X_{n-1} \ldots X_{n-k}) = H(X_n|X_{n-1} \ldots X_{n-k})$$
and thus the entropy rate can also be calculated by
$$H(S_n|S_{n-1}) = \sum_{j=1}^{L^k} w_j \cdot H(S_n|S_{n-1} = s_j)$$
i.e., a weighted average of the entropies of the outgoing probabilities of each state, where $w_j$ is the stationary probability of state $s_j$.
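For a first-order source ($k = 1$) the states are simply the symbols, and the weighted-average formula is easy to evaluate. A minimal sketch with an invented 2-state transition matrix:

```python
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Made-up transition matrix of a 2-state Markov source; row j holds the
# outgoing probabilities P(next state | current state j).
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary distribution w (solves w P = w); closed form for a 2-state chain.
w1 = P[1][0] / (P[0][1] + P[1][0])
w = [w1, 1.0 - w1]                    # here [0.8, 0.2]

entropy_rate = sum(wj * H(row) for wj, row in zip(w, P))
print(w, entropy_rate)                # about 0.57 bits/symbol
```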
