Kraft's inequality: there exists a prefix code with codeword lengths $l_1, \ldots, l_N$ if and only if

$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$
Kraft’s inequality, cont.
Start with a complete binary tree where all the leaves are at depth lmax .
Assume, without loss of generality, that the codeword lengths are sorted
in increasing order.
Choose a free node at depth l1 for the first codeword and remove all its
descendants. Do the same for l2 and codeword 2, etc. until we have
placed all codewords.
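The construction can be sketched in code. The following Python sketch (function names are my own, not from the lecture) implements the same idea with a counter instead of an explicit tree: choosing a free node and deleting its descendants corresponds to incrementing the counter.

```python
def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality."""
    return sum(2 ** -l for l in lengths)


def construct_prefix_code(lengths):
    """Assign binary codewords to the given lengths, assuming they
    satisfy Kraft's inequality.

    This mirrors the tree argument: 'value' is the next free node at
    the current depth; shifting left by (l - prev_len) moves down to
    depth l, and adding 1 after each assignment removes the chosen
    node's whole subtree from further consideration."""
    assert kraft_sum(lengths) <= 1, "lengths violate Kraft's inequality"
    codewords = []
    value, prev_len = 0, 0
    for l in sorted(lengths):          # codeword lengths in increasing order
        value <<= (l - prev_len)       # descend to depth l
        codewords.append(format(value, f"0{l}b"))
        value += 1                     # skip this node and its descendants
        prev_len = l
    return codewords


print(construct_prefix_code([3, 1, 3, 2]))   # ['0', '10', '110', '111']
```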
Kraft’s inequality, cont.
In order for the algorithm to work, there must in every step $i$ be free leaves at the maximum depth $l_{\max}$. The number of remaining leaves is

$$2^{l_{\max}} - \sum_{j=1}^{i-1} 2^{l_{\max} - l_j} = 2^{l_{\max}} \Bigl(1 - \sum_{j=1}^{i-1} 2^{-l_j}\Bigr) > 2^{l_{\max}} \Bigl(1 - \sum_{j=1}^{N} 2^{-l_j}\Bigr) \ge 0$$
This shows that there are free leaves in every step. Thus, we can construct a tree code with the given codeword lengths.
Kraft-McMillan’s inequality
Assume the code is uniquely decodable and consider the $n$-th power of the sum:

$$\Bigl(\sum_{i=1}^{N} 2^{-l_i}\Bigr)^n = \sum_{i_1=1}^{N} \cdots \sum_{i_n=1}^{N} 2^{-(l_{i_1} + \ldots + l_{i_n})}$$

$l_{i_1} + \ldots + l_{i_n}$ is the total length of $n$ codewords from the code. The smallest value this exponent can take is $n$, which would happen if all codewords had length 1. The largest value the exponent can take is $nl$, where $l$ is the maximal codeword length. The summation can then be written as

$$\Bigl(\sum_{i=1}^{N} 2^{-l_i}\Bigr)^n = \sum_{k=n}^{nl} A_k 2^{-k}$$

where $A_k$ is the number of combinations of $n$ codewords with total length $k$. Since the code is uniquely decodable, distinct combinations must give distinct binary sequences of length $k$, so

$$A_k \le 2^k$$
We have

$$\Bigl(\sum_{i=1}^{N} 2^{-l_i}\Bigr)^n \le \sum_{k=n}^{nl} 2^k \, 2^{-k} = nl - n + 1$$
which gives us

$$\sum_{i=1}^{N} 2^{-l_i} \le (n(l-1) + 1)^{1/n}$$
This is true for all $n$, and since $(n(l-1) + 1)^{1/n} \to 1$ as $n \to \infty$, letting $n$ tend to infinity finally gives us

$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$
The converse of the inequality has already been proven, since we know
that we can construct a prefix code with the given codeword lengths if
they fulfill Kraft’s inequality, and all prefix codes are uniquely decodable.
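As a sanity check of the counting argument, the sketch below (an illustration of my own, not part of the lecture) brute-forces $A_k$ for a small prefix code, verifies $A_k \le 2^k$, and prints the bound $(n(l-1)+1)^{1/n}$, which tends to 1.

```python
from itertools import product

code = ["0", "10", "110", "111"]        # a prefix code, hence uniquely decodable
n = 4                                   # number of concatenated codewords
l = max(len(c) for c in code)           # maximal codeword length

# A_k = number of n-codeword combinations whose total length is k
A = {}
for combo in product(code, repeat=n):
    k = sum(len(c) for c in combo)
    A[k] = A.get(k, 0) + 1

# Unique decodability: distinct combinations give distinct k-bit strings
assert all(A[k] <= 2 ** k for k in A)

# The derived upper bound on the Kraft sum tends to 1 as n grows
for m in (1, 10, 100, 1000):
    print(m, (m * (l - 1) + 1) ** (1 / m))
```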
Instantaneous codes
We measure how good a code is with the mean data rate R (more often
just data rate or rate).
Suppose we have a memoryless source $X_j$ and that we code one symbol at a time with a prefix code. Then the mean codeword length $\bar{l}$ (which is equal to the rate) is bounded by

$$\bar{l} \ge -\sum_{i=1}^{L} p_i \log_2 p_i = H(X_j)$$
Consider the difference between the entropy and the mean codeword
length
$$H(X_j) - \bar{l} = -\sum_{i=1}^{L} p_i \log p_i - \sum_{i=1}^{L} p_i \, l_i = \sum_{i=1}^{L} p_i \Bigl(\log \frac{1}{p_i} - l_i\Bigr)$$

$$= \sum_{i=1}^{L} p_i \Bigl(\log \frac{1}{p_i} - \log 2^{l_i}\Bigr) = \sum_{i=1}^{L} p_i \log \frac{2^{-l_i}}{p_i}$$

$$\le \frac{1}{\ln 2} \sum_{i=1}^{L} p_i \Bigl(\frac{2^{-l_i}}{p_i} - 1\Bigr) = \frac{1}{\ln 2} \Bigl(\sum_{i=1}^{L} 2^{-l_i} - \sum_{i=1}^{L} p_i\Bigr)$$

$$\le \frac{1}{\ln 2} (1 - 1) = 0$$
where we used the fact that ln x ≤ x − 1 and Kraft’s inequality.
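A quick numeric illustration of the bound $\bar{l} \ge H(X_j)$, using an example pmf and matching prefix-code lengths of my own choosing:

```python
import math

p = [0.5, 0.25, 0.125, 0.125]          # symbol probabilities (example values)
lengths = [1, 2, 3, 3]                 # lengths of a prefix code for the symbols

entropy = -sum(pi * math.log2(pi) for pi in p)
mean_len = sum(pi * li for pi, li in zip(p, lengths))

print(f"H(X) = {entropy:.3f} bits, mean codeword length = {mean_len:.3f} bits")
assert mean_len >= entropy             # the bound holds (with equality here,
                                       # since l_i = -log2 p_i for every symbol)
```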
Shannon’s information measure
$$I(A;B) \triangleq \log_b \frac{P(A|B)}{P(A)}$$

$$I(A;B) = \log \frac{P(A|B)}{P(A)} = \log \frac{P(AB)}{P(A)P(B)} = \log \frac{P(B|A)}{P(B)} = I(B;A)$$
Therefore the information is also called mutual information.
Information Theory, cont.
$$I(A;A) = \log \frac{P(A|A)}{P(A)} = -\log P(A)$$

$$I(X = x) = -\log p_X(x)$$

and

$$I(X = x;\, Y = y) = \log \frac{p_{X|Y}(x|y)}{p_X(x)}$$
These are real-valued functions of the random variable $X$ and of the random variable $(X, Y)$, so their mean values are well defined:
$$H(X) \triangleq E\{I(X = x)\} = -\sum_{i=1}^{L} p_X(x_i) \log p_X(x_i)$$

$$I(X;Y) \triangleq E\{I(X = x;\, Y = y)\} = \sum_{i=1}^{L} \sum_{j=1}^{M} p_{XY}(x_i, y_j) \log \frac{p_{X|Y}(x_i|y_j)}{p_X(x_i)}$$
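These mean values are straightforward to compute from a joint pmf. Below is a small Python sketch with an arbitrary example distribution of my own, using $p_{X|Y}/p_X = p_{XY}/(p_X p_Y)$ from the symmetry shown earlier:

```python
import math

# Joint pmf p_XY(x_i, y_j): rows indexed by x_i, columns by y_j (example values)
p_xy = [[0.30, 0.10],
        [0.10, 0.50]]

p_x = [sum(row) for row in p_xy]            # marginal pmf of X
p_y = [sum(col) for col in zip(*p_xy)]      # marginal pmf of Y

# H(X) = -sum_i p_X(x_i) log2 p_X(x_i)
H_x = -sum(p * math.log2(p) for p in p_x if p > 0)

# I(X;Y) = sum_{i,j} p_XY(x_i,y_j) log2( p_XY(x_i,y_j) / (p_X(x_i) p_Y(y_j)) )
I_xy = sum(p_xy[i][j] * math.log2(p_xy[i][j] / (p_x[i] * p_y[j]))
           for i in range(len(p_x)) for j in range(len(p_y))
           if p_xy[i][j] > 0)

print(f"H(X) = {H_x:.4f} bits, I(X;Y) = {I_xy:.4f} bits")
```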
Useful inequality
log r ≤ (r − 1) log e
with equality if and only if r = 1.
This can also be written
ln r ≤ r − 1
If $X$ takes values in $\{x_1, x_2, \ldots, x_L\}$ we have that

$$0 \le H(X) \le \log L$$

with equality to the left if and only if there is an $i$ such that $p_X(x_i) = 1$, and with equality to the right if and only if $p_X(x_i) = 1/L$ for all $i = 1, \ldots, L$.
Information Theory, cont.
Thus we have that $H(X) \ge 0$ with equality if and only if $p_X(x_i)$ is either 0 or 1 for each $i$, but this means that $p_X(x_i) = 1$ for exactly one $i$.
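A short numeric check of the two extreme cases, the degenerate and the uniform distribution (my own illustration):

```python
import math

def entropy(p):
    """H(X) in bits for a pmf given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

L = 4
degenerate = [1.0, 0.0, 0.0, 0.0]        # one outcome is certain
uniform = [1.0 / L] * L                  # all outcomes equally likely

print(entropy(degenerate))               # 0.0 -> equality to the left
print(entropy(uniform), math.log2(L))    # 2.0 2.0 -> equality to the right
```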
Information Theory, cont.
We have that

$$0 \le H(X | Y = y_j) \le \log L$$

The conditional entropy of $X$ given $Y$ is defined as

$$H(X|Y) \triangleq E\{-\log p_{X|Y}\} = -\sum_{i=1}^{L} \sum_{j=1}^{M} p_{XY}(x_i, y_j) \log p_{X|Y}(x_i|y_j)$$

We have that

$$0 \le H(X|Y) \le \log L$$
Information Theory, cont.
$$H(X|Y) = -\sum_{i=1}^{L} \sum_{j=1}^{M} p_Y(y_j) \, p_{X|Y}(x_i|y_j) \log p_{X|Y}(x_i|y_j)$$

$$= -\sum_{j=1}^{M} p_Y(y_j) \sum_{i=1}^{L} p_{X|Y}(x_i|y_j) \log p_{X|Y}(x_i|y_j)$$

$$= \sum_{j=1}^{M} p_Y(y_j) \, H(X | Y = y_j)$$
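The identity is easy to verify numerically; the sketch below reuses the example joint pmf from before (my own example values):

```python
import math

p_xy = [[0.30, 0.10],
        [0.10, 0.50]]
p_y = [sum(col) for col in zip(*p_xy)]

# Direct definition: H(X|Y) = -sum_{i,j} p_XY(x_i,y_j) log2 p_{X|Y}(x_i|y_j)
H_cond = -sum(p_xy[i][j] * math.log2(p_xy[i][j] / p_y[j])
              for i in range(2) for j in range(2) if p_xy[i][j] > 0)

# Weighted average: sum_j p_Y(y_j) H(X | Y = y_j)
H_avg = 0.0
for j in range(2):
    p_x_given_y = [p_xy[i][j] / p_y[j] for i in range(2)]
    H_avg += p_y[j] * -sum(p * math.log2(p) for p in p_x_given_y if p > 0)

print(round(H_cond, 6), round(H_avg, 6))   # the two computations agree
```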
Information Theory, cont.
We have

$$p_{X_1 X_2 \ldots X_N} = p_{X_1} \cdot p_{X_2 | X_1} \cdots p_{X_N | X_1 \ldots X_{N-1}}$$

which leads to the chain rule

$$H(X_1 X_2 \ldots X_N) = H(X_1) + H(X_2 | X_1) + \ldots + H(X_N | X_1 \ldots X_{N-1})$$

The mutual information can be written as

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$
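Both the chain rule and the two expressions for $I(X;Y)$ can be checked with the same kind of example (again, the numbers are my own):

```python
import math

p_xy = [[0.30, 0.10],
        [0.10, 0.50]]

def H(p):
    """Entropy in bits of a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p_x = [sum(row) for row in p_xy]
p_y = [sum(col) for col in zip(*p_xy)]
H_xy = H([p for row in p_xy for p in row])      # joint entropy H(X, Y)

# Chain rule for two variables: H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
H_y_given_x = H_xy - H(p_x)
H_x_given_y = H_xy - H(p_y)

# I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
print(round(H(p_x) - H_x_given_y, 6), round(H(p_y) - H_y_given_x, 6))
```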
Information Theory, cont.
$$H(X|Y) \le H(X)$$

$$I(X;Y) \ge 0$$

$$H(f(X)) \le H(X)$$

$$H(f(X) | X) = 0$$

$$H(X, f(X)) = H(X)$$
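The properties involving a function of $X$ can be illustrated in the same spirit; in the sketch below (my own example) $f$ merges two pairs of outcomes:

```python
import math
from collections import defaultdict

def H(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p_x = [0.5, 0.25, 0.125, 0.125]          # pmf of X over four outcomes
f = [0, 0, 1, 1]                         # f maps outcomes 0,1 -> 0 and 2,3 -> 1

# pmf of f(X): add up the probabilities of outcomes with the same image
p_fx = defaultdict(float)
for pi, y in zip(p_x, f):
    p_fx[y] += pi

print(H(list(p_fx.values())), "<=", H(p_x))     # H(f(X)) <= H(X)

# (X, f(X)) takes the value (x, f(x)) with probability p_X(x), so its
# entropy equals H(X); equivalently H(f(X) | X) = 0.
print(H(p_x))                                    # H(X, f(X)) = H(X)
```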
Entropy for sources
The entropy rate of the state sequence Sn is the same as the entropy rate
of the source