Source Coding Material
Information
A quantitative measure of the amount of information any probabilistic event represents (here, the event is the emission of a single source symbol).
Information is a message (an utterance or expression) or a collection of messages consisting of an ordered sequence of symbols, or the meaning that can be interpreted from a message or collection of messages. (source: wikipedia.org)
$$H(S) = \sum_{i=1}^{q} p_i \log_2 \frac{1}{p_i}$$
Consider the function f(p) = p log (1/p). Use natural logarithms:
f′(p) = (-p log p)′ = -p(1/p) – log p = -1 + log (1/p)
f″(p) = −1/p < 0 for p ∈ (0, 1), so f is concave down
f′(1/e) = 0, f(1/e) = 1/e
[Figure: plot of f(p) = p log(1/p) on (0, 1): maximum value 1/e at p = 1/e, with f(1) = 0, f′(1) = −1, f′(0⁺) = +∞. Plot of ln x: zero at x = 1, with ln x ≤ x − 1.]
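As a quick numerical check of the analysis above, the short Python sketch below (the function and grid names are ours, not from the slides) evaluates f(p) = p ln(1/p) on a grid and confirms that its maximum is about 1/e ≈ 0.368, attained near p = 1/e.

```python
from math import log, e

def f(p):
    """f(p) = p * ln(1/p), the summand analyzed above (natural logarithm)."""
    return p * log(1.0 / p)

# Evaluate on a fine grid over (0, 1) and locate the maximum.
grid = [i / 100000 for i in range(1, 100000)]
p_star = max(grid, key=f)

print(f"maximum of f on grid: {f(p_star):.6f} at p = {p_star:.6f}")
print(f"1/e = {1/e:.6f}  (both the argmax and the maximum value)")
```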
Fundamental Gibbs inequality
Let $\{x_i\}_{i=1}^{q}$ and $\{y_i\}_{i=1}^{q}$ be two probability distributions ($\sum_{i=1}^{q} x_i = 1$, $\sum_{i=1}^{q} y_i = 1$), and consider
$$\sum_{i=1}^{q} x_i \log\frac{y_i}{x_i} \;\le\; \sum_{i=1}^{q} x_i\left(\frac{y_i}{x_i} - 1\right) = \sum_{i=1}^{q} y_i - \sum_{i=1}^{q} x_i = 1 - 1 = 0,$$
with equality only when $x_i = y_i$ for all i.
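A small numerical illustration of the Gibbs inequality (the function names and the sampled distributions below are ours): for any two distributions x and y the sum Σ xᵢ log(yᵢ/xᵢ) is never positive, and it vanishes when x = y.

```python
import random
from math import log

def gibbs(x, y):
    """Return sum_i x_i * log(y_i / x_i); the Gibbs inequality says this is <= 0."""
    return sum(xi * log(yi / xi) for xi, yi in zip(x, y))

def random_distribution(q):
    """A random probability distribution on q symbols."""
    w = [random.random() for _ in range(q)]
    s = sum(w)
    return [wi / s for wi in w]

random.seed(0)
x = random_distribution(5)
y = random_distribution(5)
print(gibbs(x, y) <= 0)          # True for any two distributions
print(abs(gibbs(x, x)) < 1e-12)  # equality when x_i = y_i
```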
Compression: Background
• If the entropy of the original message ensemble is less than the length of the words over the original alphabet, this means that the original encoding is redundant and that the original information may be compressed by a more efficient encoding.
$$\text{Efficiency} = \frac{H(X)}{\bar{L}\,\log_2 D}\cdot 100\%, \qquad \text{for } D = 2:\ \log_2 D = 1 \;\Longrightarrow\; \text{Efficiency} = \frac{H(X)}{\bar{L}}\cdot 100\%$$
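The efficiency formula can be checked in a few lines of Python (the function names and the sample code below are ours): given the symbol probabilities and the codeword lengths of a D-ary code, compute H(X), the average length, and the efficiency.

```python
from math import log2

def entropy(probs):
    """H(X) = sum_i p_i * log2(1/p_i), in bits."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

def efficiency(probs, lengths, D=2):
    """Efficiency = H(X) / (average length * log2 D) * 100%."""
    avg_len = sum(p * n for p, n in zip(probs, lengths))
    return 100.0 * entropy(probs) / (avg_len * log2(D))

# Example: a fixed-length 3-bit code for 8 symbols that are not equally likely.
probs = [0.25, 0.25, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625]
print(efficiency(probs, [3] * 8))   # below 100%, so the fixed-length code is redundant
```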
Compression: Background
• The main idea behind compression is to create a code for which the average length of the encoding vector (word) comes as close as possible to the entropy of the original ensemble of messages (it can never fall below the entropy).
• This means that, in general, the codes used for compression are not of uniform (fixed) length.
Shannon-Fano Encoding
• Sources without memory are sources of information where the probability of the next transmitted symbol (message) does not depend on the previously transmitted symbols (messages).
• Separable codes are those codes for which
the unique decipherability holds.
• Shannon-Fano encoding constructs
reasonably efficient separable binary codes
for sources without memory.
Shannon-Fano Encoding
• Shannon-Fano encoding is the first
established and widely used encoding
method. This method and the
corresponding code were invented
simultaneously and independently of each
other by C. Shannon and R. Fano in 1948.
Shannon-Fano Encoding
• Let us have the ensemble of the original
messages to be transmitted with their
corresponding probabilities:
$$X = \{x_1, x_2, \dots, x_n\}; \qquad P = \{p_1, p_2, \dots, p_n\}$$
• Our task is to associate a sequence Ck of
binary numbers of unspecified length nk to
each message xk such that:
Shannon-Fano Encoding
• No sequences of employed binary numbers
Ck can be obtained from each other by adding
more binary digits to the shorter sequence
(prefix property).
• The transmission of the encoded message is
“reasonably” efficient, that is, 1 and 0 appear
independently and with “almost” equal
probabilities. This ensures transmission of
“almost” 1 bit of information per digit of the
encoded messages.
Shannon-Fano Encoding
• Another important general consideration,
which was taken into account by C.
Shannon and R. Fano, is that (as we have
already considered) a more frequent
message has to be encoded by a shorter
encoding vector (word) and a less frequent
message has to be encoded by a longer
encoding vector (word).
Shannon-Fano Encoding: Algorithm
• The letters (messages) of (over) the input alphabet must be
arranged in order from most probable to least probable.
• Then the initial set of messages must be divided into two
subsets whose total probabilities are as close as possible to
being equal. All symbols then have the first digits of their
codes assigned; symbols in the first set receive "0" and
symbols in the second set receive "1".
• The same process is repeated on those subsets, to determine
successive digits of their codes, as long as any sets with more
than one member remain.
• When a subset has been reduced to one symbol, this means
the symbol's code is complete.
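A minimal Python sketch of the splitting procedure above (the function name `shannon_fano` and the greedy balanced split are our illustration, not code from the slides):

```python
def shannon_fano(probs):
    """Assign Shannon-Fano codewords to a {symbol: probability} dict.
    Symbols are sorted by decreasing probability, then recursively split into two
    groups whose total probabilities are as close to equal as possible."""
    symbols = sorted(probs, key=probs.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return                      # a single symbol: its code is complete
        total = sum(probs[s] for s in group)
        # find the split point that makes the two halves most nearly equal
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i, s in enumerate(group[:-1], start=1):
            running += probs[s]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i], group[best_i:]
        for s in left:
            codes[s] += "0"             # first subset receives "0"
        for s in right:
            codes[s] += "1"             # second subset receives "1"
        split(left)
        split(right)

    split(symbols)
    return codes

print(shannon_fano({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))  # a hypothetical 4-symbol source
```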
Shannon-Fano Encoding: Example
For the messages $x_1, \dots, x_8$ with probabilities $\tfrac14, \tfrac14, \tfrac18, \tfrac18, \tfrac1{16}, \tfrac1{16}, \tfrac1{16}, \tfrac1{16}$, the average code length is
$$\bar{L} = \sum_i P(x_i)\, n_i = \tfrac14\cdot 2\cdot 2 + \tfrac18\cdot 2\cdot 3 + \tfrac1{16}\cdot 4\cdot 4 = 2.75$$
• The Shannon-Fano code gives 100% efficiency
Shannon-Fano Encoding: Example
Message          x1    x2    x3     x4     x5      x6      x7      x8
Probability      0.25  0.25  0.125  0.125  0.0625  0.0625  0.0625  0.0625
Encoding vector  00    01    100    101    1100    1101    1110    1111
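The table can be checked directly in a few lines of Python (our code): the codewords are prefix-free, and the average length equals the entropy, giving the 100% efficiency quoted above.

```python
from math import log2

table = {                          # probabilities and codewords from the table above
    "x1": (0.25, "00"),   "x2": (0.25, "01"),
    "x3": (0.125, "100"), "x4": (0.125, "101"),
    "x5": (0.0625, "1100"), "x6": (0.0625, "1101"),
    "x7": (0.0625, "1110"), "x8": (0.0625, "1111"),
}

codes = [c for _, c in table.values()]
prefix_free = not any(a != b and b.startswith(a) for a in codes for b in codes)

H = sum(p * log2(1.0 / p) for p, _ in table.values())
L = sum(p * len(c) for p, c in table.values())

print(prefix_free)         # True: no codeword is a prefix of another
print(H, L, 100 * H / L)   # 2.75  2.75  100.0
```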
Shannon-Fano Coding
The Shannon-Fano code length for symbol $s_i$ is $l_i = \lceil\log_r(1/p_i)\rceil$, i.e.
$$\log_r\frac{1}{p_i} \;\le\; l_i \;<\; \log_r\frac{1}{p_i} + 1 \quad\Longrightarrow\quad \frac{1}{p_i} \le r^{l_i} < \frac{r}{p_i} \quad\Longrightarrow\quad p_i \ge \frac{1}{r^{l_i}} > \frac{p_i}{r}.$$
Summing this inequality over i:
$$1 = \sum_{i=1}^{q} p_i \;\ge\; \sum_{i=1}^{q}\frac{1}{r^{l_i}} \;>\; \sum_{i=1}^{q}\frac{p_i}{r} = \frac{1}{r}.$$
The Kraft inequality is satisfied, therefore there is an instantaneous code with these lengths.
Also, summing $\log_r\frac{1}{p_i} \le l_i < \log_r\frac{1}{p_i} + 1$ multiplied by $p_i$ over all i:
$$H_r(S) = \sum_{i=1}^{q} p_i\log_r\frac{1}{p_i} \;\le\; \sum_{i=1}^{q} p_i\, l_i = \bar{L} \;<\; H_r(S) + 1.$$
[Figure: binary code tree with branches labelled 0 and 1 for a source with $H_2(S) = 2.5$ and $\bar{L} = 5/2$.]
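The bound can be verified numerically; the sketch below (our code, with a hypothetical example distribution) computes lᵢ = ⌈log_r(1/pᵢ)⌉, the Kraft sum, and checks H_r(S) ≤ L̄ < H_r(S) + 1.

```python
from math import ceil, log

def sf_check(probs, r=2):
    """Shannon-Fano lengths l_i = ceil(log_r(1/p_i)); check the Kraft sum and H_r <= L < H_r + 1."""
    lengths = [ceil(log(1.0 / p, r) - 1e-12) for p in probs]   # epsilon guards floating-point noise
    kraft = sum(r ** -l for l in lengths)
    H = sum(p * log(1.0 / p, r) for p in probs)
    L = sum(p * l for p, l in zip(probs, lengths))
    assert kraft <= 1 + 1e-12 and H - 1e-9 <= L < H + 1        # tolerances for rounding only
    return lengths, kraft, H, L

print(sf_check([0.4, 0.3, 0.2, 0.1]))                                        # a hypothetical source
print(sf_check([0.25, 0.25, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625]))  # the earlier example
```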
Huffman Encoding
• This encoding algorithm was proposed by David A. Huffman in 1952, and it is still a core basic encoding algorithm of lossless compression.
• The Huffman encoding ensures constructing
separable codes (the unique decipherability
property holds) with minimum redundancy
for a set of discrete messages (letters), that
is, this encoding results in an optimum code.
Huffman Encoding: Background
• For an optimum encoding, the longer encoding
vector (word) should correspond to a message
(letter) with lower probability:
$$P(x_1) \ge P(x_2) \ge \dots \ge P(x_N) \;\Longrightarrow\; L(x_1) \le L(x_2) \le \dots \le L(x_N)$$
• For an optimum encoding it is necessary that $L(x_{N-1}) = L(x_N)$,
otherwise the average length of the encoding
vector will be unnecessarily increased.
• It is important to mention that not more than D (D is the number of
letters in the encoding alphabet) encoding vectors could have equal
length (for the binary encoding D=2)
Huffman Encoding: Background
• For an optimum encoding with D = 2 it is necessary that the last two encoding vectors are identical except for their last digits.
• For an optimum encoding it is necessary that each sequence of length $L(x_N) - 1$ digits either must be used as an encoding vector or must have one of its prefixes used as an encoding vector.
Huffman Encoding: Algorithm
• The letters (messages) of (over) the input alphabet must be
arranged in order from most probable to least probable.
• The two least probable messages (the last two messages) are merged into a composite message with a probability equal to the sum of their probabilities. This new message must be inserted into the sequence of the original messages in place of its “parents”, in the position determined by its probability.
• The previous step must be repeated until only one composite message, combining all of the original messages, remains in the sequence.
• The process may be visualized by constructing a binary tree – the Huffman tree.
Huffman Encoding: Algorithm
• The Huffman tree should be constructed as follows: 1) the root of the tree is the message from the last step, with probability 1; 2) its children are the two messages that composed it; 3) step 2 must be repeated until all leaves of the tree are obtained. These leaves are the original messages.
• Sibling nodes on the same level are given the numbers 0 (left) and 1 (right).
• The encoding vector for each message is obtained by following the path from the root’s child to the leaf corresponding to this message and reading off the numbers of the nodes (root’s child → intermediate nodes → leaf) that compose the encoding vector.
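A compact Python sketch of the procedure described on the two algorithm slides (the function name `huffman`, the use of `heapq`, and the assumption that symbols are strings are ours):

```python
import heapq
from itertools import count

def huffman(probs):
    """Minimal Huffman-coding sketch for a {symbol: probability} dict (symbols are strings)."""
    tick = count()                      # tie-breaker so equal probabilities never compare trees
    heap = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)       # two least probable messages
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tick), (left, right)))  # composite message

    codes = {}
    def walk(node, prefix=""):
        if isinstance(node, tuple):             # internal node: left child gets 0, right gets 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"         # single-symbol source edge case
    _, _, root = heap[0]
    walk(root)
    return codes

print(huffman({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))  # codeword lengths reflect the probabilities
```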
Huffman Encoding: Example
• Let us construct the Huffman code for the
following set of messages: x1, x2, x3, x4, x5
with the probabilities p(x1)=…=p(x5)=0.2
• 1) x1 (p=0.2), x2 (p=0.2), x3 (p=0.2), x4 (p=0.2), x5 (p=0.2)
• 2) x4, x5 → x45 (p=0.4) ⇒ x45, x1, x2, x3
• 3) x2, x3 → x23 (p=0.4) ⇒ x45, x23, x1
• 4) x1, x23 → x123 (p=0.6) ⇒ x123, x45
• 5) x123, x45 → x12345 (p=1)
Huffman Encoding: Example
[Huffman tree: x12345 splits into x123 (branch 0) and x45 (branch 1); x123 splits into x1 (0) and x23 (1); x23 splits into x2 (0) and x3 (1); x45 splits into x4 (0) and x5 (1). Resulting codes: x1 = 00, x2 = 010, x3 = 011, x4 = 10, x5 = 11.]
Huffman Encoding: Example
• Entropy: $H(X) = -5\cdot 0.2\log_2 0.2 = 5\cdot\tfrac15\log_2 5 = \log_2 5 \approx 2.32$
• Average length of the encoding vector: $\bar{L} = \tfrac{2}{5}\cdot 3 + \tfrac{3}{5}\cdot 2 = \tfrac{12}{5} = 2.4$
• The Huffman code gives $(2.32/2.4)\cdot 100\% \approx 97\%$ efficiency
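Using the codewords read off the Huffman tree above, the numbers on this slide can be reproduced with a few lines of Python (our code):

```python
from math import log2

probs = {"x1": 0.2, "x2": 0.2, "x3": 0.2, "x4": 0.2, "x5": 0.2}
codes = {"x1": "00", "x2": "010", "x3": "011", "x4": "10", "x5": "11"}  # from the Huffman tree

H = sum(p * log2(1.0 / p) for p in probs.values())      # log2(5) ~ 2.32
L = sum(probs[s] * len(c) for s, c in codes.items())    # 12/5 = 2.4
print(H, L, 100 * H / L)                                # efficiency ~ 96.7%
```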
The Entropy of Code Extensions
The symbols of the nth extension $T = S^n$ are concatenations of n source symbols; their probabilities are the corresponding products (multiplication) of the individual symbol probabilities, $Q_i = p_{i_1} p_{i_2}\cdots p_{i_n}$.
The entropy is [letting $i = (i_1, \dots, i_n)_q$, an n-digit number base q]:
$$H(T) = H(S^n) = \sum_{i=1}^{q^n} Q_i\log\frac{1}{Q_i} = \sum_{i=1}^{q^n} Q_i\log\frac{1}{p_{i_1}\cdots p_{i_n}} = \sum_{i=1}^{q^n} Q_i\log\frac{1}{p_{i_1}} + \cdots + \sum_{i=1}^{q^n} Q_i\log\frac{1}{p_{i_n}}.$$
qn qn
1 1
Consider the kth term Qi log pi1 pi n log
i 1 pi k i 1 pi k
q q q q q
1 1
i 1 1
pi1 pi n log
i n 1
i k pi1 pi k pi n pi k log
ˆ
pi k i1 1 i n 1
ˆ
i k 1 pi k
q q
i 1 1
iˆk pi1 pˆ i k pi n H (S ) H (S )
i n 1
pi1 pˆ i k pi n is just a probability in the (n 1)st
extention, and adding them all up gives 1.
H(Sn) = n∙H(S)
Hence the average S-F code length Ln for T satisfies:
H(T) Ln < H(T) + 1 n ∙ H(S) Ln < n ∙ H(S) + 1
H(S) (Ln/n) < H(S) + 1/n
6.8
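The identity H(Sⁿ) = n·H(S) is easy to confirm numerically; the helper functions below are ours, and the source p₁ = 2/3, p₂ = 1/3 anticipates the example that follows.

```python
from itertools import product
from math import log2, prod

def entropy(probs):
    """H = sum_i p_i * log2(1/p_i), in bits."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

def extension(probs, n):
    """Probabilities of the nth extension S^n: products over all n-tuples of symbols."""
    return [prod(tup) for tup in product(probs, repeat=n)]

probs = [2/3, 1/3]
for n in (1, 2, 3):
    print(entropy(extension(probs, n)), n * entropy(probs))   # the two values agree (up to rounding)
```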
Extension Example
S = {s1, s2}, p1 = 2/3, p2 = 1/3, H2(S) = (2/3)log2(3/2) + (1/3)log2(3/1) ≈ 0.9182958…
Huffman coding: s1 = 0 s2 = 1 Avg. coded length = (2/3)∙1+(1/3)∙1 = 1
Shannon-Fano: l1 = 1 l2 = 2 Avg. coded length = (2/3)∙1+(1/3)∙2 = 4/3
2nd extension: p11 = 4/9, p12 = p21 = 2/9, p22 = 1/9. S-F lengths:
l11 = ⌈log2(9/4)⌉ = 2, l12 = l21 = ⌈log2(9/2)⌉ = 3, l22 = ⌈log2(9/1)⌉ = 4
LSF(2) = avg. coded length = (4/9)·2 + (2/9)·3·2 + (1/9)·4 = 24/9 = 2.666…
Sn = (s1 + s2)n, whose probabilities are corresponding terms in (p1 + p2)n
$$\binom{n}{i}p_1^{\,i}\,p_2^{\,n-i} = \binom{n}{i}\frac{2^i}{3^n},$$
so there are $\binom{n}{i}$ symbols with probability $\dfrac{2^i}{3^n}$.
The corresponding S-F length is $\left\lceil\log_2\dfrac{3^n}{2^i}\right\rceil = \bigl\lceil n\log_2 3 - i\bigr\rceil = \lceil n\log_2 3\rceil - i.$
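A quick numerical check of the second-extension numbers above (our code; lengths use lᵢ = ⌈log₂(1/pᵢ)⌉):

```python
from itertools import product
from math import ceil, log2, prod

p = [2/3, 1/3]
pairs = [prod(t) for t in product(p, repeat=2)]          # 4/9, 2/9, 2/9, 1/9
lengths = [ceil(log2(1.0 / q) - 1e-12) for q in pairs]   # 2, 3, 3, 4
L2 = sum(q * l for q, l in zip(pairs, lengths))
print(lengths, L2, L2 / 2)    # [2, 3, 3, 4]  24/9 = 2.666...  1.333... per source symbol
```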
Extension cont.
$$L_{SF}^{(n)} = \sum_{i=0}^{n}\binom{n}{i}\frac{2^i}{3^n}\bigl(\lceil n\log_2 3\rceil - i\bigr) = \frac{1}{3^n}\left[\lceil n\log_2 3\rceil\sum_{i=0}^{n}\binom{n}{i}2^i - \sum_{i=0}^{n}\binom{n}{i}\,i\,2^i\right] = \lceil n\log_2 3\rceil - \frac{2n\,3^{n-1}}{3^n} = \lceil n\log_2 3\rceil - \frac{2n}{3},$$
using $(2+1)^n = 3^n$ and $\sum_{i=0}^{n}\binom{n}{i}\,i\,2^i = 2n\,3^{n-1}$ (*).
Hence
$$\frac{L_{SF}^{(n)}}{n} \;\longrightarrow\; \log_2 3 - \frac{2}{3} = H_2(S) \quad\text{as } n\to\infty.$$
Proof of (*): differentiate $(2+x)^n = \sum_{i=0}^{n}\binom{n}{i}2^i x^{n-i}$ with respect to x:
$$n(2+x)^{n-1} = \sum_{i=0}^{n}\binom{n}{i}2^i(n-i)x^{n-i-1};$$
at $x = 1$:
$$n\,3^{n-1} = \sum_{i=0}^{n}\binom{n}{i}2^i(n-i) = n\sum_{i=0}^{n}\binom{n}{i}2^i - \sum_{i=0}^{n}\binom{n}{i}\,i\,2^i = n\,3^n - \sum_{i=0}^{n}\binom{n}{i}\,i\,2^i,$$
so $\sum_{i=0}^{n}\binom{n}{i}\,i\,2^i = n\,3^n - n\,3^{n-1} = 2n\,3^{n-1}$.
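The limit can also be observed numerically. The sketch below (our code) evaluates the average S-F length of the nth extension directly and via the closed form ⌈n log₂ 3⌉ − 2n/3 derived above, then divides by n; both columns drift toward H₂(S) ≈ 0.918.

```python
from math import ceil, comb, log2

def lsf(n):
    """Average S-F length of the nth extension of the source p1 = 2/3, p2 = 1/3 (direct sum)."""
    return sum(comb(n, i) * (2 ** i / 3 ** n) * (ceil(n * log2(3)) - i)
               for i in range(n + 1))

H = (2/3) * log2(3/2) + (1/3) * log2(3)          # H2(S) ~ 0.9182958
for n in (1, 2, 5, 20, 100):
    closed = ceil(n * log2(3)) - 2 * n / 3       # closed form from the derivation
    print(n, lsf(n) / n, closed / n, H)          # both per-symbol lengths approach H2(S)
```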
Markov Process Entropy
$p(s_i \mid s_{i_1}\cdots s_{i_m})$ = conditional probability that $s_i$ follows $s_{i_1}\cdots s_{i_m}$.
For an mth order process, think of letting the state $s = (s_{i_1}, \dots, s_{i_m})$.
Hence $I(s_i \mid s) = \log\dfrac{1}{p(s_i\mid s)}$, and so
$$H(S\mid s) = \sum_{s_i\in S} p(s_i\mid s)\,I(s_i\mid s),$$
$$H(S) = \sum_{s\in S^m} p(s)\,H(S\mid s) = \sum_{s\in S^m}\sum_{s_i\in S} p(s)\,p(s_i\mid s)\,I(s_i\mid s) = \sum_{s\in S^m}\sum_{s_i\in S} p(s, s_i)\,I(s_i\mid s) = \sum_{s,\,s_i\,\in\, S^{m+1}} p(s, s_i)\log\frac{1}{p(s_i\mid s)}.$$
Example: a second-order binary Markov process (states are pairs of previous symbols), with transition probabilities 0.8, 0.2, and 0.5.
[Figure: state diagram (previous state → next state) starting from state (0, 0); table with columns $s_{i_1}$, $s_{i_2}$, $s_i$, $p(s_i\mid s_{i_1}, s_{i_2})$, $p(s_{i_1}, s_{i_2})$, $p(s_{i_1}, s_{i_2}, s_i)$.]
$$H = 2\cdot\frac{4}{14}\log_2\frac{1}{0.8} + 2\cdot\frac{1}{14}\log_2\frac{1}{0.2} + 4\cdot\frac{1}{14}\log_2\frac{1}{0.5} \approx 0.801377$$
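The slides give only the transition probabilities 0.8, 0.2, 0.5 and the result 0.801377. The sketch below (our code) assumes the standard symmetric assignment p(0|0,0) = p(1|1,1) = 0.8, p(1|0,0) = p(0|1,1) = 0.2, and 0.5 from the mixed states; it finds the equilibrium state probabilities by power iteration and reproduces the entropy.

```python
from math import log2

# Assumed transition probabilities p(next symbol | previous two symbols):
p_next = {
    (0, 0): {0: 0.8, 1: 0.2},
    (1, 1): {1: 0.8, 0: 0.2},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.5, 1: 0.5},
}

# Equilibrium probabilities of the states (symbol pairs), by power iteration.
states = list(p_next)
pi = {s: 1 / len(states) for s in states}
for _ in range(1000):
    new = {s: 0.0 for s in states}
    for (a, b), dist in p_next.items():
        for c, p in dist.items():
            new[(b, c)] += pi[(a, b)] * p      # state (a, b) -> state (b, c) on symbol c
    pi = new

# H = sum over (state, symbol) of p(state, symbol) * log2(1 / p(symbol | state))
H = sum(pi[s] * p * log2(1.0 / p) for s, dist in p_next.items() for p in dist.values())
print(pi)   # p(0,0) = p(1,1) = 5/14, p(0,1) = p(1,0) = 2/14
print(H)    # ~ 0.801377
```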
Base Fibonacci
The golden ratio r = (1 + √5)/2 is a solution to x² − x − 1 = 0 and is equal to the limit of the ratio of adjacent Fibonacci numbers.
[Figure: a first-order Markov process on the states 0 and 1, with transition probabilities written in terms of 1/r and 1/2; the source can be thought of as emitting the variable-length symbols 0 and 10, and its entropy is H₂ = log₂ r.]
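A tiny numerical illustration of the statement above (our code): ratios of adjacent Fibonacci numbers converge to r = (1 + √5)/2, and log₂ r gives the entropy value quoted in the figure.

```python
from math import log2, sqrt

r = (1 + sqrt(5)) / 2            # golden ratio, root of x^2 - x - 1 = 0

a, b = 1, 1                      # adjacent Fibonacci numbers
for _ in range(30):
    a, b = b, a + b
print(b / a, r)                  # the ratio of adjacent Fibonacci numbers tends to r
print(log2(r))                   # H2 = log2 r ~ 0.694
```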