Source Coding Material


SOURCE CODING

Information
A quantitative measure of the amount of information any probabilistic event
represents.

(single source symbol)

Axioms: I(p) = the amount of information in the occurrence of an event of probability p
A. I(p) ≥ 0 for any event of probability p
B. I(p1∙p2) = I(p1) + I(p2) when p1 and p2 are independent events
C. I(p) is a continuous function of p (the Cauchy functional equation)
Existence: I(p) = log(1/p)
Units of information: in base 2 = a bit; in base e = a nat; in base 10 = a Hartley
Translated by Mujur
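To make the units above concrete, here is a minimal Python sketch (added as an illustration, not part of the original slides; the function name information is my own choice):

```python
import math

def information(p, base=2):
    """I(p) = log_base(1/p), the information of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return math.log(1 / p, base)

p1, p2 = 0.5, 0.25
print(information(p1))           # 1.0 bit
print(information(p1 * p2))      # 3.0 bits = I(p1) + I(p2), axiom B
print(information(p1, math.e))   # ~0.693 nats
print(information(p1, 10))       # ~0.301 Hartleys
```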

Information
A quantitative measure of the amount of information any probabilistic event represents.

Definition of information in the specific sense:
Information is a message (an utterance or expression) or a collection of messages consisting of an ordered sequence of symbols, or the meaning that can be interpreted from a message or a collection of messages. (source: wikipedia.org)

Definition of information in the general sense:


Entropy
The average amount of information received on a per-symbol basis from a source S = {s1, …, sq} of symbols, where si has probability pi. It measures the rate.
For radix r, the entropy is the weighted arithmetic mean of the information, which is also the log of the weighted geometric mean:

  H_r(S) = Σ_{i=1}^{q} pi·log_r(1/pi) = Σ_{i=1}^{q} log_r(1/pi)^pi = log_r Π_{i=1}^{q} (1/pi)^pi

when all the probabilities are independent.
• Entropy is the amount of information in the probability distribution.
Alternative approach: consider a long message of N symbols from S = {s1, …, sq} with probabilities p1, …, pq. You expect si to appear N·pi times, and the probability of this typical message is

  P = Π_{i=1}^{q} pi^(N·pi),  whose information is  log(1/P) = N·Σ_{i=1}^{q} pi·log(1/pi) = N·H(S).
Consider the function f(p) = p·log(1/p). Using natural logarithms:
  f′(p) = (−p·log p)′ = −p·(1/p) − log p = −1 + log(1/p)
  f″(p) = −1/p < 0 for p ∈ (0,1)  ⇒  f is concave down

[Plot of f(p) on (0,1): maximum at p = 1/e, where f′(1/e) = 0 and f(1/e) = 1/e; f(1) = 0, f′(1) = −1, and f′(0⁺) = +∞.]

  lim_{p→0⁺} f(p) = lim_{p→0⁺} log(1/p)/(1/p) = lim_{p→0⁺} (−1/p)/(−1/p²) = lim_{p→0⁺} p = 0   (by L'Hôpital's rule)
Gibbs Inequality
Basic information about log function:
The tangent line to y = ln x at x = 1 is
  (y − ln 1) = (ln x)′|_{x=1}·(x − 1)  ⇒  y = x − 1
  (ln x)″ = (1/x)′ = −1/x² < 0 for all x > 0
so ln x is concave down. Therefore ln x ≤ x − 1, with equality only at x = 1.

[Figure: graphs of y = x − 1 and y = ln x; the line is tangent to the curve at x = 1.]
Fundamental Gibbs inequality
Let Σ_{i=1}^{q} xi = 1 and Σ_{i=1}^{q} yi = 1 be two probability distributions, and consider

  Σ_{i=1}^{q} xi·log(yi/xi) ≤ Σ_{i=1}^{q} xi·(yi/xi − 1) = Σ_{i=1}^{q} (yi − xi) = Σ_{i=1}^{q} yi − Σ_{i=1}^{q} xi = 1 − 1 = 0,

with equality only when xi = yi for all i.

• Minimum entropy occurs when one pi = 1 and all the others are 0.
• Maximum entropy occurs when? Consider Gibbs with the distribution yi = 1/q:

  H(S) − log q = Σ_{i=1}^{q} pi·log(1/pi) − log q·Σ_{i=1}^{q} pi = Σ_{i=1}^{q} pi·log(1/(q·pi)) ≤ 0.

Hence H(S) ≤ log q. Equality occurs only when pi = 1/q.
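The bounds just derived (0 ≤ H(S) ≤ log q) can be checked numerically; a small sketch (mine, not from the slides) for q = 4 symbols, where log2 q = 2:

```python
import math

def entropy2(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

for probs in ([1.0, 0.0, 0.0, 0.0],       # one p_i = 1: minimum entropy 0
              [0.7, 0.1, 0.1, 0.1],       # skewed: strictly below log2(4)
              [0.25, 0.25, 0.25, 0.25]):  # uniform p_i = 1/q: maximum log2(4) = 2
    print(entropy2(probs), "<=", math.log2(4))
```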
Entropy Examples

S = {s1} p1 = 1 H(S) = 0 (no information)


S = {s1,s2} p1 = p2 = ½ H2(S) = 1 (1 bit per symbol)
S = {s1, …, sr} p1 = … = pr = 1/r Hr(S) = 1 but H2(S) = log2r.

Run length coding (for instance, in predictive coding) (binary)

p = 1 − q is the probability of a 0;  H2(S) = p log2(1/p) + q log2(1/q)

As q → 0 the term q log2(1/q) dominates (compare slopes).
1/q = average run length; log2(1/q) = # of bits needed (on average);
q log2(1/q) = average # of bits of information per bit of original code.
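A sketch (not from the slides) that tabulates these quantities for a few values of q, showing how q·log2(1/q) dominates as q → 0:

```python
import math

def h2(q):
    """Entropy of a binary source with P(1) = q and P(0) = 1 - q."""
    p = 1 - q
    return p * math.log2(1 / p) + q * math.log2(1 / q)

for q in (0.5, 0.1, 0.01, 0.001):
    print(f"q = {q}: H2 = {h2(q):.4f} bits per original bit, "
          f"average run length 1/q = {1/q:.0f}, "
          f"q*log2(1/q) = {q * math.log2(1/q):.4f}")
```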
Entropy as a Lower Bound for Average Code Length

Given an instantaneous code with lengths li in radix r, let

  K = Σ_{i=1}^{q} 1/r^li ≤ 1 ;   Qi = r^(−li)/K ;   Σ_{i=1}^{q} Qi = 1.

So by Gibbs, Σ_{i=1}^{q} pi·log_r(Qi/pi) ≤ 0. Applying log(Qi/pi) = log(1/pi) − log(1/Qi):

  H_r(S) = Σ_{i=1}^{q} pi·log_r(1/pi) ≤ Σ_{i=1}^{q} pi·log_r(1/Qi) = Σ_{i=1}^{q} pi·(log_r K + li·log_r r)
         = log_r K + Σ_{i=1}^{q} pi·li.   Since K ≤ 1, log_r K ≤ 0, and hence H_r(S) ≤ L.

By the McMillan inequality, this holds for all uniquely decodable codes. Equality occurs when K = 1 (the decoding tree is complete) and pi = r^(−li).
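A numeric check of H_r(S) ≤ L (my own example: a binary instantaneous code with codewords 0, 10, 110, 111 and probabilities chosen so that pi = 2^(−li), giving equality):

```python
import math

probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]                        # codewords 0, 10, 110, 111

K = sum(2 ** -l for l in lengths)             # Kraft sum
H = sum(p * math.log2(1 / p) for p in probs)  # entropy, bits/symbol
L = sum(p * l for p, l in zip(probs, lengths))

print(K)      # 1.0 -> complete decoding tree
print(H, L)   # 1.75 1.75 -> H = L since p_i = 2^-l_i
```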
Compression: Background
• If the entropy of the original message ensemble is less than the length of the codewords over the original alphabet, the original encoding is redundant and the original information may be compressed by a more efficient encoding.

  Efficiency = H(X)/(L·log2 D) × 100%;  for D = 2, log2 D = 1, so Efficiency = H(X)/L × 100%.
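A tiny helper for the efficiency formula above (an illustration; the function name is mine). The two calls use the Shannon-Fano and Huffman example figures that appear later in these slides:

```python
import math

def efficiency(H, L, D=2):
    """Coding efficiency H(X) / (L * log2(D)) as a percentage."""
    return H / (L * math.log2(D)) * 100

print(efficiency(2.75, 2.75))   # 100.0  (Shannon-Fano example)
print(efficiency(2.32, 2.4))    # ~96.7, quoted as ~97% (Huffman example)
```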
Compression: Background
• The main idea behind compression is to create a code for which the average length of the encoding vector (word) comes as close as possible to the entropy of the original ensemble of messages.
• This means that, in general, the codes used for compression are not of uniform (fixed) length.
Shannon-Fano Encoding
• Sources without memory (memoryless sources) are sources of information where the probability of the next transmitted symbol (message) does not depend on the previously transmitted symbols (messages).
• Separable codes are those codes for which
the unique decipherability holds.
• Shannon-Fano encoding constructs
reasonably efficient separable binary codes
for sources without memory.

Shannon-Fano Encoding
• Shannon-Fano encoding is the first
established and widely used encoding
method. This method and the
corresponding code were invented
simultaneously and independently of each
other by C. Shannon and R. Fano in 1948.

Shannon-Fano Encoding
• Let us have the ensemble of the original messages to be transmitted with their corresponding probabilities:

  {X} = {x1, x2, …, xn};   {P} = {p1, p2, …, pn}

• Our task is to associate a sequence Ck of binary numbers of unspecified length nk to each message xk such that:
Shannon-Fano Encoding
• No sequence of employed binary numbers Ck can be obtained from another one by adding more binary digits to the shorter sequence (prefix property).
• The transmission of the encoded message is
“reasonably” efficient, that is, 1 and 0 appear
independently and with “almost” equal
probabilities. This ensures transmission of
“almost” 1 bit of information per digit of the
encoded messages.
Shannon-Fano Encoding
• Another important general consideration,
which was taken into account by C.
Shannon and R. Fano, is that (as we have
already considered) a more frequent
message has to be encoded by a shorter
encoding vector (word) and a less frequent
message has to be encoded by a longer
encoding vector (word).

Shannon-Fano Encoding:
Algorithm
• The letters (messages) of (over) the input alphabet must be
arranged in order from most probable to least probable.
• Then the initial set of messages must be divided into two
subsets whose total probabilities are as close as possible to
being equal. All symbols then have the first digits of their
codes assigned; symbols in the first set receive "0" and
symbols in the second set receive "1".
• The same process is repeated on those subsets, to determine
successive digits of their codes, as long as any sets with more
than one member remain.
• When a subset has been reduced to one symbol, this means
the symbol's code is complete.

Shannon-Fano Encoding: Example
Message      x1    x2    x3     x4     x5      x6      x7      x8
Probability  0.25  0.25  0.125  0.125  0.0625  0.0625  0.0625  0.0625

Successive partitions into subsets of (as nearly as possible) equal total probability:
  {x1,x2,x3,x4,x5,x6,x7,x8}
  0: {x1,x2}                      1: {x3,x4,x5,x6,x7,x8}
  00: {x1}   01: {x2}             10: {x3,x4}            11: {x5,x6,x7,x8}
                                  100: {x3}  101: {x4}   110: {x5,x6}   111: {x7,x8}
                                                         1100: {x5}  1101: {x6}  1110: {x7}  1111: {x8}
Shannon-Fano Encoding: Example
Message          x1    x2    x3     x4     x5      x6      x7      x8
Probability      0.25  0.25  0.125  0.125  0.0625  0.0625  0.0625  0.0625
Encoding vector  00    01    100    101    1100    1101    1110    1111

• Entropy
  H = 2·(1/4)·log2 4 + 2·(1/8)·log2 8 + 4·(1/16)·log2 16 = 2.75
• Average length of the encoding vector
  L = Σ P(xi)·ni = 2·(1/4)·2 + 2·(1/8)·3 + 4·(1/16)·4 = 2.75
• The Shannon-Fano code gives 100% efficiency
Shannon-Fano Encoding: Example
Message          x1    x2    x3     x4     x5      x6      x7      x8
Probability      0.25  0.25  0.125  0.125  0.0625  0.0625  0.0625  0.0625
Encoding vector  00    01    100    101    1100    1101    1110    1111

• The Shannon-Fano code gives 100% efficiency. Since the average length of the encoding vector for this code is 2.75 bits, it gives a compression of 0.25 bits/symbol relative to the direct uniform binary encoding (3 bits/symbol), which is redundant.
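The partitioning algorithm described above can be sketched in a few lines of Python (this implementation is mine, not code from the slides; where several split points give equally balanced halves it simply keeps the first one). On the eight-message example it reproduces the codes in the table and the 2.75 bits/symbol average length:

```python
def shannon_fano(symbol_probs):
    """Shannon-Fano codes by recursive splitting into subsets of
    (as nearly as possible) equal total probability.
    symbol_probs: list of (symbol, probability) pairs."""
    items = sorted(symbol_probs, key=lambda sp: sp[1], reverse=True)
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in group)
        running, cut, best_diff = 0.0, 1, float("inf")
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(2 * running - total)       # |first half - second half|
            if diff < best_diff:
                best_diff, cut = diff, k
        split(group[:cut], prefix + "0")          # first subset receives "0"
        split(group[cut:], prefix + "1")          # second subset receives "1"

    split(items, "")
    return codes

probs = {"x1": 0.25, "x2": 0.25, "x3": 0.125, "x4": 0.125,
         "x5": 0.0625, "x6": 0.0625, "x7": 0.0625, "x8": 0.0625}
codes = shannon_fano(list(probs.items()))
print(codes)                                              # x1 -> 00, x2 -> 01, ..., x8 -> 1111
print(sum(probs[s] * len(c) for s, c in codes.items()))   # 2.75 bits/symbol
```

Because ties in the split are broken arbitrarily, other equally balanced partitions are possible; for this distribution they would give different codewords but the same code lengths.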
Shannon-Fano Encoding: Properties

• The Shannon-Fano encoding is the most efficient when the probability of occurrence of each message (letter) xk is of the form

  P(xk) = 2^(−nk),  where nk is an integer and 2^(−n1) + 2^(−n2) + … + 2^(−nN) = 1.

• The prefix property always holds and

  L = Σ_{k=1}^{N} P(xk)·nk = Σ_{k=1}^{N} P(xk)·(−log2 P(xk)) = Σ_{k=1}^{N} P(xk)·I(xk) = H(X),

  so the efficiency is 100%.
Shannon-Fano Encoding: Properties
• It should be taken into account that the
Shannon-Fano code is not unique because
it depends on the partitioning of the input set
of messages, which, in turn, is not unique.
• If the successive equiprobable partitioning is
not possible at all, the Shannon-Fano code
may not be an optimum code, that is, a code
that leads to the lowest possible average
length of the encoding vector for a given D.

Shannon-Fano Coding

The simplest variable-length method. Less efficient than Huffman, but it allows one to code symbol si with length li directly from pi.
Given source symbols s1, …, sq with probabilities p1, …, pq, pick li = ⌈log_r(1/pi)⌉. Hence,

  log_r(1/pi) ≤ li < log_r(1/pi) + 1  ⇒  1/pi ≤ r^li < r/pi  ⇒  pi ≥ r^(−li) > pi/r.

Summing this inequality over i:

  1 = Σ_{i=1}^{q} pi ≥ Σ_{i=1}^{q} r^(−li) = K > Σ_{i=1}^{q} pi/r = 1/r.

The Kraft inequality is satisfied, therefore there is an instantaneous code with these lengths.
Also, summing log_r(1/pi) ≤ li < log_r(1/pi) + 1 multiplied by pi gives

  H_r(S) = Σ_{i=1}^{q} pi·log_r(1/pi) ≤ Σ_{i=1}^{q} pi·li = L < H_r(S) + 1.

Example: p's: ¼, ¼, ⅛, ⅛, ⅛, ⅛;  l's: 2, 2, 3, 3, 3, 3;  K = 1;  H2(S) = 2.5;  L = 5/2.

[Decoding tree: a complete binary tree with two leaves at depth 2 and four at depth 3, e.g. codewords 00, 01, 100, 101, 110, 111.]
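A sketch (mine) of the length rule li = ⌈log2(1/pi)⌉ applied to the example just given, checking the Kraft sum and the bound H2(S) ≤ L < H2(S) + 1:

```python
import math

probs = [1/4, 1/4, 1/8, 1/8, 1/8, 1/8]
lengths = [math.ceil(math.log2(1 / p)) for p in probs]

K = sum(2 ** -l for l in lengths)
H = sum(p * math.log2(1 / p) for p in probs)
L = sum(p * l for p, l in zip(probs, lengths))

print(lengths)   # [2, 2, 3, 3, 3, 3]
print(K)         # 1.0 -> Kraft satisfied, an instantaneous code exists
print(H, L)      # 2.5 2.5 -> H <= L < H + 1 (with equality here)
```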
Huffman Encoding
• This encoding algorithm was proposed by David A. Huffman in 1952, and it is still one of the main basic algorithms of lossless compression.
• The Huffman encoding ensures constructing
separable codes (the unique decipherability
property holds) with minimum redundancy
for a set of discrete messages (letters), that
is, this encoding results in an optimum code.
Huffman Encoding: Background
• For an optimum encoding, the longer encoding vector (word) should correspond to a message (letter) with lower probability:

  P(x1) ≥ P(x2) ≥ … ≥ P(xN)  ⇒  L(x1) ≤ L(x2) ≤ … ≤ L(xN)

• For an optimum encoding it is necessary that

  L(xN−1) = L(xN),

  otherwise the average length of the encoding vector will be unnecessarily increased.
• It is important to mention that not more than D (D is the number of letters in the encoding alphabet) encoding vectors could have equal length (for binary encoding D = 2).
Huffman Encoding: Background
• For an optimum encoding with D = 2 it is necessary that the last two encoding vectors are identical except for their last digits.
• For an optimum encoding it is necessary that each sequence of length L(xN) − 1 digits either must be used as an encoding vector or must have one of its prefixes used as an encoding vector.
Huffman Encoding: Algorithm
• The letters (messages) of (over) the input alphabet must be
arranged in order from most probable to least probable.
• Two least probable messages (the last two messages) are
merged into the composite message with a probability equal
to the sum of their probabilities. This new message must be
inserted into the sequence of the original messages instead of
its “parents”, accordingly with its probability.
• The previous step must be repeated until the last two remaining messages are merged into a single message, which will be the only member of the message sequence.
• The process may be carried out by constructing a binary tree –
the Huffman tree.

Huffman Encoding: Algorithm
• The Huffman tree should be constructed as follows:
1) The root of the tree is the message from the last step,
with probability 1; 2) Its children are the two messages
that composed that last message; 3) Step 2
must be repeated until all leaves of the tree have been
obtained. These leaves are the original messages.
• The sibling nodes at the same level are given the
numbers 0 (left) and 1 (right).
• The encoding vector for each message is obtained by
following the path from the root's child to the leaf
corresponding to this message and reading the
numbers of the nodes (root's child → intermediate nodes → leaf)
that compose the encoding vector.

Huffman Encoding: Example
• Let us construct the Huffman code for the
following set of messages: x1, x2, x3, x4, x5
with the probabilities p(x1)=…=p(x5)=0.2
• 1) x1 (p=0.2), x2 (p=0.2), x3 (p=0.2), x4 (p=0.2), x5 (p=0.2)
• 2) x4,x5 → x45 (p=0.4)  ⇒  x45, x1, x2, x3
• 3) x2,x3 → x23 (p=0.4)  ⇒  x45, x23, x1
• 4) x1,x23 → x123 (p=0.6)  ⇒  x123, x45
• 5) x123,x45 → x12345 (p=1)

Huffman Encoding: Example

[Huffman tree: x12345 splits into x123 (0) and x45 (1); x123 splits into x1 (00) and x23 (01); x23 splits into x2 (010) and x3 (011); x45 splits into x4 (10) and x5 (11).]

Encoding vectors: x1(00); x2(010); x3(011); x4(10); x5(11)

Huffman Encoding: Example
• Entropy
  H(X) = −5·(0.2·log2 0.2) = −log2(1/5) = log2 5 ≈ 2.32
• Average length of the encoding vector
  L = 3·(1/5)·2 + 2·(1/5)·3 = 12/5 = 2.4
• The Huffman code gives (2.32/2.4)·100% ≈ 97% efficiency
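A compact Python sketch of the merging procedure from the algorithm slides (my own implementation, not code from the source; ties between equal probabilities are broken by insertion order, so individual codewords can differ from the slide's while the code lengths and the 2.4 bits/symbol average remain optimal):

```python
import heapq
import itertools

def huffman(probs):
    """Binary Huffman code for a {symbol: probability} dictionary."""
    counter = itertools.count()        # tie-breaker so heap tuples always compare
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)            # the two least probable
        p1, _, c1 = heapq.heappop(heap)            # messages are merged ...
        merged = {s: "0" + code for s, code in c0.items()}
        merged.update({s: "1" + code for s, code in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))   # ... and reinserted
    return heap[0][2]

probs = {f"x{i}": 0.2 for i in range(1, 6)}
codes = huffman(probs)
print(codes)                                               # e.g. three 2-bit and two 3-bit codewords
print(sum(probs[s] * len(c) for s, c in codes.items()))    # 2.4 bits/symbol
```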
The Entropy of Code Extensions

Recall: The nth extension of a source S = {s1, …, sq} with probabilities p1, …, pq is the set of symbols

  T = S^n = { s_i1 ··· s_in | s_ij ∈ S, 1 ≤ j ≤ n },

where concatenation plays the role of multiplication. The symbol ti = s_i1 ··· s_in has probability p_i1 ··· p_in = Qi, assuming independent probabilities.
The entropy is [letting i = (i1, …, in)_q, an n-digit number base q]:

  H(S^n) = H(T) = Σ_{i=1}^{q^n} Qi·log(1/Qi) = Σ_{i=1}^{q^n} Qi·log(1/(p_i1 ··· p_in))
         = Σ_{i=1}^{q^n} Qi·[log(1/p_i1) + ··· + log(1/p_in)]
         = Σ_{i=1}^{q^n} Qi·log(1/p_i1) + ··· + Σ_{i=1}^{q^n} Qi·log(1/p_in).
Consider the kth term:

  Σ_{i=1}^{q^n} Qi·log(1/p_ik) = Σ_{i1=1}^{q} ··· Σ_{in=1}^{q} p_i1 ··· p_in·log(1/p_ik)
    = Σ_{ik=1}^{q} p_ik·log(1/p_ik) · Σ p_i1 ··· p̂_ik ··· p_in = H(S),

since p_i1 ··· p̂_ik ··· p_in (the factor p_ik omitted) is just a probability in the (n − 1)st extension, and adding them all up gives 1.

  ⇒ H(S^n) = n·H(S)

Hence the average S-F code length Ln for T satisfies:

  H(T) ≤ Ln < H(T) + 1  ⇒  n·H(S) ≤ Ln < n·H(S) + 1  ⇒  H(S) ≤ Ln/n < H(S) + 1/n
Extension Example
S = {s1, s2},  p1 = 2/3,  p2 = 1/3.  H2(S) = (2/3)·log2(3/2) + (1/3)·log2(3/1) ≈ 0.9182958…
Huffman coding: s1 = 0, s2 = 1.  Avg. coded length = (2/3)·1 + (1/3)·1 = 1.
Shannon-Fano: l1 = 1, l2 = 2.  Avg. coded length = (2/3)·1 + (1/3)·2 = 4/3.
2nd extension: p11 = 4/9, p12 = 2/9 = p21, p22 = 1/9.  S-F:
  l11 = ⌈log2(9/4)⌉ = 2,  l12 = l21 = ⌈log2(9/2)⌉ = 3,  l22 = ⌈log2(9/1)⌉ = 4.
  L_SF^(2) = avg. coded length = (4/9)·2 + (2/9)·3·2 + (1/9)·4 = 24/9 = 2.666…
S^n = (s1 + s2)^n, whose probabilities are the corresponding terms in

  (p1 + p2)^n = Σ_{i=0}^{n} C(n,i)·(2/3)^i·(1/3)^(n−i),

so there are C(n,i) symbols with probability (2/3)^i·(1/3)^(n−i) = 2^i/3^n.
The corresponding S-F length is ⌈log2(3^n/2^i)⌉ = ⌈n·log2 3 − i⌉ = ⌈n·log2 3⌉ − i.
Extension cont.

  L_SF^(n) = Σ_{i=0}^{n} C(n,i)·(2^i/3^n)·(⌈n·log2 3⌉ − i)
           = ⌈n·log2 3⌉·(1/3^n)·Σ_{i=0}^{n} C(n,i)·2^i − (1/3^n)·Σ_{i=0}^{n} C(n,i)·i·2^i
           = ⌈n·log2 3⌉ − (2n·3^(n−1))/3^n = ⌈n·log2 3⌉ − 2n/3,

using Σ_{i=0}^{n} C(n,i)·2^i = (2 + 1)^n = 3^n and Σ_{i=0}^{n} C(n,i)·i·2^i = 2n·3^(n−1)  (*).

Hence  L_SF^(n)/n → log2 3 − 2/3 = H2(S)  as n → ∞.

(*) Differentiating (2 + x)^n = Σ_{i=0}^{n} C(n,i)·2^i·x^(n−i) gives
  n·(2 + x)^(n−1) = Σ_{i=0}^{n} C(n,i)·2^i·(n − i)·x^(n−i−1);  setting x = 1:
  n·3^(n−1) = Σ_{i=0}^{n} C(n,i)·2^i·(n − i) = n·3^n − Σ_{i=0}^{n} C(n,i)·i·2^i,
so Σ_{i=0}^{n} C(n,i)·i·2^i = n·3^n − n·3^(n−1) = 2n·3^(n−1).
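A sketch (mine) that reproduces these numbers: the Shannon-Fano average length per source symbol for the nth extension of S = {s1, s2} with p = (2/3, 1/3), using the block count C(n,i) and block probability 2^i/3^n from the slide above. The values approach H2(S) ≈ 0.9183 as n grows:

```python
import math

p1, p2 = 2/3, 1/3
H = p1 * math.log2(1 / p1) + p2 * math.log2(1 / p2)       # ~0.9182958

def sf_length_per_symbol(n):
    """Average S-F length per source symbol for the nth extension."""
    total = 0.0
    for i in range(n + 1):
        weight = math.comb(n, i) * 2 ** i / 3 ** n         # C(n,i) blocks of prob 2^i/3^n
        length = math.ceil(n * math.log2(3) - i)           # ceil(log2(3^n / 2^i))
        total += weight * length
    return total / n

for n in (1, 2, 3, 10, 100, 1000):
    print(n, round(sf_length_per_symbol(n), 5))   # 1.33333, 1.33333, 1.0, ... -> 0.91833
print(round(H, 5))                                # 0.9183
```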
Markov Process Entropy
p(si | si1 ··· sim) = the conditional probability that si follows si1 ··· sim.
For an mth-order process, think of letting the state s = (si1, …, sim). Hence,

  I(si | s) = log(1/p(si | s)),  and so  H(S | s) = Σ_{si∈S} p(si | s)·I(si | s).

Now let p(s) = the probability of being in state s. Then

  H(S) = Σ_{s∈S^m} p(s)·H(S | s) = Σ_{s∈S^m} Σ_{si∈S} p(s)·p(si | s)·I(si | s)
       = Σ_{s,si ∈ S^(m+1)} p(s, si)·I(si | s) = Σ_{s,si ∈ S^(m+1)} p(s, si)·log(1/p(si | s)).
Example

[State diagram: a second-order binary Markov process with states (0,0), (0,1), (1,0), (1,1); the transition probabilities are those listed in the table below.]

previous state    next    p(si | si1,si2)   p(si1,si2)   p(si1,si2,si)
 si1   si2         si
  0     0           0          0.8             5/14          4/14
  0     0           1          0.2             5/14          1/14
  0     1           0          0.5             2/14          1/14
  0     1           1          0.5             2/14          1/14
  1     0           0          0.5             2/14          1/14
  1     0           1          0.5             2/14          1/14
  1     1           0          0.2             5/14          1/14
  1     1           1          0.8             5/14          4/14

Equilibrium probabilities:  p(0,0) = 5/14 = p(1,1);  p(0,1) = 2/14 = p(1,0).

  H2(S) = Σ_{(si1,si2,si)∈{0,1}^3} p(si1, si2, si)·log2(1/p(si | si1, si2))
        = 2·(4/14)·log2(1/0.8) + 2·(1/14)·log2(1/0.2) + 4·(1/14)·log2(1/0.5) ≈ 0.801377
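A sketch (mine) that reproduces the example: it approximates the equilibrium state probabilities by iterating the chain and then evaluates H2(S) from the conditional probabilities in the table:

```python
import math

# p(next symbol | previous two symbols), taken from the table above
p_cond = {(0, 0): {0: 0.8, 1: 0.2},
          (0, 1): {0: 0.5, 1: 0.5},
          (1, 0): {0: 0.5, 1: 0.5},
          (1, 1): {0: 0.2, 1: 0.8}}

# Equilibrium probabilities of the four states, by power iteration.
state_p = {s: 0.25 for s in p_cond}
for _ in range(200):
    nxt = {s: 0.0 for s in p_cond}
    for (a, b), prob in state_p.items():
        for c, pc in p_cond[(a, b)].items():
            nxt[(b, c)] += prob * pc               # emitting c moves (a,b) -> (b,c)
    state_p = nxt

print({s: round(p, 4) for s, p in state_p.items()})   # 5/14, 2/14, 2/14, 5/14

H = sum(state_p[s] * pc * math.log2(1 / pc)
        for s in p_cond for pc in p_cond[s].values())
print(H)                                               # ~0.801377
```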
Base Fibonacci
The golden ratio φ = (1+√5)/2 is a solution to x² − x − 1 = 0 and is equal to the limit of
the ratio of adjacent Fibonacci numbers.

[Figure: a source with r equally likely symbols 0, …, r−1, each of probability 1/r, has entropy H2 = log2 r.]

1st-order Markov process: think of the source as emitting variable-length symbols,
  "0"  with probability 1/φ,
  "10" with probability 1/φ²,   where 1/φ + 1/φ² = 1.

Entropy = (1/φ)·log φ + ½·(1/φ²)·log φ² = log φ, which is maximal
(taking the variable-length symbols into account).