Group Assignment Multimedia System
NAME OF STUDENTS                ID
1. TSEGAY ADISU                 05629/12
Chapter 5
Lossless Compression Algorithms
OUTLINES:
5.1 Introduction
5.2 Basics of Information Theory
5.3 Run-Length Coding
5.4 Variable-Length Coding (VLC)
5.5 Dictionary-based Coding
5.6 Huffman Coding
5.7 Arithmetic Coding
5.8 Lossless Image Compression
5.1 Introduction
• Compression: the process of coding that will effectively reduce
the total number of bits needed to represent certain information.
• If the compression and decompression processes induce no information
loss, the compression scheme is lossless; otherwise, it is lossy.
• Compression ratio:
compression ratio = B0 / B1
where B0 is the number of bits required before compression and B1 is the
number of bits required after compression.
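For example, if a file that originally takes 40,000 bits can be represented
with 10,000 bits, the compression ratio is 40,000 / 10,000 = 4 (often
written 4:1).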
5.2 Basics of Information Theory
• The entropy η of an information source with alphabet S = {s1, s2, ..., sn}
is defined as:
η = H(S) = Σ_{i=1}^{n} p_i log2(1/p_i)
         = − Σ_{i=1}^{n} p_i log2(p_i)
where p_i is the probability that symbol s_i occurs in S.
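As a quick sanity check, here is a minimal Python sketch that evaluates this
formula, with the probabilities estimated from the symbol frequencies of the
string "HELLO" used in the examples that follow; the result of roughly 1.92
bits per symbol agrees with the log2(1/pi) figures in the tables below.

import math
from collections import Counter

# Entropy of a string, with p_i estimated from the symbol frequency counts.
def entropy(text):
    counts = Counter(text)
    total = len(text)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(entropy("HELLO"))   # about 1.92 bits per symbol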
Shannon-Fano Algorithm: a top-down approach
1. Sort the symbols according to the frequency count of their
occurrences.
2. Recursively divide the symbols into two parts, each with approximately
the same number of counts, until all parts contain only one symbol.
Example: frequency count of the symbols in "HELLO":
Symbol   H   E   L   O
Count    1   1   2   1
One result of performing the Shannon-Fano algorithm on "HELLO":
Symbol   Count   log2(1/pi)   Code   Number of bits used
L        2       1.32         0      2
H        1       2.32         10     2
E        1       2.32         110    3
O        1       2.32         111    3
TOTAL number of bits: 10
Another coding result, obtained with a different (also valid) split:
Symbol   Count   log2(1/pi)   Code   Number of bits used
L        2       1.32         00     4
H        1       2.32         01     2
E        1       2.32         10     2
O        1       2.32         11     2
TOTAL number of bits: 10
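The following is a minimal Python sketch of the top-down Shannon-Fano split
described above (the function name shannon_fano and the dict-based interface
are illustrative choices, not from the text). With the "HELLO" counts it
reproduces the second coding result; a different tie-break would give the
first.

# Shannon-Fano coding: counts is a dict mapping symbol -> frequency count.
def shannon_fano(counts):
    codes = {}

    def split(symbols, prefix):
        if len(symbols) == 1:
            codes[symbols[0]] = prefix or "0"
            return
        total = sum(counts[s] for s in symbols)
        running = 0
        # Find the split point where the two parts have roughly equal counts.
        for i in range(len(symbols) - 1):
            running += counts[symbols[i]]
            if running * 2 >= total:
                break
        split(symbols[:i + 1], prefix + "0")
        split(symbols[i + 1:], prefix + "1")

    # Step 1: sort symbols by frequency count (most frequent first).
    ordered = sorted(counts, key=counts.get, reverse=True)
    # Step 2: recursively divide into two roughly equal-count parts.
    split(ordered, "")
    return codes

print(shannon_fano({"H": 1, "E": 1, "L": 2, "O": 1}))
# {'L': '00', 'H': '01', 'E': '10', 'O': '11'} -> 10 bits in total for "HELLO"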
Huffman Coding: a bottom-up approach
1. Initialization: put all symbols on a list sorted according to their
frequency counts.
2. Repeat until the list has only one symbol left: from the list pick two
symbols with the lowest frequency counts; form a Huffman subtree that has
these two symbols as child nodes and create a parent node for them; assign
the sum of the children's counts to the parent, insert it into the list so
that the order is maintained, and delete the two children from the list.
3. Assign a codeword for each leaf based on the path from the
root.
• In Fig. 7.5, new symbols P1, P2, P3 are created to refer to the
parent nodes in the Huffman coding tree. The contents of the list at each
step are illustrated below:
After initialization: L H E O
After iteration (a):  L P1 H
After iteration (b):  L P2
After iteration (c):  P3
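Below is a minimal sketch of this bottom-up construction using a binary heap
in plain Python (the helper name huffman_codes is illustrative, not from the
text). Ties between equal counts may be broken differently than in Fig. 7.5,
but the total number of bits is the same.

import heapq
from collections import Counter

# Bottom-up Huffman tree construction; a tree node is either a symbol
# (a leaf) or a (left, right) pair (an internal/parent node).
def huffman_codes(text):
    freq = Counter(text)
    heap = [(count, i, sym) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)                       # unique tie-breaker for equal counts
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)   # the two lowest-count nodes
        c2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, tie, (t1, t2)))   # their new parent node
        tie += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):       # internal node: branch 0 / 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: record its codeword
            codes[tree] = prefix or "0"
    walk(heap[0][2])
    return codes

print(huffman_codes("HELLO"))
# e.g. {'H': '00', 'E': '01', 'O': '10', 'L': '11'} -> also 10 bits in total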
Properties of Huffman Coding
1. Unique Prefix Property: No Huffman code is a prefix of any
other Huffman
code - precludes any ambiguity in decoding.
2. Optimality: minimum redundancy code - proved optimal for a
given data model (i.e., a given, accurate, probability distribution):
• The two least frequent symbols will have the same length for
their Huffman codes,
differing only at the last bit.
• Symbols that occur more frequently will have shorter Huffman
codes than symbols that occur less frequently.
• The average code length l̄ for an information source S is strictly
less than η + 1. Combined with Eq. (7.5), we have:
η ≤ l̄ < η + 1
Extended Huffman Coding
• Motivation: All codewords in Huffman coding have integer bit
lengths. This is wasteful when pi is very
large and hence log2(1/pi) is close to 0.
• Why not group several symbols together and assign a single
codeword to the group as a whole?
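For example, if a binary source has P(s1) = 0.99 and P(s2) = 0.01, its
entropy is only about 0.08 bits per symbol, yet plain Huffman coding must
spend a whole bit on every symbol; grouping several symbols into one
codeword lets the average length approach the entropy.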
• Extended Alphabet: For alphabet S = {s1, s2, ..., sn}, if k
symbols are grouped together, then the extended alphabet is:
S^(k) = {s1s1...s1, s1s1...s2, ..., s1s1...sn, s1s2...s1, ..., snsn...sn},
where each entry is a group of k symbols.
• The size of the new alphabet S^(k) is n^k.
• It can be proven that the average number of bits per symbol, l̄, satisfies:
η ≤ l̄ < η + 1/k
• An improvement over the original Huffman coding, but not
much.
• Problem: If k is relatively large (e.g., k ≥ 3), then for most
practical applications where n ≫ 1, n^k implies a huge symbol
table, which is impractical.
Adaptive Huffman Coding
• The encoder and decoder each maintain the coding tree on the fly:
(a) initial_code assigns codewords to symbols with no prior knowledge of
the frequency counts; (b) update_tree then updates the configuration of
the tree as the counts change.
• The encoder and decoder must use exactly the same initial_code and
update_tree routines.
Notes on Adaptive Huffman Tree Updating
• Nodes are numbered in order from left to right, bottom to top.
The numbers in parentheses indicate the count.
• The tree must always maintain its sibling property, i.e., all nodes
(internal and leaf) are arranged in the order of increasing counts.
• If the sibling property is about to be violated, a swap procedure is
invoked to update the tree by rearranging the nodes.
• When a swap is necessary, the farthest node with count N is
swapped with the node whose count has just been increased to
N+1.
Another Example: Adaptive Huffman Coding
• This is to clearly illustrate more implementation details. We
show exactly what bits are sent, as opposed to simply stating
how the tree is updated.
• An additional rule: if any character/symbol is to be sent the
first time, it must be preceded by a special symbol, NEW. The
initial code for NEW is 0. The count for NEW is always kept as 0
(the count is never increased); hence it is always denoted as
NEW:(0) in Fig. 7.7.
Initial code assignment for AADCCDD using adaptive Huffman coding:
initial code
-------------
NEW: 0
A: 00001
B: 00010
C: 00011
D: 00100
• It is important to emphasize that the code for a particular
symbol changes during the adaptive Huffman coding process.
For example, after AADCCDD, when the character D overtakes
A as the most frequent symbol, its code changes from 101 to 0.
• The Squeeze Page on this book's web site provides a Java
applet for adaptive Huffman coding.
5.5 Dictionary-based Coding
• LZW (Lempel, Ziv, Welch) uses fixed-length codewords to
represent variable-length strings of symbols/characters that
commonly occur together, e.g., words in English text.
• The LZW encoder and decoder build up the same dictionary
dynamically while receiving the data.
• LZW places longer and longer repeated entries into a
dictionary, and then emits the code for an element, rather than
the string itself, if the element has already been placed in the
dictionary.
ALGORITHM LZW Compression
BEGIN
  s = next input character;
  while not EOF
  {
    c = next input character;
    if s + c exists in the dictionary
      s = s + c;
    else
    {
      output the code for s;   // s was the longest string found in the
                               // dictionary in the previous iterations;
                               // s + c is not (yet) in the dictionary
      add string s + c to the dictionary with a new code;
      s = c;
    }
  }
  output the code for s;
END
Example: LZW compression for string ABABBABCABABBA
Let's start with a very simple dictionary (also referred to as a
string table), initially containing only 3 characters, with codes
as follows:
CODE STRING
1 A
2 B
3 C
• Now if the input string is ABABBABCABABBA, the LZW
compression algorithm works as follows:
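Since the step-by-step trace is easier to follow in code, here is a minimal
Python sketch of the same algorithm (the function name lzw_compress and its
dict interface are illustrative); run on ABABBABCABABBA with the 3-entry
starting dictionary it emits the codes 1 2 4 5 2 3 4 6 1.

# LZW compression; `dictionary` maps strings to integer codes and is
# pre-loaded with the single characters of the alphabet.
def lzw_compress(text, dictionary):
    dictionary = dict(dictionary)            # work on a copy
    next_code = max(dictionary.values()) + 1
    output = []
    s = text[0]
    for c in text[1:]:
        if s + c in dictionary:
            s = s + c                        # keep extending the current match
        else:
            output.append(dictionary[s])     # emit the code for the longest match
            dictionary[s + c] = next_code    # add the new string with a new code
            next_code += 1
            s = c
    output.append(dictionary[s])             # emit the code for the final match
    return output

print(lzw_compress("ABABBABCABABBA", {"A": 1, "B": 2, "C": 3}))
# [1, 2, 4, 5, 2, 3, 4, 6, 1]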
LZW Coding (Cont'd)
• In real applications, the code length l is kept in the range
[l0, lmax]. The dictionary initially has a size of 2^l0. When it is
filled up, the code length is increased by 1; this is allowed
to repeat until l = lmax.
• When lmax is reached and the dictionary is filled up, it needs
to be flushed (as in Unix compress), or to have the LRU (least
recently used) entries removed.
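For example, with l0 = 8 the dictionary starts with 2^8 = 256 entries; once
those are used up, codes grow to 9 bits (512 entries), then 10 bits, and so
on, until the lmax-bit dictionary is full.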
5.7 Arithmetic Coding
• Arithmetic coding is a more modern coding method that
usually out-performs Huffman coding.
• Huffman coding assigns each symbol a codeword which has
an integral bit length. Arithmetic coding can treat the whole
message as one unit.
• A message is represented by a half-open interval [a, b) where
a and b are real numbers between 0 and 1. Initially, the interval
is [0, 1). As the message becomes longer, the length of the
interval shortens and the number of bits needed to represent
the interval increases.
PROCEDURE Generating Codeword for Encoder
BEGIN
  code = 0;
  k = 1;
  while (value(code) < low)
  {
    assign 1 to the kth binary fraction bit;
    if (value(code) > high)
      replace the kth bit by 0;
    k = k + 1;
  }
END
• The final step in Arithmetic encoding calls for the generation
of a number that falls within the range [low, high). The above
algorithm will ensure that the shortest binary codeword is
found.
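A minimal floating-point sketch of the whole encoder (interval narrowing
plus the codeword-generation loop above) is given below; the three-symbol
probability model, with '$' as a terminator, is a made-up example rather
than one from the text.

# Made-up example model: each symbol maps to its [Range_low, Range_high) pair.
RANGES = {"A": (0.0, 0.5), "B": (0.5, 0.8), "$": (0.8, 1.0)}

def arithmetic_encode(message, ranges):
    low, high = 0.0, 1.0
    for sym in message:
        r = high - low
        lo_s, hi_s = ranges[sym]
        high = low + r * hi_s      # shrink the interval to the symbol's sub-range
        low = low + r * lo_s
    # Generate the shortest binary fraction that falls within [low, high).
    value, bits, k = 0.0, [], 1
    while value < low:
        bit = 2.0 ** (-k)
        if value + bit < high:     # keep the k-th fraction bit as 1 only if we stay below high
            value += bit
            bits.append("1")
        else:
            bits.append("0")
        k += 1
    return "".join(bits), (low, high)

print(arithmetic_encode("AB$", RANGES))
# ('011', ...): the interval is roughly [0.37, 0.4) and 0.011 (binary) = 0.375 lies inside it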
ALGORITHM Arithmetic Coding Decoder
BEGIN
  get binary code and convert to decimal: value = value(code);
  DO
  {
    find a symbol s so that
      Range_low(s) <= value < Range_high(s);
    output s;
    low = Range_low(s);
    high = Range_high(s);
    range = high - low;
    value = [value - low] / range;
  }
  UNTIL symbol s is a terminator
END
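A matching Python sketch of this decoding loop, using the same made-up
model as in the encoder sketch above:

# Same made-up model as in the encoder sketch; '$' acts as the terminator.
RANGES = {"A": (0.0, 0.5), "B": (0.5, 0.8), "$": (0.8, 1.0)}

def arithmetic_decode(value, ranges, terminator="$"):
    out = []
    while True:
        for sym, (lo_s, hi_s) in ranges.items():
            if lo_s <= value < hi_s:                    # the symbol whose range contains value
                out.append(sym)
                value = (value - lo_s) / (hi_s - lo_s)  # rescale value back to [0, 1)
                break
        if out[-1] == terminator:
            return "".join(out)

print(arithmetic_decode(0.375, RANGES))   # 'AB$' (0.375 is the value of the codeword 011)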
5.7.2 Scaling and Incremental Coding
• The basic algorithm described in the last section has the
following limitations that make its practical implementation
infeasible:
• When it is used to code long sequences of symbols, the tag
intervals shrink to a very small range. Representing these
small intervals requires very high-precision numbers.
• The encoder will not produce any output codeword until the
entire sequence is entered. Likewise, the decoder needs to
have the codeword for the entire sequence of the input symbols
before decoding.
Some key observations:
1. Although the binary representations for the low, high, and
any number within the small interval usually require many
bits, they always have the same MSBs (Most Significant
Bits). For example, 0.1000110 for 0.5469 (low), 0.1000111 for
0.5547 (high).
2. Subsequent intervals will always stay within the current
interval. Hence, we can output the common MSBs and remove
them from subsequent considerations.
Procedure : (E1 and E2 Scalings in Arithmetic Coding).
BEGIN
  while (high <= 0.5) OR (low >= 0.5)
  {
    if (high <= 0.5)      // E1 scaling
    {
      output 0;
      low = 2 * low; high = 2 * high;
    }
    else                  // E2 scaling
    {
      output 1;
      low = 2 * (low - 0.5); high = 2 * (high - 0.5);
    }
  }
END
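A small Python sketch of this renormalization step (the names e1_e2_scale
and emit are illustrative); applied to the interval [0.5469, 0.5547) from
the observation above, it emits the common MSBs 1 0 0 0 1 1 and expands the
interval.

# E1/E2 scaling: repeatedly output a bit and expand the interval while it
# lies entirely in the lower half [0, 0.5) or the upper half [0.5, 1).
def e1_e2_scale(low, high, emit):
    while high <= 0.5 or low >= 0.5:
        if high <= 0.5:               # E1: output 0, scale up the lower half
            emit("0")
            low, high = 2 * low, 2 * high
        else:                         # E2: output 1, scale up the upper half
            emit("1")
            low, high = 2 * (low - 0.5), 2 * (high - 0.5)
    return low, high

bits = []
low, high = e1_e2_scale(0.5469, 0.5547, bits.append)
print("".join(bits), low, high)   # 100011, then an interval straddling 0.5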
E3 Scaling
• E3 scaling is applied when the interval straddles the midpoint 0.5
(i.e., low >= 0.25 and high <= 0.75): low and high are rescaled by
low = 2 * (low - 0.25), high = 2 * (high - 0.25), and the output bit is
deferred.
• N E3 scaling steps followed by an E1 are equivalent to an E1
followed by N E2 steps; N E3 scaling steps followed by an E2
are equivalent to an E2 followed by N E1 steps.
• Therefore, a good way to handle the signaling of the E3
scalings is to postpone them until there is an E1 or E2:
• If there is an E1 after N E3 operations, send 0 followed by N
1s after the E1;
• if there is an E2 after N E3 operations, send 1 followed by N
0s after the E2.
5.7.3 Integer Implementation
• Uses only integer arithmetic. It is quite common in modern
multimedia applications.
• Basically, the unit interval is replaced by a range [0, N),
where N is an integer, e.g., 255.
• Because the integer range can be so small, e.g., [0, 255),
applying scaling techniques similar to those discussed
above, now in integer arithmetic, is a necessity.
• The main motivation is to avoid any floating-point
operations.
5.7.4 Binary Arithmetic Coding
• Uses only binary symbols, 0 and 1.
• The calculation of new intervals and the decision of which
interval to take (first or second) are simpler.
• Fast binary arithmetic coders (the Q-coder and MQ-coder) were
developed for multimedia standards such as JBIG, JBIG2, and
JPEG-2000. The more advanced version, Context-Adaptive
Binary Arithmetic Coding (CABAC), is used in H.264 (M-coder)
and H.265.
5.8 Lossless Image Compression
• A typical approach is differential (predictive) coding: code the
differences between neighboring pixel values, then compress the resulting
difference image using one of the lossless compression techniques we have
discussed, e.g., the Huffman coding scheme.
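A tiny sketch of the idea for one image row, assuming 8-bit pixel values;
the residuals cluster near zero and therefore have low entropy, so they
compress well with, e.g., Huffman coding:

# Differential (predictive) coding of one row of pixel values.
row = [88, 88, 89, 90, 92, 95, 95, 95]
residuals = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]
print(residuals)   # [88, 0, 1, 1, 2, 3, 0, 0]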
THE END