Compression: Safeen H. Rasool Assist. Lecturer
• Data compression is a process by which the file size
is reduced by re-encoding the file data to use fewer
bits of storage than the original file. Many methods
are used for this purpose; in general, they can be
divided into two broad categories: lossless and
lossy methods.
Data Compression Methods
• Lossless methods: run-length, Huffman, LZW
• Lossy methods: JPEG, MPEG, MP3
CODEC
• A CODEC (compressor/decompressor) is used to
carry out the algorithm that saves a file in a
compressed format and opens a compressed
file. CODECs can be implemented in either
hardware or software. Hardware CODECs are
more expensive but, because they use
dedicated chips instead of the computer's CPU
time, they are significantly more efficient.
Compression ratio
• The compression ratio (that is, the size of the
uncompressed file compared to that of the compressed
file) of lossy video codecs is nearly always far superior
to that of the audio and still-image equivalents.
– Video can be compressed immensely (e.g. 100:1) with little
visible quality loss
– Audio can often be compressed at 10:1 with imperceptible
loss of quality
– Still images are often lossily compressed at 10:1, as with
audio, but the quality loss is more noticeable, especially on
closer inspection.
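The ratios quoted above can be read as uncompressed size : compressed size. A minimal sketch of that arithmetic follows; the function names and the file sizes are hypothetical and chosen only for illustration.

```python
def compression_ratio(uncompressed_size: int, compressed_size: int) -> float:
    """Return how many times smaller the compressed file is."""
    return uncompressed_size / compressed_size


def space_saving(uncompressed_size: int, compressed_size: int) -> float:
    """Return the fraction of storage saved by compression."""
    return 1 - compressed_size / uncompressed_size


# Hypothetical sizes in bytes (not taken from any real file).
original_size = 50_000_000
packed_size = 5_000_000

print(f"{compression_ratio(original_size, packed_size):.0f}:1")  # prints 10:1
print(f"{space_saving(original_size, packed_size):.0%}")         # prints 90%
```

A 10:1 ratio therefore corresponds to a 90% saving in storage, and 100:1 to a 99% saving.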
Lossless Compression
• In lossless data compression, the integrity of the
data is preserved. The original data and the
data after compression and decompression are
exactly the same because, in these methods,
the compression and decompression
algorithms are exact inverses of each other: no
part of the data is lost in the process.
• Redundant data is removed during compression and
restored during decompression.
• Lossless compression methods are normally used when
we cannot afford to lose any data.
• Lossless data compression is used in many applications.
For example, it is used in the ZIP file format.
• Typical examples of data that must be compressed
losslessly are executable programs, text documents,
and source code. Some image file formats,
like PNG or GIF, use only lossless compression.
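The "exact inverse" property of lossless compression can be checked with a round trip. The sketch below uses Python's standard zlib module, which implements DEFLATE (the method most commonly used inside ZIP files); the sample text is an assumption made up for this illustration.

```python
import zlib

# Made-up sample data; repetition gives the compressor redundancy to remove.
original = b"Lossless compression must return exactly these bytes. " * 20

compressed = zlib.compress(original)    # compression removes redundancy
restored = zlib.decompress(compressed)  # decompression restores it

# The restored data is bit-for-bit identical to the original.
assert restored == original
print(len(original), "bytes ->", len(compressed), "bytes")
```

If even one byte of the restored data differed, the assertion would fail, which is exactly what must never happen for executables or source code.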
Run-length encoding
• Run-length encoding (RLE) is one of the simplest
methods of compression. It can be used to compress
data made of any combination of symbols.
• It is supported by most bitmap file formats,
such as TIFF, BMP, and PCX.
• RLE can be applied to any type of data
regardless of its information content, although the
content determines how much compression is achieved.
• The general idea behind this method is to
replace consecutive repeating occurrences of a
symbol by one occurrence of the symbol
followed by the number of occurrences.
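The idea described above can be sketched in a few lines. This is a minimal illustration, assuming the encoded form is a list of (symbol, count) pairs; real file formats such as TIFF or PCX use their own byte-level layouts, and the function names here are hypothetical.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of a repeated symbol by (symbol, run length)."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (symbol, run length) pair back into the original run."""
    return "".join(symbol * count for symbol, count in pairs)


message = "AAAABBBCCD"
encoded = rle_encode(message)
print(encoded)  # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert rle_decode(encoded) == message  # decoding restores the input exactly
```

Note that data without long runs (for example "ABCD") gains nothing from this scheme, since every symbol still needs its own count.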
Huffman coding
• Huffman coding assigns shorter codes to symbols
that occur more frequently and longer codes to
those that occur less frequently.
• For example, imagine we have a text file that
uses only five characters (A, B, C, D, E). Before
we can assign bit patterns to each character, we
assign each character a weight based on its
frequency of use. In this example, assume that the
frequency of the characters is as shown in the table below.
Character   A   B   C   D   E
Frequency  17  12  13  27  32
• The Huffman tree is built bottom-up by repeatedly
merging the two nodes with the lowest weights:
– B (12) and C (13) are merged into a node of weight 25.
– A (17) and the node of weight 25 are merged into a node
of weight 42.
– D (27) and E (32) are merged into a node of weight 59.
– The nodes of weight 42 and 59 are merged into the root,
of weight 101.
• Labelling each left branch 0 and each right branch 1
gives the following codes:
– A 00
– B 010
– C 011
– D 10
– E 11
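The tree-building procedure can be sketched with a priority queue: repeatedly remove the two lightest nodes, merge them, and put the merged node back. This is a minimal illustration using Python's standard heapq module, not a production encoder; the function name and the 0/1 labelling convention (first node removed gets 0) are assumptions, and other conventions give different bit patterns with the same code lengths.

```python
import heapq
from itertools import count


def huffman_codes(frequencies: dict[str, int]) -> dict[str, str]:
    """Build a prefix code by repeatedly merging the two lightest nodes."""
    if len(frequencies) == 1:
        # A lone symbol still needs one bit.
        return {symbol: "0" for symbol in frequencies}
    tie_breaker = count()  # keeps heap entries comparable when weights tie
    # Each heap entry is (weight, tie-breaker, {symbol: code built so far}).
    heap = [(weight, next(tie_breaker), {symbol: ""})
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)   # lightest node: branch 0
        w1, _, right = heapq.heappop(heap)  # second lightest: branch 1
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tie_breaker), merged))
    return heap[0][2]


frequencies = {"A": 17, "B": 12, "C": 13, "D": 27, "E": 32}
codes = huffman_codes(frequencies)
total_bits = sum(frequencies[s] * len(codes[s]) for s in frequencies)
for symbol in sorted(codes):
    print(symbol, codes[symbol])
print("total bits:", total_bits)
```

With the frequencies in the table, this sketch assigns 2-bit codes to A, D and E and 3-bit codes to B and C, for a total of 227 bits, compared with 303 bits for a fixed 3-bit code.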