Lossless and Lossy Compression
2023
COURSE OBJECTIVES:
To understand the basics of compression techniques
To understand the categories of compression for text, image and video
To explore the modalities of text, image and video compression algorithms
To know about basics of consistency of data availability in storage devices
To understand the concepts of data streaming services
UNIT I BASICS OF DATA COMPRESSION 6
Introduction – Lossless and Lossy Compression – Basics of Huffman Coding – Arithmetic Coding – Dictionary
Techniques – Context-Based Compression – Applications
UNIT II IMAGE COMPRESSION 6
Lossless Image Compression – JPEG – CALIC – JPEG-LS – Prediction Using Conditional Averages – Progressive
Image Transmission – Lossless Image Compression Formats – Applications – Facsimile Encoding
UNIT III VIDEO COMPRESSION 6
Introduction – Motion Compensation – Video Signal Representation – H.261 – MPEG-1 – MPEG-2 – H.263
UNIT IV DATA PLACEMENT ON DISKS 6
Statistical placement on Disks – Striping on Disks – Replication Placement on Disks – Constraint allocation on
Disks – Tertiary storage Devices – Continuous Placement on Hierarchical storage system – Statistical
placement on Hierarchical storage systems – Constraint allocation on Hierarchical storage system
UNIT V DISK SCHEDULING METHODS 6
Scheduling methods for disk requests – Feasibility conditions of concurrent streams– Scheduling methods for
request streams
30 PERIODS
COURSE OUTCOMES:
CO1: Understand the basics of text, Image and Video compression
CO2: Understand the various compression algorithms for multimedia content
CO3: Explore the applications of various compression techniques
CO4: Explore knowledge on multimedia storage on disks
CO5: Understand scheduling methods for request streams
TEXT BOOKS
1. Khalid Sayood, Introduction to Data Compression, Morgan Kaufmann Series in Multimedia Information and
Systems, 2018, 5th Edition.
2. Philip K.C. Tse, Multimedia Information Storage and Retrieval: Techniques and Technologies, 2008.
REFERENCES
1. David Salomon, A Concise Introduction to Data Compression, 2008.
2. Lenald Best, Best’s Guide to Live Stream Video Broadcasting, BCB Live Teaching series, 2017.
3. Yun-Qing Shi, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms and
Standards, Taylor & Francis, 2019.
4. Irina Bocharova, Compression for Multimedia, Cambridge University Press; 1st edition, 2009
What are lossless and lossy compression?
Lossless and lossy file compression describe whether all original data can be recovered when the file is
uncompressed.
With lossless compression, every bit of data originally in a file remains after it is uncompressed, and all the
information is restored. Lossy compression reduces a file by permanently eliminating certain information,
especially redundant information.
When the file is uncompressed, some of the original information is not there, although the user may not
notice it.
What is file compression?
Digital files such as image files are often "compressed" to reduce their size and/or to change
various attributes, such as:
file type
dimensions
resolution
bit depth
Compression reduces the size of a file, often without appreciable loss of information. It can be either lossless
or lossy.
A smaller-sized compressed file can be restored to its larger form -- in its entirety or with some data loss,
depending on the compression type -- by decompression.
What is lossless compression?
With lossless compression, the file can be decompressed to its original quality without any loss of data. This
compression method is also known as reversible compression.
Although file sizes are reduced with this method, the reduction is smaller than that achieved with lossy
compression.
What is lossy compression?
In lossy compression, the data in a file is removed and cannot be restored to its original form after
decompression. Because data is permanently removed, this method is also known as irreversible compression.
This data loss is not usually noticeable. However, the more a file is compressed, the more degradation occurs,
and the loss eventually becomes visible.
Lossy compression reduces file size much more than lossless compression does.
Lossless compression is typically used for images, sound and text.
It is generally the technique of choice for detailed product images, photography showcases, text files
and spreadsheet files, where the loss of picture quality, words or data (e.g., financial data) could pose a
problem.
The Graphics Interchange Format (GIF), an image format used on the internet, is generally compressed using
lossless compression. RAW, BMP and PNG are also lossless image formats.
Lossy compression is typically used for images, audio and video.
This technique is preferred for audio and video files where some amount of information loss is acceptable
since it's unlikely to be detected by most users. The JPEG image file, commonly used for photographs and
other complex still images on the web with no transparency, is generally compressed using lossy
compression.
Using JPEG compression, the creator can decide how much loss to introduce, and how much of a trade-off is
acceptable between file size and image quality.
Lossy compression is also well suited to websites that serve JPEG files and need fast loading times, since the
compression ratio can be adjusted to strike the right balance between file size and image quality.
Common compression algorithms include:
Lempel-Ziv-Welch (LZW)
Huffman Coding
Arithmetic Encoding
Fractal Compression
Advantages and drawbacks of lossless compression
The key benefit of lossless compression is that the quality of the file (e.g., an image) can be retained while
achieving a smaller file size.
In JPEG and PNG files, this is done by removing unnecessary metadata. For applications where it's important
to retain the quality of the file, lossless compression is the better choice.
The drawback of this compression technique is that larger file sizes are required to maintain post-
compression quality.
Advantages and drawbacks of lossy compression
The key benefit of lossy compression is its much greater reduction in file size. The disadvantage is that it also
results in quality loss, which may not be acceptable for some applications or users. The higher the
compression ratio, the greater the quality degradation. Additionally, the original file -- with its original
quality -- cannot be recovered after compression.
Choosing lossless or lossy data compression comes down to three things: the application it is being used for,
the acceptable level of quality loss and the desired file size.
What is the best choice for compression: Lossless or lossy?
There is no "right" or "best" answer when it comes to choosing between lossless vs. lossy compression. The
choice depends on:
the application
the acceptable level of quality loss
the desired file size
So, for example, a blog or portfolio website could benefit from lossy compression since it offers significant file
size reduction, saves storage space and improves site performance and user experience. On the other hand, a
website that requires high-quality photos would be better off with lossless compression.
It is also possible to use both types of compression for the same application.
Run-Length Encoding
Run-Length Encoding (RLE) is a basic data compression method that eliminates redundant
information in a dataset by replacing consecutive repeated values with a count and the value itself. It works
on various data types, including text, images, and numerical data.
Run-length encoding (RLE) is a form of lossless compression. It compresses data by specifying the number of
times a character or pixel colour repeats, followed by the value of that character or pixel. The aim is to reduce
the number of bits used to represent a set of data; using fewer bits means the data takes up less storage space
and is quicker to transfer.
Data Compression: RLE reduces the size of data, making it more efficient to store and transmit. It is
particularly useful for repetitive data or datasets with long sequences of the same value.
Improved Data Processing: By reducing the volume of data, RLE can speed up data processing
operations such as sorting, searching, and analysis. It simplifies data structures and allows for faster
computations.
Reduced Storage Costs: Compressed data requires less storage space, resulting in reduced storage costs
for organizations that deal with large datasets.
Bandwidth Optimization: When transferring data over networks or between systems, RLE can minimize
bandwidth requirements, leading to faster data transfers.
The process involves going through the text and counting the number of consecutive occurrences of
each character (called "a run"). The number of occurrences of the character and the character itself are
then stored in pairs.
Example:
aaaabbbbbbcddddd
There are 16 characters in the example, so 16 bytes (assuming Extended ASCII is being used) are needed to
store these characters in an uncompressed format.
RLE can be used to store that same data using fewer bytes. There are four consecutive occurrences of
the character 'a' followed by six consecutive occurrences of 'b', one 'c' and finally five consecutive
occurrences of 'd'. This could be written as the following sequence of count-character pairs:
Run-length encoding for the above example: (4, a)(6, b)(1, c)(5, d)
So, if the text aaaabbbbbbcddddd is compressed using RLE, storing each count and each character code in a
single byte, we would end up with the following byte values (shown in decimal, counts alternating with the
ASCII codes of the characters):
04 97 06 98 01 99 05 100
As we can see, this compressed version only requires 8 bytes – a reduction from the original 16 bytes.
Example
RLE count-character pairs: (10, a)(6, b)(1, e)(1, c)(1, e)(1, c)(1, e)(1, c)(1, e)(1, c)(1, e)(1, c)(1, e)(1, c)
(15, d)(1, e)(1, c)(1, b)
Note how the long runs of 'a', 'b' and 'd' compress well, while the alternating 'e' and 'c' characters each need
their own pair. Runs of length 1 can therefore make the RLE output larger than the original data.
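The encoding and decoding steps described above can be sketched in a few lines of Python (the function names here are illustrative, not part of any standard library):

```python
def rle_encode(text):
    """Return a list of (count, character) pairs for consecutive runs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][1] == ch:
            pairs[-1] = (pairs[-1][0] + 1, ch)  # extend the current run
        else:
            pairs.append((1, ch))               # start a new run
    return pairs


def rle_decode(pairs):
    """Rebuild the original text from (count, character) pairs."""
    return "".join(ch * count for count, ch in pairs)


pairs = rle_encode("aaaabbbbbbcddddd")
print(pairs)  # [(4, 'a'), (6, 'b'), (1, 'c'), (5, 'd')]
print(rle_decode(pairs) == "aaaabbbbbbcddddd")  # True
```

Each pair here occupies two bytes when stored, matching the 8-byte result worked out above.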
Huffman Coding-
Huffman coding is a famous greedy algorithm used for lossless data compression. It assigns variable-length
codes to characters such that more frequent characters get shorter codes.
Prefix Rule-
Huffman codes follow the prefix rule: no code is a prefix of any other code. This guarantees that an encoded
message can be decoded without ambiguity.
Huffman Tree-
The binary tree constructed while generating the codes is called a Huffman tree. It is built using the following
steps-
Step-01:
Create a leaf node for each character of the text, each carrying the frequency of that character.
Step-02:
Arrange all the nodes in increasing order of their frequency value.
Step-03:
Pick the two nodes having the minimum frequency, and join them to a new internal node whose frequency is
the sum of the frequencies of the two nodes. The two picked nodes become the children of the new node.
Step-04:
Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
The tree finally obtained is the desired Huffman Tree.
Time Complexity-
Using a min heap to extract the two minimum-frequency nodes, Huffman coding runs in O(n log n) time for n
characters.
Important Formulas-
The following 2 formulas are important to solve the problems based on Huffman Coding-
Formula-01:
Average code length = Σ ( frequency_i × code length_i ) / Σ ( frequency_i )
Formula-02:
Length of Huffman encoded message (in bits) = Total number of characters × average code length
Problem-
A file contains the following characters with the frequencies as shown. If Huffman Coding is used for data
compression, determine-
1. Huffman Code for each character
2. Average code length
3. Length of Huffman encoded message (in bits)
Characters Frequencies
a 10
e 15
i 12
o 3
u 4
s 13
t 1
Solution-
Rule
If you assign weight ‘0’ to the left edges, then assign weight ‘1’ to the right edges.
If you assign weight ‘1’ to the left edges, then assign weight ‘0’ to the right edges.
Any of the above two conventions may be followed.
But the same convention must be followed at the time of decoding that was adopted at the time of
encoding.
After assigning weights to all the edges, the modified Huffman tree is obtained.
Now, let us answer each part of the given problem one by one-
1. Huffman Code For Characters-
To write Huffman Code for any character, traverse the Huffman Tree from root node to the leaf node of that
character.
Following this rule, the Huffman Code for each character is-
a = 111
e = 10
i = 00
o = 11001
u = 1101
s = 01
t = 11000
2. Average Code Length-
Average code length = Σ ( frequency_i × code length_i ) / Σ ( frequency_i )
= (10×3 + 15×2 + 12×2 + 3×5 + 4×4 + 13×2 + 1×5) / (10 + 15 + 12 + 3 + 4 + 13 + 1)
= 146 / 58
≈ 2.52
3. Length of Huffman Encoded Message-
Total number of bits = Total number of characters × average code length = 58 × 2.52 ≈ 146 bits
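The construction steps above can be sketched with Python's heapq module. This is a minimal illustration, not a production encoder; tie-breaking between equal frequencies varies between implementations, so the individual codes may differ from those shown above, but any valid Huffman tree for these frequencies yields the same total encoded length of 146 bits.

```python
import heapq


def huffman_codes(freqs):
    """Return {character: binary code} built greedily with a min heap."""
    # Heap entries: (frequency, tie_breaker, {char: code_so_far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent nodes
        f2, _, right = heapq.heappop(heap)
        # Prefix '0' onto the left subtree's codes and '1' onto the right's.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


freqs = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}
codes = huffman_codes(freqs)
total_bits = sum(freqs[ch] * len(codes[ch]) for ch in freqs)
print(total_bits)                                   # 146
print(round(total_bits / sum(freqs.values()), 2))   # 2.52
```

Merging the dictionaries of codes at each step stands in for walking the finished tree; the result is the same set of prefix-free codes.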
The LZW algorithm is a very common compression technique, typically used in GIF images and optionally in
PDF and TIFF files; it is also the algorithm behind Unix's 'compress' utility. It is lossless, meaning no data is
lost when compressing. The algorithm is simple to implement and has the potential for very high throughput
in hardware implementations.
The Idea relies on reoccurring patterns to save data space. LZW is the foremost technique for
general-purpose data compression due to its simplicity and versatility. It is the basis of many PC
utilities that claim to “double the capacity of your hard drive”.
How does it work?
LZW compression works by reading a sequence of symbols, grouping the symbols into strings,
and converting the strings into codes. Because the codes take up less space than the strings they
replace, we get compression. Characteristic features of LZW includes,
LZW compression uses a code table, with 4096 as a common choice for the number of table
entries. Codes 0-255 in the code table are always assigned to represent single bytes from the
input file.
When encoding begins the code table contains only the first 256 entries, with the remainder of
the table being blanks. Compression is achieved by using codes 256 through 4095 to represent
sequences of bytes.
As the encoding continues, LZW identifies repeated sequences in the data and adds them to
the code table.
Decoding is achieved by taking each code from the compressed file and translating it through
the code table to find what character or characters it represents.
Example: ASCII code. Typically, every character is stored with 8 binary bits, allowing up to 256
unique symbols for the data. This algorithm tries to extend the library to 9 to 12 bits per character.
The new unique symbols are made up of combinations of symbols that occurred previously in the
string. It does not always compress well, especially with short, diverse strings, but it is good for
compressing redundant data, and it does not have to save the dictionary with the data: the decoder rebuilds
the same dictionary on the fly, so this method can both compress and uncompress data.
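The table-building scheme described above can be sketched as follows. This is a minimal illustration assuming single-character input symbols; a real implementation would also cap or reset the table at 4096 entries and pack the codes into 9- to 12-bit fields.

```python
def lzw_encode(data):
    """Encode a string into a list of integer codes."""
    table = {chr(i): i for i in range(256)}  # codes 0-255: single bytes
    next_code = 256
    current = ""
    out = []
    for ch in data:
        if current + ch in table:
            current += ch                    # keep extending a known string
        else:
            out.append(table[current])       # emit code for the known part
            table[current + ch] = next_code  # learn the new sequence
            next_code += 1
            current = ch
    if current:
        out.append(table[current])
    return out


def lzw_decode(codes):
    """Rebuild the original string by reconstructing the same table."""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # Special case: the code may refer to the entry being built right now.
        entry = table[code] if code in table else prev + prev[0]
        out.append(entry)
        table[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(out)


codes = lzw_encode("TOBEORNOTTOBEORTOBEORNOT")
print(len(codes))  # 16 codes for the 24 input characters
print(lzw_decode(codes) == "TOBEORNOTTOBEORTOBEORNOT")  # True
```

Note that the decoder never receives the dictionary; it reconstructs the identical table from the code stream itself, which is the property the paragraph above refers to.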
Disadvantages:
Patent Issues: LZW compression was patented in the 1980s, and for many years its use was
subject to licensing fees, which limited its adoption in some applications.
Memory Requirements: LZW compression requires significant memory to maintain the
compression dictionary, which can be a problem for applications with limited memory resources.
Compression Speed: LZW compression can be slower than some other compression algorithms,
particularly for large files, due to the need to constantly update the dictionary.
Limited Applicability: LZW compression is particularly effective for text-based data, but may not
be as effective for other types of data, such as images or video, which have different compression
requirements.