Lossless and Lossy Compression

The document discusses a course on multimedia data compression and storage. It covers topics like lossless and lossy compression techniques, image compression, video compression, data placement on disks, and disk scheduling methods. The course aims to provide understanding of compression basics, categories, and algorithms for text, image, and video data as well as concepts of data availability and streaming services.


CCS353 MULTIMEDIA DATA COMPRESSION AND STORAGE    L T P C: 2 0 2 3
COURSE OBJECTIVES:
To understand the basics of compression techniques
To understand the categories of compression for text, image and video
To explore the modalities of text, image and video compression algorithms
To know about basics of consistency of data availability in storage devices
To understand the concepts of data streaming services
UNIT I BASICS OF DATA COMPRESSION 6
Introduction – Lossless and Lossy Compression – Basics of Huffman coding – Arithmetic coding – Dictionary
techniques – Context-based compression – Applications
UNIT II IMAGE COMPRESSION 6
Lossless Image compression – JPEG – CALIC – JPEG-LS – Prediction using conditional averages – Progressive
Image Transmission – Lossless Image compression formats – Applications – Facsimile encoding
UNIT III VIDEO COMPRESSION 6
Introduction – Motion Compensation – Video Signal Representation – H.261 – MPEG-1 – MPEG-2 – H.263
UNIT IV DATA PLACEMENT ON DISKS 6
Statistical placement on Disks – Striping on Disks – Replication Placement on Disks – Constraint allocation on
Disks – Tertiary storage Devices – Continuous Placement on Hierarchical storage system – Statistical
placement on Hierarchical storage systems – Constraint allocation on Hierarchical storage system
UNIT V DISK SCHEDULING METHODS 6
Scheduling methods for disk requests – Feasibility conditions of concurrent streams – Scheduling methods for
request streams
30 PERIODS

COURSE OUTCOMES:
CO1: Understand the basics of text, Image and Video compression
CO2: Understand the various compression algorithms for multimedia content
CO3: Explore the applications of various compression techniques
CO4: Explore knowledge on multimedia storage on disks
CO5: Understand scheduling methods for request streams
TEXT BOOKS
1. Khalid Sayood, Introduction to Data Compression, 5th Edition, Morgan Kaufmann Series in Multimedia Information and Systems, 2018.
2. Philip K. C. Tse, Multimedia Information Storage and Retrieval: Techniques and Technologies, 2008.
REFERENCES
1. David Salomon, A Concise Introduction to Data Compression, 2008.
2. Lenald Best, Best’s Guide to Live Stream Video Broadcasting, BCB Live Teaching Series, 2017.
3. Yun-Qing Shi, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards, Taylor & Francis, 2019.
4. Irina Bocharova, Compression for Multimedia, Cambridge University Press, 1st Edition, 2009.
Lossless and Lossy Compression
What are lossless and lossy compression?
Lossless and lossy file compression describe whether all original data can be recovered when the file is
uncompressed.
With lossless compression, every bit of data originally in a file remains after it is uncompressed, and all the
information is restored. Lossy compression reduces a file by permanently eliminating certain information,
especially redundant information.

When the file is uncompressed, some of the original information is not there, although the user may not
notice it.
What is file compression?

Digital files such as image files are often "compressed" to reduce their size and/or to change
various attributes, such as:

• file type
• dimensions
• resolution
• bit depth

Compression reduces the size of a file, often without appreciable loss of information. It can be either lossless
or lossy.

A smaller-sized compressed file can be restored to its larger form -- in its entirety or with some data loss,
depending on the compression type -- by decompression.

Lossless compression vs. lossy compression


Lossless compression restores and rebuilds file data in its original form after the file is decompressed. For
example, when a picture is compressed losslessly, its quality remains the same after decompression.

The file can be decompressed to its original quality without any loss of data. This compression method is also
known as reversible compression.

With this method, although file sizes are reduced, the reduction is smaller than what lossy compression
achieves.

In lossy compression, some of the data in a file is removed and cannot be restored to its original form after
decompression. Because data is permanently removed, this method is also known as irreversible compression.

This data loss is not usually noticeable. However, the more a file is compressed, the more degradation occurs,
and the loss eventually becomes visible.

Lossy compression reduces file size much more than lossless compression can.
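As a minimal illustration (a sketch assuming Python's standard-library zlib module, which implements the lossless DEFLATE algorithm; the sample data and the quantization step are made up for this example), lossless compression round-trips to exactly the original bytes, while a lossy step cannot be undone:

import zlib

data = b"AAAAABBBBBCCCCC" * 100                      # highly repetitive input, 1500 bytes

# Lossless: decompressing returns exactly the original bytes.
packed = zlib.compress(data)
assert zlib.decompress(packed) == data
print(len(data), "->", len(packed))                  # the packed form is far smaller

# "Lossy" in miniature: coarse quantization throws detail away for good.
samples = [3, 7, 12, 18, 23]
quantized = [round(s / 10) * 10 for s in samples]    # keep only coarse values
print(quantized)                                     # [0, 10, 10, 20, 20]; the originals cannot be recovered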

Advantages and disadvantages of lossy and lossless compression

Lossy
  Pros: Small file sizes, ideal for web use. Lots of tools, plugins and software support it. Alterations are invisible to the eye.
  Cons: Quality degrades due to the higher rate of compression.

Lossless
  Pros: No loss in quality. Slight decreases in file sizes. Reversible image reconstruction.
  Cons: Compressed files are larger than lossy files.
Lossless and lossy are data compression methods to reduce file sizes.

Applications of lossless and lossy compression


Lossless compression is mainly used to compress:

• images
• sound
• text

It is generally the technique of choice for detailed product images, photography showcases, text files
and spreadsheet files, where the loss of picture quality, words or data (e.g., financial data) could pose a
problem.

The Graphics Interchange Format (GIF), an image format used on the internet, is generally compressed using
lossless compression. RAW, BMP and PNG are also lossless image formats.

Lossy compression is mainly used to compress:

• images
• audio
• video

This technique is preferred for audio and video files where some amount of information loss is acceptable
since it's unlikely to be detected by most users. The JPEG image file, commonly used for photographs and
other complex still images on the web with no transparency, is generally compressed using lossy
compression.

Using JPEG compression, the creator can decide how much loss to introduce, and how much of a trade-off is
acceptable between file size and image quality.

Lossy compression is also suitable for websites featuring JPEG files and fast loading times, since the
compression ratio can be adjusted to strike the right balance between image quality and page size.
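As a rough sketch of that trade-off (assuming the Pillow imaging library is installed; the file name photo.png is a placeholder, not something from the original text), the JPEG quality parameter directly trades file size against image quality:

from PIL import Image
import os

img = Image.open("photo.png").convert("RGB")         # JPEG stores no alpha channel

# Save the same image at several quality settings and compare the file sizes.
for quality in (90, 60, 30):
    out = f"photo_q{quality}.jpg"
    img.save(out, "JPEG", quality=quality)           # lower quality -> smaller file, more artefacts
    print(quality, os.path.getsize(out), "bytes")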

Algorithms used in lossless and lossy compression


Different kinds of algorithms are used to reduce file sizes in lossless and lossy compression.

The algorithms used in lossless compression are:

• Run-Length Encoding (RLE)
• Lempel-Ziv-Welch (LZW)
• Huffman Coding
• Arithmetic Encoding

The algorithms used in lossy compression are:


• Transform Coding (see the sketch after this list)
• Discrete Cosine Transform
• Discrete Wavelet Transform
• Fractal Compression
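To make the transform-coding idea concrete, here is a small sketch (assuming NumPy and SciPy are available; the 8-sample signal and the keep-3-coefficients rule are arbitrary illustrative choices, not part of the original text). The signal is transformed with a Discrete Cosine Transform, small coefficients are discarded (the lossy step), and the inverse transform returns only an approximation of the original:

import numpy as np
from scipy.fft import dct, idct

signal = np.array([52.0, 55, 61, 66, 70, 61, 64, 73])   # e.g. one row of pixel values

# The forward DCT concentrates most of the signal's energy in a few coefficients.
coeffs = dct(signal, norm="ortho")

# Lossy step: keep only the 3 largest-magnitude coefficients, zero out the rest.
keep = np.argsort(np.abs(coeffs))[-3:]
quantized = np.zeros_like(coeffs)
quantized[keep] = coeffs[keep]

# The inverse DCT reconstructs an approximation, not the exact original.
approx = idct(quantized, norm="ortho")
print(np.round(approx, 1))
print("max error:", round(float(np.max(np.abs(approx - signal))), 2))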
Advantages and drawbacks of lossless compression
The key benefit of lossless compression is that the quality of the file (e.g., an image) can be retained while
achieving a smaller file size.

In JPEG and PNG files, this is done by removing unnecessary metadata. For applications where it's important
to retain the quality of the file, lossless compression is the better choice.

The drawback of this technique is that files remain comparatively large: because every bit of the original must
be preserved, the achievable reduction in size is limited.

Advantages and drawbacks of lossy compression


Lossy compression results in a significantly reduced file size (smaller than lossless compression), which is its
most noteworthy benefit. It is supported by many tools, plugins and software products that let the user
choose their preferred degree of compression.

The disadvantage is that it also results in quality loss, which may not be acceptable for some applications or
users. The higher the compression ratio, the more quality degradation. Additionally, the original file -- with
original quality -- cannot be recovered after compressing.

The differences between lossy and lossless compression.

Choosing lossless or lossy data compression comes down to three things: the application it is being used for,
the acceptable level of quality loss and the desired file size.
What is the best choice for compression: Lossless or lossy?
There is no "right" or "best" answer when it comes to choosing between lossless vs. lossy compression. The
choice depends on:

• the application
• the acceptable level of quality loss
• the desired file size

So, for example, a blog or portfolio website could benefit from lossy compression, since it offers significant file
size reduction, saves storage space and improves site performance and user experience. On the other hand, a
website that requires high-quality photos would be better off with lossless compression.

It is also possible to use both types of compression for the same application.

Run-Length Encoding

Run-Length Encoding (RLE) is a basic data compression method that eliminates redundant
information in a dataset by replacing consecutive repeated values with a count and the value itself. It works
on various data types, including text, images, and numerical data.

Run-length encoding (RLE) is a form of lossless compression. It is a simple method of compressing
data by specifying the number of times a character or pixel colour repeats, followed by the value of that
character or pixel. The aim is to reduce the number of bits used to represent a set of data. Reducing the
number of bits used means that the data takes up less storage space and is quicker to transfer.

How Run-Length Encoding Works


RLE works by scanning through the data and identifying consecutive sequences of the same value.
Once a sequence is detected, it is replaced with a pair consisting of the count of the consecutive values and the
value itself. For example, if a sequence of 10 zeros is found, it can be encoded as (10, 0). This significantly
reduces the amount of data required to represent the information without losing any essential information.

Why Run-Length Encoding is Important


Run-Length Encoding offers several benefits in data processing and analytics:

• Data Compression: RLE reduces the size of data, making it more efficient to store and transmit. It is particularly useful for repetitive data or datasets with long sequences of the same value.
• Improved Data Processing: By reducing the volume of data, RLE can speed up data processing operations such as sorting, searching, and analysis. It simplifies data structures and allows for faster computations.
• Reduced Storage Costs: Compressed data requires less storage space, resulting in reduced storage costs for organizations that deal with large datasets.
• Bandwidth Optimization: When transferring data over networks or between systems, RLE can minimize bandwidth requirements, leading to faster data transfers.

Most Important Run-Length Encoding Use Cases


Run-Length Encoding finds applications in various domains, including:
• Image and Video Compression: RLE is widely used for compressing images and video data, where consecutive pixels often have the same value.
• Speech and Audio Compression: RLE can be applied to audio signals to reduce their size while maintaining acceptable audio quality.
• Data Storage and Archiving: RLE can be used to compress data before storing it in databases or archives, optimizing storage space utilization.
• Data Transmission and Communication: RLE can reduce the amount of data transferred during communication, enhancing the efficiency of data transmission over networks.

RLE for text data

The process involves going through the text and counting the number of consecutive occurrences of
each character (called "a run"). The number of occurrences of the character and the character itself are
then stored in pairs.

Example:

aaaabbbbbbcddddd

There are 16 characters in the example so 16 bytes (assuming Extended ASCII is being used) are
needed to store these characters in an uncompressed format:

Example as ASCII: 97 97 97 97 98 98 98 98 98 98 99 100 100 100 100 100

RLE can be used to store that same data using fewer bytes. There are four consecutive occurrences of
the character 'a' followed by six consecutive occurrences of 'b', one 'c' and finally five consecutive
occurrences of 'd'. This could be written as the following sequence of count-character pairs:

Run-length encoding for the above example: (4, a)(6, b)(1, c)(5, d)

So, if the text aaaabbbbbbcddddd is compressed using RLE, storing each count in binary in a single
byte, we would end up with:

04 97 06 98 01 99 05 100

As we can see, this compressed version requires only 8 bytes – a reduction from the original 16 bytes.

Example

Uncompressed text aaaaaaaaaabbbbbbececececececdddddddddddddddecb (46 bytes)

RLE count-character pairs: (10, a)(6, b)(1, e)(1, c)(1, e)(1, c)(1, e)(1, c)(1, e)(1, c)(1, e)(1, c)(1, e)(1, c)(15, d)(1, e)(1, c)(1, b)
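The same procedure can be written as a short Python sketch (the function names are illustrative, not from the original text). Run on the two strings above, it produces the pairs (4, a)(6, b)(1, c)(5, d) and the 18 pairs of the second example; stored as one count byte plus one character byte per pair, that is 8 bytes and 36 bytes respectively:

def rle_encode(text):
    """Return a list of (count, character) pairs for consecutive runs."""
    pairs = []
    i = 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i]:
            run += 1
        pairs.append((run, text[i]))
        i += run
    return pairs

def rle_decode(pairs):
    """Rebuild the original text from (count, character) pairs."""
    return "".join(ch * count for count, ch in pairs)

pairs = rle_encode("aaaabbbbbbcddddd")
print(pairs)                                            # [(4, 'a'), (6, 'b'), (1, 'c'), (5, 'd')]
print([b for count, ch in pairs for b in (count, ord(ch))])
# [4, 97, 6, 98, 1, 99, 5, 100] -- the same 8 bytes as in the worked example above
assert rle_decode(pairs) == "aaaabbbbbbcddddd"          # lossless round trip

Note that every run of length 1 still costs two bytes, which is why the second example shrinks only from 46 to 36 bytes: RLE pays off when runs are long.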

Other Technologies or Terms Related to Run-Length Encoding


While Run-Length Encoding is a standalone compression technique, it can be used in conjunction with other
data processing and compression methods, such as:
• Huffman Coding: RLE can be combined with Huffman coding to achieve higher compression ratios by further reducing the size of the encoded data.
• Lossless Compression: RLE is a lossless compression technique, meaning the original data can be fully recovered without any loss of information.
• Data Lakehouse: Run-Length Encoding can be utilized within a data lakehouse environment to optimize storage and processing efficiency.

Huffman Coding-

• Huffman Coding is a famous greedy algorithm.
• It is used for the lossless compression of data.
• It uses variable-length encoding.
• It assigns a variable-length code to every character.
• The code length of a character depends on how frequently it occurs in the given text.
• The character which occurs most frequently gets the smallest code.
• The character which occurs least frequently gets the largest code.
• It is also known as Huffman Encoding.

Prefix Rule-

• Huffman Coding implements a rule known as the prefix rule.
• This prevents ambiguities while decoding.
• It ensures that the code assigned to any character is not a prefix of the code assigned to any other character.

Major Steps in Huffman Coding-

There are two major steps in Huffman Coding-


1. Building a Huffman Tree from the input characters.
2. Assigning code to the characters by traversing the Huffman Tree.

Huffman Tree-

The steps involved in the construction of Huffman Tree are as follows-

Step-01:

• Create a leaf node for each character of the text.
• The leaf node of a character contains the frequency of occurrence of that character.
Step-02:

• Arrange all the nodes in increasing order of their frequency value.

Step-03:

Considering the two nodes having minimum frequency,

• Create a new internal node.
• The frequency of this new node is the sum of the frequencies of those two nodes.
• Make the first node the left child and the other node the right child of the newly created node.

Step-04:

• Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
• The tree finally obtained is the desired Huffman Tree (sketched in code below).
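The whole construction can be sketched compactly in Python, using the standard-library heapq module as the min-priority queue (the function name and the tie-breaking order are illustrative assumptions; different tie-breaking can produce different, but equally optimal, codes):

import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman Tree from {character: frequency} and return {character: code}."""
    tiebreak = count()                                  # keeps heap comparisons well defined
    # Step-01 / Step-02: one leaf node per character, ordered by frequency in a min-heap.
    heap = [(f, next(tiebreak), (ch, None, None)) for ch, f in freqs.items()]
    heapq.heapify(heap)
    # Step-03 / Step-04: repeatedly merge the two nodes of minimum frequency.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (None, left, right)))
    # Assign codes by traversing the tree: '0' for a left edge, '1' for a right edge.
    codes = {}
    def walk(node, code):
        ch, left, right = node
        if ch is not None:                              # leaf node
            codes[ch] = code or "0"                     # lone-character edge case
        else:
            walk(left, code + "0")
            walk(right, code + "1")
    walk(heap[0][2], "")
    return codes

print(huffman_codes({'a': 4, 'b': 2, 'c': 1, 'd': 1}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'} with this tie-breaking

Applied to the practice problem below, the same routine yields code lengths of 3, 2, 2, 5, 4, 2 and 5 bits for a, e, i, o, u, s and t, though the individual bit patterns may differ from the ones derived there.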

Time Complexity-

The time complexity analysis of Huffman Coding is as follows-


• extractMin( ) is called 2 x (n – 1) times if there are n nodes.
• As extractMin( ) calls minHeapify( ), it takes O(log n) time.

Thus, the overall time complexity of Huffman Coding is O(n log n).


Here, n is the number of unique characters in the given text.

Important Formulas-

The following 2 formulas are important to solve the problems based on Huffman Coding-

Formula-01:

Average code length
= ∑ ( frequencyi x code lengthi ) / ∑ ( frequencyi )

Formula-02:

Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= ∑ ( frequencyi x code lengthi )

PRACTICE PROBLEM BASED ON HUFFMAN CODING-

Problem-

A file contains the following characters with the frequencies as shown. If Huffman Coding is used for data
compression, determine-
1. Huffman Code for each character
2. Average code length
3. Length of Huffman encoded message (in bits)

Characters    Frequencies
a             10
e             15
i             12
o             3
u             4
s             13
t             1

Solution-

First, let us construct the Huffman Tree. The tree is built by repeatedly merging the two nodes with the lowest
frequencies, as described in Steps 01 to 04 above; the diagrams for the intermediate construction steps
(Step-01 to Step-06) are not reproduced here.

Step-07:

Now,

• We assign a weight to each edge of the constructed Huffman Tree.
• Let us assign weight ‘0’ to the left edges and weight ‘1’ to the right edges.

Rule
• If you assign weight ‘0’ to the left edges, then assign weight ‘1’ to the right edges.
• If you assign weight ‘1’ to the left edges, then assign weight ‘0’ to the right edges.
• Either of the two conventions may be followed.
• But the same convention must be followed at the time of decoding that was adopted at the time of encoding.

After assigning weights to all the edges, the modified (labelled) Huffman Tree is obtained.

Now, let us answer each part of the given problem one by one-
1. Huffman Code For Characters-

To write Huffman Code for any character, traverse the Huffman Tree from root node to the leaf node of that
character.
Following this rule, the Huffman Code for each character is-
• a = 111
• e = 10
• i = 00
• o = 11001
• u = 1101
• s = 01
• t = 11000

From here, we can observe-


• Characters occurring less frequently in the text are assigned larger codes.
• Characters occurring more frequently in the text are assigned smaller codes.

2. Average Code Length-

Using formula-01, we have-


Average code length
= ∑ ( frequencyi x code lengthi ) / ∑ ( frequencyi )
= { (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5) } / (10 + 15 + 12 + 3 + 4 + 13 + 1)
= 146 / 58
≈ 2.52 bits per character

3. Length of Huffman Encoded Message-

Using formula-02, we have-


Total number of bits in Huffman encoded message
= ∑ ( frequencyi x code lengthi )
= (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5)
= 146 bits

(Equivalently, total characters x average code length = 58 x 146/58 = 146 bits. Multiplying by the rounded
average, 58 x 2.52 = 146.16, slightly overestimates the exact count.)
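A quick standalone arithmetic check of both formulas for this problem (using the code lengths derived above):

freqs   = {'a': 10, 'e': 15, 'i': 12, 'o': 3, 'u': 4, 's': 13, 't': 1}
lengths = {'a': 3,  'e': 2,  'i': 2,  'o': 5, 'u': 4, 's': 2,  't': 5}

total_bits = sum(freqs[c] * lengths[c] for c in freqs)   # Formula-02
avg_length = total_bits / sum(freqs.values())            # Formula-01
print(total_bits, round(avg_length, 2))                  # 146 2.52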
LZW (LEMPEL–ZIV–WELCH) COMPRESSION TECHNIQUE

The LZW algorithm is a very common compression technique. It is typically used in GIF, optionally in PDF
and TIFF, and in Unix's 'compress' command, among other uses. It is lossless, meaning no data is lost when
compressing. The algorithm is simple to implement and has the potential for very high throughput in
hardware implementations.
The idea relies on recurring patterns to save space. LZW is a foremost technique for general-purpose data
compression due to its simplicity and versatility, and it is the basis of many PC utilities that claim to
“double the capacity of your hard drive”.
How does it work?
LZW compression works by reading a sequence of symbols, grouping the symbols into strings,
and converting the strings into codes. Because the codes take up less space than the strings they
replace, we get compression. Characteristic features of LZW include:
• LZW compression uses a code table, with 4096 as a common choice for the number of table entries. Codes 0-255 in the code table are always assigned to represent single bytes from the input file.
• When encoding begins, the code table contains only the first 256 entries, with the remainder of the table being blank. Compression is achieved by using codes 256 through 4095 to represent sequences of bytes.
• As the encoding continues, LZW identifies repeated sequences in the data and adds them to the code table.
• Decoding is achieved by taking each code from the compressed file and translating it through the code table to find what character or characters it represents.
Example: ASCII code. Typically, every character is stored with 8 binary bits, allowing up to 256 unique
symbols. This algorithm extends the library to 9- to 12-bit codes, where the new symbols represent
combinations of symbols that occurred previously in the string. LZW does not always compress well,
especially with short, diverse strings, but it is good at compressing redundant data, and it does not have to
store the dictionary with the data: the decoder rebuilds the dictionary on its own, so the method can both
compress and uncompress data.

Example 1: Use the LZW algorithm to compress the string: BABAABAAA


The steps involved are systematically shown in the diagram below.
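Since the idea is easy to express in code, here is a minimal sketch of the encoder in Python (the function name is illustrative; codes are listed in the order they are emitted). Running it on BABAABAAA reproduces the sequence <66><65><256><257><65><260>:

def lzw_compress(text):
    """LZW encoder: return the list of output codes for an ASCII string."""
    table = {chr(i): i for i in range(256)}   # codes 0-255: single characters
    next_code = 256
    w = ""
    out = []
    for ch in text:
        if w + ch in table:                   # keep extending the current match
            w += ch
        else:
            out.append(table[w])              # emit the code for the longest match
            table[w + ch] = next_code         # add the new string to the table
            next_code += 1
            w = ch
    if w:
        out.append(table[w])                  # flush the final match
    return out

print(lzw_compress("BABAABAAA"))              # [66, 65, 256, 257, 65, 260]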
LZW Decompression
The LZW decompressor creates the same string table during decompression. It starts with the first
256 table entries initialized to single characters.
Example 2 (LZW Decompression): Use LZW to decompress the output sequence <66><65><256><257><65><260>.
The steps involved are systematically shown in the diagram below.
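A matching decoder sketch (again, the function name is illustrative) rebuilds the table on the fly, including the standard special case where a code is not yet in the table, and recovers BABAABAAA from the sequence above:

def lzw_decompress(codes):
    """LZW decoder: rebuild the string table while translating codes."""
    table = {i: chr(i) for i in range(256)}    # codes 0-255: single characters
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                                  # code not in the table yet: it must be w + w's first character
            entry = w + w[0]
        out.append(entry)
        table[next_code] = w + entry[0]        # the same entry the encoder added at this step
        next_code += 1
        w = entry
    return "".join(out)

print(lzw_decompress([66, 65, 256, 257, 65, 260]))   # prints BABAABAAA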

Advantages of LZW over Huffman:


• LZW requires no prior information about the input data stream.
• LZW can compress the input stream in one single pass.
• Another advantage of LZW is its simplicity, allowing fast execution.
• High compression ratio: LZW can achieve high compression ratios, particularly for text-based data, which can significantly reduce file sizes and storage requirements.
• Fast decompression: LZW decompression is typically faster than many other compression algorithms, making it a good choice for applications where decompression speed is critical.
• Universal adoption: LZW is widely used and supported across a variety of software applications and operating systems, making it a popular choice for compression and decompression.
• Dynamic compression: LZW uses a dynamic, adaptive compression scheme, meaning it adapts to the data being compressed, which allows it to achieve high compression ratios even for data with repetitive patterns.

Disadvantages:

• Patent issues: LZW compression was patented in the 1980s, and for many years its use was subject to licensing fees, which limited its adoption in some applications (the patents have since expired).
• Memory requirements: LZW compression requires significant memory to maintain the compression dictionary, which can be a problem for applications with limited memory resources.
• Compression speed: LZW compression can be slower than some other compression algorithms, particularly for large files, due to the need to constantly update the dictionary.
• Limited applicability: LZW compression is particularly effective for text-based data, but may not be as effective for other types of data, such as images or video, which have different compression requirements.
