DC (Ca 1)
SECTION-A
1. Lossless Compression: This method reduces file size without losing any information. It works by
identifying and eliminating statistical redundancy. Common algorithms include Huffman coding, Lempel-Ziv-Welch (LZW), and Run-Length Encoding (RLE).
2. Lossy Compression: This method reduces file size by discarding less important information, so the decompressed data is not identical to the original. Common formats
include JPEG for images, MP3 for audio, and MPEG for video.
Fidelity
Fidelity refers to the degree to which a copy or reproduction accurately represents the original. In data
compression, high fidelity means that the compressed and then decompressed data closely matches the original
data. Fidelity is crucial in applications where precise replication of the original data is necessary, such as in
medical imaging or scientific data.
Quality
Quality refers to the overall excellence or superiority of the reproduced data. In the context of media, quality
often involves subjective assessments of how good the reproduced image, sound, or video appears or sounds to
human senses. Quality can be influenced by factors such as resolution, color accuracy, and sound clarity.
Key Points:
High Probability Events: If an event is very likely (high probability), its self-information is low because
it is not surprising.
Low Probability Events: If an event is unlikely (low probability), its self-information is high because it is
surprising.
Example: A fair coin toss has P(heads) = 0.5, so its self-information is -log2(0.5) = 1 bit. Rolling a six on a fair die has P = 1/6, giving -log2(1/6) ≈ 2.585 bits, reflecting the greater surprise of the rarer outcome.
Q9) The difference between the data and model is often referred to as?
Ans) The difference between the data and the model is often referred to as the residual or error.
Q10) The average self-information associated with a random experiment is called?
Ans) The average self-information associated with a random experiment is called entropy. In information
theory, entropy quantifies the expected amount of information produced by a stochastic source of data. It
measures the uncertainty or unpredictability of the random variable’s outcomes.
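The two definitions above can be sketched in a few lines of Python (the probabilities used here are purely illustrative):

```python
import math

def self_information(p: float) -> float:
    """Self-information of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(probs) -> float:
    """Entropy: the average self-information over all outcomes, in bits."""
    return sum(p * self_information(p) for p in probs if p > 0)

# A near-certain event carries little information; a rare one carries a lot.
print(self_information(0.99))  # low (not surprising)
print(self_information(0.01))  # high (surprising)
# A fair coin is maximally unpredictable for two outcomes: 1 bit of entropy.
print(entropy([0.5, 0.5]))     # 1.0
```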
SECTION-B
Q1) What do you mean by data compression? Compare lossless and lossy data compression
techniques in detail.
Ans) Data compression is the process of encoding information using fewer bits than the original representation.
The primary goal is to reduce the size of data files, making them easier to store and transmit. Compression can
be achieved through various algorithms that identify and eliminate redundancy in the data.
There are two main types of data compression: lossless and lossy.
Lossless compression reduces file size without losing any information. When the data is decompressed, it is
restored to its original form. This type of compression is essential for applications where data integrity is crucial,
such as text files, executable files, and certain types of images.
Common Algorithms:
Huffman Coding: Uses variable-length codes to represent symbols based on their frequencies.
Lempel-Ziv-Welch (LZW): Builds a dictionary of sequences encountered in the data.
DEFLATE: Combines LZ77 and Huffman coding, used in formats like ZIP and PNG.
Example:
Original: AAAAAABBBCCDAA
Compressed: 6A3B2C1D2A
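The example above is run-length encoding; a minimal sketch of the idea (not a production encoder) is:

```python
from itertools import groupby

def rle_encode(s: str) -> str:
    """Run-length encoding: each run of a repeated symbol becomes count+symbol."""
    return "".join(f"{len(list(group))}{symbol}" for symbol, group in groupby(s))

print(rle_encode("AAAAAABBBCCDAA"))  # -> 6A3B2C1D2A
```

Decoding simply replays each count, which is why no information is lost.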
Lossy compression reduces file size by removing less important information, which can result in some loss of
quality. This type of compression is often used for multimedia files like images, audio, and video, where a
perfect reproduction is not necessary, and some loss of quality is acceptable.
Common Algorithms:
JPEG: Used for images, it reduces file size by discarding less noticeable details.
MP3: Used for audio, it removes frequencies that are less audible to the human ear.
MPEG: Used for video, it reduces file size by removing redundant frames and compressing the remaining
data.
Comparison:
Aspect               Lossless                         Lossy
Data recovery        Exact original restored          Some information permanently lost
Typical data         Text, executables, archives      Images, audio, video
Common formats       ZIP, PNG (DEFLATE), LZW          JPEG, MP3, MPEG
Compression ratio    Lower                            Higher
Q2) Define the prefix code and UDC code and also determine whether the given code is UDC or not.
1) {0,01,11,111}
2) {0,01,110,111}
Ans) A prefix code is a code in which no codeword is a prefix of any other codeword. Because of this property, each codeword can be recognized as soon as its last bit arrives, so the encoded stream can be decoded instantaneously without lookahead; Huffman codes are a standard example of prefix coding.
A Uniquely Decodable Code (UDC) is a code in which every encoded message can be decoded in exactly one
way. All prefix codes are uniquely decodable, but not all uniquely decodable codes are prefix codes. UDCs
ensure that there is a one-to-one mapping between the encoded and decoded messages.
Determining if the Given Codes are UDC
Let’s analyze the given codes to determine if they are uniquely decodable:
Code Set 1: {0, 01, 11, 111}. The codeword 0 is a prefix of 01, so this is not a prefix code. Applying the Sardinas-Patterson test, 0 prefixing 01 leaves the dangling suffix 1; 1 in turn prefixes 111, leaving the dangling suffix 11, which is itself a codeword. A dangling suffix equal to a codeword means the code is not uniquely decodable; for example, the string 0111 can be decoded as 0, 111 or as 01, 11.
Code Set 2: {0, 01, 110, 111}. Again 0 is a prefix of 01, so it is not a prefix code. The Sardinas-Patterson test produces the dangling suffixes 1, then 10 and 11, then 0 and 1; since 0 is a codeword, this code is not uniquely decodable either. For example, the string 01110 can be decoded as 01, 110 or as 0, 111, 0.
Summary
Code Set 1: {0, 01, 11, 111} is neither a prefix code nor a uniquely decodable code.
Code Set 2: {0, 01, 110, 111} is neither a prefix code nor a uniquely decodable code.
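The analysis above can be automated with a small Sardinas-Patterson checker (the function names here are my own):

```python
def dangling_suffixes(a, b):
    """Suffixes left over when a word in `a` is a proper prefix of a word in `b`."""
    return {y[len(x):] for x in a for y in b
            if y.startswith(x) and len(y) > len(x)}

def is_uniquely_decodable(code):
    """Sardinas-Patterson test: the code is UD iff no dangling suffix
    generated from the codewords is itself a codeword."""
    c = set(code)
    s = dangling_suffixes(c, c)
    seen = set()
    while s and not (s & c):
        seen |= s  # track suffixes already examined so the loop terminates
        s = (dangling_suffixes(c, s) | dangling_suffixes(s, c)) - seen
    return not (s & c)

print(is_uniquely_decodable({"0", "01", "11", "111"}))   # False
print(is_uniquely_decodable({"0", "01", "110", "111"}))  # False
print(is_uniquely_decodable({"0", "10", "110", "111"}))  # a prefix code -> True
```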
Q3) Explain physical model and probability models with example in detail.
Ans)
Physical Model in Data Compression
A physical model in data compression refers to the tangible representation of the data and its structure. This
model is used to understand how data is stored, organized, and accessed at the physical level. It involves the
actual implementation details of the compression algorithm, including how data is encoded, stored, and retrieved.
Example:
Discrete Cosine Transform (DCT): This is a physical model that transforms the image data from the
spatial domain to the frequency domain. It helps in identifying the parts of the image that can be
compressed more effectively.
Quantization: This step reduces the precision of the transformed coefficients, which is a physical
representation of the data reduction process.
Entropy Coding: Techniques like Huffman coding or arithmetic coding are used to further compress the
quantized coefficients by representing frequently occurring patterns with shorter codes.
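These three steps can be illustrated on a single 1-D block of samples (the signal values and quantization step below are made up for illustration; real JPEG works on 8x8 2-D blocks):

```python
import math

def dct2(x):
    """Unnormalized 1-D DCT-II: X_k = sum_n x_n * cos(pi * k * (2n + 1) / (2N))."""
    big_n = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * big_n))
                for n in range(big_n))
            for k in range(big_n)]

# A smooth block of samples, like a slowly varying gradient in an image row.
block = [10, 12, 14, 16, 18, 20, 22, 24]
coeffs = dct2(block)  # energy concentrates in the low-frequency coefficients

# Quantization: divide by a step size and round. Small high-frequency
# coefficients round to zero, and those zeros are what entropy coding
# later exploits to shrink the file.
step = 8
quantized = [round(c / step) for c in coeffs]
print(quantized)  # -> [17, -3, 0, 0, 0, 0, 0, 0]
```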
Probability Model in Data Compression
A probability model in data compression involves using statistical methods to predict the likelihood of different
data patterns. This model helps in designing efficient coding schemes by assigning shorter codes to more
probable events and longer codes to less probable events.
Example:
Frequency Analysis: The algorithm starts by analyzing the frequency of each symbol in the data. This
frequency analysis forms the basis of the probability model.
Tree Construction: A binary tree is constructed where each leaf node represents a symbol, and the path
from the root to the leaf represents the code for that symbol. Symbols with higher probabilities are placed
closer to the root, resulting in shorter codes.
Encoding: The data is then encoded using the generated codes, which minimizes the overall length of the
encoded data based on the probability model.
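The three steps above can be sketched with a bare-bones Huffman coder (an illustration, not an optimized implementation; it assumes at least two distinct symbols):

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a Huffman code table: frequent symbols get shorter codes."""
    freq = Counter(text)                      # Step 1: frequency analysis
    # Heap entries: (weight, tie-break index, {symbol: partial code}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:                      # Step 2: tree construction
        n1, _, t1 = heapq.heappop(heap)       # two least-probable subtrees
        n2, _, t2 = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in t1.items()}
        merged.update({ch: "1" + c for ch, c in t2.items()})
        heapq.heappush(heap, (n1 + n2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes("AAAAAABBBCCD")
encoded = "".join(codes[ch] for ch in "AAAAAABBBCCD")  # Step 3: encoding
print(codes, len(encoded))
```

Since A occurs most often it receives a 1-bit code, while the rare C and D receive 3-bit codes, minimizing the total encoded length.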
Summary
Physical Models: Focus on the actual implementation and storage of compressed data, such as the steps in
JPEG compression.
Probability Models: Use statistical methods to predict data patterns and design efficient coding schemes,
such as in Huffman coding.
Q4) Consider the following sequence: 12123333123333123312. Find the entropy of the given sequence.
Ans) To find the entropy of the sequence 12123333123333123312, we follow these steps:
Step 1: Count the Occurrences of Each Digit (sequence length = 20)
1: appears 5 times
2: appears 5 times
3: appears 10 times
Step 2: Compute the Probabilities
P(1) = 5/20 = 0.25
P(2) = 5/20 = 0.25
P(3) = 10/20 = 0.5
Step 3: Apply the Entropy Formula
H = -Σ P(x) log2(P(x))
1. For digit 1:
H1 = -P(1) log2(P(1)) = -0.25 log2(0.25) = 0.5
2. For digit 2:
H2 = -P(2) log2(P(2)) = -0.25 log2(0.25) = 0.5
3. For digit 3:
H3 = -P(3) log2(P(3)) = -0.5 log2(0.5) = 0.5
Final Result
H = 0.5 + 0.5 + 0.5 = 1.5 bits per symbol
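The entropy of the sequence can also be computed directly in Python:

```python
import math
from collections import Counter

seq = "12123333123333123312"
counts = Counter(seq)  # tally how often each digit appears
n = len(seq)

# H = -sum over symbols of P(x) * log2(P(x))
entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
print(counts, entropy)
```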