Rakib Project

Design Report of Integrated Application, WIT
Design Report of Integrated Application
Topic: Text Compression and Decompression
ID: 2214010017
Class: Class 1, CS 202X, SIE
Name: Ashfi Nazmus Rakib
Score:
Teacher: Prof. Jun Zhang
Design Time: Weeks 12 to 15, Semester 2, 2023-2024 Academic Year
Integrated Design Report, School of Computer Science and Engineering, Wuhan Institute of Technology
Table of Contents
Abstract
Chapter 1 Introduction
Chapter 2 System Design
2.1 Requirements
2.2 Main Functions
Chapter 3 Implementation
3.1 Implementations of Key Algorithms
3.2 Implementations of Main Data Structures
Chapter 4 Test and Result Analysis
Conclusion
Acknowledgement
References
Abstract
In this project, I aim to create a text compression and
decompression system based on Huffman coding. The system
analyzes the frequency of characters in the input text,
constructs a Huffman tree, and generates Huffman codes for
compression. The encoded text and the Huffman codes are
saved to files; the encoded text is then read back from its
file and decoded to its original form for comparison. The
compression rate is calculated, and all the Huffman codes are
displayed. In short, this is a Huffman coding system that
compresses and decompresses text by using a frequency-based
algorithm to create an optimal prefix code.
Chapter 1 Introduction
Text compression is a fundamental aspect of data storage and transmission
efficiency. Huffman coding is a widely used algorithm for lossless data
compression. By constructing a Huffman tree based on character frequencies,
we can assign a unique prefix-free code to each character, reducing the
overall size of the text data.
Data compression is an essential technique in computer science and
information theory, playing a vital role in efficient data storage and
transmission. With the rapid growth of digital data, the need for
effective compression algorithms has become increasingly important.
Huffman coding, named after its inventor David A. Huffman, yields an
optimal prefix code when each symbol is encoded separately with a whole
number of bits.
Huffman coding was first introduced in 1952 by David A. Huffman in his
seminal paper "A Method for the Construction of Minimum-Redundancy
Codes." The algorithm is based on the principle of variable-length codes,
where more frequent characters are assigned shorter codes and less
frequent characters are assigned longer codes. For a given set of
character frequencies, this minimizes the total length of the encoded
data among symbol-by-symbol prefix codes, which saves storage space and
bandwidth.
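To make the variable-length principle concrete, the following minimal sketch compares a fixed-length encoding with a hand-picked prefix code for a short string. The sample text, the code table, and the helper names are assumptions chosen for illustration; they are not part of the project code.

```python
from collections import Counter

text = "aaaabbc"
# Hypothetical prefix code chosen by hand: the most frequent character
# gets the shortest code, and no code is a prefix of another.
prefix_code = {"a": "0", "b": "10", "c": "11"}


def fixed_length_bits(text):
    """Bits needed when every distinct character gets a code of equal length."""
    distinct = len(set(text))
    bits_per_symbol = max(1, (distinct - 1).bit_length())
    return bits_per_symbol * len(text)


def variable_length_bits(text, code):
    """Bits needed when each character is replaced by its own code."""
    counts = Counter(text)
    return sum(counts[ch] * len(code[ch]) for ch in counts)


print(fixed_length_bits(text))
print(variable_length_bits(text, prefix_code))
```

Here the fixed-length encoding needs 2 bits for each of the 7 characters (14 bits), while the prefix code needs 4*1 + 2*2 + 1*2 = 10 bits.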
The applications of Huffman coding are extensive and include, but are not
limited to:
• File Compression: Huffman coding is the entropy-coding stage of DEFLATE,
the method behind ZIP and gzip files, where it shrinks data without losing
any information.
• Network Communication: In network protocols, Huffman coding reduces the
amount of data transmitted, saving bandwidth and improving transmission
speed. For example, HTTP/2 header compression (HPACK) uses a static
Huffman code for header strings.
• Data Storage: Storage systems that compress data with Huffman-based
methods use less physical space, which is particularly useful in
large-scale storage systems.
• Image and Video Processing: In multimedia applications, Huffman coding
serves as the entropy-coding step of image and video formats, for example
PNG (through DEFLATE) and baseline JPEG, making files easier to share and
distribute.
• Error Detection and Correction: Huffman coding itself offers no error
protection, and a single flipped bit can corrupt the rest of the decoded
stream, so compressed data is usually combined with checksums or
error-correcting codes.
This project aims to implement a Huffman coding system that can compress
and decompress text data. We will explore the algorithm's theoretical
foundations, design and implement the necessary components, and
evaluate the system's performance in terms of compression rate and
efficiency. By the end of this project, readers will have a clear
understanding of how Huffman coding works and its practical applications
in data compression.
Chapter 2 System Design
2.1 Requirements
• Read input text and compute character frequencies.
• Construct a Huffman tree based on these frequencies.
• Generate Huffman codes for each character.
• Encode the original text using these codes.
• Save the encoded text and codes to a file.
• Decode the text using the saved codes.
• Save the decoded text to a file and verify its integrity.
• Calculate the compression rate.
2.2 Main Functions
• calculate_frequencies(text): Calculate character frequencies.
• build_huffman_tree(frequencies): Build the Huffman tree.
• generate_codes(root, current_code): Generate Huffman codes.
• encode_text(text, codes): Encode the text with the Huffman codes.
• decode_text(encoded_text, root): Decode the text by walking the
Huffman tree from its root.
• save_to_file(data, filename): Save data to a file.
• load_from_file(filename): Load data from a file.
• compute_compression_rate(original_size, encoded_size): Calculate the
compression rate.
Chapter 3 Implementation
3.1 Implementations of Key Algorithms
import heapq
import json
from collections import Counter


class HuffmanNode:
    """Tree node: leaves hold a character, internal nodes hold None."""

    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        # heapq orders nodes by frequency
        return self.freq < other.freq


def calculate_frequencies(text):
    # Counter counts in a single pass; text.count() per character is quadratic
    return dict(Counter(text))


def build_huffman_tree(frequencies):
    if not frequencies:
        return None  # empty input has no tree
    priority_queue = [HuffmanNode(char, freq) for char, freq in frequencies.items()]
    heapq.heapify(priority_queue)
    # Repeatedly merge the two least frequent nodes until one root remains
    while len(priority_queue) > 1:
        left = heapq.heappop(priority_queue)
        right = heapq.heappop(priority_queue)
        merged = HuffmanNode(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(priority_queue, merged)
    return priority_queue[0]


def generate_codes(root, current_code="", code_map=None):
    # A mutable default ({}) would be shared between calls, so create it here
    if code_map is None:
        code_map = {}
    if root is None:
        return code_map
    if root.char is not None:
        # A tree with a single leaf would otherwise get the empty code
        code_map[root.char] = current_code or "0"
        return code_map
    generate_codes(root.left, current_code + "0", code_map)
    generate_codes(root.right, current_code + "1", code_map)
    return code_map


def encode_text(text, codes):
    return ''.join(codes[char] for char in text)


def decode_text(encoded_text, root):
    if root is None:
        return ""
    if root.char is not None:
        # Single-symbol tree: every bit stands for the same character
        return root.char * len(encoded_text)
    decoded_chars = []
    current_node = root
    for bit in encoded_text:
        current_node = current_node.left if bit == '0' else current_node.right
        if current_node.char is not None:  # reached a leaf
            decoded_chars.append(current_node.char)
            current_node = root
    return ''.join(decoded_chars)


def save_to_file(data, filename):
    with open(filename, 'w', encoding='utf-8') as file:
        if isinstance(data, dict):  # Save codes as JSON for readability
            json.dump(data, file)
        else:
            file.write(data)


def load_from_file(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        return file.read()


def compute_compression_rate(original_size, encoded_size):
    # Original size divided by encoded size; above 1 means the encoding is smaller
    return original_size / encoded_size if encoded_size > 0 else 0
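The save_to_file function writes the encoded text as '0' and '1' characters, so each encoded bit occupies a full byte on disk and the saved file is larger than the original text. One possible way to store real bits is sketched below. The helpers pack_bits and unpack_bits are hypothetical and not part of the project code, and the sketch assumes the bit count is stored alongside the payload so that the padding can be dropped when reading.

```python
def pack_bits(bit_string):
    """Pack a string of '0'/'1' characters into bytes.

    Returns (payload, bit_count); bit_count lets the reader drop the padding.
    """
    bit_count = len(bit_string)
    # Pad with zeros up to the next multiple of 8 bits
    padded = bit_string + "0" * (-bit_count % 8)
    payload = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return payload, bit_count


def unpack_bits(payload, bit_count):
    """Inverse of pack_bits: rebuild the '0'/'1' string without the padding."""
    bits = "".join(format(byte, "08b") for byte in payload)
    return bits[:bit_count]


payload, bit_count = pack_bits("1011001110")
print(len(payload), bit_count)
print(unpack_bits(payload, bit_count))
```

The payload would be written with a file opened in binary mode ('wb') rather than through save_to_file, which opens files in text mode.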
3.2 Implementations of Main Data Structures
The main data structures used are the HuffmanNode class
for the nodes of the Huffman tree, dictionaries for
character frequencies and Huffman codes, and a
heapq-based priority queue of nodes while building the tree.
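The priority queue deserves a closer look, because heapq needs its entries to be comparable. The project code achieves this with HuffmanNode.__lt__. A common alternative, sketched here under the assumption that nested tuples are an acceptable tree layout, stores (frequency, insertion order, node) tuples so that ties are broken explicitly and the nodes themselves are never compared. The function name build_tree_with_tuples is hypothetical.

```python
import heapq
from itertools import count


def build_tree_with_tuples(frequencies):
    """Build a Huffman tree as nested tuples (hypothetical alternative layout).

    A leaf is a single character; an internal node is a (left, right) pair.
    """
    order = count()  # tie-breaker so equal frequencies never compare the nodes
    heap = [(freq, next(order), char) for char, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        freq_a, _, node_a = heapq.heappop(heap)
        freq_b, _, node_b = heapq.heappop(heap)
        heapq.heappush(heap, (freq_a + freq_b, next(order), (node_a, node_b)))
    return heap[0][2] if heap else None


print(build_tree_with_tuples({"a": 4, "b": 2, "c": 1}))
```

With this layout the class-based node is not needed, at the cost of a less self-describing tree.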
Chapter 4 Test and Result Analysis
Generate sample text data for compression and decompression testing.
Compare the original text with the decompressed text to verify data
restoration.
Calculate the compression rate as (size of original text in bits) /
(size of encoded text in bits), the value returned by
compute_compression_rate; a value above 1 means the encoded text is
smaller than the original.
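As a worked check of the size arithmetic for the sample sentence used in the test, the sketch below computes the Huffman-encoded length from the character counts alone. It relies on the fact that the sum of all merged weights equals the total encoded length. The helper huffman_encoded_bits is hypothetical and not part of the project code, and the one-bit-per-character fallback for a single distinct symbol is an assumption.

```python
import heapq
from collections import Counter


def huffman_encoded_bits(text):
    """Total Huffman-encoded length in bits (hypothetical helper)."""
    weights = list(Counter(text).values())
    if len(weights) <= 1:
        # Zero or one distinct symbol: assume one bit per character
        return len(text)
    heapq.heapify(weights)
    total_bits = 0
    while len(weights) > 1:
        # Each merge adds one bit to every character below the merged node
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total_bits += merged
        heapq.heappush(weights, merged)
    return total_bits


text = "The quick brown fox jumps over the lazy dog."
original_bits = len(text) * 8  # 8 bits per character, assuming ASCII
encoded_bits = huffman_encoded_bits(text)
print(original_bits, encoded_bits)
print(round(original_bits / encoded_bits, 2))        # original / encoded
print(round(encoded_bits / original_bits * 100, 1))  # encoded as % of original
```

For this sentence the sketch reports 352 original bits and 201 encoded bits, which is a ratio of about 1.75, or an encoded size of about 57.1% of the original. These figures count only the bit string; the stored code table adds to the real file size.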
def test_huffman_coding_system():
    text = "The quick brown fox jumps over the lazy dog."
    frequencies = calculate_frequencies(text)
    root = build_huffman_tree(frequencies)
    codes = generate_codes(root)
    save_to_file(codes, 'huffman_codes.json')  # Save codes for decoding
    encoded_text = encode_text(text, codes)
    save_to_file(encoded_text, 'file1.txt')
    print("Encoded text:", encoded_text)

    # Read both files back so the round trip goes through the saved data
    loaded_codes = json.loads(load_from_file('huffman_codes.json'))
    loaded_encoded_text = load_from_file('file1.txt')
    decoded_text = decode_text(loaded_encoded_text, root)
    save_to_file(decoded_text, 'file2.txt')
    print("Decoded text:", decoded_text)
    assert decoded_text == text, "decoded text differs from the original"

    original_size = len(text) * 8  # Size in bits assuming ASCII
    encoded_size = len(encoded_text)  # one '0'/'1' character per encoded bit
    compression_rate = compute_compression_rate(original_size, encoded_size)
    print("Compression rate:", compression_rate)
    print("Huffman Codes:", loaded_codes)


# Run the test
test_huffman_coding_system()
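The test decodes with the in-memory tree, while the requirements call for decoding from the saved codes. A minimal sketch of decoding from the code table alone follows; decode_with_codes is a hypothetical helper, and the small code table is hand-written for illustration rather than produced by the project code.

```python
def decode_with_codes(encoded_text, codes):
    """Decode a '0'/'1' string using only a {char: code} table."""
    code_to_char = {code: char for char, code in codes.items()}
    decoded_chars = []
    buffer = ""
    for bit in encoded_text:
        buffer += bit
        # Codes are prefix-free, so the first match is the only possible match
        if buffer in code_to_char:
            decoded_chars.append(code_to_char[buffer])
            buffer = ""
    if buffer:
        raise ValueError("encoded text ends in the middle of a code")
    return "".join(decoded_chars)


codes = {"a": "0", "b": "10", "c": "11"}  # hand-written prefix code
encoded = "0" + "10" + "11" + "0"         # encodes "abca"
print(decode_with_codes(encoded, codes))
```

A table loaded with json.loads has the same {char: code} shape, so it could be passed to such a helper directly.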
Conclusion
In conclusion, Huffman coding provides an effective method
for text compression and decompression, reducing data size
while maintaining data integrity. This project demonstrated
the practical implementation of Huffman coding for text
compression.
Acknowledgement
I acknowledge the support and resources used in creating this
project.
References
Sources on the theory and implementation of Huffman coding:
[1] David A. Huffman, "A Method for the Construction of Minimum-Redundancy
Codes," Proceedings of the IRE, 1952. The original paper that introduced
Huffman coding.
[2] Thomas M. Cover and Joy A. Thomas, "Elements of Information Theory,"
John Wiley & Sons, Inc. A comprehensive treatment of information theory,
including Huffman coding.
[3] David Salomon, "Data Compression: The Complete Reference," Springer.
A standard reference for data compression techniques, including Huffman
coding.