Rakib Project


Design Report of Integrated Application, WIT

Design Report of Integrated Application

Topic: Text Compression and Decompression
ID: 2214010017
Class: Class 1, CS 202X, SIE
Name: Ashfi Nazmus Rakib
Score:
Teacher: Prof. Jun Zhang
Design Time: Week 12~15, Semester 2, 2023~2024 Academic Year

School of Computer Science and Engineering, Wuhan Institute of Technology: Integrated Design Report

Table of Contents

Abstract
Chapter 1 Introduction
Chapter 2 System Design
2.1 Requirements
2.2 Main Functions
Chapter 3 Implementation
3.1 Implementations of Key Algorithms
3.2 Implementations of Main Data Structures
Chapter 4 Test and Result Analysis
Conclusion
Acknowledgement
References


Abstract

In this project, I aim to create a compression and decompression
system based on Huffman coding. The system analyzes the frequency of
characters in the input text to construct a Huffman tree and generate
Huffman codes for compression. The compressed text and the Huffman
codes are saved to files, and the encoded text is then read from the
file and decoded back to its original form for comparison. The
compression rate is calculated, and all the Huffman codes are
displayed. In short, this is a Huffman coding system that compresses
and decompresses text by using a frequency-based algorithm to create
an optimal prefix code.


Chapter 1 Introduction
Text compression is fundamental to efficient data storage and transmission.
Huffman coding is a widely used algorithm for lossless data compression. By
constructing a Huffman tree from character frequencies, we can assign each
character a unique variable-length code, which reduces the overall size of
the text when some characters occur more often than others.

Data compression is an essential technique in computer science and
information theory. As the volume of digital data grows, effective
compression algorithms become increasingly important. Huffman coding,
named after its inventor David A. Huffman, produces a prefix code with the
minimum expected length among codes that assign a whole number of bits to
each symbol.

Huffman coding was first introduced in 1952 by David A. Huffman in his
seminal paper "A Method for the Construction of Minimum-Redundancy
Codes." The algorithm is based on the principle of variable-length codes:
more frequent characters are assigned shorter codes, and less frequent
characters are assigned longer codes. For the given character frequencies,
this minimizes the total length of the encoded data, which makes efficient
use of storage space and bandwidth.
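
To make the variable-length principle concrete, the sketch below compares a fixed-length code with a variable-length prefix code on a small frequency table. The symbol counts and the code table are made-up illustrative values, not data from this project, and the helper names are hypothetical.

```python
# Hypothetical symbol counts and a hand-built prefix code (illustrative only).
counts = {"a": 5, "b": 2, "c": 1, "d": 1}
prefix_code = {"a": "0", "b": "10", "c": "110", "d": "111"}

def total_bits(counts, code):
    """Total number of bits needed to encode every symbol occurrence."""
    return sum(counts[sym] * len(code[sym]) for sym in counts)

def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

fixed_bits = sum(counts.values()) * 2          # 4 symbols need 2 bits each
variable_bits = total_bits(counts, prefix_code)

print(fixed_bits, variable_bits)               # prints: 18 15
print(is_prefix_free(prefix_code))             # prints: True
```

The frequent symbol "a" gets a 1-bit code and the rare symbols get 3-bit codes, so the same nine symbols take 15 bits instead of 18.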


The applications of Huffman coding are extensive and include, but are not
limited to:

File Compression: Huffman coding is one stage of DEFLATE, the method behind
ZIP, gzip, and PNG, where it reduces file size without losing any information.

Network Communication: Protocols can apply Huffman coding to reduce the amount
of data sent over the network and save bandwidth. HTTP/2 header compression
(HPACK), for example, uses a static Huffman code for header strings.

Data Storage: Storage systems benefit from Huffman coding by using less
physical space to store data, which is particularly useful in large-scale
storage systems.

Image and Video Processing: Multimedia formats use Huffman coding as a lossless
final stage. Baseline JPEG, for example, Huffman-codes its quantized
coefficients, making images easier to share and distribute.

Error Detection and Correction: Huffman coding does not detect or correct
errors by itself. In practice it is combined with separate error-detecting or
error-correcting codes, since a single flipped bit can corrupt the decoding of
the symbols that follow.

This project aims to implement a Huffman coding system that can compress
and decompress text data. The report covers the algorithm's theoretical
foundations, the design and implementation of the necessary components, and
an evaluation of the system in terms of compression rate and correctness of
the restored text. By the end of this report, readers should have a clear
understanding of how Huffman coding works and of its practical applications
in data compression.


Chapter 2 System Design

2.1 Requirements

• Read input text and compute character frequencies.
• Construct a Huffman tree based on these frequencies.
• Generate Huffman codes for each character.
• Encode the original text using these codes.
• Save the encoded text and codes to a file.
• Decode the text using the saved codes.
• Save the decoded text to a file and verify its integrity.
• Calculate the compression rate.

2.2 Main Functions

• calculate_frequencies(text): Calculate character frequencies.
• build_huffman_tree(frequencies): Build the Huffman tree.
• generate_codes(root, current_code): Generate Huffman codes.
• encode_text(text, codes): Encode the text with the Huffman codes.
• decode_text(encoded_text, root): Decode the text by walking the
  Huffman tree from its root.
• save_to_file(data, filename): Save data to a file.
• load_from_file(filename): Load data from a file.
• compute_compression_rate(original_size, encoded_size): Calculate the
  compression rate.


Chapter 3 Implementation

3.1 Implementations of Key Algorithms

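import heapq
import json

class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char    # Character for a leaf, None for an internal node
        self.freq = freq    # Frequency of the character or of the whole subtree
        self.left = None
        self.right = None

    def __lt__(self, other):
        # Lets heapq order nodes by frequency
        return self.freq < other.freq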

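def calculate_frequencies(text):
    # Count in a single pass; calling text.count() for every character
    # rescans the whole text each time
    frequencies = {}
    for char in text:
        frequencies[char] = frequencies.get(char, 0) + 1
    return frequencies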

def build_huffman_tree(frequencies):
    # Empty input has no tree
    if not frequencies:
        return None

    priority_queue = [HuffmanNode(char, freq) for char, freq in frequencies.items()]
    heapq.heapify(priority_queue)

    # Repeatedly merge the two least frequent nodes until one root remains
    while len(priority_queue) > 1:
        left = heapq.heappop(priority_queue)
        right = heapq.heappop(priority_queue)

        merged = HuffmanNode(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(priority_queue, merged)

    return priority_queue[0]

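def generate_codes(root, current_code="", code_map=None):
    # Default to None: a mutable default dict would be shared across calls
    if code_map is None:
        code_map = {}
    if root is None:
        return code_map
    if root.char is not None:
        # Leaf node. A tree with a single leaf still needs a one-bit code.
        code_map[root.char] = current_code or "0"
        return code_map

    generate_codes(root.left, current_code + "0", code_map)
    generate_codes(root.right, current_code + "1", code_map)
    return code_map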

def encode_text(text, codes):
    return ''.join(codes[char] for char in text)

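def decode_text(encoded_text, root):
    if root is None:
        return ""
    if root.char is not None:
        # Single-symbol tree: every bit stands for the same character
        return root.char * len(encoded_text)

    decoded_chars = []
    current_node = root
    for bit in encoded_text:
        current_node = current_node.left if bit == '0' else current_node.right
        if current_node.char is not None:
            decoded_chars.append(current_node.char)
            current_node = root
    return ''.join(decoded_chars)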

def save_to_file(data, filename):
    with open(filename, 'w', encoding='utf-8') as file:
        if isinstance(data, dict):  # Save codes as JSON for readability
            json.dump(data, file)
        else:
            file.write(data)

def load_from_file(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        return file.read()

def compute_compression_rate(original_size, encoded_size):
    return original_size / encoded_size if encoded_size > 0 else 0
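
Section 2.1 calls for decoding with the saved codes, whereas decode_text above walks the in-memory tree. One possible way to decode from the code table alone is sketched below. decode_with_codes is a hypothetical helper that is not part of the original program, and it assumes the table is prefix-free, which holds for Huffman codes.

```python
def decode_with_codes(encoded_text, codes):
    """Decode a '0'/'1' string using only a {char: code} table.

    Hypothetical helper (not in the original program). It relies on the
    prefix property: no code is a prefix of another, so the first match
    found while scanning left to right is the only possible one.
    """
    reverse = {code: char for char, code in codes.items()}
    decoded_chars = []
    buffer = ""
    for bit in encoded_text:
        buffer += bit
        if buffer in reverse:
            decoded_chars.append(reverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("encoded text ends in the middle of a code")
    return "".join(decoded_chars)

# Small hand-written prefix code (illustrative values).
example_codes = {"a": "0", "b": "10", "c": "11"}
print(decode_with_codes("010110", example_codes))  # prints: abca
```

Because the codes are stored as a JSON object with single-character string keys, the dictionary returned by json.loads has the same shape and could be passed to such a helper directly.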

3.2 Implementations of Main Data Structures

The main data structures used are the HuffmanNode class for the nodes
of the Huffman tree, a list managed by heapq that serves as the
priority queue while the tree is built, and dictionaries for the
character frequencies and the Huffman codes.
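
heapq keeps an ordinary list arranged as a binary min-heap, which is why HuffmanNode defines __lt__. An alternative that some implementations use is to push (frequency, sequence number, payload) tuples so that nodes never need to be compared directly. The sketch below shows that variant on made-up frequencies. The function name is hypothetical, and it computes only code lengths rather than building the project's tree.

```python
import heapq
from itertools import count

def huffman_code_lengths(frequencies):
    """Return {symbol: code length} using a heap of (freq, tiebreak, symbols).

    Illustrative variant: each heap entry carries the list of symbols in
    that subtree, and every merge adds one bit to each of those symbols.
    """
    tiebreak = count()  # unique sequence numbers, so the lists are never compared
    heap = [(freq, next(tiebreak), [sym]) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in frequencies}
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths

print(huffman_code_lengths({"a": 5, "b": 2, "c": 1, "d": 1}))
# prints: {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```

The sequence number breaks ties between equal frequencies in insertion order, which also makes the result deterministic.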


Chapter 4 Test and Result Analysis

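Generate sample text data for compression and decompression testing.
Compare the original text with the decompressed text to verify that the
data is restored exactly.
Calculate the compression rate as compute_compression_rate does: (size of
original text in bits / size of encoded text in bits), so a value above 1
means the encoded text is smaller. The original size assumes 8 bits per
character.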
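
As a small worked example of this measurement, the sketch below computes both the encoded size as a percentage of the original and its inverse, the original-to-encoded ratio, from bit counts. The function name and the sample numbers are illustrative assumptions, and the default of 8 bits per character assumes single-byte characters such as ASCII.

```python
def compression_stats(original_chars, encoded_bits, bits_per_char=8):
    """Return (encoded size as % of original, original/encoded ratio).

    Hypothetical helper; bits_per_char=8 assumes single-byte characters.
    """
    original_bits = original_chars * bits_per_char
    if original_bits == 0 or encoded_bits == 0:
        return 0.0, 0.0
    percent = encoded_bits / original_bits * 100
    ratio = original_bits / encoded_bits
    return percent, ratio

# Illustrative numbers: 100 characters encoded into 450 bits.
percent, ratio = compression_stats(100, 450)
print(f"{percent:.2f}% of original size, ratio {ratio:.2f}:1")
# prints: 56.25% of original size, ratio 1.78:1
```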

def test_huffman_coding_system():
    text = "The quick brown fox jumps over the lazy dog."

    frequencies = calculate_frequencies(text)
    root = build_huffman_tree(frequencies)
    codes = generate_codes(root)
    save_to_file(codes, 'huffman_codes.json')  # Save codes for decoding

    encoded_text = encode_text(text, codes)
    save_to_file(encoded_text, 'file1.txt')

    print("Encoded text:", encoded_text)

    loaded_codes = json.loads(load_from_file('huffman_codes.json'))
    # Read the encoded text back from the file before decoding it
    decoded_text = decode_text(load_from_file('file1.txt'), root)
    save_to_file(decoded_text, 'file2.txt')

    print("Decoded text:", decoded_text)

    # Verify that decompression restores the original text exactly
    assert decoded_text == text, "Decoded text does not match the original"

    original_size = len(text) * 8  # Size in bits, assuming 8-bit ASCII characters
    encoded_size = len(encoded_text)
    compression_rate = compute_compression_rate(original_size, encoded_size)

    print("Compression rate:", compression_rate)
    print("Huffman Codes:", loaded_codes)

# Run the test
test_huffman_coding_system()
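
The test writes the encoded text as '0' and '1' characters, so each encoded bit occupies a full byte in file1.txt and the file takes more space on disk than the bit count suggests. One way to realize the saving on disk is to pack the bits into bytes. The sketch below is an assumption about how that could be done, using hypothetical helpers pack_bits and unpack_bits that are not part of the original program.

```python
def pack_bits(bit_string):
    """Pack a '0'/'1' string into bytes, prefixed by one byte holding the padding count."""
    padding = (8 - len(bit_string) % 8) % 8
    padded = bit_string + "0" * padding
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([padding]) + body

def unpack_bits(data):
    """Inverse of pack_bits: recover the original '0'/'1' string."""
    padding, body = data[0], data[1:]
    bits = "".join(f"{byte:08b}" for byte in body)
    return bits[:len(bits) - padding] if padding else bits

encoded = "1011001110"              # illustrative 10-bit string
packed = pack_bits(encoded)
print(len(encoded), len(packed))    # prints: 10 3
assert unpack_bits(packed) == encoded
```

The packed bytes would need to be written with a file opened in binary mode ('wb'), so they could not go through save_to_file as it stands.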

Conclusion


In conclusion, Huffman coding provides an effective method for
lossless text compression and decompression, reducing data size
while restoring the original text exactly. This project demonstrated
a practical implementation of Huffman coding for text compression.

Acknowledgement

I acknowledge the support and resources used in creating this project.

References

D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes,"
Proceedings of the IRE, 1952. The original paper that introduced Huffman
coding.

T. M. Cover and J. A. Thomas, "Elements of Information Theory," John Wiley &
Sons, Inc. A comprehensive textbook on information theory, including Huffman
coding.

D. Salomon, "Data Compression: The Complete Reference," Springer. A standard
reference for data compression techniques, including Huffman coding.
