DATA STORAGE and Compression
An image measures 100 by 80 pixels and has 128 colours (so each pixel must use 7
bits, because 2^7 = 128)
100 x 80 = 8000 pixels; 8000 x 7 = 56000 bits for the whole image
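The image calculation can be sketched in Python as a small check. This is a minimal sketch that assumes the image is stored uncompressed with no header or metadata. The function names are hypothetical and chosen for illustration.

```python
def bits_per_pixel(num_colours: int) -> int:
    """Smallest colour depth n such that 2**n >= num_colours."""
    # (num_colours - 1).bit_length() gives the exact bit count without floats;
    # max(1, ...) keeps at least 1 bit even for a single-colour image
    return max(1, (num_colours - 1).bit_length())


def image_size_bits(width: int, height: int, num_colours: int) -> int:
    """Uncompressed size = width x height x colour depth (no metadata assumed)."""
    return width * height * bits_per_pixel(num_colours)


print(bits_per_pixel(128))            # 7
print(image_size_bits(100, 80, 128))  # 56000
```

Integer `bit_length()` is used in place of `log2` so that exact powers of two such as 128 never suffer floating-point rounding.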
A sound clip uses a 48 kHz sample rate, 24-bit resolution and is 30 seconds long.
48000 x 24 = 1152000 bits per second; 1152000 x 30 = 34560000 bits for the whole clip
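The same sound arithmetic can be written as a Python sketch. It assumes uncompressed audio with no file header. The `channels` parameter is an added assumption: the example above implies a single (mono) channel, so it defaults to 1. The function name is hypothetical.

```python
def sound_size_bits(sample_rate_hz: int, resolution_bits: int,
                    duration_s: int, channels: int = 1) -> int:
    """Uncompressed size = sample rate x resolution x seconds x channels."""
    # channels is an assumption; the worked example implies mono (1 channel)
    return sample_rate_hz * resolution_bits * duration_s * channels


bits = sound_size_bits(48_000, 24, 30)
print(bits)       # 34560000
print(bits // 8)  # 4320000 bytes
```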
Compression is reducing the size of a file. This is done to reduce the amount of
storage space it takes up or to reduce the bandwidth when sending a file. There
are 2 types of compression:
Lossless Compression:
o A compression algorithm is used to reduce the file
size without permanently removing any data
o Repeated patterns in the file are identified and stored once in an index
o Each occurrence of a pattern is replaced with a reference to its index
entry, and the positions where the pattern occurs are stored
o The number of times the pattern appears is also stored
o Techniques like run-length encoding (RLE) and Huffman
encoding are used
o RLE replaces sequences of repeated characters with a
code that represents the character and the number of times it
is repeated
o Huffman encoding replaces frequently used characters with shorter
codes and less frequently used characters with longer codes
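RLE and Huffman encoding can both be illustrated with short Python sketches. RLE here stores each run as a (character, run length) pair, and the Huffman builder follows the textbook approach of repeatedly merging the two least frequent groups. The function names are hypothetical. Real file formats also pack the codes into bits and store the code table alongside the data, which is not shown here.

```python
import heapq
from collections import Counter
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encoding: store each run as (character, run length)."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original text exactly, so no data is lost."""
    return "".join(char * count for char, count in pairs)


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table: frequent characters get shorter codes."""
    freq = Counter(text)
    if len(freq) == 1:
        # Edge case: a single distinct character still needs a 1-bit code
        return {next(iter(freq)): "0"}
    # Each heap entry: (frequency, unique tie-breaker, {char: code so far})
    heap = [(count, i, {char: ""}) for i, (char, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


text = "AAAABBBCCD"
pairs = rle_encode(text)
print(pairs)                      # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert rle_decode(pairs) == text  # lossless: the original is rebuilt exactly
codes = huffman_codes(text)
print(sum(len(codes[c]) for c in text), "bits vs", len(text) * 8, "bits in 8-bit ASCII")
# 19 bits vs 80 bits in 8-bit ASCII
```

The unique tie-breaker in each heap entry means two groups with equal frequency are never compared by their dictionaries, which Python cannot order.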
Lossy Compression:
o Lossy compression reduces the file size by permanently
removing some data from the file
o This method is often used for images and audio files where minor
details or data can be removed without significantly impacting
the quality
o Techniques like reducing an image's resolution or colour depth, and
downsampling audio (reducing its sample rate) or reducing its sample
resolution, are used for lossy compression
o The amount of data removed depends on the level of compression
selected and can impact the quality of the final file
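One lossy technique, reducing colour depth, can be shown with a small Python sketch. It assumes the values are unsigned integers (for example 8-bit greyscale pixels) and simply discards the least significant bits. This truncation is only an illustration; real lossy formats such as MP3 use more sophisticated techniques. The function names and pixel values are hypothetical.

```python
def reduce_depth(values: list[int], old_bits: int, new_bits: int) -> list[int]:
    """Keep only the top new_bits of each value; the low bits are discarded."""
    shift = old_bits - new_bits
    return [v >> shift for v in values]


def restore_depth(values: list[int], old_bits: int, new_bits: int) -> list[int]:
    """Scale back up to the original range; the discarded bits come back as 0."""
    shift = old_bits - new_bits
    return [v << shift for v in values]


pixels = [0, 37, 128, 200, 255]                     # hypothetical 8-bit values
small = reduce_depth(pixels, 8, 4)
print(small)                                        # [0, 2, 8, 12, 15]
print(restore_depth(small, 8, 4))                   # [0, 32, 128, 192, 240]
print(len(pixels) * 8, "->", len(small) * 4, "bits")  # 40 -> 20 bits
```

The restored values differ from the originals, which is what makes the method lossy: the file is half the size, but the removed detail cannot be recovered.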
Overall:
o Compression is necessary to reduce the size of large
files for storage, transmission, and faster processing
o The choice between lossy and lossless compression methods
depends on the type of file and its intended use
o Lossy compression is generally used for media files where minor
data loss is acceptable while lossless compression is used for text,
code, and archival purposes
Character Sets
Text is a collection of characters that can be represented in binary, which
is the language that computers use to process information
To represent text in binary, a computer uses a character set, which is
a collection of characters and the corresponding binary codes that
represent them
One of the most commonly used character sets is the American Standard
Code for Information Interchange (ASCII), which assigns a unique 7-bit
binary code to each character, including uppercase and lowercase
letters, digits, punctuation marks, and control characters
E.g. The ASCII code for the uppercase letter 'A' is 65, stored as 01000001, while the
code for the character '?' is 63, stored as 00111111 (7-bit codes are usually stored in
an 8-bit byte with a leading 0)
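The two ASCII examples can be checked in Python, where `ord` returns a character's code and `format` writes it in binary. This is a minimal sketch; the function name `ascii_binary` is hypothetical.

```python
def ascii_binary(char: str, bits: int = 8) -> str:
    """Return the ASCII code of char as a zero-padded binary string."""
    code = ord(char)
    if code > 127:
        raise ValueError(f"{char!r} is not an ASCII character")
    return format(code, f"0{bits}b")


for ch in "A?":
    print(ch, ord(ch), ascii_binary(ch, 7), ascii_binary(ch))
# A 65 1000001 01000001
# ? 63 0111111 00111111
```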
ASCII can represent only 128 (2^7) characters, so it does not support
characters from most languages other than English
To address these limitations, Unicode was developed as a character
encoding standard that allows for a greater range of characters and
symbols than ASCII, including different languages and emojis
Unicode assigns a unique code point to each character; encodings such as
UTF-8 store that code point in binary using a variable number of bytes
(from 1 to 4)
E.g. The Unicode code point for the heart symbol is U+2665, which is
represented in UTF-8 as 11100010 10011001 10100101
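The heart symbol's UTF-8 bytes can be inspected with Python's built-in `str.encode`. This minimal sketch assumes UTF-8 as the encoding. Other encodings such as UTF-16 produce different bytes for the same code point.

```python
heart = "\u2665"                          # the heart symbol, code point U+2665
print(f"U+{ord(heart):04X}")              # U+2665
utf8 = heart.encode("utf-8")              # UTF-8 needs 3 bytes for this code point
print(" ".join(format(b, "08b") for b in utf8))
# 11100010 10011001 10100101
```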
As many Unicode characters need more bits than an ASCII character (UTF-8 uses
8 to 32 bits per character, and UTF-32 always uses 32), Unicode text can result
in larger file sizes and slower processing times when working with text-based data
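The size difference can be measured by encoding the same text several ways in Python. This is a sketch. The function name `encoded_sizes` and the sample strings are hypothetical. The "-le" (little-endian) variants are chosen so that no byte-order mark is added to the byte counts.

```python
from typing import Optional


def encoded_sizes(text: str) -> dict[str, Optional[int]]:
    """Bytes needed to store text in each encoding (None if it cannot be encoded)."""
    sizes: dict[str, Optional[int]] = {}
    # The "-le" variants are used so no byte-order mark is added to the count
    for encoding in ("ascii", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            sizes[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            sizes[encoding] = None  # e.g. ASCII has no code for the heart symbol
    return sizes


print(encoded_sizes("Hello"))
# {'ascii': 5, 'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes("I \u2665 AI"))
# {'ascii': None, 'utf-8': 8, 'utf-16-le': 12, 'utf-32-le': 24}
```

For text that contains only ASCII characters, UTF-8 is the same size as ASCII. The extra cost appears with non-ASCII characters or with fixed-width encodings such as UTF-32.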
REPRESENTING SOUND
o MP3
MP3 is a format for digital audio
MP3 is an actual recording of the sound
MP3 is a (lossy) compression format
The sound is recorded using a microphone and stored as digital samples
REPRESENTING IMAGES
Worked example
An image has a resolution of 600 x 400 and a colour depth of 1 byte. Calculate
the file size of the image giving your answer in bytes. Show your working
[2]
600 x 400 = 240000 pixels; 240000 x 1 byte = 240000 bytes