MultiMedia Computing 1
Academic Year 2017-2018
Unit One
1. Introduction to Multimedia
1.1 Introduction
Multimedia has become an inevitable part of any presentation. It has found a
variety of applications right from entertainment to education. The evolution of internet
has also increased the demand for multimedia content.
Definition
Multimedia is the media that uses multiple forms of information content and
information processing (e.g. text, audio, graphics, animation, video, interactivity) to
inform or entertain the user. Multimedia also refers to the use of electronic media to
store and experience multimedia content. Multimedia is similar to traditional mixed
media in fine art, but with a broader scope. The term "rich media" is synonymous with
interactive multimedia.
1.3 Categories of Multimedia
Multimedia may be broadly divided into linear and non-linear categories. Linear
active content progresses without any navigation control for the viewer, such as a
cinema presentation. Non-linear content offers user interactivity to control progress, as
used with a computer game or in self-paced computer-based training. Non-linear
content is also known as hypermedia content.
Multimedia presentations can be live or recorded. A recorded presentation may
allow interactivity via a navigation system. A live multimedia presentation may allow
interactivity via interaction with the presenter or performer.
The World Wide Web (WWW) is the best example of a hypermedia application.
Example Hypermedia Applications:
The World Wide Web (WWW) is a clear example of the hypermedia application.
PowerPoint.
Adobe Acrobat (or other PDF software).
Adobe Flash
3. Components of Multimedia
Now let us consider the Components (Hardware and Software) required for a
multimedia system:
1) Capture devices: Video Camera, Video Recorder, Audio Microphone,
Keyboards, mice, graphics tablets, 3D input devices, tactile sensors, VR devices.
Digitizing Hardware.
2) Storage Devices: Hard disks, CD-ROMs, DVD-ROM, etc.
3) Communication Networks: Local Networks, Intranets, Internet, Multimedia or
other special high speed networks.
4) Computer Systems: Multimedia Desktop machines, Workstations,
MPEG/VIDEO/DSP Hardware.
5) Display Devices: CD-quality speakers, HDTV, SVGA, Hi-Res monitors, Color
printers etc
Multimedia involves multiple modalities of text, audio, images, drawings,
animation, and video. Examples of how these modalities are put to use:
1. Video teleconferencing.
2. Distributed lectures for higher education.
3. Tele-medicine.
4. Co-operative work environments.
5. Searching in (very) large video and image databases for target visual objects.
6. “Augmented” reality: placing real-appearing computer graphics and video objects
into scenes.
7. Using voice-recognition to build an interactive environment, say a web browser
4. Multimedia Research Topics and Projects
To the computer science researcher, multimedia consists of a wide variety of
topics:
1. Multimedia processing and coding: This includes multimedia content analysis,
content-based multimedia retrieval, multimedia security, audio/image/video
processing, compression, etc.
2. Multimedia system support and networking: network protocols, Internet,
operating systems, servers and clients, quality of service (QoS), and databases.
3. Multimedia tools, end-systems and applications: These include hypermedia
systems, user interfaces, authoring systems, multimodal interaction, and integration,
web-everywhere devices, multimedia education, including computer supported
collaborative learning and design, and applications of virtual environments.
The concerns of multimedia researchers also impact researchers in almost every other
branch. For example, data mining is an important current research area, and a large
database of multimedia data objects is a good example of just what we may be
interested in mining.
4. Multi-modal interaction and integration: "ubiquity", web-everywhere devices,
multimedia education including Computer Supported Collaborative Learning, and design
and applications of virtual environments.
Multimedia Projects
Many exciting research projects are currently underway in multimedia, and we’d
like to introduce a few of them here:
For example, researchers are interested in camera-based object tracking
technology. One aim is to develop control systems for industrial control, gaming,
and so on that rely on moving scale models (toys) around a real environment (a
board game, say). Tracking the control objects (toys) provides user control of
the process.
3D motion capture can also be used for multiple actor capture, so that multiple
real actors in a virtual studio can be used to automatically produce realistic
animated models with natural movement.
Multiple views from several cameras or from a single camera under differing
lighting can accurately acquire data that gives both the shape and surface
properties of materials, thus automatically generating synthetic graphics models.
This allows photo-realistic synthesis of virtual actors.
3D capture technology is now nearly fast enough to allow acquiring the dynamic
characteristics of human facial expression during speech, in order to synthesize highly
realistic facial animation from speech.
Multimedia applications aimed at handicapped persons, particularly those with
poor vision and the elderly, are a rich field of endeavor in current research.
Digital fashion aims to develop smart clothing that can communicate with other
such enhanced clothing using wireless communication, so as to artificially
enhance human interaction in a social setting. The vision here is to use
technology to allow individuals to broadcast certain thoughts and feelings
automatically, for exchange with others equipped with similar technology.
Georgia Tech's Electronic Housecall system, an initiative for providing
interactive health monitoring services to patients in their homes, relies on
networks for delivery, challenging current capabilities.
5. Applications of Multimedia
Multimedia finds its application in various areas including, but not limited to,
advertisements, art, education, entertainment, engineering, medicine, mathematics,
business, scientific research, and spatial and temporal applications. One notable
application is virtual reality (VR), which uses goggles, helmets and special gloves.
A few application areas of multimedia are listed below:
Commercial: Much of the electronic old and new media utilized by commercial artists
is multimedia. Exciting presentations are used to grab and keep attention in advertising.
Industrial, business-to-business, and interoffice communications are often developed
by creative services firms as advanced multimedia presentations, beyond simple slide
shows, to sell ideas or liven up training. Commercial multimedia developers may be
hired to design for governmental services and non-profit services applications as well.
Education: In education, multimedia is used to produce computer-based training
courses (CBTs) and reference books such as encyclopedias and almanacs. A CBT lets the
user go through a series of presentations, text about a particular topic, and associated
illustrations in various information formats. Edutainment is an informal term used to
describe combining education with entertainment, especially multimedia entertainment.
Medicine: In medicine, doctors can be trained by watching a virtual surgery, or they
can simulate how the human body is affected by diseases spread by viruses and bacteria
and then develop techniques to prevent them.
Example Multimedia Applications
World Wide Web
Multimedia Authoring, e.g. Adobe/Macromedia Director
Hypermedia courseware
Video-on-demand
Interactive TV
Computer Games
Virtual reality
Digital video editing and production systems
Multimedia Database systems
WWW technology is maintained and developed by the World Wide Web Consortium
(W3C). The W3C has listed the following three goals for the WWW: universal access
to web resources (by everyone, everywhere), effectiveness of navigating available
information, and responsible use of posted material.
always preceded by the token ''http://''. A URI could be a Uniform Resource Locator
(URL).
Static or Discrete Media: Some media is time independent: Normal data, text, single
images and graphics are examples.
Continuous Media: Time dependent Media: Video, animation and audio are
examples.
Analog: continuous signals captured by sensors, e.g. transducers and thermocouples
(temperature sensors), microphones (acoustic sensors), and video cameras (light
sensors). These deliver (usually) continuous analog signals (e.g. sound and light).
Digital: discrete digital signals that a computer can readily deal with. Special hardware
devices convert between the two: Analog-to-Digital converters digitize the analog signal
taken from an analog sensor (e.g. a microphone), and playback is the converse
operation, performed by Digital-to-Analog converters.
Analog-to-Digital-to-Analog Pipeline
• Begins at the conversion from the analog input and ends at the conversion from
the output of the processing system to the analog output, as shown:
Multimedia Data: Input and Format
How to capture and store each Media format?
Note that text and graphics (and some images) are mainly generated directly by
computer/device (e.g. drawing/painting programs) and do not require digitising:
They are generated directly in some (usually binary) format.
Printed text and some handwritten text can be scanned via Optical Character
Recognition.
Handwritten text could also be digitised by electronic pen sensing.
Printed imagery/graphics can be scanned directly to image formats.
1- Text
2- Graphics
3- Images
4- Audio
5- Video
Compression: convenient to bundle files for archiving and transmission of larger
files. E.g. Zip, RAR, 7-zip.
2) Graphics
Format: constructed by the composition of primitive objects such as lines,
polygons, circles, curves and arcs.
Input: Graphics are usually generated by a graphics editor program (e.g. Illustrator,
FreeHand) or automatically by a program (e.g. PostScript).
Graphics input devices: keyboard (for text and cursor control), mouse, trackball or
graphics tablet.
Graphics are usually selectable and editable or revisable (unlike images).
Graphics files usually store the primitive assembly.
They do not take up a very high storage overhead.
Graphics standards: Open Graphics Library, a standard specification defining a
cross-language, cross-platform API for writing applications that produce 2D/3D
graphics.
Animation: can be generated via a sequence of slightly changed graphics.
2D animation: e.g. Flash | Key frame interpolation: tweening: motion & shape.
3D animation: e.g. Maya.
Change of shape/texture/position, lighting and camera can all be animated. Graphics
animation is compact and suitable for network transmission (e.g. Flash).
3) Images
Still pictures which (uncompressed) are represented as a bitmap (a grid of pixels).
Input: scanned for photographs or pictures using a digital scanner or from a digital
camera.
Input: May also be generated by programs similar to graphics or animation
programs.
Analog sources will require digitising.
Compression is commonly applied.
Can usually only edit individual pixels or groups of pixels in an image editing
application, e.g. Photoshop.
4) Audio
Audio signals are continuous analog signals.
Input: microphones and then digitised and stored.
CD-quality audio requires 16-bit sampling at 44.1 kHz.
Usually compressed (e.g. MP3, AAC, FLAC, Ogg Vorbis).
5) Video
Input: Analog Video is usually captured by a video camera and then digitised,
although digital video cameras now essentially perform both tasks.
There are a variety of video (analog and digital) formats.
Raw video can be regarded as being a series of single images. There are
typically 25, 30 or 50 frames per second.
Digital Images
An image must be converted to numerical form before processing. This conversion
process is called digitization, and a common form is illustrated in Figure (1). The image
is divided into small regions called picture elements, or pixel for short. The most
common subdivision scheme is the rectangular sampling grid shown in Figure (1). The
image is divided into horizontal lines made up of adjacent pixels.
At each pixel location, the image brightness is sampled and quantized. This
step generates an integer at each pixel representing the brightness or darkness of the
image at that point.
image at that point. When this has been done for all pixels, the image is represented by
a rectangular array of integers. Each pixel has a location or address (line or row number
and sample or column number) and an integer value called the gray level. This array of
digital data is now a candidate for computer processing.
A digital image is a numeric representation of a two-dimensional image.
Depending on whether the image resolution is fixed, it may be of vector or raster type;
the term "digital image" usually refers to raster images or bitmapped images.
A monochrome image can be modeled as a two-dimensional function f(x,y),
where each pixel value corresponds to the brightness of the image at the point (x,y).
In linear algebra terms, a two-dimensional array like our image model f(x,y) is
referred to as a matrix, and one row (or column) is called a vector. This image model
is for monochrome (one-color, what we normally refer to as black and white)
image data; we also have other types of image data that require extensions or
modifications to this model.
The number of quantization levels should be high enough for human
perception of fine shading details in the image. The occurrence of false contours
is the main problem in an image which has been quantized with insufficient
brightness levels.
1. Sampling (determines the resolution)
Sampling is the process of measuring the value of the image function f(x, y) at
discrete intervals in space. Each sample corresponds to a small square area of the
image, known as a pixel. The result is a two-dimensional pattern of measurements
(light intensity or color) that represents the image numerically.
In this figure, we have represented the image "Lena" sampled with two different
sampling structures. The image on the left is the reference image (spatial dimensions:
(256*256) pixels). The second image is sampled with a sampling frequency four times
lower for each of the two spatial dimensions. This means it is (64*64) pixels. For
display purposes, it has been brought to the same size as the original using a zoom.
This is in fact an interpolation of zero- order (each pixel is duplicated 4*4 times, so
that on the screen it displays a square of (4*4) identical pixels).
2. Quantization
Quantization is the process of converting a continuous range of values into a
finite range of discrete values. The accuracy with which variations in f(x, y) are
represented is determined by the number of quantization levels that we use: the
more levels we use, the better the approximation.
The number of bits used to represent a pixel intensity determines the quantization.
Suppose 8 bits are used for a pixel; its value then ranges from 0 to 255 (discrete values).
0 is assigned to pure black and 255 to pure white, with intermediate values assigned to
gray scales, as shown in this image.
Quantization of an image
This illustration shows examples of quantization carried out on the image:
For the image on the left: quantization is followed by a natural binary coding with 8
bits per pixel. There are 2^8 = 256 reconstruction levels to represent the magnitude of
each pixel. It is the typical case of a monochrome image (only gray scales).
For the middle image: quantization is carried out with a 4 bits per pixel coding, giving
2^4 = 16 reconstruction levels. Contours are well rendered, but textures are imprecise in
some cases. These are areas of the signal with weak spatial variation, which suffer
more visually due to the appearance of false contours (loss on the face and the
shoulder).
For the image on the right: quantization is carried out with a 2 bits per pixel coding,
so we have 2^2 = 4 reconstruction levels. The deterioration seen in the previous image
is even more flagrant here.
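To make the effect concrete, the following small C sketch requantizes an 8-bit gray-scale image (stored as a flat array, an assumption made for this example) to 2^bits levels and rescales the result back to the 0-255 display range; it is an illustration, not part of any particular library.

#include <stdio.h>

/* Requantize an 8-bit gray-scale image (values 0..255) to 2^bits levels,
   then map the levels back to the 0..255 display range. */
void quantize(unsigned char *img, int n_pixels, int bits)
{
    int levels = 1 << bits;          /* e.g. bits = 2 gives 4 levels   */
    int step   = 256 / levels;       /* width of one quantization bin  */
    for (int i = 0; i < n_pixels; i++) {
        int level = img[i] / step;                            /* 0 .. levels-1 */
        img[i] = (unsigned char)(level * 255 / (levels - 1)); /* rescale       */
    }
}

int main(void)
{
    unsigned char img[4] = {12, 100, 180, 250};  /* a tiny 2x2 "image"       */
    quantize(img, 4, 2);                         /* keep only 4 gray levels  */
    for (int i = 0; i < 4; i++) printf("%d ", img[i]);
    printf("\n");                                /* prints: 0 85 170 255     */
    return 0;
}

Calling quantize with bits = 4 or bits = 2 reproduces the 16-level and 4-level cases discussed above.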
Effect of reducing the spatial resolution
Decreasing the spatial resolution of a digital image, within the same area, may result
in what is known as a checkerboard pattern. Image details are also lost when the spatial
resolution is reduced.
To demonstrate the checkerboard pattern effect, we subsample the 1024×1024
image shown in Figure (1) to obtain the image of size 512×512 pixels. The 512×512 is
then subsampled to 256×256 image, and so on until 32×32 image. The subsampling
process means deleting the appropriate number of rows and columns from the original
image. The number of allowed gray levels was kept at 256 in all the images.
Figure 1: A 1024×1024, 8-bit image subsampled down to size 32×32 pixels.
To see the effects resulting from the reduction in the number of samples, we
bring all the subsampled images up to size 1024×1024 by row and column pixel
replication. The resulted images are shown in Figure.
Figure 2: The effects resulting from the reduction in the number of samples. All
images have 8 bits.
Comparing Figure 2(a) with the 512×512 image in Figure 2(b), we find that the
level of detail lost is simply too fine to be seen on the printed page at the scale at
which these images are shown. Next, the 256×256 image in Figure 2(c) shows a
very slight fine checkerboard pattern in the borders between flower petals and the
black background. A slightly more pronounced graininess throughout the image
also is beginning to appear. These effects are much more visible in the 128×128
image in Figure 2(d), and they become pronounced in the 64×64 and 32×32 images
in Figures 2(e) and (f), respectively.
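The subsampling and pixel-replication steps described above might be sketched as follows; the row-major, one-byte-per-pixel buffer layout and the function names are assumptions made for this example.

/* Subsample a square gray-scale image by keeping every 'factor'-th row
   and column (e.g. 1024x1024 -> 512x512 for factor = 2). */
void subsample(const unsigned char *src, int src_size,
               unsigned char *dst, int factor)
{
    int dst_size = src_size / factor;
    for (int r = 0; r < dst_size; r++)
        for (int c = 0; c < dst_size; c++)
            dst[r * dst_size + c] = src[(r * factor) * src_size + c * factor];
}

/* Bring a small image back up to the original size by row and column
   pixel replication (each pixel becomes a factor x factor block). */
void replicate(const unsigned char *src, int src_size,
               unsigned char *dst, int factor)
{
    int dst_size = src_size * factor;
    for (int r = 0; r < dst_size; r++)
        for (int c = 0; c < dst_size; c++)
            dst[r * dst_size + c] = src[(r / factor) * src_size + c / factor];
}

int main(void)
{
    unsigned char img[16] = {                    /* a 4x4 test image */
        10, 20, 30, 40,
        50, 60, 70, 80,
        90,100,110,120,
       130,140,150,160 };
    unsigned char small[4], big[16];
    subsample(img, 4, small, 2);   /* 4x4 -> 2x2: keeps 10, 30, 90, 110 */
    replicate(small, 2, big, 2);   /* 2x2 -> 4x4 by pixel replication   */
    return 0;
}

Applying subsample and then replicate with the same factor reproduces the checkerboard effect of Figure 2.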
1. Binary Images
Binary images are often created from gray-scale images via a threshold
operation, where every pixel above the threshold value is turned white ('1') and
those below it are turned black ('0').
• Each pixel is stored as a single bit (0 or 1).
• A 640 x 480 bit-mapped image requires 37.5 KB of storage.
2. Gray-Scale Images
Gray-scale images are referred to as monochrome, or one-color, images. They
contain brightness information only, no color information. The number of bits used for
each pixel determines the number of different brightness levels Available. The typical
image contains 8 bits/pixel data, which allows us to have 256 (0-255) different
brightness (gray) levels.
This representation provides more than adequate brightness resolution, in terms
of the human visual system's requirements and provides a "noise margin" by allowing
for approximately twice as many gray levels as required. Additionally, the 8 bit
representation is typical due to the fact that the byte which corresponds to 8 bit of data,
is the standard small unit in the world of digital computers.
Each pixel is usually stored as a byte (value between 0 to 255). A dark pixel may
have a value of 10; a bright one may be 240 (dark=0; white=255)
Example: find image size with 640*480 pixles
Total no. of bits = 640*480* 8 bit=2457600 bit Or 640*480*1 Byte=307200Bytes
Convert to Byte: 2457600/8=307200 Byte
Convert to KByte: 307200 Byte/1024 = 300 KB
3. Color Images
Color images can be represented with 24 bits per pixel (8 bits for each of the R, G,
and B components). A 32-bit representation additionally stores an alpha value
representing special effect information (e.g., transparency).
Example: A color image uses 32 bits per pixel, with height = 200 and width = 200 pixels. Find its size.
Size = 200 * 200 * 4 Bytes = 160,000 B / 1024 = 156.25 KB.
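The storage calculations in the gray-scale and color examples above can be wrapped in a small helper; the function name and the use of 1 KB = 1024 bytes follow the worked examples and are choices made for this sketch.

#include <stdio.h>

/* Uncompressed image size in kilobytes for given dimensions and bit depth. */
double image_size_kb(int width, int height, int bits_per_pixel)
{
    double bytes = (double)width * height * bits_per_pixel / 8.0;
    return bytes / 1024.0;
}

int main(void)
{
    printf("640x480,  1 bpp: %.2f KB\n", image_size_kb(640, 480, 1));   /* 37.50  */
    printf("640x480,  8 bpp: %.2f KB\n", image_size_kb(640, 480, 8));   /* 300.00 */
    printf("200x200, 32 bpp: %.2f KB\n", image_size_kb(200, 200, 32));  /* 156.25 */
    return 0;
}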
4. Multispectral Images
Multispectral images typically contain information outside the normal human
perceptual range. This may include infrared, ultraviolet, X-ray, acoustic, radar data.
These are not images in the usual sense because the information represented is not
directly visible by the human system. However, the information is often represented
in visual form by mapping the different spectral bands to RGB components. If more
than three bands of information are in the multispectral image, the dimensionality
is reduced by applying a principal components transform.
Image File Formats
1- GIF
• Graphics Interchange Format (GIF), devised by the UNISYS Corp. and
CompuServe, initially for transmitting graphical images over phone lines via
modems.
• One of the simplest is the 8-bit GIF format; we study it because it is easily
understood and because of its historical connection to the WWW and the HTML
markup language, as the first image type recognized by net browsers.
• Limited to only 8-bit (256) colour images, suitable for images with few
distinctive colours (e.g., graphics, drawing)
• GIF89a: supports simple animation, transparency index etc.
Figure: GIF file layout (color map, image descriptor).
2- JPEG Standard
A standard for photographic image compression created by the Joint Photographic
Experts Group
Takes advantage of limitations in the human vision system to achieve high rates of
compression.
Lossy compression which allows user to set the desired level of
quality/compression.
3- TIFF
Tagged Image File Format (TIFF) stores many different types of images (e.g., bit-map,
greyscale, 8-bit & 24-bit RGB, etc.).
Developed by the Aldus Corp. in the 1980s and later supported by Microsoft.
TIFF is typically a lossless format.
It does not provide any major advantages over JPEG and is not as user-controllable;
it appears to be declining in popularity.
4- BMP
BitMap (BMP) is the major system standard graphics file format for Microsoft
Windows, used in Microsoft Paint and other programs. It can make use of run-length
encoding compression and can fairly efficiently store 24-bit bitmap images.
Note, however, that BMP has many different modes, including uncompressed 24-
bit images.
5- PNG
PNG is meant to supersede the GIF standard.
Features of PNG:
Supports up to 48 bits per pixel, giving more accurate colors.
Supports description of gamma correction and an alpha channel for controls such as
transparency.
Supports progressive display through 8×8 blocks.
Unit Three
14. Arithmetic and Logical Operations on Images (Image Algebra)
These operations are applied on pixel-by-pixel basis. So, to add two images
together, we add the value at pixel (0 , 0) in image 1 to the value at pixel (0 , 0)
in image 2 and store the result in a new image at pixel (0 , 0). Then we move to
the next pixel and repeat the process, continuing until all pixels have been
visited.
Clearly, this can work properly only if the two images have identical
dimensions. If they do not, then combination is still possible, but a meaningful
result can be obtained only in the area of overlap. If our images have dimensions
of w1*h1, and w2*h2 and we assume that their origins are aligned, then the new
image will have dimensions w*h, where:
w = min (w1, w2)
h = min (h1, h2)
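A sketch of pixel-by-pixel addition over the overlap region; the flat row-major buffers and the clamping of sums to 255 are assumptions made for this example (for averaging, the sum would instead be divided by the number of images).

/* Add two gray-scale images pixel by pixel over their region of overlap.
   Each image is stored row-major, one byte per pixel. The result has
   dimensions w = min(w1, w2), h = min(h1, h2); sums are clamped to 255. */
void image_add(const unsigned char *a, int w1, int h1,
               const unsigned char *b, int w2, int h2,
               unsigned char *out)          /* out must hold w*h bytes */
{
    int w = (w1 < w2) ? w1 : w2;
    int h = (h1 < h2) ? h1 : h2;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int sum = a[y * w1 + x] + b[y * w2 + x];
            out[y * w + x] = (unsigned char)(sum > 255 ? 255 : sum);
        }
    }
}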
Addition can also be used to combine the information of two images,
such as in image morphing, in motion pictures.
Figure (4): a) noisy image; b) average of five observations; c) average of ten
observations.
Subtraction
Subtracting two 8-bit grayscale images can produce values between −255
and +255. This necessitates the use of 16-bit signed integers in the output image,
unless sign is unimportant, in which case we can simply take the absolute value of
the result and store it using 8-bit integers:
g(x,y) = |f1 (x,y) – f2 (x,y)|
The main application for image subtraction is in change detection (or motion
detection). If we make two observations of a scene and compute their difference using
the above equation, then changes will be indicated by pixels in the difference image
which have non-zero values. Sensor noise, slight changes in illumination and various
other factors can result in small differences which are of no significance so it is usual
to apply a threshold to the difference image. Differences below this threshold are set to
zero. Differences above the threshold can, if desired, be set to the maximum pixel value.
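The change-detection procedure just described (absolute difference followed by a threshold) might look like this; the buffer layout and the choice of setting above-threshold pixels to 255 are assumptions made for the sketch.

/* Change detection: g(x,y) = |f1(x,y) - f2(x,y)|, then threshold.
   Differences below 'thresh' are set to 0, the rest to 255. */
void change_detect(const unsigned char *f1, const unsigned char *f2,
                   unsigned char *g, int n_pixels, int thresh)
{
    for (int i = 0; i < n_pixels; i++) {
        int d = f1[i] - f2[i];
        if (d < 0) d = -d;                       /* absolute difference */
        g[i] = (unsigned char)(d < thresh ? 0 : 255);
    }
}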
Subtraction can also be used in medical imaging to remove static
background information.
Algorithm 2: Image subtraction
Figure: a) original image; b) image multiplied by 2; c) image divided by 2.
Input 1  Input 2  AND  OR  XOR
   1        1      1    1    0
   1        0      0    1    1
   0        1      0    1    1
   0        0      0    0    0
Logical AND & OR operations are useful for the masking and compositing
of images. For example, if we compute the AND of a binary image with some
other image, then pixels for which the corresponding value in the binary image
is 1 will be preserved, but pixels for which the corresponding
binary value is 0 will be set to 0 (erased) . Thus the binary image acts as a
“mask” that removes information from certain parts of the image.
On the other hand, if we compute the OR of a binary image with some other
image, the pixels for which the corresponding value in the binary image is 0 will
be preserved, but pixels for which the corresponding binary value is 1 will be
set to 1 (white).
So, masking is a simple method to extract a region of interest from an
image.
AND (^)
This operation can be used to find the common white regions of two
different images (it requires two images).
g(x,y) = a(x,y) ^ b(x,y)
Exclusive OR
This operator can be used to find the differences between white regions of two
different images (it requires two images).
g(x,y) = a(x,y) ⊕ b(x,y)
NOT
NOT operation can be performed on gray-level images, it’s applied on
only one image, and the result of this operation is the negative of the original
image.
g (x,y) = 255- f (x,y)
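A sketch of the AND, XOR and NOT operations on 8-bit images; a binary mask image is assumed to hold only the values 0 and 255, so that the bitwise operators behave like the logical operations described above.

/* Pixel-wise logical operations on 8-bit images. A binary mask image is
   assumed to contain only 0 (black) and 255 (white). */
void image_and(const unsigned char *a, const unsigned char *b,
               unsigned char *g, int n)
{
    for (int i = 0; i < n; i++) g[i] = a[i] & b[i];   /* masking     */
}

void image_xor(const unsigned char *a, const unsigned char *b,
               unsigned char *g, int n)
{
    for (int i = 0; i < n; i++) g[i] = a[i] ^ b[i];   /* differences */
}

void image_not(const unsigned char *f, unsigned char *g, int n)
{
    for (int i = 0; i < n; i++) g[i] = 255 - f[i];    /* negative    */
}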
An 8-bit grayscale image has 256 possible gray-level values, so the histogram will
graphically display 256 numbers showing the distribution of pixels amongst those
grayscale values.
The gray-level histogram shows, for each gray level, the number of pixels in the
image that have that gray level.
The histogram of an 8-bit image can be thought of as a table with 256 entries,
or "bins", indexed from 0 to 255. In bin 0 we record the number of times a gray
level of 0 occurs; in bin 1 we record the number of times a gray level of 1 occurs,
and so on, up to bin 255.
• The histogram of a digital image with L total possible intensity levels in the range
[0, G] is defined as the discrete function h(r_k) = n_k, where r_k is the kth intensity
level in the interval [0, G] and n_k is the number of pixels in the image whose
intensity level is r_k.
A color histogram counts pixels with a given pixel value in red, green, and blue
(RGB). For example, in pseudocode, for images with 8-bit values in each of R, G, B,
we can fill a histogram that has 256³ bins:
int hist[256][256][256]; // reset to 0
// image is an appropriate struct
for i=0..(MAX_Y-1)
  for j=0..(MAX_X-1)
    R = image[i][j].red;
    G = image[i][j].green;
    B = image[i][j].blue;
    hist[R][G][B]++;
Example: Plot the histogram of the following 4x4 matrix of a 3-bit image.
The histogram is unique for any particular image, but the reverse is not true: vastly
different images can have identical histograms. Operations such as moving
objects around within an image typically have no effect on the histogram.
Histograms are used in numerous image processing techniques, such as
image enhancement, compression and segmentation.
Note that the horizontal axis of the histogram plot (Figure (b) above)
represents gray-level values r_k, from 0 to 255. The vertical axis represents the
values of h(r_k), i.e. the number of pixels which have the gray level r_k.
Another way of presenting a histogram is to plot pixel intensities vs. pixel probabilities.
Such a probability (normalized) histogram should be used when comparing the
histograms of images with different sizes.
Example: Suppose that a 3-bit image (L = 8) of size 64 × 64 pixels has the gray-level
(intensity) distribution shown in the table below. Compute the normalized histogram.
Normalized histogram
Figure 11: A variety of histogram types.
a. Histogram stretch
The mapping function for histogram stretch can be found by the following equation:
Stretch(I(r,c)) = [ (I(r,c) − I(r,c)MIN) / (I(r,c)MAX − I(r,c)MIN) ] × (MAX − MIN) + MIN
Where:
I(r,c)MAX is the largest gray-level value in the image I(r,c)
I(r,c)MIN is the smallest gray-level value in the image I(r,c)
MAX and MIN correspond to the maximum and minimum gray-level values possible
(for 8-bit images these are 255 and 0).
This equation will take an image and stretch the histogram across the entire
gray-level range, which has the effect of increasing the contrast of a low contrast
image . If a stretch is desired over a smaller range, different MAX and MIN
values can be specified.
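A sketch of the histogram stretch in C: I(r,c)MIN and I(r,c)MAX are found from the image itself, and out_min/out_max play the role of MIN and MAX in the equation above; the flat-array image representation is an assumption of the example.

/* Histogram stretch of an 8-bit gray-scale image to the range [out_min, out_max]. */
void histogram_stretch(unsigned char *img, int n, int out_min, int out_max)
{
    int in_min = 255, in_max = 0;
    for (int i = 0; i < n; i++) {                 /* find I(r,c)MIN and I(r,c)MAX */
        if (img[i] < in_min) in_min = img[i];
        if (img[i] > in_max) in_max = img[i];
    }
    if (in_max == in_min) return;                 /* flat image: nothing to do    */
    for (int i = 0; i < n; i++) {
        double v = (double)(img[i] - in_min) / (in_max - in_min);
        img[i] = (unsigned char)(v * (out_max - out_min) + out_min + 0.5);
    }
}

Calling histogram_stretch(img, n, 0, 255) stretches over the full 8-bit range; calling it with a narrower [out_min, out_max] range performs the histogram shrink described in the next subsection.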
b. Histogram shrink
The opposite of a histogram stretch is a histogram shrink, which will
decrease image contrast by compressing the gray levels. The mapping
function for a histogram shrink can be found by the following equation:
Shrink(I(r,c)) = [ (ShrinkMAX − ShrinkMIN) / (I(r,c)MAX − I(r,c)MIN) ] × (I(r,c) − I(r,c)MIN) + ShrinkMIN
where ShrinkMAX and ShrinkMIN correspond to the maximum and minimum desired
values in the compressed (shrunk) histogram.
c. Histogram slide
The histogram slide technique is used to make an image either darker or lighter while
retaining the relationship between gray-level values: a constant OFFSET is added to
(or subtracted from) all gray levels:
Slide(I(r,c)) = I(r,c) + OFFSET
Where: MAX = 255; MIN = 0
Solution:
80  147  93
255 107  67
120 187   0
I(1,2) = [(100 − 20) / (200 − 10)] × (60 − 10) + 20 = 41.05
I(2,0) = [(100 − 20) / (200 − 10)] × (100 − 10) + 20 = 57.89
I(2,1) = [(100 − 20) / (200 − 10)] × (150 − 10) + 20 = 78.94
I(2,2) = [(100 − 20) / (200 − 10)] × (10 − 10) + 20 = 20
45  66  49
100 53  41
57  78  20
Example: Apply histogram slide to the following sub-image, where OFFSET = 10:
 7  12   8
20   9   6
10  15   1
Solution:
Slide( I(r,c) ) =I(r,c) + OFFSET
17  22  18
30  19  16
20  25  11
Histogram equalization
Example:
Apply histogram equalization to the following 4×4 gray-scale sub-image (L = 256 gray levels):
50  55  150 150
51  50   55  55
70  80   90 100
50  55   70  80
Solution:
The new (equalized) value of each gray level v is
h(v) = round( (cdf(v) − cdf_min) / (M×N − 1) × (L − 1) )
where cdf(v) is the cumulative number of pixels with gray level ≤ v, cdf_min = 3 is the
smallest non-zero cdf value, M×N = 16 is the total number of pixels, and L − 1 = 255.
h(50)  = round((3 − 3) / (16 − 1) × 255) = 0
h(51)  = round((4 − 3) / (16 − 1) × 255) = 17
h(55)  = round((8 − 3) / (16 − 1) × 255) = 85
h(70)  = round((10 − 3) / (16 − 1) × 255) = 119
h(80)  = round((12 − 3) / (16 − 1) × 255) = 153
h(90)  = round((13 − 3) / (16 − 1) × 255) = 170
h(100) = round((14 − 3) / (16 − 1) × 255) = 187
h(150) = round((16 − 3) / (16 − 1) × 255) = 221

Intensity  Count  cdf  h(r)
50         3      3    0
51         1      4    17
55         4      8    85
70         2      10   119
80         2      12   153
90         1      13   170
100        1      14   187
150        2      16   221

Equalized sub-image:
0   85  221 221
17   0   85  85
119 153 170 187
0   85  119 153
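The worked example can be reproduced with a short sketch that follows the formula used above (cdf_min is the smallest non-zero cumulative count and n is the number of pixels); the flat-array image representation is an assumption of the example.

#include <stdio.h>

/* Histogram equalization of an 8-bit gray-scale image, using
   h(v) = round((cdf(v) - cdf_min) / (n - 1) * 255). */
void equalize(unsigned char *img, int n)
{
    int hist[256] = {0}, cdf[256];
    for (int i = 0; i < n; i++) hist[img[i]]++;      /* histogram         */
    int run = 0, cdf_min = 0;
    for (int v = 0; v < 256; v++) {                  /* cumulative counts */
        run += hist[v];
        cdf[v] = run;
        if (cdf_min == 0 && hist[v] > 0) cdf_min = cdf[v];
    }
    for (int i = 0; i < n; i++) {
        double h = (double)(cdf[img[i]] - cdf_min) / (n - 1) * 255.0;
        img[i] = (unsigned char)(h + 0.5);           /* round             */
    }
}

int main(void)
{
    unsigned char img[16] = { 50, 55, 150, 150,
                              51, 50,  55,  55,
                              70, 80,  90, 100,
                              50, 55,  70,  80 };
    equalize(img, 16);                               /* the example above */
    for (int i = 0; i < 16; i++) printf("%d%c", img[i], (i % 4 == 3) ? '\n' : ' ');
    return 0;
}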
Example: the original image is 256×256 pixels, single band (gray scale), 8 bits
per pixel. This file is 65,536 bytes (64K). After compression the image file is 6,554
bytes. The compression ratio is:
Size U / Size C = 65536 / 6554 = 9.999 ≈ 10, which can be written as 10:1.
This is called a "10 to 1" compression or a "10 times compression", or it can be
stated as "compressing the image to 1/10 its original size".
2. Lossy Compression.
Lossy compression is the class of data encoding methods that allows some loss of
the image data.
The decompressed image is not identical to the original image file.
Lossy methods can provide high degrees of compression and result in smaller
compressed files, but some of the original pixels, sound waves or video frames
are removed forever.
Compression System Model
The compression system model consists of two parts: the compressor
(Encoding) and the decompressor (Decoding).
Compressor: consists of preprocessing stage and encoding stage.
Decompressor: consists of decoding stage followed by a post processing stage, as
following figure:
Entropy
An important concept here is the idea of measuring the average information
in an image, referred to as entropy. The entropy of an N×N image can be calculated by:
Entropy = − Σ (i = 0 to L−1) p_i log2(p_i)   (bits/pixel)
Where:
p_i = the probability of the ith gray level = n_i / N²
n_i = the total number of pixels with gray value i
L = the total number of gray levels (e.g. 256 for 8 bits)
Example
Let L = 8, meaning that there are 3 bits/pixel in the original image. Let the
number of pixels at each gray-level value be equal (they have the same probability),
that is:
P0 = P1 = P2 = ... = P7 = 1/8
Now, we can calculate the entropy as follows:
Entropy = − Σ (i = 0 to 7) (1/8) log2(1/8) = 8 × (1/8) × 3 = 3
This tells us that the theoretical minimum for lossless coding for this image is 3
bits/pixel.
Note: log2(x) can be found by taking log10(x) and multiplying by about 3.32, because
1/log10(2) = 3.32192809488736234.
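The entropy formula can be evaluated directly from an image histogram; this sketch assumes a flat array of 8-bit pixels and divides natural logarithms by log(2) to obtain base-2 logarithms.

#include <math.h>
#include <stdio.h>

/* Entropy (bits/pixel) of an 8-bit gray-scale image:
   H = -sum_i p_i * log2(p_i), with p_i = n_i / (number of pixels). */
double entropy(const unsigned char *img, int n_pixels)
{
    int hist[256] = {0};
    for (int i = 0; i < n_pixels; i++) hist[img[i]]++;
    double h = 0.0;
    for (int v = 0; v < 256; v++) {
        if (hist[v] == 0) continue;              /* p log p -> 0 for p = 0 */
        double p = (double)hist[v] / n_pixels;
        h -= p * log(p) / log(2.0);              /* convert to log base 2  */
    }
    return h;
}

int main(void)
{
    /* 8 pixels, one at each of the gray levels 0..7: uniform probabilities. */
    unsigned char img[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    printf("Entropy = %.2f bits/pixel\n", entropy(img, 8));   /* prints 3.00 */
    return 0;
}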
2. Simple Repetition Suppression
Replace a series of repeated symbols with a token and a count of the number of
occurrences.
Usually we need a special flag to denote when the repeated token appears.
Simplicity is its downfall: it gives poor compression ratios.
Compression savings depend on the content of the data.
Example:
89400000000000000000000000000000000
We can replace this with: 894f32, where f is the flag for zero.
3. Pattern Substitution
• This is a simple form of statistical encoding.
• Here we substitute a frequently repeating pattern(s) with a code.
• The code is shorter than the pattern giving us compression.
• The simplest scheme could employ predefined codes:
Example: Basic Pattern Substitution
Replace all occurrences of the character pattern 'and' with the predefined code '&',
so "and you and I" becomes "& you & I".
4- Shannon-Fano Method
4. The left part of the list is assigned the binary digit 0, and the right part is assigned
the digit 1. This means that the codes for the symbols in the first part will all
start with 0, and the codes in the second part will all start with 1.
5. Recursively apply the steps 3 and 4 to each of the two halves, subdividing groups
and adding bits to the codes until each symbol has become a corresponding
code leaf on the tree.
Example: The source of information A generates the symbols {A0, A1, A2, A3 and
A4} with the corresponding probabilities {0.4, 0.3, 0.15, 0.1 and 0.05}. Encoding the
source symbols using binary encoder and Shannon-Fano encoder gives:
The average length of the Shannon-Fano code is
Huffman Code
The Huffman coding algorithm comprises two steps, reduction and splitting. These
steps can be summarized as follows:
1) Reduction
a) List the symbols in descending order of probability.
b) Reduce the r least probable symbols to one symbol with a probability
equal to their combined probability.
c) Reorder in descending order of probability at each stage.
d) Repeat the reduction step until only two symbols remain.
2) Splitting
a) Assign 0, 1, ..., r − 1 to the r final symbols and work backwards.
b) Expand or lengthen the code to cope with each successive split.
Example: Design Huffman codes for S = {s1, s2, ..., s5}, having the probabilities
{0.2, 0.4, 0.2, 0.1, 0.1}.
The average code length is
L_avg = Σ (i = 0 to 4) p_i l_i
The entropy of the source is
H(S) = Σ p_i log2(1/p_i) = − Σ p_i log2(p_i)
H(S) = −[0.4 log 0.4 + 2 × 0.2 log 0.2 + 2 × 0.1 log 0.1] / log 2 = 2.12193
bits/symbol
Unit Five
The simplest kind of sound wave is a sine wave. Pure sine waves rarely exist in
the natural world, but they are a useful place to start because all other sounds can
be broken down into combinations of sine waves. Even though such pressure
waves are longitudinal, they still have ordinary wave properties and behaviors, such
as reflection (bouncing), refraction (change of angle when entering a medium
with a different density) and diffraction (bending around an obstacle).
If we wish to use a digital version of sound waves we must form digitized
representations of audio information.
Sampling:
• Sampling rate: the number of samples per second (measured in Hz).
• E.g., CD-standard audio uses a sampling rate of 44,100 Hz (44,100 samples per second).
Quantization:
• 3-bit quantization gives 8 possible level values.
• E.g., CD-standard audio uses 16-bit quantization, giving 65,536 values.
Non-uniform sampling is also possible. This is not used for sampling in time, but is
used for quantization (the µ-law). We call it non-linear if it is logarithmic. The non-
linear scale is used because small-amplitude signals are more likely to occur than
large-amplitude signals, and they are less likely to mask any noise.
Data rate = sample rate * quantization * channel
Q) Compare rates for CD vs. mono audio?
Mono audio: Data rate = 8000 samples/second * 8 bits/sample * 1 channel
= 7.8 kBytes / second
CD: Data rate = 44,100 samples/second * 16 bits/sample * 2 channels
= 172.26 KBytes / second ~= 10MB / minute
To digitize sound correctly, we must sample fast enough to be able to recover the original sound.
As a simple illustration, Fig. (1,a) shows a single sinusoid: it is a single, pure,
frequency (only electronic instruments can create such boring sounds). Now if the
sampling rate just equals the actual frequency, we can see from Fig. (1,b) that a false
signal is detected: it is simply a constant, with zero frequency. On the other hand,
if we sample at 1.5 times the frequency, Fig. (1,c) shows that we obtain an incorrect
(alias) frequency that is lower than the correct one; it is half the correct one (the
wavelength, from peak to peak, is double that of the actual signal).
An alias is any artifact that does not belong to the original signal. Thus, for
correct sampling we must use a sampling rate equal to at least twice the maximum
frequency content in the signal. This is called the Nyquist rate.
f_s ≥ 2 f_c
Fig. 1: Aliasing:
(a) a single frequency;
(b) sampling at exactly the frequency produces a constant;
(c) sampling at 1.5 times per cycle produces an alias frequency that is perceived as
lower than the true frequency.
Digital Sampling Artifacts Arise - Effect known as Aliasing which affects Audio,
Image and Video
Generally, if a signal is band-limited—that is, if it has a lower limit f1 and an
upper limit f2 of frequency components in the signal—then we need a sampling rate
of at least 2( f2 − f1).
Example,
If the true frequency is 5.5 kHz and the sampling frequency is 8 kHz, then the
alias frequency is 2.5 kHz. So, if the sampling frequency is less than twice the true
frequency but greater than the true frequency, the alias frequency equals the sampling
frequency minus the true frequency (here 8 − 5.5 = 2.5 kHz). More generally, the alias
frequency is the distance from the true frequency to the nearest integer multiple of
the sampling frequency.
For example, when the true frequency is between 1.0 and 1.5 times the
sampling frequency, the alias frequency equals the true frequency minus the
sampling frequency.
In general, the apparent frequency of a sinusoid is the lowest frequency of a
sinusoid that has exactly the same samples as the input sinusoid.
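The folding rule above can be captured in a few lines of C; fmod is used to fold the true frequency into the band [0, f_s / 2].

#include <math.h>
#include <stdio.h>

/* Apparent (alias) frequency of a sinusoid of frequency f_true sampled at
   f_s: fold the true frequency into the band [0, f_s / 2]. */
double alias_frequency(double f_true, double f_s)
{
    double r = fmod(f_true, f_s);      /* offset above the nearest lower multiple */
    if (r > f_s / 2.0) r = f_s - r;    /* fold down into [0, f_s / 2]             */
    return r;
}

int main(void)
{
    printf("%.1f kHz\n", alias_frequency(5.5, 8.0));  /* prints 2.5, as in the text */
    return 0;
}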
Data Level
Example: A digital signal has eight levels. How many bits are needed per level?
Number of bits per level = log2 8 = 3
For a noiseless channel, the Nyquist bit rate is C = 2B log2 L, where B is the bandwidth
and L is the number of signal levels (for L = 2^n levels, this gives C = 2Bn).
Example: Consider a noiseless channel with a bandwidth of 3000 Hz transmitting a
signal with four levels (for each level, we send 2 bits).
Number of bits per level = log2 4 = 2
C = 2 × 3000 × log2 4 = 12,000 bps.
Example: Television channels are 6 MHz wide. How many bits/sec can be sent if
four-level digital signals are used? Assume a noiseless channel.
C = 2 X (6 X 106) log2 4= 24 Mbps.
Synthetic Sound (23rd)
Quantization and transmission of audio (24th)
Quantization
Since we quantize, we may choose to create either a more accurate or a less
accurate representation of the sound magnitude values.
To compress the data, we can assign a bit stream that uses fewer bits for the
most prevalent signal values.
The quantization process introduces a certain amount of error or distortion into
the signal samples.
Perceptual Quantization (µ-Law)
We want intensity values logarithmically mapped over N quantization units.
Quantization and transformation of data are collectively known as coding of
the data. For audio, the µ-law technique for companding audio signals is usually
combined with a simple algorithm that exploits the temporal redundancy present in
audio signals.
Example: CD audio, which uses 16-bit samples at a 44,100 Hz sampling rate. There
are two parallel streams, one for each channel, to produce stereo. What is the
transmission rate of CD-quality audio?
As long as you understand the terms involved, this is a straightforward math
problem. For each of the two channels, there are 44,100 samples per second. Each
of these samples requires 16 bits. Transmission rates are normally described in
terms of the number of bits per second that must flow from the source to the
destination. In our case:
44100 samples per second * 16 bits per sample * 2 channels = 1411.2 kbps
PAM system can be converted to PCM if we add ADC at the source and
DAC at the destination.
Figure: PCM signal encoding and decoding
d_t = x_t − x_(t−1)
Differences between adjacent samples tend to be small, so we can use, say, 4 bits
for each difference instead of 12 bits for each full sample: the first value is stored
with full bits, while the changes use fewer bits.
Example:
(a) Full values: 220, 218, 221, 219, 220, 221, 222, 218, ... Originally, encoding a
sequence of numbers in the range 0-255 needs 8 bits per value.
(b) Changes (difference sequence): 220, −2, +3, −2, +1, +1, +1, −4, ...
Difference coding: each difference needs only 3 bits.
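A sketch of the difference-coding idea: keep the first sample in full and store every later sample as the change from its predecessor. The int buffers are an assumption; a real coder would also limit the number of bits used for each difference.

#include <stdio.h>

/* Difference (delta) coding of a sample sequence: the first value is kept
   as-is, every later entry becomes the change from the previous sample. */
void delta_encode(const int *x, int *d, int n)
{
    d[0] = x[0];
    for (int t = 1; t < n; t++) d[t] = x[t] - x[t - 1];
}

/* Inverse operation: rebuild the original samples from the differences. */
void delta_decode(const int *d, int *x, int n)
{
    x[0] = d[0];
    for (int t = 1; t < n; t++) x[t] = x[t - 1] + d[t];
}

int main(void)
{
    int x[8] = {220, 218, 221, 219, 220, 221, 222, 218}, d[8];
    delta_encode(x, d, 8);
    for (int i = 0; i < 8; i++) printf("%d ", d[i]);  /* 220 -2 3 -2 1 1 1 -4 */
    printf("\n");
    return 0;
}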
Audio compression can be used for speech or music. For speech we need to compress
a 64-kHz digitized signal, while for music we need to compress a 1.411-MHz signal.
Two categories of techniques are used for audio compression: predictive encoding
and perceptual encoding.
In lossless data compression, the integrity of the data is preserved. The original data
and the data after compression and decompression are exactly the same because, in
these methods, the compression and decompression algorithms are exact inverses
of each other: no part of the data is lost in the process. Redundant data is removed
in compression and added during decompression. Lossless compression methods
are normally used when we cannot afford to lose any data.
1. μ-Law and A-Law Companding
These two methods encode audio samples by means of nonlinear quantization.
The μ-law encoder inputs 14-bit samples and outputs 8-bit codewords.
The A-law encoder inputs 13-bit samples and outputs 8-bit codewords. The G.711
standard defines an 8-bit codeword whose format is shown below:
P S2 S1 S0 Q3 Q2 Q1 Q0
Step4: Use bit P to determine the sign of the result.
End
Q3 Q2 Q1 Q0
Bit:       0  0  0  1  0  1  0  1  1  0  0  0  1
Position: 12 11 10  9  8  7  6  5  4  3  2  1  0
μ-Law decoder
1. The quantization code is 101₂ = 5, so 5 × 2 + 33 = 43.
2. The segment code is 100₂ = 4, so 43 × 2⁴ = 688.
3. Decrement by the bias: 688 − 33 = 655.
4. Bit P is 1, so the final result is −655. Thus, the quantization error (the noise) is
1: very small.
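The decoding steps worked through above translate directly into code; this sketch follows only those steps (it is not a complete G.711 implementation) and assumes the sign bit P, the segment code and the quantization code have already been extracted from the codeword.

#include <stdio.h>

/* Decode one mu-law codeword following the steps described above:
   p = sign bit, segment = S2 S1 S0, quant = Q3 Q2 Q1 Q0. */
int mulaw_decode(int p, int segment, int quant)
{
    int v = quant * 2 + 33;        /* step 1: 2Q + bias 33        */
    v <<= segment;                 /* step 2: scale by 2^segment  */
    v -= 33;                       /* step 3: remove the bias     */
    return p ? -v : v;             /* step 4: apply the sign      */
}

int main(void)
{
    /* Codeword with P = 1, segment = 100 (4), quantization = 101 (5). */
    printf("%d\n", mulaw_decode(1, 4, 5));   /* prints -655, as in the text */
    return 0;
}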
Unit Six
Video Basics (26th)
INTRODUCTION:
Video can be defined as photographic images that are played back at speeds of
15, 25 or 30 frames per second and provide the appearance of full motion (see Figure
1). A video consists of a time-ordered sequence of frames (images).
DIGITAL VIDEO:
Digital video refers to the capturing, manipulation, and storing of moving images
that can be displayed on computer screens. This requires that the moving images be
digitally handled by the computer. The word digital refers to a system based on
discontinuous events (sampling), as opposed to analogue, a continuous event.
Pixels are the basic unit in digital video, where each color component is stored
digitally for each pixel.
Many photographers prefer shooting in raw format due to the high quality of images
that the camera sensor can produce.
Aspect Ratio: The aspect ratio of an image describes the proportional
relationship between its width and its height. It is commonly expressed as
two numbers separated by a colon, as in 16:9.
In analog video, NTSC uses 525 lines whereas PAL uses 625. Many lines are not
actually used for picture information, so the total numbers relevant for the picture
are somewhat smaller: 486 lines for NTSC and 576 lines for PAL. HD formats defined
by the ATSC have either 1080 or 720 active picture lines per frame.
Pixels per line | Lines | Pixel aspect ratio | Frame aspect ratio | Notes
320  | 240  | 1:1 | 4:3 | Used for web distribution or offline video editing.
640  | 480  | 1:1 | 4:3 | An early standard for analog-to-digital video editing, and an ATSC video specification.
720  | 480  | Height greater than width | 4:3 | NTSC DV and DVD image dimensions. Also part of the ATSC video specification.
720  | 486  | Height greater than width | 4:3 | NTSC SD video dimensions used for professional digital formats such as Digital Betacam, D-1, and D-5.
720  | 576  | Width greater than height | 4:3 | PAL SD video dimensions used for digital formats such as Digital Betacam, D-1, and D-5, as well as DVD and DV.
1280 | 720  | 1:1 | 16:9 | An HD video format, capable of higher frame rates in exchange for smaller image dimensions.
1920 | 1080 | 1:1 | 16:9 | An HD video format with very high resolution.
960  | 720  | 4:3 | 16:9 | Some 720p formats (such as DVCPRO HD and HDV) subsample 1280 pixels to 960 to minimize the data rate.
1440 or 1280 | 1080 | 4:3 or 3:2 | 16:9 | Some 1080-line formats (such as HDV and DVCPRO HD) subsample 1920 pixels to 1440 or even 1280 to minimize the data rate.
In order to calculate the size of a given video clip (without compression), we
use the following equation:
video size (bits) = width x height x bits per pixel x frame rate x duration (time)
Where:
Bit per pixel = 24 for RGB, 8 for gray-level and 1 for binary image
Frame rate or frame per second = 15, 25, 30 fps or even more
Duration = time in seconds
Example:
Find the size in Kbyte for a 10 minutes Full HD (1920x1080) video sequence (RGB
type) with a frame rate = 30 fps.
Solutions:
RGB type = 24 bits=3Bytes
Time in seconds = 10 x 60 = 600
Video size = 1920 x 1080 x 3 x 30 x 600
Video size = 111,974,400,000 Bytes
To convert to KBytes:
Video size = 111,974,400,000 / 1024 = 109,350,000 KBytes
Video Rate (bits) = width x height x bit per pixel x frame rate
Example:
What is the bit rate for high-definition TV (HDTV)? Assuming 1920 x 1080 pixels per
frame, 24 bits per pixel, and 30 frames per second:
Video rate = 1920 x 1080 x 24 x 30 = 1,492,992,000 bits/sec ≈ 1.5 Gbps.
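Both formulas can be wrapped in small helpers; the unit conversions (8 bits per byte, 1 KB = 1024 bytes) follow the worked example above and are the only assumptions made here.

#include <stdio.h>

/* Uncompressed video bit rate in bits per second. */
double video_rate_bps(int width, int height, int bits_per_pixel, double fps)
{
    return (double)width * height * bits_per_pixel * fps;
}

/* Uncompressed video size in kilobytes for a clip of 'seconds' duration. */
double video_size_kb(int width, int height, int bits_per_pixel,
                     double fps, double seconds)
{
    return video_rate_bps(width, height, bits_per_pixel, fps) * seconds / 8.0 / 1024.0;
}

int main(void)
{
    /* 10-minute Full HD RGB clip at 30 fps, as in the example above. */
    printf("Size: %.1f KB\n", video_size_kb(1920, 1080, 24, 30, 600));     /* 109350000.0 */
    /* HDTV bit rate: 1920x1080, 24 bits/pixel, 30 fps. */
    printf("Rate: %.2f Gbps\n", video_rate_bps(1920, 1080, 24, 30) / 1e9); /* ~1.49       */
    return 0;
}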
Color Model
1. RGB Model
All other colours are generated from the three basic colours: Red, Green, and Blue.
(Newer) colour LCD panels (typically thin-film-transistor liquid-crystal displays,
TFT LCD) use a transistor switch for each (R, G or B) sub-pixel.
Used in display devices: monitors, LED panels, data-show (projectors), etc.
2. CMY Model
Cyan, Magenta and Yellow are the complements of Red, Green and Blue:
C = 1 − R, M = 1 − G, Y = 1 − B.
3. CIE Chromaticity Diagram (XYZ Model)
• All visible colors are in a horseshoe-shaped cone in the X-Y-Z space. Consider
the plane X+Y+Z = 1 and project it onto the X-Y plane; we get the CIE
chromaticity diagram.
• The edges represent the pure colors.
4. YUV Color Model
• Digital video standard established in 1982
• Video is represented by a sequence of fields (odd and even lines). Two fields
make a frame.
• Works in PAL (50 fields/sec) or NTSC (60 fields/sec)
• Uses the Y, U, V color space.
5. HSL Color Model
A color named in everyday terms (e.g. "orange") is easy to picture in our minds, but
if we defined this color in terms of its RGB components R = 245, G = 110, and B = 20,
most people would have no idea how this color appears. Because the HSL color space
was developed based on heuristics relating to human perception, various methods are
available to transform RGB pixel values into the HSL color space. Most of these are
algorithmic in nature and are geometric approximations to mapping the RGB color
cube into some HSL-type color space.
Video Compression
1) Reduce color nuances within the image
2) Reduce the color resolution with respect to the prevailing light intensity
3) Remove small, invisible parts of the picture
4) Compare adjacent images and remove details that are unchanged between
two images
The first three are image based compression techniques, where only one frame is
evaluated and compressed at a time. The main one that does the real image
compression is the Discrete Cosine Transform (DCT) followed by a quantization that
removes the redundant information (the “invisible” parts).
2- Compress sequences of frames by only storing the differences between them.
This is based on Motion Compensation (MC).
A motion vector (MV) describes the offset between the location of the block being
coded (in the current frame) and the location of the best-match block in the
reference frame
The idea of looking for the football player in the next frame is called motion
estimation, and the concept of shifting pieces of the frame around so as to best
subtract away the player is called motion compensation.
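A sketch of motion estimation by exhaustive block matching: for a block of the current frame, every offset within a search range in the reference frame is tried, and the position with the smallest sum of absolute differences (SAD) gives the motion vector. The frame layout and the omission of boundary checks are simplifications made for this example.

#include <limits.h>

/* Sum of absolute differences between a BxB block of the current frame at
   (cx, cy) and a BxB block of the reference frame at (rx, ry). */
static int sad(const unsigned char *cur, const unsigned char *ref, int width,
               int cx, int cy, int rx, int ry, int B)
{
    int s = 0;
    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++) {
            int d = cur[(cy + y) * width + cx + x] - ref[(ry + y) * width + rx + x];
            s += (d < 0) ? -d : d;
        }
    return s;
}

/* Full-search motion estimation for one block: try every offset within
   +/- range and return the motion vector (mvx, mvy) with the smallest SAD.
   The caller must ensure the search window stays inside the frame. */
void motion_estimate(const unsigned char *cur, const unsigned char *ref,
                     int width, int cx, int cy, int B, int range,
                     int *mvx, int *mvy)
{
    int best = INT_MAX;
    *mvx = 0; *mvy = 0;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            int s = sad(cur, ref, width, cx, cy, cx + dx, cy + dy, B);
            if (s < best) { best = s; *mvx = dx; *mvy = dy; }
        }
}

The residual (the current block minus the best-match block shifted by the motion vector) is what is then transform-coded; subtracting along the motion vector is the motion-compensation step.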
The basic principle for video compression is the image-to-image prediction. The first
image is called an I-frame and is self-contained, having no dependency outside of
that image. The following frames may use part of the first image as a reference. An
image that is predicted from one reference image is called a P-frame and an image
that is bidirectionally predicted from two reference images is called a B-frame.
I-frames: I (Intracoded) frames, self-contained
P-frames: (Predicted) from last I or P reference frame
B-frames: (Bidirectional); predicted from two references one in the past and
one in the future, and thus out of order decoding is needed
Figure: The illustration above shows how a typical sequence with I-, B-, and P-frames may look.
Note that a P-frame may only reference a preceding I- or P-frame, while a B-frame may reference
both preceding and succeeding I- and P-frames.
The video decoder restores the video by decoding the bit stream frame by frame.
Decoding must always start with an I-frame, which can be decoded independently,
while P- and B-frames must be decoded together with current reference image(s).