IPT Module 1
In the human eye, the distance between the lens and the imaging region (the retina) is fixed, and the focal
length needed to achieve proper focus is obtained by varying the shape of the lens. The fibers in the ciliary
body accomplish this, flattening or thickening the lens for distant or near objects, respectively. The distance
between the center of the lens and the retina along the visual axis is approximately 17 mm. The range of
focal lengths is approximately 14 mm to 17 mm, the latter taking place when the eye is relaxed and focused
at distances greater than about 3 m.
Sampling Rate: The rate at which the samples are taken is called the sampling rate or pixel resolution. It
determines the level of detail captured and impacts the quality of the digital image. Higher sampling rates
yield more accurate representations of the original image but also result in larger file sizes.
Digitizing amplitude values is called quantization.
Quantization Levels: The number of quantization levels determines the range of discrete values that each
pixel can take. For example, an 8-bit image allows for 256 (2^8) quantization levels, resulting in 256
possible intensity values for each pixel. Higher bit depths provide more levels and enable a greater range of
colors or shades of gray.
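As an illustration, the sketch below (a minimal example assuming NumPy; the image patch is synthetic and its contents are arbitrary) quantizes continuous intensities in [0, 1] to the 256 discrete levels of an 8-bit image:

import numpy as np

analog = np.random.rand(4, 4)             # hypothetical continuous intensities in [0.0, 1.0]

bits = 8
levels = 2 ** bits                        # 256 quantization levels for an 8-bit image

# Map each sample to the nearest of the 256 discrete levels (0..255)
quantized = np.round(analog * (levels - 1)).astype(np.uint8)
print(quantized.min(), quantized.max())   # all values fall in the range 0..255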
Spatial resolution refers to the amount of detail or information captured in an image with respect to its
spatial dimensions, such as width and height. It is typically measured in pixels per unit distance, such as
pixels per inch (PPI) or pixels per centimeter (PPC), and represents the smallest distinguishable feature
in an image.
For example, an image with a higher spatial resolution will have more pixels per unit area, which means
it can capture finer details and smaller features compared to an image with lower spatial resolution.
Spatial resolution is influenced by factors such as the camera sensor size, lens quality, and the imaging
system's pixel density.
Intensity resolution, on the other hand, refers to the ability of the imaging system to distinguish
differences in brightness or color between adjacent pixels. It is usually measured in bits per pixel, where
a higher bit depth indicates a higher intensity resolution.
For example, an image with 8 bits per pixel can distinguish 256 different intensity levels (2^8), while an
image with 16 bits per pixel can distinguish 65,536 different intensity levels (2^16). A higher intensity
resolution allows for more accurate and detailed representation of the colors and brightness in an image.
Image Interpolation:
Image interpolation is the process of estimating pixel values in an image at locations where there are no
measured values. This is often necessary when scaling or resampling an image to a different size or
resolution. It is commonly used to increase or decrease the spatial resolution of an image, rescale or resize
an image, or transform an image from one coordinate system to another.
When an image is resized or rescaled, the number of pixels may change, resulting in gaps or uneven spacing
between the original pixels. Image interpolation fills in these gaps by estimating the values of the missing
pixels. The goal of interpolation is to generate a visually plausible and smooth representation of the image.
Interpolation algorithms typically use the surrounding pixels in the image to estimate the value at the
missing location. There are several interpolation methods available, including:
1. Nearest-neighbor interpolation: This method simply assigns the value of the nearest pixel to the
missing location. It is the fastest method but can result in blocky or pixelated images.
2. Bilinear interpolation: This method takes the weighted average of the four nearest pixels to estimate
the missing value. It produces smoother results than nearest-neighbor interpolation but can result in
loss of sharpness or detail.
3. Bicubic interpolation: This method uses a more complex algorithm to estimate the missing value
based on the sixteen nearest pixels. It produces smoother and sharper results than bilinear
interpolation but can be slower and may introduce artifacts.
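The sketch below (assuming OpenCV and NumPy are available; the test image is synthetic) compares the three interpolation methods when upscaling an image by a factor of 4:

import cv2
import numpy as np

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)    # synthetic test image

# Upscale by 4x with each of the interpolation methods described above
nearest = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_LINEAR)
bicubic = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
print(nearest.shape)   # (256, 256, 3)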
Image types
1. Binary Images: Binary images are the simplest type of digital images and consist of only two intensity
values or colors. Typically, these values are black and white, where black represents the absence of a
feature or object, and white represents the presence of a feature or object. Binary images are commonly
used for tasks such as image segmentation, edge detection, and object recognition. They are also used as
masks or templates for image processing operations.
2. Grayscale Images: Grayscale images are digital images in which each pixel has a single intensity value
representing the level of brightness or gray level. The intensity values range from 0 (black) to the
maximum value (white). Grayscale images do not contain color information but represent various shades
of gray. They are commonly used in applications where color information is not necessary, such as
medical imaging, document processing, and image enhancement techniques.
3. Color Images: Color images contain multiple channels of information, typically representing the three
primary colors: red, green, and blue (RGB). Each pixel in a color image consists of three color channels,
and the combination of intensities in these channels determines the perceived color. Color images
provide a more realistic representation of the visual world and are widely used in fields such as
photography, digital art, computer vision, and multimedia applications. Color images can be further
classified based on the color model used, such as RGB, CMYK, HSI, or YUV.
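As a rough sketch of how the three image types relate (assuming NumPy, a synthetic color image, the common BT.601 luma weights for the gray conversion, and an arbitrary threshold of 128):

import numpy as np

rgb = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)   # synthetic color image

# Grayscale: weighted sum of the R, G, and B channels (BT.601 luma weights)
gray = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]).astype(np.uint8)

# Binary: threshold the grayscale image; pixels brighter than 128 become 1, the rest 0
binary = (gray > 128).astype(np.uint8)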
2. Lossy compression: This mechanism compresses the image data by discarding some of the information
that is deemed less important or noticeable to the human eye. Examples of lossy compression formats
include JPEG and MPEG. Lossy compression is often used for web images, digital video, and other
applications where storage space is limited and image quality can be slightly reduced without significant
visual impact.
3. RAW format: This is a file format that stores the unprocessed image data captured by a camera sensor,
without any compression or processing. RAW format allows for maximum flexibility and control in
post-processing, but requires more storage space and processing power than compressed formats.
4. Cloud storage: This mechanism allows for storing images remotely on cloud servers, which can be
accessed from anywhere with an internet connection. Cloud storage services such as Google Photos,
iCloud, and Amazon Cloud Drive offer various storage plans and features, as well as automatic backup
and synchronization.
5. External storage devices: This mechanism involves storing images on external hard drives, USB flash
drives, or other physical storage devices. External storage devices offer high storage capacity and
portability, but require regular backup and protection from physical damage or theft.
3. Multiplication: In image multiplication, the pixel values of two or more images are multiplied together to
create a new image. The resulting image will have pixel values that are the product of the corresponding
pixel values in the input images. Image multiplication can be used for various applications, such as
image filtering and contrast enhancement.
An important application of image multiplication (and division) is shading correction. Suppose that
an imaging sensor produces images that can be modelled as the product of a “perfect image,”
denoted by f(x, y), times a shading function h(x, y); that is, g(x, y) = f(x, y) h(x, y). If h(x, y) is known,
we can recover f(x, y) by dividing g(x, y) by h(x, y).
Another common use of image multiplication is in masking, also called region of interest (ROI),
operations. The process consists simply of multiplying a given image by a mask image that has 1s in
the ROI and 0s elsewhere. There can be more than one ROI in the mask image, and the shape of the
ROI can be arbitrary, although rectangular shapes are used frequently for ease of implementation.
4. Division: In image division, the pixel values of one image are divided by the pixel values of another
image to create a new image. The resulting image will have pixel values that represent the quotient of
the corresponding pixel values in the input images. Image division can be used for various applications,
such as image normalization and image thresholding.
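The following minimal sketch (assuming NumPy; the "perfect" image, the shading function, and the ROI coordinates are all made up for illustration) shows shading correction by division and ROI masking by multiplication:

import numpy as np

f = np.full((100, 100), 180.0)            # hypothetical "perfect" image f(x, y)
yy, xx = np.mgrid[0:100, 0:100]
h = 0.5 + 0.5 * (xx / 99.0)               # hypothetical shading function h(x, y)
g = f * h                                 # observed image g(x, y) = f(x, y) h(x, y)

# Shading correction: recover f(x, y) by dividing g(x, y) by the known h(x, y)
f_est = g / h

# ROI masking: multiply by a mask image with 1s inside a rectangular ROI and 0s elsewhere
mask = np.zeros_like(g)
mask[20:60, 30:80] = 1
roi = g * mask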
Logical Operations:
Logical operations on images involve performing Boolean operations on the pixel values of two or more
images to create a new image. These operations can be used for various purposes, such as image masking,
image thresholding, and image segmentation. Here are some common logical operations on images:
1. AND: In image AND operation, the pixel values of two images are compared using a logical "AND"
operator, and the resulting pixel value in the output image is set to 1 only if both input pixel values are 1.
Otherwise, the resulting pixel value is set to 0. Image AND operation can be used for image masking,
where one image is used as a mask to select regions of interest in another image.
2. OR: In image OR operation, the pixel values of two images are compared using a logical "OR" operator,
and the resulting pixel value in the output image is set to 1 if either or both input pixel values are 1.
Otherwise, the resulting pixel value is set to 0. Image OR operation can be used for image blending,
where two images are combined to create a new image that contains features from both.
3. NOT: In image NOT operation, the pixel values of an image are inverted, so that pixels that are
originally set to 1 are set to 0, and vice versa. Image NOT operation can be used for image thresholding,
where an image is inverted to create a binary image that highlights regions of interest.
4. XOR: In image XOR operation, the pixel values of two images are compared using a logical "XOR"
operator, and the resulting pixel value in the output image is set to 1 only if one of the input pixel values
is 1 and the other is 0. Otherwise, the resulting pixel value is set to 0. Image XOR operation can be used
for image segmentation, where regions of interest in an image are separated from the background.
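A minimal sketch of these operations on two small synthetic binary images (assuming NumPy; the array contents are arbitrary):

import numpy as np

a = np.array([[1, 1, 0, 0]], dtype=bool)   # synthetic binary images
b = np.array([[1, 0, 1, 0]], dtype=bool)

and_img = np.logical_and(a, b)   # 1 only where both inputs are 1
or_img = np.logical_or(a, b)     # 1 where either (or both) input is 1
not_img = np.logical_not(a)      # inverts every pixel
xor_img = np.logical_xor(a, b)   # 1 where exactly one of the inputs is 1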
This transformation can scale, rotate, translate, or shear an image in both coordinates depending on the
values chosen for the elements of the matrix T.
In practice, we can use Eq. (2.6-23) in two basic ways.
The first, called a forward mapping, consists of scanning the pixels of the input image and, at each
location (v, w) , computing the spatial location, (x, y), of the corresponding pixel in the output image
using Eq. (2.6-23) directly. A problem with the forward mapping approach is that two or more pixels in
the input image can be transformed to the same location in the output image, raising the question of how
to combine multiple output values into a single output pixel. In addition, it is possible that some output
locations may not be assigned a pixel at all.
The second approach, called inverse mapping, scans the output pixel locations and, at each location, (x,
y), computes the corresponding location in the input image using (v, w) = T^(-1)(x, y). It then interpolates
among the nearest input pixels to determine the intensity of the output pixel value. Inverse mappings are
more efficient to implement than forward mappings and are used in numerous commercial
implementations of spatial transformations.
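As a minimal sketch of inverse mapping (assuming NumPy, a column-vector homogeneous-coordinate convention, a made-up scaling matrix T, and nearest-neighbor interpolation for simplicity):

import numpy as np

src = np.random.randint(0, 256, (50, 50)).astype(np.float64)   # synthetic input image

T = np.array([[1.5, 0.0, 0.0],     # hypothetical affine matrix: scale both coordinates by 1.5
              [0.0, 1.5, 0.0],
              [0.0, 0.0, 1.0]])
T_inv = np.linalg.inv(T)

out = np.zeros((75, 75))
for x in range(out.shape[0]):
    for y in range(out.shape[1]):
        # Inverse mapping: find where output pixel (x, y) came from in the input image
        v, w, _ = T_inv @ np.array([x, y, 1.0])
        v, w = int(round(v)), int(round(w))          # nearest-neighbor interpolation
        if 0 <= v < src.shape[0] and 0 <= w < src.shape[1]:
            out[x, y] = src[v, w]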
Image Registration:
Image registration is an important application of digital image processing used to align two or more
images of the same scene.
In image registration, we have available the input and output images, but the specific transformation that
produced the output image from the input generally is unknown.
The problem, then, is to estimate the transformation function and then use it to register the two images.
To clarify terminology, the input image is the image that we wish to transform, and what we call the
reference image is the image against which we want to register the input.
One of the principal approaches for solving the problem just discussed is to use tie points (also called
control points), which are corresponding points whose locations are known precisely in the input and
reference images.
There are numerous ways to select tie points, ranging from interactively selecting them to applying
algorithms that attempt to detect these points automatically.
In some applications, imaging systems have physical artifacts (such as small metallic objects) embedded
in the imaging sensors. These produce a set of known points (called reseau marks) directly on all images
captured by the system, which can be used as guides for establishing tie points.
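One common way to estimate the transformation from tie points is a least-squares fit of an affine model; the sketch below (assuming NumPy and four made-up tie-point pairs) illustrates the idea:

import numpy as np

# Hypothetical tie points: (v, w) in the input image, (x, y) in the reference image
input_pts = np.array([[10, 12], [40, 15], [12, 45], [44, 48]], dtype=np.float64)
ref_pts = np.array([[12, 10], [43, 14], [13, 47], [46, 50]], dtype=np.float64)

# Fit a 6-parameter affine model x = a*v + b*w + c, y = d*v + e*w + f by least squares
A = np.hstack([input_pts, np.ones((len(input_pts), 1))])
params, *_ = np.linalg.lstsq(A, ref_pts, rcond=None)    # 3x2 parameter matrix

# Apply the estimated transform and check how closely the tie points line up
registered = A @ params
print(np.abs(registered - ref_pts).max())               # residual registration error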
3. GIF (Graphics Interchange Format): This is a popular format for simple animations and images with a
limited color palette. It supports transparency and animation, making it useful for web graphics and
social media.
4. BMP (Bitmap): This format is used for storing bitmap images in a device-independent format. It
supports images with a wide range of color depths and can be used for both monochrome and color
images.
5. TIFF (Tagged Image File Format): This is a flexible and high-quality format that supports various color
depths and compression methods. It is often used for professional printing and publishing.
6. RAW: This is a format used by digital cameras to store uncompressed image data. It provides the highest
quality and flexibility for post-processing, but also requires specialized software to work with.
7. PSD (Adobe Photoshop Document): This is a proprietary format used by Adobe Photoshop to save
layered images and other advanced features. It is primarily used for professional graphic design and
editing.
Colour Fundamentals:
Characterization of light is central to the science of color. If the light is achromatic (void of color), its
only attribute is its intensity, or amount. Achromatic light is what viewers see on a black and white
television set.
Chromatic light spans the electromagnetic spectrum from approximately 400 to 700 nm. Three basic
quantities are used to describe the quality of a chromatic light source: radiance, luminance, and
brightness.
Radiance is the total amount of energy that flows from the light source, and it is usually measured in
watts (W).
Luminance, measured in lumens (lm), gives a measure of the amount of energy an observer perceives
from a light source. For example, light emitted from a source operating in the far infrared region of the
spectrum could have significant energy (radiance), but an observer would hardly perceive it; its
luminance would be almost zero.
Finally, brightness is a subjective descriptor that is practically impossible to measure. It embodies the
achromatic notion of intensity and is one of the key factors in describing color sensation.
cones are the sensors in the eye responsible for color vision.
Detailed experimental evidence has established that the 6 to 7 million cones in the human eye can be
divided into three principal sensing categories, corresponding roughly to red, green, and blue.
Approximately 65% of all cones are sensitive to red light, 33% are sensitive to green light, and only
about 2% are sensitive to blue (but the blue cones are the most sensitive).
Due to these absorption characteristics of the human eye, colors are seen as variable combinations of the
so-called primary colors red (R), green (G), and blue (B).
The primary colors can be added to produce the secondary colors of light— magenta (red plus blue),
cyan (green plus blue), and yellow (red plus green).
Mixing the three primaries, or a secondary with its opposite primary color, in the right intensities
produces white light.
The characteristics generally used to distinguish one color from another are brightness, hue, and
saturation
Hue is an attribute associated with the dominant wavelength in a mixture of light waves. Hue represents
dominant color as perceived by an observer. Thus, when we call an object red, orange, or yellow, we are
referring to its hue.
Saturation refers to the relative purity or the amount of white light mixed with a hue. The pure spectrum
colors are fully saturated. Colors such as pink (red and white) and lavender (violet and white) are less
saturated, with the degree of saturation being inversely proportional to the amount of white light added.
Hue and saturation taken together are called chromaticity, and, therefore, a color may be characterized
by its brightness and chromaticity. The amounts of red, green, and blue needed to form any particular
color are called the tristimulus values and are denoted X, Y, and Z, respectively.
A color is then specified by its trichromatic coefficients, defined as
x = X / (X + Y + Z), y = Y / (X + Y + Z), z = Z / (X + Y + Z),
so that x + y + z = 1.
Another approach for specifying colors is to use the CIE chromaticity diagram (Fig. 6.5), which shows
color composition as a function of x (red) and y (green). For any value of x and y, the corresponding
value of z (blue) is obtained by noting that z = 1 – (x + y).
The positions of the various spectrum colors—from violet at 380 nm to red at 780 nm—are indicated
around the boundary of the tongue-shaped chromaticity diagram.
These are the pure colors shown in the spectrum of Fig. 6.2.
Any point not actually on the boundary but within the diagram represents some mixture of spectrum
colors.
The point of equal energy shown in Fig. 6.5 corresponds to equal fractions of the three primary colors; it
represents the CIE standard for white light.
Any point located on the boundary of the chromaticity chart is fully saturated.
As a point leaves the boundary and approaches the point of equal energy, more white light is added to
the color and it becomes less saturated. The saturation at the point of equal energy is zero.
The chromaticity diagram is useful for color mixing because a straight-line segment joining any two
points in the diagram defines all the different color variations that can be obtained by combining these
two colors additively.
Colour Models:
The purpose of a color model (also called color space or color system) is to facilitate the specification of
colors in some standard, generally accepted way.
In essence, a color model is a specification of a coordinate system and a subspace within that system
where each color is represented by a single point.
In terms of digital image processing, the hardware-oriented models most commonly used in practice are
the RGB (red, green, blue) model for color monitors and a broad class of color video cameras; the CMY
(cyan, magenta, yellow) and CMYK (cyan, magenta, yellow, black) models for color printing; and the
HSI (hue, saturation, intensity) model, which corresponds closely with the way humans describe and
interpret color. The HSI model also has the advantage that it decouples the color and gray-scale
information in an image, making it suitable for many gray-scale processing techniques.
1. RGB Model:
In the RGB model, each color appears in its primary spectral components of red, green, and blue.
This model is based on a Cartesian coordinate system.
The color subspace of interest is the cube shown in Fig. 6.7, in which RGB primary values are at three
corners; the secondary colors cyan, magenta, and yellow are at three other corners; black is at the origin;
and white is at the corner farthest from the origin.
In this model, the gray scale (points of equal RGB values) extends from black to white along the line
joining these two points.
The different colors in this model are points on or inside the cube, and are defined by vectors extending
from the origin.
Images represented in the RGB color model consist of three component images, one for each primary
color.
When fed into an RGB monitor, these three images combine on the screen to produce a composite color
image.
The number of bits used to represent each pixel in RGB space is called the pixel depth.
Consider an RGB image in which each of the red, green, and blue images is an 8-bit image. Under these
conditions each RGB color pixel [that is, a triplet of values (R, G, B)] is said to have a depth of 24 bits (3
image planes times the number of bits per plane).
The term full-color image is used often to denote a 24-bit RGB color image. The total number of colors
in a 24-bit RGB image is (2^8)^3 = 16,777,216.
A color image can be acquired by using three filters, sensitive to red, green, and blue, respectively.
When we view a color scene with a monochrome camera equipped with one of these filters, the result is
a monochrome image whose intensity is proportional to the response of that filter.
Repeating this process with each filter produces three monochrome images that are the RGB component
images of the color scene.
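A minimal sketch of assembling the three component images into a 24-bit RGB image (assuming NumPy; the three monochrome planes are synthetic stand-ins for the filtered acquisitions):

import numpy as np

# Three hypothetical 8-bit monochrome images captured through red, green, and blue filters
r = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
g = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
b = np.random.randint(0, 256, (100, 100), dtype=np.uint8)

# Stack the component images into one 24-bit (3 planes x 8 bits) RGB image
rgb = np.dstack([r, g, b])
print(rgb.shape, rgb.dtype)   # (100, 100, 3) uint8 -> pixel depth of 24 bits
print(2 ** 24)                # 16777216 possible colors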
CMY Model:
The CMY color model, also known as the subtractive color model, is a color model used in printing,
painting, and color mixing. CMY stands for cyan (C), magenta (M), and yellow (Y), the three primary
colors in this model. The CMY model works on the principle of subtractive color mixing, where colors
are created by subtracting certain wavelengths of light from white light.
In the CMY model, the primary colors are represented as percentages or decimal values ranging from 0
to 100, where 0 represents no ink or colorant, and 100 represents full saturation of the colorant. When all
three primary colors are combined at their maximum saturation, they theoretically absorb all
wavelengths of light and produce black. However, in practice, this often results in a dark, muddy color,
so black ink (K) is commonly added to the model to improve the depth and richness of dark colors.
To create specific colors using the CMY model, the cyan, magenta, and yellow colorants are combined
in various proportions. When equal amounts of cyan, magenta, and yellow are combined, they produce a
neutral gray color. By adjusting the amounts of each colorant, a wide range of colors can be achieved.
For example, combining 100% cyan and 100% magenta without yellow would create a shade of blue,
while combining 100% yellow and 100% magenta without cyan would create a shade of red.
The CMY color model is used in printing processes such as offset printing and inkjet printing, where
cyan, magenta, yellow, and black (CMYK) inks are used to reproduce a wide range of colors. The black
ink (K) is added to improve the reproduction of dark colors and to provide more precise control over
color reproduction.
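For normalized values in [0, 1], the commonly used relation between the additive and subtractive primaries is simple complementation; a tiny sketch (the example color is arbitrary):

r, g, b = 0.0, 0.4, 0.9          # hypothetical RGB color, normalized to [0, 1]

# Each subtractive primary is the complement of the corresponding additive primary
c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
print(c, m, y)                   # approximately 1.0, 0.6, 0.1

# The relation is its own inverse, so converting back recovers the RGB values
print(1.0 - c, 1.0 - m, 1.0 - y)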
CMYK Model:
The CMYK color model is a subtractive color model used in printing, primarily for producing full-color
images and documents. CMYK stands for cyan (C), magenta (M), yellow (Y), and black (K),
representing the four primary ink colors used in the model. It is based on the principle of subtractive
color mixing, where colors are created by subtracting certain wavelengths of light from white light.
In the CMYK model, each primary color is represented by a percentage value ranging from 0 to 100,
indicating the amount of ink applied. A value of 0 represents no ink or colorant, while 100 represents full
saturation of the colorant. By combining different amounts of cyan, magenta, yellow, and black inks, a
wide range of colors can be achieved.
The addition of black (K) in the CMYK model is important for several reasons. First, it improves the
depth and richness of dark colors, as the combination of cyan, magenta, and yellow alone tends to
produce a muddy or dark brown color. Second, adding black ink reduces the amount of ink used, which
helps in terms of cost and drying time. Additionally, using black ink separately allows for precise control
over the richness of black areas in an image or text.
To reproduce a color image using the CMYK model, the original RGB image (which is commonly used
for digital displays) needs to be converted to CMYK. This conversion takes into account the differences
in color gamuts between RGB and CMYK, as RGB has a wider range of colors compared to CMYK.
The conversion process involves mapping the RGB values to their closest CMYK equivalents,
considering the characteristics of the printing process and the specific inks and papers being used.
The CMYK color model is widely used in various printing processes, including offset printing, digital
printing, and commercial printing. It allows for accurate reproduction of colors in printed materials such
as brochures, magazines, posters, and packaging. Designers and printers work with the CMYK model to
ensure the desired colors are achieved and maintained throughout the printing process.
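A naive sketch of converting a normalized RGB color to CMYK by pulling the common gray component into the black channel (this ignores the ICC-profile-based color management used in real print workflows; the example color is arbitrary):

def rgb_to_cmyk(r, g, b):
    """Naive conversion of normalized RGB values in [0, 1] to CMYK."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)                       # move the shared gray component into black ink
    if k >= 1.0:
        return 0.0, 0.0, 0.0, 1.0          # pure black
    scale = 1.0 - k
    return (c - k) / scale, (m - k) / scale, (y - k) / scale, k

print(rgb_to_cmyk(0.2, 0.4, 0.6))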
HSI Model:
The HSI (Hue, Saturation, Intensity) color model, closely related to the HSB/HSV (Hue, Saturation, Brightness) model, is a representation
of colors based on human perception. It provides an alternative way to describe and manipulate colors,
particularly for tasks such as color selection, image editing, and computer graphics.
1. Hue (H): It represents the dominant wavelength of a color and is often associated with the color
names we commonly use, such as red, blue, or green. The hue component is represented by an angle
ranging from 0 to 360 degrees or a normalized value between 0 and 1. The color wheel is often used
to visualize the hue component, where different angles correspond to different colors.
2. Saturation (S): It represents the purity or intensity of a color. A higher saturation value indicates a
more vivid or pure color, while a lower saturation value approaches a grayscale or desaturated color.
Saturation is typically represented as a percentage or a normalized value between 0 and 1.
3. Intensity (I): It represents the overall brightness of the color, independent of its hue or saturation.
Intensity is typically represented as a normalized value between 0 and 1, where 0 corresponds to
black and 1 to white.
The HSI color model is derived from the RGB color model, and it allows for a more intuitive way of
manipulating colors. Here's a brief explanation of how the components of HSI relate to RGB:
1. Hue (H): Hue is determined by the combination of red, green, and blue values in the RGB model.
The hue component of HSI can be thought of as the angle around the color wheel, where different
angles correspond to different primary and secondary colors.
2. Saturation (S): Saturation is determined by the amount of white light mixed with the color. In the
RGB model, it is calculated by comparing the intensity of the color to the maximum intensity
possible at a given brightness level. A fully saturated color in the RGB model has no white light
mixed with it.
3. Intensity or Brightness (I or B): Intensity represents the overall brightness of the color. In the RGB
model, it is calculated by averaging the red, green, and blue values.
The HSI color model provides advantages in certain applications. For example, in color selection, it allows
users to easily adjust the perceived brightness or intensity of a color without altering its hue. In image
editing, it can be used for operations such as color enhancement, adjusting saturation levels, or changing the
overall brightness of an image.
Converting RGB to HSI
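A minimal sketch of the widely used conversion (intensity as the average of R, G, and B; saturation from the minimum component; hue as an angle on the color wheel), assuming normalized RGB inputs in [0, 1]:

import math

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB values in [0, 1] to (H, S, I), with H in degrees."""
    i = (r + g + b) / 3.0                                    # intensity: average of R, G, B
    s = 0.0 if (r + g + b) == 0 else 1.0 - 3.0 * min(r, g, b) / (r + g + b)

    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        theta = 0.0                                          # achromatic pixel: hue undefined, use 0
    else:
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta                   # hue as an angle on the color wheel
    return h, s, i

print(rgb_to_hsi(0.8, 0.2, 0.2))   # a reddish color: hue near 0 degrees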