DIP Notes
[EC612PE]
Prepared by
Course Objectives:
To comprehend the relationship between the human visual system and machine perception and
processing of digital images.
To provide a detailed approach towards image processing applications like enhancement,
segmentation, and compression.
Course Outcomes:
Explore the limitations of computational methods on digital images.
Implement spatial and frequency domain image transforms for the enhancement and
restoration of images.
Develop a thorough understanding of image enhancement techniques.
Define the need for compression and evaluate the basic compression algorithms.
UNIT - I
Digital Image Fundamentals & Image Transforms: Digital Image Fundamentals, Sampling
and Quantization, Relationship between Pixels.
Image Transforms: 2-D FFT, Properties, Walsh Transform, Hadamard Transform, Discrete
Cosine Transform, Haar Transform, Slant Transform, Hotelling Transform.
UNIT - II
Image Enhancement (Spatial Domain): Introduction, Image Enhancement in Spatial Domain,
Enhancement through Point Processing, Types of Point Processing, Histogram Manipulation,
Linear and Non – Linear Gray Level Transformation, Local or Neighborhood criterion, Median
Filter, Spatial Domain High-Pass Filtering.
Image Enhancement (Frequency Domain): Filtering in Frequency Domain, Low Pass
(Smoothing) and High Pass (Sharpening) Filters in Frequency Domain.
UNIT - III
Image Restoration: Degradation Model, Algebraic Approach to Restoration, Inverse Filtering,
Least Mean Square Filters, Constrained Least Squares Restoration, Interactive Restoration.
UNIT – IV
Image Segmentation: Detection of Discontinuities, Edge Linking And Boundary Detection,
Thresholding, Region Oriented Segmentation.
Morphological Image Processing: Dilation and Erosion: Dilation, Structuring Element
Decomposition, Erosion, Combining Dilation and Erosion, Opening and Closing, Hit or Miss
Transformation.
UNIT - V
Image Compression: Redundancies and their Removal Methods, Fidelity Criteria, Image
Compression Models, Huffman and Arithmetic Coding, Error Free Compression, Lossy
Compression, Lossy and Lossless Predictive Coding, Transform Based Compression, JPEG 2000
Standards.
TEXT BOOKS:
1. Digital Image Processing- Rafael C. Gonzalez, Richard E. Woods, 3rd Edition,
Pearson, 2008
2. Digital Image Processing- S Jayaraman, S Esakkirajan, T Veerakumar- MC GRAW
HILL EDUCATION, 2010.
REFERENCE BOOKS:
1. Digital Image Processing and Analysis: Human and Computer Vision Applications
with CVIPtools - Scott E. Umbaugh, 2nd Ed, CRC Press, 2011
2. Digital Image Processing using MATLAB - Rafael C. Gonzalez, Richard E. Woods
and Steven L. Eddins, 2nd Edition, MC GRAW HILL EDUCATION, 2010.
3. Digital Image Processing and Computer Vision - Sonka, Hlavac, Boyle - Cengage
Learning (Indian edition) 2008.
4. Introductory Computer Vision, Imaging Techniques and Solutions - Adrian Low, 2008,
2nd Edition
INDEX
TITLE
Chapter-I Digital Image Fundamentals & Image Transforms
UNIT-1
DIGITAL IMAGE FUNDAMENTALS & IMAGE TRANSFORMS
The term gray level is used often to refer to the intensity of monochrome images.
Color images are formed by a combination of individual 2-D images.
For example, in the RGB color system a color image consists of three individual
component images (red, green, and blue). For this reason, many of the techniques developed for
monochrome images can be extended to color images by processing the three component
images individually.
An image may be continuous with respect to the x- and y- coordinates and also in
amplitude. Converting such an image to digital form requires that the coordinates, as well as
the amplitude, be digitized.
Medical applications:
Processing of chest X- rays
Cineangiograms
Projection images of transaxial tomography and
Medical images that occur in radiology and nuclear magnetic resonance (NMR)
Ultrasonic scanning
IMAGE PROCESSING TOOLBOX (IPT) is a collection of functions that extend the
capability of the MATLAB numeric computing environment. These functions, and the
expressiveness of the MATLAB language, make image processing operations easy to write in a
compact and clear manner.
Components of an Image Processing System:
Specialized image processing hardware: It consists of the digitizer just mentioned, plus
hardware that performs other primitive operations, such as an arithmetic logic unit (ALU), which
performs arithmetic operations such as addition and subtraction, and logical operations, in parallel on images.
Computer: It is a general-purpose computer and can range from a PC to a supercomputer
depending on the application. In dedicated applications, sometimes specially designed
computers are used to achieve a required level of performance.
Software: It consists of specialized modules that perform specific tasks. A well-designed
package also includes the capability for the user to write code that, as a minimum, utilizes the
specialized modules. More sophisticated software packages allow the integration of these
modules.
Mass storage: This capability is a must in image processing applications. An image of size
1024 x 1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires one
megabyte of storage space if the image is not compressed. Image processing applications
fall into three principal categories of storage:
i) Short-term storage for use during processing
ii) On-line storage for relatively fast retrieval
iii) Archival storage, such as magnetic tapes and disks
Image display: Image displays in use today are mainly color TV monitors. These monitors
are driven by the outputs of image and graphics display cards that are an integral part of the
computer system.
Hardcopy devices: The devices for recording images include laser printers, film cameras,
heat-sensitive devices, inkjet units, and digital units such as optical and CD-ROM disks. Film
provides the highest possible resolution, but paper is the obvious medium of choice for written
applications.
Networking: It is almost a default function in any computer system in use today. Because of
the large amount of data inherent in image processing applications, the key consideration in
image transmission is bandwidth.
Fundamental Steps in Digital Image Processing:
There are two categories of steps involved in image processing:
1. Methods whose inputs and outputs are images.
2. Methods whose inputs may be images but whose outputs are attributes extracted from those images.
Image acquisition: It could be as simple as being given an image that is already in digital
form. Generally, the image acquisition stage involves preprocessing such as scaling.
Image Enhancement: It is among the simplest and most appealing areas of digital image
processing. The idea behind this is to bring out details that are obscured, or simply to
highlight certain features of interest in an image. Image enhancement is a very subjective area of
image processing.
Color image processing: It is an area that has been gaining importance because of the
widespread use of digital images over the internet. Color image processing deals basically with
color models and their implementation in image processing applications.
Wavelets and Multiresolution Processing: These are the foundation for representing images
in various degrees of resolution.
Compression: It deals with techniques for reducing the storage required to save an image, or the
bandwidth required to transmit it over a network. It has two major approaches: a) lossless
compression and b) lossy compression.
Morphological processing: It deals with tools for extracting image components that are
useful in the representation and description of shape and boundary of objects. It is majorly
used in automated inspection applications.
Representation and Description: It almost always follows the output of the segmentation step, that is,
raw pixel data constituting either the boundary of a region or all the points in the region itself. In
either case, converting the data to a form suitable for computer processing is necessary.
Recognition: It is the process that assigns a label to an object based on its descriptors. It is the
last step of image processing, and it may use artificial intelligence techniques in software.
Knowledge base:
Knowledge about a problem domain is coded into an image processing system in the form of
a knowledge base. This knowledge may be as simple as detailing regions of an image where
the information of interest is known to be located, thus limiting the search that has to be
conducted in seeking that information. The knowledge base can also be quite complex, such
as an interrelated list of all major possible defects in a materials inspection problem, or an image
database containing high-resolution satellite images of a region in connection with change
detection applications.
In order to form a digital image, the gray-level values must also be converted (quantized) into
discrete quantities. Suppose we divide the gray-level scale into eight discrete levels, ranging from
black to white. The continuous gray levels are quantized simply by assigning one of the
eight discrete gray levels to each sample. The assignment is made depending on the vertical
proximity of a sample to a vertical tick mark.
Starting at the top of the image and carrying out this procedure line by line produces a two-dimensional
digital image.
Digital Image definition:
A digital image f(m,n) described in a 2D discrete space is derived from an analog
image f(x,y) in a 2D continuous space through a sampling process that is frequently referred
to as digitization. The mathematics of that sampling process will be described in subsequent
Chapters. For now we will look at some basic definitions associated with the digital image.
The effect of digitization is shown in figure.
The 2-D continuous image f(x,y) is divided into N rows and M columns. The
intersection of a row and a column is termed a pixel. The value assigned to the integer
coordinates (m,n), with m = 0, 1, 2, ..., N-1 and n = 0, 1, 2, ..., M-1, is f(m,n). In fact, in most cases,
f(m,n) is actually a function of many variables including depth, color and time (t).
Thus the right side of the matrix represents a digital element, pixel or pel. The matrix can be
represented in the following form as well. The sampling process may be viewed as
partitioning the xy plane into a grid, with the coordinates of the center of each grid cell being a
pair of elements from the Cartesian product Z², which is the set of all ordered pairs of
elements (zi, zj) with zi and zj being integers from Z. Hence f(x,y) is a digital image if (x,y) are
integers from Z² and f is a function that assigns a gray-level value (that is, a real number from
the set of real numbers R) to each distinct pair of coordinates (x,y). This functional assignment
is the quantization process. If the gray levels are also integers, Z replaces R, and a digital image
becomes a 2-D function whose coordinates and amplitude values are integers. Due to processing,
storage and hardware considerations, the number of gray levels typically is an integer power of 2:
L = 2^k
Then, the number, b, of bits required to store a digital image is b = M x N x k. When M = N, this
equation becomes b = N² x k.
When an image can have 2^k gray levels, it is referred to as a "k-bit image". An image with 256
possible gray levels is called an "8-bit image" (256 = 2^8).
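As a quick check of this storage formula, the following short Python sketch (the image size and bit depth are just the example values used above) evaluates b = M x N x k:

def image_storage(M, N, k):
    # bits and bytes needed for an uncompressed M x N image with k bits per pixel
    bits = M * N * k        # b = M * N * k
    return bits, bits // 8  # 8 bits per byte

# Example: a 1024 x 1024 image with 8-bit pixels
bits, nbytes = image_storage(1024, 1024, 8)
print(bits, nbytes)         # 8388608 bits = 1048576 bytes, i.e. one megabyte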
Images are generated by the combination of an illumination source and the reflection or
absorption of energy from that source by the elements of the scene being imaged; the
illumination energy is either reflected from, or transmitted through, objects. An example in the
first category is light reflected from a planar surface.
An example in the second category is when X-rays pass through a patient's body for the
purpose of generating a diagnostic X-ray film. In some applications, the reflected or
transmitted energy is focused onto a photo converter (e.g., a phosphor screen), which
converts the energy into visible light. Electron microscopy and some applications of gamma
imaging use this approach. The idea is simple: Incoming energy is transformed into a voltage
by the combination of input electrical power and sensor material that is responsive to the
particular type of energy being detected. The output voltage waveform is the response of the
sensor(s), and a digital quantity is obtained from each sensor by digitizing its response. In
this section, we look at the principal modalities for image sensing and generation.
Image Acquisition Using a Single Sensor:
In order to generate a 2-D image using a single sensor, there have to be relative displacements
in both the x- and y-directions between the sensor and the area to be imaged. Figure shows an
arrangement used in high-precision scanning, where a film negative is mounted onto a drum
whose mechanical rotation provides displacement in one dimension. The single sensor is
mounted on a lead screw that provides motion in the perpendicular direction. Since
mechanical motion can be controlled with high precision, this method is an inexpensive (but
slow) way to obtain high-resolution images. Other similar mechanical arrangements use a flat
bed, with the sensor moving in two linear directions. These types of mechanical digitizers
sometimes are referred to as microdensitometers.
Image Acquisition using a Sensor strips:
A geometry that is used much more frequently than single sensors consists of an in-line
arrangement of sensors in the form of a sensor strip, as the figure shows. The strip provides imaging
elements in one direction. Motion perpendicular to the strip provides imaging in the other
direction. This is the type of arrangement used in most flat bed scanners. Sensing devices
with 4000 or more in-line sensors are possible. In-line sensors are used routinely in airborne
imaging applications, in which the imaging system is mounted on an aircraft that flies at a
constant altitude and speed over the geographical area to be imaged. One dimensional
imaging sensor strips that respond to various bands of the electromagnetic spectrum are
mounted perpendicular to the direction of flight. The imaging strip gives one line of an image
at a time, and the motion of the strip completes the other dimension of a two-dimensional
image. Lenses or other focusing schemes are used to project the area to be scanned onto the
sensors. Sensor strips mounted in a ring configuration are used in medical and industrial
imaging to obtain cross-sectional ("slice") images of 3-D objects.
Image Acquisition Using Sensor Arrays:
The individual sensors can be arranged in the form of a 2-D array. Numerous electromagnetic and some
ultrasonic sensing devices frequently are arranged in an array format. This is also the predominant arrangement
found in digital cameras. A typical sensor for these cameras is a CCD array, which can be manufactured with a
broad range of sensing properties and can be packaged in rugged arrays of 4000 x 4000 elements or more. CCD
sensors are used widely in digital cameras and other light-sensing instruments. The response of each sensor is
proportional to the integral of the light energy projected onto the surface of the sensor, a property that is used in
astronomical and other applications requiring low-noise images. Noise reduction is achieved by letting the sensor
integrate the input light signal over minutes or even hours. Since the sensor array is two-dimensional, its key
advantage is that a complete image can be obtained by focusing the energy pattern onto the surface of the array;
motion obviously is not necessary, as is the case with the sensor arrangements discussed above. The figure shows
the energy from an illumination source being reflected from a scene element, but, as mentioned at the beginning
of this section, the energy also could be transmitted through the scene elements. The first function performed by
the imaging system is to collect the incoming energy and focus it onto an image plane. If the illumination is light,
the front end of the imaging system is a lens, which projects the viewed scene onto the lens focal plane. The
sensor array, which is coincident with the focal plane, produces outputs proportional to the integral of the light
received at each sensor. Digital and analog circuitry sweep these outputs and convert them to a video signal,
which is then digitized by another section of the imaging system.
NEIGHBORS OF A PIXEL:
A pixel p at coordinates (x,y) has four horizontal and vertical neighbors whose coordinates are
(x+1, y), (x-1, y), (x, y+1), (x, y-1). This set of pixels, called the 4-neighbors of p, is denoted by
N4(p). Each pixel is one unit distance from (x,y), and some of the neighbors of p lie outside the
digital image if (x,y) is on the border of the image. The four diagonal neighbors of p have
coordinates (x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1) and are denoted by ND(p).
These points, together with the 4-neighbors, are called the 8-neighbors of p, denoted
by N8(p).
As before, some of the points in ND(p) and N8(p) fall outside the image if (x,y) is on
the border of the image.
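The neighborhood definitions above can be made concrete with a short Python sketch. The helper names below are illustrative (they are not from the text); the functions simply enumerate N4(p), ND(p) and N8(p) for a pixel at (x,y) and discard neighbors that fall outside the image.

def n4(x, y):
    # 4-neighbors: horizontal and vertical neighbors of (x, y)
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    # diagonal neighbors of (x, y)
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(x, y):
    # 8-neighbors: union of the 4-neighbors and the diagonal neighbors
    return n4(x, y) + nd(x, y)

def inside(coords, rows, cols):
    # keep only the neighbors that lie inside an image of the given size
    return [(i, j) for (i, j) in coords if 0 <= i < rows and 0 <= j < cols]

# Example: neighbors of the corner pixel (0, 0) of a 5 x 5 image
print(inside(n8(0, 0), 5, 5))   # only (1, 0), (0, 1) and (1, 1) survive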
ADJACENCY AND CONNECTIVITY
Let V be the set of gray-level values used to define adjacency; in a binary image, V = {1}
if we are referring to adjacency of pixels with value 1.
In a gray-scale image, the idea is the same, but V typically contains more elements, for
example, V = {180, 181, 182, ..., 200}.
If the possible intensity values are in the range 0 to 255, V can be any subset of these 256 values.
There are three types of adjacency:
4-adjacency: two pixels p and q with values from V are 4-adjacent if q is in the
set N4(p).
8-adjacency: two pixels p and q with values from V are 8-adjacent if q is in the
set N8(p).
m-adjacency: two pixels p and q with values from V are m-adjacent if (i) q is in
N4(p), or (ii) q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are
from V.
• Mixed adjacency is a modification of 8-adjacency. It is introduced to eliminate the
ambiguities that often arise when 8-adjacency is used.
• For example:
Fig:1.8(a) Arrangement of pixels; (b) pixels that are 8-adjacent (shown dashed) to the
center pixel; (c) m-adjacency.
Types of Adjacency:
• In this example, we can note that to connect between two pixels (finding a path
between two pixels):
– In 8-adjacency way, you can find multiple paths between two pixels
– While, in m-adjacency, you can find only one path between two pixels
• So, m-adjacency has eliminated the multiple path connection that has been generated
by the 8-adjacency.
• Two subsets S1 and S2 are adjacent, if some pixel in S1 is adjacent to some pixel in S2.
Adjacent means, either 4-, 8- or m-adjacency.
A Digital Path:
• A digital path (or curve) from pixel p with coordinate (x,y) to pixel q with coordinate (s,t)
is a sequence of distinct pixels with coordinates (x0,y0), (x1,y1), …, (xn, yn) where (x0,y0) =
(x,y) and (xn, yn) = (s,t) and pixels (xi, yi) and (xi-1, yi-1) are adjacent for 1 ≤ i ≤ n
• n is the length of the path
• If (x0,y0) = (xn, yn), the path is closed.
We can specify 4-, 8- or m-paths depending on the type of adjacency specified.
• Return to the previous example:
Fig:1.8 (a) Arrangement of pixels; (b) pixels that are 8-adjacent(shown dashed) to the
center pixel; (c) m-adjacency.
In figure (b) the paths between the top right and bottom right pixels are 8-paths. And
the path between the same 2 pixels in figure (c) is m-path
Connectivity:
• Let S represent a subset of pixels in an image, two pixels p and q are said to be
connected in S if there exists a path between them consisting entirely of pixels in S.
• For any pixel p in S, the set of pixels that are connected to it in S is called a connected
component of S. If it only has one connected component, then set S is called a
connected set.
Region and Boundary:
• REGION: Let R be a subset of pixels in an image, we call R a region of the image if R
is a connected set.
• BOUNDARY: The boundary (also called border or contour) of a region R is
the set of pixels in the region that have one or more neighbors that are not in R.
If R happens to be an entire image, then its boundary is defined as the set of pixels in the first
and last rows and columns in the image. This extra definition is required because an image
has no neighbors beyond its borders. Normally, when we refer to a region, we are referring to
subset of an image, and any pixels in the boundary of the region that happen to coincide with
the border of the image are included implicitly as part of the region boundary.
DISTANCE MEASURES:
For pixels p, q and z, with coordinates (x,y), (s,t) and (v,w) respectively, D is a distance function
or metric if
(a) D(p,q) ≥ 0 (D(p,q) = 0 iff p = q),
(b) D(p,q) = D(q,p), and
(c) D(p,z) ≤ D(p,q) + D(q,z).
• The Euclidean distance between p and q is defined as:
De(p,q) = [(x - s)² + (y - t)²]^1/2
Pixels having a distance less than or equal to some value r from (x,y) are the points
contained in a disk of radius r centered at (x,y).
• The D4 distance (also called city-block distance) between p and q is defined as:
D4 (p,q) = | x – s | + | y – t |
Pixels having a D4 distance from (x,y), less than or equal to some value r form a
Diamond centered at (x,y)
Example:
The pixels with distance D4 ≤ 2 from (x,y) form the following contours of
constant distance.
The pixels with D4 = 1 are the 4-neighbors of (x,y)
• The D8 distance (also called chessboard distance) between p and q is defined as:
D8 (p,q) = max(| x – s |,| y – t |)
Pixels having a D8 distance from (x,y), less than or equal to some value r form a
square Centered at (x,y).
Example:
The pixels with D8 distance ≤ 2 from (x,y) form a square of constant-distance contours centered at (x,y).
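A small Python sketch of the three distance measures defined above (Euclidean, D4 city-block and D8 chessboard); the function names are mine, chosen only for illustration.

import math

def d_euclidean(p, q):
    (x, y), (s, t) = p, q
    return math.sqrt((x - s) ** 2 + (y - t) ** 2)

def d4(p, q):                       # city-block distance
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8(p, q):                       # chessboard distance
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

p, q = (2, 3), (5, 7)
print(d_euclidean(p, q), d4(p, q), d8(p, q))    # 5.0, 7, 4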
• Dm distance:
It is defined as the shortest m-path between the points.
In this case, the distance between two pixels will depend on the values of the
pixels along the path, as well as the values of their neighbors.
• Example:
Consider the following arrangement of pixels and assume that p, p2, and p4
have value 1 and that p1 and p3 can have a value of 0 or 1. Suppose
that we consider adjacency of pixels with value 1 (i.e. V = {1}).
Case 1: If p1 = 0 and p3 = 0, the length of the shortest m-path (the Dm distance) between p and p4 is 2 (p, p2, p4).
Case 2: If p1 = 1 and p3 = 0, then p2 and p will no longer be m-adjacent (see the m-adjacency definition)
and the length of the shortest m-path becomes 3 (p, p1, p2, p4).
Case 3: If p1 = 0 and p3 = 1, the same holds, and the length of the shortest m-path is again 3.
Case 4: If p1 = 1 and p3 = 1, the length of the shortest m-path will be 4 (p, p1, p2, p3, p4).
IMAGE TRANSFORMS:
2-D FFT:
The 2-D discrete Fourier transform F(k,l) of an image f(m,n) can be written in terms of its real
part R{F(k,l)} and imaginary part I{F(k,l)}, where
|F(k,l)| = [R²{F(k,l)} + I²{F(k,l)}]^1/2 is called the magnitude spectrum of the Fourier transform, and
φ(k,l) = tan⁻¹[ I{F(k,l)} / R{F(k,l)} ] is called the phase angle or phase spectrum.
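Using NumPy, the magnitude and phase spectra defined above can be computed as in the following sketch (the test image is random and only for illustration; np.fft.fft2 computes the 2-D DFT):

import numpy as np

f = np.random.rand(64, 64)          # any grayscale image as a 2-D array

F = np.fft.fft2(f)                  # 2-D DFT, F(k, l)
magnitude = np.abs(F)               # |F| = sqrt(R^2 + I^2)
phase = np.angle(F)                 # phi = arctan(I / R)

# The magnitude spectrum is usually displayed on a log scale with the
# zero-frequency term shifted to the center of the array:
display = np.log1p(np.abs(np.fft.fftshift(F)))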
PROPERTIES OF 2D-DFT
1. Separability property
2. Spatial shift property
3. Periodicity property
4. Convolution property
5. Correlation property
6. Scaling property
7. Conjugate symmetry property
8. Rotation property
SEPARABLE PROPERTY
The separable property allows a 2D transform to be computed in two steps by successive 1D operations
on rows and columns of an image.
F(k,l) = Σ_{m=0}^{N-1} F(m,l) e^(-j2πmk/N),  where  F(m,l) = Σ_{n=0}^{N-1} f(m,n) e^(-j2πnl/N)
That is, the 2-D DFT can be obtained by taking the 1-D DFT of every row of f(m,n) and then the 1-D DFT of every column of the result.
SPATIAL SHIFT PROPERTY
The 2-D DFT of a shifted version of the image f(m,n), i.e., f(m - m0, n), is given by
DFT[f(m - m0, n)] = F(k,l) e^(-j2πkm0/N)
where m0 represents the number of samples by which the function f(m,n) is shifted.
PERIODICITY PROPERTY
The 2-D DFT is periodic with period N in both variables: F(k,l) = F(k+N, l) = F(k, l+N) = F(k+N, l+N).
CONVOLUTION PROPERTY
Convolution is one of the most powerful operations in digital image processing. Convolution in the spatial
domain is equal to multiplication in the frequency domain. The convolution of two sequences x(n) and h(n) is
defined as
y(n) = Σ_k x(k) h(n - k)
CORRELATION PROPERTY
The correlation property is basically used to find the relative similarity between two signals. The process of
finding the similarity of a signal to itself is autocorrelation, whereas the process of finding the similarity
between two different signals is cross-correlation.
The correlation property tells us that the correlation of two sequences in the time domain is equal to the
multiplication of the DFT of one sequence by the complex conjugate (time reversal) of the DFT of the other
sequence in the frequency domain.
SCALING PROPERTY
Scaling is basically used to increase or decrease the size of an image. According to this property, the
expansion of a signal in one domain is equivalent to compression of the signal in the other domain.
If the DFT of f(m,n) is F(k,l), then DFT[f(am, bn)] = (1/|ab|) F(k/a, l/b).
CONJUGATE SYMMETRY
F(k,l) = F*(-k,-l)
ORTHOGONALITY PROPERTY
ROTATION PROPERTY
The rotation property states that if a function is rotated by an angle θ0, its Fourier transform is also rotated by
the same angle.
WALSH TRANSFORM:
We define now the 1-D Walsh transform as follows:
The array formed by the inverse Walsh matrix is identical to the one formed by the forward
Walsh matrix apart from a multiplicative factor N.
Walsh Transform
We define now the 2-D Walsh transform as a straightforward extension of the 1-D transform:
HADAMARD TRANSFORM:
We define now the 2-D Hadamard transform. It is similar to the 2-D Walsh transform.
We define now the Inverse 2-D Hadamard transform. It is identical to the forward 2-D
Hadamard transform.
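As a sketch of how the 2-D Walsh/Hadamard transform can be applied in practice, the snippet below uses the separable matrix form F = H f Hᵀ / N for an N x N natural-ordered Hadamard matrix H (scipy.linalg.hadamard builds such a matrix; the 8 x 8 block size is an example):

import numpy as np
from scipy.linalg import hadamard

N = 8
H = hadamard(N)                 # N x N matrix of +1/-1 entries (N must be a power of 2)

f = np.random.rand(N, N)        # a sample 8 x 8 image block

F = H @ f @ H.T / N             # forward 2-D Hadamard transform (rows, then columns)
f_rec = H @ F @ H.T / N         # the inverse has the same form, since H Hᵀ = N I

print(np.allclose(f, f_rec))    # True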
The general equation for a 2D (N by M image) DCT is defined by the following equation:
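The equation itself did not survive in these notes; the standard 2-D DCT-II, which is presumably what is meant here, has the form

F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x,y)\,
         \cos\!\left[\frac{(2x+1)u\pi}{2N}\right]
         \cos\!\left[\frac{(2y+1)v\pi}{2M}\right],
\qquad
\alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u \neq 0 \end{cases}

with \alpha(v) defined analogously using M.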
Each step in the one dimensional Haar wavelet transform calculates a set of wavelet
coefficients (Hi-D) and a set of averages (Lo-D). If a data set s0, s1,…, sN-1 contains N
elements, there will be N/2 averages and N/2 coefficient values. The averages are stored in
the lower half of the N element array and the coefficients are stored in the upper half.
a_i = (s_i + s_{i+1}) / 2        c_i = (s_i - s_{i+1}) / 2
In wavelet terminology the Haar average is calculated by the scaling function. The
coefficient is calculated by the wavelet function.
Two-Dimensional Wavelets
The two-dimensional wavelet transform is separable, which means we can apply a
one-dimensional wavelet transform to an image. We apply one-dimensional DWT to all rows
and then one-dimensional DWTs to all columns of the result. This is called the standard
decomposition and it is illustrated in figure 4.8.
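A minimal sketch of one level of the 1-D Haar step described above (pairwise averages and differences) and of the standard 2-D decomposition obtained by applying it to all rows and then to all columns; this follows the equations given earlier rather than any particular library API.

import numpy as np

def haar_1d(s):
    # one level of the 1-D Haar transform: N/2 averages followed by N/2 coefficients
    s = np.asarray(s, dtype=float)
    avg = (s[0::2] + s[1::2]) / 2.0        # a_i = (s_i + s_{i+1}) / 2
    coeff = (s[0::2] - s[1::2]) / 2.0      # c_i = (s_i - s_{i+1}) / 2
    return np.concatenate([avg, coeff])    # averages in the lower half, coefficients in the upper half

def haar_2d_standard(img):
    # standard decomposition: 1-D transform of every row, then of every column
    out = np.apply_along_axis(haar_1d, 1, img)    # all rows
    out = np.apply_along_axis(haar_1d, 0, out)    # then all columns
    return out

img = np.random.rand(8, 8)
print(haar_2d_standard(img).shape)                # (8, 8)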
SLANT TRANSFORM:
The slant transform matrix of order N = 2 is
S2 = (1/√2) [ 1   1 ]
            [ 1  -1 ]
Higher-order slant matrices SN (N = 4, 8, ...) are built recursively from SN/2 using a pair of
scaling parameters aN and bN, chosen so that the second basis vector varies linearly (slants)
across the samples.
Properties of the slant transform:
(i) The slant transform is real and orthogonal: S = S*, S⁻¹ = Sᵀ.
(ii) The slant transform is fast; it can be implemented in O(N log2 N) operations on an N x 1 vector.
(iii) The energy compaction achieved by this transform for images is in the very good to excellent range.
(iv) The basis vectors of the slant transform matrix S are not sequentially ordered for n ≥ 3.
HOTELLING TRANSFORM:
The basic principle of the Hotelling transform is the statistical properties of vector representation.
Consider a population of random vectors of the form
x = [x1, x2, ..., xn]ᵀ
And the mean vector of the population is defined as the expected value of x i.e.,
mx = E{x}
The subscript x indicates that the mean is associated with the population of x vectors. The
expected value of a vector or matrix is obtained by taking the expected value of each element.
The covariance matrix Cx in terms of x and mx is given as
Cx = E{(x - mx)(x - mx)ᵀ}
For M vector samples from a random population, the mean vector and covariance matrix
can be approximated from the samples by
mx ≈ (1/M) Σ_{k=1}^{M} xk   and
Cx ≈ (1/M) Σ_{k=1}^{M} xk xkᵀ - mx mxᵀ
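A NumPy sketch of the Hotelling (principal components) transform computed directly from these definitions; the sample data is random and only for illustration.

import numpy as np

# M sample vectors of dimension n, stored as the columns of X
n, M = 3, 1000
X = np.random.rand(n, M)

mx = X.mean(axis=1, keepdims=True)        # mean vector   m_x ~ (1/M) sum x_k
Cx = (X @ X.T) / M - mx @ mx.T            # covariance    C_x ~ (1/M) sum x_k x_k^T - m_x m_x^T

# The rows of the Hotelling transform matrix A are the eigenvectors of Cx,
# ordered so that the eigenvector of the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(Cx)     # eigh returns eigenvalues in ascending order
A = eigvecs[:, ::-1].T

Y = A @ (X - mx)                          # Hotelling transform y = A (x - m_x)
print(np.round(np.cov(Y, bias=True), 4))  # approximately diagonal: the components are decorrelated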
UNIT -2
IMAGE ENHANCEMENT
Image enhancement approaches fall into two broad categories: spatial domain
methods and frequency domain methods. The term spatial domain refers to the image plane
itself, and approaches in this category are based on direct manipulation of pixels in an image.
Frequency domain processing techniques are based on modifying the Fourier
transform of an image. Enhancing an image provides better contrast and a more detailed image
compared to the non-enhanced image. Image enhancement has many applications: it is used to
enhance medical images, images captured in remote sensing, images from satellites, etc. As indicated
previously, the term spatial domain refers to the aggregate of pixels composing an image.
Spatial domain methods are procedures that operate directly on these pixels. Spatial domain
processes will be denoted by the expression.
g(x,y) = T[f(x,y)]
where f(x, y) is the input image, g(x, y) is the processed image, and T is an operator
on f, defined over some neighborhood of (x, y). The principal approach in defining a
neighborhood about a point (x, y) is to use a square or rectangular subimage area centered at
(x, y), as Fig. 2.1 shows. The center of the subimage is moved from pixel to pixel starting,
say, at the top left corner. The operator T is applied at each location (x, y) to yield the output,
g, at that location. The process utilizes only the pixels in the area of the image spanned by the
neighborhood.
s=T(r)
where r is the pixel value of the input image and s is the pixel value of the output image. T is a
transformation function that maps each value of r to a value of s.
For example, if T(r) has the form shown in Fig. 2.2(a), the effect of this transformation would
be to produce an image of higher contrast than the original by darkening the levels below m
and brightening the levels above m in the original image. In this technique, known as
contrast stretching, the values of r below m are compressed by the transformation function
into a narrow range of s, toward black. The opposite effect takes place for values of r above
m.
In the limiting case shown in Fig. 2.2(b), T(r) produces a two-level (binary) image. A
mapping of this form is called a thresholding function.
One of the principal approaches in this formulation is based on the use of so-called
masks (also referred to as filters, kernels, templates, or windows). Basically, a mask is a small
(say, 3*3) 2-D array, such as the one shown in Fig. 2.1, in which the values of the mask
coefficients determine the nature of the process, such as image sharpening. Enhancement
techniques based on this type of approach often are referred to as mask processing or
filtering.
Histogram Equalization:
Histogram equalization is a common technique for enhancing the appearance of images. Suppose
we have an image which is predominantly dark. Then its histogram would be skewed towards the
lower end of the grey scale and all the image detail is compressed into the dark end of the
histogram. If we could 'stretch out' the grey levels at the dark end to produce a more uniformly
distributed histogram, then the image would become much clearer.
Let r represent the gray levels of the image to be enhanced, treated as a continuous quantity. The
range of r is [0, 1], with r = 0 representing black and r = 1 representing white. The transformation
function is of the form
s = T(r), where 0 ≤ r ≤ 1
It produces a level s for every pixel value r in the original image.
The transformation function is assumed to fulfill two conditions: T(r) is single-valued and
monotonically increasing in the interval 0 ≤ r ≤ 1, and 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1. The transformation
function should be single-valued so that the inverse transformation exists. The monotonically
increasing condition preserves the increasing order from black to white in the output image.
The second condition guarantees that the output gray levels will be in the same range as the
input levels. The gray levels of the image may be viewed as random variables in the interval
[0, 1]. The most fundamental descriptor of a random variable is its probability density
function (PDF); let Pr(r) and Ps(s) denote the probability density functions of the random variables r
and s respectively. A basic result from elementary probability theory states that if Pr(r) and
T(r) are known and T⁻¹(s) satisfies condition (a), then the probability density function Ps(s) of
the transformed variable is given by the formula
Ps(s) = Pr(r) |dr/ds|
Thus the PDF of the transformed variable s is the determined by the gray levels PDF of the
input image and by the chosen transformations function.
A transformation function of particular importance in image processing is the cumulative
distribution function (CDF) of r:
s = T(r) = ∫₀ʳ Pr(w) dw
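For a digital image the integral becomes a cumulative sum of the normalized histogram. A minimal NumPy sketch of histogram equalization for an 8-bit image follows (the variable names are mine):

import numpy as np

def histogram_equalize(img, L=256):
    # histogram equalization of an unsigned-integer image with L gray levels
    hist = np.bincount(img.ravel(), minlength=L)      # h(r_k)
    p = hist / img.size                               # p_r(r_k) = n_k / n
    cdf = np.cumsum(p)                                # T(r_k) = sum of p_r(r_j) for j <= k
    s = np.round((L - 1) * cdf).astype(img.dtype)     # map the result to [0, L-1]
    return s[img]                                     # apply the mapping pixel-wise

img = np.random.randint(0, 64, size=(128, 128), dtype=np.uint8)   # a dark, low-contrast image
eq = histogram_equalize(img)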
Histogram Matching
In some cases it may be desirable to specify the shape of the histogram that we wish the
processed image to have. Histogram equalization does not allow interactive image
enhancement and generates only one result: an approximation to a uniform histogram.
Sometimes we need to be able to specify particular histogram shapes capable of highlighting
certain gray-level ranges. The method use to generate a processed image that has a specified
histogram is called histogram matching or histogram specification.
Algorithm
1. Compute sk = T(rk), k = 0, ..., L-1, the histogram-equalization transformation of the input image, from its histogram Pf.
2. Compute G(zk), k = 0, ..., L-1, the transformation function, from the given (specified) histogram hz.
3. Compute G⁻¹(sk) for each k = 0, ..., L-1 using an iterative method (iterate on z), or, in effect,
directly compute G⁻¹(Pf(k)).
IMAGE NEGATIVES:
s = T(r) = (L - 1) - r
where r is the gray-level value at pixel (x,y) and L is the number of gray levels in the image.
It results in a photographic negative. It is useful for enhancing white details embedded in dark
regions of the image.
The overall graph of these transformations (negative, log, nth root, nth power) is shown in the figure below.
s = (L - 1) - r
Since the input image of Einstein is an 8 bpp image, the number of levels in this image is 256.
Putting 256 in the equation, we get
s = 255 - r
So each pixel value is subtracted from 255, and the resulting image is shown above: the lighter
pixels become dark and the darker pixels become light, which results in the image negative.
It has been shown in the graph below.
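With NumPy the negative transformation is a one-liner; the sketch below assumes an 8-bit image (L = 256):

import numpy as np

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)   # any 8-bit image
negative = 255 - img                                             # s = (L - 1) - r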
LOGARITHMIC TRANSFORMATIONS:
Logarithmic transformation further contains two types of transformation: log transformation
and inverse log transformation.
LOG TRANSFORMATIONS:
The log transformations can be defined by this formula
s = c log(r + 1).
Where s and r are the pixel values of the output and the input image and c is a constant. The
value 1 is added to each of the pixel values of the input image because if there is a pixel
intensity of 0 in the image, then log(0) is undefined. So 1 is added, to make the
minimum value at least 1.
During log transformation, the dark pixels in an image are expanded compared to the
higher pixel values. The higher pixel values are compressed in log transformation.
This results in the following image enhancement.
Another way of describing the log transformation: it enhances details in the darker regions of an
image at the expense of detail in brighter regions.
T(f) = C * log(1 + r)
Here C is a constant and r ≥ 0.
The shape of the curve shows that this transformation maps a narrow range of low gray-level
values in the input image into a wider range of output values.
The opposite is true for high-level values of the input image.
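A sketch of the log transformation, with the constant c chosen so that the output also spans the 8-bit range (a common, but not the only, choice):

import numpy as np

img = np.random.randint(0, 256, size=(64, 64)).astype(float)     # any grayscale image
c = 255.0 / np.log(1.0 + img.max())                              # scale the output to [0, 255]
log_img = (c * np.log(1.0 + img)).astype(np.uint8)               # s = c log(1 + r)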
POWER-LAW TRANSFORMATIONS:
There are two further transformations under power-law transformations, namely the nth
power and nth root transformations. These transformations can be given by the expression:
s = c r^γ
This symbol γ is called gamma, due to which this transformation is also known as
gamma transformation.
Variation in the value of γ varies the enhancement of the images. Different display
devices / monitors have their own gamma correction; that is why they display their images at
different intensities.
where c and γ are positive constants. Sometimes the equation is written as s = c (r + ε)^γ
to account for an offset (that is, a measurable output when the input is zero). Plots of s versus
r for various values of γ are shown in Fig. 2.10. As in the case of the log transformation,
power-law curves with fractional values of γ map a narrow range of dark input values into a
wider range of output values, with the opposite being true for higher values of input levels.
Unlike the log function, however, we notice here a family of possible transformation curves
obtained simply by varying γ.
The figure shows that curves generated with values of γ > 1 have exactly the opposite effect as those
generated with values of γ < 1. Finally, we note that the equation reduces to the identity
transformation when c = γ = 1.
Fig. 2.13 Plot of the equation s = c r^γ for various values of γ (c = 1 in all cases).
This type of transformation is used for enhancing images for different types of display
devices. The gamma of different display devices is different. For example, the gamma of a CRT
lies between 1.8 and 2.5, which means the image displayed on a CRT is dark.
Varying gamma (γ) gives a family of possible transformation curves s = c r^γ.
CORRECTING GAMMA:
s = c r^γ
s = c r^(1/2.5)
The same image but with different gamma values has been shown here.
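Gamma correction on a normalized image can be sketched as below; γ = 1/2.5 matches the CRT example above, and the scaling constant 255 is an assumption for an 8-bit display.

import numpy as np

img = np.random.randint(0, 256, size=(64, 64)) / 255.0     # normalize r to [0, 1]
gamma = 1.0 / 2.5                                          # pre-correct for a display gamma of 2.5
corrected = (255.0 * img ** gamma).astype(np.uint8)        # s = c * r ** gamma, with c = 255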
Piecewise-Linear Transformation Functions:
A complementary approach to the methods discussed in the previous three sections is
to use piecewise linear functions. The principal advantage of piecewise linear functions over
the types of functions we have discussed thus far is that the form of piecewise functions can
be arbitrarily complex.
The principal disadvantage of piecewise functions is that their specification requires
considerably more user input. Contrast stretching: One of the simplest piecewise linear functions is
a contrast-stretching transformation. Low-contrast images can result from poor illumination, lack
of dynamic range in the imaging sensor, or even wrong setting of a lens aperture during image
acquisition. S= T(r)
Figure x(a) shows a typical transformation used for contrast stretching. The locations
of points (r1, s1) and (r2, s2) control the shape of the transformation
function. If r1 = s1 and r2 = s2, the transformation is a linear function that produces no
changes in gray levels. If r1 = r2, s1 = 0 and s2 = L-1, the transformation becomes a thresholding
function that creates a binary image, as illustrated in fig. 2.2(b).
Intermediate values of (r1, s1) and (r2, s2) produce various degrees of spread in the
gray levels of the output image, thus affecting its contrast. In general, r1 ≤ r2 and s1 ≤ s2 is
assumed so that the function is single-valued and monotonically increasing.
Fig. x Contrast stretching. (a) Form of the transformation function. (b) A low-contrast image.
(c) Result of contrast stretching. (d) Result of thresholding. (Original image courtesy of
Dr. Roger Heady, Research School of Biological Sciences, Australian National University,
Canberra, Australia.)
Figure x(b) shows an 8-bit image with low contrast. Fig. x(c) shows the result of contrast
stretching, obtained by setting (r1, s1 )=(rmin, 0) and (r2, s2)=(r max,L-1) where rmin and r max
denote the minimum and maximum gray levels in the image, respectively. Thus, the
transformation function stretched the levels linearly from their original range to the full range
[0, L-1]. Finally, Fig. x(d) shows the result of using the thresholding function defined
previously, with r1=r2=m, the mean gray level in the image. The original image on which
these results are based is a scanning electron microscope image of pollen, magnified
approximately 700 times.
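A sketch of the contrast-stretching transformation with (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L-1), as in the example above:

import numpy as np

def contrast_stretch(img, L=256):
    # linearly stretch the gray levels from [rmin, rmax] to the full range [0, L-1]
    r = img.astype(float)
    rmin, rmax = r.min(), r.max()
    s = (r - rmin) / (rmax - rmin) * (L - 1)
    return s.astype(np.uint8)

low_contrast = np.random.randint(90, 160, size=(128, 128), dtype=np.uint8)
stretched = contrast_stretch(low_contrast)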
Gray-level slicing:
Highlighting a specific range of gray levels in an image often is desired. Applications
include enhancing features such as masses of water in satellite imagery and enhancing flaws
in X-ray images.
There are several ways of doing level slicing, but most of them are variations of two
basic themes. One approach is to display a high value for all gray levels in the range of
interest and a low value for all other gray levels.
This transformation, shown in Fig. y(a), produces a binary image. The second
approach, based on the transformation shown in Fig.y (b), brightens the desired range of gray
levels but preserves the background and gray-level tonalities in the image. Figure y (c) shows
a gray-scale image, and Fig. y(d) shows the result of using the transformation in Fig.
y(a).Variations of the two transformations shown in Fig. are easy to formulate.
Fig. y (a)This transformation highlights range [A,B] of gray levels and reduces all others to a
constant level (b) This transformation highlights range [A,B] but preserves all other levels.
(c) An image . (d) Result of using the transformation in (a).
Bit-Plane Slicing:
Instead of highlighting gray-level ranges, highlighting the contribution made to total
image appearance by specific bits might be desired. Suppose that each pixel in an image is
represented by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from
bit-plane 0 for the least significant bit to bit plane 7 for the most significant bit. In terms of 8-
bit bytes, plane 0 contains all the lowest order bits in the bytes comprising the pixels in the
image and plane 7 contains all the high-order bits.
Figure 3.12 illustrates these ideas, and Fig. 3.14 shows the various bit planes for the
image shown in Fig. 3.13. Note that the higher-order bits (especially the top four) contain the
majority of the visually significant data.The other bit planes contribute to more subtle details
in the image. Separating a digital image into its bit planes is useful for analyzing the relative
importance played by each bit of the image, a process that aids in determining the adequacy
of the number of bits used to quantize each pixel.
In terms of bit-plane extraction for an 8-bit image, it is not difficult to show that the
(binary) image for bit-plane 7 can be obtained by processing the input image with a
thresholding gray-level transformation function that (1) maps all levels in the image between
0 and 127 to one level (for example, 0); and (2) maps all levels between 128 and 255 to
another (for example, 255).The binary image for bit-plane 7 in Fig. 3.14 was obtained in just
this manner. It is left as an exercise
(Problem 3.3) to obtain the gray-level transformation functions that would yield the other bit
planes.
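Bit planes are easy to extract with NumPy, as in this sketch: bit plane k of an 8-bit image is obtained by masking bit k of every pixel.

import numpy as np

img = np.random.randint(0, 256, size=(128, 128), dtype=np.uint8)

# planes[k] is a binary image holding bit k of every pixel (k = 0 is the LSB, k = 7 the MSB)
planes = [(img >> k) & 1 for k in range(8)]

# Bit plane 7 is equivalent to thresholding the image at 128:
print(np.array_equal(planes[7], (img >= 128).astype(np.uint8)))   # True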
ADAPTIVE FILTER:
Adaptive filters are filters whose behavior changes based on statistical characteristics
of the image inside the filter region defined by the m X n rectangular window Sxy.
The simplest statistical measures of a random variable are its mean and variance.
These are reasonable parameters on which to base an adaptive filler because they are
quantities closely related to the appearance of an image. The mean gives a measure of
average gray level in the region over which the mean is computed, and the variance gives a
measure of average contrast in that region.
This filter is to operate on a local region, Sxy. The response of the filter at any point (x,
y) on which the region is centered is to be based on four quantities: (a) g(x, y), the value of
the noisy image at (x, y); (b) σ²η, the variance of the noise corrupting f(x, y) to form g(x, y);
(c) mL, the local mean of the pixels in Sxy; and (d) σ²L, the local variance of the pixels in Sxy.
The behavior of the filter to be as follows:
1. If σ²η is zero, the filter should return simply the value of g(x, y). This is the trivial, zero-noise
case in which g(x, y) is equal to f(x, y).
2. If the local variance is high relative to σ2η the filter should return a value close to g (x, y). A
high local variance typically is associated with edges, and these should be preserved.
3. If the two variances are equal, we want the filter to return the arithmetic mean value of the
pixels in Sxy. This condition occurs when the local area has the same properties as the overall
image, and local noise is to be reduced simply by averaging.
The only quantity that needs to be known or estimated is the variance of the overall noise,
σ²η. The other parameters are computed from the pixels in Sxy at each location (x, y) on
which the filter window is centered.
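The behavior described above corresponds to the standard adaptive, local noise-reduction filter f^(x,y) = g(x,y) - (σ²η / σ²L)[g(x,y) - mL]; that closed form is taken from the standard literature rather than stated explicitly in these notes. A sketch using SciPy's uniform_filter for the local statistics (the noise variance is assumed to be known):

import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_local_filter(g, noise_var, size=7):
    # adaptive local noise-reduction filter over a size x size window S_xy
    g = g.astype(float)
    local_mean = uniform_filter(g, size)                        # m_L
    local_var = uniform_filter(g * g, size) - local_mean ** 2   # sigma_L^2
    ratio = np.minimum(noise_var / np.maximum(local_var, 1e-12), 1.0)   # never let the ratio exceed 1
    return g - ratio * (g - local_mean)

f = np.random.rand(128, 128) * 255
g = f + np.random.normal(0, 10, f.shape)        # Gaussian noise of known variance 100
restored = adaptive_local_filter(g, noise_var=100.0)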
The median filter performs well as long as the spatial density of the impulse noise is not large
(as a rule of thumb, Pa and Pb less than 0.2). The adaptive median filtering can handle
impulse noise with probabilities even larger than these. An additional benefit of the adaptive
median filter is that it seeks to preserve detail while smoothing nonimpulse noise,
something that the "traditional" median filter does not do. The adaptive median filter also
works in a rectangular window area Sxy. Unlike those filters, however, the adaptive median
filter changes (increases) the size of Sxy during filter operation, depending on certain
conditions. The output of the filter is a single value used to replace the value of the pixel at
(x, y), the particular point on which the window Sxy is centered at a given time.
The algorithm works in two stages at each pixel (x, y), using zmin, zmax and zmed, the minimum,
maximum and median gray levels in Sxy, and zxy, the gray level at (x, y):
Stage A:  A1 = zmed - zmin ;  A2 = zmed - zmax
If A1 > 0 AND A2 < 0, go to Stage B; otherwise increase the window size. If the window size
exceeds Smax, output zmed.
Stage B:  B1 = zxy - zmin ;  B2 = zxy - zmax
If B1 > 0 AND B2 < 0, output zxy; otherwise output zmed.
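A direct (unoptimized) Python sketch of the adaptive median algorithm listed above; the maximum window size s_max is an example value.

import numpy as np

def adaptive_median(img, s_max=7):
    # adaptive median filter; the window grows up to s_max x s_max at each pixel
    img = img.astype(float)
    pad = s_max // 2
    padded = np.pad(img, pad, mode="edge")
    out = img.copy()
    rows, cols = img.shape
    for x in range(rows):
        for y in range(cols):
            size = 3
            while True:
                r = size // 2
                window = padded[x + pad - r: x + pad + r + 1,
                                y + pad - r: y + pad + r + 1]
                zmin, zmax, zmed = window.min(), window.max(), np.median(window)
                zxy = img[x, y]
                if zmin < zmed < zmax:                               # stage A: the median is not an impulse
                    out[x, y] = zxy if zmin < zxy < zmax else zmed   # stage B
                    break
                size += 2                                            # grow the window
                if size > s_max:
                    out[x, y] = zmed                                 # window limit reached
                    break
    return out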
Fig.. Spatial representation of typical (a) ideal (b) Butter-worth and (c) Gaussian
frequency domain high-pass filters, and corresponding intensity profiles through their centers.
We can expect IHPFs to have the same ringing properties as ILPFs. This is demonstrated
clearly in the figure, which consists of various IHPF results using the original image in Fig.(a) with
D0 set to 30, 60, and 160 pixels, respectively. The ringing in Fig. (a) is so severe that it
produced distorted, thickened object boundaries (e.g., look at the large letter "a"). Edges of the
top three circles do not show well because they are not as strong as the other edges in the
image (the intensity of these three objects is much closer to the background intensity, giving
discontinuities of smaller magnitude).
Where D0 is the cutoff frequency and D(u,v) is given by eq. As intended, the IHPF is
the opposite of the ILPF in the sense that it sets to zero all frequencies inside a circle of
radius D0 while passing, without attenuation, all frequencies outside the circle. As in
case of the ILPF, the IHPF is not physically realizable.
FILTERED RESULTS: IHPF:
Fig.. Results of high-pass filtering the image in Fig.(a) using an IHPF with D0 = 30,
60, and 160.
The situation improved somewhat with D0 = 60. Edge distortion is quite evident still,
but now we begin to see filtering on the smaller objects. Due to the now familiar inverse
relationship between the frequency and spatial domains, we know that the spot size of this
filter is smaller than the spot of the filter with D0 = 30. The result for D0 = 160 is closer to
what a high-pass filtered image should look like. Here, the edges are much cleaner and less
distorted, and the smaller objects have been filtered properly.
Of course, the constant background in all images is zero in these high-pass filtered images
because highpass filtering is analogous to differentiation in the spatial domain.
H(u,v) = 1, if D(u,v) ≤ D0
         0, if D(u,v) > D0
Where D0 is a positive constant and D(u,v) is the distance between a point (u,v) in the
frequency domain and the center of the frequency rectangle; that is
Fig: ideal low pass filter 3-D view and 2-D view and line graph.
Fig: (a) Test pattern of size 688x688 pixels (b) its Fourier spectrum
Fig: (a) original image, (b)-(f) Results of filtering using ILPFs with cutoff frequencies
set at radii values 10, 30, 60, 160 and 460, as shown in fig.2.2.2(b). The power removed by
these filters was 13, 6.9, 4.3, 2.2 and 0.8% of the total, respectively.
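The filtering procedure itself (forward transform, multiplication by H(u,v), inverse transform) can be sketched as follows for the ideal low-pass filter defined above; the cutoff D0 = 30 is just an example value.

import numpy as np

def ideal_lowpass(shape, d0):
    # ideal low-pass transfer function: 1 inside radius d0, 0 outside
    rows, cols = shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)      # distance from the center of the rectangle
    return (D <= d0).astype(float)

img = np.random.rand(256, 256)
F = np.fft.fftshift(np.fft.fft2(img))                   # centered spectrum
H = ideal_lowpass(img.shape, d0=30)
g = np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))      # filtered (smoothed) image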
BUTTERWORTH LOW-PASS FILTERS (BLPF):
The BLPF transfer function does not have a sharp discontinuity establishing a cutoff between
passed and filtered frequencies.
The cutoff frequency D0 defines the point at which H(u,v) = 0.5.
Fig. (a) perspective plot of a Butterworth lowpass-filter transfer function. (b) Filter displayed as an
image. (c)Filter radial cross sections of order 1 through 4.
Unlike the ILPF, the BLPF transfer function does not have a sharp discontinuity that
gives a clear cutoff between passed and filtered frequencies.
BUTTERWORTH LOW-PASS FILTERS OF DIFFERENT FREQUENCIES:
Fig. (a) Original image.(b)-(f) Results of filtering using BLPFs of order 2, with cutoff
frequencies at the radii
Fig. shows the results of applying the BLPF of the equation above to fig.(a), with n = 2 and D0 equal to
the five radii used in fig.(b) for the ILPF. We note here a smooth transition in blurring as a function
of increasing cutoff frequency. Moreover, no ringing is visible in any of the images
processed with this particular BLPF, a fact attributed to the filter's smooth transition
between low and high frequencies.
A BLPF of order 1 has no ringing in the spatial domain. Ringing generally is
imperceptible in filters of order 2, but can become significant in filters of higher order.
Fig.(a) Original image. (b)-(f) Results of filtering using GLPFs with cutoff
frequencies at the radii shown in fig.2.2.2. compare with fig.2.2.3 and fig.2.2.6
Fig. (a) Original image (784x 732 pixels). (b) Result of filtering using a GLPF with
D0 = 100. (c) Result of filtering using a GLPF with D0 = 80. Note the reduction in fine skin
lines in the magnified sections in (b) and (c).
Fig. shows an application of lowpass filtering for producing a smoother, softer-
looking result from a sharp original. For human faces, the typical objective is to reduce the
sharpness of fine skin lines and small blemishes.
Fig: Top row: Perspective plot, image representation, and cross section of a typical
ideal high-pass filter. Middle and bottom rows: The same sequence for typical butter-worth
and Gaussian high-pass filters.
Where D(u,v) is given by Eq.(3). This expression follows directly from Eqs.(3) and (6). The
middle row of Fig.2.2.11. shows an image and cross section of the BHPF function.
Butterworth high-pass filters behave more smoothly than IHPFs. Fig.2.2.14. shows the
performance of a BHPF of order 2 and with D0 set to the same values as in Fig.2.2.13. The
boundaries are much less distorted than in Fig.2.2.13. even for the smallest value of cutoff
frequency.
Fig. Results of high-pass filtering the image in Fig.2.2.2(a) using a BHPF of order 2
with D0 = 30, 60, and 160 corresponding to the circles in Fig.2.2.2(b). These results are much
smoother than those obtained with an IHPF.
GAUSSIAN HIGH-PASS FILTERS:
The transfer function of the Gaussian high-pass filter (GHPF) with cutoff frequency
locus at a distance D0 from the center of the frequency rectangle is given by
H(u,v) = 1 - e^(-D²(u,v) / 2D0²)
Where D(u,v) is given by Eq.(4). This expression follows directly from Eqs.(2) and
(6). The third row in Fig.2.2.11. shows a perspective plot, image and cross section of the
GHPF function. Following the same format as for the BHPF, we show in Fig.2.2.15.
comparable results using GHPFs. As expected, the results obtained are more gradual than
with the previous two filters.
FILTERED RESULTS:GHPF:
Fig. Results of high-pass filtering the image in fig.(a) using a GHPF with D0 = 30, 60
and 160, corresponding to the circles in Fig.(b).
UNIT-3
IMAGE RESTORATION
IMAGE RESTORATION:
Restoration improves image in some predefined sense. It is an objective process.
Restoration attempts to reconstruct an image that has been degraded by using a priori
knowledge of the degradation phenomenon. These techniques are oriented toward
modeling the degradation and then applying the inverse process in order to recover the
original image. Restoration techniques are based on mathematical or probabilistic
models of image processing. Enhancement, on the other hand is based on human
subjective preferences regarding what constitutes a "good" enhancement result. Image
Restoration refers to a class of methods that aim to remove or reduce the degradations
that have occurred while the digital image was being obtained. All natural images when
displayed have gone through some sort of degradation:
During display mode
Acquisition mode, or
Processing mode
Sensor noise
Blur due to camera misfocus
Relative object-camera motion
Random atmospheric turbulence
Others
Degradation Model:
Degradation process operates on a degradation function that operates on an input
image with an additive noise term. The input image is represented using the notation
f(x,y), and the noise term can be represented as η(x,y). These two terms, when combined, give
the result g(x,y). If we are given g(x,y), some knowledge about the degradation
function H, and some knowledge about the additive noise term η(x,y), the objective
of restoration is to obtain an estimate f'(x,y) of the original image. We want the estimate
to be as close as possible to the original image. The more we know about H and η, the
closer f'(x,y) will be to f(x,y). If the degradation is a linear, position-invariant process, then the degraded
image is given in the spatial domain by
g(x,y)=f(x,y)*h(x,y)+η(x,y)
Gaussian Noise:
The PDF of a Gaussian random variable z is given by
p(z) = (1 / (√(2π) σ)) e^(-(z - μ)² / 2σ²)
where z represents the gray level, μ is the mean (average) value of z, and σ is its standard deviation.
Rayleigh Noise:
Unlike the Gaussian distribution, the Rayleigh distribution is not symmetric. It is given by
the formula.
(iii) Gamma (Erlang) Noise:
Its shape is similar to the Rayleigh distribution. The equation is referred to as the gamma density;
it is strictly correct only when the denominator is the gamma function.
(iv) Exponential Noise:
Exponential distribution has an exponential shape. The PDF of exponential noise is given as
Where a>0. The mean and variance of this density are given by
Impulse (Salt-and-Pepper) Noise:
If b > a, gray level b will appear as a light dot in the image, while level a will appear as a dark dot.
Mean Filters:
(a) Arithmetic mean filter:
It is the simplest mean filter. Let Sxy represents the set of coordinates in the sub
image of size m*n centered at point (x,y). The arithmetic mean filter computes the
average value of the corrupted image g(x,y) in the area defined by Sxy. The value of the
restored image f at any point (x,y) is the arithmetic mean computed using the pixels in
the region defined by Sxy.
This operation can be implemented using a convolution mask in which all coefficients have
value 1/mn. A mean filter smoothes local variations in an image, and noise is reduced as a result
of blurring. For every pixel in the image, the pixel value is replaced by the mean value
of its neighboring pixels; this results in a smoothing effect in the image.
(b) Geometric Mean filter:
An image restored using a geometric mean filter is given by the expression
Here, each restored pixel is given by the product of the pixels in the subimage window,
raised to the power 1/mn. A geometric mean filter achieves smoothing comparable to the
arithmetic mean filter, but it tends to lose less image detail in the process.
(c) Harmonic Mean filter:
The harmonic mean filtering operation is given by the expression
The harmonic mean filter works well for salt noise but fails for pepper noise. It does
well with Gaussian noise also.
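Sketches of the arithmetic and geometric mean filters over an m x n window, using SciPy's uniform_filter for the local averages (the small offset guards the logarithm against zero-valued pixels):

import numpy as np
from scipy.ndimage import uniform_filter

def arithmetic_mean(g, size=3):
    # replace each pixel by the average of its size x size neighborhood
    return uniform_filter(g.astype(float), size)

def geometric_mean(g, size=3, eps=1e-6):
    # product of the window pixels raised to the power 1/(m*n), computed via logarithms
    log_mean = uniform_filter(np.log(g.astype(float) + eps), size)
    return np.exp(log_mean)

g = np.random.rand(128, 128) * 255
smoothed_a = arithmetic_mean(g, 3)
smoothed_g = geometric_mean(g, 3)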
(d) Order statistics filter:
Order statistics filters are spatial filters whose response is based on ordering the pixel
contained in the image area encompassed by the filter. The response of the filter at any
point is determined by the ranking result.
The best-known example in this category is the median filter, which replaces the value of a pixel by
the median of the gray levels in its neighborhood (the original value of the pixel is included in the
computation of the median). Median filters are quite popular because, for certain types of random noise,
they provide excellent noise-reduction capabilities with considerably less blurring than linear smoothing
filters of similar size. These are effective for bipolar and unipolar impulse noise.
(e) Max and min filters:
Using the 100th percentile of a ranked set of numbers gives the max filter, which is given
by the equation
It is used for finding the brightest point in an image. Pepper noise in the image has very
low values; it is reduced by the max filter through the max selection process in the subimage
area Sxy. The 0th percentile filter is the min filter.
This filter is useful for finding the darkest point in an image. Also, it reduces salt noise
as a result of the min operation.
(f) Midpoint filter:
The midpoint filter simply computes the midpoint between the maximum and minimum
values in the area encompassed by the filter.
It combines order statistics and averaging. This filter works best for randomly
distributed noise like Gaussian or uniform noise.
Periodic Noise by Frequency domain filtering:
These types of filters are used for this purpose-
Band Reject Filters:
It removes a band of frequencies about the origin of the Fourier transform.
D(u,v)- the distance from the origin of the centered frequency rectangle.
W- the width of the band
Do- the radial center of the frequency rectangle.
Butterworth Band reject Filter:
These filters are mostly used when the location of the noise components in the frequency
domain is known. Sinusoidal noise can be easily removed by using these kinds of
filters because it shows up as two impulses that are mirror images of each other about the
origin of the frequency transform.
These filters cannot be applied directly on an image because it may remove too much details
of an image but these are effective in isolating the effect of an image of selected frequency
bands.
Notch Filters:
A notch filter rejects (or passes) frequencies in predefined neighborhoods about a
center frequency.
Due to the symmetry of the Fourier transform notch filters must appear in symmetric
pairs about the origin.
The transfer function of an ideal notch reject filter of radius D0, with centers at (u0, v0)
and, by symmetry, at (-u0, -v0), is
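The formula itself did not survive in these notes; for an ideal notch reject filter it has the standard form

H(u,v) = 0, if D1(u,v) ≤ D0 or D2(u,v) ≤ D0
         1, otherwise

where D1(u,v) and D2(u,v) are the distances from (u,v) to the notch centers (u0, v0) and (-u0, -v0), respectively, measured with respect to the center of the frequency rectangle.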
Inverse Filtering:
The simplest approach to restoration is direct inverse filtering, where we compute an estimate
F̂(u,v) of the transform of the original image simply by dividing the transform of the degraded
image G(u,v) by the degradation function H(u,v):
F̂(u,v) = G(u,v) / H(u,v)
We know that G(u,v) = F(u,v) H(u,v) + N(u,v); therefore
F̂(u,v) = F(u,v) + N(u,v) / H(u,v)
From the above equation we observe that we cannot recover the undegraded image
exactly because N(u,v) is a random function whose Fourier transform is not known. Moreover,
if H(u,v) has zero or very small values, the ratio N(u,v)/H(u,v) can easily dominate the estimate.
One approach to get around the zero or small-value problem is to limit the filter
frequencies to values near the origin. We know that H(0,0) is equal to the average
value of h(x,y). By limiting the analysis to frequencies near the origin we reduce the
probability of encountering zero values.
The inverse filtering approach therefore has poor performance in the presence of noise. The Wiener
filtering approach incorporates both the degradation function and the statistical characteristics of noise
into the restoration process.
The objective is to find an estimate f̂ of the uncorrupted image f such that the mean square
error between them is minimized.
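A frequency-domain sketch of the Wiener (minimum mean square error) filter, using the common simplification of replacing the unknown power-spectrum ratio by a constant K; the blur kernel and the value of K below are illustrative assumptions.

import numpy as np

def wiener_deconvolve(g, H, K=0.01):
    # Wiener restoration: F_hat = conj(H) / (|H|^2 + K) * G
    G = np.fft.fft2(g)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + K) * G
    return np.real(np.fft.ifft2(F_hat))

# Illustrative degradation: blur with a known transfer function H, plus additive noise
f = np.random.rand(128, 128)
h = np.zeros_like(f)
h[:5, :5] = 1.0 / 25.0                                  # 5 x 5 uniform blur kernel
H = np.fft.fft2(h)
g = np.real(np.fft.ifft2(np.fft.fft2(f) * H)) + np.random.normal(0, 0.01, f.shape)

restored = wiener_deconvolve(g, H, K=0.01)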
Constrained Least Squares Restoration:
The degradation model in vector-matrix form is g = Hf + η.
The optimality criterion for restoration is based on a measure of smoothness, such as the
second derivative of an image (the Laplacian). We seek the minimum of a criterion function C defined as
C = Σx Σy [∇²f(x,y)]²
subject to the constraint ||g - Hf̂||² = ||η||². The frequency-domain solution of this constrained
optimization problem is
F̂(u,v) = [ H*(u,v) / ( |H(u,v)|² + γ |P(u,v)|² ) ] G(u,v)
where γ is a parameter that must be adjusted so that the constraint is satisfied, and P(u,v) is
the Fourier transform of the Laplacian operator.
In iterative restoration, the restoration error is written as
e² = || g - Hf̂ ||² = ( g - Hf̂ )ᵀ ( g - Hf̂ )
The partial derivative of this error metric with respect to the image estimate gives the iterative update,
starting from the initial estimate
f̂0 = β Hᵀ g
where β > 0 is the step size. If β is constant and no constraints are imposed, then the result after N
iterations is equivalent to a single filtering step, which can be represented in the Fourier domain as
F̂N(k,l) = [ 1 - ( 1 - β |H(k,l)|² )^(N+1) ] G(k,l) / H(k,l)
However, if a positivity constraint is imposed on the solution after each iteration, then the
algorithm becomes non-linear. Alternatively, a steepest descent approach can be taken to
optimize the step length, which is given by
βk = ( pkᵀ pk ) / ( pkᵀ HᵀH pk )
where pk = Hᵀ( g - Hf̂k ) is the gradient direction at iteration k.
UNIT-4
IMAGE SEGMENTATION
First-order derivatives of a digital image are based on various approximations of the 2-D
gradient. The gradient of an image f(x, y) at location (x, y) is defined as the vector
∇f = [Gx, Gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ
It is well known from vector analysis that the gradient vector points in the direction of
maximum rate of change of f at coordinates (x, y). An important quantity in edge detection is
the magnitude of this vector, denoted by ∇f, where
∇f = mag(∇f) = [Gx² + Gy²]^1/2
This quantity gives the maximum rate of increase of f(x, y) per unit distance in the direction
of ∇f. It is a common (although not strictly correct) practice to refer to ∇f also as the
gradient. The direction of the gradient vector also is an important quantity. Let α(x, y)
represent the direction angle of the vector ∇f at (x, y). Then, from vector analysis,
α(x, y) = tan⁻¹( Gy / Gx )
where the angle is measured with respect to the x-axis. The direction of an edge at (x, y) is perpendicular to the direction of the gradient vector at that point. Computation of the gradient of an image is based on obtaining the partial derivatives ∂f/∂x and ∂f/∂y at every pixel location. Let the 3x3 area shown in Fig. 1.1(a) represent the gray levels in a neighborhood of an image. One of the simplest ways to implement a first-order partial derivative at point z5 is to use the following Roberts cross-gradient operators:
Gx = z9 - z5 and Gy = z8 - z6
These derivatives can be implemented for an entire image by using the masks shown in Fig. 1.1(b). Masks of size 2 X 2 are awkward to implement because they do not have a clear center. An approach using masks of size 3 X 3 is given by
Gx = (z7 + z8 + z9) - (z1 + z2 + z3) and Gy = (z3 + z6 + z9) - (z1 + z4 + z7)
which approximate the derivatives by the difference between the third and first rows, and between the third and first columns, of the 3 X 3 region.
Fig.1.1 A 3 X 3 region of an image (the z‘s are gray-level values) and various masks used to
compute the gradient at point labeled z5.
A weight value of 2 is used to achieve some smoothing by giving more importance to the center point:
Gx = (z7 + 2z8 + z9) - (z1 + 2z2 + z3) and Gy = (z3 + 2z6 + z9) - (z1 + 2z4 + z7)
The masks in Figures 1.1(f) and (g), called the Sobel operators, are used to implement these two equations. The Prewitt and Sobel operators are among the most used in practice for
computing digital gradients. The Prewitt masks are simpler to implement than the Sobel
masks, but the latter have slightly superior noise-suppression characteristics, an important
issue when dealing with derivatives. Note that the coefficients in all the masks shown in Fig.
1.1 sum to 0, indicating that they give a response of 0 in areas of constant gray level, as
expected of a derivative operator.
The masks just discussed are used to obtain the gradient components Gx and Gy. Computation of the gradient magnitude requires that these two components be combined as ∇f = [Gx² + Gy²]^(1/2). However, this implementation is not always desirable because of the computational burden required by squares and square roots. An approach used frequently is to approximate the gradient by absolute values:
∇f ≈ |Gx| + |Gy|
This equation is much more attractive computationally, and it still preserves relative changes in gray levels. The price paid is that the resulting gradient is, in general, no longer isotropic (rotation invariant). However, this is not an issue when masks such as the Prewitt and Sobel masks are used to compute Gx and Gy.
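For illustration, a small NumPy/SciPy sketch (not from the text) that computes Gx and Gy with the Sobel masks and forms the absolute-value approximation of the gradient:

import numpy as np
from scipy.ndimage import convolve

def sobel_gradient(img):
    # Sobel masks: Gx differences the third and first rows, Gy the third and first columns.
    gx_mask = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
    gy_mask = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = convolve(img.astype(float), gx_mask)
    gy = convolve(img.astype(float), gy_mask)
    magnitude = np.abs(gx) + np.abs(gy)          # |Gx| + |Gy| approximation
    direction = np.arctan2(gy, gx)               # gradient direction angle
    return magnitude, direction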
It is possible to modify the 3 X 3 masks in Fig. 1.1 so that they have their strongest responses
along the diagonal directions. The two additional Prewitt and Sobel masks for detecting
discontinuities in the diagonal directions are shown in Fig. 1.2.
The Laplacian:
For a 3 X 3 region, one of the two forms encountered most frequently in practice is
∇²f = 4z5 - (z2 + z4 + z6 + z8)
where the z's are defined in Fig. 1.1(a). A digital approximation including the diagonal neighbors is given by
∇²f = 8z5 - (z1 + z2 + z3 + z4 + z6 + z7 + z8 + z9)
Masks for implementing these two equations are shown in Fig. 1.3. We note from these
masks that the implementations of Eqns. are isotropic for rotation increments of 90° and 45°,
respectively.
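A short sketch (illustrative only) of the two 3 X 3 Laplacian masks just described, applied by convolution:

import numpy as np
from scipy.ndimage import convolve

LAPLACIAN_4 = np.array([[ 0, -1,  0],
                        [-1,  4, -1],
                        [ 0, -1,  0]], dtype=float)   # 4-neighbour form
LAPLACIAN_8 = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=float)   # includes diagonal neighbours

def laplacian(img, diagonal=False):
    # Digital Laplacian of a grayscale image using one of the two masks above.
    mask = LAPLACIAN_8 if diagonal else LAPLACIAN_4
    return convolve(img.astype(float), mask)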
EDGE DETECTION:
Intuitively, an edge is a set of connected pixels that lie on the boundary between two
regions. Fundamentally, an edge is a "local" concept whereas a region boundary, owing to the
way it is defined, is a more global idea. A reasonable definition of "edge" requires the ability to
measure gray-level transitions in a meaningful way. We start by modeling an edge intuitively.
This will lead us to formalism in which "meaningful" transitions in gray levels can be
measured. Intuitively, an ideal edge has the properties of the model shown in Fig. 2.1(a). An
ideal edge according to this model is a set of connected pixels (in the vertical direction here),
each of which is located at an orthogonal step transition in gray level (as shown by the
horizontal profile in the figure).
In practice, optics, sampling, and other image acquisition imperfections yield edges
that are blurred, with the degree of blurring being determined by factors such as the quality of
the image acquisition system, the sampling rate, and illumination conditions under which the
image is acquired. As a result, edges are more closely modeled as having a "ramp like" profile,
such as the one shown in Fig.2.1 (b).
Fig.2.1 (a) Model of an ideal digital edge (b) Model of a ramp edge.
The slope of the ramp is inversely proportional to the degree of blurring in the edge. In this
model, we no longer have a thin (one pixel thick) path. Instead, an edge point now is any point
contained in the ramp, and an edge would then be a set of such points that are connected. The
"thickness" of the edge is determined by the length of the ramp, as it transitions from an initial
to a final gray level. This length is determined by the slope, which, in turn, is determined by
the degree of blurring. This makes sense: blurred edges tend to be thick and sharp edges tend
to be thin. Figure 2.2(a) shows the image from which the close-up in Fig. 2.1(b) was extracted.
Figure 2.2(b) shows a horizontal gray-level profile of the edge between the two regions. This
figure also shows the first and second derivatives of the gray-level profile. The first derivative
is positive at the points of transition into and out of the ramp as we move from left to right
along the profile; it is constant for points in the ramp; and is zero in areas of constant gray
level. The second derivative is positive at the transition associated with the dark side of the
edge, negative at the transition associated with the light side of the edge, and zero along the
ramp and in areas of constant gray level. The signs of the derivatives in Fig. 2.2(b) would be
reversed for an edge that transitions from light to dark.
We conclude from these observations that the magnitude of the first derivative can be used to
detect the presence of an edge at a point in an image (i.e. to determine if a point is on a ramp).
Similarly, the sign of the second derivative can be used to determine whether an edge pixel
lies on the dark or light side of an edge. We note two additional properties of the second
derivative around an edge: A) It produces two values for every edge in an image (an
undesirable feature); and B) an imaginary straight line joining the extreme positive and
negative values of the second derivative would cross zero near the midpoint of the edge. This
zero-crossing property of the second derivative is quite useful for locating the centers of thick
edges.
Fig.2.2 (a) Two regions separated by a vertical edge (b) Detail near the edge, showing a
gray- level profile, and the first and second derivatives of the profile.
One of the simplest approaches for linking edge points is to analyze the characteristics of
pixels in a small neighborhood (say, 3 X 3 or 5 X 5) about every point (x, y) in an image that
has been labeled an edge point. All points that are similar according to a set of predefined
criteria are linked, forming an edge of pixels that share those criteria.
The two principal properties used for establishing similarity of edge pixels in this kind of
analysis are (1) the strength of the response of the gradient operator used to produce the
edge
pixel; and (2) the direction of the gradient vector. The first property is given by the value of ∇f.
Thus an edge pixel with coordinates (x0, y0) in a predefined neighborhood of (x, y) is similar in magnitude to the pixel at (x, y) if
|∇f(x, y) - ∇f(x0, y0)| ≤ E
where E is a nonnegative magnitude threshold. The direction (angle) of the gradient vector is given by the expression for α(x, y). An edge pixel at (x0, y0) in the predefined neighborhood of (x, y) has an angle similar to the pixel at (x, y) if
|α(x, y) - α(x0, y0)| < A
where A is a nonnegative angle threshold.
A point in the predefined neighborhood of (x, y) is linked to the pixel at (x, y) if both
magnitude and direction criteria are satisfied. This process is repeated at every location in the
image. A record must be kept of linked points as the center of the neighborhood is moved
from pixel to pixel. A simple bookkeeping procedure is to assign a different gray level to each
set of linked edge pixels.
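The local linking rule can be sketched as follows (an illustrative implementation, not the text's own: mag and angle are the gradient magnitude and direction images, E and A are the magnitude and angle thresholds named above, and a simple flood fill assigns a distinct label to each linked set):

import numpy as np

def link_edges(mag, angle, mag_thresh, E, A, win=1):
    # A neighbour (x0, y0) of an edge pixel (x, y) is linked to it when
    # |mag(x,y) - mag(x0,y0)| <= E and |angle(x,y) - angle(x0,y0)| <= A.
    edges = mag > mag_thresh
    labels = np.zeros(mag.shape, dtype=int)
    rows, cols = mag.shape
    next_label = 0
    for x in range(rows):
        for y in range(cols):
            if not edges[x, y] or labels[x, y]:
                continue
            next_label += 1
            labels[x, y] = next_label
            stack = [(x, y)]
            while stack:                      # grow the linked set from (x, y)
                i, j = stack.pop()
                for di in range(-win, win + 1):
                    for dj in range(-win, win + 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and edges[ni, nj] and not labels[ni, nj]
                                and abs(mag[i, j] - mag[ni, nj]) <= E
                                and abs(angle[i, j] - angle[ni, nj]) <= A):
                            labels[ni, nj] = next_label
                            stack.append((ni, nj))
    return labels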
In this process, points are linked by determining first if they lie on a curve of specified shape.
We now consider global relationships between pixels. Given n points in an image, suppose
that we want to find subsets of these points that lie on straight lines. One possible solution is to
first find all lines determined by every pair of points and then find all subsets of points that are
close to particular lines. The problem with this procedure is that it involves finding n(n - 1)/2 ≈ n² lines and then performing n · n(n - 1)/2 ≈ n³ comparisons of every point to all lines. This approach is computationally prohibitive in all but the most trivial applications.
Fig.3.2 Subdivision of the parameter plane for use in the Hough transform
The computational attractiveness of the Hough transform arises from subdividing the parameter space into so-called accumulator cells, as illustrated in Fig. 3.2, where (amax, amin) and (bmax, bmin) are the expected ranges of slope and intercept values. The cell at coordinates (i, j), with accumulator value A(i, j), corresponds to the square associated with parameter-space coordinates (ai, bj).
Initially, these cells are set to zero. Then, for every point (xk, yk) in the image plane, we let the parameter a equal each of the allowed subdivision values on the a-axis and solve for the corresponding b using the equation b = -xk a + yk. The resulting b's are then rounded off to the nearest allowed value on the b-axis. If a choice of ap results in solution bq, we let A(p, q) = A(p, q) + 1. At the end of this procedure, a value of Q in A(i, j) corresponds to Q points in the xy-plane lying on the line y = ai x + bj. The number of subdivisions in the ab-plane determines the accuracy of the collinearity of these points. Note that subdividing the a-axis into K increments
gives, for every point (xk, yk), K values of b corresponding to the K possible values of a. With
n image points, this method involves nK computations. Thus the procedure just discussed is
linear in n, and the product nK does not approach the number of computations discussed at the
beginning unless K approaches or exceeds n.
A problem with using the equation y = ax + b to represent a line is that the slope
approaches infinity as the line approaches the vertical. One way around this difficulty is to
use the normal representation of a line:
x cosθ + y sinθ = ρ
Figure 3.3(a) illustrates the geometrical interpretation of the parameters used. The use of this
representation in constructing a table of accumulators is identical to the method discussed for
the slope-intercept representation. Instead of straight lines, however, the loci are sinusoidal
curves in the ρθ-plane. As before, Q collinear points lying on a line x cosθj + y sinθj = ρi yield Q sinusoidal curves that intersect at (ρi, θj) in the parameter space. Incrementing θ and solving for the corresponding ρ gives Q entries in accumulator A(i, j) associated with the cell determined by (ρi, θj). Figure 3.3(b) illustrates the subdivision of the parameter space.
Fig.3.3 (a) Normal representation of a line (b) Subdivision of the ρθ-plane into cells
The range of angle θ is ±90°, measured with respect to the x-axis. Thus, with reference to Fig. 3.3(a), a horizontal line has θ = 0°, with ρ being equal to the positive x-intercept. Similarly, a vertical line has θ = 90°, with ρ being equal to the positive y-intercept, or θ = -90°, with ρ being equal to the negative y-intercept.
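A compact sketch of the accumulation step using the normal representation (illustrative only; edge_img is assumed to be a binary edge map, and the numbers of cells are example values):

import numpy as np

def hough_lines(edge_img, n_theta=180, n_rho=200):
    # Vote in (rho, theta) space for every edge pixel using rho = x cos(theta) + y sin(theta);
    # theta spans [-90, 90) degrees and rho spans [-D, D], D being the image diagonal.
    rows, cols = edge_img.shape
    thetas = np.deg2rad(np.linspace(-90, 90, n_theta, endpoint=False))
    diag = np.hypot(rows, cols)
    rhos = np.linspace(-diag, diag, n_rho)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    ys, xs = np.nonzero(edge_img)                    # coordinates of edge points
    for x, y in zip(xs, ys):
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rho + diag) / (2 * diag) * (n_rho - 1)).astype(int)
        for t, r in enumerate(idx):
            acc[r, t] += 1
    return acc, rhos, thetas

# A value Q in acc[i, j] means Q edge points vote for the line (rhos[i], thetas[j]).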
In this process we have a global approach for edge detection and linking based on representing
edge segments in the form of a graph and searching the graph for low-cost paths that
correspond to significant edges. This representation provides a rugged approach that performs
well in the presence of noise.
We begin the development with some basic definitions. A graph G = (N,U) is a finite,
nonempty set of nodes N, together with a set U of unordered pairs of distinct elements of N.
Each pair (ni, nj) of U is called an arc. A graph in which the arcs are directed is called a
directed graph. If an arc is directed from node ni to node nj, then nj is said to be a successor of
the parent node ni. The process of identifying the successors of a node is called expansion of
the node. In each graph we define levels, such that level 0 consists of a single node, called the
start or root node, and the nodes in the last level are called goal nodes. A cost c (ni, nj) can be
associated with every arc (ni, nj). A sequence of nodes n1, n2... nk, with each node ni being a
successor of node ni-1, is called a path from n1 to nk. The cost of the entire path is the sum of the costs of its arcs:
c = c(n1, n2) + c(n2, n3) + ... + c(nk-1, nk)
The following discussion is simplified if we define an edge element as the boundary between
two pixels p and q, such that p and q are 4-neighbors, as Fig.3.4 illustrates. Edge elements are
identified by the xy-coordinates of points p and q. In other words, the edge element in Fig. 3.4
is defined by the pairs (xp, yp) (xq, yq). Consistent with the definition an edge is a sequence of
connected edge elements.
We can illustrate how the concepts just discussed apply to edge detection using
the 3 X 3 image shown in Fig. 3.5 (a). The outer numbers are pixel
Fig.3.5 (a) A 3 X 3 image region, (b) Edge segments and their costs, (c) Edge
corresponding to the lowest-cost path in the graph shown in Fig. 3.6
coordinates and the numbers in brackets represent gray-level values. Each edge element, defined by pixels p and q, has an associated cost, defined as
c(p, q) = H - [f(p) - f(q)]
where H is the highest gray-level value in the image (7 in this case), and f(p) and f(q) are the
gray- level values of p and q, respectively. By convention, the point p is on the right-hand side
of the direction of travel along edge elements. For example, the edge segment (1, 2) (2, 2) is
between points (1, 2) and (2, 2) in Fig. 3.5 (b). If the direction of travel is to the right, then p is
the point with coordinates (2, 2) and q is the point with coordinates (1, 2); therefore, c(p, q) = 7 - [7 - 6] = 6. This cost is shown in the box below the edge segment. If, on the other hand, we are traveling to the left between the same two points, then p is point (1, 2) and q is (2, 2). In this case the cost is c(p, q) = 7 - [6 - 7] = 8, as shown above the edge segment in Fig. 3.5(b). To simplify the discussion,
we assume that edges start in the top row and terminate in the last row, so that the first element
of an edge can be only between points (1, 1), (1, 2) or (1, 2), (1, 3). Similarly, the last edge
element has
to be between points (3, 1), (3, 2) or (3, 2), (3, 3). Keep in mind that p and q are 4-neighbors,
as noted earlier. Figure 3.6 shows the graph for this problem. Each node (rectangle) in the
graph corresponds to an edge element from Fig. 3.5. An arc exists between two nodes if the
two corresponding edge elements taken in succession can be part of an edge.
Fig. 3.6 Graph for the image in Fig.3.5 (a). The lowest-cost path is shown dashed.
As in Fig. 3.5 (b), the cost of each edge segment, is shown in a box on the side of the arc
leading into the corresponding node. Goal nodes are shown shaded. The minimum cost path is
shown dashed, and the edge corresponding to this path is shown in Fig. 3.5 (c).
THRESHOLDING:
Because of its intuitive properties and simplicity of implementation, image thresholding enjoys
a central position in applications of image segmentation.
Global Thresholding:
The simplest of all thresholding techniques is to partition the image histogram by using a
single global threshold, T. Segmentation is then accomplished by scanning the image pixel by
pixel and labeling each pixel as object or back-ground, depending on whether the gray level of
that pixel is greater or less than the value of T. As indicated earlier, the success of this method
depends entirely on how well the histogram can be partitioned.
Fig.4.1 (a) Original image, (b) Image histogram, (c) Result of global thresholding with T midway between the maximum and minimum gray levels.
Figure 4.1(a) shows a simple image, and Fig. 4.1(b) shows its histogram. Figure 4.1(c)
shows the result of segmenting Fig. 4.1(a) by using a threshold T midway between the
maximum and minimum gray levels. This threshold achieved a "clean" segmentation by
eliminating the shadows and leaving only the objects themselves. The objects of interest in this case are darker than the background, so any pixel with a gray level ≤ T was labeled black (0), and any pixel with a gray level > T was labeled white (255). The key objective is merely
to generate a binary image, so the black-white relationship could be reversed. The type of
global thresholding just described can be expected to be successful in highly controlled
environments. One of the areas in which this often is possible is in industrial inspection
applications, where control of the illumination usually is feasible.
The following algorithm can be used to obtain T automatically:
1. Select an initial estimate for T.
2. Segment the image using T. This will produce two groups of pixels: G1, consisting of all pixels with gray level values > T, and G2, consisting of pixels with values ≤ T.
3. Compute the average gray level values µ1 and µ2 for the pixels in regions G1 and G2.
4. Compute a new threshold value: T = (µ1 + µ2) / 2.
5. Repeat steps 2 through 4 until the difference in T in successive iterations is smaller than a predefined parameter T0.
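A direct sketch of this iteration in NumPy (illustrative only; the initial estimate here is the image mean, which, as noted below, suits the case of comparable object and background areas):

import numpy as np

def iterative_threshold(img, t0=0.5):
    # Steps 1-5 above: split at T, recompute T from the two group means, repeat.
    T = img.mean()                       # step 1: initial estimate
    while True:
        g1 = img[img > T]                # step 2: pixels with values > T
        g2 = img[img <= T]               #         pixels with values <= T
        if g1.size == 0 or g2.size == 0:
            return T
        T_new = 0.5 * (g1.mean() + g2.mean())   # steps 3-4
        if abs(T_new - T) < t0:          # step 5: stop when the change is small
            return T_new
        T = T_new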
When there is reason to believe that the background and object occupy comparable areas in the
image, a good initial value for T is the average gray level of the image. When objects are small
compared to the area occupied by the background (or vice versa), then one group of pixels will
dominate the histogram and the average gray level is not as good an initial choice. A more
appropriate initial value for T in cases such as this is a value midway between the maximum
and minimum gray levels. The parameter T0 is used to stop the algorithm after changes in T become small in terms of this parameter. This is useful when speed of iteration is an important issue.
Imaging factors such as uneven illumination can transform a perfectly segmentable histogram
into a histogram that cannot be partitioned effectively by a single global threshold. An
approach for handling such a situation is to divide the original image into subimages and then
utilize a different threshold to segment each subimage. The key issues in this approach are
how to subdivide the image and how to estimate the threshold for each resulting subimage.
Since the threshold used for each pixel depends on the location of the pixel in terms of the
subimages, this type of thresholding is adaptive.
Fig.5 (a) Original image, (b) Result of global thresholding. (c) Image subdivided into
individual subimages (d) Result of adaptive thresholding.
We illustrate adaptive thresholding with an example. Figure 5(a) shows the image, which we
concluded could not be thresholded effectively with a single global threshold. In fact, Fig. 5(b)
shows the result of thresholding the image with a global threshold manually placed in the
valley of its histogram. One approach to reduce the effect of nonuniform illumination is to
subdivide the image into smaller subimages, such that the illumination of each subimage is
approximately uniform. Figure 5(c) shows such a partition, obtained by subdividing the image
into four equal parts, and then subdividing each part by four again. All the subimages that did
not contain a boundary between object and back-ground had variances of less than 75. All
subimages containing boundaries had variances in excess of 100. Each subimage with
variance greater than 100 was segmented with a threshold computed for that subimage using
the algorithm. The initial value for T in each case was selected as the point midway between
the minimum and maximum gray levels in the subimage. All subimages with variance less
than 100 were treated as one composite image, which was segmented using a single threshold
estimated using the same algorithm. The result of segmentation using this procedure is shown
in Fig. 5(d).
With the exception of two subimages, the improvement over Fig. 5(b) is evident. The
boundary between object and background in each of the improperly segmented subimages was
small and dark, and the resulting histogram was almost unimodal.
It is intuitively evident that the chances of selecting a "good" threshold are enhanced
considerably if the histogram peaks are tall, narrow, symmetric, and separated by deep valleys.
One approach for improving the shape of histograms is to consider only those pixels that lie
on or near the edges between objects and the background. An immediate and obvious
improvement is that histograms would be less dependent on the relative sizes of objects and
the background. For instance, the histogram of an image composed of a small object on a
large background area (or vice versa) would be dominated by a large peak because of the high
concentration of one type of pixels.
If only the pixels on or near the edge between object and the background were used,
the resulting histogram would have peaks of approximately the same height. In addition, the
probability that any of those given pixels lies on an object would be approximately equal to
the probability that it lies on the back-ground, thus improving the symmetry of the histogram
peaks.
The principal problem with the approach just discussed is the implicit assumption that the
edges between objects and background are known. This information clearly is not available
during segmentation, as finding a division between objects and background is precisely what
segmentation is all about. However, an indication of whether a pixel is on an edge may be
obtained by computing its gradient. In addition, use of the Laplacian can yield information
regarding whether a given pixel lies on the dark or light side of an edge. The average value of
the Laplacian is 0 at the transition of an edge, so in practice the valleys of histograms formed
from the pixels selected by a gradient/Laplacian criterion can be expected to be sparsely
populated. This property produces the highly desirable deep valleys.
The gradient ∇f at any point (x, y) in an image can be found. Similarly, the Laplacian ∇²f can also be found. These two quantities may be used to form a three-level image, as follows:
s(x, y) = 0   if ∇f < T
s(x, y) = +   if ∇f ≥ T and ∇²f ≥ 0
s(x, y) = -   if ∇f ≥ T and ∇²f < 0
where the symbols 0, +, and - represent any three distinct gray levels, T is a threshold, and the gradient and Laplacian are computed at every point (x, y). For a dark object on a light background, this equation produces an image s(x, y) in which (1) all pixels that are not on an edge (as determined by ∇f being less than T) are labeled 0; (2) all pixels on the dark side of an edge are labeled +; and (3) all pixels on the light side of an edge are labeled -. The symbols + and - in the equation above are reversed for a light object on a dark background. Figure
shows the labeling produced by Eq. for an image of a dark, underlined stroke
written on a light background.
The information obtained with this procedure can be used to generate a
segmented, binary image in which l's correspond to objects of interest and 0's correspond to
the background. The transition (along a horizontal or vertical scan line) from a light
background to a dark object must be characterized by the occurrence of a - followed by a + in
s (x, y). The interior of the object is composed of pixels that are labeled either 0 or +. Finally,
the transition from the object back to the background is characterized by the occurrence of a +
followed by a -. Thus a horizontal or vertical scan line containing a section of an object has
the following structure:
(…)(-, +)(0 or +)(+, -)(…)
where (…) represents any combination of +, -, and 0. The innermost parentheses contain object points and are labeled 1. All other pixels along the same scan line are labeled 0, with the exception of any other sequence of (0 or +) bounded by (-, +) and (+, -).
Figure 6.2 (a) shows an image of an ordinary scenic bank check. Figure 6.3 shows the
histogram as a function of gradient values for pixels with gradients greater than 5. Note that
this histogram has two dominant modes that are symmetric, nearly of the same height, and are
separated by a distinct valley. Finally, Fig. 6.2(b) shows the segmented image obtained by applying the labeling procedure just described, with T at or near the midpoint of the valley. Note that this example is an illustration of local
thresholding, because the value of T was determined from a histogram of the gradient and
Laplacian, which are local properties.
Basic Formulation:
Let R represent the entire image region. We may view segmentation as a process that partitions R into n subregions, R1, R2, ..., Rn, such that
(a) the union of all the regions Ri equals R;
(b) Ri is a connected region, for i = 1, 2, ..., n;
(c) Ri ∩ Rj = Ø for all i and j, i ≠ j;
(d) P(Ri) = TRUE for i = 1, 2, ..., n; and
(e) P(Ri ∪ Rj) = FALSE for adjacent regions Ri and Rj.
Here, P(Ri) is a logical predicate defined over the points in set Ri and Ø is the null set. Condition (a) indicates that the segmentation must be complete; that is, every pixel must be in a region. Condition (b) requires that points in a region must be connected in some predefined sense. Condition (c) indicates that the regions must be disjoint. Condition (d) deals with the properties that must be satisfied by the pixels in a segmented region—for example, P(Ri) = TRUE if all pixels in Ri have the same gray level. Finally, condition (e) indicates that adjacent regions Ri and Rj are different in the sense of predicate P.
Region Growing:
As its name implies, region growing is a procedure that groups pixels or subregions into larger
regions based on predefined criteria. The basic approach is to start with a set of "seed" points and
from these grow regions by appending to each seed those neighboring pixels that have properties
similar to the seed (such as specific ranges of gray level or color). When a priori information is
not available, the procedure is to compute at every pixel the same set of properties that ultimately
will be used to assign pixels to regions during the growing process. If the result of these
computations shows clusters of values, the pixels whose properties place them near the centroid
of these clusters can be used as seeds.
The selection of similarity criteria depends not only on the problem under consideration, but also
on the type of image data available. For example, the analysis of land-use satellite imagery
depends heavily on the use of color. This problem would be significantly more difficult, or even
impossible, to handle without the inherent information available in color images. When the
images are monochrome, region analysis must be carried out with a set of descriptors based on
gray levels and spatial properties (such as moments or texture).
Basically, growing a region should stop when no more pixels satisfy the criteria for inclusion in
that region. Criteria such as gray level, texture, and color, are local in nature and do not take into
account the "history" of region growth. Additional criteria that increase the power of a region-
growing algorithm utilize the concept of size, likeness between a candidate pixel and the pixels
grown so far (such as a comparison of the gray level of a candidate and the average gray level of
the grown region), and the shape of the region being grown. The use of these types of descriptors
is based on the assumption that a model of expected results is at least partially available. Figure
7.1 (a) shows an X-ray image of a weld (the horizontal dark region) containing several cracks
and porosities (the bright, white streaks running horizontally through the middle of the image).
We wish to use region growing to segment the regions of the weld failures. These segmented
features could be used for inspection, for inclusion in a database of historical studies, for
controlling an automated welding system, and for other numerous applications.
Fig.7.1 (a) Image showing defective welds, (b) Seed points, (c) Result of region growing, (d)
Boundaries of segmented ; defective welds (in black).
The first order of business is to determine the initial seed points. In this application, it is known that pixels of defective welds tend to have the maximum allowable digital value (255 in this case). Based on this information, we selected as starting points all pixels having values of 255. The points thus extracted from the original image are shown in Fig. 7.1(b). Note that many of
the points are clustered into seed regions.
The next step is to choose criteria for region growing. In this particular
example we chose two criteria for a pixel to be annexed to a region: (1) The absolute gray-level
difference between any pixel and the seed had to be less than 65. This number is based on the
histogram shown in Fig. 7.2 and represents the difference between 255 and the location of the
first major valley to the left, which is representative of the highest gray level value in the dark
weld region. (2) To be included in one of the regions, the pixel had to be 8-connected to at least
one pixel in that region.
Figure 7.1(c) shows the result of the growing process. Superimposing the boundaries of these regions on the original image [Fig. 7.1(d)] reveals that the region-growing procedure did indeed segment the defective welds with an acceptable degree of accuracy. It is of
interest to note that it was not necessary to specify any stopping rules in this case because the
criteria for region growing were sufficient to isolate the features of interest.
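A simplified sketch of this region-growing scheme (mirroring the weld example's two criteria; the parameter values are illustrative and the 8-connectivity check is explicit):

import numpy as np

def region_grow(img, seed_value=255, diff=65):
    # Start from all pixels equal to seed_value and append any 8-connected neighbour
    # whose absolute gray-level difference from the seed value is below diff.
    grown = (img == seed_value)
    stack = list(zip(*np.nonzero(grown)))
    rows, cols = img.shape
    while stack:
        i, j = stack.pop()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols and not grown[ni, nj]
                        and abs(int(img[ni, nj]) - seed_value) < diff):
                    grown[ni, nj] = True
                    stack.append((ni, nj))
    return grown                         # boolean mask of the grown regions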
The procedure just discussed grows regions from a set of seed points. An alternative is to subdivide an image initially into a set of arbitrary, disjoint regions and then merge and/or split the regions in an attempt to satisfy the conditions stated in the basic formulation. A split and merge algorithm that iteratively works toward satisfying these constraints is discussed next.
Let R represent the entire image region and select a predicate P. One approach for segmenting R
is to subdivide it successively into smaller and smaller quadrant regions so that, for any region
Ri, P(Ri) = TRUE. We start with the entire region. If P(R) = FALSE, we divide the image into
quadrants. If P is FALSE for any quadrant, we subdivide that quadrant into subquadrants, and so
on. This particular splitting technique has a convenient representation in the form of a so-called
quadtree (that is, a tree in which nodes have exactly four descendants), as illustrated in Fig. 7.3.
Note that the root of the tree corresponds to the entire image and that each node corresponds to a
subdivision. In this case, only R4 was subdivided further.
If only splitting were used, the final partition likely would contain adjacent regions with identical
properties. This drawback may be remedied by allowing merging, as well as splitting. Satisfying
the constraints, requires merging only adjacent regions whose combined pixels satisfy the
predicate P. That is, two adjacent regions Rj and Rk are merged only if P (Rj U Rk) = TRUE.
The preceding discussion may be summarized by the following procedure, in which, at any step
we
1. Split into four disjoint quadrants any region Ri, for which P (Ri) = FALSE.
2. Merge any adjacent regions Rj and Rk for which P (Rj U Rk) =TRUE.
3. Stop when no further merging or splitting is possible.
Several variations of the preceding basic theme are possible. For example, one possibility is to
split the image initially into a set of blocks. Further splitting is carried out as described
previously, but merging is initially limited to groups of four blocks that are descendants in the
quadtree representation and that satisfy the predicate P. When no further mergings of this type
are possible, the procedure is terminated by one final merging of regions satisfying step 2. At
this point, the merged regions may be of different sizes. The principal advantage of this
approach is that it uses the same quadtree for splitting and merging, until the final merging
step.
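A toy sketch of the splitting half of the procedure (assumptions: a square image with a power-of-two side and a variance-based homogeneity predicate; the merging pass is omitted):

import numpy as np

def quadtree_split(img, predicate, min_size=8):
    # Recursively split the image into quadrants until predicate(region) is TRUE
    # (or the region reaches min_size); returns a list of (row, col, size) leaves.
    leaves = []
    def split(r, c, size):
        region = img[r:r + size, c:c + size]
        if predicate(region) or size <= min_size:
            leaves.append((r, c, size))
            return
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                split(r + dr, c + dc, half)
    split(0, 0, img.shape[0])
    return leaves

# Example predicate: a region satisfies P if its gray-level standard deviation is small
uniform = lambda region: region.std() < 10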
UNIT- 5
IMAGE COMPRESSION
Transform Coding:
In this technique, an image is transformed using one of the image transforms. The idea is to produce a new set of coefficients that are decorrelated, so that most of the information in the image is packed into a small number of coefficients. The coefficients that provide little image information can then be discarded.
In the encoder, the first step is to construct n x n sub-images from the original N x N image. These are transformed to generate (N/n)² sub-image transforms, each of size n x n. The purpose of the transformation step is to decorrelate the pixels of each sub-image.
The quantization block is used to eliminate the coefficients that carry the least information. Omitting these coefficients has only a small impact on the quality of the reconstructed sub-images. Finally, the remaining coefficients are encoded.
The decoder extracts the code words and applies them to the dequantizer, which reconstructs the set of quantized transform coefficients. From these, the set of transform coefficients that represents the image is produced.
These coefficients are inverse transformed to give the lossy version of the image. Transform coding can also be done without the quantization and dequantization (truncation) blocks, but the achievable compression is then lower.
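A toy transform coder along these lines (illustrative only): 8 x 8 block DCT, truncation of all but the largest-magnitude coefficients in each block, and inverse transformation. It assumes the image dimensions are multiples of the block size.

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(block):
    return idct(idct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def transform_code(img, n=8, keep=10):
    # Keep only the `keep` largest-magnitude DCT coefficients in every n x n block.
    out = np.zeros(img.shape, dtype=float)
    for r in range(0, img.shape[0], n):
        for c in range(0, img.shape[1], n):
            coeffs = dct2(img[r:r + n, c:c + n].astype(float))
            thresh = np.sort(np.abs(coeffs), axis=None)[-keep]
            coeffs[np.abs(coeffs) < thresh] = 0          # discard low-information coefficients
            out[r:r + n, c:c + n] = idct2(coeffs)
    return out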
JPEG stands for Joint Photographic Experts Group. It is used on 24-bit color files and works well on photographic images. Although it is a lossy compression technique, it yields an excellent quality image at high compression rates.
The baseline encoding steps are:
1. Divide the image into non-overlapping 8 x 8 blocks of pixels.
2. Level-shift the pixel values (for an 8-bit image, subtract 128 from each value).
3. Transform the pixel information from the spatial domain to the frequency domain with the Discrete Cosine Transform.
4. Quantize the resulting values by dividing each coefficient by an integer value and rounding off to the nearest integer.
5. Read the resulting coefficients in a zigzag order and do a run-length encoding of the coefficients ordered in this manner, followed by Huffman coding.
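Steps 4 and 5 for a single 8 x 8 block of DCT coefficients can be sketched as follows (illustrative only: q_table stands for a JPEG quantization table, and the helper names are hypothetical):

import numpy as np

def quantize(dct_block, q_table):
    # Step 4: divide each coefficient by the corresponding table entry and round.
    return np.round(dct_block / q_table).astype(int)

def zigzag(block):
    # Step 5 (ordering): return the 64 coefficients of an 8 x 8 block in zigzag order,
    # traversing anti-diagonals and alternating direction on each one.
    idx = sorted(((i, j) for i in range(8) for j in range(8)),
                 key=lambda p: (p[0] + p[1],
                                p[0] if (p[0] + p[1]) % 2 else p[1]))
    return np.array([block[i, j] for i, j in idx])

# The zigzag sequence is then run-length encoded and Huffman coded.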
Entropy encoder is used to provide the lossless additional compression. The DC coefficients
are encoded using predictive coding.
JPEG standard can use two entropy coding methods namely Huffman coding (it is used by
baseline system) and arithmetic coding (it is used by extending system).
In the decoder, the compressed image is given to the entropy decoder, which uses the inverse procedure of Huffman or arithmetic coding to produce its output. This output is given to the dequantizer block, where the inverse of the quantization is performed. The output of the dequantizer block is given to the inverse DCT block, whose output is the decompressed image.
PART-B
1. Explain the steps involved in digital image processing. (or) Explain various functional block of digital
image processing.
2. Describe the elements of visual perception.
3. Describe image formation in the eye with brightness adaptation and discrimination
4. Write short notes on sampling and quantization.
5. Describe the functions of elements of digital image processing system with a diagram. Or List and
explain various elements of digital image processing system
6. Explain the basic relationships between pixels?
i) Neighbours of a pixel ii) Connectivity iii) Distance measure iv) Path
7. Write the expressions for Walsh transform kernel and Walsh transform
8. The mean and variance of the Rayleigh noise density are:
Mean µ = a + √(πb/4); Variance σ² = b(4 - π)/4
9. How a degradation process is modeled? Or Define degradation model and sketch it.
A system operator H, which together with an additive white noise term η(x,y) operates on an input
image f(x,y) to produce a degraded image g(x,y).
10. What are the three types of discontinuity in digital image?
Points, lines and edges.
11. What is concept algebraic approach? What are the two methods of algebraic approach?
The concept of algebraic approach is to estimate the original image which minimizes a predefined criterion
of performances. The two methods of algebraic approach are
1. Unconstraint restoration approach 2. Constraint restoration approach
12. Define Gray-level interpolation
Gray-level interpolation deals with the assignment of gray levels to pixels in the spatially transformed
image
13. What is meant by Noise probability density function?
The spatial noise descriptor is the statistical behavior of gray level values in the noise component of the
model.
14. What is geometric transformation?
Transformation is used to alter the co-ordinate description of image.
The basic geometric transformations are 1. Image translation 2. Scaling 3. Image rotation
15. What is image translation and scaling?
Image translation means reposition the image from one co-ordinate location to another along straight line
path. Scaling is used to alter the size of the object or image (ie) a co-ordinate system is scaled by a factor.
16. Why the restoration is called as unconstrained restoration?
In the absence of any knowledge about the noise n, a meaningful criterion is to seek an estimate f^ such that Hf^ approximates g in a least-squares sense, by assuming the noise term is as small as possible,
where H = system operator, f^ = estimated input image, g = degraded image.
17. Which is the most frequent method to overcome the difficulty to formulate the spatial
relocation of pixels?
The point is the most frequent method, which are subsets of pixels whose location in the input (distorted)
and output (corrected) imaged is known precisely.
18. What are the three methods of estimating the degradation function?
1. Observation 2. Experimentation 3. Mathematical modeling.
The simplest approach to restoration is direct inverse filtering, in which an estimate F^(u,v) of the transform of the original image is obtained simply by dividing the transform of the degraded image G(u,v) by the degradation function H(u,v): F^(u,v) = G(u,v) / H(u,v).
19. What is pseudo inverse filter?
It is the stabilized version of the inverse filter. For a linear shift invariant system with frequency response
H(u,v) the pseudo inverse filter is defined as
H⁻(u,v) = 1/H(u,v), if H(u,v) ≠ 0
        = 0,        if H(u,v) = 0
20. What is meant by least mean square filter or wiener filter?
The limitation of the inverse and pseudo-inverse filters is that they are very sensitive to noise. Wiener filtering is a method of restoring images in the presence of blur as well as noise.
21. What is blur impulse response and noise levels?
Blur impulse response: This parameter is measured by isolating an image of a suspected object within a
picture.
Noise levels: The noise of an observed image can be estimated by measuring the image covariance over a
region of constant background luminance.
PART-B
1. Explain the algebra approach in image restoration. Describe how image restoration can be
performed for black and white binary images and blur caused by uniform linear motion.
2. Explain wiener filter or least mean square filter in image restoration.
3. What is meant by Inverse filtering? Explain it with equation.
4. Explain image degradation model /restoration process in detail.
5. Explain the causes for image degradation.
6. Describe constrained least square filtering for image restoration and derive its transfer function.
7. Explain different noise models in image processing
(The 3 x 3 Sobel masks used to compute Gx and Gy:
 Gx:  -1 -2 -1        Gy:  -1  0  1
       0  0  0             -2  0  2
       1  2  1             -1  0  1 )
8. Define region splitting and merging. Specify the steps involved in splitting and merging
Region splitting and merging is a segmentation process in which an image is initially subdivided into a set of arbitrary, disjoint regions and then the regions are merged and/or split to satisfy the basic conditions.
Steps: Split into four disjoint quadrants any region Ri for which P(Ri) = FALSE. Merge any adjacent regions Rj and Rk for which P(Rj U Rk) = TRUE. Stop when no further merging or splitting is possible.
9. Define and give the transfer function of mean and geometric mean filter
The arithmetic mean filter is a very simple one and is calculated as follows:
f^(x, y) = (1/mn) Σ (s,t)∈Sxy g(s, t)
The expression for the geometric mean filter is
f^(x, y) = [ Π (s,t)∈Sxy g(s, t) ]^(1/mn)
• Each restored pixel value is given by the product of all the pixel values in the filter window, raised to the power 1/mn.
• It achieves smoothing comparable to the arithmetic mean filter, but tends to lose less image detail in the process.
PART-B
1. What is image segmentation. Explain in detail.
2. Explain Edge Detection and edge linking in detail?
3. Define thresholding and explain the various methods of thresholding in detail?
4. Discuss about region based image segmentation techniques. Compare threshold region based
techniques.
5. Explain the two techniques of region segmentation.
6. Explain region based segmentation and region growing with an example.
7. Explain structuring element decomposition of dilation.
8. Explain opening and closing in terms of erosion and dilation.
9. Discuss in detail about threshold selection based on boundary characteristics
10. What is meant by Hit or Miss Transformation? Explain in detail.
11. What is the need for Compression? Compare lossy and lossless compression technique
In terms of storage, the capacity of a storage device can be effectively increased with methods that
compress a body of data on its way to a storage device and decompress it when it is retrieved.
1. In terms of communications, the bandwidth of a digital communication link can be effectively
increased by compressing data at the sending end and decompressing data at the receiving end.
2. At any given time, the ability of the Internet to transfer data is fixed. Thus, if data can effectively
be compressed wherever possible, significant improvements of data throughput can be
achieved. Many files can be combined into one compressed document making sending easier.
Lossless compression technique:
*In lossless data compression, the integrity of the data is preserved. The original data and the data after compression and decompression are exactly the same because, in these methods, the compression and decompression algorithms are exact inverses of each other: no part of the data is lost in the process.
*Redundant data is removed in compression and added back during decompression.
*Lossless compression methods are normally used when we cannot afford to lose any data.
*Some techniques are run-length encoding, Huffman coding, and Lempel-Ziv encoding.
Lossy compression technique:
*Our eyes and ears cannot distinguish subtle changes. In such cases, we can use a lossy data compression method.
*These methods are cheaper—they take less time and space when it comes to sending millions of bits per second for images and video.
*Several lossy compression techniques are in use: JPEG (Joint Photographic Experts Group) encoding is used to compress pictures and graphics, MPEG (Moving Picture Experts Group) encoding is used to compress video, and MP3 (MPEG audio layer 3) is used for audio compression.
12. Define encoder
An encoder has two components: A) Source Encoder B) Channel Encoder. The source encoder is responsible for removing coding, interpixel, and psychovisual redundancy.
13. Define channel encoder.
The channel encoder reduces the impact of the channel noise by inserting redundant bits into the source
encoded data. Eg: Hamming code
14. What are the types of decoder?
Source decoder- has two components a) Symbol decoder- This performs inverse operation of symbol
encoder. b) Inverse mapping- This performs inverse operation of mapper. Channel decoder-this is omitted
if the system is error free.
15. What are the operations performed by error free compression?
1) Devising an alternative representation of the image in which its interpixel redundant are reduced.
2) Coding the representation to eliminate coding redundancy
16. What is Variable Length Coding?
Variable Length Coding is the simplest approach to error free compression. It reduces only the coding
redundancy. It assigns the shortest possible codeword to the most probable gray levels.
17. Define Huffman coding and mention its limitation
Huffman coding is a popular technique for removing coding redundancy.
1. When coding the symbols of an information source the Huffman code yields the smallest possible
number of code words, code symbols per source symbol.
Limitation: For equiprobable symbols, Huffman coding produces variable code words.
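For illustration, a small Python sketch (not from the notes) that builds a Huffman code from a symbol-probability table using a heap of partial trees:

import heapq

def huffman_code(probabilities):
    # Returns {symbol: bitstring}; more probable symbols get shorter code words.
    heap = [[p, i, {sym: ''}] for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol source
        return {sym: '0' for sym in probabilities}
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)  # the two least probable nodes
        p2, i2, codes2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in codes1.items()}
        merged.update({s: '1' + c for s, c in codes2.items()})
        heapq.heappush(heap, [p1 + p2, i2, merged])
    return heap[0][2]

# Example: huffman_code({'a1': 0.2, 'a2': 0.4, 'a3': 0.2, 'a4': 0.1, 'a5': 0.1})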
18. Define Block code.
Each source symbol is mapped into fixed sequence of code symbols or code words. So it is called as block
code.
19. Define instantaneous code.
A code word that is not a prefix of any other code word is called instantaneous or prefix codeword.
20. Define uniquely decodable code.
A code word that is not a combination of any other codeword is said to be uniquely decodable code.
Bit error rate: The bit error rate (BER) is the number of bit errors per unit time.
36. List the advantages of transform coding.
Very efficient multiplier-less implementations are possible, providing high-quality digitization.
Transform coding may also be used for the efficient encoding of sequences which are not successive samples of a waveform, but samples of N correlated sources.
PART - B
1. Explain three basic data redundancy? What is the need for data compression.
2. What is image compression? Explain any four variable length coding compression schemes.
3. Explain about Image compression model.
4. Explain about Error free Compression and Lossy compression.
5. Explain the schematics of image compression standard JPEG.
6. Explain the principle of arithmetic coding with an example.
7. Discuss about MPEG standard and compare with JPEG
8. Draw and explain the block diagram of MPEG encoder
9. Discuss the need for image compression. Perform Huffman algorithm for the following intensity
distribution, for a 64x64 image. Obtain the coding efficiency and compare with that of uniform length
code. r0=1008, r1=320, r2=456, r3=686, r4=803, r5=105, r6=417, r7=301
10. (i) Briefly explain transform coding with neat sketch.
(ii) A source emits letters from an alphabet A = {a1 , a2 , a3 , a4 , a5} with probabilities
P(a1) = 0.2 , P(a2) = 0.4 , P(a3) = 0.2 , P(a4) = 0.1 and P(a5) = 0.1.
(1) Find the Huffman code (2) Find the average length of the code and its redundancy.
11. Describe run length encoding with examples.
PART- A
(25 Marks)
1.a) Define a digital image. [2]
b) Draw an image for image processing system. [3]
c) Present a note on smoothing linear filters. [2]
d) What are the applications of gray level slicing? [3]
e) Present a note on WEIGHT parameter. [2]
f) What are the spatial and frequency properties of noise? [3]
g) What are the applications of image segmentation? [2]
h) What is meant by watermarking? [3]
i) Define image compression. [2]
j) What is meant by error free compression? [3]
PART-B
(50 Marks)
2.a) Distinguish between digital image and binary image.
b) Explain a simple image model. [5+5]
OR
3.a) Explain the properties of the slant transform.
b) Write short notes on the Hadamard transform. [5+5]
8.a) Explain about Hough transform with an example.
b) What is the role of thresholding in segmentation? [5+5]
OR
9.a) Write short notes on dilation and erosion.
b) Give an overview of digital image watermarking methods. [5+5]
--ooOoo--
Code No: 117CJ R13
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, November/December - 2017
DIGITAL IMAGE PROCESSING
(Common to ECE, ETM)
Time: 3 Hours Max. Marks: 75
Note: This question paper contains two parts A and B.
Part A is compulsory which carries 25 marks. Answer all questions in Part A. Part B
consists of 5 Units. Answer any one full question from each unit. Each question
carries 10 marks and may have a, b, c as sub questions.
Part- A
(25 Marks)
1.a) Define Sampling and Quantization. [2]
b) List the properties of Walsh Transform. [3]
c) Define histogram. [2]
d) What is the need of image enhancement? [3]
e) What is the difference between image restoration and image enhancement? [2]
f) Draw the model of Image Restoration process. [3]
g) List different types of discontinuities in digital image. [2]
h) What is global, Local and dynamic threshold? [3]
i) What is the need of image compression? [2]
j) Give the characteristics of lossless compression. [3]
Part-B
(50 Marks)
2. With mathematical expressions explain the Slant transform and explain how it is useful in
Image processing. [10]
OR
3.a) List and explain the fundamental steps in digital image processing.
b) Discuss briefly the following:
i) Neighbours of pixels ii) connectivity. [5+5]
6. What is meant by image restoration? Explain the image degradation model. [10]
OR
7. Discuss in detail the image restoration using inverse filtering. [10]
8.a) Explain the basics of intensity thresholding in image segmentation.
b) Explain about morphological hit-or-miss transform. [5+5]
OR
9.a) Discuss in detail the edge linking using local processing.
b) Discuss briefly the region based segmentation. [6+4]
--ooOoo--
R13
Code No: 117CJ
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, March - 2017
DIGITAL IMAGE PROCESSING
(Electronics and Communication Engineering)
Time: 3 Hours Max. Marks: 75
Note: This question paper contains two parts A and B.
Part A is compulsory which carries 25 marks. Answer all questions in Part A.
Part B consists of 5 Units. Answer any one full question from each unit. Each
question carries 10 marks and may have a, b, c as sub questions.
2.a) What is meant by digital image processing? What are the applications of it? How an
image is represented digitally?
b) Non uniform sampling is useful for what type of images. Give reasons. [5+5]
OR
3.a) Is fast algorithm applicable for computation of Hadamard transform, if so what are the
problems encountered in implementation.
b) Explain Discrete Cosine Transform and specify its properties. [5+5]
6. Describe constrained least square filtering technique for image restoration and derive its
transfer function. [10]
OR
7. Describe with mathematical model, both constrained and unconstrained restoration. [10]
8.a) Explain the segmentation techniques that are based on finding the regions.
b) Write the applications of segmentation. [7+3]
OR
9.a) Explain any two methods for linking the edge pixels to form a boundary of an object.
b) Explain with examples morphological operations dilation and erosion. [7+3]
---ooOoo---
Code No: 117CJ R13
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, November/December - 2016
DIGITAL IMAGE PROCESSING
(Electronics and Communication Engineering)
Time: 3 Hours Max. Marks: 75
Note: This question paper contains two parts A and B.
Part A is compulsory which carries 25 marks. Answer all questions in Part A. Part B
consists of 5 Units. Answer any one full question from each unit. Each question carries
10 marks and may have a, b, c as sub questions.
PART- A
(25 Marks)
1.a) Define Weber Ratio [2]
b) What is city block distance [3]
c) What is mean by Image Subtraction? [2]
d) What are Piecewise-Linear Transformations [3]
e) What is degradation function? [2]
f) What is Gray-level interpolation? [3]
g) What are the logic operations involving binary images [2]
h) What is convex hull? [3]
i) Define Compression Ratio [2]
j) What is Arithmetic Coding? [3]
PART-B
(50 Marks)
2.a) Discuss the role of sampling and quantization with an example.
b) With a neat block diagram, explain the fundamental steps in digital image processing.[5+5]
OR
3.a) Discuss the Relationship between Pixels in detail.
b) Discuss optical illusions with examples. [5+5]
--ooOoo--
Code No:127CJ
R15
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B.Tech IV Year I Semester Examinations, May/June - 2019
DIGITAL IMAGE PROCESSING
(Electronics and Communication Engineering)
Time: 3 Hours Max.Marks: 75
Note: This question paper contains two parts A and B.
Part A is compulsory which carries 25 marks. Answer all questions in Part A. Part B consists of
5 Units. Answer any one full question from each unit. Each question carries 10 marks and may
have a, b, c as sub questions.
PART- A
(25 Marks)
1.a) How to represent the image? [2]
b) What is 4-, 8-, m- connectivity? [3]
c) What is High boost High pass filter? [2]
d) Compare linear and nonlinear gray level transformations. [3]
e) What are the advantages of Restoration? [2]
f) What are the different sources of degradation? [3]
g) What is erosion? [2]
h) How discontinuity property is used in image segmentation? [3]
i) What is mean by redundancy? [2]
j) What is fidelity? How it is used in image processing? [3]
PART-B
(50 Marks)
2.a) How is an image sampled, and how does it differ from signal sampling?
b) Explore the relationship between pixels. [4+6]
OR
3.a) Define 2-D DFT and prove its convolution property and also write its applications.
b) Derive the 8 × 8 Slant transform matrix and write its order of sequence. [5+5]
4.a) Explain local enhancement techniques and compare it with global enhancement techniques.
b) Explain Histogram equalization method with example. [5+5]
OR
5.a) Consider the following image segment x and enhance it using the equation y = k·x, where k is a constant and y is the output image.
0 1 2 3 4 5 6 7
54 35 64 53 123 43 56 45
b) Explain how low pass filter is used to enhance the image in frequency domain? [5+5]
6.a) Explain how image restoration improves the quality of image.
b) What is inverse filter? How it is used for image restoration? [5+5]
OR
7.a) How wiener filter is used for image restoration? What are the limitations of it?
b) What are the applications of restoration? [6+4]
10. Suppose the alphabet is [A, B, C], and the known probability distribution is PA = 0.5, PB = 0.4, PC = 0.1. For simplicity, let's also assume that both encoder and decoder know that the length of the messages is always 3, so there is no need for a terminator.
a) How many bits are needed to encode the message BBB by Huffman coding?
b) How many bits are needed to encode the message BBB by arithmetic coding?
c) Analyze and compare the results of (a) and (b). [10]
OR
11.a) Draw the general block diagram of a compression model and explain the significance of each block.
b) Explain lossless predictive coding for image compression with neat diagrams and equations.
[5+5]
--ooOoo--
Code No: 136BD R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech III Year II Semester Examinations, May – 2019
DIGITAL IMAGE PROCESSING
(Electronics and Communication Engineering)
Time: 3 hours Max. Marks: 75
PART - A
(25 Marks)
1.a) Compare and contrast digital image and binary image. [2]
b) Define spatial and gray level resolution. [3]
c) List the various areas of application of image subtraction [2]
d) Explain about median filtering. [3]
e) Explain about alpha-trimmed mean filter? [2]
f) Write short notes on Max and Min filters. [3]
g) What is meant by edge in a digital image? [2]
h) What is meant by optimal Thresholding? [3]
i) Write short notes on spatial redundancy. [2]
j) Explain the Fidelity criteria. [3]
PART - B
(50 Marks)
4. Sketch perspective plot of an 2-D Ideal Low pass filter transfer function and filter cross section and
explain its usefulness in Image enhancement. [10]
OR
5.a) What is meant by Histogram of an image. Write and explain with an example an algorithm for
histogram equalization.
b) What is meant by the Gradient and the Laplacian? Discuss their role in image enhancement.
[5+5]
6.a) Illustrate the use of adaptive median filter for noise reduction in an image.
b) Outline the different approaches to estimate the noise parameters in an image. [5+5]
OR
7.a) What are the different ways to estimate the degradation function? Explain.
b) Explain about noise reduction in an image using band reject and band pass filters. [5+5]
8.a) Explain about Global Processing by making use of Hough Transform?
b) Explain the following morphological algorithms [4+6]
i) Boundary extraction ii) Hole filling
OR
9.a) With necessary figures, explain the opening and closing operations.
b) Describe the procedure for image segmentation based on region growing with relevant examples.
[5+5]
10.a) Consider an 8- pixel line of gray-scale data, {12,12,13,13,10,13,57,54}, which has been uniformly
quantized with 6-bit accuracy. Construct its 3-bit IGS code.
b) What is bit-plane slicing? How it is used for achieving compression? [5+5]
OR
11.a) With the help of a block diagram explain about transform coding system.
b) Summarize the various types of data redundancies? [5+5]
---ooOoo---