DIP 15EC72 Notes
Module – 5
Segmentation: Point, Line, and Edge Detection, Thresholding, Region-Based
Segmentation, Segmentation Using Morphological Watersheds.
Representation and Description: Representation, Boundary descriptors. (RBT Levels: L1, L2, L3)
[Text: Chapter 10: Sections 10.2, to 10.5 and Chapter 11: Sections 11.1 and 11.2]
Course Outcomes: At the end of the course students should be able to:
Understand image formation and the role the human visual system plays in the perception of
gray and color image data.
Apply image processing techniques in both the spatial and frequency (Fourier) domains.
Design image analysis techniques in the form of image segmentation and evaluate the
methodologies for segmentation.
Conduct independent study and analysis of Image Enhancement techniques.
Question paper pattern:
The question paper will have ten questions.
Each full question consists of 16 marks.
There will be 2 full questions (with a maximum of Three sub questions) from each
module.
Each full question will have sub questions covering all the topics under a module.
The students will have to answer 5 full questions, selecting one full question from each
module.
Text Book:
Digital Image Processing - Rafael C. Gonzalez and Richard E. Woods, PHI, 3rd Edition, 2010.
Reference Books:
1. Digital Image Processing- S.Jayaraman, S.Esakkirajan, T.Veerakumar, Tata McGraw Hill
2014.
2. Fundamentals of Digital Image Processing-A. K. Jain, Pearson 2004.
Module – 1 (RBT Levels: L1, L2)
Digital Image Fundamentals: What is Digital Image Processing?, Origins of Digital
Image Processing, Examples of Fields that Use DIP, Fundamental Steps in Digital
Image Processing, Components of an Image Processing System, Elements of Visual
Perception, Image Sensing and Acquisition, Image Sampling and Quantization, Some
Basic Relationships Between Pixels, Linear and Nonlinear Operations.
[Text: Digital Image Processing - Rafael C. Gonzalez and Richard E. Woods,
Chapter 1 and Chapter 2: Sections 2.1 to 2.5, 2.6.2]
An image may be defined as a two-dimensional function, f (x,y), where x and y are spatial
(plane) coordinates. The amplitude of f at any pair of coordinates (x,y) is called the intensity
or gray level of the image at that point.
When x, y, and f are all finite, discrete quantities, we call the image a digital image.
The field of digital image processing refers to processing digital images by means of a digital
computer.
A digital image is composed of a finite number of elements, each of which has a
location and value. These elements are called pixels.
Unlike humans, who are limited to the visual band of the electromagnetic (EM) spectrum,
imaging machines cover almost the entire EM spectrum, ranging from gamma rays to radio
waves.
There is no general agreement regarding where image processing stops and other
related areas, such as image analysis and computer vision, start.
Although there are no clear-cut boundaries in the continuum from image processing at
one end to computer vision at the other, one useful paradigm is to consider three types of
processes in this continuum:
A low-level process is characterized by the fact that both its inputs and outputs are
images.
A mid-level process is characterized by the fact that its inputs generally are images,
but its outputs are attributes extracted from those images.
The higher-level processes include object recognition, image analysis, and performing
the cognitive functions associated with vision.
One of the first applications of digital images was in the newspaper industry, when pictures
were first sent by submarine cable between London and New York.
Introduction of the Bartlane cable picture transmission system in the early 1920s reduced
the time to transport a picture across the Atlantic from more than one week to less than three
hours.
Examples of fields that use DIP
The history of digital image processing is tied to the development of the digital computer.
The first computers powerful enough to carry out meaningful image processing tasks
appeared in the early 1960s.
Other than the processing intended for human interpretation, another important area of
applications of digital image processing is in solving problems dealing with machine
perception.
Typical problems in machine perception that routinely utilize image processing
techniques are automatic character recognition, industrial machine vision, military
reconnaissance, processing of fingerprints, and many other tasks.
The continuing decline in the ratio of computer price to performance and the expansion of
networking and communication bandwidth via World Wide Web and the Internet have
created unprecedented opportunities for continued growth of digital image processing.
One of the simplest ways to develop a basic understanding of the extent of image
processing applications is to categorize images according to their source. The principal
energy source for images in use today is the electromagnetic (EM) energy spectrum.
Electromagnetic waves can be conceptualized as propagating sinusoidal waves of varying
wavelengths, or they can be thought of as a stream of massless particles traveling in a
wavelike pattern and moving at the speed of light. Each massless particle contains a certain
amount (or bundle) of energy.
Gamma-Ray Imaging
Major uses of imaging based on gamma rays
include nuclear medicine and astronomical
observations. Images are produced from
emissions collected by gamma ray detectors.
X-Ray Imaging
X-rays are among the oldest sources of EM radiation used for imaging.
Another major area of visual processing is remote sensing, which includes several
bands in the visual and infrared regions of the spectrum. Table 1.1 shows the so-called
thematic bands in NASA's LANDSAT satellite.
The primary function of LANDSAT is to obtain and transmit images of the Earth from
space for purposes of monitoring environmental conditions of the planet.
Figure 1.10 shows one image for each of the spectrum bands in Table 1.1.
Weather observation and prediction also are major applications of multi-spectrum imaging
from satellites.
Image acquisition is the first process shown in the figure. Note that acquisition could be as
simple as being given an image that is already in digital form. Generally, the image
acquisition stage involves preprocessing, such as scaling.
Image enhancement is among the simplest and most appealing areas of digital image
processing. Basically, the idea behind enhancement techniques is to bring out detail that is
obscured, or simply to highlight certain features of interest in an image. A familiar example
of enhancement is when we increase the contrast of an image because "it looks better." It is
important to keep in mind that enhancement is a very subjective area of image processing.
Image restoration is an area that also deals with improving the appearance of an image.
Morphological processing deals with tools for extracting image components that are
useful in the representation and description of shape.
Segmentation procedures partition an image into its constituent parts or objects. In
general, autonomous segmentation is one of the most difficult tasks in digital image
processing. A rugged segmentation procedure brings the process a long way toward
successful solution of imaging problems that require objects to be identified individually. On
the other hand, weak or erratic segmentation algorithms almost always guarantee eventual
failure. In general, the more accurate the segmentation, the more likely recognition is to
succeed.
Representation and description almost always follow the output of a segmentation
stage, which usually is raw pixel data, constituting either the boundary of a region (i.e., the
set of pixels separating one image region from another) or all the points in the region itself. In
either case, converting the data to a form suitable for computer processing is necessary. The
first decision that must be made is whether the data should be represented as a boundary or as
a complete region. Boundary representation is appropriate when the focus is on external
shape characteristics, such as corners and inflections. Regional representation is appropriate
when the focus is on internal properties, such as texture or skeletal shape. In some
applications, these representations complement each other. Choosing a representation is only
part of the solution for transforming raw data into a form suitable for subsequent computer
processing. A method must also be specified for describing the data so that features of
interest are highlighted. Description, also called feature selection, deals with extracting
attributes that result in some quantitative information of interest or are basic for
differentiating one class of objects from another.
Recognition is the process that assigns a label (e.g., "vehicle") to an object based on
its descriptors. We conclude our coverage of digital image processing with the development
of methods for recognition of individual objects.
As recently as the mid-1980s, numerous models of image processing systems being sold
throughout the world were rather substantial peripheral devices that attached to equally
substantial host computers. Late in the 1980s and early in the 1990s, the market shifted to image
processing hardware in the form of single boards designed to be compatible with industry
standard buses and to fit into engineering workstation cabinets and personal computers. In
addition to lowering costs, this market shift also served as a catalyst for a significant number of
new companies whose specialty is the development of software written specifically for image
processing.
Although large-scale image processing systems still are being sold for massive imaging
applications, such as processing of satellite images, the trend continues toward miniaturizing and
blending of general-purpose small computers with specialized image processing hardware. Figure
3 shows the basic components comprising a typical general-purpose system used for digital image
processing. The function of each component is discussed in the following paragraphs, starting
with image sensing.
With reference to sensing, two elements are required to acquire digital images. The first is
a physical device that is sensitive to the energy radiated by the object we wish to image. The
second, called a digitizer, is a device for converting the output of the physical sensing device into
digital form. For instance, in a digital video camera, the sensors produce an electrical output
proportional to light intensity. The digitizer converts these outputs to digital data.
Specialized image processing hardware usually consists of the digitizer just mentioned, plus
hardware that performs other primitive operations. This unit is sometimes called a
front-end subsystem, and its most distinguishing characteristic is speed. In other words, this
unit performs functions that require fast data throughputs (e.g., digitizing and averaging video
images at 30 frames/s) that the typical main computer cannot handle.
The computer in an image processing system is a general-purpose computer and can
range from a PC to a supercomputer. In dedicated applications, sometimes specially
designed computers are used to achieve a required level of performance, but our interest here
is on general-purpose image processing systems. In these systems, almost any well-equipped
PC-type machine is suitable for offline image processing tasks.
Software for image processing consists of specialized modules that perform specific
tasks. A well-designed package also includes the capability for the user to write code that, as
a minimum, utilizes the specialized modules. More sophisticated software packages allow the
integration of those modules and general-purpose software commands from at least one
computer language.
Image displays in use today are mainly color (preferably flat screen) TV monitors.
Monitors are driven by the outputs of image and graphics display cards that are an integral
part of the computer system. Seldom are there requirements for image display applications
that cannot be met by display cards available commercially as part of the computer system. In
some cases, it is necessary to have stereo displays, and these are implemented in the form of
headgear containing two small displays embedded in goggles worn by the user.
Hardcopy devices for recording images include laser printers, film cameras, heat-
sensitive devices, inkjet units, and digital units, such as optical and CD-ROM disks. Film
provides the highest possible resolution, but paper is the obvious medium of choice for
written material. For presentations, images are displayed on film transparencies or in a digital
medium if image projection equipment is used. The latter approach is gaining acceptance as
the standard for image presentations.
Networking is almost a default function in any computer system in use today.
Because of the large amount of data inherent in image processing applications, the key
consideration in image transmission is bandwidth. In dedicated networks, this typically is not
a problem, but communications with remote sites via the Internet are not always as efficient.
Fortunately, this situation is improving quickly as a result of optical fiber and other
broadband technologies.
• There is a distribution of discrete light receptors over the retina surface. There are two
types: cones and rods.
• Cones (6-7 million) are mainly around the central part called fovea and sensitive to
color
• Rods (75-150 million) are distributed wider and are sensitive to low illumination
levels
In an ordinary photographic camera, the converse of the eye is true: the lens has a fixed focal
length, and focusing at various distances is achieved by varying the distance between the lens
and the imaging plane (in the eye, the lens-to-retina distance is fixed and focusing is achieved
by varying the shape, and hence the focal length, of the lens).
Interesting Fact!!!
How do we obtain the dimension of the image formed on the retina?
Sol:
Suppose a person is looking at a tree 15 m high at a distance of 100 m, and let h denote the
height of the tree in the retinal image (the distance between the lens centre and the retina is
about 17 mm).
From the geometry of similar triangles,
15/100 = h/17
Therefore, h = 2.55 mm.
A small value of ΔI_c/I (the Weber ratio) means that a small percentage change in intensity is
discriminable, representing "good" brightness discrimination. A plot of log(ΔI_c/I) as a
function of log I has the general shape shown in Figure 2.6.
In Figure 2.8, all the center squares have exactly the same intensity, though they appear to
the eye to become darker as the background gets lighter.
Other examples of human perception phenomena are optical illusions, in which the eye fills
in non-existing information or wrongly perceives geometrical properties of objects.
In 1666, Sir Isaac Newton discovered that when a beam of sunlight is passed through a glass
prism, the emerging beam of light consists of a continuous spectrum of colors from violet
at one end to red at the other.
As Figure 2.10 shows, the range of colors we perceive in visible light represents a very
small portion of the electromagnetic spectrum. Wavelength λ and frequency ν are related by
the expression
λ = c/ν
where c is the speed of light (2.998 × 10^8 m/s). The energy of the various components of the
electromagnetic spectrum is given by E = hν, where h is Planck's constant.
The range of measured values of monochromatic light from black to white is usually
called the gray scale, and monochromatic images are frequently referred to as gray-scale
images.
Figure 2.12 shows the three principal sensor arrangements used to transform illumination
energy into digital images.
Incoming energy is transformed into a voltage by the combination of input electrical power
and sensor material that is responsive to the particular type of energy being detected.
The output voltage waveform is the response of the sensor(s), and a digital quantity is
obtained from each sensor by digitizing its response.
One arrangement for imaging with a single sensor places a laser source coincident with the
sensor. Moving mirrors are used to control the outgoing beam in a scanning pattern and to
direct the reflected laser signal onto the sensor.
Which means that reflectance is bounded by 0 (total absorption) and 1 (total reflectance).
To create a digital image, we need to convert the continuous sensed data into digital form.
This involves two processes: sampling and quantization.
When a sensing array is used for image acquisition, there is no motion and the number of
sensors in the array establishes the limits of sampling in both directions.
In some discussions, we use a more traditional matrix notation to denote a digital image as its
elements:
Due to storage and quantizing hardware considerations, the number of intensity levels
typically is an integer power of 2:
L = 2^k
where k is the number of bits used to represent each intensity level.
We assume that the discrete levels are equally spaced and that they are integers in the
interval [ 0,L −1] .
We define the dynamic range of an imaging system to be the ratio of the maximum
measurable intensity to the minimum detectable intensity level in the system. As a rule, the
upper limit is determined by saturation and the lower limit by noise.
Closely associated with the concept of dynamic range is image contrast, which is defined as
the difference in intensity between the highest and lowest intensity levels in an image. The
number of bits required to store a digitized image is
b = M × N × k. When M = N, this becomes b = N²k.
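As a quick check of this arithmetic, the following short Python sketch (illustrative values only) computes the storage requirement of a 1024 × 1024, 8-bit image:

```python
# b = M * N * k bits are needed to store an M x N image with k bits per pixel.
M, N, k = 1024, 1024, 8            # illustrative values
b = M * N * k                      # total number of bits
print(b)                           # 8388608 bits
print(b / 8 / 1024 / 1024)         # = 1.0 megabyte
```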
[Figures: examples of spatial resolution for varying N, showing down-sampled and up-sampled versions of the same image.]
Experiment:
Pixel replication:
Pixel replication is applicable when we want to increase the size of an image an
integer number of times.
We can duplicate each column. This doubles the image size in the horizontal
direction.
Then, we duplicate each row of the enlarged image to double the size in the vertical
direction.
Image shrinking is done in a similar manner as just described for zooming. The equivalent
process of pixel replication is row-column deletion. For example, to shrink an image by one-
half, we delete every other row and column.
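A minimal NumPy sketch of 2x zooming by pixel replication and 2x shrinking by row-column deletion (the function names are illustrative, not from the text):

```python
import numpy as np

def zoom_by_replication(img):
    """Double the image size: duplicate every column, then every row."""
    img = np.repeat(img, 2, axis=1)   # horizontal doubling
    return np.repeat(img, 2, axis=0)  # vertical doubling

def shrink_by_deletion(img):
    """Halve the image size by deleting every other row and column."""
    return img[::2, ::2]

f = np.array([[10, 20],
              [30, 40]], dtype=np.uint8)
print(zoom_by_replication(f))                       # 4 x 4 enlarged image
print(shrink_by_deletion(zoom_by_replication(f)))   # back to the original 2 x 2
```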
4-neighbours: N4(p) is the set of the four horizontal and vertical neighbours of a pixel p.
Diagonal neighbours: ND(p) is the set of the four diagonal neighbours of p.
8-neighbours: N8(p) = N4(p) ∪ ND(p).
Let V be the set of gray-level values used to define adjacency, e.g. V = {1}.
4-adjacency: Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
8-adjacency: Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
m-adjacency (mixed adjacency): Two pixels p and q with values from V are m-adjacent if
1. q is in N4(p), or
2. q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.
A (digital) path (or curve) from pixel p with coordinates (x, y) to pixel q with coordinates
(s, t) is a sequence of distinct pixels with coordinates (x0, y0), (x1, y1), ..., (xn, yn), where
(x0, y0) = (x, y), (xn, yn) = (s, t), and pixels (xi, yi) and (xi-1, yi-1) are adjacent for
1 ≤ i ≤ n; n is the length of the path.
Connected set:
Let S represent a subset of pixels in an image. Two pixels p and q are said to be
connected in S if there exists a path between them consisting entirely of pixels in S.
Boundary: The boundary (also called border or contour) of a region R is the set of pixels in
the region that have one or more neighbors that are not in R.
Distance measures: For pixels p, q, and z, with coordinates (x, y), (s, t), and (v, w)
respectively, D is a distance function (metric) if D(p, q) ≥ 0 (with D(p, q) = 0 iff p = q),
D(p, q) = D(q, p), and D(p, z) ≤ D(p, q) + D(q, z).
Linear operations: For any two images f and g and any two scalars a and b, an operator H is
linear if H(af + bg) = aH(f) + bH(g); otherwise it is nonlinear.
Example 1. Consider the image segment shown. Let V={0, 1} and compute the lengths of the
shortest 4-, 8-, and m-path between p and q. If a particular path does not exist between
these two points, explain why.
Sol.
• When V = {0,1}, 4-path does not exist between p and q because it is impossible to get
from p to q by traveling along points that are both 4-adjacent and also have values from
V. Fig. a shows this condition; it is not possible to get to q.
• The shortest 8-path is shown in Fig. b its length is 4.
• The length of the shortest m- path (shown dashed) is 5.
• Both of these shortest paths are unique in this case.
Example 2:
Example 3: Define 4-, 8- and m-adjacency. Compute the lengths of the shortest 4-, 8- and m-
path between p and q in the image segment shown in Fig. by considering V = {2, 3, 4}
(p) (q) : 3 – 2 – 3 – 4 – 2
Module – 2 (RBT Levels: L1, L2, L3)
Spatial Domain: Some Basic Intensity Transformation Functions, Histogram
Processing, Fundamentals of Spatial Filtering, Smoothing Spatial Filters, Sharpening
Spatial Filters.
Frequency Domain: Preliminary Concepts, The Discrete Fourier Transform (DFT)
of Two Variables, Properties of the 2-D DFT, Filtering in the Frequency Domain,
Image Smoothing and Image Sharpening Using Frequency Domain Filters, Selective
Filtering.
[Text: Digital Image Processing - Rafael C. Gonzalez and Richard E. Woods,
Chapter 3: Sections 3.2 to 3.6 and Chapter 4: Sections 4.2, 4.5 to 4.10]
Spatial domain refers to the image plane itself, and image processing methods in this
category are based on direct manipulation of pixels in an image.
Two principal categories of spatial processing are intensity transformations and spatial
filtering.
Intensity transformations operate on single pixels of an image for the purpose of contrast
manipulation and image thresholding.
Spatial filtering deals with performing operations, such as image sharpening, by working
in a neighbourhood of every pixel in an image.
Generally, spatial domain techniques are more efficient computationally and require less
processing resources to implement. The spatial domain processes can be denoted by the
expression
g(x, y) = T[ f (x, y)]
Where f(x, y) is the input image, g(x, y) is the output image, and T is an operator on f
defined over a neighbourhood of point (x, y). The operator can apply to a single image or to a
set of images.
Typically, the neighbourhood is rectangular, centered on (x, y), and much smaller than the
image.
Example:
Suppose that the neighbourhood is a square of size 3×3 and the operator T is defined
as "compute the average intensity of the neighbourhood."
At an arbitrary location in an image, say (10, 15), the output g(10, 15) is computed as
the sum of f(10, 15) and its 8 neighbours, divided by 9.
The origin of the neighbourhood is then moved to the next location and the procedure
is repeated to generate the next value of the output image g.
The smallest possible neighbourhood is of size 1×1.
Image Negatives
The negative of an image with intensity levels in the range [0, L -1] is obtained by using the
negative transformation shown in Figure 3.3, which is given by
s = L -1- r
The negative transformation can be used to enhance white or gray detail embedded in dark
regions of an image.
Log Transformations
The general form of the log transformations is
s = c log(1+ r)
where c is a constant, and r ≥ 0
The log transformation maps a narrow range of low intensity values in the input into a
wider range of output levels. We use a transformation of this type to expand the values of
dark pixels in an image while compressing the higher-level values.
The opposite is true of the inverse log transformation.
Figure 3.5(a) shows a Fourier spectrum with values in the range 0 to 1.5 × 10^6.
Figure 3.5(b) shows the result of applying the log transformation (with c = 1) to the spectrum
values, which rescales them to the range 0 to 6.2, and displaying the result on an 8-bit system.
Power-law (gamma) transformations have the form s = c r^γ, where c and γ are positive
constants. Unlike the log function, changing the value of γ yields a family of possible
transformations. As shown in Figure 3.6, the curves generated with values of γ > 1 have
exactly the opposite effect as those generated with values of γ < 1.
The process used to correct these power-law response phenomena is called gamma
correction.
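The following NumPy sketch implements the three point transformations discussed above (negative, log, and power-law/gamma); the constants c and gamma used here are arbitrary illustrative choices:

```python
import numpy as np

def negative(r, L=256):
    """s = (L - 1) - r"""
    return (L - 1) - r.astype(np.int32)

def log_transform(r, c=1.0):
    """s = c * log(1 + r): expands dark values, compresses bright ones."""
    return c * np.log1p(r.astype(np.float64))

def gamma_transform(r, c=1.0, gamma=0.5, L=256):
    """s = c * r^gamma, with r normalized to [0, 1] and s rescaled to [0, L-1]."""
    r_norm = r.astype(np.float64) / (L - 1)
    return (L - 1) * c * np.power(r_norm, gamma)

r = np.array([[0, 64], [128, 255]], dtype=np.uint8)
print(negative(r))
print(np.round(gamma_transform(r, gamma=0.4)))
```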
Contrast stretching
One of the simplest piecewise linear functions is a contrast stretching transformation.
Contrast-stretching transformation is a process that expands the range of intensity
levels in an image so that it spans the full intensity range of the recording medium or display
device.
Intensity-level slicing
Highlighting a specific range of intensities in an image often is of interest. The
process, often called intensity-level slicing, can be implemented in several ways, though most
are variations of two basic themes. One approach is to display in one value (say, white) all the
values in the range of interest and in another (say, black) all other intensities, as shown in
Figure 3.11 (a).
Another approach is based on the transformation in Figure 3.11(b), which brightens (or
darkens) the desired range of intensities but leaves all other intensities levels in the image
unchanged.
Figure 3.12 (b) shows the result of using a transformation of the form in Figure 3.11 (a),
with the selected band near the top of the scale, because the range of interest is brighter than
the background.
Figure 3.12 (c) shows the result of using the transformation in Figure 3.11 (b) in which a
band of intensities in the mid-gray region around the mean intensity was set to black, while
all other intensities were unchanged.
Bit-plane slicing
Instead of highlighting intensity-level ranges, we could highlight the contribution made to
total image appearance by specific bits.
Figure 3.13 shows an 8-bit image, which can be considered as being composed of eight 1-bit
planes, with plane 1 containing the lowest-order bit of all pixels in the image and plane 8 all
the highest-order bits.
Note that each bit plane is a binary image. For example, all pixels in the border have values 1
1 0 0 0 0 1 0, which is the binary representation of decimal 194. Those values can be viewed
in Figure 3.14 (b) through (i). Decomposing an image into its bit planes is useful for
analysing the relative importance of each bit in the image.
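A short NumPy sketch that extracts the eight bit planes of an 8-bit image (plane 1 holds the lowest-order bit and plane 8 the highest-order bit, as in the text):

```python
import numpy as np

def bit_planes(img):
    """Return binary images: index 0 = plane 1 (LSB), ..., index 7 = plane 8 (MSB)."""
    return [(img >> b) & 1 for b in range(8)]

img = np.array([[194, 0],
                [255, 128]], dtype=np.uint8)
planes = bit_planes(img)
# 194 = 11000010 in binary, so its plane-1 bit is 0 and its plane-8 bit is 1
print(planes[0][0, 0], planes[7][0, 0])   # 0 1
```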
2.2 Histogram Processing
The histogram of a digital image with intensity levels in the range [0, L -1] is a discrete
function h(rk) = nk , where rk is the kth intensity value and nk is the number of pixels in the
image with intensity rk.
It is common practice to normalize a histogram by dividing each of its components by
the total number of pixels in the image, denoted by MN, where M and N are the row and
column dimensions of the image.
A normalized histogram is given by p(r_k) = n_k / MN, for k = 0, 1, ..., L − 1.
Example:
Figure 3.16 shows the pollen image of Figure 3.10 in four basic intensity
characteristics (dark, light, low contrast, and high contrast), along with the histograms
corresponding to these images.
We consider continuous intensity values and let the variable r denote the intensities of an
image. We assume that r is in the range [0, L − 1].
We focus on transformations (intensity mappings) of the form
s = T(r), 0 ≤ r ≤ L − 1
that produce an output intensity level s for every pixel in the input image having intensity r.
Assume that
(a) T(r) is a monotonically increasing function in the interval 0 ≤ r ≤ L − 1, and
(b) 0 ≤ T(r) ≤ L − 1 for 0 ≤ r ≤ L − 1.
In some formulations to be discussed later, we use the inverse
r = T⁻¹(s), 0 ≤ s ≤ L − 1
From Figure 3.17 (a), we can see that it is possible for multiple values to map to a single
value and still satisfy these two conditions, (a) and (b). That is, a monotonic transformation
function can perform a one-to-one or many-to-one mapping, which is perfectly fine when
mapping from r to s.
However, there will be a problem if we want to recover the values of r uniquely from
the mapped values.
As Figure 3.17 (b) shows, requiring that T(r) be strictly monotonic guarantees that the
inverse mappings will be single valued. This is a theoretical requirement that allows us to
derive some important histogram processing techniques.
Prove that the result of applying the transformation s = T(r) = (L − 1) ∫₀^r p_r(w) dw to all
intensity levels r is a set of intensities s with a uniform PDF, independent of the form of the
PDF of the r's.
Solution:
The intensity levels in an image may be viewed as random variables in the interval [0, L − 1].
A fundamental descriptor of a random variable is its probability density function (PDF).
If p_r(r) and p_s(s) denote the PDFs of r and s, then for a monotonic transformation
p_s(s) = p_r(r) |dr/ds|                                            (1)
With s = T(r) = (L − 1) ∫₀^r p_r(w) dw, we have ds/dr = (L − 1) p_r(r), so
p_s(s) = p_r(r) · 1 / [(L − 1) p_r(r)] = 1 / (L − 1), 0 ≤ s ≤ L − 1  (2)
which is a uniform PDF.
Example 2:
For a given 4X4 image having 0 – 9 gray scales, perform histogram equalization and draw the
histogram of image before and after equalization. 4X4 image is shown in Fig.
Solution:
Given image:        Equalized image:
2 3 3 2             3 6 6 3
4 2 4 3             8 3 8 6
3 2 3 5             6 3 6 9
2 4 2 4             3 8 3 8
(Mapping obtained from the equalization transformation: 2 → 3, 3 → 6, 4 → 8, 5 → 9.)
[Histograms of the image before and after equalization are plotted over the occupied gray
levels 2, 3, 4, 5 and 3, 6, 8, 9 respectively.]
• Obtain pr(rj) from the input image and then obtain the values of sk, round the value to
the integer range [0, L-1].
s_k = T(r_k) = (L − 1) Σ_{j=0..k} p_r(r_j) = [(L − 1)/MN] Σ_{j=0..k} n_j, k = 0, 1, ..., L − 1
• Use the specified PDF and obtain the transformation function G(z_q); round the values
to the integer range [0, L−1]:
G(z_q) = (L − 1) Σ_{i=0..q} p_z(z_i) = s_k
• Mapping from s_k to z_q:
z_q = G⁻¹(s_k)
Note: Refer to the Problems in Class Work
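As one possible cross-check of Example 2, the sketch below implements the discrete equalization transformation s_k = round((L − 1) Σ p_r(r_j)) in NumPy and reproduces the mapping 2→3, 3→6, 4→8, 5→9 for the 4×4 image:

```python
import numpy as np

def histogram_equalize(img, L):
    """Map each level r_k to s_k = round((L - 1) * cumulative histogram)."""
    hist = np.bincount(img.ravel(), minlength=L)   # n_k
    p = hist / img.size                            # p_r(r_k) = n_k / MN
    s = np.round((L - 1) * np.cumsum(p)).astype(int)
    return s[img]

f = np.array([[2, 3, 3, 2],
              [4, 2, 4, 3],
              [3, 2, 3, 5],
              [2, 4, 2, 4]])
print(histogram_equalize(f, L=10))   # matches the equalized image of Example 2
```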
Procedure: use the histogram statistics (mean and variance) for enhancement.
Mean value of r:
m = Σ_{i=0..L−1} r_i p(r_i) = (1/MN) Σ_{x=0..M−1} Σ_{y=0..N−1} f(x, y)
Variance:
σ² = μ₂(r) = Σ_{i=0..L−1} (r_i − m)² p(r_i) = (1/MN) Σ_{x=0..M−1} Σ_{y=0..N−1} [f(x, y) − m]²
Arithmetic operations
Image subtraction
Image Averaging:
• Let g(x, y) be the noisy image formed by the addition of noise η(x, y) to an original image
f(x, y), i.e. g(x, y) = f(x, y) + η(x, y).
• If the noise is uncorrelated and has zero average value at each pair of coordinates, then
averaging K noisy images gives ḡ(x, y) with E{ḡ(x, y)} = f(x, y) and σ²_ḡ = (1/K) σ²_η.
Note:
1. As K increases, the variability (noise) at each pixel decreases.
2. ḡ(x, y) approaches f(x, y) as the number of noisy images K used in the averaging
process increases.
The mechanics of linear spatial filtering of an image with a filter (mask) of size m×n are
given by
g(x, y) = Σ_{s=−a..a} Σ_{t=−b..b} w(s, t) f(x + s, y + t)
where m = 2a + 1 and n = 2b + 1.
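A straightforward (unoptimized) NumPy sketch of this filtering operation, using zero padding at the borders; the border-handling choice is an assumption, not specified in the text:

```python
import numpy as np

def spatial_filter(f, w):
    """Linear spatial filtering of image f with an m x n kernel w (correlation, zero padding)."""
    m, n = w.shape
    a, b = m // 2, n // 2                          # m = 2a + 1, n = 2b + 1
    fp = np.pad(f.astype(np.float64), ((a, a), (b, b)))
    g = np.zeros(f.shape, dtype=np.float64)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            g[x, y] = np.sum(w * fp[x:x + m, y:y + n])
    return g

f = np.arange(25, dtype=np.float64).reshape(5, 5)
w = np.ones((3, 3)) / 9.0                          # 3 x 3 averaging (smoothing) mask
print(spatial_filter(f, w))
```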
Foundation:
∂²f/∂x² = f(x + 1) + f(x − 1) − 2f(x)
The second-order isotropic derivative operator is the Laplacian; for a function (image) f(x, y),
∇²f = ∂²f/∂x² + ∂²f/∂y²
∂²f/∂x² = f(x + 1, y) + f(x − 1, y) − 2f(x, y)
∂²f/∂y² = f(x, y + 1) + f(x, y − 1) − 2f(x, y)
∇²f = f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4f(x, y)
Image sharpening using the Laplacian: g(x, y) = f(x, y) + c[∇²f(x, y)], where c = −1 when the
Laplacian mask has a negative centre coefficient.
Unsharp masking / highboost filtering: g(x, y) = f(x, y) + k[f(x, y) − f̄(x, y)], where f̄ is a
blurred (smoothed) version of f. If k = 1 this is unsharp masking; k > 1 gives highboost
filtering.
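A compact sketch of both sharpening methods using SciPy's ndimage routines; the 3×3 Laplacian kernel and the box blur used for the mask are common choices assumed here, not taken from the text:

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

def laplacian_sharpen(f):
    """g = f + c * laplacian(f), with c = -1 for a kernel whose centre coefficient is -4."""
    lap_kernel = np.array([[0,  1, 0],
                           [1, -4, 1],
                           [0,  1, 0]], dtype=np.float64)
    lap = convolve(f.astype(np.float64), lap_kernel, mode='nearest')
    return f - lap

def unsharp_mask(f, k=1.0, size=3):
    """g = f + k * (f - blurred); k = 1 is unsharp masking, k > 1 is highboost filtering."""
    blurred = uniform_filter(f.astype(np.float64), size=size, mode='nearest')
    return f + k * (f - blurred)

f = np.arange(25, dtype=np.float64).reshape(5, 5)
print(laplacian_sharpen(f))
print(unsharp_mask(f, k=2.0))   # highboost
```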
The gradient magnitude M(x, y) = √(g_x² + g_y²) ≈ |g_x| + |g_y| can be approximated with
simple differences of neighbouring pixels, e.g.
M(x, y) ≈ |z₈ − z₅| + |z₆ − z₅|
where |C| = √(R² + I²) is the length of the vector extending from the origin of the complex
plane to the point (R, I), and θ is the angle between the vector and the real axis.
Central to the study of linear systems and the Fourier transform is the concept of an
impulse and its sifting property. A unit impulse of a continuous variable t located at t = 0 ,
denoted δ(t) , is defined as
General Form:
2.6.5 Convolution
2-D impulse exhibits the sifting property under integration, and given by
Sampling in 2-D can be modeled using the sampling function (2-D impulse train),
where T and Z are the separations between samples along the t-axis and z-axis.
Aliasing in Images:
There are two principal manifestations of aliasing in images:
• Spatial aliasing, which is due to under-sampling;
• Temporal aliasing, which is related to time intervals between images in a sequence of
images.
Assignment
Equation indicates that rotating f (x,y) by an angle θ0 will rotate F(u,v) by the same angle.
Conversely, rotating F(u,v) will rotate f (x,y) by the same angle.
2.8.3 Periodicity
The 2-D Fourier transform and its inverse are infinitely periodic in the u and v directions:
Given a digital image f(x, y) of size M×N, the basic filtering equation in the frequency domain
is g(x, y) = ℑ⁻¹[H(u, v) F(u, v)], where ℑ⁻¹ is the inverse DFT, F(u, v) is the DFT of the input
image, and H(u, v) is the filter transfer function.
Highpass filter:
– Enhances sharp details
– But causes a reduction in contrast
• A highpass filter is obtained from a given lowpass filter using the equation
H_HP(u, v) = 1 − H_LP(u, v)
Homomorphic Filtering:
STAGE 1: Take a natural logarithm of both sides to decouple i(x, y) and r(x, y)
components and apply transforms.
STAGE 2: Use the Fourier transform to transform the image into frequency domain:
Where Fi(u, v) and Fr(u, v) are the Fourier transforms of lni(x, y) and lnr(x, y)
respectively.
STAGE 3: High pass the Z(u, v) by means of a filter function H(u, v) in frequency
domain, and get a filtered version S(u, v) as the following:
STAGE 4: Take an inverse Fourier transform to get the filtered image in the spatial
domain:
STAGE 5: The filtered enhanced image g(x, y) can be obtained by using the
following equations:
– Because the natural log (ln) was taken in stage 1, the filtered enhanced image is
g(x, y) = e^(s(x, y)) = i₀(x, y) · r₀(x, y).
[Illumination and reflectance are not separable, but their approximate locations in the
frequency domain may be located.]
Process specific bands of frequencies or small regions of the rectangle, which are called
bandreject or bandpass filters.
Notch Filters
Notch reject filters are constructed as products of highpass filters whose centers have
been translated to the centers of the notches
Module – 3
Restoration: Noise models, Restoration in the Presence of Noise Only using Spatial
Filtering and Frequency Domain Filtering, Linear, Position- Invariant Degradations,
Estimating the Degradation Function, Inverse Filtering, Minimum Mean Square Error
(Wiener) Filtering, Constrained Least Squares Filtering.
[Text: Digital Image Processing - Rafael C. Gonzalez and Richard E. Woods
Chapter 5: Sections 5.2, to 5.9]
3.1 Restoration
The degraded image in the spatial domain is modelled as g(x, y) = h(x, y) ★ f(x, y) + η(x, y),
where h(x, y) is the spatial representation of the degradation function and "★"
indicates convolution. Therefore, we have the frequency domain representation
G(u, v) = H(u, v)F(u, v) + N(u, v).
The principal sources of noise in digital images arise during image acquisition and/or
transmission.
In the spatial domain, we are interested in the parameters that define the spatial
characteristics of noise, and whether the noise is correlated with the image.
Frequency properties refer to the frequency content of noise in the Fourier sense.
Noise is independent of spatial coordinates and it is uncorrelated with respect to the
image itself.
Gaussian noise
Because of its mathematical tractability in both the spatial and frequency domains, Gaussian
(normal) noise models are used frequently in practice.
The probability density function (PDF) of a Gaussian random variable, z, is given by
p(z) = [1/(√(2π) σ)] e^(−(z − z̄)²/(2σ²))
where z represents intensity, z̄ is the mean (average) value of z, and σ is its standard
deviation.
Rayleigh noise
The probability density function of Rayleigh noise is given by
where a > 0 and b is a positive integer. The mean and variance of this density are given by
Exponential noise
The PDF of exponential noise is given by
Where a > 0. The mean and variance of this density are given by
Uniform noise
The PDF of uniform noise is constant over an interval [a, b] and zero elsewhere.
Impulse (salt-and-pepper) noise
The PDF of bipolar impulse noise takes value P_a at intensity a and P_b at intensity b.
If b > a, intensity b appears as a light dot in the image. Conversely, intensity a will appear
like a dark dot.
If either Pa or Pb is zero, the impulse noise is called unipolar.
If neither Pa nor Pb is zero, and especially if they are approximately equal, the
impulse noise values will resemble salt-and-pepper granules randomly distributed over the
image.
Periodic Noise
Periodic noise in an image arises typically from electrical or electromechanical interference
during image acquisition.
The periodic noise can be reduced significantly via frequency domain filtering,
The parameters of periodic noise can be estimated by inspection of the Fourier spectrum of
the image.
Periodic noise tends to produce frequency spikes, which are detectable even by visual
analysis.
In simplistic cases, it is also possible to infer the periodicity of noise components
directly from the image.
Automated analysis is possible if the noise spikes are either exceptionally
pronounced, or when knowledge is available about the general location of the frequency
components of the interference.
It is often necessary to estimate the noise probability density functions for a particular
imaging arrangement.
When images already generated by a sensor are available, it may be possible to
estimate the parameters of the probability density functions from small patches of reasonably
constant background intensity.
The simplest use of the data from the image strips is for calculating the mean and
variance of the intensity levels. Let S denote a strip and p_S(z_i), i = 0, 1, 2, ..., L − 1, denote
the probability estimates of the intensities of the pixels in S; then the mean and variance of
the pixels in S are
z̄ = Σ_{i=0..L−1} z_i p_S(z_i)   and   σ² = Σ_{i=0..L−1} (z_i − z̄)² p_S(z_i)
Note:
The shape of the histogram identifies the closest probability density function match.
The Gaussian probability density function is completely specified by these two
parameters.
For the other shapes discussed previously, we can use the mean and variance to solve
the parameters a and b.
Impulse noise is handled differently because the estimate needed is of the actual
probability of occurrence of the white and black pixels.
3.3 Restoration in the Presence of Noise Only using Spatial Filtering and
Frequency Domain
When the only degradation present in an image is noise,
g(x,y) = h(x,y)★f (x,y) + η(x,y) and G(u,v) = H(u,v)F(u,v) + N(u,v)
Become
g(x,y) = f (x,y) + η (x,y) and G(u,v) = F(u,v) + N(u,v)
Since the noise terms are unknown, subtracting them from g(x,y ) or G(u,v ) is not a
realistic option.
In the case of periodic noise, it usually is possible to estimate N (u, v) from the
spectrum of G(u, v) .
A geometric mean filter achieves smoothing comparable to the arithmetic mean filter, but it
tends to lose less image detail in the process.
Harmonic mean filter: works well for salt noise and for some other types of noise like
Gaussian noise, but fails for pepper noise.
Contraharmonic mean filter
The contra-harmonic mean filter yields a restored image based on the expression
The positive-order filter did a better job of cleaning the background, at the expense of
slightly thinning and blurring the dark areas.
The opposite was true of the negative-order filter.
In general, the arithmetic and geometric mean filters are suited for random noise like
Gaussian or uniform noise.
The contraharmonic mean filter is well suited for impulse noise, with the
disadvantage that it must be known whether the noise is dark or light in order to select Q.
Order-statistic filters are spatial filters whose response is based on ordering (ranking) the
values of the pixels contained in the image area encompassed by the filter.
Median filter
The best-known order-statistic filter is the median filter, which will replace the value
of a pixel by the median of the intensity levels in the neighbourhood of that pixel:
The median filters are particularly effective in the presence of both bipolar and
unipolar impulse noise.
The max filter is useful for finding the brightest points in an image, while the min
filter can be used for finding the darkest points in an image.
Midpoint filter
The midpoint filter computes the midpoint between the maximum and minimum
values in the area encompassed by the filter:
The midpoint filter works best for random distributed noise, like Gaussian or uniform noise.
When d = 0 , the alpha-trimmed mean filter is reduced to the arithmetic mean filter.
If d = mn -1 , the alpha-trimmed mean filter becomes a median filter.
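A minimal NumPy sketch of the order-statistic filters described above (median, max, min, midpoint, and alpha-trimmed mean); the edge-replicated border and the 3×3 window are arbitrary choices here:

```python
import numpy as np

def order_statistic_filter(g, m=3, n=3, kind='median', d=2):
    """kind: 'median', 'max', 'min', 'midpoint', or 'alpha' (alpha-trimmed mean, parameter d)."""
    a, b = m // 2, n // 2
    gp = np.pad(g.astype(np.float64), ((a, a), (b, b)), mode='edge')
    out = np.zeros(g.shape, dtype=np.float64)
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            w = np.sort(gp[x:x + m, y:y + n].ravel())
            if kind == 'median':
                out[x, y] = w[w.size // 2]
            elif kind == 'max':
                out[x, y] = w[-1]
            elif kind == 'min':
                out[x, y] = w[0]
            elif kind == 'midpoint':
                out[x, y] = 0.5 * (w[0] + w[-1])
            elif kind == 'alpha':                  # drop the d/2 lowest and d/2 highest values
                out[x, y] = w[d // 2: w.size - d // 2].mean()
    return out

g = np.array([[10, 10, 255],
              [10,  0,  10],
              [10, 10,  10]], dtype=np.float64)
print(order_statistic_filter(g, kind='median'))    # removes the salt (255) and pepper (0) values
```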
The simplest statistical measures of a random variable are its mean and variance,
which are reasonable parameters for an adaptive filter.
The mean gives a measure of average intensity in the region over which the mean is
computed, and the variance gives a measure of contrast in that region.
The response of a filter, which operates on a local region Sxy, at any point (x,y) is to be
based on four quantities:
(a) g(x,y) , the value of the noisy image at (x,y) ;
(b) σ_η², the variance of the noise corrupting f(x,y) to form g(x,y);
(c) mL , the local mean of the pixels in Sxy ;
(d) σL2 , the local variance of the pixels in Sxy .
1. If σ_η² is zero, the filter should return simply the value of g(x, y) (the zero-noise case).
2. If the local variance σ_L² is high relative to σ_η², the filter should return a value close to
g(x, y); a high local variance typically is associated with edges, which should be preserved.
3. If the two variances are equal, we want the filter to return the arithmetic mean value of the
pixels in Sxy.
This condition occurs when the local area has the same properties as the overall
image, and local noise is to be reduced simply by averaging.
Based on these assumptions, an adaptive expression for obtaining f̂(x, y) may be written as
f̂(x, y) = g(x, y) − (σ_η²/σ_L²) [g(x, y) − m_L]
The only quantity that needs to be estimated is the variance of the overall noise, σ_η²; the
other parameters can be computed from the pixels in Sxy.
The median filter discussed previously performs well if the spatial density of the
impulse noise is not large (Pa and Pb are less than 0.2 ).
The adaptive median filtering can handle impulse noise with probabilities larger than
these. Unlike other filters, the adaptive median filter changes the size of Sxy during operation,
depending on certain conditions.
Let:
zmin = minimum intensity value in Sxy
zmax = maximum intensity value in Sxy
zmed = median of intensity values in Sxy
zxy = intensity value at coordinates (x,y)
Smax = maximum allowed size of Sxy
Purpose:
To remove salt-and-pepper (impulse) noise;
To provide smoothing of other noise that may not be impulsive; and
To reduce the distortion of object boundaries.
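A sketch of the two-stage adaptive median algorithm described above (the window grows from 3×3 up to S_max); this is a plain, unoptimized implementation under the usual stage-A/stage-B logic:

```python
import numpy as np

def adaptive_median_filter(g, s_init=3, s_max=7):
    """Adaptive median filtering: grow the window while zmed is an impulse (stage A);
    keep z_xy if it is not an impulse, otherwise output zmed (stage B)."""
    pad = s_max // 2
    gp = np.pad(g.astype(np.float64), pad, mode='edge')
    out = np.zeros(g.shape, dtype=np.float64)
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            s = s_init
            while True:
                r = s // 2
                win = gp[x + pad - r: x + pad + r + 1, y + pad - r: y + pad + r + 1]
                zmin, zmax, zmed = win.min(), win.max(), np.median(win)
                zxy = gp[x + pad, y + pad]
                if zmin < zmed < zmax:                               # stage A: zmed is not an impulse
                    out[x, y] = zxy if zmin < zxy < zmax else zmed   # stage B
                    break
                s += 2                                               # grow the window
                if s > s_max:
                    out[x, y] = zmed                                 # window limit reached
                    break
    return out
```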
Periodic noise can be analyzed and filtered effectively by using frequency domain
techniques.
Bandreject Filters
Rejects (attenuates) band of frequencies and allows the rest
Figure shows perspective plots of ideal, Butterworth, and Gaussian bandreject filters,
One of the principal applications of bandreject filtering is for noise removal in applications
where the general location of the noise component(s) in the frequency domain is
approximately known.
Illustration:
The noise components can be seen as symmetric pairs of bright dots in the Fourier spectrum
shown in Figure (b)
Since the components lie on an approximate circle about the origin of the transform,
a circularly symmetric bandreject filter is a good choice.
Bandpass Filters
Notch Filters
A notch filter rejects/passes frequencies in predefined neighbourhoods about a center
frequency. Figure shows plots of ideal, Butterworth, and Gaussian notch (reject) filters.
Restoration is done by Placing the notch filter at the location of spike (noise).
The transfer function HNP (u,v) of a notch pass filter is obtained from a corresponding
notch reject filter transfer function, HNR(u,v) , by using the equation
HNP (u,v) = 1 - HNR(u,v)
When several interference components are present, the methods discussed previously are not
always acceptable because they may remove too much image information in the filtering
process.
Optimum notch filtering minimizes local variances of the restored estimate f̂(x, y).
Step 1: The first step in Optimum notch filtering is to find the principal frequency
components and placing notch pass filters at the location of each spike in G(u,v), yielding
H(u, v). The Fourier transform of the interference pattern is thus given by
N(u, v)=HNP(u, v) G(u, v)
where G(u, v)is the Fourier transform of the corrupted image.
Step 2: The corresponding interference pattern in the spatial domain is obtained with the
inverse Fourier transform
Step 4: The effect of components not present in the estimate of η (x,y) can be minimized by
subtracting from g(x,y) a weighted portion of η (x,y) to obtain an estimate of f (x,y) :
Note: Weighting function can be chosen according to need; one approach minimizes the local
variance
The input-output relationship shown in Figure before the restoration can be expressed as
g (x, y) = H [f(x, y)] + η(x, y) (1)
First, we assume that η(x, y) = 0, so that g(x, y) = H[f(x, y)].
Linearity Property:
H is Linear if:
H [af1(x, y) + bf2(x, y)] = a H [ f1(x, y)] + b H [ f2(x, y)] (2)
where a and b are scalars and f1(x, y) and f2(x, y) are any two input images. If a = b = 1, then
equation (2) becomes
H[f1(x, y) + f2(x, y)] = H[f1(x, y)] + H[f2(x, y)] (3)
which is called the property of additivity.
If b = 0, equation (2) becomes H[af1(x, y)] = aH[f1(x, y)] (4).
This is called the property of homogeneity. It says that the response to a constant multiple of
any input is equal to the response to that input multiplied by the same constant.
Position- Invariant:
An operator having the input-output relationship
g(x,y) = H [ f (x,y)]
is said to be position (space) invariant if
H [ f (x – α, y - β)] = g(x – α, y - β) (5)
For any f (x,y) and any α & β . Eq (5) indicates that the response at any point in the
image depends only on the value of the input at that point, not on its position.
Using the sifting property of the impulse, f(x, y) can be expressed as
f(x, y) = ∫∫ f(α, β) δ(x − α, y − β) dα dβ (6)
Then, with η(x, y) = 0,
g(x, y) = H[f(x, y)] = H[ ∫∫ f(α, β) δ(x − α, y − β) dα dβ ] (7)
and, by additivity extended to integrals,
g(x, y) = ∫∫ H[f(α, β) δ(x − α, y − β)] dα dβ (8)
Since f(α, β) is independent of x and y, using the homogeneity property, it follows that
g(x, y) = ∫∫ f(α, β) H[δ(x − α, y − β)] dα dβ = ∫∫ f(α, β) h(x, α, y, β) dα dβ (9)
where the term h(x, α, y, β) = H[δ(x − α, y − β)] is called the impulse response of H.
In other words, if η(x, y) = 0, then h(x, α, y, β) is the response of H to an impulse at (x, y).
Equation (9) is called the superposition (or Fredholm) integral of the first kind, and is a
fundamental result at the core of linear system theory.
If H is position invariant, we have
H[δ(x − α, y − β)] = h(x − α, y − β)
This tells us that knowing the impulse response of a linear, position-invariant system allows
us to compute its response, g, to any input f. The result is simply the convolution of the
impulse response and the input function.
In the presence of additive noise, Equation (9) becomes,
Assuming that the values of the random noise η (x,y) are independent of position, we have
Based on the convolution theorem, we can express above equation in the frequency domain
as
A linear, spatially invariant degradation system with additive noise can be modelled in the
spatial domain as the convolution of the degradation function with an image, followed by the
additive of noise. The same process can be expressed in the frequency domain.
There are three principal ways to estimate the degradation function used in image restoration.
Estimation by Image Observation
Estimation by Experimentation
Estimation by Modeling
3.6.1 Estimation by Image Observation
For example, suppose that a radial plot of Hs(u, v) has the approximate shape of a Gaussian
curve. Then we can construct a function H(u, v) on a larger scale, but having the same basic
shape. This estimation is a laborious process used only in very specific circumstances.
3.6.2 Estimation by Experimentation
If equipment similar to the equipment used to acquire the degraded image is available, it is
possible in principle to obtain an accurate estimate of the degradation.
Images similar to the degraded image can be acquired with various system settings
until they are degraded as closely as possible to the image we wish to restore.
Then the idea is to obtain the impulse response on the degradation by imaging an
impulse (small dot of light) using the same system settings.
An impulse is simulated by a bright dot of light, as bright as possible to reduce the
effect of noise to negligible values. Since the Fourier transform of an impulse is a constant, it
follows that H(u, v) = G(u, v)/A, where G(u, v) is the Fourier transform of the observed
(degraded) impulse image and A is the strength of the impulse.
3.6.3 Estimation by Modeling
Degradation modeling has been used for years. In some cases, the model can even take into
account environmental conditions that cause degradations.
For example, a degradation model proposed by Hufnagel and Stanley is based on the
physical characteristics of atmospheric turbulence:
H(u, v) = e^(−k(u² + v²)^(5/6))
where k is a constant that depends on the nature of the turbulence. The figure shows examples
of using this equation with different values of k.
The total exposure at any point of the recording medium is obtained by integrating the
instantaneous exposure over the time interval when the imaging system shutter is open. If the
T is the duration of the exposure, the blurred image g(x, y) is
Since the term inside the outer brackets is the Fourier transform of the displaced
function f [x - x0(t), y - y0(t)], we have
By defining
H(u, v) = ∫₀^T e^(−j2π[u x₀(t) + v y₀(t)]) dt
we can rewrite the result as
G(u, v) = H(u, v) F(u, v)
Therefore, the simplest approach to restoration is direct inverse filtering:
F̂(u, v) = G(u, v) / H(u, v) (1)
Substituting G(u, v) = H(u, v)F(u, v) + N(u, v) gives
F̂(u, v) = F(u, v) + N(u, v) / H(u, v) (2)
Case 1: We cannot recover the undegraded image exactly because N(u,v) is not known.
Case 2: If the degradation function H(u,v) has zero or very small values, so the second term
of Eq (2) could easily dominate the estimate of ̂ .
One approach to get around the zero or small-value problem is to limit the filter
frequencies to values near the origin. As we know that H(0,0) is usually the highest value of
H(u, v) in the frequency domain.
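The sketch below illustrates this idea: direct inverse filtering with the frequencies restricted to a radius D0 about the centered origin. The Gaussian lowpass used as H(u, v) is only an assumed, illustrative degradation:

```python
import numpy as np

def radially_limited_inverse_filter(g, H, D0):
    """F_hat = G / H inside a circle of radius D0 about the centered origin, 0 outside."""
    M, N = g.shape
    G = np.fft.fftshift(np.fft.fft2(g))
    U, V = np.meshgrid(np.arange(M) - M // 2, np.arange(N) - N // 2, indexing='ij')
    D = np.sqrt(U**2 + V**2)
    F_hat = np.where(D <= D0, G / H, 0.0)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))

# Illustrative degradation: centered Gaussian lowpass H(u, v) applied to a test image
M = N = 64
U, V = np.meshgrid(np.arange(M) - M // 2, np.arange(N) - N // 2, indexing='ij')
H = np.exp(-(U**2 + V**2) / (2 * 15.0**2))
f = np.zeros((M, N)); f[24:40, 24:40] = 1.0
g = np.real(np.fft.ifft2(np.fft.ifftshift(H * np.fft.fftshift(np.fft.fft2(f)))))
f_hat = radially_limited_inverse_filter(g, H, D0=20)
```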
The Wiener filter minimizes the mean square error e² = E{(f − f̂)²}. (1)
Assumptions:
1. The noise and the image are uncorrelated.
2. One or the other has zero mean.
3. The intensity levels in the estimate are a linear function of the levels in the degraded
image.
Based on the above assumptions, and with the minimum of the error function in equation (1),
the restored image can be obtained in the frequency domain by the expression
F̂(u, v) = [ (1/H(u, v)) · |H(u, v)|² / (|H(u, v)|² + K) ] G(u, v)
where K is a constant that approximates the noise-to-signal power ratio.
Note: The value of K was chosen interactively to yield the best visual result.
Note: If the noise or K value is zero, then the Wiener filter reduces to the inverse filter.
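A minimal NumPy sketch of the parametric Wiener filter, written equivalently with the complex conjugate as F̂(u,v) = [H*(u,v) / (|H(u,v)|² + K)] G(u,v); the Gaussian H and the value of K below are illustrative assumptions (K would normally be chosen interactively, as noted above):

```python
import numpy as np

def wiener_filter(g, H, K=0.01):
    """Parametric Wiener filter; H must use the same (unshifted) frequency layout as fft2(g)."""
    G = np.fft.fft2(g)
    F_hat = (np.conj(H) / (np.abs(H)**2 + K)) * G   # with K = 0 this reduces to G / H
    return np.real(np.fft.ifft2(F_hat))

# Illustrative degradation: Gaussian blur plus additive Gaussian noise
M = N = 64
U, V = np.meshgrid(np.fft.fftfreq(M) * M, np.fft.fftfreq(N) * N, indexing='ij')
H = np.exp(-(U**2 + V**2) / (2 * 15.0**2))
rng = np.random.default_rng(0)
f = np.zeros((M, N)); f[24:40, 24:40] = 1.0
g = np.real(np.fft.ifft2(H * np.fft.fft2(f))) + 0.01 * rng.standard_normal((M, N))
f_hat = wiener_filter(g, H, K=0.01)
```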
If one considers the restored image to be signal and the difference between this image and the
original to be noise, we can define a signal-to-noise ratio in the spatial domain as
SNR = Σ_x Σ_y f̂(x, y)² / Σ_x Σ_y [f(x, y) − f̂(x, y)]²
Module – 4 (RBT Levels: L1, L2, L3)
Color Image Processing: Color Fundamentals, Color Models, Pseudocolor Image
Processing.
Wavelets: Background, Multi-resolution Expansions.
Morphological Image Processing: Preliminaries, Erosion and Dilation, Opening and
Closing, The Hit-or-Miss Transforms, Some Basic Morphological Algorithms.
[Text: Chapter 6: Sections 6.1 to 6.3, Chapter 7: Sections 7.1 and 7.2, Chapter 9: Sections 9.1 to 9.5]
Characterization of light:
Chromaticity:
Chromaticity diagram
Intensity Slicing:
Fig. shows a plane at f(x, y)=li to slice the image function into two levels. If a
different color is assigned to each side of the plane shown in Fig., any pixel whose gray scale
is above the plane will be coded with one color, and any pixel below the plane will be coded
with the other.
The result is a two color image whose relative appearance can be controlled by
moving the slicing plane up and down the gray level axis. The idea of planes is useful
primarily for a geometric interpretation of the intensity-slicing technique. When more levels
are used, the mapping function takes on a staircase form.
1.1 Preliminaries:
"Morphology" – a branch of biology that deals with the form and structure of animals
and plants.
"Mathematical morphology" – a tool for extracting image components that are
useful in the representation and description of region shape.
The language of mathematical morphology is set theory.
It provides a unified and powerful approach to numerous image processing problems.
In binary images , the set elements are members of the 2-D integer space – Z2. where
each element (x, y) is a coordinate of a black (or white) pixel in the image.
Used to extract image components that are useful in the representation and description of
region shape, such as
Boundaries extraction
Skeletons
Convex hull
Morphological filtering
Thinning
Pruning
Basic Concepts in Set Theory
Subset: A ⊆ B means every element in set A is also in set B.
Union: A ∪ B is the set of all elements belonging to A, to B, or to both.
Intersection: A ∩ B is the set of all elements belonging to both A and B.
Disjoint / mutually exclusive: A ∩ B = ∅.
Complement: Aᶜ = {w | w ∉ A}.
Difference: A − B = {w | w ∈ A, w ∉ B} = A ∩ Bᶜ.
Reflection: B̂ = {w | w = −b, for b ∈ B}.
Translation: (A)z = {c | c = a + z, for a ∈ A}.
Logic operations are just a special case of binary set operations: AND corresponds to
intersection, OR to union, and NOT to complementation.
Here the reflection is with respect to a specific origin, such as a point center in the shape, e.g.,
the center of the shape.
Example:
Erosion
Dilation
Opening
Closing
Hit-or-Miss transform
Erosion: A ⊖ B = {z | (B)z ⊆ A}. This equation indicates that the erosion of A by B is the set
of all points z such that B, translated by z, is contained in A.
Dilation: A ⊕ B = {z | (B̂)z ∩ A ≠ ∅}. This equation is based on obtaining the reflection of B
about its origin and shifting this reflection by z; the dilation of A by B is the set of all
displacements z such that B̂ and A overlap by at least one element. Based on this
interpretation, the equation can also be rewritten as A ⊕ B = {z | [(B̂)z ∩ A] ⊆ A}.
Usefulness:
Duality:
Dilation and erosion are duals of each other with respect to set complementation and
reflection. That is,
(A ⊖ B)ᶜ = Aᶜ ⊕ B̂
or
(A ⊕ B)ᶜ = Aᶜ ⊖ B̂
In other words, dilating the "foreground" is the same as eroding the "background", but the
structuring element reflects between the two. Likewise, eroding the foreground is the same as
dilating the background.
So, strictly speaking, we don't really need both dilate and erode: with one or the other, and
with set complement and reflection of the structuring element, we can achieve the same
functionality. Hence, dilation and erosion are duals.
Proof:
We must show that (A ⊖ B)ᶜ = Aᶜ ⊕ B̂.
By the definition of erosion,
(A ⊖ B)ᶜ = {z | (B)z ⊆ A}ᶜ
If the set (B)z is contained in A, then (B)z ∩ Aᶜ = ∅; therefore
(A ⊖ B)ᶜ = {z | (B)z ∩ Aᶜ = ∅}ᶜ
But the complement of the set of z's satisfying (B)z ∩ Aᶜ = ∅ is the set of z's such that
(B)z ∩ Aᶜ ≠ ∅. Therefore,
(A ⊖ B)ᶜ = {z | (B)z ∩ Aᶜ ≠ ∅} = Aᶜ ⊕ B̂
Hence proved.
Opening:
A ∘ B = (A ⊖ B) ⊕ B
Remember that erosion finds all the places where the structuring element fits inside the
image, but it only marks these positions at the origin of the element.
By following an erosion by a dilation, we "fill back in" the full structuring element at
places where the element fits inside the object.
So, an opening can be considered to be the union of all translated copies of the structuring
element that can fit inside the object. Openings can be used to remove small objects,
protrusions from objects, and connections between objects.
Smooth the contour of an image, breaks narrow isthmuses, eliminates thin protrusions
Closing: A • B = (A ⊕ B) ⊖ B
Whereas opening removes all pixels where the structuring element won‘t fit inside the
image foreground, closing fills in all places where the structuring element will not fit in the
image background.
Smooth the object contour, fuse narrow breaks and long thin gulfs, eliminate small holes,
and fill in gaps.
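A short sketch of binary erosion, dilation, opening, and closing using SciPy's ndimage routines; the 3×3 square structuring element and the tiny test image are arbitrary choices:

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation, binary_opening, binary_closing

A = np.zeros((9, 9), dtype=bool)
A[2:7, 2:7] = True                    # a 5 x 5 square object
A[4, 0] = True                        # an isolated "noise" pixel
B = np.ones((3, 3), dtype=bool)       # structuring element

eroded  = binary_erosion(A, structure=B)
dilated = binary_dilation(A, structure=B)
opened  = binary_opening(A, structure=B)   # removes the isolated pixel and small protrusions
closed  = binary_closing(A, structure=B)   # fills small holes and gaps

# Duality check for this example: (A eroded by B)^c equals A^c dilated by B
# (B is symmetric here, so its reflection equals B itself).
print(np.array_equal(~eroded, binary_dilation(~A, structure=B)))   # True here
```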
Properties
Opening
(i) A°B is a subset (subimage) of A
(ii) If C is a subset of D, then C °B is a subset of D °B
(iii) (A °B) °B = A °B
Closing
(i) A is a subset (subimage) of A•B
(ii) If C is a subset of D, then C •B is a subset of D •B
(iii) (A •B) •B = A •B
Steps (hit-or-miss transform, with structuring element pair B = (X, W − X)):
1. Erode A by X.
2. Erode Aᶜ by (W − X), where W is a small window enclosing X.
3. The intersection of the two results, A ⊛ B = (A ⊖ X) ∩ [Aᶜ ⊖ (W − X)], detects the
location of the object X.
Extract image components that are useful in the representation and description of shape:
Boundary extraction
Region filling
Extract of connected components
Convex hull
Thinning
Thickening
Skeleton
Pruning
Boundary Extraction:
First, erode A by B, then take the set difference between A and the erosion:
β(A) = A − (A ⊖ B)
The thickness of the contour depends on the size of the structuring element B.
Region Filling
This algorithm is based on a set of dilations, complementation and intersections.
p is a point inside the boundary, with the value 1 (the seed X₀ = {p}).
X_k = (X_{k−1} ⊕ B) ∩ Aᶜ, k = 1, 2, 3, ...; the iteration stops when X_k = X_{k−1}, and the
filled region is X_k ∪ A.
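A compact sketch of this region-filling iteration using SciPy's binary dilation; the 4-connected structuring element B and the small square boundary are illustrative choices:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def fill_region(A, seed):
    """A: boolean image whose True pixels form a closed boundary; seed: a (row, col) inside it."""
    B = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)              # 4-connected structuring element
    X = np.zeros_like(A)
    X[seed] = True
    while True:
        X_next = binary_dilation(X, structure=B) & ~A  # X_k = (X_{k-1} dilated by B) intersect A^c
        if np.array_equal(X_next, X):                  # stop when X_k = X_{k-1}
            return X | A                               # filled region together with its boundary
        X = X_next

A = np.zeros((7, 7), dtype=bool)
A[1, 1:6] = A[5, 1:6] = A[1:6, 1] = A[1:6, 5] = True   # a square boundary
print(fill_region(A, (3, 3)).astype(int))
```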
Convex Hull
A is said to be convex if a straight line segment joining any two points in A lies
entirely within A
The convex hull H of set S is the smallest convex set containing S
Convex deficiency is the set difference H-S
Useful for object description
This algorithm iteratively applies the hit-or-miss transform to A with the first of the four
structuring elements, takes the union with A, and repeats the procedure with each of the
remaining structuring elements until convergence.
Thinning
The thinning of a set A by a structuring element B can be defined in terms of the hit-or-miss
transform:
A ⊗ B = A − (A ⊛ B) = A ∩ (A ⊛ B)ᶜ
Thickening
The structuring elements used for thickening have the same form as in thinning, but
with all 1‘s and 0‘s interchanged.
A separate algorithm for thickening is often used in practice, Instead the usual
procedure is to thin the background of the set in question and then complement the
result.
In other words, to thicken a set A, we form C=Ac, thin C and then form Cc.
Depending on the nature of A, this procedure may result in some disconnected points.
Therefore thickening by this procedure usually require a simple post-processing step
to remove disconnected points.
We will notice in the next example that the thinned background forms a boundary for
the thickening process, this feature does not occur in the direct implementation of
thickening
This is one of the reasons for using background thinning to accomplish thickening.
Skeleton
The notion of a skeleton S(A) of a set A is intuitively defined. From the figure we deduce
that:
a) If z is a point of S(A) and (D)z is the largest disk centered at z and contained in A (one
cannot find a larger disk, not necessarily centered at z, that contains (D)z and is included in
A), then (D)z is called a maximum disk.
b) The disk (D)z touches the boundary of A at two or more different places.
The skeleton can be expressed in terms of erosions and openings:
S(A) = ∪_{k=0..K} S_k(A), with S_k(A) = (A ⊖ kB) − (A ⊖ kB) ∘ B
where (A ⊖ kB) indicates k successive erosions of A by B, and K is the last iterative step
before A erodes to an empty set; in other words, K = max{k | (A ⊖ kB) ≠ ∅}.
4.2 Wavelets:
4.2.1 Background
Unlike the Fourier transform, which decomposes a signal into a sum of sinusoids, the
wavelet transform decomposes a signal (image) into small waves of varying frequency and
limited duration. The advantage is that we also know when (where) each frequency appears.
Wavelets have many applications in image compression, transmission, and analysis.
We will examine wavelets from a multi-resolution point of view and begin with an overview
of imaging techniques involved in multi-resolution theory.
Small objects are viewed at high resolutions; large objects require only a coarse resolution.
Images have locally varying statistics resulting in combinations of edges, abrupt features and
homogeneous regions.
Image Pyramids:
Approximation pyramid: a collection of decreasing-resolution approximations of the
original image, obtained by repeatedly filtering and down-sampling.
Prediction pyramid: a prediction of each higher-resolution level is obtained by up-sampling
(inserting zeros into) the previous lower-resolution level and interpolating (filtering); the
prediction residuals form the prediction-residual pyramid.
Subband Coding:
Consider the two-band subband coding and decoding system as shown in figure (a). The
system is composed of two filter banks, each containing two FIR filters (h0(n), h1(n) & g0(n),
g1(n)).
Figure a
Figure b
Analysis filter bank includes h0(n) & h1(n) to break f(n) into two half-length
sequences flp(n) & fhp(n). Filters h0(n) & h1(n) are half-band filters whose idealized
characteristics are H0(w) and H1(w) are as shown in figure (b).
h0(n) low pass filter, output flp(n) is called approximation of f(n).
h1(n) high pass filter, output fhp(n) is called detail part of f(n).
Synthesis filter bank includes g0(n) & g1(n), which combine flp(n) & fhp(n) to produce f̂(n).
The goal in subband coding is to select the h0(n), h1(n), g0(n) & g1(n) filters so that f̂(n) = f(n).
The resulting system is then said to employ perfect reconstruction filters.
Approximation subband
Vertical subband
Horizontal subband
Diagonal subband
Of special interest in subband coding are filters that move beyond biorthogonality and are
required to be orthonormal.
(The subscript "even" in the corresponding design equations means that the filter length must
be even.)
The Haar transform is due to Alfred Haar [1910]. Its basis functions are the simplest known
orthonormal wavelets. The Haar transform is both separable and symmetric, and can be
expressed in matrix form as
T = HFHT
Where F is an N*N image matrix, H is an N*N transformation matrix, T is the resulting N*N
transform.
For the Haar transform, transformation matrix H contains the Haar basis functions,
h(z).
They are defined over the continuous, closed interval for z ∈ [0,1] for k=0,1,2,…,N-
1, where N = 2n.
To generate H, we define the integer k such that
k = 2^p + q − 1, where 0 ≤ p ≤ n − 1, and q = 0 or 1 for p = 0, while 1 ≤ q ≤ 2^p for p ≠ 0.
For the above pairs of p and q, a value for k is determined and the Haar basis
functions are computed.
The ith row of a NxN Haar transformation matrix contains the elements of
hk(z) for z=0/N, 1/N, 2/N,…, (N-1)/N.
For instance, for N = 4, p, q and k have the following values:
k = 0: p = 0, q = 0;  k = 1: p = 0, q = 1;  k = 2: p = 1, q = 1;  k = 3: p = 1, q = 2.
The rows of H2 are the simplest filters of length 2 that may be used as analysis filters
h0(n) and h1(n) of a perfect reconstruction filter bank.
Moreover, they can be used as scaling and wavelet vectors (defined in what follows) of
the simplest and oldest wavelet transform.
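A brief NumPy sketch that builds the N×N Haar transformation matrix directly from the basis functions h_k(z), using k = 2^p + q − 1 as defined above, and applies T = H F Hᵀ:

```python
import numpy as np

def haar_matrix(N):
    """N x N Haar matrix (N a power of 2); row k holds h_k(z) at z = 0/N, 1/N, ..., (N-1)/N."""
    H = np.zeros((N, N))
    H[0, :] = 1.0 / np.sqrt(N)                   # h_0(z) = 1/sqrt(N)
    z = np.arange(N) / N
    for p in range(int(np.log2(N))):
        for q in range(1, 2**p + 1):
            k = 2**p + q - 1                     # row index determined by p and q
            pos = (z >= (q - 1) / 2**p) & (z < (q - 0.5) / 2**p)
            neg = (z >= (q - 0.5) / 2**p) & (z < q / 2**p)
            H[k, pos] = 2**(p / 2) / np.sqrt(N)
            H[k, neg] = -(2**(p / 2)) / np.sqrt(N)
    return H

H4 = haar_matrix(4)
F = np.arange(16, dtype=np.float64).reshape(4, 4)
T = H4 @ F @ H4.T                                # Haar transform T = H F H^T
print(np.allclose(H4 @ H4.T, np.eye(4)))         # rows are orthonormal: True
```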
Module – 5 (RBT Levels: L1, L2, L3)
Segmentation: Point, Line, and Edge Detection, Thresholding, Region-Based
Segmentation, Segmentation Using Morphological Watersheds.
Representation and Description: Representation, Boundary descriptors.
[Text: Chapter 10: Sections 10.2, to 10.5 and Chapter 11: Sections 11.1 and 11.2]
5.1 Segmentation:
5.2.1 Representation
Outline:
Segmentation: Point, Line, and Edge Detection,
Thresholding, Region-Based Segmentation,
Segmentation Using Morphological Watersheds.
Preview
• Segmentation subdivides an image into its constituent regions or objects.
• The goal is usually to find individual objects in an image.
• Mask operation
Example:
– Interested in lines of -45 degree
– Run the corresponding mask
– All other lines are eliminated
• Edges are characterized by grey-level transitions.
First derivative can be used to detect the
presence of an edge (if a point is on a
ramp)
|∇f(x, y) − ∇f(x₀, y₀)| ≤ E, where E is a nonnegative magnitude threshold
|α(x, y) − α(x₀, y₀)| ≤ A, where A is a nonnegative angle threshold
Both the magnitude and the angle criteria should be satisfied for two edge pixels to be linked.
Example: find rectangular
shapes similar to license
plate
• Find gradients
2. Global Processing – the Hough Transform
• Determine if points lie on a curve of specified shape
• Consider a point (xi, yi) and the general line equation
• Write the equation for a second point (xj, yj) and find the
intersection point (a’, b’) on the parametric space
Computational aspects of the Hough transform:
• Subdivision of the parametric space into accumulator cells
• The cell at (i,j) with accumulator values A(i,j) corresponds to
(ai,bj)
• For every point (x_k, y_k), vary a from cell to cell and solve for b using b = −x_k a + y_k.
• If the slope value a_p yields the intercept b_q, then increment the accumulator:
A(p, q) = A(p, q) + 1.
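A minimal sketch of this accumulator-cell procedure, using the (a, b) slope-intercept parameterization for clarity (in practice the normal ρ-θ representation is preferred because the slope a is unbounded for vertical lines); the points and cell ranges below are illustrative:

```python
import numpy as np

def hough_accumulate(points, a_values, b_values):
    """Vote A(p, q) for lines y = a*x + b passing through the given (x, y) points."""
    A = np.zeros((len(a_values), len(b_values)), dtype=int)
    b_min, b_step = b_values[0], b_values[1] - b_values[0]
    for (x, y) in points:
        for p, a in enumerate(a_values):
            b = y - a * x                              # solve b = -a*x + y for this slope cell
            q = int(round((b - b_min) / b_step))
            if 0 <= q < len(b_values):
                A[p, q] += 1                           # increment the accumulator
    return A

pts = [(0, 1), (1, 3), (2, 5), (3, 7), (5, 2)]         # four collinear points (y = 2x + 1) plus an outlier
a_vals = np.linspace(-3, 3, 13)
b_vals = np.linspace(-5, 5, 21)
A = hough_accumulate(pts, a_vals, b_vals)
p, q = np.unravel_index(A.argmax(), A.shape)
print(a_vals[p], b_vals[q], A[p, q])                   # approx. a = 2.0, b = 1.0 with 4 votes
```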
y_i = a x_i + b
Boundary Characteristics for Histogram Thresholding
• Consider only pixels lying on and near edges
• Use gradient or Laplacian to preliminary process the
image
Thresholds based on several variables
• Color or multispectral histograms
• Thresholding is based on finding clusters in multi-
dimensional space
• Example: face detection
• Different color models
Region based Segmentation
Basic formulation
• Every pixel must be in a region
• Points in a region must be connected
• Regions must be disjoint
• Logical predicate for one region and for distinguishing between
regions
Region growing: groups pixels or subregions into larger regions based on predefined
similarity criteria, starting from a set of seed points.
Region splitting: if the predicate P(R) is not TRUE for a region, the method requires that the
region be split.
• The main problem with region splitting is determining
where to split a region.
• One method to divide a region is to use a quadtree
structure.
• Quadtree: a tree in which nodes have exactly four
descendants
• Split into four disjoint quadrants any region R_i for which P(R_i) = FALSE.
• Merge any adjacent regions R_j and R_k for which P(R_j ∪ R_k) = TRUE.
Segmentation by Morphological Watersheds