Real-Time Pattern Recognition Using Matrox Imaging System
National Institute Of Technology Rourkela
The normalized grayscale correlation (NGC) of the model f1 (of size M_w × M_h) with the
target f2 at offset (x, y) is

G(x, y) = [ Σ_{i=0}^{M_h−1} Σ_{j=0}^{M_w−1} f1(i, j) f2(x+i, y+j) ] / √( Σ_{i,j} f1²(i, j) · Σ_{i,j} f2²(x+i, y+j) )
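As a concrete illustration, the sum above can be evaluated directly in a few lines of Python. This is only a sketch to make the notation tangible; the function names (`ngc`, `ngc_surface`) are illustrative, and the loop is the naive exhaustive search rather than the optimized pyramid search described next:

```python
import math

def ngc(model, patch):
    # G for one offset: model and patch are equally sized 2-D lists.
    num = sum(m * p for mr, pr in zip(model, patch) for m, p in zip(mr, pr))
    den = math.sqrt(sum(v * v for r in model for v in r)
                    * sum(v * v for r in patch for v in r))
    return num / den if den > 0 else 0.0

def ngc_surface(image, model):
    # Evaluate G(x, y) at every valid offset (exhaustive search).
    mh, mw = len(model), len(model[0])
    return [[ngc(model, [image[y + i][x:x + mw] for i in range(mh)])
             for x in range(len(image[0]) - mw + 1)]
            for y in range(len(image) - mh + 1)]
```

By the Cauchy-Schwarz inequality the score is exactly 1 where the patch is proportional to the model, so the argmax of the surface locates the target.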
An overview of the method is as follows: build pyramid representations of both the model and
the target (search space), and perform correlation search at the top levels of the two pyramids.
This can be done very quickly due to the reduced image sizes. The best match at the top level
can be refined using a coarse-to-fine strategy in which the best estimate of location at level k
in the pyramid is used as the starting point for the search at level k − 1. When the bottom of
the pyramid is reached, the model has been found. Multiple instances can be found by
repeating this procedure, choosing more than one match at the top level of the pyramid.
The method of building both the image pyramid and the model pyramid is described briefly.
The structure of the pyramid is quite simple. An averaging-with-no-overlap scheme is used,
and uses a sub-sampling rate of 2. This means that each level in the pyramid has dimensions
equal to one-half of those for the level below, meaning that each level is one-quarter the size
of the one immediately below. Each pixel in a given layer is the average of four pixels from
the layer below. This type of pyramid can be built quickly, as each pixel in a new level only
requires three adds and one shift to compute.
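The reduction step above can be sketched as follows; `pyramid_reduce` and `build_pyramid` are illustrative names, and integer grayscale values are assumed so that the shift applies:

```python
def pyramid_reduce(img):
    # 2x2 averaging with no overlap: each output pixel is
    # (a + b + c + d) >> 2, i.e. three adds and one shift.
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [[(img[y][x] + img[y][x + 1]
              + img[y + 1][x] + img[y + 1][x + 1]) >> 2
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def build_pyramid(img, levels):
    # Level 0 is the original; each level is one-quarter the size below.
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(pyramid_reduce(pyr[-1]))
    return pyr
```

For example, a 320 × 240 image at level 0 reduces to 160 × 120 at level 1 and 80 × 60 at level 2, matching Fig. 3.3.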
Fig. 3.3 The pyramid representation for a typical image is shown. The pyramid has three levels, with
level 0 being the largest image at 320 × 240 (top), and level 2 being the smallest at 80 × 60 (bottom
right). In the level 0 image, a search model is defined.
The number of levels is limited by K_max = log₂ min(M_w, M_h). The advantages of building a
pyramid become quickly obvious: the K-th level of a pyramid has 2^(2(K−1)) times fewer pixels
than does the original image. Remember that the model pyramid at level K also has 2^(2(K−1))
times fewer pixels than at its level 0, so that the total cost of NGC at the top level of the
pyramid is 2^(4(K−1)) times smaller than NGC on the original image. For a 4-level pyramid, this
factor is 4096. The localization of the pattern at the top level of the pyramid is not perfect,
but this can be used as an initial estimate for the next level of the pyramid. At each level of
the pyramid a small number of correlations is used to refine the location estimate. Instead of
performing an exhaustive local search for the new maximum at each level of the pyramid, it is
possible to estimate the gradient of the correlation surface and use a steepest descent method
to perform the search.
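The whole coarse-to-fine loop can be sketched in pure Python. This is an illustrative version in which a ±1-pixel exhaustive neighborhood search stands in for the steepest-descent refinement derived in the next section, and all helper names are assumptions:

```python
import math

def _ngc(model, image, y, x):
    # Normalized grayscale correlation of the model at offset (y, x).
    mh, mw = len(model), len(model[0])
    num = s_m = s_p = 0.0
    for i in range(mh):
        for j in range(mw):
            m, p = model[i][j], image[y + i][x + j]
            num += m * p; s_m += m * m; s_p += p * p
    den = math.sqrt(s_m * s_p)
    return num / den if den else 0.0

def _reduce(img):
    # One pyramid level: 2x2 averaging with no overlap.
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [[(img[y][x] + img[y][x+1] + img[y+1][x] + img[y+1][x+1]) // 4
             for x in range(0, w, 2)] for y in range(0, h, 2)]

def coarse_to_fine(image, model, levels):
    # Exhaustive NGC at the top pyramid level, then a small local
    # search around the projected estimate at each finer level.
    ipyr, mpyr = [image], [model]
    for _ in range(levels - 1):
        ipyr.append(_reduce(ipyr[-1]))
        mpyr.append(_reduce(mpyr[-1]))
    img, mod = ipyr[-1], mpyr[-1]
    mh, mw = len(mod), len(mod[0])
    _, y, x = max((_ngc(mod, img, yy, xx), yy, xx)
                  for yy in range(len(img) - mh + 1)
                  for xx in range(len(img[0]) - mw + 1))
    for k in range(levels - 2, -1, -1):
        img, mod = ipyr[k], mpyr[k]
        mh, mw = len(mod), len(mod[0])
        y, x = 2 * y, 2 * x  # level-(k+1) estimate seeds level k
        cands = [(yy, xx) for yy in range(y - 1, y + 2)
                 for xx in range(x - 1, x + 2)
                 if 0 <= yy <= len(img) - mh and 0 <= xx <= len(img[0]) - mw]
        _, y, x = max((_ngc(mod, img, yy, xx), yy, xx) for yy, xx in cands)
    return y, x
```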
3.5.2 Derivation of Correlation Gradient
When the estimate of model location is refined at each level of the pyramid, it is possible to
use an estimate of the gradient of the correlation surface to perform a steepest descent search.
However, in applying steepest descent to a surface that we expect will have multiple maxima,
it is important to identify regions expected to contain local maxima to allow the search to
converge quickly. The use of the image pyramid does this: not only does it limit complexity
of the top-level correlation, but it facilitates finding candidate regions and provides a means
of quickly refining the search. In the derivation which follows, the continuous case is
developed; the transition to the discrete case is straightforward.
Given an image I(x, y) and a model M(x, y), with M being differentiable over all x and y, the
correlation coefficient surface can be defined as
C(u, v) = ∬ I(x, y) (MW)(x − u, y − v) dx dy / [ ∬ I²(x, y) W(x − u, y − v) dx dy ]^(1/2)     (2)
The function W(x, y) is a windowing function which is used to force the contribution from M
to zero at its boundaries:
W(x, y) = 0 for x < 0, x > M_w, y < 0, y > M_h
The notation (MW) (x, y) is shorthand for M(x, y) W(x, y). For simple correlation
computation, W is normally chosen to be a box function. However, since we will want to
differentiate Eq. 2, the windowing function should be chosen such that its derivative goes to
zero at the borders of the window. It is assumed that ∬ M²(x, y) W(x, y) dx dy = 1, i.e. that
M(x, y) is normalized with respect to the windowing function.
The gradient of the windowed correlation is

∇C²(u, v) = 2 [ ∬ I(x, y) (MW)(x − u, y − v) dx dy · ∬ I(x, y) ∇(MW)(x − u, y − v) dx dy ] / ∬ I²(x, y) W(x − u, y − v) dx dy
            − [ ∬ I(x, y) (MW)(x − u, y − v) dx dy ]² · ∬ I²(x, y) ∇W(x − u, y − v) dx dy / [ ∬ I²(x, y) W(x − u, y − v) dx dy ]²

(the quotient rule applied to the square of Eq. 2, with ∇ taken with respect to (u, v)).
Since C²(u, v) ∈ [0, 1], we do not expect ∇C²(u, v) to be large so long as the correlation
surface is reasonably smooth, i.e. no discontinuities in I or M. As the amount of high-
frequency content in M and/or I increases, we expect the correlation surface to become less
smooth.
3.6 Simulation Results and Conclusion
Results are also shown for real-world images and several monochrome images, such as
Baboon and Lena, where the target undergoes mild perspective distortion. In all cases the accept
threshold is set to 0.75, representing a correlation score of √0.75 ≈ 0.866. Except where
otherwise noted, the algorithm has been given the expected number of targets in advance.
Results are given for search time vs. number and size of targets and false positive/negative
results are also given.
Fig 3.4.1: (a) model defined in the Lena image, (b) model found in the target, (c) model found in a
noisy target, (d), (e) tables showing the score, hotspot, etc. of the result
Fig 3.4.2: (a) model defined in the Baboon image, (b) model found in a noisy target, (c) table
showing the score, hotspot, etc. of the result
Fig 3.4.3: (a) model defined in a real-time image, (b) model found in the target, (c) model found in a
noisy target, (d), (e) tables showing the score, hotspot, etc. of the result
The algorithm [1] is also very fast even on modest hardware, making it attractive for machine
vision applications in industry. Our method has good performance in finding targets even in
the presence of small amounts of rotation and scale change. Further, we expect our method to
better detect subtle differences in target instances, as it performs a pixel-by-pixel comparison.
Searching for slightly re-scaled versions of the target across pyramid levels would allow our
method [1] to find targets across a range of scales. Three main contributions are made:
(1) incorporation of NGC search into a multi-scale image representation, (2) use of an
estimate of the correlation gradient to perform a steepest descent search at each level of the
pyramid, and (3) a method for choosing the appropriate pyramid depth for the model using a
worst-case analysis.
The result is a fast and robust method for localizing target instances within images. Further,
since it is based on correlation search, the technique is simple and easily combined with PCA
techniques for target matching. The algorithm is limited in that it does not attempt to deal
with variable size or orientation of target instances, although it is shown to be robust to small
variations in either parameter. The level of robustness is dependent on the actual pattern to be
searched for, but this can be included in the worst-case analysis.
Chapter 4
Model Finder
Matrox Inspector Geometric Model Finder allows finding patterns, or models, based on
geometric features. The algorithm finds models using edge-based geometric features instead
of a pixel-to-pixel correlation. As such, Geometric Model Finder offers several advantages
over correlation pattern matching, including greater tolerance of lighting variations (including
specular reflection), model occlusion, and variations in scale and angle. Model Finder
allows tailoring the search to fit the requirements of a specific application. The search can be
done for any number of different models simultaneously, through a range of angles and scales.
Model Finder also provides complete support for calibration. Searches can be performed in
the calibrated real-world such that, even without physically correcting the images,
occurrences can be found even in the presence of complex distortions, and results returned in
real-world units.
4.1 Basic concepts
The basic concepts and vocabulary conventions for Model Finder are:
Edges: Transitions in grayscale value over several adjacent pixels. Well-defined edges have
sharp transitions in value. The smoother the image, the more gradual the change, and the
weaker the edge.
Active edges: Edges which are extracted from the model source image to compose the
geometric model, and searched for in the target image.
Model: The pattern of active edges to find in the target image.
Occurrence: An instance of the model found in the target image.
Bounding-box: The boundary of the square or rectangular region which defines the height
and width of the model or occurrence.
Model source image: The image from which to extract the model's active edges. In Matrox
Inspector, a model can be defined from any 1-band, 8-bit unsigned image.
Model image: The image extracted from the source region in the model source image which
is used as the model.
Model Finder context: The container for all models you want to find. The Model Finder
context allows you to set global search settings for all the models contained within the
context.
Mask: Used to define irrelevant, inconsistent, or featureless areas in the model, so that only
the pertinent model details are used for the search.
Pre-processing: Extracts the active edges from the model images contained within the
Model Finder context and sets internal search settings so that future searches will be
optimized for speed and robustness.
4.2 Guidelines for choosing models
While finding models based on geometric features is a robust, reliable technique, there are a
few pitfalls to be aware of when defining models, so that the best possible model is chosen.
Make sure your images have enough contrast
Contrast is necessary for identifying edges in the model source and target images with
sub-pixel accuracy. For this reason, it is recommended to avoid models that contain only slow
gradations in grayscale value.
Avoid poor geometric models
Poor geometric models suffer from a lack of clearly defined geometric characteristics, or from
geometric characteristics that do not distinguish themselves sufficiently from other image
features. These models can produce unreliable results.
Fig 4.1 Simple curves lack distinguishing features and can produce false matches
Be aware of ambiguous models
Certain types of geometric models provide non-unique, or ambiguous, results in terms of
position, angle, or scale. Models that are ambiguous in position are usually composed only of
one or more sets of parallel lines. Such models make it impossible to establish a unique
position, and a large number of matches can be found, since the number of possible line
segments along any particular line is theoretically limitless.
Fig 4.2 Distinguishing features are ambiguous in position.
Models that are ambiguous in scale are usually composed of only straight lines that pass
through the same point; some spirals are also ambiguous in scale. Models that consist of small
portions of objects should be tested to verify that they are not ambiguous in scale.
For example, a model of an isolated corner is ambiguous in terms of scale because it consists
of only two straight lines that pass through the same point.
Fig 4.3 Distinguishing features are ambiguous in scale.
Symmetric models are often ambiguous in angle due to their similarity in features. For
example, circles are completely ambiguous in terms of angle. Other simple symmetric
models, such as squares and triangles, are ambiguous with respect to certain angles:
Nearly ambiguous models
When the major part of a model contains ambiguous features, false matches can occur
because the percentage of the occurrence's edges involved in the ambiguous features is great
enough to be considered a match.
Fig 4.4 Nearly ambiguous models
To avoid this, make sure that the models have enough distinct features to be found among
other target image features. This will ensure that only correct matches are returned as results.
For example, the model below can produce false matches since the greater proportion of
active edges in the model is composed of parallel straight lines rather than distinguishing
curves.
4.3 Determining what is a match
A match occurs when the scores meet or exceed the acceptance value that has been set. The score
and the target score are the primary factors in determining which occurrences are considered
matches with the models in the Model Finder context.
The score is a measure of the active edges in the model found in the occurrence, weighted by
the deviation in position of these common edges. If a weight mask is used, edges are also weighted
according to the weight mask.
The target score is a measure of edges found in the occurrence that are not present in the
original model (that is, extra edges), weighted by the deviation in position of the common
edges. Edges found in the occurrence that are not present in the model will reduce the target
score. These scores are calculated as follows:
Score = Model coverage × (1 − Fit error weighting factor × Normalized fit error)
Target score = Target coverage × (1 − Fit error weighting factor × Normalized fit error)

Note: the normalized fit error is the fit error converted to a number between 0.0 and 1.0.
The model coverage, target coverage, and fit error components of the score and target score
are explained below:
Model coverage. The model coverage is the percentage of the total length of the
model's active edges found in the occurrence. 100% indicates that for every edge in
the model, a corresponding edge was found in the occurrence.
Target coverage. The target coverage is the percentage of the total length of the
model's active edges found in the occurrence, divided by the total length of edges
present in the occurrence's bounding box. Thus, a target coverage score of 100%
means that no extra edges were found. Lower scores indicate that features or edges
found in the target (result occurrence) are not present in the model.
Fig 4.5 Model coverage and target coverage (× 100 for percentage)
Fit Error. The fit error is a measure of how well the edges of the occurrence correspond to
those of the model. The fit error is calculated as the average quadratic distance (with sub-pixel
accuracy) between the edges in the occurrence and the corresponding active edges in the
model:
Fit error = (1/N) Σᵢ (Δxᵢ² + Δyᵢ²)

where (Δxᵢ, Δyᵢ) is the displacement between the i-th edge point of the occurrence and its
corresponding model edge point, and N is the number of such points.
A perfect fit gives a fit error of 0.0. The fit error weighting factor (between 0.0 and 100.0)
determines the importance to place on the fit error when calculating the score and target score.
The fit error weighting factor is set using the Model Advanced tab of the Model Finder dialog
box (default is 25.0).
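The two score formulas can be turned into a small sketch. The function name is illustrative, and one assumption is made explicit: since the weighting factor ranges over 0.0 to 100.0 while the normalized fit error lies in [0, 1], the factor is taken here as a fraction of 100 so that the scores stay in [0, 1]:

```python
def mf_scores(model_coverage, target_coverage, normalized_fit_error,
              fit_error_weighting_factor=25.0):
    # Score        = model coverage  x (1 - w x normalized fit error)
    # Target score = target coverage x (1 - w x normalized fit error)
    # Coverages are fractions in [0, 1]; the weighting factor (default
    # 25.0) is assumed to act as a percentage, hence the division by 100.
    w = fit_error_weighting_factor / 100.0
    penalty = 1.0 - w * normalized_fit_error
    return model_coverage * penalty, target_coverage * penalty
```

With the defaults, a perfect fit (normalized fit error 0) leaves the coverages unchanged, while a poor fit scales both scores down by up to 25%.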
4.4 Position, angle, and scale
The position, angle, and scale can be controlled when Model Finder searches for each model
in a target. The position, angle, and scale at which model occurrences can be found can be
restricted to an expected (nominal) position, angle, or scale, or to a given range.
Enabling calculations specific to searching within a range
Depending on whether searching for models within a range of positions, angles, and/or scales,
Model Finder uses different search strategies to evaluate the edge-based features of the target
candidates. Typically, to search for models within a range, calculations specific to the
corresponding search strategy should be enabled for the context (Search Position Range,
Search Angle Range and/or Search Scale Range). On the Context General tab, the Search
Position Range and Search Angle Range search strategies are enabled by default, whereas the
Search Scale Range is disabled by default.
4.4.1 Search position and search position range
Each model defined in a Model Finder context can be searched at a specific position, or
within a position range. The position range limits the region in which the position of a model
occurrence can be found; coordinates which fall outside of this region cannot be returned as
results. Note that the position returned for an occurrence is determined by the model's
reference axis origin; by default, this position is set to the center of the model, however it can
be displaced if necessary.
The region defined by the default position range is the entire image plane (Whole World),
meaning that all coordinates (even outside the target if applicable) can be returned as results.
To specify a user-defined position range, set the Search Position Range to User defined in the
Model Search tab. Then, set the X/Y Position field to specify the nominal search position. Use
the X D.Neg, Y D.Neg, X D.Pos, and Y D.Pos fields to set the position range relative to the
nominal position.
Depending on the number of details present in the target, using a small position range
generally decreases the search time. Always set the search position range to the minimum
required when speed is a consideration. The X D.Neg, Y D.Neg, X D.Pos, and Y D.Pos fields in
the Search Position Range area in the Model Search tab must be greater than or equal to zero;
if they are all set to zero, the occurrence must be at the position specified by the nominal position.
Note that it is possible to specify a position range which defines a region partially, or totally,
outside of the target; this might be necessary, depending on the reference axis origin of the
model.
4.4.2 Angle and angular range
Each model defined in a Model Finder context can be searched at a specific angle, or within
an angular range. For each model in the context, it is possible to specify the angle of the
search, using the Angle field in the General tab. By default, the search angle is 0°. It is
possible to search within the full angular range of 360° from the nominal angle specified with
the Angle field. Use the Delta Positive and Delta Negative control types to specify the angular
range in the counter-clockwise and clockwise directions from the nominal angle, respectively;
the default for both is 180°. The angular range limits the possible angles which can be
returned as results for an occurrence. Note that the actual angle of the occurrence does not
affect search speed. If you need to search for a model at discrete angles only (for example, at
intervals of 90 degrees), it is typically more efficient to define several models with different
expected angles than to search through the full angular range.
(Figure: angle convention and delta convention in Inspector.)
4.4.3 Scale and scale range
The scale of the model establishes the size of the occurrence that is expected to be found in the
target. If the expected occurrence is smaller or larger than the model, set the nominal
scale of the occurrence for each individual model, using the Scale edit field. The supported
scale factors are 0.5 to 2.0. When the scale of occurrences can vary around the specified
nominal scale, specify a range of scales, using the Max (1.0 to 2.0) and Min (0.5 to 1.0) fields
in the Search Scale Range area on the Model's General tab.
The minimum factor and the maximum factor together determine the scale range from the
nominal scale (Scale).
The maximum and minimum factors are applied to the Scale setting as follows:
Maximum scale = (Scale) x (Max.)
Minimum scale = (Scale) x (Min.)
When calculations specific to scale-range search strategies are enabled, the scale range should
be used to cover an expected variance in scale; you should not use the scale range to cover
different expected scales at different positions. In that case, it is typically more efficient to
define several models with different expected scales, because a large scale range could
potentially slow down the operation and unwanted occurrences could be found.
By default, calculations specific to scale-range search strategies are disabled. When disabled,
you must specify a good nominal scale for each model, which is within the model's scale range.
Note that occurrences can still be found within the scale range specified for their model.
4.5 Context edge settings
Model Finder uses custom image processing algorithms to preprocess the image in order to
simultaneously extract active edges and improve model source and target images by
smoothing and reducing noise. The Smoothness and Detail level settings in the context's Edge
tab of the Model Finder dialog box control these image processing algorithms, determining
which active edges are extracted from the model source and target images.
4.5.1 Extracting edges
The edge extraction process involves a denoising operation to even out rough edges and
remove noise. The degree of smoothness (strength) of the denoising operation used for all
models in the context can be controlled using the Smoothness field in the Filter Control area
of the Edge tab. The range of this control varies from 0.0 to 100.0; a value of 100.0
results in a strong noise reduction effect, while a value of 0.0 has almost no noise reduction
effect. The default setting is 50.0.
These settings only affect image type models and the target image.
(Figure: a target image with considerable noise, and the edge maps obtained with smoothness set to 50 and to 70.)
Note that using a very high smoothing level can result in a loss of important detail and a
decrease in precision.
The detail level setting determines what is considered an edge. Edges are defined by the
transition in grayscale value between adjacent pixels. This setting can be controlled for all
models in the context, using the Detail Level field. The default setting (Medium) offers
robust detection of edges in images with contrast variation, noise, and non-uniform
illumination. Nevertheless, in cases where objects of interest have very low contrast
compared to high-contrast areas in the image, some low-contrast edges can be missed.
The following examples show the effect of the Detail level setting.
(Figure: a low-contrast image and its edge map using Medium; a multi-contrast image and its edge maps using Medium and High.)
If the images contain low-contrast and high-contrast objects, a detail level setting of High
should be used to ensure the detection of all important edges in the image. The Very High
setting performs an exhaustive edge extraction, including very low contrast edges. However,
it should be noted that this setting is very sensitive to noise. The Smoothness and Detail level
settings are applied to all the model and target images for the specified Model Finder context.
Note that model and target images are not directly modified; these settings merely extract the
edge-based information from the images.
Generally, the default settings for Smoothness and Detail level are sufficient for the majority
of images; they should be adjusted when dealing with very noisy, extremely low-contrast, or
multi-contrast images, or images with very thin, refined features. In such cases, it is
recommended to experiment with different settings to achieve the level of accuracy and
speed required by your application. In the special case when special hardware is available to
perform the convolution, it might be faster to use a Finite Impulse Response (FIR) kernel
implementation of the smoothing filter rather than the default recursive Infinite Impulse
Response (IIR) implementation; the displayed default value for the filter mode will change
automatically.
4.5.2 Shared edges
Edges that can be part of more than one occurrence are considered part of the occurrence with
the greatest score. For example, in the illustration below, two occurrences of two simple
models share a common edge. With shared edges enabled, these occurrences would have
perfect scores.
(Figure: models A and B, and occurrences 1 and 2 sharing a common edge.)
However, with shared edges disabled (the default), the shared edge would be considered part of
occurrence 1, since it has the greater score; the score of occurrence 2 would subsequently be
reduced by the loss of the shared edge in the score calculation.
4.6 Thresholding
Thresholding is the simplest method of image segmentation. From a grayscale image,
thresholding can be used to create binary images. During the thresholding process,
individual pixels in an image are marked as object pixels if their value is greater than some
threshold value (assuming an object to be brighter than the background) and as background
pixels otherwise. This convention is known as threshold above. Variants include threshold
below, which is the opposite of threshold above; threshold inside, where a pixel is labeled
"object" if its value is between two thresholds; and threshold outside, which is the opposite of
threshold inside. Typically, an object pixel is given a value of 1 while a background pixel is
given a value of 0. Finally, a binary image is created by coloring each pixel white or black,
depending on a pixel's label.
Thresholding operations are either global or local, and point- or region-dependent. Global
thresholding algorithms choose one threshold for the entire image, while local thresholding
algorithms partition the image into sub-images and select a threshold for each sub-image.
Point-dependent thresholding algorithms analyze only the gray-level distribution of the image,
while region-dependent algorithms also consider the location of the pixels.
Fig 4.6 Example of a threshold effect used on an image
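The four conventions can be sketched in a few lines of Python (the function name is illustrative; object pixels are labeled 1 and background pixels 0, as in the text):

```python
def threshold(image, t_low, t_high=None, mode="above"):
    # Binary segmentation of a 2-D grayscale image.
    # "above"/"below" use t_low only; "inside"/"outside" use [t_low, t_high].
    def label(v):
        if mode == "above":
            return 1 if v > t_low else 0
        if mode == "below":
            return 1 if v < t_low else 0
        inside = t_low <= v <= t_high
        return int(inside) if mode == "inside" else int(not inside)
    return [[label(v) for v in row] for row in image]
```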
4.6.1 Adaptive thresholding
Thresholding is called adaptive thresholding [6] when a different threshold is used for
different regions in the image. This may also be known as local or dynamic thresholding. The
key parameter in the thresholding process is the choice of the threshold value. Several
different methods for choosing a threshold exist; users can manually choose a threshold value,
or a thresholding algorithm can compute a value automatically, which is known as automatic
thresholding. A simple method would be to choose the mean or median value, the rationale
being that if the object pixels are brighter than the background, they should also be brighter
than the average. In a noiseless image with uniform background and object values, the mean
or median will work well as the threshold, however, this will generally not be the case. A
more sophisticated approach might be to create a histogram of the image pixel intensities and
use the valley point as the threshold. The histogram approach assumes that there is some
average value for the background and object pixels, but that the actual pixel values have some
variation around these average values. However, this may be computationally expensive, and
image histograms may not have clearly defined valley points, often making the selection of an
accurate threshold difficult. One method that is relatively simple, does not require much
specific knowledge of the image, and is robust against image noise, is the following iterative
method:
1. An initial threshold (T) is chosen; this can be done randomly or according to any other
method desired.
2. The image is segmented into object and background pixels as described above,
creating two sets:
   G1 = {f(m, n) : f(m, n) > T} (object pixels)
   G2 = {f(m, n) : f(m, n) ≤ T} (background pixels)
   (note: f(m, n) is the value of the pixel located in the m-th column, n-th row)
3. The average of each set is computed:
   m1 = average value of G1
   m2 = average value of G2
4. A new threshold is created that is the average of m1 and m2:
   T = (m1 + m2)/2
5. Go back to step two, now using the new threshold computed in step four; keep
repeating until the new threshold matches the one before it.
This iterative algorithm is a special one-dimensional case of the k-means clustering algorithm,
which has been proven to converge at a local minimum, meaning that a different initial
threshold may give a different final result. Color images can also be thresholded. One approach
is to designate a separate threshold for each of the RGB components of the image and then
combine them with an AND operation.
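The five steps above translate almost line for line into Python. A sketch, with an illustrative name, the image mean as the default initial threshold, and an iteration cap added as a safety guard:

```python
def iterative_threshold(image, t0=None):
    # Steps 1-5: split by T, average each set, set T to the mean of the
    # two averages, and repeat until T stops changing.
    pixels = [v for row in image for v in row]
    t = t0 if t0 is not None else sum(pixels) / len(pixels)  # step 1
    for _ in range(1000):                                    # safety cap
        g1 = [v for v in pixels if v > t]    # object pixels
        g2 = [v for v in pixels if v <= t]   # background pixels
        if not g1 or not g2:                 # degenerate split
            break
        t_new = (sum(g1) / len(g1) + sum(g2) / len(g2)) / 2  # steps 3-4
        if t_new == t:                       # step 5: converged
            break
        t = t_new
    return t
```

For a bimodal row of values clustered around 10 and 200, the threshold settles at 105, the midpoint of the two cluster means.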
4.7 Sobel operator
The Sobel operator [5] is used in image processing, particularly within edge
detection algorithms. Technically, it is a discrete differentiation operator, computing an
approximation of the gradient of the image intensity function. At each point in the image, the
result of the Sobel operator is either the corresponding gradient vector or the norm of this
vector. The Sobel operator is based on convolving the image with a small, separable,
integer-valued filter in the horizontal and vertical directions, and is therefore relatively
inexpensive in terms of computation. On the other hand, the gradient approximation which it
produces is relatively crude, in particular for high-frequency variations in the image.
In simple terms, the operator calculates the gradient of the image intensity at each point,
giving the direction of the largest possible increase from light to dark and the rate of change
in that direction. The result therefore shows how "abruptly" or "smoothly" the image changes
at that point and therefore how likely it is that that part of the image represents an edge, as
well as how that edge is likely to be oriented. In practice, the magnitude (likelihood of an
edge) calculation is more reliable and easier to interpret than the direction calculation.
Mathematically, the gradient of a two-variable function (here the image intensity function) is
at each image point a 2D vector with the components given by the derivatives in the
horizontal and vertical directions. At each image point, the gradient vector points in the
direction of largest possible intensity increase, and the length of the gradient vector
corresponds to the rate of change in that direction. This implies that the result of the Sobel
operator at an image point which is in a region of constant image intensity is a zero vector and
at a point on an edge is a vector which points across the edge, from darker to brighter values.
Mathematically, the operator uses two 3×3 kernels which are convolved with the original image to calculate approximations of the derivatives, one for horizontal changes and one for vertical. If we define A as the source image, and Gx and Gy as two images which at each point contain the horizontal and vertical derivative approximations respectively, the computations are as follows:

Gx = [ +1  0  -1 ]        Gy = [ +1  +2  +1 ]
     [ +2  0  -2 ] * A         [  0   0   0 ] * A
     [ +1  0  -1 ]             [ -1  -2  -1 ]
Where * here denotes the 2-dimensional convolution operation.
The x-coordinate is here defined as increasing in the "right"-direction, and the y-coordinate is
defined as increasing in the "down"-direction. At each point in the image, the resulting
gradient approximations can be combined to give the gradient magnitude, using:
G = √(Gx² + Gy²)
Using this information, we can also calculate the gradient's direction:
Θ = arctan(Gy / Gx)
Since the intensity function of a digital image is only known at discrete points, derivatives of
this function cannot be defined unless we assume that there is an underlying continuous
intensity function which has been sampled at the image points. With some additional
assumptions, the derivative of the continuous intensity function can be computed as a function
on the sampled intensity function, i.e. the digital image. It turns out that the derivatives at any
particular point are functions of the intensity values at virtually all image points. However,
approximations of these derivative functions can be defined with greater or lesser degrees of accuracy. The Sobel operator represents a rather inaccurate approximation of the image
gradient, but is still of sufficient quality to be of practical use in many applications. More
precisely, it uses intensity values only in a 3×3 region around each image point to
approximate the corresponding image gradient, and it uses only integer values for the
coefficients which weight the image intensities to produce the gradient approximation.
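The kernel convolutions and the magnitude/direction formulas above can be sketched as follows; the helper names are illustrative, and a naive "valid" convolution (with kernel flip, matching the * notation) stands in for an optimized implementation.

```python
import numpy as np

# The two 3x3 Sobel kernels (horizontal and vertical derivative approximations).
KX = np.array([[1, 0, -1],
               [2, 0, -2],
               [1, 0, -1]], dtype=float)
KY = np.array([[1, 2, 1],
               [0, 0, 0],
               [-1, -2, -1]], dtype=float)

def convolve2d(image, kernel):
    """Naive 'valid' 2-D convolution; the kernel is flipped, as true convolution requires."""
    k = kernel[::-1, ::-1]
    h, w = image.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

def sobel(image):
    """Return gradient magnitude and direction per the formulas above."""
    gx = convolve2d(image, KX)
    gy = convolve2d(image, KY)
    return np.sqrt(gx**2 + gy**2), np.arctan2(gy, gx)

# A vertical step edge: magnitude peaks along the edge and is zero in flat regions.
img = np.zeros((5, 5))
img[:, 3:] = 255.0
mag, ang = sobel(img)
```

Running this on the step-edge image gives zero magnitude in the constant regions and a strong response (1×255 + 2×255 + 1×255 = 1020) at the edge, illustrating the zero-vector and across-the-edge behaviour described above.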
4.8 Simulation Results and Conclusion
Results are shown for real-world images and several standard monochrome images such as Baboon, where the target undergoes mild perspective distortion. Results are given for search time versus the number and size of targets, and false positive/negative results are also given.
Fig 4.7.1: (a) model defined in Lena image, (b) model found in target, (c) model found in noisy target, (d), (e) tables showing score, hotspot, etc. of result.

Fig 4.7.2: (a) model defined in Baboon image, (b) model found in noisy target, (c) table showing score, hotspot, etc. of result.

Fig 4.7.3: (a) model defined in real-time image, (b) model found in target, (c) model found in noisy target, (d), (e) tables showing score, hotspot, etc. of result.
Edge detection is an important issue in image processing. The most common and earliest edge detection algorithms are those based on the gradient, such as the Sobel operator and the Roberts operator. The characteristics of the Sobel operator, namely its regularity, simplicity and efficiency, make it well suited for implementation in application-specific architectures such as the Matrox imaging system. Gradient-based algorithms such as the Sobel operator have a major drawback: they are very sensitive to noise, and the size of the kernel filter and its coefficients are fixed and cannot be adapted to a given image.
Chapter 5
COMPARISON
AND DISCUSSION
5.1 Comparative Study
In this work the performance of the normalized gray scale technique has been compared with existing methods available in the literature. The algorithm is very fast in the detection and localization of patterns with simple and inexpensive hardware. Typical search times are less than 0.25 s, and as low as 10-30 ms when the expected number of targets is known in advance. Only two methods, those of Lowe [18] and Schiele & Pentland [32], cite similar speeds. Comparing our algorithm to that of Lowe, we note that his is more complex to implement. The normalized algorithm, being based on NGC, is robust over a wide range of global illumination changes. Compared with SIFT, which has rotation and scale invariance, our method does not have these properties; however, the feature-based algorithm used in the second part does.
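To make the illumination-invariance claim concrete, here is a minimal sketch of the NGC score; the function name and the synthetic patch are illustrative. Because the score subtracts the mean and normalizes by the energy, a global gain/offset change (I' = aI + b) leaves it unchanged.

```python
import numpy as np

def ngc(model, window):
    """Normalized grey-scale correlation between a model and an equally sized window."""
    m = model - model.mean()
    w = window - window.mean()
    return float((m * w).sum() / np.sqrt((m * m).sum() * (w * w).sum()))

rng = np.random.default_rng(0)
patch = rng.random((8, 8))
brighter = 1.7 * patch + 30.0  # same pattern under a global illumination change
# ngc(patch, brighter) remains 1.0 (up to floating-point error)
```

The gain 1.7 and offset 30.0 are arbitrary: any a > 0 and any b cancel out of the score, which is why NGC matching tolerates global lighting changes.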
5.2 Discussion
The normalized gray scale algorithm [1] is also very fast even on modest hardware, making it
attractive for machine vision applications in industry. Our method has good performance in
finding targets even in the presence of small amounts of rotation and scale change. The choice
of accept threshold will have a strong effect on the result. As this threshold is reduced we
would expect to find a large number of false positives. It must be remembered that lowering
the accept threshold also leads to a large number of candidate matches at the top level of the
pyramid, and this in turn will generally lengthen the search time.
Chapter 6
CONCLUSIONS AND
SCOPE OF FUTURE WORK
6.1 Conclusion
This work describes a novel approach to pattern matching based on normalized correlation and
Sobel based edge detection in a pyramid image representation. The result is a fast and robust
method for localizing target instances within images. Further, since it is based on correlation
search, the technique is simple and easily combined with PCA techniques for target matching.
The normalized algorithm [1] is limited in that it does not attempt to deal with variable size or orientation of target instances, although it is shown to be robust to small variations in either parameter. The feature-based algorithm used in the second part, however, has rotation and scale invariance.
6.2 Future Scope
In this thesis work, two pattern recognition algorithms, NGC and a feature-based algorithm, have been implemented. Future work includes further investigation of size, and even orientation, invariance in the search framework. While pattern recognition has been a field of research for approximately fifty years, much future research remains. One important application of pattern recognition, which will be used for security in the future, is person recognition, which is still in its infancy.
Mathematical morphology, being based on set theory, has an inherent advantage for image processing and analysis. It can perform tasks from the simplest to the most demanding, including noise reduction, edge detection, segmentation, and texture and shape analysis, and it can be applied to almost all applications dealing with digital image processing.
Most fuzzy rule-based image segmentation techniques to date have been developed primarily for grey-level images. By combining them with the wavelet transform, fuzzy methods can be used for multiscale feature detection. Much more research is expected in the direction of fuzzy rule-based feature extraction in the future.
The Discrete Wavelet Transform is a very good tool for image processing. Image textures can be classified based on the wavelet transform and singular value decomposition; this technique achieves higher recognition rates than the traditional sub-band energy based approach.
There is considerable interest in developing new parallel architectures based on neural
network designs which can be applied to practical classification tasks.
REFERENCES:
[1] W. James MacLean and John K. Tsotsos, Fast pattern recognition using normalized grey-scale correlation in a pyramid image representation, In IEEE Conference on Machine Vision & Applications, February 16, 2007.
[2] J. P. Lewis, Fast normalized cross correlation, In IEEE Conference on Vision Interface, 15 October 2005.
[3] Swami Manickam, Scott D. Roth and Thomas Bushman, Intelligent and optimal normalized correlation for high speed pattern matching, Datacube Inc., Rosewood Drive, MA 01923, USA.
[4] Abbas M. Al-Ghaili, Syamsiah Mashohor, Alyani Ismail and Abdul Rahman Ramli, A new vertical edge detection algorithm and its applications, In IEEE Conference on Computer Engineering & Systems, November 2008.
[5] Nick Kanopoulos, Nagesh Vasanthavada and Robert L. Baker, Design of an image edge detection filter using the Sobel operator, IEEE Journal of Solid-State Circuits, vol. 23, no. 2, April 1988.
[6] Rishi R. Rakesh, Probal Chaudhuri and C. A. Murthy, Thresholding in edge detection: a statistical approach, IEEE Transactions on Image Processing, vol. 13, no. 7, July 2004.
[7] Rafael C. Gonzalez and Paul Wintz, Digital Image Processing, Addison-Wesley Publishing Company, Reading, Massachusetts, 2nd edition, 1987.
[8] Ardeshir Goshtasby, Template matching in rotated images, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-7(3):338-344, May 1985.
[9] François Ennesser and Gérard Medioni, Finding Waldo, or focus of attention using local colour information, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):805-809, August 1995.
[10] A. Baumberg, Reliable feature matching across widely separated views, In Conference on Computer Vision & Pattern Recognition, pages 774-781, 2000.
[11] Chris Harris and Mike Stephens, A combined corner and edge detector, In Proceedings of the Fourth Alvey Vision Conference, pages 147-151, Manchester, United Kingdom, 1988.
[12] P. Burt, Attention mechanisms for vision in a dynamic world, In Proceedings of the International Conference on Pattern Recognition, pages 977-987, 1988.
[13] Michael A. Greenspan, Geometric probing of dense range data, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):495-508, 2002.
[14] Y. S. Huang, C. C. Chiang, J. W. Shieh, and Eric Grimson, Prototype optimization for nearest neighbour classification, Pattern Recognition, 35:1237-1245, 2002.
[15] Jean-Michel Jolion and Azriel Rosenfeld, A pyramid framework for early vision, Kluwer Academic Publishers, P.O. Box 17, 3300 AA Dordrecht, The Netherlands, 1994. ISBN: 0-7923-9402-X.
[16] Tony Lindeberg, Scale-space theory in computer vision, Kluwer Academic Publishers, P.O. Box 17, 3300 AA Dordrecht, The Netherlands, 1994. ISBN: 0-7923-9418-6.
[17] David G. Lowe, Object recognition from local scale-invariant features, In Proceedings of the Seventh International Conference on Computer Vision, pages 1150-1157, Kerkyra, Greece, 1999.
[18] Arun D. Kulkarni, Artificial Neural Networks for Image Understanding, Van Nostrand Reinhold, New York, 1994. ISBN 0-442-00921-6; LofC QA76.87.K84 1993.
[19] Anil K. Jain and Aditya Vailaya, Shape-based retrieval: a case study with trademark image databases, Pattern Recognition, 31(9):1369-1390, 1998.
[20] David G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60:91-110, 2004.
[21] M. K. Hu, Visual pattern recognition by moment invariants, IRE Trans. Information Theory, IT-8:179-187, 1962.
[22] Krystian Mikolajczyk and Cordelia Schmid, Indexing based on scale invariant interest points, In IEEE International Conference on Computer Vision, pages 525-531, 2001.
[23] Krystian Mikolajczyk and Cordelia Schmid, An affine invariant interest point detector, In European Conference on Computer Vision, volume 4, pages 128-142, 2002.
[24] Krystian Mikolajczyk and Cordelia Schmid, Scale and affine invariant interest point detectors, International Journal of Computer Vision, 60(1):63-86, 2004.
[25] Chahab Nastar, Baback Moghaddam, and Alex Pentland, Flexible images: matching and recognition using learned deformations, Computer Vision & Image Understanding, 65(2):179-191, 1997.
[26] Alex Pentland, Rosalind W. Picard, and Stan Sclaroff, Photobook: content-based manipulation of image databases, International Journal of Computer Vision, 18(3):233-254, 1996.
[27] William K. Pratt, Digital Image Processing, John Wiley & Sons, Inc., New York, 2nd edition, 1991.
[28] Bernt Schiele and Alex Pentland, Probabilistic object recognition and localization, In Proceedings of the Seventh International Conference on Computer Vision, pages 177-182, 1999.
[29] Stan Sclaroff, Marco La Cascia, and Saratendu Sethi, Unifying textual and visual cues for content-based image retrieval on the worldwide web, Computer Vision & Image Understanding, 75(1/2):86-98, 1999.
[30] Josef Sivic, Frederik Schaffalitzky, and Andrew Zisserman, Object level grouping for video shots, In Proceedings of the European Conference on Computer Vision, volume LNCS 3022, pages 85-98, 2004.
[31] Chris Stauffer and Eric Grimson, Similarity templates for detection and recognition, In IEEE Conference on Computer Vision & Pattern Recognition, Kauai, Hawaii, 2001.
[32] M. Turk and A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience, 3:71-86, 1991.
[33] G. van der Wal and P. Burt, A VLSI pyramid chip for multiresolution image analysis, International Journal of Computer Vision, 8:177-190, 1992.
[34] Harry Wechsler and George Lee Zimmerman, 2-D invariant object recognition using distributed associative memory, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):811-821, November 1988.
[35] Harry Wechsler and George Lee Zimmerman, Distributed associative memory (DAM) for bin-picking, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(8):814-822, November 1989.
[36] J. K. Wu, C. P. Lam, B. M. Mehtre, Y. J. Gao, and A. Desai Narasimhalu, Content-based retrieval for trademark registration, Multimedia Tools & Applications, 3(3):245-267, 1996.
[37] Gustavo Carneiro and Allan D. Jepson, Multi-scale phase-based local features, In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 736-743, Madison, WI, June 2003.
[38] W. James MacLean and John K. Tsotsos, Fast pattern recognition using gradient descent search in an image pyramid, In Proceedings of the 15th International Conference on Pattern Recognition, volume 2, pages 877-881, Barcelona, Spain, September 2000.