
LAB MANUAL - 2D1427 Image Based Recognition and Classification

Babak Rasolzadeh

April 14, 2008


Face Detection

Real-time face detection in multi-scale images with an attentional cascade of boosted classifiers.
In this project you will explore the machine learning method called AdaBoost by implementing it for the computer vision task of real-time face detection in images. Real-time performance is achieved by exploiting a so-called attentional cascade. The final classifier/detector should be capable of detecting upright frontal faces observed under reasonable lighting conditions.
Face detection is an important problem in image processing. It can potentially be the first step in many applications: marking areas of interest for better quality encoding in television broadcasting, content-based representation (in MPEG-4), face recognition, face tracking and gender recognition. In fact, for this latter task computer-based algorithms outperform humans. During the past decade, many methods and techniques have gradually been developed and applied to the problem. These include vector quantization with multiple codebooks, face templates, and Principal Component Analysis (PCA); the latter technique is directly related to Eigenfaces and Fisherfaces. Here we will develop a face detection system based on the well-known work of Paul Viola and Michael Jones [2]. This involves evaluating Haar-like features in a boosted cascade; see the paper on the course homepage.
The competition
At the end of the course there will be a small competition between those groups that have completed the lab successfully. The objective of the competition is to build a face detector that is both accurate and fast. The group that best combines these two criteria will win the competition and be rewarded with a small surprise prize.

Theory and Background

There is a loosely defined difference between the process of face detection and that of face recognition. The former assumes that each instance of a face belongs to a more general class of objects called faces. Face recognition is then a more specific process in which one tries to identify which particular instance of a face one is looking at. Of course this is a sliding scale, since a specific face, e.g. the face of Bill Gates, can itself have different instances (happy, sad, tired, etc.), and recognizing those would be an even more specific recognition process. In other words, the difference between detection and recognition depends strongly on what class levels one has defined.

However, for the sake of argument that there is indeed a difference between the two, one could say that at the detection level one needs to identify a set of generalized features that apply to the class we are trying to identify. The localization of such features can be accomplished by a number of common methods. There are basically four different approaches to the problem of face detection:

1. Knowledge-based methods: Rules are encoded based on human knowledge of the defining features of a human face. A majority of these rules capture the relationships between features. [10, 8]

2. Feature invariant methods: Algorithms designed to find structural features of a face that are invariant to the common problems of pose, occlusion, expression, image conditions and rotation. [5, 6, 11]

3. Template matching methods: Given a sample set, a corresponding standard facial pattern set is produced. The relation between the sample image and the defined pattern set is computed and used to provide inference. [9, 7]

4. Appearance-based methods: Similar to template matching methods. The goal here is to achieve higher accuracy through larger variation in the training data, since one uses statistics without any prior model assumptions. [3, 12]

In this project the focus will be on a specific appearance-based method, namely the Viola-Jones face detector. This technique relies on the use of simple Haar-like features that can be evaluated quickly through the use of a new image representation.
Over-complete Haar-like features combined with a boosted cascade have proven to be an effective approach to visual object detection, capable of processing images extremely rapidly while achieving high detection rates with very low false alarm rates. The effectiveness of this method comes from four key contributions. The first is a set of simple masks similar to Haar filters. The second is an image representation called the integral image, which allows these features to be computed very quickly. The third is a learning algorithm based on AdaBoost, which selects a small number of features from a large set and yields an extremely effective classifier. The last is a method for combining increasingly complex classifiers in a cascade structure, which allows background regions of an image to be quickly discarded while more computation is spent on promising object-like regions. This is sometimes referred to as an attentional cascade, since it spends more computational effort on the more plausible target regions. We will introduce each of these contributions and discuss them in detail below, before going on to implement them in Matlab.

Feature extraction: Haar-like features

There are two motivations for using features instead of pixel intensities directly. First, features encode domain knowledge better than raw pixels. Second, a feature-based system can be much faster than a pixel-based system.

Figure 1: Four examples (types 1-4) of the features normally used in the Viola-Jones system.

In its simplest form, a Viola-Jones feature can be thought of as a pixel intensity set evaluation: the sum of the luminance of the pixels in the white region(s) of the feature is subtracted from the sum of the luminance in the dark region(s). This difference is used as the feature value. The position and size of a feature can vary over the detection box that is used. So, for example, a feature of type 3 in figure 1 has four parameters: the position (x, y), the widths of the white (w) and black (b) regions, and the height (h) of the feature. See figure 3.
Exercise: Calculate the number of distinct features as a function of (n, m) for each of the four types in figure 1.

Fast feature extraction: Integral Image

To be successful, a detection algorithm must possess two key properties: accuracy and speed. There is generally a trade-off between the two. Through the use of a new image representation, the integral image, Viola & Jones describe a means of fast feature evaluation that proves to be an effective way to speed up the classification task.
The integral image Int of an image I is defined as

    Int(x, y) = \sum_{x'=0}^{x} \sum_{y'=0}^{y} I(x', y')    (1)

In other words, the integral image at location (x, y) is the sum of all pixel values above and to the left of (x, y), inclusive.
The brilliance of using an integral image to speed up feature extraction lies in the fact that the sum over any rectangle in an image can be calculated from the corresponding integral image by indexing it only four times. Given a rectangle specified by its upper-left corner (x1, y1) and lower-right corner (x4, y4) (see figure 2), evaluating the rectangle sum takes four integral image references:

    A(x1, y1, x4, y4) = Int(x1, y1) + Int(x4, y4) - Int(x1, y4) - Int(x4, y1)    (2)

Figure 2: Example of integral image application.
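
To make the four-reference trick concrete, here is a minimal Matlab sketch (illustrative only, not part of the required lab code) that builds an integral image with cumsum and checks equation (2) against a direct sum:

I   = magic(6);                     % any small test image
Int = cumsum(cumsum(I, 1), 2);      % integral image: Int(y,x) = sum of I(1:y, 1:x)

% Corner convention of equation (2): (x1,y1) upper left, (x4,y4) lower right.
x1 = 2; y1 = 2; x4 = 5; y4 = 5;
A = Int(y1,x1) + Int(y4,x4) - Int(y1,x4) - Int(y4,x1);

% The four references cover rows y1+1..y4 and columns x1+1..x4:
assert(A == sum(sum(I(y1+1:y4, x1+1:x4))));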

Exercise: Write the feature value of each of the four types in figure 1 as a function of the parameters in figure 3, given the integral image Int (with detection-box size n × m).

Figure 3: An illustration of the different parameters of a feature of type (3). The detection box is here assumed to have dimensions n × m.

For a given set of training images, we can extract a large collection of features
very fast using the idea above. The hypothesis of Viola & Jones is that a
very small number of these features can be combined to form an effective
classifier.

The weak classifiers

How can these simple features be used to build classifiers? First we will consider weak classifiers. These are of the following form: given a single feature f_i evaluated on an image x, the output of the weak classifier h_i(x) is either 0 or 1, depending on whether the feature value is less than a given threshold θ_i:

    h_i(x) = 1 if p_i f_i(x) < p_i θ_i, and 0 otherwise    (3)

where p_i is the parity and x is the image box to be classified. Thus our set of features defines a set of weak classifiers. From the evaluation of each feature type on training data it is possible to estimate the value of each classifier's threshold and its parity variable.
There are two basic methods for determining the threshold value associated with a feature. Both rely on estimating two probability distributions: the distribution of the feature's values on the positive samples (face data) and on the negative samples (non-face data). With these distributions, the threshold can be determined either by taking the average of their means or by finding the crossover point [1]. This crossover point corresponds to

    f_i  s.t.  p(f_i | non-face) = p(f_i | face)    (4)

(see figure 4). In this project, that choice is left to the student; see the Matlab section.
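
For concreteness, the average-of-means variant could look like the following sketch (illustrative only; the required findThreshold.m is specified later). It assumes featval holds the responses of one feature on npos positive samples followed by nneg negative samples:

muPos = mean(featval(1:npos));
muNeg = mean(featval(npos+1:npos+nneg));
theta = (muPos + muNeg) / 2;           % threshold halfway between the two means
p = sign(muNeg - muPos);               % parity: put faces on the "1" side of eq. (3)
if p == 0, p = 1; end                  % degenerate case: equal means
h = (p*featval < p*theta);             % weak-classifier outputs in {0,1}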

Figure 4: An example of how the distributions of feature values for a specific feature may look over the set of all training samples.

Feature reduction: AdaBoost

Boosting (like bagging) is the process of forming a strong hypothesis through a linear combination of weak ones. In the context of Viola-Jones face detection (a binary classification task), the weak hypotheses are the weak classifiers derived from the extracted set of features.
The idea of combining weak hypotheses to form strong ones is a logical step, akin to the logic that we as humans use when making decisions. For example, to determine that someone is who they say they are, we may ask them a series of questions, each one possibly no stronger than the previous, but once the person has answered all of them we can make a stronger decision about the validity of their identity.
An implementation of AdaBoost, or Adaptive Boosting, is shown in algorithm table 1.
The core idea behind AdaBoost is the application of a weight distribution over the sample set and the modification of that distribution during each iteration of the algorithm. At the beginning the weight distribution is flat. After each iteration, each weak hypothesis returns a classification of each sample image: if the classification is correct, the weight on that image is reduced (it is seen as an easier sample); otherwise its weight is unchanged. Weak classifiers that manage to classify difficult sample images (i.e. those with high weights) are therefore given higher weight in the final strong classifier. Some of the first rectangular features selected by AdaBoost in an example run are shown in figure 5.

Figure 5: Example of some features selected by AdaBoost.

Fast decision structure: The attentional cascade

Increasing the speed of a classification task generally implies that the classification error will increase: decreasing the classification time usually means decreasing the number of evaluations, i.e. the number of weak classifiers, and this decreases accuracy. Viola & Jones proposed a method for both reducing the classification time and maintaining classifier robustness and accuracy through the use of a classifier cascade. The logic behind the structure of the cascade is quite elegant, the key being that in the early stages of the cascade the classifiers are largely naive, yet able to accurately classify negative samples with a small number of features.

Algorithm 1 AdaBoost
Input: Example images (x_1, ..., x_n) and associated labels (y_1, ..., y_n), where y_i ∈ {0, 1}. y_i = 0 denotes a negative example and y_i = 1 a positive one. m is the number of negative examples and l = n - m the number of positive examples.
Initialise: Set the n weights to

    w_{1,i} = 1/(2m) if y_i = 0, and w_{1,i} = 1/(2l) if y_i = 1    (5)

for t = 1, ..., T do

  1. Normalize the weights,

         w_{t,i} ← w_{t,i} / \sum_{j=1}^{n} w_{t,j}    (6)

     so that w_t is a probability distribution.

  2. For each feature j train a classifier h_j which is restricted to using a single feature. The error is evaluated with respect to the w_{t,i} as ε_j = \sum_i w_{t,i} |h_j(x_i) - y_i|.

  3. Choose the classifier h_t as the h_j that gives the lowest error ε_j. Set ε_t to that ε_j.

  4. Update the weights:

         w_{t+1,i} = w_{t,i} β_t^{1-e_i}    (7)

     where e_i = 0 if example x_i is classified correctly and e_i = 1 otherwise, and β_t = ε_t / (1 - ε_t).

end for
Output: A strong classifier defined by

    h(x) = 1 if \sum_{t=1}^{T} α_t h_t(x) ≥ (1/2) \sum_{t=1}^{T} α_t, and 0 otherwise    (8)

where α_t = log(1/β_t).
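
A compact Matlab sketch of this loop may help when structuring your own implementation (illustrative only; the variables H, y, m, l and T are assumptions, with H an n-by-J matrix of precomputed weak-classifier outputs H(i,j) = h_j(x_i) in {0,1} and y an n-by-1 label vector):

w = (y==0)*(1/(2*m)) + (y==1)*(1/(2*l));       % eq. (5)
chosen = zeros(T,1); alpha = zeros(T,1);
for t = 1:T
    w = w / sum(w);                             % eq. (6): normalize
    err = abs(H - repmat(y,1,size(H,2)))' * w;  % eps_j for every feature j
    [epst, j] = min(err);                       % step 3: best weak classifier
    beta = epst / (1 - epst);
    e = (H(:,j) ~= y);                          % e_i = 1 iff x_i misclassified
    w = w .* beta.^(1 - e);                     % eq. (7): shrink correct weights
    chosen(t) = j; alpha(t) = log(1/beta);
end
strong = (H(:,chosen)*alpha >= 0.5*sum(alpha)); % eq. (8) on the training set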
As a positive sample progresses through the cascade, assuming it continues to be positively classified, the process of classification becomes finer and the number of features that are evaluated increases (see figure 6).

Figure 6: The attentional cascade using increasingly specialized classifiers.

The use of a cascade capitalizes on the fact that, during a detection task in a large image, the vast majority of the sub-windows visited by the scanning classifier (the detection box) will be rejected, since only a small part of the image corresponds to the targets (i.e. faces). For this reason, the first stages must be general enough to stop false positive sub-windows from progressing into the later stages of the cascade.
The goal in a competitive algorithm is to provide inference with the lowest possible false positive rate and the highest possible detection rate. Viola & Jones show that, for a trained classifier cascade, the overall false positive rate is the product of the false positive rates of the stages in the chain. Based on this, and deeper reasoning motivating the cascade, they provide a generic algorithm for the training process that builds the stages of the cascade (see algorithm table 2). This algorithm requires both the minimum acceptable detection rate d and the maximum false positive rate f for each layer.

Algorithm 2 Cascade Training Algorithm
Input: The allowed false positive rate f and detection rate d per layer, and the target false positive rate F_target. A set P of positive examples (faces) and a set N of negative examples (non-faces).
Initialise: Set F_0 = 1.0, D_0 = 1.0, i = 0.
while F_i > F_target do

  • i = i + 1

  • n_i = 0, F_i = F_{i-1}
    while F_i > f × F_{i-1} do

      – n_i = n_i + 1

      – Use P and N to train a classifier with n_i features using AdaBoost.

      – Evaluate the current cascaded classifier on the validation set to determine F_i and D_i.

      – Decrease the threshold of the i-th classifier until the current cascaded classifier has a detection rate of at least d × D_{i-1} (this also affects F_i).

    end while

  • N = ∅.

  • If F_i > F_target then evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N.

end while

Matlab code and Database

You can find all the relevant files and documentation for this project on the course webpage (www.csc.kth.se/utbildning/kth/kurser/2D1427/bik08/).
This section gives a brief overview of the Matlab code and dataset used in the laboratory tasks. Almost all Matlab functions and scripts have help comments; just type

>> help filename

at the Matlab command prompt to see them.
Before we go into the specific tasks of the lab, we give a brief overview of the different functions included in the lab package.

Lab functions overview

The main functions of the lab are listed, in their order of execution, in the file AdaBoost_main.m. Observe that we will work with windows of size 19×19 pixels, i.e. the detection box is 19×19.
The function makelist.m runs a script that enables the user to select a directory of images to be used as a database. Both training and validation sets should be in the same directory. The output of this function is a file list_img.txt that lists all the positive samples (faces) and all the negative samples (non-faces), determined by examining the filenames of the images.
This file is then read by the second function in AdaBoost_main.m, namely ReadImageFile.m. It reads all the face and non-face image files listed in list_img.txt and stores each image as a row vector of length 361 (= 19²). These row vectors are stacked on top of each other to form two matrices: an Np × 361 matrix FaceData containing the face image data and an Nn × 361 matrix NonFaceData containing the non-face image data. Next the image data is normalized and the integral image of each image is created. For this purpose the image data is padded with extra boundaries on the right and bottom of each image. Before ReadImageFile.m exits, the resulting matrices cumFace and cumNonFace are saved to disk. Your first task will be to write a function cumImageJN.m that performs this cumulative sum (see Task I).
The third function call in AdaBoost_main.m is to makeImagesF.m. This function generates the set of Viola-Jones features that will be used later on. Here the student must think of an efficient way to represent and calculate feature values. Remember that each of the many features (~100,000) will need to be evaluated on ALL training images (~10,000). An efficient methodology is described in the text (see Task I), but it is up to the student to implement it, or any other approach that might be suitable. Note, however, that the ordering of the features is important: a specific order is specified for the features.
The fourth function listed in AdaBoost_main.m, DisplayFeature.m, enables the graphical display of features in an intuitive way (black and white areas). This function is given to the students and requires only the feature number (according to the ordering specified in the lab) as a parameter. Later in the lab (Task II), the student is required to write a function show_classifier.m that utilizes DisplayFeature.m to graphically display the superimposed features of a strong classifier.
The fifth function call in AdaBoost_main.m is to TrainCascade.m. The goal of this function is to create a cascade of strong classifiers (see Task III). In order for the competition between students to be fair, we have set an upper limit of 100 on the total number of weak classifiers (over all stages). How these 100 classifiers are distributed among the strong classifiers of the cascade is up to the students. Just remember that you will be judged both on accuracy and on speed! To do this, TrainCascade.m uses another function called TrainAdaB_stage.m, which takes as input the desired number of weak classifiers and creates a single stage (strong classifier) with that many features. It uses the AdaBoost algorithm. When calling TrainAdaB_stage.m you must also specify the ratio of sample data to set aside for validation. Completing this function requires several steps and a deeper understanding of the AdaBoost algorithm. There are a couple of functions (findThreshold.m, TestStage.m) inside TrainAdaB_stage.m that the student needs to complete (see Task II).
When AdaBoost_main.m is done it will have created a cascaded face detector that is used by the function FaceDetector.m. At this final stage of the project the student will need to write a Matlab function that reads an image and tries to locate the faces in it (see Task III). These functions are the ones run on a test set in the final competition at the end of the course.

Database

The database we use for this project can be found on the course webpage. It consists of 2000 positive samples (ADAFACES) and 4000 negative samples (ADANFACES). There is, however, a simple trick to double the size of this database: every image has a distinct and equally valid counterpart, namely its mirror image. So by mirroring every image in the dataset you obtain "new" samples. We recommend you do this before you start with the rest of this lab (see Task 0).

Task 0 - Preliminaries

First of all you need to download the Matlab library for the lab at
/afs/nada.kth.se/home/1/u16rglu1/Public/2D1427.
Under the subdirectory database you will find the database of positive (ADAFACES.zip) and negative (ADANFACES.zip) samples; you need to download these too.
When this is done you need to write a Matlab script mirror.m that doubles the existing database by creating the mirrored image of every image in the database. Here you may want to use the Matlab function fliplr (see the Matlab help). Your script should do this for every image of the database and save the mirrored versions; a minimal sketch follows below.
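
A minimal sketch of such a script (the image format, directory and naming scheme are assumptions; the database images are assumed to be grayscale, as fliplr requires a 2-D matrix):

files = dir(fullfile('ADAFACES', '*.bmp'));             % assumed format/location
for k = 1:numel(files)
    name = fullfile('ADAFACES', files(k).name);
    img  = imread(name);                                % 19x19 grayscale image
    imwrite(fliplr(img), strrep(name, '.bmp', '_mirror.bmp'));
end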
The function makelist.m creates the list list_img.txt of paths to the database images. This list will be used by ReadImageFile.m for reading the files, but it could also be used by the mirror.m script above. Just remember to re-run makelist.m after you have mirrored the database.

Task I - Integral images and Haar-like features

In the file ReadImageFile.m there is a function call to cumImageJN.m. It takes as input an image, shaped as a row vector of size 1 × N, and returns the corresponding integral image (also a row vector of size 1 × N).

Exercise 1 (Programming)
The first task is to write the function cumImageJN.m, then run and test it. (The Matlab functions cumsum and reshape may be useful.)

As a test you can try this function call:

>> cumImageJN(ones(1, 9))

The resulting output should be [1 2 3 2 4 6 3 6 9].


If this works you can now run the function ReadImageFile.m (the second function in AdaBoost_main.m) and thus generate the two matrices that hold the integral image data (FaceData and NonFaceData). Note that these matrices are created from vectorized images, see figure 7.

Figure 7: The images are first vectorized, then stacked into an array/matrix.

Next you are supposed to write a function that implements the Haar-like features. Preferably we want to have them in a matrix fMat. Think about what the most appropriate feature representation would be if the images are vectorized (reshaped as vectors instead of matrices) and we want to evaluate a feature on an image as

IntImage*fMat(:,i)

where IntImage is the vectorized integral image (size 1 × nm) and fMat(:,i) is the i-th feature. In other words, a feature is represented as an nm × 1 array (column vector); a small illustration follows below. Your task is to write a function featureGen.m that generates this feature vector given the type and parameters of the feature (using the convention in figure 3).
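
As an illustration of this representation (a sketch, not the required featureGen.m; the column-major layout with y as the row index is an assumption), a single rectangle sum can be written as a dot product with a sparse ±1 column vector:

n = 19; m = 19;
I   = rand(n, m);
Int = cumsum(cumsum(I, 1), 2);            % 2-D integral image
IntImage = reshape(Int, 1, n*m);          % 1 x nm, column-major

idx = @(x, y) (x-1)*n + y;                % linear index of pixel (x,y)
x1 = 2; y1 = 2; x4 = 10; y4 = 12;
v = zeros(n*m, 1);
v([idx(x1,y1), idx(x4,y4)]) =  1;         % "+" corners of equation (2)
v([idx(x1,y4), idx(x4,y1)]) = -1;         % "-" corners
rectSum = IntImage * v;                   % equals A(x1,y1,x4,y4)
% A Haar-feature column of fMat is a signed combination of such vectors.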

Exercise 2 (Programming)
Write the function featureGen.m that, given the parameters (m, n, x, y, w, h, t) (where t indicates the type of the feature, see figure 8), returns a feature vector featv of size nm × 1, such that IntImage*featv equals the value of that feature evaluated on the image with integral image IntImage. The exercise below figure 2 can be useful for writing this function.

It is convenient to let all features start from x = 2, y = 2, since this simplifies the feature calculation.

Before moving on you need to confirm that your code for featureGen.m is correct. To do this, run the command

>> test_featGen

Make sure that the file featureGen.m is in the same folder as test_featGen.p. Show your result to the lab assistant before moving on.
If this section is too difficult and taking too much time, you can ask the lab assistant for a simpler version of this task.
The function DisplayFeature.m allows the visualization of a feature. For example, to illustrate the feature featv:

>> box = DisplayFeature(19,19,featv);
>> imagesc(box); colormap(gray); axis equal;

Try generating different features (using various types and parameters) and display them using the above commands to see if they look as expected.

For the continuation of the lab it is very important that the features are ordered in a pre-defined way. We use the following ordering (with the convention in figure 3):

for y=2...n
  for x=2...m
    for h=1...
      for w=1...

How the different feature types are ordered is illustrated in figure 8.

Figure 8: The order in which the different feature types and sizes are stacked.

The goal now is to generate a feature matrix fMat that contains all the features. This matrix should have size 19² × N, where N is the total number of features.

Exercise 3 (Programming)
Write the function featureMatrix.m that, given the size of the window (n, m), creates the matrix fMat.
Tip: Utilize your previously written function featureGen.m!

From now on in the lab, fMat, FaceData and NonFaceData will be the only data we need to go on with the boosting.
Before going on, check with the lab assistant that you have done this correctly.

Exercise 4 (Written)
How are the Viola-Jones features represented in this implementation?
What is the benefit of this implementation?

Task II - AdaBoost and Classifier construction

Here your task is to write the function TrainAdaB_stage.m. Note that this function takes a long time to run on the whole dataset, so your code should first be developed and tested on the small control sample mentioned at the end of this task. We will now make our way through this function and complete some exercises while doing so.

Classifier Construction via AdaBoost

The function takes the desired number of weak classifiers (T), the ratio of test data to training data (test_ratio) and the desired false positive rate (targetFP). It returns the strong classifier as a list fNbestArray of the selected feature indices in fMat, their respective thresholds thetaBestArray and parities pBestArray, the respective AdaBoost coefficients alpha_t_Array, and finally the true positive (tp) and true negative (tn) rates of the whole classifier measured on the test set.

Exercise 5 (Programming)
Write the function TrainAdaB_stage.m. The function should return a trained stage with the desired parameters. Use the functions findThreshold.m and TestStage.m that are described as parallel tasks below.

To facilitate the writing of this function there are several tasks for you to do in parallel.
Once you have computed the value of a specific feature for ALL samples (IntImageMat*fMat(:,i)), you have all the information needed to construct the two histograms in figure 4; a sketch follows below.
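
For instance, the two histograms could be produced along these lines (illustrative only; DisplayHist.m itself is Exercise 6 below):

edges = linspace(min(featval), max(featval), 50);
hPos  = histc(featval(1:npos), edges);            % feature values on faces
hNeg  = histc(featval(npos+1:npos+nneg), edges);  % feature values on non-faces
bar(edges, [hPos(:) hNeg(:)]);
legend('faces', 'non-faces');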

Exercise 6 (Programming)
First, write a function DisplayHist.m that displays the two histograms for a specific feature, given the feature response vector featv and the numbers of positive and negative samples (we assume the samples are ordered with positives first and negatives last). Tip: use the Matlab function histc.

When you have done this, the goal is to find a threshold function for the feature (i.e. to generate a weak classifier). Here the ambitious student can try to generate a more intelligent weak classifier (for example, as in [1], by having multiple thresholds). The standard simple approach that we suggest is to use a single threshold, just as described in the Viola-Jones paper.

Exercise 7 (Programming)
Write the function findThreshold.m that finds an appropriate threshold on a feature response to separate the positive and negative sets.

This function should take the feature value response on all images (positive and negative samples) (featval) and the numbers of positive and negative samples (npos, nneg), and return the optimal threshold with the appropriate parity. Your function call will probably look something like:

[threshold, parity] = findThreshold(featval,npos,nneg);

Note that featval is a vector of length N, where N is the number of training images. The output variable threshold must be a scalar and parity should be either 1 or -1. You can sanity-check your findThreshold function by verifying that min(featval) < parity*threshold < max(featval).
After completing this function you can test whether it is working properly by having a look at the file t_findThreshold.m. Observe that this check only covers the case of the Viola-Jones "average-of-means" threshold.

Classifier Evaluation

Your first task when you finally have a trained strong classifier is to write a function that can use it.

Exercise 8 (Programming)
Write a function ApplyStage.m that runs your trained strong classifier on a test image of size 19×19 and returns the classification according to that classifier. The input to the function should be fNbestArray (the array of row indices in fMat of the features included in the strong classifier), thetaBestArray (their corresponding thresholds), pBestArray (their corresponding parities), and alpha_t_Array (their corresponding feature weights; α-values).

You now have a final strong classifier. You can test it on the portion of test data (x_test, y_test) set aside at the beginning of TrainAdaB_stage.m.

Exercise 9 (Programming)
Write a function TestStage.m that runs your final strong classifier on the test data (x_test, y_test). It should output the fractions of true positives (tp) and true negatives (tn). This function should utilize ApplyStage.m.

As before, you can test the correctness of this function after completion by having a look at the script t_TestStage.m. There is also a suggested appearance of the returned ROC curve in the file t_TestStage.jpg.
The call to this function should look like:

[tp,tn] = TestStage(fNbestArray,thetaBestArray,pBestArray,alpha_t_Array,x_test,y_test);

where fNbestArray is the array of ids (row indices in the feature matrix saved in the file feature.mat) of the features included in the strong classifier, thetaBestArray holds their corresponding thresholds, pBestArray their corresponding parities, and alpha_t_Array their corresponding feature weights (α-values). Note that all four vectors above are 1 × T, where T is the number of features in the strong classifier.
NOTE: The definitions of true positives and true negatives can be found by studying table 1.

                       Predicted class
True class    Yes                     No
Yes           True-Positive (TP)      False-Negative (FN)
No            False-Positive (FP)     True-Negative (TN)

Table 1: The relationship between predicted class and true class.

Note in table 1 that (TP + FN)/|P| = (FP + TN)/|N| = 1, where |P| and |N| are the numbers of positive and negative samples, respectively. In other words, the rates for each quantity are tp = TP/|P|, fn = FN/|P|, fp = FP/|N| and tn = TN/|N|.

The numbers of true positives and false positives will vary depending on the threshold applied to the final strong classifier. The ROC curve (Receiver Operating Characteristic) is a way to summarize this variation: it plots the fraction of true positives vs. the fraction of false positives as the threshold varies from -∞ to +∞. From this curve you can ascertain what loss in classifier specificity (false positive rate) you will have to endure for a required accuracy (true positive rate).

Exercise 10 (Programming)
Write a function to calculate and plot the ROC curve for your classifier.
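
One minimal way to trace out the curve (a sketch only; the score vector scores, holding \sum_t α_t h_t(x) per test image, and the label vector y are assumptions):

thr = sort(unique(scores));                 % candidate thresholds
tpr = zeros(size(thr)); fpr = zeros(size(thr));
for k = 1:numel(thr)
    pred   = (scores >= thr(k));            % accept if the score clears the threshold
    tpr(k) = sum(pred & y==1) / sum(y==1);  % true-positive rate
    fpr(k) = sum(pred & y==0) / sum(y==0);  % false-positive rate
end
plot(fpr, tpr);
xlabel('false-positive rate'); ylabel('true-positive rate');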

Debug

Before you go on, it is highly recommended that you test your final TrainAdaB_stage on some simple (not too time consuming) data, to reassure yourself that things are working properly. For this purpose there is some sample data on the course webpage. Download the files t_images.mat and t_features.mat and replace the filenames in the lines load('images.mat') and load('features.mat') in the code with these names instead.

Exercise 11 (Programming)
Run the AdaBoost classifier training function you have just completed on the small datasets you have just downloaded. When the code runs, make sure that no warnings or errors occur. Use the call:

>> [fN,theta,pBest,alpha_t,tp,tn]=TrainAdaB_stage(10,0.5,0);

i.e. the stage is trained with 10 weak classifiers, the ratio of test data to training data is 0.5, and the target false positive rate is 0.

Once you are happy that your code runs smoothly and does what you expect it to do, you can analyze the performance and structure of the classifier you have trained. The performance can be evaluated by calculating and visualizing the ROC curve, while the structure of the classifier can be examined by seeing which features were selected to build it. For the latter task a function show_classifier.m has to be written. Note that to illustrate feature number i:

>> box = DisplayFeature(19,19,f(:,i));
>> imagesc(-pBestArray(i)*box); colormap(gray); axis equal;

Exercise 12 (Programming)
Write the function show_classifier.m that utilizes DisplayFeature.m to graphically display the superimposed features of a strong classifier.

Exercise 13 (Programming/Written)
1. Calculate and plot the ROC curve for your classifier.

2. Which 10 features were selected for your strong classifier? Display them and save the images.

3. Run the script command below and describe what you see.

>> show_classifier(fN,alpha_t,1);

PS - Don't forget to change the lines load('t_images.mat') and load('t_features.mat') back to their original form when you are done. - DS

Task III - Real-time face detection

Now you have a method TrainAdaB_stage that generates a strong classifier. It can be used to create the cascade of classifiers described in algorithm 2. Given this cascade of classifiers it is then possible to run efficient and fast face detection. Task III is divided into two sections: building the cascade of classifiers and implementing face detection on a test image.

Cascade of Classifiers

In the Matlab file TrainCascade.m you have to implement the algorithm described in algorithm table 2, using the function TrainAdaB_stage.
The required validation dataset can be marked via the test_ratio parameter in the call to the TrainAdaB_stage function. Remember that the TOTAL number of features (weak classifiers) in the cascade cannot exceed 100, so the choice of how many features to use at each level of the cascade is up to you.
Following the algorithm in table 2, there are three important steps here:

1. Update the negative training set at each level.

2. Let the number of features used at each level be decided by the specified classification rate.

3. Test whether the approach taken in step 2 results in a better classifier than having a fixed number of features at each level of the cascade ("better" meaning both classification performance and speed).

Also remember to save the final cascaded classifier, preferably by saving the array of feature indices and the corresponding arrays of α-values and parities.

Exercise 14 (Programming)
Complete the function TrainCascade.m. Run and debug it.

Once your TrainCascade.m is ready you can run AdaBoost_main.m. The program will create a cascade that is saved and ready to be used.

Face detection

Your first task here is to write a function that utilizes the trained cascade to classify a given subwindow of size 19×19.

Exercise 15 (Programming)
Write a function ApplyCascade that runs a given subwindow of size 19×19 through a given cascade (cascade.mat) and returns a positive or negative response. Preferably this function should utilize ApplyStage.m in a for-loop, as in the sketch below.
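
A possible shape for this function (a sketch only; the cascade struct fields and the argument order of ApplyStage are assumptions, not the lab's required format):

function positive = ApplyCascade(intWin, cascade)
    positive = 1;
    for s = 1:numel(cascade)                 % one stage = one strong classifier
        if ~ApplyStage(intWin, cascade(s).fN, cascade(s).theta, ...
                       cascade(s).p, cascade(s).alpha)
            positive = 0;                    % rejected by this stage:
            return;                          % stop early (the speed gain)
        end
    end
end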

Now you are ready to create a function FaceDetector.m that uses this cascade to detect faces of all sizes in larger images. Since ApplyCascade.m only classifies a subwindow of size 19×19, we want to apply it over all scales and all locations in a larger image. To do this you should think in terms of nested for-loops (see figure 9 and the sketch below).

Figure 9: The sliding window of FaceDetector.m traverses different locations and scales in the large image.
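
The scanning structure could look like the following sketch (illustrative only; the scale set, the stride step, and the helper prepWindow, which should normalize the window and form its integral image as in training, are assumptions):

load('cascade.mat');                      % assumed to define the variable cascade
img    = double(rgb2gray(imread('test.bmp')));  % assuming an RGB test image
scales = 1.25.^(0:7);                     % assumed scale pyramid
step   = 2;                               % assumed pixel stride
for s = scales
    im = imresize(img, 1/s);
    [rows, cols] = size(im);
    for y = 1:step:rows-18
        for x = 1:step:cols-18
            win = im(y:y+18, x:x+18);     % 19x19 subwindow
            if ApplyCascade(prepWindow(win), cascade)
                % face found: remember the box [x y 19 19], scaled back by s
            end
        end
    end
end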

Exercise 16 (Programming)
Write the function FaceDetector.m. The function should take an input image (read in by e.g. testimage=imread('test.bmp');) and return the windows of the image which were labeled as faces by your cascade classifier, i.e. the output should be the original image with colored squares marking the locations of the detected faces. Use the function ApplyCascade.m for classifying the subwindows. (Tip: the Matlab function rectangle might be useful here.)

Remember to take care of the issue of multiple responses at neighboring positions. This can be done, for example, by restricting how close two distinct faces can be, or by taking the average position of multiple detections that are relatively close to each other.
If you are using a Windows computer there is an entertaining application of the program you have just created.

Exercise 17 (Just for fun)
If you connect a webcam to your computer and use the command cam_image = vcapg('fast'); you can get live images from the webcam. Try to write a script that takes cam_image and uses the function FaceDetector.m to detect faces in the webcam image flow.

Task IV - Evaluation and testing

There is actually not much left to do now. The only thing you need to do is to reassure yourself that the code complies with the restrictions of the competition.
You need to save the structure cascade.mat as a mat-file in Matlab and email it to babak2@kth.se. If everything is done right, the competition script will read your cascade into the detector we have designed and test it on a test set separate from the training set you have been given.
As stated before, the winning group of this competition will be announced at the end of the course. The two criteria measured by the competition script are speed and accuracy (where accuracy means a high true positive rate and a low false positive rate).

GOOD LUCK!

Bibliography

[1] B. Rasolzadeh, L. Petersson and N. Pettersson, Response Binning: Improved Weak Classifiers for Boosting, in IEEE Intelligent Vehicles Symposium (IV2006), Tokyo, Japan, June 2006.

[2] P. Viola and M. Jones, Robust real-time object detection, in Second International Workshop on Statistical and Computational Theories of Vision - Modeling, Learning, Computing and Sampling, July 2001.

[3] H. Rowley, S. Baluja and T. Kanade, Neural network-based face detection, in IEEE Trans. Patt. Anal. Mach. Intell., volume 20, pages 22-38, 1998.

[4] R.E. Schapire, A brief introduction to boosting, in Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.

[5] P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, 2001.

[6] R. Lienhart and J. Maydt, An extended set of Haar-like features for rapid object detection, in ICIP, volume 1, pages 900-903, 2002.

[7] H.-S. Kim, W.-S. Kang, J.-I. Shin and S.-H. Park, Face detection using template matching and ellipse fitting, in IEICE Trans. Inf. Syst., volume E83-D, 2000.

[8] P. Peer and F. Solina, An automatic human face detection method, in Proc. of Computer Vision Winter Workshop, pages 122-130, Rastenfeld, Austria, 1999.

[9] E. Saber and A. Tekalp, Frontal-view face detection and facial feature extraction using color, shape and symmetry based cost functions, 1998.

[10] J. Tang, S. Kawato and J. Ohya, A face recognition system based on wavelet transform and neural network, in International Conference on Wavelet Analysis and its Applications, p. 53, 1999.

[11] B. Menser and M. Brünig, Segmentation of human faces in color images using connected operators, in ICIP, 1999.

[12] D. Roth, M.-H. Yang and N. Ahuja, A SNoW-based face detector, in NIPS-12, 2000.
