LAB MANUAL 2D1427 Image Based Recognition
and Classification
Babak Rasolzadeh
However, for the sake of the argument that there is indeed a difference
between the two, one could say that at the detection level one needs to
identify a set of generalized features that apply for the class we are trying to
identify. The localization of such features can be accomplished by a number
of common methods. There are basically four different approaches to the
problem of face detection:
Feature extraction: Haar-like features
There are two motivations for using features instead of the pixel intensities
directly. Firstly, features encode domain knowledge better than pixels.
Secondly, a feature-based system can be much faster than a pixel-based
system.
Figure 1: Four examples of the type of features normally used in the Viola-Jones
system.
In other words the integral image at location (x, y) is the sum of all pixel-
values above and left of (x, y), inclusive.
The brilliance of using an integral image to speed up feature extraction
lies in the fact that the pixel sum over any rectangle in an image can be
calculated from the corresponding integral image, by indexing the integral
image only four times.
Given a rectangle specified by its corner coordinates, (x1 , y1 ) upper left and
(x4 , y4 ) lower right (see figure 2), evaluating the pixel sum over the rectangle
is done in four integral image references:
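The lab itself is written in Matlab; as an illustrative sketch (with our own function names), the integral image and the four-lookup rectangle sum can be written in Python/NumPy as follows. Here boundary guards play the role of the padding used in the lab code:

```python
import numpy as np

def integral_image(img):
    """Int(x, y) = sum of all pixels above and to the left of (x, y), inclusive."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def rect_sum(ii, r1, c1, r2, c2):
    """Sum of img[r1:r2+1, c1:c2+1] using only four lookups in the integral image ii."""
    total = ii[r2, c2]
    if r1 > 0:
        total -= ii[r1 - 1, c2]          # strip above the rectangle
    if c1 > 0:
        total -= ii[r2, c1 - 1]          # strip left of the rectangle
    if r1 > 0 and c1 > 0:
        total += ii[r1 - 1, c1 - 1]      # corner subtracted twice, add back
    return total
```

Note that the cost of `rect_sum` is independent of the rectangle's size, which is exactly why the integral image makes feature extraction fast.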
Exercise: Write the feature value of each of the four types in figure 1, as
a function of the parameters in figure 3, given the integral image Int (with
size n × m).
For a given set of training images, we can extract a large collection of features
very fast using the idea above. The hypothesis of Viola & Jones is that a
very small number of these features can be combined to form an effective
classifier.
How can these simple features be used to build classifiers? First we will
consider weak classifiers. These are of the form: given a single feature
fi evaluated at x, the output of the weak classifier hi (x) is either 0 or 1.
The output depends on whether the feature value is less than a given
threshold θi :

    hi (x) = 1  if pi fi (x) < pi θi ,  and 0 otherwise.     (3)
where pi is the parity and x is the image-box to be classified. Thus our set of
features define a set of weak classifiers. From the evaluation of each feature
type on training data it is possible to estimate the value of each classifier’s
threshold and its parity variable.
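As a minimal sketch (in Python rather than the lab's Matlab, with our own function name), equation (3) is a one-liner; the parity simply lets the same rule fire on whichever side of the threshold the positives fall:

```python
def weak_classify(feature_value, theta, parity):
    """Equation (3): h(x) = 1 if parity * f(x) < parity * theta, else 0.
    parity is +1 or -1 and flips the direction of the inequality."""
    return 1 if parity * feature_value < parity * theta else 0
```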
There are two basic methods for determining the threshold value associated
with a feature vector. Both methods rely on estimating two probability
distributions - the distribution of the values of the feature when applied
to the positive samples (face data) and to the negative samples (non-face
data). With these distributions, the threshold can be determined either by
taking the average of their means or by finding the crossover point [1]. This
crossover point corresponds to the value where the two distributions
intersect (see figure 4). In this project, that choice is left to the student;
see the Matlab section.
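The simpler of the two threshold rules can be sketched in a few lines of Python/NumPy (the lab code is Matlab; the function and variable names here are ours). The parity is chosen so that the classifier fires on the side of the threshold where the positive samples lie:

```python
import numpy as np

def find_threshold_by_means(pos_values, neg_values):
    """Place the threshold halfway between the two class means and pick the
    parity so that positives satisfy  parity * f(x) < parity * theta."""
    mu_pos = np.mean(pos_values)
    mu_neg = np.mean(neg_values)
    theta = 0.5 * (mu_pos + mu_neg)
    parity = 1 if mu_pos < mu_neg else -1   # positives below threshold -> p = +1
    return theta, parity
```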
Figure 4: An example of how the distribution of feature values for a specific feature
may look over the set of all training samples.
For a binary classification task such as ours, the weak hypotheses can be
represented as the weak classifiers that are derived from the extracted set
of features.
The idea of combining weak hypotheses to form strong ones is a logical step,
akin to the logic that we as humans use when making decisions. For example,
to determine that someone is who they say they are, we may ask them a
series of questions, each one possibly no stronger than the previous, but
once the person has answered all the questions we can make a stronger
decision about the validity of the person's identity.
An implementation of AdaBoost, or Adaptive Boosting, is shown in
Algorithm 1.
The core idea behind the use of AdaBoost is the application of a weight
distribution to the sample set and the modification of the distribution during
each iteration of the algorithm. At the beginning the weight distribution is
flat, but after each iteration of the algorithm each of the weak hypotheses
returns a classification on each of the sample-images. If the classification
is correct the weight on that image is reduced (seen as an easier sample),
otherwise there is no change to its weight. Therefore, weak classifiers that
manage to classify difficult sample-images (i.e. with high weights) are given
higher weighting in the final strong classifier. Some of the initial rectangular
features selected by AdaBoost in an example run are shown in Figure 5.
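One round of this weight update can be sketched in Python/NumPy (the lab uses Matlab; names are ours). The update w ← w · β^(1−e) with β = ε/(1−ε), where e = 0 for a correctly classified sample and 1 otherwise, is the one from the Viola-Jones paper:

```python
import numpy as np

def adaboost_round(weights, predictions, labels):
    """One AdaBoost iteration, given the chosen weak classifier's predictions.
    Correctly classified samples have their weight multiplied by beta < 1,
    misclassified samples keep theirs, so the distribution shifts toward the
    hard examples.  Returns the new weights and alpha = log(1/beta)."""
    weights = weights / weights.sum()            # normalise the distribution
    miss = (predictions != labels).astype(float) # e_i: 0 if correct, 1 if wrong
    eps = np.sum(weights * miss)                 # weighted error of this round
    beta = eps / (1.0 - eps)
    weights = weights * beta ** (1.0 - miss)     # down-weight the easy samples
    alpha = np.log(1.0 / beta)                   # vote of this weak classifier
    return weights, alpha
```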
Increasing the speed of a classification task generally increases the
classification error, since decreasing the classification time usually means
decreasing the number of evaluations, i.e. the number of weak classifiers.
Viola & Jones proposed a method for reducing the classification time while
maintaining classifier robustness and accuracy through the use of a classifier
cascade. The logic behind the structure of the cascade is quite elegant, the
key being that in the early stages of the tree the classifier structure is largely
naive, yet able
Algorithm 1 AdaBoost
Input: Example images (x1 , ..., xn ) and associated labels (y1 , ..., yn ), where
  yi ∈ {0, 1}. yi = 0 denotes a negative example and yi = 1 a positive one.
  m is the number of negative examples and l = n − m the number of
  positive examples.
Initialise: Set the n weights to:

    w1,i = (2m)^(−1)  if yi = 0,    w1,i = (2l)^(−1)  if yi = 1     (5)

for t = 1, · · · , T do
  3. Then choose the classifier ht as the hj that gives the lowest error εj .
     Set εt to that εj .
end for
Output: A strong classifier defined by:

    h(x) = 1  if  Σ_{t=1}^{T} αt ht (x) ≥ (1/2) Σ_{t=1}^{T} αt ,  and 0 otherwise     (8)
to accurately classify negative samples with a small number of features. As
a positive sample progresses through the cascade, assuming that the sample
is indeed positively classified, the process of classification becomes finer and
the number of features that are evaluated increases (see figure 6).
The use of a cascade capitalizes on the fact that during a detection task in
a large image, the vast majority of the sub-windows observed by the
scanning classifier (detection-box) will be rejected, since only a small region
corresponds to the targets (i.e. faces). For this reason, the generality of
the first few stages must be sufficiently high to prevent false positive
sub-windows from progressing into the later stages of the cascade.
The goal of a competitive algorithm is to provide inference with the lowest
possible false positive rate and the highest possible detection rate. Viola &
Jones show that for a trained classifier cascade, the overall false positive
rate is the product of the false positive rates of all stages in the chain.
Based on this, and on a deeper motivation of the cascade, they provide a
generic algorithm for the training process that builds the stages of the
cascade (see Algorithm 2). This algorithm requires both the minimum
acceptable detection rate d and the maximum false positive rate f for each
layer.
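The product rule is worth internalising before choosing per-layer targets. A short Python sketch (our own function name, illustrative only) shows why quite weak layers compound into a very selective cascade:

```python
def cascade_rates(layer_fp, layer_det):
    """Overall false positive rate F and detection rate D of a cascade are the
    products of the per-layer rates (Viola & Jones)."""
    F = D = 1.0
    for f, d in zip(layer_fp, layer_det):
        F *= f
        D *= d
    return F, D
```

For example, ten layers that each pass 50% of non-faces but 99% of faces give an overall false positive rate of about 0.001 while still detecting roughly 90% of the faces.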
Algorithm 2 Cascade Training Algorithm
Input: Allowed per-layer false positive rate f and detection rate d, and the
  target overall false positive rate Ftarget . A set P of positive examples
  (faces) and a set N of negative examples (non-faces).
Initialise: Set F0 = 1.0, D0 = 1.0, i = 0.
while Fi > Ftarget and ni < N do
  • i = i + 1
  • ni = 0, Fi = Fi−1
  • while Fi > f × Fi−1 do
      – ni = ni + 1
    end while
  • N = ∅.
end while
Matlab code and Database
You can find all the relevant files and documentation to this project on the
course webpage (www.csc.kth.se/utbildning/kth/kurser/2D1427/bik08/).
This section gives a brief overview of the Matlab code and dataset used in
the laboratory tasks. Almost all Matlab functions and scripts have help
comments. Just type
>> help filename
in the matlab command prompt to see the help.
Before we go into the specific tasks of the lab, we will give a brief overview
of the different functions included in the lab-package.
The main functions of the lab are listed in their order of execution in the file
AdaBoost_main.m. Observe that we will work with windows of size 19×19
pixels, i.e. the detection-box is assumed to be 19×19.
The function makelist.m runs a script that enables the user to select a
directory of images to be used as a database. Both training and validation
sets should be in the same directory. The output of this function is a file
list_img.txt that lists all the positive samples (faces) and all the negative
samples (non-faces), determined by examining the filenames of the images.
This file is then read by the second function in AdaBoost_main.m, namely
the ReadImageFile.m function. It initially reads all the face and non-face
image files listed in list_img.txt and stores each image as a row vector
of length 361 (= 19²). These row vectors are stacked on top of each other
to form two matrices: an Np × 361 matrix FaceData containing the face
image data and an Nn × 361 matrix NonFaceData containing the non-face
image data. Next the image data is normalized and the integral image of
each image is created. For this purpose the image data is padded with extra
boundaries on the right and bottom of each image. Before ReadImageFile.m
exits, the resulting matrices cumFace and cumNonFace are saved to disk.
Your first task will be to write a function cumImageJN.m that performs this
cumulative sum (see Task 1).
The third function call in AdaBoost_main.m is to makeImagesF.m. This
function generates the set of Viola-Jones features that will be used later on.
Here the student must think of an efficient way to represent and calculate
feature values. Remember that each of the many features (~100,000) will
need to be evaluated on ALL training images (~10,000). An efficient
methodology is described in the text (see Task 1), but it is up to the student
to implement it or any other approach that might be suitable. What is
important here, though, is the ordering of the features (there is a specific
order specified for the features).
The fourth function listed in AdaBoost_main.m, DisplayFeature.m, enables
the graphical display of features in an intuitive way (black and white areas).
This function is given to the students and requires only the feature number
(according to the ordering specified in the lab) as a parameter. Later in the
lab (Task II), the student is required to write a function show_classifier.m
that utilizes DisplayFeature.m in order to graphically display the
superimposed features of a strong classifier.
The fifth function call in AdaBoost_main.m is to TrainCascade.m. In this
function the goal is to create a cascade of strong classifiers (see Task 3). In
order for the competition between students to be fair, we have set an upper
limit of 100 on the total number of weak classifiers (over all stages). How
these 100 classifiers are distributed among the strong classifiers of the
cascade is up to the students. Just remember you will be judged both on
accuracy and speed! To do this, TrainCascade.m uses another function
called TrainAdaB_stage.m, which takes as input the desired number of
weak classifiers and creates a single stage (strong classifier) with that many
features. It uses the AdaBoost algorithm. When calling TrainAdaB_stage.m
you must also specify the ratio of test data that you want to set aside
(extracted from all sample data) for validation. Completing this function
requires several steps and a deeper understanding of the AdaBoost
algorithm. There are a couple of functions (findThreshold.m, TestStage.m)
inside TrainAdaB_stage.m that the student needs to complete (see Task 2).
When AdaBoost_main.m is done, it will have created a cascaded face-detector
that will be used by the function FaceDetector.m. At this final stage of the
project the student will need to write a Matlab function that reads in an
image and tries to locate the faces in that image (see Task 3). These functions
are the ones run on a test set in the final competition at the end of the course.
Database
The database we use for this project can be found on the course webpage. It
consists of 2000 positive (ADAFACES) and 4000 negative (ADANFACES)
samples. There is, however, a simple trick to double the size of this database:
since a mirrored face is still a face (and a mirrored non-face still a non-face),
every image has a distinct counterpart, namely its mirror image. So by
mirroring every image in the dataset you obtain “new” samples. We
recommend you do this before you start with the rest of this lab (see Task 0).
Task 0 - Preliminaries
First of all you need to download the matlab library for the lab at
/afs/nada.kth.se/home/1/u16rglu1/Public/2D1427.
Under the subdirectory database you will find the database of positive
(ADAFACES.zip) and negative (ADANFACES.zip) samples; you need to
download these too.
When this is done you need to write a Matlab script mirror.m that doubles
the existing database by creating the mirrored image of every image in the
database. Here you may want to use the Matlab function fliplr (see Matlab
Help). Your script should do this for every image in the database and save
the mirrored versions.
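The core operation of mirror.m is a single left-right flip per image. As an illustrative sketch (Python/NumPy rather than the lab's Matlab; the function name is ours), the NumPy analogue of fliplr is a reversed column slice:

```python
import numpy as np

def mirror(img):
    """Left-right flip of an image array: the NumPy analogue of Matlab's fliplr.
    Applying it twice returns the original image."""
    return img[:, ::-1]
```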
The function makelist.m creates the list list_img.txt of paths to the
database images. This list will be used by ReadImageFile.m for reading
the files, but it could also be used by the mirror.m script above. Just
remember to re-run makelist.m after you have mirrored the database.
Exercise 1 (Programming)
The first task is to write the function cumImageJN.m, then run and test it.
(The Matlab functions cumsum and reshape may be useful.)
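The core idea can be sketched in Python/NumPy (the actual cumImageJN.m is Matlab, and its exact padding convention is left to the student; this sketch omits the padding and uses our own function name). Each row of the stacked image matrix is reshaped to 19×19, cumulatively summed along both axes, and flattened back:

```python
import numpy as np

def cum_image_rows(data, h=19, w=19):
    """Integral image for every row of a stacked image matrix (N x h*w).
    Each row is reshaped to h x w, double-cumsummed, and flattened back."""
    n = data.shape[0]
    imgs = data.reshape(n, h, w)
    ii = np.cumsum(np.cumsum(imgs, axis=1), axis=2)
    return ii.reshape(n, h * w)
```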
Figure 7: The images are first vectorized then stacked into an array/matrix.
You should think about what the appropriate (most efficient) feature
representation would be if the images are vectorized (reshaped as vectors
instead of matrices) and if we want to evaluate a feature on an image as
IntImage*fMat(:,i)
where IntImage is the linearized integral image (size 1 × nm) and fMat(:,i)
is the i:th feature. In other words, a feature is represented as an nm × 1
array (column vector). Your task is to write a function featureGen.m that
can generate this feature vector given the type and parameters of the
feature (using the convention in figure 3).
Exercise 2 (Programming)
Write the function featureGen.m that, given the parameters
(m, n, x, y, w, h, t) (where t indicates the type of the feature, see
figure 8), returns a feature vector featv of size nm × 1, such that
IntImage*featv equals the feature value of that feature evaluated on
the image with integral image IntImage. The exercise below figure 2
can be useful for writing this function.
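To make the representation concrete, here is an illustrative Python/NumPy sketch for a single rectangle (the lab's conventions in figure 3 and its padding are not reproduced here, and the function name is ours). The feature vector carries ±1 entries at the four corner positions of the linearized integral image, so a dot product with the flattened integral image yields the rectangle sum; a two-rectangle Haar feature is then just the difference of two such vectors:

```python
import numpy as np

def rect_feature_vector(h, w, r1, c1, r2, c2, sign=1.0):
    """Column vector v of length h*w with +/-sign at the four corner positions
    of the linearized integral image, so that ii.flatten() @ v equals the
    pixel sum over rows r1..r2, cols c1..c2.  Assumes r1 > 0 and c1 > 0
    (interior rectangle; the lab pads the integral image for the border case)."""
    v = np.zeros(h * w)
    idx = lambda r, c: r * w + c          # row-major linear index
    v[idx(r2, c2)] += sign                # bottom-right corner
    v[idx(r1 - 1, c2)] -= sign            # strip above
    v[idx(r2, c1 - 1)] -= sign            # strip to the left
    v[idx(r1 - 1, c1 - 1)] += sign        # corner subtracted twice, add back
    return v
```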
Before moving on you need to confirm that your code for featureGen.m is
correct. In order to do this you need to run the command
>> test_featGen
Make sure that the file featureGen.m is in the same folder as test_featGen.p.
Show your result to the lab-assistant before moving on.
If this section is too difficult and taking too much time, you can ask the
lab-assistant for a simpler version of this task.
The function DisplayFeature.m allows the visualization of a feature. For
example, to illustrate feature featv:
Try generating different features (using various types and parameters) and
display them using the above command to see if they look like expected.
For the continuation of the lab it’s very important that the features are
ordered in a pre-defined way. We use the following ordering (using the
convention in figure 3):
for y = 2...n
  for x = 2...m
    for h = 1...
      for w = 1...
Figure 8: The order in which the different feature types and sizes are stacked.
The goal now is to generate a feature matrix fMat that contains all the
features. This should have size 19² × N , where N is the total number of
features.
Exercise 3 (Programming)
Write the function featureMatrix.m that given the size of the window
(n, m) creates the matrix fMat.
Tips: Utilize your previously written function featureGen.m!
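Continuing the illustrative Python sketch (the lab's own code is Matlab; names here are ours), stacking the per-feature column vectors produces the matrix fMat, and a single matrix product then evaluates every feature on every sample at once:

```python
import numpy as np

def feature_matrix(feature_vectors):
    """Stack feature column vectors into one (h*w) x N matrix fMat, so that a
    single product IntImageMat @ fMat evaluates all features on all samples."""
    return np.column_stack([np.asarray(v).ravel() for v in feature_vectors])
```

Evaluating features this way replaces the inner loop over features with one dense matrix multiplication, which is exactly the kind of vectorization Matlab (and NumPy) reward.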
From now on in the lab, fMat, FaceData and NonFaceData will be the
only data we need to proceed with the boosting.
Before going on, check with the lab-assistant that you have done this correctly.
Exercise 4 (Written)
How are the Viola-Jones features represented in this implementation?
What is the benefit of this implementation?
Here your task is to write the function TrainAdaB_stage.m. Note that
this function takes a long time to run on the whole dataset, so your code
should first be developed and tested on the small control sample mentioned
at the end of this Task. We will now make our way through this function
and complete some exercises while doing so.
The function takes the desired number of weak classifiers (T ), the ratio of
test data to training data (test_ratio) as well as the desired false positive
rate (targetFP), and returns the strong classifier as a list fNbestArray of
the selected feature indices in fMat, their respective thresholds
thetaBestArray with parities pBestArray, the respective AdaBoost
coefficients alpha_t_Array, and finally the tested true positive (tp) and
true negative (tn) rates of the whole classifier on the test set.
Exercise 5 (Programming)
Write the function TrainAdaB_stage.m. The function should return
a trained stage with the desired parameters. Use the functions
findThreshold.m and TestStage.m that are described as parallel tasks.
In order to facilitate the writing of this function there are several tasks for
you to do in parallel.
When you have computed the feature value of a specific feature for ALL
samples (IntImageMat*fMat(:,i)), you basically have all the information
needed to construct the two histograms in Figure 4.
Exercise 6 (Programming)
First, write a function DisplayHist.m that displays the two histograms
of a specific feature, given the feature response vector featv and the
number of positive and negative samples (we assume the samples are
ordered with positives first and negatives last). TIP: Use the MATLAB
function histc.
When you have done this, the goal is to find a threshold function for the
feature (generating a weak classifier). Here the ambitious student can try to
generate a more intelligent weak classifier (for example, as in [1], by having
multiple thresholds). The standard simple approach that we suggest is to
use a single threshold, just as described in the Viola-Jones paper.
Exercise 7 (Programming)
Write the function findThreshold.m that finds an appropriate thresh-
old on a feature response to separate the positive and negative sets.
This function should take the feature value response on all images (positive
and negative samples) (featval) and the number of positive and negative
samples (npos, nneg), and return the optimal threshold with the appropriate
parity. Your function call will probably look something like:
Classifier Evaluation
Your first task, when you finally have a trained strong classifier, is to write
a function that can use it.
Exercise 8 (Programming)
Write a function ApplyStage.m that runs your final strong classifier
on a test image of size 19x19 and returns the classification according
to that classifier. The input to the function should be fNbestArray
(the array of row indexes in fMat of the features included in the
strong classifier), thetaBestArray (their corresponding thresholds),
pBestArray (their corresponding parities), and alpha_t_Array (their
corresponding feature weights; α-values).
You now have a final strong classifier. You can test it on the portion of test
data (x_test, y_test) set aside at the beginning of TrainAdaB_stage.m.
Exercise 9 (Programming)
Write a function TestStage.m that runs your final strong classifier
on the test data (x_test, y_test). It should output the fraction of
true-positives (tp) and true-negatives (tn). This function should utilize
ApplyStage.m.
Similarly, for testing the correctness of this function after completion, you
can have a look at the script t_TestStage.m. There is also a suggested
appearance of the returned ROC curve in the file t_TestStage.jpg.
The call to this function should probably look like:
[tp,tn] = TestStage(fNbestArray,thetaBestArray,pBestArray,alpha_t_Array,x_test,y_test);
where fNbestArray is the array of ids (row indexes in the feature matrix
saved in the file feature.mat) of the features included in the strong
classifier, thetaBestArray their corresponding thresholds, pBestArray
their corresponding parities, and alpha_t_Array their corresponding
feature weights (α-values). Note that all four vectors above are 1 × T ,
where T is the number of features in the strong classifier.
NOTE: The definitions of true-positives and true-negatives can be found by
studying table 1:

                      Predicted class
    True class   Yes                     No
    Yes          True-Positive (TP)      False-Negative (FN)
    No           False-Positive (FP)     True-Negative (TN)

Note in table 1 that (TP + FN)/|P| = (FP + TN)/|N | = 1, where |P| and
|N | are the number of positive and negative samples, respectively. In other
words, the rates are tp = TP/|P|, f n = FN/|P|, f p = FP/|N | and
tn = TN/|N |.
Exercise 10 (Programming)
Write a function to calculate and plot the ROC-curve for your classi-
fier.
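One common way to build the ROC curve is to sweep the strong classifier's decision threshold over all observed scores. The sketch below (Python/NumPy, our own names; the plotting itself is left out) records one (false positive rate, true positive rate) pair per threshold:

```python
import numpy as np

def roc_curve(scores_pos, scores_neg):
    """Sweep the decision threshold over all observed scores and record the
    (false positive rate, true positive rate) pair for each threshold."""
    thresholds = np.sort(np.concatenate([scores_pos, scores_neg]))
    fpr = np.array([np.mean(scores_neg >= t) for t in thresholds])
    tpr = np.array([np.mean(scores_pos >= t) for t in thresholds])
    return fpr, tpr
```

Plotting tpr against fpr then gives the ROC curve; a classifier is better the closer the curve hugs the top-left corner.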
Debug
Before you go on, it is highly recommended that you test your final
TrainAdaB stage on some simple (not too time consuming) data in order
to reassure yourself that things are working properly. For this task there is
some sample data on the course webpage. Download the files t images.mat
and t features.mat and replace the names on lines load(images.mat) and
load(features.mat) in the code with these names instead.
Exercise 11 (Programming)
Run the AdaBoost classifier training function you have just completed
with the small datasets you have just downloaded. When the code runs,
make sure that no warnings or errors are detected. Use the call:
>> [fN,theta,pBest,alpha_t,tp,tn]=TrainAdaB_stage(10,0.5,0);
Once you are happy that your code runs smoothly and does what you expect
it to do, you can analyze the performance and structure of the classifier
you have trained. The performance can be evaluated by calculating and
visualizing the ROC-curve, while the structure of the classifier can be
examined by seeing which features were selected to build your classifier.
For the latter task a function show_classifier.m has to be written. Recall
that to illustrate feature number i:
Exercise 12 (Programming)
Write the function show_classifier.m that utilizes
DisplayFeature.m in order to graphically display the superimposed
features of a strong classifier.
Exercise 13 (Programming/Written)
1. Calculate and plot the ROC-curve for your classifier.
>> show_classifier(fN,alpha_t,1);
PS - Don’t forget to change the lines load(t images.mat) and load(t features.mat)
back to their original form when you are done. - DS
Now you have a method TrainAdaB_stage that generates a strong classifier.
It can be used to create the cascade of classifiers described in Algorithm 2.
Given this cascade of classifiers it is then possible to run efficient and fast
face detection. Task III is divided into two sections: building the cascade
of classifiers and implementing face detection on a test image.
Cascade of Classifiers
Exercise 14 (Programming)
Complete the function TrainCascade.m. Run and debug it.
Once your TrainCascade is ready you can run AdaBoost_main.m. The
program will create a cascade that is saved and ready to be used.
Face detection
Your first task here is to write a function that utilizes the trained cascade
to classify a given subwindow of size 19x19.
Exercise 15 (Programming)
Write a function ApplyCascade that runs a given subwindow of size
19x19 through a given cascade (cascade.mat) and returns a positive or
negative response. Preferably this function should utilize ApplyStage.m
in a for-loop.
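The for-loop structure is the whole point of the cascade: reject as early as possible. An illustrative Python sketch (names ours; the lab's ApplyCascade is Matlab) where each stage is any callable returning 0 or 1:

```python
def apply_cascade(stages, window):
    """Run a 19x19 window through the cascade: reject as soon as any stage
    says 'no', accept only if every stage says 'yes'."""
    for stage in stages:
        if stage(window) == 0:
            return 0          # early rejection: most windows stop here
    return 1                  # survived every stage: classified as a face
```

Because most sub-windows are rejected by the first cheap stages, the average cost per window stays far below the cost of the full strong classifier.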
Now you are ready to create a function FaceDetector.m that uses this
cascade to detect faces of all sizes in larger images. Since the function
ApplyCascade.m only classifies a subwindow of size 19x19 we want to apply
this function over all scales and all locations in a larger image. In order to
do this you should think in terms of nested for-loops, see figure 9.
Figure 9: The sliding window of the FaceDetector.m will traverse different loca-
tions and scales in the large image.
Exercise 16 (Programming)
Write the function FaceDetector.m. The function should take an
input image (can be read in by e.g. testimage=imread(’test.bmp’);)
and return the windows of the image which were labeled as faces by
your cascade classifier, i.e. the output should be the original image
with colored squares marking the locations of the detected faces. Use
the function ApplyCascade.m for classifying the subwindows. (Tip: the
MATLAB function rectangle might be useful here).
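The nested-loop structure of figure 9 can be sketched as follows (Python/NumPy, our own names and parameter choices; the lab leaves the scale set, step size, and rescaling method to the student). Here larger windows are subsampled back to the base size with simple nearest-neighbour striding, one of several possible choices:

```python
import numpy as np

def detect_faces(image, classify, base=19, step=4, scales=(1, 2, 3)):
    """Minimal sliding-window detector: for each integer scale s, slide a
    (base*s) x (base*s) box over the image, subsample it back to base x base,
    and keep boxes that classify(window) labels as faces.
    Returns (top, left, side) boxes in original-image coordinates."""
    H, W = image.shape
    hits = []
    for s in scales:                                  # outer loop: scales
        side = base * s
        for r in range(0, H - side + 1, step):        # middle loop: rows
            for c in range(0, W - side + 1, step):    # inner loop: columns
                window = image[r:r + side:s, c:c + side:s]   # base x base
                if classify(window) == 1:
                    hits.append((r, c, side))
    return hits
```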
Exercise 17 (Just for fun)
If you connect a webcam to your computer and use the command
cam_image = vcapg('fast'); you can get live images from the webcam.
Try to write a script that takes cam_image and uses the function
FaceDetector.m to detect faces in the webcam image-flow.
There is actually not much left to do. The only thing you still need to do
is reassure yourself that the code complies with the restrictions of the
competition.
You need to save the structure cascade.mat as a mat-file in MATLAB and
email it to babak2@kth.se. If everything is done right the competition script
will read your cascade into the detector we’ve designed and test it on a test
set separate from the training set you’ve been given.
As stated before, the winning group of this competition will be announced
at the end of the course. The two criteria that will be measured by the
competition script are speed and accuracy (where accuracy means a high
true-positive rate and a low false-positive rate).
GOOD LUCK!
Bibliography
[2] P. Viola & M. Jones, Robust real-time object detection, in Second Inter-
national Workshop on Statistical Learning and Computational Theories
of Vision Modeling, Learning, Computing and Sampling, July 2001.
[5] P. Viola & M. Jones, Rapid object detection using a boosted cascade
of simple features, 2001.
[9] E. Saber and A. Tekalp, Frontal-view face detection and facial feature
extraction using color, shape and symmetry based cost functions, 1998.
[11] B. Menser, M. Brünig, Segmentation of Human Faces in Color Images
Using Connected Operators, in ICIP, 1999.