Swarm Optimization Algorithms For Face Recognition: Juicy Ray
Juicy Ray
www.nitrkl.ac.in
May 9, 2013
Certificate
This is to certify that the work in the thesis entitled Swarm Optimization
Algorithms for Face Recognition by Juicy Ray, bearing roll number
109CS0180, is a record of her work carried out under my supervision and guidance
in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science and Engineering.
Banshidhar Majhi
Acknowledgment
First and foremost, I would like to thank my supervisor Prof. B. Majhi for
introducing me to this exciting area of Biometry. I am especially indebted to him
for his guidance, support and patience with me throughout the course of my research.
He taught me the essence and principles of research and guided me through until
the completion of this thesis. It is due to his faith in me that today I am submitting
this thesis. It has been my privilege working with him and learning from him.
I would also like to thank Prof. Ratnakar Dash for showing me innovative
research directions throughout the entire period of this research. I am indebted
to all the professors, batch mates and friends at National Institute of Technology
Rourkela for their cooperation.
I owe my largest debt to my family, and I wish to express my heartfelt gratitude
to my mother for her encouragement, constant prayers, and continued support. My
parents have given me all their love and support over the years; I thank them for
their unwavering commitment through good times and hard times.
Juicy Ray
Abstract
In this thesis, a face recognition system based on swarm intelligence is developed.
Swarm intelligence can be defined as the collective intelligence that emerges from a
group of simple entities; these agents enter into interactions, and sense and change their
environment locally. A typical system for face recognition consists of three stages:
feature extraction, feature selection and classification. Two approaches are explored.
First, Bacterial Foraging Optimization (BFO), in which the features extracted from
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)
are optimized. Second, Particle Swarm Optimization (PSO), in which features
extracted with the Discrete Cosine Transform (DCT) are optimized.
Keywords:
BFO, PSO
Contents
Certificate
ii
Acknowledgement
iii
Abstract
iv
List of Figures
vii
List of Tables
viii
1 Introduction
1.1
Face as a biometric . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2
1.3
Face database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.4
Thesis Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 Feature Extraction
3.1
Feature-based Methods . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2
Appearance-based Methods . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.1
3.2.2
3.2.3
4 Feature Selection
4.1
4.2
15
Chemotaxis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.2
Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.3
4.1.4
BFO Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 17
PSO Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2.2
Binary PSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
23
5.1
5.2
26
6.1
Face Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2
6.3
Comparative Analysis
. . . . . . . . . . . . . . . . . . . . . . . . . . 27
7 Conclusion
28
Bibliography
29
vi
List of Figures
1.1
Typical examples of sample face images from the Yale face database .
2.1
2.2
3.1
PCA. x and y are the original basis. is the rst principal component 11
3.2
3.3
(a) The two classes are not well separated when projected onto this
line (b) This line succeeded in separating the classes as well as reduces
dimensionality from two features (x1,x2) to only a value y [4]
3.4
. . . . 13
(a) A typical face image (b) its DCT transformed image and (c)
typical division of the DCT coecients into low, middle and high
frequencies (Source: [14])
4.1
. . . . . . . . . . . . . . . . . . . . . . . . 14
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
vii
List of Tables
1.1
6.1
. . . . . . . . . . . . . . . . . . . . . . . . 27
viii
Chapter 1
Introduction
Face recognition is a part of the capability of human beings and is a task that
humans perform routinely and effortlessly in daily life. Though research into this
area dates back to the 1960s, face recognition is still an area of active research, since
a completely successful approach or model has not been proposed to solve the face
recognition problem.

Wide availability of powerful and low-cost desktop and embedded computing
systems has created an enormous interest in automatic processing of digital images
in a variety of applications, including biometric authentication, crowd surveillance,
human-computer interaction, multimedia management, mug-shot matching, user
verification and user access control. Because of its prevalence as an institutionalized
and accepted guarantor of identity since the advent of photography, there are also
large legacy systems based on face images, like voter registration, passports and
drivers' licenses, all being automated currently [1].
We can recognize a familiar individual under very adverse lighting conditions and from
varying angles or viewpoints. Scaling differences or different backgrounds do not
change our ability to recognize faces, and we can even recognize individuals with just
a fraction of their face visible, or even after several years have passed. Furthermore,
we are able to recognize the faces of several thousand individuals whom we have met
during our lifetime. So, it is a true challenge to build an automated system which
equals the human ability to recognize faces. Many face recognition algorithms have been
developed. An exhaustive survey of FR techniques [1] is given in Table 1.1.
1.1 Face as a biometric
Table 1.1: Categorization of face recognition techniques surveyed in [1]

Method                                          Category
PCA [12]                                        Holistic-based
LDA [13]                                        Holistic-based
2D-PCA [25]                                     Holistic-based
ICA [16]                                        Holistic-based
Laplacianfaces [17]                             Holistic-based
Evolutionary pursuit [18]                       Holistic-based
Sparse representation [20]                      Holistic-based
Gabor and dynamic link architecture [21]        Feature-based
Gabor and elastic bunch graph matching [22]     Feature-based
LBP (local binary patterns) [23]                Feature-based
SIFT [24]                                       Part-based
Face recognition as a biometric enables convenient, non-intrusive authentication in
everyday settings, such as a house unlocking its door as the owner walks up to it, or
a car adjusting mirrors and seats to the driver's presets when sitting down in the car.
1.2 Swarm intelligence

Groups of starlings can form impressive shapes as they travel northward together
in the springtime. Such coordinated group movement is an example of swarm
behavior. Swarm intelligence describes the collective behaviour of what are usually
simple individuals in decentralized systems. The behaviour of the individuals allows
the entire system to solve complex tasks.
1.3 Face database

There are several publicly available face databases for the research community to
use for algorithm development, which provide a standard benchmark when reporting
results. Different databases are collected to address different types of challenges
or variations, such as illumination, pose, occlusion, etc. In this project, I have
used the Yale database [11] [13], which contains 165 gray-scale images in GIF format
of 15 subjects. The images are at a resolution of 243x320 pixels. There are 11
images of each person, one per variation: center-light, with glasses, happy, left-light,
without glasses, normal, right-light, sad, sleepy, surprised, and wink. Some sample
images of this database are shown in Figure 1.1.

Figure 1.1: Typical examples of sample face images from the Yale face database
1.4 Thesis Organisation

The rest of the thesis is organised into the following six chapters.

Chapter 2: Steps in Face Recognition
This chapter outlines the different steps of face recognition in detail.
Chapter 3: Feature Extraction
This chapter deals with various feature extraction algorithms for face recognition.
Chapter 4: Feature Selection
This chapter discusses how the swarm intelligence based optimization algorithms
are used for optimizing the feature vector set.
Chapter 5: Face Recognition using swarm intelligence
This chapter details the face recognition methods that use features selected through
swarm optimization.
Chapter 6: Results and Analysis
This chapter presents a comparative analysis of the two algorithms, followed by the
results drawn from the research.
Chapter 7: Conclusion
In this chapter, conclusion and possible directions for future work are given.
Chapter 2
Steps in Face Recognition
Face images are first scaled to a default size, say 128 x 128, on which the system can
operate. Several pre-processing steps may then be applied; a minimal code sketch of
these steps is given after the list.

Histogram equalization - For images that are too dark or too light, it
modifies the dynamic range and improves the contrast of the image so
that facial features become more apparent.

Median filtering - For noisy images, especially those obtained from a camera
or from a frame grabber, median filtering can clean the image without
much loss of information.

Background removal - In order to deal primarily with facial information,
the background is removed from the face image.

Shift and rotation normalization - It is possible that in the face image the
head is somehow shifted or rotated. The head plays a major role in the
determination of facial features, so the pre-processing module determines
and normalizes the shifts and rotations in the head position.

Illumination normalization - Face images taken under different lighting
conditions are normalized so that illumination variations do not dominate
recognition.
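The pre-processing steps listed above can be expressed compactly in code. The following is a minimal sketch using OpenCV (an assumed implementation choice; the thesis does not prescribe a library), covering scaling, histogram equalization and median filtering.

```python
# Minimal pre-processing sketch (assumed OpenCV-based; illustrative only).
import cv2
import numpy as np

def preprocess_face(image: np.ndarray, size=(128, 128)) -> np.ndarray:
    # Scale the face image to a default size on which the system can operate.
    face = cv2.resize(image, size, interpolation=cv2.INTER_AREA)
    # Convert colour images to gray level before histogram equalization.
    if face.ndim == 3:
        face = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
    # Histogram equalization: improve contrast so facial features are more apparent.
    face = cv2.equalizeHist(face)
    # Median filtering: remove camera / frame-grabber noise with little loss of detail.
    face = cv2.medianBlur(face, 3)
    return face
```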
Chapter 3
Feature Extraction
Face recognition algorithms can be classified into two broad categories according
to the feature extraction scheme used for face representation: feature-based methods and
appearance-based methods [1].
3.1 Feature-based Methods
Properties and geometric relations, such as the areas, distances, and angles between
facial feature points, are used as descriptors for face recognition in this approach
[2]. These methods try to define a face as a function and attempt to find a standard
template for all faces. The features can be defined independently of each other.
For example, a face can be divided into eyes, face contour, nose and mouth. A
face model can also be built from edges. But these methods are limited to faces that
are frontal and unoccluded. A face can also be represented as a silhouette. These
standard patterns are compared to the input images to detect faces. This approach
is simple to implement, but it is inadequate for face detection: it cannot achieve good
results with pose angle variation, scale difference or shape change.
3.2 Appearance-based Methods
Appearance-based methods consider the global properties of the face image intensity
pattern [8]. Typically, appearance-based face recognition algorithms proceed by
computing basis vectors to represent the face data efficiently. In the next step,
the faces are projected onto these vectors, and the projection coefficients can be
used for representing the face images. Popular algorithms such as PCA, LDA, ICA,
LFA, Correlation Filters, Manifolds and Tensorfaces are based on the appearance
of the face. Holistic approaches to face recognition have trouble dealing with pose
variations. In general, appearance-based methods rely on techniques from statistical
analysis and machine learning to find the relevant characteristics of face images. For
feature extraction in this project, appearance-based methods, namely Principal
Component Analysis (PCA), Linear Discriminant Analysis (LDA), the Discrete Cosine
Transform (DCT) and the Discrete Wavelet Transform (DWT), have been used. They
are described in detail in the next subsections.
3.2.1 Principal Component Analysis (PCA)

Figure 3.1: PCA. x and y are the original basis; the first principal component is also shown

PCA is closely related to the Karhunen-Loeve Transform (KLT). Let X_{n x m} be the
data matrix whose columns x_1, ..., x_m are the image vectors and n is the number of
pixels per image. The KLT basis is obtained by solving the eigenvalue problem [3]:

C_x = Φ Λ Φ^T   (3.1)

C_x = (1/m) X X^T   (3.2)

where C_x is the covariance matrix of the data, Φ is the matrix of eigenvectors, and Λ
is a diagonal matrix containing the corresponding eigenvalues.
Figure 3.2: (Starting from top-left) (a) class mean image, (b) overall mean image, (c) test image, (d) reconstructed test image
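To make the KLT formulation concrete, the following NumPy sketch (illustrative only; the function and variable names are assumptions, not the thesis code) computes an eigenface basis using the standard small-matrix trick and projects images onto it.

```python
# PCA (eigenface) sketch following the KLT formulation above; illustrative only.
import numpy as np

def pca_basis(X: np.ndarray, k: int):
    """X: n x m matrix of vectorized face images (one column per image)."""
    mean = X.mean(axis=1, keepdims=True)
    A = X - mean                                   # centre the data
    # Use the m x m inner-product matrix (A^T A) instead of the large n x n
    # covariance matrix; this is the usual trick when m << n.
    evals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(evals)[::-1][:k]            # keep the k largest eigenvalues
    U = A @ V[:, order]                            # map eigenvectors back to image space
    U /= np.linalg.norm(U, axis=0)                 # orthonormal eigenfaces
    return mean, U

def project(X: np.ndarray, mean: np.ndarray, U: np.ndarray) -> np.ndarray:
    return U.T @ (X - mean)                        # k x m projection coefficients
```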
3.2.2 Linear Discriminant Analysis (LDA)

Classic LDA is designed to take into account only two classes. Specifically, it requires
data points of different classes to be far from each other, while points from the same
class are close to each other. Consequently, LDA obtains different projection vectors
for each class. Suppose we have m samples x_1, ..., x_m belonging to c classes, where
each class has m_k elements. We assume that the mean has been subtracted from the
samples, as in PCA. The ratio of between-class scatter to within-class scatter is the
optimizing criterion in LDA:

Criterion(W) = S_w^{-1} S_b   (3.3)

Between-class scatter:

S_b = Σ_{i=1}^{c} |χ_i| μ_i μ_i^T   (3.4)

Within-class scatter:

S_w = Σ_{i=1}^{c} Σ_{x_k ∈ χ_i} (x_k − μ_i)(x_k − μ_i)^T   (3.5)

where χ_i denotes the set of samples of class i and μ_i its mean.
Figure 3.3: (a) The two classes are not well separated when projected onto this line;
(b) this line succeeds in separating the classes as well as reducing the dimensionality
from two features (x1, x2) to a single value y [4]
Typically, when dealing with face images (and most other image-based pattern
recognition problems), the number of training images is smaller than the number
of pixels (or, equivalently, the dimensionality of the data); thus the within-class scatter
matrix S_w is singular, causing problems for LDA. To address this issue, [13] first
performs PCA to reduce the dimensionality of the data in order to overcome
this singular-matrix problem and then applies LDA in this lower-dimensional PCA
subspace.
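A corresponding sketch of the LDA step: the scatter matrices of equations (3.4) and (3.5) are accumulated on PCA-reduced samples, and the projection is taken from the leading eigenvectors of S_w^{-1} S_b. A pseudo-inverse is used as a guard in case S_w is still ill-conditioned; as before, all names are illustrative rather than taken from the thesis.

```python
# LDA (Fisherface-style) sketch built from the scatter matrices above; illustrative only.
import numpy as np

def lda_basis(Y: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Y: d x m matrix of (PCA-projected) samples; labels: length-m class ids."""
    d, m = Y.shape
    mu = Y.mean(axis=1, keepdims=True)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        mu_c = Yc.mean(axis=1, keepdims=True)
        Sb += Yc.shape[1] * (mu_c - mu) @ (mu_c - mu).T      # between-class scatter
        Sw += (Yc - mu_c) @ (Yc - mu_c).T                    # within-class scatter
    evals, W = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)        # pinv guards a singular Sw
    order = np.argsort(evals.real)[::-1][:k]
    return W[:, order].real                                  # d x k projection matrix
```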
3.2.3 Discrete Cosine Transform (DCT)

The Discrete Cosine Transform (DCT) expresses a sequence of data points in terms
of a sum of cosine functions oscillating at different frequencies. It has strong energy
compaction properties; therefore, it can be used to transform images, compacting
the variations and allowing an effective dimensionality reduction. It has been widely
used for data compression. The DCT is based on the discrete Fourier transform,
but uses only real numbers. When a DCT is performed over an image, the energy
is compacted in the upper-left corner.
Figure 3.4: (a) A typical face image, (b) its DCT transformed image and (c) typical
division of the DCT coefficients into low, middle and high frequencies (Source: [14])

For an M x N image, where each image corresponds to a 2D matrix, the DCT
coefficients are calculated as follows [5]:

F(u, v) = (1/√(MN)) α(u) α(v) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x+1)uπ / 2M] cos[(2y+1)vπ / 2N]   (3.6)

α(ω) = 1 if ω = 0, √2 otherwise   (3.7)
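The transform of equation (3.6) can be sketched directly in NumPy as below; keeping the low-frequency block in the upper-left corner yields a compact feature vector. (An orthonormal library routine such as scipy.fft.dctn with norm='ortho' should give an equivalent result; the code here simply mirrors the formula.)

```python
# Direct 2D DCT of equation (3.6) and low-frequency feature extraction; illustrative only.
import numpy as np

def dct2(f: np.ndarray) -> np.ndarray:
    M, N = f.shape
    x = np.arange(M); y = np.arange(N)
    u = x[:, None]; v = y[:, None]
    Cx = np.cos((2 * x[None, :] + 1) * u * np.pi / (2 * M))   # M x M cosine basis
    Cy = np.cos((2 * y[None, :] + 1) * v * np.pi / (2 * N))   # N x N cosine basis
    F = Cx @ f @ Cy.T / np.sqrt(M * N)
    alpha_u = np.where(np.arange(M) == 0, 1.0, np.sqrt(2.0))[:, None]
    alpha_v = np.where(np.arange(N) == 0, 1.0, np.sqrt(2.0))[None, :]
    return alpha_u * alpha_v * F

def dct_features(face: np.ndarray, block: int = 8) -> np.ndarray:
    # Energy is compacted in the upper-left corner, so a small block suffices.
    return dct2(face.astype(float))[:block, :block].ravel()
```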
Chapter 4
Feature Selection
Feature selection (FS) is a global optimization problem in machine learning which
reduces the number of features, removes irrelevant, noisy and redundant data, and
results in acceptable recognition accuracy [8]. Although feature selection is primarily
performed to select relevant and informative features, it can have other motivations,
including:

General data reduction, to limit storage requirements and increase algorithm
speed;

Feature set reduction, to save resources in the next round of data collection or
during utilization;

Performance improvement, to gain in predictive accuracy;

Data understanding, to gain knowledge about the process that generated the data.
4.1 Bacterial Foraging Optimization (BFO)

Bacterial Foraging Optimization (BFO) is inspired by the foraging behavior of
E. coli bacteria. After many generations, bacteria with poor foraging strategies are
eliminated, while the individuals with good foraging strategies survive, signifying
survival of the fittest. The whole process can be divided into three sections, namely
chemotaxis, reproduction, and elimination and dispersal [15].
4.1.1 Chemotaxis

Chemotaxis simulates the movement of an E. coli bacterium through tumbling and
swimming: the bacterium takes a step in a random direction (a tumble) and keeps
swimming in that direction as long as the fitness keeps improving.
4.1.2 Reproduction
The health of each bacterium is calculated as the sum of the step fitness values during
its life, namely,

J_health^i = Σ_{j=1}^{Nc+1} J(i, j, k, l)   (4.1)
4.1.3 Elimination and Dispersal
Chemotaxis provides a basis for local search, and the reproduction process speeds
up the convergence of the algorithm. However, chemotaxis and reproduction alone
are not enough for global optimum searching, since bacteria may get stuck around
their initial positions or local optima. Elimination and dispersal changes the diversity
of the population, either gradually or suddenly, and reduces the chance of being
trapped in local optima. In BFO, according to a preset probability Ped, bacteria are
eliminated and dispersed after a certain number of reproduction steps. After
elimination, they are moved to another position within the environment.
4.1.4 BFO Algorithm
For each bacterium i in chemotactic step j, a tumble and (possibly) a swim are
performed as follows. Let

J_last = J(i, j, k, l)   (4.3)

and generate a random direction Δ(i); then move

θ^i(j+1, k, l) = θ^i(j, k, l) + C(i) Δ(i) / √(Δ^T(i) Δ(i))   (4.4)

This results in a step of size C(i) in the direction of the tumble for bacterium i.
Compute J(i, j+1, k, l).

Swim: as long as J(i, j+1, k, l) < J_last (and for at most Ns steps), let
J_last = J(i, j+1, k, l) and take another step of size C(i) in the same direction,

θ^i(j+1, k, l) = θ^i(j, k, l) + C(i) Δ(i) / √(Δ^T(i) Δ(i))   (4.8)

Step 5: If j < Nc, go to step 4. In this case, continue chemotaxis, since the life of the
bacteria is not over.

Step 6: Reproduction. For the given k and l, and for each i = 1, ..., S, let

J_health^i = Σ_{j=1}^{Nc+1} J(i, j, k, l)   (4.9)

be the health of bacterium i. The bacteria with the best health values split, and the
copies that are made are placed at the same location as their parent.

Step 7: If k < Nre, go to step 3. In this case, we have not reached the specified number
of reproduction steps, so we start the next generation of the chemotactic loop.
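To summarize the flow of the algorithm, the following is a much-simplified BFO sketch: it keeps the chemotaxis (tumble and swim), reproduction, and elimination-dispersal loops but omits the cell-to-cell attraction term and other refinements. All parameter names and default values are illustrative rather than taken from the thesis, and `fitness` is any cost J to be minimized (for example, a feature-subset evaluation).

```python
# Simplified BFO sketch (no swarming term); illustrative only.
import numpy as np

def bfo_minimize(fitness, dim, S=20, Nc=10, Ns=4, Nre=4, Ned=2, C=0.1, Ped=0.25,
                 rng=np.random.default_rng(0)):
    theta = rng.uniform(-1.0, 1.0, (S, dim))             # bacteria positions
    J = np.array([fitness(t) for t in theta])
    health = np.zeros(S)
    best_pos, best_J = theta[J.argmin()].copy(), J.min()
    for _ in range(Ned):                                  # elimination-dispersal loop
        for _ in range(Nre):                              # reproduction loop
            health[:] = 0.0
            for _ in range(Nc):                           # chemotaxis loop
                for i in range(S):
                    delta = rng.uniform(-1.0, 1.0, dim)   # random tumble direction
                    step = C * delta / np.linalg.norm(delta)
                    J_last = J[i]
                    theta[i] += step                      # tumble (cf. eq. 4.4)
                    J[i] = fitness(theta[i])
                    swims = 0
                    while J[i] < J_last and swims < Ns:   # keep swimming while improving
                        J_last = J[i]
                        theta[i] += step                  # swim (cf. eq. 4.8)
                        J[i] = fitness(theta[i])
                        swims += 1
                    health[i] += J[i]                     # accumulate health (cf. eq. 4.9)
                    if J[i] < best_J:
                        best_J, best_pos = J[i], theta[i].copy()
            # Reproduction: the healthiest half split; copies sit at the parent position.
            order = np.argsort(health)
            theta = np.concatenate([theta[order[:S // 2]]] * 2)
            J = np.array([fitness(t) for t in theta])
        # Elimination-dispersal: with probability Ped, move a bacterium at random.
        for i in range(S):
            if rng.random() < Ped:
                theta[i] = rng.uniform(-1.0, 1.0, dim)
                J[i] = fitness(theta[i])
                if J[i] < best_J:
                    best_J, best_pos = J[i], theta[i].copy()
    return best_pos, best_J
```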
4.2 Particle Swarm Optimization (PSO)

4.2.1 PSO Algorithm

In PSO [26], each particle keeps track of the best position it has visited so far (pbest)
and of the best position found by the swarm (gbest). At every iteration, a particle's
velocity is updated towards its pbest and the gbest, and its position is then updated as

p_i^new = p_i + v_i^new   (4.11)

This process is repeated until the stopping criteria are met; otherwise, go to the 2nd step.
4.2.2 Binary PSO
Consider a database of L subjects or classes W_1, W_2, ..., W_L with N_1, N_2, ..., N_L
samples respectively. Let M_1, M_2, ..., M_L be the individual class means and M_0 be
the mean of all feature vectors. The fitness function is defined in terms of class
separation; by minimizing the fitness function, class separation is increased. With
each iteration, the most important features are selected: a binary value of 1 at a
position implies that the corresponding feature is selected as a distinguishing feature
for the succeeding iterations, and if the position value is 0 the feature is not selected.
The expressions for the class samples, the individual class mean and the mean of all
feature vectors are shown below:

W_j^(i), for j = 1, 2, ..., N_i   (4.12)

M_i = (1/N_i) Σ_{j=1}^{N_i} W_j^(i)   (4.13)

M_0 = (1/N) Σ_{i=1}^{L} N_i M_i   (4.14)

where N = Σ_{i=1}^{L} N_i is the total number of samples.
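A sketch of binary PSO for feature selection is given below: each particle is a 0/1 mask over the feature dimensions, velocities follow the usual PSO update, and a sigmoid maps each velocity component to the probability that the corresponding bit is set. The `fitness` argument is an assumed callable that scores a feature subset (for example, a class-separation measure built from equations (4.12)-(4.14)); all names are illustrative.

```python
# Binary PSO for feature selection (sigmoid transfer); illustrative only.
import numpy as np

def binary_pso(fitness, n_features, n_particles=30, iters=50,
               w=0.7, c1=1.5, c2=1.5, rng=np.random.default_rng(0)):
    X = rng.integers(0, 2, (n_particles, n_features))        # binary positions (masks)
    V = rng.uniform(-1.0, 1.0, (n_particles, n_features))    # real-valued velocities
    pbest, pbest_val = X.copy(), np.array([fitness(x) for x in X])
    g = pbest_val.argmin()                                    # fitness is minimized
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(iters):
        r1 = rng.random((n_particles, n_features))
        r2 = rng.random((n_particles, n_features))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        prob = 1.0 / (1.0 + np.exp(-V))                       # sigmoid transfer function
        X = (rng.random((n_particles, n_features)) < prob).astype(int)
        vals = np.array([fitness(x) for x in X])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = X[improved], vals[improved]
        if vals.min() < gbest_val:
            gbest, gbest_val = X[vals.argmin()].copy(), vals.min()
    return gbest                                              # selected-feature mask
```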
Chapter 5
Face Recognition Using Swarm
Intelligence
Swarm intelligence is a family of decentralized stochastic algorithms inspired by
the behavior of swarms. Swarm intelligence algorithms include Particle Swarm
Optimization (PSO) [26], Bacterial Foraging Optimization (BFO) [15], and Ant
Colony Optimization (ACO). The ultimate goal of any of these optimization
algorithms is to find the global best fitness as efficiently as possible.

5.1 Face Recognition Using BFO
Feature Extraction: Perform PCA on the images to obtain the optimal bases
before LDA. Then generate the eigenvectors as the feature vector set (which
will be input to the BFO) through LDA.

Feature Selection: Apply the BFO algorithm on the feature vector set as stated in
Section 4.1. Pick the position of the bacterium B with the maximum J_health value.
This position represents the best feature subset of the features defined in the feature
extraction step.
Classification: Calculate the difference between the feature subset (obtained
through feature selection) of each image of the facial gallery and that of the test image
using the Euclidean distance defined below. The index of the gallery image which has
the smallest distance to the image under test is taken as the recognition result. For an
n-dimensional space, the Euclidean distance between any two points p and q is given by:

D = √( Σ_{i=1}^{n} (p_i − q_i)^2 )   (5.1)
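The classification step amounts to a nearest-neighbour search under the distance of equation (5.1). A minimal sketch (names are illustrative, not the thesis code):

```python
# Nearest-neighbour classification using the Euclidean distance of eq. (5.1); illustrative only.
import numpy as np

def classify(gallery: np.ndarray, labels: np.ndarray, test: np.ndarray):
    """gallery: k x n selected-feature vectors; labels: length-k subject ids; test: length-n vector."""
    dists = np.sqrt(((gallery - test) ** 2).sum(axis=1))   # Euclidean distance to each gallery image
    return labels[int(np.argmin(dists))]                   # label of the nearest gallery image
```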
5.2 Face Recognition Using PSO

In the second approach, DCT coefficients are extracted from the face images, and the
binary PSO algorithm of Section 4.2 is applied to select the optimal feature subset;
classification is again carried out with the Euclidean distance of equation (5.1).
Chapter 6
Results & Analysis
6.1 Face Database

The two recognition approaches have been tested on the publicly available YALE face
database [11]. It comprises images from 15 subjects. For each subject, 11 different
images are recorded, one per variation: center-light, with glasses, happy, left-light,
without glasses, normal, right-light, sad, sleepy, surprised, and wink. In total, the
database consists of 165 images.
6.2 Experimental Setup

In order to construct the training set, 7 images per person were used, and the
remaining 4 images were used for testing. All 15 classes of persons in the database
were considered. The same training and testing data sets were used in both
approaches. The results are displayed in Table 6.1.
Table 6.1: Comparative results of the two approaches

Feature Extraction   Feature Selection   Training Time   Recognition Rate
PCA + LDA            BFO                 257.13 sec      95.14%
DCT                  PSO                 161.86 sec      93.7%
6.3 Comparative Analysis

As Table 6.1 shows, the PSO-based selection algorithm requires less training time
than the BFO-based selection algorithm; hence, BFO is computationally more
expensive than PSO. On the other hand, the BFO-based approach achieves a higher
recognition rate. Therefore, it can be said that the effectiveness of BFO in finding the
optimal feature subset, compared to PSO, compensates for its computational
inefficiency.
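For completeness, the evaluation protocol of Section 6.2 can be sketched as follows, assuming a `features` function that maps an image to its selected-feature vector and the `classify` routine from the previous chapter; this is an illustrative sketch, not the thesis code.

```python
# Evaluation sketch: 7 training / 4 test images per Yale subject, recognition rate
# as the fraction of correctly classified test images; illustrative only.
import numpy as np

def recognition_rate(images_by_subject, features, classify):
    gallery, gallery_labels, tests, test_labels = [], [], [], []
    for subject, images in images_by_subject.items():        # 15 subjects, 11 images each
        for img in images[:7]:                                # 7 training images per subject
            gallery.append(features(img)); gallery_labels.append(subject)
        for img in images[7:]:                                # 4 test images per subject
            tests.append(features(img)); test_labels.append(subject)
    gallery = np.vstack(gallery)
    gallery_labels = np.array(gallery_labels)
    correct = sum(classify(gallery, gallery_labels, t) == lbl
                  for t, lbl in zip(tests, test_labels))
    return correct / len(tests)
```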
Chapter 7
Conclusion
In this thesis, face recognition was performed by applying swarm optimization
algorithms. It was found that the underlying foraging principle and swarm
optimization can be integrated into evolutionary computational algorithms to
provide a better search strategy for finding optimal feature vectors for face
recognition. Finally, it is believed that the two swarm optimization methods,
namely bacterial foraging optimization and particle swarm optimization, may be
useful for the development of face recognition systems.
Bibliography
[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Comput. Surv., 35(4):399-458, December 2003.
[2] Marios Savvides, Jingu Heo, and SungWon Park. Face recognition. In Anil K. Jain, Patrick Flynn, and Arun A. Ross, editors, Handbook of Biometrics, pages 43-70. Springer US, 2008.
[3] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. Pattern Anal. Mach. Intell., 12(1):103-108, January 1990.
[4] F. Z. Chelali, A. Djeradi, and R. Djeradi. Linear discriminant analysis for face recognition. In Multimedia Computing and Systems, 2009. ICMCS '09. International Conference on, pages 1-10, 2009.
[5] B. Schwerin and K. Paliwal. Local-DCT features for facial recognition. In Signal Processing and Communication Systems, 2008. ICSPCS 2008. 2nd International Conference on, pages 1-6, 2008.
[6] Rutuparna Panda, Manoj Kumar Naik, and B. K. Panigrahi. Face recognition using bacterial foraging strategy. Swarm and Evolutionary Computation, 1(3):138-146, 2011.
[7] S. Arivalagan and K. Venkatachalapathy.
[11] http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
[12] Matthew Turk and Alex Pentland. Eigenfaces for recognition. J. Cognitive Neuroscience, 3(1):71-86, January 1991.
[13] Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19(7):711-720, July 1997.
[14] Saeed Dabbaghchian, Masoumeh P. Ghaemmaghami, and Ali Aghagolzadeh. Feature extraction using discrete cosine transform and discrimination power analysis with a face recognition technology. Pattern Recognition, 43(4):1431-1440, 2010.
[15] K. M. Passino. Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Systems Magazine, 22:52-67, 2002.
[16] M. S. Bartlett, Javier R. Movellan, and T. J. Sejnowski. Face recognition by independent component analysis. Neural Networks, IEEE Transactions on, 13(6):1450-1464, 2002.
[17] Xiaofei He, Shuicheng Yan, Yuxiao Hu, P. Niyogi, and Hong-Jiang Zhang. Face recognition using Laplacianfaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(3):328-340, 2005.
[18] Chengjun Liu and Harry Wechsler. Evolutionary pursuit and its application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 22(6):570-582, June 2000.
[19] Ming-Hsuan Yang. Kernel eigenfaces vs. kernel fisherfaces: Face recognition using kernel methods.
[24] Jun Luo, Y. Ma, E. Takikawa, Shihong Lao, M. Kawade, and Bao-Liang Lu. Person-specific SIFT features for face recognition. In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, volume 2, pages II-593-II-596, 2007.
[25] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell., 26(1):131-137, January 2004.
[26] J. Kennedy and R. Eberhart. Particle swarm optimization. In Neural Networks, 1995. Proceedings., IEEE International Conference on, volume 4, pages 1942-1948, 1995.