Swarm Optimization Algorithms For Face Recognition: Juicy Ray
Juicy Ray
www.nitrkl.ac.in
May 9, 2013
Certificate
This is to certify that the work in the thesis entitled Swarm Optimization
Algorithms for Face Recognition by Juicy Ray, bearing roll number
109CS0180, is a record of her work carried out under my supervision and guidance
in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science and Engineering.
Banshidhar Majhi
Acknowledgment
First and foremost, I would like to thank my supervisor Prof. B. Majhi for
introducing me to this exciting area of Biometry. I am especially indebted to him
for his guidance, support and patience with me throughout the course of my research.
He taught me the essence and principles of research and guided me through until
the completion of this thesis. It is due to his faith in me that today I am submitting
this thesis. It has been my privilege working with him and learning from him.
I would also like to thank Prof. Ratnakar Dash for showing me innovative
research directions throughout the entire period of this research. I am indebted
to all the professors, batch mates and friends at National Institute of Technology
Rourkela for their cooperation.
I owe my largest debt to my family, and I wish to express my heartfelt gratitude
to my mother for her encouragement, constant prayers, and continued support. My
parents have given me all their love and support over the years; I thank them for
their unwavering commitment through good times and hard times.
Juicy Ray
Abstract
In this thesis, a face recognition system based on swarm intelligence is developed.
Swarm intelligence can be defined as the collective intelligence that emerges from a
group of simple entities; these agents enter into interactions, and sense and change their
environment locally. A typical system for face recognition consists of three stages:
feature extraction, feature selection and classification. Two approaches are explored.
First, Bacterial Foraging Optimization (BFO), in which the features extracted from
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)
are optimized. Second, Particle Swarm Optimization (PSO), in which features
extracted with the Discrete Cosine Transform (DCT) are optimized.
Keywords:
BFO, PSO
Contents
Certificate
ii
Acknowledgement
iii
Abstract
iv
List of Figures
vii
List of Tables
viii
1 Introduction
1.1
Face as a biometric . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2
1.3
Face database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.4
Thesis Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 Feature Extraction
3.1
Feature-based Methods . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2
Appearance-based Methods . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.1
3.2.2
3.2.3
4 Feature Selection
4.1
4.2
15
Chemotaxis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.2
Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.3
4.1.4
BFO Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 17
PSO Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2.2
Binary PSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
23
5.1
5.2
26
6.1
Face Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2
6.3
Comparative Analysis
. . . . . . . . . . . . . . . . . . . . . . . . . . 27
7 Conclusion
28
Bibliography
29
vi
List of Figures
1.1
Typical examples of sample face images from the Yale face database .
2.1
2.2
3.1
PCA. x and y are the original basis. is the rst principal component 11
3.2
3.3
(a) The two classes are not well separated when projected onto this
line (b) This line succeeded in separating the classes as well as reduces
dimensionality from two features (x1,x2) to only a value y [4]
3.4
. . . . 13
(a) A typical face image (b) its DCT transformed image and (c)
typical division of the DCT coecients into low, middle and high
frequencies (Source: [14])
4.1
. . . . . . . . . . . . . . . . . . . . . . . . 14
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
vii
List of Tables
1.1
6.1
. . . . . . . . . . . . . . . . . . . . . . . . 27
viii
Chapter 1
Introduction
Face recognition is a part of the capability of human beings and is a task that
humans perform routinely and effortlessly in daily life. Though research into this
area dates back to the 1960s, face recognition is still an area of active research, since
a completely successful approach or model has not been proposed to solve the face
recognition problem.

Wide availability of powerful and low-cost desktop and embedded computing
systems has created an enormous interest in automatic processing of digital images
in a variety of applications, including biometric authentication, crowd surveillance,
human-computer interaction, multimedia management, mug-shot matching, user
verification and user access control. Because of its prevalence as an institutionalized
and accepted guarantor of identity since the advent of photography, there are also
large legacy systems based on face images, like voter registration, passports and
drivers' licenses, all being automated currently [1].
We can recognize a familiar individual under very adverse lighting conditions and from
varying angles or viewpoints. Scaling differences or different backgrounds do not
change our ability to recognize faces, and we can even recognize individuals with just
a fraction of their face visible, or even after several years have passed. Furthermore,
we are able to recognize the faces of several thousand individuals whom we have met
during our lifetime. So, it is a true challenge to build an automated system which
equals the human ability to recognize faces. Many face recognition algorithms have been
developed. An exhaustive survey of FR techniques [1] is given in Table 1.1.
1.1 Face as a biometric
Table 1.1: Categorization of face recognition techniques surveyed in [1]

Method                                          Category
PCA [12]                                        Holistic-based
LDA [13]                                        Holistic-based
2D-PCA [25]                                     Holistic-based
ICA [16]                                        Holistic-based
Laplacianfaces [17]                             Holistic-based
Evolutionary pursuit [18]                       Holistic-based
Sparse representation [20]                      Holistic-based
Gabor and dynamic link architecture [21]        Feature-based
Gabor and elastic bunch graph matching [22]     Feature-based
LBP (local binary patterns) [23]                Feature-based
SIFT [24]                                       Part-based
Face recognition as a biometric enables convenient, non-intrusive authentication in
everyday settings, such as a house unlocking its door as the owner walks up to it, or
a car adjusting mirrors and seats to the driver's presets when sitting down in the car.
1.2 Swarm intelligence

Groups of starlings can form impressive shapes as they travel northward together
in the springtime. Such coordinated group movement is an example of swarm
behavior. Swarm intelligence describes the collective behaviour of what are usually
simple individuals in decentralized systems. The behaviour of the individuals allows
the entire system to solve complex tasks.
1.3 Face database

There are several publicly available face databases for the research community to
use for algorithm development, which provide a standard benchmark when reporting
results. Different databases are collected to address different types of challenges
or variations, such as illumination, pose, occlusion, etc. In this project, I have
used the Yale database [11] [13], which contains 165 gray-scale images in GIF format
of 15 subjects. The images are at a resolution of 243x320 pixels. There are 11
images of each person, one per variation: center-light, with glasses, happy, left-light,
without glasses, normal, right-light, sad, sleepy, surprised, and wink. Some sample
images of this database are shown in Figure 1.1.

Figure 1.1: Typical examples of sample face images from the Yale face database
1.4 Thesis Organisation

The rest of the thesis is organised into the following six chapters.

Chapter 2: Steps in Face Recognition
This chapter outlines the different steps of face recognition in detail.
Chapter 3: Feature Extraction
This chapter deals with various feature extraction algorithms for face recognition.
Chapter 4: Feature Selection
This chapter discusses how the swarm intelligence based optimization algorithms
are used for optimizing the feature vector set.
Chapter 5: Face Recognition using swarm intelligence
This chapter details the face recognition methods that use features selected through
swarm optimization.
Chapter 6: Results and Analysis
This chapter presents a comparative analysis of the two algorithms, followed by the
results drawn from the research.
Chapter 7: Conclusion
In this chapter, conclusion and possible directions for future work are given.
Chapter 2
Steps in Face Recognition
Face images are first scaled to a default size, say 128 x 128, on which the system can
operate. Several pre-processing steps may then be applied; a minimal code sketch of
these steps is given after the list.

Histogram equalization - For images that are too dark or too light, it
modifies the dynamic range and improves the contrast of the image so
that facial features become more apparent.

Median filtering - For noisy images, especially those obtained from a camera
or from a frame grabber, median filtering can clean the image without
much loss of information.

Background removal - In order to deal primarily with facial information,
the background is removed from the face image.

Shift and rotation normalization - It is possible that in the face image the
head is somehow shifted or rotated. The head plays a major role in the
determination of facial features, so the pre-processing module determines
and normalizes the shifts and rotations in the head position.

Illumination normalization - Face images taken under different lighting
conditions are normalized so that illumination variations do not dominate
recognition.
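The pre-processing steps listed above can be expressed compactly in code. The following is a minimal sketch using OpenCV (an assumed implementation choice; the thesis does not prescribe a library), covering scaling, histogram equalization and median filtering.

```python
# Minimal pre-processing sketch (assumed OpenCV-based; illustrative only).
import cv2
import numpy as np

def preprocess_face(image: np.ndarray, size=(128, 128)) -> np.ndarray:
    # Scale the face image to a default size on which the system can operate.
    face = cv2.resize(image, size, interpolation=cv2.INTER_AREA)
    # Convert colour images to gray level before histogram equalization.
    if face.ndim == 3:
        face = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
    # Histogram equalization: improve contrast so facial features are more apparent.
    face = cv2.equalizeHist(face)
    # Median filtering: remove camera / frame-grabber noise with little loss of detail.
    face = cv2.medianBlur(face, 3)
    return face
```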
Chapter 3
Feature Extraction
Face recognition algorithms can be classified into two broad categories according
to the feature extraction scheme used for face representation: feature-based methods and
appearance-based methods [1].
3.1 Feature-based Methods
Properties and geometric relations, such as the areas, distances, and angles between
facial feature points, are used as descriptors for face recognition in this approach
[2]. These methods try to define a face as a function and attempt to find a standard
template for all faces. The features can be defined independently of each other.
For example, a face can be divided into eyes, face contour, nose and mouth. A
face model can also be built from edges. But these methods are limited to faces that
are frontal and unoccluded. A face can also be represented as a silhouette. These
standard patterns are compared to the input images to detect faces. This approach
is simple to implement, but it is inadequate for face detection: it cannot achieve good
results with pose angle variation, scale difference or shape change.
3.2 Appearance-based Methods
Appearance-based methods consider the global properties of the face image intensity
pattern [8]. Typically, appearance-based face recognition algorithms proceed by
computing basis vectors to represent the face data efficiently. In the next step,
the faces are projected onto these vectors, and the projection coefficients can be
used for representing the face images. Popular algorithms such as PCA, LDA, ICA,
LFA, Correlation Filters, Manifolds and Tensorfaces are based on the appearance
of the face. Holistic approaches to face recognition have trouble dealing with pose
variations. In general, appearance-based methods rely on techniques from statistical
analysis and machine learning to find the relevant characteristics of face images. For
feature extraction in this project, appearance-based methods, namely Principal
Component Analysis (PCA), Linear Discriminant Analysis (LDA), the Discrete Cosine
Transform (DCT) and the Discrete Wavelet Transform (DWT), have been used. They
are described in detail in the next subsections.
3.2.1 Principal Component Analysis (PCA)

Figure 3.1: PCA. x and y are the original basis; the first principal component is also shown

PCA is closely related to the Karhunen-Loeve Transform (KLT). Let X_{n x m} be the
data matrix whose columns x_1, ..., x_m are the image vectors and n is the number of
pixels per image. The KLT basis is obtained by solving the eigenvalue problem [3]:

C_x = Φ Λ Φ^T   (3.1)

C_x = (1/m) X X^T   (3.2)

where C_x is the covariance matrix of the data, Φ is the matrix of eigenvectors, and Λ
is a diagonal matrix containing the corresponding eigenvalues.
Figure 3.2: (Starting from top-left) (a) class mean image, (b) overall mean image, (c) test image, (d) reconstructed test image
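To make the KLT formulation concrete, the following NumPy sketch (illustrative only; the function and variable names are assumptions, not the thesis code) computes an eigenface basis using the standard small-matrix trick and projects images onto it.

```python
# PCA (eigenface) sketch following the KLT formulation above; illustrative only.
import numpy as np

def pca_basis(X: np.ndarray, k: int):
    """X: n x m matrix of vectorized face images (one column per image)."""
    mean = X.mean(axis=1, keepdims=True)
    A = X - mean                                   # centre the data
    # Use the m x m inner-product matrix (A^T A) instead of the large n x n
    # covariance matrix; this is the usual trick when m << n.
    evals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(evals)[::-1][:k]            # keep the k largest eigenvalues
    U = A @ V[:, order]                            # map eigenvectors back to image space
    U /= np.linalg.norm(U, axis=0)                 # orthonormal eigenfaces
    return mean, U

def project(X: np.ndarray, mean: np.ndarray, U: np.ndarray) -> np.ndarray:
    return U.T @ (X - mean)                        # k x m projection coefficients
```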
3.2.2 Linear Discriminant Analysis (LDA)

Classic LDA is designed to take into account only two classes. Specifically, it requires
data points of different classes to be far from each other, while points from the same
class are close to each other. Consequently, LDA obtains different projection vectors
for each class. Suppose we have m samples x_1, ..., x_m belonging to c classes, where
each class has m_k elements. We assume that the mean has been subtracted from the
samples, as in PCA. The ratio of between-class scatter to within-class scatter is the
optimizing criterion in LDA:

Criterion(W) = S_w^{-1} S_b   (3.3)

Between-class scatter:

S_b = Σ_{i=1}^{c} |χ_i| μ_i μ_i^T   (3.4)

Within-class scatter:

S_w = Σ_{i=1}^{c} Σ_{x_k ∈ χ_i} (x_k − μ_i)(x_k − μ_i)^T   (3.5)

where χ_i denotes the set of samples of class i and μ_i its mean.
Figure 3.3: (a) The two classes are not well separated when projected onto this line;
(b) this line succeeds in separating the classes as well as reducing the dimensionality
from two features (x1, x2) to a single value y [4]
Typically, when dealing with face images (and most other image-based pattern
recognition problems), the number of training images is smaller than the number
of pixels (or, equivalently, the dimensionality of the data); thus the within-class scatter
matrix S_w is singular, causing problems for LDA. To address this issue, [13] first
performs PCA to reduce the dimensionality of the data in order to overcome
this singular-matrix problem and then applies LDA in this lower-dimensional PCA
subspace.
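A corresponding sketch of the LDA step: the scatter matrices of equations (3.4) and (3.5) are accumulated on PCA-reduced samples, and the projection is taken from the leading eigenvectors of S_w^{-1} S_b. A pseudo-inverse is used as a guard in case S_w is still ill-conditioned; as before, all names are illustrative rather than taken from the thesis.

```python
# LDA (Fisherface-style) sketch built from the scatter matrices above; illustrative only.
import numpy as np

def lda_basis(Y: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Y: d x m matrix of (PCA-projected) samples; labels: length-m class ids."""
    d, m = Y.shape
    mu = Y.mean(axis=1, keepdims=True)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        mu_c = Yc.mean(axis=1, keepdims=True)
        Sb += Yc.shape[1] * (mu_c - mu) @ (mu_c - mu).T      # between-class scatter
        Sw += (Yc - mu_c) @ (Yc - mu_c).T                    # within-class scatter
    evals, W = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)        # pinv guards a singular Sw
    order = np.argsort(evals.real)[::-1][:k]
    return W[:, order].real                                  # d x k projection matrix
```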
3.2.3 Discrete Cosine Transform (DCT)

The Discrete Cosine Transform (DCT) expresses a sequence of data points in terms
of a sum of cosine functions oscillating at different frequencies. It has strong energy
compaction properties; therefore, it can be used to transform images, compacting
the variations and allowing an effective dimensionality reduction. It has been widely
used for data compression. The DCT is based on the discrete Fourier transform,
but uses only real numbers. When a DCT is performed over an image, the energy
is compacted in the upper-left corner.
Figure 3.4: (a) A typical face image, (b) its DCT transformed image and (c) typical
division of the DCT coefficients into low, middle and high frequencies (Source: [14])

For an M x N image, where each image corresponds to a 2D matrix, the DCT
coefficients are calculated as follows [5]:

F(u, v) = (1/√(MN)) α(u) α(v) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x+1)uπ / 2M] cos[(2y+1)vπ / 2N]   (3.6)

α(ω) = 1 if ω = 0, √2 otherwise   (3.7)
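The transform of equation (3.6) can be sketched directly in NumPy as below; keeping the low-frequency block in the upper-left corner yields a compact feature vector. (An orthonormal library routine such as scipy.fft.dctn with norm='ortho' should give an equivalent result; the code here simply mirrors the formula.)

```python
# Direct 2D DCT of equation (3.6) and low-frequency feature extraction; illustrative only.
import numpy as np

def dct2(f: np.ndarray) -> np.ndarray:
    M, N = f.shape
    x = np.arange(M); y = np.arange(N)
    u = x[:, None]; v = y[:, None]
    Cx = np.cos((2 * x[None, :] + 1) * u * np.pi / (2 * M))   # M x M cosine basis
    Cy = np.cos((2 * y[None, :] + 1) * v * np.pi / (2 * N))   # N x N cosine basis
    F = Cx @ f @ Cy.T / np.sqrt(M * N)
    alpha_u = np.where(np.arange(M) == 0, 1.0, np.sqrt(2.0))[:, None]
    alpha_v = np.where(np.arange(N) == 0, 1.0, np.sqrt(2.0))[None, :]
    return alpha_u * alpha_v * F

def dct_features(face: np.ndarray, block: int = 8) -> np.ndarray:
    # Energy is compacted in the upper-left corner, so a small block suffices.
    return dct2(face.astype(float))[:block, :block].ravel()
```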
Chapter 4
Feature Selection
Feature selection (FS) is a global optimization problem in machine learning which
reduces the number of features, removes irrelevant, noisy and redundant data, and
results in acceptable recognition accuracy [8]. Although feature selection is primarily
performed to select relevant and informative features, it can have other motivations,
including:

General data reduction, to limit storage requirements and increase algorithm
speed;

Feature set reduction, to save resources in the next round of data collection or
during utilization;

Performance improvement, to gain in predictive accuracy;

Data understanding, to gain knowledge about the process that generated the data.
4.1 Bacterial Foraging Optimization (BFO)

Bacterial Foraging Optimization (BFO) is inspired by the foraging behavior of
E. coli bacteria. After many generations, bacteria with poor foraging strategies are
eliminated, while the individuals with good foraging strategies survive, signifying
survival of the fittest. The whole process can be divided into three sections, namely
chemotaxis, reproduction, and elimination and dispersal [15].
4.1.1 Chemotaxis

Chemotaxis simulates the movement of an E. coli bacterium through tumbling and
swimming: the bacterium takes a step in a random direction (a tumble) and keeps
swimming in that direction as long as the fitness keeps improving.
4.1.2 Reproduction
The health of each bacterium is calculated as the sum of the step fitness values during
its life, namely,

J_health^i = Σ_{j=1}^{Nc+1} J(i, j, k, l)   (4.1)
4.1.3 Elimination and Dispersal
Chemotaxis provides a basis for local search, and the reproduction process speeds
up the convergence of the algorithm. However, chemotaxis and reproduction alone
are not enough for global optimum searching, since bacteria may get stuck around
their initial positions or local optima. Elimination and dispersal changes the diversity
of the population, either gradually or suddenly, and reduces the chance of being
trapped in local optima. In BFO, according to a preset probability Ped, bacteria are
eliminated and dispersed after a certain number of reproduction steps. After
elimination, they are moved to another position within the environment.
4.1.4 BFO Algorithm
For each bacterium i in chemotactic step j, a tumble and (possibly) a swim are
performed as follows. Let

J_last = J(i, j, k, l)   (4.3)

and generate a random direction Δ(i); then move

θ^i(j+1, k, l) = θ^i(j, k, l) + C(i) Δ(i) / √(Δ^T(i) Δ(i))   (4.4)

This results in a step of size C(i) in the direction of the tumble for bacterium i.
Compute J(i, j+1, k, l).

Swim: as long as J(i, j+1, k, l) < J_last (and for at most Ns steps), let
J_last = J(i, j+1, k, l) and take another step of size C(i) in the same direction,

θ^i(j+1, k, l) = θ^i(j, k, l) + C(i) Δ(i) / √(Δ^T(i) Δ(i))   (4.8)

Step 5: If j < Nc, go to step 4. In this case, continue chemotaxis, since the life of the
bacteria is not over.

Step 6: Reproduction. For the given k and l, and for each i = 1, ..., S, let

J_health^i = Σ_{j=1}^{Nc+1} J(i, j, k, l)   (4.9)

be the health of bacterium i. The bacteria with the best health values split, and the
copies that are made are placed at the same location as their parent.

Step 7: If k < Nre, go to step 3. In this case, we have not reached the specified number
of reproduction steps, so we start the next generation of the chemotactic loop.
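To summarize the flow of the algorithm, the following is a much-simplified BFO sketch: it keeps the chemotaxis (tumble and swim), reproduction, and elimination-dispersal loops but omits the cell-to-cell attraction term and other refinements. All parameter names and default values are illustrative rather than taken from the thesis, and `fitness` is any cost J to be minimized (for example, a feature-subset evaluation).

```python
# Simplified BFO sketch (no swarming term); illustrative only.
import numpy as np

def bfo_minimize(fitness, dim, S=20, Nc=10, Ns=4, Nre=4, Ned=2, C=0.1, Ped=0.25,
                 rng=np.random.default_rng(0)):
    theta = rng.uniform(-1.0, 1.0, (S, dim))             # bacteria positions
    J = np.array([fitness(t) for t in theta])
    health = np.zeros(S)
    best_pos, best_J = theta[J.argmin()].copy(), J.min()
    for _ in range(Ned):                                  # elimination-dispersal loop
        for _ in range(Nre):                              # reproduction loop
            health[:] = 0.0
            for _ in range(Nc):                           # chemotaxis loop
                for i in range(S):
                    delta = rng.uniform(-1.0, 1.0, dim)   # random tumble direction
                    step = C * delta / np.linalg.norm(delta)
                    J_last = J[i]
                    theta[i] += step                      # tumble (cf. eq. 4.4)
                    J[i] = fitness(theta[i])
                    swims = 0
                    while J[i] < J_last and swims < Ns:   # keep swimming while improving
                        J_last = J[i]
                        theta[i] += step                  # swim (cf. eq. 4.8)
                        J[i] = fitness(theta[i])
                        swims += 1
                    health[i] += J[i]                     # accumulate health (cf. eq. 4.9)
                    if J[i] < best_J:
                        best_J, best_pos = J[i], theta[i].copy()
            # Reproduction: the healthiest half split; copies sit at the parent position.
            order = np.argsort(health)
            theta = np.concatenate([theta[order[:S // 2]]] * 2)
            J = np.array([fitness(t) for t in theta])
        # Elimination-dispersal: with probability Ped, move a bacterium at random.
        for i in range(S):
            if rng.random() < Ped:
                theta[i] = rng.uniform(-1.0, 1.0, dim)
                J[i] = fitness(theta[i])
                if J[i] < best_J:
                    best_J, best_pos = J[i], theta[i].copy()
    return best_pos, best_J
```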
4.2 Particle Swarm Optimization (PSO)

4.2.1 PSO Algorithm

In PSO [26], each particle keeps track of the best position it has visited so far (pbest)
and of the best position found by the swarm (gbest). At every iteration, a particle's
velocity is updated towards its pbest and the gbest, and its position is then updated as

p_i^new = p_i + v_i^new   (4.11)

This process is repeated until the stopping criteria are met; otherwise, go to the 2nd step.
4.2.2 Binary PSO
Consider a database of L subjects or classes W_1, W_2, ..., W_L with N_1, N_2, ..., N_L
samples respectively. Let M_1, M_2, ..., M_L be the individual class means and M_0 be
the mean of all feature vectors. The fitness function is defined in terms of class
separation; by minimizing the fitness function, class separation is increased. With
each iteration, the most important features are selected: a binary value of 1 at a
position implies that the corresponding feature is selected as a distinguishing feature
for the succeeding iterations, and if the position value is 0 the feature is not selected.
The expressions for the class samples, the individual class mean and the mean of all
feature vectors are shown below:

W_j^(i), for j = 1, 2, ..., N_i   (4.12)

M_i = (1/N_i) Σ_{j=1}^{N_i} W_j^(i)   (4.13)

M_0 = (1/N) Σ_{i=1}^{L} N_i M_i   (4.14)

where N = Σ_{i=1}^{L} N_i is the total number of samples.
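A sketch of binary PSO for feature selection is given below: each particle is a 0/1 mask over the feature dimensions, velocities follow the usual PSO update, and a sigmoid maps each velocity component to the probability that the corresponding bit is set. The `fitness` argument is an assumed callable that scores a feature subset (for example, a class-separation measure built from equations (4.12)-(4.14)); all names are illustrative.

```python
# Binary PSO for feature selection (sigmoid transfer); illustrative only.
import numpy as np

def binary_pso(fitness, n_features, n_particles=30, iters=50,
               w=0.7, c1=1.5, c2=1.5, rng=np.random.default_rng(0)):
    X = rng.integers(0, 2, (n_particles, n_features))        # binary positions (masks)
    V = rng.uniform(-1.0, 1.0, (n_particles, n_features))    # real-valued velocities
    pbest, pbest_val = X.copy(), np.array([fitness(x) for x in X])
    g = pbest_val.argmin()                                    # fitness is minimized
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(iters):
        r1 = rng.random((n_particles, n_features))
        r2 = rng.random((n_particles, n_features))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        prob = 1.0 / (1.0 + np.exp(-V))                       # sigmoid transfer function
        X = (rng.random((n_particles, n_features)) < prob).astype(int)
        vals = np.array([fitness(x) for x in X])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = X[improved], vals[improved]
        if vals.min() < gbest_val:
            gbest, gbest_val = X[vals.argmin()].copy(), vals.min()
    return gbest                                              # selected-feature mask
```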
Chapter 5
Face Recognition Using Swarm
Intelligence
Swarm intelligence is a family of decentralized stochastic algorithms inspired by
the behavior of swarms. Swarm intelligence algorithms include Particle Swarm
Optimization (PSO) [26], Bacterial Foraging Optimization (BFO) [15], and Ant
Colony Optimization (ACO). The ultimate goal of any of these optimization
algorithms is to find the global best fitness as efficiently as possible.

5.1 Face Recognition Using BFO
Feature Extraction: Perform PCA on the images to obtain the optimal bases
before LDA. Then generate the eigenvectors as the feature vector set (which
will be input to the BFO) through LDA.

Feature Selection: Apply the BFO algorithm on the feature vector set as stated in
Section 4.1. Pick the position of the bacterium B with the maximum J_health value.
This position represents the best feature subset of the features defined in the feature
extraction step.
Classification: Calculate the difference between the feature subset (obtained
through feature selection) of each image of the facial gallery and that of the test image
using the Euclidean distance defined below. The index of the gallery image which has
the smallest distance to the image under test is taken as the recognition result. For an
n-dimensional space, the Euclidean distance between any two points p and q is given by:

D = √( Σ_{i=1}^{n} (p_i − q_i)^2 )   (5.1)
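The classification step amounts to a nearest-neighbour search under the distance of equation (5.1). A minimal sketch (names are illustrative, not the thesis code):

```python
# Nearest-neighbour classification using the Euclidean distance of eq. (5.1); illustrative only.
import numpy as np

def classify(gallery: np.ndarray, labels: np.ndarray, test: np.ndarray):
    """gallery: k x n selected-feature vectors; labels: length-k subject ids; test: length-n vector."""
    dists = np.sqrt(((gallery - test) ** 2).sum(axis=1))   # Euclidean distance to each gallery image
    return labels[int(np.argmin(dists))]                   # label of the nearest gallery image
```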
5.2 Face Recognition Using PSO

In the second approach, DCT coefficients are extracted from the face images, and the
binary PSO algorithm of Section 4.2 is applied to select the optimal feature subset;
classification is again carried out with the Euclidean distance of equation (5.1).
Chapter 6
Results & Analysis
6.1 Face Database

The two recognition approaches have been tested on the publicly available YALE face
database [11]. It comprises images from 15 subjects. For each subject, 11 different
images are recorded, one per variation: center-light, with glasses, happy, left-light,
without glasses, normal, right-light, sad, sleepy, surprised, and wink. In total, the
database consists of 165 images.
6.2 Experimental Setup

In order to construct the training set, 7 images per person were used, and the
remaining 4 images were used for testing. All 15 classes of persons in the database
were considered. The same training and testing data sets were used in both
approaches. The results are displayed in Table 6.1.
Table 6.1: Comparative results of the two approaches

Feature Extraction   Feature Selection   Training Time   Recognition Rate
PCA + LDA            BFO                 257.13 sec      95.14%
DCT                  PSO                 161.86 sec      93.7%
6.3 Comparative Analysis

As Table 6.1 shows, the PSO-based selection algorithm requires less training time
than the BFO-based selection algorithm; hence, BFO is computationally more
expensive than PSO. On the other hand, the BFO-based approach achieves a higher
recognition rate. Therefore, it can be said that the effectiveness of BFO in finding the
optimal feature subset, compared to PSO, compensates for its computational
inefficiency.
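For completeness, the evaluation protocol of Section 6.2 can be sketched as follows, assuming a `features` function that maps an image to its selected-feature vector and the `classify` routine from the previous chapter; this is an illustrative sketch, not the thesis code.

```python
# Evaluation sketch: 7 training / 4 test images per Yale subject, recognition rate
# as the fraction of correctly classified test images; illustrative only.
import numpy as np

def recognition_rate(images_by_subject, features, classify):
    gallery, gallery_labels, tests, test_labels = [], [], [], []
    for subject, images in images_by_subject.items():        # 15 subjects, 11 images each
        for img in images[:7]:                                # 7 training images per subject
            gallery.append(features(img)); gallery_labels.append(subject)
        for img in images[7:]:                                # 4 test images per subject
            tests.append(features(img)); test_labels.append(subject)
    gallery = np.vstack(gallery)
    gallery_labels = np.array(gallery_labels)
    correct = sum(classify(gallery, gallery_labels, t) == lbl
                  for t, lbl in zip(tests, test_labels))
    return correct / len(tests)
```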
Chapter 7
Conclusion
In this thesis, face recognition was performed by applying swarm optimization
algorithms. It was found that the underlying foraging principle and swarm
optimization can be integrated into evolutionary computational algorithms to
provide a better search strategy for finding optimal feature vectors for face
recognition. Finally, it is believed that the two swarm optimization methods,
namely bacterial foraging optimization and particle swarm optimization, may be
useful for the development of face recognition systems.
Bibliography
[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Comput. Surv., 35(4):399-458, December 2003.
[2] Marios Savvides, Jingu Heo, and SungWon Park. Face recognition. In Anil K. Jain, Patrick Flynn, and Arun A. Ross, editors, Handbook of Biometrics, pages 43-70. Springer US, 2008.
[3] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. Pattern Anal. Mach. Intell., 12(1):103-108, January 1990.
[4] F. Z. Chelali, A. Djeradi, and R. Djeradi. Linear discriminant analysis for face recognition. In Multimedia Computing and Systems, 2009. ICMCS '09. International Conference on, pages 1-10, 2009.
[5] B. Schwerin and K. Paliwal. Local-DCT features for facial recognition. In Signal Processing and Communication Systems, 2008. ICSPCS 2008. 2nd International Conference on, pages 1-6, 2008.
[6] Rutuparna Panda, Manoj Kumar Naik, and B. K. Panigrahi. Face recognition using bacterial foraging strategy. Swarm and Evolutionary Computation, 1(3):138-146, 2011.
[7] S. Arivalagan and K. Venkatachalapathy.
[11] http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
[12] Matthew Turk and Alex Pentland. Eigenfaces for recognition. J. Cognitive Neuroscience, 3(1):71-86, January 1991.
[13] Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19(7):711-720, July 1997.
[14] Saeed Dabbaghchian, Masoumeh P. Ghaemmaghami, and Ali Aghagolzadeh. Feature extraction using discrete cosine transform and discrimination power analysis with a face recognition technology. Pattern Recognition, 43(4):1431-1440, 2010.
[15] K. M. Passino. Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Systems Magazine, 22:52-67, 2002.
[16] M. S. Bartlett, Javier R. Movellan, and T. J. Sejnowski. Face recognition by independent component analysis. Neural Networks, IEEE Transactions on, 13(6):1450-1464, 2002.
[17] Xiaofei He, Shuicheng Yan, Yuxiao Hu, P. Niyogi, and Hong-Jiang Zhang. Face recognition using Laplacianfaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(3):328-340, 2005.
[18] Chengjun Liu and Harry Wechsler. Evolutionary pursuit and its application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 22(6):570-582, June 2000.
[19] Ming-Hsuan Yang. Kernel eigenfaces vs. kernel fisherfaces: Face recognition using kernel methods.
[24] Jun Luo, Y. Ma, E. Takikawa, Shihong Lao, M. Kawade, and Bao-Liang Lu. Person-specific SIFT features for face recognition. In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, volume 2, pages II-593-II-596, 2007.
[25] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell., 26(1):131-137, January 2004.
[26] J. Kennedy and R. Eberhart. Particle swarm optimization. In Neural Networks, 1995. Proceedings., IEEE International Conference on, volume 4, pages 1942-1948, 1995.