Seminar Project: Face Recognition
Programme
Seminar Report
BCA Sem VI
AY 2022-23
Project Guide: Prof. Nehal Patel
Acknowledgement
The success and final outcome of this project required a great deal of guidance and assistance from many people, and we are extremely fortunate to have received this throughout the completion of our project work. Whatever we have achieved is only possible because of such supervision and assistance, and we would not forget to thank them.
We thank I/C Principal Dr. Aditi Bhatt, Head of Department Dr. Vaibhav Desai, our project guide Prof. Nehal Patel, and all other assistant professors of SDJ International College, who took keen interest in our project work and guided us until its completion.
We are extremely grateful to our guide for providing such kind support and guidance despite her busy schedule managing college affairs.
We are thankful and fortunate to have received support and guidance from everyone who helped us along the way.
Abstract
Face recognition has been one of the most interesting and important research fields over the past two decades. The reasons come from the need for automatic recognition and surveillance systems, the interest in how the human visual system recognizes faces, and the design of human-computer interfaces. This research draws on knowledge and researchers from disciplines such as neuroscience, psychology, computer vision, pattern recognition, image processing, and machine learning. Many papers have been published to overcome different factors (such as illumination, expression, scale, and pose) and to achieve better recognition rates, yet there is still no technique that is robust against uncontrolled practical cases, which may involve several of these factors simultaneously. In this report, we go through the general ideas and structure of recognition, important issues and factors of human faces, critical techniques and algorithms, and finally give a comparison and conclusion.
Table of Contents:
(1) Introduction to face recognition: Structure and Procedure
(2) Fundamentals of face pattern recognition
(3) Issues and factors of human faces
(4) Techniques and algorithms on face detection
(5) Techniques and algorithms on face feature extraction and face recognition
(6) Comparison and Conclusion
1. Introduction to Face Recognition: Structure and Procedure
In this report, we focus on image-based face recognition. Given a picture taken from a digital camera, we would like to know whether there is any person inside, where his or her face is located, and who he or she is. Towards this goal, we generally separate the face recognition procedure into three steps: Face Detection, Feature Extraction, and Face Recognition.
Face Detection:
The main function of this step is to determine (1) whether human faces appear in a given image, and (2) where these faces are located. The expected outputs of this step are patches containing each face in the input image. In order to make the subsequent face recognition system more robust and easier to design, face alignment is performed to normalize the scales and orientations of these patches. Besides serving as the pre-processing step for face recognition, face detection can also be used on its own for applications such as region-of-interest detection.
Feature Extraction:
After the face detection step, human-face patches are extracted from images. Directly using these patches for face recognition has some disadvantages. First, each patch usually contains over 1000 pixels, which is too large to build a robust recognition system. Second, face patches may be taken from different camera alignments, with different facial expressions and illuminations, and may suffer from occlusion and clutter. To overcome these drawbacks, feature extraction is performed to achieve information packing, dimension reduction, salience extraction, and noise cleaning. After this step, a face patch is usually transformed into a vector with a fixed dimension or into a set of fiducial points and their corresponding locations. We will talk about this step in more detail in Section 2. In some literature, feature extraction is either included in face detection or in face recognition.
Face Recognition:
After formulating the representation of each face, the last step is to recognize the identities of these faces. Based on a database of enrolled faces, the features of an input face are compared with the stored feature vectors of each person, and the most similar class is reported as the identity, as illustrated in fig. 2.
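To make the three-step flow concrete, the sketch below outlines such a pipeline in Python. It is only a minimal illustration, not the method of any cited paper: detection uses a pretrained OpenCV Haar cascade, the "feature" is just the normalized raw patch, and the file name and enrolled database are hypothetical.

```python
import cv2
import numpy as np

def detect_faces(image_gray, cascade):
    """Step 1: face detection -- return cropped, size-normalized face patches."""
    boxes = cascade.detectMultiScale(image_gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(image_gray[y:y + h, x:x + w], (64, 64)) for (x, y, w, h) in boxes]

def extract_features(patch):
    """Step 2: feature extraction -- a stand-in that flattens and normalizes the patch."""
    v = patch.astype(np.float32).ravel()
    return (v - v.mean()) / (v.std() + 1e-8)

def recognize(feature, database):
    """Step 3: recognition -- nearest neighbour over stored feature vectors."""
    names = list(database)
    dists = [np.linalg.norm(feature - database[n]) for n in names]
    return names[int(np.argmin(dists))]

# Usage sketch (paths and database contents are illustrative only):
# cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# img = cv2.imread("group_photo.jpg", cv2.IMREAD_GRAYSCALE)
# db = {"alice": extract_features(alice_patch), "bob": extract_features(bob_patch)}
# print([recognize(extract_features(p), db) for p in detect_faces(img, cascade)])
```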
Figure 2: An example of how the three steps work on an input image. (a) The input image and the result of face detection (the red rectangle). (b) The extracted face patch. (c) The feature vector after feature extraction. (d) Comparing the input vector with the stored vectors in the database by classification techniques and determining the most probable class (the red rectangle).
2. Fundamentals of Face Pattern Recognition
Before going into the details of face recognition techniques and algorithms, we would like to make a digression here to talk about pattern recognition. The discipline of pattern recognition includes all kinds of recognition tasks, such as speech recognition, object recognition, data analysis, and face recognition. In this section, we will not discuss those specific applications, but introduce the basic structure, general ideas, and general concepts behind them.
The general structure of pattern recognition is shown in fig. 3. In order to build a system for recognition, we always need data sets for building categories and for comparing the similarities between the test data and each category. A test sample is usually called
a "query" in the image retrieval literature, and we will use this term throughout this report. From fig. 3, we can easily notice the symmetric structure. Starting from the data-set side, we first perform dimension reduction on the stored raw data. The methods of dimension reduction can be categorized into data-driven methods and domain-knowledge methods. After dimension reduction, each raw data item in the data set is transformed into a set of features, and the classifier is mainly trained on these feature representations. When a query comes in, we perform the same dimension reduction procedure on it and feed its features into the trained classifier. The output of the classifier will be the optimal class label (sometimes with the classification confidence) or a rejection note (return to manual classification).
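The following minimal scikit-learn sketch mirrors this symmetric structure, with PCA as a stand-in data-driven dimension reduction and a nearest-neighbour classifier; the random data is purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy data standing in for the stored raw data (rows = samples, columns = raw measurements).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 400))   # 200 training samples, 400 raw dimensions
y_train = rng.integers(0, 5, size=200)  # 5 categories

# Data-driven dimension reduction (PCA) followed by a classifier trained on the features.
model = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)

# A query goes through the same dimension reduction before classification.
query = rng.normal(size=(1, 400))
print(model.predict(query))             # predicted class label for the query
```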
Notation
There are several conventional notations in the literature of pattern recognition and machine learning. We usually denote a matrix with an upper-case character and a vector with a lower-case one.
If the label of each training sample is known, what we try to learn is the relation between the feature vectors and their corresponding labels, and this kind of learning is called supervised learning. On the other hand, if the label of each training sample is unknown, then what we try to learn is the distribution of the possible categories of feature vectors in the training data set, and this kind of learning is called unsupervised learning. In fact, there is another kind of learning called semi-supervised learning, in which only part of the training data has labels; this kind of learning is beyond the scope of this report.
Evaluation Methods
Besides the choice of pattern recognition methods, we also need to evaluate the performance of the experiments. There are two main evaluation plots: the ROC (receiver operating characteristic) curve and the PR (precision and recall) curve. The ROC curve examines the relation between the true positive rate and the false positive rate, while the PR curve examines the relation between the detection rate (recall) and the detection precision. In the two-class recognition case (for example, face versus non-face), the true positive rate is the portion of face images detected by the system, while the false positive rate is the portion of non-face images wrongly detected as faces. The term true positive rate here has the same meaning as detection rate and recall, and we give a detailed description in table 1 and table 2. In fig. 5, we show examples of the PR curve. In addition to using curves for evaluation, there are some frequently used values for performance judgment, and we summarize them in table 3.
The threshold used to decide between positive and negative for a given case plays an important role in pattern recognition. With a low threshold, we achieve a high true positive rate but also a high false positive rate, and vice versa. Note that each point on the ROC curve or PR curve corresponds to a specific threshold.
The terms positive and negative reveal the asymmetric condition in detection tasks, where one class is the desired pattern class and the other class is its complement. In tasks where each class has equal importance or similar meaning (for example, each class denotes one kind of object), the error rate is preferred instead.
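The short sketch below makes the threshold / evaluation relationship concrete: each threshold gives one (FPR, TPR) point on the ROC curve and one (recall, precision) point on the PR curve. The scores and labels are toy values for illustration only.

```python
import numpy as np

def confusion_rates(scores, labels, threshold):
    """Evaluation values for one decision threshold.
    scores: detector outputs (higher = more face-like); labels: 1 = face, 0 = non-face."""
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))   # faces correctly detected
    fp = np.sum(predicted & (labels == 0))   # non-faces wrongly detected as faces
    fn = np.sum(~predicted & (labels == 1))
    tn = np.sum(~predicted & (labels == 0))
    tpr = tp / (tp + fn)                     # true positive rate = detection rate = recall
    fpr = fp / (fp + tn)                     # false positive rate
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return tpr, fpr, precision

# Sweeping the threshold traces out the ROC and PR curves.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 0, 1, 0, 1, 0, 0])
for t in (0.25, 0.5, 0.75):
    print(t, confusion_rates(scores, labels, t))
```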
Fig 5: An example of the PR curve. This is the experimental result of a video fingerprinting technique, where five different methods are compared. The horizontal axis indicates the recall and the vertical axis indicates the precision.
Table 1 (excerpt):
Term — Definition
Recall (R) — # of true positives / # of all desired patterns in the validation set
Conclusion
The tasks and cases discussed in the previous sections give an overview of pattern recognition. To gain more insight into the performance of pattern recognition techniques, we need to take care of some important factors. In template matching, the number of templates for each class and the adopted distance metric directly affect the recognition result. In statistical pattern recognition, there are four important factors: the size of the training data N, the dimensionality of each feature vector d, the number of classes C, and the complexity of the classifier h; we summarize their meanings and relations in table 4 and table 5. In the syntactic approach, we expect that the more rules are considered, the higher the recognition performance we can achieve, although the system also becomes more complicated; and sometimes it is hard to transfer and organize human knowledge into algorithms. Finally, in neural networks, the number of layers, the number of perceptrons (neurons) used, the dimensionality of the feature vectors, and the number of classes all affect the recognition performance. More interestingly, neural networks have been shown to have close relationships with statistical pattern recognition techniques [5].
Holistic-based or feature-based
This is another interesting argument in psychophysics / neuroscience as well as in algorithm design. The holistic-based viewpoint claims that humans recognize faces by their global appearance, while the feature-based viewpoint believes that important features such as the eyes, nose, and mouth play dominant roles in identifying and remembering a person. The design of face recognition algorithms also applies these perspectives, as discussed in Section 5.
Thatcher Illusion
The Thatcher illusion is an excellent example showing how face alignment affects human recognition of faces. In the illusion shown in fig. 6, the eyes and mouth of an expressive face are excised and inverted, and the result looks grotesque in an upright face. However, when the face is shown inverted, it looks fairly normal in appearance, and the inversion of the internal features is not readily noticed.
Appearance-variant factors
There are six main factors of human-face appearance variation: (1) illumination, (2) face pose, (3) face expression, (4) RST (rotation, scale, and translation) variation, (5) cluttered background, and (6) occlusion. Table 6 lists the details of each factor:
Pose — The pose variation is caused by the relative position of the face and the camera during the image acquisition process. This variation changes the spatial relations among facial features and causes serious distortion for traditional appearance-based face recognition algorithms such as eigenfaces and fisherfaces. An example of pose variation is shown in fig. 8.
Expression — Humans use different facial expressions to express their feelings or tempers. The expression variation results not only in spatial-relation changes but also in facial-feature shape changes.
RST variation — The RST (rotation, scaling, and translation) variation is also caused by the relative position of the face and the camera during acquisition.
Occlusion — Occlusion occurs frequently in practical face recognition and face detection. It means that some parts of human faces are unobserved, especially the facial features.
Figure 7: Face-patch changes under different illumination conditions. We can easily see how strongly the illumination can affect the face appearance.
Figure 8: Face-patch changes under different pose conditions. When the head pose changes, the spatial relations (distance, angle, etc.) among fiducial points (eyes, mouth, etc.) also change and result in serious distortion for the traditional appearance representation.
Design issues
When designing a face detection and face recognition system, in addition to considering the aspects from psychophysics and neuroscience and the factors of human appearance variation, there are still some design issues to be taken into account.
First, the execution speed of the system determines the possibility of on-line service and the ability to handle large amounts of data. Some earlier methods could accurately detect human faces and determine their identities using complicated algorithms, but they require from a few seconds to a few minutes per input image and cannot be used in practical applications. By contrast, several types of digital cameras now have the function to detect and focus on human faces, and this detection process usually takes less than 0.5 second. In recent pattern recognition research, many published papers concentrate on how to speed up existing algorithms and how to handle large amounts of data simultaneously, and new techniques also include the
execution time in their experimental results as a point of comparison against other techniques.
Second, the training data size is another important issue in algorithm design. It is clear that the more data are included, the more information we can exploit and the better performance we can achieve. In practical cases, however, the database size is usually limited due to the difficulty of data acquisition and human privacy concerns. Under the condition of limited data size, the designed algorithm should not only capture information from the training data but also include some prior knowledge, or try to predict and interpolate the missing and unseen data. In the comparison between the eigenface and the fisherface, it has been shown that under limited data size the eigenface can achieve better performance than the fisherface.
Finally, how to bring these algorithms into uncontrolled conditions is still an unsolved problem. In Section 3.2, we mentioned six types of appearance-variant factors; to our knowledge, there is still no technique that handles all of these factors well simultaneously. For future research, besides designing new algorithms, we would try to combine existing algorithms and modify the weights and relationships among them to see if face detection and recognition can be extended to uncontrolled conditions.
4. Face detection
From this section on, we start to talk about the technical and algorithmic aspects of face recognition. We follow the three-step procedure depicted in fig. 1 and introduce each step in order: face detection is introduced in this section, and feature extraction and face recognition are introduced in the next section. In the survey by Yang et al. [7], face detection algorithms are classified into four categories: knowledge-based, feature-invariant, template-matching, and appearance-based methods. We follow their idea, describe each category, and present excellent examples in the following subsections. Note that there are generally two face detection settings: one is based on gray-level images, and the other is based on color images.
Knowledge-based methods
These rule-based methods encode human knowledge of what constitutes a typi-
cal face. Usually, the rules capture the relationships between facial features. These
methods are designed mainly for face localization, which aims to determine the im-
age position of a single face. In this subsection, we introduce two examples based on
hierarchical knowledge-based method and vertical / horizontal projection.
This method uses a fairly simple image processing technique: the horizontal and vertical projections of image intensity. Local minima of the projection profiles are used to locate the face boundary and the rows of the facial features,
which together constitute a face candidate. Finally, each face candidate is validated by further detection rules such as the presence of eyebrows and nostrils. As shown in fig. 10, this method is sensitive to complicated backgrounds and cannot be used on images with multiple faces.
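The following short sketch computes the horizontal and vertical intensity projections described above. It only illustrates the idea; the actual rule set of the cited method (and its exact use of the profiles) is not reproduced here.

```python
import numpy as np

def intensity_projections(gray):
    """Horizontal and vertical projections of image intensity: rows or columns containing
    dark facial features show up as dips in the profiles."""
    horizontal = gray.sum(axis=1)   # one value per row
    vertical = gray.sum(axis=0)     # one value per column
    return horizontal, vertical

# Toy usage: candidate feature rows are local minima of the horizontal profile.
gray = np.random.default_rng(1).integers(0, 256, size=(120, 100)).astype(float)
h, v = intensity_projections(gray)
feature_rows = [i for i in range(1, len(h) - 1) if h[i] < h[i - 1] and h[i] < h[i + 1]]
print(len(feature_rows), "candidate rows")
```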
In this work, Hsu et al. [10] proposed to combine several features for face detection. They used color information for skin-color detection to extract candidate face regions. In order to deal with different illumination conditions, they extracted the 5% brightest pixels and used their mean color for lighting compensation. After skin-color detection and skin-region segmentation, they proposed to detect invariant facial features for region verification. The eyes and mouth are selected as the most significant features of faces, and two detection schemes are designed based on chrominance contrast and morphological operations, called the "eyes map" and the "mouth map". Finally, a triangle is formed between the two eyes and the mouth and is verified based on (1) the luminance variations and average gradient orientations of the eye and mouth blobs, (2) the geometry and orientation of the triangle, and (3) the presence of a face boundary around the triangle. The regions that pass the verification are labeled as faces, and the Hough transform is performed to extract the best-fitting ellipse around each face.
This work gives a good example of how to combine several different techniques in a cascade fashion. The lighting compensation process does not have a solid theoretical background, but it introduces the idea that, instead of modeling all kinds of illumination conditions with complicated probability or classifier models, we can design an illumination-adaptive model which modifies its detection threshold based on the illumination and chrominance properties of the present image.
Figure 11: The flowchart of the face detection algorithm proposed by Hsu et al. [10]
The eyes map and the mouth map show great performance with fairly simple operations, and in our recent work we also adopt their framework and try to design more robust maps.
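The sketch below illustrates the spirit of this stage: a "reference white" lighting compensation based on the top 5% brightest (luma) pixels, followed by a simple chrominance-range skin mask. It is an assumption-laden simplification; the numeric skin-color bounds are common illustrative values, not the thresholds of Hsu et al.

```python
import cv2
import numpy as np

def lighting_compensation(bgr):
    """Rescale color channels so the mean color of the top 5% brightest pixels maps to white.
    A sketch of the 'reference white' idea, not the exact model of the cited paper."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    luma = ycrcb[:, :, 0].astype(np.float32)
    thresh = np.percentile(luma, 95)
    ref = bgr[luma >= thresh].reshape(-1, 3).mean(axis=0)   # mean color of brightest pixels
    gain = 255.0 / np.maximum(ref, 1.0)
    return np.clip(bgr.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def skin_mask(bgr):
    """Simple chrominance-range skin detection in YCrCb (illustrative bounds)."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    lower = np.array((0, 133, 77), np.uint8)
    upper = np.array((255, 173, 127), np.uint8)
    return cv2.inRange(ycrcb, lower, upper)

# img = cv2.imread("photo.jpg"); mask = skin_mask(lighting_compensation(img))
```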
Figure 14: The locations of the missing features are estimated from two feature points. The ellipses show the areas which with high probability include the missing features. [11]
Figure 15: An example of the ASM for resistor shapes. In (a), the shape variation is summarized and several discrete points are extracted from the shape boundaries for shape learning, as shown in (b). From (c) to (e), the effects of changing the weight of the first three principal components are presented, and we can see the relationship between these components and the shape variation. [15]
The ASM model can only deal with shape variation, not texture variation. Following this work, many studies have tried to combine shape and texture variation; for example, Edwards et al. proposed first matching an ASM to boundary features in the image, and then using a separate eigenface model (a texture model based on the PCA) to reconstruct the texture in a shape-normalized frame. This approach is not, however, guaranteed to give an optimal fit of the appearance (shape boundary and texture) model to the image, because small errors in the match of the shape model can result in a shape-normalized texture map that cannot be reconstructed correctly using the eigenface model. To match shape and texture simultaneously, Cootes et al. proposed the well-known active appearance model (AAM) [19][20].
The active appearance model requires a training set of annotated images where corresponding points have been marked on each example. In fig. 16, we show that to build a facial model, the main features of human faces must be marked manually (each face image is labeled as a shape vector x). The ASM is then applied to align these shapes and build a statistical shape model; in addition,
each training face is warped so that its points match those of the mean shape, obtaining a shape-free patch. These shape-free patches are further represented as vectors and undergo an intensity normalization process (each normalized patch is denoted as g). By applying the PCA to the intensity-normalized data we obtain a linear model that captures the possible texture variation. We summarize the process that has been done so far for the AAM as follows:

x = x̄ + P_s b_s,    g = ḡ + P_g b_g,

where P_s contains the orthonormal bases of the ASM and b_s is the set of shape parameters for each training face. The matrix P_g contains the orthonormal bases of the texture variation and b_g is the set of texture parameters for each intensity-normalized shape-free patch. The details and process of the PCA are described in Section 5.
To capture the correlation between shape and texture variation, a further PCA is applied to the data as follows. For each training example we generate the concatenated vector

b = ( W_s b_s ; b_g ),

where W_s is a diagonal matrix of weights for each shape parameter, allowing for the difference in units between the shape and texture models. The PCA is applied on these vectors to generate a further model:

b = Q c.
An example image can be synthesized for a given c by generating the shape-free texture patch first and then warping it to the suitable shape.
In the training phase for face detection, we learn the mean vectors of shape and texture, the projection matrices P_s and P_g, and Q to generate a facial AAM. In the face detection phase,
we modify the vector c and the location and scale of the model to minimize the difference between the synthesized appearance and the current location and scale in the input image. After reaching a local minimum difference, we compare it with a pre-defined threshold to determine the existence of a face. Fig. 17 illustrates the difference-minimization process. The parameter modification is a rather complicated optimization problem, and in their work they combined the genetic algorithm with a pre-defined parameter-refinement matrix to facilitate the convergence process. These techniques are beyond the scope of this report, and readers who are interested in them can refer to the original papers [19].
Figure 16: A labeled training image gives a shape-free patch and a set of points. [19]
Figure 17: The fitting procedure of the active appearance model after specific numbers of iterations. [19]
Appearance-based methods
In contrast to template matching, here the models (or templates) are learned from a set of training images which should capture the representative variability of facial appearance. These learned models are then used for detection. These methods are designed mainly for face detection, and two highly cited works are introduced in the following sections. More significant techniques are included in [7][24][25][26].
Fast face detection based on the Haar features and the Adaboost algorithm
Appearance-based methods usually have better performance than feature-invariant methods because they scan all possible locations and scales in the image, but this exhaustive searching procedure also results in considerable computation. In order to speed up this procedure, Viola et al. [22][23] proposed the combination of Haar features and the Adaboost classifier [18][28]. The Haar features are used to capture the significant characteristics of human faces, especially the contrast features. Fig. 19 shows the four adopted feature shapes, where each feature is labeled by its width, length, type, and contrast value.
Figure 18: The modeling procedure of the distribution of face and non-face samples. A window size of 19x19 is used for representing the canonical human frontal face. In the top row, a six-component Gaussian mixture model is trained to capture the distribution of face samples, while in the bottom row a six-component model is trained for non-face samples. The centroids of each component are shown on the right side of the figure. [21]
The contrast value of a feature is calculated as the average intensity in the black region minus the average intensity in the white region. A 19x19 window typically contains more than one thousand Haar features, which results in a huge computational cost, yet many of them do not contribute to the classification between face and non-face samples because both face and non-face samples exhibit these contrasts. To efficiently apply the large number of Haar features, the Adaboost algorithm is used to perform feature selection, and only those features with higher discriminative ability are chosen. Fig. 19 also shows two significant Haar features which have the highest discriminative ability. For further speedup, the chosen features are utilized in a cascade fashion, where the features with higher discriminative ability are tested in the first few stages and the image windows passing these tests are fed into the later stages for more detailed tests. The cascade procedure can quickly filter out many non-face regions by testing only a few features at each stage, bringing significant computational savings.
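To make the feature definition concrete, the sketch below evaluates a simple two-rectangle Haar-like feature exactly as described above (mean of the black region minus mean of the white region). It is only an illustration; real detectors evaluate thousands of such features efficiently, typically with integral images, which are not shown here.

```python
import numpy as np

def two_rect_haar_feature(gray, x, y, w, h):
    """Two-rectangle Haar-like feature at (x, y) with size (w, h):
    average intensity of the left (black) half minus the right (white) half."""
    left = gray[y:y + h, x:x + w // 2]
    right = gray[y:y + h, x + w // 2:x + w]
    return float(left.mean() - right.mean())

# Canonical 19x19 window filled with toy values.
window = np.random.default_rng(2).integers(0, 256, size=(19, 19)).astype(float)
print(two_rect_haar_feature(window, 0, 0, 18, 19))
```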
The key concept of the cascade procedure is to keep a sufficiently high true positive rate at each stage, and this can be reached by modifying the decision threshold of the boosted classifier at each stage. Although raising the true positive
rate in this way also increases the false positive rate, this effect can be attenuated by the cascade procedure. For example, a classifier with a 99% true positive rate and a 20% false positive rate is not sufficient for practical use, while cascading this performance five times results in about a 95% true positive rate and a 0.032% false positive rate, which is a surprising improvement. During the training phase of the cascade procedure, we set a lower bound on the true positive rate and an upper bound on the false positive rate for each stage and for the whole system. We train each stage in turn to achieve the desired bounds, and add a new stage if the bound for the whole system has not yet been reached.
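The numbers above follow directly from multiplying the per-stage rates, since a window is accepted only if it passes every stage; a tiny arithmetic check:

```python
# Per-stage performance of a weak but fast classifier.
tpr_stage = 0.99   # true positive rate per stage
fpr_stage = 0.20   # false positive rate per stage
stages = 5

# A window is accepted only if it passes every stage, so the rates multiply.
tpr_cascade = tpr_stage ** stages   # 0.99**5 ~= 0.951  -> about 95% detection
fpr_cascade = fpr_stage ** stages   # 0.20**5  = 0.00032 -> about 0.032% false positives
print(f"cascade TPR = {tpr_cascade:.3f}, cascade FPR = {fpr_cascade:.5f}")
```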
In the face detection phase, several window scales and locations are chosen to extract possible face patches from the image, and we test each patch with the trained cascade; the patches which pass all the stages are labeled as faces. Many later works are based on this framework, such as [32].
Figure 19: The Haar features and their ability to capture the significant contrast features of the human face. [23]
Figure 20: The cascade procedure during the training phase. At each stage, only a portion of the patches are denoted as faces and passed to the following stage for further verification; the patches denoted as non-faces are rejected immediately.
Part-based methods
With the development of the graphical model framework [33] and point-of-interest detectors such as the difference-of-Gaussian detector [34] (used in SIFT) and the Hessian-affine detector [35], part-based methods have recently attracted more attention. We would like to introduce two outstanding examples: one is based on a generative model and one is based on the support vector machine (SVM) classifier.
R. Fergus et al. [36] proposed to learn and recognize object models from unlabeled and unsegmented cluttered scenes in a scale-invariant manner. Objects are modeled as flexible constellations of parts, and only the topic of each image needs to be given (for example, cars, people, or motorbikes). The object model is built on a probabilistic representation, and each object is denoted by the parts detected by an entropy-based feature detector. Aspects including the appearance, scale, shape, and occlusion of each part and of the object are considered and modeled by the probabilistic representation to deal with possible object variations.
Given an image, the entropy-based feature detector is first applied to detect the top P parts (including locations and scales) with the largest entropies, and then these parts are fed into the probabilistic model for object recognition. The probabilistic object model is composed of N interesting parts (N < P) and is denoted as follows:

p(X, S, A | θ) = Σ_{h ∈ H} p(X, S, A, h | θ),

where X denotes the part locations, S denotes the scales, and A denotes the appearances. The indexing variable h is a hypothesis that determines the attribute of each detected part (whether or not it belongs to the N interesting parts of the object) and the possible occlusion of each interesting part (if no detected part is assigned to an interesting part, that interesting part is occluded in the image). Note that P regions are detected in the image, while we assume that only N parts are characteristic of the object and the other parts belong to the background.
Figure 21: An example of face detection based on the generative model framework. (Upper left) The averaged location and the location variance of each interesting part of the face. (Upper right) Sample appearances of the six interesting parts and the background part (the bottom row). (Bottom) Examples of faces and the corresponding interesting parts. [36]
A component-based approach has several potential advantages. First, it can exploit characteristic object parts and their geometrical relation. Second, the patterns of some object parts might vary less under pose changes than the pattern of the whole object. Third, a component-based approach might be more robust against partial occlusions than a global approach. The two main problems of a component-based approach are how to choose the set of discriminatory object parts and how to model their geometrical configuration.
Fig 22: In (a), the system overview of the component-based classifier using four components is presented. On the first level, windows of the size of the components (solid line boxes) are shifted over the face image and classified by the component classifiers. On the second level, the maximum outputs of the component classifiers within predefined search regions (dotted line boxes) and the positions of the components are fed into the geometrical configuration classifier. In (b), the fourteen learned components are denoted by the black boxes with the corresponding centers marked by crosses. [37]
Figure 24: The results after (a) the eyes map and (b) the mouth map.
Figure 26: The facial feature pair verification process. In (a) we show a positive pair, and (b)-(c) are two negative pairs.
5. Face Feature Extraction and Face Recognition
Assuming that the face of a person has been located, segmented from the image, and aligned into a face patch, in this section we talk about how to extract useful and compact features from face patches. The reason to combine the feature extraction and face recognition steps is that sometimes the type of classifier corresponds to the specific features adopted. In this section, we separate the feature extraction techniques into four categories: holistic-based methods, feature-based methods, template-based methods, and part-based methods. The first three categories are frequently discussed in the literature, while the fourth category is a newer idea used in recent computer vision and object recognition.
Holistic-based methods
Holistic-based methods are also called appearance-based methods, which means we use the whole information of a face patch and perform some transformation on this patch to get a compact representation for recognition. To distinguish them more clearly from feature-based methods, we can say that feature-based methods directly extract information from some detected fiducial points (such as the eyes, nose, and lips; these fiducial points are usually determined from domain knowledge) and discard the other information, while appearance-based methods perform transformations on the whole patch to obtain the feature vectors, and these transformation bases are usually obtained from statistics.
During the past twenty years, holistic-based methods have attracted the most attention compared with the other categories, so we will focus more on this category. In the following subsections, we talk about the famous eigenface [39] (based on the PCA), the fisherface (based on the LDA), and some other transformation bases such as independent component analysis (ICA), nonlinear dimension reduction techniques, and the over-complete database (based on compressive sensing). More interesting techniques can be found in [42][43].
The idea of the eigenface is rather simple. Given a face data set (say N faces), we first scale each face patch to a constant size (for example, 100x100) and transform each patch into a vector representation (a 100-by-100 matrix becomes a 10000-by-1 vector). Based on these N D-dimensional vectors (D = 10000 in this case), we can apply principal component analysis (PCA) [17][18] to obtain suitable bases (each is a D-dimensional vector) for dimension reduction. Assume we choose M projection bases;
projecting each face vector onto these bases gives a new M-dimensional coefficient vector. This process achieves our goal of dimension reduction.
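A compact sketch of this computation follows, using an SVD of the centered data so the D-by-D covariance matrix is never formed. The random "faces" and the choice N = 10, M = 9 mirror the toy setup of fig. 27; they are illustrative only.

```python
import numpy as np

def compute_eigenfaces(faces, M):
    """faces: N x D matrix, one flattened face patch per row (e.g. D = 100*100 = 10000).
    Returns the mean face and the M leading eigenfaces (principal components)."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # SVD of the centered data gives the PCA bases without forming the covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:M]            # each row of vt is a D-dimensional eigenface

def project(face, mean_face, eigenfaces):
    """Dimension reduction: a D-dimensional face becomes an M-dimensional coefficient vector."""
    return eigenfaces @ (face - mean_face)

# Toy usage with random "faces" (N = 10, D = 10000) and M = 9 bases, as in fig. 27.
rng = np.random.default_rng(0)
faces = rng.random((10, 100 * 100))
mean_face, eig = compute_eigenfaces(faces, M=9)
print(project(faces[0], mean_face, eig).shape)   # (9,)
```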
Figure 27: (a) We generate a database with only 10 faces, each face patch of size 100-by-100. Through the computation of the PCA bases, we get (b) a mean face and (c) 9 eigenfaces (the eigenfaces are ordered from highest eigenvalue to lowest, from left to right and from top to bottom).
The eigenfaces have the advantage of dimension reduction as well as preserving the most energy and the largest variation after projection, but they do not exploit the face labels included in the database. Besides, several studies have shown that illumination differences result in serious appearance variations, which means that the first several eigenfaces may capture the illumination variation of faces rather than the variations in face structure, while some detailed structural differences may have small eigenvalues, so their corresponding eigenfaces are probably dropped when only the M largest eigenvectors are preserved.
Instead of calculating the projection bases from the whole training data without labels (without human identities, which corresponds to unsupervised learning), Belhumeur et al. [40] proposed to use linear discriminant analysis (LDA) [17] to find the bases. The objective of applying the LDA is to perform dimension reduction for discrimination purposes: to find projection bases that minimize the intra-class variation while preserving the inter-class variation.
Figure 28: The reconstruction process based on the eigenface representation. (a) The original face in the database can be reconstructed from its eigenface representation and the set of projection vectors (lossless if we use all PCA projection vectors; otherwise the reconstruction is lossy). (b) The reconstruction process with different numbers of bases used: from left to right, and from top to bottom, we add one projection vector at a time with its corresponding projection value. The bottom-right picture, using 9 projection vectors and the mean vector, is the perfect reconstruction result.
They did not explicitly model the sources of intra-class variation; instead, the projection is chosen in a manner which discounts those regions of the face with large intra-class deviation. Fig. 29 shows the difference between applying the PCA and the LDA to the same labeled training data. The circled data points indicate samples from class 1 and the crossed ones samples from class 2. As you can see, the PCA basis preserves the largest variation after projection, but the projection result is not suitable for recognition. On the other hand, the LDA finds the best projection basis for discrimination purposes; although it does not preserve as much energy as the PCA does, the projection result clearly separates the two classes with just a simple threshold. Fig. 30 also depicts the importance of choosing suitable projection bases.
In the two-class problem, the LDA is also called the Fisher linear discriminant. Given a training set with two classes, where X_i denotes the set of samples of class i and m_i its mean vector, we look for a projection vector w that best separates the two classes after projection, as formalized below.
Figure 29: The comparison between Fisher linear discriminant analysis and principal component analysis. [40]
Figure 30: Using different bases for projection. With suitable bases, the dimension-reduced result can preserve the discriminative nature of the original data. [17]
The between-class and within-class scatter matrices are defined as

S_B = (m_1 - m_2)(m_1 - m_2)^T,    S_W = Σ_{i=1,2} Σ_{x ∈ X_i} (x - m_i)(x - m_i)^T,

and the Fisher criterion is

J(w) = (w^T S_B w) / (w^T S_W w),

where w is the projection vector to be calculated and X_i is the training set of class i. This criterion J(w) is well known in mathematical physics as the generalized Rayleigh quotient, and a vector w that maximizes J(w) must satisfy the generalized eigenvalue problem

S_B w = λ S_W w.

If S_W is nonsingular, the solution is w = S_W^{-1}(m_1 - m_2); if S_W is singular, we can first reduce the dimensionality with the PCA or find the bases from the nullspace of S_W.
To solve the multiclass problem, given a training set with C classes, the intra-class variation (within-class scatter) and inter-class variation (between-class scatter) can be rewritten as

S_W = Σ_{i=1}^{C} Σ_{x ∈ X_i} (x - m_i)(x - m_i)^T,    S_B = Σ_{i=1}^{C} N_i (m_i - m)(m_i - m)^T,

where m is the mean vector of all the samples in the training set and N_i is the number of samples in class i. W is the projection matrix W = [w_1, ..., w_{C-1}], and each w_k should satisfy S_B w_k = λ_k S_W w_k. In order to deal with the singularity problem of S_W, Belhumeur et al. used the PCA method described below, where all the samples in the training data are first projected to a lower-dimensional subspace by the PCA before the LDA is applied.
The within-class scatter matrix S_W has rank at most N - C, where N is the size of the training set, so we need to reduce the dimensionality of the samples down to N - C or less before applying the LDA. In their experimental results, the LDA bases outperform the PCA bases, especially in cases with illumination changes. The LDA can also be applied to other recognition problems. For example, fig. 31 shows the projection basis for the with-glasses / without-glasses case; as you can see, this basis captures the glasses shape around the eyes rather than the facial differences of the people in the training set.
Figure 31: The recognition case of human faces with or without glasses. (a) An example of faces with glasses. (b) The projection basis obtained by the LDA. [40]
Following the projection and basis-finding ideas above, the next three subsections use different criteria to find the bases or decompositions of the training set. Because these criteria involve many mathematical and statistical theorems and background, here we briefly describe the ideas behind them without going into the details of the equations and theorems.
The PCA exploits the second-order statistical property of the training set (the covariance matrix) and yields projection bases that make the projected samples uncorrelated with each other. The second-order property only depends on the pairwise relationships between pixels, while some important information for face recognition may be contained in the higher-order relationships among pixels. Independent component analysis (ICA) [18][31] is a generalization of the PCA which is sensitive to these higher-order statistics. Fig. 32 shows the difference between the PCA bases and the ICA bases.
In the work of Bartlett et al. [44], the ICA bases are derived from the principle of optimal information transfer through sigmoidal neurons. In addition, they proposed two architectures for the dimension-reduction decomposition: one treats the images as random variables and the pixels as outcomes, and the other treats the
pixels as random variables and the images as outcomes. Architecture I, depicted in fig. 33, finds n "source" images of the pixels, each with an appearance as shown in the column U illustrated in fig. 34, and a human face can be decomposed into a weight vector as in fig. 35. This architecture finds a set of statistically independent basis images, each of which captures features of human faces such as the eyes, eyebrows, and mouth.
Architecture II finds basis images with appearances similar to those produced by the PCA, as shown in fig. 36, and has the decomposition shown in fig. 37. This architecture uses the ICA to find a representation in which the coefficients used to code images are statistically independent. Generally speaking, the first architecture finds spatially local basis images for the faces, while the second architecture produces a factorial face code. In their experimental results, both representations were superior to the PCA-based representation for recognizing faces across days and changes in expression.
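The sketch below shows how an off-the-shelf ICA implementation can be applied to a face matrix in the spirit of Architecture I; the mapping onto Bartlett's two architectures is only approximate, and the random data stands in for real face patches.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Toy data: N flattened face patches as rows.
rng = np.random.default_rng(0)
faces = rng.random((50, 32 * 32))

# Architecture-I flavour: treat images as observations and look for statistically
# independent basis images; the rows of components_ play the role of the "source" images.
ica = FastICA(n_components=10, random_state=0, max_iter=500)
codes = ica.fit_transform(faces)       # 50 x 10 coefficient vectors (one per face)
basis_images = ica.components_         # 10 x 1024 independent basis images
print(codes.shape, basis_images.shape)
```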
Figure 33: Two architectures for performing the ICA on images. (a) Architecture I for finding statistically independent basis images; performing source separation on the face images produces IC images in the rows of U. (b) The gray values at pixel location i are plotted for each face image; ICA in Architecture I finds weight vectors in the directions of statistical dependencies among the pixel locations. (c) Architecture II for finding a factorial code; performing source separation on the pixels produces a factorial code in the columns of the output matrix U. (d) Each face image is plotted according to the gray values taken on at each pixel location; the ICA in Architecture II finds weight vectors in the directions of statistical dependencies among the face images. [44]
Figure 35: The independent basis image representation consists of the coefficients b for the linear combination of independent basis images u that comprise each face image x. [44]
Figure 36: Image synthesis model for Architecture II. Each image in the dataset is considered to be a linear combination of underlying basis images in the matrix A. The basis images are each associated with a set of independent "causes," given by a vector of coefficients s. The basis images are estimated from the learned ICA weight matrix. [44]
Figure 37: The factorial code representation consists of the independent coefficients u for the linear combination of basis images in A that comprise each face image x. [44]
Instead of using linear projection to obtain the representation vector of each face image, some researchers claim that nonlinear projection may yield better representations.
Wright et al. [48] proposed to use the sparse signal representation for face recognition. They used an over-complete database as the projection basis and applied an L1-minimization algorithm to find the representation vector for a human face. They claimed that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is crucial, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. Fig. 40 shows the overview of their algorithm.
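A minimal sketch of sparse-representation-based classification follows; it uses an off-the-shelf Lasso solver as a stand-in for the L1-minimization in [48] and random data in place of real face features, so it illustrates the classification rule rather than reproducing the original algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_identify(y, A, labels):
    """Sparse-representation sketch: L1-regularized regression of the test face y against the
    over-complete dictionary A of training faces (columns), then pick the class whose
    coefficients best reconstruct y."""
    lasso = Lasso(alpha=0.01, max_iter=10000)
    lasso.fit(A, y)
    x = lasso.coef_
    best_class, best_err = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)       # keep only class-c coefficients
        err = np.linalg.norm(y - A @ xc)
        if err < best_err:
            best_class, best_err = c, err
    return best_class

# Toy usage: 40 training faces (columns) of 4 people, 256-dimensional features.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 40))
labels = np.repeat(np.arange(4), 10)
y = A[:, 7] + 0.01 * rng.standard_normal(256)    # a noisy copy of a class-0 face
print(src_identify(y, A, labels))                # expected: 0
```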
Fig. 40: The sparse representation technique represents a test image (left), which is (a) potentially occluded or corrupted. Red (darker) coefficients correspond to training images of the correct individual. [48]
Feature-based methods
We have briefly compared the differences between holistic-based methods and feature-based methods based on the information they use from a given face patch; from another point of view, we can say that holistic-based methods rely more on
statistical learning and analysis, while feature-based methods exploit more ideas from image processing, computer vision, and human domain knowledge. In this section, we discuss two outstanding features for face recognition: the Gabor wavelet feature and the local binary pattern.
The application of Gabor wavelets to face recognition was pioneered by the work of Lades et al. [49]. In their work, the elastic graph matching framework is used for finding feature points, building the face model, and performing distance measurement, while the Gabor wavelets are used to extract local features at these feature points; the set of complex Gabor wavelet coefficients at each point is called a jet. Graph-matching-based methods normally require two stages to build the graph for a face image I and compute its similarity with a model graph. During the first stage, the model graph is shifted within the input image to find the optimal global offset while keeping its shape rigid. Then, in the second stage, each vertex is shifted within a topological constraint to compensate for the local distortions due to rotations in depth or expression variations. It is actually the deformation of the vertices that makes the graph matching procedure elastic. To carry out these two stages, a cost function S(·) must be defined, and the two stages terminate when this function reaches its minimum value.
Lades et al. [49] used a simple rectangular graph to model faces in the database, where each vertex has no direct semantic meaning on the face. In the database building stage, the deformation process mentioned above is not included; the rectangular graph is manually placed on each face and the features are extracted at the individual vertices. When a new face I comes in, the distances between it and all the faces in the database have to be computed, which means that if there are N face models in the database in total, we have to build N graphs for I, one based on each face model. This matching process is computationally very expensive, especially for large databases. Fig. 41 shows an example of a model graph and a deformed graph based on it, and the cost function is defined as

S(G^I, G^M) = Σ_e (Δx_e^I - Δx_e^M)² - λ Σ_n S_J(J_n^I, J_n^M),

where λ determines the relative importance of the jet similarity term and the topography term, Δx_e is the distance vector of the labeled edge e between two vertices, J_n is the set of jets at vertex n, and S_J is the distance measure between two jets, based on the magnitudes of the Gabor wavelet coefficients.
In [50], object-adaptive graphs are employed to model faces in the database, which means the vertices of a graph refer to specific facial landmarks, enhancing the distortion-tolerant ability (see fig. 42). The distance measure function here not only counts on the magnitude information but also takes the phase information of the feature jets into account. The most important improvement is the use of the face bunch graph (FBG), which is composed of several face models to cover a wide range of possible variations in the appearance of faces, such as differently shaped eyes, mouths, or noses. A bunch is the set of jets taken from the same vertex (the same landmark) across different face models, and fig. 43 shows the FBG structure. The cost function is redefined as

S(G^I, B) = (1/N) Σ_n max_m S_φ(J_n^I, J_n^{B_m}) - (λ/E) Σ_e (Δx_e^I - Δx_e^B)² / (Δx_e^B)²,

where B is the FBG representation, N and E are the total numbers of vertices and edges in the FBG, B_m denotes the m-th face model of B, and S_φ is the newly defined distance measure function which takes the phase of the jets into account. To build the database, an FBG is first generated, and models for individual faces are then generated by the elastic graph matching procedure based on the FBG. When a new face comes in, the same elastic graph matching procedure based on the FBG is executed to generate a new face model, and this model can be directly compared with the face models in the database without re-modeling. The FBG serves as the general representation of faces and reduces the computation needed for face modeling.
Besides these two symbolic examples of the elastic graph matching framework, a number of variant versions have been proposed in the literature, and readers can find a brief introduction in [51].
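The sketch below extracts a Gabor "jet" at a fiducial point and compares two jets with a magnitude-based similarity, in the spirit of the methods above. The kernel parameters (sizes, wavelengths, sigma equal to the wavelength) are simplifying assumptions, not the settings of [49] or [50].

```python
import cv2
import numpy as np

def gabor_jet(gray, point, n_orientations=8, n_scales=5):
    """Jet at a fiducial point: magnitudes of a bank of Gabor filter responses
    over several orientations and scales (illustrative parameters)."""
    x, y = point
    jet = []
    for s in range(n_scales):
        lambd = 4.0 * (2 ** (s / 2.0))                 # wavelength grows with scale
        for o in range(n_orientations):
            theta = np.pi * o / n_orientations
            kernel = cv2.getGaborKernel((31, 31), lambd, theta, lambd, 1.0, 0)
            response = cv2.filter2D(gray.astype(np.float32), -1, kernel)
            jet.append(abs(response[y, x]))
    return np.array(jet)

def jet_similarity(j1, j2):
    """Magnitude-based jet similarity (normalized dot product), as used by Lades et al."""
    return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2) + 1e-8))

gray = np.random.default_rng(0).integers(0, 256, (128, 128)).astype(np.uint8)  # stand-in image
print(jet_similarity(gabor_jet(gray, (40, 50)), gabor_jet(gray, (42, 50))))
```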
Figure 41: The graphical models of face images. The model graph (a) is built to represent a face stored in the database, and features are directly extracted at the vertices of the rectangular graph. When a new face comes in and we want to recognize this person, a deformed graph (b) is generated based on the two-stage process. [49]
Figure 43: The face bunch graph serves as the general representation of faces. As shown in this figure, there are nine vertices in the graph, and each of them contains a bunch of jets to cover the variations in facial appearance. The edges are represented by the averaged distance vectors calculated from all the face models used to build the FBG. [50]
Binary features
Besides applying the Gabor wavelet features with elastic-graph-matching-based methods, Ahonen et al. [52] proposed to extract local binary pattern (LBP) histograms with spatial information as the face feature, and to use a nearest neighbor classifier with the Chi-square metric as the dissimilarity measure.
The LBP operator labels each pixel by thresholding the P sampling points on a circle of radius R around it against the center value and reading the result as a binary number; the notation LBP_{P,R}^{u2} denotes the operator in a (P, R) neighborhood, where the superscript u2 stands for using only uniform patterns and labeling all remaining patterns with a single label. A histogram of the labeled image f_l(x, y) can be defined as

H_i = Σ_{x,y} I{f_l(x, y) = i},   i = 0, ..., n-1,

in which n is the number of different labels produced by the LBP operator and I{A} equals 1 if A is true and 0 otherwise. This histogram contains information about the distribution of local micro-patterns, such as edges, spots, and flat areas, over the whole image. For efficient face representation, one should also retain spatial information. For this purpose, the image is divided into several regions R_0, ..., R_{m-1}, as shown in fig. 45, and the spatially enhanced histogram is defined as

H_{i,j} = Σ_{x,y} I{f_l(x, y) = i} I{(x, y) ∈ R_j}.
In the face recognition phase, the nearest neighbor classifier is adopted to compare the distance between the input face and the database. Several metrics can be applied for the distance calculation, such as the histogram intersection, the log-likelihood statistic, and the Chi-square statistic

χ²(S, M) = Σ_i (S_i - M_i)² / (S_i + M_i).

When the image has been divided into regions, it can be expected that some of the regions contain more useful information than others for distinguishing between people. For example, the eyes seem to be an important cue in human face recognition. To take advantage of this, a weight can be set for each region based on the importance of the information it contains. For example, the weighted Chi-square statistic becomes

χ²_w(S, M) = Σ_{i,j} w_j (S_{i,j} - M_{i,j})² / (S_{i,j} + M_{i,j}),

in which w_j is the weight for region j, and S and M denote the two feature vectors (histograms) to be compared. Fig. 45 also shows the weights applied in their experimental results. Later on, several recent works used this feature for face recognition, such as [55].
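The following sketch builds a spatially enhanced LBP histogram and compares two such histograms with the Chi-square statistic. The grid size and (P, R) values are illustrative, the region weights are omitted, and random images stand in for real face patches.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def spatial_lbp_histogram(gray, grid=(7, 7), P=8, R=2):
    """Uniform LBP codes are histogrammed inside each grid region and the per-region
    histograms are concatenated, as in the spatially enhanced histogram above."""
    codes = local_binary_pattern(gray, P, R, method="uniform")   # labels 0 .. P+1
    n_labels = P + 2
    rows = np.array_split(np.arange(gray.shape[0]), grid[0])
    cols = np.array_split(np.arange(gray.shape[1]), grid[1])
    hists = []
    for r in rows:
        for c in cols:
            region = codes[np.ix_(r, c)]
            h, _ = np.histogram(region, bins=np.arange(n_labels + 1))
            hists.append(h / max(h.sum(), 1))
    return np.concatenate(hists)

def chi_square(s, m, eps=1e-10):
    """Chi-square dissimilarity between two concatenated histograms."""
    return float(np.sum((s - m) ** 2 / (s + m + eps)))

gray = np.random.default_rng(0).integers(0, 256, (70, 70)).astype(np.uint8)
print(chi_square(spatial_lbp_histogram(gray), spatial_lbp_histogram(gray.T)))
```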
Figure 44: The circular (8,2) neighborhood. The pixel values are bilinearly interpolated whenever a sampling point does not fall in the center of a pixel.
Template-based methods
Recognition systems based on the two categories of methods introduced above usually perform feature extraction for all face images stored in the database, and then train classifiers or define some metric to compute the similarity of a test face patch with each person class. To overcome variations of faces, these methods enlarge their databases to accommodate many more samples and expect that the trained transformation bases or the defined distance metric can attenuate the intra-class variation while maintaining the inter-class variation. Traditional template matching is much like using a distance metric for face recognition: a set of representative templates is selected for each class (person), the similarity measurement is computed between a test image and each class, and the class with the highest similarity score is selected as the correct match. Recently, deformable template techniques have been proposed [31]. In contrast to implicitly modeling intra-class variations (for example, by increasing the database), deformable template methods explicitly model possible variations of human faces from the training data and are expected to deal with much more severe variations. In this section, we introduce the face recognition technique based on the ASM and the AAM described in Section 4.3.1.
From the introduction in Section 4.3.1, we know that during the face detection process the AAM generates a parameter vector c which can synthesize the face appearance that best fits the face shown in the image. Then, if we have a well-chosen database which contains several significant views, poses, and expressions of each person, we can obtain a set of AAM parameter vectors to represent each identity. To compare the input face with the database, Edwards et al. [56] proposed to use the Mahalanobis distance measure for each class and derive a class-specific discriminant for identification.
Part-based methods
Following the ideas presented in Section 4.5, in recent years several studies have exploited information from characteristic facial parts, or from parts that are robust against pose or illumination variation, for face recognition. To distinguish them from the feature-based category: part-based methods detect significant parts of the face image and combine the part appearances with machine learning tools for recognition, while feature-based methods extract features from facial feature points or from the whole face and compare these features to achieve recognition. In this subsection, we introduce two techniques: one is an extension of the system described in Section 4.5.2, and the other is based on SIFT (scale-invariant feature transform) features extracted from the face image.
Based on the face detection algorithm described in Section 4.5.2, Heisele et al. [57] compared the performance of component-based face recognition against global approaches. In their work, they built three different face recognition structures based on the SVM classifier: a component-based algorithm based on the output of the component-based face detection algorithm, a global algorithm directly fed with the detected face appearance, and finally a global approach which takes the view variation into account.
Given the detected face patches, the two global approaches differ only in whether the view variation of the detected face is considered. The algorithm without this consideration directly builds one SVM classifier per person based on all possible views, while the one with this consideration first divides the training images of a person into several view-specific clusters and then trains one SVM classifier for each of them. The SVM classifier was originally developed for binary classification; to extend it to multi-class tasks, the one-versus-all and pairwise approaches are described in [58]. The view-specific clustering procedure is depicted in fig. 46.
The component-based SVM classifier is cascaded behind the component-based face detection algorithm. After a face is detected in the image, they choose 10 of the 14 detected parts, normalize them in size, and combine their gray values into a single feature vector. Then a one-versus-all multi-class structure with a linear SVM for each person is trained for face recognition. In fig. 47, we show the face detection procedure, and in fig. 48, we present the detected face and the corresponding 10 components fed into the face recognition algorithm.
In the experimental results, the component system outperforms the global systems (for recognition rates larger than 60%) because the information fed into the classifiers captures more specific facial features. In addition, the clustering leads to a significant improvement of the global method, because clustering generates view-specific clusters that have smaller intra-class variations than the whole set of images of a person. Based on these results, they claimed that a combination of weak classifiers trained on properly chosen subsets of the data can outperform a single, more powerful classifier trained on the whole data.
The scale-invariant feature transform (SIFT) proposed by Lowe et al. [34] has been widely and successfully applied to object detection and recognition. In the work of Luo et al. [59], they proposed to use person-specific SIFT features and a simple non-statistical matching strategy, combining local and global similarity on keypoint clusters, to solve the face recognition problem.
The SIFT is composed of two functions: the interest-point detector and the region descriptor. Lowe et al. used the difference-of-Gaussian (DoG) algorithm to detect interest points in a scale-invariant fashion, and generated the descriptor from the orientation and gradient information calculated in a scale-specific region around each detected point. Fig. 49 shows the SIFT features extracted on sample faces and some corresponding matching points in two face images. In each face image, the number and positions of the features selected by the SIFT point detector are different, so these features are person-specific. In order to compare only the feature pairs with similar physical meaning between the faces in the database and the input face, the same number of sub-regions is constructed in each face image; the similarity between each pair of sub-regions is computed based on the features inside them, and finally the average similarity value is obtained. They proposed to use a K-means clustering scheme to construct the sub-regions automatically based on the locations of the features
in the training samples.
After constructing the sub-regions on the face images, when testing a new image, all the SIFT features extracted from the image are assigned to the corresponding sub-regions based on their locations. The construction of five sub-regions is illustrated in fig. 50, and it can be seen that the centers of the regions (denoted by crosses) correspond to the two eyes, the nose, and the two mouth corners, which agrees with face recognition experience, as these areas are the most discriminative parts of face images. Based on the constructed sub-regions, a local-and-global combined matching strategy is used for face recognition. The details of this matching scheme can be found in [59].
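The sketch below shows the two ingredients just described: extracting SIFT keypoints from a face image and assigning them to sub-regions. The five region centres are fixed illustrative positions; in the cited work they would come from K-means clustering of training keypoint locations, and the random image is only a stand-in.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
face = np.random.default_rng(0).integers(0, 256, (128, 128)).astype(np.uint8)  # stand-in face image

# Person-specific features: keypoints and descriptors differ from face to face.
keypoints, descriptors = sift.detectAndCompute(face, None)
locations = np.array([kp.pt for kp in keypoints]) if keypoints else np.empty((0, 2))

# Assign each keypoint to the nearest of five sub-region centres (illustrative positions
# roughly at the eyes, nose, and mouth corners).
centres = np.array([[40, 45], [88, 45], [64, 75], [45, 100], [83, 100]], dtype=float)
if len(locations):
    assignment = np.argmin(np.linalg.norm(locations[:, None, :] - centres[None], axis=2), axis=1)
    print(np.bincount(assignment, minlength=len(centres)))   # keypoints per sub-region
```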
Figure 49: SIFT features on sample images and feature matches between faces with expression variation. [59]
6. Comparison and Conclusion
In this section, we give summaries of the face detection and face recognition techniques of the past twenty years, as well as of the popular face data sets used for experiments and their characteristics.
Table 7: Summary of face detection techniques
Method — Category — Main idea
Face detection based on random labeled graph matching — Feature-based — Combining simple features with statistical learning
Component-based with SVM [37] — Part-based — Learning global and local SVMs for detection
Table 9: The summary of popular databases used for detection and recognition tasks. The (*) marks the most frequently used databases. Image variations are indicated by (i) illumination, (p) pose, (e) expression, (o) indoor/outdoor conditions, and (t) time delay.
References
[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proc. IEEE, vol. 83, no. 5, pp. 705-740, 1995.
[2] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: a literature survey," Technical Report CAR-TR-948, Center for Automation Research, University of Maryland, 2002.
[3] A. F. Abate, M. Nappi, D. Riccio, and G. Sabatino, "2D and 3D face recognition: a survey," Pattern Recognition Letters, vol. 28, no. 14, pp. 1885-1906, 2007.
[4] M. Grgic and K. Delac, "Face recognition homepage." [Online]. Available: http://www.face-rec.org/general-info. [Accessed May 27, 2010].
[5] A. K. Jain, R. P. W. Duin, and J. C. Mao, "Statistical pattern recognition: a review," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.
[6] [Online]. Available: http://www.michaelbach.de/ot/fcs_thompson-thatcher/index.html. [Accessed May 27, 2010].
[7] M. H. Yang, D. J. Kriegman, and N. Ahuja, "Detecting faces in images: a survey," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 1, pp. 34-58, 2002.