Conf Paper 1 - Vivek
Techniques

Vivek B. Sharma
Dept. of Computer Science and Engineering
Delhi Technological University
Delhi, India
vivekbsharma 2k20cse27@dtu.ac.in

Anil Singh Parihar
Dept. of Computer Science and Engineering
Delhi Technological University
Delhi, India
parihar.anil@gmail.com
Abstract—The human face is a distinct characteristic of the human body. This uniqueness varies from person to person, since each human face is distinct in several ways, and hence it is a unique biometric of an individual. Face detection is a subfield of artificial intelligence that detects faces in images, video recordings, and so on. Detecting human faces is important in a variety of applications such as facial animation, face recognition, human image database management, human-computer interfaces, face verification and validation, and as a pre-processing step for many high-end and complex computer vision tasks, for which a number of state-of-the-art detection methods have been developed. However, the complexity of the human face and the changes caused by various influences such as lighting, pose, and the angle of the photograph also affect the identification accuracy of the algorithms, which is where advances in deep learning come in handy. With the progress of deep learning, we are now able to train artificial neural networks for face identification and recognition, which in some cases outperform conventional techniques in terms of accuracy and performance. The primary goal of this study is to analyse contemporary face recognition algorithms.

Index Terms—face detection, feature extraction, image processing, neural networks, face recognition, Viola-Jones, Haar Feature Based Detection (HFFD), Principal Component Analysis (PCA), Local Binary Pattern (LBP), Convolutional Neural Networks (CNNs), MTCNN, YOLO, BlazeFace, SSD (Single Shot Detector)

I. INTRODUCTION

Humans may determine identity based on appearance. The face is one of the most often utilised biometrics for recognising people, alongside fingerprints and iris detection. Faces not only reveal people's identities, but also information about human behaviour such as emotions. Faces are one of the most significant subjects in computer vision and pattern recognition, and they play a vital role in a variety of sectors such as user authentication, homeland security, smart home access security, detecting criminals, and identifying humans in small-scale applications. Face recognition is divided into four steps: picture pre-processing, feature extraction, classification, and recognition. Any face recognition strategy employed for basic services such as face recognition, face tracking, face verification, and facial pattern detection may be thought of as a building component of a multistage computer vision system. The human face is a dynamic entity with a great degree of diversity in appearance, making face recognition a tough task in computer vision.

II. REAL WORLD APPLICATIONS OF FACE RECOGNITION

Face recognition has been and continues to be one of the most significant study fields in computer vision, capturing the interest of both academics and industrialists.
• Security access control: Face recognition is widely used in locations where access is restricted to certain individuals. Face-recognition technology is utilised to gain access to such facilities. Trueface.ai is developing Chui, a facial-recognition doorbell, using deep learning algorithms for fraud detection and distinguishing a real face from an image.
• Surveillance systems: A considerable number of CCTV cameras are now installed in strategic areas to aid in the identification of offenders. A person's face may be recognised in recordings acquired by CCTV cameras. Theft has fallen by over 30 percent at some superstores, including Walmart, thanks to technology such as FaceFirst, which can recognise a customer's face from a distance of 100 feet.
• General identity verification: Today, many vital personal papers, such as national identification cards and passports, employ recorded face photos of the individual to confirm their identity.
• Image database investigations: In the event of missing persons, images may be matched against existing databases to determine an individual's identity.
• Mobile and laptop applications: Face-recognition technology is increasingly being utilised in mobile phones and computers to safeguard an individual's data, in lieu of PIN numbers and passwords.
• Forensic science: Facial recognition is an important technique and topic of research in the area of forensic science. When no automation is available, forensic scientists must do this vital activity manually. This may be used for both law enforcement and comparative purposes.
• Miscellaneous: This includes identifying persons in hospitals, police departments, or courts of law as a source of evidence.

III. METHODOLOGY

Extraction of human facial characteristics such as the eyes, nose, lips, eye colour, beard, moustache, hair, skin colour, and
so on is part of human face analysis. Face recognition, emotion or expression recognition, facial image database management, human-computer interaction, image retrieval, and many more applications benefit from the extraction of human face characteristics. The general actions that will be taken throughout the facial recognition process are summarised below. The procedures shown in Figure 3 are the general processes that every facial feature extraction must follow.
1) Picture acquisition: Before an image can be used as an input for either neural network training or face recognition, it must first be preprocessed into a standard format. A digital picture is produced at this point.
2) Pre-processing of images: This stage is mostly performed to enhance picture quality and involves procedures such as colour conversion and image scaling (for example, when a model accepts a 224x224 image). It may also involve deleting undesired data known as noise. To eliminate noise from a picture, noise reduction methods such as the Gaussian filter, median filter, low pass filter, and high pass filter are available. If the noise is not eliminated from the picture, the model will learn the noise during the training phase, potentially leading to overfitting.
3) Face Detection: Before feature extraction and face recognition can occur, face detection must be performed. The system must determine which region of the picture contains the face object. In order for this to happen, the network is taught how a human face looks and what its main landmarks are (eyes, nose, mouth, ears).
4) Image segmentation: Image segmentation may be useful in extracting features from human face photos. Picture segmentation is the process of dividing an image into distinct portions or areas. Image segmentation may be used in face feature extraction to obtain various feature components such as eye segmentation, lip segmentation, brow segmentation, nose segmentation, and beard and moustache segmentation. Face characteristics may be extracted by performing picture segmentation.
5) Feature Extraction: After face detection, the features of the face may be extracted to start the face recognition process. An image's feature extraction focuses on detecting intrinsic properties or features of an item in the picture. These characteristics may be used to characterise the items. The most important human face features are the eyes, nose, ears, and mouth, and extraction can be accomplished using a variety of techniques, such as calculating the distance between the two eyes, or between the eyes and the nose, in the case of geometry-based techniques, Scale Invariant Feature Transform (SIFT), and Dominant Rotated Local Binary Pattern (DRLBP). Colour-based approaches may be used to determine eye colour, as well as beard and moustache colour.

IV. ACQUISITION OF THE DATA

Data is a very significant resource in today's world; it is necessary for the better and correct operation of any application in today's digital age. Data collecting may occur in a variety of ways. The main device might gather its own data, or a third-party device can collect the data and deliver it to the primary device for analysis. A good example is cloud computing. If all data processing happens in a single place, it has a significant impact on the network, since it constrains the centrally located device, and if the centrally located device fails, the whole system may fail. To circumvent this, the authors of [8] created a way of implementing edge computing. As previously said, whether it is supervised or unsupervised learning, a strong data collection will lead to a higher performance of the system with favourable outcomes. Different writers have utilised various datasets, the most popular of which is Labelled Faces in the Wild (LFW) [8], which has over 13,000 pictures gathered from the internet. One of the publicly accessible datasets for the purpose of face detection is the FDDB dataset [3]. This dataset contains 5171 annotated faces from the Faces in the Wild dataset in 2845 photos. Certain writers employed a mix of two or more datasets to validate their approach. We have utilised the Yale Face Database, which has 165 greyscale photos in GIF format of 15 different people, and the Cambridge Database of Faces, which comprises 10 different photographs of each of 40 different subjects. To improve the efficacy of their approaches, several writers generated their own dataset. The authors of [20] built their own child face dataset since there was no publicly available dataset. They generated a novel dataset called the T-Dataset by leveraging the correlation between the training pictures without employing a conventional image density approach.

V. NEURAL NETWORKS

The human organic nervous system inspired the Artificial Neural Network (ANN). The ANN leverages brain processing to generate algorithms that may be used to model complicated patterns and forecast issues. ANNs have the capacity to learn and model nonlinear and complicated interactions, which is critical since many of the relationships between inputs and outputs in real life are nonlinear and complex. An ANN may infer hidden associations on unseen data after learning from the initial inputs and the relationships between the inputs, allowing the model to generalise and predict on unseen data. The back propagation method is employed in the ANN learning process [28]. There are many different varieties of ANN, such as multi-layered feed-forward neural networks, which include multiple levels such as an input layer, a hidden layer, and an output layer. Layer nodes independently compute data and transfer it to the next layer, up to the output layer. The computation at a layer node is the weighted sum of the input values, and the node output is produced by an activation function such as a threshold function, sigmoid function, or hyperbolic function. The convolutional neural network is a now-traditional neural network structure that has been widely applied in the area of deep learning. It extracts and transforms features using many layers of feature descriptors. In general, the early layers extract the fundamental aspects of the face, whereas the latter layers extract the finer elements of the face. Cascaded neural
networks [7] are a sort of feed-forward network in which a link is made from the input and every previous layer to the next tier, allowing for a non-linear relationship between the input and the output. LeNet, AlexNet, GoogLeNet, and VGG are several well-known, publicly developed, and frequently used ANNs [17]. The artificial neural networks discussed above may be classified into two types: supervised ANN and unsupervised ANN. According to [17], supervised ANN is a machine learning activity in which the model is first trained using input variables and an algorithm is used to learn the mapping function from the input to the output.

VI. FACE DETECTION AND RECOGNITION

Human face detection is a form of technology used to recognise human faces in still or digital photographs, as well as videos. To recognise faces, digital photographs are first transformed into greyscale images, or a coloured image is utilised as input. Images may be of any size or shape, including bigger images, crowded images, individual poses, varying colour and backdrop lighting, and so on. To recognise human faces in photos and movies, several algorithms have been created. Face detection methods are used to determine the size and placement of the face area. Face recognition techniques are developed based on the input photographs, postures, angles, characteristics, and look. In the following parts, I will analyse several face recognition methods and compare their accuracy and performance.

Mobile apps have limited resources in terms of both hardware and software. Keeping these limitations in mind, Google developed a technique based on the MobileNet architecture, built on depthwise separable convolution operations followed by batch normalisation and ReLU nonlinearity, generating feature maps that are provided as input to the softmax layer, which results in labelling the object as face or non-face. The suggested methods in [2] outperformed standard boosting techniques such as Viola-Jones and ResNet (SSD), and were also able to recognise faces with diverse orientations and many faces in a single run.

Facial recognition algorithms with high recognition rates include HOG, SIFT, PCA, and Viola-Jones. However, these algorithms have limits when the input photographs have variable lighting conditions, position, angle of face rotation, and so forth. To circumvent these constraints, researchers have lately begun to experiment with artificial neural networks, in which network models are first trained using a training dataset and then utilised for face recognition. The authors of [8] constructed a convolutional neural network that is a deep unified model for face recognition based on the Faster Region Convolutional Neural Network. The authors employed a Region Proposal Network (RPN) for face recognition. An RPN is a lightweight network that scans pictures with the aid of sliding windows over the anchors and returns the anchors (bounding boxes) that have the highest chance of containing the face. Edge computing has been used to process data at the edges of the network's nodes in order to minimise data latency and boost real-time responsiveness.

In [20], the accuracy and performance of three conventional Convolutional Neural Networks (CNNs) are assessed on a child face dataset, including VGG Face based on two architectures (VGG16 and ResNet50) and MobileNet. The ChildDB is used to build and train the CNN. The CNN's fully connected layers or the K-Nearest Neighbour (KNN) algorithm are used for classification. All three approaches had a respectable performance average of 99 percent accuracy, with MobileNet providing the best accuracy of 99.75 percent and the ability to analyse pictures in a relatively short amount of time.

VII. ANALYSIS OF NEW FACE IDENTIFICATION AND RECOGNITION ALGORITHMS

In machine learning algorithms, there is a trade-off between inference speed and accuracy, much as there is a trade-off between processing time and space in computer science. There are several object identification methods available, each with its own speed and accuracy trade-offs.

In this paper, we have analysed a number of state-of-the-art face recognition algorithms, which are described below.

A. YOLO (You Only Look Once)

YOLO (You Only Look Once) face detection is a cutting-edge deep learning technique for object identification. A deep CNN model is formed by multiple convolutional neural networks. Joseph Redmon came up with the idea. It is a real-time object identification system capable of identifying several items in a single frame. Over time, YOLO has developed into subsequent forms, including YOLOv2, YOLOv3, and YOLOv4. YOLO takes a completely new approach compared to earlier detection methods. It uses a single neural network to process the whole picture. This network separates the picture into regions and forecasts bounding boxes and probabilities for each. The projected probabilities are used to weight these bounding boxes.

1) Advantages: Compared to classifier-based systems, the YOLO model offers significant benefits. It is capable of recognising many things in a single frame. At test time, it examines the whole picture, so its predictions are influenced by the image's global context. It also predicts with a single network assessment, as opposed to R-CNN, which requires thousands for a single picture. This allows it to be 1000x quicker than R-CNN and 100x faster than Fast R-CNN. The YOLO architecture allows for end-to-end training and real-time performance while retaining excellent average accuracy.

2) Disadvantages: The YOLOv3 AP does show a trade-off between speed and accuracy when compared to RetinaNet, because the RetinaNet training period is longer than YOLOv3's. However, with a bigger dataset, the accuracy of recognising objects using YOLOv3 can be made equivalent to that of RetinaNet, making it a suitable alternative for models that can be trained with huge datasets. A prominent example would be traffic detection models, where a large amount of data may be utilised to train the model due to the large number of photos of various cars. On the other hand, YOLOv3 may not be optimal for niche models in situations where huge datasets are difficult to gather.
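To make the grid-based prediction scheme concrete, here is a minimal illustrative sketch of how per-cell predictions can be decoded into scored bounding boxes. This is not the actual YOLO implementation; the 3x3 grid, the single-box-per-cell layout, and the confidence threshold are simplifying assumptions chosen for illustration.

```python
import numpy as np

def decode_grid(preds, conf_threshold=0.5):
    """Decode a toy YOLO-style grid output.

    preds: array of shape (S, S, 5), where each cell holds
    [x, y, w, h, confidence] with x, y relative to the cell.
    Returns (x1, y1, x2, y2, conf) tuples in normalised image coordinates.
    """
    S = preds.shape[0]
    boxes = []
    for row in range(S):
        for col in range(S):
            x, y, w, h, conf = preds[row, col]
            if conf < conf_threshold:
                continue  # discard low-confidence cells
            # offsets are relative to the cell; convert to image coordinates
            cx, cy = (col + x) / S, (row + y) / S
            boxes.append((cx - w / 2, cy - h / 2,
                          cx + w / 2, cy + h / 2, conf))
    return boxes

# One confident detection in the centre cell of a 3x3 grid
preds = np.zeros((3, 3, 5))
preds[1, 1] = [0.5, 0.5, 0.4, 0.4, 0.9]
print(decode_grid(preds))  # a single box centred at (0.5, 0.5)
```

A real detector would additionally apply non-maximum suppression to merge overlapping boxes before reporting detections.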
B. MTCNN (Multi-Task Cascaded Convolutional Neural Networks)

The Multi-Task Cascaded Convolutional Neural Network is a recent face identification technique that employs a three-stage neural network detector. First, the picture is scaled many times to identify faces of various sizes. The P-network (Proposal) then scans the pictures to achieve an initial detection. It has a low detection threshold and hence identifies numerous false positives, even after NMS (Non-Maximum Suppression). The suggested areas (which include many false positives) are sent into the second network, the R-network (Refine), which, as the name implies, filters the detections (again using NMS) to get very accurate bounding boxes. The O-network (Output) conducts the final refining of the bounding boxes in the final step. Not only are faces recognised in this manner, but the bounding boxes are also incredibly accurate and exact.

Detecting facial landmarks, such as the eyes, nose, and corners of the mouth, is an optional function of MTCNN. It is nearly free, since the landmarks are utilised for facial detection during the process, which is an added benefit if you need them (e.g. for face alignment).

Although the standard TensorFlow version of MTCNN works fine, the PyTorch implementation is quicker. Using a few methods, it produces roughly 13 FPS on full HD videos and up to 45 FPS on rescaled videos. It is also very simple to set up and utilise. I have also obtained 6–8 FPS on the CPU for full HD, indicating that MTCNN is capable of real-time processing.

1) Advantages: MTCNN is very precise and stable. It recognises faces correctly despite varying sizes, illumination, and rotations. It is a little slower than the Viola-Jones detector, but not by much. Colour information is also used, since CNNs take RGB pictures as input.

2) Disadvantages: Higher run time and lack of robustness.

C. SSD (Single Shot Detector)

SSD is intended for real-time object detection. Faster R-CNN creates bounding boxes using a region proposal network and then uses those boxes to categorise objects. While it is considered cutting-edge in terms of precision, the whole process operates at 7 frames per second, far below the requirements of real-time processing. By removing the requirement for the region proposal network, SSD speeds up the procedure. SSD implements a few enhancements, such as multi-scale features and default boxes, to compensate for the reduction in accuracy. These enhancements enable SSD to match the accuracy of the Faster R-CNN while utilising lower quality pictures, thus increasing the speed. As a result, it reaches real-time processing speed and even outperforms the accuracy of the Faster R-CNN.

1) Advantages: Excellent precision. It can detect in a variety of positions, lighting conditions, and occlusions. Excellent inference speed.

2) Disadvantages: It is inferior to YOLO. Even though the inference speed is fast, it is still insufficient to operate on a CPU, low-end GPU, or mobile devices.

D. BlazeFace

It is, as the name suggests, a lightning-fast face-detection technology produced by Google. It accepts images with a resolution of 128x128. Its inference time is measured in milliseconds. This method has been tuned for use in mobile phone facial recognition. It is a specific face detector model, as opposed to YOLO and SSD, which were designed to identify a wide range of classes. As a result, BlazeFace's deep convolutional neural network architecture is smaller than those of YOLO and SSD. It uses depthwise separable convolution rather than typical convolution layers, resulting in fewer computations.

1) Advantages: Very good inference speed and accurate face detection.

2) Disadvantages: This model is built for identifying facial pictures from a mobile phone camera, and therefore it expects the face to cover the majority of the image. When the face is tiny, it does not operate properly. As a result, it does not work well with CCTV camera photos.

E. FaceBoxes

FaceBoxes is the most recent face recognition method that we employed. It, too, is a deep convolutional neural network with a modest architecture, and it is built for only one class: the human face. Its inference time on the CPU is real-time. For facial detection, its accuracy is equivalent to YOLO's. It can accurately distinguish tiny and big faces in a picture.

1) Advantages: Fast inference speed and good accuracy.

2) Disadvantages: Not fully developed; it is still under evolution.

VIII. ANALYSIS OF THE TOOLS UTILIZED FOR FACE IDENTIFICATION AND RECOGNITION

A. Matlab

Matlab has a large number of libraries and functions for different image processing tasks such as image pre-processing, segmentation, feature extraction, and classification or clustering. It also has capabilities for detecting faces and facial features (eyes, nose, and mouth).

B. OpenCV

OpenCV includes several image processing methods such as noise removal (filters), feature extraction, classification, and clustering. For face recognition, OpenCV 2.4 includes a new FaceRecognizer class.
C. OpenIMAJ

OpenIMAJ is a Java toolkit for image and video processing, machine learning, deep learning approaches, and the extraction and analysis of face features. OpenIMAJ has facial analysis capabilities such as face identification, face reformation, and the Eigenface technique.

IX. CONCLUSION

In this study, I have examined the numerous contemporary face recognition algorithms which have been recently developed and are now accessible, as well as the step-by-step approaches for face detection. This study provides a short discussion of techniques in which the whole process of face recognition is broken into parts. It is seen that numerous faster face recognition techniques have come into being and a number of others are under evolution. Each technique is superior to other algorithms in some respects and inferior in others. However, the techniques are still evolving and getting better day by day.

REFERENCES

[1] Samit Shirsat, Aakash Naik, Darshan Tamse, Jaysingh Yadav, Pratiksha Shetgaonkar, Shailendra Aswale, "Proposed System for Criminal Detection and Recognition on CCTV Data Using Cloud and Machine Learning", Proceedings International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), 2019.
[2] Jandaulyet Salyut, Cetin Kurnaz, "Profile Face Recognition Using Local Binary Patterns with Artificial Neural Network", International Conference on Artificial Intelligence and Data Processing (IDAP), 2018.
[3] Ahmad Almadhor, "Deep Learning Based Detection Algorithm for Mobile Application", Proceedings TENCON, IEEE Region 10 International Conference, 2018.
[4] Kartika Candra, Salmet Wibawanto, Nur Hidayah, Gigih Prasetyo Cahyono, "Ant System for Face Detection", International Seminar on Application for Technology of Information and Communication (iSemantic), 2019.
[5] Md. Asif Anjum Akash, M. A. H. Akhand, N. Siddique, "Robust Face Detection Using Hybrid Skin Color Matching under Different Illumination", International Conference on Electrical, Computer and Communication Engineering (ECCE), 2019.
[6] Ru Wang, XinShi He, "Face Detection on Template Matching and Neural Network", International Conference on Communications, Information System and Computer Engineering (CISCE), 2019.
[7] Subham Mukherjee, Ayan Das, Sumalya Saha, Ayan Kumar Bhunia, Sounak Lahiri, Aishik Konwer, Arindam Chakraborty, "Convolutional Neural Network based Face Detection", International Conference on Electronics, Materials Engineering and Nano-Technology (IEMENTech), 2017.
[8] Muhammad Zeeshan Khan, Saad Harous, Saleet Ul Hassan, Muhammad Usman Ghani Khan, Razi Iqbal, Shaid Mumtaz, "Deep Unified Model for Face Recognition on Convolutional Neural Network and Edge Computing", IEEE Access, DOI 10.1109/ACCESS.2019.2918275, 2019.
[9] M. Karaaba, O. Surinta, L. Schomaker, M. A. Wiering, "Robust face recognition by computing distances from multiple histograms of oriented gradients", Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence, Cape Town, South Africa, 7–10 December 2015.
[10] Chien-Yu Chen, Jian-Jiun Ding, Hung-Wei Hsu, Yih-Cherng Lee, "Advanced Orientation Robust Face Detection Algorithm Using Prominent Features and Machine Learning Techniques", IEEE Visual Communications and Image Processing (VCIP), 2018.
[11] Monali Nitin Chaudhari, Gayatri Ramrakhiani, Mrinal Deshmuk, Rakshita Parvatikar, "Face Detection using Viola Jones Algorithm and Neural Network", International Conference on Computing Communication Control and Automation (ICCUBEA), 2018.
[12] Rong Qi, Rui-Sheng Jia, Qi-Chao, Hong-Mei Sun, Ling-Qun Zuo, "Face Detection Method Based on Cascaded Convolutional Networks", IEEE Access, DOI 10.1109/ACCESS.2019.2934563, 2019.
[13] Jandaulyet Salyut, Cetin Kurnaz, "Profile Face Recognition Using Local Binary Patterns with Artificial Neural Network", International Conference on Artificial Intelligence and Data Processing (IDAP), 2018.
[14] Sujay S N, H S Manjunatha Reddy, Ravi J, "Face Recognition Using Extended LBP Features and Multilevel SVM Classifier", International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), 2017.
[15] Rong Qi, Rui-Sheng Jia, Qi-Chao, Hong-Mei Sun, Ling-Qun Zuo, "Face Detection Method Based on Cascaded Convolutional Networks", IEEE Access, DOI 10.1109/ACCESS.2019.2934563, 2019.
[16] Nidhi R. Brahmbhatt, Harshadkumar B. Prajapati, Vipul K. Dabhi, "Survey and Analysis of Extraction of Human Face Features", Innovations in Power and Advanced Computing Technologies (i-PACT), 2017.
[17] Md. Asif Anjum Akash, M. A. H. Akhand, N. Siddique, "Robust Face Detection Using Hybrid Skin Color Matching under Different Illumination", International Conference on Electrical, Computer and Communication Engineering (ECCE), 2019.
[18] Eyad I. Abbas, Mohammed E. Safi, Khalida S. Rijab, "Face Recognition Rate Using Different Classifier Methods Based on PCA", International Conference on Current Research in Computer Science and Information Technology (ICCIT), 2017.
[19] Gaili Yue, Lei Lu, "Face Recognition based on Histogram Equalization and Convolution Neural Network", International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2018.
[20] Shun Lei Myat Oo, Aung Nway Oo, "Child Face Recognition with Deep Learning", International Conference on Advanced Information Technologies (ICAIT), 2019.
[21] Iacopo Masi, Yue Wu, Prem Natarajan, "Deep Face Recognition: A Survey", SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2018.
[22] Juxiang Chen, Zhihoa Zhang, Liansheng Yao, Bo Li, Tong Chen, "Face Recognition Using Depth Images Based Convolutional Neural Network", International Conference on Computer, Information and Telecommunication Systems (CITS), 2019.