
Facial Expression Recognition using CNN

A PROJECT REPORT
ON
“FACIAL EXPRESSION RECOGNITION USING CNN”
SUBMITTED TO
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY,
ANANTAPUR
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE AWARD OF THE DEGREE OF
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING

Ms. B. Deepika 18KF1A0404

Ms. M. Rupa Sree 18KF1A0431

Ms. N. Pooja 18KF1A0435

Mr. P.Rohith 18KF1A0441

Ms. S. Prathyusha 18KF1A0448

UNDER THE GUIDANCE OF


Dr. A. Senthil Kumar,
B.E., M.E., MBA, PGDVLSI, DISM, Ph.D. (IITR), PDF (TUT, SA),
Senior PDF (VSB-TUO, Europe), PGP AI & ML (University of Texas, Austin)
PRINCIPAL
SANSKRITHI SCHOOL OF ENGINEERING

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING


SANSKRITHI SCHOOL OF ENGINEERING
(Affiliated to JNTUA, Anantapur, Approved by AICTE, NEW DELHI)
PUTTAPARTHI-515134
2018-2022.


SANSKRITHI SCHOOL OF ENGINEERING


DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
(Affiliated to JNTUA, Anantapur, Approved by AICTE, NEW DELHI)
PUTTAPARTHI-515134.
2018-2022.

CERTIFICATE

This is to certify that the Project entitled “FACIAL EXPRESSION RECOGNITION


USING CNN” is submitted by

Ms. B. Deepika 18KF1A0404

Ms. M. Rupa Sree 18KF1A0431

Ms. N. Pooja 18KF1A0435

Mr. P.Rohith 18KF1A0441

Ms. S. Prathyusha 18KF1A0448

in partial fulfilment of the requirements for the award of BACHELOR OF


TECHNOLOGY IN ELECTRONICS AND COMMUNICATION ENGINEERING,
SANSKRITHI SCHOOL OF ENGINEERING, PUTTAPARTHI-515134.

Signature of the Guide: Dr. A. Senthil Kumar, B.E., M.E., M.B.A., PGDVLSI, DISM, PDF (TUT, SA), Senior PDF (VSB-TUO, Europe), PGP AI & ML (University of Texas, Austin), Sanskrithi School of Engineering

Signature of the HoD: Mr. S. Hari Krishnan, M.Tech, (Ph.D), Department of Electronics and Communication Engineering, Sanskrithi School of Engineering

Signature of the Principal: Dr. A. Senthil Kumar, B.E., M.E., M.B.A., PGDVLSI, DISM, PDF (TUT, SA), Senior PDF (VSB-TUO, Europe), PGP AI & ML (University of Texas, Austin), Sanskrithi School of Engineering

Internal Examiner                External Examiner


ACKNOWLEDGEMENT

We express our special thanks to Dr. A. Senthil Kumar, B.E., M.E., PGDVLSI, DISM, Ph.D. (IITR), PDF (TUT, SA), Senior PDF (VSB-TUO, Europe), PGP AI & ML (University of Texas, Austin), Principal, SSE, for his valuable guidance, supervision and constructive suggestions to complete this project.

We thankfully acknowledge Mr. S. Hari Krishnan, M.Tech, (Ph.D.), Head of the Department of ECE, Sanskrithi School of Engineering, Puttaparthi, for his valuable suggestions and advice throughout the course.

We thankfully acknowledge Dr. A. Senthil Kumar, B.E., M.E., MBA, PGDVLSI, DISM, Ph.D. (IITR), PDF (TUT, SA), Senior PDF (VSB-TUO, Europe), PGP AI & ML (University of Texas, Austin), Principal, Sanskrithi School of Engineering, Puttaparthi, for his valuable suggestions and advice throughout the course.

We express our sincere thanks to Mr. B. Vijay Bhaskar Reddy, Chairman, Sanskrithi Group of Institutions, Puttaparthi, for inspiring us all the way and for arranging all the facilities and resources needed for the completion of the course.

We express our heartfelt thanks to our parents and family members, who gave us moral support during the completion of the course.

We express our heartfelt thanks and gratitude to all the professors, lab coordinators and non-teaching staff whose help, understanding, encouragement and support made this effort worthwhile and possible.


DECLARATION

I hereby declare that this project report entitled “FACIAL EXPRESSION


RECOGNITION USING CNN” is the work done by

Ms. B. Deepika 18KF1A0404


Ms. M. Rupa Sree 18KF1A0431
Ms. N. Pooja 18KF1A0435
Mr. P.Rohith 18KF1A0441
Ms. S. Prathyusha 18KF1A0448
towards the partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY in ELECTRONICS AND COMMUNICATION
ENGINEERING, submitted to JAWAHARLAL NEHRU TECHNOLOGICAL
UNIVERSITY, ANANTAPUR, and is the result of the work carried out under the
guidance of

Dr. A Senthil Kumar, B.E, M.E, M.B.A, PGDVLSI, DISM, PDF (TUT, SA),
Senior PDF (VSB TUO, EUROPE), PGP AI & ML (University of Texas, Austin)
PRINCIPAL Sanskrithi School of Engineering.

I further declare that this project report has not previously been submitted, either in part or in full, for the award of any degree or diploma by any organization or university.

Ms. B. Deepika (18KF1A0404)


Ms. M. Rupa Sree (18KF1A0431)
Ms. N. Pooja (18KF1A0435)
Mr. P. Rohith (18KF1A0441)
Ms. S. Prathyusha (18KF1A0448)


ABSTRACT

This project deals with the construction of a video processing system dedicated to identifying and understanding the facial expressions of people. Our approach involves detecting facial landmarks and analysing their positions to identify emotions. We adapted networks that were initially constructed to work on colour or grayscale images so that they work with black-and-white images containing facial landmarks. The training, validation and query datasets were also adapted and preprocessed from established computer vision datasets, with the addition of several images acquired by ourselves. This report presents the experimental results and the techniques that were verified.
Keywords: Facial Expression Recognition, Convolutional Neural Network,
Facial Landmarks, Machine Learning


QUOTATION
“Education is not the learning of facts, but the training of the
mind to think.”
– Albert Einstein


CHAPTER - 1


1. INTRODUCTION
1.1 Natural Intelligence:
Natural intelligence is the intelligence created by nature through natural evolutionary mechanisms: biological intelligence embodied in the brain, animal and human, and any hypothetical alien intelligence. The key question is: will it ever be possible to understand the human brain?

Scientists still have no reliable model of how the brain actually works, and
“neuroscience is still in its infancy,” capable of assembling a multitude of facts but struggling
to determine the relationship between them.

‘Neuroscience does not have, as physics does, a standard model that serves as a conceptual structure in which gaps of knowledge and inconsistencies can be isolated and serve as an impetus for experiments, technological improvements or elaborate calculations. In this workshop the speakers are asked to present embryos of brain theories that could develop into a “standard model of the functions of the mammalian brain”.

As our current experimental knowledge of the mechanisms of brain dynamics is sparse, the talks will try to identify the gaps and bridge them with speculative hypotheses. The idea is that this exercise identifies crucial questions for brain science and produces hypotheses, some of which might be testable experimentally.’

1.2 Artificial Intelligence:


Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to
the natural intelligence displayed by animals including humans. AI research has been defined
as the field of study of intelligent agents, which refers to any system that perceives its
environment and takes actions that maximize its chance of achieving its goals.
The term "artificial intelligence" had previously been used to describe machines that
mimic and display "human" cognitive skills that are associated with the human mind, such as
"learning" and "problem-solving". This definition has since been rejected by major AI
researchers who now describe AI in terms of rationality and acting rationally, which does not
limit how intelligence can be articulated.

1.3 Computer Vision:


Computer vision is a multidisciplinary field concerned with how computers can gain a high-level understanding from digital images and videos. It is an attempt to automate tasks that the human visual system is able to perform: acquiring, processing, analyzing and understanding digital images, and extracting high-dimensional data from the real world to produce numerical or symbolic information.

Typical tasks involved in computer vision with Python are:

• Recognition
• Motion Analysis
• Scene Reconstruction
• Image Restoration


1.4 Neural Network:


A neural network is a method in artificial intelligence that teaches computers to
process data in a way that is inspired by the human brain. It is a type of machine learning
process, called deep learning, that uses interconnected nodes or neurons in a layered
structure resembling the human brain. It creates an adaptive system that computers use to
learn from their mistakes and improve continuously. Thus, artificial neural networks attempt
to solve complicated problems, like summarizing documents or recognizing faces, with
greater accuracy.

Neural networks can help computers make intelligent decisions with limited human
assistance. This is because they can learn and model the relationships between input and
output data that are nonlinear and complex.

1.5 Our Concept:


Human beings communicate with each other in the form of speech, gestures and
emotions, so systems that can recognize these are in great demand in many fields.
With respect to artificial intelligence, a computer will be able to interact with humans much
more naturally if it is capable of understanding human emotion. This would also help in
counselling and other healthcare-related fields.
In an E-Learning system, the presentation style may be varied depending on the
student’s state. Thus, the project proposes a model that is aimed at real-time facial emotion
recognition. For real-time purposes, facial emotion recognition has a number of applications.
For instance, ATMs could be set up such that they won’t dispense money when the user is
scared. In the gaming industry, emotion-aware games can be developed which could vary the
difficulty of a level depending on the player’s emotions. It also has uses in video game
testing. At present, players usually give some form of verbal or written feedback. By judging
their expressions during different points of the game, a general understanding of the game’s
strong and weak points can be discerned.
Emotions can also be gauged while a viewer watches ads to see how they react to
them. This is especially helpful since ads do not usually have feedback mechanisms apart
from tracking whether the ad was watched and whether there was any user interaction.
Software for cameras can use emotion recognition to take photos whenever a user smiles.
Even today, researchers aim to identify the six universal emotions with reliable accuracy.
Emotions can be inferred from a person’s actions, speech, writing and facial expressions. In
terms of facial emotion recognition, one major challenge lies in the data collected. Most
datasets contain labelled images which are generally posed. This generally involves photos
taken in a stable environment such as a laboratory. While it is much easier to accurately
predict the emotion in such scenarios, these systems tend to be unreliable at predicting
emotions in the “wild” (uncontrolled environments).
Another issue is that most datasets come from these controlled environments, and it is
relatively harder to obtain labelled datasets of emotions in the wild. Furthermore, most
datasets have relatively less training data for emotions such as fear and disgust when
compared to emotions such as happiness.

Another factor to take into account is a person’s pose. It is significantly harder to


determine the emotion of a person when only half of their face is visible. In addition, lighting
plays a major role in facial emotion recognition. Systems may fail to identify an emotion that
they would normally identify if the lighting conditions are poor. Finally, one must remember
that a user's emotional state is a combination of many factors; a smile does not always mean
that a person is genuinely happy. The objective of this project is to classify human faces into
one of the six universal emotions or a seventh, neutral emotion.


CHAPTER – 2


2.1 AIM AND OBJECTIVE


2.1.1 Problem Definition:
Human facial expressions can be easily classified into 7 basic emotions: happy, sad,
surprise, fear, anger, disgust, and neutral. Our facial emotions are expressed through
activation of specific sets of facial muscles. These sometimes subtle, yet complex, signals in
an expression often contain an abundant amount of information about our state of mind.
Through facial emotion recognition, we are able to measure the effects that content and
services have on the audience/users through an easy and low-cost procedure. For example,
retailers may use these metrics to evaluate customer interest. Healthcare providers can
provide better service by using additional information about patients' emotional state during
treatment. Entertainment producers can monitor audience engagement in events to
consistently create desired content. Humans are well trained in reading emotions, but a
computer can also do the job of assessing emotional states (expressions).

2.1.2 Objectives
The objectives of the project are:
1. Identification of facial expressions using image processing algorithms.
2. Classification of facial expressions using Convolutional Neural Networks.
3. Obtaining the output from video frames/images, or recognising expressions live.


2.2 LITERATURE SURVEY

PAPERS REFERRED:
2.2.1 TITLE: Facial expression detection by combining deep learning neural
networks

AUTHORS: Alexandru Costache, Dan Popescu


YEAR: 2021
This paper constructs a video processing system dedicated to identifying and understanding
the facial expressions of persons; it uses the Haar cascade algorithm for face detection and a
CNN for image processing. The CNN offers fast execution and higher accuracy, and can
handle complex images easily. The system captures and processes static images or video
captures. It takes up to 400 ms per face to detect the landmarks.

2.2.2 TITLE: A brief review of facial emotion recognition based on visual information
AUTHORS: Byoung Chul Ko, Ekmann and Friesen
YEAR: 2018
This paper covers face and facial-component detection, feature extraction and expression
classification, beginning with conventional FER methods including SVM, AdaBoost and
random forest algorithms, and then using CNNs on visual information. CNNs reduce the
dependence on face-physics-based models, although CNN-based FER methods cannot reflect
the temporal variations in the facial components. The pipeline enables "end to end" learning
directly from the input image, removing the need for a separate pre-processing stage.

2.2.3 TITLE: Recognition of emotion intensities using machine learning algorithms:


A comparative study

AUTHORS: D. Mehta, M.F.H. Siddiqui, and A.Y. Javaid


YEAR: 2019
This paper addresses the recognition of emotions, along with their intensities, using machine
learning. Interest in behavioural biometric systems has increased, and human-machine
interaction plays an important role, so FER is receiving enormous attention. Facial
expressions can display personal emotions and indicate an individual's intentions within a
social situation. However, the approach does not encode the observed facial emotions and
does not involve multi-class facial behaviour.


2.2.4 TITLE: Facial emotion recognition using deep learning: review and insights
AUTHORS: Mehrabian, Ekman, Freisen, Agrwal et mittal, Deepak Jain, Mohammad pour
Year: 2020
This review covers how facial expressions are coded and how their features are extracted in
order to obtain a better prediction by computer. The architectures and databases used are
described, and the progress made is presented by comparing the proposed methods. CNN and
CNN-LSTM models are exploited to achieve better performance on verbal and non-verbal
information captured by various sensors. These works remain limited to learning only the six
basic emotions plus neutral, rather than emotions that are more complex.

2.2.5 TITLE: Facial expression recognition with Convolutional neural networks:


coping with few data and the training sample order

AUTHORS: A. De Souza, A. Lopes, E. Aguiar, and T. Oliveira-Santos


YEAR: 2016
Experiments showed that the combination of normalization procedures significantly improves
accuracy. To cope with few data, pre-processing operations were applied to the images and a
subset of features was selected, which reduces the data usage.


CHAPTER-3


3. SOFTWARE REQUIREMENTS
PLATFORM : Anaconda Navigator
TOOLS : Jupyter Notebook, Spyder

Figure 1 Anaconda Logo

Figure 2 Jupyter Notebook Logo Figure 3 Spyder Logo

As the project is developed in Python, we have used Anaconda with Python 3.6.5 and
Spyder.

3.1 Anaconda:
Anaconda is a free and open-source distribution of the Python and R programming languages
for data science and machine learning related applications (large-scale data processing,
predictive analytics, scientific computing) that aims to simplify package management and
deployment. Package versions are managed by the package management system conda. The
Anaconda distribution is used by over 6 million users, and it includes more than 250 popular
data science packages suitable for Windows, Linux and macOS.

3.2 Spyder:
Spyder (formerly Pydee) is an open-source, cross-platform integrated development
environment (IDE) for scientific programming in the Python language. Spyder integrates
NumPy, SciPy, Matplotlib and IPython, as well as other open-source software. It is released
under the MIT license. Spyder is extensible with plug-ins, includes support for interactive
tools for data inspection and embeds Python-specific code quality assurance and
introspection instruments, such as Pyflakes, Pylint and Rope. It is available cross-platform
through Anaconda, on Windows with WinPython and Python(x,y), on macOS through
MacPorts, and on major Linux distributions such as Arch Linux, Debian, Fedora, Gentoo
Linux, openSUSE and Ubuntu.

Features include:
• an editor with syntax highlighting and introspection for code completion
• support for multiple Python consoles (including IPython)
• the ability to explore and edit variables from a GUI

Available plug-ins include:
• Static Code Analysis with Pylint
• Code Profiling
• Conda Package Manager

3.3 Jupyter Notebook:


The Jupyter Notebook is an open-source web application that allows you to create and
share documents that contain live code, equations, visualizations, and narrative text. Its uses
include data cleaning and transformation, numerical simulation, statistical modelling, data
visualization, machine learning, and much more.

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive
computational environment for creating Jupyter notebook documents. The term "notebook"
can colloquially refer to many different entities, mainly the Jupyter web application, the
Jupyter Python web server, or the Jupyter document format, depending on context.

Jupyter Book is an open-source project for building books and documents from
computational material. It allows the user to construct content in a mixture of Markdown, an
extended version of Markdown called MyST, maths and equations using MathJax, Jupyter
Notebooks, reStructuredText, and the output of running Jupyter Notebooks at build time.
Multiple output formats can be produced (currently single files, multi-page HTML web pages
and PDF files).

3.4 Pandas
Pandas is an open-source library made mainly for working with relational or
labelled data both easily and intuitively. It provides various data structures and operations for
manipulating numerical data and time series. This library is built on top of the NumPy library.
Pandas is fast and offers high performance and productivity for users. Pandas was initially
developed by Wes McKinney in 2008 while he was working at AQR Capital Management. He
convinced AQR to allow him to open-source the library. Another AQR employee, Chang
She, joined as the second major contributor in 2012. Over time many versions of pandas have
been released; the latest at the time of writing is 1.4.1.

3.5 Numpy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays. It is the fundamental
package for scientific computing with Python. It is open-source software. It contains various
features including these important ones:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-
dimensional container of generic data. Arbitrary data types can be defined, which allows
NumPy to seamlessly and speedily integrate with a wide variety of databases.


3.6 Keras & Tensorflow:


Keras is a deep learning API written in Python, running on top of the machine
learning platform TensorFlow. It was developed with a focus on enabling fast
experimentation. Being able to go from idea to result as fast as possible is key to doing good
research.

Keras is:

• Simple -- but not simplistic. Keras reduces developer cognitive load to free you to
focus on the parts of the problem that really matter.
• Flexible -- Keras adopts the principle of progressive disclosure of complexity: simple
workflows should be quick and easy, while arbitrarily advanced workflows should
be possible via a clear path that builds upon what you've already learned.
• Powerful -- Keras provides industry-strength performance and scalability: it is used
by organizations and companies including NASA, YouTube, and Waymo.

TensorFlow 2 is an end-to-end, open-source machine learning platform. You can think of it


as an infrastructure layer for differentiable programming. It combines four key abilities:

• Efficiently executing low-level tensor operations on CPU, GPU, or TPU.


• Computing the gradient of arbitrary differentiable expressions.
• Scaling computation to many devices, such as clusters of hundreds of GPUs.
• Exporting programs ("graphs") to external runtimes such as servers, browsers, mobile
and embedded devices.

Keras is the high-level API of TensorFlow 2: an approachable, highly-productive


interface for solving machine learning problems, with a focus on modern deep learning. It
provides essential abstractions and building blocks for developing and shipping machine
learning solutions with high iteration velocity.

Keras empowers engineers and researchers to take full advantage of the scalability and
cross-platform capabilities of TensorFlow 2: you can run Keras on TPU or on large clusters
of GPUs, and you can export your Keras models to run in the browser or on a mobile device.


CHAPTER 4


4. PROPOSED METHODOLOGY

BLOCK DIAGRAM

Video Frame → Face Detection → Face Segmentation → Facial Landmarks Detection → Image Analysis to Determine Emotions → Output Image Compilation

Figure 4 Project Block Diagram

4.1 Description:
Deep learning is a class of machine learning algorithms suitable for extracting
various features from inputs, with varying degrees of complexity. In image processing,
most such algorithms are based on artificial neural networks, usually CNNs. Depending
on the complexity of the network, it can recognize anything from something as trivial as
numbers or letters to something as detailed as persons or faces. We constructed an application
that processes video inputs to determine the emotions of persons appearing in the videos. The
figure above schematically shows the steps of such a process. Our inputs for image processing,
with the target of determining emotions, are frames captured through a webcam. The photos
we used were downloaded from the FER-2013 dataset. For the video frames, a facial
detection algorithm was employed, using the Haar cascade classifier available with the
OpenCV library. The coordinates of a bounding rectangle for each detected face are given as
output. As we want our algorithm to perform FER under real-world conditions, it is likely
that persons will not always be passing by our camera, so a lot of input frames may be
unnecessarily processed. We use a flag telling us whether there are persons in the frame.


4.2 Implementation:
4.2.1 Face Detection:
Face detection using Haar cascades is a machine-learning-based approach where a
cascade function is trained with a set of input data. OpenCV already contains pre-trained
classifiers for faces. In this project we use the Haar cascade and LBP cascade classifiers for
face detection. The implementation of face detection is shown below.

Figure 5 Implementation of Face detection

• The detection works only on grayscale images, so it is important to convert the color
image to grayscale.
• The detectMultiScale function is used to detect the faces. It takes 3 arguments — the
input image, scaleFactor and minNeighbours. scaleFactor specifies how much the image
size is reduced with each scale, and minNeighbours specifies how many neighbors each
candidate rectangle should have in order to be retained.
• faces contains a list of coordinates for the rectangular regions where faces were
found. We use these coordinates to draw the rectangles on our image.
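The implementation in Figure 5 appears only as a screenshot in the original report. A minimal sketch of what such a detection script might look like, assuming the classifier XML files shipped with OpenCV and a hypothetical test image face.jpg (both assumptions, not taken from the report):

import cv2

# Pre-trained frontal-face classifier shipped with OpenCV (path is an assumption).
haar_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

img = cv2.imread('face.jpg')                  # hypothetical test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # detection works on grayscale

# scaleFactor: image shrink step per scale; minNeighbors: rectangles needed to keep a hit.
faces = haar_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                    # draw a rectangle around each detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)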


4.2.2 Face Recognition:


Nowadays computer vision is getting more advanced. Major tech giants are building
their models to become more like humans; to do so, machines must be capable of
detecting your emotions and treating you accordingly. The pipeline is:

• Getting data
• Preparing data
• Image augmentation
• Building and training the model
• Using the webcam for detection

4.2.3 Getting Data:

We used the FER-2013 dataset, which is publicly available on Kaggle. It has 48×48-pixel
grayscale images of faces along with their emotion labels. Start by importing pandas and
some essential libraries and then loading the dataset as shown in the figure.

Figure 6 Data Collection

This dataset contains 3 columns: emotion, pixels and Usage. The emotion column
contains integer-encoded emotions, the pixels column contains pixels in the form of a string
separated by spaces, and Usage tells whether the data is meant for training or testing. The
dataset covers 7 emotions: (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise,
6=Neutral). A sketch of the loading step is shown below.
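Figure 6 is reproduced only as a screenshot in the original; a sketch of the loading step, assuming fer2013.csv sits in the working directory, might look like this:

import pandas as pd

df = pd.read_csv('fer2013.csv')  # columns: emotion, pixels, Usage
print(df.head())                 # integer emotion + space-separated pixel string
print(df.Usage.value_counts())   # Training / PublicTest / PrivateTest counts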

4.2.4 Preparing Data:

The data is not in the right format, so we need to pre-process it. Here X_train and
X_test contain pixels, while y_train and y_test contain emotions. At this stage X_train and
X_test hold the pixel numbers in the form of strings; converting them into numbers is easy,
we just need to typecast. y_train and y_test contain 1D integer-encoded labels, which we need
to convert into categorical data for efficient training. num_classes = 7 indicates that we have
7 classes to classify.


Figure 7 Preparing Data
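The screenshot in Figure 7 is not reproduced in this conversion; a sketch of the preparation step, assuming the df loaded above, could be:

import numpy as np
from tensorflow.keras.utils import to_categorical

X_train, y_train, X_test, y_test = [], [], [], []
for _, row in df.iterrows():
    pixels = np.array(row['pixels'].split(' '), dtype='uint8')  # typecast strings to numbers
    if row['Usage'] == 'Training':
        X_train.append(pixels)
        y_train.append(row['emotion'])
    else:
        X_test.append(pixels)
        y_test.append(row['emotion'])

num_classes = 7
y_train = to_categorical(y_train, num_classes)  # 1D integer labels -> one-hot categorical
y_test = to_categorical(y_test, num_classes)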

4.2.5 Reshaping Data:

We need to convert the data into the form of a 4D tensor (row_num, width, height,
channel) for training purposes. Here the channel value of 1 tells us that the training data is in
grayscale form. At this stage, we have successfully pre-processed our data into X_train,
X_test, y_train and y_test.

Figure 8 Reshaping Data
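A sketch of the reshaping step shown in Figure 8:

X_train = np.array(X_train, dtype='uint8').reshape(-1, 48, 48, 1)  # (row_num, width, height, channel)
X_test = np.array(X_test, dtype='uint8').reshape(-1, 48, 48, 1)    # channel 1 = grayscale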

4.2.6 Image Augmentation for Facial Emotion Detection:

Image data augmentation is used to improve the performance and the ability of
the model to generalize. It is always good practice to apply some data augmentation
before passing the data to the model, which can be done using the ImageDataGenerator
provided by Keras.

• rescale: normalizes the pixel values by dividing them by 255.
• horizontal_flip: flips the image horizontally.
• fill_mode: fills in pixels that become empty after cropping or shifting.
• rotation_range: rotates the image by up to the given number of degrees.

For the testing data, we only apply rescaling (normalization).


Figure 9 Image Augmentation
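A sketch of the augmentation setup in Figure 9; the parameter values mirror the source code chapter:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1./255,         # normalize pixel values by dividing by 255
    rotation_range=10,      # random rotations
    horizontal_flip=True,   # random horizontal flips
    width_shift_range=0.1,
    height_shift_range=0.1,
    fill_mode='nearest')    # fill pixels left empty by shifts/rotations

testgen = ImageDataGenerator(rescale=1./255)  # testing data: rescaling only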

4.2.7 Fitting the generator to our data:

We use a batch_size of 64; after fitting our data to the image generator, data will be
generated in batches of 64. Using a data generator is the best way to train on a large
amount of data. train_flow contains our X_train and y_train, while test_flow contains
our X_test and y_test.

Figure 10 Fitting the generator
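A sketch of the fitting step in Figure 10:

batch_size = 64
datagen.fit(X_train)
train_flow = datagen.flow(X_train, y_train, batch_size=batch_size)  # training batches
test_flow = testgen.flow(X_test, y_test, batch_size=batch_size)     # testing batches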

4.2.8 Building the Facial Emotion Detection Model using CNN:

We design the CNN model for emotion detection using the functional API. We
create blocks using the Conv2D layer, Batch Normalization, MaxPooling2D, Dropout and
Flatten, stack them together, and at the end use a Dense layer for the output. Building
the model using the functional API gives more flexibility. FER_model takes the input size and
returns the model for training. Now let's define the architecture of the model.
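The full architecture is listed in the source code chapter; a compressed sketch with a single such block, ending in the Dense output layer, is given here for orientation (the real model stacks several blocks with larger filter counts):

from keras.models import Model
from keras.layers import Input, Conv2D, BatchNormalization, MaxPooling2D, Dropout, Flatten, Dense

def FER_model(input_shape=(48, 48, 1)):
    inputs = Input(shape=input_shape)
    # one Conv -> BatchNorm -> Pool -> Dropout block; the full model stacks several of these
    x = Conv2D(64, kernel_size=3, activation='relu', padding='same')(inputs)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.3)(x)
    x = Flatten()(x)
    outputs = Dense(7, activation='softmax')(x)  # 7 emotion classes
    return Model(inputs, outputs)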


4.2.9 Compiling the Facial Emotion Detection Model:

We compile the model using the Adam optimizer, keeping lr = 0.0001; if the model's
accuracy doesn't improve after some epochs, the learning rate is decreased by a decay
factor.

Figure 11 Compiling Facial Emotion
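A sketch of the compilation step in Figure 11; the decay-on-plateau behaviour is assumed here to be implemented with Keras's ReduceLROnPlateau callback (factor and patience values are assumptions):

from tensorflow.keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau

model = FER_model()
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Decrease the learning rate by a decay factor when accuracy stops improving.
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=3)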

4.2.10 Training the Facial Emotion Detection Model:

To train the model you need to write the following lines of code.

Figure 12 Training the model

• steps_per_epoch = TotalTrainingSamples / TrainingBatchSize


• validation_steps = TotalvalidationSamples / ValidationBatchSize
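A sketch of the training call in Figure 12, using the two formulas above (the epoch count is an assumption, as the report does not state it):

num_epochs = 30  # assumption; not stated in the report
history = model.fit(train_flow,
                    steps_per_epoch=len(X_train) // batch_size,
                    epochs=num_epochs,
                    validation_data=test_flow,
                    validation_steps=len(X_test) // batch_size,
                    callbacks=[reduce_lr])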

4.2.11 Save the Model:

We save the model's architecture into JSON and the model's weights into .h5.

Figure 13 Saving the model
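A sketch of the saving step in Figure 13 (the file names are assumptions):

with open('fer_model.json', 'w') as f:  # architecture as JSON
    f.write(model.to_json())
model.save_weights('fer_model.h5')      # weights in HDF5 format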


4.2.12 Testing the Model using the Webcam Feed:

Loading the saved model:

We load the trained model architecture and weights so that they can be used
further to make predictions.

Figure 14 Testing the model
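A sketch of the loading step in Figure 14, matching the file names assumed above:

from keras.models import model_from_json

with open('fer_model.json', 'r') as f:
    model = model_from_json(f.read())  # rebuild the architecture
model.load_weights('fer_model.h5')     # restore the trained weights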

Loading the Haar cascade for face detection:

We use the Haar cascade to detect the positions of faces; after getting the positions
we crop the faces.

Figure 15 Loading the Cascade Classifier
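A sketch of the cascade-loading step in Figure 15:

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')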

Reading frames and applying pre-processing using OpenCV:

We use OpenCV to read the frames and for image processing.

• emotion_prediction returns the label of the emotion.
• Test images are normalized by dividing them by 255.
• np.expand_dims converts a 3D matrix into a 4D tensor.
• (x, y, w, h) are the coordinates of the faces in the input frame.
• The Haar cascade takes only grayscale images.

Adding an overlay on the output frame and displaying the prediction with its
confidence gives a better look.


Figure 16 Reading frames
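A sketch of the webcam loop in Figure 16, covering the bullet points above (grayscale conversion, cropping, normalization by 255, np.expand_dims to a 4D tensor, and the overlay); the model and cascade are the ones loaded in the previous sketches:

import numpy as np

emotion_labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # the cascade needs grayscale
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        face = np.expand_dims(np.expand_dims(face, -1), 0) / 255.0  # 4D tensor, normalized
        pred = model.predict(face)
        label = emotion_labels[int(np.argmax(pred))]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  # overlay
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow('FER', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()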

Model Validation:

Figure 17 Model Validation

The confusion matrix gives the counts of emotion predictions and some insights into the
performance of the multi-class classification model.
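A sketch of how the confusion matrix in Figure 17 can be computed:

from sklearn.metrics import confusion_matrix
import numpy as np

y_pred = model.predict(X_test / 255.0)            # normalize the same way as training
cm = confusion_matrix(np.argmax(y_test, axis=1),  # true labels
                      np.argmax(y_pred, axis=1))  # predicted labels
print(cm)  # rows: true emotion, columns: predicted emotion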


The following figures show the outputs of the face detection techniques (Figure 19:
Haar cascade classifier output; Figure 20: LBP cascade classifier output).

Input Image:

Figure 18 INPUT IMAGE

Output From Haar And LBP Cascade Classifiers:

Figure 19 Haar Cascade Output          Figure 20 LBP Cascade Output

The following figures show the output frames for inputs read directly from the webcam.

Figure 21 Expression Recognition using webcam



The following figures show the output frames for inputs given from the datasets.

Figure 22 Expression recognition using input data

Conclusion:
In this project, the first level implemented face detection using both algorithms (the
Haar cascade and the LBP cascade). Based on the results of both algorithms, the Haar
cascade was verified to give the better result. The second level focused on expression
recognition by CNN with live images and the dataset. Both were compared, and the best
verified result is discussed along with the levels of accuracy and loss.


CHAPTER 5


5. SOFTWARE ANALYSIS
5.1 Face Detection Technique
It is a true challenge to build an automated system that equals the human ability to
detect faces and to recognize or estimate human body dimensions or body parts from an
image or a video. The problem poses real conceptual and intellectual challenges: faces are
non-rigid and have a high degree of variability in size, shape, color and texture. Applications
include auto-focus in cameras, visual surveillance, traffic safety monitoring and
human-computer interaction. Face recognition follows a pattern focused on the face or body.
Face detection is the stepping stone to all facial analysis algorithms, including face
alignment, face modeling, face recognition and many more. Only when computers can
recognize faces, compute the logic of facial expressions and match expressions to the facial
structure can they begin to truly understand people's thoughts and intentions. Given an
arbitrary image, the goal of face detection is to determine whether or not there are any faces
in the image and, if a face is present, to return the image location and extent of each face.
5.1.1 Haar Cascade Algorithm:

Start → the detection window is set with a sliding step → slide the window vertically
and horizontally → apply a face recognition filter at each step → if the response is
positive, a face is detected → if the window size is at its maximum, stop; otherwise
continue with the next window size.

Figure Flow Chart of Face Detection


Step 1 – A live video feed of the driver is captured through the webcam mounted in the
vehicle.

Step 2 – The video is converted into snapshots as images.

Step 3 – Each frame/image is taken and, using the Viola-Jones algorithm, faces are detected
and the facial features extracted.

Step 4 – The detected faces are stored in a database/artificial neural network, and there is a
simultaneous comparison between the faces being detected and the faces already stored in the
database.

Step 5 – If a match is found, and in case of abnormal behavior in the driver, a signal is sent
to the hardware part for delivering the output.

Step 6 – If a match is not found, go back to Step 4 and repeat the process until the video
capture is turned off.

Human face detection plays a major role in human-machine-interaction-based
applications. For face detection we use the Haar cascade algorithm, proposed by Viola and
Jones. The Haar cascade algorithm can be explained in four stages: calculating Haar features,
creating integral images, using AdaBoost, and implementing cascading classifiers. It's
important to remember that this algorithm requires a lot of positive images of faces and
negative images of non-faces to train the classifier, similar to other machine learning models.

5.1.2 Haar Feature Construction:

Figure 23 Haar features

The first step is to collect the Haar features. A Haar feature is essentially a calculation
performed on adjacent rectangular regions at a specific location in a detection window. The
calculation involves summing the pixel intensities in each region and calculating the
differences between the sums; some examples of Haar features are shown above. These
features can be difficult to determine for a large image. This is where integral images come
into play, because they reduce the number of operations required.

5.1.3 Creating Integral Images:

Without going into too much of the mathematics behind it, integral images essentially
speed up the calculation of these Haar features. Instead of computing at every pixel, the
algorithm creates sub-rectangles and array references for each of those sub-rectangles. These
are then used to compute the Haar features.

Figure 24 Integral of Images
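A small numerical illustration of an integral image (a toy example, not taken from the report): each cell holds the sum of all pixels above and to the left, so the sum of any rectangle needs only four lookups.

import numpy as np

img = np.arange(1, 10).reshape(3, 3)          # toy 3x3 "image"
integral = img.cumsum(axis=0).cumsum(axis=1)  # integral image

# sum of rectangle (r1..r2, c1..c2) =
#   I[r2,c2] - I[r1-1,c2] - I[r2,c1-1] + I[r1-1,c1-1]
print(integral)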

5.1.4 Adaboost Training:


Adaboost essentially chooses the best features and trains the classifiers to use them. It
uses a combination of “weak classifiers” to create a “strong classifier” that the algorithm can
use to detect objects.
Weak learners are created by moving a window over the input image, and computing
Haar features for each subsection of the image. This difference is compared to a learned
threshold that separates non-objects from objects. Because these are “weak classifiers,” a large
number of Haar features is needed for accuracy to form a strong classifier. The last step
combines these weak learners into a strong learner using cascading classifiers.


Figure 25 Adaboost training

5.1.5 Haar Cascade Classifiers:

The cascade classifier is made up of a series of stages, where each stage is a collection
of weak learners. Weak learners are trained using boosting, which allows for a highly accurate
classifier from the mean prediction of all weak learners.

Based on this prediction, the classifier either decides to indicate an object was found
(positive) or move on to the next region (negative). Stages are designed to reject negative
samples as fast as possible, because a majority of the windows do not contain anything of
interest.

It's important to maintain a low false negative rate, because classifying an object as a
non-object will severely impair your object detection algorithm. In a typical visualisation of
Haar cascades in action, red boxes denote "positives" from the weak learners.

Haar cascades are one of many algorithms currently being used for object
detection. One thing to note about Haar cascades is that it is very important to reduce the
false negative rate, so make sure to tune hyperparameters accordingly when training your
model.

5.2 Local Binary Pattern:


Local Binary Patterns (LBP) is a texture descriptor that can also be used to
represent faces, since a face image can be seen as a composition of micro-texture
patterns. Briefly, the procedure consists of dividing a facial image into several regions
from which the LBP features are extracted and concatenated into a feature vector that will
later be used as a facial descriptor.
Local Binary Pattern features have performed very well in various applications,
including texture classification and segmentation, image retrieval and surface inspection.
The original LBP operator labels the pixels of an image by thresholding the 3-by-3
neighborhood of each pixel with the center pixel value and considering the result as a
binary number. Figure 26 shows an example of the LBP calculation.

Figure 26 Example of LBP Calculation
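A small sketch of the 3x3 LBP calculation (the clockwise bit ordering used here is one common convention, not necessarily the one in the figure):

import numpy as np

def lbp_code(patch):
    # threshold the 8 neighbours at the centre value and read them as a binary number
    center = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if n >= center else 0 for n in neighbours]
    return sum(b << i for i, b in enumerate(bits))  # label in 0..255

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code(patch))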

The 256-bin histogram of the labels computed over an image can be used as a
texture descriptor. Each bin of the histogram (LBP code) can be regarded as a micro-texton.
Local primitives codified by these bins include different types of curved edges,
spots, flat areas, etc. Figure 27 shows some examples.

Figure 27 Examples of texture primitives

The LBP operator has been extended to consider different neighborhood sizes. For
example, the operator LBP(4,1) uses 4 neighbors, while LBP(16,2) considers the 16 neighbors
on a circle of radius 2. In general, the operator LBP(P,R) refers to a neighborhood of P
equally spaced pixels on a circle of radius R that form a circularly symmetric neighbor set.
LBP(P,R) produces 2^P different output values, corresponding to the 2^P different binary
patterns that can be formed by the P pixels in the neighbor set. It has been shown that certain
bins contain more information than others; therefore, it is possible to use only a subset of the
2^P LBPs to describe textured images. These fundamental patterns are defined as those with
a small number of bitwise transitions from 0 to 1 and vice versa. For example, 00000000 and
11111111 contain 0 transitions, while 00000110 and 01111110 contain 2 transitions, and so
on. Accumulating the patterns that have more than 2 transitions into a single bin yields an
LBP descriptor. The most important properties of LBP features are their tolerance against
monotonic illumination changes and their computational simplicity.

5.2.1 LBP based Facial Representation


Each face image can be considered as a composition of micro-patterns which can be
effectively detected by the LBP operator; this observation led to an LBP-based face
representation for face recognition. To capture the shape information of faces, face images
are divided into M small non-overlapping regions R_0, R_1, ..., R_{M-1} (as shown in
Figure 28). The LBP histograms extracted from each sub-region are then concatenated into a
single, spatially enhanced feature histogram, defined (in the standard formulation) as

    H(i, j) = Σ_{x,y} I{ f_l(x, y) = i } · I{ (x, y) ∈ R_j },

where i = 0, ..., L-1 indexes the histogram bins, j = 0, ..., M-1 indexes the regions, f_l is the
LBP-labelled image and I{·} is the indicator function. The extracted feature histogram
describes the local texture and global shape of face images.

Figure 28 LBP based facial representation

5.2.2 Learning Classification Functions


In this system, Gentle AdaBoost, a variant of AdaBoost, is used to select the features
and to train the classifier. The formal guarantees provided by the AdaBoost learning
procedure are quite strong: it has been proved that the training error of the strong
classifier approaches zero exponentially in the number of rounds. Gentle AdaBoost takes
Newton steps for optimization.


The weak classifier is designed to select the single LBP histogram bin which best
separates the positive and negative examples. A weak classifier h_j(x) consists of a feature
f_j, which corresponds to an LBP histogram bin, a threshold θ_j and a parity p_j indicating
the direction of the inequality sign; in the standard form,

    h_j(x) = 1 if p_j · f_j(x) < p_j · θ_j, and 0 otherwise.

The weak classifiers found in this way are used to compose a strong classifier.

5.2.3 The Attentional Cascade


A cascade of classifiers is used, which achieves increased detection performance while
radically reducing the amount of computation. Simpler classifiers are used to reject the
majority of sub-windows before more complex classifiers are used to achieve low false
alarm rates.
Stages in the cascade are constructed by training classifiers using Gentle AdaBoost. A
positive result from an earlier strong classifier triggers the next strong classifier, which has
been adjusted to achieve a higher detection rate than the previous one. A negative result is
immediately rejected at any stage of the cascade structure.

Figure 29 Schematic depiction of the detection cascade.

Because an overwhelming majority of input sub-windows are negative, this method can
significantly reduce the number of sub-windows that are processed by the more complex
classifiers.

5.2.4 Training Datasets


Because this is an appearance-based detection scheme, large training sets of images are
needed in order to capture the variability of facial appearances. Some training examples are
shown in Figure 30.
As shown in the figure below, the training set also includes rotated face examples to
enable the detection of rotated faces. Because rotation by multiples of 90° can be handled by
rotating the LBP operator, only ±18°, ±12° and ±6° rotated face examples are added to the
training set. With the above training set, face detection works well; it can detect faces in
images with a low false alarm rate. But it cannot detect faces in low-light conditions or
dark-skinned faces. To solve this problem, there are two approaches: one is image
preprocessing and the other is enhancing the training set. One could estimate the illumination
condition of an image and enhance its quality, but such methods are computationally
expensive and not feasible on mobile products. Therefore, to enable the system to detect
faces in low-light conditions, faces in various illuminations and dark-skinned faces are also
added to the training set.

Figure 30 Example face images from the training set with rotation.

Figure: Example face images from the training set with various illumination conditions.

In this work, 57,134 face images were used as the positive training set. To collect non-face
patterns, the "bootstrap" strategy was used over five iterations. First, the system extracts
200 patterns per image from a set of false-alarm-causing images which do not contain faces.
Because most false alarms come from trees, characters, handwriting and fabrics, these kinds
of images were used as the false-alarm-causing image set. Then, at the end of each training
iteration, the face detector is run, all the non-face patterns that were wrongly classified as
faces are collected, and they are used for training. Negative training examples are then
extracted from the false-alarm-causing image set again; to get more efficient negative
examples, the classifiers found in the previous iteration are used, and the negative examples
that were misclassified as faces are chosen.

5.3 FACE RECOGNITION ALGORITHM:


5.3.1 CNN Importance:
Facial recognition is now popular and widely used for person identification. Facial
characteristics differ from person to person. A camera is the only device needed for face
recognition, so it provides inexpensive and reliable personal identification applicable in many
fields. An efficient face recognition system provides fast and accurate user identification and
authentication. It has an important role in many applications such as government use,
commercial use, security gates, attendance management, smart cards, access control and
biometrics.
Face detection: the process of finding a human face in an image or video.
Face authentication: a way of identifying or confirming an individual's identity based on
facial features.


5.3.2 CONVOLUTIONAL NEURAL NETWORK:


A convolutional neural network is a neural network comprised of convolution layers,
which do the computational heavy lifting by performing convolution. Convolution is a
mathematical operation on two functions that produces a third function. It is to be noted that
the image is not represented as pixels but as numbers representing the pixel values; in terms
of what the computer sees, there is simply a matrix of numbers, and the convolution
operation takes place on these numbers. We utilize both fully-connected layers and
convolutional layers. In a fully-connected layer, every node is connected to every other
neuron; these are the layers used in standard feed-forward neural networks. Unlike the
fully-connected layers, convolutional layers are not connected to every neuron; connections
are made across localized regions. A sliding "window" is moved across the image. The size
of this window is known as the kernel or the filter. Filters help recognise patterns in the data.
For each filter, there are two main properties to consider: padding and stride. Stride
represents the step of the convolution operation, that is, the number of pixels the window
moves across. Padding is the addition of null pixels to increase the size of an image, where
null pixels are pixels with a value of 0. If we have a 5x5 image and a window with a 3x3
filter, a stride of 1 and no padding, the output of the convolutional layer will be a 3x3 image.
This condensation of a feature map is known as pooling. In this case, "max pooling" is
utilized: the maximum value is taken from each sliding window and placed in the output
matrix. Convolution is very effective in image recognition and classification compared to a
feed-forward neural network, because convolution allows reducing the number of parameters
in a network and takes advantage of spatial locality. Further, convolutional neural networks
introduce the concept of pooling to reduce the number of parameters by downsampling.
Applications of convolutional neural networks include image recognition, self-driving cars
and robotics. CNNs are popularly used with videos, 2D images, spectrograms and Synthetic
Aperture Radar. A small worked example of the output size follows.
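The 5x5/3x3 example above follows from the standard output-size formula; a one-line check:

def conv_output_size(n, k, stride=1, padding=0):
    # output side length for an n x n input and a k x k kernel
    return (n + 2 * padding - k) // stride + 1

print(conv_output_size(5, 3))  # 5x5 image, 3x3 filter, stride 1, no padding -> 3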

5.3.3 CNN Working:

An image is nothing but a 2-dimensional array. Before training on images, we need to
process the dataset; by processing the dataset, we mean converting each image into a NumPy
array, where each row represents an image. The dataset is then completely ready to be
trained on by the model.

Neural networks are built up of layers. Each layer of a neural network contains nodes
which calculate values based on characteristics or weights. The activation functions are
ReLU for hidden layers and either sigmoid or softmax for output layers.

The convolution layer applies a fundamental mathematical operation that is highly
useful for detecting the features of an image. In this layer we pass a kernel, i.e., an n×n
matrix, over the image pixels. The kernel has a value in each of its cells; processing it with
the original image helps produce characteristics that help identify images of the same object
while predicting.

Pooling:
The max pooling operation involves sliding a 2-dimensional filter over each channel of
the feature map and extracting the maximum features from the image. The pooling layer is
used to reduce the dimension of the feature map: it reduces the number of parameters to learn
and the amount of computation to perform. The pooling layer summarises the features present
in a region of the feature map generated by the convolution layer.

Average pooling is a pooling operation that calculates the average value for patches of
a feature map and uses it to create a downsampled (pooled) feature map. It is usually used
after a convolutional layer. It adds a small amount of translation invariance, meaning that
translating the image by a small amount does not significantly affect the values of most
pooled outputs. It extracts features more smoothly than max pooling, whereas max pooling
extracts more pronounced features like edges. A small demonstration follows.
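A small NumPy demonstration of 2x2 max and average pooling with stride 2 (a toy example, not from the report):

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 1, 5]], dtype=float)

# split the 4x4 feature map into 2x2 blocks, then reduce each block
blocks = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)
print(blocks.max(axis=2))   # max pooling:     [[6. 4.] [7. 9.]]
print(blocks.mean(axis=2))  # average pooling: [[3.75 2.25] [4.   4.  ]]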

Flattening:
The flattening operation is performed when we have a multidimensional output and
want to convert it into a single long continuous linear vector. The flattened matrix is fed as
input to the fully connected layer.

Fully Connected Layer

This is a standard feed-forward neural network formed by the last few layers. Once the
image is convolved, pooled and flattened, the result is a vector. This vector acts as the input
layer for an ANN, which then works normally to detect the image. Random weights are
assigned to each synapse; the input layer is weight-adjusted and put into an activation
function. Every single neuron has a connection to every single neuron in the next layer. The
output is then compared with the true values, and the error generated is back-propagated,
i.e., the weights are re-adjusted and all the processes repeated. This is done until the error is
reduced or the correct output is obtained. One of the greatest challenges of developing CNNs
is adjusting the weights of the individual neurons to extract the right features from images.
The process of adjusting these weights to get the correct output is called training.


5.3.4 CNN Algorithm Flow Chart:

Start → load data from the prepared dataset → pass the training samples through stacked
convolutional and max pooling layers (the CNN-based feature learning network, performing
feature extraction) → obtain the training sample features → construct the support vector
regression machine → rotating machine status evaluation → End.


Steps of CNN Algorithm:


A summarized CNN algorithm involves the following steps:
Step 1: Choose a Dataset

Choose a dataset of interest, or create your own image dataset for solving your own image
classification problem. An easy place to find a dataset is kaggle.com.

Step 2: Prepare the Dataset for Training

Preparing our dataset for training involves assigning paths, creating categories (labels), and
resizing our images.

Step 3: Create Training Data

The training data is an array that contains the image pixel values and the index of each
image's label in the CATEGORIES list.

Step 4: Shuffle the Dataset

Step 5: Assign Labels and Features

The shape of both lists will be used in classification using the neural network.

Step 6: Normalise X and convert labels to categorical data

Step 7: Split X and Y for use in the CNN
Step 8: Define, compile and train the CNN model
Step 9: Accuracy and score of the model


CHAPTER - 6


6. SOURCE CODE
import numpy as np
import pandas as pd
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
import matplotlib.pyplot as plt
import scipy
df = pd.read_csv('fer2013.csv')
df.head()
df.info()
num_classes = 7
width = 48
height = 48
emotion_labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
classes=np.array(("Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"))
df.Usage.value_counts()
k = np.array(list(map(int,df.iloc[0,1].split(" "))),dtype='uint8').reshape((48,48))
k.shape
X_train = []
y_train = []
X_test = []
y_test = []
for index, row in df.iterrows():
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k))
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])
X_train[0]
plt.imshow(np.array(X_train[0], dtype='uint8').reshape(48, 48), cmap='gray')
X_train = np.array(X_train, dtype = 'uint8')
y_train = np.array(y_train, dtype = 'uint8')
X_test = np.array(X_test, dtype = 'uint8')
y_test = np.array(y_test, dtype = 'uint8')
X_train = X_train.reshape(X_train.shape[0], 48, 48, 1)
X_test = X_test.reshape(X_test.shape[0], 48, 48, 1)
X_train.shape
import keras
from tensorflow.keras.utils import to_categorical
y_train= to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    fill_mode='nearest')

testgen = ImageDataGenerator(
    rescale=1./255
)
datagen.fit(X_train)
batch_size = 64
train_flow = datagen.flow(X_train, y_train, batch_size=batch_size)
test_flow = testgen.flow(X_test, y_test, batch_size=batch_size)
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9):
    for i in range(0, 9):
        plt.subplot(330 + 1 + i)
        plt.imshow(X_batch[i].reshape(48, 48), cmap=plt.get_cmap('gray'))
        plt.axis('off')
    plt.show()
    break
from keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Sequential
from keras.utils.vis_utils import plot_model
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Dropout, BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D
from keras.layers.merge import concatenate
from tensorflow.keras.optimizers import Adam, SGD
from keras.regularizers import l1, l2
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix

def FER_Model(input_shape=(48, 48, 1)):
    visible = Input(shape=input_shape, name='input')
    num_classes = 7

    # Block 1: two 64-filter convolutions, each followed by batch normalisation
    conv1_1 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name='conv1_1')(visible)
    conv1_1 = BatchNormalization()(conv1_1)
    conv1_2 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name='conv1_2')(conv1_1)
    conv1_2 = BatchNormalization()(conv1_2)
    pool1_1 = MaxPooling2D(pool_size=(2, 2), name='pool1_1')(conv1_2)
    drop1_1 = Dropout(0.3, name='drop1_1')(pool1_1)

    # Block 2: three 128-filter convolutions
    conv2_1 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_1')(drop1_1)
    conv2_1 = BatchNormalization()(conv2_1)
    conv2_2 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_2')(conv2_1)
    conv2_2 = BatchNormalization()(conv2_2)
    conv2_3 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_3')(conv2_2)
    conv2_3 = BatchNormalization()(conv2_3)  # fixed: was mistakenly assigned to conv2_2
    pool2_1 = MaxPooling2D(pool_size=(2, 2), name='pool2_1')(conv2_3)
    drop2_1 = Dropout(0.3, name='drop2_1')(pool2_1)

    # Block 3: four 256-filter convolutions
    conv3_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_1')(drop2_1)
    conv3_1 = BatchNormalization()(conv3_1)
    conv3_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_2')(conv3_1)
    conv3_2 = BatchNormalization()(conv3_2)
    conv3_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_3')(conv3_2)
    conv3_3 = BatchNormalization()(conv3_3)
    conv3_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_4')(conv3_3)
    conv3_4 = BatchNormalization()(conv3_4)
    pool3_1 = MaxPooling2D(pool_size=(2, 2), name='pool3_1')(conv3_4)
    drop3_1 = Dropout(0.3, name='drop3_1')(pool3_1)

    # Block 4: four more 256-filter convolutions
    conv4_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_1')(drop3_1)
    conv4_1 = BatchNormalization()(conv4_1)
    conv4_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_2')(conv4_1)
    conv4_2 = BatchNormalization()(conv4_2)
    conv4_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_3')(conv4_2)
    conv4_3 = BatchNormalization()(conv4_3)
    conv4_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_4')(conv4_3)
    conv4_4 = BatchNormalization()(conv4_4)
    pool4_1 = MaxPooling2D(pool_size=(2, 2), name='pool4_1')(conv4_4)
    drop4_1 = Dropout(0.3, name='drop4_1')(pool4_1)

    # Block 5: four 512-filter convolutions
    conv5_1 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_1')(drop4_1)
    conv5_1 = BatchNormalization()(conv5_1)
    conv5_2 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_2')(conv5_1)
    conv5_2 = BatchNormalization()(conv5_2)
    conv5_3 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_3')(conv5_2)
    conv5_3 = BatchNormalization()(conv5_3)
    conv5_4 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_4')(conv5_3)
    conv5_4 = BatchNormalization()(conv5_4)  # fixed: was re-normalising conv5_3
    pool5_1 = MaxPooling2D(pool_size=(2, 2), name='pool5_1')(conv5_4)
    drop5_1 = Dropout(0.3, name='drop5_1')(pool5_1)

    # Classifier head: flatten and a 7-way softmax
    flatten = Flatten(name='flatten')(drop5_1)
    output = Dense(num_classes, activation='softmax', name='output')(flatten)

    model = Model(inputs=visible, outputs=output)
    print(model.summary())
    return model

model = FER_Model()


opt = Adam(lr=0.0001, decay=1e-6)  # older Keras API; newer versions use learning_rate=
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

from keras.callbacks import ModelCheckpoint

# Keep only the weights that achieve the lowest validation loss
filepath = "weights_min_loss.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                             save_best_only=True, mode='min')
callbacks_list = [checkpoint]

num_epochs = 50
history = model.fit(train_flow,
                    steps_per_epoch=len(X_train) / batch_size,
                    epochs=num_epochs,
                    verbose=2,
                    callbacks=callbacks_list,
                    validation_data=test_flow,
                    validation_steps=len(X_test) / batch_size)

# Plot training vs validation loss and accuracy
train_loss = history.history['loss']
val_loss = history.history['val_loss']
train_acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

epochs = range(len(train_acc))

plt.plot(epochs, train_loss, 'r', label='train_loss')
plt.plot(epochs, val_loss, 'b', label='val_loss')
plt.title('train_loss vs val_loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.figure()

plt.plot(epochs, train_acc, 'r', label='train_acc')
plt.plot(epochs, val_acc, 'b', label='val_acc')
plt.title('train_acc vs val_acc')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.figure()

model.save('./working/Fer2013.h5')

# evaluate() returns [loss, accuracy] on the rescaled test set
score = model.evaluate(X_test / 255., y_test)
print("Test Loss: " + str(score[0]))
print("Test Acc: " + str(score[1]))
import itertools

def plot_confusion_matrix(y_test, y_pred, classes,
                          normalize=False,
                          title='Unnormalized confusion matrix',
                          cmap=plt.cm.Blues):
    cm = confusion_matrix(y_test, y_pred)

    if normalize:
        # Normalise each row so entries are per-class prediction rates
        cm = np.round(cm.astype('float') / cm.sum(axis=1)[:, np.newaxis], 2)

    np.set_printoptions(precision=2)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Print cell values in a contrasting colour
    thresh = cm.min() + (cm.max() - cm.min()) / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True expression')
    plt.xlabel('Predicted expression')
    plt.show()

# Predict on the test set and compare against the true labels
y_pred_ = model.predict(X_test / 255., verbose=1)
y_pred = np.argmax(y_pred_, axis=1)
t_te = np.argmax(y_test, axis=1)
fig = plot_confusion_matrix(y_test=t_te, y_pred=y_pred,
                            classes=classes,
                            normalize=True,
                            cmap=plt.cm.Greys,
                            title='Average accuracy: ' + str(np.sum(y_pred == t_te) / len(t_te)) + '\n')

# Serialise the model architecture to JSON and the weights to HDF5
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")

# Real-time expression recognition using the webcam

import cv2
from tensorflow.keras.models import load_model, model_from_json
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing import image

# Rebuild the model from JSON and load the trained weights
model = model_from_json(open("model.json", "r").read())
model.load_weights('model.h5')
# model = load_model('static/Fer2013.h5')

face_haar_cascade = cv2.CascadeClassifier('C:/Users/Sai Gopesh/.spyder-py3/haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    height, width, channel = frame.shape

    # Darkened banner strip across the top sixth of the frame
    sub_img = frame[0:int(height/6), 0:int(width)]
    black_rect = np.ones(sub_img.shape, dtype=np.uint8) * 0
    res = cv2.addWeighted(sub_img, 0.77, black_rect, 0.23, 0)

    FONT = cv2.FONT_HERSHEY_SIMPLEX
    FONT_SCALE = 0.8
    FONT_THICKNESS = 2
    lable_color = (10, 10, 255)
    lable = "Emotion Detection made by Gopesh"
    lable_dimension = cv2.getTextSize(lable, FONT, FONT_SCALE, FONT_THICKNESS)[0]
    textX = int((res.shape[1] - lable_dimension[0]) / 2)
    textY = int((res.shape[0] + lable_dimension[1]) / 2)
    cv2.putText(res, lable, (textX, textY), FONT, FONT_SCALE, (0, 0, 0), FONT_THICKNESS)

    # Detect faces on the grayscale frame
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x+w, y+h), color=(255, 0, 0), thickness=2)
            # Crop the face region (with a small margin), resize to 48x48 and rescale
            roi_gray = gray_image[y-5:y+h+5, x-5:x+w+5]
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
            cv2.putText(res, "Sentiment: {}".format(emotion_prediction), (0, textY+22+5),
                        FONT, 0.7, lable_color, 2)
            lable_violation = 'Confidence: {}'.format(str(np.round(np.max(predictions[0])*100, 1)) + "%")
            violation_text_dimension = cv2.getTextSize(lable_violation, FONT, FONT_SCALE, FONT_THICKNESS)[0]
            violation_x_axis = int(res.shape[1] - violation_text_dimension[0])
            cv2.putText(res, lable_violation, (violation_x_axis, textY+22+5), FONT, 0.7,
                        lable_color, 2)
    except:
        pass  # ignore frames where the face crop or resize fails

    frame[0:int(height/6), 0:int(width)] = res
    cv2.imshow('frame', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()  # fixed: the call was missing its parentheses
# Expression recognition on a static input image

import numpy as np
import cv2
from tensorflow.keras.models import load_model, model_from_json
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing import image

# Rebuild the model from JSON and load the trained weights
model = model_from_json(open("model.json", "r").read())
model.load_weights('model.h5')

face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
frame = cv2.imread("expr2.jpg")
height, width, channel = frame.shape

# Darkened banner strip across the top sixth of the image
sub_img = frame[0:int(height/6), 0:int(width)]
black_rect = np.ones(sub_img.shape, dtype=np.uint8) * 0
res = cv2.addWeighted(sub_img, 0.77, black_rect, 0.23, 0)

FONT = cv2.FONT_HERSHEY_SIMPLEX
FONT_SCALE = 0.8
FONT_THICKNESS = 2
lable_color = (10, 10, 255)
lable = "Emotion Detection made by Batch 1"
lable_dimension = cv2.getTextSize(lable, FONT, FONT_SCALE, FONT_THICKNESS)[0]
textX = int((res.shape[1] - lable_dimension[0]) / 2)
textY = int((res.shape[0] + lable_dimension[1]) / 2)
cv2.putText(res, lable, (textX, textY), FONT, FONT_SCALE, (0, 0, 0), FONT_THICKNESS)

# Detect faces on the grayscale image
gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = face_haar_cascade.detectMultiScale(gray_image)
try:
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, pt1=(x, y), pt2=(x+w, y+h), color=(255, 0, 0), thickness=2)
        # Crop the face region (with a small margin), resize to 48x48 and rescale
        roi_gray = gray_image[y-5:y+h+5, x-5:x+w+5]
        roi_gray = cv2.resize(roi_gray, (48, 48))
        image_pixels = img_to_array(roi_gray)
        image_pixels = np.expand_dims(image_pixels, axis=0)
        image_pixels /= 255
        predictions = model.predict(image_pixels)
        max_index = np.argmax(predictions[0])
        emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
        emotion_prediction = emotion_detection[max_index]
        cv2.putText(res, "Sentiment: {}".format(emotion_prediction), (0, textY+22+5),
                    FONT, 0.7, lable_color, 2)
        lable_violation = 'Confidence: {}'.format(str(np.round(np.max(predictions[0])*100, 1)) + "%")
        violation_text_dimension = cv2.getTextSize(lable_violation, FONT, FONT_SCALE, FONT_THICKNESS)[0]
        violation_x_axis = int(res.shape[1] - violation_text_dimension[0])
        cv2.putText(res, lable_violation, (violation_x_axis, textY+22+5), FONT, 0.7,
                    lable_color, 2)
except:
    pass  # ignore images where the face crop or resize fails

frame[0:int(height/6), 0:int(width)] = res
cv2.imshow('frame', frame)
cv2.waitKey(0)  # keep the window open until a key is pressed


CHAPTER – 7


7. RESULTS:

Figure 31 Neutral Expression using webcam

Figure 32 Happy Expression using webcam


Figure 33 Surprise Expression using webcam

Figure 34 Sad Expression using webcam


Figure 35 Angry Expression using webcam

Figure 36 Fear Expression using input data


Figure 37 Surprise Expression using input data

Figure 38 Sad Expression using input data


Figure 39 Disgust Expression using input data

Figure 40 Neutral Expression using input data


Figure 41 Angry Expression using input data

Figure 42 Happy Expression using input data


CHAPTER - 8


8. Conclusion & Future Scope:

In this project, we carried out face detection using both Haar cascade and LBP cascade
classifiers. Based on the results of the two detection algorithms, we chose the Haar cascade
classifier for face detection. After face detection, we moved on to facial expression
recognition using a Convolutional Neural Network trained on the FER-2013 dataset. We supplied
inputs both as static images and through a webcam feed and compared the results: static input
images were recognised more accurately than webcam frames. The training accuracy achieved was
0.75 after 50 epochs. The relatively small amount of data for emotions such as "disgust" makes
that class difficult for the model to predict; improving its recognition is the main direction
for future work, and additional training samples for this harder class will certainly be
required in order to perfect such a system. With some further work, the model could well be
deployed in real-life applications for effective use in domains such as healthcare, marketing,
and the video game industry.
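
As a pointer for this future work, one common mitigation for such class imbalance is to weight
the training loss by inverse class frequency. The sketch below is an illustrative assumption,
not something implemented in this project: it computes such weights from the one-hot y_train of
Chapter 6 and shows how they could be passed to Keras.

import numpy as np

# Recover integer labels from the one-hot encoded y_train (see Chapter 6)
labels = np.argmax(y_train, axis=1)
counts = np.bincount(labels, minlength=7)

# Inverse-frequency weights: rare classes such as "disgust" get larger weights
class_weight = {i: len(labels) / (7 * c) for i, c in enumerate(counts)}

# Hypothetical usage with the existing training call:
# model.fit(train_flow, epochs=num_epochs, class_weight=class_weight, ...)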


CHAPTER – 9


REFERENCES:
• "Facial expression detection by combining deep learning neural networks", The 12th International Symposium on Advanced Topics in Electrical Engineering, March 25-27, 2021, Bucharest, Romania.
• B. C. Ko, "A brief review of facial emotion recognition based on visual information", Sensors, vol. 18, no. 2, 401, Jan. 2018.
• D. Mehta, M. F. H. Siddiqui, and A. Y. Javaid, "Recognition of emotion intensities using machine learning algorithms: A comparative study", Sensors, vol. 19, no. 8, 1897, Apr. 2019.
• "Facial emotion recognition using deep learning: review and insights", The 2nd International Workshop on the Future of Internet of Everything (FIoE), August 9-12, 2020, Leuven, Belgium.
• A. De Souza, A. Lopes, E. Aguiar, and T. Oliveira-Santos, "Facial expression recognition with convolutional neural networks: coping with few data and the training sample order", Pattern Recognition, vol. 61, pp. 610-628, Jan. 2016.
• J. D. Bodapati and N. Veeranjaneyulu, "Facial Emotion Recognition Using Deep CNN Based Features".
• A. Saravanan, G. Perichetla, and K. S. Gayathri, "Facial Emotion Recognition using Convolutional Neural Networks".
• https://www.geeksforgeeks.org/opencv-python-tutorial/
• https://www.analyticsvidhya.com/blog/2021/11/facial-emotion-detection-using-cnn/
• https://en.wikipedia.org/wiki/CNN
• https://keras.io/about/
• https://opencv.org/about/
