


Human Emotion Detection using Machine
Learning Techniques
Punidha A, Inba. S, Pavithra. K.S., Ameer Shathali. M, Athibarasakthi. M
Coimbatore Institute of Technology, Coimbatore

Abstract---Image processing is a method of converting an image into digital form and performing operations on it, either to enhance the image or to extract useful information from it. Facial expressions are a nonverbal form of communication. There are eight universal facial expressions: neutral, happiness, sadness, anger, contempt, disgust, fear, and surprise, so it is important to detect these emotions on the face. We propose a monitoring system for elderly people based on recognizing emotions from video images. In the proposed system, video analysis technology is used to monitor elders' living conditions in real time. In case of emergency, the system alerts their relatives and children by sending a message.

Keywords---facial emotion recognition; Local Binary Pattern Histogram (LBPH) algorithm; Convolutional Neural Networks (CNN)

I. INTRODUCTION
A facial expression can be described as the movement of the muscles beneath the skin of the face. Facial expressions are a form of nonverbal communication: the human face can convey countless emotions without a single word being said. Unlike some other forms of nonverbal communication, facial expressions are universal and can be understood by all people; the expressions for happiness, sadness, anger, surprise, fear, and disgust are the same across different cultures.

The movements of these muscles convey the emotions of an individual to those who see them. They are the means through which social information is conveyed between humans, but they also occur in most other mammals and some other animal species. Humans can adopt a facial expression voluntarily or involuntarily; involuntary expressions are those people make when they are ill, hurt, or feeling uncomfortable.

Fig. 1: Example of expressions for the six basic emotions

Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. Affective computing technologies can sense the emotions of the user through devices such as sensors, microphones, and cameras, and respond by performing specific, predefined product/service features. One way to look at affective computing is as human-computer interaction in which a device has the ability to detect and respond to the emotions exhibited by the user.
Emotion recognition can help in monitoring the emotional health of users and in screening for emotion-related physiological and mental disease. Emotions are expressed not only through psychological and behavioral performance but also through a series of physiological changes. These physiological changes are not consciously controlled, so physiological signals can reflect the true feelings of subjects. Several kinds of physiological signals have been successfully applied to emotion recognition, including the electrocardiogram (ECG), galvanic skin response (GSR), electroencephalogram (EEG), respiration (RSP), and blood volume pulse (BVP). These signals, most importantly the ECG, reflect the relationship between the heartbeat and emotional changes, and researchers have done much work on ECG-based emotion recognition. Heart rate variability (HRV), which is extracted from the ECG, is considered one of the important parameters for emotion recognition.

Our aim is to work in real time, detecting emotions from images captured by a live webcam. The webcam runs a video, and faces are detected in the frames according to facial landmarks covering the eyes, eyebrows, nose, mouth, and corners of the face. Features are then extracted from these facial landmarks (dots) and used to detect the facial emotions. Once the emotions are identified, we look for any discomfort in them through image processing techniques.
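As a rough illustration of this pipeline, the sketch below shows the capture-and-detect loop in Python with OpenCV. Only the face detector is a real OpenCV component; detect_emotion and alert_caregiver are hypothetical stand-ins for the trained classifier and the messaging step described above.

```python
import cv2

def detect_emotion(face):        # hypothetical stand-in for the trained model
    return "neutral"

def alert_caregiver(emotion):    # hypothetical stand-in for the message alert
    print("ALERT:", emotion)

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)                          # live webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        emotion = detect_emotion(gray[y:y+h, x:x+w])
        if emotion in ("sadness", "anger", "fear"):  # possible discomfort
            alert_caregiver(emotion)
    if cv2.waitKey(1) & 0xFF == ord("q"):          # press q to stop
        break
cap.release()
```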
II. BACKGROUND

A. MACHINE LEARNING

Machine Learning is the study of algorithms and models that a computer system can use to perform a specific task without explicit external instructions. Machine learning algorithms build a mathematical model from sample data, known as training data, which can then be used to make predictions or decisions without external factors affecting the performance of the task.

Machine learning can be classified into several categories. In supervised learning, the algorithm builds a mathematical model from a data set that contains both the inputs and the required outputs. Classification and regression algorithms come under supervised learning: classification algorithms are used when we want to restrict the output to a limited set of values, while regression algorithms are used for continuous output, where the value lies within a particular range. Semi-supervised learning algorithms create mathematical models from incomplete training data, in which a portion of the sample inputs lack labels. In unsupervised learning, the algorithm builds a mathematical model from a set of input data with no desired output labels at all.

B. IMAGE RECOGNITION

Image recognition refers to technology that identifies places, logos, people, objects, buildings, and other things in images. It is a part of computer vision: the process of identifying and detecting an object in a digital video or image. Computer vision comprises methods for gathering, processing, and analyzing data from video or static images of the real world. The data from such sources is high-dimensional, and analyzing it produces either numerical or symbolic information in the form of decisions. Apart from image recognition, computer vision also includes event detection, learning, object recognition, video tracking, and image reconstruction.

The human eye sees an image as a set of signals that are processed by the visual cortex in the brain; image recognition tries to recreate this process. A computer perceives an image as either a raster or a vector image. Raster images are sequences of pixels with discrete numerical values for their colors, while vector images are sets of color-annotated polygons. To analyze an image, its geometric encoding is transformed into constructs that depict physical features and objects, and these constructs are then logically analyzed by the computer. Data organization involves classification and feature extraction. The first step in image classification is to simplify the image by extracting only the important information that is needed and leaving out the rest.

The second step is to build a predictive model, for which a classification algorithm can be used. Before a classification algorithm can work, we need to train it by showing it thousands of subject and non-subject images relevant to the project. To build the predictive model we make use of neural networks: a neural network is a system of hardware and software, loosely modeled on the brain, that estimates functions from a huge number of unknown inputs. There are numerous algorithms for image classification, such as support vector machines (SVM), face landmark estimation, K-nearest neighbors (KNN), and logistic regression.

The third step is image recognition. The image data, both training and test, is organized; training data differs from test data in that duplicates between the two are removed. This data is fed into the model, which in turn recognizes images. We then train a classifier that takes measurements from a new test image and reports the closest match with the subject. This classification takes only milliseconds, and its result is either subject or non-subject.
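A minimal sketch of the second and third steps with scikit-learn follows. The random arrays stand in for real flattened face images, and the linear SVM is just one of the algorithms listed above; both are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 64))        # stand-in for 200 flattened face images
y = rng.integers(0, 2, 200)      # 1 = subject, 0 = non-subject

# train/test split with no overlap between the two sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel="linear").fit(X_train, y_train)   # build the predictive model
print(clf.predict(X_test[:1]))   # classifying one new image takes milliseconds
```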
C. FEATURE EXTRACTION

Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced for further processing. Features define the behavior of an image; basically, a feature is a pattern found in an image, such as a point or an edge. Feature extraction is useful when you need to reduce the number of resources needed for processing while retaining the important and relevant information, and it reduces the amount of redundant data. Image preprocessing techniques such as thresholding, resizing, normalization, and binarization are applied to the sampled image, and the features are extracted afterwards. Feature extraction techniques are then applied to obtain features for classifying and recognizing images.

ORB and the color gradient histogram are two examples of feature detection algorithms. The ORB (Oriented FAST and Rotated BRIEF) algorithm is a combination of FAST and BRIEF, and it efficiently finds the corners of an image. The FAST component identifies features as areas of the image with a sharp contrast in brightness: if more than 8 of the pixels surrounding a given pixel are brighter or darker than that pixel, the spot is flagged as a feature. BRIEF then encodes the extracted points as binary feature vectors. The color gradient histogram method simply measures the proportions of red, green, and blue values in an image and finds images with similar color proportions; it can be tuned by binning the values.

Corner detection is a method used by computer systems to extract features, which are then used to infer the contents of an image. Applications of corner detection include motion detection, image registration, video tracking, image mosaicing, 3D modelling, and object recognition. The Harris corner detector and the Shi-Tomasi corner detector are widely used. The Harris corner detector determines which windows produce very large variations in intensity when moved in both the X and Y directions (i.e., along the gradients); for each such window a score R is computed, a threshold is applied to the score, and the important corners are selected and marked. The Shi-Tomasi corner detector is almost similar to the Harris corner detector, but its score is calculated by the formula R = min(λ1, λ2), where λ1 and λ2 are the eigenvalues of the window's gradient covariance matrix; this makes it possible to keep only the top N corners, which is useful when we do not want to detect each and every corner. If R is greater than the threshold, the window is classified as a corner.
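Both detectors are available in OpenCV; the sketch below shows them side by side. The file name "face.png" and the parameter values are placeholder choices.

```python
import cv2
import numpy as np

img = cv2.imread("face.png")                     # placeholder input image
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# Harris: threshold the response R to keep only the important corners
harris = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
img[harris > 0.01 * harris.max()] = (0, 0, 255)  # mark corners in red

# Shi-Tomasi (R = min(lambda1, lambda2)): ask directly for the top N corners
corners = cv2.goodFeaturesToTrack(gray, maxCorners=25,
                                  qualityLevel=0.01, minDistance=10)
```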
The Viola-Jones algorithm is used for detecting facial features because it does not consume much time while giving good accuracy [1]. The Viola-Jones detection framework identifies faces, or features of a face, using simple features known as Haar-like features. The process involves passing feature boxes over the image and computing the difference of the summed pixel values between adjacent regions. The difference is then compared with a threshold that indicates whether an object is considered detected or not. This requires thresholds that have been trained in advance for the different feature boxes and features [2].

Consider the image below (Fig. 2). The top row shows two good features. The first selected feature focuses on the property that the region of the eyes is darker than the region of the nose and cheeks; the second focuses on the property that the eyes are darker than the bridge of the nose. The same windows applied to the cheeks or anywhere else are irrelevant.

Fig. 2: Feature boxes

During the detection phase, a window of the target size is moved over the input image, and for each subsection of the image the Haar features are calculated. Different features yield different values, and each difference is compared to a threshold that separates objects from non-objects. Each Haar feature is only a "weak classifier", because it detects only slightly better than random guessing. A large number of Haar features are therefore required to distinguish an object from a non-object with sufficient accuracy, and they are organized into cascade classifiers to form a strong classifier.

The cascade classifier consists of a series of stages, each containing weak learners. Each stage is trained using a technique called boosting, which builds a highly accurate classifier by taking a weighted average of the decisions made by the weak learners.

In each stage, the classifier labels the region at the current location of the sliding window as either positive or negative: positive means an object was found, negative means no object was found. If the label is negative, the classification of that region is complete and the detector slides the window to the next location. If the label is positive, the classifier passes the region on to the next stage. The detector reports an object at the current window location only when the final stage classifies the region as positive.
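The stage-by-stage rejection just described can be sketched in a few lines; the stage structure below is a toy stand-in for illustration, not OpenCV's internal layout.

```python
def cascade_classify(window, stages):
    """stages: list of (weak_learners, threshold) pairs, where each weak
    learner is a (weight, haar_feature_fn) pair produced by boosting."""
    for weak_learners, threshold in stages:
        # boosted stage decision: weighted vote of the weak classifiers
        score = sum(w * feature(window) for w, feature in weak_learners)
        if score < threshold:
            return False          # negative: stop early, slide the window on
    return True                   # positive at every stage: object found
```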

III. LEARNING METHODS

A. LOCAL BINARY PATTERN HISTOGRAM

Local Binary Pattern (LBP) labels the pixels of an image by thresholding the neighborhood of each pixel and treating the result as a binary number. It has been found that combining LBP with the Histogram of Oriented Gradients descriptor improves detection performance. LBPH uses four parameters:

Radius: the radius used to build the circular local binary pattern, representing the distance around the central pixel. It is usually set to 1.

Neighbors: the number of sample points used to build the circular local binary pattern. The more sample points included, the higher the computational cost. It is usually set to 8.

Grid X: the number of cells in the horizontal direction. The more cells, the finer the grid and the higher the dimensionality of the resulting feature vector. It is usually set to 8.

Grid Y: the number of cells in the vertical direction. The more cells, the finer the grid and the higher the dimensionality of the resulting feature vector. It is usually set to 8.

To train the algorithm we need a dataset with several face images, and we need to assign an ID to each image; images of the same person must have the same ID. The first step of LBP is to construct an intermediate image that represents the original image better.

Given a grayscale image, we take a 3x3-pixel part of it: a matrix holding the intensity of each pixel (0-255). The central value of the matrix is taken as the threshold, and each neighbor of the central value is assigned a new binary value: 1 for values equal to or higher than the threshold and 0 for values lower than the threshold.

The matrix now contains only binary values (the central value is ignored). These binary values are concatenated, position by position, into a new binary number, which is converted to a decimal value and set as the new central value. At the end of the LBP procedure we obtain a new image that captures the characteristics of the original image better.
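This intermediate-image step translates almost line for line into NumPy; the following is a direct (unoptimized) sketch of it.

```python
import numpy as np

def lbp_image(gray):
    """Replace each interior pixel with the decimal value of its 8-bit
    neighborhood pattern: 1 where the neighbor >= center, else 0."""
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    # neighbors read clockwise, starting from the top-left corner
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            center = gray[i, j]
            bits = [int(gray[i+di, j+dj] >= center) for di, dj in offsets]
            out[i, j] = int("".join(map(str, bits)), 2)   # binary -> decimal
    return out
```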

Using this intermediate image we can produce histograms. The parameters Grid X and Grid Y divide the image into multiple grids, and each histogram (one per grid cell) contains 256 positions (0~255) representing the number of occurrences of each pixel intensity. The per-cell histograms are then concatenated to form one larger histogram.

Each histogram created this way represents one image from the training dataset. Given an input image, we perform the same steps for the new image and create a histogram that represents it. To find the image that matches the input image, we just compare the histograms and return the image with the closest one. Several approaches can be used to compare histograms (i.e., to calculate the distance between two histograms), for example Euclidean distance, chi-square, or absolute value.

The output of the algorithm is thus the ID of the image with the closest histogram; the algorithm should also return the calculated distance.
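A sketch of the histogram step and the distance comparison, with Grid X = Grid Y = 8 as suggested above. OpenCV's contrib package also ships a ready-made implementation, cv2.face.LBPHFaceRecognizer_create, exposing exactly these radius, neighbors, and grid parameters.

```python
import numpy as np

def lbph_descriptor(lbp, grid_x=8, grid_y=8):
    """Concatenate one 256-bin histogram per grid cell of the LBP image."""
    h, w = lbp.shape
    hists = []
    for gy in range(grid_y):
        for gx in range(grid_x):
            cell = lbp[gy*h//grid_y:(gy+1)*h//grid_y,
                       gx*w//grid_x:(gx+1)*w//grid_x]
            hists.append(np.histogram(cell, bins=256, range=(0, 256))[0])
    return np.concatenate(hists)

def euclidean(h1, h2):
    return np.sqrt(((h1.astype(float) - h2.astype(float)) ** 2).sum())

# prediction: the training ID whose descriptor is closest to the input's
```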
B. CONVOLUTIONAL NEURAL NETWORKS

Convolutional neural networks are made up of neurons with learnable weights. Each neuron accepts several inputs, computes a weighted sum, and produces an output. Convolutional neural networks have a different architecture from regular neural networks. A regular neural network transforms its input by passing it through a series of hidden layers; every layer is made up of a set of neurons, each fully connected to all the neurons in the layer before, and the last fully connected layer, called the output layer, produces the result.

Convolutional neural networks are different. Their layers are organized in three dimensions: width, height, and depth, and the neurons in one layer connect only to a small region of the next layer rather than to all of its neurons. At the end, the final output is reduced to a single vector of probability scores, organized along the depth dimension.

Four concepts in a CNN are:

1. Convolution
2. ReLU
3. Pooling
4. Full connectedness

Feature Extraction: Convolution

Convolution in a CNN is performed on an input image using a filter or kernel. Filtering involves scanning the image from the top left to the right, moving down a step after covering the width, and repeating until the whole image has been scanned. The feature from the face of the individual is lined up with the image patch; each image pixel is multiplied by the corresponding feature pixel, and the values are added and divided by the total number of pixels in the feature.
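The scan-multiply-sum-divide operation just described, together with the ReLU activation introduced next, can be transcribed directly into NumPy as a sketch:

```python
import numpy as np

def convolve(image, feature):
    """Slide the feature over the image; at each position multiply
    element-wise, sum, and divide by the number of feature pixels."""
    fh, fw = feature.shape
    ih, iw = image.shape
    out = np.zeros((ih - fh + 1, iw - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i+fh, j:j+fw] * feature).sum() / feature.size
    return out

def relu(x):
    return np.maximum(x, 0)       # negatives become 0, positives unchanged
```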

Feature Extraction: Non-Linearity

After our filter has slid over the original image, the output is passed through another mathematical function, called an activation function. The activation function most commonly used for CNN feature extraction is ReLU, the Rectified Linear Unit. It simply converts all negative values to 0 and keeps the positive values the same; the aim is to remove all negative values from the convolution output.

Feature Extraction: Pooling

Once the feature maps are obtained from a convolution layer, we add a pooling (sub-sampling) layer. Like the convolutional layer, the pooling layer reduces the spatial size of the convolved feature, decreasing the computational power required to process the data through dimensionality reduction. Pooling is also useful for extracting dominant features that are rotationally and positionally invariant, which keeps the training of the model effective; it shortens the training time and controls over-fitting.

Classification: Fully Connected Layer (FC Layer)

Now that the input image has been converted into a suitable form, we flatten it into a column vector. The flattened output is fed to a feed-forward neural network, and back-propagation is applied at every training iteration. The model learns to distinguish between dominating and low-level features in images and classifies them using the softmax classification technique.

We now have all the pieces required to build a CNN: convolution, ReLU, and pooling. The output of the max pooling is fed into the classifier discussed above, which is usually a multi-layer perceptron. In CNNs these layers are usually used more than once, i.e., Convolution -> ReLU -> Max-Pool -> Convolution -> ReLU -> Max-Pool and so on.
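As a sketch of this stacked pattern in Keras (the backend choice is discussed in [9]): the 48x48 grayscale input size and the eight output classes are assumptions for illustration, not values given in the paper.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),            # assumed grayscale face crop
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                           # column vector for the FC part
    layers.Dense(128, activation="relu"),       # fully connected layer
    layers.Dense(8, activation="softmax"),      # one probability per emotion
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```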
CONCLUSION

The proposed model gives an estimated sentiment prediction for the subject based on the video information. The resulting output can be used in many situations: mental disorders and stress levels can be estimated, and in the case of "critical" sentiments the peers and family members of the subject can take action to encourage, motivate, and uplift the subject's emotional state, contributing to the subject's harmony and peace of mind. Such sentiment analysis models are therefore a requirement for shaping society into a better place.

REFERENCES

[1] Renuka S. Deshmukh, Shilpa Paygude, Vandana Jagtap, "Facial Emotion Recognition System through Machine Learning Approach".

[2] James Pao, "Emotion Detection Through Facial Feature Recognition".

[3] Renuka S. Deshmukh, Vandana Jagtap, Shilpa Paygude, "Facial Emotion Recognition System through Machine Learning Approach", 2017.

[4] Dongwei Lu, Zhiwei He, Xiao Li, Mingyu Gao, Yun Li, Ke Yin, "The Research of Elderly Care System Based on Video Image Processing Technology", 2017.

[5] Shivam Gupta, "Facial Emotion Recognition in Real-Time and Static Images", 2018.

[6] Ma Xiaoxi, Lin Weisi, Huang Dongyan, Dong Minghui, Haizhou Li, "Facial Emotion Recognition", 2017.

[7] Mostafa Mohammadpour, Hossein Khaliliardali, Mohammad M. AlyanNezhadi, Seyyed Mohammad R. Hashemi, "Facial Emotion Recognition using Deep Convolutional Networks", 2017.

[8] Dubey, M., Singh, P. L., "Automatic Emotion Recognition Using Facial Expression: A Review", International Research Journal of Engineering and Technology, 2016.

[9] https://stackoverflow.com/questions/42177658/how-to-switch-backend-with-keras-from-tensorflow-to-theano

[10] https://archive.ics.uci.edu/ml/datasets/Grammatical+Facial+Expressions

[11] https://www.edureka.co/blog/convolutional-neural-network/
