Engagement Detection Through Facial Emotional Recognition Using a Shallow Residual Convolutional Neural Network
1 Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, India
2 Department of Psychology, School of Social Sciences, CHRIST (Deemed to be University), Bangalore, India
* Corresponding author's Email: Michael.moses@christuniversity.in
Abstract: Online teaching and learning has recently become the norm, with a majority of learners taking courses and training in this new environment. Learning through these platforms creates a need to understand whether the learner is interested or not. Detecting learner engagement has therefore attracted increasing attention, with the goal of building learner-centric models that enhance the teaching and learning experience. Over time on such a platform, a learner exhibits emotions such as engagement, boredom, frustration, confusion and anger, cues that can be classified as engaged or disengaged. This paper proposes a Convolutional Neural Network (CNN) equipped with residual connections that improve the learning behaviour of the network and the classification results on three Indian datasets built around classroom engagement models. The proposed network performs well owing to residual learning, which carries additional information from one batch of layers into the next, an Optimized Hyper-Parametric (OHP) setting, increased input image dimensions for richer data abstraction, and a reduction of the vanishing-gradient problem that also mitigates overfitting. The residual network introduced has a shallow depth of 50 layers and achieves an accuracy of 91.3% on the ISED and iSAFE data and 93.4% on the DAiSEE dataset. The classifier attains an average Cohen's kappa of 0.825.
Keywords: Student engagement detection, Residual networks, Convolutional neural network, Emotion detection,
Facial expression recognition.
devices that are aware of the impact. These programs can more easily grasp individual feelings and communicate with the consumer. Current technologies, however, are yet to achieve the emotional and social capacities needed to create a rich and stable Human Machine Interaction (HMI). This is primarily because HMI devices need to communicate with humans in an unregulated atmosphere (the so-called wild setting), where scene illumination, camera orientation, picture size, landscape, user head posture, gender and ethnicity can differ considerably.

Furthermore, there are inadequate variations and annotated samples in the data that drive the creation of affective computing systems, and particularly of the FER systems used to build them. Recent studies in psychology have shown that people largely convey their feelings externally. The study of facial expression is an essential part of genuinely rich Man-Machine Interaction (MMI) frameworks for communication, as it uses nonverbal cues to measure the user's enthusiasm. This work emphasises the creation of neural networks that detect user emotions and user engagement using the most commonly used datasets. In the recent literature on user engagement detection, Convolutional Neural Networks (CNN) and Residual Networks (ResNet) are widely used to improve the outcomes of emotion detection. Residual networks use skip connections between layers to enhance the learning pattern of the network. The rest of the paper is organised as follows: Section 2 describes the related work and Section 3 explains the datasets. Section 4 explains the residual network. The details of experimentation are provided in Section 5, and the detailed results in Section 6. Finally, Section 7 concludes the paper.
2. Related works

In the literature, there are several models for measuring emotional behaviour: 1) categorical models that select the emotion or affect from a list of affective categories, including the six basic emotions identified by Ekman; 2) dimensional models, where a meaning such as valence and arousal is selected over a continuous emotional scale; 3) Facial Action Coding Systems, in which all potential facial behaviours are identified in terms of Action Units (AUs); and 4) tagged emotions, which are grouped on the basis of Ekman's categorical model to create emotional tags as combinational outcomes [1]. For learning environments, the emotions have been grouped into four primary categories: 1) boredom, 2) engaged, 3) frustrated and 4) confused. A few authors have used combinational methods that explicitly combine two emotions, but these proved unable to interpret mixed feelings with such a limited collection of words [2]. A minimal illustration of this kind of grouping is sketched below.
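As an illustration only (the paper does not publish its exact grouping), collapsing categorical emotions into learning-centred categories and a binary engaged/disengaged signal can be expressed as a simple lookup. The specific emotion-to-category assignments below are assumptions for the sketch.

```python
# Illustrative sketch: mapping basic emotions to learning-centred categories
# and a binary engagement label. The exact mapping used by the authors is
# not specified in the paper; these assignments are assumptions.
LEARNING_CATEGORY = {
    "happiness": "engaged",
    "surprise": "engaged",
    "neutral": "engaged",
    "sadness": "boredom",
    "anger": "frustrated",
    "disgust": "frustrated",
    "fear": "confused",
}

ENGAGED_CATEGORIES = {"engaged"}  # categories counted as engagement

def engagement_label(emotion: str) -> str:
    """Map a basic emotion to 'engaged' or 'disengaged'."""
    category = LEARNING_CATEGORY.get(emotion.lower(), "confused")
    return "engaged" if category in ENGAGED_CATEGORIES else "disengaged"

print(engagement_label("Happiness"))  # -> engaged
print(engagement_label("anger"))      # -> disengaged
```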
On the other hand, the dimensional model of affect can discern between slightly different displays of affect and represent minor changes in the intensity of each emotion on a continuous scale, such as valence and arousal. Valence reflects how positive or negative an event is, and arousal reflects whether an event is exciting, agitating or calming.

In the continuous domain, dimensional perception of affect encompasses both the strength and the specific type of emotion. However, comparatively few studies have established automatic algorithms for estimating affect on a continuous dimensional model (e.g. valence and arousal). One of the key reasons is that building a massive database covering the entire continuous space of valence and arousal is costly, and there are very few annotated face databases in the continuous domain. Facial Expression Recognition (FER) systems for different domains use supervised or semi-supervised learning methods for automated affective computing. They require labelled datasets for training and testing, and these datasets are generally created from posed actions by subjects or from expressions extracted from videos enacted by various actors.

Recently, the education sector has begun to impart knowledge through online portals. These methods make it challenging to analyse the engagement levels of students while teaching and learning are conducted online [2]. Facial expression and affect datasets in the wild have been receiving a lot of attention recently. These datasets are either collected from movies or from the world wide web, are well labelled [3-5], and come in varied dimensions [6]. However, they cover just one model of affect, have a small range of subjects, or include few instances of certain emotions such as disgust and sadness. A broad archive with a substantial quantity of subject variation in wild conditions, covering numerous affect models, is therefore a requirement. Though there are several affect-computing models for emotion recognition in videos or single images, object localization and continuous emotional analysis have always been challenging tasks, owing to face detection, posture recognition, segmentation, human pose, object association and affective-state classification from facial expressions in cluttered environments. For the better growth of Massive Open Online Courses (MOOCs), there is a need to design smart interfaces that can simulate the interactions between the instructor and pupil.
The principal disadvantage of existing e-learning systems is that they cannot take direct input from students (or instructors) in real time during the delivery of content; compared with traditional classroom instruction, MOOCs have a 91-93% dropout rate [7]. Understanding user engagement at different junctures of the e-learning experience can help design intuitive interfaces that support students' absorption of knowledge and personalize learning. Understanding the user's affective state is an important computer-vision sub-area, centred for a long time on datasets of the seven basic expressions: neutral, happiness, sadness, anger, disgust, surprise and contempt [8]. In recent years, data collection has been extended to cover affective states in terms of dimensional representations [9-11], but the vast subtleties in affective states call for datasets built for particular goals. This strategy, which is backed by recent developments, tends to promote measurable outcomes. It has been found that in e-learning and classroom settings students often prefer to communicate only a few affective conditions.

These include the 7 fundamental emotions and a few emotions concentrated on learning. Distinct works focus on hand gestures [12], facial recognition and affective states; however, very few works elaborate the available datasets for assessing the cognitive levels of students' emotional state with respect to engagement and distraction. There are also distinct doubts about the curation and usage of facial data in facial recognition [13]. A few researchers have captured emotions in controlled environments while the subjects watch videos that elicit different emotions [14-18]. While such methods can collect a vast number of frames, the variety of such repositories is restricted by the number of participants, head orientations and environmental exposures [21, 22].

Some of the works carried out on the ISED dataset have predominantly used a CNN as the crux, with modifications incorporated into the network to enhance the accuracy of the algorithm [25, 26]. Feature extraction methods involving local prominent directional patterns and local directional structural patterns have also been used; however, these methods lack efficient classification accuracy when compared to CNNs [25-27]. Many authors use modified CNNs to achieve greater results by adding multiple deep layers that enhance the performance of the system [29-31]. CNNs are prone to vanishing gradients, which lead to accuracy loss and push the training toward an expanding memory requirement. In all the major works mentioned in this section, there is a need to improve the detection rate and the precision of detection for individual emotions. A system that can reduce error rates during training is important, as the connections established in each layer contribute to the weight updates and approximations that improve detection.

This paper analyses the residual network's performance through a parameterized study that establishes the significance of the classification model on three established datasets. This work also compares the results with existing models in terms of accuracy.

3. Datasets

Data are pivotal for any work, and all the experiments are based on the data. There are critical datasets that emulate emotions and help in creating emotion-detection models for various applications. Classification problems for detecting emotions have recently been studied on various prominent datasets that contribute to the understanding of emotions. Though there are several datasets that help in analysing faces for emotions, there are few datasets of Indian-origin faces in a classroom environment. Learning environments are not limited to the basic emotions; they can be extended to various classes of classification that influence the accurate measurement of engagement in a class. This work focusses on elaborating the emotions by using the available Indian-origin datasets and combining basic emotion classes, engagement recognition and learning-centered emotions. The datasets used for this study are the DAiSEE, iSAFE and ISED databases. Table 1 lists the details of the available datasets.

4. The residual network

In this work, the priority is to elaborate the feature-extraction process by creating a space in which each emotion exposed by a human is discrete. This is achieved by extending the emotions into 10 classes and establishing a model that can eliminate bias in learning and detection. Fig. 1 shows the network that was used for emotional analysis. The network is grouped into convolutional layers on the left of the figure and the skip connections on the right; the middle layers represent the connections between the residual paths and the convolution layers. S1U1CONV1 represents a single convolutional unit and S1U1BN1 the corresponding batch-normalization layer. These layers are arranged in groups, each consisting of two convolutional and batch-normalization layers, as sketched below.
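The grouped layers just described (two convolution plus batch-normalization pairs bridged by a skip connection) correspond to a standard residual unit. A minimal PyTorch sketch of one such group is given below as an illustration; it follows the Table 2 settings (3 x 3 filters, unit stride) but is not the authors' MATLAB implementation.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """One group: two conv + batch-norm pairs with an identity skip,
    mirroring the S1U1CONV1 / S1U1BN1 naming used in the paper."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # element-wise addition with the skip path

# Example: a 192 x 192 RGB input through an 8-channel residual group
stem = nn.Conv2d(3, 8, kernel_size=3, padding=1)
block = ResidualUnit(8)
y = block(stem(torch.randn(1, 3, 192, 192)))
print(y.shape)  # torch.Size([1, 8, 192, 192])
```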
The residual connections are routed after the second batch into the third batch, and similarly the second connection arises from the fourth batch into the fifth batch of layers. The ReLU layers are the interconnecting layers providing the activation from one group of layers into the next. Additionally, the input layer has a convolution layer that channels the input images into the network. The final layers comprise a pooling layer, a SoftMax layer and the classification layer: the pooling layer gathers the weights from the distributions provided by the various layers to help in dimensionality reduction, and the SoftMax layer produces the class probabilities, as illustrated below.
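A hedged sketch of that final stage, written in PyTorch as an illustration rather than the authors' code. The 64 input channels follow the last filter count in Table 2; the 10 outputs follow the 10 emotion classes described in Section 4.

```python
import torch
import torch.nn as nn

# Illustrative classification head: global average pooling reduces each
# feature map to one value, a linear layer scores the 10 emotion classes,
# and softmax turns the scores into class probabilities.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # pooling layer for dimensionality reduction
    nn.Flatten(),
    nn.Linear(64, 10),         # 64 final filters (Table 2), 10 classes
    nn.Softmax(dim=1),         # SoftMax layer
)

probs = head(torch.randn(1, 64, 24, 24))
print(probs.sum().item())  # ~1.0: a probability distribution over the classes
```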
A shortcut connection also requires matching the dimensions of the previous layer with the next layer. The stacked convolutional layers are represented by the function F(x, {W_i}), and the element-wise addition is performed on the two feature maps, channel by channel. Table 2 lists the parameter settings used in the network.

Table 2. Parameter settings
Parameter name                   Value
Network layers                   50
InitialLearnRate (Lr)            1.00E-04
Regularization function          l2Regularization
GradientThreshold                'l2norm'
MaxEpochs                        40
MiniBatchSize (Bs)               32
Verbose frequency                50
Validation frequency (Vf)        500
Shuffle                          'every-epoch'
Padding direction                'right' (1,1)
Filter size                      3 x 3
Stride                           (1,1)
Number of filters (per stage)    8, 16, 32, 64
Optimiser                        SGDM
Learning-rate scheduler          Piecewise
Image size                       192 x 192 x 3
Learning-rate drop period        every 60 iterations

Eq. (5) describes the convolution process, in which n is the size of the image, p is the padding size, f is the size of the filter, n_c is the number of channels, n_f is the number of filters and s is the stride. Convolutions are carried out on each image based on n, n_c and f, preserving the relation between pixels and creating a matrix of feature maps; s is used to shift the filter over the padded image pixels.

$$[n, n, n_c] * [f, f, n_c] = \left[ \frac{n+2p-f}{s} + 1,\; \frac{n+2p-f}{s} + 1,\; n_f \right] \tag{5}$$
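As a worked instance of Eq. (5) under the Table 2 settings (n = 192, p = 1, f = 3, s = 1, n_c = 3 and, for the first stage, n_f = 8), the first convolution preserves the spatial size:

$$[192, 192, 3] * [3, 3, 3] = \left[ \frac{192 + 2(1) - 3}{1} + 1,\; \frac{192 + 2(1) - 3}{1} + 1,\; 8 \right] = [192, 192, 8]$$

so with 3 x 3 filters, unit stride and unit padding, each of the 8 filters yields a 192 x 192 feature map.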
After two layers of convolution, Eq. (4) is introduced as a shortcut that carries information from the previous layers into the next layer. The residual connections help associate the predictions estimated by the previous layers as an input to the next layer, and the residual function compares the actual value with the predicted value. If the identity x already equals the target value, the residual function is zero, and the gradients passed back through the addition remain large. Along with the residual connections, batch normalization is also carried out in each block to normalize the values to a range where the derivatives are not so small as to be discarded as insignificant. All the layers and parameters mentioned are the outcome of OHP tuning. This tuning helps in assembling the required number of layers to extract meaningful interpretations of mid-level and high-level features after each iteration, creating a pool of weighted probabilities. These probabilities are used to classify the images during validation and testing.
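Eq. (4) is not reproduced above; judging from the surrounding description (stacked layers F(x, {W_i}), an identity input x, and channel-wise addition), the shortcut presumably takes the standard residual form:

$$y = F(x, \{W_i\}) + x \tag{4}$$

where F(x, {W_i}) is the output of the stacked convolution and batch-normalization layers and x is the identity carried by the skip connection. When the desired mapping is close to the identity, F(x, {W_i}) is driven toward zero, and the gradient flowing through the additive skip path keeps the update signal from vanishing.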
5. Experimentation

The residual network was applied to the ISED, iSAFE and DAiSEE datasets. The network is trained with images of faces of Indian origin. Since these images were curated by the authors for an e-learning environment, the same trained network was used to observe the online classes conducted during the pandemic years. All the experiments were carried out on Intel Xeon E3 based workstations with an NVIDIA GeForce GTX graphics card and 32 GB of RAM, and MATLAB 2019b was used as the platform to train and validate the network. Images from the testing data are drawn at random and fed into the network, and each is tested for true positive, false positive, true negative or false negative outcomes. The images are resized to 128 x 128 x 3 before being fed into the residual network.

For training, 508 images from the ISED and iSAFE datasets train the data for 7 classes; similarly, 5295 images from the DAiSEE dataset are used for training. The network was created from scratch and used to improve the efficiency of emotional understanding. Every time training is carried out on the datasets, the learning rate of the network is set to 1.00E-04, which helps the network learn features from the inception. The network uses a piecewise learning-rate scheduler, which refines learning by decreasing the rate periodically and optimizing the network for a higher degree of weight-vector distribution. The data are shuffled after each epoch and the mini-batch size was fixed at 32. Fig. 2 shows the details of the network, in which two skip connections are introduced. The plain network had 40 layers, while the introduction of the residual layers increased the number of layers to 50.

The number of layers is still less than in prominent networks such as ResNet-32, ResNet-50 and ResNet-101. The pooling layers help reduce the dimensionality of the extracted features, while ReLU layers are used as activation functions. The layers were not chosen at random; they were placed precisely after several iterations and changes to the entire network, based on OHP performance. Deeper networks with a higher number of layers and residual connections took a long time to reach an optimum on this data.
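To make the training recipe concrete, the following is a rough PyTorch equivalent of the MATLAB options in Table 2 (SGDM, initial rate 1e-4, piecewise drop every 60 iterations, mini-batch 32, MaxEpochs 40). The momentum value and the drop factor are assumptions, since the paper does not state them; the model here is a stand-in, not the 50-layer network.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

# Stand-in for the 50-layer residual network described in Section 4.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))

# SGDM with the paper's initial learning rate; momentum 0.9 is an assumption.
optimizer = optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# Piecewise schedule: drop the rate every 60 iterations; the factor 0.1
# is an assumption, the paper gives only the drop period.
scheduler = StepLR(optimizer, step_size=60, gamma=0.1)
criterion = nn.CrossEntropyLoss()

# One illustrative epoch over a small synthetic dataset (MiniBatchSize = 32).
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 3, 192, 192),
                                   torch.randint(0, 10, (64,))),
    batch_size=32, shuffle=True)  # 'every-epoch' shuffling
for images, labels in loader:     # in the paper, MaxEpochs = 40
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()              # per-iteration piecewise decay
```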
The comparison covers state-of-the-art deep learning networks and traditional methods such as Inception V3, CNNs, and the well-known local directional patterns. The proposed model exceeds the precision of all the algorithms that have used the ISED database. Conventional methods, CNNs, hybrid CNNs and many recent works are prominent in enhancing the efficiency of emotional analysis, especially in the field of student engagement and attention estimation.

Based on Eq. (9), the classifier's accuracy on both datasets is calculated from the confusion matrix; a sketch of these computations follows below. From Table 3, the parametric results show that the network achieved a significant outcome using the residual network. The average results for the ISED and iSAFE datasets are: accuracy 91.3%, error 8.7%, sensitivity 91.34%, specificity 98.56%, precision 91.35%, a false positive rate of 1.44%, and a Cohen's kappa coefficient of 0.65, which marks a substantial classifier for these databases. Similarly, on the DAiSEE dataset the overall results are: accuracy 93.44%, error rate 6.56%, sensitivity 93.63%, specificity 97.82%, precision 93.37%, a false positive rate of 2.18%, and a Cohen's kappa coefficient of 0.825, which marks an almost perfect classifier for this database.
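The reported figures follow from standard confusion-matrix definitions. A minimal sketch of those computations (binary case, with illustrative counts rather than the paper's data) is given below.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    fpr = fp / (fp + tn)                  # false positive rate
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "fpr": fpr, "kappa": kappa}

print(confusion_metrics(tp=90, fp=5, tn=95, fn=10))  # illustrative counts
```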
The proposed method has attained promising results owing to the large number of images used for training and the optimised layers that enrich the shallow network's ability to learn, understand and capture features. The features learned by the network make classification more sophisticated and accurate. The feature maps created after each activation layer strengthen the network by reducing the vanishing-gradient and overfitting issues that are prominent in traditional CNNs. The network is compared with deeper and more commonly used trained networks. Table 4 lists the methods used by various authors and the accuracies achieved on each dataset; though the classes differ across datasets, our model attains a significant improvement in all parameters used to measure the benefit of introducing residual connections into a conventional CNN.

Table 4. Performance comparison of various available models
Sl no  Method                                               Dataset        Acc (%)
1      CNN [23]                                             ISED           51.6
2      CNN [23]                                             ISED           59.3
3      Inception V3 [23]                                    ISED           47.9
4      EmotionNet 2 [23]                                    ISED           21.0
5      EmotionalDAN [23]                                    ISED           62.0
6      CNN [24]                                             ISED           82.9
7      Local prominent directional patterns [25]            ISED
       i. LBP (Local Binary Pattern)                                       76.47
       ii. LDP (Local Directional Pattern)                                 74.61
       iii. LDN (Local Directional Number)                                 75.85
       iv. LPTP (Local Directional Ternary Pattern)                        72.46
       v. PTP (Positional Ternary Pattern)                                 76.16
       vi. HOG (Histogram of Gradients)                                    76.75
       vii. LPDP (Local Prominent Directional Pattern)                     77.80
       viii. LPDPf (Local Prominent Directional Pattern f)                 78.32
8      Landmark Detection [26]                              ISED           34
9      Local directional-structural pattern [27]            ISED           77.78
10     LDP+KPCA [28]                                        DAiSEE         90.89
11     Hybrid CNN [29]                                      DAiSEE         86
12     Deep Engagement Recognition Network [30]             DAiSEE         57.9
13     Very Deep Convolutional Network [31]                 DAiSEE         92.33
14     Proposed Model                                       ISED & iSAFE   91.3
15     Proposed Model                                       DAiSEE         93.44

There are three observations about the network's performance. First, degradation issues decreased rapidly thanks to the low training error rate, observed at 0.265; the reduced training error improves the efficiency of learning at the optimum depth of the network. Second, the identity connections shown in Fig. 1 helped decrease the time complexity of training and validation significantly, by 30%. Third, the network uses the SGD solver and is able to find good solutions: though the network is shallow, gradient descent works on batches of smaller sizes, enabling the network to train on smaller batches and create multiple layers of features. These features are the crux of the classification unit, creating probabilities on the weighted layers by accurately activating the exact neurons to provide precise results. Two-fold cross validations provide visibility into the network's validation outcome during the training phase.
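The two-fold cross-validation protocol mentioned above can be sketched with scikit-learn's KFold; this is an illustration of the procedure, not the authors' pipeline, and the arrays below are placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(508, 128 * 128 * 3)   # placeholder for flattened face images
y = np.random.randint(0, 7, size=508)    # 7 classes, as for ISED/iSAFE

# Two folds: each half of the data serves once for training, once for validation.
splitter = KFold(n_splits=2, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(splitter.split(X)):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation samples")
```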
Such validation provides a closer view of the performance of the network. Incorporating the Optimized Hyper-Parametric settings discussed in Section 4 has allowed the network to be customized for better performance.

7. Conclusions

The proposed work evaluates the effect of adding residual layers to an existing CNN model. CNN models are prone to vanishing gradients and loss of accuracy as the networks grow deep. The residual connections on the shallow network were designed from scratch specifically for detecting students' engagement on an e-learning platform. Both behavioral classes, such as boredom, engaged, frustrated and confused, and emotional classes are considered in this work. This work utilizes students' facial features to predict and classify images in the wild and to calculate the accuracy of the proposed approach. In this model, the network uses residual connections from previous layers into the next layers to improve the learning and classification response of the system. Two-fold cross validations are used to understand the capability of the model. The network is trained on three Indian datasets to detect the emotional and behavioral intent of students. The total number of layers used in the network is 50. The shallow network yields an efficient learning model that validates images at an average of 86.87% during training. The model was tested for detection efficiency on test data and compared with state-of-the-art models built with a CNN as the primary network. The use of residual connections and Optimized Hyper-Parametric settings has considerably enhanced the performance in creating and using the network for an Indian-face-based emotional classification model.

The results furnished in Table 3 and Table 4, based on Fig. 4 and Fig. 5, show that the model outperforms the other state-of-the-art techniques. The classifier's performance is also evaluated with the kappa score; the network performs well, and the classifier diligently performs at 82.51%. The proposed model is evaluated with standard evaluation metrics, with improvements close to 2% in various parameters. As a future enhancement, the proposed model will be tested for group engagement detection and for evaluating the valence and arousal of a group.

Conflicts of Interest

The authors declare no conflict of interest.

Author Contributions

The contributions by the authors for this research article are as follows: conceptualization, methodology, formal analysis, and writing of the original draft: Michael Moses Thiruthuvanathan; result validation, resources, formal analysis, review and editing, and supervision: Balachandran Krishnan; data curation, result validation and ethical inference: Madhavi Rangaswamy.

Acknowledgments

The authors wish to acknowledge the technical and infrastructure help rendered by the faculty members of the Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore, India.

References

[1] A. Mollahosseini, B. Hasani, and M. H. Mahoor, "AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild", IEEE Transactions on Affective Computing, Vol. 10, No. 1, pp. 18-31, 2019.
[2] T. Ashwin and R. Guddeti, "Affective Database for E-Learning and Classroom Environments using Indian Students' Faces, Hand Gestures and Body Postures", Future Generation Computer Systems, Vol. 108, No. 1, pp. 334-348, 2020.
[3] A. Dhall, R. Goecke, J. Joshi, M. Wagner, and T. Gedeon, "Emotion Recognition in the Wild Challenge 2013", In: Proc. of 15th International Conf. on Multimodal Interaction, ACM, pp. 509-516, 2013.
[4] I. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, and H. Lee, "Challenges in Representation Learning: A Report on Three Machine Learning Contests", In: Proc. of International Conf. on Neural Information Processing, Vol. 64, No. 1, pp. 59-63, 2015.
[5] A. Mollahosseini, B. Hasani, M. J. Salvador, H. Abdollahi, D. Chan, and M. H. Mahoor, "Facial Expression Recognition from World Wild Web", In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) Workshops, Vol. 1, pp. 168-195, 2016.
[6] S. Zafeiriou, A. Papaioannou, I. Kotsia, M. Nicolaou, and G. Zhao, "Facial Affect "In-the-Wild": A Survey and a New Database", In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1487-1498, 2016.
[7] L. Rothkrantz, "Dropout Rates of Regular Courses and MOOCs", In: Proc. of International Conf. on Computer Supported Education, Rome, pp. 25-46, 2016.
[8] M. Li, H. Xu, X. Huang, Z. Song, X. Liu, and X. Li, "Facial Expression Recognition with Identity and Emotion Joint Learning", IEEE Transactions on Affective Computing, Vol. 4, No. 8, pp. 411-416, 2018.
[9] S. M. Mavadati, M. H. Mahoor, K. Bartlett, P. Trinh, and J. F. Cohn, "DISFA: A Spontaneous Facial Action Intensity Database", IEEE Transactions on Affective Computing, Vol. 4, No. 2, pp. 151-160, 2013.
[10] T. Ashwin and R. Guddeti, "Unobtrusive Students' Engagement Analysis in Computer Science Laboratory Using Deep Learning Techniques", In: Proc. of IEEE 18th International Conf. on Advanced Learning Technologies (ICALT), pp. 436-440, 2018.
[11] S. Setty, M. Husain, P. Beham, J. Gudavalli, M. Kandasamy, R. Vaddi, V. Hemadri, J. Karure, R. Raju, and B. Rajan, "Indian Movie Face Database: A Benchmark for Face Recognition under Wide Variations", In: Proc. of Fourth National Conf. on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), IEEE, pp. 1-5, 2013.
[12] S. Patwardhan and G. M. Knapp, "Affect Intensity Estimation Using Multiple Modalities", In: Proc. of Florida Artificial Intelligence Research Society Conf., pp. 130-133, 2014.
[13] R. Noorden, "The Ethical Questions that Haunt Facial-Recognition Research", Nature, Vol. 587, pp. 354-358, 2020.
[14] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon, "Collecting Large, Richly Annotated Facial-Expression Databases from Movies", IEEE MultiMedia, Vol. 19, No. 3, pp. 34-41, 2012.
[15] S. Happy, P. Patnaik, A. Routray, and R. Guha, "The Indian Spontaneous Expression Database for Emotion Recognition", IEEE Transactions on Affective Computing, Vol. 8, No. 1, pp. 131-142, 2017.
[16] T. Sapinski, D. Kaminska, A. Pelikant, C. Ozcinar, E. Avots, and G. Anbarjafari, "Multimodal Database of Emotional Speech, Video and Gestures", In: Proc. of International Conf. on Pattern Recognition and Information Forensics, pp. 153-163, 2018.
[17] C. Bian, Y. Zhang, F. Yang, W. Bi, and W. Lu, "Spontaneous Facial Expression Database for Academic Emotion Inference in Online Learning", IET Computer Vision, Vol. 13, No. 3, pp. 329-337, 2018.
[18] A. Gupta, A. D'Cunha, K. Awasthi, and V. Balasubramanian, "DAiSEE: Towards User Engagement Recognition in the Wild", arXiv preprint arXiv:1609.01885, 2016.
[19] M. J. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, and J. Budynek, "The Japanese Female Facial Expression (JAFFE) Database", In: Proc. of Third International Conf. on Automatic Face and Gesture Recognition, pp. 14-16, 1998.
[20] I. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee, Y. Zhou, C. Ramaiah, F. Feng, R. Li, X. Wang, D. Athanasakis, J. Shawe-Taylor, M. Milakov, J. Park, R. Ionescu, M. Popescu, C. Grozea, J. Bergstra, J. Xie, L. Romaszko, B. Xu, Z. Chuang, and Y. Bengio, "Challenges in Representation Learning: A Report on Three Machine Learning Contests", Neural Information Processing, Vol. 8228, pp. 117-124, 2013.
[21] S. Singh and S. Benedict, "Indian Semi-Acted Facial Expression (iSAFE) Dataset for Human Emotions Recognition", In: Advances in Signal Processing and Intelligent Recognition Systems (SIRS), Communications in Computer and Information Science, Vol. 1209, No. 1, pp. 150-162, 2019.
[22] A. Kaur, A. Mustafa, L. Mehta, and A. Dhall, "Prediction and Localization of Student Engagement in the Wild", In: Proc. of Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia, pp. 1-8, 2018.
[23] I. Tautkute, T. Trzcinski, and A. Bielski, "I Know How You Feel: Emotion Recognition with Facial Landmarks", In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1959-1974, 2018.
[24] S. Gonzalez-Lozoya, J. Calleja, L. Pellegrin, H. Escalante, Ma. Medina, and A. Benitez-Ruiz, "Recognition of Facial Expressions based on CNN Features", Multimedia Tools and Applications, Vol. 79, pp. 13987-14007, 2020.
[25] F. Makhmudkhujaev, M. Abdullah-Al-Wadud, M. Iqbal, B. Ryu, and O. Chae, "Facial Expression Recognition with Local Prominent Directional Pattern", Signal Processing: Image Communication, Vol. 74, No. 1, pp. 1-12, 2019.
[26] S. Engoor, S. SendhilKumar, C. Hepsibah Sharon, and G. S. Mahalakshmi, "Occlusion-aware Dynamic Human Emotion Recognition".