EAI Endorsed Transactions: Music Recommendation Based On Facial Emotion Recognition
Rajesh B, Keerthana V, Narayana Darapaneni, Anwesh Reddy P
1 PES University, Bangalore, Karnataka, 560050, India
2 Northwestern University, Evanston, IL 60208, United States
3 Great Learning, Hyderabad, Telangana, 500089, India
Abstract
INTRODUCTION: Music provides an incredible avenue for individuals to express their thoughts and emotions, while also
serving as a delightful mode of entertainment for enthusiasts and music lovers.
OBJECTIVES: This paper presents a comprehensive approach to enhancing the user experience through the integration of
emotion recognition, music recommendation, and explainable AI using GRAD-CAM.
METHODS: The proposed methodology utilizes a ResNet50 model trained on the Facial Expression Recognition (FER)
dataset, consisting of real images of individuals expressing various emotions.
RESULTS: The system achieves an accuracy of 82% in emotion classification. By leveraging GRAD-CAM, the model
provides explanations for its predictions, allowing users to understand the reasoning behind the system's recommendations.
The model is trained on both the FER dataset and a real-user dataset, which together include labelled facial expressions and real images of individuals expressing various emotions. The training process involves pre-processing the input images, extracting features through convolutional layers, reasoning with dense layers, and generating emotion predictions through the output layer.
CONCLUSION: The proposed methodology, leveraging the ResNet50 model with ROI-based analysis and explainable AI techniques, offers a robust and interpretable solution for facial emotion detection.
Keywords: Facial emotion detection, ResNet50, convolutional neural network, deep learning, region of interest, explainable AI.
Copyright © YYYY Author et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA
4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the
original work is properly cited.
doi: 10.4108/_______________
emotional states and tailoring content accordingly. By accurately detecting and classifying facial emotions, the system can adapt its functionality and provide content that resonates with users' emotional needs. This capability opens up new possibilities for interactive applications, such as personalized music recommendation systems.

Music recommendation systems have gained immense popularity in recent years, leveraging advanced algorithms to suggest songs based on users' preferences. However, integrating emotion recognition into music recommendation adds a new dimension to the system. By considering the detected emotions, the system can recommend songs that align with users' emotional states, creating a more meaningful and engaging music listening experience.

Explainable AI is another crucial aspect of the proposed system. Conventional machine learning models frequently lack interpretability, posing a challenge for users to comprehend the rationale behind their recommendations. By incorporating the GRAD-CAM technique, the system provides visual explanations for its predictions, enabling users to grasp the underlying features that contribute to the recommended content. This transparency fosters trust, understanding, and user engagement.

The primary objective of this study is to create an affordable music application that utilizes real-time video and Convolutional Neural Network (CNN) technology to automatically select songs based on the user's current mood state. The system aims to minimize resource consumption while incorporating an emotion module that analyzes the user's real-time video to evaluate their emotional state. Subsequently, it matches the identified mood with songs from a categorized collection and offers recommendations for a diverse range of songs. By alleviating the burden of manual song selection, this system has the potential to address the existing challenge of finding suitable songs.

2. Literature Survey

The literature survey encompasses a comprehensive analysis of existing research and studies related to emotion recognition, music recommendation systems, and explainable AI. This section highlights key findings, methodologies, and insights from previous works, laying the foundation for the proposed system.

Several studies have explored different techniques and methodologies for music recommendation systems based on mood or emotion. In their study, Renu Taneja et al. [9] used audio analysis to retrieve features such as pace, beats, and RMSE, and then constructed clusters representing various moods based on these extracted properties. Kee Moe Han et al. [8], on the other hand, employed the average emotions reported by a group of 15 individuals to determine the emotion of a song. They trained a classifier using this data and categorized the music signal's emotion by considering audio parameters such as pitch and timbre. Another study, by V. R. Ghule et al. [10], centered on the development of a music system that employed facial recognition technology for analyzing emotions.

In 2005, Wieczorkowska et al. [11] conducted a study aiming to assist users in discovering music aligned with their moods. They employed the k-nearest neighbors (KNN) algorithm to classify a large dataset of 327,683 songs into six distinct emotions, resulting in an overall accuracy of 37%. Similarly, in 2008, another study [12] utilized a regression method for Music Emotion Recognition (MER) and achieved accuracy rates of 64% for arousal and 59% for valence. Yading Song et al. [13] explored various facets of music for MER and utilized a labeled dataset of 2,904 songs categorized as "happy," "sad," "angry," or "relaxed." Support Vector Machines (SVM) were employed, with spectral characteristics exhibiting superior performance compared to other acoustic parameters.

In 1978, Ekman and Friesen [14] introduced Action Units (AU), which incorporated both permanent and transient facial traits for emotion recognition. The increasing popularity of Convolutional Neural Networks (CNNs) in emotion recognition can be attributed to continuous advancements in methodologies. Lyrical analysis has also been used for music classification [15], [16]; however, relying exclusively on tokenized methods falls short of accurate song categorization, and language barriers restrict such classification to a single language, a distinct disadvantage in the overall process.

In 2020, T. Vijayakumar [17] presented a research paper focused on tackling inverse problems using Convolutional Neural Networks (CNNs). The study initially employed CNN and later transitioned to direct inversion using a combination of Filtered Back Projection (FBP) and CNN, known as FBP-C. The approach utilized individual learning and a U-net architecture. The synthetic dataset used in the study consisted of 475 training images and 25 validation images. The backpropagation technique employed in the study produced satisfactory results.

In a study carried out in 2021, Akey Sungheetha and Rajesh Sharma [18] directed their attention towards image classification using Convolutional Neural Networks (CNN) for the early detection of Diabetic Retinopathy. The conventional methods employed for detecting Hard Exudates (HE) in retinopathy images, which are crucial for assessing diabetes severity, were found to be ineffective. To address this challenge, the study proposed using CNN to extract relevant features from deep networks, offering a viable solution. Deep learning architectures, including CNN, have demonstrated their effectiveness as powerful tools for image recognition, analysis, classification, and identification within the domain of medical imaging.

In 2021, a survey was conducted by S. Smys, Joy Iong Zong Chen, and Subarna Shakya [19] to investigate the various architectures and design methodologies employed in neural networks. The study classified deep neural networks into three distinct types: hybrid architectures, generative architectures, and discriminative architectures. The hybrid architecture was presented by integrating Convolutional Neural Networks (CNN) with deep belief networks, while the discriminative architecture predominantly relied on CNN, featuring stacked pooling and convolution layers to construct a deep model. The survey provided insights into the diverse approaches and structures employed.
Emotion recognition has emerged as a highly investigated domain within the fields of computer vision and human-computer interaction. Researchers have explored various techniques, including facial expression analysis, physiological signals, and audio analysis. In the realm of music psychology, Swathi Swaminathan and E. Glenn Schellenberg [1] conducted research to shed light on the current state of emotion research, emphasizing the importance of comprehending emotions within music-related contexts. F. Abdat, C. Maaoui, and A. Pruski [2] directed their attention toward human-computer interaction and highlighted the significance of facial cues in emotion detection through facial expression recognition. These studies offer valuable insights into the theoretical foundations and practical applications of techniques utilized in emotion recognition.

Music recommendation systems have the objective of delivering personalized and pertinent music suggestions to users, taking into account their preferences, context, and emotional states. In the realm of mood classification from musical audio, Kyogu Lee and Minsu Cho [4] investigated the use of user group-dependent models, highlighting the importance of acknowledging user diversity and individual preferences in music recommendation. From a distinct perspective, Daniel Wolff, Tillman Weyde, and Andrew MacFarlane [5] concentrated on culture-aware music recommendation, recognizing the influence of cultural background on music preferences. Additionally, Mirim Lee and Jun-Dong Cho [15] developed a context-based social music recommendation service, underscoring the significance of contextual factors in augmenting music recommendation systems. These studies contribute to the comprehension of music recommendation techniques and the factors that impact user satisfaction and engagement.

Explainable AI has gained considerable attention as a means of enhancing transparency and interpretability in AI models. Peter Burkert, Felix Trier, Muhammad Zeshan Afzal, Andreas Dengel, and Marcus Liwicki [21] presented DeXpression, a deep convolutional neural network designed for expression recognition. This network integrates explainable AI techniques to provide visualizations of the specific image regions that contribute to the model's predictions, demonstrating the potential of explainable AI in enhancing the interpretability of deep learning models. The literature survey explores the existing research and developments in the field of explainable AI and its relevance to the proposed project.

By reviewing these studies and research papers, it is evident that emotion recognition, music recommendation, and explainable AI are active areas of research with various approaches and techniques. However, there is still a need to integrate these domains to enhance the user experience. The proposed project aims to bridge these gaps and provide a comprehensive system that combines emotion recognition from facial cues, music recommendation based on emotions, and explainable AI using GRAD-CAM visualization.

3. Methodology

As emphasized in the literature review, DeXpression [21] showed how explainable AI techniques can visualize the image regions that drive a model's predictions. Building upon the insights gained from the literature survey, this paper now outlines the methodology employed to address the research gaps identified in the existing studies. The methodology represents a fusion of established techniques from prior research and approaches tailored to the specific objectives of our project. By capitalizing on the strengths and limitations of previous methodologies, we have designed a refined framework that aims to advance facial emotion recognition and music recommendation. This methodology encompasses various stages, including dataset acquisition, preprocessing, model selection, training, and evaluation, while also introducing strategies to tackle the unique challenges associated with facial emotion recognition and music recommendation.

3.1. Dataset Description

The proposed system utilizes a dataset consisting of facial images with labeled emotions for training and evaluation. The dataset includes real images of different individuals encompassing a diverse range of emotions, and it may be augmented and pre-processed so that the model performs well and generalizes to real-world scenarios.

The dataset used for training the facial emotion recognition model consists of two components: the FER dataset and real images of different individuals. The facial expression recognition (FER) dataset consists of categorized facial expressions covering anger, disgust, fear, happiness, sadness, surprise, and neutrality. This dataset serves as the foundation for training the deep learning model and enables it to learn patterns associated with various emotions. To enhance the diversity and generalization capability of the model, real images of different individuals are also captured and included in the training dataset.

In addition to the FER dataset, a music dataset is used for generating personalized music recommendations. The music dataset contains a diverse collection of music tracks from various genres and styles, and it serves as the basis for mapping the detected emotions to appropriate music tracks. The music dataset can be acquired from diverse sources, such as online music platforms, curated databases, or personalized collections.
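As a concrete illustration of this data preparation, the following is a minimal Python/Keras sketch, not the authors' exact pipeline, of how a FER-style directory of labelled facial images covering the seven emotions could be loaded and lightly augmented; the folder layout, the 224x224 input size, and the augmentation settings are assumptions.

# Minimal sketch: loading a FER-style dataset of labelled face images with Keras.
# Folder layout, image size, and augmentation settings are illustrative assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,          # normalise pixel values
    rotation_range=10,          # light augmentation for better generalization
    horizontal_flip=True,
    validation_split=0.2,
)

train_data = train_gen.flow_from_directory(
    "data/fer/train",           # hypothetical path: one sub-folder per emotion
    target_size=(224, 224),     # assumed input size for the ResNet50 backbone
    classes=EMOTIONS,
    class_mode="categorical",   # one-hot labels for the seven-way softmax
    subset="training",
)
val_data = train_gen.flow_from_directory(
    "data/fer/train",
    target_size=(224, 224),
    classes=EMOTIONS,
    class_mode="categorical",
    subset="validation",
)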
By incorporating both the real-user image dataset and a comprehensive music dataset, the system can offer personalized music recommendations based on the user's real-time emotional state. The facial emotion recognition model trained on the FER dataset enables accurate emotion detection, while the music dataset provides a rich selection of tracks for mapping and playlist generation. Table 1 lists the mood categories and the number of corresponding songs.

Table 1. Music Dataset

Sr. No   Emotion     No. of Songs
1        Happy       20
2        Sad         30
3        Angry       20
4        Surprise    20
5        Neutral     20
6        Disgust     20
7        Fear        16
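To illustrate how a table such as Table 1 can drive recommendations, the short sketch below maps each detected emotion to a categorized playlist and samples a song; the file names and the recommend_song helper are hypothetical placeholders, not the actual collection used here.

# Minimal sketch: mapping a detected emotion to a song from a categorized collection.
# The playlist file names below are placeholders, not the dataset used in this paper.
import random

PLAYLISTS = {
    "happy":    ["happy_01.mp3", "happy_02.mp3"],      # 20 songs in the actual dataset
    "sad":      ["sad_01.mp3", "sad_02.mp3"],          # 30 songs
    "angry":    ["angry_01.mp3"],                      # 20 songs
    "surprise": ["surprise_01.mp3"],                   # 20 songs
    "neutral":  ["neutral_01.mp3"],                    # 20 songs
    "disgust":  ["disgust_01.mp3"],                    # 20 songs
    "fear":     ["fear_01.mp3"],                       # 16 songs
}

def recommend_song(detected_emotion: str) -> str:
    """Return a song for the emotion predicted by the facial-expression model."""
    return random.choice(PLAYLISTS[detected_emotion.lower()])

print(recommend_song("Happy"))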
3.2. Model Architecture

Facial emotion recognition utilizes the ResNet50 model, a deep CNN architecture renowned for its exceptional performance in image classification tasks. It consists of 50 layers, including convolutional layers, shortcut connections, and global average pooling. The input layer of the ResNet50 model takes in facial images. The convolutional layers extract meaningful features from the input images, capturing the key characteristics of the facial expressions. The dense layers process the extracted features and learn the relationship between facial expressions and emotions. Finally, the output layer provides the predicted emotion based on the learned features.

The output of the dense layers employs the softmax activation function, which produces a multinomial probability distribution. This distribution is well-suited for multiclass classification tasks that involve more than two labels. In this particular project, which involves classifying emotions into seven distinct labels, the output over the classes is represented as a probability distribution. The network architecture comprises nine convolutional layers, with a max-pooling layer following every three convolutional layers, and two dense layers.

Input Layer

The input layer receives the pre-processed facial images and passes them on to the convolutional layers for feature extraction.

Convolutional Layer

The convolutional layers in ResNet50 are responsible for feature extraction. They consist of filters that slide across the input images, convolving with the pixel values to produce feature maps. Each filter specializes in capturing specific patterns or features, such as edges, textures, and shapes. The depth of the network enables ResNet50 to learn increasingly complex and abstract features as the information passes through multiple convolutional layers.

Dense Layer

Following the convolutional layers, ResNet50 incorporates dense layers, also known as fully connected layers. These layers receive the extracted features from the previous layers and perform high-level reasoning and decision-making. The dense layers typically comprise multiple neurons, with each neuron representing a specific class or emotion in the case of facial emotion detection. Through a series of weighted connections and activation functions, the dense layers transform the extracted features into probability scores or confidence values for each class.

Output Layer

The output layer in ResNet50 represents the final stage, responsible for generating the predicted emotions for the input images in facial emotion detection. This layer comprises neurons that correspond to the various emotions: happiness, neutral, anger, surprise, fear, disgust, and sadness. The choice of activation function in the output layer depends on the problem's characteristics; here, the commonly employed softmax function is used, which ensures that the predicted emotion probabilities sum to 1. This property facilitates interpretation and comparison of the predictions.

ResNet50 excels in facial emotion detection due to its deep architecture and residual connections. The depth enables the network to learn rich and meaningful representations of facial features, capturing intricate details relevant to emotions. The residual connections help alleviate the vanishing gradient problem, allowing for better training and improved performance.

By leveraging ResNet50 as the core model for facial emotion detection, this research aims to accurately classify the emotional states conveyed by individuals' facial expressions. The trained model is capable of analyzing facial images and predicting the corresponding emotions, contributing to the creation of advanced systems capable of comprehending and appropriately reacting to human emotion.
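A minimal Keras sketch of a ResNet50-based classifier with a seven-way softmax head, in the spirit of the architecture described above, is shown below; the ImageNet initialization, the 128-unit dense layer, and the dropout rate are illustrative assumptions rather than the exact configuration used in this work.

# Minimal sketch: a ResNet50 backbone with a softmax head for 7 emotion classes.
# ImageNet initialization and the 128-unit dense layer are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras.applications import ResNet50

backbone = ResNet50(
    weights="imagenet",        # assumption: transfer learning from ImageNet
    include_top=False,         # drop the original 1000-class head
    input_shape=(224, 224, 3),
    pooling="avg",             # global average pooling after the last conv block
)

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(128, activation="relu"),   # high-level reasoning layer
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(7, activation="softmax"),  # probabilities for the 7 emotions
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_data, validation_data=val_data, epochs=20)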
To summarize, ResNet50 serves as a robust deep-learning model that constitutes the foundation of the facial emotion detection system created in this study. It leverages its architecture, including convolutional layers, dense layers, and residual connections, to extract features and make accurate predictions. Through extensive training on the dataset, the model can capture the subtle variations in facial expressions and classify them into different emotions. This model serves as a valuable tool for understanding and analyzing human emotions, opening doors to numerous applications in fields like psychology, human-computer interaction, and affective computing.

3.3. Facial Emotion Detection with ROI (Eyes)

In the context of facial emotion detection, the eyes are considered a crucial region for accurate emotion recognition. The eyes exhibit significant changes across emotional states, and capturing these subtle variations can enhance the performance of emotion classification models. To leverage the distinctive features of the eyes, a specific approach was employed, involving the extraction of the region of interest (ROI) using a Haar cascade classifier.

The Haar cascade classifier is a popular technique in computer vision for object detection, known for its efficiency and accuracy. In this methodology, the Haar cascade classifier was trained to identify and localize the eyes in facial images. Once the eyes were successfully detected, they were cropped and extracted as separate images, creating a specialized dataset consisting specifically of eye regions.
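The eye-ROI extraction step can be sketched with OpenCV as follows; the bundled haarcascade_eye.xml file, the detection thresholds, and the 224x224 crop size are assumptions made for illustration.

# Minimal sketch: detecting eyes with a Haar cascade and cropping them as ROI images.
# The bundled haarcascade_eye.xml and the 224x224 crop size are illustrative choices.
import cv2

eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml"
)

def extract_eye_rois(image_bgr):
    """Return cropped eye regions detected in a face image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    rois = []
    for (x, y, w, h) in eyes:
        roi = image_bgr[y:y + h, x:x + w]
        rois.append(cv2.resize(roi, (224, 224)))  # match the classifier's input size
    return rois

# Example: build the eye-centric dataset from one labelled face image.
face = cv2.imread("faces/happy/img_001.jpg")      # hypothetical path
if face is not None:
    for i, eye in enumerate(extract_eye_rois(face)):
        cv2.imwrite(f"eyes/happy/img_001_eye{i}.jpg", eye)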
This eye-centric dataset was then utilized to train a facial emotion classification model. The model architecture, based on the ResNet50 deep convolutional neural network (CNN), was employed to learn the intricate patterns and features present in the eye regions. The ResNet50 model has been widely recognized for its exceptional performance in computer vision tasks, making it a suitable choice for this research.

During the training process, the model was exposed to the eye images from the specialized dataset, with each eye region associated with a corresponding emotional label. The model learned to analyze the eye features and classify them into different emotional states, including happiness, sadness, anger, fear, disgust, surprise, and neutral.

By focusing solely on the eyes, the model gained a deeper understanding of the specific eye-related cues and expressions associated with each emotion. This approach allowed for a more fine-grained analysis of the eyes' role in emotion recognition, capturing the nuances and subtleties that contribute to accurate classification.

Once the model was trained on the eye-centric dataset, it was capable of predicting facial emotions from new, unseen eye regions. During inference, the Haar cascade classifier was used to detect and extract the eyes from facial images in real time. These extracted eye regions were then fed into the trained ResNet50 model, which generated predictions of the corresponding emotional states.

This methodology offers several advantages. Firstly, by narrowing the focus to the eyes, the model's attention is concentrated on the most expressive and informative facial region, potentially improving the accuracy of emotion detection. Secondly, working with a specialized dataset of eye regions allows for more targeted training, enabling the model to learn eye-specific features more effectively. Lastly, the use of the Haar cascade classifier for eye detection provides a robust and efficient means of isolating the eyes, ensuring accurate extraction even in real-time scenarios.

Overall, the integration of facial emotion detection with the eye ROI, using the Haar cascade classifier and the ResNet50 model, demonstrates a tailored approach to emotion recognition. By leveraging the distinctive features of the eyes and training on eye-specific datasets, this methodology enhances the precision and granularity of facial emotion detection, providing valuable insights into the role of the eyes in expressing and recognizing emotions.
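Putting these pieces together, a rough sketch of the real-time inference flow described in this subsection, from webcam frame to eye ROI to emotion prediction to song lookup, might look as follows; the model file name is hypothetical, and extract_eye_rois and recommend_song refer to the illustrative helpers sketched earlier.

# Minimal sketch of the real-time loop: capture a frame, crop the eye ROIs,
# predict the emotion with the trained ResNet50, then look up a song.
# "emotion_resnet50.h5", extract_eye_rois, and recommend_song are illustrative.
import cv2
import numpy as np
import tensorflow as tf

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
model = tf.keras.models.load_model("emotion_resnet50.h5")  # hypothetical file name

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
if ret:
    rois = extract_eye_rois(frame)          # from the Haar-cascade sketch above
    if rois:
        batch = np.stack(rois).astype("float32") / 255.0
        probs = model.predict(batch).mean(axis=0)   # average over detected eyes
        emotion = EMOTIONS[int(np.argmax(probs))]
        print(emotion, recommend_song(emotion))     # from the playlist sketch above
cap.release()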
3.4. Explainable AI

Explainable AI (XAI) is an essential aspect of building trustworthy and interpretable machine learning models. It aims to provide insights into the reasoning behind the predictions made by the model, offering transparency and enabling users to understand and trust the decision-making process. In this project, the GRAD-CAM (Gradient-weighted Class Activation Mapping) technique was employed to achieve explainability in the facial emotion detection model.

GRAD-CAM is a visualization technique that helps identify the regions of an image that are influential in a model's decision-making process. It generates heatmaps that highlight the areas of an input image that contribute most significantly to the predicted class. By applying GRAD-CAM to the facial emotion detection model, we can gain insights into the regions of the face that contribute to the classification of specific emotions.
To apply GRAD-CAM, the pre-trained ResNet50 model was used. After an input image was fed into the model for prediction, the gradients of the target class (the predicted emotion) were computed with respect to the final convolutional layer. These gradients were then used to weight the activations of that layer, creating a heatmap that visually represents the regions that influenced the prediction the most. By visualizing the heatmaps generated by GRAD-CAM, we were able to identify the facial regions, such as the eyes, nose, or mouth, that played a significant role in the model's decision-making process. This information is invaluable for understanding how the model interprets emotions and which facial features contribute most prominently to each emotion classification.
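A compact TensorFlow sketch of this Grad-CAM computation is given below; it assumes the classifier exposes ResNet50's last convolutional block (named conv5_block3_out in Keras' implementation), and the helper is illustrative rather than the authors' exact code.

# Minimal Grad-CAM sketch: gradients of the predicted class w.r.t. the final
# convolutional layer weight its activations to form a coarse heatmap.
import tensorflow as tf

def grad_cam(model, image, last_conv_layer="conv5_block3_out"):
    """image: preprocessed array of shape (1, H, W, 3)."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(image)
        class_idx = int(tf.argmax(preds[0]))        # the predicted emotion
        class_score = preds[:, class_idx]
    grads = tape.gradient(class_score, conv_maps)   # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2)) # importance of each channel
    cam = tf.reduce_sum(weights * conv_maps[0], axis=-1)
    cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()                              # heatmap to overlay on the face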
4. Results

4.1. Emotion Classification

The trained model achieved an accuracy of 82% in emotion classification and could effectively classify emotions such as anger, disgust, fear, happiness, sadness, surprise, and neutral expressions.

4.2. Region of Interest (ROI) Analysis

By focusing on the eyes as the region of interest, we observed that the model's performance improved in detecting subtle changes in emotions. The eye region, known to convey vital emotional cues, proved to be influential in accurately predicting emotions. The use of Haar cascades for eye detection and a separate dataset consisting solely of eye images contributed to the model's enhanced performance in capturing subtle variations in emotional states.
The eye-based ROI analysis thus demonstrated improved performance in capturing subtle emotional cues.

The integration of music recommendations based on the detected emotions enhanced the user experience, providing personalized playlists that resonated with the user's emotional state. The system's effectiveness was further enhanced by incorporating explainable AI techniques, particularly the GRAD-CAM method, which provided insights into the model's decision-making process and improved transparency.

The results of this paper have several implications for future research and development. Some potential areas for further exploration and improvement include:

Expansion of Emotion Categories: While the current system successfully classified emotions into seven categories, future work could involve expanding the range of emotions recognized. This could include more nuanced emotional states or culture-specific emotions, allowing for a more comprehensive understanding of users' emotional experiences.

Multi-modal Emotion Recognition: Incorporating additional modalities, such as voice or gesture recognition, alongside facial emotion detection can provide a more holistic understanding of users' emotional states. Multi-modal approaches have the potential to enhance the accuracy and robustness of emotion detection systems.

Real-Time System Deployment: While our system performed real-time emotion recognition, further optimization and deployment on low-latency platforms can ensure its practical usability in real-world scenarios, such as interactive applications or emotion-aware systems.

User Feedback and Personalization: Integrating user feedback mechanisms can enable the system to adapt and personalize its recommendations based on individual preferences and responses. User feedback loops can contribute to continuous improvement and user satisfaction.

Generalization to Diverse Populations: Future research should focus on expanding the diversity of the dataset used for training the model, encompassing individuals from various demographics, cultures, and age groups. This will ensure the generalizability and inclusiveness of the system across different populations.

In conclusion, our proposed system demonstrates the potential of combining facial emotion detection, ROI analysis, music recommendation, and explainable AI techniques to create a user-centric, personalized experience. The achieved results, along with the future scope outlined, contribute to the advancement of affective computing and emotion-aware systems, with implications in fields such as entertainment, healthcare, and human-computer interaction.

References

[1] Swathi Swaminathan and E. Glenn Schellenberg, "Current emotion research in music psychology," Emotion Review, vol. 7, no. 2, pp. 189-197, Apr. 2015.
[2] F. Abdat, C. Maaoui, and A. Pruski, "Human-computer interaction using emotion recognition from facial expression," in UKSim 5th European Symposium on Computer Modeling and Simulation, 2011.
[3] "How music changes your mood," Examined Existence. [Online]. Available: http://examinedexistence.com/how-music-changes-your-mood/. Accessed: Jan. 13, 2017.
[4] Kyogu Lee and Minsu Cho, "Mood Classification from Musical Audio Using User Group-dependent Models."
[5] Daniel Wolff, Tillman Weyde, and Andrew MacFarlane, "Culture-aware Music Recommendation."
[6] A. Lehtiniemi and J. Holm, "Using Animated Mood Pictures in Music Recommendation," in 16th International Conference on Information Visualization, 2012.
[7] A. S. Dhavalikar and R. K. Kulkarni, "Face Detection and Facial Expression Recognition System," in International Conference on Electronics and Communication Systems (ICECS), 2014.
[8] K. Han, T. Zin, and H. M. Tun, "Extraction of Audio Features for Emotion Recognition System Based on Music," International Journal of Scientific & Technology Research, June 2016.
[9] R. Taneja, A. Bhatia, J. Monga, and P. Marwaha, "Emotion detection of audio files," in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), March 2016, pp. 2397-240.
[10] V. R. Ghule, A. B. Benke, S. S. Jadhav, and S. A. Joshi, "Emotion Based Music Player Using Facial Recognition," International Journal of Innovative Research in Computer and Communication Engineering, vol. 5, no. 2, February 2017.
[11] A. Wieczorkowska, P. Synak, R. Lewis, and Z. W. Raś, "Extracting emotions from music data," in International Symposium on Methodologies for Intelligent Systems, May 2005, pp. 456-465.
[12] Y. H. Yang, Y. C. Lin, Y. F. Su, and H. H. Chen, "A regression approach to music emotion recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448-457, 2008.
[13] Y. Song, S. Dixon, and M. Pearce, "Evaluation of Musical Features for Emotion Classification," in ISMIR, October 2012, pp. 523-528.
[14] Ying-li Tian, T. Kanade, and J. Cohn, "Recognizing lower face action units for facial expression analysis," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Mar. 2000, pp. 484-490.
[15] Mirim Lee and Jun-Dong Cho, "Logmusic: context-based social music recommendation service on mobile device," in UbiComp'14 Adjunct, Seattle, WA, USA, Sep. 13-17, 2014.
[16] Gil Levi and Tal Hassner, "Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns."
[17] T. Vijayakumar, "Posed Inverse Problem Rectification Using Novel Deep Convolutional Neural Network," Journal of Innovative Image Processing (JIIP), vol. 2, no. 3, pp. 121-127, 2020.
[18] A. Sungheetha and Rajesh Sharma, "Design an Early Detection and Classification for Diabetic Retinopathy by Deep Feature Extraction based Convolution Neural Network," Journal of Trends in Computer Science and Smart Technology (TCSST), vol. 3, no. 2, pp. 81-94, 2021.
[19] S. Smys, Joy Iong Zong Chen, and Subarna Shakya, "Survey on Neural Network Architectures with Deep Learning," Journal of Soft Computing Paradigm (JSCP), vol. 2, no. 3, pp. 186-194, 2020.
[20] "Unsupervised feature learning and deep learning tutorial." [Online]. Available: http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/. Accessed: Jan. 13, 2017.
[21] Peter Burkert, Felix Trier, Muhammad Zeshan Afzal, Andreas Dengel, and Marcus Liwicki, "DeXpression: Deep Convolutional Neural Network for Expression Recognition."
[22] "Unsupervised feature learning and deep learning tutorial." [Online]. Available: http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/. Accessed: Jan. 13, 2017.