
electronics

Article
Customized 2D CNN Model for the Automatic Emotion
Recognition Based on EEG Signals
Farzad Baradaran 1, Ali Farzan 1,*, Sebelan Danishvar 2,* and Sobhan Sheykhivand 3

1 Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar 53816-37181, Iran
2 College of Engineering, Design and Physical Sciences, Brunel University London, Uxbridge UB8 3PH, UK
3 Department of Biomedical Engineering, University of Bonab, Bonab 55517-61167, Iran;
s.sheykhivand@tabrizu.ac.ir
* Correspondence: alifarzan402@gmail.com (A.F.); sebelan.danishvar@brunel.ac.uk (S.D.)

Abstract: Automatic emotion recognition from electroencephalogram (EEG) signals can be considered
the main component of brain–computer interface (BCI) systems. In recent years, many researchers
have presented various algorithms for the automatic classification of emotions from EEG signals and
have achieved promising results; however, lack of stability, high error, and low accuracy are still
the central gaps in this research. Accordingly, obtaining a stable model with high accuracy and low
error is essential for the automatic classification of emotions. In this research, a model based on Deep
Convolutional Neural Networks (DCNNs) is presented, which can classify three positive, negative,
and neutral emotions from EEG signals based on musical stimuli with high reliability. For this
purpose, a comprehensive database of EEG signals has been collected while volunteers were listening
to positive and negative music in order to stimulate the emotional state. The architecture of the
proposed model consists of a combination of six convolutional layers and two fully connected layers.
In this research, different feature learning and hand-crafted feature selection/extraction algorithms
were investigated and compared with each other in order to classify emotions. The proposed model
for the classification of two classes (positive and negative) and three classes (positive, neutral, and
negative) of emotions had 98% and 96% accuracy, respectively, which is very promising compared
with the results of previous research. To evaluate the proposed model more fully, it was also
investigated in noisy environments; with a wide range of different SNRs, the classification accuracy
was still greater than 90%. Due to the high performance of the proposed model, it can be used in
brain–computer user environments.

Keywords: emotion recognition; deep learning; EEG; music; CNN

Citation: Baradaran, F.; Farzan, A.; Danishvar, S.; Sheykhivand, S. Customized 2D CNN Model for the
Automatic Emotion Recognition Based on EEG Signals. Electronics 2023, 12, 2232.
https://doi.org/10.3390/electronics12102232

Academic Editor: Ruifeng Xu


Received: 16 February 2023; Revised: 9 April 2023; Accepted: 27 April 2023; Published: 14 May 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

In the near future, computers will quickly become a pervasive part of human life. However, they
are emotionally blind and cannot understand human emotional states [1]. Reading and understanding
human emotional states can maximize human–computer interaction (HCI) performance [2]. Therefore,
the exchange of this information and the recognition of the user’s affective states are considered
necessary to increase human–computer interaction [3]. A person’s emotional state can be recognized
through physiological signals, such as electroencephalography and electrodermal response, as well
as physical indicators, such as facial signs [4]. However, diagnosis based on physical indicators is
less often considered and used due to its sensitivity to social coverage. Among physiological signals,
EEG is the most popular and widely used signal for detecting different emotions [5,6]. Evidence shows
that there is a very strong correlation between this signal and emotions such as happiness, sadness,
and anger. Therefore, the use of these non-invasive technologies can enable us to develop an emotion
recognition system that can be used in everyday life [7,8].


In general, an emotion is a group of physiological reactions produced by the human
body under multiple external, physical, and mental stimuli [9]. In emotional models,
scientists and psychologists divide human emotions into six main categories: sadness, hap-
piness, surprise, fear, anger, and disgust [10]. Various stimuli, such as events, images [11],
music [12], or movies [13], have been used to evoke emotions in previous research. Among
these stimuli, music is known as the fastest and most effective stimulus for emotional
induction. Determining the underlying truth of emotions is difficult because there is no
clear definition of emotions, but the best way to determine or interpret emotions during
testing is to subjectively rate emotional trials or report them by the test subject. For exam-
ple, subjective ratings of emotional tests are widely used by researchers [14]. Self-reports
can be collected from volunteers using questionnaires or designed instruments. The Self-
Assessment Manikin (SAM) is one of these tools designed to assess people’s emotional
experiences. SAM is a brief self-report questionnaire that uses images to express the scales
of pleasure, arousal, and dominance. Given its non-verbal design, the questionnaire is
readable by people regardless of age, language skill, or other educational factors [15].
Many studies have been conducted on automatic emotion recognition. In the following,
previous works are examined along with their advantages and disadvantages.
Li et al. [16] presented a new model to automatically recognize emotion from EEG
signals. These researchers used 128 channels to record the EEG signal. In addition, they
identified active channels in 32 volunteers with Correlation-based Feature Selection (CFS)
and reported that 5 channels, namely T3, F3, Fp1, O2, and Fp2, have a great effect on
emotion recognition. These researchers used the Genetic Algorithm (GA) to reduce the
dimensions of the feature vector and used the t-test to verify the correctness of the se-
lected features. Finally, K-Nearest Neighbors (KNN), Random Forest (RF), Bayesian, and
Multilayer Perceptron (MLP) classifiers have been used for classification. Yimin et al. [17]
recognized the four emotions of happiness, sadness, surprise, and peace from EEG signals.
They used eight volunteers to record the EEG signal. These researchers used four classifiers,
namely RF, Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and C4.5,
for the classification part and concluded that the C4.5 classifier had a better performance
in detecting emotion. Hassanzadeh et al. [18] used a Fuzzy Parallel Cascade (FPC) model
to detect emotion. For their experiment, these researchers used a musical stimulus with
15 volunteers. They also compared their proposed model with Recurrent Neural Networks
(RNN). Finally, the Mean Squared Error (MSE) of these researchers for the classification
of the two classes of valence and arousal is reported to be approximately 0.089, which is
lower compared with other models. Panayo et al. [19] used Deep Neural Networks (DNNs)
to recognize emotion from EEG signals. They conducted their experiment on 12 people.
Their proposed network architecture consisted of six convolutional layers. In addition,
these researchers compared their proposed algorithm with SVM and concluded that CNN
had better performance in emotion recognition than comparative methods. Chen et al. [20]
used EEG signals to automatically classify two classes of emotion. These researchers used
parallel RNNs in their proposed algorithm. The final reported accuracy for valence and
arousal class classification based on their proposed algorithm was 93.64% and 93.26%,
respectively. He et al. [21] used dual Wavelet Transform (WT) to extract features from EEG
signals in order to recognize emotion. In addition, these researchers, after feature extraction,
used recursive units to train their model. Finally, they achieved an accuracy of 85%, 84%,
and 87% for positive, negative, and neutral emotion classes, respectively. Sheykhivand
et al. [22] used 12 channels of EEG signals for the automatic recognition of emotion. For
this purpose, these researchers used a combination of RNN and CNN for feature selec-
tion/extraction and classification. In their proposed model, they identified three different
states of emotion, including positive, negative, and neutral, using musical stimulation and
achieved 96% accuracy. Among the advantages of their model, the classification accuracy
was high, but the computational complexity can be considered as the disadvantage of this
research. Er et al. [23] presented a new model for automatic emotion recognition from EEG
signals. These researchers used transfer-learning networks, such as VGG16 and AlexNet,
in their proposed model. They achieved satisfactory results based on the VGG16 network
in order to classify four different basic emotional classes, including happy, relaxed, sad,
and angry. An advantage of this research is its low computational complexity; however, the low
classification accuracy can be considered a disadvantage. Zhao
et al. [24] presented a new model for automatic emotion recognition. Their model consisted
of two different parts. The first part consisted of a novel multi-feature fusion network that
used spatiotemporal neural network structures to learn spatiotemporal distinct emotional
information for emotion recognition. In this network, two common types of features, time
domain features (differential entropy and sample entropy) and frequency domain features
(power spectral density), were extracted. Then, in the second part, they were classified
into different classes by Softmax and SVM. These researchers used the DEAP dataset to
evaluate their proposed model and achieved promising results. However, computational
complexity can be considered a disadvantage of this research. Nandini et al. [25] used
multi-domain feature extraction and different time–frequency domain techniques and
wavelet-based atomic function to automatically detect emotions from EEG signals. These
researchers have used the DEAP database to evaluate their algorithm. In addition, they
used machine learning algorithms, such as Random Forest to classify the data and achieved
an average accuracy of 98%. Among the advantages of this research is the high classification
accuracy. Niu et al. [26] used a two-way deep residual neural network to classify discrete
emotions. At first, these researchers divided the EEG signal into five different frequency
bands using WT to enter the proposed network. In order to evaluate their algorithm,
they collected a dedicated database from seven participants. The classification accuracy
reported by these researchers was 94%. Among the problems of this research was the high
computational load. Vergan et al. [27] have used deep learning networks to classify three
and four emotional classes. These researchers used the CNN network to select and extract
features from EEG signals. In order to reduce the deep feature vector, the semi-supervised
dimensionality reduction method was used by these researchers. They used two databases,
DEAP and SEED, in order to evaluate their proposed method and achieved a high accuracy
of 90%. Hu et al. [28] used Feature Pyramid Network (FPN) to improve emotion recognition
performance based on EEG signals. In their proposed model, the Differential Entropy (DE)
of each recorded EEG channel was extracted as the main feature. These researchers used
SVM to score each class. The accuracy reported by these researchers in order to detect the
dimension of valence and arousal for the DEAP database was reported as 94% and 96%,
respectively. Among the advantages of this research is high classification accuracy. In addi-
tion, due to the computational complexity, their proposed model could not be implemented
on real-time systems, which can be considered a disadvantage of this research.
As reviewed and discussed, many studies have been conducted and organized for
automatic emotion recognition from EEG signals. However, these studies have limitations
and challenges. Most previous research has used manual and engineering features in
feature extraction/selection. Using manual features requires prior knowledge of the subject
and may not be optimal for another subject. In simpler terms, the use of engineering
features will not guarantee the optimality of the extracted feature vector. The next limitation
of previous research can be considered the absence of a comprehensive and up-to-date
database. Existing databases for emotion recognition are limited and are organized based
on visual stimulation and, thus, are not suitable for use in deep learning networks. It
can almost be said that there is no general and standard database based on auditory
stimulation. Many studies have used deep learning networks to detect emotions and
have achieved satisfactory results. However, due to computational complexity, these
studies cannot be implemented in real-time systems. Accordingly, this research tries to
overcome the mentioned limitations and present a new model with high reliability and
low computational complexity in order to achieve automatic emotion recognition. To this
end, a comprehensive database for emotion recognition based on musical stimuli has been
collected in the BCI laboratory of Tabriz University based on EEG signals in compliance
with the necessary standards. The proposed model is based on deep learning, which can
identify the optimal features from the alpha, beta, and gamma bands extracted from the
recorded EEG signal to hierarchically and end-to-end classify the basic emotions in two
different scenarios. The contribution of this study is organized as follows:
• Collecting a comprehensive database of emotion recognition using musical stimulation
based on EEG signals.
• Presenting an intelligent model based on deep learning in order to separate two and
three basic emotional classes.
• Using end-to-end deep neural networks, which eliminates the need for a separate feature
selection/extraction block.
• Providing an algorithm based on deep convolutional networks that can be resistant to
environmental noise to an acceptable extent.
• Presenting an automatic model that can classify two and three emotional classes with
the highest accuracy and the least error compared with previous research.
The remainder of the article is written and organized as follows: Section 2 is related
to materials and methods, and in this section, the method of data collection and the
mathematical background related to deep learning networks are described. Section 3 is
related to the proposed model, which describes the data preprocessing and the proposed
architecture. Section 4 presents the simulation results and compares the obtained results
with previous research. Section 5 discusses the applications related to the current research.
Finally, Section 6 addresses the conclusion.

2. Materials and Methods


In this section, firstly, the method of collecting the proposed database of EEG signals for
emotion recognition is described. Then, the mathematical background related to common
signal processing filters and DNNs is presented.

2.1. EEG Database Collection


In order to collect the database, EEG signals were used for emotion recognition from
11 volunteers (5 men and 6 women) in the age range of 18 to 32 years. All volunteers
present in the experiment were free of any underlying diseases. In addition, they read
and signed the written consent form to participate in the experiment. This experiment
was approved by the ethics committee of the university with license 1398.12.1 in the BCI
laboratory at Tabriz University. The volunteers were asked to avoid alcohol, medicine,
caffeine, and energy drinks 48 h before the test. In addition, volunteers were asked to take
a bath the day before the test and avoid using hair-softening shampoos. All the tests were
performed at 9 a.m. so that people would have enough energy. Before the experiment, the
popular Beck Depression Inventory (BDI) [29] was used to exclude from the experiment
volunteers who suffered from depression. After this test, the candidates who received a
score higher than 21, were deemed to have depression disorder and were excluded from
the test process. The reason for using this test is that depressive disorder causes a lack of
emotional induction in people. In addition, the SAM assessment test in paper form with
9 grades was used to control the dimension of valence and arousal [15]. In the relevant
test, a score lower than 3 and a score higher than 6 are considered low grade and high
grades, respectively.
In order to record EEG signals, a 21-channel Medicom device according to the
10–20 standard was used. Medicom is a Russian device for recording brain signals, which
is widely used in medical clinics and research centers due to its high performance. Silver
chloride electrodes, which were organized in the form of a hat, were used in this work.
Two electrodes, A1 and A2, were used to reference brain signals. Thus, out of 21 channels,
19 channels are actually available. To avoid EOG signal artifacts, volunteers were asked to
keep their eyes closed during EEG signal recording. The sampling frequency was 250 Hz
and an impedance matching of 10 kΩ was used on the electrodes. The recording mode of
the signals was also set as bipolar.
Electronics 2023, 12, 2232 5 of 21

Table 1 shows the details of the signal recording of the volunteers in the experiment
and the reason for removing some volunteers from the experiment process. To clarify the
reason for the exclusion of the volunteers in the experiment, Volunteer 6 was excluded
from the experiment due to the low level of positive emotional arousal (<6). In addition,
Volunteer 2 was excluded from the continuation of the signal recording process due to
depression disorder (BDI score of 22 > 21).
In order to arouse positive and negative emotions in the subjects, a musical stimulus was used in
this study. To this end, 10 pieces of music with happy and sad styles were played for the volunteers
through headphones. Each piece of music was played for 1 min, and the EEG signals of the subjects
were recorded. Between each piece of music, the volunteers were given 15 s of rest (neutral) in order
to prevent the carry-over of the induced emotion. An example of the recorded EEG signal for 3 different
emotional states is shown in Figure 1.

Figure 1. An example of EEG signal recorded from C4 and F4 channels for positive, negative, and
neutral emotions in Subject 1.

In this way, the signals recorded from the subjects for happy songs, sad songs, and the relaxation
state are labeled as positive emotion, negative emotion, and neutral emotion, respectively. Table 2
shows the Persian songs played for the subjects. Figure 2 shows the order of playing the music for
the subjects.


Table 1. Details related to recording the signal of volunteers in the experiment.

Subject: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Sex: M M F M M M M M M F F F F M F M
Age: 25 24 27 24 32 18 25 29 30 19 18 20 22 24 23 28
BDI: 16 22 19 4 0 11 13 19 20 14 22 12 0 12 1 9
Valence for P emotion: 9 6.8 6.2 7.4 5.8 5.6 7.2 7.8 7.4 6.8 7.8 8.6 6 8 - 7.4
Arousal for P emotion: 9 6.2 7.4 7.6 5 5.4 7.4 7.4 7 6.6 8 8.6 6 8 - 8
Valence for N emotion: 2 3.6 4.2 2.4 4.4 2 3.8 2.8 3.4 3.8 4.5 2 2 1.8 - 1.8
Arousal for N emotion: 1 2 4.6 2.6 5.6 1.6 3.8 3 5.4 3.2 3 1.2 1.2 1.6 - 2
Result of test: ACC REJ REJ ACC REJ REJ REJ ACC REJ REJ REJ ACC ACC ACC REJ ACC
Reason for rejection: - | Depressed (21 < 22) | Failure in the SAM test | - | Failure in the SAM test | Failure in the P emotion | Failure in the N emotion | - | Failure in the N emotion | Failure in the N emotion | Depressed (21 < 22) | - | - | - | Motion noise | -

Table 2. The music used in the experiment.

Emotion Sign and Music Number | The Type of Emotion Created in the Subject | The Name of the Music
N1 | Negative | Advance Income of Isfahan
P1 | Positive | Azari 6/8
N2 | Negative | Advance Income of Homayoun
P2 | Positive | Azari 6/8
P3 | Positive | Bandari 6/8
N3 | Negative | Afshari piece
N4 | Negative | Advance Income of Isfahan
P4 | Positive | Persian 6/8
N5 | Negative | Advance Income of Dashti
P5 | Positive | Bandari 6/8

Figure 2. Schedule of music played for the volunteers.


2.2. Signal Processing Filters


In the field of signal processing, many filtering algorithms are used for signal pre-
processing in order to remove motion and environmental noises and to reach the desired
frequency range. Among the most popular filters, we mention Notch [30] and Butterworth
filters [31], which are also used in this study. In the following section, the mathematical
details of each of these filters are examined.

2.2.1. Notch Filter


A Notch filter is a type of band-stop filter, which is a filter that attenuates frequencies
within a specific range while passing all other frequencies unaltered. For a Notch filter,
this range of frequencies is very narrow. The range of frequencies that a band-stop filter
attenuates is called the stopband. The narrow stopband in a Notch filter makes the fre-
quency response resemble a deep notch, which gives the filter its name. It also means that
Notch filters have a high Q factor, which is the ratio of center frequency to bandwidth.
Notch filters are used to remove a single frequency or a narrow band of frequencies. In
audio systems, a Notch filter can be used to remove interfering frequencies, such as power
line hum. Notch filters can also be used to remove a specific interfering frequency in
radio receivers and software-defined radio. The main application of the notch filter can be
considered to remove the frequency of 50 or 60 Hz of city electricity [30].
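As an illustration of this step, the following minimal Python sketch (not taken from the paper; it assumes SciPy, a 250 Hz sampling rate, and an arbitrarily chosen quality factor) designs a 50 Hz notch filter and applies it with zero-phase filtering:

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 250.0      # sampling frequency of the EEG recordings (Hz)
f0 = 50.0       # power-line frequency to suppress (Hz)
quality = 30.0  # quality factor: larger Q gives a narrower stopband

# Design the IIR notch and apply it with zero-phase filtering so that the
# EEG waveform is not shifted in time.
b, a = iirnotch(w0=f0, Q=quality, fs=fs)

def remove_line_noise(eeg_channel):
    """eeg_channel: 1-D NumPy array holding one EEG channel."""
    return filtfilt(b, a, eeg_channel)

# Quick check on a synthetic signal: 10 Hz activity plus 50 Hz hum.
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
cleaned = remove_line_noise(x)
```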

2.2.2. Butterworth Filter


Among the very popular filters for removing the frequency range above 100 Hz of
brain signals is the low-pass Butterworth filter [31]. The Butterworth filter is a type of
signal processing filter designed to have a frequency response that is as flat as possible in
the passband. It is also referred to as a maximally flat magnitude filter. One of the most
important features of this filter is the existence of a flat maximum frequency response in
the pass region and no ripple. In addition, its graph tends to a very good approximation
with a negative slope to negative infinity.
Butterworth showed that a low-pass filter could be designed whose cutoff frequency was
normalized to 1 radian per second; its frequency response can be defined by Equation (1):

$G(\omega) = \dfrac{1}{\sqrt{1 + \omega^{2n}}}$  (1)
where ω is the angular frequency in radians per second and n is the number of poles in the
filter—equal to the number of reactive elements in a passive filter. If ω = 1, the amplitude
response of this type of filter in the passband is $1/\sqrt{2} \approx 0.7071$, which is half power or
−3 dB [31].
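A short numerical check of Equation (1) — a sketch rather than code from the paper — confirms the −3 dB point at the normalized cutoff and the steeper roll-off obtained with higher orders:

```python
import numpy as np

def butterworth_gain(omega, n):
    """Magnitude response of Equation (1) for the normalized low-pass Butterworth filter."""
    return 1.0 / np.sqrt(1.0 + omega ** (2 * n))

for n in (1, 2, 4):
    gain_at_cutoff = butterworth_gain(1.0, n)   # always 1/sqrt(2), i.e. about -3 dB
    gain_one_octave = butterworth_gain(2.0, n)  # attenuation above cutoff grows with the order n
    print(n, 20 * np.log10(gain_at_cutoff), 20 * np.log10(gain_one_octave))
```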

2.3. Brief Description of Convolutional Neural Networks Model


Deep learning can be considered as a subset of Machine Learning (ML), which has
been widely used in the previous years in all subjects, including medicine, agriculture,
industries, and engineering. CNNs can be considered as the main part of deep learning
networks. CNNs have been shown to be a highly successful replacement for traditional
neural networks in the development of machine learning classification algorithms. CNN
learns in two stages: feed forward and reverse propagation. In general, DCNN is composed
of three major layers: convolutional, pooling, and connected layers [32]. The output
of a convolutional layer is referred to as feature mapping. The max-pooling layer is
typically employed after each convolutional layer and chooses just the maximum values
in each feature map. A dropout layer is employed to prevent overfitting; hence, each
neuron is thrown out of the network at each stage of training with a probability. A Batch
Normalization (BN) layer is commonly used to normalise data within a network and
Electronics 2023, 12, 2232 9 of 21

expedites network training. BN is applied to the neurons’ output just before applying the
activation function. Usually, a neuron without BN is computed as follows:

$z = g(w, x) + b; \quad a = f(z)$  (2)

where g() is the linear transformation of the neuron, w is the weight of the neuron, b is the
bias of the neurons, and f () is the activation function. The model learns the parameters w
and b. By adding the BN, the equation can be defined as follows:

$z = g(w, x); \quad z_N = \dfrac{z - m_z}{s_z}\,\gamma + \beta; \quad a = f(z_N)$  (3)

where zN is the output of BN, mz is the mean of the neurons’ output, Sz is the standard
deviation of the output of the neurons, and γ and β are learning parameters of BN [33].
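The batch-normalized neuron of Equation (3) can be sketched in NumPy as follows; the small eps term is a standard numerical-stability addition that is not part of the equation, and the toy batch is hypothetical:

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Forward pass of Equation (3) for a mini-batch of pre-activations z with
    shape (batch, features); gamma and beta are the learned BN parameters."""
    m_z = z.mean(axis=0)   # mean of the neurons' output over the batch
    s_z = z.std(axis=0)    # standard deviation of the neurons' output
    return (z - m_z) / (s_z + eps) * gamma + beta

rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=2.0, size=(10, 4))   # toy pre-activations
z_n = batch_norm(z, gamma=np.ones(4), beta=np.zeros(4))
a = np.maximum(z_n, 0.0)                           # activation applied after BN, as described above
```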
One of the most significant components of deep learning is the performance of ac-
tivation functions because activation functions play an important part in the learning
process. An activation function is used after each convolutional layer. Various activation
functions, such as ReLU, Leaky-ReLU, ELU, and Softmax, are available to increase learning
performance on DNN networks. Since the discovery of the ReLU activation function,
which is presently the most often used activation unit, DNNs have come a long way. The
ReLU activation function overcomes the gradient removal problem while simultaneously
boosting learning performance. The ReLU activation function is described as follows [32]:

$q(f) = \begin{cases} f & \text{if } f > 0 \\ 0 & \text{otherwise} \end{cases}$  (4)

The Softmax activation function is defined as follows [26]:

$\sigma(d)_n = \dfrac{e^{d_n}}{\sum_{j=1}^{k} e^{d_j}}$ for $n = 1, \ldots, k$, $d = (d_1, \ldots, d_k) \in \mathbb{R}^k$  (5)

where d is the input vector, the output values σ (d) are between 0 and 1, and their sum is
equal to 1.
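For concreteness, Equations (4) and (5) can be written directly in NumPy; the logits below are hypothetical values, not outputs of the proposed network:

```python
import numpy as np

def relu(f):
    """Equation (4): keep positive values, zero out the rest."""
    return np.where(f > 0, f, 0.0)

def softmax(d):
    """Equation (5): exponentiate and normalize so the outputs sum to 1."""
    e = np.exp(d - np.max(d))   # subtracting the max is a standard numerical-stability trick
    return e / e.sum()

scores = np.array([1.5, -0.3, 0.2])            # hypothetical logits for three emotion classes
print(relu(scores))                            # [1.5 0.  0.2]
print(softmax(scores), softmax(scores).sum())  # class probabilities summing to 1
```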
In the prediction step of deep models, a loss function is utilized to learn the error ratio.
In machine learning approaches, the loss function is a method of evaluating and describing
model efficiency. An optimization approach is then used to minimize the error criterion.
Indeed, the results of optimization are used to update hyper parameters [33].

3. Proposed Deep Model


In this section, the proposed deep model, which includes pre-processing of recorded
data, deep network architecture design, parameter optimization, and training and evalu-
ation sets, is fully described. Figure 3 shows the main framework of the proposed deep
model for automatic emotion recognition from EEG signals based on musical stimulation.

3.1. Data Pre-Processing


In this section, the method of preprocessing the recorded data, which includes filtering
and segmentation, is examined. Brain signals are strongly affected by noise, and it is
necessary to remove environmental and motion noises using different filtering algorithms.
For this purpose, two filtering algorithms were used in this work to remove artifacts. First,
a Notch filter was applied to the recorded EEG signals in order to eliminate the frequency
of 50 Hz. Then, considering that emotional arousal occurs only between the ranges of 0.5 to
45 Hz [22,34], a first-order Butterworth filter with a frequency of 0.5 to 45 Hz was used on
the EEG data.
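A minimal sketch of this two-stage filtering, assuming SciPy and a (channels × samples) array; the quality factor and the zero-phase filtfilt application are implementation choices that the paper does not specify:

```python
import numpy as np
from scipy.signal import iirnotch, butter, filtfilt

FS = 250.0  # sampling frequency of the recordings (Hz)

def preprocess(eeg):
    """eeg: array of shape (n_channels, n_samples) holding the raw recording.
    Returns the signal after 50 Hz notch and 0.5-45 Hz Butterworth filtering."""
    # 1) Notch filter to suppress the 50 Hz power-line component.
    b_n, a_n = iirnotch(w0=50.0, Q=30.0, fs=FS)
    eeg = filtfilt(b_n, a_n, eeg, axis=-1)
    # 2) First-order Butterworth band-pass covering the 0.5-45 Hz range of emotional arousal.
    b_b, a_b = butter(N=1, Wn=[0.5, 45.0], btype="bandpass", fs=FS)
    return filtfilt(b_b, a_b, eeg, axis=-1)

raw = np.random.randn(19, 60 * int(FS))  # placeholder for a 1 min, 19-channel recording
filtered = preprocess(raw)
```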

Figure 3. The main framework of the proposed deep model.

As is clear, the use of all EEG channels will increase the computational load due to the increase
in the dimensions of the feature matrix. Therefore, it is necessary to identify active channels.
Emotion-related EEG electrodes were distributed mainly in the prefrontal lobe, temporal lobe margin,
and posterior occipital lobe [35]. These regions are precisely in line with the physiological
principle of emotion generation. By selecting the electrode distribution, the extracted feature
dimension can be greatly reduced, the complexity of calculation can be diminished, and the experiment
is more straightforward and easier to carry out. Based on this, only the electrodes of the prefrontal
lobe, temporal lobe margin, and posterior occipital lobe, which include Pz, T3, C3, C4, T4, F7, F3,
Fz, F4, F8, Fp1, and Fp2, were used for processing [35].
We selected 5 min (5 × 60 s = 300 s) of the signals recorded from the electrodes for
each positive and negative class. Considering that the sampling frequency was 250 Hz,
we had 75,000 available samples for each class. In the next step, 3 frequency bands α, β,
and γ were extracted from the data using 8th Daubechies WT [36]. For the first subject
and the positive emotional state, these frequency bands for one segment are presented in
Figure 4. Then, in order to avoid the phenomenon of overfitting, overlapping operations
were performed on the data obtained from the selected electrodes according to Figure 5.
According to this operation for the two-class scenario, the recorded data from the selected
electrodes was divided into 540 samples of 8 s each. All the mentioned steps were repeated
for the second scenario with the difference in data length. Finally, in this research, the input
data for both scenarios were applied to the proposed network in the form of images. Thus,
the input data for the first and second scenarios were equal to (7560) × (36 × 2000 × 1) and
(11340) × (36 × 2000 × 1), respectively. The input images from the extracted frequency
bands are shown in Figure 6.
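The band extraction and overlapping segmentation described above can be sketched as follows. This is illustrative only: the overlap step, the exact wavelet levels kept, and the mapping of the 36 image rows (presumably 12 electrodes × 3 bands) are assumptions, and PyWavelets is used for the Daubechies-8 decomposition:

```python
import numpy as np
import pywt

FS = 250  # sampling frequency (Hz)

def wavelet_band(signal, keep_index, wavelet="db8", level=5):
    """Reconstruct the part of `signal` carried by one coefficient set of a 5-level
    Daubechies-8 decomposition. At 250 Hz the coefficient list is
    [cA5, cD5, cD4, cD3, cD2, cD1]; indices 2, 3, and 4 (cD4, cD3, cD2)
    roughly cover the alpha, beta, and gamma ranges, respectively."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    kept = [c if i == keep_index else np.zeros_like(c) for i, c in enumerate(coeffs)]
    return pywt.waverec(kept, wavelet)[: len(signal)]

def overlapping_segments(signal, win_s=8, step_s=4):
    """Cut a 1-D signal into overlapping windows; 8 s windows (2000 samples) with a
    50% overlap are used here purely as an illustrative choice."""
    win, step = int(win_s * FS), int(step_s * FS)
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

x = np.random.randn(300 * FS)                  # 5 min of one filtered EEG channel
gamma_band = wavelet_band(x, keep_index=4)     # approx. 31-62 Hz detail band
segments = overlapping_segments(gamma_band)    # shape: (n_segments, 2000)
```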

Figure 4. Extracted frequency bands for a segment for the positive emotional state of the first subject.
Figure 5. Overlap operation performed on the EEG signal for each electrode in positive, negative, and
neutral emotion.

Figure 6. Input images formed for positive, negative, and neutral emotion for the first subject based
on the extracted α, β, and γ frequency bands.


3.2. Deep Architectural Details


For the proposed deep network architecture, a combination of six convolutional layers
along with two fully connected layers was used. The order of the layers was as follows:
I. One drop-out layer.
II. A 2D Convolution layer with the Leaky-ReLu nonlinear function and a Max-Pooling
layer with Batch Normalization are added.
III. The architecture of the previous stage is repeated three more times.
IV. A Convolution 2D layer is added with the Leaky-ReLu nonlinear function along with
the Batch Normalization.
V. The architecture of the previous stage is repeated one more time.
VI. The output of the previous architecture is connected to the two Fully Connected layers,
which are used in the last layer of the Softmax function to access the outputs and
emotion recognition.
The graphic representation of the mentioned proposed architecture is shown in
Figure 7.

Figure 7. Graphical details of the designed network architecture along with the sizes of filters, layers, etc.

The hyper-parameters related to the proposed model were carefully adjusted in order to achieve
the best efficiency and convergence by the trial-and-error method. Accordingly, the Cross-Entropy
objective function and the RMSprop optimizer with a learning rate of 0.001 and a batch size of 10
were selected. More details related to the size of the filters, the number of steps, and the type of
layers used are shown in Table 3.
In the design of the proposed architecture, we attempted to take into account the best dimensions
and strides of the filters, the optimizers, etc., so that the proposed model could perform best in
terms of different evaluation criteria for emotion recognition. Table 4 shows the numbers of different
layers of the network, the different types of optimizers, the sizes of filters and steps, etc., which
were considered in choosing the optimal configuration of the proposed architecture. According to
Table 4, the best possible state of the examined parameters was selected in the architecture of the
proposed model.

Table 3. The details of the network architecture, including the size of the filters, the number of layers,
and the type of layers.

Padding | Number of Filters | Strides | Size of Kernel and Pooling | Output Shape | Activation Function | Layer Type | L
Yes | 16 | 2 | 128 × 128 | (None, 18, 1000, 16) | Leaky ReLU | Convolution 2-D | 0–1
No | - | 2 | 2 × 2 | (None, 9, 500, 16) | - | Max-Pooling 2-D | 1–2
Yes | 32 | 1 | 3 × 3 | (None, 9, 500, 32) | Leaky ReLU | Convolution 2-D | 2–3
No | - | 2 | 2 × 2 | (None, 4, 250, 32) | - | Max-Pooling 2-D | 3–4
Yes | 32 | 1 | 3 × 3 | (None, 4, 250, 32) | Leaky ReLU | Convolution 2-D | 4–5
No | - | 2 | 2 × 2 | (None, 2, 125, 32) | - | Max-Pooling 2-D | 5–6
Yes | 32 | 1 | 3 × 3 | (None, 2, 125, 32) | Leaky ReLU | Convolution 2-D | 6–7
No | - | 2 | 2 × 2 | (None, 1, 62, 32) | - | Max-Pooling 2-D | 7–8
Yes | 32 | 1 | 3 × 3 | (None, 1, 62, 32) | Leaky ReLU | Convolution 2-D | 8–9
Yes | 16 | 1 | 3 × 3 | (None, 1, 62, 16) | Leaky ReLU | Convolution 2-D | 10–11
- | - | - | - | (None, 100) | Leaky ReLU | FC | 11–12
- | - | - | - | (None, 2–3) | Softmax | FC | 12–13

Table 4. The details of the designed network architecture, including the size of the filters, the number
of layers, and the type of layers.

Parameters | Search Space | Optimal Value
Optimizer | RMSProp, Adam, Sgd, Adamax, and Adadelta | RMSProp
Cost function | MSE, Cross-entropy | Cross-Entropy
Convolution layers | 3, 5, 6, 11, 15 | 6
Number of filters in the first convolution layer | 16, 32, 64, 128 | 16
Number of filters in the second convolution layer | 16, 32, 64, 128 | 32
Number of filters in the other convolution layers | 16, 32, 64, 128 | 32
Size of filter in the first convolution layer | 3, 16, 32, 64, 128 | 128
Size of filter in the other convolution layers | 3, 16, 32, 64, 128 | 3
Dropout rate before the first convolution layer | 0, 0.2, 0.3, 0.4, 0.5 | 0.3
Dropout rate after the first convolution layer | 0, 0.2, 0.3, 0.4, 0.5 | 0.3
Batch size | 4, 8, 10, 16, 32, 64 | 10
Learning rate | 0.01, 0.001, 0.0001 | 0.001
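A Keras sketch that is consistent with Table 3 and the selected hyper-parameters is given below; the placement of Batch Normalization and drop-out, the padding mode, and other minor details are inferred rather than taken from the authors' code, so the exact implementation may differ:

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_block(x, filters, kernel, strides=1, pool=True):
    """Conv2D -> BatchNorm -> LeakyReLU, optionally followed by 2 x 2 max-pooling."""
    x = layers.Conv2D(filters, kernel, strides=strides, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    if pool:
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    return x

def build_model(n_classes=3, input_shape=(36, 2000, 1)):
    inputs = keras.Input(shape=input_shape)
    x = layers.Dropout(0.3)(inputs)                # drop-out before the first convolution
    x = conv_block(x, 16, (128, 128), strides=2)   # large filters in the initial layer
    x = layers.Dropout(0.3)(x)                     # drop-out after the first convolution stage
    x = conv_block(x, 32, (3, 3))
    x = conv_block(x, 32, (3, 3))
    x = conv_block(x, 32, (3, 3))
    x = conv_block(x, 32, (3, 3), pool=False)      # the last two convolutions have no pooling
    x = conv_block(x, 16, (3, 3), pool=False)
    x = layers.Flatten()(x)
    x = layers.Dense(100)(x)
    x = layers.LeakyReLU()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_model(n_classes=3)   # use n_classes=2 for the two-class scenario
model.summary()
```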

The number of divided samples for each of the training, test, and validation sets are
examined in this section. Based on this, the total number of samples in this study for the
first and second scenarios was 7560 and 11,340, respectively, of which 70% were randomly
selected for the training set (5292 samples for the two-class state and 7938 samples for
the three-class state), 10% of the dataset was selected for the validation set (756 samples
for the two class state and 1134 samples for the three class state), and 20% of the dataset
(1512 samples for the two class state and 2268 samples for the three class state) was selected
for the test set. The collection related to model training and evaluation is shown in Figure 8.
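Assuming the segmented images and one-hot labels are held in arrays X and y, and that model is the network sketched after Table 4, the split and training configuration described above can be written as follows (scikit-learn's train_test_split is used here only for convenience):

```python
from sklearn.model_selection import train_test_split
from tensorflow import keras

# X: array of shape (n_samples, 36, 2000, 1); y: one-hot labels with 2 or 3 columns.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=2 / 3, random_state=42)
# -> 70% training, 10% validation, 20% test

model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200, batch_size=10)

test_loss, test_acc = model.evaluate(X_test, y_test)
```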

Figure 8. Dataset distribution along the data recognition process.
4. Experimental Results
The Python programming language with the Keras and TensorFlow libraries was used to simulate the
proposed deep model. All simulation results were obtained on a computer system with 16 GB RAM, a
2.8 GHz CPU, and a 2040 GPU.
Figure 9 depicts the classification error and accuracy graphs for the various scenarios for the
training and validation data over 200 network iterations. Figure 9a shows that the network error for
the two-class state reached a stable state, at the 165th iteration, as the algorithm iterations
increased. Figure 9 shows that after 200 repetitions, the proposed method for
emotion recognition achieved 98% and 96% accuracy in the two-class and the three-class
states, respectively. Figure 10 depicts the confusion matrix used to classify the scenarios
under consideration. According to Figure 10, the proposed deep network’s performance
is very promising. Table 5 also shows the values of accuracy, sensitivity, specificity, and
precision for each emotion in different scenarios. As can be seen, all the values obtained for
each class of the two considered scenarios were greater than 90%. A visualization of the
samples before and after entering the network was considered to demonstrate the more
accurate performance of the proposed network. Figure 11 shows a TSen diagram with
this visualization. As can be seen, the proposed model successfully separated the samples
related to each emotion in each scenario. This positive outcome is due to the proposed
improved CNN architecture. Figure 12 depicts the Receiver Operating Characteristic
Curve (ROC) analysis for different scenario classifications. An ROC is a graphical plot
that illustrates the diagnostic ability of a classifier system as its discrimination threshold
is varied. In this diagram, the farther the curve is from the bisector and the closer it
is to the vertical line on the left, the better the performance of the classifier. As is well
known, each emotion class in the ROC analysis has a score between 0.9 and 1, indicating
excellent classification performance. Based on the results, it is possible to conclude that
the proposed deep model for classifying different emotional classes was very efficient and
met the relevant expectations. However, in order to conduct further analysis, the obtained
results must be compared with other studies. The findings will be compared with other
previous studies and methods for this purpose.
Figure 9. The proposed deep model’s performance in classifying different scenarios (a,b) in terms of
accuracy and classification error in 200 iterations.

Figure 10. Confusion matrix for classifying various scenarios.

Figure 11. Visualization of test samples before and after entering the proposed deep model for
different scenarios.

Figure 12. ROC analysis to classify different scenarios.
Table 6 compares previous studies, as well as the methods employed in each study, with the proposed
improved deep model. As shown in Table 6, the proposed method achieved the highest accuracy when
compared with previous works. However, this comparison does not appear to be fair because the
databases under consideration are not identical. As a result, to use the proposed recorded database,
it is necessary to simulate and evaluate the methods used in prior research.

Table 5. The accuracy, sensitivity, specificity, and precision achieved by each class for different scenarios.

First Scenario (P and N) Positive Negative


Sensitivity 98.2 98.2
Accuracy 98.8 97.6
Specificity 97.6 98.8
Precision 97.6 98.8
Second Scenario Positive Negative Neutral
Sensitivity 97.8 97.5 96.7
Accuracy 95.7 95.3 97.2
Specificity 98.9 98.6 96.5
Precision 97.9 97.4 93
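The per-class values in Table 5 follow from the confusion matrices of Figure 10; the sketch below, using a hypothetical confusion matrix rather than the paper's actual counts, shows how such metrics are computed:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class sensitivity, specificity, precision, and accuracy from a
    confusion matrix cm whose rows are true classes and columns predictions."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision":   tp / (tp + fp),
            "accuracy":    (tp + tn) / cm.sum()}

# Hypothetical 3-class confusion matrix (positive, negative, neutral):
print(per_class_metrics([[730, 10, 16], [12, 740, 4], [9, 10, 737]]))
```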

Table 6. Comparing the performance of prior research with the proposed model.

Study | Stimulus | Methods | Number of Emotions Considered | ACC%
Zhao et al. [37] | Music | Deep local domain | 4 | 89
Chanel et al. [38] | Video Games | Frequency bands extraction | 3 | 63
Jirayucharoensak et al. [39] | Video Clip | Principal component analysis | 3 | 49.52
Er et al. [23] | Music | VGG16 | 4 | 74
Sheykhivand et al. [22] | Music | CNN-LSTM | 3 | 96
Hou et al. [28] | Video Clip | FPN+SVM | 4 | 95.50
Proposed model | Music | Customized CNN | 3 | 98

To more accurately evaluate the proposed model, the deep network architecture
presented in this study was compared with other common methods and previous research
used for automatic emotion recognition. In this regard, two methods based on raw signal
feature learning and engineering feature extraction (manual) were used along with MLP
classifiers, CNN-1D, CNN-LSTM (1D), and the proposed CNN-2D model. The gamma band
was extracted from the recorded EEG signals for engineering features (using a 5-level Daubechies
WT). From the obtained gamma band, two Root Mean Square (RMS) and Standard Deviation
(SD) features were extracted. Based on this, the input dimensions for the first and second
scenarios were (2 × 7 × 540) × (e × 2) and (3 × 7 × 540) × (e × 2), respectively. Following that,
MLP, CNN-1D, CNN-LSTM (1D), and proposed CNN-2D networks were used to classify
the extracted feature vector. The raw signals were classified using expressed networks for
feature learning, with no manual feature extraction or selection. The MLP network had two
fully connected layers, the last of which had two neurons (for the two-class state) and three
neurons (for the three-class state). Following [22], CNN-1D and CNN-LSTM (1D) network
architectures were considered. To improve the performance of the expressed networks,
their hyperparameters were adjusted on the basis of the type of data. The results of this
comparison are shown in Table 7 and Figure 13. According to Table 7, feature learning from
raw signals for CNN-1D, CNN-LSTM (1D), and proposed CNN-2D deep networks were
continually improved, and these networks could learn important features layer by layer,
resulting in two-class and three-class scenarios with accuracy greater than 90%. On the
contrary, as can be seen from the engineering features used as input in CNN-1D, CNN-
LSTM (1D), and CNN-2D deep networks, these networks did not improve recognition.
When feature learning and engineering features were compared, feature learning from raw data with
CNN-1D, CNN-LSTM (1D), and CNN-2D deep networks outperformed engineering features.

Table 7. Comparing the performance of different models with different learning methods.

Model | Feature Learning (First Scenario) | Feature Learning (Second Scenario) | Eng. Features (First Scenario) | Eng. Features (Second Scenario)
MLP | 75% | 70% | 79% | 74%
1D-CNN | 94% | 90% | 82% | 76%
CNN-LSTM | 97% | 95% | 80% | 77%
2D-CNN | 98% | 96% | 81% | 75%

Figure 13. Bar diagram comparing different models with different learning methods.

This result is related to these networks’ distinct architecture, which can automatically extract
useful features from raw data for classification. Furthermore, obtaining engineering features
necessitates expertise and prior knowledge, whereas learning features from raw data requires less
specialized knowledge. While CNN-1D, CNN-LSTM (1D), and the proposed CNN-2D deep networks perform
better when learning features from raw data, all investigated models, including CNN-1D, CNN-LSTM (1D),
CNN-2D, and MLP networks, performed nearly identically when learning engineering features. This
demonstrates that deep networks cannot outperform traditional methods in emotion recognition without
feature learning ability.

The nature of brain signals indicates that they have a low signal-to-noise ratio (SNR) and are
highly sensitive to noise. This issue may make the classification of the different classes difficult.
As a result, it is necessary to design the proposed network so that it classifies different emotions
in a way that is less sensitive to environmental noise. Accordingly, in this study, we artificially
tested the performance of the proposed model in noisy environments. Gaussian white noise with
different SNRs was added to the data for this purpose. Figure 14 depicts the classification results
in noisy environments obtained using the proposed model. As can be seen, the proposed customized
deep model has high noise resistance compared with the other networks. This is related to the
personalized architecture (the use of large filter dimensions in the initial layer of the network and
tuned filter dimensions in the middle layers).
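A sketch of this noise-injection test under stated assumptions (X_test, y_test, and model refer to the earlier sketches, and the SNR grid is illustrative):

```python
import numpy as np

def add_awgn(signal, snr_db):
    """Return `signal` corrupted by additive white Gaussian noise at the given SNR (dB)."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# X_test, y_test, and model come from the earlier sketches; the SNR grid is illustrative.
for snr in (-4, 0, 4, 8, 12):
    noisy = add_awgn(X_test, snr)
    _, acc = model.evaluate(noisy, y_test, verbose=0)
    print(f"SNR = {snr:+d} dB -> accuracy = {acc:.3f}")
```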

Figure 14. Bar diagram comparing different models with different learning methods.

Despite its positive results, this study, like all others, has benefits and drawbacks. One of this
study’s limitations is the small number of emotional classes. To that end, the number of emotional
classes in the collected database should be increased. To address existing uncertainties, we plan to
use a Generative Adversarial Network (GAN) instead of classical data augmentation and Type 2 Fuzzy
Networks in conjunction with CNN in the future. The proposed architecture is suitable for real-time
applications due to its simple and end-to-end architecture.
and end-to-end architecture.
5. Discussion
In this section, the possible applications of the present study are reviewed along with
5. Discussion
the practical implications of the developed emotion recognition methodology for the new
the practical implications of the developed emotion recognition methodology for the new
Society 5.0 paradigm.
Providing machines with emotional intelligence in order to improve the intuitiveness, authenticity, and naturalness of the connection between humans and robots is an exciting problem in the field of human–robot interaction. A key component in
doing this is the robot’s capacity to deduce and comprehend human emotions. Emotions,
as previously noted, are vital to the human experience and influence behavior. They are
fundamental to communication, and effective relationships depend on having emotional
intelligence or the capacity to recognize, control, and use one’s emotions. The goal of
affective computing is to provide robots with emotional intelligence to enhance regular
human–machine interaction (HMI). It is envisaged that BCI would enable robots to exhibit human-like observation, interpretation, and emotional expression skills. The following are the three primary perspectives that have been used to analyze emotions [40]:
a. Formalization of the robot’s internal emotional state: Adding emotional character-
istics to agents and robots can increase their efficacy, adaptability, and plausibility.
Determining neurocomputational models, formalizing them in already-existing cog-
nitive architectures, modifying well-known cognitive models, or designing special-
ized emotional architectures has, thus, been the focus of robot design in recent years.
b. Robotic emotional expression: In situations requiring complicated social interac-
tion, such as assistive, educational, and social robotics, the capacity of robots to
display recognizable emotional expressions has a significant influence on the social
interaction that results.
c. Robots’ capacity to discern human emotional state: Interacting with people would
be improved if robots could discern and comprehend human emotions.

Given the favorable performance achieved in the present study, the proposed model can be used in each of the cases discussed above.

6. Conclusions
In this paper, a new model for automatic emotion recognition using EEG signals was
developed. For this purpose, a standard EEG database was collected under musical stimulation to recognize three classes of positive, negative, and neutral emotions. A Deep
Learning model based on two-dimensional CNN networks was also customized for feature
selection/extraction and classification operations. The proposed network, which included
six convolutional layers and two fully connected layers, could classify the emotions in the two scenarios (two-class and three-class) with 98% and 96% accuracy, respectively. Furthermore, the architecture suggested in this study was tested in a noisy environment and yielded acceptable results across a wide range of SNRs; even at −4 dB, a classification accuracy of greater than 90%
was maintained. In addition, the proposed method was compared with previous methods
and studies in terms of different measuring criteria, and it had a promising performance.
According to the favorable results of the proposed model, it can be used in real-time
emotion recognition based on BCI systems.
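As a rough illustration of the network summarized above (six convolutional layers followed by two fully connected layers, with a larger filter in the initial layer than in the middle layers), a Keras sketch is given below. The kernel sizes, channel counts, input shape, and training settings are assumptions chosen for illustration; they are not the exact values used in the proposed model.

```python
from tensorflow.keras import layers, models

def build_emotion_cnn(n_channels=14, n_samples=512, n_classes=3):
    """Illustrative 2D CNN: six convolutional layers plus two fully connected
    layers, treating the EEG segment as a (channels x samples x 1) image."""
    inputs = layers.Input(shape=(n_channels, n_samples, 1))
    x = layers.Conv2D(16, (1, 64), padding="same", activation="relu")(inputs)      # large temporal filter in the first layer
    x = layers.MaxPooling2D((1, 4))(x)
    for filters, kernel in [(32, (1, 16)), (32, (1, 8)), (64, (1, 8)), (64, (1, 4)), (128, (1, 4))]:
        x = layers.Conv2D(filters, kernel, padding="same", activation="relu")(x)   # smaller tuned filters in the middle layers
        x = layers.MaxPooling2D((1, 2))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)                    # fully connected layer 1
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)     # fully connected layer 2 (classifier)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```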

Author Contributions: Conceptualization, F.B.; methodology, S.S. and S.D.; software, A.F. and F.B.;
validation, S.D. and S.S.; writing—original draft preparation, F.B. and A.F. All authors have read and
agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data related to this article are publicly available on the GitHub
platform under the title Baradaran emotion dataset.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Alswaidan, N.; Menai, M.E.B. A survey of state-of-the-art approaches for emotion recognition in text. Knowl. Inf. Syst. 2020, 62,
2937–2987. [CrossRef]
2. Sheykhivand, S.; Rezaii, T.Y.; Meshgini, S.; Makoui, S.; Farzamnia, A. Developing a deep neural network for driver fatigue
detection using EEG signals based on compressed sensing. Sustainability 2022, 14, 2941. [CrossRef]
3. Sheykhivand, S.; Meshgini, S.; Mousavi, Z. Automatic detection of various epileptic seizures from EEG signal using deep learning
networks. Comput. Intell. Electr. Eng. 2020, 11, 1–12.
4. Dzedzickis, A.; Kaklauskas, A.; Bucinskas, V. Human emotion recognition: Review of sensors and methods. Sensors 2020, 20, 592.
[CrossRef]
5. Egger, M.; Ley, M.; Hanke, S. Emotion recognition from physiological signal analysis: A review. Electron. Notes Theor. Comput. Sci.
2019, 343, 35–55. [CrossRef]
6. Jain, M.; Narayan, S.; Balaji, P.; Bhowmick, A.; Muthu, R.K. Speech emotion recognition using support vector machine. arXiv
2020, arXiv:2002.07590.
7. Khalil, R.A.; Jones, E.; Babar, M.I.; Jan, T.; Zafar, M.H.; Alhussain, T. Speech emotion recognition using deep learning techniques:
A review. IEEE Access 2019, 7, 117327–117345. [CrossRef]
8. Ko, B.C. A brief review of facial emotion recognition based on visual information. Sensors 2018, 18, 401. [CrossRef]
9. Lee, J.; Kim, S.; Kim, S.; Park, J.; Sohn, K. Context-aware emotion recognition networks. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10143–10152.
10. Li, X.; Song, D.; Zhang, P.; Zhang, Y.; Hou, Y.; Hu, B. Exploring EEG features in cross-subject emotion recognition. Front. Neurosci.
2018, 12, 162. [CrossRef]
11. Liu, Z.-T.; Xie, Q.; Wu, M.; Cao, W.-H.; Mei, Y.; Mao, J.-W. Speech emotion recognition based on an improved brain emotion
learning model. Neurocomputing 2018, 309, 145–156. [CrossRef]
12. Poria, S.; Majumder, N.; Mihalcea, R.; Hovy, E. Emotion recognition in conversation: Research challenges, datasets, and recent
advances. IEEE Access 2019, 7, 100943–100953. [CrossRef]
13. Shu, L.; Xie, J.; Yang, M.; Li, Z.; Li, Z.; Liao, D.; Xu, X.; Yang, X. A review of emotion recognition using physiological signals.
Sensors 2018, 18, 2074. [CrossRef] [PubMed]
14. Swain, M.; Routray, A.; Kabisatpathy, P. Databases, features and classifiers for speech emotion recognition: A review. Int. J. Speech
Technol. 2018, 21, 93–120. [CrossRef]
15. Zhang, T.; Zheng, W.; Cui, Z.; Zong, Y.; Li, Y. Spatial–temporal recurrent neural network for emotion recognition. IEEE Trans.
Cybern. 2018, 49, 839–847. [CrossRef] [PubMed]

16. Li, X.; Hu, B.; Sun, S.; Cai, H. EEG-based mild depressive detection using feature selection methods and classifiers. Comput.
Methods Programs Biomed. 2016, 136, 151–161. [CrossRef]
17. Hou, Y.; Chen, S. Distinguishing different emotions evoked by music via electroencephalographic signals. Comput. Intell. Neurosci.
2019, 2019, 3191903. [CrossRef]
18. Hasanzadeh, F.; Annabestani, M.; Moghimi, S. Continuous emotion recognition during music listening using EEG signals: A
fuzzy parallel cascades model. Appl. Soft Comput. 2021, 101, 107028. [CrossRef]
19. Keelawat, P.; Thammasan, N.; Numao, M.; Kijsirikul, B. Spatiotemporal emotion recognition using deep CNN based on EEG
during music listening. arXiv 2019, arXiv:1910.09719.
20. Chen, J.; Jiang, D.; Zhang, Y.; Zhang, P. Emotion recognition from spatiotemporal EEG representations with hybrid convolutional
recurrent neural networks via wearable multi-channel headset. Comput. Commun. 2020, 154, 58–65. [CrossRef]
21. Wei, C.; Chen, L.-L.; Song, Z.-Z.; Lou, X.-G.; Li, D.-D. EEG-based emotion recognition using simple recurrent units network and
ensemble learning. Biomed. Signal Process. Control 2020, 58, 101756. [CrossRef]
22. Sheykhivand, S.; Mousavi, Z.; Rezaii, T.Y.; Farzamnia, A. Recognizing emotions evoked by music using CNN-LSTM networks on
EEG signals. IEEE Access 2020, 8, 139332–139345. [CrossRef]
23. Er, M.B.; Çiğ, H.; Aydilek, İ.B. A new approach to recognition of human emotions using brain signals and music stimuli. Appl.
Acoust. 2021, 175, 107840. [CrossRef]
24. Gao, Q.; Yang, Y.; Kang, Q.; Tian, Z.; Song, Y. EEG-based emotion recognition with feature fusion networks. Int. J. Mach. Learn.
Cybern. 2022, 13, 421–429. [CrossRef]
25. Nandini, D.; Yadav, J.; Rani, A.; Singh, V. Design of subject independent 3D VAD emotion detection system using EEG signals
and machine learning algorithms. Biomed. Signal Process. Control 2023, 85, 104894. [CrossRef]
26. Niu, W.; Ma, C.; Sun, X.; Li, M.; Gao, Z. A Brain Network Analysis-Based Double Way Deep Neural Network for Emotion
Recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 917–925. [CrossRef] [PubMed]
27. Zali-Vargahan, B.; Charmin, A.; Kalbkhani, H.; Barghandan, S. Deep time-frequency features and semi-supervised dimension
reduction for subject-independent emotion recognition from multi-channel EEG signals. Biomed. Signal Process. Control 2023, 85,
104806. [CrossRef]
28. Hou, F.; Gao, Q.; Song, Y.; Wang, Z.; Bai, Z.; Yang, Y.; Tian, Z. Deep feature pyramid network for EEG emotion recognition.
Measurement 2022, 201, 111724. [CrossRef]
29. Smarr, K.L.; Keefer, A.L. Measures of depression and depressive symptoms: Beck depression Inventory-II (BDI-II), center for
epidemiologic studies depression scale (CES-D), geriatric depression scale (GDS), hospital anxiety and depression scale (HADS),
and patient health Questionnaire-9 (PHQ-9). Arthritis Care Res. 2011, 63, S454–S466.
30. Mojiri, M.; Karimi-Ghartemani, M.; Bakhshai, A. Time-domain signal analysis using adaptive notch filter. IEEE Trans. Signal
Process. 2006, 55, 85–93. [CrossRef]
31. Robertson, D.G.E.; Dowling, J.J. Design and responses of Butterworth and critically damped digital filters. J. Electromyogr. Kinesiol.
2003, 13, 569–573. [CrossRef]
32. Kattenborn, T.; Leitloff, J.; Schiefer, F.; Hinz, S. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing.
ISPRS J. Photogramm. Remote Sens. 2021, 173, 24–49. [CrossRef]
33. Novakovsky, G.; Dexter, N.; Libbrecht, M.W.; Wasserman, W.W.; Mostafavi, S. Obtaining genetics insights from deep learning via
explainable artificial intelligence. Nat. Rev. Genet. 2023, 24, 125–137. [CrossRef] [PubMed]
34. Khaleghi, N.; Rezaii, T.Y.; Beheshti, S.; Meshgini, S.; Sheykhivand, S.; Danishvar, S. Visual Saliency and Image Reconstruction
from EEG Signals via an Effective Geometric Deep Network-Based Generative Adversarial Network. Electronics 2022, 11, 3637.
[CrossRef]
35. Wang, J.; Wang, M. Review of the emotional feature extraction and classification using EEG signals. Cogn. Robot. 2021, 1, 29–40.
[CrossRef]
36. Mouley, J.; Sarkar, N.; De, S. Griffith crack analysis in nonlocal magneto-elastic strip using Daubechies wavelets. Waves Random
Complex Media 2023, 1–19. [CrossRef]
37. Zhao, H.; Ye, N.; Wang, R. Improved Cross-Corpus Speech Emotion Recognition Using Deep Local Domain Adaptation. Chin. J.
Electron. 2023, 32, 1–7.
38. Chanel, G.; Rebetez, C.; Bétrancourt, M.; Pun, T. Emotion assessment from physiological signals for adaptation of game difficulty.
IEEE Trans. Syst. Man Cybern.-Part A Syst. Hum. 2011, 41, 1052–1063. [CrossRef]
39. Sabahi, K.; Sheykhivand, S.; Mousavi, Z.; Rajabioun, M. Recognition COVID-19 cases using deep type-2 fuzzy neural networks
based on chest X-ray image. Comput. Intell. Electr. Eng. 2023, 14, 75–92.
40. Shahini, N.; Bahrami, Z.; Sheykhivand, S.; Marandi, S.; Danishvar, M.; Danishvar, S.; Roosta, Y. Automatically Identified EEG
Signals of Movement Intention Based on CNN Network (End-To-End). Electronics 2022, 11, 3297. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
