Article
Customized 2D CNN Model for the Automatic Emotion
Recognition Based on EEG Signals
Farzad Baradaran 1 , Ali Farzan 1, *, Sebelan Danishvar 2, * and Sobhan Sheykhivand 3
1 Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar 53816-37181, Iran
2 College of Engineering, Design and Physical Sciences, Brunel University London, Uxbridge UB8 3PH, UK
3 Department of Biomedical Engineering, University of Bonab, Bonab 55517-61167, Iran;
s.sheykhivand@tabrizu.ac.ir
* Correspondence: alifarzan402@gmail.com (A.F.); sebelan.danishvar@brunel.ac.uk (S.D.)
Abstract: Automatic emotion recognition from electroencephalogram (EEG) signals can be considered
as the main component of brain–computer interface (BCI) systems. In recent years, many researchers have presented various algorithms for the automatic classification of emotions from EEG signals and have achieved promising results; however, a lack of stability, high error, and low accuracy remain the central gaps in this research. Accordingly, obtaining a model that offers stability, high accuracy, and low error is essential for the automatic classification of emotions. In this research, a model based on Deep
Convolutional Neural Networks (DCNNs) is presented, which can classify three positive, negative,
and neutral emotions from EEG signals based on musical stimuli with high reliability. For this
purpose, a comprehensive database of EEG signals has been collected while volunteers were listening
to positive and negative music in order to stimulate the emotional state. The architecture of the
proposed model consists of a combination of six convolutional layers and two fully connected layers.
In this research, different feature learning and hand-crafted feature selection/extraction algorithms
were investigated and compared with each other in order to classify emotions. The proposed model
for the classification of two classes (positive and negative) and three classes (positive, neutral, and
negative) of emotions had 98% and 96% accuracy, respectively, which is very promising compared
with the results of previous research. In order to evaluate more fully, the proposed model was also
investigated in noisy environments; with a wide range of different SNRs, the classification accuracy
was still greater than 90%. Due to the high performance of the proposed model, it can be used in brain–computer user environments.
Keywords: emotion recognition; deep learning; EEG; music; CNN
in their proposed model. They achieved satisfactory results based on the VGG16 network
in order to classify four different basic emotional classes, including happy, relaxed, sad,
and angry. Among the advantages of this research is its low computational complexity, while its low classification accuracy can be considered a disadvantage. Gao
et al. [24] presented a new model for automatic emotion recognition. Their model consisted
of two different parts. The first part consisted of a novel multi-feature fusion network that
used spatiotemporal neural network structures to learn spatiotemporal distinct emotional
information for emotion recognition. In this network, two common types of features, time
domain features (differential entropy and sample entropy) and frequency domain features
(power spectral density), were extracted. Then, in the second part, they were classified
into different classes by Softmax and SVM. These researchers used the DEAP dataset to
evaluate their proposed model and achieved promising results. However, computational
complexity can be considered a disadvantage of this research. Nandini et al. [25] used
multi-domain feature extraction and different time–frequency domain techniques and
wavelet-based atomic function to automatically detect emotions from EEG signals. These
researchers have used the DEAP database to evaluate their algorithm. In addition, they
used machine learning algorithms, such as Random Forest to classify the data and achieved
an average accuracy of 98%. Among the advantages of this research is the high classification
accuracy. Niu et al. [26] used a two-way deep residual neural network to classify discrete
emotions. At first, these researchers divided the EEG signal into five different frequency
bands using WT to enter the proposed network. In order to evaluate their algorithm,
they collected a dedicated database from seven participants. The classification accuracy
reported by these researchers was 94%. Among the problems of this research was the high
computational load. Zali-Vargahan et al. [27] used deep learning networks to classify three
and four emotional classes. These researchers used the CNN network to select and extract
features from EEG signals. In order to reduce the deep feature vector, the semi-supervised
dimensionality reduction method was used by these researchers. They used two databases,
DEAP and SEED, in order to evaluate their proposed method and achieved a high accuracy
of 90%. Hou et al. [28] used a Feature Pyramid Network (FPN) to improve emotion recognition
performance based on EEG signals. In their proposed model, the Differential Entropy (DE)
of each recorded EEG channel was extracted as the main feature. These researchers used
SVM to score each class. The accuracy reported by these researchers in order to detect the
dimension of valence and arousal for the DEAP database was reported as 94% and 96%,
respectively. Among the advantages of this research is its high classification accuracy. However, due to its computational complexity, the proposed model could not be implemented in real-time systems, which can be considered a disadvantage of this research.
As reviewed and discussed, many studies have been conducted and organized for
automatic emotion recognition from EEG signals. However, these studies have limitations
and challenges. Most previous research has used manual and engineering features in
feature extraction/selection. Using manual features requires prior knowledge of the subject
and may not be optimal for another subject. In simpler terms, the use of engineering
features will not guarantee the optimality of the extracted feature vector. The next limitation
of previous research can be considered the absence of a comprehensive and up-to-date
database. Existing databases for emotion recognition are limited and are organized based
on visual stimulation and, thus, are not suitable for use in deep learning networks. It
can almost be said that there is no general and standard database based on auditory
stimulation. Many studies have used deep learning networks to detect emotions and
have achieved satisfactory results. However, due to computational complexity, these
studies cannot be implemented in real-time systems. Accordingly, this research tries to
overcome the mentioned limitations and present a new model with high reliability and
low computational complexity in order to achieve automatic emotion recognition. To this
end, a comprehensive database for emotion recognition based on musical stimuli has been
collected in the BCI laboratory of Tabriz University based on EEG signals in compliance
with the necessary standards. The proposed model is based on deep learning, which can
identify the optimal features from the alpha, beta, and gamma bands extracted from the
recorded EEG signal to hierarchically and end-to-end classify the basic emotions in two
different scenarios. The contribution of this study is organized as follows:
• Collecting a comprehensive database of emotion recognition using musical stimulation
based on EEG signals.
• Presenting an intelligent model based on deep learning in order to separate two and
three basic emotional classes.
• Using end-to-end deep neural networks, which eliminates the need for a separate feature selection/extraction block.
• Providing an algorithm based on deep convolutional networks that can be resistant to
environmental noise to an acceptable extent.
• Presenting an automatic model that can classify two and three emotional classes with
the highest accuracy and the least error compared with previous research.
The remainder of the article is written and organized as follows: Section 2 is related
to materials and methods, and in this section, the method of data collection and the
mathematical background related to deep learning networks are described. Section 3 is
related to the proposed model, which describes the data preprocessing and the proposed
architecture. Section 4 presents the simulation results and compares the obtained results
with previous research. Section 5 discusses the applications related to the current research.
Finally, Section 6 addresses the conclusion.
Table 1 shows the details of the signal recording of the volunteers in the experiment
and the reason for removing some volunteers from the experiment process. To clarify the
reason for the exclusion of the volunteers in the experiment, Volunteer 6 was excluded
from the experiment due to the low level of positive emotional arousal (<6). In addition,
Volunteer 2 was excluded from the continuation of the signal recording process due to
depression disorder (21 > 22).
In order to arouse positive and negative emotions in the subjects, a musical stimulus was used in this study. To this end, 10 pieces of music with happy and sad styles were played for the volunteers through headphones. Each piece of music was played for the volunteers for 1 min, and the EEG signals of the subjects were recorded. Between each piece of music, the volunteers were given 15 s of rest (neutral) in order to prevent the transfer of the produced excitement. An example of the recorded EEG signal for 3 different emotional states is shown in Figure 1.
Figure 1. An example of the EEG signal recorded from the C4 and F4 channels for positive, negative, and neutral emotions in Subject 1.
In this way, the signals recorded from the subjects for happy songs, sad songs, and the relaxation state are labeled as positive emotion, negative emotion, and neutral emotion, respectively. Table 2 shows the Persian songs played for the subjects. Figure 2 shows the order of playing the music for the subjects.
G(\omega) = \frac{1}{\sqrt{1 + \omega^{2n}}} \quad (1)
where ω is the angular frequency in radians per second and n is the number of poles in the filter, equal to the number of reactive elements in a passive filter. If ω = 1, the amplitude response of this type of filter in the passband is 1/\sqrt{2} \approx 0.7071, which is half power or −3 dB [31].
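For illustration, a minimal Python sketch of such a filter applied to an EEG channel is given below. The 8–49 Hz pass band and the filter order are assumptions made for this sketch, not values reported in the paper.

```python
# Minimal sketch of a Butterworth filter with the magnitude response of Equation (1),
# implemented here as a band-pass for EEG. The 8-49 Hz pass band and the filter
# order are illustrative assumptions.
import numpy as np
from scipy import signal

fs = 250.0                                   # EEG sampling frequency used in this study (Hz)
sos = signal.butter(N=4, Wn=[8.0, 49.0], btype="bandpass", fs=fs, output="sos")

t = np.arange(0, 2.0, 1.0 / fs)
eeg = np.random.randn(t.size)                # stand-in for one EEG channel
filtered = signal.sosfiltfilt(sos, eeg)      # zero-phase (forward-backward) filtering
```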
Batch Normalization (BN) expedites network training. BN is applied to the neurons' output just before applying the
activation function. Usually, a neuron without BN is computed as follows:

z = g(w, x); \quad a = f(z) \quad (2)

where g() is the linear transformation of the neuron, w is the weight of the neuron, b is the
bias of the neurons, and f () is the activation function. The model learns the parameters w
and b. By adding the BN, the equation can be defined as follows:
z = g(w, x); \quad z_N = \left(\frac{z - m_z}{s_z}\right) \cdot \gamma + \beta; \quad a = f(z_N) \quad (3)

where z_N is the output of BN, m_z is the mean of the neurons' output, s_z is the standard deviation of the neurons' output, and γ and β are learnable parameters of BN [33].
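The sketch below illustrates Equations (2) and (3) in Keras, with batch normalization inserted between the linear part of a convolutional layer and its activation. It is a background illustration only; Table 3 does not list BN layers, and the kernel size and filter count here are arbitrary choices.

```python
# Illustration of Equations (2)-(3): batch normalization between the linear
# transformation and the activation function (background example, not the final model).
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(36, 2000, 1))
z = layers.Conv2D(16, (3, 3), padding="same")(inputs)   # z = g(w, x), no activation yet
z_n = layers.BatchNormalization()(z)                    # z_N = ((z - m_z) / s_z) * gamma + beta
a = layers.LeakyReLU()(z_n)                             # a = f(z_N)
model = tf.keras.Model(inputs, a)
model.summary()
```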
One of the most significant components of deep learning is the performance of ac-
tivation functions because activation functions play an important part in the learning
process. An activation function is used after each convolutional layer. Various activation
functions, such as ReLU, Leaky-ReLU, ELU, and Softmax, are available to increase learning
performance on DNN networks. Since the discovery of the ReLU activation function,
which is presently the most often used activation unit, DNNs have come a long way. The
ReLU activation function alleviates the vanishing gradient problem while simultaneously
boosting learning performance. The ReLU activation function is described as follows [32]:
q(f) = \begin{cases} f & \text{if } f > 0 \\ 0 & \text{otherwise} \end{cases} \quad (4)
The Softmax function, which is used in the output layer to assign a probability to each class, is defined as follows:

\sigma(d)_n = \frac{e^{d_n}}{\sum_{j=1}^{k} e^{d_j}}, \quad n = 1, \ldots, k, \quad d = (d_1, \ldots, d_k) \in \mathbb{R}^k \quad (5)
where d is the input vector, the output values σ (d) are between 0 and 1, and their sum is
equal to 1.
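A short NumPy rendering of Equations (4) and (5) is given below; the example scores are purely illustrative.

```python
# NumPy versions of the activation functions in Equations (4) and (5).
import numpy as np

def relu(f):
    # q(f) = f if f > 0, otherwise 0
    return np.maximum(f, 0.0)

def softmax(d):
    # sigma(d)_n = exp(d_n) / sum_j exp(d_j); outputs lie in (0, 1) and sum to 1
    e = np.exp(d - np.max(d))        # subtract the maximum for numerical stability
    return e / e.sum()

logits = np.array([1.2, -0.3, 0.4])  # e.g. scores for three emotion classes
print(relu(logits), softmax(logits), softmax(logits).sum())
```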
In the prediction step of deep models, a loss function is utilized to learn the error ratio.
In machine learning approaches, the loss function is a method of evaluating and describing
model efficiency. An optimization approach is then used to minimize the error criterion.
Indeed, the results of the optimization are used to update the network parameters (weights and biases) [33].
Figure 3. The main framework of the proposed deep model.
3.1. Data Pre-Processing
In this section, the method of preprocessing the recorded data, which includes filtering and segmentation, is examined. Brain signals are strongly affected by noise, and it is necessary to remove environmental and motion noises using different filtering algorithms. For this purpose, two filtering algorithms were used in this work to remove artifacts. First, a Notch filter was applied to the recorded EEG signals in order to eliminate the 50 Hz power-line component. Then, considering that emotional arousal occurs only within a limited frequency range, a Butterworth band-pass filter (Equation (1)) was applied as the second filtering stage [31].
As is clear, the use of all EEG channels will increase the computational load due to the increase in the dimensions of the feature matrix. Therefore, it is necessary to identify active channels. Emotion-related EEG electrodes were distributed mainly in the prefrontal lobe, temporal lobe margin, and posterior occipital lobe [35]. These regions are precisely in line with the physiological principle of emotion generation. By selecting the electrode distribution, the extracted feature dimension can be greatly reduced. The complexity of calculation can be diminished, and the experiment is more straightforward and easier to carry out. Based on this, only the electrodes of the prefrontal lobe, temporal lobe margin, and posterior occipital lobe, which include Pz, T3, C3, C4, T4, F7, F3, Fz, F4, F8, Fp1, and Fp2, were used for processing [35].
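A hedged sketch of these two preprocessing steps (50 Hz notch filtering followed by selection of the 12 emotion-related electrodes) is given below. The montage ordering of the recording array and the notch quality factor Q are assumptions made for illustration.

```python
# Sketch of the preprocessing described above: 50 Hz notch filtering, then
# selection of the 12 emotion-related electrodes. Montage order and Q are assumptions.
import numpy as np
from scipy import signal

fs = 250.0
all_channels = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8",
                "T3", "C3", "C4", "T4", "Pz", "O1", "O2"]        # hypothetical montage
selected = ["Pz", "T3", "C3", "C4", "T4", "F7", "F3", "Fz", "F4", "F8", "Fp1", "Fp2"]

b, a = signal.iirnotch(w0=50.0, Q=30.0, fs=fs)                   # 50 Hz power-line notch

recording = np.random.randn(len(all_channels), int(fs * 300))    # stand-in EEG (channels x samples)
notched = signal.filtfilt(b, a, recording, axis=1)
idx = [all_channels.index(ch) for ch in selected]
eeg = notched[idx]                                               # 12 x N array used afterwards
```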
We selected 5 min (5 × 60 s = 300 s) of the signals recorded from the electrodes for each positive and negative class. Considering that the sampling frequency was 250 Hz, we had 75,000 available samples for each class. In the next step, the 3 frequency bands α, β, and γ were extracted from the data using an order-8 Daubechies wavelet transform (db8) [36]. For the first subject
and the positive emotional state, these frequency bands for one segment are presented in
Figure 4. Then, in order to avoid the phenomenon of overfitting, overlapping operations
were performed on the data obtained from the selected electrodes according to Figure 5.
According to this operation for the two-class scenario, the recorded data from the selected
electrodes was divided into 540 samples of 8 s each. All the mentioned steps were repeated
for the second scenario with the difference in data length. Finally, in this research, the input
data for both scenarios were applied to the proposed network in the form of images. Thus,
the input data for the first and second scenarios were equal to (7560) × (36 × 2000 × 1) and
(11340) × (36 × 2000 × 1), respectively. The input images from the extracted frequency
bands are shown in Figure 6.
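The sketch below illustrates extracting the α, β, and γ bands with a db8 wavelet and cutting overlapping 8 s segments. The level-to-band mapping and the 1 s step are assumptions; the paper's exact overlap scheme follows Figure 5.

```python
# Wavelet band extraction (db8) and overlapping 8 s segmentation (illustrative).
import numpy as np
import pywt

fs = 250

def dwt_band(sig, keep, wavelet="db8", level=5):
    # Zero every coefficient array except the chosen one, then reconstruct the signal.
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    coeffs = [c if i == keep else np.zeros_like(c) for i, c in enumerate(coeffs)]
    return pywt.waverec(coeffs, wavelet)[: len(sig)]

# wavedec returns [cA5, cD5, cD4, cD3, cD2, cD1]; at fs = 250 Hz the details cover
# roughly: cD2 ~ 31-62 Hz (gamma), cD3 ~ 16-31 Hz (beta), cD4 ~ 8-16 Hz (alpha).
sig = np.random.randn(fs * 300)                       # one electrode, one class (300 s)
gamma, beta, alpha = (dwt_band(sig, k) for k in (4, 3, 2))

def segments(x, win=8 * fs, step=1 * fs):             # 8 s windows with a 1 s step (assumed)
    return np.stack([x[i:i + win] for i in range(0, len(x) - win + 1, step)])

print(segments(alpha).shape)
```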
Figure 5. Overlap operation performed on the EEG signal for each electrode in positive, negative, and neutral emotion.
Figure 6. Input images formed for positive, negative, and neutral emotion for the first subject based on the extracted α, β, and γ frequency bands.
3.2. Deep Architectural Details
For the proposed deep network architecture, a combination of six convolutional layers along with two fully connected layers was used. The order of the layers was as follows:
Figure 7. Graphical details of the designed network architecture along with the sizes of filters, layers, etc.
The hyper-parameters related to the proposed model were carefully adjusted in order to achieve the best efficiency and convergence by the trial-and-error method. Accordingly, the Cross-Entropy objective function and RMSprop optimization with a learning rate of 0.001 and a batch size of 10 were selected. More details related to the size of the filters, the number of steps, and the type of layers used are shown in Table 3.
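For readers who prefer code, a minimal Keras sketch consistent with the rows visible in Table 3 and the training settings stated above is given below. The first convolutional block (16 filters, 128 × 128 kernel, stride 2, Leaky ReLU) and the pooling sizes follow Table 3; the remaining convolutional blocks and the width of the first fully connected layer are assumptions, since the full table is not reproduced in this excerpt.

```python
# Sketch of the customized 2D CNN; early layers follow Table 3, later layers are assumed.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(n_classes=3, input_shape=(36, 2000, 1)):
    m = models.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(16, (128, 128), strides=2, padding="same"),  # large first kernel
        layers.LeakyReLU(),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Conv2D(32, (3, 3), padding="same"),
        layers.LeakyReLU(),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Conv2D(32, (3, 3), padding="same"),
        layers.LeakyReLU(),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Conv2D(32, (3, 3), padding="same"),
        layers.LeakyReLU(),
        # Two further convolutional blocks assumed to complete the six layers
        layers.Conv2D(64, (3, 3), padding="same"),
        layers.LeakyReLU(),
        layers.Conv2D(64, (3, 3), padding="same"),
        layers.LeakyReLU(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),           # first fully connected layer (width assumed)
        layers.Dense(n_classes, activation="softmax"),  # second fully connected (output) layer
    ])
    # Training setup reported in the text: cross-entropy loss, RMSprop, learning rate 0.001
    m.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
    return m

build_model().summary()
```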
In the design of the proposed architecture, we attempted to take into account the best dimensions for the network size and strides of filters, optimizers, etc., so that the proposed model could perform best in terms of different evaluation criteria for emotion recognition. Table 4 shows the number of different layers of the network, the different types of optimizers, the sizes of filters and steps, etc., which were used in choosing the optimal mode of the proposed architecture. According to Table 4, the best possible state of the used parameters was selected for the architecture of the proposed model.

Table 3. The details of the network architecture, including the size of the filters, the number of layers, and the type of layers.

Padding | Number of Filters | Strides | Size of Kernel and Pooling | Output Shape | Activation Function | Layer Type | L
Yes | 16 | 2 | 128 × 128 | (None, 18, 1000, 16) | Leaky ReLU | Convolution 2-D | 0–1
No | - | 2 | 2 × 2 | (None, 9, 500, 16) | - | Max-Pooling 2-D | 1–2
Yes | 32 | 1 | 3 × 3 | (None, 9, 500, 32) | Leaky ReLU | Convolution 2-D | 2–3
No | - | 2 | 2 × 2 | (None, 4, 250, 32) | - | Max-Pooling 2-D | 3–4
Yes | 32 | 1 | 3 × 3 | (None, 4, 250, 32) | Leaky ReLU | Convolution 2-D | 4–5
No | - | 2 | 2 × 2 | (None, 2, 125, 32) | - | Max-Pooling 2-D | 5–6
Yes | 32 | 1 | 3 × 3 | (None, 2, 125, 32) | Leaky ReLU | Convolution 2-D | 6–7

Table 4. The details of the designed network architecture, including the size of the filters, the number of layers, and the type of layers.
The number of divided samples for each of the training, test, and validation sets are
examined in this section. Based on this, the total number of samples in this study for the
first and second scenarios was 7560 and 11,340, respectively, of which 70% were randomly
selected for the training set (5292 samples for the two-class state and 7938 samples for
the three-class state), 10% of the dataset was selected for the validation set (756 samples
for the two class state and 1134 samples for the three class state), and 20% of the dataset
(1512 samples for the two class state and 2268 samples for the three class state) was selected
for the test set. The collection related to model training and evaluation is shown in Figure 8.
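A sketch of this 70/10/20 split for the two-class scenario is given below; the arrays are stand-ins for the 7560 input images, and the use of scikit-learn and a fixed random seed are assumptions rather than details from the paper.

```python
# Sketch of the 70/10/20 train/validation/test split described above (two-class scenario).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.zeros((7560, 8), dtype=np.float32)          # placeholder features, one row per sample
y = np.random.randint(0, 2, size=7560)             # placeholder binary emotion labels

X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.70, shuffle=True, random_state=0)
# The remaining 30% is split into 10% validation and 20% test of the full dataset
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=2 / 3, random_state=0)
print(len(X_train), len(X_val), len(X_test))       # 5292, 756, 1512
```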
Figure 8. Dataset distribution along the data recognition process.
4. Experimental Results
The Python programming language under Keras and TensorFlow was used to simulate the proposed deep model. All simulation results were extracted from a computer system with 16 GB RAM, a 2.8 GHz CPU, and a 2040 GPU.
Figure 9 depicts the classification error and accuracy graph for the various scenarios for the training and validation data in 200 network iterations. Figure 9a shows that the network error for the two-class state reached a stable state by increasing the algorithm iteration
in the 165th iteration. Figure 9 shows that after 200 repetitions, the proposed method for
emotion recognition achieved 98% and 96% accuracy in the two-class and the three-class
states, respectively. Figure 10 depicts the confusion matrix used to classify the scenarios
under consideration. According to Figure 10, the proposed deep network’s performance
is very promising. Table 5 also shows the values of accuracy, sensitivity, specificity, and
precision for each emotion in different scenarios. As can be seen, all the values obtained for
each class of the two considered scenarios were greater than 90%. A visualization of the
samples before and after entering the network was considered to demonstrate the more
accurate performance of the proposed network. Figure 11 shows a t-SNE diagram with
this visualization. As can be seen, the proposed model successfully separated the samples
related to each emotion in each scenario. This positive outcome is due to the proposed
improved CNN architecture. Figure 12 depicts the Receiver Operating Characteristic
Curve (ROC) analysis for different scenario classifications. An ROC is a graphical plot
that illustrates the diagnostic ability of a classifier system as its discrimination threshold
is varied. In this diagram, the farther the curve is from the bisector and the closer it
is to the vertical line on the left, the better the performance of the classifier. As can be seen, each emotion class in the ROC analysis has a score between 0.9 and 1, indicating
excellent classification performance. Based on the results, it is possible to conclude that
the proposed deep model for classifying different emotional classes was very efficient and
met the relevant expectations. However, in order to conduct further analysis, the obtained
results must be compared with other studies. The findings will be compared with other
previous studies and methods for this purpose.
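For completeness, the per-class measures of the kind reported in Table 5 can be computed from a confusion matrix as sketched below; the matrix values in the example are hypothetical and are not taken from the paper's results.

```python
# Per-class accuracy, sensitivity, specificity, and precision from a confusion matrix
# (rows = true class, columns = predicted class). The example matrix is hypothetical.
import numpy as np

def per_class_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - (tp + fp + fn)
    return {
        "accuracy":    (tp + tn) / cm.sum(),
        "sensitivity": tp / (tp + fn),   # recall
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
    }

cm = [[480, 12, 8], [15, 470, 15], [10, 14, 476]]   # hypothetical 3-class matrix
for name, values in per_class_metrics(cm).items():
    print(name, np.round(values, 3))
```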
Figure 9. The proposed deep model's performance in classifying different scenarios (a,b) in terms of accuracy and classification error in 200 iterations.
Figure 11. Visualization of test samples before and after entering the proposed deep model for different scenarios.
Table 5. The accuracy, sensitivity, specificity, and precision achieved by each class for different scenarios.
Table 6. Comparing the performance of prior research with the proposed model.

Study | Stimulus | Methods | Number of Considered Emotions | ACC%
Zhao et al. [37] | Music | Deep local domain | 4 | 89
Chanel et al. [38] | Video Games | Frequency bands extraction | 3 | 63
Jirayucharoensak et al. [39] | Video Clip | Principal component analysis | 3 | 49.52
Er et al. [23] | Music | VGG16 | 4 | 74
Sheykhivand et al. [22] | Music | CNN-LSTM | 3 | 96
Hou et al. [28] | Video Clip | FPN+SVM | 4 | 95.50
Proposed model | Music | Customized CNN | 3 | 98
To more accurately evaluate the proposed model, the deep network architecture
presented in this study was compared with other common methods and previous research
used for automatic emotion recognition. In this regard, two methods based on raw signal
feature learning and engineering feature extraction (manual) were used along with MLP
classifiers, CNN-1D, CNN-LSTM (1D), and the proposed CNN-2D model. The gamma band
was extracted from the recorded EEG signals for engineering features (using a 5-level Daubechies
WT). From the obtained gamma band, two Root Mean Square (RMS) and Standard Deviation
(SD) features were extracted. Based on this, the input dimensions for the first and second
scenarios were (2 × 7 × 540) × (e × 2) and (3 × 7 × 540) × (e × 2), respectively. Following that,
MLP, CNN-1D, CNN-LSTM (1D), and proposed CNN-2D networks were used to classify
the extracted feature vector. The raw signals were classified using expressed networks for
feature learning, with no manual feature extraction or selection. The MLP network had two
fully connected layers, the last of which had two neurons (for the two-class state) and three
neurons (for the three-class state). Following [22], CNN-1D and CNN-LSTM (1D) network
architectures were considered. To improve the performance of the expressed networks,
their hyperparameters were adjusted on the basis of the type of data. The results of this
comparison are shown in Table 7 and Figure 13. According to Table 7, feature learning from
raw signals for CNN-1D, CNN-LSTM (1D), and proposed CNN-2D deep networks were
continually improved, and these networks could learn important features layer by layer,
resulting in two-class and three-class scenarios with accuracy greater than 90%. On the
contrary, as can be seen from the engineering features used as input in CNN-1D, CNN-
LSTM (1D), and CNN-2D deep networks, these networks did not improve recognition. When feature learning and engineering features were compared, feature learning from raw data with CNN-1D, CNN-LSTM (1D), and CNN-2D deep networks outperformed engineering features.
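The sketch below illustrates the hand-crafted baseline described above, computing RMS and SD features from gamma-band segments; the array shapes are illustrative assumptions.

```python
# RMS and SD features from gamma-band EEG segments (engineering-feature baseline sketch).
import numpy as np

def rms_sd_features(gamma_segments):
    # gamma_segments: (n_segments, n_electrodes, n_samples)
    rms = np.sqrt(np.mean(gamma_segments ** 2, axis=-1))
    sd = np.std(gamma_segments, axis=-1)
    return np.stack([rms, sd], axis=-1)              # (n_segments, n_electrodes, 2)

segs = np.random.randn(540, 12, 2000)                # stand-in gamma-band segments
print(rms_sd_features(segs).shape)                   # (540, 12, 2)
```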
Table 7. Comparing the performance of different models with different learning methods.

Model | Feature Learning (First Scenario) | Feature Learning (Second Scenario) | Eng. Features (First Scenario) | Eng. Features (Second Scenario)
MLP | 75% | 70% | 79% | 74%
1D-CNN | 94% | 90% | 82% | 76%
CNN-LSTM | 97% | 95% | 80% | 77%
2D-CNN | 98% | 96% | 81% | 75%

Figure 13. Bar diagram comparing different models with different learning methods.

This result is related to these networks' distinct architecture, which can automatically extract useful features from raw data for classification. Furthermore, obtaining engineering features necessitates expertise and prior knowledge, whereas learning features from raw data allows for less specialized knowledge. While CNN-1D, CNN-LSTM (1D), and the proposed CNN-2D deep networks perform better when learning features from raw data, all investigated models, including CNN-1D, CNN-LSTM (1D), CNN-2D, and MLP networks, performed nearly identically when learning engineering features. This demonstrates that deep networks cannot outperform traditional methods in emotion recognition without feature learning ability.
The nature of brain signals indicates that they have a low signal-to-noise ratio (SNR) and are highly sensitive to noise. This issue may make different classes' classification accuracy difficult. As a result, it is necessary to design the proposed network in order
to classify different emotions in a way that is less sensitive to environmental noises. As
a result, in this study, we artificially tested the performance of the proposed model in
noisy environments. Gaussian white noise with different SNRs was added to the data for
this purpose. Figure 14 depicts the classification results in noisy environments obtained
using the proposed model. As can be seen, the proposed customized deep model has
a high noise resistance when compared with other networks. This subject is related to
personalized architecture (use of large filter dimensions in the initial layer of the network
and use of tuned filter dimensions in the middle layers).
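A sketch of the noise test described above is given below: additive white Gaussian noise is scaled so that the resulting SNR (in dB) matches a chosen value before the segment is passed to the classifier. The input shape is an illustrative stand-in.

```python
# Add white Gaussian noise at a target SNR (in dB) to a clean input image.
import numpy as np

def add_awgn(x, snr_db):
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return x + np.random.normal(0.0, np.sqrt(p_noise), size=x.shape)

clean = np.random.randn(36, 2000)                    # stand-in input image (bands x samples)
noisy_versions = {snr: add_awgn(clean, snr) for snr in (-4, 0, 4, 8)}
```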
Figure 14. Bar diagram comparing different models with different learning methods.
Despite its positive results, this study, like all others, has benefits and drawbacks. One of this study's limitations is the small number of emotional classes. To that end, the number of emotional classes in the collected database should be increased. To address existing uncertainties, we plan to use a Generative Adversarial Network (GAN) instead of classical data augmentation and Type 2 Fuzzy Networks in conjunction with CNN in the future. The proposed architecture is suitable for real-time applications due to its simple and end-to-end architecture.
5. Discussion
In this section, the possible applications of the present study are reviewed along with the practical implications of the developed emotion recognition methodology for the new Society 5.0 paradigm.
The potential to provide machines with emotional intelligence in order to improve
the intuitiveness, authenticity, and naturalness of the connection between humans and
robots is an exciting problem in the field of human–robot interaction. A key component in
doing this is the robot’s capacity to deduce and comprehend human emotions. Emotions,
as previously noted, are vital to the human experience and influence behavior. They are
fundamental to communication, and effective relationships depend on having emotional
intelligence or the capacity to recognize, control, and use one’s emotions. The goal of
affective computing is to provide robots with emotional intelligence to enhance regular
human–machine interaction (HMI). It is envisaged that BCI would enable robots to exhibit
human-like observation, interpretation, and emotional expression skills. Following are the
three primary perspectives that have been used to analyze emotions [40]:
a. Formalization of the robot’s internal emotional state: Adding emotional character-
istics to agents and robots can increase their efficacy, adaptability, and plausibility.
Determining neurocomputational models, formalizing them in already-existing cog-
nitive architectures, modifying well-known cognitive models, or designing special-
ized emotional architectures has, thus, been the focus of robot design in recent years.
b. Robotic emotional expression: In situations requiring complicated social interac-
tion, such as assistive, educational, and social robotics, the capacity of robots to
display recognisable emotional expressions has a significant influence on the social
interaction that results.
c. Robots’ capacity to discern human emotional state: Interacting with people would
be improved if robots could discern and comprehend human emotions.
According to the desired performance of the present study, the proposed model can
be used in each of the discussed cases.
6. Conclusions
In this paper, a new model for automatic emotion recognition using EEG signals was
developed. A standard database was collected for this purpose by music stimulation using
EEG signals to recognize three classes of positive, negative, and neutral emotions. A Deep
Learning model based on two-dimensional CNN networks was also customized for feature
selection/extraction and classification operations. The proposed network, which included
six convolutional layers and two fully connected layers, could classify the considered emotions in the two scenarios with 98% and 96% accuracy, respectively. Furthermore, the architecture suggested in this study was tested in a noisy environment and yielded acceptable results across a wide range of SNRs; even at −4 dB, a classification accuracy of greater than 90%
was maintained. In addition, the proposed method was compared with previous methods
and studies in terms of different measuring criteria, and it had a promising performance.
According to the favorable results of the proposed model, it can be used in real-time
emotion recognition based on BCI systems.
Author Contributions: Conceptualization, F.B.; methodology, S.S. and S.D.; software, A.F. and F.B.;
validation, S.D. and S.S.; writing—original draft preparation, F.B. and A.F. All authors have read and
agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data related to this article is publicly available on the GitHub
platform under the title Baradaran emotion dataset.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Alswaidan, N.; Menai, M.E.B. A survey of state-of-the-art approaches for emotion recognition in text. Knowl. Inf. Syst. 2020, 62,
2937–2987. [CrossRef]
2. Sheykhivand, S.; Rezaii, T.Y.; Meshgini, S.; Makoui, S.; Farzamnia, A. Developing a deep neural network for driver fatigue
detection using EEG signals based on compressed sensing. Sustainability 2022, 14, 2941. [CrossRef]
3. Sheykhivand, S.; Meshgini, S.; Mousavi, Z. Automatic detection of various epileptic seizures from EEG signal using deep learning
networks. Comput. Intell. Electr. Eng. 2020, 11, 1–12.
4. Dzedzickis, A.; Kaklauskas, A.; Bucinskas, V. Human emotion recognition: Review of sensors and methods. Sensors 2020, 20, 592.
[CrossRef]
5. Egger, M.; Ley, M.; Hanke, S. Emotion recognition from physiological signal analysis: A review. Electron. Notes Theor. Comput. Sci.
2019, 343, 35–55. [CrossRef]
6. Jain, M.; Narayan, S.; Balaji, P.; Bhowmick, A.; Muthu, R.K. Speech emotion recognition using support vector machine. arXiv
2020, arXiv:2002.07590.
7. Khalil, R.A.; Jones, E.; Babar, M.I.; Jan, T.; Zafar, M.H.; Alhussain, T. Speech emotion recognition using deep learning techniques:
A review. IEEE Access 2019, 7, 117327–117345. [CrossRef]
8. Ko, B.C. A brief review of facial emotion recognition based on visual information. Sensors 2018, 18, 401. [CrossRef]
9. Lee, J.; Kim, S.; Kim, S.; Park, J.; Sohn, K. In Context-aware emotion recognition networks. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10143–10152.
10. Li, X.; Song, D.; Zhang, P.; Zhang, Y.; Hou, Y.; Hu, B. Exploring EEG features in cross-subject emotion recognition. Front. Neurosci.
2018, 12, 162. [CrossRef]
11. Liu, Z.-T.; Xie, Q.; Wu, M.; Cao, W.-H.; Mei, Y.; Mao, J.-W. Speech emotion recognition based on an improved brain emotion
learning model. Neurocomputing 2018, 309, 145–156. [CrossRef]
12. Poria, S.; Majumder, N.; Mihalcea, R.; Hovy, E. Emotion recognition in conversation: Research challenges, datasets, and recent
advances. IEEE Access 2019, 7, 100943–100953. [CrossRef]
13. Shu, L.; Xie, J.; Yang, M.; Li, Z.; Li, Z.; Liao, D.; Xu, X.; Yang, X. A review of emotion recognition using physiological signals.
Sensors 2018, 18, 2074. [CrossRef] [PubMed]
14. Swain, M.; Routray, A.; Kabisatpathy, P. Databases, features and classifiers for speech emotion recognition: A review. Int. J. Speech
Technol. 2018, 21, 93–120. [CrossRef]
15. Zhang, T.; Zheng, W.; Cui, Z.; Zong, Y.; Li, Y. Spatial–temporal recurrent neural network for emotion recognition. IEEE Trans.
Cybern. 2018, 49, 839–847. [CrossRef] [PubMed]
Electronics 2023, 12, 2232 21 of 21
16. Li, X.; Hu, B.; Sun, S.; Cai, H. EEG-based mild depressive detection using feature selection methods and classifiers. Comput.
Methods Programs Biomed. 2016, 136, 151–161. [CrossRef]
17. Hou, Y.; Chen, S. Distinguishing different emotions evoked by music via electroencephalographic signals. Comput. Intell. Neurosci.
2019, 2019, 3191903. [CrossRef]
18. Hasanzadeh, F.; Annabestani, M.; Moghimi, S. Continuous emotion recognition during music listening using EEG signals: A
fuzzy parallel cascades model. Appl. Soft Comput. 2021, 101, 107028. [CrossRef]
19. Keelawat, P.; Thammasan, N.; Numao, M.; Kijsirikul, B. Spatiotemporal emotion recognition using deep CNN based on EEG
during music listening. arXiv 2019, arXiv:1910.09719.
20. Chen, J.; Jiang, D.; Zhang, Y.; Zhang, P. Emotion recognition from spatiotemporal EEG representations with hybrid convolutional
recurrent neural networks via wearable multi-channel headset. Comput. Commun. 2020, 154, 58–65. [CrossRef]
21. Wei, C.; Chen, L.-L.; Song, Z.-Z.; Lou, X.-G.; Li, D.-D. EEG-based emotion recognition using simple recurrent units network and
ensemble learning. Biomed. Signal Process. Control 2020, 58, 101756. [CrossRef]
22. Sheykhivand, S.; Mousavi, Z.; Rezaii, T.Y.; Farzamnia, A. Recognizing emotions evoked by music using CNN-LSTM networks on
EEG signals. IEEE Access 2020, 8, 139332–139345. [CrossRef]
23. Er, M.B.; Çiğ, H.; Aydilek, İ.B. A new approach to recognition of human emotions using brain signals and music stimuli. Appl.
Acoust. 2021, 175, 107840. [CrossRef]
24. Gao, Q.; Yang, Y.; Kang, Q.; Tian, Z.; Song, Y. EEG-based emotion recognition with feature fusion networks. Int. J. Mach. Learn.
Cybern. 2022, 13, 421–429. [CrossRef]
25. Nandini, D.; Yadav, J.; Rani, A.; Singh, V. Design of subject independent 3D VAD emotion detection system using EEG signals
and machine learning algorithms. Biomed. Signal Process. Control 2023, 85, 104894. [CrossRef]
26. Niu, W.; Ma, C.; Sun, X.; Li, M.; Gao, Z. A Brain Network Analysis-Based Double Way Deep Neural Network for Emotion
Recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 917–925. [CrossRef] [PubMed]
27. Zali-Vargahan, B.; Charmin, A.; Kalbkhani, H.; Barghandan, S. Deep time-frequency features and semi-supervised dimension
reduction for subject-independent emotion recognition from multi-channel EEG signals. Biomed. Signal Process. Control 2023, 85,
104806. [CrossRef]
28. Hou, F.; Gao, Q.; Song, Y.; Wang, Z.; Bai, Z.; Yang, Y.; Tian, Z. Deep feature pyramid network for EEG emotion recognition.
Measurement 2022, 201, 111724. [CrossRef]
29. Smarr, K.L.; Keefer, A.L. Measures of depression and depressive symptoms: Beck depression Inventory-II (BDI-II), center for
epidemiologic studies depression scale (CES-D), geriatric depression scale (GDS), hospital anxiety and depression scale (HADS),
and patient health Questionnaire-9 (PHQ-9). Arthritis Care Res. 2011, 63, S454–S466.
30. Mojiri, M.; Karimi-Ghartemani, M.; Bakhshai, A. Time-domain signal analysis using adaptive notch filter. IEEE Trans. Signal
Process. 2006, 55, 85–93. [CrossRef]
31. Robertson, D.G.E.; Dowling, J.J. Design and responses of Butterworth and critically damped digital filters. J. Electromyogr. Kinesiol.
2003, 13, 569–573. [CrossRef]
32. Kattenborn, T.; Leitloff, J.; Schiefer, F.; Hinz, S. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing.
ISPRS J. Photogramm. Remote Sens. 2021, 173, 24–49. [CrossRef]
33. Novakovsky, G.; Dexter, N.; Libbrecht, M.W.; Wasserman, W.W.; Mostafavi, S. Obtaining genetics insights from deep learning via
explainable artificial intelligence. Nat. Rev. Genet. 2023, 24, 125–137. [CrossRef] [PubMed]
34. Khaleghi, N.; Rezaii, T.Y.; Beheshti, S.; Meshgini, S.; Sheykhivand, S.; Danishvar, S. Visual Saliency and Image Reconstruction
from EEG Signals via an Effective Geometric Deep Network-Based Generative Adversarial Network. Electronics 2022, 11, 3637.
[CrossRef]
35. Wang, J.; Wang, M. Review of the emotional feature extraction and classification using EEG signals. Cogn. Robot. 2021, 1, 29–40.
[CrossRef]
36. Mouley, J.; Sarkar, N.; De, S. Griffith crack analysis in nonlocal magneto-elastic strip using Daubechies wavelets. Waves Random
Complex Media 2023, 1–19. [CrossRef]
37. Zhao, H.; Ye, N.; Wang, R. Improved Cross-Corpus Speech Emotion Recognition Using Deep Local Domain Adaptation. Chin. J.
Electron. 2023, 32, 1–7.
38. Chanel, G.; Rebetez, C.; Bétrancourt, M.; Pun, T. Emotion assessment from physiological signals for adaptation of game difficulty.
IEEE Trans. Syst. Man Cybern.-Part A Syst. Hum. 2011, 41, 1052–1063. [CrossRef]
39. Sabahi, K.; Sheykhivand, S.; Mousavi, Z.; Rajabioun, M. Recognition COVID-19 cases using deep type-2 fuzzy neural networks
based on chest X-ray image. Comput. Intell. Electr. Eng. 2023, 14, 75–92.
40. Shahini, N.; Bahrami, Z.; Sheykhivand, S.; Marandi, S.; Danishvar, M.; Danishvar, S.; Roosta, Y. Automatically Identified EEG
Signals of Movement Intention Based on CNN Network (End-To-End). Electronics 2022, 11, 3297. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.