
The current issue and full text archive of this journal is available on Emerald Insight at:

https://www.emerald.com/insight/2210-8327.htm

An overlapping sliding window and combined features based emotion recognition system for EEG signals

Shruti Garg, Rahul Kumar Patro, Soumyajit Behera and Neha Prerna Tigga
Birla Institute of Technology, Ranchi, India, and
Ranjita Pandey
University of Delhi, New Delhi, India

Received 18 May 2021
Revised 18 July 2021
Accepted 7 August 2021
Abstract
Purpose – The purpose of this study is to propose an alternative efficient 3D emotion recognition model for
variable-length electroencephalogram (EEG) data.
Design/methodology/approach – The classical AMIGOS data set, which comprises multimodal records of varying lengths on mood, personality and other physiological aspects of emotional response, is used for empirical assessment of the proposed overlapping sliding window (OSW) modelling framework. Two features are extracted using the Fourier and wavelet transforms: normalised band power (NBP) and normalised wavelet energy (NWE), respectively. The arousal, valence and dominance (AVD) emotions are predicted using one-dimensional (1D) and two-dimensional (2D) convolution neural networks (CNN) for both single and combined features.
Findings – The two-dimensional convolution neural network (2D CNN) outcomes on the EEG signals of the AMIGOS data set are observed to yield the highest accuracies, that is 96.63%, 95.87% and 96.30% for AVD, respectively, which is at least 6% higher than the other available competitive approaches.
Originality/value – The present work focusses on the less explored, complex AMIGOS (2018) data set, which is imbalanced and of variable length, whereas EEG emotion recognition work is widely available on simpler data sets. The following challenges of the AMIGOS data set are addressed in the present work: handling of tensor-form data; proposing an efficient method for generating sufficient equal-length samples corresponding to imbalanced and variable-length data; selecting a suitable machine learning/deep learning model; and improving the accuracy of the applied model.
Keywords Electroencephalography (EEG), Emotion recognition (ER),
1D and 2D convolution neural network (CNN)
Paper type Research paper

1. Introduction
Emotions are a manifestation of intuitive states of the mind. They are known to be generated
by events occurring in a person’s environment or internally generated by thoughts [1].
Identification and classification of these emotions using computers have been widely studied
under affective computing and human–computer interface [2].
Emotions are recognised using physiological or non-physiological signals [3].
Electroencephalogram (EEG), electrocardiogram (ECG) [4], galvanic skin response (GSR),
blood volume pulse (BVP) [5] and respiratory suspended particulate (RSP) [6] are popular

© Shruti Garg, Rahul Kumar Patro, Soumyajit Behera, Neha Prerna Tigga and Ranjita Pandey. Published in Applied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

The first and fifth authors acknowledge the FRP grant extended by the University of Delhi under the IoE initiative.

Applied Computing and Informatics, Emerald Publishing Limited
e-ISSN: 2210-8327, p-ISSN: 2634-1964
DOI 10.1108/ACI-05-2021-0130
tools used in the literature to obtain physiological signals, while facial expressions [7], speech [8], body gestures and videos [9] give non-physiological signals. The advantage of using physiological signals for ER is that they are captured directly from the human body, which gives a true response of human intuitions [10], unlike non-physiological signals, which can be synthetically elicited. Thus, EEG signals are a suitable tool for the current research. However, since EEG signals involve studying human behaviour directly, there is a limit to the number of samples that can be collected, while deep learning (DL) methods require a large number of samples to work efficiently. Therefore, an innovative resampling method is needed to be able to apply DL methods.
The EEG signals are generated by electrical waves corresponding to brain activity evoked by external stimuli [11]. The raw signals need to be pre-processed, and appropriate features then need to be extracted to obtain emotions from the signals. Lastly, an efficient classifier is applied to obtain an appropriate recognition of emotions.
The features of EEG signals are frequently extracted in the time, frequency and time–frequency domains. The features extracted in the time domain are the Hjorth feature [12],
fractal dimension feature [13] and higher-order crossing feature [14]. The features used in the
frequency domain are power spectral density (PSD) [15], spectral entropy (SE) [16] and
differential entropy [17]. Wavelets and a short-time Fourier transform (STFT) [18] have been
used to extract the time–frequency domain features.
After feature extraction, machine learning (ML) and DL methods are primarily applied in the literature for classification [19]. The ML methods applied for ER are k-nearest neighbour (KNN), random forest (RF), decision tree (DT), neural network (NN) and support vector machine (SVM). The DL methods used for ER are the convolution neural network (CNN), long short-term memory (LSTM), recurrent neural network (RNN) and several other variants. The DL methods are found to work with greater accuracy [20]. Table 1 shows a summary of DL methods applied in recent years.
Apart from these, nature-inspired algorithms have also been applied to ER tasks for feature selection, for example on the DEAP data set with particle swarm optimisation (PSO) [21] and firefly optimisation (FO) [30], with LSTM and SVM used as classifiers. Feature selection through FO achieved an accuracy of 86.90%, while PSO-based feature selection recorded an accuracy of 84.16%.
Emotions in ER can be classified in two ways: as discrete emotions, such as anger, happiness, sadness, disgust, fear and neutral, or through emotion models. There are two types of emotion models: two-dimensional (2D) [31] and three-dimensional (3D) [32]. The 2D emotion model consists of valence and arousal; valence measures pleasantness versus unpleasantness, and arousal measures excitement versus calmness. The 3D emotion model comprises AVD. Arousal and valence are the same as in the 2D emotion model; dominance is the third emotional aspect, representing dependence versus independence.

1.1 Contribution
The objective of the present work is to develop an efficient ER model for the AMIGOS [33] data set in the 3D emotional space (i.e. AVD) using DL models. AMIGOS is a newer data set among the popular EEG data sets for ER. The following challenges of the AMIGOS data set are addressed in the present work:
(1) Handling of tensor form data.
(2) Proposing an efficient method for generating sufficient equal-length samples
corresponding to imbalanced and variable-length data.
(3) Selecting a suitable ML/DL model.
(4) Improving the accuracy of the applied model.
Table 1. Studies on EEG-based emotion recognition using deep learning

| Ref., year | Emotions recognised | Feature extraction method | Classifier | Data sets | Accuracy % |
|---|---|---|---|---|---|
| [21], 2020 | 2D emotion model | High-order statistics | LSTM | SEED | 90.81 |
| [22], 2020 | Negative, positive and neutral | Electrode frequency distribution map + STFT | CNN | SEED, DEAP | 90.59 |
| [23], 2020 | 3D emotion model | Multi-level feature capsule network (end-to-end network) | Multi-level feature capsule network | DEAP, DREAMER | 98.32 |
| [24], 2020 | Negative, positive and neutral | Local and global inter-channel relation | Regularised graph neural network | SEED, SEED-IV | 85.30 |
| [25], 2021 | 2D emotion model | Differential entropy | Graph convolution network + LSTM | DEAP | 90.60 |
| [26], 2020 | Sad, happy, relax, fear | Time–frequency representation by smoothed pseudo-Wigner–Ville distribution | Configurable CNN, AlexNet, VGG-16, ResNet-50 | Recorded EEG of students of Indian Institute of Information Technology Design and Manufacturing, Jabalpur | 93.01 |
| [27], 2020 | 2D emotion model | End-to-end region-asymmetric convolution neural network | Region-asymmetric convolution neural network | DEAP, DREAMER | 95 |
| [28], 2020 | 2D emotion model | Spectrogram representation | Bidirectional LSTM | AMIGOS | 83.30 |
| [29], 2021 | 2D emotion model | Features extracted from topographic and holographic feature maps | CNN + SVM | AMIGOS | 90.54 |

The equal-length data samples are generated here by the OSW method. Although the data can be oversampled using the Synthetic Minority Oversampling Technique (SMOTE) [34] available in Python, SMOTE generates data by replicating the examples without adding any new information to them. Thus, the OSW method is proposed in the present work, which induces variability in the sample records while avoiding repetition of the signals. Feature extraction is undertaken in two modes, using normalised band power and normalised wavelet energy.
The rest of this paper comprises three additional sections. Section 2 provides details of the emotion recognition system proposed in the research, Section 3 details the results and discussion and Section 4 provides the conclusions.

2. Emotion recognition system


The proposed emotion recognition system (ERS) is modelled in three stages:
(1) Data preprocessing,
(2) Feature extraction and
(3) Classification implemented for AVD.
Figure 1 shows the framework adopted for OSW-based ERS.
The important concepts used in the present research are described as follows:

2.1 Decomposition of signal using OSW


The emotion samples are amplified in the current research using OSW, as a large amount of data is recommended for efficient model building with DL methods [35]. The EEG signals produced in different experiments were decomposed into windows of 512 samples with a shift of 32 samples, as shown in Figure 2.
The portion of a signal not covered by a full 512-sample window was trimmed and not used for computation purposes. The window and shift sizes were decided experimentally.
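As an illustration, the decomposition can be sketched as follows. This NumPy helper is our own illustrative code, not the authors' implementation; the function name and the example signal length are assumptions.

```python
import numpy as np

def overlapping_windows(signal, window=512, shift=32):
    # Slice a variable-length EEG channel into fixed-length windows of
    # 512 samples, each shifted by 32 samples; the tail that does not
    # fill a complete window is trimmed, as described in Section 2.1.
    n_windows = (len(signal) - window) // shift + 1
    return np.stack([signal[i * shift: i * shift + window]
                     for i in range(n_windows)])

# Example: a 10,000-sample channel yields (10000 - 512) // 32 + 1 = 297 windows.
windows = overlapping_windows(np.random.randn(10_000))
print(windows.shape)  # (297, 512)
```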

2.2 Feature extraction


Once signals were decomposed into equal-length samples using overlapping windows, NBP
and NWE features were extracted using discrete Fourier transform (DFT) [36] and discrete
wavelet transform (DWT) [37].
2.2.1 Normalised band power (NBP). To calculate the NBP feature, the Fourier transform $X_k$ was first calculated for the windowed signal using Eqn (1):

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i n k / N} \quad (1)$$

where $N$ is the length of the vector $x$ and $0 \le k \le N - 1$.


Once the signal is converted to the frequency domain, the five frequency bands (4–8 Hz, 8–13 Hz, 13–16 Hz, 16–30 Hz and 30–45 Hz) were extracted. The beta band was decomposed into two (beta1 and beta2) to equalise the dimensions with the wavelet transform. The band power and normalised band power were then calculated for each band by Eqns (2) and (3) given below:

$$P_B = \sum_{k=0}^{K} |X_k|^2 \quad (2)$$

where $P_B$ represents the power of band $B$ and $K$ is the length of each band.

Figure 1. Framework for overlapping sliding window-based emotion recognition system
Figure 2. Overlapping window signal decomposition: successive 512-sample windows, each shifted by 32 samples

$$\hat{P}_B = \frac{P_B}{\sum_B P_B} \quad (3)$$

where $\hat{P}_B$ is called the NBP.
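A minimal NumPy sketch of Eqns (1)–(3) is given below. The 128 Hz sampling rate is an assumption (the rate of the pre-processed signals is not restated here), and the band names and function name are ours.

```python
import numpy as np

def normalised_band_power(window, fs=128):
    # Band edges from Section 2.2.1; fs = 128 Hz is an assumed sampling rate.
    bands = {"theta": (4, 8), "alpha": (8, 13), "beta1": (13, 16),
             "beta2": (16, 30), "gamma": (30, 45)}
    X = np.fft.rfft(window)                              # Eqn (1): DFT of the window
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    power = {b: np.sum(np.abs(X[(freqs >= lo) & (freqs < hi)]) ** 2)
             for b, (lo, hi) in bands.items()}           # Eqn (2): band power
    total = sum(power.values())
    return {b: p / total for b, p in power.items()}      # Eqn (3): NBP sums to 1

nbp = normalised_band_power(np.random.randn(512))
```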


2.2.2 Normalised wavelet energy (NWE). In the DWT, the different frequencies of the signal are separated at different levels; this process is called the multi-level wavelet transform, defined in Eqn (4):

$$D(\tau, s) = \frac{1}{\sqrt{s}} \sum_{n=0}^{p-1} x_n \, \psi\!\left(\frac{t - \tau}{s}\right) \quad (4)$$

where $\tau = k \cdot 2^{-j}$ and $s = 2^{-j}$ represent translation and scale, respectively, and $\psi$ is the mother wavelet, taken here as the Daubechies 4 (db4) wavelet. The signal is further decomposed into $cA_n$ and $cD_n$, the approximation coefficients at the $n$th level (providing the low frequencies) and the detail coefficients at the $n$th level (providing the high frequencies), respectively. Because the EEG signal provided in the pre-processed data set is in the range of 4–45 Hz, a five-level decomposition is sufficient for the required four-band information, as shown in Figure 3.
After decomposition of the signal into multi-level wavelet coefficients, the wavelet energy is calculated using the detail coefficients $cD_n$ of the above five levels, because the emotion information is mostly available in the higher frequencies. The formula for calculating the wavelet energy is given in Eqn (5):

$$WE_n = \sum_n |cD_n|^2 \quad (5)$$

NWE is calculated using Eqn (6):

$$\widehat{WE}_n = \frac{WE_n}{\sum_n WE_n} \quad (6)$$
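A sketch of Eqns (4)–(6) is shown below using the PyWavelets package; the choice of library and the function name are assumptions, since the paper does not name its wavelet implementation.

```python
import numpy as np
import pywt  # PyWavelets; an assumed choice of library

def normalised_wavelet_energy(window):
    # Five-level db4 decomposition; wavedec returns [cA5, cD5, cD4, cD3, cD2, cD1].
    coeffs = pywt.wavedec(window, "db4", level=5)
    details = coeffs[1:]                                      # detail coefficients only
    energies = np.array([np.sum(cd ** 2) for cd in details])  # Eqn (5)
    return energies / energies.sum()                          # Eqn (6): NWE sums to 1

nwe = normalised_wavelet_energy(np.random.randn(512))
print(nwe.shape)  # (5,): one NWE value per decomposition level
```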
Figure 3. Wavelet decomposition of different bands

2.3 Convolution neural network


A CNN is a multilayer structure consisting of different types of layers, including input, convolution, pooling, fully connected, softmax/logistic and output layers [38]. The extracted features are fed into two different types of CNN: 1D and 2D. Both the 1D and 2D CNNs follow the same architecture: convolution layers (Conv1D and Conv2D, respectively), each preceded by batch normalisation. A max pooling layer with a ReLU activation function is applied after every convolution layer. Lastly, the max pooling layer is connected to an adaptive average pooling layer, which is passed through a flattening layer followed by four output dense layers. Of the four output dense layers, the first three are linear, and the last is a sigmoid layer for binary classification. The architectures of the 1D and 2D CNNs are shown in Figures 4 and 5, respectively.
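A minimal PyTorch sketch of the 1D variant follows. The channel widths, kernel sizes and dense-layer widths are illustrative assumptions; only the layer ordering follows the description above.

```python
import torch
import torch.nn as nn

class EmotionCNN1D(nn.Module):
    # Illustrative layer sizes; the ordering (batch norm -> convolution ->
    # ReLU + max pooling -> adaptive average pooling -> flatten -> three
    # linear layers -> sigmoid) follows the text in Section 2.3.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.BatchNorm1d(1),
            nn.Conv1d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.BatchNorm1d(16),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(4),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4, 64),
            nn.Linear(64, 16),
            nn.Linear(16, 1),
            nn.Sigmoid(),                  # binary high/low emotion decision
        )

    def forward(self, x):                  # x: (batch, 1, n_features)
        return self.classifier(self.features(x))

model = EmotionCNN1D()
out = model(torch.randn(8, 1, 140))        # combined {NBP, NWE} feature vector
print(out.shape)                           # torch.Size([8, 1])
```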
Figure 4. 1D-CNN architecture

Figure 5. 2D-CNN architecture
3. Results and discussions
All experiments conducted in the present work are performed on an Intel i5 8 GB RAM AMD processor using the Python 3.7 programming language. PyTorch version 1.7.0 is used to implement the CNNs, and the execution of the CNNs is achieved on a Kaggle GPU.
The present work is executed in the following steps:

3.1 Preparation of data


The data set used in this research was originally prepared by Correa et al. (2018) [33] to identify affect, mood and personality, and is stored in an intricate format. The data set comprises 40 folders, each corresponding to one participant. Each folder contains a MATLAB file with the lists shown in Table 2.
In the present study, the data for the 16 short videos were taken for the 14 EEG columns, together with their respective AVD labels from the self-assessment labels list. The emotion index responses under AVD were coded as 1 and 0 according to Table 3.
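For illustration, the Table 3 coding reduces to a single threshold; the helper below is hypothetical, not part of the authors' pipeline.

```python
import numpy as np

def code_labels(ratings):
    # Binarise 1-9 self-assessment ratings per Table 3:
    # 1 (high) if rating > 4.5, else 0 (low).
    return (np.asarray(ratings) > 4.5).astype(int)

print(code_labels([7.0, 4.5, 2.0]))  # [1 0 0]
```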
3.1.1 Balancing for emotions. After preparing the data set, the number of samples in each AVD category is plotted in Figure 6(a). It is evident that the number of samples recorded as low emotion in each category is significantly smaller than the number recorded as high emotion. Thus, the low and high emotions of each category were balanced using the Python SMOTE implementation. The result of upsampling is shown in Figure 6(b).
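A minimal sketch of this balancing step using the imbalanced-learn package is shown below; the library is an assumed implementation choice, and the feature matrix is a random placeholder (the sample counts follow Figure 6).

```python
import numpy as np
from imblearn.over_sampling import SMOTE  # assumed implementation choice

# Illustrative stand-in for the arousal samples: 507 high vs 248 low.
X = np.random.randn(755, 70)                                # placeholder features
y = np.r_[np.ones(507, dtype=int), np.zeros(248, dtype=int)]  # counts as in Figure 6(a)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print(X_bal.shape)                        # (1014, 70): minority class upsampled
```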
The resultant number of samples is insufficient for applying DL methods. Moreover, replication in the data causes a reduction in model accuracy, as shown in Table 6. To overcome these limitations, the data are also generated by a non-overlapping sliding window (NOSW) and by OSW in the present work. The resultant numbers of samples are shown in Table 4.

Table 2. Content of the MATLAB files

| Name | Size | Content |
|---|---|---|
| Joined_data | 1 × 20 | 20 columns corresponding to the 16 short videos and 4 long videos shown to the participants. Each cell contains a matrix of size y × 17, where y is variable and depends on the length of the video. Of the 17 columns, 14 correspond to EEG signals, 2 to ECG and the last to the GSR signal |
| Labels_self-assessment | 1 × 20 | 20 columns corresponding to the 16 short videos and 4 long videos. Each cell contains a 1 × 12 matrix, whose 12 columns correspond to 12 assessments (arousal, valence, dominance, liking, familiarity and seven basic emotions) by the participant for every video. The first five dimensions are measured on a scale of 1–9, where 1 is the lowest and 9 is the highest. The seven basic emotions (neutral, disgust, happiness, surprise, anger, fear and sadness) are recorded in binary (i.e. 0 or 1) |
| Labels_ext_annotation | 1 × 20 | 20 columns corresponding to the 16 short videos and 4 long videos. Each cell contains a matrix of size z × 3, where z is the number of 20-second segments in a video; the three columns hold the segment number, arousal and valence |

Table 3. Coding of AVD from 1–9 to 0–1

| High arousal (HA) = 1 | Low arousal (LA) = 0 | High valence (HV) = 1 | Low valence (LV) = 0 | High dominance (HD) = 1 | Low dominance (LD) = 0 |
|---|---|---|---|---|---|
| >4.5 | ≤4.5 | >4.5 | ≤4.5 | >4.5 | ≤4.5 |
Figure 6. High and low emotion indices in AVD (a) prior to balancing (high/low: arousal 507/248, valence 497/298, dominance 416/339) and (b) after balancing (arousal 507/507, valence 497/497, dominance 416/416)

Table 4. Number of samples generated after resampling methods

| S. No | Sampling technique | Arousal | Valence | Dominance |
|---|---|---|---|---|
| 1 | Original samples | 755 | 795 | 755 |
| 2 | After resampling by SMOTE | 1,014 | 994 | 832 |
| 3 | Samples generated after decomposition of signal by NOSW | 29,382 | 29,382 | 29,382 |
| 4 | Samples generated after decomposition of signal by OSW | 458,664 | 458,664 | 458,664 |

3.2 Feature extraction and classification


The decomposed signals were cleaned by removing NaN values. Five NBP and five NWE features, corresponding to the five EEG bands, were then extracted by the Fourier and wavelet transforms, respectively. A combined vector of both features, {NBP, NWE}, was also formed by appending the NWE features to the NBP features. A total of 70 (14 × 5) features were extracted for the 14 EEG channels by each of NBP and NWE separately; thus, there are 140 features in the combined vector. The feature dimensions obtained with the different resampling methods are shown in Table 5.
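A sketch of how the combined vector can be assembled for one window, reusing the illustrative helpers from Section 2.2 (all names are our assumptions):

```python
import numpy as np

# One decomposed sample: 14 EEG channels of 512 samples each.
window_channels = np.random.randn(14, 512)
nbp = np.concatenate([list(normalised_band_power(ch).values())
                      for ch in window_channels])     # 14 x 5 = 70 NBP features
nwe = np.concatenate([normalised_wavelet_energy(ch)
                      for ch in window_channels])     # 14 x 5 = 70 NWE features
combined = np.concatenate([nbp, nwe])                 # {NBP, NWE}: 140 features
print(combined.shape)                                 # (140,)
```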
The CNN classifiers discussed in Section 2.3 were applied to the individual and combined features. The train, validation and test samples are divided in a 70:40:30 ratio. The learning rate, batch size and optimiser are taken as 0.001, 32 and Adam, respectively. A binary cross-entropy function was used as the loss function.
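A sketch of this training configuration in PyTorch is given below, reusing the illustrative EmotionCNN1D from Section 2.3; the random tensors are placeholders for the extracted feature matrices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the combined {NBP, NWE} features.
X = torch.randn(1024, 1, 140)
y = torch.randint(0, 2, (1024, 1)).float()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = EmotionCNN1D()                                     # sketch from Section 2.3
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.BCELoss()                             # binary cross-entropy

for features, labels in loader:                            # one epoch
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
```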
The training of a CNN continues until the accuracy of the network becomes constant or starts decreasing. The emotion recognition accuracies of the two DL classifiers are compared with the baseline ML model, SVM, in Table 6; the highest accuracies are shown in italics.

Table 5. Feature dimensions obtained after feature extraction

| Resampling method | Feature modality | Arousal | Valence | Dominance |
|---|---|---|---|---|
| SMOTE | Individual feature | 1014 × 70 | 994 × 70 | 884 × 70 |
| | Combined feature | 1014 × 140 | 994 × 140 | 884 × 140 |
| NOSW | Individual feature | 29382 × 70 | 29382 × 70 | 29382 × 70 |
| | Combined feature | 29382 × 140 | 29382 × 140 | 29382 × 140 |
| OSW | Individual feature | 458664 × 70 | 458664 × 70 | 458664 × 70 |
| | Combined feature | 458664 × 140 | 458664 × 140 | 458664 × 140 |
Table 6. Accuracy (in %) obtained after applying ML/DL classifiers

| Resampling method | Classifier | Feature | Arousal | Valence | Dominance |
|---|---|---|---|---|---|
| SMOTE | SVM | NBP | 80.76 | 68.81 | 63.35 |
| | | NWE | 76.92 | 73.11 | 67.17 |
| | | {NBP,NWE} | 77.88 | 67.74 | 74.04 |
| | 1D CNN | NBP | 61.7 | 58.09 | 61.93 |
| | | NWE | 63.99 | 50.49 | 60.01 |
| | | {NBP,NWE} | 61.86 | 58.17 | 58.87 |
| | 2D CNN | NBP | 62.35 | 53.4 | 58.52 |
| | | NWE | 67.89 | 63.56 | 60.01 |
| | | {NBP,NWE} | 68.11 | 61.22 | 55.53 |
| NOSW | SVM | NBP | 71.21 | 62.85 | 54.15 |
| | | NWE | 80.11 | 73.73 | 76.27 |
| | | {NBP,NWE} | 81.14 | 75.19 | 78.39 |
| | 1D CNN | NBP | 71.33 | 66.56 | 65.66 |
| | | NWE | 78.64 | 69.55 | 73.13 |
| | | {NBP,NWE} | 78.3 | 69.97 | 70.09 |
| | 2D CNN | NBP | 75.41 | 71.81 | 71.37 |
| | | NWE | 80.22 | 75.5 | 76.67 |
| | | {NBP,NWE} | 81.79 | 75.59 | 78.67 |
| OSW | SVM | NBP | 84.56 | 88.21 | 87.02 |
| | | NWE | 87.05 | 83.22 | 85.18 |
| | | {NBP,NWE} | 70.64 | 63.79 | 56.25 |
| | 1D CNN | NBP | 91.45 | 92.93 | 93.47 |
| | | NWE | 89.9 | 85.95 | 88.21 |
| | | {NBP,NWE} | 93.66 | 93.14 | 92.62 |
| | 2D CNN | NBP | 94.22 | 93.78 | 94.08 |
| | | NWE | 92.3 | 90.35 | 91.51 |
| | | {NBP,NWE} | *96.63* | *95.87* | *96.3* |

From Table 6, it is evident that the accuracies obtained after resampling by SMOTE are at least 1% lower than with NOSW and at least 15.87% lower than with OSW. Comparing the ML and DL methods, SVM performs better under SMOTE resampling, since the sample size is small, but under NOSW and OSW resampling the 2D CNN performs best, since the sample size is large. A feature-wise comparison of all methods is shown in Figure 7.
From Figure 7(a), it can be observed that SVM outperforms the other methods under SMOTE, with no specific pattern indicating whether individual or combined features perform better. It is evident from Figure 7(b) that the NWE feature provides higher accuracy in the case of NOSW, whereas the NBP feature provides higher accuracies with OSW for the DL methods, as shown in Figure 7(c). The combined features for the 2D CNN give the highest accuracies for both NOSW and OSW, as shown in Figures 7(b) and 7(c). Thus, combining the observations from Table 6 and Figures 7(b) and 7(c), the 2D CNN classifier with the combined feature vector is found to be best for all the emotion indices, with accuracies of 96.63%, 95.87% and 96.30%, respectively.
The execution history of the 2D CNN with combined features for the overlapping window is shown in Figure 8 in terms of the loss and accuracy curves for arousal, valence and dominance, respectively. The loss curves show the training and validation losses, which are expected to be as close as possible. The accuracy curves show the accuracy obtained for each emotion index over 20 epochs.
The results are also compared in terms of the execution time of individual versus combined features, as shown in Figure 9.
Figure 7. Comparison of the methods' performance (in terms of accuracies) for the different feature extraction methods: (a) SMOTE, (b) NOSW, (c) OSW

Figure 9 shows that as the sample size increases from SMOTE → NOSW → OSW, the execution time increases significantly in the case of SVM for both individual and combined

features. The reason is that SVM cannot be executed on GPUs, since it involves complex calculations. The observation in this study that the basic SVM performs poorly when the sample size is large (as in the case of the combined feature with OSW in Table 6) is consistent with [39].
Table 7 compares the results obtained in the present study with ERS articles published from 2018 onwards on the AMIGOS data set.
Figure 8. Loss and accuracy curves for AVD
Figure 9. Time of execution of the ML/DL methods for individual and combined features (in minutes)

Table 7. Comparison of the proposed work with existing work

| Ref, year | Arousal % | Valence % | Dominance % | Modality | Features | Classifier |
|---|---|---|---|---|---|---|
| [33], 2018 (original paper) | 57.7 | 56.4 | – | EEG | All bands; PSD; spectral power asymmetry between 7 pairs of electrodes in the five bands | SVM |
| [40], 2018 | 71.54 | 66.67 | 72.36 | EEG | Conditional entropy (CF) feature; CNN-based feature using EEG topography | Extreme learning machine (ELM) |
| [41], 2018 | 68.00 | 84.00 | – | EEG + ECG + GSR | Time, frequency and entropy domain features | GaussianNB, XGBoost |
| [42], 2019 | 83.02 | 79.13 | – | EEG | PSD; conditional entropy; PSD-image-based deep learning features | LSTM |
| [28], 2020 | 83.30 | 79.40 | – | EEG + ECG + GSR | Spectrogram representation | Bidirectional LSTM |
| [29], 2021 | 90.54 | 87.39 | – | EEG | Features extracted from topographic and holographic feature maps | CNN + SVM |
| Our method | 96.63 | 95.87 | 96.30 | EEG | NBP + NWE | 2D CNN |

Note(s): The "–" in the dominance column means the study was conducted for 2D emotions only

As Table 7 shows, emotions are recognised using only EEG data in [29, 33, 40, 42]; the other studies were carried out using multimodal data. The first study [33], conducted on the AMIGOS data set, provides an initial analysis and produces a very low accuracy of 57.7%, thereby posing an open research challenge. The accuracy was improved to 71.54% in [40] in the same year, in which the features were extracted using a CNN. A multimodal ERS was proposed in [28, 41], producing accuracies of up to 84%. The highest accuracy achieved on the AMIGOS data set prior to this work is 90.54%, using the CNN + SVM model in [29]. Finally, the proposed model improves the accuracy to 96.63% with a single modality (EEG) through a 2D CNN classifier.
Siddharth et al. in 2019 [42] worked on four data sets (DEAP, DREAMER, AMIGOS and MAHNOB-HCI) using LSTM. However, they observed that LSTM is difficult to implement on the AMIGOS data set, which has data of varying lengths. This indicates the necessity of an efficient pre-processing method prior to classification. The present paper offers the most efficient classification strategy for EEG records of varying lengths through decomposition of the data using an OSW approach, which provides an efficient alternative for handling imbalanced variable-length data prior to classification.

4. Conclusions
Despite significant developments in the field of DL and its suitability to various applications, almost 59% of researchers have used an SVM with RBF kernels for BCIs [19]. This is due to the unavailability of large-scale data sets for BCIs, whereas DL models are widely applied to speech and visual modalities. A BCI data set provides genuine human responses, as they are taken directly from the human body; thus, ER using brain signals is preferred. There is a need for an "off-the-shelf" method to conduct research on BCIs with high accuracy, as the accuracy found in BCIs is generally low, especially for the AMIGOS data set.
The present contribution focusses on obtaining predictive outcomes for the 3D emotion responses from EEG signals in the context of imbalanced variable-length records. The novelty of the present paper is that it proposes the application of OSW with a CNN to the intricate AMIGOS data set, aimed at highly accurate prediction of 3D emotions in contrast to the accuracy achieved by the existing approaches in the literature. Most earlier analyses of the AMIGOS data set have pivoted on 2D emotion analysis. The current paper views EEG (14 channels) on 3D emotions for predictive inference and presents a comparative assessment of the predictive accuracy with that of Siddharth et al. (2018) [40]. The present approach is thus found to have the highest accuracy with respect to all three AVD emotion indices as compared to similar works referenced in the literature (Table 7).
The present work can be further extended to multiple modalities of physiological signals, as well as with the inclusion of responses to video interventions, such as in an automatic video recommendation system for enhancing the mood of individuals. Another possible extension of this work is to represent the signal features in 2D/3D form and subsequently combine them with the respective video/image features.

References
1. Kövecses Z. Emotion concepts. New York: Springer Science and Business Media; 2012 Dec 6.
2. Alarcao SM, Fonseca MJ. Emotion recognition using EEG signals: a survey. IEEE Trans Affect
Comp. 2017 Jun 12; 10(3): 374-93.
3. Saxena A, Khanna A, Gupta D. Emotion recognition and detection methods: a comprehensive
survey. J Art Int Sys. 2020 Feb 7; 2(1): 53-79.
4. Lin YP, Wang CH, Jung TP, Wu TL, Jeng SK, Duann JR, Chen JH. EEG-based emotion recognition
in music listening. IEEE (Inst Electr Electron Eng) Trans Biomed Eng. 2010 May 3; 57(7):
1798-806.
5. Santamaria-Granados L, Munoz-Organero M, Ramirez-Gonzalez G, Abdulhay E, Arunkumar NJ.
Using deep convolutional neural network for emotion detection on a physiological signals dataset
(AMIGOS). IEEE Access. 2018 Nov 23; 7: 57-67.
6. Xiefeng C, Wang Y, Dai S, Zhao P, Liu Q. Heart sound signals can be used for emotion
recognition. Sci Rep. 2019 Apr 24; 9(1): 1.
7. Recio G, Schacht A, Sommer W. Recognizing dynamic facial expressions of emotion: specificity
and intensity effects in event-related brain potentials. Biol Psychol. 2014 Feb 1; 96: 111-25.
8. El Ayadi M, Kamel MS, Karray F. Survey on speech emotion recognition: features, classification
schemes, and databases. Pattern Recogn. 2011 Mar 1; 44(3): 572-87.
9. Gunes H, Piccardi M. Bi-modal emotion recognition from expressive face and body gestures. J Net Comp Appl. 2007 Nov 1; 30(4): 1334-45.
10. Song T, Liu S, Zheng W, Zong Y, Cui Z, Li Y, Zhou X. Variational instance-adaptive graph for
EEG emotion recognition. IEEE Trans Affect Comp. 2021 Mar 9, Early access. doi: 10.1109/
TAFFC.2021.3064940.
11. Barlow JS. The electroencephalogram: its patterns and origins. Cambridge, MA and London: MIT
Press; 1993.
12. Yazıcı M, Ulutaş M. Classification of EEG signals using time domain features. 2015 23rd Signal Processing and Communications Applications Conference (SIU): IEEE; 2015 May 16. 2358-2361.
13. Liu Y, Sourina O. Real-time fractal-based valence level recognition from EEG. Transactions on
computational science; Berlin, Heidelberg: Springer: 2013; 18. p. 101-120.
14. Petrantonakis PC, Hadjileontiadis LJ. Emotion recognition from EEG using higher order
crossings. IEEE Trans Inf Tech Biomed. 2009 Oct 23; 14(2): 186-97.
15. Kim C, Sun J, Liu D, Wang Q, Paek S. An effective feature extraction method by power spectral
density of EEG signal for 2-class motor imagery-based BCI. Med Biol Eng Comput. 2018 Sep;
56(9): 1645-58.
16. Zhang R, Xu P, Chen R, Li F, Guo L, Li P, Zhang T, Yao D. Predicting inter-session performance of
SMR-based brain–computer interface using the spectral entropy of resting-state EEG. Brain
Topogr. 2015 Sep; 28(5): 680-90.
17. Zhang J, Wei Z, Zou J, Fu H. Automatic epileptic EEG classification based on differential entropy
and attention model. Eng Appl Artif Intelligence. 2020 Nov 1; 96: 103975.
18. Al-Fahoum AS, Al-Fraihat AA. Methods of EEG signal features extraction using linear analysis
in frequency and time-frequency domains. Int Scholarly Res Notices. 2014: 1-7.
19. Gu X, Cao Z, Jolfaei A, Xu P, Wu D, Jung TP, Lin CT. EEG-based brain-computer interfaces
(BCIS): a survey of recent studies on signal sensing technologies and computational intelligence
approaches and their applications. IEEE ACM Trans Comput Biol Bioinf. 2021 Jan 19. doi: 10.
1109/TCBB.2021.3052811.
20. Tao W, Li C, Song R, Cheng J, Liu Y, Wan F, Chen X. EEG-based emotion recognition via channel-
wise attention and self attention. IEEE Trans Affect Com. 2020 Sep 22. doi: 10.1109/TAFFC.2020.
3025777.
21. Sharma R, Pachori RB, Sircar P. Automated emotion recognition based on higher order statistics
and deep learning algorithm. Bio Sig Pro Cont. 2020 Apr 1; 58: 101867.
22. Wang F, Wu S, Zhang W, Xu Z, Zhang Y, Wu C, Coleman S. Emotion recognition with
convolutional neural network and EEG-based EFDMs. Neuropsychologia. 2020 Sep 1; 146: 107506.
23. Liu Y, Ding Y, Li C, Cheng J, Song R, Wan F, Chen X. Multi-channel EEG-based emotion
recognition via a multi-level features guided capsule network. Comput Biol Med. 2020 Aug 1; 123:
103927.
24. Zhong P, Wang D, Miao C. EEG-based emotion recognition using regularized graph neural
networks. IEEE Trans Affect Comp. 2020 May 11. doi: 10.1109/TAFFC.2020.2994159.
25. Yin Y, Zheng X, Hu B, Zhang Y, Cui X. EEG emotion recognition using fusion model of graph
convolutional neural networks and LSTM. Appl Soft Comput. 2021 Mar 1; 100: 106954.
26. Khare SK, Bajaj V. Time-frequency representation and convolutional neural network-based
emotion recognition. IEEE Trans Neural Net Lear Sys. 2020 Jul 31; 32(7): 2901-2909.
27. Cui H, Liu A, Zhang X, Chen X, Wang K, Chen X. EEG-based emotion recognition using an end-to-
end regional-asymmetric convolutional neural network. Knowl Base Syst. 2020 Oct 12; 205:
106243.
28. Li C, Bao Z, Li L, Zhao Z. Exploring temporal representations by leveraging attention-based
bidirectional LSTM-RNNs for multi-modal emotion recognition. Inf Process Management. 2020
May 1; 57(3): 102185.
29. Topic A, Russo M. Emotion recognition based on EEG feature maps through deep learning network. Eng Sci Tech Int J. 2021 Apr 16. doi: 10.1016/j.jestch.2021.03.012.
30. He H, Tan Y, Ying J, Zhang W. Strengthen EEG-based emotion recognition using firefly integrated optimization algorithm. Appl Soft Comput. 2020 Sep 1; 94: 106426.
31. Russell JA. A circumplex model of affect. J Personal Soc Psychol. 1980 Dec; 39(6): 1161.
32. Verma GK, Tiwary US. Affect representation and recognition in 3D continuous valence–arousal–
dominance space. Multimed Tool Appl. 2017 Jan; 76(2): 2159-83.
33. Correa JA, Abadi MK, Sebe N, Patras I. Amigos: a dataset for affect, personality and mood
research on individuals and groups. IEEE Trans Affect Com. 2018 Nov 30; 12(2): 479-493.
34. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling
technique. J Artif intelligence Res. 2002 Jun 1; 16: 321-57.
35. Angelov P, Sperduti A. Challenges in deep learning. In: ESANN 2016 - 24th European symposium
on artificial neural networks. ESANN 2016 - 24th European symposium on artificial neural
networks. i6doc.com publication, BEL; 2016. 489-496. ISBN 9782875870278.
36. Bracewell RN, Bracewell RN. The Fourier transform and its applications. New York, NY:
McGraw-Hill; 1986 Feb.
37. Daubechies I. Ten lectures on wavelets, CBMS conf. Series Appl Math; 1992 Jan 1; 61.
38. Le QV. A tutorial on deep learning part 2: autoencoders, convolutional neural networks and recurrent neural networks. Google Brain. 2015 Oct 20: 1-20 [online] Available from: https://cs.stanford.edu/~quocle/tutorial2.pdf.
39. Cervantes J, Li X, Yu W, Li K. Support vector machine classification for large data sets via
minimum enclosing ball clustering. Neurocomputing. 2008 Jan 1; 71(4–6): 611-9.
40. Siddharth, Jung TP, Sejnowski TJ. Multi-modal approach for affective computing. 2018 40th
Annual International Conference of the IEEE Engineering in Medicine and Biology Society
(EMBC); IEEE; 2018 Jul 18. 291-294.
41. Tung K, Liu PK, Chuang YC, Wang SH, Wu AY. Entropy-assisted multi-modal emotion
recognition framework based on physiological signals. 2018 IEEE-EMBS Conference on
Biomedical Engineering and Sciences (IECBES); IEEE: 2018 Dec 3. 22-26.
42. Siddharth S, Jung TP, Sejnowski TJ. Utilizing deep learning towards multi-modal bio-sensing and
vision-based affective computing. IEEE Trans Affect Com. 2019 May 14. doi: 10.1109/TAFFC.
2019.2916015.

Annexure
Annexure is available online for this article.

Corresponding author
Shruti Garg can be contacted at: gshruti@bitmesra.ac.in

