Biomedical Signal Processing and Control 84 (2023) 104835

Linlin Gong, Mingyang Li, Tao Zhang, Wanzhong Chen
Keywords: Electroencephalogram (EEG); Emotion recognition; Attention mechanism; Convolutional neural network (CNN); Transformer

Abstract
EEG-based emotion recognition has become an important task in affective computing and intelligent interaction. However, how to effectively combine the spatial, spectral, and temporal distinguishable information of EEG signals to achieve better emotion recognition performance is still a challenge. In this paper, we propose a novel attention-based convolutional transformer neural network (ACTNN), which effectively integrates the crucial spatial, spectral, and temporal information of EEG signals, and cascades convolutional neural network
and transformer in a new way for the emotion recognition task. We first organize EEG signals into spatial–spectral–temporal representations. To enhance the distinguishability of features, spatial and spectral attention
masks are learned for the representation of each time slice. Then, a convolutional module is used to extract
local spatial and spectral features. Finally, we concatenate the features of all time slices, and feed them into the
transformer-based temporal encoding layer to use multi-head self-attention for global feature awareness. The
average recognition accuracy of the proposed ACTNN on two public datasets, namely SEED and SEED-IV, is
98.47% and 91.90% respectively, outperforming the state-of-the-art methods. Besides, to explore the underlying reasoning process of the model and its neuroscientific relevance to emotion, we further visualize the spatial and spectral attention masks. The attention weight distribution shows that the activities of the prefrontal and lateral temporal lobes of the brain, and the gamma band of EEG signals, might be more closely related to human
emotion. The proposed ACTNN can be employed as a promising framework for EEG emotion recognition.
∗ Corresponding author.
E-mail address: chenwz@jlu.edu.cn (W. Chen).
https://doi.org/10.1016/j.bspc.2023.104835
Received 28 November 2022; Received in revised form 17 February 2023; Accepted 5 March 2023
Available online 10 March 2023
1746-8094/© 2023 Elsevier Ltd. All rights reserved.
Frequency-domain features generally include power spectral density (PSD) features [23]. For time-frequency domain features, wavelet transforms are often used for extraction, including the discrete wavelet transform (DWT) [24,25], the tunable Q wavelet transform (TQWT) [26–28], dual-tree complex wavelet transforms (DT-CWT) [29,30], etc. In addition, some studies used features based on entropy measures. In the field of EEG emotion recognition, the differential entropy (DE) feature [31] is widely used and has proved to be robust; other entropy-based features include sample entropy [32,33], energy entropy [34], approximate entropy [35,36], etc. For feature classification, the classifiers used generally include the support vector machine (SVM) [31,36], k-nearest neighbor (KNN) [19,21], random forest (RF) [34], and decision tree (DT) [25,37], as well as ensemble models [24] of multiple classifiers.

Recently, with the continuous improvement and superior performance of deep learning algorithms, EEG emotion recognition methods based on deep learning frameworks have been applied effectively and have achieved better performance. Zheng et al. [38] designed a classification model based on the deep belief network (DBN) and discussed the key frequency bands and channels most suitable for emotion classification tasks. Maheshwari et al. [39] proposed a deep convolutional neural network (Deep CNN) method for EEG emotion classification. Considering the spatial information of adjacent and symmetric EEG channels, Cui et al. [40] proposed an end-to-end regional asymmetric convolutional neural network (RACNN), wherein the temporal, regional, and asymmetric feature extractors are all composed of convolution structures. Xing et al. [41] used a stacked autoencoder (SAE) to decompose the EEG source signal and then used a long short-term memory recurrent neural network (LSTM-RNN) framework for emotion classification.

In addition, some researchers use hybrid deep models. For example, Iyer et al. [42] proposed a hybrid model based on CNN and LSTM, and an ensemble model combining CNN, LSTM and the hybrid model. Li et al. [43] presented a hybrid model based on CNN and RNN (CRNN), which constructs scalograms from the continuous wavelet transform of the EEG signals as the model input. Zhang et al. [44] designed an end-to-end hybrid network based on CNN and LSTM (CNN-LSTM), which directly takes the original EEG signal as input. All of these hybrid frameworks showed better classification results than a single model.

However, there are still some challenges and open problems in the area of EEG-based emotion recognition.

Firstly, as mentioned above, most studies extract features from the time domain, frequency domain or time-frequency domain of EEG signals. In fact, EEG also contains spatial information across channels, because an emotional state involves large-scale network interactions along the entire neural axis [45]. To effectively use the spatial information of EEG signals, Song et al. [46] proposed a dynamical graph convolutional neural network (DGCNN) for EEG emotion recognition; in their method, each EEG channel is taken as a vertex of a graph, and the adjacency matrix is dynamically updated during training. Subsequently, more models based on graph neural networks have gradually been proposed [47,48]. Another way of using spatial information has recently attracted attention, in which the EEG signal is organized into a two-dimensional matrix. Yang et al. [49] were the first to propose this integration method; later, CRNN [43], HCNN [50], PCNN [51], etc., also used similar construction methods.

Secondly, CNNs have been extensively applied to the EEG emotion recognition task. However, there are temporal context relations between frames, and the convolution kernel of a CNN, which perceives only locally, may break these relations. The Transformer [52] has strong global awareness due to its multi-head self-attention mechanism. Therefore, we aim to combine the local perception ability of the CNN with the global perception ability of the Transformer to design a novel EEG-based emotion recognition model with better performance.

In this paper, we propose a novel multi-channel EEG emotion recognition model (ACTNN), which cascades CNN and transformer in a single framework. First, we use a non-overlapping window with a length of T seconds to intercept the EEG signals after removing noise and artifacts. Then, we divide each window into T 1-second segments. For each segment, we extract the DE features in the δ, θ, α, β, and γ frequency bands, and then map the features in space according to the positions of the electrodes. Subsequently, to enhance the critical spatial and spectral information and suppress invalid information, we introduce a new parallel spatial and spectral attention mechanism. Next, we use the convolution module to extract the local spatial and spectral features of each time slice. In the temporal encoding part, we concatenate the features from the time slices and apply multi-head self-attention for global awareness through three temporal encoding layers. Finally, the classifier, composed of a fully-connected layer and a softmax layer, predicts the emotion labels.

We carry out a series of experiments with ACTNN. Firstly, a statistical analysis of the DE features is carried out using one-way ANOVA. Secondly, the overall performance of ACTNN and a comparison of the results under different attention conditions are reported. Thirdly, we compare the recognition performance when the input is the raw EEG signals or the DE features. Fourthly, the spatial and spectral attention masks are visualized to explore the model's potential reasoning process and interpretability. Finally, an ablation experiment is conducted to investigate the contribution of the key components of ACTNN to the recognition performance.

The main contributions of this paper are as follows:

(1) We propose a novel attention-based convolutional transformer neural network, named ACTNN. It cascades a convolutional neural network and a transformer in an innovative way for EEG emotion recognition, effectively combining the local awareness of the CNN with the global awareness of the transformer to form a powerful model.

(2) We introduce a new attention mechanism that effectively enhances the distinguishability of the spatial, spectral, and temporal information of EEG signals and achieves satisfactory results. Moreover, we apply a more lightweight spatial and spectral attention layout, which avoids the high computational complexity caused by common attention layouts, saves computation, and still ensures better recognition accuracy.

(3) The average recognition accuracy of the proposed ACTNN on the SEED and SEED-IV datasets is 98.47% and 91.90% respectively, outperforming the state-of-the-art methods. Besides, to explore the underlying reasoning process of the model and its neuroscientific relevance to emotion, we analyze the attention masks; the weight distribution shows that the activities of the prefrontal and lateral temporal lobes of the brain and the gamma band of EEG signals might be more related to human emotion.

The rest of this paper is arranged as follows: Section 2 introduces the two public datasets and the proposed method in detail. Section 3 reports and analyzes the experimental results. Section 4 discusses the noteworthy points of our work and the points to be improved according to the results. Section 5 concludes the paper.

2. Materials and methods

2.1. Dataset

We conduct extensive experiments on the SEED¹ [31,38] and SEED-IV [53] datasets to evaluate our model. The main details of the two datasets are summarized in Table 1.

The SEED dataset [31,38] is a public EEG emotion dataset that is mainly oriented to discrete emotion models. The experimental flow of the SEED dataset is shown in Fig. 1 and is similar to that of the SEED-IV dataset. It includes 15 subjects (7 males and 8 females, age 23.27 ± 2.37). Each subject did three experiments at intervals of about one week.

¹ https://bcmi.sjtu.edu.cn/~seed/index.html
Table 1
Details of SEED and SEED-IV datasets.

Item                     SEED        SEED-IV
Subjects                 15          15
Trials/Film clips        15          24
Each clip duration       4 min       2 min
Sessions/experiments     3           3
EEG electrodes           62          62
Sampling rate            200 Hz      200 Hz
Emotion category         3 classes   4 classes

2.2. The proposed model

The framework of the proposed ACTNN is shown in Fig. 2. It mainly consists of the following parts: EEG signal acquisition, preprocessing and segmentation, feature extraction, spatial projection, the spatial and spectral attention branch, the spatial–spectral convolution part, the temporal encoding part, and the classifier.

For the EEG signals induced by the emotional stimuli of each subject in the dataset, we first intercept non-overlapping T-second EEG signals and divide them into T time slices with a length of 1 s. Then, we extract DE features in five frequency bands (i.e., the δ, θ, α, β, and γ rhythms) from each slice and map them to a spatial matrix. In the attention stage, we introduce a parallel spatial and spectral attention branch to adaptively allocate attention weights over the spatial and spectral dimensions. Next, we use the spatial–spectral convolution module to extract local features from each time slice. After concatenating the features of all time slices, the temporal encoding layer is used to further extract temporal features from a global view. Finally, a fully-connected layer and a softmax layer are used to predict the emotional state of the subjects. The following is a detailed introduction to the implementation of each part.

2.3. Preprocessing and feature extraction

For the preprocessed EEG signals in the SEED and SEED-IV datasets, we use a non-overlapping window with a length of T seconds to intercept the EEG signals. Then, we divide each window into T 1-second segments. For each segment, we extract the differential entropy (DE) features on the δ (1–4 Hz), θ (4–8 Hz), α (8–13 Hz), β (13–31 Hz), and γ (31–50 Hz) frequency bands, respectively. The DE feature is calculated as

$DE(X) = -\int_X f(x)\,\log f(x)\,\mathrm{d}x$    (1)

where $X$ represents the EEG sequence and $f(x)$ represents its probability density function. Shi et al. [54] proved that, when band-pass filtering is carried out in 2 Hz steps from 2 Hz to 44 Hz, the EEG signal of each subband approximately follows a Gaussian distribution, namely $X \sim N(\mu, \sigma^2)$. Therefore, formula (1) can be further written as

$DE(X) = -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\log\!\left[\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\right]\mathrm{d}x = \frac{1}{2}\log\!\left(2\pi e\sigma^{2}\right)$    (2)

where $\pi$ and $e$ are constants, and $\sigma^{2}$ represents the variance of the EEG time series.

The three-dimensional structure $E_i \in \mathbb{R}^{B\times(H\times W)}$ contains important spatial and spectral information, where $i$ denotes the $i$-th time slice, $i = 1, \ldots, T$. We introduce spatial and spectral attention branches to adaptively capture the brain regions and frequency bands that are more critical for the task. Inspired by the convolutional attention module scSE [55], which was initially used in the field of medical image segmentation, we design attention branches suitable for our task, as shown in the lower-left corner of Fig. 2.

2.5.1. Spatial attention branch

The spatial attention branch aims to capture the crucial brain regions, and the corresponding electrodes, involved in emotional activities, using spectral squeeze and spatial excitation. In detail, the three-dimensional structure $E_i$ of each time slice is written as $E_i = [e_{1,1}, e_{1,2}, \ldots, e_{H,W}]$, where $e_{i,j} \in \mathbb{R}^{B\times(1\times1)}$. The spectral squeeze is mainly realized by a 3D convolution with a kernel of size $B \times 1 \times 1$ and a single output channel, which is represented by

$K_i = W_k \otimes E_i$    (3)

where $W_k \in \mathbb{R}^{1\times B\times1\times1}$ denotes the learned kernel and $K_i \in \mathbb{R}^{1\times H\times W}$ is the spatial score tensor. Next, the sigmoid function (denoted by $\sigma(\cdot)$) is applied to normalize each element $k_{m,n}$ ($m = 1, 2, \ldots, H$; $n = 1, 2, \ldots, W$) of $K_i$ to the range [0, 1], which gives the spatial attention scores. Finally, using the spatial attention scores to recalibrate the original three-dimensional structure $E_i$, we get

$E_{i,\mathrm{spatial}} = [\sigma(k_{1,1})e_{1,1},\ \sigma(k_{1,2})e_{1,2},\ \ldots,\ \sigma(k_{H,W})e_{H,W}]$    (4)
Fig. 2. The framework diagram of the attention-based convolutional transformer neural network (ACTNN) for EEG emotion recognition.
$Y_i = f(\mathrm{Conv}(B_{i,2},\, k_{c3})), \quad k_{c3} \in \mathbb{R}^{1\times3\times3}$    (12)

Table 2
Sample size in SEED and SEED-IV datasets.

Dataset    session1/session2/session3
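Eq. (12) applies one of the 3 × 3 convolutions of the spatial–spectral convolution module to each time slice. To make the overall cascade of Section 2.2 concrete (per-slice attention and convolution, concatenation of slice features, transformer-based temporal encoding with multi-head self-attention, and a fully-connected softmax classifier), the following PyTorch sketch traces the forward pass. The channel widths, pooling, head count, grid size, and the use of nn.TransformerEncoderLayer are illustrative assumptions rather than the authors' exact configuration.

```python
# Assumption-laden sketch of the ACTNN forward pass (Section 2.2); not the authors' exact model.
import torch
import torch.nn as nn

class ACTNNSketch(nn.Module):
    def __init__(self, bands=5, t_slices=2, d_model=64, n_classes=3):
        super().__init__()
        self.spectral_squeeze = nn.Conv3d(1, 1, kernel_size=(bands, 1, 1))   # spatial attention, Eqs. (3)-(4)
        self.conv = nn.Sequential(                                           # local spatial-spectral features
            nn.Conv2d(bands, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),          # cf. the 3 x 3 kernel of Eq. (12)
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(encoder_layer, num_layers=3)   # three temporal encoding layers
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(d_model * t_slices, n_classes))

    def forward(self, x):                        # x: (batch, T, bands, H, W) DE feature maps
        slices = []
        for i in range(x.size(1)):               # handle each 1-s time slice separately
            e = x[:, i]                          # (batch, bands, H, W)
            scores = torch.sigmoid(self.spectral_squeeze(e.unsqueeze(1))).squeeze(1)
            e = e * scores                       # spatial recalibration (spectral branch omitted here)
            slices.append(self.proj(self.conv(e).flatten(1)))
        seq = torch.stack(slices, dim=1)         # (batch, T, d_model): concatenated slice features
        seq = self.temporal(seq)                 # global awareness via multi-head self-attention
        return self.classifier(seq)              # class logits; softmax is applied in the loss

print(ACTNNSketch()(torch.randn(4, 2, 5, 9, 9)).shape)   # torch.Size([4, 3])
```

Training such a sketch would then follow Table 3: Adam with a learning rate of 1e−5, cross-entropy loss, a batch size of 32, and dropout of 0.6–0.7.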
Table 3
Hyper-parameter setting.

Hyper-parameter     Value or type
Optimizer           Adam
Learning rate       1e−5
Loss function       cross entropy
Batch size          32
Number of epochs    30 (SEED) / 50 (SEED-IV)
Dropout             0.7 (SEED) / 0.6 (SEED-IV)

Table 4
The F statistic and p-value obtained by one-way ANOVA in SEED.

Subject   F         p-value
1         436.99    1.40E−169
2         118.4     1.97E−50
3         101.98    9.76E−44
4         238.4     1.34E−97
5         422.8     1.15E−164
6         1769.72   0
7         903.47    3.01E−315
8         1386.73   0
9         935.79    0
10        227.4     2.13E−93
11        960.23    0
12        65.01     1.96E−28
13        573.44    3.01E−215
14        1060.74   0
15        781.07    1.00E−279

Table 5
The F statistic and p-value obtained by one-way ANOVA in SEED-IV.

Subject   F        p-value
1         227.51   1.28E−134
2         250.23   7.59E−147
3         286.65   4.90E−166
4         496.03   9.90E−268
5         64.03    2.80E−40
6         37.03    1.52E−23
7         160.74   1.72E−97
8         166.12   1.51E−100
9         74.84    6.92E−47
10        52.90    2.03E−33
11        168.25   9.35E−102
12        329.79   4.09E−185
13        129.28   2.42E−79
14        388.92   1.95E−217
15        67.18    3.31E−42
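As a concrete reading of Tables 4 and 5, the following sketch groups per-segment DE features by emotion label and applies a one-way ANOVA with scipy.stats.f_oneway. The DE computation follows Eq. (2); the filter order, the per-segment averaging, and the synthetic data are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of the one-way ANOVA on DE features (Tables 4-5): for each subject, DE feature samples are
# grouped by emotion label and tested for a significant difference in means across classes.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.stats import f_oneway

def de_features(segment, fs=200, bands=((1, 4), (4, 8), (8, 13), (13, 31), (31, 50))):
    """segment: (channels, samples) 1-s EEG slice -> flat DE feature vector (bands * channels,)."""
    feats = []
    for low, high in bands:
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        sigma2 = sosfiltfilt(sos, segment, axis=-1).var(axis=-1)
        feats.append(0.5 * np.log(2 * np.pi * np.e * sigma2))   # Eq. (2), per channel
    return np.concatenate(feats)

# Toy data standing in for one subject: 300 one-second segments, 62 channels at 200 Hz,
# with a hypothetical emotion label per segment (3 classes as in SEED).
rng = np.random.default_rng(0)
segments = rng.standard_normal((300, 62, 200))
labels = rng.integers(0, 3, size=300)
X = np.array([de_features(s) for s in segments])                 # (300, 5 * 62)

# One-way ANOVA on the mean DE value of each segment, grouped by emotion class.
groups = [X[labels == c].mean(axis=1) for c in np.unique(labels)]
F, p = f_oneway(*groups)
print(f"F = {F:.2f}, p = {p:.2e}")
```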
Table 6
The components of the different attention situations (✓ = included, × = excluded). The temporal encoding layer is split into its multi-head self-attention (MHSA) part and its feed-forward network (FFN) part.

Component                         Spatial attention   Spectral attention   Spatial–spectral convolution module   MHSA   FFN
W/O any attention                 ×                   ×                    ✓                                     ×      ✓
With only spatial attention       ✓                   ×                    ✓                                     ×      ✓
With only spectral attention      ×                   ✓                    ✓                                     ×      ✓
With spatial–spectral attention   ✓                   ✓                    ✓                                     ×      ✓
With only temporal attention      ×                   ×                    ✓                                     ✓      ✓
With all attention                ✓                   ✓                    ✓                                     ✓      ✓
Table 7
The average accuracy and standard deviation (acc/std(%)) of ACTNN model in different attention situations.
Attention SEED SEED-IV
session1 session2 session3 session1 session2 session3
w/o any attention 91.57/7.11 92.24/4.15 91.86/6.72 66.98/6.66 63.04/9.25 67.69/7.86
With only spatial attention 94.43/4.90 95.92/2.80 95.51/4.72 74.28/7.80 70.55/8.26 73.94/8.37
With only spectral attention 94.59/4.73 95.99/2.83 95.50/4.73 74.20/8.32 70.62/8.62 74.07/8.15
With spatial–spectral attention 96.31/3.11 97.21/2.17 97.06/3.49 77.24/7.62 73.59/7.47 76.27/8.83
With only temporal attention 97.21/2.66 97.48/2.86 97.35/2.58 89.71/4.72 84.13/8.24 87.37/9.63
With all attention 98.21/1.71 98.47/1.73 98.72/1.71 93.55/2.33 90.93/5.51 91.21/8.46
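The six situations compared in Tables 6 and 7 amount to switching the three attention components on or off while keeping the spatial–spectral convolution module and the feed-forward part of the temporal encoding layer fixed. A possible, purely illustrative way to encode these configurations (the names below are not from any released code) is:

```python
# Hypothetical flag sets for the six attention situations of Table 6; the convolution module and
# the temporal encoder's feed-forward part stay enabled in every variant.
from dataclasses import dataclass

@dataclass
class AttentionConfig:
    spatial: bool      # spatial attention branch
    spectral: bool     # spectral attention branch
    temporal: bool     # multi-head self-attention in the temporal encoding layer

SITUATIONS = {
    "w/o any attention":               AttentionConfig(False, False, False),
    "with only spatial attention":     AttentionConfig(True,  False, False),
    "with only spectral attention":    AttentionConfig(False, True,  False),
    "with spatial-spectral attention": AttentionConfig(True,  True,  False),
    "with only temporal attention":    AttentionConfig(False, False, True),
    "with all attention":              AttentionConfig(True,  True,  True),
}

for name, cfg in SITUATIONS.items():
    print(f"{name:34s} spatial={cfg.spatial} spectral={cfg.spectral} MHSA={cfg.temporal}")
```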
Fig. 7. The recognition accuracy of each subject under six attention situations in three sessions of SEED dataset. They are w/o any attention (dark blue square), with only spatial
attention (green circle), with only spectral attention (light blue triangle), with spatial–spectral attention (purple pentagon), with only temporal attention (orange diamond), and
with all attention (red star), respectively.
Table 6 lists the components of each attention situation. Table 7 gives the average accuracy and standard deviation obtained under the different attention mechanisms. Figs. 7 and 8 show the results of all sessions for all subjects in SEED and SEED-IV, respectively.

When no attention is used (dark blue squares in Figs. 7 and 8), all subjects in the SEED dataset still perform relatively well, but the subjects in the SEED-IV dataset are greatly affected. Compared with using no attention, adding only spatial attention (green circles) or only spectral attention (light blue triangles) yields a similar increase, which may be due to the parallel structure of the spatial and spectral attention mechanisms. When the spectral and spatial attention mechanisms are combined (purple pentagons), the accuracy is further improved. Adding only temporal attention (orange diamonds) brings the largest improvement over the previous situations, possibly because it captures context from the global scope of the time slices, which makes the input more discriminative.

We can make a quantitative comparison from Table 7. Compared with no attention, the maximum improvement is achieved by adding temporal attention, which increases the accuracy by at least 5.24% and 19.68% in SEED and SEED-IV, respectively. In comparison, adding spatial attention increases the average accuracy by at least 2.86% and 6.25%, adding spectral attention by at least 3.02% and 6.38%, and adding spatial–spectral attention by at least 4.74% and 8.58%, respectively.

To sum up, in our model, temporal attention achieves a larger gain than spatial or spectral attention. Owing to the designed parallel structure, spatial attention and spectral attention bring similar improvements, and they play a better role when combined.

For the overall performance of ACTNN, Figs. 9 and 10 report the accuracy of all subjects on the SEED and SEED-IV datasets. As can be seen from Fig. 9, the proposed ACTNN achieves a satisfactory classification result for all subjects in SEED. The average recognition accuracy of all subjects in the three sessions is 98.21%, 98.47%, and 98.72%, with corresponding standard deviations of 1.71%, 1.73%, and 1.71%. This shows that ACTNN has good stability and superiority on the SEED dataset.

For most subjects in the SEED-IV dataset (see Fig. 10), the proposed ACTNN also achieves good results: the average recognition accuracy of all subjects in the three sessions is 93.55%, 90.93%, and 91.21%, with standard deviations of 2.33%, 5.51%, and 8.46%, respectively. There are a few exceptions, such as subject #9 in session 3 (75.4%) and subject #11 in session 2 (77.16%) and session 3 (67.31%), which may be due to differences between the emotion labels assigned to the EEG signals and the emotions actually induced in these subjects.

In addition, to illustrate the ability of ACTNN to distinguish the various emotional states, Fig. 11 shows the confusion matrices obtained by ACTNN on the SEED and SEED-IV datasets. As shown in Fig. 11(a), for the SEED dataset, ACTNN achieves the best classification for positive emotions, followed by neutral emotions. For the SEED-IV dataset (Fig. 11(b)), sad is the most easily distinguished emotional state, and fear seems to be the least recognizable.

3.5. Comparative analysis between raw EEG signals and DE features

The previous experiments proved that when the extracted DE features are used as the input of ACTNN, good emotion recognition performance can be obtained.
Fig. 8. The recognition accuracy of each subject under six attention situations in three sessions of SEED-IV dataset. They are w/o any attention (dark blue square), with only
spatial attention (green circle), with only spectral attention (light blue triangle), with spatial–spectral attention (purple pentagon), with only temporal attention (orange diamond),
and with all attention (red star), respectively.
Fig. 10. The overall performance of the proposed ACTNN on SEED-IV dataset.
Fig. 11. The confusion matrix of the proposed ACTNN on SEED and SEED-IV dataset.
Table 8
Comparison of recognition performance (average accuracy and standard deviation) with raw EEG signal and DE feature as input in SEED.

Session    Raw EEG signals          DE features
           Acc (%)    Std (%)       Acc (%)    Std (%)
1          94.70      5.77          98.21      1.71
2          96.78      2.28          98.47      1.73
3          95.86      2.81          98.72      1.71
Average    95.78      3.62          98.47      1.72

Table 9
Comparison of recognition performance (average accuracy and standard deviation) with raw EEG signal and DE feature as input in SEED-IV.

Session    Raw EEG signals          DE features
           Acc (%)    Std (%)       Acc (%)    Std (%)
1          89.36      5.88          93.55      2.33
2          87.65      4.45          90.93      5.51
3          87.34      5.05          91.21      8.46
Average    88.12      5.13          91.90      5.43

Fig. 12. Brain topographic map of the spatial attention mask adaptively assigned by ACTNN on subject #4 in the SEED dataset, where the first and second maps represent the attention weights assigned for the 1st and 2nd second of input data, respectively.
Table 8 shows that the average accuracy obtained with the raw EEG signals as input in the SEED dataset is 95.78%, which is 2.69% lower than that of the DE features. Similarly, Table 9 shows that the average accuracy obtained with the raw EEG signals in the SEED-IV dataset is 88.12%, which is 3.78% lower than that of the DE features. Therefore, although extracting the DE features adds some complexity compared with using the raw EEG signals, the final recognition results show that the DE features achieve better performance within an acceptable range of complexity, as they contain more effective emotional information.

3.6. Analysis of spatial and spectral attention masks

To further understand the underlying reasoning process of the proposed method, we visualize the spatial and spectral attention masks in the model. These attention masks are a set of data-driven attention weights that are dynamically assigned to critical electrodes or frequency bands over the course of training.

To describe the weight distribution of the spatial attention masks more intuitively, we capture the updated masks in the last iteration of the model and map them to brain topographic maps. Figs. 12 and 13 show the spatial attention masks captured for the SEED and SEED-IV datasets, respectively. The redder the color, the higher the assigned weight. It can be seen that the attention weights of all emotions are mainly distributed over the prefrontal lobe and the lateral temporal lobe, which indicates that these brain regions may be more closely related to emotional activation and information processing in the brain; this is consistent with the observations of neurobiological studies [56,57]. It should be noted that the spatial attention mask is obtained by compressing the frequency bands, that is, we use convolutional kernels of size 5 × 1 × 1 on the spectral dimension, so it contains the comprehensive information of the five frequency bands. As shown in Figs. 12 and 13, since we set T to 2, the first and second brain topographic maps of each emotion represent the attention masks captured for the 1st and 2nd second, respectively. It can be seen that the weight distribution changes only within a small range over time. Due to limited space, we only list the results of subject #4 in SEED and subject #3 in SEED-IV as examples; the spatial attention brain topographic maps of the other subjects are attached at the end of the paper (see supplemental material).

For the spectral attention masks, we compute the average spectral attention masks of all subjects after training, which represent the common importance of the different frequency bands and explain the contribution of each frequency band to emotion recognition. The average weights of the spectral attention masks are plotted in Fig. 14. We can see that all the attention mask values lie between 0 and 1. On both the SEED and SEED-IV datasets, the model allocates the maximum attention weight to the gamma band. Since the attention weights are data-driven, this indicates that the features of the gamma band may provide more valuable discriminative information for emotion recognition tasks, and that gamma-band EEG may be more closely related to human emotion, which is consistent with existing research [58]. Thus, the gamma-band features are continuously enhanced after recalibration, which improves the overall recognition performance.

3.7. Method comparison

To verify the effectiveness of our model, we compare the proposed model with the state-of-the-art methods; a brief introduction of each method is listed as follows.
Table 10
Performance comparison between the baseline methods and the proposed ACTNN on the SEED and SEED-IV datasets.
Methods Year Evaluation methods SEED SEED-IV
Acc (%) Std (%) Acc (%) Std (%)
DBN [38] 2015 Trial (9:6) 86.08 8.34 – –
SVM [53] 2018 Trial (16:8) – – 70.58 17.01
DGCNN [46] 2018 Trial (9:6) 90.40 8.49 – –
BiHDM [59] 2019 Trial (9:6)/(16:8) 93.12 6.06 74.35 14.09
GCB-net+BLS [48] 2019 Trial (9:6) 94.24 6.70 – –
RGNN [47] 2020 Trial (9:6)/(16:8) 94.24 5.95 79.37 10.54
4D-CRNN [60] 2020 5-fold CV 94.74 2.32 – –
SST-EmotionNet [61] 2020 Shuffle (6:4) 96.02 2.17 84.92 6.66
3D-CNN&PST [62] 2021 Shuffle (9:6) 95.76 4.98 82.73 8.96
EeT [63] 2021 5-fold CV 96.28 4.39 83.27 8.37
JDAT [64] 2021 10-fold CV 97.30 1.74 – –
4D-aNN [65] 2022 5-fold CV 96.25 1.86 86.77 7.29
MDGCN-SRCNN [66] 2022 Trial (9:6)/(16:8) 95.08 6.12 85.52 11.58
HCRNN [67] 2022 5 times 10-fold CV 95.33 1.39 – –
ACTNN (this paper) 2022 10-fold CV 98.47 1.72 91.90 5.43
5. Conclusion
CNN-based and Transformer-based modules, and the results show that the temporal encoding module has a relatively larger contribution to the improvement of recognition performance.

The proposed ACTNN provides a new insight into human emotion decoding based on EEG signals, and can also be easily applied to other EEG classification tasks, such as sleep stage classification, motor imagery, etc. In future work, we will explore the performance of ACTNN in subject-independent and cross-session tasks to improve the generalization ability of the model.

CRediT authorship contribution statement

Linlin Gong: Conceptualization, Methodology, Software, Writing – original draft. Mingyang Li: Writing – review & editing, Methodology. Tao Zhang: Investigation, Validation. Wanzhong Chen: Supervision, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

We sincerely appreciate all the editors and reviewers for their insightful comments and constructive suggestions. This work was supported by the Natural Science Foundation of Jilin Province, China (Grant No. 20210101178JC), the Scientific Research Project of the Education Department of Jilin Province, China (Grant No. JJKH20221009KJ), the Interdisciplinary Integration and Innovation Project of JLU, China (Grant No. JLUXKJC2021ZZ02), and the National Natural Science Foundation of China (Grant No. 62203183).

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.bspc.2023.104835.

References

[1] R.W. Picard, Affective Computing, MIT Press, 1997.
[2] P.D. Bamidis, C. Papadelis, et al., Affective computing in the era of contemporary neurophysiology and health informatics, Interact. Comput. 16 (4) (2004) 715–721.
[3] R.W. Picard, E. Vyzas, J. Healey, Toward machine emotional intelligence: analysis of affective physiological state, IEEE Trans. Pattern Anal. Mach. Intell. 23 (10) (2001) 1175–1191.
[4] S. Ehrlich, C. Guan, G. Cheng, A closed-loop brain-computer music interface for continuous affective interaction, in: 2017 International Conference on Orange Technologies, ICOT, 2017, pp. 176–179, http://dx.doi.org/10.1109/ICOT.2017.8336116.
[5] J. Pan, Q. Xie, et al., Emotion-related consciousness detection in patients with disorders of consciousness through an EEG-based BCI system, Front. Hum. Neurosci. 12 (2018).
[6] E.V.C. Friedrich, A. Sivanathan, et al., An effective neurofeedback intervention to improve social interactions in children with autism spectrum disorder, J. Autism Dev. Disord. 45 (2015) 4084–4100.
[7] H. Dini, F. Ghassemi, M.S.E. Sendi, Investigation of brain functional networks in children suffering from attention deficit hyperactivity disorder, Brain Topogr. 33 (2020) 733–750.
[8] H. Chang, Y. Zong, et al., Depression assessment method: An EEG emotion recognition framework based on spatiotemporal neural network, Front. Psychiatry 12 (2022).
[9] H. Hu, Z. Zhu, et al., Analysis on biosignal characteristics to evaluate road rage of younger drivers: A driving simulator study, in: 2018 IEEE Intelligent Vehicles Symposium, IV, 2018, pp. 156–161.
[10] J. Yedukondalu, L.D. Sharma, Cognitive load detection using circulant singular spectrum analysis and Binary Harris Hawks Optimization based feature selection, Biomed. Signal Process. Control 79 (2023) 104006.
[11] R.W. Picard, Affective computing: challenges, Int. J. Hum.-Comput. Stud. 59 (1–2) (2003) 55–64.
[12] H.D. Nguyen, S.H. Kim, et al., Facial expression recognition using a temporal ensemble of multi-level convolutional neural networks, IEEE Trans. Affect. Comput. 13 (1) (2022) 226–237.
[13] F. Noroozi, C.A. Corneanu, et al., Survey on emotional body gesture recognition, IEEE Trans. Affect. Comput. 12 (2) (2021) 505–523.
[14] W. Li, Z. Zhang, A. Song, Physiological-signal-based emotion recognition: An odyssey from methodology to philosophy, Measurement 172 (2021) 108747.
[15] C. Morawetz, S. Bode, et al., Effective amygdala-prefrontal connectivity predicts individual differences in successful emotion regulation, Soc. Cogn. Affect. Neurosci. 12 (4) (2017) 569–585.
[16] S. Berboth, C. Morawetz, Amygdala-prefrontal connectivity during emotion regulation: A meta-analysis of psychophysiological interactions, Neuropsychologia 153 (2021) 107767.
[17] J.T. Cacioppo, D.J. Klein, et al., The psychophysiology of emotion, in: The Handbook of Emotion, 2003.
[18] P.C. Petrantonakis, L.J. Hadjileontiadis, Emotion recognition from brain signals using hybrid adaptive filtering and higher order crossings analysis, IEEE Trans. Affect. Comput. 1 (2) (2010) 81–97.
[19] P.C. Petrantonakis, L.J. Hadjileontiadis, Emotion recognition from EEG using higher order crossings, IEEE Trans. Inf. Technol. Biomed. 14 (2) (2010) 186–197.
[20] H. Bo, C. Xu, et al., Emotion recognition based on representation dissimilarity matrix, in: 2022 IEEE International Conference on Multimedia and Expo Workshops, ICMEW, 2022, pp. 1–6.
[21] N. Jadhav, R. Manthalkar, Y. Joshi, Effect of meditation on emotional response: An EEG-based study, Biomed. Signal Process. Control 34 (2017) 101–113.
[22] R.M. Mehmood, B. Muhammad, et al., EEG-based affective state recognition from human brain signals by using Hjorth-activity, Measurement 202 (2022) 111738.
[23] M. Alsolamy, A. Fattouh, Emotion estimation from EEG signals during listening to Quran using PSD features, in: 2016 7th International Conference on Computer Science and Information Technology, CSIT, 2016, pp. 1–5.
[24] K.S. Kamble, J. Sengupta, Ensemble machine learning-based affective computing for emotion recognition using dual-decomposed EEG signals, IEEE Sens. J. 22 (3) (2022) 2496–2507.
[25] P. Wagh Kalyani, K. Vasanth, Performance evaluation of multi-channel electroencephalogram signal (EEG) based time frequency analysis for human emotion recognition, Biomed. Signal Process. Control 78 (2022) 103966.
[26] S. Li, X. Lyu, et al., Identification of emotion using electroencephalogram by tunable Q-factor wavelet transform and binary gray wolf optimization, Front. Comput. Neurosci. 15 (2021).
[27] A. Subasi, T. Tuncer, et al., EEG-based emotion recognition using tunable Q wavelet transform and rotation forest ensemble classifier, Biomed. Signal Process. Control 68 (2021) 102648.
[28] S.K. Khare, V. Bajaj, G.R. Sinha, Adaptive tunable Q wavelet transform-based emotion identification, IEEE Trans. Instrum. Meas. 69 (12) (2020) 9609–9617.
[29] C. Wei, L. Chen, et al., EEG-based emotion recognition using simple recurrent units network and ensemble learning, Biomed. Signal Process. Control 58 (2020) 101756.
[30] D.S. Naser, G. Saha, Recognition of emotions induced by music videos using DT-CWPT, in: 2013 Indian Conference on Medical Informatics and Telemedicine, ICMIT, 2013, pp. 53–57.
[31] R. Duan, J. Zhu, B. Lu, Differential entropy feature for EEG-based emotion classification, in: 2013 6th International IEEE/EMBS Conference on Neural Engineering, NER, 2013, pp. 81–84.
[32] J. Xiang, C. Rui, L. Li, Emotion recognition based on the sample entropy of EEG, in: Proceedings of the 2nd International Conference on Biomedical Engineering and Biotechnology, 2014, pp. 1185–1192.
[33] Y. Shi, X. Zheng, T. Li, Unconscious emotion recognition based on multi-scale sample entropy, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM, 2018, pp. 1221–1226.
[34] E.S. Pane, A.D. Wibawa, M.H. Purnomo, Improving the accuracy of EEG emotion recognition by combining valence lateralization and ensemble learning with tuning parameters, Cogn. Process. 20 (2019) 405–417.
[35] T. Chen, S. Ju, et al., Emotion recognition using empirical mode decomposition and approximation entropy, Comput. Electr. Eng. 72 (2018) 383–392.
[36] T. Chen, S. Ju, et al., EEG emotion recognition model based on the LIBSVM classifier, Measurement 164 (2020) 108047.
[37] W. Jiang, G. Liu, et al., Cross-subject emotion recognition with a decision tree classifier based on sequential backward selection, in: 2019 11th International Conference on Intelligent Human–Machine Systems and Cybernetics, IHMSC, 2019, pp. 309–313.
[38] W. Zheng, B. Lu, Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks, IEEE Trans. Auton. Ment. Dev. 7 (3) (2015) 162–175.
[39] D. Maheshwari, S.K. Ghosh, et al., Automated accurate emotion recognition system using rhythm-specific deep convolutional neural network technique with multi-channel EEG signals, Comput. Biol. Med. 134 (2021) 104428.
[40] H. Cui, A. Liu, et al., EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network, Knowl.-Based Syst. 205 (2020) 106243.
[41] X. Xing, Z. Li, et al., SAE+LSTM: A new framework for emotion recognition from multi-channel EEG, Front. Neurorobot. 13 (2019).
[42] A. Iyer, S.S. Das, et al., CNN and LSTM based ensemble learning for human emotion recognition using EEG recordings, Multimodal Interact. IoT Appl. (2022), http://dx.doi.org/10.1007/s11042-022-12310-7.
[43] X. Li, D. Song, et al., Emotion recognition from multi-channel EEG data through Convolutional Recurrent Neural Network, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine, BIBM, 2016, pp. 352–359.
[44] Y. Zhang, J. Chen, et al., An investigation of deep learning models for EEG-based emotion recognition, Front. Neurosci. 14 (2020).
[45] L. Pessoa, A network model of the emotional brain, Trends Cogn. Sci. 21 (5) (2017) 357–371.
[46] T. Song, W. Zheng, et al., EEG emotion recognition using dynamical graph convolutional neural networks, IEEE Trans. Affect. Comput. 11 (3) (2020) 532–541.
[47] P. Zhong, D. Wang, C. Miao, EEG-based emotion recognition using regularized graph neural networks, IEEE Trans. Affect. Comput. 13 (3) (2022) 1290–1301.
[48] T. Zhang, X. Wang, et al., GCB-net: Graph convolutional broad network and its application in emotion recognition, IEEE Trans. Affect. Comput. 13 (1) (2022) 379–388.
[49] Y. Yang, Q. Wu, et al., Continuous convolutional neural network with 3D input for EEG-based emotion recognition, in: 2018 International Conference on Neural Information Processing, 2018, http://dx.doi.org/10.1007/978-3-030-04239-4_39.
[50] J. Li, Z. Zhang, et al., Hierarchical convolutional neural networks for EEG-based emotion recognition, Cogn. Comput. 10 (2018) 368–380.
[51] Y. Yang, Q. Wu, et al., Emotion recognition from multi-channel EEG through parallel convolutional recurrent neural network, in: 2018 International Joint Conference on Neural Networks, IJCNN, 2018.
[52] A. Vaswani, N. Shazeer, et al., Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, 2017, pp. 6000–6010.
[53] W. Zheng, W. Liu, et al., EmotionMeter: A multimodal framework for recognizing human emotions, IEEE Trans. Cybern. 49 (3) (2019) 1110–1122.
[54] L. Shi, Y. Jiao, B. Lu, Differential entropy feature for EEG-based vigilance estimation, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC, 2013, pp. 6627–6630, http://dx.doi.org/10.1109/EMBC.2013.6611075.
[55] A.G. Roy, N. Navab, C. Wachinger, Concurrent spatial and channel 'squeeze & excitation' in fully convolutional networks, in: 2018 International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 11070, 2018, http://dx.doi.org/10.1007/978-3-030-00928-1_48.
[56] S.J. Reznik, J.J.B. Allen, Frontal asymmetry as a mediator and moderator of emotion: An updated review, Psychophysiology 55 (1) (2018) e12965.
[57] D. Seo, C.A. Olman, Neural correlates of preparatory and regulatory control over positive and negative emotion, Soc. Cogn. Affect. Neurosci. 9 (4) (2014) 494–504.
[58] K. Yang, L. Tong, High Gamma band EEG closely related to emotion: Evidence from functional network, Front. Hum. Neurosci. 14 (2020).
[59] Y. Li, L. Wang, et al., A novel bi-hemispheric discrepancy model for EEG emotion recognition, IEEE Trans. Cogn. Dev. Syst. 13 (2) (2021) 354–367.
[60] F. Shen, G. Dai, et al., EEG-based emotion recognition using 4D convolutional recurrent neural network, Cogn. Neurodyn. 14 (2020) 815–828.
[61] Z. Jia, Y. Lin, et al., SST-EmotionNet: Spatial-spectral-temporal based attention 3D dense network for EEG emotion recognition, in: Proceedings of the 28th ACM International Conference on Multimedia, MM '20, Association for Computing Machinery, 2020, pp. 2909–2917, http://dx.doi.org/10.1145/3394171.3413724.
[62] J. Liu, Y. Zhao, et al., Positional-spectral-temporal attention in 3D convolutional neural networks for EEG emotion recognition, in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC, 2021, pp. 305–312.
[63] J. Liu, H. Wu, et al., Spatial–temporal transformers for EEG emotion recognition, 2021, preprint, http://dx.doi.org/10.48550/arXiv.2110.06553.
[64] Z. Wang, Z. Zhou, et al., JDAT: Joint-Dimension-Aware Transformer with Strong Flexibility for EEG Emotion Recognition, TechRxiv, 2021, preprint, http://dx.doi.org/10.36227/techrxiv.17056961.v1.
[65] G. Xiao, M. Shi, et al., 4D attention-based neural network for EEG emotion recognition, Cogn. Neurodyn. 16 (2022) 805–818.
[66] G. Bao, K. Yang, et al., Linking multi-layer dynamical GCN with style-based recalibration CNN for EEG-based emotion recognition, Front. Neurorobot. 16 (2022).
[67] M. Zhong, Q. Yang, et al., EEG emotion recognition based on TQWT-features and hybrid convolutional recurrent neural network, Biomed. Signal Process. Control 79 (2) (2023) 104211.