
Neuropsychologia 146 (2020) 107506


Emotion recognition with convolutional neural network and EEG-based EFDMs☆

Fei Wang a,*, Shichao Wu a, Weiwei Zhang a, Zongfeng Xu b, Yahui Zhang b, Chengdong Wu a, Sonya Coleman c

a Faculty of Robot Science and Engineering, Northeastern University, Shenyang, 110169, China
b College of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
c Intelligent Systems Research Centre, Ulster University, Londonderry, United Kingdom

ARTICLE INFO

Index Terms: Emotion recognition; Electroencephalogram; Convolutional neural network; Electrode-frequency distribution maps; Gradient-weighted class activation mapping

ABSTRACT

Electroencephalogram (EEG), as a direct response to brain activity, can be used to detect mental states and physical conditions. Among various EEG-based emotion recognition studies, because of the non-linear, non-stationary nature and the individual differences of EEG signals, traditional recognition methods still suffer from complicated feature extraction and low recognition rates. Thus, this paper first proposes a novel concept of electrode-frequency distribution maps (EFDMs) based on the short-time Fourier transform (STFT). A residual block based deep convolutional neural network (CNN) is proposed for automatic feature extraction and emotion classification with EFDMs. Aiming at the shortcomings of the small number of EEG samples and the challenge of individual differences in emotion, which make it difficult to construct a universal model, this paper proposes a cross-datasets emotion recognition method based on deep model transfer learning. Experiments were carried out on two publicly available datasets. The proposed method achieved an average classification score of 90.59% based on a short length of EEG data on SEED, which is 4.51% higher than the baseline method. Then, the pre-trained model was applied to DEAP through deep model transfer learning with a few samples, resulting in an average accuracy of 82.84%. Finally, this paper adopts gradient-weighted class activation mapping (Grad-CAM) to get a glimpse of what features the CNN has learned during training from EFDMs, and concludes that the high frequency bands are more favorable for emotion recognition.

1. Introduction

Human emotion plays an important role in the process of affective computing and human machine interaction (HMI) (Preethi et al., 2014). Moreover, many mental health issues are reported to be relevant to emotions, such as depression and attention deficit (Alkaysi et al., 2017), (Bocharov et al., 2017). Many kinds of information, such as posture, facial expression, speech, skin responses, brain waves and heart rate, are commonly used for emotion recognition (Liberati et al., 2015). There is some evidence that electroencephalogram (EEG) based methods are more reliable, demonstrating high accuracy and objective evaluation compared with other external features (Zheng et al., 2015). Although EEG has a poor spatial resolution and requires many sensors placed on the scalp, it provides an excellent temporal resolution, allowing researchers to study phase changes related to emotion. EEG is non-invasive, fast, and low-cost compared with other psychophysiological signals (Niemic, 2004). Various psychophysiological studies have demonstrated the relationship between human emotions and EEG signals (Sammler et al., 2007), (Mathersul et al., 2008), (Knyazev et al., 2010). With the wide implementation of machine learning methods in the field of emotion recognition, many remarkable results have been achieved. Sebe et al. summarized the studies of emotion recognition with a single modality and described the challenging problem of multimodal emotion recognition (Sebe et al., 2005). Alarcao et al. presented a comprehensive overview of the existing works on EEG emotion recognition in recent years (Alarcao and Fonseca, 2019). A number of EEG datasets have been built with various emotions, or scored in one continuous emotion space.

☆ This work was supported in part by the National Natural Science Foundation of China under Grant 61973065, the Fundamental Research Funds for the Central Universities of China under Grants N172608005 and N182612002, and the Liaoning Provincial Natural Science Foundation of China under Grant 20180520007.
* Corresponding author.
E-mail address: wangfei@mail.neu.edu.cn (F. Wang).
https://doi.org/10.1016/j.neuropsychologia.2020.107506
Received 25 February 2020; Received in revised form 23 May 2020; Accepted 26 May 2020
Available online 1 June 2020
0028-3932/© 2020 Elsevier Ltd. All rights reserved.

However, the problem of modeling and detecting human emotions has not been fully investigated (Mühl et al., 2014). EEG based emotion recognition is still very challenging because of the fuzzy boundaries between emotion categories as well as the differences in EEG signals across subjects.

Various feature extraction, selection and classification methods have been proposed for EEG based emotion recognition (Zhuang et al., 2017). Friston modeled the brain as a large number of interacting nonlinear dynamical systems and emphasized the labile nature of normal brain dynamics (Friston, 2001). Several studies have suggested that the human brain can be considered as a chaotic system, i.e., a nonlinear system that exhibits particular sensitivity to initial conditions (Ezzatdoost et al., 2020). The nonlinear interaction between brain regions may reflect the unstable nature of brain dynamics. Thus, for such unstable and nonlinear EEG signals, a nonlinear analysis method such as sample entropy (Jie et al., 2014) is more appropriate than linear methods, which ignore information associated with the nonlinear dynamics of the human brain. Time-frequency analysis methods are based on the spectrum of EEG signals. The power spectral density and differential entropy of sub-band EEG rhythms are commonly used as emotional features (Duan et al., 2013), (Ang et al., 2017). In the last decade, a large number of studies have demonstrated that the higher frequency rhythms such as beta and gamma outperform lower rhythms, i.e., delta and theta, for emotion recognition. Traditional recognition methods are mainly based on the combination of hand-crafted features and shallow models like k-nearest neighbor (KNN), support vector machines (SVM) and belief networks (BN) (Duan et al., 2012), (Sohaib et al., 2013), (Zubair and Yoon, 2018). However, EEG signals have a low signal-to-noise ratio (SNR) and are often mixed with noise generated in the process of data collection. Another much more challenging problem is that, unlike image or speech signals, EEG signals are temporally asymmetric and nonstationary, which creates significant difficulties for data preprocessing to obtain clean data for feature extraction. Nonstationarity means that the properties (mean, variance and covariance) of EEG signals vary with time partly or totally. Temporal asymmetry refers to the fact that the corresponding activation lobes and activation degrees differ under various cognitive activities. Palus has identified these two nonlinearity properties of EEG (Palus, 1996). Moreover, traditional manual feature extraction and selection methods are crucial to an affective model and require specific domain knowledge. The commonly used dimensionality reduction techniques for EEG signal analysis are principal component analysis (PCA) and Fisher projection. In general, the cost of these traditional feature selection methods increases quadratically with respect to the number of features included (Dash and Liu, 1997).

As a form of representation learning, deep learning can extract features automatically through model training (Zhang et al., 2018). Apart from its successful implementation in the image and speech domains, deep learning has been introduced to physiological signals, such as EEG emotion recognition, in recent years. Zheng et al. trained an efficient deep belief network (DBN) to classify three emotional states (negative, neutral, and positive) by extracting the differential entropy (DE) of different frequency bands, and achieved an average recognition accuracy of 86.65% (Zheng and Lu, 2015). As a typical deep neural network model, the convolutional neural network (CNN) has achieved great progress in computer vision, image processing and speech recognition (Hatcher and Yu, 2018). Yanagimoto et al. built a CNN to recognize the emotional valence of DEAP and analyze various emotions with EEG (Yanagimoto and Sugimoto, 2016). Wen et al. rearranged the original EEG signals through Pearson correlation coefficients and fed them into an end-to-end CNN based model to reduce the manual effort on features, which achieved an accuracy of 77.98% for valence and 72.98% for arousal on DEAP, respectively (Wen et al., 2017).

The commonly used feature extraction methods for EEG signals can be divided into time domain, frequency domain, and time-frequency domain methods (Wang, 2011), (Chuang et al., 2014), (Li et al., 2017). Frequency analysis transforms the EEG signals into the frequency domain for further feature extraction. Since many studies have demonstrated that frequency domain features have higher distinguishability, we first propose the novel concept of electrode-frequency distribution maps (EFDMs). Following the successful application of CNNs in speech recognition (Abdelhamid et al., 2014), we build a deep neural network for emotion recognition based on EFDMs. The EFDMs of EEG signals can be regarded as grayscale images. Therefore, with the proposed EFDMs, we realize the purpose of constructing an emotion recognition model based on a CNN.

At present, studies on EEG emotion recognition mainly focus on subject-dependent emotion recognition tasks. For engineering applications, it is obviously impossible to collect a huge amount of subjects' EEG signals in advance to build a universal emotion recognition model that identifies the emotions of every person. Therefore, how to realize subject-independent pattern classification is one tough issue in the practical application of emotion recognition. Traditional emotion recognition models are usually established for a specific task on a small dataset, so they often fail to achieve good results on new tasks, due to possible differences in stimulus paradigm, subjects and EEG acquisition equipment. In addition, the learning process of deep neural networks is vitally important and generally requires a large amount of labeled data, while the acquisition of EEG signals is not as easy as that of image, speech and text signals. Accordingly, how to achieve a highly effective classifier by training on a small number of labeled samples is another issue that needs to be considered. In this paper, transfer learning is employed to solve the problems highlighted above. Among various transfer learning methods, one is to reuse a pre-trained model from the source domain in the target domain, depending on the similarities of data, tasks and models between them (Pan and Yang, 2010). Transfer learning accelerates the training process by transferring the pre-trained model parameters to a new domain task. Since Yosinski et al. published an article on how to transfer features in deep neural networks, transfer learning has achieved rapid development in the field of image processing (Yosinski et al., 2014).

We first propose the novel concept of EFDMs based on multi-channel EEG signals. Then a CNN with four residual blocks is built for automatic feature extraction and emotion classification with EFDMs as input. We mainly set up two experiments in this paper. The first evaluates the effectiveness of the proposed method on SEED. The second, based on the deep model transfer learning strategy, applies the pre-trained CNN from the first experiment to DEAP for cross-datasets emotion recognition. Finally, we give more neuroscience interpretation by revealing the key EEG electrodes and frequency bands corresponding to each emotion category, based on the attention mechanism of the deep neural network and the proposed EFDMs.

2. Methods

In this section, we detail the general framework of the EFDMs based CNN for emotion recognition, including a short description of the short-time Fourier transform (STFT), the structure and key parameters of the proposed CNN, as well as a brief introduction to Grad-CAM.

2.1. Short-time Fourier Transform

The Fourier transform (FT) is often used to analyze the frequency features of time series. It provides the frequency information averaged over the entire signal time interval, and does not tell the time at which each frequency component appears. Therefore, the spectra of two signals with a large difference in the time domain may be the same in the frequency domain. That is to say, FT assumes that the time sequences are stationary, which is apparently a false hypothesis for EEG signals. To analyze such nonstationary signals, the time series should be cut into short segments; within each segment, the signal can be approximately considered stationary and used for FT. This idea is called the STFT. It is a sequence of Fourier transforms of a windowed signal, used to analyze how the frequency content of a nonstationary signal changes.


The STFT provides time-localized frequency information for situations in which the frequency components of a signal vary over time. The calculation of the STFT is defined as:

X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, \omega(t - \tau)\, e^{-j\omega t}\, dt \qquad (1)

where x(t) represents the original signal and \omega(t) indicates the window function, such as the Hanning window shown in (2). The Hanning window is a linear combination of modulated rectangular windows, and usually appears in applications that require low aliasing and little spectral leakage:

w(n) = \frac{1}{2}\left[1 - \cos\left(\frac{2\pi n}{N - 1}\right)\right] \qquad (2)

in which n is the sample index and N is the window length in samples.

As for discrete time series, the data can be broken up into segments. Each segment is Fourier transformed, and the complex result is added to a matrix, which records the magnitude and phase for each point in time and frequency. The calculation of the STFT for a discrete time series can be expressed as:

X(m, \omega) = \sum_{n = -\infty}^{\infty} x[n]\, \omega[n - m]\, e^{-j\omega n} \qquad (3)

where x[n] is a time series and \omega[n] is the window function. With a normalization of X(m, \omega), we get the corresponding EFDMs.

Fig. 1. The proposed residual block based CNN for EEG emotion recognition.


The residual block based CNN can effectively alleviate the problems of gradient vanishing and gradient explosion through the shortcut connections between layers. The network, embedded with max pooling layers, has a certain translation and rotation invariance to the input. Moreover, since the emotion-related features of EEG signals are mainly reflected in the sub-frequency rhythms, the pooling operation in the frequency direction can make the neural network more effective at extracting emotion-relevant features from EFDMs. Finally, two fully connected layers are used for emotion classification based on the features extracted by the preceding convolution layers.

2.3. Grad-CAM

Gradient-weighted class activation mapping (Grad-CAM) is used to make CNN-based models more transparent by producing visual explanations (Selvaraju et al., 2020). It can be used to understand the importance of the input data with respect to a target class of interest. In order to obtain the class-discriminative localization map L_{Grad-CAM}^c for any class c, the gradient of the score y^c for class c is first computed with respect to the feature maps A^k of a convolutional layer. These gradients flowing back are global-average-pooled to obtain the neuron importance weights \alpha_k^c:

\alpha_k^c = \frac{1}{Z} \sum_{i \in w} \sum_{j \in h} \frac{\partial y^c}{\partial A_{ij}^k} \qquad (4)

in which Z represents the number of pixels in the feature map. By performing a weighted combination of the forward activation maps followed by a ReLU, the Grad-CAM map can be expressed as:

L_{Grad-CAM}^c = \mathrm{ReLU}\left( \sum_k \alpha_k^c A^k \right) \qquad (5)

The output L_{Grad-CAM}^c indicates which parts the proposed neural network has paid more attention to, and we denote these as attention heat maps. For each emotion, we use (6) to calculate the average heat map over all N samples to understand what differs when classifying different emotions:

L_{AVE} = \frac{1}{N} \sum L_{Grad-CAM}^c \qquad (6)
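As a concrete illustration of Eqs. (4)–(6), the sketch below computes Grad-CAM maps with forward and backward hooks. Targeting the last residual block, the hook-based plumbing and the bilinear upsampling to the EFDM size are implementation assumptions consistent with, but not taken from, the paper.

```python
import torch
import torch.nn.functional as F

class GradCAM:
    def __init__(self, model, target_layer):
        self.model, self.acts, self.grads = model, None, None
        target_layer.register_forward_hook(lambda m, i, o: setattr(self, "acts", o))
        target_layer.register_full_backward_hook(lambda m, gi, go: setattr(self, "grads", go[0]))

    def __call__(self, x, class_idx):
        self.model.zero_grad()
        score = self.model(x)[0, class_idx]        # y^c, the score for class c
        score.backward()
        # Eq. (4): alpha_k^c = global average pool of dy^c / dA^k
        alpha = self.grads.mean(dim=(2, 3), keepdim=True)
        # Eq. (5): L^c = ReLU(sum_k alpha_k^c * A^k)
        cam = F.relu((alpha * self.acts).sum(dim=1))
        # Upsample to the EFDM size so electrodes/frequencies can be read off
        cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:], mode="bilinear")
        return cam[0, 0].detach()

# Eq. (6): average heat map of one emotion over N samples
def average_heat_map(cam, samples, class_idx):
    maps = [cam(s.unsqueeze(0), class_idx) for s in samples]
    return torch.stack(maps).mean(dim=0)
```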
3. Dataset description and analysis

In this section, we describe two EEG emotion recognition datasets, i.e., SEED and DEAP. Then some data preprocessing methods are presented to prepare samples for cross-datasets emotion recognition. Finally, the data distribution between different subjects is analyzed.

3.1. SEED dataset description

The SEED dataset contains three categories of emotions, i.e., negative, neutral, and positive. Fifteen subjects (7 males and 8 females) participated in the experiments. EEG signals were recorded using an ESI NeuroScan System at a sampling rate of 1000 Hz from a 62-channel active AgCl electrode cap according to the international 10–20 system while the subjects were watching emotional film clips. There are 15 trials (film clip watching tests) in one experiment. Each subject participated in the experiment 3 times at an interval of one week or longer.

For EEG signal processing, the raw EEG data were first down-sampled to 200 Hz. In order to filter the noise and remove most artifacts, a bandpass filter of 0.5 Hz–70 Hz was applied (Zheng et al., 2019).
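The two preprocessing steps just described can be reproduced in a few lines of SciPy. This is a sketch under assumptions: the paper specifies only the target rate and the 0.5 Hz–70 Hz pass band, so the Butterworth design, filter order and zero-phase filtering are illustrative choices.

```python
from scipy.signal import butter, decimate, filtfilt

def preprocess_seed(raw, fs_in=1000, fs_out=200, band=(0.5, 70.0), order=4):
    """raw: (n_channels, n_samples) EEG array at fs_in Hz."""
    x = decimate(raw, fs_in // fs_out, axis=1)                   # 1000 Hz -> 200 Hz
    b, a = butter(order, [band[0], band[1]], btype="bandpass", fs=fs_out)
    return filtfilt(b, a, x, axis=1)                             # zero-phase 0.5-70 Hz filter
```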
3.2. DEAP dataset description

DEAP is a multimodal dataset consisting of EEG recordings collected while subjects watched selected video clips, built to analyze human affective states. The EEG and peripheral physiological signals of 32 participants were recorded using a Biosemi ActiveTwo system as each watched 40 one-minute long excerpts of music videos. The experiments were performed in two laboratory environments with controlled illumination. The EEG signals were recorded at a sampling rate of 512 Hz from 32 active electrodes according to the international 10–20 system. Each participant assessed their levels of arousal, valence, dominance and liking using self-assessment manikins (SAM), selecting a number from 1–9 for the emotional state of each clip. The arousal scale extends from passive to active, and valence ranges from negative to positive.

Some preprocessing operations have been applied to the raw data: the recordings were down-sampled to 128 Hz, EOG artifacts were removed, a bandpass filter of 4 Hz–45 Hz was applied, and the data were averaged to the common reference. After that, the data were segmented into 60 s trials and a 3 s pre-trial baseline was removed (Koelstra et al., 2012).

3.3. Data preprocessing

For the SEED dataset, since the length of the EEG signals acquired under various stimuli differs greatly, we first count the lengths of all EEG trials; the EEG signals are then truncated (taking the first 37,000 sampling points for subsequent analysis) to ensure that every kind of emotion has the same number of samples.

There are big differences in the experimental protocol, the composition of subjects, and the configuration of the signal acquisition system between the DEAP and SEED datasets. In order to ensure that the classification tasks using SEED and DEAP are similar, the emotional space of DEAP is divided into discrete parts according to the valence score, similar to the approach in (Lan et al., 2019). Samples with a valence score greater than 7 were classified as positive, samples with scores less than or equal to 7 and greater than 3 were classified as neutral, and samples with scores of no more than 3 were treated as negative. Based on this classification criterion, the number of subjects with each type of emotion was counted for each film stimulus, and the frequency results are shown in Fig. 2. The horizontal axis represents the film clip index from 1 to 40, and the vertical axis represents the number of subjects with one specific emotion corresponding to each stimulus. We then look for the trials with the most participants for whom positive, neutral and negative emotion, respectively, was successfully induced. These trials are: #18 for positive emotion, #16 for neutral emotion, and #38 for negative emotion, with 27, 28 and 19 subjects respectively. Fourteen subjects in DEAP (numbered 2, 5, 10, 11, 12, 13, 14, 15, 19, 22, 24, 26, 28 and 31) for whom all three types of emotions were successfully induced under these three trials (#18, #16 and #38) were selected for subsequent experiments.

After that, the EEG signals in each channel are divided into a number of samples with a 1 s long non-overlapping Hanning window in both datasets. Hence, we obtain 185 samples per trial corresponding to one film clip, and 41,625 samples are obtained under different emotions in SEED. For DEAP, since each trial lasts for 63 s and the first 3 s are baseline recording without emotion elicitation, we only use the segment from the 4th second to the end. Thus, 60 samples were obtained in each trial, and the total number of samples is 2520. Finally, the Fourier transform and normalization are performed on the samples to get the EFDMs.
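The discretization rule and subject selection described above amount to a simple mapping, sketched below; the class encoding and the dictionary layout are assumptions, while the thresholds and trial numbers come from the text.

```python
def valence_to_class(score):
    """Map a DEAP valence rating (1-9) to the three SEED-style classes."""
    if score > 7:
        return 2   # positive
    if score > 3:
        return 1   # neutral
    return 0       # negative

# Selected stimuli: trial #18 (positive), #16 (neutral), #38 (negative)
SELECTED_TRIALS = {18: 2, 16: 1, 38: 0}

def keep_subject(ratings):
    """ratings: dict trial_index -> valence score for one subject.
    Keep the subject only if all three selected trials elicited the intended emotion."""
    return all(valence_to_class(ratings[t]) == c for t, c in SELECTED_TRIALS.items())
```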
3.4. EFDMs

The Fourier transform is applied to each EEG channel of all samples obtained above, and the transformed results are then normalized to produce input data suitable for the CNN. The normalized results in two dimensions are known as EFDMs. The EFDMs of the EEG signals can be represented as grayscale images, with the normalized values playing the role of gray pixel values. Therefore, we can build a CNN for EEG-based emotion recognition with EFDMs. Fig. 3 shows the EFDMs under different emotions.


Fig. 2. Discretization of the DEAP emotion space by valence score.

3.5. Data distribution analysis

In order to illustrate the difference in EEG signals between different subjects, we use the SEED dataset as an example and randomly select 50 samples from five subjects under three emotional states for analysis. Firstly, the DE of five sub-band EEG signals in all channels is extracted, and the feature vectors are formed. Then PCA is used to reduce the dimensionality of the features, and the two components with the largest eigenvalues are retained for data distribution analysis.

As can be seen from Fig. 4, the data distribution among different subjects is quite different, which does not satisfy the independent and identically distributed assumption between training and test samples in traditional machine learning. In addition, the feature differences among the three kinds of emotions of the same subject are not obvious. Therefore, in this case, traditional machine learning methods often fail to achieve good recognition results. The recently proposed transfer learning is specifically designed to solve this problem. Such methods are usually carried out within one dataset, which has some parts that are similar among different subjects, such as the EEG signal acquisition equipment and the experimental process; this is helpful for knowledge transfer from the source to the target domain. However, in the cross-dataset emotion recognition task, the differences introduced by different signal acquisition equipment and experimental environments also need to be considered. Therefore, it is more difficult to realize transfer learning of an emotion recognition model across datasets.

Fig. 3. EFDMs under different emotions.
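A minimal sketch of this visualization pipeline, assuming the common Gaussian closed-form estimate of differential entropy (0.5·log(2πe·σ²)) on standard sub-band definitions; the exact band edges and the scikit-learn PCA call are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import PCA

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 50)}

def differential_entropy(x):
    # Gaussian closed form: DE = 0.5 * log(2 * pi * e * var)
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x, axis=-1))

def de_features(segment, fs=200):
    """segment: (n_channels, n_samples) -> DE feature vector of length 5 * n_channels."""
    feats = []
    for lo, hi in BANDS.values():
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        feats.append(differential_entropy(filtfilt(b, a, segment, axis=1)))
    return np.concatenate(feats)

# features: (n_samples, 5 * 62) matrix of DE vectors from the selected subjects;
# the two principal components with the largest eigenvalues give Fig. 4:
# coords = PCA(n_components=2).fit_transform(features)
```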


Fig. 4. Two-dimensional visualization of features selected from different subjects in SEED.

4. Experiments and results analysis


We set up two experiments. First, the effectiveness of the proposed method for EEG-based emotion recognition is verified using SEED. Then, based on the deep neural network transfer learning strategy, the pre-trained model is applied to DEAP with 12 training samples of each emotion class.

Table 1
Some notable works on the SEED dataset.

Method              | Feature           | Classifier                                 | Signal                  | Accuracy (%)
Zheng and Lu (2015) | DE                | DBN                                        | EEG (1s)                | 86.08
Lu et al. (2015)    | DE (EEG)          | Fuzzy integral fusion strategy             | EEG (4s) + Eye movement | 87.59
Liu et al. (2016)   | DE                | BDAE + SVM                                 | EEG (4s) + Eye movement | 91.01
Yang et al. (2017)  | DE                | Hierarchical network with subnetwork nodes | EEG (4s) + Eye movement | 91.51
Tang et al. (2017)  | PSD, DE, Mean, SD | Bimodal-LSTM                               | EEG (4s) + Eye movement | 93.97
Li et al. (2018)    | DE                | BiDANN                                     | EEG (9s)                | 92.38
Song et al. (2018)  | DE                | DGCNN                                      | EEG (1s)                | 90.40
Ours                | EFDMs             | CNN                                        | EEG (1s)                | 90.59

4.1. SEED based emotion recognition

Over the past few years, many scholars have conducted notable research on EEG based emotion recognition with SEED. To compare the proposed approach with (Zheng and Lu, 2015), (Lu et al., 2015), (Liu et al., 2016), (Yang et al., 2017), in this experiment we strictly follow the protocol of Zheng et al. (Zheng and Lu, 2015). Specifically, of the 15 trials of EEG data associated with one session of one subject, the first 9 trials are used as the training set and the remaining 6 as the testing set. Then, the recognition accuracy corresponding to each period is obtained for each subject. Finally, the average classification accuracy over three sessions for all 15 subjects is calculated.

The training and testing processes are implemented using the PyTorch framework with the Adam algorithm as the optimizer; the learning rate is set to 0.0001, and the loss function is the cross entropy loss.
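The stated optimization setup maps onto a standard PyTorch loop such as the sketch below; the optimizer, learning rate and loss are taken from the text, while the epoch count, batch size and data layout are assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_on_seed(model, efdms, labels, epochs=50, batch_size=64, device="cuda"):
    """efdms: float tensor (N, 1, n_channels, n_freq); labels: long tensor (N,)."""
    loader = DataLoader(TensorDataset(efdms, labels), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # learning rate from the paper
    criterion = torch.nn.CrossEntropyLoss()                     # cross entropy loss
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```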
We compared the proposed model with other state-of-the-art approaches (Tang et al., 2017), (Li et al., 2018), (Song et al., 2018) and the baseline method, which uses DBN directly as the classifier. As shown in Table 1, Bimodal-LSTM achieved the best accuracy (93.97%) among (Lu et al., 2015), (Liu et al., 2016), (Yang et al., 2017), (Tang et al., 2017) with 4 s of EEG as well as eye movement information. Based on EEG alone, BiDANN (Li et al., 2018) obtained the best recognition rate of 92.38% with 9 s of EEG. The result of the proposed model based on EFDMs and CNN is 90.59%, which is 4.51% higher than the baseline result with differential entropy and DBN. Compared with the other methods, the 1 s data samples used in this paper are shorter, and the process of producing EFDMs through STFT is simpler than that of DE. That is to say, the EEG based emotion recognition method combining EFDMs and CNN is effective.

To see the results of recognizing each emotion, we depict the confusion matrix corresponding to the experiments using SEED, as shown in Fig. 5. Each row of the confusion matrix represents the target class and each column represents the predicted class that the classifier outputs. The element (i, j) is the percentage of samples in class i that were classified as class j. From the results we can see that, in general, positive emotion can be recognized with high accuracy (93%), while negative emotion is more difficult to recognize and is very easily confused with neutral emotion.

4.2. DEAP based emotion recognition

The goal of machine learning is to build a model that is as general as possible to meet the requirements of different user groups and different environments. However, such an ideal model often fails to meet the expected requirements in practical applications. Therefore, how to establish a universal model to tackle the possible differences between

subjects and signal acquisition devices under different classification tasks, as well as realizing few-shot learning, is a problem that needs to be taken into consideration in a CNN-based emotion recognition system. Various studies on CNNs have shown that shallow convolution layers are designed to extract common basic features from the input, while deeper convolution layers can extract more abstract, task related features. Therefore, it is possible to get an accurate classification result based on partial fine-tuning of the pre-trained CNN with a few training samples. Generally, the accuracy is positively correlated with the number of fine-tuned layers. To this end, through two deep neural network transfer learning strategies, i.e., fine-tuning just the fully connected layers or fine-tuning all layers, the CNN pre-trained on SEED is adapted to another emotion recognition task based on DEAP.

In order to produce EFDMs with the same attributes (including channel order, frequency range and size) for deep model transfer learning between the two datasets, we take the following preprocessing steps. For SEED, the 32 EEG channels (Fp1, Fp2, AF3, AF4, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO3, PO4, O1, Oz, O2) shared with DEAP are used, and the first 64 frequency points are selected to build EFDMs with a size of 32 × 64. For DEAP, the EEG channels are rearranged according to the electrode order presented above to ensure that they are consistent with SEED.

Based on a review of relevant works, we found that some scholars have conducted research on emotion recognition with transfer learning across two datasets (Lan et al., 2019). However, there are some differences in research focus. The main focus of this paper is to realize emotion recognition based on a deep model transfer strategy with a few training samples, while the latter aims to use a domain adaptation method to transfer the classification knowledge learned on SEED to DEAP. There are also differences in the experimental settings. In this paper, a small amount of data from the target subject is used for training, while the latter uses the leave-one-subject-out cross-validation strategy for classification on DEAP (the data of each session in SEED were used as source samples, and each subject in DEAP was set as a target for testing). The recognition results on the DEAP dataset with domain adaptation from (Lan et al., 2019) are shown in Table 2.

Table 2
Cross-datasets emotion recognition results with the leave-one-subject-out cross-validation strategy, mean (standard deviation) in %.

Method   | SEED I→DEAP   | SEED II→DEAP  | SEED III→DEAP
Baseline | 34.57 (7.98)  | 32.99 (3.44)  | 32.51 (6.73)
MIDA     | 40.34 (14.72) | 39.90 (14.83) | 37.46 (13.11)
TCA      | 42.60 (14.69) | 42.40 (14.56) | 39.76 (15.15)
SA       | 36.73 (10.69) | 37.36 (7.90)  | 37.27 (10.05)
ITL      | 34.50 (13.17) | 34.10 (9.29)  | 33.62 (10.53)
GFK      | 41.91 (11.33) | 40.08 (11.53) | 39.53 (11.31)
KPCA     | 35.60 (6.97)  | 34.69 (4.34)  | 35.11 (10.05)

It can be seen from the table that transfer component analysis (TCA) achieved the best recognition accuracy under the three experimental settings. However, the recognition accuracy of all domain adaptation methods is very low (no more than 43%), and the recognition result for information theoretical learning (ITL) is even lower than that of the baseline method, which did not adopt transfer learning.

From here on, we carry out deep model transfer learning based on a small number of training samples. For DEAP, we randomly divide the samples of each subject into a training and a testing dataset with a training-versus-testing ratio of 1:4 (i.e., 20% for training and 80% for testing; the training sample size for each emotion is 12). Then, the two deep model transfer learning strategies are used to fine-tune the pre-trained CNN with the training dataset, after which the network is tested on the testing samples. During model fine-tuning, Adam is used as the optimizer with a learning rate of 0.00002 and cross-entropy as the loss function. The recognition accuracy of the proposed method on the DEAP dataset based on a few training samples is shown in Fig. 6.

Fig. 5. Confusion matrix on SEED.
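The two strategies can be expressed by toggling which parameters receive gradients, as in this sketch (written against the hypothetical EFDMNet above); only the Adam optimizer and the 0.00002 learning rate come from the text.

```python
import torch

def prepare_for_transfer(model, strategy="fc"):
    """strategy='fc': freeze the convolutional feature extractor and tune only
    the fully connected head; strategy='all': fine-tune every layer."""
    if strategy == "fc":
        for p in model.features.parameters():
            p.requires_grad = False
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(params, lr=2e-5)   # fine-tuning learning rate from the paper

# Usage: pre-trained on SEED, then 12 samples per emotion from one DEAP subject
# optimizer = prepare_for_transfer(pretrained_model, strategy="all")
```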
The average recognition accuracies and standard deviations of the baseline, fine-tune fc layers and fine-tune all layers are 32.94% (3.80), 70.14% (9.81), and 82.84% (10.74), respectively. The baseline indicates that the pre-trained model from SEED is directly applied to DEAP without using a transfer learning strategy. It can be seen that the recognition accuracy of every subject with this method is very low (less than 40%), which means that the data distribution of DEAP is quite different from that of SEED. The data distribution between these two datasets does not satisfy the independent and identically distributed assumption of traditional machine learning. It is worth noting that the baseline result is similar to the baseline recognition accuracy in (Lan et al., 2019).

Fine-tune fc layers and fine-tune all layers represent the recognition results obtained by using 12 training samples per emotion to fine-tune the pre-trained CNN on the fully connected layers or on all layers, respectively. For each subject, the recognition results of these two methods changed synchronously. Compared with the baseline results, both transfer learning strategies improved the recognition accuracy significantly for every subject. The method of fine-tuning all layers achieved the best classification results among the selected subjects, with an average accuracy of 82.84%; the highest result is 96.60% on subject 11, while lower results of 61.18% and 63.13% were obtained for subjects 7 and 9, respectively. The best recognition accuracy using fine-tune fc layers is 84.86%, on subject 11. However, its performance on subjects 5, 7, 9, and 14 is poor; all of them are lower than 60%.

The confusion matrix of the proposed method is shown in Fig. 7. As can be seen from the figure, the baseline method classified almost all samples (about 84%) as neutral emotion; only a small number of samples are recognized as negative, while even fewer samples are classified as positive (less than 1%). This shows that there is a big difference in the EFDMs between the two datasets, and the pre-trained CNN learned from SEED cannot be directly used for DEAP. With the proposed deep model transfer learning strategy of fine-tuning the fully connected layers of the pre-trained CNN, the classification accuracy is greatly improved. The best result is obtained for neutral emotion recognition (77%), followed by positive, while the result of negative emotion recognition is not as good as the former (62%). With the method of fine-tuning all layers, the recognition results are further enhanced: positive emotion recognition is the best (86%), neutral is second, and the accuracy of negative emotion recognition reaches 79%. It is worth noting that the emotions with the best classification accuracy for fine-tune fc layers and fine-tune all layers are neutral and positive, respectively. That is to say, fine-tuning the weights of the convolution layers in the pre-trained CNN has helped the network learn more emotion related features in DEAP. The average recognition accuracy for fine-tuning all layers is 82.84%.

Fig. 6. The recognition accuracy of our proposed method on DEAP.

Fig. 7. The confusion matrix on DEAP.

Fig. 8. Classification accuracy with varying number of training samples.


We set up six comparative experiments to illustrate the effectiveness of the proposed methods for EEG based emotion recognition, including one-shot learning (taking one sample from each emotion to form the training set and using the remaining samples for testing), as well as experiments with training data proportions of 0.05, 0.1, 0.3, 0.4 and 0.5, respectively. For the training dataset, we randomly selected a number of samples from all types of emotions according to the training data proportion, and the remaining samples were used to form the testing dataset. In order to avoid the problem that a small number of randomly selected training samples may not be representative enough, the comparative experiment under every experimental setting was repeated five times, and the average value was used as the final result. The average recognition accuracy and standard deviation of the proposed methods with different numbers of training samples are shown in Fig. 8 (the evaluation protocol is sketched below).
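The comparative protocol can be summarized by the following sketch; the stratified split helper and the run_once callback are hypothetical stand-ins, while the six settings and the five repetitions follow the text.

```python
import numpy as np

def stratified_split(labels, proportion, rng):
    """Pick `proportion` of each emotion class for training ('one-shot' = 1 sample)."""
    train_idx = []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        k = 1 if proportion == "one-shot" else max(1, int(len(idx) * proportion))
        train_idx.extend(idx[:k])
    train = np.array(train_idx)
    test = np.setdiff1d(np.arange(len(labels)), train)
    return train, test

def evaluate_protocol(labels, run_once, settings=("one-shot", 0.05, 0.1, 0.3, 0.4, 0.5)):
    """run_once(train_idx, test_idx) -> accuracy; repeat each setting five times."""
    rng = np.random.default_rng(0)
    results = {}
    for s in settings:
        accs = [run_once(*stratified_split(labels, s, rng)) for _ in range(5)]
        results[s] = (np.mean(accs), np.std(accs))
    return results
```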
We can see that the accuracy trends of the two deep model transfer learning strategies are consistent: both increase with the number of training samples used. When the number of training samples from each emotion increased to 12 (a training data proportion of 0.25), the growth trend of both methods slowed significantly. Under each experimental setup, the result of fine-tune all layers is always better than that of fine-tune fc layers. This shows that there is a difference in EFDMs between SEED and DEAP, which needs to be adjusted through fine-tuning the weights of the convolution layers. Additionally, under the 'one-shot learning' experimental setup, which uses only one sample from each emotion for training, the accuracy of these two methods (fine-tune fc layers 45.50% (5.60), fine-tune all layers 50.02% (11.48)) is much higher than that of the baseline method (32.94% (3.80)). This also illustrates the effectiveness of the proposed methods for emotion recognition by fine-tuning the CNN with a few samples.

4.3. What did our network learn?

Existing CNN based EEG emotion recognition studies take the original EEG signals or time-frequency maps as the input. However, the original EEG signals cannot represent their frequency features, and the time-frequency maps cannot reflect the positional relationship between EEG channels. The EFDMs proposed in this paper can simultaneously express the frequency distribution as well as the EEG electrode position information. Based on the attention mechanism of the deep neural network, we adopt Grad-CAM to analyze what information the CNN has learned from EFDMs, and to investigate the key EEG electrodes as well as the frequency bands corresponding to each emotion category automatically and simultaneously.

Fig. 9 shows the 'attention maps' generated with Grad-CAM for different emotion categories. The brighter the color, the more important the information contained in that area is to emotion recognition; similarly, the darker the color, the less important the area. From the figure, we can see that the EEG channels and frequency bands that the CNN focused on are quite different. The average attention levels for all channels are shown in the right histograms, which represent the average of the Grad-CAM values across each channel. From these attention heat maps and histograms, we can find that there is a large similarity between negative and neutral emotion. That is why the network misclassified 6% of neutral emotions as negative (with 3% as positive), while the proportion of negative samples misclassified as neutral is 8% (with 4% as positive), as shown in Fig. 5.

From Fig. 9 (a), (d), the key frequency bands related to negative emotion recognition are mainly concentrated in 25–57 Hz, and the key channels are distributed around FC2 and FC6. From Fig. 9 (b), (e), the key EEG frequency bands and channels of neutral emotion are 27–55 Hz, and T8, CP5, CP1, CP2. Similarly, the critical information of positive emotion from Fig. 9 (c), (f) is 24–59 Hz, and FC2, FC6. Although the distributions of key frequency bands under the three emotions largely coincide, the key point of positive (29 Hz) is quite different from that of negative (44 Hz) and neutral (44 Hz).

Fig. 9. Heat maps and average attention level for every channel obtained through Grad-CAM on SEED. (a), (d) Negative. (b), (e) Neutral. (c), (f) Positive.


This means the high frequency feature components contain more distinguishing information for EEG based emotion recognition. In addition, the alpha band (8–13 Hz) of some channels is helpful for negative and positive emotion classification, but not for neutral emotion. We can draw the conclusion that the CNN pays more attention to the high frequency bands of the EEG signals (24–59 Hz), which is consistent with the conclusions in (Zheng et al., 2015), (Zheng and Lu, 2015), (Wang et al., 2014), (Zheng et al., 2019). Therefore, the CNN can be trained to automatically discover the EEG channels and features that are conducive to emotion recognition. It is worth noting that the range of key EEG channels and frequency bands obtained in this paper is a little wider than the true information, due to the influence of the two-dimensional convolution operation. That is to say, the real key EEG channels and frequency bands related to emotion recognition should be concentrated in the few channels and frequency bands with the highest brightness in the attention maps.

5. Conclusion

In this paper, we have provided a solution to tackle the challenge of individual differences in emotion with deep model transfer learning. It aims to build a robust emotion recognition model independent of the stimulus, subjects, EEG collection device, etc. We have mainly set up two experiments: within- and cross-datasets emotion recognition. First, the effectiveness of the proposed approach is validated on SEED with an average accuracy of 90.59%. After that, the pre-trained CNN from the first experiment is applied to DEAP with the deep model transfer learning method. Experiments show that when 12 training samples of each emotion are used for deep model fine-tuning, high accuracy can be achieved with a few samples. Finally, based on the attention mechanism of the deep neural network, we adopt Grad-CAM to analyze what information the CNN has learned from EFDMs, and obtain the key EEG electrodes and frequency bands corresponding to each emotion category automatically and simultaneously. The results show that the high frequency bands (24–59 Hz) are more helpful for emotion recognition. The key channels of neutral are T8, CP5, CP1, CP2, which are different from those of negative and positive (FC2, FC6).

From Table 1, we can see that the proposed approach has not achieved the best performance; this may be because the 1 s signal used is shorter than the 4 s and 9 s signals of other methods, or due to the lack of eye movement data. We will consider the issue of EEG data length as well as multimodal data fusion methods for emotion recognition in the future. Moreover, at present we have only studied the transfer learning method of fine-tuning deep neural networks to tackle the challenge of individual differences between subjects in the cross-datasets emotion recognition experiment. More and more advanced deep transfer learning methods have emerged recently; therefore, more attempts should be made with these algorithms. Furthermore, the source and target domains included in this paper share the same task. Concentrating on the EEG emotion recognition issue with insufficient samples and different source and target domains is another line of work worth studying.

CRediT authorship contribution statement

Shichao Wu: Methodology, Software, Writing - original draft, Writing - review & editing. Weiwei Zhang: Data curation, Validation. Zongfeng Xu: Resources. Yahui Zhang: Visualization. Chengdong Wu: Investigation. Sonya Coleman: Formal analysis.

REFERENCES

Abdelhamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D., 2014. Convolutional neural networks for speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 22 (10), 1533–1545.
Alarcao, S.M., Fonseca, M.J., 2019. Emotions recognition using EEG signals: a survey. IEEE Transactions on Affective Computing 10 (3), 374–393.
Alkaysi, A.M., Alani, A., Loo, C., Powell, T.Y., Martin, D., Breakspear, M., Boonstra, T.W., 2017. Predicting tDCS treatment outcomes of patients with major depressive disorder using automated EEG classification. Journal of Affective Disorders 208, 597–603.
Ang, A., Yeong, Y., Wee, W., 2017. Emotion classification from EEG signals using time-frequency-DWT features and ANN. Journal of Computer and Communications 5, 75–79.
Bocharov, A.V., Knyazev, G.G., Savostyanov, A.N., 2017. Depression and implicit emotion processing: an EEG study. Neurophysiologie Clinique-Clinical Neurophysiology 47 (3), 225–230.
Chuang, C., Ko, L., Lin, Y., Jung, T., Lin, C., 2014. Independent component ensemble of EEG for brain-computer interface. IEEE Transactions on Neural Systems and Rehabilitation Engineering 22 (2), 230–238.
Dash, M., Liu, H., 1997. Feature selection for classification. Intelligent Data Analysis 1, 131–156.
Duan, R., Wang, X., Lu, B., 2012. EEG-based emotion recognition in listening music by using support vector machine and linear dynamic system. In: International Conference on Neural Information Processing. Springer, Berlin, Heidelberg, pp. 468–475.
Duan, R., Zhu, J., Lu, B., 2013. Differential entropy feature for EEG-based emotion classification. In: International IEEE/EMBS Conference on Neural Engineering. IEEE, pp. 81–84.
Ezzatdoost, K., Hojjati, H., Aghajan, H., 2020. Decoding olfactory stimuli in EEG data using nonlinear features: a pilot study. Journal of Neuroscience Methods 341, 108780.
Friston, K.J., 2001. Book review: brain function, nonlinear coupling, and neuronal transients. The Neuroscientist 7 (5), 406–418.
Hatcher, W.G., Yu, W., 2018. A survey of deep learning: platforms, applications and emerging research trends. IEEE Access 6, 24411–24432.
Jie, X., Cao, R., Li, L., 2014. Emotion recognition based on the sample entropy of EEG. Biomedical Materials and Engineering 24 (1), 1185–1192.
Knyazev, G.G., Slobodskojplusnin, J.Y., Bocharov, A.V., 2010. Gender differences in implicit and explicit processing of emotional facial expressions as revealed by event-related theta synchronization. Emotion 10 (5), 678–687.
Koelstra, S., Muhl, C., Soleymani, M., Lee, J., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., Patras, I., 2012. DEAP: a database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing 3 (1), 18–31.
Lan, Z., Sourina, O., Wang, L., Scherer, R., Mullerputz, G., 2019. Domain adaptation techniques for EEG-based emotion recognition: a comparative study on two public datasets. IEEE Transactions on Cognitive and Developmental Systems 11 (1), 85–94.
Li, M., Chen, W., Zhang, T., 2017. Classification of epilepsy EEG signals using DWT-based envelope analysis and neural network ensemble. Biomedical Signal Processing and Control 31, 357–365.
Li, Y., Zheng, W., Cui, Z., Zhang, T., Zong, Y., 2018. A novel neural network model based on cerebral hemispheric asymmetry for EEG emotion recognition. In: International Joint Conference on Artificial Intelligence. Morgan Kaufmann, pp. 1561–1567.
Liberati, G., Federici, S., Pasqualotto, E., 2015. Extracting neurophysiological signals reflecting users' emotional and affective responses to BCI use: a systematic literature review. NeuroRehabilitation 37 (3), 341–358.
Liu, W., Zheng, W., Lu, B., 2016. Emotion recognition using multimodal deep learning. In: International Conference on Neural Information Processing. Springer, Cham, pp. 521–529.
Lu, Y., Zheng, W., Li, B., Lu, B., 2015. Combining eye movements and EEG to enhance emotion recognition. In: International Conference on Artificial Intelligence. Morgan Kaufmann, pp. 1170–1176.
Mathersul, D., Williams, L.M., Hopkinson, P.J., Kemp, A.H., 2008. Investigating models of affect: relationships among EEG alpha asymmetry, depression, and anxiety. Emotion 8 (4), 560–572.
Mühl, C., Allison, B., Nijholt, A., Chanel, G., 2014. A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges. Brain-Computer Interfaces 1 (2), 66–84.
Niemic, C., 2004. Studies of emotion: a theoretical and empirical review of psychophysiological studies of emotion. JUR: Journal of Undergraduate Research 1, 15–18.
Palus, M., 1996. Nonlinearity in normal human EEG: cycles, temporal asymmetry, nonstationarity and randomness, not chaos. Biological Cybernetics 75 (5), 389–396.
Pan, S.J., Yang, Q., 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), 1345–1359.
Preethi, J., Sreeshakthy, M., Dhilipan, A., 2014. A survey on EEG based emotion analysis using various feature extraction techniques. International Journal of Science, Engineering and Technology Research (IJSETR) 3 (11), 3113–3120.
Sammler, D., Grigutsch, M., Fritz, T., Koelsch, S., 2007. Music and emotion: electrophysiological correlates of the processing of pleasant and unpleasant music. Psychophysiology 44 (2), 293–304.
Sebe, N., Cohen, I., Gevers, T., Huang, T.S., 2005. Multimodal approaches for emotion recognition: a survey. Electronic Imaging, 56–67.
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2020. Grad-CAM: visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision 128 (2), 336–359.
Sohaib, A.T., Qureshi, S., Hagelbäck, J., Hilborn, O., Jercic, P., 2013. Evaluating classifiers for emotion recognition using EEG. In: International Conference on Augmented Cognition. Springer, Berlin, Heidelberg, pp. 492–501.
Song, T., Zheng, W., Song, P., Cui, Z., 2018. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Transactions on Affective Computing. https://doi.org/10.1109/TAFFC.2018.2817622.

Tang, H., Liu, W., Zheng, W., Lu, B., 2017. Multimodal emotion recognition using deep neural networks. In: International Conference on Neural Information Processing. Springer, Cham, pp. 811–819.
Wang, H., 2011. Optimizing spatial filters for single-trial EEG classification via a discriminant extension to CSP: the Fisher criterion. Medical & Biological Engineering & Computing 49 (9), 997–1001.
Wang, X., Nie, D., Lu, B., 2014. Emotional state classification from EEG data using machine learning approach. Neurocomputing 129, 94–106.
Wen, Z., Xu, R., Du, J., 2017. A novel convolutional neural networks for emotion recognition based on EEG signal. In: 2017 International Conference on Security, Pattern Analysis, and Cybernetics. IEEE, pp. 672–677.
Yanagimoto, M., Sugimoto, C., 2016. Convolutional neural networks using supervised pre-training for EEG-based emotion recognition. In: The 8th International Workshop on Biosignal Interpretation. IEEE, pp. 72–75.
Yang, Y., Wu, Q.M., Zheng, W., Lu, B., 2017. EEG-based emotion recognition using hierarchical network with subnetwork nodes. IEEE Transactions on Cognitive and Developmental Systems 10 (2), 408–419.
Yosinski, J., Clune, J., Bengio, Y., Lipson, H., 2014. How transferable are features in deep neural networks? In: Neural Information Processing Systems. MIT Press, pp. 3320–3328.
Zhang, Q., Yang, L.T., Chen, Z., Li, P., 2018. A survey on deep learning for big data. Information Fusion 42, 146–157.
Zheng, W., Liu, W., Lu, Y., Lu, B., Cichocki, A., 2019. EmotionMeter: a multimodal framework for recognizing human emotions. IEEE Transactions on Systems, Man, and Cybernetics 49 (3), 1110–1122.
Zheng, W., Lu, B., 2015. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development 7 (3), 162–175.
Zheng, W., Guo, H., Lu, B., 2015. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: International IEEE/EMBS Conference on Neural Engineering. IEEE, pp. 154–157.
Zheng, W., Zhu, J., Lu, B., 2019. Identifying stable patterns over time for emotion recognition from EEG. IEEE Transactions on Affective Computing 10, 417–429.
Zhuang, N., Zeng, Y., Tong, L., Zhang, C., Zhang, H., Yan, B., 2017. Emotion recognition from EEG signals using multidimensional information in EMD domain. BioMed Research International, 1–9.
Zubair, M., Yoon, C., 2018. EEG based classification of human emotions using discrete wavelet transform. In: IT Convergence and Security 2017. Springer, Singapore, pp. 21–28.

