
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18)

A Novel Neural Network Model Based on Cerebral Hemispheric Asymmetry for EEG Emotion Recognition

Yang Li1,2, Wenming Zheng1,*, Zhen Cui3, Tong Zhang1,2 and Yuan Zong1
1 Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, China
2 School of Information Science and Engineering, Southeast University, China
3 School of Computer Science and Engineering, Nanjing University of Science and Technology, China
wenming_zheng@seu.edu.cn
* Corresponding author

Abstract

In this paper, we propose a novel neural network model, called the bi-hemispheres domain adversarial neural network (BiDANN), for EEG emotion recognition. BiDANN is motivated by the neuroscience finding that the emotional brain is asymmetric between the left and right hemispheres. The basic idea of BiDANN is to map the EEG data of the left and right hemispheres into discriminative feature spaces separately, in which the data representations can be classified easily. To predict the class labels of testing data more precisely, we narrow the distribution shift between training and testing data by using a global and two local domain discriminators, which work adversarially against the classifier to encourage domain-invariant data representations to emerge. After that, the classifier learned from labeled training data can be applied naturally to unlabeled testing data. We conduct two experiments on the SEED database to verify the performance of our BiDANN model. The experimental results show that the proposed model achieves state-of-the-art performance.

1 Introduction

Emotion plays a crucial role in human learning and communication, and has been one of the most attractive topics in affective computing. Psychologists have conducted many studies on the definition, constitution, properties and functions of emotion [Izard, 1991; Storbeck and Clore, 2005]. However, emotion is still hard for machines to understand. Emotion recognition, as a popular topic, receives substantial attention in computer vision and pattern recognition research [Picard and Picard, 1997].

The responses of emotion can be facial expression, speech and other physiological signals such as skin conductance response, heart rate, blood pressure, cortisol level, electromyography and respiration rate. However, from a neuroscience point of view [Lotfi and Akbarzadeh-T, 2014], human emotion is closely related to a variety of brain subregions, such as the orbital frontal cortex, ventral medial prefrontal cortex and amygdala [Britton et al., 2006; Etkin et al., 2011]. It is therefore a direct means to study emotion by collecting brain activity signals under different moods. Electroencephalography (EEG) can measure the changes of this brain electrical activity: it places electrodes non-invasively on the participant's head, has high temporal resolution, and can directly reflect the underlying neural activity. Therefore, EEG signals can be used to decode emotions, and EEG, as a novel research means for emotion, advances the research of emotion recognition.

The EEG emotion recognition task can be roughly partitioned into two steps: feature extraction and classifier design. First, features are extracted from the time domain, frequency domain or time-frequency domain [Jenke et al., 2014]. Then a set of EEG feature vectors is chosen to train a classifier, and the remaining EEG data are tested on it. Many researchers have constructed models and introduced methods to deal with EEG emotion recognition tasks [Musha et al., 1997]. Kim et al. [Kim et al., 2013] reviewed the computational methods developed to deduce EEG indices of emotion, to extract emotion-related features, or to classify EEG signals into one of many emotional states. Jenke et al. [Jenke et al., 2014] conducted extensive experiments to compare existing features, using machine learning techniques for feature selection on a self-recorded data set. Li et al. [Li et al., 2016] proposed a novel regression model, called graph regularized sparse linear discriminant analysis (GraphSLDA), to deal with the EEG emotion recognition problem. Zheng [Zheng, 2017] proposed a group sparse canonical correlation analysis (GSCCA) method for simultaneous EEG channel selection and emotion recognition. Recently, deep learning methods have shown better performance than traditional methods on the EEG emotion recognition problem. For example, Zheng et al. [Zheng and Lu, 2015] used a Deep Belief Network (DBN) to extract high-level features of EEG emotion data.
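As a concrete illustration of the feature-extraction step in the pipeline above, the differential entropy (DE) feature used in the experiments later (Section 3.1) reduces, for an approximately Gaussian band-filtered signal, to 0.5 * log(2*pi*e*variance). The sketch below is ours, not the authors' code; the function name and the toy signals are illustrative assumptions.

```python
import math

def differential_entropy(samples):
    """DE of a band-pass-filtered EEG segment, assuming the samples
    are approximately Gaussian: 0.5 * log(2 * pi * e * variance)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Toy segments: DE grows with the signal's variance (band energy).
low_energy = [0.1 * math.sin(0.3 * t) for t in range(200)]
high_energy = [2.0 * math.sin(0.3 * t) for t in range(200)]
print(differential_entropy(low_energy) < differential_entropy(high_energy))  # True
```

Since the high-energy segment is the low-energy one scaled by 20, its variance is 400 times larger and its DE is exactly ln(20) larger, which matches the paper's remark that DE is equivalent to the logarithmic energy spectrum in a band.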
Although there have been many algorithms and models for the EEG emotion recognition problem, most of them do not consider the dependencies between training and testing data. Specifically, for traditional deep learning methods, the label information of testing data is predicted directly by the classifier trained on the training samples. The effectiveness of this learning strategy rests on the assumption that the distributions of training and testing data are similar. For EEG signals, however, the data distribution shift is tremendous across different people, or even for the same person under different circumstances. Thus, an EEG emotion recognition algorithm should make the data representations invariant to the source (training) or target (testing) domain by removing the domain identification information, so as to decrease this distribution shift. In other words, we should find a data representation space in which data from the source and target domains are as indistinguishable as possible, while preserving a low risk on the labeled source data.

On the other side, from the view of neuroscience, it is valuable to consider the nature and characteristics of the brain when constructing a model for EEG emotion recognition. In fact, although the anatomy of the human brain looks symmetric, the left and right hemispheres are not entirely symmetrical. Asymmetry, both in structure and function, exists throughout the neocortex and cortical substructures [Zatorre et al., 1992; Greve et al., 2013].

Recently, Ganin et al. [Ganin et al., 2016] proposed the Domain Adversarial Neural Network (DANN) to deal with domain adaptation problems. It adopts a domain discriminator to distinguish which domain an input comes from. Without any class information from the testing data, DANN modifies the data representation space to generate domain-invariant features. Benefiting from this, we can fit the asymmetry of the emotional brain into the DANN framework while also utilizing temporal information for EEG emotion recognition tasks.

Thus, in this paper, we propose a novel deep neural network model, called the bi-hemispheres domain adversarial neural network (BiDANN), which considers both the distribution shift between training and testing data and cerebral hemispheric asymmetry, to deal with two general EEG emotion recognition tasks. BiDANN learns information that is discriminative with regard to emotion but indiscriminative with regard to domain. This is achieved by jointly optimizing three modules:

(1) Feature extractors. Two feature extractors learn the internal information of each cerebral hemisphere separately, and map the original EEG data into a deep feature space with more discriminative emotional information.

(2) Classifier. It predicts the emotion class label by mapping the features into the label space.

(3) Domain discriminators. A global domain discriminator distinguishes which domain (training or testing data) an input comes from, so as to decrease the distribution shift, while two hemispheric local discriminators further narrow the left and right hemispheric data distributions, which are components of the entire cerebral EEG data, within the source and target domains separately. The global discriminator constrains the entire data distributions to be similar, while the local discriminators further constrain parts of the entire distribution, which is necessary because of the large difference between the two hemispheres. The parameters of the feature extractors are optimized to minimize the loss of the classifier but maximize the loss of the domain discriminators, which leads to adversarial learning between the classifier and the domain discriminators and encourages emotion-related but domain-invariant data representations to emerge.

Figure 1: The framework of BiDANN. The black lines and arrows refer to the source domain path, while the red lines and arrows refer to the target domain path.

The major contributions of this paper can be summarized as follows:

• This is the first work to consider the dependence between the left and right hemispheres in emotion recognition research, and to integrate this neuroscience finding of cerebral asymmetry into a deep learning model;

• Besides constraining the global distribution similarity between training and testing data, we consider the local distribution dependency between the left and right hemispheres.

2 The Proposed Model for EEG Emotion Recognition

In this section, we introduce the proposed BiDANN and then use it to deal with EEG emotion recognition.

2.1 The BiDANN Model

Fig. 1 shows the overall framework of BiDANN, which consists of two feature extractors (Feature Extractor-1 and Feature Extractor-2), a global discriminator (Discriminator), two local discriminators (Discriminator-1 and Discriminator-2), three gradient reversal layers (GRL), and a classifier. The two feature extractors capture the dynamic features of the two hemispheric EEG signals separately. The three discriminators are trained on a binary domain label set D = {0, 1}, in which the domain labels of source samples are set to 0 while the domain labels of target samples are set to 1. The global discriminator narrows the features' distribution gap between the source and target domains, while the two local discriminators complementarily eliminate the left and right hemispheric features' distribution differences between the source and target domains respectively. The GRL maximizes the loss of the discriminators by


leaving the input unchanged during forward propagation and reversing the gradient by multiplying it by a negative scalar during backpropagation [Ganin et al., 2016]. The classifier maps the emotion-related and domain-invariant features into the class label space to predict the class labels.

Overall, the complete objective function of the proposed BiDANN is as follows:

L(X_R; θ_f^l, θ_f^r, θ_c, θ_d^l, θ_d^r, θ_d) = L_c(X_S; θ_f^l, θ_f^r, θ_c) − L_d^l(X_R^l; θ_f^l, θ_d^l) − L_d^r(X_R^r; θ_f^r, θ_d^r) − L_d(X_R; θ_f^l, θ_f^r, θ_d).   (1)

Here X denotes an EEG sequence, R ∈ {S, T}, and S and T denote the source and target domains respectively. L_c, L_d^l, L_d^r and L_d are the loss functions of the classifier, the left and right hemispheric local discriminators, and the global discriminator; θ_c, θ_d^l, θ_d^r and θ_d are their corresponding parameters. θ_f^l and θ_f^r are the parameters of the left and right hemispheric feature extractors. Throughout, l and r denote the left and right hemispheres respectively.

The detailed operations of BiDANN on the left and right hemispheric EEG data consist of three aspects:

(1) Feature extractor for single-hemisphere data. Let X = {x_1, ..., x_t, ..., x_T} ∈ R^{d_x×T}, where d_x is the feature dimension and T is the length of the sequence. To make full use of the temporal dependence in the sequence, we construct a Long Short-Term Memory (LSTM) framework to learn the context information and transfer the input into another space with more effective, higher-level components. An input x_t ∈ R^{d_x×1} is encoded into a latent state h_t^f ∈ R^{d_h×1}, which lies in another space with more discriminability, where d_h is the dimension of the hidden states.

By applying the LSTM module recurrently, we obtain a sequence of hidden states that fully represents the input sequence. We then map the hidden-state sequence into a compressed sequence H^c = {h_1^c, ..., h_k^c, ..., h_K^c} ∈ R^{d_h×K} with a projection matrix G^c = [G_{ik}^c]_{T×K}, which scans the whole hidden sequence and represents it with simplified states that carry higher-level features, i.e.,

h_k^c = σ(Σ_{i=1}^{T} G_{ik}^c h_i^f + b^c),  k = 1, 2, ..., K,   (2)

where b^c ∈ R^{d_h×1} is a bias and K is the length of the compressed sequence. In this way we obtain dynamic features with more discriminability about emotions to represent the input states. The above feature extraction process can be written compactly as H^c = £(X).

For an EEG sequence X, we split it into left and right hemispheric data sequences, i.e., X = [X^l, X^r] = {[x_1^l, x_2^l, ..., x_t^l, ..., x_T^l], [x_1^r, x_2^r, ..., x_t^r, ..., x_T^r]}. The features of both hemispheres on the source (training) data X_S = [X_S^l, X_S^r] and target (testing) data X_T = [X_T^l, X_T^r] can be formulated as:

H_R^l = [H_S^l, H_T^l] = [h_1^{lS}, ..., h_K^{lS}, h_1^{lT}, ..., h_K^{lT}] = E_f(X_S^l, X_T^l; θ_f^l) = [£_1(X_S^l), £_1(X_T^l)],   (3)

H_R^r = [H_S^r, H_T^r] = [h_1^{rS}, ..., h_K^{rS}, h_1^{rT}, ..., h_K^{rT}] = E_f(X_S^r, X_T^r; θ_f^r) = [£_2(X_S^r), £_2(X_T^r)].   (4)

The complete feature extraction process is shown in Fig. 2.

Figure 2: The feature extractors of BiDANN.

(2) Local and global discriminators. We assign domain labels D_S = {0} (i = 1, 2, ..., N) to source samples and D_T = {1} (j = 1, 2, ..., M) to target samples, where N and M are the numbers of source and target domain samples. These labels are used to calculate the losses of the discriminators. The loss function of a discriminator can be denoted as:

L(G_d(E_f(X_R; θ_f); θ_d), D_R),   (5)

where L is a classification loss such as the cross-entropy loss, G_d is the domain label classifier and E_f is the feature extractor function. To make the feature distributions of the source and target domains coincide, the parameters of the feature extractors are updated so as to generate data representations that confuse the discriminator about which domain an input comes from, i.e., by maximizing the discriminator loss.

Furthermore, because of the special nature of human cerebral function, the distribution of the EEG signal on the left hemispheric channels differs from that on the right. Unlike traditional data such as images or audio, this gap cannot in practice be narrowed by the global discriminator alone, although it looks feasible in theory. Thus we use two hemispheric local domain discriminators to constrain the local data distribution similarity between the source and target domains. The experimental results show that operating on the split left and right cerebral hemispheric data is useful for achieving good performance. In summary, the loss functions of the local and global discriminators can be formulated as:

L_d^l(X_R^l; θ_f^l, θ_d^l) = L(G_d(E_f(X_R^l; θ_f^l); θ_d^l), D_R),   (6)

L_d^r(X_R^r; θ_f^r, θ_d^r) = L(G_d(E_f(X_R^r; θ_f^r); θ_d^r), D_R),   (7)

L_d(X_R; θ_f^l, θ_f^r, θ_d) = L(G_d(E_f(X_R; θ_f^l, θ_f^r); θ_d), D_R).   (8)


(3) Discriminative prediction. Like most supervised models, we add a supervision term to the network so as to enhance the model's discriminability. Concretely, we apply the softmax function to the transformed hidden states to predict the class labels, i.e.,

q_i = [(h_1^{lS})^T, ..., (h_K^{lS})^T, (h_1^{rS})^T, ..., (h_K^{rS})^T]^T,   (9)

P(y_i = c | q_i, G, b) = exp((G q_i + b)_c) / Σ_{c'} exp((G q_i + b)_{c'}),   (10)

ŷ_i = arg max_c P(y_i = c | q_i, G, b),   (11)

where q_i ∈ R^{2Kd_h×1}, the variables G ∈ R^{d_L×2Kd_h} and b ∈ R^{d_L×1} are respectively the transform matrix and bias, c indexes the classes, y_i is the ground-truth label of the i-th training sample, and d_L is the number of classes. The loss function of the class label prediction can be expressed as:

L_c(X_S; θ_f^l, θ_f^r, θ_c) = L(G_c(E_f(X_S; θ_f^l, θ_f^r); θ_c), y_i) = − Σ_i log P(y_i | q_i, G, b),   (12)

where G_c denotes the class label classifier of the source domain.

2.2 Optimization of BiDANN

By minimizing L_c and maximizing L_d^l, L_d^r and L_d, we optimize the objective function of Eq. (1) to reach a saddle point:

(θ̂_f^l, θ̂_f^r, θ̂_c) = arg min_{θ_f^l, θ_f^r, θ_c} L(X_R; (θ_f^l, θ_f^r, θ_c), θ̂_d, θ̂_d^l, θ̂_d^r),   (13)

θ̂_d = arg max_{θ_d} L(X_R; θ̂_f^l, θ̂_f^r, θ̂_c, θ_d, θ̂_d^l, θ̂_d^r),   (14)

θ̂_d^l = arg max_{θ_d^l} L(X_R^l; θ̂_f^l, θ̂_f^r, θ̂_c, θ̂_d, θ_d^l, θ̂_d^r),   (15)

θ̂_d^r = arg max_{θ_d^r} L(X_R^r; θ̂_f^l, θ̂_f^r, θ̂_c, θ̂_d, θ̂_d^l, θ_d^r).   (16)

We use stochastic gradient descent (SGD) to update θ_f^l, θ_f^r and θ_c in the direction that minimizes L_c and maximizes L_d^l, L_d^r and L_d, and to update θ_d^l, θ_d^r and θ_d in the direction that minimizes L_d^l, L_d^r and L_d. This max-min goal can be converted into a single minimization, min L = min (L_c + L_d^l + L_d^r + L_d), by the three gradient reversal layers (GRL) shown in Fig. 1, which keep the gradient sign during forward propagation but reverse it during back-propagation. The optimization procedure of BiDANN is shown in Algorithm 1.

We iteratively train the classifier and the three discriminators, and update the parameters with the chain rule as in standard deep learning methods. The difference is that the parameters before the GRL modules, i.e., the parameters of the feature extractors, subtract gradients with the opposite sign coming from the GRL during back-propagation. Thus the feature extractors generate data representations that minimize the loss of the classifier while maximizing the losses of the discriminators. In addition, for BiDANN we update the classifier more often than the discriminators, because our goal is to classify the EEG emotion data rather than to wipe out the domain-related information thoroughly.

Algorithm 1 Optimization of BiDANN.
Input:
  Training data set X_S and testing data set X_T;
  Ground-truth label set L_S of the training data set;
  Training (source) domain label set D_S = [D_S^l, D_S^r] = {0} and testing (target) domain label set D_T = [D_T^l, D_T^r] = {1};
  Initial learning rate α.
Output:
  Parameters θ̂_f^l, θ̂_f^r, θ̂_c, θ̂_d, θ̂_d^l, θ̂_d^r.
1: Input X_S and L_S to update the parameters of the classifier:
   θ_c ← θ_c − α ∂L_c/∂θ_c,  θ_f^l ← θ_f^l − α ∂L_c/∂θ_f^l,  θ_f^r ← θ_f^r − α ∂L_c/∂θ_f^r;
2: Input X_S, X_T, D_S and D_T to update the parameters of the global discriminator:
   θ_d ← θ_d − α ∂L_d/∂θ_d,  θ_f^l ← θ_f^l + α ∂L_d/∂θ_f^l,  θ_f^r ← θ_f^r + α ∂L_d/∂θ_f^r;
3: Input X_S^l, X_T^l, D_S^l and D_T^l to update the parameters of the left hemispheric local discriminator:
   θ_d^l ← θ_d^l − α ∂L_d^l/∂θ_d^l,  θ_f^l ← θ_f^l + α ∂L_d^l/∂θ_f^l;
4: Input X_S^r, X_T^r, D_S^r and D_T^r to update the parameters of the right hemispheric local discriminator:
   θ_d^r ← θ_d^r − α ∂L_d^r/∂θ_d^r,  θ_f^r ← θ_f^r + α ∂L_d^r/∂θ_f^r;
5: If the algorithm has scanned all the data 100 times, then α ← 0.9 × α and go to step 1;
6: return θ̂_f^l, θ̂_f^r, θ̂_c, θ̂_d, θ̂_d^l, θ̂_d^r.

3 Experiments

3.1 Setting Up

The baseline methods in our experiments are DANN [Ganin et al., 2016] and two reduced versions of our proposed BiDANN, i.e., BiDANN-R1 and BiDANN-R2, shown in Fig. 3. BiDANN-R2 removes the local discriminators from BiDANN, and BiDANN-R1 further extracts the source and target domain samples' features while ignoring the hemispheric difference. The feature extractor structure of these methods is the same as Feature Extractor-1 (or Feature Extractor-2) of BiDANN shown in Fig. 2. The compared methods and our BiDANN are implemented with the popular Theano.

Figure 3: The baseline methods in our experiments. (a) BiDANN-R1; (b) BiDANN-R2.

We verified our BiDANN method on the SEED database, which contains EEG data of 15 subjects recorded from 62 channels placed according to the 10-20 system. The EEG recording experiments


are conducted using an ESI NeuroScan system at a sampling rate of 1000 Hz, and each subject performed the experiments twice. During the recordings, the participants watched three kinds of film clips related to the emotions positive, neutral and negative. Each emotion contains 5 sessions, and each session has 185-238 samples [Zheng and Lu, 2015].

Following the same features released in [Zheng and Lu, 2015], we use the differential entropy (DE) of the EEG signals as the input to our model; DE is equivalent to the logarithmic energy spectrum in a certain frequency band. DE is calculated in five bands (δ: 1∼3 Hz, θ: 4∼7 Hz, α: 8∼13 Hz, β: 14∼30 Hz, γ: 31∼50 Hz), so the feature has a dimension of 310.

Within each session, we use a sliding window of 9 s to scan the sequences temporally, one step at a time. At each step, the sequences inside the sliding window are used as the representation of the point at the center of the window. In this way, temporal dependencies are involved while recognizing the human emotion at a specific moment. This is quite different from [Zheng and Lu, 2015], which focuses on recognizing the average energy within a short time and ignores the temporal variation information. The input data can then be represented as a sequence X = {x_1, ..., x_i, ..., x_T} ∈ R^{d_x×T}, where the sequence length is T = 9 and the feature dimension is d_x = 310. We use the left hemispheric electrodes (FP1, AF3, F7, F5, F3, F1, FT7, FC5, FC3, FC1, T7, C5, C3, C1, TP7, CP5, CP3, CP1, P7, P5, P3, P1, PO7, PO5, PO3, CB1, O1, FPZ, FCZ, CPZ, POZ) of the 10-20 system as the left hemispheric data X^l, and the symmetric right hemispheric electrodes (FP2, AF4, F8, F6, F4, F2, FT8, FC6, FC4, FC2, T8, C6, C4, C2, TP8, CP6, CP4, CP2, P8, P6, P4, P2, PO8, PO6, PO4, CB2, O2, FZ, CZ, PZ, OZ) as the right hemispheric data X^r. Both the training (source) and testing (target) data are thus split into two components, i.e., X_S^l, X_S^r, X_T^l, X_T^r ∈ R^{155×9}. In the experiments, the dimension of the hidden states d_h and the length of the compressed sequence K are set to 150 and 3 respectively. These parameters were set roughly, without an elaborate search.

3.2 EEG Emotion Recognition on the SEED Database

Conventional (Subject-Dependent) EEG Emotion Recognition

In this experiment, we strictly follow the protocol of Zheng et al. [Zheng and Lu, 2015], which takes 9 sessions of a subject's EEG data as training data and the remaining 6 sessions of the same subject as testing data. In total there are 1938 training samples and 1336 testing samples. We use the mean accuracy over the 15 subjects as the evaluation criterion.

Here we compare our BiDANN with linear SVM [Suykens and Vandewalle, 1999], Canonical Correlation Analysis (CCA) [Thompson, 2005], Group Sparse Canonical Correlation Analysis (GSCCA) [Zheng, 2017], and Deep Belief Network (DBN); these methods have been used for the classification of EEG signals [Zheng and Lu, 2015]. We also compare BiDANN with the baseline methods DANN, BiDANN-R1 and BiDANN-R2, and conduct additional experiments with the BiDANN-R1 framework trained on single-hemisphere data to investigate which hemisphere is more involved in processing emotions.

Method | ACC/STD (%)
SVM [Suykens and Vandewalle, 1999] | 83.99/09.72
CCA [Thompson, 2005] | 77.63/13.21
GSCCA [Zheng, 2017] | 82.96/09.95
DBN [Zheng and Lu, 2015] | 86.08/08.34
GraphSLDA [Li et al., 2016] | 87.39/08.64
DANN [Ganin et al., 2016] | 91.36/08.30
BiDANN-R1 | 90.29/08.02
BiDANN-R1 (Left) | 88.98/08.00
BiDANN-R1 (Right) | 89.70/07.63
BiDANN-R2 | 91.60/08.47
BiDANN | 92.38/07.04

Table 1: The mean accuracies (ACC) and standard deviations (STD) on the SEED database for the conventional EEG emotion recognition experiment.

Table 1 shows the performance on the SEED database. Our BiDANN achieves state-of-the-art performance. Even compared with DBN, which is also a deep learning method, our method improves the classification accuracy by 6.3 percentage points. In addition, from Table 1 we can see that the methods with a domain discriminator (DANN and the BiDANN-R1 frameworks) improve performance compared with the other methods. This shows that, for conventional EEG emotion recognition, even when the training and testing data come from the same subject, the domain gap still disturbs the decision of the classifier; this may be a small difference between EEG signals and other data such as images. Moreover, BiDANN-R1 (Right), using only right hemispheric data, obtains better classification accuracy than BiDANN-R1 (Left), using only left hemispheric data, which suggests that the right hemisphere is superior to the left in the emotion recognition process. However, neither BiDANN-R1 (Right) nor BiDANN-R1 (Left) surpasses BiDANN-R1, which shows that the left and right cerebral hemispheres indeed have a dependency in emotion processing.

Figure 4: The investigation of hemisphere usage. (a) DANN; (b) BiDANN-R1.

In addition, we investigate the effect of the left and right hemispheres on the three types of emotion using the DANN and BiDANN-R1 methods. The results are shown in Fig. 4. For negative


emotion, we can see that using the right hemispheric data performs much better than using the left, in both DANN and BiDANN-R1, which shows that the right hemisphere processes negative emotion better than the left hemisphere does. For positive emotion, the performance of the left hemispheric EEG data approximates that of the right hemispheric data for the DANN method, while for the BiDANN-R1 method the left hemispheric EEG data improve performance by 3 percentage points over the right hemispheric data, which suggests that the left hemisphere processes positive emotion better than the right.

Personalized (Subject-Independent) EEG Emotion Recognition

In this experiment, we adopt a leave-one-subject-out cross-validation strategy to evaluate the performance of our model, following the protocol of Zheng et al. [Zheng and Lu, 2016]. This strategy takes one subject's EEG as the testing data and the remaining 14 subjects' EEG as the training data. We use the mean accuracy over the 15 runs as the evaluation criterion.

Here we compare our BiDANN with linear SVM [Suykens and Vandewalle, 1999], KPCA [Schölkopf et al., 1998], TCA [Pan et al., 2011], T-SVM [Collobert et al., 2006], TPT [Sangineto et al., 2014], and the baseline methods DANN, BiDANN-R1 and BiDANN-R2. For TCA and KPCA it is infeasible to include all the training EEG data, due to the memory and time cost of the singular value decomposition; in the experiment we therefore use 5000 randomly selected samples as their training data.

Method | ACC/STD (%)
SVM [Suykens and Vandewalle, 1999] | 56.73/16.29
KPCA [Schölkopf et al., 1998] | 61.28/14.62
TCA [Pan et al., 2011] | 63.64/14.88
T-SVM [Collobert et al., 2006] | 72.53/14.00
TPT [Sangineto et al., 2014] | 76.31/15.89
DANN [Ganin et al., 2016] | 75.08/11.18
BiDANN-R1 | 76.97/11.08
BiDANN-R2 | 82.22/07.61
BiDANN | 83.28/09.60

Table 2: The mean accuracies (ACC) and standard deviations (STD) on the SEED database for the personalized EEG emotion recognition experiment.

Table 2 shows the performance on the SEED database. BiDANN-R2 improves on BiDANN-R1 by 5.2 percentage points, which shows the importance of considering the discrepancy between the left and right cerebral hemispheric data for EEG emotion recognition. Furthermore, BiDANN, with its local discriminators, performs about 1 percentage point better than BiDANN-R2. This reveals that the local discriminators are useful for further narrowing the distribution difference between the source and target domains on both hemispheres.

3.3 Confusion Matrix

To examine the results of recognizing each emotion, we depict the confusion matrices corresponding to the experimental results of our BiDANN. Fig. 5 shows the confusion matrices of the conventional and personalized EEG emotion recognition experiments on the SEED database.

Figure 5: The confusion matrices in our experiments. (a) The conventional EEG emotion recognition experiment; (b) the personalized EEG emotion recognition experiment.

From these two figures, we can make two observations:

(1) Our BiDANN method performs well in recognizing all three types of emotion, especially the positive emotion, whose accuracies exceed 90% in both the conventional and personalized EEG emotion recognition tasks. This shows that there are indeed similarities within the same emotion in EEG signals, and that it is effective to use EEG signals to decode human emotion.

(2) The mean accuracies over all subjects for the three types of emotion are 86.15% (negative), 93.61% (neutral) and 96.89% (positive) in the conventional task (Fig. 5(a)), and 80.51% (negative), 74.51% (neutral) and 91.04% (positive) in the personalized task (Fig. 5(b)). We can observe that positive emotion is much easier to recognize than negative and neutral emotions. In addition, negative and neutral emotions are much more likely to be confused with each other than with positive emotion. Perhaps the positive emotion stimulus materials cause more resonance in the participants.

4 Conclusion

Emotion is a basic and common phenomenon in every human being, and EEG provides a direct means to study emotion by measuring the signals of neural activity in the brain. EEG emotion recognition models should consider both the neurophysiological nature of the brain and the statistical characteristics of the EEG signal. In this paper, we utilize cerebral hemispheric asymmetry to deal with the EEG emotion recognition problem and propose a novel EEG emotion recognition framework called BiDANN. BiDANN first extracts temporal dynamic features of the left and right hemispheric EEG data separately, and then narrows the distribution gap between training and testing data using local and global discriminators. The experimental results show that our BiDANN is superior to the baselines, including other deep learning methods. In future work, we will investigate the effect of hemispheric data on more types of emotion.
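The leave-one-subject-out protocol used in the personalized experiment (Section 3.2) can be sketched as follows. This is our illustrative code, not the authors'; the data here are just subject identifiers, not real SEED features.

```python
# Leave-one-subject-out splits: each of the 15 subjects is held out
# once as the target (testing) domain while the other 14 form the
# source (training) domain.

def leave_one_subject_out(subjects):
    """Yield (train_subjects, test_subject) splits."""
    for i, held_out in enumerate(subjects):
        train = subjects[:i] + subjects[i + 1:]
        yield train, held_out

subjects = [f"subject_{k:02d}" for k in range(1, 16)]  # 15 SEED subjects
splits = list(leave_one_subject_out(subjects))
print(len(splits))        # 15 folds
print(len(splits[0][0]))  # 14 training subjects per fold
```

The mean accuracy over the 15 resulting folds is the evaluation criterion reported in Table 2.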


Acknowledgments

This work was supported by the National Basic Research Program of China under Grant 2015CB351704, the National Natural Science Foundation of China under Grants 61572009, 61772276 and 61602244, and the Jiangsu Provincial Key Research and Development Program under Grant BE2016616.

References

[Britton et al., 2006] Jennifer C Britton, K Luan Phan, Stephan F Taylor, Robert C Welsh, Kent C Berridge, and I Liberzon. Neural correlates of social and nonsocial emotions: An fMRI study. NeuroImage, 31(1):397–409, 2006.

[Collobert et al., 2006] Ronan Collobert, Fabian Sinz, Jason Weston, and Léon Bottou. Large scale transductive SVMs. Journal of Machine Learning Research, 7(Aug):1687–1712, 2006.

[Etkin et al., 2011] Amit Etkin, Tobias Egner, and Raffael Kalisch. Emotional processing in anterior cingulate and medial prefrontal cortex. Trends in Cognitive Sciences, 15(2):85–93, 2011.

[Ganin et al., 2016] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.

[Greve et al., 2013] Douglas N Greve, Lise Van der Haegen, Qing Cai, Steven Stufflebeam, Mert R Sabuncu, Bruce Fischl, and Marc Brysbaert. A surface-based analysis of language lateralization and cortical asymmetry. Journal of Cognitive Neuroscience, 25(9):1477–1492, 2013.

[Izard, 1991] Carroll E Izard. The Psychology of Emotions. Springer Science & Business Media, 1991.

[Jenke et al., 2014] Robert Jenke, Angelika Peer, and Martin Buss. Feature extraction and selection for emotion recognition from EEG. IEEE Transactions on Affective Computing, 5(3):327–339, 2014.

[Kim et al., 2013] Min-Ki Kim, Miyoung Kim, Eunmi Oh, and Sung-Phil Kim. A review on the computational methods for emotional state estimation from the human EEG. Computational and Mathematical Methods in Medicine, 2013, 2013.

[Li et al., 2016] Yang Li, Wenming Zheng, Zhen Cui, and Xiaoyan Zhou. A novel graph regularized sparse linear discriminant analysis model for EEG emotion recognition. In International Conference on Neural Information Processing, pages 175–182. Springer, 2016.

[Lotfi and Akbarzadeh-T, 2014] Ehsan Lotfi and M-R Akbarzadeh-T. Practical emotional neural networks. Neural Networks, 59:61–72, 2014.

[Musha et al., 1997] Toshimitsu Musha, Yuniko Terasaki, Hasnine A Haque, and George A Ivamitsky. Feature extraction from EEGs associated with emotions. Artificial Life and Robotics, 1(1):15–19, 1997.

[Pan et al., 2011] Sinno Jialin Pan, Ivor W Tsang, James T Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.

[Picard and Picard, 1997] Rosalind W Picard and Roalind Picard. Affective Computing, volume 252. MIT Press, Cambridge, 1997.

[Sangineto et al., 2014] Enver Sangineto, Gloria Zen, Elisa Ricci, and Nicu Sebe. We are not all equal: Personalizing models for facial expression analysis with transductive parameter transfer. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 357–366. ACM, 2014.

[Schölkopf et al., 1998] Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.

[Storbeck and Clore, 2005] Justin Storbeck and Gerald L Clore. With sadness comes accuracy; with happiness, false memory: Mood and the false memory effect. Psychological Science, 16(10):785–791, 2005.

[Suykens and Vandewalle, 1999] Johan AK Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293–300, 1999.

[Thompson, 2005] Bruce Thompson. Canonical correlation analysis. In Encyclopedia of Statistics in Behavioral Science, 2005.

[Zatorre et al., 1992] Robert J Zatorre, Marilyn Jones-Gotman, Alan C Evans, and Ernst Meyer. Functional localization and lateralization of human olfactory cortex. Nature, 360(6402):339–340, 1992.

[Zheng and Lu, 2015] Wei-Long Zheng and Bao-Liang Lu. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development, 7(3):162–175, 2015.

[Zheng and Lu, 2016] Wei-Long Zheng and Bao-Liang Lu. Personalizing EEG-based affective models with transfer learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 2732–2738. AAAI Press, 2016.

[Zheng, 2017] Wenming Zheng. Multichannel EEG-based emotion recognition via group sparse canonical correlation analysis. IEEE Transactions on Cognitive and Developmental Systems, 9(3):281–290, 2017.
