Comparison of different feature extraction methods for EEG-based emotion recognition

Original Research Article
Rab Nawaz, Kit Hwa Cheah, Humaira Nisar *, Yap Vooi Voon
Department of Electronic Engineering, Faculty of Engineering and Green Technology, Universiti Tunku Abdul Rahman (UTAR), Malaysia
Article history: Received 23 October 2019; Received in revised form 13 April 2020; Accepted 14 April 2020; Available online xxx

Keywords: EEG; Emotion recognition; 3D emotion model; Machine learning

Abstract: EEG-based emotion recognition is a challenging and active research area in affective computing. We used a three-dimensional (arousal, valence and dominance) model of emotion to recognize the emotions induced by music videos. The participants watched 1-min-long videos while their EEG was recorded. The main objective of the study is to identify the features that can best discriminate the emotions. Power, entropy, fractal dimension, statistical features and wavelet energy are extracted from the EEG signals. The effects of these features are investigated and the best features are identified. The performance of two feature selection methods, a Relief-based algorithm and principal component analysis (PCA), is compared. PCA is adopted because of its better performance, and the efficacy of the features is validated using support vector machine, K-nearest neighbors and decision tree classifiers. Our system achieves overall best classification accuracies of 77.62%, 78.96% and 77.60% for valence, arousal and dominance respectively. Our results demonstrate that time-domain statistical characteristics of EEG signals can efficiently discriminate different emotional states. Also, the three-dimensional emotion model is able to classify similar emotions that are not correctly classified by the two-dimensional model (e.g. anger and fear). The results of this study can be used to support the development of real-time EEG-based emotion recognition systems.

© 2020 Nalecz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences. Published by Elsevier B.V. All rights reserved.
1. Introduction

Emotion is known as the reflection of mental states and psychophysiological expressions. It is an important factor in human–computer interaction (HCI) systems [1,2]. The development of reliable emotion recognition systems with acceptable accuracy and adaptability to real-life applications is a challenging task. Many research studies have been conducted in the past few decades on human emotion recognition using different approaches, such as facial expressions [2–5], peripheral physiological signals [6,7], and brain signals [5,8–11].
* Corresponding author at: Faculty of Engineering and Green Technology (FEGT), Department of Electronic Engineering, Universiti Tunku
Abdul Rahman (UTAR), Jalan Universiti, Kampar 31900, Perak, Malaysia.
E-mail address: humaira@utar.edu.my (H. Nisar).
https://doi.org/10.1016/j.bbe.2020.04.005
The EEG (physiological) signals that originate from the central nervous system (CNS) have gained appreciable interest in emotion studies. The psychophysiological characteristics of the emotions are directly reflected in the EEG signals. EEG-based emotion recognition studies can be differentiated based on the emotion elicitation method, feature extraction, classifiers used, number of participants and number of emotions classified. In spite of the large number of studies conducted on EEG-based emotion recognition, there are unsolved issues and questions, for example the number of emotion classes recognized, the number of electrodes used and the accuracy of emotion recognition [12].

With the advancement of technology and real-world applications, more and more researchers are exploiting digital signal processing and machine learning approaches for HCI. Various frameworks have been developed for human emotion recognition. However, there still exists the ongoing challenge of the reliability of these frameworks [1]. One of the issues of these frameworks is the non-linear characteristics of the EEG signals. The EEG signals are highly non-linear in nature, which results in abrupt changes in the features, whereas human emotions change gradually from one state to another [8]. To validate the usefulness of the proposed approach, different types of EEG features are extracted and their performance is compared based on the classification accuracy of the emotional state on the DEAP dataset [13].

1.1. Related work

1.1.1. EEG-based emotion recognition and feature extraction
While emotion recognition studies encompass a wide spectrum of domains, in the current study we focused our attention on the electroencephalogram. EEG has a strong connection with emotional state identification [14], as the brain is the region where the emotions originate and EEG is a measure of brain activity. For this reason, EEG-based human emotion recognition has become an active research area. Researchers extract different features from EEG signals and use them for the recognition of emotion. The differences in the selection of these features are based on theoretical considerations. For example, EEG signals in the left and right frontal regions (i.e. EEG asymmetry) are associated with positive and negative emotions respectively [15]. Hence, EEG asymmetry associated with the emotions has received tremendous attention in emotion studies [16]. Similarly, the change in the power spectrum of distinct EEG bands is one of the primary indicators of the emotional state [17]. However, there is no common agreement on the choice of EEG attributes which are most appropriate for emotion recognition [1]. Therefore, an exhaustive evaluation of different types of features is important and necessary for the realization of emotion recognition systems.

A few studies have been conducted to compare the importance of different features of EEG signals for emotion recognition [18]. Stelios and Leontios estimated the event-related synchronization/desynchronization (ERS/ERD) features of the EEG signals based on time–frequency (TF) analyses. The ERS/ERD features are estimated with different TF analysis methods in five frequency bands using multiple reference-state time windows and feature vectors. Finally, they compared the classification of liking and disliking emotional states among the different TF ERS/ERD feature vectors [19]. Similarly, in [20] the authors proposed an EEG-based emotion recognition framework based on higher order crossing (HOC) features and compared the classification rate among HOC, statistical and wavelet-based features. Hence, the feature comparisons in the above-mentioned studies are either extractions of the same feature types with different computational implementations or not exhaustive. Although the performance reported in the above studies appears to be efficient, there is a lack of exploration of a robust feature set for the realization of emotion recognition systems. Therefore, the practical exploitation of such systems for HCI is limited. Hence, the current work presents the implementation of an EEG-based emotion recognition system and evaluates the robustness of feature sets drawn from the commonly used feature types, i.e. power features [2], entropy features [8], fractal dimension features [21], statistical features [1] and wavelet features [8].

1.1.2. Emotion models
To model the emotional state, researchers use one of two representations, the discrete model or the dimensional model. In the discrete model of emotions, the theory of basic emotions is given consideration. According to this theory, all emotions can be represented categorically by a primitive set of basic emotions [22]. In the discrete model of emotions, different scientists consider different basic emotions. The number of basic emotions in the discrete model is a topic of controversy, and there is a dispute among researchers about the exact number of basic emotions. There is also debate about the universality of these emotions. Some researchers maintain that there are only four basic emotions, including fear, anger, disgust, and happiness [22], while others count as many as 27 emotions [23]. In the dimensional model, the emotions are represented in multiple dimensions, e.g. arousal and valence. The most famous example of the dimensional model is the circumplex model [24]. This model is based on two different dimensions, valence and arousal. Valence represents the quality/pleasantness of the emotion and ranges from unpleasant (low valence) to pleasant (high valence). Arousal represents the excitation of the emotion and ranges from calm (low arousal) to excited (high arousal). The 2D model can define many emotions; however, there is a chance that the 2D model may not be able to differentiate emotions which have the same levels of arousal and valence. For example, anger and fear are two different emotions, but both have a high level of arousal and a negative valence [25]. Hence, to differentiate between such cases, a third dimension called dominance is introduced and an extended version of the 2D model is obtained [26], as shown in Fig. 1. The third dimension in this extended model represents the feeling of being in control of the emotion [27]. With the addition of this dimension the 2D model is more complete and able to differentiate between anger and fear as two different emotions.

1.1.3. Emotion elicitation
There are many different ways to elicit emotions. Music is considered an excellent elicitor of emotion [28] and is used to elicit emotions in the laboratory setting for experimental purposes [29]. Fernández-Sotos et al. examined the impact of a musical parameter, called note value, in the elicitation of
a 45-s long EEG sample to eliminate the mood-swing and fatigue related portion from the data [35]. After excluding the first 20 s of the samples in our study, the remaining 40-s long EEG epoch was subjected to the time-window (i.e. short-segment) based feature extraction procedure described in the next section. In EEG-based emotion recognition, the width of the time-window is very important and needs to be chosen carefully. Candra et al. investigated the effect of window size on emotion classification using the DEAP dataset [36]. Their results show that the effective window size for extracting features from the EEG signal and recognizing emotion is 3–12 s. Based on their investigation, the width of the time-windowed segment in the current work is set to 10 s
(128 × 10 = 1280 samples) with no overlap. Following this segmentation, the 40-s long EEG epoch is divided into four 10-s long EEG segments, as shown in the pre-processing block in Fig. 3. Hence for each subject we acquire 160 (40 (videos) × 4 (segments) = 160) labeled observations. Finally, all the features (discussed in the next section) were computed for each of the four segments separately. The flow chart of the pre-processing algorithm is shown in Fig. 3. The processing starts by inputting the subject's EEG recording. Next, 14 channels of EEG are extracted from the 32 channels as discussed above. After that, each channel is processed individually, starting from channel no. 1 to channel no. 14. For each channel, the one-minute-long EEG segment corresponding to a single video is taken into account at a time, starting from EEG segment no. 1 (corresponding to video no. 1) to segment no. 40 (corresponding to video no. 40). For each EEG segment corresponding to a single video, the first 20 s of samples were removed and the remaining part was subjected to feature extraction. The features are extracted in windowed fashion, as explained in the next section.
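The segmentation just described is easy to express in code. The sketch below is a minimal illustration (function and variable names are ours, not the paper's), assuming one DEAP trial arrives as a (channels × samples) array sampled at 128 Hz:

```python
import numpy as np

FS = 128        # DEAP sampling rate (samples/s)

def segment_trial(trial, fs=FS, skip_s=20, win_s=10):
    """Drop the first `skip_s` seconds of a (channels x samples) trial and
    split the remaining 40 s into non-overlapping `win_s`-second windows."""
    x = trial[:, skip_s * fs:]                   # remove the first 20 s
    n_win = x.shape[1] // (win_s * fs)           # 40 s / 10 s = 4 windows
    x = x[:, :n_win * win_s * fs]
    # reshape to (n_win, channels, win_s*fs) = (4, 14, 1280)
    return x.reshape(x.shape[0], n_win, win_s * fs).swapaxes(0, 1)

# one 60-s, 14-channel trial: (14, 7680) -> (4, 14, 1280)
segments = segment_trial(np.random.randn(14, 60 * FS))
```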
2.3. Feature extraction

To derive salient features from the data that can effectively map the EEG segments to the respective emotional states is the main objective of the feature extraction stage of the EEG-based emotion recognition system. For this reason, we extracted different features from the data to investigate their performance in the classification of the emotional state. All of the extracted features, which are explained in detail below, have been extensively used for EEG-based emotion recognition [2,37,38].

2.3.1. Power features
The EEG signal is composed of different frequency bands: delta (0–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz) and gamma (above 30 Hz). After pre-processing we used a modified periodogram algorithm, called the Welch method [39], to compute the power spectrum feature. In the Welch method the power spectrum is estimated by first decomposing the time-series signal into successive blocks and then forming the periodogram of each block. Finally, the mean is computed by averaging over the blocks. The procedure to compute the power spectrum is given below:

1) Decompose the time-series signal into $N$ successive blocks:

$x = \{x_1, x_2, \ldots, x_N\}$  (1)

2) Compute the periodogram of each block:

$P_{x_n} = \frac{1}{L} \left| \mathrm{FFT}(x_n) \right|^2$  (2)
where $L$ represents the total number of points in the $n$th block of the signal.

3) Finally, the Welch estimate of the power spectral density is given by

$\hat{s}_x = \frac{1}{K} \sum_{m=0}^{K-1} P_x(m)$  (3)

where $K$ is the total number of blocks in the signal.
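As a hedged illustration, this band-power computation can be done with `scipy.signal.welch`; the block length (`nperseg`) and the 45 Hz gamma cut-off (matching the 4–45 Hz DEAP filtering noted later in the text) are our assumptions, since the excerpt does not state them:

```python
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(x, fs=128, nperseg=256):
    """Welch PSD of one channel, integrated over each EEG band."""
    f, pxx = welch(x, fs=fs, nperseg=nperseg)    # mean of block periodograms
    return {band: np.trapz(pxx[(f >= lo) & (f < hi)], f[(f >= lo) & (f < hi)])
            for band, (lo, hi) in BANDS.items()}
```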
The seven left–right symmetric channel pairs over the four bands produced a 28-dimensional (7 (paired channels) × 4 (bands) = 28) DASM and a 28-dimensional RASM feature vector. In addition, the seven left–right symmetric pairs of channels were used to extract a cross power spectral density feature for each of the four bands. The cross power spectrum of each symmetric pair results in a complex value. The real and imaginary parts of the complex values for the seven symmetric pairs produced a 56-dimensional (7 (paired channels) × (4 (band_real) + 4 (band_img)) = 56) cross power spectral feature vector (CPv). Collectively, the power features result in a 168-dimensional vector (PFv + DASM + RASM + CPv), hereinafter power features all (PFall). Table 1 shows the summary of the power features extracted.

Table 1 – Summary of power features extracted from different EEG bands.

Feature   Theta   Alpha   Beta   Gamma   Total
PFv       14      14      14     14      56
CPv       14      14      14     14      56
DASM      7       7       7      7       28
RASM      7       7       7      7       28
PFall     42      42      42     42      168

The EEG power features have been found effective [42] for recognizing emotional variation after music stimulation.
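The exact DASM/RASM definitions fall on a page not reproduced in this excerpt, so the sketch below follows the convention common in this literature (DASM = P_left − P_right, RASM = P_left / P_right per band) and uses `scipy.signal.csd` for the cross power feature; the channel-pair indices are placeholders for the actual symmetric montage:

```python
import numpy as np
from scipy.signal import csd

# Seven left-right symmetric channel index pairs into the 14-channel
# montage (illustrative; the real pairing follows the headset layout).
PAIRS = [(0, 13), (1, 12), (2, 11), (3, 10), (4, 9), (5, 8), (6, 7)]

def asymmetry_features(band_power):
    """band_power: (14,) array of one band's power per channel. Returns the
    7 DASM and 7 RASM values under the usual left-minus/over-right convention."""
    dasm = np.array([band_power[l] - band_power[r] for l, r in PAIRS])
    rasm = np.array([band_power[l] / band_power[r] for l, r in PAIRS])
    return dasm, rasm

def cross_power_features(x, fs=128):
    """Real and imaginary parts of the cross power spectral density for each
    symmetric pair, averaged over frequency (one band assumed isolated)."""
    feats = []
    for l, r in PAIRS:
        _, pxy = csd(x[l], x[r], fs=fs, nperseg=256)
        feats.extend([pxy.real.mean(), pxy.imag.mean()])
    return np.asarray(feats)     # 14 values per band -> 56 over four bands
```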
2.3.2. Entropy features
The human brain is a complex system and the EEG signals are non-linear and chaotic in nature [21]. In addition to the linear investigation of the EEG signal, it is highly recommended to analyze it with non-linear features. With the advancement of nonlinear dynamics, entropy is one of the powerful algorithms developed for the non-linear analysis of EEG signals. In this study we extracted five variants of entropy from each EEG channel, which produced a 70-dimensional (14 (channels) × 5 (entropies) = 70) entropy feature vector (ENv).

... where PSD is the normalized power spectral density and $f_n$ is half of the sampling frequency according to the Nyquist criterion.

2.3.2.3. Singular value decomposition entropy (SVDE). Singular value decomposition entropy (SVDE) is an indicator of the number of eigenvectors needed to represent the EEG data [46]. In other words, it measures the dimensionality of the data. It is considered an alternative method of information extraction from the EEG data [47]. It is computed as follows in Eqs. (6)–(9):

1) The input EEG signal is decomposed into $N$ segments:

$x = \{x_1, x_2, \ldots, x_N\}$  (6)

2) A delayed vector $y_i$ of the input vector is created as

$y_i = \left(x_i, x_{i+t}, \ldots, x_{i+(m-1)t}\right)$  (7)

where $t$ is the delay and $m$ is the embedding dimension.

3) The embedding space is then constructed by

$Y = \left[y_1, y_2, \ldots, y_{N-(m-1)t}\right]^{T}$  (8)

Singular value decomposition is then performed on the matrix $Y$ to produce $M$ singular values, $\sigma_1, \ldots, \sigma_M$, known as the singular spectrum. The SVDE is then computed as in Eq. (9):
$\mathrm{SVDE} = -\sum_{i=1}^{M} \bar{\sigma}_i \log_2 \bar{\sigma}_i$  (9)

where $\bar{\sigma}_i$ are the singular values normalized to unit sum.
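Eqs. (6)–(9) translate into a few lines of code; the embedding dimension m and delay t below are illustrative choices, as their values are not given in this excerpt:

```python
import numpy as np

def svd_entropy(x, m=10, t=1):
    """SVD entropy per Eqs. (6)-(9): build the delay-embedding matrix Y,
    take its singular spectrum, normalize it to unit sum, and compute the
    Shannon entropy."""
    n = len(x) - (m - 1) * t                      # number of delay vectors
    Y = np.array([x[i:i + (m - 1) * t + 1:t] for i in range(n)])   # Eq. (8)
    s = np.linalg.svd(Y, compute_uv=False)        # singular spectrum
    s_bar = s / s.sum()                           # normalized singular values
    return -np.sum(s_bar * np.log2(s_bar))        # Eq. (9)
```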
399 2.3.2.4. Approximate entropy and sample entropy. Approxi- EEG signal characterization [52]. The methods developed for 459
400 mate entropy (ApEn) and sample entropy (SampEn) both computation of FD by these algorithms are different than each 460
401 measure the irregularity and complexity in the time-series EEG other and are explained below. We computed FD for each 461
402 data [48]. The general steps to compute approximate entropy channel of EEG signals using these three algorithms resulting 462
403 are given below; in 42-dimensional (14(channels) 3(algorithms) = 42) FD fea- 463
404 ture vector (FDv). 464
405
406 1) Decompose the EEG signal consisting of N data points 2.3.3.1. Katz's FD. This algorithm is based on the FD compu- 465
406 x ¼ fx1 ; x2 ; . . .; xN g (10) tation of a waveform of planar curve [43]: 466
407
408 467
410
409 2) Define Nm þ 1 vectors Xð1Þ. . .XðNm þ 1Þ
408
407 logðLÞ
409 XðiÞ ¼ ½xðiÞ; xði þ 1Þ; . . .xði þ m1Þ (11) FD ¼ (16)
logðdÞ
410
412
411
414
413 3) Compute the maximum norm distance between X(i) and where d is computed as the maximum distance between two 468
469
415 X(j) points and known as diameter of the curve. L is the total length 470
412
411
413 of the curve. An average value of the Katz's FD is obtained by 471
d½XðiÞ; Xð jÞ ¼ max
k¼1;2;...;m jxði þ k1Þxð j þ k1Þj (12)
414 dividing L and d by the average distance between the conse- 472
417
416
415
419
418 4) For j ¼ 1; . . .; Nm þ 1 find vector Xð jÞ for a specific XðiÞ cutive points [52]: 473
420 where i 6¼ j that fulfills the condition d½XðiÞ; Xð jÞr: where r is 474
421 the tolerance value for comparison. Let Bi be the number of
logðLÞ logðL=aÞ logðmÞ
417
416
418
422 vectors fulfilling the above criteria. Compute the probability FD ¼ ¼ ¼ (17)
logðdÞ logðd=aÞ logðmÞ þ logðd=LÞ
423
419 of similar pattern by;
where the average distance between the consecutive points is 475
476
420 Bi
Cm
r ðiÞ ¼ (13) represented by a and m/L is the number of samples in the 477
421 Nm þ 1
segment [52]. 478
422
426
425
424 Find the mean of natural log of the above as
423
m 1 XNmþ1 2.3.3.2. Petrosian FD. The algorithm developed by Petrosian is 479
; ðrÞ ¼ lnCm
r ðiÞ (14)
Nm þ 1 i¼1 based on the conversion of time-series data into binary 480
sequences [52]. Once the data is converted, FD is computed as: 481
427
428
426
425
424
430
429 5) Repeat the above procedure by increasing the dimension to 482
431 m + 1 to compute ;mþ1 ðrÞ and then compute approximate logðmÞ
FD ¼ (18)
432 entropy as follows:
logðmÞ þ log mþ0:4N
m
d
427
428 ApEnðm; r; NÞ ¼ ;m ðrÞ;mþ1 ðrÞ (15)
429 where Nd represents the number of segment pairs which are 483
484
434
433
430 where m and r are fixed values. In this study we used m = 2 not similar in the binary sequence [52]. 485
435
431 and r ¼ 0:2 stdðdataÞ based on the empirical results found
436
432 in [49]. 2.3.3.3. Higuchi's FD. The time-series EEG signal is decom- 486
437 posed into N samples; 487
438 The SampEn is an extended and modified version of ApEn and
XðnÞ ¼ Xð1Þ; Xð2Þ; . . .; XðNÞ (19) 488
434
433
439 has been used for emotion recognition from EEG [11,50]. It has
435 489
490
440 two advantages over approximate ApEn: data length indepen- A new time-series signal is constructed by picking up one 491
441 dence and no self-similarity measurement [49]. The larger sample after every kth sample from the original time-series; 492
442 value of ApEn and SampEn represents more irregularity and
Nm 493
443 complexity in the data and vice versa [51]. The procedure of the Xm
k : XðmÞ; Xðm þ kÞ; . . .; X m þ :t (20)
k
444 computation of ApEn and SampEn adopted in this paper is
445 described in [42]. where m = 1, 2, 3, . . ., k. m shows the starting point and k 494
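The steps of Eqs. (10)–(15) map directly onto code. The sketch below follows the classic ApEn formulation (self-matches are counted, a common implementation choice that avoids log 0) with the paper's m = 2 and r = 0.2 × std:

```python
import numpy as np

def apen(x, m=2, r_factor=0.2):
    """Approximate entropy per Eqs. (10)-(15)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)                      # tolerance used in the paper

    def phi(m):
        n = len(x) - m + 1
        X = np.array([x[i:i + m] for i in range(n)])               # Eq. (11)
        # max-norm distance between all pairs of template vectors, Eq. (12)
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        C = np.sum(d <= r, axis=1) / n                             # Eq. (13)
        return np.mean(np.log(C))                                  # Eq. (14)

    return phi(m) - phi(m + 1)                                     # Eq. (15)
```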
2.3.3. Fractal dimension features
Considering the EEG signal as a geometric figure, its temporal sequences are assessed directly in the time domain through the fractal dimension (FD) [52]. The FD estimates the geometric complexity of the time-series EEG signal [53] and evaluates its correlation and evolutionary features by quantifying the fractional space occupied [52]. It is considered a successful feature for emotion recognition based on EEG signals [21]. In another study a multimodal fusion approach is used and the FD feature extracted from the EEG signal is fused with musical features for emotion recognition [54]. A wide variety of algorithms are used to compute the FD. The algorithms proposed by Katz [53], Petrosian [55] and Higuchi [56] are widely used in EEG signal characterization [52]. The methods these algorithms use to compute the FD differ from each other and are explained below. We computed the FD for each channel of the EEG signals using these three algorithms, resulting in a 42-dimensional (14 (channels) × 3 (algorithms) = 42) FD feature vector (FDv).

2.3.3.1. Katz's FD. This algorithm is based on the FD computation of the waveform of a planar curve [43]:

$\mathrm{FD} = \frac{\log(L)}{\log(d)}$  (16)

where $d$ is computed as the maximum distance between two points, known as the diameter of the curve, and $L$ is the total length of the curve. An average value of Katz's FD is obtained by dividing $L$ and $d$ by the average distance between consecutive points [52]:

$\mathrm{FD} = \frac{\log(L/a)}{\log(d/a)} = \frac{\log(m)}{\log(m) + \log(d/L)}$  (17)

where the average distance between consecutive points is represented by $a$ and $m = L/a$ is the number of samples in the segment [52].

2.3.3.2. Petrosian FD. The algorithm developed by Petrosian is based on the conversion of the time-series data into a binary sequence [52]. Once the data is converted, the FD is computed as

$\mathrm{FD} = \frac{\log(m)}{\log(m) + \log\left(\frac{m}{m + 0.4 N_d}\right)}$  (18)

where $N_d$ represents the number of adjacent-sample pairs that are not similar in the binary sequence [52].

2.3.3.3. Higuchi's FD. The time-series EEG signal is composed of $N$ samples:

$X(n) = X(1), X(2), \ldots, X(N)$  (19)

A new time series is constructed by picking one sample after every $k$th sample from the original time series:

$X_k^m : X(m), X(m+k), \ldots, X\!\left(m + \left\lfloor \tfrac{N-m}{k} \right\rfloor k\right)$  (20)

where $m = 1, 2, 3, \ldots, k$; $m$ denotes the starting point and $k$ the interval between two samples in the original data. Now, for each $k$, compute $L_m(k)$ as follows:

$L_m(k) = \frac{1}{k} \left[ \left( \sum_{i=1}^{\lfloor (N-m)/k \rfloor} \left| X(m+ik) - X(m+(i-1)k) \right| \right) \frac{N-1}{\left\lfloor \frac{N-m}{k} \right\rfloor k} \right]$  (21)

For the average value of $L_m(k)$, i.e. $\langle L(k) \rangle = \frac{1}{k} \sum_{m=1}^{k} L_m(k)$, the following property holds:

$\langle L(k) \rangle \propto k^{-\mathrm{FD}}$  (22)
where FD is the value of the fractal dimension of the time-series data and can be measured from the slope as

$\mathrm{FD} = -\lim_{k \to \infty} \frac{\log \langle L(k) \rangle}{\log k}$  (23)
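Of the three algorithms, Higuchi's involves the most bookkeeping; a sketch of Eqs. (19)–(23) follows, with the maximum interval k_max as an illustrative choice:

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension: <L(k)> ~ k^(-FD), Eqs. (19)-(23),
    estimated by a log-log least-squares fit."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    log_k, log_L = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(1, k + 1):                 # k sub-series, Eq. (20)
            idx = np.arange(m - 1, N, k)          # X(m), X(m+k), ...
            n_i = len(idx) - 1
            if n_i < 1:
                continue
            # normalized curve length L_m(k), Eq. (21)
            lengths.append(np.sum(np.abs(np.diff(x[idx]))) * (N - 1) / (n_i * k * k))
        log_k.append(np.log(k))
        log_L.append(np.log(np.mean(lengths)))    # <L(k)>, Eq. (22)
    return -np.polyfit(log_k, log_L, 1)[0]        # minus the slope, Eq. (23)
```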
2.3.4. Statistical features
A set of signal statistics from the time-series EEG signal is used to recognize the emotion [1]. In the current study six statistical features are extracted, adapted from [1,38]. Extracting these six statistical features for each channel of the EEG data produced an 84-dimensional (14 (channels) × 6 (statistics) = 84) statistical feature vector (STv). These statistical measures have been used to characterize the time-series EEG signals.

2.3.4.1. Mean.

$\mu_X = \frac{1}{N} \sum_{n=1}^{N} X(n)$  (24)

2.3.4.2. Standard deviation.

$\sigma_X = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(X(n) - \mu_X\right)^2}$  (25)

2.3.4.3. Mean of absolute values of first difference.

$\delta_X = \frac{1}{N-1} \sum_{n=1}^{N-1} \left| X(n+1) - X(n) \right|$  (26)

2.3.4.4. Mean of absolute values of first difference of normalized EEG.

$\bar{\delta}_X = \frac{1}{N-1} \sum_{n=1}^{N-1} \left| \bar{X}(n+1) - \bar{X}(n) \right|$  (27)

2.3.4.5. Mean of absolute values of second difference.

$\gamma_X = \frac{1}{N-2} \sum_{n=1}^{N-2} \left| X(n+2) - X(n) \right|$  (28)

2.3.4.6. Mean of absolute values of second difference of normalized EEG.

$\bar{\gamma}_X = \frac{1}{N-2} \sum_{n=1}^{N-2} \left| \bar{X}(n+2) - \bar{X}(n) \right|$  (29)

where $N$ is the total number of samples in the EEG signal and $\bar{X}(n)$ represents the EEG signal normalized to zero mean and unit variance:

$\bar{X}(n) = \frac{X(n) - \mu_X}{\sigma_X}$  (30)
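The six per-channel statistics of Eqs. (24)–(30) reduce to a few NumPy calls; this sketch assumes Eq. (26) is the first-difference analogue of Eq. (28), as reconstructed above:

```python
import numpy as np

def statistical_features(x):
    """Mean, standard deviation, and mean absolute first/second differences
    of the raw and the normalized signal, Eqs. (24)-(30)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    xb = (x - mu) / sigma                      # normalized signal, Eq. (30)
    return np.array([
        mu,                                    # Eq. (24)
        sigma,                                 # Eq. (25)
        np.mean(np.abs(np.diff(x))),           # Eq. (26)
        np.mean(np.abs(np.diff(xb))),          # Eq. (27)
        np.mean(np.abs(x[2:] - x[:-2])),       # Eq. (28)
        np.mean(np.abs(xb[2:] - xb[:-2])),     # Eq. (29)
    ])
```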
2.3.5. Wavelet features
The wavelet feature is one of the time–frequency (TF) domain features used in emotion recognition from the EEG signal [8]. In wavelet decomposition, a time–frequency transform of the original signal is obtained for each time point and each frequency resolution. A mother wavelet is correlated with the original signal to obtain the wavelet coefficients. In the current study we used a Daubechies wavelet as the mother wavelet, chosen for its near-optimal time–frequency representation characteristics [8]. The signal is decomposed into the theta, alpha, beta and gamma bands using wavelet decomposition. As mentioned earlier, the DEAP dataset is filtered to the 4–45 Hz range, which is why we could not compute the delta (0.1–4 Hz) EEG band. For the sampling rate of 128 samples/s, we applied four levels of octave wavelet decomposition to obtain the coefficients of the EEG bands: the D4 decomposition is in the theta range (4–8 Hz), D3 in the alpha range (8–16 Hz), D2 in the beta range (16–32 Hz) and D1 in the gamma range (32–64 Hz). Since the coefficients from level $j$ correspond to band $j$, the wavelet energy $E_j$ for that band is computed as the sum of the corresponding squared coefficients [8,57], as given in Eq. (31). The wavelet energy $E_j$ is computed for each of the 14 channels and four bands, resulting in a 56-dimensional (4 (bands) × 14 (channels) = 56) wavelet energy feature vector (Ej):

$E_j = \sum_{k} D_j(k)^2$  (31)
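A sketch of the band-energy computation using PyWavelets; the Daubechies order (db4 here) is an assumption, as the excerpt names only the wavelet family:

```python
import numpy as np
import pywt

def wavelet_energies(x, wavelet="db4"):
    """Four-level DWT of one channel sampled at 128 Hz; the detail
    coefficients D4..D1 cover theta, alpha, beta and gamma. Band energy is
    the sum of squared coefficients, Eq. (31)."""
    _, cD4, cD3, cD2, cD1 = pywt.wavedec(x, wavelet, level=4)
    details = {"theta": cD4, "alpha": cD3, "beta": cD2, "gamma": cD1}
    return {band: float(np.sum(d ** 2)) for band, d in details.items()}
```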
2.4. Feature selection

One of the most important and challenging tasks in affective computing is to identify the most relevant features that can best predict the output class. In almost all cases, a subset of the extracted features is informative and relevant in determining the class. The presence of the remaining irrelevant features, which contribute to the dimensionality of the problem, decreases the computational efficiency of the model. Feature selection (FS) is thus the process of identifying the irrelevant features in the data [58]. Different methods are used to implement FS and form a new set of relevant features.

The FS methods can be categorized, based on their relationship with the classification algorithm, into two categories, namely filter methods and wrapper methods [1]. Filter FS methods perform the selection independently of the final classification algorithm, and thus have a lower chance of over-fitting. On the contrary, the selection output of wrapper methods is highly dependent on the classification or learning algorithms used for recursive feature inclusion or elimination. Therefore, the wrapper methods are more likely to result in classifier over-fitting and are more computationally expensive.

From another aspect, feature selection methods can also be categorized as supervised and unsupervised methods. Supervised FS methods require the availability of ground-truth labels affixed to the feature sets for selecting the most influential feature subsets. Supervised FS methods include
the relief-based algorithms, the Fisher score, the Chi-squared score, and correlation-based algorithms. On the contrary, the unsupervised FS methods do not rely on labels affixed to the feature sets in their operation. Unsupervised methods include variance-based methods, mean absolute difference, dispersion ratio, Laplacian scoring, and clustering selection algorithms.

It is claimed that the choice of FS method is highly dependent on the research scenario and that none is considered a universal best [59]. In this study we have applied two different methods for FS, a relief-based algorithm (RBA) and principal component analysis (PCA). For the subsequent analysis we adopt the method which gives better results for our dataset. PCA is a widely applied method in the area of computer science. In this method the significance of the feature components is evaluated by exploiting the eigenvectors of the covariance matrix, which provides a reasonable scheme to perform FS [60]. RBA is characterized as an individual-evaluation filter method [58]. The Relief algorithm was originally formulated by Kira and Rendell [61,62]. This algorithm calculates a proxy statistic, called the feature score, ranging from −1 (worst) to +1 (best) for each feature in the feature vector according to its importance [58]. Based on the RBA, all the features are evaluated one by one, and a real value between −1 and +1 is assigned to each, indicating the importance of that feature [63]. A family of RBA FS algorithms has been developed and implemented [58]. We adopted the RBA FS method because of its non-myopic strength, i.e. the quality of a target feature is estimated in the context of the other features [64]. Additionally, these methods are non-parametric [65]. Different core algorithms of the RBA FS methods, such as ReliefF, RReliefF, SURF and TuRF, have been developed; for detailed descriptions of the differences between these core algorithms see [58].
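As an illustration of the PCA route, the sketch below ranks the principal components by explained variance and keeps a top fraction, mirroring the "top 50% PCs" selection used in Section 3; scikit-learn's PCA stands in for whatever implementation was actually used:

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_select(X, keep=0.5):
    """Project an (observations x features) matrix onto its principal
    components (sorted by explained variance) and keep the top fraction."""
    pca = PCA()
    Z = pca.fit_transform(X)
    n_keep = max(1, int(keep * Z.shape[1]))
    return Z[:, :n_keep], pca.explained_variance_ratio_[:n_keep]

# e.g. keep the top 50% PCs of an 84-dimensional statistical feature matrix
Z, var = pca_select(np.random.randn(160, 84), keep=0.5)
```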
features in each type are used to classify the emotion for each 690
637 2.5. Classifier subject (i.e. the results before performing any FS method). 691
Then the mean accuracy among all the subjects is computed. 692
638 We used support vector machine (SVM), K-nearest neighbors The average classification results based on each of the five 693
639 (KNN) and decision tree (DT) classifiers based on their types, Statistical features, Power features, Entropy features, 694
640 promising empirical results in the EEG based emotion Fractal Dimension and wavelet energy features are plotted in 695
641 recognition [9,11,14,27,57,66]. The RBF kernel SVM classifier Fig. 4. From the mean accuracy, it is evident that the average 696
642 based on LIBSVM [67] is used in our work. SVM constructs a classification accuracy of statistical features is better than the 697
643 separate hyperplane between the two different classes and other feature types for all classifiers, SVM, KNN and decision 698
644 tries to maximize the distance of each class from the separated tree. It is clear that statistical features can best distinguish 699
645 hyperplane [8,21]. between low level and high level of valence, arousal and 700
646 On the other hand, the KNN, due to its non-parametric dominance than the other four types of features. Using the 701
647 learning approach, is considered very reliable for EEG data statistical features, we obtained the highest classification 702
648 classification. Based on its core strategy, KNN looks for k accuracy for valence, arousal and dominance as 77.62%, 703
649 number of samples (called k-neighbors) nearest to the 78.96% and 77.60% respectively, with the SVM classifier. 704
650 predicting sample. Each neighbor votes for their class. The Similarly, the statistical features outperformed the other four 705
651 predicting sample is assigned to the most voted class [57]. DT types of features yielding 75.02%, 74.71% and 76.36% accuracy 706
652 has promising results in terms of minimum computation time for valence, arousal and dominance respectively, using KNN 707
653 and is thus considered reliable for real-time systems [66]. We classifier. Finally, with decision tree the classification accura- 708
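The tuning and evaluation protocol can be sketched with scikit-learn, whose `SVC` wraps LIBSVM (the library named above); how exactly the grid search nests inside the 5-fold split is our assumption:

```python
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# hyperparameter grid from the text: C and gamma in 10^-4 .. 10^6
GRID = {"C": [10.0 ** e for e in range(-4, 7)],
        "gamma": [10.0 ** e for e in range(-4, 7)]}

def evaluate_subject(X, y):
    """Per-subject protocol: grid-search an RBF-kernel SVM, then report the
    5-fold cross-validated accuracy over the 160 labeled segments."""
    search = GridSearchCV(SVC(kernel="rbf"), GRID, cv=5)
    search.fit(X, y)
    return cross_val_score(search.best_estimator_, X, y, cv=5).mean()
```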
3. Results

For a better understanding of the relationship between EEG and emotional responses, we investigated which feature type is most reliable for emotion recognition. We also compared feature selection methods and provide our evaluation of the best method for feature selection in emotion recognition systems. Finally, we present how the emotion classification is affected if the original feature vector is reduced and only the top 100 features are used.

3.1. Effectiveness of feature type for emotion recognition before FS

To evaluate the effectiveness of each feature type for emotion recognition, we use only one type of feature at a time. The results in this subsection are obtained with all the features of each type used to classify the emotions of each subject (i.e. the results before performing any FS). The mean accuracy over all subjects is then computed. The average classification results based on each of the five types (statistical, power, entropy, fractal dimension and wavelet energy features) are plotted in Fig. 4. From the mean accuracy it is evident that the average classification accuracy of the statistical features is better than that of the other feature types for all classifiers: SVM, KNN and decision tree. It is clear that the statistical features can distinguish between low and high levels of valence, arousal and dominance better than the other four types of features. Using the statistical features, we obtained the highest classification accuracies for valence, arousal and dominance of 77.62%, 78.96% and 77.60% respectively, with the SVM classifier. Similarly, the statistical features outperformed the other four types of features, yielding 75.02%, 74.71% and 76.36% accuracy for valence, arousal and dominance respectively, using the KNN classifier. Finally, with the decision tree, the classification accuracies are 71.84%, 72.93% and 73.91% for valence, arousal and dominance respectively. These results indicate that time-domain statistics are the best attributes to distinguish among different emotional states.

It is obvious from these results that the classification performance of the statistical features is better than that of the other features with all classifiers, SVM, KNN and DT. It is also clear that SVM outperformed KNN and DT in classifying valence and dominance for each type of feature, whereas in arousal
Fig. 4 – Classification accuracies before PCA with respect to each type of features.
classification KNN is slightly better than SVM using power features. This partially reflects the finding that the SVM classifier outperformed KNN and DT in EEG-based emotion recognition [14].

3.2. Comparison of RBA and PCA as FS methods

In EEG-based emotion recognition, the selection of a proper time-window is very important and has a significant effect on the performance of the system [9]. We extracted the features from the 10-s EEG segments (as explained in Section 2.2). In this case, the number of observations for each subject becomes 160 (4 (EEG segments) × 40 (videos) = 160). Hence, the total number of observations for 32 subjects is 5120 (32 (subjects) × 160 = 5120). Based on the best performance of the statistical features and the SVM classifier in the previous section, we tested the performance of RBA and PCA as feature selection methods. We applied RBA and PCA to the statistical features and ranked the features (in the case of RBA) and the principal components (in the case of PCA) in descending order. We selected the top features, used the SVM classifier, and compared the results obtained with RBA and PCA. The results are plotted in Fig. 5. It is clear from the figure that, most often, the features selected using PCA can predict the emotion better than the features selected by RBA. Therefore, we adopted PCA as the FS method in the subsequent analysis.

3.3. Effectiveness of feature type for emotion recognition after FS

Next, we applied PCA FS and ranked the principal components within each category. The PCs are sorted in descending order based on their importance. From each feature type we used the top 50% of the PCs for the classification. The results obtained for the top 50% of the PCs are shown in Fig. 6 for valence, arousal and dominance. Again, the best performance is achieved by the statistical features among all types. The statistical features outperformed the other types of features, with 77.54% and 79% accuracy for valence and dominance respectively using the KNN classifier, and 78.5% accuracy for arousal using the SVM classifier. Selecting the top 50% of PCs reduced the dimension of the feature matrix by half, which is a significant computational improvement. After using the top 50% of PCs, the prediction time improved, as shown in Table 2. This will help in developing a system that can recognize emotions in real time. Also, the prediction of dominance was slightly more accurate when we used the top 50% of principal components: dominance accuracy increased from 77.60% to 79%. We observed that there is no significant difference in the prediction rate when we use the top 50% of PCs, which indicates that most of the data variance is explained by the top PCs. Therefore, we computed the proportion of the data variance explained by each PC within each category of features and plotted it for the top five PCs in Fig. 7. It is clear from the figure that almost 90% of the data variance can be explained by the top five PCs in each category of features.

Next, we compared the performance of the top 100 features with all features combined (i.e. 420 features). We combined all the feature types into one, creating a full feature vector (i.e. 420 features), and applied PCA FS. Based on the importance score of each PC, the PCs are sorted in descending order. Out of 420, the top 100 PCs and all 420 features are applied separately to predict the emotion class, and the results are plotted in Fig. 8. Using all features at once, the best accuracy obtained for valence is 71.5% using the SVM classifier. Similarly, the best
Fig. 5 – Comparison of feature selection methods using statistical features as feature type and SVM as classifier.
accuracies for arousal and dominance are 70.9% and 71.3% respectively, using all 420 features and the SVM classifier. When the top 100 PCs are used as the feature vector, the classification accuracy decreases slightly to 71.2%, 70.5% and 71.2% for valence, arousal and dominance respectively, but this decrease is not significant. This indicates that most of the data variance is handled by the top 100 PCs. Reducing the feature vector to 100 PCs can increase the feasibility of real-time emotion recognition systems without affecting the performance of the system. The results show that a reliable real-time
Fig. 6 – Classification accuracies after applying PCA with respect to each type of features.
Table 2 – Comparison of prediction time between all features (before PCA) and top 50% PCs from each category (after PCA). Note: time is given in seconds.

                   Statistical        Power              FD                 Entropy            Wavelet energy
                   Before    After    Before    After    Before    After    Before    After    Before    After
SVM   Arousal      452.31    353.97   632.66    414.89   372.26    372.69   445.30    346.39   402.94    383.55
      Valence      440.76    358.44   635.13    421.77   361.39    371.53   412.88    362.21   409.71    385.83
      Dominance    717.36    377.67   1286.34   443.23   422.43    420.30   452.95    388.94   391.46    375.01
KNN   Arousal      381.51    178.54   684.60    335.92   204.11    112.58   298.77    143.73   261.53    135.52
      Valence      347.28    176.68   651.26    337.02   200.49    112.95   294.27    146.41   255.59    139.10
      Dominance    351.78    175.30   660.61    352.35   203.75    132.95   304.31    160.16   254.43    135.25
DT    Arousal      72.06     63.88    128.25    81.43    59.58     55.31    71.03     56.07    74.54     57.11
      Valence      71.65     61.15    105.40    77.23    62.98     51.52    69.54     55.83    72.74     55.48
      Dominance    73.94     62.71    106.49    93.07    63.78     52.72    70.49     58.27    70.90     56.25
Fig. 7 – The amount of variance explained by the top 5 PCs within each feature category.
Fig. 8 – Classification results when all (420) features and top 100 PCs of all features are applied.
emotion recognition system can be implemented with a smaller number of PCs as the feature set than with the original high-dimensional feature vector. This will enhance the computational performance of the system and help to recognize the emotions faster.
3.4. Statistical analysis

To determine whether emotion recognition is affected by the different types of features, and whether there is any significant difference between the classification performances of the different feature types, we performed two-tailed paired t-tests. As there are five types of features, we performed the t-test 10 times, each time comparing two feature types. The null hypothesis is "there is no difference among the classification results obtained with different feature types". All the effects were compared using a significance level of p < 0.05; if p > 0.05 the null hypothesis is accepted, otherwise it is rejected. The comparisons were conducted under two conditions, all features in each category and the top 50% of PCs in each category, separately. All the statistical results (p-values) are given in Table 3.

Table 3 – Statistical results (p-values) of the t-tests using all features and the top 50% of PCs in each category (stat: statistical, WE: wavelet features, FD: fractal dimension).

Condition:   All features in each category   |   Top 50% PCs in each category
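The ten pairwise comparisons can be reproduced with `scipy.stats.ttest_rel`; this sketch assumes per-subject accuracy arrays collected for each feature type (names illustrative):

```python
from itertools import combinations
from scipy.stats import ttest_rel

def pairwise_feature_tests(acc_by_type):
    """acc_by_type maps each feature-type name to a per-subject accuracy
    array (same subject order). Runs the 10 two-tailed paired t-tests
    described above and returns the p-value of each pair."""
    return {(a, b): ttest_rel(acc_by_type[a], acc_by_type[b]).pvalue
            for a, b in combinations(acc_by_type, 2)}
```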
3.4.1. All features in each category
The results show that the performance of the statistical features in classifying valence, arousal and dominance was significantly better (p < 0.001) than that of the other four types of features, irrespective of the classifier used. In valence classification, power features significantly outperformed wavelet features with p = 0.025 using KNN, and entropy features with p = 0.0244 using DT. Similarly, power features outperformed fractal dimension with p = 0.003 in classifying valence using DT. Wavelet features outperformed fractal dimension with p = 0.04 in classifying valence using DT. In arousal classification, there was no significant difference in the classification accuracies among the different feature types except for the statistical features, where p < 0.05. In dominance, the classification results were also better for the statistical features than for the other feature types (p < 0.05). Similarly, power features outperformed fractal dimension with p = 0.032 using the KNN classifier. Also, power features outperformed entropy features with p = 0.019 and wavelet features with p = 0.038 using DT.

3.4.2. Top 50% PCs in each category
After PCA, the top 50% of PCs in each feature type were used for classification and their results were compared. Again, the statistical features outperformed the other four feature types in valence, arousal and dominance with p < 0.05, as shown in Table 3. In valence classification, power features outperformed wavelet features with p = 0.0029 using KNN, and fractal dimension and entropy with p = 0.016 and 0.022 using DT
respectively. In arousal classification, the results of the power features were significantly different from fractal dimension with p = 0.010 using DT. In dominance, the power features and fractal dimension were significant with p = 0.038 and 0.047 using KNN and DT respectively. Similarly, in dominance classification, wavelet features outperformed power features with p = 0.005 and entropy with p = 0.007, and fractal dimension outperformed entropy with p = 0.02 using DT.
4. Discussion

The results obtained in the current study and the emotion recognition rate verify that EEG data contains sufficient information to discriminate among different emotional states. It is worth noting that the feasibility of using fewer electrodes to train classifiers for real-time HCI applications is supported by our results. The accuracy among the different types of features varied significantly, and the main finding of the current study is that the statistical features are the most suitable feature type for emotion recognition. This also indicates that the music videos produced a significant effect on the affective state of the participants, and that this effect is strongly reflected in the statistics of the time-series EEG. In previous EEG-based emotion recognition studies, researchers have mainly focused on the frequency-domain characteristics of the EEG signals, for example the power of different EEG bands. Our results suggest that features related to time-domain statistics seem more reliable, although these features have received less attention so far. Compared to feature extraction methods in other domains (i.e. the frequency or time–frequency domain), the advantage of the statistical features is that they utilize more time-domain information. As the EEG signal obtained by the EEG device is originally in the time domain, time-domain statistical features may provide a direct way of predicting the emotions from the original EEG signal, with no need to convert the time-domain signal into other domains, for example the frequency domain. One possible reason that the statistical features outperformed the other types could be their stability. In earlier studies, feature stability was quantified by the intra-class correlation coefficient (ICC), which allows the assessment of similarities in group data [38]. This indicates that the statistical features can best describe whether data from the same group resemble each other. The best average ICC of the statistical features over the other types of features in the previous study [38] provides a good justification of the superiority of the statistical features in the current study.

The classification rate of the current study is promising compared to previous studies that validated their methods on the DEAP dataset. For this purpose, we have compared our method with recent emotion recognition studies that have also used the DEAP dataset, and an overall comparison is given in Table 4. A new method for feature extraction, called empirical mode decomposition (EMD), has been proposed for emotion recognition using EEG signals. Zhuang et al. utilized this method and performed binary (low/high) classification of arousal and valence using an SVM classifier. They obtained 70.41% and 72.10% accuracy for arousal and valence respectively [32]. Similarly, a binary (low/high) classification of arousal and valence using the EMD feature extraction method and a KNN classifier was performed in [68]; they achieved 51% and 67% accuracy for arousal and valence respectively. Hao et al. presented a deep learning framework for emotion recognition using the DEAP dataset. In this framework they proposed a multi-band feature matrix (MFM) approach in which they utilized the spatial and frequency-band characteristics of the multichannel EEG signal and performed binary classification of arousal, valence and dominance (3D model). They achieved 68.28%, 66.73% and 67.25% classification accuracy for arousal, valence and dominance respectively [69]. In another study, the rhythmic (i.e. different bands) and time (RT) characteristics of the EEG signal were given consideration and binary classification was performed for valence and arousal using the DEAP data [70]. In that work, for each EEG band, nine different rectangular time-segmentation windows (0.25 s, 0.5 s, 0.75 s, 1 s, 2 s, 3 s, 4 s, 5 s, and 6 s) were used and the best EEG band and time window were identified. They achieved best accuracies of 69.1% for arousal and 62.12% for valence using a time window of less than 1 s. Our proposed method outperformed all these methods, with an average accuracy of 77.62%, 78.96% and 77.60% for valence, arousal and dominance respectively.

5. Conclusions

In the current study an EEG-based emotion recognition classification has been carried out. A comprehensive set of features (power spectral density (power features), entropy, fractal dimension, statistical, and wavelet features) is extracted from the EEG signal. We performed a quantitative analysis by comparing the feature extraction methods using three different machine learning classifiers, SVM, KNN and DT. To avoid over-specification from using a large number of extracted features, and to make the feature extraction feasible online, we reduced the feature space with a feature selection (FS) technique. Hence, we identified the top relevant features with PCA FS and investigated which type of features (power, entropy, FD, statistical or wavelet) is most promising. The results showed that the statistical features are the most sensitive feature metric in characterizing brain emotional dynamics. We compared the reliability of the features by exploiting all the
features and then used the top 50% of features from each category. We confirmed the suitability of the most sensitive feature type with all classifiers, SVM, KNN and DT. We found that in both cases (all features and top 50% of features) the statistical features outperformed the other feature types. Our system achieves an overall best classification accuracy of 77.62%, 78.96% and 77.60% for binary classification of valence, arousal and dominance respectively, using all features in the statistical category. Hence, we conclude that EEG time-domain statistics have a significant role in classifying the emotional state. Moreover, reducing the feature space to half using PCA FS will help in developing a fast emotion recognition system without affecting its performance significantly. The results of the current study give researchers a comprehensive overview of the most relevant features to be extracted from the EEG signal for emotional state prediction. This may lead to a feasible online feature extraction framework and the realization of a real-time HCI system for emotion recognition.

Conflict of interest

The authors have no conflict of interest.

Author statement

Rab Nawaz: Conceptualization, Data curation, Methodology, Software, Investigation, Validation, Writing – original draft preparation, Writing – revised draft preparation.
Kit Hwa Cheah: Software, Visualization, Investigation, Validation.
Humaira Nisar: Funding acquisition, Conceptualization, Supervision, Writing – reviewing and editing.
Yap Vooi Voon: Supervision.

Acknowledgments

This work is supported by the Universiti Tunku Abdul Rahman Research Fund (UTARRF) (Grant Number: IPSR/RMC/UTARRF/2019-C1/H01).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.bbe.2020.04.005.

References

[1] Jenke R, Peer A, Buss M. Feature extraction and selection for emotion recognition from EEG. IEEE Trans Affect Comput 2014;5:327–39.
[2] Thammasan N, Moriyama K, Fukui K, Numao M. Continuous music-emotion recognition based on electroencephalogram. IEICE Trans Inf Syst 2016;99:1234–41.
[3] Black MJ, Yacoob Y. Recognizing facial expressions in image sequences using local parameterized models of image motion. Int J Comput Vis 1997;25:23–48.
[4] Anderson K, McOwan PW. A real-time automated system for the recognition of human facial expressions. IEEE Trans Syst Man Cybern Part B 2006;36:96–105.
[5] Soleymani M, Asghari-Esfeden S, Fu Y, Pantic M. Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Trans Affect Comput 2016;17–28.
[6] Brosschot JF, Thayer JF. Heart rate response is longer after negative emotions than after positive emotions. Int J Psychophysiol 2003;50:181–7.
[7] Kim KH, Bang SW, Kim SR. Emotion recognition system using short-term monitoring of physiological signals. Med Biol Eng Comput 2004;42:419–27.
[8] Wang X-W, Nie D, Lu B-L. Emotional state classification from EEG data using machine learning approach. Neurocomputing 2014;129:94–106.
[9] Mohammadi Z, Frounchi J, Amiri M. Wavelet-based emotion recognition system using EEG signal. Neural Comput Appl 2017;28:1985–90.
[10] Yin Z, Zhao M, Wang Y, Yang J, Zhang J. Recognition of emotions using multimodal physiological signals and an ensemble deep learning model. Comput Methods Programs Biomed 2017;140:93–110.
[11] Zhang Y, Ji X, Zhang S. An approach to EEG-based emotion recognition using combined feature extraction method. Neurosci Lett 2016;633:152–7.
[12] Al-Nafjan A, Hosny M, Al-Wabil A, Al-Ohali Y. Classification of human emotions from electroencephalogram (EEG) signal using deep neural network. Int J Adv Comput Sci Appl 2017;8:419–25.
[13] Koelstra S, Muhl C, Soleymani M, Lee J-S, Yazdani A, Ebrahimi T, et al. DEAP: a database for emotion analysis using physiological signals. IEEE Trans Affect Comput 2012;3:18–31.
[14] Bhatti AM, Majid M, Anwar SM, Khan B. Human emotion recognition and analysis in response to audio music using brain signals. Comput Human Behav 2016;65:267–75.
[15] Harmon-Jones E, Gable PA, Peterson CK. The role of asymmetric frontal cortical activity in emotion-related phenomena: a review and update. Biol Psychol 2010;84:451–62.
[16] Pizzagalli D, Regard M, Lehmann D. Rapid emotional face processing in the human right and left brain hemispheres: an ERP study. Neuroreport 1999;10:2691–8.
[17] Lan Z, Sourina O, Wang L, Scherer R, Müller-Putz G. Unsupervised feature learning for EEG-based emotion recognition. 2017 Int. Conf. Cyberworlds; 2017. p. 182–5.
[18] Schaaff K, Schultz T. Towards emotion recognition from electroencephalographic signals. Affect. Comput. Intell. Interact. Work. 2009 (ACII 2009), 3rd Int. Conf.; 2009. p. 1–6.
[19] Hadjidimitriou SK, Hadjileontiadis LJ. Toward an EEG-based recognition of music liking using time–frequency analysis. IEEE Trans Biomed Eng 2012;59:3498–510.
[20] Petrantonakis PC, Hadjileontiadis LJ. Emotion recognition from EEG using higher order crossings. IEEE Trans Inf Technol Biomed 2010;14:186–97.
[21] Liu Y, Sourina O, Nguyen MK. Real-time EEG-based emotion recognition and its applications. Transactions on computational science XII. Springer; 2011. p. 256–77.
[22] Tuomas E, Vuoskoski K. A review of music and emotion studies: approaches, emotion models, and stimuli. Music Percept An Interdiscip J 2013;30:307–40. http://dx.doi.org/10.1525/jams.2009.62.1.145
[23] Cowen AS, Keltner D. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proc Natl Acad Sci U S A 2017;114:E7900–9.
[24] Russell JA. Affective space is bipolar. J Pers Soc Psychol 1979;37:345–56. http://dx.doi.org/10.1037/0022-3514.37.3.345
[25] Liu Y, Sourina O. Real-time subject-dependent EEG-based emotion recognition algorithm. Transactions on Computational Science XXIII. Springer; 2014. p. 199–223.
[26] Mehrabian A. Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament. Curr Psychol 1996;14:261–92.
[27] Liu Y, Sourina O. EEG-based dominance level recognition for emotion-enabled interaction. 2012 IEEE Int. Conf. Multimed. Expo (ICME); 2012. pp. 1039–44.
[28] Tandle AL, Joshi MS, Dharmadhikari AS, Jaiswal SV. Mental state and emotion detection from musically stimulated EEG. Brain Inform 2018;5:14.
[29] Scherer KR. Which emotions can be induced by music? What are the underlying mechanisms? And how can we measure them? J New Music Res 2004;33:239–51.
[30] Fernández-Sotos A, Fernández-Caballero A, Latorre JM. Influence of tempo and rhythmic unit in musical emotion regulation. Front Comput Neurosci 2016;10:80.
[31] Hubert W, de Jong-Meyer R. Autonomic, neuroendocrine, and subjective responses to emotion-inducing film stimuli. Int J Psychophysiol 1991;11:131–40.
[32] Zhuang N, Zeng Y, Tong L, Zhang C, Zhang H, Yan B. Emotion recognition from EEG signals using multidimensional information in EMD domain. Biomed Res Int 2017. http://dx.doi.org/10.1155/2017/8317357
[33] Ros T, Munneke MAM, Parkinson LA, Gruzelier JH. Neurofeedback facilitation of implicit motor learning. Biol Psychol 2014;95:54–8.
[34] Phneah SW, Nisar H. EEG-based alpha neurofeedback training for mood enhancement. Australas Phys Eng Sci Med 2017;40:325–36.
[35] Hou Y, Chen S. Distinguishing different emotions evoked by music via electroencephalographic signals. Comput Intell Neurosci 2019.
[36] Candra H, Yuwono M, Chai R, Handojoseno A, Elamvazuthi I, Nguyen HT, et al. Investigation of window size in classification of EEG-emotion signal with wavelet entropy and support vector machine. 2015 37th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc.; 2015. pp. 7250–3.
[37] Picard RW, Vyzas E, Healey J. Toward machine emotional intelligence: analysis of affective physiological state. IEEE Trans Pattern Anal Mach Intell 2001;23:1175–91.
[38] Lan Z, Sourina O, Wang L, Liu Y. Real-time EEG-based emotion monitoring using stable features. Vis Comput 2016;32:347–58.
[39] Wang W, Gill EW. Comparison of a modified periodogram and standard periodogram for current estimation by an HF surface radar. OCEANS 2014-TAIPEI; 2014. pp. 1–7.
[40] Lin Y-P, Wang C-H, Jung T-P, Wu T-L, Jeng S-K, Duann J-R, et al. EEG-based emotion recognition in music listening. IEEE Trans Biomed Eng 2010;57:1798–806.
[41] Li M, Lu B-L. Emotion classification based on gamma-band EEG. 2009 Annu. Int. Conf. IEEE Eng. Med. Biol. Soc.; 2009. pp. 1223–6.
[42] Nawaz R, Nisar H, Voon YV. The effect of music on human brain; frequency domain and time series analysis using electroencephalogram. IEEE Access 2018;6. http://dx.doi.org/10.1109/ACCESS.2018.2855194
[43] Bandt C, Pompe B. Permutation entropy: a natural complexity measure for time series. Phys Rev Lett 2002;88:174102.
[44] Olofsen E, Sleigh JW, Dahan A. Permutation entropy of the electroencephalogram: a measure of anaesthetic drug effect. Br J Anaesth 2008;101:810–21. http://dx.doi.org/10.1093/bja/aen290
[45] Inouye T, Shinosaki K, Sakamoto H, Toi S, Ukai S, Iyama A, et al. Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalogr Clin Neurophysiol 1991;79:204–10.
[46] Grassberger P, Schreiber T, Schaffrath C. Nonlinear time sequence analysis. Int J Bifurc Chaos 1991;1:521–47.
[47] Sleigh JW, Olofsen E, Dahan A, De Goede J, Steyn-Ross DA. Entropies of the EEG: the effects of general anaesthesia; 2001.
[48] Hosseini SA, Naghibi-Sistani MB. Emotion recognition method using entropy analysis of EEG signals. Int J Image Graph Signal Process 2011;3:30.
[49] Richman JS, Moorman JR. Physiological time-series analysis using approximate entropy and sample entropy. Am J Physiol Heart Circ Physiol 2000;278:H2039–49.
[50] Jie X, Cao R, Li L. Emotion recognition based on the sample entropy of EEG. Biomed Mater Eng 2014;24:1185–92.
[51] Puthankattil SD, Joseph PK. Analysis of EEG signals using wavelet entropy and approximate entropy: a case study on depression patients. Int J Med Heal Biomed Pharm Eng 2014;8:420–4.
[52] García-Martínez B, Martínez-Rodrigo A, Alcaraz R, Fernández-Caballero A. A review on nonlinear methods using electroencephalographic recordings for emotion recognition. IEEE Trans Affect Comput 2019.
[53] Esteller R, Vachtsevanos G, Echauz J, Litt B. A comparison of waveform fractal dimension algorithms. IEEE Trans Circuits Syst I Fundam Theory Appl 2001;48:177–83.
[54] Thammasan N, Fukui K, Numao M. Multimodal fusion of EEG and musical features in music-emotion recognition. Thirty-First AAAI Conf. Artif. Intell.; 2017.
[55] Petrosian A. Kolmogorov complexity of finite sequences and recognition of different preictal EEG patterns. Proc. Eighth IEEE Symp. Comput. Med. Syst.; 1995. pp. 212–7.
[56] Higuchi T. Approach to an irregular time series on the basis of the fractal theory. Phys D Nonlinear Phenom 1988;31:277–83.
[57] Li M, Xu H, Liu X, Lu S. Emotion recognition from multichannel EEG signals using K-nearest neighbor classification. Technol Health Care 2018;26:509–19.
[58] Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH. Relief-based feature selection: introduction and review. J Biomed Inform 2018;85:189–203.
[59] Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. A review of feature selection methods on synthetic data. Knowl Inf Syst 2013;34:483–519.
[60] Song F, Guo Z, Mei D. Feature selection using principal component analysis. 2010 Int. Conf. Syst. Sci. Eng. Des. Manuf. Inform., vol. 1; 2010. pp. 27–30.
[61] Kira K, Rendell LA. A practical approach to feature selection. Mach. Learn. Proc. 1992. Elsevier; 1992. p. 249–56.
[62] Kira K, Rendell LA, et al. The feature selection problem: traditional methods and a new algorithm. AAAI 1992;2:129–34.
[63] Yang Y-H, Lin Y-C, Su Y-F, Chen HH. A regression approach to music emotion recognition. IEEE Trans Audio Speech Lang Process 2008;16:448–57.
[64] Kononenko I, Šikonja MR. Non-myopic feature quality evaluation with (R)ReliefF. Comput Methods Featur Sel 2008;169–91.
[65] Kooperberg C, Dai JY, Hsu L, Tzeng J-Y. Statistical approaches to gene × environment interactions for complex phenotypes. MIT Press; 2016.
[66] Al Zoubi O, Awad M, Kasabov NK. Anytime multipurpose emotion recognition from EEG data using a Liquid State Machine based framework. Artif Intell Med 2018;86:1–8.
[67] Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2011;2:27.
[68] Mert A, Akan A. Emotion recognition from EEG signals by using multivariate empirical mode decomposition. Pattern Anal Appl 2018;21:81–9.
[69] Chao H, Dong L, Liu Y, Lu B. Emotion recognition from multiband EEG signals using CapsNet. Sensors 2019;19:2212.
[70] Yan J, Chen S, Deng S. A EEG-based emotion recognition model with rhythm and time characteristics. Brain Inform 2019;6:7.