Literature Survey
This chapter surveys topics related to emotion recognition, such as the datasets available and the different equipment related to the 10-20 probe system, along with the research carried out in the last decade. The description further extends to feature extraction methods, signal processing and classification algorithms, and to the methodologies adopted by different researchers.
2.1 Introduction
A wide range of emotions have been proposed by psychological researchers, but data scientists have simplified these emotions and limited them to a few for the classification of emotions from EEG signals. Some researchers have classified emotion into three classes such as anger, surprise and others, where 'others' is defined as any emotion other than the first two. The analysis of EEG signals concerning emotion has been based on the temporal and spatial domains. The EEG signal generally consists of five frequency bands: delta, theta, alpha, beta and gamma. The sources of EEG signals, their characteristics and their uses in emotion recognition are discussed in this chapter, and the harvesting of EEG signals and their pre-processing has been described. The EEG signals are non-controlled signals in the sense that human beings are not capable of manipulating them, so a proper objective is set while eliciting different emotions. The same audio/video clip is unable to produce the same emotion in different subjects, so the choice of clips and their duration is set carefully. The 10-20 method and the different probes used by different researchers have been discussed in this chapter. The noise-free harvesting of EEG signals is an important topic of discussion. Feature selection and its statistical tools are also important for efficient emotion recognition. There is no direct relation between EEG signals and a manual score of emotion, since EEG signals are involuntary whereas a manual score is voluntary. A special selection and segmentation of EEG signals are often necessary to correlate with the manual score of emotion. Sometimes special signal processing tools are used to enhance this relation.
In the literature, different statistical classifiers have been explored for the efficient classification of emotion. SVM has been a prominent classifier for emotion recognition; however, other classifiers have also been used. It is observed from the related studies that features are extracted in the temporal and frequency domains. Among the frequency-domain features, the Fourier Transform or Short-Time Fourier Transform has been widely used, while time-domain features have been applied in different formats. Among the classifiers, SVM (Wang et al., 2011), k-Nearest Neighbor (KNN) (Mohammadi et al., 2017) and Naïve Bayes (NB) (Huang et al., 2012) have also been used in several emotion classifications. Though several researchers have explored different techniques for the classification task, the approaches and reported results differ in each case.
Between 2009 and 2016, a significant number of highly popular emotion-recognition research papers were reviewed by Alarcao and Fonseca (2017), who found that as many as 42 distinct feature selection approaches had been explored. The authors further
revealed that different forms of Fourier transform such as Short-time Fourier Trans-
form (STFT), Discrete Fourier Transform (DFT), and Wavelet Transform have been
the most popular feature estimation techniques besides statistical Power Spectral
Density (PSD).
Of late, different variants of neural networks, along with different activation functions and enhancement techniques, have also been reported for the efficient classification of emotions. Extensive research is being done on emotion recognition, yet uniformity in results has not been reported so far. There is not even uniformity in the reporting of research: prediction accuracies are reported with different metrics. Brouwer et al. (2015) have recommended six ways to bring uniformity to research on emotion from EEG signals; these recommendations are also discussed by Alarcao and Fonseca (2017), and the six major approaches are summarized in Figure 2.1.
The somatosensory system is the part of the human sensory system which evokes sensations such as touch, pain and pleasure. This sensory feeling is transported to the brain through the spinal cord or other nervous pathways for proper interpretation and reaction. Event-Related Potentials (ERP) (Sur and Sinha, 2009) are the electrical activity of the brain which becomes active during a special event or stimulus. The early waves of ERPs last for about 100 milliseconds after the appearance of the stimulus and are termed exogenous, as they depend on the parameters of the stimulus; the later part of the wave is termed endogenous. Event-Related Desynchronization (ERD) (Pfurtscheller, 1991) is related to the attenuation of alpha waves for a short duration, while Event-Related Synchronization (ERS) refers to the enhancement of alpha waves for a short duration. During visual stimuli, ERD is found in the occipital zone and ERS over the central part of the brain close to C3 and C4.
The evoked potential is electrical activity resulting from different stimuli, recorded in the form of EEG signals. Evoked potentials are divided into four categories: (1) visual evoked potential, (2) auditory evoked potential, (3) motor evoked potential and (4) somatosensory evoked potential. The amygdala is the main part of the emotional system of the human body. It gives instantaneous responses to stimuli in the form of emotion and identifies the facial expression of a friend or foe, so the affective system is also associated with the amygdala. The thalamus is another part of the limbic system which processes emotions such as happiness, pleasure, fear, sadness and disgust. It also processes all sensory inputs except olfactory inputs. Another relevant region is the Ventral Tegmental Area (VTA), located in the central part of the brain and responsible for the impulsive actions of the human being. It also processes information from the amygdala.
The 10-20 system is a standard for placing electrodes on the skull of the human brain. "10-20" means that electrodes are placed at 10% or 20% of the nasion-to-inion length; it is the distance by which all probes are separated from neighbouring probes. In this system the whole skull is divided into four regions: frontal, temporal, parietal and occipital. A line between nasion and inion divides the skull into two halves; probes on the left side of the skull are suffixed with an odd number and those on the right side with an even number, while all probes on the line joining nasion to inion are marked with the suffix zero or z. There are many recording machines with skull caps available on the market. The BioSemi ActiveTwo (Khalili and Moradi, 2009) is a 280-channel DC amplifier with 24-bit resolution coupled with four different sampling frequencies.
Some of the EEG recording equipment and channels with sampling frequency
have been presented in Table 2.1. Li and Lu (2009) used an Ag/AgCl 62-channel EEG cap for the recording of EEG signals of ten subjects at a sampling rate of 1 kHz. Further to this, they explored features with a statistical tool known as Common Spatial Patterns (CSP). Lin et al. (2009) have harvested EEG signals using the Neuroscan module with 32 channels. Bhatti et al. (2016) recorded EEG signals using a NeuroSky headset while subjects listened to music; they recorded four different emotions, namely happy, sad, love and anger, in response to audio music.
Table 2.1: Some of the EEG recording equipment and channels with the sampling
frequency
Emotions are elicited, or invoked in the brain, by some stimulus, and these stimuli could be any object like audio, visuals or touch. Audio and video have long been used to invoke emotion in the human mind. The reliability of the data depends on the stimulus which elicits the emotion; other parameters of reliability are the number of subjects, the gender of the subjects, their median age and finally the duration of the signal recording. The International Affective Picture System (IAPS) is one of the databases designed to elicit different emotions in the human mind.
Yazdani et al. (2009) used video clips from YouTube as stimuli to evoke the six basic emotions defined by Paul Ekman: joy, sadness, surprise, disgust, fear and anger. The mean, minimum and maximum durations of the video clips were also controlled.

Soleymani et al. (2015) have studied facial expression and EEG signals and reported successful extraction of emotion from facial activities. The EEG signals are involuntary, so the information content of the EEG signal is a more genuine manifestation of emotion; a facial expression may be different from the actual emotion experienced by the subject, since she/he may mask it.
Kroupi et al. (2014) have studied the effects of the olfactory bulb in the elicitation of emotion based on audio, video and odour, in which the Wasserstein distance metric has been used to estimate the power difference between trials and baselines. However, no hypothesis is proposed with a link to the class or measure of emotion. Chanel et al. (2011) have worked on emotion while subjects played games at various difficulty levels; in this case the features are the theta (θ), alpha (α) and beta (β) bands of the EEG signal and some other physiological signals. A list of stimuli, durations, numbers of subjects and emotions recorded is presented in Table 2.2.
Table 2.2: Stimuli, durations, number of subjects and emotions recorded
It is clear from this particular section that wide-ranging stimuli have been applied to evoke emotion, from YouTube videos to facial expressions, besides EEG signals. It is therefore difficult to standardise the stimulus, and so also the classification of emotion.
There are several datasets publicly available for research, as mentioned in Table 2.3. These datasets vary widely in terms of the number of subjects; generally, the number of subjects in each dataset is small and differs between datasets. The stimuli are more or less different, and it is further observed that the length of the dataset, the sampling rate and the pre-processing also vary. Two prominent datasets, SEED and DEAP, are discussed in greater detail in this section.
SEED dataset: the SEED dataset is recorded from fifteen Chinese nationals with an average age of 23.27 years (standard deviation ±2.37). The group consists of seven males and eight females. The SEED dataset is recorded based on fifteen Chinese film clips, each of about four minutes' duration. Before the actual exposure to the film clip, the subject is given five minutes for preparation; after the actual exposure of four minutes, the subjects are given 45 seconds for self-assessment.

Further, the data is stored in two folders, named 'Preprocessed EEG' and 'Extracted Features'. Pre-processed data is segmented and downsampled to 200 Hz before being stored in the 'Preprocessed EEG' folder. Data is recorded three times for each subject with a gap of one week between two recordings, so there are 45 Matlab files, one for each experiment. In the 'Extracted Features' folder, different features are stored: differential entropy (DE), differential asymmetry (DASM) and rational asymmetry (RASM). The feature signals are further smoothened by a moving average filter.
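Under the common assumption that a band-limited EEG segment is approximately Gaussian, the differential entropy feature reduces to a closed form in the segment's variance, and DASM and RASM follow as differences and ratios over symmetric electrode pairs. A minimal sketch (function names are illustrative, not taken from the SEED release):

```python
import numpy as np

def differential_entropy(band_signal):
    """DE of a band-limited segment; for a Gaussian band this is
    0.5 * ln(2 * pi * e * variance)."""
    var = np.var(band_signal)
    return 0.5 * np.log(2 * np.pi * np.e * var)

def dasm(de_left, de_right):
    """Differential asymmetry: DE difference of a symmetric electrode pair."""
    return de_left - de_right

def rasm(de_left, de_right):
    """Rational asymmetry: DE ratio of a symmetric electrode pair."""
    return de_left / de_right

# Unit-variance Gaussian noise has DE = 0.5 * ln(2*pi*e), about 1.419 nats.
rng = np.random.default_rng(0)
de = differential_entropy(rng.normal(0.0, 1.0, 5000))
```

With unit variance the value is about 1.419 nats, which offers a quick sanity check when reimplementing these features.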
Table 2.3: Some of the emotional databases publicly available for research
DEAP dataset: the DEAP dataset is a multimodal dataset containing data from 32 subjects. There are 32 EEG channels and 8 ancillary channels, recorded against 40 videos for each subject. Some of the emotional databases available are listed in Table 2.3.
It is observed from the above presentation that the datasets publicly available are few in number and not uniform in terms of the videos shown, their duration, and the number of channels used to record the EEG signals. The emotional
marking is also on different scales. For example, the DEAP dataset applied a 1 to 9
continuous scale to express emotion levels whereas the SEED dataset used plus one
for positive emotion, zero for neutral emotion and minus one for negative emotion.
Further, their average ages and their cultural identities are also not the same, which matters since cultural differences have a great influence on the perception of emotion. Finally, the prediction results vary across these datasets even when prediction is done with the same model. For example, Gupta et al. (2018) reported prediction results of 59 percent and 83 percent while working with the DEAP dataset and the SEED dataset, respectively.
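To compare models across the two labelling schemes, the continuous DEAP ratings are often discretised. The thresholds below are an assumption of this sketch, not a convention defined by either dataset:

```python
def deap_to_class(rating, low=4.0, high=6.0):
    """Map a 1-9 DEAP-style rating onto SEED-style labels -1 / 0 / +1.

    The cut points `low` and `high` are hypothetical; any published work
    should state its own thresholds."""
    if rating <= low:
        return -1  # negative emotion
    if rating >= high:
        return 1   # positive emotion
    return 0       # neutral

labels = [deap_to_class(r) for r in (2.0, 5.0, 8.5)]
```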
EEG signals are harvested with electrical probes from different zones of the scalp defined by the 10-20 system. The frequency content of the EEG signal occupies the range from 0.1 Hz to around 100 Hz, and the signal is of great importance for diagnosing brain disorders such as epilepsy. The EEG signal consists of five widely accepted spectral bands: delta (0 to 4 Hz), theta (4 to 8 Hz), alpha (8 to 13 Hz), beta (13 to 30 Hz) and gamma (> 30 Hz). Many researchers have used one or more spectral bands as features for the prediction of emotion.
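One way to obtain such band features is zero-phase band-pass filtering followed by a power estimate. The sketch below uses SciPy Butterworth filters; the exact band edges vary across the literature and are an assumption here:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Assumed band edges, roughly following the text.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_power(signal, fs, low, high, order=4):
    """Mean power of `signal` after zero-phase band-pass filtering."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    filtered = filtfilt(b, a, signal)
    return float(np.mean(filtered ** 2))

def band_powers(signal, fs):
    return {name: band_power(signal, fs, lo, hi)
            for name, (lo, hi) in BANDS.items()}

# A 10 Hz sine should concentrate its power in the alpha band.
fs = 256
t = np.arange(0, 4, 1 / fs)
powers = band_powers(np.sin(2 * np.pi * 10 * t), fs)
```

A 10 Hz test tone lands in the alpha band, which is a convenient check that the band edges and filter design are wired up correctly.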
The first objective of a classification problem is to select a proper feature and a statistical tool for its extraction. The feature could be spatial or temporal. In emotion recognition, the input is the EEG signal or, in some cases, other physiological signals. Feature selection is important in many respects: a proper feature enhances classification accuracy, and the reduction of features simplifies model complexity and processing time while also reducing redundancy. The physiological signals are generally associated with noise from other systems of the human body.
The alpha band of 8 to 13 Hz has a direct relation with the thalamus, which generates and modulates the alpha wave of EEG signals (Lindgren et al., 1999), (Schürmann and Başar, 2001).
The Wavelet Transform (WT) with different kernels has also been extensively used for feature extraction in the time-frequency domain, and the Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) are also reported for the feature extraction process. In one study, features were extracted from the beta and gamma bands of EEG signals using a spectrogram, the Zhao-Atlas-Marks distribution and the Hilbert-Huang spectrum (HHS), and emotion recognition is done using SVM, kNN, quadratic and linear classifiers. In another interesting research work, Hjorth parameters computed over the theta, alpha, beta and gamma bands were utilised by a Support Vector Machine to classify emotions; however, out of the three emotional levels studied, the results remained equivocal about channel selection for emotion extraction and overall accuracy.
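The Hjorth parameters mentioned above are inexpensive time-domain descriptors: activity is the signal variance, mobility the square root of the variance ratio of the first derivative to the signal, and complexity the mobility of the derivative over the mobility of the signal. A minimal sketch:

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility and complexity of a 1-D signal.

    activity   = var(x)
    mobility   = sqrt(var(dx) / var(x))
    complexity = mobility(dx) / mobility(x)
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    ddx = np.diff(dx)
    var_x, var_dx, var_ddx = np.var(x), np.var(dx), np.var(ddx)
    activity = var_x
    mobility = np.sqrt(var_dx / var_x)
    complexity = np.sqrt(var_ddx / var_dx) / mobility
    return activity, mobility, complexity

# For a pure sinusoid the complexity is close to 1 (a single frequency).
t = np.linspace(0, 1, 1000, endpoint=False)
act, mob, comp = hjorth_parameters(np.sin(2 * np.pi * 5 * t))
```

For a pure sinusoid the complexity approaches 1, since a single frequency is the least "complex" waveform under this measure.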
In the time domain, the Empirical Mode Decomposition (EMD) method has been widely applied. N. Zhuang et al. (Zhuang et al., 2017) decomposed the EEG signal into finer waves called Intrinsic Mode Functions (IMFs) and then used multidimensional information of the IMFs as features: the first difference of the time series, the first difference of the phase, and the normalised energy have been used to classify emotion. The authors reported an accuracy of 69.10 percent for valence and 71.99 percent for arousal. In the EMD method, however, IMFs are not correctly evaluated if noise is present in the signal.
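The per-IMF descriptors named above (first difference of the time series, first difference of the instantaneous phase, normalised energy) can be sketched as follows; the Hilbert transform supplies the phase, and the EMD step itself is assumed to have been done elsewhere:

```python
import numpy as np
from scipy.signal import hilbert

def imf_features(imf):
    """First difference of the series, first difference of the instantaneous
    phase (via the analytic signal), and normalised energy of one IMF."""
    imf = np.asarray(imf, dtype=float)
    d_series = np.mean(np.abs(np.diff(imf)))
    phase = np.unwrap(np.angle(hilbert(imf)))
    d_phase = np.mean(np.abs(np.diff(phase)))  # ~ mean phase step per sample
    energy = np.sum(imf ** 2) / len(imf)
    return d_series, d_phase, energy

# An 8 Hz tone sampled at 1000 Hz: phase advances ~2*pi*8/1000 per sample.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
feats = imf_features(np.sin(2 * np.pi * 8 * t))
```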
Variants of the Wavelet Transform have also been observed in emotion recognition research. Gupta et al. (2018) proposed that EEG signals be decomposed into numerous sub-bands using the Flexible Analytic Wavelet Transform (FAWT). For the categorization of emotion from information potential, Random Forest and SVM algorithms were applied, with accuracies of 59 percent and 83 percent for the DEAP database and SEED database, respectively. The reasons for such huge disparities in accuracy between the datasets are not explained.
In another interesting work, a large number of features have been explored.
Table 2.4: Some of the features and their statistical tools for extraction

Chanel et al. (2009) applied the Short-Term Fourier Transform (STFT) with 512 samples and 50% overlap between two consecutive windows to extract features. They
have used 16,704 features (64 electrodes × 9 frequency bands × 29 time frames) in one set. In another set of features, they used the mutual information (MI) between pairs of electrodes in different areas of the brain. However, it is not clear how the two feature sets compare.
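The window arithmetic behind such feature counts is easy to reproduce. With 512-sample windows and 50% overlap, a signal of 7,680 samples (for example 30 s at an assumed 256 Hz) yields exactly 29 frames, matching the 29 time frames in the count above:

```python
import numpy as np
from scipy.signal import stft

fs = 256                                                # assumed sampling rate
x = np.random.default_rng(0).standard_normal(30 * fs)   # 7,680 samples
# 512-sample windows, 256-sample hop (50% overlap), no padding.
f, t, Z = stft(x, fs=fs, nperseg=512, noverlap=256, boundary=None, padded=False)
n_frames = Z.shape[1]
```

Per electrode, binning these 29 frames into 9 frequency bands and stacking 64 electrodes reproduces the 64 × 9 × 29 = 16,704-dimensional feature set.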
The application of selective spectral bands has been investigated by several researchers. One group used the four spectral bands viz. theta, alpha, beta and gamma, and also experimented with other physiological signals such as galvanic skin resistance, respiration, blood pressure and temperature; they found the EEG signal to be the more informative. Another study carried out research based on the alpha (α) band frequency and achieved average accuracies of 66.66% and 66.67% from visual and audio-visual stimuli, using the db4 wavelet function (WT) to decompose the EEG signal and a neural network for classification. Koelstra S et al. (Koelstra et al., 2010) have recorded EEG and other peripheral signals in their laboratory and used Power Spectral Density (PSD) and Common Spatial Patterns (CSP) as features. They reported that PSD analysis with a bandwidth of 1-10 Hz and 50% band overlap captures the rhythmic variations of brainwaves, and they further used CSP to classify the signal according to its variance. Petrantonakis et al. (Petrantonakis and Hadjileontiadis, 2010) have explored feature extraction using the spectral bands theta, alpha and beta of the EEG with the Fast Fourier Transform.
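A PSD band feature of the kind described can be estimated with Welch's method; the band edges and window length here are illustrative:

```python
import numpy as np
from scipy.signal import welch

def psd_band_feature(signal, fs, low, high):
    """Average Welch PSD of `signal` inside the [low, high] Hz band."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs)  # ~1 Hz resolution
    mask = (freqs >= low) & (freqs <= high)
    return float(np.mean(psd[mask]))

fs = 128
t = np.arange(0, 8, 1 / fs)
x = np.sin(2 * np.pi * 6 * t)        # a theta-range (4-8 Hz) oscillation
theta = psd_band_feature(x, fs, 4, 8)
gamma = psd_band_feature(x, fs, 30, 45)
```

A tone inside the theta band should dominate the theta feature while leaving the gamma feature near zero, which is a quick check of the band masks.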
Similarly, Mikhail et al. (Mikhail et al., 2013) have used the alpha (α) band frequency while predicting four different emotions. The accuracies achieved are 51%, 53%, 58% and 61% for joy, anger, fear and sadness respectively. They further experimented with a reduced channel set and achieved accuracies of 33%, 38%, 33% and 37.5% for joy, anger, fear and sadness respectively. The beta (β) spectral band of the EEG signal is normally related to a relaxed but focused state of a subject.
Beta rhythms are normally found in the frequency range between 13 and 30 Hz. The slowest brainwave state is delta: here the brainwaves are of the greatest amplitude and slowest frequency, typically centred around a range of 1.5 to 4 cycles per second. They never go down to zero, because that would mean the person is brain-dead, but deep dreamless sleep takes human beings down to the delta range. When adults go to bed and read for a few minutes before attempting sleep, they are likely to be in low beta. When they put the book down, turn off the lights and close their eyes, the brainwaves descend from beta to alpha, to theta and finally towards delta.
Zhuang N et al. (Zhuang et al., 2017) experimented with the beta (16-32 Hz) and gamma (32-64 Hz) components of EEG signals as features. They also discussed a few merits and demerits of Empirical Mode Decomposition (EMD). A comprehensive table of features and feature extraction tools has been presented in Table 2.4.
There are cases where direct features are not applied for classification; a few researchers have modified or selected features carefully so that the classification becomes efficient. Petrantonakis and Hadjileontiadis (2010) described how they used the asymmetry index to segment areas before using Empirical Mode Decomposition (EMD) to extract features, and for classifying valence and arousal emotions they applied a statistical classifier.
For channel selection out of all EEG channels, Wang et al. (2019) used normalized mutual information, along with a short-time Fourier transform (STFT) to measure the EEG spectrogram. In another work from 2016, EEG signals are converted from a lower- to a higher-dimensional representation and a Gaussian process is used to extract features that correlate better with the emotional score.
The three structural components of emotion categorization from EEG data that
have gained the most attention in research are feature extraction, signal processing,
and classification. Several studies (Piho and Tjahjadi, 2018), (Wang et al., 2019)
look at the emotional content of physiological signals, which is crucial for emotion
classification. Because not all of the features are necessary for emotion recognition, Atkinson and Campos (2016) employed Mutual Information to choose the important features while discarding the redundant ones. The fundamental criterion for selection was the mutual information between the feature and the emotional label; with this method, the training and testing were carried out. The classification is done using an SVM classifier with RBF and polynomial kernels. To choose channels from a large number of EEG channels, Wang et al. (2019) employed normalized mutual information, and the EEG spectrogram was also assessed using a short-time Fourier transform (STFT).
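Mutual-information-based selection of this kind ranks each candidate feature by how much its distribution tells us about the emotional label. A small histogram-based MI estimate (a sketch, not the estimator used in the cited works):

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of MI (in nats) between a feature x and labels y."""
    x_binned = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])
    classes = sorted(set(y))
    joint = np.zeros((bins, len(classes)))
    for xi, yi in zip(x_binned, y):
        joint[xi, classes.index(yi)] += 1
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 200)
informative = y + 0.3 * rng.standard_normal(400)   # tracks the label
noise = rng.standard_normal(400)                   # ignores the label
mi_inf = mutual_information(informative, y)
mi_noise = mutual_information(noise, y)
```

A feature that tracks the label scores well above a pure-noise feature, which is exactly the property the selection criterion exploits.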
Piho and Tjahjadi (2018) reported emotion recognition from reduced data while using the DEAP and MAHNOB-HCI data sets. The amount of useful data has been selected based on MI between the data and the emotional labelling, and the signal segments with the maximum information have been used for feature extraction. Five frequency bands have been investigated; on the other hand, the authors have not looked at any optimization methods. The present work takes a cue from this particular research and applies different algorithms for emotion recognition. Several EEG channel reduction processes have been reported recently. Srinivas et al. (Nadipalli et al., 2014) have used the wavelet transform to extract features from the gamma, beta, alpha, theta and delta bands, and used the Radial Basis Function and Multilayer Perceptron models for classification.
Further, they propose that occipital lobe channels such as Oz, O1 and O2 give better accuracies out of all EEG channels, and they also commented on the suitability of the wavelet transform for this analysis. Wang et al. (Wang et al., 2019) used Normalized Mutual Information (NMI) to reduce channel redundancy and hardware complexity and also sliced the channels of the DEAP dataset; short-time Fourier transforms were used to capture the EEG spectrogram. Atkinson et al. (Atkinson and Campos, 2016) selected relevant channels with the help of a mutual-information-based criterion.
SVM (Liu and Sourina, 2012), (Duan et al., 2013), (Jie et al., 2014), (Jiang et al., 2016) has been a classifier for emotion recognition for a long time, and different kernels of SVM have been used for the classification of emotion. The second most used classifier is the random forest, and the k-Nearest Neighbour has also largely been used in the classification of emotion. Different kernels for SVM such as the radial basis function (Ali et al., 2016), (Alsolamy and Fattouh, 2016), (Atkinson and Campos, 2016), linear function (Li and Lu, 2009), (Koelstra et al., 2010), (Nie et al., 2011), (Wang et al., 2014), polynomial function (Lan et al., 2016), (Liu et al., 2013) and Gaussian (Liu et al., 2016a), (Jatupaiboon et al., 2013) have been used widely for classification. Linear Discriminant Analysis (LDA) (Stikic et al., 2014) and Quadratic Discriminant Analysis (QDA) (Khalili and Moradi, 2009), (Lee and Hsieh, 2014) have also been used by a few researchers to recognize emotions.
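A minimal RBF-kernel SVM pipeline of the kind these studies describe can be sketched with scikit-learn; the two "band power" features and the class structure below are synthetic stand-ins, not data from any cited work:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-class data: each row is a hypothetical (alpha, beta) power pair.
rng = np.random.default_rng(0)
calm = rng.normal(loc=[2.0, 0.5], scale=0.3, size=(100, 2))
excited = rng.normal(loc=[0.5, 2.0], scale=0.3, size=(100, 2))
X = np.vstack([calm, excited])
y = np.array([0] * 100 + [1] * 100)

# RBF kernel; C and gamma are the parameters the text notes must be tuned.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
train_acc = clf.score(X, y)
```

Swapping `kernel` to `"linear"` or `"poly"` reproduces the other kernel families listed above; in practice C and gamma are chosen by cross-validation.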
Wang et al. (Wang et al., 2011) have used the alpha, beta and gamma bands to extract emotion with the help of classifiers such as KNN, Multilayer Perceptron and SVM. Haiyan Xu et al. (Xu and Plataniotis, 2012) have made use of the alpha and beta spectral bands of EEG signals as features, using statistical, narrow-band, Higher Order Crossings and wavelet entropy measures, with classification performed over the emotion classes. Several other researchers have also explored emotion recognition using k-NN.
Variants of SVM such as multiclass SVM and fuzzy SVM have also been applied for emotion detection. Yuan-Pin Lin (Lin et al., 2009) used five spectral bands of EEG signals, viz. delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz) and gamma (31-50 Hz), and applied multiclass SVM for the classification of emotion. Using Shannon entropy and a higher-order auto-regressive model, Vijayan D et al. (Vijayan et al., 2015) retrieved characteristics from EEG signals of the DEAP dataset and classified them with multiclass SVM. Exciting, happy, sad and hatred are the four different types of emotions considered, and a visual foundation for selecting 12 EEG channels over the gamma band for emotion recognition was presented.
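Shannon entropy as a feature treats the amplitude distribution of a segment as a probability distribution. A sketch with a histogram estimate (the bin count is an arbitrary choice here):

```python
import numpy as np

def shannon_entropy(signal, bins=16):
    """Shannon entropy (in bits) of a signal's amplitude distribution."""
    counts, _ = np.histogram(signal, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
flat = rng.uniform(-1, 1, 10000)   # spread-out amplitudes -> high entropy
peaked = np.zeros(10000)           # constant signal -> zero entropy
h_flat = shannon_entropy(flat)
h_const = shannon_entropy(peaked)
```

A uniform amplitude spread approaches the maximum of log2(bins) bits, while a constant segment scores zero, so the feature orders segments by amplitude diversity.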
A study from 2017 described emotion identification using a Fuzzy Support Vector Machine (FSVM). The standard SVM was demonstrated to be inferior to the FSVM among the competing algorithms, and the author demonstrated the FSVM classification for valence and arousal.
Normally, CNN is used to predict emotions based on facial expressions. Deep neural networks have been used to improve accuracy, and emotion recognition has been described using deep neural networks (DNN) and convolutional neural networks (CNN) (Tripathi et al., 2017). The authors used a large neural network for the DNN, with hidden layers using ReLU as the activation function and a softmax activation function for the final output layer. For the hidden and output layers, dropout probabilities of 0.25 and 0.5 are applied, respectively. For the CNN, the input data has been represented as a 2D picture with 100 initial convolution filters and a (3×3) convolution kernel. For valence and arousal, the stated levels of accuracy are 66.79% and 57.58%, respectively.
Kang et al. (Kang et al., 2019) have applied Independent Component Analysis to remove noise arising from artefacts and other nearby channels.
The Bimodal Deep Auto Encoder (BDAE) has been developed recently (Liu et al., 2016c). It was designed to accommodate EEG and eye data for feature extraction from the SEED and DEAP data sets; hidden layers from the two separate modes are integrated using a Restricted Boltzmann Machine over the weights of the two modes, and a linear SVM was employed for the final classification.
Xu et al. (Xu and Plataniotis, 2016) used a Deep Belief Network (DBN) to investigate emotion recognition. They used ANOVA with SVM-RBF, SADE with a softmax activation function, and DBM with a softmax classifier to predict emotion, and in their research paper they reported high F1 scores for the prediction of arousal and valence.
Alhagry et al. (2017) have applied the Recurrent Neural Network (RNN) to detect emotion. The authors used two LSTM layers in their model along with a dropout layer and a dense layer; the dense layer performs the classification and the LSTM layers are used for feature extraction. For arousal, valence and liking, high accuracies of 85.65 percent, 85.45 percent and 87.99 percent have been recorded.
For the prediction of emotion, Chen J et al. (Chen et al., 2016) used a three-stage decision process. The authors divided the subjects into a few groups using the k-means algorithm; a classifier is then used to classify the data, and an accuracy of 70.04 percent has been reported.
As reviewed by Alarcao and Fonseca (2017), the Wavelet Transform (WT) and Empirical Mode Decomposition (EMD) are the most often used methods for EEG spectrum decomposition. The wavelet transform's performance is determined by the wavelet function and scaling function; researchers in this field use a variety of standard WT basis function families, including Symlets (sym), Haar (db1), Coiflets (coif) and Daubechies (db).
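The Haar (db1) case is simple enough to write out directly: one decomposition level splits the signal into orthonormal pairwise averages (approximation) and differences (detail), and the signal energy is preserved. A sketch:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar (db1) discrete wavelet transform:
    orthonormal pairwise averages (approximation) and differences (detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)
    detail = (even - odd) / np.sqrt(2)
    return approx, detail

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = haar_dwt(x)
```

Energy preservation across the two output vectors is a quick correctness check that also holds for the longer Daubechies, Symlet and Coiflet filters.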
SVM is a good classifier, but it only works well if the parameters are correctly selected during training (Anguita et al., 2012): the learning phase of an SVM classifier is sensitive to this choice, and the kernel and its parameters must be right. Because the EEG data and the manual assessment are not linearly connected, nonlinear kernels are often preferred. The literature reports numerous features, several statistical tools for feature extraction, and a variety of signal processing and classification schemes; despite this breadth of work on feature extraction, signal processing and classification, relatively few studies have been conclusive or generic about their findings. Furthermore, accuracy varies greatly depending on the type of characteristics chosen, the type of classifier used and the dataset, which is itself influenced by factors such as time, space, context, race, and so on. Support Vector Machine (SVM), k-Nearest Neighbour (KNN), and Empirical Mode Decomposition (EMD) with the Hilbert-Huang transform are only a few of the most often utilised approaches for EEG emotion recognition.