
Available online at www.sciencedirect.com

ScienceDirect
Cognitive Systems Research 58 (2019) 173–194
www.elsevier.com/locate/cogsys

Hybrid particle swarm optimization-genetic algorithm trained


multi-layer perceptron for classification of human glioma
from molecular brain neoplasia data

Kamanasish Bhattacharjee, Millie Pant *


Department of Applied Science & Engineering, Indian Institute of Technology, Roorkee, India

* Corresponding author.
E-mail addresses: kbhattacharjee@as.iitr.ac.in (K. Bhattacharjee), millifpt@iitr.ac.in (M. Pant).

Received 31 December 2018; received in revised form 2 May 2019; accepted 13 June 2019
Available online 21 June 2019
https://doi.org/10.1016/j.cogsys.2019.06.003

Abstract

Multi-Layer Perceptron (MLP) is among the most widely applied Artificial Neural Networks (ANNs), but it requires application-specific design and training. This paper deals with the high-dimensional problem of classifying human glioma from Molecular Brain Neoplasia Data by designing an MLP that is trained by hybridizing Particle Swarm Optimization (PSO) and Genetic Algorithm (GA). The results are compared with those of the individual algorithms in terms of convergence rate, Mean Squared Error (MSE) and classification accuracy.
© 2019 Elsevier B.V. All rights reserved.

Keywords: PSO; GA; Hybrid; MLP; Glioma; Classification

1. Introduction

Lack of recorded molecular data for a large sample and inadequate biomedical data integration from various sources hamper the process of finding better treatments of brain tumors. Hence, the task of data integration, redistribution and analysis both across and within functional domains is a very important aspect in the field of biomedical research. Each tumor has a special genomic signature which varies from patient to patient as well as from tumor to tumor. Hence, the development of individualized treatment based on these signatures needs an advanced biomedical informatics infrastructure. One such effort is "The Repository of Molecular Brain Neoplasia Data (REMBRANDT)" (Clark, 2013; Scarpace, Lisa, Adam, Jain, & Mikkelsen, 2015), which stores the data regarding the types and grades of human glioma. It links radiological phenotype to tissue genotype through clinical information and genomic characterization data. The primary objective of this paper is to design a Multi-Layer Perceptron (MLP) to classify human glioma from REMBRANDT.
The aim of training an MLP is to obtain the optimal weight-bias combination for attaining minimum training and testing error. In Machine Learning, the MLP is generally trained through a gradient based supervised learning technique - Backpropagation (BP). There are, however, some disadvantages in training an MLP through BP, like slow convergence (Fahlman, 1988; Vogl, Mangis, Rigler, Zink, & Alkon, 1988) and the local minima entrapment tendency (Gori & Tesi, 1992; Lee, Oh, & Kim, 1993). Also, the convergence of BP greatly depends on the initial parameter values. Unsuitable initial values of the parameters sometimes lead to divergence instead of convergence. It has also been pointed out that BP is more suitable for simple datasets but

its suitability deteriorates with increasing search space complexity (Mirjalili, Mirjalili, & Lewis, 2014). In the problem of classifying human glioma from Molecular Brain Neoplasia Data, the search space is high-dimensional and consequently the computation becomes complex. These shortcomings make BP less attractive for practical applications concerning complex data sets.
An alternative to gradient based learning algorithms is metaheuristic optimization, which has been a focus of attraction for researchers working in this field. Unlike gradient based techniques, the stochastic nature of metaheuristics helps in evading local minima. The literature shows several metaheuristics employed for training MLP - Genetic Algorithm (GA) (Seiffert, 2001), Particle Swarm Optimization (PSO) (Mendes, Cortez, Rocha, & Neves, 2002; Rashid & Baig, 2010; Yu, Wang, & Xi, 2008), Differential Evolution (DE) (Slowik & Bialko, 2008), Ant Colony Optimization (ACO) (Blum & Socha, 2005), Artificial Bee Colony (ABC) (Bullinaria & AlYahya, 2014) and Evolution Strategy (Wienholt, 1993) have all been applied successfully to train an MLP.
A review of the aforementioned algorithms suggests that GA has been the most popular metaheuristic algorithm for training MLP. One of the possible reasons for its popularity could be the fact that GA has been in existence for a long time and has been perhaps one of the most well studied algorithms of modern times. It has undergone several modifications and variations to enhance its performance and has been successfully applied to various problems. The basic operators of GA, like crossover and mutation, are particularly designed to enhance the exploration capability of GA by causing abrupt changes in the candidate solutions and thereby trying to avoid local minima. Besides GA, the other recent metaheuristic algorithms that have shown good results are PSO and ACO. Both algorithms have certain shortcomings which make them more susceptible to getting trapped in a local minimum. The initial swarm distribution affects the performance of PSO, and the main concept of PSO is based on interaction among the swarm members. When most of the particles are trapped in a local minimum, there is very little chance of preventing the rest of the particles from being trapped in the same minimum. The use of a pheromone matrix for reinforcement learning and exploitation in ACO increases the tendency of getting trapped in local minima.
Another metaheuristic algorithm that has been widely used for optimization purposes is Differential Evolution (DE). However, it was pointed out by Piotrowski (2014) through a number of experiments that DE is probably not suitable for training Neural Networks as it suffers from the problem of stagnation, probably due to loss of diversity.
Nevertheless, many instances are available in the literature which show that no metaheuristic algorithm can work equally well for all types of optimization problems (No Free Lunch Theorem) (Wolpert & Macready, 1997; Ho & Pepyne, 2002). This indicates that there is a possibility of further improvement in the existing algorithms, which is the main motivation behind the present work.
In the case of MLP, the focus of the present work, the search space for training is quite complex, differs from dataset to dataset and requires a robust algorithm that can deal efficiently with different kinds of data sets. In this study, the hybridization of the two most widely used and most successful metaheuristic techniques for training MLP is exploited - Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). GA and PSO are hybridized in five different ways and the resulting algorithms are tested and analysed on different benchmark datasets and finally employed on the Molecular Brain Neoplasia Data.
The paper is divided into 11 sections. Section 2 explains the functionality of MLP and how particles/individuals are represented for the problem of training MLP. Section 3 reviews various works on hybrid PSO-GA. Sections 4-6 describe the GA, PSO and hybrid PSO-GA algorithms respectively. Section 7 gives a detailed description of the Molecular Brain Neoplasia Data. Section 8 deals with designing the MLP for classification. Section 9 explains the various metrics used for performance measurement in this paper. Section 10 discusses the experimental results and Section 11 concludes the paper.

2. Multi-Layer Perceptron (MLP)

MLP is a class of feedforward Artificial Neural Network (ANN) having a minimum of three layers. The basic structure of an MLP is depicted in Fig. 1. Here, n, h and m represent the numbers of input, hidden and output nodes respectively. In this work, all the weights of an MLP are stored in the matrix W and all the biases are stored in the matrix B. The MLP output calculation is given below. First, the weighted sums of the inputs are calculated -

s_j = \sum_{i=1}^{n} \bigl( W((j-1)n + i) \cdot I_i \bigr) + B(j), \quad j = 1, 2, \ldots, h    (1)

W((j-1)n + i) is the weight from the ith input node to the jth hidden node
B(j) is the bias of the jth hidden node
I_i is the ith input

Here the weights and biases are indexed in the equation in matrix form, i.e. as they are stored in a matrix. Then the output at each hidden node is given by -

S_j = \mathrm{sigmoid}(s_j) = \frac{1}{1 + \exp(-s_j)}, \quad j = 1, 2, \ldots, h    (2)

Lastly, the final outputs are calculated -

o_k = \sum_{j=1}^{h} \bigl( W(nh + (k-1)h + j) \cdot S_j \bigr) + B(h + k), \quad k = 1, 2, \ldots, m    (3)

Fig. 1. Multi-layer perceptron.

O_k = \mathrm{sigmoid}(o_k) = \frac{1}{1 + \exp(-o_k)}, \quad k = 1, 2, \ldots, m    (4)

W(nh + (k-1)h + j) is the weight from the jth node in the hidden layer to the kth node in the output layer
B(h + k) is the bias of the kth output node

The main task in training an MLP is to obtain the optimal weight-bias combination that produces the desired outputs from the given inputs.
In the problem formulation, all the weights and biases combine to form a candidate in the population, i.e. each weight and bias is an attribute of a candidate solution.

Candidate Solution = {W(1), ..., W(nh + mh), B(1), ..., B(h + m)}

The total number of weights to be optimized is nh + mh and the total number of biases to be optimized is h + m. Therefore, the number of parameters to be optimized for training the MLP is (n + 1)h + (h + 1)m. Generally, the structure of the MLP is denoted as n-h-m.
Here, the inputs correspond to the features. Hence, the number of input layer nodes is the same as the number of features in the classification problem. As there are no standard rules for selecting the number of hidden nodes, the rule used in Mirjalili et al. (2014) is used in this paper -

H = 2 \times N + 1    (5)

N = Number of input nodes; H = Number of hidden nodes

Since the total number of weights is nh + mh and the total number of biases is h + m, it is evident that the total number of weights and biases to be optimized is directly proportional to the number of features.

As the number of features in a problem increases, the MLP gets bigger, subsequently making the search space more complex.
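To make the encoding of Eqs. (1)-(4) concrete, the following minimal Python sketch (an illustration, not code from the paper) evaluates an n-h-m MLP whose weights and biases are stored in flat vectors W and B exactly as in the candidate-solution encoding above. The function name and the use of NumPy are assumptions of this illustration.

```python
import numpy as np

def mlp_forward(W, B, inputs, n, h, m):
    """Evaluate an n-h-m MLP with flattened weights/biases as in Eqs. (1)-(4).

    W has length n*h + h*m, B has length h + m, inputs has length n (0-based indexing).
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = np.zeros(h)
    for j in range(h):                       # Eqs. (1)-(2): weighted input sum + sigmoid per hidden node
        s_j = sum(W[j * n + i] * inputs[i] for i in range(n)) + B[j]
        hidden[j] = sigmoid(s_j)
    outputs = np.zeros(m)
    for k in range(m):                       # Eqs. (3)-(4): weighted sum of hidden activations
        o_k = sum(W[n * h + k * h + j] * hidden[j] for j in range(h)) + B[h + k]
        outputs[k] = sigmoid(o_k)
    return outputs

# Example: a 1-3-1 MLP (the structure used later for the Sigmoid dataset) with random parameters
rng = np.random.default_rng(0)
n, h, m = 1, 3, 1
W = rng.uniform(-10, 10, n * h + h * m)
B = rng.uniform(-10, 10, h + m)
print(mlp_forward(W, B, np.array([0.5]), n, h, m))
```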
3. Literature review

Hybridization of PSO and GA has found application in solving various problems, and the hybridization technique varies from problem to problem. In this section, various techniques of PSO-GA hybridization are reviewed, focusing mainly on the process of hybridization and not on the application for which it is used.
Robinson, Sinton, and Rahmat-Samii (2002) investigated the possibility of hybridizing PSO and GA to optimize the design of a profiled corrugated horn antenna. They proposed two approaches - GA-PSO and PSO-GA. In the GA-PSO approach, GA is applied until the improvement in the objective function evaluation starts to level off, and then the GA output is used as the input to PSO. In the PSO-GA approach, PSO is applied first, followed by GA. Shi, Lu, Zhou, Lee, Lin, and Liang (2003) used two techniques to integrate PSO and GA - in parallel (PGPHEA) and in series (PGSHEA). Juang (2004) proposed a Hybrid GA-PSO (HGAPSO) for the optimization of recurrent neural networks where the initial population is divided into two halves. The better half is given as input to PSO and the rest is discarded. The output of PSO is given as input to GA. Finally, the outputs of PSO and GA are combined to generate the new population for the next generation. Li, Zhao, Guo, and Teng (2006) proposed PHPSO-GA, a parallel hybridization of PSO and GA for packing and layout design problems, where adaptive crossover and mutation operators of GA are applied to divide the population into various classes. Then different PSO update operators are employed according to the nature of the different classes. Du, Li, and Cao (2006) proposed a learning algorithm for ANN based on GA-PSO. Here, candidate solutions are generated through crossover and mutation along with PSO, based on a redefined local optimization swarm. It has good global search capacity as well as local minima avoiding capability. Kao and Zahara (2008) presented an integrated GA-PSO for global optimization of multimodal functions. The population is divided into a best and a worst half. The best half is fed to GA. The worst half and the output of GA are fed to PSO. Then the outputs of GA and PSO are combined to generate the new population for the next generation. Premalatha and Natarajan (2009) used three different techniques to hybridize GA and PSO for global maximization - parallel, series and a technique where each particle in PSO changes its best position through the mutation operator of GA. Marinakis and Marinaki (2010) used a Genetic-PSO method for solving the vehicle routing problem, where in each generation GA is applied on the population followed by PSO. Kuo and Han (2011) presented three hybrid PSO-GA methods to solve the bi-level linear programming problem - HGAPSO-1, HGAPSO-2 and HGAPSO-3. The first two methods are based on (Kao & Zahara, 2008) and (Du et al., 2006) respectively. The third one is proposed in that paper by incorporating mutation with PSO and then using an elitist method for enhancing the evolutionary performance. Yu, Wei, and Wang (2012) generated a model for Energy Demand Estimation based on PSO and GA (PSO-GA EDE) for China, where PSO processes the population over a specified number of generations and then the best N particles are chosen, discarding the remaining M particles (population size = N + M). Then, GA generates M new individuals from the best N particles. Lastly, a new population is generated by combining these M new individuals and the N best particles, which is used in the next generation. Abdel-Kader (2011) used a PSO-GA hybrid algorithm for solving the Quality of Service (QoS) multicast routing problem. Here, PSO and GA are applied in series along with a replacement term w ∈ [0, 1]. First, PSO is applied to the current population and then the best (population_size * (1 - w)) particles are included in the new population. Next, the replacement term is used to determine the appropriate number of individuals, GA is applied on these individuals and the rest of the population is filled by the output of GA. For solving non-linear optimization problems, hybridized PSO and GA in series was applied along with a dynamic constriction factor to maintain the feasibility of the particles by Abd-El-Wahed, Mousa, and El-Shorbagy (2011). Kuo, Syu, Chen, and Tien (2012) introduced a dynamic clustering technique based on hybrid PSO-GA (DCPG) where binary PSO is used along with the crossover and mutation operators of GA to perform mating between the personal best and global best solutions of PSO and then mutate the global best solution. Sheikhalishahi, Ebrahimipour, Shiri, Zaman, and Jeihoonian (2013) proposed three ways to hybridize PSO and GA for solving the Reliability Redundancy Allocation Problem (RRAP) - series, series-parallel and complex (bridge) system. Utkarsh, Kantha, Praveen, and Kumar (2015) used GA and PSO in series combination to train a Functional Link ANN which is used as a channel equalizer in adaptive signal processing. For solving constrained optimization problems, Garg (2016) proposed a hybrid PSO-GA algorithm where a solution is obtained via PSO and then GA operators are applied on this solution for exploration of the search space. Ali and Tawhid (2017) used PSO-GA for minimizing the molecular potential energy function by first applying PSO, then performing dimensionality reduction and population partitioning through arithmetic crossover and lastly employing the mutation operation of GA to avoid premature convergence. Asadnia, Khorasani, and Warkiani (2017) used PSO and GA in series combination to train an ANN for determining growth parameters of Carbon Nano Tubes (CNT). Semero, Zhang, Zheng, and Wei (2018) used a parallel technique to combine PSO and GA to optimize a multi-layered feed-forward ANN model for wind power generation forecasting. Here, PSO and GA are executed simultaneously and after each iteration the best solutions of PSO and GA are compared and the better solution is used in the next generation for both algorithms.

Anand, Suganthi, Anand, and Suganthi (2018) used an ANN-GA-PSO model for forecasting electricity demand in Tamil Nadu. Here, PSO and GA are integrated in a simple series combination where the result of PSO is fed to GA.

4. Genetic algorithm

Genetic Algorithm (GA) was introduced by Holland (1992). GA is a technique based on the biological evolution process for solving complex optimization problems. Selection, Crossover and Mutation are the three main operations of GA. In GA, parents are selected from the individuals and children are produced through crossover and mutation; these children become the individuals of the next generation. Over generations, the population of individuals evolves towards an optimal solution. Fig. 2 shows the workflow of GA. The algorithmic steps of GA to train an n-h-m MLP are given below, and an illustrative code sketch of this loop follows the list -

Step 1: Randomly initialize a population with each individual having (nh + mh) + (h + m) attributes. N = Size of population.
Step 2: Set Crossover type, Selection type, Crossover Probability, Mutation Probability, Number of maximum generations.
Step 3: Calculate MSE for each individual in the population through forward pass operations on the MLP.
Step 4: Calculate the fitness value for each individual.
Step 5: Depending on the fitness values, choose the fittest half of the population.
Step 6: Depending on the Crossover type and Crossover Probability, perform the crossover operation between chosen individuals.
Step 7: Perform the mutation operation.
Step 8: Replace the worst half by the new children generated through Steps 5, 6 and 7.
Step 9: Check if the maximum generations are completed. If yes, output the best individual of the current population as the result. If not, go to Step 3.
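The following minimal Python sketch shows one way Steps 3-9 can be realized for real-coded candidates; it is not the authors' code. The fitness transform 1/(1 + MSE), the simplified parent selection and the [-10, 10] mutation range are assumptions of this illustration (Table 2 later specifies roulette-wheel selection).

```python
import numpy as np

def ga_train(population, mse_fn, generations, p_mut=0.01, bound=10.0):
    """Sketch of GA Steps 3-9 for real-coded weight/bias vectors.

    population: array of shape (N, D); mse_fn maps one candidate vector to its MSE.
    """
    pop = population.copy()
    for _ in range(generations):
        errors = np.array([mse_fn(ind) for ind in pop])        # Step 3: forward-pass MSE
        fitness = 1.0 / (1.0 + errors)                         # Step 4: assumed fitness transform
        fittest = pop[np.argsort(-fitness)[: len(pop) // 2]]   # Step 5: keep the fittest half
        children = []
        while len(children) < len(pop) - len(fittest):
            i, j = np.random.randint(len(fittest), size=2)     # simplified parent pick
            cut = np.random.randint(1, pop.shape[1])           # Step 6: single-point crossover
            child = np.concatenate([fittest[i][:cut], fittest[j][cut:]])
            mask = np.random.rand(child.size) < p_mut          # Step 7: uniform mutation
            child[mask] = np.random.uniform(-bound, bound, mask.sum())
            children.append(child)
        pop = np.vstack([fittest, np.array(children)])         # Step 8: replace the worst half
    return min(pop, key=mse_fn)                                # Step 9: best individual found
```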
5. Particle swarm optimization

PSO was proposed by Kennedy and Eberhart (1995). It is a population based stochastic optimization algorithm which follows the philosophy of the interactive foraging behaviour displayed by individuals moving together in groups. Each member of the group or swarm, called a particle, is a solution to the given problem. The algorithmic design of PSO is such that each particle shares its own experience with the other members of the population. In each iteration, particles update their current best solution and move towards the global optimum. Despite proving itself to be a competent algorithm, PSO has some characteristic drawbacks like premature convergence or being vulnerable to local minima in the case of highly multimodal problems.
The two main equations that govern the working of PSO are represented in the form of a velocity vector and a position vector. Initially the positions and velocities are generated randomly within the specified ranges. An individual's performance depends not only on its own experience (pbest) but also on what it learns from the behaviour of the other particles (gbest).
The velocity is updated using the following formula -

v_i^j(t+1) = w \cdot v_i^j(t) + r_1 \cdot c_1 \cdot \bigl( pbest_i^j(t) - x_i^j(t) \bigr) + r_2 \cdot c_2 \cdot \bigl( gbest^j(t) - x_i^j(t) \bigr)    (6)

The new position of the particle is given as follows -

x_i^j(t+1) = x_i^j(t) + v_i^j(t+1)    (7)

N = Number of particles in the swarm; i = 1, 2, ..., N; j = 1, 2, ..., Dim
r_1, r_2 \in [0, 1]; t = Current iteration; c_1 = Cognitive constant; c_2 = Social constant; w = Inertia weight

Fig. 3 shows the workflow of PSO. The algorithmic steps of PSO to train an n-h-m MLP are given below, and a short code sketch of the update equations follows the list -

Step 1: Randomly initialize a population with each particle having (nh + mh) + (h + m) attributes. Initialize velocities corresponding to each particle. N = Size of population.
Step 2: Set Inertia weight, Acceleration constants, Number of maximum generations.
Step 3: Calculate MSE for each particle in the population through forward pass operations on the MLP.
Step 4: Determine pbest and gbest solutions.
Step 5: Update particle velocity using Eq. (6).
Step 6: Update particle position using Eq. (7).
Step 7: Calculate MSE for each particle in the population through forward pass operations on the MLP.
Step 8: Update pbest and gbest solutions.
Step 9: Check if the maximum generations are completed. If yes, output the current gbest as the result. If not, go to Step 5.
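A minimal vectorised sketch of Eqs. (6)-(7), assuming NumPy arrays and the parameter values later listed in Table 2 (w = 0.3, c1 = c2 = 1). Drawing r1 and r2 per particle and per dimension is an implementation choice of this illustration, not something stated in the paper.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.3, c1=1.0, c2=1.0):
    """One PSO iteration over the whole swarm, following Eqs. (6)-(7).

    x, v, pbest: arrays of shape (N, Dim); gbest: array of shape (Dim,).
    """
    N, dim = x.shape
    r1 = np.random.rand(N, dim)          # r1, r2 in [0, 1]
    r2 = np.random.rand(N, dim)
    v_new = w * v + r1 * c1 * (pbest - x) + r2 * c2 * (gbest - x)   # Eq. (6)
    x_new = x + v_new                                               # Eq. (7)
    return x_new, v_new
```

After each step, the MSE of every particle is recomputed and pbest/gbest are refreshed, as in Steps 7-8 above.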
6. Hybrid PSO-GA

From Section 3, it can be observed that in most of the literature, PSO and GA are combined mainly in two ways - in series or in parallel - for various applications. Hence, in this paper, we formulate five different ways to integrate PSO and GA through series or parallel combination.

6.1. Series PSO-GA (SPSOGA)

Here PSO and GA are combined in series. For the first half of the total iterations PSO is applied on the population and then the output population of PSO is fed into GA, which runs for the remaining half of the total iterations.

Fig. 3. Particle swarm optimization flowchart.

Fig. 2. Genetic algorithm flowchart.

Fig. 4 shows the workflow of SPSOGA. The algorithmic steps of SPSOGA to train an n-h-m MLP are given below (a sketch of the series driver follows the list) -

Step 1: Randomly initialize a population with each particle having (nh + mh) + (h + m) attributes. Initialize velocities corresponding to each particle. N = Size of population.
Step 2: Set Inertia weight, Acceleration constants, Crossover type, Selection type, Crossover Probability, Mutation Probability, Number of maximum generations.
Step 3: Calculate MSE for each particle in the population through forward pass operations on the MLP.
Step 4: Determine pbest and gbest solutions.
Step 5: Update particle velocity using Eq. (6).
Step 6: Update particle position using Eq. (7).
Step 7: Calculate MSE for each particle in the population through forward pass operations on the MLP.
Step 8: Update pbest and gbest solutions.
Step 9: Check if half of the maximum generations are completed. If yes, go to Step 10. If not, go to Step 5.
Step 10: Calculate the fitness value for each individual.
Step 11: Depending on the fitness values, choose the fittest half of the population.
Step 12: Depending on the Crossover type and Crossover Probability, perform the crossover operation between chosen individuals.
Step 13: Perform the mutation operation.
Step 14: Replace the worst half by the new children generated through Steps 11, 12 and 13.
Step 15: Check if the maximum generations are completed. If yes, output the best individual of the current population as the result. If not, go to Step 16.
Step 16: Calculate MSE for each particle in the population through forward pass operations on the MLP and go to Step 10.
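As a hedged illustration of the series combination, the driver below runs PSO for the first half of the generations and hands the resulting population to GA for the second half. The helper names are assumptions built from the sketches in Sections 4 and 5, not code from the paper; swapping the two calls gives the SGAPSO variant of Section 6.2.

```python
def spsoga_train(init_population, mse_fn, total_generations):
    """Series PSO-GA (Steps 1-16): PSO for the first half of the iterations,
    then GA on the PSO output for the remaining half.

    pso_train is a hypothetical helper returning the evolved swarm positions;
    ga_train (sketched in Section 4) returns the best individual it finds.
    """
    half = total_generations // 2
    population = pso_train(init_population, mse_fn, generations=half)            # Steps 3-9
    return ga_train(population, mse_fn, generations=total_generations - half)    # Steps 10-16
```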

Fig. 4. SPSOGA flowchart.
Fig. 5. SGAPSO flowchart.

6.2. Series GA-PSO (SGAPSO)

Here also PSO and GA are combined in series, but in this case GA is applied first. For the first half of the total iterations GA is applied on the population and then the output population of GA is fed into PSO, which runs for the remaining half of the total iterations. Fig. 5 shows the workflow of SGAPSO. The algorithmic steps of SGAPSO to train an n-h-m MLP are given below -

Step 1: Randomly initialize a population with each particle having (nh + mh) + (h + m) attributes. N = Size of population.
Step 2: Set Inertia weight, Acceleration constants, Crossover type, Selection type, Crossover Probability, Mutation Probability, Number of maximum generations.
Step 3: Calculate MSE for each particle in the population through forward pass operations on the MLP.
Step 4: Calculate the fitness value for each individual.

Step 5: Depending on the fitness values, choose the fittest half of the population.
Step 6: Depending on the Crossover type and Crossover
Probability, perform crossover operation between cho-
sen individuals.
Step 7: Perform the mutation operation.
Step 8: Replace the worst half by new children generated
through Steps 5, 6 and 7.
Step 9: Check if half of maximum generations are com-
pleted. If yes, go to Step 10. If not, go to Step 3.
Step 10: Initialize velocities corresponding to each
particle.
Step 11: Determine pbest and gbest solutions.
Step 12: Update particle velocity using Eq. (6).
Step 13: Update particle position using Eq. (7).
Step 14: Calculate MSE for each particle in the popula-
tion through forward pass operations on MLP.
Step 15: Update pbest and gbest solutions.
Step 16: Check if maximum generations are completed.
If yes, output the current gbest as result. If not, go to
Step 12.

6.3. Parallel PSO-GA (PPSOGA)

In this hybridization, after initializing the population with N particles and evaluating the fitness of each particle, the N particles are fed to PSO as well as to GA. After completion of the PSO and GA operations on each of the N particles, the results of PSO and GA are combined to form 2N particles. Then the N best particles are selected from these 2N particles for the next generation. Fig. 6 shows the workflow of PPSOGA. The algorithmic steps of PPSOGA to train an n-h-m MLP are given below (a sketch of the parallel driver is given after Step 11) -

Step 1: Randomly initialize a population with each particle having (nh + mh) + (h + m) attributes. Initialize velocities corresponding to each particle. N = Size of population.
Step 2: Set Inertia weight, Acceleration constants, Crossover type, Selection type, Crossover Probability, Mutation Probability, Number of maximum generations.
Step 3: Calculate MSE for each particle in the population through forward pass operations on the MLP.
(PSO branch, applied to all N particles)
Step 4: Determine pbest and gbest solutions.
Step 5: Update particle velocity using Eq. (6).
Step 6: Update particle position using Eq. (7).
Step 7: Calculate MSE for each particle in the population through forward pass operations on the MLP.
Step 8: Update pbest and gbest solutions.
(GA branch, applied to all N particles)
Step 4: Calculate the fitness value for each individual.
Step 5: Depending on the fitness values, choose the fittest half of the population.
Step 6: Depending on the Crossover type and Crossover Probability, perform the crossover operation between chosen individuals.
Step 7: Perform the mutation operation.
Step 8: Replace the worst half by the new children generated through Steps 5, 6 and 7.
Step 9: Combine the populations (total 2N particles/individuals).
Step 10: Select the best N particles/individuals and discard the worst half.

Step 11: Check if the maximum generations are completed. If yes, output the best particle/individual of the current population as the result. If not, go to Step 4.

Fig. 6. PPSOGA flowchart.
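A hedged sketch of one PPSOGA generation under the assumptions above: pso_generation and ga_generation are hypothetical helpers that each return an updated population of size N, after which the pooled 2N candidates are truncated to the best N by MSE. PPSOGA2 and HPSOGA (Sections 6.4 and 6.5) follow the same combine-and-truncate pattern but feed only the best N/2 particles into the branches.

```python
import numpy as np

def ppsoga_generation(population, mse_fn):
    """One parallel PSO-GA generation (Steps 4-10): both branches run on the same
    N particles, the 2N results are pooled, and the best N survive."""
    pso_out = pso_generation(population, mse_fn)      # hypothetical helper: PSO Steps 4-8
    ga_out = ga_generation(population, mse_fn)        # hypothetical helper: GA Steps 4-8
    pooled = np.vstack([pso_out, ga_out])             # Step 9: combine into 2N candidates
    errors = np.array([mse_fn(ind) for ind in pooled])
    keep = np.argsort(errors)[: len(population)]      # Step 10: keep the N lowest-MSE candidates
    return pooled[keep]
```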

6.4. Parallel PSO-GA 2 (PPSOGA2)

In this hybridization, after initializing the population with N particles and evaluating the fitness of each particle, the best N/2 particles are fed to PSO as well as to GA. After completion of the PSO and GA operations on each of the N/2 particles, the results of PSO and GA are combined to form N particles. Then again the N/2 best particles are selected from these N particles for the next generation. Fig. 7 shows the workflow of PPSOGA2. The algorithmic steps of PPSOGA2 to train an n-h-m MLP are given below -

Step 1: Randomly initialize a population with each particle having (nh + mh) + (h + m) attributes. Initialize velocities corresponding to each particle. N = Size of population.
Step 2: Set Inertia weight, Acceleration constants,
Crossover type, Selection type, Crossover Probability,
Mutation Probability, Number of maximum
generations.
Step 3: Calculate MSE for each particle in the popula-
tion through forward pass operations on MLP.
Step 4: Select best N/2 particles/individuals and discard
the worst half.
Step 5: Determine pbest and gbest solutions.
Step 6: Update particle velocity using Eq. (6).
Step 7: Update particle position using Eq. (7).
Step 8: Calculate MSE for each particle in the popula-
tion through forward pass operations on MLP.
Step 9: Update pbest and gbest solutions.
Step 5: Calculate fitness value for each individual.
Step 6: Depending on the fitness values, choose the fit-
test half of the population.
Step 7: Depending on the Crossover type and Crossover Probability, perform the crossover operation between chosen individuals.
Step 8: Perform the mutation operation.
Step 9: Replace the worst half of the population by new
children generated through Steps 6, 7 and 8.
Step 10: Combine populations (Total N particles/
individuals)
Step 11: Check if maximum generations are completed.
If yes, output the best particle/individual of current pop-
ulation as result. If not, go to Step 4.

Fig. 7. PPSOGA2 flowchart.

6.5. Hybrid PSO-GA (HPSOGA)

This hybridization is a combination of the series as well as the parallel techniques. Here, after initializing the population with N particles and evaluating the fitness of each particle, the best N/2 particles are fed to PSO. Then the result of PSO is fed to GA. Finally, the results of PSO and GA are combined to form N particles. Then again the N/2 best particles are selected from these N particles for the next generation. Fig. 8 shows the workflow of HPSOGA. The algorithmic steps of HPSOGA to train an n-h-m MLP are given below -

Step 1: Randomly initialize a population with each particle having (nh + mh) + (h + m) attributes. Initialize velocities corresponding to each particle. N = Size of population.
Step 2: Set Inertia weight, Acceleration constants, Crossover type, Selection type, Crossover Probability, Mutation Probability, Number of maximum generations.
Step 3: Calculate MSE for each particle in the popula-
tion through forward pass operations on MLP.
Step 4: Select best N/2 particles/individuals and discard
the worst half.
Step 5: Determine pbest and gbest solutions.
Step 6: Update particle velocity using Eq. (6).
Step 7: Update particle position using Eq. (7).
Step 8: Calculate MSE for each particle in the popula-
tion through forward pass operations on MLP.
Step 9: Update pbest and gbest solutions.
Step 10: Calculate fitness value for each individual.
Step 11: Depending on the fitness values, choose the fit-
test half of the population.
Step 12: Depending on the Crossover type and Cross-
over Probability, perform crossover operation between
chosen individuals.
Step 13: Perform the mutation operation.
Step 14: Replace the worst half by new children gener-
ated through Steps 11, 12 and 13.
Step 15: Combine populations (Total N particles/
individuals).
Step 16: Check if maximum generations are completed.
If yes, output the best individual of current population
as result. If not, go to Step 4.

7. Molecular brain neoplasia data

The VASARI ("Visually AcceSAble Rembrandt Images") MRI feature set of human gliomas was devised
based upon the REMBRANDT. Neuroradiologists from
Thomas Jefferson University (TJU) Hospital characterized
these imaging features. For this study, the data of 32
patients has been taken. Each patient’s MRI data have
been analysed by 3 different radiologists and each decided
the glioma types. Hence, the dataset contains total of 96
cases. The original VASARI dataset contains 30 features
or attributes. But the first feature ‘Tumor Location’ is a
multivalued attribute. Hence, the VASARI dataset has
been pre-processed for this study to make all the attributes
single valued. The final number of features or attributes
after pre-processing is 36. A detailed description of all
the features has been listed in Table 1. This genetically pro-
filed MRI collection contains cases of 3 types of glioma -
astrocytomas, glioblastoma and oligodendrogliomas - which act as the 3 classes in this study. These classes are mutually independent, i.e. if a patient has astrocytoma then he will not have glioblastoma or oligodendroglioma. In REMBRANDT, an assumption is made that the abnormality is comprised of the following components - (1) Enhancing, (2) Non-enhancing, (3) Necrotic, (4) Edema.

Fig. 8. HPSOGA flowchart.
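The paper states that the multivalued 'Tumor Location' attribute was expanded during pre-processing so that all 36 attributes are single valued (features 1-7 of Table 1 are per-location yes/no flags). The exact pre-processing code is not given; the sketch below shows one plausible way to perform such an expansion, and the record layout and names are illustrative assumptions only.

```python
LOCATIONS = ["frontal lobe", "temporal lobe", "insular lobe",
             "parietal lobe", "occipital lobe", "brainstem", "cerebellum"]

def expand_tumor_location(record):
    """Turn a multivalued 'Tumor Location' field into seven 1/2 (No/Yes) flags,
    mirroring features 1-7 of Table 1. `record` is a dict describing one case."""
    present = set(record.pop("Tumor Location"))        # e.g. {"frontal lobe", "insular lobe"}
    for loc in LOCATIONS:
        record[f"Tumor present in {loc}"] = 2 if loc in present else 1
    return record
```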

Table 1
Description of features or attributes.

No.     Feature                                  Feature values
1       Tumor present in frontal lobe            1 = No, 2 = Yes
2       Tumor present in temporal lobe           1 = No, 2 = Yes
3       Tumor present in insular lobe            1 = No, 2 = Yes
4       Tumor present in parietal lobe           1 = No, 2 = Yes
5       Tumor present in occipital lobe          1 = No, 2 = Yes
6       Tumor present in brainstem               1 = No, 2 = Yes
7       Tumor present in cerebellum              1 = No, 2 = Yes
8       Tumor epicenter side                     1 = Right, 2 = Center, 3 = Left
9       Eloquent brain                           1 = Does not involve, 2 = Involves speech motor, 3 = Involves speech receptive, 4 = Involves motor, 5 = Involves vision
10      Enhancement quality                      1 = None, 2 = Mild, 3 = Marked
11      Proportion enhancing                     1 = N/A, 2 = 0%, 3 = <5%, 4 = 6-33%, 5 = 34-67%, 6 = 68-95%, 7 = >95%, 8 = 100%, 9 = Indeterminate
12      Proportion non-enhancing                 same scale as feature 11
13      Proportion necrosis                      same scale as feature 11
14      Cyst(s)                                  1 = No, 2 = Yes
15      Multifocal or multicentric               1 = Not available, 2 = Multifocal, 3 = Multicentric, 4 = Gliomatosis
16      T1/FLAIR ratio                           1 = Expansive, 2 = Mixed, 3 = Infiltrative
17      Enhancing margin thickness               1 = Not available, 2 = No thickness, 3 = Thin margin, 4 = Thick margin
18      Enhancing margin definition              1 = Not available, 2 = Well-defined margin, 3 = Poorly-defined margin
19      Non-enhancing margin definition          1 = N/A, 2 = Smooth, 3 = Irregular
20      Proportion of edema                      1 = N/A, 2 = 0%, 3 = <5%, 4 = 6-33%, 5 = 34-67%, 6 = 68-95%, 7 = >95%, 8 = 100%, 9 = Indeterminate
21      Edema crosses midline                    1 = N/A, 2 = No, 3 = Yes
22      Hemorrhage                               1 = No, 2 = Yes
23      Diffusion                                1 = No image, 2 = Facilitated diffusion, 3 = Restricted diffusion, 4 = No diffusion
24      Pial invasion                            1 = No, 2 = Yes
25      Ependymal invasion                       1 = No, 2 = Yes
26      Cortical involvement                     1 = No, 2 = Yes
27      Deep WM invasion                         1 = No, 2 = Yes
28      nCET tumor crosses midline               1 = N/A, 2 = No, 3 = Yes
29      Enhancing tumor crosses midline          1 = N/A, 2 = No, 3 = Yes
30      Satellites                               1 = No, 2 = Yes
31      Calvarial remodeling                     1 = No, 2 = Yes
32      Enhancing tumor's extent of resection    1 = N/A, 2 = 0%, 3 = <5%, 4 = 6-33%, 5 = 34-67%, 6 = 68-95%, 7 = >95%, 8 = 100%, 9 = Indeterminate
33      Extent resection of nCET                 same scale as feature 32
34      Extent resection of vasogenic edema      same scale as feature 32
35-36   Lesion size                              1 = <0.5 cm, 2 = 0.5 cm, 3 = 1.0 cm, 4 = 1.5 cm, 5 = 2.0 cm, 6 = 2.5 cm, 7 = 3.0 cm, 8 = 3.5 cm, 9 = 4.0 cm, 10 = 4.5 cm, 11 = 5.0 cm, 12 = 5.5 cm, 13 = 6.0 cm, 14 = 6.5 cm, 15 = 7.0 cm, 16 = 7.5 cm, 17 = 8.0 cm, 18 = >8.0 cm

For a better understanding of the features, some of them are presented pictorially. Features 1-6, i.e. the various locations of the tumor within the brain, are presented in Fig. 9. Fig. 10 shows feature 9, i.e. which subcortical white matter of the eloquent cortex is involved in the tumor. Fig. 11 shows feature 11, i.e. visually, when scanning through the entire tumor volume, what proportion of the entire tumor is estimated by the radiologists to be enhancing.

8. Designing the multi-layer perceptron

There are 36 features or attributes in the dataset. Hence, the number of nodes in the input layer of the MLP is 36. From Eq. (5), the number of hidden nodes is

H = 2 \times 36 + 1 = 73

As the classes considered in this problem are mutually independent, the number of nodes in the output layer is kept at 1, i.e. it outputs only one class at a time. Hence the MLP architecture is 36-73-1, shown in Fig. 12.

Total number of weights to be optimized is (36 \times 73) + (1 \times 73) = 2701.

Total number of biases to be optimized is (73 + 1) = 74.

Total number of parameters to be optimized in this problem is (2701 + 74) = 2775.
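The sizing rule of Eq. (5) and the parameter count above can be checked with a few lines of Python; this is only a restatement of the arithmetic, not code from the paper.

```python
n_features = 36                      # input nodes (Section 8)
n_hidden = 2 * n_features + 1        # Eq. (5): H = 2N + 1 = 73
n_outputs = 1                        # single output node (mutually independent classes)

n_weights = n_features * n_hidden + n_hidden * n_outputs   # 2628 + 73 = 2701
n_biases = n_hidden + n_outputs                            # 74
print(n_weights, n_biases, n_weights + n_biases)           # 2701 74 2775
```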

9. Performance measurement metrics

There are three performance measurement metrics used in this paper to compare the algorithms for training the MLP.

9.1. Mean Squared Error (MSE)

Squared Error is the squared value of the difference between the output predicted by the MLP and the original
output. Here, for each combination of weights and biases
i.e. for each candidate solution in the population, Squared
Error is calculated for each data in the training as well as
testing dataset. Then the mean of all these errors are calcu-
lated to find the Mean Squared Error (MSE) for the candi-
date solution. This MSE works as the objective function in
training MLP i.e. minimizing the MSE is the objective of
this problem.
Fig. 9. Features 1–6 - tumor locations.
MSE = \frac{1}{d} \sum_{i=1}^{d} \bigl( \text{Predicted output}_i - \text{Original output}_i \bigr)^2    (8)

Fig. 10. Feature 9 - eloquent brain.

Fig. 11. Feature 11 - proportion enhancing.



Fig. 12. 36-73-1 MLP.

d = Number of samples present in the dataset

After executing the algorithm for a certain number of iterations, the final MSE of the solution obtained by the algorithm is recorded for the purpose of comparison. Obviously, the lower the value of MSE, the better the performance of the algorithm.

9.2. Classification accuracy

After training an MLP through an algorithm, a testing dataset is given to the MLP. Classification accuracy is the percentage of data accurately classified by the MLP. Hence, this is a direct measurement of how well an algorithm trains the MLP. The better it trains, the more accurate the classification will be.

9.3. Convergence rate

Convergence rate indicates how fast and how smoothly an algorithm approaches the optimal solution. The smoother the curve, the more reliable the behaviour in local minima avoidance.
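The two error metrics above can be stated compactly; the following sketch assumes NumPy and is only a restatement of Eq. (8) and the accuracy definition, not the authors' implementation.

```python
import numpy as np

def mse(predicted, original):
    """Eq. (8): mean squared error over the d samples of a dataset."""
    predicted, original = np.asarray(predicted), np.asarray(original)
    return np.mean((predicted - original) ** 2)

def classification_accuracy(predicted_labels, true_labels):
    """Percentage of test samples whose predicted class matches the true class."""
    predicted_labels, true_labels = np.asarray(predicted_labels), np.asarray(true_labels)
    return 100.0 * np.mean(predicted_labels == true_labels)
```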
10. Experimental results and discussion

The five hybrid PSO-GA algorithms described in Section 6 are compared to PSO, GA, ACO, DE and Backpropagation (BP) in this section. First they are tested on 6 benchmark datasets - 3 benchmark function approximation datasets and 3 benchmark classification datasets. Then they are applied on the high dimensional Molecular Brain Neoplasia data.

10.1. Parameter setting

The population size is 100 for every dataset. Each candidate in the population is initialized randomly in the range [-10, 10]. In case of the metaheuristic algorithms, the maximum number of generations is 200 for the function approximation datasets and 300 for the classification datasets. In case of Back Propagation (BP), the maximum number of iterations is 250. The assumptions and parameter values of each algorithm are presented in tabulated form in Table 2. Tuning the parameters is not within the scope of this paper.

Table 2
Parameters of algorithms.

Algorithm   Parameter                      Value
PSO         Inertia weight                 0.3
            Cognitive constant             1
            Social constant                1
GA          Encoding                       Real
            Selection                      Roulette wheel
            Crossover                      Single point
            Mutation                       Uniform
            Crossover probability          1
            Mutation probability           0.01
ACO         Initial pheromone              1e-06
            Pheromone update constant      20
            Exploration constant           1
            Global pheromone decay rate    0.9
            Local pheromone decay rate     0.5
            Pheromone sensitivity          1
            Visibility sensitivity         5
DE          Mutation strategy              DE/rand/1
            Mutation probability           0.8
            Crossover probability          0.5

10.2. Benchmark function approximation datasets

3 benchmark function approximation datasets are used here - the Sigmoid function, the Sphere function and the Rastrigin function. The bold numbers in each table signify the best result among the different algorithms in terms of MSE and classification accuracy.

10.2.1. Sigmoid function

y = \frac{1}{1 + e^{-x}}    (9)

It is a one dimensional dataset. The training dataset contains 121 entries (x \in [-3 : 0.05 : 3]) and the testing dataset contains 61 entries (x \in [-3 : 0.1 : 3]). A 1-3-1 MLP is used for this function approximation, with 6 weights and 4 biases (total 10 parameters) to be optimized. Table 3 and Fig. 13 show the experimental results. SPSOGA achieves the lowest MSE in this case, but in terms of classification accuracy GA, SPSOGA, PPSOGA, SGAPSO and PPSOGA2 all show 100% accuracy.

Table 3
Sigmoid dataset results.

Algorithm   MSE          Classification accuracy (%)
PSO         0.0010562    89
GA          0.0011054    100
ACO         0.00082469   85
DE          0.0012686    86
SPSOGA      3.4573e-05   100
PPSOGA      8.4883e-05   100
SGAPSO      0.00012078   100
PPSOGA2     0.00083994   100
HPSOGA      0.00025412   86
BP          0.0131       75

10.2.2. Sphere function

y = \sum_{i=1}^{2} x_i^2    (10)

It is a two dimensional dataset. The training dataset contains 1681 entries (x \in [-2 : 0.1 : 2]) and the testing dataset contains 441 entries (x \in [-2 : 0.2 : 2]). A 2-5-1 MLP is used for this function approximation, with 15 weights and 6 biases (total 21 parameters) to be optimized. Table 4 and Fig. 14 show the experimental results. From the MSE and classification accuracy values, it can be observed that SGAPSO produces the best result in this case. In terms of convergence behaviour, SGAPSO shows the fastest convergence, but the most reliable behaviour in local minima avoidance is shown by PPSOGA, as its convergence curve is the smoothest. It can be noted that SGAPSO gives a better result than SPSOGA.

10.2.3. Rastrigin function

y = 10 \times 3 + \sum_{i=1}^{3} \bigl[ x_i^2 - 10\cos(2\pi x_i) \bigr]    (11)

It is a three dimensional dataset. The training dataset contains 1331 entries (x \in [-5 : 1 : 5]) and the testing dataset contains 216 entries (x \in [-5 : 2 : 5]). A 3-7-1 MLP is used for this function approximation, with 28 weights and 8 biases (total 36 parameters) to be optimized. Table 5 and Fig. 15 show the experimental results. Here, PPSOGA achieves the best MSE, the best classification accuracy and the fastest convergence, while the smooth convergence curves of PPSOGA, PPSOGA2 and HPSOGA indicate these algorithms' more reliable behaviour in local minima avoidance. The parallel behaviour of these algorithms helps in this process. When GA and PSO are running simultaneously, the different operations of GA and PSO change the population in different ways. Hence, if one gets stuck in a local minimum, the other helps to overcome it in the next generation. GA assists PSO more in such a case than vice versa. This results in a smooth convergence curve for the parallel algorithms.
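A hedged sketch of how the three function-approximation datasets can be generated on the stated grids (the [a : step : b] notation above); NumPy and the helper name are assumptions of this illustration.

```python
import numpy as np
from itertools import product

def grid(lo, step, hi):
    """Points lo, lo+step, ..., hi, matching the [lo : step : hi] notation."""
    return np.arange(lo, hi + step / 2, step)

# Sigmoid: 1-D, y = 1 / (1 + e^(-x))                        (Eq. 9)
sig_x = grid(-3, 0.05, 3)
sig_y = 1.0 / (1.0 + np.exp(-sig_x))

# Sphere: 2-D, y = x1^2 + x2^2                              (Eq. 10)
sph_x = np.array(list(product(grid(-2, 0.1, 2), repeat=2)))
sph_y = np.sum(sph_x ** 2, axis=1)

# Rastrigin: 3-D, y = 30 + sum(xi^2 - 10 cos(2 pi xi))      (Eq. 11)
ras_x = np.array(list(product(grid(-5, 1, 5), repeat=3)))
ras_y = 30 + np.sum(ras_x ** 2 - 10 * np.cos(2 * np.pi * ras_x), axis=1)

print(len(sig_x), len(sph_x), len(ras_x))   # 121, 1681, 1331 training points
```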
10.2. Benchmark function approximation datasets
It is a three dimensional dataset. The training dataset
3 benchmark function approximation datasets are used contains 1331 entries (x 2 ½5 : 1 : 5) and testing dataset
here – Sigmoid function, Sphere function and Rastrigin contains 216 entries (x 2 ½5 : 2 : 5). A 3-7-1 MLP is used
function. The bold numbers in each table signify the best for this function approximation with 28 weights and 8
result among the different algorithms in terms of MSE biases (total 36 parameters) to be optimized. Table 5 and
and Classification accuracy. Fig. 15 show the experimental results. Here, PPSOGA
achieves the best MSE, classification accuracy and fastest
10.2.1. Sigmoid function convergence while smooth convergence curves of PPSOGA,
PPSOGA2 and HPSOGA indicates these algorithms’ more
y ¼ 1=ð1 þ ex Þ ð9Þ reliable behaviour in local minima avoidance. The parallel
It is a one dimensional dataset. The training dataset con- behaviour of these algorithms helps in this process. When
tains 121 entries (x 2 ½3 : 0:05 : 3) and testing dataset GA and PSO are running simultaneously, the different
contains 61 entries (x 2 ½3 : 0:1 : 3). A 1-3-1 MLP is used operations of GA and PSO changes the population in differ-
for this function approximation with 6 weights and 4 biases ent ways. Hence, if one gets stuck in a local minima, the
(total 10 parameters) to be optimized. Table 3 and Fig. 13 other helps to overcome it in the next generation. GA assists
show the experimental results. SPSOGA achieves the low- PSO more in such a case than vice versa. This results in a
est MSE in this case. But in terms of classification accu- smooth convergence curve for parallel algorithms.
racy, GA, SPSOGA, PPSOGA, SGAPSO and PPSOGA2
- all the algorithms shows 100% accuracy. 10.3. Benchmark classification datasets

3 benchmark classification datasets are used here – 3-bit


XOR, Iris and Breast Cancer.
Table 3
Sigmoid dataset results. 10.3.1. 3-bit XOR
Algorithm MSE Classification accuracy (%) The N bit XOR is a popular benchmark classification
PSO 0.0010562 89 problem where the task is to recognize the number of 1 s
GA 0.0011054 100 in input i.e. -
ACO 0.00082469 85 
0 if number of 1s in input vector is even
DE 0.0012686 86 output ¼
SPSOGA 3.4573e-05 100 1 if number of 1s in input vector is odd
PPSOGA 8.4883e-05 100
SGAPSO 0.00012078 100 A 3-7-1 MLP is used for this classification with 28
PPSOGA2 0.00083994 100 weights and 8 biases (total 36 parameters) to be optimized.
HPSOGA 0.00025412 86 Table 6 and Fig. 16 show the experimental results. This is
BP 0.0131 75
the simplest classification dataset used here. Hence, GA,

Fig. 13. Convergence curves for Sigmoid function.

Table 4
Sphere dataset results.

Algorithm   MSE         Classification accuracy (%)
PSO         0.028523    41.5
GA          0.006545    74
ACO         0.033892    39
DE          0.033144    47.6
SPSOGA      0.040915    35
PPSOGA      0.012229    60
SGAPSO      0.0051809   87.75
PPSOGA2     0.017114    53
HPSOGA      0.009907    64.85
BP          0.0780      52

This is the simplest classification dataset used here. Hence, GA, PPSOGA, SGAPSO and PPSOGA2 achieve 100% classification accuracy. PPSOGA shows the best result in terms of MSE. PPSOGA and PPSOGA2 produce the best results in avoiding local minima. The better local minima avoidance feature of the parallel algorithms is discussed in Section 10.2.3.

10.3.2. Iris

This is one of the best known benchmark classification datasets to be found in the pattern recognition literature (Fisher, 1936). The dataset consists of 3 classes of 50 instances each (total 150 instances) and each sample in the dataset has 4 features. One class is linearly separable from the other two, but the latter two are not linearly separable from each other. Hence, in this case 3 nodes are used in the output layer. A 4-9-3 MLP is used for this classification, with 63 weights and 12 biases (total 75 parameters) to be optimized. Table 7 and Fig. 17 show the experimental results. Here, PPSOGA2 shows the best result in every aspect, i.e. MSE, classification accuracy, and fastest and smoothest convergence.

Fig. 14. Convergence curves for Sphere function.

Table 5
Rastrigin dataset results.

Algorithm   MSE         Classification accuracy (%)
PSO         0.039225    34
GA          0.017869    49.5
ACO         0.047029    34
DE          0.034521    33.33
SPSOGA      0.025292    50
PPSOGA      0.011166    60.5
SGAPSO      0.023091    55
PPSOGA2     0.014232    52
HPSOGA      0.016233    51
BP          0.0529      45

10.3.3. Breast cancer

Wolberg and Mangasarian (1990) introduced this dataset. It has a total of 699 instances and 9 features. There are 2 classes in this problem - 2 denotes 'Benign' and 4 denotes 'Malignant' - and the classes are mutually independent. A 9-19-1 MLP is used for this classification, with 190 weights and 20 biases (total 210 parameters) to be optimized. This dataset is divided into 2 parts - training data and testing data. The first 599 samples are taken as training data while the latter 100 are testing data. Table 8 and Fig. 18 show the experimental results. Here, SGAPSO and PPSOGA achieve the best classification accuracy. SGAPSO shows the best MSE and fastest convergence while PPSOGA has the smoothest convergence.

10.4. Molecular brain neoplasia dataset

From the above experiments on benchmark functions and datasets, it is evident that in every case the hybrid PSO-GA algorithms outperform the individual algorithms.

Fig. 15. Convergence curves for Rastrigin function.

Table 6
XOR dataset results.

Algorithm   MSE          Classification accuracy (%)
PSO         0.066471     50
GA          3.98e-06     100
ACO         0.060283     62
DE          0.0035       75
SPSOGA      0.061311     80
PPSOGA      6.67e-07     100
SGAPSO      0.00043793   100
PPSOGA2     3.05e-06     100
HPSOGA      0.0060741    91
BP          0.0697       50

Although it is difficult to specify which hybrid algorithm will work better for which type of dataset, SGAPSO and PPSOGA narrowly pass this generalization as they show the best results in most of the cases. As the datasets get bigger and more complex, the relative performance of the hybrid algorithms compared to the individual algorithms gets better. As discussed in Section 7, the Molecular Brain Neoplasia data is a high dimensional dataset with a total of 2775 parameters to be optimized. From the benchmark dataset results it can be extrapolated that the hybrid algorithms will outperform the individual algorithms. Still, all the algorithms are applied on this dataset for comparison purposes and to observe by which margin the hybrid algorithms outperform the individual ones. Table 9 and Fig. 19 show the experimental results. It can be observed from the results that SGAPSO achieves the best MSE, the best classification accuracy and the fastest convergence, followed by SPSOGA. In case of high dimensionality also, prior use of GA before PSO holds its superiority.

Fig. 16. Convergence curves for XOR dataset.

From Fig. 19, it is evident that, although their MSE values are not the best, the convergence curves of the parallel hybridizations - PPSOGA, PPSOGA2 and HPSOGA - are smoother than the others.

Table 7
Iris dataset results.

Algorithm   MSE        Classification accuracy (%)
PSO         0.32444    35
GA          0.030057   86
ACO         0.27444    45
DE          0.11741    64
SPSOGA      0.029044   91
PPSOGA      0.037154   89
SGAPSO      0.074193   75
PPSOGA2     0.026446   92
HPSOGA      0.031465   75
BP          0.20324    52

11. Conclusion

In this paper five versions of hybrid PSO-GA algorithms are presented for the training of MLP and for consequently classifying different sets of problems, including 6 benchmark datasets (3 benchmark function approximation datasets - Sigmoid, Sphere, Rastrigin - and 3 benchmark classification datasets - 3-bit XOR, Iris, Breast Cancer) and the Molecular Brain Neoplasia data. The following conclusions can be drawn from the present study:

1. It is evident from the numerical results and graphical interpretations that hybridization of PSO and GA is an efficient way of training MLP. This is quite obvious because hybridization provides a synergetic effect on the working of the algorithms.

Fig. 17. Convergence curves for Iris dataset.

Table 8
Breast cancer dataset results.

Algorithm   MSE         Classification accuracy (%)
PSO         0.026193    37
GA          0.011536    94
ACO         0.16881     79
DE          0.025       11
SPSOGA      0.001316    97
PPSOGA      0.001285    99
SGAPSO      0.000538    99
PPSOGA2     0.004038    83
HPSOGA      0.0031757   89
BP          0.17233     78

2. Implementation of the proposed hybrid variants on the 3 test problems, viz. the Sigmoid, Sphere and Rastrigin functions, reveals that the classification accuracy of GA and the 4 proposed hybrid variants is 100% in case of the Sigmoid function, which is the simplest of the data sets considered in this study. However, as the complexity of the datasets increases, it is observed that none of the algorithms is able to achieve 100% classification accuracy.
3. A similar trend is observed in case of the classification benchmark datasets, where for a simple test dataset like XOR, GA and 4 of the proposed hybrid variants gave 100% classification accuracy, but as the complexity of the datasets increases, as in case of the Iris and Breast Cancer datasets, the performance of the algorithms deteriorates. Nevertheless, it is worth noting that PPSOGA2 achieved an accuracy above 90% for the Iris dataset, and PPSOGA and SGAPSO achieved an accuracy of 99% for the Breast Cancer dataset.

Fig. 18. Convergence curves for breast cancer dataset.

Table 9
Molecular brain neoplasia dataset results.

Algorithm   MSE         Classification accuracy (%)
PSO         0.06768     38.5
GA          0.0472      43
ACO         0.09375     55
DE          0.1577      33
SPSOGA      0.02723     43.75
PPSOGA      0.10238     57
SGAPSO      0.030778    62
PPSOGA2     0.099578    59
HPSOGA      0.11088     54
BP          0.12723     52

4. In case of the Molecular Brain Neoplasia dataset, SGAPSO gave the best accuracy (62%), which is much better than the accuracy obtained by the other algorithms. This can be considered a good performance as the dataset is very high dimensional. It may also be concluded here that the implementation of GA before PSO is more beneficial for high dimension data sets.
5. A comparison of the proposed algorithms among themselves shows that PPSOGA2 and SGAPSO are almost at par with each other.

Overall, it may be concluded that the best hybrid algorithm in each experiment gives 15-60% better MSE values and 5-30% better classification accuracy than GA. The competence of the algorithms is also evident for high dimensional datasets like the Molecular Brain Neoplasia data.

Fig. 19. Convergence curves for molecular brain neoplasia dataset.

This research can be extended in several directions, which include testing the algorithms on more complex datasets, doing a thorough analysis of the proposed algorithms, comparing the proposed algorithms with other hybrid variants available in the literature, and so on.

Declaration of Competing Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

References

Abdel-Kader, R. F. (2011). Hybrid discrete PSO with GA operators for efficient QoS-multicast routing. Ain Shams Engineering Journal, 2(1), 21-31.
Abd-El-Wahed, W. F., Mousa, A. A., & El-Shorbagy, M. A. (2011). Integrating particle swarm optimization with genetic algorithms for solving nonlinear optimization problems. Journal of Computational and Applied Mathematics, 235(5), 1446-1453.
Ali, A. F., & Tawhid, M. A. (2017). A hybrid particle swarm optimization and genetic algorithm with population partitioning for large scale optimization problems. Ain Shams Engineering Journal, 8(2), 191-206.
Anand, A., Suganthi, L., Anand, A., & Suganthi, L. (2018). Hybrid GA-PSO optimization of artificial neural network for forecasting electricity demand. Energies, 11(4), 728.
Asadnia, M., Khorasani, A. M., & Warkiani, M. E. (2017). An accurate PSO-GA based neural network to model growth of carbon nanotubes. Journal of Nanomaterials, 2017, 1-6.
Blum, C., & Socha, K. (2005). Training feed-forward neural networks with ant colony optimization: An application to pattern classification. In Fifth international conference on hybrid intelligent systems (HIS'05), p. 6.
Bullinaria, J. A., & AlYahya, K. (2014). Artificial bee colony training of neural networks. Cham: Springer (pp. 191-201).
Clark, K. et al. (2013). The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging, 26(6), 1045-1057.
Du, S., Li, W., & Cao, K. (2006). A learning algorithm of artificial neural network based on GA-PSO. In 2006 6th world congress on intelligent control and automation, pp. 3633-3637.

Fahlman, S. E. (1988). An empirical study on learning speed in the back propagation, no. 4976.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188.
Garg, H. (2016). A hybrid PSO-GA algorithm for constrained optimization problems. Applied Mathematics and Computation, 274, 292-305.
Gori, M., & Tesi, A. (1992). On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1), 76-86.
Ho, Y. C., & Pepyne, D. L. (2002). Simple explanation of the no-free-lunch theorem and its implications. Journal of Optimization Theory and Applications, 115(3), 549-570.
Holland, J. H. (1992). Genetic algorithms. Scientific American, 267(1), 66-73.
Juang, C.-F. (2004). A hybrid of genetic algorithm and particle swarm optimization for recurrent network design. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(2), 997-1006.
Kao, Y.-T., & Zahara, E. (2008). A hybrid genetic algorithm and particle swarm optimization for multimodal functions. Applied Soft Computing, 8(2), 849-857.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of ICNN'95 - International conference on neural networks (Vol. 4, pp. 1942-1948).
Kuo, R. J., & Han, Y. S. (2011). A hybrid of genetic algorithm and particle swarm optimization for solving bi-level linear programming problem - A case study on supply chain model. Applied Mathematical Modelling, 35(8), 3905-3917.
Kuo, R. J., Syu, Y. J., Chen, Z.-Y., & Tien, F. C. (2012). Integration of particle swarm optimization and genetic algorithm for dynamic clustering. Information Sciences, 195, 124-140.
Lee, Y., Oh, S.-H., & Kim, M. W. (1993). An analysis of premature saturation in back propagation learning. Neural Networks, 6(5), 719-728.
Li, G., Zhao, F., Guo, C., & Teng, H. (2006). Parallel hybrid PSO-GA algorithm and its application to layout design. Berlin, Heidelberg: Springer (pp. 749-758).
Marinakis, Y., & Marinaki, M. (2010). A hybrid genetic - particle swarm optimization algorithm for the vehicle routing problem. Expert Systems with Applications, 37(2), 1446-1455.
Mendes, R., Cortez, P., Rocha, M., & Neves, J. (2002). Particle swarms for feedforward neural network training. In Proceedings of the 2002 international joint conference on neural networks (IJCNN'02) (pp. 1895-1899).
Mirjalili, S., Mirjalili, S. M., & Lewis, A. (2014). Let a biogeography-based optimizer train your multi-layer perceptron. Information Sciences, 269, 188-209.
Piotrowski, A. P. (2014). Differential evolution algorithms applied to neural network training suffer from stagnation. Applied Soft Computing, 21, 382-406.
Premalatha, K., & Natarajan, A. M. (2009). Hybrid PSO and GA for global maximization. International Journal of Open Problems in Computer Science and Mathematics, 2(4).
Rashid, M., & Baig, A. R. (2010). Improved opposition-based PSO for feedforward neural network training. In 2010 international conference on information science and applications (pp. 1-6).
Robinson, J., Sinton, S., & Rahmat-Samii, Y. (2002). Particle swarm, genetic algorithm, and their hybrids: Optimization of a profiled corrugated horn antenna. In IEEE antennas and propagation society international symposium (Vol. 1, pp. 314-317).
Scarpace, L., Flanders, A. E., Jain, R., Mikkelsen, T., & Andrews, D. W. (2015). Data from REMBRANDT. The Cancer Imaging Archive.
Seiffert, U. (2001). Multiple layer perceptron training using genetic algorithms. In European symposium on artificial neural networks (ESANN) (pp. 159-164).
Semero, Y. K., Zhang, J., Zheng, D., & Wei, D. (2018). A GA-PSO hybrid algorithm based neural network modeling technique for short-term wind power forecasting. Distributed Generation and Alternative Energy Journal, 33(4), 26-43.
Sheikhalishahi, M., Ebrahimipour, V., Shiri, H., Zaman, H., & Jeihoonian, M. (2013). A hybrid GA-PSO approach for reliability optimization in redundancy allocation problem. International Journal of Advanced Manufacturing Technology, 68(1-4), 317-338.
Shi, X. H., Lu, Y. H., Zhou, C. G., Lee, H. P., Lin, W. Z., & Liang, Y. C. (2003). Hybrid evolutionary algorithms based on PSO and GA. In The 2003 congress on evolutionary computation (CEC '03) (Vol. 4, pp. 2393-2399).
Slowik, A., & Bialko, M. (2008). Training of artificial neural networks using differential evolution algorithm. In 2008 conference on human system interactions (pp. 60-65).
Utkarsh, A., Kantha, A. S., Praveen, J., & Kumar, J. R. (2015). Hybrid GA-PSO trained functional link artificial neural network based channel equalizer. In 2015 2nd international conference on signal processing and integrated networks (SPIN) (pp. 285-290).
Vogl, T. P., Mangis, J. K., Rigler, A. K., Zink, W. T., & Alkon, D. L. (1988). Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59(4-5), 257-263.
Wienholt, W. (1993). Minimizing the system error in feedforward neural networks with evolution strategy. In ICANN '93 (pp. 490-493). London: Springer.
Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences of the United States of America, 87(23), 9193-9196.
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67-82.
Yu, J., Wang, S., & Xi, L. (2008). Evolving artificial neural networks using an improved PSO and DPSO. Neurocomputing, 71(4-6), 1054-1060.
Yu, S., Wei, Y.-M., & Wang, K. (2012). A PSO-GA optimal model to estimate primary energy demand of China. Energy Policy, 42, 329-340.
