Neighborhood Search Based Improved Bat Algorithm For Data Clustering
https://doi.org/10.1007/s10489-021-02934-x
Abstract
Clustering is an unsupervised data analytic technique that determines the similarity between data objects and puts
similar data objects into one cluster. The similarity among data objects is determined through a distance function.
Clustering has gained wide popularity because it is unsupervised and can be used in diverse research fields
such as image segmentation, data analytics, and outlier detection. This work focuses on data clustering problems
and proposes a new clustering algorithm based on the behavior of micro-bats. The proposed bat algorithm determines
the optimal cluster centers for data clustering problems. Several shortcomings are associated with the
bat algorithm, such as slow convergence rate, local optima, and the trade-off among search mechanisms. The slow
convergence issue is addressed through an elitist mechanism, while an enhanced cooperative method is introduced for
handling population initialization issues. A Q-learning based neighbourhood search mechanism is also developed to
effectively overcome the local optima issue. Several benchmark non-healthcare and healthcare datasets are selected for
evaluating the performance of the proposed bat algorithm. The simulation results are evaluated using intracluster
distance, standard deviation, accuracy, and rand index parameters and compared with nineteen existing meta-heuristic
algorithms. It is observed that the proposed bat algorithm obtains significant results with these datasets.
10542 A. Kaur, Y. Kumar
clustering solutions [10]. These algorithms are inspired by swarm intelligence and insect behaviour, like particle
swarm optimization (PSO) [11, 12], artificial bee colony (ABC) [13–15] and ant colony optimization (ACO) [16, 17];
well-known physics laws, like magnetic optimization algorithm (MOA) [1], charged system search (CSS) [18],
black hole (BH) [19] and big bang-big crunch (BB-BC) [20, 21]; chemical processes, like artificial chemical reaction
optimization (ACRO) [22]; evolutionary algorithms [23], like genetic algorithm (GA), genetic programming (GP) and
biogeography based algorithm (BBA) [24]; animal behaviour based algorithms, like grey wolf optimization (GWO) [25],
elephant herd optimization [26], cat swarm optimization [27] and lion optimizer [28]; and population based algorithms,
like sine cosine algorithm [29], stochastic fractal search algorithm [30] and thermal exchange optimization algorithm
[31]. These algorithms differ from each other in terms of local and global search mechanisms. Different search
mechanisms are adopted for computing local and global optimum solutions. A few algorithms have strong local
search ability, for example CSO, BB-BC, SCA and BH, while the rest have strong global search ability, like PSO,
ABC, BBA and GWO [10]. However, it is observed that local and global search abilities should be balanced to obtain
the optimal solution [12]. Further, Abraham et al. [32] stated that data clustering minimizes the dissimilarity
between data points within a cluster and maximizes the dissimilarity between data points of different
clusters. Several clustering algorithms have been designed, such as k-means, c-means, tabu search and simulated
annealing. However, these algorithms are sensitive to initial solutions and are thus easily trapped in local optima.
For example, Choudhury et al. [33] designed an entropy based method to determine the initial solutions for the
k-means algorithm. The aim of this method is to overcome the dependency of k-means on initial solutions. Moreover,
Torrente and Romo [34] also considered the initialization issue of k-means and developed a new initialization method
based on the concepts of bootstrap and data depth for computing the optimal initial solutions. It is also noticed that
traditional clustering algorithms face difficulty with complex and large datasets. This issue is effectively addressed
through metaheuristic algorithms. For example, Ahmadi et al. [35] designed a clustering algorithm based on the grey
wolf optimization method to tackle data clustering problems, especially with large datasets. Several modifications,
like local search and a balancing factor, are incorporated into grey wolf optimization to effectively handle the data
clustering problem. Ghany et al. [36] developed a hybrid clustering algorithm based on the whale optimization
algorithm (WOA) and tabu search (TS) for solving data clustering. The reason for hybridization is to overcome local
optima and also to improve the quality of clustering solutions. The results stated that the hybridization of WOA and
TS successfully handles the aforementioned issues.
Sorensen presented a critical evaluation of various well-known metaheuristic algorithms [37]. It is stated that
researchers should focus on the actual mechanism behind the underlying concept rather than on developing new
metaheuristic algorithms, and should also concentrate on promising research directions in the field of metaheuristic
algorithms. Keeping this in mind, rather than designing a new metaheuristic algorithm, this work considers an existing
metaheuristic algorithm, i.e. the bat algorithm, for solving data clustering problems. Recently, the bat algorithm has
become popular in the research community and provides optimal solutions for various optimization problems [38–40].
The bat algorithm was developed by Yang et al. [38] based on the behavior of micro-bats, especially their
echolocation feature. Micro-bats use echolocation to detect prey (food) and avoid obstacles. To detect prey,
micro-bats emit a short pulse. The aim of the short pulse is to produce an echo, from which micro-bats recognize the
shape and size of the prey. Several performance issues are associated with the bat algorithm [40–42]. These issues
are outlined as convergence rate, local optima, population initialization, and the trade-off factor between local and
global searches. As a result, the bat algorithm converges to a near-optimal solution instead of the optimal solution.
The issues related to the performance of the bat algorithm are summarized as

• Population Initialization: The initial population has a significant impact on the success of clustering
algorithms [42, 43]. Meta-heuristic algorithms select the initial population using a random function. If the initial
population is not selected in an effective manner, premature convergence can occur.
• Local optima: It is noticed that sometimes the population of the bat algorithm is not updated in an effective
manner [39, 44]. In turn, the objective function returns the same value in successive iterations. Finally, the
algorithm converges to the same solution, but the solution is not the optimal one. This situation is called local
optima, and it occurs due to the lack of an appropriate mechanism to update the population of micro-bats.
• Convergence Rate: The convergence rate of an algorithm depends on the optimization process and the exploration
of the search space [45, 46]. The convergence rate can also be affected by a lack of coordination between the
exploration (local search) and exploitation (global search) processes.

The contribution of this work is given as:

1. To develop an enhanced cooperative co-evolution method to handle the population initialization issue.
2. An elitist strategy is developed for improving the convergence rate.
3. To incorporate a limit operator to check the local optima situation in the algorithm.
4. To develop the neighbourhood search mechanism for exploring optimal candidate solutions in the exploration process.
5. The proposed bat algorithm is applied to solve clustering problems.

2 Related works

The recent works reported on partitional clustering algorithms are summarized in this section. Over the past few
decades, a number of clustering algorithms have been developed for obtaining optimum results for partitional
clustering. A few of them are discussed below.
To determine the best initial population and the number of clusters automatically, Rahman and Islam [47] designed a
hybrid algorithm based on K-means (KM) and genetic algorithm (GA). The genetic algorithm was applied to determine
optimized initial cluster centres for KM. The C-means algorithm was adopted to obtain optimum clustering results. The
performance of the proposed algorithm was assessed on twenty datasets. The results were compared with well-known
clustering techniques. It was claimed that fuzzy c-means with GA gives better clustering results.
Liu et al. [48] presented a clonal selection algorithm for addressing automatic clustering. In automatic clustering,
the number of clusters is detected automatically. Hence, in this work, the authors introduce a genetic operator for
detecting the number of clusters. Twenty-three well-known datasets are selected for measuring the performance of the
clonal selection algorithm. The results are compared with ILS, ACDE, VGA and DCPSO algorithms. The authors claimed
that the proposed algorithm provides better results without prior knowledge of the number of clusters.
A two-step artificial bee colony algorithm was reported for obtaining optimal clustering results [49]. Prior to
implementation, three improvements are incorporated into the ABC algorithm to make it more robust and efficient. These
improvements concern the initial cluster centre locations, the search mechanism and its equations, and the abandoned
food source. The initial cluster centre locations are determined through the one-step KM method. A PSO based search
mechanism is used for exploring the promising search space. The Hooke and Jeeves concept is considered for evaluating
abandoned food source locations. The performance of the proposed two-step ABC algorithm is tested on both artificial
and benchmark datasets and compared with well-known clustering algorithms. It was observed from the results that
the proposed algorithm significantly improves the performance of the conventional ABC algorithm.
Cao et al. [50] developed a new initialization method based on the neighbourhood rough set model. The intra-cluster
and inter-cluster similarities of an object were represented in terms of cohesion and coupling degrees. Furthermore,
the method is integrated with the KM algorithm for improving clustering results. The efficacy of the proposed algorithm
is tested on three datasets and compared with two other initialization algorithms. The proposed initialization method
provides better results than traditional methods.
Han et al. [51] adopted a new diversity mechanism in the gravitational search algorithm to handle clustering
problems. The collective response of birds is used to design the diversity mechanism, which is implemented through
three simple steps: (i) initialization, (ii) identification (nearest neighbours) and (iii) orientation alteration. The
candidate population is generated in the initialization step. The second step evaluates the nearest neighbours through
a neighbourhood strategy. The third step changes the current location of a candidate solution based on its nearest
neighbour. Thirteen datasets are chosen for evaluating the performance of the algorithm, and the simulation results are
compared with well-known clustering algorithms. The authors claimed that the proposed algorithm achieves superior
clustering results.
Senthilnath et al. [52] introduced a two-phase firefly algorithm (FA) for the clustering task. This algorithm
simulates the flashing pattern and social insect behaviours of fireflies. The first phase of the algorithm measures the
variation of light intensity. The second phase directs the movement of the fireflies. The efficiency of the firefly
algorithm is assessed on thirteen standard datasets and compared with ABC and PSO. The simulation results support the
use of the FA algorithm in the clustering field.
To handle the initialization issue of the K-means algorithm, Erisoglu et al. [53] developed a new initialization
method. This method is based on the bi-dimensional mapping of features. Initially, two features are chosen. The first
feature is the attribute with the maximum value of the variation coefficient, called the main axis. The second feature
is determined using the correlation values between the main axis (first variable) and the rest of the attributes.
Hence, the second feature is the attribute with minimum correlation. Several benchmark datasets are used to evaluate
the performance of the proposed algorithm. The simulation results showed that the proposed method is significantly
better than the KM algorithm.
Kumar and Sahoo [54] hybridized the MCSS algorithm with PSO. The personal best mechanism of the PSO algorithm
was added into the magnetic charge system search algorithm. Further, a neighbourhood strategy was also introduced to
avoid the local optima situation. Ten datasets are selected for evaluating the performance of the MCSS-PSO algorithm
and results are compared with a wide range of clustering algorithms. The authors claimed that better quality results
are achieved by the MCSS-PSO algorithm.
Zhou et al. [55] introduced a simplex method-based SSO algorithm for solving the clustering task. In this work, the
simplex method is incorporated into the SSO algorithm to enhance local search ability and improve convergence speed.
Eleven datasets are considered for evaluating the simulation results of the proposed algorithm, which are compared
with well-known clustering algorithms. The proposed SSO algorithm performs well in terms of accuracy, robustness, and
convergence speed.
Boushaki et al. [56] designed a new quantum chaotic CS algorithm for the clustering task. To extend the global search
ability of the quantum chaotic cuckoo search algorithm, a nonhomogeneous update mechanism was employed. Chaotic
maps are incorporated in this algorithm to improve convergence speed. The performance of the algorithm was compared
with different variants of the CS algorithm and hybrid variants of clustering algorithms. The authors claimed that the
proposed CS algorithm provides more compact clusters than other algorithms.
A combination of GA and a message-based similarity (MBS) measure was presented for effective cluster analysis
by Chand et al. [57]. The MBS measure consists of two types of messages: responsibility and availability. The messages
are exchanged among data points and cluster centres. The responsibility can be measured as evidence regarding cluster
centres, while the availability corresponds to the appropriateness of a data point with respect to clusters. Further,
GAMBS consists of a variable-length real-valued chromosome representation and an evolutionary operator. Artificial
and real-life datasets are adopted for measuring the performance of the GAMBS algorithm. The simulation results showed
that the algorithm obtains significant clustering results.
Hatamlou [23] developed a new clustering algorithm inspired by the black hole (BH) phenomenon. Like other
clustering algorithms, the BH algorithm starts with initial population selection and objective function evaluation. The
performance of the proposed algorithm is tested on six benchmark datasets, and it is stated that the black hole
clustering algorithm provides better clustering results.
Zhang et al. [58] presented an ABC algorithm for data clustering. In ABC, onlooker bees and employed bees are
responsible for global search, while scout bees are responsible for local search. Further, Deb's rule is incorporated
to redirect the search in the solution space. The performance was tested on three real-life datasets and compared with
other clustering algorithms. Results revealed that the proposed algorithm provides good quality results.
Taherdangkoo et al. [59] reported a new blind naked mole rat algorithm in the clustering field. This algorithm
considers the food search capability and colony protection characteristics of mole rats. The algorithm starts by
initializing the population of mole rats and searches the entire space for the optimal solution in a random fashion.
In the next iterations, employed mole rats start moving towards the target food source and their neighbours. The
performance of the proposed algorithm was tested on six standard datasets and compared with other well-known clustering
algorithms. Results revealed that the blind naked mole rat algorithm provides higher accuracy with faster convergence
speed.
Hatamlou [60] considered the slow convergence rate of the binary search algorithm and designed a new algorithm
for cluster analysis. This algorithm chooses initial cluster points from different locations. Further, the search
direction is based on successive objective function values. If the current objective function value is better than the
previous one, the search proceeds in the same direction, otherwise in the opposite direction. Six benchmark datasets
are chosen for evaluating the efficacy of the proposed algorithm. The results are compared with KM, GA, SA, TS, ACO,
HBMO, and PSO algorithms. The proposed algorithm provides superior clustering results.
Bijari et al. [61] presented a memory-enriched BB-BC algorithm for clustering. It works in two phases: the BB phase
and the BC phase. The BB phase generates random points near the initial seed points, while the BC phase optimizes these
generated points. The BB-BC algorithm is a memoryless algorithm, so a memory concept is integrated into the BB-BC
algorithm for memorizing the best location and also for maintaining the exploration and exploitation tasks. The
performance of the algorithm was tested on six datasets and compared with well-known algorithms like GA, PSO, GWO, and
the original BB-BC. Results stated that the clustering results are improved significantly.
Abualigah et al. [62] combined the krill herd (KH) optimization algorithm with harmony search (HS) to overcome
the local optima problem in clustering. A global exploration operator and a reproduction procedure were integrated in
the krill herd algorithm. Seven standard datasets are selected for measuring the performance of the proposed algorithm,
and results are compared with GA, PSO, HS, KHA, H-GA, and H-PSO algorithms. The authors claimed that the proposed
combination (KH + HS) achieves more accurate clustering results.
Pakrashi and Chaudhuri [63] hybridized the Kalman filtering algorithm with the KM algorithm. In this work, the authors
consider the slow convergence rate of the KM algorithm, which can be improved with the help of the Kalman filtering
algorithm. Further, a conditional restart mechanism was also incorporated in the K-means algorithm to handle the local
optima situation. Seven benchmark datasets are taken for evaluating the performance of the proposed algorithm, and
results are compared with HKA, KGA, GAC, ABCC, and PSO algorithms. It is noticed that the Kalman filtering algorithm
successfully overcomes the deficiency of the KM algorithm.
Kang et al. [64] hybridized KM and the mussels wandering optimization algorithm, called K-MWO. The proposed
algorithm combines the local search abilities of KM with global search accomplished through the mussels wandering
optimization algorithm. The performance is tested on nine datasets, and results are compared with K-M and K-PSO
algorithms. The authors claimed that K-MWO is an effective clustering algorithm.
To solve clustering search space problems, Wang et al. [65] presented a hybrid version of the flower pollination
algorithm (FPA) and the bee pollinator algorithm (BPA). The discard pollen operator of ABC is used for enhancing the
global search ability of the flower pollination algorithm. Further, the local search mechanism is improved through
elite mutation and crossover operators. Several artificial and benchmark datasets are selected for measuring the
performance of the proposed algorithm. The simulation results are compared with KM, FPA, CS, PSO, ABC, and DE
algorithms. Results showed that the combination of FPA and BPA provides better results than the others.
Hatamlou and Hatamlou [66] designed a two-stage clustering approach to overcome the drawbacks of particle
swarm optimization, like local optima and slow convergence speed. In the first stage, the PSO algorithm is adopted for
generating the initial candidate solution. In the second stage, the HS algorithm is considered for improving the
quality of the solution. Seven datasets are chosen for measuring the performance of the proposed algorithm, and results
are compared with KM, PSO, GSA, and BB-BC methods. It is seen that the proposed algorithm determines good quality
clusters.
A hybrid version of the ABC algorithm with a genetic algorithm is also presented by Yan et al. [67] for enhancing the
information exchange mechanism among bees. It is applied for solving data clustering problems. The information exchange
mechanism is enhanced with the help of a crossover operator. Six standard datasets are adopted for evaluating the
simulation results of the proposed ABC algorithm, and results are compared with other ABC, CABC, PSO, CPSO and GA
clustering algorithms. The proposed ABC algorithm has better clustering results than the others.
To perform efficient clustering, Kwedlo [68] combined the differential evolution (DE) algorithm with KM. The KM
algorithm is used to tune candidate solutions generated through the mutation and crossover operators of DE.
Additionally, a recording procedure is also introduced to handle redundant solutions. The performance of the proposed
algorithm was compared with five other well-known clustering algorithms. It was noticed that the DE-KM algorithm gives
state-of-the-art clustering results.
Yin et al. [69] presented a hybridized version of improved GSA with KHM for solving clustering problems. This work
considers the convergence rate of KHM and the diversity mechanism of GSA to develop a new algorithm. The performance
of the proposed algorithm is tested on seven benchmark datasets and compared with other well-known clustering
algorithms. The authors claimed that the combination KHM-GSA achieves better convergence.
A hybrid version of the ant algorithm is presented for handling clustering problems [70]. The KHM algorithm is
hybridized with the ant algorithm, and the result is called KHM-Ant. The proposed algorithm contains the merits of both
algorithms, such as the initialization characteristic of KHM and the local optima characteristic of the ant algorithm.
Five benchmark datasets are considered for measuring the performance of the KHM-Ant algorithm. The simulation results
are compared with KHM and ACA. The authors claimed that better results are achieved by the KHM-Ant algorithm.
Xiao et al. [71] developed a quantum-inspired GA (QGA) for partitional clustering. In this work, Q-bit based
representation and the rotation operation of quantum gates are applied for achieving better search mechanisms. Several
standard and simulated datasets are selected for evaluating the performance of the QGA algorithm. The QGA is able to
find optimal clusters without prior knowledge of the number of cluster centres.
Aljarah et al. [72] hybridized grey wolf optimizer (GWO) with tabu search (TS) for cluster analysis. TS is incorporated
as an operator in GWO for searching the neighbourhood. It helps in balancing the exploration and exploitation of GWO.
The proposed GWOTS is tested on thirteen real datasets, and results have been compared with other popular
metaheuristics. The experimental results show that GWOTS is superior in terms of convergence behaviour and optimality
of results.
A PSO based clustering algorithm is presented in [73]. The concept of cooperative co-evolution is incorporated into
PSO for improving convergence rate and diversity. The cooperative co-evolution method works as the decomposer
and the PSO algorithm as the optimizer. Standard and simulated datasets are selected for measuring the performance of
PSO, and results are compared with SRPSO, ACO, ABC, DE, and KM algorithms. The concept of cooperative co-evolution
improves the performance of PSO in a significant manner.
To solve clustering problems effectively, an improved CSO algorithm is reported in [74]. Several modifications are
incorporated in the CSO algorithm to make it effective. These modifications are described in terms of search equations.
Further, a local search method is also developed for handling the local optima problem. The performance is evaluated
on five datasets and compared with several known clustering algorithms. Simulation results showed that the improved CSO
obtains effective clustering results.
A classroom teaching based meta-heuristic algorithm is also presented for handling clustering problems [75]. The
properties of the K-means algorithm were also investigated for effective clustering results [76]. Six benchmark
datasets are used to evaluate the performance of the aforementioned
algorithm. The performance is measured in terms of overlapping, number of clusters, dimensionality and cluster size.
An intelligent system for spam detection was presented in [77]. More relevant features were identified using
evolutionary random weight networks. Table 1 gives a summary of the various studies in the literature.

3 Bat algorithm

The bat algorithm is based on the echolocation behaviour of microbats, especially prey detection and obstacle
avoidance [38]. In search of prey, a microbat emits a short pulse and uses the echo from nearby objects to determine
their shape and size. The loudness, emission rate and random variable of the bat algorithm are initialized using
Eqs. (1–2).

$$A_i^{t+1} = \alpha \left( A_i^{t} \right) \qquad (1)$$

$$r_i^{t+1} = \left[ 1 - \exp(-\alpha) \right] \qquad (2)$$

where $A_i^{t}$ is the loudness, $r_i^{t}$ is the pulse emission rate and $\alpha$ is a user-specified variable with
values in [0, 1]. The frequency and velocity of bats are computed using Eqs. (3–4).

$$f_i^{t} = f_{min}^{t} + \left( f_{max}^{t} - f_{min}^{t} \right) \mathrm{rand}() \qquad (3)$$

4 Improved Bat algorithm for cluster analysis

This section presents the proposed improvements in the bat algorithm. These improvements are (i) enhanced
co-operative co-evolution for population initialization, (ii) an elitist strategy for a better convergence rate as
well as a tradeoff between local and global solutions, and (iii) a neighborhood strategy to avoid local optima and
explore good candidate solutions.

4.1 Enhanced co-operative co-evolution method

It is observed that the efficiency of a clustering algorithm also depends on the initial cluster points [42, 44].
Several initialization methods are reported to address the initial cluster selection problem [47–50]. For improving the
performance of the bat algorithm, an enhanced co-operative co-evolution framework is introduced to select initial
cluster centers. The co-operative co-evolution method works on the divide and conquer paradigm. This paradigm divides
the problem into subproblems, each subproblem is solved individually, and the final solution is obtained by combining
the subproblem solutions. Hence, in this work, a co-operative co-evolution method with a centroid selection mechanism
is proposed. This method considers the number of partitions, the size of each partition, and the selection criteria for
population initialization.

4.1.1 Population partitions and size description

This subsection describes the number of partitions and their size used to implement the co-operative co-evolution
method. The first task is to divide the population (data instances) into several predefined partitions. The number of
partitions is equal to the number of clusters (K) for a given dataset, as mentioned in Eq. 7.

$$p_n \propto K \qquad (7)$$

(9)

In Eq. 9, $ps_1, ps_2, \ldots, ps_n$ denote the subpopulations. UL represents the upper limit in a subpopulation.
Further, from each subpopulation an appropriate centroid is selected using Eq. 10.

$$C_k^{n} = \min(ps_n) + \left( \max(ps_n) - \min(ps_n) \right) \ast \mathrm{rand}(0, 1), \quad n = 1 \text{ to } K \qquad (10)$$

where $C_k$ denotes the $k$-th cluster center, $\min(ps_n)$ and $\max(ps_n)$ are the minimum and maximum values of
the $n$-th subpopulation ($ps_n$), and rand() is a random number.
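The centroid selection of Eq. 10 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The partitioning rule is an assumption, since the equation defining the subpopulation sizes is not available here: the data instances are simply split, in order, into K near-equal contiguous subpopulations. Drawing a separate random number per feature is also an assumption, and the function names `split_into_subpopulations` and `initial_centroids` are hypothetical.

```python
import random


def split_into_subpopulations(data, k):
    """Split data into k near-equal contiguous parts (assumed partitioning rule)."""
    if k < 1 or k > len(data):
        raise ValueError("k must be between 1 and the number of data instances")
    base, extra = divmod(len(data), k)
    parts, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more row
        parts.append(data[start:start + size])
        start += size
    return parts


def initial_centroids(data, k, rng=None):
    """Pick one centroid per subpopulation following the form of Eq. 10:
    min(ps_n) + (max(ps_n) - min(ps_n)) * rand(0, 1), applied per feature."""
    rng = rng or random.Random()
    centroids = []
    for sub in split_into_subpopulations(data, k):
        dims = len(sub[0])
        lo = [min(row[d] for row in sub) for d in range(dims)]
        hi = [max(row[d] for row in sub) for d in range(dims)]
        # Assumption: an independent random number is drawn for each feature.
        centroids.append([lo[d] + (hi[d] - lo[d]) * rng.random() for d in range(dims)])
    return centroids


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 2.0], [10.0, 10.0], [12.0, 14.0]]
    for centre in initial_centroids(points, k=2, rng=random.Random(0)):
        print(centre)
```

Each centroid produced this way lies inside the bounding box of its own subpopulation, so the K initial centers are spread over different parts of the data rather than being drawn from the whole range at once.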
Table 1 Summary of recent state-of-the-art works on partitional clustering

Rahman & Islam [47]
  Source of inspiration: K-means and genetic algorithm
  Shortcomings: Initial population; automatic cluster numbers
  Amendments: Proposed a hybrid algorithm based on K-means (KM) and genetic algorithm (GA). Optimized initial cluster
    centres identified using GA for KM. Optimum clustering results obtained using C-means.
  Future work: Reduction of time complexity for GenClust.

Liu et al. [48]
  Source of inspiration: Gene transposon
  Shortcomings: Automatic clustering
  Amendments: Proposed an improved variant of the clonal selection algorithm to determine the number of clusters. A
    novel operation called antibody gene transposon is introduced into the framework of the clonal selection algorithm,
    which can find a satisfactory number of clusters automatically.
  Future work: Employ more improved cluster validity indices to form antibody affinity, making it suitable for
    non-convex data sets.

Kumar & Sahoo [49]
  Source of inspiration: Artificial bee colony algorithm
  Shortcomings: Issues of ABC addressed such as food sources initial position, solution search equation and abandoned
    food
  Amendments: One-step KM method has been used for finding initial cluster centre locations.
  Future work: Other initialization methods can be explored for generating initial optimal solutions for clustering
    algorithms in
mal solutions.
Senthilnath et al. [52]
  Source of inspiration: Firefly algorithm
  Shortcomings: Applied for clustering task
  Amendments: Simulates flashing pattern and social insect behaviours of fireflies in two phases. First phase of the
    algorithm measures variation of light intensity. Second phase directs the movement of fireflies.
  Future work: Design new heuristics to recognize flashing pattern and light intensity.

Erisoglu et al. [53]
  Source of inspiration: Bi-dimensional mapping of features
  Shortcomings: Initialization issue of K-means
  Amendments: Two features are chosen. First feature is an attribute with maximum value of variation coefficient,
    called main axis. Second feature is determined using minimum correlation values between main axis and rest of
    attributes.
  Future work: Focus on bi-directional clustering

Kumar & Sahoo [54]
  Source of inspiration: MCSS and PSO
  Shortcomings: Local optima; solution search
  Amendments: Hybridized MCSS algorithm with PSO. Personal best mechanism of PSO algorithm added into magnetic charge
    system search algorithm. Neighbourhood strategy introduced to resolve local optima situation.

Zhou et al. [55]
  Source of inspiration: Social spider optimization (SSO)
  Shortcomings: Initial cluster centers; local optima; convergence rate
  Amendments: Incorporated simplex method into SSO algorithm. It enhances local search and improves convergence rate.
  Future work: Extend fitness function to explicitly optimize intra-cluster distances. Perform testing with
    higher-dimension problems and more patterns. Work to be extended for dynamically determining optimal number of
    clusters.

Boushaki et al. [56]
  Source of inspiration: Quantum chaotic cuckoo search algorithm
  Shortcomings: Initial population; search ability; convergence speed
  Amendments: Designed quantum chaotic CS algorithm for clustering task. Initial population is created using
    ergodicity and non-repetition properties of chaotic map. Also employed nonhomogeneous update mechanism for global
    search ability improvement. Incorporated boundary strategy for enhancing search procedure.
  Future work: Apply proposed approach on specific datasets. Also applying quantum theory to other recent
    meta-heuristic algorithms.
Chang et al. [57]
  Source of inspiration: GA, message-based similarity (MBS)
  Shortcomings: Initial cluster numbers; clustering performance
  Amendments: Developed GAMS algorithm for clustering problem with unknown cluster number. Decisions are made using
    domain specific knowledge by GAMS. It can find optimal number of clusters as well as cluster centres automatically.
    New similarity measure uses both distance between data point with the nearest centre and with neighbouring data
    points.
  Future work: New method will be designed to determine initial solutions that can identify the clusters in automatic
    manner.

Hatamlou [19]
  Source of inspiration: Black hole optimization
  Shortcomings: Quality clusters
  Amendments: Proposed BH for solving clustering problem. It is simple, easy to implement and free from parameter
    tuning.
  Future work: Can be applied to other areas of applications. For more effectiveness, BH can be combined with other
    approaches.

Zhang et al. [58]
  Source of inspiration: Artificial bee colony
  Shortcomings: Quality clusters
  Amendments: Developed an artificial bee colony algorithm to solve clustering problems.
  Future work: Can be combined with effective local search strategy and hybridized with other metaheuristics.
Table 1 (continued)
Hatamlou [60] | Binary search algorithm | Quality cluster; Convergence rate | Proposed novel binary search algorithm for data clustering; It finds high quality clusters and converges to the same solution in different runs; In the proposed algorithm a set of initial centroids are chosen from different parts of the test dataset and then optimal locations for the centroids are found by thoroughly exploring around of the initial centroids. | Integrate the neighbourhood structure to determine optimal population.
Bijari et al. [61] | Memory-enriched Big Bang-Big Crunch algorithm | Slow convergence speed; Local optima; Exploration and exploitation trade-off | Reported a memory-enriched BB-BC algorithm for clustering; Works in two phases that is BB and BC phase; BB phase corresponds for generation of random points near to initial seed points; BC phase corresponds for optimizing these generated points; A memory concept is integrated into BB-BC algorithm to memorize best location and balancing exploration and exploitation tasks. | BB-BC can be hybrid with k-means to overcome limitations of K-means; Can be used for multi-objective optimization; Application of ME-BB-BC in technical settings such as power dispatch systems.
Abualigah et al. [62] | Krill herd and harmony search | Local Optima; Global search ability | Proposed hybrid algorithm by combining krill herd (KH) optimization algorithm with harmony search (HS); A global exploration operator and reproduction procedure integrated in krill herd algorithm. | Investigating on benchmark function datasets.
Pakrashi & Chaudhuri [63] | K-Means and Heuristic Kalman algorithm | Convergence rate; Local Optima | Hybridized Kalman filtering algorithm with K-Means algorithm (HKA-K); Kalman filtering helps in improving convergence rate; Incorporated a conditional restart mechanism in K-Means algorithm to handle local optima situation. | Investigate HKA by hybridizing with other methods; Improving performance of HKA can be investigated; A new and effective multi-solution framework can be developed for this proposed method instead of conditional restart method.
Table 1 (continued)
Author Name | Source of Inspiration | Shortcomings | Amendments | Future Work
Kang et al. [64] | KM and mussels wandering optimization algorithm | Improve local search and global search abilities | Hybridized KM with mussels wandering optimization algorithm, called K-MWO; The proposed algorithm encompasses local search abilities of KM and global search is achieved through mussels wandering optimization algorithm. | Study new updating rules of the weight information in the proposed clustering ensemble framework; Exploring the reason why proposed method cannot perform well in some particular dataset; Develop a more influential clustering ensemble by combining with other approaches.
Wang et al. [65] | Flower Pollination Algorithm and Bee Pollinator | Diversity of population; Local Optima; Convergence Speed | Hybridized flower pollination algorithm (FPA) and bee pollinator algorithm (BPA); Global search ability of FPA is enhanced using discard pollen operator of ABC; Elite mutation and crossover operator are used for improving local search mechanism. | Extend fitness function to explicitly optimize intra-cluster distances; Investigation of proposed algorithm on higher dimensional problems and large number of patterns; Determine optimal number of clusters dynamically in proposed algorithm.
Hatamlou & Hatamlou [66] | Particle Swarm Optimization (PSO) and Heuristic Search (HS) | Local Optima; Convergence Speed | Designed a two-stage clustering approach by hybridizing PSO and HS; First stage adopts PSO algorithm for generating initial candidate solution; Second stage employs HS algorithm for improving quality of solution. | Wrapping methods can be adopted for generating initial solutions.
Yan et al. [67] | Artificial Bee Colony and GA | Convergence Speed of ABC | Proposed a Hybrid Artificial Bee Colony (HABC) algorithm to improve optimization ability of canonical ABC; Information exchange between bees is done through crossover operator of Genetic Algorithm (GA). | Finding features of functions and improving HABC for which it does not work well.
Kwedlo [68] | Differential Evolution and K-means | Effective Clustering by combining evolutionary algorithm with local optimization method | Combined differential evolution algorithm (DE) with K-means for clustering; K-means algorithm fine-tunes each candidate solution attained by mutation and crossover operators of DE; Proposed a reordering procedure to tackle redundant representation problem. | Replacement of K-means by better local search method; Reducing the runtime of DE-KM through parallelization.
Table 1 (continued)
Yin et al. [69] | Gravitational Search Algorithm and KHM | Local Optima of KHM; Slow convergence Speed of IGSA | Proposed a hybridized version of improved GSA with KHM; Convergence rate of KHM and diversity mechanism of GSA are considered. | For effective clustering, integrate better local search algorithms into KHM.
Jiang et al. [70] | Ant clustering algorithm, K-harmonic means | Initialization; Local Optima | Propose a new clustering algorithm using Ant clustering algorithm with K-harmonic means clustering (ACAKHM); Proposed algorithm utilizes merits of KHM and ACA; KHM is less sensitive to initialization and ACA can avoid trapping in local optimal solution. | Improving Ant clustering algorithm to reduce runtime of ACAKHM algorithm; Study KHM algorithm with other combinatorial optimization techniques.
Xiao et al. [71] | Quantum-inspired GA, K-means | Initial cluster centroid | Proposes a quantum-inspired genetic algorithm for k-means clustering (KMQGA); A Q-bit based representation is employed for exploration and exploitation; Variable length of a Q-bit in KMQGA is considered during evolution; KMQGA obtains optimal number of clusters and provide optimal cluster centroids. | Investigate exploring search space with small number of individuals.
Aljarah et al. [72] | Grey Wolf Optimizer, Tabu search | Local Optima; Premature convergence | Hybridized GWO with TS; Incorporated TS as an operator in GWO; TS helps balancing exploration and exploitation of GWO. | Investigation of GWOTS on synthetic datasets and arbitrary shapes datasets; Run time reduction of GWOTS using parallel computation.
Jiang & Wang [73] | PSO, bare-bone particle swarm optimization (BPSO), Cooperative Co-evolution (CC) | Convergence rate; Diversity of population | Developed CC framework to improve the performance of PSO on clustering high-dimensional datasets; To solve each sub-problem cooperatively, BPSO is employed; A new centroid-based encoding schema is designed for each particle; Chernoff bounds is applied for deciding population size. | Develop CC framework for clustering problems where numbers of clusters is not known prior; Detection of natural clusters and clustering in sub-space with CC framework.
Table 1 (continued)
Author Name | Source of Inspiration | Shortcomings | Amendments | Future Work
Kumar & Singh [74] | Cat Swarm optimization (CSO) | Unbalance exploration and exploitation processes; Lack of diversity; Local Optima; Slow convergence | Proposed an Improved CSO; New search equation is proposed for steering search towards global optimal solution; Incorporated local search method for quality solutions and overcome problem of local optima. | Multi objective variant of CSO algorithm will be developed for improving clustering quality.
Kumar & Singh [75] | Teaching learning based optimization (TLBO) | Unbalance between local and global search; Premature Convergence | Proposed chaotic version of TLBO algorithm using different chaotic mechanisms; Incorporated a local search method for improving quality of solution and better trade-off among local and global search. | Instead of local search, global search mechanism will be incorporated and investigated its performance.
Fränti & Sieranoja [76] | K-means | Initialization; Clustering accuracy | Introduced clustering basic benchmark; Performance of k-means is studied using this benchmark; Performance is measured on factors- |
The working of the co-operative co-evolution method is illustrated using the iris dataset. This dataset contains one hundred fifty data instances and four attributes. These data instances are divided into three classes. Hence, the number of clusters (K) considered for the iris dataset is three. The co-operative co-evolution method uses Eqs. (7–10), applied in three steps, to determine the initial population for the clustering algorithm in terms of cluster centers.

a) The first step divides the population into subpopulations. Equation 7 is used to compute the subpopulations.

$$p_n \propto K; \quad \text{for the iris dataset, } K = 3, \; p_n = 3$$

b) In the second step, the size of the subpopulations is computed through Eq. 8.

$$p_s = T/K, \; \text{where } T = 150 \text{ and } K = 3; \quad p_s = 150/3 = 50$$

The size of a subpopulation ($p_s$) is 50. Further, the range of each subpopulation is determined using Eq. 9.

$$p_{s1} = 1 \text{ to } \lceil p_s \rceil; \quad p_s = 50$$
$$p_{s1} = 1 \text{ to } 50, \quad LB(p_{s1}) = 1 \text{ and } UB(p_{s1}) = 50$$
$$p_{s2} = UB(p_{s1}) + 1 \text{ to } \lceil p_{s1} + p_s \rceil; \quad UB(p_{s1}) = 50, \; p_s = 50$$
$$p_{s2} = 51 \text{ to } 100; \quad LB(p_{s2}) = 51, \; UB(p_{s2}) = 100$$
$$p_{sn} = UB(p_{s(n-1)}) + 1 + \sum_{i=0}^{n-1} \lceil p_s \rceil; \quad n = 3$$
$$p_{s3} = UB(p_{s2}) + 1 + \sum_{i=0}^{2} \lceil p_s \rceil; \quad p_{s0} = p_s$$
$$p_{s3} = 101 \text{ to } 150$$

c) In the third step, Eq. 10 is used for computing the initial cluster centers for the clustering algorithm.

• When n = 1: $c_{k1} = \min(1:50) + (\max(1:50) - \min(1:50)) * rand()$ = {5.5221 4.0109 1.7333 0.5074}
• When n = 2: $c_{k2} = \min(51:100) + (\max(51:100) - \min(51:100)) * rand()$ = {6.8022 3.2681 4.9022 1.7246}
• When n = 3: $c_{k3} = \min(101:150) + (\max(101:150) - \min(101:150)) * rand()$ = {5.2810 2.4032 4.8048 1.5397}

4.2 Elitist strategy

The convergence rate depends on the searching pattern of the algorithm. To improve convergence speed, an improved elitist strategy is incorporated in the bat algorithm. According to the elitist strategy, the best positions move from the previous iteration to the next iteration. In this work, the elitist strategy is implemented in two phases: Evaluation Phase and Updating Phase.

4.2.1 Evaluation phase

In this phase, the personal best and global best positions are computed using Eqs. (11–12). The comparison operator is used to calculate the global best position, i.e. $X_{Gbest}$, and the personal best position $X_{Pbest}$.

$$X_{Pbest} = \min(\text{fitness value}) \tag{11}$$

$$X_{Gbest} = \min(\text{distance value}) \tag{12}$$

The personal best ($X_{Pbest}$) is obtained using the fitness function described in Eq. 13.

$$F(C_K) = \frac{SSE(C_K)}{\sum_{K=1}^{K} SSE(C_K)} \tag{13}$$

where SSE denotes the sum of square error and $C_K$ represents the Kth centroid object. After evaluation of the fitness function, the minimum value is selected as the personal best. In the next step, the global best ($X_{Gbest}$) is evaluated using Eq. 13, which is the minimum value of the distance function or objective function.

4.2.2 Updating phase

In this phase, the personal best and global best positions are compared with the previous iteration values. If the current values are better than the previous values, then the positions are updated using Eqs. (14–15). Otherwise, the previous values are retained.

$$X_{Pbest} = \begin{cases} X_{Pbest}^{t-1} = X_{Pbest}^{t}, & \mathrm{fit}(t) \le \mathrm{fit}(t-1) \\ X_{Pbest}^{t} = X_{Pbest}^{t-1}, & \mathrm{fit}(t) \ge \mathrm{fit}(t-1) \end{cases} \tag{14}$$

$$X_{Gbest} = \begin{cases} X_{Gbest}^{t-1} = X_{Gbest}^{t}, & s(t) \le s(t-1) \\ X_{Gbest}^{t} = X_{Gbest}^{t-1}, & s(t) \ge s(t-1) \end{cases} \tag{15}$$

To achieve an optimum trade-off among search mechanisms, the basic frequency, velocity and search equations of the bat algorithm are modified using Eqs. (16–19).
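The modified update rules referred to here as Eqs. (16–19) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `update_bat`, the use of a NumPy random generator, and drawing the random term of Eq. 18 independently per dimension are all assumptions, as is reading r_i as the pulse emission rate of the ith bat. The minimum and maximum values associated with the global best position and the maximum fitness associated with the personal best are passed in as plain numbers, because the text does not spell out how they are computed.

```python
import numpy as np


def update_bat(x_gbest, x_pbest, v_prev, r_i,
               gbest_min, gbest_max, pbest_max, rng):
    """One update of a single bat following Eqs. (16)-(19).

    gbest_min, gbest_max and pbest_max are supplied by the caller
    (assumption: the paper does not detail their computation).
    """
    beta = rng.random()  # random value in [0, 1)
    # Eq. (16): frequency of the bat
    freq = (gbest_min + (gbest_max - gbest_min) * beta) / pbest_max
    # Eq. (17): velocity pulled by the gap between global and personal best
    v_new = v_prev + (x_gbest - x_pbest) * freq
    # Eq. (18): random step in [-1, 1) around the global best
    # (assumption: one draw per dimension)
    x_new = x_gbest + rng.uniform(-1.0, 1.0, size=x_gbest.shape)
    # Eq. (19): keep the global best or move to the perturbed position
    if rng.random() > r_i:
        x_next = x_gbest.copy()
    else:
        x_next = x_new + v_new
    return x_next, v_new, freq
```

Passing `r_i` below zero always selects the first branch of Eq. 19, and a value above one always selects the second, which is a convenient way to exercise both cases when checking the sketch.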
$$f_i^t = \frac{\min(X_{Gbest}^t) + \left(\max(X_{Gbest}^t) - \min(X_{Gbest}^t)\right)\beta}{\max(X_{Pbest}^t)} \tag{16}$$

$$v_i^t = v_i^{t-1} + \left(X_{Gbest}^t - X_{Pbest}^t\right) f_i^t \tag{17}$$

$$X_{new} = X_{Gbest}^t + rand_i[-1, 1] \tag{18}$$

$$X_i^t = \begin{cases} X_{Gbest} & \text{if } rand > r_i \\ X_{new} + v_i^t & \text{otherwise} \end{cases} \tag{19}$$

where $f_i^t$ represents the frequency of the ith bat, $v_i^t$ denotes the velocity of the ith bat, and $X_i^t$ represents the position of the ith bat; $\min(X_{Gbest}^t)$ and $\max(X_{Gbest}^t)$ denote the minimum and maximum values of the sum function associated with the $X_{Gbest}^t$ position; $\max(X_{Pbest}^t)$ represents the maximum value of the fitness function associated with the personal best position; and β denotes a random value in [0, 1].

4.3 Q-learning based neighborhood search mechanism

This subsection describes the Q-learning based neighborhood search for handling the local optima problem of the bat algorithm. The performance of clustering algorithms can be degraded by the local optima issue [78]. Various strategies have been developed in the literature for avoiding the local optima issue [79, 80]. This work also presents a Q-learning based neighborhood search mechanism for effectively handling the local optima issue of clustering algorithms. The proposed Q-learning based concept works in two steps: Identification Step and Evaluation Step. The first step determines the neighborhood boundary and the neighboring data objects, whereas the second step evaluates the updated position of the initial cluster points through the Q-learning concept. Fig. 1(a-c) illustrates the Q-learning based neighborhood search mechanism.

4.3.1 Identification step

This step determines the neighboring data points of the initial cluster centers, as shown in Fig. 1(a). The Euclidean distance measure is considered for evaluating the neighboring data points. In this work, the number of neighboring data objects is set to 5. Hence, the five data objects with minimum Euclidean distance are selected as the neighboring data points of a given cluster center, as shown in Fig. 1(b). Let $X_i$ represent the ith cluster center and $X_{i,neigh}$ represent the set of neighboring data points of the ith cluster center. $X_{i,neigh}$ is described as $X_{i,neigh} = \{X_{i,1}, X_{i,2}, \ldots, X_{i,5}\}$, where neigh = 1 to 5.
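The identification step amounts to a nearest-neighbour query around each cluster center. The following is a minimal sketch of that idea, assuming the data are held in a NumPy array with one row per data object; the function name `nearest_neighbors` and its default `k=5` (mirroring the five neighbours used in this work) are illustrative choices rather than part of the original method description.

```python
import numpy as np


def nearest_neighbors(center, data, k=5):
    """Return indices and rows of the k data objects closest to `center`.

    Closeness is measured with the Euclidean distance, as in the
    identification step; ties are resolved by row order (assumption).
    """
    dists = np.linalg.norm(data - center, axis=1)
    idx = np.argsort(dists, kind="stable")[:k]
    return idx, data[idx]
```

Calling the function once per cluster center yields the sets written as X_{i,neigh} in the text.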
4.3.2 Evaluation phase

This step determines the updated position of the initial cluster points, as shown in Fig. 1(c). In this step, Q-learning [81] is used instead of the arithmetic mean to compute the position of the neighborhood data items. The Q-learning algorithm follows a simple procedure until a good quality solution is obtained. It consists of initializing the Q-table, choosing an action, performing the action, measuring the rewards and updating the Q-table. The Q[s, a] of neighbouring data points is updated using Eq. 20:

$$Q^{new}(s, a) = Q(s, a) + \alpha\left[R(s, a) + \gamma \max Q'(s', a') - Q(s, a)\right] \tag{20}$$

where $Q^{new}(s, a)$ represents the new Q-value for state (s) and action (a), $Q(s, a)$ gives the current value, α is the learning rate, $R(s, a)$ represents the reward for taking an action in a state, γ is the discount rate, and $\max Q'(s', a')$ represents the maximum expected future reward.

[…] evolution (described in subsection 4.1) method is used to select the initial centroids. The other parameters, such as loudness, emission rate and random variable, are specified.

• Evaluation and Assignment Step: This step evaluates the objective function and allocates objects to the nearest clusters. The Euclidean distance acts as the objective function, and objects are allocated to the nearest clusters using the minimum Euclidean distance. Moreover, a Q-learning based neighborhood search mechanism is incorporated to overcome the local optima situation. A limit operator is applied to detect the local optima situation. If the candidate solution is not improved within the predefined limit operator range, then it is assumed that the algorithm is trapped in local optima and the neighborhood mechanism is invoked.

• Updation Step: This step updates the positions of the bats through the search mechanism. The emission rate of the bat algorithm is compared with a random function. If the random function provides a smaller value than the emission rate, then the neighborhood value is accepted. Otherwise, a new value is calculated using the parameter variations, i.e., loudness and emission rate. If a termination criterion is met, then the algorithm stops and the final solution is obtained. Otherwise, it repeats phase 2–3.
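The Q-table update of Eq. 20, which the neighborhood search relies on, can be sketched as below. This is a minimal sketch in the standard tabular Q-learning form, where the maximum is taken over the actions available in the next state. Representing states and actions as integer indices into a NumPy array is an assumption, since the text does not specify how neighbouring data points are mapped to states and actions; the default values of `alpha` and `gamma` are placeholders, not values taken from the paper.

```python
import numpy as np


def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to q_table[s, a] in place (Eq. 20).

    q_table has shape (n_states, n_actions); alpha is the learning
    rate and gamma the discount rate (both hypothetical defaults).
    """
    best_next = np.max(q_table[s_next])        # max expected future reward
    td_error = reward + gamma * best_next - q_table[s, a]
    q_table[s, a] += alpha * td_error
    return q_table[s, a]
```

Because the table is modified in place, repeated calls accumulate experience across the initialise, act, reward and update cycle described above.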
4.4 Time complexity
Table 2 Summary of non-healthcare and healthcare datasets
Sr. No. Data sets K D N Description
5.1.1 Comparison of simulation results of proposed BAT and standard/well-known clustering algorithms
Table 3 Simulation results of proposed BAT and standard clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Datasets Measure Standard/ Well-known Clustering Algorithms
K-means PSO ACO ABC DE GA BB-BC BAT IBAT
Iris Intra 9.20E+01 9.86E+01 1.01E+02 1.08E+02 1.21E+02 1.25E+02 9.68E+01 1.15E+02 9.16E+01
SD 2.67E+01 4.67E-01 1.31E+00 3.63E+00 5.23E+00 1.46E+01 2.22E+00 3.76E+01 2.12E+01
Rank 2 4 5 6 8 9 3 7 1
Glass Intra 3.79E+02 2.76E+02 2.19E+02 3.29E+02 3.62E+02 2.82E+02 6.64E+02 3.75E+02 1.96E+02
SD 7.05E+01 8.59E+00 3.36E+00 1.14E+01 1.21E+01 4.14E+00 6.89E+01 4.29E+00 1.98E+00
Rank 8 3 2 5 6 4 9 7 1
Wine Intra 1.81E+04 1.64E+04 1.62E+04 1.69E+04 1.58E+04 1.65E+04 1.67E+04 1.71E+04 1.61E+04
SD 9.06E+02 8.55E+01 3.69E+01 4.74E+02 5.60E+01 7.84E+01 2.88E+00 5.66E+01 3.54E+01
Rank 9 4 3 7 1 5 6 8 2
Ionosphere Intra 2.42E+03 1.00E+03 8.16E+02 1.11E+03 1.13E+03 1.00E+03 1.07E+03 1.33E+03 8.01E+02
SD 4.55E+02 3.34E+02 4.48E+02 2.61E+02 3.17E+02 4.13E+02 2.99E+02 2.34E+02 1.53E+02
Rank 9 3 2 6 7 4 5 8 1
Control Intra 1.01E+06 4.18E+04 2.39E+04 5.12E+04 5.23E+04 4.62E+04 2.38E+04 2.68E+04 2.39E+04
SD 5.05E+03 1.02E+03 1.71E+02 1.32E+03 9.16E+02 1.58E+03 1.09E+02 1.78E+02 1.24E+02
Rank 9 5 3 7 8 6 1 4 2
Vowel Intra 1.60E+05 1.58E+05 1.89E+05 1.70E+05 1.81E+05 1.59E+05 1.94E+05 1.96E+05 1.49E+05
SD 4.52E+03 2.88E+03 2.58E+03 4.64E+03 2.86E+03 3.11E+03 2.44E+04 3.98E+03 1.15E+03
Rank 4 2 6 5 7 3 8 9 1
Balance Intra 1.20E+05 6.20E+04 5.94E+04 6.61E+04 6.78E+04 6.91E+04 5.96E+04 6.02E+04 5.01E+04
SD 9.28E+03 4.01E+03 7.56E+02 6.79E+02 5.25E+03 5.62E+03 3.72E+02 8.26E+02 3.59E+02
Rank 9 8 2 5 6 7 3 4 1
Crude oil Intra 2.91E+02 2.86E+02 2.47E+02 2.81E+02 3.69E+02 2.81E+02 2.77E+02 2.89E+02 2.51E+02
SD 2.63E+01 1.14E+01 7.11E+00 1.09E+01 2.33E+01 8.14E+00 1.17E+02 1.76E+01 1.06E+02
Rank 9 6 1 5 3 7 4 8 2
Average Rank 7.4 4.4 3 5.8 5.8 5.6 4.9 6.9 1.4
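The Rank rows of Table 3 appear to order the algorithms by ascending intra-cluster distance, with the Average Rank row taking the mean over the eight datasets. The sketch below reproduces that reading for the Iris row and for the IBAT column; it is an illustration under that assumption, and the helper `rank_ascending` and its tie-breaking by column order are hypothetical, since the text does not state how ranks or ties were computed.

```python
import numpy as np

ALGORITHMS = ["K-means", "PSO", "ACO", "ABC", "DE", "GA", "BB-BC", "BAT", "IBAT"]

# Intra-cluster distances for Iris, copied from Table 3.
iris_intra = np.array([9.20e1, 9.86e1, 1.01e2, 1.08e2, 1.21e2,
                       1.25e2, 9.68e1, 1.15e2, 9.16e1])


def rank_ascending(values):
    """Rank 1 goes to the smallest value; ties follow column order."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


iris_ranks = rank_ascending(iris_intra)

# Published IBAT ranks across the eight datasets of Table 3.
ibat_ranks = np.array([1, 1, 2, 1, 2, 1, 1, 2])
ibat_average_rank = ibat_ranks.mean()
```

For the Iris row this ordering agrees with the published ranks, and the mean of the IBAT ranks rounds to the 1.4 reported in the Average Rank row.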
The convergence behaviour of the proposed BAT, BAT, BB-BC, GA, DE, ABC, ACO, PSO and K-means clustering algorithms is shown in Fig. 3(a-h). In this graphical illustration, the X-axis labels the number of iterations and the Y-axis labels the intra-cluster distance. It is observed that the proposed BAT algorithm converges on minimum values except for the balance and control datasets. In most respects, the proposed algorithm provides a better convergence rate. Hence, it is stated that the proposed BAT outperforms the other well-known clustering algorithms.

5.1.2 Comparison of simulation results of proposed BAT and existing hybrid clustering algorithms

This subsection discusses the simulation results of the proposed BAT for benchmark clustering datasets and compares them with existing hybridized clustering algorithms. The performance of the proposed algorithm is compared with six hybridized clustering algorithms. Table 6 demonstrates the simulation results of H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO and the proposed BAT algorithm using average intra cluster distance (intra) and standard deviation (SD). It is observed that the proposed BAT algorithm obtains the minimum intra cluster distance for iris (9.16E+01), glass (1.96E+02), wine (1.61E+04), ionosphere (8.01E+02), control (2.39E+04), balance (5.01E+04) and crude oil (2.51E+02). For the vowel dataset, ICMPKHM has a lower intra cluster value (1.47E+05) than the proposed algorithm. The standard deviation values are also minimum for the proposed BAT algorithm on the iris (2.12E+01) and wine (3.54E+01) datasets. It gives favourable results in most cases when compared to the other hybridized clustering algorithms. The accuracy results of the proposed BAT algorithm and the other hybridized clustering algorithms are given in Table 7. It is noticed that the proposed BAT algorithm provides more accurate results for iris (93.00), wine (76.01), ionosphere (71.94), control (75.30), vowel (67.11), and crude oil (76.64). For the glass and balance datasets, PSO-BB-BC performs better with accuracy values of 69.52 and 89.21, followed by the proposed BAT with accuracy rates of 69.17 and 88.92, respectively. Moreover, the rand index is also computed to prove its effectiveness in the clustering field. Table 8 demonstrates the simulation results of the proposed BAT algorithm and the other hybridized clustering algorithms using the rand index parameter for benchmark clustering datasets. The proposed BAT algorithm obtains better rand index results for the wine (0.374), ionosphere (0.319), glass (0.427), control (0.799), balance (0.574) and crude oil (0.074) datasets as compared to the hybridized variants of clustering algorithms, while H-KHA achieves a better rand index (0.734) for iris and PSO-BB-BC (0.852) for the vowel dataset. From the results, it can be stated that the proposed BAT algorithm is competitive with other hybridized variants of clustering techniques over benchmark clustering datasets.

5.1.3 Comparison of simulation results of proposed BAT and recently reported clustering algorithms

The performance of the proposed BAT algorithm is also compared with recent clustering algorithms. Table 9 demonstrates the simulation results of VS, MBOA, WOA, ICSO, Chaotic TLBO and the proposed BAT algorithm using average
Fig. 3 (a-h) Convergence behaviour of IBAT, BAT, BB-BC, GA, DE, ABC, ACO, PSO and K-means algorithm
intra cluster distance parameter and standard deviation. It is observed that the proposed BAT algorithm obtains the minimum intra cluster distance for iris (9.16E+01), glass (1.96E+02), wine (1.61E+04), ionosphere (8.01E+02), control (2.39E+04), balance (5.01E+04) and crude oil (2.51E+02). The standard deviation values are minimum for the proposed BAT algorithm on the iris (2.12E+01), glass (1.98E+00), ionosphere (1.53E+01), control (3.62E+01), vowel (1.15E+02) and crude oil (1.06E+02) datasets. For the wine (3.54E+01) and balance (3.59E+02) datasets, the proposed algorithm is second best after Chaotic TLBO. From the results, it is concluded that the proposed BAT is competitive and outperforms the other hybridized clustering algorithms in most cases. Table 10 demonstrates the accuracy results of the proposed BAT algorithm and recent clustering algorithms for benchmark clustering datasets. It is noticed that the proposed BAT algorithm provides more accurate results for wine (76.01), ionosphere (71.94), control (75.30), vowel (67.11), balance (88.92) and crude oil (76.64). The exceptions are the iris and glass datasets, for which Chaotic TLBO is reported to perform better with accuracy values of 91.19 and 69.52, followed by the proposed BAT with accuracy values of 93.00 and 69.17. Additionally, the rand index is also calculated to verify its efficacy in the clustering field. Table 11 illustrates the simulation results of the proposed BAT algorithm and recent clustering algorithms using the rand index parameter for benchmark clustering datasets. The proposed BAT algorithm obtains better rand index results for most of the datasets, that is iris (0.72), glass (0.427), wine (0.374), control (0.799), vowel (0.846), and balance (0.534), as compared to recent clustering algorithms, whereas ICSO achieves a better rand index (0.319) for ionosphere and MBOA achieves a higher rand index (0.078) for the crude oil dataset. The results show that the proposed BAT is proficient and outperforms recent clustering algorithms.

5.2 Experiment 2: healthcare datasets

This subsection presents the simulation results of the proposed BAT clustering algorithm for healthcare datasets.
Fig. 3 (continued)
5.2.1 Comparison of simulation results of proposed BAT and standard/well-known clustering algorithms

The performance comparison of the proposed BAT algorithm with the K-means, PSO, ACO, ABC, DE, GA, BB-BC, and BAT algorithms is presented in Table 12. The results are evaluated in terms of average intra cluster distance (intra), standard deviation (SD) and rank. Four healthcare datasets are considered to test and compare the performance of the proposed BAT with well-known clustering algorithms. Simulation results show that the proposed BAT algorithm obtains the minimum intra cluster distance values for all the considered healthcare datasets, that is CMC (5.52E+03), LD (2.31E+02), WBC (2.89E+03), and thyroid (2.51E+02), as compared to well-known clustering algorithms. The standard deviation is also computed for assessing the efficiency of the algorithms; it represents the dispersion of data objects within a cluster. In most cases, the standard deviation is lower for the proposed BAT algorithm than for the rest of the algorithms. The accuracy results of the proposed BAT algorithm and other well-known clustering algorithms are illustrated in Table 13. Results show that the proposed BAT algorithm gives higher accuracy for CMC with value 48.21, WBC with 96.61 and thyroid with 71.98. In the case of the LD dataset, PSO gives better results with an accuracy value of 54.05. Even then, the proposed algorithm, with an accuracy value of 54.02, outperforms the rest of the well-known clustering algorithms on the LD dataset. So, it is concluded that the proposed BAT algorithm gives more accurate clustering results for the considered healthcare datasets. Table 14 presents the simulation results of the proposed BAT algorithm and other well-known clustering algorithms using the rand index measure for healthcare datasets. It is seen from the results that the proposed BAT algorithm obtains better results as compared to other clustering algorithms for the CMC, LD, WBC and thyroid datasets with values 0.28, 0.492, 0.276 and 0.383, respectively. Hence, the proposed BAT algorithm is considered to be a proficient algorithm for cluster analysis.
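For readers unfamiliar with the rand index used in these comparisons, the following is a minimal sketch of its standard pairwise definition: the fraction of object pairs on which the predicted clustering and the reference labels agree (both place the pair together, or both place it apart). The function name `rand_index` is illustrative, and this section does not state whether the paper uses this plain form or an adjusted variant, so the sketch should not be taken as the exact measure behind the reported values.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Plain Rand index over all pairs of objects (needs >= 2 objects)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred   # pair treated consistently
        total += 1
    return agree / total
```

The measure is invariant to how clusters are numbered, so a clustering that matches the classes up to relabelling scores 1.0.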
Table 6 Simulation results of proposed BAT and hybrid clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Datasets Measure Hybrid Clustering Algorithms
H-KHA MEBBC IKH ICMPKHM PSO-BB-BC CBPSO IBAT
using intra cluster distance parameter for healthcare datasets

Figure 4(a-d) shows the convergence behavior of IBAT and well-known clustering algorithms (BAT, BB-BC, GA, DE, ABC, ACO, PSO and K-means). In the graphical illustration, the X-axis labels the number of iterations and the Y-axis labels the intra-cluster distance. It is observed that the IBAT algorithm converges on minimum values for all the considered healthcare datasets. The proposed algorithm provides a better convergence rate in most of the cases. It is stated that IBAT outperforms the other clustering algorithms for the considered healthcare datasets.

5.2.2 Comparison of simulation results of proposed BAT and existing hybrid clustering algorithms

This subsection presents the performance of the proposed algorithm compared with six hybridized clustering algorithms. Table 15 demonstrates the simulation results of H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO and the proposed BAT algorithm using average intra cluster distance (intra), standard deviation (SD) and rank measures. From the results, it is observed that the proposed BAT algorithm attains the minimum intra cluster distance in all the considered healthcare datasets, with values of 5.52E+03 for CMC, 2.31E+02 for LD, 2.89E+03 for WBC and 2.51E+02 for thyroid, as compared to other hybridized clustering algorithms. For the vowel dataset, H-KHA has a lower intra cluster value than the proposed algorithm. The proposed BAT algorithm also gives the minimum value of standard deviation for most of the cases in comparison to the hybridized algorithms on the considered healthcare datasets. Table 16 demonstrates the average-case accuracy results of the proposed BAT algorithm and other hybridized clustering algorithms for healthcare datasets. It is perceived that the proposed BAT algorithm provides more accurate results as compared to other hybridized clustering algorithms. The proposed BAT gives higher accuracy values for CMC (48.21) and LD (54.02). On WBC (IBAT = 96.61, CBPSO = 96.89) and thyroid (IBAT = 71.98, CBPSO = 72.21), CBPSO is marginally ahead of the proposed BAT, whose values remain higher than those of the other hybridized clustering algorithms. It is stated that the proposed BAT provides more accurate results compared with other hybridized clustering algorithms. The rand index is also computed for healthcare datasets to prove its effectiveness in clustering. Table 17 shows the simulation results of the proposed BAT algorithm and other hybridized clustering algorithms for healthcare datasets using the rand index measure. It is noticed that the proposed BAT algorithm gives better rand index results on CMC with value 0.280, WBC with value 0.257 and thyroid with 0.383, while the proposed BAT gives an identical rand index value to MEBBC for LD (0.496). Thus, it is indicated that the proposed BAT obtains better results than other hybridized variants of clustering algorithms for the considered healthcare datasets.

5.2.3 Comparison of simulation results of proposed BAT and recently reported clustering algorithms

This subsection presents the performance comparison of the proposed algorithm with recent clustering algorithms.
Table 12 Simulation results of proposed BAT and standard clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Datasets Measure Standard/Well-known Clustering Algorithms
K-means PSO ACO ABC DE GA BB-BC BAT IBAT
CMC Intra 5.59E+03 5.85E+03 5.83E+03 5.94E+03 5.95E+03 5.76E+03 5.71E+03 5.79E+03 5.52E+03
SD 6.76E+00 4.89E+01 1.23E+02 1.31E+02 8.69E+01 5.04E+01 2.86E+01 3.67E+01 5.39E+00
Rank 2 7 6 8 9 4 3 5 1
LD Intra 1.17E+04 2.39E+02 2.41E+02 9.85E+03 1.15E+04 5.44E+02 2.32E+02 2.36E+02 2.31E+02
SD 6.68E+02 2.88E+01 1.64E+01 8.20E+02 2.07E+03 4.18E+01 2.41E+01 1.52E+01 1.67E+01
Rank 9 4 5 7 8 6 2 3 1
WBC Intra 1.93E+04 4.26E+03 3.37E+03 3.50E+03 3.73E+03 3.00E+03 2.96E+03 3.06E+03 2.89E+03
SD 5.14E-12 2.08E+02 4.17E+01 2.12E+02 1.84E+02 2.25E+02 5.57E+02 1.98E+02 1.33E+02
Rank 9 8 5 6 7 4 2 3 1
Thyroid Intra 2.39E+03 1.11E+04 1.99E+03 1.98E+03 2.96E+02 1.22E+04 1.94E+03 3.85E+02 2.51E+02
SD 2.46E+02 2.71E+01 3.09E+01 2.23E+02 2.06E+01 3.26E+01 1.95E+02 2.28E+01 1.32E+01
Rank 7 8 6 5 3 9 4 2 1
Average Rank 6.8 6.8 5.5 6.5 6.8 5.8 2.8 3.3 1
Table 18 demonstrates the simulation results of VS, MBOA, WOA, ICSO, Chaotic TLBO and the proposed BAT algorithm using average intra cluster distance (intra), standard deviation (SD) and rank measures for healthcare datasets. From the results, it is seen that the proposed BAT algorithm attains the minimum intra cluster distance for LD (2.31E+02), WBC (2.89E+03) and thyroid (2.51E+02) as compared to recent clustering algorithms. For the CMC dataset, MBOA gives a lower intra cluster distance value (5.21E+03) than the proposed algorithm. The proposed BAT algorithm also gives the minimum value of standard deviation for almost all the considered healthcare datasets; the exception is thyroid, for which ICSO gives the minimum standard deviation value (1.16E+01), followed by the proposed BAT (1.32E+01). The average-case accuracy results of the proposed BAT algorithm and recent clustering algorithms for healthcare datasets are illustrated in Table 19. It is noticed that the proposed BAT algorithm provides more accurate results as compared to recent clustering algorithms. The proposed BAT gives higher accuracy values for CMC as 48.21, LD
as 54.02, WBC as 96.61 and Thyroid as 71.98. From the results, it is indicated that IBAT provides more accurate results as compared to recent clustering algorithms. To demonstrate the effectiveness of the proposed algorithm in clustering healthcare datasets, the Rand index is computed. The simulation results of the proposed BAT algorithm and recent clustering algorithms using the Rand index measure for healthcare datasets are shown in Table 20. It is seen that the proposed BAT algorithm gives better Rand index results on WBC (0.276) and thyroid (0.383) as compared to recent clustering algorithms. For the CMC dataset, ICSO gives the better Rand index value (0.283), followed by the proposed BAT (0.280), and chaotic TLBO obtains a better Rand index rate of 0.498. Thus, the proposed BAT is competent and obtains better results for most of the considered healthcare datasets.

5.3 Statistical test

This subsection describes the statistical test used to determine the best performing algorithm among the proposed IBAT and other existing clustering algorithms. Statistical tests are used to validate a new algorithm and to determine whether a significant difference occurs among the performances of the new algorithm and existing algorithms [87–89]. In this work, the Friedman statistical test is considered for identifying the best performing algorithm among all. To perform the statistical test, two hypotheses (H0 and H1) are designed at the significance level 0.05. Hypothesis H0 corresponds to no significant difference among the performances of the new algorithm and the rest of the algorithms. Hypothesis H1 corresponds to a significant difference among the performances of the new algorithm and the rest of the algorithms. If no significant difference occurs, then H0 is not rejected and the proposed algorithm (IBAT) is said to perform similarly to the other algorithms. Otherwise, H0 is rejected in favour of H1, which indicates that a significant difference occurs between the performances of the newly proposed algorithm and the rest of the algorithms. In that case, the new algorithm can be regarded as performing differently from, and possibly better than, the existing algorithms. Hence, statistical tests are widely adopted for analyzing the performance of newly proposed algorithms, and the test results give a clear idea about the better performing algorithm. In the first step of the Friedman test, a rank is assigned to each algorithm on each dataset, and then the average rank of each algorithm is computed over all datasets. The ranking of each algorithm with each dataset is reported in Table 21. The ranking of algorithms is computed using
Fig. 4 (a-d) Convergence behaviour of IBAT, BAT, BB-BC, GA, DE, ABC, ACO, PSO and K-means algorithm
Table 15 Simulation results of proposed BAT and hybrid clustering algorithms using intra cluster distance (intra) and standard deviation (SD)
measures
Datasets Measure Hybrid Clustering Algorithms
H-KHA MEBBC IKH ICMPKHM PSO-BB-BC CBPSO IBAT
accuracy measure, and the average ranking of each technique is also given. It is seen that the proposed BAT (IBAT) algorithm obtains an average rank of 2.1, which is the best rank among all algorithms, while the ACO algorithm obtains the worst rank (18). It is also noticed that chaotic TLBO achieves the second best rank (5.1) among all algorithms. The statistical results of the Friedman test are shown in Table 22. The Friedman statistic is 63.6531, the degree of freedom is 19, and the critical value is 30.1435 at the significance level of 0.05. The p value computed for the Friedman test is 1.01E-06. On the analysis of the statistical results, the p value is considerably less than the significance level, and the Friedman statistic exceeds the critical value. Hence, hypothesis H0 is rejected and a significant difference occurs between the performance of the proposed BAT (IBAT) and other existing algorithms. These results support the claim that the proposed algorithm (IBAT) performs better than other existing clustering algorithms. Moreover, a posthoc test is also conducted to determine the possible grouping of similar algorithms. The results of the posthoc test are presented in Table 23. The symbol “+” indicates a significant difference between the performances of algorithms, while the symbol “-” indicates no significant difference. From the analysis of the posthoc test, several algorithms exhibit similar performance and can be clubbed into a single group. In turn, twelve groups of algorithms with similar performance are determined according to the posthoc test. Group 1 consists of K-means, PSO, ABC, DE, GA, BB-BC, BAT, MEBBC, IKH, ICMPKHM, CBPSO, VS, MBOA, and WOA. Group 2 consists of K-means, PSO, GA, BB-BC, BAT, H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO, VS, MBOA, WOA, ICSO, and Chaotic TLBO. Group 3 consists of ACO, ABC, DE, GA, and BAT. Group 4 consists of K-means, ACO, ABC, DE, GA, BB-BC, BAT, MEBBC, IKH, VS, MBOA, and WOA. Group 5 consists of K-means, ACO, ABC, DE, GA, BB-BC, BAT, MEBBC, IKH, VS, MBOA, CBPSO, and WOA. Group 6 consists of K-means, PSO, ACO, ABC, DE, GA, BB-BC, BAT, MEBBC, IKH, VS, MBOA, CBPSO, and WOA. Group 7 consists of PSO, H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC,
K-means - + - - - - - + -
PSO - + + + - - - - -
ACO + + - - - + - + +
ABC - + - - - - - + -
DE - + - - - - - + -
GA - - - - - - - + -
BB-BC - - - - - - - + -
BAT - - - - - - - + -
H-KHA + - + + + + + + -
MEBBC - - + - - - - - -
IKH - - + - - - - - - -
ICMPKHM - - + + + + + - -
PSO-BB-BC + - + + + + + + - -
CBPSO - - + + - - - - - -
VS - - + - - - - - - -
MBOA - - + - - - - - - -
WOA - - + - - - - - - -
ICSO - - - - - - - - - -
Chaotic TLBO + - + + + + + + - +
IBAT + + + + + + + + + +

Algorithm IKH ICMPKHM PSO-BB-BC CBPSO VS MBOA WOA ICSO CTLBO IBAT
K-means - + - - - - - + + +
PSO - - - - - - - - - +
ACO + + + + + + + + + +
ABC - + + + - - - + + +
DE - + + - - - - + + +
GA - + + - - - - + + +
BB-BC - - + - - - - + + +
BAT - + + - - - - + + +
H-KHA - - - - - - - - - +
MEBBC - - - - - - - + + +
IKH - - - - - - + + +
ICMPKHM - - - - - - - - +
PSO-BB-BC - - - - - - - - +
CBPSO - - - - - - - - +
VS - - - - - - + + +
MBOA - - - - - - + + +
WOA - - - - - - + + +
ICSO - - - - + + + - +
Table 23 (continued)
Algorithm IKH ICMPKHM PSO-BB-BC CBPSO VS MBOA WOA ICSO CTLBO IBAT
Chaotic TLBO + - - - + + + - +
IBAT + + + + + + + + +
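The Friedman test procedure described in Section 5.3 (rank the algorithms on each dataset, average the ranks, compute the statistic, and compare it with the critical value) can be sketched as follows. This is an illustrative sketch, not the authors' implementation. It uses the standard Friedman chi-square formula, and the function names and the toy rank matrix are hypothetical. The statistic 63.6531 is the value reported in the text, and 30.1435 is the chi-square critical value for 19 degrees of freedom at the 0.05 level.

```python
def friedman_statistic(rank_rows):
    """Friedman chi-square statistic from a ranks matrix.

    rank_rows[i][j] is the rank of algorithm j on dataset i.
    Returns (statistic, degrees_of_freedom).
    """
    n = len(rank_rows)        # number of datasets
    k = len(rank_rows[0])     # number of algorithms
    mean_ranks = [sum(col) / n for col in zip(*rank_rows)]
    sum_sq = sum(r * r for r in mean_ranks)
    stat = 12.0 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)
    return stat, k - 1


def reject_h0(statistic, critical_value):
    """H0 (no difference among algorithms) is rejected when the statistic
    exceeds the chi-square critical value at the chosen significance level."""
    return statistic > critical_value


# Hypothetical toy example: 3 algorithms ranked identically on 4 datasets.
toy_ranks = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
toy_stat, toy_df = friedman_statistic(toy_ranks)

# Decision for the values reported in the text (df = 19, alpha = 0.05).
reported_stat, reported_critical = 63.6531, 30.1435
decision = reject_h0(reported_stat, reported_critical)
```

With the reported values the statistic exceeds the critical value, so H0 is rejected, in agreement with the conclusion drawn from the p value.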
of datasets. Furthermore, the Friedman statistical test is also applied to determine the best performing algorithm. The statistical results showed that hypothesis (H0) is rejected at the significance level of 0.05. In turn, a significant difference occurs between the performance of the proposed IBAT and other clustering algorithms. Hence, it is stated that the proposed IBAT algorithm performs better than the rest of the clustering algorithms. Finally, it is also concluded that the proposed IBAT algorithm is a robust and effective algorithm for handling the data clustering task.

Abbreviations ABC: Artificial Bee Colony; ACA: Ant Clustering Algorithm; ACDE: Automatic Clustering Differential Evolution; ACO: Ant Colony Optimization; BATC: Bat Algorithm based Clustering; BB-BC: Big Bang–Big Crunch; CABC: Cooperative Artificial Bee Colony; CCSSA: Chaotic Charge System Search Algorithm; CPSO: Cooperative Particle Swarm Optimization; CS: Cuckoo Search; CSO: Cat Swarm Optimization; CSS: Charge System Search; DCPSO: Dynamic Clustering Particle Swarm Optimization; DE: Differential Evolution; FA: Firefly algorithm; FPAC: Flower Pollination Algorithm based Clustering; GA: Genetic Algorithm; GAMS: Genetic Algorithm with Message-based Similarity; GTCSA: Gene Transposon based Clone Selection Algorithm; GWA: Grey Wolf Algorithm; GWO: Grey Wolf Optimizer; HABC: Hybrid Artificial Bee Colony; HBMO: Honey Bee Mating Optimization; KH: Krill Herd; KHM: K-harmonic Means; K-MWO: K-means and Mussels Wandering Optimization; HS: Harmony Search; IBAT: Improved Bat; ICSO: Improved Cat Swarm Optimization; ILS: Iterated Local Search; MCSS: Magnetic Charge System Search; MO: Magnetic Optimization; PSO: Particle Swarm Optimization; SA: Simulated Annealing; TLBO: Teaching learning Based Optimization; TS: Tabu Search; VGA: Variable-string-length Genetic Algorithm; MBOA: Modified Butterfly Optimization Algorithm; WOA: Whale Optimization Algorithm; Chaotic TLBO: Chaotic Teaching Learning based optimization; VS: Vortex Search

References
1. Kushwaha N, Pant M, Kant S, Jain VK (2018) Magnetic optimization algorithm for data clustering. Pattern Recogn Lett 115:59–65
2. Kant S, Ansari IA (2016) An improved K means clustering with Atkinson index to classify liver patient dataset. International Journal of System Assurance Engineering and Management 7(1):222–228
3. Aggarwal CC, Reddy CK (2014) Data clustering. Algorithms and applications. Chapman & Hall/CRC Data mining and Knowledge Discovery series, London
4. Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678
5. Chang D-X, Zhang X-D, Zheng C-W (2009) A genetic algorithm with gene rearrangement for K-means clustering. Pattern Recogn 42(7):1210–1222
6. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol 96, no 34, pp 226–231
7. Scheunders P (1997) A genetic c-means clustering algorithm applied to color image quantization. Pattern Recogn 30(6):859–866
8. Gomez-Muñoz VM, Porta-Gándara MA (2002) Local wind patterns for modeling renewable energy systems by means of cluster analysis techniques. Renew Energy 2:171–182
9. Mitra S, Banka H (2006) Multi-objective evolutionary bi clustering of gene expression data. Pattern Recogn 39:2464–2477
10. Nanda SJ, Panda G (2014) A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm and Evolutionary Computation 16:1–18
11. Cura T (2012) A particle swarm optimization approach to clustering. Expert Syst Appl 39(1):1582–1588
12. Jordehi AR (2015) Enhanced leader PSO (ELPSO): a new PSO variant for solving global optimisation problems. Appl Soft Comput 26:401–417
13. Karaboga D (2005) An idea based on honey bee swarm for numerical optimization. Erciyes University, Kayseri, Turkey, Technical Report-TR06
14. Karaboga D, Basturk B (2007) A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J Glob Optim 39(3):459–471
15. Karaboga D, Akay B (2009) A comparative study of artificial bee colony algorithm. Appl Math Comput 214(1):108–132
16. Dorigo M, Birattari M, Stutzle T (2006) Artificial ants as a computational intelligence technique. IEEE Comput Intell Mag 1:28–39
17. Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 26(1):29–41
18. Kumar Y, Sahoo G (2014) A charged system search approach for data clustering. Progress in Artificial Intelligence 2(2–3):153–166
19. Hatamlou A (2013) Black hole: A new heuristic optimization approach for data clustering. Inf Sci 222:175–184
20. Erol OK, Eksin I (2006) A new optimization method: big bang–big crunch. Adv Eng Softw 37(2):106–111
21. Jordehi AR (2014) A chaotic-based big bang–big crunch algorithm for solving global optimisation problems. Neural Comput & Applic 25(6):1329–1335
22. Alatas B (2011) ACROA: artificial chemical reaction optimization algorithm for global optimization. Expert Syst Appl 38(10):13170–13180
23. Maulik U, Bandyopadhyay S (2000) Genetic algorithm-based clustering technique. Pattern Recogn 33(9):1455–1465
24. Ergezer M, Simon D, Du D (2009) Oppositional biogeography-based optimization. In: 2009 IEEE international conference on systems, man and cybernetics. IEEE
25. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61
26. Wang GG, Deb S, Coelho LDS (2015) Elephant herding optimization. In: 2015 3rd International Symposium on Computational and Business Intelligence (ISCBI). IEEE, pp 1–5
27. Chu SC, Tsai PW, Pan JS (2006) Cat swarm optimization. In: Pacific Rim international conference on artificial intelligence. Springer, Berlin, Heidelberg, pp 854–858
28. Yazdani M, Jolai F (2016) Lion optimization algorithm (LOA): a nature-inspired metaheuristic algorithm. Journal of Computational Design and Engineering 3(1):24–36
29. Mirjalili S (2016) SCA: a sine cosine algorithm for solving optimization problems. Knowl-Based Syst 96:120–133
30. Salimi H (2015) Stochastic fractal search: a powerful metaheuristic algorithm. Knowl-Based Syst 75:1–18
31. Kaveh A, Dadras A (2017) A novel meta-heuristic optimization algorithm: thermal exchange optimization. Adv Eng Softw 110:69–84
32. Abraham A, Das S, Roy S (2008) Swarm intelligence algorithms for data clustering. In: Soft computing for knowledge discovery and data mining. Springer, Boston, pp 279–313
33. Chowdhury K, Chaudhuri D, Pal AK (2021) An entropy-based initialization method of K-means clustering on the optimal number of clusters. Neural Comput & Applic 33(12):6965–6982
34. Torrente A, Romo J (2021) Initializing k-means clustering by bootstrap and data depth. Journal of Classification 38(2):232–256
35. Ahmadi R, Ekbatanifard G, Bayat P (2021) A Modified Grey Wolf Optimizer Based Data Clustering Algorithm. Appl Artif Intell 35(1):63–79
36. Ghany KKA, AbdelAziz AM, Soliman THA, Sewisy AAEM (2020) A hybrid modified step whale optimization algorithm with tabu search for data clustering. Journal of King Saud University-Computer and Information Sciences
37. Sörensen K (2015) Metaheuristics—the metaphor exposed. Int Trans Oper Res 22(1):3–18
38. Yang X-S (2010) A new metaheuristic bat-inspired algorithm. In: Nature inspired cooperative strategies for optimization (NICSO 2010). Springer, Berlin, Heidelberg, pp 65–74
39. Ashish T, Kapil S, Manju B (2018) Parallel bat algorithm-based clustering using mapreduce. In: Networking Communication and Data Knowledge Engineering. Springer, Singapore, pp 73–82
40. Fister I Jr, Fister D, Yang XS (2013) A hybrid Bat algorithm. ELEKTROTEHNIŠKI VESTNIK 80(1–2):1–7
41. Yilmaz S, Kucuksille EU (2013) Improved bat algorithm (IBA) on continuous optimization problems. Lecture Notes on Software Engineering 1(3):279
42. Senthilnath J, Kulkarni S, Benediktsson JA, Yang XS (2016) A novel approach for multispectral satellite image classification based on the bat algorithm. IEEE Geosci Remote Sens Lett 13(4):599–603
43. Neelima S, Satyanarayana N, Murthy PK (2018) Minimizing Frequent Itemsets Using Hybrid ABCBAT Algorithm. In: Data Engineering and Intelligent Computing. Springer, Singapore, pp 91–97
44. Aboubi Y, Drias H, Kamel N (2016) BAT-CLARA: BAT-inspired algorithm for Clustering LARge Applications. IFAC-PapersOnLine 49(12):243–248
45. Fister I, Fong S, Brest J (2014) A novel hybrid self-adaptive bat algorithm. Sci World J 2014:70973
46. Zhao D, He Y (2015) Chaotic binary bat algorithm for analog test point selection. Analog Integr Circ Sig Process 84(2):201–214
47. Rahman MA, Islam MZ (2014) A hybrid clustering technique combining a novel genetic algorithm with K-Means. Knowl-Based Syst 71:345–365
48. Liu R et al (2012) Gene transposon based clone selection algorithm for automatic clustering. Inf Sci 204:1–22
49. Kumar Y, Sahoo G (2017) A two-step artificial bee colony algorithm for clustering. Neural Comput & Applic 28(3):537–551
50. Cao F, Liang J, Jiang G (2009) An initialization method for the K-Means algorithm using neighborhood model. Computers & Mathematics with Applications 58(3):474–483
51. Han XH et al (2017) A novel data clustering algorithm based on modified gravitational search algorithm. Eng Appl Artif Intell 61:1–7
52. Senthilnath J, Omkar SN, Mani V (2011) Clustering using firefly algorithm: performance study. Swarm and Evolutionary Computation 1(3):164–171
53. Erisoglu M, Calis N, Sakallioglu S (2011) A new algorithm for initial cluster centers in k-means algorithm. Pattern Recogn Lett 32(14):1701–1705
54. Kumar Y, Sahoo G (2015) Hybridization of magnetic charge system search and particle swarm optimization for efficient data clustering using neighborhood search strategy. Soft Comput 19(12):3621–3645
55. Zhou Y et al (2017) A simplex method-based social spider optimization algorithm for clustering analysis. Eng Appl Artif Intell 64:67–82
56. Boushaki SI, Kamel N, Bendjeghaba O (2018) A new quantum chaotic cuckoo search algorithm for data clustering. Expert Syst Appl 96:358–372
57. Chang D et al (2012) A genetic clustering algorithm using a message-based similarity measure. Expert Syst Appl 39(2):2194–2202
58. Zhang C, Ouyang D, Ning J (2010) An artificial bee colony approach for clustering. Expert Syst Appl 37(7):4761–4767
59. Taherdangkoo M et al (2013) A robust clustering method based on blind, naked mole-rats (BNMR) algorithm. Swarm and Evolutionary Computation 10:1–11
60. Hatamlou A (2012) In search of optimal centroids on data clustering using a binary search algorithm. Pattern Recogn Lett 33(13):1756–1760
61. Bijari K et al (2018) Memory-enriched big bang–big crunch optimization algorithm for data clustering. Neural Comput & Applic 29(6):111–121
62. Abualigah LM et al (2017) A novel hybridization strategy for krill herd algorithm applied to clustering techniques. Appl Soft Comput 60:423–435
63. Pakrashi A, Chaudhuri BB (2016) A Kalman filtering induced heuristic optimization based partitional data clustering. Inf Sci 369:704–717
64. Kang Q et al (2016) A weight-incorporated similarity-based clustering ensemble method based on swarm intelligence. Knowl-Based Syst 104:156–164
65. Wang R et al (2016) Flower pollination algorithm with bee pollinator for cluster analysis. Inf Process Lett 116(1):1–14
66. Hatamlou A, Hatamlou M (2013) PSOHS: an efficient two-stage approach for data clustering. Memetic Computing 5(2):155–161
67. Yan X et al (2012) A new approach for data clustering using hybrid artificial bee colony algorithm. Neurocomputing 97:241–250
68. Kwedlo W (2011) A clustering method combining differential evolution with the K-means algorithm. Pattern Recogn Lett 32(12):1613–1621
69. Yin M et al (2011) A novel hybrid K-harmonic means and gravitational search algorithm approach for clustering. Expert Syst Appl 38(8):9319–9324
70. Jiang H et al (2010) Ant clustering algorithm with K-harmonic means clustering. Expert Syst Appl 37(12):8679–8684
71. Xiao J et al (2010) A quantum-inspired genetic algorithm for k-means clustering. Expert Syst Appl 37(7):4966–4973
72. Žalik KR (2008) An efficient k′-means clustering algorithm. Pattern Recogn Lett 29(9):1385–1391
73. Jiang B, Wang N (2014) Cooperative bare-bone particle swarm optimization for data clustering. Soft Comput 18(6):1079–1091
74. Kumar Y, Singh PK (2018) Improved cat swarm optimization algorithm for solving global optimization problems and its application to clustering. Appl Intell 48(9):2681–2697
75. Kumar Y, Singh PK (2019) A chaotic teaching learning based optimization algorithm for clustering problems. Appl Intell 49(3):1036–1062
76. Fränti P, Sieranoja S (2018) K-means properties on six clustering benchmark datasets. Appl Intell 48(12):4743–4759
77. Faris H, Ala’M AZ, Heidari AA, Aljarah I, Mafarja M, Hassonah MA, Fujita H (2019) An intelligent system for spam detection and identification of the most relevant features based on evolutionary random weight networks. Information Fusion 48:67–83
78. Maulik U, Bandyopadhyay S (2000) Genetic algorithm-based clustering technique. Pattern Recognition 33(9):1455–1465
79. Kumar Y, Sahoo G (2017) An Improved Cat Swarm Optimization Algorithm Based on Opposition-Based Learning and Cauchy Operator for Clustering. JIPS 13(4):1000–1013
80. Jensi R, Wiselin Jiji G (2016) An improved krill herd algorithm with global exploration capability for solving numerical function optimization problems and its application to data clustering. Appl Soft Comput 46:230–245
81. Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
82. Bouyer A, Hatamlou A (2018) An efficient hybrid clustering method based on improved cuckoo optimization and modified particle swarm optimization algorithms. Appl Soft Comput 67:172–182
83. Hatamlou A (2017) A hybrid bio-inspired algorithm and its application. Appl Intell 47(4):1059–1067
84. Doğan B, Ölmez T (2015) A new metaheuristic for numerical function optimization: vortex search algorithm. Inf Sci 293:125–145
85. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
86. Wang G-G, Deb S, Cui Z (2019) Monarch butterfly optimization. Neural Comput & Applic 31(7):1995–2014
87. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research 7:1–30
88. Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation 1(1):3–18
89. García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf Sci 180(10):2044–2064

Arvinder Kaur received her B.Tech degree in Information Technology from Punjab Technical University, Jalandhar, Punjab, India in 2007 and M.Tech. degree in Computer Science and Engineering from Punjab Technical University, Jalandhar, Punjab, India in 2010. She is pursuing Ph.D. degree in Computer Science and Engineering from Jaypee University of Information Technology, Waknaghat, Himachal Pradesh, India. She is working on data clustering. She has published papers in international journals and conferences of repute.

Yugal Kumar is presently working as Assistant Professor (Senior Grade) in Department of Computer Science & Engineering at Jaypee University of Information Technology (JUIT), Waknaghat, Himachal Pradesh, India. He has more than 15 years of teaching and research experience at reputed colleges and universities of India. He has completed his Ph.D. in Computer Science & Engineering from Birla institute of Technology, Mesra, Ranchi. His primary area of research includes meta-heuristic algorithms, data clustering, swarm intelligence, pattern recognition, medical data international journals and conferences of repute. He is serving as editorial review board member of various journals including Soft Computing, Neurocomputing, Computer Methods and Programs in Biomedicine, PLOS ONE, Journal of Advanced Computational Intelligence and Intelligent Informatics and Journal of Information Processing System.