A Comparison of K-Means Clustering Algorithm and CLARA Clustering Algorithm

Tanvi Gupta, Supriya P. Panda
Abstract

K-Means Clustering is a clustering technique used to partition a set of observations into a number of clusters. Here the cluster's centre point is the 'mean' of that cluster, and the other points in the cluster are the observations nearest to that mean value. In Clustering Large Applications (CLARA), however, medoids are used as the cluster centres, and the rest of the observations in a cluster lie near that centre point. CLARA works on larger datasets than the K-Medoids and K-Means clustering algorithms, as it selects random samples of observations from the dataset and performs the Partitioning Around Medoids (PAM) algorithm on them. This paper shows that, of the two algorithms K-Means and CLARA, CLARA clustering gives the better result.

Keywords: K-Means Clustering; CLARA Clustering; K-Medoids Clustering; PAM Algorithm; Iris Dataset.
Copyright © 2018 Tanvi Gupta, Supriya P. Panda. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
International Journal of Engineering & Technology 4767
4) It gives the best result when the datasets are distinct or well separated from each other [3].

Disadvantages:
1) The learning algorithm needs prior knowledge of the number of clusters to be formed.
2) If data overlap, K-Means is not able to resolve them, as the clusters are formed in the same space.
3) The algorithm uses Euclidean distance, whose measures can unequally weight underlying factors.
4) It is also not able to handle outliers and noisy data.
5) The learning algorithm is not invariant to non-linear transformations, i.e., with different representations of the data we get different results.
6) The algorithm will not work for non-linear data.
7) If we choose the cluster centres randomly, the algorithm may not lead to the actual result.

1.1.2. K-Medoid Clustering Algorithm

2. Literature Survey

In [4], the author, Tagaram Soni Madhulatha, explains the concept of clustering together with the difference between K-Means and K-Medoids clustering. The author states that clustering is an unsupervised form of learning, which helps to partition the data into clusters using distance measures without any background knowledge.

In [5], the authors, T. Velmurugan and T. Santhanam, explain clustering by saying that it is a form of unsupervised learning. They state that the clustering method is decided based on the type of available data and the purpose for which the clustering is to be done. From the experiments they performed, K-Medoids is more robust than K-Means clustering with respect to noise and outliers, but K-Medoids is good only for small datasets.

In [6], the authors, K. Chitra and D. Maheswari, explain the different types of clustering. They say that clustering is one of the data mining processes; it is unsupervised learning and is the arrangement of a set of identical objects into one cluster. The authors compare different types of clustering, such as partition-based, hierarchical, grid-based, and density-based.

In [7], the author, T. Velmurugan, analyses the performance of the two clustering algorithms using the calculation of the distance between two data points. The computational time is also calculated in order to measure the performance of the algorithms. In that paper, K-Means is better than K-Medoids clustering in terms of efficiency for the application they have chosen.

3. Experimental Setup and Dataset

To show the comparison between the K-Means clustering algorithm and the CLARA clustering algorithm, R programming is used to produce cluster plots for both clustering techniques on the Iris Dataset.

Fig. 3: Graph of the Iris Dataset without clustering, showing the three species (Setosa, Versicolor, Virginica) on the two attributes PetalLength and PetalWidth.

2) K-Means cluster plot of the Iris Dataset
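As an illustration of this setup, the following Python/scikit-learn sketch (a hypothetical stand-in for the authors' R session, not their actual code) runs K-Means with k = 3 on the same two Iris attributes, Petal.Length and Petal.Width:

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

# Load the Iris data and keep the two attributes used in the paper:
# petal length and petal width (columns 2 and 3 of the feature matrix).
iris = load_iris()
X = iris.data[:, 2:4]

# K-Means with k = 3 (one cluster per species) and Euclidean distance,
# restarted from several random initialisations, since a single random
# start may not reach the best result (disadvantage 7 above).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # one (petal length, petal width) centre per cluster
print(km.labels_[:10])      # cluster index assigned to the first ten flowers
```

The three fitted centres correspond to the three cluster regions visible in the K-Means cluster plot.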
References

[1] S. Anitha Elavarasi and J. Akilandeswari, "A Survey on Partition Clustering Algorithms", International Journal of Enterprise Computing and Business Systems, 2011.
[2] Navneet Kaur, "Survey Paper on Clustering Techniques", International Journal of Science, Engineering and Technology Research (IJSETR), Vol. 2, Issue 4, 2013.
[3] Preeti Arora, Deepali, Shipra Varshney, "Analysis of K-Means and K-Medoids Algorithm for Big Data", International Conference on Information Security & Privacy (ICISP 2015), Nagpur, India, 2015.
[4] Tagaram Soni Madhulatha, "Comparison between K-Means and K-Medoids Clustering Algorithms", Advances in Computing and Information Technology, Vol. 198, pp. 472-481.
[5] T. Velmurugan, T. Santhanam, "Performance Analysis of K-Means and K-Medoids Clustering Algorithms for a Randomly Generated Data Set", International Conference on Systemics, Cybernetics and Informatics, pp. 578-583, 2008.
[6] K. Chitra, D. Maheswari, "A Comparative Study of Various Clustering Algorithms in Data Mining", IJCSMC, Vol. 6, Issue 8, pp. 109-115, 2017.
[7] T. Velmurugan, "Efficiency of K-Means and K-Medoids Algorithms for Clustering Arbitrary Data Points", International Journal of Computer Technology & Applications, Vol. 3(5), pp. 1758-1764, 2012.
[8] https://sites.google.com/site/dataclusteringalgorithms/k-means-clustering-algorithm.

Fig. 4: K-Means cluster plot of the Iris Dataset using two components: PetalLength and PetalWidth.

3) Cluster plot of CLARA clustering on the Iris Dataset

Fig. 5: CLARA cluster plot of the Iris Dataset.
The above two figures show the cluster plots of both clustering techniques. Fig. 4 uses the K-Means algorithm with Euclidean distance, and Fig. 5 uses CLARA clustering with Manhattan distance. From both figures we conclude that CLARA clustering has more power to detect outliers and noise than K-Means clustering, as the two components explain 100% of the point variability for CLARA and approximately 90% for K-Means. This shows CLARA clustering to be more robust than the K-Means algorithm.
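CLARA's sample-then-PAM idea can be made concrete with a short sketch. The following Python code is a minimal illustration under stated assumptions, not the authors' implementation: a simple alternating k-medoids loop stands in for full PAM, Manhattan distance is used as in the experiment, and each candidate set of medoids is scored on the full dataset so the best sample wins.

```python
import numpy as np

def manhattan(a, b):
    """Pairwise Manhattan (city-block) distances between rows of a and rows of b."""
    return np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)

def k_medoids(X, k, max_iter=50, seed=0):
    """A simple alternating k-medoids loop, standing in for PAM."""
    rng = np.random.default_rng(seed)
    medoid_idx = rng.choice(len(X), size=k, replace=False)
    for _ in range(max_iter):
        labels = manhattan(X, X[medoid_idx]).argmin(axis=1)
        new_idx = medoid_idx.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue
            # the medoid is the member minimising total in-cluster distance
            within = manhattan(X[members], X[members]).sum(axis=1)
            new_idx[j] = members[within.argmin()]
        if np.array_equal(new_idx, medoid_idx):
            break
        medoid_idx = new_idx
    return medoid_idx

def clara(X, k, n_samples=5, sample_size=40, seed=0):
    """CLARA: run k-medoids on random subsamples and keep the medoids
    with the lowest total dissimilarity over the FULL dataset."""
    rng = np.random.default_rng(seed)
    best_idx, best_cost = None, np.inf
    for s in range(n_samples):
        sub = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        med_global = sub[k_medoids(X[sub], k, seed=seed + s)]
        cost = manhattan(X, X[med_global]).min(axis=1).sum()
        if cost < best_cost:
            best_idx, best_cost = med_global, cost
    return best_idx, best_cost

# Two well-separated blobs; the chosen medoids should land one per blob.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
               rng.normal(10.0, 0.5, (100, 2))])
medoid_idx, cost = clara(X, k=2)
print(X[medoid_idx], cost)
```

Because the medoids are actual observations rather than averaged coordinates, an outlier cannot drag a cluster centre the way it drags a K-Means mean, which is the robustness property discussed above.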
Code and output in R for finding the Euclidean distance and Manhattan distance between two sample vectors, illustrating the two dissimilarity measures used on the Iris Dataset (the distance() function is provided by the philentropy package, which must be loaded first):

> library(philentropy)
> P <- 1:10
> Q <- 11:20
> x <- rbind(P, Q)
> distance(x, method = "euclidean")
euclidean
 31.62278
> distance(x, method = "manhattan")
manhattan
      100
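The same two vectors give identical values in plain Python; the snippet below is an illustrative translation of the R listing above, useful as a cross-check of the two dissimilarity measures:

```python
import math

# The same two vectors as in the R session: P = 1..10, Q = 11..20,
# compared element-wise (each of the 10 coordinates differs by 10).
P = list(range(1, 11))
Q = list(range(11, 21))

# Euclidean: sqrt(10 * 10^2) = sqrt(1000); Manhattan: 10 * 10 = 100.
euclidean = math.sqrt(sum((p - q) ** 2 for p, q in zip(P, Q)))
manhattan = sum(abs(p - q) for p, q in zip(P, Q))

print(round(euclidean, 5))  # 31.62278
print(manhattan)            # 100
```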
4. Conclusion

This paper compares K-Means Clustering and CLARA Clustering on the Iris Dataset, using Euclidean distance and Manhattan distance, respectively, as the dissimilarity measure. After plotting the graphs using the two attributes of the dataset, "Petal.Length" and "Petal.Width", we conclude that CLARA Clustering using Manhattan distance gives better results than K-Means Clustering with Euclidean distance.