1730702218_ML13_Kmeans
Use the k-means clustering algorithm to divide the following data into
two clusters, and compute the representative data points (centroids) of
the clusters.
Cluster centre = (4.5, 4)
Recompute the distances of the given data points from the cluster
centres.
The cluster centres are recalculated. There is no change in the
centroids (the centre at (4.5, 4) is unchanged), so we STOP.
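The loop walked through above (recompute distances, reassign points, recalculate centres, stop when nothing changes) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `kmeans`, the sample points, and the initial centres are all made up here and are not the data of the exercise.

```python
import math

def kmeans(points, centres, max_iter=100):
    """Plain k-means on 2-D tuples; len(centres) is k (hypothetical helper)."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: each centre becomes the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centres = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centre
            for members, centre in zip(clusters, centres)
        ]
        # Stop when recalculating the centres changes nothing.
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters

# Made-up sample data and starting centres, for illustration only.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
initial_centres = [(1, 1), (9, 8)]
final_centres, final_clusters = kmeans(points, initial_centres)
```

With these sample points the first three land in one cluster and the last three in the other, and the second pass leaves the centres unchanged, which is the stopping condition used above.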
Comments on the K-Means Method
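• Strength
  • Relatively efficient: roughly O(tkn), where n is the number of
    data points, k the number of clusters, and t the number of iterations
• Weakness
  • Applicable only when a mean is defined, so it cannot be applied
    directly to categorical data
  • Need to specify k, the number of clusters, in advance
  • Unable to handle noisy data and outliers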
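The outlier weakness follows from the centroid being a plain mean. A small sketch, using made-up points and a hypothetical helper `centroid`, shows how a single noisy point drags the cluster centre away from the bulk of the data:

```python
def centroid(points):
    """Mean of a list of 2-D points (hypothetical helper)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Made-up data: a tight cluster, then the same cluster plus one outlier.
cluster = [(1, 1), (2, 1), (1, 2), (2, 2)]
with_outlier = cluster + [(20, 20)]

clean_centre = centroid(cluster)         # centre of the tight cluster
skewed_centre = centroid(with_outlier)   # centre pulled toward the outlier
```

Here the centre moves from (1.5, 1.5) to about (5.2, 5.2), a location where none of the original four points lie.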
Optimal number of clusters (k)
• This method measures the homogeneity (or heterogeneity) within the
clusters for various values of ‘k’.
• The quality of a clustering is measured using the Sum of Squares technique.
• The Within Cluster Sum of Squares (WCSS) for a given k is computed as
the sum, over all clusters, of the squared distances between each point
and the centroid of its cluster.
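In symbols, one common way to write this quantity is given below. The notation is an assumption made here rather than taken from the slides: C_i denotes the i-th cluster and \mu_i its centroid.

```latex
\mathrm{WCSS}(k) = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
```

A smaller WCSS means the points sit closer to their own centroids, that is, the clusters are more homogeneous.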
[Figure: plot with axis X1 and labelled points C and D]
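The WCSS measure described above can be sketched as follows. The function name `wcss`, the sample points, and the fixed cluster assignments are illustrative assumptions; in practice the assignments would come from running k-means for each candidate k.

```python
def wcss(clusters):
    """Within-cluster sum of squares for a list of clusters of 2-D points."""
    total = 0.0
    for members in clusters:
        n = len(members)
        # Centroid of this cluster.
        cx = sum(x for x, _ in members) / n
        cy = sum(y for _, y in members) / n
        # Add the squared distance of every member to that centroid.
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total

# Made-up data: two visibly separate groups of points.
points = [(1, 1), (1, 3), (7, 7), (9, 7)]

wcss_k1 = wcss([points])                   # everything in one cluster
wcss_k2 = wcss([points[:2], points[2:]])   # split into the two groups
```

For this sample, going from one cluster to two drops the WCSS from 78 to 4. Comparing such values across several k is how the homogeneity within clusters is judged for each candidate number of clusters.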