K-means Clustering Algorithm With Numerical Example

k-means clustering is one of the most widely used clustering algorithms in machine learning. In this article, we will discuss the concept, examples, advantages, and disadvantages of the k-means clustering algorithm. We will also work through a numerical example on k-means clustering to understand the algorithm better.

What is K-means Clustering?

k-means clustering is an unsupervised machine learning algorithm used to group a dataset into k clusters. It is an iterative algorithm that starts by randomly selecting k centroids in the dataset. After selecting the centroids, the entire dataset is divided into clusters based on the distance of the data points from each centroid. In the new clusters, the centroids are recalculated by taking the mean of the data points in each cluster. With the new centroids, we regroup the dataset into new clusters. This process continues until the clusters become stable.

K-means clustering is a partition clustering algorithm. We call it partition clustering because the k-means clustering algorithm partitions the entire dataset into mutually exclusive clusters.

K-means Clustering Algorithm

To understand the process of clustering using the k-means clustering algorithm and to solve the numerical example, let us first state the algorithm. Given a dataset of N entries and a number K of clusters to be formed, we use the following steps to find the clusters:

1. First, we select K random entries from the dataset and use them as centroids.
2. Next, we find the distance of each entry in the dataset from the centroids. You can use any distance metric, such as Euclidean distance, Manhattan distance, or squared Euclidean distance.
3. After finding the distance of each data entry from the centroids, we assign each data point to the cluster whose centroid is nearest to it.
4. After assigning the points to clusters, we calculate the new centroid of each cluster as the mean of the data points in that cluster. If the newly created centroids are the same as the centroids in the previous iteration, we consider the current clusters final and stop the execution of the algorithm. If any of the newly created centroids differs from the centroids in the previous iteration, we go back to step 2.

K-means Clustering Numerical Example with Solution

Now that we have discussed the algorithm, let us solve a numerical problem on k-means clustering. The problem is as follows. You are given 15 points in the Cartesian coordinate system:

Point   Coordinates
A1      (2, 10)
A2      (2, 6)
A3      (11, 11)
A4      (6, 9)
A5      (6, 4)
A6      (1, 2)
A7      (5, 10)
A8      (4, 9)
A9      (10, 12)
A10     (7, 5)
A11     (9, 11)
A12     (4, 6)
A13     (3, 10)
A14     (3, 8)
A15     (6, 11)

Input Dataset

We are also given the information that we need to make 3 clusters, i.e. we are given K = 3. We will solve this numerical example using the approach discussed below.

First, we randomly choose 3 centroids from the given data. Let us consider A2 (2,6), A7 (5,10), and A15 (6,11) as the centroids of the initial clusters. Hence, we will consider that:

* Centroid 1 = (2, 6) is associated with cluster 1.
* Centroid 2 = (5, 10) is associated with cluster 2.
* Centroid 3 = (6, 11) is associated with cluster 3.

Now we find the Euclidean distance between each point and the centroids. Based on the minimum distance of each point from the centroids, we assign the points to a cluster.
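The distance-and-assignment step for a single point can be sketched in Python. This is a minimal illustration using only the standard library; the point A1 and the initial centroids are taken directly from the example:

```python
import math

# Initial centroids from the example: A2 (2,6), A7 (5,10), A15 (6,11).
centroids = {1: (2, 6), 2: (5, 10), 3: (6, 11)}
point = (2, 10)  # A1

def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

distances = {k: euclidean(point, c) for k, c in centroids.items()}
assigned = min(distances, key=distances.get)  # nearest centroid wins
print({k: round(d, 6) for k, d in distances.items()}, "-> cluster", assigned)
# {1: 4.0, 2: 3.0, 3: 4.123106} -> cluster 2
```

The output matches the row for A1 in the table that follows.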
I have tabulated the distances of the given points from the centroids in the following table.

Point        Distance from       Distance from        Distance from        Assigned
             Centroid 1 (2, 6)   Centroid 2 (5, 10)   Centroid 3 (6, 11)   Cluster
A1 (2, 10)   4                   3                    4.123106             Cluster 2
A2 (2, 6)    0                   5                    6.403124             Cluster 1
A3 (11, 11)  10.29563            6.082763             5                    Cluster 3
A4 (6, 9)    5                   1.414214             2                    Cluster 2
A5 (6, 4)    4.472136            6.082763             7                    Cluster 1
A6 (1, 2)    4.123106            8.944272             10.29563             Cluster 1
A7 (5, 10)   5                   0                    1.414214             Cluster 2
A8 (4, 9)    3.605551            1.414214             2.828427             Cluster 2
A9 (10, 12)  10                  5.385165             4.123106             Cluster 3
A10 (7, 5)   5.09902             5.385165             6.082763             Cluster 1
A11 (9, 11)  8.602325            4.123106             3                    Cluster 3
A12 (4, 6)   2                   4.123106             5.385165             Cluster 1
A13 (3, 10)  4.123106            2                    3.162278             Cluster 2
A14 (3, 8)   2.236068            2.828427             4.242641             Cluster 1
A15 (6, 11)  6.403124            1.414214             0                    Cluster 3

Results from 1st iteration of K-means clustering

At this point, we have completed the first iteration of the k-means clustering algorithm and assigned each point to a cluster. In the above table, you can observe that each point is assigned to the cluster whose centroid is closest to it.

Now, we will calculate the new centroid for each cluster.

* In cluster 1, we have 6 points, i.e. A2 (2,6), A5 (6,4), A6 (1,2), A10 (7,5), A12 (4,6), and A14 (3,8). To calculate the new centroid for cluster 1, we find the mean of the x and y coordinates of the points in the cluster. Hence, the new centroid for cluster 1 is (3.833, 5.167).
* In cluster 2, we have 5 points, i.e. A1 (2,10), A4 (6,9), A7 (5,10), A8 (4,9), and A13 (3,10). Hence, the new centroid for cluster 2 is (4, 9.6).
* In cluster 3, we have 4 points, i.e. A3 (11,11), A9 (10,12), A11 (9,11), and A15 (6,11). Hence, the new centroid for cluster 3 is (9, 11.25).

Now that we have calculated new centroids for each cluster, we will calculate the distance of each data point from the new centroids.
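The centroid update for cluster 1 above can be checked with a short snippet (plain Python; the member points are copied from the example):

```python
# Cluster 1 members after the first iteration: A2, A5, A6, A10, A12, A14.
cluster1 = [(2, 6), (6, 4), (1, 2), (7, 5), (4, 6), (3, 8)]

# New centroid = mean of the x coordinates, mean of the y coordinates.
n = len(cluster1)
cx = sum(x for x, _ in cluster1) / n
cy = sum(y for _, y in cluster1) / n
print(round(cx, 3), round(cy, 3))
# 3.833 5.167
```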
Then, we will assign the points to clusters based on their distance from the centroids. The results of this process are given in the following table.

Point        Distance from            Distance from        Distance from        Assigned
             Centroid 1 (3.833,       Centroid 2 (4, 9.6)  Centroid 3 (9,       Cluster
             5.167)                                        11.25)
A1 (2, 10)   5.169                    2.040                7.111                Cluster 2
A2 (2, 6)    2.013                    4.118                8.750                Cluster 1
A3 (11, 11)  9.241                    7.139                2.016                Cluster 3
A4 (6, 9)    4.403                    2.088                3.750                Cluster 2
A5 (6, 4)    2.461                    5.946                7.846                Cluster 1
A6 (1, 2)    4.249                    8.171                12.230               Cluster 1
A7 (5, 10)   4.972                    1.077                4.191                Cluster 2
A8 (4, 9)    3.837                    0.600                5.483                Cluster 2
A9 (10, 12)  9.204                    6.462                1.250                Cluster 3
A10 (7, 5)   3.171                    5.492                6.562                Cluster 1
A11 (9, 11)  7.792                    5.192                0.250                Cluster 3
A12 (4, 6)   0.850                    3.600                7.250                Cluster 1
A13 (3, 10)  4.904                    1.077                6.129                Cluster 2
A14 (3, 8)   2.953                    1.887                6.824                Cluster 2
A15 (6, 11)  6.223                    2.441                3.010                Cluster 2

Results from 2nd iteration of K-means clustering

Now, we have completed the second iteration of the k-means clustering algorithm and assigned each point to an updated cluster. In the above table, you can observe that each point is assigned to the cluster whose new centroid is closest to it. Now, we will calculate the new centroid for each cluster for the third iteration.

* In cluster 1, we have 5 points, i.e. A2 (2,6), A5 (6,4), A6 (1,2), A10 (7,5), and A12 (4,6). To calculate the new centroid for cluster 1, we find the mean of the x and y coordinates of the points in the cluster. Hence, the new centroid for cluster 1 is (4, 4.6).
* In cluster 2, we have 7 points, i.e. A1 (2,10), A4 (6,9), A7 (5,10), A8 (4,9), A13 (3,10), A14 (3,8), and A15 (6,11). Hence, the new centroid for cluster 2 is (4.143, 9.571).
* In cluster 3, we have 3 points, i.e. A3 (11,11), A9 (10,12), and A11 (9,11). Hence, the new centroid for cluster 3 is (10, 11.333).

At this point, we have calculated new centroids for each cluster. Now, we will calculate the distance of each data point from the new centroids.
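Rather than computing the table point by point, the whole distance matrix can be produced at once with NumPy broadcasting. A minimal sketch, using the rounded third-iteration centroids computed above:

```python
import numpy as np

# A1..A15 from the example.
points = np.array([
    (2, 10), (2, 6), (11, 11), (6, 9), (6, 4),
    (1, 2), (5, 10), (4, 9), (10, 12), (7, 5),
    (9, 11), (4, 6), (3, 10), (3, 8), (6, 11),
], dtype=float)

# Rounded centroids for the third iteration, as computed above.
centroids = np.array([[4, 4.6], [4.143, 9.571], [10, 11.333]])

# Broadcasting gives a (15, 3) matrix of point-to-centroid distances.
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)  # 0-based cluster index per point

print(np.round(dists[0], 3))   # distances of A1 from the three centroids
print(labels[0] + 1)           # cluster assigned to A1
```

The first row of `dists` reproduces the A1 row of the table that follows, and `labels` reproduces the assignment column.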
Then, we will assign the points to clusters based on their distance from the centroids. The results of this process are given in the following table.

Point        Distance from        Distance from            Distance from        Assigned
             Centroid 1 (4,       Centroid 2 (4.143,       Centroid 3 (10,      Cluster
             4.6)                 9.571)                   11.333)
A1 (2, 10)   5.758                2.186                    8.110                Cluster 2
A2 (2, 6)    2.441                4.165                    9.615                Cluster 1
A3 (11, 11)  9.485                7.004                    1.054                Cluster 3
A4 (6, 9)    4.833                1.943                    4.631                Cluster 2
A5 (6, 4)    2.088                5.872                    8.353                Cluster 1
A6 (1, 2)    3.970                8.197                    12.966               Cluster 1
A7 (5, 10)   5.492                0.958                    5.175                Cluster 2
A8 (4, 9)    4.400                0.589                    6.438                Cluster 2
A9 (10, 12)  9.527                6.341                    0.667                Cluster 3
A10 (7, 5)   3.027                5.390                    7.008                Cluster 1
A11 (9, 11)  8.122                5.063                    1.054                Cluster 3
A12 (4, 6)   1.400                3.574                    8.028                Cluster 1
A13 (3, 10)  5.492                1.221                    7.126                Cluster 2
A14 (3, 8)   3.544                1.943                    7.753                Cluster 2
A15 (6, 11)  6.705                2.343                    4.014                Cluster 2

Results from 3rd iteration of K-means clustering

Now, we have completed the third iteration of the k-means clustering algorithm and assigned each point to an updated cluster. In the above table, you can observe that each point is assigned to the cluster whose new centroid is closest to it. Now, we will again calculate the new centroid for each cluster.

* In cluster 1, we have 5 points, i.e. A2 (2,6), A5 (6,4), A6 (1,2), A10 (7,5), and A12 (4,6). To calculate the new centroid for cluster 1, we find the mean of the x and y coordinates of the points in the cluster. Hence, the new centroid for cluster 1 is (4, 4.6).
* In cluster 2, we have 7 points, i.e. A1 (2,10), A4 (6,9), A7 (5,10), A8 (4,9), A13 (3,10), A14 (3,8), and A15 (6,11). Hence, the new centroid for cluster 2 is (4.143, 9.571).
* In cluster 3, we have 3 points, i.e. A3 (11,11), A9 (10,12), and A11 (9,11). Hence, the new centroid for cluster 3 is (10, 11.333).

Here, you can observe that no point has changed its cluster compared to the previous iteration. Due to this, the centroids also remain constant.
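The three iterations worked out above can be reproduced end to end. The following is a minimal NumPy sketch of the algorithm stated earlier (not an optimized implementation; for simplicity it assumes no cluster ever becomes empty, which holds for this dataset), run with the same points and the same initial centroids:

```python
import numpy as np

# A1..A15 from the example.
points = np.array([
    (2, 10), (2, 6), (11, 11), (6, 9), (6, 4),
    (1, 2), (5, 10), (4, 9), (10, 12), (7, 5),
    (9, 11), (4, 6), (3, 10), (3, 8), (6, 11),
], dtype=float)

def kmeans(X, centroids, max_iter=100):
    """Basic k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its cluster, until stable."""
    centroids = np.asarray(centroids, dtype=float)
    for _ in range(max_iter):
        # Steps 2-3: distances to each centroid, nearest-centroid assignment.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute centroids as per-cluster means.
        new = np.array([X[labels == k].mean(axis=0)
                        for k in range(len(centroids))])
        if np.allclose(new, centroids):  # no centroid moved: clusters stable
            break
        centroids = new
    return labels, centroids

# Initial centroids as in the example: A2 (2,6), A7 (5,10), A15 (6,11).
labels, centroids = kmeans(points, [[2, 6], [5, 10], [6, 11]])
print(np.round(centroids, 3))
# Final centroids: (4, 4.6), (4.143, 9.571), (10, 11.333)
```

The run converges after the same three iterations, with the same final centroids as the manual computation.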
Therefore, we say that the clusters have stabilized. Hence, the clusters obtained after the third iteration are the final clusters for the given dataset. If we plot the clusters on a graph, it looks as follows.

[Plot for K-Means Clustering: the points of the three clusters are plotted with red, blue, and black markers; the cluster centroids are marked with green circles.]

Applications of K-means Clustering in Machine Learning

The k-means clustering algorithm finds applications in various domains. Following are some popular applications of k-means clustering.

* Document classification: Using k-means clustering, we can divide documents into various clusters based on their content, topics, and tags.
* Customer segmentation: Supermarkets and e-commerce websites divide their customers into various clusters based on their transaction data and demography. This helps the business target appropriate customers with relevant products to increase sales.
* Cyber profiling: In cyber profiling, we collect data from individuals as well as groups to identify their relationships. With k-means clustering, we can easily make clusters of people based on their connections to each other to identify any available patterns.
* Image segmentation: We can use k-means clustering to perform image segmentation by grouping similar pixels into clusters.
* Fraud detection in banking and insurance: Using historical data on fraud, banks and insurance agencies can predict potential fraud by applying k-means clustering.

Apart from these examples, there are various other applications of k-means clustering, such as ride-share data analysis, social media profiling, identification of crime localities, etc.
Advantages of K-means Clustering Algorithm

Following are some of the advantages of the k-means clustering algorithm.

* Easy to implement: K-means clustering is an iterative and relatively simple algorithm. In fact, we can also perform k-means clustering manually, as we did in the numerical example.
* Scalability: We can use k-means clustering on a dataset of 10 records or 10 million records, and it will give us results in both cases.
* Convergence: The k-means clustering algorithm is guaranteed to converge (although possibly to a local optimum). Thus, we will always get a result from the execution of the algorithm.
* Generalization: K-means clustering is not tied to a specific problem. From numerical data to text documents, you can use the k-means clustering algorithm on any dataset to perform clustering. It can also be applied to datasets of different sizes with entirely different distributions. Hence, the algorithm is quite general.
* Choice of centroids: You can easily warm-start the choice of centroids. Hence, the algorithm allows you to choose and assign centroids that fit well with the dataset.

Disadvantages of K-means Clustering Algorithm

With all the advantages, the k-means algorithm has certain disadvantages too, which are discussed below.

* Deciding the number of clusters: In k-means clustering, you need to decide the number of clusters yourself, for example by using the elbow method.
* Choice of initial centroids: The number of iterations in the clustering process depends heavily on the choice of the initial centroids. Hence, you need to choose the centroids properly in the initial step to maximize the efficiency of the algorithm.
* Effect of outliers: In the execution of the k-means clustering algorithm, we use all the points in a cluster to determine the centroids for the next iteration. If there are outliers in the dataset, they strongly affect the position of the centroids.
Due to this, the clustering becomes inaccurate. To avoid this, you can try to identify outliers and remove them during data cleaning.
* Curse of dimensionality: As the number of dimensions in the dataset increases, the distances of the data points from any given point start converging to a single value. Due to this, k-means clustering, which builds clusters based on the distance between points, becomes inefficient. To overcome this problem, you can use advanced clustering algorithms like spectral clustering. Alternatively, you can try to reduce the dimensionality of the dataset during data preprocessing.

Conclusion

In this article, we have explained the k-means clustering algorithm with a numerical example. We have also discussed the applications, advantages, and disadvantages of the k-means clustering algorithm.