K-means Clustering
K-means clustering is a popular unsupervised learning algorithm. In simple terms: imagine you have a
box of mixed candies and you want to sort them into groups based on their colors. Here's how K-
means works step by step:
Step-by-Step Explanation
1. Choosing the Number of Groups (K):
o First, you decide how many groups (clusters) you want to divide the candies into. Let's
say you want to sort them into 3 groups.
2. Placing the Initial Centers (Centroids):
o Imagine putting 3 invisible markers randomly in your box of candies. These markers
represent the centers (or centroids) of your 3 groups.
3. Assigning Candies to Groups:
o Look at each candy one by one and find out which marker (centroid) is closest to it.
Assign the candy to that group. For example, if a red candy is closest to the first
marker, it goes into the first group.
4. Moving the Markers:
o Once all the candies are assigned to groups, you move the markers to the center of
their respective groups. This means you calculate the average position of all the
candies in each group and place the marker there.
5. Repeating the Process:
o Repeat steps 3 and 4 until the markers (centroids) don't move much anymore. This
means the candies are well grouped, and the markers have found their best positions.
6. Final Groups:
o The candies are now sorted into 3 groups based on their colors, with each group
having its own centroid.
More formally, the algorithm from the slides works as follows (a runnable sketch is given right after this list):
1. Initialization:
o Choose K initial centroids, for example by picking K data points at random.
2. Assignment Step:
o Assign each data point to the cluster whose centroid is nearest to it.
3. Update Step:
o Calculate the new centroid of each cluster by taking the mean of all data points in the
cluster.
o Move the centroid to this new mean position.
4. Repeat:
o Repeat the Assignment and Update steps until the centroids no longer change
significantly (convergence).
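Here is a minimal NumPy sketch of that loop. The function name kmeans and its parameters are illustrative choices, not anything from the slides, and for simplicity it assumes no cluster ever ends up empty:

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """One run of K-means on the rows of X (shape m x d)."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick K distinct data points as the starting centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(max_iters):
        # 2. Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update step: move each centroid to the mean of its points.
        # (This sketch assumes no cluster becomes empty.)
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # 4. Repeat until the centroids stop moving (convergence).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```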
Simple Calculation Example
Let's say you have the following 2D points (candies):
(1, 2)
(2, 3)
(3, 4)
(8, 8)
(9, 9)
(10, 10)
1. Initialize Centroids:
o Let's randomly pick (1, 2) and (9, 9) as the initial centroids.
2. Assignment Step:
o Calculate the distance from each point to the centroids:
(1, 2) to (1, 2) = 0
(1, 2) to (9, 9) = 10.63
(2, 3) to (1, 2) = 1.41
(2, 3) to (9, 9) = 9.22
(3, 4) to (1, 2) = 2.83
(3, 4) to (9, 9) = 7.81
(8, 8) to (1, 2) = 9.22
(8, 8) to (9, 9) = 1.41
(9, 9) to (1, 2) = 10.63
(9, 9) to (9, 9) = 0
(10, 10) to (1, 2) = 12.04
(10, 10) to (9, 9) = 1.41
o Assign points to the nearest centroid:
Group 1: (1, 2), (2, 3), (3, 4)
Group 2: (8, 8), (9, 9), (10, 10)
3. Update Step:
o Calculate new centroids:
Group 1: Mean of (1, 2), (2, 3), (3, 4) = (2, 3)
Group 2: Mean of (8, 8), (9, 9), (10, 10) = (9, 9)
4. Repeat:
o Assign points again:
Group 1: (1, 2), (2, 3), (3, 4)
Group 2: (8, 8), (9, 9), (10, 10)
o New centroids remain the same, so we stop.
Final Clusters
Cluster 1: (1, 2), (2, 3), (3, 4)
Cluster 2: (8, 8), (9, 9), (10, 10)
That's K-means clustering! You started with random centroids, assigned points to the nearest ones,
updated the centroids, and repeated the process until the centroids stabilized.
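As a sanity check, here is a standalone snippet (assuming NumPy) that replays this exact example with the hand-picked initial centroids (1, 2) and (9, 9) instead of a random choice:

```python
import numpy as np

# The six example points from above.
X = np.array([[1, 2], [2, 3], [3, 4], [8, 8], [9, 9], [10, 10]], dtype=float)

# Hand-pick (1, 2) and (9, 9) as the initial centroids, as in the example.
centroids = np.array([[1.0, 2.0], [9.0, 9.0]])
for _ in range(100):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(centroids)  # [[2. 3.] [9. 9.]] -- the final centroids found above
print(labels)     # [0 0 0 1 1 1]    -- Cluster 1 vs. Cluster 2
```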
The optimization objective of K-means clustering is to minimize the within-cluster sum of squares
(WCSS), also known as the sum of squared errors (SSE). This objective can be understood as trying
to make the points within each cluster as close to each other as possible, which means minimizing the
distance between the points and their respective centroids.
Mathematical Formulation
For K clusters with centroids $\mu_1, \dots, \mu_K$, where $C_k$ is the set of points assigned to cluster $k$, the objective is

$$\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

Each inner term is the squared distance from a point to the centroid of its cluster; summing over all points and all clusters gives the total within-cluster scatter that K-means tries to minimize.
Explanation
Let's go through a simple example with 2 clusters and a few data points.
Data Points:
(1, 2)
(2, 3)
(3, 4)
(8, 8)
(9, 9)
(10, 10)
Initial Centroids:
Centroid 1: (1, 2)
Centroid 2: (9, 9)
Assignment Step:
As before, (1, 2), (2, 3), (3, 4) are assigned to Centroid 1 and (8, 8), (9, 9), (10, 10) to Centroid 2. The squared distances to the initial centroids give:
Cluster 1 (centroid (1, 2)): 0 + 2 + 8 = 10
Cluster 2 (centroid (9, 9)): 2 + 0 + 2 = 4
Initial WCSS = 10 + 4 = 14
Centroid Calculation:
New centroid for Cluster 1: Mean of (1, 2), (2, 3), (3, 4) = (2, 3)
New centroid for Cluster 2: Mean of (8, 8), (9, 9), (10, 10) = (9, 9)
Total WCSS
With the updated centroids (2, 3) and (9, 9), the squared distances become:
Cluster 1: 2 + 0 + 2 = 4
Cluster 2: 2 + 0 + 2 = 4
Total WCSS = 4 + 4 = 8, down from 14 with the initial centroids.
The goal of K-means is to minimize this total WCSS value. During each iteration, the algorithm
updates the centroids and reassigns points to clusters in a way that gradually reduces the WCSS
until it cannot be reduced any further (convergence).
By minimizing WCSS, K-means ensures that the points within each cluster are as close to each
other (and their centroid) as possible, leading to more compact and well-defined clusters.
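To make this concrete, here is a small sketch (assuming NumPy; the function name wcss is an illustrative choice) that reproduces the WCSS values from the example above:

```python
import numpy as np

def wcss(X, centroids, labels):
    """Within-cluster sum of squares: the squared distance from every
    point to the centroid of the cluster it is assigned to."""
    return sum(((X[labels == k] - c) ** 2).sum()
               for k, c in enumerate(centroids))

X = np.array([[1, 2], [2, 3], [3, 4], [8, 8], [9, 9], [10, 10]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

# Initial centroids (1, 2) and (9, 9): WCSS = 10 + 4 = 14
print(wcss(X, np.array([[1.0, 2.0], [9.0, 9.0]]), labels))  # 14.0
# Updated centroids (2, 3) and (9, 9): WCSS = 4 + 4 = 8
print(wcss(X, np.array([[2.0, 3.0], [9.0, 9.0]]), labels))  # 8.0
```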
Finally, this is how the algorithm behaves in practice: it repeats these same steps, unchanged, at every iteration until convergence.
Random Initialization of Cluster Centroids
We must have K < m (the number of training examples).
A simple approach is to pick K training examples at random and use them as the initial centroids.
Figure 1, where the chosen centroids are well spread out, is a good initialization, but the selection in
Figure 2 is poor. This means that K-means can converge to different solutions depending on the
random selection, so it is important to try multiple random initializations:
“If K (the number of clusters) is greater than 10, then multiple
random initialization is not worth it, because it will give the same
value most of the time. But if we have fewer than 10 clusters, it is
better to use multiple random initializations and, at the end, select
the one with the least amount of distortion (cost function).”
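A sketch of that recipe, reusing the kmeans() and wcss() functions from the earlier snippets (the name kmeans_multi_init and the choice of 50 restarts are illustrative assumptions, not values from the notes):

```python
import numpy as np

def kmeans_multi_init(X, K, n_init=50):
    """Run K-means n_init times with different random initializations
    and keep the run with the lowest distortion (WCSS)."""
    best_cost, best_centroids, best_labels = np.inf, None, None
    for seed in range(n_init):
        centroids, labels = kmeans(X, K, seed=seed)
        cost = wcss(X, centroids, labels)
        if cost < best_cost:
            best_cost, best_centroids, best_labels = cost, centroids, labels
    return best_centroids, best_labels, best_cost
```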