SCA - Module 8

This document discusses K-means clustering, an algorithm that groups data points into K number of clusters based on their attributes and distances. It explains the basic steps of the algorithm, including initializing cluster seeds, assigning data points to the closest cluster, recalculating the centroids, and iterating until clusters stabilize. Issues like dependency on initial seeds and getting stuck in local optima are also covered.

Uploaded by

mahnoor



Clustering

Week 11
Clustering >> K-means Clustering
• Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in the other groups
• K-means: cluster n objects, based on their attributes, into K partitions
• Strengths
• Simple iterative method
• User provides “K”

• Weaknesses
• Often too simple → bad results
• Difficult to guess the correct “K”

• Distance measures
• Euclidean distance
• Manhattan distance
• Maximum norm
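
The three distance measures named on this slide can be sketched as below. This is a minimal illustration, not code from the slides; the function names and the two sample points are assumptions chosen for the example.

```python
import math

def euclidean(p, q):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def maximum_norm(p, q):
    """Largest absolute coordinate difference (L-infinity)."""
    return max(abs(a - b) for a, b in zip(p, q))

# Two sample points, chosen only for illustration.
p, q = (2, 10), (5, 8)
print(euclidean(p, q))     # sqrt(3^2 + 2^2) = sqrt(13)
print(manhattan(p, q))     # 3 + 2 = 5
print(maximum_norm(p, q))  # max(3, 2) = 3
```

All three agree on which of two points is "closer" in many cases, but not always, so the choice of measure can change the resulting clusters.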
K-means Clustering

Basic Algorithm:
• Step 0: select K
• Step 1: randomly select initial cluster seeds

[Figure: initial cluster seeds, Seed 1 = 650 and Seed 2 = 200]

• An initial cluster seed represents the “mean value” of its cluster.

• In the preceding figure:
• Cluster seed 1 = 650
• Cluster seed 2 = 200
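
One common way to carry out Step 1 is to draw K distinct observations at random and use them as seeds. The sketch below assumes that approach; the slides do not say how the seeds 650 and 200 were drawn. The helper name `pick_seeds` and the data values are hypothetical (only 650 and 200 echo the figure).

```python
import random

def pick_seeds(values, k, rng_seed=None):
    """Pick k distinct observations to serve as initial cluster seeds."""
    rng = random.Random(rng_seed)  # a fixed rng_seed makes the draw repeatable
    return rng.sample(values, k)

# Hypothetical one-dimensional data, invented for illustration.
values = [120, 200, 310, 480, 650, 700, 820]
seeds = pick_seeds(values, k=2, rng_seed=0)
print(seeds)
```

Because the draw is random, different runs can start from different seeds, which is why the final clusters may differ between runs.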

• Step 2: calculate distance from each object to each cluster seed.

• What type of distance should we use?

• Squared Euclidean distance

• Step 3: Assign each object to the closest cluster
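
Steps 2 and 3 can be sketched together: compute the squared Euclidean distance from each object to every seed, then pick the seed with the smallest value. The function names are assumptions; the sample points and seeds are taken from the worked example later in these slides.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance (no square root needed for ranking)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    labels = []
    for p in points:
        dists = [squared_euclidean(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))  # ties go to the lower index
    return labels

points = [(2, 10), (2, 5), (8, 4)]
centroids = [(2, 10), (5, 8), (1, 2)]
labels = assign(points, centroids)
print(labels)  # [0, 2, 1]
```

Indices here are zero-based, so index 0 corresponds to cluster 1 in the slides.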

[Figure: Seed 1 and Seed 2]

• Step 4: Compute the new centroid for each cluster

[Figure: new centroids, Cluster Seed 1 = 708.9 and Cluster Seed 2 = 214.2]
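
Step 4 amounts to averaging the coordinates of the objects in each cluster. A minimal sketch, assuming the cluster is non-empty; the function name `centroid` is an assumption, and the five sample points are the ones assigned to cluster 2 after the first pass of the worked example later in these slides.

```python
def centroid(cluster_points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster_points)
    return tuple(sum(coords) / n for coords in zip(*cluster_points))

cluster = [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)]
print(centroid(cluster))  # (6.0, 6.0)
```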

• Iterate
• Calculate distance from objects to cluster centroids
• Assign objects to closest cluster
• Recalculate new centroids

• Stop based on convergence criteria

• No change in clusters
• Max iterations
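
Put together, the iteration and both stopping rules can be sketched as a single loop. This is an illustrative sketch, not the slides' implementation: the function name `kmeans`, the `max_iter` default, and the choice to leave an empty cluster's centroid unchanged are all assumptions.

```python
def kmeans(points, centroids, max_iter=100):
    """Assign, update, and stop when assignments repeat or max_iter is hit."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):                      # stop rule: max iterations
        new_labels = []
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(d.index(min(d)))
        if new_labels == labels:                   # stop rule: no change in clusters
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                            # assumption: keep old centroid if empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Tiny one-dimensional illustration (invented values).
labels, centroids = kmeans([(1,), (2,), (9,), (10,)], [(1,), (10,)])
print(labels, centroids)  # [0, 0, 1, 1] [(1.5,), (9.5,)]
```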
K-means issues

• Distance measure is squared Euclidean

• Approach tries to minimize the within-cluster sum of squares error

• Implicit assumption that SSE is similar for each group
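
Written out, the within-cluster sum of squares error is commonly expressed as follows. The symbols C_k (the set of objects in cluster k), \mu_k (its centroid) and x_i (an object) are notation introduced here, not taken from the slides.

```latex
\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Each assignment step and each centroid update can only lower this quantity or leave it unchanged, which is why the procedure settles down, though not necessarily at the smallest possible SSE.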
Bottom line

• K-means
• Easy to use
• Need to know K
• May need to scale data
• Good initial method

• Local optima
• No guarantee of optimal solution
• Repeat with different starting values
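
A minimal sketch of "repeat with different starting values": run K-means from several random seed sets and keep the run with the lowest SSE. The function names, the number of restarts and the random seed are assumptions; the points are those of the worked example in these slides.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def run_kmeans(points, seeds, max_iter=100):
    centroids = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def sse(points, labels, centroids):
    """Within-cluster sum of squares for a given assignment."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

def best_of_restarts(points, k, n_restarts=10, rng_seed=0):
    """Run K-means from several random seed sets; keep the lowest-SSE run."""
    rng = random.Random(rng_seed)
    best = None
    for _ in range(n_restarts):
        labels, centroids = run_kmeans(points, rng.sample(points, k))
        score = sse(points, labels, centroids)
        if best is None or score < best[0]:
            best = (score, labels, centroids)
    return best

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
best_score, best_labels, best_centroids = best_of_restarts(points, k=3)
print(best_score, best_labels)
```

Restarts reduce the risk of reporting a poor local optimum, but they still give no guarantee that the best run is the global optimum.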
Example
Data provided by 3 companies

Point  x1  y1
A1     2   10
A2     2   5
A3     8   4
B1     5   8
B2     7   5
B3     6   4
C1     1   2
C2     4   9

[Figure: scatter plot of the data points]
Example
Initial centroids
A1: 2, 10
B1: 5, 8
C1: 1, 2

Data           Distance to cluster (centroid x2, y2)   New cluster
Point  x1  y1  1 (2, 10)    2 (5, 8)     3 (1, 2)
A1     2   10  0            3.605551275  8.062257748  1
A2     2   5   5            4.242640687  3.16227766   3
A3     8   4   8.485281374  5            7.280109889  2
B1     5   8   3.605551275  0            7.211102551  2
B2     7   5   7.071067812  3.605551275  6.708203932  2
B3     6   4   7.211102551  4.123105626  5.385164807  2
C1     1   2   8.062257748  7.211102551  0            3
C2     4   9   2.236067977  1.414213562  7.615773106  2
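
The first distance table can be reproduced with a few lines of Python. This is a sketch for checking the numbers, not code from the slides. Note that the tabulated values are plain Euclidean distances (for example, 3.605551275 is the square root of 13); because the square root preserves order, the closest cluster is the same as with squared Euclidean distance.

```python
import math

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}
centroids = [(2, 10), (5, 8), (1, 2)]  # initial centroids A1, B1, C1

rows = {}
for name, p in points.items():
    dists = [math.dist(p, c) for c in centroids]  # Euclidean distance
    cluster = dists.index(min(dists)) + 1         # clusters numbered from 1
    rows[name] = (dists, cluster)
    print(name, [round(d, 9) for d in dists], cluster)
```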

New centroids
A1: 2, 10
B1: 6, 6
C1: 1.5, 3.5

Data points    Distance to cluster (centroid x2, y2)   Cluster
Point  x1  y1  1 (2, 10)    2 (6, 6)     3 (1.5, 3.5)  Old  New
A1     2   10  0            5.656854249  6.519202405   1    1
A2     2   5   5            4.123105626  1.58113883    3    3
A3     8   4   8.485281374  2.828427125  6.519202405   2    2
B1     5   8   3.605551275  2.236067977  5.700877125   2    2
B2     7   5   7.071067812  1.414213562  5.700877125   2    2
B3     6   4   7.211102551  2            4.527692569   2    2
C1     1   2   8.062257748  6.403124237  1.58113883    3    3
C2     4   9   2.236067977  3.605551275  6.041522987   2    1

[Figure: scatter plot]
New centroids
A1: 3, 9.5
B1: 6.5, 5.25
C1: 1.5, 3.5

Data           Distance to cluster (centroid x2, y2)      Cluster
Point  x1  y1  1 (3, 9.5)   2 (6.5, 5.25)  3 (1.5, 3.5)  Old  New
A1     2   10  1.118033989  6.543126164    6.519202405   1    1
A2     2   5   4.609772229  4.506939094    1.58113883    3    3
A3     8   4   7.433034374  1.952562419    6.519202405   2    2
B1     5   8   2.5          3.132491022    5.700877125   2    1
B2     7   5   6.020797289  0.559016994    5.700877125   2    2
B3     6   4   6.264982043  1.346291202    4.527692569   2    2
C1     1   2   7.762087348  6.38846617     1.58113883    3    3
C2     4   9   1.118033989  4.506939094    6.041522987   1    1
Example
New centroids
A1: 3.67, 9
B1: 7, 4.33
C1: 1.5, 3.5

Data           Distance to cluster (centroid x2, y2)   Cluster
Point  x1  y1  1 (3.67, 9)  2 (7, 4.33)  3 (1.5, 3.5)  Old  New
A1     2   10  1.946509697  7.559689147  6.519202405   1    1
A2     2   5   4.334616477  5.044690278  1.58113883    3    3
A3     8   4   6.614295125  1.053043209  6.519202405   2    2
B1     5   8   1.664001202  4.179581319  5.700877125   1    1
B2     7   5   5.204699799  0.67         5.700877125   2    2
B3     6   4   5.516239661  1.053043209  4.527692569   2    2
C1     1   2   7.491922317  6.436528567  1.58113883    3    3
C2     4   9   0.33         5.550576547  6.041522987   1    1

Final Cluster

Point  x1  y1  Cluster
A1     2   10  1
B1     5   8   1
C2     4   9   1
A3     8   4   2
B2     7   5   2
B3     6   4   2
A2     2   5   3
C1     1   2   3

[Figure: scatter plot of points]
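
The whole example can be replayed end to end. The sketch below starts from the initial centroids A1, B1 and C1 and repeats the assign and update steps until no object changes cluster. The variable names are assumptions, and the code uses exact centroids rather than the rounded values (3.67, 4.33) shown in the tables.

```python
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}
centroids = [(2.0, 10.0), (5.0, 8.0), (1.0, 2.0)]  # start at A1, B1, C1
names = list(points)

labels = None
passes = 0
while True:
    passes += 1
    new_labels = []
    for name in names:
        x, y = points[name]
        d = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        new_labels.append(d.index(min(d)) + 1)     # clusters numbered from 1
    if new_labels == labels:                        # no change in clusters: stop
        break
    labels = new_labels
    for k in range(len(centroids)):
        members = [points[n] for n, lab in zip(names, labels) if lab == k + 1]
        if members:  # guard against an empty cluster
            centroids[k] = (
                sum(m[0] for m in members) / len(members),
                sum(m[1] for m in members) / len(members),
            )

final = dict(zip(names, labels))
print(passes, final)
print([(round(cx, 2), round(cy, 2)) for cx, cy in centroids])
```

The loop makes four assignment passes, matching the four distance tables, and ends with A1, B1, C2 in cluster 1, A3, B2, B3 in cluster 2, and A2, C1 in cluster 3.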
