K Means Clustering

The document provides an overview of k-means clustering, which is a method of grouping similar data objects into clusters based on their characteristics. It outlines the steps involved in the k-means algorithm, the quality measures for clustering, and its applications in various fields such as marketing and city planning. Additionally, it discusses the strengths and weaknesses of the k-means algorithm, including its efficiency and sensitivity to outliers.


Module 4

k-means Clustering Examples


What is Cluster Analysis?
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
• Unsupervised learning: no predefined classes
• Typical applications
– As a stand-alone tool to get insight into data distribution
– As a preprocessing step for other algorithms
March 7, 2023
Clustering
• Clustering is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar (in some sense or another) to each other than to those in other groups (clusters)
• k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean
Examples of Clustering Applications
• Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
• Land use: identification of areas of similar land use in an earth observation database
• Insurance: identifying groups of motor insurance policy holders with a high average claim cost
• City planning: identifying groups of houses according to their house type, value, and geographical location
• Earthquake studies: observed earthquake epicenters should be clustered along continental faults
Quality: What Is Good Clustering?
• A good clustering method will produce high-quality clusters with
– high intra-class similarity
– low inter-class similarity
• The quality of a clustering result depends on both the similarity measure used by the method and its implementation
• The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns

• Intra-cluster cohesion (compactness):
– Cohesion measures how near the data points in a cluster are to the cluster centroid.
– Sum of squared error (SSE) is a commonly used measure.
• Inter-cluster separation (isolation):
– Separation means that different cluster centroids should be far away from one another.
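A quick sketch of how SSE cohesion can be computed (the clusters below are made-up 1-D values for illustration, not data from the slides):

```python
# Sketch: intra-cluster cohesion via sum of squared error (SSE).
# Lower total SSE means more compact clusters.

def sse(cluster):
    """Sum of squared distances of the points to their cluster centroid."""
    centroid = sum(cluster) / len(cluster)
    return sum((x - centroid) ** 2 for x in cluster)

clusters = [[2, 3, 4], [10, 12, 11]]        # illustrative clustering
total_sse = sum(sse(c) for c in clusters)
print(total_sse)  # 4.0
```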
Measure the Quality of Clustering
• Dissimilarity/similarity metric: similarity is expressed in terms of a distance function, typically a metric d(i, j)
• There is a separate “quality” function that measures the “goodness” of a cluster.
• The definitions of distance functions are usually very different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables.
• Weights should be associated with different variables based on applications and data semantics.
• It is hard to define “similar enough” or “good enough”
– the answer is typically highly subjective.
Distance (dissimilarity) measures
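As a sketch of two distance measures commonly used with k-means (Euclidean and Manhattan/city-block), with illustrative points:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))       # 5.0
print(manhattan((1, 2, 2), (2, 4, 2))) # 3
```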
Requirements of Clustering in Data Mining

• Scalability
• Ability to deal with different types of attributes
• Ability to handle dynamic data
• Discovery of clusters with arbitrary shape
• Minimal requirements for domain knowledge to determine input parameters
• Ability to deal with noise and outliers
• Insensitive to order of input records
• High dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability

k-means Algorithm
• Given k, the k-means algorithm is implemented in four steps:
– Partition objects into k nonempty subsets
– Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)
– Assign each object to the cluster with the nearest seed point
– Go back to Step 2; stop when no new assignments occur
K-means Clustering - Steps
1. Pick k starting means, m1 to mk
   (can use random values, dynamically picked values, or lower/upper bounds)
2. Repeat until convergence:
   i) Split the data into k sets, S1 to Sk, where x belongs to Si iff mi is the closest mean to x
   ii) Update each mi to the mean of Si
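The steps above can be sketched as a minimal k-means implementation (illustrative only, not production code; points are tuples and the caller supplies the starting means, as in step 1):

```python
# Minimal k-means sketch following the steps above.

def kmeans(points, means, max_iter=100):
    for _ in range(max_iter):
        # i) split the data: each point joins the cluster of its nearest mean
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, means[j])))
            clusters[nearest].append(p)
        # ii) update each mean to the centroid of its cluster
        new_means = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else m
                     for cl, m in zip(clusters, means)]
        if new_means == means:   # convergence: assignments stopped changing
            return clusters, means
        means = new_means
    return clusters, means

clusters, means = kmeans([(1,), (2,), (10,), (11,)], [(1,), (10,)])
print(means)  # [(1.5,), (10.5,)]
```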
k-means Clustering Method
• General example (K = 2):
[Figure: arbitrarily choose K objects as the initial cluster centers; assign each object to the most similar center; update the cluster means; reassign the objects; repeat updating and reassigning until the assignments no longer change.]
Example 1: k-means
• Given: {2, 4, 10, 12, 3, 20, 30, 11, 25}, k = 2
• Randomly assign means: m1 = 3, m2 = 4
Iteration 1
• K1 = {2, 3}, K2 = {4, 10, 12, 20, 30, 11, 25}
• Calculating the means of K1 and K2 gives m1 = 2.5, m2 = 16
Iteration 2
• K1 = {2, 3, 4}, K2 = {10, 12, 20, 30, 11, 25}
• Calculating the means of K1 and K2 gives m1 = 3, m2 = 18
Iteration 3
• K1 = {2, 3, 4, 10}, K2 = {12, 20, 30, 11, 25}
• Calculating the means of K1 and K2 gives m1 = 4.75, m2 = 19.6
Iteration 4
• K1 = {2, 3, 4, 10, 11, 12}, K2 = {20, 30, 25}
• The new means are m1 = 7, m2 = 25; reassigning the points changes nothing, so the algorithm has converged.
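The iterations above can be checked with a short script (a sketch that mirrors the slide's 1-D example; ties, which do not occur here, would go to K1):

```python
# Re-running Example 1 to verify the slide's arithmetic.
data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
m1, m2 = 3.0, 4.0  # initial means from the slide
while True:
    # assign each point to its nearest mean
    k1 = [x for x in data if abs(x - m1) <= abs(x - m2)]
    k2 = [x for x in data if abs(x - m1) > abs(x - m2)]
    # recompute the means; stop once they no longer change
    n1, n2 = sum(k1) / len(k1), sum(k2) / len(k2)
    if (n1, n2) == (m1, m2):
        break
    m1, m2 = n1, n2
print(sorted(k1), m1)  # [2, 3, 4, 10, 11, 12] 7.0
print(sorted(k2), m2)  # [20, 25, 30] 25.0
```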
Example 3
• The dataset contains 8 sample objects with their X, Y and Z coordinates.
• The task is to cluster these objects into two clusters (k = 2).
• Let us consider OB-2 (1, 2, 2) and OB-6 (2, 4, 2) as the centroids of cluster 1 and cluster 2, respectively.
• For distance measurement, let the Manhattan distance be used:
  d = |x2 – x1| + |y2 – y1| + |z2 – z1|
• Dataset:

  Object  X  Y  Z
  OB-1    1  4  1
  OB-2    1  2  2
  OB-3    1  4  2
  OB-4    2  1  2
  OB-5    1  1  1
  OB-6    2  4  2
  OB-7    1  1  2
  OB-8    2  1  1
Example 3 (cont’d)
• After the initial pass of clustering, the state of the clustered objects will look like the following:
  Cluster 1: OB-2, OB-4, OB-5, OB-7, OB-8
  Cluster 2: OB-1, OB-3, OB-6
• Distances:

  Object  X  Y  Z  Distance from C1 (1,2,2)  Distance from C2 (2,4,2)
  OB-1    1  4  1  3                         2
  OB-2    1  2  2  0                         3
  OB-3    1  4  2  2                         1
  OB-4    2  1  2  2                         3
  OB-5    1  1  1  2                         5
  OB-6    2  4  2  3                         0
  OB-7    1  1  2  1                         4
  OB-8    2  1  1  3                         4

• Updated cluster centroids:
  For cluster 1: ((1+2+1+1+2)/5, (2+1+1+1+1)/5, (2+2+1+2+1)/5) = (1.4, 1.2, 1.6)
  For cluster 2: ((1+1+2)/3, (4+4+4)/3, (1+2+2)/3) = (1.33, 4, 1.66)
Example 3 (cont’d)
• The new assignments of the objects with respect to the updated centroids are:
  Cluster 1: OB-2, OB-4, OB-5, OB-7, OB-8
  Cluster 2: OB-1, OB-3, OB-6
• Distances:

  Object  X  Y  Z  Distance from C1 (1.4,1.2,1.6)  Distance from C2 (1.33,4,1.66)
  OB-1    1  4  1  3.8                             1
  OB-2    1  2  2  1.6                             2.66
  OB-3    1  4  2  3.6                             0.66
  OB-4    2  1  2  1.2                             4
  OB-5    1  1  1  1.2                             4
  OB-6    2  4  2  3.8                             1
  OB-7    1  1  2  1                               3.66
  OB-8    2  1  1  1.4                             4.33

• Since there is no change in the cluster assignments, the clustering is the same as in the previous state.
• Hence, for k = 2, the final state of the two clusters is as above.
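Example 3 can likewise be re-checked in code (a sketch using the same objects, initial centroids, and Manhattan distance as the slides):

```python
# Re-checking Example 3: two passes of k-means with Manhattan distance.
objs = {"OB-1": (1, 4, 1), "OB-2": (1, 2, 2), "OB-3": (1, 4, 2),
        "OB-4": (2, 1, 2), "OB-5": (1, 1, 1), "OB-6": (2, 4, 2),
        "OB-7": (1, 1, 2), "OB-8": (2, 1, 1)}

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

c1, c2 = objs["OB-2"], objs["OB-6"]  # initial centroids from the slide
for _ in range(2):                   # two passes suffice for this data
    # assign each object to the nearer centroid (ties go to cluster 1)
    g1 = [k for k, v in objs.items() if manhattan(v, c1) <= manhattan(v, c2)]
    g2 = [k for k, v in objs.items() if manhattan(v, c1) > manhattan(v, c2)]
    # update the centroids to the per-coordinate means of each group
    c1 = tuple(sum(objs[k][d] for k in g1) / len(g1) for d in range(3))
    c2 = tuple(sum(objs[k][d] for k in g2) / len(g2) for d in range(3))
print(g1)  # ['OB-2', 'OB-4', 'OB-5', 'OB-7', 'OB-8']
print(g2)  # ['OB-1', 'OB-3', 'OB-6']
```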
Why use K-means?
• Strengths:
– Simple: easy to understand and to implement
– Efficient: time complexity is O(tkn), where
  • n is the number of data points,
  • k is the number of clusters, and
  • t is the number of iterations.
– Since both k and t are usually small, k-means is considered a linear-time algorithm.
• K-means is the most popular clustering algorithm.
• Note that it terminates at a local optimum if SSE is used; the global optimum is hard to find due to complexity.
Weaknesses of K-means
• The algorithm is only applicable when the mean is defined.
– For categorical data, k-modes is used: the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
– Outliers are data points that are very far away from other data points.
– Outliers could be errors in the data recording or some special data points with very different values.
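To see the outlier sensitivity concretely: a single far-away point is enough to drag a cluster mean away from the rest of its points (made-up numbers for illustration):

```python
# One outlier pulls the mean far from every "normal" point in the cluster.
cluster = [1, 2, 3, 4, 100]          # 100 is an outlier
mean = sum(cluster) / len(cluster)
print(mean)  # 22.0, nowhere near the points 1..4
```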
References
1. Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”, Pearson, 2012.
2. https://www.datacamp.com/community/tutorials/k-means-clustering-python
