clustering
Programme : BCA
Semester : V
Subject Code : BCAT311
Subject : Machine Learning with Python
Topic : Clustering
Faculty : Ms. Shilpi Bansal
© Institute of Information Technology and Management, D-29, Institutional Area, Janakpuri, New Delhi-110058
List of Topics
Introduction to clustering
K-means clustering
Hierarchical clustering
Requirements for Clustering in Data Mining
Scalability
Ability to deal with different types of attributes
Discovery of clusters with arbitrary shape
Minimal domain knowledge required to determine input parameters
Ability to deal with noise and outliers
Insensitivity to order of input records
Robustness with respect to high dimensionality
Incorporation of user-specified constraints
Interpretability and usability
Similarity and Dissimilarity Between Objects
Major Clustering Approaches
Partitioning Algorithms
K-Means Clustering
K-Means Clustering (contd.)
Example
[Figure: four scatter-plot panels, both axes 0 to 10]
Comments on the K-Means Method
Strengths
Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
Comment: often terminates at a local optimum. The global optimum may be found using techniques such as simulated annealing and genetic algorithms.
Weaknesses
Applicable only when mean is defined (what about categorical data?)
Need to specify k, the number of clusters, in advance
Trouble with noisy data and outliers
Not suitable for discovering clusters with non-convex shapes
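The points above can be made concrete with a minimal K-Means sketch in plain Python. It is an illustration only, not the implementation used in the course. The function names (`squared_distance`, `mean_point`, `k_means`), the random initialisation, the fixed seed and the sample points are all assumptions made for this example. Each iteration compares every one of the n objects with each of the k centroids, which is where the O(tkn) cost comes from, and k must be supplied in advance.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    """Basic K-Means sketch; returns (centroids, labels).

    Assumption: initial centroids are k distinct input points chosen at
    random, so the result may be a local optimum that depends on the seed.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):          # t iterations
        # Assignment step: n objects compared with k centroids.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:             # assignments stable: stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                      # keep old centroid if empty
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical sample data: two well-separated groups of 2-D points.
sample = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(sample, k=2)
print(centroids, labels)
```

Because the update step takes a mean, the sketch only works for numeric attributes, and a single distant outlier would pull a centroid towards it. Both limitations are listed under Weaknesses above.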
Hierarchical Clustering
Uses a distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it needs a termination condition.
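As a rough illustration of the idea, the sketch below builds a distance matrix and merges clusters bottom-up until a termination condition is met. The source does not fix a merging strategy, so the agglomerative direction, single-link (nearest-neighbour) distance, the `max_merge_distance` threshold and all function names are assumptions made for this example.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances between all points."""
    return [[math.dist(p, q) for q in points] for p in points]


def single_link_agglomerative(points, max_merge_distance):
    """Agglomerative clustering sketch; returns (clusters, merges).

    Clusters are lists of point indices. Termination condition (an
    assumption of this sketch): stop when the closest pair of clusters
    is farther apart than max_merge_distance.
    """
    dist = distance_matrix(points)
    clusters = [[i] for i in range(len(points))]   # every point starts alone
    merges = []                                    # (cluster_a, cluster_b, distance)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the two closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_merge_distance:                 # termination condition
            break
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]    # merge b into a
        del clusters[b]
    return clusters, merges


# Hypothetical sample data: two well-separated groups of 2-D points.
sample = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters, merges = single_link_agglomerative(sample, max_merge_distance=2.0)
print(clusters)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

No value of k is passed in; the number of clusters follows from the threshold. The recorded `merges` list holds the same information a dendrogram displays: which clusters were joined, and at what distance.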
[Figure: three scatter-plot panels, both axes 0 to 10]
A Dendrogram Shows How the Clusters are Merged Hierarchically
DIANA (Divisive Analysis)
[Figure: three scatter-plot panels, both axes 0 to 10]