Topic 4: Clustering
EE4483 / IM4483
Plan for Part 3
Clustering
Outline
• Concept of Clustering
• Distance Metrics
• K-Means
• HAC
Carry-on Questions
Recap: Classification
• What if we do NOT have the labels? Which pixels form the flower?
From classification to clustering
(Figure: the flower image's pixels grouped into flower pixels and background pixels, i.e., Cluster 1 and Cluster 2, without using any labels.)
From classification to clustering
• Classification is supervised: the training data come with labels.
• Clustering is unsupervised: no labels are given; the groups must be discovered from the data itself.
Clustering
• Clustering: group the data points so that points within a cluster are similar to each other and dissimilar from points in other clusters.
• Unsupervised method: no labels are required.
Distance Measures / Metrics
A function $d(\cdot, \cdot)$ is a distance metric if it satisfies:
1. Non-negativity: $d(\mathbf{x}_i, \mathbf{x}_j) \ge 0$, and $d(\mathbf{x}_i, \mathbf{x}_j) = 0$ if and only if $\mathbf{x}_i = \mathbf{x}_j$.
2. Triangle inequality: $d(\mathbf{x}_i, \mathbf{x}_j) + d(\mathbf{x}_j, \mathbf{x}_l) \ge d(\mathbf{x}_i, \mathbf{x}_l)$.
3. Symmetry: $d(\mathbf{x}_i, \mathbf{x}_j) = d(\mathbf{x}_j, \mathbf{x}_i)$.
Distance Measures / Metrics
• Examples of distances:
1. Euclidean Distance: $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2} = \|\mathbf{x} - \mathbf{y}\|_2$
2. Manhattan Distance: $d(\mathbf{x}, \mathbf{y}) = \sum_{j=1}^{d} |x_j - y_j| = \|\mathbf{x} - \mathbf{y}\|_1$
3. Infinity (Sup) Distance: $d(\mathbf{x}, \mathbf{y}) = \max_{1 \le j \le d} |x_j - y_j| = \|\mathbf{x} - \mathbf{y}\|_\infty$
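As a quick check of the three formulas, here is a minimal NumPy sketch; the sample vectors are arbitrary, chosen only for illustration:

```python
import numpy as np

def euclidean(x, y):
    # L2 norm: square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # L1 norm: sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

def infinity(x, y):
    # L-infinity (sup) norm: largest absolute coordinate difference
    return np.max(np.abs(x - y))

x = np.array([2.0, 10.0])
y = np.array([5.0, 8.0])
print(euclidean(x, y))  # sqrt(9 + 4) ≈ 3.606
print(manhattan(x, y))  # 3 + 2 = 5.0
print(infinity(x, y))   # max(3, 2) = 3.0
```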
Clustering Algorithms
• Partition Algorithms
• K-Means
• Mixture of Gaussian
• Spectral Clustering
• Hierarchical Algorithms
• Agglomerative
• Divisive
Clustering Algorithms
• Partition Algorithm
(Figure: the data set is split directly into Cluster 1 and Cluster 2.)
Clustering Algorithms
• Hierarchical Algorithm
K-Means
• Clustering Goal:
1. Partition the $N$ data points $\mathbf{x}^{(i)}$ into $K$ clusters.
2. Represent each cluster $k$ by a centroid $\mu_k$.
3. The sum of distances between each $\mathbf{x}^{(i)}$ and the centroid $\mu_k$ of its assigned cluster is minimized.
K-Means
Initialize the $K$ cluster centers (e.g., as $K$ randomly chosen data points), then repeat until the assignments no longer change:
1. Assign every data point $\mathbf{x}_i$ to its closest cluster center, according to the given distance metric, i.e., find the $\mu_k$ such that $d(\mathbf{x}_i, \mu_k)$ is minimized.
2. Update each cluster center $\mu_k$ to be the average of its assigned data points.
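The two steps above translate almost line for line into code. Below is a minimal NumPy sketch using Euclidean distance; the function name, the random-point initialization, and the convergence test are our own choices for illustration, not prescribed by the slides:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-Means. X: (N, d) data array; K: number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize centers as K distinct random data points
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each point to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Step 2: move each center to the mean of its assigned points
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so assignments have converged
        centers = new_centers
    return labels, centers
```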
K-Means Example 1
• Given the three initial cluster centers A1, B1, and C1.
• Step 1: Determine which data point belongs to which cluster by calculating each point's distances to the three centers and assigning it to the nearest one.
K-Means Example 1
• Cluster 1 = {A1}, Cluster 2 = {B1, B2, B3, A3, C2}, Cluster 3 = {A2, C1}
• Step 2: The cluster centers after the first round of iteration are obtained by computing the mean of all the data points belonging to each cluster:
C1 = (2, 10); C2 = (6, 6); C3 = (1.5, 3.5)
• The procedure then repeats: reassign every point to its nearest updated center and recompute the means, until the assignments stop changing.
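For concreteness, here is the first iteration reproduced in code. The point coordinates below are an assumption (the standard version of this exercise; they are consistent with the cluster means computed above), so treat them as illustrative only:

```python
import numpy as np

# Assumed coordinates (not shown on the slides; chosen to match the
# cluster means C1 = (2, 10), C2 = (6, 6), C3 = (1.5, 3.5) above)
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
          "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9)}
centers = {1: points["A1"], 2: points["B1"], 3: points["C1"]}

# Step 1: assign each point to the nearest center (Euclidean distance)
clusters = {1: [], 2: [], 3: []}
for name, p in points.items():
    k = min(centers, key=lambda c: np.linalg.norm(np.subtract(p, centers[c])))
    clusters[k].append(name)
print(clusters)  # {1: ['A1'], 2: ['A3', 'B1', 'B2', 'B3', 'C2'], 3: ['A2', 'C1']}

# Step 2: recompute each center as the mean of its assigned points
for k, names in clusters.items():
    print(k, np.mean([points[n] for n in names], axis=0))
# 1 -> (2, 10); 2 -> (6, 6); 3 -> (1.5, 3.5)
```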
K-Means Example 2
(Figure: a 2-D cloud of data samples with Centroid 1 marked, showing how the centroid moves across iterations.)
K-Means Example 3
(Figure: K-Means results on the same data set with K = 2 and with K = 3.)
K-Means
K-Means solves the optimization problem
$$\min_{\mu_k} \min_{r_{ik}} \frac{1}{2} \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \, \|\mathbf{x}_i - \mu_k\|_2^2$$
• Normalization: $\sum_{k=1}^{K} r_{ik} = 1 \;\; \forall i$, where $r_{ik} \in \{0, 1\}$ indicates whether point $i$ is assigned to cluster $k$.
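The inner minimization over $r_{ik}$ is exactly the nearest-center assignment, so the objective of a given clustering can be evaluated directly; a small sketch (our own helper, for use with the `kmeans` function above):

```python
import numpy as np

def kmeans_objective(X, labels, centers):
    # 0.5 * sum over all points of the squared Euclidean distance
    # to the center of the cluster each point is assigned to
    return 0.5 * np.sum(np.linalg.norm(X - centers[labels], axis=1) ** 2)
```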
K-Means
• Pros:
• Easy to implement.
• Cons:
• Need to choose K.
• Can get stuck at a poor local minimum (sensitive to initialization).
• Needs a good distance metric.
K-Means
• Good Initialization
(Figure: with well-placed initial centroids, K-Means converges to the desired clustering.)
K-Means
• Bad Initialization
(Figure: with poorly placed initial centroids, K-Means converges to a poor local minimum.)
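A common remedy for bad initialization (not covered on the slides) is to run K-Means several times from different random starts and keep the run with the lowest objective; a minimal sketch reusing the `kmeans` and `kmeans_objective` helpers defined above:

```python
import numpy as np

def kmeans_restarts(X, K, n_restarts=10):
    # Run K-Means from several random initializations and keep the
    # solution with the lowest objective value
    best_obj, best_labels, best_centers = np.inf, None, None
    for seed in range(n_restarts):
        labels, centers = kmeans(X, K, seed=seed)
        obj = kmeans_objective(X, labels, centers)
        if obj < best_obj:
            best_obj, best_labels, best_centers = obj, labels, centers
    return best_labels, best_centers
```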
HAC
• Hierarchical Agglomerative Clustering (HAC) starts with each data point as its own cluster. At each step, merge the closest pair of clusters, until only one cluster (or K clusters, for a given K) remains.
• How to merge?
HAC
• Iteration: merge the two clusters with the minimum distance.
• Stopping criterion: all objects are merged into a single cluster, or the dendrogram is cut at the desired number of clusters (e.g., at 2 clusters).
(Figure: a dendrogram cut at the level that yields Cluster 1 and Cluster 2.)
HAC
• Advantages of HAC
• Any clustering result with the desired number of clusters K can be obtained by "cutting" the dendrogram at the corresponding level.
HAC
• How do we define the distance between two clusters?
HAC
• MIN (single linkage): the minimum distance over all pairs of data samples, one from each cluster.
• MAX (complete linkage): the maximum distance over all pairs of data samples, one from each cluster.
• Group Average (average linkage): the average distance over all pairs of data samples, one from each cluster.
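These three cluster-to-cluster distances are easy to express directly; a small sketch with our own helper functions, where each cluster is an array of its points:

```python
import numpy as np

def pairwise(X, Y):
    # All Euclidean distances between rows of X and rows of Y
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

def d_min(X, Y):
    return pairwise(X, Y).min()   # MIN / single linkage

def d_max(X, Y):
    return pairwise(X, Y).max()   # MAX / complete linkage

def d_avg(X, Y):
    return pairwise(X, Y).mean()  # Group Average / average linkage
```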
HAC
How to define the distance between two clusters?
• Centroid Distance: the distance between the centroids (means) of the two clusters. We will use Centroid Distance for HAC in this course.
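A minimal sketch of agglomerative clustering with centroid distance (our own illustrative implementation, not code from the course):

```python
import numpy as np

def hac_centroid(X, K):
    """Merge clusters by smallest centroid distance until K remain.
    X: (N, d) array. Returns a list of K lists of point indices."""
    clusters = [[i] for i in range(len(X))]  # start: one cluster per point
    while len(clusters) > K:
        # Centroid of each current cluster
        cents = [X[c].mean(axis=0) for c in clusters]
        # Find the pair of clusters with the minimum centroid distance
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = np.linalg.norm(cents[a] - cents[b])
                if dist < best:
                    best, pair = dist, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into a
        del clusters[b]
    return clusters
```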
HAC
Worked example with five points A, B, C, D, E and the initial pairwise distance matrix:

      A     B     C     D     E
A     0    2.5    1     4     5
B    2.5    0    2.5    3    3.5
C     1    2.5    0    5.5    4
D     4     3    5.5    0    1.5
E     5    3.5    4    1.5    0

• Step 1: the minimum entry is d(A, C) = 1, so merge A and C into A&C.
• Step 2: using centroid distances, d(A&C, B) = 2.8, d(A&C, D) = 4.3, d(B, D) = 3, d(B, E) = 3.5, d(D, E) = 1.5; the minimum is d(D, E) = 1.5, so merge D and E into D&E.
• Step 3: d(A&C, B) = 2.8 and d(B, D&E) = 3.3; the minimum is 2.8, so merge A&C and B into A&B&C.
• Step 4: d(A&B&C, D&E) = 3.9, so the final merge yields the single cluster A&B&C&D&E.
What we learn
• What is clustering?
• K-Means
• HAC
• Caveats: need to choose K; can get stuck at a poor local minimum; need a good distance metric.
Thank you! Questions?