
Artificial Intelligence & Data Mining

EE4483 / IM 4483

WEN Bihan (Asst Prof)


Homepage: www.bihanwen.com

Plan for Part 3

• Unsupervised Learning - Clustering & Regression

• Regularization and Optimization for Deep Models

• Bayesian Reasoning & Dimensionality Reduction

Clustering

WEN Bihan (Asst Prof)


Homepage: www.bihanwen.com

Outline

• Concept of Clustering

• Distance Metrics

• K-Means

• Hierarchical Agglomerative Clustering (HAC)

Carry-on Questions

• What is a cluster? What is clustering?

• What is the difference between clustering and classification?

• What are the limitations of K-Means algorithm?

• What are the limitations of HAC algorithm?

Recap: Classification

• The computer recognizes whether there is a cat in the image.

• We need a training process to teach the computer:

• Images labeled as Cat.
From classification to clustering

• What if we do NOT have the labels? Which pixels form the flower?

• Rely on the data's intrinsic structure: similarity within the same group.

• How can we do it?
From classification to clustering

• Clustering is (typically) unsupervised learning:

• Unsupervised Learning: self-organized learning that helps find unknown patterns in a data set without pre-existing labels.

• Clustering: Given a collection of data samples, the goal is to group / organize the data such that the data in the same group are more similar to each other than to those in other groups.

• Cluster: a set of data which are similar to each other.
From classification to clustering

• Unsupervised Learning: self-organized learning that helps find unknown patterns in a data set without pre-existing labels.

No labels or class information is provided.
From classification to clustering

• Clustering: Given a collection of data samples, the goal is to group / organize the data such that the data in the same group are more similar to each other than to those in other groups.

[Figure: flower image with pixels grouped into Flower Pixels and Background Pixels]
From classification to clustering

• Cluster: a set of data which are similar to each other.

[Figure: 2-D data points grouped into Cluster 1 and Cluster 2]
From classification to clustering

• Classification is supervised:

• Class labels are provided in training.

• Learn a classifier to predict the class labels of unseen data.

• Clustering is unsupervised:

• No pre-existing label is given.

• Understand the structure / organization of your underlying data.

Clustering

• Clustering:

• Unsupervised Method.

• Basic Idea: group together the similar data points.

• Input: a group of data points, without any training label.

• Output: the “membership” of each data point.

• How do we define “similarity” here?

Distance Measures / Metrics

• Given a set of N data samples / points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_i, \dots, \mathbf{x}_N$ that we would like to cluster.

• Each data sample is assumed to be a d-dimensional vector that we write as a column vector:

$\mathbf{x} = [x_1, \dots, x_d]^{\mathsf{T}}$

• We define the distance between any two data samples $\mathbf{x}_i$ and $\mathbf{x}_j$ as

$d(\mathbf{x}_i, \mathbf{x}_j) \in \mathbb{R}$ (a real scalar)
Distance Measures / Metrics

• We define the distance between any two data samples $\mathbf{x}_i$ and $\mathbf{x}_j$ as $d(\mathbf{x}_i, \mathbf{x}_j) \in \mathbb{R}$ (a real scalar).

• A distance metric is a function $(\mathbb{R}^d \times \mathbb{R}^d) \to \mathbb{R}$ that satisfies:

1. Non-negativity: $d(\mathbf{x}_i, \mathbf{x}_j) \ge 0$, and $d(\mathbf{x}_i, \mathbf{x}_j) = 0$ if and only if $\mathbf{x}_i = \mathbf{x}_j$.

2. Triangle inequality: $d(\mathbf{x}_i, \mathbf{x}_j) + d(\mathbf{x}_j, \mathbf{x}_l) \ge d(\mathbf{x}_i, \mathbf{x}_l)$.

3. Symmetry: $d(\mathbf{x}_i, \mathbf{x}_j) = d(\mathbf{x}_j, \mathbf{x}_i)$.
Distance Measures / Metrics

• Examples of distances:

1. Euclidean Distance (used in this course):

$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2} = \|\mathbf{x} - \mathbf{y}\|_2$

We denote $\|\mathbf{x} - \mathbf{y}\|_2$ as the $\ell_2$-norm of $(\mathbf{x} - \mathbf{y})$.

2. Manhattan Distance:

$d(\mathbf{x}, \mathbf{y}) = \sum_{j=1}^{d} |x_j - y_j| = \|\mathbf{x} - \mathbf{y}\|_1$

3. Infinity (Sup) Distance:

$d(\mathbf{x}, \mathbf{y}) = \max_{1 \le j \le d} |x_j - y_j|$

A small numerical check of all three follows below.
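To make the three metrics concrete, here is a small NumPy check (an illustrative addition, not part of the original slides) computing all three distances for one pair of 2-D points:

```python
import numpy as np

x = np.array([2.0, 10.0])
y = np.array([5.0, 8.0])

d_euclidean = np.linalg.norm(x - y)              # l2-norm: sqrt(3^2 + 2^2) ~ 3.606
d_manhattan = np.linalg.norm(x - y, ord=1)       # l1-norm: |-3| + |2| = 5
d_infinity  = np.linalg.norm(x - y, ord=np.inf)  # sup-norm: max(3, 2) = 3
```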
Clustering Algorithms

• Partition Algorithms

• K-Means

• Mixture of Gaussian

• Spectral Clustering

• Hierarchical Algorithms

• Agglomerative

• Divisive

Clustering Algorithms

• Partition Algorithm

[Figure: data partitioned into Cluster 1 and Cluster 2]

➢ Cluster the data samples into non-overlapping subsets (clusters).

➢ Each data sample is in exactly one cluster.
Clustering Algorithms

• Hierarchical Algorithm

➢ A set of nested clusters organized as a hierarchical tree.

K-Means

• Given a set of d-dimensional data points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_i, \dots, \mathbf{x}_N$ and a distance metric $d(\mathbf{x}, \mathbf{y})$.

• Clustering Goal:

1. Split the data points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N$ into K clusters.

2. Each cluster has a d-dimensional centroid / center $\mu_k$.

3. The sum of distances between each $\mathbf{x}_i$ and its centroid $\mu_k$ is minimized.
K-Means

• Distance metric: typically we use the Euclidean distance.

• Cluster center / centroid: $\mu_k$ = the average of the data points belonging to this cluster.

• Split of the data points: each data point belongs to exactly one cluster.
K-Means

• Initialize: pick K random points as the cluster centers $\mu_k$.

• Iterate between the following Step 1 and Step 2:

1. Assign every data point $\mathbf{x}_i$ to its closest cluster center according to the given distance metric, i.e., find the $\mu_k$ such that $d(\mathbf{x}_i, \mu_k)$ is minimized.

2. Update each cluster center $\mu_k$ to be the average of its assigned data points.

• Stopping criterion: when no point's assignment changes. A minimal implementation sketch is given below.
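The following minimal NumPy sketch implements exactly this loop (illustrative; the function name and signature are my own, and ties are broken by argmin's first-match rule):

```python
import numpy as np

def kmeans(X, init_centers, max_iter=100):
    """Minimal K-Means with Euclidean distance.
    X: (N, d) data matrix; init_centers: (K, d) initial centroids."""
    centers = init_centers.astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 1: assign every point to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, K)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # no assignment changed: stop
            break
        labels = new_labels
        # Step 2: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            if np.any(labels == k):              # guard against empty clusters
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers
```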
K-Means Example 1

• Suppose our task is to cluster the following eight points in 2D space


into K = 3 clusters: A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5),
B3(6,4), C1(1,2), C2(4,9).

• The distance function is Euclidean distance.

• Suppose initially we assign A1, B1 and C1 as the center of each


cluster, respectively.

• Apply K-means to estimate the final three clusters.

K-Means Example 1

• A1=(2, 10); A2=(2,5); A3=(8,4); B1=(5,8); B2=(7,5); B3=(6,4); C1=(1,2); C2=(4,9)

K-Means Example 1

• Given the three initial cluster centers A1, B1, and C1.
• Step 1: Determine which data point belongs to which cluster by
calculating their distances to the centers, i.e., highlighted columns in the
following matrix:

35
K-Means Example 1

• Cluster 1={A1}, Cluster 2={B1, B2, B3, A3, C2}, Cluster 3={A2, C1}

K-Means Example 1

• Cluster 1 = {A1}, Cluster 2 = {B1, B2, B3, A3, C2}, Cluster 3 = {A2, C1}

• Step 2: The cluster centers after the first round of iteration are obtained by computing the mean of all the data points belonging to each cluster (written $\mu_1, \mu_2, \mu_3$ here to avoid confusion with the data points C1 and C2):

$\mu_1 = (2, 10)$; $\mu_2 = (6, 6)$; $\mu_3 = (1.5, 3.5)$
K-Means Example 1

• Step 2: The cluster centers after the first round of iteration are obtained by computing the mean of all the data points belonging to each cluster:

$\mu_1 = (2, 10)$; $\mu_2 = (6, 6)$; $\mu_3 = (1.5, 3.5)$

• Repeat Step 1: Determine which data point belongs to which cluster by calculating its distance to the new centers $\mu_1$, $\mu_2$, and $\mu_3$.

• ……

• Stopping criterion: when no point's assignment changes. A short script reproducing this first iteration is given below.
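For concreteness, this short NumPy script (an illustrative addition; variable names are my own) reproduces the first iteration of Example 1 and recovers the centers (2, 10), (6, 6), (1.5, 3.5):

```python
import numpy as np

points = {'A1': (2, 10), 'A2': (2, 5), 'A3': (8, 4), 'B1': (5, 8),
          'B2': (7, 5), 'B3': (6, 4), 'C1': (1, 2), 'C2': (4, 9)}
X = np.array(list(points.values()), dtype=float)
centers = np.array([points['A1'], points['B1'], points['C1']], dtype=float)

# Step 1: assign each point to its nearest initial center.
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
labels = dists.argmin(axis=1)
print({name: lab + 1 for name, lab in zip(points, labels)})
# {'A1': 1, 'A2': 3, 'A3': 2, 'B1': 2, 'B2': 2, 'B3': 2, 'C1': 3, 'C2': 2}

# Step 2: recompute the centers as the cluster means.
new_centers = np.array([X[labels == k].mean(axis=0) for k in range(3)])
print(new_centers)  # [[2. 10.] [6. 6.] [1.5 3.5]]
```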
K-Means Example 2
[Figure: data samples and Centroid 1 during K-Means iterations]
K-Means Example 3

K-Means Example 3

• Cluster each pixel's gray-scale intensity: a 1-dimensional feature.

[Figure: segmentation results for K = 2 and K = 3]
K-Means Example 3

• Cluster each pixel's RGB color: a 3-dimensional feature (a sketch of this idea follows below).

[Figure: segmentation result for K = 3]
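A hedged sketch of this pixel-clustering idea using scikit-learn's KMeans; the image array here is a random stand-in for the slide's photo:

```python
import numpy as np
from sklearn.cluster import KMeans

image = np.random.rand(64, 64, 3)      # stand-in for a real H x W RGB image

pixels = image.reshape(-1, 3)          # every pixel becomes one 3-D sample
km = KMeans(n_clusters=3, n_init=10).fit(pixels)

# Replace each pixel by its cluster centroid's color -> segmented image.
segmented = km.cluster_centers_[km.labels_].reshape(image.shape)
```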
K-Means

• What cost function does K-Means optimize?

$$\min_{\{\mu_k\}} \; \min_{\{r_{ik}\}} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \, \|\mathbf{x}_i - \mu_k\|_2^2$$

• Calculation of the centers (the average of the points assigned to cluster k): $\mu_k = \dfrac{\sum_{i=1}^{N} r_{ik}\,\mathbf{x}_i}{\sum_{i=1}^{N} r_{ik}} \quad \forall k$

• Constraint on the assignments: $r_{ik} \in \{0, 1\} \quad \forall i, k$

• Normalization: $\sum_{k=1}^{K} r_{ik} = 1 \quad \forall i$

A small evaluation sketch of this objective follows below.
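As a quick sketch (illustrative; the helper name is my own), the objective for a given hard assignment can be evaluated as:

```python
import numpy as np

def kmeans_cost(X, labels, centers):
    # J = 1/2 * sum_i || x_i - mu_{k(i)} ||_2^2  with hard assignments r_ik
    return 0.5 * np.sum((X - centers[labels]) ** 2)
```

Each K-Means iteration never increases this value, which is why the algorithm terminates, possibly at a poor local minimum.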
K-Means

• Pros:

• Simple but effective

• Easy to implement

• Cons:

• Need to choose K.

• Can get stuck at a poor local minimum.

• Need an appropriate distance metric.
K-Means

• Different initializations -> different local minima.
K-Means

• Poor local minimum

K-Means

• Good initialization

[Figure: initial centroid positions]
K-Means

• Bad initialization

[Figure: initial centroid positions]
K-Means

• Need a better metric.

[Figure: two-ring data, not linearly separable in the original space, vs. a good metric space for clustering]
HAC

• Hierarchical Agglomerative Clustering (HAC):

• Start with the points as individual clusters.

• At each step, merge the closest pair of clusters, until only one cluster is left (or until K clusters are left, where K is a given number).

• How to merge?

• Merge the pair of clusters with the minimum distance.
HAC

• Hierarchical Agglomerative Clustering (HAC):

Initialization: each object is its own cluster.

Iteration: merge the two clusters with the minimum distance.

Stopping criterion: all objects are merged into a single cluster, or only K clusters are left.

[Figure: merge sequence from 5 clusters down to 1 cluster]
HAC

• HAC can be visualized as a dendrogram:

• A tree-like diagram that records the sequence of merges. A SciPy sketch of building one follows below.

[Figure: dendrogram cut at the 2-cluster level, yielding Cluster 1 and Cluster 2]
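In practice the dendrogram can be produced directly with SciPy; a minimal sketch on toy data (the data and labels here are illustrative, while linkage, dendrogram, and fcluster are the actual scipy.cluster.hierarchy functions):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))           # 10 toy points in 2-D

Z = linkage(X, method='centroid')      # HAC with centroid linkage
dendrogram(Z)                          # tree recording the merge sequence
plt.ylabel('inter-cluster distance at each merge')
plt.show()

labels = fcluster(Z, t=2, criterion='maxclust')   # "cut" the tree at K = 2
```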
HAC

• Advantages of HAC:

• No need to assume / pre-define the number of clusters.

• A clustering result with any desired number of clusters K can be obtained by "cutting" the dendrogram at the corresponding level.

• The result is independent of the initialization.
HAC

1. How to define the distance between 2 clusters?

[Figure: two clusters with an unknown inter-cluster distance]
HAC

1. How to define the distance between 2 clusters?

• MIN: the minimum distance over all pairs of data samples, one from each cluster.
HAC

1. How to define the distance between 2 clusters?

• MAX: the maximum distance over all pairs of data samples, one from each cluster.
HAC

1. How to define the distance between 2 clusters?

• Group Average: the average distance over all pairs of data samples, one from each cluster.
HAC

1. How to define the distance between 2 clusters?

• Centroid Distance: the distance between the means of the data samples (i.e., the centroids) of each cluster.

We will use Centroid Distance for HAC in this course. A naive implementation sketch follows below.
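A naive NumPy sketch of centroid-linkage HAC (illustrative; the function name is my own, and this O(N^3) version favors clarity over speed):

```python
import numpy as np

def hac_centroid(X, K=1):
    """Naive HAC with centroid distance: repeatedly merge the two
    clusters whose centroids are closest, until K clusters remain."""
    clusters = [[i] for i in range(len(X))]        # start: every point is its own cluster
    while len(clusters) > K:
        centroids = [X[c].mean(axis=0) for c in clusters]
        best = None                                # (distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(centroids[a] - centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]    # merge cluster b into cluster a
        del clusters[b]
    return clusters

# Example: cluster 8 random 2-D points into K = 3 groups of indices.
print(hac_centroid(np.random.rand(8, 2), K=3))
```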
HAC

2. How to determine the pair of clusters with minimum distance?

• Visualize the samples in the space.

• Merge the two clusters (points) that are closest to each other.

• Merge the next closest clusters.

• Then the next closest…

• Until only one cluster is left, or the given K clusters are left.
HAC

2. How to determine the pair of clusters with minimum distance?

• Equivalently, the resulting dendrogram shows the HAC process.

• The y-axis of the dendrogram shows the distance between the clusters merged at each step.
HAC

2. How to determine the pair of clusters with minimum distance?

• We can also calculate the distance table.

Number of clusters = 5

       A     B     C     D     E
A      0    2.5    1     4     5
B     2.5    0    2.5    3    3.5
C      1    2.5    0    5.5    4
D      4     3    5.5    0    1.5
E      5    3.5    4    1.5    0

Select the smallest off-diagonal value (here 1, between A and C) and merge the corresponding pair. A sketch automating this selection follows below.
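The "select the smallest off-diagonal value" step is easy to automate; a small NumPy sketch using the table above (illustrative):

```python
import numpy as np

names = ['A', 'B', 'C', 'D', 'E']
D = np.array([[0,   2.5, 1,   4,   5  ],
              [2.5, 0,   2.5, 3,   3.5],
              [1,   2.5, 0,   5.5, 4  ],
              [4,   3,   5.5, 0,   1.5],
              [5,   3.5, 4,   1.5, 0  ]])

# Mask the diagonal with +inf, then locate the smallest remaining entry.
masked = D + np.diag(np.full(len(D), np.inf))
i, j = np.unravel_index(np.argmin(masked), D.shape)
print(names[i], names[j])   # -> A C : merge A and C first
```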
HAC

2. How to determine the pair of clusters with minimum distance?

• A and C are merged into the new cluster A&C; their distances to the remaining clusters need to be updated.
HAC

2. How to determine the pair of clusters with minimum distance?

• The updated distance table:

Number of clusters = 4

        A&C     B      D      E
A&C     0      2.8    4.3    4.5
B      2.8      0      3     3.5
D      4.3      3      0     1.5
E      4.5     3.5    1.5     0

Repeat the process: the next smallest off-diagonal value is 1.5 (between D and E), so merge D and E.
HAC

2. How to determine the pair of clusters with minimum distance?

Number of clusters = 3

        A&C     B     D&E
A&C     0      2.8    4.5
B      2.8      0     3.3
D&E    4.5     3.3     0

Merge the corresponding clusters and update the table. The next smallest off-diagonal value is 2.8 (between A&C and B).
HAC

2. How to determine the pair of clusters with minimum distance?

Number of clusters = 2

          A&B&C    D&E
A&B&C       0      3.9
D&E        3.9      0

Merge the corresponding clusters and update the table.
HAC

2. How to determine the pair of clusters with minimum distance?

Number of clusters = 1

A&B&C&D&E

We finally reach a single cluster. Stop.
K-Means vs. HAC

• K-Means

✓ Simple and cheap algorithm

✘ Results are sensitive to the initialization

✘ The number of clusters needs to be pre-defined

• HAC

✓ Deterministic algorithm, i.e., no randomness

✓ Shows a range of clustering results for different choices of K

✘ More memory- and computationally-intensive than K-Means
What we learn

• What is clustering?

• What is a distance metric?

o Euclidean distance, centroid distance, etc.

• K-Means

o Goal, algorithm, optimization, examples

• HAC

o Algorithm, dendrogram, cluster selection, examples


Carry-on Questions
• What is a cluster? What is clustering?

• Cluster: a set of data which are similar to each other.

• Clustering: group / organize the data such that the data in the same group are more similar to each other than to those in other groups.

• What is the difference between clustering and classification?

• Clustering is unsupervised, while classification is supervised.

• What are the limitations of the K-Means algorithm?

• Need to choose K. Can get stuck at a poor local minimum. Needs a good metric.

• What are the limitations of the HAC algorithm?

• Memory- and computationally-intensive.
Thank you! Now, questions?
