2 Clustering
Frauke Liers
Friedrich-Alexander-Universität Erlangen-Nürnberg
Rough Differentiation in Learning Methods
supervised learning:
• predict values of an outcome measure based on a number of input measures
(e.g., given patient data together with the label 'has illness' / 'does not have illness':
when new patient data comes in, predict whether the patient is ill or not)
unsupervised learning:
• no outcome measure is given; the goal is to find structure in the data
...and also something in between: semi-supervised learning.
Given
• N: number of data points
• M: number of variables (e.g., "mass", "price", "color", ...)
• Data X = {x_1, . . . , x_N}, where x_n ∈ R^M for all n = 1, . . . , N
• K: assumed number of clusters
Want
• Assignment: x_n ↦ k_n ∈ {1, . . . , K} for all n = 1, . . . , N
• Assignment rule: x ↦ k(x) ∈ {1, . . . , K} for all x ∈ R^M
• Reconstruction rule ('representative'): k ↦ m_k ∈ R^M
On an abstract level:
• Determining the best possible clustering (w.r.t. some objective) is a classical
combinatorial optimization problem.
• K-means clustering: determine K points, the centers, that minimize the sum over all
data points of the squared Euclidean distance to their closest center (see the sketch below).
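For concreteness, a minimal numpy sketch of this objective; the names `X` for the (N, M) data array and `centers` for the (K, M) array of candidate centers are assumptions, not notation from the slides:

```python
import numpy as np

def kmeans_energy(X, centers):
    """Sum of squared Euclidean distances of each data point to its closest center."""
    # dists[n, k] = ||x_n - m_k||^2 for every point/center pair
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()
```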
Observations
• The clustering energy has local minima.
(Figure: K-means converging to a local minimum;
https://upload.wikimedia.org/wikipedia/commons/7/7c/K-means_convergence_to_a_local_minimum.png, modified)
Iterate between: determine the clustering for fixed means, determine the means for a
fixed clustering.
Let us fix the clustering C in
E(C, m) := \frac{1}{2} \sum_{k=1}^{K} \sum_{x \in C_k} \| x - m_k \|^2,
and hence the minimizing means are
m_k = \frac{1}{|C_k|} \sum_{x \in C_k} x, i.e., the mean of cluster C_k (see the snippet below).
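A small numpy sketch of this update step; `X`, `labels`, and `K` are assumed names for the data array, the current cluster indices, and the number of clusters:

```python
import numpy as np

def update_means(X, labels, K):
    """Recompute each m_k as the mean of the points currently assigned to cluster k.
    Assumes every cluster is non-empty."""
    return np.array([X[labels == k].mean(axis=0) for k in range(K)])
```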
Conversely, let us fix the means m in
E(C, m) := \frac{1}{2} \sum_{k=1}^{K} \sum_{x \in C_k} \| x - m_k \|^2.
Then E is minimized by assigning each point x to the cluster C_k whose mean m_k is closest.
This yields the K-means algorithm:
repeat
    assignment step: assign each x to the cluster C_k with the closest mean m_k;
    update step: recompute each mean m_k as the mean of C_k;
until assignment step does not do anything;
• Assignment rule: x ↦ argmin_k ‖x − m_k‖.
• Reconstruction rule: k ↦ m_k
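Putting the two steps together, a minimal didactic sketch of the whole iteration in Python/numpy (not the scikit-learn implementation; initializing with K randomly chosen data points is just one common choice):

```python
import numpy as np

def lloyd_kmeans(X, K, max_iter=100, seed=0):
    """Alternate the assignment and update steps until the assignment stops changing."""
    rng = np.random.default_rng(seed)
    # initialize the means with K distinct data points
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # assignment step: each point goes to its closest current mean
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignment step does not change anything -> done
        labels = new_labels
        # update step: each mean becomes the average of its assigned points
        for k in range(K):
            if np.any(labels == k):  # keep the old mean if a cluster runs empty
                means[k] = X[labels == k].mean(axis=0)
    return labels, means
```

Usage would be something like `labels, means = lloyd_kmeans(X, K=3)`; the reconstruction rule is then `k ↦ means[k]`, and the assignment rule maps a new point to its nearest entry of `means`.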
Disadvantages:
• K-means sometimes does not work well, in particular for non-spherical /
nonconvex data or for unevenly sized clusters, i.e. it has some implicit
assumptions:
(Figure: failure cases of K-means, from varianceexplained.org)
next: some improvements.
Expectation-Maximization Clustering Algorithm (EM)
Let x = (x_1, . . . , x_M)^T be a random vector with finite variance and mean. Let the
covariance matrix Σ = (Σ_{x_i, x_j}) ∈ R^{M×M} be defined as
Σ_{x_i, x_j} = E((x_i − E(x_i))(x_j − E(x_j))), where E denotes the expected value, µ_X = E(X).
The covariance matrix
• represents important statistical information, in particular the correlations between the variables
• is a real, square, symmetric matrix
• is positive semidefinite
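To relate the definition above to code, a small numerical check in numpy (assumptions: rows of `X` are samples, columns the M variables; `bias=True` gives the plain 1/N normalization of the expected value):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # 500 samples of an M = 3 dimensional random vector

# covariance from the definition: Sigma_ij = E[(x_i - E x_i)(x_j - E x_j)]
centered = X - X.mean(axis=0)
Sigma_def = centered.T @ centered / len(X)

# the same matrix via numpy (rowvar=False: columns are the variables)
Sigma_np = np.cov(X, rowvar=False, bias=True)
print(np.allclose(Sigma_def, Sigma_np))  # True
```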
Update step takes data points into account via relative frequencies.
Compare this to K-means... Indeed, K-means can be seen as a special case of
EM.
drawback: K-means and EM only find local optima. (Recall that already the K-means
problem is NP-hard...)
Play around with scikit-learn (a machine learning library in Python).
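For experimenting, a minimal scikit-learn sketch that runs both K-means and a Gaussian mixture fitted by EM on the same toy data (the dataset and parameter choices here are purely illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# toy data with three roughly spherical clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-means: hard assignment of each point to the nearest of K centers
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)      # the representatives m_k
print(km.labels_[:10])          # hard cluster assignments

# EM for a Gaussian mixture: soft assignments via relative frequencies
gm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gm.means_)                # estimated component means
print(gm.predict_proba(X[:5]))  # per-point component probabilities
```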
Then
m_k = x_{i_k^*},  k = 1, 2, . . . , K
are the current estimates of the cluster centers.
2. Given a current set of cluster centers {m_1, . . . , m_K}, minimize the total
dissimilarity by assigning each observation to the closest (current) cluster center
(sketched in code below):
C(i) = argmin_{1 ≤ k ≤ K} D(x_i, m_k).
3. Iterate steps 1 and 2 until the assignments do not change.
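A short sketch of the assignment step 2 for a generic dissimilarity D (here the Manhattan distance, purely as an illustrative choice; `centers` holds the current m_k):

```python
import numpy as np

def assign_to_centers(X, centers, D=lambda x, m: np.abs(x - m).sum()):
    """C(i) = argmin_k D(x_i, m_k): each observation gets the index of its closest center."""
    return np.array([min(range(len(centers)), key=lambda k: D(x, centers[k]))
                     for x in X])
```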
For a clustering problem, the best number of clusters depends on the goal and on the
knowledge of the application.
• Sometimes, the best value of K is given as input (e.g., K salespeople are
employed, and the task is to cluster a database into K segments).
• However, if 'natural' clusters need to be determined, the best value of K is
unknown and needs to be estimated from the data as well (one option is sketched below).
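One common way to estimate K from the data (not prescribed by the slides, just one option) is to run K-means for several values of K and look for an "elbow" where the clustering energy stops dropping sharply:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # toy data

# inertia_ is the K-means energy for the fitted clustering
for K in range(1, 8):
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    print(K, round(km.inertia_, 1))
```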