
Unsupervised Learning: Clustering

Lecture “Mathematics of Learning” 2024/25

Frauke Liers
Friedrich-Alexander-Universität Erlangen-Nürnberg
Rough Differentiation in Learning Methods

supervised learning:
• predict values of an outcome measure based on a number of input measures
  (e.g., given patient data together with the label 'has illness' / 'does not have
  illness', predict for newly arriving patient data whether the patient is ill or not)
unsupervised learning:
• no outcome measure is given; the goal is to find structure in the data
...also something in between: semi-supervised learning.


Unsupervised Learning: Clustering of Data

Is there any structure in this data?

There are two categories/clusters such that objects within a cluster resemble
each other but objects from different clusters look different.


What's clustering?

Given fruit measurement data (height_i, width_i), i = 1, ..., N.
• Visually: there are multiple "categories" of data.
• How can we sort data into categories/clusters?
  → clustering problem
• More measurements help.
• Clustering the raw data by hand is cumbersome.


Clustering Algorithm

Given
• N: number of data points
• M: number of variables (e.g., "mass", "price", "color", ...)
• data X = {x_1, ..., x_N}, where x_n ∈ R^M for all n = 1, ..., N
• K: assumed number of clusters
Want
• Assignment: x_n ↦ k_n ∈ {1, ..., K} for all n = 1, ..., N
• Assignment rule: x ↦ k(x) ∈ {1, ..., K} for all x ∈ R^M
• Reconstruction rule ('representative'): k ↦ m_k ∈ R^M
On an abstract level:
• Determining the best possible clustering (w.r.t. some objective) is a classical
  combinatorial optimization problem.
• K-means clustering: determine K points (centers) that minimize the sum of the
  squared Euclidean distances of the data points to their closest centers.


Clustering Problems and Algorithms

• No general definition of clustering here; see Theoretical Computer Science for
  further details.
• However: already in simplified / restricted situations, clustering is difficult,
  i.e., NP-hard.
• In particular: we cannot expect an algorithm that determines the best clustering
  in time polynomial in the input size.
• Further reading: M. Mahajan, P. Nimbhorkar, K. Varadarajan: The planar
  k-means problem is NP-hard. Proceedings of WALCOM: Algorithms and
  Computation, pp. 274-285 (2009).
• Clustering is a very basic learning task. Depending on the application,
  different clustering algorithms work best.
• Here: focus on the (standard) algorithms K-means and expectation maximization.
• Plus: hierarchical clustering and principal component analysis for clustering
  and data reduction.
K-means clustering as optimization problem

Find a clustering C = {C_1, ..., C_K} into sets C_k ⊂ X and centers
m = {m_1, ..., m_K} with m_k associated to C_k, which minimize the clustering
energy
    E(C, m) := \frac{1}{2} \sum_{k=1}^{K} \sum_{x \in C_k} \| x - m_k \|^2 .

Observations
• The clustering energy has local minima.

(Figure: K-means converging to a local minimum;
https://upload.wikimedia.org/wikipedia/commons/7/7c/K-means_convergence_to_a_local_minimum.png,
modified)


Derivation of the K-means algorithm

Iterate between: determine the clustering for fixed means, determine the means for
a fixed clustering.

Let us fix the clustering C in
    E(C, m) := \frac{1}{2} \sum_{k=1}^{K} \sum_{x \in C_k} \| x - m_k \|^2 .

Optimal means? Necessary first-order optimality condition: the gradient with
respect to m_k is zero, i.e., m_k is a critical point.
Taking the gradient with respect to m_k we obtain the first-order optimality
condition
    0 = \nabla_{m_k} E(C, m) = \sum_{x \in C_k} (m_k - x) = |C_k| \, m_k - \sum_{x \in C_k} x
and hence
    m_k = \frac{1}{|C_k|} \sum_{x \in C_k} x   \;\hat{=}\;   mean of the cluster.

Conversely, let us fix the means m in E(C, m). Optimal clustering? Perform the
simple assignment step
    C_k = \{ x \in X : \| x - m_k \| \le \| x - m_j \| \text{ for all } j = 1, \dots, K \}
        \;\hat{=}\;   Voronoi cell of m_k .

(Figure: Voronoi cell C_k around its center m_k.)


K-means clustering algorithm

Data: X = {x_1, ..., x_N} and number of clusters K ∈ N
Result: cluster means m = (m_1, ..., m_K)

initialize m randomly;
repeat
    // assignment step: assign the n-th point to the cluster with the nearest mean
    for n ← 1 to N do
        k_n ← argmin_k ‖x_n − m_k‖
    end
    // update step:
    for k ← 1 to K do
        C_k ← {n ∈ {1, ..., N} : k_n = k}                 // cluster
        if |C_k| > 0 then
            m_k ← (1 / |C_k|) · Σ_{n ∈ C_k} x_n           // cluster mean
        end
    end
until the assignment step no longer changes anything;

• Assignment rule: x ↦ argmin_k ‖x − m_k‖.
• Reconstruction rule: k ↦ m_k.
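To make the pseudocode concrete, here is a minimal NumPy sketch (the function name
k_means and its default parameters are our own illustrative choices, not prescribed
by the algorithm):

    import numpy as np

    def k_means(X, K, max_iter=100, seed=None):
        """Minimal K-means sketch: X is an (N, M) data array, K the number of clusters."""
        rng = np.random.default_rng(seed)
        N = X.shape[0]
        # initialize the means with K randomly chosen data points
        m = X[rng.choice(N, size=K, replace=False)].copy()
        assignment = None
        for _ in range(max_iter):
            # assignment step: index of the nearest mean for every point
            dists = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)   # shape (N, K)
            new_assignment = dists.argmin(axis=1)
            if assignment is not None and np.array_equal(new_assignment, assignment):
                break   # assignment step no longer changes anything
            assignment = new_assignment
            # update step: mean of every non-empty cluster
            for k in range(K):
                members = X[assignment == k]
                if len(members) > 0:
                    m[k] = members.mean(axis=0)
        return m, assignment

For the fruit data one would call, e.g., m, labels = k_means(X, K=2) to obtain the
two cluster means together with the assignment k_n of every point.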


Back to our fruits data set

(Sequence of figure slides: K-means iterations on the fruit data set.)


Different metrics lead to different (convex) clusters

Using squared Euclidean distance (L2-norm):
    E(C, m) = \frac{1}{2} \sum_{k=1}^{K} \sum_{x \in C_k} \| x - m_k \|^2

Using Manhattan distance (L1-norm):
    E(C, m) = \sum_{k=1}^{K} \sum_{x \in C_k} \sum_{i=1}^{M} | x^i - m_k^i |


Advantages and Disadvantages of K-Means

A very well-known clustering algorithm that is used often.

Advantages:
• easy to implement
• can run with only a set of real data vectors and a value of K
• a feasible clustering is always available
Disadvantages:
• Choosing a good value of K can be difficult. Potential way out: test several
  values.
• Assumes numerical data, not categorical data such as 'car', 'truck', etc. We
  will see a clustering approach for categorical data later.
• K-means aims at minimizing Euclidean distances. This is not always the
  right objective.
• The result strongly depends on the initialization, but some improvements are
  known.
• Assumes that clusters are convex.
Advantages and Disadvantages of K-Means

Disadvantages (continued):
• K-means sometimes does not work well, in particular for non-spherical /
  nonconvex data or for unevenly sized clusters, i.e., it has some implicit
  assumptions.

(Figure from varianceexplained.org.)

Next: some improvements.
Expectation-Maximization Clustering Algorithm (EM)

• Further reading: The Elements of Statistical Learning, Chapter 14.
• Recall: K-means has implicit assumptions (clusters are convex and roughly
  equally sized) that may not be satisfied.
• Alternative way of thinking: decide for each data point the probability with
  which it belongs to a certain cluster, i.e., a 'soft' clustering. This allows
  clusters of different size and can detect correlations.
• Problem: this probability distribution is unknown.
• Task: estimate the probability distribution and improve the estimate iteratively.


Mixture of Gaussian Distributions

• Make a quite general assumption: this unknown distribution is a mixture, i.e., a
  superposition, of K (multi-dimensional) Gaussian distributions. This means that
  x ∼ P(·|p, µ, Σ) with a probability density of the form
      P(x | p, µ, Σ) = \sum_{k=1}^{K} p_k \, \phi(x | µ_k, Σ_k),
• where p, µ, Σ are unknown parameters with
      p = (p_1, ..., p_K), p_k ∈ R             probability vector
      µ = (µ_1, ..., µ_K), µ_k ∈ R^M           vector of means
      Σ = (Σ_1, ..., Σ_K), Σ_k ∈ R^{M×M}       covariance matrix
  where M is the dimension of a data point x, i.e., x ∈ R^M.


Recall Covariance Matrix

Let x = (x_1, ..., x_M)^T be a random vector with finite variance and mean. Let the
covariance matrix Σ = (Σ_{x_i, x_j}) ∈ R^{M×M} be defined as
    Σ_{x_i, x_j} = E\big( (x_i − E(x_i)) (x_j − E(x_j)) \big),
where E denotes the expected value, µ_X = E(X).
• represents important statistical information, in particular correlations between
  data
• is a real, square, symmetric matrix
• is a positive-semidefinite matrix
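For concreteness, a sample covariance matrix can be estimated with NumPy (a small
sketch; the random data is only for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))          # 200 samples of an M = 3 dimensional random vector
    Sigma = np.cov(X, rowvar=False)        # (3, 3) sample covariance matrix

    # real, square, symmetric and positive semidefinite (eigenvalues >= 0 up to rounding)
    print(np.allclose(Sigma, Sigma.T), np.linalg.eigvalsh(Sigma).min() >= -1e-12)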


More Details on Mixture of Gaussians

    P(x | p, µ, Σ) = \sum_{k=1}^{K} p_k \, \phi(x | µ_k, Σ_k)

Clustering task: given the data points, estimate p, µ, Σ.
Output: for each data point n and each cluster k, an estimated probability that n
was generated from k.
Assign each data point the mean with the highest probability.
Advantage: clusters are also possible that are (partially) contained in each other.

(Figure: K = 3 Gaussian mixture in 1d and generated data.)
Responsibility Calculation

Define the responsibility of cluster k for x by observed relative frequencies, i.e.,
the probability that x is generated by Gaussian k:
    \gamma(x; k) = \frac{p_k \, \phi(x | µ_k, Σ_k)}{\sum_{j=1}^{K} p_j \, \phi(x | µ_j, Σ_j)}

Roughly speaking: "what percentage of data point x is attributed to cluster k?"

The density function of the normal distribution with parameters µ_j, Σ_j is denoted
by φ(·|µ_j, Σ_j).

Algorithmic idea:
Iterate: for a fixed Gaussian mixture, compute the responsibilities; for fixed
responsibilities, update the mixture.
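As an illustration, the responsibilities can be computed directly from this formula
(a sketch; the function name responsibilities is our own choice, and SciPy's
multivariate normal density plays the role of φ):

    import numpy as np
    from scipy.stats import multivariate_normal

    def responsibilities(X, p, mu, Sigma):
        """gamma[n, k] = responsibility of Gaussian k for data point x_n (sketch)."""
        N, K = X.shape[0], len(p)
        weighted = np.empty((N, K))
        for k in range(K):
            # numerator p_k * phi(x | mu_k, Sigma_k), evaluated for every data point
            weighted[:, k] = p[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
        # divide by the denominator sum_j p_j * phi(x | mu_j, Sigma_j)
        return weighted / weighted.sum(axis=1, keepdims=True)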


Outline of Expectation Maximization Clustering

(Figure: outline of the EM clustering iteration.)

The update step takes the data points into account via relative frequencies.
Compare this to K-means... Indeed, K-means can be seen as a special case of EM.
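Since the outline figure is not reproduced here in text form, the following sketch
shows one common way to write the EM iteration for a Gaussian mixture (our own
formulation of the standard E- and M-steps, not the lecture's exact pseudocode):

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step(X, p, mu, Sigma):
        """One EM iteration for a Gaussian mixture (sketch)."""
        N, M = X.shape
        K = len(p)
        # E-step: responsibilities gamma[n, k]
        gamma = np.empty((N, K))
        for k in range(K):
            gamma[:, k] = p[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: update the mixture via relative frequencies (compare to the K-means update)
        Nk = gamma.sum(axis=0)                        # effective cluster sizes
        p_new = Nk / N
        mu_new = (gamma.T @ X) / Nk[:, None]
        Sigma_new = np.empty((K, M, M))
        for k in range(K):
            diff = X - mu_new[k]
            Sigma_new[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        return p_new, mu_new, Sigma_new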


A visual comparison

Drawback: K-means and EM only find local optima (recall that already the K-means
problem is NP-hard...).
Play around with scikit-learn (a machine learning library in Python).
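A quick experiment along these lines might look as follows (a sketch using
scikit-learn's KMeans and GaussianMixture on made-up elongated blobs):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # toy data: two elongated, correlated Gaussian blobs
    X = np.vstack([
        rng.multivariate_normal([0, 0], [[3.0, 2.5], [2.5, 3.0]], size=200),
        rng.multivariate_normal([6, 0], [[3.0, -2.5], [-2.5, 3.0]], size=200),
    ])

    km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    gm = GaussianMixture(n_components=2, random_state=0).fit(X)
    gm_labels = gm.predict(X)            # 'hard' labels
    gm_probs = gm.predict_proba(X)       # soft responsibilities per point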


Hierarchical Clustering Methods

• Sometimes there is a hierarchical structure in the data that we want to
  disclose in a clustering.
• In hierarchical methods, we do not need to specify the number of clusters K
  as input.
• Instead: a measure of dissimilarity, e.g., a 'distance' between (disjoint) groups
  of observations, typically based on pairwise dissimilarities.
• Clusters at each level of the hierarchy are created by merging clusters at the
  next lower level.
• The clusters at the lowest level contain one point each; the highest level
  contains all points.
• Both agglomerative (bottom-up) and divisive (top-down) methods exist.


Hierarchical Clustering Methods

• Idea of the agglomerative approach: start at the bottom. At each level,
  recursively merge a selected pair of clusters into a single cluster, so the next
  higher level contains one cluster less. Merge the two clusters with the smallest
  dissimilarity. ⇒ This leads to N − 1 levels in the hierarchy.
• Clusterings are often drawn as a rooted binary tree: it has one root node, and
  each node has at most two children.


Visualization as a Dendrogram

(Figure: dendrogram example from the statistical learning book.)
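Such a dendrogram can be produced, for instance, with SciPy's hierarchical
clustering routines (a sketch; the random points are only for illustration):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))        # 30 observations in the plane

    Z = linkage(X, method='average')    # agglomerative merges, smallest dissimilarity first
    dendrogram(Z)                       # rooted binary tree of the N - 1 merges
    plt.show()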


Dendrogram

• A word of caution: small changes in the data can lead to quite different
  dendrograms.
• For a clustering application, we need to decide whether a hierarchical structure
  is actually intrinsic to the data or not.


Clustering in More General Contexts

How can we compute (pairwise) distances in a clustering algorithm, for example in
ordinal or in categorical contexts? E.g., suppose we have measurements or averages
over personal judgements by participants who are asked to judge differences between
objects.
Often, one uses dissimilarities based on attributes for the distance calculation in
a clustering algorithm.
Dissimilarities are calculated as follows:
• Suppose we have, for objects i = 1, ..., N, measurements x_{ij} for variables
  (attributes) j = 1, ..., p.
• We define the dissimilarity between objects i and i' as
      D(x_i, x_{i'}) = \sum_{j=1}^{p} d_j(x_{ij}, x_{i'j}).
• In this definition, different attributes could also be weighted differently.
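A small sketch of this attribute-wise dissimilarity (the per-attribute functions d_j
and the weights are our own illustrative placeholders):

    import numpy as np

    def dissimilarity(x_i, x_ip, d_funcs, weights=None):
        """D(x_i, x_i') = sum_j w_j * d_j(x_ij, x_i'j) over the p attributes (sketch)."""
        p = len(d_funcs)
        weights = np.ones(p) if weights is None else np.asarray(weights)
        return sum(w * d(a, b) for w, d, a, b in zip(weights, d_funcs, x_i, x_ip))

    # example: two numeric attributes with squared difference, one categorical with 0/1 loss
    d_funcs = [lambda a, b: (a - b) ** 2,
               lambda a, b: (a - b) ** 2,
               lambda a, b: 0.0 if a == b else 1.0]
    print(dissimilarity((1.0, 2.0, 'car'), (1.5, 2.0, 'truck'), d_funcs))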


Alternatives for Calculating Distances

Depending on the variable types, distances need to be calculated differently.
• Ordinal variables: e.g., contiguous integers, ordered sets such as academic
  grades (A, B, C, D, F), or degree of preference (can't stand, dislike, OK, like,
  terrific). Dissimilarities for ordinal variables are typically defined by replacing
  their M original values with
      \frac{i - 1/2}{M},   i = 1, ..., M,
  in the prescribed order of their original values.
• (Unordered) categorical variables: the dissimilarity between pairs of values must
  be delineated explicitly.
  If a variable assumes M distinct values, these can be arranged in a symmetric
  M × M matrix with L_{rr'} = L_{r'r}, L_{rr} = 0, L_{rr'} ≥ 0. Often: L_{rr'} = 1
  for all r ≠ r'. Unequal losses can be used to emphasize some distances more than
  others.
How can we cluster data in such contexts?
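For instance, the ordinal replacement and a simple 0/1 loss matrix could be set up
as follows (a sketch with made-up attribute values):

    import numpy as np

    # ordinal attribute: academic grades in their prescribed order (M = 5 values)
    grades = ['A', 'B', 'C', 'D', 'F']
    M = len(grades)
    ordinal_value = {g: (i + 1 - 0.5) / M for i, g in enumerate(grades)}
    # -> {'A': 0.1, 'B': 0.3, 'C': 0.5, 'D': 0.7, 'F': 0.9}

    # unordered categorical attribute: symmetric 0/1 loss matrix L with zero diagonal
    categories = ['car', 'truck', 'bike']
    L = 1.0 - np.eye(len(categories))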


K-medoids Algorithm

As discussed before, the K-means clustering algorithm is designed for numerical
and quantitative values.
Look at K-means again: iteratively,
1. each data point is assigned to the cluster that minimizes the distance (or
   dissimilarity);
2. then, the new cluster mean is calculated.
The optimization step 1 can easily be generalized to dissimilarities stored in
matrices, as studied on the previous slides.


K-Medoids Algorithm

1. For a given cluster assignment C, find the observation in each cluster that
   minimizes the total distance to the other points in that cluster:
       i_k^* = \operatorname{argmin}_{\{i : C(i) = k\}} \sum_{C(i') = k} D(x_i, x_{i'}).
   Then
       m_k = x_{i_k^*},   k = 1, 2, ..., K,
   are the current estimates of the cluster centers.
2. Given a current set of cluster centers {m_1, ..., m_K}, minimize the total
   dissimilarity by assigning each observation to the closest (current) cluster
   center:
       C(i) = \operatorname{argmin}_{1 \le k \le K} D(x_i, m_k).
3. Iterate steps 1 and 2 until the assignments do not change.
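A compact sketch of these steps, working on a precomputed dissimilarity matrix D
(the function name k_medoids and the initialization by random indices are our own
choices):

    import numpy as np

    def k_medoids(D, K, max_iter=100, seed=None):
        """K-medoids sketch on a precomputed (N, N) dissimilarity matrix D."""
        rng = np.random.default_rng(seed)
        N = D.shape[0]
        medoids = rng.choice(N, size=K, replace=False)       # indices of current centers
        for _ in range(max_iter):
            # step 2: assign every observation to the closest current center
            labels = D[:, medoids].argmin(axis=1)
            # step 1: within each cluster, pick the point with minimal total dissimilarity
            new_medoids = medoids.copy()
            for k in range(K):
                members = np.where(labels == k)[0]
                if len(members) > 0:
                    within = D[np.ix_(members, members)].sum(axis=1)
                    new_medoids[k] = members[within.argmin()]
            if np.array_equal(new_medoids, medoids):
                break                                         # assignments no longer change
            medoids = new_medoids
        return medoids, labels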


How to choose the number of clusters K?

For a clustering problem, the best number of clusters depends on the goal and on
knowledge of the application.
• Sometimes, the best value of K is given as input (e.g., K salespeople are
  employed and the task is to cluster a database into K segments).
• However, if 'natural' clusters need to be determined, the best value of K is
  unknown and needs to be estimated from the data as well.


(Heuristic) estimate of good K values

• Examine the within-cluster dissimilarity D_{k'} as a function of the number of
  clusters k'.
• Separate solutions are obtained for k' ∈ {1, 2, ..., K_max}.
• The values {D_1, D_2, ..., D_{K_max}} decrease with increasing k'.
• Intuition: if there are K natural groupings, then for k' < K each cluster will
  contain a subset of the true underlying groups, i.e., the dissimilarity D_{k'}
  will decrease noticeably as k' increases.
• For k' > K instead, at least one natural group will be split across separate
  clusters, i.e., D_{k'} will decrease only mildly.
• Estimate the best number of clusters K by identifying a 'kink' in the plot of
  D_{k'} as a function of k'.

Next, we will turn to a fundamental method in learning that can be used for
clustering as well, and more generally for reducing data dimensions, called
Principal Component Analysis.
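Looking for this 'kink' is often called the elbow method; here is a sketch with
scikit-learn, where KMeans' inertia_ plays the role of D_{k'} (the toy data with
three natural groupings is made up):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # three 'natural' groupings in the plane
    X = np.vstack([rng.normal(loc=c, scale=0.4, size=(100, 2))
                   for c in ([0, 0], [3, 0], [0, 3])])

    K_max = 8
    D = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
         for k in range(1, K_max + 1)]

    plt.plot(range(1, K_max + 1), D, marker='o')
    plt.xlabel("number of clusters k'")
    plt.ylabel("within-cluster dissimilarity D_k'")
    plt.show()   # the 'kink' should appear around k' = 3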
