Theory of Clustering (KDD 2010 Tutorial)
Sergei V. and Suresh V.
Outline
I Euclidean Clustering and k-means algorithm
  What to do to select initial centers (and what not to do)
  How long does k-means take to run in theory, practice and theoretical practice
  How to run k-means on large datasets
II Bregman Clustering and k-means
  Bregman Clustering as generalization of k-means
  Performance Results
III Stability
  How to relate closeness in cost function to closeness in clusters.
Introduction
Given n points in R^d, split them into k "similar" groups.
How do we define "best"?
  Minimize the maximum radius of a cluster
  Maximize the average inter-cluster distance
  Minimize the variance within each cluster
Minimizing Variance
Given X and k, find a clustering C = {C_1, C_2, ..., C_k} that minimizes
  φ(X, C) = Σ_{c_i} Σ_{x ∈ C_i} ‖x − c_i‖²

Definition
Let φ* denote the value of the optimum solution above. We say that a clustering C' is α-approximate if
  φ* ≤ φ(X, C') ≤ α · φ*
The k-means (Lloyd) algorithm:
  Assign each point to its nearest center
  Recompute each center as the centroid of its cluster
  Repeat until the clustering stops changing
(A code sketch follows below.)
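A minimal NumPy sketch of the Lloyd iteration above (the function name, the convergence test, and the empty-cluster handling are our own choices, not from the tutorial):

```python
import numpy as np

def lloyd_kmeans(X, centers, max_iters=100, tol=1e-9):
    """A bare-bones Lloyd iteration.
    X: (n, d) array of points; centers: (k, d) array of initial centers.
    Returns (centers, assignment, cost) where cost is phi(X, C)."""
    centers = np.array(centers, dtype=float)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Update step: each center moves to the centroid of its cluster.
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[assign == j]
            if len(members):                      # leave empty clusters in place
                new_centers[j] = members.mean(axis=0)
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    # Final assignment and cost phi(X, C) = sum of squared distances.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    cost = d2[np.arange(len(X)), assign].sum()
    return centers, assign, cost
```

How the initial centers are chosen is exactly the seeding question discussed next.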
Performance
k-means Accuracy
How good is this algorithm? The solution it finds can be arbitrarily worse than the optimum.
But does this really happen?
Seeding on Gaussians
With random seeding, several initial centers can land in the same Gaussian while another Gaussian gets none, and the Lloyd iterations never recover.

Seeding on Gaussians: Simple Fix
But the Gaussian case has an easy fix: use a furthest point heuristic.
Select centers using a furthest point algorithm (a 2-approximation to k-Center clustering).

Seeding on Gaussians: Sensitive to Outliers
The furthest point heuristic, however, is sensitive to outliers.
Interpolating between random and furthest-point seeding:
Let D(x) be the distance between a point x and its nearest cluster center. Choose the next center proportionally to D^α(x).
  α = 0 → random initialization
  α = ∞ → furthest point heuristic
  α = 2 → k-means++

More generally
Set the probability of selecting a point proportional to its contribution to the overall error:
  If minimizing Σ_{c_i} Σ_{x ∈ C_i} ‖x − c_i‖, sample according to D.
  If minimizing Σ_{c_i} Σ_{x ∈ C_i} ‖x − c_i‖^∞, sample according to D^∞.
k-Means++
If the data set looks Gaussian...
If the outlier should be its own cluster...
Theorem (AV07)
This algorithm always attains an O(log k) approximation in
expectation
Theorem (ORSS06)
A slightly modified version of this algorithm attains an O(1)
approximation if the data is ‘nicely clusterable’ with k clusters.
Definition
A pointset X is (k, ε)-separated if φ*_k(X) ≤ ε² · φ*_{k−1}(X).
Intuition
Look at the optimum clustering. In expectation:
1 If the algorithm selects a point from a new OPT cluster, that
cluster is covered pretty well
2 If the algorithm picks two points from the same OPT cluster,
then other clusters must contribute little to the overall error
As long as the points are reasonably well separated, the first
condition holds.
Two theorems
Assume the points are (k, ε)-separated and get an O(1)
approximation.
Make no assumptions about separability and get an O(log k)
approximation.
k-means++ Summary:
  To select the next center, sample a point in proportion to its current contribution to the error
  Works for k-means, k-median, other objective functions
  Universal O(log k) approximation, O(1) approximation under some assumptions
  Can be implemented to run in O(nkd) time (same as a single k-means step; see the sketch below)
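A compact sketch of the D²-sampling initialization just summarized (squared Euclidean distances, NumPy arrays; the helper name kmeans_pp_init is ours):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding: pick each new center with probability
    proportional to D^2(x), the squared distance from x to the
    nearest center chosen so far."""
    rng = np.random.default_rng(rng)
    n = len(X)
    centers = [X[rng.integers(n)]]                 # first center: uniform at random
    d2 = ((X - centers[0]) ** 2).sum(axis=1)       # current D^2(x)
    for _ in range(1, k):
        idx = rng.choice(n, p=d2 / d2.sum())       # sample proportionally to D^2
        centers.append(X[idx])
        # One pass over the data updates D^2, so the whole seeding is O(nkd).
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.asarray(centers)
```

Replacing the sampling by a uniform draw (α = 0) or by an argmax over D (α = ∞) recovers random initialization and the furthest-point heuristic, respectively; for the k-median objective one would sample according to D instead of D².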
KM++ v. KM v. KM-Hybrid
[Two plots: Error vs. Stage for LLOYD, HYBRID, and KM++ initializations]
Theorem (V09)
There exists a pointset X in R² and a set of initial centers C so that k-means takes 2^Ω(k) iterations to converge when initialized with C.
Perturbation
To each point x ∈ X, add independent noise drawn from N(0, σ²).
Definition
The smoothed complexity of an algorithm is the maximum, over inputs, of the expected running time after adding the noise.
Theorem (AMR09)
The smoothed complexity of k-means is bounded by
  O(n^34 k^34 d^8 D^6 log^4 n / σ^6)
Notes
  While the bound is large, it is not exponential (2^k ≫ k^34 for large enough k)
  The (D/σ)^6 factor shows the bound is scale invariant
Comparing bounds
The smoothed complexity of k-means is polynomial in n, k and D/σ
where D is the diameter of X, whereas the worst case complexity of
k-means is exponential in k
Implications
The pathological examples:
Are very brittle
Can be avoided with a little bit of random noise
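The perturbation itself is a one-liner in practice; a small sketch (the choice of σ relative to the diameter of X is left to the user, and the function name is ours):

```python
import numpy as np

def perturb(X, sigma, rng=None):
    """Add independent N(0, sigma^2) noise to every coordinate,
    as in the smoothed-analysis model above; a little noise already
    destroys the brittle worst-case instances."""
    rng = np.random.default_rng(rng)
    return X + rng.normal(scale=sigma, size=X.shape)
```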
Running Time
Exponential worst case running time
Polynomial typical case running time
Solution Quality
Arbitrary local optimum, even with many random restarts
Simple initialization leads to a good solution
Implementing k-means++
Initialization:
  Takes O(nd) time and one pass over the data to select the next center
  Takes O(nkd) time total
Overall running time:
  Each round of k-means takes O(nkd) time
  Typically finishes after a constant number of rounds
Large Data
What if O(nkd) is too much? Can we parallelize this algorithm?
Approach
Partition the data:
  Split X into X_1, X_2, ..., X_m of roughly equal size.
In parallel, compute a clustering on each partition:
  Find C^j = {C^j_1, ..., C^j_k}: a good clustering of each partition, and denote by w^j_i the number of points in cluster C^j_i.
Cluster the clusters:
  Let Y = ∪_{1≤j≤m} C^j. Find a clustering of Y, weighted by the weights W = {w^j_i}.
Running time
Suppose we partition the input across m different machines.
  First phase running time: O(nkd/m).
  Second phase running time: O(mk²d).
Approximation Guarantees
Using k-means++ sets β = γ = O(log k) and leads to an O(log² k) approximation.
(Here β and γ denote the approximation guarantees of the two phases: clustering each partition, and clustering the set Y of representatives.)

Theorem (ADK09)
Running the k-means++ initialization for O(k) rounds leads to an O(1) approximation to the optimal solution (but uses more centers than OPT).
Final Algorithm
Partition the data:
  Split X into X_1, X_2, ..., X_m of roughly equal size.
Compute a clustering using ℓ = O(k) centers on each partition:
  Find C^j = {C^j_1, ..., C^j_ℓ} using k-means++ on each partition, and denote by w^j_i the number of points in cluster C^j_i.
Cluster the clusters:
  Let Y = ∪_{1≤j≤m} C^j be a set of O(ℓm) points. Use k-means++ to cluster Y, weighted by the weights W = {w^j_i}.

Theorem
The algorithm achieves an O(1) approximation in time O(nkd/m + mk²d).
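A sequential sketch of this two-phase scheme (in a real deployment the first loop would run on m machines in parallel; lloyd_kmeans and kmeans_pp_init are the earlier sketches, ell = 2k is just one concrete choice of ℓ = O(k), and each partition is assumed to contain many more than ℓ points):

```python
import numpy as np

def partitioned_kmeans(X, k, m, ell=None, rng=None):
    """Two-phase clustering: k-means++ with ell = O(k) centers on each
    partition, then weighted k-means++ / Lloyd on the representatives."""
    rng = np.random.default_rng(rng)
    ell = ell if ell is not None else 2 * k
    parts = np.array_split(rng.permutation(X), m)

    # Phase 1 (would run in parallel): cluster each partition,
    # keep the centers and their cluster sizes as weights.
    Y, W = [], []
    for Xj in parts:
        cj, assign, _ = lloyd_kmeans(Xj, kmeans_pp_init(Xj, ell, rng))
        Y.append(cj)
        W.append(np.bincount(assign, minlength=ell))
    Y, W = np.vstack(Y), np.concatenate(W).astype(float)

    # Phase 2: weighted k-means++ seeding on the O(ell * m) representatives.
    centers = Y[[rng.choice(len(Y), p=W / W.sum())]]
    for _ in range(1, k):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = W * d2
        centers = np.vstack([centers, Y[rng.choice(len(Y), p=p / p.sum())]])

    # Weighted Lloyd iterations on the representatives.
    for _ in range(50):
        assign = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if W[assign == j].sum() > 0:
                centers[j] = np.average(Y[assign == j], axis=0, weights=W[assign == j])
    return centers
```

The first phase does the O(nkd/m) work per machine; the second phase touches only the O(ℓm) weighted representatives.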
Before...
k-means used to be a prime example of the disconnect between
theory and practice – it works well, but has horrible worst case
analysis
...and after
Smoothed analysis explains the running time, and rigorously analyzed initialization routines help improve clustering quality.
Outline
I Euclidean Clustering and k-means algorithm
  What to do to select initial centers (and what not to do)
  How long does k-means take to run in theory, practice and theoretical practice
  How to run k-means on large datasets
II Bregman Clustering and k-means
  Bregman Clustering as generalization of k-means
  Performance Results
III Stability
  How to relate closeness in cost function to closeness in clusters.
Kullback-Leibler distance:
  D(p, q) = Σ_i p_i log(p_i / q_i)
Itakura-Saito distance:
  D(p, q) = Σ_i (p_i / q_i − log(p_i / q_i) − 1)
Definition
Let φ : R^d → R be a strictly convex function. The Bregman divergence D_φ is defined as
  D_φ(x ‖ y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩
Examples:
  Kullback-Leibler: φ(x) = Σ_i (x_i ln x_i − x_i), D_φ(x ‖ y) = Σ_i x_i ln(x_i / y_i)
  Itakura-Saito: φ(x) = −Σ_i ln x_i, D_φ(x ‖ y) = Σ_i (x_i / y_i − log(x_i / y_i) − 1)
  ℓ₂²: φ(x) = ‖x‖², D_φ(x ‖ y) = ‖x − y‖²
Asymmetry: In general, D_φ(p ‖ q) ≠ D_φ(q ‖ p)
No triangle inequality: D_φ(p ‖ q) + D_φ(q ‖ r) can be less than D_φ(p ‖ r)!
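These examples translate directly into code; a small sketch of the three divergences and of the asymmetry just noted (component-wise formulas, assuming strictly positive coordinates where required; the function names are ours):

```python
import numpy as np

def squared_euclidean(x, y):
    # phi(x) = ||x||^2  ->  D_phi(x || y) = ||x - y||^2
    return ((x - y) ** 2).sum()

def kl_divergence(x, y):
    # phi(x) = sum x_i ln x_i - x_i  ->  D_phi(x || y) = sum x_i ln(x_i / y_i)
    # (for x, y on the probability simplex)
    return (x * np.log(x / y)).sum()

def itakura_saito(x, y):
    # phi(x) = -sum ln x_i  ->  D_phi(x || y) = sum (x_i/y_i - ln(x_i/y_i) - 1)
    r = x / y
    return (r - np.log(r) - 1).sum()

# Asymmetry: D(p || q) and D(q || p) generally differ.
p, q = np.array([0.7, 0.2, 0.1]), np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q), kl_divergence(q, p))
```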
How can we now do clustering?
Breaking down k-means
Key Point
Setting the cluster center to the centroid minimizes the average squared distance to the center.

Problem
Given points x_1, ..., x_n ∈ R^d, find c such that
  Σ_i D_φ(x_i ‖ c)
is minimized.

Answer
  c = (1/n) Σ_i x_i
Independent of φ [BMDG05]!

Key Point
Setting the cluster center to the centroid minimizes the average Bregman divergence to the center.
Lemma ([BMDG05])
The (Bregman) k-means algorithm converges in cost.
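Because the center update is the plain mean regardless of φ, the Lloyd loop from Part I only needs its assignment step generalized; a sketch parameterized by a divergence function (for example the kl_divergence above; the function name is ours):

```python
import numpy as np

def bregman_kmeans(X, centers, divergence, max_iters=100):
    """Bregman hard clustering: assign each point to the center with the
    smallest D_phi(x || c); update every center as the arithmetic mean of
    its cluster -- the same update for every Bregman divergence."""
    centers = np.array(centers, dtype=float)
    for _ in range(max_iters):
        # Assignment step under the chosen divergence.
        assign = np.array([np.argmin([divergence(x, c) for c in centers])
                           for x in X])
        # Update step: the centroid, independently of phi.
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[assign == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```

With divergence = squared_euclidean this is exactly the Euclidean k-means loop.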
EM: Euclidean and Bregman
Expectation maximization:
Initialize density parameters and means for k distributions
while not converged do
For distribution i and point x, compute conditional probability
p(i|x) that x was drawn from i (by Bayes rule)
For each distribution i, recompute new density parameters
and means (via maximum likelihood)
end while
Exponential families: p_{Ψ,θ}(x) = exp(⟨x, θ⟩ − Ψ(θ)) p_0(x), with Ψ convex.
Theorem ([BMDG05])
p_{Ψ,θ}(x) = exp(−D_φ(x ‖ µ)) b_φ(x), where µ = ∇Ψ(θ) is the expectation parameter and φ is the convex conjugate of Ψ.
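For concreteness, a sketch of the E and M steps for the simplest member of this family (spherical unit-variance Gaussians with uniform mixing weights, i.e. the squared-Euclidean case; the general Bregman soft clustering replaces the exponent with −D_φ(x ‖ µ_j), exactly as in the theorem above, and the function name is ours):

```python
import numpy as np

def soft_kmeans_em(X, mus, n_iters=50):
    """EM for a mixture of unit-variance spherical Gaussians with
    uniform mixing weights -- a soft version of k-means."""
    mus = np.array(mus, dtype=float)
    for _ in range(n_iters):
        # E step: responsibilities p(j | x) via Bayes' rule.
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        logits = -0.5 * d2
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # M step: maximum-likelihood means = responsibility-weighted averages.
        mus = (r.T @ X) / (r.sum(axis=0)[:, None] + 1e-12)
    return mus, r
```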
Two questions:
  Does the running time analysis carry over to Bregman divergences?
  Does the initialization analysis carry over?
Parameters: n, k, d.
Good news
k-means always converges, in O(n^{kd}) time.
Bad news
k-means can take time 2^Ω(k) to converge:
  Even if d = 2, i.e., in the plane
  Even if the centers are chosen from the initial data
Theorem
The smoothed complexity of k-means using Gaussian noise N(0, σ²) is polynomial in n and 1/σ.
Theorem ([MR09])
For "well-behaved" Bregman divergences, the smoothed complexity is bounded by poly(n^√k, 1/σ) and by k^{kd} · poly(n, 1/σ).
Second question: does the initialization analysis carry over?
Problem
Given x_1, ..., x_n and a parameter k, find k centers c_1, ..., c_k such that
  Σ_{i=1}^n min_{1≤j≤k} d(x_i, c_j)
is minimized.

Problem (c-approximation)
Let OPT be the value of the optimal solution above. Fix c > 0. Find centers c'_1, ..., c'_k such that if A = Σ_{i=1}^n min_{1≤j≤k} d(x_i, c'_j), then
  OPT ≤ A ≤ c · OPT
Initialization
Let the distance from x to its nearest cluster center be D(x).
Pick x as the new center with probability
  p(x) ∝ D²(x)
Properties of the solution:
  For arbitrary data, this gives an O(log k)-approximation
  For "well-separated" data, this gives a constant (O(1)) approximation.
Initialization (Bregman)
Let the Bregman divergence from x to its nearest cluster center be D(x).
Pick x as the new center with probability
  p(x) ∝ D(x)
Measuring d_q
  d_q(C, OPT) = cost(C) / cost(OPT)
Measuring d
  d(OPT, C*): the distance between the optimal clustering OPT and the target clustering C*
Definition (α-perturbations [BL09])
A clustering instance (P, d) is α-perturbation-resilient if the optimal clustering is identical to the optimal clustering for any (P, d'), where d' satisfies d(x, y) ≤ d'(x, y) ≤ α · d(x, y) for all pairs x, y.
The larger the α, the more resilient the instance (and the more "stable")
Center-based clustering problems (k-median, k-means, k-center) can be solved optimally for 3-perturbation-resilient inputs [ABS10]
Surprising facts:
Finding a c-approximation in general might be NP-hard.
Finding a c-approximation here is easy!
Theorem
In polynomial time, we can find a clustering that is O(ε)-close to the
target clustering, even if finding a c-approximation is NP-hard.
http://www.cs.utah.edu/~suresh/web/2010/05/08/new-developments-in-the-theory-of-clustering-tutorial/

Andrea Vattani. k-means requires exponentially many iterations even in the plane. In SCG '09: Proceedings of the 25th Annual Symposium on Computational Geometry, pages 324–332, New York, NY, USA, 2009. ACM.