
New Developments In The Theory Of Clustering

that's all very well in practice, but does it work in theory?

Sergei Vassilvitskii (Yahoo! Research)


Suresh Venkatasubramanian (U. Utah)

Sergei V. and Suresh V. Theory of Clustering


Overview

What we will cover


A few of the recent theory results on clustering:
Practical algorithms that have strong theoretical guarantees
Models to explain behavior observed in practice

Sergei V. and Suresh V. Theory of Clustering


Overview

What we will not cover


The rest:
Recent strands of theory of clustering such as metaclustering
and privacy preserving clustering
Clustering with distributional data assumptions
Proofs

Sergei V. and Suresh V. Theory of Clustering


Outline

I Euclidean Clustering and k-means algorithm
  What to do to select initial centers (and what not to do)
  How long does k-means take to run in theory, practice and theoretical practice
  How to run k-means on large datasets
II Bregman Clustering and k-means
  Bregman Clustering as generalization of k-means
  Performance Results
III Stability
  How to relate closeness in cost function to closeness in clusters.

Sergei V. and Suresh V. Theory of Clustering


Euclidean Clustering and k-means

Sergei V. and Suresh V. Theory of Clustering


Introduction

What does it mean to cluster?


Given n points in R^d, find the best way to split them into k groups.

Sergei V. and Suresh V. Theory of Clustering


Introduction

How do we define "best"?

Example: given n points in R^d, split them into k similar groups. Possible objectives:
  Minimize the maximum radius of a cluster
  Maximize the average inter-cluster distance
  Minimize the variance within each cluster

Sergei V. and Suresh V. Theory of Clustering


Introduction

How do we define "best"? Here: minimize the variance within each cluster.

Minimizing total variance


For each cluster C_i ∈ C, the center c_i = (1/|C_i|) Σ_{x∈C_i} x is the expected
location of a point in the cluster.
The variance of cluster C_i is then:

    Σ_{x∈C_i} ‖x − c_i‖²

And the total objective is:

    φ = Σ_i Σ_{x∈C_i} ‖x − c_i‖²

Sergei V. and Suresh V. Theory of Clustering


Approximations

Minimizing Variance
Given X and k, find a clustering C = {C_1, C_2, . . . , C_k} that
minimizes: φ(X, C) = Σ_i Σ_{x∈C_i} ‖x − c_i‖²

Definition
Let φ* denote the value of the optimum solution above. We say that a
clustering C′ is α-approximate if:

    φ* ≤ φ(X, C′) ≤ α · φ*

Solving this problem
This problem is NP-complete, even when the pointset X lies in two
dimensions...

...but we've been solving it for over 50 years! [S56][L57][M67]

Sergei V. and Suresh V. Theory of Clustering


k-means

Sergei V. and Suresh V. Theory of Clustering


k-means

Lloyd's Method: k-means

Example:
  Given a set of data points, select initial centers at random
  Assign each point to its nearest center
  Recompute the optimum centers (means) given the fixed clustering
  Repeat: assign points to the nearest center, recompute centers, ...
  ...until the clustering does not change

Sergei V. and Suresh V. Theory of Clustering


Performance

This algorithm terminates!


Recall the total error:

    φ(X, C) = Σ_i Σ_{x∈C_i} ‖x − c_i‖²

In every iteration φ is reduced:


Assigning each point to the nearest center reduces φ
Given a fixed cluster, the mean is the optimal location for the
center (requires proof)

Sergei V. and Suresh V. Theory of Clustering
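
Lloyd's method is simple enough to state in a few lines of code. A minimal NumPy sketch of the loop described above (the function and variable names are ours, not from the tutorial):

import numpy as np

def lloyd(X, centers, max_iter=100):
    """Lloyd's method (k-means). X is an (n, d) array, centers an initial (k, d) array.
    Alternates assignment and re-centering until the clustering stops changing."""
    centers = np.asarray(centers, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (this step cannot increase phi).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # clustering did not change: a local optimum has been reached
        labels = new_labels
        # Recompute each center as the mean of its cluster (also cannot increase phi).
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts) > 0:
                centers[j] = pts.mean(axis=0)
    phi = ((X - centers[labels]) ** 2).sum()   # total error of the final clustering
    return centers, labels, phi

Since neither step can increase φ and there are only finitely many clusterings, the loop terminates.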


Performance

k-means Accuracy: How good is this algorithm?

  The algorithm finds a local minimum...
  ...that is potentially arbitrarily worse than the optimum solution.

But does this really happen? YES!
  Even with many random restarts!

Sergei V. and Suresh V. Theory of Clustering
Performance

Finding a good set of initial points is a black art

Try many times with different random seeds
  Most common method
  Has limited benefit even in the case of Gaussians
Find a different way to initialize centers
  Hundreds of heuristics
  Including pre- & post-processing ideas

There exists a fast and simple initialization scheme with provable
performance guarantees.

Sergei V. and Suresh V. Theory of Clustering


Random Initializations on Gaussians

Sergei V. and Suresh V. Theory of Clustering


k-means on Gaussians: some Gaussians are combined.

Sergei V. and Suresh V. Theory of Clustering


Seeding on Gaussians: Simple Fix

But the Gaussian case has an easy fix: select centers using a furthest-point
algorithm (a 2-approximation to k-center clustering).

Sergei V. and Suresh V. Theory of Clustering


Seeding on Gaussians: Sensitive to Outliers

But this fix is overly sensitive to outliers.

Sergei V. and Suresh V. Theory of Clustering
k-means++

What if we interpolate between the two methods?

Let D(x) be the distance between a point x and its nearest cluster
center. Choose the next point proportionally to D^α(x).
  α = 0 −→ Random initialization
  α = ∞ −→ Furthest point heuristic
  α = 2 −→ k-means++

More generally
Set the probability of selecting a point proportional to its
contribution to the overall error.
  If minimizing Σ_i Σ_{x∈C_i} ‖x − c_i‖, sample according to D.
  If minimizing Σ_i Σ_{x∈C_i} ‖x − c_i‖^∞, sample according to D^∞
  (take the furthest point).

Sergei V. and Suresh V. Theory of Clustering
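
A sketch of the D^α seeding just described (ours): α = 2 is the k-means++ initialization, α = 0 degenerates to uniform random seeding, and the α = ∞ furthest-point rule would simply take the argmax instead of sampling.

import numpy as np

def d_alpha_seeding(X, k, alpha=2.0, rng=np.random.default_rng()):
    """Pick k initial centers from the (n, d) array X: the first uniformly at
    random, each later one with probability proportional to D(x)^alpha, where
    D(x) is the distance from x to its nearest center chosen so far."""
    n = len(X)
    centers = [X[rng.integers(n)]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)          # D(x)^2 for the current centers
    for _ in range(k - 1):
        weights = d2 ** (alpha / 2.0)                 # D(x)^alpha
        idx = rng.choice(n, p=weights / weights.sum())
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)

Each round costs O(nd), so the whole initialization takes O(nkd) time, the same as a single Lloyd step.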


Example of k-means++

k-Means++

[Figures: k-means++ seeding when the data set looks Gaussian, and when an
outlier should be its own cluster.]

Sergei V. and Suresh V. Theory of Clustering


Analyzing k-means++

What can we say about the performance of k-means++?

Theorem (AV07)
This algorithm always attains an O(log k) approximation in expectation.

Theorem (ORSS06)
A slightly modified version of this algorithm attains an O(1)
approximation if the data is 'nicely clusterable' with k clusters.

Sergei V. and Suresh V. Theory of Clustering


Nice Clusterings

What do we mean by 'nicely clusterable'?

Intuitively, X is nicely clusterable if going from k − 1 to k clusters
drops the total error by a constant factor.

Definition
A pointset X is (k, ε)-separated if φ*_k(X) ≤ ε² · φ*_{k−1}(X).

Sergei V. and Suresh V. Theory of Clustering


Why does this work?

Intuition
Look at the optimum clustering. In expectation:
  1 If the algorithm selects a point from a new OPT cluster, that
    cluster is covered pretty well
  2 If the algorithm picks two points from the same OPT cluster,
    then other clusters must contribute little to the overall error
As long as the points are reasonably well separated, the first
condition holds.

Two theorems
  Assume the points are (k, ε)-separated and get an O(1) approximation.
  Make no assumptions about separability and get an O(log k) approximation.

Sergei V. and Suresh V. Theory of Clustering


Summary

k-means++ Summary:
  To select the next center, sample a point in proportion to its
  current contribution to the error
  Works for k-means, k-median, and other objective functions
  Universal O(log k) approximation; O(1) approximation under
  some assumptions
  Can be implemented to run in O(nkd) time (same as a single
  k-means step)

But does it actually work?

Sergei V. and Suresh V. Theory of Clustering


Large Evaluation

Sergei V. and Suresh V. Theory of Clustering


Typical Run

KM++ v. KM v. KM-Hybrid

[Figure: Error vs. Stage for LLOYD, HYBRID, and KM++ on a typical run.]

Sergei V. and Suresh V. Theory of Clustering


Other Runs

KM++ v. KM v. KM-Hybrid

[Figure: Error vs. Stage for LLOYD, HYBRID, and KM++ on other runs.]

Sergei V. and Suresh V. Theory of Clustering


Convergence

How fast does k-means converge?

It appears the algorithm converges in under 100 iterations (even
faster with smart initialization).

Theorem (V09)
There exists a pointset X in R² and a set of initial centers C so that
k-means takes 2^Ω(k) iterations to converge when initialized with C.

Sergei V. and Suresh V. Theory of Clustering


Theory vs. Practice

Finding the disconnect


In theory:
k-means might run in exponential time
In practice:
k-means converges after a handful of iterations

It works in practice but it does not work in theory!

Sergei V. and Suresh V. Theory of Clustering


Finding the disconnect

Robustness of worst case examples


Perhaps the worst case examples are too precise, and can never
arise out of natural data

Quantifying the robustness


If we slightly perturb the points of the example:
The optimum solution shouldn’t change too much
Will the running time stay exponential?

Sergei V. and Suresh V. Theory of Clustering


Huge gap between worst-case and observed results.

Small Perturbations
  Check how fragile the worst case is.
  Add a little bit of noise to the data before running the algorithm.
  The optimum solution barely changes.

Sergei V. and Suresh V. Theory of Clustering


Smoothed Analysis

Perturbation
To each point x ∈ X add independent noise drawn from N(0, σ²).

Definition
The smoothed complexity of an algorithm is the maximum (over inputs X)
expected running time after adding the noise:

    max_X E_σ[Time(X + σ)]

Sergei V. and Suresh V. Theory of Clustering
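
In code, the smoothed-analysis model is nothing more than the following (a sketch, ours): perturb the input with independent Gaussian noise, then run the algorithm on the perturbed instance and measure its running time.

import numpy as np

def perturb(X, sigma, rng=np.random.default_rng()):
    """Add independent N(0, sigma^2) noise to every coordinate of X, as in the
    perturbation model above; the algorithm is then run on the perturbed input."""
    return X + rng.normal(scale=sigma, size=X.shape)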


Smoothed Analysis

Theorem (AMR09)
The smoothed complexity of k-means is bounded by

    O( n^34 k^34 d^8 D^6 log^4 n / σ^6 )

Notes
  While the bound is large, it is not exponential (2^k ≫ k^34 for
  large enough k)
  The (D/σ)^6 factor shows the bound is scale invariant

Sergei V. and Suresh V. Theory of Clustering


Smoothed Analysis

Comparing bounds
The smoothed complexity of k-means is polynomial in n, k and D/σ
where D is the diameter of X, whereas the worst case complexity of
k-means is exponential in k

Implications
The pathological examples:
Are very brittle
Can be avoided with a little bit of random noise

Sergei V. and Suresh V. Theory of Clustering


k-means Summary

Running Time
  Exponential worst case running time
  Polynomial typical case running time

Solution Quality
  Arbitrary local optimum, even with many random restarts
  Simple initialization leads to a good solution

Sergei V. and Suresh V. Theory of Clustering


Large Datasets

Implementing k-means++
Initialization:
  Takes O(nd) time and one pass over the data to select the next center
  Takes O(nkd) time total
Overall running time:
  Each round of k-means takes O(nkd) running time
  Typically finishes after a constant number of rounds

Large Data
What if O(nkd) is too much? Can we parallelize this algorithm?

Sergei V. and Suresh V. Theory of Clustering


Parallelizing k-means

Approach
Partition the data:
  Split X into X_1, X_2, . . . , X_m of roughly equal size.
In parallel, compute a clustering on each partition:
  Find C^j = {C^j_1, . . . , C^j_k}: a good clustering of partition X_j,
  and denote by w^j_i the number of points in cluster C^j_i.
Cluster the clusters:
  Let Y = ∪_{1≤j≤m} C^j. Find a clustering of Y, weighted by the
  weights W = {w^j_i}.

Sergei V. and Suresh V. Theory of Clustering
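
A compact sketch of this two-phase scheme (ours): cluster_fn stands for any per-partition clustering routine, e.g. k-means++ seeding followed by Lloyd's method, and the weighted second phase is written out as a small weighted variant of Lloyd's method.

import numpy as np

def weighted_lloyd(Y, w, centers, iters=50):
    """Lloyd's method on weighted points: assignment ignores the weights,
    re-centering uses the weighted mean."""
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(iters):
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            mask = labels == j
            if w[mask].sum() > 0:
                centers[j] = np.average(Y[mask], axis=0, weights=w[mask])
    return centers, labels

def two_phase_kmeans(X, k, m, cluster_fn):
    """Split X into m parts, cluster each part with cluster_fn(part, k) ->
    (centers, labels), then cluster the weighted set of all returned centers."""
    reps, weights = [], []
    for part in np.array_split(X, m):         # in practice, one partition per machine
        centers, labels = cluster_fn(part, k)
        reps.append(centers)
        weights.append(np.bincount(labels, minlength=len(centers)))
    Y = np.vstack(reps)                        # at most m*k representative points
    w = np.concatenate(weights).astype(float)
    init, _ = cluster_fn(Y, k)                 # seed the second phase
    return weighted_lloyd(Y, w, init.astype(float))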


Parallelization Example

Speed Up: Intuition
  Given X
  Partition the dataset
  Cluster each partition separately
  Cluster the clusters
  Final clustering

Sergei V. and Suresh V. Theory of Clustering


Analysis

Quality of the solution

What happens when we approximate the approximation?
  Suppose the algorithm in phase 1 gave a β-approximate solution to its input
  The algorithm in phase 2 gave a γ-approximate solution to its (smaller) input

Theorem (GNMO00, AJM09)
The two-phase algorithm gives a 4γ(1 + β) + 2β approximate solution.

Sergei V. and Suresh V. Theory of Clustering


Analysis

Running time
Suppose we partition the input across m different machines.
  First phase running time: O(nkd/m).
  Second phase running time: O(mk²d).

Sergei V. and Suresh V. Theory of Clustering


Improving the algorithm

Approximation Guarantees
Using k-means++ sets β = γ = O(log k) and leads to an O(log² k)
approximation.

Improving the Approximation
Must improve the approximation guarantee of the first round, but
can use a larger k to ensure every cluster is well summarized.

Theorem (ADK09)
Running the k-means++ initialization for O(k) rounds leads to an O(1)
approximation to the optimal solution (but uses more centers than OPT).

Sergei V. and Suresh V. Theory of Clustering


Two-round k-means++

Final Algorithm
Partition the data:
  Split X into X_1, X_2, . . . , X_m of roughly equal size.
Compute a clustering using ℓ = O(k) centers on each partition:
  Find C^j = {C^j_1, . . . , C^j_ℓ} using k-means++ on each partition,
  and denote by w^j_i the number of points in cluster C^j_i.
Cluster the clusters:
  Let Y = ∪_{1≤j≤m} C^j be a set of O(ℓm) points. Use k-means++
  to cluster Y, weighted by the weights W = {w^j_i}.

Theorem
The algorithm achieves an O(1) approximation in time O(nkd/m + mk²d).

Sergei V. and Suresh V. Theory of Clustering


Summary

Before...
k-means used to be a prime example of the disconnect between
theory and practice – it works well, but has horrible worst case
analysis

...and after
Smoothed analysis explains the running time, and rigorously
analyzed initialization routines help improve clustering quality.

Sergei V. and Suresh V. Theory of Clustering


Outline

I Euclidean Clustering and k-means algorithm
  What to do to select initial centers (and what not to do)
  How long does k-means take to run in theory, practice and theoretical practice
  How to run k-means on large datasets
II Bregman Clustering and k-means
  Bregman Clustering as generalization of k-means
  Performance Results
III Stability
  How to relate closeness in cost function to closeness in clusters.

Sergei V. and Suresh V. Theory of Clustering


Clustering With Non-Euclidean Metrics

Sergei V. and Suresh V. Theory of Clustering


Application I: Clustering Documents

Kullback-Leibler distance:

    D(p, q) = Σ_i p_i log(p_i / q_i)

Sergei V. and Suresh V. Theory of Clustering


Application II: Image Analysis

Kullback-Leibler distance:

    D(p, q) = Σ_i p_i log(p_i / q_i)

Sergei V. and Suresh V. Theory of Clustering


Application III: Speech Analysis

Itakura-Saito distance:

    D(p, q) = Σ_i ( p_i/q_i − log(p_i/q_i) − 1 )

Sergei V. and Suresh V. Theory of Clustering


Bregman Divergences

Definition
Let φ : R^d → R be a strictly convex function. The Bregman
divergence D_φ is defined as

    D_φ(x ‖ y) = φ(x) − φ(y) − 〈∇φ(y), x − y〉

Examples:
  Kullback-Leibler: φ(x) = Σ_i (x_i ln x_i − x_i),  D_φ(x ‖ y) = Σ_i x_i ln(x_i/y_i)
  Itakura-Saito: φ(x) = −Σ_i ln x_i,  D_φ(x ‖ y) = Σ_i ( x_i/y_i − log(x_i/y_i) − 1 )
  Squared Euclidean (ℓ²_2): φ(x) = ½‖x‖²,  D_φ(x ‖ y) = ½‖x − y‖²
Sergei V. and Suresh V. Theory of Clustering
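
For concreteness, the three example divergences written out directly (a sketch, ours; the helper names are not from the tutorial, and inputs are assumed strictly positive wherever a logarithm or ratio requires it):

import numpy as np

def kl_divergence(p, q):
    """D_phi for phi(x) = sum_i x_i ln x_i - x_i (generalized KL; equals the
    usual KL divergence when p and q are probability vectors)."""
    return float(np.sum(p * np.log(p / q)))

def itakura_saito(p, q):
    """D_phi for phi(x) = -sum_i ln x_i."""
    r = p / q
    return float(np.sum(r - np.log(r) - 1.0))

def half_squared_euclidean(p, q):
    """D_phi for phi(x) = ||x||^2 / 2, i.e. half the squared Euclidean distance."""
    return 0.5 * float(np.sum((p - q) ** 2))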


Overview

k-means clustering ≡ Bregman clustering

The algorithm works the same way.


Same (bad) worst-case behavior
Same (good) smoothed behavior
Same (good) quality guarantees, with correct initialization

Sergei V. and Suresh V. Theory of Clustering


Properties

[Figure: the two divergences D_φ(p ‖ q) and D_φ(q ‖ p) between points p and q.]

    D_φ(x ‖ y) = φ(x) − φ(y) − 〈∇φ(y), x − y〉

Asymmetry: In general, D_φ(p ‖ q) ≠ D_φ(q ‖ p)
No triangle inequality: D_φ(p ‖ q) + D_φ(q ‖ r) can be less than D_φ(p ‖ r)!
How can we now do clustering?
Sergei V. and Suresh V. Theory of Clustering
Breaking down k-means

Initialize cluster centers
while not converged do
  Assign points to nearest cluster center
  Find new cluster center by averaging points assigned together
end while

Key Point
Setting the cluster center to the centroid minimizes the average squared
distance to the center.

Sergei V. and Suresh V. Theory of Clustering


Bregman Centroids

Problem
Given points x_1, . . . , x_n ∈ R^d, find c such that

    Σ_i D_φ(x_i ‖ c)

is minimized.

Answer

    c = (1/n) Σ_i x_i

Independent of φ [BMDG05]!

Sergei V. and Suresh V. Theory of Clustering


Bregman k-means

Initialize cluster centers


while not converged do
Assign points to the nearest cluster center (measured by D_φ(x ‖ c))
Find new cluster center by averaging points assigned together
end while

Key Point
Setting cluster center as centroid minimizes average Bregman
divergence to center

Sergei V. and Suresh V. Theory of Clustering
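
A short sketch of the Bregman k-means loop above (ours): it is Lloyd's method with the assignment step measured by D_φ(x ‖ c), while the re-centering step is still the plain mean, since the centroid minimizes the average Bregman divergence regardless of φ.

import numpy as np

def bregman_kmeans(X, centers, divergence, iters=50):
    """Bregman k-means. X is an (n, d) array; divergence(x, c) is any Bregman
    divergence, e.g. one of the functions sketched earlier."""
    centers = np.asarray(centers, dtype=float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest center under D_phi (note the argument order).
        for i, x in enumerate(X):
            labels[i] = min(range(len(centers)),
                            key=lambda j: divergence(x, centers[j]))
        # Re-centering step: the plain centroid, independent of phi.
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts) > 0:
                centers[j] = pts.mean(axis=0)
    return centers, labels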


Convergence

Lemma ([BMDG05])
The (Bregman) k-means algorithm converges in cost.

Euclidean distance: the quantity

    Σ_C Σ_{x∈C} ‖x − center(C)‖²

decreases with each iteration of k-means.

Bregman divergence: the Bregman information

    Σ_C Σ_{x∈C} D_φ(x ‖ center(C))

decreases with each iteration of the Bregman k-means algorithm.

Sergei V. and Suresh V. Theory of Clustering


EM and Soft Clustering

Expectation maximization:
Initialize density parameters and means for k distributions
while not converged do
For distribution i and point x, compute conditional probability
p(i|x) that x was drawn from i (by Bayes rule)
For each distribution i, recompute new density parameters
and means (via maximum likelihood)
end while

This yields a soft clustering of points to “clusters”


Originally used for mixtures of Gaussians.

Sergei V. and Suresh V. Theory of Clustering


Exponential Families And Bregman Divergences

Definition (Exponential Family)
A parametric family of distributions p_{Ψ,θ} is an exponential family if
each density is of the form

    p_{Ψ,θ}(x) = exp(〈x, θ〉 − Ψ(θ)) p_0(x)

with Ψ convex.

Let φ(t) = Ψ*(t) be the Legendre-Fenchel dual of Ψ:

    φ(t) = sup_x ( 〈x, t〉 − Ψ(x) )

Theorem ([BMDG05])

    p_{Ψ,θ}(x) = exp(−D_φ(x ‖ µ)) b_φ(x)

where µ = ∇Ψ(θ) is the expectation parameter.
Sergei V. and Suresh V. Theory of Clustering
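
A quick worked example of this correspondence (ours, not from the slides): for the unit-variance Gaussian the density is proportional to exp(〈x, θ〉 − ½‖θ‖²), so Ψ(θ) = ½‖θ‖². Its Legendre-Fenchel dual is

    φ(t) = sup_x ( 〈x, t〉 − ½‖x‖² ) = ½‖t‖²,

and therefore

    D_φ(x ‖ µ) = ½‖x‖² − ½‖µ‖² − 〈µ, x − µ〉 = ½‖x − µ‖².

So for Gaussians the theorem recovers (half) the squared Euclidean distance, and Bregman soft clustering with this φ is exactly EM for a mixture of unit-variance Gaussians.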
EM: Euclidean and Bregman

Expectation maximization:
Initialize density parameters and means for k distributions
while not converged do
For distribution i and point x, compute conditional probability
p(i|x) that x was drawn from i (by Bayes rule)
For each distribution i, recompute new density parameters
and means (via maximum likelihood)
end while

Choosing the corresponding Bregman divergence D_φ(· ‖ ·), with φ = Ψ*,


gives mixture density estimation for any exponential family pΨ,θ .

Sergei V. and Suresh V. Theory of Clustering


Performance Analysis

Sergei V. and Suresh V. Theory of Clustering


Performance Analysis

Two questions:

Problem (Rate of convergence)
Given an arbitrary set of n points in d dimensions, how long does it
take for (Bregman) k-means to converge?

Problem (Quality of Solution)
Let OPT denote the optimal clustering that minimizes the average
sum of (Bregman) distances to cluster centers. How close to OPT is
the solution returned by (Bregman) k-means?

Sergei V. and Suresh V. Theory of Clustering


Convergence of k-means

Parameters: n, k, d.

Good news
  k-means always converges in O(n^{kd}) time.

Bad news
  k-means can take time 2^Ω(k) to converge:
    Even if d = 2, i.e. in the plane
    Even if centers are chosen from the initial data

Sergei V. and Suresh V. Theory of Clustering


Convergence of Bregman k-means

Euclidean distance: k-means can take time 2^Ω(k) to converge:
  Even if d = 2, i.e. in the plane
  Even if centers are chosen from the initial data

Bregman divergence: for some Bregman divergences, k-means can take
time 2^Ω(k) to converge [MR09]:
  Even if d = 2, i.e. in the plane
  Even if centers are chosen from the initial data

Sergei V. and Suresh V. Theory of Clustering


Proof Idea

"Well behaved" Bregman divergences look "locally Euclidean":

c c

{x|kx − ck2 ≤ 1} {x | Dφ (x, c) ≤ 1}

Take a bad Euclidean instance and shrink it to make it local.

Sergei V. and Suresh V. Theory of Clustering


Huge gap between worst-case and observed results.

Smoothed Analysis
  Real inputs aren't worst-case! Check how fragile the worst case is.
  Add a little bit of noise to the data before running the algorithm;
  the optimum solution barely changes.
  Analyze expected run-time over perturbations.

Sergei V. and Suresh V. Theory of Clustering


k-means: Worst-case vs Smoothed

Theorem
Smoothed complexity of k-means using Gaussian noise with variance
σ is polynomial in n and 1/σ.

Compare this to the worst-case lower bound of 2^Θ(n).

Sergei V. and Suresh V. Theory of Clustering


Bregman Smoothing

Normal (Gaussian) smoothing doesn't work! Many Bregman divergences are only
defined on a restricted domain, e.g. the probability simplex

    ∆_n = {(x_1, . . . , x_n) | Σ_i x_i = 1}

and Gaussian noise can push points outside that domain.

Sergei V. and Suresh V. Theory of Clustering


Bregman smoothing

A more general notion of smoothing:
  the perturbation should stay close to a hyperplane
  the density of the perturbation is proportional to 1/σ^d

Sergei V. and Suresh V. Theory of Clustering


Bregman smoothing: Results

Theorem ([MR09])
For "well-behaved" Bregman divergences, the smoothed complexity is
bounded by poly(n^√k, 1/σ) and by k^{kd} · poly(n, 1/σ).

This is in comparison to the worst-case bound of 2^Ω(n).

Sergei V. and Suresh V. Theory of Clustering


Performance Analysis

The second question: quality of the solution.

Problem (Quality of Solution)
Let OPT denote the optimal clustering that minimizes the average
sum of (Bregman) distances to cluster centers. How close to OPT is
the solution returned by (Bregman) k-means?

Sergei V. and Suresh V. Theory of Clustering


Optimality and Approximations

Problem
Given x_1, . . . , x_n and a parameter k, find k centers c_1, . . . , c_k such that

    Σ_{i=1}^{n} min_{1≤j≤k} d(x_i, c_j)

is minimized.

Problem (c-approximation)
Let OPT be the value of the optimal solution above. Fix c > 0. Find centers
c′_1, . . . , c′_k such that if A = Σ_{i=1}^{n} min_{1≤j≤k} d(x_i, c′_j), then

    OPT ≤ A ≤ c · OPT

Sergei V. and Suresh V. Theory of Clustering


k-means++: Initialize carefully!

Initialization
Let distance from x to nearest cluster center be D(x)
Pick x as new center with probability

p(x) ∝ D²(x)

Properties of solution:
For arbitrary data, this gives O(log n)-approximation
For “well-separated data”, this gives constant
(O(1))-approximation.

Sergei V. and Suresh V. Theory of Clustering


What is 'well-separated'?

Informally, data is (k, α)-well separated if the best clustering that
uses k − 1 clusters has cost that is ≥ 1/α · OPT.

Sergei V. and Suresh V. Theory of Clustering


Bregman k-means++

Initialization
Let Bregman divergence from x to nearest cluster center be
D(x)
Pick x as new center with probability

p(x) ∝ D(x)

Run algorithm as before.

Theorem ([AB09, AB10])


O(1)-approximation for (k, α)-separated sets.
O(log n) approximation in general.

Sergei V. and Suresh V. Theory of Clustering
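
A sketch of this initialization (ours): identical in shape to the k-means++ seeding sketched earlier, except that the sampling weight is D(x) itself, the Bregman divergence from x to its nearest chosen center.

import numpy as np

def bregman_seeding(X, k, divergence, rng=np.random.default_rng()):
    """Pick k initial centers, each subsequent one with probability proportional
    to D(x), the Bregman divergence from x to its nearest center so far."""
    n = len(X)
    centers = [X[rng.integers(n)]]
    d = np.array([divergence(x, centers[0]) for x in X])
    for _ in range(k - 1):
        idx = rng.choice(n, p=d / d.sum())
        centers.append(X[idx])
        d = np.minimum(d, np.array([divergence(x, X[idx]) for x in X]))
    return np.array(centers)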


Stability in clustering

Sergei V. and Suresh V. Theory of Clustering


Target and Optimal clustering

[Figure: a clustering C compared with the target clustering OPT and the optimal
clustering C*, with distances d(OPT, C*) and d_q(OPT, C).]

Two measures of cost:

Distance between clusterings C, C*:

    d(C, C*) = fraction of points on which they disagree

(Quality) distance from C to OPT:

    d_q(C, OPT) = cost(C) / cost(OPT)

Can closeness in d_q imply closeness in d?

Sergei V. and Suresh V. Theory of Clustering
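
A small sketch of the disagreement distance d(C, C*) (ours): since cluster labels are arbitrary, the disagreement is minimized over all relabelings, done here by brute force and therefore only sensible for small k.

import numpy as np
from itertools import permutations

def disagreement_distance(labels_a, labels_b, k):
    """Fraction of points on which two clusterings (given as label arrays over
    the same points) disagree, minimized over all relabelings of the k ids."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    n = len(labels_a)
    best = n
    for perm in permutations(range(k)):
        relabeled = np.array([perm[c] for c in labels_b])
        best = min(best, int(np.sum(labels_a != relabeled)))
    return best / n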


NP-hardness

NP-hardness is an obstacle to finding good clusterings.


k-means and k-median are NP-hard, and hard to approximate
in general graphs
k-means, k-median can be approximated in Rd but seem to
need time exponential in d
Same is true for Bregman clustering[CM08]

Sergei V. and Suresh V. Theory of Clustering


Target And Optimal Clusterings

What happens if the target clustering and the optimal clustering are not
the same?

[Figure: OPT vs. C*, measuring d_q and d.]

The two distance functions might be incompatible.

Sergei V. and Suresh V. Theory of Clustering


Stability Of Clusterings

An instance is stable if approximating the cost function gives us a
solution close to the target clustering.

  View 1: If we perturb the inputs, the output should not change.
  View 2: If we change the distance function, the output should not change.
  View 3: If we change the required quality (cost) of the solution, the output
  should not change.

Sergei V. and Suresh V. Theory of Clustering


Stability I: Perturbing Inputs

Well separated sets:

Data is (k, α)-well separated if the best clustering that uses k − 1


clusters has cost that is ≥ 1/α · OPT.

Two interesting properties[ORSS06]:


All optimal clusterings mostly look the same: dq small ⇒ d
small.
Small perturbations of the data don’t change this property.

Computationally, well-separatedness makes k-means work well


Sergei V. and Suresh V. Theory of Clustering
Stability II: Perturbing Distance Function

Definition (α-perturbation-resilience [BL09])
A clustering instance (P, d) is α-perturbation-resilient if the optimal
clustering is identical to the optimal clustering for any (P, d′), where

    d(x, y)/α ≤ d′(x, y) ≤ d(x, y) · α

  The smaller the α, the more resilient the instance (and the more "stable")
  Center-based clustering problems (k-median, k-means, k-center) can be
  solved optimally for 3-perturbation-resilient inputs [ABS10]

Sergei V. and Suresh V. Theory of Clustering


Stability III: Perturbing Quality of Solution

Definition ((c, ε)-property[BBG09])


Given an input, all clusterings that are c-approximate are also
ε-close.

Surprising facts:
Finding a c-approximation in general might be NP-hard.
Finding a c-approximation here is easy !

Sergei V. and Suresh V. Theory of Clustering


Proof Idea

  If near-optimal clusterings are close to the true answer, then the clusters
  must be well-separated.
  If clusters are well-separated, then choosing the right threshold
  separates them cleanly.
  Important that ALL near-optimal clusterings are close to the true answer.

Sergei V. and Suresh V. Theory of Clustering


Main Result

Theorem
In polynomial time, we can find a clustering that is O(ε)-close to the
target clustering, even if finding a c-approximation is NP-hard.

Sergei V. and Suresh V. Theory of Clustering


Generalization

Strong assumption: ALL near-optimal clusterings are close to the true answer.

Variant [ABS10]: Only consider Voronoi-based clusterings, where
each point is assigned to its nearest cluster center.

Same results hold as for the previous case.

Sergei V. and Suresh V. Theory of Clustering


Wrap Up

Sergei V. and Suresh V. Theory of Clustering


We understand much more about the behavior of k-means,
and why it does well in practice.
A simple initialization procedure for k-means is both effective
and gives provable guarantees
Much of the theoretical machinery around k-means works for
the generalization to Bregman divergences.
New and interesting questions on the relationship between
the target clustering and cost measures used to get near it:
ways of subverting NP-hardness.

Sergei V. and Suresh V. Theory of Clustering


Thank You

Slides for this tutorial can be found at

http://www.cs.utah.edu/~suresh/web/2010/05/08/
new-developments-in-the-theory-of-clustering-tutorial/

Research on this tutorial was partially supported by NSF CCF-0953066

Sergei V. and Suresh V. Theory of Clustering


References I

Marcel R. Ackermann and Johannes Blömer.


Coresets and approximate clustering for Bregman divergences.
In Mathieu [Mat09], pages 1088–1097.

Marcel R. Ackermann and Johannes Blömer.


Bregman clustering for separable instances.
In Kaplan [Kap10], pages 212–223.

P. Awasthi, A. Blum, and O. Sheffet.


Clustering Under Natural Stability Assumptions.
Computer Science Department, page 123, 2010.

Ankit Aggarwal, Amit Deshpande, and Ravi Kannan.


Adaptive sampling for k-means clustering.
In APPROX ’09 / RANDOM ’09: Proceedings of the 12th International Workshop and 13th International Workshop on
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 15–28, Berlin,
Heidelberg, 2009. Springer-Verlag.

Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni.


Streaming k-means approximation.
In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information
Processing Systems 22, pages 10–18. 2009.

David Arthur, Bodo Manthey, and Heiko Röglin.


k-means has polynomial smoothed complexity.
In FOCS ’09: Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages
405–414, Washington, DC, USA, 2009. IEEE Computer Society.

Sergei V. and Suresh V. Theory of Clustering


References II

David Arthur and Sergei Vassilvitskii.


k-means++: the advantages of careful seeding.
In SODA ’07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027–1035,
Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.

Maria-Florina Balcan, Avrim Blum, and Anupam Gupta.


Approximate clustering without the approximation.
In Mathieu [Mat09], pages 1068–1077.

Yonatan Bilu and Nathan Linial.


Are stable instances easy?
CoRR, abs/0906.3162, 2009.

Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh.


Clustering with Bregman divergences.
Journal of Machine Learning Research, 6:1705–1749, 2005.

Kamalika Chaudhuri and Andrew McGregor.


Finding metric structure in information theoretic clustering.
In Servedio and Zhang [SZ08], pages 391–402.

Yingfei Dong, Ding-Zhu Du, and Oscar H. Ibarra, editors.


Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16-18,
2009. Proceedings, volume 5878 of Lecture Notes in Computer Science. Springer, 2009.

S. Guha, N. Mishra, R. Motwani, and L. O’Callaghan.


Clustering data streams.
In FOCS ’00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, page 359, Washington,
DC, USA, 2000. IEEE Computer Society.

Sergei V. and Suresh V. Theory of Clustering


References III

Haim Kaplan, editor.


Algorithm Theory - SWAT 2010, 12th Scandinavian Symposium and Workshops on Algorithm Theory, Bergen,
Norway, June 21-23, 2010. Proceedings, volume 6139 of Lecture Notes in Computer Science. Springer, 2010.

Claire Mathieu, editor.


Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA,
January 4-6, 2009. SIAM, 2009.

Bodo Manthey and Heiko Röglin.


Worst-case and smoothed analysis of k-means clustering with Bregman divergences.
In Dong et al. [DDI09], pages 1024–1033.

Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy.


The effectiveness of Lloyd-type methods for the k-means problem.
In FOCS ’06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 165–176,
Washington, DC, USA, 2006. IEEE Computer Society.

Rocco A. Servedio and Tong Zhang, editors.


21st Annual Conference on Learning Theory - COLT 2008, Helsinki , Finland, July 9-12, 2008. Omnipress, 2008.

Andrea Vattani.
k-means requires exponentially many iterations even in the plane.
In SCG ’09: Proceedings of the 25th annual symposium on Computational geometry, pages 324–332, New York, NY,
USA, 2009. ACM.

Sergei V. and Suresh V. Theory of Clustering
