CLARANS Clustering
Efficient and Effective Clustering
Methods for Spatial
Data Mining
Raymond T. Ng, Jiawei Han
Overview
Spatial Data Mining
Clustering techniques
CLARANS
Spatial and Non-Spatial dominant
CLARANS
Observations
Summary
Spatial Data Mining
Identifying interesting relationships and
characteristics that may exist implicitly in Spatial
Databases
Different from Relational Databases
Spatial objects store both spatial and non-spatial
attributes
Queries (“All Walmart stores within 10 miles of UH”)
Spatial joins work on spatial indexes (R-tree)
Huge sizes (terabytes)
GIS is a classic example
Partitioning Methods
Given K, the number of partitions to create, a partitioning
method constructs initial partitions. It then iteratively
refines the quality of these clusters so as to maximize
intra-cluster similarity and inter-cluster dissimilarity.
[Quality of Clustering]: Average dissimilarity of objects
from their cluster centers (medoids)
Selected algorithms:
1. K-medoids
2. PAM
3. CLARA
4. CLARANS
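The quality measure above (average dissimilarity of objects from their medoids) can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance as the dissimilarity and 2-D points as tuples; the function name `average_dissimilarity` is hypothetical and not taken from the paper.

```python
import math

def average_dissimilarity(points, medoids):
    """Mean distance from each point to its nearest medoid (lower is better).

    Assumption: Euclidean distance stands in for the dissimilarity measure.
    """
    total = 0.0
    for p in points:
        # Each object is charged the distance to its closest medoid.
        total += min(math.dist(p, m) for m in medoids)
    return total / len(points)

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = average_dissimilarity(points, [(0, 0), (10, 0)])  # one medoid per group
poor = average_dissimilarity(points, [(0, 0), (0, 2)])   # both medoids in one group
print(good, poor)
```

A lower value means objects sit closer to their representatives, so the first choice of medoids is the better clustering of these four points.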
K-Medoids
Effective, why? Resistant to outliers (compared with K-means)
[Figure: scatter plot of example points, axes 0 to 10]
PAM (Partitioning Around Medoids)
[Goal]: Find K representative objects of
the data set. Each of the K objects is
called a Medoid, the most centrally
located object within a cluster.
PAM (2)
Start with K data points designated
as medoids. Create a cluster around
each medoid by assigning every data point
to its closest medoid
Oj belongs to the cluster of Oi
if $d(O_j, O_i) = \min_{O_e} d(O_j, O_e)$, where $O_e$ ranges over the current medoids
Iteratively replace Oi with Oh if the
quality of clustering improves.
A swapping cost, $C_{ijh}$, is associated with
replacing a selected object Oi with
a non-selected object Oh
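The assign-and-swap loop can be sketched as follows. This is a simplified illustration rather than the paper's formulation: instead of accumulating per-object cost terms, it recomputes the total dissimilarity before and after a swap, which gives the same total swap cost but is slower. Euclidean distance, distinct points, and the names `total_cost`, `swap_cost` and `pam` are assumptions made for the example.

```python
import math

def total_cost(points, medoids):
    # Sum of each point's distance to its nearest medoid.
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def swap_cost(points, medoids, o_i, o_h):
    """Change in total dissimilarity if medoid o_i is replaced by non-medoid o_h."""
    swapped = [o_h if m == o_i else m for m in medoids]
    return total_cost(points, swapped) - total_cost(points, medoids)

def pam(points, k):
    medoids = list(points[:k])  # arbitrary initial medoids (assumption)
    while True:
        # Evaluate every (medoid, non-medoid) pair and keep the cheapest swap.
        candidates = [
            (swap_cost(points, medoids, o_i, o_h), o_i, o_h)
            for o_i in medoids
            for o_h in points
            if o_h not in medoids
        ]
        cost, o_i, o_h = min(candidates, key=lambda c: c[0])
        if cost >= 0:
            return medoids  # no swap improves the clustering
        medoids = [o_h if m == o_i else m for m in medoids]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(pam(points, 2))
```

A negative swap cost means the replacement improves the clustering; the loop stops at the first configuration where no single swap does.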
PAM (3)
* O(k(n-k)^2) for each iteration
* Good for small data sets
(n=100, k=5)
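As a rough worked example of the per-iteration bound: there are k(n-k) candidate (medoid, non-medoid) pairs, and evaluating each one touches the n-k non-selected objects. For the small data set size quoted above this stays manageable; the second line uses a hypothetical larger size (n = 1000, k = 20), chosen only to show how quickly the count grows.

```latex
\begin{align*}
k(n-k)^2 &= 5 \times (100-5)^2 = 5 \times 9025 = 45\,125 \\
k(n-k)^2 &= 20 \times (1000-20)^2 = 20 \times 960\,400 = 19\,208\,000
\end{align*}
```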
CLARA (Clustering LARge
Applications)
CLARANS (Clustering Large Applications
based on RANdomized Search)
CLARANS (2)
Experimental values
• numLocal = 2
• maxNeighbors =
max(1.25% of k(n-k), 250)
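To show how the two parameters are used, here is a minimal sketch of a randomized search in the style of CLARANS. It assumes Euclidean distance, distinct points and k < n, and it recomputes the full cost for each neighbor rather than a cost differential; the names `clarans`, `max_neighbors` and `total_cost` and the fixed `seed` are choices made for this example, not the authors' implementation.

```python
import math
import random

def total_cost(points, medoids):
    # Sum of each point's distance to its nearest medoid.
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def max_neighbors(n, k):
    # Experimental setting from the slide: max(1.25% of k(n-k), 250).
    return max(int(0.0125 * k * (n - k)), 250)

def clarans(points, k, num_local=2, max_neighbor=None, seed=0):
    rng = random.Random(seed)
    if max_neighbor is None:
        max_neighbor = max_neighbors(len(points), k)
    best, best_cost = None, math.inf
    for _ in range(num_local):            # one local search per restart
        current = rng.sample(points, k)   # random starting set of medoids
        current_cost = total_cost(points, current)
        failures = 0
        while failures < max_neighbor:
            # A neighbor differs from the current set in exactly one medoid.
            idx = rng.randrange(k)
            candidate = rng.choice([p for p in points if p not in current])
            neighbor = current[:idx] + [candidate] + current[idx + 1:]
            cost = total_cost(points, neighbor)
            if cost < current_cost:
                current, current_cost, failures = neighbor, cost, 0
            else:
                failures += 1
        if current_cost < best_cost:      # keep the best local minimum found
            best, best_cost = current, current_cost
    return best, best_cost

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = clarans(points, 2)
print(sorted(medoids), cost)
```

Here `num_local` controls how many local minima are collected, and `max_neighbor` controls how many unsuccessful random neighbors are tried before the current set is declared a local minimum.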
CLARANS (3)
Outperforms PAM and CLARA in terms
of running time and quality of
clustering
O(n^2) for each iteration
[Figures: CLARANS vs CLARA; CLARANS vs PAM]
Generalization
Useful to mine non-spatial
attributes
Process of merging tuples
based on a concept hierarchy
DBLearn – takes an SQL query, a generalization
hierarchy and a threshold
Sphere(color, diameter)
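The idea of merging tuples along a concept hierarchy until a threshold is met can be sketched as below. This is an illustration only and not DBLearn's actual interface: the hierarchies (`COLOR_PARENT`, `diameter_class`), the cut-off of 5 for diameters and the function `generalize` are all hypothetical.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy for the color attribute.
COLOR_PARENT = {"crimson": "red", "scarlet": "red", "navy": "blue", "sky": "blue"}

def diameter_class(d):
    # Hypothetical hierarchy for diameter: numeric value -> size class.
    return "small" if d < 5 else "large"

def generalize(spheres, threshold):
    """Generalize one attribute at a time until at most `threshold`
    distinct tuples remain; identical tuples are merged and counted."""
    tuples = Counter(spheres)
    steps = [
        lambda t: (COLOR_PARENT.get(t[0], t[0]), t[1]),  # climb color hierarchy
        lambda t: (t[0], diameter_class(t[1])),          # climb diameter hierarchy
    ]
    for step in steps:
        if len(tuples) <= threshold:
            break
        merged = Counter()
        for t, count in tuples.items():
            merged[step(t)] += count
        tuples = merged
    return dict(tuples)

spheres = [("crimson", 2), ("scarlet", 2), ("navy", 8), ("sky", 9), ("crimson", 3)]
print(generalize(spheres, threshold=3))
```

With a threshold of 3, both attributes have to be generalized before the five input tuples collapse into few enough merged tuples.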
Silhouette
Silhouette of object Oj
determines how much
Oj belongs to its
cluster
Between -1 and 1
1 indicates high
degree of membership
Silhouette width of
cluster: average silhouette of
all objects in cluster

Silhouette width    Interpretation
0.71 – 1            Strong cluster
0.51 – 0.7          Reasonable cluster
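The slide does not give the formula, so the sketch below uses the standard Kaufman and Rousseeuw definition: s = (b - a) / max(a, b), where a is the average dissimilarity of the object to the rest of its own cluster and b is the smallest average dissimilarity to any other cluster. Euclidean distance, distinct points, the value 0 for singleton clusters and the function names are assumptions made for this example.

```python
import math

def silhouette(obj, own, others):
    """Silhouette of `obj`, a member of cluster `own`, given the other clusters."""
    rest = [p for p in own if p != obj]
    if not rest:
        return 0.0  # common convention for a singleton cluster (assumption)
    # a: average distance to the other members of the object's own cluster.
    a = sum(math.dist(obj, p) for p in rest) / len(rest)
    # b: average distance to the nearest other cluster.
    b = min(sum(math.dist(obj, p) for p in c) / len(c) for c in others)
    return (b - a) / max(a, b)

def silhouette_width(cluster, others):
    # Average silhouette of all objects in the cluster.
    return sum(silhouette(o, cluster, others) for o in cluster) / len(cluster)

c1 = [(0, 0), (0, 1)]
c2 = [(10, 0), (10, 1)]
width = silhouette_width(c1, [c2])
misplaced = silhouette((10, 0), [(0, 0), (0, 1), (10, 0)], [[(10, 1)]])
print(width, misplaced)
```

The two well-separated clusters give a width in the strong range of the table, while an object placed in the wrong cluster gets a negative silhouette.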
query
Apply DBLearn
Observations
In all previous methods, quality of
mining depends on the SQL query
CLARANS assumes that the entire
dataset is in memory. Not always the
case for large data sets.
Quality of results cannot be guaranteed
when N is very large – due to
Randomized Search
Observations (2)
Other clustering algorithms proposed
for Spatial Data Mining
Hierarchical: BIRCH
Density based: DBSCAN, GDBSCAN,
DBRS
Grid based: STING
Summary
A seminal paper on the use of clustering for
spatial data mining
CLARANS is an effective clustering
technique for large datasets
SD(CLARANS)/NSD(CLARANS) are
effective spatial data mining algorithms
References
Primary
Efficient and Effective Clustering Methods for
Spatial Data Mining (1994) - Raymond T. Ng, Jiawei Han
Secondary
CLARANS: A Method for Clustering Objects for
Spatial Data Mining - Raymond T. Ng, Jiawei Han
Clustering for Mining in Large Spatial Databases -
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
An Introduction to Spatial Database Systems - Ralf
Hartmut Güting