Clarans Clustering

This document discusses efficient clustering methods for spatial data mining. It introduces CLARANS, an effective clustering algorithm that improves upon PAM and CLARA. CLARANS uses a randomized search of a graph to find high quality clusters faster than exhaustive searches. The document also presents the SD and NSD approaches which apply either spatial or non-spatial clustering first before generalizing the other attributes, and observes that quality is dependent on the initial dataset and query.


Efficient and Effective Clustering Methods for Spatial Data Mining
Raymond T. Ng, Jiawei Han

Overview
Spatial Data Mining
Clustering techniques
CLARANS
Spatial and Non-Spatial dominant CLARANS
Observations
Summary

Spatial Data Mining
Identifying interesting relationships and characteristics that may exist implicitly in spatial databases
Different from relational databases
Spatial objects store both spatial and non-spatial attributes
Queries (e.g., "All Walmart stores within 10 miles of UH")
Spatial joins, which work on spatial indexes (R-tree)
Huge sizes (terabytes)
GIS is a classic example

Partitioning Methods
Given K, the number of partitions to create, a partitioning method constructs initial partitions. It then iteratively refines the quality of these clusters so as to maximize intra-cluster similarity and inter-cluster dissimilarity.
[Quality of Clustering]: Average dissimilarity of objects from their cluster centers (medoids)

Selected algorithms:
1. K-medoids
2. PAM
3. CLARA
4. CLARANS
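
The quality measure above can be made concrete with a small sketch. It assumes Euclidean distance as the dissimilarity (the slides do not fix one), and the function name and sample points are illustrative only.

```python
import math

def average_dissimilarity(points, medoids):
    """Mean distance from each point to its closest medoid (lower is better)."""
    total = sum(min(math.dist(p, m) for m in medoids) for p in points)
    return total / len(points)

# Two tight groups of two points each (made-up data).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

# One medoid per group versus both medoids in the same group.
good = average_dissimilarity(points, [(0, 0), (10, 0)])
bad = average_dissimilarity(points, [(0, 0), (0, 2)])
print(good, bad)
```

Placing one medoid in each group gives a much lower average dissimilarity than placing both in the same group, which is the sense in which the partitioning methods listed here compare candidate medoid sets.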

K-Medoids
Partition-based clustering (K partitions)
Effective, why?
Resistant to outliers
Does not depend on the order in which data points are examined
Cluster center is part of the dataset, unlike k-means, where the cluster center is gravity based
Experiments show that large data sets are handled efficiently
[Figure: two scatter plots on 0-10 axes, one labeled K-medoids and one labeled K-means]
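
A minimal sketch of why a medoid resists outliers while a gravity-based center does not. Euclidean distance and the sample cluster are assumptions made for illustration.

```python
import math

def mean_center(points):
    """Gravity-based center as used by k-means; need not be a data point."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def medoid(points):
    """The data point with the smallest total distance to all other points."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Four nearby points plus one far outlier (made-up data).
cluster = [(1, 1), (2, 1), (1, 2), (2, 2), (50, 50)]
center = mean_center(cluster)
representative = medoid(cluster)
print(center, representative)
```

The outlier drags the mean far away from the four nearby points, while the medoid remains one of them.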
PAM (Partitioning Around Medoids)
[Goal]: Find K representative objects of the data set. Each of the K objects is called a medoid, the most centrally located object within a cluster.

PAM (2)
Start with K data points designated as medoids. Create a cluster around each medoid by assigning data points to the closest medoid.
$O_j$ belongs to $O_i$ if $d(O_j, O_i) = \min_{O_e} d(O_j, O_e)$
Iteratively replace $O_i$ with $O_h$ if the quality of clustering improves.
Swapping cost, $C_{ijh}$, is associated with replacing a selected object $O_i$ with a non-selected object $O_h$
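
The swap loop can be sketched as follows. This is a simplified illustration, not the original implementation: it assumes Euclidean distance and distinct points, and it evaluates each swap by recomputing the total cost instead of summing per-object swapping costs, which is easier to read but slower.

```python
import math
from itertools import product

def total_cost(points, medoids):
    """Sum of distances from each point to its closest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, initial_medoids):
    """Repeatedly apply the best improving (medoid, non-medoid) swap."""
    medoids = list(initial_medoids)
    cost = total_cost(points, medoids)
    while True:
        best = None
        for i, h in product(range(len(medoids)), points):
            if h in medoids:
                continue  # only non-selected objects may be swapped in
            candidate = medoids[:i] + [h] + medoids[i + 1:]
            c = total_cost(points, candidate)
            if c < cost and (best is None or c < best[0]):
                best = (c, candidate)
        if best is None:
            return medoids, cost  # no swap improves the clustering
        cost, medoids = best

# Made-up data: two groups of three points, with a poor starting choice.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = pam(points, [(0, 1), (1, 0)])
print(medoids, cost)
```

Starting with both medoids in the same group, the loop ends with one medoid in each group.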

PAM (3)
* $O(k(n-k)^2)$ for each iteration
* Good for small data sets (n=100, k=5)

CLARA (Clustering LARge Applications)

Improvement over PAM

Finds medoids in a sample from the dataset
[Idea]: If the samples are sufficiently random, the medoids of the sample approximate the medoids of the dataset
[Heuristics]: 5 samples of size 40+2k give satisfactory results
Works well for large datasets (n=1000, k=10)
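
A rough sketch of the sampling idea, using the 5 samples of size 40+2k mentioned above. The helper `k_medoids_on_sample` is a simple swap-based local search standing in for PAM; it, the seed parameter, and the test data are assumptions for illustration, and points are assumed distinct.

```python
import math
import random

def total_cost(points, medoids):
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def k_medoids_on_sample(sample, k, rng):
    """Swap-based local search on the sample (a stand-in for PAM)."""
    medoids = rng.sample(sample, k)
    cost = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in sample:
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                c = total_cost(sample, candidate)
                if c < cost:
                    medoids, cost, improved = candidate, c, True
    return medoids

def clara(points, k, num_samples=5, seed=0):
    rng = random.Random(seed)
    sample_size = min(len(points), 40 + 2 * k)
    best_medoids, best_cost = None, math.inf
    for _ in range(num_samples):
        sample = rng.sample(points, sample_size)
        medoids = k_medoids_on_sample(sample, k, rng)
        cost = total_cost(points, medoids)  # judged on the full dataset
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

# Made-up data: two well-separated 10x10 grids of points (n = 200).
data = [(x, y) for x in range(10) for y in range(10)]
data += [(x + 100, y) for x in range(10) for y in range(10)]
clara_medoids, clara_cost = clara(data, k=2)
print(clara_medoids, clara_cost)
```

Only the samples are searched, but each candidate medoid set is scored against the whole dataset, so a poor sample is simply outvoted by a better one.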

CLARANS (Clustering Large Applications based on RANdomized Search)

A graph abstraction, $G_{n,k}$
Each vertex is a collection of k medoids, $\{O_{m_1}, \ldots, O_{m_k}\}$
Two nodes $S_1$ and $S_2$ are neighbors if $|S_1 \cap S_2| = k - 1$
Each node has k(n-k) neighbors
Cost of each node is the total dissimilarity of objects to their medoids
PAM searches the whole graph
CLARA searches a subgraph
[Figure: graph whose nodes are medoid sets {Oa1, ..., Oak}, {Ob1, ..., Obk}, {Oc1, ..., Ock}, {Od1, ..., Odk}, with neighboring nodes S1 and S2]
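
The neighbor relation can be checked on a toy instance. The object names and the `neighbors` helper below are hypothetical and serve only to illustrate the k(n-k) count.

```python
def neighbors(node, objects):
    """All medoid sets that differ from `node` in exactly one object."""
    node = frozenset(node)
    return [(node - {m}) | {h} for m in node for h in objects if h not in node]

objects = ["O1", "O2", "O3", "O4", "O5"]  # n = 5
node = {"O1", "O2"}                       # k = 2
adjacent = neighbors(node, objects)
print(len(adjacent))
```

With n = 5 and k = 2 the node has k(n-k) = 6 neighbors, each sharing exactly k-1 = 1 medoid with it.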

CLARANS (2)

Experimental values

• numLocal = 2
• maxNeighbors = max(1.25% of k(n-k), 250)
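
A minimal sketch of the randomized search using the two parameters above. It assumes Euclidean distance, distinct points, and k < n; the function signature and seed handling are illustrative choices, not taken from the slides.

```python
import math
import random

def total_cost(points, medoids):
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def clarans(points, k, num_local=2, max_neighbors=None, seed=0):
    rng = random.Random(seed)
    n = len(points)
    if max_neighbors is None:
        max_neighbors = max(int(0.0125 * k * (n - k)), 250)
    best_node, best_cost = None, math.inf
    for _ in range(num_local):
        current = rng.sample(points, k)  # random starting node
        current_cost = total_cost(points, current)
        tried = 0
        while tried < max_neighbors:
            # Random neighbor: swap one medoid for one non-medoid.
            i = rng.randrange(k)
            h = rng.choice(points)
            if h in current:
                continue
            neighbor = current[:i] + [h] + current[i + 1:]
            cost = total_cost(points, neighbor)
            if cost < current_cost:
                # Move to the better neighbor and restart the count.
                current, current_cost, tried = neighbor, cost, 0
            else:
                tried += 1
        if current_cost < best_cost:
            best_node, best_cost = current, current_cost
    return best_node, best_cost

# Made-up data: two groups of three points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
found, found_cost = clarans(sample_points, k=2)
print(found, found_cost)
```

Unlike an exhaustive neighbor scan, each step examines only randomly drawn neighbors and gives up on a node after maxNeighbors consecutive failures; numLocal controls how many such local searches are run.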

CLARANS (3)
Outperforms PAM and CLARA in terms of running time and quality of clustering
$O(n^2)$ for each iteration
[Figures: CLARANS vs CLARA; CLARANS vs PAM]

Generalization
Useful to mine non-spatial attributes
Process of merging tuples based on a concept hierarchy
DBLearn takes an SQL query, a generalization hierarchy, and a threshold as input
Example relation: Sphere(color, diameter)
[Figure: initial relation and generalized relation]
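
One level of generalization on the Sphere(color, diameter) relation might look like the sketch below. The concept hierarchies, the size cut-off, and the tuples are all invented for illustration, and the loop that keeps climbing the hierarchy until the tuple count falls under the threshold is left out.

```python
from collections import Counter

# Hypothetical concept hierarchy for color (not from the slides).
COLOR_PARENT = {"red": "warm", "orange": "warm", "blue": "cool", "green": "cool"}

def diameter_concept(d):
    """Hypothetical hierarchy for diameter: a single size cut-off."""
    return "small" if d < 10 else "large"

def generalize(relation):
    """Replace each value by its parent concept and merge identical tuples,
    keeping a count of how many original tuples each merged tuple covers."""
    return Counter((COLOR_PARENT[color], diameter_concept(d)) for color, d in relation)

spheres = [("red", 3), ("orange", 4), ("blue", 12), ("green", 15), ("red", 20)]
generalized = generalize(spheres)
print(generalized)
```

Five initial tuples collapse into three generalized ones, each carrying a count.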

Silhouette
Silhouette of object $O_j$ determines how much $O_j$ belongs to its cluster
Between -1 and 1
1 indicates a high degree of membership
Silhouette width of a cluster: average silhouette of all objects in the cluster
Silhouette coefficient: average silhouette width of the k clusters

Silhouette width    Interpretation
0.71 – 1            Strong cluster
0.51 – 0.7          Reasonable cluster
0.26 – 0.5          Weak or artificial cluster
≤ 0.25              No cluster found
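
The slide does not spell out the silhouette formula. The sketch below assumes the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other objects in the same cluster and b is the smallest mean distance to any other cluster. It also assumes Euclidean distance and distinct points.

```python
import math

def silhouette(point, own_cluster, other_clusters):
    """Silhouette of one object under the standard (b - a) / max(a, b) definition."""
    others_in_own = [p for p in own_cluster if p != point]
    if not others_in_own:
        return 0.0  # convention for a singleton cluster
    a = sum(math.dist(point, p) for p in others_in_own) / len(others_in_own)
    b = min(sum(math.dist(point, p) for p in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)

def silhouette_coefficient(clusters):
    """Average of the per-cluster silhouette widths."""
    widths = []
    for i, cluster in enumerate(clusters):
        rest = clusters[:i] + clusters[i + 1:]
        widths.append(sum(silhouette(p, cluster, rest) for p in cluster) / len(cluster))
    return sum(widths) / len(widths)

# Made-up data: two tight, well-separated clusters.
clusters = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
coefficient = silhouette_coefficient(clusters)
print(round(coefficient, 2))
```

For these two well-separated clusters the coefficient falls in the "strong cluster" band of the table.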

SD and NSD approach
SD – Spatial Dominant
NSD – Non-Spatial Dominant
Clustering for spatial attributes / generalization for non-spatial attributes
Dominance is decided by what is carried out first (clustering or generalization)
Second phase works on tuples from the previous stage

SD(CLARANS)
Specify the learning request in the form of an SQL query; run it on the data to obtain tuples
Apply CLARANS on the spatial attributes to obtain Knat clusters
For every cluster, collect the non-spatial components and apply DBLearn
Finds non-spatial generalizations from spatial clustering
Value for Knat is determined through heuristics using the silhouette coefficients
Clustering phase can be treated as finding a spatial generalization hierarchy
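
The flow of the spatial-dominant approach can be outlined with pluggable steps. In this sketch `cluster_spatial` stands in for CLARANS and `generalize` for DBLearn; both stand-ins, the threshold values, and the data are hypothetical, and the choice of Knat is not modeled.

```python
from collections import defaultdict

def sd_approach(tuples, cluster_spatial, generalize):
    """tuples: (spatial, non_spatial) pairs already selected by the SQL query."""
    # Phase 1: cluster on the spatial attributes only.
    labels = cluster_spatial([spatial for spatial, _ in tuples])
    # Phase 2: per cluster, collect the non-spatial components and generalize them.
    groups = defaultdict(list)
    for label, (_, non_spatial) in zip(labels, tuples):
        groups[label].append(non_spatial)
    return {label: generalize(members) for label, members in groups.items()}

# Toy stand-ins, purely illustrative.
def split_by_x(coords):
    return [0 if x < 50 else 1 for x, _ in coords]

def price_band(values):
    return sorted({"cheap" if v < 100 else "expensive" for v in values})

houses = [((1, 2), 80), ((3, 1), 90), ((90, 95), 300), ((92, 91), 350)]
result = sd_approach(houses, split_by_x, price_band)
print(result)
```

Swapping the order of the two phases, so that generalization runs first and clustering second, gives the outline of the non-spatial-dominant variant.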

NSD(CLARANS)
Finds spatial clusters from non-spatial generalizations
Clusters may overlap

Observations
In all previous methods, the quality of mining depends on the SQL query
CLARANS assumes that the entire dataset is in memory, which is not always the case for large data sets
Quality of results cannot be guaranteed when N is very large, because of the randomized search

Observations (2)
Other clustering algorithms proposed for spatial data mining
Hierarchical: BIRCH
Density based: DBSCAN, GDBSCAN, DBRS
Grid based: STING

Summary
A seminal paper on the use of clustering for spatial data mining
CLARANS is an effective clustering technique for large datasets
SD(CLARANS)/NSD(CLARANS) are effective spatial data mining algorithms

References
Primary
Efficient and Effective Clustering Methods for Spatial Data Mining (1994) - Raymond T. Ng, Jiawei Han
Secondary
CLARANS: A Method for Clustering Objects for Spatial Data Mining - Raymond T. Ng, Jiawei Han
Clustering for Mining in Large Spatial Databases - Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
An Introduction to Spatial Database Systems - Ralf Hartmut Güting
