Cluster Analysis - Approach 1
Clustering approaches
Partitioning Algorithms: Basic Concept
Partitioning method: construct a partition of a database D of
n objects into a set of k clusters, such that a chosen objective
(e.g., the total dissimilarity of each object to its cluster representative) is minimized
[Figure: a typical K-Medoids run. Arbitrarily choose k objects as initial medoids; assign each remaining object to the nearest medoid; then, in a loop, randomly select a non-medoid object, compute the total cost of swapping it with a current medoid, and perform the swap if the quality is improved; repeat until there is no change.]
PAM (Partitioning Around Medoids) (1987)
PAM (Kaufman and Rousseeuw, 1987), built into S-Plus
Uses real objects to represent the clusters:
1. Select k representative objects arbitrarily
2. For each pair of a non-selected object h and a selected object i,
calculate the total swapping cost TC_ih
3. For each pair of i and h:
• If TC_ih < 0, i is replaced by h
• Then assign each non-selected object to the most similar
representative object
4. Repeat steps 2-3 until there is no change
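The swap loop above can be summarized in a short Python sketch. This is a minimal illustration, not the original S-Plus implementation; the function name pam(), the use of a precomputed dissimilarity matrix, and the random seed are my own choices:

```python
import numpy as np

def pam(dist, k, rng=np.random.default_rng(0)):
    """Minimal PAM sketch.
    dist : (n, n) precomputed pairwise dissimilarity matrix
    k    : number of clusters
    """
    n = dist.shape[0]
    medoids = list(rng.choice(n, size=k, replace=False))   # step 1: arbitrary medoids

    def total_cost(meds):
        # every object contributes its dissimilarity to its nearest medoid
        return dist[:, meds].min(axis=1).sum()

    changed = True
    while changed:                        # step 4: repeat steps 2-3 until no change
        changed = False
        for i in list(medoids):           # selected object i
            for h in range(n):            # non-selected object h
                if h in medoids:
                    continue
                candidate = [h if m == i else m for m in medoids]
                tc_ih = total_cost(candidate) - total_cost(medoids)   # step 2: TC_ih
                if tc_ih < 0:             # step 3: swap i and h if the cost decreases
                    medoids, changed = candidate, True
    labels = dist[:, medoids].argmin(axis=1)   # assign objects to the most similar medoid
    return medoids, labels
```

The double loop over (i, h) pairs is what produces the O(k(n-k)²) per-iteration cost quoted below.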
What Is the Problem with PAM?
PAM is more robust than k-means in the presence of noise and
outliers, because a medoid is less influenced by outliers or
other extreme values than a mean
PAM works efficiently for small data sets but does not scale
well to large data sets:
– O(k(n-k)²) per iteration,
where n is the number of data objects and k is the number of clusters
• Sampling-based method: CLARA (Clustering LARge Applications)
CLARA (Clustering Large Applications)
CLARA (Kaufman and Rousseeuw, 1990)
• Built into statistical analysis packages, such as S-Plus
It draws multiple samples of the data set, applies PAM on each
sample, and gives the best clustering as the output
Strength: deals with larger data sets than PAM
Weakness:
• Efficiency depends on the sample size
• A good clustering based on samples will not necessarily
represent a good clustering of the whole data set if the
sample is biased
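A minimal sketch of the CLARA idea, reusing the pam() function from the PAM sketch above; the number of samples, the sample size, the dist_fn argument, and the seed are illustrative assumptions, not values fixed by the method:

```python
import numpy as np

def clara(data, k, dist_fn, n_samples=5, sample_size=40, rng=np.random.default_rng(0)):
    """Draw several samples, run PAM on each, and keep the medoid set with the
    lowest total dissimilarity measured on the *whole* data set."""
    n = len(data)
    best_cost, best_medoids = np.inf, None
    for _ in range(n_samples):
        sample = rng.choice(n, size=min(sample_size, n), replace=False)
        # pairwise dissimilarities within the sample only
        dist = np.array([[dist_fn(data[i], data[j]) for j in sample] for i in sample])
        local_medoids, _ = pam(dist, k, rng)         # medoid positions within the sample
        medoids = sample[np.array(local_medoids)]    # map back to indices in the full data set
        # evaluate this candidate clustering on all n objects
        cost = sum(min(dist_fn(x, data[m]) for m in medoids) for x in data)
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return best_medoids, best_cost
```

Each PAM call only sees sample_size objects, which is why both efficiency and clustering quality hinge on how the samples are drawn.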
CLARANS (“Randomized” CLARA) (1994)
CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han, 1994)
CLARANS draws a sample of neighbors dynamically
The clustering process can be presented as searching a graph
where every node is a potential solution, that is, a set of k
medoids
If a local optimum is found, CLARANS starts from a new
randomly selected node in search of a new local optimum
It is more efficient and scalable than both PAM and CLARA
Focusing techniques and spatial access structures may further
improve its performance.
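A rough sketch of the randomized search described above. numlocal (number of restarts) and maxneighbor (neighbors examined before declaring a local optimum) are the two CLARANS parameters; the default values and helper names here are illustrative:

```python
import numpy as np

def clarans(dist, k, numlocal=2, maxneighbor=50, rng=np.random.default_rng(0)):
    """Search the graph whose nodes are sets of k medoids (CLARANS sketch)."""
    n = dist.shape[0]

    def cost(meds):
        return dist[:, list(meds)].min(axis=1).sum()

    best_cost, best_node = np.inf, None
    for _ in range(numlocal):                        # restart from a new random node
        current = set(rng.choice(n, size=k, replace=False).tolist())
        current_cost = cost(current)
        examined = 0
        while examined < maxneighbor:                # sample neighbors dynamically
            i = rng.choice(list(current))            # medoid to swap out
            h = rng.choice([x for x in range(n) if x not in current])
            neighbor = (current - {i}) | {h}         # neighboring node differs in one medoid
            neighbor_cost = cost(neighbor)
            if neighbor_cost < current_cost:         # move to the better neighbor
                current, current_cost, examined = neighbor, neighbor_cost, 0
            else:
                examined += 1
        if current_cost < best_cost:                 # keep the best local optimum found
            best_cost, best_node = current_cost, current
    return sorted(best_node), best_cost
```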
Hierarchical Clustering
Dendrogram: Shows How the Clusters Are Merged
Recent Hierarchical Clustering Methods
[Figure: five example points (3,4), (2,6), (4,5), (4,7), (3,8), used below to illustrate a BIRCH clustering feature]
CF-Tree in BIRCH
Clustering feature:
• summary of the statistics for a given subcluster: the 0-th, 1st, and 2nd
moments of the subcluster from the statistical point of view
• registers crucial measurements for computing clusters and utilizes
storage efficiently
[Figure: a non-leaf CF-tree node stores entries CF1, CF2, CF3, ..., each with a pointer (child1, child2, child3, ...) to the child node it summarizes]
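As a small worked example, the clustering feature CF = (N, LS, SS), i.e. the 0-th, 1st, and 2nd moments mentioned above, can be computed for the five points shown earlier; the code below is an illustration, not BIRCH's actual data structure:

```python
import numpy as np

# the five example points from the earlier figure
points = np.array([[3, 4], [2, 6], [4, 5], [4, 7], [3, 8]])

def clustering_feature(pts):
    """CF = (N, LS, SS): count, linear sum, and sum of squares of the points."""
    N = len(pts)                        # 0-th moment
    LS = pts.sum(axis=0)                # 1st moment (per dimension)
    SS = (pts ** 2).sum()               # 2nd moment (scalar form used here)
    return N, LS, SS

N, LS, SS = clustering_feature(points)
print(N, LS, SS)                        # 5 [16 30] 244

# CFs are additive, which is what lets a non-leaf CF entry summarize its children
N1, LS1, SS1 = clustering_feature(points[:3])
N2, LS2, SS2 = clustering_feature(points[3:])
assert N1 + N2 == N and (LS1 + LS2 == LS).all() and SS1 + SS2 == SS
```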
ROCK (RObust Clustering using linKs)
Major ideas:
• Use links to measure similarity/proximity
• Not distance-based
• Computational complexity:
Algorithm: sampling-based clustering
• Draw a random sample
• Cluster with links
• Label data on disk
Similarity Measure in ROCK
Traditional measures for categorical data may not work well, e.g.,
Jaccard coefficient
Example: Two groups (clusters) of transactions
• C1. <a, b, c, d, e>: {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e},
{a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}
• C2. <a, b, f, g>: {a, b, f}, {a, b, g}, {a, f, g}, {b, f, g}
Jaccard coefficient may lead to a wrong clustering result
• within C1: ranges from 0.2 ({a, b, c}, {b, d, e}) to 0.5 ({a, b, c}, {a, b, d})
• across C1 & C2: could be as high as 0.5 ({a, b, c}, {a, b, f})
Jaccard coefficient-based similarity function: sim(T1, T2) = |T1 ∩ T2| / |T1 ∪ T2|
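The Jaccard values quoted above are easy to verify; a minimal Python check (the helper name jaccard() is mine):

```python
def jaccard(t1, t2):
    """Jaccard coefficient: |T1 ∩ T2| / |T1 ∪ T2|."""
    t1, t2 = set(t1), set(t2)
    return len(t1 & t2) / len(t1 | t2)

print(jaccard({'a', 'b', 'c'}, {'b', 'd', 'e'}))  # 0.2  (both transactions are in C1)
print(jaccard({'a', 'b', 'c'}, {'a', 'b', 'd'}))  # 0.5  (both transactions are in C1)
print(jaccard({'a', 'b', 'c'}, {'a', 'b', 'f'}))  # 0.5  (one from C1, one from C2)
```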
Link Measure in ROCK
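In ROCK, two transactions are neighbors when their similarity is at least a threshold θ, and link(T1, T2) is the number of common neighbors the two transactions share. A minimal sketch of this link computation on the C1/C2 example, using the Jaccard coefficient with an illustrative θ = 0.5:

```python
from itertools import combinations

def jaccard(t1, t2):
    return len(t1 & t2) / len(t1 | t2)

def links(transactions, sim, theta=0.5):
    """link(Ti, Tj) = number of common neighbors; Ti, Tj are neighbors if sim >= theta."""
    n = len(transactions)
    neighbors = [
        {j for j in range(n) if j != i and sim(transactions[i], transactions[j]) >= theta}
        for i in range(n)
    ]
    return {(i, j): len(neighbors[i] & neighbors[j]) for i, j in combinations(range(n), 2)}

c1 = [{'a','b','c'}, {'a','b','d'}, {'a','b','e'}, {'a','c','d'}, {'a','c','e'},
      {'a','d','e'}, {'b','c','d'}, {'b','c','e'}, {'b','d','e'}, {'c','d','e'}]
c2 = [{'a','b','f'}, {'a','b','g'}, {'a','f','g'}, {'b','f','g'}]
link_counts = links(c1 + c2, jaccard)
print(link_counts[(0, 1)])    # 5 common neighbors: two C1 transactions are strongly linked
print(link_counts[(0, 10)])   # 3 common neighbors: a C1 and a C2 transaction are less linked
```

Unlike the pairwise Jaccard value, the link count looks at the neighborhood structure, so the cross-cluster pair scores lower even though its Jaccard similarity was also 0.5.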