
Lecture 10

Clustering

Preview

• Introduction
• Partitioning methods
• Hierarchical methods
• Model-based methods
• Density-based methods

Examples of Clustering Applications
• Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
• Land use: Identification of areas of similar land use in an earth-observation database
• Insurance: Identifying groups of motor insurance policyholders with a high average claim cost
• Urban planning: Identifying groups of houses according to their house type, value, and geographical location
• Seismology: Observed earthquake epicenters should be clustered along continental faults

What Is a Good Clustering?

• A good clustering method will produce clusters with
  - High intra-class similarity
  - Low inter-class similarity
• A precise definition of clustering quality is difficult
  - Application-dependent
  - Ultimately subjective

Requirements for Clustering
in Data Mining
• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal domain knowledge required to determine input parameters
• Ability to deal with noise and outliers
• Insensitivity to the order of input records
• Robustness with respect to high dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability

Similarity and Dissimilarity
Between Objects
• Same measures we used for IBL (e.g., the Lp norm)
• Euclidean distance (p = 2):

  d(i, j) = \sqrt{ |x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2 }

• Properties of a metric d(i, j):
  - d(i, j) >= 0
  - d(i, i) = 0
  - d(i, j) = d(j, i)
  - d(i, j) <= d(i, k) + d(k, j)

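A minimal sketch (not from the slides) of the Lp and Euclidean distances defined above, in Python with NumPy; the function and variable names are my own.

```python
import numpy as np

def lp_distance(x, y, p=2):
    """Minkowski (Lp) distance; p = 2 gives the Euclidean distance d(i, j)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

i, j = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(lp_distance(i, j))        # Euclidean (p = 2): sqrt(9 + 16 + 0) = 5.0
print(lp_distance(i, j, p=1))   # Manhattan (p = 1): 3 + 4 + 0 = 7.0
```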
Major Clustering Approaches

• Partitioning: Construct various partitions and then evaluate them by some criterion
• Hierarchical: Create a hierarchical decomposition of the set of objects using some criterion
• Model-based: Hypothesize a model for each cluster and find the best fit of the models to the data
• Density-based: Guided by connectivity and density functions

Partitioning Algorithms

• Partitioning method: Construct a partition of a database D of n objects into a set of k clusters
• Given k, find a partition into k clusters that optimizes the chosen partitioning criterion
  - Global optimum: exhaustively enumerate all partitions
  - Heuristic methods: the k-means and k-medoids algorithms
  - k-means (MacQueen, 1967): Each cluster is represented by the center of the cluster
  - k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw, 1987): Each cluster is represented by one of the objects in the cluster

K-Means Clustering

• Given k, the k-means algorithm consists of four steps (a minimal code sketch follows below):
  1. Select initial centroids at random.
  2. Assign each object to the cluster with the nearest centroid.
  3. Compute each centroid as the mean of the objects assigned to it.
  4. Repeat the previous two steps until nothing changes.

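The four steps above, written out as a short NumPy sketch; this is an illustrative implementation under the slide's assumptions (Euclidean distance, random initial centroids), not production code.

```python
import numpy as np

def k_means(X, k, rng=np.random.default_rng(0), max_iter=100):
    X = np.asarray(X, dtype=float)
    # Step 1: select initial centroids at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of the objects assigned to it
        new_centroids = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                  else centroids[c] for c in range(k)])
        # Step 4: repeat until nothing changes
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Empty clusters are handled here by simply keeping the old centroid; other tie-breaking choices are equally valid.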
K-Means Clustering (contd.)
• Example: [Figure: four scatter plots showing the points, cluster assignments, and centroids over successive k-means iterations on a 2-D data set]

Comments on the K-Means Method
• Strengths
  - Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations. Normally, k, t << n.
  - Often terminates at a local optimum. The global optimum may be found using techniques such as simulated annealing and genetic algorithms.
• Weaknesses
  - Applicable only when a mean is defined (what about categorical data?)
  - Need to specify k, the number of clusters, in advance
  - Trouble with noisy data and outliers
  - Not suited to discovering clusters with non-convex shapes

Hierarchical Clustering
• Uses the distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it needs a termination condition.

[Figure: agglomerative clustering (AGNES) proceeds from Step 0 to Step 4, merging objects a-e bottom-up (a, b -> ab; d, e -> de; c, de -> cde; ab, cde -> abcde), while divisive clustering (DIANA) runs the same steps in reverse, splitting abcde back into single objects]

AGNES (Agglomerative Nesting)
• Produces a tree of clusters (nodes)
• Initially, each object is a cluster (leaf)
• Recursively merges the nodes that have the least dissimilarity
• Criteria: minimum distance, maximum distance, average distance, center distance
• Eventually all nodes belong to the same cluster (root); a naive sketch of this merge loop appears below

[Figure: three scatter plots of the same 2-D data set, illustrating successive agglomerative merge steps]

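An illustrative AGNES-style merge loop using the "minimum distance" (single-link) criterion mentioned above; a naive O(n^3) sketch with names of my own choosing, not the optimized algorithm.

```python
import numpy as np
from itertools import combinations

def agnes_single_link(X):
    """Return the sequence of merges (cluster ids and the merge distance)."""
    X = np.asarray(X, dtype=float)
    clusters = {i: [i] for i in range(len(X))}   # each object starts as its own leaf
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the least dissimilarity
        # (single link: minimum point-to-point distance)
        (a, b), d = min(
            (((a, b), min(np.linalg.norm(X[p] - X[q])
                          for p in clusters[a] for q in clusters[b]))
             for a, b in combinations(clusters, 2)),
            key=lambda pair: pair[1])
        merges.append((a, b, d))
        clusters[a] = clusters[a] + clusters.pop(b)   # merge cluster b into cluster a
    return merges   # the merge sequence defines the tree of clusters
```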
A Dendrogram Shows How the
Clusters are Merged Hierarchically

Decompose data objects into several levels of nested


partitioning (tree of clusters), called a dendrogram.

A clustering of the data objects is obtained by cutting the


dendrogram at the desired level. Then each connected
component forms a cluster.

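A small sketch of cutting a dendrogram at a desired level, assuming SciPy is available; the toy data and the choice of three clusters are mine.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).random((20, 2))     # toy 2-D data
Z = linkage(X, method='single')                  # agglomerative merge tree (the dendrogram)
labels = fcluster(Z, t=3, criterion='maxclust')  # cut so that at most 3 clusters remain
print(labels)                                    # cluster id of each of the 20 objects
```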
DIANA (Divisive Analysis)

• Works in the inverse order of AGNES
• Starts with a root cluster containing all objects
• Recursively divides it into subclusters
• Eventually each cluster contains a single object

[Figure: three scatter plots of the same 2-D data set, illustrating successive divisive split steps]

Other Hierarchical Clustering Methods
• Major weaknesses of agglomerative clustering methods
  - Do not scale well: time complexity of at least O(n^2), where n is the total number of objects
  - Can never undo what was done previously
• Integration of hierarchical with distance-based clustering
  - BIRCH: uses a CF-tree and incrementally adjusts the quality of sub-clusters
  - CURE: selects well-scattered points from each cluster and then shrinks them towards the center of the cluster by a specified fraction

BIRCH
• BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies (Zhang, Ramakrishnan & Livny, 1996)
• Incrementally constructs a CF (Clustering Feature) tree
  - Parameters: maximum diameter, maximum number of children
  - Phase 1: scan the database to build an initial in-memory CF tree (each node stores the number of points, their sum, and their sum of squares)
  - Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree
• Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans
• Weaknesses: handles only numeric data, and is sensitive to the order of the data records

Clustering Feature Vector

Clustering Feature: CF = (N, LS, SS)

  N:  number of data points
  LS: \sum_{i=1}^{N} X_i  (linear sum of the points)
  SS: \sum_{i=1}^{N} X_i^2  (sum of the squared coordinates)

Example: for the five points (3,4), (2,6), (4,5), (4,7), (3,8), CF = (5, (16,30), (54,190)).

[Figure: the five example points plotted in a 2-D scatter plot]

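A small sketch computing the clustering feature for the five example points above, plus the additive merge of two CFs that BIRCH relies on; the helper names are mine.

```python
import numpy as np

points = np.array([(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)], dtype=float)

N  = len(points)                 # number of data points
LS = points.sum(axis=0)          # linear sum of the points
SS = (points ** 2).sum(axis=0)   # per-dimension sum of squares
print(N, LS, SS)                 # 5 [16. 30.] [ 54. 190.]  ->  CF = (5, (16, 30), (54, 190))

def merge_cf(cf1, cf2):
    """CF entries are additive: CF1 + CF2 summarizes the union of two sub-clusters."""
    (n1, ls1, ss1), (n2, ls2, ss2) = cf1, cf2
    return n1 + n2, ls1 + ls2, ss1 + ss2
```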
CF Tree

[Figure: a CF tree with branching factor B = 7 and leaf capacity L = 6. The root and the non-leaf nodes hold CF entries (CF1, CF2, CF3, ...) with pointers to child nodes; the leaf nodes hold CF entries and are chained together by prev/next pointers]

CURE (Clustering Using REpresentatives)

• CURE handles non-spherical clusters and is robust with respect to outliers
• Uses multiple representative points to evaluate the distance between clusters
• Stops the creation of the cluster hierarchy once a level consists of k clusters

Drawbacks of Distance-Based Methods

• Drawbacks of square-error-based clustering methods
  - They consider only one point as the representative of a cluster
  - They work well only for convex clusters of similar size and density, and only if k can be reasonably estimated

Cure: The Algorithm

• Draw a random sample of size s
• Partition the sample into p partitions of size s/p each
• Partially cluster each partition into s/pq clusters
• Cluster the partial clusters, shrinking the representatives towards the centroid (a sketch of the shrinking step follows below)
• Label the data on disk

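A hedged sketch of the "shrinking representatives towards the centroid" step mentioned above; alpha stands for the shrink fraction, and the function name is my own.

```python
import numpy as np

def shrink_representatives(reps, alpha=0.2):
    """Move each well-scattered representative point a fraction alpha towards
    the cluster's gravity center; this damps the influence of outliers."""
    reps = np.asarray(reps, dtype=float)
    center = reps.mean(axis=0)
    return reps + alpha * (center - reps)
```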
Data Partitioning and Clustering
• s = 50
• p = 2
• s/p = 25
• s/pq = 5

[Figure: scatter plots illustrating the sampled points, the two partitions, and the partial clusters found within each partition]

Cure: Shrinking Representative Points
[Figure: two scatter plots showing a cluster's representative points before and after shrinking towards the center]

• Shrink the multiple representative points towards the gravity center by a fraction α.
• The multiple representatives capture the shape of the cluster.

Model-Based Clustering

• Basic idea: Clustering as probability estimation
• One model for each cluster
• Generative model:
  - Probability of selecting a cluster
  - Probability of generating an object in the cluster
• Find the maximum-likelihood or MAP model
  - Missing information: cluster membership
  - Use the EM algorithm
• Quality of clustering: likelihood of test objects

Mixtures of Gaussians

• Cluster model: normal distribution (mean, covariance)
• Assume: diagonal covariance, known variance, the same for all clusters
• Maximum likelihood: mean = average of the samples
• But which points are samples of a given cluster?
  - Estimate the probability that each point belongs to each cluster
  - Mean = weighted average of the points, with weight = probability
• But to estimate the probabilities we need the model
• "Chicken and egg" problem: use the EM algorithm

EM Algorithm for Mixtures

• Initialization: choose the means at random
• E step:
  - For all points and means, compute Prob(point|mean)
  - Prob(mean|point) = Prob(mean) Prob(point|mean) / Prob(point)
• M step:
  - Each mean = weighted average of the points
  - Weight = Prob(mean|point)
• Repeat until convergence (a minimal sketch is given below)

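A minimal EM sketch for the setting described on the previous slide: spherical Gaussians with a known, shared variance, uniform cluster priors, and only the means re-estimated. It is an illustration under those assumptions, not a full mixture-model implementation.

```python
import numpy as np

def em_means(X, k, sigma=1.0, n_iter=50, rng=np.random.default_rng(0)):
    X = np.asarray(X, dtype=float)
    means = X[rng.choice(len(X), size=k, replace=False)]   # initialization: random means
    for _ in range(n_iter):
        # E step: Prob(mean | point) is proportional to Prob(mean) * Prob(point | mean),
        # with uniform priors, so only the Gaussian kernel matters here
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        resp = np.exp(-sq / (2 * sigma ** 2))
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: each mean = weighted average of the points, weight = Prob(mean | point)
        means = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return means, resp
```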
EM Algorithm (contd.)

• Guaranteed to converge to a local optimum
• K-means is a special case

AutoClass

• Developed at NASA (Cheeseman & Stutz, 1988)
• Mixture of Naïve Bayes models
• Variety of possible models for Prob(attribute|class)
• Missing information: the class of each example
• Apply the EM algorithm as before
• Special case of learning a Bayes net with missing values
• Widely used in practice

COBWEB

• Grows a tree of clusters (Fisher, 1987)
• Each node contains: P(cluster), and P(attribute|cluster) for each attribute
• Objects are presented sequentially
• Options: add to an existing node, create a new node, merge nodes, split a node
• Quality measure: category utility = increase in predictability of the attributes / number of clusters (a sketch is given below)

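A hedged sketch of category utility for nominal attributes, following the "increase in predictability of the attributes divided by the number of clusters" reading above; the partition layout and helper names are my own illustration.

```python
def category_utility(partition):
    """partition: list of clusters; each cluster is a list of objects,
    each object a tuple of nominal attribute values."""
    data = [obj for cluster in partition for obj in cluster]
    n, n_attrs = len(data), len(data[0])

    def expected_matches(objs):
        # sum over attributes a and values v of P(attribute a = v)^2
        total = 0.0
        for a in range(n_attrs):
            vals = [o[a] for o in objs]
            total += sum((vals.count(v) / len(objs)) ** 2 for v in set(vals))
        return total

    baseline = expected_matches(data)
    gain = sum(len(c) / n * (expected_matches(c) - baseline) for c in partition)
    return gain / len(partition)   # predictability gain per cluster
```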
A COBWEB Tree

Neural Network Approaches

• Neuron = cluster = centroid in instance space
• Layer = level of the hierarchy
• Several competing sets of clusters in each layer
• Objects are presented to the network sequentially
• Within each set, the neurons compete to win each object
• The winning neuron is moved towards the object
• Can be viewed as a mapping from low-level features to high-level ones

Competitive Learning

Self-Organizing Feature Maps

• Clustering is also performed by having several units compete for the current object
• The unit whose weight vector is closest to the current object wins
• The winner and its neighbors learn by having their weights adjusted (see the sketch of one update step below)
• SOMs are believed to resemble processing that can occur in the brain
• Useful for visualizing high-dimensional data in 2- or 3-D space

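A hedged sketch of a single SOM training step (find the winning unit, then pull it and its grid neighbors towards the presented object). The Gaussian neighborhood and the parameter names are assumptions of mine, not from the slides.

```python
import numpy as np

def som_step(weights, grid, x, lr=0.1, radius=1.0):
    """weights: (n_units, d) weight vectors; grid: (n_units, 2) unit positions on the map."""
    dists = np.linalg.norm(weights - x, axis=1)
    winner = dists.argmin()                           # the unit closest to the object wins
    grid_dist = np.linalg.norm(grid - grid[winner], axis=1)
    h = np.exp(-grid_dist ** 2 / (2 * radius ** 2))   # neighborhood: the winner and nearby units
    return weights + lr * h[:, None] * (x - weights)  # move them towards the object
```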
Density-Based Clustering
• Clustering based on density (a local cluster criterion), such as density-connected points
• Major features:
  - Discovers clusters of arbitrary shape
  - Handles noise
  - Needs only one scan of the data
  - Needs density parameters as a termination condition
• Representative algorithms:
  - DBSCAN (Ester et al., 1996)
  - DENCLUE (Hinneburg & Keim, 1998)

Definitions (I)
• Two parameters:
  - Eps: maximum radius of the neighborhood
  - MinPts: minimum number of points in an Eps-neighborhood of a point
• N_Eps(p) = {q ∈ D | dist(p, q) <= Eps}
• Directly density-reachable: a point p is directly density-reachable from a point q wrt. Eps, MinPts iff
  1) p belongs to N_Eps(q), and
  2) q is a core point: |N_Eps(q)| >= MinPts

[Figure: p lies within the Eps-neighborhood of the core point q, with MinPts = 5 and Eps = 1 cm]

Definitions (II)

• Density-reachable:
  - A point p is density-reachable from a point q wrt. Eps, MinPts if there is a chain of points p1, ..., pn with p1 = q and pn = p such that p_{i+1} is directly density-reachable from p_i
• Density-connected:
  - A point p is density-connected to a point q wrt. Eps, MinPts if there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts

[Figure: a chain q -> p1 -> ... -> p illustrating density-reachability, and a point o from which both p and q are density-reachable, illustrating density-connectivity]

DBSCAN: Density Based Spatial
Clustering of Applications with Noise

• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise

[Figure: core, border, and outlier points of a cluster, with Eps = 1 cm and MinPts = 5]

DBSCAN: The Algorithm

• Arbitrarily select a point p.
• Retrieve all points density-reachable from p wrt Eps and MinPts.
• If p is a core point, a cluster is formed.
• If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
• Continue the process until all of the points have been processed (an illustrative sketch follows below).
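An illustrative DBSCAN sketch following the steps above, using a simple O(n^2) neighborhood search rather than a spatial index; label -1 marks noise, and the names are my own.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    X = np.asarray(X, dtype=float)
    n = len(X)
    labels = np.full(n, -1)            # -1 = noise / not yet assigned
    visited = np.zeros(n, dtype=bool)
    cluster = 0

    def region_query(p):
        # all points within the Eps-neighborhood of point p
        return np.flatnonzero(np.linalg.norm(X - X[p], axis=1) <= eps)

    for p in range(n):
        if visited[p]:
            continue
        visited[p] = True
        neighbors = list(region_query(p))
        if len(neighbors) < min_pts:   # p is not a core point (border point or noise)
            continue
        labels[p] = cluster            # p is a core point: a new cluster is formed
        i = 0
        while i < len(neighbors):      # expand the cluster with density-reachable points
            q = neighbors[i]
            if not visited[q]:
                visited[q] = True
                q_neighbors = region_query(q)
                if len(q_neighbors) >= min_pts:
                    neighbors.extend(q_neighbors)
            if labels[q] == -1:
                labels[q] = cluster
            i += 1
        cluster += 1
    return labels
```

For real use, a library implementation (e.g., scikit-learn's DBSCAN) is preferable to this sketch.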
DENCLUE: Using Density Functions
• DENsity-based CLUstEring (Hinneburg & Keim, 1998)
• Major features
  - Good for data sets with large amounts of noise
  - Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
  - Significantly faster than other algorithms (faster than DBSCAN by a factor of up to 45)
  - But needs a large number of parameters

DENCLUE
• Uses grid cells, but keeps information only about the grid cells that actually contain data points, and manages these cells in a tree-based access structure.
• Influence function: describes the impact of a data point within its neighborhood.
• The overall density of the data space can be calculated as the sum of the influence functions of all data points.
• Clusters can be determined mathematically by identifying density attractors.
• Density attractors are local maxima of the overall density function.

Influence Functions

• Example: Gaussian influence function, and the derived overall density and gradient:

  f_{Gaussian}(x, y) = e^{-d(x, y)^2 / (2\sigma^2)}

  f^D_{Gaussian}(x) = \sum_{i=1}^{N} e^{-d(x, x_i)^2 / (2\sigma^2)}

  \nabla f^D_{Gaussian}(x, x_i) = \sum_{i=1}^{N} (x_i - x) \cdot e^{-d(x, x_i)^2 / (2\sigma^2)}

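The Gaussian influence function and the overall density function above, written as a small numerical sketch; sigma and any example data are placeholders of mine.

```python
import numpy as np

def gaussian_influence(x, y, sigma=1.0):
    """Impact of data point y on location x."""
    d2 = np.sum((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

def overall_density(x, data, sigma=1.0):
    """Density at x = sum of the influence of every data point x_i on x;
    density attractors are the local maxima of this function."""
    return sum(gaussian_influence(x, xi, sigma) for xi in data)
```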
Density Attractors

Center-Defined & Arbitrary Clusters

Clustering: Summary

• Introduction
• Partitioning methods
• Hierarchical methods
• Model-based methods
• Density-based methods
