Lecture 18

Uploaded by

sundarkonduru0


Hierarchical Clustering

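• Two broad approaches: bottom-up (agglomerative) and top-down (divisive).

[Figure: objects a, b, c, d, e. Agglomerative clustering (AGNES) runs from Step 0 to Step 4: a and b merge into ab, d and e merge into de, c joins de to form cde, and finally ab and cde merge into abcde. Divisive clustering (DIANA) runs the same diagram in reverse, from Step 0 (abcde) to Step 4 (single objects).]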

Hierarchical Agglomerative Clustering: Linkage Methods
• The single linkage method is based on the minimum distance, or the nearest neighbor rule.
• The complete linkage method is based on the maximum distance, or the furthest neighbor approach.
• In the average linkage method, the distance between two clusters is defined as the average of the distances between all pairs of objects, one taken from each cluster.
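
The three linkage rules can be written down directly. The sketch below assumes Euclidean distance between points and represents a cluster as a plain list of coordinate tuples; the function names and the two sample clusters are made up for illustration.

```python
from itertools import product
from math import dist


def single_link(c1, c2):
    """Minimum distance over all cross-cluster pairs (nearest neighbor)."""
    return min(dist(p, q) for p, q in product(c1, c2))


def complete_link(c1, c2):
    """Maximum distance over all cross-cluster pairs (furthest neighbor)."""
    return max(dist(p, q) for p, q in product(c1, c2))


def average_link(c1, c2):
    """Average distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(c1, c2)) / (len(c1) * len(c2))


# Two small illustrative clusters (hypothetical data).
cluster_1 = [(0.0, 0.0), (0.0, 1.0)]
cluster_2 = [(3.0, 0.0), (4.0, 0.0)]

print(single_link(cluster_1, cluster_2))
print(complete_link(cluster_1, cluster_2))
print(average_link(cluster_1, cluster_2))
```

For any pair of clusters the single-link value is never larger than the average-link value, which in turn is never larger than the complete-link value.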

Linkage Methods of Clustering
[Figure: three panels, each showing Cluster 1 and Cluster 2. Single linkage: minimum distance. Complete linkage: maximum distance. Average linkage: average distance.]
• Yet another distance between clusters is,

Dendrogram

• The single-link method can be seen as a graph-based method.
• Nodes are the points.
• Every pair of points has an edge whose cost is the distance between them.
• Single-link clustering is equivalent to minimum spanning tree clustering.
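
One way to see the connection is to run Kruskal's minimum spanning tree algorithm and stop early. The sketch below assumes Euclidean distances and a target number of clusters `k`; the function name `mst_clusters` and the sample points are illustrative, not from the lecture.

```python
from itertools import combinations
from math import dist


def mst_clusters(points, k):
    """Kruskal's algorithm stopped early: add cheapest edges until k components remain."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Complete graph: one edge per pair of points, sorted by distance.
    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
    components = len(points)
    for i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different clusters: merge them
            parent[ri] = rj
            components -= 1

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (9, 9)]
clusters = mst_clusters(points, 3)
print(clusters)  # [[0, 1, 2], [3, 4], [5]]
```

Each accepted edge is the smallest remaining distance between two different clusters, which is exactly a single-link merge. Stopping at `k` components gives the same result as building the full spanning tree and deleting its `k - 1` longest edges.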

Minimum spanning tree clustering

Single-link vs. Complete-link

Single link is sensitive to noise, but works well with arbitrarily shaped clusters.
AGNES (Agglomerative Nesting)
• Introduced in Kaufman and Rousseeuw (1990)
• Implemented in statistical analysis packages, e.g., S-Plus
• Uses the single-link method and the dissimilarity matrix
• Merges the nodes that have the least dissimilarity
• Continues in a non-descending fashion (the dissimilarity at which merges happen never decreases)
• Eventually all nodes belong to the same cluster
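
The merging loop can be sketched as follows. This is a naive version that works straight from a dissimilarity matrix with the single-link rule; the function name and the 5-object matrix (objects a to e as indices 0 to 4) are invented for illustration, and no attempt is made at efficiency.

```python
def agnes_single_link(d):
    """Merge the two least-dissimilar clusters until a single cluster remains.

    d is a symmetric dissimilarity matrix (list of lists).
    Returns the merge history as (cluster_a, cluster_b, dissimilarity) tuples.
    """
    clusters = [frozenset([i]) for i in range(len(d))]
    history = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single link: smallest dissimilarity across the two clusters.
                link = min(d[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or link < best[0]:
                    best = (link, x, y)
        link, x, y = best
        history.append((sorted(clusters[x]), sorted(clusters[y]), link))
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return history


# Hypothetical dissimilarities for objects a, b, c, d, e (indices 0..4).
d = [
    [0, 1, 6, 8, 9],
    [1, 0, 5, 7, 8],
    [6, 5, 0, 3, 4],
    [8, 7, 3, 0, 2],
    [9, 8, 4, 2, 0],
]
history = agnes_single_link(d)
for a, b, level in history:
    print(a, b, level)
```

With this matrix the merges are a+b, d+e, c+de, and ab+cde, at dissimilarities 1, 2, 3, and 5, so the merge levels are non-descending.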

DIANA (Divisive Analysis)
• Introduced in Kaufman and Rousseeuw (1990)
• Implemented in statistical analysis packages, e.g., S-Plus
• Works in the inverse order of AGNES
• A cluster is split according to some principle, such as the maximum Euclidean distance between the closest neighboring objects in the cluster
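
The slide leaves the splitting principle open. As one possible illustration, the sketch below uses a splinter-group heuristic of the kind usually described for DIANA: start a new group from the object with the largest average dissimilarity to the rest, then keep moving over objects that are, on average, closer to the new group. This choice of rule, the function name, and the matrix are assumptions, not taken from the lecture.

```python
def diana_split(members, d):
    """Split one cluster in two with a splinter-group heuristic (an assumed rule)."""
    rest = list(members)
    if len(rest) < 2:
        return rest, []

    def avg(i, group):
        # Average dissimilarity from object i to the other objects in group.
        others = [j for j in group if j != i]
        return sum(d[i][j] for j in others) / len(others) if others else 0.0

    # Seed the splinter group with the object farthest, on average, from the others.
    seed = max(rest, key=lambda i: avg(i, rest))
    rest.remove(seed)
    splinter = [seed]

    # Move objects over while some object is closer, on average, to the splinter group.
    while len(rest) > 1:
        gain, mover = max((avg(i, rest) - avg(i, splinter), i) for i in rest)
        if gain <= 0:
            break
        rest.remove(mover)
        splinter.append(mover)
    return sorted(rest), sorted(splinter)


# Hypothetical dissimilarities for objects a, b, c, d, e (indices 0..4).
d = [
    [0, 1, 6, 8, 9],
    [1, 0, 5, 7, 8],
    [6, 5, 0, 3, 4],
    [8, 7, 3, 0, 2],
    [9, 8, 4, 2, 0],
]
remaining, splinter = diana_split(range(5), d)
print(remaining, splinter)  # [2, 3, 4] [0, 1]
```

On this matrix the first split separates {a, b} from {c, d, e}. Applying the same function to each part in turn would continue the top-down hierarchy.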

More on Hierarchical Clustering Methods
■ Major weaknesses of agglomerative clustering methods
  ■ do not scale well: time complexity of at least $O(n^2)$, where n is the total number of objects
  ■ can never undo what was done previously
■ Integration of hierarchical with distance-based clustering
  ■ BIRCH (1996): uses a CF-tree and incrementally adjusts the quality of sub-clusters
  ■ CURE (1998): selects well-scattered points from the cluster and then shrinks them towards the center of the cluster by a specified fraction
  ■ CHAMELEON (1999): hierarchical clustering using dynamic modeling
BIRCH (1996)
■ BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies, by Zhang, Ramakrishnan, Livny (SIGMOD’96)
■ Incrementally constructs a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering
  ■ Phase 1: scan the DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve the inherent clustering structure of the data)
  ■ Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree
■ Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans
■ Weakness: handles only numeric data, and is sensitive to the order of the data records
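Clustering Feature Vector

Clustering Feature: CF = (N, LS, SS)

N: number of data points
LS: linear sum of the points, $LS = \sum_{i=1}^{N} X_i$
SS: square sum of the points, $SS = \sum_{i=1}^{N} X_i^2$

Example: the five points (3,4), (2,6), (4,5), (4,7), (3,8) give CF = (5, (16,30), (54,190)).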
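
The example can be checked with a few lines of code. This sketch keeps LS and SS per coordinate, which is how the slide's example is written; the function name `clustering_feature` is made up for illustration.

```python
def clustering_feature(points):
    """Return CF = (N, LS, SS), with LS and SS kept per coordinate."""
    n = len(points)
    # zip(*points) groups the values of each coordinate together.
    ls = tuple(sum(coords) for coords in zip(*points))
    ss = tuple(sum(x * x for x in coords) for coords in zip(*points))
    return n, ls, ss


points = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)]
cf = clustering_feature(points)
print(cf)  # (5, (16, 30), (54, 190))
```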

CF Additive Theorem
● Suppose cluster C1 has CF1 = (N1, LS1, SS1) and cluster C2 has CF2 = (N2, LS2, SS2)
● If we merge C1 with C2, the CF for the merged cluster C is
  CF = CF1 + CF2 = (N1 + N2, LS1 + LS2, SS1 + SS2)
● Why CF?
  ● Summarized info for a single cluster
  ● Summarized info for two clusters
  ● Additive theorem
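
A short check of the additive property, assuming the merged CF is the component-wise sum of the two CFs (which is what "additive" refers to). The helper names and the way the five sample points are split into two clusters are illustrative choices.

```python
def clustering_feature(points):
    """Return CF = (N, LS, SS), with LS and SS kept per coordinate."""
    n = len(points)
    ls = tuple(sum(coords) for coords in zip(*points))
    ss = tuple(sum(x * x for x in coords) for coords in zip(*points))
    return n, ls, ss


def merge_cf(cf1, cf2):
    """Merge two clusters by adding their CFs component-wise."""
    n1, ls1, ss1 = cf1
    n2, ls2, ss2 = cf2
    return (n1 + n2,
            tuple(a + b for a, b in zip(ls1, ls2)),
            tuple(a + b for a, b in zip(ss1, ss2)))


c1 = [(3, 4), (2, 6)]
c2 = [(4, 5), (4, 7), (3, 8)]
merged = merge_cf(clustering_feature(c1), clustering_feature(c2))
print(merged)  # (5, (16, 30), (54, 190))
```

The merged CF equals the CF computed from all five points at once, so two subclusters can be combined without revisiting the original data.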

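CF Tree
[Figure: CF tree. The root holds entries CF1, CF2, CF3, ..., CF6, each with a child pointer (child1, child2, child3, ..., child6). A non-leaf node holds entries CF1, CF2, CF3, ..., CF5 with child pointers child1, child2, child3, ..., child5. Two leaf nodes hold CF entries (CF1, CF2, ..., CF6 and CF1, CF2, ..., CF4) and are chained together by prev and next pointers.]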

• A CF tree is a height-balanced tree that stores the CFs in its nodes.
• Nonleaf nodes store the sums of the CFs of their children.
  – Thus each nonleaf entry summarizes the clustering information of its children.
• A CF tree has two parameters: branching factor B and threshold T.
• B is the maximum number of children a nonleaf node can have.
• Threshold T is the maximum diameter of the subclusters stored at the leaf nodes of the tree.
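
The threshold test can be carried out from the CF alone, without the original points. The sketch below assumes the usual BIRCH definition of diameter, the square root of the average squared distance between all pairs of points in the subcluster, which the slide does not spell out; the function names are illustrative.

```python
from itertools import combinations
from math import dist, sqrt


def diameter_from_cf(cf):
    """Diameter from CF = (N, LS, SS) only, with LS and SS kept per coordinate."""
    n, ls, ss = cf
    if n < 2:
        return 0.0
    sum_sq = sum(ss)                # sum of squared norms of all points
    ls_sq = sum(x * x for x in ls)  # squared norm of the linear sum
    # Sum of squared distances over ordered pairs i != j is 2*N*SS - 2*|LS|^2.
    return sqrt(max(0.0, (2 * n * sum_sq - 2 * ls_sq) / (n * (n - 1))))


def diameter_brute_force(points):
    """Same quantity computed directly from the points, for comparison."""
    pairs = list(combinations(points, 2))
    return sqrt(sum(dist(p, q) ** 2 for p, q in pairs) / len(pairs))


points = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)]
cf = (5, (16, 30), (54, 190))
print(diameter_from_cf(cf), diameter_brute_force(points))
```

Both functions give the same value (the squared diameter is 6.4 for these five points), so a leaf entry can be checked against T using only its stored CF.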

• The CF tree is built incrementally.
• An object is inserted into the closest leaf entry (subcluster).
• If the diameter of the subcluster stored in the leaf node after the insertion is larger than T, the leaf node is split.
  – This can result in splitting of the parent node(s).
  – This is similar to B+ tree insertion.
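
A heavily simplified sketch of the insertion step for a single leaf node is given below. It is not a full CF tree: there are no nonleaf nodes and the split itself is not implemented. The assumptions are that the closest entry is chosen by centroid distance, that a point which would push the diameter above T starts a new entry instead, and that a hypothetical capacity `max_entries` decides when the leaf would have to be split.

```python
from math import dist, sqrt


def cf_of(point):
    """CF of a single point."""
    return 1, tuple(point), tuple(x * x for x in point)


def merge_cf(a, b):
    """Component-wise sum of two CFs."""
    return (a[0] + b[0],
            tuple(x + y for x, y in zip(a[1], b[1])),
            tuple(x + y for x, y in zip(a[2], b[2])))


def centroid(cf):
    n, ls, _ = cf
    return tuple(x / n for x in ls)


def diameter(cf):
    """Root of the average squared pairwise distance, computed from the CF."""
    n, ls, ss = cf
    if n < 2:
        return 0.0
    ls_sq = sum(x * x for x in ls)
    return sqrt(max(0.0, (2 * n * sum(ss) - 2 * ls_sq) / (n * (n - 1))))


def insert_into_leaf(entries, point, threshold, max_entries):
    """Insert a point into a leaf; returns 'absorbed', 'new entry', or 'split needed'."""
    new = cf_of(point)
    if entries:
        # Closest subcluster, measured by distance to its centroid.
        k = min(range(len(entries)), key=lambda i: dist(centroid(entries[i]), point))
        candidate = merge_cf(entries[k], new)
        if diameter(candidate) <= threshold:
            entries[k] = candidate  # the subcluster stays within T
            return "absorbed"
    entries.append(new)
    # A real CF tree would now split the leaf and possibly its ancestors.
    return "split needed" if len(entries) > max_entries else "new entry"


leaf = []
results = [insert_into_leaf(leaf, p, threshold=1.5, max_entries=2)
           for p in [(0, 0), (1, 0), (5, 5), (10, 0)]]
print(results)
```

In this run the second point is absorbed into the first subcluster, the third starts a new entry, and the fourth overflows the leaf, which is where a split would be triggered.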
