Cluster Analysis: Prepared by Navin Ninama

This document provides an overview of cluster analysis techniques. It defines cluster analysis as the process of grouping similar data objects into clusters, then categorizes and describes the major clustering methods: partitioning, hierarchical, density-based, grid-based, and model-based. Example applications are given for areas such as marketing, insurance, and earthquake studies, and quality measures and requirements for good clustering are discussed.


Cluster Analysis

Prepared by
Navin Ninama
140160702007

Cluster Analysis
1. What is Cluster Analysis?
2. A Categorization of Major Clustering Methods
3. Partitioning Methods
4. Hierarchical Methods
5. Density-Based Methods
6. Grid-Based Methods
7. Model-Based Methods

What is Cluster Analysis?

- Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters
- Cluster analysis: finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters
- Unsupervised learning: no predefined classes
- Typical applications:
  - As a stand-alone tool to gain insight into data distribution
  - As a preprocessing step for other algorithms

Clustering: Rich Applications and Multidisciplinary Efforts

- Pattern recognition
- Spatial data analysis
  - Create thematic maps in GIS by clustering feature spaces
  - Detect spatial clusters, or support other spatial mining tasks
- Image processing
- Economic science (especially market research)
- WWW
  - Document classification
  - Cluster Weblog data to discover groups of similar access patterns

Examples of Clustering Applications

- Marketing: help marketers discover distinct groups in their customer bases, then use this knowledge to develop targeted marketing programs
- Land use: identification of areas of similar land use in an Earth observation database
- Insurance: identifying groups of motor insurance policy holders with a high average claim cost
- City planning: identifying groups of houses according to their house type, value, and geographical location
- Earthquake studies: observed earthquake epicenters should be clustered along continental faults

Quality: What Is Good Clustering?

- A good clustering method produces high-quality clusters with:
  - High intra-class similarity (objects within a cluster are close to one another)
  - Low inter-class similarity (objects in different clusters are far apart)
- The quality of a clustering result depends on both the similarity measure used by the method and its implementation
- The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
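The intra-class vs. inter-class criterion can be made concrete with a small sketch. The helper `avg_pairwise_dist2` is hypothetical (not from any library) and assumes squared Euclidean distance:

```python
def avg_pairwise_dist2(A, B):
    """Average squared Euclidean distance between points of A and points of B.
    Called with A and B being the same cluster, it measures intra-cluster
    scatter (self-pairs are skipped); with two different clusters, it
    measures inter-cluster separation. Assumes at least one valid pair."""
    pairs = [(p, q) for p in A for q in B if p is not q]
    return sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs) / len(pairs)

c1 = [(0.0, 0.0), (0.0, 1.0)]
c2 = [(10.0, 10.0), (10.0, 11.0)]

intra = avg_pairwise_dist2(c1, c1)  # small: members of c1 are close together
inter = avg_pairwise_dist2(c1, c2)  # large: c1 and c2 are far apart
```

A good clustering drives `intra` down and `inter` up; real evaluations use normalized indices (e.g., silhouette) built on the same idea.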

Measure the Quality of Clustering

- Dissimilarity/similarity metric: similarity is expressed in terms of a distance function, typically a metric d(i, j)
- A separate quality function measures the "goodness" of a cluster
- The definitions of distance functions are usually very different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables
- Weights should be associated with different variables based on the application and data semantics
- It is hard to define "similar enough" or "good enough": the answer is typically highly subjective
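As a minimal illustration of how the distance function changes with variable type, here are two sketches: a weighted Euclidean distance for interval-scaled variables and a simple mismatch ratio for boolean/categorical variables. Both function names are illustrative, not from a library:

```python
import math

def weighted_euclidean(x, y, weights=None):
    """d(i, j) for interval-scaled (numeric) vectors; the optional per-variable
    weights reflect application and data semantics, as noted above."""
    w = weights or [1.0] * len(x)
    return math.sqrt(sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)))

def mismatch_ratio(x, y):
    """A simple dissimilarity for boolean/categorical vectors:
    the fraction of positions where the two objects disagree."""
    return sum(xi != yi for xi, yi in zip(x, y)) / len(x)
```

Setting a variable's weight to zero removes it from the distance entirely, which is one concrete way "data semantics" enters the metric.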

Requirements of Clustering in Data Mining

- Scalability
- Ability to deal with different types of attributes
- Ability to handle dynamic data
- Discovery of clusters with arbitrary shape
- Minimal requirements for domain knowledge to determine input parameters
- Ability to deal with noise and outliers
- Insensitivity to the order of input records
- Ability to handle high dimensionality
- Incorporation of user-specified constraints
- Interpretability and usability

Major Clustering Approaches (I)

- Partitioning approach:
  - Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
  - Typical methods: k-means, k-medoids, CLARANS
- Hierarchical approach:
  - Create a hierarchical decomposition of the set of data (or objects) using some criterion
  - Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
- Density-based approach:
  - Based on connectivity and density functions
  - Typical methods: DBSCAN, OPTICS, DenClue
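As a sketch of the partitioning approach, here is a minimal pure-Python k-means. The explicit `init` parameter and the empty-cluster handling are simplifications for illustration; production code would use a library implementation:

```python
import random

def kmeans(points, k, iters=20, init=None, seed=0):
    """Minimal k-means: alternately assign each point to its nearest center,
    then recompute each center as the mean of its assigned cluster."""
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # nearest center by squared Euclidean distance
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # recompute centers; keep the old center if a cluster emptied out
        centers = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

Each iteration can only reduce the sum of squared errors, which is the evaluation criterion named above for partitioning methods.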

Major Clustering Approaches (II)

- Grid-based approach:
  - Based on a multiple-level granularity structure
  - Typical methods: STING, WaveCluster, CLIQUE
- Model-based approach:
  - A model is hypothesized for each of the clusters; the idea is to find the best fit of the data to the given model
  - Typical methods: EM, SOM, COBWEB
- Frequent pattern-based approach:
  - Based on the analysis of frequent patterns
  - Typical methods: pCluster
- User-guided or constraint-based approach:
  - Clustering by considering user-specified or application-specific constraints
  - Typical methods: COD (obstacles), constrained clustering

Recent Hierarchical Clustering Methods

- Major weaknesses of agglomerative clustering methods:
  - They do not scale well: time complexity of at least O(n^2), where n is the total number of objects
  - They can never undo what was done previously
- Integration of hierarchical clustering with distance-based clustering:
  - BIRCH (1996): uses a CF-tree and incrementally adjusts the quality of sub-clusters
  - ROCK (1999): clusters categorical data by neighbor and link analysis
  - CHAMELEON (1999): hierarchical clustering using dynamic modeling
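The agglomerative behavior described above, including the "never undo a merge" property and the quadratic pairwise search, can be sketched with a naive single-link implementation (illustrative only; real systems use the optimized methods listed above):

```python
def agglomerative(points, target_k):
    """Naive agglomerative clustering with single-link distance.
    Every object starts in its own cluster; the two closest clusters are
    merged repeatedly, and a merge is never undone. The exhaustive pairwise
    search is what makes plain agglomerative clustering scale poorly."""
    clusters = [[p] for p in points]

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def link(c1, c2):  # single link: distance between the two closest members
        return min(dist2(p, q) for p in c1 for q in c2)

    while len(clusters) > target_k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))
    return clusters
```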

CHAMELEON: Hierarchical Clustering Using Dynamic Modeling (1999)

- CHAMELEON: by G. Karypis, E. H. Han, and V. Kumar (1999)
- Measures similarity based on a dynamic model:
  - Two clusters are merged only if the interconnectivity and closeness (proximity) between them are high relative to the internal interconnectivity of the clusters and the closeness of items within the clusters
  - By contrast, CURE ignores information about the interconnectivity of the objects, and ROCK ignores information about the closeness of two clusters
- A two-phase algorithm:
  1. Use a graph-partitioning algorithm to cluster objects into a large number of relatively small sub-clusters
  2. Use an agglomerative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining these sub-clusters

Density-Based Clustering Methods

- Clustering based on density (a local cluster criterion), such as density-connected points
- Major features:
  - Discover clusters of arbitrary shape
  - Handle noise
  - Need only one scan
  - Need density parameters as a termination condition
- Several interesting studies:
  - DBSCAN: Ester, et al. (KDD'96)
  - OPTICS: Ankerst, et al. (SIGMOD'99)
  - DENCLUE: Hinneburg & Keim (KDD'98)
  - CLIQUE: Agrawal, et al. (SIGMOD'98) (more grid-based)

Density-Based Clustering: Basic Concepts

- Two parameters:
  - Eps: maximum radius of the neighbourhood
  - MinPts: minimum number of points in an Eps-neighbourhood of that point
- N_Eps(p) = {q in D | dist(p, q) <= Eps}
- Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps and MinPts if:
  - p belongs to N_Eps(q), and
  - q satisfies the core point condition: |N_Eps(q)| >= MinPts
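These definitions translate almost directly into code; a minimal sketch with illustrative helper names (Euclidean distance assumed):

```python
def eps_neighborhood(D, p, eps):
    """N_Eps(p) = {q in D | dist(p, q) <= Eps}.
    Note p is always in its own neighborhood."""
    return [q for q in D if sum((a - b) ** 2 for a, b in zip(p, q)) <= eps ** 2]

def directly_density_reachable(D, p, q, eps, min_pts):
    """p is directly density-reachable from q iff p is in N_Eps(q)
    and q is a core point, i.e. |N_Eps(q)| >= MinPts."""
    n_q = eps_neighborhood(D, q, eps)
    return p in n_q and len(n_q) >= min_pts
```

Note the asymmetry: a border point can be directly density-reachable from a core point, but not the other way around, since the border point fails the core point condition.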

Grid-Based Clustering Method

- Uses a multi-resolution grid data structure
- Several interesting methods:
  - STING (a STatistical INformation Grid approach) by Wang, Yang, and Muntz (1997)
  - WaveCluster by Sheikholeslami, Chatterjee, and Zhang (VLDB'98)
    - A multi-resolution clustering approach using the wavelet method
  - CLIQUE: Agrawal, et al. (SIGMOD'98)
    - Works on high-dimensional data (thus often treated under clustering of high-dimensional data)

The STING Clustering Method

- Each cell at a high level is partitioned into a number of smaller cells at the next lower level
- Statistical information about each cell is calculated and stored beforehand, and is used to answer queries
- Parameters of higher-level cells can be easily calculated from the parameters of lower-level cells:
  - count, mean, s (standard deviation), min, max
  - type of distribution: normal, uniform, etc.
- Use a top-down approach to answer spatial data queries:
  - Start from a pre-selected layer, typically one with a small number of cells
  - For each cell in the current level, compute the confidence interval

Comments on STING

- Remove the irrelevant cells from further consideration
- When finished examining the current layer, proceed to the next lower level
- Repeat this process until the bottom layer is reached
- Advantages:
  - Query-independent, easy to parallelize, supports incremental updates
  - O(K), where K is the number of grid cells at the lowest level
- Disadvantages:
  - All cluster boundaries are either horizontal or vertical; no diagonal boundary is detected
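The key STING property, that higher-level cell parameters are derived from child cells without revisiting the raw data, can be sketched for a single numeric attribute. This is a simplification of the full method (it omits standard deviation and distribution type), with illustrative function names:

```python
def cell_stats(values):
    """Bottom-level cell: statistical parameters computed once from the raw data."""
    return {"count": len(values), "sum": sum(values),
            "min": min(values), "max": max(values)}

def merge_cells(children):
    """A higher-level cell's parameters derived purely from its child cells'
    parameters, never from the raw data -- the property STING relies on
    for fast, query-independent processing. Empty child cells are skipped."""
    kept = [c for c in children if c["count"] > 0]
    return {"count": sum(c["count"] for c in kept),
            "sum": sum(c["sum"] for c in kept),
            "min": min(c["min"] for c in kept),
            "max": max(c["max"] for c in kept)}
```

Storing `sum` rather than `mean` makes the merge exact: the parent mean is simply `sum / count`, with no loss from averaging averages.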

Model-Based Clustering

What is model-based clustering?
- Attempts to optimize the fit between the given data and some mathematical model
- Based on the assumption that the data are generated by a mixture of underlying probability distributions
- Typical methods:
  - Statistical approach: EM (Expectation Maximization), AutoClass
  - Machine learning approach: COBWEB, CLASSIT
  - Neural network approach: SOM (Self-Organizing Feature Map)

EM: Expectation Maximization

- EM is a popular iterative refinement algorithm
  - An extension of k-means: each object is assigned to a cluster according to a weight (a probability distribution), and new means are computed from these weighted measures
- General idea:
  - Start with an initial estimate of the parameter vector
  - Iteratively rescore the patterns against the mixture density produced by the parameter vector
  - Use the rescored patterns to update the parameter estimates
  - Patterns belong to the same cluster if they are placed by their scores in the same component
- The algorithm converges fast, but may not reach the global optimum
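A minimal sketch of this idea for a two-component, one-dimensional Gaussian mixture. Initializing the means from the data extremes and fixing the component count at two are simplifying assumptions for illustration:

```python
import math

def em_1d(data, iters=50):
    """Minimal EM for a two-component 1-d Gaussian mixture (a sketch).
    E-step: compute each component's responsibility (weight) for each point.
    M-step: re-estimate means, variances, and mixing weights from those
    responsibilities -- the 'weighted measures' described above."""
    mu = [min(data), max(data)]   # crude initial estimate of the parameter vector
    var = [1.0, 1.0]
    pi = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: rescore each pattern against the current mixture density
        resp = []
        for x in data:
            p = [pi[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: weighted re-estimation of the parameters
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk + 1e-6
            pi[k] = nk / len(data)
    return mu, var, pi
```

The soft (probabilistic) assignments are what distinguish this from k-means; the final parameters depend on the initial estimate, which is why EM may settle in a local rather than global optimum.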

Thank You
