
INSTITUTE OF INFORMATION TECHNOLOGY & MANAGEMENT

Accredited ‘A’ Grade by NAAC & Recognised U/s 2(f) of UGC Act

Rated Category ‘A+’ by SFRC & ‘A’ by JAC, Govt. of NCT of Delhi
Approved by AICTE & Affiliated to GGS Indraprastha University, New Delhi

Machine Learning with Python

Programme : BCA
Semester : V
Subject Code : BCAT311
Subject : Machine Learning with Python
Topic : Clustering
Faculty : Ms. Shilpi Bansal
© Institute of Information Technology and Management, D-29, Institutional Area, Janakpuri, New Delhi-110058
List of Topics

• Introduction to clustering
• K-means clustering
• Hierarchical clustering

Examples of Clustering Applications

• Marketing: discover distinct groups in a customer base and use this knowledge to develop targeted marketing programmes
• Land use: identification of areas of similar land use in an earth-observation database
• Insurance: identifying groups of motor-insurance policy holders with a high average claim cost
• Urban planning: identifying groups of houses according to their type, value, and geographical location
• Seismology: observed earthquake epicenters should be clustered along continental faults
What Is a Good Clustering?

• A good clustering method produces clusters with
  • high intra-class (within-cluster) similarity
  • low inter-class (between-cluster) similarity
• A precise definition of clustering quality is difficult
  • application-dependent
  • ultimately subjective
Requirements for Clustering in Data Mining

• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal domain knowledge required to determine input parameters
• Ability to deal with noise and outliers
• Insensitivity to the order of input records
• Robustness with respect to high dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability
Similarity and Dissimilarity Between Objects

• Same measures as we used for instance-based learning (IBL), e.g., the Lp norm
• Euclidean distance (p = 2):

  d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}

• Properties of a metric d(i, j):
  • d(i, j) ≥ 0
  • d(i, i) = 0
  • d(i, j) = d(j, i)
  • d(i, j) ≤ d(i, k) + d(k, j)
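As a minimal illustration of the Lp-norm distances above, here is a small NumPy sketch (the function name and example vectors are only illustrative, not part of the slides):

```python
import numpy as np

def minkowski_distance(x, y, p=2):
    """Lp (Minkowski) distance between two feature vectors.

    p = 2 gives the Euclidean distance shown on this slide;
    p = 1 gives the Manhattan (city-block) distance.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

# Example: two 3-dimensional objects i and j
i = [1.0, 2.0, 3.0]
j = [4.0, 6.0, 3.0]
print(minkowski_distance(i, j))        # Euclidean distance: 5.0
print(minkowski_distance(i, j, p=1))   # Manhattan distance: 7.0
```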
Major Clustering Approaches

• Partitioning: construct various partitions and then evaluate them by some criterion
• Hierarchical: create a hierarchical decomposition of the set of objects using some criterion
• Model-based: hypothesize a model for each cluster and find the best fit of the models to the data
• Density-based: guided by connectivity and density functions
Partitioning Algorithms

• Partitioning method: construct a partition of a database D of n objects into a set of k clusters
• Given k, find the partition into k clusters that optimizes the chosen partitioning criterion
  • Global optimum: exhaustively enumerate all partitions
  • Heuristic methods: the k-means and k-medoids algorithms (see the sketch below)
• k-means (MacQueen, 1967): each cluster is represented by the center (mean) of the cluster
• k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw, 1987): each cluster is represented by one of the objects in the cluster
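Both heuristics are available as library implementations; the sketch below contrasts them, assuming scikit-learn for k-means and the separate scikit-learn-extra add-on package for the PAM-style KMedoids estimator (the toy data are illustrative only):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn_extra.cluster import KMedoids  # provided by the scikit-learn-extra package

# Toy 2-D dataset with two obvious groups (illustrative only)
X = np.array([[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 9], [8.5, 9.5]])

# k-means: each cluster is represented by the mean of its members
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# k-medoids (PAM): each cluster is represented by one of its own objects
kmed_labels = KMedoids(n_clusters=2, random_state=0).fit_predict(X)

print(km_labels)
print(kmed_labels)
```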
K-Means Clustering

• Given k, the k-means algorithm consists of four steps:
  1. Select k initial centroids at random.
  2. Assign each object to the cluster with the nearest centroid.
  3. Recompute each centroid as the mean of the objects assigned to it.
  4. Repeat steps 2 and 3 until the assignments no longer change.
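A minimal NumPy sketch of these four steps (the function name, variable names, and toy data are illustrative; in practice scikit-learn's sklearn.cluster.KMeans implements the same algorithm with smarter k-means++ initialisation):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Plain NumPy k-means following the four steps on this slide."""
    rng = np.random.default_rng(seed)
    # Step 1: select k initial centroids at random from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned objects
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        # Step 4: stop once the centroids (and hence the assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy usage on a small 2-D dataset (illustrative only)
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]])
labels, centroids = k_means(X, k=2)
print(labels)
print(centroids)
```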
K-Means Clustering (contd.)

• Example: [figure omitted — four 2-D scatter plots (axes 0–10) illustrating successive k-means iterations]
Comments on the K-Means Method

• Strengths
  • Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n
  • Often terminates at a local optimum; the global optimum may be found using techniques such as simulated annealing or genetic algorithms
• Weaknesses
  • Applicable only when a mean is defined (what about categorical data?)
  • Need to specify k, the number of clusters, in advance
  • Trouble with noisy data and outliers
  • Not suitable for discovering clusters with non-convex shapes
Hierarchical Clustering

• Uses a distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it does need a termination condition

[Figure: agglomerative clustering (AGNES) proceeds left to right over steps 0–4, merging objects a–e: {a, b} → ab, {d, e} → de, {c, de} → cde, {ab, cde} → abcde; divisive clustering (DIANA) proceeds in the reverse order, from step 4 back to step 0]
AGNES (Agglomerative Nesting)

• Produces a tree of clusters (nodes)
  • Initially, each object is its own cluster (a leaf)
  • Recursively merges the pair of nodes with the least dissimilarity
  • Merge criteria: minimum distance, maximum distance, average distance, center distance
  • Eventually all objects belong to the same cluster (the root)
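As an illustration, SciPy's hierarchical-clustering routines perform this agglomerative merging; the sketch below (with illustrative toy data) maps the merge criteria above onto SciPy's linkage methods:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy 2-D objects (coordinates are illustrative only)
X = np.array([[1, 1], [1.5, 1.5], [5, 5], [5.5, 5], [6, 6]])

# AGNES-style agglomerative merging; the linkage method corresponds to the
# merge criterion on this slide:
#   'single'   -> minimum distance      'complete' -> maximum distance
#   'average'  -> average distance      'centroid' -> center distance
Z = linkage(X, method='average')
print(Z)  # each row: (cluster i, cluster j, merge distance, size of merged cluster)
```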

[Figure: three 2-D scatter plots (axes 0–10) illustrating AGNES merging nearby objects into progressively larger clusters]
A Dendrogram Shows How the Clusters Are Merged Hierarchically

• Decompose the data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.
• A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
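Continuing the SciPy sketch above (the cut height of 2.0 is an arbitrary, illustrative choice), the dendrogram can be drawn and cut as follows:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster

# Z is the linkage matrix computed in the AGNES sketch above
dendrogram(Z)      # draw the tree of merges
plt.show()

# "Cut" the dendrogram: keep only merges whose distance is below 2.0,
# so each remaining connected component becomes one cluster
labels = fcluster(Z, t=2.0, criterion='distance')
print(labels)      # flat cluster label for each object
```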

DIANA (Divisive Analysis)

• Works in the inverse order of AGNES
  • Start with a root cluster containing all objects
  • Recursively divide clusters into subclusters
  • Eventually each cluster contains a single object
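Neither SciPy nor scikit-learn ships DIANA itself; as a rough illustration of the top-down idea only, the sketch below recursively bisects the data with 2-means (the real DIANA algorithm instead splits off the object with the largest average dissimilarity):

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(X, depth=0, max_depth=2, min_size=2):
    """Rough divisive (top-down) sketch: recursively bisect with 2-means.

    Illustrative only; real DIANA splits on the object with the largest
    average dissimilarity rather than using k-means.
    """
    if depth == max_depth or len(X) < min_size:
        return [X]
    halves = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    left, right = X[halves == 0], X[halves == 1]
    return (divisive_split(left, depth + 1, max_depth, min_size)
            + divisive_split(right, depth + 1, max_depth, min_size))

# Toy data (illustrative only): start from one root cluster and keep splitting
X = np.array([[1, 1], [1.5, 1.5], [5, 5], [5.5, 5], [9, 9], [9.5, 9.5]])
for cluster in divisive_split(X):
    print(cluster)
```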

[Figure: three 2-D scatter plots (axes 0–10) illustrating DIANA splitting the data into progressively smaller clusters]