UNIT-5-ML
Clustering
1.Introduction
2.Partitioning of Data
3.Matrix Factorization
4.Clustering of Patterns
5.Divisive clustering
6.Agglomerative clustering
7.Partitional clustering
8.K- Means Clustering
9.Soft Partitioning
10.Soft Clustering
11.Fuzzy C-Means Clustering
12.Rough Clustering
13.Rough K-Means clustering Algorithm
14.Expectation Maximization-Based Clustering
15.Spectral clustering
1.Introduction
Clustering refers to the process of arranging or organizing
objects according to specific criteria. It plays a crucial role in
uncovering concealed knowledge in large data sets.
Clustering involves dividing or grouping the data into smaller
data sets based on similarities/dissimilarities. Depending on
the requirements, this grouping can lead to various
outcomes, such as partitioning of data, data re-organization,
compression of data and data summarization.
2.Partitioning of Data
Partitioning of data in machine learning refers to the process
of dividing a dataset into smaller subsets for efficient
processing, training, and evaluation. This helps improve
model performance, reduce overfitting, and optimize
computational efficiency.
Types of Data Partitioning
1. Train-Test Split
• The dataset is divided into training data (used to train
the model) and testing data (used to evaluate
performance).
• Common split ratios: 80-20, 70-30, or 90-10.
• Example: A spam detection model is trained on 80% of
emails and tested on the remaining 20%.
2. Train-Validation-Test Split
• The dataset is divided into three subsets:
o Training Set: Used for model learning.
o Validation Set: Used for hyperparameter tuning
and model selection.
o Test Set: Used to assess final model performance.
• Common split: 70% Train, 15% Validation, 15% Test.
3. Cross-Validation (K-Fold Partitioning)
• The dataset is divided into K equal-sized subsets (folds).
• The model is trained on K-1 folds and tested on the
remaining fold, repeating the process K times.
• Example: 5-Fold Cross-Validation splits the data into 5
subsets, trains the model on 4, and tests on 1, repeating
this process 5 times.
4. Clustering-Based Partitioning
• Used in unsupervised learning where data is partitioned
into clusters based on similarity.
• Example: K-Means clustering groups customers based on
purchase behavior.
5. Stratified Sampling
• Ensures that each class or category is proportionally
represented in the training and test sets.
• Useful for imbalanced datasets (e.g., medical diagnoses
where 90% of samples are non-disease and 10% are
disease cases).
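To make the first of these schemes concrete, here is a minimal Python sketch using scikit-learn; the toy data, split ratios, and variable names are assumptions for illustration only:

# Minimal sketch: train-test split, stratified split, and 5-fold cross-validation.
# Assumes scikit-learn is installed; X and y form a toy dataset made up for illustration.
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.arange(20).reshape(10, 2)                 # 10 samples, 2 features (toy data)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])     # two classes, 5 samples each

# Train-test split (80-20)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Stratified variant: keep class proportions the same in train and test sets
X_trs, X_tes, y_trs, y_tes = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 5-fold cross-validation: every sample is used for testing exactly once
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")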
Importance of Data Partitioning
• Prevents overfitting by ensuring the model generalizes
well to new data.
• Helps in fair model evaluation by keeping a separate
test set.
• Improves computational efficiency by working on
smaller subsets.
EXAMPLE:
To illustrate, let us take the example of an EMPLOYEE data
table that contains 30,000 records of fixed length. We assume
that there are 1000 distinct values of DEPT_CODE and that
the employee records are evenly distributed among these
values. If this data table is clustered by DEPT_CODE, accessing
all the employees of the department with DEPT_CODE = '15'
requires log2(1000) + 30 accesses.
The first term involves accessing the index table that is
constructed using DEPT_CODE, which is achieved through
binary search.
The second term involves fetching 30 (that is, 30000/1000)
records from the clustered (that is, grouped based on
DEPT_CODE) employee table, which is indicated by the
fetched entry in the index table. Without clustering, accessing
30 employee records from a department would require, on
average, 30 x 30000/2 accesses.
Answer:
The example explains the efficiency of data access when
using clustering in a database. Let's break it down step by
step:
Understanding the Problem
• There is an EMPLOYEE data table containing 30,000
records.
• The table includes a column DEPT_CODE, which has
1,000 distinct values.
• The records are evenly distributed among these 1,000
departments. So, each department has:
30,000 records / 1,000 departments = 30 records per department.
Accessing Employee Records Without Clustering
If the data is not clustered, then accessing all employees in a
department involves scanning the entire table, because the
records are scattered.
• On average, finding 30 employees from one department
requires searching through half of the table (assuming a
uniform distribution), which means:
30 × 30,000/2 = 450,000 accesses.
Accessing Employee Records With Clustering
When the data is clustered by DEPT_CODE, all employees
belonging to a department are stored together. The search
process works in two steps:
1. Index Search
o The database first searches for DEPT_CODE = '15'
using an index table.
o Since there are 1,000 departments, the search uses
binary search, which requires log₂(1000) accesses.
o Approximate value:
log₂(1000) ≈ 10 accesses.
2. Fetching Records
o After locating the department in the index table, we
directly retrieve its 30 records (since they are
stored together).
o This requires 30 accesses.
Comparison of Access Costs
• Without Clustering: ~450,000 accesses.
• With Clustering: log₂(1000) + 30 ≈ 10 + 30 = 40 accesses.
Conclusion
By clustering the data based on DEPT_CODE, access time
significantly reduces from 450,000 accesses to 40 accesses,
improving efficiency in database queries.
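A quick back-of-the-envelope check of these two costs, using only the numbers given in the example above:

# Re-computes the access counts from the example (no new assumptions beyond the given figures).
import math

n_records = 30_000
n_depts = 1_000
per_dept = n_records // n_depts                       # 30 records per department

with_clustering = math.log2(n_depts) + per_dept       # index search + fetching the 30 records
without_clustering = per_dept * n_records / 2         # the estimate used in the example

print(round(with_clustering))     # about 40 accesses
print(int(without_clustering))    # 450000 accesses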
3.Matrix Factorization
Matrix Factorization in Clustering
Matrix factorization is a technique that can be used to
represent clustering mathematically. It allows us to express a
dataset as the product of two matrices:
X ≈ BC
where:
• X is the original data matrix.
• B is the cluster assignment matrix (indicates which data
points belong to which clusters).
• C is the representative matrix (contains the cluster
centroids or leaders).
This approach helps in understanding clustering as a
factorization problem, where we break down a large data
matrix into meaningful components.
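A minimal numpy sketch of this view of clustering; the data values and the two-cluster assignment are made up for illustration:

# X is approximated as the product of a cluster assignment matrix B and a centroid matrix C.
import numpy as np

X = np.array([[1.0, 2.0],          # toy data: 4 points in 2-D, two obvious groups
              [2.0, 2.0],
              [5.0, 5.0],
              [6.0, 5.0]])

B = np.array([[1, 0],              # cluster assignment matrix: row i has a 1 in its cluster's column
              [1, 0],
              [0, 1],
              [0, 1]])

C = np.array([X[:2].mean(axis=0),  # representative matrix: centroid of the first group
              X[2:].mean(axis=0)]) # centroid of the second group

X_approx = B @ C                   # X ≈ BC
print(X_approx)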
Divisive and Agglomerative Clustering
Algorithms
One popular algorithm for divisive clustering is:
• DIANA (DIvisive ANAlysis Clustering)
Example:
Let's walk through both divisive and agglomerative clustering using a small dataset of five 2D points:
A (1, 2)
B (2, 2)
C (5, 5)
D (6, 5)
E (8, 6)
Comparison Summary:
Feature              Agglomerative                    Divisive
Start with           Individual data points           One big cluster
Process              Merge closest clusters           Recursively split clusters
Ends with            One cluster (or target number)   Many small clusters (or target)
Visualization tool   Dendrogram                       Dendrogram
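Here is a short sketch of agglomerative clustering on the five points above using SciPy's hierarchical-clustering routines; single linkage is an assumption, since the text does not fix a linkage criterion:

# Agglomerative clustering of the five 2-D points with SciPy (assumes scipy is installed).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1, 2],    # A
                   [2, 2],    # B
                   [5, 5],    # C
                   [6, 5],    # D
                   [8, 6]])   # E

Z = linkage(points, method='single')              # merge order, closest pair first
labels = fcluster(Z, t=2, criterion='maxclust')   # cut the merge tree at 2 clusters
print(labels)    # A and B share one label; C, D and E share the other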
Here is a problem example that illustrates both agglomerative and divisive clustering step by step.
Problem:
You are given 4 data points in 1D space:
A: 1
B: 2
C: 5
D: 8
Use Euclidean distance and demonstrate agglomerative and divisive clustering steps until 2 clusters remain.
Agglomerative: merge the closest pair A and B (distance 1); then merge C and D (distance 3). Two clusters remain, so stop.
Divisive: start with {A, B, C, D} and split at the large gap between B = 2 and C = 5, giving {A, B} and {C, D}.
Final Result:
Both methods end with the same two clusters:
{A, B} and {C, D}
Soft Clustering
Instead of assigning each point to only one cluster, we give
degrees of membership. For example:
• Let’s assume 2 soft clusters: Cluster 1 (centered near A-
B) and Cluster 2 (near C-D-E).
• Membership matrix (sample):
Point   Cluster 1   Cluster 2
A       0.95        0.05
B       0.90        0.10
C       0.10        0.90
D       0.05        0.95
E       0.02        0.98
Each row sums to 1. This reflects soft clustering like Fuzzy C-
Means, where patterns are partially assigned to multiple
clusters.
Agglomerative Clustering
How it works:
1. Start with each data point as its own cluster.
2. Compute distances (or similarity) between all pairs of
clusters.
3. Merge the two closest clusters.
4. Repeat steps 2 and 3 until:
o All points are in one single cluster, or
o A desired number of clusters is reached.
Distance Measures:
The closeness of clusters can be measured using different
linkage criteria:
• Single linkage: Minimum distance between two points
from different clusters.
• Complete linkage: Maximum distance between two
points.
• Average linkage: Average distance between all points
from both clusters.
• Ward’s method: Minimizes the total within-cluster
variance.
Example:
Let’s say we have points:
A (1), B (2), C (8)
• Step 1: Each point is its own cluster: {A}, {B}, {C}
• Step 2: Distances:
o A-B = 1, B-C = 6, A-C = 7
• Step 3: Merge A and B → {A, B}, {C}
• Step 4: Next closest is {A, B} and {C} → merge → {A, B, C}
You can stop after any number of clusters (e.g., stop at 2
clusters: {A, B} and {C})
Visualization: Dendrogram
A dendrogram is a tree-like diagram that shows the sequence
of merges. You can "cut" it at any level to choose how many
clusters you want.
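A minimal sketch of how such a dendrogram could be produced for the three points A (1), B (2), C (8); SciPy and matplotlib are assumed to be available:

# Builds and plots a dendrogram for the three 1-D points (assumes scipy and matplotlib).
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

points = [[1], [2], [8]]                 # A, B, C as 1-D points
Z = linkage(points, method='single')     # records each merge and its distance

dendrogram(Z, labels=['A', 'B', 'C'])    # tree of merges; "cut" it to pick the cluster count
plt.ylabel('merge distance')
plt.show()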
Advantages:
• Simple and intuitive
• Doesn’t require the number of clusters in advance (but
you can set one if desired)
• Works well for small datasets
Disadvantages:
• Computationally expensive for large datasets (O(n² log
n))
• Sensitive to noise and outliers
• Once merged, clusters can’t be split (no backtracking)
The terms soft partitioning and soft clustering are often used
interchangeably, but there can be a subtle difference in
context depending on how they're used. Here's a breakdown:
1. Soft Clustering
• Definition: A form of clustering where each data point
can belong to multiple clusters with varying degrees of
membership.
• Example: In fuzzy c-means, a point might belong to
Cluster A with 70% membership and Cluster B with 30%.
• Focus: More on the degree of belonging or fuzzy
membership of data points to clusters.
• Use cases: When data naturally overlaps or has
ambiguity in group membership.
2. Soft Partitioning
• Definition: A general term that refers to partitioning
data into overlapping subsets, where elements can
belong to multiple partitions.
• Example: In overlapping community detection in
networks, a node might belong to multiple communities.
• Focus: More on the partition structure—the fact that
the partitions are not mutually exclusive.
Key Difference
• Soft clustering is a specific technique (often using fuzzy
logic) to assign partial memberships to clusters.
• Soft partitioning is a broader concept that refers to any
partitioning scheme where overlap is allowed (not
necessarily tied to fuzzy logic or probabilities).
Analogy
Think of soft clustering as how you assign elements to groups
(e.g., based on probability or fuzzy logic), and soft
partitioning as what kind of grouping structure you allow
(e.g., overlaps allowed or not).
Partitional clustering
Partitional clustering in machine learning is a type of
clustering method where the dataset is divided into distinct,
non-overlapping groups (clusters). Each data point belongs
to exactly one cluster.
Key Characteristics:
• The number of clusters (k) is usually defined beforehand.
• The goal is to minimize intra-cluster distance (points in
the same cluster are close) and maximize inter-cluster
distance (clusters are far apart).
• It’s a hard clustering method (no overlaps).
Popular Algorithms:
1. K-Means Clustering
o Most common partitional algorithm.
o Iteratively assigns points to the nearest centroid
and updates centroids.
2. K-Medoids (PAM)
o Similar to K-Means, but uses actual data points as
cluster centers (medoids).
3. CLARANS
o A scalable version of K-Medoids for large datasets.
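As an illustration of the first of these algorithms, here is a minimal K-Means sketch with scikit-learn; the toy points are assumptions for illustration:

# Hard partitional clustering with K-Means (assumes scikit-learn is installed).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [2, 2], [5, 5], [6, 5], [8, 6]])   # toy 2-D points

km = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0)
labels = km.fit_predict(X)          # hard assignment: exactly one cluster per point

print(labels)                       # e.g. [0 0 1 1 1]
print(km.cluster_centers_)          # one centroid per cluster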
Advantages:
• Simple and fast for large datasets.
• Easy to implement and interpret.
Limitations:
• Requires the number of clusters in advance.
• Sensitive to initial cluster centroids.
• Doesn’t handle non-spherical clusters or noise well.
Key Considerations:
• Sensitive to Initialization: Results may vary based on
initial centroid selection. That’s why better initialization
methods (like k-means++) or multiple runs are used.
• Choosing k: It's often not known beforehand. A popular
method is the Elbow Method.
Elbow Method (to find best k):
1. Run k-means for several values of k.
2. Compute the total in-cluster sum of squared error (IC-
SSE) for each.
3. Plot IC-SSE vs. k.
4. Look for the “elbow” point (sharp bend) in the graph.
That k is a good choice.
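A short sketch of the Elbow Method; scikit-learn and matplotlib are assumed, and the within-cluster sum of squared errors is read from KMeans' inertia_ attribute:

# Elbow Method: plot within-cluster SSE against k and look for the sharp bend.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(100, 2)        # toy data for illustration

ks = range(1, 9)
sse = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse.append(km.inertia_)                      # total within-cluster SSE for this k

plt.plot(list(ks), sse, marker='o')
plt.xlabel('k')
plt.ylabel('within-cluster SSE')
plt.show()                                       # choose k at the "elbow" of the curve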
Fuzzy C-Means (FCM) Clustering
Key Concepts
• Unlike k-means (hard clustering), where each point
belongs to exactly one cluster, FCM uses soft clustering
— each point belongs to all clusters with varying
degrees of membership (between 0 and 1).
• It's widely used in image segmentation, pattern
recognition, and medical diagnosis.
How FCM Works
1. Initialize:
o Choose number of clusters c.
o Randomly initialize the membership matrix (values
indicating how much a point belongs to each
cluster).
2. Update cluster centers: Each cluster center is calculated
as a weighted average of all points, weighted by their
membership values.
3. Update membership values: Membership of a point to a
cluster depends on its distance to the cluster center.
Closer = higher membership.
4. Repeat steps 2 and 3 until convergence (small changes in
values).
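A minimal numpy sketch of these update steps; the fuzzifier m = 2, the toy points, and the fixed number of iterations are assumptions for illustration (real implementations stop when the membership values barely change):

# Fuzzy C-Means sketch: alternate between updating centers and updating memberships.
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]        # weighted averages of all points
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        U = 1.0 / (d ** (2 / (m - 1)))             # closer center -> higher membership
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[1.0, 2], [2, 2], [5, 5], [6, 5], [8, 6]])    # toy points A..E
centers, U = fuzzy_c_means(X, c=2)
print(np.round(U, 2))    # soft memberships, one row per point, rows sum to 1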
Difference from K-means
Feature                K-Means                 Fuzzy C-Means
Cluster membership     Hard (0 or 1)           Soft (0 to 1)
Overlapping clusters   Not allowed             Allowed
Output                 One cluster per point   Membership degrees per cluster
When to Use
• When data is not clearly separable
• When soft assignments are more meaningful (e.g., in
medical diagnostics, a symptom may relate to multiple
diseases)
9.Soft Partitioning
10.Soft Clustering
11.Fuzzy C-Means Clustering
12.Rough Clustering
13.Rough K-Means clustering Algorithm
14.Expectation Maximization-Based Clustering
15.Spectral clustering
Refer to your notebook for these topics.