Clustering Explanation for Insights
Introduction to Clustering
Imagine you have a huge pile of toys, all mixed up. Some are cars, some are dolls, some
are building blocks, and so on. If you wanted to make sense of this pile, what would you
do? You'd probably start putting similar toys together. All the cars in one box, all the
dolls in another, and all the building blocks in a third. This simple act of grouping similar
items together is, at its core, what clustering is all about in the world of data science.
Clustering is a powerful tool with a wide range of applications across various fields, from customer segmentation in marketing to document organization, image compression, and anomaly detection.
In essence, clustering helps us make sense of vast amounts of data by finding natural
groupings, allowing us to derive actionable insights and make better decisions. It's
about discovering the inherent structure in data when you don't have a clear answer key.
Types of Clustering
While the core idea of clustering remains the same—grouping similar data points—
various algorithms approach this task differently. These differences often stem from how
they define "similarity" and how they construct the clusters. Here are some of the main
types of clustering:
1. Partitioning Methods
Partitioning methods divide the data objects into k clusters, where k is specified by the user in advance. These methods typically work by iteratively reassigning
data points to clusters until some criterion is met (e.g., minimizing the sum of squared
distances between data points and their cluster centroids).
• K-Means Clustering: This is perhaps the most popular and widely used
partitioning method. It aims to partition n observations into k clusters in which
each observation belongs to the cluster with the nearest mean (centroid), serving
as a prototype of the cluster. We will delve deeper into K-Means later.
• K-Medoids (PAM - Partitioning Around Medoids): Similar to K-Means, but instead
of using the mean of the cluster as the centroid, it uses an actual data point (the
medoid) from the cluster. This makes K-Medoids more robust to outliers than K-
Means.
2. Hierarchical Methods
Hierarchical methods build a tree of nested clusters (a dendrogram) rather than a single flat partition. Agglomerative (bottom-up) approaches start with each data point as its own cluster and repeatedly merge the closest pair, while divisive (top-down) approaches start with one all-encompassing cluster and repeatedly split it. We return to the agglomerative approach later in this document.
3. Density-Based Methods
Density-based methods define clusters as regions of high point density separated by regions of low density, which lets them discover arbitrarily shaped clusters and treat isolated points as noise.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise): A typical example that grows clusters outward from "core" points that have at least a minimum number of neighbors within a given radius.
4. Grid-Based Methods
Grid-based methods quantize the object space into a finite number of cells that form a
grid structure, and all clustering operations are performed on this grid. These methods
are typically fast because their processing time depends mainly on the number of grid
cells rather than on the number of data objects.
• STING (Statistical Information Grid): A typical example where the spatial area is
divided into rectangular cells, and different levels of rectangular cells correspond
to different levels of resolution.
5. Model-Based Methods
Model-based methods assume a model for each cluster and try to find the best fit of the
data to the given model. These methods often use statistical approaches to determine
the probability of data points belonging to certain clusters.
• Gaussian Mixture Models (GMM): Assumes that data points are generated from a
mixture of several Gaussian distributions with unknown parameters. It attempts to
find the parameters of these distributions and assign each data point to the
distribution it most likely belongs to.
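As a rough, minimal sketch of how a GMM is used in practice, the Python snippet below fits scikit-learn's GaussianMixture to a hypothetical two-blob dataset; the toy data and the choice of two components are assumptions for illustration, not part of the original text.

# Minimal sketch: fitting a Gaussian Mixture Model with scikit-learn.
# The toy data and the choice of two components are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),   # hypothetical blob 1
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),   # hypothetical blob 2
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

hard_labels = gmm.predict(X)       # most likely component for each point
soft_probs = gmm.predict_proba(X)  # probability of belonging to each component

print(gmm.means_)      # estimated centers of the two Gaussians
print(soft_probs[:3])  # soft assignments for the first three points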
Each type of clustering has its strengths and weaknesses, making them suitable for
different kinds of data and problems. The choice of method often depends on the nature
of the data, the desired cluster shapes, and whether the number of clusters is known
beforehand.
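To make that choice concrete, here is a small sketch showing how three of these families are invoked in scikit-learn; the toy data and parameter values are assumptions chosen only for demonstration. Each estimator follows the same fit/predict pattern, so swapping methods is largely a one-line change.

# Illustrative sketch: the same toy data run through three clustering families.
# Data and parameter values are assumptions chosen only for demonstration.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 2.5],   # one small group
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])  # another small group

# Partitioning: requires the number of clusters k up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative): also asked for two clusters here.
agglo_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# Density-based: no k; eps and min_samples define what counts as "dense".
dbscan_labels = DBSCAN(eps=2.0, min_samples=2).fit_predict(X)

print(kmeans_labels, agglo_labels, dbscan_labels)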
K-Means Clustering: Simple Yet Powerful Grouping
K-Means clustering is one of the most popular and straightforward partitioning
clustering algorithms. Its popularity stems from its simplicity, efficiency, and
effectiveness in many real-world applications. The core idea behind K-Means is to
partition n data points into k distinct, non-overlapping clusters, where each data point
belongs to the cluster with the nearest mean (centroid).
1. Choose the Number of Clusters (k): This is the first and often the trickiest step.
You need to decide how many groups (k) you want to divide your data into.
Sometimes this number is known from domain knowledge (e.g., you want to
segment customers into 3 types: high-value, medium-value, low-value). Other
times, you might need to use techniques like the "Elbow Method" or "Silhouette
Score" to find an optimal k.
2. Initialize Centroids: Randomly select k data points from your dataset to serve as
the initial centroids (the center points) for your k clusters. These initial centroids
are essentially educated guesses for where the centers of your groups might be.
3. Assign Data Points to the Closest Centroid: For each data point in your dataset,
calculate its distance to each of the k centroids. The data point is then assigned to
the cluster whose centroid is closest to it. Think of it like drawing lines from each
data point to the closest center point.
4. Update Centroids: Once all data points have been assigned to a cluster,
recalculate the position of each centroid. The new centroid for each cluster is the
mean (average) of all the data points currently assigned to that cluster. This step
moves the center of each group to a more accurate position based on its current
members.
5. Repeat Steps 3 and 4: Continue iteratively assigning data points and updating
centroids until one of the following conditions is met: the centroids stop moving (or
move only negligibly), the cluster assignments no longer change, or a maximum
number of iterations is reached. A from-scratch sketch of this loop appears after this list.
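Here is a minimal from-scratch sketch of the loop in NumPy; the function name, the random initialization, and the stopping rule are assumptions made for illustration rather than a definitive implementation.

# From-scratch sketch of the five K-Means steps described above.
# Names and defaults are assumptions; empty clusters are not handled.
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids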
Let's imagine a school wants to group its students based on their study habits to offer
more personalized support. They collect data on two variables: "Hours Spent Studying
Per Week" and "Number of Assignments Completed On Time Per Week." We'll use a
small, simplified dataset for illustration.
Student | Hours Spent Studying Per Week | Assignments Completed On Time Per Week
A       | 2                             | 1
B       | 3                             | 2
C       | 8                             | 7
D       | 7                             | 8
E       | 2                             | 3
F       | 9                             | 6
Let's say we decide to form k = 2 clusters. We want to find two groups of students.
Step 1: Choose k = 2.
Step 2: Initialize Centroids. Let's randomly pick Student A (2,1) and Student D (7,8) as
our initial centroids, and call them C1 and C2.
Step 3: Assign Data Points (Iteration 1). We calculate the Euclidean distance (a
common way to measure distance in K-Means) from each student to C1 and C2.
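For two points $(x_1, y_1)$ and $(x_2, y_2)$, this distance is $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.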
• Student A (2,1):
◦ Distance to C1 (2,1): 0 (assigned to C1)
◦ Distance to C2 (7,8): $\sqrt{(7-2)^2 + (8-1)^2} = \sqrt{5^2 + 7^2} = \sqrt{25 + 49}
= \sqrt{74} \approx 8.6$
• Student B (3,2):
◦ Distance to C1 (2,1): $\sqrt{(3-2)^2 + (2-1)^2} = \sqrt{1^2 + 1^2} = \sqrt{2}
\approx 1.4$
◦ Distance to C2 (7,8): $\sqrt{(7-3)^2 + (8-2)^2} = \sqrt{4^2 + 6^2} = \sqrt{16 + 36}
= \sqrt{52} \approx 7.2$
◦ Assigned to C1 (1.4 < 7.2)
• Student C (8,7):
◦ Distance to C1 (2,1): $\sqrt{(8-2)^2 + (7-1)^2} = \sqrt{6^2 + 6^2} = \sqrt{36 + 36}
= \sqrt{72} \approx 8.5$
◦ Distance to C2 (7,8): $\sqrt{(8-7)^2 + (7-8)^2} = \sqrt{1^2 + (-1)^2} = \sqrt{1 + 1}
= \sqrt{2} \approx 1.4$
◦ Assigned to C2 (1.4 < 8.5)
• Student D (7,8):
◦ Distance to C1 (2,1): $\sqrt{(7-2)^2 + (8-1)^2} = \sqrt{5^2 + 7^2} = \sqrt{74}
\approx 8.6$
◦ Distance to C2 (7,8): 0 (assigned to C2)
• Student E (2,3):
◦ Distance to C1 (2,1): $\sqrt{(2-2)^2 + (3-1)^2} = \sqrt{0^2 + 2^2} = \sqrt{4} = 2$
◦ Distance to C2 (7,8): $\sqrt{(7-2)^2 + (8-3)^2} = \sqrt{5^2 + 5^2} = \sqrt{25 + 25}
= \sqrt{50} \approx 7.1$
◦ Assigned to C1 (2 < 7.1)
• Student F (9,6):
◦ Distance to C1 (2,1): $\sqrt{(9-2)^2 + (6-1)^2} = \sqrt{7^2 + 5^2} = \sqrt{49 + 25}
= \sqrt{74} \approx 8.6$
◦ Distance to C2 (7,8): $\sqrt{(9-7)^2 + (6-8)^2} = \sqrt{2^2 + (-2)^2} = \sqrt{4 + 4}
= \sqrt{8} \approx 2.8$
◦ Assigned to C2 (2.8 < 8.6)
Current Clusters:
• Cluster 1 (C1): Students A, B, E
• Cluster 2 (C2): Students C, D, F
Step 4: Update Centroids. Each new centroid is the mean of the points currently assigned to its cluster.
• New C1: Average of (2,1), (3,2), (2,3) = ($(2+3+2)/3$, $(1+2+3)/3$) = (7/3, 6/3) ≈ (2.33, 2)
• New C2: Average of (8,7), (7,8), (9,6) = ($(8+7+9)/3$, $(7+8+6)/3$) = (24/3, 21/3) = (8, 7)
Step 5: Repeat (Iteration 2). Now we use the new centroids C1 = (2.33, 2) and C2 = (8, 7) and
repeat the assignment process. Recomputing the distances, Students A, B, and E are still
closest to C1, and Students C, D, and F are still closest to C2.
Notice that the cluster assignments did not change in this iteration. This means the
algorithm has converged. Our final clusters are:
• Cluster 1 (centroid ≈ (2.33, 2)): Students A, B, E, who study fewer hours and complete fewer assignments on time.
• Cluster 2 (centroid (8, 7)): Students C, D, F, who study more hours and complete more assignments on time.
This example demonstrates how K-Means iteratively refines its clusters until a stable
grouping is achieved. In a real-world scenario, with many more data points and
dimensions, this process would be handled by a computer program.
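As a rough illustration of what that looks like, the sketch below runs scikit-learn's KMeans on the same six students; the code and the silhouette check are assumptions added for demonstration, and the 0/1 cluster numbering may differ from the hand-worked C1/C2 labels.

# Sketch: K-Means on the six-student dataset from the worked example.
# Columns: hours studied per week, assignments completed on time per week.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

students = np.array([[2, 1], [3, 2], [8, 7], [7, 8], [2, 3], [9, 6]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(students)

print(model.labels_)           # cluster assignment for students A..F
print(model.cluster_centers_)  # should land near (2.33, 2) and (8, 7)

# Quick sanity check on k: higher silhouette scores mean better-separated clusters.
print(silhouette_score(students, model.labels_))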
Advantages:
• Simple to understand and implement.
• Fast and scales well to large datasets.
• Works well when clusters are roughly spherical and of similar size.
Disadvantages:
• The number of clusters k must be chosen in advance.
• Results depend on the random initial centroids, so different runs can produce different clusters.
• Sensitive to outliers, which can pull centroids away from the true cluster centers.
• Struggles with clusters that are elongated, irregularly shaped, or very different in density.
Hierarchical Clustering
Hierarchical clustering builds clusters step by step, either by repeatedly merging smaller clusters (Agglomerative, bottom-up) or by repeatedly splitting larger ones (Divisive, top-down). We will focus on the Agglomerative approach, as it is more widely used and easier to
understand.
1. Start with Individual Clusters: Each data point begins as its own cluster. If you
have N data points, you start with N clusters.
2. Compute the Proximity Matrix: Calculate the distance between every pair of clusters
(initially, between every pair of data points).
3. Merge Closest Clusters: Identify the two closest (most similar) clusters and merge
them into a new, larger cluster. This reduces the number of clusters by one.
4. Update Proximity Matrix: Recalculate the distances between the new cluster and
all other existing clusters. This step is crucial and depends on the linkage method
used:
◦ Single Linkage (Min): The distance between two clusters is the minimum
distance between any data point in the first cluster and any data point in the other cluster.
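As a rough sketch of the agglomerative procedure with single linkage, the snippet below uses SciPy's hierarchical clustering on the student data from earlier; reusing that data and cutting the tree into two clusters are assumptions for demonstration.

# Sketch: agglomerative clustering with single linkage using SciPy.
# The data and the decision to keep two clusters are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[2, 1], [3, 2], [8, 7], [7, 8], [2, 3], [9, 6]])

# Build the merge tree bottom-up; 'single' uses the minimum pairwise distance
# between clusters, as described above.
Z = linkage(X, method='single', metric='euclidean')

# Cut the tree so that exactly two clusters remain.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)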