ML unit 4

The document discusses K-means clustering, an unsupervised machine learning algorithm that partitions data into K clusters based on similarity. It contrasts K-means with other clustering methods such as hierarchical and density-based clustering, highlighting their differences in approach and applications. Additionally, it outlines the algorithm's steps, advantages, disadvantages, and various applications in fields like customer segmentation and fraud detection.


K-means Clustering

Types of Clustering Algorithms (Comparison):

• Partitional Clustering (e.g., K-Means, K-Medoids)
• Hierarchical Clustering (e.g., Agglomerative, Divisive)
• Density-Based Clustering (e.g., DBSCAN)
• Model-Based Clustering (e.g., Gaussian Mixture Models)
K-Means is a partitional and centroid-based unsupervised clustering algorithm.

Unsupervised Learning is a type of machine learning where the model is trained on data without explicit labels or supervision. Unlike supervised learning, which uses labeled data to train a model, unsupervised learning identifies patterns, structures, and relationships within the data on its own.
Partitional Clustering vs Hierarchical Clustering
Both partitional clustering and hierarchical
clustering are unsupervised machine learning
techniques used for grouping similar data points
into clusters, but they differ in their approach,
algorithms, and applications.
Partitional clustering divides the data into a predefined number of clusters. The goal is to partition the data into a set of disjoint clusters such that data points within the same cluster are more similar to each other than to data points in other clusters.

Hierarchical clustering creates a tree-like structure of clusters, which can be visualized as a dendrogram. This method does not require the number of clusters to be predefined. Instead, it builds a hierarchy of clusters and allows you to choose the desired level of clustering.
Key Concepts of Unsupervised Learning
1. Clustering – Grouping similar data points together.
   Examples:
   • K-Means Clustering: Divides data into K clusters.
   • Hierarchical Clustering: Builds a tree-like hierarchy of clusters.
   • DBSCAN (Density-Based Spatial Clustering): Identifies clusters of varying density.
2. Dimensionality Reduction – Reducing the number of variables while preserving important information.
   Examples:
   • Principal Component Analysis (PCA): Projects data into a lower-dimensional space.
   • t-SNE (t-Distributed Stochastic Neighbor Embedding): Useful for visualizing high-dimensional data.
   • Autoencoders: Neural networks that learn efficient data representations.
Key Concepts of Unsupervised Learning
3. Association Rule Learning – Finding relationships between variables in large datasets.
   Examples:
   • Apriori Algorithm: Used in market basket analysis to identify items that are frequently bought together.
   • Eclat Algorithm: Another rule-mining approach for finding associations.
4. Anomaly Detection – Identifying unusual patterns or outliers in data.
   Examples:
   • Isolation Forest: Detects outliers by isolating data points.
   • One-Class SVM (Support Vector Machine): Learns the normal behavior of data and detects deviations.
Applications of Unsupervised Learning

• Customer Segmentation (e.g., e-commerce, marketing)
• Fraud Detection (e.g., banking, cybersecurity)
• Recommendation Systems (e.g., Netflix, Amazon)
• Medical Diagnosis (e.g., identifying diseases from medical images)
• Image and Speech Recognition
K-means:
• The K-means algorithm clusters n objects into k partitions (k < n) based on their attributes.
• K-Means clustering is an unsupervised clustering technique.
• It is a partition-based clustering algorithm.
• A cluster is defined as a group of objects that belong to the same class.
• Definition:
  K-Means is an unsupervised machine learning algorithm used for clustering data into K distinct groups based on similarity.
• User-defined Parameter (K):
  The number of clusters K is specified by the user before running the algorithm.
• Goal of the Algorithm:
  To minimize intra-cluster distance (homogeneity within clusters) and maximize inter-cluster distance (differences between clusters).
• Cluster Representation:
  Each cluster is represented by its centroid, which is the mean of the data points in that cluster.
• Algorithm Steps:
  1. Initialize K centroids randomly.
  2. Assign each data point to the nearest centroid.
  3. Recalculate the centroids by averaging the points in each cluster.
  4. Repeat the assign–recalculate steps until the centroids do not change (convergence).
• Distance Measure:
  Commonly uses Euclidean distance to compute similarity between data points and centroids.
• Convergence Criteria:
  The algorithm stops when data points no longer change clusters or after a predefined number of iterations.
• Applications:
  Used in market segmentation, image compression, document clustering, anomaly detection, etc.
K-Means Clustering Algorithm
The K-Means clustering algorithm involves the following steps-
Step-01:
• Choose the number of clusters K.
Step-02:
• Randomly select any K data points as cluster centers.
• Select cluster centers in such a way that they are as far apart from each other as possible.
Step-03:
• Calculate the distance between each data point and each cluster center.
• The distance may be calculated either by using a given distance function or by using the Euclidean distance formula.
K-Means Clustering Algorithm
Step-04:
• Assign each data point to a cluster.
• A data point is assigned to the cluster whose center is nearest to it.
Step-05:
• Re-compute the center of each newly formed cluster.
• The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
Step-06:
• Keep repeating Step-03 to Step-05 until any of the following stopping criteria is met-
  • Centers of newly formed clusters do not change
  • Data points remain in the same cluster
  • The maximum number of iterations is reached
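As a rough illustration of Steps 01-06, here is a minimal NumPy sketch (our own, not taken from the slides). The function name kmeans and the stopping test are our choices, and it is run on the five points A-E used in the worked example that follows.

# Minimal sketch of the K-Means steps above (illustrative only; assumes no cluster becomes empty)
import numpy as np

def kmeans(X, k, initial_centers, max_iter=100):
    centers = np.asarray(initial_centers, dtype=float)
    for _ in range(max_iter):
        # Step-03/04: assign each point to the nearest center (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step-05: re-compute each center as the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step-06: stop when the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Points A-E from the worked example below, with A and C as the initial centers
X = np.array([[2, 2], [3, 2], [1, 1], [3, 1], [1.5, 0.5]])
labels, centers = kmeans(X, k=2, initial_centers=[[2, 2], [1, 1]])
print(labels)   # cluster index assigned to A, B, C, D, E
print(centers)  # final cluster centers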
Squared Error criteria
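The squared-error criterion referred to above is the objective that K-Means minimizes: the sum of squared distances between every point and the centroid of its cluster. In standard notation (reconstructed here, since the original slide content is not included):

J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

where C_k is the k-th cluster, \mu_k is its centroid, and K is the number of clusters. A lower J means the clusters are more compact.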
Flowchart
Example
Use the K-Means algorithm to create two clusters from the points A(2, 2), B(3, 2), C(1, 1), D(3, 1) and E(1.5, 0.5).

Solution-
• We follow the K-Means clustering algorithm discussed above.
• Assume A(2, 2) and C(1, 1) are the centers of the two clusters.
Iteration-01:
• We calculate the distance of each point from each of the two cluster centers.
• The distance is calculated by using the Euclidean distance formula.
The following illustration shows the calculation of the distance between point A(2, 2) and each of the two cluster centers.
Calculating the distance between A(2, 2) and C1(2, 2)-
ρ(A, C1) = sqrt[(x2 – x1)² + (y2 – y1)²]
         = sqrt[(2 – 2)² + (2 – 2)²]
         = sqrt[0 + 0]
         = 0
Calculating the distance between A(2, 2) and C2(1, 1)-
ρ(A, C2) = sqrt[(x2 – x1)² + (y2 – y1)²]
         = sqrt[(1 – 2)² + (1 – 2)²]
         = sqrt[1 + 1]
         = sqrt[2]
         = 1.41
• In a similar manner, we calculate the distance of the other points from each of the two cluster centers.

From here, the new clusters are-

Cluster-01:
• The first cluster contains the points A(2, 2), B(3, 2), D(3, 1)
Cluster-02:
• The second cluster contains the points C(1, 1), E(1.5, 0.5)
Now,
• We re-compute the new cluster centers.
• The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
• Center of Cluster-01
• = ((2 + 3 + 3)/3, (2 + 2 + 1)/3)
• = (2.67, 1.67)

For Cluster-02:
• Center of Cluster-02
• = ((1 + 1.5)/2, (1 + 0.5)/2)
• = (1.25, 0.75)
This completes Iteration-01.
Next,
• We go to Iteration-02, Iteration-03, and so on until the centers do not change anymore.
Iteration-02:

Given point    Distance from C1 (2.67, 1.67)    Distance from C2 (1.25, 0.75)    Belongs to cluster
A(2, 2)        0.75                             1.46                             C1
B(3, 2)        0.47                             2.15                             C1
C(1, 1)        1.80                             0.35                             C2
D(3, 1)        0.75                             1.77                             C1
E(1.5, 0.5)    1.65                             0.35                             C2

From here, New clusters are-


Cluster-01:
First cluster contains points-A(2, 2), B(3, 2), D(3, 1)
Cluster-02:
Second cluster contains points-C(1, 1), E(1.5, 0.5)
Here,
Cluster elements are same as in the previous iteration then stop the process.
K-means Advantages
• Relatively simple to implement.
• Scales to large data sets.
• Guarantees convergence (to a local optimum).
• Easily adapts to new examples.
• Can be generalized to clusters of different shapes and sizes, such as elliptical clusters.
K-means Disadvantages
• It requires the number of clusters (k) to be specified in advance.
• It cannot handle noisy data and outliers.
• It is not suitable for identifying clusters with non-convex shapes.
Example-2
Suppose we have a dataset of customer spending habits, and we want to group the customers into K = 3 clusters based on their annual income and spending score.

Customer ID    Annual Income (k$)    Spending Score
1              15                    39
2              16                    81
3              17                    6
4              18                    77
5              19                    40
6              20                    76
7              21                    6
8              22                    94
9              23                    3
10             24                    99
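A minimal scikit-learn sketch for this exercise could look as follows; the random_state and n_init values are our choices for reproducibility and are not part of the original problem.

# Sketch: K-Means with K=3 on the customer data above
import numpy as np
from sklearn.cluster import KMeans

# [annual income (k$), spending score] for customers 1-10
X = np.array([
    [15, 39], [16, 81], [17, 6], [18, 77], [19, 40],
    [20, 76], [21, 6], [22, 94], [23, 3], [24, 99],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(labels)                   # cluster index assigned to each customer
print(kmeans.cluster_centers_)  # centroid of each of the 3 clusters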
Exercise Problem
Challenges in Unsupervised Learning
• The number of clusters is normally not known a priori.
• For clustering algorithms such as K-means, different initial centers may lead to different clustering results; moreover, K is unknown.
• Time complexity - partitional clustering algorithms are O(N), whereas hierarchical algorithms are O(N²).
• The similarity criterion is not clear - should we use Euclidean, Manhattan, or Hamming distance? (See the sketch after this list.)
• In hierarchical clustering, at what stage should we stop?
• Evaluating clustering results is difficult because labels are not available at the beginning.
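As a small illustration of the similarity-criterion question above, the sketch below (ours, not from the slides) computes the distance between the same pair of points under three different measures; the sample vectors are arbitrary.

# Sketch: the same two points under three distance measures
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, hamming

a = np.array([2.0, 2.0, 1.0, 0.0])
b = np.array([1.0, 1.0, 1.0, 1.0])

print(euclidean(a, b))   # sqrt(1 + 1 + 0 + 1) = sqrt(3) ≈ 1.73
print(cityblock(a, b))   # |2-1| + |2-1| + 0 + |0-1| = 3  (Manhattan)
print(hamming(a, b))     # fraction of positions that differ = 3/4 = 0.75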
Hierarchical clustering
• The hierarchical clustering methods are used to group the data into a hierarchy or tree-like structure.
• For example, in a machine learning problem of organizing
employees of a university in different departments, first the
employees are grouped under the different departments in the
university, and then within each department, the employees
can be grouped according to their roles such as professors,
assistant professors, supervisors, lab assistants, etc. This
creates a hierarchical structure of the employee data and eases
visualization and analysis.
Types of Hierarchical Clustering
There are two types of hierarchical clustering:
1. Agglomerative clustering
2. Divisive clustering

Types of Hierarchical Clustering
• Agglomerative clustering is a type of hierarchical clustering algorithm. It is an unsupervised machine learning technique that divides the population into several clusters such that data points in the same cluster are more similar and data points in different clusters are dissimilar.
• Points in the same cluster are closer to each other.
• Points in different clusters are far apart.
• On the other hand, the divisive method starts with one cluster containing all the given objects and then splits it iteratively to form smaller clusters.
• The agglomerative hierarchical clustering method uses the bottom-up strategy. It starts with each object forming its own cluster and then iteratively merges the clusters according to their similarity to form larger clusters. It terminates either when a certain clustering condition imposed by the user is achieved or when all the clusters merge into a single cluster.
Some Pros and Cons of Hierarchical Clustering
Pros
• No assumption of a particular number of clusters (unlike k-means)
• May correspond to meaningful taxonomies
Cons
• Once a decision is made to combine two clusters, it cannot be undone
• Too slow for large data sets: O(n² log n)
Agglomerative Clustering: It uses a bottom-up approach. It starts with each object forming its own cluster and then iteratively merges the clusters according to their similarity to form larger clusters. It terminates either
• when a certain clustering condition imposed by the user is achieved, or
• when all clusters merge into a single cluster.
Variants of agglomerative methods:
1. Agglomerative Algorithm: Single Link
• Single-nearest distance, or single linkage, is the agglomerative method that uses the distance between the closest members of the two clusters.
Question. Find the clusters using the single link technique. Use Euclidean distance and draw the dendrogram.

Sample No.    X       Y
P1            0.40    0.53
P2            0.22    0.38
P3            0.35    0.32
P4            0.26    0.19
P5            0.08    0.41
P6            0.45    0.30
Step 2: Merge the two closest members of the two clusters and find the minimum element in the distance matrix. Here the minimum value is 0.10, and hence we combine P3 and P6 (as 0.10 appears in the P6 row and P3 column).
Now, form a cluster of the elements corresponding to the minimum value and update the distance matrix. To update the distance matrix:
min ((P3,P6), P1) = min ((P3,P1), (P6,P1)) = min (0.22, 0.24) = 0.22
min ((P3,P6), P2) = min ((P3,P2), (P6,P2)) = min (0.14, 0.24) = 0.14
min ((P3,P6), P4) = min ((P3,P4), (P6,P4)) = min (0.13, 0.22) = 0.13
min ((P3,P6), P5) = min ((P3,P5), (P6,P5)) = min (0.28, 0.39) = 0.28
Now we repeat the same process: merge the two closest members of the two clusters and find the minimum element in the distance matrix. The minimum value is 0.13, and hence we combine (P3, P6) and P4. Now, form a cluster of the elements corresponding to the minimum value and update the distance matrix:

min (((P3,P6), P4), P1) = min (((P3,P6), P1), (P4,P1)) = min (0.22, 0.37) = 0.22
min (((P3,P6), P4), P2) = min (((P3,P6), P2), (P4,P2)) = min (0.14, 0.19) = 0.14
min (((P3,P6), P4), P5) = min (((P3,P6), P5), (P4,P5)) = min (0.28, 0.23) = 0.23
Again repeating the same process: the minimum value is 0.14, and hence we combine P2 and P5. Now, form a cluster of the elements corresponding to the minimum value and update the distance matrix:
min ((P2,P5), P1) = min ((P2,P1), (P5,P1)) = min (0.23, 0.34) = 0.23
min ((P2,P5), (P3,P6,P4)) = min ((P2,(P3,P6,P4)), (P5,(P3,P6,P4))) = min (0.14, 0.23) = 0.14
Again repeating the same process: the minimum value is 0.14, and hence we combine (P2, P5) and (P3, P6, P4). Now, form a cluster of the elements corresponding to the minimum value and update the distance matrix:
min ((P2,P5,P3,P6,P4), P1) = min (((P2,P5), P1), ((P3,P6,P4), P1)) = min (0.23, 0.22) = 0.22
We have now reached the solution; the dendrogram for this question is as follows:
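For reference, a short SciPy sketch (ours) reproduces this single-link example and draws the dendrogram; the points P1-P6 are those given in the question, and Euclidean distance matches the exercise.

# Sketch: single-linkage agglomerative clustering of P1-P6 with a dendrogram
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# P1..P6 from the question
points = np.array([
    [0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
    [0.26, 0.19], [0.08, 0.41], [0.45, 0.30],
])

# Single-linkage (nearest-member) clustering with Euclidean distance
Z = linkage(points, method='single', metric='euclidean')
dendrogram(Z, labels=['P1', 'P2', 'P3', 'P4', 'P5', 'P6'])
plt.show()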
DBSCAN Clustering
• There are different approaches and algorithms to
perform clustering tasks which can be divided
into three sub-categories:
1. Partition-based clustering: E.g. k-means, k-
median
2. Hierarchical clustering: E.g. Agglomerative,
Divisive
3. Density-based clustering: E.g. DBSCAN
Density-based clustering
• Partition-based and hierarchical clustering techniques are highly efficient with normally shaped clusters. However, when it comes to arbitrarily shaped clusters or detecting outliers, density-based techniques are more efficient.
• For example, the dataset in the figure below can easily be divided into three clusters using the k-means algorithm.

(Figure: k-means clustering)

Consider the following figures: the data points in these figures are grouped in arbitrary shapes or include outliers. Density-based clustering algorithms are very efficient at finding high-density regions and outliers. It is very important to detect outliers for some tasks, e.g. anomaly detection.
DBSCAN Algorithm
• DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is able to find arbitrarily shaped clusters and clusters with noise (i.e. outliers).
• In DBSCAN, instead of guessing the number of clusters, we define two hyperparameters, epsilon and minPoints, to arrive at clusters.
• Epsilon (ε): The distance that specifies the neighborhoods. Two points are considered to be neighbors if the distance between them is less than or equal to epsilon.
• minPoints (n): The minimum number of data points required to define a cluster.
DBSCAN Algorithm
Based on the epsilon (ε) and minPoints (n) parameters, points are classified as core, border, and outlier (noise) points:
• Core point: A point is a core point if there are at least minPoints points (including the point itself) in its surrounding area with radius epsilon.
• Border point: A point is a border point if it is reachable from a core point and there are fewer than minPoints points within its surrounding area.
• Outlier or noise point: A point is an outlier if it is not a core point and is not reachable from any core point.
DBSCAN Algorithm
• These points may be better explained with visualizations.
Density connected
Three terms are necessary in order to understand DBSCAN:
• Direct density reachable: A point is called direct
density reachable if it has a core point in its
neighbourhood.
• Density Connected: Two points are called density
connected if there is a core point which is density
reachable from both the points.
• Density Reachable: A point is called density reachable
from another point if they are connected through a series
of core points.
Evaluation Metrics of DBSCAN
• We will use the Silhouette score and the Adjusted Rand score for evaluating clustering algorithms.
• The Silhouette score is in the range of -1 to 1. A score near 1 is best, meaning that a data point is very compact within the cluster to which it belongs and far away from the other clusters. Values near 0 denote overlapping clusters.
• The Adjusted Rand score is in the range of 0 to 1. More than 0.9 denotes excellent cluster recovery, and above 0.8 is a good recovery. Less than 0.5 is considered to be poor recovery.
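A small scikit-learn sketch (ours) shows how these two metrics are typically computed; the synthetic make_blobs dataset and the DBSCAN parameters are illustrative choices only.

# Sketch: silhouette score (internal) and adjusted Rand score (external) for DBSCAN
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score, adjusted_rand_score

X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print(silhouette_score(X, labels))          # needs only the data and predicted labels: -1 (bad) to 1 (good)
print(adjusted_rand_score(y_true, labels))  # compares predicted labels with ground truth, when available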
DBSCAN
Pros
• DBSCAN does not require a pre-set number of clusters, unlike many other clustering algorithms.
• It identifies outliers as noise, unlike the Mean-Shift method, which forces such points into a cluster in spite of their different characteristics.
• It finds arbitrarily shaped and sized clusters quite well.
Cons
• It is not very effective when you have clusters of varying densities.
• If you have high-dimensional data, determining the distance threshold ε becomes a challenging task.
DBSCAN Algorithm

Step 1: Label core points and noise points
▪ Select a random starting point, say x.
▪ Identify the neighborhood of point x using the radius ε.
▪ Count the number of points, say k, in this neighborhood, including point x.
▪ If k >= MinPts, then mark x as a core point; else mark x as a noise point.
▪ Select a new unvisited point and repeat the above steps.
Step 2: Check whether a noise point can become a border point
▪ If a noise point is directly density reachable (that is, within the ε-neighborhood of a core point), mark it as a border point; it will form part of the cluster.
▪ A point which is neither a core point nor a border point is marked as noise.
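As a rough illustration of the two labelling steps above, here is a minimal NumPy sketch (ours, not an optimized DBSCAN implementation; it only assigns core/border/noise labels). The function name label_points and the sample points are chosen purely for illustration.

# Sketch: label points as core, border, or noise given eps and MinPts
import numpy as np

def label_points(X, eps, min_pts):
    n = len(X)
    # pairwise Euclidean distances
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dist <= eps
    # Step 1: core if the eps-neighborhood (including the point itself) has at least min_pts points
    is_core = neighbors.sum(axis=1) >= min_pts
    labels = np.where(is_core, 'core', 'noise').astype(object)
    # Step 2: a non-core point within eps of some core point is directly density reachable -> border
    for i in range(n):
        if not is_core[i] and np.any(is_core & neighbors[i]):
            labels[i] = 'border'
    return labels

X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.0], [1.1, 0.9], [1.6, 1.0], [5, 5]])
print(label_points(X, eps=0.5, min_pts=3))  # ['core' 'core' 'core' 'core' 'border' 'noise']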
Problem
Dimensionality reduction

Dimensionality reduction is a technique used to reduce the number of features in a dataset while retaining as much of the important information as possible. In other words, it is a process of transforming high-dimensional data into a lower-dimensional space that still preserves the essence of the original data.
• In machine learning, high-dimensional data
refers to data with a large number of features or
variables.
• The curse of dimensionality is a common
problem in machine learning, where the
performance of the model deteriorates as the
number of features increases.
• This is because the complexity of the model
increases with the number of features, and it
becomes more difficult to find a good solution.
• In addition, high-dimensional data can also lead
to overfitting, where the model fits the training
data too closely and does not generalize well to
new data.
Dimensionality reduction can help to mitigate these
problems by reducing the complexity of the model
and improving its generalization performance.
There are two main approaches to dimensionality
reduction: feature selection and feature extraction.
Feature Selection
Feature selection involves selecting a subset of the original features that are most relevant to the problem at hand. The goal is to reduce the dimensionality of the dataset while retaining the most important features.

There are several methods for feature selection, including filter methods, wrapper methods, and embedded methods. Filter methods rank the features based on their relevance to the target variable, wrapper methods use the model performance as the criterion for selecting features, and embedded methods combine feature selection with the model training process. A sketch of a simple filter method follows.
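As an illustration of a filter-style method, the sketch below (ours) uses scikit-learn's SelectKBest; the Iris dataset and the choice of k are our own and serve only as an example.

# Sketch: filter-method feature selection that keeps the k most relevant features
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)         # 4 original features

selector = SelectKBest(score_func=f_classif, k=2)  # rank features by ANOVA F-score vs. the target
X_reduced = selector.fit_transform(X, y)           # keep the 2 most relevant features

print(selector.scores_)                   # relevance score of each original feature
print(X_reduced.shape)                    # (150, 2)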
Feature Extraction:
Feature extraction involves creating new features by combining or transforming the original features. The goal is to create a set of features that captures the essence of the original data in a lower-dimensional space.

There are several methods for feature extraction, including principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE).

PCA is a popular technique that projects the original features onto a lower-dimensional space while preserving as much of the variance as possible.
A 3-D classification problem can be hard to visualize,
whereas a 2-D one can be mapped to a simple 2-dimensional
space, and a 1-D problem to a simple line. The below figure
illustrates this concept, where a 3-D feature space is split into
two 2-D feature spaces, and later, if found to be correlated,
the number of features can be reduced even further.
Components of dimensionality reduction:
• Feature selection: In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem. It usually involves three ways:
  • Filter
  • Wrapper
  • Embedded
• Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a space with a lesser number of dimensions.
Methods of Dimensionality Reduction
The various methods used for dimensionality reduction
include:
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Generalized Discriminant Analysis (GDA)
• Dimensionality reduction is the process of reducing the number of features in a dataset while retaining as much information as possible. This can be done to reduce the complexity of a model, improve the performance of a learning algorithm, or make it easier to visualize the data.
• Techniques for dimensionality reduction include principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA).
• Each technique projects the data onto a lower-dimensional space while preserving important information.
• Dimensionality reduction is performed during the pre-processing stage, before building a model, to improve performance.
Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction technique that aims to project data onto a lower-dimensional space while maximizing the separation between the different classes within the data, essentially making it easier to classify data points by emphasizing the features that best differentiate between the classes.

It achieves this by finding linear combinations of features that maximize class separability, making it particularly useful for classification tasks where class labels are available.
Linear Discriminant Analysis (LDA) is a technique used to reduce the
number of dimensions (or features) in data, while preserving the
information that helps in distinguishing between different classes
or categories.
Imagine you have data points representing different groups (e.g.,
"spam" and "not spam" emails). These data points have features
like words, email length, etc. Now, LDA tries to find a simpler
representation of the data, reducing the number of features while
keeping the key differences between "spam" and "not spam."
Here's how it works:
1. Maximizing Separation: It tries to make the difference between
classes as big as possible in the new, reduced space.
2. Combining Features: LDA doesn't just look at each individual
feature. It combines the features in such a way that the new
combination of features best separates the different classes.
3. Projection to Lower Dimensions: The result is that you end up
with fewer features, but the key information that helps distinguish
between classes is still there.
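A minimal scikit-learn sketch (ours) of LDA used in this way; the Iris dataset and n_components=2 are illustrative choices, not part of the original notes.

# Sketch: LDA as supervised dimensionality reduction (labels are required)
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)      # 4 features, 3 classes

lda = LinearDiscriminantAnalysis(n_components=2)  # with 3 classes, at most 2 discriminant axes
X_lda = lda.fit_transform(X, y)        # project onto the most class-separating directions

print(X_lda.shape)                     # (150, 2)
print(lda.explained_variance_ratio_)   # share of between-class variance per component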
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a dimensionality reduction technique used to reduce the number of features (variables) in a dataset by transforming them into a new set of features called principal components. These components retain most of the original data's variance while being uncorrelated and independent.

Motivation for PCA
• Real-world datasets have many features that may be highly correlated. For example, height and weight are often related: greater height often implies more weight.
• Such correlations add redundancy, making ML models less efficient.
• PCA helps by removing correlation and retaining the essential information in fewer dimensions.

Goals of PCA
• Reduce the dimensionality of the data.
• Maximize the variance captured in the new features.
• Generate orthogonal (independent) features.
• Preserve the original data structure as much as possible in fewer dimensions.
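A minimal scikit-learn sketch (ours) along these lines; the Iris dataset, the standardization step, and n_components=2 are our illustrative choices.

# Sketch: PCA reducing 4 correlated features to 2 uncorrelated principal components
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)             # PCA is unsupervised: labels are not used

X_scaled = StandardScaler().fit_transform(X)  # standardize so no single feature dominates the variance

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                      # (150, 2)
print(pca.explained_variance_ratio_)    # fraction of total variance kept by each component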
Comparing and Evaluating Clustering Algorithms

Clustering is an unsupervised learning technique used to group similar data points. Different clustering algorithms exist, each with its strengths and weaknesses. Comparing and evaluating them requires considering factors like scalability, cluster shapes, speed, and accuracy.

REFERENCE

https://www.tutorialspoint.com/scikit_learn/scikit_learn_clustering_performance_evaluation.htm
Thank you
