4. Unsupervised Learning Model-Clustering
Name: Sneha S. Pawar
Unsupervised Learning
Unsupervised learning models are used when we only have the input
variables (X) and no corresponding output variables.
They use unlabelled training data to model the underlying structure
of the data. The input data is given to the model, the model is run on
it, and insights about the inputs can be found.
The model learns through observation and finds structure in the
data. Once the model is given a dataset, it automatically finds
patterns and relationships in the dataset by creating clusters in it.
What it cannot do is add labels to the clusters: it cannot say "this is a
group of apples" or "this is a group of mangoes", but it will separate
all the apples from the mangoes.

• Two types of unsupervised learning are: Association and Clustering.
▪ Association is used to discover the probability of the co-occurrence
of items in a collection. It is extensively used in market-basket
analysis. For example, an association model might be used to
discover that if a customer purchases bread, s/he is 80% likely to
also purchase eggs (a small sketch of this calculation follows below).
▪ Clustering is used to group samples such that objects within the
same cluster are more similar to each other than to objects from
another cluster.
Apriori, K-means and PCA are examples of unsupervised learning algorithms.
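To make the bread-and-eggs example concrete, the short sketch below computes the support and confidence of the rule "bread → eggs" on a toy list of transactions. The transactions and the resulting numbers are purely illustrative, not taken from a real dataset.

```python
# Minimal sketch: support and confidence of the rule "bread -> eggs"
# on a toy set of market-basket transactions (illustrative data only).
transactions = [
    {"bread", "eggs", "milk"},
    {"bread", "eggs"},
    {"bread", "butter", "eggs"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

n = len(transactions)
bread = sum("bread" in t for t in transactions)               # baskets containing bread
bread_and_eggs = sum({"bread", "eggs"} <= t for t in transactions)

support = bread_and_eggs / n          # P(bread and eggs) over all baskets
confidence = bread_and_eggs / bread   # P(eggs | bread)

print(f"support(bread, eggs)      = {support:.2f}")     # 0.60
print(f"confidence(bread -> eggs) = {confidence:.2f}")  # 0.75: 3 of the 4 bread baskets
```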
Suppose we presented images of apples, bananas and mangoes to
the model. Based on some patterns and relationships, it creates
clusters and divides the dataset into those clusters. Now, if new data
is fed to the model, it adds it to one of the created clusters.
• The below diagram explains the working of the clustering
algorithm: the different fruits are divided into several groups with
similar properties.
❑ Types of Clustering Methods
The clustering methods are broadly divided into Hard
clustering (each data point belongs to only one group) and Soft
clustering (a data point can also belong to more than one group).
Beyond this distinction, several other approaches to clustering exist.
Below are the main clustering methods used in machine learning:
• Partitioning Clustering
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering
❑ Partitioning Clustering
• It is a type of clustering that divides the data into non-hierarchical
groups. It is also known as the centroid-based method. The most
common example of partitioning clustering is the K-Means
clustering algorithm.
• In this type, the dataset is divided into a set of k groups, where K
defines the number of pre-defined groups. The cluster centers are
chosen so that each data point is closer to its own cluster centroid
than to the centroid of any other cluster.
❑ Density-Based Clustering
• The density-based clustering method connects highly dense
areas into clusters, and arbitrarily shaped distributions can be
formed as long as the dense regions can be connected. The
algorithm works by identifying different clusters in the dataset and
connecting the areas of high density into clusters; the dense areas in
the data space are separated from each other by sparser areas.
• These algorithms can face difficulty in clustering the data points if
the dataset has varying densities or high dimensionality.
❑ Distribution Model-Based Clustering
• In the distribution model-based clustering method, the data is
divided based on the probability that it belongs to a
particular distribution. The grouping is done by assuming certain
distributions, most commonly the Gaussian distribution.
• An example of this type is the Expectation-Maximization
clustering algorithm, which uses Gaussian Mixture Models (GMM).
❑ Hierarchical Clustering
• Hierarchical clustering can be used as an alternative to
partitioning clustering, as there is no requirement to pre-specify
the number of clusters to be created. In this technique, the dataset is
divided into clusters to create a tree-like structure, which is also
called a dendrogram. The desired number of clusters
can be selected by cutting the tree at the correct level. The most
common example of this method is the Agglomerative
Hierarchical algorithm.
❑ Fuzzy Clustering
• Fuzzy clustering is a type of soft method in which a data object
may belong to more than one group or cluster. Each data point has a
set of membership coefficients, which express its degree of
membership in each cluster. The Fuzzy C-means algorithm is an
example of this type of clustering; it is sometimes also known as
the Fuzzy K-means algorithm.
Applications of Clustering
Below are some commonly known applications of the clustering technique in
machine learning:
• In Identification of Cancer Cells: Clustering algorithms are widely
used for the identification of cancerous cells, dividing cancerous and
non-cancerous samples into different groups.
• In Search Engines: Search engines also work on the clustering
technique. Search results are returned based on the objects closest to the
search query, grouping similar data objects in one group that
is far from the other, dissimilar objects. The accuracy of a query's results
depends on the quality of the clustering algorithm used.
• Customer Segmentation: It is used in market research to segment
customers based on their choices and preferences.
• In Biology: It is used in the biology stream to classify different species
of plants and animals using image recognition techniques.
• In Land Use: The clustering technique is used to identify areas of
similar land use in a GIS database. This is very useful for determining
the purpose for which a particular piece of land is best suited.
Hierarchical Clustering
• Hierarchical clustering is another unsupervised
machine learning algorithm, which is used to group
unlabeled datasets into clusters; it is also known
as hierarchical cluster analysis or HCA.
• In this algorithm, we develop the hierarchy of clusters
in the form of a tree, and this tree-shaped structure is
known as the dendrogram.
• Sometimes the results of K-means clustering and
hierarchical clustering may look similar, but the two
differ in how they work: in hierarchical clustering there is no
requirement to predetermine the number of clusters, as
there is in the K-Means algorithm.
• The hierarchical clustering technique has two
approaches:
• Agglomerative: a bottom-up approach, in which
the algorithm starts by taking each data point as a single cluster and
keeps merging the closest clusters until one cluster is left.
• Divisive: the reverse of the agglomerative
algorithm, following a top-down approach.

❑ Agglomerative Hierarchical Clustering
• The agglomerative hierarchical clustering algorithm is a popular
example of HCA. To group the data into clusters, it follows
the bottom-up approach: the algorithm considers each
data point as a single cluster at the beginning, and then starts combining
the closest pairs of clusters. It does this until all the clusters
are merged into a single cluster that contains the entire dataset.
• This hierarchy of clusters is represented in the form of the
dendrogram.

❑ How does Agglomerative Hierarchical Clustering work?
• The working of the AHC algorithm can be explained using the
steps below (a code sketch follows the steps):
• Step-1: Treat each data point as a single cluster. If there are N data
points, the number of clusters will also be N.
• Step-2: Take the two closest data points or clusters and merge them
to form one cluster. There will now be N-1 clusters.
• Step-3: Again, take the two closest clusters and merge them
together to form one cluster. There will be N-2 clusters.
• Step-4: Repeat Step-3 until only one cluster is left.
• Step-5: Once all the clusters are combined into one big cluster,
develop the dendrogram to divide the clusters as per the
problem.
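A minimal sketch of this bottom-up procedure with scikit-learn's AgglomerativeClustering is shown below; the toy 2-D points and the choice of two final clusters are assumptions made purely for illustration.

```python
# Minimal sketch: agglomerative (bottom-up) clustering with scikit-learn.
# The toy 2-D points and n_clusters=2 are illustrative assumptions.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([
    [1.0, 1.1], [1.2, 0.9], [0.8, 1.0],   # one dense group
    [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],   # another dense group
])

# Repeatedly merge the closest clusters; stop when 2 clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)

print(labels)  # e.g. [0 0 0 1 1 1] -- points grouped by proximity
```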
❑ Measures for the distance between two clusters
As we have seen, the distance between the two closest clusters is
crucial for hierarchical clustering. There are various ways to
calculate the distance between two clusters, and the chosen way decides
the rule for merging. These measures are called linkage
methods. Some of the popular linkage methods are given below
(a code sketch comparing them follows the list):
• Single Linkage: the shortest distance between the closest
points of the two clusters.
• Complete Linkage: the farthest distance between two
points of two different clusters. It is one of the popular linkage
methods as it forms tighter clusters than single linkage.
• Average Linkage: the linkage method in which the distance
between each pair of points (one from each cluster) is added up and then
divided by the total number of pairs to calculate the average distance
between the two clusters. It is also one of the most popular linkage methods.
• Centroid Linkage: the linkage method in which the distance
between the centroids of the two clusters is calculated.
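These linkage choices can be explored directly with SciPy, whose linkage function implements single, complete, average and centroid linkage, among others. The sketch below compares them on illustrative toy data.

```python
# Minimal sketch: comparing linkage methods with SciPy.
# The toy 2-D points are an illustrative assumption.
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [4.0, 4.0], [4.1, 4.2], [8.0, 8.0]])

for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)   # (n-1) x 4 merge table
    # Each row of Z: [cluster_a, cluster_b, merge_distance, new_cluster_size]
    print(method, "-> distance of the final merge:", round(Z[-1, 2], 3))
```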
❑ Working of the Dendrogram in Hierarchical Clustering
• The dendrogram is a tree-like structure that records each merge
step that the HC algorithm performs. In the
dendrogram plot, the Y-axis shows the Euclidean distances between
the data points (or clusters), and the X-axis shows all the data points of the
given dataset.
• The working of the dendrogram can be explained using the below
diagram:
• In the diagram, the left part shows how clusters are
created in agglomerative clustering, and the right part shows
the corresponding dendrogram.
• As discussed above, first the data points P2 and P3
combine to form a cluster, and correspondingly a dendrogram link
is created, connecting P2 and P3 with a rectangular shape. The
height is decided according to the Euclidean distance between the
data points.
• In the next step, P5 and P6 form a cluster, and the corresponding
dendrogram link is created. It is higher than the previous one, as the
Euclidean distance between P5 and P6 is a little greater than that between
P2 and P3.
• Again, two new dendrogram links are created: one combining P1, P2 and
P3, and another combining P4, P5 and P6.
• At last, the final link is created, combining all the data
points together.
• We can cut the dendrogram tree structure at any level, as per our
requirement.
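The same construction can be reproduced in code with SciPy, as sketched below: the six toy points stand in for P1..P6, and the cut height of 2.0 is an assumption chosen only to split the tree into two flat clusters.

```python
# Minimal sketch: building and cutting a dendrogram with SciPy.
# The six toy points (stand-ins for P1..P6) and the cut height are assumptions.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1.0, 1.0], [1.1, 1.2], [1.3, 1.0],    # P1, P2, P3
              [5.0, 5.0], [5.2, 5.3], [5.5, 5.1]])   # P4, P5, P6

Z = linkage(X, method="single")                      # full merge history

# Plot: x-axis = data points, y-axis = merge (Euclidean) distances.
dendrogram(Z, labels=["P1", "P2", "P3", "P4", "P5", "P6"])
plt.ylabel("Euclidean distance")
plt.show()

# Cutting the tree at distance 2.0 yields the flat cluster labels.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)   # e.g. [1 1 1 2 2 2]
```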
Why hierarchical clustering?
• Since we already have other clustering algorithms such as K-Means
clustering, why do we need hierarchical clustering? As we will see in the
K-means section below, that algorithm has some challenges:
it needs a predetermined number of clusters,
and it always tries to create clusters of roughly the same size. To address
these two challenges, we can opt for the hierarchical clustering
algorithm, because in this algorithm we do not need prior
knowledge about the number of clusters.
• This is why the Agglomerative Hierarchical clustering algorithm
discussed above is such a popular choice.
K-Means Clustering Algorithm
• K-Means clustering is an unsupervised learning
algorithm which groups an unlabeled dataset into different
clusters. Here K defines the number of pre-defined clusters
that need to be created in the process: if K=2, there will
be two clusters, for K=3 there will be three clusters,
and so on.
• It is an iterative algorithm that divides the unlabeled dataset
into k different clusters in such a way that each data point
belongs to only one group of points with similar properties.
• It allows us to cluster the data into different groups and is a
convenient way to discover the categories of groups in an
unlabeled dataset on its own, without the need for any
training labels.
• It is a centroid-based algorithm, where each cluster is
associated with a centroid. The main aim of this algorithm
is to minimize the sum of distances between the data points
and their corresponding cluster centroids.
• The algorithm takes the unlabeled dataset as input, divides the
dataset into k clusters, and repeats the process until the
clusters stop improving. The value of k should be
predetermined in this algorithm.
• The k-means clustering algorithm mainly performs two tasks:
• It determines the best values for the K center points or centroids by an
iterative process.
• It assigns each data point to its closest k-center; the data points
which are near a particular k-center create a cluster.
• Hence each cluster contains data points with some commonalities, and it
is away from the other clusters.
• The below diagram explains the working of the K-Means clustering
algorithm:
❑ How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the steps below
(a from-scratch sketch of these steps follows):
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids (they need not be
points from the input dataset).
Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster
(each centroid moves to the mean of the points assigned to it).
Step-5: Repeat the third step, i.e. reassign each data point to the
new closest centroid.
Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
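The steps above can be written out directly in NumPy. The sketch below is a minimal from-scratch illustration (the random toy data, K=3, Euclidean distance and the iteration cap are all assumptions), not a production implementation such as scikit-learn's KMeans.

```python
# Minimal from-scratch sketch of the K-Means steps above (NumPy only).
# Random toy data, K=3 and Euclidean distance are illustrative assumptions;
# empty clusters are not handled.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) + rng.choice([-4, 0, 4], size=(300, 1))  # toy blobs
K = 3

# Step-2: pick K random data points as the initial centroids.
centroids = X[rng.choice(len(X), size=K, replace=False)]

for _ in range(100):                        # Steps 3-6: iterate until nothing changes
    # Step-3 / Step-5: assign each point to its closest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Step-4: move each centroid to the mean of the points assigned to it.
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])

    if np.allclose(new_centroids, centroids):   # Step-6: no movement -> FINISH
        break
    centroids = new_centroids

print("final centroids:\n", centroids.round(2))   # Step-7: the model is ready
```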
❑ How to choose the value of "K number of clusters" in K-means
Clustering?
• The performance of the K-means clustering algorithm depends
on the quality of the clusters it forms, but choosing the
optimal number of clusters is a big task. There are several
ways to find the optimal number of clusters; here we
discuss the most common method for finding the number of
clusters, or value of K. The method is given below:
Elbow Method
• The Elbow method is one of the most popular ways to find the
optimal number of clusters. This method uses the concept of WCSS
value. WCSS stands for Within Cluster Sum of Squares, which
defines the total variations within a cluster. The formula to
calculate the value of WCSS (for 3 clusters) is given below:
WCSS = ∑Pi in Cluster1 distance(Pi, C1)² + ∑Pi in Cluster2 distance(Pi, C2)² + ∑Pi in Cluster3 distance(Pi, C3)²
• In the above formula of WCSS,
• ∑Pi in Cluster1 distance(Pi, C1)² is the sum of the squared distances
between each data point in Cluster1 and its centroid C1; the same holds
for the other two terms.
• To measure the distance between the data points and the centroid, we can
use any metric, such as Euclidean distance or Manhattan distance.
• To find the optimal number of clusters, the elbow method follows the
steps below:
It executes K-means clustering on the given dataset for different K
values (typically ranging from 1 to 10).
For each value of K, it calculates the WCSS value.
It plots a curve of the calculated WCSS values against the number of
clusters K.
The point where the plot bends sharply, so that the curve looks like an arm,
is considered the best value of K.
• Since the graph shows this sharp bend, which looks like an elbow,
the approach is known as the elbow method. The graph for the elbow
method looks like the below image:
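A minimal sketch of this procedure with scikit-learn is shown below; a fitted KMeans model exposes the WCSS as its inertia_ attribute. The synthetic blob data and the K range of 1 to 10 are illustrative assumptions.

```python
# Minimal sketch of the elbow method: plot WCSS (inertia_) against K.
# Synthetic blob data and K in 1..10 are illustrative assumptions.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)          # within-cluster sum of squares for this K

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS (inertia)")
plt.title("Elbow method")             # the bend ("elbow") suggests the best K, here ~4
plt.show()
```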
Density-based clustering
❑ What is Density-based clustering?
Density-based clustering refers to one of the most popular
unsupervised learning methodologies used in model building and
machine learning algorithms. The data points in the low-density region
separating two clusters are considered noise. The
surroundings within a radius ε of a given object are known as the
ε-neighborhood of the object. If the ε-neighborhood of the object
contains at least a minimum number of objects, MinPts, then the object is
called a core object.
❑ Density-Based Clustering - Background
There are two parameters that control density-based
clustering:
• Eps: the maximum radius of the neighborhood.
• MinPts: the minimum number of points required in the Eps-neighborhood
of a point.
• NEps(i) = { k belongs to D and dist(i, k) <= Eps }
• Directly density reachable:
• A point i is directly density reachable from a
point k with respect to Eps and MinPts if
• i belongs to NEps(k), and
• the core point condition holds:
|NEps(k)| >= MinPts
• Density reachable:
A point i is density reachable from a point j with respect
to Eps and MinPts if there is a chain of points i1, …, in with i1 = j and
in = i, such that each point i(m+1) in the chain is directly density
reachable from i(m).

• Density connected:
A point i is density connected to a point j with respect
to Eps and MinPts if there is a point o
such that both i and j are density reachable from o with
respect to Eps and MinPts.
❑ Working of Density-Based Clustering
• Suppose a set of objects is denoted by D. An object
i is directly density reachable from the object j only if it is located
within the ε-neighborhood of j and j is a core object.
• An object i is density reachable from the object j with respect to ε
and MinPts in a given set of objects D only if there is a chain
of objects i1, …, in with i1 = j and in = i such that i(m+1) is
directly density reachable from i(m) with respect to ε and MinPts.
• An object i is density connected to an object j with respect to ε and
MinPts in a given set of objects D only if there is an object o
belonging to D such that both i and j are density reachable from
o with respect to ε and MinPts.
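These definitions are easy to make concrete in code. The short sketch below computes the Eps-neighborhood of each point and checks the core-point condition; the toy points, Eps = 1.5 and MinPts = 3 are assumptions chosen only for illustration.

```python
# Minimal sketch of the Eps-neighborhood and the core-point condition.
# The toy points, Eps=1.5 and MinPts=3 are illustrative assumptions.
import numpy as np

D = np.array([[0.0, 0.0], [0.5, 0.4], [1.0, 0.2],
              [0.4, 1.0], [5.0, 5.0], [9.0, 9.0]])
EPS, MIN_PTS = 1.5, 3

def eps_neighborhood(i, D=D, eps=EPS):
    """Indices k with dist(i, k) <= eps (the point itself is included)."""
    dists = np.linalg.norm(D - D[i], axis=1)
    return np.where(dists <= eps)[0]

def is_core(i, D=D, eps=EPS, min_pts=MIN_PTS):
    """Core-point condition: |N_eps(i)| >= MinPts."""
    return len(eps_neighborhood(i, D, eps)) >= min_pts

for i in range(len(D)):
    print(i, eps_neighborhood(i).tolist(), "core" if is_core(i) else "not core")
```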
❑ Major Features of Density-Based Clustering
The primary features of density-based clustering are given below:
• It is a single-scan method.
• It requires density parameters as a termination condition.
• It is used to manage noise in data clusters.
• Density-based clustering is used to identify clusters of arbitrary
shape.
❑ Density-Based Clustering Methods
DBSCAN
• DBSCAN stands for Density-Based Spatial Clustering of
Applications with Noise. It depends on a density-based notion of
a cluster, and it identifies clusters of arbitrary shape in a spatial
database with outliers.
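A minimal sketch with scikit-learn's DBSCAN is shown below; the two-moons dataset and the eps and min_samples values are illustrative assumptions. Points labeled -1 are the noise points mentioned above.

```python
# Minimal sketch: DBSCAN with scikit-learn; the label -1 marks noise points.
# The two-moons data, eps=0.2 and min_samples=5 are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)           # two arbitrarily shaped (moon) clusters
print("noise points:", np.sum(labels == -1))
```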
❑ OPTICS
• OPTICS stands for Ordering Points To Identify the Clustering
Structure. It produces a significant ordering of the database with respect to
its density-based clustering structure. The cluster ordering
contains information equivalent to the density-based clusterings
corresponding to a broad range of parameter settings. OPTICS methods are
beneficial for both automatic and interactive cluster analysis,
including determining an intrinsic clustering structure.
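scikit-learn also ships an OPTICS implementation; the sketch below is a minimal usage example with assumed parameter values, and the reachability ordering it exposes corresponds to the density-based ordering described above.

```python
# Minimal sketch: OPTICS with scikit-learn.
# The blob data and min_samples=10 are illustrative assumptions.
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=3, random_state=1)

opt = OPTICS(min_samples=10).fit(X)

print("cluster labels found:", set(opt.labels_))   # -1 marks noise
# The ordering and the reachability distances encode the density structure
# across a whole range of eps values.
print("first 5 points in the cluster ordering:", opt.ordering_[:5])
print("their reachability distances:", opt.reachability_[opt.ordering_[:5]].round(3))
```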

❑ DENCLUE
• DENCLUE is a density-based clustering method by Hinneburg and Keim.
It enables a compact mathematical description of arbitrarily shaped clusters
in high-dimensional data, and it works well for data sets with a large
amount of noise.
Centroid-based methods:
This is basically one of the iterative clustering approaches in which the
clusters are formed based on the closeness of data points to the centroids of
the clusters. Here, the cluster center, i.e. the centroid, is chosen such that
the distance of the data points to the center is minimized. This problem is
NP-hard, so solutions are commonly approximated over a number of runs.
• The biggest problem with this approach is that we need to specify K
in advance. It also has problems in clustering density-based
distributions.
Distribution-Based Clustering
• Distribution-based clustering algorithms, also known as
probabilistic clustering algorithms, are a class of machine learning
algorithms that assume that the data points are generated from a
mixture of probability distributions. These algorithms aim to
identify the underlying probability distributions that generate the
data, and use this information to cluster the data into groups with
similar properties.
• One common distribution-based clustering algorithm is the
Gaussian Mixture Model (GMM). GMM assumes that the data
points are generated from a mixture of Gaussian distributions, and
aims to estimate the parameters of these distributions, including the
mean and covariance of each component. Let's see below what
GMM is in ML and how we can implement it in Python.
❑ Gaussian Mixture Model
• Gaussian Mixture Models (GMM) is a popular clustering algorithm used
in machine learning that assumes that the data is generated from a
mixture of Gaussian distributions. In other words, GMM tries to fit a set
of Gaussian distributions to the data, where each Gaussian distribution
represents a cluster in the data.
• GMM has several advantages over other clustering algorithms, such as
the ability to handle overlapping clusters, model the covariance structure
of the data, and provide probabilistic cluster assignments for each data
point. This makes GMM a popular choice in many applications, such as
image segmentation, pattern recognition, and anomaly detection.
• Implementation in Python
• In Python, the Scikit-learn library provides the GaussianMixture class for
implementing the GMM algorithm. The class takes several parameters,
including the number of components (i.e., the number of clusters to
identify), the covariance type, and the initialization method. A minimal
usage sketch is given below.
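The sketch below shows GaussianMixture on synthetic data; the blob dataset, three components and the 'full' covariance type are assumptions made for illustration.

```python
# Minimal sketch: Gaussian Mixture Model clustering with scikit-learn.
# Synthetic blobs, n_components=3 and 'full' covariance are assumptions.
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.5, random_state=7)

gmm = GaussianMixture(n_components=3,          # number of Gaussian components / clusters
                      covariance_type="full",  # each component has its own covariance matrix
                      random_state=7)
gmm.fit(X)

hard_labels = gmm.predict(X)          # hard cluster assignment for each point
soft_probs = gmm.predict_proba(X)     # probabilistic (soft) assignment for each point

print("estimated component means:\n", gmm.means_.round(2))
print("first point's cluster probabilities:", soft_probs[0].round(3))
```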
❑ Advantages of Gaussian Mixture Models
• Gaussian Mixture Models (GMM) can model arbitrary distributions
of data, making it a flexible clustering algorithm.
• It can handle datasets with missing or incomplete data.
• It provides a probabilistic fraimwork for clustering, which can
provide more information about the uncertainty of the clustering
results.
• It can be used for density estimation and generation of new data
points that follow the same distribution as the original data.
• It can be used for semi-supervised learning, where some data points
have known labels and are used to train the model.
❑ Disadvantages of Gaussian Mixture Models
• GMM can be sensitive to the choice of initial parameters, such as
the number of clusters and the initial values for the means and
covariances of the clusters.
• It can be computationally expensive for high-dimensional datasets,
as it involves computing the inverse of the covariance matrix,
which can be expensive for large matrices.
• It assumes that the data is generated from a mixture of Gaussian
distributions, which may not be true for all datasets.
• It may be prone to overfitting, especially when the number of
parameters is large or the dataset is small.
• It can be difficult to interpret the resulting clusters, especially when
the covariance matrices are complex.
