What Is Unsupervised Learning

Unsupervised Learning involves machines learning from unlabeled data to identify patterns and group similar items without supervision. It includes techniques like clustering and association, with clustering further divided into hierarchical and partitioning methods, such as K-Means. K-Means clustering specifically requires a predefined number of clusters (K) and iteratively assigns data points to the nearest centroid until convergence.

What is Unsupervised Learning?

In Unsupervised Learning, the machine uses unlabeled data and learns on its
own, without any supervision. The machine tries to find patterns in the
unlabeled data and gives a response.

Let's take a similar example as before, but this time we do not tell the machine
whether each object is a spoon or a knife. The machine identifies patterns in the
given set and groups the objects based on their similarities.

Unsupervised learning can be further grouped into two types:

1. Clustering

2. Association

Clustering - Unsupervised Learning

Clustering is the method of dividing objects into groups (clusters) such that
objects in the same cluster are similar to one another and dissimilar to objects
belonging to other clusters. For example, finding out which customers made
similar product purchases.
Suppose a telecom company wants to reduce its customer churn rate by
providing personalized call and data plans. The behavior of the customers is
studied and the model segments the customers with similar traits. Several
strategies are adopted to minimize churn rate and maximize profit through
suitable promotions and campaigns.

When the customers are plotted on a graph and grouped, Group A customers use
more data and also have high call durations. Group B customers are heavy
Internet users, while Group C customers have high call durations. So, Group B
will be given plans with more data benefits, Group C will be given cheaper call
rate plans, and Group A will be given the benefits of both.

Types of Clustering

Clustering is a type of unsupervised learning wherein data points are grouped
into different sets based on their degree of similarity.

The various types of clustering are:

 Hierarchical clustering

 Partitioning clustering

Hierarchical clustering is further subdivided into:


 Agglomerative clustering

 Divisive clustering

Partitioning clustering is further subdivided into:

 K-Means clustering

 Fuzzy C-Means clustering

k-means Clustering vs. Hierarchical Clustering

 In k-means, using a pre-specified number of clusters, the method assigns
records to mutually exclusive clusters of roughly spherical shape based on
distance. Hierarchical methods can be either divisive or agglomerative.

 K-means needs advance knowledge of K, i.e., the number of clusters you want
to divide your data into. In hierarchical clustering, you can stop at whatever
number of clusters you find appropriate by interpreting the dendrogram.

 In k-means, one can use the median or the mean as a cluster centre to
represent each cluster. Agglomerative methods begin with 'n' clusters and
sequentially combine similar clusters until only one cluster is obtained.

 K-means is normally less computationally intensive and is suited to very
large datasets. Divisive methods work in the opposite direction, beginning with
one cluster that includes all the records; hierarchical methods are especially
useful when the target is to arrange the clusters into a natural hierarchy.

 Since k-means starts with a random choice of centroids, the results produced
by running the algorithm many times may differ. In hierarchical clustering, the
results are reproducible.

 K-means is simply a division of the set of data objects into non-overlapping
subsets (clusters) such that each data object is in exactly one subset. A
hierarchical clustering is a set of nested clusters arranged as a tree.

 K-means is found to work well when the clusters are hyperspherical (like a
circle in 2D or a sphere in 3D); hierarchical clustering does not work as well
as k-means when the shape of the clusters is hyperspherical.

 Advantages of k-means: convergence is guaranteed, and it specializes to
clusters of different sizes and shapes. Advantages of hierarchical clustering:
ease of handling any form of similarity or distance and, consequently,
applicability to any attribute type.

 Disadvantages of k-means: the value of K is difficult to predict, and it does
not work well with global clusters. Disadvantage of hierarchical clustering: it
requires the computation and storage of an n×n distance matrix, which can be
expensive and slow for very large datasets.

Hierarchical Clustering

Hierarchical clustering organizes the data into a tree-like structure called a dendrogram.

Agglomerative clustering takes a bottom-up approach: we begin with each element
as a separate cluster and merge them into successively larger clusters.

Divisive clustering is a top-down approach: we begin with the whole set and
proceed to divide it into successively smaller clusters.
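The bottom-up merging described above can be sketched in a few lines of plain Python. This is a minimal single-linkage sketch, not a production implementation; the sample points and the stopping count are made up for illustration.

```python
import math

def single_linkage(a, b):
    # Single linkage: distance between two clusters is the smallest
    # pairwise distance between their points.
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, k):
    # Bottom-up: start with every point in its own cluster, then repeatedly
    # merge the two closest clusters until only k clusters remain.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters.pop(j))
    return clusters

# Illustrative 2-D points: two tight groups plus one outlier.
points = [(1, 1), (1.5, 1), (5, 5), (5, 5.5), (9, 1)]
clusters = agglomerative(points, 3)
```

Recording the order of the merges (instead of stopping at a fixed k) is what produces the dendrogram, which lets you cut the tree at any number of clusters afterwards.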

Partitioning Clustering

Partitioning clustering is split into two subtypes - K-Means clustering and Fuzzy
C-Means.

In k-means clustering, the objects are divided into a number of clusters
specified by 'K.' So if we say K = 2, the objects are divided into two clusters,
c1 and c2.

Here, the features or characteristics of the objects are compared, and all objects with similar
characteristics are clustered together.
Fuzzy c-means is very similar to k-means in the sense that it clusters objects with similar
characteristics together. But while in k-means clustering a single object cannot belong to two
different clusters, in fuzzy c-means an object can belong to more than one cluster, each with a
degree of membership.
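That difference can be made concrete with the fuzzy c-means membership formula. The sketch below computes only the membership step for two fixed, made-up centroids (the full algorithm also iteratively updates the centroids from these memberships); the fuzzifier m = 2 is a common default.

```python
import math

def fuzzy_memberships(point, centroids, m=2.0):
    # Fuzzy c-means membership of `point` in each cluster:
    #   u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    # where d_i is the distance to centroid i. Memberships sum to 1.
    dists = [math.dist(point, c) for c in centroids]
    if 0.0 in dists:  # the point sits exactly on a centroid
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    return [
        1.0 / sum((dists[i] / dists[k]) ** (2.0 / (m - 1.0)) for k in range(len(dists)))
        for i in range(len(dists))
    ]

# A point closer to the first of two made-up centroids belongs mostly,
# but not exclusively, to the first cluster.
u = fuzzy_memberships((2.0, 0.0), [(0.0, 0.0), (10.0, 0.0)])
```

In k-means the same point would simply be assigned outright to the nearer centroid; here it keeps a small but nonzero membership in the farther cluster as well.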

What is Meant by the K-Means Clustering Algorithm?

K-Means clustering is an unsupervised learning algorithm. There is no labeled data for this
clustering, unlike in supervised learning. K-Means performs the division of objects into clusters
that share similarities and are dissimilar to the objects belonging to another cluster.

The term ‘K’ is a number: you need to tell the system how many clusters to create. For
example, K = 2 refers to two clusters. There are also ways of finding the best or optimum
value of K for given data, such as the elbow method.
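One such heuristic, the elbow method, runs k-means for several candidate values of K and plots the total within-cluster sum of squares (WCSS); the K where the curve bends and flattens is a reasonable choice. The sketch below, using made-up 2-D points, computes only the WCSS of a given grouping, which is the quantity the elbow plot compares across K's.

```python
def wcss(clusters):
    # Within-cluster sum of squares: for each cluster, sum the squared
    # distances from its points to the cluster mean, then total them.
    total = 0.0
    for pts in clusters:
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts)
    return total

# Tighter, better-separated clusters give a lower WCSS.
tight = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]   # K = 2, well separated
loose = [[(0, 0), (0, 1), (10, 10), (10, 11)]]     # K = 1, everything together
```

WCSS always decreases as K grows (more clusters can only fit the data more tightly), which is why one looks for the elbow rather than the minimum.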

For a better understanding of k-means, let's take an example from cricket. Imagine you received
data on many cricket players from all over the world, giving the runs scored and the wickets
taken by each player in the last ten matches. Based on this information, we need to group the
data into two clusters, namely batsmen and bowlers.

Let's take a look at the steps to create these clusters.

Solution:

Assign data points

Here, our data set is plotted on ‘x’ and ‘y’ coordinates: the y-axis shows the runs
scored, and the x-axis shows the wickets taken by each player.


Perform Clustering

We need to group these players into two clusters.


Considering the same data set, let us solve the problem using K-Means clustering (taking K = 2).

The first step in k-means clustering is the random allocation of two centroids (as K = 2). Note
that these points can be anywhere, as they are chosen at random; they are called centroids, but
initially they are not the central points of the data set.

The next step is to measure the distance from every data point to each of the randomly
assigned centroids. Whichever centroid is closer, the point is assigned to it, so each point ends
up attached to exactly one of the two centroids.
The next step is to determine the actual centroid of each of these two clusters: the
originally random centroid is repositioned to the true centre of its cluster.

This process of calculating distances and repositioning the centroids continues
until we obtain our final clusters, at which point the centroid repositioning stops.

Once the centroids no longer need repositioning, the algorithm has converged,
and we have our two clusters, each with its centroid.
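The whole loop described above, assign each point to its nearest centroid, recompute the centroids, repeat until nothing moves, fits in a short pure-Python sketch. The (runs, wickets) numbers are invented for illustration, and the random initialization is replaced by simply taking the first K points so the run is deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    # Plain k-means: assign each point to its nearest centroid, move each
    # centroid to the mean of its cluster, and repeat until convergence.
    centroids = points[:k]  # deterministic start for the sketch; usually random
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: reposition each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical (runs scored, wickets taken) for eight players.
players = [(110, 0), (95, 1), (120, 0), (100, 1),   # batsman-like
           (10, 8), (15, 7), (5, 9), (20, 6)]       # bowler-like
centroids, clusters = kmeans(players, k=2)
```

On this sample data the loop separates the batsman-like points (high runs, few wickets) from the bowler-like points (low runs, many wickets), which is exactly the two-cluster grouping the cricket example asks for.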
