What Is Unsupervised Learning
In unsupervised learning, the machine uses unlabeled data and learns on its own,
without any supervision. The machine tries to find patterns in the unlabeled data
and gives a response.
Let's take a similar example as before, but this time we do not tell the machine
whether each object is a spoon or a knife. The machine identifies patterns in the given set
and groups the objects based on their similarities.
Unsupervised learning problems fall into two categories:
1. Clustering
2. Association
Clustering is the method of dividing objects into clusters whose members are similar
to one another and dissimilar to the objects belonging to other clusters. For
example, finding out which customers made similar product purchases.
Suppose a telecom company wants to reduce its customer churn rate by
providing personalized call and data plans. The behavior of the customers is
studied, and the model segments the customers with similar traits. Several
strategies are then adopted to minimize the churn rate and maximize profit through
suitable promotions and campaigns.
In the graph that accompanies this example, you can see the customers grouped.
Group A customers use more data and also have high call durations.
Group B customers are heavy Internet users, while Group C customers have
high call durations. So, Group B will be given more data benefit plans,
Group C will be given cheaper call rate plans, and Group A will be given the
benefit of both.
Types of Clustering
- Hierarchical clustering
  - Divisive clustering
- Partitioning clustering
  - K-Means clustering
K-Means clustering needs advance knowledge of K, i.e., the number of clusters
into which you want to divide your data. In hierarchical clustering, you can stop
at any number of clusters and find the appropriate one by interpreting the dendrogram.
Hierarchical Clustering
Partitioning Clustering
Partitioning clustering is split into two subtypes - K-Means clustering and Fuzzy
C-Means.
In k-means clustering, the objects are divided into the number of clusters specified by
the number ‘K.’ So if we say K = 2, the objects are divided into two clusters, c1
and c2, as shown:
Here, the features or characteristics are compared, and all objects having similar characteristics
are clustered together.
Fuzzy c-means is very similar to k-means in the sense that it clusters objects that have similar
characteristics together. In k-means clustering, a single object cannot belong to two different
clusters. But in c-means, objects can belong to more than one cluster, as shown.
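The difference can be made concrete with a small sketch of the fuzzy c-means soft-membership computation. The centroids are held fixed and the points are hypothetical; the fuzzifier value m = 2 is a common default, not something prescribed by the text.

```python
# Fuzzy c-means soft membership: each point gets a degree of membership
# in every cluster (the degrees sum to 1), unlike k-means, where a point
# belongs to exactly one cluster. 1-D sketch with fixed centroids.

def memberships(point, centroids, m=2):
    dists = [abs(point - c) for c in centroids]
    # A point sitting exactly on a centroid belongs fully to that cluster.
    if 0 in dists:
        return [1.0 if d == 0 else 0.0 for d in dists]
    power = 2 / (m - 1)
    return [1 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

# A point midway between centroids 0 and 10 belongs equally to both.
print(memberships(5, [0, 10]))   # [0.5, 0.5]
# A point near centroid 0 belongs mostly, but not exclusively, to cluster 0.
print(memberships(1, [0, 10]))
```

The closer a point is to a centroid, the higher its membership degree for that cluster, but every cluster still gets a nonzero share.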
K-Means clustering is an unsupervised learning algorithm. There is no labeled data for this
clustering, unlike in supervised learning. K-Means performs the division of objects into clusters
that share similarities and are dissimilar to the objects belonging to another cluster.
The term ‘K’ is a number: you need to tell the system how many clusters you want to create. For
example, K = 2 refers to two clusters. There are also methods, such as the elbow method, for
finding the best or optimal value of K for a given data set.
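One common heuristic is the elbow method: run k-means for several values of K and watch the within-cluster sum of squares (WCSS). A minimal sketch, using a tiny hand-rolled 1-D k-means on hypothetical data:

```python
# Elbow-method sketch for choosing K: run k-means for several values of K
# and compare the within-cluster sum of squares (WCSS). The K where the
# curve stops dropping sharply (the "elbow") is a reasonable choice.

def kmeans_1d(data, k, iters=20):
    centroids = list(data[:k])                 # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        centroids = [sum(cl) / len(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    wcss = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, wcss

data = [1, 2, 3, 20, 21, 22]                   # two obvious groups
for k in (1, 2, 3):
    print(k, round(kmeans_1d(data, k)[1], 2))
# WCSS falls sharply from K=1 to K=2, then barely changes:
# the elbow at K=2 matches the two visible groups.
```
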
For a better understanding of k-means, let's take an example from cricket. Imagine you received
data on many cricket players from all over the world, giving the runs scored by each player and
the wickets taken by them in the last ten matches. Based on this
information, we need to group the data into two clusters, namely batsmen and bowlers.
Solution:
Here, we have our data set plotted on ‘x’ and ‘y’ coordinates. The information on the y-axis is
about the runs scored, and on the x-axis about the wickets taken by the players.
The first step in k-means clustering is the allocation of two centroids randomly (as K=2). Two
points are assigned as centroids. Note that the points can be anywhere, as they are random points.
They are called centroids, but initially, they are not the central point of a given data set.
The next step is to determine the distance between each data point and the two randomly
assigned centroids. For every point, the distance is measured from both centroids, and the
point is assigned to the centroid it is closer to. You can see the data points attached to the
centroids, represented here in blue and yellow.
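This assignment step can be sketched in a few lines; the (wickets, runs) player points and the two starting centroids below are hypothetical:

```python
import math

# Assignment step of k-means: each (wickets, runs) point goes to
# whichever of the two randomly chosen centroids is nearer.
points = [(1, 80), (2, 95), (9, 10), (8, 5)]    # hypothetical players
centroids = [(2, 90), (8, 8)]                   # "random" starting centroids

def nearest(p, cs):
    # Index of the centroid closest to p (Euclidean distance).
    return min(range(len(cs)), key=lambda i: math.dist(p, cs[i]))

labels = [nearest(p, centroids) for p in points]
print(labels)   # [0, 0, 1, 1] -> high-run points vs high-wicket points
```
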
The next step is to determine the actual centroids for these two clusters. The
originally allocated random centroids are repositioned to the actual centroids of
the clusters.
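The repositioning itself is just a mean: the new centroid of a cluster is the average of the points currently assigned to it. A minimal sketch with hypothetical points:

```python
# Repositioning step: the new centroid of a cluster is the mean of the
# points assigned to it, computed coordinate by coordinate.
cluster = [(1, 80), (2, 95), (3, 110)]   # hypothetical batsmen points

centroid = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
print(centroid)   # (2.0, 95.0)
```
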
This process of calculating the distance and repositioning the centroid continues
until we obtain our final cluster. Then the centroid repositioning stops.
As seen above, the centroids don't need any more repositioning, which means
the algorithm has converged, and we have our two final clusters, each with its centroid.
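The whole procedure can be sketched end to end. This is a minimal pure-Python version of the loop described above, run on hypothetical (wickets, runs) data with K = 2; it is an illustration of the steps, not a production implementation:

```python
import math

# Full k-means loop on the cricket example: repeat the assignment and
# repositioning steps until the centroids stop moving (convergence).
points = [(1, 80), (2, 95), (3, 110), (9, 10), (8, 5), (10, 12)]
centroids = [(0, 0), (5, 50)]            # arbitrary starting centroids

while True:
    # Assignment: attach each point to its nearest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        clusters[i].append(p)
    # Reposition: move each centroid to the mean of its cluster.
    new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
           for i, cl in enumerate(clusters)]
    if new == centroids:                 # no movement -> converged
        break
    centroids = new

print(centroids)   # final centroids: one for bowlers, one for batsmen
print(clusters)    # the two clusters of (wickets, runs) points
```

On this toy data the loop converges in a couple of iterations, splitting the points into a high-runs cluster (batsmen) and a high-wickets cluster (bowlers).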