BAR Machine Learning Notes Part 2
K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems
in machine learning and data science.
It groups the data into K clusters and is a convenient way to discover the categories present
in an unlabeled dataset on its own, without needing labeled training data.
It is a centroid-based algorithm: each cluster is associated with a centroid. The main aim of
the algorithm is to minimize the sum of squared distances between the data points and the
centroids of the clusters they are assigned to.
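In its standard form, K-means minimizes the sum of squared Euclidean distances from each point to the centroid of its assigned cluster (often called inertia or WCSS). The minimal Python sketch below computes that objective for a given assignment; the function name `kmeans_objective` and the toy points are illustrative assumptions, not part of these notes.

```python
def kmeans_objective(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to the centroid
    of the cluster it is assigned to (inertia / WCSS)."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
centroids = [(1.0, 2.0), (9.0, 8.0)]
labels = [0, 0, 1, 1]  # labels[i] is the cluster index of points[i]
print(kmeans_objective(points, centroids, labels))  # 4.0
```

A worse assignment gives a larger value: moving the first point to cluster 1 (`labels = [1, 0, 1, 1]`) raises the objective to 116.0.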
Step-1: Select the number K, which decides the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Compute the mean of the points in each cluster and place that cluster's new centroid there.
Step-5: Repeat Step-3, i.e., reassign each data point to the new closest centroid. If any point
changed cluster, return to Step-4; otherwise the clustering is finished.
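The steps above can be sketched as Lloyd's algorithm, the usual iterative procedure for K-means, with each centroid updated to the mean of its assigned points. This is a minimal pure-Python sketch under stated assumptions: points are tuples of floats, the initial centroids are sampled from the data (one common choice; the steps allow any K points), and a cluster that becomes empty keeps its previous centroid. The names `kmeans`, `max_iter`, and `seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two points of equal dimension
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    # Coordinate-wise mean of a non-empty list of points
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step-2: pick K initial centroids (here: K distinct data points)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step-3: assign every point to its closest centroid
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop when no point changed cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this toy data the first three and the last three points end up in separate clusters; which group receives label 0 depends on the random initialization. Because K-means only finds a local optimum, practical implementations typically run several initializations and keep the one with the lowest objective.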
The bias-variance trade-off is one of the most fundamental trade-offs in machine learning. It concerns
the error of the model: the expected prediction error decomposes into a bias error, a variance error,
and an irreducible noise error that no model can remove.
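For squared-error loss this decomposition can be stated precisely. Assuming the data are generated as y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², with ε independent of the fitted model \hat{f}, and taking expectations over training sets and noise, the expected error at a point x splits into three parts:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The σ² term cannot be reduced by any choice of model; the trade-off concerns the first two terms.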
Two prominent scenarios can be derived from the well-known bull’s-eye framework.
Every model has some bias error as well as some variance error, so a certain amount of
generalization error is inevitable. The challenge is to find the mix of bias error and variance
error that gives the lowest total error for the model, and thereby to minimize the generalization
problem.
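The idea of an optimal mix can be made concrete with a toy example of our own choosing (it is not from these notes): estimating a mean μ with the hypothetical shrinkage estimator λ·x̄, where x̄ is the sample mean of n Gaussian draws. Shrinking toward zero (smaller λ) adds bias but removes variance. The Python sketch below estimates both components by Monte Carlo; `simulate`, `lam`, `mu`, `sigma`, `n`, and `trials` are illustrative names and values.

```python
import random


def simulate(lam, mu=2.0, sigma=3.0, n=10, trials=10_000, seed=0):
    """Monte Carlo estimates of bias^2, variance and mean squared error for the
    toy shrinkage estimator lam * (sample mean), used to estimate mu."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(lam * sum(sample) / n)
    avg = sum(estimates) / trials
    bias_sq = (avg - mu) ** 2                                   # squared bias
    variance = sum((e - avg) ** 2 for e in estimates) / trials  # spread around the average
    mse = sum((e - mu) ** 2 for e in estimates) / trials        # total error
    return bias_sq, variance, mse


for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    b, v, m = simulate(lam)
    print(f"lam={lam:.2f}  bias^2={b:.3f}  variance={v:.3f}  total={m:.3f}")
```

With lam = 0 there is no variance but a large bias; lam = 1 (the ordinary sample mean) is unbiased but has the largest variance; among these grid values the total error is lowest at lam = 0.75. Analytically the total error is (1 - λ)²μ² + λ²σ²/n, which for these settings is minimized near λ ≈ 0.82. There is no irreducible-noise term here because each estimate is compared with the true μ rather than with a noisy observation.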