K-MEANS-FINAL
Example: Consider a dataset of house prices. Each house is represented by features such as the number of
bedrooms, square footage, and location. The price of the house is the output (or label). In supervised learning,
the algorithm is trained on this data to predict the price of a new house based on its features.
Common supervised learning algorithms include decision trees and neural networks.
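As a minimal, hedged illustration of this supervised setting (the feature values, prices, and the choice of a linear model below are assumptions made purely for this sketch, not part of the notes), one could write:

import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [bedrooms, square footage, location index]; labels: prices.
# The numbers are invented purely for illustration.
X = np.array([[2,  800, 1],
              [3, 1200, 2],
              [4, 2000, 2],
              [3, 1500, 3]])
y = np.array([150_000, 230_000, 360_000, 310_000])

model = LinearRegression().fit(X, y)      # train on the labelled examples
new_house = np.array([[3, 1400, 2]])
print(model.predict(new_house))           # predicted price for an unseen house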
Example: Consider a dataset of customer data at a retail store with features like age, purchasing history, and
browsing time. There are no labels to indicate what group each customer belongs to. In unsupervised learning,
the algorithm can group similar customers together, which can be useful for targeting marketing campaigns.
Output: No predefined label; the algorithm finds patterns like customer segments.
Common unsupervised learning algorithms include k-means clustering, hierarchical clustering, and autoencoders.
1.3 Differences Between Supervised and Unsupervised Learning
The key difference is the presence of labels: supervised learning trains on inputs paired with known outputs so that it can predict the output for new inputs, while unsupervised learning receives only inputs and must discover structure, such as clusters, on its own.
2 K-Means Algorithm
The K-Means algorithm is used to divide a given dataset into k distinct groups or clusters. Below is an
easy-to-follow explanation of the algorithm step by step:
Objective The goal of the K-Means algorithm is to group the data points into k clusters such that points in
the same cluster are more similar to each other than to points in other clusters.
Initial Setup You start by choosing the number of clusters (k), which you need to decide beforehand. Initially,
the algorithm assigns k random points in the input space as the centers of the clusters (called centroids).
Distance Calculation To group data points, we need to measure how close a point is to a cluster center.
This is done using a distance measure like Euclidean distance (the straight-line distance between two points).
Assigning Points to Clusters For each data point, calculate its distance from all the cluster centers. Assign
each data point to the cluster whose center is closest to it. This means each point is grouped based on the
minimum distance to a cluster center.
Updating Cluster Centers After assigning all data points to clusters, compute the mean of the points
in each cluster. The new cluster center is placed at this mean point (the average position of all points in the
cluster). This step ensures that the cluster centers move towards the center of the group of points they are
responsible for.
Iterative Process Repeat the steps of reassigning points to clusters and updating the cluster centers. This
process continues until the cluster centers stop moving or change very little, meaning the algorithm has converged
and the clusters are stable.
Stopping Criteria The algorithm stops when the cluster centers no longer change their positions significantly,
meaning a stable set of cluster centers has been found (a local optimum, not necessarily the best possible clustering).
Example Imagine a group of people (data points) being grouped around k tour guides (cluster centers).
Initially, the tour guides stand randomly, and people move toward the guide closest to them. As people gather,
the tour guides move to the center of their respective groups. This process repeats until the tour guides no
longer need to move.
3 The k-Means Algorithm
Initialisation
– Choose a value for k.
– Choose k random positions in the input space.
– Assign the cluster centres µj to those positions.
Learning
– Repeat
* For each datapoint x_i:
· Compute the distance to each cluster centre.
· Assign the datapoint to the nearest cluster centre, with distance d_i = \min_j d(x_i, \mu_j).
* For each cluster centre:
· Move the position of the centre to the mean of the points in that cluster:
\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} x_i,
where N_j is the number of points in cluster j.
– Until the cluster centres stop moving.
Usage
– For each test point:
* Compute the distance to each cluster centre.
* Assign the datapoint to the nearest cluster centre, with distance d_i = \min_j d(x_i, \mu_j).
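The pseudocode above maps almost line by line onto NumPy. The sketch below is one possible implementation of those steps; the function name, the choice of initialising the centres at random data points, and the tolerance are my own assumptions.

import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=None):
    """Minimal k-means: X is an (N, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialisation: use k randomly chosen data points as the initial centres.
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Assignment: each point goes to the nearest centre (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centre to the mean of the points assigned to it.
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        # Stop when the centres barely move.
        if np.linalg.norm(new_centres - centres) < tol:
            centres = new_centres
            break
        centres = new_centres
    return centres, labels

For a quick check, centres, labels = kmeans(np.random.rand(100, 2), k=3) groups 100 random two-dimensional points into three clusters.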
4 Dealing with Noise
Clustering is often used to handle noisy data. Noisy data can occur for several reasons; it might be slightly
corrupted or even completely wrong. If we can group the data points into the right clusters, we can effectively
reduce the noise. This is because we can replace each noisy data point with the average of the points in its
cluster.
However, the k-means algorithm uses the average (mean) to find the cluster centers, which can be greatly
affected by outliers—these are extreme values that don’t fit with the rest of the data. For example, if we have
the numbers (1, 2, 1, 2, 100), the average is 21.2, but the middle value, known as the median, is 2. The median
is a robust statistic because it is not influenced by outliers.
To improve our clustering method, we can replace the mean with the median when calculating the cluster
centers. This change makes the algorithm much more resistant to noise and outliers. Computing the median
takes longer than computing the mean, but the extra cost is usually worth it for noisy data.
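A hedged sketch of that change is shown below: only the centre-update step differs from standard k-means, and the helper name is my own.

import numpy as np

# Mean vs. median on the example from the text: one outlier drags the mean away.
values = np.array([1, 2, 1, 2, 100])
print(values.mean(), np.median(values))   # 21.2 versus 2.0

def kmedians_update(X, labels, k):
    """Centre-update step using the median, making the centres robust to outliers."""
    return np.array([np.median(X[labels == j], axis=0) for j in range(k)])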
In the k-means algorithm, each input identifies the closest cluster center by calculating the distance to all centers. We can
mimic this in a neural network. The position of each neuron corresponds to its weight vector in weight space. For
each input, we calculate the distance between the input and each neuron's weight vector. During training, we adjust
the position of the neuron by changing its weights.
To implement the k-means algorithm using neurons, we will use one layer of neurons and some input nodes,
without a bias node. The first layer will consist of the input nodes that don’t perform any computations, while
the second layer will contain competitive neurons. Only one neuron will “fire” for each input, meaning that
only one cluster center can represent a particular input vector. The neuron that is closest to the input will be
chosen to fire. This approach is called winner-takes-all activation, which is a form of competitive learning. In
competitive learning, neurons compete to fire based on their closeness to the input. Each neuron may learn
to recognize a specific feature, firing only when it detects that feature. For example, you could have a neuron
trained to recognize your grandmother.
We will select k neurons and connect the inputs to the neurons fully, as usual. The activation of the neurons
will be calculated using a linear transfer function as follows:
h_i = \sum_j w_{ij} x_j.
When the inputs are normalized to have the same scale, this effectively measures the distance between the
input vector and the cluster center represented by that neuron. Higher activation values indicate that the input
and the cluster center are closer together.
The winning neuron is the one closest to the current input. The challenge is updating the position of that
neuron in weight space, or how to adjust its weights. In the original k-means algorithm, this was straightforward;
we simply set the cluster center to be the mean of all data points assigned to that center. However, in neural
network training, we often input just one vector at a time and change the weights accordingly (using online
learning rather than batch learning). This means we don’t know the mean because we only have information
about the current input.
To address this, we approximate the mean by moving the winning neuron closer to the current input. This
makes it more likely to be the best match for the next occurrence of that input. The adjustment is given by:
\Delta w_{ij} = \eta x_j.
However, this method might not be sufficient. We will explore why that is when we discuss normalization
further.
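As a sketch of a single on-line step under this rule (the array names and the learning rate are illustrative assumptions), the winning neuron is found from its activation and only its weights are changed:

import numpy as np

def online_step(weights, x, eta=0.1):
    """One winner-takes-all update: the closest neuron is nudged toward the input."""
    activations = weights @ x             # h_i = sum_j w_ij x_j for every neuron
    winner = int(np.argmax(activations))  # highest activation = closest (for normalised vectors)
    weights[winner] += eta * x            # Delta w_ij = eta * x_j, applied only to the winner
    return winner

Only the winning row of the weight matrix changes, which is exactly the winner-takes-all behaviour described above.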
7 Neuronal Activation
The neuronal activation can be expressed as:
h_i = W_i^T \cdot x,
where \cdot denotes the inner product (or scalar product) between two vectors, and W_i^T is the transpose of the
i-th row of the weight matrix W.
The inner product evaluates to \|W_i\| \|x\| \cos\theta, where \theta is the angle between the two vectors, and
\| \cdot \| denotes the magnitude of a vector. If the magnitudes of all vectors are normalized to one, then only
the angle \theta influences the size of the dot product.
This means that the closer the two vectors point in the same direction, the larger the activation will be.
Therefore, the angle between the vectors is crucial in determining how effectively they match each other.
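The snippet below illustrates the point numerically: after normalising two arbitrarily chosen vectors to unit length, their inner product equals cos θ, so the activation depends only on how well the directions match.

import numpy as np

w = np.array([3.0, 4.0])
x = np.array([1.0, 1.0])

w_hat = w / np.linalg.norm(w)    # ||w_hat|| = 1
x_hat = x / np.linalg.norm(x)    # ||x_hat|| = 1

cos_theta = w_hat @ x_hat        # inner product of unit vectors = cos(theta)
print(cos_theta, np.degrees(np.arccos(cos_theta)))
# The closer the two directions, the larger cos_theta, and the larger the activation.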
Unlike gradient descent on a single global error function, this weight update rule tries to minimize the error for each weight independently. This makes the analysis of the algorithm more complex, but
competitive learning algorithms like this tend to work well in practice.
Key Points:
Weight Update Rule: Moves weights closer to inputs based on their distance.
Independence: Each weight is updated independently, making the analysis complex but effective.
Learning
– Normalize the data so that all points lie on the unit sphere.
– Repeat the following steps:
* For each datapoint:
· Compute the activations of all the nodes.
· Pick the winner as the node with the highest activation.
· Update the weights using the weight update rule.
– Continue until the number of iterations exceeds a predefined threshold.
Usage
– For each test point:
* Compute the activations of all the nodes.
* Pick the winner as the node with the highest activation.
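A minimal sketch of the algorithm above follows; the random initialisation of the weights, the fixed iteration count, the learning rate, and the renormalisation of the winning weight vector after each update are illustrative choices rather than prescriptions from these notes.

import numpy as np

def train_competitive(X, k, eta=0.1, n_iters=50, seed=0):
    """Winner-takes-all training for a one-layer competitive network (on-line k-means)."""
    rng = np.random.default_rng(seed)
    # Normalise the data so that all points lie on the unit sphere.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    weights = rng.normal(size=(k, X.shape[1]))
    weights /= np.linalg.norm(weights, axis=1, keepdims=True)
    for _ in range(n_iters):                     # stop after a fixed number of iterations
        for x in X:                              # for each datapoint
            h = weights @ x                      # activations of all the nodes
            winner = np.argmax(h)                # winner = node with the highest activation
            weights[winner] += eta * x           # weight update rule
            weights[winner] /= np.linalg.norm(weights[winner])  # keep the winner on the unit sphere
    return weights

def assign(weights, x):
    """Usage: present a (normalised) test point and return the winning cluster."""
    return int(np.argmax(weights @ (x / np.linalg.norm(x))))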
11 Key Points on Competitive Learning for Clustering
Cluster Assignment: After training, a new datapoint can be classified by presenting it to the trained
algorithm, which determines the activated cluster.
Class Label Interpretation: When target data is available, the best-matching cluster can be interpreted
as a class label. However, careful consideration is needed since the order of nodes may not align with the
order of target classes.
Matching Output Classes: To ensure accurate classification, it is essential to match output classes to
target data carefully. Misalignment can lead to misleading results.
Combining K-Means and Perceptrons: The k-means algorithm can be used to position Radial Basis Function
(RBF) nodes in the input space so that they represent the input data accurately. A Perceptron can then be
trained on top of these RBFs to match the outputs to the target classes.
Utilizing More Clusters: Using more clusters in the k-means network allows a richer representation of the
input data, and there is no need to manually assign datapoints to clusters, since the Perceptron handles the
final classification.
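A hedged sketch of this k-means + RBF + Perceptron pipeline is given below, using scikit-learn's KMeans and Perceptron; the number of centres, the RBF width gamma, and the use of the Iris data are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

X, y = load_iris(return_X_y=True)

# Step 1: k-means positions the RBF centres in the input space.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
centres = kmeans.cluster_centers_

# Step 2: build RBF features, one Gaussian response per centre.
gamma = 1.0
dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
rbf_features = np.exp(-gamma * dists ** 2)

# Step 3: a Perceptron on top of the RBF layer matches clusters to the target classes.
clf = Perceptron(random_state=0).fit(rbf_features, y)
print(clf.score(rbf_features, y))   # training accuracy of the combined model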
kmeans
October 7, 2024
[12]: from sklearn.datasets import load_iris
      import matplotlib.pyplot as plt

      iris = load_iris()
      X = iris.data[:, :2]  # Use only the first two features for 2D visualization
      y = iris.target       # True labels for comparison

      # Original data (left panel)
      plt.subplot(1, 2, 1)
      plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', marker='o', edgecolor='k', s=100)
[14]: from sklearn.cluster import KMeans

      kmeans = KMeans(n_clusters=3, random_state=42)
      kmeans.fit(X)
      # K-means cluster assignments (right panel; reconstructed from context)
      plt.subplot(1, 2, 2)
      plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis', marker='o', edgecolor='k', s=100)
      plt.tight_layout()
      plt.show()