DMDW Lab8

The document outlines the K-means clustering algorithm, explaining its purpose, methodology, and challenges in determining the optimal number of clusters. It provides a Python implementation of the K-means algorithm using the Iris dataset, including data scaling, cluster assignment, and centroid calculation. Additionally, it visualizes the clustering results and displays the centroids of the clusters.

Uploaded by

jagnoorsm.cs.22
Copyright © All Rights Reserved

LAB - 8

1. Demonstrate K-means clustering using the WEKA tool.

K-Means clustering is an unsupervised learning algorithm that partitions data into k distinct
clusters, grouping together points that are close to one another (usually in Euclidean distance).
The algorithm first selects k initial centroids at random, then alternates two steps: each data
point is assigned to its nearest centroid, and each centroid is recalculated as the mean of the
points assigned to it. These steps repeat until the centroids no longer change significantly
(convergence) or a maximum number of iterations is reached. The goal is to minimize the sum of
squared distances between data points and their assigned centroids; this objective is also
called the inertia or the within-cluster sum of squares (WCSS).
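
To make the two alternating steps concrete, here is a small NumPy sketch of the assignment step, the update step, and the inertia on four hand-picked 2-D points. The data and the initial centroids (the first two points, instead of a random choice) are assumptions chosen only so that every number can be checked by hand.

```python
import numpy as np

def assign(X, centroids):
    # Squared Euclidean distance from every point to every centroid, shape (n, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def update(X, labels, k):
    # New centroid = mean of the points assigned to it
    # (assumes no cluster becomes empty, which holds for this toy data)
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

def inertia(X, labels, centroids):
    # Sum of squared distances from each point to its own centroid
    return float(((X - centroids[labels]) ** 2).sum())

X = np.array([[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0]])
# Fixed initial centroids for reproducibility; K-Means normally picks them at random
centroids = X[[0, 1]].copy()

history = []
for step in range(10):
    labels = assign(X, centroids)
    new_centroids = update(X, labels, k=2)
    history.append(inertia(X, labels, new_centroids))
    print(step, labels, new_centroids.tolist(), history[-1])
    # Converged once the centroids stop moving
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids
```

Worked by hand: after the first update the centroids are (1, 1) and (4, 4) with inertia 20.0; the second step moves the point (1, 2) into the first cluster, giving centroids (1, 1.5) and (5.5, 5) with inertia 1.0; the third step changes nothing, so the loop stops. The inertia never increases from one step to the next.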

One of the challenges in K-Means is determining the optimal number of clusters, k. Methods such
as the Elbow Method, the Silhouette Score, and the Gap Statistic help identify a good value of k
by assessing how well the data is grouped. K-Means is computationally efficient and widely used,
but it has limitations: its result depends on the initial choice of centroids, it struggles with
non-spherical clusters, and it is sensitive to outliers. Despite these drawbacks, K-Means remains
a powerful clustering tool in fields such as customer segmentation, document clustering, and
image compression.
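
A minimal sketch of the Elbow Method and the Silhouette Score on the standardized Iris data follows. It assumes scikit-learn's built-in `KMeans` class (not the hand-written class from part 2), and the search range k = 2 to 6 is an arbitrary illustrative choice; it starts at 2 because the silhouette score is undefined for a single cluster. The Gap Statistic is not shown.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Standardize the four Iris features so they contribute equally to distances
X = StandardScaler().fit_transform(load_iris().data)

inertias, silhouettes = {}, {}
for k in range(2, 7):
    # n_init=10 runs K-Means from 10 different initializations and keeps the
    # run with the lowest inertia, which reduces sensitivity to the starting centroids
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                        # Elbow Method: within-cluster SSE
    silhouettes[k] = silhouette_score(X, km.labels_)  # mean silhouette, in [-1, 1]
    print(f"k={k}  inertia={inertias[k]:8.2f}  silhouette={silhouettes[k]:.3f}")

best_k = max(silhouettes, key=silhouettes.get)
print("k with the highest silhouette:", best_k)
```

For the Elbow Method, look for the k after which the inertia stops dropping sharply; for the Silhouette Score, pick the k with the highest mean value. The two criteria need not agree, so they are best treated as guides rather than a definitive answer.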

2. Implement the K-means clustering algorithm using Python.

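```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load the Iris data (150 samples, 4 numeric features)
iris = load_iris()
X = iris.data
y = iris.target

# Standardize each feature to zero mean and unit variance so that no
# single feature dominates the Euclidean distance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)


class KMeans:
    def __init__(self, n_clusters=3, max_iters=100, tol=1e-6, random_state=None):
        self.n_clusters = n_clusters
        self.max_iters = max_iters
        self.tol = tol
        self.rng = np.random.default_rng(random_state)

    def fit(self, X):
        # Pick k distinct data points at random as the initial centroids
        random_idx = self.rng.permutation(len(X))[:self.n_clusters]
        self.centroids = X[random_idx]

        for _ in range(self.max_iters):
            # Assignment step: label each point with its nearest centroid
            self.labels = self._assign_clusters(X)
            # Update step: move each centroid to the mean of its points
            new_centroids = self._calculate_centroids(X)
            # Compare with a tolerance instead of exact float equality
            converged = np.allclose(new_centroids, self.centroids, atol=self.tol)
            self.centroids = new_centroids
            if converged:
                break

        # Make the stored labels consistent with the final centroids
        self.labels = self._assign_clusters(X)
        return self

    def _assign_clusters(self, X):
        # Distance matrix of shape (n_samples, n_clusters)
        distances = np.linalg.norm(X[:, np.newaxis] - self.centroids, axis=2)
        return np.argmin(distances, axis=1)

    def _calculate_centroids(self, X):
        # Start from the current centroids so that a cluster that loses all of
        # its points keeps its old position instead of jumping to the origin
        centroids = self.centroids.copy()
        for i in range(self.n_clusters):
            if np.any(self.labels == i):
                centroids[i] = X[self.labels == i].mean(axis=0)
        return centroids

    def predict(self, X):
        # Assign new points to the nearest learned centroid
        return self._assign_clusters(X)


kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)

# Plot the first two (standardized) features, colored by cluster
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=kmeans.labels, cmap='viridis', marker='o')
plt.scatter(kmeans.centroids[:, 0], kmeans.centroids[:, 1], c='red', marker='x', s=100)
plt.title('K-Means Clustering on Iris Dataset (First Two Features)')
plt.xlabel(iris.feature_names[0] + ' (standardized)')
plt.ylabel(iris.feature_names[1] + ' (standardized)')
plt.show()

# One row per cluster; values are in standardized units
centroids_df = pd.DataFrame(kmeans.centroids, columns=iris.feature_names)
centroids_df.index.name = 'Cluster'
print("\nCentroids:")
print(centroids_df)
```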
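
The centroids printed by the script above are in standardized units, which makes them hard to interpret. The sketch below assumes scikit-learn's built-in `KMeans` as a cross-check rather than the hand-written class; it maps the centroids back to centimeters with `StandardScaler.inverse_transform` and scores the clustering against the known species labels with the adjusted Rand index (ARI). The seed `random_state=42` and `n_init=10` are arbitrary choices.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

iris = load_iris()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(iris.data)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)

# Undo the standardization so each centroid is expressed in centimeters again
centroids_cm = pd.DataFrame(scaler.inverse_transform(km.cluster_centers_),
                            columns=iris.feature_names)
centroids_cm.index.name = "Cluster"
print(centroids_cm.round(2))

# ARI is 1.0 for a perfect match with the species labels and close to 0 for random labels
ari = adjusted_rand_score(iris.target, km.labels_)
print(f"Adjusted Rand index vs. species: {ari:.3f}")
```

The same `scaler.inverse_transform(kmeans.centroids)` call also works on the hand-written class's centroids, since they live in the same standardized space. Cluster numbers are arbitrary, so cluster 0 need not correspond to species 0; the ARI is unaffected by this relabeling.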
