Pa66 ML Exp6
Tech (R&A)
Semester: 7 Subject: Machine Learning
Name: Pratik Mane Class: R&A Final Year
Roll No: PA66 Batch:
Experiment No: 06
Submitted on:
Objective:
1. To solve a clustering example using the k-means algorithm
2. To implement the same in Python
www.mitwpu.edu.in
Expt. 6
(The following part to be solved in the colab notebook)
Conclusion:
Post-lab questions:
1. State the applications of the k-means clustering algorithm.
https://www-users.cs.umn.edu/~kumar001/dmbook/ch8.pdf
11/6/22, 1:40 PM PA66 EXP6 ML.ipynb - Colaboratory
df.head()
      Name  Age  Income ($)
0      Rob   27       70000
1  Michael   29       90000
2    Mohan   29       61000
3   Ismail   28       60000
4     Kory   42      150000
<matplotlib.collections.PathCollection at 0x7f899eebc110>
# Choosing value of k
from sklearn.cluster import KMeans
km = KMeans(n_clusters=3)
km
https://colab.research.google.com/drive/1holqy_3nnMUE2ghbRw32OHUPZxL4MAsb#scrollTo=lB169pK9cYmI&printMode=true 1/7
KMeans(n_clusters=3)
# The name column is excluded here since it is a string and will not be used in numeric computation
# Fit and predict
y_predicted = km.fit_predict(df[['Age','Income ($)']])
y_predicted
array([1, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2],
dtype=int32)
df['Cluster'] = y_predicted
df.head()
      Name  Age  Income ($)  Cluster
0      Rob   27       70000        1
1  Michael   29       90000        1
2    Mohan   29       61000        2
3   Ismail   28       60000        2
4     Kory   42      150000        0
km.cluster_centers_
array([[3.82857143e+01, 1.50000000e+05],
[3.40000000e+01, 8.05000000e+04],
[3.29090909e+01, 5.61363636e+04]])
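Each center printed above is, up to convergence tolerance, the mean of the points assigned to that cluster. As a rough illustration of the assign/update loop behind k-means (Lloyd's iterations), here is a minimal plain-Python sketch. It is not scikit-learn's implementation: the function name `kmeans_sketch`, the fixed starting centers, the choice of k = 2, and the use of only the five rows shown by `df.head()` are all assumptions made for the example.

```python
def kmeans_sketch(points, centers, max_iter=100):
    """Plain-Python Lloyd iterations; `centers` are the starting guesses."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return labels, centers


# (Age, Income ($)) for the five rows shown by df.head()
sample = [(27, 70000), (29, 90000), (29, 61000), (28, 60000), (42, 150000)]
# Hypothetical starting centers: the first and last sample rows
labels, centers = kmeans_sketch(sample, centers=[sample[0], sample[4]])
print(labels)   # [0, 0, 0, 0, 1]
print(centers)  # [(28.25, 70250.0), (42.0, 150000.0)]
```

With these starting centers the loop settles after two passes: the four lower-income rows form one cluster and Kory forms the other.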
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()
# From the graph we can conclude that the features are not on comparable scales, that is 16000 versus 43
# Therefore we need to apply a min-max scaler so that k-means works properly.
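To see why unscaled features distort the result, it helps to compare how much each feature contributes to the squared Euclidean distance that k-means minimizes. The sketch below uses two rows from the `df.head()` output; the helper name `squared_distance_parts` is made up for illustration.

```python
def squared_distance_parts(a, b):
    """Return each feature's contribution to the squared Euclidean distance."""
    return [(x - y) ** 2 for x, y in zip(a, b)]


rob = (27, 70000)    # (Age, Income ($)) from row 0
kory = (42, 150000)  # (Age, Income ($)) from row 4

age_part, income_part = squared_distance_parts(rob, kory)
print(age_part)     # 225
print(income_part)  # 6400000000

# Share of the squared distance that comes from Age
age_share = age_part / (age_part + income_part)
print(age_share < 1e-6)  # True
```

A 15-year age gap contributes almost nothing next to an 80,000-dollar income gap, so on raw values the clusters are decided almost entirely by income.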
<matplotlib.legend.Legend at 0x7f899eeafc50>
df1 = df[df.Cluster==0]
df2 = df[df.Cluster==1]
df3 = df[df.Cluster==2]
plt.xlabel('Age')
plt.ylabel('Income ($)')
<matplotlib.legend.Legend at 0x7f899f4a7dd0>
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(df[['Age']])
df[['Age']] = scaler.transform(df[['Age']])
df
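Min-max scaling maps each value x to (x - min) / (max - min), so the smallest value in a column becomes 0 and the largest becomes 1; this is what `MinMaxScaler` computes with its default range. A minimal plain-Python sketch follows, using only the five ages visible in `df.head()` (the full dataset has more rows, so the notebook's scaled values will differ). `min_max_scale` is an illustrative helper, not a library function.

```python
def min_max_scale(values):
    """Rescale values linearly so min(values) -> 0.0 and max(values) -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero and map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [27, 29, 29, 28, 42]  # Age column of the five rows shown by df.head()
scaled_ages = min_max_scale(ages)
print(scaled_ages)
```

The conclusion states that both attributes were scaled, so the same transformation would also be applied to the `Income ($)` column.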
array([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
dtype=int32)
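df['Cluster'] = y_predicted
# drop() returns a new DataFrame for display only; it is not assigned back, so df keeps the Cluster column
df.drop('Cluster', axis='columns')
df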
       Name       Age  Income ($)  Cluster
19     Alia  0.764706    0.299145        0
20      Sid  0.882353    0.316239        0
21    Abdul  0.764706    0.111111        0
km.cluster_centers_
array([[0.85294118, 0.2022792 ],
       [0.1372549 , 0.11633428],
       [0.72268908, 0.8974359 ]])
df1 = df[df.Cluster==0]
df2 = df[df.Cluster==1]
df3 = df[df.Cluster==2]
plt.xlabel('Age')
plt.ylabel('Income ($)')
<matplotlib.legend.Legend at 0x7f899f176c90>
k_rng = range(1,10)
sse = []
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df[['Age','Income ($)']])
    sse.append(km.inertia_)
sse
[5.434011511988178,
2.091136388699078,
0.4750783498553096,
0.3491047094419566,
0.26640301246684156,
0.21066678488010523,
0.17681044133887713,
0.13265419827245162,
0.10497488680620906]
plt.xlabel('k')
plt.ylabel('Sum of Squared Error')
plt.plot(k_rng,sse)
[<matplotlib.lines.Line2D at 0x7f899ed77890>]
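The `inertia_` value collected in `sse` is the sum of squared distances from each sample to its closest cluster center. The plain-Python sketch below computes that quantity by hand for a toy set of points; the points, the centers, and the helper name `sum_squared_error` are invented for illustration and are not taken from the notebook's dataset.

```python
def sum_squared_error(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    total = 0.0
    for pt in points:
        total += min(
            sum((p - c) ** 2 for p, c in zip(pt, center))
            for center in centers
        )
    return total


# Toy data: two tight groups on a 0-to-1 scale
points = [(0.0, 0.0), (0.0, 0.2), (1.0, 1.0), (1.0, 0.8)]

# k = 1: a single center at the overall mean
sse_k1 = sum_squared_error(points, [(0.5, 0.5)])
# k = 2: one center at the mean of each group
sse_k2 = sum_squared_error(points, [(0.0, 0.1), (1.0, 0.9)])
print(sse_k1, sse_k2)
```

Adding centers generally keeps lowering the SSE, so the elbow method looks for the k after which the drop flattens out. In the `sse` list above, the steep fall ends at k = 3 (5.43, 2.09, 0.48, then 0.35).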
Conclusion: The k-means algorithm was applied to a dataset with Income and Age as input
attributes. First, both input attributes were scaled to the range 0 to 1. The value of k was
then chosen using the elbow curve method; it came out to be 3, which was verified at the end.
Visually, too, one can tell from observing the plot that three clusters would form. Finally,
the dataset was refined twice to obtain the most accurate clustering of the data set.