Data Science and AI
By
Mohammed Ziyan Kaliyadan
To
Faculty Member of IIT Madras
Table of contents
01 Exploratory Data Analysis (EDA)
02 K-Nearest Neighbours (KNN)
03 K-Means clustering
01
Exploratory Data
Analysis (EDA)
Using Google Sheets
• Google Sheets is a practical tool for Exploratory Data Analysis (EDA): its
built-in functions compute the values needed to answer the mathematical
questions about the dataset.
• It helped us find the average rating of every movie and, from those
averages, the highest-rated movie in the dataset.
• Additionally, the mode and median of the given data can be found
directly with the MODE() and MEDIAN() functions.
• It can also be used to find the highest- and lowest-rated movies by
splitting the data into sections and averaging the ratings within each
section.
• The MAX() and MIN() functions then return the maximum and minimum
values among those averages, giving the output:
For Maximum: Section D,
Movie 1 (3.6)
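The same summary statistics can be sketched in Python with the standard library. The ratings below are hypothetical and serve only to illustrate the steps; they are not the course dataset, and the movie names are placeholders.

```python
from statistics import mean, median, mode

# Hypothetical ratings (1-5), for illustration only; not the course dataset.
ratings = {
    "Movie 1": [4, 5, 3, 4],
    "Movie 2": [2, 3, 3, 2],
    "Movie 3": [5, 4, 4, 2],
}

# Per-movie average, mirroring the averaging step done in the spreadsheet.
averages = {movie: mean(values) for movie, values in ratings.items()}

# Counterparts of MAX() and MIN() applied to the averages.
highest = max(averages, key=averages.get)
lowest = min(averages, key=averages.get)

# Counterparts of MEDIAN() and MODE() over all ratings pooled together.
all_ratings = [r for values in ratings.values() for r in values]
overall_median = median(all_ratings)
overall_mode = mode(all_ratings)

print(highest, lowest, overall_median, overall_mode)
```

Keeping the averages in a dictionary makes it easy to report both the movie name and its score, as the slide does for the maximum.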
02
K-Nearest Neighbours
(KNN)
K-NN
• The K-Nearest Neighbours (KNN) algorithm is a supervised machine
learning method used for classification and regression problems.
• KNN is one of the most basic yet essential classification algorithms in
machine learning.
• For these problems we use the Euclidean distance. To calculate the
distance from user T01 to each of the other users, we use the formula:
$d = \sqrt{\sum_{i} (x_{2i} - x_{1i})^2}$ *
*Source: https://www.geeksforgeeks.org/euclidean-distance/
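As a minimal sketch, the Euclidean distance formula can be written as a short Python function. T01's ratings are the ones given in the slides; the second user is hypothetical and exists only to show a call.

```python
import math

def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length rating lists."""
    if len(a) != len(b):
        raise ValueError("both users must rate the same number of movies")
    # Square each per-movie difference, sum the squares, take the square root.
    return math.sqrt(sum((x2 - x1) ** 2 for x1, x2 in zip(a, b)))

t01 = [2, 5, 2, 5, 3]    # T01's ratings for Movies 1-5 (from the slides)
other = [2, 5, 2, 5, 5]  # hypothetical user, for illustration only

print(round(euclidean_distance(t01, other), 3))  # 2.0
```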
• After importing the dataset into Orange, we calculate the distance
between the points with Distances (Euclidean). The results can be viewed
in the Distance Matrix, where we manually pick the closest distances
between the points.
• We can also use a programming language such as Python to sort the
distances in ascending order.
• T01 has ratings 2, 5, 2, 5, 3 for Movies 1, 2, 3, 4, 5 respectively.
• Thus we get the 3 nearest points for T01:
B09 (0.000), C02 (2.000) and D02 (2.236)
• Here we have taken the distance of every user from T01 over the first 5
movies.
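The ranking step can be sketched in Python as follows. Only T01's ratings come from the slides; the user IDs U1 to U4 and their ratings are hypothetical, chosen so that the ordering is easy to follow by hand.

```python
import math

# T01's ratings for Movies 1-5 are from the slides; every other row is
# hypothetical and used only to illustrate the ranking.
t01 = [2, 5, 2, 5, 3]
users = {
    "U1": [2, 5, 2, 5, 3],
    "U2": [5, 1, 4, 2, 1],
    "U3": [2, 5, 2, 5, 5],
    "U4": [3, 5, 2, 5, 5],
}

def nearest_neighbours(target, candidates, k):
    """Return the k (user_id, distance) pairs closest to target."""
    distances = [
        (uid, math.dist(target, ratings)) for uid, ratings in candidates.items()
    ]
    # Sort in ascending order of distance, then keep the first k entries.
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]

for uid, d in nearest_neighbours(t01, users, k=3):
    print(f"{uid} ({d:.3f})")
```

`math.dist` requires Python 3.8 or later; on older versions the distance can be computed with `math.sqrt` and a sum of squared differences instead.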
• As noted earlier, the KNN algorithm predicts a target using features and
labels. In this setup, File holds the ratings of all users except T01 and is
connected to kNN and then to Predictions. It acts as the training data for
the test data that is to be predicted.
• A second input, File(1), holds our test subject T01 with all ratings
except Movie 6. The KNN model trained on File is applied to File(1), which
gives the predicted value.
• The prediction depends on the value of K. Here we use K = 1, 3 and 50,
and the predicted value may change with K.
• File(2), shown below, was used to cross-verify the value predicted for
each K.
Output
Given the Value:
• When K=1
Movie 6 = 4
• When K=3
Movie 6 = 3
• When K=50
Movie 6 = 3
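The effect of K can be illustrated with a small majority-vote KNN sketch in plain Python. This is not Orange's implementation, whose metric and weighting settings may differ. The training rows are hypothetical and were constructed so that the prediction changes between K = 1 and K = 3; K = 5 stands in for a large K because the toy set has only five rows.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Predict a label by majority vote among the k nearest training rows.

    train is a list of (features, label) pairs. If k exceeds the number of
    rows, every row votes.
    """
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: ratings for Movies 1-5, then the Movie 6 rating.
train = [
    ([2, 5, 2, 5, 3], 4),
    ([2, 5, 2, 5, 5], 3),
    ([3, 5, 2, 5, 5], 3),
    ([5, 1, 4, 2, 1], 1),
    ([4, 2, 5, 1, 2], 3),
]
t01 = [2, 5, 2, 5, 3]  # T01's ratings for Movies 1-5 (from the slides)

for k in (1, 3, 5):
    print(f"K={k}: Movie 6 = {knn_predict(train, t01, k)}")
```

With K = 1 the single closest row decides the answer, while larger K values let more distant users outvote it, which is why the predicted rating can change with K.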
03
K-Means Clustering
K-Means
• K-Means Clustering is an unsupervised machine learning algorithm that
groups an unlabeled dataset into different clusters.
• Unsupervised machine learning is the process of teaching a computer to
use unlabeled, unclassified data, enabling the algorithm to operate on
that data without supervision.
*Source: https://keytodatascience.com/k-means-clustering-algorithm/
• We first import the dataset into Orange. It is given that the centres of
clusters C1 and C2 are initially assigned to users A06 and B03 respectively.
• Considering user ID A01, we have to find the distance between it and
cluster C1. We follow the same steps as for K-NN and view the distances
in the Distance Matrix.
• We see that the distance between them is 7.071.
• The cluster that user ID E05 belongs to can be found easily by running
K-Means and viewing the result in a Scatter Plot.
• We can choose any axes, since the cluster assignment stays the same
even after the axes are changed.
• As we see here, user ID E05 belongs to cluster C2.
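The assignment and update steps behind K-Means can be sketched in plain Python. This is a minimal illustration, not Orange's implementation. The points P1 to P5 are hypothetical two-dimensional data, and the initial centres are taken from two of the data points, mirroring how the exercise starts C1 and C2 from two given users.

```python
import math

def assign_clusters(points, centres):
    """Map each point id to the index of its nearest centre."""
    return {
        pid: min(range(len(centres)), key=lambda c: math.dist(vec, centres[c]))
        for pid, vec in points.items()
    }

def update_centres(points, assignment, centres):
    """Recompute each centre as the mean of its members (kept if empty)."""
    new_centres = []
    for c, old in enumerate(centres):
        members = [points[pid] for pid, idx in assignment.items() if idx == c]
        if members:
            new_centres.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centres.append(list(old))
    return new_centres

def k_means(points, centres, max_iter=100):
    """Alternate assignment and update until the assignment stops changing."""
    assignment = None
    for _ in range(max_iter):
        new_assignment = assign_clusters(points, centres)
        if new_assignment == assignment:
            break
        assignment = new_assignment
        centres = update_centres(points, assignment, centres)
    return assignment, centres

# Hypothetical data, for illustration only.
points = {"P1": [1, 1], "P2": [1, 2], "P3": [5, 5], "P4": [6, 5], "P5": [5, 6]}
initial_centres = [points["P1"], points["P3"]]  # index 0 plays C1, index 1 plays C2

final_assignment, final_centres = k_means(points, initial_centres)
print(final_assignment)
```

Because the assignment is computed from all coordinates of each point, it does not depend on which two axes a scatter plot happens to display.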
Conclusion
• In this course we learned about different types of data, such as structured and
unstructured data, and looked in detail at the paradigms of learning, namely:
• Supervised learning
• Unsupervised learning
• Sequential learning
• We also looked at implementing the K-NN, Decision Tree, Neural Network, K-Means and
Hierarchical Clustering algorithms. Among these, we studied K-NN and K-Means in depth,
enough to answer questions based on them, and explored some parts of sequential
learning.
• We also looked into the responsibilities involved in creating AI and the etiquette it should follow.
• Additionally, we touched on Google Sheets, which helped in finding the answers to
mathematical problems using built-in functions.