ML Lec-13


ML

LECTURE-13
BY
Dr. Ramesh Kumar Thakur
Assistant Professor (II)
School Of Computer Engineering
v 1. Using the KNN algorithm and the given data set, predict the label of the test data point (3,7), using K=3
and Euclidean distance.
X Y Label
7 7 1
7 4 1
3 4 2
1 4 2

v Ans :- To predict the label of the test data point, we calculate its distance to every point in the data set
using the Euclidean distance formula.

v Euclidean distance formula : √((X₂-X₁)² + (Y₂-Y₁)²)


v Where:
v X₂ = Test data point's X value (3).
v X₁= Existing data's X value.
v Y₂ = Test data point's Y value (7).
v Y₁ = Existing data's Y value.
v Distance #1
v For the first row, d1:
v d1 = √((3 - 7)² + (7 - 7)²)
v = √(16 + 0)
v = √16
v = 4

v Distance #2
v For the second row, d2:
v d2 = √((3 - 7)² + (7 - 4)²)
v = √(16 + 9)
v = √25
v = 5
v Distance #3
v For the third row, d3:
v d3 = √((3 - 3)² + (7 - 4)²)
v = √(0 + 9)
v = √9
v = 3

v Distance #4
v For the fourth row, d4:
v d4 = √((3 - 1)² + (7 - 4)²)
v = √(4 + 9)
v = √13
v ≈ 3.6
v Here's what the table will look like after all the distances have been calculated:
X Y Label Distance
7 7 1 4
7 4 1 5
3 4 2 3
1 4 2 3.6

v As we can see, the 3 nearest neighbours are the rows with distances 3, 3.6, and 4, whose labels are 2, 2,
and 1. The majority class among these neighbours is label 2, so we classify the test data point as label 2.
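The distance column above can be checked with a short Python sketch (the lecture gives no code, so the language and variable names here are illustrative):

```python
import math

# Training points from the table; the test point is (3, 7)
points = [(7, 7), (7, 4), (3, 4), (1, 4)]
test = (3, 7)

# Reproduce the Distance column: √((X₂-X₁)² + (Y₂-Y₁)²) for each row
distances = [math.sqrt((test[0] - x) ** 2 + (test[1] - y) ** 2)
             for (x, y) in points]
print([round(d, 1) for d in distances])  # -> [4.0, 5.0, 3.0, 3.6]
```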

v 2. Perform KNN classification on the following training instances, each having two attributes (X1 and X2).
Compute the class label for the test instance t1 = (3,7), with K=3 and Euclidean distance.
Training instances X1 X2 output
I1 7 7 0
I2 7 4 0
I3 3 4 1
I4 1 4 1

v Ans :- Same as the solution to question 1.


v 3. Explain the merits and demerits of the cosine distance measure. Find the cosine distance between (1,6,1,0)
and (0,1,2,2).
v Ans:- Merits of the cosine distance measure:-
v a) Low storage cost.
v b) High computational efficiency.
v c) Good retrieval performance.
v Demerits of the cosine distance measure:-
v I. It yields the same value regardless of the magnitudes of the vectors being compared, as long as the angle
between them is the same, so differences in scale are ignored.
v II. It does not take the semantic meaning of words or phrases into account; it only compares the vectors it
is given.
v Formula for cosine distance:-
v Cosine distance(p, q) = 1 − cos θ = 1 − (p . q) / (||p|| ||q||)
v Where,
v p . q = dot product of the vectors ‘p’ and ‘q’.
v ||p|| and ||q|| = lengths (magnitudes) of the two vectors ‘p’ and ‘q’.
v Cosine distance between (1,6,1,0) and (0,1,2,2):
v p . q = (1)(0) + (6)(1) + (1)(2) + (0)(2) = 8
v ||p|| = √(1² + 6² + 1² + 0²) = √38 ≈ 6.164
v ||q|| = √(0² + 1² + 2² + 2²) = √9 = 3
v cos θ = 8 / (3 × √38) ≈ 0.433
v Cosine distance = 1 − 0.433 ≈ 0.567
v 4. Under what conditions is the Minkowski distance the same as the Euclidean distance?

v Ans:- When the order (k) in the Minkowski distance formula is 2, it is the same as the Euclidean distance.
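This equivalence is easy to demonstrate in a few lines of Python (the points chosen below are arbitrary examples, not from the lecture):

```python
import math

def minkowski(p, q, k):
    """Minkowski distance of order k between two equal-length points."""
    return sum(abs(a - b) ** k for a, b in zip(p, q)) ** (1 / k)

p, q = (1, 2, 3), (4, 6, 8)   # arbitrary example points
print(minkowski(p, q, 1))     # order 1 -> Manhattan distance: 12.0
print(minkowski(p, q, 2))     # order 2 -> Euclidean distance
print(math.dist(p, q))        # matches Python's built-in Euclidean distance
```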

v 5.Suppose you have a dataset of animals and you want to use KNN to predict whether a new animal is a
cat or a dog based on its weight and height. You have the following dataset.
Animal Weight (Kg) Height (Cm) Species
1 4 35 Cat
2 6 40 Dog
3 3 25 Cat
4 7 45 Dog
5 5 30 Cat
6 8 50 Dog
7 2 20 Cat
8 5 35 Dog

v Predict the species of a new animal that weighs 4 Kg and is 30 Cm tall.
v Ans: To predict the species of the new animal, we calculate the distance from the new animal's features to
those of each animal in the data set using the Euclidean distance formula.
v Euclidean distance formula : √((X₂-X₁)² + (Y₂-Y₁)²)
v Where:
v X₂ = New animal's weight (4).
v X₁= Existing animal's weight.
v Y₂ = New animal's height (30).
v Y₁ = Existing animal's height.

v Distance #1
v For the first row, d1:
v d1 = √((4 - 4)² + (30 - 35)²)
v = √(0 + 25)
v = √25
v = 5
v Distance #2
v For the second row, d2:
v d2 = √((4 - 6)² + (30 - 40)²)
v = √(4 + 100)
v = √104
v ≈ 10.2

v Distance #3
v For the third row, d3:
v d3 = √((4 - 3)² + (30 - 25)²)
v = √(1 + 25)
v = √26
v ≈ 5.1
v Distance #4
v For the fourth row, d4:
v d4 = √((4 - 7)² + (30 - 45)²)
v = √(9 + 225)
v = √234
v ≈ 15.3

v Distance #5
v For the fifth row, d5:
v d5 = √((4 - 5)² + (30 - 30)²)
v = √(1 + 0)
v = √1
v = 1
v Distance #6
v For the sixth row, d6:
v d6 = √((4 - 8)² + (30 - 50)²)
v = √(16 + 400)
v = √416
v ≈ 20.4

v Distance #7
v For the seventh row, d7:
v d7 = √((4 - 2)² + (30 - 20)²)
v = √(4 + 100)
v = √104
v ≈ 10.2
v Distance #8
v For the eighth row, d8:
v d8 = √((4 - 5)² + (30 - 35)²)
v = √(1 + 25)
v = √26
v ≈ 5.1
v Here's what the table will look like after all the distances have been calculated:
Animal Weight (Kg) Height (Cm) Species Distance
1 4 35 Cat 5
2 6 40 Dog 10.2
3 3 25 Cat 5.1
4 7 45 Dog 15.3
5 5 30 Cat 1
6 8 50 Dog 20.4
7 2 20 Cat 10.2
8 5 35 Dog 5.1

v As we can see, the 3 nearest neighbours have distances 1, 5, and 5.1. (Animals 3 and 8 are tied at √26 ≈ 5.1,
but the two nearest neighbours are both cats, so the vote is Cat either way.) The majority class within the 3
nearest neighbours is Cat. Therefore, we'll classify the new animal as a cat.
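The whole procedure can be wrapped in a reusable function, sketched here in Python (the function name `knn_predict` and the data layout are illustrative assumptions):

```python
import math
from collections import Counter

def knn_predict(train, test_point, k):
    """Classify test_point by majority vote among its k nearest
    training points, using Euclidean distance."""
    dists = sorted(
        (math.dist(features, test_point), label)
        for features, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# (weight kg, height cm) -> species, from the table above
animals = [
    ((4, 35), "Cat"), ((6, 40), "Dog"), ((3, 25), "Cat"),
    ((7, 45), "Dog"), ((5, 30), "Cat"), ((8, 50), "Dog"),
    ((2, 20), "Cat"), ((5, 35), "Dog"),
]
print(knn_predict(animals, (4, 30), k=3))  # -> Cat
```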
v 6.Evaluate the Euclidean distance, Manhattan distance, Minkowski distance and the Cosine distance for
the following two points. P1(1,0,2,5,3) and P2(2,1,0,3,-1).
v Ans:- Euclidean distance formula:- d(P1,P2) = √(Σᵢ (xᵢ - yᵢ)²)

v Euclidean distance between P1(1,0,2,5,3) and P2(2,1,0,3,-1) is
v √((1-2)² + (0-1)² + (2-0)² + (5-3)² + (3-(-1))²) = √(1+1+4+4+16) = √26 ≈ 5.1

v Manhattan distance formula:- d(P1,P2) = Σᵢ |xᵢ - yᵢ|

v Manhattan distance between P1(1,0,2,5,3) and P2(2,1,0,3,-1) is
v |1-2| + |0-1| + |2-0| + |5-3| + |3-(-1)| = 1+1+2+2+4 = 10

v Minkowski distance formula:- d(P1,P2) = (Σᵢ |xᵢ - yᵢ|ᵏ)^(1/k)

v Minkowski Distance is the generalized form of the Euclidean and Manhattan distances. Here, k
represents the order of the norm. When the order (k) is 1, it represents the Manhattan distance, and
when the order is 2, it represents the Euclidean distance. So the Minkowski distance is 10 when k=1
and 5.1 when k=2.
v Cosine distance formula:- Cosine distance(p, q) = 1 − (p . q) / (||p|| ||q||)
v Where,
v p . q = dot product of the vectors ‘p’ and ‘q’.
v ||p|| and ||q|| = lengths (magnitudes) of the two vectors ‘p’ and ‘q’.
v Cosine distance between P1(1,0,2,5,3) and P2(2,1,0,3,-1):
v P1 . P2 = (1)(2) + (0)(1) + (2)(0) + (5)(3) + (3)(-1) = 14
v ||P1|| = √(1+0+4+25+9) = √39 ≈ 6.245
v ||P2|| = √(4+1+0+9+1) = √15 ≈ 3.873
v cos θ = 14 / (√39 × √15) ≈ 0.579
v Cosine distance = 1 − 0.579 ≈ 0.421
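All four distances asked for in this question can be verified with a short Python sketch (the variable names are illustrative; only the two points come from the lecture):

```python
import math

P1, P2 = (1, 0, 2, 5, 3), (2, 1, 0, 3, -1)

manhattan = sum(abs(a - b) for a, b in zip(P1, P2))
euclidean = math.dist(P1, P2)
dot = sum(a * b for a, b in zip(P1, P2))
cosine = 1 - dot / (math.sqrt(sum(a * a for a in P1)) *
                    math.sqrt(sum(b * b for b in P2)))

print(manhattan)            # -> 10
print(round(euclidean, 1))  # -> 5.1
print(round(cosine, 3))     # -> 0.421
```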

v 7. Why is KNN called a lazy learner algorithm?

v Ans:- KNN is called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and performs its computation at classification time.
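This "store now, compute later" behaviour can be made concrete with a minimal sketch (the class name `LazyKNN` and its fit/predict interface are illustrative assumptions, not from the lecture):

```python
import math
from collections import Counter

class LazyKNN:
    """Minimal sketch of KNN's lazy behaviour: fit() does no
    computation at all, it only stores the training data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" is just memorising the data; nothing is learned here.
        self.points, self.labels = list(points), list(labels)
        return self

    def predict(self, x):
        # All the real work happens at prediction time.
        dists = sorted(
            (math.dist(p, x), lbl)
            for p, lbl in zip(self.points, self.labels)
        )
        return Counter(lbl for _, lbl in dists[:self.k]).most_common(1)[0][0]

# Question 1's data: fitting is instant, the distance work happens in predict()
model = LazyKNN(k=3).fit([(7, 7), (7, 4), (3, 4), (1, 4)], [1, 1, 2, 2])
print(model.predict((3, 7)))  # -> 2
```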
v 8.Perform KNN classification on the following dataset and predict the class for (height = 170, weight =
57), with K=5 using Euclidean distance.
Height (CM) Weight(KG) Class
167 51 Underweight
182 62 Normal
176 69 Normal
172 65 Normal
173 64 Normal
174 56 Underweight
169 58 Normal
173 57 Normal
170 55 Normal
170 57 ?

v 9.Using KNN algorithm and the given data set, predict the label of the test data point (8,5), where K=3
and Euclidean distance.
X Y Label
4.2 3.8 0
6.5 7.7 1
7.3 8.6 1
5.7 5.9 0
8.0 8.1 1
10.0 6.5 1

v 10.The Manhattan distance between two points (10, 10) and (30, 30) is ?
v 11.Using KNN algorithm and the given data set, predict the class label of the test data point (16,8), where
K=3 and Euclidean distance.
X Y Label
10 5 0
6.5 11 1
7 15 1
12 5 0
8 10 1
15 8 0

v Note:- Questions 8-11 are homework.
