2.2 Lazy Learning

K-nearest neighbor (KNN) is an instance-based, lazy learning algorithm that can be used for classification or regression. It stores all available cases and classifies new cases based on a vote of its k nearest neighbors. KNN works by finding the k training examples that are closest in distance to the new example, and predicting the new example to belong to the most common class among its k nearest neighbors. The distance is usually measured using Euclidean distance. Choosing an appropriate value for k is important, as different k values can result in different classifications. Generally, a larger k reduces the effect of noise on the classification, but makes boundaries between classes less distinct.

Lazy Learning

Module 2
K-Nearest Neighbor Learning
Different Learning Methods
 Eager Learning
[Cartoon: having already generalized from past observations, the learner concludes from any random movement, "It's a mouse!"]
 Instance-based (Lazy) Learning
[Cartoon: the learner compares the new object with stored examples, "It's very similar to a desktop!"]
Different Learning Methods
 Eager Learning
 Explicit description of the target function is built on the whole training set
 Instance-based Learning
 Learning = storing all training instances
 Classification = assigning the target function to a new instance
 Referred to as "Lazy" learning
Instance-Based Methods (Lazy Learners)

 A lazy learner waits until the last minute before doing any model construction to classify a given test tuple.
 That is, when given a training tuple, a lazy learner simply stores it and waits until it is given a test tuple.
 Only when it sees the test tuple does it perform generalization, in order to classify the tuple based on its similarity to the stored training tuples.
 Unlike eager learning methods, lazy learners do less work when a training tuple is presented and more work when making a classification or prediction.
 They are also referred to as instance-based learners.
Instance-based (Lazy) Learning

 Store training examples and delay the processing ("lazy evaluation") until a new instance must be classified.
 Typical approaches
  k-nearest neighbor approach
   Instances are represented as points in a Euclidean space.
  Locally weighted regression
   Constructs a local approximation of the target function.
  Case-based reasoning
   Uses symbolic representations and knowledge-based inference.
The k-Nearest Neighbor Algorithm

 Nearest-neighbor classifiers are based on learning by analogy, that is, by comparing a given test tuple with training tuples that are similar to it.
 The training tuples are described by n attributes.
 Each tuple represents a point in an n-dimensional space.
 In this way, all of the training tuples are stored in an n-dimensional pattern space.
 When given an unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k "nearest neighbors" of the unknown tuple.
[Figure: 1-Nearest Neighbor classification]
[Figure: 3-Nearest Neighbor classification]
K-Nearest Neighbor…

 "Closeness" is defined in terms of a distance metric, such as Euclidean distance.
 The Euclidean distance between two points or tuples, say, X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n), is

dist(X1, X2) = √((x11 − x21)² + (x12 − x22)² + … + (x1n − x2n)²)
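For illustration, a minimal Python sketch of this distance formula:

```python
import math

def euclidean_distance(x1, x2):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```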
K-Nearest Neighbor…

 How do we determine a good value for k, the number of neighbors?
 Starting with k = 1, we use a test set to estimate the error rate of the classifier. This process can be repeated, each time incrementing k to allow for one more neighbor. The k value that gives the minimum error rate may be selected.
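A minimal sketch of this k-selection loop, assuming scikit-learn is available; the synthetic dataset is only a stand-in for real training data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: 200 points, 2 informative features, 2 classes.
X, y = make_classification(n_samples=200, n_features=2,
                           n_informative=2, n_redundant=0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

best_k, best_err = None, float("inf")
for k in range(1, 16):                       # try k = 1, 2, ..., 15
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    err = 1 - clf.score(X_test, y_test)      # error rate on the test set
    if err < best_err:
        best_k, best_err = k, err

print(best_k, best_err)                      # k with the minimum error rate
```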
K-Nearest Neighbor – By Example

The letter k is a variable term, implying that any number of nearest neighbors could be used.
The k-Nearest Neighbor Algorithm – Features

 All instances correspond to points in the n-dimensional space.
 Classification is delayed until a new instance arrives.
 The nearest neighbors are defined in terms of Euclidean distance.
 The target function could be discrete- or real-valued.
 For a discrete-valued target function, k-NN returns the most common value among the k training examples nearest to xq.
K-Nearest Neighbor
KNN – Procedure

Step 1: Determine the parameter k. (Let k = 3.)

Step 2: Calculate the distance between the query instance and all training samples.

Step 3: Sort the distances and determine the nearest neighbors based on k.

Step 4: Gather the class labels of the nearest neighbors.

Step 5: Use the majority category of the nearest neighbors as the prediction value of the query instance.
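The five steps map directly onto a minimal from-scratch sketch in Python; `train` (a list of (features, label) pairs) and `query` (a feature tuple) are hypothetical names:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # Step 1: the parameter k is passed in (here, default k = 3).
    # Step 2: distance from the query instance to every training sample.
    distances = [(math.dist(features, query), label)
                 for features, label in train]
    # Step 3: sort the distances and keep the k nearest neighbors.
    neighbors = sorted(distances)[:k]
    # Step 4: gather the class labels of those neighbors.
    labels = [label for _, label in neighbors]
    # Step 5: the majority category is the prediction.
    return Counter(labels).most_common(1)[0][0]
```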
K-Nearest Neighbor – By Example
To illustrate this process, consider the following dataset:

[Table: ingredients with their sweetness and crunchiness scores and food-type labels]

Here we consider only two features of each ingredient: sweetness and crunchiness. The first is a 1-to-10 score of how sweet the ingredient tastes, and the second is a 1-to-10 measure of how crunchy the ingredient is. Each ingredient is then labeled as one of three types of food: fruits, vegetables, or proteins.
K-Nearest Neighbor – By Example….

We can plot two-dimensional data on a scatter plot, with the x dimension indicating the ingredient's sweetness and the y dimension its crunchiness. After adding a few more ingredients to the taste dataset, the scatter plot might look similar to this:

[Scatter plot: similar types of food tend to be grouped closely together. Vegetables tend to be crunchy but not sweet, fruits tend to be sweet and either crunchy or not crunchy, while proteins tend to be neither crunchy nor sweet.]
K-Nearest Neighbor – By Example….

Now the question: is the tomato a fruit or a vegetable? We can use the nearest neighbor approach to determine which class is a better fit, as shown in the following diagram.
K-Nearest Neighbor – By Example….

Measuring similarity with distance

 Locating the tomato's nearest neighbors requires a distance function, that is, a formula that measures the similarity between two instances.
 There are many different ways to calculate distance. Traditionally, the k-NN algorithm uses Euclidean distance.
 Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p1 refers to the value of the first feature of example p, while q1 refers to the value of the first feature of example q:

dist(p, q) = √((p1 − q1)² + (p2 − q2)² + … + (pn − qn)²)
K-Nearest Neighbor – By Example….

Measuring similarity with distance

 To calculate the distance between the tomato (sweetness = 6, crunchiness = 4) and the green bean (sweetness = 3, crunchiness = 7), we can use the formula as follows:

dist(tomato, green bean) = √((6 − 3)² + (4 − 7)²) = √(9 + 9) = √18 ≈ 4.2

 In a similar way, we can calculate the distance between the tomato and several of its closest neighbors. (For instance, the orange turns out to be the closest, at a distance of 1.4.)
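As a quick sanity check, the same distance can be computed with Python's standard library:

```python
from math import dist

# Tomato (6, 4) vs. green bean (3, 7), as given in the text.
print(dist((6, 4), (3, 7)))  # 4.2426... ≈ 4.2
```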
K-Nearest Neighbor – By Example….

 To classify the tomato as a vegetable, protein, or fruit, we begin by assigning the tomato the food type of its single nearest neighbor. This is called 1-NN classification because k = 1. The orange is the nearest neighbor to the tomato, with a distance of 1.4. As the orange is a fruit, the 1-NN algorithm classifies the tomato as a fruit.
 If we use the k-NN algorithm with k = 3 instead, it performs a vote among the three nearest neighbors: orange, grape, and nuts. Since the majority class among these neighbors is fruit (two of the three votes), the tomato is again classified as a fruit.
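The k = 3 vote itself can be sketched with the standard library, using the labels from the text (orange and grape are fruits; nuts are a protein):

```python
from collections import Counter

# Class labels of the tomato's three nearest neighbors.
votes = Counter(["fruit", "fruit", "protein"])
print(votes.most_common(1)[0][0])  # -> "fruit", two votes out of three
```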
Choosing an appropriate k

 Different values of k can produce different classifications for the same query instance, so the choice of k matters.
 Generally, a larger k reduces the effect of noise on the classification but makes the boundaries between classes less distinct, while a very small k lets individual noisy examples sway the result.
Exercises

1. Based on a survey conducted in an institution, students are classified based on two attributes: academic excellence and other achievements. Consider the given data set. Using the KNN algorithm with k = 3, find the classification of a student with X = 5 and Y = 7 based on the trained samples.
Exercises

2.

X1 (Acid Durability)   X2 (Strength)   Y (Classification)
7                      7               Bad
7                      4               Bad
3                      4               Good
1                      4               Good
Exercises

3. Use KNN to determine the diagnosis value of an instance with radius_mean = 12.3 and texture_mean = 22.05.
