2.2 Lazy Learning
Module 2
K-Nearest Neighbor Learning
Different Learning Methods
Eager Learning
Instance-based Learning
Lazy Learning
A lazy learner waits until the last minute before doing any model
construction in order to classify a given test tuple.
That is, when given a training tuple, a lazy learner simply stores it and
waits until it is given a test tuple.
Only when it sees the test tuple does it perform generalization in order
to classify the tuple based on its similarity to the stored training tuples.
Unlike eager learning methods, lazy learners do less work when a
training tuple is presented and more work when making a classification
or prediction.
They are also referred to as instance-based learners.
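To make this concrete, here is a minimal sketch (not from the slides; the class and method names are illustrative) of a lazy learner that merely memorizes training tuples at fit time and defers all distance computation to prediction time:

```python
import math

class LazyClassifier:
    """Illustrative lazy learner: stores tuples at fit time, works at query time."""

    def fit(self, X, y):
        # "Training" is just memorization; no model is constructed here.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict(self, query, k=1):
        # Generalization happens only now: compare the test tuple to
        # every stored training tuple by Euclidean distance.
        dists = sorted((math.dist(query, x), label) for x, label in zip(self.X, self.y))
        labels = [label for _, label in dists[:k]]
        return max(set(labels), key=labels.count)
```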
Instance-based (Lazy) Learning
Typical approaches
k-nearest neighbor approach
Instances represented as points in a Euclidean space.
Locally weighted regression
Constructs local approximation
Case-based reasoning
Uses symbolic representations and knowledge-based inference
The k-Nearest Neighbor Algorithm
Starting with k = 1, we use a test set to estimate the error rate of the
classifier. This process is repeated, incrementing k each time to allow
for one more neighbor. The k value that gives the minimum error rate
may then be selected.
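One way this selection could be carried out in practice, sketched here with scikit-learn and its bundled iris data purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Starting with k = 1, estimate the test-set error rate for each k,
# then keep the k that gives the minimum error.
errors = {}
for k in range(1, 16):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    errors[k] = 1.0 - model.score(X_test, y_test)

best_k = min(errors, key=errors.get)
print("best k:", best_k, "error:", errors[best_k])
```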
K-Nearest Neighbor – By Example
K-Nearest Neighbor
KNN – Procedure
Step 1 – Determine the parameter k (the number of nearest neighbours).
Step 2 – Calculate the distance between the query instance and all training
samples.
Step 3 – Sort the distances and determine the nearest neighbours based on k.
Step 4 – Gather the labels of the nearest neighbours and take the majority
vote as the prediction for the query instance.
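A compact NumPy sketch of these steps (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def knn_predict(query, X_train, y_train, k):
    # Step 2: distance between the query instance and every training sample
    distances = np.linalg.norm(X_train - query, axis=1)
    # Step 3: sort the distances and keep the indices of the k nearest neighbours
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the nearest neighbours' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```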
Here we consider only two features of each ingredient: sweetness and crunchiness.
The first is a score from 1 to 10 of how sweet the ingredient tastes, and the second is a
measure from 1 to 10 of how crunchy the ingredient is. Each ingredient is then labeled
as one of three types of food: fruits, vegetables, or proteins.
K-Nearest Neighbor – By Example….
We can plot two-dimensional data on a scatter plot, with the x dimension indicating the
ingredient's sweetness and the y dimension, the crunchiness. After adding a few more
ingredients to the taste dataset, the scatter plot might look similar to this:
Similar types of food tend to be grouped closely together. As in the second diagram,
vegetables tend to be crunchy but not sweet, fruits tend to be sweet and either
crunchy or not crunchy, while proteins tend to be neither crunchy nor sweet.
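Since the original scatter plot is not reproduced here, the following sketch shows how such a plot could be drawn; the ingredient scores below are made up for illustration, not taken from the slides:

```python
import matplotlib.pyplot as plt

# Hypothetical (sweetness, crunchiness) scores -- illustrative only
foods = {
    "fruit":     [(8, 5), (9, 1), (7, 7)],
    "vegetable": [(2, 8), (3, 7), (1, 9)],
    "protein":   [(1, 2), (2, 1), (1, 1)],
}

for food_type, points in foods.items():
    xs, ys = zip(*points)
    plt.scatter(xs, ys, label=food_type)

plt.xlabel("how sweet the food tastes")
plt.ylabel("how crunchy the food is")
plt.legend()
plt.show()
```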
K-Nearest Neighbor – By Example….
Now the question: is a tomato a fruit or a vegetable? We can use the nearest
neighbor approach to determine which class is a better fit, as shown in the following
diagram.
K-Nearest Neighbor – By Example….
There are many different ways to calculate distance. Traditionally, the k-NN algorithm
uses Euclidean distance.
Euclidean distance is specified by the following formula, where p and q are the examples to
be compared, each having n features. The term p1 refers to the value of the first feature of
example p, while q1 refers to the value of the first feature of example q:

dist(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
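A minimal sketch of this formula in Python (the example feature values are assumed, not taken from the slides):

```python
import math

def euclidean_distance(p, q):
    # dist(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Two ingredients described by (sweetness, crunchiness) -- values assumed
print(euclidean_distance((6, 4), (8, 5)))  # ~2.24
```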
K-Nearest Neighbor – By Example….
In a similar way, we can calculate the distance between the tomato and several of its closest neighbors
as follows:
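The slide's table of results is not reproduced here; the sketch below performs the same computation using hypothetical (sweetness, crunchiness) values assumed for illustration:

```python
import math

# Hypothetical (sweetness, crunchiness) values, assumed for illustration
tomato = (6, 4)
neighbors = {"grape": (8, 5), "green bean": (3, 7), "nuts": (3, 6), "orange": (7, 3)}

# Print each neighbor's distance to the tomato, nearest first
for name, features in sorted(neighbors.items(), key=lambda item: math.dist(tomato, item[1])):
    print(f"{name}: distance {math.dist(tomato, features):.2f}")
```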
Exercises
2. Consider the following training data:

   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
   7                      7               Bad
   7                      4               Bad
   3                      4               Good
   1                      4               Good
3. Use KNN to determine the diagnosis value of an instance with radius_mean 12.3 and
   texture_mean 22.05.
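As a sketch of how Exercise 2 could be set up, the snippet below encodes the table above and classifies a hypothetical query instance (the query (3, 7) and k = 3 are assumed for illustration, since the slide does not state them):

```python
import math
from collections import Counter

# Training data from Exercise 2: (acid durability, strength) -> classification
train = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]

def knn_classify(query, train, k=3):
    # Distance from the query to every training sample, nearest first
    dists = sorted((math.dist(query, x), label) for x, label in train)
    # Majority vote among the k nearest neighbours
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical query instance -- the exercise does not state one
print(knn_classify((3, 7), train, k=3))  # -> "Good"
```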