ML Lecture#2


Machine Learning

Lecture # 2
Data Normalization, KNN & Minimum Distance

Generalization
– While classes can be specified by training samples
with known labels, the goal of a recognition
system is to recognize novel inputs
– When a recognition system is over-fitted to
training samples, it may give bad performance for
typical inputs

Overfitting

PERFORMANCE MEASUREMENTS
R.O.C. Analysis

False positives – i.e. falsely predicting an event


False negatives – i.e. missing an incoming event

Similarly, we have “true positives” and “true negatives”

                 Prediction
                 0      1
Truth     0      TN     FP
          1      FN     TP

Accuracy Measures
• Accuracy
– = (TP+TN)/(P+N), where P = TP+FN and N = TN+FP
• Sensitivity or true positive rate (TPR)
– = TP/(TP+FN) = TP/P
• Specificity or true negative rate (TNR)
– = TN/(FP+TN) = TN/N
• Positive predictive value (PPV), or precision
– = TP/(TP+FP)
• Recall (the same quantity as sensitivity)
– = TP/(TP+FN)
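
As a quick check of the formulas above, here is a minimal Python sketch that computes each measure from the four confusion-matrix counts. The function name `classification_metrics` and the example counts are made up for illustration; they are not from the slides.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the accuracy measures from confusion-matrix counts."""
    p = tp + fn  # all truly positive samples
    n = tn + fp  # all truly negative samples
    sensitivity = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "sensitivity": sensitivity,      # true positive rate
        "specificity": tn / n,           # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "recall": sensitivity,           # same quantity as sensitivity
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts P = 50 and N = 50, so accuracy is 85/100 and sensitivity is 40/50.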
ROC Curve
Choosing the threshold
• Where should we set the threshold?
• We could choose the equal error rate point, where the error rate on the
positive training set equals the error rate on the negative
training set
• Data classes may be very imbalanced (e.g. |D+| ≪ |D−|)
• In many applications we might be risk averse to false
positives or to false negatives
• We want to see all the options
• The receiver operating characteristic (ROC) curve is a standard
way to visualise this trade-off
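
To make "seeing all the options" concrete, the sketch below sweeps the threshold over every observed score and records one (false positive rate, true positive rate) pair per threshold. It assumes a sample is called "positive" when its score is at or above the threshold; the function name `roc_points` and the scores are hypothetical.

```python
def roc_points(scores, labels):
    """Return one (FPR, TPR) pair per candidate threshold.

    labels are 1 for positive and 0 for negative samples.
    A sample is predicted positive when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Start above every score so the curve begins at (0, 0),
    # then lower the threshold through each distinct score.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical test results: higher score means "more likely positive".
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each printed row is one possible operating point; choosing a threshold means choosing one of these rows.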
Threshold

(Figure sequence: distributions of the test result for patients without the disease and for patients with the disease, split by a threshold. Patients on one side of the threshold are called "negative", those on the other side "positive".)

Some definitions ...

• True positives: patients with the disease who are called "positive"
• False positives: patients without the disease who are called "positive"
• True negatives: patients without the disease who are called "negative"
• False negatives: patients with the disease who are called "negative"
ROC curve comparison

(Figure: two ROC plots, "A good test" and "A poor test", each showing True Positive Rate (0% to 100%) against False Positive Rate (0% to 100%).)
ROC curve extremes
(Figure: two ROC plots of True Positive Rate against False Positive Rate, both axes 0% to 100%.)

• Best test: the distributions don't overlap at all
• Worst test: the distributions overlap completely
AUC for ROC curves
(Figure: four ROC plots of True Positive Rate against False Positive Rate, both axes 0% to 100%, with AUC = 100%, AUC = 50%, AUC = 90% and AUC = 65%.)
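
One way to compute the AUC without drawing the curve is sketched below. It relies on a standard equivalence that the slides do not state: the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as half. The function name `auc` and the scores are illustrative assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive sample outscores a negative one;
    tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Distributions that do not overlap at all: every positive beats every negative.
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
# Distributions that overlap completely: all scores identical.
print(auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))
```

The two calls correspond to the best and worst tests in the figures: no overlap gives an AUC of 100%, complete overlap gives 50%.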
Data Normalization
• Between 0 and 1
x' = (x - min(x)) / (max(x) - min(x))

• Between -1 and 1
x' = ((x - min(x)) / (max(x) - min(x))) * 2 - 1
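
Both formulas are the same min-max scaling with a different target range. A minimal sketch, assuming the feature is a plain list of numbers (the function name `min_max_scale` and its `low`/`high` parameters are illustrative):

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly so min(values) -> low and max(values) -> high."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # The formula would divide by zero for a constant feature.
        raise ValueError("cannot scale a feature whose values are all equal")
    span = largest - smallest
    return [low + (v - smallest) / span * (high - low) for v in values]


heights = [130, 160, 190]
print(min_max_scale(heights))             # range 0 to 1
print(min_max_scale(heights, -1.0, 1.0))  # range -1 to 1
```

With `low=-1` and `high=1` the expression reduces to ((x-min)/(max-min))*2-1, matching the second formula.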
Classification Example
• Why recognising rugby players is (almost) the same problem as handwriting recognition

Can we LEARN to recognise a rugby player?

What are the “features” of a rugby player?

Rugby players = short + heavy?
Ballet dancers = tall + skinny?

(Figures: scatter plots of Height (130cm to 190cm) against Weight (60kg to 90kg).)

Rugby players “cluster” separately in the space.
K Nearest Neighbors
Nearest Neighbour Rule

Consider a two class problem where each sample consists of two measurements (x,y).

k=1: For a given query point q, assign the class of the nearest neighbour.

k=3: Compute the k nearest neighbours and assign the class by majority vote.

The K-Nearest Neighbour Algorithm

Who’s this? (Figure: an unlabelled query point in the Height vs. Weight plot.)

1. Measure distance to all points
2. Find closest “k” points (here k=3, but it could be more)
3. Assign majority class
“Euclidean distance”

$$d = \sqrt{(w - w_1)^2 + (h - h_1)^2}$$

(Figure: the distance d between the points (w, h) and (w1, h1) in the Height vs. Weight plot.)
The K-Nearest Neighbour Algorithm

for each testing point
    measure distance to every training point
    find the k closest points
    identify the most common class among those k
    predict that class
end
• Advantage: Surprisingly good classifier!
• Disadvantage: Have to store the entire training
set in memory
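
The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `knn_predict` and the (weight, height) data points are invented, and a tied vote is resolved in favour of the class whose member is closest to the query.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    # 1. Measure distance to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # 2. Find the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # 3. Assign the majority class (ties go to the class seen first,
    #    i.e. the one with the closest member).
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (weight in kg, height in cm) samples.
points = [(88, 150), (90, 145), (85, 155), (62, 180), (60, 185), (65, 178)]
labels = ["rugby", "rugby", "rugby", "ballet", "ballet", "ballet"]

print(knn_predict(points, labels, (84, 152), k=3))
print(knn_predict(points, labels, (63, 182), k=3))
```

Note that the whole training set is kept and scanned for every query, which is the memory disadvantage listed above.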
Euclidean distance still works in 3-d, 4-d, 5-d, etc.

$$d = \sqrt{(x - x_1)^2 + (y - y_1)^2 + (z - z_1)^2}$$

x = Height
y = Weight
z = Shoe size
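
The same formula for any number of features, as a short sketch. The function name `euclidean` and the sample values for height, weight and shoe size are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points with the same number of features."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# 2-d: the classic 3-4-5 triangle.
print(euclidean((0, 0), (3, 4)))
# 3-d: (height, weight, shoe size) for two hypothetical people.
print(euclidean((180, 75, 43), (170, 70, 41)))
```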
Choosing the wrong features makes classification difficult;
choosing too many makes it computationally intensive.

Possible features:
- Shoe size
- Height
- Age
- Weight

(Figure: Shoe size plotted against Age.)
Nearest Neighbor Classifier

If the nearest instance to the previously unseen instance is a Katydid
    class is Katydid
else
    class is Grasshopper

(Figure: Katydids and Grasshoppers plotted by Abdomen Length (1 to 10) against Antenna Length (1 to 10).)
The nearest neighbor algorithm is sensitive to outliers…

The solution is to…

We can generalize the nearest neighbor algorithm to
the K-nearest neighbor (KNN) algorithm.
We measure the distance to the nearest K instances, and let
them vote. K is typically chosen to be an odd number, which avoids tied votes in a two-class problem.

(Figure panels: K=1 and K=3.)
K-Nearest Neighbour Model
• Picking K

– Use N fold cross validation: pick K to minimize the cross validation error

– For each of the N training examples
  • Find its K nearest neighbours among the other N-1 examples
  • Make a classification based on these K neighbours
  • Calculate the classification error

– Output the average error over all examples

– Use the K that gives the lowest average error over the N training examples

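
With N folds over N training examples, each fold holds out a single example (leave-one-out). The sketch below follows that procedure on a made-up 2-d data set that contains one deliberately mislabelled point, so the effect of K on outlier sensitivity is visible. All names (`knn_predict`, `loocv_error`, `pick_k`) and the data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points nearest to `query`."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = [label for _, label in ranked[:k]]
    return Counter(votes).most_common(1)[0][0]


def loocv_error(points, labels, k):
    """Average error when each example is classified by the other N-1 examples."""
    errors = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, points[i], k) != labels[i]:
            errors += 1
    return errors / len(points)


def pick_k(points, labels, candidates):
    """Return the candidate K with the lowest leave-one-out error (first one on ties)."""
    return min(candidates, key=lambda k: loocv_error(points, labels, k))


# Two tight clusters plus one outlier labelled "B" inside the "A" cluster.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7), (1.5, 1.5)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

for k in (1, 3):
    print(k, loocv_error(points, labels, k))
print("best K:", pick_k(points, labels, [1, 3]))
```

On this data K=1 lets the single outlier mislead every "A" point next to it, while K=3 outvotes it, so the larger K gives the lower cross validation error.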
Condensing
• Aim is to reduce the number of training samples
• For example, retain only the samples that are needed to
define the decision boundary

*Note: Not part of this lecture

K-Nearest Neighbour Model
• Example: Classify whether a customer will respond to a survey question
using a 3-Nearest Neighbor classifier

Customer   Age   Income   No. credit cards   Response
John       35    35K      3                  No
Rachel     22    50K      2                  Yes
Hannah     63    200K     1                  No
Tom        59    170K     1                  No
Nellie     25    40K      4                  Yes
David      37    50K      2                  ?

K-Nearest Neighbour Model
• Example: 3-Nearest Neighbors

Customer   Age   Income   No. credit cards   Response   Distance to David
John       35    35K      3                  No         15.16
Rachel     22    50K      2                  Yes        15
Hannah     63    200K     1                  No         152.23
Tom        59    170K     1                  No         122
Nellie     25    40K      4                  Yes        15.74
David      37    50K      2                  Yes

Three nearest ones to David are: No, Yes, Yes

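
The distances in this example can be reproduced with a few lines of Python. The sketch assumes the raw, unnormalised features are used, with income expressed in thousands (35K becomes 35); that assumption matches the distances shown on the slide up to the last decimal place. The variable names are illustrative.

```python
import math
from collections import Counter

# (age, income in thousands, number of credit cards) -> response
customers = {
    "John": ((35, 35, 3), "No"),
    "Rachel": ((22, 50, 2), "Yes"),
    "Hannah": ((63, 200, 1), "No"),
    "Tom": ((59, 170, 1), "No"),
    "Nellie": ((25, 40, 4), "Yes"),
}
david = (37, 50, 2)

# Distance from David to every labelled customer, nearest first.
ranked = sorted(
    (math.dist(features, david), name, response)
    for name, (features, response) in customers.items()
)
for distance, name, response in ranked:
    print(f"{name:<7} {response:<4} {distance:7.2f}")

# Majority vote among the three nearest customers.
nearest = ranked[:3]
prediction = Counter(response for _, _, response in nearest).most_common(1)[0][0]
print("David:", prediction)
```

Because income spans a much wider numeric range than age or number of cards, it contributes most of each distance here; rescaling the features first, as in the Data Normalization slides, would change how much each feature counts.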
K-Nearest Neighbour Model
• Example: For the example we saw earlier, pick the best K from the set {1, 2,
3} to build a K-NN classifier

Customer   Age   Income   No. credit cards   Response
John       35    35K      3                  No
Rachel     22    50K      2                  Yes
Hannah     63    200K     1                  No
Tom        59    170K     1                  No
Nellie     25    40K      4                  Yes
David      37    50K      2                  ?

Acknowledgements
Material in these slides has been taken from the following resources:
• Introduction to Machine Learning, Alpaydin
• Digital Image Processing, Gonzalez, 3rd Edition
• Pattern Classification, Duda et al., John Wiley & Sons
• Some material adapted from Dr Ali Hassan’s slides
