Lecture # 2
Data Normalization, KNN & Minimum Distance
Generalization
– While classes can be specified by training samples with known labels, the goal of a recognition system is to recognize novel inputs.
– When a recognition system is overfitted to the training samples, it may perform badly on typical unseen inputs.
Overfitting
PERFORMANCE MEASUREMENTS
ROC Analysis

                Prediction
                0       1
Truth    0      TN      FP
         1      FN      TP
Accuracy Measures
• Accuracy = (TP + TN) / (P + N)
• Sensitivity or true positive rate (TPR) = TP / (TP + FN) = TP / P
• Specificity or true negative rate (TNR) = TN / (FP + TN) = TN / N
• Positive predictive value (PPV), also called precision = TP / (TP + FP)
• Recall = TP / (TP + FN), i.e. the same quantity as sensitivity
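As a quick sanity check, here is a minimal Python sketch that computes these measures from the four confusion-matrix counts; the function name and the example counts are invented for illustration.

    def confusion_metrics(tp, tn, fp, fn):
        """Compute the accuracy measures above from confusion-matrix counts."""
        p = tp + fn   # all actual positives
        n = tn + fp   # all actual negatives
        return {
            "accuracy":    (tp + tn) / (p + n),
            "sensitivity": tp / p,          # TPR, same as recall
            "specificity": tn / n,          # TNR
            "precision":   tp / (tp + fp),  # PPV
        }

    # Hypothetical counts, for illustration only.
    print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))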
ROC Curve
Choosing the threshold
• Where should we set the threshold?
• We could choose the equal error rate point, where the error rate on the positive training set equals the error rate on the negative training set
• Data classes may be very imbalanced (e.g. |D+| ≪ |D−|)
• In many applications we may be risk averse to false positives or to false negatives
• We want to see all the options
• The receiver operating characteristic (ROC) curve is the standard way to do this; a sketch of how it is traced follows
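A minimal sketch of how an ROC curve is traced: sweep the decision threshold over the classifier's scores and record the (FPR, TPR) pair at each setting. The scores and labels below are made up for the demo.

    import numpy as np

    def roc_points(scores, labels):
        """Sweep a threshold over the scores; return (FPR, TPR) pairs."""
        points = []
        for t in sorted(set(scores), reverse=True):
            pred = scores >= t                  # call "positive" above the threshold
            tp = np.sum(pred & (labels == 1))
            fp = np.sum(pred & (labels == 0))
            tpr = tp / np.sum(labels == 1)
            fpr = fp / np.sum(labels == 0)
            points.append((fpr, tpr))
        return points

    # Hypothetical test scores and true labels, for illustration only.
    scores = np.array([0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2])
    labels = np.array([1,   1,   0,   1,    0,   1,   0,   0  ])
    for fpr, tpr in roc_points(scores, labels):
        print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")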
[Figure: distributions of test results for the two classes, split by a decision threshold.]
Some definitions ...
[Figure: three panels of overlapping test-result distributions for patients with and without the disease. Patients below the threshold are called "negative", those above it "positive"; the shaded regions mark the true positives, the true negatives, and the false negatives respectively.]
ROC curve comparison
[Figure: ROC curves plotted as true positive rate vs. false positive rate. A perfect test hugs the top-left corner (AUC = 100%), a good test bows toward it (AUC = 90%), and a worthless test lies on the diagonal (AUC = 50%).]
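The area under the curve (AUC) can be estimated from the ROC points with the trapezoidal rule. A minimal sketch, reusing the roc_points helper from the earlier example:

    import numpy as np

    def auc(points):
        """Trapezoidal-rule area under a list of (FPR, TPR) points."""
        pts = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]  # span (0,0) to (1,1)
        fpr = [p[0] for p in pts]
        tpr = [p[1] for p in pts]
        return np.trapz(tpr, fpr)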
Data Normalization
• To the range 0 to 1:
  (x − min(x)) / (max(x) − min(x))
• To the range −1 to 1:
  ((x − min(x)) / (max(x) − min(x))) × 2 − 1
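A minimal numpy sketch of both rescalings; the sample heights are made-up values.

    import numpy as np

    def minmax_01(x):
        """Rescale x linearly to the range [0, 1]."""
        return (x - x.min()) / (x.max() - x.min())

    def minmax_11(x):
        """Rescale x linearly to the range [-1, 1]."""
        return minmax_01(x) * 2 - 1

    heights = np.array([130.0, 155.0, 170.0, 190.0])  # cm, made-up values
    print(minmax_01(heights))  # [0.    0.4167  0.6667  1.]
    print(minmax_11(heights))  # [-1.  -0.1667  0.3333  1.]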
Classification Example
• Why recognising rugby players is (almost) easy
[Figure: scatter of people in the weight–height plane, weight 60–90 kg, height 130–190 cm.]
Ballet dancers = tall + skinny?
Rugby players "cluster" separately in the height–weight space.
K-Nearest Neighbours
Nearest Neighbour Rule
The K-Nearest Neighbour Algorithm
Who's this?
1. Measure the distance to all points
2. Find the closest "k" points (here k = 3, but it could be more)
3. Assign the majority class
[Figure: height–weight scatter; the query point receives the class held by the majority of its three nearest neighbours.]
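A minimal sketch of these three steps in Python; the sample data, labels, and function name are invented for illustration.

    import numpy as np
    from collections import Counter

    def knn_classify(query, points, labels, k=3):
        """1. measure distances, 2. take the k closest, 3. majority vote."""
        dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
        nearest = np.argsort(dists)[:k]                 # indices of the k closest
        votes = Counter(labels[i] for i in nearest)
        return votes.most_common(1)[0][0]

    # Made-up (weight kg, height cm) training points.
    points = np.array([[62, 168], [65, 172], [60, 180], [88, 185], [92, 178]])
    labels = ["ballet", "ballet", "ballet", "rugby", "rugby"]
    print(knn_classify(np.array([63, 175]), points, labels, k=3))  # -> "ballet"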
"Euclidean distance"
d = sqrt((w − w1)² + (h − h1)²)
[Figure: distance d between the query point (w, h) and a stored point (w1, h1) in the weight–height plane.]
The K-Nearest Neighbour Algorithm
d = sqrt((x − x1)² + (y − y1)² + (z − z1)²)
where x = height, y = weight, z = shoe size
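The same formula extends to any number of features. A tiny worked check in numpy, with made-up feature values:

    import numpy as np

    a = np.array([170.0, 65.0, 42.0])  # (height cm, weight kg, shoe size)
    b = np.array([185.0, 88.0, 45.0])
    d = np.sqrt(np.sum((a - b) ** 2))  # same as np.linalg.norm(a - b)
    print(d)                           # sqrt(15² + 23² + 3²) ≈ 27.62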
Choosing the wrong features makes classification difficult; choosing too many makes it computationally intensive.
Possible features:
- Shoe size
- Height
- Age
- Weight
[Figure: shoe size vs. age scatter in which the classes do not separate, leaving the query point "?" ambiguous.]
Nearest Neighbor Classifier
If the nearest instance to the previously unseen instance is a Katydid
    then the class is Katydid
else
    the class is Grasshopper
[Figure: antenna length vs. abdomen length (both on a 1–10 scale), with Katydids and Grasshoppers forming separate clusters.]
The nearest neighbor algorithm is sensitive to outliers…
[Figure: the same query point classified differently with K = 1 and with K = 3.]
K-Nearest Neighbour Model
• Picking K
– Use N-fold cross validation
– Pick the K that minimizes the cross-validation error
– Use the K that gives the lowest average error over the N validation folds (a sketch follows)
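A minimal sketch of picking K by N-fold cross validation, reusing the knn_classify helper defined earlier. Here labels is assumed to be a numpy array, and X, y in the usage comment are placeholders for your own data.

    import numpy as np

    def cv_error(points, labels, k, folds=5):
        """Average error rate of k-NN over N validation folds."""
        idx = np.arange(len(points))
        errors = []
        for f in range(folds):
            val = idx[f::folds]                # every folds-th point held out
            train = np.setdiff1d(idx, val)
            wrong = sum(
                knn_classify(points[i], points[train], labels[train], k) != labels[i]
                for i in val)
            errors.append(wrong / len(val))
        return np.mean(errors)

    # Pick the K with the lowest average cross-validation error.
    # best_k = min([1, 3, 5], key=lambda k: cv_error(X, y, k))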
Condensing
• The aim is to reduce the number of training samples
• For example, retain only the samples that are needed to define the decision boundary
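One classic way to do this is Hart's condensed nearest neighbour procedure: keep a small store and add a training sample only when the current store misclassifies it. A minimal sketch, reusing knn_classify with k = 1 and assuming labels is a numpy array:

    import numpy as np

    def condense(points, labels):
        """Hart's condensed NN: retain only samples the current store gets wrong."""
        keep = [0]                             # seed the store with one sample
        changed = True
        while changed:
            changed = False
            for i in range(len(points)):
                if i in keep:
                    continue
                pred = knn_classify(points[i], points[keep], labels[keep], k=1)
                if pred != labels[i]:          # store misclassifies it: add it
                    keep.append(i)
                    changed = True
        return keep                            # indices of the retained samples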
K-Nearest Neighbour Model
• Example: 3-Nearest Neighbours
Which response should we predict for David? Using Euclidean distance over (Age, Income in K, No. credit cards):

Customer  Age  Income  No. credit cards  Response  Distance to David
John      35   35K     3                 No        sqrt((35−37)² + (35−50)² + (3−2)²) ≈ 15.16
Rachel    22   50K     2                 Yes       sqrt((22−37)² + (50−50)² + (2−2)²) = 15
Hannah    63   200K    1                 No        sqrt((63−37)² + (200−50)² + (1−2)²) ≈ 152.24
Tom       59   170K    1                 No        sqrt((59−37)² + (170−50)² + (1−2)²) ≈ 122.00
David     37   50K     2                 ?

The three nearest neighbours are Rachel (15), John (15.16) and Tom (122.00); the majority response is No, so David is predicted as No.
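The same example in code, reusing knn_classify from the earlier sketch (income expressed in K, as in the table):

    import numpy as np

    # (Age, Income in K, No. credit cards) for John, Rachel, Hannah, Tom.
    points = np.array([[35, 35, 3], [22, 50, 2], [63, 200, 1], [59, 170, 1]])
    labels = np.array(["No", "Yes", "No", "No"])
    david = np.array([37, 50, 2])

    print(np.linalg.norm(points - david, axis=1))  # ≈ [15.17, 15.0, 152.24, 122.0]
    print(knn_classify(david, points, labels, k=3))  # -> "No"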
K-Nearest Neighbour Model
• Example: for the example we saw earlier, pick the best K from the set {1, 2, 3} to build a K-NN classifier.
Acknowledgements
Material in these slides has been taken from the following resources:
• Introduction to Machine Learning, Alpaydin
• Digital Image Processing, Gonzalez, 3rd Edition
• Pattern Classification, Duda et al., John Wiley & Sons
• Some material adapted from Dr Ali Hassan's slides