Nearest Neighbour
Inductive Learning
• Induction
– Given a training set of examples of the form (𝑥, ƒ(𝑥))
• 𝑥 is the input, ƒ(𝑥) is the output
– Return a function ℎ that approximates ƒ
• ℎ is called the hypothesis
Supervised Learning
• Two types of problems
1. Classification
2. Regression
Classification Example
• Problem: Will you enjoy an outdoor sport based on the
weather?
• Training set:

  Sky     Humidity   Wind     Water   Forecast   EnjoySport
  Sunny   Normal     Strong   Warm    Same       yes
  Sunny   High       Strong   Warm    Same       yes
  Sunny   High       Strong   Warm    Change     no
  Sunny   High       Strong   Cool    Change     yes

  (The first five attributes form the input 𝑥; EnjoySport is the output ƒ(𝑥).)
• Possible hypotheses:
– ℎ1: Sky = sunny → enjoySport = yes
– ℎ2: Water = cool or Forecast = same → enjoySport = yes
Regression Example
• Find function ℎ that fits ƒ at instances 𝑥
More Examples
  Problem                  Domain   Range   Classification / Regression
  Spam Detection
  Stock price prediction
  Speech recognition
  Digit recognition
  Housing valuation
  Weather prediction
Hypothesis Space
• Hypothesis space 𝐻
– Set of all hypotheses ℎ that the learner may consider
– Learning is a search through hypothesis space
Generalization
• A good hypothesis will generalize well
– i.e., predict unseen examples correctly
• Usually …
– Any hypothesis ℎ found to approximate the target function ƒ well over a
sufficiently large set of training examples will also approximate the
target function well over any unobserved examples
Inductive Learning
• Goal: find an ℎ that agrees with ƒ on training set
– ℎ is consistent if it agrees with ƒ on all examples
Inductive Learning
• A learning problem is realizable if the hypothesis space contains the
true function; otherwise it is unrealizable.
– Difficult to determine whether a learning problem is
realizable since the true function is not known
• It is possible to use a very large hypothesis
space
– For example: H = class of all Turing machines
• But there is a tradeoff between expressiveness of a
hypothesis class and the complexity of finding a
good hypothesis
Nearest Neighbour Classification
• Classification function
  ℎ(𝑥) = 𝑦𝑥*
where 𝑦𝑥* is the label associated with the nearest neighbour
  𝑥* = argmin𝑥' 𝑑(𝑥, 𝑥')
• Distance measure (order 𝑝, over 𝑀 dimensions):
  𝑑(𝑥, 𝑥') = (∑j=1..𝑀 |𝑥j − 𝑥'j|^𝑝)^(1/𝑝)
• Weighted dimensions:
  𝑑(𝑥, 𝑥') = (∑j=1..𝑀 cj |𝑥j − 𝑥'j|^𝑝)^(1/𝑝)
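The classification rule above can be sketched in a few lines of Python. This is a toy illustration, not the slides' code; the function name and the sample data are made up for the example.

```python
def nearest_neighbour(x, data, p=2):
    """Return the label of the training point nearest to x.

    data: list of (point, label) pairs.
    Uses the distance of order p from the slide: (sum_j |x_j - x'_j|^p)^(1/p).
    """
    def dist(a, b):
        return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)

    # x* = argmin_{x'} d(x, x'); h(x) is the label of x*
    x_star, y_star = min(data, key=lambda pair: dist(x, pair[0]))
    return y_star

# Hypothetical 2-D training set, for illustration only
train = [((0.0, 0.0), "red"), ((1.0, 1.0), "red"), ((5.0, 5.0), "blue")]
print(nearest_neighbour((0.5, 0.5), train))  # prints "red"
```

Setting `p=1` gives Manhattan distance and `p=2` Euclidean; per-dimension weights `c_j` could be folded into `dist` in the same way.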
Voronoi Diagram
• Partition implied by the nearest neighbour function ℎ
– Assuming Euclidean distance
K-Nearest Neighbour
• Nearest neighbour is often unstable (sensitive to noise)
• Idea: assign the most frequent label among the 𝑘 nearest neighbours
– Let knn(𝑥) be the 𝑘 nearest neighbours of 𝑥 according to distance 𝑑
– Label: 𝑦𝑥 ← mode({𝑦𝑥' : 𝑥' ∈ knn(𝑥)})
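A minimal sketch of the k-nearest-neighbour rule, using Euclidean distance and a majority vote (the function name and sample data are illustrative, not from the slides):

```python
import math
from collections import Counter

def knn_classify(x, data, k=3):
    """Label x by the most frequent label among its k nearest neighbours.

    data: list of (point, label) pairs; distance d is Euclidean here.
    """
    # knn(x): the k training points closest to x
    neighbours = sorted(data, key=lambda pair: math.dist(x, pair[0]))[:k]
    # y_x <- mode of the neighbours' labels
    labels = [y for _, y in neighbours]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training set, for illustration only
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((5, 6), "b")]
print(knn_classify((0.2, 0.2), train, k=3))  # prints "a"
```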
Effect of 𝐾
• 𝐾 controls the degree of smoothing.
• Which partition do you prefer? Why?
Performance of a learning algorithm
• A learning algorithm is good if it produces a
hypothesis that does a good job of predicting
classifications of unseen examples
• Verify performance with a test set
1. Collect a large set of examples
2. Divide into 2 disjoint sets: training set and test set
3. Learn hypothesis ℎ with training set
4. Measure percentage of examples correctly classified by ℎ in the test set
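The four steps above can be sketched as follows. The `evaluate` helper and the `learn` callback are hypothetical names introduced for this example; they are not from the slides.

```python
import random

def evaluate(learn, examples, test_fraction=0.3, seed=0):
    """Steps 1-4: split examples into disjoint train/test sets,
    learn a hypothesis on the training set, report test accuracy.

    learn: function mapping a training set to a hypothesis h(x).
    examples: list of (x, y) pairs.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)                  # step 1-2: random disjoint split
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    h = learn(train)                       # step 3: learn h on training set only
    correct = sum(h(x) == y for x, y in test)
    return correct / len(test)             # step 4: fraction correct on test set
```

The key point the split enforces is that `h` never sees the test examples during learning.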
The effect of K
• Best 𝐾 depends on
– Problem
– Amount of training data
[Plot: test accuracy (% correct) as a function of K]
Underfitting
• Definition: underfitting occurs when an algorithm finds a
hypothesis ℎ with training accuracy that is lower than the
future accuracy of some other hypothesis ℎ’
• Amount of underfitting of ℎ:
  max{0, maxℎ' futureAccuracy(ℎ') − trainAccuracy(ℎ)}
  ≈ max{0, maxℎ' testAccuracy(ℎ') − trainAccuracy(ℎ)}
• Common cause:
– Classifier is not expressive enough
Overfitting
• Definition: overfitting occurs when an algorithm finds a
hypothesis ℎ with higher training accuracy than its future
accuracy.
• Amount of overfitting of ℎ:
  max{0, trainAccuracy(ℎ) − futureAccuracy(ℎ)}
  ≈ max{0, trainAccuracy(ℎ) − testAccuracy(ℎ)}
• Common causes:
– Classifier is too expressive
– Noisy data
– Lack of data
Choosing K
• How should we choose K?
– Ideally: select K with highest future accuracy
– Alternative: select K with highest test accuracy
Choosing K based on Validation Set
Let 𝑘 be the number of neighbours
For 𝑘 = 1 to max # of neighbours
    ℎ𝑘 ← train(𝑘, trainingData)
    accuracy𝑘 ← test(ℎ𝑘, validationData)
𝑘* ← argmax𝑘 accuracy𝑘
ℎ ← train(𝑘*, trainingData ∪ validationData)
accuracy ← test(ℎ, testData)
Return 𝑘*, ℎ, accuracy
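A direct Python sketch of this validation-set procedure. The `train` and `test` callbacks stand in for the slide's train/test routines; their exact signatures are assumptions for this example.

```python
def choose_k(train_data, val_data, max_k, train, test):
    """Pick K by validation accuracy, then retrain on train + validation.

    train(k, data) -> hypothesis; test(h, data) -> accuracy (assumed interfaces).
    Returns (k*, h) rather than also re-measuring test accuracy.
    """
    accuracies = {}
    for k in range(1, max_k + 1):
        h_k = train(k, train_data)            # fit with k neighbours
        accuracies[k] = test(h_k, val_data)   # score on held-out validation set
    k_star = max(accuracies, key=accuracies.get)
    # Retrain on trainingData ∪ validationData with the chosen K
    h = train(k_star, train_data + val_data)
    return k_star, h
```

Note that the final test-set accuracy should be measured once, after `k_star` is fixed, so the test set never influences the choice of K.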
Robust validation
Cross-Validation
• Repeatedly split training data in two parts, one for training
and one for validation. Report the average validation accuracy.
• 𝑘-fold cross validation: split training data in 𝑘 equal size
subsets. Run 𝑘 experiments, each time validating on one
subset and training on the remaining subsets. Compute
the average validation accuracy of the 𝑘 experiments.
Selecting the Number of Neighbours
by Cross-Validation
Let 𝑘 be the number of neighbours
Let 𝑘′ be the number of trainingData splits
For 𝑘 = 1 to max # of neighbours
    For i = 1 to 𝑘′ do (where i indexes trainingData splits)
        ℎ𝑘i ← train(𝑘, trainingData 1..i−1, i+1..𝑘′)
        accuracy𝑘i ← test(ℎ𝑘i, trainingData i)
    accuracy𝑘 ← average(accuracy𝑘i ∀i)
𝑘* ← argmax𝑘 accuracy𝑘
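The cross-validation loop above can be sketched in Python. As before, `train` and `test` are assumed callbacks; `splits` is the list of 𝑘′ disjoint subsets of the training data.

```python
def cross_validate_k(splits, max_k, train, test):
    """Choose K by k'-fold cross-validation.

    splits: list of k' disjoint subsets of the training data.
    train(k, data) -> hypothesis; test(h, data) -> accuracy (assumed interfaces).
    """
    avg_accuracy = {}
    for k in range(1, max_k + 1):
        scores = []
        for i, held_out in enumerate(splits):
            # Train on all splits except split i
            rest = [x for j, s in enumerate(splits) if j != i for x in s]
            h = train(k, rest)
            # Validate on split i
            scores.append(test(h, held_out))
        avg_accuracy[k] = sum(scores) / len(scores)
    # k* = argmax_k of the average validation accuracy
    return max(avg_accuracy, key=avg_accuracy.get)
```

Each example serves as validation data exactly once, which makes the accuracy estimate far less sensitive to one lucky or unlucky split.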
Weighted K-Nearest Neighbour
• Each of the K nearest neighbours votes for its label with weight w(𝑥, 𝑥'):
  𝑦𝑥 ← argmax𝑦 ∑𝑥' ∈ knn(𝑥) ∧ 𝑦𝑥' = 𝑦 w(𝑥, 𝑥')
where knn(𝑥) is the set of K nearest neighbours of 𝑥
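A sketch of the weighted vote. The slide leaves the weight function w unspecified; here w(𝑥, 𝑥') = 1/(distance + ε) is one common choice, assumed for this example.

```python
import math
from collections import defaultdict

def weighted_knn(x, data, k=3):
    """Weighted K-NN vote: each of the k nearest neighbours votes for its
    label with weight w(x, x') = 1 / (d(x, x') + eps), so closer neighbours
    count more. Returns the label with the largest total weight."""
    eps = 1e-9  # avoids division by zero when x coincides with a neighbour
    neighbours = sorted(data, key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = defaultdict(float)
    for x_prime, y in neighbours:
        votes[y] += 1.0 / (math.dist(x, x_prime) + eps)
    return max(votes, key=votes.get)
```

With distance-based weights, a single far-away neighbour in the top k has little influence, which softens the effect of the exact choice of K.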
K-Nearest Neighbour Regression
• We can also use KNN for regression
• Let 𝑦𝑥 be a real value instead of a categorical label
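For regression, one simple rule (assumed here; the slide does not fix the aggregation) is to predict the mean of the k nearest neighbours' real-valued outputs:

```python
import math

def knn_regress(x, data, k=3):
    """Predict a real value for x as the average y of its k nearest
    neighbours, using Euclidean distance.

    data: list of (point, real_value) pairs.
    """
    neighbours = sorted(data, key=lambda pair: math.dist(x, pair[0]))[:k]
    return sum(y for _, y in neighbours) / k
```

A distance-weighted average, as in weighted K-NN classification, is a natural refinement.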