K-Nearest Neighbours Classifier / Regressor (k-NN)
• K-Nearest Neighbors is one of the most basic yet essential
classification algorithms in Machine Learning.
• It belongs to the supervised learning domain and is applied in
pattern recognition, data mining, and intrusion detection.
• It is a non-parametric classification/regression method, meaning it
makes no underlying assumptions about the distribution of the
data (as opposed to other algorithms such as GMM, which assume a
Gaussian distribution of the given data).
• We are given some prior data (also called training data), which
classifies data points into groups identified by an attribute (the class label).
• In k-NN classification, the output is a class membership. An
object is classified by a plurality vote of its neighbors, with the
object being assigned to the class most common among
its k nearest neighbors (k is a positive integer, typically small).
If k = 1, then the object is simply assigned to the class of that
single nearest neighbor.
• In k-NN regression, the output is the property value for the
object, computed as the average of the values of its k nearest
neighbors.
• k-NN is a type of instance-based learning, or lazy learning, where the
function is only approximated locally and all computation is deferred
until classification.
• Both for classification and regression, a useful technique is to
assign weights to the contributions of the neighbors, so that
nearer neighbors contribute more to the average than more
distant ones; a sketch of both variants follows below.
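As a minimal sketch of the ideas above (plain NumPy; Euclidean distance; the names knn_classify and knn_regress are illustrative, not from any library):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Classify x by plurality vote among its k nearest neighbours."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to all training points
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

def knn_regress(X_train, y_train, x, k=3, weighted=True):
    """Predict x's value as the (optionally distance-weighted) mean of k neighbours."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    if weighted:
        w = 1.0 / (dists[nearest] + 1e-9)         # nearer neighbours contribute more
        return np.average(y_train[nearest], weights=w)
    return y_train[nearest].mean()
```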
k-NN Classification Algorithm
1. Choose the number of neighbours k.
2. For i = 0 to m: calculate the cosine similarity cos θ = (a · b) / (||a|| ||b||), where a is the
i-th training tuple and b is the target (query) tuple.
3. Select the k training tuples most similar to the target tuple.
4. Assign the target tuple the class most common among these k neighbours (plurality vote).
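A sketch of the similarity step in Python, assuming the tuples are NumPy vectors (knn_cosine is an illustrative name):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| ||b||); higher means more similar."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def knn_cosine(X_train, y_train, b, k=3):
    """k-NN using cosine similarity: vote among the k MOST similar tuples."""
    sims = np.array([cosine_similarity(a, b) for a in X_train])
    nearest = np.argsort(sims)[-k:]                 # indices of the k largest similarities
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                # plurality vote
```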
Naïve Bayes Classification
• Given training data X, the posterior probability of a hypothesis H,
P(H|X), follows Bayes' theorem.
• Let D be a training set of tuples and their associated class labels, where
each tuple is represented by an n-D attribute vector X = (x1, x2, …, xn).
• Suppose there are m classes C1, C2, …, Cm.
• Classification derives the maximum a posteriori class, i.e., the class Ci
with maximal P(Ci|X).
• This can be computed from Bayes' theorem:
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
• If an attribute xk is continuous-valued, P(xk|Ci) is usually estimated with
a Gaussian density with per-class mean μCi and standard deviation σCi:
P(xk|Ci) = g(xk, μCi, σCi), where g(x, μ, σ) = (1/(√(2π)·σ)) · e^(−(x−μ)²/(2σ²))
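The two formulas above, as a small Python sketch (the helper names are illustrative):

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma), used for continuous attributes."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def posterior_numerator(prior, likelihoods):
    """P(X|Ci) * P(Ci): prior times the product of per-attribute likelihoods.
    P(X) is omitted because it is the same for every class."""
    num = prior
    for p in likelihoods:
        num *= p
    return num
```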
Example
Classify the tuple X = (age = “youth”, income = “medium”, student = “yes”, credit_rating = “fair”).
• Compute the priors P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357
• Compute P(X|Ci) for each class
P(age = “youth” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “youth” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class buys_computer = “yes”.
Exercise: classify the following tuples in the same way:
X = (age = “middle_aged”, income = “low”, student = “yes”, credit_rating = “fair”)
X = (age = “senior”, income = “high”, student = “no”, credit_rating = “fair”)
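A sketch of the worked example in Python, with the priors and conditionals hard-coded from the counts above:

```python
# Per-class priors and conditional probabilities from the example above.
priors = {"yes": 9/14, "no": 5/14}
cond = {
    "yes": {"age=youth": 2/9, "income=medium": 4/9,
            "student=yes": 6/9, "credit_rating=fair": 6/9},
    "no":  {"age=youth": 3/5, "income=medium": 2/5,
            "student=yes": 1/5, "credit_rating=fair": 2/5},
}

X = ["age=youth", "income=medium", "student=yes", "credit_rating=fair"]

scores = {}
for c in priors:
    score = priors[c]
    for attr in X:
        score *= cond[c][attr]       # naive conditional-independence product
    scores[c] = score                # yes: ~0.028, no: ~0.007

print(max(scores, key=scores.get))   # -> "yes"
```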
Avoiding the Zero-Probability Problem
• Naïve Bayesian prediction requires each conditional probability to be
non-zero; otherwise the overall predicted probability becomes zero:
P(X|Ci) = ∏(k=1…n) P(xk|Ci)
• Ex.: suppose a dataset with 1000 tuples: income = low (0 tuples),
income = medium (990), and income = high (10)
• Use Laplacian correction (or Laplacian estimator)
• Adding 1 to each count (and hence 3 to the total, one per attribute value):
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
• The “corrected” prob. estimates are close to their
“uncorrected” counterparts
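A sketch of the Laplacian correction in Python, using the counts from the example above (alpha = 1 reproduces the add-one correction):

```python
counts = {"low": 0, "medium": 990, "high": 10}

def laplace_probs(counts, alpha=1):
    """Add alpha to every count; the denominator grows by alpha per attribute value."""
    total = sum(counts.values()) + alpha * len(counts)
    return {v: (c + alpha) / total for v, c in counts.items()}

print(laplace_probs(counts))
# low: 1/1003, medium: 991/1003, high: 11/1003
```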
Gaussian Naïve Bayes Classification - Example
Gender   height (feet)   weight (lbs)   foot size (inches)
male     6.00            180            12
male     5.92 (5'11")    190            11
male     5.58 (5'7")     170            12
male     5.92 (5'11")    165            10
female   5.00            100            6
female   5.50 (5'6")     150            8
female   5.42 (5'5")     130            7
female   5.75 (5'9")     150            9

sample (to classify): height = 6, weight = 130, foot size = 8
We wish to determine which posterior is greater, male or female. For the
classification as male the posterior is given by
P(male | X) = P(male) · p(height | male) · p(weight | male) · p(foot size | male) / evidence
and analogously for female. The evidence P(X) is the same for both classes,
so it can be ignored when comparing them.
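A sketch of the whole calculation in Python, estimating per-class means and sample variances from the table above (the evidence term cancels, so only the numerators are compared):

```python
import math
import statistics as st

males   = [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)]
females = [(5.00, 100,  6), (5.50, 150,  8), (5.42, 130,  7), (5.75, 150,  9)]
sample  = (6.00, 130, 8)

def gaussian(x, mu, var):
    """Gaussian density parameterised by mean and variance."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def numerator(data, x, prior=0.5):
    """P(class) times the product of Gaussian likelihoods; P(X) cancels out."""
    score = prior
    for k in range(len(x)):
        col = [row[k] for row in data]
        score *= gaussian(x[k], st.mean(col), st.variance(col))  # sample variance (n-1)
    return score

print(numerator(males, sample))    # ~6.2e-9
print(numerator(females, sample))  # ~5.4e-4  -> the sample is classified as female
```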