Prediction Based on Similarity and Validation
To do this we simply modify the algorithm to return the majority target level within
the set of the k nearest neighbors to the query q:
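Written out (following the notation in Kelleher et al., 2015), the prediction is

M_k(q) = \arg\max_{l \in levels(t)} \sum_{i=1}^{k} \delta(t_i, l)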
where M_k(q) is the prediction of the model for the query q given the model parameter k; levels(t) is the set of levels in the domain of the target feature, and l is an element of this set; i iterates over the instances d_i in order of increasing distance from the query q; t_i is the value of the target feature for instance d_i; and δ(t_i, l) is the Kronecker delta function, which takes two parameters and returns 1 if they are equal and 0 otherwise.
In a refinement of this scheme, the votes of neighbors that are farther away from the query get less weight. The easiest way to implement this weighting is to weight each neighbor's vote by the reciprocal of the squared distance between that neighbor d and the query q:
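Written out, the distance-weighted prediction is

M_k(q) = \arg\max_{l \in levels(t)} \sum_{i=1}^{k} \frac{1}{dist(q, d_i)^2} \delta(t_i, l)

where dist(q, d_i) is the distance between the query and training instance d_i. A minimal Python sketch of both voting schemes (function and variable names are illustrative, not taken from the source):

import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, training_data, k=3, weighted=False):
    # training_data is a list of (feature_vector, target_level) pairs.
    # Take the k training instances closest to the query.
    neighbors = sorted(training_data, key=lambda inst: euclidean(query, inst[0]))[:k]
    votes = Counter()
    for features, target in neighbors:
        if weighted:
            d = euclidean(query, features)
            # Weight by the reciprocal of the squared distance;
            # the small epsilon guards against a zero distance.
            votes[target] += 1.0 / (d ** 2 + 1e-9)
        else:
            votes[target] += 1.0
    # Return the target level with the largest (weighted) vote total.
    return votes.most_common(1)[0][0]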
PERFORMANCE MEASURES FOR CATEGORICAL TARGETS: CONFUSION MATRIX
The confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. For a binary problem it is a table with 4 different combinations of predicted and actual values.
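Laid out for a binary yes/no target, the four combinations are:

             Predicted Yes          Predicted No
Actual Yes   True Positive (TP)     False Negative (FN)
Actual No    False Positive (FP)    True Negative (TN)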
It is difficult to compare a model with low precision and high recall against one with high precision and low recall. To make them comparable we use the F-score (F-measure), which captures recall and precision in a single value. It uses the harmonic mean in place of the arithmetic mean, which punishes extreme values more heavily. Accuracy, in turn, is calculated with the formula:
Accuracy = (TP + TN) / Total × 100%
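For reference, the precision, recall, and F-measure used above are defined in the same terms:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2 × Precision × Recall / (Precision + Recall)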
EXAMPLE: 90% TRAINING AND 10% TESTING FROM THE ATHLETE DATA
The two test instances are instance 19 (SPEED = 7.5, AGILITY = 8, target = yes) and instance 20 (SPEED = 7.25, AGILITY = 5.75, target = yes). For every training instance the table lists the feature differences and the Euclidean distance to each test instance.

ID | SPEED | AGILITY | Target | X1=SPEED-7.5 | X2=AGILITY-8 | Distance to 19 | X1=SPEED-7.25 | X2=AGILITY-5.75 | Distance to 20
 1 |  2.5  |  6      | no     | -5           | -2           | 5.39           | -4.75         |  0.25           | 4.76
 2 |  3.75 |  8      | no     | -3.75        |  0           | 3.75           | -3.5          |  2.25           | 4.16
 3 |  2.25 |  5.5    | no     | -5.25        | -2.5         | 5.81           | -5            | -0.25           | 5.01
 4 |  3.25 |  8.25   | no     | -4.25        |  0.25        | 4.26           | -4            |  2.5            | 4.72
 5 |  2.75 |  7.5    | no     | -4.75        | -0.5         | 4.78           | -4.5          |  1.75           | 4.83
 6 |  4.5  |  5      | no     | -3           | -3           | 4.24           | -2.75         | -0.75           | 2.85
 7 |  3.5  |  5.25   | no     | -4           | -2.75        | 4.85           | -3.75         | -0.5            | 3.78
 8 |  3    |  3.25   | no     | -4.5         | -4.75        | 6.54           | -4.25         | -2.5            | 4.93
 9 |  4    |  4      | no     | -3.5         | -4           | 5.32           | -3.25         | -1.75           | 3.69
10 |  4.25 |  3.75   | no     | -3.25        | -4.25        | 5.35           | -3            | -2              | 3.61
11 |  2    |  2      | no     | -5.5         | -6           | 8.14           | -5.25         | -3.75           | 6.45
12 |  5    |  2.5    | no     | -2.5         | -5.5         | 6.04           | -2.25         | -3.25           | 3.95
13 |  8.25 |  8.5    | no     |  0.75        |  0.5         | 0.90           |  1            |  2.75           | 2.93
14 |  5.75 |  8.75   | yes    | -1.75        |  0.75        | 1.90           | -1.5          |  3              | 3.35
15 |  4.75 |  6.25   | yes    | -2.75        | -1.75        | 3.26           | -2.5          |  0.5            | 2.55
16 |  5.5  |  6.75   | yes    | -2           | -1.25        | 2.36           | -1.75         |  1              | 2.02
17 |  5.25 |  9.5    | yes    | -2.25        |  1.5         | 2.70           | -2            |  3.75           | 4.25
18 |  7    |  4.25   | yes    | -0.5         | -3.75        | 3.78           | -0.25         | -1.5            | 1.52
19 |  7.5  |  8      | yes    | (test instance)
20 |  7.25 |  5.75   | yes    | (test instance)
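The 90/10 split and the 3-NN predictions above can be reproduced with a short Python sketch (variable and function names are illustrative):

import math
from collections import Counter

# Athlete data from the table above: (ID, SPEED, AGILITY, target).
athletes = [
    (1, 2.5, 6.0, "no"),    (2, 3.75, 8.0, "no"),    (3, 2.25, 5.5, "no"),
    (4, 3.25, 8.25, "no"),  (5, 2.75, 7.5, "no"),    (6, 4.5, 5.0, "no"),
    (7, 3.5, 5.25, "no"),   (8, 3.0, 3.25, "no"),    (9, 4.0, 4.0, "no"),
    (10, 4.25, 3.75, "no"), (11, 2.0, 2.0, "no"),    (12, 5.0, 2.5, "no"),
    (13, 8.25, 8.5, "no"),  (14, 5.75, 8.75, "yes"), (15, 4.75, 6.25, "yes"),
    (16, 5.5, 6.75, "yes"), (17, 5.25, 9.5, "yes"),  (18, 7.0, 4.25, "yes"),
    (19, 7.5, 8.0, "yes"),  (20, 7.25, 5.75, "yes"),
]

train, test = athletes[:18], athletes[18:]  # 90% training, 10% testing

def predict_3nn(query):
    # Majority vote over the 3 training instances closest to the query.
    dists = sorted((math.hypot(query[1] - s, query[2] - a), t) for _, s, a, t in train)
    return Counter(t for _, t in dists[:3]).most_common(1)[0][0]

for inst in test:
    print(inst[0], "actual:", inst[3], "predicted:", predict_3nn(inst))
# Both test instances are predicted "yes", matching their actual labels.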
3-NN FOR THE 10% TESTING DATA
Both test instances (19 and 20) are actually "yes" and both are predicted "yes" by 3-NN, which gives the following confusion matrix:

             Predicted Yes   Predicted No
Actual Yes   2               0
Actual No    0               0
Recall = 2/2 = 1
Precision = 2/2 = 1
F-measure = (2 × 1 × 1) / (1 + 1) = 2/2 = 1
Accuracy = 2/2 × 100% = 100%
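The same measures can be checked programmatically from the actual and predicted labels of the test set; a minimal sketch (the two lists simply encode the results above):

actual    = ["yes", "yes"]   # targets of test instances 19 and 20
predicted = ["yes", "yes"]   # 3-NN predictions for instances 19 and 20

tp = sum(a == "yes" and p == "yes" for a, p in zip(actual, predicted))  # true positives
tn = sum(a == "no"  and p == "no"  for a, p in zip(actual, predicted))  # true negatives
fp = sum(a == "no"  and p == "yes" for a, p in zip(actual, predicted))  # false positives
fn = sum(a == "yes" and p == "no"  for a, p in zip(actual, predicted))  # false negatives

precision = tp / (tp + fp)                                 # 2/2 = 1.0
recall    = tp / (tp + fn)                                 # 2/2 = 1.0
f_measure = 2 * precision * recall / (precision + recall)  # 1.0
accuracy  = (tp + tn) / len(actual) * 100                  # 100.0 (%)
print(precision, recall, f_measure, accuracy)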
EXERCISES
1. Determine the predictions for the athlete data using a 70:30 training:testing split.
2. Construct the confusion matrix and compute the F-measure and the accuracy.
SOURCES
Narkhede, Sarang. 2018. Understanding Confusion Matrix. https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
Kelleher, J.D., Mac Namee, B., D'Arcy, A. 2015. Fundamentals of Machine Learning for Predictive Data Analytics. MIT Press, Cambridge, MA.