Lecture 5: Evaluation of Classifiers
LEARNING CLASSIFIERS
Machine Learning
Outline: Evaluation Parameters
Precision
Recall
Accuracy
F-Measure
True Positive Rate
False Positive Rate
Sensitivity
ROC
Experiment: Training and Testing
Objective: an unbiased estimate of accuracy.
The simplest way to split the data is the train-test split method.
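A minimal sketch of a train-test split, assuming scikit-learn is available and using toy data in place of a real dataset (the feature matrix X and labels y below are made up for illustration):

```python
# Sketch: hold out part of the data as an independent test set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 100 instances, 4 features (toy data)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels

# Hold out 30% of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```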
Learning Curve
How does the accuracy of a learning method change as a function of the training-set size? This can be assessed by plotting learning curves.
Given a training/test-set partition:
• for each sample size s on the learning curve
  • (optionally) repeat n times
    • randomly select s instances from the training set
    • learn the model
    • evaluate the model on the test set to determine accuracy a
  • plot (s, a), or s vs. average accuracy with error bars
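A minimal Python sketch of this loop (assuming scikit-learn; the classifier, the subsample sizes, and the toy data are arbitrary illustration choices):

```python
# Sketch of a learning curve: test accuracy vs. training-set size,
# averaged over several random subsamples of the training set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

sizes = [10, 20, 40, 80, len(X_train)]
for s in sizes:
    accs = []
    for _ in range(5):                                         # (optionally) repeat n times
        idx = rng.choice(len(X_train), size=s, replace=False)  # randomly select s instances
        model = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        accs.append(accuracy_score(y_test, model.predict(X_test)))
    print(s, np.mean(accs), np.std(accs))                      # plot (s, avg. accuracy, error bars)
```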
Validation (Tuning) Set
Consider we want unbiased estimates of accuracy
during the learning process (e.g. to choose the best level
of decision-tree pruning)? holding out a
portion or subset
of training data
that is held out.
This method is
called
the validation set
approach
7
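A minimal sketch of carving a validation set out of the training data (assuming scikit-learn; tree depth is used here as a stand-in for the pruning level, and the split ratios are arbitrary):

```python
# Sketch: hold out a validation (tuning) set from the training data,
# keeping the test set untouched for the final accuracy estimate.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

# Choose a tree depth using validation accuracy only.
best_depth, best_acc = None, -1.0
for depth in (1, 2, 3, 5, None):
    acc = accuracy_score(
        y_val, DecisionTreeClassifier(max_depth=depth).fit(X_fit, y_fit).predict(X_val))
    if acc > best_acc:
        best_depth, best_acc = depth, acc

final = DecisionTreeClassifier(max_depth=best_depth).fit(X_train, y_train)
print("Chosen depth:", best_depth,
      "Test accuracy:", accuracy_score(y_test, final.predict(X_test)))
```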
Random Sampling
The second issue can be addressed by repeatedly and randomly partitioning the available data into training and test sets.
Random Sampling…
When randomly selecting training or validation sets, we may want to ensure that class proportions are maintained in each selected set.
This can be done via stratified sampling: first stratify (divide) the instances by class, then randomly select instances from each class proportionally.
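A minimal sketch of a stratified split (assuming scikit-learn; `stratify=y` preserves the class proportions in both parts, and the imbalanced toy labels are made up for illustration):

```python
# Sketch: stratified train-test split that preserves class proportions.
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)      # imbalanced toy labels: 90% class 0, 10% class 1

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

print(Counter(y_tr), Counter(y_te))    # both parts keep roughly a 9:1 class ratio
```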
Cross Validation
The train-test split has limitations: if the dataset is small, the method is prone to high variance.
Due to the random partition, the results can be entirely different for different test sets: in some partitions, samples that are easy to classify end up in the test set, while in others the test set receives the 'difficult' ones.
To deal with this issue, we use cross-validation to evaluate the performance of a machine learning model.
Cross Validation…
K-Fold Cross Validation
In k-fold CV, we first divide our dataset into k equally sized subsets. Then we repeat the train-test procedure k times, such that each time one of the k subsets is used as the test set and the remaining k−1 subsets are used together as the training set.
Finally, we estimate the model's performance by averaging the scores over the k trials.
K-Fold Cross Validation
Example of 3-fold CV. Suppose we have a dataset S = {x1, x2, x3, x4, x5, x6}. First we divide the samples into 3 folds: S1 = {x1, x2}, S2 = {x3, x4}, S3 = {x5, x6}. Then we evaluate the model as
Overall Score = (Score1 + Score2 + Score3) / 3
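A minimal sketch of k-fold cross-validation (assuming scikit-learn; `cross_val_score` runs the fold loop and returns one score per fold, here with 3 folds as in the example above):

```python
# Sketch: k-fold cross-validation, averaging accuracy over the k trials.
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int)

cv = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=cv, scoring="accuracy")
print("Fold scores:", scores, "Overall score:", scores.mean())
```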
5-fold Cross Validation
Partition the data into n subsamples (S). Iteratively leave one subsample out for the test set and train on the rest.
5-fold Cross Validation…
Suppose we have 100 instances, and we want to
estimate accuracy with cross validation.
Type-I and Type-II Error
A Type-I error is a false positive (a negative instance predicted as positive); a Type-II error is a false negative (a positive instance predicted as negative).
Precision
Precision measures the correctness achieved in positive prediction, i.e. how many of all the instances predicted positive are actually positive. Precision should be high (ideally 1).
"Precision is a useful metric in cases where false positives are a higher concern than false negatives."
Precision / Positive Predictive Value: P = TP / (TP + FP)
Recall: R = TP / (TP + FN)
Issues with “Precision & Recall”
Confusion matrix (rows: predicted class, columns: actual class):
                         Actual positive   Actual negative
Predicted positive             TP                FP
Predicted negative             FN                TN
A combined measure: F
A combined measure that assesses the precision/recall tradeoff is the F measure (a weighted harmonic mean):
F = 1 / (α(1/P) + (1 − α)(1/R)) = (β² + 1)PR / (β²P + R)
With α = 1/2 (β = 1) this is the balanced F1 measure: F1 = 2PR / (P + R).
• high F1 score if both Precision and Recall are high
• low F1 score if both Precision and Recall are low
• medium F1 score if one of Precision and Recall is low and the other is high
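A minimal plain-Python sketch computing precision, recall, and the F measure from TP/FP/FN counts (the counts TP = 2, FP = 1, FN = 1 are taken from the threshold-0.5 example that follows):

```python
# Sketch: precision, recall, and F-measure from TP/FP/FN counts.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_beta(p, r, beta=1.0):
    # Weighted harmonic mean: F = (beta^2 + 1) * P * R / (beta^2 * P + R)
    return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)

tp, fp, fn = 2, 1, 1
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f_beta(p, r))   # 0.666..., 0.666..., 0.666...
```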
Accuracy Measure
The accuracy of a classifier: the fraction of its classifications that are correct.
Accuracy(%) = (TP + TN) / (TP + TN + FN + FP) × 100
Accuracy Measure
Example (threshold 0.5):
y labelled (0-Negative, 1-Positive)   ŷ predicted value   Output at threshold (0.5)
0                                     0.3                 0
1                                     0.4                 0
0                                     0.7                 1
1                                     0.8                 1
0                                     0.4                 0
1                                     0.7                 1
Confusion matrix: TP = 2, FP = 1, FN = 1, TN = 2
Accuracy = 4/6 = 0.666
Recall = TP / (TP + FN) = 2/3 = 0.666
Precision = TP / (TP + FP) = 2/3 = 0.666
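A minimal sketch reproducing these numbers (assuming scikit-learn; the labels and scores are taken from the table above):

```python
# Sketch: confusion matrix and metrics for the table above at threshold 0.5.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

y_true  = np.array([0, 1, 0, 1, 0, 1])
y_score = np.array([0.3, 0.4, 0.7, 0.8, 0.4, 0.7])
y_pred  = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                                  # 2 1 1 2
print(accuracy_score(y_true, y_pred),                  # 0.666...
      precision_score(y_true, y_pred),                 # 0.666...
      recall_score(y_true, y_pred))                    # 0.666...
```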
Issues with Accuracy
Consider a 2-class problem with a skewed class distribution, e.g. number of Class 0 examples = 9990: accuracy alone can be misleading.
Misclassification error = (FN + FP) / (TP + FP + TN + FN); for example, (5 + 10) / (50 + 10 + 5 + 100) = 0.09.
Error in percentage = (FN + FP) / (TP + FP + TN + FN) × 100
Sensitivity & Specificity
Sensitivity evaluates a model's ability to predict the true positives of each available category: Sensitivity = TP / (TP + FN).
Specificity evaluates a model's ability to predict the true negatives of each available category: Specificity = TN / (TN + FP).
Find Sensitivity and Specificity
Other Forms of Accuracy Metrics
ROC/AUC
A Receiver Operating Characteristic (ROC) curve plots the TP-rate vs. the FP-rate as a threshold on the confidence of an instance being positive is varied; the Area Under the Curve (AUC) summarizes the curve in a single number.
ROC curves & Misclassification costs
Create ROC of a model
Consider a prediction table at different threshold settings (the last four columns give the output at thresholds 0.5, 0.6, 0.72, and 0.8):
y labelled (0-Neg, 1-Pos)   ŷ predicted value   @0.5   @0.6   @0.72   @0.8
0                           0.3                 0      0      0       0
1                           0.55                1      0      0       0
0                           0.75                1      1      1       0
1                           0.8                 1      1      1       1
0                           0.4                 0      0      0       0
1                           0.7                 1      1      0       0
Create ROC of a model…
Threshold setting (0.6): apply the @0.6 column of the table above.
Create ROC of a model…
Threshold setting (0.72): apply the @0.72 column of the table above.
Create ROC of a model…
Threshold setting (0.80): apply the @0.8 column of the table above.
Plot of ROC
Threshold setting (0.5): TP = 3, FP = 1, TN = 2, FN = 0, so TPR = 3/(3+0) = 1 and FPR = 1/(1+2) = 0.33.
(ROC plot: TPR on the y-axis vs. FPR on the x-axis; each threshold contributes one point.)
Steps to create a ROC curve
Sort the test-set predictions according to the confidence that each instance is positive.
Step through the sorted list from high to low confidence:
• locate a threshold between instances with opposite classes (keeping instances with the same confidence value on the same side of the threshold)
• compute TPR and FPR for the instances above the threshold
• output the (FPR, TPR) coordinate
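A minimal plain-Python sketch of this procedure, using the six labelled scores from the prediction table above (`roc_points` is a hypothetical helper name; this simplified version emits a point after every instance rather than only between opposite classes or tied confidences, as the refined steps above require):

```python
# Sketch: compute (FPR, TPR) points by sweeping a threshold over sorted confidences.
def roc_points(y_true, y_score):
    pairs = sorted(zip(y_score, y_true), reverse=True)   # high to low confidence
    P = sum(y_true)                 # total positives
    N = len(y_true) - P            # total negatives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for score, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))   # (FPR, TPR) with threshold just below this score
    return points

y_true  = [0, 1, 0, 1, 0, 1]
y_score = [0.3, 0.55, 0.75, 0.8, 0.4, 0.7]
print(roc_points(y_true, y_score))
```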
Example of ROC Plot
Example of ROC Plot …
Rearrange the samples according to class:
Correct class   Instance   Confidence positive
+               Ex 9       0.99
+               Ex 7       0.98
+               Ex 2       0.70
+               Ex 6       0.65
+               Ex 5       0.24
-               Ex 1       0.72
-               Ex 10      0.51
-               Ex 3       0.39
-               Ex 4       0.11
-               Ex 8       0.01
(The first five instances form the positive class, the last five the negative class.)
Example of ROC Plot …
For threshold 0.72:
Correct class   Instance   Confidence positive   Predicted class
+               Ex 9       0.99                  +
+               Ex 7       0.98                  +
+               Ex 2       0.70                  -
+               Ex 6       0.65                  -
+               Ex 5       0.24                  -
-               Ex 1       0.72                  +
-               Ex 10      0.51                  -
-               Ex 3       0.39                  -
-               Ex 4       0.11                  -
-               Ex 8       0.01                  -
Here TP = 2, FP = 1, FN = 3, TN = 4, so TPR = 2/5 = 0.4 and FPR = 1/5 = 0.2.
Significance of ROC…
Precision/recall curves
A precision/recall curve plots precision vs. recall (TP-rate) as a threshold on the confidence of an instance being positive is varied.
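A minimal sketch using scikit-learn's `precision_recall_curve` (reusing the six labelled scores from the ROC example; a sketch, not the lecture's own code):

```python
# Sketch: precision/recall pairs as the decision threshold is varied.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true  = np.array([0, 1, 0, 1, 0, 1])
y_score = np.array([0.3, 0.55, 0.75, 0.8, 0.4, 0.7])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r in zip(precision, recall):
    print(round(p, 2), round(r, 2))   # one (precision, recall) point per threshold
```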
Comment on ROC/PR Curves
Both:
• allow predictive performance to be assessed at various levels of confidence
• assume binary classification tasks
• are sometimes summarized by calculating the area under the curve
ROC curves:
• are insensitive to changes in class distribution (the ROC curve does not change if the proportion of positive and negative instances in the test set is varied)
• can identify optimal classification thresholds for tasks with differential misclassification costs
Precision/recall curves:
• show the fraction of predictions that are false positives
• are well suited for tasks with lots of negative instances
Loss Function
Mean Square Error Loss Function
It is used for regression problems. The mean square error loss for m data points is defined as
L_SE = (1/m) Σ_{i=1..m} (y_i − ŷ_i)²
For a single point, L_SE = (y − ŷ)².
Binary Cross Entropy Loss Function
It is used for classification problems. The BCE loss is defined as
L_CE = −(1/m) Σ_{i=1..m} [ y_i ln(ŷ_i) + (1 − y_i) ln(1 − ŷ_i) ]
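A minimal NumPy sketch of both losses under the definitions above (the epsilon clip is an added numerical-safety detail, not part of the slide's formula; the sample values are arbitrary):

```python
# Sketch: mean square error and binary cross-entropy losses.
import numpy as np

def mse_loss(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def bce_loss(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)      # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([0.0, 1.0])
y_hat = np.array([0.9, 0.8])
print(mse_loss(y, y_hat), bce_loss(y, y_hat))
```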
Example
Consider a 2-class problem. If the ground truth is y = 0, then L_SE = ŷ² and L_CE = −ln(1 − ŷ); if y = 1, then L_SE = (1 − ŷ)² and L_CE = −ln(ŷ).
Consider for example y = 0 and ŷ = 0.9: L_SE = 0.81 and L_CE = −ln(0.1) ≈ 2.3.
Gradients: ∂L_SE/∂ŷ = 2ŷ = 1.8 and ∂L_CE/∂ŷ = 1/(1 − ŷ) = 10.0, so the cross-entropy loss gives a much stronger gradient for this confident wrong prediction.
Example: given TP = 5, FP = 3, and FN = 7:
P = TP / (TP + FP) = 5 / (5 + 3) = 5/8
R = TP / (TP + FN) = 5 / (5 + 7) = 5/12
Practice question
Q2. A database contains 80 records on a particular topic, of which 55 are relevant to a certain investigation. A search was conducted on that topic and 50 records were retrieved. Of the 50 records retrieved, 40 were relevant. Construct the confusion matrix for the search and calculate the precision and recall scores for the search. Each record may be assigned a class label "relevant" or "not relevant".
Solution: All 80 records were tested for relevance. The test classified 50 records as "relevant", but only 40 of them were actually relevant.
                         Actual relevant   Actual not relevant
Predicted relevant            40                 10
Predicted not relevant        15                 15
Practice question
TP = 40
FP = 10
FN = 15
TN = 15
Practice question
Using the data in the confusion matrix of a classifier on a two-class dataset, several measures of performance can be calculated as well.
Precision = TP / (TP + FP) = 40/50 = 0.8
Recall = Sensitivity = TP / (TP + FN) = 40/55 ≈ 0.73
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 55/80
Error rate = 1 − Accuracy = 1 − 55/80 = 25/80
Specificity = TN / (TN + FP) = 15/25 = 0.6
F-measure = (2 × TP) / (2 × TP + FP + FN) = 80/105 ≈ 0.76
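A small plain-Python sketch checking these values (the counts come from the confusion matrix above, with TN = 15 implied by the 80-record total):

```python
# Sketch: metrics for the 80-record retrieval example (TP=40, FP=10, FN=15, TN=15).
tp, fp, fn, tn = 40, 10, 15, 15

precision   = tp / (tp + fp)                       # 0.80
recall      = tp / (tp + fn)                       # ~0.73 (= sensitivity)
accuracy    = (tp + tn) / (tp + tn + fp + fn)      # 55/80 = 0.6875
specificity = tn / (tn + fp)                       # 15/25 = 0.60
f_measure   = 2 * tp / (2 * tp + fp + fn)          # 80/105 ≈ 0.76
print(precision, recall, accuracy, specificity, f_measure)
```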
Practice question
Q3. Let there be 10 balls (6 white and 4 red) in a box, and let it be required to pick out the red balls. Suppose we pick up 7 balls as red balls, of which only 2 are actually red. What are the values of precision and recall in picking red balls?
Solution:
TP = 2
FP = 7 − 2 = 5
FN = 4 − 2 = 2
The precision is P = TP / (TP + FP) = 2 / (2 + 5) = 2/7
The recall is R = TP / (TP + FN) = 2 / (2 + 2) = 1/2