Unit 5: Classification
Topics Covered – Unit 5
What is Classification
General Approach to Classification
Issues in Classification
Classification Algorithms
Statistical Based
• Bayesian Classification
Distance Based
• KNN
Decision Tree Based
• ID3
Neural Network Based
Rule Based
What is Classification
[Figure: attribute set (x) → Classification Model → class label (y)]
Classification is the task of mapping an input attribute set x into its class label y.
Examples: speech recognition, pattern recognition.
Classification Ex: Grading
x
If x >= 90 then grade =A.
<90 >=90
If 80<=x<90 then grade =B.
If 70<=x<80 then grade =C. x A
If 60<=x<70 then grade =D. <80 >=80
If x<50 then grade =F. x B
<70 >=70
Classify the following marks x C
78 , 56 , 99
<50 >=60
F D
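A minimal Python sketch of these grading rules as a classifier (the function name `grade` is illustrative, not from the slide):

```python
def grade(x):
    """Map a numeric mark to a letter grade using the rules above."""
    if x >= 90:
        return "A"
    elif x >= 80:
        return "B"
    elif x >= 70:
        return "C"
    elif x >= 60:
        return "D"
    else:
        return "F"

# Classify the marks from the exercise: 78 -> C, 56 -> F, 99 -> A
print([grade(x) for x in (78, 56, 99)])  # ['C', 'F', 'A']
```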
General Approach to Classification
[Figure: Training Data → Classification Algorithm → Classifier; Testing Data / Unseen Data → Classifier → predicted class label]
Training data:
NAME | RANK | YEARS | TENURED
Tom | Assistant Prof | 2 | no
Merlisa | Associate Prof | 7 | no
George | Professor | 5 | yes
Joseph | Assistant Prof | 7 | yes
Unseen data: (Jeff, Professor, 4) → Tenured?
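A hedged sketch of this train-then-predict workflow using scikit-learn; the integer encoding of RANK is an illustrative choice, not part of the slide:

```python
from sklearn.tree import DecisionTreeClassifier

# Training data from the slide: (RANK, YEARS) -> TENURED
# RANK encoded as: Assistant Prof=0, Associate Prof=1, Professor=2 (assumption)
X_train = [[0, 2], [1, 7], [2, 5], [0, 7]]
y_train = ["no", "no", "yes", "yes"]

model = DecisionTreeClassifier()   # the "classification algorithm"
model.fit(X_train, y_train)        # learning step produces the classifier

# Unseen data: (Jeff, Professor, 4) -> Tenured?
print(model.predict([[2, 4]]))
```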
Defining Classes
Issues in Classification
Missing Data
• Ignore the missing value
• Replace it with an assumed value
Measuring Performance
• Classification accuracy on test data
• Confusion matrix: provides the information needed to determine how well a classification model performs
Confusion Matrix
                 | Predicted Class=1 | Predicted Class=0
Actual Class=1   | f11               | f10
Actual Class=0   | f01               | f00
• Each entry fij in this table denotes the number of records from class i predicted to be of class j.
• For instance, f01 is the number of records from class 0 incorrectly predicted as class 1.
• The total number of correct predictions: (f11 + f00)
• The total number of incorrect predictions: (f01 + f10)
Classification Performance
Precision = TP / (TP + FP); Recall = TP / (TP + FN)
F Score = 2 × (Precision × Recall) / (Precision + Recall)
High recall, low precision: most of the positive examples are correctly recognized (low FN), but there are many false positives (high FP).
Low recall, high precision: we miss many positive examples (high FN), but those we predict as positive are indeed positive (low FP).
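A small pure-Python sketch deriving these metrics from the four confusion-matrix counts; the counts themselves are made-up illustration values:

```python
# Counts from a 2x2 confusion matrix: fij = records of class i predicted as class j
f11, f10, f01, f00 = 40, 10, 5, 45   # illustrative values, not from the slides

tp, fn, fp, tn = f11, f10, f01, f00  # treating class 1 as the positive class

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f_score   = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f_score)
```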
When to use Accuracy / Precision / Recall / F1-Score?
Accuracy is used when the True Positives and True Negatives are more important; it is a better metric for balanced data.
Precision matters when the cost of false positives is high; Recall matters when the cost of false negatives is high; F1-Score balances the two and is better suited to imbalanced data.
Bayes Theorem
P(c|x) = P(x|c) · P(c) / P(x)
• P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood: the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.
Data tuple (A): a 35-year-old customer with an income of $40,000.
Hypothesis (B): the customer will buy a computer.
Likelihood: P(A|B) is the probability of observing the data (a 35-year-old with an income of $40,000) given that the hypothesis (the customer will buy a computer) is true.
Prior Probability: P(A) is the prior probability of a customer being a 35-year-old with an income of $40,000.
Posterior Probability: P(B|A) is the probability of the customer buying a computer given their age and income.
For all entries in the dataset, the denominator does not change; it remains constant. Therefore, the denominator can be removed and a proportionality introduced: P(B|A) ∝ P(A|B) · P(B).
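A quick numeric sketch of Bayes' theorem for the computer-purchase example; all three probabilities are hypothetical values chosen only to make the arithmetic concrete:

```python
# Hypothetical probabilities (assumptions, not from the slides)
p_B         = 0.30  # P(B): prior that a customer buys a computer
p_A_given_B = 0.10  # P(A|B): likelihood of (age 35, income $40K) among buyers
p_A         = 0.05  # P(A): prior probability of (age 35, income $40K)

# Posterior: P(B|A) = P(A|B) * P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)  # 0.6
```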
Advantages of Naïve Bayes
It is easy to use.
Unlike other classification approaches, only one scan of the training data is required.
The naïve Bayes approach can easily handle missing values by simply omitting that probability when calculating the likelihoods of membership in each class.
In cases where there are simple relationships, the technique often does yield good results.
Disadvantages of Naïve Bayes
It assumes that the attributes are conditionally independent given the class, which rarely holds for real data.
If an attribute value never co-occurs with a class in the training data, its estimated probability is zero and it zeroes out the entire product (commonly handled with Laplace smoothing).
Distance-Based Classification
Classes can be represented by:
Centroid: the central (mean) value of the class.
Medoid: a representative point from the class.
K Nearest Neighbor (KNN)
KNN captures the idea of similarity (sometimes called distance, proximity, or closeness) with mathematics: calculating the distance between points on a graph.
K in KNN refers to the number of nearest neighbors.
Properties of KNN:
Lazy learning algorithm: it has no specialized training phase and uses all of the training data at classification time.
Non-parametric learning algorithm: it does not assume anything about the underlying data.
To classify a new point, KNN finds the K nearest training points and takes a majority vote of their class labels; for regression, the average of these data points is the final prediction for the new point.
[Figure: a new point X classified by the labels of its nearest neighbors]
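A minimal from-scratch KNN sketch using Euclidean distance and majority voting; the toy data points are invented for illustration:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train: list of (point, label) pairs; distance is Euclidean
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Toy 2-D data: two clusters labeled "A" and "B"
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_predict(train, (2, 2), k=3))  # 'A'
```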
Decision Tree based Algorithms
Training data:
ID | Home Owner | Marital Status | Annual Income | Defaulted Borrower
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes
Decision Tree 1 (splitting attributes: Home Owner, then MarSt, then Income):
Home Owner = Yes → NO
Home Owner = No → MarSt
  MarSt = Married → NO
  MarSt = Single, Divorced → Income
    Income < 80K → NO
    Income > 80K → YES
Decision Tree 2 (splitting on MarSt first):
MarSt = Married → NO
MarSt = Single, Divorced → Home Owner
  Home Owner = Yes → NO
  Home Owner = No → Income
    Income < 80K → NO
    Income > 80K → YES
There could be more than one tree that fits the same data!
Apply Model to Test Data
Test Data: Home Owner = No, Marital Status = Married, Annual Income = 80K, Defaulted Borrower = ?
Start from the root of the tree and follow the branches that match the record:
Home Owner = No → take the No branch to MarSt.
MarSt = Married → take the Married branch, which ends in the leaf NO.
The model therefore predicts Defaulted Borrower = No for this record.
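The same traversal expressed as nested conditionals in Python: a direct transcription of Decision Tree 1, not a general tree implementation (the >= at the income boundary is an assumption, since the slide only labels the branches < 80K and > 80K):

```python
def predict_default(home_owner, marital_status, annual_income):
    """Walk Decision Tree 1 from the root to a leaf."""
    if home_owner == "Yes":
        return "No"                   # Home Owner = Yes -> leaf NO
    if marital_status == "Married":
        return "No"                   # MarSt = Married -> leaf NO
    # Single or Divorced: split on income (boundary handling assumed)
    return "Yes" if annual_income >= 80_000 else "No"

# Test record from the slide: Home Owner=No, Married, Income=80K
print(predict_default("No", "Married", 80_000))  # 'No'
```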
Parts of a Decision Tree
Decision Tree
Given:
D = {t1, …, tn} where ti = <ti1, …, tih>
Attributes {A1, A2, …, Ah}
Classes C = {C1, …, Cm}
A Decision or Classification Tree is a tree associated with D such that:
Each internal node is labeled with an attribute, Ai
Each arc is labeled with a predicate that can be applied to the attribute at its parent
Each leaf node is labeled with a class, Cj
Decision Tree based Algorithms
Pruning
Once a tree is constructed, some modifications might be needed to improve its performance during the classification phase.
The pruning phase might remove redundant comparisons or remove subtrees to achieve better performance.
Comparing Decision Trees
ID3
ID3 stands for Iterative Dichotomiser 3.
It creates the tree using information-theory concepts and tries to reduce the expected number of comparisons.
ID3 chooses the split attribute with the highest information gain:
Information gain = (entropy of the distribution before the split) − (entropy of the distribution after it)
Entropy
Entropy is used to measure the amount of uncertainty, surprise, or randomness in a set of data.
When all data belongs to a single class, entropy is zero, as there is no uncertainty.
An equally divided sample has an entropy of 1.
The mathematical formula for entropy is:
Entropy(S) = − Σ_{i=1}^{c} p_i log2(p_i), where p_i is the proportion of records in S belonging to class i.
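A small sketch computing entropy and the information gain of a candidate split; the class counts are illustrative, not from any slide dataset:

```python
import math

def entropy(labels):
    """Shannon entropy (log base 2) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

parent = ["yes"] * 5 + ["no"] * 5                  # equally divided -> entropy 1.0
left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4

# Information gain = entropy before the split - weighted entropy after it
after = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
gain = entropy(parent) - after
print(entropy(parent), gain)
```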
Advantages of ID3
Another Example:
If 90 <= grade, then class = A
If 80 <= grade and grade < 90, then class = B
If 70 <= grade and grade < 80, then class = C
If 60 <= grade and grade < 70, then class = D
If grade < 60, then class = F
Decision Tree → Rules
[Figure: a decision tree and the equivalent set of classification rules]
Generating Rules Example (contd.)
[Figure: the optimized tree and the corresponding optimized set of rules]
Generating Rules Example
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds
Characteristics of Rule Sets
Exhaustive rules: a classifier has exhaustive coverage if it accounts for every possible combination of attribute values, so each record is covered by at least one rule.
Strategy: apply the rule set to an unseen record, e.g.:
Name Blood Type Give Birth Can Fly Live in Water Class
turtle cold no no sometimes ?
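A hedged sketch of a rule-based classifier over this table. The rule set below is the one from the classic textbook example this slide appears to draw from (Tan, Steinbach & Kumar), which may not be the exact set given in the lecture; note that the turtle record triggers two rules, showing the rules are not mutually exclusive:

```python
# R1: (Give Birth = no)  and (Can Fly = yes)       -> birds
# R2: (Give Birth = no)  and (Live in Water = yes) -> fishes
# R3: (Give Birth = yes) and (Blood Type = warm)   -> mammals
# R4: (Give Birth = no)  and (Can Fly = no)        -> reptiles
# R5: (Live in Water = sometimes)                  -> amphibians
rules = [
    (lambda r: r["give_birth"] == "no" and r["can_fly"] == "yes", "birds"),
    (lambda r: r["give_birth"] == "no" and r["live_in_water"] == "yes", "fishes"),
    (lambda r: r["give_birth"] == "yes" and r["blood_type"] == "warm", "mammals"),
    (lambda r: r["give_birth"] == "no" and r["can_fly"] == "no", "reptiles"),
    (lambda r: r["live_in_water"] == "sometimes", "amphibians"),
]

def matching_rules(record):
    """Return the class of every rule whose condition the record satisfies."""
    return [label for cond, label in rules if cond(record)]

turtle = {"blood_type": "cold", "give_birth": "no",
          "can_fly": "no", "live_in_water": "sometimes"}
print(matching_rules(turtle))  # ['reptiles', 'amphibians'] -- two rules fire
```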