Data Mining Classification: Alternative Techniques
Rule-Based Classifier
● Rule: (Condition) → y
– where
• Condition is a conjunction of attribute tests
• y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
• (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
• (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
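A rule of this form can be represented as a list of attribute tests plus a class label. The sketch below is one possible encoding, not a reference implementation: the `Rule` class, its `covers` method, and the two sample records are hypothetical names and data introduced for illustration, and "50K" is assumed to mean 50,000.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# One attribute test: (attribute name, predicate applied to the attribute value).
Test = Tuple[str, Callable[[object], bool]]


@dataclass
class Rule:
    tests: List[Test]  # antecedent (LHS): conjunction of attribute tests
    label: str         # consequent (RHS): predicted class label

    def covers(self, record: Dict[str, object]) -> bool:
        """True when every test in the antecedent holds for the record."""
        return all(attr in record and pred(record[attr])
                   for attr, pred in self.tests)


# The two example rules listed above.
birds = Rule([("Blood Type", lambda v: v == "Warm"),
              ("Lay Eggs", lambda v: v == "Yes")], "Birds")
no_evade = Rule([("Taxable Income", lambda v: v < 50_000),
                 ("Refund", lambda v: v == "Yes")], "Evade=No")

# Hypothetical records, invented only to exercise the rules.
record_a = {"Blood Type": "Warm", "Lay Eggs": "Yes"}
record_b = {"Taxable Income": 70_000, "Refund": "Yes"}

print(birds.covers(record_a))     # True: both tests hold
print(no_evade.covers(record_b))  # False: the income test fails
```

A rule covers a record only when all of its tests hold, which is what makes the antecedent a conjunction.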
Name          Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk          warm        no          yes      no             ?
grizzly bear  warm        yes         no       no             ?
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur          warm        yes         no       no             ?
turtle         cold        no          no       sometimes      ?
dogfish shark  cold        yes         no       yes            ?
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
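The three statements above can be checked mechanically. The rule set R1 to R5 is not reproduced in this text, so the sketch below assumes the rule definitions from the textbook's vertebrate example; those definitions and the helper name `triggered` are assumptions. The records are the three rows of the table above.

```python
# Assumed rule set: name -> (required attribute values, predicted class).
RULES = {
    "R1": ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    "R2": ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    "R3": ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    "R4": ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    "R5": ({"Live in Water": "sometimes"}, "Amphibians"),
}

# Records taken from the table above.
RECORDS = {
    "lemur": {"Blood Type": "warm", "Give Birth": "yes",
              "Can Fly": "no", "Live in Water": "no"},
    "turtle": {"Blood Type": "cold", "Give Birth": "no",
               "Can Fly": "no", "Live in Water": "sometimes"},
    "dogfish shark": {"Blood Type": "cold", "Give Birth": "yes",
                      "Can Fly": "no", "Live in Water": "yes"},
}


def triggered(record):
    """Names of all rules whose antecedent is satisfied by the record."""
    return [name for name, (cond, _) in RULES.items()
            if all(record.get(attr) == value for attr, value in cond.items())]


for animal, record in RECORDS.items():
    print(animal, triggered(record))
# lemur ['R3']
# turtle ['R4', 'R5']
# dogfish shark []
```

Under these assumed rules the output agrees with the statements above: one rule fires for the lemur, two conflicting rules fire for the turtle, and none fires for the dogfish shark.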
● Exhaustive rules
– Classifier has exhaustive coverage if it
accounts for every possible combination of
attribute values
– Each record is covered by at least one rule
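Exhaustive coverage can be tested by enumerating every combination of attribute values and looking for combinations that no rule covers. The sketch below is a minimal check under two assumptions: the attribute domains are taken to be only the values that appear in the tables of this section, and the rule set is the assumed textbook set R1 to R5 (it is not reproduced in this text). The function name `uncovered_combinations` is hypothetical.

```python
from itertools import product

# Assumed attribute domains (values seen in the tables of this section).
DOMAINS = {
    "Blood Type": ["warm", "cold"],
    "Give Birth": ["yes", "no"],
    "Can Fly": ["yes", "no"],
    "Live in Water": ["yes", "no", "sometimes"],
}

# Assumed rule set R1..R5: (required attribute values, predicted class).
RULES = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ({"Live in Water": "sometimes"}, "Amphibians"),
]


def uncovered_combinations(rules, domains):
    """Return every attribute-value combination that no rule covers."""
    names = list(domains)
    missing = []
    for values in product(*(domains[name] for name in names)):
        record = dict(zip(names, values))
        covered = any(all(record[a] == v for a, v in cond.items())
                      for cond, _ in rules)
        if not covered:
            missing.append(record)
    return missing


missing = uncovered_combinations(RULES, DOMAINS)
print(len(missing) == 0)  # False: this rule set is not exhaustive
print(len(missing))       # 4 combinations are left uncovered
```

A common remedy for a non-exhaustive rule set is a default rule with an empty antecedent, such as the `( ) → Amphibians` and `() → Mammals` rules that appear later in this section.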
02/14/2018 Introduction to Data Mining, 2nd Edition 7
Name    Blood Type  Give Birth  Can Fly  Live in Water  Class
turtle  cold        no          no       sometimes      ?
● Rule-based ordering
– Individual rules are ranked based on their quality
● Class-based ordering
– Rules that belong to the same class appear together
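The two ordering schemes can give different predictions for a record that triggers more than one rule, such as the turtle. The sketch below classifies by first match in an ordered rule list. The rule definitions are the assumed textbook set R1 to R5, and the quality scores and the class order are hypothetical numbers and choices made only for illustration.

```python
# (name, antecedent, class label, quality). Antecedents are the assumed R1..R5;
# the quality scores are hypothetical and exist only to define an order.
RULES = [
    ("R1", {"Give Birth": "no", "Can Fly": "yes"}, "Birds", 0.90),
    ("R2", {"Give Birth": "no", "Live in Water": "yes"}, "Fishes", 0.80),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"}, "Mammals", 0.95),
    ("R4", {"Give Birth": "no", "Can Fly": "no"}, "Reptiles", 0.70),
    ("R5", {"Live in Water": "sometimes"}, "Amphibians", 0.85),
]


def rule_based_order(rules):
    """Rank individual rules by quality, best first."""
    return sorted(rules, key=lambda r: r[3], reverse=True)


def class_based_order(rules, class_order):
    """Group rules by class; classes appear in the given order."""
    rank = {label: i for i, label in enumerate(class_order)}
    return sorted(rules, key=lambda r: rank[r[2]])


def classify(record, ordered_rules, default=None):
    """Return (rule name, class) of the first rule that covers the record."""
    for name, cond, label, _ in ordered_rules:
        if all(record.get(a) == v for a, v in cond.items()):
            return name, label
    return None, default


turtle = {"Blood Type": "cold", "Give Birth": "no",
          "Can Fly": "no", "Live in Water": "sometimes"}

by_quality = rule_based_order(RULES)
by_class = class_based_order(
    RULES, ["Birds", "Fishes", "Mammals", "Reptiles", "Amphibians"])

print(classify(turtle, by_quality))  # ('R5', 'Amphibians')
print(classify(turtle, by_class))    # ('R4', 'Reptiles')
```

With these made-up scores, rule-based ordering lets R5 fire first, while the chosen class order places the Reptiles rule ahead of the Amphibians rule, so the same record receives two different labels.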
● Direct Method:
• Extract rules directly from data
• Examples: RIPPER, CN2, Holte’s 1R
● Indirect Method:
• Extract rules from other classification models (e.g.
decision trees, neural networks, etc.).
• Examples: C4.5rules
[Figure: rules R1 and R2]
Rule Growing
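[Figure: two ways of growing a rule]
Starting from the empty rule { } (Yes: 3, No: 4) and adding one conjunct; class counts covered by each candidate:
Conjunct         Yes  No
Refund=No        3    4
Status=Single    2    1
Status=Divorced  1    0
Status=Married   0    3
...
Income>80K       3    1
Starting from specific rules and generalizing:
(Refund=No, Status=Single, Income=85K) → Class=Yes
(Refund=No, Status=Single, Income=90K) → Class=Yes
both generalize to
(Refund=No, Status=Single) → Class=Yes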
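The candidate conjuncts above can be ranked with FOIL's information gain, the growth criterion this section names for rule growing. The sketch below uses the usual definition, gain = p1 × (log2(p1/(p1+n1)) − log2(p0/(p0+n0))), where (p0, n0) are the positive and negative counts covered before adding the conjunct and (p1, n1) the counts after. The counts come from the rule-growing figure; treating a candidate that covers no positives as zero gain is a convention assumed here.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending a rule from (p0, n0) to (p1, n1)."""
    if p1 == 0:
        return 0.0  # assumed convention: no positives covered means no gain
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


# Coverage of the empty rule { } and of each candidate conjunct (Yes, No).
base = (3, 4)
candidates = {
    "Refund=No": (3, 4),
    "Status=Single": (2, 1),
    "Status=Divorced": (1, 0),
    "Status=Married": (0, 3),
    "Income>80K": (3, 1),
}

gains = {name: foil_gain(*base, p, n) for name, (p, n) in candidates.items()}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {gain:.3f}")

best = max(gains, key=gains.get)
print(best)  # Income>80K
```

Because the gain is weighted by p1, a conjunct that keeps all three positives while removing most negatives (Income>80K) scores higher than one that is perfectly pure but covers a single positive (Status=Divorced).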
● Growing a rule:
– Start from empty rule
– Add conjuncts as long as they improve FOIL’s
information gain
– Stop when rule no longer covers negative examples
– Prune the rule immediately using incremental reduced
error pruning
– Measure for pruning: v = (p-n)/(p+n)
• p: number of positive examples covered by the rule in
the validation set
• n: number of negative examples covered by the rule in
the validation set
– Pruning method: delete any final sequence of
conditions that maximizes v
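The pruning step above can be sketched as a search over prefixes of the rule: deleting a final sequence of conditions is the same as keeping the first k conditions, and the prefix with the largest v = (p-n)/(p+n) on the validation set is kept. The validation records below are hypothetical, the function names are illustrative, and preferring the longer rule on ties is a choice made here, not something the text specifies.

```python
def v_metric(p, n):
    """Pruning measure v = (p - n) / (p + n); empty coverage scores lowest."""
    return (p - n) / (p + n) if (p + n) > 0 else float("-inf")


def covers(conditions, record):
    """True when the record satisfies every (description, test) condition."""
    return all(test(record) for _, test in conditions)


def prune(conditions, validation, positive_class):
    """Keep the prefix of conditions that maximizes v on the validation set."""
    best_k, best_v = len(conditions), float("-inf")
    for k in range(len(conditions), 0, -1):  # k = number of conditions kept
        kept = conditions[:k]
        p = sum(1 for rec, y in validation
                if covers(kept, rec) and y == positive_class)
        n = sum(1 for rec, y in validation
                if covers(kept, rec) and y != positive_class)
        v = v_metric(p, n)
        if v > best_v:  # strict comparison: ties keep the longer rule
            best_k, best_v = k, v
    return conditions[:best_k], best_v


# Grown rule: (Refund=No, Status=Single, Income>80K) -> Yes
rule = [
    ("Refund=No", lambda r: r["Refund"] == "No"),
    ("Status=Single", lambda r: r["Status"] == "Single"),
    ("Income>80K", lambda r: r["Income"] > 80),
]

# Hypothetical validation set (income in thousands), invented for illustration.
validation = [
    ({"Refund": "No", "Status": "Single", "Income": 85}, "Yes"),
    ({"Refund": "No", "Status": "Single", "Income": 90}, "No"),
    ({"Refund": "No", "Status": "Single", "Income": 70}, "Yes"),
    ({"Refund": "No", "Status": "Single", "Income": 60}, "Yes"),
    ({"Refund": "No", "Status": "Married", "Income": 100}, "No"),
    ({"Refund": "Yes", "Status": "Single", "Income": 125}, "No"),
    ({"Refund": "No", "Status": "Married", "Income": 60}, "No"),
]

pruned, v = prune(rule, validation, "Yes")
print([name for name, _ in pruned], v)  # ['Refund=No', 'Status=Single'] 0.5
```

On this made-up validation set the full rule scores v = 0 (one positive, one negative), dropping the income condition raises v to 0.5, and dropping the status condition as well lowers it back to 0, so only the last condition is pruned.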
Indirect Methods
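[Figure: decision tree with root node P (branches No and Yes) and child nodes Q and R, shown next to the rule set derived from it]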
Decision tree:
Give Birth?
  Yes → Mammals
  No → Live In Water?
    Yes → Fishes
    Sometimes → Amphibians
    No → Can Fly?
      Yes → Birds
      No → Reptiles

C4.5rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians

RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
() → Mammals
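The starting point of an indirect method can be illustrated by turning each root-to-leaf path of the decision tree into one rule. The sketch below encodes the Give Birth / Live In Water / Can Fly tree from the figure as nested tuples; the encoding and the function name `tree_to_rules` are assumptions for illustration. It produces only the raw path rules and does not perform the simplification that leads to the shorter C4.5rules set listed in this section.

```python
# Internal node: (attribute, {branch value: child}); leaf: class label string.
TREE = ("Give Birth", {
    "Yes": "Mammals",
    "No": ("Live In Water", {
        "Yes": "Fishes",
        "Sometimes": "Amphibians",
        "No": ("Can Fly", {"Yes": "Birds", "No": "Reptiles"}),
    }),
})


def tree_to_rules(node, path=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if isinstance(node, str):  # leaf: emit the accumulated path as a rule
        return [(list(path), node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, path + ((attribute, value),)))
    return rules


rules = tree_to_rules(TREE)
for conditions, label in rules:
    lhs = ", ".join(f"{attr}={value}" for attr, value in conditions)
    print(f"({lhs}) → {label}")
```

The path rules are mutually exclusive and exhaustive by construction, since every record follows exactly one path. Comparing them with the C4.5rules set shows what simplification changes: for example, the Birds rule there no longer tests Live in Water.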
C4.5 and C4.5rules:
                     PREDICTED CLASS
ACTUAL CLASS   Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians     2           0       0         0      0
Fishes         0           2       0         0      1
Reptiles       1           0       3         0      0
Birds          1           0       0         3      0
Mammals        0           0       1         0      6
RIPPER:
                     PREDICTED CLASS
ACTUAL CLASS   Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians     0           0       0         0      2
Fishes         0           3       0         0      0
Reptiles       0           0       3         0      1
Birds          0           0       1         2      1
Mammals        0           2       1         0      4
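The two confusion matrices can be summarized by overall accuracy and per-class recall. The sketch below copies the counts from the tables above (rows are actual classes, columns are predicted classes); the helper names are illustrative.

```python
CLASSES = ["Amphibians", "Fishes", "Reptiles", "Birds", "Mammals"]

# Rows: actual class, columns: predicted class, in the order of CLASSES.
C45 = [
    [2, 0, 0, 0, 0],
    [0, 2, 0, 0, 1],
    [1, 0, 3, 0, 0],
    [1, 0, 0, 3, 0],
    [0, 0, 1, 0, 6],
]
RIPPER = [
    [0, 0, 0, 0, 2],
    [0, 3, 0, 0, 0],
    [0, 0, 3, 0, 1],
    [0, 0, 1, 2, 1],
    [0, 2, 1, 0, 4],
]


def accuracy(matrix):
    """Fraction of records on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix, classes):
    """For each actual class, the fraction of its records predicted correctly."""
    return {label: matrix[i][i] / sum(matrix[i])
            for i, label in enumerate(classes)}


print(accuracy(C45))     # 0.8  (16 of 20 correct)
print(accuracy(RIPPER))  # 0.6  (12 of 20 correct)
print(recall_per_class(RIPPER, CLASSES)["Amphibians"])  # 0.0
```

On these 20 records the RIPPER rule set never predicts Amphibians: both amphibians fall through to the default rule and are labeled Mammals, which accounts for part of the accuracy gap.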