Module 6: Decision Trees
Introduction
Top-down strategy: the tree is grown from the root downwards, recursively splitting the training data on a chosen attribute at each level.
Decision Tree
Example
RULE 1: If it is sunny and the humidity is not above 75%, then play.
RULE 2: If it is sunny and the humidity is above 75%, then do not play.
RULE 3: If it is overcast, then play.
RULE 4: If it is rainy and not windy, then play.
RULE 5: If it is rainy and windy, then do not play.
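As a minimal sketch, these five rules can be coded directly as a Python function; the record encoding below (a dict with outlook, humidity, and windy keys) is an assumption for illustration:

```python
def play_decision(record):
    """Apply RULES 1-5 to one weather record.

    Assumes `record` is a dict with keys 'outlook' ('sunny',
    'overcast', or 'rainy'), 'humidity' (a percentage), and
    'windy' (a bool); this encoding is illustrative.
    """
    if record["outlook"] == "sunny":
        # RULES 1 and 2: the humidity threshold decides.
        return "play" if record["humidity"] <= 75 else "do not play"
    if record["outlook"] == "overcast":
        # RULE 3: always play when it is overcast.
        return "play"
    # RULES 4 and 5: it is rainy, so the wind decides.
    return "do not play" if record["windy"] else "play"
```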
Classification
In our tree, we can carry out the classification of an unknown record as follows. Assume that, for the record, we know the values of the first four attributes (outlook, temperature, humidity, and windy) but not the value of the class attribute. Starting at the root, we follow the branch that matches the record's value of the splitting attribute at each node until we reach a leaf, which gives the predicted class.
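A minimal sketch of this walk, with the tree encoded as nested dicts; the encoding is an illustrative choice, and the humidity test is pre-discretized into a boolean to keep the lookup uniform:

```python
# The example tree: internal nodes map a splitting attribute to its
# branches; leaves are class labels.
tree = {"outlook": {
    "sunny":    {"humidity<=75": {True: "play", False: "do not play"}},
    "overcast": "play",
    "rainy":    {"windy": {True: "do not play", False: "play"}},
}}

def classify(tree, record):
    """Follow the branch matching the record at each node until a leaf."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[record[attribute]]
    return tree

# A hypothetical unknown record: the first four attribute values are
# known, the class attribute is not.
unknown = {"outlook": "rainy", "temperature": 70,
           "humidity<=75": True, "windy": True}
print(classify(tree, unknown))  # RULE 5 fires: "do not play"
```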
The accuracy of the classifier is the percentage of the test data set that it classifies correctly.
For Rule 1 there are two records of the test data set satisfying outlook = sunny and humidity ≤ 75, and only one of these is correctly classified as play. Thus, the accuracy of this rule is 0.5 (or 50%). Similarly, the accuracy of Rule 2 is also 0.5 (or 50%). The accuracy of Rule 3 is 0.66.
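A short sketch of the per-rule accuracy computation; the two test records below are hypothetical stand-ins chosen to reproduce the Rule 1 numbers quoted above, since the actual test set is not reproduced in these notes:

```python
def rule_accuracy(test_set, condition, predicted_class):
    """Accuracy of one rule: correct predictions / records matching its condition."""
    matched = [r for r in test_set if condition(r)]
    correct = [r for r in matched if r["class"] == predicted_class]
    return len(correct) / len(matched) if matched else None

# Hypothetical test records: two satisfy Rule 1's condition, and only
# one of them actually has class 'play'.
test_set = [
    {"outlook": "sunny", "humidity": 70, "windy": False, "class": "play"},
    {"outlook": "sunny", "humidity": 60, "windy": True,  "class": "do not play"},
]

rule1 = lambda r: r["outlook"] == "sunny" and r["humidity"] <= 75
print(rule_accuracy(test_set, rule1, "play"))  # 0.5, the 50% quoted for Rule 1
```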
Concept of Categorical Attributes
A categorical attribute takes one of a finite, unordered set of values (for example, outlook takes the values sunny, overcast, and rainy); a decision tree splits on such an attribute by creating one branch per value.
The figure gives a decision tree for the training data.
Advantages and Shortcomings of Decision Tree Classification
A decision tree construction process is concerned with identifying the splitting attributes and the splitting criterion at every level of the tree.
Its weaknesses are:
The process of growing a decision tree is computationally expensive. At each node, each candidate splitting field must be examined before its best split can be found.
Some decision tree algorithms can only deal with binary-valued target classes.
Iterative Dichotomizer (ID3)
Proposed by Quinlan (1986).
Each internal node corresponds to a splitting attribute, and each arc represents a possible value of that attribute.
Training Dataset
This follows an example from Quinlan's ID3.
EXAMPLE PROBLEM
The training data are in the table above. The data tuples are described by the attributes age, income, student, and credit rating. The class label attribute, buys computer, has two distinct values (namely, yes and no). Let C1 correspond to the class buys computer = yes and C2 correspond to buys computer = no. The tuple we wish to classify is X = (age = youth, income = medium, student = yes, credit rating = fair).
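The table itself is not reproduced in these notes; the sketch below assumes it matches the standard 14-tuple buys computer training set used in Quinlan's and in Han and Kamber's presentations of this example. The tuple_to_record helper and the (attribute-dict, label) encoding are illustrative choices:

```python
def tuple_to_record(age, income, student, credit_rating, buys_computer):
    """Pack one row of the table as an (attribute-dict, class-label) pair."""
    return ({"age": age, "income": income, "student": student,
             "credit_rating": credit_rating}, buys_computer)

# Assumed training set: 9 tuples of class yes, 5 of class no.
training_data = [tuple_to_record(*row) for row in [
    ("youth",       "high",   "no",  "fair",      "no"),
    ("youth",       "high",   "no",  "excellent", "no"),
    ("middle_aged", "high",   "no",  "fair",      "yes"),
    ("senior",      "medium", "no",  "fair",      "yes"),
    ("senior",      "low",    "yes", "fair",      "yes"),
    ("senior",      "low",    "yes", "excellent", "no"),
    ("middle_aged", "low",    "yes", "excellent", "yes"),
    ("youth",       "medium", "no",  "fair",      "no"),
    ("youth",       "low",    "yes", "fair",      "yes"),
    ("senior",      "medium", "yes", "fair",      "yes"),
    ("youth",       "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no",  "excellent", "yes"),
    ("middle_aged", "high",   "yes", "fair",      "yes"),
    ("senior",      "medium", "no",  "excellent", "no"),
]]

# The tuple we wish to classify (class label unknown).
X = {"age": "youth", "income": "medium", "student": "yes",
     "credit_rating": "fair"}
```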
Extracting Classification Rules from Trees
Solution (Rules)
For this training set the tree splits first on age, and the rules read off its leaf paths are:
IF age = youth AND student = no THEN buys computer = no
IF age = youth AND student = yes THEN buys computer = yes
IF age = middle aged THEN buys computer = yes
IF age = senior AND credit rating = excellent THEN buys computer = no
IF age = senior AND credit rating = fair THEN buys computer = yes
The tuple X = (age = youth, income = medium, student = yes, credit rating = fair) matches the second rule, so it is classified as buys computer = yes.
Algorithm for Decision Tree Induction
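A hedged sketch of the generic greedy, top-down induction procedure (not Quinlan's pseudocode verbatim): pick a splitting attribute at each node, partition the records on its values, and recurse until a stopping condition yields a leaf. The select_best_attribute parameter stands for whatever attribute selection measure is plugged in, such as the information gain defined in the next section.

```python
from collections import Counter

def induce_tree(records, attributes, select_best_attribute):
    """Grow a decision tree top-down from (attribute-dict, label) records."""
    labels = [label for _, label in records]
    # Stopping condition: all records agree, or no attributes remain;
    # return a leaf labelled with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: choose the splitting attribute for this node.
    best = select_best_attribute(records, attributes)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # One branch (arc) per observed value of the splitting attribute.
    for value in {attrs[best] for attrs, _ in records}:
        subset = [(attrs, label) for attrs, label in records
                  if attrs[best] == value]
        tree[best][value] = induce_tree(subset, remaining,
                                        select_best_attribute)
    return tree
```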
Attribute Selection Measure: Information Gain (ID3/C4.5)
Information gain measures the expected reduction in entropy obtained by partitioning the data set on an attribute; at each node, ID3 selects the attribute with the highest gain as the splitting attribute.
Exercise: Let status be the class attribute. Use the ID3 algorithm to construct a decision tree from the given data.
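A minimal sketch of the information gain measure, assuming the same (attribute-dict, label) record encoding as above: Info(D) = -Σ p_i log2(p_i) over the class proportions, and Gain(A) = Info(D) - Σ_v (|D_v|/|D|) Info(D_v) over the partitions induced by attribute A.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): expected bits needed to encode the class of a record in D."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(records, attribute):
    """Gain(A): reduction in entropy from partitioning the records on A."""
    labels = [label for _, label in records]
    gain = entropy(labels)
    for value in {attrs[attribute] for attrs, _ in records}:
        subset = [label for attrs, label in records
                  if attrs[attribute] == value]
        gain -= (len(subset) / len(records)) * entropy(subset)
    return gain

def select_best_attribute(records, attributes):
    """The ID3 choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a))
```

For the buys computer training set above, this measure gives Gain(age) ≈ 0.246 bits against roughly 0.029 for income, 0.151 for student, and 0.048 for credit rating, which is why age is selected as the root splitting attribute.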