
Decision Tree

1
Introduction

 A classification scheme which generates a tree and a set of rules from a given data set.
 The set of records available for developing classification methods is divided into two disjoint subsets - a training set and a test set.
 The attributes of the records are categorised into two types:
 Attributes whose domain is numerical are called numerical attributes.
 Attributes whose domain is not numerical are called categorical attributes.

2
Introduction

 A decision tree is a tree with the following properties:
 An inner node represents an attribute.
 An edge represents a test on the attribute of the parent node.
 A leaf represents one of the classes.

 Construction of a decision tree
 Based on the training data
 Top-down strategy

3
Decision Tree
Example

 The data set has five attributes.
 There is a special attribute: the attribute class is the class label.
 The attributes temp (temperature) and humidity are numerical attributes.
 The other attributes are categorical, that is, they cannot be ordered.

 Based on the training data set, we want to find a set of rules that tell us what values of outlook, temperature, humidity and wind determine whether or not to play golf.

4
Decision Tree
Example

We have five leaf nodes.
 In a decision tree, each leaf node represents a rule.
 We have the following rules corresponding to the tree given in the Figure (see the sketch after this list):

 RULE 1 If it is sunny and the humidity is not above 75%, then play.
 RULE 2 If it is sunny and the humidity is above 75%, then do not play.
 RULE 3 If it is overcast, then play.
 RULE 4 If it is rainy and not windy, then play.
 RULE 5 If it is rainy and windy, then don't play.
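
The five rules above can be written directly as nested conditions. The Python snippet below is a minimal sketch; the function name play_golf and the record keys are illustrative assumptions, not part of the original slides.

    # Minimal sketch of RULE 1-5 as a function. The record is assumed to be a
    # dict with keys "outlook", "humidity" and "windy" (illustrative names).
    def play_golf(record):
        if record["outlook"] == "sunny":
            # RULE 1 and RULE 2: split on humidity at 75%
            return "play" if record["humidity"] <= 75 else "don't play"
        if record["outlook"] == "overcast":
            # RULE 3
            return "play"
        if record["outlook"] == "rain":
            # RULE 4 and RULE 5: split on windy
            return "don't play" if record["windy"] else "play"

    print(play_golf({"outlook": "sunny", "humidity": 70, "windy": False}))  # play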

5
Classification

 The classification of an unknown input vector is done by traversing the tree from the root node to a leaf node.
 A record enters the tree at the root node.
 At the root, a test is applied to determine which child node the record will encounter next.
 This process is repeated until the record arrives at a leaf node.
 All the records that end up at a given leaf of the tree are classified in the same way.
 There is a unique path from the root to each leaf.
 The path is a rule which is used to classify the records.

6
 In our tree, we can carry out the classification for an unknown record as follows.
 Let us assume, for the record, that we know the values of the first four attributes (but we do not know the value of the class attribute) as

outlook = rain; temp = 70; humidity = 65; and windy = true.
7
We start from the root node to check the value of the attribute associated with the root node.
 This attribute is the splitting attribute at this node.
 For a decision tree, at every node there is an attribute associated with the node called the splitting attribute.
 In our example, outlook is the splitting attribute at the root.
 Since for the given record, outlook = rain, we move to the right-most child node of the root.
 At this node, the splitting attribute is windy and we find that for the record we want to classify, windy = true.
 Hence, we move to the left child node to conclude that the class label is "no play" (see the sketch below).
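
The traversal just described can be illustrated with the golf tree stored as nested dictionaries. This is only a sketch in Python; the dictionary layout and the classify function are assumptions, not taken from the slides.

    # The golf decision tree as nested dicts (illustrative layout).
    # Inner nodes carry a splitting attribute; leaves are class labels.
    tree = {
        "attr": "outlook",
        "branches": {
            "sunny":    {"attr": "humidity<=75", "branches": {True: "play", False: "no play"}},
            "overcast": "play",
            "rain":     {"attr": "windy", "branches": {True: "no play", False: "play"}},
        },
    }

    def classify(node, record):
        # Traverse from the root until a leaf (a plain string) is reached.
        while isinstance(node, dict):
            attr = node["attr"]
            if attr == "humidity<=75":
                key = record["humidity"] <= 75
            else:
                key = record[attr]
            node = node["branches"][key]
        return node

    record = {"outlook": "rain", "temp": 70, "humidity": 65, "windy": True}
    print(classify(tree, record))   # prints: no play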

8
 The accuracy of the classifier is determined by the percentage of the test data set that is correctly classified.
 We can see that for Rule 1 there are two records of the test data set satisfying outlook = sunny and humidity < 75, and only one of these is correctly classified as play.
 Thus, the accuracy of this rule is 0.5 (or 50%). Similarly, the accuracy of Rule 2 is also 0.5 (or 50%). The accuracy of Rule 3 is 0.66. (A sketch of this per-rule accuracy computation follows the rule below.)

RULE 1
If it is sunny and the humidity
is not above 75%, then play.
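
The per-rule accuracy computation can be sketched generically: among the test records that satisfy a rule's condition, count the fraction whose actual class matches the rule's prediction. The Python snippet below is only an illustrative sketch; the helper name rule_accuracy and the record layout are assumptions, not from the slides.

    # Per-rule accuracy: fraction of matched test records whose actual class
    # equals the class predicted by the rule (record layout is illustrative).
    def rule_accuracy(condition, predicted_class, test_records):
        matched = [r for r in test_records if condition(r)]
        if not matched:
            return None  # the rule fires on no test record
        correct = sum(1 for r in matched if r["class"] == predicted_class)
        return correct / len(matched)

    # RULE 1: outlook = sunny and humidity not above 75 -> play
    rule1 = lambda r: r["outlook"] == "sunny" and r["humidity"] <= 75
    # With the test set described on the slide (2 matching records, 1 correct),
    # rule_accuracy(rule1, "play", test_records) would return 0.5.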

9
Concept of Categorical Attributes

 Consider the following training data set.
 There are three attributes, namely, age, pincode and class.
 The attribute class is used as the class label.

 The attribute age is a numeric attribute, whereas pincode is a categorical one.
 Though the domain of pincode is numeric, no ordering can be defined among pincode values.
 You cannot derive any useful information if one pincode is greater than another pincode.

10
 Figure gives a decision tree for the training data.
 The splitting attribute at the root is pincode and the splitting criterion here is pincode = 500 046.
 Similarly, for the left child node, the splitting criterion is age < 48 (the splitting attribute is age).
 At the root level, we have 9 records. The associated splitting criterion is pincode = 500 046.
 As a result, we split the records into two subsets. Records 1, 2, 4, 8, and 9 go to the left child node and the remaining to the right node.
 Although the right child node has the same attribute as the splitting attribute, the splitting criterion is different.
 The process is repeated at every node (see the sketch below).
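
A splitting criterion is evaluated in the same way for both attribute types: an equality test for a categorical attribute such as pincode, and a threshold test for a numeric attribute such as age. The partition below is a minimal Python sketch; the record layout is an assumption for illustration.

    # Partition records on a splitting criterion (illustrative record layout).
    def partition(records, test):
        left = [r for r in records if test(r)]
        right = [r for r in records if not test(r)]
        return left, right

    # Root split: pincode = 500 046 (categorical -> equality test)
    root_test = lambda r: r["pincode"] == 500046
    # Left-child split: age < 48 (numeric -> threshold test)
    left_test = lambda r: r["age"] < 48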

11
Advantages and Shortcomings of Decision
Tree Classifications
 A decision tree construction process is concerned with identifying the splitting attributes and the splitting criterion at every level of the tree.

 Major strengths are:
 Decision trees are able to generate understandable rules.
 They are able to handle both numerical and categorical attributes.
 They provide a clear indication of which fields are most important for prediction or classification.

 Weaknesses are:
 The process of growing a decision tree is computationally expensive. At each node, each candidate splitting field is examined before its best split can be found.
 Some decision trees can only deal with binary-valued target classes.

12
Iterative Dichotomizer (ID3)
 Quinlan (1986)
 Each node corresponds to a splitting attribute.
 Each arc is a possible value of that attribute.
 At each node, the splitting attribute is selected to be the most informative among the attributes not yet considered in the path from the root.
 Entropy is used to measure how informative a node is.
 The algorithm uses the criterion of information gain to determine the goodness of a split.
 The attribute with the greatest information gain is taken as the splitting attribute, and the data set is split for all distinct values of the attribute.

13
Training Dataset
This follows an example from Quinlan's ID3.

 The class label attribute, buys_computer, has two distinct values.
 Thus there are two distinct classes (m = 2).
 Class C1 corresponds to yes and class C2 corresponds to no.
 There are 9 samples of class yes and 5 samples of class no.

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no

14
EXAMPLE PROBLEM

The training data are in the Table above. The data tuples are described by the attributes age, income, student, and credit_rating. The class label attribute, buys_computer, has two distinct values (namely, yes, no). Let C1 correspond to the class buys_computer = yes and C2 correspond to buys_computer = no. The tuple we wish to classify is X = (age = youth, income = medium, student = yes, credit_rating = fair).

Extracting Classification Rules from Trees

 Represent the knowledge in the form of IF-THEN rules.
 One rule is created for each path from the root to a leaf.
 Each attribute-value pair along a path forms a conjunction.
 The leaf node holds the class prediction.
 Rules are easier for humans to understand.

What are the rules?

15
Solution (Rules)

IF age = “<=30” AND student = “no” THEN buys_computer = “no”

IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”

IF age = “31…40” THEN buys_computer = “yes”

IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”

IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
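
Applying these rules to the tuple X from the example problem can be sketched as below. This is a minimal Python illustration; the function name and record keys are assumptions, and "youth" is taken to correspond to the age bucket "<=30".

    # The extracted rules applied to the tuple X (illustrative names).
    def buys_computer(x):
        if x["age"] == "<=30":
            return "yes" if x["student"] == "yes" else "no"
        if x["age"] == "31…40":
            return "yes"
        if x["age"] == ">40":
            return "no" if x["credit_rating"] == "excellent" else "yes"

    X = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
    print(buys_computer(X))   # prints: yes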

16
Algorithm for Decision Tree Induction

 Basic algorithm (a greedy algorithm)
 Tree is constructed in a top-down recursive divide-and-conquer manner.
 At the start, all the training examples are at the root.
 Attributes are categorical (if continuous-valued, they are discretized in advance).
 Examples are partitioned recursively based on selected attributes.
 Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).

 Conditions for stopping partitioning (a sketch of the recursion follows this list):
 All samples for a given node belong to the same class.
 There are no remaining attributes for further partitioning - majority voting is employed for classifying the leaf.
 There are no samples left.
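
The recursion above can be sketched as a small ID3-style induction routine in Python. Function names and the record layout (dicts with a target attribute) are illustrative assumptions, not taken from the slides.

    import math
    from collections import Counter

    def entropy(records, target):
        counts = Counter(r[target] for r in records)
        total = len(records)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def info_gain(records, attr, target):
        total = len(records)
        remainder = 0.0
        for value in set(r[attr] for r in records):
            subset = [r for r in records if r[attr] == value]
            remainder += (len(subset) / total) * entropy(subset, target)
        return entropy(records, target) - remainder

    def build_tree(records, attributes, target):
        classes = [r[target] for r in records]
        # Stopping condition 1: all samples belong to the same class.
        if len(set(classes)) == 1:
            return classes[0]
        # Stopping condition 2: no attributes left -> majority voting.
        if not attributes:
            return Counter(classes).most_common(1)[0][0]
        # Greedy choice: the attribute with the highest information gain.
        best = max(attributes, key=lambda a: info_gain(records, a, target))
        node = {"attr": best, "branches": {}}
        for value in set(r[best] for r in records):
            subset = [r for r in records if r[best] == value]
            remaining = [a for a in attributes if a != best]
            node["branches"][value] = build_tree(subset, remaining, target)
        # (Stopping condition 3, "no samples left", matters when branching over
        #  all possible attribute values rather than only the observed ones.)
        return node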

17
Attribute Selection Measure: Information
Gain (ID3/C4.5)

 Select the attribute with the highest information gain.
 Let S contain s_i tuples of class C_i for i = {1, ..., m}, with s the total number of tuples.

 Information needed to classify any arbitrary tuple (information is encoded in bits):

   I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}

 Entropy of attribute A with values {a_1, a_2, ..., a_v}:

   E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})

 Information gained by branching on attribute A (a worked computation follows below):

   Gain(A) = I(s_1, s_2, \ldots, s_m) - E(A)
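
As a check against the buys_computer data above, the snippet below computes I(9, 5), E(age) and Gain(age). It is a small illustrative Python sketch, not part of the original slides.

    import math

    # Worked check on the buys_computer data above (9 yes, 5 no).
    def I(*counts):
        s = sum(counts)
        return -sum((c / s) * math.log2(c / s) for c in counts if c > 0)

    info_all = I(9, 5)   # ~0.940 bits
    # age partitions the 14 samples into <=30 (2 yes, 3 no),
    # 31…40 (4 yes, 0 no) and >40 (3 yes, 2 no).
    E_age = (5/14) * I(2, 3) + (4/14) * I(4, 0) + (5/14) * I(3, 2)   # ~0.694
    gain_age = info_all - E_age   # ~0.246, the largest gain, so age is chosen
    print(round(gain_age, 3))     # as the splitting attribute at the root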


18
Exercise 1
 The following table consists of training data from an employee database.
 Let status be the class attribute. Use the ID3 algorithm to construct a decision tree from the given data.

22
