Decision Tree: Numerical ID3 Problem and Algorithm
Dr. S. Sridevi, ASP/SCOPE, VIT Chennai
Reference book:
R2. Tom Mitchell, Machine Learning, McGraw-Hill, 1997
[Diagram: a decision tree with a root node and leaf nodes; decision tree algorithms include ID3, C4.5, and CART.]
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Concept of Decision Trees
[Diagram: a toy dataset with Attribute 1 to Attribute 4 and Class = {M, H}, and the corresponding decision tree that tests Attribute 1, Attribute 2, and Attribute 3 and assigns Class = M or Class = H at the leaves.]
[Diagram: insect features used for classification: abdomen length, thorax length, antennae length, mandible size, spiracle diameter, leg length.]
Fig. 4.8 Feature space and the decision tree for insect data. The tree first asks "Abdomen length > 7.1?": if yes, the insect is a Katydid; if no, it asks "Antenna length > 6.0?", classifying yes as Katydid and no as Grasshopper.
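Read directly off Fig. 4.8, the tree is just two nested threshold tests. A minimal Python sketch of that hand-built tree (the function name is illustrative; the thresholds 7.1 and 6.0 come from the figure):

```python
# A sketch of the hand-built tree in Fig. 4.8; thresholds taken from the figure.
def classify_insect(abdomen_length: float, antenna_length: float) -> str:
    if abdomen_length > 7.1:      # right half of the feature space
        return "Katydid"
    if antenna_length > 6.0:      # upper-left region
        return "Katydid"
    return "Grasshopper"          # lower-left region

print(classify_insect(8.0, 3.0))  # Katydid
print(classify_insect(5.0, 2.5))  # Grasshopper
```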
Concept of Decision Tree ML
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar (homogeneous) values.
Decision tree - procedure
1. Start with one of the best attributes available in the dataset.
2. Start with the full dataset in the root node (for example, if Diameter is taken as the best attribute, ask a question such as "Is dia >= 3?").
3. Based on the attribute values [True/False], the dataset is divided into 2 subsets, and those subsets become the input to 2 new child nodes.
4. On the False side, the data subset has only one label (Grape). There is no uncertainty (no confusion in predicting the label) about the type of leaf, so stop growing the tree on that side.
5. On the True side, the subset has a mixture of labels, so uncertainty exists; continue splitting the dataset and the node (see the code sketch after the diagram below).
[Diagram: the full dataset at the root splits into a Grape subset (pure: stop growing the tree, leaf node) and a Mango & Lemon subset (mixed: continue growing the tree).]
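A minimal sketch of one split step of this procedure, assuming a toy fruit dataset with a numeric diameter feature; the row format, the question "Is dia >= 3?", and all names are illustrative:

```python
# One split step: partition rows by a boolean question, then check purity.
def split(rows, question):
    """Partition rows into (true_rows, false_rows) using a boolean question."""
    true_rows = [r for r in rows if question(r)]
    false_rows = [r for r in rows if not question(r)]
    return true_rows, false_rows

def is_pure(rows):
    """A subset is pure (a leaf) when every row carries the same label."""
    return len({label for _, label in rows}) <= 1

# Each row: (diameter, label)
data = [(1, "Grape"), (1, "Grape"), (3, "Mango"), (3, "Lemon")]
true_side, false_side = split(data, lambda r: r[0] >= 3)  # "Is dia >= 3?"
print(is_pure(false_side))  # True  -> stop growing (Grape leaf)
print(is_pure(true_side))   # False -> keep splitting (Mango & Lemon)
```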
Decision tree - procedure
Challenges
How to represent the entire information in the dataset using the minimum number of rules?
How to develop the smallest tree?
Solution
Select the variable with the maximum information (highest relation with Y) for the first split.
ID3 vs C4.5 vs CART
Decision Trees
A decision tree has three types of nodes:
▪ A root node, which has no incoming edges and zero or more outgoing edges
▪ Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges
▪ Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges
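A minimal sketch of these three node types as a data structure, assuming a dictionary of children keyed by attribute value; the class name and fields are illustrative, not a prescribed implementation:

```python
# Root and internal nodes carry a test attribute; leaves carry a class label.
class Node:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # test condition (root / internal nodes)
        self.label = label           # class label (leaf nodes only)
        self.children = {}           # attribute value -> child Node

    def is_leaf(self):
        # A leaf has no outgoing edges, i.e. no children.
        return not self.children

# Root node with two outgoing edges; one branch ends in a leaf:
root = Node(attribute="Outlook")
root.children["Overcast"] = Node(label="Yes")
root.children["Sunny"] = Node(attribute="Humidity")
print(root.is_leaf(), root.children["Overcast"].is_leaf())  # False True
```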
Decision Trees
▪ In a decision tree, each leaf node is assigned a class label
▪ The non-terminal nodes, which include the root and other internal nodes, contain attribute test conditions to separate records that have different characteristics
Decision Trees
▪ Classifying a test record is straightforward once a decision tree has been constructed
▪ Starting from the root node, we apply the test condition to the record and follow the appropriate branch based on the outcome of the test
▪ This leads us either to another internal node, for which a new test condition is applied, or to a leaf node
▪ The class label associated with that leaf node is then assigned to the record
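A minimal sketch of this classification walk, assuming the tree is stored as nested dictionaries of the form {attribute: {value: subtree or class label}}; the example tree mirrors the PlayTennis rule shown earlier and is illustrative only:

```python
# Walk from the root, following the branch that matches the record's value,
# until a plain class label (a leaf) is reached.
def classify(record, tree):
    if not isinstance(tree, dict):
        return tree                              # leaf: return the class label
    attribute = next(iter(tree))                 # test condition at this node
    branch = tree[attribute][record[attribute]]  # follow the matching edge
    return classify(record, branch)

tree = {"Outlook": {"Overcast": "Yes",
                    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}}}}
print(classify({"Outlook": "Sunny", "Humidity": "Normal"}, tree))  # Yes
```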
Steps in Decision Trees
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; the final node is called a leaf node (see the sketch below).
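A minimal sketch of Steps 1-5 as a recursive, ID3-style procedure, assuming each record is a dict with a "label" key and information gain as the ASM; all names and the toy rows are illustrative:

```python
from collections import Counter
from math import log2

def entropy(rows):
    counts = Counter(r["label"] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def build_tree(rows, attributes):
    labels = {r["label"] for r in rows}
    if len(labels) == 1 or not attributes:        # pure subset or nothing left
        return Counter(r["label"] for r in rows).most_common(1)[0][0]

    # Step-2: pick the attribute with the highest information gain (the ASM).
    def gain(a):
        remainder = 0.0
        for v in {r[a] for r in rows}:
            subset = [r for r in rows if r[a] == v]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(rows) - remainder

    best = max(attributes, key=gain)
    # Steps 3-5: split on the best attribute and recurse on each subset.
    return {best: {v: build_tree([r for r in rows if r[best] == v],
                                 [a for a in attributes if a != best])
                   for v in {r[best] for r in rows}}}

rows = [{"Outlook": "Sunny", "Windy": "false", "label": "No"},
        {"Outlook": "Overcast", "Windy": "true", "label": "Yes"},
        {"Outlook": "Sunny", "Windy": "true", "label": "Yes"}]
print(build_tree(rows, ["Outlook", "Windy"]))
# {'Windy': {'false': 'No', 'true': 'Yes'}}
```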
Terminologies
• Root node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.
• Leaf node: A node that cannot be segregated into further nodes.
• Parent/child node: The root node is the parent node, and all the other nodes branched from it are known as child nodes.
• Branch/sub-tree: Formed by splitting the tree/node.
• Splitting: Dividing the root node/sub-node into different parts on the basis of some condition.
• Pruning: The opposite of splitting; basically removing unwanted branches from the tree.
• Entropy: A measure that tells the purity/impurity of the samples.
• Information Gain: The decrease in entropy after a dataset is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (useful in deciding which attribute can be used as the root node).
• Reduction in variance: A criterion used for continuous target variables (regression problems). The split with the lower variance is selected as the criterion to split the population (see the sketch after this list).
• Gini index: The measure of purity or impurity used in building the decision tree in CART (Classification and Regression Trees).
• Chi-square: An algorithm to find out the statistical significance of the differences between sub-nodes and the parent node.

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
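The reduction-in-variance criterion above is easy to express in code. A minimal sketch, assuming a single numeric feature, a numeric target, and a threshold split; the toy data and names are illustrative:

```python
# Variance reduction: variance of the parent node minus the weighted variance
# of the two child nodes produced by a threshold split.
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(rows, threshold):
    left = [y for x, y in rows if x <= threshold]
    right = [y for x, y in rows if x > threshold]
    if not left or not right:            # degenerate split: no reduction
        return 0.0
    parent = [y for _, y in rows]
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(rows)
    return variance(parent) - weighted

rows = [(1, 5.0), (2, 5.5), (8, 20.0), (9, 21.0)]   # (feature, target) pairs
print(variance_reduction(rows, threshold=5))        # ~58.1 -> a good split
```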
How many attributes? Which attribute is significant?
1. Information Gain
2. Gini Index
Decision Trees
1. Information Gain:
Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
It calculates how much information a feature provides us about a class.
According to the value of information gain, we split the node and build the decision tree.
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
• S = total number of samples
• P(yes) = probability of yes
• P(no) = probability of no
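A minimal sketch of these two formulas, assuming class labels are given as a list of "yes"/"no" strings; function names are illustrative:

```python
from math import log2

def entropy(labels):
    """Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)."""
    total = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * log2(p)
    return result

def information_gain(labels, feature_values):
    """Entropy(S) minus the weighted average entropy over each feature value."""
    total = len(labels)
    weighted = 0.0
    for v in set(feature_values):
        subset = [l for l, f in zip(labels, feature_values) if f == v]
        weighted += len(subset) / total * entropy(subset)
    return entropy(labels) - weighted

print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))   # 0.94 for a 9-yes/5-no node
labels = ["no", "no", "yes", "yes"]
feature = ["sunny", "sunny", "overcast", "overcast"]
print(information_gain(labels, feature))             # 1.0: feature fully separates the classes
```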
Decision Trees
2. Gini Index:
• The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high Gini index.
• It creates only binary splits, and the CART algorithm uses the Gini index to create binary splits.
• The Gini index can be calculated using the formula below:
Gini Index = 1 - ∑j (Pj)²
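A minimal sketch of the Gini formula, assuming labels are given as a list of class strings; the function name is illustrative:

```python
# Gini = 1 - sum over classes of (class proportion)^2
def gini_index(labels):
    total = len(labels)
    return 1.0 - sum((labels.count(cls) / total) ** 2 for cls in set(labels))

print(gini_index(["yes"] * 9 + ["no"] * 5))   # ~0.459 (impure node)
print(gini_index(["yes", "yes", "yes"]))      # 0.0 (pure node)
```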
Information Gain
[Figure: a node in which a few samples are mixed (impure).]
Information Gain
[Figure: Outlook as a candidate attribute, with branches Sunny (5 samples), Overcast (4 samples), and Rainy (5 samples).]
• There are 5 samples in the Sunny sub-attribute, with 2 Yes (positive) labels + 3 No (negative) labels.
• There are 4 samples in the Overcast sub-attribute, with 4 Yes (positive) labels + 0 No (negative) labels.
• There are 5 samples in the Rainy sub-attribute, with 3 Yes (positive) labels + 2 No (negative) labels.
Frequency Table for X1 (Outlook)
Outlook    Yes  No  Total
Sunny       2    3    5
Overcast    4    0    4
Rainy       3    2    5
Windy?
false (8 samples): No, Yes, Yes, Yes, No, Yes, Yes, Yes → 6 Yes, 2 No
true (6 samples): No, No, Yes, Yes, Yes, No → 3 Yes, 3 No
Frequency Table for the entire dataset
[Figure: frequency tables built over the 4 given candidate attributes (Outlook, Temp, Humidity, Windy).]
E(Outlook=Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5)
= -0.4(-1.3219) - 0.6(-0.7369)
= 0.52876 + 0.44214
= 0.971

Step 6: Information for Outlook
Step 7: Calculate Gain for Outlook
Step 1: Calculate the overall entropy value
E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

E(Outlook=Overcast) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0 (taking 0 log2 0 = 0)
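A minimal sketch that reproduces the worked numbers above from the Outlook counts already listed (Sunny 2 Yes/3 No, Overcast 4 Yes/0 No, Rainy 3 Yes/2 No; 9 Yes/5 No overall); the code layout is illustrative:

```python
from math import log2

def entropy(pos, neg):
    result = 0.0
    for count in (pos, neg):
        if count:                       # 0 * log2(0) is taken as 0
            p = count / (pos + neg)
            result -= p * log2(p)
    return result

overall = entropy(9, 5)                                   # Step 1: E(S) = 0.940
branches = {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)}
info = sum((p + n) / 14 * entropy(p, n) for p, n in branches.values())
gain = overall - info                                     # Steps 6-7
print(round(overall, 3), round(info, 3), round(gain, 3))  # 0.94 0.694 0.247
```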
Temp?
high (7 samples): No, No, Yes, Yes, No, Yes, No → 3 Yes, 4 No
normal (7 samples): Yes, No, Yes, Yes, Yes, Yes, Yes → 6 Yes, 1 No
Take the X3 (Humidity) attribute, analyze the sub-attributes (values) in it, and count them.
Step 3: We have to check which attribute has the highest Information Gain value.
[Figure: under the Overcast branch, the labels are all similar and pure.]
On checking, the Overcast attribute has only positive (Yes) labels with a high purity measure, so it is considered a leaf node because it has no possibility to grow further. Whereas Sunny and Rain have both Yes and No labels, which means impurity is there, and they have the possibility to branch out as Yes and No.
Below the Sunny attribute we will grow the tree by choosing one of the pending attributes (Temp, Humidity, or Windy).
Which one to choose? Measure the Information Gain and choose the one with the highest value.
Step 5: So, let us consider the data samples D1, D2, D8, D9, and D11 pertaining to the Sunny sub-attribute with respect to the other attributes, namely Temp, Humidity, and Windy (a code sketch follows).
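A minimal sketch of Step 5, assuming the standard PlayTennis table from the reference book (R2, Mitchell); the Sunny rows D1, D2, D8, D9, D11 below follow that table (its Wind values Weak/Strong correspond to the false/true used in the slides), and all function names are illustrative:

```python
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def gain(rows, attribute):
    labels = [r["Play"] for r in rows]
    weighted = 0.0
    for v in {r[attribute] for r in rows}:
        subset = [r["Play"] for r in rows if r[attribute] == v]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - weighted

sunny = [  # D1, D2, D8, D9, D11 from Mitchell's PlayTennis table
    {"Temp": "Hot",  "Humidity": "High",   "Windy": "Weak",   "Play": "No"},
    {"Temp": "Hot",  "Humidity": "High",   "Windy": "Strong", "Play": "No"},
    {"Temp": "Mild", "Humidity": "High",   "Windy": "Weak",   "Play": "No"},
    {"Temp": "Cool", "Humidity": "Normal", "Windy": "Weak",   "Play": "Yes"},
    {"Temp": "Mild", "Humidity": "Normal", "Windy": "Strong", "Play": "Yes"},
]
for a in ("Temp", "Humidity", "Windy"):
    print(a, round(gain(sunny, a), 3))   # Humidity wins with gain 0.971
```

Under that assumption, Humidity gives the highest gain on the Sunny subset, so it would be chosen as the next split below Sunny.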