Decision Tree: Numerical ID3 Problem and Algorithm
Dr. S. Sridevi, ASP/SCOPE, VIT Chennai
Reference book:
R2. Tom Mitchell, Machine Learning, McGraw-Hill, 1997
[Diagram: a decision tree with a root node and leaf nodes; decision tree algorithms include ID3, C4.5, and CART.]
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Concept of Decision Trees
[Diagram: a toy dataset with Attribute 1 to Attribute 4 and Class = {M, H}, and the corresponding decision tree that tests Attribute 1, Attribute 2, and Attribute 3 and assigns Class = M or Class = H at the leaves.]
[Diagram: insect features used for classification: abdomen length, thorax length, antennae length, mandible size, spiracle diameter, leg length.]
Fig. 4.8 Feature space and the decision tree for insect data. The tree first asks "Abdomen length > 7.1?": if yes, the insect is a Katydid; if no, it asks "Antenna length > 6.0?", classifying yes as Katydid and no as Grasshopper.
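Read directly off Fig. 4.8, the tree is just two nested threshold tests. A minimal Python sketch of that hand-built tree (the function name is illustrative; the thresholds 7.1 and 6.0 come from the figure):

```python
# A sketch of the hand-built tree in Fig. 4.8; thresholds taken from the figure.
def classify_insect(abdomen_length: float, antenna_length: float) -> str:
    if abdomen_length > 7.1:      # right half of the feature space
        return "Katydid"
    if antenna_length > 6.0:      # upper-left region
        return "Katydid"
    return "Grasshopper"          # lower-left region

print(classify_insect(8.0, 3.0))  # Katydid
print(classify_insect(5.0, 2.5))  # Grasshopper
```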
Concept of Decision Tree ML
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar (homogeneous) values.
Decision tree - procedure
1. Start with one of the best attributes available in the dataset.
2. Start with the full dataset in the root node (for example, if Diameter is taken as the best attribute, ask a question such as "Is dia >= 3?").
3. Based on the attribute values [True/False], the dataset is divided into 2 subsets, and those subsets become the input to 2 new child nodes.
4. On the False side, the data subset has only one label (Grape). There is no uncertainty (no confusion in predicting the label) about the type of leaf, so stop growing the tree on that side.
5. On the True side, the subset has a mixture of labels, so uncertainty exists; continue splitting the dataset and the node (see the code sketch after the diagram below).
[Diagram: the full dataset at the root splits into a Grape subset (pure: stop growing the tree, leaf node) and a Mango & Lemon subset (mixed: continue growing the tree).]
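A minimal sketch of one split step of this procedure, assuming a toy fruit dataset with a numeric diameter feature; the row format, the question "Is dia >= 3?", and all names are illustrative:

```python
# One split step: partition rows by a boolean question, then check purity.
def split(rows, question):
    """Partition rows into (true_rows, false_rows) using a boolean question."""
    true_rows = [r for r in rows if question(r)]
    false_rows = [r for r in rows if not question(r)]
    return true_rows, false_rows

def is_pure(rows):
    """A subset is pure (a leaf) when every row carries the same label."""
    return len({label for _, label in rows}) <= 1

# Each row: (diameter, label)
data = [(1, "Grape"), (1, "Grape"), (3, "Mango"), (3, "Lemon")]
true_side, false_side = split(data, lambda r: r[0] >= 3)  # "Is dia >= 3?"
print(is_pure(false_side))  # True  -> stop growing (Grape leaf)
print(is_pure(true_side))   # False -> keep splitting (Mango & Lemon)
```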
Decision tree - procedure
Challenges
How to represent the entire information in the dataset using the minimum number of rules?
How to develop the smallest tree?
Solution
Select the variable with the maximum information (highest relation with Y) for the first split.
ID3 vs C4.5 vs CART
Decision Trees
A decision tree has three types of nodes:
▪ A root node, which has no incoming edges and zero or more outgoing edges
▪ Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges
▪ Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges
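A minimal sketch of these three node types as a data structure, assuming a dictionary of children keyed by attribute value; the class name and fields are illustrative, not a prescribed implementation:

```python
# Root and internal nodes carry a test attribute; leaves carry a class label.
class Node:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # test condition (root / internal nodes)
        self.label = label           # class label (leaf nodes only)
        self.children = {}           # attribute value -> child Node

    def is_leaf(self):
        # A leaf has no outgoing edges, i.e. no children.
        return not self.children

# Root node with two outgoing edges; one branch ends in a leaf:
root = Node(attribute="Outlook")
root.children["Overcast"] = Node(label="Yes")
root.children["Sunny"] = Node(attribute="Humidity")
print(root.is_leaf(), root.children["Overcast"].is_leaf())  # False True
```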
Decision Trees
▪ In a decision tree, each leaf node is assigned a class label
▪ The non-terminal nodes, which include the root and other internal nodes, contain attribute test conditions to separate records that have different characteristics
Decision Trees
▪ Classifying a test record is straightforward once a decision tree has been constructed
▪ Starting from the root node, we apply the test condition to the record and follow the appropriate branch based on the outcome of the test
▪ This leads us either to another internal node, for which a new test condition is applied, or to a leaf node
▪ The class label associated with that leaf node is then assigned to the record
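A minimal sketch of this classification walk, assuming the tree is stored as nested dictionaries of the form {attribute: {value: subtree or class label}}; the example tree mirrors the PlayTennis rule shown earlier and is illustrative only:

```python
# Walk from the root, following the branch that matches the record's value,
# until a plain class label (a leaf) is reached.
def classify(record, tree):
    if not isinstance(tree, dict):
        return tree                              # leaf: return the class label
    attribute = next(iter(tree))                 # test condition at this node
    branch = tree[attribute][record[attribute]]  # follow the matching edge
    return classify(record, branch)

tree = {"Outlook": {"Overcast": "Yes",
                    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}}}}
print(classify({"Outlook": "Sunny", "Humidity": "Normal"}, tree))  # Yes
```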
Steps in Decision Trees
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; the final node is called a leaf node (see the sketch below).
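A minimal sketch of Steps 1-5 as a recursive, ID3-style procedure, assuming each record is a dict with a "label" key and information gain as the ASM; all names and the toy rows are illustrative:

```python
from collections import Counter
from math import log2

def entropy(rows):
    counts = Counter(r["label"] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def build_tree(rows, attributes):
    labels = {r["label"] for r in rows}
    if len(labels) == 1 or not attributes:        # pure subset or nothing left
        return Counter(r["label"] for r in rows).most_common(1)[0][0]

    # Step-2: pick the attribute with the highest information gain (the ASM).
    def gain(a):
        remainder = 0.0
        for v in {r[a] for r in rows}:
            subset = [r for r in rows if r[a] == v]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(rows) - remainder

    best = max(attributes, key=gain)
    # Steps 3-5: split on the best attribute and recurse on each subset.
    return {best: {v: build_tree([r for r in rows if r[best] == v],
                                 [a for a in attributes if a != best])
                   for v in {r[best] for r in rows}}}

rows = [{"Outlook": "Sunny", "Windy": "false", "label": "No"},
        {"Outlook": "Overcast", "Windy": "true", "label": "Yes"},
        {"Outlook": "Sunny", "Windy": "true", "label": "Yes"}]
print(build_tree(rows, ["Outlook", "Windy"]))
# {'Windy': {'false': 'No', 'true': 'Yes'}}
```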
Terminologies
• Root node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.
• Leaf node: A node that cannot be segregated into further nodes.
• Parent/child node: The root node is the parent node, and all the other nodes branched from it are known as child nodes.
• Branch/sub-tree: Formed by splitting the tree/node.
• Splitting: Dividing the root node/sub-node into different parts on the basis of some condition.
• Pruning: The opposite of splitting; basically removing unwanted branches from the tree.
• Entropy: A measure that tells the purity/impurity of the samples.
• Information Gain: The decrease in entropy after a dataset is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (useful in deciding which attribute can be used as the root node).
• Reduction in variance: A criterion used for continuous target variables (regression problems). The split with the lower variance is selected as the criterion to split the population (see the sketch after this list).
• Gini index: The measure of purity or impurity used in building the decision tree in CART (Classification and Regression Trees).
• Chi-square: An algorithm to find out the statistical significance of the differences between sub-nodes and the parent node.

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
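The reduction-in-variance criterion above is easy to express in code. A minimal sketch, assuming a single numeric feature, a numeric target, and a threshold split; the toy data and names are illustrative:

```python
# Variance reduction: variance of the parent node minus the weighted variance
# of the two child nodes produced by a threshold split.
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(rows, threshold):
    left = [y for x, y in rows if x <= threshold]
    right = [y for x, y in rows if x > threshold]
    if not left or not right:            # degenerate split: no reduction
        return 0.0
    parent = [y for _, y in rows]
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(rows)
    return variance(parent) - weighted

rows = [(1, 5.0), (2, 5.5), (8, 20.0), (9, 21.0)]   # (feature, target) pairs
print(variance_reduction(rows, threshold=5))        # ~58.1 -> a good split
```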
How many attributes? Which attribute is significant?
1. Information Gain
2. Gini Index
Decision Trees
1. Information Gain:
Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
It calculates how much information a feature provides us about a class.
According to the value of information gain, we split the node and build the decision tree.
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
• S = total number of samples
• P(yes) = probability of yes
• P(no) = probability of no
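A minimal sketch of these two formulas, assuming class labels are given as a list of "yes"/"no" strings; function names are illustrative:

```python
from math import log2

def entropy(labels):
    """Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)."""
    total = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * log2(p)
    return result

def information_gain(labels, feature_values):
    """Entropy(S) minus the weighted average entropy over each feature value."""
    total = len(labels)
    weighted = 0.0
    for v in set(feature_values):
        subset = [l for l, f in zip(labels, feature_values) if f == v]
        weighted += len(subset) / total * entropy(subset)
    return entropy(labels) - weighted

print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))   # 0.94 for a 9-yes/5-no node
labels = ["no", "no", "yes", "yes"]
feature = ["sunny", "sunny", "overcast", "overcast"]
print(information_gain(labels, feature))             # 1.0: feature fully separates the classes
```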
Decision Trees
2. Gini Index:
• The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high Gini index.
• It creates only binary splits, and the CART algorithm uses the Gini index to create binary splits.
• The Gini index can be calculated using the formula below:
Gini Index = 1 - ∑j (Pj)²
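A minimal sketch of the Gini formula, assuming labels are given as a list of class strings; the function name is illustrative:

```python
# Gini = 1 - sum over classes of (class proportion)^2
def gini_index(labels):
    total = len(labels)
    return 1.0 - sum((labels.count(cls) / total) ** 2 for cls in set(labels))

print(gini_index(["yes"] * 9 + ["no"] * 5))   # ~0.459 (impure node)
print(gini_index(["yes", "yes", "yes"]))      # 0.0 (pure node)
```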
Information Gain
[Figure: a node in which a few samples are mixed (impure).]
Information Gain
[Figure: Outlook as a candidate attribute, with branches Sunny (5 samples), Overcast (4 samples), and Rainy (5 samples).]
• There are 5 samples in the Sunny sub-attribute, with 2 Yes (positive) labels + 3 No (negative) labels.
• There are 4 samples in the Overcast sub-attribute, with 4 Yes (positive) labels + 0 No (negative) labels.
• There are 5 samples in the Rainy sub-attribute, with 3 Yes (positive) labels + 2 No (negative) labels.
Frequency Table for X1 (Outlook)
Outlook    Yes  No  Total
Sunny       2    3    5
Overcast    4    0    4
Rainy       3    2    5
Windy?
false (8 samples): No, Yes, Yes, Yes, No, Yes, Yes, Yes → 6 Yes, 2 No
true (6 samples): No, No, Yes, Yes, Yes, No → 3 Yes, 3 No
Frequency Table for the entire dataset
[Figure: frequency tables built over the 4 given candidate attributes (Outlook, Temp, Humidity, Windy).]
E(Outlook=Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5)
= -0.4(-1.3219) - 0.6(-0.7369)
= 0.52876 + 0.44214
= 0.971

Step 6: Information for Outlook
Step 7: Calculate Gain for Outlook
Step 1: Calculate the overall entropy value
E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

E(Outlook=Overcast) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0 (taking 0 log2 0 = 0)
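A minimal sketch that reproduces the worked numbers above from the Outlook counts already listed (Sunny 2 Yes/3 No, Overcast 4 Yes/0 No, Rainy 3 Yes/2 No; 9 Yes/5 No overall); the code layout is illustrative:

```python
from math import log2

def entropy(pos, neg):
    result = 0.0
    for count in (pos, neg):
        if count:                       # 0 * log2(0) is taken as 0
            p = count / (pos + neg)
            result -= p * log2(p)
    return result

overall = entropy(9, 5)                                   # Step 1: E(S) = 0.940
branches = {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)}
info = sum((p + n) / 14 * entropy(p, n) for p, n in branches.values())
gain = overall - info                                     # Steps 6-7
print(round(overall, 3), round(info, 3), round(gain, 3))  # 0.94 0.694 0.247
```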
Temp?
high (7 samples): No, No, Yes, Yes, No, Yes, No → 3 Yes, 4 No
normal (7 samples): Yes, No, Yes, Yes, Yes, Yes, Yes → 6 Yes, 1 No
Take the X3 (Humidity) attribute, analyze the sub-attributes (values) in it, and count them.
Step 3: We have to check which attribute has the highest Information Gain value.
[Figure: under the Overcast branch, the labels are all similar and pure.]
On checking, the Overcast attribute has only positive (Yes) labels with a high purity measure, so it is considered a leaf node because it has no possibility to grow further. Whereas Sunny and Rain have both Yes and No labels, which means impurity is there, and they have the possibility to branch out as Yes and No.
Below the Sunny attribute we will grow the tree by choosing one of the pending attributes (Temp, Humidity, or Windy).
Which one to choose? Measure the Information Gain and choose the one with the highest value.
Step 5: So, let us consider the data samples D1, D2, D8, D9, and D11 pertaining to the Sunny sub-attribute with respect to the other attributes, namely Temp, Humidity, and Windy (a code sketch follows).
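A minimal sketch of Step 5, assuming the standard PlayTennis table from the reference book (R2, Mitchell); the Sunny rows D1, D2, D8, D9, D11 below follow that table (its Wind values Weak/Strong correspond to the false/true used in the slides), and all function names are illustrative:

```python
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def gain(rows, attribute):
    labels = [r["Play"] for r in rows]
    weighted = 0.0
    for v in {r[attribute] for r in rows}:
        subset = [r["Play"] for r in rows if r[attribute] == v]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - weighted

sunny = [  # D1, D2, D8, D9, D11 from Mitchell's PlayTennis table
    {"Temp": "Hot",  "Humidity": "High",   "Windy": "Weak",   "Play": "No"},
    {"Temp": "Hot",  "Humidity": "High",   "Windy": "Strong", "Play": "No"},
    {"Temp": "Mild", "Humidity": "High",   "Windy": "Weak",   "Play": "No"},
    {"Temp": "Cool", "Humidity": "Normal", "Windy": "Weak",   "Play": "Yes"},
    {"Temp": "Mild", "Humidity": "Normal", "Windy": "Strong", "Play": "Yes"},
]
for a in ("Temp", "Humidity", "Windy"):
    print(a, round(gain(sunny, a), 3))   # Humidity wins with gain 0.971
```

Under that assumption, Humidity gives the highest gain on the Sunny subset, so it would be chosen as the next split below Sunny.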