
Decision Tree & ID3 Algorithm

Reference book:
R2. Tom Mitchell, Machine Learning, McGraw-Hill, 1997

Dr.S.Sridevi, ASP/SCOPE, VIT Chennai


Tree Versus ML Decision Tree
[Figure: a natural tree compared with an ML decision tree; the ML decision tree is drawn upside down, with the root node at the top and the leaf nodes at the bottom.]


1. Does it satisfy the salary criteria?
2. Is it a dream company?
3. Is the commute/travel time less than an hour?
4. Does it offer free breakfast & coffee?


You need to buy apples…
How would you choose fresh apples in the market?



Definition – Decision Trees
• A DT is a method for approximating discrete-valued functions that is robust to noisy data and capable of learning disjunctive expressions.
• Learned trees can also be re-represented as sets of if-then rules to improve human readability.

1. Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.
2. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute.
3. An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute.
4. This process is then repeated for the sub-tree rooted at the new node.


Types of Decision Trees

• Classification Tree: A classification tree is used to create a decision tree for a categorical response (commonly known as the target) with many categorical or continuous predictors (factors). The categorical response can be binomial or multinomial (e.g. Pass/Fail; high, medium & low; etc.). It illustrates important patterns and relationships between a categorical response and important predictors within highly complicated data, without using parametric methods. It can also identify groups in the data with desirable characteristics and predict response values for new observations. For example, a credit card company can use a classification tree to identify, based on several predictors, whether a customer will take a credit card or not.
• Regression Tree: A regression tree is used to create a decision tree for a continuous response (commonly known as the target) with many categorical or continuous predictors (factors). The continuous response takes the form of a real number (e.g. piston diameter, blood pressure level, etc.). It also illustrates the important patterns and relationships between a continuous response and predictors within highly complicated data, without using parametric methods. It can also identify groups in the data with desirable characteristics and predict response values for new observations. For example, a pharmaceutical company can use a regression tree to identify the potential predictors that affect the dissolution rate.
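As a minimal illustration, not part of the original slides: the two tree types map onto scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor. The tiny dataset below is invented purely for the example.

# Minimal sketch (illustrative toy data): classification vs. regression trees in scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[20, 1], [73, 0], [37, 2], [33, 0], [48, 2], [29, 1]])  # e.g. [age, BP code]

# Classification tree: categorical target (drug A = 0, drug B = 1)
y_class = np.array([0, 1, 0, 1, 0, 0])
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2)
clf.fit(X, y_class)
print(clf.predict([[40, 2]]))        # predicted class label

# Regression tree: continuous target (e.g. a blood pressure level)
y_real = np.array([118.0, 135.5, 142.0, 101.2, 150.3, 120.8])
reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X, y_real)
print(reg.predict([[40, 2]]))        # predicted real value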


Decision Tree Algorithms
[Figure: the three main decision tree algorithms: ID3, C4.5 and CART.]


A decision tree is built top-down from a root node and involves partitioning the
data into subsets that contain instances with similar values (homogeneous).

Decision Tree algorithms

1. ID3 (Iterative Dichotomiser 3): ID3 cannot handle continuous variables directly; it works only with categorical data. It is also prone to overfitting. (Splitting criterion: Information Gain)
2. C4.5: It can handle both categorical and continuous attributes by converting continuous attributes into categorical ones through thresholding (see the sketch after this list). (Splitting criterion: Gain Ratio; C4.5 is an extension of ID3)
3. CART (Classification and Regression Trees): CART splits nodes into exactly two branches. (Splitting criterion: Gini Index for classification trees, Variance Reduction for regression trees)
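A minimal sketch, not from the slides, of the C4.5-style thresholding mentioned in item 2: a continuous attribute is binarized by trying candidate thresholds and keeping the one with the highest information gain. The function names here are illustrative.

# Illustrative sketch: pick a binary threshold for a continuous attribute
# by maximizing information gain.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Try midpoints between consecutive sorted values; return (threshold, gain).
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for v, label in pairs if v <= t]
        right = [label for v, label in pairs if v > t]
        gain = base - (len(left) / len(pairs)) * entropy(left) \
                    - (len(right) / len(pairs)) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Abdomen length vs. class from Table 3.1 below (G = Grasshopper, K = Katydid)
print(best_threshold([2.7, 8, 0.9, 1.1, 5.4, 2.9, 6.1, 0.5, 8.3, 8.1],
                     ["G", "K", "G", "G", "K", "G", "K", "G", "K", "K"]))
# -> (4.15, 1.0): a single threshold separates the two classes perfectly here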
Terminologies
• Root Node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.
• Leaf Node: A node that cannot be segregated into further nodes.
• Parent/Child node: The root node is the parent node, and all the other nodes branched from it are known as child nodes.
• Branch/sub-tree: Formed by splitting the tree/node.
• Splitting: Dividing the root node/sub-node into different parts on the basis of some condition.
• Pruning: The opposite of splitting; basically removing unwanted branches from the tree.
• Entropy: A measure that tells the purity/impurity of samples.
• Information Gain: The decrease in entropy after a dataset is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (useful in deciding which attribute can be used as the root node).
• Reduction in variance: An algorithm used for continuous target variables (regression problems). The split with lower variance is selected as the criterion to split the population.
• Gini index: The measure of purity or impurity used in building a decision tree in CART.
• Chi-square: An algorithm to find out the statistical significance of the differences between sub-nodes and the parent node.

Note: Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
DECISION TREE REPRESENTATION
• (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong)
• In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions.
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
If (Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak)
Then: PlayTennis = Yes
Else: PlayTennis = No
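As a small illustration (not from the slides), the rule above written directly as a Boolean expression:

# Illustrative only: the PlayTennis rule as a disjunction of conjunctions.
def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    rule = ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))
    return "Yes" if rule else "No"

print(play_tennis("Sunny", "High", "Strong"))     # -> No
print(play_tennis("Overcast", "High", "Strong"))  # -> Yes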


Test Instance: Do we PlayTennis?
If (Outlook = Sunny ∧ Humidity = Normal) Then PlayTennis = Yes

Outlook: Sunny | Temperature: ----- | Humidity: High | Wind: ----- | PlayTennis decision: Yes

(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Concept of Decision Trees
[Figure: a dataset with Attribute 1 to Attribute 4 and Class = {M, H}, and the decision tree built from it, with internal nodes testing Attribute 2, Attribute 3 and Attribute 1 and leaves labelled Class = M or Class = H.]


Fig. 3.1 Representation of objects (samples) using features.

Fig. 3.2 Measuring features for the domain of interest: Colour {Green, Brown, Gray, Other}, Has wings?, Abdomen length, Thorax length, Antennae length, Mandible size, Spiracle diameter, Leg length.


Table 3.1 Instance, Features and Class
Insect ID | Abdomen length | Antennae length | Insect class
1  | 2.7 | 5.5 | Grasshopper
2  | 8.0 | 9.1 | Katydid
3  | 0.9 | 4.7 | Grasshopper
4  | 1.1 | 3.1 | Grasshopper
5  | 5.4 | 8.5 | Katydid
6  | 2.9 | 1.9 | Grasshopper
7  | 6.1 | 6.6 | Katydid
8  | 0.5 | 1.0 | Grasshopper
9  | 8.3 | 6.6 | Katydid
10 | 8.1 | 4.7 | Katydid
An Example from Medicine
The main purpose of the decision tree is to expose the structural information contained in the data.

Table 4.1 Medical Data
No | Gender | Age | BP     | Drug
1  | Male   | 20  | Normal | A
2  | Female | 73  | Normal | B
3  | Male   | 37  | High   | A
4  | Male   | 33  | Low    | B
5  | Female | 48  | High   | A
6  | Male   | 29  | Normal | A
7  | Female | 52  | Normal | B
8  | Male   | 42  | Low    | B
9  | Male   | 61  | Normal | B
10 | Female | 30  | Normal | A
11 | Female | 26  | Low    | B
12 | Male   | 54  | High   | A
Fig. 4.8 Feature space and the decision tree for the insect data (Grasshoppers vs. Katydids; axes: Abdomen length and Antenna length).

Decision tree for the insect data:
Abdomen length > 7.1?
  Yes -> Katydid
  No  -> Antenna length > 6.0?
           Yes -> Katydid
           No  -> Grasshopper
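A tiny illustration (not from the slides) of this tree written as nested if/else tests:

# Illustrative only: the tree of Fig. 4.8 as code.
def classify_insect(abdomen_length: float, antenna_length: float) -> str:
    if abdomen_length > 7.1:
        return "Katydid"
    if antenna_length > 6.0:
        return "Katydid"
    return "Grasshopper"

# Spot-check against Table 3.1: (abdomen length, antennae length, expected class)
for ab, an, expected in [(2.7, 5.5, "Grasshopper"), (8.0, 9.1, "Katydid"), (5.4, 8.5, "Katydid")]:
    assert classify_insect(ab, an) == expected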
Concept of Decision Tree in ML
A decision tree is built top-down from a root node and involves partitioning the
data into subsets that contain instances with similar values (homogeneous).


Training Dataset
• Input features are also called attributes.
• This example dataset has 2 attributes: Colour & Diameter.
• Instances are defined by the categorical levels/numerical values of the attributes.
• A dataset is called a labelled dataset if a class is defined for the input features: the concept of supervised classification. Here the output is categorical.
Decision tree - procedure
1. Start with one of the best attributes available in the dataset.
2. Start with the full dataset in the root node. (If you consider Diameter as the best attribute, ask a question: [Is dia >= 3?])
3. Based on the attribute values [True/False], the dataset is divided into 2 subsets, and those subsets become the input to 2 new child nodes.
4. On the False side, the data subset has a single label type (Grape): there is no uncertainty (no confusion in predicting the label), so stop growing the tree on that side (leaf node).
5. On the True side, the subset has a mixture of labels (Mango & Lemon), so uncertainty exists: continue splitting the dataset at that node.
Decision tree - procedure
• Based on the characteristics of the attributes, identify the different set of possible questions to ask.
• How do we identify whether a question is a good indicator for continuing to grow the tree? → the information gain metric


Step I
• Identify the different set of possible questions to ask.

Method to identify the best attributes


Entropy – a metric to identify the best attribute

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one.
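A minimal sketch, not part of the slides, of this entropy computation (the function name is illustrative):

# Illustrative sketch: entropy of a set of class labels.
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum_i p_i * log2(p_i): 0 for a pure sample, 1 for a 50/50 binary split.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["Yes"] * 9 + ["No"] * 5))   # ~0.940 (the PlayTennis target used later)
print(entropy(["Yes", "Yes"]))             # 0.0  (completely homogeneous)
print(entropy(["Yes", "No"]))              # 1.0  (equally divided)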
CLASSIFICATION METHODS

Challenges
• How to represent the entire information in the dataset using the minimum number of rules?
• How to develop the smallest tree?

Solution
• Select the variable with maximum information (highest relation with Y) for the first split.
ID3 vs C4.5 vs CART

Decision Trees
▪ Decision trees are a type of supervised machine learning.
▪ They use well-"labelled" training data and, on the basis of that data, predict the output. This process can then be used to predict the results for unknown data.
▪ Decision trees can be applied to both regression and classification problems.
▪ A decision tree processes data into groups based on the value of the data and the features it is provided.
▪ Decision trees can be used for regression to get a real numeric value, or they can be used for classification to split data into different categories.
Decision Trees
A decision tree has three types of nodes:
▪ A root node that has no incoming edges and zero or more outgoing edges
▪ Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges
▪ Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges
Decision Trees
▪ In a decision tree, each leaf node is assigned a class label
▪ The non-terminal nodes, which include the root and other internal nodes, contain attribute test conditions to separate records that have different characteristics
Decision Trees
▪ Classifying a test record is straightforward once a decision tree has been constructed
▪ Starting from the root node, we apply the test condition to the record and follow the appropriate branch based on the outcome of the test
▪ This will lead us either to another internal node, for which a new test condition is applied, or to a leaf node
▪ The class label associated with a leaf node is then assigned to the record
Steps in Decision Trees

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final node is called a leaf node.
Decision Trees

Why use Decision Trees?
1. Decision trees usually mimic human thinking ability while making a decision, so they are easy to understand.
2. The logic behind a decision tree can be easily understood because it shows a tree-like structure.
How many attributes? Which attribute is significant?

Why do we need to find the significant attribute?
• Because, to start constructing the decision tree, one of the best attributes has to be assigned as the root node.


Decision Trees

Attribute Selection Measures
While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure (ASM). With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM:
1. Information Gain
2. Gini Index
Decision Trees

1. Information Gain:
Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
It calculates how much information a feature provides us about a class.
According to the value of information gain, we split the node and build the decision tree.

A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute having the highest information gain is split first. It can be calculated using the formula below:

Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
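A minimal sketch (not from the slides) of this formula; entropy is redefined locally so the snippet is self-contained, and the outlook column is hard-coded from the frequency tables later in the deck.

# Illustrative sketch: information gain of splitting a label set on one attribute.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    # Gain = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)
    n = len(labels)
    weighted = 0.0
    for value in set(attribute_values):
        subset = [label for a, label in zip(attribute_values, labels) if a == value]
        weighted += (len(subset) / n) * entropy(subset)
    return entropy(labels) - weighted

# Outlook column of the PlayTennis dataset vs. the PlayTennis target
outlook = ["Sunny"] * 5 + ["Overcast"] * 4 + ["Rainy"] * 5
target = ["No", "No", "No", "Yes", "Yes"] + ["Yes"] * 4 + ["Yes", "Yes", "Yes", "No", "No"]
print(round(information_gain(outlook, target), 3))   # -> 0.247, matching the worked example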
Decision Trees

Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in the data.

Entropy can be calculated as:

Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)

Where:
• S = the set of samples
• P(yes) = probability of yes
• P(no) = probability of no
Decision Trees

2. Gini Index:
• The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred to one with a high Gini index.
• It only creates binary splits, and the CART algorithm uses the Gini index to create binary splits.
• The Gini index can be calculated using the formula below (see the sketch after this list):
Gini Index = 1 − ∑j (pj)²
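A matching sketch (not from the slides) for the Gini index:

# Illustrative sketch: Gini index of a set of class labels.
from collections import Counter

def gini(labels):
    # Gini = 1 - sum_j p_j^2: 0 for a pure node, 0.5 for a 50/50 binary node.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["Yes"] * 9 + ["No"] * 5))   # ~0.459 for the PlayTennis target
print(gini(["Yes", "Yes"]))             # 0.0
print(gini(["Yes", "No"]))              # 0.5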
Information Gain
[Figure slides: comparing candidate splits; a split in which only a few samples are mixed gives higher information gain.]
Entropy
[Figure slides illustrating the entropy measure.]
Entropy metric (numerical example)
[Figure slides: worked numerical examples of the entropy metric.]

Concept of Decision Trees


ID3 ALGORITHM (Iterative Dichotomiser 3)
• The ID3 algorithm learns decision trees by constructing them top-down, beginning with the question "which attribute should be tested at the root of the tree?" To answer this question, each instance attribute is evaluated using a statistical test to determine how well it alone classifies the training examples.
• The best attribute is selected and used as the test at the root node of the tree.
• A descendant of the root node is then created for each possible value of this attribute, and the training examples are sorted to the appropriate descendant node.
• The entire process is then repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree.
• This forms a greedy search for an acceptable decision tree, in which the algorithm never backtracks to reconsider earlier choices.
• A simplified version of the algorithm, specialized to learning boolean-valued functions (i.e., concept learning), is given below.


WHICH ATTRIBUTE IS THE BEST CLASSIFIER?

• The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree.
• What is a good quantitative measure of the worth of an attribute?
• We will define a statistical property, called information gain, that measures how well a given attribute separates the training examples according to their target classification.
• ID3 uses this information gain measure to select among the candidate attributes at each step while growing the tree.
ID3(Examples, Target_attribute, Attributes)

Examples are the training examples. Target_attribute is the attribute whose value is to be predicted by the tree. Attributes is a list of other attributes that may be tested by the learned decision tree. Returns a decision tree that correctly classifies the given Examples.

• Create a Root node for the tree
• If all Examples are positive, return the single-node tree Root, with label = +
• If all Examples are negative, return the single-node tree Root, with label = -
• If Attributes is empty, return the single-node tree Root, with label = most common value of Target_attribute in Examples
• Otherwise begin
  • A <- the attribute from Attributes that best classifies Examples
  • The decision attribute for Root <- A
  • For each possible value, vi, of A:
    • Add a new tree branch below Root, corresponding to the test A = vi
    • Let Examples_vi be the subset of Examples that have value vi for A
    • If Examples_vi is empty, then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
    • Else, below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
  • End
• Return Root
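A compact Python rendering of this pseudocode, not from the slides; the data representation (a list of dicts and a nested-dict tree) is an assumption made for illustration. Because branches are created only for attribute values that actually occur in the examples, the empty-branch case of the pseudocode does not arise here.

# Illustrative ID3 sketch: each example is a dict of attribute -> value,
# and the learned tree is a nested dict {attribute: {value: subtree_or_label}}.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute, target):
    labels = [e[target] for e in examples]
    total = entropy(labels)
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        total -= (len(subset) / len(examples)) * entropy(subset)
    return total

def id3(examples, target, attributes):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                       # all examples share one label
        return labels[0]
    if not attributes:                              # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    for value in {e[best] for e in examples}:       # one branch per observed value of best
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree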


Numerical problem
Given a dataset of historical tennis match data, including features such as outlook, temperature, humidity and windy, design a decision tree classifier to predict the outcome of a tennis-playing decision based on these input features. How would you determine the optimal split criterion at each node of the decision tree to make accurate predictions about whether a player should play tennis or not under specific weather and player condition scenarios?


DATASET FOR PLAYING TENNIS
[Table: the 14-example PlayTennis dataset with candidate attributes Outlook, Temperature, Humidity and Windy; + marks positive instances (PlayTennis = Yes) and - marks negative instances (PlayTennis = No).]
Take X1: the Outlook attribute, analyze what sub-attributes (values) it has, and count the number of samples in each.

• There are 5 samples for the Sunny sub-attribute, with 2 Yes (positive) labels + 3 No (negative) labels.
• There are 4 samples for the Overcast sub-attribute, with 4 Yes (positive) labels + 0 No (negative) labels.
• There are 5 samples for the Rainy sub-attribute, with 3 Yes (positive) labels + 2 No (negative) labels.

Frequency table for X1 (Outlook):
Sunny: 2 Yes, 3 No | Overcast: 4 Yes, 0 No | Rainy: 3 Yes, 2 No


Take X2: the Temp attribute, analyze what sub-attributes (values) it has, and count the samples in each.

Temp?
hot (4 samples): No, No, Yes, Yes
mild (6 samples): Yes, No, Yes, Yes, Yes, No
cool (4 samples): Yes, No, Yes, Yes

Frequency table for X2 (Temp):
hot: 2 Yes, 2 No | mild: 4 Yes, 2 No | cool: 3 Yes, 1 No


Take X3: the Humidity attribute.

Humidity?
high (7 samples): No, No, Yes, Yes, No, Yes, No
normal (7 samples): Yes, No, Yes, Yes, Yes, Yes, Yes

Frequency table for X3 (Humidity):
high: 3 Yes, 4 No | normal: 6 Yes, 1 No
Take X4: the Windy attribute.

Windy?
false (8 samples): No, Yes, Yes, Yes, No, Yes, Yes, Yes
true (6 samples): No, No, Yes, Yes, Yes, No

Frequency table for X4 (Windy):
false: 6 Yes, 2 No | true: 3 Yes, 3 No
Frequency table for the entire dataset:
Overall: 9 Yes, 5 No (14 samples)
Outlook: Sunny 2 Yes, 3 No; Overcast 4 Yes, 0 No; Rainy 3 Yes, 2 No
Temp: hot 2 Yes, 2 No; mild 4 Yes, 2 No; cool 3 Yes, 1 No
Humidity: high 3 Yes, 4 No; normal 6 Yes, 1 No
Windy: false 6 Yes, 2 No; true 3 Yes, 3 No


DATASET FOR PLAYING TENNIS
There are 4 candidate attributes (outlook, temperature, humidity and windy) in the given dataset. Which should be considered as the root node?
• Positive instances (+): 9/14
• Negative instances (-): 5/14
Step 1
• Measure the entropy for the overall sample S in the given dataset, using the entropy formula:
E(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)


Step 2
• For all the attributes in the dataset, compute the entropy and information gain measures to identify which attribute is significant enough to be considered as the root node to start building the decision tree.
• In this dataset there are 4 attributes, namely outlook, temperature, humidity and windy.
• Let's start by considering outlook as the first choice in our computation of information gain.


For X1 (Outlook), compute all of the following measures; refer to the basic entropy formula.

Step 1: Calculate the overall entropy value.
Step 2: How many sub-attributes are there in Outlook?
Step 3: Calculate the entropy for Outlook = Sunny.
Step 4: Calculate the entropy for Outlook = Overcast.
Step 5: Calculate the entropy for Outlook = Rainy.
Step 6: Calculate the information for Outlook.
Step 7: Calculate the gain for Outlook.
Outlook = Sunny: positive instances = 2/5, negative instances = 3/5.

Step 3: Entropy for Outlook = Sunny
E(Outlook=Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5)
                 = -0.4(-1.3219) - 0.6(-0.7369)
                 = 0.5288 + 0.4421
                 = 0.971

Note: on a calculator, log2(x) = log(x)/log(2).
Outlook = Overcast: positive instances = 4/4, negative instances = 0/4.

Step 4: Entropy for Outlook = Overcast
E(Outlook=Overcast) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0
(taking 0 * log2(0) = 0 by convention)
Outlook = Rainy: positive instances = 3/5, negative instances = 2/5.

Step 5: Entropy for Outlook = Rainy
E(Outlook=Rainy) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971
Step 1: Calculate the overall entropy value
E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Step 2: How many sub-attributes are there in Outlook?
3 = Sunny, Overcast & Rainy

Step 3: Entropy for Outlook = Sunny
E(Outlook=Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971

Step 4: Entropy for Outlook = Overcast
E(Outlook=Overcast) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0

Step 5: Entropy for Outlook = Rainy
E(Outlook=Rainy) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971

Step 6: Information for Outlook (weighted average of the sub-attribute entropies)
I(Outlook) = (5/14) * E(Outlook=Sunny) + (4/14) * E(Outlook=Overcast) + (5/14) * E(Outlook=Rainy)
           = (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971
           = 0.693

Step 7: Calculate the gain for Outlook
Gain(Outlook) = E(S) - I(Outlook)   [Step 1 - Step 6]
              = 0.940 - 0.693 = 0.247
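A short check, not from the slides, that reproduces the gains for all four attributes from the frequency tables above (the per-value counts are hard-coded):

# Illustrative check of the gains, using the PlayTennis frequency tables.
import math

def entropy_counts(pos, neg):
    total = pos + neg
    result = 0.0
    for c in (pos, neg):
        if c:                         # convention: 0 * log2(0) = 0
            result -= (c / total) * math.log2(c / total)
    return result

def gain(splits, total_pos=9, total_neg=5):
    # splits: list of (pos, neg) counts per attribute value
    n = total_pos + total_neg
    info = sum(((p + q) / n) * entropy_counts(p, q) for p, q in splits)
    return entropy_counts(total_pos, total_neg) - info

print(round(gain([(2, 3), (4, 0), (3, 2)]), 3))   # Outlook  -> 0.247
print(round(gain([(2, 2), (4, 2), (3, 1)]), 3))   # Temp     -> 0.029
print(round(gain([(3, 4), (6, 1)]), 3))           # Humidity -> 0.152
print(round(gain([(6, 2), (3, 3)]), 3))           # Windy    -> 0.048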


Take X2: the Temp attribute (sub-attributes hot, mild and cool, with the counts from the frequency table above).

Step 1: Overall entropy value
E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Step 2: How many sub-attributes are there in Temp?
3 = hot, mild, cool

Step 3: Entropy for Temp = hot
E(Temp=hot) = -(2/4) log2(2/4) - (2/4) log2(2/4) = 1

Step 4: Entropy for Temp = mild
E(Temp=mild) = -(4/6) log2(4/6) - (2/6) log2(2/6) = 0.9183

Step 5: Entropy for Temp = cool
E(Temp=cool) = -(3/4) log2(3/4) - (1/4) log2(1/4) = 0.8113

Step 6: Information for Temp
I(Temp) = (4/14) * 1 + (6/14) * 0.9183 + (4/14) * 0.8113 = 0.9111

Step 7: Gain for Temp
Gain(Temp) = E(S) - I(Temp) = 0.940 - 0.9111 = 0.029
Take X3: the Humidity attribute and X4: the Windy attribute, and compute the same seven steps for each (the working follows the same pattern as for Outlook and Temp).


Step 3
• We have to check which attribute has the highest information gain value.
• The Outlook attribute, which has the maximum value of information gain, is assigned as the root node to grow the decision tree.


Step 4
Having identified that the Outlook attribute is the root node, we still need to make 3 branches from it: Sunny, Overcast and Rain.

On checking, the Overcast branch has only (Yes) positive labels, with a high purity measure (the labels are similar and pure), so it is considered a leaf node because it has no possibility to grow further. Whereas Sunny and Rain have both Yes and No labels, which means impurity is there and the tree can branch out further.

Below the Sunny branch we will grow the tree by choosing one of the pending attributes (Temp, Humidity or Wind).
Which to choose? Measure the information gain and choose the one with the highest value.

Step 5: So, let's consider the data samples D1, D2, D8, D9, D11 pertaining to the Sunny sub-attribute, with respect to the other attributes, namely Temp, Humidity and Windy.


Step 5a, Step 5b, Step 5c
Compute the information gain for Temp, Humidity and Windy on the Sunny subset (D1, D2, D8, D9, D11).

We have to check which attribute has the highest information gain value.
The Humidity attribute, which has the maximum value of information gain on this subset, is assigned as the decision node at Level 1 (below Sunny) to grow the decision tree further.
On checking, below Sunny -> Humidity, the High sub-attribute has only (No) negative labels with a high purity measure and the Normal sub-attribute has only (Yes) positive labels with a high purity measure, so they are considered leaf nodes.

Step 6
To grow the tree further below the Rain sub-attribute, let's consider the data samples pertaining to the Rain sub-attribute with respect to the other main attributes, namely Temp, Humidity and Windy.
Step 6 (continued) / Step 7
Compute the information gains on the Rain subset; the Windy attribute has the highest gain and becomes the decision node below Rain.
On checking, below Rain -> Windy, the Strong sub-attribute has only (No) negative labels with a high purity measure and the Weak sub-attribute has only (Yes) positive labels with a high purity measure, so they are considered leaf nodes.

Finally, the decision tree is grown.
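As an illustration (not from the slides), the grown tree can be written as a nested dictionary and used to classify a new day:

# Illustrative only: the final PlayTennis tree and a small lookup-based classifier.
play_tennis_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, instance):
    while isinstance(tree, dict):
        attribute = next(iter(tree))                  # the attribute tested at this node
        tree = tree[attribute][instance[attribute]]
    return tree

day = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Wind": "Strong"}
print(classify(play_tennis_tree, day))                # -> No (matches the slide below)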


DECISION TREE REPRESENTATION

• The instance (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong) would be sorted down the leftmost branch of this decision tree and would therefore be classified as a negative instance (i.e., the tree predicts PlayTennis = No).
• In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions.
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)


