Decision Trees and How To Build and Optimize Decision Tree Classifier
Introduction
The Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike
many other supervised learning algorithms, the decision tree algorithm can be used for
solving both regression and classification problems.
The goal of using a Decision Tree is to create a training model that can be used to predict
the class or value of the target variable by learning simple decision rules inferred
from prior data (training data).
In Decision Trees, to predict a class label for a record we start from the root of the
tree. We compare the value of the root attribute with the record’s attribute. On the
basis of this comparison, we follow the branch corresponding to that value and jump to the
next node.
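As a minimal sketch of this traversal (the tree structure, attribute names, and values below are hypothetical, chosen only for illustration), a decision tree can be represented as nested dictionaries and a record classified by walking from the root to a leaf:

```python
# A tiny, hand-written decision tree; leaves are stored as plain class labels.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": "No",
    },
}

def predict(node, record):
    # A leaf node is just a class label, so return it.
    if not isinstance(node, dict):
        return node
    # Compare the record's value for this node's attribute and
    # follow the branch corresponding to that value.
    value = record[node["attribute"]]
    return predict(node["branches"][value], record)

print(predict(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes
```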
Types of decision trees are based on the type of target variable we have. There are
two types:
1. Categorical Variable Decision Tree: a decision tree with a categorical target
variable is called a categorical variable decision tree.
2. Continuous Variable Decision Tree: a decision tree with a continuous target variable
is called a continuous variable decision tree.
Example: Let’s say we have a problem of predicting whether a customer will pay his
renewal premium with an insurance company (yes/no). Here we know that the income
of customers is a significant variable, but the insurance company does not have income
details for all customers. Since this is an important variable, we can
build a decision tree to predict customer income based on occupation, product, and
various other variables. In this case, we are predicting values for a continuous
variable.
Some important terminology related to Decision Trees:
1. Root Node: It represents the entire population or sample, and this further gets divided
into two or more homogeneous sets.
2. Splitting: It is a process of dividing a node into two or more sub-nodes.
3. Decision Node: When a sub-node splits into further sub-nodes, it is called a
decision node.
4. Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
5. Pruning: When we remove sub-nodes of a decision node, this process is called
pruning. It is the opposite of splitting.
6. Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
7. Parent and Child Node: A node which is divided into sub-nodes is called the parent
node of those sub-nodes, and the sub-nodes are its children.
Decision trees classify the examples by sorting them down the tree from the root to
some leaf/terminal node, with the leaf/terminal node providing the classification of the
example.
Each node in the tree acts as a test case for some attribute, and each edge descending
from the node corresponds to the possible answers to the test case. This process is
recursive in nature and is repeated for every subtree rooted at the new node.
Below are some of the assumptions we make while using Decision tree:
• In the beginning, the whole training set is considered as the root.
• Feature values are preferred to be categorical. If the values are continuous, they
are discretized prior to building the model (a small discretization sketch follows this list).
• Records are distributed recursively on the basis of attribute values.
• The order of placing attributes as the root or internal nodes of the tree is decided by
using some statistical approach.
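For instance, a continuous feature could be discretized into a few ordinal bins before the tree is built. One possible sketch, using scikit-learn's KBinsDiscretizer on made-up income values (the feature and bin count are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical continuous feature: customer income (in thousands).
income = np.array([[12.0], [25.0], [47.0], [52.0], [80.0], [110.0]])

# Bin the continuous values into 3 ordinal categories before tree building.
discretizer = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
income_binned = discretizer.fit_transform(income)
print(income_binned.ravel())  # e.g. [0. 0. 1. 1. 2. 2.]
```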
Decision Trees follow a Sum of Product (SOP) representation, also known as Disjunctive
Normal Form. For a class, every branch from the root of the tree to a leaf node having the
same class is a conjunction (product) of attribute values, and the different branches ending
in that class form a disjunction (sum).
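For example (with hypothetical attributes), two root-to-leaf paths that both end in the class Play = Yes, such as (Outlook = Overcast) and (Outlook = Sunny ∧ Humidity = Normal), are each a conjunction of attribute values, and together they form the disjunction (Outlook = Overcast) ∨ (Outlook = Sunny ∧ Humidity = Normal) that describes the class.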
The primary challenge in decision tree implementation is to identify which attributes to
consider as the root node and at each level. Handling this is known as attribute
selection. We have different attribute selection measures to identify the attribute which
can be considered as the root node at each level.
The decision of making strategic splits heavily affects a tree’s accuracy. The decision
criteria are different for classification and regression trees.
Decision trees use multiple algorithms to decide to split a node into two or more sub-
nodes. The creation of sub-nodes increases the homogeneity of resultant sub-nodes. In
other words, we can say that the purity of the node increases with respect to the target
variable. The decision tree splits the nodes on all available variables and then selects
the split which results in most homogeneous sub-nodes.
The algorithm selection is also based on the type of target variables. Let us look at
some algorithms used in Decision Trees:
ID3 → (Iterative Dichotomiser 3)
C4.5 → (successor of ID3)
CART → (Classification And Regression Tree)
CHAID → (Chi-square Automatic Interaction Detection; performs multi-level splits when
computing classification trees)
MARS → (multivariate adaptive regression splines)
The ID3 algorithm builds decision trees using a top-down greedy search approach
through the space of possible branches with no backtracking. A greedy algorithm, as
the name suggests, always makes the choice that seems to be the best at that moment.
Steps in ID3 algorithm:
1. It begins with the original set S as the root node.
2. On each iteration of the algorithm, it iterates through every unused attribute of the
set S and calculates the Entropy (H) and Information Gain (IG) of this attribute.
3. It then selects the attribute which has the smallest Entropy or Largest Information
gain.
4. The set S is then split by the selected attribute to produce a subset of the data.
5. The algorithm continues to recur on each subset, considering only attributes never
selected before.
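The steps above can be sketched in Python as follows; the dataset layout (a list of dictionaries plus a target key) and the helper names are assumptions made for illustration, not a reference implementation of ID3:

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy H of the class distribution in a set of rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    """Entropy of the set minus the weighted entropy of the subsets produced by `attribute`."""
    total = len(rows)
    remainder = 0.0
    for value, count in Counter(row[attribute] for row in rows).items():
        subset = [row for row in rows if row[attribute] == value]
        remainder += (count / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    """Greedy, top-down tree construction with no backtracking."""
    labels = {row[target] for row in rows}
    if len(labels) == 1:                 # pure node -> leaf
        return labels.pop()
    if not attributes:                   # no attributes left -> majority-class leaf
        return Counter(row[target] for row in rows).most_common(1)[0][0]
    # Select the attribute with the largest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {"attribute": best, "branches": {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        # Recur on each subset, never reusing the selected attribute.
        tree["branches"][value] = id3(subset, remaining, target)
    return tree
```

Calling id3(rows, attributes, target) on such data would return a nested dictionary of the same shape as the traversal example earlier.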
If the dataset consists of N attributes, then deciding which attribute to place at the root or
at different levels of the tree as internal nodes is a complicated step. Just randomly
selecting any node to be the root does not solve the issue; if we follow a random approach,
it may give us bad results with low accuracy.
For solving this attribute selection problem, researchers worked and devised some
solutions. They suggested using some criteria like :
• Entropy
• Information Gain
• Gini Index
• Gain Ratio
• Reduction in Variance
• Chi-Square
These criteria calculate a value for every attribute. The values are sorted, and
attributes are placed in the tree by following that order, i.e., the attribute with the highest
value (in the case of information gain) is placed at the root.
While using Information Gain as a criterion, we assume attributes to be categorical, and
for the Gini index, attributes are assumed to be continuous.
Entropy
Entropy is a measure of the randomness in the information being processed. The higher
the entropy, the harder it is to draw any conclusions from that information. Flipping a
coin is an example of an action that provides information that is random.
If we plot the entropy H(X) of a binary variable against the probability of one class, the
entropy is zero when the probability is either 0 or 1, and it is maximum when the probability
is 0.5, because that reflects perfect randomness in the data and there is no chance of
perfectly determining the outcome. ID3 follows the rule: a branch with an entropy of zero is
a leaf node, and a branch with entropy greater than zero needs further splitting.
Mathematically, entropy for a single attribute is represented as:

E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i

where S is the current state and p_i is the probability of an event i of state S (the
percentage of class i in a node of state S).
Mathematically, entropy for multiple attributes is represented as:

E(T, X) = \sum_{c \in X} P(c)\, E(c)

where T is the current state and X is the selected attribute.
Information Gain
Information gain or IG is a statistical property that measures how well a given attribute
separates the training examples according to their target classification. Constructing a
decision tree is all about finding an attribute that returns the highest information gain
and the smallest entropy.
\text{Information Gain} = \text{Entropy(before)} - \sum_{j=1}^{K} \text{Entropy}(j, \text{after})

where “before” is the dataset before the split, K is the number of subsets generated by
the split, and (j, after) is subset j after the split.
Gini Index
You can understand the Gini index as a cost function used to evaluate splits in the
dataset. It is calculated by subtracting the sum of the squared probabilities of each class
from one. It favours larger partitions and is easy to implement, whereas information gain
favours smaller partitions with distinct values.
\text{Gini} = 1 - \sum_{i=1}^{c} (p_i)^2

The Gini index works with the categorical target variable “Success” or “Failure” and performs
only binary splits. The higher the Gini score of a node (p² + q², i.e., the lower its Gini
index), the higher its homogeneity.
Steps to Calculate Gini index for a split
1. Calculate Gini for sub-nodes, using the above formula for success(p) and failure(q)
(p²+q²).
2. Calculate the Gini index for split using the weighted Gini score of each node of that
split.
CART (Classification and Regression Tree) uses the Gini index method to create split
points.
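A sketch of the two steps above, with hypothetical class counts for the sub-nodes; scikit-learn's DecisionTreeClassifier, a CART-style implementation, uses this Gini criterion by default:

```python
def gini(counts):
    """Gini index of a node: 1 minus the sum of squared class probabilities."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_of_split(groups):
    """Weighted Gini index of a split, given per-node (success, failure) counts."""
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * gini(g) for g in groups)

# Hypothetical split: left node has 8 "success" / 2 "failure",
# right node has 3 "success" / 7 "failure".
print(gini_of_split([(8, 2), (3, 7)]))  # lower is purer
```

In scikit-learn, DecisionTreeClassifier(criterion="gini") uses this measure (it is also the default).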
Gain ratio
Information gain is biased towards choosing attributes with a large number of values as
root nodes. It means it prefers the attribute with a large number of distinct values.
C4.5, an improvement of ID3, uses Gain ratio which is a modification of Information gain
that reduces its bias and is usually the best option. Gain ratio overcomes the problem
with information gain by taking into account the number of branches that would result
before making the split. It corrects information gain by taking the intrinsic information of
a split into account.
Let us consider a dataset of users and their movie genre preferences based on variables
like gender, age group, rating, and so on. With the help of information gain, you split at
‘Gender’ (assuming it has the highest information gain); now the variables ‘Age Group’ and
‘Rating’ could be equally important, and with the help of the gain ratio, the variable with
more distinct values is penalized, which helps us decide the split at the next level.
\text{Gain Ratio} = \frac{\text{Information Gain}}{\text{Split Info}} = \frac{\text{Entropy(before)} - \sum_{j=1}^{K} \text{Entropy}(j, \text{after})}{-\sum_{j=1}^{K} w_j \log_2 w_j}

where “before” is the dataset before the split, K is the number of subsets generated by
the split, (j, after) is subset j after the split, and w_j is the fraction of records that fall
into subset j.
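A small sketch of this correction (the list-of-dictionaries data layout is again an assumption made for illustration):

```python
import math
from collections import Counter

def gain_ratio(rows, attribute, target):
    """Information gain divided by the split information (intrinsic value)."""
    total = len(rows)

    def entropy(subset):
        counts = Counter(r[target] for r in subset)
        return -sum((c / len(subset)) * math.log2(c / len(subset))
                    for c in counts.values())

    # Weighted entropy after splitting on `attribute`, and the split info.
    remainder, split_info = 0.0, 0.0
    for value, count in Counter(r[attribute] for r in rows).items():
        subset = [r for r in rows if r[attribute] == value]
        weight = count / total
        remainder += weight * entropy(subset)
        split_info -= weight * math.log2(weight)

    info_gain = entropy(rows) - remainder
    # split_info grows with the number of distinct values, penalizing them.
    return info_gain / split_info if split_info > 0 else 0.0
```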
Reduction in Variance
Reduction in variance is used when the target variable is continuous (regression
problems): the split that produces the lowest weighted variance in the sub-nodes is chosen.
Variance is calculated as:

\text{Variance} = \frac{\sum (X - \bar{X})^2}{n}

where X̄ is the mean of the values, X is an actual value, and n is the number of values.
Steps to calculate Variance:
1. Calculate variance for each node.
2. Calculate variance for each split as the weighted average of each node variance.
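A sketch of these two steps with made-up target values for the sub-nodes of a split:

```python
def variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def variance_of_split(groups):
    """Weighted average of the variance of each sub-node."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * variance(g) for g in groups)

# Hypothetical continuous target values in the two sub-nodes of a split;
# the split with the lowest weighted variance is preferred.
left, right = [10.0, 12.0, 11.0], [30.0, 28.0, 33.0, 31.0]
print(variance_of_split([left, right]))
```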
Chi-Square
The acronym CHAID stands for Chi-squared Automatic Interaction Detector. It is one of
the oldest tree classification methods. It finds the statistical significance of the
differences between sub-nodes and the parent node. We measure it by the sum of squares
of standardized differences between the observed and expected frequencies of the target
variable.
It works with the categorical target variable “Success” or “Failure”. It can perform two or
more splits. The higher the value of Chi-Square, the higher the statistical significance of
the differences between the sub-node and the parent node.
It generates a tree called CHAID (Chi-square Automatic Interaction Detector).
Mathematically, Chi-squared is represented as:

\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}
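A minimal sketch, with hypothetical observed and expected class frequencies across the sub-nodes of a split:

```python
def chi_square(observed, expected):
    """Sum of squared standardized differences between observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical frequencies of "Success"/"Failure" in two sub-nodes, compared
# with the frequencies expected from the parent node's class distribution.
observed = [12, 8, 5, 15]
expected = [10, 10, 10, 10]
print(chi_square(observed, expected))  # larger -> more significant split
```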
A common problem with decision trees, especially when the data has many columns, is that
they tend to overfit a lot. Sometimes it looks as if the tree memorized the training data set.
If no limit is set on a decision tree, it will give you 100% accuracy on the training data set,
because in the worst case it will end up making one leaf for each observation. This
affects the accuracy when predicting samples that are not part of the training set.
Here are two ways to remove overfitting:
1. Pruning Decision Trees.
2. Random Forest
Pruning Decision Trees
The splitting process results in fully grown trees until the stopping criteria are reached.
But, the fully grown tree is likely to overfit the data, leading to poor accuracy on unseen
data.
Pruning in action
In pruning, you trim off the branches of the tree, i.e., remove decision nodes
starting from the leaf nodes, such that the overall accuracy is not disturbed. This is done
by segregating the actual training set into two sets: a training data set D and a validation
data set V. Prepare the decision tree using the segregated training data set D, then
continue trimming the tree to optimize the accuracy on the validation data
set V.
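One way to approximate this procedure in practice is sketched below, using scikit-learn's cost-complexity pruning to generate candidate pruned trees and the validation set V to pick among them; the dataset and parameter values are chosen only for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Segregate the data into a training set D and a validation set V.
X_D, X_V, y_D, y_V = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow a full tree on D, then get the candidate pruning strengths (ccp_alpha).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_D, y_D)

# Trim progressively and keep the tree that scores best on V.
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_D, y_D)
    score = tree.score(X_V, y_V)
    if score > best_score:
        best_alpha, best_score = alpha, score

pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_D, y_D)
print(best_alpha, best_score)
```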
Pruning
In the above diagram, the ‘Age’ attribute in the left-hand side of the tree has been
pruned as it has more importance on the right-hand side of the tree, hence removing
overfitting.
Random Forest
Random Forest is an example of ensemble learning, in which we combine multiple
machine learning algorithms to obtain better predictive performance.
Why the name “Random”?
Two key concepts that give it the name random:
1. A random sampling of training data set when building trees.
2. Random subsets of features considered when splitting nodes.
A technique known as bagging is used to create an ensemble of trees, where multiple
training sets are generated by sampling with replacement.
In the bagging technique, a data set is divided into N samples using randomized
sampling. Then, using a single learning algorithm, a model is built on each sample. Finally,
the resulting predictions are combined using voting or averaging, and the individual models
can be trained in parallel.
Random Forest in action
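As a hedged sketch, here is how such a forest could be trained with scikit-learn; the dataset and hyperparameter values are assumptions for illustration. n_estimators is the number of bagged trees, and max_features controls the random subset of features considered at each split:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 100 trees, each trained on a bootstrap sample of the training data,
# with a random subset of features considered at every split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

# Predictions are combined across trees by majority vote.
print(forest.score(X_test, y_test))
```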