Decision Trees
Hello everyone. I am a self-taught data scientist, and today's topic is decision trees. I will share what I
have learned, and I hope you can also gain some insights along with me.
Note: I would like to thank Onur Koç, who has played a significant role in helping me understand the
topic as I studied.
Decision trees are non-parametric supervised machine learning algorithms that can be employed for
both classification and regression tasks. They are among the most widely used and robust machine
learning algorithms today. Visually, they can be represented as upside-down trees.
Before we start, I would like to provide information about the terms used in the diagram on the right:
Root node:
• It tests the dataset on a specific feature and divides the data into two or more subsets based on
the test result.
• Each subset can be further divided into more subsets with another test at the next node.
• Decision nodes represent the decision rules used for classifying or regressing the dataset.
Terminal/Leaf node:
• After the dataset is divided based on a specific rule or condition, classification or regression
results are obtained in these terminal nodes.
• Leaf nodes are the bottommost nodes of the tree and produce the final outcomes. They contain
a class or regression value.
Let’s build a decision tree and visualize it to understand the process. We are going to use one of the
most popular datasets, the iris dataset.
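Below is a minimal sketch of how such a tree might be built and drawn with scikit-learn. The exact parameter values (for example max_depth=3) and the use of plot_tree are my own illustrative choices, not something fixed by the original text.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Load the iris dataset: 150 samples, 4 features, 3 classes.
iris = load_iris()
X, y = iris.data, iris.target

# Fit a small tree; max_depth=3 is an illustrative choice that keeps the diagram readable.
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_clf.fit(X, y)

# Draw the fitted tree: each box shows the split rule, gini, samples, value and class.
plt.figure(figsize=(12, 7))
plot_tree(tree_clf,
          feature_names=iris.feature_names,
          class_names=list(iris.target_names),
          filled=True)
plt.show()
```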
To see the decision tree's working principle in action, we begin at the root node (depth 0): this node
checks whether the petal length (cm) feature is less than or equal to 2.45. If so, we move to the left
child node of the root (depth 1, left). In this case, that node serves as a leaf node, meaning it asks no
further questions and simply produces a result: every sample with a petal length (cm) value of 2.45 or
less is classified as the setosa type.
Exploring data with petal length (cm) values greater than 2.45, we examine the right child node of the
root (depth 1, right), which is a decision node introducing a new question. Does our petal width (cm)
value exceed 1.75? This question leads us to new decision nodes (depth 2) that, in turn, ask more
questions, eventually reaching leaf nodes to classify all our data.
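To make this walk-through concrete, here is a tiny hand-written version of the first two splits described above. The thresholds 2.45 and 1.75 come from the text; the labels returned for the deeper branches are simplified, since the text only follows the tree down to depth 2.

```python
def classify_iris(petal_length_cm, petal_width_cm):
    """Follow the first two decision rules described in the text."""
    if petal_length_cm <= 2.45:
        return "setosa"              # left child of the root: a pure leaf
    if petal_width_cm <= 1.75:
        return "mostly versicolor"   # deeper splits would refine this branch
    return "mostly virginica"        # deeper splits would refine this branch

print(classify_iris(1.4, 0.2))  # a typical setosa measurement -> "setosa"
print(classify_iris(5.1, 2.0))  # a typical virginica measurement -> "mostly virginica"
```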
The 'samples' value indicates how many training examples fall into that node, while the 'value' list
shows how those examples are distributed across the classes. For example, the depth 1, left node tells
us that 50 samples reach it, and all 50 belong to the first class (as confirmed by the 'class' field).
Now that we understand the basic logic of the decision tree, let's address some questions that naturally
arise; answering them will take us deeper into how the algorithm actually works.
• When asking questions, how does it decide which feature to select? For example,
o Why did it choose the petal length feature at the root node instead of sepal width or
petal width?
• When asking questions, how does it decide which feature value to choose? For example,
o Why did it not choose other values like 1.7 or 2.3 instead of the value 2.45?
• What is the Gini value, and why is it important?
The answer to all three questions comes down to impurity: at each node, the algorithm tries every candidate feature and threshold and keeps the split that reduces impurity (measured by Gini or entropy) the most.

Gini = 1 − Σ_{i=1}^{n} p_i²

where:
• n: the number of classes
• p_i: the proportion of each class in the node

Gain(S, A) = Entropy(S) − Σ_{i ∈ values(A)} (|S_i| / |S|) × Entropy(S_i)

Plugging in the example split (a parent node with entropy 1, divided into child nodes of 54 and 46 samples with entropies 0.445 and 0.151):

Information Gain = 1 − [(54/100) × 0.445 + (46/100) × 0.151] ≈ 0.69
• 0: the classes in the node are completely homogeneous (pure), indicating a clean separation.
• Values approaching 1: the classes in the node are thoroughly mixed, not homogeneous at all.
An Information Gain of 0.69 means the split removed a substantial part of the uncertainty in the parent
node, although the resulting child nodes are still not entirely homogeneous.
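As a sanity check on the formulas, the short sketch below computes Gini impurity and entropy for a node and then reproduces the information-gain arithmetic above; the class counts (54 and 46) and the child entropies (0.445 and 0.151) are simply taken from the worked example.

```python
import math

def gini(proportions):
    # Gini = 1 - sum(p_i^2): 0 for a pure node, close to 1 for heavily mixed classes.
    return 1.0 - sum(p ** 2 for p in proportions)

def entropy(proportions):
    # Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 binary node.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# A perfectly balanced binary node:
print(gini([0.5, 0.5]))     # 0.5
print(entropy([0.5, 0.5]))  # 1.0

# Reproduce the worked example: parent entropy 1, children with
# 54 and 46 samples and entropies 0.445 and 0.151.
info_gain = 1.0 - (54 / 100 * 0.445 + 46 / 100 * 0.151)
print(round(info_gain, 2))  # ~0.69
```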
2. max_depth
As you may recall, we used the max_depth parameter at the beginning of the text. This parameter
determines the maximum depth of the decision tree. It controls how deep the tree can grow. A larger
max_depth results in a more complex and detailed tree, but it may increase the risk of overfitting.
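A quick way to see this trade-off is to compare the training and test accuracy of a shallow tree and an unconstrained one. The split, random_state and depth values below are arbitrary choices for illustration; on a dataset as small as iris the gap will be modest.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for depth in (2, None):  # None lets the tree grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train accuracy={clf.score(X_train, y_train):.2f}, "
          f"test accuracy={clf.score(X_test, y_test):.2f}")
```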
3. min_samples_split
It sets the minimum number of samples required to split a node. This parameter restricts further
divisions in the tree and can help reduce the risk of overfitting.
(Figure: with min_samples_split = 10, a node holding fewer than 10 samples is not split further and becomes a leaf.)
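Here is a minimal sketch of the effect, again on iris; the value 10 matches the figure above, everything else is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Any node holding fewer than 10 samples becomes a leaf instead of being split further.
restricted = DecisionTreeClassifier(min_samples_split=10, random_state=42).fit(X, y)
unrestricted = DecisionTreeClassifier(random_state=42).fit(X, y)

print("leaves with min_samples_split=10:", restricted.get_n_leaves())
print("leaves with the default setting: ", unrestricted.get_n_leaves())
```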
4. max_features
It determines the maximum number of features the model considers at each split of the decision tree,
which is particularly useful for large datasets. Suppose our dataset has 50 different features and we set
max_features = 10: before each split, the model randomly selects 10 features and chooses the best split
among those 10. Like the other parameters above, it can be adjusted to help prevent overfitting.
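Here is a small sketch of the 50-feature scenario from the paragraph above, using a synthetic dataset generated with make_classification (the dataset itself is an assumption made purely for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# A synthetic dataset with 50 features, mirroring the example in the text.
X, y = make_classification(n_samples=500, n_features=50, random_state=42)

# At every split the tree examines a random subset of 10 features
# and picks the best split only among those 10.
clf = DecisionTreeClassifier(max_features=10, random_state=42)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```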
5. class_weight (for classification problems)
The main reasons for using `class_weight` are as follows:
• Balancing classes in imbalanced datasets: If some classes in your dataset have fewer examples
than others, you can use class weights to assign more weight to minority classes, allowing the
model to better learn these classes.
• Giving more importance to specific classes: If you want certain classes to have a greater impact on
the model's learning, you can assign higher weights to these classes.
Typically, the `class_weight` parameter is used in two ways:
• `class_weight="balanced"`: This option automatically determines class weights. The weights are
calculated inversely proportional to the frequency of each class in the dataset. This ensures the
automatic assignment of appropriate weights when there is an imbalance among classes.
• Manual Specification (`class_weight={0: 1, 1: 2}`): Users can manually set the weights of classes.
This is useful, especially when prioritizing a specific class or correcting an imbalance situation.
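Both usages can be sketched like this; the imbalanced dataset is synthetic, and the manual weights are just the ones used as an example above.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# An imbalanced binary problem: roughly 90% class 0 and 10% class 1.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# "balanced": weights are computed inversely proportional to class frequencies.
balanced_clf = DecisionTreeClassifier(class_weight="balanced", random_state=42).fit(X, y)

# Manual weights: errors on class 1 count twice as much as errors on class 0.
manual_clf = DecisionTreeClassifier(class_weight={0: 1, 1: 2}, random_state=42).fit(X, y)
```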
6. sample_weight
The sample_weight parameter is used to determine the importance of each individual example (data
point). For instance, when developing a medical diagnosis model, you may believe that the diagnosis
for some patients is more critical than others.
For example, consider the following scenarios:
Example 1 (Patient A): He/she has a critical condition, and accurate diagnosis is crucial.
Example 2 (Patient B): He/she has a less critical condition, and accurate diagnosis is important but not
a top priority.
By using sample_weight, you can assign a higher weight to Example 1, which helps the model pay
more attention to diagnosing critical cases.
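One thing worth noting is that in scikit-learn, sample_weight is passed to fit() rather than to the constructor. Below is a minimal sketch with made-up patient data; the features, labels, and the weight of 3.0 are all hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical patient records: [age, risk_factor] and a binary diagnosis label.
X = np.array([[63, 1], [45, 0], [58, 1], [30, 0], [70, 1], [25, 0]])
y = np.array([1, 0, 1, 0, 1, 0])

# Patient A (the critical case, first row) gets three times the weight of the others.
weights = np.array([3.0, 1.0, 1.0, 1.0, 1.0, 1.0])

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y, sample_weight=weights)  # sample_weight is an argument of fit(), not of the constructor
```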
Some advantages of decision trees are:
• Simple to understand and to interpret. Trees can be visualized.
• Requires little data preparation. Other techniques often require data normalization, dummy variables need to be created and blank values to be removed. Some tree and algorithm combinations support missing values.
• The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points used to train the tree.
• Able to handle both numerical and categorical data.
• Able to handle multi-output problems.
• Uses a white box model. If a given situation is observable in a model, the explanation for the condition is easily explained by boolean logic.

The disadvantages of decision trees include:
• Decision-tree learners can create over-complex trees that do not generalize the data well. This is called overfitting. Mechanisms such as pruning, setting the minimum number of samples required at a leaf node, or setting the maximum depth of the tree are necessary to avoid this problem.
• Decision trees can be unstable because small variations in the data might result in a completely different tree being generated. This problem is mitigated by using decision trees within an ensemble.
• Predictions of decision trees are neither smooth nor continuous, but piecewise constant approximations. Therefore, they are not good at extrapolation.
• The problem of learning an optimal decision tree is NP-complete, so practical algorithms often make locally optimal decisions. These algorithms cannot guarantee the globally best decision tree.
• There are concepts that are hard to learn because decision trees do not express them easily, such as XOR, parity or multiplexer problems.
• Decision tree learners create biased trees if some classes dominate. It is therefore recommended to balance the dataset prior to fitting with the decision tree.