
Hands on Machine Learning, 2nd Edition

Chapter 6 – Decision Trees

Prepared by Glenn Miller for the San Diego Machine Learning Meetup™ group
Decision Tree (DT) + / –

Advantages
• Simple to understand and interpret ('White Box' model)
• Little data prep (e.g. no scaling)
• Versatile (classification and regression)
• Cost of using the tree is logarithmic in the number of data points used to train the tree
• Can handle multi-output problems
• Can validate using statistical tests
• Performs well even if its assumptions are somewhat violated

Disadvantages
• Prone to overfitting (must restrict degrees of freedom)
• Can be unstable (small data variations produce big changes to the tree)
• Predictions are piecewise constant approximations (not smooth or continuous)
• DT learners create biased trees if some classes dominate
• Practical DT algorithms cannot guarantee to return the globally optimal DT, because learning an optimal DT is NP-complete

Source: Scikit-Learn documentation


Computational Complexity
• Predictions are fast – roughly O(log₂(m)) – even with large training sets

• The training algorithm compares all available* features on all samples at each node

• Training complexity: O(n × m log₂(m)), where n is the number of features and m the number of training instances

• Presorting the data (presort=True) can speed up training for small data sets (note: the presort parameter has been removed from recent scikit-learn versions)

*If max_features is set, the algorithm considers at most max_features features at each split, though it may inspect more to find at least one valid partition of the node samples (see the sketch below)
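
A minimal sketch (not from the slides), assuming scikit-learn and the built-in Iris dataset, showing how max_features limits the features considered at each split:

```python
# Minimal sketch: limiting the features considered at each split.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Consider at most 2 of the 4 Iris features at each split; training still
# compares those features on all node samples, while prediction only walks
# a single root-to-leaf path.
tree_clf = DecisionTreeClassifier(max_features=2, random_state=42)
tree_clf.fit(X, y)

print(tree_clf.predict(X[:3]))
```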
Regularization and Pruning
Regularize: prevent the tree from growing too large by limiting hyperparameters before growing
the tree (e.g., set max_depth or min_samples_leaf)

Prune: let the tree grow fully, then replace irrelevant nodes with leaves (see the sketch below)
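
A minimal sketch (not from the slides) of both ideas in scikit-learn: max_depth and min_samples_leaf regularize before growth, while ccp_alpha applies scikit-learn's cost-complexity pruning after growth. The specific values here are arbitrary:

```python
# Minimal sketch: regularizing before growth vs. pruning after growth.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Regularize up front: cap the depth and require at least 5 samples per leaf
regularized = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Prune after growth: cost-complexity pruning removes nodes whose improvement
# does not justify the added complexity (larger ccp_alpha -> smaller tree)
pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(regularized.get_depth(), pruned.get_depth())
```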
Impurity

Gini impurity index
• Gi = 1 − Σk (Pi,k)²   (summing over the n classes)

Entropy / Information Gain
• Hi = − Σk Pi,k log₂(Pi,k)   (summing over the n classes with Pi,k ≠ 0)

Where Pi,k is the ratio of class-k instances among the training instances in the ith node (a sketch computing both follows)
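
A minimal sketch (not from the slides) computing both impurity measures for a single node from its class counts; the example counts are arbitrary:

```python
# Minimal sketch: Gini impurity and entropy of one node from its class counts.
import numpy as np

def gini(counts):
    """G_i = 1 - sum_k P_{i,k}^2."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    """H_i = -sum_k P_{i,k} * log2(P_{i,k}), skipping classes with P_{i,k} = 0."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A node holding 0 / 49 / 5 instances of three classes
print(gini([0, 49, 5]), entropy([0, 49, 5]))
```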
CART Algorithm (Scikit-Learn)
Splits the training set into two subsets using a single feature k and a threshold tk

Classification (DecisionTreeClassifier)
• Predict a class in each node
• Minimize impurity
• Cost function: J(k, tk) = (mleft / m) · Gleft + (mright / m) · Gright

Regression (DecisionTreeRegressor)
• Predict a value in each node
• Minimize MSE
• Cost function: J(k, tk) = (mleft / m) · MSEleft + (mright / m) · MSEright

G is the impurity of the subset; m is the number of instances in the subset (a sketch of the split cost follows)
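
A minimal sketch (not from the slides) of the regression cost for one candidate split; CART evaluates this over every feature and threshold and keeps the pair (k, tk) with the lowest cost. The toy data below is made up:

```python
# Minimal sketch: J(k, t_k) = (m_left/m)*MSE_left + (m_right/m)*MSE_right
# for one candidate split of a single feature.
import numpy as np

def mse(values):
    return np.mean((values - values.mean()) ** 2) if len(values) else 0.0

def split_cost(x, y, threshold):
    left, right = y[x <= threshold], y[x > threshold]
    m = len(y)
    return len(left) / m * mse(left) + len(right) / m * mse(right)

# Toy 1-D regression data with two plateaus
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 0.9, 1.0, 3.0, 3.1, 2.9])

print(split_cost(x, y, 3.5))   # low cost: the split separates the two plateaus
print(split_cost(x, y, 1.5))   # higher cost: the right side still mixes both levels
```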
(Some) Iris Species
[Figure: Iris flower photos – Setosa, Versicolor]

Classification
[Figure]

Regression
[Figure]
Regularization / Pruning

[Figure panels: Classification, Regression]

Source: HOML 2nd edition pp. 182, 184 / https://github.com/ageron/handson-ml2/blob/master/06_decision_trees.ipynb


Instability

• Sensitivity to training set rotation
• Sensitivity to training set details – removing just one data point can produce a very different tree (see the sketch below)

Source: HOML 2nd edition pp. 185, 186 / https://github.com/ageron/handson-ml2/blob/master/06_decision_trees.ipynb
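
A minimal sketch (not from the slides), in the spirit of the HOML example of removing the widest Iris versicolor: refit the same depth-2 tree with and without that one instance and compare the learned splits, which can change noticeably:

```python
# Minimal sketch: one removed instance can change the fitted tree.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data[:, 2:], iris.target  # petal length and petal width only

full = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# Drop the widest Iris versicolor (petals 4.8 cm long, 1.8 cm wide) and refit
keep = ~((X[:, 0] == 4.8) & (X[:, 1] == 1.8) & (y == 1))
reduced = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X[keep], y[keep])

# The chosen thresholds (and even the chosen features) can differ between the trees
print(export_text(full, feature_names=["petal length", "petal width"]))
print(export_text(reduced, feature_names=["petal length", "petal width"]))
```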


Conclusion
• Decision trees are powerful, versatile, and easy to understand

• They have limitations, which can be addressed – e.g. averaging many trees reduces instability (random forests; see the sketch below)

• More on this in Chapter 7, 'Ensemble Learning and Random Forests'
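
A minimal sketch (not from the slides), previewing the Chapter 7 idea that averaging many randomized trees dampens single-tree instability:

```python
# Minimal sketch: averaging many randomized trees (a random forest).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 trees, each grown on a bootstrap sample with random feature subsets;
# their votes are aggregated, which smooths out the variance of any single tree
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
print(forest.predict(X[:3]))
```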
