
Decision Tree (Classification Tree)

Gang Wang, Ph.D.


Associate Professor of Management Information Systems
University of Delaware
Predictive Analytics

• Predictive analytics problems can be largely categorized into either Class Prediction (Classification) or Numeric Prediction (Regression) problems
• In classification, we use the information in the sample data to sort the data into distinct classes (Classification Trees)
• Predict whether a customer will stay loyal or churn
• Predict whether a loan is going to default or not
• Predict whether a passenger of the Titanic is going to survive or not
• In numeric prediction, we try to predict the numeric value of a target/label attribute (Regression Trees)
• Predict the housing price
• Predict how much a customer is going to spend in the next quarter
• Decision Trees can be used for both classification and regression predictions, which is why they are often referred to as CART: Classification and Regression Trees (a brief sketch follows)
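As a rough illustration of the two prediction types, here is a minimal sketch using scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor; the toy features, labels, and parameter values are invented purely for illustration and are not from the slides.

```python
# Minimal sketch: the same tree idea applied to classification and to regression.
# The toy data below is made up for illustration only.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: predict churn (1) vs. loyal (0) from [age, balance in $1000s]
X_cls = [[25, 10], [40, 80], [35, 20], [50, 120], [23, 5], [60, 90]]
y_cls = [1, 0, 1, 0, 1, 0]
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_cls, y_cls)
print(clf.predict([[30, 15]]))   # predicted class label

# Regression: predict a house price (in $1000s) from [square feet, bedrooms]
X_reg = [[1000, 2], [1500, 3], [2000, 3], [2500, 4], [3000, 4]]
y_reg = [200, 260, 310, 380, 450]
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_reg, y_reg)
print(reg.predict([[1800, 3]]))  # predicted numeric value
```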

Classification Trees (Decision Trees)

• Decision Trees (DTs), or classification trees, are one of the most popular classification techniques
• Easy to set up
• Easy to interpret (especially for business users)
• Computationally cheap
• Almost all data mining packages include DTs

[Tree diagram: a root node at the top, internal decision nodes below it, and leaf nodes at the bottom]
Homogeneity/Purity of Data
• The basic idea of a decision tree is to split the data set based on the homogeneity (of the same kind) of the data, i.e., reducing "Impurity"
• Impurity (uncertainty) is at its maximum when all possible classes are equally represented, e.g., the same number of "default" and "not default" cases in the following example

[Diagram: the entire population is split on Balance >= 50k vs. Balance < 50k, and each branch is split again on Age >= 45 vs. Age < 45, until the resulting groups reach 100% purity (0% impurity)]
Entropy

• Entropy is one of the most common measures for calculating impurity. It was proposed by Claude Shannon, known as "the father of information theory"

Entropy = − Σ P(xᵢ) log₂ P(xᵢ), summed over all n classes,
where P(xᵢ) is the probability of class xᵢ in the data.

For example, a churn dataset has two classes:
- "Default" (14 cases)
- "Not Default" (16 cases)
Entropy (entire dataset) = −(14/30) log₂(14/30) − (16/30) log₂(16/30) ≈ 0.997
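As a quick check of this number, here is a minimal sketch in plain Python (not from the slides):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 14 "Default" cases and 16 "Not Default" cases, as in the example above
print(round(entropy([14, 16]), 3))   # 0.997
# Impurity is at its maximum (1.0) when the two classes are equally represented
print(entropy([15, 15]))             # 1.0
```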

ID3 Decision Tree Algorithm

• Iterative Dichotomizer 3 (ID3)
• ID3 was developed in 1986 by Ross Quinlan
• Classic golf dataset: decide whether to play golf or not based on four attributes: Temperature, Humidity, Wind, and Outlook
• Two key questions:
• Where to split the data
• When to stop splitting

Sources:
https://en.wikipedia.org/wiki/ID3_algorithm
http://www.saedsayad.com/decision_tree.htm

Entropy Calculation using Frequency Table

• Calculate Entropy using the frequency table of the target variable
• Calculate Entropy given the frequency table of one feature variable and the target variable

[Frequency tables: one for the Target variable alone, and one cross-tabulating a Feature variable against the Target]

Information Gain/Entropy Reduction

• The information gain is based on the decrease in entropy after a dataset is split on an attribute (information gain = entropy reduction)
• Constructing a decision tree is all about finding the attribute that returns the highest information gain at each split

The Information Gain when splitting on the Outlook attribute is 0.247
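To make the 0.247 figure concrete, here is a small sketch (not from the slides) that builds the frequency tables from the classic golf dataset and computes the information gain of every attribute; the row values assume the commonly cited version of that dataset.

```python
import math
from collections import Counter, defaultdict

# Classic "play golf" dataset (14 rows), in its commonly cited form.
# Columns: Outlook, Temperature, Humidity, Windy -> Play
rows = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the target minus the weighted entropy after splitting on one attribute."""
    target = [r[-1] for r in rows]
    groups = defaultdict(list)          # frequency table: attribute value -> target labels
    for r in rows:
        groups[r[attr_index]].append(r[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(target) - weighted

for i, name in enumerate(attributes):
    print(f"{name}: {information_gain(rows, i):.3f}")
# Outlook: 0.247, Temperature: 0.029, Humidity: 0.152, Windy: 0.048
# Outlook has the largest gain, so it becomes the root decision node.
```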

- Choose the attribute with the largest information gain as the decision node
- Divide the dataset by its branches and repeat the same process on every branch

[Diagram: after the first split, one branch is already pure (entropy is 0) and needs no more splitting, while the other branches still need further splitting]

• A branch with an entropy of 0 is a leaf node
• A branch with an entropy of more than 0 needs further splitting
• Repeat recursively on the non-leaf branches until all data is classified (a recursive sketch follows below)

[Diagram: Windy is chosen to be the next decision node on one of the impure branches]
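Putting the two key questions together (where to split, and when to stop), a minimal recursive ID3 sketch could look like the following; the id3 function, its data layout, and the tiny example dataset are illustrative assumptions, not the course's code.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def id3(rows, attrs):
    """rows: list of (feature_dict, label); attrs: attribute names still available to split on."""
    labels = [label for _, label in rows]
    # Stop: the branch is pure (entropy 0) or no attributes remain -> leaf with the majority label
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the largest information gain
    def gain(attr):
        groups = defaultdict(list)
        for feats, label in rows:
            groups[feats[attr]].append(label)
        weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(labels) - weighted
    best = max(attrs, key=gain)
    # Split on it and recurse on every branch
    branches = defaultdict(list)
    for feats, label in rows:
        branches[feats[best]].append((feats, label))
    remaining = [a for a in attrs if a != best]
    return {best: {value: id3(subset, remaining) for value, subset in branches.items()}}

# Tiny made-up example with two categorical attributes
data = [
    ({"Outlook": "Sunny", "Windy": False}, "No"),
    ({"Outlook": "Sunny", "Windy": True}, "No"),
    ({"Outlook": "Overcast", "Windy": False}, "Yes"),
    ({"Outlook": "Rainy", "Windy": False}, "Yes"),
    ({"Outlook": "Rainy", "Windy": True}, "No"),
]
print(id3(data, ["Outlook", "Windy"]))
# {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': {'Windy': {False: 'Yes', True: 'No'}}}}
```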

Split Points for Numeric Variables

• Method 1: use the averages of adjacent sorted values as candidate split points
• For Temperature: first split at Avg(64, 65) = 64.5, then Avg(65, 68) = 66.5, etc.
• Method 2: discretize values via "binning"
• For Temperature: >= 80 is "Hot", between 70 and 79 is "Mild", less than 70 is "Cool"
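Both methods are easy to sketch in Python; the Temperature values below assume the commonly used numeric version of the golf dataset, and the bin boundaries follow the slide.

```python
# Sketch of both methods; the Temperature column assumes the common numeric golf dataset.
temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

# Method 1: candidate split points are the midpoints of adjacent sorted distinct values
values = sorted(set(temps))
candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
print(candidates)                     # [64.5, 66.5, 68.5, ...]

# Method 2: discretize ("bin") the values into Hot / Mild / Cool
def to_bin(t):
    if t >= 80:
        return "Hot"
    elif t >= 70:
        return "Mild"
    return "Cool"

print([to_bin(t) for t in temps])
```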

When to Stop?
• The ID3 algorithm may lead to a very complex tree that fits the training data well but has poor prediction performance on unseen data, which is called overfitting (we will discuss this more later)
• Pruning can be used to reduce overfitting (see the sketch below):
• Pre-pruning (stop growing the tree early), such as setting a maximum depth or a minimal information gain per split
• Post-pruning (fully grow the tree, then remove unimportant nodes), such as Minimal Cost-Complexity Pruning: https://scikit-learn.org/stable/modules/tree.html#minimal-cost-complexity-pruning
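As a rough illustration of both pruning styles with scikit-learn (the Iris data, the parameter values, and the choice of ccp_alpha are arbitrary and only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growing early via a maximum depth and a minimum impurity decrease per split
pre = DecisionTreeClassifier(max_depth=3, min_impurity_decrease=0.01, random_state=0)
pre.fit(X_train, y_train)
print("pre-pruned accuracy:", pre.score(X_test, y_test))

# Post-pruning: grow the full tree, then prune with minimal cost-complexity pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # pick one alpha just for illustration
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
post.fit(X_train, y_train)
print("post-pruned accuracy:", post.score(X_test, y_test))
```

In practice the ccp_alpha value would be chosen by cross-validation rather than picked from the middle of the path as done here.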

Questions?

