
Decision Tree (Classification Tree)

Gang Wang, Ph.D.


Associate Professor of Management Information Systems
University of Delaware
Predictive Analytics

• Predictive analytics problems can be largely categorized into either Class Prediction (Classification) or Numeric Prediction (Regression) problems
• In classification, we use the information in the sample data to sort the data into distinct classes (Classification Trees)
• Predict whether a customer will stay loyal or churn
• Predict whether a loan is going to default or not
• Predict whether a passenger of the Titanic is going to survive or not
• In numeric prediction, we try to predict the numeric value of a target/label attribute (Regression Trees)
• Predict the housing price
• Predict how much a customer is going to spend in the next quarter
• Decision Trees can be used for both classification and regression predictions, which is why they are often referred to as CART: Classification and Regression Trees (a brief sketch follows)
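As a rough illustration of the two prediction types, here is a minimal sketch using scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor; the toy features, labels, and parameter values are invented purely for illustration and are not from the slides.

```python
# Minimal sketch: the same tree idea applied to classification and to regression.
# The toy data below is made up for illustration only.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: predict churn (1) vs. loyal (0) from [age, balance in $1000s]
X_cls = [[25, 10], [40, 80], [35, 20], [50, 120], [23, 5], [60, 90]]
y_cls = [1, 0, 1, 0, 1, 0]
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_cls, y_cls)
print(clf.predict([[30, 15]]))   # predicted class label

# Regression: predict a house price (in $1000s) from [square feet, bedrooms]
X_reg = [[1000, 2], [1500, 3], [2000, 3], [2500, 4], [3000, 4]]
y_reg = [200, 260, 310, 380, 450]
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_reg, y_reg)
print(reg.predict([[1800, 3]]))  # predicted numeric value
```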

Classification Trees (Decision Trees)

• Decision Trees (DTs), or classification trees, are one of the most popular classification techniques
• Easy to set up
• Easy to interpret (especially for business users)
• Computationally cheap
• Almost all data mining packages include DTs

[Tree diagram: a root node at the top, internal decision nodes below it, and leaf nodes at the bottom]
Homogeneity/Purity of Data
• The basic idea of a decision tree is to split the data set based on the homogeneity (of the same kind) of the data, i.e., reducing "Impurity"
• Impurity (uncertainty) is at its maximum when all possible classes are equally represented, e.g., the same number of "default" and "not default" cases in the following example

[Diagram: the entire population is split on Balance >= 50k vs. Balance < 50k, and each branch is split again on Age >= 45 vs. Age < 45, until the resulting groups reach 100% purity (0% impurity)]
Entropy

• Entropy is one of the most common measures for calculating impurity. It was proposed by Claude Shannon, known as "the father of information theory"

Entropy = − Σ P(xᵢ) log₂ P(xᵢ), summed over all n classes,
where P(xᵢ) is the probability of class xᵢ in the data.

For example, a churn dataset has two classes:
- "Default" (14 cases)
- "Not Default" (16 cases)
Entropy (entire dataset) = −(14/30) log₂(14/30) − (16/30) log₂(16/30) ≈ 0.997
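As a quick check of this number, here is a minimal sketch in plain Python (not from the slides):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 14 "Default" cases and 16 "Not Default" cases, as in the example above
print(round(entropy([14, 16]), 3))   # 0.997
# Impurity is at its maximum (1.0) when the two classes are equally represented
print(entropy([15, 15]))             # 1.0
```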

ID3 Decision Tree Algorithm

• Iterative Dichotomizer 3 (ID3)
• ID3 was developed in 1986 by Ross Quinlan
• Classic golf dataset: decide whether to play golf or not based on four attributes: Temperature, Humidity, Wind, and Outlook
• Two key questions:
• Where to split the data
• When to stop splitting

Sources:
https://en.wikipedia.org/wiki/ID3_algorithm
http://www.saedsayad.com/decision_tree.htm

Entropy Calculation using Frequency Table

• Calculate Entropy using the frequency table of the target variable
• Calculate Entropy given the frequency table of one feature variable and the target variable

[Frequency tables: one for the Target variable alone, and one cross-tabulating a Feature variable against the Target]

Information Gain/Entropy Reduction

• The information gain is based on the decrease in entropy after a dataset is split on an attribute (information gain = entropy reduction)
• Constructing a decision tree is all about finding the attribute that returns the highest information gain at each split

The Information Gain when splitting on the Outlook attribute is 0.247
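To make the 0.247 figure concrete, here is a small sketch (not from the slides) that builds the frequency tables from the classic golf dataset and computes the information gain of every attribute; the row values assume the commonly cited version of that dataset.

```python
import math
from collections import Counter, defaultdict

# Classic "play golf" dataset (14 rows), in its commonly cited form.
# Columns: Outlook, Temperature, Humidity, Windy -> Play
rows = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the target minus the weighted entropy after splitting on one attribute."""
    target = [r[-1] for r in rows]
    groups = defaultdict(list)          # frequency table: attribute value -> target labels
    for r in rows:
        groups[r[attr_index]].append(r[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(target) - weighted

for i, name in enumerate(attributes):
    print(f"{name}: {information_gain(rows, i):.3f}")
# Outlook: 0.247, Temperature: 0.029, Humidity: 0.152, Windy: 0.048
# Outlook has the largest gain, so it becomes the root decision node.
```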

- Choose the attribute with the largest information gain as the decision node
- Divide the dataset by its branches and repeat the same process on every branch

[Diagram: after the first split, one branch is already pure (entropy is 0) and needs no more splitting, while the other branches still need further splitting]

• A branch with an entropy of 0 is a leaf node
• A branch with an entropy of more than 0 needs further splitting
• Repeat recursively on the non-leaf branches until all data is classified (a recursive sketch follows below)

[Diagram: Windy is chosen to be the next decision node on one of the impure branches]
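Putting the two key questions together (where to split, and when to stop), a minimal recursive ID3 sketch could look like the following; the id3 function, its data layout, and the tiny example dataset are illustrative assumptions, not the course's code.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def id3(rows, attrs):
    """rows: list of (feature_dict, label); attrs: attribute names still available to split on."""
    labels = [label for _, label in rows]
    # Stop: the branch is pure (entropy 0) or no attributes remain -> leaf with the majority label
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the largest information gain
    def gain(attr):
        groups = defaultdict(list)
        for feats, label in rows:
            groups[feats[attr]].append(label)
        weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(labels) - weighted
    best = max(attrs, key=gain)
    # Split on it and recurse on every branch
    branches = defaultdict(list)
    for feats, label in rows:
        branches[feats[best]].append((feats, label))
    remaining = [a for a in attrs if a != best]
    return {best: {value: id3(subset, remaining) for value, subset in branches.items()}}

# Tiny made-up example with two categorical attributes
data = [
    ({"Outlook": "Sunny", "Windy": False}, "No"),
    ({"Outlook": "Sunny", "Windy": True}, "No"),
    ({"Outlook": "Overcast", "Windy": False}, "Yes"),
    ({"Outlook": "Rainy", "Windy": False}, "Yes"),
    ({"Outlook": "Rainy", "Windy": True}, "No"),
]
print(id3(data, ["Outlook", "Windy"]))
# {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': {'Windy': {False: 'Yes', True: 'No'}}}}
```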

Split Points for Numeric Variables

• Method 1: use the averages of adjacent sorted values as candidate split points
• For Temperature: first split at Avg(64, 65) = 64.5, then Avg(65, 68) = 66.5, etc.
• Method 2: discretize values via "binning"
• For Temperature: >= 80 is "Hot", between 70 and 79 is "Mild", less than 70 is "Cool"
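Both methods are easy to sketch in Python; the Temperature values below assume the commonly used numeric version of the golf dataset, and the bin boundaries follow the slide.

```python
# Sketch of both methods; the Temperature column assumes the common numeric golf dataset.
temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

# Method 1: candidate split points are the midpoints of adjacent sorted distinct values
values = sorted(set(temps))
candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
print(candidates)                     # [64.5, 66.5, 68.5, ...]

# Method 2: discretize ("bin") the values into Hot / Mild / Cool
def to_bin(t):
    if t >= 80:
        return "Hot"
    elif t >= 70:
        return "Mild"
    return "Cool"

print([to_bin(t) for t in temps])
```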

When to Stop?
• The ID3 algorithm may lead to a very complex tree that fits the training data well but has poor prediction performance on unseen data, which is called overfitting (we will discuss this more later)
• Pruning can be used to reduce overfitting (see the sketch below):
• Pre-pruning (stop growing the tree early), such as setting a maximum depth or a minimal information gain per split
• Post-pruning (fully grow the tree, then remove unimportant nodes), such as Minimal Cost-Complexity Pruning: https://scikit-learn.org/stable/modules/tree.html#minimal-cost-complexity-pruning
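As a rough illustration of both pruning styles with scikit-learn (the Iris data, the parameter values, and the choice of ccp_alpha are arbitrary and only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growing early via a maximum depth and a minimum impurity decrease per split
pre = DecisionTreeClassifier(max_depth=3, min_impurity_decrease=0.01, random_state=0)
pre.fit(X_train, y_train)
print("pre-pruned accuracy:", pre.score(X_test, y_test))

# Post-pruning: grow the full tree, then prune with minimal cost-complexity pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # pick one alpha just for illustration
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
post.fit(X_train, y_train)
print("post-pruned accuracy:", post.score(X_test, y_test))
```

In practice the ccp_alpha value would be chosen by cross-validation rather than picked from the middle of the path as done here.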

Questions?

