Lecture 07A - Decision Trees
Lecture’s Structure
◼ Unsupervised, Supervised and Semi-supervised Classification
◼ Definition of Decision Trees
◼ Methods for:
◼ Finding a consistent decision tree
◼ Finding a good decision tree
◼ Overfitting Problem
◼ Pruning Method
Copyright © Yudi Agusta, PhD, 2021
Decision Trees
◼ Given an event, predict its category.
Examples:
◼ Who will win a given ball game?
◼ How should we file a given e-mail?
◼ Event = list of features. Examples:
◼ Ball game: Who is the goalkeeper?
◼ E-mail: Who sent the e-mail?
Decision Trees
◼ Each event in the training data has its own category
◼ Use training data to build the decision tree
◼ Use the decision tree to predict the category of a new event
[Diagram: a new event is fed into the decision tree, which outputs its category]
Decision Trees
◼ A decision tree is a tree where:
◼ Each interior node is labeled with a feature
◼ Each arc out of an interior node is labeled with a feature
value for the node’s feature
◼ Each leaf is labeled with a category
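To make the definition concrete, here is a minimal Python sketch of how such a tree can be represented and traversed. The names and the nested-dict representation are assumptions for illustration, and the example tree is illustrative only (it is not claimed to fit any particular data set).

# A minimal sketch: interior nodes as dicts (a feature plus one branch per
# feature value), leaves as plain category labels.

def classify(node, event):
    """Follow the arc matching the event's feature value until a leaf is reached."""
    while isinstance(node, dict):          # interior node
        value = event[node["feature"]]
        node = node["branches"][value]     # follow the arc labeled with that value
    return node                            # leaf = category label

# Illustrative tree: split on 'Jenis Lantai', then on 'WC' for cement floors
tree = {
    "feature": "Jenis Lantai",
    "branches": {
        "Tanah": "Miskin",
        "Semen": {"feature": "WC",
                  "branches": {"Sendiri": "Tidak Miskin", "Bersama": "Miskin"}},
    },
}

print(classify(tree, {"Jenis Lantai": "Semen", "WC": "Sendiri"}))  # Tidak Miskin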
Entropy
◼ Entropy provides a measure of how much we know about the
category or how much information we gain
◼ On average, how many more yes/no questions do we need to ask
to determine the category?
◼ The more we know, the lower the entropy
◼ The entropy of a set of events/data E is H(E):
H(E) = − Σ_{c ∈ C} P(c) log2 P(c)
◼ Where P(c) is the probability that an event in E has category c
◼ Entropy can be read as the average amount of information we
expect to gain when we learn the category of an event
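Below is a minimal Python sketch of this formula. The helper name entropy, and estimating P(c) from category counts, are assumptions for illustration.

import math
from collections import Counter

def entropy(categories):
    """H(E) = - sum over c of P(c) * log2 P(c), with P(c) taken from counts."""
    total = len(categories)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(categories).values())

print(entropy(["Miskin"] * 4 + ["Tidak Miskin"] * 3))  # about 0.985 bits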
Entropy
◼ Example of Entropy calculation
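As an illustration (not necessarily the example used on the original slide), take the exercise table at the end of this lecture, which contains 4 ‘Miskin’ and 3 ‘Tidak Miskin’ events:

H(E) = −(4/7) log2(4/7) − (3/7) log2(3/7) ≈ 0.985 bits

If all seven events had the same category the entropy would be 0; if the two categories were equally likely it would be 1 bit.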
Information Gain
◼ How much information does a feature give us
about what the category should be?
◼ H(E)=entropy of event set E
◼ H(E|f)=expected entropy of event set E, once we
know the value of feature f.
◼ G(E,f)=H(E)-H(E|f)=the amount of new
information provided by feature f
◼ Split on the feature that maximises
information gain
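A minimal sketch of this split criterion for discrete-valued features, building on the entropy helper above. Representing events as dicts with a "Category" key is an assumption for illustration.

def information_gain(events, feature):
    """G(E, f) = H(E) - H(E|f), where H(E|f) weights the entropy of each
    feature-value subset by the probability of that value."""
    total = len(events)
    h_e = entropy([e["Category"] for e in events])
    h_e_given_f = 0.0
    for value in {e[feature] for e in events}:
        subset = [e["Category"] for e in events if e[feature] == value]
        h_e_given_f += (len(subset) / total) * entropy(subset)
    return h_e - h_e_given_f

# Split on the feature f with the largest information_gain(events, f)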
Information Gain
◼ The information gain for the variable ‘Jenis Lantai’ is
calculated from the conditional entropy:
◼ H(E|f) = Σ_v P(f = v) H(E | f = v), summed over the values v of feature f
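As an illustration (not necessarily the numbers on the original slide), using the exercise table at the end of this lecture: ‘Jenis Lantai’ is ‘Tanah’ for 3 of the 7 events (2 of them ‘Miskin’) and ‘Semen’ for the other 4 (2 of them ‘Miskin’), so

H(E|Jenis Lantai) = (3/7)·[−(2/3) log2(2/3) − (1/3) log2(1/3)] + (4/7)·[−(1/2) log2(1/2) − (1/2) log2(1/2)]
≈ (3/7)(0.918) + (4/7)(1.000) ≈ 0.965

G(E, Jenis Lantai) = H(E) − H(E|Jenis Lantai) ≈ 0.985 − 0.965 ≈ 0.020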
Information Gain
◼ The variables ‘Jumlah ART’ and ‘Luas Lantai’, on the other
hand, are continuous, so a continuous distribution such as the
Normal distribution has to be used to obtain P(c)
◼ For the variable ‘Jumlah ART’, H(E|f) is then computed in the same
way, with the probabilities estimated from the fitted distribution
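One possible way to realise this is sketched below; it is an assumption rather than the lecture's exact method. The idea: fit a Normal distribution to the feature within each category, turn the densities into P(c | x) with Bayes' rule, and approximate H(E|f) by averaging the entropy of P(c | x) over the training events.

import math
from collections import defaultdict

def normal_pdf(x, mean, std):
    std = max(std, 1e-6)                                  # guard against zero variance
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def conditional_entropy_continuous(values, categories):
    groups = defaultdict(list)
    for x, c in zip(values, categories):
        groups[c].append(x)
    n = len(values)
    params = {}                                           # P(c), mean, std per category
    for c, xs in groups.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[c] = (len(xs) / n, mean, math.sqrt(var))
    h = 0.0
    for x in values:                                      # average entropy of P(c | x)
        dens = {c: p * normal_pdf(x, m, s) for c, (p, m, s) in params.items()}
        z = sum(dens.values())
        h += -sum((d / z) * math.log2(d / z) for d in dens.values() if d > 0)
    return h / n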
Overfitting
◼ A decision tree may encode idiosyncrasies of the
training data.
◼ E.g., a subtree that exists only to fit a few idiosyncratic training events
Overfitting (2)
◼ Fully consistent decision trees tend to overfit
◼ Especially if the decision tree is big
◼ Or if there is not much training data
◼ Trade off full consistency for compactness:
◼ Larger decision trees can be more consistent
◼ Smaller decision trees generalise better
Pruning
◼ We can reduce overfitting by reducing the
size of the decision tree:
1. Create the complete decision tree
2. Discard any unreliable leaves
3. Repeat step (2) until the decision tree is compact
enough
◼ Which leaves are unreliable?
◼ Leaves with low counts?
◼ When should we stop?
◼ No more unreliable leaves?
Test Data
◼ How can we tell if we’re overfitting?
1. Use a subset of the training data to train the
model (the “training set”)
2. Use the rest of the training data to test for
overfitting (the “test set”)
3. For each leaf:
◼ Test the performance of the decision tree on the test set
without that leaf
◼ If the performance improves, discard the leaf
4. Repeat step (3) until no more leaves are discarded
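A minimal sketch of one step of this procedure, reusing the classify helper and the nested-dict tree from the earlier sketches; the names are assumptions for illustration.

def accuracy(tree, test_events):
    hits = sum(classify(tree, e) == e["Category"] for e in test_events)
    return hits / len(test_events)

def try_prune(tree, path, leaf_label, test_events):
    """Temporarily replace the subtree reached by following `path` (a list of
    feature values from the root) with the single leaf `leaf_label`; keep the
    change only if accuracy on the test set does not get worse."""
    parent = tree
    for value in path[:-1]:
        parent = parent["branches"][value]
    before = accuracy(tree, test_events)
    original = parent["branches"][path[-1]]
    parent["branches"][path[-1]] = leaf_label            # prune
    if accuracy(tree, test_events) >= before:
        return True                                      # keep the smaller tree
    parent["branches"][path[-1]] = original              # undo the pruning
    return False

Repeating this over all prunable subtrees until no replacement helps is essentially reduced-error pruning.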
Exercise
Jumlah ART | Jenis Lantai | WC      | Luas Lantai | Category
5          | Tanah        | Sendiri | 15          | Miskin
3          | Tanah        | Sendiri | 20          | Tidak Miskin
4          | Semen        | Sendiri | 50          | Tidak Miskin
1          | Semen        | Bersama | 40          | Tidak Miskin
1          | Tanah        | Sendiri | 30          | Miskin
2          | Semen        | Bersama | 40          | Miskin
8          | Semen        | Bersama | 18          | Miskin
(Jumlah ART = number of household members, Jenis Lantai = floor type, WC = toilet, Luas Lantai = floor area; Tanah = earth, Semen = cement, Sendiri = private, Bersama = shared, Miskin = poor, Tidak Miskin = not poor)
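For convenience, the same table can be written as Python data and fed to the information_gain sketch from earlier in this lecture; the key names are assumptions.

events = [
    {"Jumlah ART": 5, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 15, "Category": "Miskin"},
    {"Jumlah ART": 3, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 20, "Category": "Tidak Miskin"},
    {"Jumlah ART": 4, "Jenis Lantai": "Semen", "WC": "Sendiri", "Luas Lantai": 50, "Category": "Tidak Miskin"},
    {"Jumlah ART": 1, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 40, "Category": "Tidak Miskin"},
    {"Jumlah ART": 1, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 30, "Category": "Miskin"},
    {"Jumlah ART": 2, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 40, "Category": "Miskin"},
    {"Jumlah ART": 8, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 18, "Category": "Miskin"},
]

# e.g. information_gain(events, "Jenis Lantai") gives the gain for that split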
Exercise
◼ Compute the information gain of each
variable for the first split
Discussion Topics
◼ One unsupervised and one supervised
classification method have been introduced
(clustering and decision trees, respectively).
What is the main difference between the two
with regard to their possible applications?
◼ Think of a problem that can be solved
using decision trees.