Lecture 07A - Decision Trees
Lecture’s Structure
◼ Unsupervised, Supervised and Semi-supervised Classification
◼ Definition of Decision Trees
◼ Methods for:
◼ Finding a consistent decision tree
◼ Finding a good decision tree
◼ Overfitting Problem
◼ Pruning Method
Copyright © Yudi Agusta, PhD, 2021
Decision Trees
◼ Given an event, predict its category.
Examples:
◼ Who will win a given ball game?
◼ How should we file a given e-mail?
◼ Event = list of features. Examples:
◼ Ball game: Who is the goalkeeper?
◼ E-mail: Who sent the e-mail?
Decision Trees
◼ Each event in the training data has its own category
◼ Use training data to build the decision tree
◼ Use the decision tree to predict the category of a new event
[Diagram: a new event is fed into the decision tree, which outputs its category]
Decision Trees
◼ A decision tree is a tree where:
◼ Each interior node is labeled with a feature
◼ Each arc out of an interior node is labeled with a feature
value for the node’s feature
◼ Each leaf is labeled with a category
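To make the definition concrete, here is a minimal Python sketch of how such a tree can be represented and traversed. The names and the nested-dict representation are assumptions for illustration, and the example tree is illustrative only (it is not claimed to fit any particular data set).

# A minimal sketch: interior nodes as dicts (a feature plus one branch per
# feature value), leaves as plain category labels.

def classify(node, event):
    """Follow the arc matching the event's feature value until a leaf is reached."""
    while isinstance(node, dict):          # interior node
        value = event[node["feature"]]
        node = node["branches"][value]     # follow the arc labeled with that value
    return node                            # leaf = category label

# Illustrative tree: split on 'Jenis Lantai', then on 'WC' for cement floors
tree = {
    "feature": "Jenis Lantai",
    "branches": {
        "Tanah": "Miskin",
        "Semen": {"feature": "WC",
                  "branches": {"Sendiri": "Tidak Miskin", "Bersama": "Miskin"}},
    },
}

print(classify(tree, {"Jenis Lantai": "Semen", "WC": "Sendiri"}))  # Tidak Miskin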
Entropy
◼ Entropy provides a measure of how much we know about the
category or how much information we gain
◼ On average, how many more yes/no questions do we need to ask
to determine the category?
◼ The more we know, the lower the entropy
◼ The entropy of a set of events/data E is H(E):
H(E) = − Σ_{c ∈ C} P(c) log2 P(c)
◼ Where P(c) is the probability that an event in E has category c
◼ Entropy can be read as the average amount of information we
expect to gain when we learn the category of an event
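Below is a minimal Python sketch of this formula. The helper name entropy, and estimating P(c) from category counts, are assumptions for illustration.

import math
from collections import Counter

def entropy(categories):
    """H(E) = - sum over c of P(c) * log2 P(c), with P(c) taken from counts."""
    total = len(categories)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(categories).values())

print(entropy(["Miskin"] * 4 + ["Tidak Miskin"] * 3))  # about 0.985 bits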
Entropy
◼ Example of Entropy calculation
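As an illustration (not necessarily the example used on the original slide), take the exercise table at the end of this lecture, which contains 4 ‘Miskin’ and 3 ‘Tidak Miskin’ events:

H(E) = −(4/7) log2(4/7) − (3/7) log2(3/7) ≈ 0.985 bits

If all seven events had the same category the entropy would be 0; if the two categories were equally likely it would be 1 bit.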
Information Gain
◼ How much information does a feature give us
about what the category should be?
◼ H(E)=entropy of event set E
◼ H(E|f)=expected entropy of event set E, once we
know the value of feature f.
◼ G(E,f)=H(E)-H(E|f)=the amount of new
information provided by feature f
◼ Split on the feature that maximises
information gain
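A minimal sketch of this split criterion for discrete-valued features, building on the entropy helper above. Representing events as dicts with a "Category" key is an assumption for illustration.

def information_gain(events, feature):
    """G(E, f) = H(E) - H(E|f), where H(E|f) weights the entropy of each
    feature-value subset by the probability of that value."""
    total = len(events)
    h_e = entropy([e["Category"] for e in events])
    h_e_given_f = 0.0
    for value in {e[feature] for e in events}:
        subset = [e["Category"] for e in events if e[feature] == value]
        h_e_given_f += (len(subset) / total) * entropy(subset)
    return h_e - h_e_given_f

# Split on the feature f with the largest information_gain(events, f)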
Information Gain
◼ The information gain for the variable ‘Jenis Lantai’ is
calculated from the conditional entropy:
◼ H(E|f) = Σ_v P(f = v) H(E | f = v), summed over the values v of feature f
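As an illustration (not necessarily the numbers on the original slide), using the exercise table at the end of this lecture: ‘Jenis Lantai’ is ‘Tanah’ for 3 of the 7 events (2 of them ‘Miskin’) and ‘Semen’ for the other 4 (2 of them ‘Miskin’), so

H(E|Jenis Lantai) = (3/7)·[−(2/3) log2(2/3) − (1/3) log2(1/3)] + (4/7)·[−(1/2) log2(1/2) − (1/2) log2(1/2)]
≈ (3/7)(0.918) + (4/7)(1.000) ≈ 0.965

G(E, Jenis Lantai) = H(E) − H(E|Jenis Lantai) ≈ 0.985 − 0.965 ≈ 0.020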
Information Gain
◼ The variables ‘Jumlah ART’ and ‘Luas Lantai’, on the other
hand, are continuous, so a continuous distribution such as the
Normal distribution has to be used to obtain P(c)
◼ For the variable ‘Jumlah ART’, H(E|f) is then computed in the same
way, with the probabilities estimated from the fitted distribution
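One possible way to realise this is sketched below; it is an assumption rather than the lecture's exact method. The idea: fit a Normal distribution to the feature within each category, turn the densities into P(c | x) with Bayes' rule, and approximate H(E|f) by averaging the entropy of P(c | x) over the training events.

import math
from collections import defaultdict

def normal_pdf(x, mean, std):
    std = max(std, 1e-6)                                  # guard against zero variance
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def conditional_entropy_continuous(values, categories):
    groups = defaultdict(list)
    for x, c in zip(values, categories):
        groups[c].append(x)
    n = len(values)
    params = {}                                           # P(c), mean, std per category
    for c, xs in groups.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[c] = (len(xs) / n, mean, math.sqrt(var))
    h = 0.0
    for x in values:                                      # average entropy of P(c | x)
        dens = {c: p * normal_pdf(x, m, s) for c, (p, m, s) in params.items()}
        z = sum(dens.values())
        h += -sum((d / z) * math.log2(d / z) for d in dens.values() if d > 0)
    return h / n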
Overfitting
◼ A decision tree may encode idiosyncrasies of the
training data.
◼ E.g., a subtree that exists only to fit a few idiosyncratic training events
Overfitting (2)
◼ Fully consistent decision trees tend to overfit
◼ Especially if the decision tree is big
◼ Or if there is not much training data
◼ Trade off full consistency for compactness:
◼ Larger decision trees can be more consistent
◼ Smaller decision trees generalise better
Pruning
◼ We can reduce overfitting by reducing the
size of the decision tree:
1. Create the complete decision tree
2. Discard any unreliable leaves
3. Repeat step (2) until the decision tree is compact
enough
◼ Which leaves are unreliable?
◼ Leaves with low counts?
◼ When should we stop?
◼ No more unreliable leaves?
Test Data
◼ How can we tell if we’re overfitting?
1. Use a subset of the training data to train the
model (the “training set”)
2. Use the rest of the training data to test for
overfitting (the “test set”)
3. For each leaf:
◼ Test the performance of the decision tree on the test set
without that leaf
◼ If the performance improves, discard the leaf
4. Repeat step (3) until no more leaves are discarded
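A minimal sketch of one step of this procedure, reusing the classify helper and the nested-dict tree from the earlier sketches; the names are assumptions for illustration.

def accuracy(tree, test_events):
    hits = sum(classify(tree, e) == e["Category"] for e in test_events)
    return hits / len(test_events)

def try_prune(tree, path, leaf_label, test_events):
    """Temporarily replace the subtree reached by following `path` (a list of
    feature values from the root) with the single leaf `leaf_label`; keep the
    change only if accuracy on the test set does not get worse."""
    parent = tree
    for value in path[:-1]:
        parent = parent["branches"][value]
    before = accuracy(tree, test_events)
    original = parent["branches"][path[-1]]
    parent["branches"][path[-1]] = leaf_label            # prune
    if accuracy(tree, test_events) >= before:
        return True                                      # keep the smaller tree
    parent["branches"][path[-1]] = original              # undo the pruning
    return False

Repeating this over all prunable subtrees until no replacement helps is essentially reduced-error pruning.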
Exercise
Jumlah ART | Jenis Lantai | WC      | Luas Lantai | Category
5          | Tanah        | Sendiri | 15          | Miskin
3          | Tanah        | Sendiri | 20          | Tidak Miskin
4          | Semen        | Sendiri | 50          | Tidak Miskin
1          | Semen        | Bersama | 40          | Tidak Miskin
1          | Tanah        | Sendiri | 30          | Miskin
2          | Semen        | Bersama | 40          | Miskin
8          | Semen        | Bersama | 18          | Miskin
(Jumlah ART = number of household members, Jenis Lantai = floor type, WC = toilet, Luas Lantai = floor area; Tanah = earth, Semen = cement, Sendiri = private, Bersama = shared, Miskin = poor, Tidak Miskin = not poor)
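For convenience, the same table can be written as Python data and fed to the information_gain sketch from earlier in this lecture; the key names are assumptions.

events = [
    {"Jumlah ART": 5, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 15, "Category": "Miskin"},
    {"Jumlah ART": 3, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 20, "Category": "Tidak Miskin"},
    {"Jumlah ART": 4, "Jenis Lantai": "Semen", "WC": "Sendiri", "Luas Lantai": 50, "Category": "Tidak Miskin"},
    {"Jumlah ART": 1, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 40, "Category": "Tidak Miskin"},
    {"Jumlah ART": 1, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 30, "Category": "Miskin"},
    {"Jumlah ART": 2, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 40, "Category": "Miskin"},
    {"Jumlah ART": 8, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 18, "Category": "Miskin"},
]

# e.g. information_gain(events, "Jenis Lantai") gives the gain for that split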
Exercise
◼ Compute the information gain of each
variable for the first split
Discussion Topics
◼ One unsupervised and one supervised
classification method have been introduced
(clustering and decision trees, respectively).
What is the main difference between the two
with regard to their possible applications?
◼ Think of a problem that can be solved
using decision trees.