
Decision Trees

Yudi Agusta, PhD


Data Mining, Lecture 07
Copyright © Yudi Agusta, PhD, 2021

Lecture’s Structure
◼ Unsupervised, Supervised and Semi-supervised Classification
◼ Definition of Decision Trees
◼ Methods for:
◼ Finding a consistent decision tree
◼ Finding a good decision tree
◼ Overfitting Problem
◼ Pruning Method

Supervised, Unsupervised, Semi-Supervised


◼ Supervised classification:
◼ Needs guidance (labels) from the data
◼ Each data point in the dataset being analysed has already been classified
◼ Needs a large amount of labelled data
◼ Unsupervised classification:
◼ Doesn’t need guidance (labels) from the data
◼ Each data point in the dataset being analysed is not classified
◼ A large amount of data is not necessarily needed
◼ Semi-supervised classification:
◼ Needs the user to set one or more parameters at the beginning of
the classification process
◼ Example: Clustering (fuzzy c-means)

Supervised, Unsupervised, Semi-Supervised


◼ Supervised Classification
◼ Neural Networks (NN)
◼ Decision Trees
◼ Naïve Bayes
◼ Support Vector Machine (SVM)
◼ Bayesian Networks (BN)
◼ Unsupervised Classification
◼ Clustering (Mixture Modelling)
◼ Self Organising Map (SOM)
◼ Semi-supervised Classification
◼ Analytical Hierarchy Process (AHP)
◼ Simple Additive Model (SAM)

Decision Trees
◼ Given an event, predict its category.
Examples:
◼ Who will win a given ball game?
◼ How should we file a given e-mail?
◼ Event = list of features. Examples:
◼ Ball game: Who is the goalkeeper?
◼ E-mail: Who sent the e-mail?

Decision Trees
◼ Each data point in the training data has its own category
◼ Use the training data to build the decision tree
◼ Use the decision tree model to predict the category of a
new event

New
Event

Training Events Decision


& Categories Tree Model

Category

Decision Trees
◼ A decision tree is a tree where:
◼ Each interior node is labeled with a feature
◼ Each arc out of an interior node is labeled with a feature
value for the node’s feature
◼ Each leaf is labeled with a category

Jumlah ART?
├─ <1.5    → Jenis Lantai?   (tanah → Miskin; semen/keramik → Tidak Miskin)
├─ 1.5–3.5 → WC?             (bersama → Miskin; sendiri → Tidak Miskin)
└─ >3.5    → Luas Lantai?    (<20.5 → Miskin; >20.5 → Tidak Miskin)
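To make the structure concrete, here is a minimal sketch (my own illustration, not part of the lecture) of how this example tree could be written down in Python as nested dictionaries; the feature names come from the slide, but the dictionary layout and key names are assumptions:

# Interior nodes name a feature and map each arc label (a feature value or
# range) to a sub-tree; leaves are plain category strings.
example_tree = {
    "feature": "Jumlah ART",
    "branches": {
        "<1.5": {
            "feature": "Jenis Lantai",
            "branches": {"tanah": "Miskin", "semen/keramik": "Tidak Miskin"},
        },
        "1.5-3.5": {
            "feature": "WC",
            "branches": {"bersama": "Miskin", "sendiri": "Tidak Miskin"},
        },
        ">3.5": {
            "feature": "Luas Lantai",
            "branches": {"<20.5": "Miskin", ">20.5": "Tidak Miskin"},
        },
    },
}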



An Example of Training Data


Jumlah ART Jenis Lantai WC Luas Lantai Category
5 Tanah Sendiri 15 Miskin
3 Tanah Sendiri 20 Tidak Miskin
4 Semen Sendiri 50 Tidak Miskin
1 Semen Bersama 40 Tidak Miskin
1 Tanah Sendiri 30 Miskin
2 Semen Bersama 40 Miskin
8 Semen Bersama 18 Miskin

Finding A Consistent Decision Tree


◼ There are too many possible decision trees to try them all
◼ Instead, build a decision tree top-down, using
recursive partitioning, as follows (a code sketch follows this list):
◼ If all of the events have the same category:
◼ Create a leaf with that category
◼ Else:
◼ Pick a feature, and create an interior node for that feature
◼ For each feature value:
◼ Go back to the first step to create a sub-tree for the events with
that feature value
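The recursive-partitioning procedure above could be sketched in Python roughly as follows. This is an illustrative skeleton under my own assumptions (events are dicts with a "Category" key, nodes use the nested-dict shape shown earlier, and choose_feature is supplied by the caller), not the lecture's exact algorithm:

from collections import Counter

def build_tree(events, features, choose_feature):
    """Build a decision tree by recursive partitioning."""
    categories = [e["Category"] for e in events]
    # If all of the events have the same category, create a leaf with it.
    if len(set(categories)) == 1:
        return categories[0]
    # If no features are left, fall back to the majority category (assumption).
    if not features:
        return Counter(categories).most_common(1)[0][0]
    # Otherwise pick a feature and create an interior node for it.
    feature = choose_feature(events, features)
    node = {"feature": feature, "branches": {}}
    remaining = [f for f in features if f != feature]
    # For each feature value, recurse on the events that have that value.
    for value in set(e[feature] for e in events):
        subset = [e for e in events if e[feature] == value]
        node["branches"][value] = build_tree(subset, remaining, choose_feature)
    return node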

Creating a Decision Tree


◼ Use training data consisting of events and
categories to create a new decision tree
◼ What properties do we want the decision tree
to have?
◼ It should be consistent with the training data
◼ Many decision trees are consistent
◼ It should be simple
◼ Occam’s Razor
◼ Simple models tend to generalise better
◼ In simpler decision trees, each node is based on more
data (on average)

Finding a Good Decision Tree


◼ What feature should we split on?
◼ We want a small decision tree
◼ Pick the feature that gives us the most information about
what the category should be
◼ 20 Questions
◼ I am thinking of a number from 1 to 1000
◼ You can ask yes/no questions
◼ What is the first question you would ask?
◼ On average, some questions give you more information than
others:
◼ Is it 752? Or Is it prime? Or Is it between 1 and 500?
◼ Pick a question that provides the most information

Entropy
◼ Entropy provides a measure of how much we know about the
category or how much information we gain
◼ On average, how many more yes/no questions do we need to ask
to determine the category?
◼ The more we know, the lower the entropy
◼ The entropy of a set of events/data E is H(E):

      H(E) = − Σ_{c∈C} P(c) log₂ P(c)

◼ Where P(c) is the probability that an event in E has category c
◼ Entropy is the expected number of bits of information (yes/no questions)
we need to determine the category of an event drawn from E

Entropy
◼ Example of an entropy calculation, using H(E) = − Σ_{c∈C} P(c) log₂ P(c)
◼ For the example training data above, H(E) is calculated as follows
(a code sketch follows below):
◼ H(E) = −[ P(tidak miskin) log₂ P(tidak miskin) + P(miskin) log₂ P(miskin) ]
◼ H(E) = −[ 3/7 log₂ 3/7 + 4/7 log₂ 4/7 ]
◼ H(E) = −[ 0.43 × (−1.22) + 0.57 × (−0.81) ]
◼ H(E) = 0.52 + 0.46 ≈ 0.99 bits
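As an illustration (not from the slides), the same calculation in Python, applied to the Category column of the example training data:

import math
from collections import Counter

def entropy(categories):
    """H(E) = -sum over categories c of P(c) * log2 P(c), in bits."""
    counts = Counter(categories)
    total = len(categories)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Category column of the example training data: 4 Miskin, 3 Tidak Miskin.
labels = ["Miskin", "Tidak Miskin", "Tidak Miskin", "Tidak Miskin",
          "Miskin", "Miskin", "Miskin"]
print(entropy(labels))  # ~0.985 bits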

Information Gain
◼ How much information does a feature give us
about what the category should be?
◼ H(E)=entropy of event set E
◼ H(E|f)=expected entropy of event set E, once we
know the value of feature f.
◼ G(E,f)=H(E)-H(E|f)=the amount of new
information provided by feature f
◼ Split on the feature that maximises
information gain

Information Gain
◼ Information Gain for the variable ‘Jenis Lantai’ is
calculated as follows (a code sketch follows below):
◼ H(E|f) = P(tanah) × H(E|tanah) + P(semen) × H(E|semen), where
◼ H(E|tanah) = −[ P(tidak miskin|tanah) log₂ P(tidak miskin|tanah) + P(miskin|tanah) log₂ P(miskin|tanah) ]
◼ H(E|semen) = −[ P(tidak miskin|semen) log₂ P(tidak miskin|semen) + P(miskin|semen) log₂ P(miskin|semen) ]
◼ H(E|tanah) = −[ 1/3 log₂ 1/3 + 2/3 log₂ 2/3 ] ≈ 0.92
◼ H(E|semen) = −[ 1/2 log₂ 1/2 + 1/2 log₂ 1/2 ] = 1.00
◼ H(E|f) = 3/7 × 0.92 + 4/7 × 1.00 ≈ 0.965
◼ G(E, Jenis Lantai) = H(E) − H(E|f) ≈ 0.985 − 0.965 ≈ 0.02
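A companion sketch (again my own illustration) that computes H(E|f) and the resulting gain directly from the example training data; the list-of-dicts encoding of the table is an assumption:

import math
from collections import Counter

def entropy(categories):  # same helper as in the entropy sketch above
    counts, total = Counter(categories), len(categories)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(events, feature):
    """G(E, f) = H(E) - H(E|f), weighting each feature value v by P(v)."""
    total = len(events)
    h_e = entropy([e["Category"] for e in events])
    h_e_given_f = 0.0
    for value in set(e[feature] for e in events):
        subset = [e["Category"] for e in events if e[feature] == value]
        h_e_given_f += (len(subset) / total) * entropy(subset)
    return h_e - h_e_given_f

# The example training data as a list of dicts.
data = [
    {"Jumlah ART": 5, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 15, "Category": "Miskin"},
    {"Jumlah ART": 3, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 20, "Category": "Tidak Miskin"},
    {"Jumlah ART": 4, "Jenis Lantai": "Semen", "WC": "Sendiri", "Luas Lantai": 50, "Category": "Tidak Miskin"},
    {"Jumlah ART": 1, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 40, "Category": "Tidak Miskin"},
    {"Jumlah ART": 1, "Jenis Lantai": "Tanah", "WC": "Sendiri", "Luas Lantai": 30, "Category": "Miskin"},
    {"Jumlah ART": 2, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 40, "Category": "Miskin"},
    {"Jumlah ART": 8, "Jenis Lantai": "Semen", "WC": "Bersama", "Luas Lantai": 18, "Category": "Miskin"},
]
print(information_gain(data, "Jenis Lantai"))  # ~0.02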

Information Gain
◼ The variables ‘Jumlah ART’ and ‘Luas Lantai’, on the other hand, have to
use a continuous distribution such as the Normal distribution to obtain P(c)
◼ For the variable ‘Jumlah ART’: H(E|f) =
◼ P(tidak miskin|3) log P(tidak miskin|3) + ...
◼ For the variable ‘Luas Lantai’: H(E|f) =
◼ P(tidak miskin|20) log P(tidak miskin|20) + ...
◼ Where P(tidak miskin|3) and P(tidak miskin|20) are
calculated using the Normal distribution (one possible reading is sketched below)
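One plausible reading of this (my own sketch, not the lecture's exact procedure) is to fit a Normal density to each class's values of the continuous feature and turn the class-conditional densities into P(c|x) using the class priors. The function below reuses the `data` list from the previous sketch; all names are illustrative:

import math
from statistics import mean, stdev

def normal_pdf(x, mu, sigma):
    """Density of the Normal distribution with mean mu and std dev sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p_class_given_value(events, feature, x):
    """P(c | feature = x) from class-conditional Normal densities and class priors."""
    scores = {}
    for c in set(e["Category"] for e in events):
        values = [e[feature] for e in events if e["Category"] == c]
        prior = len(values) / len(events)
        scores[c] = prior * normal_pdf(x, mean(values), stdev(values))
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(p_class_given_value(data, "Luas Lantai", 20))  # e.g. P(miskin | 20), P(tidak miskin | 20)
print(p_class_given_value(data, "Jumlah ART", 3))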

Information Gain Ratio


◼ But: information gain is biased towards features that have many
values
◼ The maximum information gain for a feature depends on the number of
feature values:
◼ Max information gain for a binary feature is 1
◼ Max information gain for a feature with 1024 values is 10
◼ Use the information gain ratio to remove this bias:

      GR(E, f) = G(E, f) / ( − Σ_{v∈V} P(v) log₂ P(v) )

◼ Where v ranges over V, the set of all values of the feature considered
during the calculation (a code sketch follows below)
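Continuing the earlier sketches (my own illustration, reusing `information_gain` and `data` from above):

import math
from collections import Counter

def split_information(events, feature):
    """- sum over values v of P(v) * log2 P(v): the entropy of the feature's own values."""
    total = len(events)
    counts = Counter(e[feature] for e in events)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain_ratio(events, feature):
    """GR(E, f) = G(E, f) divided by the split information of f."""
    return information_gain(events, feature) / split_information(events, feature)

print(gain_ratio(data, "Jenis Lantai"))  # ~0.02 for this small example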

Overfitting
◼ Decision trees may encode idiosyncrasies of the
training data.
◼ E.g., consider this sub-tree:
Jumlah ART?
├─ <1.5    → Jenis Lantai?   (tanah → Miskin; semen/keramik → Miskin)
├─ 1.5–3.5 → WC?             (bersama → Miskin; sendiri → Miskin)
└─ >3.5    → Luas Lantai?    (<20.5 → Miskin; >20.5 → Tidak Miskin)

◼ How much should we trust the “Tidak Miskin” leaf, if it is
based on just one training sample?

Overfitting (2)
◼ Fully consistent decision trees tend to over-fit
◼ Especially if the decision tree is big
◼ Or if there is not much training data
◼ Trade off full consistency for compactness:
◼ Larger decision trees can be more consistent
◼ Smaller decision trees generalise better

Pruning
◼ We can reduce overfitting by reducing the
size of the decision tree:
◼ Create the complete decision tree
◼ Discard any unreliable leaves
◼ Repeat the previous step until the decision tree is compact
enough
◼ Which leaves are unreliable?
◼ Leaves with low counts?
◼ When should we stop?
◼ No more unreliable leaves?

Test Data
◼ How can we tell if we’re overfitting?
◼ Use a subset of the training data to train the
model (the “training set”)
◼ Use the rest of the training data to test for
overfitting (the “test set”)
◼ For each leaf:
◼ Test the performance of the decision tree on the test set
without that leaf
◼ If the performance improves, discard the leaf
◼ Repeat this check until no more leaves are discarded
(a rough sketch of this procedure follows below)
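A rough sketch of this test-set ("reduced-error" style) pruning idea, under my own assumptions: trees use the nested-dict shape from the earlier sketches, features are treated as categorical (exact value match, with every test value assumed to have a branch), and a subtree is collapsed to its majority-class leaf whenever that leaf makes no more mistakes on the test events reaching it:

from collections import Counter

def majority_category(events):
    return Counter(e["Category"] for e in events).most_common(1)[0][0]

def classify(node, event):
    # Follow the arc matching the event's value for each node's feature.
    while isinstance(node, dict):
        node = node["branches"][event[node["feature"]]]
    return node

def prune(node, train_events, test_events):
    """Return a (possibly smaller) tree after bottom-up reduced-error pruning."""
    if not isinstance(node, dict):                       # a leaf: nothing to prune
        return node
    feature = node["feature"]
    for value in list(node["branches"]):                 # prune the children first
        tr = [e for e in train_events if e[feature] == value]
        te = [e for e in test_events if e[feature] == value]
        node["branches"][value] = prune(node["branches"][value], tr, te)
    if not train_events or not test_events:              # no evidence either way: keep (assumption)
        return node
    leaf = majority_category(train_events)
    leaf_errors = sum(1 for e in test_events if e["Category"] != leaf)
    subtree_errors = sum(1 for e in test_events if classify(node, e) != e["Category"])
    # Replace the subtree by the leaf if the leaf does at least as well on the test set.
    return leaf if leaf_errors <= subtree_errors else node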

Using A Decision Tree


◼ Given a new event (= list of feature
values):
◼ Start at the root
◼ At each interior node, follow the outgoing
arc for the feature value that matches our
event
◼ When we reach a leaf node, return its
category as the result of classification
using the decision tree model (see the sketch below)
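Putting the earlier sketches together (illustrative only; `build_tree`, `information_gain`, and `data` are the hypothetical helpers defined above), classification is just this traversal:

def classify(node, event):
    """Start at the root; at each interior node follow the arc whose label matches
    the event's value for that node's feature; return the leaf's category."""
    while isinstance(node, dict):
        node = node["branches"][event[node["feature"]]]
    return node

# Grow a tree on the two categorical features only (numeric thresholds such as
# 'Luas Lantai < 20.5' would need a comparison at each node instead of an exact match).
pick = lambda ev, fs: max(fs, key=lambda f: information_gain(ev, f))
tree = build_tree(data, ["Jenis Lantai", "WC"], pick)
print(classify(tree, {"Jenis Lantai": "Semen", "WC": "Bersama"}))  # "Miskin" for these sketches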

When are Decision Trees Useful


◼ Disadvantages of decision trees:
◼ Problems with sparse data & overfitting
◼ Pruning helps these problems, but doesn’t solve them
◼ Don’t (directly) combine information about
different features
◼ Advantages of decision trees:
◼ Fast
◼ Easy to interpret
◼ Helps us understand what is important in new domains
◼ We can explain the “why” of how the decision tree
makes decisions

Exercise
Jumlah ART Jenis Lantai WC Luas Lantai Category
5 Tanah Sendiri 15 Miskin
3 Tanah Sendiri 20 Tidak Miskin
4 Semen Sendiri 50 Tidak Miskin
1 Semen Bersama 40 Tidak Miskin
1 Tanah Sendiri 30 Miskin
2 Semen Bersama 40 Miskin
8 Semen Bersama 18 Miskin

Exercise
◼ Compute the information gain of each
variable for the first split

Discussion Topics
◼ Two classification methods, one unsupervised
and one supervised, have been introduced
(clustering and decision trees). What is the
main difference between the two with regard
to their possible applications?
◼ Think of a problem that can be solved
using decision trees.
