
Decision Trees

Classification & Regression


Dr. Saed Sayad
University of Toronto
2010
saed.sayad@utoronto.ca

http://chem-eng.utoronto.ca/~datamining/
Decision Tree
A set of training examples is broken down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. At the end of the learning process, a decision tree covering the training set is returned.

(Mitchell, 1997)
Decision Tree - Classification

Dataset
Predictors Target

Outlook Temp. Humidity Windy Play Golf


Rainy Hot High False No
Rainy Hot High True No
Overcast Hot High False Yes
Sunny Mild High False Yes
Sunny Cool Normal False Yes
Sunny Cool Normal True No
Overcast Cool Normal True Yes
Rainy Mild High False No
Rainy Cool Normal False Yes
Sunny Mild Normal False Yes
Rainy Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Sunny Mild High True No

Decision Tree

Outlook
  Sunny -> Windy
    FALSE -> Play
    TRUE  -> Not Play
  Overcast -> Play
  Rainy -> Humidity
    High   -> Not Play
    Normal -> Play
Entropy

Entropy = -p log2(p) - q log2(q)

Example (p = q = 0.5):
Entropy = -0.5 log2(0.5) - 0.5 log2(0.5) = 1
Entropy – Frequency

E(S) = -Σ (i = 1..c) p_i log2(p_i)

Entropy(5, 3, 2) = Entropy(0.5, 0.3, 0.2)
  = -(0.5 * log2(0.5)) - (0.3 * log2(0.3)) - (0.2 * log2(0.2))
  = 1.49
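As a rough illustration (not part of the original slides), a few lines of Python reproduce this calculation from raw class counts; the function name entropy is my own:

import math

def entropy(counts):
    """Entropy of a class distribution given as raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]   # p*log2(p) -> 0 as p -> 0
    return -sum(p * math.log2(p) for p in probs)

print(round(entropy([5, 3, 2]), 2))   # 1.49, as on the slide
print(round(entropy([5, 9]), 2))      # 0.94, the Play Golf target entropy (next slide)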
Entropy - Target

The target column (Play Golf) has 14 values; sorting them gives 5 No and 9 Yes.

No:  5 / 14 = 0.36
Yes: 9 / 14 = 0.64

Entropy(PlayGolf) = Entropy(5, 9)
  = Entropy(0.36, 0.64)
  = -(0.36 log2(0.36)) - (0.64 log2(0.64))
  = 0.94
Frequency Tables

Outlook  vs Play Golf:  Sunny 3 Yes / 2 No,  Overcast 4 Yes / 0 No,  Rainy 2 Yes / 3 No
Temp.    vs Play Golf:  Hot 2 Yes / 2 No,  Mild 4 Yes / 2 No,  Cool 3 Yes / 1 No
Humidity vs Play Golf:  High 3 Yes / 4 No,  Normal 6 Yes / 1 No
Windy    vs Play Golf:  False 6 Yes / 2 No,  True 3 Yes / 3 No
Entropy – Frequency Table

Outlook vs Play Golf:  Sunny 3 Yes / 2 No (5),  Overcast 4 Yes / 0 No (4),  Rainy 2 Yes / 3 No (5);  total 14

E(T, X) = Σ (c in X) P(c) E(c)

E(PlayGolf, Outlook) = P(Sunny)*E(3,2) + P(Overcast)*E(4,0) + P(Rainy)*E(2,3)
  = (5/14)*0.971 + (4/14)*0.0 + (5/14)*0.971
  = 0.693
Information Gain

Gain(T, X) = Entropy(T) - Entropy(T, X)

G(PlayGolf, Outlook) = E(PlayGolf) - E(PlayGolf, Outlook)
  = 0.940 - 0.693 = 0.247
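A small Python sketch (not from the slides) of the two formulas above, using the Outlook frequency table; the variable names are my own and the numbers only reproduce the slide's example:

import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

outlook_table = {"Sunny": [3, 2], "Overcast": [4, 0], "Rainy": [2, 3]}   # [Yes, No]
n = sum(sum(v) for v in outlook_table.values())                          # 14 instances

e_target = entropy([9, 5])                                                # E(PlayGolf)
e_split = sum(sum(v) / n * entropy(v) for v in outlook_table.values())   # E(PlayGolf, Outlook)

print(round(e_split, 3))             # 0.694 (the slide rounds this to 0.693)
print(round(e_target - e_split, 3))  # Gain = 0.247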
Information Gain – the best predictor?

Outlook:  Sunny 3 Yes / 2 No,  Overcast 4 Yes / 0 No,  Rainy 2 Yes / 3 No   -> Gain = 0.247
Temp.:    Hot 2 Yes / 2 No,  Mild 4 Yes / 2 No,  Cool 3 Yes / 1 No          -> Gain = 0.029
Humidity: High 3 Yes / 4 No,  Normal 6 Yes / 1 No                           -> Gain = 0.152
Windy:    False 6 Yes / 2 No,  True 3 Yes / 3 No                            -> Gain = 0.048

Outlook has the highest information gain, so it is selected as the root node.
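For completeness, a hedged sketch that computes the gain of every predictor directly from the dataset on the earlier slide and confirms that Outlook scores highest; apart from the attribute values themselves, all names here are my own:

import math
from collections import Counter

data = [
    ("Rainy","Hot","High",False,"No"), ("Rainy","Hot","High",True,"No"),
    ("Overcast","Hot","High",False,"Yes"), ("Sunny","Mild","High",False,"Yes"),
    ("Sunny","Cool","Normal",False,"Yes"), ("Sunny","Cool","Normal",True,"No"),
    ("Overcast","Cool","Normal",True,"Yes"), ("Rainy","Mild","High",False,"No"),
    ("Rainy","Cool","Normal",False,"Yes"), ("Sunny","Mild","Normal",False,"Yes"),
    ("Rainy","Mild","Normal",True,"Yes"), ("Overcast","Mild","High",True,"Yes"),
    ("Overcast","Hot","Normal",False,"Yes"), ("Sunny","Mild","High",True,"No"),
]
attributes = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Windy": 3}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    target = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy([r[-1] for r in subset])
    return entropy(target) - remainder

for name, col in attributes.items():
    print(name, round(gain(data, col), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Windy 0.048 -> Outlook becomes the root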
Decision Tree – Root Node

Outlook (root)
  branches: Sunny, Overcast, Rainy
Dataset – Sorted by Outlook

Subset (Outlook = Overcast)

Temp. Humidity Windy Play Golf
Hot   High    FALSE  Yes
Cool  Normal  TRUE   Yes
Mild  High    TRUE   Yes
Hot   Normal  FALSE  Yes

All four Overcast instances are Yes, so the Overcast branch becomes a leaf: Play=Yes.
Subset (Outlook = Sunny)

Temp. Humidity Windy Play Golf
Mild  High    FALSE  Yes
Cool  Normal  FALSE  Yes
Cool  Normal  TRUE   No
Mild  Normal  FALSE  Yes
Mild  High    TRUE   No

Temp.:    Mild 2 Yes / 1 No,  Cool 1 Yes / 1 No    -> Gain = 0.02
Humidity: High 1 Yes / 1 No,  Normal 2 Yes / 1 No  -> Gain = 0.02
Windy:    False 3 Yes / 0 No,  True 0 Yes / 2 No   -> Gain = 0.97
Subset (Outlook = Sunny)

Windy has the highest gain, so the Sunny branch splits on Windy:

Outlook
  Sunny -> Windy
    FALSE -> Play=Yes
    TRUE  -> Play=No
  Overcast -> Play=Yes
  Rainy -> (still to split)
Subset (Outlook = Rainy)

Temp. Humidity Windy Play Golf
Hot   High    FALSE  No
Hot   High    TRUE   No
Mild  High    FALSE  No
Cool  Normal  FALSE  Yes
Mild  Normal  TRUE   Yes

Temp.:    Hot 0 Yes / 2 No,  Mild 1 Yes / 1 No,  Cool 1 Yes / 0 No  -> Gain = 0.57
Humidity: High 0 Yes / 3 No,  Normal 2 Yes / 0 No                   -> Gain = 0.97
Windy:    False 1 Yes / 2 No,  True 1 Yes / 1 No                    -> Gain = 0.02
Subset (Outlook = Rainy)

Humidity has the highest gain, so the Rainy branch splits on Humidity. The final tree:

Outlook
  Sunny -> Windy
    FALSE -> Play=Yes
    TRUE  -> Play=No
  Overcast -> Play=Yes
  Rainy -> Humidity
    High   -> Play=No
    Normal -> Play=Yes
Decision Rules

Each path from the root to a leaf of the tree corresponds to one rule:

R1: IF (Outlook=Sunny) AND (Windy=FALSE) THEN Play=Yes
R2: IF (Outlook=Sunny) AND (Windy=TRUE) THEN Play=No
R3: IF (Outlook=Overcast) THEN Play=Yes
R4: IF (Outlook=Rainy) AND (Humidity=High) THEN Play=No
R5: IF (Outlook=Rainy) AND (Humidity=Normal) THEN Play=Yes
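A minimal sketch of these five rules as a plain Python function (the function name and signature are my own, not from the slides):

def play_golf(outlook, windy, humidity):
    if outlook == "Overcast":
        return "Yes"                                   # R3
    if outlook == "Sunny":
        return "Yes" if not windy else "No"            # R1, R2
    if outlook == "Rainy":
        return "No" if humidity == "High" else "Yes"   # R4, R5
    raise ValueError(f"unknown outlook: {outlook}")

print(play_golf("Sunny", windy=False, humidity="High"))   # Yes (rule R1)
print(play_golf("Rainy", windy=True, humidity="Normal"))  # Yes (rule R5)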
Decision Tree - Issues

- Working with continuous attributes
- Overfitting and pruning
- Super attributes (attributes with many values)
- Working with missing values
- Attributes with different costs
Numeric Variables - Binning

Temp  B_Temp  Play Golf
85    80-90   No
80    80-90   No
83    80-90   Yes
70    70-80   Yes
68    60-70   Yes
65    60-70   No
64    60-70   Yes
72    70-80   No
69    60-70   Yes
75    70-80   Yes
75    70-80   Yes
72    70-80   Yes
81    80-90   Yes
71    70-80   No

B_Temp vs Play Golf:  60-70 3 Yes / 1 No,  70-80 4 Yes / 2 No,  80-90 2 Yes / 2 No
Continuous Attributes - Discretization

- Equal frequency: creates a set of N intervals, each containing the same number of elements.

- Equal width: the original range of values is divided into N intervals of the same width.

- Entropy based: for each numeric attribute, the instances are sorted and, for each possible threshold, a binary <, >= test is considered and evaluated in exactly the same way a categorical attribute would be.
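A rough, standard-library-only sketch of the first two strategies applied to the Temp column; bin_label reproduces the 10-wide bins used on the B_Temp slide, equal_frequency shows the alternative strategy, and all names are my own:

temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

def bin_label(v, width=10):
    """Equal-width binning: 64 -> '60-70', 72 -> '70-80', 85 -> '80-90'."""
    lo = (v // width) * width
    return f"{lo}-{lo + width}"

def equal_frequency(values, n_bins):
    """Equal-frequency binning: assign each value a bin index so bins have ~equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins

print([bin_label(t) for t in temps])   # matches the B_Temp column on the binning slide
print(equal_frequency(temps, 3))       # three groups of (nearly) equal size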
Avoid Overfitting

Overfitting occurs when the learning algorithm continues to develop hypotheses that reduce training-set error at the cost of increased test-set error. Strategies to avoid it:

- Stop growing the tree when a data split is not statistically significant (Chi2 test)
- Grow the full tree, then post-prune
- Minimum description length (MDL): minimize size(tree) + size(misclassifications(tree))
Avoid Overfitting - Post-Pruning

First build the full tree, then prune it.
- A fully-grown tree shows all attribute interactions
- Problem: some subtrees might be due to chance effects

Two pruning operations:
- Subtree replacement
- Subtree raising

Possible strategies:
- Error estimation
- Significance testing
- MDL principle
Error Estimation

Transformed value for f: (f - p) / sqrt(p(1 - p) / N)
(i.e. subtract the mean and divide by the standard deviation)

Resulting equation:
Pr[ -z <= (f - p) / sqrt(p(1 - p) / N) <= z ] = c

Solving for p:
p = ( f + z^2/(2N) ± z * sqrt(f/N - f^2/N + z^2/(4N^2)) ) / ( 1 + z^2/N )

(Witten & Eibe)
Error Estimation

- The error estimate for a subtree is the weighted sum of the error estimates for all its leaves.
- Error estimate for a node (upper bound):

e = ( f + z^2/(2N) + z * sqrt(f/N - f^2/N + z^2/(4N^2)) ) / ( 1 + z^2/N )

- If c = 25% then z = 0.69 (from the normal distribution)
- f is the error on the training data
- N is the number of instances covered by the leaf

(Witten & Eibe)
Error Estimation

At the parent node: f = 5/14, giving e = 0.46.

At its three leaves: f = 0.33, 0.5, 0.33, giving e = 0.47, 0.72, 0.47.

Combining the leaf estimates using the ratios 6:2:6 gives
(6/14)*0.47 + (2/14)*0.72 + (6/14)*0.47 = 0.51.

Since 0.46 < 0.51, the subtree is pruned.
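As a rough check (not from the slides), the estimate and the pruning decision above can be reproduced with a few lines of Python; the helper name is my own and z = 0.69 is the 25%-confidence value quoted earlier:

import math

def pessimistic_error(f, n, z=0.69):
    """Upper bound e on the true error, given observed error f over n instances."""
    return (f + z**2 / (2 * n)
            + z * math.sqrt(f / n - f**2 / n + z**2 / (4 * n**2))) / (1 + z**2 / n)

# Parent node: 14 instances, 5 misclassified if pruned to a leaf.
print(round(pessimistic_error(5 / 14, 14), 2))          # prints 0.45 here; the slide reports 0.46

# Its three leaves (errors 2/6, 1/2, 2/6), combined with weights 6:2:6.
leaves = [(2 / 6, 6), (1 / 2, 2), (2 / 6, 6)]
combined = sum(n / 14 * pessimistic_error(f, n) for f, n in leaves)
print(round(combined, 2))                               # 0.51 -> the subtree is pruned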
Subtree Replacement

(Witten & Eibe)
Super Attributes

- The information gain measure G(T, X) is biased toward attributes that have a large number of values over attributes that have a smaller number of values.
- These 'super attributes' are easily selected as the root, resulting in a broad tree that classifies the training data perfectly but performs poorly on unseen instances.
- We can penalize attributes with large numbers of values by using an alternative selection criterion called the gain ratio:

GainRatio(T, X) = Gain(T, X) / SplitInformation(T, X)
Super Attributes

Outlook vs Play Golf:  Sunny 3 Yes / 2 No (5),  Overcast 4 Yes / 0 No (4),  Rainy 2 Yes / 3 No (5);  total 14
Gain = 0.247

Split(T, X) = -Σ (c in X) P(c) log2(P(c))

Split(Play, Outlook) = -(5/14*log2(5/14) + 4/14*log2(4/14) + 5/14*log2(5/14))
  = 1.577

GainRatio(Play, Outlook) = 0.247 / 1.577 = 0.156
Super Attributes

Consider an ID attribute that takes a different value (id1 ... id14) for every one of the 14 instances, so each branch covers exactly one instance:

Entropy(Play, ID) = 0
Gain(Play, ID) = 0.94
Split(Play, ID) = -(1/14 * log2(1/14)) * 14 = 3.81
GainRatio(Play, ID) = 0.94 / 3.81 = 0.247
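A short sketch reproducing both gain-ratio calculations above; the helper name is my own, and the gains 0.247 and 0.94 are simply taken from the earlier slides rather than recomputed:

import math

def split_info(branch_sizes):
    n = sum(branch_sizes)
    return -sum(s / n * math.log2(s / n) for s in branch_sizes if s > 0)

# Outlook: branches of size 5 (Sunny), 4 (Overcast), 5 (Rainy); Gain = 0.247
print(round(split_info([5, 4, 5]), 3))            # 1.577
print(round(0.247 / split_info([5, 4, 5]), 3))    # 0.157 (the slide rounds to 0.156)

# An ID-like attribute: 14 branches of one instance each; Gain = 0.94
print(round(split_info([1] * 14), 2))             # 3.81
print(round(0.94 / split_info([1] * 14), 3))      # 0.247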
Attributes with Different Costs

Sometimes the best attribute for splitting the training examples is very costly to measure. To make the overall decision process more cost-effective, we may wish to penalize the information gain of an attribute by its cost:

G'(T, X) = G(T, X) / Cost(X)
Numeric Variables and Missing Values
Outlook Temp Humidity Windy Play Golf
Rainy 85 High False No
Rainy 80 High True No
Overcast ? High False Yes
Sunny 70 High False Yes
Sunny 68 ? False Yes
Sunny 65 Normal True No
Overcast 64 Normal True Yes
Rainy 72 High ? No
Rainy 69 Normal False Yes
Sunny ? Normal False Yes
Rainy 75 Normal True Yes
? 72 High True Yes
Overcast 81 Normal False Yes
Sunny 71 High True No

Missing Values

- For numerical variables, replace the missing value with the average or the median.

- For categorical variables, replace the missing value with:
  - the most common value, or
  - the most common value at node K

- Alternatively, impute with K Nearest Neighbors (KNN).
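A hedged sketch of the first two options using pandas; the DataFrame here is only a small excerpt of the table on the previous slide, with None standing in for the '?' cells, and everything else is my own:

import pandas as pd

df = pd.DataFrame({
    "Outlook":  ["Rainy", "Overcast", None, "Sunny"],
    "Temp":     [85, None, 72, 70],
    "Humidity": ["High", "High", None, "Normal"],
})

# Numeric column: replace missing values with the median (or the mean).
df["Temp"] = df["Temp"].fillna(df["Temp"].median())

# Categorical columns: replace missing values with the most common value.
df["Outlook"] = df["Outlook"].fillna(df["Outlook"].mode()[0])
df["Humidity"] = df["Humidity"].fillna(df["Humidity"].mode()[0])

print(df)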
Decision Tree - Regression

Dataset
Predictors Target

Outlook Temp. Humidity Windy Golf Players


Rainy Hot High False 25
Rainy Hot High True 30
Overcast Hot High False 46
Sunny Mild High False 45
Sunny Cool Normal False 52
Sunny Cool Normal True 23
Overcast Cool Normal True 43
Rainy Mild High False 35
Rainy Cool Normal False 38
Sunny Mild Normal False 46
Rainy Mild Normal True 48
Overcast Mild High True 52
Overcast Hot Normal False 44
Sunny Mild High True 30

Decision Tree - Regression

Outlook
  Sunny -> Windy
    FALSE -> 47.7
    TRUE  -> 26.5
  Overcast -> 46.3
  Rainy -> Temp.
    Cool -> 38
    Hot  -> 27.5
    Mild -> 41.5
Entropy versus Standard Deviation

Decision Trees:
  Classification:          Entropy    E = -Σ (i = 1..c) p_i log2(p_i)
  Classification (CHAID):  Chi2 test  X^2 = Σ_i Σ_j (O_ij - E_ij)^2 / E_ij
  Regression:              StDev      S = sqrt( Σ (x - μ)^2 / n )
Target – Standard Deviation & Average

Golf Players: 25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30

StDev = 9.32
Avg = 39.79
Standard Deviation Tables

Golf Players (StDev) by Outlook:   Overcast 3.49,  Rainy 7.78,  Sunny 10.87
Golf Players (StDev) by Temp.:     Cool 10.51,  Hot 8.95,  Mild 7.65
Golf Players (StDev) by Humidity:  High 9.36,  Normal 8.37
Golf Players (StDev) by Windy:     False 7.87,  True 10.59
Standard Deviation

Golf Players (StDev, count) by Outlook:  Overcast 3.49 (4),  Rainy 7.78 (5),  Sunny 10.87 (5);  total 14

S(T, X) = Σ (c in X) P(c) S(c)

S(Players, Outlook) = P(Overcast)*S(Overcast) + P(Rainy)*S(Rainy) + P(Sunny)*S(Sunny)
  = (4/14)*3.49 + (5/14)*7.78 + (5/14)*10.87
  = 7.66
Standard Deviation Reduction (SDR)

SDR(T, X) = S(T) - S(T, X)

SDR(Players, Outlook) = S(Players) - S(Players, Outlook)
  = 9.32 - 7.66 = 1.66
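A small sketch (not from the slides) that reproduces S(Players, Outlook) and the SDR above from the raw Golf Players values; the grouping and function names are my own:

import math

def stdev(values):
    """Population standard deviation, as used on the slides."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

players = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]
branches = {                       # Golf Players grouped by Outlook
    "Overcast": [46, 43, 52, 44],
    "Rainy":    [25, 30, 35, 38, 48],
    "Sunny":    [45, 52, 23, 46, 30],
}

s_target = stdev(players)
s_split = sum(len(v) / len(players) * stdev(v) for v in branches.values())

print(round(s_target, 2), round(s_split, 2))   # 9.32 7.66
print(round(s_target - s_split, 2))            # SDR = 1.66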
Standard Deviation Reduction – the best predictor?

Outlook (StDev):   Overcast 3.49,  Rainy 7.78,  Sunny 10.87   -> SDR = 1.66
Temp. (StDev):     Cool 10.51,  Hot 8.95,  Mild 7.65          -> SDR = 0.17
Humidity (StDev):  High 9.36,  Normal 8.37                    -> SDR = 0.28
Windy (StDev):     False 7.87,  True 10.59                    -> SDR = 0.29

Outlook gives the largest standard deviation reduction, so it is again selected as the root node.
Decision Tree – Root Node

Outlook (root)
  branches: Sunny, Overcast, Rainy
Dataset – Sorted by Outlook

Subset (Outlook = Sunny)

Temp. Humidity Windy Golf Players
Mild  High    FALSE  45
Cool  Normal  FALSE  52
Cool  Normal  TRUE   23
Mild  Normal  FALSE  46
Mild  High    TRUE   30

SD = 10.87

Temp. (StDev):     Cool 14.50,  Mild 7.32     -> SDR = 10.87 - ((2/5)*14.5 + (3/5)*7.32) = 0.678
Humidity (StDev):  High 7.50,  Normal 12.50   -> SDR = 10.87 - ((2/5)*7.5 + (3/5)*12.5) = 0.370
Windy (StDev):     False 3.09,  True 3.50     -> SDR = 10.87 - ((3/5)*3.09 + (2/5)*3.5) = 7.62
Subset (Outlook = Sunny)

Windy gives the largest SDR, so the Sunny branch splits on Windy; each leaf predicts the average of its instances:

Outlook
  Sunny -> Windy
    FALSE -> 47.7
    TRUE  -> 26.5
Subset (Outlook = Overcast)

The Overcast branch becomes a leaf that predicts the average of its four instances, 46.3:

Outlook
  Sunny -> Windy
    FALSE -> 47.7
    TRUE  -> 26.5
  Overcast -> 46.3
Subset (Outlook = Rainy)

Temp. Humidity Windy Golf Players
Hot   High    FALSE  25
Hot   High    TRUE   30
Mild  High    FALSE  35
Cool  Normal  FALSE  38
Mild  Normal  TRUE   48

StDev = 7.78

Temp. (StDev):     Cool 0,  Hot 2.5,  Mild 6.5  -> SDR = 7.78 - ((2/5)*2.5 + (2/5)*6.5) = 4.18
Humidity (StDev):  High 4.1,  Normal 5.0        -> SDR = 7.78 - ((3/5)*4.1 + (2/5)*5.0) = 3.32
Windy (StDev):     False 5.6,  True 9.0         -> SDR = 7.78 - ((3/5)*5.6 + (2/5)*9.0) = 0.82
Subset (Outlook = Rainy)

Temp. gives the largest SDR, so the Rainy branch splits on Temp. The final regression tree, with each leaf predicting the average of its instances:

Outlook
  Sunny -> Windy
    FALSE -> 47.7
    TRUE  -> 26.5
  Overcast -> 46.3
  Rainy -> Temp.
    Cool -> 38
    Hot  -> 27.5
    Mild -> 41.5
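Finally, a minimal sketch of the finished regression tree as a lookup function; the leaf values are the subset averages shown above, and the function name is my own:

def predicted_players(outlook, windy, temp):
    if outlook == "Overcast":
        return 46.3
    if outlook == "Sunny":
        return 47.7 if not windy else 26.5
    if outlook == "Rainy":
        return {"Cool": 38.0, "Hot": 27.5, "Mild": 41.5}[temp]
    raise ValueError(f"unknown outlook: {outlook}")

print(predicted_players("Sunny", windy=False, temp="Mild"))   # 47.7
print(predicted_players("Rainy", windy=True, temp="Hot"))     # 27.5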
