Decision Trees
Decision Tree

"A set of training examples is broken down into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. At the end of the learning process, a decision tree covering the training set is returned." (Mitchell, 1997)
Decision Tree - Classification
Dataset

[Figure: the 14-instance golf dataset, with predictors (Outlook, Temp., Humidity, Windy) and the target (Play Golf)]
Decision Tree

[Figure: the full decision tree for the golf dataset, with Outlook at the root]
Entropy

E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
Entropy - Target

Sorting the 14 values of the target column gives 5 "No" (5 / 14 = 0.36) and 9 "Yes" (9 / 14 = 0.64).

Entropy(PlayGolf) = Entropy(5, 9)
                  = Entropy(0.36, 0.64)
                  = -(0.36 log2 0.36) - (0.64 log2 0.64)
                  = 0.94
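A minimal sketch of this calculation in Python (the helper name `entropy` is my own; the slide shows only the arithmetic):

```python
import math

def entropy(counts):
    """Entropy of a class distribution given raw counts, in bits."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # skip zero counts: 0*log(0) = 0
    return -sum(p * math.log2(p) for p in probs)

# Target column: 5 "No" and 9 "Yes" out of 14 instances
print(entropy([5, 9]))  # ~0.940
```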
Frequency Tables
Entropy – Frequency Table

                    Play Golf
                    Yes  No
          Sunny      3    2    5
Outlook   Overcast   4    0    4
          Rainy      2    3    5
                              14

E(T, X) = \sum_{c \in X} P(c) E(c)

E(PlayGolf, Outlook) = P(Sunny)·E(3,2) + P(Overcast)·E(4,0) + P(Rainy)·E(2,3)
                     = (5/14)·0.971 + (4/14)·0.000 + (5/14)·0.971
                     = 0.693
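A sketch of the weighted entropy and the resulting gain, reusing the `entropy` helper from the sketch above (the counts come from the frequency table):

```python
def weighted_entropy(partition):
    """E(T, X): entropy after splitting T on X; `partition` maps each
    value of X to its class counts. Reuses entropy() from above."""
    total = sum(sum(counts) for counts in partition.values())
    return sum(sum(counts) / total * entropy(counts)
               for counts in partition.values())

outlook = {"Sunny": [3, 2], "Overcast": [4, 0], "Rainy": [2, 3]}
print(weighted_entropy(outlook))                 # ~0.693
print(entropy([9, 5]) - weighted_entropy(outlook))  # information gain ~0.247
```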
Information Gain

Gain(T, X) = E(T) - E(T, X)

The gain of a split on X is the decrease in entropy: the entropy of the parent set minus the weighted entropy of the subsets produced by X.
Information Gain – the best predictor?

Computed from the frequency tables: Gain(Outlook) = 0.247, Gain(Humidity) = 0.152, Gain(Windy) = 0.048, Gain(Temp.) = 0.029. Outlook has the largest information gain, so it is selected as the root node.
Decision Tree – Root Node

[Figure: tree with root node Outlook and branches Sunny, Overcast, Rainy]
Dataset – Sorted by Outlook
Subset (Outlook = Overcast)

Temp.  Humidity  Windy  Play Golf
Hot    High      FALSE  Yes
Cool   Normal    TRUE   Yes
Mild   High      TRUE   Yes
Hot    Normal    FALSE  Yes

All four Overcast instances are Yes, so this branch becomes a leaf: Play=Yes.
Subset (Outlook = Sunny)

Temp.  Humidity  Windy  Play Golf
Mild   High      FALSE  Yes
Cool   Normal    FALSE  Yes
Cool   Normal    TRUE   No
Mild   Normal    FALSE  Yes
Mild   High      TRUE   No
Subset (Outlook = Sunny)

Sorted by Windy:

Temp.  Humidity  Windy  Play Golf
Mild   High      FALSE  Yes
Cool   Normal    FALSE  Yes
Mild   Normal    FALSE  Yes
Cool   Normal    TRUE   No
Mild   High      TRUE   No

Windy separates the Sunny subset perfectly, so the Sunny branch splits on Windy:

[Figure: Outlook → Sunny → Windy (FALSE → Play=Yes, TRUE → Play=No); Overcast → Play=Yes]
Subset (Outlook = Rainy)

Temp.  Humidity  Windy  Play Golf
Hot    High      FALSE  No
Hot    High      TRUE   No
Mild   High      FALSE  No
Cool   Normal    FALSE  Yes
Mild   Normal    TRUE   Yes

Frequency tables for the Rainy subset:

Temp. (Yes / No): Hot 0/2, Mild 1/1, Cool 1/0; Gain = 0.57
Humidity (Yes / No): High 0/3, Normal 2/0; Gain = 0.97
Windy (Yes / No): FALSE 1/2, TRUE 1/1; Gain = 0.02

Humidity gives the highest gain, so the Rainy branch splits on Humidity (see the sketch below).
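The same two helpers from earlier can rank the three candidate splits for this subset, using the counts from the tables above:

```python
rainy = {
    "Temp.":    {"Hot": [0, 2], "Mild": [1, 1], "Cool": [1, 0]},
    "Humidity": {"High": [0, 3], "Normal": [2, 0]},
    "Windy":    {"FALSE": [1, 2], "TRUE": [1, 1]},
}
subset_entropy = entropy([2, 3])  # 2 Yes, 3 No in the Rainy subset
for attr, part in rainy.items():
    print(attr, round(subset_entropy - weighted_entropy(part), 2))
# Temp. 0.57, Humidity 0.97, Windy 0.02 -> split on Humidity
```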
Subset (Outlook = Rainy)

[Figure: the finished tree: Outlook → Sunny → Windy (FALSE → Yes, TRUE → No); Overcast → Yes; Rainy → Humidity (High → No, Normal → Yes)]
Decision Rules

The tree can be read off as a set of rules, one per path from root to leaf:

R1: IF (Outlook=Sunny) AND (Windy=FALSE) THEN Play=Yes
R2: IF (Outlook=Sunny) AND (Windy=TRUE) THEN Play=No
R3: IF (Outlook=Overcast) THEN Play=Yes
R4: IF (Outlook=Rainy) AND (Humidity=High) THEN Play=No
R5: IF (Outlook=Rainy) AND (Humidity=Normal) THEN Play=Yes
Decision Tree - Issues
Working with Continuous Attributes
Overfitting and Pruning
Super Attributes (attributes with many values)
Working with Missing Values
Attributes with Different Costs
Numeric Variables - Binning

Temp  B_Temp  Play Golf
85    80-90   No
80    80-90   No
83    80-90   Yes
70    70-80   Yes
68    60-70   Yes
65    60-70   No
64    60-70   Yes
72    70-80   No
69    60-70   Yes
75    70-80   Yes
75    70-80   Yes
72    70-80   Yes
81    80-90   Yes
71    70-80   No

Frequency table for the binned attribute:

                Play Golf
                Yes  No
        60-70    3    1
B_Temp  70-80    4    2
        80-90    2    2
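A sketch of this binning with pandas; the bin edges 60/70/80/90 and `right=False` are my reading of the B_Temp column, not stated on the slide:

```python
import pandas as pd

temp = pd.Series([85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71])
play = pd.Series(["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
                  "Yes", "Yes", "Yes", "Yes", "Yes", "No"])

# Left-closed bins [60,70), [70,80), [80,90) so that 70 lands in 70-80
b_temp = pd.cut(temp, bins=[60, 70, 80, 90], right=False,
                labels=["60-70", "70-80", "80-90"])
print(pd.crosstab(b_temp, play))  # reproduces the frequency table above
```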
Continuous Attributes - Discretization

Equal frequency
This strategy creates a set of N intervals, each containing the same number of elements.

Equal width
The original range of values is divided into N intervals of the same width.

Entropy based
For each numeric attribute, instances are sorted and, for each possible threshold, a binary (<, >=) test is considered and evaluated in exactly the same way as a categorical attribute would be (see the sketch below).
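A sketch of the entropy-based strategy, reusing the `entropy` helper and the `temp`/`play` data from the earlier sketches; `best_threshold` is a hypothetical helper name:

```python
def best_threshold(values, labels):
    """Try a binary (<, >=) split at each candidate threshold and keep
    the one with the lowest weighted entropy."""
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v < thr]
        right = [l for v, l in pairs if v >= thr]
        e = sum(len(side) / len(pairs)
                * entropy([side.count(c) for c in classes])
                for side in (left, right))
        best = min(best, (e, thr))
    return best  # (weighted entropy, threshold)

print(best_threshold(list(temp), list(play)))
```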
Avoid Overfitting

Overfitting occurs when the learning algorithm continues to develop hypotheses that reduce training-set error at the cost of increased test-set error. Strategies:

Stop growing when the data split is not statistically significant (Chi² test, as sketched below)
Grow the full tree, then post-prune
Minimum description length (MDL):
Minimize: size(tree) + size(misclassifications(tree))
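A sketch of the significance check mentioned above, applied to the Outlook split; the slide names only the test, so the choice of SciPy's `chi2_contingency` is my own:

```python
from scipy.stats import chi2_contingency

# Contingency table for the Outlook split
# (rows: Sunny/Overcast/Rainy; columns: Yes/No counts)
table = [[3, 2], [4, 0], [2, 3]]
chi2, p_value, dof, expected = chi2_contingency(table)
print(round(chi2, 2), round(p_value, 3))
# Keep the split only if p_value is below the chosen significance level
```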
Avoid Overfitting - Post-Pruning

o First build the full tree, then prune it.
  A fully grown tree shows all attribute interactions.
  Problem: some subtrees might be due to chance effects.
o Two pruning operations:
  Subtree replacement
  Subtree raising
o Possible strategies:
  Error estimation
  Significance testing
  MDL principle
Error Estimation

• Transformed value for f:  \frac{f - p}{\sqrt{p(1-p)/N}}
  (i.e., subtract the mean and divide by the standard deviation)

• Resulting equation:  \Pr\left[-z \le \frac{f - p}{\sqrt{p(1-p)/N}} \le z\right] = c

• Solving for p:  p = \left(f + \frac{z^2}{2N} + z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}\right) \Big/ \left(1 + \frac{z^2}{N}\right)
(Witten & Eibe)
Error Estimation

• The error estimate for a subtree is the weighted sum of the error estimates for all its leaves.
• Error estimate for a node (upper bound):

  e = \left(f + \frac{z^2}{2N} + z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}\right) \Big/ \left(1 + \frac{z^2}{N}\right)

• If c = 25%, then z = 0.69 (from the normal distribution).
• f is the error on the training data.
• N is the number of instances covered by the leaf.
(Witten & Eibe)
Error Estimation

Worked example: for the node itself, f = 5/14 gives e = 0.46. The combined (weighted) error estimate of its leaves is 0.51. Since e = 0.46 < 0.51, prune the subtree!
(Witten & Eibe)
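A direct transcription of the bound into Python (the function name is my own). Evaluating it at f = 5/14, N = 14, z = 0.69 gives about 0.45; the slide reports 0.46:

```python
import math

def pessimistic_error(f, N, z=0.69):
    """Upper confidence bound on the error rate (c = 25% -> z = 0.69)."""
    return (f + z**2 / (2 * N)
            + z * math.sqrt(f / N - f**2 / N + z**2 / (4 * N**2))) \
           / (1 + z**2 / N)

print(pessimistic_error(5 / 14, 14))  # ~0.45 with this exact formula
```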
Super Attributes

The information gain measure G(T, X) is biased toward attributes with a large number of values over attributes with a smaller number of values. These 'super attributes' are easily selected as the root, resulting in a broad tree that classifies the training data perfectly but performs poorly on unseen instances. We can penalize attributes with large numbers of values by using an alternative selection measure, the gain ratio.
Super Attributes

                    Play Golf
                    Yes  No  Total
          Sunny      3    2    5
Outlook   Overcast   4    0    4
          Rainy      2    3    5

Gain = 0.247
Super Attributes

ID      Play Golf (Yes / No)  Total
id1     1 / 0                 1
id2     0 / 1                 1
id3     1 / 0                 1
id4     1 / 0                 1
id5     0 / 1                 1
id6     0 / 1                 1
id7     1 / 0                 1
id8     1 / 0                 1
id9     0 / 1                 1
id10    1 / 0                 1
id11    1 / 0                 1
id12    0 / 1                 1
id13    1 / 0                 1
id14    1 / 0                 1

Entropy(Play, ID) = 0
Gain(Play, ID) = 0.94
Split(Play, ID) = -(1/14 · log2(1/14)) · 14 = 3.81
GainRatio(Play, ID) = 0.94 / 3.81 = 0.247
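A sketch of the split information and gain ratio, reusing the `entropy` helper from earlier. The Outlook comparison on the last line is not on the slide, but follows from its numbers (branch sizes 5/4/5, gain 0.247):

```python
def split_info(sizes):
    """Intrinsic information of a split: the entropy of the branch sizes."""
    return entropy(sizes)

si_id = split_info([1] * 14)   # 14 one-instance branches for the ID attribute
print(si_id)                    # ~3.81
print(0.94 / si_id)             # gain ratio for ID: ~0.247
print(0.247 / split_info([5, 4, 5]))  # gain ratio for Outlook: ~0.157
```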
Attributes with Different Costs

G'(T, X) = G(T, X) / Cost(X)

Dividing the gain by the cost of measuring the attribute biases the selection toward cheaper attributes.
Numeric Variables and Missing Values
Outlook Temp Humidity Windy Play Golf
Rainy 85 High False No
Rainy 80 High True No
Overcast ? High False Yes
Sunny 70 High False Yes
Sunny 68 ? False Yes
Sunny 65 Normal True No
Overcast 64 Normal True Yes
Rainy 72 High ? No
Rainy 69 Normal False Yes
Sunny ? Normal False Yes
Rainy 75 Normal True Yes
? 72 High True Yes
Overcast 81 Normal False Yes
Sunny 71 High True No
Missing Values

• For numerical variables, replace the missing value with the average or the median.
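A minimal imputation sketch with pandas; the toy frame and the mode-based fill for categorical columns are my own additions, since the slide only prescribes the average or median for numeric values:

```python
import pandas as pd

df = pd.DataFrame({"Temp": [85, 80, None, 70, 68],
                   "Humidity": ["High", "High", "High", None, "Normal"]})

# Numerical: fill with the column median (or mean)
df["Temp"] = df["Temp"].fillna(df["Temp"].median())
# Categorical: one common choice is the most frequent value
df["Humidity"] = df["Humidity"].fillna(df["Humidity"].mode()[0])
print(df)
```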
Dataset

[Figure: the golf dataset for regression, with predictors (Outlook, Temp., Humidity, Windy) and the numeric target (Golf Players)]
Decision Tree - Regression

[Figure: regression tree with root node Outlook and branches Sunny, Overcast, Rainy]
Entropy versus Standard Deviation

Classification:

Entropy:  E = -\sum_{i=1}^{c} p_i \log_2 p_i

Chi² test:  \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

Regression:

StDev:  S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}
Target – Standard Deviation & Average

Golf Players: 25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30

StDev = 9.32
Avg = 39.79
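Both figures can be checked with the standard library; note that `pstdev`, the population standard deviation, is what matches the slide's 9.32:

```python
import statistics

players = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]
print(round(statistics.mean(players), 2))    # 39.79
print(round(statistics.pstdev(players), 2))  # 9.32 (population standard deviation)
```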
Standard Deviation Tables
Standard Deviation

                    Golf Players (StDev)  Count
          Overcast    3.49                  4
Outlook   Rainy       7.78                  5
          Sunny      10.87                  5
                                           14

S(T, X) = \sum_{c \in X} P(c) S(c) = 7.66
Standard Deviation Reduction (SDR)

SDR(T, X) = S(T) - S(T, X)
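A sketch of S(T, X) and the resulting SDR, using the per-branch standard deviations and counts from the table above (the helper name is my own):

```python
def weighted_stdev(groups, total):
    """S(T, X): weighted standard deviation after splitting T on X.
    `groups` maps each value of X to (stdev, count)."""
    return sum(count / total * stdev for stdev, count in groups.values())

outlook = {"Overcast": (3.49, 4), "Rainy": (7.78, 5), "Sunny": (10.87, 5)}
s_tx = weighted_stdev(outlook, 14)
print(round(s_tx, 2))         # 7.66
print(round(9.32 - s_tx, 2))  # SDR = 1.66
```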
Standard Deviation Reduction – the best predictor?

Outlook (StDev of Golf Players): Overcast 3.49, Rainy 7.78, Sunny 10.87; SDR = 1.66
Temp. (StDev of Golf Players): Cool 10.51, Hot 8.95, Mild 7.65; SDR = 0.17

Outlook gives the largest SDR, so it is selected as the root node.
Decision Tree – Root Node

[Figure: regression tree with root node Outlook]
Dataset – Sorted by Outlook
Subset (Outlook = Sunny)

Temp.  Humidity  Windy  Golf Players
Mild   High      FALSE  45
Cool   Normal    FALSE  52
Mild   Normal    FALSE  46
Cool   Normal    TRUE   23
Mild   High      TRUE   30

The Sunny branch splits on Windy; each leaf predicts the subset average:

[Figure: Outlook → Sunny → Windy (FALSE → 47.7, TRUE → 26.5)]
Subset (Outlook = Overcast)

The Overcast branch becomes a leaf predicting the subset average, 46.3:

[Figure: Outlook → Sunny → Windy (FALSE → 47.7, TRUE → 26.5); Overcast → 46.3]
Subset (Outlook = Rainy)

Temp.  Humidity  Windy  Golf Players
Hot    High      FALSE  25
Hot    High      TRUE   30
Mild   High      FALSE  35
Cool   Normal    FALSE  38
Mild   Normal    TRUE   48

StDev = 7.78

Per-branch standard deviations (and counts) inside the Rainy subset:

Temp.: Hot 2.5 (2), Mild 6.5 (2), Cool 0 (1)
SDR = 7.78 - ((2/5)·2.5 + (2/5)·6.5) = 4.18

Humidity: High 4.1 (3), Normal 5.0 (2)
SDR = 7.78 - ((3/5)·4.1 + (2/5)·5.0) = 3.32

Windy: FALSE 5.6 (3), TRUE 9.0 (2)
SDR = 7.78 - ((3/5)·5.6 + (2/5)·9.0) = 0.82

Temp. gives the largest SDR, so the Rainy branch splits on Temp. (see the sketch below).
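Reusing `weighted_stdev` from the earlier sketch to rank the three candidate splits, with the per-branch values from the tables above:

```python
rainy = {
    "Temp.":    {"Hot": (2.5, 2), "Mild": (6.5, 2), "Cool": (0.0, 1)},
    "Humidity": {"High": (4.1, 3), "Normal": (5.0, 2)},
    "Windy":    {"FALSE": (5.6, 3), "TRUE": (9.0, 2)},
}
for attr, groups in rainy.items():
    print(attr, round(7.78 - weighted_stdev(groups, 5), 2))
# Temp. 4.18, Humidity 3.32, Windy 0.82 -> split on Temp.
```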
Subset (Outlook = Rainy)

Sorted by Temp.:

Temp.  Humidity  Windy  Golf Players
Cool   Normal    FALSE  38
Hot    High      FALSE  25
Hot    High      TRUE   30
Mild   High      FALSE  35
Mild   Normal    TRUE   48

[Figure: the finished regression tree: Outlook → Sunny → Windy (FALSE → 47.7, TRUE → 26.5); Overcast → 46.3; Rainy → Temp. (Cool → 38, Hot → 27.5, Mild → 41.5), each leaf predicting the subset average]