Decision Tree



https://www.saedsayad.com/decision_tree.htm

Classification and Prediction
 Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts.
 The model is derived from the analysis of a set of training data (i.e., data objects for which the class labels are known).
 The model is then used to predict the class label of objects for which the class label is unknown (a minimal sketch of this workflow follows).

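As a concrete illustration of this train-then-predict workflow (not on the original slides), here is a minimal sketch using scikit-learn; the toy feature encoding and values are assumptions chosen for illustration only:

```python
# Minimal train-then-predict sketch using scikit-learn (assumed available).
from sklearn.tree import DecisionTreeClassifier

# Training data: objects whose class labels are known (toy encoding).
X_train = [[0, 1], [1, 0], [1, 1], [0, 0]]   # encoded attribute values
y_train = ["yes", "no", "yes", "no"]         # known class labels

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                  # derive the model from training data

# Predict the class label of an object whose label is unknown.
print(model.predict([[0, 1]]))               # e.g. ['yes']
```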
Decision Tree Induction
 A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf represents a class or a class distribution.
 At each node, the algorithm chooses the “best” attribute to partition the data into individual classes.
 The construction of decision tree classifiers requires no domain knowledge or parameter setting, and is therefore appropriate for exploratory knowledge discovery.
 Decision trees can easily be converted to classification rules.
 Decision trees can handle multidimensional data.

Decision Tree Induction: Training Dataset
This follows an example of Quinlan’s ID3 (Playing Tennis):

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

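For reuse in the sketches that follow, the same 14 training tuples can be transcribed into Python (a plain re-typing of the table above; the variable names are my own):

```python
# The 14 training tuples (age, income, student, credit_rating, buys_computer).
dataset = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]

labels = [row[-1] for row in dataset]
print(labels.count("yes"), labels.count("no"))  # 9 yes, 5 no
```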
Output: A Decision Tree for “buys_computer”

age?
├─ <=30   → student?
│           ├─ no  → no
│           └─ yes → yes
├─ 31..40 → yes
└─ >40    → credit rating?
            ├─ excellent → no
            └─ fair      → yes

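Read as classification rules (per the earlier bullet), this tree translates directly into nested conditionals. A minimal hand-translated sketch; the function name is my own:

```python
# The decision tree above, rewritten as classification rules.
def buys_computer(age, student, credit_rating):
    if age == "<=30":
        # IF age <= 30 AND student = no  THEN buys_computer = no
        # IF age <= 30 AND student = yes THEN buys_computer = yes
        return "yes" if student == "yes" else "no"
    elif age == "31..40":
        # IF 31 <= age <= 40 THEN buys_computer = yes
        return "yes"
    else:  # age > 40
        # IF age > 40 AND credit_rating = fair THEN yes; excellent THEN no
        return "yes" if credit_rating == "fair" else "no"

print(buys_computer("<=30", "yes", "fair"))     # yes
print(buys_computer(">40", "no", "excellent"))  # no
```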
Algorithm for Decision Tree Induction
 Basic algorithm (a greedy algorithm)
– The tree is constructed in a top-down, recursive, divide-and-conquer manner
– At the start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are discretized in advance)
– Examples are partitioned recursively based on selected attributes
– Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain); a runnable sketch follows this list

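The following is a compact, illustrative sketch of this greedy top-down procedure (an ID3-style learner, not the slides’ own code), assuming categorical attributes and information gain as the selection measure:

```python
# ID3-style decision tree induction: greedy, top-down, divide-and-conquer.
from collections import Counter
from math import log2

def entropy(rows, target):
    """Expected information needed to classify a tuple in `rows`."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def info_gain(rows, attr, target):
    """Reduction in entropy obtained by partitioning `rows` on `attr`."""
    total = len(rows)
    remainder = sum(
        len(subset) / total * entropy(subset, target)
        for value in {row[attr] for row in rows}
        for subset in [[row for row in rows if row[attr] == value]]
    )
    return entropy(rows, target) - remainder

def build_tree(rows, attrs, target):
    classes = [row[target] for row in rows]
    if len(set(classes)) == 1:      # pure partition -> leaf
        return classes[0]
    if not attrs:                   # no attributes left -> majority vote
        return Counter(classes).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    return {
        (best, value): build_tree(
            [row for row in rows if row[best] == value],
            [a for a in attrs if a != best],
            target,
        )
        for value in {row[best] for row in rows}
    }

# Tiny demo on dict-shaped rows:
rows = [
    {"age": "<=30",   "student": "no",  "buys": "no"},
    {"age": "<=30",   "student": "yes", "buys": "yes"},
    {"age": "31..40", "student": "no",  "buys": "yes"},
]
print(build_tree(rows, ["age", "student"], "buys"))
```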
Attribute Selection Measure for ID3: Information Gain

 Select the attribute with the highest information gain.
 Let p_i be the probability that an arbitrary tuple in D belongs to class C_i.
 Expected information (entropy) needed to classify a tuple in D:

      Info(D) = − Σ_{i=1..m} p_i log2(p_i)

 Information needed (after using A to split D into v partitions) to classify D:

      Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) · Info(D_j)

 Information gained by branching on attribute A:

      Gain(A) = Info(D) − Info_A(D)
Attribute Selection: Information Gain

 Class P: buys_computer = “yes” (9 tuples); Class N: buys_computer = “no” (5 tuples)

      Info(D) = I(9,5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

 Splitting on age partitions the 14 tuples (from the training dataset above) as follows:

      age      p_i   n_i   I(p_i, n_i)
      <=30     2     3     0.971
      31…40    4     0     0
      >40      3     2     0.971

      Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

 Here (5/14) I(2,3) means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence

      Gain(age) = Info(D) − Info_age(D) = 0.246

 Similarly,

      Gain(income) = 0.029
      Gain(student) = 0.151
      Gain(credit_rating) = 0.048

 Since age yields the highest information gain, it is selected as the splitting attribute.
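These gains can be verified mechanically. A standalone sketch (re-declaring the training set inline so the snippet runs on its own; helper names are mine):

```python
# Verify the information gains reported above on the 14-tuple training set.
from math import log2

DATA = """\
<=30 high no fair no
<=30 high no excellent no
31..40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31..40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31..40 medium no excellent yes
31..40 high yes fair yes
>40 medium no excellent no"""

ATTRS = ["age", "income", "student", "credit_rating"]
rows = [dict(zip(ATTRS + ["class"], line.split())) for line in DATA.splitlines()]

def entropy(rs):
    total = len(rs)
    counts = {}
    for r in rs:
        counts[r["class"]] = counts.get(r["class"], 0) + 1
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(attr):
    total = len(rows)
    remainder = sum(
        len(subset) / total * entropy(subset)
        for value in {r[attr] for r in rows}
        for subset in [[r for r in rows if r[attr] == value]]
    )
    return entropy(rows) - remainder

for attr in ATTRS:
    print(attr, round(gain(attr), 3))
# age 0.247, income 0.029, student 0.152, credit_rating 0.048
# (the slide's 0.246 and 0.151 come from subtracting already-rounded values)
```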
Gain Ratio for Attribute Selection (C4.5)

• The information gain measure is biased towards attributes with a large number of values.
• C4.5 (a successor of ID3) uses gain ratio to overcome the problem:

      SplitInfo_A(D) = − Σ_{j=1..v} (|D_j| / |D|) log2(|D_j| / |D|)

      GainRatio(A) = Gain(A) / SplitInfo_A(D)

• Ex. income splits D into high (4 tuples), medium (6), and low (4):

      SplitInfo_income(D) = −(4/14) log2(4/14) − (6/14) log2(6/14) − (4/14) log2(4/14) = 1.557

      gain_ratio(income) = 0.029 / 1.557 = 0.019

• The attribute with the maximum gain ratio is selected as the splitting attribute.
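A quick check of this split-information arithmetic (helper names are my own):

```python
# SplitInfo for income's partition sizes (high=4, medium=6, low=4 of 14).
from math import log2

def split_info(*sizes):
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes)

si = split_info(4, 6, 4)
print(round(si, 3))           # 1.557
print(round(0.029 / si, 3))   # 0.019  -> gain_ratio(income)
```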
Comparing Attribute Selection Measures

• Both measures, in general, return good results, but:
– Information gain:
• is biased towards multivalued attributes
– Gain ratio:
• tends to prefer unbalanced splits in which one partition is much smaller than the others

Thank you for your attention.

Any questions?

