
AI

Decision Tree

Dr. Ali Al-Saegh


Computer Engineering Department, College of Engineering, University of Mosul
Introduction
• A decision tree is a supervised learning algorithm that generates a tree and a set of rules
from a given dataset. It is used for classification and regression.
• It is a hierarchical data structure that represents data through a divide-and-conquer strategy.

• A decision tree consists of decision nodes, branches, and leaf nodes:
  • A decision node tests one attribute (feature).
  • One branch is created for each possible value of the attribute.
  • A leaf node assigns a class.

• In general, the rules have the form:
  • If condition1 and condition2 and … then outcome.
  • e.g.: If diameter >= 3 and color = orange then orange.
Choosing a good attribute
• Would we prefer to split on X1 or X2?

• Good split if we are more certain about classification after split.


• Deterministic is good (all true or all false)
• Uniform distribution is bad
Entropy
• Entropy is a measure of the amount of uncertainty or impurity in the dataset S.

  H(S) = − Σ_{x∈X} P(x) log2 P(x)

• S: dataset for which entropy is being calculated
• X: set of classes in S
• P(x): probability of x, i.e. the proportion of the number of elements in class x to the number of elements in S

[Figure: three example sets, ranging from impure to pure]
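• To make the definition concrete, the following is a minimal Python sketch (not part of the slides; the function name and the use of collections.Counter are illustrative choices) that computes H(S) from a list of class labels:

from collections import Counter
from math import log2

def entropy(labels):
    # H(S) = -sum over classes x of P(x) * log2(P(x))
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# 9 "yes" and 5 "no" labels give about 0.94, as computed later in Example 1.
print(entropy(["yes"] * 9 + ["no"] * 5))   # ~0.940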


Interpretation of entropy
• "Low Entropy"
  • The class (label) is from a varied (peaks and valleys) distribution.
  • The histogram has many lows and highs.
  • Values sampled from it are more predictable.
• "High Entropy"
  • The class (label) is from a uniform-like distribution.
  • Flat histogram.
  • Values sampled from it are less predictable.
Information gain
• Information Gain (IG) measures the reduction in entropy or uncertainty after
splitting the dataset according to a given value of a random variable.
• It is a measure of how much information a feature provides about a class.

  IG(S, A) = H(S) − Σ_{t∈T} P(t) H(t)

• IG(S, A): information gain obtained by splitting S on feature A
• T: the subsets created by splitting set S on feature A
• P(t): the proportion of the number of elements in t to the number of elements in S
• H(S): entropy of dataset S
• H(t): entropy of subset t
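• A matching sketch for IG (again illustrative, reusing the entropy() helper above): group the rows by their value in one feature column, weight each subset's entropy by its proportion, and subtract from H(S).

from collections import defaultdict

def information_gain(rows, labels, feature_index):
    # IG(S, A) = H(S) - sum over subsets t of P(t) * H(t)
    n = len(labels)
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature_index]].append(label)
    remainder = sum((len(t) / n) * entropy(t) for t in groups.values())
    return entropy(labels) - remainder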
ID3 (Iterative Dichotomiser 3)
• The ID3 algorithm is used to generate a decision tree from a dataset.
• IG determines which feature is most useful for discriminating between the classes
to be learned.
• IG is used within the ID3 algorithm to decide the ordering of features in the nodes
of a decision tree.
• ID3 follows a greedy approach by selecting the best attribute/feature that yields
maximum IG.
• A greedy algorithm is an approach for solving a problem by selecting the best option
available at the moment. Hence, it may find only a locally optimal solution.
ID3 algorithm
1. Calculate entropy for the dataset.
2. For each attribute/feature:
   2.1. Calculate entropy for all its categorical values.
   2.2. Calculate information gain for the feature.
3. Find the feature with maximum information gain.
4. Split the dataset on that feature and repeat the process on each subset until the desired decision tree is obtained (a compact code sketch of these steps follows below).
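• The steps above can be written as a short recursive function. This sketch is illustrative only: it assumes the entropy() and information_gain() helpers shown earlier, handles only categorical features, and represents the tree as nested dictionaries keyed by feature index and value.

from collections import Counter

def id3(rows, labels, feature_indices):
    if len(set(labels)) == 1:          # pure subset -> leaf with that class
        return labels[0]
    if not feature_indices:            # no features left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    # Steps 2-3: choose the feature with maximum information gain.
    best = max(feature_indices, key=lambda f: information_gain(rows, labels, f))
    tree = {best: {}}
    rest = [f for f in feature_indices if f != best]
    # Step 4: split on the chosen feature and repeat on every subset.
    for value in set(row[best] for row in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return tree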
Example 1
• For the shown dataset (S), it is required to construct a decision tree to decide whether to play tennis or not based on the weather conditions.

Day  Outlook   Temp.  Humidity  Wind    Play tennis
1    Sunny     Hot    High      Weak    No
2    Sunny     Hot    High      Strong  No
3    Overcast  Hot    High      Weak    Yes
4    Rainy     Mild   High      Weak    Yes
5    Rainy     Cold   Normal    Weak    Yes
6    Rainy     Cold   Normal    Strong  No
7    Overcast  Cold   Normal    Strong  Yes
8    Sunny     Mild   High      Weak    No
9    Sunny     Cold   Normal    Weak    Yes
10   Rainy     Mild   Normal    Weak    Yes
11   Sunny     Mild   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes
14   Rainy     Mild   High      Strong  No
Solution
• The dataset has binary classes (yes and no), where 9 out of 14 are "yes" and 5 out of 14 are "no".
• The entropy for the dataset is calculated as:

  H(S) = − Σ_{x∈X} P(x) log2 P(x)
       = − (9/14) log2(9/14) − (5/14) log2(5/14)
       = 0.41 + 0.53 = 0.94


Solution
• For each feature of the dataset, calculate entropy for all its categorical values, then calculate information gain for the feature.
• The first feature is outlook, which has three categorical values: sunny, overcast, and rainy.

  H(sunny) = − (2/5) log2(2/5) − (3/5) log2(3/5) = 0.971      (2 out of 5 sunny are "yes", 3 out of 5 are "no")
  H(overcast) = − (4/4) log2(4/4) − 0 = 0                     (4 out of 4 overcast are "yes")
  H(rainy) = − (3/5) log2(3/5) − (2/5) log2(2/5) = 0.971      (3 out of 5 rainy are "yes", 2 out of 5 are "no")

  IG(S, outlook) = H(S) − [P(sunny) × H(sunny) + P(overcast) × H(overcast) + P(rainy) × H(rainy)]
                 = 0.94 − [(5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971] = 0.94 − 0.693 = 0.247
Solution
• The second feature is temp., which has three categorical values: hot, cold, and mild.

  H(hot) = − (2/4) log2(2/4) − (2/4) log2(2/4) = 1            (2 out of 4 hot are "yes", 2 out of 4 are "no")
  H(cold) = − (3/4) log2(3/4) − (1/4) log2(1/4) = 0.811       (3 out of 4 cold are "yes", 1 out of 4 is "no")
  H(mild) = − (4/6) log2(4/6) − (2/6) log2(2/6) = 0.918       (4 out of 6 mild are "yes", 2 out of 6 are "no")

  IG(S, temp.) = H(S) − [P(hot) × H(hot) + P(cold) × H(cold) + P(mild) × H(mild)]
               = 0.94 − [(4/14) × 1 + (4/14) × 0.811 + (6/14) × 0.918] = 0.94 − 0.911 = 0.029
Solution
• The third feature is humidity, which has two categorical values: high and normal.

  H(high) = − (3/7) log2(3/7) − (4/7) log2(4/7) = 0.985       (3 out of 7 high are "yes", 4 out of 7 are "no")
  H(normal) = − (6/7) log2(6/7) − (1/7) log2(1/7) = 0.591     (6 out of 7 normal are "yes", 1 out of 7 is "no")

  IG(S, humidity) = H(S) − [P(high) × H(high) + P(normal) × H(normal)]
                  = 0.94 − [(7/14) × 0.985 + (7/14) × 0.591] = 0.94 − 0.788 = 0.152
Solution
• The fourth feature is wind, which has two categorical values: weak and strong.

  H(weak) = − (6/8) log2(6/8) − (2/8) log2(2/8) = 0.811       (6 out of 8 weak are "yes", 2 out of 8 are "no")
  H(strong) = − (3/6) log2(3/6) − (3/6) log2(3/6) = 1         (3 out of 6 strong are "yes", 3 out of 6 are "no")

  IG(S, wind) = H(S) − [P(weak) × H(weak) + P(strong) × H(strong)]
              = 0.94 − [(8/14) × 0.811 + (6/14) × 1] = 0.94 − 0.892 = 0.048
Solution
• The feature with maximum information gain is outlook. So, the decision tree built so far:
• Note that when outlook = overcast, the subset is of the pure class "Yes".

              Outlook
    Sunny   |  Overcast  |  Rainy
      ?     |    Yes     |    ?
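• The four gains can be re-checked with the helpers sketched earlier. The lowercase values and the column order outlook, temp., humidity, wind below are my own encoding of the table, not part of the slides:

data = [
    ("sunny", "hot", "high", "weak"),      ("sunny", "hot", "high", "strong"),
    ("overcast", "hot", "high", "weak"),   ("rainy", "mild", "high", "weak"),
    ("rainy", "cold", "normal", "weak"),   ("rainy", "cold", "normal", "strong"),
    ("overcast", "cold", "normal", "strong"), ("sunny", "mild", "high", "weak"),
    ("sunny", "cold", "normal", "weak"),   ("rainy", "mild", "normal", "weak"),
    ("sunny", "mild", "normal", "strong"), ("overcast", "mild", "high", "strong"),
    ("overcast", "hot", "normal", "weak"), ("rainy", "mild", "high", "strong"),
]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
for i, name in enumerate(["outlook", "temp.", "humidity", "wind"]):
    print(name, round(information_gain(data, play, i), 3))
# outlook 0.247, temp. 0.029, humidity 0.152, wind 0.048
# (small rounding differences aside) -> outlook is selected as the root.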
Solution
• Next, from the remaining three features temp., humidity, and wind, we decide which
one is the best for the left branch of outlook.
• Since the left branch of outlook denotes sunny, we will work with the set of rows
having sunny as the value in the outlook column.
• Calculate entropy for this subset (outlook = sunny):

  H(sunny) = − (2/5) log2(2/5) − (3/5) log2(3/5) = 0.971

  Outlook  Temp.  Humidity  Wind    Play tennis
  Sunny    Hot    High      Weak    No
  Sunny    Hot    High      Strong  No
  Sunny    Mild   High      Weak    No
  Sunny    Cold   Normal    Weak    Yes
  Sunny    Mild   Normal    Strong  Yes
Solution
• The first feature is temp., which has three categorical values: hot, cold, and mild.

  H(hot) = − 0 − (2/2) log2(2/2) = 0                          (both hot rows are "no")
  H(cold) = − (1/1) log2(1/1) − 0 = 0                         (the single cold row is "yes")
  H(mild) = − (1/2) log2(1/2) − (1/2) log2(1/2) = 1

  IG(sunny, temp.) = H(sunny) − [P(hot) × H(hot) + P(cold) × H(cold) + P(mild) × H(mild)]
                   = 0.971 − [(2/5) × 0 + (1/5) × 0 + (2/5) × 1] = 0.971 − 0.4 = 0.571
Solution
• The second feature is humidity, which has two categorical values: high and normal.

  H(high) = − 0 − (3/3) log2(3/3) = 0                         (all 3 high rows are "no")
  H(normal) = − (2/2) log2(2/2) − 0 = 0                       (both normal rows are "yes")

  IG(sunny, humidity) = H(sunny) − [P(high) × H(high) + P(normal) × H(normal)]
                      = 0.971 − [(3/5) × 0 + (2/5) × 0] = 0.971 − 0 = 0.971
Solution
• The third feature is wind, which has two categorical values: weak and strong.

  H(weak) = − (1/3) log2(1/3) − (2/3) log2(2/3) = 0.918
  H(strong) = − (1/2) log2(1/2) − (1/2) log2(1/2) = 1

  IG(sunny, wind) = H(sunny) − [P(weak) × H(weak) + P(strong) × H(strong)]
                  = 0.971 − [(3/5) × 0.918 + (2/5) × 1] = 0.971 − 0.951 = 0.020
Solution
• Here, the attribute with the maximum information gain is humidity. So, the decision tree built so far:
• When outlook = sunny and humidity = high, it is a pure class of category "no".
• When outlook = sunny and humidity = normal, it is a pure class of category "yes".
• Therefore, we don't need to do further calculations on the sunny branch.

              Outlook
    Sunny   |  Overcast  |  Rainy
  Humidity  |    Yes     |    ?
  High: No
  Normal: Yes
Solution
• Next, from the remaining two features temp. and wind, we decide which one is the
best for splitting the data.
• Since the remaining branch of outlook denotes rainy, we will work with the set of
rows having rainy as the value in the outlook column.
• Calculate entropy for this subset (outlook = rainy):

  H(rainy) = − (3/5) log2(3/5) − (2/5) log2(2/5) = 0.971

  Outlook  Temp.  Humidity  Wind    Play tennis
  Rainy    Mild   High      Weak    Yes
  Rainy    Cold   Normal    Weak    Yes
  Rainy    Cold   Normal    Strong  No
  Rainy    Mild   Normal    Weak    Yes
  Rainy    Mild   High      Strong  No
Solution
• The first feature is temp., which has two categorical values: cold and mild.

  H(cold) = − (1/2) log2(1/2) − (1/2) log2(1/2) = 1
  H(mild) = − (2/3) log2(2/3) − (1/3) log2(1/3) = 0.918

  IG(rainy, temp.) = H(rainy) − [P(cold) × H(cold) + P(mild) × H(mild)]
                   = 0.971 − [(2/5) × 1 + (3/5) × 0.918] = 0.971 − 0.951 = 0.020
Solution
• The second feature is wind, which has two categorical values: weak and strong.

  H(weak) = − (3/3) log2(3/3) − 0 = 0                         (all 3 weak rows are "yes")
  H(strong) = − 0 − (2/2) log2(2/2) = 0                       (both strong rows are "no")

  IG(rainy, wind) = H(rainy) − [P(weak) × H(weak) + P(strong) × H(strong)]
                  = 0.971 − [(3/5) × 0 + (2/5) × 0] = 0.971 − 0 = 0.971
Solution
• The feature with maximum information gain is wind.
• When outlook = rainy and wind = strong, it is a pure class of category "no".
• When outlook = rainy and wind = weak, it is again a pure class of category "yes".
• Therefore, no more calculations are needed.
Solution

              Outlook
    Sunny   |  Overcast  |  Rainy
  Humidity  |    Yes     |   Wind
  High: No                 Weak: Yes
  Normal: Yes              Strong: No
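• Read as code, the finished tree is just a nested rule set. The sketch below is a hand-translation of the tree above (the function name and lowercase values are my own choices):

def play_tennis(outlook, humidity, wind):
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny":
        return "yes" if humidity == "normal" else "no"
    return "yes" if wind == "weak" else "no"   # outlook == "rainy"

print(play_tennis("sunny", "high", "weak"))      # no  (matches day 1)
print(play_tennis("rainy", "normal", "strong"))  # no  (matches day 6)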
Real-valued features
• Real-life data often contains numeric features or a mixture of different feature types, while the decision trees described so far work with categorical values.
• Discretization is a pre-processing step that changes numeric values to categorical
ones by finding sub-intervals.
• Binary split is a discretization method based on a threshold value (“greater than or
equal to” and “less than”).
• Splitting on feature x at value t:
  • One branch: x ≥ t
  • Other branch: x < t
• In binary split, the aim is to maximize IG(S | x : t)
  • i.e. the threshold t should maximize the information gain for feature x in dataset S.
Example 2
• For the shown dataset (S), it is required to construct a decision tree to decide whether to play tennis or not based on the weather conditions.

Day  Outlook   Temp.  Humidity  Wind    Play tennis
1    Sunny     85     85        Weak    No
2    Sunny     80     90        Strong  No
3    Overcast  83     78        Weak    Yes
4    Rainy     70     96        Weak    Yes
5    Rainy     68     80        Weak    Yes
6    Rainy     65     70        Strong  No
7    Overcast  64     65        Strong  Yes
8    Sunny     72     95        Weak    No
9    Sunny     69     70        Weak    Yes
10   Rainy     75     80        Weak    Yes
11   Sunny     75     70        Strong  Yes
12   Overcast  72     90        Strong  Yes
13   Overcast  81     75        Weak    Yes
14   Rainy     71     80        Strong  No
Solution
• The continuous values of the humidity and temp. features need to be converted to categorical ones.
• We will convert the humidity values using binary discretization.
• Binary discretization steps (see the code sketch after the table):
  1. Sort the values from smallest to largest.
  2. Iterate over the values, separating the dataset into two parts at each value.
  3. Calculate the information gain for every step (value).
  4. The value which maximizes the gain is chosen as the threshold.

Sorted humidity values:
  Humidity  Play tennis
  65        Yes
  70        No
  70        Yes
  70        Yes
  75        Yes
  78        Yes
  80        Yes
  80        Yes
  80        No
  85        No
  90        No
  90        Yes
  95        No
  96        Yes
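• The four steps can be written as a short search over candidate cut points. This sketch is illustrative (it reuses the entropy() helper from earlier and splits into x ≤ t and x > t, the convention used in the humidity example below):

def best_threshold(values, labels):
    base, n = entropy(labels), len(labels)
    best_t, best_gain = None, 0.0
    for t in sorted(set(values))[:-1]:   # the largest value cannot split the data
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

humidity = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(best_threshold(humidity, play))   # (80, ~0.10), matching the table below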
Solution
  H(humidity ≤ 65) = − P(no) log2 P(no) − P(yes) log2 P(yes)
                   = − 0 − (1/1) log2(1/1) = 0

  H(humidity > 65) = − P(no) log2 P(no) − P(yes) log2 P(yes)
                   = − (5/13) log2(5/13) − (8/13) log2(8/13) = 0.530 + 0.431 = 0.961

  IG(humidity, 65) = H(S) − [P(humidity ≤ 65) × H(humidity ≤ 65) + P(humidity > 65) × H(humidity > 65)]
                   = 0.94 − [(1/14) × 0 + (13/14) × 0.961] = 0.94 − 0.892 = 0.048
Solution
• IG is maximized when the humidity threshold equals 80; hence, the threshold is 80.
• The humidity column is then replaced by the categorical feature "Humidity > 80":

  Humidity threshold   IG
  65                   0.048
  70                   0.014
  75                   0.045
  78                   0.090
  80                   0.101
  85                   0.024
  90                   0.010
  95                   0.048
  96                   —      (humidity cannot be greater than this value, so it cannot split the data)

Day  Outlook   Temp.  Humidity > 80  Wind    Play tennis
1    Sunny     85     Yes            Weak    No
2    Sunny     80     Yes            Strong  No
3    Overcast  83     No             Weak    Yes
4    Rainy     70     Yes            Weak    Yes
5    Rainy     68     No             Weak    Yes
6    Rainy     65     No             Strong  No
7    Overcast  64     No             Strong  Yes
8    Sunny     72     Yes            Weak    No
9    Sunny     69     No             Weak    Yes
10   Rainy     75     No             Weak    Yes
11   Sunny     75     No             Strong  Yes
12   Overcast  72     Yes            Strong  Yes
13   Overcast  81     No             Weak    Yes
14   Rainy     71     No             Strong  No
Solution
• If you change the continuous values of temp. to categorical values and continue solving, you will get:

              Outlook
    Sunny   |  Overcast  |  Rainy
  Humidity  |    Yes     |   Wind
  > 80: No                 Weak: Yes
  ≤ 80: Yes                Strong: No
Regression tree
• Standard deviation reduction (SDR) is used instead of IG for constructing a regression decision tree.
• It involves partitioning the data into subsets that contain instances with nearly similar (homogeneous) values.
• Standard deviation (SD) is used to calculate the homogeneity of numerical samples:

  SD = sqrt( Σ (x − x̄)² / n )

• If the numerical samples are completely homogeneous, their standard deviation is zero.
• Branching termination criteria (a short code sketch of both measures follows below):
  • when the coefficient of variation (CV) for a branch becomes smaller than a certain threshold, where CV = (SD / x̄) × 100%
  • when too few instances (n) remain in the branch.
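• Both measures are easy to compute directly. The helpers below are a minimal sketch following the formulas above (the function names are my own; the sample list is the 14 "hours played" values from Example 3 below):

from math import sqrt

def sd(values):
    # population standard deviation: sqrt( sum (x - mean)^2 / n )
    mean = sum(values) / len(values)
    return sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def cv(values):
    # coefficient of variation in percent: SD / mean * 100
    return sd(values) / (sum(values) / len(values)) * 100

hours = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]
print(round(sd(hours), 2), round(cv(hours), 1))   # ~9.32 and ~23.4%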
Standard deviation
• For each feature of the dataset, calculate SD for all its values, then calculate SDR for the feature.

  SD(S, A) = Σ_{t∈T} P(t) SD(t)

• SD(S, A): weighted standard deviation after splitting the dataset S on feature A
• T: the subsets created by splitting S on feature A
• P(t): the proportion of the number of elements in t to the number of elements in S
• SD(t): standard deviation of subset t

  SDR(S, A) = SD(S) − SD(S, A)
Example 3
• For the shown dataset (S), it is required to construct a regression tree to decide the hours to play tennis based on the weather conditions.

Day  Outlook   Temp.  Humidity  Wind    Hours played
1    Sunny     Hot    High      Weak    25
2    Sunny     Hot    High      Strong  30
3    Overcast  Hot    High      Weak    46
4    Rainy     Mild   High      Weak    45
5    Rainy     Cold   Normal    Weak    52
6    Rainy     Cold   Normal    Strong  23
7    Overcast  Cold   Normal    Strong  43
8    Sunny     Mild   High      Weak    35
9    Sunny     Cold   Normal    Weak    38
10   Rainy     Mild   Normal    Weak    46
11   Sunny     Mild   Normal    Strong  48
12   Overcast  Mild   High      Strong  52
13   Overcast  Hot    Normal    Weak    44
14   Rainy     Mild   High      Strong  30
Solution
  SD(hours) = 9.32

  Outlook    SD(hours)  n
  Sunny          7.78   5
  Overcast       3.49   4
  Rainy         10.87   5

  (It is assumed that you know how to calculate the SD of, for example, the sunny subset.)

  SD(hours, outlook) = P(sunny) × SD(sunny) + P(overcast) × SD(overcast) + P(rainy) × SD(rainy)
                     = (5/14) × 7.78 + (4/14) × 3.49 + (5/14) × 10.87 = 7.66

  SDR(hours, outlook) = SD(hours) − SD(hours, outlook) = 9.32 − 7.66 = 1.66
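• These numbers can be re-checked with the sd() helper sketched earlier (the grouping below is my own encoding of the table, not part of the slides):

hours_by_outlook = {
    "sunny":    [25, 30, 35, 38, 48],
    "overcast": [46, 43, 52, 44],
    "rainy":    [45, 52, 23, 46, 30],
}
all_hours = [h for values in hours_by_outlook.values() for h in values]
# SD(hours, outlook) = sum over branches of P(branch) * SD(branch)
weighted = sum(len(v) / len(all_hours) * sd(v) for v in hours_by_outlook.values())
print(round(sd(all_hours), 2), round(weighted, 2), round(sd(all_hours) - weighted, 2))
# ~9.32, ~7.66 and SDR ~1.66, matching the values above.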
Solution
  Outlook:   Sunny SD = 7.78 (n = 5), Overcast SD = 3.49 (n = 4), Rainy SD = 10.87 (n = 5)
             SD(hours, outlook) = 7.66;    SDR(hours, outlook) = 9.32 − 7.66 = 1.66

  Temp.:     Cold SD = 10.51 (n = 4), Hot SD = 8.95 (n = 4), Mild SD = 7.65 (n = 6)
             SD(hours, temp.) = 8.84;     SDR(hours, temp.) = 9.32 − 8.84 = 0.48

  Humidity:  High SD = 9.36 (n = 7), Normal SD = 8.73 (n = 7)
             SD(hours, humidity) = 9.05;  SDR(hours, humidity) = 9.32 − 9.05 = 0.27

  Wind:      Weak SD = 7.87 (n = 8), Strong SD = 10.59 (n = 6)
             SD(hours, wind) = 9.03;      SDR(hours, wind) = 9.32 − 9.03 = 0.29
Solution
• The feature with the largest SDR is outlook which is selected to be the root for the
tree.
• The dataset is divided based on the values of the selected feature.
• This process is run recursively on the non-leaf branches until all data is processed.
• Termination criteria are:
  • CV ≤ 10%
  • and/or n ≤ 3.
Solution
• Calculate the average of hours (AVG) and the CV for all values (sunny, overcast, rainy) of the outlook feature.
• The overcast subset does not need splitting because its CV = 8% is less than the threshold (10%).
• The leaf node of the overcast branch gets the average of the overcast subset.

  Outlook    SD(hours)  AVG(hours)  CV(hours)  n
  Sunny          7.78       35.2        22%    5
  Overcast       3.49       46.3         8%    4
  Rainy         10.87       39.2        28%    5

  Initial tree:
              Outlook
    Sunny   |  Overcast  |  Rainy
      ?     |    46.3    |    ?
Solution
• Rainy branch has 𝐶𝑉 = 28% which is greater than the given threshold (10%).
Hence, this branch needs further splitting.
• SD(hours, rainy) = 10.87; this is the SD of the remaining sub-dataset when outlook = rainy.
• Then, calculate SDR for each of the features temp., humidity, and wind.
Outlook Temp. Humidity Wind Hours played
Rainy Mild High Weak 45
Rainy Cold Normal Weak 52
Rainy Cold Normal Strong 23
Rainy Mild Normal Weak 46
Rainy Mild High Strong 30
Solution
  Temp.:     Cold SD = 14.5 (n = 2), Mild SD = 7.32 (n = 3)
             SD(hours, temp.) = 10.19;    SDR(hours, temp.) = 10.87 − 10.19 = 0.68

  Humidity:  High SD = 7.5 (n = 2), Normal SD = 12.5 (n = 3)
             SD(hours, humidity) = 10.5;  SDR(hours, humidity) = 10.87 − 10.5 = 0.37

  Wind:      Weak SD = 3.09 (n = 3), Strong SD = 3.5 (n = 2)
             SD(hours, wind) = 3.25;      SDR(hours, wind) = 10.87 − 3.25 = 7.62
Solution
• Wind has the largest SDR.
• Because the number of instances in both branches (weak and strong) is less than or equal to 3 (n ≤ 3), we stop further branching and assign the average of each branch to the related leaf node.
• AVG(wind = weak) = 47.7
• AVG(wind = strong) = 26.5

              Outlook
    Sunny   |  Overcast  |  Rainy
      ?     |    46.3    |   Wind
                           Weak: 47.7
                           Strong: 26.5
Solution
• Sunny branch has 𝐶𝑉 = 22% which is greater than the given threshold (10%). Hence,
this branch needs further splitting.
• SD(hours, sunny) = 7.78; this is the SD of the remaining sub-dataset when outlook = sunny.
• Then, calculate SDR for each of the features temp., humidity, and wind.

Outlook Temp. Humidity Wind Hours played


Sunny Hot High Weak 25
Sunny Hot High Strong 30
Sunny Mild High Weak 35
Sunny Cold Normal Weak 38
Sunny Mild Normal Strong 48
Solution
  Temp.:     Cold SD = 0 (n = 1), Hot SD = 2.5 (n = 2), Mild SD = 6.5 (n = 2)
             SD(hours, temp.) = 3.6;      SDR(hours, temp.) = 7.78 − 3.6 = 4.18

  Humidity:  High SD = 4.1 (n = 3), Normal SD = 5 (n = 2)
             SD(hours, humidity) = 4.46;  SDR(hours, humidity) = 7.78 − 4.46 = 3.32

  Wind:      Weak SD = 5.6 (n = 3), Strong SD = 9 (n = 2)
             SD(hours, wind) = 6.96;      SDR(hours, wind) = 7.78 − 6.96 = 0.82
Solution
• Temp. has the largest SDR.
• Because the number of instances in each of temp.'s branches (cold, hot, and mild) is less than or equal to 3 (n ≤ 3), we stop further branching and assign the average of each branch to the related leaf node.
• AVG(temp. = cold) = 38
• AVG(temp. = hot) = 27.5
• AVG(temp. = mild) = 41.5
Solution

              Outlook
    Sunny   |  Overcast  |  Rainy
    Temp.   |    46.3    |   Wind
  Cold: 38                 Weak: 47.7
  Hot: 27.5                Strong: 26.5
  Mild: 41.5
