07_Decision tree

Uploaded by

Obaida Almoula

AI

Decision Tree

Dr. Ali Al-Saegh

Computer Engineering Department, College of Engineering, University of Mosul
Introduction
• A decision tree is a supervised learning algorithm that generates a tree and a set of rules
from a given dataset. It is used for classification and regression.
• It is a hierarchical data structure that represents data through a divide-and-conquer strategy.

• A decision tree consists of:

• Decision nodes: each tests one attribute (feature)
• Branches: one for each possible attribute value
• Leaf nodes: each assigns a class

• In general, the rules have the form:

• If condition1 and condition2 and … then outcome.
• e.g.: If diameter >= 3 and color = orange then orange
Choosing a good attribute
• Would we prefer to split on X1 or X2?

• Good split if we are more certain about classification after split.

• Deterministic is good (all true or all false)
• Uniform distribution is bad
Entropy
• Entropy is a measure of the amount of uncertainty or impurity in the dataset S.

$H(S) = -\sum_{x \in X} P(x) \log_2 P(x)$

• 𝑆: dataset for which entropy is being calculated

• 𝑋: set of classes in 𝑆
• 𝑃(𝑥): probability of 𝑥 i.e. proportion of the number of elements in class 𝑥 to the number of elements in 𝑆

[Figure: three example sets, from impure to less impure to pure]
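The entropy formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `entropy` and the two example label lists are assumptions chosen for the demonstration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    # Classes that never occur are absent from the Counter,
    # so the 0 * log2(0) case does not arise.
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


mixed = ["yes", "yes", "no", "no"]   # evenly mixed (impure) set
pure = ["yes", "yes", "yes"]         # pure set

print(entropy(mixed))
print(entropy(pure))
```

An evenly mixed two-class set gives the maximum of 1 bit, and a pure set gives 0, matching the "impure" and "pure" ends of the scale.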

Interpretation of entropy
• “Low Entropy”
• The class (label) is from a varied (peaks and valleys) distribution.
• Histogram has many lows or highs.
• Values sampled from it are more predictable.
• “High Entropy”
• The class (label) is from a uniform-like distribution.
• Flat histogram.
• Values sampled from it are less predictable.
Information gain
• Information Gain (IG) measures the reduction in entropy (uncertainty) obtained by splitting the dataset on the values of a feature.
• It is a measure of how much information a feature provides about a class.

$IG(S, A) = H(S) - \sum_{t \in T} P(t) H(t)$

• $IG(S, A)$: information gain from splitting S on feature A
• T: the set of subsets created by splitting S on feature A
• P(t): the proportion of the number of elements in t to the number of elements in S
• H(S): entropy of dataset S
• H(t): entropy of subset t
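The information gain formula can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the dictionary-based row format, and the four-row `toy` dataset are hypothetical and do not come from the lecture.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """H(S): Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """IG(S, A) = H(S) - sum over subsets t of P(t) * H(t)."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[feature]].append(row[target])  # group labels by feature value
    weighted = sum(len(t) / len(rows) * entropy(t) for t in subsets.values())
    return entropy([row[target] for row in rows]) - weighted


# Hypothetical toy data: "size" separates the classes perfectly, "shape" not at all.
toy = [
    {"size": "big", "shape": "round", "label": "yes"},
    {"size": "big", "shape": "square", "label": "yes"},
    {"size": "small", "shape": "round", "label": "no"},
    {"size": "small", "shape": "square", "label": "no"},
]

print(information_gain(toy, "size", "label"))
print(information_gain(toy, "shape", "label"))
```

A feature that produces pure subsets recovers all of H(S); a feature whose subsets are as mixed as the original set gains nothing.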
ID3 (Iterative Dichotomiser 3)
• The ID3 algorithm is used to generate a decision tree from a dataset.
• IG determines which feature is most useful for discriminating between the classes
to be learned.
• IG is used within the ID3 algorithm to decide the ordering of features in the nodes
of a decision tree.
• ID3 follows a greedy approach by selecting the best attribute/feature that yields
maximum IG.
• A greedy algorithm solves a problem by selecting the best option available at the moment. Hence, it may find only a locally optimal solution rather than the globally optimal one.
ID3 algorithm
1. Calculate entropy for the dataset.
2. For each attribute/feature:
2.1. Calculate entropy for all its categorical values.
2.2. Calculate information gain for the feature.
3. Find the feature with maximum information gain and split the dataset on it.
4. Repeat steps 1 to 3 on the subset of each branch until every branch ends in a pure class or no features remain.
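The steps above can be sketched as a short recursive function. This is an illustrative Python sketch rather than a reference implementation: the nested-dictionary tree format, the majority-vote fallback when features run out, and the toy dataset are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    gain = entropy([r[target] for r in rows])
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


def id3(rows, features, target):
    """Return a class label (leaf) or {feature: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure subset: stop with a leaf
        return labels[0]
    if not features:                   # assumption: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the feature with maximum information gain.
    best = max(features, key=lambda f: information_gain(rows, f, target))
    remaining = [f for f in features if f != best]
    return {best: {value: id3([r for r in rows if r[best] == value], remaining, target)
                   for value in sorted({r[best] for r in rows})}}


# Hypothetical toy data: x1 alone determines y.
toy = [
    {"x1": "a", "x2": "p", "y": "no"},
    {"x1": "a", "x2": "q", "y": "no"},
    {"x1": "b", "x2": "p", "y": "yes"},
    {"x1": "b", "x2": "q", "y": "yes"},
]
tree = id3(toy, ["x1", "x2"], "y")
print(tree)
```

Because the choice at each node is greedy, the function never revisits an earlier split, which is why the result can be locally rather than globally optimal.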
Example 1
• For the shown dataset (S), it is required to construct a decision tree to decide whether to play tennis or not based on the weather conditions.

Day  Outlook   Temp.  Humidity  Wind    Play tennis
1    Sunny     Hot    High      Weak    No
2    Sunny     Hot    High      Strong  No
3    Overcast  Hot    High      Weak    Yes
4    Rainy     Mild   High      Weak    Yes
5    Rainy     Cold   Normal    Weak    Yes
6    Rainy     Cold   Normal    Strong  No
7    Overcast  Cold   Normal    Strong  Yes
8    Sunny     Mild   High      Weak    No
9    Sunny     Cold   Normal    Weak    Yes
10   Rainy     Mild   Normal    Weak    Yes
11   Sunny     Mild   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes
14   Rainy     Mild   High      Strong  No
Solution
• The dataset has two classes (yes and no): 9 out of 14 rows are “yes” and 5 out of 14 are “no”.
• The entropy for the dataset is calculated as:

$H(S) = -\sum_{x \in X} P(x) \log_2 P(x)$

$= -\frac{9}{14} \log_2 \frac{9}{14} - \frac{5}{14} \log_2 \frac{5}{14}$

$= 0.41 + 0.53 = 0.94$

Solution
• For each feature of the dataset, calculate the entropy for all its categorical values, then calculate the information gain for the feature.
• The first feature is outlook, which has three categorical values: sunny, overcast, and rainy.
• $H(\text{sunny}) = -\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5} = 0.971$ (2 out of 5 sunny are “yes” and 3 out of 5 sunny are “no”)
• $H(\text{overcast}) = -\frac{4}{4} \log_2 \frac{4}{4} - 0 = 0$ (4 out of 4 overcast are “yes”)
• $H(\text{rainy}) = -\frac{3}{5} \log_2 \frac{3}{5} - \frac{2}{5} \log_2 \frac{2}{5} = 0.971$ (3 out of 5 rainy are “yes” and 2 out of 5 rainy are “no”)
• $IG(S, \text{outlook}) = H(S) - [P(\text{sunny}) \times H(\text{sunny}) + P(\text{overcast}) \times H(\text{overcast}) + P(\text{rainy}) \times H(\text{rainy})]$
• $IG(S, \text{outlook}) = 0.94 - [\frac{5}{14} \times 0.971 + \frac{4}{14} \times 0 + \frac{5}{14} \times 0.971] = 0.94 - 0.693 = 0.247$
Solution
• The second feature is temp., which has three categorical values: hot, cold, and mild.
• $H(\text{hot}) = -\frac{2}{4} \log_2 \frac{2}{4} - \frac{2}{4} \log_2 \frac{2}{4} = 1$ (2 out of 4 hot are “yes” and 2 out of 4 hot are “no”)
• $H(\text{cold}) = -\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4} = 0.811$ (3 out of 4 cold are “yes” and 1 out of 4 cold is “no”)
• $H(\text{mild}) = -\frac{4}{6} \log_2 \frac{4}{6} - \frac{2}{6} \log_2 \frac{2}{6} = 0.918$ (4 out of 6 mild are “yes” and 2 out of 6 mild are “no”)
• $IG(S, \text{temp.}) = H(S) - [P(\text{hot}) \times H(\text{hot}) + P(\text{cold}) \times H(\text{cold}) + P(\text{mild}) \times H(\text{mild})]$
• $IG(S, \text{temp.}) = 0.94 - [\frac{4}{14} \times 1 + \frac{4}{14} \times 0.811 + \frac{6}{14} \times 0.918] = 0.94 - 0.9108 = 0.0292$
Solution
• The third feature is humidity, which has two categorical values: high and normal.
• $H(\text{high}) = -\frac{3}{7} \log_2 \frac{3}{7} - \frac{4}{7} \log_2 \frac{4}{7} = 0.985$ (3 out of 7 high are “yes” and 4 out of 7 high are “no”)
• $H(\text{normal}) = -\frac{6}{7} \log_2 \frac{6}{7} - \frac{1}{7} \log_2 \frac{1}{7} = 0.592$ (6 out of 7 normal are “yes” and 1 out of 7 normal is “no”)
• $IG(S, \text{humidity}) = H(S) - [P(\text{high}) \times H(\text{high}) + P(\text{normal}) \times H(\text{normal})]$
• $IG(S, \text{humidity}) = 0.94 - [\frac{7}{14} \times 0.985 + \frac{7}{14} \times 0.592] = 0.94 - 0.788 = 0.152$
Solution
• The fourth feature is wind, which has two categorical values: weak and strong.
• $H(\text{weak}) = -\frac{6}{8} \log_2 \frac{6}{8} - \frac{2}{8} \log_2 \frac{2}{8} = 0.811$ (6 out of 8 weak are “yes” and 2 out of 8 weak are “no”)
• $H(\text{strong}) = -\frac{3}{6} \log_2 \frac{3}{6} - \frac{3}{6} \log_2 \frac{3}{6} = 1$ (3 out of 6 strong are “yes” and 3 out of 6 strong are “no”)
• $IG(S, \text{wind}) = H(S) - [P(\text{weak}) \times H(\text{weak}) + P(\text{strong}) \times H(\text{strong})]$
• $IG(S, \text{wind}) = 0.94 - [\frac{8}{14} \times 0.811 + \frac{6}{14} \times 1] = 0.94 - 0.892 = 0.048$
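The four root-level information gains can be checked with a short script. This is a sketch in Python; the tuple layout of `ROWS` and the helper names are assumptions, while the data itself is the 14-row table of Example 1. Results should agree with the hand calculation up to rounding in the third decimal place.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Wind"]
# Rows of Example 1: (outlook, temp, humidity, wind, play tennis)
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Weak", "Yes"),
    ("Rainy", "Cold", "Normal", "Weak", "Yes"),
    ("Rainy", "Cold", "Normal", "Strong", "No"),
    ("Overcast", "Cold", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cold", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(col):
    """IG of splitting ROWS on the column with index `col`."""
    gain = entropy([r[-1] for r in ROWS])
    for value in {r[col] for r in ROWS}:
        subset = [r[-1] for r in ROWS if r[col] == value]
        gain -= len(subset) / len(ROWS) * entropy(subset)
    return gain


gains = {name: information_gain(i) for i, name in enumerate(COLUMNS)}
for name, g in gains.items():
    print(f"{name}: {g:.3f}")
print("best:", max(gains, key=gains.get))
```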
Solution
• The feature with maximum information gain is outlook. So, the decision tree built so far is shown below.
• Note that when outlook = overcast, the subset is a pure class of category “yes”.

Outlook
  Sunny → ?
  Overcast → Yes
  Rainy → ?
Solution
• Next, from the remaining three features temp., humidity, and wind, we decide which
one is the best for the left branch of outlook.
• Since the left branch of outlook denotes sunny, we will work with the set of rows
having sunny as the value in the outlook column.
• Calculate entropy for this subset (outlook = sunny):

Outlook  Temp.  Humidity  Wind    Play tennis
Sunny    Hot    High      Weak    No
Sunny    Hot    High      Strong  No
Sunny    Mild   High      Weak    No
Sunny    Cold   Normal    Weak    Yes
Sunny    Mild   Normal    Strong  Yes

$H(\text{sunny}) = -\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5} = 0.971$
Solution
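• The first feature is temp., which has three categorical values: hot, cold, and mild.
• $H(\text{hot}) = 0 - \frac{2}{2} \log_2 \frac{2}{2} = 0$
• $H(\text{cold}) = -1 \log_2 1 - 0 = 0$
• $H(\text{mild}) = -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1$
• $IG(\text{sunny}, \text{temp.}) = H(\text{sunny}) - [P(\text{hot}) \times H(\text{hot}) + P(\text{cold}) \times H(\text{cold}) + P(\text{mild}) \times H(\text{mild})]$
• $IG(\text{sunny}, \text{temp.}) = 0.971 - [\frac{2}{5} \times 0 + \frac{1}{5} \times 0 + \frac{2}{5} \times 1] = 0.971 - 0.4 = 0.571$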
Solution
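• The second feature is humidity, which has two categorical values: high and normal.
• $H(\text{high}) = 0 - \frac{3}{3} \log_2 \frac{3}{3} = 0$
• $H(\text{normal}) = -\frac{2}{2} \log_2 \frac{2}{2} - 0 = 0$
• $IG(\text{sunny}, \text{humidity}) = H(\text{sunny}) - [P(\text{high}) \times H(\text{high}) + P(\text{normal}) \times H(\text{normal})]$
• $IG(\text{sunny}, \text{humidity}) = 0.971 - [\frac{3}{5} \times 0 + \frac{2}{5} \times 0] = 0.971 - 0 = 0.971$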
Solution
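• The third feature is wind, which has two categorical values: weak and strong.
• $H(\text{weak}) = -\frac{1}{3} \log_2 \frac{1}{3} - \frac{2}{3} \log_2 \frac{2}{3} = 0.918$
• $H(\text{strong}) = -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1$
• $IG(\text{sunny}, \text{wind}) = H(\text{sunny}) - [P(\text{weak}) \times H(\text{weak}) + P(\text{strong}) \times H(\text{strong})]$
• $IG(\text{sunny}, \text{wind}) = 0.971 - [\frac{3}{5} \times 0.918 + \frac{2}{5} \times 1] = 0.971 - 0.9508 = 0.0202$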
Solution
• Here, the attribute with the maximum information gain is humidity. The decision tree built so far is shown below.
• When outlook = sunny and humidity = high, it is a pure class of category “no”.
• When outlook = sunny and humidity = normal, it is a pure class of category “yes”.
• Therefore, we don't need to do further calculations on the sunny branch.

Outlook
  Sunny → Humidity
    High → No
    Normal → Yes
  Overcast → Yes
  Rainy → ?
Solution
• Next, from the remaining two features temp. and wind, we decide which one is the
best for splitting the data.
• Since the remaining branch of outlook denotes rainy, we will work with the set of
rows having rainy as the value in the outlook column.
• Calculate entropy for this subset (outlook = rainy):

Outlook  Temp.  Humidity  Wind    Play tennis
Rainy    Mild   High      Weak    Yes
Rainy    Cold   Normal    Weak    Yes
Rainy    Cold   Normal    Strong  No
Rainy    Mild   Normal    Weak    Yes
Rainy    Mild   High      Strong  No

$H(\text{rainy}) = -\frac{3}{5} \log_2 \frac{3}{5} - \frac{2}{5} \log_2 \frac{2}{5} = 0.971$
Solution
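• The first feature is temp., which has two categorical values in this subset: cold and mild.
• $H(\text{cold}) = -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1$
• $H(\text{mild}) = -\frac{2}{3} \log_2 \frac{2}{3} - \frac{1}{3} \log_2 \frac{1}{3} = 0.918$
• $IG(\text{rainy}, \text{temp.}) = H(\text{rainy}) - [P(\text{cold}) \times H(\text{cold}) + P(\text{mild}) \times H(\text{mild})]$
• $IG(\text{rainy}, \text{temp.}) = 0.971 - [\frac{2}{5} \times 1 + \frac{3}{5} \times 0.918] = 0.971 - 0.9508 = 0.0202$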
Solution
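• The second feature is wind, which has two categorical values: weak and strong.
• $H(\text{weak}) = -\frac{3}{3} \log_2 \frac{3}{3} - 0 = 0$
• $H(\text{strong}) = 0 - \frac{2}{2} \log_2 \frac{2}{2} = 0$
• $IG(\text{rainy}, \text{wind}) = H(\text{rainy}) - [P(\text{weak}) \times H(\text{weak}) + P(\text{strong}) \times H(\text{strong})]$
• $IG(\text{rainy}, \text{wind}) = 0.971 - [\frac{3}{5} \times 0 + \frac{2}{5} \times 0] = 0.971 - 0 = 0.971$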
Solution
• The feature with maximum information gain is wind.
• When outlook = rainy and wind = strong, it is a pure class of category "no".
• When outlook = rainy and wind = weak, it is a pure class of category "yes".
• Therefore, no more calculations are needed.
Solution

Outlook
  Sunny → Humidity
    High → No
    Normal → Yes
  Overcast → Yes
  Rainy → Wind
    Weak → Yes
    Strong → No
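The finished tree can be read as a set of if-then rules. As a rough Python sketch, the tree is stored here as a nested dictionary and a sample is classified by following one branch per decision node. The dictionary encoding and the `predict` helper are illustrative assumptions; the tree content is the one derived in Example 1.

```python
# Final tree of Example 1 as {feature: {value: subtree-or-class}}.
TREE = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}


def predict(node, sample):
    """Walk from the root to a leaf, testing one feature per decision node."""
    while isinstance(node, dict):
        feature = next(iter(node))               # the feature tested at this node
        node = node[feature][sample[feature]]    # follow the matching branch
    return node                                  # a leaf: the class label


day1 = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Weak"}
day10 = {"Outlook": "Rainy", "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak"}
print(predict(TREE, day1))
print(predict(TREE, day10))
```

Note that temp. is never consulted: it was not selected at any node, so it does not influence the prediction.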
Real-valued features
• Real-life data often contains numeric information or a mixture of different feature types, while the ID3 procedure described above works with categorical values.
• Discretization is a pre-processing step that changes numeric values to categorical ones by finding sub-intervals.
• Binary split is a discretization method based on a threshold value (“greater than or equal to” and “less than”).
• Splitting on feature x at value t:
• One branch: x ≥ t
• Other branch: x < t
• In binary split, the aim is to maximize $IG(S \mid x : t)$,
• i.e. threshold t should maximize information gain for feature x in dataset S.
Example 2
• For the shown dataset (S), it is required to construct a decision tree to decide whether to play tennis or not based on the weather conditions.

Day  Outlook   Temp.  Humidity  Wind    Play tennis
1    Sunny     85     85        Weak    No
2    Sunny     80     90        Strong  No
3    Overcast  83     78        Weak    Yes
4    Rainy     70     96        Weak    Yes
5    Rainy     68     80        Weak    Yes
6    Rainy     65     70        Strong  No
7    Overcast  64     65        Strong  Yes
8    Sunny     72     95        Weak    No
9    Sunny     69     70        Weak    Yes
10   Rainy     75     80        Weak    Yes
11   Sunny     75     70        Strong  Yes
12   Overcast  72     90        Strong  Yes
13   Overcast  81     75        Weak    Yes
14   Rainy     71     80        Strong  No
Solution
• Continuous values of the humidity and temp. features need to be converted to categorical ones.
• We will convert the humidity values using binary discretization.
• Binary discretization steps:
1. Sort values from smallest to largest.
2. Iterate over all values and, at each value, separate the dataset into two parts.
3. Calculate the gain for every step (value).
4. The value which maximizes the gain is the threshold.

Humidity values sorted from smallest to largest (the first column is the position in the sorted list, not the original day number):

No.  Humidity  Play tennis
1    65        Yes
2    70        No
3    70        Yes
4    70        Yes
5    75        Yes
6    78        Yes
7    80        Yes
8    80        Yes
9    80        No
10   85        No
11   90        No
12   90        Yes
13   95        No
14   96        Yes
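The binary discretization steps can be sketched as a scan over candidate thresholds. This Python sketch is illustrative: the helper names are assumptions, the data is the sorted humidity column above, and the split uses "≤ t" versus "> t" as in the worked solution that follows.

```python
from collections import Counter
from math import log2

# (humidity, play tennis) pairs from Example 2, sorted by humidity.
DATA = [(65, "Yes"), (70, "No"), (70, "Yes"), (70, "Yes"), (75, "Yes"),
        (78, "Yes"), (80, "Yes"), (80, "Yes"), (80, "No"), (85, "No"),
        (90, "No"), (90, "Yes"), (95, "No"), (96, "Yes")]


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gain_at(threshold):
    """Information gain of splitting DATA into x <= threshold and x > threshold."""
    low = [y for x, y in DATA if x <= threshold]
    high = [y for x, y in DATA if x > threshold]
    n = len(DATA)
    return (entropy([y for _, y in DATA])
            - len(low) / n * entropy(low)
            - len(high) / n * entropy(high))


# The largest value is skipped: nothing can be greater than it.
candidates = sorted({x for x, _ in DATA})[:-1]
gains = {t: gain_at(t) for t in candidates}
best = max(gains, key=gains.get)

for t, g in gains.items():
    print(f"humidity <= {t}: IG = {g:.3f}")
print("threshold:", best)
```

Scanning every distinct value is the simplest approach; with exact arithmetic the printed gains may differ from hand-rounded values in the last digit.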
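Solution
$H(\text{humidity} \le 65) = -p(\text{no}) \log_2 p(\text{no}) - p(\text{yes}) \log_2 p(\text{yes})$

$= -\frac{0}{1} \log_2 \frac{0}{1} - \frac{1}{1} \log_2 \frac{1}{1} = 0$ (the term $0 \log_2 0$ is taken as 0)

$H(\text{humidity} > 65) = -p(\text{no}) \log_2 p(\text{no}) - p(\text{yes}) \log_2 p(\text{yes})$

$= -\frac{5}{13} \log_2 \frac{5}{13} - \frac{8}{13} \log_2 \frac{8}{13} = 0.53 + 0.431 = 0.961$

$IG(\text{humidity}, 65) = H(S) - [P(\text{humidity} \le 65) \times H(\text{humidity} \le 65) + P(\text{humidity} > 65) \times H(\text{humidity} > 65)]$

$IG(\text{humidity}, 65) = 0.94 - [\frac{1}{14} \times 0 + \frac{13}{14} \times 0.961] = 0.94 - 0.892 = 0.048$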
Solution
• IG is maximized when the humidity threshold is equal to 80.
• Hence, the threshold is equal to 80.

Humidity  IG
65        0.048
70        0.014
75        0.045
78        0.090
80        0.101
85        0.024
90        0.010
95        0.048
96        (not evaluated: humidity cannot be greater than this value)

Day  Outlook   Temp.  Humidity>80  Wind    Play tennis
1    Sunny     85     yes          Weak    No
2    Sunny     80     yes          Strong  No
3    Overcast  83     no           Weak    Yes
4    Rainy     70     yes          Weak    Yes
5    Rainy     68     no           Weak    Yes
6    Rainy     65     no           Strong  No
7    Overcast  64     no           Strong  Yes
8    Sunny     72     yes          Weak    No
9    Sunny     69     no           Weak    Yes
10   Rainy     75     no           Weak    Yes
11   Sunny     75     no           Strong  Yes
12   Overcast  72     yes          Strong  Yes
13   Overcast  81     no           Weak    Yes
14   Rainy     71     no           Strong  No
Solution
• If you change the continuous values of temp. to categorical values and continue
solving, you will get:

Outlook
  Sunny → Humidity
    > 80 → No
    ≤ 80 → Yes
  Overcast → Yes
  Rainy → Wind
    Weak → Yes
    Strong → No
Regression tree
• Standard deviation reduction (SDR) is used instead of IG for constructing a regression decision tree.
• It involves partitioning the data into subsets that contain instances with nearly similar values (homogeneous).
• Standard deviation (SD) is used to calculate the homogeneity of numerical samples:

$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

• If the numerical samples are completely homogeneous, their standard deviation is zero.
• Branching termination criteria are:
• when the coefficient of variation (CV) for a branch becomes smaller than a certain threshold, where $CV = \frac{SD}{\bar{x}} \times 100\%$
• when too few instances (n) remain in the branch.
Standard deviation
• For each feature of the dataset, calculate SD for each of its values, then calculate SDR for the feature.

$SD(S, A) = \sum_{t \in T} P(t) \, SD(t)$

• $SD(S, A)$: weighted standard deviation after splitting the dataset S on the feature A
• T: the set of subsets created by splitting S on feature A
• P(t): the proportion of the number of elements in t to the number of elements in S
• SD(t): standard deviation of subset t

$SDR(S, A) = SD(S) - SD(S, A)$
Example 3
• For the shown dataset (S), it is required to construct a regression tree to predict the hours of tennis played based on the weather conditions.

Day  Outlook   Temp.  Humidity  Wind    Hours played
1    Sunny     Hot    High      Weak    25
2    Sunny     Hot    High      Strong  30
3    Overcast  Hot    High      Weak    46
4    Rainy     Mild   High      Weak    45
5    Rainy     Cold   Normal    Weak    52
6    Rainy     Cold   Normal    Strong  23
7    Overcast  Cold   Normal    Strong  43
8    Sunny     Mild   High      Weak    35
9    Sunny     Cold   Normal    Weak    38
10   Rainy     Mild   Normal    Weak    46
11   Sunny     Mild   Normal    Strong  48
12   Overcast  Mild   High      Strong  52
13   Overcast  Hot    Normal    Weak    44
14   Rainy     Mild   High      Strong  30
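Solution
$SD(\text{hours}, \text{outlook}) = P(\text{sunny}) \times SD(\text{sunny}) + P(\text{overcast}) \times SD(\text{overcast}) + P(\text{rainy}) \times SD(\text{rainy})$

$SD(\text{hours}, \text{outlook}) = \frac{5}{14} \times 7.78 + \frac{4}{14} \times 3.49 + \frac{5}{14} \times 10.87 = 7.66$

$SD(\text{hours}) = 9.32$

$SDR(\text{hours}, \text{outlook}) = SD(\text{hours}) - SD(\text{hours}, \text{outlook})$

$SDR(\text{hours}, \text{outlook}) = 9.32 - 7.66 = 1.66$

Outlook   SD (hours played)  n
Sunny     7.78               5
Overcast  3.49               4
Rainy     10.87              5

It is assumed that you know how to calculate the SD of a subset, for example sunny.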
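For reference, the SD of the sunny subset can be worked out from the five sunny rows of Example 3 (25, 30, 35, 38, 48 hours), using the population form of the standard deviation (division by n) given earlier:

```latex
\begin{align*}
\bar{x}_{\text{sunny}} &= \frac{25 + 30 + 35 + 38 + 48}{5} = 35.2 \\
\sum (x - \bar{x})^2 &= 104.04 + 27.04 + 0.04 + 7.84 + 163.84 = 302.8 \\
SD(\text{sunny}) &= \sqrt{\frac{302.8}{5}} = \sqrt{60.56} \approx 7.78
\end{align*}
```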
Solution
Outlook   SD(hours)  n
Sunny     7.78       5
Overcast  3.49       4
Rainy     10.87      5
SD(hours, outlook) = 7.66
SDR(hours, outlook) = 9.32 − 7.66 = 1.66

Temp.  SD(hours)  n
Cold   10.51      4
Hot    8.95       4
Mild   7.65       6
SD(hours, temp.) = 8.84
SDR(hours, temp.) = 9.32 − 8.84 = 0.48

Humidity  SD(hours)  n
High      9.36       7
Normal    8.73       7
SD(hours, humidity) = 9.05
SDR(hours, humidity) = 9.32 − 9.05 = 0.27

Wind    SD(hours)  n
Weak    7.87       8
Strong  10.59      6
SD(hours, wind) = 9.03
SDR(hours, wind) = 9.32 − 9.03 = 0.29
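The root-level SDR values can be checked with a short script. This Python sketch assumes the population standard deviation (division by n), which is what `statistics.pstdev` computes and what reproduces SD(hours) = 9.32; the tuple layout and helper names are illustrative. Results should match the tables up to rounding in the second decimal place.

```python
from statistics import pstdev

COLUMNS = ["Outlook", "Temp", "Humidity", "Wind"]
# Rows of Example 3: (outlook, temp, humidity, wind, hours played)
ROWS = [
    ("Sunny", "Hot", "High", "Weak", 25),
    ("Sunny", "Hot", "High", "Strong", 30),
    ("Overcast", "Hot", "High", "Weak", 46),
    ("Rainy", "Mild", "High", "Weak", 45),
    ("Rainy", "Cold", "Normal", "Weak", 52),
    ("Rainy", "Cold", "Normal", "Strong", 23),
    ("Overcast", "Cold", "Normal", "Strong", 43),
    ("Sunny", "Mild", "High", "Weak", 35),
    ("Sunny", "Cold", "Normal", "Weak", 38),
    ("Rainy", "Mild", "Normal", "Weak", 46),
    ("Sunny", "Mild", "Normal", "Strong", 48),
    ("Overcast", "Mild", "High", "Strong", 52),
    ("Overcast", "Hot", "Normal", "Weak", 44),
    ("Rainy", "Mild", "High", "Strong", 30),
]


def sdr(col):
    """SDR(S, A) = SD(S) - sum over subsets t of P(t) * SD(t)."""
    hours = [r[-1] for r in ROWS]
    weighted = 0.0
    for value in {r[col] for r in ROWS}:
        subset = [r[-1] for r in ROWS if r[col] == value]
        weighted += len(subset) / len(ROWS) * pstdev(subset)
    return pstdev(hours) - weighted


reductions = {name: sdr(i) for i, name in enumerate(COLUMNS)}
for name, value in reductions.items():
    print(f"SDR(hours, {name}) = {value:.2f}")
```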
Solution
• The feature with the largest SDR is outlook which is selected to be the root for the
tree.
• The dataset is divided based on the values of the selected feature.
• This process is run recursively on the non-leaf branches until all data is processed.
• Termination criteria are:
• 𝐶𝑉 ≤ 10%
• and/or 𝑛 ≤ 3.
Solution
• Calculate the average of hours (AVG) and CV for all values (sunny, overcast, rainy) of the outlook feature.
• The overcast subset does not need splitting because its CV = 8% is less than the threshold (10%).
• The related leaf node of the overcast branch gets the average of the overcast subset.

Outlook   SD (hours)  AVG (hours)  CV (hours)  n
Sunny     7.78        35.2         22%         5
Overcast  3.49        46.3         8%          4
Rainy     10.87       39.2         28%         5

Initial tree:

Outlook
  Rainy → ?
  Overcast → 46.3
  Sunny → ?
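The AVG, CV, and stopping decision for each outlook branch can be sketched as follows. This is an illustrative Python sketch: the `summarize` helper and its parameter names are assumptions, the thresholds (CV ≤ 10%, n ≤ 3) are the ones stated in the example, and the hours are taken from the Example 3 rows for each outlook value.

```python
from statistics import mean, pstdev

# Hours played in Example 3, grouped by outlook value.
HOURS_BY_OUTLOOK = {
    "Sunny": [25, 30, 35, 38, 48],
    "Overcast": [46, 43, 52, 44],
    "Rainy": [45, 52, 23, 46, 30],
}


def summarize(hours, cv_limit=10.0, min_n=3):
    """Return (AVG, SD, CV in percent, whether the branch becomes a leaf)."""
    avg = mean(hours)
    sd = pstdev(hours)                      # population SD, dividing by n
    cv = sd / avg * 100
    is_leaf = cv <= cv_limit or len(hours) <= min_n
    return avg, sd, cv, is_leaf


summary = {name: summarize(h) for name, h in HOURS_BY_OUTLOOK.items()}
for name, (avg, sd, cv, is_leaf) in summary.items():
    action = "leaf" if is_leaf else "split further"
    print(f"{name}: AVG={avg:.1f} SD={sd:.2f} CV={cv:.0f}% -> {action}")
```

Only the overcast branch satisfies a termination criterion at this level; sunny and rainy both have CV above 10% and more than 3 instances.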
Solution
• The rainy branch has CV = 28%, which is greater than the given threshold (10%). Hence, this branch needs further splitting.
• $SD(\text{hours}, \text{rainy}) = 10.87$; this is the SD of the remaining sub-dataset when outlook = rainy.
• Then, calculate SDR for each of the features temp., humidity, and wind.
Outlook Temp. Humidity Wind Hours played
Rainy Mild High Weak 45
Rainy Cold Normal Weak 52
Rainy Cold Normal Strong 23
Rainy Mild Normal Weak 46
Rainy Mild High Strong 30
Solution
Temp.  SD(hours)  n
Cold   14.5       2
Mild   7.32       3
SD(hours, temp.) = 10.19
SDR(hours, temp.) = 10.87 − 10.19 = 0.68

Humidity  SD(hours)  n
High      7.5        2
Normal    12.5       3
SD(hours, humidity) = 10.5
SDR(hours, humidity) = 10.87 − 10.5 = 0.37

Wind    SD(hours)  n
Weak    3.09       3
Strong  3.5        2
SD(hours, wind) = 3.25
SDR(hours, wind) = 10.87 − 3.25 = 7.62
Solution
• Wind has the largest SDR.
• Because the number of instances in each of its branches (weak and strong) is less than or equal to 3 (n ≤ 3), we stop further branching and assign the average of each branch to the related leaf node.
• AVG(wind = weak) = 47.7
• AVG(wind = strong) = 26.5

Outlook
  Rainy → Wind
    Weak → 47.7
    Strong → 26.5
  Overcast → 46.3
  Sunny → ?
Solution
• The sunny branch has CV = 22%, which is greater than the given threshold (10%). Hence, this branch needs further splitting.
• $SD(\text{hours}, \text{sunny}) = 7.78$; this is the SD of the remaining sub-dataset when outlook = sunny.
• Then, calculate SDR for each of the features temp., humidity, and wind.

Outlook Temp. Humidity Wind Hours played

Sunny Hot High Weak 25
Sunny Hot High Strong 30
Sunny Mild High Weak 35
Sunny Cold Normal Weak 38
Sunny Mild Normal Strong 48
Solution
Temp.  SD(hours)  n
Cold   0          1
Hot    2.5        2
Mild   6.5        2
SD(hours, temp.) = 3.6
SDR(hours, temp.) = 7.78 − 3.6 = 4.18

Humidity  SD(hours)  n
High      4.1        3
Normal    5          2
SD(hours, humidity) = 4.46
SDR(hours, humidity) = 7.78 − 4.46 = 3.32

Wind    SD(hours)  n
Weak    5.6        3
Strong  9          2
SD(hours, wind) = 6.96
SDR(hours, wind) = 7.78 − 6.96 = 0.82
Solution
• Temp. has the largest SDR.
• Because the number of instances in each of temp.’s branches (cold, hot, and mild) is less than or equal to 3 (n ≤ 3), we stop further branching and assign the average of each branch to the related leaf node.
• AVG(temp. = cold) = 38
• AVG(temp. = hot) = 27.5
• AVG(temp. = mild) = 41.5
Solution

Outlook
  Rainy → Wind
    Weak → 47.7
    Strong → 26.5
  Overcast → 46.3
  Sunny → Temp.
    Cold → 38
    Hot → 27.5
    Mild → 41.5
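The whole regression-tree procedure of Example 3 can be sketched as one recursive function. This is a minimal Python sketch under stated assumptions: the function names, the nested-dictionary tree format, and the default parameters are illustrative; the stopping rule (CV ≤ 10% or n ≤ 3) and the population SD follow the example.

```python
from statistics import mean, pstdev

FIELDS = ["Outlook", "Temp", "Humidity", "Wind", "Hours"]
RAW = [
    ("Sunny", "Hot", "High", "Weak", 25), ("Sunny", "Hot", "High", "Strong", 30),
    ("Overcast", "Hot", "High", "Weak", 46), ("Rainy", "Mild", "High", "Weak", 45),
    ("Rainy", "Cold", "Normal", "Weak", 52), ("Rainy", "Cold", "Normal", "Strong", 23),
    ("Overcast", "Cold", "Normal", "Strong", 43), ("Sunny", "Mild", "High", "Weak", 35),
    ("Sunny", "Cold", "Normal", "Weak", 38), ("Rainy", "Mild", "Normal", "Weak", 46),
    ("Sunny", "Mild", "Normal", "Strong", 48), ("Overcast", "Mild", "High", "Strong", 52),
    ("Overcast", "Hot", "Normal", "Weak", 44), ("Rainy", "Mild", "High", "Strong", 30),
]
DATA = [dict(zip(FIELDS, row)) for row in RAW]


def sdr(rows, feature):
    """Standard deviation reduction from splitting rows on feature."""
    weighted = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r["Hours"] for r in rows if r[feature] == value]
        weighted += len(subset) / len(rows) * pstdev(subset)
    return pstdev([r["Hours"] for r in rows]) - weighted


def build(rows, features, cv_limit=10.0, min_n=3):
    """Return a leaf (average hours) or {feature: {value: subtree}}."""
    hours = [r["Hours"] for r in rows]
    avg = mean(hours)
    cv = pstdev(hours) / avg * 100
    if len(rows) <= min_n or cv <= cv_limit or not features:
        return avg                                   # termination: leaf holds the average
    best = max(features, key=lambda f: sdr(rows, f))  # largest SDR wins
    rest = [f for f in features if f != best]
    return {best: {v: build([r for r in rows if r[best] == v], rest, cv_limit, min_n)
                   for v in sorted({r[best] for r in rows})}}


tree = build(DATA, ["Outlook", "Temp", "Humidity", "Wind"])
print(tree)
```

The leaves hold unrounded averages (for example 46.25 for overcast), so they agree with the tree above after rounding to one decimal place.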
