
Unit – 3(Decision Tree)

A decision tree is generally seen as a supervised learning algorithm for
classification tasks, but it can also be used for regression problems and, in some
situations, in unsupervised learning.
Reasoning- the action of thinking about something in a logical, sensible way.
Inductive Reasoning- Specific to General
'My elder brother is good at math. My friend's elder brother is good at math. My
neighbor's elder brother is a math tutor. Therefore, all elder brothers are good at
math.'
We have probably heard people use this type of reasoning in life, and we know it is
not always true: being an older brother does not inherently make you good at math.
What we have done is draw a generalized conclusion (all older brothers are good at
math) from three specific premises: my, my friend's and my neighbor's elder
brothers are all good at math. These specific instances are not representative of the
entire population of older brothers. Because inductive reasoning is based on
specific instances, it can often produce weak or invalid arguments.
We can remember inductive reasoning like this: inductive reasoning is bottom-up
reasoning; it starts from specific premises and induces a probable general
conclusion.

Deductive Reasoning- General to Specific


Deductive reasoning is reasoning in which true premises lead to a true and valid
conclusion. Deductive reasoning uses general principles to reach a specific
conclusion. It is also known as 'top-down reasoning' because it starts from the
general and works its way down to the specific.
For example, 'All cars have engines. I have a car. Therefore, my car has an engine.'
1. Decision Tree
1.1 Introduction

Decision tree learning is a method for approximating discrete-valued target
functions, in which the learned function is represented by a decision tree. Learned
trees can also be re-represented as sets of if-then rules to improve human
readability. These learning methods are among the most popular of inductive
inference algorithms and have been successfully applied to a broad range of tasks,
from learning to diagnose medical cases to learning to assess the credit risk of loan
applicants.

1.2 Decision Tree Representation


Consider the set of training examples in the following Table 1 and the
corresponding decision tree in Figure 1. The attributes Day, Outlook, Temperature,
Humidity and Wind form the input data set, and PlayTennis is the label/output.

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

TABLE 1: Training examples for the target concept PlayTennis.

Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes

Figure 1: Decision tree for the target concept PlayTennis

Decision trees classify instances by sorting them down the tree from the root to
some leaf node, which provides the classification of the instance. Each node in the
tree specifies a test of some attribute of the instance, and each branch descending
from that node corresponds to one of the possible values for this attribute. An
instance is classified by starting at the root node of the tree, testing the attribute
specified by this node, then moving down the tree branch corresponding to the
value of the attribute in the given example. This process is then repeated for the
subtree rooted at the new node. In general, decision trees represent a disjunction of
conjunctions of constraints on the attribute values of instances. Each path from the
tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself
to a disjunction of these conjunctions. For example, the decision tree shown in

Figure 1 corresponds to the following expression for PlayTennis = Yes:

(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
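
To make this disjunction concrete, here is a small illustrative sketch (not part of the original notes) that encodes the Figure 1 tree as if-then rules in Python; the function name classify_play_tennis is made up for this example.

```python
def classify_play_tennis(outlook, humidity, wind):
    """Hypothetical if-then encoding of the decision tree in Figure 1."""
    if outlook == "Sunny":
        # The Sunny branch tests Humidity
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        # Overcast days are always classified Yes
        return "Yes"
    if outlook == "Rain":
        # The Rain branch tests Wind
        return "Yes" if wind == "Weak" else "No"
    return None  # value of Outlook not seen in training


# Day D10 from Table 1 (Rain, Normal humidity, Weak wind) is classified Yes
print(classify_play_tennis("Rain", "Normal", "Weak"))  # prints: Yes
```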

1.3 Appropriate Problems For Decision Tree Learning

Decision tree learning is generally best suited to problems with the following
characteristics:
--Instances are represented by attribute-value pairs. Instances are described by
a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot). The
easiest situation for decision tree learning is when each attribute takes on a small
number of disjoint possible values (e.g., Hot, Mild, Cold). However, extensions to
the basic algorithm also allow handling real-valued attributes (e.g., representing
Temperature numerically).
--The target function has discrete output values. The decision tree in Figure.1
assigns a boolean classification (e.g., yes or no) to each example. Decision tree
methods easily extend to learning functions with more than two possible output
values. Learning target functions with real-valued outputs is also possible, though
the application of decision trees in this setting is less common.
--Disjunctive descriptions may be required. As noted above, decision trees
naturally represent disjunctive expressions.
--The training data may contain errors. Decision tree learning methods are
robust to errors, both errors in classifications of the training examples and errors in
the attribute values that describe these examples.
--The training data may contain missing attribute values. Decision tree
methods can be used even when some training examples have unknown values
(e.g., if the Humidity of the day is known for only some of the training examples).

Decision tree learning has therefore been applied to problems such as learning to
classify medical patients by their disease, equipment malfunctions by their cause,
and loan applicants by their likelihood of defaulting on payments. Such problems,
in which the task is to classify examples into one of a discrete set of possible
categories, are often referred to as Classification problems.

1.4 The Basic Decision Tree Learning Algorithm- ID3 Algorithm

ID3 learns decision trees by constructing them top-down, beginning with the
question "Which attribute should be tested at the root of the tree?" To answer this
question, each instance attribute is evaluated using a statistical test to determine
how well it alone classifies the training examples.
The best attribute is selected and used as the test at the root node of the tree. A
descendant of the root node is then created for each possible value of this attribute.
The entire process is then repeated using the training examples associated with
each descendant node to select the best attribute to test at that point in the tree. This
forms a greedy search for an acceptable decision tree, in which the algorithm never
backtracks to reconsider earlier choices.
1.4.1 Which Attribute Is the Best Classifier?
The central choice in the ID3 algorithm is selecting which attribute to test at each
node in the tree. We would like to select the attribute that is most useful for
classifying examples. What is a good quantitative measure of the worth of an
attribute? We define a statistical property, called information gain, that measures
how well a given attribute separates the training examples according to their target
classification. ID3 uses this information gain measure to select among the
candidate attributes at each step while growing the tree.

1.4.2 Entropy Measures Homogeneity Of Examples


In order to define information gain precisely, we begin by defining a measure
commonly used in information theory, called entropy, that characterizes the
(im)purity of an arbitrary collection of examples. Given a collection S, containing
positive and negative examples of some target concept, the entropy of S relative to
this boolean classification is
Entropy(S) = -(p+) log2(p+) - (p-) log2(p-)        ... EQ(1)
where p+ is the proportion of positive examples in S and p- is the proportion of
negative examples in S. Suppose S is a collection of 14 examples of some boolean
concept, including 9 positive and 5 negative examples. Then the entropy of S
relative to this boolean classification is
Entropy([9+, 5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Entropy is 0 if all members of S belong to the same class. For example, if all
members are positive then p- is 0, and Entropy(S) = -1·log2(1) - 0·log2(0)
= -1·0 - 0 = 0 (taking 0·log2(0) to be 0). The entropy is 1 when the collection
contains an equal number of positive and negative examples. If the collection
contains unequal numbers of positive and negative examples, the entropy is
between 0 and 1.
More generally, if the target attribute can take on c different values, then the
entropy of S relative to this c-wise classification is defined as

Entropy(S) = Σ (i = 1 to c)  -pi log2(pi)

where pi is the proportion of S belonging to class i.
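
For illustration (a minimal sketch added to these notes, assuming the usual convention 0·log2(0) = 0), entropy can be computed directly from per-class counts; the helper below reproduces the 0.940 value for the [9+, 5-] collection.

```python
from math import log2

def entropy(class_counts):
    """Entropy of a collection given a list of counts per class.

    Uses the convention 0 * log2(0) = 0.
    """
    total = sum(class_counts)
    ent = 0.0
    for count in class_counts:
        if count > 0:
            p = count / total
            ent -= p * log2(p)
    return ent


print(round(entropy([9, 5]), 3))   # 0.94 -> Entropy([9+, 5-])
print(round(entropy([7, 7]), 3))   # 1.0  -> equal positives and negatives
print(round(entropy([14, 0]), 3))  # 0.0  -> all members in one class
```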


1.4.3 Information Gain Measures The Expected Reduction In Entropy


Given entropy as a measure of the impurity in a collection of training examples,
we can now define a measure of the effectiveness of an attribute in classifying the
training data. The measure we will use, called information gain, is simply the
expected reduction in entropy caused by partitioning the examples according to
this attribute. More precisely, the information gain, Gain(S, A) of an attribute A,
relative to a collection of examples S, is defined as
Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A))  (|Sv| / |S|) Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset
of S for which attribute A has value v (i.e., Sv = {s ∈ S | A(s) = v}).
For example, suppose S is a collection of training-example days described by
attributes including Wind, which can have the values Weak or Strong. As before,
assume S is a collection containing 14 examples, [9+, 5-]. Of these 14 examples,
suppose 6 of the positive and 2 of the negative examples have Wind = Weak, and
the remainder have Wind = Strong. The information gain due to sorting the original
14 examples by the attribute Wind may then be calculated as

Values(Wind) = {Weak, Strong}
S = [9+, 5-]
S_Weak = [6+, 2-]
S_Strong = [3+, 3-]

Gain(S, Wind) = Entropy(S) - Σ (v ∈ {Weak, Strong})  (|Sv| / |S|) Entropy(Sv)
             = Entropy(S) - (8/14) Entropy(S_Weak) - (6/14) Entropy(S_Strong)

Now from EQ(1):
Entropy(S_Weak) = -(6/8) log2(6/8) - (2/8) log2(2/8)
                = -(0.75)(-0.415) - (0.25)(-2.0)
                = 0.311 + 0.500 = 0.811
Entropy(S_Strong) = -(3/6) log2(3/6) - (3/6) log2(3/6)
                  = -(0.5)(-1.0) - (0.5)(-1.0) = 1.0

So, Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.0) = 0.048

Information gain is precisely the measure used by ID3 to select the best attribute at
each step in growing the tree.
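
As a hedged illustration (not from the original notes), Gain(S, A) can be coded directly from class counts; the example below re-derives Gain(S, Wind) = 0.048 from the [6+, 2-] / [3+, 3-] split described above.

```python
from math import log2

def entropy(counts):
    # Entropy from per-class counts, with 0 * log2(0) taken as 0.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Gain(S, A) = Entropy(S) - sum_v (|Sv|/|S|) * Entropy(Sv).

    parent_counts: class counts of S, e.g. [9, 5]
    child_counts_list: class counts of each subset Sv, e.g. [[6, 2], [3, 3]]
    """
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - weighted


# Wind splits S = [9+, 5-] into Weak = [6+, 2-] and Strong = [3+, 3-]
print(round(information_gain([9, 5], [[6, 2], [3, 3]]), 3))  # 0.048
```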

1.4.4 Building Decision Tree


Let us try to build a decision tree from Table 1.
Which attribute is the best classifier? The attribute with the highest information
gain.

Let us find the information gain of each attribute. First take Humidity.

S: [9+, 5-], Entropy(S) = 0.940 (calculated above)

Humidity splits S into:
  High:   [3+, 4-]
  Normal: [6+, 1-]

Entropy(S_High) = -(3/7) log2(3/7) - (4/7) log2(4/7)
                = -(0.428)(-1.224) - (0.571)(-0.808)
                = 0.524 + 0.461 = 0.985
Entropy(S_Normal) = -(6/7) log2(6/7) - (1/7) log2(1/7)
                  = -(0.857)(-0.222) - (0.143)(-2.807)
                  = 0.191 + 0.401 = 0.592

Gain(S, Humidity) = Entropy(S) - Σ (v ∈ {High, Normal})  (|Sv| / |S|) Entropy(Sv)
                  = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151

Now take Wind.

S: [9+, 5-], Entropy(S) = 0.940

Wind splits S into:
  Weak:   [6+, 2-], Entropy(S_Weak) = 0.811
  Strong: [3+, 3-], Entropy(S_Strong) = 1.0

Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.0) = 0.048  (details above)

Now take Outlook.

S: [9+, 5-], Entropy(S) = 0.940

Outlook splits S into:
  Sunny:    [2+, 3-]
  Overcast: [4+, 0-]
  Rain:     [3+, 2-]

Entropy(S_Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5)
                 = -(0.4)(-1.322) - (0.6)(-0.737)
                 = 0.529 + 0.442 = 0.971
Entropy(S_Overcast) = -(4/4) log2(1) - (0/4) log2(0) = 0
Entropy(S_Rain) = -(3/5) log2(3/5) - (2/5) log2(2/5)
                = 0.442 + 0.529 = 0.971

Gain(S, Outlook) = Entropy(S) - Σ (v ∈ {Sunny, Overcast, Rain})  (|Sv| / |S|) Entropy(Sv)
                 = 0.940 - ((5/14)(0.971) + (4/14)(0) + (5/14)(0.971))
                 = 0.940 - 0.694 = 0.246

Now take Temperature.

S: [9+, 5-], Entropy(S) = 0.940

Temperature splits S into:
  Hot:  [2+, 2-]
  Mild: [4+, 2-]
  Cool: [3+, 1-]

Entropy(S_Hot) = -(2/4) log2(2/4) - (2/4) log2(2/4)
               = -(0.5)(-1) - (0.5)(-1) = 0.5 + 0.5 = 1.0
Entropy(S_Mild) = -(4/6) log2(4/6) - (2/6) log2(2/6)
                = -(0.667)(-0.585) - (0.333)(-1.585)
                = 0.390 + 0.528 = 0.918
Entropy(S_Cool) = -(3/4) log2(3/4) - (1/4) log2(1/4)
                = -(0.75)(-0.415) - (0.25)(-2) = 0.311 + 0.5 = 0.811

Gain(S, Temperature) = Entropy(S) - Σ (v ∈ {Hot, Mild, Cool})  (|Sv| / |S|) Entropy(Sv)
                     = 0.940 - ((4/14)(1.0) + (6/14)(0.918) + (4/14)(0.811))
                     = 0.940 - (0.286 + 0.393 + 0.232) = 0.029

So,
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

Therefore, Outlook is selected as the decision attribute for the root node, and
branches are created below the root for each of its possible values (i.e., Sunny,
Overcast, and Rain). The resulting partial decision tree is shown in Figure 2, along
with the training examples sorted to each new descendant node. Note that every
example for which Outlook = Overcast is also a positive example of PlayTennis.
Therefore, this node of the tree becomes a leaf node with the classification
PlayTennis = Yes. In contrast, the descendants corresponding to Outlook = Sunny
and Outlook = Rain still have nonzero entropy, and the decision tree will be further
elaborated below these nodes.
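
Before elaborating the tree further, here is a small sanity-check sketch (added for illustration, not part of the original notes) that recomputes the four gains directly from Table 1 in Python; the exact values differ from the hand calculation only in the third decimal place, because the hand calculation rounds its intermediate entropies.

```python
from collections import Counter, defaultdict
from math import log2

# Table 1 as (Outlook, Temperature, Humidity, Wind, PlayTennis) tuples
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(rows, attr_index):
    labels = [r[-1] for r in rows]
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attr_index]].append(r[-1])
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

for i, name in enumerate(ATTRS):
    print(f"Gain(S, {name}) = {gain(DATA, i):.3f}")
# Expected (approximate) output:
#   Gain(S, Outlook) = 0.247
#   Gain(S, Temperature) = 0.029
#   Gain(S, Humidity) = 0.152
#   Gain(S, Wind) = 0.048
# (The hand calculation above, which rounds intermediate entropies,
#  gives 0.246 and 0.151 for Outlook and Humidity.)
```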

Outlook
  Sunny    -> ?   (which attribute should be tested here?)
  Overcast -> Yes
  Rain     -> ?   (which attribute should be tested here?)

Figure 2: The partially learned decision tree, with the training examples sorted to
each new descendant node

S_Sunny = {D1, D2, D8, D9, D11} = [2+, 3-]
Entropy(S_Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5) = -(0.4)(-1.322) - (0.6)(-0.737)
                 = 0.529 + 0.442 = 0.971

Humidity has two values, High and Normal. For Outlook = Sunny, Humidity is High
three times and Normal two times. For Outlook = Sunny and Humidity = High there
are three negative examples and no positive ones, and for Humidity = Normal there
are two positive examples and no negative ones.

Gain(S_Sunny, A) = Entropy(S_Sunny) - Σ (v ∈ Values(A))  (|Sv| / |S_Sunny|) Entropy(Sv)

Gain(S_Sunny, Humidity) = 0.971 - [(3/5)(-(0/3) log2(0/3) - (3/3) log2(3/3))
                                   + (2/5)(-(2/2) log2(2/2) - (0/2) log2(0/2))]
                        = 0.971 - [0 + 0] = 0.971

Gain(S_Sunny, Temperature) = 0.971 - [(2/5)(-(0/2) log2(0/2) - (2/2) log2(2/2))
                                      + (2/5)(-(1/2) log2(1/2) - (1/2) log2(1/2))
                                      + (1/5)(-(1/1) log2(1/1) - (0/1) log2(0/1))]
                           = 0.971 - [0 + 0.4 + 0] = 0.571

Gain(S_Sunny, Wind) = 0.971 - [(2/5)(1.0) + (3/5)(-(1/3) log2(1/3) - (2/3) log2(2/3))]
                    = 0.971 - [0.4 + 0.6((0.333)(1.585) + (0.667)(0.585))]
                    = 0.971 - [0.4 + 0.6(0.528 + 0.390)]
                    = 0.971 - 0.951 = 0.020
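
The same kind of check can be repeated on the subset of days with Outlook = Sunny (again an illustrative sketch, not part of the original notes); it confirms that Humidity has the highest gain on this branch.

```python
from collections import Counter, defaultdict
from math import log2

# Days with Outlook = Sunny: (Day, Temperature, Humidity, Wind, PlayTennis)
SUNNY = [
    ("D1", "Hot", "High", "Weak", "No"),
    ("D2", "Hot", "High", "Strong", "No"),
    ("D8", "Mild", "High", "Weak", "No"),
    ("D9", "Cool", "Normal", "Weak", "Yes"),
    ("D11", "Mild", "Normal", "Strong", "Yes"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr_index):
    labels = [r[-1] for r in rows]
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attr_index]].append(r[-1])
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

for i, name in [(1, "Temperature"), (2, "Humidity"), (3, "Wind")]:
    print(f"Gain(S_Sunny, {name}) = {gain(SUNNY, i):.3f}")
# Approximate output: Temperature = 0.571, Humidity = 0.971, Wind = 0.020
```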


Since Gain(S_Sunny, Humidity) is the highest, Humidity is chosen as the test for
the Sunny branch:

Outlook
  Sunny    -> Humidity
  Overcast -> Yes
  Rain     -> ?

Now consider the Rain branch.

S_Rain = {D4, D5, D6, D10, D14} = [3+, 2-]

Entropy(S_Rain) = -(3/5) log2(3/5) - (2/5) log2(2/5) = -(0.6)(-0.737) - (0.4)(-1.322)
                = 0.442 + 0.529 = 0.971

Gain(S_Rain, Wind) = Entropy(S_Rain) - Σ (v ∈ {Weak, Strong})  (|Sv| / |S_Rain|) Entropy(Sv)
                   = 0.971 - [(2/5)(-(0/2) log2(0/2) - (2/2) log2(2/2))
                              + (3/5)(-(3/3) log2(3/3) - (0/3) log2(0/3))]
                   = 0.971 - [0 + 0] = 0.971

Gain(S_Rain, Temperature) = 0.971 - [(2/5)(-(1/2) log2(1/2) - (1/2) log2(1/2))
                                     + (3/5)(-(2/3) log2(2/3) - (1/3) log2(1/3))]
                          = 0.971 - [0.4 + 0.6((0.667)(0.585) + (0.333)(1.585))]
                          = 0.971 - [0.4 + 0.6(0.390 + 0.528)]
                          = 0.971 - 0.951 = 0.020

Since Gain(S_Rain, Wind) is the highest, Wind is chosen as the test for the Rain
branch:

Outlook
  Sunny    -> Humidity
                High   -> ?
                Normal -> ?
  Overcast -> Yes
  Rain     -> Wind
                Strong -> ?
                Weak   -> ?

Now it can be observed that the entropy at each of the four remaining points is
zero. For example, the entropy at the left-most point (Outlook = Sunny,
Humidity = High) is

E({D1, D2, D8}) = 0   (all three examples are No)

and the gain of the remaining attribute Temperature at this point is

Gain({D1, D2, D8}, Temperature) = 0 - [(2/3)(0) + (1/3)(0)] = 0

So the gain at every such point with respect to the attribute Temperature is zero,
and each of these points becomes a leaf labelled with the class of its examples.
So, the final decision tree is:

Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes

Algorithm-
ID3(Examples, Target_attribute, Attributes)
Examples are the training examples. Target_attribute is the attribute whose
value is to be predicted by the tree. Attributes is a list of other attributes that
may be tested by the learned decision tree. Returns a decision tree that
correctly classifies the given Examples.
● Create a Root node for the tree
● If all Examples are positive, Return the single-node tree Root, with label = +
● If all Examples are negative, Return the single-node tree Root, with label = -
● If Attributes is empty, Return the single-node tree Root, with label = most
  common value of Target_attribute in Examples
● Otherwise Begin
  ● A ← the attribute from Attributes that best classifies Examples
  ● The decision attribute for Root ← A
  ● For each possible value vi of A:
    ● Add a new tree branch below Root, corresponding to the test A = vi
    ● Let Examples_vi be the subset of Examples that have value vi for A
    ● If Examples_vi is empty, then below this new branch add a leaf node with
      label = most common value of Target_attribute in Examples
    ● Else below this new branch add the subtree
      ID3(Examples_vi, Target_attribute, Attributes - {A})
● End
● Return Root
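
For readers who prefer code, below is a minimal Python sketch of the ID3 procedure above (added for illustration; the data structures, a list of dicts for Examples and a nested dict for the tree, are my own choices, not prescribed by the notes).

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    labels = [e[target] for e in examples]
    groups = defaultdict(list)
    for e in examples:
        groups[e[attribute]].append(e[target])
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def id3(examples, target, attributes):
    """Return a decision tree: either a class label (leaf) or a dict
    {"attribute": A, "branches": {value: subtree, ...}}."""
    labels = [e[target] for e in examples]
    # All examples have the same label -> single-node tree
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left -> label with the most common target value
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    # For brevity, only attribute values present in examples are expanded;
    # the pseudocode also adds a majority-label leaf for values with no examples.
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree["branches"][value] = id3(subset, target, remaining)
    return tree
```

Calling id3(rows, "PlayTennis", ["Outlook", "Temperature", "Humidity", "Wind"]) on the Table 1 rows (each stored as a dict) should reproduce the tree in Figure 1, with Outlook at the root.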

1.5 Inductive Bias In Decision Tree Learning

Inductive bias is the set of assumptions that, together with the training data,
deductively justify the classifications assigned by the learner to future instances.
Given a collection of training examples, there are typically many decision trees
consistent with these examples. Describing the inductive bias of ID3 therefore
consists of describing the basis by which it chooses one of these consistent
hypotheses over the others. Which of these decision trees does ID3 choose? ID3's
search strategy (a) favors shorter trees over longer ones, and (b) selects trees that
place the attributes with the highest information gain closest to the root.
1.6 Issues In Decision Tree Learning

Practical issues in learning decision trees include determining how deeply to grow
the decision tree, handling continuous attributes, choosing an appropriate attribute
selection measure, handling training data with missing attribute values, handling
attributes with differing costs, and improving computational efficiency.
2. Overfitting and Underfitting in Machine Learning model

The situation where a model performs very well on the training data but its
performance drops significantly on the test set is called overfitting. On the other
hand, if the model performs poorly on both the training set and the test set, we call
it an underfitting model. Overfitting and underfitting are the two main problems
that occur in machine learning and degrade the performance of machine learning
models.
The main goal of every machine learning model is to generalize well: after being
trained on the dataset, it should produce reliable and accurate output on new data.
Hence, underfitting and overfitting are the two conditions that need to be checked
to judge whether the model is generalizing well or not.

Overfitting

Overfitting occurs when a machine learning model tries to cover all the data
points, or more data points than required, in the given dataset. Because of this, the
model starts capturing the noise and inaccurate values present in the data, which
reduces the efficiency and accuracy of the model. An overfitted model has low bias
and high variance.

The chance of overfitting increases the more we train the model: the longer we
train, the more likely we end up with an overfitted model.
Overfitting is the main problem that occurs in supervised learning.

Example: The concept of overfitting can be understood from the following graph
of a linear regression output:

Fig: Overfitting
How to avoid overfitting in a model
Both overfitting and underfitting degrade the performance of a machine learning
model. There are some ways by which we can reduce the occurrence of overfitting
in our model (a brief sketch follows the list below):
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
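
As a hedged illustration of two of these remedies, cross-validation and limiting model capacity, the sketch below uses scikit-learn's DecisionTreeClassifier on a synthetic dataset; the dataset and the max_depth value are arbitrary choices made for this example, not recommendations from the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, somewhat noisy classification data (arbitrary choice for illustration)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)

# A fully grown tree tends to memorize noise (low bias, high variance) ...
full_tree = DecisionTreeClassifier(random_state=0)
# ... while limiting depth is one simple way to reduce overfitting.
pruned_tree = DecisionTreeClassifier(max_depth=4, random_state=0)

for name, model in [("full tree", full_tree), ("max_depth=4", pruned_tree)]:
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```

On noisy data the shallower tree usually scores close to, or better than, the fully grown tree on the held-out folds, which is the overfitting effect described above.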

Underfitting

Underfitting occurs when a machine learning model is not able to capture the
underlying trend of the data. To avoid overfitting, the feeding of training data can
be stopped at an early stage, but then the model may not learn enough from the
training data and may fail to find the best fit for the dominant trend in the data.

In the case of underfitting, the model is not able to learn enough from the training
data; hence its accuracy is reduced and it produces unreliable predictions.

An underfitted model has high bias and low variance.


Example: We can understand underfitting from the following output of a linear
regression model:

Fig: Underfitting
How to avoid underfitting:
o By increasing the training time of the model.
o By increasing the number of features.

o Goodness of Fit
o The term "goodness of fit" is taken from statistics, and the goal of a machine
learning model is to achieve a good fit. In statistical modeling, it describes how
closely the predicted values match the true values of the dataset.
o A model with a good fit lies between an underfitted and an overfitted model;
ideally it would make predictions with zero error, but in practice this is difficult
to achieve.
o As we train our model, the errors on the training data go down, and at first the
same happens on the test data. But if we train the model for too long, its
performance may decrease due to overfitting, as the model also learns the noise
present in the dataset. The errors on the test dataset then start increasing, so the
point just before the test error begins to rise is the good point, and we can stop
there to achieve a good model.
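
The "good point" described above can be illustrated by growing a decision tree deeper and deeper and watching training and test error diverge; this is a small sketch using scikit-learn on synthetic data (the dataset and depth range are arbitrary illustrative choices, not part of the original notes).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.15, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Training error keeps falling as depth grows; test error typically falls,
# then rises again once the tree starts fitting noise (overfitting).
for depth in range(1, 16):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
    train_err = 1 - tree.score(X_train, y_train)
    test_err = 1 - tree.score(X_test, y_test)
    print(f"depth={depth:2d}  train error={train_err:.2f}  test error={test_err:.2f}")
```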
