Class 16 Decision Tree

The document discusses decision tree learning and some key concepts:
- Decision trees are a widely used method for inductive inference that can approximate discrete-valued functions. They represent learned functions as tree structures.
- The ID3 algorithm builds decision trees in a top-down manner by selecting the attribute that best splits the training examples at each step, based on information gain.
- Overfitting can occur if trees are allowed to grow too deep. Strategies like reduced-error pruning post-prune trees to avoid overfitting while maintaining accuracy on validation data.


Decision tree learning

Inductive inference with decision trees


- Inductive reasoning is a method of reasoning in which a body of observations is used to derive a general principle.
- Decision trees are one of the most widely used and practical methods of inductive inference.
- Features:
  - Method for approximating discrete-valued functions (including boolean)
  - Learned functions are represented as decision trees (or if-then-else rules)
  - Expressive hypothesis space, including disjunction
Decision tree representation (PlayTennis)

Outlook=Sunny, Temp=Hot, Humidity=High, Wind=Strong ⇒ No


Decision trees expressivity
- Decision trees represent a disjunction of conjunctions of constraints on the values of attributes:
  (Outlook = Sunny ∧ Humidity = Normal)
  ∨ (Outlook = Overcast)
  ∨ (Outlook = Rain ∧ Wind = Weak)
Decision trees representation


When to use Decision Trees
- Problem characteristics:
  - Instances can be described by attribute-value pairs
  - Target function is discrete valued
  - Disjunctive hypothesis may be required
  - Possibly noisy training data samples (robust to errors in training data)
  - Possibly missing attribute values
- Different classification problems:
  - Equipment or medical diagnosis
  - Credit risk analysis
  - Several tasks in natural language processing
Top-down induction of Decision Trees
- ID3 (Quinlan, 1986) is a basic algorithm for learning DTs
- Given a training set of examples, the algorithm for building a DT performs a search in the space of decision trees
- The construction of the tree is top-down. The algorithm is greedy.
- The fundamental question is "which attribute should be tested next? Which question gives us more information?"
  - Select the best attribute
  - A descendant node is then created for each possible value of this attribute, and the examples are partitioned according to this value
  - The process is repeated for each successor node until all the examples are classified correctly or there are no attributes left
Which attribute is the best classifier?

- A statistical property called information gain measures how well a given attribute separates the training examples
- Information gain uses the notion of entropy, commonly used in information theory
- Information gain = expected reduction of entropy
Entropy in binary classification
- Entropy measures the impurity of a collection of examples. It depends on the distribution of the random variable p.
  - S is a collection of training examples
  - p+ is the proportion of positive examples in S
  - p− is the proportion of negative examples in S

  Entropy(S) ≡ − p+ log2 p+ − p− log2 p−        [0 log2 0 = 0]

  Entropy([14+, 0−]) = − 14/14 log2 (14/14) − 0 log2 (0) = 0
  Entropy([9+, 5−]) = − 9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
  Entropy([7+, 7−]) = − 7/14 log2 (7/14) − 7/14 log2 (7/14) = 1/2 + 1/2 = 1        [log2 1/2 = −1]

  Note: the log of a number < 1 is negative, 0 ≤ p ≤ 1, 0 ≤ entropy ≤ 1
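The entropy values above can be reproduced with a short script. A minimal sketch, assuming a counts-based helper (the function name entropy is illustrative, not from the slides):

import math

def entropy(pos, neg):
    # Entropy of a binary collection with `pos` positive and `neg` negative examples.
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:              # convention from the slide: 0 * log2(0) = 0
            h -= p * math.log2(p)
    return h

print(entropy(14, 0))   # 0.0
print(entropy(9, 5))    # ~0.940
print(entropy(7, 7))    # 1.0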
Entropy
Entropy in general
- Entropy measures the amount of information in a random variable

  H(X) = − p+ log2 p+ − p− log2 p−        X = {+, −}
  for binary classification [two-valued random variable]

  H(X) = − Σ i=1..c pi log2 pi = Σ i=1..c pi log2 (1/pi)        X = {1, …, c}
  for classification in c classes

- Example: rolling a die with 8 equally probable sides
  H(X) = − Σ i=1..8 1/8 log2 1/8 = − log2 1/8 = log2 8 = 3
Entropy and information theory
- Entropy specifies the average length (in bits) of the message needed to transmit the outcome of a random variable. This depends on the probability distribution.
- An optimal-length code assigns −log2 p bits to a message with probability p. More probable messages get shorter codes.
- Example: 8-sided [unbalanced] die

  Side:         1       2       3       4       5       6       7       8
  Probability:  4/16    4/16    2/16    2/16    1/16    1/16    1/16    1/16
  Code length:  2 bits  2 bits  3 bits  3 bits  4 bits  4 bits  4 bits  4 bits

  E = (1/4 log2 4) × 2 + (1/8 log2 8) × 2 + (1/16 log2 16) × 4 = 1 + 3/4 + 1 = 2.75
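The expected code length above can be checked numerically. A minimal sketch (variable names are illustrative):

import math

# Probabilities of the 8 unbalanced die sides from the table above.
probs = [4/16, 4/16, 2/16, 2/16, 1/16, 1/16, 1/16, 1/16]

# Expected code length under an optimal code: sum of p * (-log2 p) bits.
expected_bits = sum(p * -math.log2(p) for p in probs)
print(expected_bits)   # 2.75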
Information gain as entropy reduction
- Information gain is the expected reduction in entropy caused by partitioning the examples on an attribute.
- The higher the information gain, the more effective the attribute is in classifying the training data.
- Expected reduction in entropy knowing A:

  Gain(S, A) = Entropy(S) − Σ v ∈ Values(A) (|Sv| / |S|) Entropy(Sv)

  Values(A): possible values for A
  Sv: subset of S for which A has value v
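A minimal sketch of this formula in Python, assuming examples are dicts of attribute values with a parallel list of class labels (the names entropy_of and gain are illustrative, not from the slides):

import math
from collections import Counter

def entropy_of(labels):
    # Entropy of a list of class labels (works for any number of classes).
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(examples, labels, attribute):
    # Information gain of `attribute` on `examples` (list of dicts) with matching `labels`.
    total = len(examples)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        remainder += (len(subset) / total) * entropy_of(subset)
    return entropy_of(labels) - remainder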
Example: expected information gain
- Let
  - Values(Wind) = {Weak, Strong}
  - S = [9+, 5−]
  - SWeak = [6+, 2−]
  - SStrong = [3+, 3−]
- Information gain due to knowing Wind:
  Gain(S, Wind) = Entropy(S) − 8/14 Entropy(SWeak) − 6/14 Entropy(SStrong)
                = 0.94 − 8/14 × 0.811 − 6/14 × 1.00
                = 0.048
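Using the gain() sketch above, this example can be reproduced. The dict encoding of the 14 examples is an assumption; only the Wind attribute and the class counts from the slide are used:

# Hypothetical encoding of the Wind example above (only the Wind attribute is filled in).
examples = [{"Wind": "Weak"}] * 8 + [{"Wind": "Strong"}] * 6
labels   = ["Yes"] * 6 + ["No"] * 2 + ["Yes"] * 3 + ["No"] * 3   # SWeak = [6+, 2-], SStrong = [3+, 3-]

print(round(gain(examples, labels, "Wind"), 3))   # 0.048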
Which attribute is the best classifier?
Example
First step: which attribute to test at the root?

- Which attribute should be tested at the root?
  - Gain(S, Outlook) = 0.246
  - Gain(S, Humidity) = 0.151
  - Gain(S, Wind) = 0.084
  - Gain(S, Temperature) = 0.029
- Outlook provides the best prediction for the target
- Let's grow the tree:
  - add to the tree a successor for each possible value of Outlook
  - partition the training samples according to the value of Outlook
After first step
Second step
- Working on the Outlook=Sunny node:
  Gain(SSunny, Humidity) = 0.970 − 3/5 × 0.0 − 2/5 × 0.0 = 0.970
  Gain(SSunny, Wind) = 0.970 − 2/5 × 1.0 − 3/5 × 0.918 = 0.019
  Gain(SSunny, Temp.) = 0.970 − 2/5 × 0.0 − 2/5 × 1.0 − 1/5 × 0.0 = 0.570
- Humidity provides the best prediction for the target
- Let's grow the tree:
  - add to the tree a successor for each possible value of Humidity
  - partition the training samples according to the value of Humidity
Second and third steps

[Resulting tree leaves]
{D1, D2, D8} → No    {D9, D11} → Yes    {D4, D5, D10} → Yes    {D6, D14} → No
ID3: algorithm
ID3(X, T, Attrs)        X: training examples,
                        T: target attribute (e.g. PlayTennis),
                        Attrs: other attributes, initially all attributes
Create Root node
If all X's are +, return Root with class +
If all X's are −, return Root with class −
If Attrs is empty, return Root with class the most common value of T in X
else
  A ← best attribute; decision attribute for Root ← A
  For each possible value vi of A:
    - add a new branch below Root, for the test A = vi
    - Xi ← subset of X with A = vi
    - If Xi is empty then add a new leaf with class the most common value of T in X
      else add the subtree generated by ID3(Xi, T, Attrs − {A})
return Root
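A rough Python rendering of this pseudocode, reusing the gain() helper sketched earlier. The tree representation (a leaf label, or a tuple of an attribute and a branch dictionary) is an assumption; as a simplification, branches are created only for attribute values that occur in the examples, so the empty-subset case of the pseudocode does not arise:

from collections import Counter

def id3(examples, labels, attrs):
    # All examples share one class: return that class as a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: return the most common class.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute with the highest information gain (gain() from the earlier sketch).
    best = max(attrs, key=lambda a: gain(examples, labels, a))
    branches = {}
    # Simplification: branch only on values of `best` that actually occur in the examples.
    for v in {ex[best] for ex in examples}:
        subset = [(ex, lab) for ex, lab in zip(examples, labels) if ex[best] == v]
        sub_ex, sub_lab = zip(*subset)
        branches[v] = id3(list(sub_ex), list(sub_lab), [a for a in attrs if a != best])
    return (best, branches)

A PlayTennis tree would then be built with a call like id3(examples, labels, ["Outlook", "Temperature", "Humidity", "Wind"]), assuming the examples are encoded as dicts over those attributes.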
Inductive bias in decision tree learning

- The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs that it has not encountered.
- What is the inductive bias of DT learning?
  1. Shorter trees are preferred over longer trees
  2. Prefer trees that place high-information-gain attributes close to the root
Prefer shorter hypotheses: Occam's razor
- Why prefer shorter hypotheses?
  - Arguments in favor:
    - There are fewer short hypotheses than long ones
    - If a short hypothesis fits the data, it is unlikely to be a coincidence
    - Elegance and aesthetics
  - Arguments against:
    - Not every short hypothesis is a reasonable one
- Occam's razor: "The simplest explanation is usually the best one."
Issues in decision tree learning
- Overfitting
- Extensions
  - Continuous-valued attributes
  - Alternative measures for selecting attributes
  - Handling training examples with missing attribute values
  - Handling attributes with different costs
  - Improving computational efficiency
- Most of these improvements are in C4.5 (Quinlan, 1993)
Overfitting: definition
- Building trees that “adapt too much” to the training examples may lead to “overfitting”.
- Consider the error of hypothesis h over
  - the training data: errorD(h), the empirical error
  - the entire distribution X of the data: errorX(h), the expected error
- Hypothesis h overfits the training data if there is an alternative hypothesis h' ∈ H such that
  errorD(h) < errorD(h') and
  errorX(h') < errorX(h)
  i.e. h' behaves better over unseen data
Example

New (noisy) training example D15: Sunny, Hot, Normal, Strong, No


Overfitting in decision trees

Outlook=Sunny, Temp=Hot, Humidity=Normal, Wind=Strong, PlayTennis=No

New noisy example causes splitting of second leaf node.


Overfitting in decision tree learning
Avoid overfitting in Decision Trees
- Two strategies:
  1. Stop growing the tree earlier, before perfect classification
  2. Allow the tree to overfit the data, and then post-prune the tree
- Training and validation set:
  - split the training data into two parts (training and validation) and use the validation set to assess the utility of post-pruning
  - Reduced-error pruning
  - Rule pruning
Reduced-error pruning (Quinlan 1987)
- Each node is a candidate for pruning
- Pruning consists of removing the subtree rooted at a node: the node becomes a leaf and is assigned the most common classification
- Nodes are removed only if the resulting tree performs no worse on the validation set
- Nodes are pruned iteratively: at each iteration, the node whose removal most increases accuracy on the validation set is pruned
- Pruning stops when no further pruning increases accuracy (a sketch of this loop follows this list)
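A minimal sketch of this pruning loop, assuming trees in the representation used by the id3() sketch above. Assigning the majority label among the training examples that reach the pruned node, and treating equal validation accuracy as "no worse", are assumptions of this sketch:

from collections import Counter

# A tree is either a class label (leaf) or a tuple (attribute, {value: subtree}).

def classify(tree, example):
    while isinstance(tree, tuple):
        attribute, branches = tree
        if example[attribute] not in branches:
            return None                              # unseen attribute value
        tree = branches[example[attribute]]
    return tree

def accuracy(tree, examples, labels):
    return sum(classify(tree, ex) == lab for ex, lab in zip(examples, labels)) / len(labels)

def internal_paths(tree, path=()):
    # Paths (sequences of branch values) from the root to every internal node.
    if not isinstance(tree, tuple):
        return []
    paths = [path]
    for value, sub in tree[1].items():
        paths += internal_paths(sub, path + (value,))
    return paths

def set_leaf(tree, path, label):
    # Return a copy of `tree` with the node at `path` replaced by the leaf `label`.
    if not path:
        return label
    attribute, branches = tree
    new_branches = dict(branches)
    new_branches[path[0]] = set_leaf(branches[path[0]], path[1:], label)
    return (attribute, new_branches)

def reaches(tree, path, example):
    # True if `example` is routed through the node addressed by `path`.
    for value in path:
        attribute, branches = tree
        if example.get(attribute) != value:
            return False
        tree = branches[value]
    return True

def reduced_error_prune(tree, train_ex, train_lab, val_ex, val_lab):
    # At each iteration, prune the node whose removal most helps validation accuracy.
    while True:
        best, best_acc = None, accuracy(tree, val_ex, val_lab)
        for path in internal_paths(tree):
            reached = [lab for ex, lab in zip(train_ex, train_lab) if reaches(tree, path, ex)]
            if not reached:
                continue
            candidate = set_leaf(tree, path, Counter(reached).most_common(1)[0][0])
            acc = accuracy(candidate, val_ex, val_lab)
            if acc >= best_acc:                      # performs no worse on the validation set
                best, best_acc = candidate, acc
        if best is None:                             # no pruning helps: stop
            return tree
        tree = best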
Effect of reduced error pruning
Rule post-pruning
1. Create the decision tree from the training set
2. Convert the tree into an equivalent set of rules
   - Each path corresponds to a rule
   - Each node along a path corresponds to a precondition
   - Each leaf classification corresponds to the postcondition
3. Prune (generalize) each rule by removing those preconditions whose removal improves accuracy …
   - … over the validation set
   - … over the training set with a pessimistic, statistically inspired, measure
4. Sort the rules in estimated order of accuracy, and consider them in sequence when classifying new instances
Converting to rules

(Outlook=Sunny)(Humidity=High) ⇒ (PlayTennis=No)
Why convert to rules?
- Each distinct path produces a different rule: removing a condition may be based on a local (contextual) criterion, whereas node pruning is global and affects all the rules
- In rule form, tests are not ordered and there is no bookkeeping involved when conditions (nodes) are removed
- Converting to rules improves readability for humans
Dealing with continuous-valued attributes
- So far, discrete values for attributes and for the outcome.
- Given a continuous-valued attribute A, dynamically create a new boolean attribute Ac:
  Ac = True if A < c, False otherwise
- How to determine the threshold value c?
- Example: Temperature in the PlayTennis example (see the sketch after this list)
  - Sort the examples according to Temperature
    Temperature:  40   48  |  60   72   80  |  90
    PlayTennis:   No   No  |  Yes  Yes  Yes |  No
  - Determine candidate thresholds by averaging consecutive values where there is a change in classification: (48+60)/2 = 54 and (80+90)/2 = 85
  - Evaluate candidate thresholds (attributes) according to information gain. The best is Temperature > 54. The new attribute competes with the other ones
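A minimal sketch of this threshold search, reusing entropy_of() from the earlier sketch (variable names are illustrative):

temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

# Candidate thresholds: midpoints between consecutive values where the class changes.
pairs = sorted(zip(temps, labels))
candidates = [(a + c) / 2 for (a, b), (c, d) in zip(pairs, pairs[1:]) if b != d]
print(candidates)          # [54.0, 85.0]

def gain_for_threshold(threshold):
    below = [lab for t, lab in pairs if t < threshold]
    above = [lab for t, lab in pairs if t >= threshold]
    remainder = (len(below) * entropy_of(below) + len(above) * entropy_of(above)) / len(pairs)
    return entropy_of(labels) - remainder

print(max(candidates, key=gain_for_threshold))   # 54.0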
Problems with information gain
- Natural bias of information gain: it favours attributes with many possible values.
- Consider the attribute Date in the PlayTennis example.
  - Date would have the highest information gain since it perfectly separates the training data.
  - It would be selected at the root, resulting in a very broad tree.
  - Very good on the training data, this tree would perform poorly in predicting unknown instances: overfitting.
- The problem is that the partition is too specific: too many small classes are generated.
- We need to look at alternative measures …
An alternative measure: gain ratio
  SplitInformation(S, A) ≡ − Σ i=1..c (|Si| / |S|) log2 (|Si| / |S|)

- Si are the sets obtained by partitioning S on value i of A
- SplitInformation measures the entropy of S with respect to the values of A. The more uniformly dispersed the data, the higher it is.

  GainRatio(S, A) ≡ Gain(S, A) / SplitInformation(S, A)

- GainRatio penalizes attributes that split the examples into many small classes, such as Date. Let |S| = n; Date splits the examples into n classes:
  SplitInformation(S, Date) = −[(1/n log2 1/n) + … + (1/n log2 1/n)] = −log2 1/n = log2 n
- Compare with an attribute A which splits the data into two even classes:
  SplitInformation(S, A) = −[(1/2 log2 1/2) + (1/2 log2 1/2)] = −[−1/2 − 1/2] = 1
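A minimal sketch of these two formulas, reusing gain() from the earlier sketch (function names are illustrative). Note that split_information() can be zero for an attribute with a single observed value, which is exactly the problem addressed on the next slide:

import math
from collections import Counter

def split_information(examples, attribute):
    # Entropy of the partition of the examples induced by `attribute`.
    total = len(examples)
    counts = Counter(ex[attribute] for ex in examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(examples, labels, attribute):
    # Reuses gain() from the earlier sketch.
    return gain(examples, labels, attribute) / split_information(examples, attribute)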
Adjusting gain-ratio
- Problem: SplitInformation(S, A) can be zero or very small when |Si| ≈ |S| for some value i
- To mitigate this effect, the following heuristic has been used:
  1. compute Gain for each attribute
  2. apply GainRatio only to attributes with Gain above the average
Handling incomplete training data
- How to cope with the problem that the value of some attribute may be missing?
  - Example: Blood-Test-Result in a medical diagnosis problem
- The strategy: use the other examples to guess the missing attribute value
  1. Assign the value that is most common among the training examples at the node
  2. Assign a probability to each value, based on frequencies, and assign values to the missing attribute according to this probability distribution
- Missing values in new instances to be classified are treated accordingly, and the most probable classification is chosen (C4.5)
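A minimal sketch of strategy 1, assuming missing values are encoded as None (the function name fill_missing is illustrative):

from collections import Counter

def fill_missing(examples, attribute):
    # Replace a missing value of `attribute` with the most common observed value
    # among the examples at the current node.
    observed = [ex[attribute] for ex in examples if ex.get(attribute) is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [{**ex, attribute: most_common} if ex.get(attribute) is None else ex
            for ex in examples]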
Handling attributes with different costs
- Instance attributes may have an associated cost: we would prefer decision trees that use low-cost attributes
- ID3 can be modified to take costs into account:
  1. Tan and Schlimmer (1990)
     Gain²(S, A) / Cost(A)
  2. Nunez (1988)
     (2^Gain(S, A) − 1) / (Cost(A) + 1)^w,        w ∈ [0, 1]
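A minimal sketch of the two cost-sensitive measures (function names and the default w are illustrative):

def tan_schlimmer(gain_value, cost):
    # Tan and Schlimmer (1990) cost-sensitive attribute measure.
    return gain_value ** 2 / cost

def nunez(gain_value, cost, w=0.5):
    # Nunez (1988) measure; w in [0, 1] trades off gain against cost.
    return (2 ** gain_value - 1) / (cost + 1) ** w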
Gini (impurity) Index
- The Gini index is a measure of diversity in a dataset. In other words, if we have a set in which all the elements are similar, this set has a low Gini index, and if all the elements are different, it has a large Gini index.
- For clarity, consider the following two sets of 10 colored balls (where any two balls of the same color are indistinguishable):
  - Set 1: eight red balls, two blue balls
  - Set 2: four red balls, three blue balls, two yellow balls, one green ball
- Set 1 looks more pure than set 2, because set 1 contains mostly red balls and a couple of blue ones, whereas set 2 has many different colors. Next, we devise a measure of impurity that assigns a low value to set 1 and a high value to set 2.
Gini (impurity) Index
- If we pick two random elements of the set, what is the probability that they have a different color? The two elements don't need to be distinct; we are allowed to pick the same element twice.
- P(picking two balls of different colors) = 1 − P(picking two balls of the same color)
- P(picking two balls of the same color) = P(both balls are color 1) + P(both balls are color 2) + … + P(both balls are color n)
- P(both balls are color i) = pi²
- P(picking two balls of different colors) = 1 − p1² − p2² − … − pn²
Gini (impurity) Index
- Gini impurity index:
  In a set with m elements and n classes, with ai elements belonging to the i-th class, the Gini impurity index is
  Gini = 1 − p1² − p2² − … − pn²,        where pi = ai / m
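A minimal sketch of this formula, applied to the two sets of colored balls above (the function name gini is illustrative):

def gini(counts):
    # Gini impurity of a set given its per-class counts.
    m = sum(counts)
    return 1 - sum((a / m) ** 2 for a in counts)

print(gini([8, 2]))          # Set 1: 1 - 0.64 - 0.04 = 0.32
print(gini([4, 3, 2, 1]))    # Set 2: 1 - 0.16 - 0.09 - 0.04 - 0.01 = 0.70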
Gini (impurity) Index

Split on Gender:
  1. Female: Gini = 1 − (0.2)² − (0.8)² = 0.32
  2. Male:   Gini = 1 − (0.65)² − (0.35)² = 0.45
  3. Gini_Gender = (10/30) × 0.32 + (20/30) × 0.45 = 0.406

Split on Class:
  1. IX: Gini = 1 − (0.43)² − (0.57)² = 0.49
  2. X:  Gini = 1 − (0.56)² − (0.44)² = 0.49
  3. Gini_Class = (14/30) × 0.49 + (16/30) × 0.49 = 0.49

The attribute producing the least Gini impurity index is selected for the split (see the sketch below).
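Using the gini() sketch above, the weighted Gini of the Gender split can be reproduced. The per-branch class counts (2/8 for Female, 13/7 for Male) are assumptions consistent with the proportions and group sizes on the slide:

# Weighted Gini of the Gender split, reusing gini() from the previous sketch.
gini_female = gini([2, 8])            # 0.32
gini_male   = gini([13, 7])           # ~0.455
weighted = (10 / 30) * gini_female + (20 / 30) * gini_male
print(round(weighted, 3))             # ~0.41 (the slide's 0.406 uses the rounded 0.45)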
References
- Machine Learning, Tom Mitchell, McGraw-Hill International Editions, 1997 (Chapter 3).
