Unit II Part 1

Decision Trees

Decision trees are one of the most popular approaches for representing classifiers. A tree contains three kinds of nodes: the root node, internal nodes, and leaf nodes. The root and internal nodes hold attribute test conditions that separate records with different characteristics; each branch corresponds to one of the possible values of the tested attribute.

Root node
• Has no incoming edges.

Internal node
• Has exactly one incoming edge and two or more outgoing edges; it tests an attribute.

Leaf node
• Has one incoming edge and no outgoing edges; it assigns a classification.
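As a concrete illustration (my own sketch, not from the notes), such a node structure could be represented in Python roughly as follows; the class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class TreeNode:
    """One node of a decision tree.

    Root and internal nodes store the attribute they test plus one child per
    attribute value; leaf nodes store only the predicted class label.
    """
    attribute: Optional[str] = None     # attribute tested here (None for a leaf)
    children: Dict[Any, "TreeNode"] = field(default_factory=dict)   # value -> child node
    label: Optional[str] = None         # class label, set only for leaf nodes

    def is_leaf(self) -> bool:
        return self.label is not None
```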
Decision Trees
The tree in the slide's diagram classifies the bird Crane by testing Body Temperature and Gives Birth; the flow for a crane is:

Name   Body temperature   Gives Birth   Class
Crane  Warm               No            Non-mammal
Uses of Decision Trees
• Automated telephone calls: to route the caller to the desired department.
• Financial institutions: to predict the price of an option in either a bull or bear market using a binary decision tree.
• Marketers: to segment customers by type and predict whether a customer will buy a specific type of product.
• Medical field: to predict heart attack outcomes in chest pain patients.
• Gaming industry: to recognize movement and faces.
• Real life: selecting a flight to travel.
Advantages of Decision Trees
1. Easily interpretable by humans
2. Very fast at testing time
3. Can handle both categorical and numerical data
4. Can handle high-dimensional data and operate on large datasets
5. Can handle datasets that may have missing values
6. Do not require complex data preparation
Limitations of Decision Trees
1. Only axis-aligned splits of data items
2. Greedy, so they may not find the globally optimal tree
3. They can create overly complex models
4. Prone to overfitting
5. Need careful parameter tuning
Classification Learning
A learning algorithm performs induction on the training set to learn a model; the model is then applied to the test set (deduction) to predict the unknown class labels.

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
A decision tree uses nodes and leaves.

Example of a Decision Tree
Training Data
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Example of a Decision Tree
Model learned from the training data above (Model: Decision Tree):

Refund?
  Yes → NO
  No  → MarSt?
          Single, Divorced → TaxInc?
                               < 80K → NO
                               > 80K → YES
          Married → NO


Apply Model to Test Data

Test Data
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start at the root of the tree and follow the branch that matches the test record at each node:
1. Refund = No, so follow the No branch to the MarSt node.
2. Marital Status = Married, so follow the Married branch, which leads to the leaf NO.
3. Assign Cheat = "No" to the test record.
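For illustration only (my own sketch, not part of the slides), the same traversal can be written as nested tests in Python; the record keys are assumed names.

```python
def classify(record):
    """Hand-coded traversal of the Refund/MarSt/TaxInc tree above."""
    if record["Refund"] == "Yes":
        return "No"                       # Refund = Yes leaf
    if record["MaritalStatus"] == "Married":
        return "No"                       # Married leaf
    # Single or Divorced: test taxable income (the slides split at 80K)
    return "No" if record["TaxableIncome"] <= 80 else "Yes"

test_record = {"Refund": "No", "MaritalStatus": "Married", "TaxableIncome": 80}
print(classify(test_record))              # -> "No", i.e. Cheat = "No"
```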
When to consider Decision Trees
• Instances describable by attribute-value pairs
• Target function is discrete valued
• Disjunctive hypothesis may be required
• Possibly noisy training data
• Missing attribute values
• Examples:
– Medical diagnosis
– Credit risk analysis
– Object classification for robot manipulator (Tan 1993)

Top-Down Induction of Decision Trees
ID3 (Iterative Dichotomiser 3)
In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm and is typically used in the machine learning and natural language processing domains.

1. A ← the "best" decision attribute for the next node.
2. Assign A as the decision attribute for the node.
3. For each value of A, create a new descendant.
4. Sort the training examples to the leaf nodes according to the attribute value of the branch.
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
Pseudocode: ID3

ID3(Examples, Target_attribute, Attributes)
• Compute the information gain for all the attributes; the best attribute is the one with the highest information gain.
• Create a Root node for the tree with the best attribute.
• If all Examples belong to the same class, return the single-node tree Root with that class label.
• If Attributes (features) is empty, return the single-node tree Root with label = the most common value of Target_attribute in Examples.
• Otherwise begin:
  A ← the best attribute
  The decision attribute for Root ← A
  For each possible value vi of A:
    o Add a new branch below Root corresponding to the test A = vi
    o Let Examples_vi be the subset of Examples that have value vi for A
    o If Examples_vi is empty
      • then below this new branch add a leaf node with label = the most common value of Target_attribute in Examples
      • else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes − {A})
  End
• End
• Return Root
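The pseudocode can be turned into a short Python sketch. This is my own illustrative implementation for categorical attributes only (no pruning, no continuous attributes); names such as id3, entropy, and information_gain are assumptions, not part of the original notes.

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the target attribute over a list of example dicts."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    """Gain(S, A) = H(S) - sum over values v of (|S_v|/|S|) * H(S_v)."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    """Return a leaf label, or (best_attribute, {value: subtree})."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                      # all examples in one class -> leaf
        return labels[0]
    if not attributes:                             # no attributes left -> majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    branches = {}
    for value in {e[best] for e in examples}:      # only values present, so no empty subsets
        subset = [e for e in examples if e[best] == value]
        branches[value] = id3(subset, target, [a for a in attributes if a != best])
    return (best, branches)
```

On the Play Tennis training examples given below, this sketch selects Outlook at the root, matching the worked example that follows.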
Entropy
• S is a sample of training examples
• p+ is the proportion of positive examples in S
• p− is the proportion of negative examples in S
• Entropy measures the impurity of S:

Entropy(S) = −p+ log2 p+ − p− log2 p−
Training Examples
Day Outlook Temp. Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Creation of the Topmost Node
Solution: ID3 determines the information gain for each candidate attribute (i.e., Outlook, Temperature, Humidity, and Wind) and selects the one with the highest information gain.

Entropy H of S (the whole dataset):
S = {D1, ..., D14} = [9+, 5−], so p+ = 9/14 and p− = 5/14

H(S) = −p+ log2 p+ − p− log2 p−
H(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
Information Gain
Information gain is a statistical measure of how well a given feature F separates (discriminates) the instances of an arbitrary collection of examples S according to the target classes. |S| denotes the cardinality of S, i.e., the number of elements in the set.

Gain(S, F) = H(S) − Σ_{v ∈ Values(F)} (|S_v| / |S|) · H(S_v)

where S_v is the subset of S whose instances have value v for feature F.

Entropy is a statistical measure from information theory that characterizes the impurity of an arbitrary collection of examples S.
Information Gain for the Wind Feature
S_Weak   (Wind = Weak)   = {D1, D3, D4, D5, D8, D9, D10, D13} = [6+, 2−]
S_Strong (Wind = Strong) = {D2, D6, D7, D11, D12, D14} = [3+, 3−]

Gain(S, Wind) = H(S) − (|S_Weak| / |S|) · H(S_Weak) − (|S_Strong| / |S|) · H(S_Strong)

H(S_Weak) = −p+ log2 p+ − p− log2 p− = −(6/8) log2(6/8) − (2/8) log2(2/8)
          = 0.75(0.415) + 0.25(2) = 0.311 + 0.5 = 0.811

H(S_Strong) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 0.5(1) + 0.5(1) = 1

Gain(S, Wind) = 0.940 − [(8/14)(0.811) + (6/14)(1)] = 0.940 − [0.463 + 0.428] = 0.048

Similarly compute the gains for the other features. Information gains for the four features:
Gain(S, Wind) = 0.048;  Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151;  Gain(S, Temperature) = 0.029
Outlook has the highest information gain and is therefore the preferred feature to discriminate among the data items: the Outlook attribute provides the best prediction of the target attribute, PlayTennis, over the training examples. Therefore Outlook is selected as the decision attribute for the root node, and branches are created below the root for each of its possible values (i.e., Sunny, Overcast, and Rain).
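The quoted numbers can be checked with a small self-contained Python snippet (my own illustration; the tuple encoding of the table is an assumption, and H/gain mirror the entropy and information-gain definitions above).

```python
import math
from collections import Counter

# Play Tennis examples as (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny","Hot","High","Weak","No"),          ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),      ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),       ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),      ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),    ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),    ("Rain","Mild","High","Strong","No"),
]

def H(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def gain(rows, col):
    remainder = sum(len(sub) / len(rows) * H(sub)
                    for v in {r[col] for r in rows}
                    for sub in [[r for r in rows if r[col] == v]])
    return H(rows) - remainder

print(round(H(data), 3))                               # 0.94
for name, col in [("Outlook", 0), ("Temperature", 1), ("Humidity", 2), ("Wind", 3)]:
    print(name, round(gain(data, col), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048 (matches the slides up to rounding)
```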
Outlook
  Sunny:    {D1, D2, D8, D9, D11}   [2+, 3−]  → ?
  Overcast: {D3, D7, D12, D13}      [4+, 0−]  → Yes
  Rainy:    {D4, D5, D6, D10, D14}  [3+, 2−]  → ?

The Overcast descendant has only positive examples and therefore becomes a leaf node with classification Yes. The other two nodes (Sunny and Rainy) will be further expanded by selecting the attribute with the highest information gain relative to the new subsets of examples.
H(S_Sunny) = −(2/5) log2(2/5) − (3/5) log2(3/5) = 0.4(1.32) + 0.6(0.73) = 0.970

[Sunny, Humidity]: High 3 (0+, 3−), Normal 2 (2+, 0−)

H(S_High)   = −(3/3) log2(3/3) = 1(0) = 0
H(S_Normal) = −(2/2) log2(2/2) = 1(0) = 0

Gain(S_Sunny, Humidity) = H(S_Sunny) − (|S_High| / |S_Sunny|) · H(S_High) − (|S_Normal| / |S_Sunny|) · H(S_Normal)
Gain(S_Sunny, Humidity) = 0.970 − (3/5)(0) − (2/5)(0) = 0.970
[Sunny, Temperature]: Hot 2 (0+, 2−), Mild 2 (1+, 1−), Cool 1 (1+, 0−)

H(S_Hot)  = −(2/2) log2(2/2) = 1(0) = 0
H(S_Mild) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 0.5(1) + 0.5(1) = 1
H(S_Cool) = −(1/1) log2(1/1) = 1(0) = 0

Gain(S_Sunny, Temperature) = H(S_Sunny) − (|S_Hot| / |S_Sunny|) · H(S_Hot) − (|S_Mild| / |S_Sunny|) · H(S_Mild) − (|S_Cool| / |S_Sunny|) · H(S_Cool)
Gain(S_Sunny, Temperature) = 0.970 − (2/5)(0) − (2/5)(1) − (1/5)(0) = 0.57
[Sunny, Wind]: Strong 2 (1+, 1−), Weak 3 (2+, 1−)

H(S_Strong) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1
H(S_Weak)   = −(2/3) log2(2/3) − (1/3) log2(1/3) = 0.667(0.585) + 0.333(1.585) = 0.918

Gain(S_Sunny, Wind) = H(S_Sunny) − (|S_Strong| / |S_Sunny|) · H(S_Strong) − (|S_Weak| / |S_Sunny|) · H(S_Weak)
Gain(S_Sunny, Wind) = 0.970 − (2/5)(1) − (3/5)(0.918) = 0.019

Gain(S_Sunny, Humidity) = 0.97;  Gain(S_Sunny, Temperature) = 0.57;  Gain(S_Sunny, Wind) = 0.019

Hence Humidity has the highest information gain and is the preferred feature to discriminate among the data items under the Sunny branch.

Outlook
  Sunny:    {D1, D2, D8, D9, D11}   [2+, 3−]  → Humidity
                High:   [0+, 3−] → No
                Normal: [2+, 0−] → Yes
  Overcast: {D3, D7, D12, D13}      [4+, 0−]  → Yes
  Rainy:    {D4, D5, D6, D10, D14}  [3+, 2−]  → ?
The process of selecting a new attribute and partitioning the training examples is repeated for each non-terminal descendant node until either:
• every attribute has already been included along this path through the tree, or
• the training examples associated with the leaf node all have the same target attribute value (i.e., their entropy is zero).
Hence, the decision tree for the concept PlayTennis is given below.

Outlook
  Sunny:    {D1, D2, D8, D9, D11}   [2+, 3−]  → Humidity
                High:   [0+, 3−] → No
                Normal: [2+, 0−] → Yes
  Overcast: {D3, D7, D12, D13}      [4+, 0−]  → Yes
  Rainy:    {D4, D5, D6, D10, D14}  [3+, 2−]  → Wind
                Strong: [0+, 2−] → No
                Weak:   [3+, 0−] → Yes
C4.5: Gain Ratio
Wind Attribute

GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)
SplitInfo(S, A) = − Σ_{i=1}^{c} (|S_i| / |S|) log2 (|S_i| / |S|)

Gain(S, Wind) = 0.048;  Gain(S, Outlook) = 0.246;  Gain(S, Humidity) = 0.151;  Gain(S, Temperature) = 0.029

S for the Weak value   = {D1, D3, D4, D5, D8, D9, D10, D13} = [6+, 2−]
S for the Strong value = {D2, D6, D7, D11, D12, D14} = [3+, 3−]
There are 8 decisions for Weak and 6 decisions for Strong.

SplitInfo(S, Wind) = −(8/14) log2(8/14) − (6/14) log2(6/14) = 0.985

GainRatio(S, Wind) = Gain(S, Wind) / SplitInfo(S, Wind) = 0.048 / 0.985 = 0.049
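A small self-contained Python sketch (my own illustration) reproduces the split information and gain ratio for Wind from the branch sizes.

```python
import math

def split_info(branch_sizes):
    """SplitInfo(S, A) = -sum_i (|S_i|/|S|) log2 (|S_i|/|S|)."""
    total = sum(branch_sizes)
    return -sum(c / total * math.log2(c / total) for c in branch_sizes)

gain_wind = 0.048                        # Gain(S, Wind) from the ID3 calculation above
si_wind = split_info([8, 6])             # 8 Weak and 6 Strong examples
print(round(si_wind, 3))                 # 0.985
print(round(gain_wind / si_wind, 3))     # 0.049
```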
C4.5: Training Examples
Day Outlook Temperature Humidity Wind Play Tennis

D1 Sunny 85 85 Weak No
D2 Sunny 80 90 Strong No
D3 Overcast 83 78 Weak Yes
D4 Rain 70 96 Weak Yes
D5 Rain 68 80 Weak Yes
D6 Rain 65 70 Strong No
D7 Overcast 64 65 Strong Yes
D8 Sunny 72 95 Weak No
D9 Sunny 69 70 Weak Yes
D10 Rain 75 80 Weak Yes
D11 Sunny 75 70 Strong Yes
D12 Overcast 72 90 Strong Yes
D13 Overcast 81 75 Weak Yes
D14 Rain 71 80 Strong No
Outlook Attribute
Gain(S, Outlook) = 0.246
S for the Sunny value    = {D1, D2, D8, D9, D11} = [2+, 3−]
S for the Overcast value = {D3, D7, D12, D13} = [4+, 0−]
S for the Rainy value    = {D4, D5, D6, D10, D14} = [3+, 2−]

SplitInfo(S, Outlook) = −(5/14) log2(5/14) − (4/14) log2(4/14) − (5/14) log2(5/14) = 1.577

GainRatio(S, Outlook) = Gain(S, Outlook) / SplitInfo(S, Outlook) = 0.246 / 1.577 = 0.155
Humidity Attribute
Humidity is a continuous attribute, so its continuous values must be converted to nominal ones. C4.5 performs a binary split based on a threshold value; the threshold should be the value that offers maximum gain for the attribute. Sort the humidity values from smallest to largest:

Day   Humidity  Play Tennis
D7    65        Yes
D6    70        No
D9    70        Yes
D11   70        Yes
D13   75        Yes
D3    78        Yes
D5    80        Yes
D10   80        Yes
D14   80        No
D1    85        No
D2    90        No
D12   90        Yes
D8    95        No
D4    96        Yes

For each candidate value, separate the dataset into two parts, the instances less than or equal to the current value and the instances greater than the current value, and calculate the gain (or gain ratio) for that split. The value giving the maximum gain is chosen as the threshold.
Step 1: Consider threshold 65
<= 65: 1 instance (1+, 0−)
>  65: 13 instances (8+, 5−)

H(S_<=65) = −(1/1) log2(1/1) − 0 = 0
H(S_>65)  = −(8/13) log2(8/13) − (5/13) log2(5/13) = 0.961

Gain(S, Humidity_65) = H(S) − (|S_<=65| / |S|) · H(S_<=65) − (|S_>65| / |S|) · H(S_>65)
Gain(S, Humidity_65) = 0.940 − (1/14)(0) − (13/14)(0.961) = 0.048

SplitInfo(S, Humidity_65) = −(1/14) log2(1/14) − (13/14) log2(13/14) = 0.371

GainRatio(S, Humidity_65) = 0.048 / 0.371 = 0.126
Step 2: Consider threshold 70
<= 70: 4 instances (3+, 1−)
>  70: 10 instances (6+, 4−)

H(S_<=70) = −(3/4) log2(3/4) − (1/4) log2(1/4) = 0.811
H(S_>70)  = −(6/10) log2(6/10) − (4/10) log2(4/10) = 0.970

Gain(S, Humidity_70) = 0.940 − (4/14)(0.811) − (10/14)(0.970) = 0.014

SplitInfo(S, Humidity_70) = −(4/14) log2(4/14) − (10/14) log2(10/14) = 0.863

GainRatio(S, Humidity_70) = 0.014 / 0.863 = 0.016
The above procedure is applied for all of the candidate thresholds:

Gain(S, Humidity_75) = 0.045;  GainRatio(S, Humidity_75) = 0.047
Gain(S, Humidity_78) = 0.090;  GainRatio(S, Humidity_78) = 0.090
Gain(S, Humidity_80) = 0.101;  GainRatio(S, Humidity_80) = 0.107
Gain(S, Humidity_85) = 0.024;  GainRatio(S, Humidity_85) = 0.027
Gain(S, Humidity_90) = 0.010;  GainRatio(S, Humidity_90) = 0.016
Gain(S, Humidity_95) = 0.048;  GainRatio(S, Humidity_95) = 0.128

Humidity cannot be greater than 96 in this data, so the value 96 is ignored. The gain is maximized at a threshold of 80.
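The threshold search can be sketched in Python as follows (my own illustration, not from the notes); it scans each distinct humidity value except the largest as a candidate <= / > split and reports the information gain.

```python
import math

def entropy(labels):
    """Binary entropy of a list of class labels."""
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n) for c in set(labels))

# (humidity, PlayTennis) pairs for D1..D14 from the C4.5 table above
humidity = [(85, "No"), (90, "No"), (78, "Yes"), (96, "Yes"), (80, "Yes"), (70, "No"),
            (65, "Yes"), (95, "No"), (70, "Yes"), (80, "Yes"), (70, "Yes"), (90, "Yes"),
            (75, "Yes"), (80, "No")]

labels = [c for _, c in humidity]
base = entropy(labels)
best = None
for t in sorted({h for h, _ in humidity})[:-1]:      # skip the largest value (96)
    left = [c for h, c in humidity if h <= t]
    right = [c for h, c in humidity if h > t]
    g = base - len(left) / 14 * entropy(left) - len(right) / 14 * entropy(right)
    print(t, round(g, 3))
    if best is None or g > best[1]:
        best = (t, g)
print("best threshold:", best[0])                    # -> 80
```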
Temperature Attribute:
The Temperature feature is also continuous, and the same procedure used for Humidity can be applied to it. The gain maximizes at a threshold of 83:

Gain(S, Temperature_83) = 0.113;  GainRatio(S, Temperature_83) = 0.305

Summarizing all of the calculations gives the following table:

Attribute                    Gain    Gain Ratio
Wind                         0.048   0.049
Outlook                      0.246   0.155
Humidity (threshold 80)      0.101   0.107
Temperature (threshold 83)   0.113   0.305

If we use the gain metric, Outlook will be the root node (ID3). If we use the gain-ratio metric, Temperature will be the root node (C4.5).
CART Algorithm
Classification And Regression Trees

ID3 uses information gain for splitting and C4.5 uses gain ratio. CART can handle both classification and regression tasks, and it uses the Gini index to create decision points for splitting.

Gini = 1 − Σ_{i=1}^{M} (P_i)^2

where i = 1, ..., M indexes the classes (M is the number of classes).
Outlook Attribute:

Outlook   Yes  No  Number of instances
Sunny     2    3   5
Overcast  4    0   4
Rain      3    2   5

Gini(Outlook=Sunny)    = 1 − (2/5)^2 − (3/5)^2 = 0.48
Gini(Outlook=Overcast) = 1 − (4/4)^2 − (0/4)^2 = 0
Gini(Outlook=Rain)     = 1 − (3/5)^2 − (2/5)^2 = 0.48

The weighted sum of the Gini index for the Outlook feature is
Gini(Outlook) = Σ_i P(Outlook_i) · Gini(Outlook_i) = (5/14)(0.48) + (4/14)(0) + (5/14)(0.48) = 0.342
Temperature Attribute:

Temperature  Yes  No  Number of instances
Hot          2    2   4
Cool         3    1   4
Mild         4    2   6

Gini(Temp=Hot)  = 1 − (2/4)^2 − (2/4)^2 = 0.5
Gini(Temp=Cool) = 1 − (3/4)^2 − (1/4)^2 = 0.375
Gini(Temp=Mild) = 1 − (4/6)^2 − (2/6)^2 = 0.445

The weighted sum of the Gini index for the Temperature feature is
Gini(Temp) = Σ_i P(Temp_i) · Gini(Temp_i) = (4/14)(0.5) + (4/14)(0.375) + (6/14)(0.445) = 0.439
Humidity Attribute:

Humidity  Yes  No  Number of instances
High      3    4   7
Normal    6    1   7

Gini(Humidity=High)   = 1 − (3/7)^2 − (4/7)^2 = 0.489
Gini(Humidity=Normal) = 1 − (6/7)^2 − (1/7)^2 = 0.244

The weighted sum of the Gini index for the Humidity feature is
Gini(Humidity) = Σ_i P(Humidity_i) · Gini(Humidity_i) = (7/14)(0.489) + (7/14)(0.244) = 0.367
Wind Attribute:

Wind    Yes  No  Number of instances
Weak    6    2   8
Strong  3    3   6

Gini(Wind=Weak)   = 1 − (6/8)^2 − (2/8)^2 = 0.375
Gini(Wind=Strong) = 1 − (3/6)^2 − (3/6)^2 = 0.5

The weighted sum of the Gini index for the Wind feature is
Gini(Wind) = Σ_i P(Wind_i) · Gini(Wind_i) = (8/14)(0.375) + (6/14)(0.5) = 0.428
The consolidated table is:

Feature       Gini index
Outlook       0.342 (lowest)
Temperature   0.439
Humidity      0.367
Wind          0.428

The winner is the Outlook feature because its cost (Gini index) is the lowest, so it becomes the root:

Outlook
  Sunny | Overcast | Rainy
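For reference, a short illustrative Python sketch (my own, not from the slides) computes the weighted Gini index of each feature from the per-value class counts above.

```python
def gini(counts):
    """Gini impurity from per-class counts, e.g. [2, 3] for 2 Yes / 3 No."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(branches):
    """Weighted Gini of a split, given [Yes, No] counts per branch."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * gini(b) for b in branches)

splits = {
    "Outlook":     [[2, 3], [4, 0], [3, 2]],   # Sunny, Overcast, Rain
    "Temperature": [[2, 2], [3, 1], [4, 2]],   # Hot, Cool, Mild
    "Humidity":    [[3, 4], [6, 1]],           # High, Normal
    "Wind":        [[6, 2], [3, 3]],           # Weak, Strong
}
for name, branches in splits.items():
    print(name, round(weighted_gini(branches), 3))
# Outlook 0.343, Temperature 0.44, Humidity 0.367, Wind 0.429 (matches the table up to rounding)
```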
Important Algorithm Types
The three main decision-tree algorithm types are ID3, C4.5, and CART.

ID3: The ID3 algorithm is considered a very simple decision tree algorithm (Quinlan, 1986). ID3 uses information gain as its splitting criterion. The growing stops when all instances belong to a single value of the target feature or when the best information gain is not greater than zero. ID3 does not apply any pruning procedures, nor does it handle numeric attributes or missing values.

C4.5: C4.5 is an evolution of ID3, presented by the same author (Quinlan, 1993). It uses gain ratio as its splitting criterion. The splitting ceases when the number of instances to be split is below a certain threshold. Error-based pruning is performed after the growing phase. C4.5 can handle numeric attributes.

CART: CART stands for Classification and Regression Trees (Breiman et al., 1984). It is characterized by the fact that it constructs binary trees, namely each internal node has exactly two outgoing edges. An important feature of CART is its ability to generate regression trees: trees whose leaves predict a real number rather than a class. It uses the Gini index to create decision points for splitting.
Comparison

Algorithm  Splitting Criteria  Attribute Type                           Missing Values   Pruning Strategy   Outlier Detection
ID3        Information Gain    Handles only categorical values          Does not handle  No pruning is done Susceptible
C4.5       Gain Ratio          Handles categorical and numerical values Handles          Pruning is used    Susceptible
CART       Gini Index          Handles categorical and numerical values Handles          Pruning is used    Can handle
Comparison
Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical or redundant for classifying instances. Pruning reduces the complexity of the final classifier and hence improves predictive accuracy by reducing overfitting.

An outlier is an object that deviates significantly from the rest of the objects.

Datasets may have missing values, and this can cause problems for many machine learning algorithms. As such, it is good practice to identify and replace missing values for each column in your input data prior to modeling your prediction task. This is called missing data imputation, or imputing for short.
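As a minimal illustration of missing-data imputation (my own example, assuming None marks a missing entry), one common approach replaces each gap with the mean of the observed values in that column:

```python
from statistics import mean

# Toy income column with two missing entries marked as None
incomes = [125, 100, 70, 120, None, 60, 220, 85, None, 90]

observed = [x for x in incomes if x is not None]
fill = mean(observed)                                     # mean imputation
imputed = [x if x is not None else fill for x in incomes]
print(imputed)
```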
EXAMPLE: Construct a Decision Tree using Information Gain

Channel      Variance  Image Type
Monochrome   Low       BW
RGB          Low       BW
RGB          High      Color
Solution 1:
Solution 2
Channel      Variance  Image Type
Monochrome   Low       BW
RGB          Low       BW
RGB          High      Color

H(S) = E = −p(BW) log2 p(BW) − p(C) log2 p(C) = −(2/3) log2(2/3) − (1/3) log2(1/3)
     = (2/3)(0.585) + (1/3)(1.585) = 0.9183

H(S_Mono) = E_Mono = −1 · log2 1 = 0
H(S_RGB)  = E_RGB  = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1

IGain(S, Channel) = H(S) − (|S_Mono| / |S|) · H(S_Mono) − (|S_RGB| / |S|) · H(S_RGB)
                  = 0.9183 − (1/3)(0) − (2/3)(1) = 0.9183 − 0.666 ≈ 0.252
Solution 2 (continued)

H(S) = 0.9183 (as above)

H(S_Low)  = E_Low  = −(2/2) log2(2/2) = 0
H(S_High) = E_High = −1 · log2 1 = 0

IGain(S, Variance) = H(S) − (|S_Low| / |S|) · H(S_Low) − (|S_High| / |S|) · H(S_High)
                   = 0.9183 − (2/3)(0) − (1/3)(0) = 0.9183
InformationGain(S, Channel) ≈ 0.252 and InformationGain(S, Variance) = 0.9183, so Variance gives the higher information gain and is chosen for the root split.
Satellite Image Classification using Decision Trees

High Resolution Images
• In high resolution satellite imagery, B4 (near infrared, NIR) is the band used for the initial binary splitting of the image, which is subsequently supported by B2.

Low Resolution Images
• In low resolution satellite imagery, the only parameter used to classify the images is the NDVI (Normalized Difference Vegetation Index).

NDVI is calculated in accordance with the formula NDVI = (NIR − RED) / (NIR + RED), where NIR is the reflection in the near-infrared spectrum and RED is the reflection in the red range of the spectrum.
SPOT satellites spectral bands and resolutions

Sensor   Spectral band                    Pixel size     Electromagnetic spectrum
SPOT 5   Panchromatic                     2.5 m or 5 m   0.48 - 0.71 µm
         B1: green                        10 m           0.50 - 0.59 µm
         B2: red                          10 m           0.61 - 0.68 µm
         B3: near infrared                10 m           0.78 - 0.89 µm
         B4: short wave infrared (SWIR)   20 m           1.58 - 1.75 µm
SPOT 4   Monospectral                     10 m           0.61 - 0.68 µm
         B1: green                        20 m           0.50 - 0.59 µm
         B2: red                          20 m           0.61 - 0.68 µm
         B3: near infrared                20 m           0.78 - 0.89 µm
         B4: short wave infrared (SWIR)   20 m           1.58 - 1.75 µm
NDVI
The Normalised Difference Vegetation Index (NDVI) (Rouse Jr. et al. 1974) was developed as
an index of plant “greenness” and attempts to track photosynthetic activity. It has since
become one of the most widely applied indices. The NDVI is a relative value and cannot be
used to compare between images taken at different times or from different sensors. NDVI
values range from -1 to +1, where higher positive values indicate the presence of greener and
healthier plants. The NDVI is widely used due to its simplicity, and several indices have been
developed to replicate or improve upon it.
NDVI = (NIR − R) / (NIR + R)
Sentinel-2: NDVI = (B8 − B4) / (B8 + B4)
Landsat 8:  NDVI = (B5 − B4) / (B5 + B4)
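A minimal sketch, assuming NumPy arrays of reflectance values (band names follow the Sentinel-2 convention above):

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED), computed per pixel."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + 1e-12)   # tiny epsilon avoids division by zero

# Toy 2x2 reflectance patches (Sentinel-2: B8 = NIR, B4 = red)
b8 = np.array([[0.45, 0.50], [0.30, 0.60]])
b4 = np.array([[0.10, 0.12], [0.25, 0.08]])
print(ndvi(b8, b4))                            # values near +1 indicate green, healthy vegetation
```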
