Unit 3: Techniques
KCS 055
Decision Tree
• A decision tree in machine learning is a flowchart-like structure in which each internal node represents a test on an attribute, and each branch represents an outcome of that test.
• Each end node, called a leaf node, represents a class label.
• Decision tree learning is a supervised learning method.
• It is used for both classification and regression (a small example follows below).
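As a quick illustration (not from the slides), here is a minimal scikit-learn sketch; the toy encoding and data are assumptions:

```python
# Minimal sketch (illustrative): a decision tree classifier on a toy dataset.
from sklearn.tree import DecisionTreeClassifier

# Toy data: [outlook, humidity] encoded as integers, label = play (1) / don't play (0)
X = [[0, 1], [0, 0], [1, 1], [1, 0], [2, 1], [2, 0]]
y = [0, 1, 1, 1, 0, 1]

clf = DecisionTreeClassifier(criterion="entropy")  # entropy-based (information gain) splits
clf.fit(X, y)
print(clf.predict([[0, 0]]))  # predict the class of a new instance
```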
Decision Tree Learning
E(x) = − Σᵢ₌₁ⁿ P(xᵢ) · log₂ P(xᵢ)

• In other words, entropy is a measure of the randomness (information content) of a variable.
• It is a measure of the uncertainty associated with a random variable x.
Calculate the entropy E of a single attribute for the "Playing Golf" problem.

E(S) = − Σᵢ₌₁ⁿ pᵢ · log₂ pᵢ

where S = current state, pᵢ = probability of event i in state S.

Playing Golf:  Yes = 9,  No = 5

P(Play Golf = Yes) = 9/14 = 0.64
P(Play Golf = No)  = 5/14 = 0.36
Entropy(Playing Golf) = Entropy(9, 5) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) = 0.94
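A minimal Python sketch (illustrative; the helper name is an assumption) that reproduces this calculation:

```python
# Minimal sketch: entropy of the Playing Golf target (9 Yes, 5 No).
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 2))  # -> 0.94
```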
IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

Attribute 1: Outlook
IG(S, Outlook) = 0.94 − (5/14)·E(Sunny) − (4/14)·E(Overcast) − (5/14)·E(Rain)
Attribute 2: Temp.

Temp.     Yes   No
Hot
Mild
Cool

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S, Temp) = 0.94 − P(Hot)·E(Hot) − P(Mild)·E(Mild) − P(Cool)·E(Cool)
Attribute 3: Humidity

Humidity   Yes   No
High
Normal

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S, Humidity) = 0.94 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 4: Wind

Wind     Yes   No
Weak
Strong

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S, Wind) = 0.94 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(Outlook)  = 0.2464
• IG(Temp)     = 0.0289
• IG(Humidity) = 0.1316
• IG(Wind)     = 0.0478
Since Outlook has the highest information gain, it is selected as the root node.
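A sketch (illustrative) that reproduces IG(Outlook); the Overcast counts (4 Yes, 0 No) are assumed from the standard Playing Golf data, while the Sunny and Rain counts and the totals come from the slides:

```python
# Minimal sketch: information gain of Outlook on the Playing Golf data.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, subsets):
    n = sum(parent_counts)
    remainder = sum(sum(s) / n * entropy(s) for s in subsets)
    return entropy(parent_counts) - remainder

# Outlook splits S = (9 Yes, 5 No) into Sunny (2,3), Overcast (4,0), Rain (3,2)
print(round(info_gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 4))
# ≈ 0.2467 (the slides report 0.2464 using rounded intermediate values)
```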
Information Gain of each attribute w.r.t. Sunny

S_sunny = [2+, 3−]
Entropy(S_sunny) = −(2/5)·log₂(2/5) − (3/5)·log₂(3/5) = 0.97
Attribute 1: Temp.

Temp.     Yes   No
Hot        0    2
Mild       1    1
Cool       1    0

Entropy(S_sunny) = E(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S_sunny, Temp) = 0.97 − (2/5)·E(Hot) − (2/5)·E(Mild) − (1/5)·E(Cool)
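Substituting the per-value entropies from the table above (a worked check, not in the slides):

E(Hot) = 0 (0 Yes, 2 No),  E(Mild) = 1 (1 Yes, 1 No),  E(Cool) = 0 (1 Yes, 0 No)
IG(S_sunny, Temp) = 0.97 − (2/5)·0 − (2/5)·1 − (1/5)·0 = 0.97 − 0.40 = 0.57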
Attribute 2: Humidity

Humidity   Yes   No
High
Normal

Entropy(S_sunny) = E(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S_sunny, Humidity) = 0.97 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 3: Wind

Wind     Yes   No
Weak
Strong

Entropy(S_sunny) = E(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S_sunny, Wind) = 0.97 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(S_sunny, Temp)     = 0.570
• IG(S_sunny, Humidity) = 0.97
Since Humidity has the highest information gain (a perfect split), it becomes the next decision node under the Sunny branch, with Yes and No leaf nodes.
Information Gain of each attribute w.r.t. Rain
S_rain = [3+, 2−]
Entropy(S_rain) = −(3/5)·log₂(3/5) − (2/5)·log₂(2/5) = 0.97
Attribute 1: Temp.

Temp.     Yes   No
Hot        0    0
Mild       2    1
Cool       1    1

Entropy(S_rain) = E(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S_rain, Temp) = 0.97 − (0/5)·E(Hot) − (3/5)·E(Mild) − (2/5)·E(Cool)
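Substituting the per-value entropies from the table above (a worked check, not in the slides):

E(Mild) = 0.918 (2 Yes, 1 No),  E(Cool) = 1 (1 Yes, 1 No)
IG(S_rain, Temp) = 0.97 − (3/5)·0.918 − (2/5)·1 = 0.97 − 0.551 − 0.40 ≈ 0.019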
Attribute 2: Humidity

Humidity   Yes   No
High
Normal

Entropy(S_rain) = E(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S_rain, Humidity) = 0.97 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 3: Wind

Wind     Yes   No
Weak
Strong

Entropy(S_rain) = E(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

IG(S_rain, Wind) = 0.97 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(S_rain, Temp) = 0.0192
(Figure: the completed decision tree for the Playing Golf problem, with leaf nodes No, Yes, Yes, No.)
General Decision Tree Algorithm Steps
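As an illustrative stand-in for these steps, here is a minimal ID3-style sketch in Python (all function and variable names are assumptions), built on the entropy and information-gain calculations above:

```python
# Minimal ID3-style sketch: pick the attribute with the highest information gain,
# split the data on it, and recurse until each subset is pure.
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                       # pure node -> leaf
        return labels[0]
    if not attrs:                                   # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree
```

Run on the Playing Golf data, this procedure selects Outlook at the root, matching the information-gain comparison above.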
• Inductive Learning
• Deductive Learning

Types of Inductive Bias:
• Restriction Bias – based on conditions.
• Preference Bias – based on priorities; according to Occam's Razor, prefer the simplest hypothesis that fits the data.
Issues in Decision Tree Learning
Lazy Learners
Example: Euclidean distance from a query point (3, 7) to two stored instances (7, 7) and (7, 4):

d₁ = √((3 − 7)² + (7 − 7)²) = √(16 + 0) = √16 = 4
d₂ = √((3 − 7)² + (7 − 4)²) = √(16 + 9) = √25 = 5
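A minimal sketch (illustrative) of the Euclidean distance a lazy learner such as k-NN uses, reproducing d₁ and d₂:

```python
# Minimal sketch: Euclidean distance between a query point and stored instances.
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean((3, 7), (7, 7)))  # d1 = 4.0
print(euclidean((3, 7), (7, 4)))  # d2 = 5.0
```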
The target function is approximated near the query point by a linear function of the attributes:

f̂(x) = w₀ + w₁·a₁(x) + w₂·a₂(x) + … + wₙ·aₙ(x)

where,
aᵢ(x) = value of the iᵗʰ attribute of instance x
wᵢ = coefficients (weights)
Locally Weighted Linear Regression

To find the coefficients (weights), we use the gradient descent rule:

Δwⱼ = η · Σₓ (f(x) − f̂(x)) · aⱼ(x)

where,
η = learning rate constant
f(x) = target function
f̂(x) = approximation to the target function
aⱼ(x) = value of the jᵗʰ attribute of instance x
Locally Weighted Linear Regression

• Locally weighted linear regression is a supervised learning algorithm.
• It is a non-parametric algorithm.
• There is no training phase; all the work is done during the testing phase, while making predictions.
• The dataset must therefore always be available for making predictions.
• Locally weighted regression methods are a generalization of k-Nearest Neighbour.
• In locally weighted regression, an explicit local approximation to the target function is constructed for each query instance (see the sketch below).
• The local approximation may take simple forms of the target function, such as a constant, linear, or quadratic function, or use localized kernel functions.
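A minimal locally weighted linear regression sketch (illustrative; it uses a closed-form weighted least-squares solve rather than the gradient descent rule above, and all names, data, and the bandwidth `tau` are assumptions):

```python
# Minimal sketch: locally weighted linear regression.
# Training examples are weighted by a Gaussian kernel of their distance to the
# query point, then a weighted least-squares line is fitted locally.
import numpy as np

def lwlr_predict(x_query, X, y, tau=0.2):
    """Predict f(x_query) from training data (X, y) using bandwidth tau."""
    Xb = np.c_[np.ones(len(X)), X]               # add bias column
    xq = np.r_[1.0, np.atleast_1d(x_query)]      # query with bias term
    d2 = np.sum((X - x_query) ** 2, axis=1)      # squared distances to the query
    w = np.exp(-d2 / (2 * tau ** 2))             # Gaussian kernel weights
    W = np.diag(w)
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y   # weighted least squares
    return xq @ theta

# Toy usage: noisy-free samples of y = x^2, predicted locally around x = 1.0
X = np.linspace(0, 2, 21).reshape(-1, 1)
y = X.ravel() ** 2
print(lwlr_predict(np.array([1.0]), X, y, tau=0.2))  # close to the true value 1.0
```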
Applications of Locally Weighted Linear Regression

Commonly used kernel (radial basis) functions:

• Multiquadric
  Φ(r) = (r² + c²)^(1/2), where c > 0 is a constant

• Inverse Multiquadric
  Φ(r) = 1 / (r² + c²)^(1/2)

• Gaussian Function
  Φ(r) = exp(−r² / (2σ²))
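A small sketch (illustrative) evaluating the three kernel functions listed above:

```python
# Minimal sketch: multiquadric, inverse multiquadric, and Gaussian kernels.
import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r ** 2 + c ** 2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r ** 2 + c ** 2)

def gaussian(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))

r = np.array([0.0, 0.5, 1.0, 2.0])
print(multiquadric(r), inverse_multiquadric(r), gaussian(r))
```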
Radial Basis Function (RBF)

• Used for approximation of multivariate target functions.

f̂(x) = w₀ + Σᵤ₌₁ᵏ wᵤ · Kᵤ(d(xᵤ, x))

where,
f̂(x) = approximation of the multivariate target function
w₀ = initial (bias) weight
wᵤ = weight of unit u
Kᵤ(d(xᵤ, x)) = kernel function
d(xᵤ, x) = distance between xᵤ and x
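A minimal sketch (illustrative; the centres, weights, and σ are assumptions) of evaluating this RBF approximation with Gaussian kernels:

```python
# Minimal sketch: f_hat(x) = w0 + sum_u w_u * K(d(x_u, x)) with Gaussian kernels.
import numpy as np

def rbf_approx(x, centres, weights, w0, sigma=1.0):
    d = np.linalg.norm(centres - x, axis=1)            # distances d(x_u, x)
    return w0 + np.sum(weights * np.exp(-d ** 2 / (2 * sigma ** 2)))

centres = np.array([[0.0], [1.0], [2.0]])              # x_u: the kernel centres
weights = np.array([0.5, -0.2, 0.8])                   # w_u: one weight per centre
print(rbf_approx(np.array([1.5]), centres, weights, w0=0.1))
```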
Radial Basis Function (RBF) Networks

• Used in Artificial Neural Networks (ANN).
• Used for classification tasks in ANN.
• Also commonly used in ANN for function approximation.
• RBF networks differ from simple ANNs in their universal approximation property and faster training speed.
• They are feed-forward neural networks.
• They consist of 3 layers: an input layer, a middle (hidden) layer, and an output layer.
Case Based Learning or Case Based Reasoning (CBR)

The distance between the target case t and a stored case s is the weighted sum of attribute differences:

D(t, s) = Σᵢ₌₁ⁿ |tᵢ − sᵢ| · wᵢ

Example 1 (equal attribute weights):
D₄(t, s₄) = |3 − 0|·1 + |1 − (−1)|·1 + |2 − 0|·1 = 3 + 2 + 2 = 7
The minimum distance is D₃(t, s₃) = 1.
Therefore the best-fit credit score for Case 5 is 4.

CBR Example (weighted attributes):
D₄(t, s₄) = |3 − 0|·1 + |1 − (−1)|·3 + |2 − 0|·1 = 3 + 6 + 2 = 11
The minimum distance is D₂(t, s₂) = 2.
Therefore the best-fit credit score for Case 5 is 2.
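A minimal sketch (illustrative) of retrieving the nearest case by weighted distance; the case base below is hypothetical, and only the target case and weights follow the first example:

```python
# Minimal sketch: case retrieval by weighted distance; the nearest stored case
# supplies the predicted credit score for the new case.
def weighted_distance(target, source, weights):
    return sum(abs(t - s) * w for t, s, w in zip(target, source, weights))

# Hypothetical case base: attribute vectors with an associated credit score
cases = [
    {"attrs": (1, 1, 1), "score": 3},
    {"attrs": (2, 0, 2), "score": 2},
    {"attrs": (3, 1, 3), "score": 4},
    {"attrs": (0, -1, 0), "score": 1},
]
target = (3, 1, 2)      # new case (Case 5)
weights = (1, 1, 1)     # equal attribute weights

best = min(cases, key=lambda c: weighted_distance(target, c["attrs"], weights))
print(best["score"])    # -> 4 for this toy case base (same conclusion as Example 1)
```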
Lazy Learning vs Eager Learning