
Machine Learning Techniques (KCS 055)

Decision Tree
• A decision tree in machine learning is a flowchart-like structure in which each internal node represents a test on an attribute, and each branch represents an outcome of that test.
• The end nodes, called leaf nodes, represent class labels.
• Decision tree learning is a supervised learning method.
• It is used for both classification and regression.
Decision Tree Learning

Decision tree learning is a method for approximating a discrete-valued target function (concept), in which the learned function is represented by a decision tree.
Important Terminology of Decision Tree
• Root Node: It represents the entire population (dataset), which gets further divided into two or more sets.
• Splitting: The process of dividing a node into two or more sub-nodes, growing the tree.
• Decision Nodes: When a sub-node splits into further sub-nodes, it is called a decision node.
• Leaf/Terminal Node: The end nodes which do not split further are called leaf or terminal nodes.
Important Terminology of Decision Tree
• Pruning: The removal of sub-nodes to reduce the size of the tree is called pruning.
• Branch (Sub-tree): A subsection of the entire tree is called a branch or sub-tree.
• Parent and Child Nodes: A node that is divided into sub-nodes is called a parent node; the sub-nodes of a parent node are called child nodes.
Entropy
• The average amount of information contained in a random variable (x) is called entropy.

  E(x) = − Σ_{i=1..n} P(x_i) · log2 P(x_i)

• In other words, entropy is a measure of the randomness (impurity) of a variable.
• It is the measure of uncertainty associated with the random variable (x).
Calculate the entropy (E) of a single attribute: "Playing Golf" problem.

E(S) = − Σ_{i=1..n} p_i · log2 p_i
where S = current state, p_i = probability of class i in state S.

Play Golf:  Yes = 9,  No = 5
P(Play Golf = Yes) = 9/14 = 0.64
P(Play Golf = No)  = 5/14 = 0.36

Entropy(Play Golf) = Entropy(9, 5)
E(S) = − (0.36 · log2 0.36) − (0.64 · log2 0.64) = 0.94
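The entropy value above can be checked with a short Python snippet (a minimal sketch; the 9 Yes / 5 No counts come from the Playing Golf table):

import math

def entropy(counts):
    # Entropy of a label distribution given as a list of class counts
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Playing Golf: 9 "Yes" and 5 "No" examples
print(round(entropy([9, 5]), 2))   # prints 0.94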
Calculate the entropy (E) over multiple attributes: "Playing Golf" problem.

E(T, X) = Σ_{c ∈ X} P(c) · E(c)
where T = current state, X = selected attribute.

Outlook    Yes  No
Sunny       3    2
Overcast    4    0
Rain        2    3

E(Play Golf, Outlook) = P(Sunny)·E(3,2) + P(Overcast)·E(4,0) + P(Rain)·E(2,3)
E(Play Golf, Outlook) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.693
Information Gain

• Information gain is defined as the reduction (decrease) in entropy.

IG(T, X) = Entropy(T) − Entropy(T, X)
where T = current state, X = selected attribute.

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
Calculate the Information Gain

• Entropy(T) = E(Play Golf) = 0.94
• Entropy(T, X) = E(Play Golf, Outlook) = 0.693

IG(Outlook) = Entropy(T) − Entropy(T, X)
IG(Outlook) = 0.94 − 0.693 = 0.247
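The weighted entropy for Outlook and the resulting information gain can be reproduced in Python (a small sketch; the per-value Yes/No counts are taken from the Outlook table above):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (Yes, No) counts for each value of Outlook, from the table above
outlook = {"Sunny": (3, 2), "Overcast": (4, 0), "Rain": (2, 3)}
n = sum(sum(c) for c in outlook.values())                  # 14 examples in total

weighted_entropy = sum(sum(c) / n * entropy(c) for c in outlook.values())
info_gain = entropy([9, 5]) - weighted_entropy

print(round(weighted_entropy, 3), round(info_gain, 3))     # ~0.694 and ~0.247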


Attribute 1: Outlook

Outlook    Yes  No
Sunny       3    2
Overcast    4    0
Rain        2    3

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S, Outlook) = 0.94 − (5/14)·E(Sunny) − (4/14)·E(Overcast) − (5/14)·E(Rain)
Attribute 2: Temp

Temp    Yes  No
Hot
Mild
Cool

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S, Temp) = 0.94 − P(Hot)·E(Hot) − P(Mild)·E(Mild) − P(Cool)·E(Cool)
Attribute 3: Humidity

Humidity  Yes  No
High
Normal

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S, Humidity) = 0.94 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 4: Wind

Wind    Yes  No
Weak
Strong

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S, Wind) = 0.94 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(Outlook)  = 0.2464  → Root Node
• IG(Temp)     = 0.0289
• IG(Humidity) = 0.1316
• IG(Wind)     = 0.0478

Since Outlook has the highest information gain, it becomes the root node:

Outlook
├── Sunny    → (D1, D2, D8, D9, D11): mixed Yes/No
├── Overcast → (D3, D7, D12, D13): Yes
└── Rain     → (D4, D5, D6, D10, D14): mixed Yes/No
Information Gain of each attribute w.r.t. Sunny

Day  Outlook  Temp  Humidity  Wind    Play
D1   Sunny    Hot   High      Weak    No
D2   Sunny    Hot   High      Strong  No
D8   Sunny    Mild  High      Weak    No
D9   Sunny    Cool  Normal    Weak    Yes
D11  Sunny    Mild  Normal    Strong  Yes

S_sunny = [2+, 3−]
Entropy(S_sunny) = −(2/5)·log2(2/5) − (3/5)·log2(3/5) = 0.97
Attribute 1: Temp

Temp  Yes  No
Hot    0    2
Mild   1    1
Cool   1    0

Entropy(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_sunny, Temp) = 0.97 − (2/5)·E(Hot) − (2/5)·E(Mild) − (1/5)·E(Cool)
Attribute 2: Humidity

Humidity  Yes  No
High
Normal

Entropy(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_sunny, Humidity) = 0.97 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 3: Wind

Wind    Yes  No
Weak
Strong

Entropy(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_sunny, Wind) = 0.97 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(S_sunny, Temp)     = 0.570
• IG(S_sunny, Humidity) = 0.97  → Next node
• IG(S_sunny, Wind)     = 0.0192

Humidity has the highest gain on the Sunny branch, so it becomes the next decision node:

Outlook
├── Sunny    → Humidity
│               ├── High   → No   (D1, D2, D8)
│               └── Normal → Yes  (D9, D11)
├── Overcast → Yes
└── Rain     → (still to be split)
Information Gain of each attribute w.r.t. Rain

Day  Outlook  Temp  Humidity  Wind    Play
D4   Rain     Mild  High      Weak    Yes
D5   Rain     Cool  Normal    Weak    Yes
D6   Rain     Cool  Normal    Strong  No
D10  Rain     Mild  Normal    Weak    Yes
D14  Rain     Mild  High      Strong  No

S_rain = [3+, 2−]
Entropy(S_rain) = −(3/5)·log2(3/5) − (2/5)·log2(2/5) = 0.97
Attribute 1: Temp

Temp  Yes  No
Hot    0    0
Mild   2    1
Cool   1    1

Entropy(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_rain, Temp) = 0.97 − (0/5)·E(Hot) − (3/5)·E(Mild) − (2/5)·E(Cool)
Attribute 2: Humidity

Humidity  Yes  No
High
Normal

Entropy(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_rain, Humidity) = 0.97 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 3: Wind

Wind    Yes  No
Weak
Strong

Entropy(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_rain, Wind) = 0.97 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(S_rain, Temp)     = 0.0192
• IG(S_rain, Humidity) = 0.0192
• IG(S_rain, Wind)     = 0.97  → Next node

Wind has the highest gain on the Rain branch, giving the finished tree:

Outlook
├── Sunny    → Humidity
│               ├── High   → No
│               └── Normal → Yes
├── Overcast → Yes
└── Rain     → Wind
                ├── Weak   → Yes
                └── Strong → No

Classifying a query (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong):
  Sunny → Humidity = High → No
Classifying a query (Outlook = Rain, Temperature = Cool, Humidity = High, Wind = Strong):
  Rain → Wind = Strong → No
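Once the tree is fixed, classifying a query is just a walk from the root to a leaf. A minimal Python sketch encoding the finished Play Golf tree as nested dictionaries (the dictionary layout and the helper name classify are illustrative, not from the slides):

# Finished Play Golf tree from the slides, as nested dictionaries
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(node, example):
    # Walk down the tree until a leaf (a class label string) is reached
    while isinstance(node, dict):
        attribute = next(iter(node))            # e.g. "Outlook"
        node = node[attribute][example[attribute]]
    return node

query = {"Outlook": "Rain", "Temp": "Cool", "Humidity": "High", "Wind": "Strong"}
print(classify(tree, query))                    # prints "No"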
General Decision Tree Algorithm Steps

• Calculate the entropy (E) of every attribute A of the dataset S.
• Split (partition) the dataset S into subsets using the attribute for which the resulting entropy after splitting is minimized (equivalently, information gain is maximized).
• Make a decision tree node containing that attribute.
• Repeat steps 1, 2 and 3 recursively on each subset until the examples are fully classified.
Types of Decision Tree Algorithm

• Iterative Dichotomizer 3 (ID3) Algorithm
• C4.5 Algorithm
• Classification and Regression Tree (CART) Algorithm

The previous example was solved using the ID3 algorithm.

Pseudocode of the ID3 Decision Tree Algorithm

ID3(Examples, Target-Attribute, Attributes)
• Create a root node for the tree.
  – If all examples are positive, return the single-node tree Root with label = (+).
  – If all examples are negative, return the single-node tree Root with label = (−).
• Otherwise begin:
  – A ← the attribute that best classifies the examples.
  – The decision attribute for Root = A.
  – For each possible value v_i of A:
    • Add a new tree branch below Root, corresponding to the test A = v_i.
    • Let Examples(v_i) be the subset of examples that have value v_i for A.
    • If Examples(v_i) is empty:
      – Then below this new branch, add a leaf node with label = the most common target value in the examples.
      – Else below this new branch, add the subtree ID3(Examples(v_i), Target-Attribute, Attributes − {A}).
• End
• Return Root
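The pseudocode above translates fairly directly into Python. Below is a compact ID3-style sketch (an illustrative implementation, not the exact slide code); the 14-row table is the standard Play Golf dataset from which the Sunny, Overcast and Rain subsets shown earlier are drawn:

import math
from collections import Counter

# Rows are (Outlook, Temp, Humidity, Wind, Play)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),           # D1
    ("Sunny", "Hot", "High", "Strong", "No"),         # D2
    ("Overcast", "Hot", "High", "Weak", "Yes"),       # D3
    ("Rain", "Mild", "High", "Weak", "Yes"),          # D4
    ("Rain", "Cool", "Normal", "Weak", "Yes"),        # D5
    ("Rain", "Cool", "Normal", "Strong", "No"),       # D6
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),  # D7
    ("Sunny", "Mild", "High", "Weak", "No"),          # D8
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),       # D9
    ("Rain", "Mild", "Normal", "Weak", "Yes"),        # D10
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),     # D11
    ("Overcast", "Mild", "High", "Strong", "Yes"),    # D12
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),     # D13
    ("Rain", "Mild", "High", "Strong", "No"),         # D14
]
ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]

def entropy(rows):
    total = len(rows)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(r[-1] for r in rows).values())

def info_gain(rows, i):
    # Entropy(S) minus the weighted entropy of the subsets induced by column i
    subsets = {}
    for r in rows:
        subsets.setdefault(r[i], []).append(r)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:              # all examples positive or all negative
        return labels[0]
    if not attrs:                          # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, ATTRS.index(a)))
    i = ATTRS.index(best)
    node = {best: {}}
    for value in set(r[i] for r in rows):  # one branch per observed value
        subset = [r for r in rows if r[i] == value]
        node[best][value] = id3(subset, [a for a in attrs if a != best])
    return node

print(id3(DATA, ATTRS))
# Expected structure (key order may vary):
# {'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Overcast': 'Yes',
#              'Rain': {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}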
Limitations of the ID3 Decision Tree Algorithm

• ID3 does not guarantee an optimal solution.
• ID3 can overfit the training data.
• ID3 is harder to use on continuous data than on discrete data.
Inductive Bias

Remarks on the Candidate Elimination algorithm:
– Will the Candidate Elimination algorithm give us the correct hypothesis?
– What training example should the learner request next?
Inductive Bias

• Inductive Learning
  – We derive rules from given examples.
• Deductive Learning
  – Already existing rules are applied to our examples.


Inductive Bias

• Biased hypothesis space
  – Does not consider all types of training examples.
  – Example: (Sunny, Mild, Normal, Strong) = Yes
• Unbiased hypothesis space
  – The hypothesis space represents every possible set of examples.
  – Possible instances: 3 × 3 × 2 × 2 = 36
  – Possible target concepts: 2^36 (practically not possible to enumerate)
Idea of Inductive Bias

The learner generalizes beyond the observed training examples in order to classify (infer) new examples.

The notation x ≻ y means that y is inductively inferred from x.
Inductive Bias in Decision Tree Learning

• Inductive bias is the set of assumptions a learner makes.
• The inductive bias of ID3 describes the basis by which ID3 chooses one consistent decision tree over all the other possible decision trees.
Inductive Bias in Decision Tree Learning

• ID3 search strategy:
  – Prefers (selects in favor of) shorter trees over longer ones.
  – Prefers trees that place attributes with the highest information gain closest to the root.
Types of Inductive Bias

• Restrictive (restriction) bias: based on conditions that restrict the hypothesis space.
• Preference bias: based on priorities, i.e. an ordering over hypotheses.

ID3 ⇒ Preference bias
Version Space and Candidate Elimination ⇒ Restrictive bias
Why a short hypothesis?

• According to Occam's Razor:
  – Prefer the simplest hypothesis that fits the data.
Issues in Decision Tree Learning

• Avoiding overfitting of the data
  – Reduced-Error Pruning
  – Rule Post-Pruning
• Incorporating continuous-valued attributes.
• Alternative measures for selecting attributes.
• Handling training examples with missing attribute values.
• Handling attributes with differing costs.
3 Main Properties of Instance-Based Learning

• They are lazy learners.
• Classification is computed separately for each new instance.
• Instances are represented as points in an n-dimensional Euclidean space.
K-Nearest Neighbor Algorithm
KNN Algorithm
• Supervised ML Algorithm.
• Can be used for both Regression and
Classification problems.
• Lazy Learning Algorithm.
• Non-parametric learning Algorithm.
KNN Algorithm Steps

• Step 1: Load the dataset.
• Step 2: Choose the number of nearest neighbours K (any positive integer).
• Step 3: Calculate the Euclidean distance between the test data (query instance x) and each row of the training data, and record the distances in a table.
• Step 4: Sort the distance table in ascending order and choose the top K rows.
• Step 5: Assign the query point (test data) to the class that occurs most frequently among these K neighbours.
• Step 6: End.
Example
Consider the given training examples. Classify the following query as Pass or Fail:
Query → Maths = 6, MLT = 8, K = 3

Euclidean distance between two points (x1, y1) and (x2, y2):
d = √((x1 − x2)² + (y1 − y2)²)

Maths  MLT  Result
  4     3   Fail
  6     7   Pass
  7     8   Pass
  5     5   Fail
  8     8   Pass
Example
Query → Maths = 6, MLT = 8, K = 3

d1 = √((6−4)² + (8−3)²) = √(4 + 25) = √29 = 5.38
d2 = √((6−6)² + (8−7)²) = √(0 + 1)  = √1  = 1
d3 = √((6−7)² + (8−8)²) = √(1 + 0)  = √1  = 1
d4 = √((6−5)² + (8−5)²) = √(1 + 9)  = √10 = 3.16
d5 = √((6−8)² + (8−8)²) = √(4 + 0)  = √4  = 2

d2, d3 and d5 are the 3 nearest neighbours.
d2 → Pass, d3 → Pass, d5 → Pass
Therefore, when Maths = 6 and MLT = 8, Result = Pass.
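The same result can be reproduced with a short from-scratch KNN sketch in Python (a minimal illustration using the five training rows and the query above):

import math
from collections import Counter

# Training data from the example: (Maths, MLT) -> Result
train = [((4, 3), "Fail"), ((6, 7), "Pass"), ((7, 8), "Pass"),
         ((5, 5), "Fail"), ((8, 8), "Pass")]
query, k = (6, 8), 3

# Euclidean distance of every training point from the query, sorted ascending
dists = sorted((math.dist(query, x), label) for x, label in train)

# Majority vote among the k nearest neighbours
votes = Counter(label for _, label in dists[:k])
print(dists[:k])                      # three smallest: 1.0, 1.0, 2.0 (d2, d3, d5)
print(votes.most_common(1)[0][0])     # prints "Pass"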
Classify P5 in the following data using KNN, with K = 3.

d1 = √((3−7)² + (7−7)²) = √(16 + 0) = √16 = 4
d2 = √((3−7)² + (7−4)²) = √(16 + 9) = √25 = 5
d3 = √((3−3)² + (7−4)²) = √(0 + 9)  = √9  = 3
d4 = √((3−1)² + (7−4)²) = √(4 + 9)  = √13 = 3.61

The 3 nearest neighbours are d1, d3, d4.
P1 → Bad, P3 → Good, P4 → Good
Therefore, P5 belongs to the class Good.
[Exercise: classify a new sample with Wt = 57 and Ht = 170 using KNN, with K = 3 or 5.]

Here male is denoted with the numeric value 0 and female with 1. Find in which class of people Angelina will lie, whose K factor is 3 and age is 5.

Name     Age  Gender  Class
Ajay      32    0     Football
Mark      40    0     Neither
Saira     16    1     Cricket
Zara      34    1     Cricket
Sachin    55    0     Neither
Rahul     40    0     Cricket
Pooja     20    1     Neither
Smith     15    0     Cricket
Laxmi     55    1     Football
Michael   15    0     Football
KNN Applications

• Credit rating
• Stock price prediction
• Pattern recognition
• Data preprocessing
• Loan approval
• Computer vision
Advantages of K-NN

• Very easy to implement for classification tasks.
• No training phase is required before making predictions.
• Only 2 parameters are required in KNN: the integer K and the distance function d.
• A variety of distance functions are available, e.g. Euclidean, Manhattan, Minkowski.
• KNN can be used for both classification and regression tasks.
Disadvantages of K-NN

• The choice of K strongly affects the results.
• Distance computation can be expensive.
• Does not scale well to large datasets.
• Needs feature scaling.
• Sensitive to noisy data, missing values and outliers.
• Needs a large amount of memory, since all training data must be stored.
Implementation of KNN (scikit-learn)

from sklearn.neighbors import KNeighborsClassifier

# K = 8 nearest neighbours
classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(x_train, y_train)
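A complete end-to-end usage sketch of the same classifier, shown on scikit-learn's built-in Iris dataset purely for illustration (the dataset choice and the train/test split are assumptions, not part of the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data: Iris features and labels, split into train and test sets
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(x_train, y_train)
print(classifier.score(x_test, y_test))   # mean accuracy on the test split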
Locally Weighted Regression

• Locally: the function f is approximated based only on data points that are local (near) to the query point.
• Weighted: each training example is weighted by its distance from the query point x_q.
• Regression: means approximation of a real-valued target function.
Locally Weighted Regression

• Locally weighted regression approximates a real-valued target function f(x) over a local region near the query point x_q. It uses distance-weighted training examples to form a local approximation to the target function f(x).
Locally Weighted Linear Regression

Approximation of the target function using a linear function:

f̂(x) = w0 + w1·a1(x) + w2·a2(x) + … + wn·an(x)

where
a_i(x) = value of the i-th attribute of instance x
w_i = weight coefficients
Locally Weighted Linear Regression

To find the weight coefficients, we use the gradient descent rule:

Δw_j = η · Σ_x (f(x) − f̂(x)) · a_j(x)

where
η = learning rate constant
f(x) = target function
f̂(x) = approximation to the target function
a_j(x) = value of the j-th attribute of instance x
Locally Weighted Linear Regression
• Locally weighted linear regression is a supervised learning algorithm.
• It is a non-parametric algorithm.
• There is no training phase; all the work is done during the testing phase, while making predictions.
• The dataset must always be available to make predictions.
• Locally weighted regression methods are a generalization of k-Nearest Neighbour.
• In locally weighted regression, an explicit local approximation of the target function is constructed for each query instance, as sketched below.
• The local approximation can take forms such as constant, linear, or quadratic functions, often combined with localized kernel functions.
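A minimal NumPy sketch of locally weighted linear regression: for a query point x_q, every training example is weighted by a Gaussian kernel of its distance to x_q, and a weighted least-squares fit is solved locally (the synthetic sine data and the bandwidth tau are illustrative assumptions):

import numpy as np

def lwlr_predict(x_query, X, y, tau=0.5):
    # Design matrix with a bias column, so the local model is w0 + w1 * x
    A = np.column_stack([np.ones_like(X), X])
    a_q = np.array([1.0, x_query])

    # Gaussian kernel: examples near the query get weight close to 1
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))
    W = np.diag(w)

    # Weighted least squares: theta = (A^T W A)^-1 A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return a_q @ theta

# Illustrative data: noisy samples of a sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 60)
y = np.sin(X) + 0.1 * rng.standard_normal(60)

print(round(lwlr_predict(3.0, X, y), 2))   # close to sin(3.0) ≈ 0.14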
Applications of Locally Weighted Linear Regression

• Time series analysis
• Anomaly detection
• Robotics and control systems
Radial Basis Function (RBF)

• An RBF is a mathematical function whose value depends only on the distance from the origin (or a chosen centre).
• The RBF works by measuring the distance of a point from the centre or origin point, using the absolute value of that distance.
• It is denoted by Φ(x), with
  Φ(r) = Φ(|r|)
3 Radial Functions

• Multiquadric:
  Φ(r) = (r² + c²)^(1/2),  where c > 0 is a constant
• Inverse Multiquadric:
  Φ(r) = 1 / (r² + c²)^(1/2)
• Gaussian:
  Φ(r) = exp(−r² / (2σ²))
Radial Basis Function (RBF)

• Used for approximation of multivariate target functions:

f̂(x) = w0 + Σ_u w_u · K_u(d(x_u, x))

where
f̂(x) = approximation of the multivariate target function
w0 = initial (bias) weight
w_u = weight of unit u
K_u(d(x_u, x)) = kernel function
d(x_u, x) = distance between x_u and x
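The three radial functions and the weighted-sum approximation f̂(x) can be sketched in a few lines of NumPy (the 1-D data, the Gaussian kernel choice and the centre placement are illustrative assumptions, not from the slides):

import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r ** 2 + c ** 2)              # Φ(r) = (r² + c²)^(1/2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r ** 2 + c ** 2)        # Φ(r) = 1 / (r² + c²)^(1/2)

def gaussian(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))    # Φ(r) = exp(−r² / 2σ²)

# f_hat(x) = w0 + sum_u w_u * K_u(d(x_u, x)), fitted here by least squares
def fit_rbf(X, y, centers, sigma=1.0):
    Phi = gaussian(np.abs(X[:, None] - centers[None, :]), sigma)
    Phi = np.column_stack([np.ones(len(X)), Phi])      # bias column for w0
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbf(x, w, centers, sigma=1.0):
    phi = gaussian(np.abs(x - centers), sigma)
    return w[0] + phi @ w[1:]

# Illustrative 1-D example: approximate sin(x) with 7 Gaussian units
X = np.linspace(-3, 3, 40)
y = np.sin(X)
centers = np.linspace(-3, 3, 7)
w = fit_rbf(X, y, centers)
print(round(predict_rbf(1.0, w, centers), 2))    # close to sin(1.0) ≈ 0.84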
Radial Basis Function (RBF) Networks
• Used in Artificial Neural Networks (ANNs).
• Used for classification tasks in ANNs.
• Also commonly used in ANNs for function approximation.
• RBF networks differ from simple ANNs in their universal approximation ability and faster training speed.
• They are feed-forward neural networks.
• They consist of 3 layers: an input layer, a middle (hidden) layer and an output layer.
Case Based Learning or Case Based
Reasoning (CBR)

• Used for classification and regression.


• Process of solving new problems based on the
solutions of similar past problems.
• It is an advanced instance-based learning method
used to solve more complex problems.
• Does not use Euclidean Metric.
Steps in CBR

• Retrieve: Gather data from memory. Check any previous


solution similar to current problem.
• Reuse: Suggest a solution based on experience. Adapt it to
meet the demands of new situation.
• Revise: Evaluate the use of solution in new context.
• Retain: Store this new problem-solving method in memory
system.
Applications of CBR

• Customer service helpdesks, for diagnosing problems.
• Engineering and law, for technical design and legal rules.
• Medical science, for patient case histories and treatments.
CBR Example (Smart Software Agent)

A user calls the computer service desk and tells the assistant about a problem (for example an internet connection problem or a printer problem). The software assistant uses CBR to diagnose the problem and recommends some possible solutions.
CBR Example (Smart Software Agent)

If the customer is not satisfied, the assistant gives a new solution and stores the newly recommended solution in the database for future use.
CBR Example

Case  Monthly Income  Account Balance  Home Owner  Credit Score
 1          3               2              0            2
 2          2               1              1            2
 3          3               2              2            4
 4          0              -1              0            0
 5          3               1              2            ?

Q: Which case will the CBR system retrieve as the 'best match', if all the weights w_i = 1?
CBR Example
All weights are equal: w_i = 1

Distance: D(t, s) = Σ_i |t_i − s_i| · w_i   (target t = Case 5)

D1(t, s1) = |3−3|·1 + |1−2|·1 + |2−0|·1 = 0 + 1 + 2 = 3
D2(t, s2) = |3−2|·1 + |1−1|·1 + |2−1|·1 = 1 + 0 + 1 = 2
D3(t, s3) = |3−3|·1 + |1−2|·1 + |2−2|·1 = 0 + 1 + 0 = 1
D4(t, s4) = |3−0|·1 + |1−(−1)|·1 + |2−0|·1 = 3 + 2 + 2 = 7

The minimum distance is D3(t, s3) = 1. Therefore the best-fit credit score for Case 5 is 4.
CBR Example

Case  Monthly Income  Account Balance  Home Owner  Credit Score
 1          3               2              0            2
 2          2               1              1            2
 3          3               2              2            4
 4          0              -1              0            0
 5          3               1              2            ?

Now suppose 'Account Balance' is 3 times more important than any other feature, so the weights are w = (1, 3, 1).
CBR Example
Weights: w = (1, 3, 1)

D1(t, s1) = |3−3|·1 + |1−2|·3 + |2−0|·1 = 0 + 3 + 2 = 5
D2(t, s2) = |3−2|·1 + |1−1|·3 + |2−1|·1 = 1 + 0 + 1 = 2
D3(t, s3) = |3−3|·1 + |1−2|·3 + |2−2|·1 = 0 + 3 + 0 = 3
D4(t, s4) = |3−0|·1 + |1−(−1)|·3 + |2−0|·1 = 3 + 6 + 2 = 11

The minimum distance is D2(t, s2) = 2. Therefore the best-fit credit score for Case 5 is 2.
Lazy Learning vs Eager Learning

Lazy Learning:
• Simply stores the training data and waits until it receives test data.
• Less training time, more prediction time.
• Examples: all instance-based learning algorithms.

Eager Learning:
• Given a training set, it constructs a classification model before receiving new examples.
• More training time, less prediction time.
• Examples: Naïve Bayes, Decision Tree.
Reference Books

• Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
• Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
• Bishop, C., Pattern Recognition and Machine Learning, Berlin: Springer-Verlag.

Text Books

• Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, Machine Learning, Pearson.
• Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python.
• John Paul Mueller and Luca Massaron, Machine Learning for Dummies.
• Dr. Himanshu Sharma, Machine Learning, S.K. Kataria & Sons, 2022.
