
Machine Learning Techniques (KCS 055)

Decision Tree
• A decision tree in machine learning is a flowchart-like structure in which each internal node represents a test on an attribute, and each branch represents an outcome of that test.
• The end nodes, called leaf nodes, represent class labels.
• Decision tree learning is a supervised learning method.
• It is used for both classification and regression.
Decision Tree Learning

Decision tree learning is a method for approximating a discrete-valued target function (concept), in which the learned function is represented by a decision tree.
Important Terminology of Decision Tree
• Root Node: It represents the entire population (dataset), which gets further divided into two or more sets.
• Splitting: The process of dividing a node into two or more sub-nodes, growing the tree.
• Decision Nodes: When a sub-node splits into further sub-nodes, it is called a decision node.
• Leaf/Terminal Node: The end nodes which do not split further are called leaf or terminal nodes.
Important Terminology of Decision Tree
• Pruning: The removal of sub-nodes to reduce the size of the tree is called pruning.
• Branch (Sub-tree): A subsection of the entire tree is called a branch or sub-tree.
• Parent and Child Nodes: A node that is divided into sub-nodes is called a parent node; the sub-nodes of a parent node are called child nodes.
Entropy
• The average amount of information contained in a random variable (x) is called entropy.

  E(x) = − Σ_{i=1..n} P(x_i) · log2 P(x_i)

• In other words, entropy is a measure of the randomness (impurity) of a variable.
• It is the measure of uncertainty associated with the random variable (x).
Calculate the entropy (E) of a single attribute: "Playing Golf" problem.

E(S) = − Σ_{i=1..n} p_i · log2 p_i
where S = current state, p_i = probability of class i in state S.

Play Golf:  Yes = 9,  No = 5
P(Play Golf = Yes) = 9/14 = 0.64
P(Play Golf = No)  = 5/14 = 0.36

Entropy(Play Golf) = Entropy(9, 5)
E(S) = − (0.36 · log2 0.36) − (0.64 · log2 0.64) = 0.94
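The entropy value above can be checked with a short Python snippet (a minimal sketch; the 9 Yes / 5 No counts come from the Playing Golf table):

import math

def entropy(counts):
    # Entropy of a label distribution given as a list of class counts
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Playing Golf: 9 "Yes" and 5 "No" examples
print(round(entropy([9, 5]), 2))   # prints 0.94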
Calculate the entropy (E) over multiple attributes: "Playing Golf" problem.

E(T, X) = Σ_{c ∈ X} P(c) · E(c)
where T = current state, X = selected attribute.

Outlook    Yes  No
Sunny       3    2
Overcast    4    0
Rain        2    3

E(Play Golf, Outlook) = P(Sunny)·E(3,2) + P(Overcast)·E(4,0) + P(Rain)·E(2,3)
E(Play Golf, Outlook) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.693
Information Gain

• Information gain is defined as the reduction (decrease) in entropy.

IG(T, X) = Entropy(T) − Entropy(T, X)
where T = current state, X = selected attribute.

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
Calculate the Information Gain

• Entropy(T) = E(Play Golf) = 0.94
• Entropy(T, X) = E(Play Golf, Outlook) = 0.693

IG(Outlook) = Entropy(T) − Entropy(T, X)
IG(Outlook) = 0.94 − 0.693 = 0.247
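The weighted entropy for Outlook and the resulting information gain can be reproduced in Python (a small sketch; the per-value Yes/No counts are taken from the Outlook table above):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (Yes, No) counts for each value of Outlook, from the table above
outlook = {"Sunny": (3, 2), "Overcast": (4, 0), "Rain": (2, 3)}
n = sum(sum(c) for c in outlook.values())                  # 14 examples in total

weighted_entropy = sum(sum(c) / n * entropy(c) for c in outlook.values())
info_gain = entropy([9, 5]) - weighted_entropy

print(round(weighted_entropy, 3), round(info_gain, 3))     # ~0.694 and ~0.247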


Attribute 1: Outlook

Outlook    Yes  No
Sunny       3    2
Overcast    4    0
Rain        2    3

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S, Outlook) = 0.94 − (5/14)·E(Sunny) − (4/14)·E(Overcast) − (5/14)·E(Rain)
Attribute 2: Temp

Temp    Yes  No
Hot
Mild
Cool

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S, Temp) = 0.94 − P(Hot)·E(Hot) − P(Mild)·E(Mild) − P(Cool)·E(Cool)
Attribute 3: Humidity

Humidity  Yes  No
High
Normal

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S, Humidity) = 0.94 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 4: Wind

Wind    Yes  No
Weak
Strong

Entropy(S) = E(Play) = 0.94

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S, Wind) = 0.94 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(Outlook)  = 0.2464  → Root Node
• IG(Temp)     = 0.0289
• IG(Humidity) = 0.1316
• IG(Wind)     = 0.0478

Since Outlook has the highest information gain, it becomes the root node:

Outlook
├── Sunny    → (D1, D2, D8, D9, D11): mixed Yes/No
├── Overcast → (D3, D7, D12, D13): Yes
└── Rain     → (D4, D5, D6, D10, D14): mixed Yes/No
Information Gain of each attribute w.r.t. Sunny

Day  Outlook  Temp  Humidity  Wind    Play
D1   Sunny    Hot   High      Weak    No
D2   Sunny    Hot   High      Strong  No
D8   Sunny    Mild  High      Weak    No
D9   Sunny    Cool  Normal    Weak    Yes
D11  Sunny    Mild  Normal    Strong  Yes

S_sunny = [2+, 3−]
Entropy(S_sunny) = −(2/5)·log2(2/5) − (3/5)·log2(3/5) = 0.97
Attribute 1: Temp

Temp  Yes  No
Hot    0    2
Mild   1    1
Cool   1    0

Entropy(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_sunny, Temp) = 0.97 − (2/5)·E(Hot) − (2/5)·E(Mild) − (1/5)·E(Cool)
Attribute 2: Humidity

Humidity  Yes  No
High
Normal

Entropy(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_sunny, Humidity) = 0.97 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 3: Wind

Wind    Yes  No
Weak
Strong

Entropy(S_sunny) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_sunny, Wind) = 0.97 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(S_sunny, Temp)     = 0.570
• IG(S_sunny, Humidity) = 0.97  → Next node
• IG(S_sunny, Wind)     = 0.0192

Humidity has the highest gain on the Sunny branch, so it becomes the next decision node:

Outlook
├── Sunny    → Humidity
│               ├── High   → No   (D1, D2, D8)
│               └── Normal → Yes  (D9, D11)
├── Overcast → Yes
└── Rain     → (still to be split)
Information Gain of each attribute w.r.t. Rain

Day  Outlook  Temp  Humidity  Wind    Play
D4   Rain     Mild  High      Weak    Yes
D5   Rain     Cool  Normal    Weak    Yes
D6   Rain     Cool  Normal    Strong  No
D10  Rain     Mild  Normal    Weak    Yes
D14  Rain     Mild  High      Strong  No

S_rain = [3+, 2−]
Entropy(S_rain) = −(3/5)·log2(3/5) − (2/5)·log2(2/5) = 0.97
Attribute 1: Temp

Temp  Yes  No
Hot    0    0
Mild   2    1
Cool   1    1

Entropy(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_rain, Temp) = 0.97 − (0/5)·E(Hot) − (3/5)·E(Mild) − (2/5)·E(Cool)
Attribute 2: Humidity

Humidity  Yes  No
High
Normal

Entropy(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_rain, Humidity) = 0.97 − P(High)·E(High) − P(Normal)·E(Normal)
Attribute 3: Wind

Wind    Yes  No
Weak
Strong

Entropy(S_rain) = 0.97

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
IG(S_rain, Wind) = 0.97 − P(Weak)·E(Weak) − P(Strong)·E(Strong)
• IG(S_rain, Temp)     = 0.0192
• IG(S_rain, Humidity) = 0.0192
• IG(S_rain, Wind)     = 0.97  → Next node

Wind has the highest gain on the Rain branch, giving the finished tree:

Outlook
├── Sunny    → Humidity
│               ├── High   → No
│               └── Normal → Yes
├── Overcast → Yes
└── Rain     → Wind
                ├── Weak   → Yes
                └── Strong → No

Classifying a query (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong):
  Sunny → Humidity = High → No
Classifying a query (Outlook = Rain, Temperature = Cool, Humidity = High, Wind = Strong):
  Rain → Wind = Strong → No
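Once the tree is fixed, classifying a query is just a walk from the root to a leaf. A minimal Python sketch encoding the finished Play Golf tree as nested dictionaries (the dictionary layout and the helper name classify are illustrative, not from the slides):

# Finished Play Golf tree from the slides, as nested dictionaries
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(node, example):
    # Walk down the tree until a leaf (a class label string) is reached
    while isinstance(node, dict):
        attribute = next(iter(node))            # e.g. "Outlook"
        node = node[attribute][example[attribute]]
    return node

query = {"Outlook": "Rain", "Temp": "Cool", "Humidity": "High", "Wind": "Strong"}
print(classify(tree, query))                    # prints "No"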
General Decision Tree Algorithm Steps

• Calculate the entropy (E) of every attribute A of the dataset S.
• Split (partition) the dataset S into subsets using the attribute for which the resulting entropy after splitting is minimized (equivalently, information gain is maximized).
• Make a decision tree node containing that attribute.
• Repeat steps 1, 2 and 3 recursively on each subset until the examples are fully classified.
Types of Decision Tree Algorithm

• Iterative Dichotomizer 3 (ID3) Algorithm
• C4.5 Algorithm
• Classification and Regression Tree (CART) Algorithm

The previous example was solved using the ID3 algorithm.

Pseudocode of the ID3 Decision Tree Algorithm

ID3(Examples, Target-Attribute, Attributes)
• Create a root node for the tree.
  – If all examples are positive, return the single-node tree Root with label = (+).
  – If all examples are negative, return the single-node tree Root with label = (−).
• Otherwise begin:
  – A ← the attribute that best classifies the examples.
  – The decision attribute for Root = A.
  – For each possible value v_i of A:
    • Add a new tree branch below Root, corresponding to the test A = v_i.
    • Let Examples(v_i) be the subset of examples that have value v_i for A.
    • If Examples(v_i) is empty:
      – Then below this new branch, add a leaf node with label = the most common target value in the examples.
      – Else below this new branch, add the subtree ID3(Examples(v_i), Target-Attribute, Attributes − {A}).
• End
• Return Root
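The pseudocode above translates fairly directly into Python. Below is a compact ID3-style sketch (an illustrative implementation, not the exact slide code); the 14-row table is the standard Play Golf dataset from which the Sunny, Overcast and Rain subsets shown earlier are drawn:

import math
from collections import Counter

# Rows are (Outlook, Temp, Humidity, Wind, Play)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),           # D1
    ("Sunny", "Hot", "High", "Strong", "No"),         # D2
    ("Overcast", "Hot", "High", "Weak", "Yes"),       # D3
    ("Rain", "Mild", "High", "Weak", "Yes"),          # D4
    ("Rain", "Cool", "Normal", "Weak", "Yes"),        # D5
    ("Rain", "Cool", "Normal", "Strong", "No"),       # D6
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),  # D7
    ("Sunny", "Mild", "High", "Weak", "No"),          # D8
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),       # D9
    ("Rain", "Mild", "Normal", "Weak", "Yes"),        # D10
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),     # D11
    ("Overcast", "Mild", "High", "Strong", "Yes"),    # D12
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),     # D13
    ("Rain", "Mild", "High", "Strong", "No"),         # D14
]
ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]

def entropy(rows):
    total = len(rows)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(r[-1] for r in rows).values())

def info_gain(rows, i):
    # Entropy(S) minus the weighted entropy of the subsets induced by column i
    subsets = {}
    for r in rows:
        subsets.setdefault(r[i], []).append(r)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:              # all examples positive or all negative
        return labels[0]
    if not attrs:                          # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, ATTRS.index(a)))
    i = ATTRS.index(best)
    node = {best: {}}
    for value in set(r[i] for r in rows):  # one branch per observed value
        subset = [r for r in rows if r[i] == value]
        node[best][value] = id3(subset, [a for a in attrs if a != best])
    return node

print(id3(DATA, ATTRS))
# Expected structure (key order may vary):
# {'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Overcast': 'Yes',
#              'Rain': {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}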
Limitations of the ID3 Decision Tree Algorithm

• ID3 does not guarantee an optimal solution.
• ID3 can overfit the training data.
• ID3 is harder to use on continuous data than on discrete data.
Inductive Bias

Remarks on the Candidate Elimination algorithm:
– Will the Candidate Elimination algorithm give us the correct hypothesis?
– What training example should the learner request next?
Inductive Bias

• Inductive Learning
  – We derive rules from given examples.
• Deductive Learning
  – Already existing rules are applied to our examples.


Inductive Bias

• Biased hypothesis space
  – Does not consider all types of training examples.
  – Example: (Sunny, Mild, Normal, Strong) = Yes
• Unbiased hypothesis space
  – The hypothesis space represents every possible set of examples.
  – Possible instances: 3 × 3 × 2 × 2 = 36
  – Possible target concepts: 2^36 (practically not possible to enumerate)
Idea of Inductive Bias

The learner generalizes beyond the observed training examples in order to classify (infer) new examples.

The notation x ≻ y means that y is inductively inferred from x.
Inductive Bias in Decision Tree Learning

• Inductive bias is the set of assumptions a learner makes.
• The inductive bias of ID3 describes the basis by which ID3 chooses one consistent decision tree over all the other possible decision trees.
Inductive Bias in Decision Tree Learning

• ID3 search strategy:
  – Prefers (selects in favor of) shorter trees over longer ones.
  – Prefers trees that place attributes with the highest information gain closest to the root.
Types of Inductive Bias

• Restrictive (restriction) bias: based on conditions that restrict the hypothesis space.
• Preference bias: based on priorities, i.e. an ordering over hypotheses.

ID3 ⇒ Preference bias
Version Space and Candidate Elimination ⇒ Restrictive bias
Why a short hypothesis?

• According to Occam's Razor:
  – Prefer the simplest hypothesis that fits the data.
Issues in Decision Tree Learning

• Avoiding overfitting of the data
  – Reduced-Error Pruning
  – Rule Post-Pruning
• Incorporating continuous-valued attributes.
• Alternative measures for selecting attributes.
• Handling training examples with missing attribute values.
• Handling attributes with differing costs.
3 Main Properties of Instance-Based Learning

• They are lazy learners.
• Classification is computed separately for each new instance.
• Instances are represented as points in an n-dimensional Euclidean space.
K-Nearest Neighbor Algorithm
KNN Algorithm
• Supervised ML Algorithm.
• Can be used for both Regression and
Classification problems.
• Lazy Learning Algorithm.
• Non-parametric learning Algorithm.
KNN Algorithm Steps

• Step 1: Load the dataset.
• Step 2: Choose the number of nearest neighbours K (any positive integer).
• Step 3: Calculate the Euclidean distance between the test data (query instance x) and each row of the training data, and record the distances in a table.
• Step 4: Sort the distance table in ascending order and choose the top K rows.
• Step 5: Assign the query point (test data) to the class that occurs most frequently among these K neighbours.
• Step 6: End.
Example
Consider the given training examples. Classify the following query as Pass or Fail:
Query → Maths = 6, MLT = 8, K = 3

Euclidean distance between two points (x1, y1) and (x2, y2):
d = √((x1 − x2)² + (y1 − y2)²)

Maths  MLT  Result
  4     3   Fail
  6     7   Pass
  7     8   Pass
  5     5   Fail
  8     8   Pass
Example
Query → Maths = 6, MLT = 8, K = 3

d1 = √((6−4)² + (8−3)²) = √(4 + 25) = √29 = 5.38
d2 = √((6−6)² + (8−7)²) = √(0 + 1)  = √1  = 1
d3 = √((6−7)² + (8−8)²) = √(1 + 0)  = √1  = 1
d4 = √((6−5)² + (8−5)²) = √(1 + 9)  = √10 = 3.16
d5 = √((6−8)² + (8−8)²) = √(4 + 0)  = √4  = 2

d2, d3 and d5 are the 3 nearest neighbours.
d2 → Pass, d3 → Pass, d5 → Pass
Therefore, when Maths = 6 and MLT = 8, Result = Pass.
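The same result can be reproduced with a short from-scratch KNN sketch in Python (a minimal illustration using the five training rows and the query above):

import math
from collections import Counter

# Training data from the example: (Maths, MLT) -> Result
train = [((4, 3), "Fail"), ((6, 7), "Pass"), ((7, 8), "Pass"),
         ((5, 5), "Fail"), ((8, 8), "Pass")]
query, k = (6, 8), 3

# Euclidean distance of every training point from the query, sorted ascending
dists = sorted((math.dist(query, x), label) for x, label in train)

# Majority vote among the k nearest neighbours
votes = Counter(label for _, label in dists[:k])
print(dists[:k])                      # three smallest: 1.0, 1.0, 2.0 (d2, d3, d5)
print(votes.most_common(1)[0][0])     # prints "Pass"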
Classify P5 in the following data using KNN, with K = 3.

d1 = √((3−7)² + (7−7)²) = √(16 + 0) = √16 = 4
d2 = √((3−7)² + (7−4)²) = √(16 + 9) = √25 = 5
d3 = √((3−3)² + (7−4)²) = √(0 + 9)  = √9  = 3
d4 = √((3−1)² + (7−4)²) = √(4 + 9)  = √13 = 3.61

The 3 nearest neighbours are d1, d3, d4.
P1 → Bad, P3 → Good, P4 → Good
Therefore, P5 belongs to the class Good.
[Exercise: classify a new sample with Wt = 57 and Ht = 170 using KNN, with K = 3 or 5.]

Here male is denoted with the numeric value 0 and female with 1. Find in which class of people Angelina will lie, whose K factor is 3 and age is 5.

Name     Age  Gender  Class
Ajay      32    0     Football
Mark      40    0     Neither
Saira     16    1     Cricket
Zara      34    1     Cricket
Sachin    55    0     Neither
Rahul     40    0     Cricket
Pooja     20    1     Neither
Smith     15    0     Cricket
Laxmi     55    1     Football
Michael   15    0     Football
KNN Applications

• Credit rating
• Stock price prediction
• Pattern recognition
• Data preprocessing
• Loan approval
• Computer vision
Advantages of K-NN

• Very easy to implement for classification tasks.
• No training phase is required before making predictions.
• Only 2 parameters are required in KNN: the integer K and the distance function d.
• A variety of distance functions are available, e.g. Euclidean, Manhattan, Minkowski.
• KNN can be used for both classification and regression tasks.
Disadvantages of K-NN

• The choice of K strongly affects the results.
• Distance computation can be expensive.
• Does not scale well to large datasets.
• Needs feature scaling.
• Sensitive to noisy data, missing values and outliers.
• Needs a large amount of memory, since all training data must be stored.
Implementation of KNN (scikit-learn)

from sklearn.neighbors import KNeighborsClassifier

# K = 8 nearest neighbours
classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(x_train, y_train)
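A complete end-to-end usage sketch of the same classifier, shown on scikit-learn's built-in Iris dataset purely for illustration (the dataset choice and the train/test split are assumptions, not part of the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data: Iris features and labels, split into train and test sets
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(x_train, y_train)
print(classifier.score(x_test, y_test))   # mean accuracy on the test split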
Locally Weighted Regression

• Locally: the function f is approximated based only on data points that are local (near) to the query point.
• Weighted: each training example is weighted by its distance from the query point x_q.
• Regression: means approximation of a real-valued target function.
Locally Weighted Regression

• Locally weighted regression approximates a real-valued target function f(x) over a local region near the query point x_q. It uses distance-weighted training examples to form a local approximation to the target function f(x).
Locally Weighted Linear Regression

Approximation of the target function using a linear function:

f̂(x) = w0 + w1·a1(x) + w2·a2(x) + … + wn·an(x)

where
a_i(x) = value of the i-th attribute of instance x
w_i = weight coefficients
Locally Weighted Linear Regression

To find the weight coefficients, we use the gradient descent rule:

Δw_j = η · Σ_x (f(x) − f̂(x)) · a_j(x)

where
η = learning rate constant
f(x) = target function
f̂(x) = approximation to the target function
a_j(x) = value of the j-th attribute of instance x
Locally Weighted Linear Regression
• Locally weighted linear regression is a supervised learning algorithm.
• It is a non-parametric algorithm.
• There is no training phase; all the work is done during the testing phase, while making predictions.
• The dataset must always be available to make predictions.
• Locally weighted regression methods are a generalization of k-Nearest Neighbour.
• In locally weighted regression, an explicit local approximation of the target function is constructed for each query instance, as sketched below.
• The local approximation can take forms such as constant, linear, or quadratic functions, often combined with localized kernel functions.
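A minimal NumPy sketch of locally weighted linear regression: for a query point x_q, every training example is weighted by a Gaussian kernel of its distance to x_q, and a weighted least-squares fit is solved locally (the synthetic sine data and the bandwidth tau are illustrative assumptions):

import numpy as np

def lwlr_predict(x_query, X, y, tau=0.5):
    # Design matrix with a bias column, so the local model is w0 + w1 * x
    A = np.column_stack([np.ones_like(X), X])
    a_q = np.array([1.0, x_query])

    # Gaussian kernel: examples near the query get weight close to 1
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))
    W = np.diag(w)

    # Weighted least squares: theta = (A^T W A)^-1 A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return a_q @ theta

# Illustrative data: noisy samples of a sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 60)
y = np.sin(X) + 0.1 * rng.standard_normal(60)

print(round(lwlr_predict(3.0, X, y), 2))   # close to sin(3.0) ≈ 0.14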
Applications of Locally Weighted Linear Regression

• Time series analysis
• Anomaly detection
• Robotics and control systems
Radial Basis Function (RBF)

• An RBF is a mathematical function whose value depends only on the distance from the origin (or a chosen centre).
• The RBF works by measuring the distance of a point from the centre or origin point, using the absolute value of that distance.
• It is denoted by Φ(x), with
  Φ(r) = Φ(|r|)
3 Radial Functions

• Multiquadric:
  Φ(r) = (r² + c²)^(1/2),  where c > 0 is a constant
• Inverse Multiquadric:
  Φ(r) = 1 / (r² + c²)^(1/2)
• Gaussian:
  Φ(r) = exp(−r² / (2σ²))
Radial Basis Function (RBF)

• Used for approximation of multivariate target functions:

f̂(x) = w0 + Σ_u w_u · K_u(d(x_u, x))

where
f̂(x) = approximation of the multivariate target function
w0 = initial (bias) weight
w_u = weight of unit u
K_u(d(x_u, x)) = kernel function
d(x_u, x) = distance between x_u and x
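The three radial functions and the weighted-sum approximation f̂(x) can be sketched in a few lines of NumPy (the 1-D data, the Gaussian kernel choice and the centre placement are illustrative assumptions, not from the slides):

import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r ** 2 + c ** 2)              # Φ(r) = (r² + c²)^(1/2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r ** 2 + c ** 2)        # Φ(r) = 1 / (r² + c²)^(1/2)

def gaussian(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))    # Φ(r) = exp(−r² / 2σ²)

# f_hat(x) = w0 + sum_u w_u * K_u(d(x_u, x)), fitted here by least squares
def fit_rbf(X, y, centers, sigma=1.0):
    Phi = gaussian(np.abs(X[:, None] - centers[None, :]), sigma)
    Phi = np.column_stack([np.ones(len(X)), Phi])      # bias column for w0
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbf(x, w, centers, sigma=1.0):
    phi = gaussian(np.abs(x - centers), sigma)
    return w[0] + phi @ w[1:]

# Illustrative 1-D example: approximate sin(x) with 7 Gaussian units
X = np.linspace(-3, 3, 40)
y = np.sin(X)
centers = np.linspace(-3, 3, 7)
w = fit_rbf(X, y, centers)
print(round(predict_rbf(1.0, w, centers), 2))    # close to sin(1.0) ≈ 0.84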
Radial Basis Function (RBF) Networks
• Used in Artificial Neural Networks (ANNs).
• Used for classification tasks in ANNs.
• Also commonly used in ANNs for function approximation.
• RBF networks differ from simple ANNs in their universal approximation ability and faster training speed.
• They are feed-forward neural networks.
• They consist of 3 layers: an input layer, a middle (hidden) layer and an output layer.
Case Based Learning or Case Based
Reasoning (CBR)

• Used for classification and regression.


• Process of solving new problems based on the
solutions of similar past problems.
• It is an advanced instance-based learning method
used to solve more complex problems.
• Does not use Euclidean Metric.
Steps in CBR

• Retrieve: Gather data from memory. Check any previous


solution similar to current problem.
• Reuse: Suggest a solution based on experience. Adapt it to
meet the demands of new situation.
• Revise: Evaluate the use of solution in new context.
• Retain: Store this new problem-solving method in memory
system.
Applications of CBR

• Customer service helpdesks, for diagnosing problems.
• Engineering and law, for technical design and legal rules.
• Medical science, for patient case histories and treatments.
CBR Example (Smart Software Agent)

A user calls the computer service desk and tells the assistant about a problem (for example an internet connection problem or a printer problem). The software assistant uses CBR to diagnose the problem and recommends some possible solutions.
CBR Example (Smart Software Agent)

If the customer is not satisfied, the assistant gives a new solution and stores the newly recommended solution in the database for future use.
CBR Example

Case  Monthly Income  Account Balance  Home Owner  Credit Score
 1          3               2              0            2
 2          2               1              1            2
 3          3               2              2            4
 4          0              -1              0            0
 5          3               1              2            ?

Q: Which case will the CBR system retrieve as the 'best match', if all the weights w_i = 1?
CBR Example
All weights are equal: w_i = 1

Distance: D(t, s) = Σ_i |t_i − s_i| · w_i   (target t = Case 5)

D1(t, s1) = |3−3|·1 + |1−2|·1 + |2−0|·1 = 0 + 1 + 2 = 3
D2(t, s2) = |3−2|·1 + |1−1|·1 + |2−1|·1 = 1 + 0 + 1 = 2
D3(t, s3) = |3−3|·1 + |1−2|·1 + |2−2|·1 = 0 + 1 + 0 = 1
D4(t, s4) = |3−0|·1 + |1−(−1)|·1 + |2−0|·1 = 3 + 2 + 2 = 7

The minimum distance is D3(t, s3) = 1. Therefore the best-fit credit score for Case 5 is 4.
CBR Example

Case  Monthly Income  Account Balance  Home Owner  Credit Score
 1          3               2              0            2
 2          2               1              1            2
 3          3               2              2            4
 4          0              -1              0            0
 5          3               1              2            ?

Now suppose 'Account Balance' is 3 times more important than any other feature, so the weights are w = (1, 3, 1).
CBR Example
Weights: w = (1, 3, 1)

D1(t, s1) = |3−3|·1 + |1−2|·3 + |2−0|·1 = 0 + 3 + 2 = 5
D2(t, s2) = |3−2|·1 + |1−1|·3 + |2−1|·1 = 1 + 0 + 1 = 2
D3(t, s3) = |3−3|·1 + |1−2|·3 + |2−2|·1 = 0 + 3 + 0 = 3
D4(t, s4) = |3−0|·1 + |1−(−1)|·3 + |2−0|·1 = 3 + 6 + 2 = 11

The minimum distance is D2(t, s2) = 2. Therefore the best-fit credit score for Case 5 is 2.
Lazy Learning vs Eager Learning

Lazy Learning:
• Simply stores the training data and waits until it receives test data.
• Less training time, more prediction time.
• Examples: all instance-based learning algorithms.

Eager Learning:
• Given a training set, it constructs a classification model before receiving new examples.
• More training time, less prediction time.
• Examples: Naïve Bayes, Decision Tree.
Reference Books

• Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
• Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
• Bishop, C., Pattern Recognition and Machine Learning, Berlin: Springer-Verlag.

Text Books

• Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, Machine Learning, Pearson.
• Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python.
• John Paul Mueller and Luca Massaron, Machine Learning for Dummies.
• Dr. Himanshu Sharma, Machine Learning, S.K. Kataria & Sons, 2022.
