29-Naive Bayes-03-10-2024
[Figure: a training set of labelled records (Tid, Attrib1, Attrib2, Attrib3, Class) is used to learn a model, which is then applied to a test set of records with unknown class labels (Class = ?) to predict their classes.]
Classification Problem
• More precisely, a classification problem can be stated as follows:
Definition: Classification Problem
• Foundation
• Based on Bayes’ Theorem.
• Assumptions
• The classes are mutually exclusive and exhaustive.
• The attributes are independent given the class.
• Given this knowledge of the data and the classes, we are to find the most
likely classification for any unseen instance, for example:
Weekday, Winter, High, None → ???
• Bayes' Theorem: P(A|B) = P(B|A) ∗ P(A) / P(B)
Where,
• P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
• P(B|A) is the Likelihood, P(A) is the Prior probability of A, and P(B) is the probability of the evidence B.
Working Steps of Naïve Bayes' Classifier
• Consider any two posterior probabilities, namely P(Y = yᵢ | X = x)
and P(Y = yⱼ | X = x).
• If P(Y = yᵢ | X = x) > P(Y = yⱼ | X = x), then we say that yᵢ is a stronger
(more likely) class than yⱼ for the instance X = x.
Input: Given a set of k mutually exclusive and exhaustive classes C = {c₁, c₂, ..., cₖ}, which have prior
probabilities P(C₁), P(C₂), ..., P(Cₖ).
There is an n-attribute set A = {A₁, A₂, ..., Aₙ}, which for a given instance takes the values A₁ = a₁, A₂ = a₂, ..., Aₙ = aₙ.
Note: Σ pᵢ ≠ 1, because the pᵢ are not probabilities but values proportional to the posterior probabilities.
Example:
Working Steps of Naïve Bayes' Classifier
• Step 1: Calculate the prior probability for the given class labels.
• Step 2: Find the likelihood probability of each attribute value for
each class.
• Step 3: Put these values into Bayes' formula and calculate the posterior
probability.
• Step 4: See which class has the highest posterior probability; the input
is assigned to that class.
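The following is a minimal sketch of these four steps in Python, using frequency counts for the priors and likelihoods; the function and variable names are illustrative, not from any particular library.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Steps 1 and 2: estimate prior and likelihood probabilities from frequency counts."""
    n = len(labels)
    priors = {c: cnt / n for c, cnt in Counter(labels).items()}
    class_sizes = Counter(labels)
    # likelihoods[class][attribute_index][value] = P(Aj = value | class)
    likelihoods = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(records, labels):
        for j, value in enumerate(row):
            likelihoods[c][j][value] += 1
    for c in likelihoods:
        for j in likelihoods[c]:
            for value in likelihoods[c][j]:
                likelihoods[c][j][value] /= class_sizes[c]
    return priors, likelihoods

def classify(instance, priors, likelihoods):
    """Steps 3 and 4: compute a posterior score for each class and pick the largest."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(instance):
            # an attribute value unseen for this class gets probability 0 here
            # (the m-estimate discussed later addresses exactly this problem)
            score *= likelihoods[c][j].get(value, 0.0)
        scores[c] = score
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()} if total > 0 else scores
    return max(posteriors, key=posteriors.get), posteriors

# Illustrative usage with toy categorical data:
# priors, lk = train_naive_bayes([["Sunny", "Hot"], ["Rain", "Cool"]], ["No", "Yes"])
# classify(["Sunny", "Cool"], priors, lk)
```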
Example Cont’d
• Learning Phase (frequency tables)
P(Yes | outlook = sunny, temperature = cool, humidity = high, wind = strong)
= P(sunny|Yes) ∗ P(cool|Yes) ∗ P(high|Yes) ∗ P(strong|Yes) ∗ P(Yes) /
[P((sunny, cool, high, strong) | Yes) ∗ P(Yes) + P((sunny, cool, high, strong) | No) ∗ P(No)]
= .0051 / (.0051 + .0207)
= .1977
Example Cont’d
• Similarly,
P(No | (sunny, cool, high, strong))
= P((sunny, cool, high, strong) | No) ∗ P(No) / P(sunny, cool, high, strong)
= .0207 / (.0051 + .0207)
= .8023
• Apply the MAP (Maximum A Posteriori) rule:
The posterior probability is about 20% for playing tennis under the described conditions and about
80% for not playing tennis under these conditions; therefore the prediction
is that no tennis will be played if the day is like these conditions.
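The normalization above can be checked with a couple of lines of Python, reusing the two unnormalized scores from this example:

```python
# Unnormalized scores from the example: P(x | Yes)*P(Yes) and P(x | No)*P(No)
score_yes, score_no = 0.0051, 0.0207
evidence = score_yes + score_no            # P(sunny, cool, high, strong)
print(score_yes / evidence)                # ≈ 0.1977 -> P(Yes | x)
print(score_no / evidence)                 # ≈ 0.8023 -> P(No | x)
```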
Car theft Example
Data set
Attributes are Color, Type, and Origin; the target attribute, Stolen, can be either Yes or No.
P(Yes | (Red, Domestic, SUV))
= P(Red|Yes) ∗ P(Domestic|Yes) ∗ P(SUV|Yes) ∗ P(Yes) /
[P((Red, Domestic, SUV) | Yes) ∗ P(Yes) + P((Red, Domestic, SUV) | No) ∗ P(No)]
= (.6 ∗ .5 ∗ .2) ∗ .5 / [(.6 ∗ .5 ∗ .2) ∗ .5 + (.4 ∗ .6 ∗ .6) ∗ .5]
≈ 0.294
P(No | (Red, Domestic, SUV))
= P(Red|No) ∗ P(Domestic|No) ∗ P(SUV|No) ∗ P(No) /
[P((Red, Domestic, SUV) | Yes) ∗ P(Yes) + P((Red, Domestic, SUV) | No) ∗ P(No)]
= (.4 ∗ .6 ∗ .6) ∗ .5 / [(.6 ∗ .5 ∗ .2) ∗ .5 + (.4 ∗ .6 ∗ .6) ∗ .5]
≈ 0.706
Since P(No | (Red, Domestic, SUV)) > P(Yes | (Red, Domestic, SUV)), the car is classified as not stolen.
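The same calculation can be sketched in Python, using the conditional probabilities and priors quoted above:

```python
priors = {"Yes": 0.5, "No": 0.5}
# P(Red|class), P(Domestic|class), P(SUV|class), taken from the example above
likelihoods = {"Yes": [0.6, 0.5, 0.2], "No": [0.4, 0.6, 0.6]}

scores = {c: priors[c] * likelihoods[c][0] * likelihoods[c][1] * likelihoods[c][2]
          for c in priors}
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}
print(posteriors)                             # Yes ≈ 0.294, No ≈ 0.706
print(max(posteriors, key=posteriors.get))    # 'No' -> predicted class: not stolen
```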
Instance to classify: Blood Pressure = High, Weight = Above average, Family History = Yes, Age = 50+, Diabetes = ?
• P(Yes | (BP = High, Weight = Above average, Family history = Yes, Age = 50+))
= (.44 ∗ .78 ∗ .67 ∗ .78) ∗ .45 / [(.44 ∗ .78 ∗ .67 ∗ .78) ∗ .45 + (.36 ∗ .36 ∗ .36 ∗ .27) ∗ .55]
= .080 / .0869
= .920
• Similarly,
• P(No | (BP = High, Weight = Above average, Family history = Yes, Age = 50+))
= (.36 ∗ .36 ∗ .36 ∗ .27) ∗ .55 / [(.44 ∗ .78 ∗ .67 ∗ .78) ∗ .45 + (.36 ∗ .36 ∗ .36 ∗ .27) ∗ .55]
= .0069 / .0869
= .080
Problem 4
Day Season Fog Rain Class
Weekday Spring None None On Time
(a sample row of the 20-record training set)
Class: On Time / Late / Very Late / Cancelled
Prior Probability: 14/20 = 0.70 / 2/20 = 0.10 / 3/20 = 0.15 / 1/20 = 0.05
Problem 4 Cont’d
Instance:
Case3: Class = Very Late : 0.15 × 1.0 × 0.67 × 0.33 × 0.67 = 0.0222
• The Naïve Bayes’ approach is a very popular one, which often works well.
• In real-life situations, all attributes are not necessarily categorical; in fact, there is often a mix of both
categorical and continuous attributes.
• In the following, we discuss the schemes to deal with continuous attributes in Bayesian classifier.
1. We can discretize each continuous attribute and then replace the continuous values with their
corresponding discrete intervals (a small binning sketch is given after this list).
2. We can assume a certain form of probability distribution for the continuous variable and estimate the
parameters of the distribution using the training data. A Gaussian distribution is usually chosen to
represent the class-conditional probabilities for continuous attributes. The general form of the Gaussian
distribution is
P(x : μ, σ²) = (1 / (√(2π) ∗ σ)) ∗ e^(−(x − μ)² / (2σ²))
where μ and σ² denote the mean and variance, respectively.
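For scheme 1 above, the following is a minimal equal-width binning sketch in Python; the function name and the sample income values are illustrative only.

```python
def equal_width_bins(values, num_bins):
    """Discretize a continuous attribute into equal-width intervals (bin indices)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0   # avoid division by zero for a constant attribute
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

# Example: incomes (in thousands) mapped to 3 intervals
print(equal_width_bins([55, 60, 67, 70, 120], 3))   # [0, 0, 0, 0, 2]
```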
Estimating the Posterior Probabilities for
Continuous Attributes
For each class Ci, the class-conditional probability for a numeric attribute Aj can be
calculated following the Gaussian (normal) distribution as follows:
P(Aj = aj | Ci) = (1 / (√(2π) ∗ σij)) ∗ e^(−(aj − μij)² / (2σij²))
Here, the parameter μij can be calculated as the sample mean of the values of attribute Aj over
the training records that belong to class Ci.
Similarly, σij² can be estimated as the sample variance of those training records.
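A minimal sketch of this estimation in Python, assuming the training values of attribute Aj for class Ci are collected in a list (the function name is illustrative):

```python
import math

def gaussian_likelihood(value, class_values):
    """Estimate P(Aj = value | Ci) under the Gaussian assumption, using the
    sample mean and sample variance of the attribute values observed in class Ci."""
    n = len(class_values)
    mu = sum(class_values) / n
    var = sum((v - mu) ** 2 for v in class_values) / (n - 1)   # sample variance
    coeff = 1.0 / math.sqrt(2 * math.pi * var)
    return coeff * math.exp(-((value - mu) ** 2) / (2 * var))
```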
Estimating the Posterior Probabilities for
Continuous Attributes
Example Problem: classify whether a given person is male or female based on the measured features.
The features include height, weight, and foot size, assuming each feature follows a Gaussian
distribution within each class.
Estimating the Posterior Probabilities for
Continuous Attributes
• To which class will a person with the given inputs be classified?
Using P(x : μ, σ²) = (1 / (√(2π) ∗ σ)) ∗ e^(−(x − μ)² / (2σ²)) with priors P(male) = 0.5 and P(female) = 0.5:
p(height | female) = 2.2346e-1
p(weight | female) = 1.6789e-2
p(foot size | female) = 2.8669e-1
Unnormalized posterior (male) = 6.1984e-09
Unnormalized posterior (female) = 5.3778e-04
Since the female score is much larger, the person is classified as female.
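The female score reported above can be reproduced directly from the listed densities and the prior (the male score would be computed the same way from the male densities):

```python
p_female = 0.5
p_height_f, p_weight_f, p_foot_f = 2.2346e-1, 1.6789e-2, 2.8669e-1

# Unnormalized posterior for the female class
score_female = p_female * p_height_f * p_weight_f * p_foot_f
print(score_female)    # ≈ 5.3778e-04, much larger than the male score of ≈ 6.1984e-09
```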
• m is a number that says how confident we are of our prior estimate p, measured in an
equivalent number of samples.
M-estimate of Conditional Probability
• The m-estimate deals with a potential problem of the Naïve Bayes classifier
when the training data set is too small.
• If the conditional probability P(Aj = aj | Ci) is zero for one of the attribute values, then the overall
product, and hence the posterior score for that class, vanishes.
• In other words, if training data do not cover many of the attribute values, then we may not be
able to classify some of the test records.
P(Aj = aj | Ci) = (nci + m ∗ p) / (n + m)
nci = number of training examples from class Ci that take the value Aj = aj
n = total number of training examples belonging to class Ci
m = the equivalent sample size (a user-specified parameter)
p = a user-specified prior estimate of the probability
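A minimal sketch of the m-estimate in Python, with nci, n, m, and p passed in explicitly (names are illustrative):

```python
def m_estimate(n_ci, n, m, p):
    """m-estimate of P(Aj = aj | Ci): smooths the raw relative frequency n_ci / n
    toward the prior estimate p, as if m extra 'virtual' samples had been observed."""
    return (n_ci + m * p) / (n + m)

# An attribute value never observed in class Ci (n_ci = 0) no longer gets probability 0:
print(m_estimate(n_ci=0, n=8, m=3, p=1/3))   # ≈ 0.0909 instead of 0.0
```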
Note: