29-Naive Bayes-03-10-2024
[Figure: a training set of labelled records (Tid, Attrib1, Attrib2, Attrib3, Class) is used to learn a model, which is then applied to a test set of records with unknown class labels (Class = ?) to predict their classes.]
Classification Problem
• More precisely, a classification problem can be stated as follows:
Definition: Classification Problem
• Foundation
• Based on Bayes’ Theorem.
• Assumptions
• The classes are mutually exclusive and exhaustive.
• The attributes are independent given the class.
• Given this knowledge of the data and the classes, we are to find the most
likely classification for any unseen instance, for example:
Weekday, Winter, High, None → ???
• Bayes' Theorem: P(A|B) = P(B|A) ∗ P(A) / P(B)
Where,
• P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
• P(B|A) is the Likelihood, P(A) is the Prior probability of A, and P(B) is the probability of the evidence B.
Working Steps of Naïve Bayes' Classifier
• Consider any two posterior probabilities, namely P(Y = yᵢ | X = x)
and P(Y = yⱼ | X = x).
• If P(Y = yᵢ | X = x) > P(Y = yⱼ | X = x), then we say that yᵢ is a stronger
(more likely) class than yⱼ for the instance X = x.
Input: Given a set of k mutually exclusive and exhaustive classes C = {c₁, c₂, ..., cₖ}, which have prior
probabilities P(C₁), P(C₂), ..., P(Cₖ).
There is an n-attribute set A = {A₁, A₂, ..., Aₙ}, which for a given instance takes the values A₁ = a₁, A₂ = a₂, ..., Aₙ = aₙ.
Note: Σ pᵢ ≠ 1, because the pᵢ are not probabilities but values proportional to the posterior probabilities.
Example:
Working Steps of Naïve Bayes' Classifier
• Step 1: Calculate the prior probability for the given class labels.
• Step 2: Find the likelihood probability of each attribute value for
each class.
• Step 3: Put these values into Bayes' formula and calculate the posterior
probability.
• Step 4: See which class has the highest posterior probability; the input
is assigned to that class.
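The following is a minimal sketch of these four steps in Python, using frequency counts for the priors and likelihoods; the function and variable names are illustrative, not from any particular library.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Steps 1 and 2: estimate prior and likelihood probabilities from frequency counts."""
    n = len(labels)
    priors = {c: cnt / n for c, cnt in Counter(labels).items()}
    class_sizes = Counter(labels)
    # likelihoods[class][attribute_index][value] = P(Aj = value | class)
    likelihoods = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(records, labels):
        for j, value in enumerate(row):
            likelihoods[c][j][value] += 1
    for c in likelihoods:
        for j in likelihoods[c]:
            for value in likelihoods[c][j]:
                likelihoods[c][j][value] /= class_sizes[c]
    return priors, likelihoods

def classify(instance, priors, likelihoods):
    """Steps 3 and 4: compute a posterior score for each class and pick the largest."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(instance):
            # an attribute value unseen for this class gets probability 0 here
            # (the m-estimate discussed later addresses exactly this problem)
            score *= likelihoods[c][j].get(value, 0.0)
        scores[c] = score
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()} if total > 0 else scores
    return max(posteriors, key=posteriors.get), posteriors

# Illustrative usage with toy categorical data:
# priors, lk = train_naive_bayes([["Sunny", "Hot"], ["Rain", "Cool"]], ["No", "Yes"])
# classify(["Sunny", "Cool"], priors, lk)
```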
Example Cont’d
• Learning Phase (frequency tables)
P(Yes | outlook = sunny, temperature = cool, humidity = high, wind = strong)
= P(sunny|Yes) ∗ P(cool|Yes) ∗ P(high|Yes) ∗ P(strong|Yes) ∗ P(Yes) /
[P((sunny, cool, high, strong) | Yes) ∗ P(Yes) + P((sunny, cool, high, strong) | No) ∗ P(No)]
= .0051 / (.0051 + .0207)
= .1977
Example Cont’d
• Similarly,
P(No | (sunny, cool, high, strong))
= P((sunny, cool, high, strong) | No) ∗ P(No) / P(sunny, cool, high, strong)
= .0207 / (.0051 + .0207)
= .8023
• Apply the MAP (Maximum A Posteriori) rule:
The posterior probability is about 20% for playing tennis under the described conditions and about
80% for not playing tennis under these conditions; therefore the prediction
is that no tennis will be played if the day is like these conditions.
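The normalization above can be checked with a couple of lines of Python, reusing the two unnormalized scores from this example:

```python
# Unnormalized scores from the example: P(x | Yes)*P(Yes) and P(x | No)*P(No)
score_yes, score_no = 0.0051, 0.0207
evidence = score_yes + score_no            # P(sunny, cool, high, strong)
print(score_yes / evidence)                # ≈ 0.1977 -> P(Yes | x)
print(score_no / evidence)                 # ≈ 0.8023 -> P(No | x)
```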
Car theft Example
Data set
Attributes are Color, Type, and Origin; the target attribute, Stolen, can be either Yes or No.
P(Yes | (Red, Domestic, SUV))
= P(Red|Yes) ∗ P(Domestic|Yes) ∗ P(SUV|Yes) ∗ P(Yes) /
[P((Red, Domestic, SUV) | Yes) ∗ P(Yes) + P((Red, Domestic, SUV) | No) ∗ P(No)]
= (.6 ∗ .5 ∗ .2) ∗ .5 / [(.6 ∗ .5 ∗ .2) ∗ .5 + (.4 ∗ .6 ∗ .6) ∗ .5]
≈ 0.294
P(No | (Red, Domestic, SUV))
= P(Red|No) ∗ P(Domestic|No) ∗ P(SUV|No) ∗ P(No) /
[P((Red, Domestic, SUV) | Yes) ∗ P(Yes) + P((Red, Domestic, SUV) | No) ∗ P(No)]
= (.4 ∗ .6 ∗ .6) ∗ .5 / [(.6 ∗ .5 ∗ .2) ∗ .5 + (.4 ∗ .6 ∗ .6) ∗ .5]
≈ 0.706
Since P(No | (Red, Domestic, SUV)) > P(Yes | (Red, Domestic, SUV)), the car is classified as not stolen.
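The same calculation can be sketched in Python, using the conditional probabilities and priors quoted above:

```python
priors = {"Yes": 0.5, "No": 0.5}
# P(Red|class), P(Domestic|class), P(SUV|class), taken from the example above
likelihoods = {"Yes": [0.6, 0.5, 0.2], "No": [0.4, 0.6, 0.6]}

scores = {c: priors[c] * likelihoods[c][0] * likelihoods[c][1] * likelihoods[c][2]
          for c in priors}
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}
print(posteriors)                             # Yes ≈ 0.294, No ≈ 0.706
print(max(posteriors, key=posteriors.get))    # 'No' -> predicted class: not stolen
```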
Instance to classify: Blood Pressure = High, Weight = Above average, Family History = Yes, Age = 50+, Diabetes = ?
• P(Yes | (BP = High, Weight = Above average, Family history = Yes, Age = 50+))
= (.44 ∗ .78 ∗ .67 ∗ .78) ∗ .45 / [(.44 ∗ .78 ∗ .67 ∗ .78) ∗ .45 + (.36 ∗ .36 ∗ .36 ∗ .27) ∗ .55]
= .080 / .0869
= .920
• Similarly,
• P(No | (BP = High, Weight = Above average, Family history = Yes, Age = 50+))
= (.36 ∗ .36 ∗ .36 ∗ .27) ∗ .55 / [(.44 ∗ .78 ∗ .67 ∗ .78) ∗ .45 + (.36 ∗ .36 ∗ .36 ∗ .27) ∗ .55]
= .0069 / .0869
= .080
Problem 4
Day Season Fog Rain Class
Weekday Spring None None On Time
(a sample row of the 20-record training set)
Class: On Time / Late / Very Late / Cancelled
Prior Probability: 14/20 = 0.70 / 2/20 = 0.10 / 3/20 = 0.15 / 1/20 = 0.05
Problem 4 Cont’d
Instance:
Case3: Class = Very Late : 0.15 × 1.0 × 0.67 × 0.33 × 0.67 = 0.0222
• The Naïve Bayes’ approach is a very popular one, which often works well.
• In real-life situations, all attributes are not necessarily categorical; in fact, there is often a mix of both
categorical and continuous attributes.
• In the following, we discuss the schemes to deal with continuous attributes in Bayesian classifier.
1. We can discretize each continuous attribute and then replace the continuous values with their
corresponding discrete intervals (a small binning sketch is given after this list).
2. We can assume a certain form of probability distribution for the continuous variable and estimate the
parameters of the distribution using the training data. A Gaussian distribution is usually chosen to
represent the class-conditional probabilities for continuous attributes. The general form of the Gaussian
distribution is
P(x : μ, σ²) = (1 / (√(2π) ∗ σ)) ∗ e^(−(x − μ)² / (2σ²))
where μ and σ² denote the mean and variance, respectively.
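For scheme 1 above, the following is a minimal equal-width binning sketch in Python; the function name and the sample income values are illustrative only.

```python
def equal_width_bins(values, num_bins):
    """Discretize a continuous attribute into equal-width intervals (bin indices)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0   # avoid division by zero for a constant attribute
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

# Example: incomes (in thousands) mapped to 3 intervals
print(equal_width_bins([55, 60, 67, 70, 120], 3))   # [0, 0, 0, 0, 2]
```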
Estimating the Posterior Probabilities for
Continuous Attributes
For each class Ci, the class-conditional probability for a numeric attribute Aj can be
calculated following the Gaussian (normal) distribution as follows:
P(Aj = aj | Ci) = (1 / (√(2π) ∗ σij)) ∗ e^(−(aj − μij)² / (2σij²))
Here, the parameter μij can be calculated as the sample mean of the values of attribute Aj over
the training records that belong to class Ci.
Similarly, σij² can be estimated as the sample variance of those training records.
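A minimal sketch of this estimation in Python, assuming the training values of attribute Aj for class Ci are collected in a list (the function name is illustrative):

```python
import math

def gaussian_likelihood(value, class_values):
    """Estimate P(Aj = value | Ci) under the Gaussian assumption, using the
    sample mean and sample variance of the attribute values observed in class Ci."""
    n = len(class_values)
    mu = sum(class_values) / n
    var = sum((v - mu) ** 2 for v in class_values) / (n - 1)   # sample variance
    coeff = 1.0 / math.sqrt(2 * math.pi * var)
    return coeff * math.exp(-((value - mu) ** 2) / (2 * var))
```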
Estimating the Posterior Probabilities for
Continuous Attributes
Example Problem: classify whether a given person is male or female based on the measured features.
The features include height, weight, and foot size, assuming each feature follows a Gaussian
distribution within each class.
Estimating the Posterior Probabilities for
Continuous Attributes
• To which class will a person with the given inputs be classified?
Using P(x : μ, σ²) = (1 / (√(2π) ∗ σ)) ∗ e^(−(x − μ)² / (2σ²)) with priors P(male) = 0.5 and P(female) = 0.5:
p(height | female) = 2.2346e-1
p(weight | female) = 1.6789e-2
p(foot size | female) = 2.8669e-1
Unnormalized posterior (male) = 6.1984e-09
Unnormalized posterior (female) = 5.3778e-04
Since the female score is much larger, the person is classified as female.
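The female score reported above can be reproduced directly from the listed densities and the prior (the male score would be computed the same way from the male densities):

```python
p_female = 0.5
p_height_f, p_weight_f, p_foot_f = 2.2346e-1, 1.6789e-2, 2.8669e-1

# Unnormalized posterior for the female class
score_female = p_female * p_height_f * p_weight_f * p_foot_f
print(score_female)    # ≈ 5.3778e-04, much larger than the male score of ≈ 6.1984e-09
```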
• m is a number that says how confident we are of our prior estimate p, measured in an
equivalent number of samples.
M-estimate of Conditional Probability
• The m-estimate deals with a potential problem of the Naïve Bayes classifier
when the training data set is too small.
• If the conditional probability P(Aj = aj | Ci) is zero for one of the attribute values, then the overall
product, and hence the posterior score for that class, vanishes.
• In other words, if training data do not cover many of the attribute values, then we may not be
able to classify some of the test records.
P(Aj = aj | Ci) = (nci + m ∗ p) / (n + m)
nci = number of training examples from class Ci that take the value Aj = aj
n = total number of training examples belonging to class Ci
m = the equivalent sample size (a user-specified parameter)
p = a user-specified prior estimate of the probability
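A minimal sketch of the m-estimate in Python, with nci, n, m, and p passed in explicitly (names are illustrative):

```python
def m_estimate(n_ci, n, m, p):
    """m-estimate of P(Aj = aj | Ci): smooths the raw relative frequency n_ci / n
    toward the prior estimate p, as if m extra 'virtual' samples had been observed."""
    return (n_ci + m * p) / (n + m)

# An attribute value never observed in class Ci (n_ci = 0) no longer gets probability 0:
print(m_estimate(n_ci=0, n=8, m=3, p=1/3))   # ≈ 0.0909 instead of 0.0
```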
Note: