L23 Naive Bayesian
[Figure: event A partitioned across B1 and B2 — total probability P(A) = P(B1)P(A|B1) + P(B2)P(A|B2)]
Bayes Classifiers
Credit rating prediction:
Training data: Joint observation of features and outcome
Classifier is a mapping from observed values of x to predicted
values of c
Features   # bad   # good
x = 0        42      15
x = 2         3       5
• Predict the more likely outcome for each possible observation
• Probability of outcome c given an observation x (Bayes' rule):
P(c|x) = P(x|c)·P(c) / P(x)
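The counts in the table above already define the classifier: estimate each posterior by relative frequency and predict the more likely class. A minimal Python sketch (the `counts` dictionary and helper names are illustrative, built from the credit-rating table):

```python
# Counts from the credit-rating table: x -> (#bad, #good)
counts = {0: (42, 15), 2: (3, 5)}

def predict(x):
    """Predict the more likely outcome for observed feature value x."""
    bad, good = counts[x]
    return "bad" if bad > good else "good"

def posterior_good(x):
    """P(good | x) estimated as a relative frequency."""
    bad, good = counts[x]
    return good / (bad + good)

print(predict(0), round(posterior_good(0), 3))  # bad 0.263
print(predict(2), round(posterior_good(2), 3))  # good 0.625
```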
Maximum A Posteriori (MAP)
Hypothesis and Maximum Likelihood
• Goal: To find the most probable hypothesis h from a set of
candidate hypotheses H given the observed data D.
• MAP hypothesis: h_MAP = argmax_{h∈H} P(h|D)
= argmax_{h∈H} P(D|h)·P(h) / P(D)
= argmax_{h∈H} P(D|h)·P(h)
• If every hypothesis in H is equally probable a priori, we only
need to consider the likelihood of the data D given h, P(D|h).
Then h_MAP becomes the Maximum Likelihood hypothesis,
h_ML = argmax_{h∈H} P(D|h)
This can be applied to any set H of mutually exclusive propositions whose
probabilities sum to 1.
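The two argmax definitions above can be sketched in a few lines of Python. The hypothesis names and probability values below are made up for illustration, chosen so that the MAP and ML answers differ (a non-uniform prior shifts the argmax):

```python
# Illustrative hypothesis set with priors P(h) and likelihoods P(D|h).
priors      = {"h1": 0.7, "h2": 0.2, "h3": 0.1}
likelihoods = {"h1": 0.1, "h2": 0.5, "h3": 0.9}

# MAP: maximize P(D|h) * P(h); the evidence P(D) is a common factor.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
# ML: maximize P(D|h) alone (uniform-prior special case of MAP).
h_ml = max(priors, key=lambda h: likelihoods[h])

print(h_map)  # h2 (0.5 * 0.2 = 0.10 beats 0.07 and 0.09)
print(h_ml)   # h3
```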
Bayes’ Rule
Understanding Bayes' rule
P(h|d) = P(d|h)·P(h) / P(d)    (d = data, h = hypothesis)
Proof. Just rearrange:
P(h|d)·P(d) = P(d|h)·P(h) = P(d,h)
the same joint probability P(d,h) on both sides
Who is who in Bayes' rule
P(h|d): posterior    P(h): prior    P(d|h): likelihood    P(d): evidence
For a two-class problem (classes p and n) with attributes a1,a2,..,an, compare the posteriors
P(p|a1,a2,..,an) and P(n|a1,a2,..,an) and predict the class with the larger value.
Naïve Bayesian Classification
• If the i-th attribute is categorical:
P(ai|vj) is estimated as the relative
frequency of samples having value ai as the i-th
attribute in class vj
• If the i-th attribute is continuous:
P(ai|vj) is estimated through a Gaussian
density function
• Computationally easy in both cases
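Both estimators can be sketched briefly in Python; the tiny sample list and function names below are illustrative, not from the slides:

```python
import math

# Categorical attribute: P(ai|vj) as a relative frequency within class vj.
# Illustrative (attribute_value, class) pairs:
samples = [("sunny", "p"), ("rain", "p"), ("sunny", "n"), ("sunny", "p")]

def p_categorical(ai, vj):
    """Relative frequency of value ai among samples of class vj."""
    in_class = [a for a, v in samples if v == vj]
    return in_class.count(ai) / len(in_class)

# Continuous attribute: P(ai|vj) from a Gaussian density fitted per class
# (mean and variance would be estimated from the class's training samples).
def p_gaussian(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(p_categorical("sunny", "p"))  # 2/3
```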
Play-tennis example: estimating P(xi|C)
Training data (14 samples):

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

Class priors:
P(p) = 9/14    P(n) = 5/14

outlook:
P(sunny|p) = 2/9      P(sunny|n) = 3/5
P(overcast|p) = 4/9   P(overcast|n) = 0
P(rain|p) = 3/9       P(rain|n) = 2/5
temperature:
P(hot|p) = 2/9    P(hot|n) = 2/5
P(mild|p) = 4/9   P(mild|n) = 2/5
P(cool|p) = 3/9   P(cool|n) = 1/5
humidity:
P(high|p) = 3/9     P(high|n) = 4/5
P(normal|p) = 6/9   P(normal|n) = 2/5
windy:
P(true|p) = 3/9    P(true|n) = 3/5
P(false|p) = 6/9   P(false|n) = 2/5
Play-tennis example: classifying X
• An unseen sample X = <rain, hot, high, false>
• P(X|p)·P(p) =
P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) =
3/9·2/9·3/9·6/9·9/14 = 0.010582
• P(X|n)·P(n) =
P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) =
2/5·2/5·4/5·2/5·5/14 = 0.018286
• Since P(X|n)·P(n) > P(X|p)·P(p), X is classified as n (don't play tennis)
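The computation above can be reproduced in Python; the dictionary names are mine, but the probabilities are exactly the estimates from the 14-sample table:

```python
# Conditional probability estimates for X = <rain, hot, high, false>.
p_cond = {"rain": 3/9, "hot": 2/9, "high": 3/9, "false": 6/9}  # P(ai|p)
n_cond = {"rain": 2/5, "hot": 2/5, "high": 4/5, "false": 2/5}  # P(ai|n)
prior = {"p": 9/14, "n": 5/14}

x = ["rain", "hot", "high", "false"]

# Naive Bayes score for each class: P(X|c) * P(c) with the
# conditional-independence assumption P(X|c) = product of P(ai|c).
score_p = prior["p"]
score_n = prior["n"]
for a in x:
    score_p *= p_cond[a]
    score_n *= n_cond[a]

print(round(score_p, 6))  # 0.010582
print(round(score_n, 6))  # 0.018286
print("n" if score_n > score_p else "p")  # n: don't play tennis
```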