Bayes
By
SATHISHKUMAR G
(sathishsak111@gmail.com)
Classification
Uncertainty & Probability
Bayes' rule
Choosing Hypotheses - Maximum a posteriori
Maximum Likelihood - Bayes concept learning
Maximum Likelihood of real-valued functions
Bayes optimal Classifier
Joint distributions
Naive Bayes Classifier
Uncertainty
Our main tool is probability theory, which assigns to each sentence a numerical degree of belief between 0 and 1 (Kolmogorov's axioms, first published in German in 1933).
P(true)=1, P(false)=0
$$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)}$$
Diagnosis
What is the probability of meningitis in a patient with a stiff neck?
A doctor knows that the disease meningitis causes the patient to have a stiff neck 50% of the time -> P(s|m) = 0.5
Prior probabilities:
• That the patient has meningitis is 1/50,000 -> P(m) = 0.00002
• That the patient has a stiff neck is 1/20 -> P(s) = 0.05
$$P(m \mid s) = \frac{P(s \mid m)\,P(m)}{P(s)} = \frac{0.5 \times 0.00002}{0.05} = 0.0002$$
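A minimal sketch of this calculation in Python (the function bayes_posterior and its argument names are illustrative, not from the slides):

def bayes_posterior(likelihood, prior, evidence):
    # Bayes' rule: P(hypothesis | data) = P(data | hypothesis) * P(hypothesis) / P(data)
    return likelihood * prior / evidence

# Meningitis example from above: P(s|m) = 0.5, P(m) = 1/50,000, P(s) = 1/20
print(bayes_posterior(likelihood=0.5, prior=1/50000, evidence=1/20))  # 0.0002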
Normalization
$$1 = P(y \mid x) + P(\neg y \mid x)$$
$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)} \qquad P(\neg y \mid x) = \frac{P(x \mid \neg y)\,P(\neg y)}{P(x)}$$
$$P(Y \mid X) = \alpha \, P(X \mid Y)\,P(Y)$$
$$\alpha \, \langle P(y \mid x),\, P(\neg y \mid x) \rangle$$
Example: $\alpha \, \langle 0.12,\, 0.08 \rangle = \langle 0.6,\, 0.4 \rangle$
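A small sketch of the normalization step above (the function name normalize is an assumption, not from the slides):

def normalize(scores):
    # Scale unnormalized posteriors so they sum to 1 (the alpha factor above)
    total = sum(scores)
    return [s / total for s in scores]

print(normalize([0.12, 0.08]))  # approximately [0.6, 0.4]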
Bayes Theorem
Assuming a uniform prior P(h) = 1/|H| for all h in H:
$$P(D) = \sum_{h_i \in VS_{H,D}} 1 \cdot \frac{1}{|H|} \;+\; \sum_{h_i \notin VS_{H,D}} 0 \cdot \frac{1}{|H|} = \frac{|VS_{H,D}|}{|H|}$$
$$P(h \mid D) = \frac{1 \cdot \frac{1}{|H|}}{\frac{|VS_{H,D}|}{|H|}} = \frac{1}{|VS_{H,D}|} \quad \text{if } h \text{ is consistent with } D$$
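A minimal sketch of this brute-force posterior computation, assuming a finite hypothesis space, a uniform prior, and noise-free data (the names posterior_over_hypotheses, hypotheses, data, and consistent are illustrative):

def posterior_over_hypotheses(hypotheses, data, consistent):
    # Uniform prior, noise-free data: P(h|D) = 1/|VS_{H,D}| if h is consistent with D, else 0
    version_space = [h for h in hypotheses if consistent(h, data)]
    return {h: (1 / len(version_space) if h in version_space else 0.0) for h in hypotheses}

# Toy usage: hypotheses are thresholds t, data are (x, label) pairs with label meaning x >= t
hypotheses = [1, 2, 3, 4, 5]
data = [(2, False), (4, True)]
consistent = lambda t, D: all((x >= t) == y for x, y in D)
print(posterior_over_hypotheses(hypotheses, data, consistent))  # {1: 0.0, 2: 0.0, 3: 0.5, 4: 0.5, 5: 0.0}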
Maximum Likelihood of real-valued functions
Maximize the natural log of the likelihood instead...
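A brief sketch of the standard derivation behind this step, assuming each training value d_i is the target value h(x_i) corrupted by zero-mean Gaussian noise (this setup is an assumption, not spelled out above):
$$h_{ML} = \arg\max_{h \in H} \prod_{i=1}^{m} p(d_i \mid h) = \arg\max_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(d_i - h(x_i))^2}{2\sigma^2}}$$
Taking the natural log turns the product into a sum, and dropping the terms that do not depend on h leaves
$$h_{ML} = \arg\min_{h \in H} \sum_{i=1}^{m} \bigl(d_i - h(x_i)\bigr)^2$$
so under this assumption the maximum likelihood hypothesis is the one that minimizes the sum of squared errors.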
Bayes optimal Classifier
A weighted majority classifier
What is the most probable classification of the new instance given the training data?
The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.
If the classification of the new example can take any value vj from some set V, then the probability P(vj|D) that the correct classification for the new instance is vj is just:
$$P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$$
Bayes optimal classification:
$$\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$$
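A minimal sketch of this weighted vote (the dictionary-based representation and the names posterior and p_v_given_h are assumptions for illustration):

def bayes_optimal_classify(values, hypotheses, posterior, p_v_given_h):
    # Return argmax over v of sum_h P(v|h) * P(h|D)
    def score(v):
        return sum(p_v_given_h(v, h) * posterior[h] for h in hypotheses)
    return max(values, key=score)

# Toy usage: posteriors 0.4, 0.3, 0.3; h1 predicts '+', h2 and h3 predict '-'
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
votes = {"h1": "+", "h2": "-", "h3": "-"}
p_v_given_h = lambda v, h: 1.0 if votes[h] == v else 0.0
print(bayes_optimal_classify(["+", "-"], list(posterior), posterior, p_v_given_h))  # '-'

Note that even though the single most probable hypothesis h1 predicts '+', the vote weighted over all hypotheses classifies the instance as '-'.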
Gibbs Algorithm
The Bayes optimal classifier provides the best result, but it can be expensive if there are many hypotheses.
Gibbs algorithm:
Choose one hypothesis at random, according to P(h|D)
Use this to classify the new instance
Suppose the target concepts are drawn at random according to a correct, uniform prior distribution over H. Then picking any hypothesis at random according to P(h|D) has an expected error no worse than twice that of the Bayes optimal classifier.
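A tiny sketch of the Gibbs classification step, assuming the posterior P(h|D) is available as a dictionary (the names gibbs_classify, posterior, and predict are illustrative):

import random

def gibbs_classify(x, posterior, predict):
    # Sample one hypothesis according to P(h|D), then classify x with it
    hypotheses = list(posterior)
    weights = [posterior[h] for h in hypotheses]
    h = random.choices(hypotheses, weights=weights, k=1)[0]
    return predict(h, x)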
Joint distribution
$$P(\mathrm{cause}, \mathrm{effect}_1, \mathrm{effect}_2, \ldots, \mathrm{effect}_n) = P(\mathrm{cause}) \prod_{i=1}^{n} P(\mathrm{effect}_i \mid \mathrm{cause})$$
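A small illustrative sketch of this factorization (the probability values are made-up placeholders, not from the slides):

# P(cause, effect_1, ..., effect_n) = P(cause) * product of P(effect_i | cause)
p_cause = 0.01                      # placeholder prior
p_effects_given_cause = [0.9, 0.7]  # placeholder conditionals P(effect_i | cause)

p_joint = p_cause
for p in p_effects_given_cause:
    p_joint *= p
print(p_joint)  # ~0.0063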
Naive Bayes Classifier
Along with decision trees, neural networks, and nearest neighbor, one of the most practical learning methods.
When to use:
Moderate or large training set available
Attributes that describe instances are conditionally
independent given classification
Successful applications:
Diagnosis
Classifying text documents
Naive Bayes Classifier
Assume target function f: X → V, where each instance x is described by attributes a1, a2, ..., an.
The most probable value of f(x) is:
$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, \ldots, a_n) = \arg\max_{v_j \in V} \frac{P(a_1, \ldots, a_n \mid v_j)\,P(v_j)}{P(a_1, \ldots, a_n)} = \arg\max_{v_j \in V} P(a_1, \ldots, a_n \mid v_j)\,P(v_j)$$
The naive Bayes assumption is that the attributes are conditionally independent given the class, $P(a_1, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$, which gives the naive Bayes classifier:
$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$
Naive Bayes Algorithm
For each target value vj
estimate P(vj)
For each attribute value ai of each
attribute a
estimate P(ai|vj)
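A minimal, self-contained sketch of this estimation and classification procedure (the function names naive_bayes_train and naive_bayes_classify are illustrative; relative frequencies are used as the estimates, with no smoothing):

from collections import Counter, defaultdict

def naive_bayes_train(examples):
    # examples: list of (attribute_tuple, target_value); returns estimates of P(vj) and P(ai|vj)
    class_counts = Counter(v for _, v in examples)
    cond_counts = defaultdict(Counter)  # (attribute position, class) -> counts of attribute values
    for attrs, v in examples:
        for i, a in enumerate(attrs):
            cond_counts[(i, v)][a] += 1
    priors = {v: c / len(examples) for v, c in class_counts.items()}
    def p_attr(i, a, v):
        return cond_counts[(i, v)][a] / class_counts[v]
    return priors, p_attr

def naive_bayes_classify(x, priors, p_attr):
    # v_NB = argmax_vj P(vj) * product_i P(ai | vj)
    def score(v):
        s = priors[v]
        for i, a in enumerate(x):
            s *= p_attr(i, a, v)
        return s
    return max(priors, key=score)

Applied to the 14-row dataset below with X = (age<=30, income=medium, student=yes, credit_rating=fair), this should reproduce the hand calculation on the example slide.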
Training dataset
Class: C1: buys_computer = 'yes', C2: buys_computer = 'no'

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

Data sample:
X = (age<=30, income=medium, student=yes, credit_rating=fair)
Naïve Bayesian Classifier: Example
Compute P(X|Ci) for each class.
Priors:
P(buys_computer="yes") = 9/14
P(buys_computer="no") = 5/14
Conditional probabilities:
P(age="<=30" | buys_computer="yes") = 2/9 = 0.222
P(age="<=30" | buys_computer="no") = 3/5 = 0.6
P(income="medium" | buys_computer="yes") = 4/9 = 0.444
P(income="medium" | buys_computer="no") = 2/5 = 0.4
P(student="yes" | buys_computer="yes") = 6/9 = 0.667
P(student="yes" | buys_computer="no") = 1/5 = 0.2
P(credit_rating="fair" | buys_computer="yes") = 6/9 = 0.667
P(credit_rating="fair" | buys_computer="no") = 2/5 = 0.4
P(X | buys_computer="yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer="no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(X | buys_computer="yes") P(buys_computer="yes") = 0.044 × 9/14 = 0.028
P(X | buys_computer="no") P(buys_computer="no") = 0.019 × 5/14 = 0.007
Therefore X is classified as buys_computer = "yes".
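A short sketch reproducing this calculation directly from the estimates above (plain Python; the variable names are illustrative):

# Posterior scores for X = (age<=30, income=medium, student=yes, credit_rating=fair)
score_yes = (9/14) * (2/9) * (4/9) * (6/9) * (6/9)  # P(yes) * prod_i P(ai | yes), ~0.028
score_no  = (5/14) * (3/5) * (2/5) * (1/5) * (2/5)  # P(no)  * prod_i P(ai | no),  ~0.007
print("yes" if score_yes > score_no else "no")       # -> yes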
[Figure: conditional probability tables for JohnCalls and MaryCalls given A, with P(JohnCalls=t | A=t) = 0.90, P(JohnCalls=t | A=f) = 0.05, and P(MaryCalls=t | A=t) = 0.70, P(MaryCalls=t | A=f) = 0.01]
Thank you