Bayesian Theory: By: Khaliq Bero
THEORY
Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning Algorithm
P(H|E) = P(E|H) · P(H) / P(E)
Bayes Theorem Example
What is the probability that a card drawn from a standard deck is a king, given that it is a face card?

P(King) = 4/52 = 1/13
P(Face) = 12/52 = 3/13
P(Face|King) = 1 (every king is a face card)

P(King|Face) = P(Face|King) · P(King) / P(Face) = 1 · (1/13) / (3/13) = 1/3
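This arithmetic is easy to sanity-check in Python; a minimal sketch using the standard fractions module (the variable names are my own, for illustration):

from fractions import Fraction

p_king = Fraction(4, 52)         # 4 kings in a 52-card deck
p_face = Fraction(12, 52)        # 12 face cards: J, Q, K of each suit
p_face_given_king = Fraction(1)  # every king is a face card

# Bayes theorem: P(King|Face) = P(Face|King) * P(King) / P(Face)
p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)         # 1/3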
Bayes Theorem Proof
By the definition of conditional probability:
P(A|B) = P(A ∩ B) / P(B), so P(A ∩ B) = P(A|B) · P(B)
P(B|A) = P(A ∩ B) / P(A), so P(A ∩ B) = P(B|A) · P(A)

Equating the two expressions for P(A ∩ B):
P(A|B) · P(B) = P(B|A) · P(A)
⇒ P(A|B) = P(B|A) · P(A) / P(B)
Bayes Theorem: Terminology

P(H|E) = P(E|H) · P(H) / P(E)

Posterior, P(H|E): how probable is our hypothesis, given the observed evidence? (Not directly computable.)
Likelihood, P(E|H): how probable is the evidence, given that our hypothesis is true?
Prior, P(H): how probable was our hypothesis before observing the evidence?
Marginal, P(E): how probable is the new evidence under all possible hypotheses?
Naïve Bayes: Working
Classification Steps
DAY Outlook Humidity Wind Play
D1 Sunny High Weak No
D2 Sunny High Strong No
D3 Overcast High Weak Yes
D4 Rain High Weak Yes
D5 Rain Normal Weak Yes
D6 Rain Normal Strong No
D7 Overcast Normal Strong Yes
D8 Sunny High Weak No
D9 Sunny Normal Weak Yes
D10 Rain Normal Weak Yes
D11 Sunny Normal Strong Yes
D12 Overcast High Strong Yes
D13 Overcast Normal Weak Yes
D14 Rain High Strong No
Classification Steps

Likelihood, e.g.: P(Sunny|Yes) = 2/9 ≈ 0.22

P(Yes|High) = P(High|Yes) · P(Yes) / P(High) = 0.33 × 0.64 / 0.50 ≈ 0.42
P(No|High) = P(High|No) · P(No) / P(High) = 0.80 × 0.36 / 0.50 ≈ 0.58
P(Yes|Weak) = P(Weak|Yes) · P(Yes) / P(Weak) = 0.67 × 0.64 / 0.57 ≈ 0.75
P(No|Weak) = P(Weak|No) · P(No) / P(Weak) = 0.40 × 0.36 / 0.57 ≈ 0.25

Since P(No|High) > P(Yes|High), a day with high humidity is classified as Play = No; since P(Yes|Weak) > P(No|Weak), a day with weak wind is classified as Play = Yes.
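These numbers can be reproduced by counting rows of the table above; a minimal sketch in plain Python (the helper names likelihood and posterior are my own, for illustration):

# The 14-day dataset from the table: (Outlook, Humidity, Wind, Play)
data = [
    ("Sunny", "High", "Weak", "No"),         ("Sunny", "High", "Strong", "No"),
    ("Overcast", "High", "Weak", "Yes"),     ("Rain", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"),       ("Rain", "Normal", "Strong", "No"),
    ("Overcast", "Normal", "Strong", "Yes"), ("Sunny", "High", "Weak", "No"),
    ("Sunny", "Normal", "Weak", "Yes"),      ("Rain", "Normal", "Weak", "Yes"),
    ("Sunny", "Normal", "Strong", "Yes"),    ("Overcast", "High", "Strong", "Yes"),
    ("Overcast", "Normal", "Weak", "Yes"),   ("Rain", "High", "Strong", "No"),
]

def likelihood(col, value, label):
    # P(feature = value | Play = label), estimated by counting rows
    rows = [r for r in data if r[3] == label]
    return sum(r[col] == value for r in rows) / len(rows)

def posterior(col, value, label):
    # Bayes theorem: P(Play = label | feature = value)
    prior = sum(r[3] == label for r in data) / len(data)
    marginal = sum(r[col] == value for r in data) / len(data)
    return likelihood(col, value, label) * prior / marginal

print(posterior(1, "High", "Yes"))  # 0.4285... (≈ 0.42 above, after rounding)
print(posterior(1, "High", "No"))   # 0.5714...
print(posterior(2, "Weak", "Yes"))  # 0.75
print(posterior(2, "Weak", "No"))   # 0.25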
Types of Naïve Bayes

Gaussian
It is used for classification with continuous features and assumes that each feature follows a normal distribution.
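A minimal scikit-learn sketch (assuming scikit-learn and NumPy are installed; the two-blob toy data is made up for illustration):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy continuous data: two Gaussian blobs, one per class
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),   # class 0 centered at (0, 0)
               rng.normal(3.0, 1.0, (50, 2))])  # class 1 centered at (3, 3)
y = np.array([0] * 50 + [1] * 50)

model = GaussianNB()   # fits a per-class, per-feature mean and variance
model.fit(X, y)
print(model.predict([[2.5, 2.5]]))  # likely [1]: closer to the class-1 blob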
Multinomial
It is used for discrete counts, for example in a text classification problem. It goes one step further than Bernoulli trials: instead of recording "word occurs in the document", we record "how often the word occurs in the document". You can think of each feature as the number of times outcome x_i is observed over n trials.
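A minimal text-classification sketch with scikit-learn (the tiny corpus and labels are invented for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free prize money", "free money now",
        "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()           # features = word counts per document
X = vec.fit_transform(docs)

model = MultinomialNB()
model.fit(X, labels)
print(model.predict(vec.transform(["free prize now"])))  # ['spam']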
Bernoulli
The binomial (Bernoulli) model is useful if your feature vectors are binary (i.e., zeros and ones). One application is text classification with a "bag of words" model, where the 1s and 0s mean "word occurs in the document" and "word does not occur in the document", respectively.
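A minimal sketch with hand-built binary bag-of-words vectors (the four-word vocabulary and documents are invented for illustration):

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Binary bag-of-words: columns = ["free", "money", "meeting", "notes"]
X = np.array([[1, 1, 0, 0],   # "free money ..."        -> spam
              [1, 1, 0, 0],   # "free money ..."        -> spam
              [0, 0, 1, 1],   # "meeting ... notes ..." -> ham
              [0, 0, 1, 0]])  # "meeting ..."           -> ham
y = ["spam", "spam", "ham", "ham"]

model = BernoulliNB()  # models each feature as occurs / does not occur
model.fit(X, y)
print(model.predict([[1, 0, 0, 0]]))  # ['spam']: only "free" appears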
Thank you.