Bayes' Rule and Its Use



Bayes’ Theorem
We again consider the conditional probability statement:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)}$$
in which we have used the Theorem of Total Probability to replace P(B). Now
P(A ∩ B) = P(B ∩ A) = P(B|A) × P(A)
Substituting this in the expression for P(A|B) we immediately obtain the result
$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)}$$
This is true for any event A and so, replacing A by A_i, gives the result known as Bayes’ Theorem:
$$P(A_i \mid B) = \frac{P(B \mid A_i) \times P(A_i)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)}$$

This equation is known as Bayes' rule (also Bayes' law or Bayes' theorem).
This simple equation underlies most modern AI systems for probabilistic inference.
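
As a minimal sketch of the formula above, the following Python function computes P(A_i|B) for every i from the priors P(A_i) and the likelihoods P(B|A_i), using the Theorem of Total Probability for the denominator. The function name `bayes_posterior` and the example numbers are illustrative assumptions and do not come from the text.

```python
def bayes_posterior(priors, likelihoods):
    """Return [P(A_i | B)] given priors [P(A_i)] and likelihoods [P(B | A_i)].

    Assumes the events A_1..A_n are mutually exclusive and exhaustive,
    so that the priors sum to 1.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators of Bayes' Theorem: P(B | A_i) * P(A_i)
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    # Denominator: P(B), obtained from the Theorem of Total Probability
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return [j / p_b for j in joint]


# Hypothetical numbers, for illustration only
priors = [0.5, 0.3, 0.2]        # P(A_1), P(A_2), P(A_3)
likelihoods = [0.1, 0.4, 0.5]   # P(B | A_1), P(B | A_2), P(B | A_3)
posterior = bayes_posterior(priors, likelihoods)
print([round(p, 3) for p in posterior])
```

The posteriors always sum to 1, because every numerator is divided by the sum of all numerators.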
Bayes' Rule and conditional independence
P(Cavity | toothache ∧ catch)
= α P(toothache ∧ catch | Cavity) P(Cavity)
= α P(toothache | Cavity) P(catch | Cavity) P(Cavity)

• This is an example of a naïve Bayes model:


$$P(\text{Cause}, \text{Effect}_1, \ldots, \text{Effect}_n) = P(\text{Cause}) \prod_i P(\text{Effect}_i \mid \text{Cause})$$

• Total number of parameters is linear in n
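
The role of the normalization constant α in the cavity example can be sketched in a few lines of Python. The probabilities below are assumed purely for illustration (the text gives no numbers for this example); α is simply whatever constant makes the two posterior values sum to 1.

```python
# Illustrative probabilities (assumed, not given in the text)
p_cavity = {True: 0.2, False: 0.8}       # P(Cavity)
p_toothache = {True: 0.6, False: 0.1}    # P(toothache | Cavity)
p_catch = {True: 0.9, False: 0.2}        # P(catch | Cavity)

# Unnormalized terms: P(toothache | c) * P(catch | c) * P(c) for each value c
unnormalized = {
    c: p_toothache[c] * p_catch[c] * p_cavity[c] for c in (True, False)
}

# alpha is the reciprocal of the sum of the unnormalized terms
alpha = 1.0 / sum(unnormalized.values())
posterior = {c: alpha * v for c, v in unnormalized.items()}

print(round(posterior[True], 3))   # P(cavity | toothache, catch) under these assumptions
```

Only one conditional probability per effect and per value of Cavity is needed, which is why the number of parameters grows linearly with the number of effects.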


• A single cause directly influences a number of effects, all of which are
conditionally independent given the cause. The full joint distribution can
be written as

$$P(\text{Cause}, \text{Effect}_1, \ldots, \text{Effect}_n) = P(\text{Cause}) \prod_i P(\text{Effect}_i \mid \text{Cause})$$

• Such a probability distribution is called a naïve Bayes model, sometimes
called a Bayesian classifier.
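
To make the factorization and the linear parameter count concrete, here is a small Python sketch. It assumes a Boolean cause and n Boolean effects; the function names `naive_bayes_joint` and `parameter_counts` are hypothetical helpers introduced only for this illustration.

```python
from math import prod


def naive_bayes_joint(p_cause, p_effects_given_cause):
    """P(cause, e_1, ..., e_n) = P(cause) * product of P(e_i | cause)."""
    return p_cause * prod(p_effects_given_cause)


def parameter_counts(n):
    """Independent parameters for a Boolean cause with n Boolean effects.

    Returns (naive Bayes count, unrestricted full-joint count).
    """
    naive = 1 + 2 * n          # P(cause), plus P(effect_i | cause) for both cause values
    full = 2 ** (n + 1) - 1    # full joint table over n + 1 Boolean variables
    return naive, full


for n in (2, 10, 20):
    print(n, parameter_counts(n))
```

Under these assumptions the naïve Bayes count grows as 1 + 2n, while the unrestricted joint table grows exponentially in n.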
Simple Example Problem
• A doctor knows that the disease meningitis causes the patient to have a
stiff neck, say, 50% of the time. The doctor also knows some unconditional
facts: the prior probability of a patient having meningitis is 1/50,000, and
the prior probability of any patient having a stiff neck is 1/20. Let S be
the proposition that the patient has a stiff neck and M be the proposition
that the patient has meningitis.
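
A natural quantity to compute from these numbers is P(M|S), the probability that a patient with a stiff neck has meningitis. The sketch below applies Bayes' rule, P(M|S) = P(S|M) P(M) / P(S), with exact fractions; the variable names are chosen for this illustration.

```python
from fractions import Fraction

p_s_given_m = Fraction(1, 2)    # P(S | M): meningitis causes a stiff neck 50% of the time
p_m = Fraction(1, 50000)        # P(M): prior probability of meningitis
p_s = Fraction(1, 20)           # P(S): prior probability of a stiff neck

# Bayes' rule: P(M | S) = P(S | M) * P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s

print(p_m_given_s, float(p_m_given_s))   # 1/5000 0.0002
```

Even though a stiff neck is strongly indicated by meningitis, the posterior stays small because the prior P(M) is far smaller than P(S).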
EXAMPLE PROBLEM

The training data are given in the table above. The data tuples are described by the attributes age,
income, student, and credit rating. The class label attribute, buys computer, has two distinct
values (namely, yes and no). Let C1 correspond to the class buys computer = yes and C2
correspond to buys computer = no. The tuple we wish to classify is X = (age = youth,
income = medium, student = yes, credit rating = fair).
• We need to maximize P(X|Ci)P(Ci), for i = 1, 2. P(Ci),
the prior probability of each class, can be computed
based on the training tuples:

• P(buys computer = yes) = 9/14 = 0.643


• P(buys computer = no) = 5/14 = 0.357
• To compute P(X|Ci), for i = 1, 2, we compute the
following conditional probabilities:
• P(age = youth | buys computer = yes) = ?
• P(age = youth | buys computer = no) = ?
• P(income = medium | buys computer = yes) = ?
• P(income = medium | buys computer = no) = ?
• P(student = yes | buys computer = yes) = ?
• P(student = yes | buys computer = no) = ?
• P(credit rating = fair | buys computer = yes) = ?
• P(credit rating = fair | buys computer = no) = ?

• With the class in the role of the cause and the attribute values in the role of
the effects, the naïve Bayes factorization

$$P(\text{Cause}, \text{Effect}_1, \ldots, \text{Effect}_n) = P(\text{Cause}) \prod_i P(\text{Effect}_i \mid \text{Cause})$$

gives
P(X | buys computer = yes)
= 0.222 × 0.444 × 0.667 × 0.667 = 0.044


Similarly,
P(X | buys computer = no)
= 0.600 × 0.400 × 0.200 × 0.400 = 0.019
To find the class, Ci, that maximizes P(X|Ci)P(Ci), we
compute
P(X | buys computer = yes)P(buys computer = yes) = ?
P(X | buys computer = no)P(buys computer = no) = ?
P(X | buys computer = yes)P(buys computer = yes)
= 0.044 × 0.643 = 0.028
P(X | buys computer = no)P(buys computer = no)
= 0.019 × 0.357 = 0.007
Therefore, the naïve Bayesian classifier predicts buys
computer = yes for tuple X.
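
The whole calculation can be reproduced with a short Python sketch. The priors 9/14 and 5/14 and the rounded conditional probabilities are the values used in this example; the dictionary layout and variable names are assumptions made for the illustration.

```python
from math import prod

# Class priors P(Ci), from the class counts in the training data
priors = {"yes": 9 / 14, "no": 5 / 14}

# Rounded conditionals P(x_k | Ci) for X, in the order:
# age = youth, income = medium, student = yes, credit rating = fair
likelihoods = {
    "yes": [0.222, 0.444, 0.667, 0.667],
    "no": [0.600, 0.400, 0.200, 0.400],
}

# Naive Bayes score for each class: P(X | Ci) * P(Ci)
scores = {c: prod(likelihoods[c]) * priors[c] for c in priors}

# The predicted class is the one with the largest score
prediction = max(scores, key=scores.get)

print({c: round(s, 3) for c, s in scores.items()}, prediction)
```

The scores are not normalized, which is sufficient for classification because dividing both by the same P(X) would not change which one is larger.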
Practice Problem 1

• X = (Color = Red, Type = SUV, Origin = Domestic)
• We have P(Yes) = .5 and P(No) = .5.
• For v = Yes, we have P(Yes) * P(Red | Yes) * P(SUV | Yes) *
P(Domestic | Yes) = .5 * .56 * .31 * .43 = .037
• For v = No, we have P(No) * P(Red | No) * P(SUV | No) *
P(Domestic | No) = .5 * .43 * .56 * .56 ≈ .067
• Since 0.067 > 0.037, our example gets classified as No.
• X = (Color = Red, Type = SUV, Origin = Domestic) => Stolen = No
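
The same comparison can be carried out in log space, a common variant that is not part of the worked solution above: summing logarithms instead of multiplying probabilities avoids numerical underflow when there are many attributes, and it selects the same class. The helper `log_score` is a hypothetical name used only in this sketch; the probabilities are the rounded values from the problem.

```python
import math


def log_score(prior, conditionals):
    """log P(v) + sum of log P(a_i | v), the log of the naive Bayes score."""
    return math.log(prior) + sum(math.log(p) for p in conditionals)


# Rounded probabilities from the problem: Red, SUV, Domestic given each class
log_scores = {
    "Yes": log_score(0.5, [0.56, 0.31, 0.43]),
    "No": log_score(0.5, [0.43, 0.56, 0.56]),
}

# The logarithm is monotonic, so the largest log score marks the largest product
prediction = max(log_scores, key=log_scores.get)

print(prediction, {v: round(math.exp(s), 3) for v, s in log_scores.items()})
```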
Practice Problem 2

• X = (Outlook = sunny, Temperature = hot, Humidity = high, Windy = false)
