Bayes' Rule and Its Use
Bayes’ Theorem
We again consider the conditional probability statement:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)}$$
in which we have used the Theorem of Total Probability to replace P(B). Now
P(A ∩ B) = P(B ∩ A) = P(B|A) × P(A)
Substituting this in the expression for P(A|B) we immediately obtain the result
$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)}$$
This is true for any event A, so replacing A by Ai gives the result known as Bayes’ Theorem:
$$P(A_i \mid B) = \frac{P(B \mid A_i) \times P(A_i)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)}$$
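As a small illustration of the formula, the following Python sketch computes the posteriors P(Ai|B) from the priors P(Ai) and the likelihoods P(B|Ai). The function name `bayes_posterior` and the numbers (three machines producing 50%, 30% and 20% of output, with defect rates of 1%, 2% and 3%) are assumptions made up for this example; they do not come from the text.

```python
def bayes_posterior(priors, likelihoods):
    """Return [P(A_i | B)] given priors [P(A_i)] and likelihoods [P(B | A_i)]."""
    if abs(sum(priors) - 1.0) > 1e-9:
        # The A_i must be mutually exclusive and exhaustive.
        raise ValueError("priors must sum to 1")
    # Numerators of Bayes' theorem: P(B | A_i) * P(A_i)
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    # Denominator: P(B), by the Theorem of Total Probability
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return [j / p_b for j in joint]


# Hypothetical example: A_i = "item came from machine i", B = "item is defective"
priors = [0.5, 0.3, 0.2]           # assumed P(A_i)
likelihoods = [0.01, 0.02, 0.03]   # assumed P(B | A_i)
posterior = bayes_posterior(priors, likelihoods)
print(posterior)
```

With these assumed numbers, machine 1 has the largest prior but the smallest posterior, because its defect rate is the lowest.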
This equation is known as Bayes' rule (also Bayes' law or Bayes' theorem).
This simple equation underlies most modern AI systems for probabilistic inference.
Bayes' Rule and conditional independence
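$$
\begin{aligned}
P(\mathit{Cavity} \mid \mathit{toothache} \land \mathit{catch})
&= \alpha\, P(\mathit{toothache} \land \mathit{catch} \mid \mathit{Cavity})\, P(\mathit{Cavity}) \\
&= \alpha\, P(\mathit{toothache} \mid \mathit{Cavity})\, P(\mathit{catch} \mid \mathit{Cavity})\, P(\mathit{Cavity})
\end{aligned}
$$
Here α is a normalization constant, and the second step uses the conditional independence of toothache and catch given Cavity.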
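The normalization step can be sketched in Python as follows. The probability values are illustrative assumptions, not values given in the text, and the helper `normalize` is a hypothetical name. The code multiplies the per-symptom likelihoods by the prior for each value of Cavity, then rescales so the results sum to 1, which is the role α plays.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (this applies alpha)."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


# Assumed (illustrative) probabilities, keyed by whether Cavity is true
p_cavity = {True: 0.2, False: 0.8}            # P(Cavity)
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}       # P(catch | Cavity)

# Unnormalized posterior: P(toothache | c) * P(catch | c) * P(c)
unnormalized = {
    c: p_toothache_given[c] * p_catch_given[c] * p_cavity[c]
    for c in (True, False)
}
posterior = normalize(unnormalized)
print(posterior)
```

Under these assumed numbers the posterior probability of a cavity is about 0.871. P(toothache ∧ catch) is never computed directly; it is simply the sum that the normalization divides by.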
The training data are in the table above. The data tuples are described by the attributes age, income, student, and credit rating. The class label attribute, buys computer, has two distinct values (yes and no). Let C1 correspond to the class buys computer = yes and C2 correspond to buys computer = no. The tuple we wish to classify is X = (age = youth, income = medium, student = yes, credit rating = fair).
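A classification of this kind can be sketched in Python as below. This is a minimal sketch under stated assumptions: the eight training tuples are hypothetical stand-ins and are not the table referred to in the text; the function name `classify` is made up; the attributes are treated as conditionally independent given the class (the naive Bayes assumption); and probabilities are plain relative frequencies with no smoothing. The classifier scores each class Ci by P(X|Ci)P(Ci) and picks the largest.

```python
from collections import Counter

# Hypothetical training tuples: (attribute values, class label for "buys computer")
training = [
    ({"age": "youth", "income": "high", "student": "no", "credit rating": "fair"}, "no"),
    ({"age": "youth", "income": "medium", "student": "no", "credit rating": "excellent"}, "no"),
    ({"age": "senior", "income": "low", "student": "yes", "credit rating": "excellent"}, "no"),
    ({"age": "youth", "income": "medium", "student": "yes", "credit rating": "fair"}, "yes"),
    ({"age": "middle_aged", "income": "medium", "student": "yes", "credit rating": "fair"}, "yes"),
    ({"age": "senior", "income": "medium", "student": "no", "credit rating": "fair"}, "yes"),
    ({"age": "youth", "income": "low", "student": "yes", "credit rating": "excellent"}, "yes"),
    ({"age": "middle_aged", "income": "high", "student": "no", "credit rating": "fair"}, "yes"),
]


def classify(x, training):
    """Return (best_class, scores), where scores[c] = P(x | c) * P(c)."""
    class_counts = Counter(label for _, label in training)
    n = len(training)
    scores = {}
    for c, count_c in class_counts.items():
        score = count_c / n  # prior P(C_i) as a relative frequency
        for attr, value in x.items():
            # P(x_k | C_i): fraction of class-c tuples with this attribute value
            matches = sum(
                1 for row, label in training if label == c and row[attr] == value
            )
            score *= matches / count_c
        scores[c] = score
    return max(scores, key=scores.get), scores


x = {"age": "youth", "income": "medium", "student": "yes", "credit rating": "fair"}
label, scores = classify(x, training)
print(label, scores)
```

Because no smoothing is applied, an attribute value that never occurs with a class in the training data drives that class's score to zero. A practical implementation would usually add a Laplace correction.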
• We need to maximize P(X|Ci)P(Ci) for i = 1, 2. The prior probability of each class, P(Ci), can be computed from the training tuples: