Bayesian Networks
Introduction
Suppose you are trying to determine if a patient has inhalational anthrax. You observe the following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty breathing
You would like to determine how likely it is that the patient is infected with inhalational anthrax given that the patient has a cough, a fever, and difficulty breathing.
The Joint Probability Distribution
• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C

A     B     C     P(A,B,C)
false false false 0.1
false false true  0.2
false true  false 0.05
false true  true  0.05
true  false false 0.3
true  false true  0.1
true  true  false 0.05
true  true  true  0.15

(The entries sum to 1.)
Examples of things you can compute:
• P(A=true) = sum of P(A,B,C) in rows with A=true
• P(A=true, B=true | C=true) = P(A=true, B=true, C=true) / P(C=true)
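As a sketch, both calculations above can be done directly from the table (the dictionary encoding below is my own):

```python
# Joint distribution P(A, B, C) from the table above, keyed by (A, B, C).
joint = {
    (False, False, False): 0.10,
    (False, False, True):  0.20,
    (False, True,  False): 0.05,
    (False, True,  True):  0.05,
    (True,  False, False): 0.30,
    (True,  False, True):  0.10,
    (True,  True,  False): 0.05,
    (True,  True,  True):  0.15,
}

# P(A=true): sum of P(A,B,C) over the rows with A=true.
p_a = sum(p for (a, b, c), p in joint.items() if a)  # ~0.6

# P(A=true, B=true | C=true) = P(A=true, B=true, C=true) / P(C=true)
p_c = sum(p for (a, b, c), p in joint.items() if c)
p_ab_given_c = joint[(True, True, True)] / p_c       # ~0.3
```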
The Problem with the Joint Distribution
• Lots of entries in the table to fill up! For n Boolean variables, the full joint table has 2^n rows, each needing a probability.
Using a Bayesian Network Example
Using the network in the example (graph: A → B, B → C, B → D), suppose you want to calculate:

P(A = true, B = true, C = true, D = true)
  = P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)
  = (0.4) * (0.3) * (0.1) * (0.95)
  = 0.0114

The factorization comes from the graph structure; the numbers come from the conditional probability tables.
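The arithmetic above can be checked in a few lines (the variable names are mine; the values are read off the slide's conditional probability tables):

```python
# One factor per node, following the graph A -> B, B -> C, B -> D.
p_a   = 0.4   # P(A=true)
p_b_a = 0.3   # P(B=true | A=true)
p_c_b = 0.1   # P(C=true | B=true)
p_d_b = 0.95  # P(D=true | B=true)

p_abcd = p_a * p_b_a * p_c_b * p_d_b  # ~0.0114
```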
Bayesian Networks
Two important properties:
1. Encodes the conditional independence
relationships between the variables in the
graph structure
2. Is a compact representation of the joint
probability distribution over the variables
Conditional Independence
The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2).

[Figure: node X with parents P1 and P2, children C1 and C2; ND1 and ND2 are non-descendants of X.]
The Joint Probability Distribution
Due to the Markov condition, we can compute the joint probability distribution over all the variables X1, …, Xn in the Bayesian net using the formula:

P(X1 = x1, …, Xn = xn) = ∏_{i=1}^{n} P(Xi = xi | Parents(Xi))
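A minimal sketch of this formula in code, assuming each node's CPT is stored as a dictionary keyed by (parent values…, own value); the encoding and names are mine, but the numbers are the CPTs used in this slide set:

```python
# Example network from these slides: A -> B, B -> C, B -> D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

# cpt[X][(parent values..., x)] = P(X = x | Parents(X) = parent values)
cpt = {
    "A": {(True,): 0.4, (False,): 0.6},
    "B": {(False, False): 0.01, (False, True): 0.99,
          (True, False): 0.70, (True, True): 0.30},
    "C": {(False, False): 0.40, (False, True): 0.60,
          (True, False): 0.90, (True, True): 0.10},
    "D": {(False, False): 0.02, (False, True): 0.98,
          (True, False): 0.05, (True, True): 0.95},
}

def joint(assignment):
    """P(X1=x1, ..., Xn=xn) = product over i of P(Xi=xi | Parents(Xi))."""
    p = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[q] for q in pars) + (assignment[node],)
        p *= cpt[node][key]
    return p

p = joint({"A": True, "B": True, "C": True, "D": True})  # ~0.0114
```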
We derive it by repeatedly applying the product rule P(X,Y) = P(X|Y) P(Y):

P(A, B, C, D) = P(B, C, D | A) P(A)
             = P(C, D | B, A) P(B | A) P(A)
             = P(D | C, B, A) P(C | B, A) P(B | A) P(A)
             = P(A) P(B | A) P(C | A, B) P(D | A, B, C)
Joint Probability Factorization
Our example graph carries additional independence
information, which simplifies the joint distribution:
P(A, B, C, D) = P(A) P(B | A) P(C | A, B) P(D | A, B, C)
             = P(A) P(B | A) P(C | B) P(D | B)
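Two quick numerical checks of these claims (the CPT encoding is my own sketch; the numbers are from the tables in these slides): the simplified product still sums to 1 over all 16 assignments, and P(C | A, B) computed from it does not depend on A:

```python
from itertools import product

# CPTs for the example network A -> B, B -> C, B -> D.
P_A = {True: 0.4, False: 0.6}
P_B = {(False, False): 0.01, (False, True): 0.99,   # (a, b) -> P(B=b | A=a)
       (True, False): 0.70, (True, True): 0.30}
P_C = {(False, False): 0.40, (False, True): 0.60,   # (b, c) -> P(C=c | B=b)
       (True, False): 0.90, (True, True): 0.10}
P_D = {(False, False): 0.02, (False, True): 0.98,   # (b, d) -> P(D=d | B=b)
       (True, False): 0.05, (True, True): 0.95}

def joint(a, b, c, d):
    # P(A, B, C, D) = P(A) P(B|A) P(C|B) P(D|B)
    return P_A[a] * P_B[(a, b)] * P_C[(b, c)] * P_D[(b, d)]

# Check 1: the simplified product is still a proper distribution.
total = sum(joint(a, b, c, d)
            for a, b, c, d in product([False, True], repeat=4))  # ~1.0

# Check 2: P(C=true | A=a, B=b) is the same for a = true and a = false.
def p_c_given_ab(a, b):
    num = sum(joint(a, b, True, d) for d in (False, True))
    den = sum(joint(a, b, c, d) for c in (False, True) for d in (False, True))
    return num / den
# p_c_given_ab(a, b) ~ P_C[(b, True)] regardless of a
```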
What is P(C=true | A=true)?

P(C=t | A=t) = P(A=t, C=t) / P(A=t)

where P(A=t, C=t) = Σ_{b,d} P(A=t, B=b, C=t, D=d).

The denominator P(A=t) can be computed by marginalizing the factored joint over B, C, and D; each CPT sums to 1 over its own variable, so the sums collapse:

P(A=t) = Σ_{b,c,d} P(A=t) P(B=b | A=t) P(C=c | B=b) P(D=d | B=b)
       = P(A=t) Σ_b P(B=b | A=t) Σ_c P(C=c | B=b) Σ_d P(D=d | B=b)
       = P(A=t) Σ_b P(B=b | A=t) Σ_c P(C=c | B=b) * 1
       = 0.4 ( P(B=t | A=t) Σ_c P(C=c | B=t) + P(B=f | A=t) Σ_c P(C=c | B=f) )
       = 0.4 (0.3 * 1 + 0.7 * 1)
       = 0.4
The conditional probability tables for the example network:

A     P(A)
false 0.6
true  0.4

A     B     P(B|A)
false false 0.01
false true  0.99
true  false 0.7
true  true  0.3

B     D     P(D|B)
false false 0.02
false true  0.98
true  false 0.05
true  true  0.95

B     C     P(C|B)
false false 0.4
false true  0.6
true  false 0.9
true  true  0.1
What is P(C=true, A=true)?

P(A=t, C=t) = Σ_{b,d} P(A=t, B=b, C=t, D=d)
           = Σ_{b,d} P(A=t) P(B=b | A=t) P(C=t | B=b) P(D=d | B=b)
           = P(A=t) Σ_b P(B=b | A=t) P(C=t | B=b) Σ_d P(D=d | B=b)
           = 0.4 ( P(B=t | A=t) P(C=t | B=t) Σ_d P(D=d | B=t)
                 + P(B=f | A=t) P(C=t | B=f) Σ_d P(D=d | B=f) )
           = 0.4 (0.3 * 0.1 * 1 + 0.7 * 0.6 * 1)
           = 0.18
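This computation can be reproduced by brute-force marginalization of the factored joint (the encoding is my own sketch; the numbers are the CPTs from the tables above):

```python
from itertools import product

# CPTs for A -> B, B -> C, B -> D, from the tables above.
P_A = {True: 0.4, False: 0.6}
P_B = {(False, False): 0.01, (False, True): 0.99,
       (True, False): 0.70, (True, True): 0.30}
P_C = {(False, False): 0.40, (False, True): 0.60,
       (True, False): 0.90, (True, True): 0.10}
P_D = {(False, False): 0.02, (False, True): 0.98,
       (True, False): 0.05, (True, True): 0.95}

def joint(a, b, c, d):
    return P_A[a] * P_B[(a, b)] * P_C[(b, c)] * P_D[(b, d)]

# P(A=t, C=t): sum the joint over the unobserved variables B and D.
p_ac = sum(joint(True, b, True, d)
           for b, d in product([False, True], repeat=2))
# = 0.4 * (0.3 * 0.1 + 0.7 * 0.6) = 0.18

# P(C=t | A=t) = P(A=t, C=t) / P(A=t)
p_c_given_a = p_ac / P_A[True]  # ~0.45
```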