Monte Carlo Artificial Intelligence: Bayesian Networks
Bayesian Networks
Why This Matters
• Bayesian networks have been the most
important contribution to the field of AI in
the last 10 years
• Provide a way to represent knowledge in an
uncertain domain and a way to reason about
this knowledge
• Many applications: medicine, factories,
help desks, spam filtering, etc.
A Bayesian Network
A Bayesian network is made up of two parts:
1. A directed acyclic graph
2. A set of parameters

Burglary → Alarm ← Earthquake

B      P(B)
false  0.999
true   0.001

E      P(E)
false  0.998
true   0.002

B      E      A      P(A|B,E)
false  false  false  0.999
false  false  true   0.001
false  true   false  0.71
false  true   true   0.29
true   false  false  0.06
true   false  true   0.94
true   true   false  0.05
true   true   true   0.95
A Directed Acyclic Graph
Burglary → Alarm ← Earthquake
A Set of Parameters
Burglary → Alarm ← Earthquake

B      P(B)
false  0.999
true   0.001

E      P(E)
false  0.998
true   0.002

B      E      A      P(A|B,E)
false  false  false  0.999
false  false  true   0.001
false  true   false  0.71
false  true   true   0.29
true   false  false  0.06
true   false  true   0.94
true   true   false  0.05
true   true   true   0.95

Each node Xi has a conditional probability distribution
P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
The parameters are the probabilities in these conditional probability
distributions.
Because we have discrete random variables, we have conditional
probability tables (CPTs).
A Set of Parameters
Conditional Probability Distribution for Alarm
Stores the probability distribution for Alarm given the values of
Burglary and Earthquake.

B      E      A      P(A|B,E)
false  false  false  0.999
false  false  true   0.001
false  true   false  0.71
false  true   true   0.29
true   false  false  0.06
true   false  true   0.94
true   true   false  0.05
true   true   true   0.95

For a given combination of values of the parents (B and E in this
example), the entries for P(A=true|B,E) and P(A=false|B,E) must add up
to 1, e.g. P(A=true|B=false,E=false) + P(A=false|B=false,E=false) = 1.
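
The sum-to-1 requirement can be checked mechanically. The sketch below stores the Alarm CPT from the table as a nested dictionary; the layout and the name `cpt_is_normalized` are illustrative choices, not part of the slides.

```python
# Alarm CPT from the table: key is (Burglary, Earthquake),
# value maps each Alarm value to P(A | B, E).
P_A = {
    (False, False): {False: 0.999, True: 0.001},
    (False, True):  {False: 0.71,  True: 0.29},
    (True,  False): {False: 0.06,  True: 0.94},
    (True,  True):  {False: 0.05,  True: 0.95},
}

def cpt_is_normalized(cpt, tol=1e-9):
    """Return True if, for every parent combination, the entries sum to 1."""
    return all(abs(sum(dist.values()) - 1.0) <= tol for dist in cpt.values())

print(cpt_is_normalized(P_A))
```

A table with a row such as {False: 0.5, True: 0.4} would fail this check, which is a cheap way to catch transcription errors in a CPT.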
Bayes Nets Formalized
A Bayes net (also called a belief network) is an augmented
directed acyclic graph, represented by the pair ⟨V, E⟩
where:
– V is a set of vertices.
– E is a set of directed edges joining vertices. No directed
cycles of any length are allowed.
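
One way to check the "no cycles" condition on a candidate pair of vertices and edges is a topological sort. The sketch below uses Kahn's algorithm; the function name `is_acyclic` and the set-of-tuples edge representation are assumptions made for illustration.

```python
from collections import deque

def is_acyclic(vertices, edges):
    """Return True if the directed graph (vertices, edges) has no directed cycle.

    edges is a collection of (parent, child) pairs.
    """
    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Repeatedly remove vertices that have no remaining incoming edges.
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)

    # Every vertex can be removed only if there is no cycle.
    return removed == len(vertices)

V = {"Burglary", "Earthquake", "Alarm"}
E = {("Burglary", "Alarm"), ("Earthquake", "Alarm")}
print(is_acyclic(V, E))
```

Adding the edge ("Alarm", "Burglary") creates a cycle, and the check then returns False.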
Bayesian Network Example
Weather Cavity
A Representation of the Full Joint
Distribution
• We will use the following abbreviations:
– P(x1, …, xn) for P(X1 = x1 ∧ … ∧ Xn = xn)
– parents(Xi) for the values of the parents of Xi
• From the Bayes net, we can calculate:
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \text{parents}(X_i))$$
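
As a concrete instance of this product, the sketch below evaluates the joint distribution of the burglary network from its three CPTs. The dictionary layout and the function name `joint` are illustrative assumptions; the numbers are the ones in the tables.

```python
from itertools import product

# CPTs from the burglary network.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
# P(Alarm=true | Burglary, Earthquake), keyed by (b, e).
P_A_TRUE = {
    (False, False): 0.001,
    (False, True): 0.29,
    (True, False): 0.94,
    (True, True): 0.95,
}

def joint(b, e, a):
    """P(B=b, E=e, A=a) = P(b) * P(e) * P(a | b, e)."""
    p_a = P_A_TRUE[(b, e)] if a else 1.0 - P_A_TRUE[(b, e)]
    return P_B[b] * P_E[e] * p_a

# Burglary, no earthquake, alarm rings: 0.001 * 0.998 * 0.94
print(joint(True, False, True))

# The eight joint entries should sum to 1 (up to rounding).
print(sum(joint(b, e, a) for b, e, a in product((True, False), repeat=3)))
```

Three small tables with 1 + 1 + 4 free parameters are enough to recover all eight entries of the full joint distribution.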
The Full Joint Distribution
$$P(x_1, \ldots, x_n) = P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1}, \ldots, x_1) \quad \text{(Chain Rule)}$$
The Full Joint Distribution
$$\prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1) = \prod_{i=1}^{n} P(x_i \mid \text{parents}(X_i))$$
Alarm
JohnCalls MaryCalls
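
As a small worked instance, the chain rule can be applied to the burglary network using the ordering Burglary, Earthquake, Alarm (an ordering chosen here for illustration). Earthquake has no parents in the graph, so P(e | b) reduces to P(e), and each chain-rule factor becomes a CPT entry:

```latex
\begin{aligned}
P(b, e, a) &= P(a \mid e, b)\, P(e \mid b)\, P(b) \\
           &= P(a \mid b, e)\, P(e)\, P(b)
\end{aligned}
```

For b = true, e = false, a = true, the tables give 0.94 × 0.998 × 0.001 = 0.00093812.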
Conditional Independence
• There is a general topological criterion called d-
separation
• d-separation determines whether a set of nodes X
is independent of another set Y given a third set E
D-separation
• We will use the notation I(X, Y | E) to mean
that X and Y are conditionally independent
given E
• Theorem [Verma and Pearl 1988]:
If a set of evidence variables E d-separates X and
Y in the Bayesian Network’s graph, then
I(X, Y | E)
• d-separation can be determined in linear
time using a DFS-like algorithm
D-separation
• Let the evidence nodes be E ⊆ V (where V is the set of
vertices, or nodes, in the graph), and let X and Y
be distinct nodes in V – E.
• We say X and Y are d-separated by E in the
Bayesian network if every undirected path
between X and Y is blocked by E.
• What does it mean for a path to be blocked?
There are 3 cases…
Case 1
There exists a node N on the path such that
• It is in the evidence set E (shaded grey)
• The arcs putting N in the path are “tail-to-
tail”.
X ← N → Y
Burglary → Alarm ← Earthquake
Your house has a twitchy burglar alarm that is also sometimes triggered by
earthquakes.
Earth obviously doesn’t care if your house is currently being broken into.
While you are on vacation, one of your nice neighbors calls and lets you
know your alarm went off.
Case 3 (Explaining Away)
Burglary → Alarm ← Earthquake
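
The explaining-away effect can be seen numerically with the CPTs of the burglary network. The sketch below computes the posterior probability of a burglary by brute-force enumeration of the joint distribution; the function names are illustrative, and enumeration is used only because the network is tiny.

```python
from itertools import product

# CPTs from the burglary network.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
# P(Alarm=true | Burglary, Earthquake), keyed by (b, e).
P_A_TRUE = {
    (False, False): 0.001,
    (False, True): 0.29,
    (True, False): 0.94,
    (True, True): 0.95,
}

def joint(b, e, a):
    """P(B=b, E=e, A=a) from the network factorization."""
    p_a = P_A_TRUE[(b, e)] if a else 1.0 - P_A_TRUE[(b, e)]
    return P_B[b] * P_E[e] * p_a

def posterior_burglary(alarm, earthquake=None):
    """P(Burglary=true | Alarm=alarm), optionally also given Earthquake."""
    # If Earthquake is unobserved, sum over both of its values.
    e_values = (True, False) if earthquake is None else (earthquake,)
    numerator = sum(joint(True, e, alarm) for e in e_values)
    denominator = sum(
        joint(b, e, alarm) for b, e in product((True, False), e_values)
    )
    return numerator / denominator

print(posterior_burglary(alarm=True))                   # alarm only
print(posterior_burglary(alarm=True, earthquake=True))  # alarm and earthquake
```

With these tables, the alarm alone raises the probability of a burglary from 0.001 to about 0.37. Also learning that an earthquake occurred drops it to about 0.003: the earthquake explains the alarm, so Burglary and Earthquake are dependent once Alarm is observed.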
Conditional Independence
• Note: D-separation only finds random variables
that are conditionally independent based on the
topology of the network
• Some random variables that are not d-separated
may still be conditionally independent because of
the probabilities in their CPTs
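
The purely topological test can be sketched by brute force: enumerate every undirected path between X and Y and check whether each one is blocked. The code below uses the standard blocking rules: a non-collider on the path blocks it when it is in the evidence set (tail-to-tail or head-to-tail arcs), and a collider (head-to-head arcs) blocks it unless the collider or one of its descendants is in the evidence set. The function name, the edge-list representation, and the edges Alarm → JohnCalls and Alarm → MaryCalls are assumptions made for illustration. Path enumeration is exponential in general, so this is not the linear-time algorithm mentioned earlier.

```python
def d_separated(x, y, evidence, edges):
    """Return True if every undirected path between x and y is blocked."""
    evidence = set(evidence)
    parents, children = {}, {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
        children.setdefault(p, set()).add(c)

    def descendants(node):
        seen, stack = set(), [node]
        while stack:
            for c in children.get(stack.pop(), ()):
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def blocked(path):
        # Examine each interior node together with its two path neighbors.
        for prev, node, nxt in zip(path, path[1:], path[2:]):
            node_parents = parents.get(node, set())
            if prev in node_parents and nxt in node_parents:
                # Head-to-head: blocks unless node or a descendant is observed.
                if not (({node} | descendants(node)) & evidence):
                    return True
            elif node in evidence:
                # Tail-to-tail or head-to-tail: blocks when node is observed.
                return True
        return False

    def simple_paths(current, path):
        if current == y:
            yield path
            return
        neighbors = parents.get(current, set()) | children.get(current, set())
        for n in neighbors:
            if n not in path:
                yield from simple_paths(n, path + [n])

    return all(blocked(p) for p in simple_paths(x, [x]))

EDGES = [
    ("Burglary", "Alarm"), ("Earthquake", "Alarm"),
    ("Alarm", "JohnCalls"), ("Alarm", "MaryCalls"),  # assumed edges
]
print(d_separated("Burglary", "Earthquake", set(), EDGES))
print(d_separated("Burglary", "Earthquake", {"Alarm"}, EDGES))
print(d_separated("JohnCalls", "MaryCalls", {"Alarm"}, EDGES))
```

Under these assumed edges, Burglary and Earthquake are d-separated with no evidence but not once Alarm (or a call caused by it) is observed, while observing Alarm d-separates JohnCalls from MaryCalls.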