Lecture Bayesian Networks
Bayesian Networks
● A simple, graphical notation for conditional independence assertions
and hence for compact specification of full joint distributions
● Syntax
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents:
P(X_i | Parents(X_i))
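For concreteness, a minimal Python sketch of this syntax (my illustration, not from the lecture): each node stores its parent tuple and a CPT mapping parent-value tuples to P(node = true). The CPT numbers are the standard textbook values for the burglary example introduced below.

    # Bayes net as a dict: variable -> (parents, CPT).
    # Each CPT maps a tuple of parent values to P(variable = True | parents).
    burglary_net = {
        "Burglary":   ((), {(): 0.001}),
        "Earthquake": ((), {(): 0.002}),
        "Alarm":      (("Burglary", "Earthquake"),
                       {(True, True): 0.95, (True, False): 0.94,
                        (False, True): 0.29, (False, False): 0.001}),
        "JohnCalls":  (("Alarm",), {(True,): 0.90, (False,): 0.05}),
        "MaryCalls":  (("Alarm",), {(True,): 0.70, (False,): 0.01}),
    }

    def prob(net, var, value, assignment):
        # P(var = value | parents(var)), read off the CPT under `assignment`.
        parents, cpt = net[var]
        p_true = cpt[tuple(assignment[p] for p in parents)]
        return p_true if value else 1.0 - p_true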
Example
● I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary
doesn’t call. Sometimes it’s set off by minor earthquakes.
Is there a burglar?
Compactness
● A CPT for Boolean X_i with k Boolean parents has 2^k rows; if each variable has at most k parents, the complete network requires O(n · 2^k) numbers
● I.e., it grows linearly with n, vs. O(2^n) for the full joint distribution
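For instance (a worked count, using the burglary network from the example above): its CPTs need 1 + 1 + 4 + 2 + 2 = 10 independent numbers, versus 2^5 − 1 = 31 for the full joint distribution over five Boolean variables.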
Global Semantics
● Global semantics defines the full joint distribution as the product of the local conditional distributions:

P(x_1, . . . , x_n) = ∏_{i=1}^{n} P(x_i | parents(X_i))
● E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
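Plugging in the standard textbook CPT values for this network (an assumption; the slide's figure presumably carried them):

    # One application of the product formula, with the standard CPT values:
    p = 0.90 * 0.70 * 0.001 * (1 - 0.001) * (1 - 0.002)
    # = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e) ≈ 0.000628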
Example
● Suppose we choose the ordering M, J, A, B, E
● P(J | M) = P(J)? No
● P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
● P(B | A, J, M) = P(B | A)? Yes
● P(B | A, J, M) = P(B)? No
● P(E | B, A, J, M) = P(E | A)? No
● P(E | B, A, J, M) = P(E | A, B)? Yes
inference
Inference Tasks
● Simple queries: compute posterior marginal P(X_i | E = e)
e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
Inference by Enumeration
● Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation
Enumeration Algorithm
function ENUMERATION-ASK(X, e, bn) returns a distribution over X
    inputs: X, the query variable
            e, observed values for variables E
            bn, a Bayesian network with variables {X} ∪ E ∪ Y
    Q(X) ← a distribution over X, initially empty
    for each value x_i of X do
        extend e with value x_i for X
        Q(x_i) ← ENUMERATE-ALL(VARS[bn], e)
    return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
    if EMPTY?(vars) then return 1.0
    Y ← FIRST(vars)
    if Y has value y in e
        then return P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e)
        else return Σ_y P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e_y)
             where e_y is e extended with Y = y
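Together with the burglary_net dict from the syntax sketch earlier, the following is a runnable Python transcription of the algorithm for Boolean networks (my sketch; names are illustrative):

    def enumeration_ask(X, e, net, order):
        # Posterior P(X | e); `order` lists all variables, parents first.
        q = {}
        for x in (True, False):
            q[x] = enumerate_all(order, {**e, X: x}, net)
        total = sum(q.values())                      # NORMALIZE
        return {x: p / total for x, p in q.items()}

    def enumerate_all(variables, e, net):
        if not variables:                            # EMPTY?(vars)
            return 1.0
        Y, rest = variables[0], variables[1:]
        parents, cpt = net[Y]
        p_true = cpt[tuple(e[p] for p in parents)]
        if Y in e:                                   # Y has a value y in e
            p_y = p_true if e[Y] else 1.0 - p_true
            return p_y * enumerate_all(rest, e, net)
        return sum((p_true if y else 1.0 - p_true)   # sum out hidden Y
                   * enumerate_all(rest, {**e, Y: y}, net)
                   for y in (True, False))

    order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
    print(enumeration_ask("Burglary", {"JohnCalls": True, "MaryCalls": True},
                          burglary_net, order))  # ≈ {True: 0.284, False: 0.716}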
Evaluation Tree
Irrelevant Variables
● Here
– X = JohnCalls, E = {Burglary}
– Ancestors({X} ∪ E) = {Alarm, Earthquake}
⇒ MaryCalls is irrelevant
● Compare this to backward chaining from the query in Horn clause KBs
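A small sketch of this ancestor test in Python (my illustration; the parent map mirrors the burglary network):

    def ancestors(nodes, parents):
        # All ancestors of `nodes` under the parent map.
        seen, frontier = set(), list(nodes)
        while frontier:
            for p in parents[frontier.pop()]:
                if p not in seen:
                    seen.add(p)
                    frontier.append(p)
        return seen

    parents = {"Burglary": (), "Earthquake": (),
               "Alarm": ("Burglary", "Earthquake"),
               "JohnCalls": ("Alarm",), "MaryCalls": ("Alarm",)}
    relevant = {"JohnCalls", "Burglary"}             # {X} ∪ E
    relevant |= ancestors(relevant, parents)
    print(set(parents) - relevant)                   # {'MaryCalls'}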
Irrelevant Variables
● Definition: the moral graph of a Bayes net is obtained by marrying all parents and dropping arrows
● A is m-separated from B by C iff A is separated from B by C in the moral graph; Y is irrelevant if m-separated from X by E
● For P(JohnCalls | Alarm = true), both Burglary and Earthquake are irrelevant
(A = JohnCalls, B = Burglary and Earthquake, C = Alarm)
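A sketch of the moral-graph construction, reusing the parent map defined above (my illustration; edges are returned as undirected frozensets):

    from itertools import combinations

    def moral_graph(parents):
        # Marry all co-parents, then drop arrow directions.
        edges = set()
        for child, ps in parents.items():
            edges |= {frozenset((child, p)) for p in ps}               # drop arrows
            edges |= {frozenset(pair) for pair in combinations(ps, 2)} # marry
        return edges

    # With the burglary parent map, Burglary and Earthquake get married;
    # Alarm then separates JohnCalls from both of them in the moral graph.
    print(sorted(tuple(sorted(e)) for e in moral_graph(parents)))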
approximate inference
● Outline
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process
whose stationary distribution is the true posterior
Example
● Shorthand: P̂(x_1, . . . , x_n) ≈ P(x_1, . . . , x_n)
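A runnable sketch of sampling from an empty network, i.e., with no evidence (often called PRIOR-SAMPLE); the sprinkler-network CPT values below are the standard textbook ones and are my assumption about what the example slides showed:

    import random

    sprinkler_net = [  # (variable, parents, CPT: parent values -> P(var = True))
        ("Cloudy",    (),          {(): 0.5}),
        ("Sprinkler", ("Cloudy",), {(True,): 0.1, (False,): 0.5}),
        ("Rain",      ("Cloudy",), {(True,): 0.8, (False,): 0.2}),
        ("WetGrass",  ("Sprinkler", "Rain"),
         {(True, True): 0.99, (True, False): 0.90,
          (False, True): 0.90, (False, False): 0.0}),
    ]

    def prior_sample(net):
        # Sample each variable in topological order given its sampled parents.
        event = {}
        for var, parents, cpt in net:
            p = cpt[tuple(event[q] for q in parents)]
            event[var] = random.random() < p
        return event

    # Consistency: the fraction of samples with Rain=true -> P(Rain) = 0.5
    samples = [prior_sample(sprinkler_net) for _ in range(10_000)]
    print(sum(s["Rain"] for s in samples) / 10_000)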
Rejection Sampling
● P̂(X | e) estimated from samples agreeing with e
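A sketch of rejection sampling built on prior_sample and sprinkler_net from the block above (illustrative code):

    def rejection_sampling(X, evidence, net, N):
        # Count only the samples that agree with the evidence.
        counts = {True: 0, False: 0}
        for _ in range(N):
            s = prior_sample(net)
            if all(s[v] == val for v, val in evidence.items()):
                counts[s[X]] += 1                  # consistent with e: keep
            # otherwise: reject the sample
        total = counts[True] + counts[False]
        return {x: c / total for x, c in counts.items()}

    print(rejection_sampling("Rain", {"Sprinkler": True}, sprinkler_net, 10_000))
    # ≈ {True: 0.30, False: 0.70}; most samples are thrown away,
    # which is why the method degrades as the evidence gets more specific.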
Likelihood Weighting
● Idea: fix evidence variables, sample only nonevidence variables, and weight each sample by the likelihood it accords the evidence
[Weighted-sample walkthrough: the weight starts at w = 1.0 and becomes w = 1.0 × 0.1 after the first evidence variable is fixed]
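A sketch of likelihood weighting, again reusing sprinkler_net (my transcription of the idea above, not the lecture's code):

    import random

    def weighted_sample(net, evidence):
        # Fix evidence variables; sample the rest; accumulate the weight.
        w, event = 1.0, {}
        for var, parents, cpt in net:
            p = cpt[tuple(event[q] for q in parents)]
            if var in evidence:
                event[var] = evidence[var]
                w *= p if evidence[var] else 1.0 - p  # likelihood of evidence
            else:
                event[var] = random.random() < p      # sample nonevidence var
        return event, w

    def likelihood_weighting(X, evidence, net, N):
        totals = {True: 0.0, False: 0.0}
        for _ in range(N):
            event, w = weighted_sample(net, evidence)
            totals[event[X]] += w                     # weighted count
        z = totals[True] + totals[False]
        return {x: t / z for x, t in totals.items()}

    print(likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True},
                               sprinkler_net, 10_000))
    # ≈ {True: 0.32, False: 0.68}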
MCMC Example
● Estimate P(Rain|Sprinkler = true, WetGrass = true)
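A minimal Gibbs-sampling sketch for this query, reusing sprinkler_net. For a four-variable toy network the conditional P(v | all other variables) can be computed brute-force from the joint, which keeps the code short (a real implementation would use the Markov blanket, and would typically discard burn-in samples):

    import random

    def joint(net, event):
        # Full-joint probability of a complete assignment (product formula).
        p = 1.0
        for var, parents, cpt in net:
            pt = cpt[tuple(event[q] for q in parents)]
            p *= pt if event[var] else 1.0 - pt
        return p

    def gibbs_ask(X, evidence, net, N):
        hidden = [v for v, _, _ in net if v not in evidence]
        state = dict(evidence)
        for v in hidden:
            state[v] = random.random() < 0.5          # arbitrary initial state
        counts = {True: 0, False: 0}
        for _ in range(N):
            for v in hidden:                          # resample each hidden var
                p_t = joint(net, {**state, v: True})
                p_f = joint(net, {**state, v: False})
                state[v] = random.random() < p_t / (p_t + p_f)
            counts[state[X]] += 1
        return {x: c / N for x, c in counts.items()}

    print(gibbs_ask("Rain", {"Sprinkler": True, "WetGrass": True},
                    sprinkler_net, 10_000))           # ≈ {True: 0.32, False: 0.68}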
Summary
● Bayes nets provide a natural representation for (causally induced)
conditional independence