Lecture Bayesian Networks


Bayesian Network Example


● Topology of network encodes conditional independence assertions:

● Weather is independent of the other variables

● Toothache and Catch are conditionally independent given Cavity



Bayesian Networks
● A simple, graphical notation for conditional independence assertions
and hence for compact specification of full joint distributions

● Syntax
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents: P(X_i | Parents(X_i))

● In the simplest case, conditional distribution represented as a
  conditional probability table (CPT) giving the distribution over X_i
  for each combination of parent values
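As an illustration not found on the original slides, a Boolean CPT can be stored as a table keyed by parent values. The Python sketch below uses the usual textbook numbers for P(Alarm | Burglary, Earthquake); apart from 0.001, which appears on a later slide, these values are assumed rather than quoted from this deck.

# Minimal sketch: a CPT as a dictionary keyed by the tuple of parent values.
# Each entry stores P(X = true | parents); P(X = false | parents) is 1 - p.
# The Alarm numbers below are assumed textbook values, not taken from these slides
# (only 0.001 for P(a | ¬b, ¬e) appears later in the deck).
alarm_cpt = {
    (True,  True):  0.95,   # P(a | b, e)
    (True,  False): 0.94,   # P(a | b, ¬e)
    (False, True):  0.29,   # P(a | ¬b, e)
    (False, False): 0.001,  # P(a | ¬b, ¬e)
}

def p(cpt, value, parent_values):
    """Return P(X = value | parents) from a Boolean CPT."""
    p_true = cpt[tuple(parent_values)]
    return p_true if value else 1.0 - p_true

print(p(alarm_cpt, True, (False, False)))   # 0.001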

Example
● I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary
doesn’t call. Sometimes it’s set off by minor earthquakes.
Is there a burglar?

● Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

● Network topology reflects “causal” knowledge


– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call

Example

(Figure: the burglary network and its CPTs; image not included in this extract)

Compactness

● A conditional probability table for Boolean X_i with k Boolean parents has 2^k
  rows for the combinations of parent values

● Each row requires one number p for X_i = true
  (the number for X_i = false is just 1 − p)

● If each variable has no more than k parents,
  the complete network requires O(n · 2^k) numbers

● I.e., grows linearly with n, vs. O(2^n) for the full joint distribution

● For burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
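A quick arithmetic check of the count above (illustrative only; the parent counts follow the burglary network introduced on the previous slides):

# CPT entries needed for the burglary net: 2^k numbers per Boolean node with k parents.
parent_counts = {'Burglary': 0, 'Earthquake': 0, 'Alarm': 2, 'JohnCalls': 1, 'MaryCalls': 1}
print(sum(2 ** k for k in parent_counts.values()))   # 1 + 1 + 4 + 2 + 2 = 10
print(2 ** 5 - 1)                                    # full joint over 5 Booleans: 31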



Global Semantics

● Global semantics defines the full joint distribution as the product of the local
  conditional distributions:

  P(x_1, ..., x_n) = ∏_{i=1}^{n} P(x_i | parents(X_i))

● E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
  = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
  = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
  ≈ 0.00063
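A quick numerical check of the product above, using exactly the factors shown on the slide:

# P(j, m, a, ¬b, ¬e) = P(j|a) · P(m|a) · P(a|¬b,¬e) · P(¬b) · P(¬e)
print(0.9 * 0.7 * 0.001 * 0.999 * 0.998)   # ≈ 0.000628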

Constructing Bayesian Networks


● Need a method such that a series of locally testable assertions of
conditional independence guarantees the required global
semantics
1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n
   add X_i to the network
   select parents from X_1, ..., X_{i−1} such that
   P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i−1})

● This choice of parents guarantees the global semantics:

  P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i−1})   (chain rule)
                   = ∏_{i=1}^{n} P(X_i | Parents(X_i))         (by construction)
Example

● Suppose we choose the ordering M, J, A, B, E

● P(J | M) = P(J)?

Example
● Suppose we choose the ordering M, J, A, B, E

● P(J | M) = P(J)? No
● P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?

Example
● Suppose we choose the ordering M, J, A, B, E

● P(J | M) = P(J)? No
● P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
● P(B | A, J, M) = P(B | A)?
● P(B | A, J, M) = P(B)?

Example
● Suppose we choose the ordering M, J, A, B, E

● P(J | M) = P(J)? No
● P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
● P(B | A, J, M) = P(B | A)? Yes
● P(B | A, J, M) = P(B)? No
● P(E | B, A, J, M) = P(E | A)?
● P(E | B, A, J, M) = P(E | A, B)?

Example
● Suppose we choose the ordering M, J, A, B, E

● P(J | M) = P(J)? No
● P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
● P(B | A, J, M) = P(B | A)? Yes
● P(B | A, J, M) = P(B)? No
● P(E | B, A, J, M) = P(E | A)? No
● P(E | B, A, J, M) = P(E | A, B)? Yes

Example

● Deciding conditional independence is hard in noncausal directions

(Causal models and conditional independence seem hardwired for humans!)


● Assessing conditional probabilities is hard in noncausal directions
● Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed


Example: Car Diagnosis


● Initial evidence: car won’t start
● Testable variables (green), “broken, so fix it” variables (orange)
● Hidden variables (gray) ensure sparse structure, reduce parameters


Inference


Inference Tasks
● Simple queries: compute posterior marginal P(X_i | E = e)
  e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)

● Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)

● Optimal decisions: decision networks include utility information;
  probabilistic inference required for P(outcome | action, evidence)

● Value of information: which evidence to seek next?

● Sensitivity analysis: which probability values are most critical?

● Explanation: why do I need a new starter motor?


Inference by Enumeration
● Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation

● Simple query on the burglary network:

  P(B | j, m)
  = P(B, j, m) / P(j, m)
  = α P(B, j, m)
  = α ∑_e ∑_a P(B, e, a, j, m)

● Rewrite full joint entries using product of CPT entries:

  P(B | j, m)
  = α ∑_e ∑_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
  = α P(B) ∑_e P(e) ∑_a P(a | B, e) P(j | a) P(m | a)

● Recursive depth-first enumeration: O(n) space, O(d^n) time


Enumeration Algorithm
function ENUMERATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network with variables {X} ∪ E ∪ Y
  Q(X) ← a distribution over X, initially empty
  for each value x_i of X do
    extend e with value x_i for X
    Q(x_i) ← ENUMERATE-ALL(VARS[bn], e)
  return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
  if EMPTY?(vars) then return 1.0
  Y ← FIRST(vars)
  if Y has a value y in e
    then return P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e)
    else return ∑_y P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e_y)
         where e_y is e extended with Y = y
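As a companion to the pseudocode above (not part of the original slides), here is a minimal Python sketch of enumeration on the burglary network. Several CPT entries (0.95, 0.94, 0.29, 0.05, 0.01) are assumed from the standard textbook figure; the rest are consistent with the numbers on the global-semantics slide.

# Sketch of ENUMERATION-ASK / ENUMERATE-ALL for the burglary network.
# Boolean variables; each CPT maps a tuple of parent values to P(var = true | parents).
# Entries marked "assumed" come from the standard textbook figure, not from these slides.
parents = {'B': (), 'E': (), 'A': ('B', 'E'), 'J': ('A',), 'M': ('A',)}
cpt = {
    'B': {(): 0.001},
    'E': {(): 0.002},
    'A': {(True, True): 0.95, (True, False): 0.94,      # assumed
          (False, True): 0.29, (False, False): 0.001},  # 0.29 assumed
    'J': {(True,): 0.90, (False,): 0.05},               # 0.05 assumed
    'M': {(True,): 0.70, (False,): 0.01},               # 0.01 assumed
}
order = ['B', 'E', 'A', 'J', 'M']   # parents listed before children

def prob(var, value, evidence):
    p_true = cpt[var][tuple(evidence[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, evidence):
    if not variables:
        return 1.0
    first, rest = variables[0], variables[1:]
    if first in evidence:
        return prob(first, evidence[first], evidence) * enumerate_all(rest, evidence)
    return sum(prob(first, v, evidence) * enumerate_all(rest, {**evidence, first: v})
               for v in (True, False))

def enumeration_ask(X, evidence):
    q = {v: enumerate_all(order, {**evidence, X: v}) for v in (True, False)}
    z = sum(q.values())
    return {v: p / z for v, p in q.items()}

print(enumeration_ask('B', {'J': True, 'M': True}))   # P(b | j, m) ≈ 0.284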

Evaluation Tree

● Enumeration is inefficient: repeated computation
  e.g., computes P(j | a) P(m | a) for each value of e


Inference by Variable Elimination


● Variable elimination: carry out summations right-to-left,
storing intermediate results (factors) to avoid
recomputation
  P(B | j, m)
  = α P(B) ∑_e P(e) ∑_a P(a | B, e) P(j | a) P(m | a)
    (one factor per variable: B, E, A, J, M)
  = α P(B) ∑_e P(e) ∑_a P(a | B, e) P(j | a) f_M(a)
  = α P(B) ∑_e P(e) ∑_a P(a | B, e) f_J(a) f_M(a)
  = α P(B) ∑_e P(e) ∑_a f_A(a, b, e) f_J(a) f_M(a)
  = α P(B) ∑_e P(e) f_ĀJM(b, e)     (sum out A)
  = α P(B) f_ĒĀJM(b)                (sum out E)
  = α f_B(b) × f_ĒĀJM(b)

Variable Elimination Algorithm

function ELIMINATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, evidence specified as an event
          bn, a belief network specifying joint distribution P(X_1, ..., X_n)
  factors ← [ ]; vars ← REVERSE(VARS[bn])
  for each var in vars do
    factors ← [MAKE-FACTOR(var, e) | factors]
    if var is a hidden variable then factors ← SUM-OUT(var, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))
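The factor operations used by the algorithm can be sketched directly. The snippet below (not from the slides) represents a factor as a list of variables plus a table over Boolean assignments, and shows POINTWISE-PRODUCT and SUM-OUT; the example values for f_J and f_M are assumed textbook numbers.

from itertools import product

# A factor is a pair (variables, table): `table` maps a tuple of Boolean values,
# one per variable in order, to a number.

def pointwise_product(f1, f2):
    """Multiply two factors on the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product((True, False), repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(assign[v] for v in vars1)] *
                       t2[tuple(assign[v] for v in vars2)])
    return out_vars, table

def sum_out(var, factor):
    """Sum a variable out of a factor, producing a smaller factor."""
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out_vars, out

# f_J(a) = P(j | a) and f_M(a) = P(m | a); the values 0.90/0.05 and 0.70/0.01
# are assumed textbook numbers (only 0.9 and 0.7 appear on earlier slides).
f_J = (['A'], {(True,): 0.90, (False,): 0.05})
f_M = (['A'], {(True,): 0.70, (False,): 0.01})
print(sum_out('A', pointwise_product(f_J, f_M)))   # ([], {(): 0.6305})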

Irrelevant Variables

● Consider the query P(JohnCalls | Burglary = true)

  P(J | b) = α P(b) ∑_e P(e) ∑_a P(a | b, e) P(J | a) ∑_m P(m | a)

  Sum over m is identically 1; M is irrelevant to the query
● Theorem 1: Y is irrelevant unless Y ∈ Ancestors ({X} ∪ E)

● Here
  – X = JohnCalls, E = {Burglary}
  – Ancestors({X} ∪ E) = {Alarm, Earthquake}
  ⇒ MaryCalls is irrelevant
● Compare this to backward chaining from the query in Horn clause KBs
Irrelevant Variables

● Definition: moral graph of Bayes net: marry all parents and drop arrows

● Definition: A is m-separated from B by C iff separated by C in the moral graph

● Theorem 2: Y is irrelevant if m-separated from X by E

● For P(JohnCalls | Alarm = true), both Burglary and Earthquake are irrelevant
  (A = JohnCalls, B = Burglary and Earthquake, C = Alarm)
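As an illustration of Theorem 2 (not part of the slides), the moral-graph test can be coded directly: marry co-parents, drop arrow directions, block the evidence nodes, and check reachability. The burglary-network structure comes from the earlier slides.

from collections import deque

# Sketch of the m-separation test on the burglary net's moral graph.
parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}

def moral_graph(parents):
    """Undirected graph: original edges plus an edge between every pair of co-parents."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):      # "marry all parents"
                adj[ps[i]].add(ps[j])
                adj[ps[j]].add(ps[i])
    return adj

def m_separated(a, b, given, adj):
    """True iff every path between a and b in the moral graph passes through `given`."""
    blocked = set(given)
    seen, frontier = {a}, deque([a])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v == b:
                return False
            if v in blocked or v in seen:
                continue
            seen.add(v)
            frontier.append(v)
    return True

adj = moral_graph(parents)
# Query P(JohnCalls | Alarm = true): Burglary and Earthquake are m-separated
# from JohnCalls by {Alarm}, hence irrelevant (Theorem 2).
print(m_separated('J', 'B', {'A'}, adj), m_separated('J', 'E', {'A'}, adj))   # True True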


Complexity of Exact Inference


● Singly connected networks (or polytrees)
  – any two nodes are connected by at most one (undirected) path
  – time and space cost of variable elimination are O(d^k n)

● Multiply connected networks
  – can reduce 3SAT to exact inference ⇒ NP-hard


Approximate Inference


Inference by Stochastic Simulation


● Basic idea
  – Draw N samples from a sampling distribution S
  – Compute an approximate posterior probability P̂
  – Show this converges to the true probability P

● Outline
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process
whose stationary distribution is the true posterior


Sampling from an Empty Network

function PRIOR-SAMPLE(bn) returns an event sampled from bn
  inputs: bn, a belief network specifying joint distribution P(X_1, ..., X_n)
  x ← an event with n elements
  for i = 1 to n do
    x_i ← a random sample from P(X_i | parents(X_i))
          given the values of Parents(X_i) in x
  return x
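A minimal Python sketch of PRIOR-SAMPLE (not from the slides) on the cloudy/sprinkler/rain/wet-grass network used in the following examples. Only the values 0.5, 0.9 (= 1 − 0.1), 0.8 and 0.9 can be read off the later analysis slide; the remaining CPT entries are assumed textbook values.

import random

# Sketch of PRIOR-SAMPLE on the cloudy/sprinkler/rain/wet-grass network.
parents = {'Cloudy': (), 'Sprinkler': ('Cloudy',),
           'Rain': ('Cloudy',), 'WetGrass': ('Sprinkler', 'Rain')}
cpt = {
    'Cloudy':    {(): 0.5},
    'Sprinkler': {(True,): 0.1, (False,): 0.5},
    'Rain':      {(True,): 0.8, (False,): 0.2},
    'WetGrass':  {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.00},
}
order = ['Cloudy', 'Sprinkler', 'Rain', 'WetGrass']   # parents before children

def prior_sample():
    """Draw one complete event from the prior distribution defined by the network."""
    event = {}
    for var in order:
        p_true = cpt[var][tuple(event[p] for p in parents[var])]
        event[var] = random.random() < p_true
    return event

print(prior_sample())   # e.g. {'Cloudy': True, 'Sprinkler': False, 'Rain': True, ...}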


Example

(A sequence of figure-only slides stepping through one run of PRIOR-SAMPLE;
images not included in this extract.)

Sampling from an Empty Network


● Probability that PRIOR-SAMPLE generates a particular event

  S_PS(x_1, ..., x_n) = ∏_{i=1}^{n} P(x_i | parents(X_i)) = P(x_1, ..., x_n)

  i.e., the true prior probability

● E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)

● Let N_PS(x_1, ..., x_n) be the number of samples generated for event x_1, ..., x_n

● Then we have

  lim_{N→∞} P̂(x_1, ..., x_n) = lim_{N→∞} N_PS(x_1, ..., x_n) / N
                              = S_PS(x_1, ..., x_n)
                              = P(x_1, ..., x_n)

● That is, estimates derived from PRIOR-SAMPLE are consistent

● Shorthand: P̂(x_1, ..., x_n) ≈ P(x_1, ..., x_n)

Rejection Sampling
● P̂(X | e) estimated from samples agreeing with e

function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X | e)
  local variables: N, a vector of counts over X, initially zero
  for j = 1 to N do
    x ← PRIOR-SAMPLE(bn)
    if x is consistent with e then
      N[x] ← N[x] + 1 where x is the value of X in x
  return NORMALIZE(N[X])

● E.g., estimate P(Rain | Sprinkler = true) using 100 samples
  27 samples have Sprinkler = true
  Of these, 8 have Rain = true and 19 have Rain = false

● P̂(Rain | Sprinkler = true) = NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩
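A corresponding Python sketch of rejection sampling (not from the slides), estimating P(Rain | Sprinkler = true) as in the example above; the network and CPT values are the same assumed ones used in the prior-sampling sketch.

import random

# Sketch of rejection sampling for P(Rain | Sprinkler = true).
parents = {'Cloudy': (), 'Sprinkler': ('Cloudy',),
           'Rain': ('Cloudy',), 'WetGrass': ('Sprinkler', 'Rain')}
cpt = {'Cloudy': {(): 0.5},
       'Sprinkler': {(True,): 0.1, (False,): 0.5},
       'Rain': {(True,): 0.8, (False,): 0.2},
       'WetGrass': {(True, True): 0.99, (True, False): 0.90,
                    (False, True): 0.90, (False, False): 0.00}}
order = ['Cloudy', 'Sprinkler', 'Rain', 'WetGrass']

def prior_sample():
    event = {}
    for var in order:
        event[var] = random.random() < cpt[var][tuple(event[p] for p in parents[var])]
    return event

def rejection_sampling(X, evidence, n):
    """Keep only the prior samples that agree with the evidence; count X in them."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        s = prior_sample()
        if all(s[v] == val for v, val in evidence.items()):
            counts[s[X]] += 1
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()} if total else counts

print(rejection_sampling('Rain', {'Sprinkler': True}, 100_000))
# ≈ {True: 0.3, False: 0.7}; the slide's 100-sample run gave ⟨0.296, 0.704⟩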

Analysis of Rejection Sampling


● P̂(X | e) = α N_PS(X, e)           (algorithm defn.)
            = N_PS(X, e) / N_PS(e)   (normalized by N_PS(e))
            ≈ P(X, e) / P(e)         (property of PRIOR-SAMPLE)
            = P(X | e)               (defn. of conditional probability)

● Hence rejection sampling returns consistent posterior estimates

● Problem: hopelessly expensive if P (e) is small

● P (e) drops off exponentially with number of evidence variables!


Likelihood Weighting
● Idea: fix evidence variables, sample only nonevidence variables,
and weight each sample by the likelihood it accords the
evidence

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X | e)
  local variables: W, a vector of weighted counts over X, initially zero
  for j = 1 to N do
    x, w ← WEIGHTED-SAMPLE(bn, e)
    W[x] ← W[x] + w where x is the value of X in x
  return NORMALIZE(W[X])

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
  x ← an event with n elements; w ← 1
  for i = 1 to n do
    if X_i has a value x_i in e
      then w ← w × P(X_i = x_i | parents(X_i))
      else x_i ← a random sample from P(X_i | parents(X_i))
  return x, w
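A Python sketch of WEIGHTED-SAMPLE and LIKELIHOOD-WEIGHTING (not from the slides), applied to the burglary query P(B | j, m); CPT entries beyond those visible on earlier slides are assumed textbook values, as in the enumeration sketch.

import random

# Sketch of likelihood weighting for P(B | j, m) on the burglary network.
parents = {'B': (), 'E': (), 'A': ('B', 'E'), 'J': ('A',), 'M': ('A',)}
cpt = {'B': {(): 0.001}, 'E': {(): 0.002},
       'A': {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       'J': {(True,): 0.90, (False,): 0.05},
       'M': {(True,): 0.70, (False,): 0.01}}
order = ['B', 'E', 'A', 'J', 'M']

def prob(var, value, event):
    p_true = cpt[var][tuple(event[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def weighted_sample(evidence):
    """Fix evidence variables, sample the others, weight by the evidence likelihood."""
    event, w = dict(evidence), 1.0
    for var in order:
        if var in evidence:
            w *= prob(var, evidence[var], event)
        else:
            event[var] = random.random() < prob(var, True, event)
    return event, w

def likelihood_weighting(X, evidence, n):
    W = {True: 0.0, False: 0.0}
    for _ in range(n):
        event, w = weighted_sample(evidence)
        W[event[X]] += w
    total = sum(W.values())
    return {v: wt / total for v, wt in W.items()}

print(likelihood_weighting('B', {'J': True, 'M': True}, 100_000))
# noisy estimate; converges to the exact P(B | j, m) ≈ 0.284 as n grows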

Likelihood Weighting Example

(A sequence of figure-only slides stepping through one weighted sample; the running
weight shown is w = 1.0, then w = 1.0 × 0.1, then w = 1.0 × 0.1 × 0.99 = 0.099.
Images not included in this extract.)

Likelihood Weighting Analysis


● Sampling probability for WEIGHTED-SAMPLE is

  S_WS(z, e) = ∏_{i=1}^{l} P(z_i | parents(Z_i))

● Weight for a given sample z, e is

  w(z, e) = ∏_{i=1}^{m} P(e_i | parents(E_i))

● Weighted sampling probability is

  S_WS(z, e) w(z, e) = ∏_{i=1}^{l} P(z_i | parents(Z_i)) ∏_{i=1}^{m} P(e_i | parents(E_i))
                     = P(z, e)   (by standard global semantics of network)

● Hence likelihood weighting returns consistent estimates
  but performance still degrades with many evidence variables
  because a few samples have nearly all the total weight


Approximate Inference using MCMC

● “State” of network = current assignment to all variables

● Generate next state by sampling one variable given its Markov blanket

  Sample each variable in turn, keeping evidence fixed

● Can also choose a variable to sample at random each time


The Markov Chain


● With Sprinkler = true, WetGrass = true, there are four states
  (shown in a figure not included in this extract)

● Wander about for a while, average what you see


MCMC Example
● Estimate P(Rain|Sprinkler = true, WetGrass = true)

● Sample Cloudy or Rain given its Markov blanket, repeat.
  Count number of times Rain is true and false in the samples.

● E.g., visit 100 states
  31 have Rain = true, 69 have Rain = false

● P̂(Rain | Sprinkler = true, WetGrass = true)
  = NORMALIZE(⟨31, 69⟩) = ⟨0.31, 0.69⟩

● Theorem: chain approaches stationary distribution:


long-run fraction of time spent in each state is exactly
proportional to its posterior probability
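A self-contained Python sketch of the Gibbs sampling procedure described above (not from the slides): each nonevidence variable is resampled from its distribution given its Markov blanket. The sprinkler-network CPT values are assumed, as in the earlier sampling sketches.

import random

# Sketch of Gibbs sampling for P(Rain | Sprinkler = true, WetGrass = true).
parents = {'Cloudy': (), 'Sprinkler': ('Cloudy',),
           'Rain': ('Cloudy',), 'WetGrass': ('Sprinkler', 'Rain')}
cpt = {'Cloudy': {(): 0.5},
       'Sprinkler': {(True,): 0.1, (False,): 0.5},
       'Rain': {(True,): 0.8, (False,): 0.2},
       'WetGrass': {(True, True): 0.99, (True, False): 0.90,
                    (False, True): 0.90, (False, False): 0.00}}
children = {'Cloudy': ['Sprinkler', 'Rain'], 'Sprinkler': ['WetGrass'],
            'Rain': ['WetGrass'], 'WetGrass': []}

def prob(var, value, state):
    p_true = cpt[var][tuple(state[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def sample_given_markov_blanket(var, state):
    """P(var | mb(var)) ∝ P(var | parents(var)) × product over children of P(child | its parents)."""
    weights = {}
    for value in (True, False):
        s = {**state, var: value}
        w = prob(var, value, s)
        for child in children[var]:
            w *= prob(child, s[child], s)
        weights[value] = w
    return random.random() < weights[True] / (weights[True] + weights[False])

def gibbs_ask(X, evidence, n_steps):
    nonevidence = [v for v in parents if v not in evidence]
    state = {**evidence, **{v: random.choice([True, False]) for v in nonevidence}}
    counts = {True: 0, False: 0}
    for _ in range(n_steps):
        for var in nonevidence:            # sample each variable in turn, evidence fixed
            state[var] = sample_given_markov_blanket(var, state)
        counts[state[X]] += 1
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

print(gibbs_ask('Rain', {'Sprinkler': True, 'WetGrass': True}, 50_000))
# converges to ≈ ⟨0.32, 0.68⟩ for these assumed CPTs; the slide's 100-state run gave ⟨0.31, 0.69⟩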


Summary
● Bayes nets provide a natural representation for (causally induced)
conditional independence

● Generally easy for (non)experts to construct

● Exact inference by variable elimination


– polytime on polytrees, NP-hard on general graphs
– space = time, very sensitive to topology

● Approximate inference by LW, MCMC


– LW does poorly when there is lots of (downstream) evidence
– LW, MCMC generally insensitive to topology
– Convergence can be very slow with probabilities close to 1 or 0
– Can handle arbitrary combinations of discrete and continuous
variables

