Bayesian Belief Network


Motivation
 We want a representation and reasoning system that is based on conditional independence
 Compact yet expressive representation
 Efficient reasoning procedures
 Bayesian Networks are such a representation
 Named after Thomas Bayes (ca. 1702 – 1761)
 The term was coined in 1985 by Judea Pearl (1936 – )
 Their invention changed the focus of AI from logic to probability!
Bayesian Network
 A Bayesian belief network is a useful way to represent probabilistic models and visualize them.

 A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
Bayesian Networks
 A Bayesian network specifies a joint distribution in a structured form

 Represent dependence/independence via a directed graph
  Nodes = random variables
  Edges = direct dependence

 Structure of the graph => conditional independence relations

 Requires that the graph is acyclic (no directed cycles)

 Two components to a Bayesian network
  The graph structure (conditional independence assumptions)
  The numerical probabilities (for each variable given its parents)
Bayesian Networks

 General form:

P(X1, X2, …, XN) = ∏_i P(Xi | parents(Xi))

The full joint distribution (left-hand side) and the graph-structured approximation (right-hand side).


Example of a simple Bayesian network

P(X1, X2, …, XN) = ∏_i P(Xi | parents(Xi))

    A   B
     \ /
      C

P(A, B, C) = P(C | A, B) P(A) P(B)

 Probability model has simple factored form

 Directed edges => direct dependence
 Absence of an edge => conditional independence

 Also known as belief networks, graphical models, causal networks

 Other formulations, e.g., undirected graphical models
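
To make the factored form concrete, here is a minimal Python sketch of this three-node network; all numeric probabilities below are made-up values for illustration, not taken from the slides.

```python
from itertools import product

# Assumed (illustrative) parameters for the network A -> C <- B.
p_a = 0.3              # P(A = true)
p_b = 0.6              # P(B = true)
p_c_given = {          # P(C = true | A, B)
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}

def joint(a: bool, b: bool, c: bool) -> float:
    """P(A=a, B=b, C=c) via the factorization P(C|A,B) P(A) P(B)."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pc = p_c_given[(a, b)] if c else 1 - p_c_given[(a, b)]
    return pa * pb * pc

# Sanity check: the eight joint probabilities sum to 1.
total = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-9
```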
Examples of 3-way Bayesian Networks

  A    B    C    (no edges)

Absolute Independence:
p(A,B,C) = p(A) p(B) p(C)
Examples of 3-way Bayesian Networks

 Conditionally independent effects:

      A
     / \
    B   C

 B and C are conditionally independent given A

 e.g., A is a disease, and we model B and C as conditionally independent symptoms given A
Examples of 3-way Bayesian Networks

 Independent causes:

    A   B
     \ /
      C

 "Explaining away" effect:
 A and B are independent but become dependent once C is known!
 (we'll come back to this later)
Examples of 3-way Bayesian Networks

A → B → C

Markov dependence:
p(A,B,C) = p(C|B) p(B|A) p(A)
Consider an example

Structure: Windy (W) and Cloudy (C) are the parents of Rains (R);
Rains is the parent of Wet grass (G) and Take off from work (O).

Priors:
  P(W) = 0.001        P(C) = 0.002

Rains CPT:
  W  C | P(R)
  T  T | 0.95
  T  F | 0.95
  F  T | 0.29
  F  F | 0.001

Wet grass CPT:          Take off from work CPT:
  R | P(G)                R | P(O)
  T | 0.90                T | 0.91
  F | 0.05                F | 0.001
Find the probability of having wet grass, P(G).

P(G) = P(G|R) P(R) + P(G|~R) P(~R)
     = 0.9 P(R) + 0.05 P(~R)                                  … (1)

Still, we have to find P(R). Since Windy and Cloudy have no edge
between them, they are independent, so e.g. P(W∧C) = P(W) P(C):

P(R) = P(R|W,C) P(W∧C) + P(R|~W,C) P(~W∧C)
     + P(R|W,~C) P(W∧~C) + P(R|~W,~C) P(~W∧~C)                … (2)
     = 0.95*0.001*0.002 + 0.29*(1-0.001)*0.002
     + 0.95*0.001*(1-0.002) + 0.001*(1-0.001)*(1-0.002)
     = 0.00252

P(R) = 0.00252                                                … (3)

Similarly,

P(~R) = P(~R|W,C) P(W∧C) + P(~R|~W,C) P(~W∧C)
      + P(~R|W,~C) P(W∧~C) + P(~R|~W,~C) P(~W∧~C)             … (4)
      = (1-0.95)*0.001*0.002 + (1-0.29)*(1-0.001)*0.002
      + (1-0.95)*0.001*(1-0.002) + (1-0.001)*(1-0.001)*(1-0.002)
      = 0.9974

P(~R) = 0.9974                                                … (5)

Substituting Eq. (3) and Eq. (5) into Eq. (1):

P(G) = P(G|R) P(R) + P(G|~R) P(~R)
     = 0.9*0.00252 + 0.05*0.9974
     = 0.0521
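
A quick way to verify this hand calculation is to enumerate the full joint distribution. Below is a minimal Python sketch of that check, using the CPT values from the slide (the variable names are shorthand introduced here):

```python
from itertools import product

# CPTs from the slide: Windy (W), Cloudy (C) -> Rains (R) -> Wet grass (G).
p_w, p_c = 0.001, 0.002
p_r = {(True, True): 0.95, (True, False): 0.95,
       (False, True): 0.29, (False, False): 0.001}   # P(R = T | W, C)
p_g = {True: 0.90, False: 0.05}                      # P(G = T | R)

def prob(value: bool, p_true: float) -> float:
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1 - p_true

# Marginalize the joint P(W, C, R, G = true) over W, C, and R.
p_g_true = sum(
    prob(w, p_w) * prob(c, p_c) * prob(r, p_r[(w, c)]) * p_g[r]
    for w, c, r in product([True, False], repeat=3)
)
print(round(p_g_true, 4))  # 0.0521, matching the hand calculation
```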
The Alarm Example
 You have a new burglar alarm installed
 It is reliable in detecting burglary, but also responds to minor earthquakes
 Two neighbors (John and Mary) promise to call you at work when they hear the alarm
  John always calls when he hears the alarm, but sometimes confuses the alarm with the phone ringing (and calls then also)
  Mary likes loud music and sometimes misses the alarm!

 Given evidence about who has and hasn't called, estimate the probability of a burglary
The Alarm Example
 Represent the problem using 5 binary variables:
  B = a burglary occurs at your house
  E = an earthquake occurs at your house
  A = the alarm goes off
  J = John calls to report the alarm
  M = Mary calls to report the alarm

 What is P(B | M, J)?

 We can use the full joint distribution to answer this question
  Requires 2^5 = 32 probabilities

 Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities?
Constructing a Bayesian Network: Step 1

 Order the variables in terms of causality (may be a partial order)
  e.g., {E, B} -> {A} -> {J, M}

 Use these assumptions to create the graph structure of the Bayesian network
The Resulting Bayesian Network

 Network topology reflects causal knowledge

1. What is the probability P(j ∧ m ∧ a ∧ ~b ∧ ~e)? (What is the probability that the alarm has sounded, neither a burglary nor an earthquake has occurred, and both John and Mary call?)
2. What is the probability of John calling?
Sol. 1.

P(j ∧ m ∧ a ∧ ~b ∧ ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
= 0.90 * 0.70 * 0.001 * 0.999 * 0.998
= 0.00062
Sol. 2.

P(J) = P(J|a) P(a) + P(J|~a) P(~a)
     = 0.90 P(a) + 0.05 P(~a)

To find P(a) (B and E are independent, so P(b∧e) = P(b) P(e), etc.):

P(a) = P(a|b,e) P(b∧e) + P(a|~b,e) P(~b∧e)
     + P(a|b,~e) P(b∧~e) + P(a|~b,~e) P(~b∧~e)
     = 0.00252

Similarly,

P(~a) = P(~a|b,e) P(b∧e) + P(~a|~b,e) P(~b∧e)
      + P(~a|b,~e) P(b∧~e) + P(~a|~b,~e) P(~b∧~e)
      = 0.9974

P(J) = P(J|a) P(a) + P(J|~a) P(~a)
     = 0.90 * 0.00252 + 0.05 * 0.9974
     = 0.0521
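
For larger networks this kind of hand calculation quickly becomes tedious, and inference libraries automate it. Here is a minimal sketch of the alarm network using the pgmpy library (assuming pgmpy is installed; the CPT values follow the standard textbook version of this example):

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Structure: B -> A <- E, with A -> J and A -> M.
model = BayesianNetwork([('B', 'A'), ('E', 'A'), ('A', 'J'), ('A', 'M')])

# Rows are [X = 0 (false), X = 1 (true)]; columns enumerate parent states
# in pgmpy's default order: (B=0,E=0), (B=0,E=1), (B=1,E=0), (B=1,E=1).
cpd_b = TabularCPD('B', 2, [[0.999], [0.001]])
cpd_e = TabularCPD('E', 2, [[0.998], [0.002]])
cpd_a = TabularCPD('A', 2,
                   [[0.999, 0.71, 0.06, 0.05],    # P(A=0 | B, E)
                    [0.001, 0.29, 0.94, 0.95]],   # P(A=1 | B, E)
                   evidence=['B', 'E'], evidence_card=[2, 2])
cpd_j = TabularCPD('J', 2, [[0.95, 0.10], [0.05, 0.90]],
                   evidence=['A'], evidence_card=[2])
cpd_m = TabularCPD('M', 2, [[0.99, 0.30], [0.01, 0.70]],
                   evidence=['A'], evidence_card=[2])

model.add_cpds(cpd_b, cpd_e, cpd_a, cpd_j, cpd_m)
assert model.check_model()

infer = VariableElimination(model)
print(infer.query(['J']))                              # P(J=1) ~ 0.0521
print(infer.query(['B'], evidence={'J': 1, 'M': 1}))   # P(B | j, m)
```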
Constructing a Bayesian Network: Step 2

 Fill in conditional probability tables (CPTs)
  One for each node
  2^k entries each, where k is the number of parents of the node

 Where do these probabilities come from?
  Expert knowledge
  From data (relative frequency estimates)
  Or a combination of both
The Bayesian network

 Shouldn't these add up to 1?

 No. Each row adds up to 1, and we're using this to let us show only half of the table. For example, if P(a | b, e) = 0.95, then P(~a | b, e) = 1 - 0.95 = 0.05.
The Bayesian network

 What is P(j ∧ m ∧ a ∧ ~b ∧ ~e)?

 P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
Number of Probabilities in Bayesian Networks (i.e. why Bayesian Networks are effective)

 Consider n binary variables

 An unconstrained joint distribution requires O(2^n) probabilities

 If we have a Bayesian network with a maximum of k parents for any node, then we need only O(n 2^k) probabilities

 Example: 16 binary variables
  Full joint distribution requires 2^16 = 65,536 probabilities
  How many probability values are required for the Bayes net?
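
A quick sketch of the count, assuming (hypothetically) that each of the 16 nodes has at most 3 parents:

```python
# Hypothetical example: 16 binary variables, at most 3 parents per node.
# Each node's CPT needs 2**k values of P(X = true | parents), k = #parents.
n, k_max = 16, 3

full_joint = 2 ** n                      # 65,536 entries
bayes_net_upper_bound = n * 2 ** k_max   # 16 * 8 = 128 entries
print(full_joint, bayes_net_upper_bound)
```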
Bayesian Networks from a different Variable Ordering
Example for BN construction: Fire Diagnosis

 You want to diagnose whether there is a fire in a building
 You receive a noisy report about whether everyone is leaving the building
 If everyone is leaving, this may have been caused by a fire alarm
 If there is a fire alarm, it may have been caused by a fire or by tampering
 If there is a fire, there may be smoke
Example for BN construction: Fire Diagnosis

First you choose the variables. In this case, all are Boolean:
 Tampering is true when the alarm has been tampered with
 Fire is true when there is a fire
 Alarm is true when there is an alarm
 Smoke is true when there is smoke
 Leaving is true if there are lots of people leaving the building
 Report is true if the sensor reports that lots of people are leaving the building

 Let's construct the Bayesian network for this
 First, you choose a total ordering of the variables, let's say: Fire; Tampering; Alarm; Smoke; Leaving; Report.
Example for BN construction: Fire Diagnosis
Example for BN construction: Fire Diagnosis

• Using the total ordering of variables:
  Let's say Fire; Tampering; Alarm; Smoke; Leaving; Report.
• Now choose the parents for each variable by evaluating conditional independencies:
  Fire is the first variable in the ordering; it does not have parents.
  Tampering is independent of Fire (learning that one is true would not change your beliefs about the probability of the other)
  Alarm depends on both Fire and Tampering: it could be caused by either or both
  Smoke is caused by Fire, and so is independent of Tampering and Alarm given whether there is a Fire
  Leaving is caused by Alarm, and thus is independent of the other variables given Alarm
  Report is caused by Leaving, and thus is independent of the other variables given Leaving
Example for BN construction: Fire Diagnosis

• How many probabilities do we need to specify for this Bayesian network?
• 1 + 1 + 4 + 2 + 2 + 2 = 12
  (Fire: 1, Tampering: 1, Alarm: 2^2 = 4, Smoke: 2, Leaving: 2, Report: 2; see the sketch below)
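
A small sketch of that count, derived from the parent sets chosen above (each node with k binary parents needs 2^k probabilities, one per row of its CPT):

```python
# Parent sets from the ordering Fire; Tampering; Alarm; Smoke; Leaving; Report.
parents = {
    'Fire': [],
    'Tampering': [],
    'Alarm': ['Fire', 'Tampering'],
    'Smoke': ['Fire'],
    'Leaving': ['Alarm'],
    'Report': ['Leaving'],
}

# One free parameter P(X = true | parent assignment) per CPT row.
n_params = sum(2 ** len(ps) for ps in parents.values())
print(n_params)  # 12
```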
Independence
 Let the symbol ⊥ denote independence of two variables.

[Diagram: a network over A, B, C]
Independence

True or False? [diagram omitted]

 General rule of thumb:
  A known variable makes everything below that variable independent from everything above that variable.
Another (tricky) Example

True or False? [diagram omitted]
Explaining Away
 The Earth doesn't care whether your house is currently being burgled
 While you are on vacation, one of your neighbors calls and tells you your home's burglar alarm is ringing.
 But now suppose you learn that there was a medium-sized earthquake in your neighborhood. Oh, whew! Probably not a burglar after all.

 The earthquake "explains away" the hypothetical burglar: knowing about the earthquake lowers your estimate of the probability of a burglary.
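
Explaining away can be checked numerically on the alarm network from earlier. A minimal self-contained sketch (CPT values follow the standard textbook alarm example, as assumed before):

```python
# Explaining away, computed by direct enumeration on the alarm network.
p_b, p_e = 0.001, 0.002
p_a = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(a | B, E)

def posterior_burglary(evidence_e=None):
    """P(B = 1 | a = 1), optionally also conditioning on E = evidence_e."""
    num = den = 0.0
    for b in (0, 1):
        for e in (0, 1):
            if evidence_e is not None and e != evidence_e:
                continue
            w = (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e) * p_a[(b, e)]
            den += w
            if b:
                num += w
    return num / den

print(posterior_burglary())    # P(B | a)    ~ 0.374
print(posterior_burglary(1))   # P(B | a, e) ~ 0.003: the earthquake
                               # explains away the alarm
```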
Independence
 Is there a principled way to determine all these dependencies?
 Yes! It's called D-separation: 3 specific rules.
  Some say D-separation rules are easy
  Our book: "rather complicated… we omit it"
  The truth: a mix of both… the rules are easy to state, but can be tricky to apply. Talk to me if you want to know more.
Web Resources
 https://www.youtube.com/watch?v=JS8vDX89w7Y
 https://www.youtube.com/watch?v=hEZjPZ-Ze0A
 https://www.youtube.com/watch?v=iz7Kl2gcmlk&t=686s
 https://www.youtube.com/watch?v=-h_h7pnwY8A&t=0s
 https://www.youtube.com/watch?v=zLTiayj_aSI
