Bayesian Belief Network


Motivation
 We want a representation and reasoning system that is based on conditional independence
 Compact yet expressive representation
 Efficient reasoning procedures
 Bayesian Networks are such a representation
 Named after Thomas Bayes (ca. 1702 – 1761)
 The term was coined in 1985 by Judea Pearl (1936 – )
 Their invention changed the focus of AI from logic to probability!
Bayesian Network
 A Bayesian belief network is a useful way to represent probabilistic models and visualize them.

 A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
Bayesian Networks
 A Bayesian network specifies a joint distribution in a structured form

 Represent dependence/independence via a directed graph
  Nodes = random variables
  Edges = direct dependence

 Structure of the graph => conditional independence relations

 Requires that the graph is acyclic (no directed cycles)

 Two components to a Bayesian network
  The graph structure (conditional independence assumptions)
  The numerical probabilities (for each variable given its parents)
Bayesian Networks

 General form:

P(X1, X2, …, XN) = ∏_i P(Xi | parents(Xi))

The full joint distribution (left-hand side) and the graph-structured approximation (right-hand side).


Example of a simple Bayesian network

P(X1, X2, …, XN) = ∏_i P(Xi | parents(Xi))

    A   B
     \ /
      C

P(A, B, C) = P(C | A, B) P(A) P(B)

 Probability model has simple factored form

 Directed edges => direct dependence
 Absence of an edge => conditional independence

 Also known as belief networks, graphical models, causal networks

 Other formulations, e.g., undirected graphical models
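
To make the factored form concrete, here is a minimal Python sketch of this three-node network; all numeric probabilities below are made-up values for illustration, not taken from the slides.

```python
from itertools import product

# Assumed (illustrative) parameters for the network A -> C <- B.
p_a = 0.3              # P(A = true)
p_b = 0.6              # P(B = true)
p_c_given = {          # P(C = true | A, B)
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}

def joint(a: bool, b: bool, c: bool) -> float:
    """P(A=a, B=b, C=c) via the factorization P(C|A,B) P(A) P(B)."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pc = p_c_given[(a, b)] if c else 1 - p_c_given[(a, b)]
    return pa * pb * pc

# Sanity check: the eight joint probabilities sum to 1.
total = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-9
```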
Examples of 3-way Bayesian Networks

  A    B    C    (no edges)

Absolute Independence:
p(A,B,C) = p(A) p(B) p(C)
Examples of 3-way Bayesian Networks

 Conditionally independent effects:

      A
     / \
    B   C

 B and C are conditionally independent given A

 e.g., A is a disease, and we model B and C as conditionally independent symptoms given A
Examples of 3-way Bayesian Networks

 Independent causes:

    A   B
     \ /
      C

 "Explaining away" effect:
 A and B are independent but become dependent once C is known!
 (we'll come back to this later)
Examples of 3-way Bayesian Networks

A → B → C

Markov dependence:
p(A,B,C) = p(C|B) p(B|A) p(A)
Consider an example

Structure: Windy (W) and Cloudy (C) are the parents of Rains (R);
Rains is the parent of Wet grass (G) and Take off from work (O).

Priors:
  P(W) = 0.001        P(C) = 0.002

Rains CPT:
  W  C | P(R)
  T  T | 0.95
  T  F | 0.95
  F  T | 0.29
  F  F | 0.001

Wet grass CPT:          Take off from work CPT:
  R | P(G)                R | P(O)
  T | 0.90                T | 0.91
  F | 0.05                F | 0.001
Find the probability of having wet grass, P(G).

P(G) = P(G|R) P(R) + P(G|~R) P(~R)
     = 0.9 P(R) + 0.05 P(~R)                                  … (1)

Still, we have to find P(R). Since Windy and Cloudy have no edge
between them, they are independent, so e.g. P(W∧C) = P(W) P(C):

P(R) = P(R|W,C) P(W∧C) + P(R|~W,C) P(~W∧C)
     + P(R|W,~C) P(W∧~C) + P(R|~W,~C) P(~W∧~C)                … (2)
     = 0.95*0.001*0.002 + 0.29*(1-0.001)*0.002
     + 0.95*0.001*(1-0.002) + 0.001*(1-0.001)*(1-0.002)
     = 0.00252

P(R) = 0.00252                                                … (3)

Similarly,

P(~R) = P(~R|W,C) P(W∧C) + P(~R|~W,C) P(~W∧C)
      + P(~R|W,~C) P(W∧~C) + P(~R|~W,~C) P(~W∧~C)             … (4)
      = (1-0.95)*0.001*0.002 + (1-0.29)*(1-0.001)*0.002
      + (1-0.95)*0.001*(1-0.002) + (1-0.001)*(1-0.001)*(1-0.002)
      = 0.9974

P(~R) = 0.9974                                                … (5)

Substituting Eq. (3) and Eq. (5) into Eq. (1):

P(G) = P(G|R) P(R) + P(G|~R) P(~R)
     = 0.9*0.00252 + 0.05*0.9974
     = 0.0521
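
A quick way to verify this hand calculation is to enumerate the full joint distribution. Below is a minimal Python sketch of that check, using the CPT values from the slide (the variable names are shorthand introduced here):

```python
from itertools import product

# CPTs from the slide: Windy (W), Cloudy (C) -> Rains (R) -> Wet grass (G).
p_w, p_c = 0.001, 0.002
p_r = {(True, True): 0.95, (True, False): 0.95,
       (False, True): 0.29, (False, False): 0.001}   # P(R = T | W, C)
p_g = {True: 0.90, False: 0.05}                      # P(G = T | R)

def prob(value: bool, p_true: float) -> float:
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1 - p_true

# Marginalize the joint P(W, C, R, G = true) over W, C, and R.
p_g_true = sum(
    prob(w, p_w) * prob(c, p_c) * prob(r, p_r[(w, c)]) * p_g[r]
    for w, c, r in product([True, False], repeat=3)
)
print(round(p_g_true, 4))  # 0.0521, matching the hand calculation
```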
The Alarm Example
 You have a new burglar alarm installed
 It is reliable in detecting burglary, but also responds to minor earthquakes
 Two neighbors (John and Mary) promise to call you at work when they hear the alarm
  John always calls when he hears the alarm, but sometimes confuses the alarm with the phone ringing (and calls then also)
  Mary likes loud music and sometimes misses the alarm!

 Given evidence about who has and hasn't called, estimate the probability of a burglary
The Alarm Example
 Represent the problem using 5 binary variables:
  B = a burglary occurs at your house
  E = an earthquake occurs at your house
  A = the alarm goes off
  J = John calls to report the alarm
  M = Mary calls to report the alarm

 What is P(B | M, J)?

 We can use the full joint distribution to answer this question
  Requires 2^5 = 32 probabilities

 Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities?
Constructing a Bayesian Network: Step 1

 Order the variables in terms of causality (may be a partial order)
  e.g., {E, B} -> {A} -> {J, M}

 Use these assumptions to create the graph structure of the Bayesian network
The Resulting Bayesian Network

 Network topology reflects causal knowledge

1. What is the probability P(j ∧ m ∧ a ∧ ~b ∧ ~e)? (What is the probability that the alarm has sounded, neither a burglary nor an earthquake has occurred, and both John and Mary call?)
2. What is the probability of John calling?
Sol. 1.

P(j ∧ m ∧ a ∧ ~b ∧ ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
= 0.90 * 0.70 * 0.001 * 0.999 * 0.998
= 0.00062
Sol. 2.

P(J) = P(J|a) P(a) + P(J|~a) P(~a)
     = 0.90 P(a) + 0.05 P(~a)

To find P(a) (B and E are independent, so P(b∧e) = P(b) P(e), etc.):

P(a) = P(a|b,e) P(b∧e) + P(a|~b,e) P(~b∧e)
     + P(a|b,~e) P(b∧~e) + P(a|~b,~e) P(~b∧~e)
     = 0.00252

Similarly,

P(~a) = P(~a|b,e) P(b∧e) + P(~a|~b,e) P(~b∧e)
      + P(~a|b,~e) P(b∧~e) + P(~a|~b,~e) P(~b∧~e)
      = 0.9974

P(J) = P(J|a) P(a) + P(J|~a) P(~a)
     = 0.90 * 0.00252 + 0.05 * 0.9974
     = 0.0521
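
For larger networks this kind of hand calculation quickly becomes tedious, and inference libraries automate it. Here is a minimal sketch of the alarm network using the pgmpy library (assuming pgmpy is installed; the CPT values follow the standard textbook version of this example):

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Structure: B -> A <- E, with A -> J and A -> M.
model = BayesianNetwork([('B', 'A'), ('E', 'A'), ('A', 'J'), ('A', 'M')])

# Rows are [X = 0 (false), X = 1 (true)]; columns enumerate parent states
# in pgmpy's default order: (B=0,E=0), (B=0,E=1), (B=1,E=0), (B=1,E=1).
cpd_b = TabularCPD('B', 2, [[0.999], [0.001]])
cpd_e = TabularCPD('E', 2, [[0.998], [0.002]])
cpd_a = TabularCPD('A', 2,
                   [[0.999, 0.71, 0.06, 0.05],    # P(A=0 | B, E)
                    [0.001, 0.29, 0.94, 0.95]],   # P(A=1 | B, E)
                   evidence=['B', 'E'], evidence_card=[2, 2])
cpd_j = TabularCPD('J', 2, [[0.95, 0.10], [0.05, 0.90]],
                   evidence=['A'], evidence_card=[2])
cpd_m = TabularCPD('M', 2, [[0.99, 0.30], [0.01, 0.70]],
                   evidence=['A'], evidence_card=[2])

model.add_cpds(cpd_b, cpd_e, cpd_a, cpd_j, cpd_m)
assert model.check_model()

infer = VariableElimination(model)
print(infer.query(['J']))                              # P(J=1) ~ 0.0521
print(infer.query(['B'], evidence={'J': 1, 'M': 1}))   # P(B | j, m)
```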
Constructing a Bayesian Network: Step 2

 Fill in conditional probability tables (CPTs)
  One for each node
  2^k entries each, where k is the number of parents of the node

 Where do these probabilities come from?
  Expert knowledge
  From data (relative frequency estimates)
  Or a combination of both
The Bayesian network

 Shouldn't these add up to 1?

 No. Each row adds up to 1, and we're using this to let us show only half of the table. For example, if P(a | b, e) = 0.95, then P(~a | b, e) = 1 - 0.95 = 0.05.
The Bayesian network

 What is P(j ∧ m ∧ a ∧ ~b ∧ ~e)?

 P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
Number of Probabilities in Bayesian Networks (i.e. why Bayesian Networks are effective)

 Consider n binary variables

 An unconstrained joint distribution requires O(2^n) probabilities

 If we have a Bayesian network with a maximum of k parents for any node, then we need only O(n 2^k) probabilities

 Example: 16 binary variables
  Full joint distribution requires 2^16 = 65,536 probabilities
  How many probability values are required for the Bayes net?
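
A quick sketch of the count, assuming (hypothetically) that each of the 16 nodes has at most 3 parents:

```python
# Hypothetical example: 16 binary variables, at most 3 parents per node.
# Each node's CPT needs 2**k values of P(X = true | parents), k = #parents.
n, k_max = 16, 3

full_joint = 2 ** n                      # 65,536 entries
bayes_net_upper_bound = n * 2 ** k_max   # 16 * 8 = 128 entries
print(full_joint, bayes_net_upper_bound)
```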
Bayesian Networks from a different Variable Ordering
Example for BN construction: Fire Diagnosis

 You want to diagnose whether there is a fire in a building
 You receive a noisy report about whether everyone is leaving the building
 If everyone is leaving, this may have been caused by a fire alarm
 If there is a fire alarm, it may have been caused by a fire or by tampering
 If there is a fire, there may be smoke
Example for BN construction: Fire Diagnosis

First you choose the variables. In this case, all are Boolean:
 Tampering is true when the alarm has been tampered with
 Fire is true when there is a fire
 Alarm is true when there is an alarm
 Smoke is true when there is smoke
 Leaving is true if there are lots of people leaving the building
 Report is true if the sensor reports that lots of people are leaving the building

 Let's construct the Bayesian network for this
 First, you choose a total ordering of the variables, let's say: Fire; Tampering; Alarm; Smoke; Leaving; Report.
Example for BN construction: Fire Diagnosis
Example for BN construction: Fire Diagnosis

• Using the total ordering of variables:
  Let's say Fire; Tampering; Alarm; Smoke; Leaving; Report.
• Now choose the parents for each variable by evaluating conditional independencies:
  Fire is the first variable in the ordering; it does not have parents.
  Tampering is independent of Fire (learning that one is true would not change your beliefs about the probability of the other)
  Alarm depends on both Fire and Tampering: it could be caused by either or both
  Smoke is caused by Fire, and so is independent of Tampering and Alarm given whether there is a Fire
  Leaving is caused by Alarm, and thus is independent of the other variables given Alarm
  Report is caused by Leaving, and thus is independent of the other variables given Leaving
Example for BN construction: Fire Diagnosis

• How many probabilities do we need to specify for this Bayesian network?
• 1 + 1 + 4 + 2 + 2 + 2 = 12
  (Fire: 1, Tampering: 1, Alarm: 2^2 = 4, Smoke: 2, Leaving: 2, Report: 2; see the sketch below)
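
A small sketch of that count, derived from the parent sets chosen above (each node with k binary parents needs 2^k probabilities, one per row of its CPT):

```python
# Parent sets from the ordering Fire; Tampering; Alarm; Smoke; Leaving; Report.
parents = {
    'Fire': [],
    'Tampering': [],
    'Alarm': ['Fire', 'Tampering'],
    'Smoke': ['Fire'],
    'Leaving': ['Alarm'],
    'Report': ['Leaving'],
}

# One free parameter P(X = true | parent assignment) per CPT row.
n_params = sum(2 ** len(ps) for ps in parents.values())
print(n_params)  # 12
```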
Independence
 Let the symbol ⊥ denote independence of two variables.

[Diagram: a network over A, B, C]
Independence

True or False? [diagram omitted]

 General rule of thumb:
  A known variable makes everything below that variable independent from everything above that variable.
Another (tricky) Example

True or False? [diagram omitted]
Explaining Away
 The Earth doesn't care whether your house is currently being burgled
 While you are on vacation, one of your neighbors calls and tells you your home's burglar alarm is ringing.
 But now suppose you learn that there was a medium-sized earthquake in your neighborhood. Oh, whew! Probably not a burglar after all.

 The earthquake "explains away" the hypothetical burglar: knowing about the earthquake lowers your estimate of the probability of a burglary.
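
Explaining away can be checked numerically on the alarm network from earlier. A minimal self-contained sketch (CPT values follow the standard textbook alarm example, as assumed before):

```python
# Explaining away, computed by direct enumeration on the alarm network.
p_b, p_e = 0.001, 0.002
p_a = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(a | B, E)

def posterior_burglary(evidence_e=None):
    """P(B = 1 | a = 1), optionally also conditioning on E = evidence_e."""
    num = den = 0.0
    for b in (0, 1):
        for e in (0, 1):
            if evidence_e is not None and e != evidence_e:
                continue
            w = (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e) * p_a[(b, e)]
            den += w
            if b:
                num += w
    return num / den

print(posterior_burglary())    # P(B | a)    ~ 0.374
print(posterior_burglary(1))   # P(B | a, e) ~ 0.003: the earthquake
                               # explains away the alarm
```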
Independence
 Is there a principled way to determine all these dependencies?
 Yes! It's called D-separation: 3 specific rules.
  Some say D-separation rules are easy
  Our book: "rather complicated… we omit it"
  The truth: a mix of both… the rules are easy to state, but can be tricky to apply. Talk to me if you want to know more.
Web Resources
 https://www.youtube.com/watch?v=JS8vDX89w7Y
 https://www.youtube.com/watch?v=hEZjPZ-Ze0A
 https://www.youtube.com/watch?v=iz7Kl2gcmlk&t=686s
 https://www.youtube.com/watch?v=-h_h7pnwY8A&t=0s
 https://www.youtube.com/watch?v=zLTiayj_aSI
