Bayesian Networks


Bayesian Networks

Introduction
Suppose you are trying to determine if a patient has inhalational anthrax. You observe the following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty breathing
Introduction
You would like to determine how likely it is that the patient is infected with inhalational anthrax given that the patient has a cough, a fever, and difficulty breathing.

We are not 100% certain that the patient has anthrax because of these symptoms. We are dealing with uncertainty!
Introduction
Now suppose you order an X-ray and observe that the patient has a wide mediastinum.
Your belief that the patient is infected with inhalational anthrax is now much higher.
Introduction
• In the previous slides, what you observed affected your belief that the patient is infected with anthrax
• This is called reasoning with uncertainty
• Wouldn’t it be nice if we had some methodology for reasoning with uncertainty? Well, in fact, we do…
Bayesian Networks
[Network: HasAnthrax is the parent of HasCough, HasFever, HasDifficultyBreathing, and HasWideMediastinum]

• In the opinion of many AI researchers, Bayesian networks are the most significant contribution in AI in the last 10 years
• They are used in many applications, e.g. spam filtering, speech recognition, robotics, diagnostic systems, and even syndromic surveillance
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
Probability Primer: Random Variables
• A random variable is the basic element of probability
• It refers to an event whose outcome is subject to some degree of uncertainty
• For example, the random variable A could be the event of getting a head on a coin flip
Boolean Random Variables
• We will start with the simplest type of random variable – Boolean ones
• They take the values true or false
• Think of the event as occurring or not occurring
• Examples (let A be a Boolean random variable):
  A = Getting a head on a coin flip
  A = It will rain today
The Joint Probability Distribution
• Joint probabilities can be over any number of variables, e.g. P(A = true, B = true, C = true)
• For each combination of variables, we need to say how probable that combination is
• The probabilities of these combinations need to sum to 1

  A     B     C     P(A,B,C)
  false false false 0.10
  false false true  0.20
  false true  false 0.05
  false true  true  0.05
  true  false false 0.30
  true  false true  0.10
  true  true  false 0.05
  true  true  true  0.15
  (the probabilities sum to 1)
The Joint Probability Distribution
• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C

(Joint distribution table for P(A,B,C) as above.)

Examples of things you can compute (see the sketch below):
• P(A=true) = sum of P(A,B,C) over the rows with A=true
• P(A=true, B=true | C=true) = P(A=true, B=true, C=true) / P(C=true)
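
As a minimal sketch (the Python representation is ours, not the slides’), the two computations above can be read straight off the joint table:

```python
# The joint table P(A, B, C) from above, keyed by (a, b, c) tuples.
joint = {
    (False, False, False): 0.10, (False, False, True): 0.20,
    (False, True,  False): 0.05, (False, True,  True): 0.05,
    (True,  False, False): 0.30, (True,  False, True): 0.10,
    (True,  True,  False): 0.05, (True,  True,  True): 0.15,
}

# P(A=true): sum over the rows with A=true.
p_a = sum(p for (a, b, c), p in joint.items() if a)

# P(A=true, B=true | C=true) = P(A=true, B=true, C=true) / P(C=true).
p_c = sum(p for (a, b, c), p in joint.items() if c)
print(p_a, joint[(True, True, True)] / p_c)  # ≈ 0.6 and 0.15 / 0.5 = 0.3
```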
The Problem with the Joint Distribution
• Lots of entries in the table to fill up!
• For k Boolean random variables, you need a table of size 2^k (the table above has 2^3 = 8 rows)
• How do we use fewer numbers? We need the concept of independence
Independence
Variables A and B are independent if any of the following hold:
• P(A, B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
This says that knowing the outcome of A does not tell me anything new about the outcome of B.
Independence
How is independence useful?
• Suppose you have n coin flips and you want to calculate the joint distribution P(C1, …, Cn)
• If the coin flips are not independent, you need 2^n values in the table
• If the coin flips are independent, then

  $P(C_1, \ldots, C_n) = \prod_{i=1}^{n} P(C_i)$

  Each P(Ci) table has 2 entries and there are n of them, for a total of 2n values (see the sketch below)
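
As a minimal sketch of this saving, assuming some illustrative per-flip probabilities (the 0.5/0.7/0.9 values are made up for the example):

```python
import itertools

# Per-flip probabilities P(C_i = heads) for n = 3 coin flips.
# These values are assumptions for illustration only.
p_heads = [0.5, 0.7, 0.9]

def p_flips(outcomes):
    """Joint probability of a heads/tails outcome tuple, assuming independence."""
    p = 1.0
    for ph, heads in zip(p_heads, outcomes):
        p *= ph if heads else 1.0 - ph
    return p

# We stored only 2n numbers (each entry plus its complement), yet the
# implied 2**n-entry joint distribution still sums to 1.
print(sum(p_flips(o) for o in itertools.product([True, False], repeat=3)))  # ≈ 1.0
```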
Conditional Independence
Variables A and B are conditionally independent given C if any of the following hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)
Knowing C tells me everything about B. I don’t gain anything by knowing A (either because A doesn’t influence B or because knowing C provides all the information knowing A would give).
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
A Bayesian Network
A Bayesian network is made up of:

1. A directed acyclic graph

   [DAG: A → B, B → C, B → D]

2. A set of tables, one for each node in the graph:

   A     P(A)          A     B     P(B|A)
   false 0.6           false false 0.01
   true  0.4           false true  0.99
                       true  false 0.7
                       true  true  0.3

   B     C     P(C|B)      B     D     P(D|B)
   false false 0.4         false false 0.02
   false true  0.6         false true  0.98
   true  false 0.9         true  false 0.05
   true  true  0.1         true  true  0.95
A Bayesian Network
• Semantics of the DAG
  – Nodes are random variables
  – Edges represent causal influences
  – Each node is associated with a conditional probability distribution
• Two equivalent viewpoints
  – A data structure that represents the joint distribution compactly
  – A representation of a set of conditional independence assumptions about a distribution
A Directed Acyclic Graph
Each node in the graph is a random variable.

A node X is a parent of another node Y if there is an arrow from node X to node Y, e.g. A is a parent of B.

  [DAG: A → B, B → C, B → D]

Informally, an arrow from node X to node Y means X has a direct influence on Y.
A Set of Tables for Each Node
Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.

The parameters are the probabilities in these conditional probability tables (CPTs).

  [DAG: A → B, B → C, B → D]

(CPTs for P(A), P(B|A), P(C|B), and P(D|B) as on the previous slide.)
A Set of Tables for Each Node
Conditional probability distribution for C given B:

  B     C     P(C|B)
  false false 0.4
  false true  0.6
  true  false 0.9
  true  true  0.1

For a given combination of values of the parents (B in this example), the entries for P(C=true | B) and P(C=false | B) must add up to 1, e.g. P(C=true | B=false) + P(C=false | B=false) = 1.

If you have a Boolean variable with k Boolean parents, this table has 2^(k+1) probabilities (but only 2^k need to be stored); see the sketch below.
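
A minimal sketch of this storage scheme, using a plain Python dict for the CPT above (the representation is an assumption for illustration):

```python
# CPT for P(C | B) from the table above: store only P(C=true | B=b) per
# parent value b, since P(C=false | B=b) = 1 - P(C=true | B=b).
cpt_c_true = {False: 0.6, True: 0.1}

def p_c_given_b(c, b):
    """Look up P(C=c | B=b) from the stored half of the table."""
    p_true = cpt_c_true[b]
    return p_true if c else 1.0 - p_true

# Each row of the CPT is a distribution over C, so it sums to 1.
for b in (False, True):
    assert abs(p_c_given_b(True, b) + p_c_given_b(False, b) - 1.0) < 1e-9
print(p_c_given_b(True, False))  # P(C=true | B=false) = 0.6
```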
Using a Bayesian Network Example
Using the network in the example, suppose you want to calculate:

  P(A = true, B = true, C = true, D = true)
  = P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)
  = (0.4) * (0.3) * (0.1) * (0.95)
  = 0.0114

The factorization comes from the graph structure; the numbers come from the conditional probability tables. A code sketch follows.

  [DAG: A → B, B → C, B → D]
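
A minimal sketch of this computation, with the CPTs above stored as Python dicts keyed by the parent’s value (the representation is an assumption, not part of the slides):

```python
# CPTs of the example network A -> B -> {C, D}, each storing only
# P(X=true | parent value); the numbers come from the tables above.
p_a_true = 0.4
p_b_true = {False: 0.99, True: 0.30}   # P(B=true | A=a)
p_c_true = {False: 0.60, True: 0.10}   # P(C=true | B=b)
p_d_true = {False: 0.98, True: 0.95}   # P(D=true | B=b)

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(X=true)."""
    return p_true if value else 1.0 - p_true

def p_joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(A) P(B|A) P(C|B) P(D|B)."""
    return (bern(p_a_true, a) * bern(p_b_true[a], b)
            * bern(p_c_true[b], c) * bern(p_d_true[b], d))

print(p_joint(True, True, True, True))  # 0.4 * 0.3 * 0.1 * 0.95 ≈ 0.0114
```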
Bayesian Network: Markov Blanket
[Figure not preserved. The Markov blanket of a node consists of its parents, its children, and its children’s other parents; a node is conditionally independent of all other variables in the network given its Markov blanket.]
Bayesian Networks
Two important properties:
1. Encodes the conditional independence relationships between the variables in the graph structure
2. Is a compact representation of the joint probability distribution over the variables
Conditional Independence
The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2).

  [Figure: P1 and P2 are parents of X; ND1 and ND2 are non-descendants of X; C1 and C2 are children of X]
The Joint Probability Distribution
Due to the Markov condition, we can compute the joint probability distribution over all the variables X1, …, Xn in the Bayesian net using the formula:

  $P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid \mathrm{Parents}(X_i))$

where Parents(Xi) means the values of the parents of the node Xi with respect to the graph; a sketch of this computation follows.
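
A sketch of this formula for the running example; the dict-based network representation is an assumption for illustration, with the numbers taken from the CPTs given earlier:

```python
# `network` maps each node to (parent list, CPT), where the CPT gives
# P(node=true | parent values). Structure and numbers mirror A -> B -> {C, D}.
network = {
    "A": ([], {(): 0.40}),
    "B": (["A"], {(False,): 0.99, (True,): 0.30}),
    "C": (["B"], {(False,): 0.60, (True,): 0.10}),
    "D": (["B"], {(False,): 0.98, (True,): 0.95}),
}

def joint(assignment):
    """P(X_1=x_1, ..., X_n=x_n): one CPT lookup per node, multiplied together."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

print(joint({"A": True, "B": True, "C": True, "D": True}))  # ≈ 0.0114
```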
Joint Probability Factorization
For any joint distribution of random variables, the following factorization is always true. We derive it by repeatedly applying the product rule P(X, Y) = P(X | Y) P(Y):

  P(A, B, C, D) = P(B, C, D | A) P(A)
                = P(C, D | B, A) P(B | A) P(A)
                = P(D | C, B, A) P(C | B, A) P(B | A) P(A)
                = P(A) P(B | A) P(C | A, B) P(D | A, B, C)
Joint Probability Factorization
Our example graph carries additional independence information, which simplifies the joint distribution:

  P(A, B, C, D) = P(A) P(B | A) P(C | A, B) P(D | A, B, C)
                = P(A) P(B | A) P(C | B) P(D | B)

This is why we only need the tables for P(A), P(B|A), P(C|B), and P(D|B), and why we computed

  P(A = true, B = true, C = true, D = true)
  = P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)
  = (0.4) * (0.3) * (0.1) * (0.95)

  [DAG: A → B, B → C, B → D]
Inference
• Using a Bayesian network to compute probabilities is called inference
• In general, inference involves queries of the form:

  P(X | E)

  where X is the query variable(s) and E is the evidence variable(s)
Inference
[Network: HasAnthrax is the parent of HasCough, HasFever, HasDifficultyBreathing, and HasWideMediastinum]

• An example of a query would be:
  P(HasAnthrax = true | HasFever = true, HasCough = true)
• Note: Even though HasDifficultyBreathing and HasWideMediastinum are in the Bayesian network, they are not given values in the query (i.e. they appear neither as query variables nor as evidence variables)
• They are treated as unobserved variables and summed out
Inference Example

  [DAG: A → B, B → C, B → D]

Suppose we know that A = true. Which is more probable: C = true or D = true?
For this we need to compute P(C=t | A=t) and P(D=t | A=t). Let us compute the first one:

  $P(C{=}t \mid A{=}t) = \frac{P(A{=}t, C{=}t)}{P(A{=}t)} = \frac{\sum_{b,d} P(A{=}t, B{=}b, C{=}t, D{=}d)}{P(A{=}t)}$

(CPTs for P(A), P(B|A), P(C|B), and P(D|B) as given earlier.)
What is P(A=true)?

  $P(A{=}t) = \sum_{b,c,d} P(A{=}t, B{=}b, C{=}c, D{=}d)$
  $\quad = \sum_{b,c,d} P(A{=}t)\,P(B{=}b \mid A{=}t)\,P(C{=}c \mid B{=}b)\,P(D{=}d \mid B{=}b)$
  $\quad = P(A{=}t) \sum_b P(B{=}b \mid A{=}t) \sum_c P(C{=}c \mid B{=}b) \sum_d P(D{=}d \mid B{=}b)$
  $\quad = P(A{=}t) \sum_b P(B{=}b \mid A{=}t) \sum_c P(C{=}c \mid B{=}b) \cdot 1$
  $\quad = 0.4\,\big(P(B{=}t \mid A{=}t)\textstyle\sum_c P(C{=}c \mid B{=}t) + P(B{=}f \mid A{=}t)\sum_c P(C{=}c \mid B{=}f)\big)$
  $\quad = 0.4\,(0.3 \cdot 1 + 0.7 \cdot 1) = 0.4$

Each remaining sum is over a full conditional distribution and therefore equals 1, so P(A=t) is just the prior, 0.4.
What is P(C=true, A=true)?

  $P(A{=}t, C{=}t) = \sum_{b,d} P(A{=}t, B{=}b, C{=}t, D{=}d)$
  $\quad = \sum_{b,d} P(A{=}t)\,P(B{=}b \mid A{=}t)\,P(C{=}t \mid B{=}b)\,P(D{=}d \mid B{=}b)$
  $\quad = P(A{=}t) \sum_b P(B{=}b \mid A{=}t)\,P(C{=}t \mid B{=}b) \sum_d P(D{=}d \mid B{=}b)$
  $\quad = 0.4\,\big(P(B{=}t \mid A{=}t)\,P(C{=}t \mid B{=}t) \cdot 1 + P(B{=}f \mid A{=}t)\,P(C{=}t \mid B{=}f) \cdot 1\big)$
  $\quad = 0.4\,(0.3 \cdot 0.1 + 0.7 \cdot 0.6) = 0.4\,(0.03 + 0.42) = 0.4 \cdot 0.45 = 0.18$

Hence P(C=t | A=t) = P(A=t, C=t) / P(A=t) = 0.18 / 0.4 = 0.45; the same enumeration is sketched in code below.

(CPTs as given earlier.)
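
A sketch of this enumeration, restating the CPTs from the earlier sketch so the block runs on its own:

```python
import itertools

# CPTs of the example network (same numbers as the tables above).
p_b_true = {False: 0.99, True: 0.30}   # P(B=true | A=a)
p_c_true = {False: 0.60, True: 0.10}   # P(C=true | B=b)
p_d_true = {False: 0.98, True: 0.95}   # P(D=true | B=b)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def p_joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(A) P(B|A) P(C|B) P(D|B), with P(A=true) = 0.4."""
    return (bern(0.4, a) * bern(p_b_true[a], b)
            * bern(p_c_true[b], c) * bern(p_d_true[b], d))

tf = (True, False)
# P(A=t, C=t): sum out the unobserved variables B and D.
p_ac = sum(p_joint(True, b, True, d) for b, d in itertools.product(tf, tf))
# P(A=t): sum out B, C, and D.
p_a = sum(p_joint(True, b, c, d) for b, c, d in itertools.product(tf, repeat=3))
print(p_ac, p_a, p_ac / p_a)  # ≈ 0.18, 0.4, 0.45
```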
Bayesian Networks
Example
• Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused by the phone ringing and calls then too. Sophia, on the other hand, likes to listen to loud music, so she sometimes fails to hear the alarm. Here we would like to compute the probability of a burglary alarm.
Problem
Calculate the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both David and Sophia called Harry.

Using the chain rule, then the network’s conditional independences (David’s and Sophia’s calls depend only on the alarm A, and the alarm depends on burglary B and earthquake E):

  P[D, S, A, B, E] = P[D | S, A, B, E] · P[S, A, B, E]
                   = P[D | S, A, B, E] · P[S | A, B, E] · P[A, B, E]
                   = P[D | A] · P[S | A, B, E] · P[A, B, E]
                   = P[D | A] · P[S | A] · P[A | B, E] · P[B, E]
                   = P[D | A] · P[S | A] · P[A | B, E] · P[B | E] · P[E]


• Let’s take the observed probabilities for the Burglary and Earthquake components:
  P(B = true) = 0.002, the probability of a burglary
  P(B = false) = 0.998, the probability of no burglary
  P(E = true) = 0.001, the probability of a minor earthquake
  P(E = false) = 0.999, the probability that no earthquake occurred
• P(S, D, A, ¬B, ¬E) = P(S | A) · P(D | A) · P(A | ¬B ∧ ¬E) · P(¬B) · P(¬E)
  = 0.75 · 0.91 · 0.001 · 0.998 · 0.999
  ≈ 0.00068045 (see the sketch below)
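
A minimal sketch of this calculation; the factor values are the CPT entries quoted on this slide (the full tables are not reproduced in this text):

```python
# Burglary-network query P(S, D, A, ¬B, ¬E) as a product of CPT entries.
p_s_given_a = 0.75             # P(Sophia calls | alarm)
p_d_given_a = 0.91             # P(David calls | alarm)
p_a_given_not_b_not_e = 0.001  # P(alarm | no burglary, no earthquake)
p_not_b, p_not_e = 0.998, 0.999

p = p_s_given_a * p_d_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(round(p, 8))  # ≈ 0.00068045
```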
Question
[Figure: Bayesian network with Exam Level (e) → Marks (m) ← IQ Level (i), IQ Level (i) → Aptitude Score (s), and Marks (m) → Admission (a); the original graph and its CPTs are not preserved in this text]

• From the Bayesian network graph above, we see that the marks depend upon two other variables:
  – Exam Level (e) – This discrete variable denotes the difficulty of the exam and has two values (0 for easy and 1 for difficult)
  – IQ Level (i) – This represents the Intelligence Quotient level of the student and is also discrete, with two values (0 for low and 1 for high)
• Additionally, the IQ level of the student also determines another variable, the Aptitude Score of the student (s). Based on the marks the student scores, they may secure admission to a particular university. The probability distribution for getting admitted (a) to a university is given in the original graph.
Problems
• Case 1: Calculate the probability that, in spite of the exam level being difficult and the student having a low IQ level and a low Aptitude Score, the student manages to pass the exam and secure admission to the university.
• Case 2: Calculate the probability that the student has a high IQ level and Aptitude Score and the exam is easy, yet the student fails to pass and does not secure admission to the university.
Solutions
• Case 1:
  – From the word problem statement, the joint probability we need is
    P[a=1, m=1, i=0, e=1, s=0]
  – Feeding the values from the conditional probability tables into the factorization:
    P[a=1, m=1, i=0, e=1, s=0] = P(a=1 | m=1) · P(m=1 | i=0, e=1) · P(i=0) · P(e=1) · P(s=0 | i=0)
    = 0.1 · 0.1 · 0.8 · 0.3 · 0.75
    = 0.0018
Solutions
• Case 2:
  – The formula for the JPD is given by
    P[a=0, m=0, i=1, e=0, s=1] = P(a=0 | m=0) · P(m=0 | i=1, e=0) · P(i=1) · P(e=0) · P(s=1 | i=1)
    = 0.6 · 0.5 · 0.2 · 0.7 · 0.6
    = 0.0252
    (both cases are checked in the sketch below)
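
A minimal sketch checking both cases, with the CPT entries taken from the worked solutions above (the full tables come from the original graph):

```python
# Each query is a product of CPT entries, per the network factorization.
case1 = 0.1 * 0.1 * 0.8 * 0.3 * 0.75  # P(a=1, m=1, i=0, e=1, s=0)
case2 = 0.6 * 0.5 * 0.2 * 0.7 * 0.6   # P(a=0, m=0, i=1, e=0, s=1)
print(case1, case2)  # ≈ 0.0018, ≈ 0.0252
```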
