L07: Probabilistic Reasoning
Semester I, 2024-25
Rohan Paul
Outline
• Last Class
• Adversarial Search
• This Class
• Probabilistic Reasoning
• Reference Material
• AIMA Ch. 13 and 14
Acknowledgement
These slides are intended for teaching purposes only. Some material
has been used/adapted from web sources and from slides by Doina
Precup, Dorsa Sadigh, Percy Liang, Mausam, Dan Klein, Anca
Dragan, Nicholas Roy and others.
Uncertainty in AI
• Uncertainty: the agent may be uncertain about aspects of the world it cannot observe directly.
• Observed variables (evidence): the agent knows certain things about the state of the world (e.g., sensor measurements or symptoms).
• Example: I hear an unusual sound and a burning smell in my car; what fault is there in my engine?
Examples
• Inferring disease from symptoms
• Accident prediction in the driving domain
• Fault diagnosis
• Predictive analytics / expert systems
Outline
• Representation for uncertainty (review)
• Bayes Nets: probabilistic reasoning gives us a framework for managing our beliefs and knowledge
• Answering queries using a Bayes Net
• Inference methods
• Approximate methods for answering queries
• Use of learning
Random Variables
• A random variable is some aspect of the world about which we (may) have uncertainty
• R = Do I have Covid?
• T = Is the engine faulty or working?
• D = How long will it take to drive to IIT?
• L = Where is the person?
• Domains
• R in {true, false} (often written as {+r, -r})
• T in {faulty, working}
• D in [0, ∞)
• L in possible locations in a grid {(0,0), (0,1), …}
• Motivating queries: I have fever, loss of smell, loss of taste; do I have Covid? I hear an unusual sound and a burning smell in my car; what fault is there in my engine? I hear footsteps in my house; where is the burglar?
Joint Distributions
• A joint distribution over a set of random variables specifies a real number for each assignment (or outcome): P(x1, x2, …, xn)
• Must obey: P(x1, …, xn) ≥ 0 for every assignment, and the entries sum to 1.
• Example, P(T, W):
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
Marginalization
• From a joint distribution (>1 variable), reduce to a distribution over a smaller set of variables.
• The resulting marginal distributions are sub-tables that eliminate variables.
• Marginalization (summing out): combine rows that agree on the kept variables by adding their probabilities.
• From the joint P(T, W) above:
Marginal P(T): hot 0.5, cold 0.5
Marginal P(W): sun 0.6, rain 0.4
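A minimal Python sketch of summing out, run on the P(T, W) table above (the dict representation and function name are my own):

```python
# Marginalize a joint distribution stored as {assignment tuple: probability}.
joint = {  # P(T, W) from the table above
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def marginal(joint, keep_index):
    """Sum out all variables except the one at position keep_index."""
    out = {}
    for assignment, p in joint.items():
        val = assignment[keep_index]
        out[val] = out.get(val, 0.0) + p
    return out

print(marginal(joint, 0))  # P(T): {'hot': 0.5, 'cold': 0.5}
print(marginal(joint, 1))  # P(W): {'sun': 0.6, 'rain': 0.4} (up to float rounding)
```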
Conditioning Joint Distribution
• Conditioning: P(x | y) = P(x, y) / P(y), i.e., select the rows matching the condition and renormalize.
• From the joint P(T, W) above:
P(W | T = cold): sun 0.4, rain 0.6
P(W | T = hot): sun 0.8, rain 0.2
Inference by Enumeration
• Query: P(W)?
S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20
P(sun) = 0.30 + 0.10 + 0.10 + 0.15 = 0.65
P(rain) = 1 - 0.65 = 0.35
Inference by Enumeration
• Query: P(W | winter, hot)? Using the same joint table:
P(sun | winter, hot) ∝ 0.10
P(rain | winter, hot) ∝ 0.05
Normalizing: P(sun | winter, hot) = 2/3, P(rain | winter, hot) = 1/3
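A minimal sketch of inference by enumeration over the P(S, T, W) table above (the representation and helper names are mine); it reproduces both queries:

```python
# Inference by enumeration: select rows consistent with the evidence,
# sum out the hidden variables, then normalize.
VARS = ("S", "T", "W")
joint = {  # P(S, T, W) from the table above
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def enumerate_query(query_var, evidence):
    """Return P(query_var | evidence) by enumerating the full joint."""
    qi = VARS.index(query_var)
    totals = {}
    for assignment, p in joint.items():
        row = dict(zip(VARS, assignment))
        if all(row[var] == val for var, val in evidence.items()):
            totals[assignment[qi]] = totals.get(assignment[qi], 0.0) + p
    z = sum(totals.values())  # normalization constant
    return {val: p / z for val, p in totals.items()}

print(enumerate_query("W", {}))                           # sun 0.65, rain 0.35
print(enumerate_query("W", {"S": "winter", "T": "hot"}))  # sun 2/3, rain 1/3
```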
Product Rule
• A marginal and a conditional give the joint distribution: P(d, w) = P(d | w) P(w)
• Example:
P(W): sun 0.8, rain 0.2
P(D | W): wet|sun 0.1, dry|sun 0.9, wet|rain 0.7, dry|rain 0.3
P(D, W): wet sun 0.08, dry sun 0.72, wet rain 0.14, dry rain 0.06
Bayes' Rule
• From the product rule, P(x, y) = P(x | y) P(y) = P(y | x) P(x). Dividing, we get:
P(x | y) = P(y | x) P(x) / P(y)
• Usefulness
• Lets us build one conditional from its reverse.
• Often one conditional is difficult to obtain but the other one is simple.
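A tiny numeric illustration of Bayes' rule; all the probabilities below are assumed values chosen for illustration, not numbers from the slides:

```python
# Bayes' rule sketch: infer P(covid | fever) from the reverse conditional.
p_c = 0.01      # prior P(+c): assumed
p_f_c = 0.90    # P(+f | +c): assumed
p_f_nc = 0.05   # P(+f | -c): assumed

p_f = p_f_c * p_c + p_f_nc * (1 - p_c)  # P(+f) by total probability
p_c_f = p_f_c * p_c / p_f               # Bayes' rule
print(round(p_c_f, 3))  # ~0.154: fever alone is weak evidence for covid
```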
Independence
• Two variables are independent if: P(x, y) = P(x) P(y) for all x, y
• This says that their joint distribution factors into a product of two simpler distributions.
• Another form: P(x | y) = P(x) for all x, y
• We write: X ⊥ Y
• Example: n independent flips of a fair coin. The joint over n variables has 2^n entries, but independence factors it into n smaller distributions, each P(Xi): H 0.5, T 0.5.
Bayesian Networks
• Problem with using full joint distribution tables as our probabilistic models:
• Unless there are only a few variables, the joint is hard to represent explicitly.
• Bayesian Networks:
• A technique for describing complex joint distributions (models) using simple, local
distributions (conditional probabilities)
• Also known as probabilistic graphical models
• Encode how variables locally influence each other. Local interactions chain together
to give global, indirect interactions
Examples
• To see what probability a BN gives to a full assignment, multiply all the relevant conditionals:
P(x1, …, xn) = ∏_i P(xi | parents(Xi))
Example: The Alarm Network
Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls

B P(B)          E P(E)
+b 0.001        +e 0.002
-b 0.999        -e 0.998

B E A P(A|B,E)
+b +e +a 0.95
+b +e -a 0.05
+b -e +a 0.94
+b -e -a 0.06
-b +e +a 0.29
-b +e -a 0.71
-b -e +a 0.001
-b -e -a 0.999

A J P(J|A)      A M P(M|A)
+a +j 0.9       +a +m 0.7
+a -j 0.1       +a -m 0.3
-a +j 0.05      -a +m 0.01
-a -j 0.95      -a -m 0.99
Estimating likelihood of variables
• With the network above, the probability of any full assignment is the product of the local conditionals. For example:
P(+j, +m, +a, -b, -e) = P(+j | +a) P(+m | +a) P(+a | -b, -e) P(-b) P(-e)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.000628
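A sketch that evaluates full assignments for the alarm network using the CPTs above (the dict encoding is my own):

```python
# Probability of a full assignment: multiply the relevant local
# conditionals. CPT entries are from the alarm-network tables above.
p_b = {"+b": 0.001, "-b": 0.999}
p_e = {"+e": 0.002, "-e": 0.998}
p_a = {("+b", "+e"): 0.95, ("+b", "-e"): 0.94,
       ("-b", "+e"): 0.29, ("-b", "-e"): 0.001}  # P(+a | B, E)
p_j = {"+a": 0.9, "-a": 0.05}   # P(+j | A)
p_m = {"+a": 0.7, "-a": 0.01}   # P(+m | A)

def full_assignment(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    pa = p_a[(b, e)] if a == "+a" else 1 - p_a[(b, e)]
    pj = p_j[a] if j == "+j" else 1 - p_j[a]
    pm = p_m[a] if m == "+m" else 1 - p_m[a]
    return p_b[b] * p_e[e] * pa * pj * pm

# P(-b, -e, +a, +j, +m) = 0.999 * 0.998 * 0.001 * 0.9 * 0.7
print(full_assignment("-b", "-e", "+a", "+j", "+m"))  # ~0.000628
```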
Answering a general probabilistic query
• A general query asks for a posterior over query variables Q given evidence e, obtained by summing out the hidden variables: P(Q | e) ∝ Σ_h P(Q, h, e).
• Inference by enumeration is one way to perform inference in a Bayesian Network (Bayes Net), e.g., computing P(B | +j, +m) in the alarm network.
Bayesian Networks: Inference
• Bayesian Networks
• Implicitly encode a probability distribution
• As a product of local conditional distributions
• Variables (these together make up all the variables of the network)
• Query variables
• Evidence variables
• Hidden variables
• Inference: what do we want to estimate?
• Some useful quantity derived from the joint distribution, e.g.:
• Posterior probability
• Most likely explanation
Inference by Enumeration: A way of answering probabilistic queries
Example factor, P(L | T):
T L P
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9
Inference by Enumeration as Operations on Factors
• Traffic domain: R → T → L (Rain → Traffic → Late).
• Factors
• A factor is a function from some set of variables to a real value.
• The initial factors are the conditional probability tables (one per node):
P(R): +r 0.1, -r 0.9
P(T | R): +r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9
P(L | T): +t +l 0.3, +t -l 0.7, -t +l 0.1, -t -l 0.9
• Select the values consistent with the evidence: if some variables are observed, apply that information to the associated factor; other factors are not affected. E.g., with evidence L = +l the last factor becomes:
P(+l | T): +t +l 0.3, -t +l 0.1
• Inference by enumeration, viewed via factors, is a procedure that joins all the factors and then sums out all the hidden variables.
• The two operations, "joining" and "summing out", are defined next.
Operation I: Joining Factors
• Joining
• Get all the factors over the joining variable.
• Build a new factor over the union of the variables involved.
• Computation for each entry: pointwise products.
• Example: joining on R combines P(R) and P(T | R) into P(R, T):
P(R): +r 0.1, -r 0.9
P(T | R): +r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9
P(R, T): +r +t 0.08, +r -t 0.02, -r +t 0.09, -r -t 0.81
Joining Factors
• Joining on T combines P(R, T) with P(L | T); summing out then eliminates variables one at a time:
P(R, T, L): +r +t +l 0.024, +r +t -l 0.056, +r -t +l 0.002, +r -t -l 0.018, -r +t +l 0.027, -r +t -l 0.063, -r -t +l 0.081, -r -t -l 0.729
Sum out R → P(T, L): +t +l 0.051, +t -l 0.119, -t +l 0.083, -t -l 0.747
Sum out T → P(L): +l 0.134, -l 0.866
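A runnable sketch of the two factor operations (the (variables, table) representation and function names are my own; the numbers are the traffic-domain factors above):

```python
# Factor operations sketch. A factor is (variables, table), where table
# maps a tuple of values to a number.
P_R  = (("R",), {("+r",): 0.1, ("-r",): 0.9})
P_TR = (("R", "T"), {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                     ("-r", "+t"): 0.1, ("-r", "-t"): 0.9})

def join(f, g):
    """Pointwise product over the union of the two factors' variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    extra = tuple(v for v in g_vars if v not in f_vars)
    out = {}
    for f_row, fp in f_tab.items():
        fa = dict(zip(f_vars, f_row))
        for g_row, gp in g_tab.items():
            ga = dict(zip(g_vars, g_row))
            if all(fa[v] == ga[v] for v in g_vars if v in fa):
                out[f_row + tuple(ga[v] for v in extra)] = fp * gp
    return f_vars + extra, out

def sum_out(var, f):
    """Eliminate var from factor f by summing over its values."""
    f_vars, f_tab = f
    i = f_vars.index(var)
    out = {}
    for row, p in f_tab.items():
        key = row[:i] + row[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return f_vars[:i] + f_vars[i + 1:], out

P_RT = join(P_R, P_TR)     # P(R, T): 0.08, 0.02, 0.09, 0.81
print(sum_out("R", P_RT))  # P(T): {('+t',): 0.17, ('-t',): 0.83}
```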
Inference by Enumeration
• On the traffic domain (R → T → L): inference by enumeration performs multiple join operations and then multiple eliminate operations.
Variable Elimination
• Inference by Enumeration
• Problem: the whole distribution is "joined up" before the hidden variables are summed out.
• Variable Elimination
• Interleaves joining and eliminating variables.
• Does not create the full joint distribution in one go.
• Key Idea:
• Pick a variable ordering; take the next variable in the order.
• Join all factors containing that variable.
• Sum the variable out of the new factor.
• Order of operations on the traffic domain, for query P(L):
Enumeration: join on R, join on T, eliminate R, eliminate T.
Variable elimination: join on R, eliminate R, join on T, eliminate T.
Variable Elimination
• A full VE trace on the traffic domain:
Initial factors: P(R): +r 0.1, -r 0.9; P(T | R): +r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9; P(L | T): +t +l 0.3, +t -l 0.7, -t +l 0.1, -t -l 0.9
Join R → P(R, T): +r +t 0.08, +r -t 0.02, -r +t 0.09, -r -t 0.81
Sum out R → P(T): +t 0.17, -t 0.83
Join T → P(T, L): +t +l 0.051, +t -l 0.119, -t +l 0.083, -t -l 0.747
Sum out T → P(L): +l 0.134, -l 0.866
(The factor P(L | T) is carried along unchanged until the join on T.)
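The same trace as code, reusing join and sum_out from the factor-operations sketch above:

```python
# Variable elimination on R -> T -> L: interleave each join with the
# matching sum-out (join/sum_out and P_R, P_TR are from the earlier sketch).
P_LT = (("T", "L"), {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
                     ("-t", "+l"): 0.1, ("-t", "-l"): 0.9})

P_T = sum_out("R", join(P_R, P_TR))  # join on R, eliminate R -> P(T)
P_L = sum_out("T", join(P_T, P_LT))  # join on T, eliminate T -> P(L)
print(P_L)  # (('L',), {('+l',): ~0.134, ('-l',): ~0.866}), up to float rounding
```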
Incorporating Evidence
• So far we computed P(L), i.e., P(Late). What happens for a conditional query such as P(Late | Rain), i.e., P(L | +r)?
• How do we incorporate evidence into variable elimination?
• Solution
• If there is evidence, start with the factors and select the rows consistent with the evidence. From the initial factors:
P(R): +r 0.1, -r 0.9
P(T | R): +r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9
P(L | T): +t +l 0.3, +t -l 0.7, -t +l 0.1, -t -l 0.9
selecting the evidence R = +r gives:
P(+r): +r 0.1
P(T | +r): +r +t 0.8, +r -t 0.2
P(L | T): +t +l 0.3, +t -l 0.7, -t +l 0.1, -t -l 0.9
• After selecting evidence, eliminate all variables other than the query and the evidence, then normalize (a sketch follows this slide).
• The efficiency of VE comes from distributivity: x(y + z) = xy + xz, i.e., sums are pushed inside products.
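A sketch of evidence selection plus normalization for P(L | +r), again reusing the helpers and factors from the earlier sketches (the select helper is mine):

```python
def select(f, var, val):
    """Restrict factor f to rows where var == val."""
    f_vars, f_tab = f
    i = f_vars.index(var)
    return f_vars, {row: p for row, p in f_tab.items() if row[i] == val}

# P(L | +r): select the evidence, eliminate R and T, then normalize.
fR  = select(P_R, "R", "+r")
fTR = select(P_TR, "R", "+r")
_, tab = sum_out("T", join(sum_out("R", join(fR, fTR)), P_LT))
z = sum(tab.values())  # = P(+r) = 0.1
print({row: p / z for row, p in tab.items()})  # P(L | +r): +l 0.26, -l 0.74
```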
Computational complexity
• VE's cost depends on the largest factor generated during its run.
• Factor size = number of entries in the table.
• In this example each generated factor is of size 2 (only one variable); note that y is observed.
• Elimination ordering: X1, X2, Z, X3.
How does variable ordering affect VE complexity?
• Example: a network in which Z is connected to every one of X1, …, Xn.
• Eliminate Z first: joining all the factors that mention Z produces a factor over X1, …, Xn — this factor has 2^n entries.
• Eliminate Z last: the other steps are like the previous example; each generated factor is of size 2, involving one variable.
• Variable ordering can have considerable impact (see the sketch below).
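A small simulation (my own construction, not from the slides) that tracks only factor scopes to compare the two orderings on the Z–X1…Xn network, assuming binary variables:

```python
def largest_generated_factor(n, order):
    """Simulate VE on scopes only; return the largest factor generated.
    Initial factors for Z -> Xi edges have scopes {Z} and {Z, Xi}."""
    factors = [{"Z"}] + [{"Z", f"X{i}"} for i in range(1, n + 1)]
    largest = 1
    for var in order:
        touching = [f for f in factors if var in f]
        joined = set().union(*touching) - {var}   # join, then sum out var
        largest = max(largest, 2 ** len(joined))  # binary vars: 2^k entries
        factors = [f for f in factors if var not in f]
        if joined:
            factors.append(joined)
    return largest

n = 10
xs = [f"X{i}" for i in range(1, n + 1)]
print(largest_generated_factor(n, ["Z"] + xs))  # Z first: 2**n = 1024
print(largest_generated_factor(n, xs + ["Z"]))  # Z last: 2
```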
Variable Ordering for VE
• Variable elimination is dominated by the size of the largest factor constructed during the operation of the algorithm.
• This depends on the structure of the network and the order in which variables are eliminated.
• Finding the optimal ordering is intractable.
• Finding a good ordering can be posed as a search problem; in practice, use heuristics (a sketch follows this list).
• Min-fill heuristic
• Greedily eliminate the variable whose elimination creates the smallest new factor.
• Min-neighbors heuristic
• Eliminate the variable that has the smallest number of neighbors in the current graph.
• Exercise: rank A, B, and D with the min-fill heuristic.
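A sketch of the two heuristics on a made-up undirected (moralized) graph; the adjacency below is illustrative, not the graph from the slide's exercise:

```python
from itertools import combinations

# Hypothetical moral graph, given as an adjacency map.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}

def fill_edges(graph, v):
    """Edges that must be added among v's neighbors if v is eliminated."""
    return [(a, b) for a, b in combinations(sorted(graph[v]), 2)
            if b not in graph[a]]

def min_neighbors(graph):
    """Pick the variable with the fewest neighbors in the current graph."""
    return min(graph, key=lambda v: len(graph[v]))

def min_fill(graph):
    """Pick the variable whose elimination adds the fewest fill edges."""
    return min(graph, key=lambda v: len(fill_edges(graph, v)))

print(min_neighbors(graph))  # 'A' (2 neighbors; ties broken by dict order)
print(min_fill(graph))       # 'A' (neighbors B, C are already connected)
```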
Some variables may be irrelevant for VE
• Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant for the query, and can be dropped before running VE.
• Example network: Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls.
• Bayes nets scale to realistic diagnosis problems; see A Bayesian Network Model for Diagnosis of Liver Disorders (Onisko et al., 1999).
Conditional Independence
• X and Y are independent if: P(x, y) = P(x) P(y) for all x, y
• X and Y are conditionally independent given Z if: P(x, y | z) = P(x | z) P(y | z) for all x, y, z
• Example: smoke causes the alarm to be triggered. Once there is smoke, it does not matter what caused it (e.g., fire or any other source).
Graph structure encodes independence relations
• Conditional independence relations in a Bayes Net are determined by the graph structure.
Common Cause
• Structure: X ← Y → Z, with Y: Covid, X: Fever, Z: Loss of smell.
• If you have Covid, then the belief over loss of smell is not affected by the presence of fever.
• Observing the cause blocks the influence (inactivates the path): X ⊥ Z | Y.
Common Effect
• Structure: X → Z ← Y, with X: Covid, Y: Tuberculosis, Z: Fever.
• Are X and Y independent?
• Yes
• Covid and TB both cause fever, but knowing that you have Covid does not make you more or less likely to have TB (under this model).
Common Effect
• Is X independent of Y given Z?
• No
• Seeing the fever puts Covid and TB in competition as possible causal explanations.
• It is likely that one of them is the cause; it is rare that both are. If Covid is present, then the likelihood of TB being present is reduced.
• Observing the common effect activates influence between the possible causes.
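Explaining away can be checked numerically with the alarm-network CPTs given earlier (the helper names are mine): observing the alarm makes burglary fairly likely, but additionally observing an earthquake drops that belief sharply.

```python
# "Explaining away" with the alarm-network CPTs from the tables above:
# compare P(+b | +a) with P(+b | +a, +e).
p_b, p_e = 0.001, 0.002
p_a = {("+b", "+e"): 0.95, ("+b", "-e"): 0.94,
       ("-b", "+e"): 0.29, ("-b", "-e"): 0.001}  # P(+a | B, E)

def prior(val, p):
    """P(+x) = p, P(-x) = 1 - p."""
    return p if val.startswith("+") else 1 - p

def posterior_burglary(evidence_e=None):
    """P(+b | +a) if evidence_e is None, else P(+b | +a, evidence_e)."""
    es = [evidence_e] if evidence_e else ["+e", "-e"]
    score = {b: sum(prior(b, p_b) * prior(e, p_e) * p_a[(b, e)] for e in es)
             for b in ("+b", "-b")}
    return score["+b"] / (score["+b"] + score["-b"])

print(round(posterior_burglary(), 4))      # P(+b | +a)     ~ 0.3736
print(round(posterior_burglary("+e"), 4))  # P(+b | +a, +e) ~ 0.0033
```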
"Explaining Away"
[Figure: d-separation example over R, B, and T'; answer marked Yes]
D-Separation: Examples
[Figures: independence queries on the network over R, B, D, T, T', and L; answers marked Yes]
[Figure: independence query over T and D; answer marked Yes]