
COL333/671: Introduction to AI

Semester I, 2024-25

Probabilistic Reasoning

Rohan Paul

Outline
• Last Class
• Adversarial Search
• This Class
• Probabilistic Reasoning
• Reference Material
• AIMA Ch. 13 and 14

Acknowledgement
These slides are intended for teaching purposes only. Some material
has been used/adapted from web sources and from slides by Doina
Precup, Dorsa Sadigh, Percy Liang, Mausam, Dan Klein, Anca
Dragan, Nicholas Roy and others.

Uncertainty in AI
• Observed variables (evidence): the agent knows certain things about the state of the world (e.g., sensor measurements or symptoms).
• Unobserved variables: the agent needs to reason about other aspects (e.g., what disease is present, whether the car is operational, the location of the burglar).
• Model: the agent knows something about how the known variables relate to the unknown variables.
• Probabilistic reasoning gives us a framework for managing our beliefs and knowledge.

Motivating questions:
• I hear an unusual sound and a burning smell in my car; what fault is there in my engine?
• I have fever, loss of smell, loss of taste; do I have Covid?
• I hear some footsteps in my house; where is the burglar?
Examples
• Inferring disease from symptoms
• Accidents in the driving domain
• Fault diagnosis
• Predictive analytics / expert systems
Outline
• Representation for Uncertainty (review)
• Bayes Nets:
• Probabilistic reasoning gives us a framework for managing our
beliefs and knowledge.
• Answering queries using Bayes Net
• Inference methods
• Approximate methods for answering queries
• Use of learning

Random Variables
• A random variable is some aspect of the world about which we (may) have uncertainty:
  • R = Do I have Covid?
  • T = Is the engine faulty or working?
  • D = How long will it take to drive to IIT?
  • L = Where is the person?
• Domains
  • R in {true, false} (often written as {+r, -r})
  • T in {faulty, working}
  • D in [0, ∞)
  • L in possible locations in a grid {(0,0), (0,1), …}
Joint Distributions
• A joint distribution over a set of random variables X1, …, Xn specifies a real number for each assignment (or outcome): P(X1 = x1, …, Xn = xn).
• Must obey: P(x1, …, xn) ≥ 0 for every assignment, and the entries sum to 1: Σ P(x1, …, xn) = 1.

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Note: the joint distribution can answer all probabilistic queries.
Problem: the table size is d^n for n variables with domain size d.
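To make the table concrete, here is a minimal Python sketch (illustrative, not from the slides) that stores the joint as a dictionary and checks both constraints:

```python
# The joint distribution P(T, W) from the table, keyed by (t, w) outcomes.
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

# A valid distribution: non-negative entries that sum to 1.
assert all(p >= 0 for p in joint.values())
assert abs(sum(joint.values()) - 1.0) < 1e-9
```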
Events
• An event is a set E of outcomes; P(E) is the sum of the probabilities of the outcomes in E: P(E) = Σ_{(x1,…,xn) ∈ E} P(x1, …, xn).
• From the joint distribution P(T, W) above, we can calculate the probability of any event:
  • Probability that it's hot AND sunny? 0.4
  • Probability that it's hot? 0.4 + 0.1 = 0.5
  • Probability that it's hot OR sunny? 0.4 + 0.1 + 0.2 = 0.7
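These event probabilities are just sums over outcomes. A small sketch, reusing the `joint` dictionary above (the `event` predicate is an illustrative helper, not slide notation):

```python
def prob_event(joint, event):
    """P(E): sum the probabilities of all outcomes in the event set E."""
    return sum(p for outcome, p in joint.items() if event(outcome))

print(prob_event(joint, lambda o: o == ("hot", "sun")))             # 0.4
print(prob_event(joint, lambda o: o[0] == "hot"))                   # 0.5
print(prob_event(joint, lambda o: o[0] == "hot" or o[1] == "sun"))  # 0.7
```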
Marginalization
▪ From a joint distribution (over more than one variable), we can reduce to a distribution over a smaller set of variables.
▪ The resulting marginal distributions are sub-tables in which variables have been eliminated.
▪ Marginalization (summing out): combine collapsed rows by adding their probabilities, e.g., P(t) = Σ_w P(t, w).

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

T     P            W     P
hot   0.5          sun   0.6
cold  0.5          rain  0.4
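Summing out is a one-pass accumulation over the table. A minimal sketch in the same dictionary representation (the `axis` argument, picking which variable to keep, is an assumption of this encoding):

```python
from collections import defaultdict

def marginal(joint, axis):
    """Keep the variable at position `axis`; sum out everything else."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)

print(marginal(joint, 0))  # P(T): {'hot': 0.5, 'cold': 0.5}
print(marginal(joint, 1))  # P(W): {'sun': 0.6, 'rain': 0.4}
```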
Conditioning a Joint Distribution
▪ Conditional distributions are probability distributions over some variables given fixed values of others: P(b | a) = P(a, b) / P(a).

Joint distribution P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Conditional distributions:
P(W | T=cold): sun 0.4, rain 0.6
P(W | T=hot):  sun 0.8, rain 0.2
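Conditioning is select-then-normalize. A sketch in the same representation:

```python
def conditional(joint, axis, value):
    """P(rest | variable at `axis` = value): select matching rows, renormalize."""
    selected = {o: p for o, p in joint.items() if o[axis] == value}
    z = sum(selected.values())  # P(value), the normalizer
    return {o: p / z for o, p in selected.items()}

print(conditional(joint, 0, "cold"))  # P(W | T=cold): sun 0.4, rain 0.6
print(conditional(joint, 0, "hot"))   # P(W | T=hot):  sun 0.8, rain 0.2
```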
Inference by Enumeration
• P(W)?

S       T     W     P
summer  hot   sun   0.30
summer  hot   rain  0.05
summer  cold  sun   0.10
summer  cold  rain  0.05
winter  hot   sun   0.10
winter  hot   rain  0.05
winter  cold  sun   0.15
winter  cold  rain  0.20

P(sun) = 0.30 + 0.10 + 0.10 + 0.15 = 0.65
P(rain) = 1 - 0.65 = 0.35

• P(W | winter, hot)?

Select the rows consistent with the evidence:
P(sun | winter, hot) ∝ 0.10
P(rain | winter, hot) ∝ 0.05
Normalize:
P(sun | winter, hot) = 0.10 / 0.15 = 2/3
P(rain | winter, hot) = 0.05 / 0.15 = 1/3
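Both queries can be answered mechanically from the table; a sketch, assuming the same dictionary representation as before:

```python
joint_stw = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

# P(W): sum out S and T.
p_w = {}
for (s, t, w), p in joint_stw.items():
    p_w[w] = p_w.get(w, 0.0) + p
print(p_w)  # {'sun': 0.65, 'rain': 0.35}

# P(W | winter, hot): select rows consistent with the evidence, then normalize.
sel = {w: p for (s, t, w), p in joint_stw.items() if (s, t) == ("winter", "hot")}
z = sum(sel.values())
print({w: p / z for w, p in sel.items()})  # sun: 2/3, rain: 1/3
```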
Product Rule
• A marginal and a conditional together give the joint distribution: P(d, w) = P(d | w) P(w).

• Example:
P(W):              P(D | W):                P(D, W):
W     P            D     W     P            D     W     P
sun   0.8          wet   sun   0.1          wet   sun   0.08
rain  0.2          dry   sun   0.9          dry   sun   0.72
                   wet   rain  0.7          wet   rain  0.14
                   dry   rain  0.3          dry   rain  0.06
Chain Rule

Constructing a larger joint distribution from simpler conditional distributions:
P(x1, x2, …, xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) … = Π_i P(xi | x1, …, x(i-1))
Bayes Rule (Thomas Bayes)

• Two ways to factor a joint distribution over two variables:
  P(x, y) = P(x | y) P(y) = P(y | x) P(x)

• Dividing, we get:
  P(x | y) = P(y | x) P(x) / P(y)

• Example: diagnostic probability from causal probability:
  P(cause | effect) = P(effect | cause) P(cause) / P(effect)

• Usefulness
  • Lets us build one conditional from its reverse.
  • Often one conditional is difficult to obtain but the other one is simple.
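A worked numerical sketch of the diagnostic direction; the prior, likelihood, and marginal below are invented for illustration and are not from the slides:

```python
# P(cause | effect) = P(effect | cause) * P(cause) / P(effect)
p_disease = 0.01             # P(cause): prior (assumed value)
p_sym_given_disease = 0.9    # P(effect | cause): causal direction, easy to assess
p_sym = 0.08                 # P(effect): marginal (assumed value)

p_disease_given_sym = p_sym_given_disease * p_disease / p_sym
print(p_disease_given_sym)   # 0.1125: the diagnostic probability
```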
Independence
• Two variables are independent if: P(x, y) = P(x) P(y) for all x, y.
• This says that their joint distribution factors into a product of two simpler distributions.
• Another form: P(x | y) = P(x) for all x, y.
• We write: X ⊥ Y.

• Example
  • N independent flips of a fair coin: the joint over all flips factors into n smaller distributions, each {H: 0.5, T: 0.5}. A table of 2^n entries reduces to n tables of size 2.
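Independence can be checked directly against the definition. A sketch that tests whether T and W are independent in the weather joint, reusing `joint` and `marginal` from the earlier sketches:

```python
p_t = marginal(joint, 0)  # P(T)
p_w = marginal(joint, 1)  # P(W)

# X ⊥ Y iff P(x, y) = P(x) P(y) for every assignment.
independent = all(abs(joint[(t, w)] - p_t[t] * p_w[w]) < 1e-9 for (t, w) in joint)
print(independent)  # False: P(hot, sun) = 0.4 but P(hot) * P(sun) = 0.3
```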
Bayesian Networks
• Problem with using full joint distribution tables as our probabilistic models:
• Unless there are only a few variables, the joint is hard to represent explicitly.

• Bayesian Networks:
• A technique for describing complex joint distributions (models) using simple, local
distributions (conditional probabilities)
• Also known as probabilistic graphical models
• Encode how variables locally influence each other. Local interactions chain together
to give global, indirect interactions

Examples

Online tool: https://detsutut.shinyapps.io/shinyDBNet/


Bayesian Networks: Semantics
• A directed, acyclic graph, one node per random variable.

• A conditional probability table (CPT) for each node:
  • A collection of distributions over X, one for each combination of the parents' values.

• Bayesian networks implicitly encode joint distributions:
  • As a product of local conditional distributions.
  • To see what probability a BN gives to a full assignment, multiply all the relevant conditionals:
    P(x1, …, xn) = Π_i P(xi | parents(Xi))
Example: The Alarm Network
Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm is the parent of JohnCalls (J) and MaryCalls (M).

B    P(B)         E    P(E)
+b   0.001        +e   0.002
-b   0.999        -e   0.998

B    E    A    P(A|B,E)
+b   +e   +a   0.95
+b   +e   -a   0.05
+b   -e   +a   0.94
+b   -e   -a   0.06
-b   +e   +a   0.29
-b   +e   -a   0.71
-b   -e   +a   0.001
-b   -e   -a   0.999

A    J    P(J|A)       A    M    P(M|A)
+a   +j   0.9          +a   +m   0.7
+a   -j   0.1          +a   -m   0.3
-a   +j   0.05         -a   +m   0.01
-a   -j   0.95         -a   -m   0.99
Estimating likelihood of variables
With the CPTs above, the probability of any full assignment is the product of the relevant conditionals. For example:
P(-b, -e, +a, +j, +m) = P(-b) P(-e) P(+a | -b, -e) P(+j | +a) P(+m | +a)
                      = 0.999 × 0.998 × 0.001 × 0.9 × 0.7 ≈ 0.000628
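A sketch of the same computation in code, with the CPTs transcribed from the tables above:

```python
P_B = {"+b": 0.001, "-b": 0.999}
P_E = {"+e": 0.002, "-e": 0.998}
P_A = {("+b", "+e", "+a"): 0.95,  ("+b", "+e", "-a"): 0.05,
       ("+b", "-e", "+a"): 0.94,  ("+b", "-e", "-a"): 0.06,
       ("-b", "+e", "+a"): 0.29,  ("-b", "+e", "-a"): 0.71,
       ("-b", "-e", "+a"): 0.001, ("-b", "-e", "-a"): 0.999}
P_J = {("+a", "+j"): 0.9, ("+a", "-j"): 0.1, ("-a", "+j"): 0.05, ("-a", "-j"): 0.95}
P_M = {("+a", "+m"): 0.7, ("+a", "-m"): 0.3, ("-a", "+m"): 0.01, ("-a", "-m"): 0.99}

def joint_prob(b, e, a, j, m):
    """One CPT entry per node, multiplied together."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

print(joint_prob("-b", "-e", "+a", "+j", "+m"))  # ≈ 0.000628
```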
Answering a general probabilistic query
• Inference by enumeration is one way to perform inference in a Bayesian Network (Bayes Net), illustrated on the alarm network (B, E, A, J, M) above.
Bayesian Networks: Inference
• Bayesian Networks
  • Implicitly encode a probability distribution
  • As a product of local conditional distributions
• Variables
  • Query variables
  • Evidence variables
  • Hidden variables
  (together, these make up all the variables)
• Inference: what do we want to estimate?
  • Some useful quantity from the joint distribution, e.g.:
  • Posterior probability
  • Most likely explanation
Inference by Enumeration: A way of answering probabilistic queries

• Setup: a distribution over query variables (Q) given evidence variables (E).
• Select the entries consistent with the evidence.
  • E.g., the alarm rang, it is rainy, the disease is present.
• Compute the joint distribution.
• Sum out (eliminate) the hidden variables (H).
• Normalize the distribution.
• Next
  • Introduce a notion called factors.
  • Understand this computation using joining and marginalization of factors.
Inference by Enumeration: Example
• Traffic Domain
• Random Variables (in a chain R → T → L)
  • R: Raining
  • T: Traffic
  • L: Late for class

P(R)          P(T | R)           P(L | T)
+r  0.1       +r  +t  0.8        +t  +l  0.3
-r  0.9       +r  -t  0.2        +t  -l  0.7
              -r  +t  0.1        -t  +l  0.1
              -r  -t  0.9        -t  -l  0.9
Inference by Enumeration as Operations on Factors
• Factors
  • A factor is a function from some set of variables to a specific value.
  • The initial factors are the conditional probability tables (one per node); in the traffic domain: P(R), P(T|R), P(L|T).
  • Select the values consistent with the evidence: if some variables are observed, restrict each factor that mentions them; the other factors are not affected. E.g., with evidence L = +l, the factor P(L|T) shrinks to its +l rows (+t +l 0.3, -t +l 0.1), while P(R) and P(T|R) are unchanged.
• Inference by enumeration, via factors, can be understood as a procedure that joins all the factors and then sums out all the hidden variables.
• Two operations, "joining" and "summing out", are defined next.
Operation I: Joining Factors
• Joining
  • Get all the factors over the joining variable.
  • Build a new factor over the union of the variables involved.
  • Computation for each entry: pointwise products.

• Example: joining on R combines P(R) and P(T|R) into P(R,T):

P(R)          P(T | R)          P(R, T)
+r  0.1       +r  +t  0.8       +r  +t  0.08
-r  0.9       +r  -t  0.2       +r  -t  0.02
              -r  +t  0.1       -r  +t  0.09
              -r  -t  0.9       -r  -t  0.81
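A minimal sketch of joining, assuming a factor is a pair (variable names, table mapping value tuples to numbers); this representation is mine, not the slides':

```python
def join_factors(f1, f2):
    """Pointwise product over the union of the two factors' variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    joint_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for a1, p1 in t1.items():
        row1 = dict(zip(vars1, a1))
        for a2, p2 in t2.items():
            row2 = dict(zip(vars2, a2))
            # Keep only pairs of rows that agree on the shared variables.
            if all(row1.get(v, row2[v]) == row2[v] for v in vars2):
                merged = {**row1, **row2}
                table[tuple(merged[v] for v in joint_vars)] = p1 * p2
    return joint_vars, table

P_R = (("R",), {("+r",): 0.1, ("-r",): 0.9})
P_T_given_R = (("R", "T"), {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                            ("-r", "+t"): 0.1, ("-r", "-t"): 0.9})
print(join_factors(P_R, P_T_given_R)[1])
# {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```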
Joining Factors
[Figure illustrating factor joining; source: AIMA Ch. 14.]
Joining Multiple Factors
Join on R: P(R) × P(T|R) → P(R,T). Then join on T: P(R,T) × P(L|T) → P(R,T,L).

P(R, T)             P(R, T, L)
+r  +t  0.08        +r  +t  +l  0.024
+r  -t  0.02        +r  +t  -l  0.056
-r  +t  0.09        +r  -t  +l  0.002
-r  -t  0.81        +r  -t  -l  0.018
                    -r  +t  +l  0.027
                    -r  +t  -l  0.063
                    -r  -t  +l  0.081
                    -r  -t  -l  0.729
Operation II: Eliminating (Summing Out) Factors
• Marginalization
  • Take a factor and sum out a variable.
  • This shrinks the factor to a smaller one.

• Example: summing out R from P(R,T):
+r  +t  0.08        +t  0.17
+r  -t  0.02   →    -t  0.83
-r  +t  0.09
-r  -t  0.81

• Example: summing out R and then T from P(R,T,L) (table above):
Sum out R → P(T, L)     Sum out T → P(L)
+t  +l  0.051           +l  0.134
+t  -l  0.119           -l  0.866
-t  +l  0.083
-t  -l  0.747
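Summing out in the same factor representation as the joining sketch above:

```python
def sum_out(var, factor):
    """Sum a variable out of a factor, shrinking it by one dimension."""
    vars_, table = factor
    i = vars_.index(var)
    new_vars = vars_[:i] + vars_[i + 1:]
    new_table = {}
    for assignment, p in table.items():
        key = assignment[:i] + assignment[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return new_vars, new_table

P_RT = (("R", "T"), {("+r", "+t"): 0.08, ("+r", "-t"): 0.02,
                     ("-r", "+t"): 0.09, ("-r", "-t"): 0.81})
print(sum_out("R", P_RT))  # (('T',), {('+t',): 0.17, ('-t',): 0.83})
```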
Inference by Enumeration
On the chain R → T → L: multiple join operations, then multiple eliminate operations.
Variable Elimination
• Inference by Enumeration
  • Problem: the whole distribution is "joined up" before the hidden variables are summed out.
• Variable Elimination
  • Interleaves joining and eliminating variables.
  • Does not create the full joint distribution in one go.
• Key Idea:
  • Pick a variable ordering; pick the next variable.
  • Join all factors containing that variable.
  • Sum the variable out of the new factor.
• Leverages the structure (topology) of the Bayesian Network.
• Marginalize early (avoid growing the full joint distribution).
Inference by Enumeration vs. Variable Elimination
On the chain R → T → L:

Inference by Enumeration       Variable Elimination
Join on R                      Join on R
Join on T                      Eliminate R
Eliminate R                    Join on T
Eliminate T                    Eliminate T
Variable Elimination
Join R: P(R) × P(T|R) → f(R,T):
+r  +t  0.08
+r  -t  0.02
-r  +t  0.09
-r  -t  0.81

Sum out R → f(T):
+t  0.17
-t  0.83

Join T: f(T) × P(L|T) → f(T,L):
+t  +l  0.051
+t  -l  0.119
-t  +l  0.083
-t  -l  0.747

Sum out T → P(L):
+l  0.134
-l  0.866
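The whole pipeline above, expressed with the `join_factors` and `sum_out` sketches from earlier (interleaving join and sum-out, never materializing the full joint):

```python
P_L_given_T = (("T", "L"), {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
                            ("-t", "+l"): 0.1, ("-t", "-l"): 0.9})

f = sum_out("R", join_factors(P_R, P_T_given_R))  # join on R, sum out R -> f(T)
f = sum_out("T", join_factors(f, P_L_given_T))    # join on T, sum out T -> P(L)
print(f)  # (('L',), {('+l',): 0.134, ('-l',): 0.866})
```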
Incorporating Evidence
• Till now, we computed P(Late). What happens for P(Late | Rain)?
• How do we incorporate evidence in Variable Elimination?
• Solution
  • If there is evidence, start with the factors and select the evidence values. E.g., with R = +r, P(R) shrinks to (+r 0.1) and P(T|R) to (+r +t 0.8, +r -t 0.2); P(L|T) does not mention R and is unaffected.
  • After selecting the evidence, eliminate all variables other than the query and the evidence.
• Evidence is incorporated in the initial factors.
Variable Elimination (VE)
• Query: P(Q | E1 = e1, …, Ek = ek)

• Start with the initial factors:
  • The local conditional probability tables.
  • Evidence (known) variables are instantiated.

• While there are still hidden variables (not Q or evidence):
  • Pick a hidden variable H (from some ordering).
  • Join all factors mentioning H.
  • Eliminate (sum out) H.

• Join all the remaining factors and normalize.
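Evidence selection plus the loop above, on the traffic query P(L | +r); a sketch reusing the earlier factor helpers (`select_evidence` is my illustrative addition):

```python
def select_evidence(factor, var, val):
    """Restrict a factor to the rows consistent with var = val."""
    vars_, table = factor
    if var not in vars_:
        return factor  # factor does not mention the evidence variable
    i = vars_.index(var)
    return vars_, {a: p for a, p in table.items() if a[i] == val}

# Query P(L | R = +r): instantiate evidence, eliminate T, normalize.
f_r = select_evidence(P_R, "R", "+r")
f_tr = select_evidence(P_T_given_R, "R", "+r")
f = sum_out("R", join_factors(f_r, f_tr))
f = sum_out("T", join_factors(f, P_L_given_T))  # unnormalized P(+r, L)
z = sum(f[1].values())
print({a: p / z for a, p in f[1].items()})      # {('+l',): 0.26, ('-l',): 0.74}
```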


Variable Elimination: Alarm Domain
Original query: P(B | +j, +m). Model factors: P(B), P(E), P(A|B,E), P(+j|A), P(+m|A).

P(B | +j, +m) ∝ P(B, +j, +m)                          (marginal can be obtained from joint by summing out)
= Σ_{e,a} P(B) P(e) P(a | B, e) P(+j | a) P(+m | a)    (use Bayes' net joint distribution expression)
= P(B) Σ_e P(e) Σ_a P(a | B, e) P(+j | a) P(+m | a)    (use x(y + z) = xy + xz)
= P(B) Σ_e P(e) f1(+j, +m | B, e)                      (joining on a, and then summing out, gives f1)
= P(B) f2(+j, +m | B)                                  (joining on e, and then summing out, gives f2)
Variable Elimination: Complexity
There are three variables to eliminate, {X1, X2, Z}; the Y variables are observed (instantiated).
Example
Variable elimination efficiently re-uses factor computations. [Figure: AIMA Ch. 14.]
Example
Computational complexity
• Depends on the largest factor generated in VE.
• Factor size = number of entries in the table.
• In this example, with the elimination ordering X1, X2, Z, X3, each factor is of size 2 (only one variable); note that y is observed.
How does variable ordering affect VE complexity?
• For the query P(Xn | y1, …, yn):
  • There are n variables, Z, X1, …, Xn-1, to eliminate.
  • We need a way to order them (for joining and eliminating).
• Consider two different orderings:
  • Eliminate Z first: Z, X1, …, Xn-1.
  • Eliminate Z last: X1, …, Xn-1, Z.
• What is the size of the maximum factor generated for each ordering?
Example: Eliminate Z First
Joining on Z couples all of X1, …, Xn; the resulting factor has size 2^n.

Example: Eliminate Z Last
Each factor generated has size 2, consisting of one variable; the other steps are like the previous example.

Variable ordering can have considerable impact.
Variable Ordering for VE
• Variable elimination is dominated by the size of the largest factor constructed during the operation of the algorithm.
• This depends on the structure of the network and the order of elimination of the variables.
• Finding the optimal ordering is intractable.
  • Can pose the problem of finding a good ordering as a search.
  • Use heuristics.
• Min-fill heuristic
  • Eliminate the variable that creates the smallest factor (greedy approach).
• Min-neighbors
  • Eliminate the variable that has the smallest number of neighbors in the current graph.
• Exercise: rank A, B and D with the Min-Fill heuristic.
Some variables may be irrelevant for VE
Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant for the query; such a variable can be eliminated (pruned) before running VE. (Illustrated on the Burglary / Earthquake / Alarm / JohnCalls / MaryCalls network.)

Slide adapted from Prof. Mausam
Bayesian Networks: Independence
• Bayesian Networks
  • Implicitly encode joint distributions
  • A collection of distributions over X, one for each combination of parents' values
  • Product of local conditional distributions
• Inference
  • Given a fixed BN, what is P(X | e)?
  • Variable Elimination
• Modeling
  • Understanding the assumptions made when choosing a Bayes net graph

A Bayesian Network Model for Diagnosis of Liver Disorders. Onisko et al., 1999.
Conditional Independence
• X and Y are independent if: P(x, y) = P(x) P(y) for all x, y.

• X and Y are conditionally independent given Z if: P(x, y | z) = P(x | z) P(y | z) for all x, y, z.

• (Conditional) independence: given Z, Y has no more information to convey about X; Y does not probabilistically influence X.

• Example: smoke causes the alarm to be triggered. Once there is smoke, it does not matter what caused it (e.g., fire or any other source).
Graph structure encodes independence relations
• Conditional independence relations in a Bayes Net can be read off the graph structure.
• Example (on the network in the figure): is X1 conditionally independent of X6 given X3 and X4?
Bayesian Network: Independence Assumptions

• Often there are additional conditional independences that are implicit in the network.
• Core idea: examine three-node networks, then chain the ideas together.
• How do we show whether two variables X and Y are conditionally independent given evidence (say Z)?
  • Yes: provide a proof by analyzing the probability expression.
  • No: find a counterexample, i.e., instantiate CPTs for the BN such that X and Y are not independent given Z.
Causal Chains
X → Y → Z (X: No Mask, Y: Covid Transmission, Z: Fever)

• Is X guaranteed to be independent of Z?
  • No.
  • Intuitively: wearing no mask causes virus transmission, which causes fever; wearing a mask causes no virus transmission, and hence no symptom.
  • The path between X and Z is active.
  • Counterexample CPT: P(+y | +x) = 1, P(-y | -x) = 1, P(+z | +y) = 1, P(-z | -y) = 1.
Causal Chains
X → Y → Z (X: No Mask, Y: Covid Transmission, Z: Fever)

• Is X guaranteed to be independent of Z given Y?
  • Yes.
  • Evidence along the chain blocks the influence (inactivates the path).
Common Cause
X ← Y → Z (Y: Covid infection; X: Fever, Z: Loss of smell)

• Is X guaranteed to be independent of Z?
  • No.
  • Intuitively: Covid infection causes both fever and loss of smell.
  • The path between X and Z is active.
  • Counterexample CPT: P(+x | +y) = 1, P(-x | -y) = 1, P(+z | +y) = 1, P(-z | -y) = 1.
Common Cause
X ← Y → Z (Y: Covid infection; X: Fever, Z: Loss of smell)

• Is X independent of Z given Y?
  • Yes.
  • If you have Covid, then your belief about the loss of smell is not affected by the presence of fever.
  • Observing the cause blocks the influence (inactivates the path).
Common Effect
X → Z ← Y (X: Covid, Y: Tuberculosis, Z: Fever)

• Are X and Y independent?
  • Yes.
  • Covid and TB both cause fever, but we can't say that if you have Covid then you are more or less likely to have TB (under this model).
Common Effect
X → Z ← Y (X: Covid, Y: Tuberculosis, Z: Fever)

• Is X independent of Y given Z?
  • No.
  • Seeing the fever puts Covid and TB in competition as possible causal explanations.
  • It is likely that one of them is the cause, and rare that both are. If Covid is present, the likelihood that TB is present drops.
  • Observing the common effect activates the influence between the possible causes.
"Explaining Away"

In words: if there are two possible causes for the observed evidence, knowing about one of the causes provides information about the other.

Example from Nicholas Roy.
Active and Inactive Paths
• Question: are X and Y conditionally independent given evidence variables {Z}?
  • Yes, if X and Y are "d-separated" by Z.
  • Consider all (undirected) paths from X to Y.
  • No active paths = independence.

• A path is active if each triple along it is active:
  • Causal chain A → B → C where B is unobserved (either direction)
  • Common cause A ← B → C where B is unobserved
  • Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed

• A path is blocked by even a single inactive segment.
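The triple rules above translate directly into a d-separation test. A compact sketch that enumerates undirected paths (fine for small graphs, exponential in the worst case); the graph encoding as a parents map is my assumption:

```python
def d_separated(parents, x, y, z):
    """True iff every undirected path from x to y is blocked given evidence z."""
    children = {n: set() for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].add(n)

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            for c in children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def triple_active(a, b, c):
        if a in parents[b] and c in parents[b]:        # common effect a -> b <- c
            return b in z or bool(descendants(b) & z)  # active iff b or a descendant observed
        return b not in z                              # chain/common cause: active iff b unobserved

    def paths(node, visited):
        if node == y:
            yield [node]
            return
        for nxt in parents[node] | children[node]:
            if nxt not in visited:
                for rest in paths(nxt, visited | {nxt}):
                    yield [node] + rest

    return not any(all(triple_active(p[i], p[i + 1], p[i + 2])
                       for i in range(len(p) - 2))
                   for p in paths(x, {x}))

# Alarm network: B and E are independent a priori, but not given the alarm A.
par = {"B": set(), "E": set(), "A": {"B", "E"}, "J": {"A"}, "M": {"A"}}
print(d_separated(par, "B", "E", set()))   # True: v-structure at A is inactive
print(d_separated(par, "B", "E", {"A"}))   # False: observing A activates it
```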


D-Separation
▪ Query: is Xi conditionally independent of Xj given {Xk1, …, Xkn}?
▪ Check all (undirected) paths between Xi and Xj:
  ▪ If one or more paths are active, then independence is not guaranteed.
  ▪ Otherwise (i.e., if all paths are inactive), then independence is guaranteed.
D-Separation: Examples
[Worked examples on small networks over variables such as R, B, T, T', L, and D; in each queried case shown, the independence holds (answer: Yes). Figures omitted.]
