Lecture 29


Reasoning under Uncertainty

Instructor: Dr. Durgesh Singh


CSE Discipline, PDPM IIITDM, Jabalpur -482005
Reasoning under uncertainty

▪ Agents in the real world need to handle uncertainty, whether due to partial observability, nondeterminism, or adversaries.
▪ An agent may never know for sure what state it is in now or where it will end up after a sequence of actions.
Nature of Uncertain Knowledge

▪ Let us try to write rules for dental diagnosis using propositional logic, so that we can see how the logical approach breaks down. Consider the following simple rule:
Toothache ⇒ Cavity
▪ The problem is that this rule is wrong.
▪ Not all patients with toothaches have cavities; some of them have gum disease, swelling, or one of several other problems:
Toothache ⇒ Cavity ∨ GumProblem ∨ Swelling ∨ …
Nature of Uncertain Knowledge

▪ In order to make the rule true, we have to add an almost unlimited list of possible problems. We could try turning the rule into a causal rule:
Cavity ⇒ Toothache
▪ But this rule is also not right; not all cavities cause pain.
▪ Toothache and Cavity are not always connected, so a judgement based on such a rule may go wrong.
Nature of Uncertain Knowledge

▪ This is typical of the medical domain, as well as most other judgmental domains: law, business, design, automobile repair, gardening, dating, and so on.
▪ The agent’s knowledge can at best provide only a degree of belief in the relevant sentences.
▪ Our main tool for dealing with degrees of belief is probability theory.
▪ A logical agent believes each sentence to be true or false or has no opinion, whereas a probabilistic agent may have a numerical degree of belief between 0 (for sentences that are certainly false) and 1 (certainly true).
Basic Probability Notation

▪ Random variables are typically divided into three kinds, depending on the type of the domain:
▪ Boolean random variables, such as Cavity, have the domain (true, false) or (1, 0).
▪ Discrete random variables take on values from a countable domain. For example, the domain of Weather might be (sunny, rainy, cloudy, snow).
▪ Continuous random variables (bounded or unbounded) take on values from the real numbers, e.g., Temp = 21.4, or propositions such as Temp < 21.4 or Temp < 1.
Atomic events or sample points
▪ Atomic event: A complete specification of the state of the world
about which the agent is uncertain
▪ E.g., if the world consists of only two Boolean variables Cavity
and Toothache, then there are 4 distinct atomic events:
Cavity = false ∧ Toothache = false
Cavity = false ∧ Toothache = true
Cavity = true ∧ Toothache = false
Cavity = true ∧ Toothache = true
▪ Atomic events are mutually exclusive and exhaustive
▪ When two events are mutually exclusive, it means they cannot both occur at
the same time.
▪ When two events are exhaustive, it means that one of them must occur.
Axioms of Probability Theory

▪ All probabilities are between 0 and 1
– 0 ≤ P(A) ≤ 1
– P(true) = 1
– P(false) = 0
▪ The probability of a disjunction is:
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Prior probability

▪ The unconditional or prior probability associated with a proposition A is the degree of belief in the absence of any other information.
▪ It is written as P(A).
▪ For example, if the prior probability that I have a cavity is 0.1, then we would write
P(Cavity = true) = 0.1 or P(cavity) = 0.1
▪ P(A) can be used only when there is no other information.
▪ As soon as some new information is known, we must reason with the conditional probability of A given that new information.
Prior probability…
▪ Sometimes, we will want to talk about the probabilities of all the
possible values of a random variable.
▪ In that case, we will use an expression such as P(Weather), which
denotes a vector of values for the probabilities of each individual state
of the weather.
▪ Instead of writing these four equations
P(Weather = sunny) = 0.7
P(Weather = rain) = 0.2
P(Weather = cloudy) = 0.08
P(Weather = snow) = 0.02
we may simply write: P(Weather) = (0.7, 0.2, 0.08, 0.02) (note that the probabilities sum to 1).
▪ This statement defines a prior probability distribution for the random
variable Weather.
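As a small illustration, such a prior distribution can be stored directly as a mapping from values to probabilities; a minimal sketch in Python, using the numbers above:

weather_prior = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}

# A well-formed distribution sums to 1 over the variable's domain.
assert abs(sum(weather_prior.values()) - 1.0) < 1e-9

print(weather_prior["cloudy"])  # P(Weather = cloudy) = 0.08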
Prior probability…
▪ Joint probability distribution for a set of random variables gives
the probability of every atomic event on those random variables
▪ P(Weather, Cavity) = a 4 × 2 matrix of values:

                 Weather = sunny   rainy   cloudy   snow
  Cavity = true           0.144    0.02    0.016    0.02
  Cavity = false          0.576    0.08    0.064    0.08

▪ A full joint distribution specifies the probability of every atomic event and is therefore a complete specification of one's uncertainty about the world in question.
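A minimal sketch of how this joint table can be represented and queried in Python (the numbers are exactly those in the table above):

# Full joint P(Weather, Cavity), keyed by (weather value, cavity value).
joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02, ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08, ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Marginalizing out Weather recovers the prior P(Cavity = true).
p_cavity = sum(p for (w, cavity), p in joint.items() if cavity)
print(p_cavity)  # 0.2 (up to floating-point rounding)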
Conditional or posterior probability

▪ The notation used is P(a | b), where a and b are any propositions. This is read as "the probability of a, given that all we know is b." For example,
P(cavity | toothache) = 0.8
“indicates that if a patient is observed to have a toothache and no other information is yet available, then the probability of the patient's having a cavity will be 0.8.”
Conditional or posterior probability

▪ Conditional probabilities can be defined in terms of unconditional probabilities:

P(a | b) = P(a ∧ b) / P(b)

which holds whenever P(b) > 0.

This equation can also be written as
P(a ∧ b) = P(a | b) P(b) (which is called the product rule)
Alternative way:
P(a ∧ b) = P(b | a) P(a)
Chain Rule/Product Rule
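Applying the product rule repeatedly gives the general chain rule (shown here in its standard form for reference):

P(X1, X2, ..., Xn) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) · ... · P(X2 | X1) · P(X1)
                   = Π_i P(Xi | X1, ..., Xi-1)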
Example

A domain consisting of just the three Boolean variables Toothache, Cavity, and Catch (the dentist's nasty steel probe catches in my tooth).
Inference Using Full Joint Distributions

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.20, i.e., 20%
Inference Using Full Joint Distributions

P(toothache ∨ cavity) = 0.20 + 0.072 + 0.008 = 0.28
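A small sketch of this kind of inference in Python. The entries below are the ones appearing in the sums above; the two remaining worlds (no cavity, no toothache) are taken as 0.144 and 0.576 so that the table sums to 1, as in the standard textbook dentist example:

# Full joint over (cavity, toothache, catch).
joint = {
    (True,  True,  True): 0.108, (True,  True,  False): 0.012,
    (True,  False, True): 0.072, (True,  False, False): 0.008,
    (False, True,  True): 0.016, (False, True,  False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def prob(event):
    """Sum the joint over all atomic events in which `event` holds."""
    return sum(p for world, p in joint.items() if event(*world))

print(prob(lambda cavity, toothache, catch: toothache))            # P(toothache) = 0.20
print(prob(lambda cavity, toothache, catch: toothache or cavity))  # P(toothache ∨ cavity) = 0.28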
Inference Using Full Joint Distributions
Problems with the joint distribution?
▪ Worst-case time: O(d^n)
▪ where d = max arity (number of values per variable)
▪ and n = number of random variables
▪ Space complexity is also O(d^n)
▪ the size of the joint distribution itself
Independence

▪ A and B are independent iff:

P(A | B) = P(A)
P(B | A) = P(B)

Therefore, if A and B are independent:

P(A | B) = P(A ∧ B) / P(B) = P(A)

P(A ∧ B) = P(A) P(B)
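In fact, the P(Weather, Cavity) table shown earlier satisfies this; a quick numerical check in Python (a sketch using those table values):

joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02, ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08, ("cloudy", False): 0.064, ("snow", False): 0.08,
}

p_cavity = sum(p for (w, cavity), p in joint.items() if cavity)
for w in ("sunny", "rainy", "cloudy", "snow"):
    p_w = joint[(w, True)] + joint[(w, False)]
    # Independence: P(Weather = w ∧ Cavity = true) = P(Weather = w) · P(Cavity = true)
    assert abs(joint[(w, True)] - p_w * p_cavity) < 1e-9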
Independence…

Complete independence is powerful but rare. What to do if it doesn’t hold?
Conditional Independence

▪ The general definition of conditional independence of two variables X and Y, given a third variable Z, is

(I) P(X, Y | Z) = P(X | Z) P(Y | Z)

or, equivalently,

(II) P(X | Y, Z) = P(X | Z) and P(Y | X, Z) = P(Y | Z)
Conditional Independence II

P(catch | toothache, cavity) = P(catch | cavity)

P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
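Using the same dentist joint as before, this pair of identities can be verified numerically; a minimal sketch (values as in the textbook table used above):

joint = {
    (True,  True,  True): 0.108, (True,  True,  False): 0.012,
    (True,  False, True): 0.072, (True,  False, False): 0.008,
    (False, True,  True): 0.016, (False, True,  False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def prob(event):
    return sum(p for world, p in joint.items() if event(*world))

for cav in (True, False):
    # P(catch | toothache, Cavity = cav)
    lhs = prob(lambda c, t, k: k and t and c == cav) / prob(lambda c, t, k: t and c == cav)
    # P(catch | Cavity = cav)
    rhs = prob(lambda c, t, k: k and c == cav) / prob(lambda c, t, k: c == cav)
    print(cav, round(lhs, 3), round(rhs, 3))  # 0.9 and 0.9, then 0.2 and 0.2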
Power of Cond. Independence

▪ Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear!!

▪ Conditional independence is the most basic and robust form of knowledge about uncertain environments.
Bayes Rule
P(H | E) = P(E | H) P(H) / P(E)

Simple proof from the definition of conditional probability:

(1) P(H | E) = P(H ∧ E) / P(E)    (def. of cond. prob.)
(2) P(E | H) = P(H ∧ E) / P(H)    (def. of cond. prob.)
(3) P(H ∧ E) = P(E | H) P(H)      (multiply (2) by P(H))

Substituting (3) into (1):
P(H | E) = P(E | H) P(H) / P(E)
Use to Compute Diagnostic Probability from Causal Probability

E.g., let M be meningitis and S be stiff neck, with

P(M) = 0.0001,
P(S) = 0.1,
P(S | M) = 0.8

P(M | S) = P(S | M) P(M) / P(S) = 0.8 × 0.0001 / 0.1 = 0.0008
Bayes Rule

▪ Does the patient have cancer or not?

Given: A patient takes a lab test, and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.
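A worked sketch of the answer by Bayes' rule (here "pos" denotes a positive test result; the numbers are the ones given above):

p_cancer = 0.008
p_pos_given_cancer = 0.98        # correct positive when the disease is present
p_neg_given_no_cancer = 0.97     # correct negative when the disease is absent
p_pos_given_no_cancer = 1 - p_neg_given_no_cancer

# Total probability of a positive result (law of total probability).
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)

# Bayes' rule: P(cancer | pos) = P(pos | cancer) P(cancer) / P(pos)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))  # ≈ 0.209, so cancer is still unlikely after one positive test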
Bayesian Networks

▪ In general, a joint distribution over a set of variables (X1, X2, ..., Xn) requires exponential space for representation and inference.
▪ We also saw that independence and conditional independence relationships among variables can greatly reduce the number of probabilities that need to be specified in order to define the full joint distribution.
▪ A Bayesian network (BN), a graphical representation, is a data structure that
▪ represents the dependencies among variables and
▪ gives a concise specification of any full joint probability distribution.
Chain rule in Bayesian Networks

▪ The general assertion is that, for every variable Xi in the Bayesian network, the full joint distribution factors as

P(x1, ..., xn) = Π_i P(xi | parents(Xi))

i.e., each entry of the joint distribution is a product of the appropriate conditional probabilities from the network.
Bayes Networks

▪ A Bayesian network is a directed graph in which each node is annotated with quantitative probability information.
▪ The full specification is as follows:
1. Each node corresponds to a random variable, which may be discrete or continuous.
2. Directed links or arrows connect pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y.
3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
4. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
Example

Topology of the network encodes conditional independence assertions:
Example: Burglar Alarm

▪ You have a new burglar alarm installed at home.
▪ It is reliable at detecting a burglary, but also responds on occasion to minor earthquakes.
▪ You also have two neighbors, John and Mary, who have promised to call you at work when they hear the alarm.
▪ John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too.
▪ Mary, on the other hand, likes loud music and sometimes misses the alarm altogether.
Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.
Example: Burglar Alarm

Earthquake   Burglary

      Alarm

JohnCalls   MaryCalls
Example Bayes Net: Burglar Alarm
▪ Notice that the network does not have nodes corresponding to
▪ Mary’s currently listening to loud music or
▪ the telephone ringing and confusing John.
▪ These factors are summarized in the uncertainty associated with the links from Alarm to JohnCalls and MaryCalls.
Conditional probability table, or CPT

▪ Each row in a CPT contains the conditional probability of each node value for a conditioning case.
▪ A conditioning case is just a possible combination of values for the parent nodes.
▪ Each row must sum to 1.
▪ For Boolean variables, once you know that the probability of a true value is p, the probability of false must be 1 − p, so we often omit the second number.
▪ In general, a table for a Boolean variable with k Boolean parents contains 2^k independently specifiable probabilities.
▪ A node with no parents has only one row, representing the prior probabilities of each possible value of the variable.
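As a small illustration, a CPT for a Boolean node with two Boolean parents can be stored as a mapping from conditioning cases to P(true); a sketch in Python for an Alarm node with parents Burglary and Earthquake, where the numbers are illustrative placeholders rather than values given on this slide:

# CPT for Alarm with parents (Burglary, Earthquake); each row stores P(Alarm = true).
cpt_alarm = {
    (True,  True):  0.95,   # burglary, earthquake        (illustrative values)
    (True,  False): 0.94,   # burglary, no earthquake
    (False, True):  0.29,   # no burglary, earthquake
    (False, False): 0.001,  # no burglary, no earthquake
}

def p_alarm(alarm, burglary, earthquake):
    """P(Alarm = alarm | Burglary, Earthquake); the false case is 1 - p."""
    p_true = cpt_alarm[(burglary, earthquake)]
    return p_true if alarm else 1.0 - p_true

With k = 2 Boolean parents, the table has 2^k = 4 independently specifiable numbers, as stated above.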
Syntax of BNs

▪ a set of nodes, one per random variable
▪ a directed, acyclic graph (link ≈ "directly influences")
▪ a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
▪ For discrete variables, a conditional probability table (CPT) = distribution over Xi for each combination of parent values
Example Bayes Net: Burglar Alarm
Burglar Alarm Example …

Earthquake   Burglary

      Alarm

JohnCalls   MaryCalls

▪ If I know whether Alarm is true, no other evidence influences my degree of belief in JohnCalls
▪ P(J | M, A, E, B) = P(J | A)
▪ also: P(M | J, A, E, B) = P(M | A) and P(E | B) = P(E)
▪ By the chain rule we have
P(J, M, A, E, B) = P(J | M, A, E, B) · P(M | A, E, B) · P(A | E, B) · P(E | B) · P(B)
= P(J | A) · P(M | A) · P(A | B, E) · P(E) · P(B)
▪ The full joint therefore requires only 10 parameters: 1 for P(B), 1 for P(E), 4 for P(A | B, E), and 2 each for P(J | A) and P(M | A)
BNs: Qualitative Structure

▪ Graphical structure of a BN reflects conditional independence among variables
▪ Each variable X is a node in the DAG
▪ Edges denote direct probabilistic influence
▪ parents of X are denoted Par(X)
▪ Each variable X is conditionally independent of all non-descendants, given its parents.
▪ A graphical test exists for more general independence
▪ “Markov Blanket”
Given Parents, X is Independent of Non-Descendants

Fig: A node X is conditionally independent of its non-descendants (e.g., the Zij’s) given its parents (the Uis, shown in the gray area).
For Example

Earthquake   Burglary

      Alarm

JohnCalls   MaryCalls
Example
Given Markov Blanket, X is Independent of All Other Nodes

MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X))


Example
Conditional Probability Tables

Fig: Bayes net with nodes Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls (Earthquake and Burglary are parents of Alarm; Alarm is the parent of Nbr1Calls and Nbr2Calls), with a CPT at each node.

Pr(B=t)  Pr(B=f)
 0.05     0.95

Pr(A | E, B)   (the value in parentheses is Pr(¬A | E, B)):
 e,  b     0.9   (0.1)
 e,  ¬b    0.2   (0.8)
 ¬e, b     0.85  (0.15)
 ¬e, ¬b    0.01  (0.99)
Conditional Probability Tables
▪ For a complete specification of the joint distribution, quantify the BN
▪ For each variable X, specify the CPT: P(X | Par(X))
▪ number of parameters is locally exponential in |Par(X)|
▪ If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) · ... · P(X2 | X1) · P(X1)
= P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) · ... · P(X1)
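A minimal sketch of this factorization for the burglar-alarm network. The CPT values below are illustrative textbook-style numbers, not values given on these slides:

# CPTs for B → A ← E, A → J, A → M (illustrative values only).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls = true | Alarm)

def bern(p_true, value):
    """Probability of a Boolean `value` when P(true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(j, m, a, b, e):
    """P(J, M, A, B, E) = P(J | A) P(M | A) P(A | B, E) P(B) P(E)."""
    return (bern(P_J[a], j) * bern(P_M[a], m)
            * bern(P_A[(b, e)], a) * bern(P_B, b) * bern(P_E, e))

# e.g. the alarm sounds and both neighbors call, but there is no burglary or earthquake:
print(joint(True, True, True, False, False))  # ≈ 0.00063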
Exact Inference in BNs

▪ The graphical independence representation yields efficient inference schemes
▪ We generally want to compute
▪ Marginal probability: Pr(Z), or
▪ Pr(Z | E), where E is (conjunctive) evidence
▪ Z: query variable(s)
▪ E: evidence variable(s)
▪ everything else: hidden variables
▪ One simple algorithm: inference by enumeration with variable elimination (VE)
Inference in BNs

▪ Let E be the list of evidence variables, let e be the list of observed values for them, and let Y be the remaining unobserved variables (hidden variables). The query P(X | e) can be evaluated as

P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

where α is a normalizing constant and the summation is over all possible y's (i.e., all possible combinations of values of the unobserved variables Y).
▪ Now, a Bayes net gives a complete representation of the full joint distribution.
▪ Therefore, a query can be answered using a Bayes net by computing sums of products of conditional probabilities from the network.
Example: P(B | J=true, M=true)

Earthquake Burglary

Alarm

John Mary

P(B|j,m) = P(B) P(E) P(A|B,E)P(j|A)P(m|A)


E A
Burglar Alarm Example …

Earthquake   Burglary

      Alarm

JohnCalls   MaryCalls
Inference by Enumeration

Dynamic Programming
Variable Elimination

▪ A factor is a function from some set of variables into a specific value: e.g., f(E, A, B)
▪ CPTs are factors, e.g., P(A | E, B) is a function of A, E, B
▪ VE works by eliminating all variables in turn until there is a factor with only the query variable
▪ To eliminate a variable:
▪ join all factors containing that variable (like a database join)
▪ sum out the influence of the variable on the new factor
Example of VE: P(J)

P(J)
= Σ_{M,A,B,E} P(J, M, A, B, E)
= Σ_{M,A,B,E} P(J | A) P(M | A) P(B) P(A | B, E) P(E)
= Σ_A P(J | A) Σ_M P(M | A) Σ_B P(B) Σ_E P(A | B, E) P(E)
= Σ_A P(J | A) Σ_M P(M | A) Σ_B P(B) f1(A, B)
= Σ_A P(J | A) Σ_M P(M | A) f2(A)
= Σ_A P(J | A) f3(A)
= f4(J)

(Network: Earthquake, Burglary → Alarm → JohnCalls, MaryCalls)
Example: P(B | J=true, M=true) using VE
Example: Traffic Domain

▪ Random Variables
▪ R: Raining
▪ T: Traffic
▪ L: Late for class!

Network: R → T → L

P(R):
 +r  0.1
 -r  0.9

P(T | R):
 +r +t  0.8
 +r -t  0.2
 -r +t  0.1
 -r -t  0.9

P(L | T):
 +t +l  0.3
 +t -l  0.7
 -t +l  0.1
 -t -l  0.9
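A small sketch of eliminating R and then T to obtain P(L) for this chain, using the CPT values above:

P_R = {"+r": 0.1, "-r": 0.9}
P_T = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2, ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}
P_L = {("+t", "+l"): 0.3, ("+t", "-l"): 0.7, ("-t", "+l"): 0.1, ("-t", "-l"): 0.9}

# Eliminate R: f1(T) = sum_R P(R) P(T | R)
f1 = {t: sum(P_R[r] * P_T[(r, t)] for r in P_R) for t in ("+t", "-t")}

# Eliminate T: P(L) = sum_T f1(T) P(L | T)
p_late = {l: sum(f1[t] * P_L[(t, l)] for t in ("+t", "-t")) for l in ("+l", "-l")}

print(f1)      # ≈ {'+t': 0.17, '-t': 0.83}
print(p_late)  # ≈ {'+l': 0.134, '-l': 0.866}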
