Uncertainty Presented

Section 3:
Reasoning Under Uncertainty
With slides from Dan Klein and Pieter Abbeel

ExpectiMax: What Probabilities to Use?
• In Expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
– Model could be a simple uniform distribution (roll a die)
– Model could be sophisticated and require a great deal of computation
– We have a chance node for any outcome out of our control: opponent or environment
– Model might say that adversarial actions are more likely!
• For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes
Having a probabilistic belief about another agent’s action does not mean that the agent is flipping any coins!
2/10/19 Artificial Intelligence, Fall 2018 2
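The chance-node idea can be sketched as a short recursive function. This is a minimal sketch, assuming a hypothetical tree encoding (plain numbers for terminal utilities, `("max", children)` for our moves, `("chance", [(probability, child), ...])` for the opponent or environment); as the slide assumes, the probabilities at each chance node are simply given.

```python
from typing import Tuple, Union

# Hypothetical tree encoding, chosen for illustration only:
#   a number                          -> terminal utility
#   ("max", [child, ...])             -> our move: pick the best child
#   ("chance", [(prob, child), ...])  -> opponent/environment: average over outcomes
Tree = Union[float, Tuple[str, list]]


def expectimax(node: Tree) -> float:
    if isinstance(node, (int, float)):
        return float(node)  # terminal utility
    kind, children = node
    if kind == "max":
        return max(expectimax(child) for child in children)
    if kind == "chance":
        # Expected utility under the chance node's (given) outcome distribution.
        return sum(p * expectimax(child) for p, child in children)
    raise ValueError(f"unknown node type: {kind!r}")


tree = ("max", [
    ("chance", [(0.5, 8), (0.5, 24)]),     # expected utility 16
    ("chance", [(0.9, 10), (0.1, -100)]),  # expected utility -1
])
print(expectimax(tree))  # 16.0
```

The max node prefers the first chance node even though the second has the larger "most likely" outcome, because expectimax compares averages, not best or most probable cases.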

Quiz: Informed Probabilities
• Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
• Question: What tree search should you use?
§ Answer: Expectimax!
§ To figure out EACH chance node’s probabilities, you have to run a simulation of your opponent
§ This kind of thing gets very slow very quickly
§ Even worse if you have to simulate your opponent simulating you…
§ … except for minimax, which has the nice property that it all collapses into one game tree
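For the quiz, the opponent model turns directly into chance-node probabilities: with probability 0.8 the opponent plays its depth 2 minimax move, otherwise it picks a move at random. The sketch below assumes "randomly" means uniformly over the legal moves and that the minimax move has already been obtained by simulating the opponent; `opponent_distribution` is a hypothetical helper, not part of any course code.

```python
def opponent_distribution(actions, minimax_action, p_minimax=0.8):
    """Chance-node probabilities for an opponent that follows its minimax
    choice with probability p_minimax and otherwise moves uniformly at random
    (the uniform part is an assumption about what "randomly" means)."""
    uniform_share = (1.0 - p_minimax) / len(actions)
    return {
        a: uniform_share + (p_minimax if a == minimax_action else 0.0)
        for a in actions
    }


dist = opponent_distribution(["left", "right"], minimax_action="left")
# With two legal moves: left is about 0.9, right about 0.1
print(dist)
```

The expensive part, which this sketch hides, is computing `minimax_action` at every chance node; that is the opponent simulation the slide warns gets slow.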

Modeling Assumptions

Dealing with Uncertainty
§ The robot can handle uncertainty in an obstacle position by representing the set of all positions of the obstacle that the robot thinks possible at each time (belief state)
§ For example, this set can be a disc whose radius grows linearly with time
[Figure: initial set of possible positions at t = 0, and the growing sets of possible positions at t = T and t = 2T]
The robot must plan to be outside this disc at time t = T
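A minimal sketch of the safety test implied by the growing disc, assuming the obstacle was last seen inside a disc of radius `r0` around a known center and moves no faster than a hypothetical maximum speed `v_max`, so the radius of possible positions grows linearly as r(t) = r0 + v_max · t. The function names are illustrative only.

```python
import math


def possible_radius(t, r0, v_max):
    """Radius of the disc of obstacle positions considered possible at time t
    (assumes a known maximum obstacle speed v_max)."""
    return r0 + v_max * t


def is_outside_disc(robot_xy, obstacle_center, t, r0, v_max):
    """True if the robot's planned position at time t lies strictly outside
    the disc of possible obstacle positions (the belief state)."""
    return math.dist(robot_xy, obstacle_center) > possible_radius(t, r0, v_max)


# Obstacle last seen within radius 1 of (0, 0), moving at most 0.5 units per time step.
print(is_outside_disc((4, 0), (0, 0), t=4, r0=1, v_max=0.5))  # True  (4 > 3)
print(is_outside_disc((2, 2), (0, 0), t=4, r0=1, v_max=0.5))  # False (about 2.83 < 3)
```

`math.dist` requires Python 3.8 or later.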
Imperfect Observation of the World
Observation of the world can be:
§ Partial, e.g., a vision sensor can’t see through obstacles (lack of percepts)
[Figure: two rooms, R1 and R2]
The robot may not know whether there is dust in room R2

Definition: Belief State
§ In the presence of non-deterministic sensory uncertainty, an agent’s belief state represents all the states of the world that it thinks are possible at a given time or at a given stage of reasoning
§ In the probabilistic model of uncertainty, a probability is associated with each state to measure its likelihood to be the actual state
[Figure: four possible states with probabilities 0.2, 0.3, 0.4, 0.1]

What do probabilities mean?
§ Probabilities have a natural frequency interpretation
§ The agent believes that if it were able to return many times to a situation where it has the same belief state, then the actual states in this situation would occur at a relative frequency defined by the probabilistic distribution
[Figure: four possible states with probabilities 0.2, 0.3, 0.4, 0.1; the state with probability 0.2 would occur 20% of the time]
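The frequency interpretation can be checked by simulation. This sketch assumes a belief state over four hypothetical states `s1` to `s4` with the probabilities shown on the slide (0.2, 0.3, 0.4, 0.1); repeatedly sampling the "actual" state should give relative frequencies close to those probabilities.

```python
import random
from collections import Counter

# Hypothetical state names; probabilities taken from the slide.
belief = {"s1": 0.2, "s2": 0.3, "s3": 0.4, "s4": 0.1}


def empirical_frequencies(belief, n, seed=0):
    """Sample the actual state n times from the belief state and
    return the relative frequency of each state."""
    rng = random.Random(seed)
    states = list(belief)
    draws = rng.choices(states, weights=[belief[s] for s in states], k=n)
    counts = Counter(draws)
    return {s: counts[s] / n for s in states}


freqs = empirical_frequencies(belief, n=100_000)
print(freqs)  # each value close to its probability, e.g. s1 near 0.2
```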

Belief State: Example
§ Consider a world where a dentist agent D meets a new patient P
§ D is interested in only one thing: whether P has a cavity, which D
models using the proposition Cavity
§ Before making any observation, D’s belief state is:
Cavity ¬ Cavity
p 1-p
§ This means that D believes that a fraction p of patients have
cavities

Where do probabilities come from?
§ Frequencies observed in the past, e.g., by the agent, its designer, or others
§ Symmetries, e.g.:
• If I roll a die, each of the 6 outcomes has probability 1/6
§ Subjectivism, e.g.:
• If I drive on Highway 280 at 120 mph, I will get a speeding ticket with probability 0.6
• Principle of indifference: If there is no knowledge to consider one
possibility more probable than another, give them the same probability

Pacman: Ghost position is uncertain
• A ghost is in the grid somewhere
• Sensor readings tell how close a square is to the ghost
– On the ghost: red
– 1 or 2 away: orange
– 3 or 4 away: yellow
– 5+ away: green
§ Sensors are noisy, but we know P(Color | Distance)

P(red | 3) P(orange | 3) P(yellow | 3) P(green | 3)
0.05 0.15 0.5 0.3

Pacman Uncertainty: 2
• General situation:
– Observed variables (evidence): Agent knows certain
things about the state of the world (e.g., sensor
readings or symptoms)
– Unobserved variables: Agent needs to reason about
other aspects (e.g. where an object is or what
disease is present)
– Model: Agent knows something about how the
known variables relate to the unknown variables
• Probabilistic reasoning gives us a framework for managing our beliefs and knowledge

Random Variables
• A random variable is some aspect of the world about which we (may) have uncertainty
– R = Is it raining?
– T = Is it hot or cold?
– D = How long will it take to drive to work?
– L = Where is the ghost?
• We denote random variables with capital letters
• Random variables have domains
– R in {true, false} (often write as {+r, -r})
– T in {hot, cold}
– D in [0, ∞)
– L in possible locations, maybe {(0,0), (0,1), …}

Probability Distributions
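• Unobserved random variables have distributions
T P
hot 0.5
cold 0.5
W P
sun 0.6
rain 0.1
fog 0.3
meteor 0.0
• A distribution is a TABLE of probabilities of values
• A probability (lower case value) is a single number, e.g., $P(W = \text{rain}) = 0.1$
• Must have: $\forall x\; P(X = x) \ge 0$ and $\sum_x P(X = x) = 1$
• Shorthand notation: $P(\text{hot}) = P(T = \text{hot})$, $P(\text{rain}) = P(W = \text{rain})$, … OK if all domain entries are unique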
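The requirements that every probability is nonnegative and that they sum to one can be checked mechanically. A minimal sketch, storing a distribution as a Python dict from value to probability (a representation chosen here for illustration) and allowing a small floating-point tolerance on the sum:

```python
import math


def is_distribution(table, tol=1e-9):
    """Check P(X=x) >= 0 for every x and that the probabilities sum to 1."""
    nonnegative = all(p >= 0 for p in table.values())
    sums_to_one = math.isclose(sum(table.values()), 1.0, abs_tol=tol)
    return nonnegative and sums_to_one


P_T = {"hot": 0.5, "cold": 0.5}
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3, "meteor": 0.0}
print(is_distribution(P_T), is_distribution(P_W))  # True True
print(is_distribution({"sun": 0.7, "rain": 0.7}))  # False: sums to 1.4
```

The tolerance matters: adding 0.6, 0.1 and 0.3 in floating point need not give exactly 1.0.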

Joint Distributions
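• A joint distribution over a set of random variables $X_1, X_2, \ldots, X_n$ specifies a real number for each assignment (or outcome): $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$, written $P(x_1, x_2, \ldots, x_n)$
– Must obey: $P(x_1, x_2, \ldots, x_n) \ge 0$ and $\sum_{(x_1, x_2, \ldots, x_n)} P(x_1, x_2, \ldots, x_n) = 1$
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
• Size of distribution of n variables with domain sizes d?
– O(size) = ?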
– For all but the smallest distributions, impractical to write out!

Marginal Distributions
• Marginal distributions are sub-tables which eliminate variables
• Marginalization (summing out): Combine collapsed rows by adding
Joint distribution P(T, W):
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
$P(t) = \sum_w P(t, w)$:
T P
hot 0.5
cold 0.5
$P(w) = \sum_t P(t, w)$:
W P
sun 0.6
rain 0.4
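Summing out can be written as a short loop over the joint table. A minimal sketch, assuming the joint is stored as a dict keyed by outcome tuples `(t, w)`; `marginal` is an illustrative helper that keeps one position of the tuple and adds up the collapsed rows.

```python
from collections import defaultdict

# Joint distribution P(T, W) from the slide, keyed by (t, w).
joint_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def marginal(joint, keep):
    """Sum out every variable except the one at tuple position `keep`."""
    result = defaultdict(float)
    for outcome, p in joint.items():
        result[outcome[keep]] += p  # collapsed rows are combined by adding
    return dict(result)


P_T = marginal(joint_TW, keep=0)  # hot about 0.5, cold about 0.5
P_W = marginal(joint_TW, keep=1)  # sun about 0.6, rain about 0.4
print(P_T, P_W)
```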

Exercise: Marginal Distributions
X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1
X P
+x ?
-x ?
Y P
+y ?
-y ?

Probabilistic Models
• A probabilistic model is a joint distribution over a set of random variables
• Probabilistic models:
– (Random) variables with domains
– Assignments are called outcomes
– Joint distributions: say whether assignments (outcomes) are likely
– Normalized: sum to 1.0
– Ideally: only certain variables directly interact
Distribution over T,W
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Conditional Probabilities
• A simple relation between joint and conditional probabilities: $P(a \mid b) = \frac{P(a, b)}{P(b)}$
– In fact, this is taken as the definition of a conditional probability
[Figure: Venn diagram of events a and b; their overlap is P(a, b)]
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
P(hot | sun) = ?   P(sun | cold) = ?
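The definition translates directly into code: sum the joint over outcomes where both events hold, then divide by the probability of the conditioning event. In this sketch events are represented as predicates on `(t, w)` outcome tuples (an illustrative choice); as an example it computes P(rain | hot) = 0.1 / 0.5 = 0.2.

```python
# Joint distribution P(T, W) from the slide, keyed by (t, w).
joint_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def conditional(joint, a, b):
    """P(a | b) = P(a, b) / P(b), with events given as predicates on outcomes."""
    p_ab = sum(p for outcome, p in joint.items() if a(outcome) and b(outcome))
    p_b = sum(p for outcome, p in joint.items() if b(outcome))
    return p_ab / p_b


p_rain_given_hot = conditional(
    joint_TW,
    a=lambda o: o[1] == "rain",
    b=lambda o: o[0] == "hot",
)
print(p_rain_given_hot)  # approximately 0.2
```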

Conditional Distributions
• Conditional distributions are probability distributions over some variables given fixed values of others
Joint distribution P(T, W):
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
P(W | T = hot):
W P
sun 0.8
rain 0.2
P(W | T = cold):
W P
sun 0.4
rain 0.6

Normalization Trick
Joint distribution:
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
• SELECT the joint probabilities matching the evidence (here, T = cold):
T W P
cold sun 0.2
cold rain 0.3
• NORMALIZE the selection (make it sum to one):
W P
sun 0.4
rain 0.6

• Why does this work? The sum of the selection is P(evidence)! (Here, P(T = cold).)
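Selection followed by normalization takes only a few lines. A sketch for P(W | T = cold), assuming the joint is stored as a dict keyed by `(t, w)` tuples (an illustrative representation); the normalizing constant is exactly the sum of the selected rows, P(T = cold).

```python
# Joint distribution P(T, W) from the slide, keyed by (t, w).
joint_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# SELECT: keep the rows consistent with the evidence T = cold.
selected = {w: p for (t, w), p in joint_TW.items() if t == "cold"}

# The sum of the selection is P(evidence) = P(T = cold).
p_evidence = sum(selected.values())

# NORMALIZE: divide by P(evidence) so the entries sum to one.
P_W_given_cold = {w: p / p_evidence for w, p in selected.items()}
print(p_evidence, P_W_given_cold)  # about 0.5, then sun about 0.4, rain about 0.6
```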

Exercise: Selection & Normalization
• P(X | Y = -y) ?
X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1
– SELECT the joint probabilities matching the evidence, then NORMALIZE the selection (make it sum to one)

To Normalize
• (Dictionary) To bring or restore to a normal condition
– All entries sum to ONE
• Procedure:
– Step 1: Compute Z = sum over all entries
– Step 2: Divide every entry by Z
• Example 1 (Z = 0.5)
W P
sun 0.2
rain 0.3
Normalized:
W P
sun 0.4
rain 0.6
• Example 2 (Z = 50)
T W P
hot sun 20
hot rain 5
cold sun 10
cold rain 15
Normalized:
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
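The two-step procedure as code, applied to Example 2. A minimal sketch; the `normalize` name and the guard against an all-zero table are illustrative additions, not part of the slide.

```python
def normalize(table):
    """Step 1: Z = sum of all entries. Step 2: divide every entry by Z."""
    z = sum(table.values())
    if z == 0:
        raise ValueError("cannot normalize a table whose entries sum to zero")
    return {key: value / z for key, value in table.items()}


counts = {
    ("hot", "sun"): 20,
    ("hot", "rain"): 5,
    ("cold", "sun"): 10,
    ("cold", "rain"): 15,
}
print(normalize(counts))  # Z = 50: entries about 0.4, 0.1, 0.2, 0.3
```

The same function handles Example 1, since it never assumes the entries are counts or probabilities.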

Probabilistic Inference
• Probabilistic inference: compute a desired probability from other known probabilities (evidence)
• We generally compute conditional probabilities
– P(on time | no reported accidents) = 0.90
– These represent the agent’s beliefs given the evidence
• Probabilities change with new evidence:
– P(on time | no accidents, 5 a.m.) = 0.95
– P(on time | no accidents, 5 a.m., raining) = 0.80
– Observing new evidence causes beliefs to be updated

Inference 1: The Product Rule
• Example:
P(W):
W P
sun 0.8
rain 0.2
P(D | W):
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3
P(D, W) = P(D | W) P(W):
D W P
wet sun 0.08
dry sun 0.72
wet rain 0.14
dry rain 0.06

The Product Rule
• Sometimes we are given conditional distributions but want the joint: $P(y)\,P(x \mid y) = P(x, y) \iff P(x \mid y) = \frac{P(x, y)}{P(y)}$
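The product rule rebuilds the joint from a marginal and a conditional. This sketch uses the numbers from the example (P(W) and P(D | W)) stored as dicts, a representation chosen for illustration, and multiplies P(d | w) · P(w) for every row.

```python
# P(W) and P(D | W) from the example, keyed by value or by (d, w).
P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {
    ("wet", "sun"): 0.1,
    ("dry", "sun"): 0.9,
    ("wet", "rain"): 0.7,
    ("dry", "rain"): 0.3,
}

# Product rule: P(d, w) = P(d | w) * P(w)
joint_DW = {(d, w): p * P_W[w] for (d, w), p in P_D_given_W.items()}
print(joint_DW)  # wet/sun about 0.08, dry/sun 0.72, wet/rain 0.14, dry/rain 0.06
```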

The Chain Rule
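• More generally, we can always write any joint distribution as an incremental product of conditional distributions:
$P(x_1, x_2, x_3) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2)$
$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})$
• Why is this true?
– Recursive decomposition using the product rule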
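A quick numerical check of the chain rule: build an arbitrary joint over three binary variables (random, made-up numbers, purely for illustration), compute each conditional as a ratio of marginals, and confirm that their product reproduces every joint entry.

```python
import itertools
import math
import random

rng = random.Random(0)
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [rng.random() for _ in outcomes]
total = sum(weights)
# Made-up joint P(X1, X2, X3), normalized to sum to one (illustration only).
joint = {o: w / total for o, w in zip(outcomes, weights)}


def prefix_prob(prefix):
    """P(X1 = prefix[0], ..., Xk = prefix[k-1]), summing out the rest.
    The empty prefix has probability 1 (sum of the whole joint)."""
    return sum(p for o, p in joint.items() if o[:len(prefix)] == prefix)


def chain_rule(outcome):
    """P(x1) * P(x2 | x1) * P(x3 | x1, x2), each factor a ratio of marginals."""
    result = 1.0
    for i in range(len(outcome)):
        result *= prefix_prob(outcome[:i + 1]) / prefix_prob(outcome[:i])
    return result


print(all(math.isclose(chain_rule(o), joint[o]) for o in outcomes))  # True
```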

Inference 2: Bayes’ Rule
• Two ways to factor a joint distribution over two variables:
$P(x, y) = P(x \mid y)\,P(y) = P(y \mid x)\,P(x)$
[Figure: "That’s my rule!"]
• Dividing, we get:
$P(x \mid y) = \frac{P(y \mid x)}{P(y)}\,P(x)$
• Why is this at all helpful?
– Lets us build one conditional from its reverse
– Often one conditional is tricky but the other one is simple
– Foundation of many AI systems we’ll see later (e.g., ASR, MT, POS, …)
• In the running for most important AI, ML, DM equation!

Inference with Bayes’ Rule
• Example: Diagnostic probability from causal probability:
$P(\text{cause} \mid \text{effect}) = \frac{P(\text{effect} \mid \text{cause})\,P(\text{cause})}{P(\text{effect})}$
• Example:
– M: meningitis, S: stiff neck
– Example givens: $P(+m) = 0.0001$, $P(+s \mid +m) = 0.8$, $P(+s \mid -m) = 0.01$
$P(+m \mid +s) = \frac{P(+s \mid +m)\,P(+m)}{P(+s)} = \frac{P(+s \mid +m)\,P(+m)}{P(+s \mid +m)\,P(+m) + P(+s \mid -m)\,P(-m)} = \frac{0.8 \times 0.0001}{0.8 \times 0.0001 + 0.01 \times 0.9999} \approx 0.0079$
– Note: posterior probability of meningitis still very small
– Note: you should still get stiff necks checked out! Why?
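The same calculation in a few lines, expanding P(+s) by summing over both values of M, as in the slide’s denominator. The variable names are illustrative.

```python
p_m = 0.0001            # P(+m)
p_s_given_m = 0.8       # P(+s | +m)
p_s_given_not_m = 0.01  # P(+s | -m)

# P(+s) = P(+s | +m) P(+m) + P(+s | -m) P(-m)
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)

# Bayes' rule: P(+m | +s) = P(+s | +m) P(+m) / P(+s)
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 4))  # 0.0079
```

The posterior is about 80 times the prior, yet still below 1%, because false alarms from the 99.99% of patients without meningitis dominate P(+s).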

Ghostbusters, Revisited
• Let’s say we have two distributions:
– Prior distribution over ghost location: P(G)
• Let’s say this is uniform
– Sensor reading model: P(R | G)
• Given: we know what our sensors do
• R = reading color measured at (1,1)
• E.g. P(R = yellow | G = (1,1)) = 0.1
• We can calculate the posterior distribution P(G | r) over ghost locations given a reading using Bayes’ rule:
$P(g \mid r) \propto P(r \mid g)\,P(g)$
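The proportionality P(g | r) ∝ P(r | g) P(g) becomes: multiply prior by likelihood for every location, then normalize. In this sketch the grid is a hypothetical 2×2 board, and apart from P(yellow | G = (1,1)) = 0.1 from the slide, the likelihood values are made up for illustration.

```python
def posterior(prior, likelihood):
    """P(g | r) ∝ P(r | g) P(g), normalized over all ghost locations g.
    `likelihood[g]` holds P(r | g) for the observed reading r."""
    unnormalized = {g: likelihood[g] * p for g, p in prior.items()}
    z = sum(unnormalized.values())
    return {g: v / z for g, v in unnormalized.items()}


cells = [(1, 1), (1, 2), (2, 1), (2, 2)]   # hypothetical 2x2 grid
prior = {g: 1 / len(cells) for g in cells}  # uniform P(G)
# P(R = yellow | G = g): 0.1 at (1,1) from the slide; the other values are made up.
p_yellow = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.3, (2, 2): 0.5}

post = posterior(prior, p_yellow)
print(post)  # with a uniform prior, the posterior is proportional to the likelihood
```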

Hands-on Example: Ghost Localization
• Setup:
– Prior distribution over ghost location: P(G) = uniform
– R = reading color measured at (1,1) = yellow
– Sensor model at distance 3: P(red | 3) = 0.05, P(orange | 3) = 0.15, P(yellow | 3) = 0.5, P(green | 3) = 0.3
• What is the probability of the ghost at (3,3)?
– Answer: 0.1

Quiz: Inference with Bayes’ Rule
• Given:
P(W):
W P
sun 0.8
rain 0.2
P(D | W):
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3
• What is P(W | dry)?
W P
sun ?
rain ?

Graphical Model Notation
• Nodes: variables (with domains)
– Can be assigned (observed) or unassigned (unobserved)
• Arcs: interactions
– Indicate direct influence between variables
– Formally: encode conditional independence
• For now: imagine that arrows mean direct causation (not true in general)

Definition: Independence
• Two variables are independent if: $\forall x, y:\; P(x, y) = P(x)\,P(y)$
– This says that their joint distribution factors into a product of two simpler distributions
– Another form: $\forall x, y:\; P(x \mid y) = P(x)$
– We write: $X \perp Y$
• Independence is a simplifying modeling assumption
– Empirical joint distributions: at best close to independent
– What could we assume for {Weather, Traffic, Cavity, Toothache}?

Example: Independence
• N fair, independent coin flips X_1, X_2, …, X_N, each with the distribution:
H 0.5
T 0.5
– Joint: $P(X_1, X_2, \ldots, X_N) = \prod_{i=1}^{N} P(X_i)$
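Independence can be tested directly on a joint table by comparing each entry with the product of its marginals. A sketch for two variables, assuming the joint is a dict keyed by `(x, y)` with every combination present; it shows that the T, W joint used earlier is not independent (P(hot, sun) = 0.4 but P(hot) P(sun) = 0.5 × 0.6 = 0.3), while two fair coin flips are.

```python
import math
from itertools import product


def is_independent(joint, tol=1e-9):
    """True if P(x, y) = P(x) P(y) for every x, y (within tol).
    Assumes every (x, y) combination appears as a key."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(
        math.isclose(joint[(x, y)], p_x[x] * p_y[y], abs_tol=tol)
        for x in xs
        for y in ys
    )


joint_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}
two_coins = {flips: 0.25 for flips in product("HT", repeat=2)}

print(is_independent(joint_TW))   # False
print(is_independent(two_coins))  # True
```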

Conditional Independence
• P(Toothache, Cavity, Catch)
• If I have a cavity, the probability that the probe catches in it
doesn't depend on whether I have a toothache:
– P(+catch | +toothache, +cavity) = P(+catch | +cavity)
• The same independence holds if I don’t have a cavity:
– P(+catch | +toothache, -cavity) = P(+catch| -cavity)
• Catch is conditionally independent of Toothache given Cavity:
– P(Catch | Toothache, Cavity) = P(Catch | Cavity)
§ Equivalent statements:
§ P(Toothache | Catch , Cavity) = P(Toothache | Cavity)
§ P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
§ One can be derived from the other easily

Conditional Independence and the Chain Rule
• Chain rule: $P(X_1, X_2, \ldots, X_n) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2) \cdots$
• Trivial decomposition:
• With assumption of conditional independence:
• Bayes nets / graphical models help us express conditional independence assumptions

Graphical Model Semantics
• A set of nodes, one per variable X
• A directed, acyclic graph
• A conditional distribution for each node
– A collection of distributions over X, one for each combination of parents’ values: $P(X \mid a_1, \ldots, a_n)$
– CPT: conditional probability table
– Description of a noisy causal process
[Figure: node X with parent nodes A1, …, An]
A Bayes net = Topology (graph) + Local Conditional Probabilities

Conditional Probability Tables
• Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents (conditioning case).
– Roots (sources) of the DAG that have no parents are given prior probabilities.
[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]
P(B) = .001   P(E) = .002
B E P(A)
T T .95
T F .94
F T .29
F F .001
A P(J)
T .90
F .05
A P(M)
T .70
F .01
CPT Comments
• The probability of Node = false is not given; subtract from 1:
B E P(A=T)
T T .95
B E P(A=F)
T T .05
• The entries in a CPT column (e.g., P(A=T) over the four (B, E) cases) do not need to add up to one: they are NOT NORMALIZED across conditioning cases. Each row, together with its omitted false entry, does sum to one.
• The example requires 10 parameters rather than $2^5 - 1 = 31$ for specifying the full joint distribution.
• The number of parameters in the CPT for a node is exponential in the number of parents (fan-in).
Joint Distributions for Bayes Nets
• A Bayesian network implicitly defines a joint distribution:
$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(X_i))$
• Example
$P(J \wedge M \wedge A \wedge \neg B \wedge \neg E) = P(J \mid A)\,P(M \mid A)\,P(A \mid \neg B \wedge \neg E)\,P(\neg B)\,P(\neg E)$
$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$
• An inefficient approach to inference is:
– 1) Compute the joint distribution using this equation.
– 2) Compute any desired conditional probability using the joint distribution.
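The factored joint can be evaluated mechanically from the CPTs. A minimal sketch that stores only P(node = true | parents), as the CPTs above do, and recovers the false entries by subtracting from 1; the dict layout and the `bern` helper are illustrative choices. It reproduces the example entry and the 10-parameter count.

```python
from itertools import product

# CPTs from the slide: probability that each node is true given its parents.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}                     # keyed by A
P_M = {True: 0.70, False: 0.01}                     # keyed by A


def bern(p_true, value):
    """P(node = value) when the CPT only stores P(node = true)."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


p = joint(b=False, e=False, a=True, j=True, m=True)
print(round(p, 6))  # 0.000628

# Sanity check: the 32 joint entries sum to (approximately) one.
total = sum(joint(*v) for v in product([True, False], repeat=5))

# One free parameter per CPT row: 2 ** (number of parents) rows per Boolean node.
num_parents = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}
print(sum(2 ** k for k in num_parents.values()), 2 ** 5 - 1)  # 10 31
```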
Bayes Nets: Big Picture
• Two problems with using full joint distribution tables as our probabilistic models:
– Unless there are only a few variables, the joint is WAY too big to represent explicitly
– Hard to learn (estimate) anything empirically about more than a few variables at a time
• Bayes nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
– More properly called graphical models
– We describe how variables locally interact
– Local interactions chain together to give global, indirect interactions

Probability Review
• Reading Materials: R&N Chapter 13
• Online Resources:
– https://courses.washington.edu/css490/2012.Winter/lecture_slides/02_math_essentials.pdf
• Tutorials with hands-on examples:
– https://www.hackerearth.com/practice/machine-learning/prerequisites-of-machine-learning/basic-probability-models-and-rules/tutorial/
– https://www.hackerearth.com/practice/machine-learning/prerequisites-of-machine-learning/bayes-rules-conditional-probability-chain-rule/tutorial/
Preview: Homework 3
• http://www.cs.emory.edu/~eugene/cs425/p3/

Probability Review
• Bag with 10 marbles: 3 red, 7 blue
– Reach in, take one, put it back
– Repeat lots of times.
– What fraction red? About .3
– P(red) = .3

Probability Distribution
• The probability for each value of a random variable
if color = (red, blue)
P(color) = (.3, .7)

Basic Properties
• 0 ≤ P(A) ≤ 1
• P(true) = 1
P(red ∨ blue ∨ green) = 1
• P(false) = 0
P(black) = 0

Basic Properties
• P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  .3  P(red)
+ .4  P(striped)
− .1  P(red ∧ striped)   (counted twice, so subtract once)
= .6  P(red ∨ striped)

Probabilistic Models
• A probabilistic model is a joint distribution over a set of random variables
• Probabilistic models:
– (Random) variables with domains
– Assignments are called outcomes
– Joint distributions: say whether assignments (outcomes) are likely
– Normalized: sum to 1.0
– Ideally: only certain variables directly interact
Distribution over T,W
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
• Constraint satisfaction problems:
– Variables with domains
– Constraints: state whether assignments are possible
– Ideally: only certain variables directly interact
Constraint over T,W
T W P
hot sun T
hot rain F
cold sun F
cold rain T

Events
• An event is a set E of outcomes
• From a joint distribution, we can calculate the probability of any event:
$P(E) = \sum_{(x_1, \ldots, x_n) \in E} P(x_1, \ldots, x_n)$
– Probability that it’s hot AND sunny? P(+hot, +sun) = ?
– Probability that it’s hot? P(+hot) = ?
– Probability that it’s hot OR sunny? P(+hot OR +sun) = ?
• Typically, the events we care about are partial assignments, like P(T=hot)
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
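Summing the joint over the outcomes in an event can be sketched as follows, with an event represented as a predicate on `(t, w)` outcomes (an illustrative choice). As examples it computes P(T = hot) and P(cold OR rain).

```python
# Joint distribution P(T, W) from the slide, keyed by (t, w).
joint_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def event_probability(joint, event):
    """P(E) = sum of P(outcome) over the outcomes in E (event is a predicate)."""
    return sum(p for outcome, p in joint.items() if event(outcome))


p_hot = event_probability(joint_TW, lambda o: o[0] == "hot")  # about 0.5
p_cold_or_rain = event_probability(
    joint_TW, lambda o: o[0] == "cold" or o[1] == "rain"
)  # about 0.6
print(p_hot, p_cold_or_rain)
```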
Exercise: Event Probabilities
X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1
• P(+x, +y) ?
• P(+x) ?
• P(-y OR +x) ?

Conditional Probabilities
• A simple relation between joint and conditional probabilities: $P(a \mid b) = \frac{P(a, b)}{P(b)}$
– In fact, this is taken as the definition of a conditional probability
[Figure: Venn diagram of events a and b; their overlap is P(a, b)]
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
P(hot | sun) = ?   P(cold | rain) = ?

Exercise: Conditional Probabilities
X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1
• P(+x | +y) ?
• P(-x | +y) ?
• P(-y | +x) ?
