Lecture: Quantifying Uncertainty


Quantifying Uncertainty

Motivation

Uncertainty is everywhere. Consider the following proposition.

A_t: Leaving t minutes before the flight will get me to the airport on time.

Problems:
1. partial observability (road state, other drivers’ plans, etc.)
2. noisy sensors (radio traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modelling and predicting traffic
Knowledge representation
Language              Main elements                      Assignments
Propositional logic   facts                              T, F, unknown
First-order logic     facts, objects, relations          T, F, unknown
Temporal logic        facts, objects, relations, times   T, F, unknown
Temporal CSPs         time points                        time intervals
Fuzzy logic           set membership                     degree of truth
Probability theory    facts                              degree of belief

The first three do not represent uncertainty, while the last three do.
Example of uncertain reasoning
• Consider the following simple rule:
• Toothache ⇒ Cavity
• It is wrong, as not all toothaches are due to cavities:
• Toothache ⇒ Cavity ∨ gum_problem ∨ Abscess ∨ …
• Turning it around: Cavity ⇒ Toothache
• This rule is not true either (not all cavities cause pain).
• The only way to fix the rule is to make it logically exhaustive:
• to augment the left-hand side with all the qualifications required for a cavity to
cause a toothache.
Handling Uncertain KB
• Logic fails in the medical domain for three reasons
• Laziness: It is too much work to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule, and too hard to use such rules.
• Theoretical ignorance: There is no complete theory for the domain.
• Practical ignorance: Even if we knew all the rules, we might be uncertain about
a particular patient because not all the necessary tests have been conducted.

• This is typical of the medical domain, and equally true of many other domains: law, gardening, etc.
• The agent’s knowledge can at best provide only a degree of belief
in the relevant sentences.
• The tool for dealing with degrees of belief is probability theory.
Uncertainty and rational decisions
• For getting to the airport, the agent can consider many plans.
• The agent must have preferences between the different possible
outcomes of the various plans.
• We use utility theory to represent and reason with preferences.
Utility theory says every state has a degree of usefulness, or utility.
• The agent prefers states with higher utility.
• Preferences, as expressed by utilities, are combined with probabilities
in the general theory of rational decisions called decision theory:
• Decision theory = probability theory + utility theory
Decision-theoretic agent that selects rational actions
Probability
• Probabilistic assertions summarize effects of
• laziness: failure to enumerate exceptions, qualifications, etc.
• ignorance: lack of relevant facts, initial conditions, etc.
• Probabilities relate propositions to one’s own state of knowledge. They
might be learned from past experience of similar situations.
• e.g., P(A_25) = 0.05
• Probabilities of propositions change with new evidence:
• e.g., P(A_25 | no reported accidents) = 0.06
• e.g., P(A_25 | no reported accidents, 5 a.m.) = 0.15
Probability basics
Begin with a set Ω called the sample space
A sample space is a set of possible outcomes
Each ω ∈ Ω is a sample point (possible world, atomic event),
e.g., the 6 possible rolls of a die: {1, 2, 3, 4, 5, 6}
Probability space or probability model:
Take a sample space Ω, and
assign a number P(ω) (the probability of ω)
to every atomic event ω ∈ Ω
Probability basics (cont’d)
A probability space must satisfy the following properties:
0 ≤ P(ω) ≤ 1 for every ω ∈ Ω
Σ_{ω∈Ω} P(ω) = 1
e.g., for rolling the die,
P (1) = P (2) = P (3) = P (4) = P (5) = P (6) = 1/6.
An event A is any subset of Ω
The probability of an event is defined as follows:
P(A) = Σ_{ω∈A} P(ω)
e.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
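These definitions map directly onto a few lines of code. Below is a minimal Python sketch (the dictionary P and the helper prob_event are illustrative names, not from the lecture) that builds the die's probability space and computes P(die roll < 4):

```python
from fractions import Fraction

# Sample space for one die roll, with P(w) = 1/6 for every atomic event w.
P = {w: Fraction(1, 6) for w in range(1, 7)}

assert sum(P.values()) == 1          # probabilities must sum to 1

def prob_event(event, P):
    """P(A) = sum of P(w) over the sample points w where the event holds."""
    return sum(p for w, p in P.items() if event(w))

print(prob_event(lambda w: w < 4, P))   # 1/2, matching P(1) + P(2) + P(3)
```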
Random variables
A random variable is a function from sample points to some range such as
integers or Booleans.
We’ll use capitalized words for random variables.
e.g., for rolling the die:
Odd(ω) = true if ω is odd, false otherwise

A probability distribution gives a probability for every possible value.


If X is a random variable, then
P(X = xi) = Σ {P(ω) : X(ω) = xi}
e.g., P (Odd = true) = P (1) + P (3) + P (5) = 1/6 + 1/6 + 1/6
= 1/2
Note that we don’t write Odd’s argument ω here.
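To make the definition concrete, here is a small Python sketch (continuing the die example; distribution is an illustrative helper name) that derives P(X = xi) by summing the probabilities of the sample points that X maps to each value:

```python
from collections import defaultdict
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # die sample space

def distribution(X, P):
    """Return {x_i: P(X = x_i)} by summing P(w) over sample points with X(w) = x_i."""
    d = defaultdict(Fraction)
    for w, p in P.items():
        d[X(w)] += p
    return dict(d)

Odd = lambda w: w % 2 == 1
print(distribution(Odd, P))   # {True: Fraction(1, 2), False: Fraction(1, 2)}
```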
Propositions
• Odd is a Boolean or propositional random variable: its range is {true,
false}
• We’ll use the corresponding lower-case word (in this case odd) for the event that a
propositional random variable is true
• e.g., P(odd) = P(Odd = true) = 1/2
• P(¬odd) = P(Odd = false) = 1/2
• Boolean formula = disjunction of the sample points in which it is true
• e.g., (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
• ⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)
Syntax for propositions
Propositional or Boolean random variables
e.g., Cavity (do I have a cavity in one of my teeth?)
Cavity = true is a proposition, also written cavity
Discrete random variables (finite or infinite)
e.g., Weather is one of < sunny, rain, cloudy, snow >
Weather = rain is a proposition
Values must be exhaustive and mutually exclusive
Continuous random variables (bounded or unbounded) e.g., Temp = 21.6;
Temp < 22.0
Arbitrary Boolean combinations of basic propositions e.g., ¬cavity
means Cavity = false
Probabilities of propositions
e.g., P(cavity) = 0.1 and P(Weather = sunny) = 0.72
Syntax for probability distributions
Represent a discrete probability distribution as a vector of probability values:
P(Weather) =< 0.72, 0.1, 0.08, 0.1 >
The above is an ordered list representing the probabilities of
sunny, rain, cloudy, and snow.
Probabilities of sunny, rain, cloudy, and snow must sum to 1 when the vector is
normalized
If B is a Boolean random variable, then P(B) =< P(b), P(¬b) >
e.g., if P(cavity) = 0.1 then
P(Cavity = true) = 0.1 and P(Cavity) =< 0.1,0.9 >
When the entries in the vector do not add up to 1, but represent the true ratios, the
vector is preceded by a normalizing constant, α,
e.g. P(Cavity) = α < 0.01, 0.09 > where α is 10
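Normalization is just rescaling a vector of ratios so that its entries sum to 1. A minimal sketch (the normalize helper is an illustrative name):

```python
def normalize(vector):
    """Scale a vector of non-negative ratios so its entries sum to 1."""
    alpha = 1 / sum(vector)          # the normalizing constant
    return [alpha * v for v in vector]

print(normalize([0.01, 0.09]))       # [0.1, 0.9], i.e. P(Cavity) with alpha = 10
```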
Syntax for joint probability distributions
A joint probability distribution for a set of n random variables gives
the probability of every atomic event on those variables,
i.e., every sample point
Represent it as an n-dimensional matrix,
e.g., P(Weather, Cavity) is a 4 × 2 matrix.
The entries contain probabilities for all possible combinations of
Weather (4 values) and Cavity (2 values):

                 Weather = sunny   rain    cloudy   snow
Cavity = true              0.144   0.02    0.016    0.02
Cavity = false             0.576   0.08    0.064    0.08
Every question about a domain can be answered by the joint
distribution because every event is a sum of sample points
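A joint distribution of this size can be stored directly as a table keyed by value combinations. The sketch below (the dictionary P_weather_cavity is an illustrative name) stores P(Weather, Cavity) and answers questions by summing sample points:

```python
# Joint distribution P(Weather, Cavity), keyed by (weather, cavity) pairs.
P_weather_cavity = {
    ("sunny", True): 0.144, ("rain", True): 0.02, ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rain", False): 0.08, ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Any event is a set of sample points; its probability is the sum of their entries.
p_cavity = sum(p for (w, c), p in P_weather_cavity.items() if c)
p_sunny_or_cavity = sum(p for (w, c), p in P_weather_cavity.items() if w == "sunny" or c)
print(p_cavity, p_sunny_or_cavity)   # 0.2 and 0.776
```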
Conditional probability
Prior (unconditional) probabilities refer to degrees of belief in the absence of
any other information.
Posterior (conditional) probabilities refer to degrees of belief when we have
some information, called evidence.
Consider drawing straws, without replacement, from a set of 1 long and 4 short straws;
long refers to drawing a long straw, and short refers to drawing a short straw.
P(long) = 0.2
P(long|short) = 0.25
P(long|long) = 0.0
P(long|short, short) = 1/3
P(long|rain) = 0.2
Conditional probability (cont’d)
P(cavity|toothache) = 0.8 means
the probability of cavity given that toothache is all we know.
It does not mean “if toothache then 80% chance of cavity”.
Suppose we get more evidence, e.g., cavity is also given. Then
P(cavity|toothache, cavity) = 1
Note: the less specific belief remains valid, but is not always useful.
New evidence may be irrelevant, allowing simplification, e.g.,
P(cavity|toothache, 49ersWin) = P(cavity|toothache) = 0.8
Conditional distributions are shown as vectors for all possible
combinations of the evidence and query.
P(Cavity|Toothache) is a 2-element vector of 2-element vectors:

< < 0.12, 0.08 >, < 0.08, 0.72 > >

where the first inner vector corresponds to toothache and the second to ¬toothache.
Conditional probability definitions
Definition of conditional probability:

P(a|b) = P(a ∧ b) / P(b)

Product rule gives an alternative formulation and holds even if P(b) = 0:

P(a ∧ b) = P(a|b)P(b) = P(b|a)P(a)


A general version holds for an entire probability distribution, e.g.,

P(Weather, Cavity) = P(Weather|Cavity)P(Cavity)

This is not matrix multiplication; it is a set of 4 × 2 equations:

P(sunny, cavity) = P(sunny|cavity)P(cavity)       P(sunny, ¬cavity) = P(sunny|¬cavity)P(¬cavity)
P(rain, cavity) = P(rain|cavity)P(cavity)         P(rain, ¬cavity) = P(rain|¬cavity)P(¬cavity)
P(cloudy, cavity) = P(cloudy|cavity)P(cavity)     P(cloudy, ¬cavity) = P(cloudy|¬cavity)P(¬cavity)
P(snow, cavity) = P(snow|cavity)P(cavity)         P(snow, ¬cavity) = P(snow|¬cavity)P(¬cavity)
Chain rule
Chain rule is derived by successive applications of the product rule:
P(X1, ..., Xn)
= P(Xn|X1, ..., Xn−1) P(X1, ..., Xn−1)
= P(Xn|X1, ..., Xn−1) P(Xn−1|X1, ..., Xn−2) P(X1, ..., Xn−2)
= ...
= ∏_{i=1}^{n} P(Xi|X1, ..., Xi−1)
For example,
P(X1, X2, X3, X4)
= P(X1)P(X2|X1)P(X3|X1, X2)P(X4|X1, X2, X3)
= P(X4|X3, X2, X1)P(X3|X2, X1)P(X2|X1)P(X1)
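As a sanity check, the product/chain rule can be verified numerically on the P(Weather, Cavity) table. A short sketch (repeating the illustrative P_weather_cavity table from the earlier example):

```python
import math

# Illustrative joint table P(Weather, Cavity) from the earlier sketch.
P_weather_cavity = {
    ("sunny", True): 0.144, ("rain", True): 0.02, ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rain", False): 0.08, ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def P_cavity(c):
    return sum(p for (w, cc), p in P_weather_cavity.items() if cc == c)

def P_weather_given_cavity(w, c):
    return P_weather_cavity[(w, c)] / P_cavity(c)

# Product/chain rule: P(w, c) = P(w | c) P(c) for every entry of the joint.
for (w, c), p in P_weather_cavity.items():
    assert math.isclose(p, P_weather_given_cavity(w, c) * P_cavity(c))
```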
Inference by enumeration
The Dentist Domain:
What is the probability of a cavity given a toothache? What is
the probability of a cavity given the probe catches?
We start with the joint distribution:

toothache ~toothache
catch ~catch catch ~catch
cavity .108 .012 .072 .008
~cavity .016 .064 .144 .576

For any proposition q, add up the atomic events where it is true:


P(q) = Σ_{ω: ω ⊨ q} P(ω)
Computing the probability of a proposition
toothache ~toothache
catch ~catch catch ~catch
cavity .108 .012 .072 .008
~cavity .016 .064 .144 .576

For any proposition q, add up the atomic events where it is true:


P(q) = Σ_{ω: ω ⊨ q} P(ω)

Red shows “the world” given what we know so far.


Green shows the (atomic) event we are interested in.

P(toothache)= P(toothache, catch, cavity) + P(toothache, ¬catch, cavity)+


P(toothache, catch, ¬cavity) + P(toothache, ¬catch, ¬cavity)
= 0.108 + 0.012 + 0.016 + 0.064 = 0.2
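The same summation is easy to automate. Below is a minimal Python sketch (the dentist table and the P helper are illustrative names) that stores the full joint distribution and computes the probability of any proposition by enumeration:

```python
# Full joint distribution P(Toothache, Catch, Cavity), keyed by (toothache, catch, cavity).
dentist = {
    (True,  True,  True): 0.108, (True,  False, True): 0.012,
    (False, True,  True): 0.072, (False, False, True): 0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def P(proposition):
    """Sum the atomic events (rows of the joint) where the proposition holds."""
    return sum(p for (toothache, catch, cavity), p in dentist.items()
               if proposition(toothache, catch, cavity))

print(P(lambda t, c, cav: t))           # P(toothache) = 0.2
print(P(lambda t, c, cav: cav or t))    # P(cavity ∨ toothache) = 0.28
```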
Computing the probability of a logical sentence

toothache ~toothache
catch ~catch catch ~catch
cavity .108 .012 .072 .008
~cavity .016 .064 .144 .576

P(cavity ∨ toothache)
= 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064
= 0.28
Computing a conditional probability
toothache ~toothache
catch ~catch catch ~catch
cavity .108 .012 .072 .008
~cavity .016 .064 .144 .576

Once toothache comes as evidence the world is restricted to those


cells where Toothache is true as shown in red.
General idea:
Compute the distribution on the query variable (Cavity) by
fixing the evidence variables (Toothache) and
summing over all possible values of the hidden variables
(Catch; for the denominator, also Cavity).

P(¬cavity|toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
Computing a conditional probability (cont’d)
toothache ~toothache
catch ~catch catch ~catch
cavity .108 .012 .072 .008
~cavity .016 .064 .144 .576

General idea: Fix the evidence variable (Toothache) and


sum over all possible values of hidden variables
(Catch for the numerator, Cavity and Catch for the denominator)
P(Y = y|E = e) = P(Y = y, E = e) / P(E = e)
= [Σ_h P(Y = y, E = e, H = h)] / [Σ_h P(E = e, H = h)]

P(¬cav|tth) = P(¬cav, tth) / P(tth)
= [Σ_h P(¬cav, tth, H = h)] / [Σ_h P(tth, H = h)]
= [P(¬cav, tth, cat) + P(¬cav, tth, ¬cat)] /
  [P(tth, cav, cat) + P(tth, cav, ¬cat) + P(tth, ¬cav, cat) + P(tth, ¬cav, ¬cat)]
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
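Continuing the illustrative dentist table and P helper from the earlier sketch, a conditional query is just a ratio of two such sums:

```python
def P_given(query, evidence):
    """P(query | evidence) = P(query ∧ evidence) / P(evidence), both by enumeration."""
    return P(lambda *w: query(*w) and evidence(*w)) / P(evidence)

toothache = lambda t, c, cav: t
not_cavity = lambda t, c, cav: not cav

print(P_given(not_cavity, toothache))   # P(¬cavity | toothache) = 0.4
```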
Normalization
toothache ~toothache
catch ~catch catch ~catch
cavity .108 .012 .072 .008
~cavity .016 .064 .144 .576

Recall that events are lower case, random variables are Capitalized
General idea: The denominator can be viewed as a normalization constant α,
so we only need the unnormalized distribution obtained by summing the joint over the values of the hidden variables.
P(Cavity|toothache) = αP(Cavity, toothache)
= α[P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α[< P(cavity, toothache, catch), P(¬cavity, toothache, catch) > +
< P(cavity, toothache, ¬catch), P(¬cavity, toothache, ¬catch) >]
= α[< 0.108, 0.016 > + < 0.012, 0.064 >]
= α[< 0.108 + 0.012, 0.016 + 0.064 >] = α[< 0.12, 0.08 >]
= < 0.6, 0.4 > because the entries must add up to 1
Compute α as 1/(0.12 + 0.08) = 5
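Putting enumeration and normalization together yields the whole posterior distribution in one pass. A sketch reusing the illustrative dentist table (the enumerate_cavity helper is an assumed name, not from the lecture):

```python
def enumerate_cavity(evidence):
    """Return P(Cavity | evidence) as a normalized <true, false> pair."""
    unnormalized = [
        sum(p for (t, c, cav), p in dentist.items() if cav == val and evidence(t, c, cav))
        for val in (True, False)
    ]
    alpha = 1 / sum(unnormalized)
    return [alpha * v for v in unnormalized]

print(enumerate_cavity(lambda t, c, cav: t))   # [0.6, 0.4] = P(Cavity | toothache)
```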
Inference by enumeration, summary
Let X be the set of all variables. Typically, we are interested in the
posterior (conditional) joint distribution of the query variables Y given
specific values e from the evidence variables E

Let the hidden variables be H = X − Y − E


Then the required summation of joint entries is done by summing
out the hidden variables:
P(Y|E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)

i.e., sum over every possible combination of values


h =< h1, ...,hn > of the hidden variables H =< H1, ...,Hn >
The terms in the summation are joint entries because Y, E, and H
together exhaust the set of random variables
Inference by enumeration, issues

Consider that the number of random variables is n, and d is the largest arity.

► Worst-case time complexity is O(d^n)

► Space complexity is O(d^n), to store the entire joint distribution

► How do we find the numbers for the O(d^n) entries?
Independence
Random variables A and B are independent iff
P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A)P(B)
[Figure: P(Toothache, Catch, Cavity, Weather) decomposes into two independent distributions, P(Toothache, Catch, Cavity) and P(Weather).]

P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

2 × 2 × 2 × 4 = 32 entries reduced to (2 × 2 × 2) + 4 = 12 entries.
For n independent biased coins, 2^n entries reduced to n.
Absolute independence powerful but rare
E.g., dentistry is a large field with hundreds of variables,
none of which are independent. What to do?
Conditional independence
Consider P(Toothache, Cavity, Catch)
If I have a cavity, the probability that the probe catches in it
doesn’t depend on whether I have a toothache:
P(catch|toothache, cavity) = P(catch|cavity)
The same independence holds if I haven’t got a cavity:
P(catch|toothache, ¬cavity) = P(catch|¬cavity)
Thus Catch is conditionally independent of Toothache given
Cavity:
P(Catch|Toothache, Cavity) = P(Catch|Cavity)
Or equivalently:
P(Toothache|Catch, Cavity) = P(Toothache|Cavity)
P(Toothache, Catch|Cavity) =
P(Toothache|Cavity)P(Catch|Cavity)
Conditional independence (cont’d)
• Write out full joint distribution using chain rule:
• P(Toothache, Catch, Cavity)
• = P(Toothache|Catch, Cavity)P(Catch, Cavity)
• = P(Toothache|Catch, Cavity)P(Catch|Cavity)P(Cavity)
• = P(Toothache|Cavity)P(Catch|Cavity)P(Cavity)
• In most cases, the use of conditional independence reduces
the size of the representation of the joint distribution from
exponential in n to linear in n.
• Conditional independence is our most basic and robust
form of knowledge about uncertain environments.
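The factorization above can be checked numerically against the illustrative dentist joint table from the earlier sketches (the helper names below are assumptions for illustration):

```python
import math

def marginal(pred):
    """Sum the dentist joint over the atomic events satisfying a predicate."""
    return sum(p for w, p in dentist.items() if pred(*w))

# Check P(Toothache, Catch, Cavity) = P(Toothache|Cavity) P(Catch|Cavity) P(Cavity)
for (t, c, cav), p in dentist.items():
    p_cav = marginal(lambda tt, cc, cv: cv == cav)
    p_t_given_cav = marginal(lambda tt, cc, cv: tt == t and cv == cav) / p_cav
    p_c_given_cav = marginal(lambda tt, cc, cv: cc == c and cv == cav) / p_cav
    assert math.isclose(p, p_t_given_cav * p_c_given_cav * p_cav)
```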
Bayes’ rule
Product rule: P(a ∧ b) = P(a|b)P(b) = P(b|a)P(a)

Bayes’ rule: P(a|b) = P(b|a)P(a) / P(b)

or in probability distribution form,

P(Y|X) = P(X|Y)P(Y) / P(X) = αP(X|Y)P(Y)

Useful for assessing diagnostic probability from causal probability:

P(Cause|Effect) = P(Effect|Cause)P(Cause) / P(Effect)
Bayes’ rule example

Useful for assessing diagnostic probability from causal probability:

P(Cause|Effect) = P(Effect|Cause)P(Cause) / P(Effect)

E.g., let M be meningitis, S be stiff neck:

P(m|s) = P(s|m)P(m) / P(s) = (0.8 × 0.0001) / 0.1 = 0.0008
Note: posterior probability of meningitis is still very small
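The arithmetic of the meningitis example, written out as a tiny sketch (the bayes helper is an illustrative name; the numbers are those given above):

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)."""
    return p_effect_given_cause * p_cause / p_effect

print(bayes(p_effect_given_cause=0.8, p_cause=0.0001, p_effect=0.1))   # ≈ 0.0008
```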
Bayes’ rule and conditional independence
P(Cavity|toothache ∧ catch)
= P(toothache ∧ catch|Cavity)P(Cavity)/P(toothache ∧ catch)
= αP(toothache ∧ catch|Cavity)P(Cavity)
= αP(toothache|Cavity)P(catch|Cavity)P(Cavity)
A naive Bayes model is a mathematical model that assumes the
effects are conditionally independent, given the cause
P(Cause, Effect_1, ..., Effect_n) = P(Cause) ∏_i P(Effect_i|Cause)

[Figure: naive Bayes structure: Cavity as the cause with children Toothache and Catch; in general, Cause with children Effect_1, ..., Effect_n.]

Naive Bayes model ⇒ total number of parameters is linear in n
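A minimal naive Bayes sketch for the dentist example, using the conditional probabilities implied by the joint table used earlier (the parameter values and helper names are assumptions for illustration):

```python
def naive_bayes_posterior(p_cause, p_effects_given_cause, observed):
    """P(Cause | observed effects) ∝ P(Cause) × product of P(effect_i | Cause)."""
    scores = {}
    for cause, prior in p_cause.items():
        score = prior
        for effect, value in observed.items():
            p = p_effects_given_cause[cause][effect]
            score *= p if value else (1 - p)
        scores[cause] = score
    alpha = 1 / sum(scores.values())
    return {cause: alpha * s for cause, s in scores.items()}

p_cavity = {True: 0.2, False: 0.8}
p_eff = {True: {"toothache": 0.6, "catch": 0.9},     # P(effect | cavity)
         False: {"toothache": 0.1, "catch": 0.2}}    # P(effect | ¬cavity)

print(naive_bayes_posterior(p_cavity, p_eff, {"toothache": True, "catch": True}))
# ≈ {True: 0.871, False: 0.129}
```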


The wumpus world
[Figure: the 4×4 wumpus world grid; cells [1,1], [1,2], and [2,1] are marked OK, and a breeze (B) is perceived in [1,2] and [2,1].]

The agent is navigating the wumpus world in search of gold. It can perceive a breeze, a smell, or the gold.
Each cell has a 0.2 probability of containing a pit. Falling into a pit kills the agent. The wumpus won’t fall into a pit.
There is one wumpus. Being in the same cell as the wumpus kills the agent. The cells adjacent to the wumpus have a stench.
Pi,j = true iff [i, j] contains a pit; ∀i, j P(pi,j) = 0.2.
Each pit causes a breeze in the adjacent cells. Bi,j = true iff [i, j] is breezy.
After finding a breeze in both [1,2] and [2,1], there is no safe place to explore.
Specifying the probability model for pits
[Figure: the same grid; breezes observed in [1,2] and [2,1]; cells [1,1], [1,2], [2,1] are known to be pit-free.]

The only breezes we care about are B1,1, B1,2, B2,1. We can ignore the others.

The full joint distribution is:
P(P1,1, . . . , P4,4, B1,1, B1,2, B2,1)

Apply the product rule to get P(Effect|Cause):
P(B1,1, B1,2, B2,1 | P1,1, . . . , P4,4) P(P1,1, . . . , P4,4)

First term: 1 if the breezes are adjacent to the pits, 0 otherwise.

Second term: pits are placed independently, each with probability 0.2. For example, with n counting the pit-free cells:
P(p1,1, . . . , p4,4) = 0.2^16 × 0.8^0, as n = 0
P(¬p1,1, p1,2, . . . , p4,4) = 0.2^15 × 0.8^1, as n = 1
Observations and query
We know the following facts (evidence):
b = ¬b1,1 ∧ b1,2 ∧ b2,1
known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1

The query is P(P1,3|known, b).

We need to sum over the hidden variables, so define Unknown = the Pi,j’s other than P1,3 and Known.

For inference by enumeration, we have
P(P1,3|known, b) = α Σ_unknown P(P1,3, unknown, known, b)

This requires an exponential number of combinations of values for the cells in Unknown (here 2^12 = 4096 terms).

[Figure: the grid with breezes in [1,2] and [2,1]; cells [1,1], [1,2], [2,1] marked OK.]
Using conditional independence
Basic insight: given the frontier squares, b is conditionally independent of the other hidden squares.

Define Unknown = Frontier ∪ Other. Then

P(b|P1,3, Known, Unknown)
= P(b|P1,3, Known, Frontier, Other)
= P(b|P1,3, Known, Frontier)

We want to manipulate the query into a form where we can use this conditional independence.

[Figure: the grid partitioned into the KNOWN cells, the FRONTIER cells adjacent to them, the QUERY cell [1,3], and the remaining OTHER cells.]
Translating to use conditional independence
P(P1,3|known, b)
= P(P1,3, known, b) / P(known, b)
= α P(P1,3, known, b)
= α Σ_unknown P(P1,3, known, b, unknown)
= α Σ_unknown P(b|P1,3, known, unknown) P(P1,3, known, unknown)
= α Σ_frontier Σ_other P(b|P1,3, known, frontier, other) P(P1,3, known, frontier, other)
= α Σ_frontier Σ_other P(b|P1,3, known, frontier) P(P1,3, known, frontier, other)
= α Σ_frontier P(b|P1,3, known, frontier) Σ_other P(P1,3, known, frontier, other)
= α Σ_frontier P(b|P1,3, known, frontier) Σ_other P(P1,3) P(known) P(frontier) P(other)
= α P(known) P(P1,3) Σ_frontier P(b|P1,3, known, frontier) Σ_other P(frontier) P(other)
= α′ P(P1,3) Σ_frontier P(b|P1,3, known, frontier) Σ_other P(frontier) P(other)
= α′ P(P1,3) Σ_frontier P(b|P1,3, known, frontier) P(frontier) Σ_other P(other)
= α′ P(P1,3) Σ_frontier P(b|P1,3, known, frontier) P(frontier)
Results using conditional independence

[Figure: the five frontier models consistent with the observations; P1,3 = true in the first three and P1,3 = false in the last two.]

Weights of the consistent frontier models:
for P1,3 = true:  0.2 × 0.2 = 0.04,  0.2 × 0.8 = 0.16,  0.8 × 0.2 = 0.16
for P1,3 = false: 0.2 × 0.2 = 0.04,  0.2 × 0.8 = 0.16

P(P1,3|known, b) = α′ < 0.2 × (0.04 + 0.16 + 0.16), 0.8 × (0.04 + 0.16) > ≈ < 0.31, 0.69 >

P(P2,2|known, b) ≈ < 0.86, 0.14 >
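The whole calculation fits in a short enumeration over the two frontier cells. The sketch below (the cell coordinates, adjacency lists, and the breeze_consistent helper are illustrative assumptions matching the scenario above) reproduces the ≈ < 0.31, 0.69 > result for [1,3]:

```python
from itertools import product

P_PIT = 0.2
frontier = [(2, 2), (3, 1)]          # unknown cells adjacent to the breezy squares
query = (1, 3)

def breeze_consistent(pits):
    """b: breeze in [1,2] and [2,1]; the known cells [1,1],[1,2],[2,1] are pit-free."""
    neighbours = {(1, 2): [(1, 1), (2, 2), (1, 3)],
                  (2, 1): [(1, 1), (2, 2), (3, 1)]}
    return all(any(cell in pits for cell in neighbours[sq]) for sq in [(1, 2), (2, 1)])

posterior = {}
for query_is_pit in (True, False):
    total = 0.0
    for assignment in product([True, False], repeat=len(frontier)):
        pits = {c for c, v in zip(frontier, assignment) if v}
        if query_is_pit:
            pits.add(query)
        if breeze_consistent(pits):
            # P(frontier): each frontier cell is a pit independently with probability 0.2.
            n_pits = sum(assignment)
            total += (P_PIT ** n_pits) * ((1 - P_PIT) ** (len(frontier) - n_pits))
    prior = P_PIT if query_is_pit else 1 - P_PIT
    posterior[query_is_pit] = prior * total

alpha = 1 / sum(posterior.values())
print({k: round(alpha * v, 2) for k, v in posterior.items()})   # {True: 0.31, False: 0.69}
```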
Summary

• Probability is a rigorous formalism for uncertain knowledge
• Joint probability distribution specifies the probability of every atomic event
• Queries can be answered by inference by enumeration (summing over atomic events)
• Can reduce combinatorial explosion using independence and conditional independence
