13. Uncertainty


Quantifying Uncertainty

Jihoon Yang

Machine Learning Research Laboratory


Department of Computer Science & Engineering
Sogang University



Uncertainty
Let action At = leave for airport t minutes before flight
Will At get me there on time?
Problems:
1) partial observability (road state, other drivers’ plans, etc.)
2) noisy sensors (traffic radio)
3) uncertainty in action outcomes (flat tire, etc.), etc.
Hence a purely logical approach either
1) risks falsehood: “A25 will get me there on time”
or 2) leads to conclusions that are too weak for decision making
“A25 will get me there on time if there’s no accident on the
bridge and it doesn’t rain and my tires remain intact etc.”
(A1440 might reasonably be said to get me there on time
but I’d have to stay overnight in the airport . . .)

Making decisions under uncertainty
Suppose I believe the following:

P (A25 gets me there on time| . . .) = 0.04


P (A90 gets me there on time| . . .) = 0.70
P (A120 gets me there on time| . . .) = 0.95
P (A1440 gets me there on time| . . .) = 0.9999

Which action to choose?

Depends on my preferences for missing flight vs. airport cuisine, etc.

Utility theory is used to represent and infer preferences

Decision theory = utility theory + probability theory
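
To make the combination concrete, here is a minimal Python sketch of an expected-utility choice. The probabilities are the ones above; the utilities and waiting costs are purely illustrative assumptions, not part of the example.

```python
# Decision theory sketch: choose the action with maximum expected utility.
# Probabilities come from the slide; the utility numbers are made-up assumptions.
p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}
wait_minutes = {"A25": 25, "A90": 90, "A120": 120, "A1440": 1440}

def expected_utility(action):
    # Hypothetical utilities: +100 for catching the flight, -500 for missing it,
    # minus 0.2 per minute spent waiting at the airport.
    p = p_on_time[action]
    return p * 100 + (1 - p) * (-500) - 0.2 * wait_minutes[action]

for a in sorted(p_on_time, key=expected_utility, reverse=True):
    print(f"{a}: EU = {expected_utility(a):.1f}")
print("Best action:", max(p_on_time, key=expected_utility))  # A120 under these utilities
```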



Probability basics
Begin with a set Ω — the sample space
e.g., 6 possible rolls of a die.
ω ∈ Ω is a sample point/possible world/atomic event

A probability space or probability model is a sample space
with an assignment P(ω) for every ω ∈ Ω s.t.
0 ≤ P(ω) ≤ 1
Σ_ω P(ω) = 1
e.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.

An event A is any subset of Ω
P(A) = Σ_{ω ∈ A} P(ω)

E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
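
A minimal Python sketch of these definitions, using the die model above (an event is represented as a set of sample points):

```python
# Sample space Ω for one die roll, with P(ω) = 1/6 for every sample point ω.
from fractions import Fraction

P_omega = {w: Fraction(1, 6) for w in range(1, 7)}
assert sum(P_omega.values()) == 1          # the probabilities sum to 1

def prob(event):
    """P(A) = Σ_{ω ∈ A} P(ω) for an event A ⊆ Ω."""
    return sum(P_omega[w] for w in event)

print(prob({1, 2, 3}))                     # P(die roll < 4) = 1/2
```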



Random variables
A random variable is a function from sample points to some range,
e.g., the reals or Booleans (e.g. Odd(1) = true)

P induces a probability distribution for any r.v. X:

P(X = xi) = Σ_{ω : X(ω) = xi} P(ω)

e.g., P (Odd = true) = P (1) + P (3) + P (5) = 1/6 + 1/6 + 1/6 = 1/2
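
A short Python sketch of the induced distribution, continuing the die example (Odd is the Boolean random variable above):

```python
# A random variable is a function from sample points to some range.
from fractions import Fraction

P_omega = {w: Fraction(1, 6) for w in range(1, 7)}

def Odd(w):                                # Boolean-valued random variable
    return w % 2 == 1

def induced_dist(X):
    """P(X = x) = Σ_{ω : X(ω) = x} P(ω)."""
    d = {}
    for w, p in P_omega.items():
        d[X(w)] = d.get(X(w), 0) + p
    return d

print(induced_dist(Odd))                   # {True: 1/2, False: 1/2}
```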



Propositions
Think of a proposition as the event (set of sample points)
where the proposition is true

Given Boolean random variables A and B:


event a = set of sample points ω where A(ω) = true
event ¬a = set of sample points ω where A(ω) = false
event a ∧ b = points ω where A(ω) = true and B(ω) = true

Often in AI applications, the sample points are defined


by the values of a set of random variables, i.e., the
sample space is the Cartesian product of the ranges of the variables

With Boolean variables, sample point = propositional logic model
e.g., A = true, B = false, or a ∧ ¬b.
Proposition = disjunction of atomic events in which it is true
e.g., (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)
Syntax for propositions
Propositional or Boolean random variables
e.g., Cavity (do I have a cavity?)
Cavity = true is a proposition, also written cavity

Discrete random variables (finite or infinite)


e.g., Weather is one of ⟨sunny, rain, cloudy, snow⟩
Weather = rain is a proposition
Values must be exhaustive and mutually exclusive

Continuous random variables (bounded or unbounded)


e.g., Temp = 21.6; also allow, e.g., Temp < 22.0.

Arbitrary Boolean combinations of basic propositions



Axioms of probability
For any propositions A, B
1. 0 ≤ P (A) ≤ 1
2. P (T rue) = 1 and P (F alse) = 0
3. P (A ∨ B) = P (A) + P (B) − P (A ∧ B)
[Venn diagram: the events A and B as overlapping regions inside True, with intersection A ∧ B]

A probability is a measure over a set of events that satisfies the three axioms
⇒ probability theory is analogous to logical theory (axioms)
Prior & joint probability
Prior or unconditional probabilities of propositions
e.g., P (Cavity = true) = 0.1 and P (W eather = sunny) = 0.72
correspond to belief prior to arrival of any (new) evidence

Probability distribution gives values for all possible assignments:


P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩ (normalized, i.e., sums to 1)

Joint probability distribution for a set of r.v.s gives the


probability of every atomic event on those r.v.s (i.e., every sample point)
P(Weather, Cavity) = a 4 × 2 matrix of values:

                 Weather = sunny   rain   cloudy   snow
Cavity = true         0.144        0.02    0.016   0.02
Cavity = false        0.576        0.08    0.064   0.08

Every question about a domain can be answered by the joint


distribution because every event is a sum of sample points
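
As a small illustration, here is a Python sketch that stores the joint table above with one probability per sample point and answers questions by summing:

```python
# Joint distribution P(Weather, Cavity), keyed by (weather, cavity) sample points.
joint = {
    ("sunny",  True): 0.144, ("rain",  True): 0.02,
    ("cloudy", True): 0.016, ("snow",  True): 0.02,
    ("sunny",  False): 0.576, ("rain",  False): 0.08,
    ("cloudy", False): 0.064, ("snow",  False): 0.08,
}

def prob(event):
    """Probability of any event, as the sum over sample points where it holds."""
    return sum(p for point, p in joint.items() if event(*point))

print(prob(lambda w, c: c))                     # P(Cavity = true)   = 0.2
print(prob(lambda w, c: w == "sunny"))          # P(Weather = sunny) = 0.72
print(prob(lambda w, c: c or w == "rain"))      # any Boolean combination works
```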



Inference using the joint distribution

                 Toothache = true   Toothache = false
Cavity = true          0.4                0.1
Cavity = false         0.1                0.4

P(cavity) = P(cavity, toothache) + P(cavity, ¬toothache) = 0.4 + 0.1 = 0.5



Conditional probability

• Conditional or posterior probabilities


P(cavity | toothache) = 0.8
probability of cavity given that toothache is all I know
(note cavity is shorthand for Cavity = true)

• Notation for conditional distributions:


P(Cavity | Toothache) = 2-element vector of 2-element vectors
P(cavity | toothache, cavity) = 1 (if the evidence already includes cavity)

• New evidence may be irrelevant (probability of cavity given


toothache is independent of Weather)
P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8



Conditional probability
Definition of conditional probability:

P(a|b) = P(a ∧ b) / P(b)    if P(b) ≠ 0

Product rule gives an alternative formulation:


P(a ∧ b) = P(a|b) P(b) = P(b|a) P(a)

A general version holds for whole distributions, e.g.,


P(Weather, Cavity) = P(Weather|Cavity) P(Cavity)
(View as a 4 × 2 set of equations, not matrix multiplication.)

Chain rule is derived by successive application of product rule:


P(X1, . . . , Xn) = P(X1, . . . , Xn−1) P(Xn | X1, . . . , Xn−1)
                 = P(X1, . . . , Xn−2) P(Xn−1 | X1, . . . , Xn−2) P(Xn | X1, . . . , Xn−1)
                 = . . .
                 = Π_{i=1}^{n} P(Xi | X1, . . . , Xi−1)



Inference by enumeration
Probabilistic inference is the computation of posterior probabilities for
query propositions given observed evidence
where the full joint distribution can be viewed as the KB
from which answers to all questions may be derived
Start with the joint distribution
                  toothache               ¬toothache
             catch      ¬catch        catch      ¬catch
 cavity      .108        .012          .072        .008
¬cavity      .016        .064          .144        .576
For any proposition φ, sum the atomic events where it is true


P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

E.g., P (toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
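
The same summation is easy to write down in Python; here is a sketch using the joint table above, with keys as (toothache, catch, cavity) truth values:

```python
# Full joint distribution for Toothache, Catch, Cavity.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def P(phi):
    """P(φ) = Σ_{ω ⊨ φ} P(ω): sum the atomic events where φ is true."""
    return sum(p for w, p in joint.items() if phi(*w))

print(P(lambda toothache, catch, cavity: toothache))              # 0.2
print(P(lambda toothache, catch, cavity: cavity or toothache))    # 0.28
```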



Probabilistic inference

• One common task is to extract the distribution over a single


variable or some subset of variables, called marginal distribution
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
P(¬toothache) = … = 0.8

• This process is called marginalization or summing out: for any sets


of variables Y and Z
P(Y) = Σ_z P(Y, z) = Σ_z P(Y | z) P(z)

• A distribution over Y can be obtained by summing out all other


variables from any joint distribution containing Y



Inference by enumeration

For any proposition φ, sum the atomic events where it is true


P(φ) = Σ_{ω : ω ⊨ φ} P(ω)
E.g., P (cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 +
0.016 + 0.064 = 0.28



Inference by enumeration

Can also compute conditional probabilities


P(¬cavity|toothache) = P(¬cavity ∧ toothache) / P(toothache)
                     = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4



Normalization

                  toothache               ¬toothache
             catch      ¬catch        catch      ¬catch
 cavity      .108        .012          .072        .008
¬cavity      .016        .064          .144        .576

• Denominator can be viewed as a normalization constant α


P(Cavity | toothache) = αP(Cavity, toothache)
= α[ P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch) ]
= α[ <0.108, 0.016> + <0.012, 0.064> ]
= α<0.12, 0.08> = <0.6, 0.4>

• General idea: compute distribution on query variable by fixing


evidence variables and summing over unobserved variables
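
A small Python sketch of the normalization step, assuming the joint table above is stored as joint[(toothache, catch, cavity)]:

```python
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def normalize(d):
    """Scale unnormalized probabilities so they sum to 1 (the α step)."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

# Fix the evidence (toothache = true) and sum out the unobserved variable Catch.
unnormalized = {cavity: sum(joint[(True, catch, cavity)] for catch in (True, False))
                for cavity in (True, False)}
print(normalize(unnormalized))     # {True: 0.6, False: 0.4} = P(Cavity | toothache)
```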

Probabilistic inference
• Let X be all variables. Typically we want the posterior distribution of
the query variables Y given specific values e for the evidence
variables E

• Let other variables be H = X – Y – E

• Then the required summation of joint entries is done by summing


out the other variables:
P(Y | E = e) = Σ_h P(Y, H = h | E = e) = α Σ_h P(Y, H = h, E = e)

• In principle, joint distributions can be used to answer any


probabilistic queries

• Obvious problems:
  – Worst-case time complexity O(d^n), where d is the largest arity and n the number of variables
  – Space complexity O(d^n) to store the joint distribution
  – How to find the numbers for the O(d^n) entries??
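
Here is a generic (and deliberately exponential) Python sketch of this procedure for the three-variable example, to make the O(d^n) enumeration explicit; the joint table and variable names are the ones used earlier:

```python
from itertools import product

variables = ("Toothache", "Catch", "Cavity")
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def enumerate_query(query_var, evidence):
    """P(Y | E = e) = α Σ_h P(Y, H = h, E = e), summing out the hidden variables."""
    hidden = [v for v in variables if v != query_var and v not in evidence]
    dist = {}
    for qval in (True, False):
        total = 0.0
        for hvals in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, query_var: qval, **dict(zip(hidden, hvals))}
            total += joint[tuple(assignment[v] for v in variables)]
        dist[qval] = total
    alpha = 1 / sum(dist.values())              # normalization constant
    return {k: alpha * v for k, v in dist.items()}

print(enumerate_query("Cavity", {"Toothache": True}))   # {True: 0.6, False: 0.4}
```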
Independence

• A and B are independent iff


P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A)P(B)

[Figure: the joint over Toothache, Catch, Cavity, Weather decomposes into separate joints over {Toothache, Catch, Cavity} and {Weather}]

• P(Toothache, Catch, Cavity, Weather)


= P(Toothache, Catch, Cavity) P(Weather)
• 32 entries reduced to 12; for n independent biased coins, O(2^n) → O(n)

• Absolute independence powerful but rare


• How can we manage a large number of variables?
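
The saving comes from storing only the factors; a small Python sketch with the numbers used in the slides (8 + 4 = 12 stored entries instead of 32):

```python
# P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
p_tcc = {   # 8 entries, keyed by (toothache, catch, cavity)
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
p_weather = {"sunny": 0.72, "rain": 0.1, "cloudy": 0.08, "snow": 0.1}   # 4 entries

def joint(toothache, catch, cavity, weather):
    """Reconstruct any of the 32 joint entries from the 12 stored numbers."""
    return p_tcc[(toothache, catch, cavity)] * p_weather[weather]

print(joint(True, True, True, "sunny"))    # 0.108 * 0.72
```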



Conditional independence
P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries
If I have a cavity, the probability that the probe catches in it doesn’t
depend on whether I have a toothache
(1) P (catch|toothache, cavity) = P (catch|cavity)
The same independence holds if I haven’t got a cavity
(2) P (catch|toothache, ¬cavity) = P (catch|¬cavity)
Catch is conditionally independent of Toothache given Cavity
P(Catch|Toothache, Cavity) = P(Catch|Cavity)
Equivalent statements
P(Toothache|Catch, Cavity) = P(Toothache|Cavity)
P(Toothache, Catch|Cavity) = P(Toothache|Cavity) P(Catch|Cavity)



Conditional independence

• Write out full joint distribution using chain rule:


P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
i.e. 2 + 2 + 1 = 5 independent numbers

• Conditional independence
– Often reduces the size of the representation of the joint
distribution from exponential in n to linear in n
– Is one of the most basic and robust forms of knowledge about
uncertain environments



Conditional independence

• X is conditionally independent of Y given Z if the probability


distribution governing X is independent of the value of Y given the
value of Z:
P(X|Y, Z) = P(X|Z)

that is, if

(∀ xi, yj, zk) P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)



Bayes’ rule
Product rule: P(a ∧ b) = P(a|b) P(b) = P(b|a) P(a)

⇒ Bayes’ rule: P(a|b) = P(b|a) P(a) / P(b)

or in distribution form

P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y)

Useful for assessing diagnostic probability from causal probability:

P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)

E.g., let M be meningitis, S be stiff neck:

P(m|s) = P(s|m) P(m) / P(s) = (0.8 × 0.0001) / 0.1 = 0.0008

Note: posterior probability of meningitis still very small!
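
The arithmetic as a one-liner in Python, using the numbers from the slide:

```python
# Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
p_s_given_m = 0.8        # causal probability: stiff neck given meningitis
p_m = 0.0001             # prior probability of meningitis
p_s = 0.1                # prior probability of stiff neck

print(p_s_given_m * p_m / p_s)   # 0.0008 — the posterior is still very small
```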


Bayes’ rule and conditional independence

P(Cavity | toothache ∧ catch)
= α P(toothache ∧ catch | Cavity) P(Cavity)
= α P(toothache | Cavity) P(catch | Cavity) P(Cavity)

• This is an example of a naïve Bayes (idiot Bayes) model:


P(Cause, Effect1, . . . , Effectn) = P(Cause) Π_i P(Effecti | Cause)

[Figure: naïve Bayes structure — Cause (Cavity) is the parent of Effect1 . . . Effectn (Toothache, Catch)]

• Total number of parameters is linear in n
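
A sketch of the naïve Bayes query in Python. The conditional probabilities below are not given on this slide; they are the values implied by the joint table used earlier (e.g., P(toothache | cavity) = 0.12 / 0.2 = 0.6):

```python
# Naive Bayes: P(Cavity | toothache, catch) ∝ P(Cavity) P(toothache | Cavity) P(catch | Cavity)
p_cavity = {True: 0.2, False: 0.8}
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

def posterior(toothache, catch):
    scores = {}
    for c in (True, False):
        p_t = p_toothache_given[c] if toothache else 1 - p_toothache_given[c]
        p_c = p_catch_given[c] if catch else 1 - p_catch_given[c]
        scores[c] = p_cavity[c] * p_t * p_c
    alpha = 1 / sum(scores.values())              # normalize
    return {c: alpha * s for c, s in scores.items()}

print(posterior(True, True))   # P(Cavity | toothache ∧ catch) ≈ {True: 0.871, False: 0.129}
```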

Example: Wumpus World
[Figure: 4 × 4 grid; squares [1,1], [1,2], [2,1] have been visited and are known to be safe (OK); breezes (B) were observed in [1,2] and [2,1]]

Pij = true iff [i, j] contains a pit


Bij = true iff [i, j] is breezy
Include only B1,1, B1,2, B2,1 in the probability model



Specifying the probability model


The full joint distribution is P(P1,1, . . . , P4,4, B1,1, B1,2, B2,1)
Apply product rule: P(B1,1, B1,2, B2,1 | P1,1, . . . , P4,4)P(P1,1, . . . , P4,4)
(Do it this way to get P (Ef f ect|Cause))
First term: 1 if pits are adjacent to breezes, 0 otherwise
Second term: pits are placed randomly, probability 0.2 per square:
P(P1,1, . . . , P4,4) = Π_{i,j = 1,1}^{4,4} P(Pi,j) = 0.2^n × 0.8^(16−n)  for n pits



Observations and query
We know the following facts:
b = ¬b1,1 ∧ b1,2 ∧ b2,1
known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
Query is P(P1,3|known, b)
Define Unknown = the Pij's other than P1,3 and Known
For inference by enumeration, we have
P(P1,3 | known, b) = α Σ_unknown P(P1,3, unknown, known, b)
Grows exponentially with number of squares



Using conditional independence
Basic insight: observations are conditionally independent of other
hidden squares given neighbouring hidden squares
[Figure: grid partitioned into KNOWN squares, the QUERY square [1,3], the FRINGE (unknown squares adjacent to the known ones), and OTHER (the remaining unknown squares)]

Define Unknown = Fringe ∪ Other

P(b | P1,3, Known, Unknown) = P(b | P1,3, Known, Fringe)
Manipulate query into a form where we can use this

Using conditional independence

P(P1,3 | known, b)
= α Σ_unknown P(P1,3, unknown, known, b)
= α Σ_unknown P(b | P1,3, known, unknown) P(P1,3, known, unknown)
= α Σ_fringe Σ_other P(b | known, P1,3, fringe, other) P(P1,3, known, fringe, other)
= α Σ_fringe Σ_other P(b | known, P1,3, fringe) P(P1,3, known, fringe, other)
= α Σ_fringe P(b | known, P1,3, fringe) Σ_other P(P1,3, known, fringe, other)
= α Σ_fringe P(b | known, P1,3, fringe) Σ_other P(P1,3) P(known) P(fringe) P(other)
= α P(known) P(P1,3) Σ_fringe P(b | known, P1,3, fringe) P(fringe) Σ_other P(other)
= α′ P(P1,3) Σ_fringe P(b | known, P1,3, fringe) P(fringe)

Using conditional independence


[Figure: the five fringe models consistent with the observations, with prior probabilities 0.2 × 0.2 = 0.04, 0.2 × 0.8 = 0.16, 0.8 × 0.2 = 0.16 (the three models with a pit in [1,3]) and 0.2 × 0.2 = 0.04, 0.2 × 0.8 = 0.16 (the two models without)]

P(P1,3 | known, b) = α′ ⟨0.2 (0.04 + 0.16 + 0.16), 0.8 (0.04 + 0.16)⟩ ≈ ⟨0.31, 0.69⟩

P(P2,2 | known, b) ≈ ⟨0.86, 0.14⟩
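
To check the arithmetic, here is a small Python sketch that enumerates the fringe models and normalizes, assuming (as in the figures above) that the fringe consists of squares [2,2] and [3,1]:

```python
from itertools import product

P_PIT = 0.2                    # pits are placed independently with probability 0.2 per square

def consistent(p13, p22, p31):
    """Do these pit assignments produce the observed breezes
    b = ¬b1,1 ∧ b1,2 ∧ b2,1, given that [1,1], [1,2], [2,1] are pit-free?"""
    b11 = False                # neighbours [1,2] and [2,1] contain no pit
    b12 = p13 or p22           # [1,2] is breezy iff a neighbouring square has a pit
    b21 = p22 or p31
    return (not b11) and b12 and b21

def prior(pit):
    return P_PIT if pit else 1 - P_PIT

unnormalized = {}
for p13 in (True, False):
    total = sum(prior(p22) * prior(p31)
                for p22, p31 in product((True, False), repeat=2)
                if consistent(p13, p22, p31))
    unnormalized[p13] = prior(p13) * total

alpha = 1 / sum(unnormalized.values())
print({k: round(alpha * v, 2) for k, v in unnormalized.items()})   # {True: 0.31, False: 0.69}
```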

