Chapter 5


Module 5

UNCERTAINTY

ACTING UNDER UNCERTAINTY


Agents may need to handle uncertainty, whether due to partial
observability, nondeterminism, or a combination of the two. An agent
may never know for certain what state it is in or where it will end up
after a sequence of actions.
Agents often make decisions based on incomplete information:
• partial observability
• nondeterministic actions
Partial solution (see previous chapters): maintain belief states
• represent the set of all possible world states the agent might be in
• generate a contingency plan handling every possible eventuality
Several drawbacks:
• must consider every possible explanation for the observation
(even very unlikely ones) ⇒ impossibly complex belief states
• contingent plans handling every eventuality grow arbitrarily large
• sometimes there is no plan that is guaranteed to achieve the goal

An agent’s knowledge cannot guarantee a successful outcome, but it can
provide some degree of belief (likelihood) in it. A rational decision
depends on both the relative importance of (sub)goals and the
likelihood that they will be achieved. Probability theory offers a clean
way to quantify likelihood.
Example
Automated taxi to the airport
• Goal: deliver a passenger to the airport on time
• Action At : leave for the airport t minutes before the flight. How can
we be sure that A90 will succeed?
• There are too many sources of uncertainty:
o partial observability (ex: road state, other drivers’ plans, etc.)
o uncertainty in action outcomes (ex: flat tire, etc.)
o noisy sensors (ex: unreliable traffic reports)
o complexity of modelling and predicting traffic
With a purely logical approach it is difficult to anticipate everything
that can go wrong. It either
• risks falsehood: “A25 will get me there on time”, or
• leads to conclusions that are too weak for decision making:
“A25 will get me there on time if there’s no accident on the
bridge, and it doesn’t rain, and my tires remain intact”
• Over-cautious choices are not rational solutions either, ex:
A1440 causes staying overnight at the airport

Summarizing uncertainty
Let’s consider an example of uncertain reasoning: diagnosing a
dental patient’s toothache. In a medical diagnosis, given the
symptoms (toothache) we must infer the cause (cavity). How can we
encode this relation in logic?
• diagnostic rules:
• Toothache → Cavity (wrong)
• Toothache → (Cavity ∨ GumProblem ∨ Abscess ∨ ...) (too many
possible causes, some very unlikely)
• causal rules:
• Cavity → Toothache (wrong)
• (Cavity ∧ ...) → Toothache (many possible (con)causes)
• Problems in specifying the correct logical rules:
• Complexity: too many possible antecedents or consequents
• Theoretical ignorance: no complete theory for the domain
• Practical ignorance: no complete knowledge of the patient
Trying to use logic to cope with a domain like medical diagnosis thus
fails for three main reasons:
 Laziness: It is too much work to list the complete set of
antecedents or consequents needed to ensure an exceptionless
rule and too hard to use such rules.
 Theoretical ignorance: Medical science has no complete theory
for the domain.
 Practical ignorance: Even if we know all the rules, we might
be uncertain about a particular patient because not all the
necessary tests have been or can be run.
The connection between toothaches and cavities is just not a logical
consequence in either direction. This is typical of the medical domain,
as well as most other judgmental domains: law, business, design,
automobile repair, gardening, dating, and so on. The agent’s knowledge
can at best provide only a degree of belief in the relevant sentences.
Our main tool for dealing with degrees of belief is probability theory.
Probability provides a way of summarizing the uncertainty that
comes from our laziness and ignorance, thereby solving the
qualification problem.
• Probability allows us to summarize the uncertainty that derives from
laziness (failure to enumerate exceptions, qualifications, etc.) and
ignorance (lack of relevant facts, initial conditions, etc.).
• Probability can be derived from statistical data (ex: 80% of
toothache patients so far had cavities), from general knowledge (ex:
80% of toothache patients have cavities), or from their combination.
• Probability statements are made with respect to a state of knowledge
(aka evidence), not with respect to the real world
• e.g., “The probability that the patient has a cavity, given that she has
a toothache, is 0.8”:
• P(HasCavity(patient) | hasToothAche(patient)) = 0.8
• Probabilities of propositions change with new evidence:
• “The probability that the patient has a cavity, given that she has a
toothache and a history of gum disease, is 0.4”:
• P(HasCavity(patient) | hasToothAche(patient)
∧ HistoryOfGum(patient)) = 0.4

Uncertainty and rational decisions


Consider again the A90 plan for getting to the airport. Suppose it
gives us a 97% chance of catching the flight. Does this mean it is a
rational choice? Not necessarily: there might be other plans, such as
A180, with higher probabilities. To make such choices, an agent
must first have preferences between the different possible outcomes
of the various plans. Use utility theory to represent and reason with
preferences. Utility theory says that every state has a degree of
usefulness, or utility, to an agent and that the agent will prefer states
with higher utility.
• Ex: Suppose I believe:
• P(A25 gets me there on time |...) = 0.04
• P(A90 gets me there on time |...) = 0.70
• P(A120 gets me there on time |...) = 0.95
• P(A1440 gets me there on time |...) = 0.9999
• Which action to choose?
Preferences, as expressed by utilities, are combined with probabilities
in the general theory of rational decisions called decision theory:
Decision theory = probability theory + utility theory .
The fundamental idea of decision theory is that an agent is rational
if and only if it chooses the action that yields the highest
expected utility, averaged over all the possible outcomes of the
action. This is called the principle of maximum expected utility
(MEU).
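The MEU choice among the airport plans above can be sketched numerically. The probabilities are the ones listed in the text; the utility of catching the flight (100) and the per-minute cost of waiting at the airport (0.1) are invented here purely for illustration:

```python
# Sketch of the maximum-expected-utility (MEU) principle for the
# airport example. Probabilities are from the text; the utilities and
# waiting times are assumed illustrative numbers.
plans = {
    "A25":   {"p_on_time": 0.04,   "wait_minutes": 0},
    "A90":   {"p_on_time": 0.70,   "wait_minutes": 65},
    "A120":  {"p_on_time": 0.95,   "wait_minutes": 95},
    "A1440": {"p_on_time": 0.9999, "wait_minutes": 1415},
}

def expected_utility(plan):
    # EU = P(success) * U(success) + P(failure) * U(failure) - waiting cost
    p = plan["p_on_time"]
    return p * 100 + (1 - p) * 0 - 0.1 * plan["wait_minutes"]

# A rational agent picks the action with maximum expected utility.
best = max(plans, key=lambda name: expected_utility(plans[name]))
```

Under these assumed utilities the winner is A120, not the near-certain A1440: the overnight wait destroys A1440's expected utility, matching the earlier remark that over-cautious choices are not rational.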

BASIC PROBABILITY NOTATION

What probabilities are about

Like logical assertions, probabilistic assertions are about possible
worlds. Whereas logical assertions say which possible worlds are
strictly ruled out (all those in which the assertion is false),
probabilistic assertions talk about how probable the various worlds
are. In probability theory, the set of all possible worlds is called the
sample space. The possible worlds are mutually exclusive and
exhaustive—two possible worlds cannot both be the case, and one
possible world must be the case. For example, if we are about to roll
two (distinguishable) dice, there are 36 possible worlds to consider:
(1,1), (1,2), ..., (6,6). The Greek letter Ω (uppercase omega) is used to
refer to the sample space, and ω (lowercase omega) refers to
elements of the space, that is, particular possible worlds.
A fully specified probability model associates a numerical probability
P(ω) with each possible world. The basic axioms of probability
theory say that every possible world has a probability between 0 and 1
and that the total probability of the set of possible worlds is 1:
• Sample space Ω: the set of all possible worlds
• ω ∈ Ω is a possible world (aka sample point or atomic event) ex:
the dice roll (1,4)
• the possible worlds are mutually exclusive and exhaustive. ex: the
36 possible outcomes of rolling two dice: (1,1), (1,2), ...

• A probability model (aka probability space) is a sample space with
an assignment P(ω) for every ω ∈ Ω s.t.
• 0 ≤ P(ω) ≤ 1, for every ω ∈ Ω
• Σω∈Ω P(ω) = 1


• Ex: 1-die roll: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• An Event A is any subset of Ω, s.t. P(A) = Σω∈AP(ω)
• events can be described by propositions in some formal language
ex: P(Total = 11) = P(5, 6) + P(6, 5) = 1/36 + 1/36 = 1/18
• ex: P(doubles) = P(1, 1) + P(2, 2) + ... + P(6, 6) = 6/36 = 1/6
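The two-dice probability model above can be checked directly in code. This is a minimal sketch using exact fractions, so the axioms and the two event probabilities come out exactly:

```python
from fractions import Fraction
from itertools import product

# The sample space for rolling two distinguishable dice: 36 equally
# likely worlds, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def prob(event):
    # An event is a subset of the sample space, described here by a
    # predicate; its probability is the sum over the worlds it contains.
    return sum(p for w, p in P.items() if event(w))

p_total_11 = prob(lambda w: w[0] + w[1] == 11)   # worlds (5,6) and (6,5)
p_doubles  = prob(lambda w: w[0] == w[1])        # six worlds (1,1)..(6,6)
```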
Probabilities such as P (Total = 11) and P (doubles) are called
unconditional or prior probabilities (and sometimes just “priors” for
short); they refer to degrees of belief in propositions in the absence
of any other information.
Unconditional or prior probabilities refer to degrees of belief in
propositions in the absence of any other information (evidence)
ex: P(cavity) = 0.2, P(Total = 11) = 1/18, P(doubles) = 1/6
Conditional or posterior probabilities refer to degrees of belief in
proposition a given some evidence b: P(a|b)
 evidence: information already revealed
 ex: P(cavity |toothache) = 0.6: p. of a cavity given a toothache
(assuming no other information is provided!)
 ex: P(Total = 11|die1 = 5) = 1/6: p. of total 11 given first die is
5
 restricts the set of possible worlds to those where the first die is
5
Note: P(a|... ∧ a) = 1, P(a|... ∧ ¬a) = 0. ex: P(cavity |toothache ∧
cavity) = 1, P(cavity |toothache ∧ ¬cavity) = 0
• Less specific beliefs remain valid after more evidence arrives. ex:
P(cavity) = 0.2 holds even if P(cavity |toothache) = 0.6
• New evidence may be irrelevant, allowing for simplification. ex:
P(cavity |toothache, 49ersWin) = P(cavity |toothache) = 0.6
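Conditioning can be sketched as world-restriction on the two-dice model: the evidence die1 = 5 restricts the sample space to the six worlds where the first die shows 5, and probabilities are renormalized within that restricted set.

```python
from fractions import Fraction
from itertools import product

# Two-dice sample space, as before: 36 equally likely worlds.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def prob(event):
    return sum(p for w, p in P.items() if event(w))

def cond_prob(a, b):
    # P(a | b) = P(a and b) / P(b): sum only the worlds where the
    # evidence b holds, then renormalize by P(b).
    return prob(lambda w: a(w) and b(w)) / prob(b)

p = cond_prob(lambda w: w[0] + w[1] == 11,   # Total = 11
              lambda w: w[0] == 5)           # given die1 = 5
# p = (1/36) / (6/36) = 1/6, matching the example in the text
```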

The language of propositions in probability assertions


• Factored representation of possible worlds: sets of ⟨variable, value⟩
pairs Variables in probability theory: Random variables
• domain: the set of possible values a variable can take on
• ex: Die: {1, 2, 3, 4, 5, 6}, Weather: {sunny, rain, cloudy, snow },
Odd: {true, false}
• A random variable can be seen as a function from sample points to
the domain: ex:
Die(ω), Weather (ω),... (“(ω)” typically omitted)
• Probability Distribution gives the probabilities of all the possible
values of a random Variable

For example, we can express “The probability that the patient has a
cavity, given that she is a teenager with no toothache, is 0.1” as follows:
P(cavity | ¬toothache ∧ teen) = 0.1.
Probability Distribution gives the probabilities of all the possible
values of a random variable
ex: P (Weather = sunny)=0.6
P (Weather = rain )=0.1
P (Weather = cloudy)=0.29
P (Weather = snow)=0.01 ,
but as an abbreviation we will allow
P(Weather) = ⟨0.6, 0.1, 0.29, 0.01⟩
where the bold P indicates that the result is a vector of numbers, and
where we assume a predefined ordering ⟨sunny, rain, cloudy, snow⟩
on the domain of Weather. We say that the P statement defines a
probability distribution for the random variable Weather. The P
notation is also used for conditional distributions: P(X | Y) gives the
values of P(X = xi | Y = yj) for each possible i, j pair.
For continuous variables, it is not possible to write out the entire
distribution as a vector, because there are infinitely many values.
Instead, we can define the probability that a random
variable takes on some value x as a parameterized function of x. For
example, the sentence P (NoonTemp = x)=Uniform[18C,26C](x)
expresses the belief that the temperature at noon is distributed
uniformly between 18 and 26 degrees Celsius. We call this a
probability density function.
P(Weather, Cavity) denotes the probabilities of all combinations of
the values of Weather and Cavity. This is a 4 × 2 table of probabilities
called the joint probability distribution of Weather and Cavity.

For example, the product rule for all possible values of Weather and
Cavity can be written as a single equation:
P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)

Probability axioms and their reasonableness


The basic axioms of probability imply certain relationships among the
degrees of belief that can be accorded to logically related
propositions. We think of a proposition a as the event A (the set of
sample points) where the proposition is true.
• Odd is a propositional random variable of range {true, false}
• notation: a ⇐⇒ “A = true”
• Given Boolean random variables A and B:
• a: set of sample points where A(ω) = true
• ¬a: set of sample points where A(ω) = false
• a ∧ b: set of sample points where A(ω) = true, B(ω) = true
• with Boolean random variables, sample points are PL models
Proposition: disjunction of the sample points in which it is true
ex: (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)
• Some derived facts:
P(¬a) = 1 − P(a)
We can also derive the well known formula for the probability of a
disjunction, sometimes called the inclusion–exclusion principle:
P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
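These derived facts can be verified numerically on any small model. In the sketch below, the four world probabilities over two Boolean variables A and B are arbitrary assumed numbers that sum to 1:

```python
from fractions import Fraction

# An assumed probability model over two Boolean variables A and B:
# one probability per sample point (world), summing to 1.
P = {
    (True, True):   Fraction(3, 10),   # a and b
    (True, False):  Fraction(2, 10),   # a and not b
    (False, True):  Fraction(1, 10),   # not a and b
    (False, False): Fraction(4, 10),   # neither
}

def prob(event):
    return sum(p for w, p in P.items() if event(w))

p_a       = prob(lambda w: w[0])
p_b       = prob(lambda w: w[1])
p_a_and_b = prob(lambda w: w[0] and w[1])
p_a_or_b  = prob(lambda w: w[0] or w[1])

# P(not a) = 1 - P(a), and inclusion-exclusion for the disjunction:
assert prob(lambda w: not w[0]) == 1 - p_a
assert p_a_or_b == p_a + p_b - p_a_and_b
```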

INFERENCE USING FULL JOINT DISTRIBUTIONS


Probabilistic inference is the computation of posterior
probabilities for query propositions given observed evidence. We use
the full joint distribution as the “knowledge base” from which
answers to all questions may be derived.
Example: a domain consisting of just the three Boolean variables
Toothache, Cavity, and Catch
The full joint distribution is a 2×2×2 table as shown in Figure 13.3:

              toothache                ¬toothache
          catch     ¬catch         catch     ¬catch
cavity    0.108     0.012          0.072     0.008
¬cavity   0.016     0.064          0.144     0.576

Notice that the probabilities in the joint distribution sum to 1, as
required by the axioms of probability.
For example, there are six possible worlds in which cavity ∨
toothache holds:
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 +
0.064 = 0.28.
Adding the entries in the first row gives the unconditional or marginal
probability of cavity:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
This process is called marginalization, or summing out—because we
sum up the probabilities for each possible value of the other variables,
thereby taking them out of the equation.
We can write the following general marginalization rule for any sets
of variables Y and Z:
P(Y) = Σz∈Z P(Y, z)
where the summation is over all possible combinations of values z of
the variables Z.
For example, we can compute the probability of a cavity, given
evidence of a toothache, as follows:
P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.12 / 0.2 = 0.6
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / 0.2 = 0.08 / 0.2 = 0.4
The two values sum to 1.0, as they should. Notice that in these two
calculations the term 1/P(toothache) remains constant, no matter
which value of Cavity we calculate. In fact, it can be viewed as a
normalization constant α for the distribution P(Cavity | toothache),
ensuring that it adds up to 1. We can write the two preceding
equations in one:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩] = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
A general inference procedure: the query is P(X | e) and can be
evaluated as
P(X | e) = α P(X, e) = α Σy P(X, e, y)
where the summation is over all possible ys (i.e., all possible
combinations of values of the unobserved variables Y). Notice that
together the variables X, E, and Y constitute the complete set of
variables for the domain, so P(X, e, y) is simply a subset of
probabilities from the full joint distribution.
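This enumeration procedure can be sketched directly over the three-variable dental joint distribution. The entries below are the Figure 13.3 values: six of them are quoted in the text, and the remaining two (0.144 and 0.576) are the standard values that make the table sum to 1.

```python
# Inference by enumeration over the full joint distribution of
# (Toothache, Catch, Cavity), keyed by Boolean world tuples.
joint = {
    # (toothache, catch, cavity): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}

def prob(event):
    # P(event): sum the probabilities of the worlds where it holds.
    return sum(p for w, p in joint.items() if event(w))

def query(target, evidence):
    # P(target | evidence): the division by P(evidence) plays the role
    # of the normalization constant alpha = 1/P(evidence).
    return prob(lambda w: target(w) and evidence(w)) / prob(evidence)

p_cavity = query(lambda w: w[2], lambda w: w[0])   # P(cavity | toothache)
p_either = prob(lambda w: w[2] or w[0])            # P(cavity or toothache)
```

Summing out Catch happens implicitly: `prob` sums over all worlds consistent with the event, which is exactly the Σy over the unobserved variables.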

INDEPENDENCE
Expand the full joint distribution in Figure 13.3 by adding a fourth
variable, Weather. The full joint distribution then becomes
P(Toothache , Catch, Cavity, Weather ), which has 2 × 2 × 2 × 4 = 32
entries. It contains four “editions” of the table shown in Figure 13.3,
one for each kind of weather. What relationship do these editions
have to each other and to the original three-variable table? For
example, how are P(toothache, catch, cavity, cloudy) and
P(toothache, catch, cavity) related? We can use the product rule:

P(toothache, catch, cavity, cloudy)
= P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity)
Now, the weather does not influence the dental variables. Therefore,
the following assertion seems reasonable:
P(cloudy | toothache, catch, cavity) = P(cloudy).
From this, we can deduce
P (toothache , catch , cavity , cloudy)=P (cloudy)P (toothache ,
catch , cavity ) .
More generally, we can write the equation
P(Toothache, Catch, Cavity, Weather)
= P(Toothache, Catch, Cavity) P(Weather).
The 32-element table for four variables can be constructed from one
8-element table and one 4-element table. This decomposition is
illustrated schematically in the figure below.
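The decomposition can be sketched in code: the 32-entry joint is rebuilt as the product of an 8-entry dental table (the Figure 13.3 values, with 0.144 and 0.576 completing the table so it sums to 1) and the 4-entry weather prior used earlier in the text.

```python
# Independence lets us factor the 32-entry joint
# P(Toothache, Catch, Cavity, Weather) into two small tables.
dental = {
    # (toothache, catch, cavity): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}
weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

# P(toothache, catch, cavity, w) = P(toothache, catch, cavity) * P(w)
joint = {d + (w,): p * q
         for d, p in dental.items()
         for w, q in weather.items()}
```

Storing 8 + 4 = 12 numbers instead of 32 is the whole point: independence assertions shrink the representation, and the gain grows with the number of independent variables.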

BAYES’ RULE AND ITS USE

We have defined the product rule. It can actually be written in two forms:
P(a ∧ b) = P(a | b) P(b) and P(a ∧ b) = P(b | a) P(a)
Equating the two right-hand sides and dividing by P(a) gives Bayes’ rule:
P(b | a) = P(a | b) P(b) / P(a)
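Bayes' rule in a one-line numerical check. The numbers below are illustrative assumptions, not values given in the text: a stiff-neck/meningitis style diagnosis with P(s | m) = 0.7, prior P(m) = 1/50000, and P(s) = 0.01.

```python
# Bayes' rule: P(b | a) = P(a | b) P(b) / P(a).
# Assumed illustrative numbers for a medical-diagnosis sketch:
p_s_given_m = 0.7        # P(stiff neck | meningitis), causal direction
p_m = 1 / 50000          # P(meningitis), prior
p_s = 0.01               # P(stiff neck), prior

# Turn the causal direction P(s | m) into the diagnostic
# direction P(m | s).
p_m_given_s = p_s_given_m * p_m / p_s
```

Even with a strong causal link (0.7), the tiny prior keeps the diagnostic probability very small (0.0014), which is exactly the kind of correction Bayes' rule provides.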


THE WUMPUS WORLD REVISITED
We want to calculate the probability that each of the three squares
contains a pit. (For this example we ignore the wumpus and the gold.)
The
relevant properties of the wumpus world are that (1) a pit causes
breezes in all neighboring squares, and (2) each square other than
[1,1] contains a pit with probability 0.2. The first step is to identify
the set of random variables we need: a Boolean variable Pi,j for each
square (true iff square [i, j] contains a pit) and a Boolean variable
Bi,j for each observed square (true iff that square is breezy). The full
joint distribution is P(P1,1, ..., P4,4, B1,1, B1,2, B2,1). Applying the
product rule, we have
P(P1,1, ..., P4,4, B1,1, B1,2, B2,1)
= P(B1,1, B1,2, B2,1 | P1,1, ..., P4,4) P(P1,1, ..., P4,4)
This decomposition makes it easy to see what the joint probability
values should be. The first term is the conditional probability
distribution of a breeze configuration, given a pit configuration; its
values are 1 if the breezes are adjacent to the pits and 0 otherwise.
The second term is the prior probability of a pit configuration. Each
square contains a pit with probability 0.2, independently of the other
squares; hence, for a configuration with n pits,
P(P1,1, ..., P4,4) = Πi,j P(Pi,j) = 0.2^n × 0.8^(16−n)
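The pit-configuration prior can be sketched as a small function. The 0.2 pit probability is from the text; the sanity check uses a tiny 3-square version rather than all 2^16 configurations.

```python
from itertools import product

# Prior of a pit configuration: each square holds a pit independently
# with probability 0.2, so a configuration with n pits among N squares
# has prior 0.2**n * 0.8**(N - n) (the text uses N = 16).
P_PIT = 0.2

def prior(config):
    # config: tuple of booleans, one per square (True = pit)
    n = sum(config)
    return P_PIT**n * (1 - P_PIT)**(len(config) - n)

# Sanity check on a tiny 3-square version: the priors of all 2**3
# configurations must sum to 1.
total = sum(prior(c) for c in product([True, False], repeat=3))
```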
