Chapter 13: Uncertainty
Prepared by
Atalla Salh
Ihab Ameer
Melad Muzhar
ACTING UNDER UNCERTAINTY
When an agent knows enough facts about its
environment, the logical approach enables it to derive
plans that are guaranteed to work. This is a good thing.
Unfortunately, agents almost never have access to the
whole truth about their environment. Agents must,
therefore, act under uncertainty.
Uncertainty arises because of both
laziness and ignorance. It is inescapable in
complex, dynamic, or inaccessible worlds.
Propositions
We have seen two formal languages, propositional logic and first-order logic, for
stating propositions. Probability theory typically uses a language that is slightly more
expressive than propositional logic.
The basic element of the language is the random variable, which can be thought of as
referring to a "part" of the world whose "status" is initially unknown.
Each random variable has a domain of values that it can take on.
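As a small illustration (not from the slides; the variable names and domains below are assumed), random variables and their domains can be written down as a simple mapping in Python:

# A minimal sketch: random variables and their domains (names/values assumed).
domains = {
    "Cavity": [True, False],                          # Boolean random variable
    "Toothache": [True, False],                       # Boolean random variable
    "Weather": ["sunny", "rain", "cloudy", "snow"],   # discrete random variable
}
for var, dom in domains.items():
    print(f"{var} can take on the values {dom}")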
Probability Theory: The Basics (AI Style)
• Like logical assertions, probabilistic assertions are about possible worlds.
• Logical assertions say which possible worlds (interpretations) are ruled out (those in which the KB assertions are false).
• Probabilistic assertions talk about how probable the various worlds are.
The Basics (cont’d)
• The set of possible worlds is called the sample space (denoted by Ω). The elements of Ω (sample points) will be denoted by ω.
• The possible worlds of Ω (e.g., the outcomes of throwing a die) are mutually exclusive and exhaustive.
• In standard probability theory textbooks, instead of possible worlds we talk about outcomes, and instead of sets of possible worlds we talk about events (e.g., the event that two dice sum to 11).
• We will represent events by propositions in a logical language, which we will define formally later.
Atomic events
The notion of an atomic event is useful in understanding the
foundations of probability theory.
An atomic event is a complete specification of the
state of the world about which the agent is uncertain.
It can be thought of as an assignment of particular
values to all the variables of which the world is
composed.
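A small Python sketch (the variable names are assumed, not taken from the slides) of what "a complete specification of the state of the world" looks like: every atomic event assigns a value to every variable, and the events are mutually exclusive and exhaustive.

# A minimal sketch: enumerate the atomic events over Boolean variables.
from itertools import product

variables = ["Cavity", "Toothache"]                   # assumed Boolean variables
atomic_events = [dict(zip(variables, values))
                 for values in product([True, False], repeat=len(variables))]
for event in atomic_events:
    print(event)
# With n Boolean variables there are 2**n atomic events; exactly one is true.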
Prior probability
The unconditional or prior probability associated with a proposition a is
the degree of belief accorded to it in the absence of any other
information; it is written as P(a). For example, if
the prior probability that I have a cavity is 0.1, then we would write
P(Cavity = true) = 0.1 or P(cavity) = 0.1.
Conditional probability
Once the agent has obtained some evidence concerning the previously
unknown random variables making up the domain, prior probabilities are
no longer applicable. Instead, we use conditional or posterior
probabilities. The notation used is P(a|b), where a and b are any
propositions. This is read as "the probability of a, given that all we
know is b." For example,
P(cavity | toothache) = 0.8
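A hedged Python sketch of how the conditional probability relates to the joint and the prior; the two input numbers are illustrative placeholders chosen so the result matches the 0.8 above, not values given on the slide.

# A minimal sketch: P(a | b) = P(a AND b) / P(b), with assumed numbers.
p_cavity_and_toothache = 0.12    # assumed joint probability
p_toothache = 0.15               # assumed prior probability of the evidence
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(p_cavity_given_toothache)  # 0.8, matching the example above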
We have defined a syntax for propositions and for prior and conditional probability
statements about those propositions. Now we must provide some sort of semantics for
probability statements. We begin with the basic axioms that serve to define the
probability scale and its endpoints:
1. All probabilities are between 0 and 1. For any proposition a,
   0 ≤ P(a) ≤ 1
2. Necessarily true (i.e., valid) propositions have probability 1, and necessarily false
   (i.e., unsatisfiable) propositions have probability 0:
   P(true) = 1    P(false) = 0
Next, we need an axiom that connects the probabilities of logically related
propositions. The simplest way to do this is to define the probability of a disjunction as
follows:
3. The probability of a disjunction is given by
   P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
Using the axioms of probability
We can derive a variety of useful facts from the basic axioms. For
example, the familiar rule for negation follows by substituting ¬a for b in
axiom 3, giving us:
P(a ∨ ¬a) = P(a) + P(¬a) − P(a ∧ ¬a)   (by axiom 3 with b = ¬a)
P(true) = P(a) + P(¬a) − P(false)      (by logical equivalence)
1 = P(a) + P(¬a)                       (by axiom 2)
P(¬a) = 1 − P(a)                       (by algebra)
The third line of this derivation is itself a useful fact and can be extended
from the Boolean case to the general discrete case. Let the discrete
variable D have the domain (d1, ..., dn); then the probabilities of its values
must sum to 1: Σi P(D = di) = 1.
Why the axioms of probability are reasonable
The axioms of probability can be seen as restricting the set of probabilistic beliefs that an
agent can hold. This is somewhat analogous to the logical case, where a logical agent cannot
simultaneously believe A, B, and ¬(A ∧ B), for example. There is, however, an additional
complication. In the logical case, the semantic definition of conjunction means that at least
one of the three beliefs just mentioned must be false in the world, so it is unreasonable for an
agent to believe all three. With probabilities, on the other hand, statements refer not to the
world directly, but to the agent's own state of knowledge. Why, then, can an agent not hold
the following set of beliefs, which clearly violates axiom 3?
P(a) = 0.4        P(a ∧ b) = 0.0
P(b) = 0.3        P(a ∨ b) = 0.8
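A short Python sketch of the violation: axiom 3 fixes P(a ∨ b) once the other three numbers are chosen, and the numbers above disagree with it.

# A minimal sketch: check a belief set against axiom 3,
# P(a OR b) = P(a) + P(b) - P(a AND b).
def violates_axiom_3(p_a, p_b, p_a_and_b, p_a_or_b, tol=1e-9):
    return abs(p_a_or_b - (p_a + p_b - p_a_and_b)) > tol

print(violates_axiom_3(0.4, 0.3, 0.0, 0.8))  # True: axiom 3 demands 0.7, not 0.8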
The Joint Probability Distribution
If we have more than one random variable and we are considering
problems that involve two or more of these variables at the same time,
then the joint probability distribution specifies degrees of belief in the
values that these variables take jointly.
The joint probability distribution P(X), where X is a vector of random
variables, is usually specified graphically by an n-dimensional table
(where n is the dimension of X).
Example: two Boolean variables, Toothache and Cavity.
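The table itself did not survive extraction, so the sketch below fills it with illustrative placeholder numbers (an assumption, not the slide's values) to show how a two-variable joint distribution can be stored and marginalized.

# Illustrative placeholder numbers (assumed): joint distribution over
# the Boolean variables Toothache and Cavity, stored as a 2x2 table.
joint = {
    (True,  True):  0.12,    # P(toothache AND cavity)
    (True,  False): 0.08,    # P(toothache AND NOT cavity)
    (False, True):  0.08,    # P(NOT toothache AND cavity)
    (False, False): 0.72,    # P(NOT toothache AND NOT cavity)
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # mutually exclusive and exhaustive

# Marginal probability of Cavity, obtained by summing out Toothache.
p_cavity = sum(p for (toothache, cavity), p in joint.items() if cavity)
print(p_cavity)  # 0.2 with these placeholder numbers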
The Full Joint Probability Distribution
The full joint probability distribution is the joint probability
distribution for all random variables.
If we have this distribution, then we can compute the
probability of any propositional sentence using the formulas
about probabilities we presented earlier.
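A hedged Python sketch of that claim: with a full joint over Toothache, Cavity, and Catch (the eight probabilities below are illustrative placeholders, not the slides' values), the probability of any propositional sentence is obtained by summing the atomic events in which it holds.

# A minimal sketch: inference by enumeration over a full joint distribution.
full_joint = {
    # (toothache, cavity, catch): probability  (assumed placeholder values)
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(holds):
    """Sum the probabilities of all atomic events in which the proposition holds."""
    return sum(p for world, p in full_joint.items() if holds(world))

p_toothache = prob(lambda w: w[0])                  # P(toothache) = 0.2
p_cav_and_tooth = prob(lambda w: w[0] and w[1])     # P(cavity AND toothache) = 0.12
print(p_cav_and_tooth / p_toothache)                # P(cavity | toothache) = 0.6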
Independence
The notion of independence captures the situation
when the probability of a random variable taking a
certain value is not influenced by the fact
that we know the value of some other variable.
Definition. Two propositions a and b are called independent if
P(a | b) = P(a) (equivalently: P(b | a) = P(b) or P(a ∧ b) = P(a) P(b)).
Definition. Two random variables X and Y are called independent if
P(X | Y) = P(X) (equivalently: P(Y | X) = P(Y) or P(X, Y) = P(X) P(Y)).
Example
P(Weather | Toothache, Catch, Cavity) = P(Weather)
Note: Zeus might be an exception to this rule!
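A small Python check of the product form of the definition, using assumed illustrative numbers:

# A minimal sketch: a and b are independent iff P(a AND b) == P(a) * P(b).
def independent(p_a, p_b, p_a_and_b, tol=1e-9):
    return abs(p_a_and_b - p_a * p_b) < tol

print(independent(0.2, 0.5, 0.10))  # True:  0.10 == 0.2 * 0.5
print(independent(0.2, 0.5, 0.12))  # False: 0.12 != 0.2 * 0.5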
Applying Bayes' rule: The simple case
On the surface, Bayes' rule does not seem very useful. It requires three terms (a conditional
probability and two unconditional probabilities) just to compute one conditional probability.
Bayes' rule is useful in practice because there are many cases where we do have good
probability estimates for these three numbers and need to compute the fourth. In a task such
as medical diagnosis, we often have conditional probabilities on causal relationships and want
to derive a diagnosis. A doctor knows that the disease meningitis causes the patient to have
a stiff neck, say, 50% of the time. The doctor also knows some unconditional facts: the prior
probability that a patient has meningitis is 1/50,000, and the prior probability that any patient
has a stiff neck is 1/20. Letting s be the proposition that the patient has a stiff neck and m be
the proposition that the patient has meningitis, we can apply Bayes' rule as shown below.
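The slide stops before the calculation itself; reconstructing it in LaTeX from the numbers just given (the final value is derived here rather than copied from the slide):

P(m \mid s) = \frac{P(s \mid m)\,P(m)}{P(s)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002

That is, only about 1 patient in 5000 with a stiff neck is expected to have meningitis.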
Applying Bayes’ Rule: Combining Evidence
What happens when we have two or more pieces of evidence?
Example: What can a dentist conclude if her steel probe catches
in the aching tooth of a patient?
How can we compute P(Cavity | toothache ∧ catch)?
Combining Evidence (cont’d)
• Use the full joint distribution table (does not scale).
• Use Bayes' rule:
P(Cavity | toothache ∧ catch) = P(toothache ∧ catch | Cavity) P(Cavity) / P(toothache ∧ catch)
This approach does not scale either if we have a large number of evidence variables.
Question: Can we use independence? (See the sketch below.)
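A hedged Python sketch of where that question leads; this is an assumption about the missing continuation: if toothache and catch are conditionally independent given Cavity, the numerator of Bayes' rule factors into per-evidence terms. All numbers are illustrative placeholders.

# A sketch under the conditional-independence assumption:
# P(Cavity | toothache, catch) ∝ P(toothache | Cavity) P(catch | Cavity) P(Cavity).
p_cavity = 0.2                                    # assumed prior
p_toothache_given = {True: 0.6, False: 0.1}       # assumed P(toothache | Cavity=v)
p_catch_given     = {True: 0.9, False: 0.2}       # assumed P(catch | Cavity=v)

unnorm = {
    True:  p_toothache_given[True]  * p_catch_given[True]  * p_cavity,
    False: p_toothache_given[False] * p_catch_given[False] * (1 - p_cavity),
}
total = sum(unnorm.values())
posterior = {v: p / total for v, p in unnorm.items()}
print(round(posterior[True], 3))  # ~0.871 with these placeholder numbers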
Combining Evidence (cont’d)