Chapter 5
UNCERTAINTY
Summarizing uncertainty
Let’s consider an example of uncertain reasoning: diagnosing a
dental patient’s toothache. A medical diagnosis:
• Given the symptoms (toothache), infer the cause (cavity).
How do we encode this relation in logic?
• diagnostic rules:
  • Toothache → Cavity (wrong)
  • Toothache → (Cavity ∨ GumProblem ∨ Abscess ∨ ...) (too many
    possible causes, some very unlikely)
• causal rules:
  • Cavity → Toothache (wrong)
  • (Cavity ∧ ...) → Toothache (many possible co-causes)
• Problems in specifying the correct logical rules:
  • Complexity: too many possible antecedents or consequents
  • Theoretical ignorance: no complete theory for the domain
  • Practical ignorance: no complete knowledge of the patient
Trying to use logic to cope with a domain like medical diagnosis thus
fails for three main reasons:
Laziness: It is too much work to list the complete set of
antecedents or consequents needed to ensure an exceptionless
rule and too hard to use such rules.
Theoretical ignorance: Medical science has no complete theory
for the domain.
Practical ignorance: Even if we know all the rules, we might
be uncertain about a particular patient because not all the
necessary tests have been or can be run.
The connection between toothaches and cavities is just not a logical
consequence in either direction. This is typical of the medical domain,
as well as most other judgmental domains: law, business, design,
automobile repair, gardening, dating, and so on. The agent’s knowledge
can at best provide only a degree of belief in the relevant sentences.
Our main tool for dealing with degrees of belief is probability theory.
Probability provides a way of summarizing the uncertainty that
comes from our laziness and ignorance, thereby solving the
qualification problem.
• Probability allows us to summarize the uncertainty that comes from:
  • laziness: failure to enumerate exceptions, qualifications, etc.
  • ignorance: lack of relevant facts, initial conditions, etc.
• Probability can be derived from:
  • statistical data (e.g., 80% of toothache patients so far had cavities),
  • some knowledge (e.g., 80% of toothache patients have cavities),
  • or their combination; see the sketch below.
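As a rough sketch of the statistical-data route, the conditional probability can be estimated as a relative frequency over patient records (the records below are hypothetical):

```python
# Minimal sketch: estimating P(cavity | toothache) as a relative frequency
# over (hypothetical) patient records.
records = [
    {"toothache": True,  "cavity": True},
    {"toothache": True,  "cavity": True},
    {"toothache": True,  "cavity": True},
    {"toothache": True,  "cavity": True},
    {"toothache": True,  "cavity": False},
    {"toothache": False, "cavity": False},
]

toothache_cases = [r for r in records if r["toothache"]]
with_cavity = [r for r in toothache_cases if r["cavity"]]

# Relative frequency as an estimate of the conditional probability:
p_cavity_given_toothache = len(with_cavity) / len(toothache_cases)
print(p_cavity_given_toothache)  # 0.8, matching "80% of toothache patients"
```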
• Probability statements are made with respect to a state of knowledge
(aka evidence), not with respect to the real world
• e.g., “The probability that the patient has a cavity, given that she has
a toothache, is 0.8”:
• P(HasCavity(patient) | hasToothAche(patient)) = 0.8
• Probabilities of propositions change with new evidence:
• “The probability that the patient has a cavity, given that she has a
toothache and a history of gum disease, is 0.4”:
• P(HasCavity(patient) | hasToothAche(patient) ∧ HistoryOfGum(patient)) = 0.4
• “The probability that the patient has a cavity, given that she is a
teenager with no toothache, is 0.1”:
• P(cavity | ¬toothache ∧ teen) = 0.1
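As a sketch of how the same query shifts with the evidence, consider a made-up joint table over Cavity, Toothache, and GumHistory; the entries are placeholders chosen only so that the first two conditionals quoted above come out to 0.8 and 0.4:

```python
# Hypothetical joint distribution over (cavity, toothache, gum_history);
# the numbers are invented so the conditionals match the statements above.
joint = {
    (True,  True,  True):  0.04, (False, True,  True):  0.06,
    (True,  True,  False): 0.28, (False, True,  False): 0.02,
    (True,  False, True):  0.01, (False, False, True):  0.09,
    (True,  False, False): 0.05, (False, False, False): 0.45,
}

def p_cavity_given(evidence):
    """P(Cavity = true | evidence), where evidence is a predicate on (c, t, g)."""
    consistent = {k: v for k, v in joint.items() if evidence(*k)}
    total = sum(consistent.values())
    return sum(v for (c, t, g), v in consistent.items() if c) / total

print(p_cavity_given(lambda c, t, g: t))        # ≈ 0.8: toothache only
print(p_cavity_given(lambda c, t, g: t and g))  # ≈ 0.4: toothache ∧ gum history
```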
A probability distribution gives the probabilities of all the possible
values of a random variable, e.g.:
P(Weather = sunny) = 0.6
P(Weather = rain) = 0.1
P(Weather = cloudy) = 0.29
P(Weather = snow) = 0.01
but as an abbreviation we will allow
P(Weather) = ⟨0.6, 0.1, 0.29, 0.01⟩
where the bold P indicates that the result is a vector of numbers, and
where we assume a predefined ordering sunny, rain , cloudy , snow
on the domain of Weather. We say that the P statement defines a
probability distribution for the random variable Weather. The P
notation is also used for conditional distributions: P(X | Y) gives the
values of P(X = xi | Y = yj) for each possible i, j pair.
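As a small sketch, the P(Weather) vector and a conditional table P(X | Y) can be stored directly; the season variable and its numbers below are made up for illustration:

```python
# The P(Weather) statement as a vector over a fixed ordering of the domain.
weather_domain = ("sunny", "rain", "cloudy", "snow")
p_weather = (0.6, 0.1, 0.29, 0.01)  # P(Weather) = <0.6, 0.1, 0.29, 0.01>

# A conditional distribution P(Weather | Season) is a table holding
# P(Weather = w_i | Season = s_j) for every (i, j) pair; the numbers for
# each fixed season must sum to 1. (Illustrative, invented values.)
p_weather_given_season = {
    "summer": (0.8, 0.05, 0.14, 0.01),
    "winter": (0.3, 0.15, 0.35, 0.20),
}
for season, dist in p_weather_given_season.items():
    assert abs(sum(dist) - 1.0) < 1e-9  # each conditional distribution sums to 1
```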
For continuous variables, it is not possible to write out the entire
distribution as a vector, because there are infinitely many values.
Instead, we can define the probability that a random
variable takes on some value x as a parameterized function of x. For
example, the sentence P(NoonTemp = x) = Uniform[18C, 26C](x)
expresses the belief that the temperature at noon is distributed
uniformly between 18 and 26 degrees Celsius. We call this a
probability density function.
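A minimal sketch of that density as an ordinary function of x, using the 18 to 26 degree bounds from the example:

```python
def uniform_pdf(x, low=18.0, high=26.0):
    """Density of Uniform[low, high] at x; zero outside the interval."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

print(uniform_pdf(20.5))  # 0.125 per degree Celsius: a density, not a probability
print(uniform_pdf(30.0))  # 0.0: outside [18, 26]
```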
P(Weather, Cavity) denotes the
probabilities of all combinations of the values of Weather and Cavity.
This is a 4 × 2 table of probabilities called the joint probability
distribution of Weather and Cavity.
For example, the product rule for all possible values of Weather and
Cavity can be written as a single equation:
P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
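A sketch checking this equation entry by entry on a small table; the joint values are invented for illustration:

```python
# Invented 4 x 2 joint P(Weather, Cavity).
joint = {
    ("sunny",  True): 0.12,  ("sunny",  False): 0.48,
    ("rain",   True): 0.02,  ("rain",   False): 0.08,
    ("cloudy", True): 0.058, ("cloudy", False): 0.232,
    ("snow",   True): 0.002, ("snow",   False): 0.008,
}

# Marginal P(Cavity), obtained by summing out Weather.
p_cavity = {c: sum(v for (w, cc), v in joint.items() if cc == c)
            for c in (True, False)}

# Product rule: P(weather, cavity) = P(weather | cavity) * P(cavity), entrywise.
for (w, c), v in joint.items():
    p_w_given_c = v / p_cavity[c]  # P(weather | cavity), by definition
    assert abs(p_w_given_c * p_cavity[c] - v) < 1e-9
```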
The two values P(cavity | toothache) = 0.6 and P(¬cavity | toothache) = 0.4
sum to 1.0, as they should. Notice that in these two calculations the term
1/P(toothache) remains constant, no matter which value of Cavity we
calculate. In fact, it can be viewed as a normalization constant for the
distribution P(Cavity | toothache), ensuring that it adds up to 1. Using α
to denote such a constant, we can write the two preceding equations in one:
P(Cavity | toothache) = α P(Cavity, toothache)
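A sketch of the normalization step, reusing the full-joint entries of the textbook's Figure 13.3:

```python
# Full joint P(Cavity, Toothache, Catch) from Figure 13.3.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# Unnormalized P(Cavity, toothache): fix Toothache = true and sum out Catch.
unnormalized = {c: sum(v for (cc, t, _), v in joint.items() if cc == c and t)
                for c in (True, False)}          # {True: 0.12, False: 0.08}

alpha = 1.0 / sum(unnormalized.values())         # alpha = 1 / P(toothache) = 5.0
p_cavity_given_toothache = {c: alpha * v for c, v in unnormalized.items()}
print(p_cavity_given_toothache)                  # ≈ {True: 0.6, False: 0.4}
```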
A general inference procedure: the query P(X | e) can be evaluated as
P(X | e) = α P(X, e) = α Σy P(X, e, y)
where the summation ranges over all possible combinations of values y
of the unobserved variables Y.
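A sketch of this procedure by enumeration over a full joint table; the dict-of-assignments representation and variable names are our own choices:

```python
# P(X | e) = alpha * sum_y P(X, e, y): fix the query value, keep only joint
# entries consistent with the evidence, sum out the hidden variables, normalize.
def query(x_var, evidence, joint, domains):
    dist = {}
    for x_val in domains[x_var]:
        target = dict(evidence, **{x_var: x_val})
        dist[x_val] = sum(p for a, p in joint.items()
                          if all(dict(a)[v] == val for v, val in target.items()))
    alpha = 1.0 / sum(dist.values())  # normalization constant
    return {x_val: alpha * p for x_val, p in dist.items()}

domains = {"Cavity": (True, False), "Toothache": (True, False), "Catch": (True, False)}
figures = {(True, True, True): 0.108,  (True, True, False): 0.012,
           (True, False, True): 0.072, (True, False, False): 0.008,
           (False, True, True): 0.016, (False, True, False): 0.064,
           (False, False, True): 0.144, (False, False, False): 0.576}
joint = {(("Cavity", c), ("Toothache", t), ("Catch", ca)): p
         for (c, t, ca), p in figures.items()}

print(query("Cavity", {"Toothache": True}, joint, domains))  # ≈ {True: 0.6, False: 0.4}
```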
INDEPENDENCE
Expand the full joint distribution in Figure 13.3 by adding a fourth
variable, Weather. The full joint distribution then becomes
P(Toothache , Catch, Cavity, Weather ), which has 2 × 2 × 2 × 4 = 32
entries. It contains four “editions” of the table shown in Figure 13.3,
one for each kind of weather. What relationship do these editions
have to each other and to the original three-variable table? For
example, how are P(toothache, catch, cavity, cloudy) and
P(toothache, catch, cavity) related? We can use the product rule:
P(toothache, catch, cavity, cloudy) =
P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity)
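A sketch of that relationship under the (assumed) independence of Weather from the dental variables, where each weather "edition" is the three-variable table scaled by the weather's probability:

```python
# Product rule: P(toothache, catch, cavity, cloudy)
#   = P(cloudy | toothache, catch, cavity) * P(toothache, catch, cavity).
# If Weather is independent of the dental variables, the conditional
# collapses to P(cloudy), so each weather "edition" is the 3-variable
# table rescaled by that weather's probability.
p_weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}
p_dental = {(True, True, True): 0.108}  # one Figure 13.3 entry, for brevity

big_joint = {(t, ca, c, w): pd * pw
             for (t, ca, c), pd in p_dental.items()
             for w, pw in p_weather.items()}

print(big_joint[(True, True, True, "cloudy")])  # 0.108 * 0.29 ≈ 0.03132
```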