cs511 Uncertainty
The uncertainty may also be caused by the represented knowledge itself, since it might be incomplete, imprecise, or only an approximation of the expert's true knowledge.
Three approaches to managing uncertainty are covered:
1. Probabilistic reasoning
2. Certainty factors
3. Dempster-Shafer theory
1. Classical Probability

The oldest and best-defined technique for managing uncertainty is based on classical probability theory. Let us start to review it by introducing some terms.

Sample space: the set of all possible outcomes of an experiment, denoted S.

For example, if the outcome of an experiment consists in the determination of the sex of a newborn child, then

S = {g, b}

where g denotes girl and b denotes boy.
Event: any subset E of the sample space is known as
an event.
A formal theory of probability can be made using three axioms:

1. 0 ≤ P(E) ≤ 1.
2. The probabilities of a set of mutually exclusive and exhaustive events sum to 1; in particular, P(Ei) + P(Ei′) = 1 for an event Ei and its complement Ei′.
3. P(E1 ∪ E2) = P(E1) + P(E2) when E1 and E2 are mutually exclusive events.
Compound probabilities

The compound (joint) probability of two events A and B is written P(A ∩ B); for independent events, P(A ∩ B) = P(A) P(B).
Conditional Probabilities

The conditional probability of event A, given that event B has occurred, is written P(A | B) and defined as

P(A | B) = P(A ∩ B) / P(B),  for P(B) ≠ 0.
An example

As an example of probabilities, the table below shows hypothetical probabilities of a disk crash (C) using a Brand X drive (X) within one year.

------------------------------------------------------------
                 X            X′            Row total
------------------------------------------------------------
C                P(C ∩ X)     P(C ∩ X′)     P(C)
C′               P(C′ ∩ X)    P(C′ ∩ X′)    P(C′)
Column total     P(X)         P(X′)         1
------------------------------------------------------------
(6) The probability of a crash, given that Brand X is used, is

P(C | X) = P(C ∩ X) / P(X) = 0.6 / 0.8 = 0.75
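As a quick check of equation (6), here is a minimal Python sketch (the dictionary layout and names are mine, not from the slides) that recovers the marginal P(X) and the conditional P(C | X) from the joint table:

```python
# Hypothetical joint probabilities from the table above:
# C = crash, C' = no crash; X = Brand X, X' = other brand.
joint = {
    ("C", "X"): 0.6, ("C", "X'"): 0.1,
    ("C'", "X"): 0.2, ("C'", "X'"): 0.1,
}

# Marginal P(X): sum the X column over crash and no-crash.
p_x = joint[("C", "X")] + joint[("C'", "X")]        # 0.8

# Conditional probability P(C | X) = P(C and X) / P(X).
p_c_given_x = joint[("C", "X")] / p_x
print(p_c_given_x)                                  # 0.75
```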
Reading the table with the hypothetical values: some drives are Brand X and have crashed (0.6), some are Brand X and have not crashed (0.2), some are not Brand X and have crashed (0.1), and some are not Brand X and have not crashed (0.1).

Two events A and B are independent if

P(A | B) = P(A), or
P(B | A) = P(B), or
P(A ∩ B) = P(A) P(B)
Bayes' Theorem

From the definition of conditional probability,

P(H | E) = P(H ∩ E) / P(E),  for P(E) ≠ 0

Bayes' theorem expresses this posterior probability in terms of the likelihood P(E | H), as derived next.
From conditional probability:

P(H | E) = P(H ∩ E) / P(E)

Furthermore, we have

P(E | H) = P(E ∩ H) / P(H)

So

P(E | H) P(H) = P(H | E) P(E) = P(H ∩ E)

Thus

P(H | E) = P(E | H) P(H) / P(E)

For a set of mutually exclusive and exhaustive hypotheses Hi, so that Σj P(E ∩ Hj) = P(E), the posterior for each Hi is

P(Hi | E) = P(E ∩ Hi) / Σj P(E ∩ Hj) = P(E | Hi) P(Hi) / Σj P(E | Hj) P(Hj)
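Bayes' theorem translates directly into code. The sketch below (function and variable names are mine; the sample numbers are the prior/likelihood values used in the belief-propagation example later in these notes) computes the posteriors over mutually exclusive, exhaustive hypotheses:

```python
def bayes_posterior(priors, likelihoods):
    """priors[i] = P(Hi); likelihoods[i] = P(E | Hi).
    Returns P(Hi | E) = P(E | Hi) P(Hi) / sum_j P(E | Hj) P(Hj)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_e = sum(joint)                  # total probability of the evidence
    return [j / p_e for j in joint]

# Values from the three-hypothesis table later in these notes:
print(bayes_posterior([0.6, 0.3, 0.1], [0.3, 0.8, 0.3]))
# [0.4, 0.533..., 0.066...]
```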
Hypothetical reasoning and backward induction

Bayes' theorem is commonly used for decision tree analysis in business and the social sciences.
As an example, consider a prospector deciding whether to drill for oil at a site.

[Figure: prior probabilities. The prior subjective opinion of the site, P(Hi): P(O) = 0.6 (oil) and P(O′) = 0.4 (no oil).]
A seismic test of the site gives a + or − result. Using the addition law to calculate the total (unconditional) probability P(E) of a + and a − test:

P(−) = 0.48,  P(+) = 0.52

The posterior probabilities of the site, P(Hi | E) = P(E | Hi) P(Hi) / P(E), are

P(O′ | −) = 3/4,  P(O | −) = 1/4,  P(O′ | +) = 1/13,  P(O | +) = 12/13
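The slides do not preserve the seismic-test likelihoods, but P(+ | O) = 0.8 and P(+ | O′) = 0.1 reproduce every number above; the following sketch, with those assumed likelihoods, recomputes the totals and posteriors:

```python
# Oil-site example: total probability of a test result and posteriors.
p_o, p_no = 0.6, 0.4                      # priors: oil, no oil
p_pos_o, p_pos_no = 0.8, 0.1              # assumed: P(+|O), P(+|O')

p_pos = p_pos_o * p_o + p_pos_no * p_no   # P(+) = 0.52 (addition law)
p_neg = 1 - p_pos                         # P(-) = 0.48

print(p_pos_o * p_o / p_pos)              # P(O | +) = 12/13 = 0.923...
print((1 - p_pos_o) * p_o / p_neg)        # P(O | -) = 1/4
```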
The figure below shows the Bayesian decision tree using the data from the figures above. The payoffs are at the bottom of the tree. Thus if oil is found, the payoff is

$1,250,000 − $200,000 − $50,000 = $1,000,000

while a decision to quit after the seismic test gives a payoff of −$50,000.

[Figure: Bayesian decision tree. Event node A branches on the test result (+ or −) with P(−) = 0.48 and P(+) = 0.52. Decision nodes D (after −) and E (after +) choose quit or drill. Event nodes B and C then branch on oil or no oil with the posteriors P(O′ | −) = 3/4, P(O | −) = 1/4, P(O′ | +) = 1/13, P(O | +) = 12/13. Leaf payoffs: −$50,000 for quitting, −$1,000,000 for drilling a dry site, $1,000,000 for striking oil.]
In order for the prospector to make the best decision,
the expected payoff must be calculated at event node
A.
The expected payoff at an event node is the sum, over its branches, of each payoff times the probability of reaching it.

[Figure: the decision tree with expected payoffs filled in. At node C (drill after a + test): (1/13)(−$1,000,000) + (12/13)($1,000,000) = $846,153, so decision node E chooses drill. At node B (drill after a − test): (3/4)(−$1,000,000) + (1/4)($1,000,000) = −$500,000, so decision node D chooses quit, worth −$50,000. At node A: (0.48)(−$50,000) + (0.52)($846,153) = $416,000.]
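A minimal sketch of this rollback computation (names are mine; probabilities and payoffs are those in the tree above):

```python
# Expected payoff at an event node: sum of probability * payoff.
def expected(branches):
    return sum(p * v for p, v in branches)

# Event nodes B and C: drilling after a - or + test result.
drill_after_neg = expected([(3/4, -1_000_000), (1/4, 1_000_000)])    # -500000
drill_after_pos = expected([(1/13, -1_000_000), (12/13, 1_000_000)]) # 846153.8

# Decision nodes D and E: take the better of quitting (-50000) or drilling.
node_d = max(-50_000, drill_after_neg)   # quit after a - test
node_e = max(-50_000, drill_after_pos)   # drill after a + test

# Event node A: weight by the test-result probabilities.
print(expected([(0.48, node_d), (0.52, node_e)]))    # ~416000
```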
The decision tree shows the optimal strategy for the prospector: if the seismic test is positive, the site should be drilled; otherwise, the site should be abandoned.
Bayes’ rule and knowledge-based
systems
As we know, rule-based systems express knowledge in an
IF-THEN format:
IF X is true
THEN Y can be concluded with probability p
Within the rule given above, X denotes some piece of evidence (typically referred to as E) and Y denotes some hypothesis (H). Bayes' rule then gives

(1) P(H | E) = P(E | H) P(H) / P(E)

or, expanding P(E) over H and its complement,

(2) P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | H′) P(H′)]
For example, suppose

P(H) = P(Rob has a cold) = 0.2
P(E | H) = P(Rob was observed sneezing | Rob has a cold) = 0.75
P(E | H′) = P(Rob was observed sneezing | Rob does not have a cold) = 0.2

Then

P(H | E) = (0.75)(0.2) / [(0.75)(0.2) + (0.2)(0.8)] = 0.15 / 0.31 = 0.48

and

P(H′ | E) = 1 − 0.48 = 0.52
We can also determine what his probability of having a cold would be if he were not sneezing:

P(H | E′) = P(E′ | H) P(H) / P(E′) = (1 − 0.75)(0.2) / (1 − 0.31) = 0.07246
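A quick numerical check of both results (a sketch; the variable names are mine):

```python
p_h = 0.2                    # P(H): Rob has a cold
p_e_h, p_e_nh = 0.75, 0.2    # P(E|H), P(E|H'): sneezing likelihoods

p_e = p_e_h * p_h + p_e_nh * (1 - p_h)    # P(E) = 0.31
print(p_e_h * p_h / p_e)                  # P(H | E)  = 0.4838...
print((1 - p_e_h) * p_h / (1 - p_e))      # P(H | E') = 0.07246...
```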
Propagation of Belief
Generalizing Bayes' rule to multiple pieces of evidence E1, …, Ek and hypotheses Hi gives

(3) P(Hi | E1 E2 … Ek) = [P(E1 | Hi) P(E2 | Hi) … P(Ek | Hi)] P(Hi) / Σn [P(E1 | Hn) P(E2 | Hn) … P(Ek | Hn)] P(Hn)

This equation is derived based on several assumptions:
- the hypotheses Hi are mutually exclusive and exhaustive, and
- the pieces of evidence are conditionally independent given each hypothesis.
To illustrate how belief is propagated through a system using Bayes' rule, consider the values shown in the table below. These values represent (hypothetically) three mutually exclusive and exhaustive hypotheses, together with the conditional probabilities of two pieces of evidence under each.

---------------------------------------------
               i = 1     i = 2     i = 3
---------------------------------------------
P(Hi)           0.6       0.3       0.1
P(E1 | Hi)      0.3       0.8       0.3
P(E2 | Hi)      0.6       0.9       0.0
---------------------------------------------
If we observe evidence E1 (e.g., the patient sneezes), we can compute posterior probabilities for the hypotheses using equation (3) (where k = 1):

P(H1 | E1) = (0.3)(0.6) / [(0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)] = 0.4
P(H2 | E1) = (0.8)(0.3) / [(0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)] = 0.53
P(H3 | E1) = (0.3)(0.1) / [(0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)] = 0.07

If evidence E2 is also observed, equation (3) with k = 2 gives

P(H1 | E1 E2) = (0.3)(0.6)(0.6) / [(0.3)(0.6)(0.6) + (0.8)(0.9)(0.3) + (0.3)(0.0)(0.1)] = 0.33
P(H2 | E1 E2) = (0.8)(0.9)(0.3) / [(0.3)(0.6)(0.6) + (0.8)(0.9)(0.3) + (0.3)(0.0)(0.1)] = 0.67

P(H3 | E1 E2) = (0.3)(0.0)(0.1) / [(0.3)(0.6)(0.6) + (0.8)(0.9)(0.3) + (0.3)(0.0)(0.1)] = 0.0
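Equation (3) in code, as a sketch (names are mine; priors and likelihoods are from the table above):

```python
priors  = [0.6, 0.3, 0.1]    # P(Hi)
like_e1 = [0.3, 0.8, 0.3]    # P(E1 | Hi)
like_e2 = [0.6, 0.9, 0.0]    # P(E2 | Hi)

def propagate(priors, *evidence_rows):
    """Posteriors under equation (3): multiply in each (conditionally
    independent) piece of evidence, then normalize."""
    joint = list(priors)
    for row in evidence_rows:
        joint = [j * l for j, l in zip(joint, row)]
    total = sum(joint)
    return [j / total for j in joint]

print(propagate(priors, like_e1))           # ~ [0.40, 0.53, 0.07]
print(propagate(priors, like_e1, like_e2))  # ~ [0.33, 0.67, 0.0]
```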
Advantages and disadvantages of
Bayesian methods
3. Often the type of relationship between the hypothesis
and evidence is important in determining how the
uncertainty will be managed. Reducing these
associations to simple numbers removes relevant
information that might be needed for successful
reasoning about the uncertainties. For example,
Bayesian-based medical diagnostic systems have
failed to gain acceptance because physicians distrust
systems that cannot provide explanations describing
how a conclusion was reached (a feature difficult to
provide in a Bayesian-based system).
2: Certainty factors

From probability theory,

P(H) + P(H′) = 1

and so

(1) P(H) = 1 − P(H′)

For the case of a posterior hypothesis that relies on evidence E:

(2) P(H | E) = 1 − P(H′ | E)
The MYCIN knowledge engineers found that while an expert would agree with equation (2) in the abstract, they became uneasy and refused to agree with the probability result it implies.
Suppose, for example, you state:

(4) I am 70% certain that I will graduate if I get an 'A' in this course.

Assuming that you agree with (4) (or perhaps your own value for the likelihood), then by equation (2), P(H′ | E) = 1 − 0.7 = 0.3; that is, you must also claim to be 30% certain that you will not graduate, which people are typically unwilling to accept.
Measures of belief and disbelief

In MYCIN the certainty factor was defined in terms of two components,

CF(H, E) = MB(H, E) − MD(H, E)

where

MB(H, E) is the measure of increased belief in hypothesis H given evidence E, and
MD(H, E) is the measure of increased disbelief in H given E.
In diagnosis, the disease with the highest CF would be the one that is first investigated by ordering tests. MB and MD are defined in terms of probabilities as follows:

MB(H, E) = 1  if P(H) = 1
         = (max[P(H | E), P(H)] − P(H)) / (1 − P(H))  otherwise

MD(H, E) = 1  if P(H) = 0
         = (min[P(H | E), P(H)] − P(H)) / (−P(H))  otherwise
According to these definitions, some characteristics are shown in Table 5-1.

------------------------------------------------------
Characteristics                   Values
------------------------------------------------------
Ranges                            0 ≤ MB ≤ 1
                                  0 ≤ MD ≤ 1
                                  −1 ≤ CF ≤ 1
------------------------------------------------------
Certainly true hypothesis         MB = 1
  P(H | E) = 1                    MD = 0
                                  CF = 1
------------------------------------------------------
Certainly false hypothesis        MB = 0
  P(H′ | E) = 1                   MD = 1
                                  CF = −1
------------------------------------------------------
Lack of evidence                  MB = 0
  P(H | E) = P(H)                 MD = 0
                                  CF = 0
------------------------------------------------------
Table 5-1
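The definitions and the table entries are easy to check in code; a sketch (function names are mine):

```python
def mb(p_h_e, p_h):
    """Measure of increased belief in H given E."""
    return 1.0 if p_h == 1 else (max(p_h_e, p_h) - p_h) / (1 - p_h)

def md(p_h_e, p_h):
    """Measure of increased disbelief in H given E."""
    return 1.0 if p_h == 0 else (min(p_h_e, p_h) - p_h) / (0 - p_h)

def cf(p_h_e, p_h):
    return mb(p_h_e, p_h) - md(p_h_e, p_h)

print(cf(1.0, 0.5))   #  1.0: certainly true hypothesis
print(cf(0.0, 0.5))   # -1.0: certainly false hypothesis
print(cf(0.5, 0.5))   #  0.0: lack of evidence
```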
The certainty factor, CF, indicates the net belief in a hypothesis based on some evidence.
A CF of 70% means that the belief is 70% greater than the disbelief. Certainty factors for a hypothesis and its negation satisfy

CF(H, E) + CF(H′, E) = 0

which means, for example, that the statements

(6) I am 70% certain that I will graduate if I get an 'A' in this course.
(7) I am −70% certain that I will not graduate if I get an 'A' in this course.

make the same claim. A CF of 0 means there is no evidence either way.
So certainty factors greater than 0 favor the hypothesis, while certainty factors less than 0 favor the negation of the hypothesis; statements (6) and (7) are equivalent using certainty factors. CF values like these might be elicited by asking the expert how strongly they believe, or disbelieve, the hypothesis given the evidence.
Calculation with Certainty Factors

The certainty factor was originally defined as

CF = MB − MD

but was later redefined as

CF = (MB − MD) / (1 − min(MB, MD))
This softens the effect of a single piece of disconfirming evidence on many confirming pieces of evidence. Under this definition, with MB = 0.999 and MD = 0.799,

CF = (0.999 − 0.799) / (1 − min(0.999, 0.799)) = 0.200 / (1 − 0.799) = 0.995

When evidence in a rule antecedent is combined with AND, OR, and NOT, the certainty of the antecedent is computed as shown in Table 5-2.

-------------------------------------------------------------
Evidence, E                 Antecedent certainty
-------------------------------------------------------------
E1 AND E2                   min[CF(E1, e), CF(E2, e)]
E1 OR E2                    max[CF(E1, e), CF(E2, e)]
NOT E                       −CF(E, e)
-------------------------------------------------------------
Table 5-2
For example, the antecedent (E1 AND E2 AND E3) OR (E4 AND NOT E5) has certainty

E = max[min(E1, E2, E3), min(E4, −E5)]

For the values

E1 = 0.9, E2 = 0.8, E3 = 0.3, E4 = −0.5, E5 = −0.4

the result is

E = max[min(0.9, 0.8, 0.3), min(−0.5, −(−0.4))]
  = max[0.3, −0.5]
  = 0.3

The certainty factor of the hypothesis H in a rule

IF E THEN H

is given by

CF(H, e) = CF(E, e) CF(H, E)

where CF(E, e) is the certainty factor of the evidence E making up the antecedent of the rule, based on uncertain evidence e, and CF(H, E) is the certainty factor of the rule itself, i.e., of H when the evidence is known with certainty.
Thus, if all the evidence in the antecedent is known with certainty, i.e.,

CF(E1, e) = CF(E2, e) = CF(E3, e) = 1

the formula for the certainty factor of the hypothesis is

CF(H, e) = CF(H, E)

since CF(E, e) = 1.

What happens when the evidence is not all known with certainty? For example, if CF(E1, e) = 0.5, CF(E2, e) = 0.6, and CF(E3, e) = 0.3 for the conjunctive antecedent E = E1 AND E2 AND E3, then

CF(E, e) = CF(E1 AND E2 AND E3, e)
         = min[CF(E1, e), CF(E2, e), CF(E3, e)]
         = min[0.5, 0.6, 0.3]
         = 0.3
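A sketch of the antecedent combinations and the rule formula (helper names are mine; the rule certainty CF(H, E) = 0.7 is an assumed value, consistent with the combined-evidence example below):

```python
# Antecedent certainties per Table 5-2.
def cf_and(*cfs): return min(cfs)
def cf_or(*cfs):  return max(cfs)
def cf_not(c):    return -c

# E = E1 AND E2 AND E3 with uncertain evidence:
cf_e = cf_and(0.5, 0.6, 0.3)      # CF(E, e) = 0.3

# IF E THEN H, with assumed rule certainty CF(H, E) = 0.7:
cf_h = cf_e * 0.7                 # CF(H, e) = CF(E, e) * CF(H, E) = 0.21
print(cf_h)
```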
What happens when another rule also concludes the same hypothesis, but with a different certainty factor? The two certainty factors are merged:

(9) CFCOMBINE(CF1, CF2)
    = CF1 + CF2 (1 − CF1)                     if both CF1, CF2 > 0
    = (CF1 + CF2) / (1 − min(|CF1|, |CF2|))   if one of CF1 and CF2 < 0
    = CF1 + CF2 (1 + CF1)                     if both CF1, CF2 < 0
The following figure summarizes the calculations with certainty factors for two rules based on uncertain evidence and concluding the same hypothesis.

[Figure: Rule 1 and Rule 2 each conclude hypothesis H from uncertain evidence; each rule's CF(H, e) is computed as above, and the two results are merged with CFCOMBINE.]
In our above example, the first rule concludes streptococcus with certainty factor CF1 = CF(E, e) CF(H, E) = (0.3)(0.7) = 0.21. If another rule also concludes streptococcus with certainty factor CF2 = 0.5, then the combined certainty using the first formula of (9) is

0.21 + 0.5 (1 − 0.21) = 0.605

and if further disconfirming evidence with CF = −0.4 arrives, the second formula of (9) gives

(0.605 − 0.4) / (1 − min(0.605, 0.4)) = 0.205 / (1 − 0.4) = 0.34

The CFCOMBINE formula preserves the commutativity of evidence, that is,

CFCOMBINE(X, Y) = CFCOMBINE(Y, X)
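CFCOMBINE as a sketch (the 0.21 and −0.4 follow the example above; names are mine):

```python
from math import isclose

def cf_combine(cf1, cf2):
    """Merge two CFs for the same hypothesis, per (9)."""
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

step1 = cf_combine(0.21, 0.5)     # 0.605: two confirming rules
step2 = cf_combine(step1, -0.4)   # 0.34:  disconfirming evidence arrives
print(step1, step2)
print(isclose(cf_combine(0.21, 0.5), cf_combine(0.5, 0.21)))  # commutative
```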
Advantages and disadvantages of certainty factors

The CF formalism has been quite popular with expert system developers since its creation because it is simple to compute, separates belief from disbelief, and lets experts state confidences without committing to a full probability model.
A question remains, however: is the competence of such systems due to their ability to manipulate and reason with uncertainty, or is it due to other factors? Some studies have shown that changing the certainty factors, or even turning off the CF reasoning portion of MYCIN, does not seem to affect the correctness of diagnoses much.

Certainty factors can also disagree with probability rankings. For example, if

P(H1) = 0.8    P(H2) = 0.2
P(H1 | E) = 0.9    P(H2 | E) = 0.8

then CF(H1, E) = 0.5 and CF(H2, E) = 0.75, so the hypothesis with the higher posterior probability receives the lower certainty factor.
3: Dempster-Shafer Theory
Frames of discernment

Dempster-Shafer theory starts with a set of mutually exclusive possible elements, called the environment:

Θ = {θ1, θ2, …, θn}

For example, in identifying an aircraft the environment might be

Θ = {airliner, bomber, fighter} = {A, B, F}

An environment whose elements are interpreted as possible answers to a question is called a frame of discernment.
Each subset of Θ can be interpreted as a possible answer to a question. Since the elements are mutually exclusive and the environment is exhaustive, there can be only one correct answer subset to a question.
Mass Functions and Ignorance

A fundamental difference between Dempster-Shafer theory and probability theory is the treatment of ignorance. With no information at all about N mutually exclusive events, probability theory must still spread the probability evenly, assigning each event

P = 1/N

Dempster-Shafer theory does not force belief to be assigned in the absence of evidence.
Any belief that is not assigned to a specific subset is considered nonbelief (no belief), and is simply associated with the environment Θ.
Suppose, for example, that evidence supports the hypothesis that an aircraft is hostile to degree 0.7. Here Dempster-Shafer theory differs sharply from probability theory, which would assume that

P(hostile) = 0.7
P(non-hostile) = 1 − 0.7 = 0.3

In Dempster-Shafer theory the remaining 0.3 is not assigned to non-hostile; it is nonbelief, associated with Θ.
We now state things more formally. A mass function (basic probability assignment) m assigns a number in [0, 1] to each subset of Θ so that the masses over all subsets sum to 1; the subsets with nonzero mass are the focal elements, and the set of focal elements is called the core. Total ignorance is represented by the vacuous mass function

m0(x) = 1 if x = Θ
        0 otherwise

Each proper subset of Θ gets assigned the number 0. The core of m0 is equal to {Θ}.
For example, if evidence of degree 0.7 supports the set {heart-attack, pulmonary-embolism, aortic-dissection}, the corresponding mass function is

m2(x) = 0.7 if x = {heart-attack, pulmonary-embolism, aortic-dissection}
        0.3 if x = Θ
        0 otherwise
Combining evidence

Dempster-Shafer theory provides a combination rule for computing, from two pieces of evidence and their associated mass functions m1 and m2, a new mass m1 ⊕ m2 describing the combined influence of the two pieces of evidence:

m1 ⊕ m2(Z) = Σ m1(X) m2(Y)  over all X, Y with X ∩ Y = Z
For the aircraft example, suppose one sensor yields m1({B, F}) = 0.7 (with m1(Θ) = 0.3) and a second yields m2({B}) = 0.9 (with m2(Θ) = 0.1). One of the intersection products is

m1({B, F}) m2({B}) = (0.7)(0.9) = 0.63,  with {B, F} ∩ {B} = {B}

Once the individual mass products have been calculated as shown above, then according to Dempster's rule the products over the common set of intersections are added:

m3({B}) = m1 ⊕ m2({B}) = m1({B, F}) m2({B}) + m1(Θ) m2({B})
        = 0.63 + 0.27 = 0.90   (bomber)
We now have two belief values for the bomber, 0.9 (a lower bound) and 1 (an upper bound). This pair represents a range of belief; it is called an evidential interval.
The belief function Bel (also called support) is defined to be the total belief of a set and all its subsets:

Bel(X) = Σ m(Y)  over all Y ⊆ X

For example,

Bel1 ⊕ Bel2(Θ) = m1 ⊕ m2(Θ) + m1 ⊕ m2({B, F}) + m1 ⊕ m2({B})
               = 0.03 + 0.07 + 0.90 = 1
Bel(Θ) = 1 in all cases, since the sum of the masses must always equal 1. On the other hand,

Bel({A, F}) = m1 ⊕ m2({A, F}) + m1 ⊕ m2({A}) + m1 ⊕ m2({F}) = 0 + 0 + 0 = 0

The evidential interval of a set X is [Bel(X), 1 − Bel(X′)]. Then

EI({B, F}) = [0.90 + 0.07, 1 − 0] = [0.97, 1]
EI({A}) = [0, 1 − 0.97] = [0, 0.03]
The upper bound is the plausibility, defined as the degree to which the evidence fails to refute X:

Pls(X) = 1 − Bel(X′)

so the evidential interval is EI(X) = [Bel(X), Pls(X)].
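The whole aircraft example, Dempster's rule plus belief, plausibility, and evidential intervals, fits in a short sketch (a hypothetical implementation; the set representation and names are mine):

```python
THETA = frozenset("ABF")    # environment: airliner, bomber, fighter

def combine(m1, m2):
    """Dempster's rule without normalization: sum mass products over
    each non-empty intersection of focal elements."""
    out = {}
    for x, mx in m1.items():
        for y, my in m2.items():
            z = x & y
            if z:
                out[z] = out.get(z, 0.0) + mx * my
    return out

def bel(m, x):
    """Total belief: mass of x and all of its subsets."""
    return sum(v for y, v in m.items() if y <= x)

def ei(m, x):
    """Evidential interval [Bel(X), Pls(X)], with Pls(X) = 1 - Bel(X')."""
    return (bel(m, x), 1 - bel(m, THETA - x))

m1 = {frozenset("BF"): 0.7, THETA: 0.3}
m2 = {frozenset("B"): 0.9, THETA: 0.1}
m12 = combine(m1, m2)              # {B}: 0.90, {B,F}: 0.07, Theta: 0.03
print(ei(m12, frozenset("BF")))    # (0.97, 1.0)
print(ei(m12, frozenset("A")))     # (0, 0.03)
```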
The Normalization of Belief

Let us see an example. Suppose a third piece of evidence now reports conflicting evidence of an airliner, say m3({A}) = 0.95 and m3(Θ) = 0.05 (these values reproduce the products below). Combining it with m1 ⊕ m2 and keeping only the non-empty intersections gives

m1 ⊕ m2 ⊕ m3({A}) = 0.0285
m1 ⊕ m2 ⊕ m3({B}) = 0.045
m1 ⊕ m2 ⊕ m3({B, F}) = 0.0035
m1 ⊕ m2 ⊕ m3(Θ) = 0.0015

with every other subset receiving 0. Note that for this example, the sum of all the masses is less than 1:

Σ m1 ⊕ m2 ⊕ m3(X) = 0.0285 + 0.045 + 0.0035 + 0.0015 = 0.0785

However, a sum of 1 is required because the combined evidence m1 ⊕ m2 ⊕ m3 is a valid mass and the sum over all focal elements must be 1. This is a problem.

The solution to this problem is a normalization of the focal elements: divide each mass by

1 − κ

where the conflict κ is defined, for any sets X and Y, as

κ = Σ m1(X) m2(Y)  over all X, Y with X ∩ Y = ∅

Here κ = (0.90)(0.95) + (0.07)(0.95) = 0.9215, so 1 − κ = 0.0785, and the normalized masses are

m1 ⊕ m2 ⊕ m3({A}) = 0.0285 / 0.0785 = 0.363
m1 ⊕ m2 ⊕ m3({B}) = 0.045 / 0.0785 = 0.573
m1 ⊕ m2 ⊕ m3({B, F}) = 0.0035 / 0.0785 = 0.045
m1 ⊕ m2 ⊕ m3(Θ) = 0.0015 / 0.0785 = 0.019
The total normalized belief in {B} is now

Bel({B}) = m1 ⊕ m2 ⊕ m3({B}) = 0.573
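The normalized combination as a sketch (the same set representation as before; m3's values are the inferred ones):

```python
THETA = frozenset("ABF")

def combine_normalized(m1, m2):
    """Dempster's rule with normalization: discard the conflicting
    mass kappa and rescale the rest by 1 - kappa."""
    out, kappa = {}, 0.0
    for x, mx in m1.items():
        for y, my in m2.items():
            z = x & y
            if z:
                out[z] = out.get(z, 0.0) + mx * my
            else:
                kappa += mx * my          # mass assigned to conflicts
    return {z: v / (1 - kappa) for z, v in out.items()}

m12 = {frozenset("B"): 0.90, frozenset("BF"): 0.07, THETA: 0.03}
m3  = {frozenset("A"): 0.95, THETA: 0.05}    # inferred third evidence
print(combine_normalized(m12, m3))
# {A}: 0.363, {B}: 0.573, {B,F}: 0.045, Theta: 0.019
```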
Difficulty with the Dempster-Shafer theory

Normalization is also the source of the theory's best-known difficulty: because the conflicting mass is simply discarded and the remainder rescaled, highly conflicting evidence can produce counterintuitive results.