Probability Theory: Uncertainty Measure: Lecture Module 23


Uncertainty
● Most intelligent systems have some degree of
uncertainty associated with them.
● Uncertainty may occur in a knowledge-based system (KBS) because of
problems with the data.
− Data might be missing or unavailable.
− Data might be present but unreliable or ambiguous due to
measurement errors, multiple conflicting measurements, etc.
− The representation of the data may be imprecise or
inconsistent.
− Data may just be an expert's best guess.
− Data may be based on defaults, and the defaults may have
exceptions.
Cont…
● Given the numerous sources of error, most KBS
require some form of uncertainty
management.
● For any uncertainty scheme, we must be
concerned with three issues.
− How to represent uncertain data?
− How to combine two or more pieces of uncertain data?
− How to draw inferences using uncertain data?
● Probability theory is the oldest such approach and has a strong
mathematical basis.
● Other methods for handling uncertainty include Bayesian
belief networks, certainty factor theory, etc.
Probability Theory
● Probability is a way of turning opinion or expectation into
numbers.
● A probability is a number between 0 and 1 that reflects the likelihood of an
event.
● When all outcomes are equally likely, the chance that a particular event will occur = the
number of ways the event can occur divided by the total
number of possible outcomes.
Example: The probability of throwing two successive
heads with a fair coin is 0.25.
− There are four equally likely outcomes:
HH, HT, TH and TT
− Since there is only one way of getting HH,
probability = ¼ = 0.25
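The counting definition can be checked by enumerating the sample space. The following is a minimal Python sketch, not part of the original lecture; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses of a fair coin.
outcomes = ["".join(t) for t in product("HT", repeat=2)]

# Outcomes that belong to the event "two successive heads".
favorable = [o for o in outcomes if o == "HH"]

# Probability = favorable outcomes / all outcomes.
p_two_heads = Fraction(len(favorable), len(outcomes))
print(outcomes, float(p_two_heads))  # ['HH', 'HT', 'TH', 'TT'] 0.25
```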
Event
Event: Every subset A of the sample space S is
called an event.
− The null set Φ is the impossible event.
− S is the sure event.
● P(A) is the notation for the probability of an event A.
● P(Φ) = 0 and P(S) = 1
● If S is partitioned into mutually exclusive and exhaustive events A1, A2, …, An,
their probabilities must sum to certainty, i.e. P(A1) + … + P(An) = 1
● Since events are sets, all set
operations can be performed on them.
● If A and B are events, then
− A ∩ B, A ∪ B and A' are also events.
− A - B is the event "A but not B".
− Events A and B are mutually exclusive if A ∩ B = Φ
Axioms of Probability
● Let S be a sample space and let A and B be events.
− P(A) ≥ 0
− P(S) = 1
− If events A and B are mutually exclusive, then
P(A ∪ B) = P(A) + P(B)
● In general, for mutually exclusive events A1,…,An in S
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)
● The following properties follow from these axioms:
− P(A') = 1 - P(A)
− P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
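These properties can be verified on a small finite sample space. The sketch below assumes a fair six-sided die, which is not an example from the lecture; the events A and B are chosen only for illustration.

```python
from fractions import Fraction

# Assumed example: sample space of a fair six-sided die.
S = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event & S), len(S))

A = frozenset({2, 4, 6})  # "even number"
B = frozenset({4, 5, 6})  # "greater than 3"

assert prob(S) == 1 and prob(frozenset()) == 0         # sure and impossible events
assert prob(S - A) == 1 - prob(A)                      # complement rule
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)  # addition rule
print(prob(A | B))  # 2/3
```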
Joint Probability
● The joint probability of two events A and B is
written as P(A and B) = P(A ∩ B). If A and B are independent, then
P(A and B) = P(A ∩ B) = P(A) * P(B)
Example: We toss two fair coins separately.
Let P(A) = 0.5 be the probability of getting a head on the first coin and
P(B) = 0.5 the probability of getting a head on the second coin.
● The joint probability of getting heads on both coins is
P(A and B)
= P(A) * P(B) = 0.5 × 0.5 = 0.25
● The probability of getting heads on one or on both of the coins, i.e.
the probability of the union of events A and B, is
P(A or B) = P(A ∪ B) = P(A) + P(B) - P(A) * P(B)
= 0.5 + 0.5 - 0.25
= 0.75
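The two-coin figures can be reproduced by counting outcomes directly instead of applying the product rule. This is a rough Python sketch; the helper name prob is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

# Each outcome is a pair (first coin, second coin); all four are equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Fraction of outcomes for which the predicate `event` holds."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_a = prob(lambda o: o[0] == "H")                      # head on first coin
p_b = prob(lambda o: o[1] == "H")                      # head on second coin
p_both = prob(lambda o: o[0] == "H" and o[1] == "H")   # P(A and B)
p_either = prob(lambda o: o[0] == "H" or o[1] == "H")  # P(A or B)

# For independent events the joint probability equals the product.
assert p_both == p_a * p_b
assert p_either == p_a + p_b - p_both
print(p_both, p_either)  # 1/4 3/4
```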
Conditional Probability
● It relates the probability of one event to the occurrence
of another i.e. probability of the occurrence of an
event H given that an event E is known to have
occurred.
● Probability of an event H (Hypothesis), given the
occurrence of an event E (evidence) is denoted by
P(H | E) and is defined as follows:

P(H | E) = (Number of events favorable to H which are also favorable to E) / (No. of events favorable to E)
= P(H and E) / P(E)
Example
● What is the probability that a person chosen at random is male, given that
the person is 80 years old?
● The following probabilities are given:
− The probability that a person chosen at random is male is about 0.50.
− The probability that a person chosen at
random is 80 years old is 0.005.
− The probability that a person chosen at random is both
male and 80 years old is 0.002.
● The probability that an 80-year-old person chosen at
random is male is calculated as follows:
P(X is male | Age of X is 80)
= [P(X is male and the age of X is 80)] / [P(Age of X is 80)]
= 0.002 / 0.005 = 0.4
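The same calculation as a small Python helper. The function name conditional is hypothetical, and the guard against P(E) = 0 is an addition not discussed in the lecture.

```python
def conditional(p_h_and_e, p_e):
    """P(H | E) = P(H and E) / P(E); undefined when P(E) is zero."""
    if p_e <= 0:
        raise ValueError("P(E) must be positive")
    return p_h_and_e / p_e

# Values from the example: P(male and 80) = 0.002, P(80) = 0.005.
p_male_given_80 = conditional(0.002, 0.005)
print(round(p_male_given_80, 3))  # 0.4
```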
Conditional Probability with Multiple Evidences
● If there are n evidences and one hypothesis, then the
conditional probability is defined as follows:
P(H | E1 and … and En) = P(H and E1 and … and En) / P(E1 and … and En)
Bayes’ Theorem
• Bayes’ theorem provides a mathematical model for
reasoning in which prior beliefs are combined
with evidence to get estimates of uncertainty.
• This approach relies on the concept that one should
incorporate the prior probability of an event into the
interpretation of a situation.
• It relates the conditional probabilities of events.
• It allows us to express the probability P(H | E) in terms
of the probabilities P(E | H), P(H) and P(E):
P(H|E) = [P(E|H) * P(H)] / P(E)
Proof of Bayes’ Theorem
● Bayes’ theorem is derived from conditional probability.
Proof: Using conditional probability
P(H|E) = P(H and E) / P(E)
⇒ P(H|E) * P(E) = P(H and E) (1)
Also P(E|H) = P(E and H) / P(H)
⇒ P(E|H) * P(H) = P(E and H) (2)

From Eqs (1) and (2), since P(H and E) = P(E and H), we get
P(H|E) * P(E) = P(E|H) * P(H)
Hence, we obtain
P(H|E) = [P(E|H) * P(H)] / P(E)
Extension of Bayes’ Theorem
● Consider one hypothesis H and two evidences E1 and
E2.
● If E1 and E2 are conditionally independent given H, the probability of H when both E1 and E2 are true is
calculated by using the following formula:
P(H|E1 and E2) = [P(E1|H) * P(E2|H) * P(H)] / P(E1 and E2)
Contd..
● Consider one hypothesis H and multiple evidences
E1,…, En.
● Under the same conditional independence assumption, the probability of H when E1,…, En are true is calculated
by using the following formula:
P(H|E1 and … and En) = [P(E1|H) * … * P(En|H) * P(H)] / P(E1 and … and En)
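A possible way to evaluate this formula in practice is sketched below. It makes two assumptions beyond the slide: the evidences are conditionally independent given ~H as well as given H, and the denominator P(E1 and … and En) is obtained by summing over H and ~H. The numbers are hypothetical.

```python
from math import prod

def posterior(prior, likelihoods_h, likelihoods_not_h):
    """P(H | E1 and ... and En) under conditional independence of the Ei.

    likelihoods_h[i] is P(Ei | H); likelihoods_not_h[i] is P(Ei | ~H).
    """
    joint_h = prod(likelihoods_h) * prior                 # P(E1..En and H)
    joint_not_h = prod(likelihoods_not_h) * (1 - prior)   # P(E1..En and ~H)
    return joint_h / (joint_h + joint_not_h)              # denominator is P(E1..En)

# Hypothetical values: P(H) = 0.2, two pieces of evidence.
p = posterior(0.2, [0.75, 0.6], [0.2, 0.1])
print(round(p, 3))  # 0.849
```

With a single piece of evidence the function reduces to the ordinary form of Bayes' theorem.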
Example
● Find the probability that Bob has a cold (the hypothesis) given that
he sneezes (the evidence), i.e., calculate P(H | E).
● Suppose that we are given the following:
P(H) = P(Bob has a cold) = 0.2
P(E | H) = P(Bob was observed sneezing
| Bob has a cold) = 0.75
P(E | ~H) = P(Bob was observed sneezing
| Bob does not have a cold) = 0.2
Now
P(H | E) = P(Bob has a cold | Bob was observed sneezing)
= [ P(E | H) * P(H) ] / P(E)
Cont…
● We can compute P(E) as follows:
P(E) = P(E and H) + P(E and ~H)
= P(E | H) * P(H) + P(E | ~H) * P(~H)
= (0.75)(0.2) + (0.2)(0.8) = 0.31
− Hence P(H | E) = (0.75 * 0.2) / 0.31 = 0.48387
− We can conclude that “Bob’s probability of having a cold given
that he sneezes” is about 0.48.
● Further, we can also determine his probability of
having a cold if he was not sneezing:
P(H | ~E) = [P(~E | H) * P(H)] / P(~E)
= [(1 – 0.75) * 0.2] / (1 – 0.31)
= 0.05 / 0.69 = 0.072
− Hence “Bob’s probability of having a cold if he was not
sneezing” is about 0.072.
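The arithmetic of Bob's example can be reproduced with a short Python sketch. The function name bayes is illustrative; it returns both the posterior and the total probability of the evidence.

```python
def bayes(p_e_given_h, p_h, p_e_given_not_h):
    """Return (P(H | E), P(E)) using the total probability rule for P(E)."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e, p_e

# Sneezing observed: P(E | H) = 0.75, P(H) = 0.2, P(E | ~H) = 0.2.
p_h_given_e, p_e = bayes(0.75, 0.2, 0.2)

# Not sneezing: use the complements P(~E | H) = 0.25 and P(~E | ~H) = 0.8.
p_h_given_not_e, p_not_e = bayes(1 - 0.75, 0.2, 1 - 0.2)

print(round(p_e, 2), round(p_h_given_e, 5), round(p_h_given_not_e, 3))
# 0.31 0.48387 0.072
```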
Advantages and Disadvantages of Bayesian Approach
Advantages:
● Bayesian methods have a sound theoretical foundation in probability
theory and thus are currently the most mature of all
uncertainty reasoning methods.
● They also have well-defined semantics for decision
making.
Disadvantages:
● They require a significant amount of probability data
to construct a KB.
− For example, a diagnostic system having 50 detectable
conclusions (R) and 300 relevant and observable
characteristics (S) requires a minimum of 15,050 (R*S + R)
probability values, assuming that all of the conclusions are
mutually exclusive.
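The count R*S + R can be computed directly. The reading of the two terms in the comments (one conditional probability per conclusion and characteristic pair, plus one prior per conclusion) is an interpretation, not something the slide states.

```python
def min_probability_values(conclusions, characteristics):
    """Minimum number of probability values, R*S + R.

    Assumed reading: R*S conditional probabilities (one per
    conclusion/characteristic pair) plus R prior probabilities.
    """
    return conclusions * characteristics + conclusions

print(min_probability_values(50, 300))  # 15050
```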
Cont…

● If conditional probabilities are based on
− statistical data, the sample sizes must be sufficient so that
the probabilities obtained are accurate.
− human experts, then the question arises of whether the values are consistent and
comprehensive.
● The reduction of the associations between the
hypothesis and evidence to numbers also eliminates
the knowledge embedded within them.
− The ability to explain the reasoning to a user, and to browse through the
hierarchy from evidence to hypothesis, is lost.
Probabilities in Facts and Rules of Production System
● Some expert systems use Bayesian theory to derive
further concepts.
− We know that KB = facts + rules
● We normally assume that facts are always
completely true, but facts might also be only probably true.
● A probability can be put as the last argument of the
predicate representing a fact.
Example:
● The fact "the battery in a randomly picked car is dead 4% of the
time" is expressed in Prolog as
battery_dead(0.04).
− This fact indicates that ‘battery is dead’ holds with
probability 0.04.
Probability in Rules
● An if-then rule in a rule-based system can incorporate
probability as follows:
− if X is true then Y can be concluded with probability P
Examples:
● Consider the following probabilistic rules and their
corresponding Prolog representation.
− "30% of the time when the car does not start, the
battery is dead"
battery_dead(0.3) :- ignition_not_start(1.0).
Here 30% is the rule probability. If the right-hand side of the rule is
certain, then we can even write the above rule as:
battery_dead(0.3) :- ignition_not_start.
− "the battery is dead with the same probability that the voltmeter is
outside the normal range"
battery_dead(P) :- voltmeter_measurment_abnormal(P).
Cumulative Probabilities
● Combining probabilities from the facts and successful
rules to get a cumulative probability of the battery
being dead is an important issue.
− We should gather all relevant rules and facts about the battery
being dead.
● The probability that a rule succeeds depends on the
probabilities of the subgoals on the right side of the rule.
− The cumulative probability of the conclusion can be calculated by
using and-combination.
● In this case, the probabilities of the subgoals on the right side
of the rule are multiplied, assuming all the events are
independent of each other, using the formula
Prob(A and B and C and ...) = Prob(A) * Prob(B) * Prob(C) * ...
Cont…
● Rules with the same conclusion can be uncertain for
different reasons.
● If more than one rule has the same
predicate name with different probabilities, then the
cumulative likelihood of that predicate can be
computed by using or-combination.
● To get the overall probability of the predicate, the following
formula gives the 'or' probability if the events are
mutually independent.
Prob(A or B or C or ...)
= 1 - [(1 - Prob(A)) * (1 - Prob(B)) * (1 - Prob(C)) * ...]
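Both combination formulas can be tried numerically. The following Python sketch is only an illustration of the two formulas under the independence assumption. Applied to two fair coins, it reproduces the joint and union probabilities 0.25 and 0.75.

```python
from math import prod

def and_comb(probs):
    """Prob(A and B and ...) for independent events: product of the probabilities."""
    return prod(probs)

def or_comb(probs):
    """Prob(A or B or ...) for independent events: 1 - product of (1 - p)."""
    return 1 - prod(1 - p for p in probs)

print(and_comb([0.5, 0.5]), or_comb([0.5, 0.5]))  # 0.25 0.75
```

For two events, or_comb agrees with P(A) + P(B) - P(A) * P(B).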
Examples
1. "half of the time when a computer does not work,
the battery is dead"
battery_dead(P) :- computer_dead(P1), P is P1 * 0.5.
− Here 0.5 is the rule probability.
2. "95% of the time when a computer has an
electrical problem and the battery is old, the battery is
dead"
battery_dead(P) :- electrical_prob(P1),
battery_old(P2), P is P1 * P2 * 0.95.
− Here 0.95 is the rule probability.
● The rule probability can be thought of as hidden and is
combined with the associated probabilities in the
rule.
Prolog Programs for Cumulative Probabilities
And-combination: collect all the probabilities in a list
and compute their product to get the
and-combination.
% and_comb(+List, -P): P is the product of the probabilities in a non-empty List.
and_comb([P], P).
and_comb([H | T], P) :- and_comb(T, P1), P is P1 * H.
• The assumption is that the subgoals in the rule are independent of
each other.
Or-combination: collect the probabilities of all the rules
with the same predicate name as head in a list and
compute the or-combination.
% or_comb(+List, -P): P is 1 minus the product of (1 - Pi) over a non-empty List.
or_comb([P], P).
or_comb([H | T], P) :- or_comb(T, P1), P is 1 - ((1 - H) * (1 - P1)).
Production System for Diagnosis of Malfunctioning Equipment using Probability
● Let us develop a production system for diagnosis of a
malfunctioning television using probabilities.
● We have to identify situations when the television is not
working properly.
● We broadly imagine that one of two things may be
improperly adjusted:
− the controls (knobs and switches)
− the receiver (antenna or cable)
Contd…

Knobs adjustment
● Knobs might be adjusted wrongly on the television
for the following reasons (by no means
exhaustive):
− The television is old and requires frequent adjustment.
− Unusual usage (i.e., anything strange has happened lately
that required adjustment of the knobs).
− Children use your set and play with the knobs.
Diagnosis of Malfunctioning Equipment using Probability
● Let us develop a rule-based system for diagnosis of a
malfunctioning landline telephone using
probabilities.
● Identify situations when the telephone is not working
properly.
● Broadly, one of the following may be
the reason:
− Faulty instrument
− Problem with the exchange
− Broken cable
● Note that these rules must be carefully
designed in consultation with a domain expert.
A Few Rules from the Expert for the Instrument
● Rule1: If the instrument is old and has been repaired
many times in the past, then it is 40% sure that the fault lies
with the instrument.
telephone_not_working(0.4) :- ask(tele_history).

● Rule2: If the instrument has fallen on the ground and broken,
then it is 80% sure that the fault lies with the instrument.
telephone_not_working(0.8) :- ask(tele_broken).
Rules – Cont…
● Rule3: If children are present and use your set and
play with the keypad, each with some probability, then it is
80% sure that the instrument is faulty because of
unusual usage.
telephone_not_working(P) :- ask(children_present, P1),
ask(children_keypad, P2),
and_comb([P1, P2, 0.8], P).
● This rule says that when both conditions hold
(children are present and children play with the keypad)
with some probabilities, then it is 80% sure
that the instrument is faulty.
Cont…
● Finally, get the cumulative probability of the telephone not
working because of an instrument fault.
● We need to combine the probabilities of all the rules using
or-combination.
● The corresponding rule is:
instrument_faulty(P) :- findall(X, telephone_not_working(X), L),
or_comb(L, P).
● The final diagnosis rule for a faulty instrument, which is
assumed to be 70% sure, might be written as follows:
diagnosis('Instrument is faulty', P) :- instrument_faulty(P1),
and_comb([P1, 0.7], P).
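To see how the instrument rules combine numerically, here is a rough Python trace of one hypothetical session. It assumes the user confirms Rule1 and Rule2 and answers 0.5 to both questions of Rule3; these answers are invented for illustration.

```python
from math import prod

def and_comb(probs):
    return prod(probs)

def or_comb(probs):
    return 1 - prod(1 - p for p in probs)

# Hypothetical answers: Rule1 and Rule2 fire, Rule3 is asked with P1 = P2 = 0.5.
rule1 = 0.4
rule2 = 0.8
rule3 = and_comb([0.5, 0.5, 0.8])

# instrument_faulty/1: or-combination over all telephone_not_working/1 solutions.
instrument_faulty = or_comb([rule1, rule2, rule3])

# diagnosis/2: and-combination with the 70% rule probability.
diagnosis = and_comb([instrument_faulty, 0.7])

print(round(rule3, 3), round(instrument_faulty, 3), round(diagnosis, 4))
# 0.2 0.904 0.6328
```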
Rules – Cont…
● Let us consider other kinds of telephone diagnosis, such as
exchange, cable connection, etc.

exchange_probability(0.99) :- ask(exchange_problem).
exchange_probability(0.5) :- ask(connecting_switch_problem).
overall_exchange_probability(P) :-
findall(X, exchange_probability(X), L),
or_comb(L, P).

cable_broken_probability(0.98) :-
ask(cable_old), ask(if_storm_recently).
cable_broken_probability(0.3) :-
ask(recent_furniture_rearranging).
overall_cable_probability(P) :-
findall(X, cable_broken_probability(X), L),
or_comb(L, P).
Rules for Diagnosis
● Rule1: Cable faulty is 80% sure:
diagnosis('Cable_problem', P) :-
overall_cable_probability(P1),
and_comb([P1, 0.8], P).

● Rule2: Exchange problem is 90% sure:
diagnosis('Exchange_problem', P) :-
overall_exchange_probability(P1),
and_comb([P1, 0.9], P).
● To run such a system, invoke a Goal as:
?- diagnosis(D, P).
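The following Python sketch imitates what the goal above might return for the cable and exchange diagnoses. The answers dictionary stands in for the interactive ask prompts and its values are hypothetical. Unlike the Prolog or_comb, the Python or_comb returns 0 for an empty list instead of failing.

```python
from math import prod

def or_comb(probs):
    return 1 - prod(1 - p for p in probs)

# Hypothetical user answers standing in for the ask/1 prompts.
answers = {
    "exchange_problem": False,
    "connecting_switch_problem": True,
    "cable_old": True,
    "if_storm_recently": True,
    "recent_furniture_rearranging": True,
}

def fired(rules):
    """Probabilities of the rules whose conditions are all answered 'yes'."""
    return [p for p, conds in rules if all(answers[c] for c in conds)]

exchange_rules = [(0.99, ["exchange_problem"]),
                  (0.5, ["connecting_switch_problem"])]
cable_rules = [(0.98, ["cable_old", "if_storm_recently"]),
               (0.3, ["recent_furniture_rearranging"])]

# Each diagnosis multiplies the or-combined evidence by its rule probability.
diagnoses = {
    "Cable_problem": or_comb(fired(cable_rules)) * 0.8,
    "Exchange_problem": or_comb(fired(exchange_rules)) * 0.9,
}
for name, p in diagnoses.items():
    print(name, round(p, 4))
# Cable_problem 0.7888
# Exchange_problem 0.45
```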
