Unit-4 Uncertainty


Uncertainty

01 Uncertainty
02 Reasoning under Uncertainty
03 Concept of Probability: Axioms, Random Variables, etc.
04 Bayes' Rule
05 Joint Distribution
06 Inference in the Joint Distribution
07 Bayesian Networks
Acting Under Uncertainty

Agents almost never have access to the whole truth about their environment. Agents must, therefore, act under uncertainty.

Handling Uncertain Knowledge

 Let us take an example of dental diagnosis:

 ∀p Symptoms(p, Toothache) ⇒ Disease(p, Cavity)

 The above rule is not correct: a toothache can be caused in many other cases.

 ∀p Symptoms(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, ImpactedWisdom) ∨ …
Handling Uncertainty

 We consider the other way of writing the same problem:

 ∀p Disease(p, Cavity) ⇒ Symptoms(p, Toothache)

 Not correct either: not all cavities cause a toothache.
Reasons for Using Probability

 Laziness:
 It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule, and too hard to use such rules.

 Theoretical ignorance:
 Medical science has no complete theory for the domain.

 Practical ignorance:
 Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.


Maximum Expected Utility Principle

The fundamental idea of decision theory is that an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action. This is called the principle of maximum expected utility (MEU).
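
To make the principle concrete, the following is a minimal sketch of an MEU choice between two actions; the actions, outcome probabilities, and utility values are hypothetical and chosen only for illustration.

```python
# A minimal sketch of the MEU principle. The actions, outcome probabilities,
# and utility values below are hypothetical, chosen only for illustration.
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

# Hypothetical decision: treat the tooth now vs. wait and see.
actions = {
    "treat_now": [(0.8, 10), (0.2, -5)],   # likely relief, small risk of complication
    "wait":      [(0.3, 10), (0.7, -20)],  # may improve, but may get much worse
}

# MEU: choose the action whose expected utility is highest.
best = max(actions, key=lambda a: expected_utility(actions[a]))
for a, outs in actions.items():
    print(a, expected_utility(outs))
print("MEU choice:", best)
```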


Axioms of Probability

 All probabilities are between 0 and 1, and the probabilities of the possible worlds sum to one:

 0 ≤ P(A) ≤ 1   (1)

 Σω P(ω) = 1   (2)

 P(true) = 1 and P(false) = 0   (3)

 P(A ∨ B) = P(A) + P(B) − P(A ∧ B)   (4)

 Along with these, the formula for conditional probability makes up Kolmogorov's axioms:

 P(A | B) = P(A ∧ B) / P(B)   (5)
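
As a quick sanity check, the snippet below verifies axioms (2) and (4) on a small made-up distribution over two Boolean variables; the four world probabilities are assumptions chosen only for illustration.

```python
# Verify axioms (2) and (4) on an assumed four-world distribution over A and B.
worlds = {
    (True,  True):  0.12,
    (True,  False): 0.18,
    (False, True):  0.28,
    (False, False): 0.42,
}

assert abs(sum(worlds.values()) - 1.0) < 1e-9           # axiom (2)

p_a       = sum(p for (a, b), p in worlds.items() if a)
p_b       = sum(p for (a, b), p in worlds.items() if b)
p_a_and_b = sum(p for (a, b), p in worlds.items() if a and b)
p_a_or_b  = sum(p for (a, b), p in worlds.items() if a or b)

assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-9   # axiom (4)
print(p_a, p_b, p_a_and_b, p_a_or_b)
```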
Important Derivations from the Axioms of Probability

Random Variables
• Random variables: Random variables are real-valued functions whose domain is the sample space of a random experiment.
• Random variables are typically divided into three kinds.
• Boolean random variables, such as Cavity, have the domain {true, false}.
• We will often abbreviate a proposition such as Cavity = true simply by the lowercase name cavity. Similarly, Cavity = false would be abbreviated by ¬cavity.
• Discrete random variables: a discrete variable is a variable that takes a finite number of distinct values.
• Continuous random variables: a random variable that takes an infinite number of distinct values.
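
A small sketch of how each kind of random variable can be represented in code; the numeric values are assumed for illustration only.

```python
# A sketch of the three kinds of random variables; all values are assumed.
from random import gauss

# Boolean random variable: Cavity, with domain {true, false}.
P_cavity = {True: 0.2, False: 0.8}

# Discrete random variable: Weather, with a finite set of distinct values.
P_weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

# Continuous random variable: e.g. a patient's body temperature; it can take
# infinitely many values, so we sample from a density instead of using a table.
temperature_sample = gauss(37.0, 0.5)

print(P_cavity[True], sum(P_weather.values()), temperature_sample)
```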
Conditional Probability

• Suppose you are going to the dentist for a regular checkup; the prior probability P(cavity) = 0.2 might be of interest.
• If you go to the dentist because you have a toothache, it is P(cavity | toothache) = 0.8 that matters.
• When making decisions, an agent needs to condition on all the evidence it has observed.
• It is also important to understand the difference between conditioning and logical implication.
• The assertion that P(cavity | toothache) = 0.8 does not mean "Whenever toothache is true, conclude that cavity is true with probability 0.8"; rather, it means "Whenever toothache is true and we have no further information, conclude that cavity is true with probability 0.8."

Joint Probability Distribution

• In addition to distributions on single variables, we need notation for distributions on multiple variables. Commas are used for this purpose: for example, P(Weather, Cavity) denotes the probabilities of all combinations of the values of Weather and Cavity.
• A distribution of this kind is called a joint probability distribution.
Bayes' Rule

From the conditional probability formula we know the following equations:

• P(A ∧ B) = P(A | B) P(B)
• P(A ∧ B) = P(B | A) P(A)    (" | " is pronounced "given")

Equating the two right-hand sides and dividing by P(A), we get:

• P(B | A) = P(A | B) P(B) / P(A)
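
A quick numeric illustration of Bayes' rule; the prior P(B) and the likelihoods P(A | B) and P(A | ¬B) below are assumed values, not taken from the slides.

```python
# A minimal numeric check of Bayes' rule with assumed example values.
p_b = 0.01             # prior P(B)
p_a_given_b = 0.9      # likelihood P(A | B)
p_a_given_not_b = 0.2  # P(A | not-B)

# Total probability: P(A) = P(A | B) P(B) + P(A | not-B) P(not-B)
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes' rule: P(B | A) = P(A | B) P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 4))
```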
Applying Bayes' Rule: Simple Case

• Often we perceive as evidence the effect of some unknown cause, and we would like to determine that cause. In that case, Bayes' rule becomes

  P(cause | effect) = P(effect | cause) P(cause) / P(effect)

The more general case of Bayes' rule for multivalued variables can be written in the P notation as follows:

  P(Y | X) = P(X | Y) P(Y) / P(X)
Inference From Joint Probability

• Start with the joint probability distribution.
• We begin with a simple example: a domain consisting of just the three Boolean variables Toothache, Cavity, and Catch (the dentist's nasty steel probe catches in my tooth). The full joint distribution is a 2 × 2 × 2 table.
• For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
Inference From Joint Probability

Marginalization
• Start with the joint probability distribution.
• For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
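
A minimal sketch of marginalization over the full joint distribution. The four entries with toothache = true are the values quoted above; the remaining four entries are assumed so that the table sums to 1 (they follow the standard textbook version of this example).

```python
# The full joint distribution for the dentist example as a Python dict.
# The toothache=True entries are from the slide; the toothache=False entries
# are assumed so that the table sums to 1.
joint = {
    # (toothache, catch, cavity): probability
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

# Marginalization: P(toothache) = sum over all worlds where toothache is true.
p_toothache = sum(p for (t, _, _), p in joint.items() if t)
print(p_toothache)  # 0.2
```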
Inference From Joint Probability

• Start with the joint probability distribution.
• Can also compute conditional probabilities:

  P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                         = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                         = 0.08 / 0.2 = 0.4
Inference From Joint Probability

• The denominator can be viewed as a normalization constant α:

  P(Cavity | toothache) = α P(Cavity, toothache)
                        = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
                        = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
                        = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩

• General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.
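
The same calculation as a code sketch of inference by enumeration with normalization, reusing the assumed joint table from the marginalization example above.

```python
# Inference by enumeration with normalization (same assumed joint table).
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}  # keys are (toothache, catch, cavity)

# Fix the evidence (toothache = True), sum out the hidden variable Catch,
# and normalize over the query variable Cavity.
unnorm = {
    cav: sum(p for (t, c, cv), p in joint.items() if t and cv == cav)
    for cav in (True, False)
}
alpha = 1.0 / sum(unnorm.values())
print({cav: round(alpha * p, 3) for cav, p in unnorm.items()})  # {True: 0.6, False: 0.4}
```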
Inference From Joint Probability

• Let us expand the full joint distribution by adding a fourth variable, Weather.
• The full joint distribution then becomes P(Toothache, Catch, Cavity, Weather), which has 2 × 2 × 2 × 4 = 32 entries.
• For example, how are P(toothache, catch, cavity, cloudy) and P(toothache, catch, cavity) related? We can use the product rule:

  P(toothache, catch, cavity, cloudy) = P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity)

  Since the weather does not depend on the dental variables, this simplifies to

  P(toothache, catch, cavity, cloudy) = P(cloudy) P(toothache, catch, cavity)
Inference From Joint Probability

Independence
• A and B are independent iff
  P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B)

  P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

• The 32 entries are reduced to 12: an 8-entry table for the dental variables plus a 4-entry table for Weather.
• Absolute independence is powerful but rare. Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
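
A sketch of how absolute independence shrinks the representation; the dental-table values below are placeholders and the Weather probabilities are assumed example values.

```python
# Under P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather),
# we store an 8-entry table plus a 4-entry table instead of 32 joint entries.
from itertools import product

dental = {k: 1 / 8 for k in product([True, False], repeat=3)}        # 8 placeholder entries
weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}  # 4 assumed entries

def joint_entry(toothache, catch, cavity, w):
    # Any of the 32 joint entries can be reconstructed from the two factors.
    return dental[(toothache, catch, cavity)] * weather[w]

print(len(dental) + len(weather), "stored values instead of",
      len(dental) * len(weather))
print(joint_entry(True, True, True, "cloudy"))
```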
Inference From Joint Probability

Conditional Independence
• It would be nice if Toothache and Catch were independent, but they are not: if the probe catches in the tooth, then it is likely that the tooth has a cavity and that the cavity causes a toothache.
• These variables are independent, however, given the presence or the absence of a cavity.
• Each is directly caused by the cavity, but neither has a direct effect on the other: toothache depends on the state of the nerves in the tooth, whereas the probe's accuracy depends on the dentist's skill, to which the toothache is irrelevant.
• Mathematically, this property is written as

  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Conditional Independence

The equation above expresses the conditional independence of Toothache and Catch given Cavity. We can plug it into the product-rule expansion of the joint distribution,

  P(Toothache, Catch, Cavity) = P(Toothache, Catch | Cavity) P(Cavity)

and we get the following equation:

  P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
Conditional Independence

• The dentistry example illustrates a commonly occurring pattern in which a single cause directly influences a number of effects, all of which are conditionally independent, given the cause. The full joint distribution can be written as

  P(Cause, Effect1, . . . , Effectn) = P(Cause) Πi P(Effecti | Cause)

• Such a probability distribution is called a naive Bayes model—"naive" because it is often used (as a simplifying assumption) in cases where the "effect" variables are not actually conditionally independent given the cause variable. (The naive Bayes model is sometimes called a Bayesian classifier, a somewhat careless usage that has prompted true Bayesians to call it the idiot Bayes model.)
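
A small sketch of the naive Bayes factorization for the dentist domain; the CPT values are assumed, chosen to be consistent with the joint table used earlier.

```python
# Naive Bayes factorization for the dentist example (assumed CPT values).
p_cavity = {True: 0.2, False: 0.8}
p_toothache_given_cavity = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given_cavity     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

def joint(toothache, catch, cavity):
    # P(Cause, Effect1, Effect2) = P(Cause) * P(Effect1 | Cause) * P(Effect2 | Cause)
    p_t = p_toothache_given_cavity[cavity] if toothache else 1 - p_toothache_given_cavity[cavity]
    p_c = p_catch_given_cavity[cavity] if catch else 1 - p_catch_given_cavity[cavity]
    return p_cavity[cavity] * p_t * p_c

# Classify: P(Cavity | toothache, catch) via normalization over the cause.
unnorm = {cav: joint(True, True, cav) for cav in (True, False)}
alpha = 1 / sum(unnorm.values())
print({cav: round(alpha * p, 3) for cav, p in unnorm.items()})
```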
Conditional Independence
Summary
• Probability is a rigorous formalism for uncertain
knowledge
• Joint probability distribution specifies probability of
every atomic event
• Queries can be answered by summing over atomic
events
• For nontrivial domains, we must find a way to
reduce the joint size
• Independence and conditional independence
provide the tools
Belief or Bayesian network: Syntax (how to make it)

1. Each node corresponds to a random variable, which may be discrete or continuous.
2. A set of directed links (arrows) connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y.
3. The graph has no directed cycles; it is a directed acyclic graph (DAG).

Belief or Bayesian network: Syntax

4. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
5. The topology of the network—the set of nodes and links—specifies the conditional independence relationships that hold in the domain, in a way that will be made precise shortly.
6. The intuitive meaning of an arrow is typically that X has a direct influence on Y, which suggests that causes should be parents of effects.

Fig. A simple Bayesian network in which Weather is independent of the other three variables, and Toothache and Catch are conditionally independent given Cavity.
Belief or Bayesian network: Syntax

• Once the topology of the Bayesian network is laid out, we need only specify a conditional probability distribution for each variable, given its parents.
• We will see that the combination of the topology and the conditional distributions suffices to specify (implicitly) the full joint distribution for all the variables.
Example Burglar Alarm
Findings to Probability from the Example

Let us take an example to understand this. For variables X, Y, Z, A, B, if they are all independent, then

P(X, Y, Z, A, B) = P(X) P(Y) P(Z) P(A) P(B) ……… (A)

If they are dependent, then we have to look at the dependencies. For example, if X, Y, and Z depend on A but not on B, then the formula becomes

P(X, Y, Z, A, B) = P(X | A) P(Y | A) P(Z | A) P(A) P(B) ……… (B)

If X, Y, and Z depend on both A and B, then formula (A) becomes

P(X, Y, Z, A, B) = P(X | A, B) P(Y | A, B) P(Z | A, B) P(A) P(B)
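
A minimal code sketch of factorization (B), where X, Y, and Z depend on A but not on B; all numeric values are made up purely for illustration.

```python
# Factorization (B): X, Y, Z depend on A but not on B. All values are assumed.
p_a = {True: 0.3, False: 0.7}
p_b = {True: 0.5, False: 0.5}
p_x_given_a = {True: 0.9, False: 0.2}   # P(X=true | A)
p_y_given_a = {True: 0.6, False: 0.4}   # P(Y=true | A)
p_z_given_a = {True: 0.8, False: 0.1}   # P(Z=true | A)

def prob(cpt, value, a):
    """Probability of a Boolean variable taking `value`, given its parent A."""
    return cpt[a] if value else 1 - cpt[a]

def joint(x, y, z, a, b):
    # P(X, Y, Z, A, B) = P(X|A) P(Y|A) P(Z|A) P(A) P(B)
    return (prob(p_x_given_a, x, a) * prob(p_y_given_a, y, a) *
            prob(p_z_given_a, z, a) * p_a[a] * p_b[b])

print(joint(True, True, False, True, False))  # 0.9 * 0.6 * 0.2 * 0.3 * 0.5
```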
Belief or Bayesian network: Semantics

• A generic entry in the joint distribution is the probability of a conjunction of particular assignments to each variable, such as P(X1 = x1 ∧ . . . ∧ Xn = xn). We use the notation P(x1, . . . , xn) as an abbreviation for this. The value of this entry is given by the formula

  P(x1, . . . , xn) = Πi P(xi | parents(Xi))   (1)

• In other words, the tables we have been calling conditional probability tables really are conditional probability tables according to the semantics defined in Equation (1).
Belief or Bayesian network: Semantics

We will now show that Equation (1) implies certain conditional independence relationships that can be used to guide the knowledge engineer in constructing the topology of the network.

First, we rewrite the entries in the joint distribution in terms of conditional probability, using the product rule:

  P(x1, . . . , xn) = P(xn | xn−1, . . . , x1) P(xn−1, . . . , x1)

Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product:

  P(x1, . . . , xn) = P(xn | xn−1, . . . , x1) P(xn−1 | xn−2, . . . , x1) · · · P(x2 | x1) P(x1) = Πi P(xi | xi−1, . . . , x1)

This identity is called the chain rule. It holds for any set of random variables. Comparing it with Equation (1), we see that the specification of the joint distribution is equivalent to the general assertion that, for every variable Xi in the network,

  P(Xi | Xi−1, . . . , X1) = P(Xi | Parents(Xi))   (2)

provided that Parents(Xi) ⊆ {Xi−1, . . . , X1}.
Belief or Bayesian network: Semantics

A method for constructing Bayesian networks

• Equation (2) says that a Bayesian network is a correct representation of the domain only if each node is conditionally independent of its other predecessors in the node ordering, given its parents.
• We can satisfy this condition with the following methodology:
Belief or Bayesian network: Semantics

A method for constructing Bayesian networks

• Nodes: First determine the set of variables that are required to model the domain. Now order them, {X1, . . . , Xn}. Any order will work, but the resulting network will be more compact if the variables are ordered such that causes precede effects.
• Links: For i = 1 to n do the following:
  • Choose, from X1, . . . , Xi−1, a minimal set of parents for Xi, such that Equation (2) is satisfied.
  • For each parent insert a link from the parent to Xi.
• CPTs: Write down the conditional probability table, P(Xi | Parents(Xi)).
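
Applying this methodology to the burglar-alarm example mentioned earlier gives the network Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls. The sketch below encodes that network and evaluates one full-joint entry with Equation (1); the CPT numbers are the commonly used textbook values and are assumptions here, not taken from these slides.

```python
# A minimal sketch of the burglar-alarm Bayesian network.
# Structure: Burglary -> Alarm <- Earthquake, Alarm -> JohnCalls, Alarm -> MaryCalls.
# The CPT values are the usual textbook numbers, assumed here for illustration.
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def pr(p_true, value):
    return p_true if value else 1 - p_true

def full_joint(b, e, a, j, m):
    # Equation (1): product of P(x_i | parents(X_i)) over all nodes.
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a) *
            pr(P_J[a], j) * pr(P_M[a], m))

# P(john calls, mary calls, alarm, no burglary, no earthquake)
print(full_joint(False, False, True, True, True))  # about 0.000628
```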
Properties of Bayesian Networks

• Compactness: as a complete and non-redundant representation of the domain, a Bayesian network can often be far more compact than the full joint distribution.
• Locally structured: in a locally structured system, each subcomponent interacts directly with only a bounded number of other components, regardless of the total number of components (Fig. a).
• Markov blanket: another important independence property is implied by the topological semantics: a node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents—that is, given its Markov blanket (Fig. b).

Figure: (a) locally structured system; (b) Markov blanket.
