Unit-4 Uncertainty


Uncertainty

01 Uncertainty
02 Reasoning under Uncertainty
03 Concept of Probability: Axioms, Random Variables, etc.
04 Bayes' Rule
05 Joint Distribution
06 Inference in the Joint Distribution
07 Bayesian Networks
Acting Under Uncertainty

Agents almost never have access to the whole truth about their environment. Agents must, therefore, act under uncertainty.

Handling Uncertain Knowledge

 Let us take an example of dental diagnosis:

 ∀p Symptoms(p, Toothache) ⇒ Disease(p, Cavity)

 The above rule is not correct: a toothache can be caused in many other cases.

 ∀p Symptoms(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, ImpactedWisdom) ∨ …
Handling Uncertainty

 We consider the other way of writing the same problem:

 ∀p Disease(p, Cavity) ⇒ Symptoms(p, Toothache)

 Not correct either: not all cavities cause a toothache.
Reasons for Using Probability

 Laziness:
 It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule, and too hard to use such rules.

 Theoretical ignorance:
 Medical science has no complete theory for the domain.

 Practical ignorance:
 Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.


Maximum Expected Utility Principle

The fundamental idea of decision theory is that an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action. This is called the principle of maximum expected utility (MEU).
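
To make the principle concrete, the following is a minimal sketch of an MEU choice between two actions; the actions, outcome probabilities, and utility values are hypothetical and chosen only for illustration.

```python
# A minimal sketch of the MEU principle. The actions, outcome probabilities,
# and utility values below are hypothetical, chosen only for illustration.
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

# Hypothetical decision: treat the tooth now vs. wait and see.
actions = {
    "treat_now": [(0.8, 10), (0.2, -5)],   # likely relief, small risk of complication
    "wait":      [(0.3, 10), (0.7, -20)],  # may improve, but may get much worse
}

# MEU: choose the action whose expected utility is highest.
best = max(actions, key=lambda a: expected_utility(actions[a]))
for a, outs in actions.items():
    print(a, expected_utility(outs))
print("MEU choice:", best)
```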


Axioms of Probability

 All probabilities are between 0 and 1, and the probabilities of the possible worlds sum to one:

 0 ≤ P(A) ≤ 1   (1)

 Σω P(ω) = 1   (2)

 P(true) = 1 and P(false) = 0   (3)

 P(A ∨ B) = P(A) + P(B) − P(A ∧ B)   (4)

 Along with these, the formula for conditional probability makes up Kolmogorov's axioms:

 P(A | B) = P(A ∧ B) / P(B)   (5)
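
As a quick sanity check, the snippet below verifies axioms (2) and (4) on a small made-up distribution over two Boolean variables; the four world probabilities are assumptions chosen only for illustration.

```python
# Verify axioms (2) and (4) on an assumed four-world distribution over A and B.
worlds = {
    (True,  True):  0.12,
    (True,  False): 0.18,
    (False, True):  0.28,
    (False, False): 0.42,
}

assert abs(sum(worlds.values()) - 1.0) < 1e-9           # axiom (2)

p_a       = sum(p for (a, b), p in worlds.items() if a)
p_b       = sum(p for (a, b), p in worlds.items() if b)
p_a_and_b = sum(p for (a, b), p in worlds.items() if a and b)
p_a_or_b  = sum(p for (a, b), p in worlds.items() if a or b)

assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-9   # axiom (4)
print(p_a, p_b, p_a_and_b, p_a_or_b)
```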
Important Derivations from the Axioms of Probability

Random Variables
• Random variables: Random variables are real-valued functions whose domain is the sample space of a random experiment.
• Random variables are typically divided into three kinds.
• Boolean random variables, such as Cavity, have the domain {true, false}.
• We will often abbreviate a proposition such as Cavity = true simply by the lowercase name cavity. Similarly, Cavity = false would be abbreviated by ¬cavity.
• Discrete random variables: a discrete variable is a variable that takes a finite number of distinct values.
• Continuous random variables: a random variable that takes an infinite number of distinct values.
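
A small sketch of how each kind of random variable can be represented in code; the numeric values are assumed for illustration only.

```python
# A sketch of the three kinds of random variables; all values are assumed.
from random import gauss

# Boolean random variable: Cavity, with domain {true, false}.
P_cavity = {True: 0.2, False: 0.8}

# Discrete random variable: Weather, with a finite set of distinct values.
P_weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

# Continuous random variable: e.g. a patient's body temperature; it can take
# infinitely many values, so we sample from a density instead of using a table.
temperature_sample = gauss(37.0, 0.5)

print(P_cavity[True], sum(P_weather.values()), temperature_sample)
```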
Conditional Probability

• Suppose you are going to the dentist for a regular checkup; the prior probability P(cavity) = 0.2 might be of interest.
• If you go to the dentist because you have a toothache, it is P(cavity | toothache) = 0.8 that matters.
• When making decisions, an agent needs to condition on all the evidence it has observed.
• It is also important to understand the difference between conditioning and logical implication.
• The assertion that P(cavity | toothache) = 0.8 does not mean "Whenever toothache is true, conclude that cavity is true with probability 0.8"; rather, it means "Whenever toothache is true and we have no further information, conclude that cavity is true with probability 0.8."

Joint Probability Distribution

• In addition to distributions on single variables, we need notation for distributions on multiple variables. Commas are used for this purpose: for example, P(Weather, Cavity) denotes the probabilities of all combinations of the values of Weather and Cavity.
• A distribution of this kind is called a joint probability distribution.
Bayes' Rule

From the conditional probability formula we know the following equations:

• P(A ∧ B) = P(A | B) P(B)
• P(A ∧ B) = P(B | A) P(A)    (" | " is pronounced "given")

Equating the two right-hand sides and dividing by P(A), we get:

• P(B | A) = P(A | B) P(B) / P(A)
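
A quick numeric illustration of Bayes' rule; the prior P(B) and the likelihoods P(A | B) and P(A | ¬B) below are assumed values, not taken from the slides.

```python
# A minimal numeric check of Bayes' rule with assumed example values.
p_b = 0.01             # prior P(B)
p_a_given_b = 0.9      # likelihood P(A | B)
p_a_given_not_b = 0.2  # P(A | not-B)

# Total probability: P(A) = P(A | B) P(B) + P(A | not-B) P(not-B)
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes' rule: P(B | A) = P(A | B) P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 4))
```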
Applying Bayes' Rule: Simple Case

• Often we perceive as evidence the effect of some unknown cause, and we would like to determine that cause. In that case, Bayes' rule becomes

  P(cause | effect) = P(effect | cause) P(cause) / P(effect)

The more general case of Bayes' rule for multivalued variables can be written in the P notation as follows:

  P(Y | X) = P(X | Y) P(Y) / P(X)
Inference From Joint Probability

• Start with the joint probability distribution.
• We begin with a simple example: a domain consisting of just the three Boolean variables Toothache, Cavity, and Catch (the dentist's nasty steel probe catches in my tooth). The full joint distribution is a 2 × 2 × 2 table.
• For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
Inference From Joint Probability

Marginalization
• Start with the joint probability distribution.
• For any proposition φ, sum the atomic events where it is true: P(φ) = Σω:ω⊨φ P(ω)
• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
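
A minimal sketch of marginalization over the full joint distribution. The four entries with toothache = true are the values quoted above; the remaining four entries are assumed so that the table sums to 1 (they follow the standard textbook version of this example).

```python
# The full joint distribution for the dentist example as a Python dict.
# The toothache=True entries are from the slide; the toothache=False entries
# are assumed so that the table sums to 1.
joint = {
    # (toothache, catch, cavity): probability
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

# Marginalization: P(toothache) = sum over all worlds where toothache is true.
p_toothache = sum(p for (t, _, _), p in joint.items() if t)
print(p_toothache)  # 0.2
```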
Inference From Joint Probability

• Start with the joint probability distribution.
• Can also compute conditional probabilities:

  P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                         = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                         = 0.08 / 0.2 = 0.4
Inference From Joint Probability

• The denominator can be viewed as a normalization constant α:

  P(Cavity | toothache) = α P(Cavity, toothache)
                        = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
                        = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
                        = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩

• General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.
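
The same calculation as a code sketch of inference by enumeration with normalization, reusing the assumed joint table from the marginalization example above.

```python
# Inference by enumeration with normalization (same assumed joint table).
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}  # keys are (toothache, catch, cavity)

# Fix the evidence (toothache = True), sum out the hidden variable Catch,
# and normalize over the query variable Cavity.
unnorm = {
    cav: sum(p for (t, c, cv), p in joint.items() if t and cv == cav)
    for cav in (True, False)
}
alpha = 1.0 / sum(unnorm.values())
print({cav: round(alpha * p, 3) for cav, p in unnorm.items()})  # {True: 0.6, False: 0.4}
```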
Inference From Joint Probability

• Let us expand the full joint distribution by adding a fourth variable, Weather.
• The full joint distribution then becomes P(Toothache, Catch, Cavity, Weather), which has 2 × 2 × 2 × 4 = 32 entries.
• For example, how are P(toothache, catch, cavity, cloudy) and P(toothache, catch, cavity) related? We can use the product rule:

  P(toothache, catch, cavity, cloudy) = P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity)

  Since the weather does not depend on the dental variables, this simplifies to

  P(toothache, catch, cavity, cloudy) = P(cloudy) P(toothache, catch, cavity)
Inference From Joint Probability

Independence
• A and B are independent iff
  P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B)

  P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

• The 32 entries are reduced to 12: an 8-entry table for the dental variables plus a 4-entry table for Weather.
• Absolute independence is powerful but rare. Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
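
A sketch of how absolute independence shrinks the representation; the dental-table values below are placeholders and the Weather probabilities are assumed example values.

```python
# Under P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather),
# we store an 8-entry table plus a 4-entry table instead of 32 joint entries.
from itertools import product

dental = {k: 1 / 8 for k in product([True, False], repeat=3)}        # 8 placeholder entries
weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}  # 4 assumed entries

def joint_entry(toothache, catch, cavity, w):
    # Any of the 32 joint entries can be reconstructed from the two factors.
    return dental[(toothache, catch, cavity)] * weather[w]

print(len(dental) + len(weather), "stored values instead of",
      len(dental) * len(weather))
print(joint_entry(True, True, True, "cloudy"))
```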
Inference From Joint Probability

Conditional Independence
• It would be nice if Toothache and Catch were independent, but they are not: if the probe catches in the tooth, then it is likely that the tooth has a cavity and that the cavity causes a toothache.
• These variables are independent, however, given the presence or the absence of a cavity.
• Each is directly caused by the cavity, but neither has a direct effect on the other: toothache depends on the state of the nerves in the tooth, whereas the probe's accuracy depends on the dentist's skill, to which the toothache is irrelevant.
• Mathematically, this property is written as

  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Conditional Independence

The equation above expresses the conditional independence of Toothache and Catch given Cavity. We can plug it into the product-rule expansion of the joint distribution,

  P(Toothache, Catch, Cavity) = P(Toothache, Catch | Cavity) P(Cavity)

and we get the following equation:

  P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
Conditional Independence

• The dentistry example illustrates a commonly occurring pattern in which a single cause directly influences a number of effects, all of which are conditionally independent, given the cause. The full joint distribution can be written as

  P(Cause, Effect1, . . . , Effectn) = P(Cause) Πi P(Effecti | Cause)

• Such a probability distribution is called a naive Bayes model—"naive" because it is often used (as a simplifying assumption) in cases where the "effect" variables are not actually conditionally independent given the cause variable. (The naive Bayes model is sometimes called a Bayesian classifier, a somewhat careless usage that has prompted true Bayesians to call it the idiot Bayes model.)
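
A small sketch of the naive Bayes factorization for the dentist domain; the CPT values are assumed, chosen to be consistent with the joint table used earlier.

```python
# Naive Bayes factorization for the dentist example (assumed CPT values).
p_cavity = {True: 0.2, False: 0.8}
p_toothache_given_cavity = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given_cavity     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

def joint(toothache, catch, cavity):
    # P(Cause, Effect1, Effect2) = P(Cause) * P(Effect1 | Cause) * P(Effect2 | Cause)
    p_t = p_toothache_given_cavity[cavity] if toothache else 1 - p_toothache_given_cavity[cavity]
    p_c = p_catch_given_cavity[cavity] if catch else 1 - p_catch_given_cavity[cavity]
    return p_cavity[cavity] * p_t * p_c

# Classify: P(Cavity | toothache, catch) via normalization over the cause.
unnorm = {cav: joint(True, True, cav) for cav in (True, False)}
alpha = 1 / sum(unnorm.values())
print({cav: round(alpha * p, 3) for cav, p in unnorm.items()})
```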
Conditional Independence
Summary
• Probability is a rigorous formalism for uncertain
knowledge
• Joint probability distribution specifies probability of
every atomic event
• Queries can be answered by summing over atomic
events
• For nontrivial domains, we must find a way to
reduce the joint size
• Independence and conditional independence
provide the tools
Belief or Bayesian network: Syntax (how to make it)

1. Each node corresponds to a random variable, which may be discrete or continuous.
2. A set of directed links (arrows) connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y.
3. The graph has no directed cycles; it is a directed acyclic graph (DAG).

Belief or Bayesian network: Syntax

4. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
5. The topology of the network—the set of nodes and links—specifies the conditional independence relationships that hold in the domain, in a way that will be made precise shortly.
6. The intuitive meaning of an arrow is typically that X has a direct influence on Y, which suggests that causes should be parents of effects.

Fig. A simple Bayesian network in which Weather is independent of the other three variables, and Toothache and Catch are conditionally independent given Cavity.
Belief or Bayesian network: Syntax

• Once the topology of the Bayesian network is laid out, we need only specify a conditional probability distribution for each variable, given its parents.
• We will see that the combination of the topology and the conditional distributions suffices to specify (implicitly) the full joint distribution for all the variables.
Example Burglar Alarm
Findings to Probability from the Example

Let us take an example to understand this. For variables X, Y, Z, A, B, if they are all independent, then

P(X, Y, Z, A, B) = P(X) P(Y) P(Z) P(A) P(B) ……… (A)

If they are dependent, then we have to look at the dependencies. For example, if X, Y, and Z depend on A but not on B, then the formula becomes

P(X, Y, Z, A, B) = P(X | A) P(Y | A) P(Z | A) P(A) P(B) ……… (B)

If X, Y, and Z depend on both A and B, then formula (A) becomes

P(X, Y, Z, A, B) = P(X | A, B) P(Y | A, B) P(Z | A, B) P(A) P(B)
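
A minimal code sketch of factorization (B), where X, Y, and Z depend on A but not on B; all numeric values are made up purely for illustration.

```python
# Factorization (B): X, Y, Z depend on A but not on B. All values are assumed.
p_a = {True: 0.3, False: 0.7}
p_b = {True: 0.5, False: 0.5}
p_x_given_a = {True: 0.9, False: 0.2}   # P(X=true | A)
p_y_given_a = {True: 0.6, False: 0.4}   # P(Y=true | A)
p_z_given_a = {True: 0.8, False: 0.1}   # P(Z=true | A)

def prob(cpt, value, a):
    """Probability of a Boolean variable taking `value`, given its parent A."""
    return cpt[a] if value else 1 - cpt[a]

def joint(x, y, z, a, b):
    # P(X, Y, Z, A, B) = P(X|A) P(Y|A) P(Z|A) P(A) P(B)
    return (prob(p_x_given_a, x, a) * prob(p_y_given_a, y, a) *
            prob(p_z_given_a, z, a) * p_a[a] * p_b[b])

print(joint(True, True, False, True, False))  # 0.9 * 0.6 * 0.2 * 0.3 * 0.5
```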
Belief or Bayesian network: Semantics

• A generic entry in the joint distribution is the probability of a conjunction of particular assignments to each variable, such as P(X1 = x1 ∧ . . . ∧ Xn = xn). We use the notation P(x1, . . . , xn) as an abbreviation for this. The value of this entry is given by the formula

  P(x1, . . . , xn) = Πi P(xi | parents(Xi))   (1)

• In other words, the tables we have been calling conditional probability tables really are conditional probability tables according to the semantics defined in Equation (1).
Belief or Bayesian network: Semantics

We will now show that Equation (1) implies certain conditional independence relationships that can be used to guide the knowledge engineer in constructing the topology of the network.

First, we rewrite the entries in the joint distribution in terms of conditional probability, using the product rule:

  P(x1, . . . , xn) = P(xn | xn−1, . . . , x1) P(xn−1, . . . , x1)

Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product:

  P(x1, . . . , xn) = P(xn | xn−1, . . . , x1) P(xn−1 | xn−2, . . . , x1) · · · P(x2 | x1) P(x1) = Πi P(xi | xi−1, . . . , x1)

This identity is called the chain rule. It holds for any set of random variables. Comparing it with Equation (1), we see that the specification of the joint distribution is equivalent to the general assertion that, for every variable Xi in the network,

  P(Xi | Xi−1, . . . , X1) = P(Xi | Parents(Xi))   (2)

provided that Parents(Xi) ⊆ {Xi−1, . . . , X1}.
Belief or Bayesian network: Semantics

A method for constructing Bayesian networks

• Equation (2) says that a Bayesian network is a correct representation of the domain only if each node is conditionally independent of its other predecessors in the node ordering, given its parents.
• We can satisfy this condition with the following methodology:
Belief or Bayesian network: Semantics

A method for constructing Bayesian networks

• Nodes: First determine the set of variables that are required to model the domain. Now order them, {X1, . . . , Xn}. Any order will work, but the resulting network will be more compact if the variables are ordered such that causes precede effects.
• Links: For i = 1 to n do the following:
  • Choose, from X1, . . . , Xi−1, a minimal set of parents for Xi, such that Equation (2) is satisfied.
  • For each parent insert a link from the parent to Xi.
• CPTs: Write down the conditional probability table, P(Xi | Parents(Xi)).
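
Applying this methodology to the burglar-alarm example mentioned earlier gives the network Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls. The sketch below encodes that network and evaluates one full-joint entry with Equation (1); the CPT numbers are the commonly used textbook values and are assumptions here, not taken from these slides.

```python
# A minimal sketch of the burglar-alarm Bayesian network.
# Structure: Burglary -> Alarm <- Earthquake, Alarm -> JohnCalls, Alarm -> MaryCalls.
# The CPT values are the usual textbook numbers, assumed here for illustration.
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def pr(p_true, value):
    return p_true if value else 1 - p_true

def full_joint(b, e, a, j, m):
    # Equation (1): product of P(x_i | parents(X_i)) over all nodes.
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a) *
            pr(P_J[a], j) * pr(P_M[a], m))

# P(john calls, mary calls, alarm, no burglary, no earthquake)
print(full_joint(False, False, True, True, True))  # about 0.000628
```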
Properties of Bayesian Networks

• Compactness: as a complete and non-redundant representation of the domain, a Bayesian network can often be far more compact than the full joint distribution.
• Locally structured: in a locally structured system, each subcomponent interacts directly with only a bounded number of other components, regardless of the total number of components (Fig. a).
• Markov blanket: another important independence property is implied by the topological semantics: a node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents—that is, given its Markov blanket (Fig. b).

Figure: (a) locally structured system; (b) Markov blanket.
