
Module 4 - Probability Reasoning and Uncertainty

Topics: Quantifying uncertainty, knowledge representation under uncertainty, and decision making (simple and complex).
Uncertainty Handling
• This is a traditional AI topic, usually covered before machine learning approaches
• There are many different approaches for handling uncertainty
• Formal approaches based on mathematics (probabilities)
• Formal approaches based on logic
• Informal approaches
• Many questions arise
• How do we combine uncertainty values?
• How do we obtain uncertainty values?
• How do we interpret uncertainty values?
• How do we add uncertainty values to our knowledge and inference mechanisms?
Why Is Uncertainty Needed?
• We will find none of the approaches to be entirely adequate, so the natural question is: why even bother?
• Uncertainty is defined as the lack of exact information or knowledge, leading to ambiguity and unpredictability about a conclusion.
• Input data may be questionable
• to what extent is a patient demonstrating some symptom?
• do we rely on their word?
• Knowledge may be questionable
• is this really a fact?
• Knowledge may not be truth-preserving
• if I apply this piece of knowledge, does the conclusion necessarily hold true? Associational knowledge, for instance, is not truth-preserving, yet it is used all the time in diagnosis
• Input may be ambiguous or unclear
• this is especially true if we are dealing with real-world inputs from sensors, or dealing with situations where
ambiguity readily exists (natural languages for instance)
• Output may be expected in terms of a plausibility/probability such as “what is the likelihood that it will
rain today?”
• The world is not just T/F, so our reasoners should be able to model this and reason over the
shades of grey we find in the world
Methods to Handle Uncertainty
• Fuzzy Logic
• Logic that extends traditional 2-valued logic to be a continuous logic (values from 0 to 1)
• although it was originally developed to handle natural-language ambiguities such as “you are very tall”, it has been applied more successfully to device controllers
• Probabilistic Reasoning
• Using probabilities as part of the data and using Bayes theorem or variants to reason over what is
most likely
• Hidden Markov Models
• A variant of probabilistic reasoning where internal states are not observable (so they are called
hidden)
• Certainty Factors and Qualitative Fuzzy Logics
• More ad hoc (non-formal) approaches that might be more flexible, or at least more human-like
• Neural Networks
Bayesian Probabilities
• Bayes' Theorem describes the probability of an event based on prior knowledge of conditions that might be related to that event.
• Bayes' Theorem:

P(H0 | E) = P(E | H0) × P(H0) / P(E)

• P(H0 | E) = probability of H0 given that evidence E is true (the conditional, or posterior, probability)
• P(E | H0) = probability of E given that H0 is true (the evidential probability)
• P(H0) = probability that H0 will arise (the prior probability)
• P(E) = probability that evidence E will arise; in practice it serves as a normalizing constant
• The idea is that you are given some evidence E = {E1, E2, …, En} and you have a collection of hypotheses H1, H2, …, Hm
• Using a collection of evidential and prior probabilities, compute the most likely hypothesis
Independence of Evidence
• Note that since E is a collection of some evidence, but not all possible
evidence, you will need a whole lot of probabilities
• P(E1 & E2 | H0), P(E1 & E3 | H0), P(E1 & E2 & E3 | H0), …
• If you have n items that could be evidence, you will need 2^n different evidential probabilities for every hypothesis.
• In order to get around the problem of needing an exponential number of
probabilities, one might make the assumption that pieces of evidence are
independent
• Under such an assumption
• P(E1 & E2 | H) = P(E1 | H) * P(E2 | H)
• P(E1 & E2) = P(E1) * P(E2)
• Is this a reasonable assumption?
Continued
• Example: A patient is suffering from a fever and nausea.
• Can we treat these two symptoms as independent?
• one might be causally linked to the other
• the two combined may help identify a cause (disease) that the symptoms separately might not
• A weaker form of independence is conditional independence
• If hypothesis H is known to be true, then whether E1 is true should not impact
P(E2 | H) or P(H | E2)
• Again, is this a reasonable assumption?
• Consider as an example:
• You want to run the sprinkler system only if it is not going to rain, and you base your decision about whether it will rain on whether it is cloudy.
• given that the grass is wet, we want to know the probability that you ran the sprinkler versus the probability that it rained
• the evidential probabilities P(sprinkler | wet) and P(rain | wet) are not independent of whether it was cloudy or not
Continued

Marginal Probability: the probability of an event irrespective of the outcomes of other random variables, e.g. P(A).
Joint Probability: the probability of two (or more) simultaneous events, e.g. P(A and B) or P(A, B).
Conditional Probability: the probability of one (or more) event given the occurrence of another event, e.g. P(A given B) or P(A | B).
The joint probability can be calculated using the conditional probability; for example:
P(A, B) = P(A | B) * P(B)
Continued
• The conditional probability can be calculated using the joint probability; for example:
P(A | B) = P(A, B) / P(B)
• Bayes Theorem: Principled way of calculating a conditional probability without the
joint probability.
• It is often the case that we do not have access to the denominator P(B) directly; it can be expanded as
P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
Therefore, P(A|B) = P(B|A) * P(A) / (P(B|A) * P(A) + P(B|not A) * P(not A))
• P(A|B): Posterior probability.
• P(A): Prior probability.
• P(B|A): Likelihood.
• P(B): Evidence
Bayes' Theorem can thus be restated as:
• Posterior = Likelihood * Prior / Evidence
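As an aside, this relationship is easy to check in code. Below is a minimal Python sketch; the function name and the sample numbers are illustrative only and are not part of the notes.

```python
def bayes_posterior(likelihood, prior, likelihood_given_not):
    """P(A|B) = P(B|A) * P(A) / P(B), with the evidence P(B)
    expanded by the law of total probability."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Illustrative numbers only: a 90%-sensitive test with a 10% false-positive
# rate, applied to a condition with a 1% prior.
print(bayes_posterior(0.9, 0.01, 0.1))   # ≈ 0.083
```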
Probability Basics
Bayes' Theorem
• Bayes’ Theorem is a way of finding a probability when we know certain
other probabilities.
• P(A|B) = P(B|A) * P(A) / P(B)
Example 1: Dangerous fires are rare (1%), but smoke is fairly common (10%) due to barbecues, and 90% of dangerous fires make smoke. What is the probability of a dangerous fire when there is smoke?
P(Fire) = 1% = .01
P(Smoke) = 10% = 0.1
P(Smoke|Fire) = 90% = 0.9
P(Fire|Smoke)= P(Fire) × P(Smoke|Fire) / P(Smoke)
=0.01 * 0.9 /0.1 = 0.09 = 9%
Bayes' Theorem
Example 2: You are planning a picnic today, but the morning is cloudy. 50% of all rainy days start
off cloudy. But cloudy mornings are common (about 40% of days start cloudy). Also this is
usually a dry month (only 3 of 30 days tend to be rainy, or 10%). What is the chance of rain
during the day?
P(Rain) = 10% =0.1
P(Cloudy) = 40% =0.4
P(Cloudy|Rain) = 50% =0.5
P(Rain|Cloudy) =P(Rain) P(Cloudy|Rain) / P(Cloudy)
= 0.1 * 0.5 / 0.4 = 0.125
=12.5%
Example 3: Past data tells you that 10% of patients entering your clinic have liver disease. Five
percent of the clinic’s patients are alcoholics. It is also known that among those patients
diagnosed with liver disease, 7% are alcoholics. If a patient is an alcoholic, what are their chances of having liver disease?
A= Patient has liver disease ; P(A) = 10% = 0.10.
B= Patient is an alcoholic; P(B) = 5% = 0.05
P(B|A) : the probability that a patient is alcoholic, given that they have liver disease= 7% = 0.07.
Bayes’ theorem tells us:
P(A|B) = (0.07 * 0.1)/0.05 = 0.14
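For a quick sanity check of the three worked examples above, here is a small Python snippet; it is verification only, using exactly the numbers given in the examples.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

print(bayes(0.9, 0.01, 0.1))    # Fire given Smoke              ≈ 0.09  (9%)
print(bayes(0.5, 0.10, 0.4))    # Rain given Cloudy             ≈ 0.125 (12.5%)
print(bayes(0.07, 0.10, 0.05))  # Liver disease given alcoholic ≈ 0.14  (14%)
```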
Conditional probability
• Conditional probability is the probability of an event occurring given that another event has already happened.
• To calculate the probability of event A when event B has already occurred:
P(A | B) = P(A ⋀ B) / P(B)
• where P(A ⋀ B) = joint probability of A and B
• P(B) = marginal probability of B
Example: In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?
• Let A be the event that a student likes Mathematics
• Let B be the event that a student likes English
• P(A | B) = P(A ⋀ B) / P(B) = 0.4 / 0.7 ≈ 0.57, i.e. about 57% of the students who like English also like Mathematics
Law of Total Probability
• If you have a set of mutually exclusive and exhaustive events
A1,A2,…,An that partition the sample space S, then for any event B in
the same probability space:
P(B) = Σ_{i=1}^{n} P(Ai) × P(B | Ai)
• P(Ai​) is the probability of event Ai occurring.
• P(B∣Ai​) is the conditional probability of event B occurring given that event Ai has
occurred.
• The events Ai are mutually exclusive: only one of them can occur at a
time.
• The events Ai are exhaustive: their union covers the entire sample
space S.
• In Bayesian inference, it helps compute marginal probabilities and
posterior probabilities efficiently.
Question: What is the probability that a patient has meningitis given that they have a stiff neck?
Given data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is also aware of some more facts, which are given as follows:
• The known probability that a patient has meningitis is 1/30,000.
• The known probability that a patient has a stiff neck is 2%.
Let a = the patient has a stiff neck and b = the patient has meningitis:
P(a | b) = 0.8
P(b) = 1/30000
P(a) = 0.02
P(b | a) = P(a | b) × P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133
Example: From a standard deck of playing cards, a single card is drawn. The probability that the card is a king is 4/52. Calculate the posterior probability P(King | Face), i.e. the probability that a drawn face card is a king.
• P(King): probability that the card is a king = 4/52 = 1/13
• P(Face): probability that the card is a face card = 12/52 = 3/13
• P(Face | King): probability that the card is a face card given that it is a king = 1
• Putting these values into Bayes' theorem:
P(King | Face) = P(Face | King) × P(King) / P(Face) = (1 × 1/13) / (3/13) = 1/3
Bayes' Theorem
Example 1: There is a test for Allergy to Cats, but this test is not always
right. For people that really do have the allergy, the test says "Yes" 80%
of the time. For people that do not have the allergy, the test says "Yes"
10% of the time ("false positive"). If 1% of the population have the
allergy, and Harry's test says "Yes", what are the chances that Harry
really has the allergy?
P(Allergy): probability of having the allergy = 1%
P(~Allergy): probability of not having the allergy = 1 - P(Allergy) = 99%
P(Yes | Allergy): probability of the test saying "Yes" for people with the allergy = 80%
P(Yes | ~Allergy): probability of the test saying "Yes" for people without the allergy = 10%
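The slides stop before the final step, so here is a minimal sketch that finishes the calculation with the numbers given above (the variable names are mine):

```python
p_allergy        = 0.01  # P(Allergy)
p_no_allergy     = 0.99  # P(~Allergy)
p_yes_allergy    = 0.80  # P(Yes | Allergy)
p_yes_no_allergy = 0.10  # P(Yes | ~Allergy)

# P(Yes) by the law of total probability
p_yes = p_yes_allergy * p_allergy + p_yes_no_allergy * p_no_allergy   # 0.107

# P(Allergy | Yes) by Bayes' theorem
print(p_yes_allergy * p_allergy / p_yes)   # ≈ 0.0748, i.e. about 7.5%
```

So even with a positive test, the chance that Harry really has the allergy is only about 7.5%, because the allergy itself is rare.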
Bayes' Theorem
• Three different machines are used to produce a particular manufactured item. The three machines, A, B and C, produce 20%, 30% and 50% of the items, respectively. Machines A, B and C produce defective items at a rate of 1%, 2% and 3%, respectively. Suppose that we pick an item from the final batch at random and find it to be defective. What is the probability that the item was produced by machine B?
• Solution, using the law of total probability for the denominator:
P(B | defective) = (0.30 × 0.02) / (0.20 × 0.01 + 0.30 × 0.02 + 0.50 × 0.03)
= 0.006 / 0.023 ≈ 0.26
Bayesian Network
A Bayesian network is a probabilistic graphical model which represents a set of variables
and their conditional dependencies using a directed acyclic graph.
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network.
It can also be used in various tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction, and decision making under
uncertainty.
Bayesian Network
Bayesian networks can be used for building models from data and expert opinions, and a network consists of two parts:
 A directed acyclic graph
 A set of conditional probability tables
A Bayesian belief network deals with probabilistic events and solves problems that involve uncertainty.
A Bayesian belief network is a graphical representation of the probabilistic relationships among the random variables in a particular set.
Bayesian Network
• The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph, or DAG.
• Each node Xi in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which captures the effect of the parents on that node.
• A Bayesian network is based on the joint probability distribution and conditional probability.
• Consider a network in which B is the parent of both A and C, so the factors are P(B), P(A | B) and P(C | B). Then A and C are conditionally independent given B:
P(A, C | B) = P(A|B) * P(C|B)
• The joint probability P(A, B, C) is calculated as:
P(A, B, C) = P(A|B) * P(C|B) * P(B)
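To make the factorization concrete, here is a minimal Python sketch for a network in which B is the parent of A and C. The CPT numbers are invented purely for illustration:

```python
# Hypothetical CPTs for the network A <- B -> C (values are illustrative only).
p_b = {True: 0.3, False: 0.7}                       # P(B)
p_a_given_b = {True: {True: 0.9, False: 0.1},       # P(A | B): outer key is B
               False: {True: 0.2, False: 0.8}}
p_c_given_b = {True: {True: 0.6, False: 0.4},       # P(C | B): outer key is B
               False: {True: 0.5, False: 0.5}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(A=a | B=b) * P(C=c | B=b) * P(B=b)."""
    return p_a_given_b[b][a] * p_c_given_b[b][c] * p_b[b]

# The eight joint entries must sum to 1 for a valid distribution.
total = sum(joint(a, b, c)
            for a in (True, False) for b in (True, False) for c in (True, False))
print(joint(True, True, True), total)   # ≈ 0.162, ≈ 1.0
```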
Bayesian belief network
• Each node corresponds to random variables, and a variable can
be continuous or discrete.
• Arc or directed arrows represent the causal relationship or conditional probabilities
between random variables.
• These directed links or arrows connect pairs of nodes in the graph. A link represents that one node directly influences the other; if there is no directed link, the nodes are independent of each other
• In the diagram, A, B, C, and D are random variables represented by the nodes of the network
graph.
• If we are considering node B, which is connected with node A by a directed arrow, then node A is
called the parent of Node B.
• Node C is independent of node A.
Joint probability distribution
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution. By the chain rule,
P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
In a Bayesian network, for each variable Xi we can write this as:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
For the network above, the conditional dependencies are as follows:
A is conditionally dependent upon B, e.g. P(A|B)
C is conditionally dependent upon B, e.g. P(C|B)
Bayesian Belief Network
Suppose that there are two events which could cause grass to be wet: either the sprinkler is on or it is raining. Also, suppose that the rain has a direct effect on the use of the sprinkler (namely, when it rains the sprinkler is usually not turned on). Then the situation can be modeled with a Bayesian network. All variables have two possible values, T (for true) and F (for false).
The joint probability of one complete assignment of the variables is the product of one CPT entry per node, for example:
0.5 * 0.1 * 0.8 * 0.99 = 0.0396
Bayesian Belief Network

Exercise: What is the probability that John calls?
Using the CPTs of the burglary-alarm network given later in these notes: P(Alarm = true) = Σ_{b,e} P(b) P(e) P(+a | b, e) ≈ 0.0025, so P(JohnCalls = true) ≈ 0.9 × 0.0025 + 0.05 × 0.9975 ≈ 0.052.
Inference in Bayesian Network
• A probabilistic inference system is used to compute the posterior probability distribution for a set of query variables, given some observed events.
• That is, given some assignment of values to a set of evidence variables.
Inference
P(b | j, m) = α × 0.00059224. The corresponding computation for ¬b yields α × 0.0014919.
P(b | j, m) = 0.00059224 / (0.00059224 + 0.0014919) ≈ 0.284
P(¬b | j, m) ≈ 0.716
Inference
• Inference: calculating some useful quantity from a joint probability distribution
• Examples:
• Posterior probability: P(Q | E1 = e1, …, Ek = ek)
• Most likely explanation: argmax_q P(Q = q | e1, …, ek)
Inference by Enumeration
• General case:
• Evidence variables: E1 = e1, …, Ek = ek
• Query variable: Q
• Hidden variables: H1, …, Hr
• (together these make up all the variables of the network)
• We want: P(Q | e1, …, ek) (this works fine with multiple query variables, too)
• Step 1: Select the entries consistent with the evidence
• Step 2: Sum out H to get the joint of the query and the evidence
• Step 3: Normalize
Inference by Enumeration
• A method to calculate the probability of a specific event or variable
given evidence in the network.
• It's based on systematically considering all possible combinations of
values for the variables in the network that are not observed (i.e., not
provided as evidence) and summing or multiplying probabilities
accordingly.
1. Identify the query: determine the variable(s) you want to make inferences about.
2. Identify evidence: identify any evidence or observed values in the network.
3. Initialize: start with the joint probability distribution of all variables in the network.
4. Enumeration: enumerate over all possible values of the remaining (hidden) variables, and for each combination of values:
   1. Multiply the probabilities associated with the observed evidence
   2. Sum or multiply probabilities for unobserved variables according to the network structure and conditional probability tables
Inference by Enumeration: Example
Network: Weather (W) --> Traffic (T) --> Late for work (L)

| W     | P(W) |
|-------|------|
| Sunny | 0.7  |
| Rainy | 0.3  |

| W     | T     | P(T|W) |
|-------|-------|--------|
| Sunny | Light | 0.9    |
| Sunny | Heavy | 0.1    |
| Rainy | Light | 0.3    |
| Rainy | Heavy | 0.7    |

| T     | L        | P(L|T) |
|-------|----------|--------|
| Light | Late     | 0.2    |
| Light | Not late | 0.8    |
| Heavy | Late     | 0.8    |
| Heavy | Not late | 0.2    |

Infer the probability of being late for work (L) given that it is rainy (W = rainy).
1. Identify the query: P(L | W = rainy).
2. Identify the evidence: W = rainy.
3. Initialize: P(L = late, W = rainy).
4. Enumeration (summing out the hidden variable T):
P(L = late, W = rainy) = P(W = rainy) * Σ_t P(t | W = rainy) * P(L = late | t)
= 0.3 * ((0.7 * 0.8) + (0.3 * 0.2))
= 0.3 * (0.56 + 0.06)
= 0.3 * 0.62
= 0.186
5. Normalize:
P(L = late | W = rainy) = 0.186 / (0.186 + P(L = not late, W = rainy))
= 0.186 / (0.186 + 0.114)
= 0.62
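The same enumeration can be written as a short Python sketch, using exactly the table values above (the helper name is mine):

```python
p_w = {"sunny": 0.7, "rainy": 0.3}                              # P(W)
p_t_given_w = {"sunny": {"light": 0.9, "heavy": 0.1},           # P(T | W)
               "rainy": {"light": 0.3, "heavy": 0.7}}
p_l_given_t = {"light": {"late": 0.2, "not late": 0.8},         # P(L | T)
               "heavy": {"late": 0.8, "not late": 0.2}}

def joint_l_w(l, w):
    """P(L=l, W=w): sum out the hidden variable T."""
    return p_w[w] * sum(p_t_given_w[w][t] * p_l_given_t[t][l]
                        for t in ("light", "heavy"))

late     = joint_l_w("late", "rainy")       # 0.186
not_late = joint_l_w("not late", "rainy")   # 0.114
print(late / (late + not_late))             # ≈ 0.62 = P(L = late | W = rainy)
```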
Inference by Enumeration in Bayes' Net
• Given unlimited time, inference in BNs is easy
• Reminder of inference by enumeration, by example: the alarm network, with B and E as parents of A, and J and M as children of A

Example: Alarm Network

| B  | P(B)  |
|----|-------|
| +b | 0.001 |
| -b | 0.999 |

| E  | P(E)  |
|----|-------|
| +e | 0.002 |
| -e | 0.998 |

| B  | E  | A  | P(A|B,E) |
|----|----|----|----------|
| +b | +e | +a | 0.95     |
| +b | +e | -a | 0.05     |
| +b | -e | +a | 0.94     |
| +b | -e | -a | 0.06     |
| -b | +e | +a | 0.29     |
| -b | +e | -a | 0.71     |
| -b | -e | +a | 0.001    |
| -b | -e | -a | 0.999    |

| A  | J  | P(J|A) |
|----|----|--------|
| +a | +j | 0.9    |
| +a | -j | 0.1    |
| -a | +j | 0.05   |
| -a | -j | 0.95   |

| A  | M  | P(M|A) |
|----|----|--------|
| +a | +m | 0.7    |
| +a | -m | 0.3    |
| -a | +m | 0.01   |
| -a | -m | 0.99   |
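Here is a sketch of inference by enumeration over this network in Python; with the CPTs above it reproduces the P(B | j, m) ≈ 0.284 figure quoted earlier (the helper functions are my own naming):

```python
from itertools import product

# CPTs from the tables above, stored as the probability that the variable is TRUE.
p_b = 0.001
p_e = 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,    # P(+a | B, E)
       (False, True): 0.29, (False, False): 0.001}
p_j = {True: 0.90, False: 0.05}                    # P(+j | A)
p_m = {True: 0.70, False: 0.01}                    # P(+m | A)

def pr(p_true, value):
    """P(X = value) for a Boolean X with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full joint: P(b) P(e) P(a | b, e) P(j | a) P(m | a)."""
    return (pr(p_b, b) * pr(p_e, e) * pr(p_a[(b, e)], a)
            * pr(p_j[a], j) * pr(p_m[a], m))

def p_burglary_given(j, m):
    """Sum out the hidden variables E and A for each value of B, then normalize."""
    unnorm = {b: sum(joint(b, e, a, j, m)
                     for e, a in product((True, False), repeat=2))
              for b in (True, False)}
    return unnorm[True] / (unnorm[True] + unnorm[False])

print(p_burglary_given(True, True))   # ≈ 0.284
```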
Inference by Enumeration?
• It can become computationally expensive: the full joint distribution grows exponentially with the number of variables

Inference by Enumeration vs. Variable Elimination
• Why is inference by enumeration so slow?
• You join up the whole joint distribution before you sum out the hidden variables
• Idea: interleave joining and marginalizing!
• This is called "Variable Elimination"
• Still NP-hard, but usually much faster than inference by enumeration
• First we'll need some new notation: factors
Inference by Variable Elimination
Factor I
• Joint distribution: P(X, Y)
• Entries P(x, y) for all x, y
• Sums to 1

| T    | W    | P   |
|------|------|-----|
| hot  | sun  | 0.4 |
| hot  | rain | 0.1 |
| cold | sun  | 0.2 |
| cold | rain | 0.3 |

• Selected joint: P(x, Y)
• A slice of the joint distribution
• Entries P(x, y) for fixed x, all y
• Sums to P(x)

| T    | W    | P   |
|------|------|-----|
| cold | sun  | 0.2 |
| cold | rain | 0.3 |

• Number of capitals = dimensionality of the table
Factor II
• Single conditional: P(Y | x)
• Entries P(y | x) for fixed x, all y
• Sums to 1

| T    | W    | P   |
|------|------|-----|
| cold | sun  | 0.4 |
| cold | rain | 0.6 |

• Family of conditionals: P(X | Y)
• Multiple conditionals
• Entries P(x | y) for all x, y
• Sums to |Y|

| T    | W    | P   |
|------|------|-----|
| hot  | sun  | 0.8 |
| hot  | rain | 0.2 |
| cold | sun  | 0.4 |
| cold | rain | 0.6 |
Factor III
• Specified family: P(y | X)
• Entries P(y | x) for fixed y, but for all x
• Need not sum to 1

| T    | W    | P   |
|------|------|-----|
| hot  | rain | 0.2 |
| cold | rain | 0.6 |
Factor Summary
• In general, when we write P(Y1 … YN | X1 … XM), it is a "factor": a multi-dimensional array
• Its values are P(y1 … yN | x1 … xM)
• Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
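One way to make the factor terminology concrete is to store a factor as a table keyed by assignments; here is a minimal Python sketch using the P(T, W) numbers from Factor I (the variable names are mine):

```python
# The joint P(T, W) as a factor: a dict from assignments to probabilities.
p_tw = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

# Selected joint P(cold, W): slice the table at T = cold.
p_cold_w = {k: v for k, v in p_tw.items() if k[0] == "cold"}
print(sum(p_cold_w.values()))   # 0.5 = P(T = cold); a slice need not sum to 1

# Single conditional P(W | T = cold): renormalize the slice so it sums to 1.
z = sum(p_cold_w.values())
p_w_given_cold = {k: v / z for k, v in p_cold_w.items()}
print(p_w_given_cold)   # {('cold', 'sun'): 0.4, ('cold', 'rain'): 0.6}
```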


General Variable Elimination
• Query: P(Q | E1 = e1, …, Ek = ek)
• Start with the initial factors:
• the local CPTs (but instantiated by the evidence)
• While there are still hidden variables (variables that are not Q or evidence):
• Pick a hidden variable H
• Join all factors mentioning H
• Eliminate (sum out) H
• Join all remaining factors and normalize
Example: Traffic Domain
• Random variables:
• R: Raining
• T: Traffic
• L: Late for class!
• Network: R --> T --> L, with CPTs:

| R  | P(R) |
|----|------|
| +r | 0.1  |
| -r | 0.9  |

| R  | T  | P(T|R) |
|----|----|--------|
| +r | +t | 0.8    |
| +r | -t | 0.2    |
| -r | +t | 0.1    |
| -r | -t | 0.9    |

| T  | L  | P(L|T) |
|----|----|--------|
| +t | +l | 0.3    |
| +t | -l | 0.7    |
| -t | +l | 0.1    |
| -t | -l | 0.9    |
Inference by Enumeration: Procedural Outline
• Initial factors are the local CPTs (one per node): P(R), P(T | R) and P(L | T), i.e. the three tables above
• Any known values are selected
• E.g. if we know L = +l, the factor P(L | T) is restricted to its +l rows, (+t, +l): 0.3 and (-t, +l): 0.1, while P(R) and P(T | R) are unchanged
• Procedure: join all factors, then eliminate all hidden variables
Operation 1: Join Factors
• First basic operation: joining factors
• Combining factors:
• Just like a database join
• Get all factors over the joining variable
• Build a new factor over the union of the variables involved
• Example: join on R, combining P(R) and P(T | R) into a factor P(R, T):

| R  | T  | P(R,T) |
|----|----|--------|
| +r | +t | 0.08   |
| +r | -t | 0.02   |
| -r | +t | 0.09   |
| -r | -t | 0.81   |

• Computation for each entry: pointwise product, e.g. P(+r, +t) = P(+r) * P(+t | +r) = 0.1 * 0.8 = 0.08

Example: Multiple Joins
• Start with P(R), P(T | R) and P(L | T)
• Join on R: gives the factor P(R, T) shown above
• Join on T: combine P(R, T) with P(L | T) to get the full joint P(R, T, L):

| R  | T  | L  | P(R,T,L) |
|----|----|----|----------|
| +r | +t | +l | 0.024    |
| +r | +t | -l | 0.056    |
| +r | -t | +l | 0.002    |
| +r | -t | -l | 0.018    |
| -r | +t | +l | 0.027    |
| -r | +t | -l | 0.063    |
| -r | -t | +l | 0.081    |
| -r | -t | -l | 0.729    |
Operation 2: Eliminate
• Second basic operation: marginalization
• Take a factor and sum out a variable
• Shrinks a factor to a smaller one
• A projection operation
• Example: sum R out of P(R, T) to get P(T):

| T  | P(T) |
|----|------|
| +t | 0.17 |
| -t | 0.83 |
Multiple Elimination
• Sum R out of the full joint P(R, T, L) to get P(T, L), then sum T out of P(T, L) to get P(L):

| T  | L  | P(T,L) |
|----|----|--------|
| +t | +l | 0.051  |
| +t | -l | 0.119  |
| -t | +l | 0.083  |
| -t | -l | 0.747  |

| L  | P(L)  |
|----|-------|
| +l | 0.134 |
| -l | 0.866 |
Marginalizing Early! (aka Variable Elimination)
• Interleave the joins and the sums: join R, sum out R, join T, sum out T
• Join P(R) and P(T | R) on R to get P(R, T): (+r, +t) 0.08, (+r, -t) 0.02, (-r, +t) 0.09, (-r, -t) 0.81
• Sum out R to get P(T): +t 0.17, -t 0.83
• Join P(T) and P(L | T) on T to get P(T, L): (+t, +l) 0.051, (+t, -l) 0.119, (-t, +l) 0.083, (-t, -l) 0.747
• Sum out T to get P(L): +l 0.134, -l 0.866
• This gives the same answer as building the full joint first, but the largest factor created along the way is smaller
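The two basic operations (join = pointwise product, eliminate = sum out) are short enough to write directly; this sketch reproduces the tables above for the R --> T --> L network:

```python
# CPTs for the chain R -> T -> L, using the values from the tables above.
p_r = {'+r': 0.1, '-r': 0.9}
p_t_given_r = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
p_l_given_t = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Join on R: pointwise product gives the factor P(R, T).
p_rt = {(r, t): p_r[r] * p_t_given_r[(r, t)]
        for r in ('+r', '-r') for t in ('+t', '-t')}

# Eliminate R: sum it out to get P(T).
p_t = {t: sum(p_rt[(r, t)] for r in ('+r', '-r')) for t in ('+t', '-t')}
print(p_t)    # ≈ {'+t': 0.17, '-t': 0.83}

# Join on T, then eliminate T, to get P(L).
p_tl = {(t, l): p_t[t] * p_l_given_t[(t, l)]
        for t in ('+t', '-t') for l in ('+l', '-l')}
p_l = {l: sum(p_tl[(t, l)] for t in ('+t', '-t')) for l in ('+l', '-l')}
print(p_l)    # ≈ {'+l': 0.134, '-l': 0.866}, the same answer as full enumeration
```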
Evidence
• If there is evidence, start with factors that select that evidence
• With no evidence, the initial factors are the local CPTs P(R), P(T | R) and P(L | T) shown above
• Computing P(L | +r), the initial factors become the single row P(+r) = 0.1, the +r rows of P(T | R) ((+r, +t): 0.8 and (+r, -t): 0.2), and P(L | T) unchanged
• We then eliminate all variables other than the query and the evidence
Evidence II
• The result will be a selected joint of the query and the evidence
• E.g. for P(L | +r), we would end up with the factor P(+r, L):

| R  | L  | P(+r,L) |
|----|----|---------|
| +r | +l | 0.026   |
| +r | -l | 0.074   |

• To get our answer, just normalize this (divide by 0.026 + 0.074 = 0.1):

| L  | P(L|+r) |
|----|---------|
| +l | 0.26    |
| -l | 0.74    |

• That's it!
Traffic Domain
• Network R --> T --> L, for a query such as P(L):
• Inference by enumeration: join on r, join on t, eliminate r, eliminate t (the whole joint is built before anything is summed out)
• Variable elimination: join on r, eliminate r, join on t, eliminate t (marginalize as early as possible)
Example
• Variable elimination worked through on the slides, e.g. for the alarm-network query P(B | j, m):
• Choose A: join all factors mentioning A, then sum A out
• Choose E: join all factors mentioning E, then sum E out
• Finish with B: join the remaining factors over B
• Normalize
Same Example in Equations
• The marginal can be obtained from the joint by summing out the hidden variables
• Use the Bayes' net joint distribution expression (the product of the local CPTs)
• Use x*(y+z) = xy + xz to pull factors that do not mention the summation variable outside the sum
• Joining on a, and then summing out, gives a new factor f1
• Use x*(y+z) = xy + xz again
• Joining on e, and then summing out, gives a new factor f2
• All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
Another Variable Elimination Example
• Computational complexity critically depends on the largest factor generated in this process.
• The size of a factor is the number of entries in its table.
• In the example above (assuming binary variables), all factors generated are of size 2, as each involves only one variable (Z, Z, and X3 respectively).
Variable Elimination Ordering
• For the query P(Xn | y1, …, yn), work through the following two different orderings as done on the previous slide: Z, X1, …, Xn-1 versus X1, …, Xn-1, Z. What is the size of the maximum factor generated for each of the orderings?
• Answer: 2^(n+1) versus 2^2 (assuming binary variables)
• In general: the ordering can greatly affect efficiency.
VE: Computational and Space Complexity
• The computational and space complexity of variable elimination is
determined by the largest factor

• The elimination ordering can greatly affect the size of the largest factor.
• E.g., the previous slide's example: 2^(n+1) vs. 2^2

• Does there always exist an ordering that only results in small factors?
• No!
Worst Case Complexity?
• CSP reduction: a 3-SAT problem can be encoded as a Bayes' net with a variable z that is true exactly when the formula is satisfied.
• If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
• Hence inference in Bayes' nets is NP-hard. There is no known efficient probabilistic inference algorithm in general.
