CS 188 Introduction to Artificial Intelligence, Fall 2015: Q1 /30, Q2 /40, Q3 /30, Total /100
• Format: Submit the answer sheet pdf containing your answers. You should solve the questions on this handout
(either through a pdf annotator, or by printing, then scanning; we recommend the latter to match exam setting).
Make sure that your answers (typed or handwritten) are within the dedicated regions for each
question/part. If you do not follow this format, we may deduct points.
• How to submit: Go to www.gradescope.com. Log in and click on the class CS188 Fall 2015. Click on
the submission titled HW 2 and upload your pdf containing your answers. If this is your first time using
Gradescope, you will have to set your password before logging in the first time. To do so, click on "Forgot
your password" on the login page, and enter your email address on file with the registrar's office (usually your
@berkeley.edu email address). You will then receive an email with a link to reset your password.
Last Name
First Name
SID
Collaborators
(a) [12 pt] For each of the following statements, write “True” if the statement must be true because of the
network structure; else write “False”. If you write “True”, also write the conditional independence property
which makes the statement true. If you write “False”, write the conditional independence assumption that
would be necessary to make it true. You should not have to compute any probabilities to answer.
i. [4 pt] P(B, I, M ) = P(B)P(I)P(M )
False; would need B ⊥⊥ I ⊥⊥ M (that is, B, I, and M mutually independent).
ii. [4 pt] P(J | G) = P(J | G, I)
True, J ⊥⊥ I | G.
iii. [4 pt] P(M | G, B, I) = P(M | G, B, I, J)
True, M ⊥⊥ J | G, B, I.
(b) [4 pt] Calculate the value of P (b, i, ¬m, g, j).
(c) [4 pt] Calculate the probability that a politician goes to jail given that they broke the law, have been
indicted, and face a politically motivated prosecutor.
P(j | b, i, m) = Σ_{g'} P(j, b, i, m, g') / Σ_{j',g'} P(j', b, i, m, g')

= Σ_{g'} P(b)P(m)P(i | b, m)P(g' | b, i, m)P(j | g') / Σ_{j',g'} P(b)P(m)P(i | b, m)P(g' | b, i, m)P(j' | g')

= [P(b)P(m)P(i | b, m) Σ_{g'} P(g' | b, i, m)P(j | g')] / [P(b)P(m)P(i | b, m) Σ_{g'} P(g' | b, i, m) Σ_{j'} P(j' | g')]

= Σ_{g'} P(g' | b, i, m)P(j | g') / [Σ_{g'} P(g' | b, i, m) Σ_{j'} P(j' | g')]

= (0.9 × 0.9 + 0.1 × 0) / (0.9 × 0.1 + 0.1 × 0 + 0.9 × 0.9 + 0.1 × 1)

≈ 0.81
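The arithmetic above can be checked by brute-force summation over G. The sketch below hard-codes only the three CPT entries that appear in the derivation (P(g | b, i, m) = 0.9, P(j | g) = 0.9, P(j | ¬g) = 0), read off its arithmetic rather than from a full table:

```python
# Check P(j | b, i, m) by summing out G, using the CPT values that
# appear in the derivation above.
p_g = 0.9        # P(g | b, i, m)
p_j_g = 0.9      # P(j | g)
p_j_ng = 0.0     # P(j | ¬g)

# Numerator: Σ_{g'} P(g' | b, i, m) P(j | g')
num = p_g * p_j_g + (1 - p_g) * p_j_ng

# Denominator: Σ_{j', g'} P(g' | b, i, m) P(j' | g'). This ranges over
# every (j', g') assignment, so it sums to 1.
den = 0.0
for pg, pj in ((p_g, p_j_g), (1 - p_g, p_j_ng)):
    den += pg * pj + pg * (1 - pj)

print(num / den)  # ≈ 0.81
```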
(d) [10 pt] After someone has been found guilty, the president can pardon them and guarantee that they
cannot go to jail (though this is quite rare). Draw the new Bayes net with P = Presidential Pardon added
to the network. Introduce edges from and to P to reflect its dependence on other variables in the Bayes
net. Explain what conditional independence assumptions you made and why.
Then, fill in all conditional probabilities involving the new variable P with reasonable probabilities re-
flecting the real world. Explain how you chose the values in your probability table(s).
Add the node P and the directed edges G → P and P → J:
G P (P | G)
g 0.01
¬g 0
It is impossible for the president to pardon someone who is not found guilty, so P (p | ¬g) = 0. Given that
someone has been found guilty, it’s rare that the president pardons them, so P (p | g) = 0.01.
G P P (J | G, P )
g p 0
g ¬p 0.9
¬g p 0
¬g ¬p 0
P(B) = 0.1    P(E) = 0.1

B E P(A | B, E)
b e 0.8
b ¬e 0.6
¬b e 0.6
¬b ¬e 0.1

A P(J | A)
a 0.8
¬a 0.1

A P(M | A)
a 0.6
¬a 0.1
Apply the algorithm to the query P(B | j, m). You will have to eliminate two variables, in any order.
For both, include all of the following:
• 2 pt per variable: The name of the variable you are eliminating (e.g., A) and the variables involved
in the resulting factor (e.g., f1 (B, C))
• 2 pt per variable: The summation to calculate the factor (e.g., f1(B, C) = Σ_a P(B)P(a | B)P(C | a))
• 3 pt per variable: The values in the factor table.
Finally, you must multiply the remaining factors to get a final answer. Name the variables involved in the
final resulting factor (2 pts). Find the normalizing constant to make it a probability distribution (1 pt),
and write down the new probability table (3 pts).
The factors we initially begin with are P(B), P(E), P(A | B, E), P (j | A), P (m | A).
Eliminating variable E:
f1(A, B) = Σ_{e'} P(e')P(A | B, e') = P(e)P(A | B, e) + P(¬e)P(A | B, ¬e)
A B f1 (A, B)
a b 0.1 × 0.8 + 0.9 × 0.6 = 0.08 + 0.54 = 0.62
a ¬b 0.1 × 0.6 + 0.9 × 0.1 = 0.06 + 0.09 = 0.15
¬a b 0.1 × 0.2 + 0.9 × 0.4 = 0.02 + 0.36 = 0.38
¬a ¬b 0.1 × 0.4 + 0.9 × 0.9 = 0.04 + 0.81 = 0.85
We now have the factors P(B), f1 (A, B), P (j | A), and P (m | A) remaining.
Eliminating variable A:
f2(j, m, B) = Σ_{a'} f1(a', B)P(m | a')P(j | a') = f1(a, B)P(m | a)P(j | a) + f1(¬a, B)P(m | ¬a)P(j | ¬a)
B f2 (j, m, B)
b 0.62 × 0.6 × 0.8 + 0.38 × 0.1 × 0.1 ≈ 0.30
¬b 0.15 × 0.6 × 0.8 + 0.85 × 0.1 × 0.1 ≈ 0.08
Now the remaining factors are P(B) and f2 (j, m, B). Multiplying them together we get
B f3 (j, m, B)
b 0.1 × 0.30 = 0.030
¬b 0.9 × 0.08 = 0.072
We normalize this factor by dividing by the sum of its table entries: 0.03 + 0.072 = 0.102. This gives a
final table of
B P(B | j, m)
b 0.29
¬b 0.71
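The two elimination steps above can be replayed numerically. This sketch stores each factor as a dictionary keyed by assignments (the names f1, f2, f3 mirror the factors above; the CPT values are the ones given in the tables):

```python
# Variable elimination for P(B | j, m) with the CPTs from the tables above.
P_b, P_e = 0.1, 0.1
P_a = {('b', 'e'): 0.8, ('b', '~e'): 0.6,
       ('~b', 'e'): 0.6, ('~b', '~e'): 0.1}   # P(a | B, E)
P_j = {'a': 0.8, '~a': 0.1}                    # P(j | A)
P_m = {'a': 0.6, '~a': 0.1}                    # P(m | A)

# Eliminate E: f1(A, B) = Σ_e P(e) P(A | B, e)
f1 = {}
for b in ('b', '~b'):
    pa = P_e * P_a[(b, 'e')] + (1 - P_e) * P_a[(b, '~e')]
    f1[('a', b)] = pa
    f1[('~a', b)] = 1 - pa

# Eliminate A: f2(j, m, B) = Σ_a f1(a, B) P(m | a) P(j | a)
f2 = {b: sum(f1[(a, b)] * P_m[a] * P_j[a] for a in ('a', '~a'))
      for b in ('b', '~b')}

# Multiply in P(B) and normalize.
f3 = {'b': P_b * f2['b'], '~b': (1 - P_b) * f2['~b']}
Z = f3['b'] + f3['~b']
posterior = {b: v / Z for b, v in f3.items()}
print(posterior)  # b ≈ 0.29, ~b ≈ 0.71
```

The rounded hand computation gives 0.29; without intermediate rounding the answer comes out near 0.294.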
(b) [20 pt] Consider the Bayes net below. Suppose we are trying to compute the query P(X | e1 , e2 ). Assume
all variables are binary.
i. [4 pt] Suppose we choose to eliminate variables in the order A, B, C, D, F . What is the largest factor
resulting from summing out over one of these variables? How many entries are in its table? Assume
that we separately store pairs of numbers which we know sum to one.
There are two possible answers: f1(X, B, C, D), from eliminating variable A, or f2(C, X, D, F, e1),
from summing out B. Both have 2^4 = 16 entries in their tables.
ii. [16 pt] An optimal variable elimination ordering is one which minimizes the sum of the sizes of factors
generated. What is one optimal variable elimination ordering? For each variable eliminated in this
order, include the resulting factor and its size. The optimal ordering will be worth 6 points. Each
variable’s entry will be worth 2 points.
An optimal variable elimination order is any one which eliminates A last. For example, F, B, C, D, A:
• F → f1 (B) = 1: size 1 (or size 0)
• B → f2 (A, e1 ): size 2
• C → f3 (A, e2 , D): size 4
• D → f4 (A, e2 ): size 2
• A → f5 (X, e1 , e2 ): size 2
i. [5 pt] Suppose we are making the query P(A | D = d). Prove that C is irrelevant to this query using
the following steps:
• 1 pt: Write the full joint distribution as a product of the CPTs in the Bayes net
• 1 pt: Sum over this product and normalize to get P(A | D = d)
• 3 pt: Show algebraically that this expression does not depend on the variable C.
The full joint distribution is the product of the CPTs. Summing it over B and C and normalizing gives

P(A | d) = (1/α) Σ_{b',c'} P(A)P(b')P(c' | A)P(d | A, b')

= (1/α) Σ_{c'} P(c' | A) Σ_{b'} P(A)P(b')P(d | A, b')

= (1/α) Σ_{b'} P(A)P(b')P(d | A, b')

where the sum over c' drops out because Σ_{c'} P(c' | A) = 1.
Once we have built the P (A, d) table, we can simply sum over the entries and normalize by that value
to get P (A | d), without ever using the table P (C | A).
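The cancellation can also be confirmed numerically. The CPT values in this sketch are made up for illustration (no tables were given for this net); the point is that summing over C, or skipping C entirely, yields the same posterior:

```python
# Numerical check that C is irrelevant to the query P(A | d).
# All CPT numbers below are illustrative assumptions.
P_A = {True: 0.3, False: 0.7}
P_B = {True: 0.6, False: 0.4}
P_c_given_A = {True: 0.9, False: 0.2}                      # P(c | A)
P_d_given_AB = {(True, True): 0.8, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.1}   # P(d | A, B)

def posterior_A_given_d(include_C):
    unnorm = {}
    for a in (True, False):
        total = 0.0
        for b in (True, False):
            term = P_A[a] * P_B[b] * P_d_given_AB[(a, b)]
            if include_C:
                # Σ_{c'} P(c' | a) = 1, so this inner sum changes nothing.
                term = sum(term * p for p in (P_c_given_A[a], 1 - P_c_given_A[a]))
            total += term
        unnorm[a] = total
    Z = sum(unnorm.values())
    return {a: v / Z for a, v in unnorm.items()}

with_C = posterior_A_given_d(True)
without_C = posterior_A_given_d(False)
print(with_C, without_C)  # identical up to floating point
```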
ii. [5 pt] Suppose we are making the query P(C | D). Execute the first two steps in part i) for this query,
and then argue why B is not irrelevant.
P(C | D) = (1/α) Σ_{a',b'} P(a')P(b')P(C | a')P(D | a', b')
No factor involving B can be pulled out of the summation and cancelled: Σ_{b'} P(b')P(D | a', b') still depends on a', so it does not sum to 1 and B is not irrelevant to the query.
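This can be seen concretely: with any CPT values (the numbers below are illustrative assumptions), the factor obtained by summing out B differs across values of A, so it cannot cancel in the normalization.

```python
# The factor g(a) = Σ_b P(b) P(d | a, b) depends on a, so it cannot be
# pulled out of the sum over a'. CPT values here are illustrative only.
P_B = {True: 0.6, False: 0.4}
P_d_given_AB = {(True, True): 0.8, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.1}

g = {a: sum(P_B[b] * P_d_given_AB[(a, b)] for b in (True, False))
     for a in (True, False)}
print(g)  # g[True] != g[False]: the B-sum does not cancel in P(C | d)
```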
(b) [5 pt] The ancestor criterion says that any node which is not an ancestor of a query or evidence variable is
irrelevant. In the Bayes net below, query variables are indicated by a double circle and evidence variables
are shaded in. Cross out all the nodes that are irrelevant to this query according to the ancestor criterion.
(c) [10 pt] The moral graph of a Bayes net is an undirected graph containing all of the same connections as
the original Bayes net, plus edges that connect variables which shared a child in the original Bayes net.
Another criterion of irrelevance says that X is irrelevant to the query P(Q1 . . . Qn | e1 . . . en ) if in the
moral graph, every path between a query variable Qi and X goes through some evidence variable ej (i.e.,
X is m-separated from the query variables given the evidence variables).
For the following Bayes net, draw in the additional edges found in the moral graph. Then cross out all the
variables that are irrelevant to the query according to the m-separation criterion. Finally, list on the side
the variables that are not considered irrelevant by the ancestor criterion but are considered irrelevant by
the m-separation criterion.
A and B are considered irrelevant by the m-separation criterion but are not considered irrelevant by the
ancestor criterion.
(d) [5 pt] The Markov blanket of a variable X in a Bayes net is the set of X’s parents, children, and its
children’s parents. Explain why the moral graph can also be defined as an undirected graph in which
an edge (X, Y ) exists if and only if X is in the Markov blanket of Y and vice versa. (Be sure to clearly
explain both implications.)
(X, Y ) ∈ moral graph =⇒ X, Y in each other’s Markov blankets: If (X, Y ) is in the moral graph, it must
be the case that either
• X → Y or Y → X existed in the original Bayes net, or
• X → A and Y → A existed in the original Bayes net
In both of these cases X and Y are in each other's Markov blankets by the definition given above.
X, Y in each other's Markov blankets =⇒ (X, Y) ∈ moral graph: if X is in Y's Markov blanket, then X is a parent of Y, a child of Y, or a parent of one of Y's children. In the first two cases the directed edge between X and Y becomes an undirected edge of the moral graph; in the third case, X and Y share a child, so moralization adds the edge (X, Y). In every case (X, Y) is an edge of the moral graph.
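The equivalence can be checked mechanically on a small example. This sketch uses a made-up five-node DAG (the graph itself is an assumption, not from the problem) and verifies that the moral graph's edge set equals the set of pairs that lie in each other's Markov blankets:

```python
# Build the moral graph of a DAG and compare it against the set of
# mutually-Markov-blanket pairs. The DAG here is an illustrative example.
from itertools import combinations

parents = {'A': [], 'B': [], 'C': ['A'], 'D': ['A', 'B'], 'E': ['D']}
nodes = list(parents)

def moral_edges(parents):
    edges = set()
    for child, ps in parents.items():
        for p in ps:                      # keep every original edge
            edges.add(frozenset((p, child)))
        for p, q in combinations(ps, 2):  # "marry" parents of a shared child
            edges.add(frozenset((p, q)))
    return edges

def markov_blanket(x, parents):
    children = [n for n, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:                    # children's other parents
        blanket |= set(parents[c])
    blanket.discard(x)
    return blanket

mb_pairs = {frozenset((x, y)) for x, y in combinations(nodes, 2)
            if y in markov_blanket(x, parents)
            and x in markov_blanket(y, parents)}
print(moral_edges(parents) == mb_pairs)  # True for this example
```

Note the edge (A, B) appears only because A and B are married as co-parents of D, matching the second case of the proof.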