
Final Examination

CS 540-3: Introduction to Artificial Intelligence


May 15, 2015

LAST NAME:

SOLUTIONS

FIRST NAME:

Problem    Score          Max Score
1          ___________    12
2          ___________    13
3          ___________    11
4          ___________    12
5          ___________    12
6          ___________    14
7          ___________    9
8          ___________    17
Total      ___________    100


Question 1. [12] Constraint Satisfaction


You are planning a menu for friends and you've narrowed down the choices for each of the four
courses, appetizer (A), beverage (B), main course (M), and dessert (D), as follows:
A: veggies (v) or salad (s)
B: water (w), beer (b), or milk (m)
M: fish (f), hamburger (h), or pasta (p)
D: tart (t), ice cream (i), or cheese (c)

Each person gets the same menu consisting of one item in each course. Dietary restrictions of the
guests imply the following constraints:
(i) The appetizer must be veggies or the main course must be pasta or fish.
(ii) If you serve salad, the beverage must be water.
(iii) You must serve at least one of milk, ice cream or cheese.
(a) [4] Draw the constraint graph and the initial domain of each variable associated with this
problem. That is, show a graph with 4 nodes labeled A, B, M and D and arcs connecting
appropriate pairs of nodes based on the constraints; show the domains beside each node.

The constraint graph is M - A - B - D, and the initial domains are A = {v, s}, B = {w, b, m}, M = {f, h, p}, and D = {t, i, c}.

(b) [4] Say we decide to have the appetizer be salad, i.e., A = s. What are the domains of all the
variables after applying the Forward Checking inference algorithm (but no backtracking
search)?
Eliminate values b, m and h, resulting in A = {s}, B = {w}, M =
{f, p}, and D = {t, i, c}

(c) [4] Instead of using Forward Checking in (b), say we initially set A = s and then apply the
Arc-Consistency algorithm (AC-3) (but no backtracking search). What are the domains of
all the variables after it halts?
A = {s}, B = {w}, M = {f, p}, D = {i, c}. t is eliminated in
addition to the values eliminated in (b) because there is no
value for B that is consistent with t at D based on constraint
(iii).
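As a sanity check (not the AC-3 algorithm itself), a brute-force enumeration of all menus consistent with the three constraints, with A fixed to s, yields exactly the surviving domains given in (c):

```python
# Brute-force check of the menu CSP with A fixed to salad (s).
from itertools import product

domains = {"A": ["s"],                  # A fixed to salad
           "B": ["w", "b", "m"],
           "M": ["f", "h", "p"],
           "D": ["t", "i", "c"]}

def consistent(a, b, m, d):
    c1 = (a == "v") or (m in ("p", "f"))   # (i) veggies, or pasta/fish main
    c2 = (a != "s") or (b == "w")          # (ii) salad -> water
    c3 = (b == "m") or (d in ("i", "c"))   # (iii) milk, ice cream, or cheese
    return c1 and c2 and c3

solutions = [combo for combo in product(*domains.values())
             if consistent(*combo)]

# Values of each variable that appear in at least one consistent menu
surviving = {var: sorted({sol[i] for sol in solutions})
             for i, var in enumerate(domains)}
print(surviving)  # {'A': ['s'], 'B': ['w'], 'M': ['f', 'p'], 'D': ['c', 'i']}
```

For this tiny problem the globally consistent values happen to coincide with the arc-consistent domains AC-3 leaves, which is why the check matches the answer above.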


Question 2. [13] Neural Networks


(a) [3] Can a Perceptron that uses a fixed threshold value of 0 and no extra input with a bias
weight learn the AND function of 3 binary inputs? That is, if all 3 inputs are 1s, the output is
1; otherwise the output is 0. If so, construct one; if not, explain why there is none.
No. With a fixed threshold of 0 and no bias input, the Perceptron's
decision boundary is the plane w1x1 + w2x2 + w3x3 = 0, which always
passes through the origin. In particular, the input (0, 0, 0) then
always produces output 1 (since 0 >= 0), but AND(0, 0, 0) = 0, so no
choice of weights can realize the AND function.

(b) [5] Consider a Perceptron with 3 inputs, x1, x2, x3, and one output unit that uses a linear
threshold unit (LTU) as its activation function. Assume initial weights w1 = 0.2, w2 = 0.7, w3 =
0.9, learning rate 0.2, and bias weight w4 = -0.7 on a fixed input of +1 (and the LTU itself uses
a fixed threshold value of 0). Hence we have:

[Figure: inputs x1, x2, x3 feed the output unit through weights w1, w2, w3; a constant +1 input feeds it through bias weight w4; the LTU threshold is 0.]

(i) [2] Given the inputs x1=1, x2=0, x3=1, what is the output of this Perceptron? Show
your work.
The output is 1 because (.2)(1) + (.7)(0) + (.9)(1) + (-.7)(1) = 0.4 >= 0.

(ii) [3] What are the four updated weights' values after applying the Perceptron Learning
Rule with the above input and teacher output 0? Show your work.

T = 0 and O = 1, so update the weights using wi <- wi + (.2)(0 - 1)xi = wi - (.2)xi. Thus, the new weights are w1 = 0.2 + (-.2)(1) = 0.0,
w2 = 0.7 + (-.2)(0) = 0.7, w3 = 0.9 + (-.2)(1) = 0.7, and w4
= -0.7 + (-.2)(1) = -0.9
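The arithmetic in (i) and (ii) can be reproduced with a minimal sketch of the LTU and one Perceptron Learning Rule step:

```python
# One LTU forward pass and one Perceptron Learning Rule update,
# reproducing the numbers in parts (b)(i) and (b)(ii).
def ltu_output(weights, inputs):
    # Output 1 iff the weighted sum (bias weight included) is >= threshold 0
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else 0

weights = [0.2, 0.7, 0.9, -0.7]   # w1, w2, w3, bias w4
inputs = [1, 0, 1, 1]             # x1, x2, x3, constant +1 bias input
eta, teacher = 0.2, 0

out = ltu_output(weights, inputs)              # sum is 0.4 >= 0, so out == 1
weights = [round(w + eta * (teacher - out) * x, 10)  # wi <- wi + eta(T - O)xi
           for w, x in zip(weights, inputs)]         # (rounded: float noise)
print(out, weights)   # 1 [0.0, 0.7, 0.7, -0.9]
```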


(c) [2] Which one of the following best describes the process of learning in a multilayer, feed-forward neural network that uses back-propagation learning?
(i) Activation values are propagated from the input nodes through the hidden layers to the output nodes.
(ii) Activation values are propagated from the output nodes through the hidden layers to the input nodes.
(iii) Weights on the arcs are modified based on values propagated from input nodes to output nodes.
(iv) Weights on the arcs are modified based on values propagated from output nodes to input nodes.
(v) Arcs in the network are modified, gradually shortening the path from input nodes to output nodes.
(vi) Weights on the arcs from the input nodes are compared to the weights on the arcs coming into the output nodes, and then these weights are modified to reduce the difference.

(iv)

(d) [3] Why don't multilayer, feed-forward neural networks use an LTU as the activation function
at nodes? And what do they use instead of an LTU?
Learning in multi-layer feed-forward neural networks uses the
back-propagation algorithm, which does gradient descent in
weight space. Gradient descent requires computing the
derivative of the activation function. But LTUs have a
discontinuity at the threshold value and therefore the
derivative is not defined there. So, we use a continuous
function such as the Sigmoid that is differentiable everywhere.
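As a small illustration of the point above, the Sigmoid and its derivative can be written directly; the identity sigma'(x) = sigma(x)(1 - sigma(x)) is what back-propagation exploits:

```python
# The Sigmoid is smooth, so its derivative exists everywhere,
# unlike the LTU's step function which is not differentiable at 0.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # sigma'(x) = sigma(x)(1 - sigma(x))

print(sigmoid(0.0), sigmoid_deriv(0.0))  # 0.5 0.25
```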


Question 3. [11] Probabilistic Reasoning


A barrel contains many balls, some of which are red and the rest are blue. 40% of the balls are
made of metal and the rest are made of wood. 30% of the metal balls are colored blue, and 10% of
the wood balls are colored blue. Let Boolean random variable B mean a ball is blue (so ¬B means
a ball is red), and M mean a ball is metal (so ¬M means it is wood).
(a) [3] What is the probability that a wood ball is red, i.e., P(¬B | ¬M)?
P(¬B | ¬M) = 1 - P(B | ¬M) = 1 - 0.1 = 0.9

(b) [4] What is the prior probability that a ball is blue, i.e., P(B)?

P(B) = P(B|M)P(M) + P(B|¬M)P(¬M) = .3 * .4 + .1 * (1 - .4) = 0.18

(c) [4] What is the posterior probability that a blue ball is metal, i.e., P(M | B)?

P(M | B) = P(B | M) P(M) / P(B) = .3 * .4 / .18 = 0.67
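All three parts follow mechanically from the three given numbers; a direct computation:

```python
# Parts (a)-(c) computed from the problem's givens.
p_M = 0.4             # P(metal)
p_B_given_M = 0.3     # P(blue | metal)
p_B_given_notM = 0.1  # P(blue | wood)

p_notB_given_notM = 1 - p_B_given_notM                # (a) complement rule
p_B = (p_B_given_M * p_M                              # (b) total probability
       + p_B_given_notM * (1 - p_M))
p_M_given_B = p_B_given_M * p_M / p_B                 # (c) Bayes' rule

print(p_notB_given_notM, round(p_B, 2), round(p_M_given_B, 2))  # 0.9 0.18 0.67
```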


Question 4. [12] Naïve Bayes


Consider the problem of detecting if an email message contains a Virus. Say we use four random
variables to model this problem: Boolean class variable V indicates if the message contains a virus
or not, and three Boolean feature variables: A, B and C. We decide to use a Naïve Bayes Classifier
to solve this problem so we create a Bayesian network with arcs from V to each of A, B and C. Their
associated CPTs are created from the following data: P(V) = 0.2, P(A|V) = 0.8, P(A|¬V) = 0.4,
P(B|V) = 0.3, P(B|¬V) = 0.1, P(C|V) = 0.1, P(C|¬V) = 0.6

(a) [4] Compute P(¬A, B, C | V)


From above, we're given:

V      P(A | V)  P(B | V)  P(C | V)  P(V)
False  0.4       0.1       0.6       0.8
True   0.8       0.3       0.1       0.2

P(¬A, B, C | V) = (1 - P(A|V)) P(B|V) P(C|V) = (.2)(.3)(.1) = 0.006

(b) [4] Compute P(A, ¬B, ¬C)


P(A, ¬B, ¬C) = P(A | V) P(¬B | V) P(¬C | V) P(V)
             + P(A | ¬V) P(¬B | ¬V) P(¬C | ¬V) P(¬V)
             = (.8)(.7)(.9)(.2) + (.4)(.9)(.4)(.8)
             = 0.216

(c) [4] Compute P(V | A, ¬B, ¬C)


P(V | A, ¬B, ¬C) = P(A, ¬B, ¬C | V) P(V) / P(A, ¬B, ¬C)
                 = P(A | V) P(¬B | V) P(¬C | V) P(V) / .216 = 0.467
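The three answers can be checked with a short Naïve Bayes computation over the CPT numbers given above:

```python
# Naive Bayes computations for parts (a)-(c), using the given CPTs.
p_V = 0.2
p_A = {True: 0.8, False: 0.4}   # P(A = true | V)
p_B = {True: 0.3, False: 0.1}   # P(B = true | V)
p_C = {True: 0.1, False: 0.6}   # P(C = true | V)

def joint_given_v(a, b, c, v):
    # P(A=a, B=b, C=c | V=v) under the Naive Bayes independence assumption
    pa = p_A[v] if a else 1 - p_A[v]
    pb = p_B[v] if b else 1 - p_B[v]
    pc = p_C[v] if c else 1 - p_C[v]
    return pa * pb * pc

# (a) P(not-A, B, C | V)
part_a = joint_given_v(False, True, True, True)
# (b) P(A, not-B, not-C), marginalizing over V
part_b = (joint_given_v(True, False, False, True) * p_V
          + joint_given_v(True, False, False, False) * (1 - p_V))
# (c) P(V | A, not-B, not-C) by Bayes' rule
part_c = joint_given_v(True, False, False, True) * p_V / part_b

print(round(part_a, 3), round(part_b, 3), round(part_c, 3))  # 0.006 0.216 0.467
```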


Question 5. [12] Bayesian Networks


Consider the following Bayesian Network containing 5 Boolean random variables:

(a) [4] Write an expression for computing P(A, S, H, E, C) given only information that is in the
associated CPTs for this network.
P(A) P(S) P(H) P(E | A, S, H) P(C | E, H)

(b) [2] True or False: P(A, C, E) = P(A) P(C) P(E)


False. This would only be true if A, C, and E were mutually independent,
which they are not here, given the structure of the Bayesian network.

(c) [3] How many numbers must be stored in total in all CPTs associated with this network
(excluding numbers that can be calculated from other numbers)?
1 + 1 + 1 + 2³ + 2² = 15 (A, S, H each need 1 number; E has 3 parents, so 2³ = 8; C has 2 parents, so 2² = 4)

(d) [3] C is conditionally independent of ____A and S____ given ____E and H_____
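The count in (c) generalizes: a Boolean node with k Boolean parents needs 2^k stored numbers (the complements are derivable). A quick sketch, using the parent sets of this network:

```python
# Counting stored CPT entries: each Boolean node with k Boolean
# parents stores 2**k numbers; complementary probabilities are derived.
parents = {"A": [], "S": [], "H": [],
           "E": ["A", "S", "H"],   # E's CPT: 2**3 = 8 rows
           "C": ["E", "H"]}        # C's CPT: 2**2 = 4 rows

total = sum(2 ** len(ps) for ps in parents.values())
print(total)  # 15
```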


Question 6. [14] Hidden Markov Models


I have two coins, D1 and D2. At each turn I pick one of the two coins but don't know which one I
picked. Which coin is selected is assumed to obey a first-order Markov assumption. After I pick a
coin, I flip it and observe whether it's a Head or a Tail. The two coins are not "fair," meaning that
they each don't have equal likelihood of turning up heads or tails. The complete situation, including
the probability of which coin I pick first is given by the following HMM, where D1 and D2 are the two
hidden states, and H and T are the two possible observable values.

(a) [4] Compute P(q1 = D1, q2 = D1, q3 = D2, q4 = D2)


P(q1=D1, q2=D1, q3=D2, q4=D2)
= P(q4=D2 | q3=D2) P(q3=D2 | q2=D1) P(q2=D1 | q1=D1) P(q1=D1)
= (.3)(.4)(.6)(.3)
= 0.0216

(b) [5] Compute P(o1 = H, o2 = T, q1 = D2, q2 = D2)


P(o1=H,o2=T,q1=D2,q2=D2)
= P(q1=D2) P(q2=D2 | q1=D2) P(o1=H | q1=D2) P(o2=T | q2=D2)
= (.7)(.3)(.4)(.6)
= 0.0504

(c) [5] Compute P(q1 = D1 | o1 = H)


P(q1=D1|o1=H) = P(o1=H|q1=D1) P(q1=D1) / P(o1=H)
= P(o1=H|q1=D1) P(q1=D1) / [P(o1=H|q1=D1) P(q1=D1)
+ P(o1=H|q1=D2) P(q1=D2)]
= ((.55)(.3))/[(.55)(.3) + (.4)(.7)]
= 0.37
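The HMM figure is not reproduced in this text, but its parameters can be read off the solution arithmetic above; the values below are that reconstruction (the remaining entries follow by complement), so treat them as inferred rather than quoted:

```python
# HMM parameters reconstructed from the solution arithmetic (the
# original figure is missing); complements fill the remaining entries.
init = {"D1": 0.3, "D2": 0.7}                    # P(q1)
trans = {"D1": {"D1": 0.6, "D2": 0.4},           # P(q_{t+1} | q_t)
         "D2": {"D1": 0.7, "D2": 0.3}}
emit = {"D1": {"H": 0.55, "T": 0.45},            # P(o_t | q_t)
        "D2": {"H": 0.4, "T": 0.6}}

# (a) probability of the state sequence D1, D1, D2, D2
a = init["D1"] * trans["D1"]["D1"] * trans["D1"]["D2"] * trans["D2"]["D2"]
# (b) joint probability of states D2, D2 with observations H, T
b = init["D2"] * trans["D2"]["D2"] * emit["D2"]["H"] * emit["D2"]["T"]
# (c) posterior P(q1 = D1 | o1 = H) by Bayes' rule
num = emit["D1"]["H"] * init["D1"]
c = num / (num + emit["D2"]["H"] * init["D2"])

print(round(a, 4), round(b, 4), round(c, 2))  # 0.0216 0.0504 0.37
```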


Question 7. [9] Face Recognition using Eigenfaces


(a) [6] The Eigenfaces algorithm for face recognition projects an input image represented as a point
in an n-dimensional image space to a point in an m-dimensional subspace called face space.
(i) [3] To what does each dimension in image space correspond?

Each dimension in image space corresponds to a pixel

(ii) [3] To what does each dimension in face space correspond?

Each dimension in face space corresponds to one eigenface. Each
eigenface is an eigenvector of size n × 1 corresponding to one of
the m largest eigenvalues computed by the method.

(b) [3] What classification method is used by the Eigenfaces algorithm to recognize the face in a
test image?

Nearest neighbor classifier


Question 8. [17] Propositional Logic


(a) [3] If α |= β and β |= α, what does this mean about the relation between the two PL sentences
α and β?
They are logically equivalent

(b) [3] Given two arbitrary sentences α and β in PL, if α |= β then α ∧ ¬β is _______________
(one word answer)
Unsatisfiable / contradiction / inconsistent / False under all
interpretations

(c) [3] Is the sentence ((P → Q) ∧ Q) → P valid, satisfiable, or unsatisfiable? Briefly explain how
you determined your answer.

Satisfiable (but not valid), since when P=T and Q=T the sentence is
True, but when P=F and Q=T, the sentence is False
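The connectives in (c) were partly lost in extraction; reading the sentence as ((P → Q) ∧ Q) → P matches the truth assignments cited in the solution, and a two-variable truth table confirms it:

```python
# Truth-table check that ((P -> Q) and Q) -> P is satisfiable but not valid.
from itertools import product

def implies(a, b):
    return (not a) or b

def sentence(P, Q):
    return implies(implies(P, Q) and Q, P)

values = [sentence(P, Q) for P, Q in product([True, False], repeat=2)]
satisfiable = any(values)   # True: e.g. P=T, Q=T makes it True
valid = all(values)         # False: P=F, Q=T makes it False
print(satisfiable, valid)   # True False
```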

(d) [3] Prove whether or not (P → Q) |= (P ∧ Q)


Not true, because one of the models of P → Q has P=F and Q=F, which
does not satisfy P ∧ Q

(e) [5] Given the three PL sentences: P ∨ Q, P ∨ ¬Q, ¬P ∨ Q defining a KB, prove the goal
sentence (P ∧ Q) using the Resolution Refutation algorithm. Show your answer as a proof tree.

1. ¬P ∨ ¬Q    negation of the goal sentence
2. P ∨ Q      KB
3. P ∨ ¬Q     KB
4. ¬P ∨ Q     KB
5. ¬Q         Resolution rule with 1 and 3
6. Q          Resolution rule with 2 and 4
7. False      Resolution rule with 5 and 6
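The disjunction signs in (e) were lost in extraction; the clause set above is reconstructed so that each resolution step cited in the proof goes through. As a cross-check, a truth-table sweep confirms that this KB semantically entails P ∧ Q, which is what the refutation establishes:

```python
# Truth-table check that KB = {P|Q, P|~Q, ~P|Q} entails P & Q.
from itertools import product

def kb(P, Q):
    return (P or Q) and (P or not Q) and (not P or Q)

def goal(P, Q):
    return P and Q

# Entailment: the goal holds in every model of the KB
entails = all(goal(P, Q) for P, Q in product([True, False], repeat=2)
              if kb(P, Q))
print(entails)  # True
```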
