Midterm Solution
Summer 2013
Overall the class did very well on what was decidedly a medium-hard midterm.
The class average was a 72, above the predicted average of 70. One nice result
is that the grade distribution was skewed in the class's favor: the median was
a 78, and a healthy 66% of the class scored above the mean (something to think
about). The standard deviation was 20 points, a statistic that captures the long
tail of scores below the mean. If you scored under a 50, please talk to us!
[Figure: histogram of grades, plotting Number of Students against Grade Range]
If you want a re-grade, please return your midterm with a cover sheet explaining
what went wrong. We reserve the right to re-grade your entire exam.
There are a few heuristics that work for this problem; one example is
Euclidean distance.
You know that your opponent is actually running depth-2 minimax with a poor
heuristic function, using the result 80% of the time and moving randomly
otherwise. Would it be better to model this agent using minimax or expectimax?
True or False: At the end of the transition step (elapseTime), hidden Markov
model beliefs do not need to be normalized.
True. The new nodes in W could form a “v” structure which allows information
to flow.
True or False: If there were just a single particle in a particle filter, it would still
adequately incorporate observations.
Consider the original image (on the left). If we want to resize it to a smaller image
(on the right) we could scale the image. However, this would make the castle look
distorted.
The importance of a seam is the sum of the importance of its pixels. On the right
the least important seams are visualized. Note that the least important seams do
not cross the castle.
2a. Formalize the task of finding the least important vertical seam as a
deterministic search problem (15 points).
State Description:
Start State:
State.row = anything.
State.col = -1
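A minimal sketch of one possible formalization (not necessarily the official
one), following the start state above: col = -1 means no pixel has been chosen
yet, and the seam advances one column per step. The class name and the
importance array are illustrative assumptions.

class SeamSearchProblem:
    def __init__(self, importance, height, width):
        self.importance = importance  # importance[row][col], pixel importances
        self.height = height
        self.width = width

    def startState(self):
        # Virtual start: no row committed yet, one column left of the image.
        return (None, -1)

    def isGoal(self, state):
        row, col = state
        return col == self.width - 1  # the seam spans every column

    def successorsAndCosts(self, state):
        # Each successor extends the seam into the next column, moving at
        # most one row up or down. The edge cost is the importance of the
        # pixel entered, so a path's total cost equals its seam's importance.
        row, col = state
        if row is None:
            rows = range(self.height)
        else:
            rows = [r for r in (row - 1, row, row + 1) if 0 <= r < self.height]
        return [((r, col + 1), self.importance[r][col + 1]) for r in rows]

Running uniform cost search over this graph returns the minimum-cost path,
which traces the least important seam.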
2b. Circle all of the algorithms that are guaranteed to find the least important
seam, given your search formalization (3 points).
Bellman-Ford
2c. If your picture is 100 pixels high, give a lower bound on the worst-case
number of nodes BFS would expand (2 points).
There were several correct answers. Also note, we changed the number of
pixels on some exams to 200 to make sure nobody took a cheeky peek
at someone else's test. Nobody did :).
1. Since searches keep a visited set, 100 * width is a lower bound.
2. If you run a search algorithm without a visited set and only check
terminal conditions once an element has been removed from the
container: w · 3^100 (the w is for the first move).
3. If you run a search algorithm without a visited set and check terminal
conditions when you generate successors: w · 3^99 (the w is for the first
move).
Negotiation assumptions:
• The central assumption is that both agents in the negotiation care only
about maximizing their own utility.
• We also assume that the two players do not collude; in other words, they
do not decide together on a set of actions.
In this problem we will analyze how to choose an optimal action in a negotiation
game without zero-sum rewards.
3a. In the game tree below, player one's decisions are circles and player two's
decisions are squares. The leaf nodes are annotated with utility tuples: the first
number in each tuple is the utility for player one, the second is the utility for
player two. Propagate utility tuples up from the bottom. At each point in time,
assume that a player greedily chooses the action that leads to the state that
maximizes their own utility (5 points).
[Figure: game tree with utility tuples propagated to internal nodes. Root: (7, 5);
other internal nodes: (7, 5), (4, 2), (7, 5), (2, 4), (9, 1), (4, 2)]
3b. The state with score (8, 8) has the highest sum of scores. Assuming that
there is no collusion between the agents, do you think it would be a better option
for the first player (circle) to choose the left or right action from the root node?
Explain briefly (2 points).
Left. Since the other player is self-maximizing, it's best for the circle player
to move left. Then we assume the other player will also move left, and we
will be left with a score of 7.
If we moved right, we assume the other player would also move right
(rationally: if they moved left, they would assume we would move right,
leaving them with a score of 1). Then on our second turn we would move
right and end up with a score of 4.
If the players could collude, everyone would have done better. Working
together is the way to go :).
3c. Write a getUtility function which returns a utility tuple: the first number is
the utility for player one, the second is the utility for player two (15 points). You
can use the following helper functions:
actions = getLegalActions(state)
nextState = getNextState(oldState, action)
state.isTerminal()
(utilP1, utilP2) = state.terminalUtility()
terminalUtility can only be called on terminal states; it returns a tuple of
utilities, where utilP1 is the utility for player one and utilP2 is the utility for
player two.
# Function: getUtility
# -------------------------
# Return the maximax tuple for a given state:
# (utilPlayerOne, utilPlayerTwo).
# playerOne is a Boolean variable which is True iff
# it is playerOne's turn.
def getUtility(state, playerOne):
    # base case
    if state.isTerminal():
        return state.terminalUtility()
    # recursive case
    actions = getLegalActions(state)
    bestUtil = (float('-inf'), float('-inf'))
    # the current player compares utility tuples by their own component
    myIndex = 0 if playerOne else 1
    for action in actions:
        nextState = getNextState(state, action)
        utility = getUtility(nextState, not playerOne)
        if utility[myIndex] > bestUtil[myIndex]:
            bestUtil = utility
    return bestUtil
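For example, called on the root of the game tree from 3a with playerOne=True,
this recursion propagates tuples exactly as in that figure and returns (7, 5).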
3d. In minimax we were able to use alpha-beta pruning to eliminate some states
from consideration. Would it be possible to prune states in this algorithm?
Explain briefly (3 points).
Being able to predict whether growing conditions will lead to a healthy harvest is
a hard problem in southern Kenya, where there is a unique set of climate
conditions. We are going to use the best Bayes net we can.
Initially we tried to model harvest prediction using the naïve Bayes net (above),
but it was not very good at predicting harvests. Our intuition is that naïve Bayes
would have been a good model except that the "feature" variables (A through E)
probably don't fit the assumption of conditional independence given harvest.
To address this problem we are going to break the naïve Bayes assumption and
allow extra arcs between the feature variables. Which arcs? Since we aren't
experts, we are going to let an algorithm pick the arcs that best match our
dataset. Learning Bayesian networks from data is called structure learning!
The conditional mutual information (CMI) between two features X and Y given
the harvest variable H is:

$$\mathrm{CMI}(X, Y \mid H) = \sum_{x, y, h} P(X = x, Y = y, H = h) \log \frac{P(X = x, Y = y \mid H = h)}{P(X = x \mid H = h)\, P(Y = y \mid H = h)}$$
Tree Augmented Naïve Bayes network: Let S be the sum of CMI over a set of
node pairs. Choose arcs to add to naïve Bayes such that the set of node pairs of
the arcs maximizes S, given the constraint that no feature variable can have
more than two parents (one being the harvest variable).
Tree Augmented Naïve Bayes outperforms naïve Bayes and still has a simple
structure that is easy to run inference on.
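As a concrete illustration (a sketch, not the exam's required answer): the
two-parent constraint means the chosen arcs form a tree over the features, so
one standard approach is to build a maximum-weight spanning tree using CMI
values as edge weights. The function name and the cmi dict below are
assumptions.

def pick_tan_arcs(features, cmi):
    # Kruskal's algorithm with edges sorted by decreasing CMI yields a
    # maximum-weight spanning tree, i.e., an arc set maximizing S.
    parent = {f: f for f in features}  # union-find forest

    def find(f):
        # Walk to the representative of f's component (with path halving).
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    arcs = []
    for (x, y) in sorted(cmi, key=cmi.get, reverse=True):
        rootX, rootY = find(x), find(y)
        if rootX != rootY:  # keep the arc only if it creates no cycle
            parent[rootX] = rootY
            arcs.append((x, y))
    return arcs

Directing the resulting tree's edges away from an arbitrary root then gives each
feature at most one feature parent, in addition to the harvest variable.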
4a. First let's explore what goes wrong when the naïve Bayes assumption is
wrong. Consider a simple naïve Bayes model with two feature variables A and B.
Assume A is not conditionally independent of B given C (6 points).
i. Express $P(C = c \mid A = a, B = b)$ for the left (naïve Bayes) model in terms
of implied conditional probability tables. You may use a normalization constant.

$$P(C = c \mid A = a, B = b) = \alpha\, P(A = a, B = b, C = c) = \alpha\, P(C = c)\, P(A = a \mid C = c)\, P(B = b \mid C = c)$$
ii. Express $P(C = c \mid A = a, B = b)$ for the right model in terms of implied
conditional probability tables. You may use a normalization constant.
$$P(C = c \mid A = a, B = b) = \alpha\, P(A = a, B = b, C = c) = \alpha\, P(C = c)\, P(A = a \mid C = c)\, P(B = b \mid C = c, A = a)$$
$$P_1 = \alpha\, P(C = c)\, P(A = a \mid C = c)\, P(B = b \mid C = c)$$
$$P_2 = \alpha\, P(C = c)\, P(A = a \mid C = c)\, P(B = b \mid C = c, A = a)$$

Since A is not conditionally independent of B given C,
$P(B = b \mid C = c) \neq P(B = b \mid C = c, A = a)$ in general, and it follows
that $P_1 \neq P_2$.
4b. Let's augment our naïve Bayes net! In order to learn the augmented naïve
Bayes network, we must compute the conditional mutual information (CMI)
between each pair of features. Explain how we can compute the different parts of
the mutual information equation using data. Be concise (6 points):
$P(X = x, Y = y, H = h)$
$P(X = x, Y = y \mid H = h)$
$P(X = x \mid H = h)$
$P(Y = y \mid H = h)$
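One natural approach (a sketch, not necessarily the intended answer) is to
estimate each term by relative-frequency counts over the dataset. The function
and variable names below are assumptions; records is a list of (x, y, h)
values, one per data point.

from collections import Counter

def estimate_terms(records):
    n = len(records)
    joint = Counter(records)                      # count(x, y, h)
    h_counts = Counter(h for _, _, h in records)  # count(h)
    xh = Counter((x, h) for x, _, h in records)   # count(x, h)
    yh = Counter((y, h) for _, y, h in records)   # count(y, h)

    def p_xyh(x, y, h):         # P(X=x, Y=y, H=h) ~ count(x, y, h) / N
        return joint[(x, y, h)] / n

    def p_xy_given_h(x, y, h):  # P(X=x, Y=y | H=h) ~ count(x, y, h) / count(h)
        return joint[(x, y, h)] / h_counts[h]

    def p_x_given_h(x, h):      # P(X=x | H=h) ~ count(x, h) / count(h)
        return xh[(x, h)] / h_counts[h]

    def p_y_given_h(y, h):      # P(Y=y | H=h) ~ count(y, h) / count(h)
        return yh[(y, h)] / h_counts[h]

    return p_xyh, p_xy_given_h, p_x_given_h, p_y_given_h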
4c. State at least one formalization method we have learned in class that you
could use to model the problem of choosing arcs to add to our naïve Bayes net.
The chosen arcs should maximize the sum of mutual information S given the
stated constraints. Do not formalize the problem (2 points).
4d. Applying the algorithm to our dataset of Kenyan harvests, we learn the
following Tree Augmented Naïve Bayesian network (6 points):
i. Is D independent of F given C?
Yes.
At each time slice, we observe a segment of sound from which we can extract
the pitch contour: how the speaker's pitch changes over the sound segment. Let
the pitch contour at time t be $C_t$. The domain of pitch contour is {angular,
glideUp, descending, flat, irregular}.
We would like to know the emotion at each time slice, $E_t$. The domain of
emotions is {sadness, surprise, joy, disgust, anger, fear}.
5a. Formalize the task of tracking emotion using a Hidden Markov Model (6
points).
The hidden variables are the emotions $E_t$ (6 values); the observed variables
are the pitch contours $C_t$ (5 values). The transition distribution
$P(E_t \mid E_{t-1})$ has 6 x 6 = 36 entries, and the emission distribution
$P(C_t \mid E_t)$ has 5 x 6 = 30 entries.
5b. We want to know if the set of observations seems plausible given our emission
and transition distributions. Write a formula to calculate the probability of seeing a
particular sequence of observations over n time slices. Use probabilities from the
conditional probability distributions implied by the hidden Markov model. You can
use a normalization constant. Note: this may include a huge sum (7 points).
$$P(C_1 = c_1, C_2 = c_2, \ldots, C_n = c_n) = \sum_{e_1, \ldots, e_n} P(E_1 = e_1, \ldots, E_n = e_n, C_1 = c_1, \ldots, C_n = c_n)$$

The joint can be expressed as the product of all the CPTs in the Bayes net:

$$P(C_1 = c_1, \ldots, C_n = c_n) = \sum_{e_1, \ldots, e_n} P(E_1 = e_1)\, P(C_1 = c_1 \mid E_1 = e_1) \prod_{t=2}^{n} P(E_t = e_t \mid E_{t-1} = e_{t-1})\, P(C_t = c_t \mid E_t = e_t)$$
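Evaluating this sum directly enumerates all 6^n emotion sequences; the
standard forward algorithm computes the same quantity in O(n · 6^2) time. A
minimal sketch, assuming dict-based prior, trans, and emit tables (these
names and shapes are assumptions):

def observation_likelihood(observations, emotions, prior, trans, emit):
    # alpha[e] = P(C_1..C_t, E_t = e), updated one time slice at a time.
    alpha = {e: prior[e] * emit[e][observations[0]] for e in emotions}
    for c in observations[1:]:
        alpha = {
            e: emit[e][c] * sum(alpha[ep] * trans[ep][e] for ep in emotions)
            for e in emotions
        }
    return sum(alpha.values())  # marginalize out the final emotion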
5c. Our agent knows about two different personality types, R = x and R = y. For
each personality type we have a unique transition probability table; different
personality types have different ways of changing between emotions.
Using your algorithm from part (b), it is possible to compute the probability of
seeing the observations given the transition table for x ($\theta_x$) and the
probability of seeing the observations given the transition table for y ($\theta_y$).
Using the terms in the table, what is the probability of the human's personality
trait being R = x, given a particular sequence of observations? Hint: use Bayes'
theorem. You can use a normalization constant (7 points).
$$P(R = x \mid C_1 = c_1, C_2 = c_2, \ldots, C_n = c_n) = \alpha\, \theta_x\, \phi_x$$

where $\theta_x = P(C_1 = c_1, \ldots, C_n = c_n \mid R = x)$ is the observation
probability from part (b) and $\phi_x = P(R = x)$ is the prior probability of
personality type x.
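A tiny sketch of the computation, assuming theta_x and theta_y come from part
(b) and phi_x and phi_y are the personality priors (all names are assumptions):

def personality_posterior(theta_x, phi_x, theta_y, phi_y):
    unnorm_x = theta_x * phi_x            # P(C | R = x) P(R = x)
    unnorm_y = theta_y * phi_y            # P(C | R = y) P(R = y)
    alpha = 1.0 / (unnorm_x + unnorm_y)   # the normalization constant
    return alpha * unnorm_x               # P(R = x | observations)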