FX 12 Sol
Problem 1: 15 points
The following problem is known as the BETWEENNESS problem. You are given a universal set U of elements and a collection of constraints of the form Between(x,y,z), where x, y, z are specified elements of U, meaning that either x < y < z or x > y > z. The problem is to find a linear ordering of the elements satisfying all the constraints (or, as a decision problem, to determine whether such an ordering exists). The problem is NP-complete.

For instance, suppose that U is the set {a,b,c,d,e,f,g,h} and you are given the following constraints:

Between(d,a,h) Between(f,c,g) Between(a,b,c) Between(a,h,f)
Between(h,g,d) Between(a,c,h) Between(a,e,f) Between(f,h,c)

One solution for these constraints is the ordering d,a,g,b,c,h,e,f.

A. Describe a state space that would be suitable for an exhaustive blind search. Your description should specify (a) what a state is; (b) what the successors to a given state are; (c) what the start state is; (d) how goal states are recognized.

Answer:
State: The first k items in the list.
Successors: Add the (k+1)st item, in such a way that none of the Between constraints are violated.
Start state: The null list.
Goal state: All the items are in the list, with all the constraints satisfied.

B. In the state space in (A), what is the depth of the space? What is the branching factor? Is the space a tree? Which search strategy would be most suitable: depth-first search, breadth-first search, or iterative deepening?

Answer: Depth: k. Branching factor: k. Tree? Yes. Strategy: DFS.

C. The BETWEENNESS problem can be transformed into a satisfiability problem, using propositional atoms of the form Element-Place, meaning that the specified element is in the specified place. In the above example, there would be 64 atoms a1 ... a8, b1 ... h8, and the valuation corresponding to the above solution would be d1=TRUE, a2=TRUE, ..., f8=TRUE, with all the other atoms FALSE. Describe the constraints over these atoms needed to solve the problem.

Your description should apply to any problem instance, not just to the one illustrated. For each category of constraint, give one instance of the constraint from the above example (or describe the instance; in some formulations, the constraints get very long). Your formulation should generate a set of propositional sentences that is polynomial in the size of the problem; however, the sentences do not have to be in conjunctive normal form.
Answer:
Category 1: Every element is somewhere in the list. For each element x, assert that it is at one of the positions. Example: a1 V a2 V ... V a8.
Category 2: No element is at two places in the list. For each element x and pair of places i < j, assert ~(xi ^ xj). Example: ~(a1 ^ a2).
Category 3: No place has two elements. For each place i and each pair of elements x ≠ y, assert ~(xi ^ yi). Example: ~(a1 ^ b1).
Category 4: The betweenness constraints. For each betweenness constraint Between(x,y,z) and each pair of indices i, k that are at least 2 apart, assert that, if xi and zk, then y is at one of the positions between i and k. Example: The constraint Between(d,a,h) gives rise to (among others) the sentence (d7 ^ h3) => (a4 V a5 V a6). For adjacent indices i, k there is no position between them, so assert ~(xi ^ zk). Example: ~(d4 ^ h5).
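The state space from parts A and B can be sketched as a short depth-first search. This is only an illustration under stated assumptions: the helper names `violates` and `solve` are hypothetical, and the pruning rule (reject a prefix as soon as the middle element of some constraint is forced to come first or last of its three) is one possible reading of "add the next item so that no constraint is violated".

```python
from typing import List, Optional, Sequence, Tuple

Constraint = Tuple[str, str, str]  # Between(x, y, z): y must lie between x and z

ELEMENTS = "abcdefgh"
CONSTRAINTS: List[Constraint] = [
    ("d", "a", "h"), ("f", "c", "g"), ("a", "b", "c"), ("a", "h", "f"),
    ("h", "g", "d"), ("a", "c", "h"), ("a", "e", "f"), ("f", "h", "c"),
]


def violates(prefix: Sequence[str], constraint: Constraint) -> bool:
    """True if no completion of `prefix` can satisfy the constraint."""
    x, y, z = constraint
    later = len(prefix)  # unplaced elements end up at some index >= len(prefix)
    pos = {e: i for i, e in enumerate(prefix)}
    px, py, pz = (pos.get(e, later) for e in (x, y, z))
    if py == later:
        # y is unplaced: it would come last of the three if x and z are both placed.
        return px < later and pz < later
    # y is placed: it must not come before both, or after both, of x and z.
    return (px > py and pz > py) or (px < py and pz < py)


def solve(elements: Sequence[str],
          constraints: Sequence[Constraint],
          prefix: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Depth-first search over prefixes; returns one valid ordering or None."""
    if len(prefix) == len(elements):
        return list(prefix)  # goal state: every element placed, nothing violated
    for e in elements:
        if e in prefix:
            continue
        candidate = prefix + (e,)
        if any(violates(candidate, c) for c in constraints):
            continue  # prune: this successor is not in the state space
        result = solve(elements, constraints, candidate)
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    print(solve(ELEMENTS, CONSTRAINTS))
```

The search may return a different ordering from the one quoted in the problem statement, since solutions are not unique (the reverse of any solution is also a solution).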
Problem 2: 10 points
Consider the following set of propositional formulas:

P V Q.
P => ~(Q V R).
~X => (Q V W).
(~W ^ X) => R.
W => ~P.

A. Convert these to CNF.

Answer:
1. P V Q.
2. ~P V ~Q.
3. ~P V ~R.
4. X V Q V W.
5. W V ~X V R.
6. ~W V ~P.
B. Give a trace of the execution of the Davis-Putnam algorithm. When a choice point is reached, choose the first unbound atom alphabetically, and try TRUE before FALSE.

Answer:
State S0: Full set of clauses above, empty valuation. No easy operations. Try P=TRUE. Delete (1); delete ~P from (2), (3), and (6).

State S1:
2. ~Q
3. ~R
4. X V Q V W
5. W V ~X V R
6. ~W
2, 3, and 6 are singleton clauses. Set Q=FALSE, R=FALSE, W=FALSE. Delete (2), (3), (6); delete Q, R, and W from (4) and (5).
4. X
5. ~X
4 is a singleton clause. Set X=TRUE. Delete ~X from 5. 5 is the null clause. Backtrack to the last choice, in S0. Try P=FALSE. Delete (2), (3), (6); delete P from (1).

State S2:
1. Q
4. X V Q V W
5. W V ~X V R
Q and W are pure literals. Set Q=TRUE, W=TRUE. Delete (1), (4), (5). All satisfied.

Solution: P=FALSE, Q=TRUE, W=TRUE; X and R are arbitrary.
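The trace can be reproduced with a small Davis-Putnam (DPLL) sketch. The representation is an assumption made for illustration: a clause is a frozenset of literal strings, with "~" marking negation, and the function names are hypothetical. The sketch handles easy operations one literal at a time, so the pure literal it binds in the last state may differ from the hand trace (R rather than W); either choice satisfies clause 5.

```python
from typing import Dict, FrozenSet, List, Optional

Clause = FrozenSet[str]  # literals are atom names, prefixed with "~" when negated

CLAUSES: List[Clause] = [
    frozenset({"P", "Q"}),        # 1
    frozenset({"~P", "~Q"}),      # 2
    frozenset({"~P", "~R"}),      # 3
    frozenset({"X", "Q", "W"}),   # 4
    frozenset({"W", "~X", "R"}),  # 5
    frozenset({"~W", "~P"}),      # 6
]


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def assign(clauses: List[Clause], lit: str) -> List[Clause]:
    """Make `lit` true: drop satisfied clauses, delete the opposite literal elsewhere."""
    return [c - {negate(lit)} for c in clauses if lit not in c]


def dpll(clauses: List[Clause], valuation: Dict[str, bool]) -> Optional[Dict[str, bool]]:
    clauses, valuation = list(clauses), dict(valuation)
    while True:
        if not clauses:
            return valuation                      # every clause satisfied
        if any(len(c) == 0 for c in clauses):
            return None                           # null clause: fail, backtrack
        literals = {l for c in clauses for l in c}
        unit = next((next(iter(c)) for c in clauses if len(c) == 1), None)
        pure = next((l for l in sorted(literals) if negate(l) not in literals), None)
        easy = unit if unit is not None else pure
        if easy is None:
            break                                 # no easy operation: choice point
        valuation[easy.lstrip("~")] = not easy.startswith("~")
        clauses = assign(clauses, easy)
    atom = min(l.lstrip("~") for c in clauses for l in c)  # first atom alphabetically
    for value in (True, False):                            # TRUE before FALSE
        lit = atom if value else "~" + atom
        result = dpll(assign(clauses, lit), {**valuation, atom: value})
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    print(dpll(CLAUSES, {}))
```

Atoms missing from the returned valuation are the ones left arbitrary.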
Problem 3: 20 points
Let U be a domain of people and books. Let L be a first-order language over U with the following symbols:

B(b,p): book b is a biography of person p.
G(p1,p2,b): person p1 gave a copy of book b to person p2.
R(p,b): person p has read book b.
W(p,b): person p wrote book b.
F(p1,p2): persons p1, p2 are friends.
A, C, P, S: Anne, Charles, Pamela, Sam.

A. Express the following in L:

i. Sam gave a copy of a book by Anne to Pamela.
Answer: exists b [G(S,P,b) ^ W(A,b)].

ii. Sam only gives people a book if he has read it.
Answer: forall b,p [G(S,p,b) => R(S,b)].

iii. The only books that Anne has written are biographies of friends of Charles.
Answer: forall b [W(A,b) => exists p [F(p,C) ^ B(b,p)]].

iv. Sam has read some biography of a friend of Charles.
Answer: exists b,p [R(S,b) ^ B(b,p) ^ F(p,C)].

B. Show how (iv) can be proven from (i-iii) using resolution theorem proving. Your answer should show the clauses generated and the resolutions that lead to the solution. You need not show the intermediate steps of Skolemization or the substeps of the resolution process (unification etc.).

Answer: Skolemizing (i-iii) and the negation of (iv) gives the following clauses:
1. G(S,P,sk1).
2. W(A,sk1).
3. ~G(S,p,b) V R(S,b).
4. ~W(A,b) V F(sk2(b),C).
5. ~W(A,b) V B(b,sk2(b)).
6. ~R(S,b) V ~B(b,p) V ~F(p,C).

Resolving (1) with (3) gives 7. R(S,sk1).
Resolving (2) with (4) gives 8. F(sk2(sk1),C).
Resolving (2) with (5) gives 9. B(sk1,sk2(sk1)).
Resolving (7) with (6) gives 10. ~B(sk1,p) V ~F(p,C).
Resolving (9) with (10) gives 11. ~F(sk2(sk1),C).
Resolving (8) with (11) gives the null clause.
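The refutation can be checked mechanically on ground instances of the clauses. The sketch below is a simplification made for illustration, not full first-order resolution: the variables are instantiated by hand with the substitution the proof uses (b = sk1, p = P in clause 3, p = sk2(sk1) in clause 6), each ground atom is treated as an opaque string with "~" marking negation, and plain propositional resolution is run to saturation. The names `resolvents` and `refutes` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Clause = FrozenSet[str]  # ground literals; "~" prefix marks negation

GROUND = [
    frozenset({"G(S,P,sk1)"}),                                        # 1
    frozenset({"W(A,sk1)"}),                                          # 2
    frozenset({"~G(S,P,sk1)", "R(S,sk1)"}),                           # 3 with p=P, b=sk1
    frozenset({"~W(A,sk1)", "F(sk2(sk1),C)"}),                        # 4 with b=sk1
    frozenset({"~W(A,sk1)", "B(sk1,sk2(sk1))"}),                      # 5 with b=sk1
    frozenset({"~R(S,sk1)", "~B(sk1,sk2(sk1))", "~F(sk2(sk1),C)"}),   # 6, negated goal
]


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1: Clause, c2: Clause) -> Set[Clause]:
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return {(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2}


def refutes(clauses: Iterable[Clause]) -> bool:
    """True if the null clause is derivable by propositional resolution."""
    known = set(clauses)
    while True:
        new: Set[Clause] = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True       # null clause derived
                new.add(r)
        if new <= known:
            return False              # saturated without contradiction
        known |= new


if __name__ == "__main__":
    print(refutes(GROUND))
```

Dropping the negated goal (clause 6) leaves a satisfiable set, so the same procedure then saturates without reaching the null clause.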
Problem 4: 15 points
Consider the following Bayesian network:
A. Which of the following statements are true (more than one may be true):
i. P and W are absolutely independent. True.
ii. P and W are conditionally independent given S. True.
iii. S and R are absolutely independent. False.
iv. S and R are conditionally independent given Q. False.
v. S and R are conditionally independent given Q and W. True.
vi. Q and W are absolutely independent. False.
vii. Q and W are conditionally independent given S. True.

B. Suppose that all the variables take 5 values. How many numeric values are recorded in the network? Use the fact that the probabilities of the outcomes of a random variable under a fixed condition add up to 1 to achieve as compact a representation as possible.

Answer: 4 at P; 4 at S; 4*25=100 at Q; 4*5=20 at W; 4*25=100 at R. Total: 228.

C. Suppose that all the random variables are Boolean. Give an expression for Prob(Q=F|P=F) in terms of quantities that are recorded in the Bayesian network.

Answer:
Prob(Q=F|P=F)
= Prob(Q=F,S=T|P=F) + Prob(Q=F,S=F|P=F)
= Prob(Q=F|S=T,P=F)*Prob(S=T|P=F) + Prob(Q=F|S=F,P=F)*Prob(S=F|P=F)
= Prob(Q=F|S=T,P=F)*Prob(S=T) + Prob(Q=F|S=F,P=F)*Prob(S=F)
The last step uses the fact that S and P are independent.
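The count of 228 and the marginalization over S can both be checked numerically. Two things in this sketch are assumptions: the network structure (P and S are roots, Q has parents P and S, W has parent S, R has parents Q and W), which is inferred from the stated answers because the figure is not reproduced here, and every probability value, which is invented purely for illustration.

```python
import math
from itertools import product
from typing import Dict, List

# Assumed structure, inferred from the answers (the figure is not available).
PARENTS: Dict[str, List[str]] = {
    "P": [], "S": [], "Q": ["P", "S"], "W": ["S"], "R": ["Q", "W"],
}


def parameter_count(parents: Dict[str, List[str]], arity: int) -> int:
    """Each node stores (arity - 1) numbers per combination of parent values."""
    return sum((arity - 1) * arity ** len(ps) for ps in parents.values())


# Hypothetical table entries for the Boolean case (not from the exam).
P_TRUE = 0.6
S_TRUE = 0.3
Q_TRUE_GIVEN = {  # keyed by (value of P, value of S)
    (True, True): 0.9, (True, False): 0.5,
    (False, True): 0.4, (False, False): 0.2,
}


def q_false_given_p_false() -> float:
    """Sum over S of Prob(Q=F | S=s, P=F) * Prob(S=s)."""
    total = 0.0
    for s in (True, False):
        prob_s = S_TRUE if s else 1 - S_TRUE
        total += (1 - Q_TRUE_GIVEN[(False, s)]) * prob_s
    return total


def q_false_given_p_false_brute_force() -> float:
    """Same quantity from the joint over P, S, Q (W and R sum out to 1)."""
    joint_p_false = 0.0
    joint_q_false_p_false = 0.0
    for p, s, q in product((True, False), repeat=3):
        prob = (P_TRUE if p else 1 - P_TRUE) * (S_TRUE if s else 1 - S_TRUE)
        prob *= Q_TRUE_GIVEN[(p, s)] if q else 1 - Q_TRUE_GIVEN[(p, s)]
        if not p:
            joint_p_false += prob
            if not q:
                joint_q_false_p_false += prob
    return joint_q_false_p_false / joint_p_false


if __name__ == "__main__":
    print(parameter_count(PARENTS, 5))
    print(q_false_given_p_false(), q_false_given_p_false_brute_force())
```

With Boolean variables the same counting rule gives 1 + 1 + 4 + 2 + 4 = 12 stored numbers.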
Problem 5: 10 points
Consider a classification problem where W, Y, and Z are the predictive attributes; C is the classification attribute; all attributes are Boolean; and you have the following data set:

W  Y  Z  C  Number of instances
T  T  T  T  5
T  T  T  F  8
T  F  F  T  2
T  F  -  F  3
F  T  F  F  6
F  F  F  T  1
F  -  F  F  2

The value "-" is the null value. How does the Naive Bayes classifier predict the value of C for W=F, Y=F, Z=T? You can leave your answer as an unevaluated arithmetic expression; I am not interested in your ability to do arithmetic.

Answer: Using Naive Bayes,
P(C=T | W=F,Y=F,Z=T) is proportional to P(W=F|C=T) P(Y=F|C=T) P(Z=T|C=T) P(C=T) = (1/8) (3/8) (5/8) (8/27).
P(C=F | W=F,Y=F,Z=T) is proportional to P(W=F|C=F) P(Y=F|C=F) P(Z=T|C=F) P(C=F) = (8/19) (3/17) (8/16) (19/27).
The classifier predicts whichever value of C gives the larger product.

Note that the instances with a null value for attribute A are discounted in the calculation for A, but are included in the calculations for the other attributes.
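The two products can be computed directly from the counts. This sketch assumes the null entries sit in the Z column of the fourth row and the Y column of the last row, which is the placement consistent with the denominators 16 and 17 in the answer; the function names are hypothetical. Exact fractions are used so that nothing depends on floating-point rounding.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[str], Optional[str], str, int]  # W, Y, Z, C, count

# None marks a null value (assumed placement, see the lead-in).
ROWS: List[Row] = [
    ("T", "T", "T", "T", 5),
    ("T", "T", "T", "F", 8),
    ("T", "F", "F", "T", 2),
    ("T", "F", None, "F", 3),
    ("F", "T", "F", "F", 6),
    ("F", "F", "F", "T", 1),
    ("F", None, "F", "F", 2),
]
COLUMN = {"W": 0, "Y": 1, "Z": 2}
QUERY = {"W": "F", "Y": "F", "Z": "T"}


def prior(c: str) -> Fraction:
    return Fraction(sum(r[4] for r in ROWS if r[3] == c), sum(r[4] for r in ROWS))


def conditional(attr: str, value: str, c: str) -> Fraction:
    """P(attr=value | C=c), skipping rows where attr is null."""
    i = COLUMN[attr]
    usable = [r for r in ROWS if r[3] == c and r[i] is not None]
    return Fraction(sum(r[4] for r in usable if r[i] == value),
                    sum(r[4] for r in usable))


def score(query: Dict[str, str], c: str) -> Fraction:
    """Unnormalized Naive Bayes score: prior times the attribute likelihoods."""
    result = prior(c)
    for attr, value in query.items():
        result *= conditional(attr, value, c)
    return result


def predict(query: Dict[str, str]) -> str:
    return max(("T", "F"), key=lambda c: score(query, c))


if __name__ == "__main__":
    print(score(QUERY, "T"), score(QUERY, "F"), predict(QUERY))
```

Rows with a null are dropped only from the estimate for the attribute that is null, matching the note at the end of the answer.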
Problem 6: 15 points
Suppose that you are trying to do classification learning from a labelled data set. There are predictive attributes A, B1 ... Bk plus the classification attribute C. As it happens, A is a very good predictor of C, whereas B1 ... Bk are entirely irrelevant; all the Bi's and C are both absolutely independent and conditionally independent given A. Assume that the data set contains a large number of instances. One can imagine first learning a classifier for C just based on A, and then learning a classifier for C based on A and all the Bi's. The question then is, for various learning algorithms, does adding the Bi's significantly degrade the classifier?

Case A. All the attributes are Boolean and the learning algorithm is Naive Bayes.

Answer: In Naive Bayes, if you use only A, then you compare P(A|C=T) P(C=T) to P(A|C=F) P(C=F). If you use all the attributes, then you compare P(A|C=T) P(B1|C=T) ... P(Bk|C=T) P(C=T) to P(A|C=F) P(B1|C=F) ... P(Bk|C=F) P(C=F). Since each of the Bi's is independent of C, it is the case that P(Bi|C=T) ≈ P(Bi|C=F). So the added factors are the same in both products. Adding the additional attributes adds a certain amount of noise, but is unlikely to change the outcome.

Case B. All the attributes are Boolean, and the learning algorithm is ID3. Assume that ID3 is implemented so that, if no remaining attribute gives rise to a significant reduction in average entropy, no split is carried out.

Answer: The top-level split will be on A. Once you have split on A, none of the Bi adds more information, so they do not lower the average entropy. So in either case, the algorithm returns the identical decision tree, consisting of a single node testing on A. Thus, there is no difference at all between just using A and using all the attributes.

Case C. All the attributes are numeric, and the learning algorithm is nearest neighbors.

Answer: The distances from one point to another will be largely determined by the differences in the B dimensions, which are useless. Therefore, including the Bi's in the calculation turns the classifier into a practically random choice. The degradation is severe.
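The effect in Case C is easy to see on synthetic data. Everything in this sketch is an invented illustration rather than part of the exam: A is drawn near +1 or -1 depending on the class, twenty irrelevant attributes are drawn as wide Gaussian noise, and a 1-nearest-neighbor classifier is scored once using A alone and once using all twenty-one dimensions.

```python
import random
from typing import List, Tuple

Example = Tuple[List[float], bool]  # (attribute vector [A, B1, ..., Bk], class C)


def make_data(n: int, k: int, rng: random.Random, noise_scale: float = 5.0) -> List[Example]:
    """A tracks the class closely; B1..Bk are pure noise independent of the class."""
    data = []
    for _ in range(n):
        label = rng.random() < 0.5
        a = rng.gauss(1.0 if label else -1.0, 0.3)
        noise = [rng.gauss(0.0, noise_scale) for _ in range(k)]
        data.append(([a] + noise, label))
    return data


def nearest_label(train: List[Example], point: List[float], dims: int) -> bool:
    """Class of the closest training point, using only the first `dims` attributes."""
    def dist2(p: List[float], q: List[float]) -> float:
        return sum((p[d] - q[d]) ** 2 for d in range(dims))
    return min(train, key=lambda item: dist2(item[0], point))[1]


def accuracy(train: List[Example], test: List[Example], dims: int) -> float:
    correct = sum(nearest_label(train, x, dims) == y for x, y in test)
    return correct / len(test)


rng = random.Random(0)
train = make_data(200, 20, rng)
test = make_data(200, 20, rng)
acc_a_only = accuracy(train, test, dims=1)    # distance on A alone
acc_all = accuracy(train, test, dims=21)      # distance on A and B1..B20

if __name__ == "__main__":
    print(acc_a_only, acc_all)
```

How far the accuracy drops depends on the scale and number of the irrelevant attributes; rescaling attributes or weighting dimensions are the usual remedies, but those go beyond what the answer covers.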
Problem 7: 15 points
A. As discussed in class, the k-means algorithm can be viewed as doing a form of hill-climbing search, where the objective function is the sum over i = 1, ..., n of d(si, C(si))^2, where s1 ... sn are the data points and C(si) is the location of the center of the cluster containing si. What is a state in this state space?

Answer: A state is a set of centers.

B. As discussed in class, given a data set (solid dots) and a starting pair of locations (stars) as shown below, the 2-means clustering algorithm returns the pair of clusters {w, y} and {x, z}. Does this non-intuitive result illustrate a failing of the objective function or of the search strategy?

[Figure: four data points w, x, y, z (solid dots) and two starting locations (stars).]

Answer: This is a failure of the search strategy. The solution with clusters {w, x} and {y, z} has a lower value of the objective function.

C. If your answer in (B) was that this is a failing of the objective function, propose a better objective function. If your answer was that it was a failing of the search strategy, propose a more effective search strategy.

Answer: Random restart.
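Parts B and C can be illustrated with a minimal 2-means sketch. The coordinates are hypothetical, chosen only to reproduce the situation described (w and x close together on the left, y and z close together on the right, starting centers above and below the middle), since the figure is not available. The function names and the choice of drawing restart centers from the data points are likewise assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical layout: w, x on the left; y, z on the right.
POINTS: List[Point] = [(0.0, 1.0), (0.0, 0.0), (4.0, 1.0), (4.0, 0.0)]  # w, x, y, z
BAD_START: List[Point] = [(2.0, 1.5), (2.0, -0.5)]                      # the two stars


def dist2(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def objective(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(dist2(p, c) for c in centers) for p in points)


def k_means(points: Sequence[Point], centers: Sequence[Point]) -> List[Point]:
    """Alternate assignment and re-centering until the centers stop moving."""
    centers = list(centers)
    while True:
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        updated = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)  # an empty cluster keeps its old center
        ]
        if updated == centers:
            return centers
        centers = updated


def k_means_with_restarts(points: Sequence[Point], k: int, restarts: int,
                          rng: random.Random) -> List[Point]:
    """Run k-means from several random starts and keep the best local optimum."""
    best: List[Point] = []
    for _ in range(restarts):
        centers = k_means(points, rng.sample(list(points), k))
        if not best or objective(points, centers) < objective(points, best):
            best = centers
    return best


stuck = k_means(POINTS, BAD_START)
best = k_means_with_restarts(POINTS, k=2, restarts=20, rng=random.Random(0))

if __name__ == "__main__":
    print(objective(POINTS, stuck), objective(POINTS, best))
```

From the bad start the centers settle at (2, 1) and (2, 0), a local optimum that plain hill climbing cannot leave; restarts only help because some starting pairs fall in the basin of the better solution.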
D. If you apply the 2-means clustering algorithm to a data set like that shown below, you will probably get a division of the data into the left-hand and right-hand sets at the dashed line, as shown. (This was eyeballed; it is not exact.)
Explain why the algorithm gives this division, rather than the intuitively natural one.

Answer: The five loosely scattered points just to the right of the line are in fact closer to the center of the right cluster than to the center of the left cluster. The intuition is that the right cluster has a large diameter and that the left cluster has a small diameter, but k-means does not take that into account.

E. Is the unintuitive answer in (D) a failing of the objective function or of the search strategy?

Answer: A failing of the objective function; this clustering does actually minimize the objective function.