
Artificial Intelligence

Exercises for Tutorial 3 on Probabilistic Inference and Bayesian Networks

Including Answers

November 2018

Introduction
The following multiple choice questions are examples of typical questions one can expect
on the AI exam. The questions on the exam are also multiple choice, but for this
tutorial one has to explain the answers given. Moreover, at the end one can find some open
questions.
After the tutorial the answers to the multiple choice questions will be available on BB.

Exercises & Answers


1. Consider the following case of a car accident that involved a taxi.
All taxis in town are blue or green. It is known that under dim lighting conditions
discrimination between blue and green is 70% reliable, which means that
P(W = b|C = b) as well as P(W = g|C = g) are 0.70. Here C is the two-valued
variable with values g and b indicating the color of the taxi (C = b means "the taxi
is blue"), and W is the two-valued variable indicating the declaration of the witness
(W = b means "the witness says the taxi is blue").

Suppose that 7 out of 10 taxis are actually green.

a: A witness declares that the taxi was blue. Given the declaration of our witness,
what is the probability that the taxi is indeed blue?
b: Suppose two witnesses independently declare that the taxi was blue. Draw the
Bayesian Network for this case. What is the probability that the taxi was
indeed blue?
c: Now assume that a third independent witness appears on the scene and declares
that the taxi was green. What is now the probability that the taxi was indeed
blue?

Answer:

a: We must compute P(C = b|W = b). Using Bayes' rule this is equal to

P (C = b|W = b) = P (W = b|C = b)P (C = b)/P (W = b) = 0.7 ∗ 0.3/P (W = b)

The denominator can be computed by marginalization:

P (W = b) = P (W = b, C = b) + P (W = b, C = g)
= P (W = b|C = g)P (C = g) + P (W = b|C = b)P (C = b)
= 0.3 ∗ 0.7 + 0.7 ∗ 0.3 = 0.42

Hence P (C = b|W = b) = 0.21/0.42 = 0.5.


b: The Bayesian Network has three nodes (see the lecture slides): a node C denoting
the color of the taxi, a node W1 for the declaration of witness 1, and a node W2
for the declaration of witness 2. Both CPTs are identical and follow directly
from the text; the entries are 0.7 and 0.3. Similar to part a we can compute:

P (C = b|W1 = b, W2 = b) = P (W1 = b, W2 = b|C = b)P (C = b)/P (W1 = b, W2 = b)


= P (W1 = b|C = b)P (W2 = b|C = b)P (C = b)/P (W1 = b, W2 = b)
= 0.7 ∗ 0.7 ∗ 0.3/P (W1 = b, W2 = b)

the third equality follows from the independence assumption, and

P (W1 = b, W2 = b) =
P (W1 = b|C = g)P (W2 = b|C = g)P (C = g) +
P (W1 = b|C = b)P (W2 = b|C = b)P (C = b) =
0.3 ∗ 0.3 ∗ 0.7 + 0.7 ∗ 0.7 ∗ 0.3 = 0.063 + 0.147 = 0.210

Hence P(C = b|W1 = b, W2 = b) = 0.147/0.210 = 0.70.
c: For the three witnesses a similar computation gives that P (C = b|W1 = b, W2 =
b, W3 = g) = 0.7 ∗ 0.7 ∗ 0.3 ∗ 0.3/(0.3 ∗ 0.3 ∗ 0.7 ∗ 0.7 + 0.7 ∗ 0.7 ∗ 0.3 ∗ 0.3) = 0.5
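
The arithmetic in parts a, b and c is easy to check mechanically. The following Python sketch (ours, not part of the handout; the function name is our own) recomputes the three posteriors:

    # Posterior P(C = b | witness declarations) for independent witnesses.
    # rel = P(W = b | C = b) = P(W = g | C = g) = 0.7; prior P(C = b) = 0.3.
    def posterior_blue(declarations, rel=0.7, prior_b=0.3):
        like_b = like_g = 1.0
        for w in declarations:                      # 'b' or 'g' per witness
            like_b *= rel if w == "b" else 1 - rel  # P(W = w | C = b)
            like_g *= rel if w == "g" else 1 - rel  # P(W = w | C = g)
        num = like_b * prior_b
        return num / (num + like_g * (1 - prior_b))

    print(posterior_blue(["b"]))             # a: 0.5
    print(posterior_blue(["b", "b"]))        # b: 0.7
    print(posterior_blue(["b", "b", "g"]))   # c: 0.5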

Notation. The use of the bold P in an expression like P(B|LB = t) is introduced
in Russell and Norvig's Artificial Intelligence book (Section 13.2.2). It is the vector
of probabilities for the values of a finite discrete random variable in a fixed order.
For a boolean variable the vector with two components represents the probability
distribution, where the first number is the probability of the variable having value true
and the second number is the probability for false. Of course the sum of these two values should
be 1. The constant α is used by Russell and Norvig for the normalization constant
(see Section 13.3). For example the product α < 0.8, 0.6 > equals < α × 0.8, α × 0.6 >.
Since the sum must be 1, we can compute α from α × 1.4 = 1.
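
As a concrete illustration of the α notation (a minimal sketch of our own, not from the book):

    # alpha<0.8, 0.6> = <alpha*0.8, alpha*0.6>, with alpha = 1/(0.8 + 0.6) = 1/1.4.
    def normalize(vec):
        alpha = 1 / sum(vec)
        return [alpha * v for v in vec]

    print(normalize([0.8, 0.6]))   # [0.5714..., 0.4285...], summing to 1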

2. The Prosecution argument. The counsel for the prosecution argues as follows:

Ladies and gentlemen of the jury, the probability of the observed match be-
tween the sample at the scene of the crime and that of the suspect having
arisen by innocent means is 1 in 10 million. This is an entirely negligible
probability, and we must therefore conclude that with a probability over-
whelmingly close to 1 that the suspect is guilty. You have no alternative
but to convict.

This argument is known as the Prosecutor’s Fallacy. Explain the error in the counsel’s
reasoning.
Answer: The confusion is between P (M |¬G), the probability of a match given that the
suspect is not guilty, which is 1 in 10 million, and P (¬G|M ), the probability that the
suspect is not guilty given the match. They are not the same. Bayes' rule gives the
relation: P (¬G|M ) = P (M |¬G)P (¬G)/P (M ), where the priors P (¬G|B) and P (G|B)
are based on all the other evidence B. If there is little other evidence against the
suspect, the prior P (¬G|B) is close to 1, and P (¬G|M ) can then be far from negligible
even though P (M |¬G) is tiny.

3. “Most car accidents are caused by people who have a driver’s licence.” What is
suggested by this statement? What are the relevant conditional probabilities?
Answer: The suggestion is that having a driver’s licence causes (or at least has a pos-
itive impact on) car accidents, more so than not having a driver’s licence. The relevant
probabilities are P (Acc|¬HasLicence) and P (Acc|HasLicence). The statement is
probably true, since most people driving a car have a driver’s licence. But this does not
imply that the second conditional probability is larger than the first one. One can
construct two worlds in which the statement is true, while the suggestion holds in one
world and fails in the other; see the sketch below.
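
For instance (numbers of our own choosing, purely illustrative):

    # A world where the statement holds but the suggestion fails:
    # 95% of drivers are licensed, yet unlicensed drivers crash more often.
    p_lic = 0.95
    p_acc_lic, p_acc_nolic = 0.01, 0.05   # P(Acc|HasLicence) < P(Acc|~HasLicence)

    # Fraction of all accidents caused by licensed drivers:
    num = p_lic * p_acc_lic
    print(num / (num + (1 - p_lic) * p_acc_nolic))   # ~0.79

So "most" accidents (about 79%) are caused by licensed drivers, even though not having a licence raises the accident probability.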

4. Make exercise 14.1 from the book of Russell and Norvig, Artificial Intelligence (3rd
edition).
Solution (from R&N): see Figures 1 and 2.

Figure 1: Solution exercise 14.1 (first part)

Figure 2: Solution exercise 14.1 (continuation)

5. Make exercise 14.4 from the book of Russell and Norvig, Artificial Intelligence (3rd
edition).
Solution (from R&N): see Figure 3

Figure 3: Solution exercise 14.4

Figure 4: The Sprinkler Bayesian network

6. Given the Sprinkler network shown in Figure 4, what is the best approximation
of the value of P (S = true|W = true) (the probability that the Sprinkler was on,
given that the grass is Wet)? Use the enumeration method. Indicate where you use
the “conditional independence” relations represented by the BN.

(a) 0.2781
(b) 0.6471
(c) 0.1945
(d) 0.4298

We compute the distribution P and use normalization.

P(S|W = true)
= α P(S, W = true)
= α Σ_c Σ_r P(C = c, S, R = r, W = true)
= α Σ_c Σ_r P(C = c) P(S|C = c) P(R = r|C = c) P(W = true|S, R = r)
(this factorization is where the conditional independence relations represented by the BN are used)
= α Σ_c P(C = c) P(S|C = c) Σ_r P(R = r|C = c) P(W = true|S, R = r)
(factoring P(C = c) and P(S|C = c) out of the summation over r)
We compute this expression for each value of S (i.e. for both S = true and S = false)
to obtain both values of the distribution P(S|W = true). Here α stands for the
normalization constant 1/P(W = true).
We start with the conditional probability value for S = true:
P(S = true|W = true)
= α Σ_c P(C = c) P(S = true|C = c) Σ_r P(R = r|C = c) P(W = true|S = true, R = r)
= α P(C = true) P(S = true|C = true) (P(R = true|C = true) P(W = true|S = true, R = true) + P(R = false|C = true) P(W = true|S = true, R = false))
+ α P(C = false) P(S = true|C = false) (P(R = true|C = false) P(W = true|S = true, R = true) + P(R = false|C = false) P(W = true|S = true, R = false))
= α × 0.5 × 0.1 × (0.8 × 0.99 + 0.2 × 0.9) + α × 0.5 × 0.5 × (0.2 × 0.99 + 0.8 × 0.9)
= α × 0.2781
And, for S = false:
P(S = false|W = true)
= α Σ_c P(C = c) P(S = false|C = c) Σ_r P(R = r|C = c) P(W = true|S = false, R = r)
= α P(C = true) P(S = false|C = true) (P(R = true|C = true) P(W = true|S = false, R = true) + P(R = false|C = true) P(W = true|S = false, R = false))
+ α P(C = false) P(S = false|C = false) (P(R = true|C = false) P(W = true|S = false, R = true) + P(R = false|C = false) P(W = true|S = false, R = false))
= α × 0.5 × 0.9 × (0.8 × 0.9 + 0.2 × 0.0) + α × 0.5 × 0.5 × (0.2 × 0.9 + 0.8 × 0.0)
= α × 0.369
Thus: P(S|W = true) = α < 0.2781, 0.369 >.
After normalization we obtain: P(S|W = true) = < 0.2781/(0.2781 + 0.369), 0.369/(0.2781 + 0.369) > = < 0.4298, 0.5702 >.
Conclusion: answer d) (0.4298) is the best approximation of P (S = true|W = true).
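
The enumeration can be checked with a short Python sketch (ours, not part of the handout; the CPT values are read off Figure 4 as used above):

    # Enumeration over the hidden variables C and R in the Sprinkler network.
    P_C = {True: 0.5, False: 0.5}                    # P(C = true)
    P_S = {True: 0.1, False: 0.5}                    # P(S = true | C)
    P_R = {True: 0.8, False: 0.2}                    # P(R = true | C)
    P_W = {(True, True): 0.99, (True, False): 0.9,   # P(W = true | S, R)
           (False, True): 0.9, (False, False): 0.0}

    def p_s_given_w():
        unnorm = {}
        for s in (True, False):
            unnorm[s] = sum(P_C[c] * (P_S[c] if s else 1 - P_S[c])
                            * (P_R[c] if r else 1 - P_R[c]) * P_W[(s, r)]
                            for c in (True, False) for r in (True, False))
        total = sum(unnorm.values())                 # P(W = true) = 0.6471
        return {s: v / total for s, v in unnorm.items()}

    print(p_s_given_w())   # {True: 0.4298..., False: 0.5702...}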

7. Given the Sprinkler network shown in Figure 4, what is the best approximation of
the value of P (S = true|W = true, R = true) (the probability that the Sprinkler
was on, given that the grass is Wet and that it was Raining)?

(a) 0.2781
(b) 0.6471
(c) 0.1945
(d) 0.4298

Using the same technique as in the previous exercise, we compute:


P(S|W = true, R = true)
= α P(S, W = true, R = true)
= α Σ_c P(C = c, S, R = true, W = true)
= α Σ_c P(C = c) P(S|C = c) P(R = true|C = c) P(W = true|S, R = true)
We first compute the value for S = true:
P(S = true|W = true, R = true)
= α Σ_c P(C = c) P(S = true|C = c) P(R = true|C = c) P(W = true|S = true, R = true)
= α P(C = true) P(S = true|C = true) P(R = true|C = true) P(W = true|S = true, R = true)
+ α P(C = false) P(S = true|C = false) P(R = true|C = false) P(W = true|S = true, R = true)
= α (0.5 × 0.1 × 0.8 × 0.99 + 0.5 × 0.5 × 0.2 × 0.99)
= α × 0.0891
For S = false we compute:
P(S = false|W = true, R = true)
= α Σ_c P(C = c) P(S = false|C = c) P(R = true|C = c) P(W = true|S = false, R = true)
= α P(C = true) P(S = false|C = true) P(R = true|C = true) P(W = true|S = false, R = true)
+ α P(C = false) P(S = false|C = false) P(R = true|C = false) P(W = true|S = false, R = true)
= α (0.5 × 0.9 × 0.8 × 0.9 + 0.5 × 0.5 × 0.2 × 0.9)
= α × 0.369
Normalization: P(S|W = true, R = true) = α < 0.0891, 0.369 > = < 0.0891/(0.0891 + 0.369), 0.369/(0.0891 + 0.369) > = < 0.1945, 0.8055 >.
Conclusion: c) is the correct answer.
Remark: the two exercises show an example of explaining away: given an observed
event, the presence of one of its possible causes makes the other cause less probable.
The fact that it has Rained makes it less probable that the Sprinkler was on (0.1945
versus 0.4298).
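
The same check works here by fixing R = true instead of summing it out (a variant of our own, reusing the tables defined in the sketch for exercise 6):

    # P(S | W = true, R = true): enumerate over C only.
    unnorm = {s: sum(P_C[c] * (P_S[c] if s else 1 - P_S[c])
                     * P_R[c] * P_W[(s, True)] for c in (True, False))
              for s in (True, False)}
    total = sum(unnorm.values())                      # 0.0891 + 0.369 = 0.4581
    print({s: v / total for s, v in unnorm.items()})  # {True: 0.1945, False: 0.8055}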

8. In the Bayesian Network below with three boolean variables, the probabilities for
L and M are P (M = true) = 0.1 and P (L = true) = 0.7, and the conditional
probabilities for variable V are as shown in the table.

 
[Figure: the network L → V ← M]

L      M      P (V = true | L, M)
true   true   0.9
true   false  0.5
false  true   0.3
false  false  0.05


What is the value of P (V = true | L = true) ?

(a) 0.72
(b) 0.54
(c) 0.46
(d) 0.28

Answer: b (0.54)
Following the same strategy via computing the full joint probability and using the
definition of conditional probability: P (X|Y ) = P (X, Y )/P (Y ):
P (V = t|L = t)
= Σ_m P (V = t, M = m|L = t)
(summing out M)
= [Σ_m P (V = t, M = m, L = t)] / [Σ_m Σ_v P (V = v, M = m, L = t)]
(definition of conditional probability)
= [Σ_m P (L = t) P (M = m) P (V = t|L = t, M = m)] / [Σ_m Σ_v P (L = t) P (M = m) P (V = v|L = t, M = m)]
(use BN)
= 0.54
(fill in the values from the CPTs of the BN)
A short route: P (V = t|L = t) = Σ_m P (V = t|L = t, M = m) P (M = m) = 0.9 × 0.1 + 0.5 × 0.9 = 0.54.
So, the correct answer is b).
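
A quick numerical check (a sketch of our own):

    # P(V = t | L = t) = sum over m of P(M = m) * P(V = t | L = t, M = m)
    p_m = 0.1                                 # P(M = true)
    p_v_true_l = {True: 0.9, False: 0.5}      # P(V = true | L = true, M)
    print(p_m * p_v_true_l[True] + (1 - p_m) * p_v_true_l[False])   # 0.54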

[Figure: node C with arrows to nodes D and E]
Figure 5: A Bayesian Network.



9. Consider the Bayesian Network in Figure 5.


All three nodes represent boolean variables. The probability distributions for the
nodes of the network are as follows.
For node C: P (C = true) = 0.4.
For node D: P (D = true|C = true) = 0.8 and P (D = true|C = f alse) = 0.3.
For node E: P (E = true|C = true) = 0.9 and P (E = true|C = f alse) = 0.2.
What is the value of P (D = true)?

(a) 0.50
(b) 0.32
(c) 0.18
(d) 0.90

Answer: a (0.50).
P (D = t) = Σ_c Σ_e P (C = c, E = e, D = t) = Σ_c Σ_e P (C = c) P (E = e|C = c) P (D = t|C = c)
(using the BN semantics). By factoring out and simplifying, because
Σ_e P (E = e|C = f ) = 1 as well as Σ_e P (E = e|C = t) = 1, we obtain:
P (D = t) = P (C = t) P (D = t|C = t) + P (C = f ) P (D = t|C = f ) = 0.4 × 0.8 + 0.6 × 0.3 = 0.5.
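
Again easy to verify numerically (a sketch of our own):

    # P(D = t) = sum over c of P(C = c) * P(D = t | C = c); E sums out to 1.
    p_c = 0.4                                 # P(C = true)
    p_d = {True: 0.8, False: 0.3}             # P(D = true | C)
    print(p_c * p_d[True] + (1 - p_c) * p_d[False])   # 0.5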

10. Consider again the Bayesian Network in Figure 5 with the probability distributions
as given in the exercise above. One of the following statements is true. Which one?

(a) P (D = true|E = true) > P (D = true)


(b) P (D = true|E = true) = P (D = true)
(c) P (D = true|E = true) < P (D = true)
(d) There is not enough information to compute P (D = true|E = true).

Answer: a). P (E = true) = 0.4 × 0.9 + 0.6 × 0.2 = 0.48, so P (C = true|E = true) =
0.36/0.48 = 0.75, and P (D = true|E = true) = 0.75 × 0.8 + 0.25 × 0.3 = 0.675 > 0.50 =
P (D = true). Observing E = true raises the probability of the common cause C, which
in turn raises the probability of D.
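
The computation in full (a sketch of our own, reusing the CPTs of exercise 9):

    # P(D = true | E = true) = sum over c of P(C = c | E = true) * P(D = true | C = c)
    p_c = 0.4
    p_e = {True: 0.9, False: 0.2}             # P(E = true | C)
    p_d = {True: 0.8, False: 0.3}             # P(D = true | C)
    p_e_true = p_c * p_e[True] + (1 - p_c) * p_e[False]   # P(E = true) = 0.48
    p_c_given_e = p_c * p_e[True] / p_e_true              # P(C = true | E = true) = 0.75
    print(p_c_given_e * p_d[True] + (1 - p_c_given_e) * p_d[False])   # 0.675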

11. In the Bayesian Network below with three boolean variables, the probabilities for
L and M are P (M = true) = 0.2 and P (L = true) = 0.7, and the conditional
probabilities for variable V are as shown in the table.

 
[Figure: the network L → V ← M]

L      M      P (V = true | L, M)
true   true   0.9
true   false  0.5
false  true   0.3
false  false  0.05


What is the value of P (V = f alse | L = f alse) ?

(a) 0.3
(b) 0.7
(c) 0.9
(d) 0.1

Answer: c (0.9). P (V = true|L = false) = Σ_m P (M = m) P (V = true|L = false, M = m) =
0.2 × 0.3 + 0.8 × 0.05 = 0.1, hence P (V = false|L = false) = 1 − 0.1 = 0.9.
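
As a check (a sketch of our own):

    # P(V = false | L = false) = 1 - sum over m of P(M = m) * P(V = true | L = false, M = m)
    p_m = 0.2                                 # P(M = true)
    p_v_true_notl = {True: 0.3, False: 0.05}  # P(V = true | L = false, M)
    print(1 - (p_m * p_v_true_notl[True]
               + (1 - p_m) * p_v_true_notl[False]))   # 0.9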

Figure 6: The dependencies for the user tests of my design

12. The Bayesian Network structure given in Figure 6 models the dependencies between
three properties related to my design. IsMale is true when the user is male, IsYoung
is true when the user is young, LikesDesign is true when the user likes my design.
Which of the following statements is true? Give a proof or counter example.

(a) IsYoung and IsMale are independent


(b) IsYoung and IsMale are independent given LikesDesign

Answer: a) is true, b) is false. In Figure 6, IsMale and IsYoung are the two parents of
the common child LikesDesign and are not otherwise connected, so they are independent
a priori. Conditioning on their common child, however, makes them dependent (the
explaining-away pattern at a v-structure); the sketch below gives a numeric counterexample
to b).
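
A numeric counterexample for b), with CPTs of our own choosing (hypothetical, not from the exercise) for the v-structure IsMale → LikesDesign ← IsYoung:

    # Hypothetical CPTs: uniform priors, and a design liked by young men
    # and older women much more than by the other two groups.
    p_young = 0.5                                       # P(IsYoung = true)
    p_likes = {(True, True): 0.9, (True, False): 0.1,   # P(Likes | Male, Young)
               (False, True): 0.1, (False, False): 0.9}

    # P(IsYoung = true | LikesDesign = true, IsMale = m) for both values of m:
    for m in (True, False):
        num = p_young * p_likes[(m, True)]
        print(m, num / (num + (1 - p_young) * p_likes[(m, False)]))
    # True -> 0.9, False -> 0.1: the values differ, so IsYoung and IsMale
    # are dependent given LikesDesign, refuting b).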
