7 Posterior Probability and Bayes

Examples:
1. In a computer installation, 60% of programs are written in C++
and 40% in Java. 60% of the programs written in C++ compile
on the first run and 80% of the Java programs compile on the first
run.
(a) What is the overall proportion of programs that compile on the
first run?
(b) If a randomly selected program compiles on the first run, what
is the probability that it was written in C++?
2. In a certain company,
50% of documents are written in WORD; 30% in LATEX; 20% in HTML.
From past experience it is known that:
40% of the WORD documents exceed 10 pages
20% of the LATEX documents exceed 10 pages
20% of the HTML documents exceed 10 pages
(a) What is the overall proportion of documents containing more
than 10 pages?
(b) A document is chosen at random and found to have more
than 10 pages. What is the probability that it has been written
in LATEX?
3. Enquiries to an on-line computer system arrive on 5 communication
lines. The percentages of messages received through each line are:

Line        1   2   3   4   5
% received  20  30  10  15  25

From past experience, it is known that the percentages of messages
exceeding 100 characters on the different lines are:

Line            1   2   3   4   5
% exceeding
100 characters  40  60  20  80  90

(a) Calculate the overall proportion of messages exceeding 100
characters.
(b) If a message chosen at random is found to exceed 100 characters,
what is the probability that it came through line 5?
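Example 3 is not worked later in these notes, but both parts follow the same total-probability and Bayes pattern as the other examples. A minimal R sketch, in the style of the snippet used for Example 1 (the variable names are illustrative; the two vectors simply restate the tables above):

```r
prior <- c(.20, .30, .10, .15, .25)  # P(line i): proportion of messages per line
lik   <- c(.40, .60, .20, .80, .90)  # P(exceeds 100 chars | line i)

totalprob <- sum(prior * lik)        # total probability of exceeding 100 characters
totalprob                            # 0.625

post5 <- prior[5] * lik[5] / totalprob  # posterior probability of line 5
post5                                # 0.36
```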

4. A binary communication channel carries data as one of two sets
of signals denoted by 0 and 1. Owing to noise, a transmitted 0
is sometimes received as a 1, and a transmitted 1 is sometimes
received as a 0. For a given channel, it can be assumed that
a transmitted 0 is correctly received with probability 0.95 and a
transmitted 1 is correctly received with probability 0.75. Also, 60%
of all messages are transmitted as a 0. If a signal is sent, determine
the probability that:
(a) a 1 was received;
(b) a 0 was received;
(c) an error occurred;
(d) a 1 was transmitted given that a 1 was received;
(e) a 0 was transmitted given that a 0 was received.
Example 1

        Compiles       Does not compile
        on first run   on first run
C++     72             48                 120
Java    64             16                  80
        136            64                 200

What is the probability that a program was written in C++, given
that we know it compiled on the first run?

If E is the event of compiling on the first run, we seek P(C++|E).

Now
P(C++ ∩ E) = P(C++)P(E|C++)
and
P(E ∩ C++) = P(E)P(C++|E)
Since
P(E ∩ C++) = P(C++ ∩ E),
it follows that
P(E)P(C++|E) = P(C++)P(E|C++)
So
P(C++|E) = P(C++)P(E|C++) / P(E)

P(C++|E) is the Posterior Probability of C++ after it has been found
that the program has compiled on the first run.

P(C++|E) = P(C++)P(E|C++) / P(E)

where, by total probability,
P(E) = P(C++)P(E|C++) + P(J)P(E|J)
     = 120/200 × 72/120 + 80/200 × 64/80 = 0.68
Then
P(C++|E) = (120/200 × 72/120) / 0.68
In R :
totalprob <-((80/200)*(64/80))+((120/200)*(72/120)) # total probability
conditc <-(120/200)*(72/120)/totalprob #posterior probability of C++
conditc
[1] 0.5294118
Analogously,
P(J|E) = P(J)P(E|J) / (P(C++)P(E|C++) + P(J)P(E|J))
In R
totalprob <-((80/200)*(64/80))+((120/200)*(72/120)) # total probability
conditjava <-(80/200)*(64/80)/totalprob #posterior probability of Java
conditjava
[1] 0.4705882

Prior and Posterior Probabilities after a program has been
found to have compiled on first run

          C++    Java
Prior     0.60   0.40
Posterior 0.53   0.47
Bayes’ Theorem

Bayes’ Rule
Bayes’ rule for two events
Consider two events A and B.
P (A ∩ B) = P (A)P (B|A)
and
P (B ∩ A) = P (B)P (A|B)
Since
P(A ∩ B) = P(B ∩ A),
it follows from the multiplication law that
P(B)P(A|B) = P(A)P(B|A)
which implies that
P(A|B) = P(A)P(B|A) / P(B)

P(A|B) is the Posterior Probability of A after B has occurred.
Example 4

A binary communication channel carries data as one of two sets of
signals denoted by 0 and 1. Owing to noise, a transmitted 0 is sometimes
received as a 1, and a transmitted 1 is sometimes received as a 0.

For a given channel, it can be assumed that a transmitted 0 is
correctly received with probability 0.95 and a transmitted 1 is correctly
received with probability 0.75. It is also known that 70% of all
messages are transmitted as a 0.

If a signal is sent, what is the probability that
1. a 1 was transmitted given that a 1 was received;
2. a 0 was transmitted given that a 0 was received.
Solution

Let:
R0 be the event that a zero is received;
T0 be the event that a zero is transmitted, P(T0) = 0.7;
R1 be the event that a one is received;
T1 be the event that a one is transmitted, P(T1) = 0.3.

P(T0|R0) = P(T0)P(R0|T0) / P(R0)

Now:
P (T0 )P (R0 |T0 ) = .7 × .95
and
R0 = (T0 ∩ R0 ) ∪ (T1 ∩ R0 )
So
P (R0 ) = P (T0 )P (R0 |T0 ) + P (T1 )P (R0 |T1 )
Therefore
P (R0 ) = .7 × .95 + .3 × .25 = .74

P(T0|R0) = (.7 × .95) / (.7 × .95 + .3 × .25) = .665/.74 = .90

Similarly
P(T1|R0) = (.3 × .25) / (.7 × .95 + .3 × .25) = .075/.74 = .10
Example

From past experience it is known that:
50% of documents are written in WORD;
30% in LATEX;
20% in HTML.

These are the prior probabilities:
• P(Word) = .5;
• P(Latex) = .3;
• P(Html) = .2.

40% of the WORD documents exceed 10 pages
20% of the LATEX documents exceed 10 pages
20% of the HTML documents exceed 10 pages

A document chosen at random was found to exceed 10 pages. What
is the probability that it has been written in Latex?

Let E be the event that a document, chosen at random, contains more
than 10 pages. We seek P(Latex|E).

P(Latex|E) = P(Latex)P(E|Latex) / P(E)

Now
P(E) = P(Word)P(E|Word) + P(Latex)P(E|Latex) + P(Html)P(E|Html)
     = 0.5 × 0.4 + 0.3 × 0.2 + 0.2 × 0.2 = .3
So:
P(Latex|E) = P(Latex)P(E|Latex) / P(E) = (0.3 × 0.2)/.3 = .2
Similarly
P(Word|E) = (.5 × .4)/.3 = .67
and
P(Html|E) = (.2 × .2)/.3 = .13

Prior and Posterior Probabilities after a document has been
found to contain more than 10 pages

          Word   Latex   Html
Prior     0.50   0.30    0.20
Posterior 0.67   0.20    0.13
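The whole table can be reproduced in one R computation, in the style of the Example 1 snippets (the variable names are illustrative):

```r
prior <- c(Word = .5, Latex = .3, Html = .2)  # prior probabilities
lik   <- c(Word = .4, Latex = .2, Html = .2)  # P(more than 10 pages | type)

totalprob <- sum(prior * lik)                 # P(E) = 0.3
posterior <- prior * lik / totalprob          # posterior probabilities given E
round(posterior, 2)
#  Word Latex  Html
#  0.67  0.20  0.13
```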
General Bayes Rule
If a sample space can be partitioned into k mutually exclusive and
exhaustive events
A1, A2, A3, · · · , Ak
S = A1 ∪ A2 ∪ A3 ∪ · · · ∪ Ak

then for any event E:

P(E) = P(A1)P(E|A1) + P(A2)P(E|A2) + · · · + P(Ak)P(E|Ak)

P(Ai|E) = P(Ai)P(E|Ai) / P(E)

Proof: For any i, 1 ≤ i ≤ k,
E ∩ Ai = Ai ∩ E
P(E)P(Ai|E) = P(Ai)P(E|Ai)
P(Ai|E) = P(Ai)P(E|Ai) / P(E)

P(Ai|E) is called the POSTERIOR PROBABILITY
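The general rule is a one-liner in R, since the vector of joint probabilities P(Ai)P(E|Ai) only needs to be divided by its sum P(E). The helper name `bayes_posterior` is illustrative, not a built-in:

```r
# Posterior probabilities for a partition A1..Ak, given the priors P(Ai)
# and the conditional probabilities P(E|Ai); the vectors must match in length.
bayes_posterior <- function(prior, lik) {
  prior * lik / sum(prior * lik)  # divide each joint probability by P(E)
}

# Reproduces the C++/Java example: posteriors 0.53 and 0.47 (to 2 dp)
round(bayes_posterior(c(0.6, 0.4), c(72/120, 64/80)), 2)
# [1] 0.53 0.47
```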


SOME APPLICATIONS OF BAYES

• Game Show: “Let’s Make a Deal”
• Hardware Fault Diagnosis
• Machine Learning
• Machine Translation
Game Show:

You are a contestant in a game which allows you to choose
one out of three doors. One of these doors conceals a laptop
computer while the other two are empty. When you
have made your choice, the host of the show opens one of
the remaining doors, and shows you that it is one of the
empty ones. You are now given the opportunity to change
your door. Should you do so, or should you stick with your
original choice?

The question we address is: If you change, what are your
chances of winning the laptop?

When you choose a door randomly originally, there is a
probability of 1/3 that the laptop will be behind the door
that you choose, and a probability of 2/3 that it will be
behind one of the other two doors. These probabilities do not
change when a door is opened and revealed empty; therefore
if you switch you will have a probability of 2/3 of winning
the laptop.

Let D1, D2, and D3 be the events that the laptop is behind
door 1, door 2 and door 3 respectively. We assume that
P(D1) = P(D2) = P(D3) = 1/3
and, for the complementary events,
P(D̄1) = P(D̄2) = P(D̄3) = 2/3,
where D̄i is the event that the laptop is not behind door i.
Suppose you choose door 1.
Let E be the event that the host opens door 2, and re-
veals it to be empty. If you change after this, you will be
choosing door 3. So we need to know P (D3 |E). From
Bayes’ rule,
P(D3|E) = P(D3)P(E|D3) / P(E)
Now, from total probability,
E = (D1 ∩ E) ∪ (D2 ∩ E) ∪ (D3 ∩ E)
So
P(E) = P(D1 ∩ E) + P(D2 ∩ E) + P(D3 ∩ E)
     = P(D1)P(E|D1) + P(D2)P(E|D2) + P(D3)P(E|D3)
So we can write
P(D3|E) = P(D3)P(E|D3) / (P(D1)P(E|D1) + P(D2)P(E|D2) + P(D3)P(E|D3))   (1)
We need to calculate P (E|D1 ), P (E|D2 ) and P (E|D3 ).

P(E|D1) is the probability that the host opens door 2 and
reveals it to be empty, given that the laptop is behind door 1.
When the laptop is behind door 1, the door selected by you,
the host has two doors to select from. We assume that the host
selects one of these at random, i.e. P(E|D1) = 1/2.

P(E|D2) is the probability that the host opens door 2
and reveals it to be empty, given that the laptop is behind door 2.
This is an impossibility, so it has a probability of zero,
i.e. P(E|D2) = 0.

Finally, P(E|D3) is the probability that the host opens door 2
and reveals it to be empty, given that the laptop is behind door 3.
When the laptop is behind door 3, the host has just one way of
revealing an empty door: the host must open door 2. This is
a certainty, so it has a probability of one, i.e. P(E|D3) = 1.

Putting these values into (1) we have

P(D3|E) = (1/3 × 1) / ((1/3 × 1/2) + (1/3 × 0) + (1/3 × 1)) = 2/3

So changing would increase your probability of winning the
laptop from 1/3 to 2/3. You would double your chance by changing.

The prior probability of winning the laptop was 1/3, while
the posterior probability that the laptop is behind the door
you have not chosen is 2/3.
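The 2/3 answer can also be checked by simulation. A minimal R sketch (the seed and sample size are arbitrary): since the host always opens an empty, unchosen door, switching wins exactly when the initial pick was wrong, so no explicit host move needs to be simulated.

```r
set.seed(42)
n <- 100000
prize  <- sample(1:3, n, replace = TRUE)  # door concealing the laptop
choice <- sample(1:3, n, replace = TRUE)  # contestant's initial pick

stick_wins  <- mean(choice == prize)  # about 1/3
switch_wins <- mean(choice != prize)  # about 2/3
```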
Hardware Fault Diagnosis

Printer failures are associated with three types of problems:
hardware, software and electrical connections. A printer
manufacturer obtained the following probabilities from a
database of test results:
P(H) = 0.1   P(S) = 0.6   P(E) = 0.3
where H, S, and E denote hardware, software or electrical
problems respectively.
These are the prior probabilities.

It is also known that the probability of a printer failure (F):
• given a hardware problem is 0.9, P(F|H) = 0.9
• given a software problem is 0.2, P(F|S) = 0.2
• given an electrical problem is 0.5, P(F|E) = 0.5
These are the conditional probabilities.

If a customer reports a printer failure, what is the most
likely cause of the problem?

We need the posterior probabilities
P(H|F), P(S|F), P(E|F),
the likelihood that, given that a printer fault has been reported,
it will be due to faulty hardware, software or electrical
problems.
We calculate the posterior probabilities using Bayes’ rule.

First calculate the total probability of a failure occurring:
F = (H ∩ F) ∪ (S ∩ F) ∪ (E ∩ F)
P(F) = P(H)P(F|H) + P(S)P(F|S) + P(E)P(F|E)
     = .1 × .9 + .6 × .2 + .3 × .5 = .36
Then the posterior probabilities are:
P(H|F) = P(H)P(F|H)/P(F) = (.1 × .9)/.36 = .250
P(S|F) = P(S)P(F|S)/P(F) = (.6 × .2)/.36 = .333
P(E|F) = P(E)P(F|E)/P(F) = (.3 × .5)/.36 = .417
Because P (E|F ) is the largest, the most likely cause of the
problem is electrical.

A help desk or website dialog to diagnose the problem should
check into this problem first.
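The diagnosis can be sketched in R in the style of the earlier snippets (the variable names are illustrative); `which.max` picks out the most probable cause directly:

```r
prior <- c(H = .1, S = .6, E = .3)   # prior probabilities of each problem type
lik   <- c(H = .9, S = .2, E = .5)   # P(failure | problem type)

totalprob <- sum(prior * lik)        # P(F) = 0.36
posterior <- prior * lik / totalprob # posterior probabilities given a failure
round(posterior, 3)
#     H     S     E
# 0.250 0.333 0.417
names(which.max(posterior))          # "E": electrical is the most likely cause
```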
Machine Learning

Bayes’ theorem is sometimes used to improve the accuracy
in supervised learning. It is used in the classification of items
where the system has already learnt the probabilities.

We need to classify f, a new example, into the correct
class. Suppose there are only two classes, y = 1 and y = 2,
into which we can classify new values of f.

By Bayes’ rule, we can write

P(y = 1|f) = P(y = 1 ∩ f)/P(f) = P(y = 1)P(f|y = 1)/P(f)   (2)
P(y = 2|f) = P(y = 2 ∩ f)/P(f) = P(y = 2)P(f|y = 2)/P(f)   (3)

Dividing (2) by (3) we get

P(y = 1|f)/P(y = 2|f) = P(y = 1)P(f|y = 1) / (P(y = 2)P(f|y = 2))   (4)

Our decision is to classify a new example into class 1 if
P(y = 1|f)/P(y = 2|f) > 1
or equivalently if
P(y = 1)P(f|y = 1) / (P(y = 2)P(f|y = 2)) > 1

f goes into class 1 iff
P(y = 1)P(f|y = 1) > P(y = 2)P(f|y = 2)
and f goes into class 2 if
P(y = 1)P(f|y = 1) < P(y = 2)P(f|y = 2)

The prior probabilities of being in either of two classes are
assumed to be already learnt:
P(y = 1) = .4   P(y = 2) = .6

Also the conditional probabilities for the new example f
are already learnt:
P(f|y = 1) = .5   P(f|y = 2) = .3

Into what class should you classify the new example?

P(y = 1)P(f|y = 1) = .4 × .5 = .20
and
P(y = 2)P(f|y = 2) = .6 × .3 = .18
and since
P(y = 1)P(f|y = 1) > P(y = 2)P(f|y = 2),
the new example goes into class 1.
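The decision rule above amounts to comparing the two products prior × likelihood; in R (the variable names are illustrative):

```r
prior <- c(.4, .6)      # P(y = 1), P(y = 2)
lik   <- c(.5, .3)      # P(f | y = 1), P(f | y = 2)

scores <- prior * lik   # 0.20 and 0.18; P(f) cancels in the comparison
which.max(scores)       # 1: the new example goes into class 1
```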
Machine Translation

A string of English words e can be translated into a string
of French words f in many different ways. Often, knowing
the broader context in which e occurs may narrow the set
of acceptable French translations but, even so, many acceptable
translations remain. Bayes’ rule can be used to make a choice
between them.

To every pair (e, f) a number P(f|e) is assigned, which is
interpreted as the probability that a translator, when given
e to translate, will return f as the translation.

Given a string of French words f, the job of the translation
system is to find the string e that the native speaker
had in mind when f was produced. We seek P(e|f).

From Bayes’ rule,

P(e|f) = P(e)P(f|e) / P(f)

We seek the ê that maximises P(e|f), i.e. choose ê so that

P(ê|f) = max_e P(e)P(f|e) / P(f)

which is equivalent to max_e (P(e)P(f|e)), since P(f) is constant
regardless of e. This is known as the Fundamental
Equation of Machine Translation, usually written as
ê = argmax_e P(e)P(f|e)
As a simple example, suppose there are 3 possibilities for e:
P(e1) = 0.2, P(e2) = 0.5, P(e3) = 0.3.
One of these has been translated into f. A passage f has
the following conditional probabilities:
P(f|e1) = 0.4, P(f|e2) = 0.2, P(f|e3) = 0.4.
We can use Bayes’ theorem to decide which is the most
likely e.
P(e1)P(f|e1) = .2 × .4 = .08
P(e2)P(f|e2) = .5 × .2 = .10
P(e3)P(f|e3) = .3 × .4 = .12
The best translation, in this case, is e3 since it has the highest
posterior probability.
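The argmax computation is one line of R (the variable names are illustrative):

```r
prior <- c(.2, .5, .3)   # P(e1), P(e2), P(e3)
lik   <- c(.4, .2, .4)   # P(f | e1), P(f | e2), P(f | e3)

scores <- prior * lik    # 0.08, 0.10, 0.12; P(f) cancels in the argmax
ehat <- which.max(scores)
ehat                     # 3: e3 is the best translation
```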
