7 Posterior Probability and Bayes

Examples:
1. In a computer installation, 60% of programs are written in C++
and 40% in Java. 60% of the programs written in C++ compile
on the first run and 80% of the Java programs compile on the first
run.
(a) What is the overall proportion of programs that compile on the
first run?
(b) If a randomly selected program compiles on the first run, what
is the probability that it was written in C++?
2. In a certain company,
50% of documents are written in WORD; 30% in LATEX; 20% in HTML.
From past experience it is known that:
40% of the WORD documents exceed 10 pages
20% of the LATEX documents exceed 10 pages
20% of the HTML documents exceed 10 pages
(a) What is the overall proportion of documents containing more
than 10 pages?
(b) A document is chosen at random and found to have more
than 10 pages. What is the probability that it has been written
in LATEX?
3. Enquiries to an on-line computer system arrive on 5 communication
lines. The percentages of messages received through each line are:

Line        1   2   3   4   5
% received  20  30  10  15  25

From past experience, it is known that the percentages of messages
exceeding 100 characters on the different lines are:

Line            1   2   3   4   5
% exceeding
100 characters  40  60  20  80  90

(a) Calculate the overall proportion of messages exceeding 100
characters.
(b) If a message chosen at random is found to exceed 100 characters,
what is the probability that it came through line 5?
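Example 3 is not worked later in these notes, but both parts follow the same total-probability and Bayes pattern as the other examples. A minimal R sketch, in the style of the snippet used for Example 1 (the variable names are illustrative; the two vectors simply restate the tables above):

```r
prior <- c(.20, .30, .10, .15, .25)  # P(line i): proportion of messages per line
lik   <- c(.40, .60, .20, .80, .90)  # P(exceeds 100 chars | line i)

totalprob <- sum(prior * lik)        # total probability of exceeding 100 characters
totalprob                            # 0.625

post5 <- prior[5] * lik[5] / totalprob  # posterior probability of line 5
post5                                # 0.36
```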

4. A binary communication channel carries data as one of two sets
of signals denoted by 0 and 1. Owing to noise, a transmitted 0
is sometimes received as a 1, and a transmitted 1 is sometimes
received as a 0. For a given channel, it can be assumed that
a transmitted 0 is correctly received with probability 0.95 and a
transmitted 1 is correctly received with probability 0.75. Also, 60%
of all messages are transmitted as a 0. If a signal is sent, determine
the probability that:
(a) a 1 was received;
(b) a 0 was received;
(c) an error occurred;
(d) a 1 was transmitted given that a 1 was received;
(e) a 0 was transmitted given that a 0 was received.
Example 1

        Compiles       Does not compile
        on first run   on first run
C++     72             48                 120
Java    64             16                  80
        136            64                 200

What is the probability that a program was written in C++, given
that we know it compiled on the first run?

If E is the event of compiling on the first run, we seek P(C++|E).

Now
P(C++ ∩ E) = P(C++)P(E|C++)
and
P(E ∩ C++) = P(E)P(C++|E)
Since
P(E ∩ C++) = P(C++ ∩ E),
it follows that
P(E)P(C++|E) = P(C++)P(E|C++)
So
P(C++|E) = P(C++)P(E|C++) / P(E)

P(C++|E) is the Posterior Probability of C++ after it has been found
that the program has compiled on the first run.

P(C++|E) = P(C++)P(E|C++) / P(E)

where, by total probability,
P(E) = P(C++)P(E|C++) + P(J)P(E|J)
     = 120/200 × 72/120 + 80/200 × 64/80 = 0.68
Then
P(C++|E) = (120/200 × 72/120) / 0.68
In R :
totalprob <-((80/200)*(64/80))+((120/200)*(72/120)) # total probability
conditc <-(120/200)*(72/120)/totalprob #posterior probability of C++
conditc
[1] 0.5294118
Analogously,
P(J|E) = P(J)P(E|J) / (P(C++)P(E|C++) + P(J)P(E|J))
In R
totalprob <-((80/200)*(64/80))+((120/200)*(72/120)) # total probability
conditjava <-(80/200)*(64/80)/totalprob #posterior probability of Java
conditjava
[1] 0.4705882

Prior and Posterior Probabilities after a program has been
found to have compiled on first run

          C++    Java
Prior     0.60   0.40
Posterior 0.53   0.47
Bayes’ Theorem

Bayes’ Rule
Bayes’ rule for two events
Consider two events A and B.
P (A ∩ B) = P (A)P (B|A)
and
P (B ∩ A) = P (B)P (A|B)
Since
P(A ∩ B) = P(B ∩ A),
it follows from the multiplication law that
P(B)P(A|B) = P(A)P(B|A)
which implies that
P(A|B) = P(A)P(B|A) / P(B)

P(A|B) is the Posterior Probability of A after B has occurred.
Example 4

A binary communication channel carries data as one of two sets of
signals denoted by 0 and 1. Owing to noise, a transmitted 0 is sometimes
received as a 1, and a transmitted 1 is sometimes received as a 0.

For a given channel, it can be assumed that a transmitted 0 is
correctly received with probability 0.95 and a transmitted 1 is correctly
received with probability 0.75. It is also known that 70% of all
messages are transmitted as a 0.

If a signal is sent, what is the probability that
1. a 1 was transmitted given that a 1 was received;
2. a 0 was transmitted given that a 0 was received.
Solution

Let:
R0 be the event that a zero is received;
T0 be the event that a zero is transmitted, P(T0) = 0.7;
R1 be the event that a one is received;
T1 be the event that a one is transmitted, P(T1) = 0.3.

P(T0|R0) = P(T0)P(R0|T0) / P(R0)

Now:
P (T0 )P (R0 |T0 ) = .7 × .95
and
R0 = (T0 ∩ R0 ) ∪ (T1 ∩ R0 )
So
P (R0 ) = P (T0 )P (R0 |T0 ) + P (T1 )P (R0 |T1 )
Therefore
P (R0 ) = .7 × .95 + .3 × .25 = .74

P(T0|R0) = (.7 × .95) / (.7 × .95 + .3 × .25) = .665/.74 = .90

Similarly
P(T1|R0) = (.3 × .25) / (.7 × .95 + .3 × .25) = .075/.74 = .10
Example

From past experience it is known that:
50% of documents are written in WORD;
30% in LATEX;
20% in HTML.

These are the prior probabilities:
• P(Word) = .5;
• P(Latex) = .3;
• P(Html) = .2.

40% of the WORD documents exceed 10 pages
20% of the LATEX documents exceed 10 pages
20% of the HTML documents exceed 10 pages

A document chosen at random was found to exceed 10 pages. What
is the probability that it has been written in Latex?

Let E be the event that a document, chosen at random, contains more
than 10 pages. We seek P(Latex|E).

P(Latex|E) = P(Latex)P(E|Latex) / P(E)

Now
P(E) = P(Word)P(E|Word) + P(Latex)P(E|Latex) + P(Html)P(E|Html)
     = 0.5 × 0.4 + 0.3 × 0.2 + 0.2 × 0.2 = .3
So:
P(Latex|E) = P(Latex)P(E|Latex) / P(E) = (0.3 × 0.2)/.3 = .2
Similarly
P(Word|E) = (.5 × .4)/.3 = .67
and
P(Html|E) = (.2 × .2)/.3 = .13

Prior and Posterior Probabilities after a document has been
found to contain more than 10 pages

          Word   Latex   Html
Prior     0.50   0.30    0.20
Posterior 0.67   0.20    0.13
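The whole table can be reproduced in one R computation, in the style of the Example 1 snippets (the variable names are illustrative):

```r
prior <- c(Word = .5, Latex = .3, Html = .2)  # prior probabilities
lik   <- c(Word = .4, Latex = .2, Html = .2)  # P(more than 10 pages | type)

totalprob <- sum(prior * lik)                 # P(E) = 0.3
posterior <- prior * lik / totalprob          # posterior probabilities given E
round(posterior, 2)
#  Word Latex  Html
#  0.67  0.20  0.13
```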
General Bayes Rule
If a sample space can be partitioned into k mutually exclusive and
exhaustive events
A1, A2, A3, · · · , Ak
S = A1 ∪ A2 ∪ A3 ∪ · · · ∪ Ak

then for any event E:

P(E) = P(A1)P(E|A1) + P(A2)P(E|A2) + · · · + P(Ak)P(E|Ak)

P(Ai|E) = P(Ai)P(E|Ai) / P(E)

Proof: For any i, 1 ≤ i ≤ k,
E ∩ Ai = Ai ∩ E
P(E)P(Ai|E) = P(Ai)P(E|Ai)
P(Ai|E) = P(Ai)P(E|Ai) / P(E)

P(Ai|E) is called the POSTERIOR PROBABILITY
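The general rule is a one-liner in R, since the vector of joint probabilities P(Ai)P(E|Ai) only needs to be divided by its sum P(E). The helper name `bayes_posterior` is illustrative, not a built-in:

```r
# Posterior probabilities for a partition A1..Ak, given the priors P(Ai)
# and the conditional probabilities P(E|Ai); the vectors must match in length.
bayes_posterior <- function(prior, lik) {
  prior * lik / sum(prior * lik)  # divide each joint probability by P(E)
}

# Reproduces the C++/Java example: posteriors 0.53 and 0.47 (to 2 dp)
round(bayes_posterior(c(0.6, 0.4), c(72/120, 64/80)), 2)
# [1] 0.53 0.47
```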


SOME APPLICATIONS OF BAYES

• Game Show: “Let’s Make a Deal”
• Hardware Fault Diagnosis
• Machine Learning
• Machine Translation
Game Show:

You are a contestant in a game which allows you to choose
one out of three doors. One of these doors conceals a laptop
computer while the other two are empty. When you
have made your choice, the host of the show opens one of
the remaining doors, and shows you that it is one of the
empty ones. You are now given the opportunity to change
your door. Should you do so, or should you stick with your
original choice?

The question we address is: If you change, what are your
chances of winning the laptop?

When you choose a door randomly originally, there is a
probability of 1/3 that the laptop will be behind the door
that you choose, and a probability of 2/3 that it will be
behind one of the other two doors. These probabilities do not
change when a door is opened and revealed empty; therefore
if you switch you will have a probability of 2/3 of winning
the laptop.

Let D1, D2, and D3 be the events that the laptop is behind
door 1, door 2 and door 3 respectively. We assume that
P(D1) = P(D2) = P(D3) = 1/3
and, for the complementary events,
P(D̄1) = P(D̄2) = P(D̄3) = 2/3,
where D̄i is the event that the laptop is not behind door i.
Suppose you choose door 1.
Let E be the event that the host opens door 2, and re-
veals it to be empty. If you change after this, you will be
choosing door 3. So we need to know P (D3 |E). From
Bayes’ rule,
P(D3|E) = P(D3)P(E|D3) / P(E)
Now, from total probability,
E = (D1 ∩ E) ∪ (D2 ∩ E) ∪ (D3 ∩ E)
So
P(E) = P(D1 ∩ E) + P(D2 ∩ E) + P(D3 ∩ E)
     = P(D1)P(E|D1) + P(D2)P(E|D2) + P(D3)P(E|D3)
So we can write
P(D3|E) = P(D3)P(E|D3) / (P(D1)P(E|D1) + P(D2)P(E|D2) + P(D3)P(E|D3))   (1)
We need to calculate P (E|D1 ), P (E|D2 ) and P (E|D3 ).

P(E|D1) is the probability that the host opens door 2 and
reveals it to be empty, given that the laptop is behind door 1.
When the laptop is behind door 1, the door selected by you,
the host has two doors to select from. We assume that the host
selects one of these at random, i.e. P(E|D1) = 1/2.

P(E|D2) is the probability that the host opens door 2
and reveals it to be empty, given that the laptop is behind door 2.
This is an impossibility, so it has a probability of zero,
i.e. P(E|D2) = 0.

Finally, P(E|D3) is the probability that the host opens door 2
and reveals it to be empty, given that the laptop is behind door 3.
When the laptop is behind door 3, the host has just one way of
revealing an empty door: the host must open door 2. This is
a certainty, so it has a probability of one, i.e. P(E|D3) = 1.

Putting these values into (1) we have

P(D3|E) = (1/3 × 1) / ((1/3 × 1/2) + (1/3 × 0) + (1/3 × 1)) = 2/3

So changing would increase your probability of winning the
laptop from 1/3 to 2/3. You would double your chance by changing.

The prior probability of winning the laptop was 1/3, while
the posterior probability that the laptop is behind the door
you have not chosen is 2/3.
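The 2/3 answer can also be checked by simulation. A minimal R sketch (the seed and sample size are arbitrary): since the host always opens an empty, unchosen door, switching wins exactly when the initial pick was wrong, so no explicit host move needs to be simulated.

```r
set.seed(42)
n <- 100000
prize  <- sample(1:3, n, replace = TRUE)  # door concealing the laptop
choice <- sample(1:3, n, replace = TRUE)  # contestant's initial pick

stick_wins  <- mean(choice == prize)  # about 1/3
switch_wins <- mean(choice != prize)  # about 2/3
```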
Hardware Fault Diagnosis

Printer failures are associated with three types of problems:
hardware, software and electrical connections. A printer
manufacturer obtained the following probabilities from a
database of test results:
P(H) = 0.1   P(S) = 0.6   P(E) = 0.3
where H, S, and E denote hardware, software or electrical
problems respectively.
These are the prior probabilities.

It is also known that the probability of a printer failure (F):
• given a hardware problem is 0.9, P(F|H) = 0.9
• given a software problem is 0.2, P(F|S) = 0.2
• given an electrical problem is 0.5, P(F|E) = 0.5
These are the conditional probabilities.

If a customer reports a printer failure, what is the most
likely cause of the problem?

We need the posterior probabilities
P(H|F), P(S|F), P(E|F),
the likelihood that, given that a printer fault has been reported,
it will be due to faulty hardware, software or electrical
problems.
We calculate the posterior probabilities using Bayes’ rule.

First calculate the total probability of a failure occurring:
F = (H ∩ F) ∪ (S ∩ F) ∪ (E ∩ F)
P(F) = P(H)P(F|H) + P(S)P(F|S) + P(E)P(F|E)
     = .1 × .9 + .6 × .2 + .3 × .5 = .36
Then the posterior probabilities are:
P(H|F) = P(H)P(F|H)/P(F) = (.1 × .9)/.36 = .250
P(S|F) = P(S)P(F|S)/P(F) = (.6 × .2)/.36 = .333
P(E|F) = P(E)P(F|E)/P(F) = (.3 × .5)/.36 = .417
Because P (E|F ) is the largest, the most likely cause of the
problem is electrical.

A help desk or website dialog to diagnose the problem should
check into this problem first.
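The diagnosis can be sketched in R in the style of the earlier snippets (the variable names are illustrative); `which.max` picks out the most probable cause directly:

```r
prior <- c(H = .1, S = .6, E = .3)   # prior probabilities of each problem type
lik   <- c(H = .9, S = .2, E = .5)   # P(failure | problem type)

totalprob <- sum(prior * lik)        # P(F) = 0.36
posterior <- prior * lik / totalprob # posterior probabilities given a failure
round(posterior, 3)
#     H     S     E
# 0.250 0.333 0.417
names(which.max(posterior))          # "E": electrical is the most likely cause
```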
Machine Learning

Bayes’ theorem is sometimes used to improve the accuracy
in supervised learning. It is used in the classification of items
where the system has already learnt the probabilities.

We need to classify f, a new example, into the correct
class. Suppose there are only two classes, y = 1 and y = 2,
into which we can classify new values of f.

By Bayes’ rule, we can write

P(y = 1|f) = P(y = 1 ∩ f)/P(f) = P(y = 1)P(f|y = 1)/P(f)   (2)
P(y = 2|f) = P(y = 2 ∩ f)/P(f) = P(y = 2)P(f|y = 2)/P(f)   (3)

Dividing (2) by (3) we get

P(y = 1|f)/P(y = 2|f) = P(y = 1)P(f|y = 1) / (P(y = 2)P(f|y = 2))   (4)

Our decision is to classify a new example into class 1 if
P(y = 1|f)/P(y = 2|f) > 1
or equivalently if
P(y = 1)P(f|y = 1) / (P(y = 2)P(f|y = 2)) > 1

f goes into class 1 iff
P(y = 1)P(f|y = 1) > P(y = 2)P(f|y = 2)
and f goes into class 2 if
P(y = 1)P(f|y = 1) < P(y = 2)P(f|y = 2)

The prior probabilities of being in either of two classes are
assumed to be already learnt:
P(y = 1) = .4   P(y = 2) = .6

Also the conditional probabilities for the new example f
are already learnt:
P(f|y = 1) = .5   P(f|y = 2) = .3

Into what class should you classify the new example?

P(y = 1)P(f|y = 1) = .4 × .5 = .20
and
P(y = 2)P(f|y = 2) = .6 × .3 = .18
and since
P(y = 1)P(f|y = 1) > P(y = 2)P(f|y = 2),
the new example goes into class 1.
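The decision rule above amounts to comparing the two products prior × likelihood; in R (the variable names are illustrative):

```r
prior <- c(.4, .6)      # P(y = 1), P(y = 2)
lik   <- c(.5, .3)      # P(f | y = 1), P(f | y = 2)

scores <- prior * lik   # 0.20 and 0.18; P(f) cancels in the comparison
which.max(scores)       # 1: the new example goes into class 1
```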
Machine Translation

A string of English words e can be translated into a string
of French words f in many different ways. Often, knowing
the broader context in which e occurs may narrow the set
of acceptable French translations but, even so, many acceptable
translations remain. Bayes’ rule can be used to make a choice
between them.

To every pair (e, f) a number P(f|e) is assigned, which is
interpreted as the probability that a translator, when given
e to translate, will return f as the translation.

Given a string of French words f, the job of the translation
system is to find the string e that the native speaker
had in mind when f was produced. We seek P(e|f).

From Bayes’ rule,

P(e|f) = P(e)P(f|e) / P(f)

We seek the ê that maximises P(e|f), i.e. choose ê so that

P(ê|f) = max_e P(e)P(f|e) / P(f)

which is equivalent to max_e (P(e)P(f|e)), since P(f) is constant
regardless of e. This is known as the Fundamental
Equation of Machine Translation, usually written as
ê = argmax_e P(e)P(f|e)
As a simple example, suppose there are 3 possibilities for e:
P(e1) = 0.2, P(e2) = 0.5, P(e3) = 0.3.
One of these has been translated into f. A passage f has
the following conditional probabilities:
P(f|e1) = 0.4, P(f|e2) = 0.2, P(f|e3) = 0.4.
We can use Bayes’ theorem to decide which is the most
likely e.
P(e1)P(f|e1) = .2 × .4 = .08
P(e2)P(f|e2) = .5 × .2 = .10
P(e3)P(f|e3) = .3 × .4 = .12
The best translation, in this case, is e3 since it has the highest
posterior probability.
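The argmax computation is one line of R (the variable names are illustrative):

```r
prior <- c(.2, .5, .3)   # P(e1), P(e2), P(e3)
lik   <- c(.4, .2, .4)   # P(f | e1), P(f | e2), P(f | e3)

scores <- prior * lik    # 0.08, 0.10, 0.12; P(f) cancels in the argmax
ehat <- which.max(scores)
ehat                     # 3: e3 is the best translation
```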
