Bayes' theorem
2 Examples
2.1 Cancer at age 65
Suppose we want to know a person's probability of having cancer, but we know nothing else about him or her. Despite not knowing anything about that person, a probability can be assigned based on the general prevalence of cancer. For the sake of this example, suppose it is 1%. This is known as the base rate or prior probability of having cancer. "Prior" refers to the time before being informed about the particular case at hand.
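The role of the prior can be sketched numerically. The test characteristics below (90% sensitivity, 95% specificity) are hypothetical illustration values, not from this article; only the 1% base rate comes from the text:

```python
# Updating a prior probability with evidence, via Bayes' theorem.
# The 1% prevalence is the prior from the text; the test's
# sensitivity and specificity are hypothetical illustration values.

def posterior(prior, sensitivity, specificity):
    """P(condition | positive test) from the base rate and test quality."""
    true_pos = sensitivity * prior               # P(+ and condition)
    false_pos = (1 - specificity) * (1 - prior)  # P(+ and no condition)
    return true_pos / (true_pos + false_pos)

p = posterior(prior=0.01, sensitivity=0.90, specificity=0.95)
print(round(p, 3))  # the 1% prior rises, but stays far below the sensitivity
```

The point of the sketch is that the posterior depends as much on the base rate as on the accuracy of the test, which is exactly what the examples below exploit.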
2.2 Drug testing

Suppose a test for drug use is 99% sensitive and 99% specific, and that 0.5% of the population are users of the drug.

[Figure: Tree diagram for the drug-testing example. With P(User) = 0.5% and P(Non-user) = 99.5%, the branches give P(+ | User) = 99%, P(− | User) = 1%, P(+ | Non-user) = 1% and P(− | Non-user) = 99%, so that P(User ∩ +) = 0.495%, P(User ∩ −) = 0.005%, P(Non-user ∩ +) = 0.995% and P(Non-user ∩ −) = 98.505%.]

If a randomly selected individual tests positive, the probability that he or she is a user is

P(User | +) = P(+ | User) P(User) / [P(+ | User) P(User) + P(+ | Non-user) P(Non-user)]
            = (0.99 × 0.005) / (0.99 × 0.005 + 0.01 × 0.995)
            ≈ 33.2%.
Despite the apparent accuracy of the test, if an individual tests positive, it is more likely that they do not use the drug than that they do. This again illustrates the importance of base rates, and how the formation of policy can be egregiously misguided if base rates are neglected.

This surprising result arises because the number of non-users is very large compared to the number of users; thus the number of false positives (0.995%) outweighs the number of true positives (0.495%). To use concrete numbers, if 1000 individuals are tested, 995 are expected to be non-users and 5 users. From the 995 non-users, 0.01 × 995 ≈ 10 false positives are expected; from the 5 users, 0.99 × 5 ≈ 5 true positives are expected. Out of the roughly 15 positive results, only about 5, i.e. about 33%, are genuine.
Note: The importance of specificity can be illustrated by noting that even if sensitivity is raised to 100% while specificity stays at 99%, the probability that a person testing positive is a drug user is still only about 33%, whereas if the sensitivity is held at 99% and the specificity is raised to 99.5%, that probability rises to about 49.9%.
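These figures can be checked directly; a minimal sketch of the computation:

```python
# Posterior probability of drug use given a positive test,
# P(User | +), for the scenarios discussed above.

def p_user_given_positive(prevalence, sensitivity, specificity):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# 99% sensitive, 99% specific, 0.5% prevalence: about 33.2%
print(round(p_user_given_positive(0.005, 0.99, 0.99), 3))

# 99% sensitive, 99.5% specific: about 49.9%
print(round(p_user_given_positive(0.005, 0.99, 0.995), 3))
```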
Hence 2.4% of the total output of the factory is defective.

We are given that B has occurred, and we want to calculate the conditional probability of A3. By Bayes' theorem,

P(A3 | B) = P(B | A3) P(A3) / P(B) = (0.01)(0.50)/(0.024) = 5/24.

Given that the item is defective, the probability that it was made by the third machine is only 5/24. Although machine 3 produces half of the total output, it produces a much smaller fraction of the defective items. Hence the knowledge that the item selected was defective enables us to replace the prior probability P(A3) = 1/2 by the smaller posterior probability P(A3 | B) = 5/24.

3 Interpretations

3.1 Bayesian interpretation

A geometric visualisation of Bayes' theorem:

Relative size    Case B    Case B̄    Total
Condition A      w         x         w+x
Condition Ā      y         z         y+z
Total            w+y       x+z       w+x+y+z

P(A | B) P(B) = w/(w+y) × (w+y)/(w+x+y+z) = w/(w+x+y+z)
P(B | A) P(A) = w/(w+x) × (w+x)/(w+x+y+z) = w/(w+x+y+z)

In the table, the values w, x, y and z give the relative weights of each corresponding condition and case; each probability is the fraction of the relevant cells. The two products above show that P(A | B) P(B) = P(B | A) P(A), i.e. P(A | B) = P(B | A) P(A) / P(B). Similar reasoning can be used to show that P(Ā | B) = P(B | Ā) P(Ā) / P(B) and so forth.

In the Bayesian (or epistemological) interpretation, probability measures a degree of belief. Bayes' theorem then links the degree of belief in a proposition before and after accounting for evidence. For example, suppose it is believed with 50% certainty that a coin is twice as likely to land heads than tails. If the coin is flipped a number of times and the outcomes observed, that degree of belief may rise, fall or remain the same depending on the results.

For proposition A and evidence B,

P(A), the prior, is the initial degree of belief in A.
P(A | B), the posterior, is the degree of belief having accounted for B.
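The coin example can be made concrete. The sketch below assumes just two hypotheses, a fair coin and one whose heads probability is 2/3 ("twice as likely to land heads than tails"), with 50% prior belief in each; these modelling choices are illustrative, not from the text:

```python
# Bayesian belief update for the coin example: prior 50% belief
# that the coin is biased 2:1 toward heads, updated flip by flip.

P_HEADS = {"fair": 0.5, "biased": 2 / 3}  # P(heads) under each hypothesis

def update(belief, flip):
    """Return posterior beliefs after observing one flip ('H' or 'T')."""
    likelihood = {h: p if flip == "H" else 1 - p for h, p in P_HEADS.items()}
    unnormalised = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(unnormalised.values())
    return {h: v / total for h, v in unnormalised.items()}

belief = {"fair": 0.5, "biased": 0.5}
for flip in "HHT":  # belief in the biased coin rises with H, falls with T
    belief = update(belief, flip)
print(round(belief["biased"], 3))
```

Running the update on different flip sequences shows the degree of belief rising, falling or staying roughly the same, exactly as the text describes.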
[Figure: Two tree diagrams over the same events A and B, one partitioning outcomes first by A and then by B, the other first by B and then by A, with the joint probabilities P(A ∩ B), the conditionals P(B | A) and P(A | B), and the marginals P(A) and P(B) on the branches. Knowledge of any 3 independent values is sufficient to deduce all 24 values, and knowledge of one diagram is sufficient to deduce the other.]

[Figure: Worked tree diagram. Suppose 0.1% of a population is "Rare" (R) and 99.9% is "Common" (C), and that a certain pattern occurs in 98% of Rare individuals, giving P(R ∩ Pattern) = 0.098%, but in only 5% of Common individuals, giving P(C ∩ Pattern) = 4.995%. Then

P(Rare | Pattern) = 0.098% / (0.098% + 4.995%) ≈ 1.9%.]
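The tree-diagram arithmetic for the Rare/Common pattern can be replicated directly; a brief sketch:

```python
# P(Rare | Pattern): 0.1% of the population is Rare, the pattern
# occurs in 98% of Rare and 5% of Common individuals.

p_rare = 0.001
p_pattern_given_rare = 0.98
p_pattern_given_common = 0.05

joint_rare = p_pattern_given_rare * p_rare            # 0.098% of everyone
joint_common = p_pattern_given_common * (1 - p_rare)  # 4.995% of everyone

p_rare_given_pattern = joint_rare / (joint_rare + joint_common)
print(round(p_rare_given_pattern, 3))  # about 1.9%
```

As in the drug-testing example, the tiny base rate keeps the posterior small even though the pattern is far more common among Rare individuals.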
4 Forms
4.1 Events
4.1.1 Simple form
For events A and B, provided that P(B) ≠ 0,

P(A | B) = P(B | A) P(A) / P(B).

When B is fixed, this says that the posterior P(A | B) is proportional to P(B | A) P(A); the constant of proportionality can be determined by using the fact that the probabilities must add up to one. For instance, for a given event A, the event A itself and its complement Ā are exclusive and exhaustive. Denoting the constant of proportionality by c, we have

c = 1 / [P(A) P(B | A) + P(Ā) P(B | Ā)].

4.1.2 Alternative form

Another form of Bayes' theorem, generally encountered when looking at two competing statements or hypotheses, is

P(A | B) = P(B | A) P(A) / [P(B | A) P(A) + P(B | Ā) P(Ā)].

4.1.3 Extended form

Often the denominator is expanded using the law of total probability over a partition {Aj} of the sample space:

P(B) = Σj P(B | Aj) P(Aj),

so that

P(Ai | B) = P(B | Ai) P(Ai) / Σj P(B | Aj) P(Aj).

In the special case where A is a binary variable, this reduces to the alternative form above.

4.2 Random variables

[Figure: Diagram illustrating the meaning of Bayes' theorem as applied to an event space generated by continuous random variables X and Y. The total area is 1; a horizontal strip has volume P(Y = y), a vertical strip has volume P(X = x), and their intersection has volume P(X = x ∩ Y = y), so that P(X = x | Y = y) = P(X = x ∩ Y = y) / P(Y = y) and P(Y = y | X = x) = P(X = x ∩ Y = y) / P(X = x).]

When X is discrete and the observation Y = y is described by a density,

P(X = x | Y = y) = fY(y | X = x) P(X = x) / fY(y).

The denominator is again obtained using the law of total probability; for fY(y), this becomes an integral:

fY(y) = ∫ fY(y | X = ξ) fX(ξ) dξ.

For two continuous random variables X and Y, Bayes' theorem may be analogously derived from the definition of conditional density:

fX(x | Y = y) = fX,Y(x, y) / fY(y),
fY(y | X = x) = fX,Y(x, y) / fX(x),

giving

fX(x | Y = y) = fY(y | X = x) fX(x) / fY(y).

4.3 Bayes' rule
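Bayes' rule is commonly stated in odds form: posterior odds equal prior odds times the likelihood ratio. That formulation is a standard restatement rather than text recovered from this article; the sketch below checks it against the drug-testing numbers from section 2.2:

```python
# Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
# Checked against the drug-testing example (0.5% prevalence,
# 99% sensitivity, 99% specificity).

prior_odds = 0.005 / 0.995       # odds of being a user before the test
likelihood_ratio = 0.99 / 0.01   # P(+ | User) / P(+ | Non-user)

posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)
print(round(posterior_prob, 3))  # about 33.2%, matching the direct form
```

The odds form makes it obvious why the answer stays low: a likelihood ratio of 99 cannot overcome prior odds of roughly 1 to 199.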
5 Derivation

5.1 For events

For two events A and B, the definition of conditional probability gives

P(A | B) = P(A ∩ B) / P(B), if P(B) ≠ 0,
P(B | A) = P(A ∩ B) / P(A), if P(A) ≠ 0.

Equating the two expressions for the joint probability,

P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A),

and dividing by P(B) yields Bayes' theorem:

P(A | B) = P(B | A) P(A) / P(B), if P(B) ≠ 0.

6 History
See also
Bayesian inference
Inductive probability
9 Further reading

Bruss, F. Thomas (2013), "250 years of 'An Essay towards solving a Problem in the Doctrine of Chance. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S.'", Jahresbericht der Deutschen Mathematiker-Vereinigung, Springer Verlag, Vol. 115, Issue 3-4 (2013), 129-133, doi:10.1365/s13291-013-0077-z.

Gelman, A, Carlin, JB, Stern, HS, and Rubin, DB (2003), Bayesian Data Analysis, Second Edition, CRC Press.

Grinstead, CM and Snell, JL (1997), Introduction to Probability (2nd edition), American Mathematical Society (free pdf available).

Hazewinkel, Michiel, ed. (2001), "Bayes formula", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4.

Laplace, P (1774/1986), "Memoir on the Probability of the Causes of Events", Statistical Science 1(3):364-378.

Lee, PM (2012), Bayesian Statistics: An Introduction, Wiley.

McGrayne, SB (2011), The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines & Emerged Triumphant from Two Centuries of Controversy, Yale University Press, ISBN 978-0-300-18822-6.

Rosenthal, JS (2005), Struck by Lightning: the Curious World of Probabilities, HarperCollins.

Stigler, SM (1986), "Laplace's 1774 Memoir on Inverse Probability", Statistical Science 1(3): 359-363, doi:10.1214/ss/1177013620.

Stone, JV (2013), Bayes' Rule: A Tutorial Introduction to Bayesian Analysis, Sebtel Press, England.