Chapter 8 Bayes' Theorem
Downloaded from ascelibrary.org by University of Illinois At Urbana on 06/27/17. Copyright ASCE. For personal use only; all rights reserved.
Bayes’ Theorem
8.1 Introduction
Often it is difficult to discern the likely cause of an outcome, but doing so is essential
to diagnosing the problem so that remedial measures and intervention may be
introduced. One needs to resort to techniques, such as Bayes’ theorem, that are able
to mathematically derive the likelihood of events that could cause the specific
outcome. In Chapter 4, we studied cause-effect diagrams where causes had very
discrete, nonprobabilistic connections to outcomes. Here, although we know that a
certain event might be the cause of an outcome, we need to determine the likelihood
of its being the real cause.
The purpose of this chapter is to go over the derivation of Bayes’ theorem from
conditional probability, followed by example applications to demonstrate our
understanding of the theorem. Sample exercises demonstrate applications of Bayes’
theorem that should help students to better understand the theorem through
practice.
Bayes’ theorem was derived from conditional probability through the under-
standing of events and sample space, the multiplication rule, and the law of total
probability. The example applications and exercises show that known beliefs can be
updated with the use of current data. Furthermore, the worked examples show that the theorem is an excellent tool for risk analysis in the decision-making process.
Throughout the years, the validity of Bayes’ theorem has been questioned because it
is based on subjective interpretation and belief. Bayesian inferences require knowl-
edge and skill to translate these prior beliefs into something more scientific and less
subjective. There are no standard procedures to explain how to determine a prior
probability, and there is no correct way to choose one. If prior probabilities are not
selected by an experienced individual, he or she can generate misleading results.
Thus, Bayes’ theorem can produce posterior probability distributions that are biased
by the prior probabilities (SAS Institute 2010).
That said, Bayes’ theorem is mathematically perfect and constituted an advance
in mathematical science when first presented. In addition, its mathematical simplic-
ity is considerable. If the prior probabilities can be applied accurately, Bayes’
theorem yields very high quality, reliable information.
1 Thomas Bayes, 1701–1761. Bayes was an English philosopher, Presbyterian minister, and statistician (Belhouse 2001). He is known for having published Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731). Bayes himself never published the theorem; it became known only after his death.
Figs. 8-1 to 8-3 will provide a better understanding of the relationship between
events, sample space, and probability.
Fig. 8-1 represents the finite sample space S as a rectangular box encompassing the circle representing event A. Fig. 8-2 shows the nonoccurrence of event A, written as A′, in the hatched area within the sample space. Note that the circle representing event A is no longer shaded. Because P(S) = 1, P(A′) = 1 − P(A). Alternately, we can say that P(A) + P(A′) = 1.
Multiple events that occur can interact with each other in a given sample space.
These interactions are best displayed in Venn diagrams. In the following cases, two
events, A and B, interact within the sample space. The sum, or union, of these two
events is denoted as A∪B, which represents the combined probability of the events—
that is, the probability that event A or event B or both will occur—and is illustrated in
Fig. 8-3; the events are outlined to represent their union. The probability of the union of A and B in a sample space is denoted as P(A∪B). Because this is a function of the sample space, P(A∪B) implies the probability of A∪B given the sample space S, or P(A∪B)|S.
Fig. 8-4 illustrates the outcomes the two events have in common. This is represented by the central crosshatched area where events A and B intersect and is known as the intersection, or product, of the events. This is denoted as A∩B and its probability as P(A∩B). Or, similar to the previous discussion for union, the intersection of events A and B given the sample space S can be written P(A∩B)|S. P(A∩B) can also be referred to as the joint probability of events A and B.
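These set relations can be checked directly for a finite, equally likely sample space. The following Python sketch uses an illustrative ten-outcome space (the specific sets are assumptions, not from the text); note that `|` and `&` here are Python's set union and intersection operators, mirroring ∪ and ∩:

```python
# Finite sample space of equally likely outcomes (illustrative values).
S = set(range(1, 11))   # S = {1, ..., 10}, so P(S) = 1
A = {1, 2, 3, 4, 5}     # event A
B = {4, 5, 6, 7}        # event B

def P(event, space=S):
    """Probability of an event in an equally likely finite sample space."""
    return len(event & space) / len(space)

# Complement rule: P(A) + P(A') = 1
assert P(A) + P(S - A) == 1.0

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)

print(P(A | B), P(A & B))  # 0.7 0.2
```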
If two events are independent, the occurrence of one does not affect the probability of the other; if they are dependent, the probability of the occurrence of one event is affected by the occurrence or nonoccurrence of the other. The following equations, derived from the laws of probability, are relevant to conditional probability and the derivation of Bayes' theorem (Freund and Williams 1977).
The formula for independent events is derived from the special multiplication rule
P(A ∩ B) = P(A) · P(B)
The formula for dependent events is derived from the general multiplication rule
P(A ∩ B) = P(B) · P(A|B)
Because it does not matter which event is referred to as A or B, we can say that
P(B) · P(A|B) = P(A) · P(B|A)
and therefore
P(A|B) = P(A) · P(B|A) / P(B)
which is Bayes' theorem; it deals specifically with dependent events and allows for the updating of probability values should new information arise. In the general multiplication rule above, we see the conditional probability P(A|B). This is defined as the probability that event A will occur given that event B has occurred. This conditional probability is where the new information would be applied to revise the posterior probability.
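The algebra above can be verified numerically. In this Python sketch, the values of P(A), P(B|A), and P(B) are illustrative assumptions chosen only to exercise the formulas:

```python
# Hypothetical numbers (not from the text): a prior P(A), a conditional
# P(B|A), and a marginal P(B), chosen only to illustrate the algebra.
P_A = 0.3
P_B_given_A = 0.5
P_B = 0.4

# General multiplication rule: P(A ∩ B) = P(A) · P(B|A)
P_A_and_B = P_A * P_B_given_A

# Conditional probability from the joint, and Bayes' theorem directly:
P_A_given_B = P_A_and_B / P_B
bayes = P_A * P_B_given_A / P_B

assert abs(P_A_given_B - bayes) < 1e-12
print(round(P_A_given_B, 3))  # 0.375
```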
The data given can be tabulated as shown in Table 8-1, where U denotes the selection of an upperclassman, L the selection of a lowerclassman, E the selection of a student with prior experience, and N the selection of a student with no experience. Using the given data, we can fill the remaining cells with a bit of arithmetic, as shown in Table 8-2.
P(U) = (20 + 30)/80 = 0.625
P(L) = (15 + 15)/80 = 0.375
P(E) = (20 + 15)/80 = 0.4375
P(N) = (30 + 15)/80 = 0.5625
P(U∩N) = 30/80 = 0.375
P(U|N) = 30/45 = 0.667 = 66.7%
Written in terms of conditional probability,
P(U|N) = (30/80)/(45/80) = P(U∩N)/P(N) = 30/45 = 0.667
Therefore,
P(U∩N) = P(U|N) · P(N)
and, because the intersection is symmetric,
P(U|N) · P(N) = P(N|U) · P(U)
And, in general,
P(A|B) = P(B|A) · P(A) / P(B)
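Using the cell counts from Table 8-2 (20, 30, 15, and 15 students), the symmetry above can be checked in a few lines of Python:

```python
# Cell counts from Table 8-2 (80 students total, by class standing
# U/L and experience E/N).
counts = {("U", "E"): 20, ("U", "N"): 30, ("L", "E"): 15, ("L", "N"): 15}
total = sum(counts.values())  # 80

P_UN = counts[("U", "N")] / total                        # P(U ∩ N) = 30/80
P_N = (counts[("U", "N")] + counts[("L", "N")]) / total  # 45/80
P_U = (counts[("U", "E")] + counts[("U", "N")]) / total  # 50/80

P_U_given_N = P_UN / P_N  # 30/45
P_N_given_U = P_UN / P_U  # 30/50

# Both orderings recover the same joint probability P(U ∩ N):
assert abs(P_U_given_N * P_N - P_N_given_U * P_U) < 1e-12
print(round(P_U_given_N, 3), round(P_N_given_U, 3))  # 0.667 0.6
```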
The logic of Bayes’ theorem can be applied in either direction. The first logic to
consider would be from cause to effect, as shown in Fig. 8-5. Event A represents the
cause, and event B represents the effect, or outcome. The probabilities that follow
this logic are the a priori, or prior, probability and its respective conditional
probabilities. The a priori probabilities are used to derive the a posteriori probabili-
ties. Conversely, reasoning in the opposite direction would go from effect to cause,
using the a posteriori, or posterior, probability and its conditional probabilities as a
check to confirm the a priori probability values. This effect-to-cause logic, applied to the example in Fig. 8-5, can now be seen in Fig. 8-6, where events B and B′ are now the probable causes, and events A, A1 … Ak are now the probable effects. Generally, when going from effect to cause, the a posteriori becomes the new a priori. Whether one starts with a priori or a posteriori is often a function of the type of data available, but the two are mutually interrelated.
Tree diagrams (Figs. 8-5 and 8-6) are used in conjunction with Bayes’ theorem
for multiple, mutually exclusive events. When events are mutually exclusive, it
means that the events cannot occur at the same time. As shown in Fig. 8-5, events
A, A1 … Ak are the mutually exclusive causes, where A is one event, A1 is the second event, and Ak represents a subsequent event, and events B and B′ are the mutually exclusive effects/outcomes. Conversely, in Fig. 8-6, events B and B′ are the mutually exclusive causes, and events A, A1 … Ak are the mutually exclusive effects. The prime in B′ indicates its distinction and mutual exclusivity from B.
These trees are a useful aid when visualizing the logic behind the processes,
especially when considering more than two causes and effects. Hence, this chapter
builds on earlier chapters that used trees (Ch. 6) and cause-effect diagrams (Ch. 4).
Altogether, these three chapters complement each other.
1) Branches stem from a central node (shown as a dot on the tree diagrams).
2) The total probability of all of the branches emanating from a node must be equal
to 1.0, or 100%.
3) The events assigned to branches stemming from a node must be mutually exclusive.
The general formula for Bayes' theorem, when considering multiple, mutually exclusive events, is an expansion of what was previously derived:

P(Ai|B) = P(Ai) · P(B|Ai) / [P(A1) · P(B|A1) + P(A2) · P(B|A2) + ··· + P(Ak) · P(B|Ak)]

for i = 1, 2, ···, k,

where P(A1) · P(B|A1) is the joint probability of reaching B from A1; P(A2) · P(B|A2) is the joint probability of reaching B from A2; and P(Ak) · P(B|Ak) is the joint probability of reaching B from Ak. The entire denominator is the sum of all the joint probabilities, which is P(B) in the above formula.
Tabulating these joint probabilities further aids in their calculation. The use and application of these tree diagrams and prior and posterior probability calculation tables are shown in Sections 8.6.1 and 8.6.2.
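The general formula translates directly into code. The following Python sketch (the function name and the three-cause numbers are mine, for illustration) computes all k posteriors at once, with the denominator supplied by the law of total probability:

```python
def bayes_posteriors(priors, likelihoods):
    """Posterior P(Ai|B) for mutually exclusive causes A1..Ak.

    priors:      [P(A1), ..., P(Ak)]
    likelihoods: [P(B|A1), ..., P(B|Ak)]
    The denominator is P(B) from the law of total probability.
    """
    joints = [p * l for p, l in zip(priors, likelihoods)]
    P_B = sum(joints)  # sum of all joint probabilities
    return [j / P_B for j in joints]

# Illustrative three-cause example (values assumed, not from the text):
post = bayes_posteriors([0.5, 0.3, 0.2], [0.1, 0.4, 0.5])
assert abs(sum(post) - 1.0) < 1e-12  # posteriors over all causes sum to 1
print([round(p, 3) for p in post])   # [0.185, 0.444, 0.37]
```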
A caveat to consider when calculating the prior and posterior probabilities in the tables is decimal rounding. Because of rounding errors, totals may not come out to the exact calculated values. An example of this situation also is shown in Sections 8.6.1 and 8.6.2. Keep in mind that the total probability of events in a given sample space must equal 1, or 100%.
Now that Bayes’ theorem has been derived, the following sections cover the
application of Bayes’ theorem through a series of simple examples of real-world
situations. These sections also cover the setup and use of the a priori and a posteriori
probability calculation tables.
In the first example, a client firm seeks probabilistic information on a particular
contractor for an upcoming project. The contractor has a great reputation and does
quality work, but has projects that run over budget and are delayed. The client firm
wants detailed information on this contractor to be able to compare with other
contractors. In this scenario, Bayes’ theorem can help in the decision-making
process by enabling the quantification and comparison of performance attributes,
thereby helping to reduce risk.
In the second example, a rifle manufacturer purchases ancillary parts for its rifle. Although firearms are built within tight tolerances, no system is truly perfect; therefore, the parts are susceptible to manufacturing defects. Knowing the chances of defective parts occurring, and being able to quantify which parts carry an increased risk of defects, would be beneficial to the manufacturer. In other words, knowing which parts are likely to be defective can help the manufacturer decide which parts to stock in greater quantity, thereby reducing the risk of costs from production delayed by an insufficient supply of a particular part needed for a complete rifle.
From here, a tree diagram can be created to organize the events given in the
problem (Fig. 8-7). Each branch represents an event, has its own probability, and is
typically read from left to right.
Take branch C in Fig. 8-7 as an example. The a priori probability of branch C is P(C). Branches D and ND stem from branch C. Subsequently, going from branch C to branch D yields a conditional probability, P(D|C); correspondingly, going from branch C to branch ND yields P(ND|C).
2 This can be rephrased: What is the probability that the contractor will complete within budget, given that a delay was encountered?
Fig. 8-7. Initial tree diagram with given events for Example 1
Fig. 8-8 shows the tree diagram with the given probabilities, along with the complementary probability of each corresponding branch. The total probabilities of corresponding branches must equal 1, or 100%. For example, in Fig. 8-8, branch C splits off into branches D and ND. The conditional probability for branch ND is P(ND|C) = 0.90, because the conditional probability given for branch D is P(D|C) = 0.10. It is thus seen that P(D|C) + P(ND|C) = 1 ⇒ P(ND|C) = 1 − P(D|C) = 0.90. The same applies to the branches stemming from branch N.
The individual probabilities of branches D and ND can now be calculated. Fig. 8-9 shows the calculation of these probabilities; the last row of the table is simply the sum of the probabilities for each branch. Keep in mind that the sum of these two probabilities must equal 1. This figure extends Fig. 8-8 to show the probabilities of D and ND. Coming from P(D|C), no ND is possible; coming from P(ND|C), no D is possible, and so forth. This is reflected in Fig. 8-9.
Now, Bayes' theorem can be applied to determine the a posteriori probability that the contractor will complete the project within budget, given that a delay was encountered, or P(C|D).
Fig. 8-9. Example 1 probability calculations for a priori P(D) and P(ND)
P(C|D) = P(D|C) · P(C) / P(D) = 0.095 / 0.13 = 0.73 = 73%
Thus, the probability that All-Win Construction will complete the project within budget, given that a delay is encountered, is 73%. At this point, the client firm might
take into consideration other attributes of All-Win Construction or consider another
contractor because 73% is not a stellar rating.
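These Example 1 figures can be reproduced in a few lines of Python. Note that P(D|N) = 0.70 is not stated directly above but is implied by the joint probability 0.035 = P(D|N) · P(N):

```python
# Example 1 priors and conditionals: P(C) = 0.95, P(N) = 0.05,
# P(D|C) = 0.10; P(D|N) = 0.70 is implied by the joint 0.035 = P(D|N)·P(N).
P_C, P_N = 0.95, 0.05
P_D_given_C, P_D_given_N = 0.10, 0.70

# Law of total probability gives the denominator P(D):
P_D = P_C * P_D_given_C + P_N * P_D_given_N

# Bayes' theorem for the posterior of interest:
P_C_given_D = P_D_given_C * P_C / P_D

print(round(P_D, 2), round(P_C_given_D, 2))  # 0.13 0.73
```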
To check these calculations, let us calculate the posterior probabilities (a posteriori) by building the tree diagram in reverse (or inside out), as shown in Fig. 8-10. The tree is set up similarly to before, except that it uses the newly calculated probabilities for D and ND. The conditional probabilities, which were initially the a priori probabilities, can now be back-calculated with Bayes' theorem using the data from the a priori calculations. If the calculations are all correct, the back-calculated probabilities of C and N should equal the prior probabilities of C and N with which we originally started. The calculations for these conditional posterior probabilities are as follows:
P(C|D) = P(D|C) · P(C) / P(D) = 0.095 / 0.13 = 0.73
P(N|D) = P(D|N) · P(N) / P(D) = 0.035 / 0.13 = 0.27
P(C|ND) = P(ND|C) · P(C) / P(ND) = 0.855 / 0.87 = 0.98
P(N|ND) = P(ND|N) · P(N) / P(ND) = 0.015 / 0.87 = 0.02
Labeling the tree diagram with the appropriate values and probabilities yields the tree shown in Fig. 8-10. Once again, we see that P(C|D) + P(N|D) = 1.0, as does P(C|ND) + P(N|ND).
As in the a priori calculations, the individual probabilities of branches C and N can now be calculated. The branches are extended in Fig. 8-11 to obtain the probabilities of C and N. Observe, again, that P(C) + P(N) = 0.95 + 0.05 = 1. This means that the calculations are accurate. If the values are off by a considerable margin, then it is advisable to recheck the calculations; ignore or adjust minor rounding errors.
Supplier A makes the trigger assemblies, of which, records show, about 1% have been defective. Supplier B makes the rifle butt stocks, of which 3% have been
defective; Supplier C makes the slings, of which 2% have been defective; and
Supplier D makes the forward grips, of which 5% have been defective. The rifle
manufacturer’s current inventory is shown in Table 8-3.
Choosing one part at random, what is the probability that it is defective? Also,
which part would have the greatest likelihood of being defective?
First, the a priori must be determined before the tree diagram can be created.
To do this, start by taking the total number of units for each part and dividing that by
the total number of parts (2,550 units). For example, the manufacturer has 400 units
of Supplier A’s trigger assemblies. Therefore, the probability of a trigger assembly
being chosen at random is 400/2,550 = 15.7%.
The remaining probabilities are calculated similarly and shown in Table 8-4. The
conditional probabilities can be deduced from the problem statement. For example,
let P(F|A) represent the probability that the part is defective, given that it is from
Supplier A.
Again, before a tree diagram is created, letters must first be assigned to represent
the corresponding events:
Using the data that was given and the data that was calculated, the tree diagram
can now be created (Fig. 8-12). Applying the probability calculations used in the
Part (supplier)           Units
Trigger assemblies (A)      400
Rifle butt stocks (B)       500
Slings (C)                1,000
Forward grips (D)           650

Total number of parts = 400 + 500 + 1,000 + 650 = 2,550
P(A|F) = P(F|A) · P(A) / P(F)

Solving for Supplier A:

P(A|F) = P(F|A) · P(A) / P(F) = (0.01)(0.157) / 0.028 = 0.056 = 5.6%

Similarly, for Supplier C:

P(C|F) = P(F|C) · P(C) / P(F) = (0.02)(0.392) / 0.028 = 0.280 = 28.0%
Based on these calculations using Bayes’ theorem, the part that will most likely
be defective when selecting at random is the forward grip made by Supplier D, with a
probability of 45.4%. In words, the probability that defects are in the trigger
assemblies given that defects are found is 5.6%; that defects are in the butt stocks
given that defects are found is 21%; that defects are in the slings given that defects are
found is 28%; and that defects are in the forward grips given that defects are found is
45.4%. Applying the same methods from the previous example, the a posteriori
tree diagram and probability calculations for this example can be seen in Figs. 8-14
and 8-15. Again, ignore or adjust rounding errors at the second and third decimals.
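The full set of posteriors can be reproduced in Python from the inventory counts and defect rates:

```python
# Table 8-3 inventory and the per-supplier defect rates from the text.
inventory = {"A": 400, "B": 500, "C": 1000, "D": 650}
defect_rate = {"A": 0.01, "B": 0.03, "C": 0.02, "D": 0.05}

total = sum(inventory.values())                        # 2,550 parts
prior = {s: n / total for s, n in inventory.items()}   # P(A) ... P(D)

# Law of total probability: P(F) = sum of the joint probabilities.
P_F = sum(prior[s] * defect_rate[s] for s in prior)

# Bayes' theorem for each supplier's posterior P(supplier | defective).
posterior = {s: defect_rate[s] * prior[s] / P_F for s in prior}
worst = max(posterior, key=posterior.get)
print(worst, round(posterior[worst], 2))  # D 0.45
```

Carrying full precision gives P(D|F) ≈ 45.5% rather than the 45.4% obtained from the rounded intermediate values, a small instance of the rounding caveat noted above.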
8.7 Exercises
This section includes three simple sample exercises that should help the reader
better understand Bayes’ theorem through application and practice. The exercises
provide steps for finding the solution to the problem.
Solution:
1) First, assign letters to represent the events:
• S = Short
• T = Tall
• D = Do not live past 90
• L = Live past 90
2) Next, determine the a priori and conditional probabilities:
Total number of men considered (tall and short only, older than 80 years) = 2,415
P(S) = 1,171/2,415 = 0.485
P(T) = 1,244/2,415 = 0.515
P(L|S) = 0.29
P(L|T) = 0.16
The probability that a man who did not live past 90 years of age was short is 44.3%.
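This result can be confirmed in Python from the counts and survival rates given above (D denotes not living past 90):

```python
# Exercise data: 2,415 men; P(L|S) = 0.29, P(L|T) = 0.16,
# where L = live past 90 and D = do not live past 90.
P_S, P_T = 1171 / 2415, 1244 / 2415
P_D_given_S, P_D_given_T = 1 - 0.29, 1 - 0.16

# Law of total probability for the denominator P(D):
P_D = P_S * P_D_given_S + P_T * P_D_given_T

# Bayes' theorem: probability the man was short, given he did not live past 90.
P_S_given_D = P_D_given_S * P_S / P_D

print(round(P_S_given_D, 3))  # 0.443
```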
P(H) = 0.15 (Given)
P(W|H) = 0.49 (Given)
P(N′|H) = 1 − 0.98 = 0.02 (Calculated)
P(W|N) = 0.15 (Given)
The chance that there will be a rainstorm during the six months of the hurricane
season is 28.6%, and the chance that there will be no hurricane, given that it will rain,
is 74.5%.
Anabolic steroids have been the standard drug of choice for enhancing an athlete's performance. Because steroids have been in use for quite some time, the tests for them have become quite accurate in detecting whether an athlete has been using them.
P(U) = 0.20 (Given)
P(N) = 0.80 (Calculated)
P(P|U) = 0.99 (Given)
P(P′|U) = 1 − 0.99 = 0.01 (Calculated)
P(P|N) = 0.05 (Given)
4) Tabulate the joint probabilities. The a priori is calculated by taking the sum of all the joint probabilities in its column. This can be seen in the last row of the table in Fig. 8-21, where the probability of a martial artist testing positive, whether rightly or falsely, is 23.8%.
5) Finally, apply Bayes’ theorem to determine the final answer:
P(N|P) = P(P|N) · P(N) / P(P) = 0.04 / 0.238 = 0.168
The probability that an athlete who tests positive is not actually on PEDs (i.e., a false positive) is 16.8%.
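The false-positive posterior can be confirmed in Python from the given values:

```python
# Exercise data: P(U) = 0.20 (uses PEDs), P(P|U) = 0.99, P(P|N) = 0.05.
P_U, P_N = 0.20, 0.80
P_P_given_U, P_P_given_N = 0.99, 0.05

# Law of total probability: P(P) = 0.198 + 0.04 = 0.238.
P_P = P_U * P_P_given_U + P_N * P_P_given_N

# Bayes' theorem: probability the athlete is clean, given a positive test.
P_N_given_P = P_P_given_N * P_N / P_P

print(round(P_P, 3), round(P_N_given_P, 3))  # 0.238 0.168
```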
8.8 Conclusion
References
Ahmed, A., Kusumo, R., Savci, S., Kayis, B., Zhou, M., and Khoo, Y. B. (2005). “Application of
analytical hierarchy process and Bayesian belief networks for risk analysis.” Complexity Int.,
12(12), 1–3.
Belhouse, D. R. (2001). “The reverend Thomas Bayes FRS: A biography to celebrate the tercentenary of his birth.” <http://www2.isye.gatech.edu/~brani/isyebayes/bank/bayesbiog.pdf> (Dec. 20, 2016).
Das, B. (1999). “Representing uncertainty using Bayesian networks.” Publication DSTO-TR-
0918, Dept. of Defence, Defence Science and Technology Organisation, Salisbury,
Australia.
Freund, J. E., and Williams, F. J. (1977). Elementary business statistics: The modern approach,
3rd Ed., Prentice-Hall, Englewood Cliffs, NJ.
He, Q., et al. (2014). “Shorter men live longer: Association of height with longevity and FOXO3 genotype in American men of Japanese ancestry.” PLoS ONE, 9(5), e94385.
SAS Institute (2010). “Bayesian analysis: Advantages and disadvantages.” <http://support.sas.
com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_introbayes_
sect015.htm> (May 21, 2014).
Vanem, E. (2013). Bayesian hierarchical space-time models with application to significant wave height:
Ocean engineering and oceanography, Vol. 2, Springer, Berlin.