A Primer on Bayesian Statistics in Health Economics and Outcomes Research

Anthony O’Hagan, Ph.D.
Centre for Bayesian Statistics in Health Economics
Sheffield, United Kingdom

Bryan R. Luce, Ph.D.
MEDTAP® International, Inc., Bethesda, MD
Leonard Davis Institute, University of Pennsylvania, United States

With a Preface by Dennis G. Fryback

Bayesian Initiative in Health Economics & Outcomes Research
Centre for Bayesian Statistics in Health Economics
All rights reserved. No part of this book may be reproduced in any form,
or by any electronic or mechanical means, without permission in
writing from the publisher.
It is notable that Pearson, who is later identified mainly with the fre-
quentist school, particularly the Neyman-Pearson lemma, supports the
Bayesian method’s veracity in this paper.
An accessible overview of Bayesian philosophy and methods, often
cited as a classic, is the review by Edwards, Lindman, and Savage (1963).
It is worthwhile to quote their recounting of history:
Bayes’ theorem is a simple and fundamental fact about probability that
seems to have been clear to Thomas Bayes when he wrote his famous
article ... , though he did not state it there explicitly. Bayesian statistics is
so named for the rather inadequate reason that it has many more occa-
sions to apply Bayes’ theorem than classical statistics has. Thus from a
very broad point of view, Bayesian statistics date back to at least 1763.
This passage has two important ideas. The first concerns the definition
of “probability”. The second is that although the ideas behind Bayesian sta-
tistics are in the foundations of statistics as a science, Bayesian statistics
came of age to facilitate decision-making.
Probability is the mathematics used to describe uncertainty. The dom-
inant view of statistics today, termed in this Primer the “frequentist” view,
defines the probability of an event as the limit of the relative frequency
with which it occurs in a series of suitably relevant observations in which it
could occur; notably, this series may be entirely hypothetical. To the fre-
quentist, the locus of the uncertainty is in the events. Strictly speaking, a
frequentist only attempts to quantify “the probability of an event” as a
characteristic of a set of similar events, which are at least in principle
repeatable copies. A Bayesian regards each event as unique, one which
will or will not occur. The Bayesian says the probability of the event is a
number used to indicate the opinion of a relevant observer concerning
whether the event will or will not occur on a particular observation. To the
Bayesian, the locus of the uncertainty described by the probability is in the
observer. So a Bayesian is perfectly willing to talk about the probability of
a unique event. Serious readers can find a full mathematical and philo-
sophical treatment of the various conceptions of probability in Kyburg &
Smokler (1964).
It is unfortunate that these two definitions have come to be character-
ized by labels with surplus meaning. Frequentists talk about their proba-
bilities as being “objective”; Bayesian probabilities are termed “subjective”.
Because of the surplus meaning invested in these labels, they are perceived
to be polar opposites. Subjectivity is thought to be an undesirable property.
Dennis G. Fryback
Professor, Population Health Sciences
University of Wisconsin-Madison
References
Edwards W, Lindman H, Savage LJ. Bayesian statistical inference for psycho-
logical research. Psychological Review, 1963; 70:193-242.
Kyburg HE, Smokler HE [Eds.] Studies in Subjective Probability. New York: John
Wiley & Sons, Inc. 1964.
We shall see how these benefits arise, and their implications for health
economics and outcomes research, in the remainder of this Primer.
However, even a cursory look at the benefits may make the reader won-
der why frequentist methods are still used at all. The answer is that there
are also widely perceived drawbacks to the Bayesian approach:
(D1) Bayesian methods involve an element of subjectivity that is
not overtly present in frequentist methods.
(D2) In practice, the extra information that Bayesian methods uti-
lize is difficult to specify reliably.
(D3) Bayesian methods are more complex than frequentist meth-
ods, and software to implement them is scarce or non-exis-
tent.
SECTION 1 Inference
Example.
Suppose that Mary has tossed a coin and knows the outcome, Heads or Tails,
but has not revealed it to Jamal. What probability should Jamal give to it
being Heads? When asked this question, most people answer one-half. Yet for
Mary, who knows the outcome, the probability is either 0 or 1, while for Jamal
it is one-half: the probability expresses the uncertainty of the observer, not
a physical property of the coin.
Example.
Consider the proposition that treatment 2 will be more cost-effective
than treatment 1 for a health care provider. This proposition concerns
unknown parameters, such as each treatment’s mean cost and mean
efficacy across all patients in the population for which the health care
provider is responsible.
Interpreting a P-value
The null hypothesis that treatment 2 is not more cost-effective than treatment 1
is rejected at the 5% level, i.e. P = 0.05. What does this mean?
1. The probability that the null hypothesis is true is only 0.05, so it is 95%
certain that treatment 2 is more cost-effective than treatment 1.
2. If we were to repeat the analysis many times, using new data each time, and
if the null hypothesis were really true, then on only 5% of those occasions
would we (falsely) reject it.
ing about individual patients.)
The primary reason we cannot interpret a P-value in this way is that it does
not take account of how plausible the null hypothesis was a priori.
Example.
An experiment is conducted to see whether thoughts can be transmit-
ted from one subject to another. Subject A is presented with a shuffled
deck of cards and tries to communicate to Subject B whether each card
is red or black by thought alone. In the experiment, Subject B correct-
ly gives the color of 33 cards. The null hypothesis is that no thought-
transference takes place and Subject B is randomly guessing. The
observation of 33 correct is significant with a (one-sided) P-value of
3.5%. Should we now believe that it is 96.5% certain that Subject A
can transmit her thoughts to Subject B?
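As a rough illustration, the sketch below computes the P-value for this experiment and contrasts it with a Bayesian calculation. The deck size (52 cards), the alternative hypothesis that Subject B gets each card right with probability 0.7, and a prior probability of 1 in 1,000 for thought-transference are all assumptions made purely for illustration, not values given in the example.

    # Illustrative sketch only: the deck size (52), the alternative success
    # probability (0.7) and the prior probability of telepathy (0.001) are
    # assumptions, not values taken from the Primer.
    from scipy.stats import binom

    n, correct = 52, 33

    # One-sided frequentist P-value: P(33 or more correct | pure guessing)
    p_value = binom.sf(correct - 1, n, 0.5)
    print(f"one-sided P-value: {p_value:.3f}")           # about 0.035

    # Bayesian two-hypothesis comparison
    prior_telepathy = 0.001                               # assumed prior probability
    bf = binom.pmf(correct, n, 0.7) / binom.pmf(correct, n, 0.5)
    posterior = (bf * prior_telepathy) / (bf * prior_telepathy + (1 - prior_telepathy))
    print(f"Bayes factor in favour of telepathy: {bf:.1f}")
    print(f"posterior probability of telepathy: {posterior:.4f}")

Even though the data favour the telepathy hypothesis by a modest factor, the low prior plausibility keeps the posterior probability far below the 96.5% suggested by a naive reading of the P-value.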
TABLE 1. Summary of Key Differences Between Frequentist and Bayesian Approaches

Nature of probability
FREQUENTIST: It only applies to events that are (at least in principle) repeatable.
BAYESIAN: It applies to any event or proposition about which we are uncertain.

Nature of parameters
FREQUENTIST: They are therefore not random variables, but fixed (unknown) quantities.
BAYESIAN: They are therefore random variables.

Nature of inference
FREQUENTIST: Does not (although it appears to) make statements about parameters.
BAYESIAN: Makes direct probability statements about parameters.

Example
FREQUENTIST: “We reject this hypothesis at the 5% level of significance.”
BAYESIAN: “The probability that this hypothesis is true is 0.05.”
Figure 1. The prior distribution (grey) and information from the new data (red)
are synthesized to produce the posterior distribution (black dotted).
In this example, the prior information (grey curve) tells us that the parameter is
almost certain to lie between – 4 and + 4, that it is most likely to be between – 2
and + 2, and that our best estimate of it would be 0.
The data (red curve) favor values of the parameter between 0 and 3, and strongly
argue against any value below – 2.
The posterior (black dotted curve) puts these two sources of information together.
So, for values below – 2 the posterior density is tiny because the data are saying
that these values are highly implausible. Values above + 4 are ruled out by the
prior; again, the posterior agrees. The data favor values around 1.5, while the
prior prefers values around 0. The posterior listens to both and the synthesis is a
compromise. After seeing the data, we now think the parameter is most likely to be
around 1.
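The synthesis shown in Figure 1 can be reproduced with a simple calculation. The sketch below assumes both curves are normal and uses values read roughly off the figure (prior centred at 0 with standard deviation 1.4; data summarised by an estimate of 1.5 with standard error 1.0); these numbers are illustrative only.

    # Normal prior combined with a normal likelihood by precision weighting.
    # The prior sd (1.4) and the data estimate/standard error (1.5, 1.0) are
    # rough values read off Figure 1, used purely for illustration.
    prior_mean, prior_sd = 0.0, 1.4
    data_est, data_se = 1.5, 1.0

    prior_prec = 1 / prior_sd**2
    data_prec = 1 / data_se**2

    post_prec = prior_prec + data_prec
    post_mean = (prior_mean * prior_prec + data_est * data_prec) / post_prec
    post_sd = post_prec ** -0.5

    print(f"posterior mean {post_mean:.2f}, posterior sd {post_sd:.2f}")
    # The posterior mean is about 1, a compromise between prior (0) and data
    # (1.5), and the posterior sd is smaller than either the prior sd or the
    # standard error of the data alone.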
The posterior is therefore based on more information, and is more precise, than
either information source separately. This is the key benefit (B2) – “ability
to make use of more information and to obtain stronger results” – that the
Bayesian approach offers.
According to the Bayesian paradigm, any inference we desire is
derived from the posterior distribution. One estimate of a parameter might
be the mode of this distribution (i.e. the point where it reaches its maxi-
mum). Another common choice of estimate is the posterior expectation. If
we have a hypothesis, then the probability that the hypothesis is true is
also derived from the posterior distribution. For instance, in Figure 1 the
probability that the parameter exceeds zero is simply the area under the
posterior (black dotted) curve to the right of zero.
Suppose, for example, that a new trial of a drug records the mean days in
hospital, and that an earlier, larger trial of a similar drug is also available.
Broadly, there are two options:
1. Ignore the earlier trial and analyze only the data from the new trial.
2. Take the view that the two drugs should have essentially identical
hospitalization rates – and so we pool the data from the two trials.
The second option will lead to the new data being swamped by the
much larger earlier trial, which seems unreasonable, but the first option
entails throwing away potentially useful information. In practice, a fre-
quentist would probably take the first option, but with a caveat that the
earlier trial suggests this may underestimate the true rate.
It would usually be more realistic to take the view that the two hospital-
ization rates will be different but similar. The Appendix demonstrates how a
Bayesian analysis can accommodate the earlier trial as prior information
although it necessitates a judgement about similarity of the drugs. How dif-
ferent might we have believed their hospitalization rates to be before con-
ducting the new trial?
The Bayesian analysis produces a definite and quantitative synthesis of
the two sources of information rather than just the vague “an earlier trial on
a similar drug produced a higher mean days in hospital, and so I am skepti-
cal about the reduction seen in this trial”. This synthesis results from making
a clear, reasoned and transparent interpretation of the prior information.
This is part of the key benefit (B5) – “more transparent judgements” – of the
Bayesian approach. Without the Bayesian analysis it would be natural to
moderate the claims of the new trial. The extent of such moderation would
still be judgmental, but the judgement would not be so open and the result
would not be transparently derived from the judgement by Bayes’ theorem.
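The Appendix works through this example; as a rough indication of the shape of such an analysis, the sketch below uses entirely hypothetical numbers. The earlier trial of the similar drug is assumed to give a mean of 6.0 days in hospital with standard error 0.3, the judgement about similarity is expressed by allowing the two drugs’ true means to differ with standard deviation 1.0, and the new trial is assumed to give a mean of 4.5 days with standard error 0.8.

    # Hypothetical numbers throughout; none are taken from the Primer's example.
    earlier_mean, earlier_se = 6.0, 0.3   # earlier trial, similar drug
    similarity_sd = 1.0                   # judged difference between the two drugs
    new_mean, new_se = 4.5, 0.8           # new trial, new drug

    # Prior for the new drug's mean: the earlier estimate with its uncertainty
    # inflated to reflect that the drugs are similar but not identical.
    prior_mean = earlier_mean
    prior_var = earlier_se**2 + similarity_sd**2

    # Combine prior and new data by precision weighting (normal approximation).
    w_prior, w_data = 1 / prior_var, 1 / new_se**2
    post_mean = (prior_mean * w_prior + new_mean * w_data) / (w_prior + w_data)
    post_sd = (w_prior + w_data) ** -0.5

    print(f"posterior mean days in hospital: {post_mean:.2f} (sd {post_sd:.2f})")
    # The apparent reduction seen in the new trial is moderated, quantitatively
    # and transparently, by the prior information from the earlier trial.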
The ‘Evidence’
Prior information should be based on sound evidence and reasoned judgements.
A good way to think of this is to parody a familiar quotation: the prior distribution
should be ‘the evidence, the whole evidence and nothing but the evidence’:
• ‘the evidence’ – genuine information legitimately interpreted;
• ‘the whole evidence’ – not omitting relevant information (preferably a
consensus that pools the knowledge of a range of experts);
• ‘nothing but the evidence’ – not contaminated by bias or prejudice.
Conclusions
In this section we will briefly summarize the main messages being con-
veyed in this Primer.
• Bayesian methods are different from and, we posit, have certain
advantages over conventional frequentist methods, as set out in
benefits (B1) to (B5) of the Overview. These benefits are explored and
illustrated in various ways throughout subsequent sections of the
Primer.
• There are some perceived disadvantages of Bayesian methods, as set
out in the drawbacks (D1) to (D3) in the Overview. These are also
discussed in subsequent sections and we describe how they are
being addressed. It is up to the reader to judge the degree to which
the benefits may outweigh the drawbacks in practice.
• Bayesian technologies have already been developed in many of the
key methodologies of health economics. Already we see clear
advantages in the design and analysis of cost-effectiveness trials,
quantification of uncertainty in economic models, expression of
uncertainty about cost-effectiveness, assessment of the value of
potential new evidence, and synthesis of information going into
and through an economic evaluation.
• There is enormous scope for the development of new and more
sophisticated Bayesian techniques in health economics and out-
comes research. We are confident that Bayesian analysis will
increasingly become the approach of choice for the development
and evaluation of submissions on cost-effectiveness of medical
technologies, as well as for pure cost or utility studies.
Preparatory books.
These books cover the basic ideas of decision theory and personal probability.
Neither book assumes knowledge of mathematics above a very elementary
level.
Lindley, D.V. (1980). Making Decisions, 2nd ed. Wiley, New York.
Lee, P.M. (1997). Bayesian Statistics: An Introduction, 2nd ed., Arnold, London.
Philosophy
Finally, the following presents the case for Bayesian inference from the per-
spective of philosophers of science.
Howson, C. and Urbach, P. (1993). Scientific Reasoning, 2nd edition. Open Court;
Peru, Illinois.
This Appendix offers more detailed discussion of the issues raised in the
first four sections of this Primer.
Inference – Details
The following subsections give details of the arguments presented in the
“Inference” section and of the key differences between Bayesian and frequentist
statistics identified in Table 1.
Examples.
1) Screening. Consider a screening test for a rare disease. The test is very
accurate, with false-positive and false-negative rates of 0.1% (i.e. only
one person in a thousand who does not have the disease will give a
positive result, and only one person in a thousand with the disease will
give a negative result). You take the screen and your result is positive.
What should you think? Since the screen only makes one mistake in
a thousand, doesn’t this mean you are 99.9% certain to have the dis-
ease? In hypothesis testing terms, the positive result would allow you
to reject the null hypothesis that you don’t have the disease at the
0.1% level of significance, a highly significant result agreeing with that
99.9% diagnosis. But the disease is rare, and in practice we know that
most positives reporting for further tests will be false positives. If only
one person in 50,000 has this disease, your probability of having it
after a positive screening test is less than 1 in 50.
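The figure of “less than 1 in 50” follows directly from Bayes’ theorem; the short calculation below reproduces it, taking the false-positive and false-negative rates as 0.1% and the prevalence as 1 in 50,000, exactly as stated in the example.

    # Bayes' theorem for the screening example in the text.
    prevalence = 1 / 50_000
    sensitivity = 0.999      # 0.1% false-negative rate
    specificity = 0.999      # 0.1% false-positive rate

    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_disease_given_positive = sensitivity * prevalence / p_positive

    print(f"P(disease | positive test) = {p_disease_given_positive:.4f}")
    print(f"i.e. about 1 in {1 / p_disease_given_positive:.0f}")   # roughly 1 in 51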
Example.
Consider the rate of side effects from a drug. In a trial with 50 patients,
we observe no side effects. The standard unbiased estimator of the side
effect rate per patient is now zero (0/50). In what sense can we believe
that this is “on average neither too high nor too low”? It obviously can-
not be too high and is almost certain to be too low. It is true that the
estimation rule (which is to take the number of patients with side
effects and divide by 50) will produce estimates that on average are
neither too high nor too low if we keep repeating the rule with new
sets of data. It is also clear, though, that we cannot apply this interpre-
tation to the individual estimate. To do so is like interpreting a P-value
as the probability that the null hypothesis is true; it is simply incorrect.
In any Bayesian analysis, given no side effects among 50 patients, the
expected side effect rate would be positive. Furthermore, the posterior
expectation has the desired interpretation that this estimate is
expected to be neither too high nor too low.
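As a concrete, purely illustrative version of this argument, suppose the prior for the side-effect rate is uniform on (0, 1), i.e. a Beta(1, 1) distribution; this is one possible choice, not the one any particular analyst would make. Observing 0 side effects in 50 patients then gives a Beta(1, 51) posterior.

    # Beta-binomial update: uniform Beta(1, 1) prior, 0 events in 50 patients.
    # The uniform prior is an illustrative assumption, not a recommendation.
    from scipy.stats import beta

    a0, b0 = 1, 1                              # prior
    events, n = 0, 50                          # observed data
    a1, b1 = a0 + events, b0 + (n - events)    # posterior: Beta(1, 51)

    print(f"posterior mean side-effect rate: {beta.mean(a1, b1):.4f}")  # about 0.019
    # The posterior expectation is positive, unlike the unbiased estimate of 0/50.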
The Bayesian Method – Details
The following subsections give more details of the Bayesian method.
Bayes’ theorem
The simplest way to express Bayes’ theorem without using mathematical
notation is this: what we believe about a parameter after seeing the data (the
posterior) comes from combining what we believed before seeing them (the
prior) with what the data themselves have to say (the likelihood).
The paradigm is about learning, and we can always learn more. When we
acquire more data, Bayes’ theorem tells us how to update our knowledge to
synthesize the new data. The old posterior contains all that we know before see-
ing the new data, and so becomes the new prior distribution. Bayes’ theorem
synthesizes this with the new data to give the new posterior. And on it
goes…Bayesian methods are ideal for sequential trials!
Bayes’ theorem also makes it clearer why the common misinterpretation of
frequentist inferences is wrong. The likelihood expresses the probability of
obtaining the actual data, given any particular value of the parameter. In
simple terms, it says how probable the data are for each possible parameter
value; it does not, by itself, say how probable any parameter value or
hypothesis is. Only when the likelihood is combined with a prior distribution,
through Bayes’ theorem, do we obtain such probabilities.
Bayesian inference
In the Bayesian approach, all inferences are derived from the posterior dis-
tribution. When a Bayesian analysis reports a probability interval (a credible
interval) for a parameter, this is a posterior interval, derived from the para-
meter’s posterior distribution, based not only on the data but also on whatever
other information or knowledge the investigator possesses. The probability that
a hypothesis is true is a posterior probability and a typical example of an esti-
mate of a parameter would be the posterior mean (the ‘expected’ value).
These are the Bayesian analogues of the three kinds of inferences that are
available in the frequentist framework. However, Bayesian inference is much
more flexible than this.
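To make these three kinds of inference concrete, the sketch below derives each of them from a single posterior distribution, reusing the Beta(1, 51) posterior of the earlier side-effect example; the 5% threshold in the hypothesis is chosen purely for illustration.

    # All inferences are read off the posterior distribution; here Beta(1, 51)
    # from the earlier side-effect sketch. The 5% threshold is illustrative.
    from scipy.stats import beta

    posterior = beta(1, 51)

    point_estimate = posterior.mean()                 # posterior expectation
    credible_interval = posterior.ppf([0.025, 0.975]) # 95% credible interval
    p_rate_above_5pct = posterior.sf(0.05)            # probability of a hypothesis

    print(f"posterior mean: {point_estimate:.4f}")
    print(f"95% credible interval: ({credible_interval[0]:.4f}, {credible_interval[1]:.4f})")
    print(f"P(side-effect rate > 5%): {p_rate_above_5pct:.3f}")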
Prior Information – Details
The following subsections give more details of the discussion presented in
the Prior Information section of the main text.
Subjectivity
The fact that Bayesian methods are based on a subjective interpretation of
probability was introduced in the subsection The Nature of Probability in this
Appendix. We explained there that this formulation is necessary if we are to
give probabilities to parameters and hypotheses since the frequentist interpre-
tation of probability is too narrow. Yet this leaves Bayesian methods open to
the charge of subjectivity, which is thought by many to be unacceptable and
unscientific.
Yet science cannot be truly objective. Schools of thought and contention
abound in most disciplines. Science attempts to minimize subjectivity through
the use of objective data and reason, but where the data are not conclusive we
have to use our judgement and expertise.
Bayesian methods naturally accommodate this approach. Figure 1 demon-
strates how Bayes’ theorem naturally weights each information source accord-
ing to its strength. In that example, the data were only slightly more informa-
tive than the prior, and so the posterior is quite strongly influenced by the prior
as well as by the likelihood.
We also saw in Figure A2 that if the prior information is weakened, then
Bayes’ theorem effectively places all of the weight on the likelihood. Often we
are in the more fortunate position of having strong data. Then the situation will
be more like the triplot in Figure A3. As new data accumulate the prior distri-
bution again becomes less influential.
Figure A3. Triplot with stronger data: (a) on the same scale as Figure A2;
(b) with the scale changed to show the difference between the likelihood and
posterior.
Whose prior?
Suppose that a sponsor of some medical technology (e.g. a pharmaceutical
company or device maker) wishes to present a Bayesian analysis in support of
a case for cost-effectiveness of that technology. What would be acceptable in the
form of prior information? One way to approach this question is to ask whose
prior should be used.
As shown above, it may not matter. Consensus can be reached if the data
are strong enough to overrule the differences in the prior opinions and knowl-
edge of all interested parties. However, the argument does not work if one per-
son has sufficiently strong prior information or opinions. It only takes one
extremely opinionated person to prevent agreement. We are familiar with the
person whose views on some matters are so prejudiced that they will not listen
to any facts or arguments to the contrary – Bayes’ theorem explains these peo-
ple, too!
This clarifies some aspects of subjectivity. While we should accept that dif-
ferent people might legitimately have different background knowledge or might
legitimately interpret the same information differently, there is no place for
prejudice or perverse misinterpretation of the available facts in health econom-
ics (or anywhere else). An important aspect of Bayesian analysis is that the prior
distribution is set out openly. If it is not based on reasonable use of information
and experience, the resulting analysis will not convince anyone. This is the key
benefit (B5) – “more open judgements” – of Bayesian analysis. All cards should
be on the table and nothing hidden.
Returning to the question of whose prior a sponsor might use, it is likely
that the sponsor’s own prior distribution would be unacceptable. In principle,
their opinions might be defensible on the basis of the company’s own substan-
tial experiences in development and testing of the product, but there is the risk
of selective use of information unless full disclosure can be enforced. To quote
the box “The Evidence” in the main text, the sponsor would need to be able to
show that its prior not only represented the evidence but also the whole evi-
dence.
The most natural choice of prior distribution might be the considered and
defensible prior of an expert in the field. The agency to which the cost-effec-
Examples
These ideas are illustrated in the following two examples, which are dis-
cussed briefly in the main text.
Subset analysis is a notoriously tricky question. The risk of dredging the
data to find subgroups of patients that respond differently is real. For example,
suppose that a cost study found the mean costs shown in Table 2, split by
treatment and by the initial letter of the patient’s surname.
It looks like treatment 2 is cheaper for patients whose names begin with the
letters A to D. It is highly unlikely that there is any plausible reason for such a
subgroup effect. To avoid the risk of declaring spurious subgroup effects, standard
clinical trials guidance requires that the analysis of possible subgroups must be
specified before a trial begins, and there must be a plausible mechanism for the
proposed subgroup effects.
From a Bayesian perspective, the absence of a plausible mechanism simply
constitutes prior information – the subgroup effect would have a very small
prior probability. Combining the prior information with the data would result
in a small posterior probability, regardless of how convincing the data appeared
to be. The prior information is strong enough to override the data. The standard
guidance, therefore, applies equally to Bayesian analyses; subgroup analyses
must be prespecified, so that prior information about their plausibility can be
quantified.
Bayesian methods will then automatically moderate the data and prevent
us from claiming implausible effects that arise by chance in the data. The prior
information is clearly important. To an extent, the existing procedures for fre-
quentist subgroup analysis incorporate the prior information (and so, we would
argue, are unconsciously Bayesian). However, the frequentist analysis simply
splits subgroup hypotheses into those that are plausible a priori and those that
are not, whereas the Bayesian assigns a prior probability that can take any value
from 0 to 1, and thereby allows a far more subtle gradation.
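A toy calculation, with invented numbers, shows how this works. Suppose the prior probability of a genuine subgroup effect is 0.01, and the data are 20 times more probable if the effect is real than if it is not (a Bayes factor of 20, which would usually look quite convincing); both numbers are assumptions made for illustration only.

    # Two-hypothesis Bayes update with invented numbers.
    prior_prob = 0.01      # assumed prior probability of a genuine subgroup effect
    bayes_factor = 20      # assumed strength of evidence from the data

    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    posterior_prob = posterior_odds / (1 + posterior_odds)

    print(f"posterior probability of a real subgroup effect: {posterior_prob:.2f}")
    # About 0.17: the effect remains improbable, despite data that would look
    # 'significant' in a conventional analysis.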
quentist analysis, knowing the result of the other trial.
In this and previous examples, we have stressed the power of the Bayesian
approach to temper overly optimistic interpretations of P-values, but it is impor-
tant to recognize that the reverse situation is equally common and important.
Pharmaceutical company executives and biostatisticians will be very familiar
with occasions where a Phase III trial of a drug has just failed to produce a sig-
nificant effect, yet there is plenty of evidence (from related drugs, from a Phase
II trial that was restricted to acute cases, etc.) to suggest that the drug really is
effective. A properly conducted Bayesian analysis would allow the responsible
incorporation of this additional evidence to demonstrate the drug’s true efficacy.
Both situations are of enormous importance to the developers and users of
health care technologies – the first in avoiding costly mistakes due to being over-
ly optimistic and the second in allowing beneficial products to be brought to mar-
ket that otherwise would have to be abandoned or delayed for more testing.
Elicitation
We will consider the process of eliciting a prior distribution from an expert
without reference to the actual nature of the underlying prior information. In
practice, of course, the expert will base her analysis on that information, but in
this subsection we will not try to deal with the specifics of the underlying
information.
Suppose that we decide to formulate a prior distribution for a particular
parameter (such as the mean utility gain arising from some treatment), repre-
senting the knowledge of a single expert about that parameter. The first diffi-
culty we will face is that the expert will almost certainly not be an expert in
probability or statistics. That means it will not be easy for this person to express
her beliefs in the kind of probabilistic form demanded by Bayes’ theorem.
Our expert might be willing to give us an estimate of the parameter, but
gists; some useful reviews can be found in Lichtenstein et al (1980), Meyer and
Booker (1981), Cooke (1991), Kadane and Wolfson (1998). Although the psy-
chologists have tended to emphasize the tasks that people conceptualize poor-
ly, the practical significance of this work is that we know a great deal about how
to avoid the problems. This and ongoing research seek to identify the kinds of
questions that are most likely to yield good answers, avoiding pitfalls that have
already been identified by psychologists and statisticians, and the kind of feed-
back mechanisms that help to ensure good communication between statistician
and expert.
The second answer is that, fortunately, the imprecision of distributions
elicited from experts may not matter much. As discussed earlier, we can think
of a range of prior distributions that are consistent with the statements we have
elicited from the expert, and if the data are sufficiently strong then all these dif-
ferent specifications of the prior distribution will lead to essentially the same
posterior distribution. The box “Example of elicitation” explores these ideas.
Example of Elicitation
An expert estimates a relative risk (RR) parameter to be about 50%, but has
considerable uncertainty about its true value. She says that it is unlikely to
be less than 0.2 or greater than 0.6.
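One way (among several) of turning statements like these into a full prior distribution is to choose a standard family and match it to the elicited quantities. The sketch below treats “unlikely to be less than 0.2 or greater than 0.6” as a central 95% range, fits a Beta distribution to it numerically, and reports the fitted distribution’s summaries so that they can be shown back to the expert for revision; both the 95% reading and the choice of a Beta family are assumptions made for illustration.

    # Fit a Beta distribution whose 2.5% and 97.5% quantiles are about 0.2 and 0.6.
    # Reading "unlikely" as a 95% range, and using a Beta family, are assumptions.
    from scipy import optimize, stats

    def quantile_mismatch(params):
        a, b = params
        if a <= 0 or b <= 0:
            return 1e6
        lo, hi = stats.beta.ppf([0.025, 0.975], a, b)
        return (lo - 0.2) ** 2 + (hi - 0.6) ** 2

    result = optimize.minimize(quantile_mismatch, x0=[4.0, 6.0], method="Nelder-Mead")
    a, b = result.x

    print(f"fitted prior: Beta({a:.1f}, {b:.1f})")
    print(f"prior mean: {stats.beta.mean(a, b):.2f}")
    print("95% prior range:", stats.beta.ppf([0.025, 0.975], a, b).round(2))
    # The fitted summaries serve as feedback: the expert can compare them with
    # her original statements and adjust them before the prior is finalized.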
Conjugate priors
As explained in the preceding discussion, the usual approach to specifying
a prior distribution for some parameter consists of first specifying (or eliciting)
a few features of the distribution, such as a prior expectation and some measure
of uncertainty, and then choosing a convenient standard family of distributions
that matches those features. A conjugate family is one chosen so that the
resulting posterior distribution belongs to the same family as the prior, which
makes the Bayesian updating particularly simple.
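A standard instance of conjugacy, sketched below with invented numbers, is a gamma prior for a Poisson mean such as mean days in hospital: a Gamma(30, rate 5) prior (prior mean 6 days, worth roughly 5 patients of information) combined with 20 new patients totalling 90 days gives a Gamma(120, rate 25) posterior in the same family.

    # Conjugate gamma-Poisson update with invented numbers.
    # (scipy parametrizes the gamma distribution by shape and scale = 1/rate.)
    from scipy.stats import gamma

    shape0, rate0 = 30.0, 5.0        # assumed prior: mean 6 days in hospital
    n_patients, total_days = 20, 90  # assumed new data

    shape1, rate1 = shape0 + total_days, rate0 + n_patients   # posterior

    prior_mean = gamma.mean(shape0, scale=1 / rate0)
    post_mean = gamma.mean(shape1, scale=1 / rate1)
    print(f"prior mean {prior_mean:.2f} days, posterior mean {post_mean:.2f} days")
    # The posterior is again a gamma distribution: that is what 'conjugate' means.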
Structural priors
Structural prior distributions express information about relationships
between parameters, usually without saying anything about the specific values
of individual parameters. For instance, we might specify a prior distribution that
represents effective ignorance about the mean cost under two different treat-
ments, but says that we expect the ratio of these means to be in the range 0.2
to 5. The mean cost under any given treatment could be anything at all, but
whatever value it actually takes we expect the mean cost under the other treat-
ment will be within a factor of 5 of this.
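A sketch of such a structural prior, with assumed numbers: each mean cost is given a very vague (lognormal) prior, while the log of their ratio is given a normal prior whose 95% range corresponds to ratios between 0.2 and 5. Both the vague lognormal for treatment 1 and the 95% reading of the range are illustrative assumptions.

    # Structural prior: vague about each mean cost, informative about their ratio.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    log_mean_cost_1 = rng.normal(np.log(1000), 2.0, n)       # very vague
    log_ratio = rng.normal(0.0, np.log(5) / 1.96, n)          # 95% of ratios in (0.2, 5)

    mean_cost_1 = np.exp(log_mean_cost_1)
    mean_cost_2 = np.exp(log_mean_cost_1 + log_ratio)

    print("95% range of mean cost 1:",
          np.quantile(mean_cost_1, [0.025, 0.975]).round(0))
    print("95% range of the ratio:",
          np.quantile(mean_cost_2 / mean_cost_1, [0.025, 0.975]).round(2))
    # Each mean cost on its own is almost unconstrained, but the ratio of the
    # two means is confined to roughly the interval (0.2, 5).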
A simple example is the prior distribution in the example of hospitaliza-
tion. This can be dissected into two parts. We have substantial prior informa-
tion about mean days in hospital under the related drug, and we have struc-
tural prior information about how it might differ from the corresponding mean
hospitalization under the new drug. Indeed, if the raw data of the earlier trial
are available we might formally analyze them as data alongside the results of the
new trial. In that case, the prior information is purely structural. The frame-
work now looks a little like a meta-analysis, and indeed Bayesian meta-analy-
sis is based strongly on structural prior information.
In a Bayesian meta-analysis, we have several datasets, each of which
addresses the efficacy of a treatment in slightly different conditions. So we
have a separate parameter for mean efficacy in each trial, but we formulate a
structural prior representing the prior expectation that these parameters
should not be too different. This is usually done in a hierarchical model,
where a common ‘underlying’ mean efficacy parameter is postulated, and each
of the trial mean efficacies is considered to be independently distributed
around this common parameter.
The hierarchical structure, introducing one or more common parameters,
is often used to link several related parameters and to express a belief that they
should be similar, through the requirement that they should all be similar to the common
parameter. Another example is to formulate structural prior information that
cost data arising in different arms of a trial should not be markedly different in
their degrees of skewness. This has the benefit of moderating the influence of
a very small number of patients with unusually high costs.
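The sketch below simulates from a hierarchical prior of the kind described above, with invented values: an underlying mean efficacy is drawn first, and each trial’s own mean efficacy is then drawn around it, so the trial-level parameters come out similar but not identical. The scale of the efficacy parameter and both standard deviations are assumptions.

    # Simulating from a hierarchical ('similar but not identical') prior.
    # The efficacy scale and both standard deviations are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    n_trials = 5

    underlying_mean = rng.normal(0.5, 0.2)                      # common parameter
    trial_means = rng.normal(underlying_mean, 0.05, n_trials)   # per-trial parameters

    print(f"underlying mean efficacy: {underlying_mean:.3f}")
    print("trial-level mean efficacies:", trial_means.round(3))
    # The trial-level parameters cluster around the common parameter, which is
    # how the prior expresses the belief that they should not be too different.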
Computation – Details
The following subsections give details of Bayesian computation and its
ability to address very complex models.
MCMC
The technique that has revolutionized Bayesian computation is Markov
chain Monte Carlo (MCMC). The idea of MCMC has been outlined in the main
text: Bayesian inference is solved by randomly drawing a very large sample
from the posterior distribution. Any inference we want to obtain from the pos-
terior distribution we can calculate from the sample.
At this stage, it may be helpful to emphasize that we are not talking about
the sample data. (Usually, we have little control over how many data we can
get, and we don’t expect to have such an enormous sample that we can calcu-
late anything we like from the sample in this simple way.) Instead, we are talk-
ing about artificially generating a sample of parameter values, by some random
simulation method that is somehow constructed to deliver a sample from the
posterior distribution of those parameters.
Strictly speaking, the idea of drawing a large sample from the posterior dis-
tribution is called Monte Carlo. Monte Carlo simulation is used very widely in
science as a powerful indirect way of calculating things that direct mathemati-
cal analysis cannot solve. What makes MCMC different is the way the sample
is drawn. Simple Monte Carlo can be visualized as playing darts – ‘throwing‘
points randomly into the space of all possible values of the parameters, with
each point independent of the others. This approach is impractical for Bayesian
analysis because in a model with many parameters it is extremely difficult to
construct an efficient algorithm for randomly ‘throwing‘ the points according to
the desired posterior distribution. MCMC operates by having a point wander-
ing around the space of possible parameter values. See Figure A4.
Figure A4. Successive positions (numbered) of a single point wandering around
the space of possible parameter values in an MCMC simulation.
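As an illustration of the idea, the sketch below runs a random-walk Metropolis sampler, one of the simplest MCMC algorithms, on the one-parameter posterior used earlier for Figure 1, i.e. a normal prior centred at 0 with standard deviation 1.4 combined with a normal likelihood for an estimate of 1.5 with standard error 1.0; the proposal standard deviation, number of iterations and burn-in are arbitrary illustrative choices.

    # Random-walk Metropolis sampling from a simple one-parameter posterior.
    # Prior N(0, 1.4^2) and data summary (1.5 with se 1.0) echo Figure 1; the
    # proposal sd, iteration count and burn-in are arbitrary illustrative choices.
    import numpy as np

    def log_posterior(theta):
        log_prior = -0.5 * (theta / 1.4) ** 2
        log_likelihood = -0.5 * ((1.5 - theta) / 1.0) ** 2
        return log_prior + log_likelihood

    rng = np.random.default_rng(2)
    n_iter, proposal_sd = 20_000, 1.0
    theta = 0.0
    samples = np.empty(n_iter)

    for i in range(n_iter):
        proposal = theta + rng.normal(0.0, proposal_sd)
        # Accept the move with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
            theta = proposal
        samples[i] = theta

    kept = samples[2_000:]                     # discard burn-in
    print(f"posterior mean {kept.mean():.2f}, posterior sd {kept.std():.2f}")
    print(f"P(parameter > 0) = {(kept > 0).mean():.3f}")
    # These sample-based answers agree closely with the exact normal-normal
    # calculation, and the same recipe extends to models with many parameters.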
Tackling hard problems
There is a frequentist parallel to the computational problems of Bayesian
methods, which is that it is extremely difficult to obtain exact frequentist tests,
confidence intervals and unbiased estimators except in really simple models. It
is often overlooked that the great majority of frequentist techniques in general
use are, in fact, only approximate. This includes all methods based on general-
ized linear models, generalized likelihood ratio tests, bootstrapping and many
more. The honorable exception is inference in the standard normal linear
model. Even here, it is not straightforward to compare non-nested models using
frequentist methods.
As described in the main text, the availability of computational techniques
like MCMC makes exact Bayesian inferences possible even in very complex
models. As statisticians strive to address larger, more complex data structures
(micro-array data, data mining, etc), the benefit (B3) – “ability to tackle more
complex problems” – of Bayesian methods becomes increasingly important.