The Future of Statistics: A Bayesian 21st Century
Author(s): D. V. Lindley
Source: Advances in Applied Probability, Sep. 1975, Vol. 7, Supplement: Proceedings of the Conference on Directions for Mathematical Statistics, pp. 106-115
Published by: Applied Probability Trust
The thesis behind this talk is very simple: the only good statistics is Bayesian statistics. Bayesian statistics is not just another technique to be added to the repertoire alongside, for example, multivariate analysis; it is the only method that can produce sound inferences and decisions in multivariate, or any other branch of, statistics. It is not just another chapter to add to that elementary text you are writing; it is that text. It follows that the unique direction for mathematical statistics must be along the Bayesian road.

The talk is divided into three sections. In the first I shall state the Bayesian position and explain how it differs from that which is currently practised. I had hoped that it would not be necessary to include this section, but experience has persuaded me that I should. In the short time available, a complete statement is not possible; but the literature contains many better and fuller statements than can be given here.* In the second section the central thesis will be justified, and in the third I shall undertake what I see as the real point of the lecture: a study of future directions for statistics. It had originally been my intention to follow Orwell and use 1984 in the title, but de Finetti (1974) suggested otherwise; hence the longer time span.
1. Bayesian Statistics
* Two references are Lindley (1971) and De Groot (1970). The best is de Finetti's
work, of which Volume I has just appeared (1974) in an English translation.
accurately, p(r | θ), since it depends on θ. The Bayesian position is that θ is also a random variable with its density, p(θ) say. After we have conducted the n trials and seen the event occur r times, r ceases to be a random variable (the notation often reflects this in a change from R to r). However, θ, not being observed, retains its random-variable status, its density changing from p(θ) to p(θ | r) in consequence of the observation. The change here is governed by Bayes' theorem: p(θ | r) ∝ θ^r (1 − θ)^(n−r) p(θ). Bayesian analysis is concerned* with the distributions of θ and how they are changed by observations: sampling-theory statistics is concerned with the only distribution it has, p(r | θ), a distribution which, the Bayesian claims, is irrelevant after R has been observed to be r.
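To make the updating concrete, here is a minimal sketch in Python. It assumes a Beta(a, b) prior density for θ (a conjugate choice, made here only for illustration; the quantities a and b appear later in the talk), and the numerical data are invented.

```python
# Sketch of the updating in Example 1, assuming a Beta(a, b) prior p(theta).
# Bayes' theorem, p(theta | r) proportional to theta^r (1-theta)^(n-r) p(theta),
# then gives a Beta(a + r, b + n - r) posterior.

from scipy import stats

def posterior(r: int, n: int, a: float = 1.0, b: float = 1.0):
    """Posterior distribution of theta after r successes in n trials."""
    return stats.beta(a + r, b + n - r)

# Uniform prior Beta(1, 1); suppose we then observe r = 7 successes in n = 10 trials.
post = posterior(r=7, n=10)
print(post.mean())            # posterior mean (7 + 1) / (10 + 2) = 0.667
print(post.interval(0.95))    # central 95% credible interval for theta
```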
Example 2. There are many problems which are concerned with the means
of several normal distributions: for example, the common two-way classifica-
tion (rows and columns, say) using the analysis of variance and the concepts of
main effects and interactions. In the Bayesian position the cell means are
themselves random variables whose distributions, as in Example 1, are affected
by observations. The distinction between Model I and Model II analyses
therefore disappears, though the parameters in the latter model, for example,
the variance component for rows, are random variables in the Bayesian
treatment, though not in the orthodox one. We return to these two examples
later in the talk.
Although all unobserved quantities are, in the Bayesian view, random, the
concept of probability thereby implied is not based on frequency considera-
tions. Probability is a relationship between 'you' and the external world,
expressing your views about that external world. In particular, the Bernoulli
'probability', θ in Example 1, is not a probability in this sense, because it
describes a property of the external world. We refer to it as the propensity of
the event to occur. The important point here is not the names as such, but the
appreciation of the difference between, on the one hand, a relationship between
you and the sequence, and, on the other, a property of the sequence. The
function of names is to distinguish things: the same name is given to things
which are alike; different names to things which are dissimilar. A rose by any
name would smell as sweet but it would be confusing if the alternative name
was daffodil.
Other concepts enter into the Bayesian approach, in particular that of utility
and the combination of it with probability in the notion of expected utility. The
* Though not exclusively. Sometimes it is useful to talk about the unconditional distribution of
r, p(r), not p(r | θ), as when we contemplate the possible results of the n trials. Such distributions
are not available in sampling-theory statistics.
2. Justification
The first complete justification for this viewpoint known to me was given by Ramsey (1964) in 1926. His work lay unappreciated for almost thirty years and modern work begins with Savage's (1954) important book. The best up-to-date treatment in a textbook is probably De Groot's (1970). An alternative approach is due to de Finetti (1964) in 1937. Ramsey's argument is essentially along the following lines. In considering the way in which people would themselves wish to act in the face of uncertainty, the statistician is led to state certain axioms that they would not wish to violate. An example of these is the one Savage so charmingly called the 'sure-thing' principle. It says that if A is preferred to B when C obtains, and also when C does not obtain, then A is preferred to B even when one is uncertain about C. From these axioms it is possible to develop the mathematical system that we call Bayesian statistics. In particular, it is possible to prove that uncertain quantities have a probability structure; the property that we took as basic to the system. I know of no objection to these axioms that has persisted, and it is a pity that many critics of the approach do not pay more attention to them instead of misrepresenting the position and so making it look ridiculous.
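As a small numerical illustration (the utilities below are invented, not Savage's), one can check that expected-utility maximisation obeys the sure-thing principle: if A beats B both when C obtains and when it does not, no probability assigned to C can reverse the preference.

```python
# Illustration, with invented utilities, that expected utility satisfies
# the sure-thing principle: A preferred to B in both states of C implies
# A preferred to B whatever probability is assigned to C.

# Hypothetical utilities of each act in each state.
u = {("A", "C"): 5.0, ("A", "not C"): 2.0,
     ("B", "C"): 4.0, ("B", "not C"): 1.0}   # A beats B in both states

def expected_utility(act: str, p_c: float) -> float:
    """Expected utility of an act when C has probability p_c."""
    return p_c * u[(act, "C")] + (1.0 - p_c) * u[(act, "not C")]

# Whatever the uncertainty about C, A keeps its preference over B.
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert expected_utility("A", p) > expected_utility("B", p)
print("A preferred to B for every probability of C")
```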
We should, at this point, take note of a great advantage the Bayesian position
has over all other approaches to statistics: namely, in the way just described, it
is a formal system with axioms and theorems. We all know and appreciate the
great impetus given to probability theory by Kolmogorov's (1950) 1933
axiomatisation of that field. A more striking example is provided by Newton's
statement of the laws of mechanics. Only when a system has a formal structure
can we be quite sure what it is we are talking about, and can we teach it to all
intelligent enough and willing to listen. Fisherians have condemned Bayesian
statistics as a 'monolithic structure'. Would they term Newtonian mechanics
monolithic? Critics often refer to Bayes as a Messiah; would they grant the
same status to Newton? I find this messianic attitude particularly curious when
uttered by Fisherians who appear to regard the collected works, Fisher (1950),
and his last book, Fisher (1956), as the old and new testaments respectively.
An important theorem within the formal system is that which says that
inferences should follow the likelihood principle. Now it so happens that
almost all statistical techniques violate this principle and therefore do not fit
into the system. As a result all these techniques must be capable of producing
nonsense. And this indeed is so. In Lindley (1971) I have given a list of
counter-examples to demonstrate how ridiculous every statistical technique
can be. Thus in Example 1 above suppose it is required to test the hypothesis θ = 1/2 by a standard significance test. Then a vast range of significance levels can be produced by varying the sample space, or equivalently changing the stopping rule. Careful reflection shows that this is not exactly sensible. Or consider Kendall and Stuart's (1970) optimum estimate of θ² in Example 1, namely r(r − 1)/n(n − 1), when r = 1: to estimate a chance as zero when the event has occurred is incredible.
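Both points are easy to exhibit numerically. The sketch below uses the standard illustration of 9 successes in 12 trials (the particular figures are mine, not the paper's): the likelihood is proportional to θ^9 (1 − θ)^3 whether n was fixed in advance or sampling continued until the third failure, yet the two stopping rules yield different significance levels.

```python
# The stopping-rule difficulty of Example 1, with illustrative numbers.
# Data: 9 successes in 12 trials; test H0: theta = 1/2 against theta > 1/2.

from scipy import stats

r, n = 9, 12

# (a) Binomial sampling: n = 12 was fixed in advance.
p_binomial = stats.binom(n, 0.5).sf(r - 1)           # P(X >= 9) ~ 0.073

# (b) Negative binomial sampling: trials continued until the 3rd failure,
#     so the number of successes before stopping is negative binomial.
p_neg_binomial = stats.nbinom(n - r, 0.5).sf(r - 1)  # P(Y >= 9) ~ 0.033

print(p_binomial, p_neg_binomial)  # same likelihood, different 'significance'

# Kendall and Stuart's unbiased estimate of theta^2, r(r-1)/n(n-1),
# is zero when r = 1: a chance of zero for an event that has occurred.
r1 = 1
print(r1 * (r1 - 1) / (n * (n - 1)))  # 0.0
```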
The above justification for Bayesian statistics is at a theoretical level, though its practical implications are immense. But an important alternative justification rests on the pragmatic fact that it works. Bayesian statistics satisfies the two basic requirements of science in resting on sound principles and working in practice. Let me demonstrate this using the two examples above.

Example 1. Consider n1 trials with r1 successes observed, and contemplate n2 further trials and ask what are the chances of r2 additional successes. First let us note that this is a practical problem. The physician who treated n1 patients with a drug and had r1 respond successfully, could legitimately ask what might happen if the treatment were used on n2 further patients. Indeed Pearson (1920) went so far as to describe it as one of the fundamental problems of practical statistics. Although it rarely occurs in quite the simple form here presented, a solution to it is essential before more complicated and realistic problems are discussed. But then notice that sampling-theory statistics has no simple way of answering the question. For within that subject it is not possible to talk of p(r2 | n2; r1, n1): only probabilities conditional on θ are admitted. The difficulty is circumvented by either making statements about θ - to which the doctor's response is that he is treating this patient, not a long-run frequency of patients - or, rarely, by resorting to the complexities of tolerance intervals. So immediately we see that Bayesian statistics has one practical advantage over the standard approach. But let us go further and consider the Bayesian answer. For simplicity take the case n2 = r2 = 1: the chance of success on one further trial. Under certain assumptions* the probability is (r + a)/(n + a + b) - omitting the suffixes - where a and b refer to the initial (prior) views of the sequence.
Compare this with r/n, the usual point estimate of θ. The most obvious difference between the two is the occurrence of a and b in the former but not the latter. But doesn't this make good, practical sense? The usual estimate says that it does not matter whether it is a sequence of patients, transistors, drawing-pins or coins: the estimate is always the same. The Bayesian argument says it is necessary to think about whether it is patients, transistors, drawing-pins or coins that are being discussed, for this could affect the choice of a and b. For example, with drawing-pins I would take a = b = 2, but with coins a = b = 100, say. The resulting Bayesian answers for modest values of n are very different: isn't that right? Wouldn't your reaction to drawing-pins (about whose tossing propensities you probably know very little) be different from your reaction to coins (which are well known to have propensities near 1/2)?
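The calculation is trivial to carry out; the sketch below uses the paper's two choices of a and b (the counts r and n are invented for illustration).

```python
# Predictive probability from Example 1: under exchangeability with a
# Beta(a, b) prior, the chance of success on one further trial after
# r successes in n trials is (r + a) / (n + a + b).

def predictive(r: int, n: int, a: float, b: float) -> float:
    """Probability of success on the next trial, given r successes in n."""
    return (r + a) / (n + a + b)

r, n = 7, 10  # say, 7 successes in 10 trials

# Drawing-pins: little prior knowledge of the tossing propensity.
print(predictive(r, n, a=2, b=2))      # 9/14  ~ 0.64, close to r/n
# Coins: strong prior opinion that the propensity is near 1/2.
print(predictive(r, n, a=100, b=100))  # 107/210 ~ 0.51, pulled toward 1/2
```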
Example 2. The techniques available for studying the two-way layout are
extensive and one faces an embarrassment of choices which the textbooks do
not resolve. One can perform an analysis of variance with its associated
significance tests. But if, for example, the main effect of rows and the
interaction are significant at 1 percent, but not the column effect, how is one
supposed to estimate a cell mean? What multiple comparisons are to be
applied? The Bayesian approach is quite clear: first you have to think about those rows and columns: are they important factors or are they nuisance factors that good experimental design has suggested be included? What do you know about the factors - is one a control? And so on, thinking about the real problem in order to assess an initial distribution. Having done this, Bayes' theorem is applied to provide answers to all questions in the form of a probability distribution. Under certain assumptions the expectation of the parameter describing the cell in the ith row and the jth column is a linear function of four quantities, the overall mean x.., the row effect xi. - x.., the column effect x.j - x.. and the interaction xij - xi. - x.j + x.., the weights depending on the appropriate variance components. The estimates avoid all multiple comparison difficulties and any ambiguities over the meaning of significance tests: see Lindley (1975).
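The shape of such an estimate is easy to display in code. The following sketch is illustrative only: the shrinkage weights are placeholders, whereas Lindley (1975) determines them from the variance components, and the data table is invented.

```python
# Sketch of the shrinkage estimate for Example 2. The weights w_row, w_col
# and w_int stand in for the quantities a full treatment would derive from
# the appropriate variance components.

import numpy as np

def cell_estimate(x: np.ndarray, i: int, j: int,
                  w_row: float, w_col: float, w_int: float) -> float:
    """Weighted estimate of the (i, j) cell mean in a two-way layout."""
    grand = x.mean()                                   # overall mean x..
    row_effect = x[i, :].mean() - grand                # xi. - x..
    col_effect = x[:, j].mean() - grand                # x.j - x..
    interaction = x[i, j] - x[i, :].mean() - x[:, j].mean() + grand
    return grand + w_row * row_effect + w_col * col_effect + w_int * interaction

x = np.array([[10.0, 12.0, 11.0],
              [14.0, 18.0, 13.0]])   # invented 2 x 3 table of cell means
print(cell_estimate(x, 1, 1, w_row=0.9, w_col=0.8, w_int=0.5))
```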
(A further point arises here: it was not mentioned in the original lecture but
occurs in Rao's paper and was prominent in the discussion. It is now
well-known that the usual estimate of a multivariate normal mean is unsatisfac-
tory and that the Stein (1956) estimate is preferable. Unfortunately this
* The basic assumption is that the trials are exchangeable. This is weaker than the assumption
of a Bernoulli sequence.
any good statistics. But let it be allied to a good formal framework and regarded
as an approximation to a full Bayesian treatment. Newtonian mechanics
provides a good analogue again. Many problems are impossible to solve strictly
within that framework and much ingenuity is devoted to finding workable
approximations that produce valuable answers. Do your data analysis, but
remember, to make sense, you must never forget the rules of coherent
behaviour, any more than an engineer can forget Newton's laws.
Having cleared some dead wood from the path, let us go forward in a more
constructive vein. Statistics has had its greatest successes in those fields of
science where the long-run frequency view of probability is appropriate - for
example, in agriculture, where experiments may be repeated but nevertheless
the variation is sufficiently large for naive techniques to be inappropriate. But
with the widening of the notion of probability to embrace non-repeatable
situations the potential scope of statistics is enormously increased. We can now
enter into fields that were previously denied to us, without any loss in the
traditional ones, where propensity and exchangeability replace long-run fre-
quencies and randomisation. The future of statistics looks very bright to me
and perhaps the most important thing I have to say to you today is to ask you to
recognise this enormous widening of our subject. For if we do not recognise
this, others will take over. Let us not repeat the split between OR and statistics.
Only statisticians know how to process evidence: only statisticians know how
to make decisions. (The obvious adjective must be added in two places.)
As an illustration of this widening of the range of applications of statistics,
consider the situation in law. In a court of law, one of the problems is, in
probability language, to assess p(G | E), the probability that the defendant is guilty, G, given the evidence, E. The judge and jury would clearly wish this assessment to be done using Bayes' theorem; assuming, that is, they do not
themselves wish to stand accused of violating the axioms, such as the
sure-thing principle. At the moment it is unrealistic to be able to do this except
in special cases. One such case is forensic medicine, where the evidence is
precisely stated and certain probabilities are obtainable from scientific evalua-
tions outside the court - such as the chance that two hairs, one from the
suspect, one found at the scene of the crime, have come from the same head.
Again notice, as with the Bayesian solution of Pearson's problem, that such
probabilities do not arise naturally in the usual treatment of this problem.
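A sketch of how such a calculation might run, in the odds form of Bayes' theorem and with invented figures for the hair-match evidence:

```python
# Bayes' theorem in odds form for the forensic setting:
#   p(G|E)/p(not G|E) = [p(E|G)/p(E|not G)] * [p(G)/p(not G)].
# All numbers below are hypothetical.

def posterior_p_guilt(prior_g: float, p_e_given_g: float,
                      p_e_given_not_g: float) -> float:
    """Posterior probability of guilt G given the evidence E."""
    prior_odds = prior_g / (1.0 - prior_g)
    likelihood_ratio = p_e_given_g / p_e_given_not_g
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Suppose the hair match is near-certain if the suspect was at the scene,
# but would occur by coincidence in about 1 person in 500.
print(posterior_p_guilt(prior_g=0.10, p_e_given_g=0.99,
                        p_e_given_not_g=0.002))   # ~ 0.98
```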
Utility considerations also enter into legal matters. The jury, in some
situations, is not called upon to pass sentence, that is the prerogative of the
judge. He has a decision problem to solve and will require utility assessments,
either imposed by statute, or by himself, preferably the former. One thing
seems clear: fines should be in utiles. A wealthy man should pay more for a
parking offence than an impecunious student. An interesting example of the
I have spoken of the 21st century. I wish the change could come sooner. How
about a moratorium on research for two years? In the first of these we will all
read de Finetti's first volume: the next year will do for the second. It would do
you, and our subject, a lot of good.
References
DE FINETTI, B. (1964) Foresight: its logical laws, its subjective sources. Studies in Subjective Probability, ed. Henry E. Kyburg, Jr. and Howard E. Smokler, pp. 93-158. Wiley, New York. (Translation of La prévision: ses lois logiques, ses sources subjectives. Ann. Inst. H. Poincaré 7 (1937), 1-68.)
DE FINETTI, B. (1974) Theory of Probability: a critical introductory treatment. Volume 1 (Volume 2 to appear). Wiley, New York. (Translation of Teoria delle probabilità, sintesi introduttiva con appendice critica (1970). Giulio Einaudi, Torino.)
DE GROOT, M.H. (1970) Optimal Statistical Decisions. McGraw-Hill, New York.
FISHER, R.A. (1950) Contributions to Mathematical Statistics. Wiley, New York.
FISHER, R. A. (1956) Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh.
JEFFREYS, H. (1967) Theory of Probability. 3rd edition (corrected). Clarendon Press, Oxford.
KENDALL, M. G. AND STUART, A. (1970) The Advanced Theory of Statistics, Volume 2. Griffin,
London.
KOLMOGOROV, A.N. (1950) Foundations of the Theory of Probability. Chelsea, New York
(Translation of Grundbegriffe der Wahrscheinlichkeitsrechnung (1933), Springer, Berlin.)
LINDLEY, D.V. (1971) Bayesian Statistics, a Review. SIAM, Philadelphia.
LINDLEY, D. V. (1975) A Bayesian solution for two-way analysis of variance. Proc. 1972 Meeting of Statisticians, Budapest. (To appear.)
PEARSON, K. (1920) The fundamental problem of practical statistics. Biometrika 13, 1-16.
RAMSEY, F. P. (1964) Truth and probability. Studies in Subjective Probability, ed. Henry E. Kyburg, Jr. and Howard E. Smokler, pp. 61-92. Wiley, New York. (Reprinted from The Foundations of Mathematics and Other Essays (1931), 156-198. Kegan Paul, Trench, Trubner & Co. Ltd., London.)
SAVAGE, L.J. (1954) The Foundations of Statistics. Wiley, New York.
SAVAGE, L.J. (1971) Elicitation of personal probabilities and expectations. J. Amer. Statist.
Assoc. 66, 783-801.
STEIN, C. M. (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal
distribution. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, 197-206. University of California
Press, Berkeley.
WATTS, D.G. (1968) Conference on the Future of Statistics. Academic Press, New York.