This document summarizes a paper on the current state and future of Bayesian analysis. It notes that Bayesian activity has grown dramatically in the 1990s, both within and outside of statistics. Over the last 10 years, roughly 60 Bayesian books have been written, compared to an estimated 15 books in the first 200 years and 30 books between 1970-1989. Several Bayesian organizations now exist. The full paper aims to provide an overview of current Bayesian work and discuss issues that may shape its future development.
Vignettes for the Year 2000: Theory and Methods

Foreword

George CASELLA

The vignette series concludes with 22 contributions in Theory and Methods. It is, of course, impossible to cover all of theory and methods with so few articles, but we hope that a snapshot of what was, and what may be, is achieved. This is the essence of vignettes which, according to the American Heritage Dictionary, are literary sketches having the intimate charm and subtlety attributed to vignette portraits.

Through solicitation and announcement, we have collected the Theory and Methods vignettes presented here. The scope is broad, but by no means exhaustive. Exclusion of a topic does not bear on its importance, but rather on the inability to secure a vignette. Many topics were solicited, and a general call was put in Amstat News. Nonetheless, we could not get every topic covered (among other factors, time was very tight).

There is some overlap in the vignettes, as the authors, although aware of the other topics, were working independently, and there are places where information in one vignette complements that in another. Rather than edit out some of the overlap, I have tried to signpost these instances with cross-references allowing the reader the luxury of seeing two (or even three) views on a topic. Such diverse accounts can help to enhance our understanding.

As I hope you will agree, the resulting collection is nothing short of marvelous. The writers are all experts in their fields, and bring a perception and view that truly highlights each subject area. My goal in this introduction is not to summarize what is contained in the following pages, but rather to entice you to spend some time looking through the vignettes. At the very least, you will find some wonderful stories about the history and development of our subject.
(For example, see the vignettes by McCulloch and Meng for different histories of the EM algorithm.) Some of the speculation may even inspire you to try your hand, either in developing the theory or applying the methodology.

The question of the order in which to present the vignettes was one that I thought hard about. First, I tried to put them in a subject-oriented order, to create some sort of smooth flow throughout. This turned out to be impossible, as the connections between topics are not linear. Moreover, any absolute ordering could carry a connotation of importance of the topics, a judgment that I don't feel qualified to make. (Indeed, such a judgment may be impossible to make.) So in the end I settled for an alphabetical ordering according to author name. This is not only objective, but also makes the various vignettes a bit easier to find.

George Casella is Arun Varma Commemorative Term Professor and Chair, Department of Statistics, University of Florida, Griffin-Floyd Hall, Gainesville, FL 32611. This work was supported by National Science Foundation grant DMS-9971586.

© 2000 American Statistical Association, Journal of the American Statistical Association, December 2000, Vol. 95, No. 452, Vignettes

Bayesian Analysis: A Look at Today and Thoughts of Tomorrow

James O. BERGER

1. INTRODUCTION

Life was simple when I became a Bayesian in the 1970s; it was possible to track virtually all Bayesian activity. Preparing this paper on Bayesian statistics was humbling, as I realized that I have lately been aware of only about 10% of the ongoing activity in Bayesian analysis. One goal of this article is thus to provide an overview of, and access to, a significant portion of this current activity. Necessarily, the overview will be extremely brief; indeed, an entire area of Bayesian activity might only be mentioned in one sentence and with a single reference.
Moreover, many areas of activity are ignored altogether, either due to ignorance on my part or because no single reference provides access to the literature.

A second goal is to highlight issues or controversies that may shape the way that Bayesian analysis develops. This material is somewhat self-indulgent and should not be taken too seriously; for instance, if I had been asked to write such an article 10 years ago, I would have missed the mark by not anticipating the extensive development of Markov chain Monte Carlo (MCMC) and its enormous impact on Bayesian statistics.

James O. Berger is Arts and Sciences Professor of Statistics, Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708. Preparation was supported by National Science Foundation grant DMS-9802261.

Section 2 provides a brief snapshot of the existing Bayesian activity and emphasizes its dramatic growth in the 1990s, both inside and outside statistics. I found myself simultaneously rejoicing and being disturbed at the level of Bayesian activity. As a Bayesian, I rejoiced to see the extensive utilization of the paradigm, especially among nonstatisticians. As a statistician, I worried that our profession may not be adapting fast enough to this dramatic change; we may be in danger of losing Bayesian analysis to other disciplines (as we have lost other areas of statistics). In this regard, it is astonishing that most statistics and biostatistics departments in the United States do not even regularly offer a single Bayesian statistics course.

Section 3 is organized by approaches to Bayesian analysis, in particular the objective, subjective, robust, frequentist-Bayes, and what I term quasi-Bayes approaches. This section contains most of my musings about the current and future state of Bayesian statistics. Section 4 briefly discusses the critical issues of computation and software.

2. BAYESIAN ACTIVITY

2.1 Numbers and Organizations

The dramatically increasing level of Bayesian activity can be seen in part through the raw numbers. Harry Martz (personal communication) studied the SciSearch database at Los Alamos National Laboratories to determine the increase in frequency of articles involving Bayesian analysis over the last 25 years. From 1974 to 1994, the trend was linear, with roughly a doubling of articles every 10 years. In the last 5 years, however, there has been a very dramatic upswing in both the number and the rate of increase of Bayesian articles.

This same phenomenon is also visible by looking at the number of books written on Bayesian analysis. During the first 200 years of Bayesian analysis (1769-1969), there were perhaps 15 books written on Bayesian statistics. Over the next 20 years (1970-1989), a guess as to the number of Bayesian books produced is 30. Over the last 10 years (1990-1999), roughly 60 Bayesian books have been written, not counting the many dozens of Bayesian conference proceedings and collections of papers. Bayesian books in particular subject areas are listed in Sections 2.2 and 2.3. A selection of general Bayesian books is given in Appendix A.

Another aspect of Bayesian activity is the diversity of existing organizations that are significantly Bayesian in nature, including the following (those with an active website): International Society of Bayesian Analysis, ASA Section on Bayesian Statistical Science, Decision Analysis Society of INFORMS, and ASA Section on Risk Analysis.

In addition to the activities and meetings of these societies, the following are long-standing series of prominent Bayesian meetings that are not organized explicitly by societies: Valencia Meetings on Bayesian Statistics, Conferences on Maximum Entropy and Bayesian Methods, CMU Workshops on Bayesian Case Studies (…/bayesworkshop/), and RSS Conferences on Practical Bayesian Statistics.
The average number of Bayesian meetings per year is now well over 10, with at least an equal number of meetings being held that have a strong Bayesian component.

2.2 Interdisciplinary Activities and Applications

Applications of Bayesian analysis in industry and government are rapidly increasing but hard to document, as they are often in-house developments. It is far easier to document the extensive Bayesian activity in other disciplines; indeed, in many fields of the sciences and engineering, there are now active groups of Bayesian researchers. Here we can do little more than list various fields that have seen a considerable amount of Bayesian activity, and present a few references to access the corresponding literature. Most of the listed references are books on Bayesian statistics in the given field, emphasizing that the activity in the field has reached the level wherein books are being written. Indeed, this was the criterion for listing an area, although fields in which there is a commensurate amount of activity, but no book, are also listed. (It would be hard to find an area of human investigation in which there does not exist some level of Bayesian work, so many fields of application are omitted.)
For archaeology, see Buck, Cavanaugh, and Litton (1996); atmospheric sciences, see Berliner, Royle, Wikle, and Milliff (1999); economics and econometrics, see Cyert and DeGroot (1987), Poirier (1999), Perlman and Blaug (1997), Kim, Shephard, and Chib (1998), and Geweke (1999); education, see Johnson (1997); epidemiology, see Greenland (1998); engineering, see Godsill and Rayner (1998); genetics, see Iversen, Parmigiani, and Berry (1998), Dawid (1999), and Liu, Neuwald, and Lawrence (1999); hydrology, see Parent, Hubert, Bobée, and Miquel (1998); law, see DeGroot, Fienberg, and Kadane (1986) and Kadane and Schum (1996); measurement and assay, see Brown (1993); medicine, see Berry and Stangl (1996) and Stangl and Berry (1998); physical sciences, see Bretthorst (1988) and Jaynes (1999); quality management, see Moreno and Rios-Insua (1999); social sciences, see Pollard (1986) and Johnson and Albert (1999).

2.3 Areas of Bayesian Statistics

Here, Bayesian activity is listed by statistical area.
and the vignette by George; contingency tables, see the vignette by Fienberg; decision analysis and decision theory, see Smith (1988), Robert (1994), Clemen (1996), and the vignette by Brown; design, see Pilz (1991), Chaloner and Verdinelli (1995), and Müller (1999); empirical Bayes, see Carlin and Louis (1996) and the vignette by Carlin and Louis; exchangeability and other foundations, see Good (1983), Regazzini (1999), Kadane, Schervish, and Seidenfeld (1999), and the vignette by Robins and Wasserman; finite-population sampling, see Bolfarine and Zacks (1992) and Mukhopadhyay (1998); generalized linear models, see Dey, Ghosh, and Mallick (2000); graphical models and Bayesian networks, see Pearl (1988), Jensen (1986), Lauritzen (1996), Jordan (1998), and Cowell, Dawid, Lauritzen, and Spiegelhalter (1999); hierarchical (multilevel) modeling, see the vignette by Hobert; image processing, see Fitzgerald, Godsill, Kokaram, and Stark (1999); information, see Barron, Rissanen, and Yu (1998) and the vignette by Soofi; missing data, see Rubin (1987) and the vignette by Meng; nonparametrics and function estimation, see Dey, Müller, and Sinha (1998), Müller and Vidakovic (1999), and the vignette by Robins and Wasserman; ordinal data, see Johnson and Albert (1999); predictive inference and model averaging, see Aitchison and Dunsmore (1975), Leamer (1978), Geisser (1993), Draper (1995), Clyde (1999), and the BMA website under software; reliability and survival analysis, see Clarotti, Barlow, and Spizzichino (1993) and Sinha and Dey (1999); sequential analysis, see Carlin, Kadane, and Gelfand (1998) and Qian and Brown (1999); signal processing, see Ó Ruanaidh and Fitzgerald (1996) and Fitzgerald, Godsill, Kokaram, and Stark (1999); spatial statistics, see Wolpert and Ickstadt (1998) and Besag and Higdon (1999); testing, model selection, and variable selection, see Kass and Raftery (1995), O'Hagan (1995), Berger and
Pericchi (1996), Berger (1998), Racugno (1998), Sellke, Bayarri, and Berger (1999), Thiesson, Meek, Chickering, and Heckerman (1999), and the vignette by George; time series, see Pole, West, and Harrison (1995), Kitagawa and Gersch (1996), and West and Harrison (1997).

3. APPROACHES TO BAYESIAN ANALYSIS

This section presents a rather personal view of the status and future of five approaches to Bayesian analysis, termed the objective, subjective, robust, frequentist-Bayes, and quasi-Bayes approaches. This is neither a complete list of the approaches to Bayesian analysis nor a broad discussion of the considered approaches. The section's main purpose is to emphasize the variety of different and viable Bayesian approaches to statistics, each of which can be of great value in certain situations and for certain users. We should be aware of the strengths and weaknesses of each approach, as all will be with us in the future and should be respected as part of the Bayesian paradigm.

3.1 Objective Bayesian Analysis

It is a common perception that Bayesian analysis is primarily a subjective theory. This is true neither historically nor in practice. The first Bayesians, Thomas Bayes (see Bayes 1783) and Laplace (see Laplace 1812), performed Bayesian analysis using a constant prior distribution for unknown parameters. Indeed, this approach to statistics, then called "inverse probability" (see Dale 1991), was very prominent for most of the nineteenth century and was highly influential in the early part of this century. Criticisms of the use of a constant prior distribution caused Jeffreys to introduce significant refinements of this theory (see Jeffreys 1961). Most of the applied Bayesian analyses I see today follow the Laplace-Jeffreys objective school of Bayesian analysis, possibly with additional modern refinements. (Of course, others may see subjective Bayesian applications more often, depending on the area in which they work.)
Many Bayesians object to the label "objective Bayes," claiming that it is misleading to say that any statistical analysis can be truly objective. Though agreeing with this at a philosophical level (Berger and Berry 1988), I feel that there are a host of practical and sociological reasons to use the label; statisticians must get over their aversion to calling good things by attractive names.

The most familiar element of the objective Bayesian school is the use of noninformative or default prior distributions. The most famous of these is the Jeffreys prior (see Jeffreys 1961). Maximum entropy priors are another well-known type of noninformative prior (although they often also reflect certain informative features of the system being analyzed). The more recent statistical literature emphasizes what are called reference priors (Bernardo 1979; Yang and Berger 1997), which prove remarkably successful from both Bayesian and non-Bayesian perspectives. Kass and Wasserman (1996) provided a recent review of methods for selecting noninformative priors.

A quite different area of the objective Bayesian school is that concerned with techniques for default model selection and hypothesis testing. Successful developments in this direction are much more recent (Berger and Pericchi 1996; Kass and Raftery 1995; O'Hagan 1995; Sellke, Bayarri, and Berger 1999). Indeed, there is still considerable ongoing discussion as to which default methods are to be preferred for these problems (see Racugno 1998).

The main concern with objective Bayesian procedures is that they often utilize improper prior distributions, and so do not automatically have desirable Bayesian properties, such as coherency. Also, a poor choice of improper priors can even lead to improper posteriors. Thus proposed objective Bayesian procedures are typically studied to ensure that such problems do not arise.
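As a rough illustration of the default priors discussed in this section, the sketch below compares posteriors for a binomial proportion under three priors written as Beta(a, b) kernels: the constant (Bayes-Laplace) prior, the Jeffreys prior, and the improper Haldane prior. The binomial model, the class and function names, and the inclusion of the Haldane prior are assumptions made for this example; none of them come from the article. The last prior shows one simple way in which an improper prior can produce an improper posterior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Posterior for a binomial proportion, stored as a Beta(a, b) kernel."""
    a: float
    b: float

    @property
    def is_proper(self) -> bool:
        # A Beta(a, b) kernel integrates to a finite value only if a > 0 and b > 0.
        return self.a > 0 and self.b > 0

    @property
    def mean(self) -> float:
        if not self.is_proper:
            raise ValueError("posterior is improper; the mean is undefined")
        return self.a / (self.a + self.b)


# Hypothetical catalogue of default priors, each a Beta(a, b) kernel.
PRIORS = {
    "uniform": (1.0, 1.0),   # constant prior, as used by Bayes and Laplace
    "jeffreys": (0.5, 0.5),  # Jeffreys prior for the binomial model
    "haldane": (0.0, 0.0),   # improper prior proportional to 1 / (p * (1 - p))
}


def binomial_posterior(successes: int, trials: int,
                       prior_a: float, prior_b: float) -> BetaPosterior:
    """Conjugate update: add successes and failures to the prior exponents."""
    return BetaPosterior(prior_a + successes, prior_b + trials - successes)


if __name__ == "__main__":
    for name, (a, b) in PRIORS.items():
        for successes in (3, 0):
            post = binomial_posterior(successes, 10, a, b)
            summary = f"mean={post.mean:.4f}" if post.is_proper else "improper posterior"
            print(f"{name:8s} x={successes:2d} n=10  {summary}")
```

With 3 successes in 10 trials all three priors give proper posteriors with similar means. With 0 successes the Haldane prior yields an improper posterior, which is the kind of failure that a proposed default prior has to be checked for.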
subjective prior distributions) can be fully and accurately specified. The difficulty in such specification (Kahneman, Slovic, and Tversky 1986) often limits application of the approach, but there has been a considerable research effort to further develop elicitation techniques for subjective Bayesian analysis (Lad 1996; French and Smith 1997; The Statistician, 47, 1998).

In many problems, use of subjective prior information is clearly essential, and in others it is readily available; use of subjective Bayesian analysis for such problems can provide dramatic gains. Even when a complete subjective analysis is not feasible, judicious use of partly subjective and partly objective prior distributions is often attractive (Andrews, Berger, and Smith 1993).

3.3 Robust Bayesian Analysis

Robust Bayesian analysis recognizes the impossibility of complete subjective specification of the model and prior distribution; after all, complete specification would involve an infinite number of assessments, even in the simplest situations. The idea is thus to work with classes of models and classes of prior distributions, with the classes reflecting the uncertainty remaining after the (finite) elicitation efforts. (Classes could also reflect the differing judgments of various individuals involved in the decision process.)

The foundational arguments for robust Bayesian analysis are compelling (Kadane 1984; Walley 1991), and there is an extensive literature on the development of robust Bayesian methodology, including Berger (1985, 1994), Berger et al. (1996), and Rios Insua (1990). Routine practical implementation of robust Bayesian analysis will require development of appropriate software, however.

Robust Bayesian analysis is also an attractive technology for actually implementing a general subjective Bayesian elicitation program.
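The idea of working with a class of priors rather than a single prior can be sketched in a few lines. The example below assumes a normal mean with known sampling variance and a class of conjugate normal priors whose mean and variance are only known to lie in intervals; it then reports the range of posterior means over a grid covering that class. The model, the interval-valued class, the grid search, and all names are assumptions chosen for illustration and are not taken from the article or from the robust Bayesian literature it cites.

```python
from itertools import product
from typing import List, Tuple


def posterior_mean(xbar: float, n: int, sigma2: float,
                   prior_mean: float, prior_var: float) -> float:
    """Posterior mean of a normal mean under a N(prior_mean, prior_var) prior."""
    data_var = sigma2 / n                         # variance of the sample mean
    weight = prior_var / (prior_var + data_var)   # weight placed on the data
    return weight * xbar + (1.0 - weight) * prior_mean


def grid(lo: float, hi: float, points: int) -> List[float]:
    """Evenly spaced values from lo to hi, endpoints included."""
    if points < 2:
        return [lo]
    return [lo + (hi - lo) * i / (points - 1) for i in range(points)]


def posterior_mean_range(xbar: float, n: int, sigma2: float,
                         mean_bounds: Tuple[float, float],
                         var_bounds: Tuple[float, float],
                         points: int = 25) -> Tuple[float, float]:
    """Smallest and largest posterior mean over the gridded class of priors."""
    values = [
        posterior_mean(xbar, n, sigma2, m, v)
        for m, v in product(grid(*mean_bounds, points), grid(*var_bounds, points))
    ]
    return min(values), max(values)


if __name__ == "__main__":
    low, high = posterior_mean_range(
        xbar=1.0, n=10, sigma2=1.0,
        mean_bounds=(-1.0, 1.0), var_bounds=(0.5, 2.0),
    )
    print(f"posterior mean lies in [{low:.4f}, {high:.4f}] over the prior class")
```

A narrow range suggests that the conclusion is insensitive to the remaining uncertainty about the prior. A wide range indicates which further elicitation (here, the prior mean or the prior variance) would be most worth the effort.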
Resources (time and money) for subjective elicitation typically are very limited in practice, and need to be optimally utilized. Robust Bayesian analysis can, in principle, be used to direct the elicitation effort, by first assessing if the current information (elicitations and data) is sufficient for solving the problem and then, if not, determining which additional elicitations would be most valuable (Liseo, Petrella, and Salinetti 1996).

3.4 Frequentist-Bayes Analysis

It is hard to imagine that the current situation, with several competing foundations for statistics, will exist indefinitely. Assuming that a unified foundation is inevitable, what will it be? Today, an increasing number of statisticians envisage that this unified foundation will be a mix of Bayesian and frequentist ideas (with elements of the current likelihood theory thrown in; see the vignette by Reid). Here is my view of what this mixture will be.

First, the language of statistics will be Bayesian. Statistics is about measuring uncertainty, and over 50 years of efforts to prove otherwise have convincingly demonstrated that the only coherent language in which to discuss uncertainty is the Bayesian language. In addition, the Bayesian language is an order of magnitude easier to understand than the classical language (witness the p value controversy; Sellke et al. 1999), so that a switch to the Bayesian language should considerably increase the attractiveness of statistics. Note that, as discussed earlier, this is not about subjectivity or objectivity; the Bayesian language can be used for either subjective or objective statistical analysis.

On the other hand, from a methodological perspective, it is becoming clear that both Bayesian and frequentist methodologies are going to be important.
For parametric problems, Bayesian analysis seems to have a clear methodological edge, but frequentist concepts can be very useful, especially in determining good objective Bayesian procedures (see, e.g., the vignette by Reid).

In nonparametric analysis, it has long been known (Diaconis and Freedman 1986) that Bayesian procedures can behave poorly from a frequentist perspective. Although poor frequentist performance is not necessarily damning to a Bayesian, it typically should be viewed as a warning sign that something is amiss, especially when the prior distribution used contains more hidden information than elicited information (as is virtually always the case with nonparametric priors).

Furthermore, there are an increasing number of examples in which frequentist arguments yield satisfactory answers quite directly, whereas Bayesian analysis requires a formidable amount of extra work. (The simplest such example is MCMC itself, in which one evaluates an integral by a sample average, and not by a formal Bayesian estimate; see the vignette by Robins and Wasserman for other examples.) In such cases, I believe that the frequentist answer can be accepted by Bayesians as an approximate Bayesian answer, although it is not clear in general how this can be formally verified.

This discussion of unification has been primarily from a Bayesian perspective. From a frequentist perspective, unification also seems inevitable. It has long been known that optimal unconditional frequentist procedures must be Bayesian (Berger 1985), and there is growing evidence that this must be so even from a conditional frequentist perspective (Berger, Boukai, and Wang 1997).

Note that I am not arguing for an eclectic attitude toward statistics here; indeed, I think the general refusal in our field to strive for a unified perspective has been the single biggest impediment to its advancement.
I am simply saying that any unification that will be achieved will almost necessarily have frequentist components to it.

3.5 Quasi-Bayesian Analysis

There is another type of Bayesian analysis that one increasingly sees being performed, and that can be unsettling to pure Bayesians and many non-Bayesians. In this type of analysis, priors are chosen in various ad hoc fashions, including choosing vague proper priors, choosing priors to span the range of the likelihood, and choosing priors with tuning parameters that are adjusted until the answer looks nice. I call such analyses quasi-Bayes because, although they utilize Bayesian machinery, they do not carry the guarantees of good performance that
come with either true subjective analysis or (well-studied) objective Bayesian analysis. It is useful to briefly discuss the possible problems with each of these quasi-Bayes procedures.

Using vague proper priors will work well when the vague proper prior is a good approximation to a good objective prior, but this can fail to be the case. For instance, in normal hierarchical models with a higher-level variance $V$, it is quite common to use the vague proper prior density $\pi(V) \propto V^{-(\epsilon+1)} \exp(-\epsilon'/V)$, with $\epsilon$ and $\epsilon'$ small. However, as $\epsilon' \to 0$, it is typically the case in these models that the posterior distribution for $V$ will pile up its mass near 0, so that the answer can be ridiculous if $\epsilon'$ is too small. An objective Bayesian who incorrectly used the related prior $\pi(V) \propto V^{-1}$ would typically become aware of the problem, because the posterior would not converge (as it will with the vague proper prior). The common perception that using a vague proper prior is safer than using improper priors, or conveys some type of guarantee of good performance, is simply wrong.

The second common quasi-Bayes procedure is to choose priors that span the range of the likelihood function. For instance, one might choose a uniform prior over a range that includes most of the mass of the likelihood function but that does not extend too far (thus hopefully avoiding the problem of using a too vague proper prior). Another version of this procedure is to use conjugate priors, with parameters chosen so that the prior is more spread out than the likelihood function but is roughly centered in the same region. The two obvious concerns with these strategies are that (a) the answer can still be quite sensitive to the spread of the rather arbitrarily chosen prior, and (b) centering the prior on the likelihood is a problematical double use of the data. Also, in problems with complicated likelihoods, it can be difficult to implement this strategy successfully.

The third common quasi-Bayes procedure is to write down proper (often conjugate) priors with unspecified parameters, and then treat these parameters as tuning parameters to be adjusted until the answer looks nice. Unfortunately, one is sometimes not told that this has been done; that is, the choice of the parameters is, after the fact, presented as natural.

These issues are complicated by the fact that in the hands of an expert Bayesian analyst, the quasi-Bayes procedures mentioned here can be quite reasonable, in that the expert may have the experience and skill to tell when the procedures are likely to be successful. Also, one must always consider the question: What is the alternative? I have seen many examples in which an answer was required and in which I would trust the quasi-Bayes answer more than the answer from any feasible alternative analysis.

Finally, it is important to recognize that the genie cannot be put back into the bottle. The Bayesian machine, together with MCMC, is arguably the most powerful mechanism ever created for processing data and knowledge. The quasi-Bayes approach can rather easily create procedures of astonishing flexibility for data analysis, and its use to create such procedures should not be discouraged. However, it must be recognized that these procedures do not necessarily have intrinsic Bayesian justifications, and so must be justified on extrinsic grounds (e.g., through extensive sensitivity studies, simulations, etc.).

4. COMPUTATION AND SOFTWARE

4.1 Computational Techniques

Even 20 years ago, one often heard the refrain that Bayesian analysis is nice conceptually; too bad it is not possible to compute Bayesian answers in realistic situations. Today, truly complex models often can only be computationally handled by Bayesian techniques. This has attracted many to the Bayesian approach and has had the interesting effect of considerably reducing discussion of philosophical arguments for and against the Bayesian position.

Although other goals are possible, most Bayesian computation is focused on calculation of posterior expectations, which are typically integrals of one to thousands of dimensions. Another common type of Bayesian computation is calculation of the posterior mode (as in computing MAP estimates in image processing).

The traditional numerical methods for computing posterior expectations are numerical integration, Laplace approximation, and Monte Carlo importance sampling. Numerical integration can be effective in moderate (say, up to 10) dimensional problems. Modern developments in this direction were discussed by Monahan and Genz (1996). Laplace and other saddlepoint approximations are discussed in the vignette by R. Strawderman. Until recently, Monte Carlo importance sampling was the most commonly used traditional method of computing posterior expectations. The method can work in very large dimensions and has the nice feature of producing reliable measures of the accuracy of the computation.

Today, MCMC has become the most popular method of Bayesian computation, in part because of its power in handling very complex situations and in part because it is comparatively easy to program. Because the Gibbs sampling vignette by Gelfand and the MCMC vignette by Cappé and Robert both address this computational technique, I do not discuss it here. Recent books in the area include those of Chen, Shao, and Ibrahim (2000), Gamerman (1997), Robert and Casella (1999), and Tanner (1993). It is not strictly the case that MCMC is replacing the more traditional methods listed above.
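To make the importance sampling approach concrete, here is a minimal sketch that estimates a posterior mean and reports an approximate standard error alongside it. It assumes a single observation x from a normal distribution with unit variance and a standard Cauchy prior on the mean, and it uses the normalized likelihood N(x, 1) as the importance function so that the importance weights reduce to the prior density. The model, the choice of importance function, the delta-method standard error, and all names are assumptions for illustration rather than anything prescribed by the article.

```python
import math
import random
from typing import Tuple


def cauchy_prior_density(theta: float) -> float:
    """Standard Cauchy density, used here as the prior on the normal mean."""
    return 1.0 / (math.pi * (1.0 + theta * theta))


def posterior_mean_by_importance_sampling(
    x: float, draws: int = 20000, seed: int = 0
) -> Tuple[float, float]:
    """Self-normalized importance sampling estimate of E[theta | x].

    Draws come from N(x, 1), which is proportional to the likelihood in theta,
    so each importance weight is just the prior density at the drawn value.
    Returns the estimate and an approximate (delta-method) standard error.
    """
    rng = random.Random(seed)
    thetas = [rng.gauss(x, 1.0) for _ in range(draws)]
    weights = [cauchy_prior_density(t) for t in thetas]

    total = sum(weights)
    estimate = sum(w * t for w, t in zip(weights, thetas)) / total

    # Delta-method variance of a ratio estimator built from the weighted draws.
    variance = sum((w * (t - estimate)) ** 2 for w, t in zip(weights, thetas)) / total ** 2
    return estimate, math.sqrt(variance)


if __name__ == "__main__":
    for observed in (0.0, 2.0):
        est, se = posterior_mean_by_importance_sampling(observed)
        print(f"x={observed:.1f}: posterior mean ~ {est:.3f} (std. error ~ {se:.3f})")
```

Because the weights here are bounded, the estimate is well behaved. With a poorly matched importance function the weights can be dominated by a few draws, and the reported standard error is then itself unreliable.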
It would, of course, be wonderful to have a single general-purpose Bayesian software package, but three of the major strengths of the modern Bayesian approach create difficulties in developing generic software. One difficulty is the extreme flexibility of Bayesian analysis, with virtually any constructed model being amenable to analysis. Most classical packages need to contend with only relatively few well-defined models or scenarios for which a classical procedure has been determined. Another strength of Bayesian analysis is the possibility of extensive utilization of subjective prior information, and many Bayesians tend to feel that software should include an elaborate expert system for prior elicitation. Finally, implementing the modern computational techniques in a software package is extremely challenging, because it is difficult to codify the art of finding a successful computational strategy in a complex situation.

Note that development of software implementing the objective Bayesian approach for standard statistical models can avoid these difficulties. There would be no need for a subjective elicitation interface, and the package could incorporate specific computational techniques suited to the various standard models being considered. Because the vast majority of statistical analyses done today use such automatic software, having a Bayesian version would greatly impact the actual use of Bayesian methodology. Its creation should thus be a high priority for the profession.