Probability, Econometrics and Truth
Contents

Introduction
1 The philosophy of induction
2 Probability and indifference
Intermezzo: a formal scheme of reference
3 Relative frequency and induction
4 Probability and belief
5 The theory of simplicity
6 From probability to econometrics
7 Econometric modelling
8 In search of homogeneity
9 Positivism and the aims of econometrics
10 Probability, econometrics and truth
Personalia
References
Name Index
Subject Index
Introduction
Probability begins and ends with probability.
Keynes ([1921] CW VIII, p. 356)
When John Maynard Keynes accused Jan Tinbergen of practising black
magic and alchemy, econometrics was still in its infancy. A critical atti-
tude to econometrics was legitimate, as it would have been for any novel
enterprise. Stubborn perseverance on behalf of the pioneers of econo-
metrics is natural as well. However, after more than half a century of
development by some of the most brilliant social scientists, and much
practical experience, Keynes' comments are repeated today by respected
authorities. Has it all been in vain?
Not quite. It is true that the aspirations (or pretences) of econometrics
and the accomplishments still tend to be divided by a gap, which, in turn,
tends to damage the credibility of the whole discipline. Many of econo-
metrics' results remain controversial. Some critics claim that even the
most basic aim, the measurement and quantitative description of eco-
nomic phenomena, has not been accomplished. Econometric evidence
has been compared with the evidence of miracles in Lourdes. Some
deplore the waste of electricity used for econometric computer calcula-
tions. But a fair appraisal of contemporary econometrics cannot deny
that a number of interesting empirical lessons have been learnt. The
verdict that the econometric exploration was all in vain can only result
from a wrong interpretation of econometric aims.
This book is a methodological investigation of this exploration. It
confronts the aims with the methods and with the philosophical as well
as the probabilistic foundations of those methods. It concludes that the
achievements of econometrics can be found where its aspirations are put
in the positivist tradition. Positivism is a philosophy which has been
declared dead by many. It should be resurrected.
Positivism has an ancestor in David Hume, one of the founders of
British empiricism (the forerunner of positivism). Hume ([1748] 1977,
p. 114) encouraged his readers to ask about any book in their libraries,
Does it contain any abstract reasoning concerning quantity or number? No. Does it
contain any experimental reasoning concerning matter of fact and existence? No.
Commit it then to the flames: For it can contain nothing but sophistry and
illusion.
`Science is Measurement', the original motto of the Cowles Commission
for Research in Economics (which had a leading role in shaping formal
econometrics), put econometrics clearly in the positivist tradition.
Twenty years later, in 1952, this motto was changed to `Theory and
Measurement', reflecting the ambitions of a younger generation of
researchers (headed by Trygve Haavelmo and Tjalling Koopmans) to
integrate econometrics with (neoclassical) economic theory and formal
probability theory. The new tradition diverted econometrics to Neyman–Pearson
testing procedures, away from the positivist tradition of Karl
Pearson (father of Neyman's companion Egon), Fisher and Jeffreys.
Simultaneously, in the philosophy of science positivism came under
attack and was replaced by methodological falsificationism. Chapter 1
discusses this philosophy; chapters 2 to 4 deal with different approaches
in probability theory. I claim that the Cowles programme in econo-
metrics, with its Neyman–Pearson foundation and a philosophical
sauce of methodological falsificationism, has done the reputation of
econometrics much harm. This claim is elaborated in the chapters
which follow the discussion of the various probability theories. The tran-
sition from probability theory to econometrics is shaky, as chapters 6 and
7 demonstrate. Chapter 8, which presents a case study in one of the best
episodes of applied econometric inference, shows that the sampling and
testing metaphors which dominated econometrics can lead to serious self-
deceit. Chapters 9 and 10 bring various arguments together and recom-
mend the positivist tradition, in which econometrics was born and to
which it should be brought back again.
Finally, what about truth? Does econometrics, based on the right kind
of probability, yield `true knowledge'? Not so. The quest for truth, which
dominates much of contemporary econometrics, should be abandoned. If
econometricians are able to deliver useful approximations to empirical
data, they achieve a major accomplishment. What defines `useful' is an
intricate matter, which can only be clarified on a case-by-case basis. A
model which is good for one purpose may be inappropriate for another.
I hope that the reader will allow the author a few disclaimers. First, the
focus on econometrics does not mean that there are no other ways of
doing empirical economics, nor is it intended to suggest that purely
theoretical work is not interesting. This book intends to provide a com-
plement to books on economic methodology, which tend to ignore the
strengths as well as the weaknesses of econometric inference.
Secondly, even though the book focuses on econometrics, I have neglected
some approaches that might warrant discussion. For example, there is
hardly any discussion of non-parametric inference, bootstraps, or even many
specific cross-section topics. Many of the fundamental themes discussed
here apply equally to econometric approaches which are beyond the
scope of this book.
Thirdly, I have assumed that the readers of this book will be econo-
metricians who are interested in the philosophical and statistical roots of
their activities, and economic methodologists who have an interest in the
scope and limits of empirical econometrics. This brings the risk that
econometricians may find the econometric discussions not always satis-
factory, while methodologists might complain that I have not done jus-
tice to all philosophical theories and subtleties that they can think of. I
hope that both types of readers are willing to search for added value
rather than concentrate on what they already know.
After the disclaimers, finally, some words of thanks. The book grew out
of a PhD thesis, which was defended at the CentER for Economic
Research of Tilburg University (The Netherlands). I also was able to
discuss various parts with colleagues during visits to the London
School of Economics, Duke University, the Eidgenössische Technische
Hochschule Zürich, the University of Western Australia and at many
seminars and conferences. For their financial support, I would like to
thank in particular the Foreign and Commonwealth Office, the
Fulbright Commission and NWO (Netherlands Organisation for
Scientific Research).
In addition, I would like to thank many persons for inspiration, com-
ments and collaboration. They are my fellow authors Ton Barten, Jan
Magnus and Michael McAleer; many sparring partners among whom
Mark Blaug, Nancy Cartwright, James Davidson, Tony Edwards,
David Hendry, Colin Howson, Rudolf Kalman, the late Joop Klant,
Ed Leamer, Neil de Marchi, Mary Morgan, Stephen Pollock, Ron
Smith, Mark Steel, Paul Vitányi and Arnold Zellner. Michiel de Nooij
provided highly appreciated research assistance.
1 The philosophy of induction
[S]ome other scientists are liable to say that a hypothesis is definitely
proved by observation, which is certainly a logical fallacy; most
statisticians appear to regard observations as a basis for possibly
rejecting hypotheses, but in no case for supporting them. The latter
attitude, if adopted consistently, would reduce all inductive inference to
guesswork.
Harold Jeffreys ([1939] 1961, p. ix)
1 Introduction
Occasionally, the aspirations of econometrics are frustrated by technical
difficulties which lead to increasing technical sophistication. More often,
however, deeper problems hamper econometrics. These are the problems
of scientific inference: the logical, cognitive and empirical limitations to
induction. There is an escapist tendency in econometrics, which is to seek
salvation in higher technical sophistication and to avoid deeper philoso-
phical problems. This is reflected by the erosion of an early foothold of
empirical econometrics, Econometrica. The share of empirical papers has
declined from a third in the first (1933) volume to a fifth in recent
volumes. This is not because most empirical values for economic vari-
ables or parameters have been settled. Despite the `econometric revolu-
tion', there is no well established numerical value for the price elasticity of
bananas. If Econometrica were to publish an issue with well established
econometric facts, it might be very thin indeed. The factual knowledge of
the economy remains far from perfect, as are the ability to predict its
performance, and the understanding of its underlying processes. Basic
economic phenomena, such as the consumption and saving patterns of
agents, remain enigmatic. After many years of econometric investigation,
there is no agreement on whether money causes output or not. Rival
theories flourish. Hence, one may wonder what the added value of econo-
metrics is. Can we learn from experience in economics, and, if so, does
econometrics itself serve this purpose? Or, were the aspirations too high
after all, and does the sceptical attitude of Keynes half a century ago
remain justified today?
2 Humean scepticism
An important issue in the philosophy of science is how (empirical) knowl-
edge can be obtained.[1] This issue has a long history, dating back (at least)
to the days of the Greek Academy, in particular to the philosopher
Pyrrho of Elis (c. 365–275 BC), the first and most radical sceptic.
Academic scepticism, represented for example by Cicero (106–43 BC), is
more moderate than Pyrrho's. The ideas of Pyrrho (who did not write
books, `wisely' as Russell, 1946, p. 256, remarks) are known via his pupil
Timon of Phlius (c. 320–230 BC) and his follower Sextus Empiricus (sec-
ond century AD), whose work was translated into Latin in 1569. A few
earlier translations are known but they have probably only been read by
their translators. The 1569 translation was widely studied in the sixteenth
and seventeenth centuries. All major philosophers of this period referred
to scepticism. René Descartes, for example, claimed to be the first philo-
sopher to refute scepticism.
One of the themes of the early sceptics is that only deductive inference
is valid (by which they mean: logically acceptable) for a demonstrative
proof, while induction is invalid as a means for obtaining knowledge.
Perception does not lead to general knowledge. According to Russell
(1946, p. 257),
Scepticism naturally made an appeal to many unphilosophic minds. People
observed the diversity of schools and the acerbity of their disputes, and decided
that all alike were pretending to knowledge which was in fact unattainable.
Scepticism was a lazy man's consolation, since it showed the ignorant to be as
wise as the reputed men of learning.
Still, there was much interest in scepticism since the publication of the
translation of Sextus Empiricus' work, not only by `unphilosophic
minds'. Scepticism has been hard to refute. Hume contributed to the
sceptical doctrine (although he did not end up as a Pyrrhonian, i.e. radi-
cal sceptic). The result, `Humean scepticism', is so powerful, that many
philosophers still consider it to be a death blow to induction, the `scandal
of philosophy'.[2]
Hume ([1739] 1962) argues that the empirical sciences cannot deliver
causal knowledge. There are no rational grounds for understanding the
causes of events. One may observe a sequence of events and call them
cause and effect, but the connection between the two remains hidden.
Generalizations deserve scepticism. Hume (Book i, Part iii, section 12,
p. 189) summarizes this in two principles:
that there is nothing in any object, considered in itself, which can afford us a reason
for drawing a conclusion beyond it; and, that even after the observation of the
frequent or constant conjunction of objects, we have no reason to draw any inference
concerning any object beyond those of which we have had experience.
The `scandal of philosophy' is fundamental to empirical scientific infer-
ence (not just econometrics). It has wider implications (as Hume indi-
cates) than denying causal inference. For example, does past experience
justify the expectation of a sunrise tomorrow? The question was raised in
discussing the merits of Pierre Simon de Laplace's `rule of succession', a
statistical device for induction (see chapter 2).[3]
Another example, popular
in philosophy, deals with extrapolation to a population instead of the
future: if only white swans have been observed, may we infer that all
swans are white? (This is the classic example of an affirmative universal
statement.)
The sceptical answer to these questions is negative. The rules of deduc-
tive logic prohibit drawing a general conclusion if this conclusion is not
entailed by its propositions. There is no logical reason why the next swan
should be white. Of course, swans can be defined to be white (like statis-
ticians who define a fair die to be unbiased), making black swans a
contradiction in terms. An alternative strategy is to conclude that all
known swans are white. The conclusion is conditional on the observed
sample. Hence, the choice is between formulating definitions or making
conditional enumerations. But most empirical scientists want to make
generalizations. This is impossible if the induction problem proves insur-
mountable. Therefore, an understanding of induction is essential.
The logical form of the induction problem is that all observed X are Φ
does not entail that all X are Φ. The next three chapters, dealing with
probabilistic inference, consider a more delicate, probabilistic form of the
induction problem: given that most observed X are Φ, what can be said
about X in general? The source of Humean scepticism follows from the
conjunction of three propositions (Watkins, 1984, p. 3):
(i) there are no synthetic a priori truths about the external world;
(ii) any genuine knowledge we have of the external world must ulti-
mately be derived from perceptual experience;
(iii) only deductive derivations are valid.
The conjunction of (i), (ii) and (iii) does not allow for inferring knowledge
beyond the initial premises. In this sense, inductive inference is impos-
sible.
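In a standard first-order notation (with Φ an arbitrary predicate, used here only to fix ideas), the point may be written as

\[
\Phi(x_1) \wedge \Phi(x_2) \wedge \cdots \wedge \Phi(x_n) \;\nvdash\; \forall x\, \Phi(x),
\]

however large n may be. The probabilistic variant considered in the next three chapters asks instead what value, if any, can be assigned to P(Φ(x_{n+1}) | Φ(x_1), ..., Φ(x_n)).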
John Watkins (p. 12) argues that a philosophical, or `rational' answer
to scepticism is needed, because otherwise it is likely to encourage irra-
tionality. Watkins holds that Hume himself regarded philosophical scep-
ticism as an academic joke. Indeed, Hume uses the expression jeux
d'esprit (in A Letter From a Gentleman to his Friend in Edinburgh, included
as an appendix to Hume [1748] 1977, p. 116). Describing the person who
is afflicted by Pyrrhonism, Hume (p. 111) concludes:
When he awakes from his dream, he will be the first to join in the laugh against
himself, and to confess, that all his objections are mere amusement.
Amusement, Watkins (1984, p. 12) argues, does not qualify as a rational
answer to scepticism. In fact, Hume's response is more elaborate than the
quotation suggests. It relies on conventionalism (see below). I agree with
Watkins that, formally, conventionalism is not very appealing (although
conventions have much practical merit). Fortunately, there are alterna-
tives. Once the source of Hume's problem (the threefold conjunction just
mentioned) is clarified, the merits of those alternative responses to scepti-
cism can be appraised.
Watkins (pp. 4–5) discusses a number of strategies as responses to
Hume's problem. The most interesting ones are:
- the naturalist (ignoring the conjunction of propositions (i)–(iii));
- the apriorist (denying proposition (i));
- the conjecturalist (amending proposition (ii)); and
- the probabilist strategy (which takes odds with proposition (iii)).
A more detailed discussion of the probabilist strategy will be given in the
next three chapters, while the remainder of this book considers how well
this strategy may work in econometrics.
3 Naturalism and pragmatism
Descartes argued that one should distrust sensations. Insight into causal
relations results from mere reasoning. Hume, criticizing Cartesian `dog-
matic rationalism', argues that such plain reasoning does not suffice to
obtain unique answers to scientic questions. Cartesian doubt, `were it
ever possible to be attained by any human creature (as it plainly is not)
would be entirely incurable' (Hume [1748] 1977, p. 103). It would not
yield true knowledge either: `reasoning a priori, any thing might appear
able to produce anything' (Letter From a Gentleman, p. 119). Cartesian
doubt is unacceptable to Hume ([1739] 1962, p. 318). It gave him a head-
ache:
The intense view of these manifold contradictions and imperfections in human
reason has so wrought upon me, and heated my brain, that I am ready to reject all
belief and reasoning, and can look upon no opinion even as more probable or
likely than another.
But this does not make Hume a Pyrrhonian or radical sceptic. He is
rescued from this philosophical `melancholy and delirium' by nature.
His naturalist strategy is to concede that there is no epistemological
answer to scepticism, but to deny its importance. It is human nature to
make generalizing inferences; the fact that inference is not warranted
from a logical point of view has no practical implications. Hume (An
Abstract of a Book Lately Published, Entitled, A Treatise of Human
Nature, Etc., in Hume [1739] 1962, p. 348) concludes,
that we assent to our faculties, and employ our reason only because we cannot
help it. Philosophy would render us entirely Pyrrhonian, were not nature too
strong for it.
The great subverter of Pyrrhonism, Hume ([1748] 1977, p. 109) writes, is
`action, and employment, and the occupations of common life'. Not
reasoning, but custom and habit, based on the awareness of constant
conjunctions of objects, make human beings draw inferences (p. 28).
This response is known as conventionalism. According to Hume
(p. 29), custom is the `great guide of human life', and without custom
or habit, those who are guided only by Pyrrhonian doubt will `remain in a
total lethargy, till the necessities of nature, unsatisfied, put an end to their
miserable existence' (p. 110). Reason is the slave of our passions.
A pinch of Pyrrhonian doubt remains useful, because it makes inves-
tigators aware of their fallibility (p. 112). The fact that one cannot obtain
absolute certainty by human reasoning does not imply universal doubt,
but only suggests that researchers should be modest (Letter From a
Gentleman, p. 116). But many scientists will feel embarrassed by the
conclusion that custom is the ultimate foundation of scientic inference.
Watkins, for example, rejects it. However, conventionalism may be
rationally justified. This has been attempted by some adherents of the
probabilistic approach. Other strategies related to Hume's conventional-
ism are instrumentalism (developed by John Dewey) and pragmatism, or
pragmaticism, as Charles Peirce christened it. These hold that hypotheses
may be accepted and rejected on rational grounds, on the basis of utility
or effectiveness. The pragmatic approach can be combined with the prob-
abilistic strategy. But it is not free of problems. Most importantly, it is an
invitation to scientific obscurantism (should a theory be useful to the
learned – who qualifies? – or to the mighty?). A problem with conven-
tionalism is to give an answer to the question `where do these conventions
come from?' and to provide a rational justification for the conventions
(evolutionary game theory has been directed to this question). Lawrence
Boland (1982; also 1989, p. 33) argues that neoclassical economists deal
with the induction problem by adopting a conventionalist strategy.
Econometricians base much of their work on another convention con-
cerning the size of a test: the well known 5% significance level. This
convention has its roots in a quarrel between Karl Pearson and R. A.
Fisher, two founders of modern statistics (see chapter 3, section 3.2).
4 Apriorism
The apriorist strategy to the problem of scepticism denies proposition (i),
concerning the absence of synthetic a priori truth. Immanuel Kant
invented this notion of a priori synthetic truth, true knowledge that is
both empirical and based on reasoning. It is neither analytic nor syn-
thetic.[4]
The canonical example of an a priori synthetic truth is Kant's
Principle of Universal Causation, which is his response to Humean scep-
ticism. Kant argued that everything must have a cause: `Everything that
happens presupposes something upon which it follows in accordance with
a rule' (translated from Kritik der reinen Vernunft, Kant's most important
work, published in 1781; in Krüger, 1987, p. 72). This doctrine is also
known as causal determinism, or simply as causalism (Bunge [1959] 1979,
p. 4).
Unlike Hume, John Stuart Mill endorsed Kant's principle: for Mill,
induction is the search for causes. Mill distinguishes four `canons of
induction', given in his Logic, III (viii) (Mill [1843] 1952):
- the method of agreement;
- the method of difference;
- the method of residues;
- the method of concomitant variations.
These methods are based on the `principle of uniformity of nature', which
holds that the future will resemble the past: the same events will happen
again if the conditions are sufficiently similar. The method of difference
starts from the premise that all events have a cause. The next step is to
give an exhaustive list of possible causes, and select the one(s) which
always occurs in common with the event, and does not occur if the
event does not occur. A problem is to select this exhaustive list of possible
causes.
Keynes ([1921] CW VIII, p. 252) refers to the principle of uniformity of
nature in his discussion of reasoning by analogy, and suggests that dif-
ferences in position in time and space should be irrelevant for the validity
of inductions. If this principle forms the basis for induction, it cannot
itself be founded upon inductive arguments. Furthermore, it is doubtful
that experience validates such a strong principle. Nature seems much
more erratic and surprising than the principle of uniformity of nature
suggests. Still, the late philosopher Karl Popper ([1935] 1968, p. 252)
explicitly argues that `scientific method presupposes the immutability of
natural processes, or the ``principle of the uniformity of nature'' '.
Likewise, Bernt Stigum (1990, p. 542) argues that this principle is a
necessary postulate of epistemology. Some probability theorists advocate
a statistical version of this synthetic a priori truth: the stability of mass
phenomena (see in particular the discussion of von Mises in chapter 3,
section 2).
In the social sciences, it is not the uniformity of nature which is of
interest, but the relative stability of human behaviour. A more apt termi-
nology for the principle would then be the `principle of stable behaviour'.
Consider the axioms of consumer behaviour. If one assumes that prefer-
ences are stable (Hahn, 1985, argues this is all the axioms really say), then
accepting these axioms as a priori truths warrants inductive generaliza-
tions. This principle solves, or rather, sidesteps, the Humean problem. If
it is accepted, generalizations from human experience are admissible. But
again this postulate is doubtful. Too frequently, humans behave errati-
cally, and on a deeper level, reflexivity (self-fulfilling prophecies) may
undermine uniform regularities in the social sciences. It suffers from
the same problems as the principle of uniformity of nature: either it is
false, or its justification involves infinite regress. But a weaker principle of
stable behaviour may be accepted, by giving a probabilistic interpretation
to the generalization. There should be an appreciable (non-zero) prob-
ability that stable behaviour may be expected. This is the basis for
rational behaviour. A fair amount of stability is also necessary (not suf-
ficient) for scientific inference: otherwise, it is impossible to `discover' laws,
or regularities.
It is hard to imagine interesting a priori synthetic truths specific to
economics. The axioms of consumer behaviour are not generally accepted
as true. An investigation of their validity cannot start by casting them
beyond doubt (chapter 8 provides a case history of `testing' consumer
demand theory). Bruce Caldwell (1982, p. 121) discusses praxeological
axioms of Austrian economists as an example of Kant's a priori synthe-
tical propositions. The Austrian Friedrich von Wieser argued that a
cumbersome sequence of induction is not needed to establish laws in
economics. He claimed (cited in Hutchison, 1981, p. 206) that we can
`hear the law pronounced by an unmistakable inner voice'. Ludwig von
Mises made apriorism the cornerstone of his methodology. The problem
of this line of thought is that inner voices may conict. If so, how are we
to decide which voice to listen to?
5 Conjecturalism
The conjecturalist strategy denies Watkins' proposition (ii) and instead
holds that scientific knowledge is only negatively controlled by experi-
ence: through falsification. Popper provided the basic insights of the
conjecturalist philosophy (also known as methodological falsificationism)
in his Logik der Forschung in 1934 (translated as Popper, [1935] 1968).
This nearly coincides with one of the first efforts to test economic theory
with econometric means (Tinbergen, 1939b). Followers of Popper are,
among others, Imre Lakatos and Watkins. I will first discuss Popper's
views on inference, then Lakatos' modified conjecturalism.
5.1 Popper's conjecturalism
Popper's impact on economic methodology has been strong. Two pro-
nounced Popperians in economics are Mark Blaug (1980) and Terence
Hutchison (1981). Moreover, statisticians and econometricians fre-
quently make favourable references to Popper (Box, 1980, p. 383, n.;
Hendry, 1980; Spanos, 1986) or believe that Popper's is `the widely
accepted methodological philosophy as to the nature of scientific pro-
gress' (Bowden, 1989, p. 3). Critics claim that the real impact of
Popperian thought on economic inference is more limited (see also De
Marchi, 1988; Caldwell, 1991).
5.1.1 Falsification and verification
Scientific statements are those which can be refuted by empirical
observation. Scientists should make bold conjectures and try to falsify
them. This is the conjecturalist view in a nutshell. More precisely, theories
are thought of as mere guesses, conjectures, which have to be falsifiable in
order to earn the predicate scientific. The modus tollens (if p, then q. But
not-q. Therefore, not-p) applies to scientific inference: if a prediction
which can be deduced from a generalization (theory) is falsified, then that
generalization itself is false. The rules of deductive logic provide a basis
for scientific rationality and, therefore, make it possible to overcome the
problems of Humean scepticism. Falsifiability distinguishes science from
non-science (the demarcation criterion). The growth of knowledge fol-
lows from an enduring sequence of conjectures and refutations. Theories
are replaced by better, but still fallible, theories. Scientists should remain
critical of their work.
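Schematically, with T a theory and p a prediction deducible from it, the valid and the invalid inference patterns may be contrasted as

\[
(T \rightarrow p) \wedge \neg p \;\vdash\; \neg T, \qquad\text{whereas}\qquad (T \rightarrow p) \wedge p \;\nvdash\; T,
\]

the second schema being the fallacy of affirming the consequent; this is the logical asymmetry between falsification and verification discussed below.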
So far, there seems not much controversial about the conjecturalist
approach. The tentative nature of science is a commonplace. Popper
went beyond the commonplace by constructing a philosophy of science
on it, methodological falsificationism. A source of controversy is
Popper's critique of logical positivism, the philosophy associated with
the Wiener Kreis.[5] A related source is his obnoxious rejection of
induction.
Logical positivism holds that the possibility of empirical verification,
rather than falsification, makes an empirical statement `meaningful' (the
meaning lies in its method of verification). There are many problems with
this view, but Popper aimed his fire at an elementary one: affirmative
universal statements, like `all swans are white', are not verifiable. In
response to Popper's critique, Carnap dropped the verifiability criterion
and started to work on a theory of confirmation (see also chapter 4,
section 3.1). Again, this theory was criticized by Popper.
The logical difference between verification and falsification is straight-
forward. The observation of a white swan does not imply the truth of the
claim `all swans are white'. On the other hand, observing a black swan
makes a judgement about the truth of the claim possible. In other words,
there is a logical asymmetry between verification and falsification. This
asymmetry is central to Popper's ideas: `It is of great importance to
current discussion to notice that falsifiability in the sense of my demarca-
tion principle is a purely logical affair' (Popper, 1983, p. xx; emphasis
added). This logical affair is not helpful in guiding the work of applied
scientists, like econometricians. It should have real-world implications.
For this purpose, Popper suggests the crucial test, a test that leads to the
unequivocal rejection of a theory. Such a test is hard to find in economics.
According to Popper, it is much easier to find confirmations than
falsifications. In the example of swans this may be true, but for economic
theories things seem to be rather different. It is not easy to construct an
interesting economic theory which cannot be rejected out of hand. But if
verification does not make science, Popper needs another argument for
understanding the growth of knowledge. Popper ([1935] 1968, p. 39)
bases this argument on severe testing:
there is a great number – presumably an infinite number – of `logically possible
worlds'. Yet the system called `empirical science' is intended to represent only one
world: the `real world' or `world of our experience' . . . But how is the system that
represents our world of experience to be distinguished? The answer is: by the fact
that it has been submitted to tests, and has stood up to tests.
Experience is the sieve for the abundance of logically possible worlds. The
difference with induction results from a linkage of experience with falsi-
fications: experience performs a negative function in inference, not the
positive one of induction.
Popper's idea that the truth of a theory cannot be proven on the basis
of (affirming) observations is not revolutionary – indeed, it basically
rephrases Hume's argument. Obviously, it was known to the logical
positivists. And it had already been a common-sense notion in the statis-
tical literature for ages (in fact, Francis Bacon had already made the
argument, as shown by Turner, 1986, p. 10). One can find this, explicitly,
in the writings of Karl Pearson, Ronald Aylmer Fisher, Harold Jeffreys
(see the epigraph to this chapter), Jerzy Neyman and Egon Pearson,[6]
Frederick Mills, Jan Tinbergen,[7] Tjalling Koopmans (1937) and probably
many others. They did not need philosophical consultation to gain this
insight, neither did they render it a philosophical dogma according to
which falsification becomes the highest virtue of a scientist.[8]
Econometricians are, in this respect, just like other scientists: they rarely
aim at falsifying, but try to construct satisfactory empirical models (see
Keuzenkamp and Barten, 1995). Of course, `satisfactory' needs to be
defined, and this is difficult.
Jeffreys' epigraph to this chapter can be supplemented by a remark
made by the theoretical physicist Richard Feynman (1965, p. 160): `gues-
sing is a dumb man's job'. A machine fabricating random guesses may be
constructed, consequences can be computed and compared with observa-
tions. Real science is very different: guesses are informed, sometimes
resulting from theoretical paradoxes, sometimes from experience and
experiment. Jeffreys argues that one may agree with Popper's insight
that confirmation is not the same as proof, without having to conclude
that confirmation (or verification) is useless for theory appraisal, and
induction impossible.
5.1.2 The crucial test
An important example to illustrate Popper's ([1935] 1968) meth-
odological falsificationism is Einstein's general theory of relativity, which
predicts a red shift in the spectra of stars.[9]
This is the typical example of a
prediction of a novel fact which can be tested. Indeed, a test was per-
formed with a favourable result. But Paul Feyerabend (1975, p. 57, n. 9)
shows that Einstein would not have changed his mind if the test had been
negative. In fact, many of Popper's examples of crucial tests in physics
turn out to be far more complicated when studied in detail (see
Feyerabend, 1975; Lakatos, 1970; Hacking, 1983, chapter 15, agrees
with Lakatos' critique on crucial tests, but criticizes Lakatos for not
giving proper credit to empirical work).
For several reasons, few tests are crucial. First, there is the famous
`Duhem–Quine problem'. Second, in many cases rejection by a `crucial
test' leaves the researcher empty-handed. It is, therefore, unclear what the
implication of such a test should be, if any. Third, most empirical tests
are probabilistic. This makes it hard to obtain decisive inferences (this
will be discussed below).
The Duhem–Quine problem is that a falsification can be a falsification
of anything. A theory is an interconnected web of propositions. Quine
([1953] 1961, p. 43) argues,
Any statement can be held true come what may, if we make drastic enough
adjustments elsewhere in the system. . . . Conversely, by the same token, no state-
ment is immune to revision. Revision even of the logical law of the excluded
middle has been proposed as a means of simplifying quantum mechanics; and
what difference is there in principle between such a shift and the shift whereby
Kepler superseded Ptolemy, or Einstein Newton, or Darwin Aristotle?
For example, rejection of homogeneity in consumer demand (see chap-
ter 8) may cast doubt on the homogeneity proposition, but also point to
problems due to aggregation, dynamic adjustment, the quality of the data
and so on. In the example `all swans are white' the observation of a green
swan may be a falsification, but also evidence of hallucination or proof
that the observer wears sunglasses. The theory of simplicity provides
useful additional insights into the Duhem–Quine thesis (see chapter 5, sec-
tion 3.2).
Second, what might a falsification imply? Should the theory be aban-
doned? Or, if two conflicting theories are tested, how should falsifications
be weighted if both theories have defects? This is of particular interest in
economics, as no economic theory is without anomalies. If induction is
impossible, is the support for a theory irrelevant? Popper ([1963] 1989,
chapter 10) tries to formulate an answer to these questions by means of
the notion of verisimilitude or `truthlikeness'. In order to have empirical
content, verisimilitude should be measurable. But this brings in induction
through the back door (see chapter 3, section 5.3).
In a review of Popper's methodology, Hausman (1988, p. 17) discusses
the first and second problem. He concludes that Popper's philosophy of
science is `a mess, and that Popper is a very poor authority for economists
interested in the philosophy of science to look to'. Hausman shows that a
Popperian either has to insist on logical falsifiability, in which case there
will be no science (everything will be rejected), or has to consider entire
`test systems' (in Popper's vocabulary, `scientific systems'), in which case
severe testing has little impact on the hypothesis of interest. The reason
for the latter is that such a test system combines a number of basic
statements and auxiliary hypotheses. If, as Popper claims, confirmation
is impossible, one is unable to rely on supporting evidence from which
one may infer that the auxiliary hypotheses are valid. A falsification,
therefore, can be the falsification of anything in the test system: the
Duhem–Quine problem strikes with full force. One should be able to
rely on some rational reason for tentative acceptance of `background
knowledge'. Crucial tests and logical falsification are of little interest in
economic inference. The complications of empirical testing, or what
Popper also calls `conventional falsification', are much more interesting,
but Popper is of little help in this regard. Lakatos (1978, pp. 165–6)
reaches a similar conclusion as Hausman:
By refusing to accept a `thin' metaphysical principle of induction Popper fails to
separate rationalism from irrationalism, weak light from total darkness. Without
this principle Popper's `corroborations' or `refutations' and my `progress' or
`degeneration' would remain mere honorific titles awarded in a pure game . . . only
a positive solution of the problem of induction can save Popperian rationalism
from Feyerabend's epistemological anarchism.
Lakatos' own contribution is evaluated in section 5.2.
The third problem of crucial tests, the probabilistic nature of empirical
science, is of particular interest in econometrics. Popper ([1935] 1968,
p. 191) notes that probability propositions (`probability estimates' in
his words) are not falsifiable. Indeed, Popper (p. 146) is aware that this
is an
almost insuperable objection to my methodological views. For although prob-
ability statements play such a vitally important role in empirical science, they turn
out to be in principle impervious to strict falsification. Yet this stumbling block
will become a touchstone upon which to test my theory, in order to find out what
it is worth.
Popper's probability theory is discussed in chapter 3, section 5. There, I
argue that it is unsatisfactory, hence the stumbling block is a painful one.
One objection can already be given. In order to save methodological
falsificationism, Popper proposes a methodological rule or convention
for practical falsification: to regard highly improbable events as ruled
out or prohibited (p. 191; see Watkins, 1984, p. 244 for support, and
Howson and Urbach, 1989, p. 122 for a critique). This is known as
Cournot's rule, after the mathematician and economist Antoine-
Augustin Cournot, who considered highly improbable events as physi-
cally impossible. Cournot's rule has been defended by probability theor-
ists such as Émile Borel and Harald Cramér (1955, p. 156), without
providing a deeper justification. The rule has even been used to support
the hypothesis of divine providence: life on earth is highly improbable
and must be ruled out, if not for the existence of the hand of God.
The problem with the rule is where to draw the line: when does improb-
able turn into impossible? Popper ([1935] 1968, p. 204) argues that a
methodological rule might decree that only reasonably fair samples are
permitted, and that predictable or reproducible (i.e. systematic) devia-
tions must be ruled out. But the concept of a fair sample begs the ques-
tion. If an experiment can be repeated easily (in economics, controlled
experiments are rare and reproducible controlled experiments even more
so), this may be a relatively minor problem, but otherwise it can lead to
insoluble debates. Is a test result, with an outcome that is improbable
given the hypothesis under consideration, a fluke or a straightforward
rejection? If ten coins are tossed 2,048 times, each particular sequence is
extremely improbable, and, by Cournot's principle, should be considered
impossible.[10]
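A rough calculation (assuming the full head-and-tail configuration is recorded at each round) makes the order of magnitude explicit: every round of ten tosses has 2^10 = 1,024 equally likely outcomes, so any particular record of 2,048 rounds has probability

\[
\left(2^{-10}\right)^{2048} = 2^{-20480} \approx 10^{-6165},
\]

so that a literal application of Cournot's rule would declare every possible outcome of the experiment impossible.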
Cournot's rule does not provide sound guidance for the
following question: What will be a falsification of the statement that
most swans are white, or `Giffen goods are rare'? This type of proposition
will be discussed in the treatment of the probabilistic strategy (their
empirical content, not their falsifiability, is of practical interest).
5.1.3 Critical rationalism
Caldwell (1991, p. 28) argues that, despite its drawbacks,
Popper's falsificationism partly captures the spirit of economic inference.
The fact, however, that economists occasionally test theories and some-
times conclude that they reject something, does not imply that methodo-
logical falsificationism is an apt description of this part of economic
inference. Methodological falsificationism deals with a logical criterion
to be applied to logical caricatures of scientific theories (in this sense,
Popper operates in the tradition of the Wiener Kreis). There are neither
crucial tests in economics, nor is there a strong desire for falsifications.
Caldwell continues that falsificationism may be abandoned, but that
Popper's true contribution to scientific method is to be found in critical
rationalism. This is a much weakened version of methodological falsifi-
cationism and merely implies that scientists should be (self-)critical. It
reduces Popper's philosophy to a platitude (some authors indeed claim
that Popper trivializes science, e.g. Feyerabend, 1987). Unlike methodo-
logical falsificationism, critical rationalism purports to be a historically
accurate description of science. By showing that, in a number of histori-
cal cases, good science accords with critical rationalism, Popper suggests
that the stronger programme of methodological falsificationism is sup-
ported. But those case studies provide weak evidence (Hacking, 1983), or
even are `myths, distortions, slanders and historical fairy tales'
(Feyerabend, 1987, p. 185). This does not mean that critical rationalism
is invalid; on the contrary, it is a principle which has been advocated by a
wide range of scientists, before and after the publication of Popper's
views on methodology.
For example, the motto of Karl Pearson ([1892] 1911) is a statement
due to Victor Cousin: `La critique est la vie de la science' (i.e. criticism is
the life of science; see also Pearson, p. 31, among many other places
where he emphasizes the importance of criticism). More or less simulta-
neously with Pearson, Peirce introduced the theory of fallibilism, of
which falsificationism is a special case (see e.g. Popper, [1963] 1989,
p. 228). According to fallibilism, research is stimulated by a state of
unease concerning current knowledge. Research intends to remove this
state of unease by finding answers to scientific puzzles. In Humean vein,
Peirce (1955, p. 356) argues, `our knowledge is never absolute but always
swims, as it were, in a continuum of uncertainty and of indeterminacy'.
Methodological falsificationism is different from Peirce's philosophy, by
actually longing for a state of unease rather than attempting to remove
it.
5.1.4 Historicism
Although Popper's philosophy of science is in many respects
problematic, in particular if applied to the social sciences, he has con-
tributed an important insight to the social sciences that is usually
neglected in discussions of (economic) methodology. This is his critique
of `historicism'. Historicism, Popper ([1957] 1961) argues, starts from the
idea that there is such a thing as a historical necessity of events, or social
predestination. He considers Marx and Hegel as examples of historicists
(although they had a very different interpretation of historicism and
Popper's interpretation of their work is controversial).[11] Disregarding
the historicist merits of Marx and Hegel, this may be Popper's most
interesting book for economists.
Popper's critique of historicism is related to his idea that knowledge is
conjectural and universal theories cannot be proven. This does not pre-
clude growth of knowledge, but this growth results from trial and error.
It cannot be predicted. The ability to predict the future course of society
is limited as it depends on the growth of knowledge.
One of the most interesting parts in the Poverty of Historicism is his
discussion of the so-called Oedipus Effect. Popper (p. 13; see also Popper,
1948) defines it as:
the influence of the prediction upon the predicted event (or, more generally, . . . the
influence of an item of information upon the situation to which the information
refers), whether this influence tends to bring about the predicted event, or whether
it tends to prevent it.
This problem of reflexivity, as it is also known, undermines methodolo-
gical falsificationism. Popper's views on historicism and the special char-
acteristics of the social sciences suggest that falsificationism cannot play
quite the same role in the social sciences as in the natural sciences. Popper
([1957] 1961, pp. 130–43) is ambiguous on this point. He remains con-
vinced of the general importance of his methodology, the unity of
method. Meanwhile, he recognizes the difference between physics and
economics:
In physics, for example, the parameters of our equations can, in principle, be
reduced to a small number of natural constants – a reduction that has been
carried out in many important cases. This is not so in economics; here the para-
meters are themselves in the most important cases quickly changing variables.
This clearly reduces the significance, interpretability, and testability of our mea-
surements. (p. 143)
Popper swings between the strong imperatives of his own methodology,
and the more reserved opinions of his former colleagues at the London
School of Economics (LSE), Lionel Robbins and Friedrich Hayek.
Popper's compromise consists of restricting the domain of the social
sciences to an inquiry for conditional trends or even singular events,
given an a priori axiom of full rationality.[12]
Popper calls this the zero
method. Methodological falsificationism may be applied to this con-
struct, but the rationality postulate is exempted from critical scrutiny.
It is an a priori synthetic truth (see also Caldwell, 1991, pp. 19–21).
The initial conditions may change and this may invalidate the conti-
nuation of the trend. Hence, social scientists should be particularly inter-
ested in an analysis of initial conditions (or situational logic). The
difference between prognosis and prophecy is that prophecies are uncon-
ditional, as opposed to conditional scientific predictions (Popper, [1957]
1961, p. 128). In the social sciences, conditions may change due to the
unintended consequence of human behaviour (this is a cornerstone of
Austrian economic thought). Mechanical induction, like extrapolation of
trends, is, therefore, not a very reliable way of making forecasts. The
econometric implications of the Oedipus Effect and the lack of natural
constants in economics deserve more attention. These implications are of
greater interest to the economist or econometrician than the methodology
of falsificationism or Popper's ideas on probability.
5.2 Lakatos and conjecturalism
What does Lakatos offer to rescue the conjecturalist strategy? Because of
the problems involved with methodological falsificationism, he proposes
to give falsifications less impact. He rejects the crucial test or `instant
falsification', and instead emphasizes the dynamics of theory
development.
5.2.1 Research programmes
This dynamics can be evaluated by considering a theory as a part
of an ongoing research programme. A theory is just one instance of a
research programme, RP, at a given point in time. How do you decide
whether a succeeding theory still belongs to an RP? This question should
be settled by defining the essential characteristics of an RP, by its hard
core and the guidelines for research, the heuristic. The hard core consists
of the indisputable elements of a research programme. The positive heur-
istic provides the guidelines along which research should proceed. The
negative heuristic of a research programme forbids directing the modus
tollens at the hard core (Lakatos, 1978, p. 48).
According to Lakatos (p. 48), the hard core of an RP is irrefutable by
the methodological decision of its proponents. A falsification of the the-
ory is not automatically a rejection of an RP. Falsifying a theory is
replaced by measuring the degree of progressiveness of an RP. Lakatos
(p. 33) distinguishes three kinds of progress. A research programme is
- theoretically progressive if `each new theory has excess empirical con-
tent over its predecessor, that is, if it predicts some novel, hitherto
unexpected fact'
- empirically progressive if some of these predictions are confirmed
- heuristically progressive if it avoids auxiliary hypotheses that are not in
the spirit of the heuristic of a research programme (Lakatos would call
such hypotheses ad hoc₃, where ad hoc₁ and ad hoc₂ denote lack of
theoretical and empirical progressiveness, respectively).
It is not easy in economics to apply Lakatos' suggestion of appraising
theories by comparing their rate of progressiveness or degeneration. A
research programme is a vague notion. Scientists may disagree about
what belongs to a specic RP and what does not (see Feyerabend,
1975). The problem culminates in the so-called tacking paradox (see
Lakatos, 1978, p. 46). If the theory of diminishing marginal utility of
money is replaced by a successor, which combines this theory with the
general theory of relativity, an apparently progressive step is being made.
Of course, this is not what Lakatos has in mind: this is why he emphasizes
the consistency with the positive heuristic of an RP. An alternative for
avoiding nonsensical combination of two theories is to introduce the
notion of irrelevant conjunction (see Rosenkrantz, 1983, for a discussion
and a Bayesian solution to the problem).
Lakatos' suggestions are of some help for understanding (`rationally
reconstructing') economic inference, but in many cases they are too vague
and insufficiently operational. Chapter 8 provides a case study showing
how difficult it is to apply them to an important episode in the history of
applied econometrics: testing homogeneity of consumer demand.
Competing research programmes may apply to partly non-overlapping
areas of interest. This leads to problems such as the already mentioned
Duhem–Quine problem, and incommensurability, Thomas Kuhn's (1962)
notion that new theories yield new interpretations of events, and even of
the language describing these events. Furthermore, whereas Popper still
made an (unsuccessful) attempt to contribute to the theory of probabil-
istic inference (cf. his propensity theory of probability, and the notion of
verisimilitude), Lakatos has a critical attitude towards this subject.
5.2.2 Growth and garbage
Both Popper and Lakatos attack inductivism, but Lakatos is
more radical in his critique of empirical tests. These tests are a key ele-
ment in Popper's epistemology, but not so in Lakatos'. This is clear from
the Methodology of Scientific Research Programmes, in which falsifica-
tions are less crucial than in Popper's work. Instead of falsifiability and
crucial tests, Lakatos advocates a requirement of growth. Can statistical
methods help to obtain knowledge about the degree of empirical progress
of economic theories or research programmes? Lakatos' scant remarks on
this issue provide little hope (all quoted from Lakatos, 1970, p. 176). To
start, the requirement of continuous growth
hits patched-up, unimaginative series of pedestrian `empirical' adjustments which
are so frequent, for instance, in modern social psychology. Such adjustments may,
with the help of so-called `statistical techniques', make some `novel' predictions
and may even conjure up some irrelevant grains of truth in them. But this theo-
rizing has no unifying idea, no heuristic power, no continuity.
These uncharitable statements are followed by such terms as worthless,
phoney corroborations, and, finally, pseudo-intellectual garbage. Lakatos
concludes:
Thus the methodology of research programmes might help us in devising laws for
stemming this intellectual pollution which may destroy our cultural environment
even earlier than industrial and traffic pollution destroys our physical environ-
ment.
Whoever is looking for a Lakatosian theory of testing will have a hard
job. Lakatos' approach is nearly anti-empirical. In his case studies of
physics, experimenters are repeatedly `taught lessons' by theoreticians;
bold conjectures are made despite seemingly conflicting empirical evi-
dence, and so on (see Hacking, 1983, chapter 15, for a devastating cri-
tique of Lakatos' account of many of these experiments).
Lakatos' approach is of little help for economists and, as I will argue
later on (chapter 8), it does not provide a basis for a useful econometric
methodology. If testing and falsifying of all propositions are approached
rigorously, not much of economics (or any other science) will be left. On
the other hand, if induction is rejected and severe testing as well, one ends
up with `anything goes' or `anarchism in disguise' (Feyerabend, 1975,
p. 200). Watkins (1984, p. 159), the saviour of methodological falsifica-
tionism, agrees with this characterization of Feyerabend and makes the
following reductio ad absurdum:
If you could tell, which you normally cannot, that Research Program 2 is doing
better than Research Program 1, then you may reject Research Program 1, or, if
you prefer, continue to accept Research Program 1.
What is missing in the conjecturalist strategy is a clear view on the utility
of theories, the economy of scientific research (emphasized by Peirce) and
the positive role of measurement and evidence. A theory of inductive
reasoning remains indispensable for understanding science.
6 Probabilism
The probabilistic strategy may offer garbage, if applied badly (as Lakatos
suggests) but it may also yield insights into scientific inference: in its foun-
dations and in appraising econometric applications.
This brings us to the probabilist strategy, which claims that Hume
wants too much if he requires a proof for the truth of inductive infer-
ences, and amends proposition (iii) of section 2 above. A logic of `partial
entailment' is proposed: probability logic. This strategy has been inves-
tigated by John Maynard Keynes, Harold Jeffreys, Hans Reichenbach,
Rudolf Carnap and many others. Alternatively, the problem of Humean
scepticism may be resolved by providing a probabilistic underpinning of
the principle of the uniformity of nature, which has been investigated
with the Law of Large Numbers. This approach has been taken by
Richard von Mises.
The following three chapters discuss these and other versions of a
probabilistic strategy for scientic inference. The probabilistic strategy
deserves special attention because it can serve as a foundation for econo-
metric inference, the topic of this book. Furthermore, uncertainty is
particularly important in economics. If falsifying economic theories is
feasible at all, then it must be probabilistic falsification. The issue of
probabilistic testing can only be well understood if a good account of
probabilism is presented (note that testing is clearly neither the only nor
the ultimate aim of probabilism). If probabilistic falsification is impossi-
ble, then other methods are needed for appraising economic theories given
the uncertainty that is inextricably bound up with economics.
Notes
1. There are some excellent introductions to the philosophy of science, written
for and by economists. Blaug (1980) and Caldwell (1982) are particularly
strong on methodology, but both have little to say about econometrics.
Darnell and Evans (1990) combine methodology and econometrics, but
their treatment is brief and sometimes ambiguous. More recent developments
in methodology can be found in Backhouse (1994).
2. The phrase is due to C. D. Broad (see Ramsey, 1926, p. 99; and Hacking,
1975, p. 31). Frank Ramsey (1926, pp. 98–9) denies that there is no answer to
Hume's problem, but `Hume showed that it [i.e. inductive inference] could
not be reduced to deductive inference or justied by formal logic. So far as it
goes his demonstration seems to be nal; and the suggestion of Mr Keynes
that it can be got round by regarding induction as a form of probable infer-
ence cannot in my view be maintained. But to suppose that the situation
which results from this is a scandal to philosophy is, I think, a mistake.'
3. The problem is not trivial. Keynes ([1921] CW VIII, p. 418) notes how
Laplace calculated that, `account be taken of the experience of the human
race, the probability of the sun's rising tomorrow is 1,826,214 to 1, this large
number may seem in a kind of way to represent our state of mind of the
matter. But an ingenious German, Professor Bobek, has pushed the argument
a degree further, and proves by means of these same principles that the
probability of the sun's rising every day for the next 4000 years, is not
more, approximately, than two-thirds, a result less dear to our natural
prejudices.' See also Pearson ([1892] 1911, p. 141) for a discussion of Laplace
and the probability of sunrise.
4. A proposition is analytic if the predicate is included in the subject (e.g. all
econometricians are human). A synthetic proposition is not analytic (usually
thought to be based on matters of fact; e.g. all econometricians are wise).
Quine ([1953] 1961) contains a classic critique of the distinction.
5. The Wiener Kreis was the influential group of scientists who met regularly in
Vienna during the 1920s and 1930s. In 1922 the group was formed by Moritz
Schlick. Some prominent members were Rudolf Carnap, Otto Neurath and
Hans Hahn. Other regular participants were Herbert Feigl, Philipp Frank,
Kurt Gödel, Friedrich Waismann and Richard von Mises. The group built on
the positivist doctrines of Henri Poincaré, and in particular the Austrian
physicist and philosopher Ernst Mach. In addition, they used advances in
logic due to Gottlob Frege, to Russell and Whitehead, whence logical posi-
tivism. A closely related view is logical empiricism, associated with Hans
Reichenbach and, again, Carnap. Caldwell (1982) defines logical empiricism
as the mature version of logical positivism. The different branches of
twentieth-century positivism are often grouped as neo-positivism. Popper is
occasionally associated with neo-positivism, but Hacking (1983, p. 43) argues
convincingly that Popper does not qualify as a positivist.
6. Although with a twist in their case, as they are interested in behaviour rather
than inference.
7. Tinbergen (1939a, p. 12) argues that econometrics cannot prove a theory
right, but it may show that some theories are not supported by the data.
He never refers to Popper, or any other philosopher (Popper, [1957] 1961,
on the other hand, contains a reference to Tinbergen who notes that con-
structing a model is a matter of trial and error). Koopmans makes a remark
similar to Tinbergen's. If researchers still speak of verication of theories,
then this should not be taken literally: very few would deny Hume's argu-
ment.
8. According to Mark Blaug, an important difference between Popper and those
statisticians is Popper's banning of immunizing stratagems. Whether an abso-
lute ban would benefit science is doubtful. See Keuzenkamp and McAleer
(1995) for a discussion of `ad hocness' and inference, in particular the refer-
ences to Jeffreys.
9. Popper regards Einstein as the best example of a scientist who took falsifica-
tion seriously, because of his bold predictions. Einstein has the highest num-
ber of entries in the name index of Popper ([1935] 1968) (thirty-six times).
Carnap (with a score of thirty-ve) is a close second, but does not share in
Popper's admiration.
10. The example is not imaginary. W. Stanley Jevons performed this experiment
to `test' Bernoulli's law of large numbers. This and more laborious though
equally meaningless experiments are reported in Keynes ([1921] CW VIII,
pp. 394-9).
11. Hegel, for example, believed in the historical relativity of truth, a tenable
position from the point of view presented in this book.
12. `[A]nd perhaps also on the assumption of the possession of complete infor-
mation' (Popper, [1957] 1961, p. 141), a remarkably un-Austrian assump-
tion!
2 Probability and indifference
One regards two events as equally probable when one can see no reason
that would make one more probable than the other.
Laplace (cited in Hacking, 1975, p. 132)
1 Introduction
Are scientific theories `probabilifiable'? Lakatos (1978, p. 20), who
phrases this question, says no. According to the probabilistic response
to Humean scepticism, however, the answer should be affirmative,
although the justification for the affirmative answer is problematic. The
crossroads of philosophy of science and probability theory will be the
topic of chapters 2 to 4.
There exist different versions of probabilism. The roots of modern
probability theory lie in Laplace's indifference theory of probability,
which has been supported by economists such as John Stuart Mill and
W. Stanley Jevons. Laplace's theory is complemented by the theory of the
English Reverend Thomas Bayes. Their interpretation of probability,
which is based on `equally probable' events, will be presented in chap-
ter 2, section 2.
Since the days of Bayes and Laplace, confusion has existed about
whether probability refers to a property of events themselves, or beliefs
about events. For this purpose, it is useful to make a distinction between
the realist or aleatory interpretation of probability on the one hand, and
the epistemological interpretation of probability on the other hand
(Hacking, 1975).
Aleatory stems from the Latin word for `die' (alea jacta est). The
aleatory view has its roots in the study of games of chance, such as
playing cards and casting dice. It relates probability to the occurrence
of events or classes of events. The relative frequency theory of probability
(in brief, frequency theory) is based on this view. The propensity theory,
which regards probability as a physical property, also belongs to this
interpretation. Both of these aleatory interpretations of probability are
discussed in chapter 3. The frequency theory of probability dominates
econometrics. An unfortunate misnomer, frequently used in this context,
is `classical econometrics (statistics)'. Classical statistics is the proper
label for the indifference interpretation.
The epistemological view, which underlies Bayesian econometrics, is
discussed in chapter 4. This view is concerned with the credibility of pro-
positions, assertions and beliefs in the light of judgement or evidence
(Hacking, 1975, p. 14). Within the epistemological view, there exist `objec-
tive' (or logical) and `subjective' (or personalistic) interpretations. The the-
ory of Keynes is an example of a logical probability theory, while Frank
Ramsey, Bruno de Finetti and Leonard (`Jimmie') Savage developed sub-
jective theories of probability. Subjective theories of probability enable
dealing with cognitive limitations to human knowledge and inference.
Although most of modern probability theory is based on a formalism
due to A. N. Kolmogorov (presented in the Intermezzo), applied prob-
ability theory (statistical theory) cannot do without an interpretation of
this formalism. Chapters 2 to 4 deal with this interpretation. It is an
essential prerequisite for a proper understanding of different schools in
econometrics and for an appreciation of methodological controversies
among econometricians.
2 The indifference theory of probability
2.1 Indifference
The indifference or classical theory of probability dates back to
Christiaan Huygens, Gottfried von Leibniz, Jacques Bernoulli and, espe-
cially, Pierre Simon de Laplace (see Hacking, 1975, pp. 126-127; von
Mises, [1928] 1981, p. 67). This theory defines probability as the ratio
of the number of favourable outcomes to the total number of equally
likely outcomes.
More precisely, if a random experiment can result in n mutually exclu-
sive and equally likely outcomes and if n_A of these outcomes have an
attribute A, then the probability of A is

P(A) = \frac{n_A}{n}.    (1)
Note that this interpretation, if taken literally, restricts probabilities to
rational numbers (the same applies to the frequency interpretation). This
is not quite satisfactory, because it excludes the appraisal of two single
independent and equally likely events if their joint probability is 1/2 (the
single probability of each event, 1/√2, has neither an indifference nor a
frequency interpretation).
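Definition (1) can be made concrete by enumeration. The following small sketch (in Python; the die example and all identifiers are illustrative additions, not part of the argument) counts favourable outcomes among equally likely ones and shows that such a probability is necessarily a rational number.

```python
from fractions import Fraction

# Equally likely outcomes of one throw of a die (the indifference assumption).
outcomes = [1, 2, 3, 4, 5, 6]

# Attribute A: the outcome is an even number.
favourable = [x for x in outcomes if x % 2 == 0]

# Definition (1): P(A) = n_A / n, by construction a ratio of integers.
p_A = Fraction(len(favourable), len(outcomes))
print(p_A)  # 1/2
```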
The indifference theory of probability has more serious drawbacks,
and, therefore, has little support today. Even so, a discussion is useful
for two reasons. First, the problems of the indifference theory stimulated
the formulation of other theories of probability. Second, the indifference
theory is relevant for understanding the problem of non-informative
priors in Bayesian statistics.
The name `principle of indifference' is due to Keynes ([1921] CW VIII),
who preferred this label to `the principle of insufficient reason' by which it
was known before (see Hacking, 1975, p. 126). Jacques Bernoulli intro-
duced the principle, which states that events are equally probable if there
is no known reason why one event (or alternative) should occur (or be
more credible) rather than another (Keynes, [1921] CW VIII, p. 44). This
formulation allows an epistemological as well as an aleatory or realist
interpretation. I will continue with the latter, but all drawbacks which
will be discussed here apply to the epistemological one as well.
2.2 Objections
A first objection to the indifference interpretation of probability is that
the definition is either circular, which reduces the definition of probability
to a tautology, or incomplete. The circularity arises if `equally likely' is
identified with `equally probable'. The probability of throwing heads with
an unbiased coin is 1/2, because this is how unbiasedness is defined: each
possible outcome is equally probable or equally likely. If, however, it is
unknown whether the coin is unbiased, then it is not clear how the
principle of indifference should be applied. A related objection, expressed
by von Mises ([1928] 1981, pp. 69-70), is that it is often unclear what is
meant by equally likely:
according to certain insurance tables, the probability that a man forty years old
will die within the next year is 0.011. Where are the `equally likely cases' in this
example? Which are the `favourable' ones? Are there 1000 different possibilities,
eleven of which are `favourable' to the occurrence of death, or are there 3000
possibilities and thirty-three `favourable' ones? It would be useless to search the
textbooks for an answer, for no discussion on how to define equally likely cases in
questions of this kind is given.
Von Mises concludes that, in practice, the notion of a priori `equally
likely' is substituted by some known long-run frequency. This step is
unwarranted unless the foundation of probability also changes from
indifference (or lack of knowledge) to frequencies (or abundance of
knowledge) in a very precise setting in which a proper account of ran-
domness is given (see the following chapter).
A second objection arises if the experiment is slightly complicated, by
tossing the coin twice. What is the probability of throwing two heads?
One solution, that works with partitions, would be 1/3, for there are three
possibilities: two heads, two tails and a head and a tail. Today, this
answer is considered wrong: the correct calculation should be based on
the possible number of permutations. The mistake (or perhaps one
should say `different understanding') has been made by eminent scholars
(for example Leibniz, see Hacking, 1975, p. 52), but even today it is not
always clear how to formulate the set of equally possible alternatives
(Hacking mentions a more recent but similar problem in small particle
physics, where in some cases partitions, and in other cases permutations,
have proved to be the relevant notion).
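The difference between the two ways of counting can be made explicit by enumeration. The sketch below (Python, illustrative only) lists the ordered outcomes (permutations) and the unordered ones (partitions) for two tosses of a coin, and computes the probability of two heads under each counting rule.

```python
from itertools import product

# Permutations: ordered outcomes of two tosses, each equally likely.
permutations = list(product('HT', repeat=2))           # HH, HT, TH, TT
p_two_heads = permutations.count(('H', 'H')) / len(permutations)
print(p_two_heads)                                     # 0.25

# Partitions: unordered outcomes (two heads, one of each, two tails),
# wrongly treated as equally likely in the partition argument.
partitions = {tuple(sorted(outcome)) for outcome in permutations}
print(1 / len(partitions))                             # 0.333..., Leibniz's answer
```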
Third, the principle of indifference implies that, if there is no reason to
attribute unequal probabilities to different events, the probabilities must
be equal. This generates uniform probability distributions: in fact, all
probability statements based on the indifference theory are ultimately
reduced to applications of uniform distributions. But again, what should
be done if all that is known is that a coin is biased, without knowing in
which direction?
Fourth, the application of a uniform distribution gives rise to para-
doxes such as Bertrand's paradox (named after the mathematician Joseph
Bertrand who discussed this paradox in 1889; see Keynes, [1921] CW
VIII, p. 51; van Fraassen, 1989, p. 306). Consider figure 1, where an
equilateral triangle ABC is inscribed in a circle with radius OR. What
is the probability that a random chord XY is longer than one of the sides
of the triangle? There are (at least) three alternative solutions to the
problem if the principle of indifference is applied:
(i) Given that one point of the chord is fixed at position X on the circle,
we are indifferent as to where on the circle the other end of the
chord lies. If Y is located between 1/3 and 2/3 of the circumference
away from X, then from the interval [0,1] the interval [1/3,2/3] pro-
vides the favourable outcomes. Hence the probability is 1/3.
(ii) We are indifferent as to the point Z at which the chord crosses the
radius of the circle orthogonally. As long as OZ is less than half the radius,
the chord is longer than the side of the triangle. Hence, the prob-
ability is 1/2.
(iii) We are indifferent with respect to the location of the middle point of
the chord. In the favourable cases, this point lies in the concentric
circle with half the radius of the original circle. The area of the inner
circle is 1/4 of the area of the full circle, hence the probability is 1/4.
Obviously, the solutions are contradictory. This is one of the most per-
suasive arguments that has been raised against the indifference theory of
probability (see also the discussion of prior probability in chapter 4,
section 5). Von Mises gives another version of this paradox, which
deals with the problem of inferring the ratio of wine and water in a
glass that contains a wine and water mixture. He concludes that the
principle of indifference cannot be sustained. However, Jaynes (1973)
shows that problems such as Bertrand's do have non-arbitrary solutions,
if careful thought is given to all the invariance requirements involved.
The greatest problem with the indifference argument is that it needs a lot
of thought before it can be applied, and there may be cases where para-
doxes will remain unsolved.
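The three answers are easily reproduced by simulation. In the sketch below (Python; the sampling rules correspond to the three indifference assumptions (i)-(iii) above, and the circle has unit radius so that the side of the inscribed triangle is √3), each rule generates random chords and the proportion longer than the side is reported.

```python
import math, random

N = 100_000
side = math.sqrt(3)            # side of the inscribed equilateral triangle (unit circle)

def by_endpoint():             # (i) fix X, draw the other endpoint uniformly on the circle
    theta = random.uniform(0, 2 * math.pi)
    return 2 * math.sin(theta / 2)

def by_radius():               # (ii) draw the point Z uniformly along a radius
    d = random.uniform(0, 1)
    return 2 * math.sqrt(1 - d ** 2)

def by_midpoint():             # (iii) draw the chord's midpoint uniformly over the disc
    r = math.sqrt(random.uniform(0, 1))
    return 2 * math.sqrt(1 - r ** 2)

for name, chord in (('endpoint', by_endpoint), ('radius', by_radius),
                    ('midpoint', by_midpoint)):
    share = sum(chord() > side for _ in range(N)) / N
    print(name, round(share, 3))   # approximately 1/3, 1/2 and 1/4 respectively
```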
3 The rule of succession
How does the indifference theory of probability provide a basis for induc-
tive reasoning? The probability a priori, based on insufficient reason, is
mapped into a probability a posteriori by Laplace's rule of succession. No
other formula in the alchemy of logic, Keynes ([1921] CW VIII, p. 89)
writes, has exerted more astonishing powers. The simplest version of this
rule is as follows (for a formal derivation, see Jeffreys, [1939] 1961,
pp. 127-8; Cox and Hinkley, 1974, p. 368). If there is evidence from a
sample of size n in which m elements have the characteristic C, we may
infer from this evidence that the probability of the next element having
characteristic C is equal to

P(C) = \frac{m+1}{n+2}.    (2)
Figure 1. An equilateral triangle ABC inscribed in a circle with centre O (Bertrand's paradox).
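A small computation illustrates equation (2); the function name is an illustrative addition, and the number of days is the one implied by the odds quoted in chapter 1, note 3.

```python
from fractions import Fraction

def rule_of_succession(m, n):
    """Probability that the next element has characteristic C, given m
    occurrences of C in a sample of size n (equation (2))."""
    return Fraction(m + 1, n + 2)

# Laplace's sunrise: the sun has risen on every recorded day.
days = 1_826_213                        # sample size implied by odds of 1,826,214 to 1
print(rule_of_succession(days, days))   # 1826214/1826215

# With no evidence at all (n = 0), the rule returns 1/2.
print(rule_of_succession(0, 0))         # 1/2
```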
An interesting implication of the rule of succession is that sampling will
only yield a high probability that a population is homogeneous if the
sample constitutes a large fraction of the whole population (Jeffreys,
[1939] 1961, p. 128). This is an unpleasant result for those with inductive
aims (homogeneity of outcomes is an important condition for frequency
theorists, like von Mises, and for proponents of epistemological prob-
ability, like Keynes and Jeffreys; see chapters 3 and 4). It results from
using a uniform prior probability distribution over numerous (or infinite)
possible outcomes. Keynes' principle of limited independent variety (see
chapter 4, section 2) and the Jeffreys-Wrinch `simplicity postulate' (dis-
cussed in chapter 5) are responses to this problem.
The validity of the rule of succession is controversial. Venn, Bertrand
and Keynes reject the rule, while authors such as Jevons and Pearson
support it.1 Carnap generalizes the rule (see chapter 4, section 3.2). If n
increases, the probability converges to the relative frequency, which
yields the `straight rule'. This rule sets the probability equal to the relative
frequency.2 On the other hand, if n = 0, then we may conclude that the
probability that a randomly chosen man is named Laplace is equal to 1/2.
The rule of succession raises the issue as to how to classify events in
equiprobable categories.
Relying on the principle of indifference is likely to yield suspect infer-
ences in economics. Take, for example, the dispute about the source of
long-term unemployment: heterogeneity versus hysteresis. Heterogeneity
means that a person who is long-term unemployed is apparently someone
of low quality. Hysteresis means that unemployment itself reduces the
chance of obtaining a new job, because the unemployed person loses
human capital. Are these hypotheses a priori equiprobable? Perhaps,
but it is very difficult to give a satisfactory justification for this choice
(but see chapter 4). Moreover, how may evidence for different countries
be used in a rule of succession? In Britain, hysteresis may provide the best
explanation for unemployment spells, in the USA it may be heterogene-
ity. But the evidence is not beyond doubt, and inserting the cases in a rule
of succession would be meaningless.
4 Summary
Only in rare cases can one establish a priori whether events are equally
likely. When this is not possible, the indifference theory of probability
may result in paradoxes, notably Bertrand's. A related problem arises
with respect to classification of events. Still, the indifference approach has
been valuable, as it inspired von Mises to develop his frequency theory of prob-
ability, and Keynes to develop his epistemological theory.
As an inductive principle, the method of indifference has yielded
Laplace's rule of succession. This rule has clear drawbacks, but still
figures in the background of some more advanced interpretations of
probability, such as the one proposed by Carnap. After an intermezzo,
I will now turn to these alternative interpretations: the frequency theory
(chapter 3) and the epistemological theory (chapter 4).
Notes
1. The rule of succession is discussed in Keynes ([1921] CW VIII, chapter 30).
Keynes' quibble about Laplace's calculations of the probability of a sunrise
(see chapter 1 footnote 3, and Keynes, pp. 417-18) is the result of an abuse of
this rule: the elements in the sample should be independent, which is clearly
not the case here. Other discussions can be found in Fisher (1956, p. 24),
Jeffreys ([1939] 1961, p. 127) and Howson and Urbach (1989, pp. 42-4).
2. Cohen (1989, p. 96) attributes the straight rule to Hume. Jeffreys ([1939] 1961,
p. 370) attributes it to De Moivre's Doctrine of Chances (the first edition of
this work appeared in 1718, when Hume was six years old). Jeffreys (p. 369)
suggests that, more recently, Jerzy Neyman supports the straight rule, and
criticizes Neyman. Neyman's (1952, pp. 10-12) reply shows that Jeffreys'
suggestion is valid, but not all of his comments are.
I Intermezzo: a formal scheme of
reference
In this intermezzo several basic concepts of probability are explained.
Knowledge of these concepts is necessary for a good understanding of
the remainder of the book. Readers who have such knowledge already
may skip this intermezzo without a loss of understanding.
1 Probability functions and Kolmogorov's axioms
A. N. Kolmogorov provided the currently still-dominating formalism of
probability theory based on set theory. Kolmogorov (1933, p. 3) inter-
prets his formalism along the lines of von Mises' frequency theory, but
the same formalism can be used for other interpretations (a useful intro-
duction is given by Mood, Graybill and Boes, 1974, pp. 8-40, on which
much of this section is based; a short discussion of the relation between
sets and events can be found in Howson and Urbach, 1989, pp. 17-19).
First, the notions of sample space and event should be defined. The
sample (or outcome) space Ω is the collection of all possible outcomes of a
conceptual experiment.1 An event A is a subset of the sample space. The
class of all events associated with a particular experiment is defined to be
the event space, 𝒜. Ω is also called the sure event. The empty set is
denoted by ∅.
A probability function (or measure) P(·) is a set function with domain
𝒜 and counterdomain the interval on the real line, [0, 1], which satisfies
the following axioms (Kolmogorov, 1933, pp. 2 and 13):
(1) 𝒜 is an algebra of events
(2) Ω ∈ 𝒜
(3) P(A) ≥ 0 for every A ∈ 𝒜
(4) P(Ω) = 1
(5) If A_1, A_2, ..., A_n is a sequence of mutually exclusive events in 𝒜
(i.e. \bigcap_{i=1}^{n} A_i = \emptyset) and if A_1 ∪ A_2 ∪ ... ∪ A_n = \bigcup_{i=1}^{n} A_i ∈ 𝒜, then

P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = \sum_{i=1}^{n} P(A_i).

(6) If A_1, A_2, ..., A_n, ... is a monotone decreasing sequence of events in 𝒜,
i.e. A_1 ⊇ A_2 ⊇ ... ⊇ A_n ⊇ ..., with \bigcap_{n} A_n = \emptyset, then

\lim_{n \to \infty} P(A_n) = 0.
The axioms have been challenged by various probability theorists (such
as von Mises and De Finetti). Others thought them of little interest (for
example, R. A. Fisher). For the purpose of this book, they are relevant,
precisely because various authors have different opinions of their mean-
ing. I will briefly discuss them below.
Defining the event space as an algebra (or field) implies that if A_1 and
A_2 are events (i.e. they belong to 𝒜), then also generalized events like
A_1 ∪ A_2 and A_1 ∩ A_2 belong to 𝒜. The use of algebras makes application
of measure theory possible. Most of the interesting concepts and pro-
blems of probabilistic inference can be explained using probability func-
tions without using measure theory.2
The second axiom states that the sample space itself belongs to the event
space. A direct result of axioms 1 and 2 is that the empty set, ∅, is an
event as well. The third axiom restricts the counterdomain of a probabil-
ity function to the non-negative real numbers. The fourth axiom states
that the sure event has probability one. Some authors (e.g. Jeffreys, [1939]
1961) do not accept this axiom: they hold it as a convention which is
often useful but sometimes is not. The fifth axiom is that of (finite)
additivity, which is the most specific one for a probability function.
Axiom 6, perhaps the most controversial of these axioms (rejected, for
example, by de Finetti), is called the continuity axiom and is necessary for
dealing with infinite sample spaces. The fifth and sixth axioms jointly
imply countable additivity. It is easy to construct a sequence of events
that generates a relative frequency which is not countably additive (see
e.g. van Fraassen, 1980, pp. 184-6, for discussion and further references).
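For a finite sample space the axioms can be checked mechanically. The sketch below (Python; the fair die and the equal weights are illustrative assumptions) takes the event space to be the set of all subsets of Ω and verifies axioms 3 to 5 directly.

```python
from itertools import chain, combinations
from fractions import Fraction

# A fair die: a finite sample space with equal elementary probabilities.
omega = frozenset({1, 2, 3, 4, 5, 6})
weight = {w: Fraction(1, 6) for w in omega}

def P(event):
    """The probability measure induced by the elementary weights."""
    return sum(weight[w] for w in event)

# The event space: all subsets of omega, which trivially forms an algebra.
events = [frozenset(c) for c in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

assert all(P(A) >= 0 for A in events)            # axiom 3
assert P(omega) == 1                             # axiom 4
A, B = frozenset({1, 2}), frozenset({5, 6})      # two mutually exclusive events
assert P(A | B) == P(A) + P(B)                   # axiom 5 (finite additivity)
```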
2 Basic concepts of probability theory
2.1 Probability space and conditional probability
The triplet (Ω, 𝒜, P(·)) is the so-called probability space. If A and B are
two events in the event space 𝒜, then the conditional probability of A
given B, P(A|B), is defined by

P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{if } P(B) > 0.    (I.1)
2.2 Bayes' theorem, prior and posterior probability
Combining (I.1) with a similar expression for P(B|A) yields Bayes'
Theorem:

P(A|B) = \frac{P(B|A)P(A)}{P(B)}.    (I.2)
In fact, the English Reverend Thomas Bayes was not the one who
authored this theorem: his essay, posthumously published in 1763 (see
Kyburg and Smokler, [1964] 1980), contains a special case of the theo-
rem. A less restricted version (still for equiprobable events) of the theo-
rem can be found in Pierre Simon Marquis de Laplace's memoir on `the
probability of causes', published in 1774. The general version of Bayes'
theorem was given in the Essai philosophique (Laplace [1814] 1951; see
also Jaynes, 1978, p. 217). I will stick to the tradition of calling (I.2)
Bayes' theorem. It is also known as the principle of inverse probability,
as a complement to the direct probability, P(B|A). Sometimes, a direct
probability is regarded as the probability of a cause of an event. The
indirect probability infers from events to causes (see Poincaré, [1903]
1952, chapter 11). The relation between probability and causality will
be discussed in chapter 9, section 4.
A few remarks can be made. If an event is a priori extremely improb-
able, i.e. P(A) approaches zero, then no finite amount of evidence can
give it credibility. If, on the other hand, an event B is very unlikely given
A, i.e. P(B[A) approaches zero, but is observed, then A has very low
probability a posteriori. These considerations make the theorem a natural
candidate for analysing problems of inference. The theorem as such is not
controversial. Controversial is, however, what is known as Bayes' Rule or
Axiom to represent prior ignorance by a uniform probability distribu-
tion. The elicitation of prior probabilities is the central problem of
Bayesian inference.
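A small numerical sketch of (I.1) and (I.2) may be helpful; the probabilities used here are made up purely for illustration.

```python
from fractions import Fraction

P_A = Fraction(1, 5)           # prior probability P(A)
P_B_given_A = Fraction(3, 4)   # the `direct' probability P(B|A)
P_B = Fraction(2, 5)           # P(B)

# Bayes' theorem (I.2): the `inverse' probability of A once B is observed.
P_A_given_B = P_B_given_A * P_A / P_B
print(P_A_given_B)             # 3/8

# Consistency with definition (I.1): P(A|B) = P(A and B) / P(B).
P_A_and_B = P_B_given_A * P_A
assert P_A_given_B == P_A_and_B / P_B
```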
2.3 Independence and random variable
A few more concepts need to be defined. One is independence. Events A
and B are stochastically independent if and only if P(A ∩ B) = P(A)P(B).
Hence, if A and B are independent, then P(A|B) = P(A).
Until now, the fundamental notion of randomness has not been used.
It seems odd that probability functions can be defined without defining
this concept, but in practice, this difficult issue is neglected (an example is
the otherwise excellent book of Mood, Graybill and Boes, 1974). I will
come back to the notion of randomness later (see the discussions of von
Mises, Fisher and de Finetti). For the moment, we have to assume that
everybody knows, by intuition, what is meant by randomness. Then we
can go on and dene a random variable.
It is impossible to apply probability calculus to events like the occur-
rence of red, yellow and blue if we cannot assign numbers to these char-
acteristics, and impose an ordering on the events Ω = {red, yellow, blue}.
The notion of a `random variable' serves this purpose. If the probability
space (Ω, 𝒜, P(·)) is given, then the random variable X(·) is a function
from Ω to ℝ. Every outcome of an experiment now corresponds with a
number. Mood, Graybill and Boes (1974) acknowledge that there is no
justification for using the words `random' and `variable' in this definition.
2.4 Distribution and density functions
The cumulative distribution function of X, F_X(·), is defined as that function
with domain ℝ and counterdomain [0, 1] which satisfies F_X(x) =
P(ω: X(ω) ≤ x) for every real number x. If X is a continuous random
variable, i.e. if there exists a function f_X(·) such that

F_X(x) = \int_{-\infty}^{x} f_X(u)\,du \quad \text{for every } x,    (I.3)

then the integrand f_X(·) in F_X(x) = \int_{-\infty}^{x} f_X(u)\,du is called the probability
density function for this continuous random variable. A geometric inter-
pretation is that the area under f_X(·) over the interval (a, b) gives the
probability P(a < X < b). Similar notions exist for discrete random vari-
ables. It is also possible to extend the definition to jointly continuous
random variables and density functions. The k-dimensional random vari-
able (X_1, X_2, ..., X_k) is a continuous random variable if and only if a
function f_{X_1, ..., X_k}(·, ..., ·) ≥ 0 does exist such that

F_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = \int_{-\infty}^{x_k} \cdots \int_{-\infty}^{x_1} f_{X_1, \ldots, X_k}(u_1, \ldots, u_k)\,du_1 \cdots du_k,    (I.4)

for all (x_1, ..., x_k). A joint probability density function is defined as any
non-negative integrand satisfying this equation.
Finally, consider the concepts of marginal and conditional density
functions. They are of particular importance for the theory of econo-
metric modelling, outlined in chapter 7. If X and Y are jointly continuous
random variables, then their marginal probability density functions are
defined as

f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy    (I.5)

and

f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx    (I.6)

respectively. The conditional probability density function of Y given
X = x, f_{Y|X}(·|x), is defined as

f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} \quad \text{for } f_X(x) > 0.    (I.7)
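The definitions (I.5)-(I.7) can be checked numerically for a simple case. The sketch below (Python, using scipy's quad routine for one-dimensional integration) assumes, purely for illustration, the joint density f_{X,Y}(x, y) = x + y on the unit square.

```python
from scipy.integrate import quad

def f_xy(x, y):
    """Illustrative joint density on the unit square; it integrates to one."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def f_x(x):
    """Marginal density of X, equation (I.5): integrate the joint density over y."""
    return quad(lambda y: f_xy(x, y), 0, 1)[0]

def f_y_given_x(y, x):
    """Conditional density of Y given X = x, equation (I.7)."""
    return f_xy(x, y) / f_x(x)

print(f_x(0.5))                                       # 1.0 (analytically x + 1/2)
print(f_y_given_x(0.25, 0.5))                         # 0.75 (analytically (x + y)/(x + 1/2))
print(quad(lambda y: f_y_given_x(y, 0.5), 0, 1)[0])   # 1.0: a conditional density integrates to one
```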
These are the fundamental notions of probability which are frequently
used in what follows. Note that some of the most fundamental notions,
such as randomness and, in general, the interpretation of probability,
have not yet been defined. Quine (1978) notes that mathematics in general
(hence probability theory in particular) cannot do without a non-formal
interpretation. Chapters 2, 3 and 4 deal with the interpretation of differ-
ent theories of probability.
Notes
1. The concept of a sample space seems to have been used first by R. A. Fisher in a 1915
Biometrika article on the distribution of the correlation coefficient. The
notion of a conceptual experiment is due to Kolmogorov, and is defined as
any partition of Ω, obeying some elementary set-theoretic requirements (see
Kolmogorov, 1933).
2. See Chung (1974) or Florens, Mouchart and Rolin (1990) for a more elabo-
rate discussion of the technical issues and refinements of the axioms.
3 Relative frequency and induction
They wanted facts. Facts! They demanded facts from him, as if facts
could explain anything!
Joseph Conrad, Lord Jim
1 Introduction
The frequentist interpretation of probability dominates mainstream
probability theory and econometric theory. Proponents of the fre-
quency theory bolster its alleged objectivity, its exclusive reliance on
facts. For this reason, it is sometimes called a `realistic' theory (Cohen,
1989).
Keynes ([1921] CW VIII, p. 100) pays tribute to the English mathema-
tician Leslie Ellis, for providing the first logical investigation of the fre-
quency theory of probability. Ellis (1843) objects to relating probability
to ignorance: ex nihilo nihil. John Venn ([1866] 1876) elaborated the
frequency theory. Still, a theory which is formally satisfactory did not
exist until the early twentieth century.1 In 1919, Richard von Mises for-
mulated the foundations for a frequency theory of probabilistic inference
(see von Mises, [1928] 1981, p. 224). His views are discussed in section 2.
Shortly after, Ronald Aylmer Fisher (1922) presented an alternative fre-
quentist foundation of probabilistic inference. Section 3 deals with
Fisher's theory. Today, the most popular frequentist theory of probabil-
ity is based on yet another interpretation, due to Jerzy Neyman and Egon
Pearson. This is the topic of section 4. Popper's interpretation of prob-
ability, which is supposed to be entirely `objective', is discussed in section
5. Section 6 offers a summary of the chapter.
I will argue that, despite the elegance of the theory of von Mises and
the mathematical rigour of the Neyman-Pearson theory, those theories
have fundamental weaknesses which impede econometric inference. The
framework proposed by Fisher comes closer to the needs of econo-
metrics, while Popper's theory is of no use.
2 The frequency theory of Richard von Mises
2.1 The primacy of the collective
The principal goal of von Mises was to make probability theory a science
similar to other sciences, with empirical knowledge as its basis. He is a
follower of the positivist scientist and philosopher Ernst Mach. In line
with Ellis, von Mises ([1928] 1981, p. 30) criticizes the view that prob-
ability can be derived from ignorance:
It has been asserted – and this is no overstatement – that whereas other sciences
draw their conclusions from what we know, the science of probability derives its
most important results from what we do not know.
Rather, probability should be based on facts, not their absence. Indeed,
the frequency theory relates a probability directly to the `real world' via
the observed `objective' facts (i.e. the data), preferably repetitive events. If
people position the centre of gravity of an unknown cube in its geometric
centre, they do not do so because of their lack of knowledge. Instead, it is
the result of actual knowledge of a large number of similar cubes (p. 76).
Probability theory, von Mises (1964, p. 98) writes, is an empirical science
like other ones, such as physics:
Probability theory as presented in this book is a mathematical theory and a
science; the probabilities play the role of physical constants; to any probability
statement an approximate verification should be at least conceivable.
Von Mises' (pp. 1 and 13-14) probability theory is not suitable for inves-
tigating problems such as the probability that the two poems Iliad and
Odyssey have the same author. There is no reference to a prolonged
sequence of cases, hence it hardly makes sense to assign a numerical
value to such a conjecture. The authenticity of Cromwell's skull, investi-
gated by Karl Pearson with statistical techniques, cannot be analysed in
von Mises' probabilistic framework. More generally, there is no way to
`probabilify' scientic theories.
This distinguishes his frequency theory from the one of his close col-
league, Reichenbach. According to von Mises, a theory is right or false. If
it is false, then it is not a valid base for probabilistic inference. Probability
should not be interpreted in an epistemological sense. It is not lack of
knowledge (uncertainty) which provides the foundation of probability
theory, but experience with large numbers of events. Like von Mises,
Reichenbach ([1938] 1976, p. 350) holds that the aim of induction `is to
find series of events whose frequency of occurrence converges toward a
limit'. Both argue that probability is a frequency limit. However,
Reichenbach (1935, p. 329) claims that it is perfectly possible to deter-
mine the probability of the quantum theory of physics being correct. It is
even possible to fix a numerical probability. This can be done, if
man sie mit anderen physikalischen Theorien ähnlicher Art in eine Klasse
zusammenfaßt und für diese die Wahrscheinlichkeit durch Auszählung bestimmt
(`if it is put in a class with other similar physical theories and one deter-
mines their probability by counting' i.e. one should construct an urn with
similar theories of physics). These probabilities are quantiable,
Reichenbach argues. How his urn can be constructed in practice, and
how the probability assignments can be made, remains obscure (except
for the suggestion that one may formulate a bet which may yield a con-
sistent foundation of probabilistic inference, but this implies a departure
from the basic philosophy underlying a frequentist theory of probability).
Therefore, Reichenbach's theory is of little help.
The basis of von Mises' theory is the collective. A collective is a
sequence of uniform events or processes, which differ in certain observa-
ble attributes such as colours, numbers or velocities of molecules in a gas
(von Mises, [1928] 1981, p. 12). It must be defined before one can speak of
a probability: first the collective, then the probability. A collective has to
fulfil two conditions, the convergence condition and the randomness
condition. They are dened as follows.
Convergence condition. Let the number of occurrences of an event with
characteristic i be denoted by n_i, and the number of all events by n. Then

\lim_{n \to \infty} \frac{n_i}{n} = p_i, \quad i = 1, 2, \ldots, k,    (1)

where p_i is the limiting frequency of characteristic i.
Randomness condition. Let m be an infinite sub-sequence of sequence n,
derived by a place selection. A place selection is a function f that selects
an element out of sequence n where the selection criterion may depend on
the value of already selected events, but does not depend on the value of
the event to be selected or subsequent ones in n. Then it should be the
case that

\lim_{m \to \infty} \frac{m_i}{m} = p_i, \quad i = 1, 2, \ldots, k.    (2)
The convergence condition implies that the relative frequencies of the
attributes must possess limiting values (this is an elaboration of the
Venn limit). The randomness condition implies that the limiting values
must remain the same in all arbitrary sub-sequences (von Mises, [1928]
1981, pp. 24-5). The randomness condition is also known as the principle
of the impossibility of a gambling system (p. 25). Fisher (1956, p. 32)
proposes a condition of `no recognizable subsets' which is similar to the
randomness condition. An example in time-series econometrics is the
criterion of serially uncorrelated residuals of a regression.
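Both conditions can be illustrated by simulation. In the sketch below (Python; the particular place selection, `select a throw whenever the preceding throw was a six', is merely one admissible example) the relative frequency of the attribute `six' in the selected sub-sequence is close to that in the full sequence, as the randomness condition requires.

```python
import random

random.seed(1)
n = 200_000
throws = [random.randint(1, 6) for _ in range(n)]

# Convergence condition: the relative frequency of the attribute `six'
# settles down near its limiting value (1/6 for a fair die).
freq_all = throws.count(6) / n

# A place selection: keep a throw whenever the preceding throw was a six.
# The rule uses only values already observed, never the value being selected.
selected = [throws[i] for i in range(1, n) if throws[i - 1] == 6]
freq_selected = selected.count(6) / len(selected)

print(round(freq_all, 3), round(freq_selected, 3))   # both close to 1/6
```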
Von Mises (1964, p. 6) notes that an observed frequency may be dif-
ferent from its limiting value, for example, a specific sequence of
1,000,000 throws of a die may result in observing only sixes even if its
limiting frequency is 1/6. Indeed,
[i]t is a silent assumption, drawn from experience that in the domain where prob-
ability calculus has (so far) been successfully applied, the sequences under con-
sideration do not behave this way; they rather exhibit rapid convergence.
Note that the convergence condition is not a restatement of the law of
large numbers (a mathematical, deductive theorem; see below). It is
postulated on the basis of empirical observation. The same applies to
the randomness condition (von Mises, [1928] 1981, p. 25).
Formally, von Mises should have made use of a probability limit rather
than a mathematical limit in his definition of the convergence condition.
In that case it becomes problematic to define the collective first, and then
probability. Like the indifference interpretation, the frequency interpre-
tation uses a circular argument. However, von Mises is an empiricist who
bases his interpretation of probability on empirical statements, not on
formal requirements. Moreover, if von Mises were able to define a ran-
dom event without making use of a primitive notion of probability, his
program (first the collective, then probability) might be rescued.
2.2 Inference and the collective
How might von Mises' probability theory be used for scientific inference
or induction? The answer to this question cannot be given before the law
of large numbers is discussed. It dates back to Bernoulli, was named and
reformulated by Poisson, and has been one of the cornerstones of probability
theory since. Without success, Keynes ([1921] CW VIII, p. 368) suggested
renaming this law the `stability of statistical frequencies', which provides
a clear summary of its meaning.
There are different versions of this law. Chebyshev's `weak law of large
numbers' follows.2
`Weak law of large numbers' (Chebyshev). Let f(·) be a probability
density function with mean μ and (finite) variance σ², and let x̄_n be the
sample mean of a random sample of size n from f(·). Let ε and δ be any
two specified numbers satisfying ε > 0 and 0 < δ < 1. If n is any integer
greater than σ²/(ε²δ), then

P(|\bar{x}_n - \mu| < \varepsilon) \geq 1 - \delta.    (3)
This law has a straightforward interpretation. If the number of observa-
tions increases, the difference between the sample mean and its `true'
value μ becomes arbitrarily small. In other words (von Mises, 1964,
p. 231), for large n it is almost certain that the sample mean will be
found in the immediate vicinity of its expected value. This seems to
provide a `bridge' (a notion used by von Mises, [1928] 1981, p. 117;
also used by Popper, [1935] 1968) between observations and probability.
But the law of large numbers is a one-way bridge, from the `true' value of
the probability to the observations.3
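A small simulation of the bound in (3) may be useful. The sketch below (Python) takes, as an arbitrary illustration, the uniform density on [0, 1], so that μ = 1/2 and σ² = 1/12, chooses ε and δ, and checks that the sample mean falls within ε of μ in at least a fraction 1 − δ of repeated samples.

```python
import random

mu, sigma2 = 0.5, 1.0 / 12                 # mean and variance of the uniform density on [0, 1]
eps, delta = 0.02, 0.05
n = int(sigma2 / (eps ** 2 * delta)) + 1   # any integer greater than sigma^2 / (eps^2 * delta)

def sample_mean(size):
    return sum(random.random() for _ in range(size)) / size

trials = 1_000
hits = sum(abs(sample_mean(n) - mu) < eps for _ in range(trials))
print(n, hits / trials)                    # observed share should be at least 1 - delta = 0.95
```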
Is inference in the opposite direction possible, from the sequence of
observations to the expected value? The oldest answer to this question is
the straight rule (mentioned in chapter 2) which states that the probabil-
ity is equal to the observed frequency. This rule presupposes a very
specic prior probability distribution which is not generally acceptable.
The recognition of this point seems to have been the basis for Laplace's
rule of succession. Von Mises ([1928] 1981, p. 156; 1964, p. 340) proposes
another solution: to combine the law of large numbers with Bayes' the-
orem (see the Intermezzo). He calls the result the second law of large
numbers (the first being Bernoulli's, a special case of Chebyshev's; see
Spanos, 1986, pp. 165-6).4
This second law is formulated such that the
numerical value of the prior probabilities or its specific shape is irrele-
vant, except that the prior density p_0(x) (using the notation of von Mises,
1964, p. 340) is continuous and does not vanish at the location of the
relative frequency r, and furthermore, p_0(x) has an upper bound. Under
those conditions, one obtains the

`Second law of large numbers' (von Mises, 1964, p. 340). If the observa-
tion of an n-times repeated alternative shows a relative frequency r of
`success', then, if n is sufficiently large, the chance that the probability of
success lies between r − ε and r + ε is arbitrarily close to one, no matter
how small ε is.
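The content of the second law can be illustrated with Bayes' theorem for a repeated alternative. Under a uniform prior (one admissible choice among the bounded, continuous, non-vanishing priors the law allows), the posterior for the probability of success is a Beta distribution, and its mass within ε of the observed relative frequency r approaches one as n grows. A minimal sketch (Python, using scipy; the values of r, ε and n are arbitrary):

```python
from scipy.stats import beta

eps, r = 0.02, 0.3
for n in (100, 1_000, 10_000, 100_000):
    m = round(r * n)                        # observed number of successes
    posterior = beta(m + 1, n - m + 1)      # posterior under a uniform prior
    mass = posterior.cdf(r + eps) - posterior.cdf(r - eps)
    print(n, round(mass, 4))                # tends to one as n grows
```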
The frequentist solution to the problem of inference given by von Mises
consists of a combination of the frequency concept of a collective with
Bayes' theorem. If knowledge of the prior distribution does exist, there is
no conceptual problem with the application of Bayes' theorem. If, as is
more often the case, this prior information is not available, inference
depends on the availability of a large number of observations (von Mises,
1964, p. 342; emphasis added):
It is not right to state that, in addition to a large number of observations, knowl-
edge of [the prior distribution] is needed. One or the other is sufficient.
But von Mises underestimates some of the problems that arise from the
application of non-informative prior probabilities or neglecting priors. In
many cases, von Mises is right that the prior vanishes if the number of
observations increases, but this is not always true. In the case of time
series inference, where non-stationarity is a possibility, the choice of the
prior is crucial for the conclusion to be drawn, even with large numbers
of observations (see for example Sims and Uhlig, 1991).
2.3 Appraisal
Von Mises' theory has been very useful to clarify some problems in
probability theory. For example, it avoids Bertrand's paradox because
it restricts probability to cases where we can obtain factual knowledge,
for example a collective of water-wine mixtures. In that case, the limiting
ratio of water and wine does exist; its value depends on the construction
of the collective. Different constructions are possible, yielding different
limiting frequencies. The problem is to consider, in an empirical context,
first, whether a sequence satisfies the conditions of a collective, and if so,
to infer its probability distribution.
Obviously, a practical difficulty arises as a collective is defined for an
infinite sequence. A collective is an idealization. Von Mises ([1928] 1981,
p. 84) acknowledges that `[i]t might thus appear that our theory could
never be tested empirically'. He compares this idealization with other
popular ones, such as the determination of a specic weight (perfect
measurement being impossible) or the existence of a point in Euclidean
space. The empirical validity of the theory does not depend on a logical
solution, but is determined by a practical decision (von Mises, p. 85). This
decision should be based on previous experience of successful applica-
tions of probability theory, where practical studies have shown that fre-
quency limits are approached comparatively rapidly. Von Mises (1964,
p. 110) again stresses the empirical foundation of probability:
Experience has taught us that in certain fields of research these hypotheses are
justified, we do not know the precise domain of their validity. In such domains,
and only in them, statistics can be used as a tool of research. However, in `new'
domains where we do not know whether `rapid convergence' prevails, `significant'
results in the usual sense may not indicate any `reality'.
This may be a weak foundation for an objectivist account of probability,
but von Mises deserves credit for its explicit recognition. If the data are
not regular in the sense of his two conditions, then statistical inference
lacks a foundation and Hume's problem remains unresolved. Von Mises'
probabilistic solution to Hume's problem is, after all, a pragmatic one.
Apart from the practical problem of identifying a finite set of observa-
tions with the infinite notion of a collective, von Mises' theory suffers
from some formal weaknesses. Unlike the formalist Kolmogorov, von
Mises provides a very precise interpretation of the notion of probability
and randomness. Kolmogorov takes probability as a primitive notion,
whereas von Mises attempts to derive the meaning of probability from
other notions, in particular convergence and randomness. Kolmogorov
notes that the two approaches serve different ends: formal versus applied.
Von Mises is interested in bridging the mathematical theory with the
`empirische Entstehung des Wahrscheinlichkeitsbegriffes' (i.e. the empiri-
cal origin of the notion of probability; Kolmogorov, 1933, p. 2). For this
purpose, Kolmogorov (p. 3, n. 1) accepts the empirical interpretation of
von Mises.
The formal circularity of the argument, mentioned in section 2.1, has
rarely been criticized. Instead, von Mises' interpretation has been
attacked by followers of Kolmogorov for an alleged conceptual weak-
ness, related to the notion of place selection. Von Mises emphasizes the
constructive role of a place selection in his randomness condition: there
are no restrictions on the resulting sub-sequence itself. But formally, it can
be proved that there exists a place selection which provides an infinite
sequence of random numbers, in which only one number repeatedly
occurs.5
Is this still a `random sequence'? Von Mises does not care. To
understand why, it is useful to distinguish between different approaches
in mathematical inference. Most mathematicians today prefer to build
mathematical concepts from formal structures (sets of relations and
operations) and axioms to characterize the structures (`Nicolas
Bourbaki', a pseudonym for a group of French mathematicians,
advanced this approach in a series of volumes starting from 1939;
Kolmogorov was a forerunner). Another school (headed by L. E. J.
Brouwer) only accepts intuitive concepts. Intuitionism holds that mathe-
matical arguments are mental constructs. Von Mises, finally, follows a
constructive approach (intuitionism and finitism are special versions
thereof). He argues that constructivism makes his theory empirical (scien-
tific, like physics). The formal possibility that a very unsatisfactory
sequence may be constructed is unlikely to be realized in practice,
given an acceptable place selection (von Mises, [1928] 1981, pp. 92-
93). Abraham Wald (1940) showed that if the set of place selections
is restricted to a countable set, the problem disappears. In the same year
Church proposed identifying this set with the set of recursive (i.e. effec-
tively computable) functions. The Wald-Church solution is a step for-
ward but not yet perfect (see Li and Vitányi, 1990, p. 194). A final
refinement has been proposed by replacing `first the collective, then prob-
ability' by `first a random sequence, then probability'. This approach will
be pursued in chapter 4, section 4.
Von Mises is rarely cited in econometrics (Spanos, 1986, p. 35, is an
exception; another exception is the review of von Mises, [1928] 1981, by
the philosopher Ernest Nagel in Econometrica, 1952). His influence is not
negligible, though. As shown above, Kolmogorov supports von Mises'
interpretation of probability. Similarly, the probability theorist Jerzy
Neyman (1977, p. 101), who bases his methods primarily on
Kolmogorov formalism, is `appreciative of von Mises' efforts to separate
a frequentist probability theory from the intuitive feelings of what is
likely or unlikely to happen'. Many of von Mises' ideas entered econo-
metrics indirectly, via his student Wald. Harald Cramér (1955, p. 21)
constructed a probability theory which he claims is closely related to
the theory of von Mises. This small group of people inspired Trygve
Haavelmo and other founders of formal econometrics.
If the theory of von Mises is at all applicable to economics, it must be
in cross-section econometrics where data are relatively abundant.
However, it is an open question whether the criterion of convergence is
satised in micro-econometric applications. Probabilities in economics
are not the kind of physical entities that von Mises seems to have in
mind in constructing his theory. Although I do not have direct evidence
for this proposition, this may be a reason why von Mises' brother, the
economist Ludwig von Mises, rejects the use of formal probability theory
in economics. Even so, the subtlety of his work justifies a serious attempt
at an interpretation of econometric inference as the study of social (rather
than physical) collectives.
Von Mises' effort to provide a positivist (empirical) foundation for the
frequency theory of probability is laudable. But the strength of the theory
is also a weakness: it is impossible to analyse probabilities of individual
events with a theory based on collectives, or to perform small sample
analysis without exact knowledge of prior probabilities. Here, the work
of R. A. Fisher comes in.
3 R. A. Fisher's frequency theory
Sir Ronald Aylmer Fisher was a contemporary of von Mises. Both devel-
oped a frequency theory of probability. Fisher was influenced by Venn,
the critic of `inverse probability' or Bayesian inference, and President of
Gonville and Caius College whilst Fisher was an undergraduate student
there (Edwards, [1972] 1992, p. 248). While von Mises was a positivist in
the tradition of Mach, Fisher was an exponent of British empiricism,
often associated with the names of Locke, Berkeley and Hume. Fisher's
methodological stance is more instrumentalistic than von Mises'. The
latter excels in analytic clarity. Fisher, on the other hand, is characterized
by intuitive genius, less by analytical rigour. His theoretical conjectures
have often been proved by other statisticians, but he also succeeded in
making empirical conjectures (predicting `new facts' which were verified
afterward).6
Fisher is the man of the Galton laboratory and Rothamsted
Experimental Station, where theory met practical application, which he
considered to be more important for the development of the theory of
probability than anything else. His innovations in the vocabulary and
substance of statistics have made him immortal.
3.1 Fisher on the aim and domain of statistics
In his classic paper on the foundations of statistics, Fisher (1922, p. 311)
gives the following description of the aim of statistics:
the object of statistical methods is the reduction of data. A quantity of data,
which usually by its mere bulk is incapable of entering the mind, is to be replaced
by relatively few quantities which shall adequately represent the whole, or which,
in other words, shall contain as much as possible, ideally the whole, of the rele-
vant information contained in the original data.
This object is accomplished by constructing a hypothetical infinite population,
of which the actual data are regarded as constituting a random sample.
Reduction serves induction, which is defined as `reasoning from the sam-
ple to the population from which the sample was drawn, from conse-
quences to causes, or in more logical terms, from the particular to the
general' (Fisher, 1955, p. 69). Consistency, efficiency, sufficient and ancil-
lary statistics, maximum likelihood, fiducial inference are just a few of
Fisher's inventions of new terms for data reduction. Fisher is well aware,
however, that a theory of data reduction by means of estimation and
testing is not yet the same as a theory of scientific inference. He accepts
Bayesian inference if precise prior information is available, but notes that
in most cases prior probabilities are unknown. For these circumstances,
he proposes the fiducial argument, `a bold attempt to make the Bayesian
omelet without breaking the Bayesian eggs' (Savage, 1961, p. 578). I will
first discuss a number of elements of Fisher's theory of probability, and
then ask whether econometricians may use the fiducial argument for
inference.
Unlike von Mises, Fisher does not restrict the domain of probability
theory to `collectives'. Fisher takes the hypothetical infinite population
(introduced in Fisher, 1915) as the starting point of the theory of prob-
ability. This resembles the ensemble of Willard Gibbs, who used this
notion in his theory of statistical mechanics. A hypothetical infinite popu-
lation seems similar to von Mises' collective, but there are two differ-
ences. First, von Mises starts with observations, and then postulates what
happens if the number of observations goes to infinity. Fisher starts
from an infinite number of possible trials and defines the probability of A
as the ratio of observations where A is true to the total number of
observations. Secondly, unlike von Mises, Fisher has always been inter-
ested in small sample theory. Those who are not, Fisher (1955, p. 69)
claims, are `mathematicians without personal contact with the Natural
Sciences'.7
Fisher was proud of his own laboratory experience, and
regarded this as a hallmark of science.
According to Fisher (1956, p. 33), it is possible to evaluate the prob-
ability of an outcome of a single throw of a die, if it can be seen as a
random sample from an aggregate, which is `subjectively homogeneous
and without recognizable stratification'. Before a limiting ratio can be
applied to a particular throw, it should be subjectively verified that there
are no recognizable subsets. This condition is similar to von Mises' con-
dition of randomness, although the latter is better formalized and, of
course, strictly intended for investigating a collective of throws, not a
single throw.
Another difference between the theories of von Mises and Fisher is
that, in Fisher's theory, there is no uncertainty with respect to the para-
meters of a model: the `true' parameters are fixed quantities, the only
problem is that we cannot observe them directly. Von Mises ([1928] 1981,
p. 158) complains that he does
not understand the many beautiful words used by Fisher and his followers in
support of the likelihood theory. The main argument, namely, that p is not a
variable but an `unknown constant', does not mean anything to me.
Moreover, von Mises (1964, p. 496) argues:
The statement, sometimes advanced as an objection [to assigning a prior prob-
ability distribution to parameter θ], that θ is not a variable but an `unknown
constant' having a unique value for the coin under consideration, is beside the
point. Any statement concerning the chance of θ falling in an interval H (or
concerning the chance of committing an error, etc.) is necessarily a statement
about a universe of coins, measuring rods, etc., with various θ-values.
For the purpose of inference, the frequentist asserts implicitly, or expli-
citly, that the statistical model is `true' (Leamer, 1978, calls this the
`axiom of correct specification'). This is a strong assumption if two com-
peting theories, which both have a priori support, are evaluated!8 Fisher
did not discuss the problem of appraising rival theories, but he was fully
aware of the specification problem. Indeed, one of his most important
contributions to statistics deals with the problem of experimental design,
which has a direct relation to the specification problem (which features
prominently in chapters 6 and 7). First, a number of Fisher's innovations
to the statistical box of tools should be explained, followed by other
theories of inference. Fisher's tools have obtained wide acceptance,
whereas his theory of inference has not.
3.2 Criteria for statistics
As Hicks did for economics, Fisher added a number of most useful
concepts to the theory of probability, i.e. consistency, efficiency, (max-
imum) likelihood, sufficiency, ancillarity, information and significance
test.
Consistency
According to Fisher (1956, p. 141), the fundamental criterion of esti-
mation is the Criterion of Consistency. His definition of a consistent
statistic is: `A function of the observed frequencies which takes the
exact parametric value when for these frequencies their expectations are
substituted' (p. 144).9
This definition of consistency is actually what
today is known as unbiasedness (I will use the familiar definitions of
consistency and unbiasedness from now on). It would reject the use of
sample variance as an estimator of the variance of a normal distribution.
A condition, which according to Fisher (p. 144) is `much less satisfac-
tory', is weak consistency (today known as plain consistency): the prob-
ability that the error of estimation of an estimator t_n for a parameter θ,
given sample size n, converges to zero if n grows to infinity
(plim_{n→∞} t_n = θ; a consequence of Bernoulli's weak law of large numbers
and, in the theory of von Mises, a prerequisite for probabilistic inference:
his condition of convergence). Fisher desires to apply his criterion to
small samples (which is not possible if the asymptotic definition is used);
therefore, he prefers his own definition.
The importance of unbiasedness and consistency is not generally
accepted. Fisher regards unbiasedness as fundamental. Many authors
accept consistency as obvious. Spanos (1986, p. 244) thinks consistency
is `very important'. Cox and Hinkley (1974, p. 287) give it `a limited, but
nevertheless important, role'. Efron (1986, p. 4) holds that unbiasedness
is popular for its intuitive fairness. The other side of the spectrum is the
Bayesian view, presented, for example, by Howson and Urbach (1989,
p. 186) who see no merit in consistency as a desideratum, or by Savage
([1954] 1972, p. 244) who asserts that a serious reason to prefer unbiased
estimates has never been proposed. The student of Fisher, C. R. Rao
([1965] 1973, p. 344), makes fun of consistency by defining the consistent estimator $\theta^*_n$ as $\theta^*_n = 0$ if $n \le 10^{10}$, and $\theta^*_n = \hat{\theta}_n$ if $n > 10^{10}$, where $\hat{\theta}_n$ is a more satisfactory (`consistent' in its intended meaning) estimator of $\theta$. In case of sample sizes usually available in economics, this suggests using 0 as the (consistent) estimate for any parameter $\theta$. Minimizing a mean squared prediction error (given a specific time horizon) may be much more useful in practice than aiming at consistency.
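To see how hollow such consistency can be in practice, consider a small simulation sketch (an illustration added here, not Rao's own exposition; the normal data-generating process, the true value 2.5 and the function name rao_estimator are assumptions made only for the example):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = 2.5                          # assumed true parameter for the illustration

    def rao_estimator(x, cutoff=10**10):
        # ignore the data entirely unless the sample is absurdly large
        return x.mean() if len(x) > cutoff else 0.0

    for n in (30, 1_000, 100_000):
        x = rng.normal(theta, 1.0, size=n)
        print(n, rao_estimator(x), round(x.mean(), 3))

The estimator is consistent in the formal, asymptotic sense, yet it returns 0 at every sample size a researcher will ever meet, which is exactly the point of Rao's joke.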
One reason for the appeal of consistent or unbiased estimators may be
that few investigators would be willing to assert that, in general, they
prefer biased estimators because of the emotional connotations of the
word `bias'. If this is the rhetoric of statistics in practice, then this also
applies to the notion of efciency.
Efficiency
A statistic of minimal limiting variance is `efficient' (Fisher, 1956, p. 156). In the case of biased estimators, efficiency is usually defined as minimizing the expected mean square error, $E(t_n - \theta)^2$. Efficiency `expresses the proportion of the total available relevant information of which that statistic makes use' (Fisher, 1922, p. 310).
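It may help to recall the standard decomposition that connects efficiency in this sense with the earlier criteria (a textbook identity, not a quotation from Fisher):

$$E(t_n - \theta)^2 = \operatorname{Var}(t_n) + \bigl(E(t_n) - \theta\bigr)^2,$$

so that an estimator which minimizes mean squared error may accept a little bias in exchange for a large reduction in variance.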
Maximum likelihood
The likelihood function, perhaps Fisher's most important contribution to the theory of probability, gives the likelihood that the random variables $X_1, X_2, \ldots, X_n$ assume particular values $x_1, x_2, \ldots, x_n$, where the likelihood is the value of the density function at a particular value of $\theta$. The method of maximum likelihood is to choose the estimate for which the likelihood function has its maximum value. Fisher (1956, p. 157) argues that
no exception has been found to the rule that among Consistent Estimates, when
properly defined, that which conserves the greatest amount of information is the
estimate of Maximum Likelihood.
He continues that the estimate that loses the least of the information
supplied by the data must be preferred, hence maximum likelihood is
suggested by the sufficiency criterion.
Maximum likelihood has a straightforward Bayesian interpretation
(choosing the mode of the posterior distribution if the prior is non-infor-
mative). Without such a Bayesian or other bridge inductive inference
cannot be based on just maximum likelihood. Wilson (cited in Jeffreys
[1939] 1961, p. 383), states: `If maximum likelihood is the only criterion
the inference from the throw of a head would be that the coin is two-
headed.' But, according to Jeffreys, in practical application to estimation
problems, the principle of maximum likelihood is nearly indistinguishable
from inverse probability. Recent investigations in time series statistics
suggest that this is not true in general.
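Wilson's coin example can be made explicit (the uniform prior below is an assumption, used only for illustration). After a single toss showing heads, the likelihood for the probability of heads $p$ is $l(p) = p$, maximized at $\hat{p} = 1$: the two-headed coin. Under a uniform prior the posterior density is $2p$ on $[0, 1]$, so its mode is also 1, but

$$E(p \mid \text{one head}) = \int_0^1 p \cdot 2p\, dp = \tfrac{2}{3},$$

and most of the posterior mass lies well below 1. The difference between maximum likelihood and inverse probability shows up not in the location of the maximum but in whether the whole distribution, rather than a single point, is carried into the inference.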
Sufficiency
The criterion of sufficiency states that a statistic should summarize the whole of the relevant information supplied by a sample. This means that, in the process of constructing a statistic, there should be no loss of information. This is perhaps the most fundamental requisite in Fisher's theory. Let $x_1, \ldots, x_n$ be a random sample from the density $f(\cdot\,; \theta)$. Define a statistic $T$ as a function of the random sample, $T = t(x_1, \ldots, x_n)$; then this is a sufficient statistic if and only if the conditional distribution of $x_1, \ldots, x_n$ given $T = t$ does not depend on $\theta$ for any value $t$ of $T$ (see Mood, Graybill and Boes, 1974, p. 301; Cox and Hinkley, 1974, p. 18). This means that it does not make a difference for inference about $\theta$ if a sufficient statistic $T = t(x_1, \ldots, x_n)$ is used, or $x_1, \ldots, x_n$ itself. Cox and Hinkley (1974, p. 37) base their `sufficiency principle' for statistical inference on this Fisherian notion. The econometric methodology of David Hendry is in the spirit of this principle, as will be shown in chapter 7, but other approaches to econometric inference hold sufficiency in high esteem as well. Sufficient statistics do not always exist (they do exist for the exponential family of distributions, such as the normal distribution). The Rao–Blackwell lemma states that sufficient statistics are efficient. A statistic is minimal sufficient if the data cannot be reduced beyond a sufficient statistic $T$ without losing sufficiency. A further reduction of the data will result in loss of information.
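A standard illustration (added here, not part of the text) is the Bernoulli case with success probability $\theta$: for a particular sequence $x_1, \ldots, x_n$ containing $t$ ones,

$$P\bigl(x_1, \ldots, x_n \mid \textstyle\sum_i x_i = t\bigr) = \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\theta^{t}(1-\theta)^{n-t}} = \binom{n}{t}^{-1},$$

which does not depend on $\theta$. Once the number of successes is known, the particular arrangement of zeros and ones carries no further information about $\theta$, so the sum is a sufficient statistic.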
In 1962, Birnbaum showed that the principle of sufficiency, together with the principle of conditionality (see Berger, [1980] 1985, p. 31), yields the `Likelihood Principle' as a corollary. This principle states that, given a statistical model with parameters $\theta$ and given data $x$, the likelihood function $l(\theta) = f(x\,|\,\theta)$ provides all the evidence concerning $\theta$ contained by the data. Furthermore, two likelihood functions contain the same information about $\theta$ if they are proportional to each other (see pp. 27–33 for further discussion). Fisher's theory of testing, discussed below, violates this likelihood principle.
Ancillary statistic
If $S$ is a minimal sufficient statistic for $\theta$ and $\dim(S) > \dim(\theta)$, then it may be the case that $S = (T, C)$, where $C$ has a marginal distribution which does not depend on $\theta$. In that case, $C$ is called an ancillary statistic, and $T$ is sometimes called conditionally sufficient (Cox and Hinkley, 1974, pp. 32–3). Ancillarity is closely related to exogeneity (discussed in chapter 7).
Information
The (negative of the) second derivative of the logarithm of the likelihood function with respect to its parameter(s) provides an indication of the amount of information (information matrix) realized at any estimated value for $\theta$ (Fisher, 1956, p. 149). It is known as the Fisher information. Normally, it is evaluated at the maximum of the likelihood function. Fisher interprets the information as a weight of the strength of the inference. The Cramér–Rao inequality states that (subject to certain regularity conditions) the variance of an unbiased estimator is at least as large as the inverse of the Fisher information. Thus, the lower the variance of an estimator, the more `information' it provides (it has, therefore, relevance for the construction of non-informative prior probabilities).
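For a concrete case (a routine calculation, supplied here for illustration), take $n$ independent Bernoulli trials with success probability $\theta$ and $t$ successes. The log-likelihood is $t \log\theta + (n - t)\log(1-\theta)$, and

$$I(\theta) = -E\!\left[\frac{\partial^2 \log l(\theta)}{\partial\theta^2}\right] = \frac{n}{\theta(1-\theta)},$$

so the Cramér–Rao bound is $\theta(1-\theta)/n$, which is exactly the variance of the sample proportion: the sample proportion is efficient in Fisher's sense.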
Significance test and P-value
An important notion for testing hypotheses is the P-value, invented by Karl Pearson for application with the $\chi^2$ goodness-of-fit test. If $\chi^2$ is greater than some benchmark level, then the tail-area probability P is called significantly small. The conclusion is that either the assumed distribution is wrong or a rare event under the distribution must have happened. `Student' (Gosset) took the same approach for the t-test.
Fisher continues this tradition. Fisher ([1935] 1966, p. 16) claims that
the P-value is instrumental in inductive inference. Anticipating Popper,
he argues that `the null hypothesis is never proved or established, but is
possibly disproved in the course of experimentation'. An experimental
researcher should try to reject the null hypothesis. Unlike Popper,
Fisher claims that this is the source of inductive inference. The value
of an experiment increases whenever it becomes easier to disprove the
null (increasing the size of an experiment and replication are two
methods to increase this value of an experiment: replication is regarded
as another way of increasing the sample size, p. 22). It is important to
note that Fisher's approach not only differs from Popper's, in that
Popper rejects all kinds of inductive arguments, but also that
Fisher's null hypothesis is usually not the hypothesis the researcher
likes to entertain.
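In computational terms the tail-area logic looks as follows (a minimal sketch added for illustration; it assumes scipy is available, and the cell counts are invented):

    import numpy as np
    from scipy import stats

    observed = np.array([18, 22, 31, 29])        # hypothetical cell counts
    expected = np.full(4, observed.sum() / 4)    # null hypothesis: equal cell probabilities

    chi2 = ((observed - expected) ** 2 / expected).sum()
    p_value = stats.chi2.sf(chi2, df=3)          # tail area beyond the observed chi-squared
    print(round(chi2, 2), round(p_value, 3))

The P-value aggregates the probability of all outcomes at least as extreme as the one observed, which is precisely the feature Jeffreys criticizes below.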
Basing inference on P-values has a strange implication: the judgement
is driven by observations that in fact are not observed. According to
Jeffreys ([1939] 1961, p. 385) the P-value
gives the probability of departures, measured in a particular way, equal to or
greater than the observed set, and the contribution from the actual value is nearly
always negligible. What the use of P implies, therefore, is that a hypothesis that may
be true may be rejected because it has not predicted observable results that have not
occurred. This seems to be a remarkable procedure.
Jeffreys anticipates the conclusion that inference based on P-values vio-
lates the `likelihood principle'.
Fisher ([1925] 1973) suggests using significance levels of 1% or 5% and offers tables for a number of tests. The reason for this (and the subsequent popularity of those percentages) is a historical coincidence: due to his bad relations with Karl Pearson, Fisher was prevented from publishing Pearson's tables which contained probability values in the cells. Fisher had, therefore, to create his own way of tabulation, which subsequently proved more useful (see Keuzenkamp and Magnus, 1995). The tables facilitate the communication of test results, but reporting a P-value provides more information than reporting that a test result is significant at the 5% level. Furthermore, Jeffreys' complaint about P-values applies equally well to Fisher's significance levels. Finally, Fisher does not give formal reasons for selecting one rather than another test method for application to a particular problem, which is a considerable weakness, as different test procedures may lead to conflicting outcomes. Neyman and Pearson suggested a solution for this shortcoming (see below).
3.3 Randomization and experimental design
As a practical scientist, Fisher paid much attention to real problems of
inference. The titles of his books, Statistical Methods for Research
Workers and The Design of Experiments, reflect this practical interest.
The latter contains an elaborate discussion of one of Fisher's major
contributions to applied statistics, randomization. Randomization pro-
vides `the physical basis of the validity of the test' (the title of chapter 9 of
Fisher [1935] 1966, p. 17). The `inductive basis' for inferences, Fisher
(p. 102) argues, is improved by varying the conditions of the experiment.
This is a way to control for potentially confounding influences (nuisance variables) which may bias the inference with regard to the variables of interest (another option is to control directly, by means of including all potentially relevant and well measured variables in the multiple regression model; see also chapter 6, section 3 for discussion). Fisher also regards randomization as a means to justify distributional assumptions, thereby avoiding mis-specification problems (this justification works often, though not always).¹⁰ Randomization, when introduced, was revo-
lutionary and controversial (K. Pearson, for example, had developed his
system of distributions in order to deal with all kinds of departures from
normality, which after randomization became relatively unimportant;
Fisher Box (1978) provides interesting historical background
information).
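A small simulation (a sketch added for illustration; the linear outcome model, the effect sizes and the sample size are assumptions, and numpy is assumed available) illustrates the sense in which randomization guards against an unobserved nuisance variable:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000
    nuisance = rng.normal(size=n)                # unobserved confounding factor

    biased_treat = (nuisance > 0).astype(float)  # haphazard assignment that follows the nuisance
    random_treat = rng.integers(0, 2, size=n).astype(float)  # randomized assignment

    def estimated_effect(treat):
        # true treatment effect is 1; the nuisance variable also shifts the outcome
        y = 1.0 * treat + 2.0 * nuisance + rng.normal(size=n)
        return y[treat == 1].mean() - y[treat == 0].mean()

    print(round(estimated_effect(biased_treat), 2))  # badly biased, far from the true effect of 1
    print(round(estimated_effect(random_treat), 2))  # close to the true effect of 1

Randomization does not eliminate the nuisance variation in any single experiment; it makes the assignment independent of it, so the bias disappears on average and the usual error estimates remain valid.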
A good introduction to experimental design can be found in Fisher ([1935] 1966). Here, Fisher presents the famous example of the `tea-cup lady' (likely to be inspired by a female assistant at Rothamsted, where regular afternoon tea sessions took place), who claims to be able to detect whether tea or milk was put in her cup first (p. 11). In testing this claim, many variables apart from the order of milk and tea may affect the outcome of the experiment (e.g. thickness of the cup, temperature of the tea, smoothness of material, etc.). These `uncontrolled causes . . . are always strictly
innumerable' (p. 18). The validity of tests of significance, Fisher (p. 19)
argues, `may be guaranteed against corruption by the causes of distur-
bance which have not been eliminated' by means of direct control.
Howson and Urbach (1989, p. 147) claim that randomization applies
only to a `relatively small number of the trials that are actually con-
ducted', as most trials are not aiming at significance tests, but parameter
estimation. Their claim does not hold, because the estimates may also be
biased due to the `corrupting' effect of uncontrolled nuisance variables.
Another objection to randomization as a catch-all solution to mis-specification is better founded (p. 149): it might occur that the impact of a nuisance variable is influenced by the experimental design. For example, consider a potato-growing trial, where each of two breeds is planted at its own spot, or both are planted randomly on one larger spot. It might be that each of the varieties benefits likewise from the attention of a
particular insect if planted apart, but that the insect has a preference
for one of the breeds if they are planted randomly. Indeed, it might be.
Any inference, however, will be affected by such a conceivable `corrupt-
ing' factor. Randomization is a means of reducing such bias; it is not a
guarantee that no bias will remain. Howson and Urbach (p. 145) main-
tain that `the essential feature of a trial that permits a satisfactory con-
clusion as to the causal efficacy of a treatment is whether it has been
properly controlled'. This is overly optimistic with regard to the feasibil-
ity of control. Moreover, the more non-theoretical an empirical statistical
problem is, the more randomization matters: control is pretty hard if one
does not know what to control for.
Randomization also affects the economy of research. Rather than
spending resources on eliminating as many of these causes as possible,
the researcher should randomize. Randomization, rather than introdu-
cing assumptions about the relevant statistical distribution, provides the
(`physical') basis of inductive inference. It is needed to cope with the
innumerable causes that are merely a nuisance to the researcher.
Fisher ([1935] 1966, p. 102) adds that uniformity of experiments is not
a virtue:
The exact standardisation of experimental conditions, which is often thought-
lessly advocated as a panacea, always carries with it the real disadvantage that
a highly standardised experiment supplies direct information only in respect of the
narrow range of conditions achieved by standardisation.
It leads to idiosyncratic inferences. A similar remark can be made with
regard to model design.
Fisher's theory of experimental design has little impact on most applied
econometrics: economic data are usually non-experimental. Still, during
the rise of econometric theory, experimental design played an important
role in the econometric language (in particular Haavelmo's). The tension
between this theory and econometric practice is the topic of chapter 6,
section 4.
3.4 Fiducial inference
An important ingredient of Fisher's theory of inference is fiducial probability. In contrast to his other contributions, fiducial inference never gained much support, primarily because of its obscurity. It may be of interest for the interpretation of mainstream econometrics, however, because one may interpret arguments of econometricians as implicitly based on a fiducial argument.¹¹
Consider a 95% confidence interval for a random variable $X$ which is normally distributed as $X \sim N(\mu, 1)$:¹²

$$X - 1.96 \le \mu \le X + 1.96. \qquad (4)$$

The meaning of this confidence interval is that the random interval $(X - 1.96,\, X + 1.96)$ contains $\mu$ with a probability of 95%. In Fisher's view, $\mu$ is not random, but $X$ is. This leads to a problem of interpretation if a particular value $x$ of $X$ is observed, say $x = 2.14$. Can we plug this value in, retaining the interpretation of a confidence interval, i.e. is it justifiable to hold that in this case $\mu$ lies with 95% probability between 0.18 and 4.10? As long as $\mu$ is not considered a random quantity, the answer is no. You cannot deduce a probability from non-probabilistic propositions. The Bayesian solution is to regard $\mu$ as a random variable. For inference, a prior probability distribution is needed, which Fisher holds problematic in many (if not most) practical situations. Fisher's solution to the inference problem is to invoke a probabilistic Gestalt switch, the fiducial distribution $\Phi(y)$. Since $X - \mu$ is distributed $N(0, 1)$, so is $\mu - X$. The fiducial distribution of $\mu$ is now defined as $P(\mu - X \le y) = \Phi(y)$.
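A short numerical check (added here, assuming scipy is available; it is not Fisher's derivation) shows why the fiducial statement agrees numerically with a flat-prior Bayesian posterior in this example:

    from scipy import stats

    x = 2.14                                   # observed value of X
    lower, upper = x - 1.96, x + 1.96          # the interval from 0.18 to 4.10

    posterior = stats.norm(loc=x, scale=1.0)   # fiducial / flat-prior distribution for mu
    print(round(posterior.cdf(upper) - posterior.cdf(lower), 3))   # about 0.95

The numbers coincide, but the interpretations differ: the frequentist attaches the 95% to the procedure, Fisher attaches it to $\mu$ after the Gestalt switch, and the Bayesian obtains it only by supplying a prior.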
Fisher considers his fiducial probability theory as a basis for inductive reasoning if prior information is absent (see Fisher, 1956, p. 56, where he also notes that, in earlier writings, he considered fiducial probability as a substitute for, rather than a complement to, Bayesian inference). Fiducial inference remains closer to the spirit of Bayesian inference than the Neyman–Pearson methodology, discussed below. Fisher (p. 51) remarks that the concept of fiducial probability is `entirely identical' with Bayesian inference, except that the method is new. Blaug (1980, p. 23), on the other hand, writes that fiducial inference is `virtually identical to the modern Neyman–Pearson theory of hypothesis testing'. This is a plain misinterpretation.¹³
Neyman (1952, p. 231) argues that there is no relationship between the fiducial argument and the theory of confidence intervals. Neyman (1977, p. 100) is more blunt: fiducial inference is `a conglomeration of mutually inconsistent assertions, not a mathematical theory'. The problem of a fiducial probability distribution is that it makes sense only to those who perceive Fisher's Gestalt switch, and they are few.
Apart from this, there are numerous technical problems in extending
the method to more general problems than the one considered.
Fiducial inference is at best controversial and hardly, if ever, invoked
in econometric reasoning. Summarizing, Fisher's theory of inference is
not a strong basis for econometrics, although a number of the more
technical concepts he introduced are very important and useful in other
interpretations of probability.
4 The Neyman–Pearson approach to statistical inference
In a series of famous papers, the Polish statistician Jerzy Neyman and his
English colleague Egon Pearson developed an operational methodology
of statistical inference (Neyman and Pearson, 1928, 1933a, b). It is
intended as an alternative to inverse inference, and, initially, also as an
elaboration of Fisher's methods (although Fisher was not amused).
Neyman and Pearson invented a frequentist theory of testing, where
the aim is to minimize the relative frequency of errors.
4.1 Inductive behaviour
Neyman and Pearson consider their theory as one of inductive behaviour,
instead of a theory of inductive inference. Behaviourism (created by Ivan
Pavlov, and, in particular, by John B. Watson) was the dominant school
of thought in psychology from the 1920s till the 1940s. Neyman and
Pearson (1933a, pp. 141–2) follow suit and
are inclined to think that as far as a particular hypothesis is concerned, no test
based upon the theory of probability can by itself provide any valuable evidence
of the truth or falsehood of that hypothesis.
But we may look at the purpose of a test from another view-point. Without
hoping to know whether each separate hypothesis is true or false, we may search
for rules to govern our behaviour with regard to them, in following which we
insure that, in the long run of experience, we shall not be too often wrong.
Neyman and Pearson hold inference as a decision problem where the
researcher has to cope with two possible risks: rejecting a true null
hypothesis, a so-called Type I error, and accepting a false one, a Type
II error (Neyman and Pearson, 1928). Only in exceptional cases is it
possible to minimize both types of error simultaneously.
In a classic example, Neyman and Pearson (1933a, p. 146) wonder
whether it is more damaging to convict an innocent man or to acquit a
guilty one. Mathematical theory may help in showing how to trade off the
risks. Usually, the Type I error is considered to be the more serious one
(although this is certainly not always the case: testing for unit roots is an
example where in the frequency approach the Type II error will be more
dangerous than the Type I error). In practice, the probability of a Type I
error has a preset level, usually Fisher's 5% or 1%. This is the significance level, $\alpha$. Given $\alpha$, the goal is to find a test that is most powerful, that is, one which minimizes the probability of a Type II error. An optimum test is a uniformly most powerful (UMP) test, given significance level $\alpha$.
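The trade-off can be made concrete with a one-sided test of a normal mean with known unit variance (an illustrative sketch added here; scipy is assumed available, and the sample size and alternatives are invented):

    from scipy import stats

    alpha, n = 0.05, 25
    crit = stats.norm.ppf(1 - alpha)            # critical value for the standardized mean

    def power(mu1):
        # probability of rejecting H0: mu = 0 when the true mean is mu1
        return stats.norm.sf(crit - mu1 * n ** 0.5)

    for mu1 in (0.0, 0.2, 0.5):
        print(mu1, round(power(mu1), 2))        # 0.05, about 0.26, about 0.80

The Type I error probability is pinned at $\alpha$ by construction; the Type II error probability (one minus the power) shrinks as the alternative moves away from the null or as the sample grows.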
The Neyman–Pearson methodology is the first one in the statistical
literature which takes explicit account of the alternatives against which
a hypothesis is tested. Without specifying the alternative hypothesis, it is
not possible to study the statistical properties (in particular, the power) of
a test. In that case, it is unknown when a test is optimal (see Neyman,
1952, pp. 45 and 57). This seems to be an important step ahead, but
Fisher (1956, p. 42) is not impressed:
To a practical man, also, who rejects a hypothesis, it is, of course, a matter of
indifference with what probability he might be led to accept the hypothesis falsely,
for in his case he is not accepting it.
Is the indifference of Fisher's practical man justied?
4.2 The Neyman–Pearson lemma
The goal of Neyman and Pearson (1933a) is to find `efficient' tests of hypotheses based upon the theory of probability, and to clarify what is meant by this efficiency. In 1928 they conjectured that a likelihood ratio test would provide the criterion appropriate for testing a hypothesis (Neyman and Pearson, 1928). This paper shows that the likelihood ratio principle serves to derive a number of known tests, such as the $\chi^2$ and the t-test. The Neyman–Pearson (1933a) lemma proves that these tests have certain optimal properties.
The efficiency of a testing procedure is formulated in the properties of the critical region. Alternative tests distinguish themselves by providing different critical regions. An efficient test picks out the `best critical region', i.e. the region where, given the Type I error, the Type II error is minimized. If more than one test procedure satisfies this criterion for efficiency, then the UMP test should be selected (if available).
If $H_0$ is the null hypothesis, and $H_1$ the alternative hypothesis, then (in Neyman and Pearson's 1933a, p. 142, terminology) $H_0$ and $H_1$ define the populations of which an observed sample $x = x_1, x_2, \ldots, x_n$ has been drawn. Further, $p_0(x)$ is the probability given $H_0$ of the occurrence of an event with $x$ as its coordinates (a currently more familiar notation is $P(x\,|\,\theta)$, where $\theta$ denotes the parameterization under $H_0$). The probability that the event or sample point falls in a particular critical region $\omega$ is (in the continuous case):

$$P_0(\omega) = \int \cdots \int_{\omega} p_0(x)\, dx. \qquad (6)$$

The optimum test is defined as the critical region $\omega$ satisfying

$$P_0(\omega) \le \alpha, \qquad (7)$$

and

$$P_1(\omega) = \text{maximum}. \qquad (8)$$

Neyman and Pearson (1933a) prove that such an optimum test exists in the case of testing simple hypotheses (testing $\theta = \theta_0$ against $\theta = \theta_1$) and is provided by the likelihood ratio statistic $z$:

$$z = \frac{P_0(x)}{P_1(x)}. \qquad (9)$$
The Neyman–Pearson lemma states that the likelihood ratio statistic provides an optimal critical region, i.e. given a significance level $\alpha$, it optimizes the power of the test. The theory of Fisher does not provide an argument for selecting an optimal test. In this sense, Neyman and Pearson provide an important improvement. Moreover, they present an extra argument for the application of maximum likelihood, and show that sufficient statistics contain all relevant information for testing (simple) hypotheses. Neyman and Pearson note that testing composite hypotheses is slightly more complicated. For testing composite hypotheses (e.g. $\theta = \theta_0$ against $\theta > \theta_0$), a uniformly most powerful test (one that maximizes power for all alternatives $\theta_1 > \theta_0$) exists if the likelihood ratio is monotonic (one-parameter exponential distribution families belong to this class, see Lehmann, [1959] 1986, p. 78). In case of multi-parameter problems, a UMP test typically does not exist. By imposing additional conditions, one may find UMP tests within this restricted class of tests. An example is the class of unbiased tests (Mood, Graybill and Boes, 1974, p. 425).
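As an illustration of the lemma (a standard normal-mean calculation added here, not taken from Neyman and Pearson's paper), consider testing $\mu = \mu_0$ against $\mu = \mu_1 > \mu_0$ on the basis of $n$ independent observations from $N(\mu, \sigma^2)$ with known $\sigma^2$. The likelihood ratio is

$$z = \frac{p_0(x)}{p_1(x)} = \exp\!\left\{\frac{n(\mu_0 - \mu_1)}{\sigma^2}\Bigl(\bar{x} - \frac{\mu_0 + \mu_1}{2}\Bigr)\right\},$$

a decreasing function of the sample mean $\bar{x}$. Rejecting for small $z$ is therefore the same as rejecting for large $\bar{x}$, so the best critical region has the form $\bar{x} > c$, with $c$ chosen so that $P_0(\omega) = \alpha$. Since $c$ does not depend on $\mu_1$, the same region is best against every alternative $\mu_1 > \mu_0$, which is what makes the test uniformly most powerful in this one-sided problem.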
4.3 Fisher's critique
The disagreement between Fisher and Neyman–Pearson is not just one of different kinds of mathematical abstraction. Indeed, Fisher holds that his theory is the appropriate one for a free scientific society, whereas Neyman–Pearson's either belongs to the world of commerce or to that of dictatorship. In a critique of the behaviouristic school (of Neyman, Pearson, Wald and others), Fisher (1956, p. 7; see also pp. 100–2) writes:
To one brought up in the free intellectual atmosphere of an earlier time there is
something rather horrifying in the ideological movement represented by the doc-
trine that reasoning, properly speaking, cannot be applied to lead to inferences
valid in the real world.
Fisher (1955, p. 77) opposes the use of loss functions for the purpose of
scientific inference:
in inductive inference we introduce no cost functions for faulty judgements, for it is recognized in scientific research that the attainment of, or failure to attain to, a particular scientific advance this year rather than later, has consequences, both to the research programme, and to advantageous applications of scientific knowledge, which cannot be foreseen. In fact, scientific research is not geared to maximize the profits of any particular organization, but is rather an attempt to improve public knowledge.
In his methodological testament, Fisher (1956, p. 88) concludes that the
principles of Neyman and Pearson will lead to `much wasted effort and
disappointment'.
Is the behaviouristic interpretation appropriate for econometrics?
Many empirical papers in the economic journals do not deal explicitly with the behaviour of the researcher or of the policy maker whom the researcher advises. The risks of making a wrong decision are not further evaluated; their monetary or utilitarian implications are rarely discussed. Even in cases where risks are appropriately evaluated, the Neyman–Pearson approach is not always the optimal one (Berger, [1980] 1985).
Less persuasive is Fisher's argument against the relevance of power
and its trade-off with the size of a test. His opposition may have been
partly emotional, as Fisher used a similar but mathematically less well-defined notion: sensitivity (see Fisher, [1925] 1973, p. 11). Neyman and Pearson were quite right to investigate the efficiency of alternative statistical tests. If decision making is hampered by conflicting test results, caused by using a number of alternative but equally valid tests, the selection of the most efficient test avoids the fate of Buridan's ass. On the other hand, Fisher's motivation for his opposition, i.e. that alternative hypotheses are not always well defined (and certainly not restricted to specific parametric classes such as in Neyman–Pearson methods), holds in
many practical situations. Fisher would prefer robust tests rather than
UMP tests.
An inevitable weakness of the Neyman–Pearson theory (acknowledged by its inventors) is the arbitrariness of the criterion for choosing or rejecting a hypothesis (the same weakness holds for Fisher's methodology). If the convention of applying a significance level of 0.01 or 0.05 is followed, then Lindley's `paradox' shows that with growing sample size, any hypothesis will be rejected.¹⁴ Econometricians tend to ignore the fact that the significance level should vary with sample size: macro-econometric studies using a time series of observations spanning 30 years and micro-econometric studies reporting statistics of cross sections consisting of 14,000 observations often report tests using the same 0.01 or 0.05 significance levels (see the evidence in Keuzenkamp and Magnus, 1995).
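The following numerical sketch of Lindley's `paradox' is added for illustration (it assumes a normal mean with known unit variance, a point null, a standard normal prior for the mean under the alternative, equal prior odds, and data that sit exactly at the 5% boundary for every sample size):

    import numpy as np

    z = 1.96                                  # test statistic at the 5% boundary
    for n in (10, 1_000, 100_000):
        # Bayes factor of H0 (mu = 0) against H1 (mu ~ N(0, 1))
        bf01 = np.sqrt(1 + n) * np.exp(-0.5 * z**2 * n / (n + 1))
        post_h0 = bf01 / (1 + bf01)           # posterior probability of H0 with equal priors
        print(n, round(post_h0, 3))           # rises from about 0.37 towards 1

The classical test rejects at the 5% level in every case, yet the posterior probability of the null approaches one as $n$ grows; holding the significance level fixed as the sample expands therefore amounts to accepting ever weaker evidence against the null.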
The Neyman–Pearson methodology has been criticized by McCloskey (1985) as an example of statistical rhetoric. Fisher (1956, pp. 99–100) had already made a similar point:
An acceptance procedure is devised for a whole class of cases. No particular thought is given to each case as it arises, nor is the tester's capacity for learning exercised. A test of significance on the other hand is intended to aid the process of learning by observational experience. In what it has to teach each case is unique, though we may judge that our information needs supplementing by further observations of the same, or of a different kind. To regard the test as one of a series is artificial; the examples given have shown how far this unrealistic attitude is capable of deflecting attention from the vital matter of the weight of the evidence actually supplied by the observations on some possible theoretical view, to, what is really irrelevant, the frequency of events in an endless series of repeated trials which will never take place.
Finally, many followers of the Neyman–Pearson methodology pay lip service to its principles, without explicating the losses that will result from decisions, and without adopting Neyman and Pearson's framework of repeated sampling.
4.4 Neyman–Pearson and philosophy of science
Ronald Giere (1983) uses a decision theoretic framework very similar to
the Neyman–Pearson methodology (without ever mentioning them) for a theory of testing theoretical hypotheses in a hypothetico-deductive method. But Giere acknowledges that science does not always necessitate making practical decisions, in particular if the evidence is not conclusive. A decision-making approach for testing scientific theories should, therefore, be accompanied by a theory of satisficing à la Simon. This satisficing
approach enables the scientist to withhold judgement, if the risk of error
is valued more highly than the necessity to make a decision. The `institu-
tion of science' decrees the satisfaction levels (they may differ in different
disciplines). Although Giere's approach has the merit of being non-dog-
matic, it lacks clear guidelines.
Not only Giere has tried to encapsulate the Neyman–Pearson methodology. In one of his rare remarks on statistics, Lakatos (1978, p. 25) praises the merits of the Neyman–Pearson methodology. He relates it to the conjecturalist strategy in the philosophy of science:
Indeed, this methodological falsificationism is the philosophical basis of some of the most interesting developments in modern statistics. The Neyman–Pearson approach rests completely on methodological falsificationism.
Note, however, that the Neyman–Pearson approach to statistics has the purpose of guiding behaviour, not inference. It also predates Popper's Logik der Forschung (the first German edition appeared in 1934). Hence,
Lakatos is wrong. Moreover, Popper ([1963] 1989, p. 62) explicitly rejects
behaviourism:
Connected with, and closely parallel to, operationalism is the doctrine of beha-
viourism, i.e. the doctrine that, since all test-statements describe behaviour, our
theories too must be stated in terms of possible behaviour. But the inference is as
invalid as the phenomenalist doctrine which asserts that since all test-statements
are observable, theories too must be stated in terms of possible observations. All
these doctrines are forms of the verifiability theory of meaning; that is to say, of
induction.
It is, therefore, no surprise that Popper does not refer to the Neyman–Pearson methodology as a statistical version of his methodology of falsificationism. His reluctance to even mention Neyman and Pearson is not
`one of those unsolved mysteries in the history of ideas' as Blaug (1980,
p. 23) asserts, but is the consequence of Popper's strict adherence to
methodological falsificationism.
Neyman–Pearson methodology in practice, finally, is vulnerable to data-mining, an abuse on which the different Neyman–Pearson papers do not comment. It is implicitly assumed that the hypotheses are a priori given, but in many applications (certainly in the social sciences and econometrics) this assumption is not tenable. The power of a test is not uniquely determined once the test is formulated after an analysis of the data. Instead of falsificationism, this gives way to a verificationist approach disguised as Neyman–Pearson methodology. I will now turn to Popper's own ideas concerning probability.
5 Popper on probability
A good account of probability is essential to the practical relevance of
methodological fallibilism (see Popper, [1935] 1968, p. 146, cited above in
chapter 1, section 5.1). Has Popper been able to meet this challenge? This
section deals with Popper's theories of probability in order to answer the
posed question. His theories deal with von Mises' frequency theory in
conjunction with a critique of epistemological probability. Later, he
changed his mind and formulated a `propensity' theory which should
enable analysis of the probability of single events (instead of collectives).
Another change of mind was the formulation of a corroboration theory,
which shares some of the intentions of an epistemological theory but
suffers from much greater weaknesses.
5.1 Popper and subjective probability
Popper ([1935] 1968, p. 146) distinguishes two kinds of probability:
related to events and related to hypotheses. These may be interpreted as $P(e\,|\,h)$ and $P(h\,|\,e)$ respectively. The first is discussed in a framework similar to von Mises' frequency theory, the second is epistemological, and belongs to the realm of corroboration theory. Popper aims to repair the defects of von Mises' theory,¹⁵ and criticizes efforts (by,
among others, Reichenbach) to translate a probability of a hypothesis
into a statement about the probability of events (p. 257). A reflection on
Bayes' theorem shows that the radical proposition of Popper is unten-
able. Still, a brief discussion of Popper's views on the two separate
kinds of probability is useful for an appraisal of his philosophy of
science.
Despite his own verdict, Popper's contribution to von Mises' theory is,
at most, marginal. Indirectly, however, he may have triggered the solution to one of the formal weaknesses of von Mises' theory. In 1935, he discussed, at Karl Menger's mathematical colloquium in Vienna, the conceptual problem of the collective. Wald was one of the delegates (other participants included Gödel and Tarski) and, after hearing
Popper, he became interested in the problem. This resulted in the famous
paper in which Wald `repaired' the definition of collectives.
This positive contribution to the frequency theory of probability is
accompanied by a number of negative statements about the epistemolo-
gical theory of probability, in particular the logical theory of Keynes
(discussed in the next chapter). It is true that this theory is not without
problems, but Popper's arguments against it are not strong. Two exam-
ples, taken from Popper (1983), may give an impression of Popper's
argumentation.
The first example (p. 294) deals with Kepler's first law, which can be
summarized by the statements:
(a) x is an ellipse;
(b) x is a planetary orbit;
and the inference: $P(a\,|\,b) = 1$.¹⁶ This is the same as to say that all plane-
tary orbits will be ellipses. Now Popper identifies this statement with an observed (empirical) frequency, which he calls a probability. On the other hand, the a priori probability (or `logical frequency') of such a statement `must' be equal to zero given the infinity of alternative logical possibilities (see also Popper, [1935] 1968, p. 373). This zero a priori probability of the conjecture conflicts with the `true' frequentist probability which is nearly
equal to one. Because of the apparent contradiction, Popper rejects the
logical probability propositions.
But are prior probabilities necessarily equal to zero? This is an impor-
tant question indeed. In the next two chapters it is argued that Popper is
wrong. First, a probability is not just a number, but like time and heat
can only be measured with respect to a well-specified reference class. A
molecule is big compared with a quark, but may be small relative to
Popper's brain. Similarly, a particular theory is not probable in itself,
but more (or less) probable than a rival theory. Secondly, even given a
large number of conceivable rival theories in the reference class, there are
good arguments to assign different prior probability values to those
members, in relation to their respective simplicity (see chapter 5).
Continuing the example, an investigator (who even ignores the simpli-
city ranking) may believe that it is a priori equally probable that orbits
are squares or circles. This would assign a prior probability of one half to
the square theory versus one half for the circle theory. Combining these
prior probabilities with data will result in an a posteriori probability, and
it is no bold conjecture that the circle theory will out-perform the square
theory rather quickly. A next step, to improve the fit of a theory with the observations, is to introduce a refinement, say the ellipsoidal theory.
Again, the data will dominate the prior probabilities quickly. This is
how, in a simplistic way, probability theory can be applied to the apprai-
sal of theories without paradox.
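A toy version of this updating (a sketch added for illustration only; the `circle' and `square' theories are caricatured as two rival error distributions for the observed deviations from each predicted orbit, and all numbers are invented; numpy and scipy are assumed available):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    circle = stats.norm(0.0, 0.1)      # circle theory: small deviations from its orbit
    square = stats.norm(0.0, 1.0)      # square theory: large deviations
    data = circle.rvs(size=20, random_state=rng)   # observations generated near a circle

    # equal prior probabilities, so the posterior odds equal the likelihood ratio
    log_odds = circle.logpdf(data).sum() - square.logpdf(data).sum()
    posterior_circle = 1.0 / (1.0 + np.exp(-log_odds))
    print(round(posterior_circle, 6))  # very close to 1 after a handful of observations

Equal prior probabilities are overwhelmed by the data after a few observations, which is all the argument in the text requires; nothing hinges on the particular distributions assumed here.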
If, however, Popper's claim is accepted that the number of possible and
presently relevant conjectures is infinite and all are equally likely, then
another puzzle should be solved: how is it possible that scientists come up
with conjectures that are not too much at odds with the observed data?
Guessing is a dumb man's job, as Feynman said: inference must inevita-
bly be based on inductive evidence.
Consider a second example of Popper discussing probability. Popper
(1983, pp. 303–5) introduces a game, to be played between a frequentist
Popperian and an inductivist. It is called Red and Blue, and the
Popperian will almost certainly win. The idea is that you toss a coin,
win a dollar if heads turns up, or lose a dollar with tails. Now repeat
the game indefinitely, and look at the total gains. If they are zero or positive, they are called blue, red if negative. At the first two throws,
the player loses and from then on plays even. The game is repeated for
a total of 10,000 throws, with the last throw leading exactly to the border-
line between red and blue. Now, according to Popper, the inductivist
would gamble ten thousand to one that the next throw would keep him
in the red zone, while the keen Popperian frequentist would gratefully
accept this offer (or any offer with odds higher than one to one).
It is obvious that this game is flawed and surprising that Popper dares
to use it as an argument. Implicitly, Popper gives the frequentist knowl-
edge of the structure of the game, whereas this information is hidden from
the inductivist. The argument would collapse once information is distrib-
uted evenly. It cannot be used as an argument against inductivism, except
if Popper would agree that, in scientic inference, the frequentist knows
the laws of nature beforehand. The problem in science, of course, is that
these laws are unknown and have to be inferred from data.
Other objections of Popper to the subjective probability theory can
easily be dismissed. As a final example consider Popper's `paradox'
that, according to the subjective view, tossing a coin may start with an
a priori probability of heads of 1/2, and after 1,000,000 tosses of which
500,000 are heads, ends with an a posteriori probability of 1/2 as well,
which `implies' that the empirical evidence is irrelevant information
(Popper, [1935] 1968, p. 408). This means, according to Popper, that a
degree of reasonable belief is completely unaffected by accumulating
evidence (the weight of evidence), which would be absurd. It is, of course,
a malicious interpretation of subjective probability to identify a posterior
distribution with a posterior mean.
5.2 The propensity theory of probability
After writing the Logic of Scientic Discovery, Popper succumbed to the
temptation of von Mises' forbidden apple, the probability of single
events. Popper (1982, pp. 68–74) is interested in specific events related to the decay of small particles in quantum theory. The propensity theory, designed for this purpose, is an extension of the indifference theory, with a distant flavour of the frequency theory. A propensity is a (virtual)
relative frequency which is a physical characteristic of some entity
(p. 70). A propensity is also interpreted as an (objective) tendency to
generate a relative frequency (p. 71). As such, it is an a priori concept,
although Popper argues that it is testable by comparing it to a relative
frequency. But then the `collective' enters again, and it is unclear how the
outcome of one single event (for example, throwing heads) can be related
to the long-run frequency of events.
The propensity theory has not been very successful for the analysis of
small particle physics (see Howson and Urbach, 1989, pp. 223–5). Its
relevance for the social sciences, in particular econometrics, is small,
except if people's desires are based on inalienable physical characteristics.
Popper did not claim that the theory should be of interest for the social
sciences, and in fact, he was not interested in discussing his views on
probability with the probability theorists and statisticians at LSE.¹⁷ Hunger may be a physical stimulus to eating, but the choice between
bread or rice cannot be explained in such terms. If propensities are
understood as probabilistic causes, then they do not make sense as
inverse probabilities, because that conflicts with notions of causality. Summarizing, Popper's views on probability add few useful insights to existing `objective' theories of probability. Reading his sequence of changing interpretations of probability is an irritation. If methodological falsificationism must have probabilistic underpinnings (as Popper
acknowledges), then the best way forward is to accept either Fisher's
theory of inference, or the theory of von Mises as a substitute for
Popper's own views. But both acknowledge support of the inductivist
arguments, which is the ultimate sin of Popper's philosophy of science.
5.3 Corroboration and verisimilitude
A methodological theory that pretends to separate the wheat from the
chaff needs a method of theory appraisal. According to Popper, prob-
ability theory will not do. Falsifications count, not verifications. Probability theory (so it is asserted) cannot take care of this asymmetry. This raises two questions: first, what to do if two theories are as yet both unfalsified, and second, what to do if both are falsified. Popper's answer to the first question is to invoke the degree of corroboration, a hybrid of confirmation and content, discussed in Popper ([1935] 1968). The second
question is the source of Popper's later ideas on closeness to the truth,
verisimilitude.
A theory is said to be corroborated as long as it stands up against tests.
Some tests are more severe than others, hence there are also different
degrees of corroboration. The severity of tests depends on the degree of
testability, and that, in turn, is positively related to the simplicity of the
theory. The simpler a theory, Popper argues, the lower its a priori
(logical) probability (this is in sharp conflict with the ideas presented in chapter 5). The degree of corroboration of a theory which has passed severe tests is inversely related to the logical probability (p. 270), but is positively related to a degree of confirmation (the two make a hybrid
which leads at times to a muddle in Popper's argument, and also has led
to unnecessary arguments with Carnap).
Popper denotes the degree of confirmation of hypothesis h by evidence e by $C(h, e)$. What is the relation to the probability $P(h\,|\,e)$? Popper (pp. 396–7) argues that $C(h, e) \ne P(h\,|\,e)$. I will not interfere with his argument, but discuss how Popper wants to measure $C(h, e)$. In a footnote to the main text of his book (p. 263), he acknowledges that
It is conceivable that for estimating degrees of corroboration, one might find a formal system showing some limited formal analogies with the calculus of probability (e.g. with Bayes' theorem), without however having anything in common with the frequency theory [of Reichenbach].
The (later written) appendix of the Logic proceeds in this direction by endorsing (with a few addenda) an information-theoretical definition of corroboration (information theory dates back to the work of Claude Shannon and Norbert Wiener in 1948; see Maasoumi, 1987, for discussion and references):

$$C(h, e) = \log_2 \frac{P(e\,|\,h)}{P(e)} \qquad (10)$$

(where $\log_2$ denotes the logarithm with base 2). A good test is one that makes $P(e)$ small, and $P(e\,|\,h)$ (Fisher's likelihood) large. Popper's discussion is problematic, as he interprets e as a statistical test report stating a goodness of fit. This cannot be independent of the hypothesis in question, however. Popper ([1935] 1968, p. 413) finally links his measure of corroboration directly to the likelihood function. After much effort, we are nearly back to the dilemmas of Fisher's theory. The essential distinction with Fisher is Popper's (p. 418) emphasis that `$C(h, e)$ can be interpreted as a degree of confirmation only if e is a report on the severest tests we have been able to design'. That is, Popper invokes the goal of tests for the interpretation of the results, to exploit the asymmetry between falsification and verification. If two theories are falsified, then additional criteria are needed, which may be found in the theory of verisimilitude.
A confirmation of a highly improbable theory, one which implies a low $P(e)$, adds much to the degree of corroboration. That appeals to common sense (although Popper seems to confuse $P(e)$ and $P(h)$ in his discussion of `content', e.g. p. 411). But should we, therefore, do as much as possible to formulate theories with as low as possible a priori logical probabilities? Is, in the words of Levi (1967, p. 111), the purpose of inquiry to nurse our doubt and to keep it warm? The most sympathetic way to interpret Popper is to regard his degree of corroboration as the gain of information which a test may bring about. Then the goal of scientific inference is not to search for a priori highly improbable theories, but to gain as much information as possible. Counting another raven to corroborate the hypothesis that all ravens are black does not add much information; it is, for the sake of scientific progress, not very helpful and a rather costly way to corroborate the theory. Systematically confirming a prediction of a highly disputed theory, on the other hand, is worth the effort.
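A small calculation with invented numbers shows what the information-gain reading of (10) amounts to. If a further black raven was expected anyway, say $P(e) = 0.99$ with $P(e\,|\,h) = 1$, then $C(h, e) = \log_2(1/0.99) \approx 0.01$ bits; a confirmed prediction that was antecedently improbable, say $P(e) = 0.05$, yields $\log_2(1/0.05) \approx 4.3$ bits. The severity of a test, on this reading, is just the amount of information its outcome can deliver.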
In Popper ([1963] 1989, chapter 10), a new concept is added to his
theory of scientific inference in order to discuss matters of progress in science. Most scientists agree that empirical theories can never claim to have reached a perfect match with `truth'. But we do know that Newton's theory provides a better approximation to observable facts than Kepler's. Popper's new idea is the concept of the verisimilitude of a theory, a combination of closeness to the truth (empirical accuracy) and the content of a theory. The definition Popper gives seems appealing (p. 233): a theory is closer to the truth than another one, if the first has more true, and fewer false, consequences than the latter. This criterion satisfies Lakatos' demand for a `whiff of inductivism', to study scientific progress as a sequence of improvements rather than instant
crucial tests.
However, the definition of verisimilitude is untenable because false theories have equal verisimilitudes (see for discussion Watkins, 1984). A false theory (one with at least one false consequence) can never be closer to the truth than another false one. As no empirical theory (model) is perfectly flawless, this leads to the extreme scepticism of which Russell accuses the early Greek sceptics. Kuhn (1962, pp. 146–7) was right when he argued that
if only severe failure to fit justifies theory rejection, then the Popperians will require some criterion of `improbability' or of `degree of falsification.' In developing one they will almost certainly encounter the same network of difficulties that has haunted the advocates of the various probabilistic verification theories.
Popper's subsequent theories of probability are, in short, of little help for
completing methodological falsificationism. His critique of other theories
of probability is of little interest: weaknesses which are real have been
more effectively detected, analysed and sometimes repaired by proper
probability theorists.
6 Summary
How useful for econometric inference are the different interpretations of
probability given so far? The (early formulation of the) indifference the-
ory is clearly unsatisfactory as a foundation for scientific inference in
general and econometric inference in particular. Its relative, the propen-
sity theory, is also of little use for econometrics. In contrast, the fre-
quency interpretations presented in this chapter are widely accepted by
econometricians. They are not bothered by the circularity of the fre-
quency denition of probability.
Mainstream econometric methodology (as defined by the core chapters of textbooks like Chow, 1983; Johnston, [1963] 1984; Maddala, 1977; Hill, Griffiths and Judge, 1997; Goldberger, 1998) is a blend of von Mises', Fisher's and Neyman–Pearson's ideas. This blend embraces
the frequency concept of von Mises, his emphasis on objectivity, and
asymptotic arguments. The conditions that warrant application of von
Mises' theory (convergence and randomness) are rarely taken seriously,
however. The blend further relies heavily on Fisher's box of tools, and
sometimes takes his hypothetical sample space seriously. A careful con-
sideration of Fisher's essential concept of experimental design is rarely
made. The Neyman–Pearson legacy is the explicit formulation of alternative hypotheses, although the practical relevance of the power of
tests is small.
For the purpose of scientific inference in the context of non-experimental data, this blend provides weak foundations. The Neyman–Pearson approach is not attractive, as decision making in the context of repeated sampling is not even metaphorically acceptable. The frequentist highway to truth, paved with facts only, is still in need of a bridge. Not even a bridge to probable knowledge can be given, though. Fisher's fiducial bridge has a construction failure, and is rightly neglected in econometric textbooks and journals alike. Still, many empirical results that correspond to good common sense are obtained using the methods of Fisher and, perhaps, Neyman–Pearson.
Why are these methods so popular, if they lack sound justification? A pragmatic explanation probably is the best one. Fisher's maximum likelihood works like a `jackknife', Efron (1986, p. 1) argues:
the working statistician can apply maximum likelihood in an automatic fashion, with little chance (in experienced hands) of going far wrong and considerable chance of providing a nearly optimal inference. In short, he does not have to think a lot about the specific situation in order to get on toward its
solution.
Similarly, one of the strongest critics of Fisher's approach, Jeffreys
([1939] 1961, p. 393), disagrees on the fundamental issues but states, reassuringly, that there is rarely a difference in the actual inferences made when an identical problem is at stake. Weak foundations do not prevent
reasonable inference.
Alea jacta est. But economic agents do things other than throwing dice.
Perhaps the appropriate method of inference, although probabilistic, will
be different in kind from the view based on observed frequencies such as
throws of dice. Let us, therefore, cross our Rubicon, and move on to the
bank of epistemological probability.
Notes
1. Maddala (1977, p. 13) claims: `The earliest definition of probability was in
terms of a long-run relative frequency.' This claim is wrong: the relative
frequency theory of probability emerged in response to the indifference
theory.
2. The weak law relates to convergence in probability. Mood, Graybill and Boes
(1974, p. 232) and von Mises (1964, pp. 230–43) provide discussions of the
laws of large numbers of Khintchine and Markov which do not rely on the
restriction on the variance (see also Spanos, 1986, chapter 9). The strong law
relates to almost sure convergence.
3. Keynes ([1921] CW VIII, pp. 394–9) cites an amusing sequence of efforts to
`test' the law of large numbers. In 1849, the Swiss astronomer Rudolf Wolf
counted the results of 100,000 tosses of dice, to conclude that they were ill
made, i.e. biased (Jaynes, 1979, calculates the physical shape of Wolf's dice,
using his data). Keynes ([1921] CW VIII, p. 394) remarks that G. Udny Yule also `indulged' in this idle curiosity. Despite Keynes' mockery, Yule ([1911] 1929, pp. 257–8; the ninth edition of the book to which Keynes refers) con-
tains the statement: `Experiments in coin tossing, dice throwing, and so forth
have been carried out by various persons in order to obtain experimental
verication of these results . . . the student is strongly recommended to carry
out a few series of such experiments personally, in order to acquire confidence
in the use of the theory.'
4. The expression of the second law strongly resembles von Mises' ([1928] 1981,
p. 105) formulation of Bernoulli's theorem, which says: `If an experiment,
whose results are simple alternatives with the probability p for the positive
results, is repeated n times, and if $\varepsilon$ is an arbitrary small number, the probability that the number of positive results will be not smaller than $n(p - \varepsilon)$, and not larger than $n(p + \varepsilon)$, tends to 1 as $n$ tends to infinity.'
5. For any sequence, there is an admissible place selection that coincides with a place selection that is inadmissible, where admissibility is defined according to the definition stated in the randomness axiom.
6. A remarkable example is his research in human serology. In 1943, Fisher and
his research group made exciting discoveries with the newly found Rh blood
groups in terms of three linked genes, each with alleles. They predicted the
discovery of two new antibodies and an eighth allele. These predictions were
soon confirmed.
7. This phrase is addressed to Wald and Neyman. Von Mises does not appear
in Fisher's writings, not even in Fisher's (1956), his most philosophical
work.
8. Spanos (1989, p. 411) claims that this `axiom' of correct specification was first put forward by Fisher. Spanos also asserts that Tjalling Koopmans (1937) refers to it as `Fisher's axiom of correct specification'. Although Koopmans does discuss specification problems in his monograph, there is no reference to such an axiom. Indeed, it would have been surprising if Fisher had used the term `axiom' in relation to correct specification: his
style of writing is far remote from the formalistic and axiomatic language of
the structuralistic tradition. Fisher's hero was Darwin (or Mendel), not
Hilbert (or Kolmogorov). He was interested, for example, in pig breeding,
not axiom breeding (see the delightful biography by his daughter, Joan
Fisher Box). His scepticism towards axiomatics is revealed in letters to H.
Fairfield Smith and Harold Jeffreys, and an anecdote, all discussed by Barnard (1992, p. 8). Around 1950, after a lecture, Fisher was asked whether fiducial probability satisfied Kolmogorov's axioms. He replied: `What are they?' The occasion where Fisher makes use of the notion `axiom' is in ([1935] 1966, pp. 6–7) where he criticizes Bayes' `axiom' (of
a uniform prior probability distribution).
9. The definition in Fisher (1922, p. 309) is somewhat different: `A statistic satisfies the criterion of consistency, if, when it is calculated from the whole
population, it is equal to the required parameter.'
10. Fisher Box (1978, p. 147) quotes a classic 1926 paper of Fisher (`The arrange-
ment of field experiments'), which states that after randomization, `[t]he estimate of error is valid because, if we imagine a large number of different results obtained by different random arrangements, the ratio of the real to the estimated error, calculated afresh for each of these arrangements, will be actually distributed in the theoretical distribution by which the significance of
the result is tested.'
11. Explicit references by econometricians to fiducial inference are rare. Koopmans (1937, p. 64) is an exception. He discusses exhaustive statistics and compares this to Neyman's confidence intervals. He does not pursue the issue more deeply and concludes that `maximum likelihood estimation is here adopted simply because it seems to lead to useful statistics' (p. 65).
12. The following discussion is based on Lehmann ([1959] 1986, pp. 225–30).
13. Blaug is in good company. Statisticians of renown, such as Bartlett, have argued that fiducial distributions are like distributions that provide Neyman-type confidence intervals (see Neyman, 1952, p. 230, for references).
14. For any level of significance α, and for any non-zero prior probability of a hypothesis P(h_0), there is always a sample size n such that the posterior probability of the hypothesis equals 1 − α. Thus, a hypothesis that has strong a posteriori support from a Bayesian perspective may have very little support from a Fisherian point of view.
15. Popper met von Mises in Vienna (see Popper, 1976, chapter 20). Popper
(1982, p. 68) claims that he solved the ambiguities in von Mises' frequency
theory: `I feel confident that I have succeeded in purging it of all those
allegedly unsolved problems which some outstanding philosophers like
William Kneale have seen in it.'
16. In 1734, Daniel Bernoulli won a prize at the French Academy with an essay
on the problem of planetary motion. The problem discussed is whether the
near coincidence of orbital planes of the planets is due to chance or not.
Bernoulli's analysis is one of the first efforts to practise statistical hypothesis
testing.
17. Denis Sargan (private discussion, 1 December 1992). See also De Marchi
(1988, p. 33) who notes that Popper was not interested in discussions with
LSE economists, as they smoked and Popper did not. Later on, Lakatos did
try to get in touch with the LSE statisticians, but by then, according to
Sargan, they had lost their appetite.
4 Probability and belief
Probability does not exist.
Bruno de Finetti (1974, p. x)
1 Introduction
Probability in relation to rational belief has its roots in the work of Bayes
and Laplace. Laplace is known for his interest in the `probability of
causes', inference from event A to hypothesis H, P(H|A). This is known as inverse probability, opposed to direct probability (inference from hypothesis to events, P(A|H)). Inverse probability is part of episte-
mology, the philosophical theory of knowledge and its validation. The
acquisition of knowledge, and the formulation of beliefs, are studied in
cognitive science. Of particular interest is the limitation of our cognitive
faculties. In this chapter, the epistemological approaches to probability
are discussed.
The unifying characteristic of the different interpretations of epistemo-
logical probability is the emphasis on applying Bayes' theorem in order to
generate knowledge. All interpretations in this chapter are, therefore,
known as Bayesian. Probability helps to generate knowledge; it is part
of our cognition. It is not an intrinsic quality that exists independently of
human thinking. This is the message of the quotation from de Finetti,
featuring as the epigraph to this chapter: probability does not exist.
Probability is not an objective or real entity, but a construct of our minds.
This chapter first introduces the theory of logical probability of Keynes
(section 2) and Carnap (section 3). Section 4 deals with the personalistic
(`subjective') interpretation of probability theory. The construction of
prior probabilities, one of the great problems of epistemological prob-
ability, is the topic of section 5. Section 6 appraises the arguments.
2 Keynes and the logic of probability
The logical interpretation of probability considers probability as a
`degree of logical proximity'. It aims at assigning truth values other
than zero or one to propositions for inference of partial entailment. The
numbers zero and one figure as extreme cases. A probability of zero
indicates impossibility, a probability equal to one indicates the truth of
a proposition.
John Maynard Keynes is one of the founders of this approach. The
publication of A Treatise on Probability (Keynes, [1921] CW VIII) was an
important event in the history of the theory of probability and induction
(Russell, 1927, p. 280, writes that Keynes provides by far the best exam-
ination of induction known to him). Rudolf Carnap, Jaakko Hintikka
and Richard Jeffrey continued research on logical probability.
2.1 A branch of logic
Keynes ([1921] CW VIII) regards probability theory, like economics, as a
branch of logic. This logic is not the formal logic of Russell and
Whitehead, but a more intuitive logic of practical conduct (see
Carabelli, 1985). This theory of conduct is inspired by the Cambridge
philosopher, G. E. Moore, and can be traced back to the writings of
Hume.
The logical theory uses the word `probability' primarily in relation to
the truth of sentences, or propositions. The occurrence of events is of
interest only insofar as the probability of the occurrence of an event can
be related to the truth of the proposition that the event will occur.¹ A relative frequency is, therefore, not identical to a probability, although it
may be useful in deriving a probability. Jeffreys ([1939] 1961, p. 401)
provides the analogy that physicists once described atmospheric pressure
in terms of millimetres, without making pressure a length. Probability is a
rational degree of belief.
An important axiom of Keynes' system is his rst one:
Provided that a and h are propositions or disjunctions of propositions, and that h
is not an inconsistent conjunction, there exists one and only one relation of
probability P between a as conclusion and h as premiss. ([1921] CW VIII, p. 146)
This axiom expresses what Savage ([1954] 1972) calls the necessary view
of probability, which he attributes to Keynes as well as to Jeffreys.² In this sense, Keynes' theory is an objective one, as he already makes clear in
the beginning of his book (Keynes [1921] CW VIII, p. 4):
in the sense important to logic, probability is not subjective. It is not, that is to
say, subject to human caprice. A proposition is not probable because we think it
so. When once the facts are given which determine our knowledge, what is probable or improbable in these circumstances has been fixed objectively, and is independent of our opinion.
This view has been criticized by Ramsey, who defends a strictly subjec-
tive, or personalistic, interpretation of probability. Keynes ([1933] CW X,
p. 339) yielded a bit to Ramsey.³ Despite this minor concession, von Mises ([1928] 1981, p. 94) calls Keynes `a persistent subjectivist'. This interpretation may be induced by Keynes' definition of probability as a
degree of belief relative to a state of knowledge. However, persons with
the same state of knowledge must have the same probability judgements,
hence there is no room for entrenched subjectivity.
The most striking differences between Keynes and von Mises are:
• according to von Mises, the theory of probability belongs to the empirical sciences, based on limiting frequencies, while Keynes regards it as a branch of logic, based on degrees of rational belief; and
• von Mises' axioms are idealizations of empirical laws, whereas Keynes' axioms follow from the intuition of logic.
It is remarkable that such differences in principles do not prevent the two authors from reaching nearly complete agreement on almost all of the mathematical theorems of probability, as well as the potentially successful fields of the application of statistics!
2.2 Analogy and the principle of limited independent variety
A central problem in the Treatise on Probability is to obtain a probability
for a proposition (hypothesis) H, given some premise (concerning some
event or evidential testimony), A. The posterior probability of H given A
is given by Bayes' rule,

P(H|A) = P(A|H) P(H) / P(A).   (1)

In order to obtain this posterior probability, a strictly positive prior probability for H is needed: if P(H) = 0, then P(H|A) = 0 for all A. It is not obvious that such a prior does exist. If infinitely many propositions may represent the events, then without further prior information their logical a priori probability obtained from the `principle of indifference' goes to zero. In philosophy, this is known as the problem of the zero probability of laws in an infinite domain (Watkins, 1984).
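The force of this condition is easy to see in a short numerical sketch. The following Python fragment (a toy two-hypothesis setting with an invented likelihood ratio, not anything from Keynes or Watkins) applies Bayes' rule (1) repeatedly: a prior of exactly zero is never revised, however strong the evidence, whereas any strictly positive prior can be driven towards one.

```python
# A toy sequential application of Bayes' rule (equation 1); the two-hypothesis
# setting and the likelihoods 0.9 and 0.09 (a likelihood ratio of 10) are
# invented for illustration.

def bayes_update(prior_h, likelihood_h, likelihood_not_h):
    """Posterior P(H|A) = P(A|H)P(H) / P(A), with P(A) by total probability."""
    marginal = likelihood_h * prior_h + likelihood_not_h * (1.0 - prior_h)
    return likelihood_h * prior_h / marginal

for prior in (0.0, 1e-6, 0.5):
    p = prior
    for _ in range(10):                 # ten pieces of favourable evidence
        p = bayes_update(p, 0.9, 0.09)
    print(f"prior {prior:g}: posterior after 10 observations = {p:.6f}")
# A zero prior stays at zero no matter how strong the evidence; any strictly
# positive prior, however small, can eventually be driven towards one.
```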
Informative analogies are an important source for Keynesian probabil-
ity a priori. Hume already noted that analogy is needed in inductive
reasoning. Keynes ([1921] CW VIII, p. 247) cites Hume, who argues
that reasoning is founded on two particulars: `the constant conjunction
of any two objects in all past experience, and the resemblance of a present
object to any of them'. Keynes (p. 243) makes a crucial addition by
Probability and belief 69
introducing `negative analogy'. Counterparts to negative analogy in con-
temporary econometrics are identication and, one might argue, global
sensitivity analysis. Using the question `are all eggs alike?' as an example,
Keynes argues that the answer depends not only on tasting eggs under
similar conditions.
4
Rather, the experiments should not be `too uniform,
and ought to have differed from one another as much as possible in all
respects save that of the likeness of the eggs. [Hume] should have tried
eggs in the town and in the country, in January and in June.' New
observations (or replications) are valuable in so far as they increase the
variety among the `non-essential characteristics of the instances'. The
question then is whether in particular applications of inductive inference,
the addition of new data adds simultaneously essential plus non-essential
characteristics, or only non-essential characteristics. The latter provides a
promising area for inductive inference. The statement is in line with
another author's conclusion, that `by deliberately varying in each case
some of the conditions of the experiment, [we may] achieve a wider
inductive basis for our conclusions' (R. A. Fisher [1935] 1966, p. 102).
Keynes ([1921] CW VIII, p. 277) argues that, if every separate configuration of the universe were subject to its own governing law, prediction and induction would become impossible. For example, consider the regression equation,

y_i = a_1 x_{i,1} + . . . + a_k x_{i,k} + u_i,   (u_1, . . . , u_n) ~ (0, σ²),   i = 1, . . . , n.   (2)

In this equation every observation i of y_i might have its own `cause' or explanatory variable, x_j (k equals n). The same might apply to new observations, each one involving one or more additional explanatory variables. Alternatively, the parameters a might change with every observation, without recognizable or systematic pattern. These are examples of a lack of `homogeneity' where negative analogy does not hold and, hence, new observations do not generate inductive insights.⁵
The universe must
have a certain amount of homogeneity in order to make inductive infer-
ence possible.
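Keynes' worry about a universe without homogeneity can be mimicked in a stylized simulation (my own construction, with arbitrary sample sizes and a pure-noise data-generating process): when every observation in equation (2) is allowed its own `cause', so that k equals n, the equation fits the sample perfectly whatever the data, and the fit carries no inductive weight for new observations.

```python
# A stylized sketch of the k = n case in equation (2): every observation gets
# its own regressor, and y is pure noise by construction. Sample size and
# seed are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n = k = 30
X = rng.normal(size=(n, k))
y = rng.normal(size=n)                    # unrelated to X by construction

a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("largest in-sample residual:", np.max(np.abs(y - X @ a_hat)))  # ~ 0

X_new = rng.normal(size=(n, k))           # fresh observations
y_new = rng.normal(size=n)
mse_new = np.mean((y_new - X_new @ a_hat) ** 2)
print("out-of-sample mean squared error:", mse_new)
print("variance of the new data itself :", np.var(y_new))
# The sample is 'explained' perfectly, yet the out-of-sample errors are
# typically far larger than the variance of the new data: without
# homogeneity, a perfect fit yields no inductive insight.
```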
To justify induction, the fundamental logical premise of inference has
to be satised. The principle of limited independent variety serves this
purpose. It is introduced by Keynes in a chapter significantly entitled `The justification of these [inductive] methods'.⁶ Keynes (p. 279) defines the independent variety of a system of propositions as the `ultimate constituents' of the system (the indefinable or primitive notions) together with the `laws of necessary connection'.⁷ The eggs example may be used to clarify what Keynes had in mind. The yolk, white and age of the
egg are (perhaps not all) the ultimate constituents of the egg, while the
chemical process of taste may be interpreted as a law of necessary con-
nection. The chemical process is not different in space or time, and the
number of ultimate constituents is low. Hence, independent variety seems
limited.
If independent variety increases (for example by increasing the number
of regressors, changes in functional form, etc., in equation 2), induction
becomes problematic. Keynes argues that the propositions in the premise
of an inductive argument should constitute a high degree of homogeneity:
Now it is characteristic of a system, as distinguished from a collection of hetero-
geneous and independent facts or propositions, that the number of its premisses,
or, in other words, the amount of independent variety in it, should be less than the
number of its members.
Even so, a system may have finite or infinite independent variety. It is only with regard to finite systems that inductive inference is justified (p. 280). An object of inductive inference should not be infinitely complex, so `complex that its qualities fall into an infinite number of independent groups', i.e. generators or causes (p. 287). If there is reason to
believe that the condition of limited independent variety is met, then
inductive inference is, in principle, possible. Keynes dubs this belief the
inductive hypothesis. It is a sophisticated version of the principle of the
uniformity of nature.⁸ If independent variety is limited, the number of hypotheses h_i (i = 1, . . . , n) that may represent e is finite. Lacking other information, they are assigned a prior probability of 1/n in accordance with the principle of indifference (alternative measures are discussed in the next chapter).
Induction depends on the validity of the inductive hypothesis. The
question is how to assess its validity. This cannot be done on purely
logical grounds (as it pertains to an empirical matter), or on purely
inductive grounds (this would involve a circular argument).⁹ Keynes argues that there is no need for the hypothesis to be true. It suffices to attach a non-zero prior probability to it and to justify such a prior belief in each application. The domain of induction cannot be determined exactly by either logic or experience. But logic and experience may help to give at least some intuition about this domain: we know from experience that we may place some (though not perfect) confidence in the validity of limited independent variety in a number of instances.¹⁰ `To this extent', Keynes (pp. 290–1) argues, `the popular opinion that induction depends upon experience for its validity is justified and does not involve a circular argument'. But one should be careful with applying
probabilistic methods. In particular, Keynes' letter to C. P. Sanger dis-
cussing the problem of index numbers emphasizes this:
when the subject is backward, the method of probability is more dangerous than
that of approximation, and can, in any case, be applied only when we have
information, the possession of which supposes some very efficient method,
other than the probability method, available. ([1909] 1983 CW XI, p. 158)
As an example of such an alternative method, not yet available when
Keynes wrote these comments, one might think of R. A. Fisher's theory
of experimental design. Indeed, this is one of the few methods which has
proved to be effective in cases of inference where the list of causes is likely
to exceed the few captured in a regression model.
2.3 Measurable probability
Keynes has shown the possibility of obtaining logical prior probabilities,
albeit in a very restricted class of cases. How much help is this for infer-
ence? A distinctive feature of Keynes' view is that not all probabilities are
numerically measurable, and in many instances, they cannot even be
ranked on an ordinal scale. The `beauty contest', which has become
famous due to the General Theory but had occurred previously, for
another purpose, in the Treatise, is used to illustrate this point. Keynes
([1921] CW VIII, pp. 27–9) explains how one of the candidates of the
contest sued the organizers of the Daily Express for not having had a
reasonable opportunity to compete. Readers of the newspaper deter-
mined one part of the nomination. The nal decision depended on an
expert, who had to sample the top fifty of the ladies chosen by the readers.¹¹ The candidate complained in front of the Court of Justice that she
had not obtained an opportunity to make an appointment with this
expert. Keynes argues that the chance of winning the contest could
have been measured numerically, if only the response of the readers
(who sent in their appraisals and thus provided an unambiguous ranking
of the candidates) had mattered. The subjective taste of the single expert
could not be evaluated in a similar way. Hence, a rational basis for
evaluating the chances of the unfortunate lady was lacking. `Keynesian'
probability theory could not be used to estimate her loss.
The non-measurability of probabilities, the notion that very often
probabilities cannot be assigned numerical values and often can, at best, be ranked on an ordinal scale, is what I call Keynes' incommensur-
ability thesis of probability. This belongs to Keynes' most controversial
contributions to probability theory. For example, Jeffreys ([1939] 1961,
p. 16), who developed a probability theory similar to Keynes', proposes,
as the very rst axiom of his probability theory, that probabilities must be
comparable. Indeed, the empirical applicability of probability theory
would be drastically reduced if probabilities were incommensurable. A
rational discussion about different degrees of belief would be impossible
as soon as one participant claims that his probability statement cannot be
compared with another one's (the resulting problem of rational discus-
sion is similar to the utilitarian problem of rational redistribution given
ordinal utility). One should be wary, therefore, of dropping measurable
probability without proposing alternatives. More recently, statisticians
like B. O. Koopman and I. J. Good have tried to formalize probabilistic
agnosticism. Ed Leamer may be regarded as a successor in econometrics
(although he rarely refers to Keynes and Good, and never to Koopman).
2.4 Statistical inference and econometrics: Keynes vs Tinbergen
In the nal part of the Treatise, Keynes discusses statistical inference.
Whereas the logical theory is mainly directed towards the study of uni-
versal induction (concerning the probability that every instance of a gen-
eralization is true, e.g. the proposition that all swans are white), statistical
inference deals with problems of correlation. This deals with the prob-
ability that any instance of this generalization is true (`most swans are
white'; see Keynes, [1921] CW VIII, pp. 244–5). The latter is more intri-
cate than, but not fundamentally different from, universal induction,
Keynes argues (p. 447). The goal of statistical inference is to establish
the analogy shown in a series of instances. Keynes (p. 454) comes fairly
close to von Mises' condition of randomness, for he demands conver-
gence to stable frequencies from given sub-sequences. Von Mises, how-
ever, is more explicit about the requirements for this condition.
Among econometricians, Keynes is known as one of their rst oppo-
nents. This is due to his clash with Tinbergen in 1939. But Keynes had
been tempted by the statistical analysis of economic data himself when he
was working on A Treatise on Probability and employed at the London
India Office. In his first major article, published in 1909 in the Economic Journal (`Recent economic events in India', 1983, CW XI, pp. 1–22),
Keynes makes an effort to test the quantity theory of money by compar-
ing estimates of the general index number of prices with the index number
of total currency. The movements were surprisingly similar. In a letter to
his friend Duncan Grant in 1908, Keynes writes that his `statistics of
verication' threw him into a
tremendous state of excitement. Here are my theories – will the statistics bear them out? Nothing except copulation is so enthralling and the figures are coming out so much better than anyone could possibly have expected that everybody will believe
that I have cooked them. (cited in Skidelsky, 1983, p. 220)
Keynes himself remained suspicious of unjustied applications of statis-
tics to economic data. A prerequisite of fruitful application of statistical
methods is that positive knowledge of the statistical material under con-
sideration must be available, plus knowledge of the validity of the prin-
ciple of limited variety, knowledge of analogies and knowledge of prior
probabilities. These circumstances are rare. Keynes ([1921] CW VIII, p. 419) argues that
To apply these methods to material, unanalysed in respect of the circumstances of
its origin, and without reference to our general body of knowledge, merely on the
basis of arithmetic and of those of the characteristics of our material with which
the methods of descriptive statistics are competent to deal, can only lead to error
and to delusion.
They are the children of loose thinking, the parents of charlatanism. This
provides the basis for Keynes' ([1939] CW XIV) later objections to the
work of Tinbergen, to which I will turn now.
Tinbergen had made a statistical analysis of business cycles at the
request of the League of Nations. Tinbergen's (1939a) major justication
for relying on statistical analysis is that it may yield useful results, and
may be helpful in measuring the quantitative impact of different variables
on economic business cycles. Tinbergen combines Yule's approach of
multiple regression (see chapter 6, section 3) with a dash of Fisher's
views on inference. There is no reference to von Mises' Kollektiv or the
Neyman–Pearson theory of inductive behaviour, and the epistemological
interpretations of probability are totally absent. Although the title of his
book, Statistical Testing of Business Cycle Theories, suggests otherwise,
there is little testing of rival theories. Indeed, a statistical apparatus to test
competing, non-nested theories simply did not exist. The few reported
statistical significance tests are used to test the `accuracy of results'
(Tinbergen, 1939b, p. 12). They are not used as tests of the theories
under consideration.
Tinbergen is interested in importance testing, in his words: testing the economic significance. That is, he investigates whether particular effects have a plausible sign and are quantitatively important. If so, significance tests are used to assess the statistical accuracy or weight of evidence.¹²
The procedure for `testing' theories consists of two stages. First, theore-
tical explanations for the determination of single variables are tested in
single regression equations. Second, the equations are combined in a
system and it is considered whether the system (more precisely, the final reduced-form equation obtained after successive substitution) exhibits the fluctuations that are found in reality. In order to appraise the-
ories, Tinbergen combines the most important features of the different
theories in one `grand model' (in this sense, as Morgan, 1990, p. 120,
notes, Tinbergen views the theories as complementary rather than rivals:
it is a precursor to comprehensive testing). Tinbergen's inferential results
are, for example, that interest does not really matter in the determination
of investment or that the acceleration mechanism is relatively weak.
These are measurements, not tests.
The nal goal of Tinbergen's exercise is to evaluate economic policy.
This is done by using the econometric model as an experimental economy
(or `computational experiment', in modern terms). The coefficients in the
model may change as a result of changing habits or technical progress,
but Tinbergen (1939b, p. 18) writes,
It should be added that the coefficients may also be changed as a consequence of policy, and the problem of finding the best stabilising policy would consist in finding such values for the coefficients as would damp down the movements as
much as possible.
Keynes argues that Tinbergen does not provide the logical foundations
which would justify his exercise. Keynes' critique is directly related to the
Treatise on Probability and its emphasis on the principle of limited inde-
pendent variety as an essential premise for probabilistic inference. There
must be a certain amount of `homogeneity' in order to validate induction.
A major complaint of Keynes ([1939] CW XIV) is that Tinbergen's
(1939a) particular example, the explanation of investment, is the least
likely one to satisfy this premise, in particular, because of the numerous
reasons that change investors' expectations about the future, a crucial
determinant of investment. The method of mathematical expectations is
bound to fail (Keynes, [1939] CW XIV, p. 152). Induction is justied in
certain circumstances, only `because there has been so much repetition
and uniformity in our experience that we place great confidence in it'
(Keynes, [1921] CW VIII, p. 290). Those circumstances do not apply
generally, and Keynes' attitude towards probabilistic inference in the
context of economics is very cautious.
Another complaint of Keynes deals with Tinbergen's choice of time
lags and, more generally, the choice of regressors. In the rejoinder to
Tinbergen (1940), Keynes ([1940] CW XIV, p. 319) suggests a famous
experiment:
It will be remembered that the seventy translators of the Septuagint were shut up
in seventy separate rooms with the Hebrew text and brought out with them, when
they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical
material?
This rhetorical question has been answered by the postwar practice of
empirical econometrics: there is no doubt that specification uncertainty and the coexistence of rival, conflicting theories are deep problems that
have not yet been solved. Friedman (1940, p. 659), in a review of
Tinbergen's work, concurs with Keynes' objection:
Tinbergen's results cannot be judged by ordinary tests of statistical significance.
The reason is that the variables with which he winds up . . . have been selected
after an extensive process of trial and error because they yield high coefficients of
correlation.
Friedman adds a statement that reappears many times in his later writ-
ings: the only real test is to confront the estimated model with new data.
David Hendry (1980, p. 396) is not convinced that Keynes' objections
are fundamental ones:
Taken literally, Keynes comes close to asserting that no economic theory is ever testable, in which case, of course, economics ceases to be scientific. I doubt if Keynes intended this implication.
In fact, Keynes did. As Lawson (1985) also observes, Keynes is very clear
on the difference between the (natural) sciences and the moral `sciences',
such as economics, and indeed believes that (probabilistic) testing is of
little use in so changeable an environment as economics. Keynes ([1921]
CW VIII, p. 468) concludes his Treatise with some cautious support for
applying methods of statistical inference to the natural sciences, where the
prerequisites of fruitful application of statistical methods are more likely
to be satised:
Here, though I have complained sometimes at their want of logic, I am in funda-
mental sympathy with the deep underlying conceptions of the statistical theory of
the day. If the contemporary doctrines of biology and physics remain tenable, we
may have a remarkable, if undeserved, justication of some of the methods of the
traditional calculus of probabilities.
Applying frequentist probability methods in economics not only violates
logic, but is also problematic because of the absence of `limited indepen-
dent variety' in what would constitute the potentially most interesting
applications, such as business cycle research. In her discussion of the
debate, Morgan (1990, p. 124) claims that `[b]ecause Keynes ``knew''
his theoretical model to be correct, logically, econometric methods
could not conceivably prove such theories to be incorrect'. Keynes may
be accused of arrogance, but it is wrong to argue that he is as dogmatic as
suggested here. Morgan's admiration for Tinbergen blinds her to the
analytical argument of Keynes. The assertion that Tinbergen and other
`thoughtful econometricians' (p. 122) were already concerned with his
logical point does not hold: the debate between Tinbergen and
Keynes shows that they talk at cross purposes. Tinbergen never seriously
considers the analytical problem of induction or the justification of sam-
pling methods to economics. In Tinbergen's defence, one might note that
he was primarily interested in economic importance tests and much less in
probabilistic signicance tests. Hence, a probabilistic foundation was not
one of his most pressing problems.
3 Carnap and conrmation theory
3.1 Conrmation
Rudolf Carnap has tried to supersede Keynes' theory of logical probabil-
ity by proposing a quantitative system of inductive probability logic,
without endorsing Keynes' incommensurability thesis or his principle
of limited independent variety (see Carnap, 1963, pp. 972–5). Carnap
was one of the most important members of the Wiener Kreis, and he
made many fundamental contributions to the philosophy of logical posi-
tivism. Initially, he and his colleagues of the Wiener Kreis supported the
(von Mises branch of the) frequentist theory of probability. Keynes'
theory was thought to lack rigour; it was `formally unsatisfactory'.
Around 1941, Carnap reconsidered his views. His interest in the logical theory of probability was aroused by the influence of Wittgenstein and Waismann. He began to appreciate Keynes' and Jeffreys' writings (p. 71;
see Schilpp, 1963, for an overview of Carnap's philosophy).
In his early writings, Carnap attempted to show that theoretical state-
ments have to be based (by denition) on experience in order to be
`meaningful'. The verifiability principle (see chapter 1, section 5.1) can
then be used in order to demarcate `science' from `metaphysics'. Popper
pointed to the weaknesses of the verifiability principle. Instead of giving
up his positivist theory of knowledge, Carnap amended his theory by
relaxing the veriability principle. Here, I will deal with Carnap's
amended theory of knowledge.
Definitions based on experience do not yield meaningful knowledge, but the possibility of (partial) confirmation does. The basis for Carnap's theory (Carnap, 1950; 1952) is an artificial language of sentences along
the lines of Russell and Whitehead's Principia Mathematica. The goal is
to construct a confirmation function that can be used for inductive inference, which can figure as a common base for the various methods of
Fisher, Neyman–Pearson and Wald for estimating parameters and testing
hypotheses (Carnap, 1950, p. 518).
A confirmation function c(H, A) denotes the degree of confirmation of an hypothesis H by the evidence A. Carnap interprets this as the quantitative concept of a logical probability. He shows that a conditional probability P(H|A) (in the simplest case defined on a monadic predicate language L(A, n) which describes n individuals a_i (i = 1, . . . , n) who have or have not characteristic A) is indeed a measure of this degree.
The problem is to find the necessary prior probability by means of logical rules. For the given simple case, there are 2^n possible states or sentences. One solution is to invoke the principle of indifference, but this does not yield unique prior probabilities. Carnap (1950) proposes two logical constructs, c† (which bases the prior probability in the example on 2^n) and c* (where, for 0 ≤ q ≤ n, the prior probability depends on the number of combinations nCq). These constructs are discussed in Howson and Urbach (1989, pp. 48–55). Both rely on considerations of symmetry. If n → ∞, the logical probability shrinks to zero. In more complex examples it will be difficult to construct similar prior probabilities. If the
system is extended to a general `universe of discourse' (or sample space),
then it is unclear how the logical probabilities should be constructed.
Without further empirical information, the logically possible states of
the world have to be mapped into prior probability assignments, but
how this should be done remains unclear.
In a critical book-review of Carnap (1950) in Econometrica, Savage
(1952, p. 690) concludes that until the mapping problem is resolved,
`application of the theory is not merely inadequately justified; it is not even clear what would constitute an application'. Not surprisingly, Carnap's work did not gain a foothold in econometrics.¹³ The unifying
base for the various methods of estimating and testing was not recognized
by the econometric profession. Another critique has been expressed by
Popper ([1935] 1968, p. 150), who complains that the resulting probability
statements are non-empirical, or tautological. This is true for the con-
struction of priors, but it is not correct that Carnap has no theory of
learning from experience.
3.2 The generalized rule of succession
Carnap's theory is a generalization of Laplace's rule of succession, dis-
cussed in chapter 2, section 3. Let H and A be `well formed formulas'
describing an hypothesis H and evidence A for a given predicate language
L. In the case of casting a die, A_k denotes the outcomes of k throws of a die. The hypothesis H_i states that the next throw will be an outcome A_i. In the evidence, k′ throws have characteristic A_i. Carnap's degree of confirmation is summarized in the following equation (a derivation can be found in Kemeny, 1963, pp. 724–9):

c(H_i, A_k) = (k′ + z/κ) / (k + z),   (3)
where κ is the number of exclusive and exhaustive alternatives (e.g. κ = 2 in a game of heads and tails; κ = 6 in the case of throwing dice). The fundamental problem is how to choose z, an arbitrary constant which obeys 0 ≤ z ≤ ∞. It represents the logical factor by which beliefs are determined. The problem of choosing z has not been solved, as a unique optimal value for z does not exist. If the game is heads and tails and z is set equal to 2, then Laplace's rule of succession obtains. If z has a higher value, more evidence is needed to change one's opinion; if z grows to infinity, empirical evidence will not influence probability assignments. On the other hand, setting z = 0 yields the `straight rule'. In other words, z measures the unwillingness to learn from experience. It quantifies the trade-off between eagerness to increase knowledge, and aversion to taking risk by attaching too much weight to the first n outcomes of an experiment.
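A small computation shows how z governs the willingness to learn in equation (3). The function below uses the notation as reconstructed above (k throws, k′ of them favourable, κ exclusive alternatives); the particular numbers are chosen only for illustration.

```python
# A minimal sketch of Carnap's confirmation function, equation (3), in the
# reconstructed notation: k throws, k_prime of them favourable, kappa
# exclusive alternatives, z the logical weight. The numbers are illustrative.

def degree_of_confirmation(k_prime, k, kappa, z):
    """c(H_i, A_k) = (k' + z/kappa) / (k + z)."""
    return (k_prime + z / kappa) / (k + z)

k_prime, k, kappa = 7, 10, 2            # seven heads in ten tosses of a coin
for z in (0, 2, 10, 1000):
    c = degree_of_confirmation(k_prime, k, kappa, z)
    print(f"z = {z:>4}: c = {c:.3f}")
# z = 0 gives the straight rule k'/k = 0.700; z = 2 gives Laplace's rule
# (k' + 1)/(k + 2) = 0.667; a very large z keeps c near the indifference
# value 1/kappa = 0.5, so the evidence barely matters.
```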
The problem also shows up in Bayesian econometrics, where a choice
has to be made about the relative weights of prior and posterior informa-
tion before posterior probabilities, predictive densities, etc. can be calcu-
lated. In a discussion of some possible objections to using uniform prior
probabilities, Berger ([1980] 1985, p. 111) gives an example of an individual trying to decide whether a parameter θ lies in the interval (0, 1) or [1, 5). The statement of the problem suggests that θ is thought to be somewhere around 1:

but a person not well versed in Bayesian analysis might choose to use the prior π(θ) = 1/5 (on Θ = (0, 5)), reasoning that, to be fair, all θ should be given equal weight. The resulting Bayes decision might well be that θ ∈ [1, 5), even if the data moderately supports (0, 1), because the prior gives [1, 5) four times the probability of the conclusion that θ ∈ (0, 1). The potential for misuse by careless or unscrupulous people is obvious.
The warning is odd. There is no logical reason why such a choice is
careless or unscrupulous. If there is good reason for the choice of the
prior, it should have some weight in the posterior. The only thing a
Bayesian can do is to make a trade-off between the thirst for knowledge
and the aversion to risk (which happens to be one of the major goals of
Berger's book).
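Berger's example can be reproduced with a few lines of code. The sketch assumes, purely for illustration, a single observation x from a normal likelihood with unit variance and an observed value of 0.8; neither assumption is Berger's.

```python
# A numerical sketch of the uniform-prior example, assuming a normal
# likelihood x ~ N(theta, 1) and an observed x = 0.8 (both assumptions mine).
from scipy.stats import norm

x = 0.8                      # data that 'moderately support' theta in (0, 1)
# With the flat prior pi(theta) = 1/5 on (0, 5), posterior mass over an
# interval is proportional to the likelihood integrated over that interval.
mass_01 = norm.cdf(1 - x) - norm.cdf(0 - x)    # integral over (0, 1)
mass_15 = norm.cdf(5 - x) - norm.cdf(1 - x)    # integral over [1, 5)
total = mass_01 + mass_15
print("P(theta in (0,1) | x) =", round(mass_01 / total, 3))
print("P(theta in [1,5) | x) =", round(mass_15 / total, 3))
# Even though the likelihood peaks inside (0, 1), the posterior favours
# [1, 5): the 'fair' uniform prior quietly gives that interval four times
# the prior weight.
```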
Hintikka has extended Carnap's theory to more elaborate languages
(polyadic, infinite). He has solved some analytical problems. Hintikka is
able to reject Carnap's conclusion that universal statements that all indi-
viduals of a countable universe have some specic property must have a
prior and posterior probability equal to zero. Carnap's conclusion has
been more strongly expressed by Popper, who has tried to prove that the
logical probability of scientic laws must be zero. The fundamental pro-
blem, how to settle for appropriate prior probabilities on logical grounds,
has not been solved by Hintikka (see Howson and Urbach, 1989, and the
references cited there).
3.3 An assessment
Efforts to establish prior probabilities on purely philosophical grounds
have been of limited success. More recently, however, one of Carnap's
students, Ray Solomonoff, introduced information-theoretical argu-
ments to analyse the problem. In combination with earlier intuitions
of Harold Jeffreys, this led to a breakthrough in the literature on prior
probabilities and inductive inference. Section 5 of this chapter dis-
cusses various more recent contributions to the theory of probability
a priori.
The practical compromise of inference must be to select priors that
are robust in the sense that the posterior distribution is relatively insen-
sitive to changes in the prior. How insensitive is a matter of taste, as is the choice of z. The personalist theories of inference try to provide a
deeper analysis of the choice between risk aversion and information
processing.
Before turning to that theory, a nal issue of interest in Carnap's work
deserves a few words. This is the distinction between conrmability and
testability. A sentence which is conrmable by possible observable events
is testable if a method can be specied for producing such events at will.
Such a method is called a test procedure for the sentence. This notion of
testability corresponds, according to Carnap, to Percy Bridgman's prin-
ciple of operationalism (which means that all physical entities, processes
and properties are to be dened in terms of the set of operations and
experiments by which they are apprehended). Carnap prefers the weaker
notion of confirmability to testability, because the latter demands too
much. It seems that in econometrics, confirmability is the most we can
attain as well (see Carnap, 1963, p. 59). This will be illustrated in chapter
8 below.
4 Personalist probability theory
The formal foundations of the personalist theory of probability are due
to the French radical socialist and probability theorist Emile Borel (his
paper of 1924 can be found in Kyburg and Smokler, [1964] 1980) and
Keynes' pupil Frank Ramsey (1926). They base the theory of probability
on human ignorance, not innite sequences of events. Both hold that
probability should be linked with rational betting behaviour and evalu-
ated by studying overt behaviour of individuals. They differ from
Reichenbach (1935; [1938] 1976) in making subjective betting ratios the
primitive notions of their theories of probability, whereas Reichenbach
starts from relative frequencies and introduces the wager as an after-
thought (see, e.g., pp. 348–57, for his discussion of Humean scepticism,
induction and probability). Like Neyman and Pearson, the early person-
alist probability theorists were influenced by the emerging theory of
behaviourism.
Borel's legacy to econometrics is primarily the Borel algebra. His other
contributions to probability theory and mathematics are little known in
the econometric literature. Ramsey is still often cited in economic theory,
but hardly ever in econometrics. The philosophical writings of Borel and
Ramsey did not have an appreciable inuence on econometrics. That is
not entirely true for two more recent contributors to the personalist
theory of probability: de Finetti and Savage. Their work is discussed in
the following sections.
4.1 De Finetti, bookmaker of science
The key to every constructive activity of the human mind is, according to
Bruno de Finetti (1975, p. 199), Bayes' theorem. De Finetti belongs to the
most radical supporters of the personalist theory of probability. De
Finetti ([1931] 1989, p. 195) defends a strictly personalistic interpretation
of probability: `That a fact is or is not practically certain is an opinion,
not a fact; that I judge it practically certain is a fact, not an opinion.' He
outlines a theory of scientic inference, which is further developed in
more recent work (de Finetti, 1974; 1975). He proposes to escape the
straitjacket of deductive logic, and instead holds subjective probability
the proper instrument for inference. Furthermore, he rejects scientic
realism. In the Preface to his Theory of Probability, de Finetti claims
(in capitals) that PROBABILITY DOES NOT EXIST. Probability neither relates to truth, nor does it imply that probable events are indeterminate
(de Finetti, [1931] 1989, p. 178; 1974, p. 28). Probability follows from
uncertain knowledge, and is intrinsically subjective. It should be formu-
lated in terms of betting ratios.
There is one `objective' feature that probability assessment should
satisfy: coherence, which is a property of probability assignments:
Coherence. A set of probability assignments is coherent, if and only if
no Dutch Book (a bet that will result in a loss in every possible state of
the world) can be made against someone who makes use of these prob-
ability assignments.
Coherence is de Finetti's requirement for rationality. The probability
assignments of a rational individual (someone who behaves coherently)
are beyond discussion, as de Finetti's (1974, p. 175) theory is strictly
personalistic: `If someone draws a house in perfect accordance with the
laws of perspective, but choosing the most unnatural point of view, can I
say that he is wrong?' His answer is `no'.
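A concrete Dutch Book makes the definition tangible. The toy prices below (0.6 on A and 0.6 on not-A, with unit stakes) are an invented example of incoherent assignments, not taken from de Finetti.

```python
# A toy Dutch Book, using invented betting prices: an agent who prices a bet
# on A at 0.6 and a bet on not-A at 0.6 (incoherent, since they sum to 1.2)
# can be sold both bets and loses money in every state of the world.

price_A, price_not_A = 0.6, 0.6     # the agent's announced fair prices
stake = 1.0                         # each bet pays `stake` if it wins

for state in ("A occurs", "A does not occur"):
    payout = stake if state == "A occurs" else 0.0        # bet on A
    payout += stake if state != "A occurs" else 0.0       # bet on not-A
    net = payout - (price_A + price_not_A) * stake        # agent's net result
    print(f"{state}: agent's net gain = {net:+.2f}")
# In both states the agent pays 1.2 and collects exactly 1.0: a sure loss of
# 0.2, which is what the coherence requirement rules out.
```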
The Dutch Book argument is one of the pillars of modern Bayesianism.
It is a natural requirement for economists; many economic theorists accept Bayesian probability theory for this reason. Econometricians are less
convinced: its applicability to empirical research is limited (see also
Leamer, 1978, p. 40). Typically, it will be difficult to make a bet about scientific propositions operational. Worse, it may be undesirable or without scientific consequences. The more operational the bet, the more specific the models at stake will be, and hence, the less harmful a loss
will be for the belief of the loser in his general theory. Models are not
instances of theories, at least in economics. A related critique of book-
makers' probability has been expressed by Hilary Putnam (1963, p. 783):
I am inclined to reject all of these approaches. Instead of considering science as a
monstrous plan for `making book', depending on what one experiences, I suggest
that we should take the view that science is a method or possibly a collection of
methods for selecting a hypothesis, assuming languages to be given and hypoth-
eses to be proposed.
Good scientists are not book-makers, they are book writers. The question
remains how fruitful hypotheses emerge, and how they can be appraised,
selected or tested. Putnam does not answer that question. Neither does de
Finetti.
De Finetti follows Percy Bridgman in requiring an operational defini-
tion of probability (1974, p. 76). By this he means that the probability
denition must be based on a criterion that allows measurement. The
implementation is to test, by means of studying observable decisions of a
subject, the (unobservable) opinion or probability assignments of that
subject. Hence, to establish the degrees of belief in different theories of scientists, their observable decisions in this respect must be studied.
Whether this can be done in practice is questionable.
4.2 Exchangeability
An interesting aspect of de Finetti's work is his alternative for the formal
reference scheme of probability, discussed in the Intermezzo. One of the
basic notions of probability is independence, defined as a characteristic of events. For example, in tossing a fair coin, the outcome of the nth trial, x_n (0 or 1), is independent of the outcome of the previous trial:

P(x_n | x_{n−1}, x_{n−2}, . . .) = P(x_n).   (4)
This is a purely mathematical definition. De Finetti wants to abandon this characteristic because, in the case of independence, the positivist problem of inference (learning from experience) gets lost: if A and B are independent, then P(A|B) = P(A). Therefore, he introduces an alternative notion, exchangeability of events. It is defined as follows (de Finetti, 1975, p. 215):
Exchangeability. For arbitrary (but finite) n, the distribution function F(·, . . . , ·) of x_{h1}, x_{h2}, . . . , x_{hn} is the same no matter in which order the x_{hi} are chosen. Hence, it is not asserted that P(x_i | x_{h1}, x_{h2}, . . . , x_{hn}) = P(x_i), but

P(x_i | x_{h1}, . . . , x_{hn}) = P(x_j | x_{k1}, . . . , x_{kn})   for all (i, h_1, . . . , h_n) and (j, k_1, . . . , k_n),   (5)

as long as the same number of zeros and ones occur in both conditioning sets.
This is in fact the personalist version of von Mises' condition of randomness, with the difference that de Finetti has finite sequences in mind. Using this definition, de Finetti is able to establish the convergence of opinions among coherent agents with different priors. Note that it is not yet clear how to ascertain exchangeability in practice. Like von Mises' condition of randomness in the frequency theory of probability, the exchangeability of events must finally be established on subjective grounds.
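A Pólya urn provides a compact illustration of definition (5): its draws are exchangeable but not independent, so past outcomes do change the probability of the next one and learning from experience remains possible. The urn below (one ball of each colour to start, one ball of the drawn colour added after each draw) is my own example, not de Finetti's.

```python
# A sketch of exchangeable-but-dependent draws: a Polya urn starting with one
# 'success' ball and one 'failure' ball, adding a ball of the drawn colour
# after each draw. The urn parameters are illustrative assumptions.
from itertools import permutations

def sequence_prob(seq, successes=1, failures=1):
    """Probability of an exact 0/1 sequence of draws from the urn."""
    p, s, f = 1.0, successes, failures
    for outcome in seq:
        p *= (s if outcome == 1 else f) / (s + f)
        s, f = s + outcome, f + (1 - outcome)
    return p

# Exchangeability: every ordering of two ones and one zero has the same probability.
for seq in sorted(set(permutations((1, 1, 0)))):
    print(seq, round(sequence_prob(seq), 4))

# Dependence: the conditional probability of a one changes with the evidence.
p1 = sequence_prob((1,))                         # P(x1 = 1)
p11 = sequence_prob((1, 1)) / p1                 # P(x2 = 1 | x1 = 1)
print("P(x1=1) =", p1, " P(x2=1 | x1=1) =", round(p11, 4))
```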
4.3 The representation theorem
A by-product of exchangeability is the representation theorem. It is the
proposition that a set of distributions for exchangeable random quanti-
ties can be represented by a probabilistic mixture of distributions for the
same random quantities construed (in the objectivist, relative frequency
way) as independent and identically distributed. If the probability P(x_1, . . . , x_n) is exchangeable for all n, then there exists a prior probability distribution π(z) such that

P(x_1, . . . , x_n) = ∫_0^1 z^k (1 − z)^(n−k) π(z) dz,   (6)

where k is the number of ones among x_1, . . . , x_n.
This theorem provides a bridge between personalistic and objective inter-
pretations of probability. Exchangeability enables de Finetti to derive
Bernoulli's law of large numbers from a personalistic perspective.
Furthermore, if π(z) = 1, it is possible to derive Laplace's rule of succession from the definition of conditional probability and the representation theorem.
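This claim is easy to verify numerically. The sketch below evaluates the right-hand side of equation (6) with a uniform π(z) and forms the predictive probability of a further one given k ones in n trials, recovering Laplace's rule (k + 1)/(n + 2); the values of n and k are arbitrary.

```python
# A numerical check of the representation theorem (equation 6) with a uniform
# prior pi(z) = 1 on (0, 1); n and k below are arbitrary illustrative values.
from scipy.integrate import quad

def seq_prob(k, n):
    """P(a particular 0/1 sequence with k ones in n trials) under pi(z) = 1."""
    value, _ = quad(lambda z: z**k * (1 - z)**(n - k), 0.0, 1.0)
    return value

n, k = 10, 7
p_next_one = seq_prob(k + 1, n + 1) / seq_prob(k, n)   # predictive probability
print("predictive P(next = 1 | k ones in n) =", round(p_next_one, 4))
print("Laplace's rule (k + 1)/(n + 2)       =", round((k + 1) / (n + 2), 4))
```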
This is an important theorem, but there are some unsolved questions.
A technical problem is that a general representation theorem for more
complicated cases (such as continuous distributions) is not available. A
more fundamental problem is how two personalists, who have different opinions about a probability, can converge in opinion if they do not
agree about the exchangeability of events in a given sequence? There is no
`objective' answer to this question.
4.4 Savage's worlds
Savage's theory is an extension of de Finetti's work (Savage, [1954] 1972,
p. 4). They co-authored a number of articles and share the personalistic
perspective, but some differences remain. Savage bases his theory on
decision making (like de Finetti) but combines probability theory more
explicitly with expected (von Neumann–Morgenstern) utility theory.¹⁴ De Finetti does not directly rely on utility theory, which makes his view
vulnerable to paradoxes such as Bernoulli's `St. Petersburg paradox',
analysed in Savage (p. 93). Personalistic probability and utility theory
are the two key elements of Savage's decision theory. Savage continues
the tradition of Christiaan Huygens and Blaise Pascal, who were the first to derive theorems of expected utility maximization. Another particular
characteristic of Savage is his support of minimax theory (which is due to
von Neumann, introduced in statistics by Abraham Wald). Savage (p. iv)
notes that `personalistic statistics appears as a natural late development
of the Neyman–Pearson ideas'.
Savage's decision theory is positivistic in the sense that it is about
observable behaviour (but Savage acknowledges that real human beings
may violate the perfect rationality implied by his theory). He does not
analyse preference orderings of goods, or events, but orderings of acts.
You may assert preference of a Bentley over a Rolls Royce, but this is
meaningless if you never have the opportunity to make this choice in real
life.
Of particular interest among Savage's ideas about inference is his notion of small and large worlds (pp. 82–91). It is perhaps the most
complicated part of his book, and he acknowledged that it is `for want
of a better one' (p. 83) (readers who want to avoid a headache might skip
the remainder of this subsection). The intuition of the idea is simple: the
analytical straitjacket can be imposed on specific or `local' problems (the small world). However, when it turns out that the straitjacket really does not fit the problem, one has to shop in a larger world and buy a new
straitjacket which is better suited to the issue at stake.
Savage begins by defining a world as the object about which a person is concerned. This world is exhaustively described by its states s, s′, . . . Events A, B, C, . . . are defined as sets of states of the world. They are subsets of S, the universal event, or the grand world. It is a description of all possible contingencies. The true state of the world is the state that does in fact obtain.
Consequences of acting are represented by a set F of elements f, g, h, . . . Acts are arbitrary functions f, g, h from S to F. Act g is at least as good as or preferred to act f (i.e. f ≤ g) if and only if:

E(f − g) ≤ 0,   (7)
where E denotes the expected value of an act, derived from a probability
measure P. Given the axioms of choice under uncertainty, an individual
has to make once in his lifetime one grand decision for the best act,
contingent on all possible states of the world. Similarly, ideally an econ-
ometrician has to know exactly how to respond to all possible estimates
he makes in his effort to model economic relationships, and an ideal chess
player would play a game without further contemplation once the first
move is made. In practice, this is infeasible. Problems must be broken up
into isolated decision situations. They are represented by small worlds.
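A tiny worked instance of criterion (7) may be helpful. The states, probabilities and numerical consequences below are invented, and treating consequences directly as numbers is a simplification of Savage's set-up.

```python
# A minimal sketch of the preference criterion in equation (7), assuming two
# states of the world with invented probabilities and numerical consequences.

P = {"rain": 0.3, "shine": 0.7}          # a probability measure over states
f = {"rain": 5.0, "shine": 1.0}          # act f: consequence in each state
g = {"rain": 2.0, "shine": 4.0}          # act g

def expectation(act):
    return sum(P[s] * act[s] for s in P)

# f is (weakly) dispreferred to g exactly when E(f - g) <= 0.
diff = expectation({s: f[s] - g[s] for s in P})
print("E(f - g) =", round(diff, 2), "->", "f <= g" if diff <= 0 else "g <= f")
```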
A small world is denoted by S̄. It is constructed from the grand world S by partitioning S into subsets or small world states, s̄, s̄′, . . . A small world state is a grand world set of states, i.e. a grand world event. In
making a choice between buying a car and a bicycle, in a small world one
may contemplate contingencies such as the oil price and the expected
increase in trafc congestion. The large world contingencies are many,
perhaps innumerably many. Take, for example, the probability of becom-
ing disabled, the invention of new means of transportation, but also
seemingly irrelevant factors such as changes in the price of wheat or
the average temperature on Mars. The idea of the construction of a
small world is to integrate all `nuisance' variables out of the large
world model.
A small world event is denoted by B̄, which is a set of small world states in S̄. The union of the elements of B̄, ∪B̄, is an event in S. A small world is `completely satisfactory' if and only if it satisfies Savage's postulates of preference orderings under uncertainty, and agrees with a probability P̄ such that

P̄(B̄) = P(∪B̄) for all B̄ ⊆ S̄,   (8)

and has a small world utility Ū(f̄) of consequences of acts f̄ which is equal to the large world expected value of the small world consequences of acts, E(
t
may have been drawn' (p. 118). In 1933, the sample point E and sample
space W are added to their terminology.
15. See also ibid., p. 6 and pp. 13–14, where a design of experiments is related to
the `crucial experiment' in physics.
7 Econometric modelling
It takes a lot of self-discipline not to exaggerate the probabilities you
would have attached to hypotheses before they were suggested to you.
L. J. Savage (1962)
1 Introduction
The frequency interpretation of probability presumes that the statistical
model is given by underlying theory, or by experimental design (the `logic
of the laboratory'). The task of the statistician is to estimate the para-
meters and to test hypotheses.
Econometrics is different. The underlying economic theories to justify
a model are not well established. Hence, part of the inference concerns
the theory and specification of the model. The usually unique, non-experimental data (sample) are used for specification purposes; the specification thus derived is used for the purpose of inference. There is no
experimental design to justify distributional assumptions: if they are
investigated, the same data are used for this purpose. The pioneer of
econometrics, Tinbergen (1939b, p. 6), had already recognized that it is
not possible to work with a sequence of two separate analytical stages,
`rst, an analysis of theories, and secondly, a statistical testing of those
theories'. Modelling turned out to be an Achilles' heel for econometrics.
Moreover, one task of inference is theoretical progress, hence, the
search for new specifications. The probability approach, advocated by
Haavelmo and his colleagues at the Cowles Commission, is not suitable
for inference on rival theories and specications in a situation of unique,
non-experimental data. Section 2 deals with the Cowles methodology,
which long dominated `textbook econometrics' but has become increasingly less respected.
Specification uncertainty has provoked at least four alternative approaches to econometric modelling which will be discussed in this chapter. They are:
• `profligate' modelling based on Vector Autoregressions, which rejects `incredible' over-identifying restrictions (Sims; discussed in section 3)
• reductionism, emphasizing diagnostic testing (Hendry; section 4)
• sensitivity analysis, which quantifies the effect of specification searches on parameters of interest (Friedman, Leamer; section 5)
• calibration, or structural econometrics without probability (Kydland, Prescott; section 6).
The methodologies are ordered by the increasing emphasis each gives to economic theory. Note that the methodologies presented here do
not exhaust all methods of inference in economics. A serious contender is
experimental economics, which is becoming increasingly popular. It can
be regarded as a complement to econometrics, where the Fisher metho-
dology is, in principle, applicable (some merits of experimental economics
are discussed in chapter 9). The merits of maximum likelihood versus
method of moments and non- (or semi-)parametric methods are not
discussed here: I regard them as rival techniques, not methodologies.
Co-integration, which is a very important technique in modern econo-
metrics, again is not considered as another methodological response to
specification uncertainty (unlike Darnell and Evans, 1990).
2 The legacy of Koopmans and Haavelmo to econometrics
2.1 An experimental method?
Haavelmo and the other contributors to what now is known as the
`Cowles approach to econometrics' deserve praise for their efforts to
provide methodological foundations for econometrics. Haavelmo views
science in the following way (1944, p. 12): in the beginning, there is
`Man's craving for ``explanations'' of ``curious happenings'' '. There is
not yet systematic observation, the explanations proposed are often of
a metaphysical type. The next stage is the ` ``cold-blooded'' empiricists',
who collect and classify data. The empiricists recognize an order or sys-
tem in the behaviour of the phenomena. A system of relationships is
formulated to mimic the data. This model becomes more and more com-
plex, and needs more and more exceptions or special cases (perhaps
Haavelmo has the Ptolemaic system in mind), until the time comes to
start reasoning (the Copernican revolution?). This `a priori reasoning' (a
somewhat odd notion, in light of Haavelmo's starting point of collecting
data) leads to `very general and often very simple principles and
relationships, from which whole classes of apparently very different things
may be deduced'. This results in hypothetical constructions and deduc-
tions, which are very fruitful if, first, there are in fact `laws of Nature',
and, secondly, if the method employed is efficient (it goes without saying
that the next step of Haavelmo is to recommend the probabilistic
approach to inference). Finally, Haavelmo (p. 14) summarizes his view by
quoting Bertrand Russell favourably: `The actual procedure of science
consists of an alternation of observation, hypothesis, experiment and
theory.'
This view of the development of science may be a caricature, but not an
uncommon one, certainly not around 1944. It suggests that science devel-
ops from an initial ultra-Baconian phase to the hypothetico-deductive
method that became popular among the logical positivists in the 1940s
(see e.g. Caldwell, 1982, pp. 237). The so-called correspondence rules
that relate theoretical entities to observational entities in logical positi-
vism can be found in Haavelmo's emphasis on experimental design.
Despite the similarity between this view of Haavelmo on scientific devel-
opment, and the view of contemporaneous philosophers, it is unlikely
that Haavelmo really tried to apply the logical positivist doctrine to
econometrics. In most cases, his ultimate argument is the fruitfulness of
his approach, not its philosophical coherency.
Haavelmo regards econometrics as a means of inference for establish-
ing phenomenological laws in economics. Without referring to Keynes,
he (1944, p. 12) rejects the assertion that intrinsic instability of economic
phenomena invalidates such an approach: `a phrase such as ``In economic
life there are no constant laws'', is not only too pessimistic, it also seems
meaningless. At any rate, it cannot be tested' (Haavelmo, 1944, p. 12).
Economists should try to follow natural scientists, who have been suc-
cessful in choosing `fruitful ways of looking upon physical reality', in
order to find the elements of invariance in economic life (p. 13; see also
pp. 15-16). Haavelmo (pp. 8-9) considers a model as
an a priori hypothesis about real phenomena, stating that every system of values
that we might observe of the `true' variables will be one that belongs to the set of
value-systems that is admissible within the model . . . Hypotheses in the above
sense are thus the joint implications (and the only testable implications, as far
as observations are concerned) of a theory and a design of experiments. It is then
natural to adopt the convention that a theory is called true or false according as
the hypotheses implied are true or false, when tested against the data chosen as
the `true' variables. Then we may speak, interchangeably, about testing hypoth-
eses or testing theories.
This suggests a one-to-one relation between hypotheses or models and
theories. However, economic models are not instances of economic theories
(Klant, 1990). Haavelmo's argument is, therefore, invalid.
Haavelmo rejects a Baconian view of scientific inference: it is not permissible
to enlarge the `model space' (the set of a priori admissible
hypotheses) after looking at the data. What is permissible is to start
with a relatively large model space and select the hypothesis that has the
best fit (1944, p. 83). But, if we return to Keynes' complaints, what if the a
priori model space is extremely large (because there is no way to constrain
the independent variation in a social science, i.e. too many causes are at
work and, moreover, these causes are unstable)? Haavelmo has no clear
answer to this question. He holds that `a vast amount of experience'
suggests that there is much invariance (autonomy) around, but does
not substantiate this claim. Neither does he resort to Fisher's solution
to this problem, randomization.
Haavelmo is more `liberal' than Koopmans, whose insistence on the
primacy of economic theory has become famous (see in particular his
debate with Vining). Koopmans (1937, p. 58) had already argued that
the significance of statistical results depends entirely on the validity of the
underlying economic analysis, and asks the statistician
for a test affirming the validity of the economist's supposed complete set of
determining variables on purely statistical evidence [which] would mean, indeed,
[turning] the matter upside down.
The statistician may be able, in some cases, to inform the economist that
he is wrong, if the residuals of an equation do not have the desired
property. Haavelmo adds to this stricture that the statistician may formulate
a theory `by looking at the data', the subject of the next sections
of this chapter.
What can be said about the effectiveness of Haavelmo's programme
for econometrics? Some argue that it revolutionized econometrics.
Morgan (1990, p. 258) claims:
By laying out a framework in which decisions could be made about which theories
are supported by data and which are not, Haavelmo provided an adequate experi-
mental method for economics.
The word `experimental' has been abused by Haavelmo and used in an
equally misleading manner by Morgan. The revolution, moreover, may
be regarded as a mixed blessing: the probability approach to econo-
metrics has led to a one-sided and often unhelpful emphasis on a limited
set of dogmas. Modigliani recalls (in Gans and Shepherd, 1994, p. 168)
how one of his most influential papers was rejected by Haavelmo, then
editor of Econometrica. This was the paper which introduced the
Duesenberry-Modigliani consumption function. Haavelmo's argument
was that the important issue at the time was not inventing new economic
hypotheses, but improving estimation methods for dealing with simultaneity.
Those were needed to refine the structural econometrics programme,
which took a Walrasian set of simultaneous equilibrium
equations as a given starting point for the econometric exercise of estimating
the values of the `deep parameters' involved.
2.2 Structure and reduced form
The most important feature of the Cowles programme in econometrics is
the just-mentioned `structural' approach. This takes the (romantic)
view that, in economics, everything depends on everything.
Economic relations are part of a Walrasian general equilibrium system.
Estimating single equations may lead to wrong inferences if this is
neglected.
All introductory econometric textbooks spend a good deal of space on
this problem. An elementary example is consumer behaviour, where con-
sumer demand c is a function of a price p and income y. If supply also
depends on those variables, then the result is a system of equations. In
general terms, such a system can be represented by endogenous variables
$y = y_1, y_2, \ldots, y_m$, exogenous variables $x = x_1, x_2, \ldots, x_n$, and disturbances
$u$. The system can be written as:
$$\begin{aligned}
\gamma_{11} y_{1t} + \gamma_{21} y_{2t} + \cdots + \gamma_{m1} y_{mt} + \beta_{11} x_{1t} + \cdots + \beta_{n1} x_{nt} &= u_{1t}\\
\gamma_{12} y_{1t} + \gamma_{22} y_{2t} + \cdots + \gamma_{m2} y_{mt} + \beta_{12} x_{1t} + \cdots + \beta_{n2} x_{nt} &= u_{2t}\\
&\vdots\\
\gamma_{1m} y_{1t} + \gamma_{2m} y_{2t} + \cdots + \gamma_{mm} y_{mt} + \beta_{1m} x_{1t} + \cdots + \beta_{nm} x_{nt} &= u_{mt}
\end{aligned} \qquad (1)$$
or, in matrix notation,
$$y_t' \Gamma + x_t' B = u_t'. \qquad (2)$$
In this case of a system of equations, a regression of the variables in the
first equation of the system does not yield meaningful information on the
parameters of this equation. One might solve the system of equations for
$y_t$ (given that $\Gamma$ is invertible):
$$y_t' = -x_t' B \Gamma^{-1} + u_t' \Gamma^{-1}, \qquad (3)$$
in which case regression of $y_t$ on $x_t$ does yield information on the reduced
form regression coefficients $-B\Gamma^{-1}$, but not on the separate parameters of
interest in the structural equations.
For such information, one needs to satisfy the identification conditions.
The most well-known type of condition is the (necessary, but not
sufficient) order condition, which says that the number of excluded
variables from equation j must be at least as large as the number of
endogenous variables in that equation.¹ The choice of the exclusion
restrictions may be inspired by economic theory. The Cowles methodology
is based on the valid provision of such restrictions.
Equation (3) is useful for obtaining the reduced form parameters
$\Pi = -B\Gamma^{-1}$. In very specific circumstances (exact identification of an equation),
an ordinary least squares regression of the reduced form can be used for
consistent estimation of the structural parameters (this is known as indirect
least squares). In general, however, this is not possible and more
advanced techniques are needed. The favoured method of the Cowles
econometricians is full information maximum likelihood. Because of its
computational complexity (and also its sensitivity to specification error),
other methods have been proposed for estimating equations with endogenous
regressors. Two stage least squares (due to Theil) or the method
of instrumental variables is the most popular of those methods.
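To make the contrast between the structural and the reduced form concrete, the following sketch (not from the original text; the simulated demand/supply system, its parameter values and all variable names are illustrative assumptions) shows why OLS applied directly to a structural equation is inconsistent under simultaneity, while a simple two stage least squares estimator recovers the structural slope.

# Sketch: simultaneity bias of OLS versus two stage least squares (2SLS).
# The two-equation system and its parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
b_true = -1.0                      # structural (demand) slope on price

z = rng.normal(size=n)             # exogenous supply shifter, used as instrument
u_d = rng.normal(size=n)           # demand disturbance
u_s = rng.normal(size=n)           # supply disturbance

# Reduced form implied by the structural system: price depends on the exogenous
# variable and on both disturbances, which is the source of the simultaneity bias.
p = 0.5 * z + 0.5 * (u_s - u_d)
q = b_true * p + u_d

def ols(y, x):
    X = np.column_stack([np.ones_like(y), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS slope: ", ols(q, p)[1])                      # biased away from -1
p_hat = np.column_stack([np.ones_like(z), z]) @ ols(p, z)
print("2SLS slope:", ols(q, p_hat)[1])                  # close to -1

The first stage here is exactly a reduced form regression of the endogenous regressor on the exogenous variable, which is why identification hinges on the availability of such excluded exogenous variables.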
2.3 Theory and practice
The degree to which the ideas of the Cowles Commission have really been
practised in applied econometrics is limited: in theory, they were widely
endorsed; in practice, econometricians tried to be pragmatic. There are
a number of reasons for this deviation. First, the Neyman-Pearson methodology
does not apply without repeated sampling. Second, the Cowles
Commission emphasized errors in equations (simultaneous equations
bias) at the cost of errors in variables (ironically the main topic of
Koopmans, 1937). In empirical investigations, simultaneous equations
bias appears to be less important than other sources of bias; moreover,
methods to deal with simultaneity (in particular full information maximum
likelihood) are either less robust than single equation methods or
difficult to compute (see Fox, 1989, pp. 60-1; Gilbert, 1991b).
Third, the view that the theory (or `model space') should be specified
beforehand is not very fruitful. It is hardly avoidable that the distinction
between model construction and model evaluation is violated in empirical
econometrics (Leamer, 1978; Heckman, 1992; Eichenbaum, 1995). It is
also not desirable. I concur with Eichenbaum (p. 1619), who argues that
the programme has proven `irrelevant' for inductive aims in economics. If
the set of admissible theories has to be given first (by economic theory)
before the econometrician can start his work of estimation and testing,
then data can do little more than fill in the blanks (i.e. parameter values).
In practice, there is and should be a strong interaction between theory
and empirical models, as this is the prime source of scientific progress.
Only recently, the Cowles Commission sermon seems to have gained
practical relevance, due to the new-classical search for `deep parameters',
continued in real business cycle analysis (see the discussion on calibra-
tion, section 6). This approach may be accused of overshooting in its faith
in economic theory, which leads to a `scientific illusion' (Summers, 1991).
The criticism of Summers is of interest, not because it is philosophically
very sound (it isn't), but because it shows some weaknesses of formal
econometrics in the hypothetico-deductive tradition: in particular, the
lack of empirical success and practical utility.² In a nutshell, Summers
(following Leamer) argues for more sensitivity analysis, exploratory data
analysis and other ways of `data-mining' that are rejected by the Cowles
econometricians.
The work of the early econometricians relies on a blend of the views of
von Mises, Fisher and Neyman-Pearson. Von Mises' fundamental justification
of the frequency theory was endorsed without taking the imperative
of large samples seriously. Fisher, on the other hand, did not hesitate
to use statistics for small sample analysis. This helped Fisher's popularity
among econometricians, although Fisher had little interest in econometrics.
His theory of experimental design, which he considered crucial,
has been ignored in econometrics until very recently. Similarly, the
Neyman-Pearson method of quality control in repeated samples is
hard to reconcile with econometric practice. If there ever was a `probability
revolution' in econometrics (as Morgan, 1990, holds), it was based
on an arguably deliberate misconception of prevailing probability theory.
It yielded Haavelmo a Nobel memorial prize for economics in 1989.
3 Let the data speak
3.1 Incredible restrictions
Liu (1960) formulated a devastating critique of the structural econo-
metrics programme of the Cowles Commission. Consider the following
investment equation to illustrate the point (p. 858, n. 8), where $y$ is investment,
$x_1$ profits, $x_2$ capital stock, $x_3$ liquid assets, and $x_4$ the interest rate:
$$y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + u. \qquad (4)$$
Now suppose that the estimates for $a_2$ and $a_4$ yield `wrong' signs, after
which $x_4$ is dropped. The new regression equation is:
$$y = a'_0 + a'_1 x_1 + a'_2 x_2 + a'_3 x_3 + u'. \qquad (5)$$
However, unknown to the investigator, the `true' investment equation is a
more complex one, for example:
$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 + u^{+}. \qquad (6)$$
Textbook econometrics teaches that the expected values of the vectors of
parameters $a$ and $a'$ are functions of the `true' parameter vector, $b$.
Those functions depend on the size of the parameters and the correlation
between the omitted and included regressors. The simplification from (4)
to (5) may be exactly the wrong strategy, even if the deleted variable is
insignificant in (4). Liu concludes that over-identifying restrictions in
structural econometrics are very likely to be artifacts of a specification
search and not justifiable on theoretical grounds. The investment equation
is not identified, despite the apparent fulfilment of the identifying
conditions. Liu's advice is to concentrate on reduced form equations,
particularly if prediction is the purpose of modelling.³ His scepticism
on parametric identification remained largely ignored, until Sims (1980)
revived it forcefully.
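The mechanics behind Liu's warning can be illustrated with a small simulation (a sketch, not from the book; all coefficient values and the correlation structure are illustrative assumptions): when a relevant, correlated regressor is dropped, the remaining estimates absorb part of its effect.

# Sketch of omitted-variable bias: dropping a relevant regressor that is
# correlated with a retained one shifts the retained coefficient estimate.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x1 = rng.normal(size=n)
x4 = 0.8 * x1 + rng.normal(scale=0.6, size=n)    # x4 correlated with x1
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x4 + u                # `true' equation includes x4

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("full model, coefficients on x1 and x4:", ols(y, x1, x4)[1:])
print("x4 dropped, coefficient on x1:        ", ols(y, x1)[1])
# The second estimate is roughly 2.0 + 0.8 * 1.5: the omitted variable's effect is
# loaded onto x1 in proportion to their correlation, as the text describes.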
Sims' title, `Macroeconomics and reality', mimics the title of chapter 5
of Robbins' ([1932] 1935), `Economic fluctuations and reality'. Both
authors are sceptical of prevailing econometric studies of macro-economic
fluctuations. Whereas Robbins rejects econometrics altogether,
Sims rejects econometric practice and suggests an alternative econometric
methodology.
Sims (1980, p. 1) observes that large-scale statistical macro-economic
models `ought to be the arena within which macroeconomic theories
confront reality and thereby each other'. But macro-econometric models
have lost much support since the 1970s. The reason, Sims argues, is the same
one as Liu had already indicated: the models are mis-specified and suffer
from incredible identifying assumptions. As an alternative, he proposes
to `let the data speak for themselves' by reducing the burden of maintained
hypotheses imposed in traditional (structural, Cowles-type) econometric
modelling. This is the most purely inductivist approach taken in
econometrics, a Baconian extension of the Box-Jenkins time series methodology
(with its roots in earlier work of Herman Wold). The difference
from Liu and Box-Jenkins methods is that Sims proposes to respect the
interdependent nature of economic data by means of a system of equations.
This is done by introducing an unrestricted Vector Autoregression
(VAR), in which a vector of variables y is regressed on its past
realizations:
$$A(L)\, y_t = u_t, \qquad (7)$$
where L denotes the lag operator. This is also known as an unrestricted
reduced form of a system of equations. Parametric assumptions are
needed to make (7) operational. Sims (1980) specifies it as a linear
model. This assumption is not innocuous and has been rightly criticized
by Leamer (1985). Non-linearities may be important if policy ventures
outside the historical bandwidth (the kinds of `historical events' that
Friedman uses for non-statistical identification of causes of business
cycle phenomena). A VAR approach may easily underestimate the
importance of such historical events (but the same criticism applies to
the mainstream approach criticized by Sims as well as to Leamer's own
empirical work).
The linear version of the VAR for the n-dimensional vector $y$ can be
formulated as
$$y_t = \sum_{i=1}^{L} A_i y_{t-i} + u_t, \qquad (8)$$
with
$$E(u_t u_{t-i}') = \Sigma \quad \text{if } i = 0, \qquad = 0 \quad \text{otherwise.}$$
For simplicity, the variance-covariance matrix is assumed to be constant
over time. In theory, the lag order L may be infinite. If the vector stochastic
process $y_t$ is stationary, it can be approximated by a finite order
VAR. If $u_t$ follows a white-noise process, the VAR can be estimated
consistently and efficiently by OLS regression (Judge et al., [1980] 1985,
p. 680).
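As a concrete illustration of equation (8), the sketch below (not from the book; the simulated system and its dimensions are illustrative assumptions) estimates an unrestricted VAR equation by equation with OLS, which is consistent and efficient under the white-noise assumption just mentioned.

# Sketch: estimate an unrestricted VAR(L) by ordinary least squares.
# The simulated VAR(2) and its dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_vars, lags, T = 3, 2, 400

# Simulate a stable VAR(2): y_t = A1 y_{t-1} + A2 y_{t-2} + u_t
A = [0.4 * np.eye(n_vars), 0.2 * np.eye(n_vars)]
y = np.zeros((T, n_vars))
for t in range(lags, T):
    y[t] = sum(A[i] @ y[t - 1 - i] for i in range(lags)) + rng.normal(size=n_vars)

# Regressors: a constant plus `lags' lagged values of every variable.
X = np.column_stack([np.ones(T - lags)] +
                    [y[lags - 1 - i:T - 1 - i] for i in range(lags)])
Y = y[lags:]

# OLS for all equations at once; the slope coefficients number n_vars**2 * lags,
# which is the parameter count discussed in section 3.2 below.
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("estimated A1:")
print(coef[1:1 + n_vars].T.round(2))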
3.2 Profligate modelling
The VAR approach is radically different from the plea of Hendry (discussed
below) to `test, test and test'. VAR econometricians are rarely
interested in significance tests, apart from testing for the appropriate lag
length (by means of a likelihood ratio test or an information criterion)
and `Granger causality' tests. Lag length indeed is an important issue.
The VAR contains $n^2 L$ free parameters to be estimated. A modest VAR
for five variables with five lags yields 125 parameters. Estimating unrestricted
VARs easily results in over-fitting. This is at odds with the theory
of simplicity presented in chapter 5. Instead of parsimonious modelling,
Sims (1980, p. 15) favours a strategy of estimating `profligately'. This
makes predictions less reliable, as the model does not separate `signal'
from `noise'. Hence, unrestricted VARs (such as in Sims, 1980) tend to be
sample-dependent (idiosyncratic) and are, therefore, of limited interest.
Zellner rightfully calls such models `Very Awful Regressions'. Further
reduction or the addition of prior information is desirable for obtaining
meaningful results.
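The lag-length choice mentioned above can be illustrated with an information criterion (a sketch, not from the book; the bivariate VAR(1) used to generate the data and the maximum lag considered are illustrative assumptions):

# Sketch: choosing a VAR lag order with an information criterion (here AIC).
import numpy as np

rng = np.random.default_rng(5)
T, n = 300, 2
y = np.zeros((T, n))
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + rng.normal(size=n)   # the simulated order is 1

def var_aic(y, lags):
    T, n = y.shape
    X = np.column_stack([np.ones(T - lags)] +
                        [y[lags - 1 - i:T - 1 - i] for i in range(lags)])
    Y = y[lags:]
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    sigma = resid.T @ resid / len(Y)             # residual covariance matrix
    n_coef = n * X.shape[1]                      # roughly n**2 * lags free parameters
    return np.log(np.linalg.det(sigma)) + 2 * n_coef / len(Y)

print({p: round(var_aic(y, p), 3) for p in range(1, 6)})
# A likelihood ratio test between successive lag orders is the other common device.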
The latter option is discussed in Doan, Litterman and Sims (1984), who
advocate a Bayesian view on forecasting. These authors argue that exclu-
sion restrictions (setting particular parameters a priori equal to zero)
reflect an unjustified degree of absolute certainty, while prior information
on behalf of the retained variables is not incorporated. Hence, the tradi-
tional approach to structural econometrics is a mixture of imposing too
much and too little prior information. Bayesian VARs may provide a
useful compromise. However, specifying an appropriate prior distribu-
tion for a complicated simultaneous system is hard, as, for example, is
revealed in Doan, Litterman and Sims (1984). This has hampered the
growth of the VAR research programme. A second problem that
impeded the programme was the return of incredible restrictions via
the back door. I will turn to this issue now.
3.3 Innovation accounting
In order to explain macro-economic relationships, proponents of the
VAR methodology decompose variances of the system to analyse the
effect of innovations to the system. If there is a surprise change in one
variable, this will have an effect on the realizations of its own future as
well as on those of other variables both in the present and the future. This
can be simulated by means of impulse response functions. The moving
average representation of the VAR is:
$$y_t = \sum_{i=1}^{L} B_i u_{t-i}. \qquad (9)$$
For `innovation accounting', as it is called, one needs to know how an
impulse via $u_t$ affects the future of the variables in $y$. A problem is that
there are many equivalent representations of (9) (as one may replace $B_i$
by $B_i G$ and $u_{t-i}$ by $G^{-1} u_{t-i}$), hence many different responses of the system
to an impulse are conceivable. One may choose $G$ such that $B_0 G$ is the
identity matrix $I$. Then, the innovations are non-orthogonal if the covariance
matrix is not diagonal (see Doan, 1988, p. 8.8, for a discussion of
orthogonalization; the following is based on this source). In this non-diagonal
case the innovations are correlated. Hence, simulating the
responses to a single impulse in one variable will result in an interpretation
problem as, historically, such separate impulses do not occur. For
this purpose, VAR modellers orthogonalize such that $G^{-1} \Sigma G'^{-1} = I$.
As a result, the orthogonalized innovations $v_t = u_t G^{-1}$ (with $E(v_t v_t') = I$)
are uncorrelated across time and across equations. Impulse accounting
can begin without this interpretation problem.
But a subsequent problem arises, which is the major weakness of the
innovation accounting approach: how to choose $G$. The Choleski factorization
($G$ lower triangular with positive elements on the diagonal) is one
alternative. It has the attractive property that the transformed innovations
can be interpreted as impulses of one standard error in an ordinary
least squares regression of variable $j$ in the system on variables 1 to $j-1$.
However, there are as many different Choleski factorizations as orderings
of variables (120 in a system of five variables). The effect of impulses is
crucially dependent on the choice of ordering. This choice involves the
same incredible assumptions as those that were criticized to start with.
The ordering should be justified, and the sensitivity of the results to
changes in the ordering must be examined. Interpreting the results may
become a hard exercise.
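A small numerical sketch (not from the text; the error covariance matrix used here is an illustrative assumption) shows how the Choleski orthogonalization, and hence the implied one-standard-error impulse, changes with the ordering of the variables:

# Sketch: Choleski orthogonalization of a VAR error covariance matrix and its
# dependence on the ordering of the variables. The covariance matrix is illustrative.
import numpy as np

sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])            # correlated innovations

def one_sd_impulse(cov):
    # Lower-triangular Choleski factor; column j is the contemporaneous response
    # to a one-standard-deviation orthogonalized shock in variable j.
    return np.linalg.cholesky(cov)

print("ordering (1, 2):")
print(one_sd_impulse(sigma).round(3))

# Reverse the ordering: permute, factor, and permute back.
perm = [1, 0]
chol_rev = one_sd_impulse(sigma[np.ix_(perm, perm)])[np.ix_(perm, perm)]
print("ordering (2, 1):")
print(chol_rev.round(3))
# Both factorizations reproduce the same covariance matrix, but they attribute the
# contemporaneous correlation to different variables: this is the ordering choice
# criticized in the text.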
3.4 VAR and inference
The VAR methodology intends to reduce the scope for data-mining by its
choice to include all variables available. But even the VAR allows options
for data-mining:
. the choice of variables to begin with
. the choice of lag length
. the choice of the error decomposition to use.
For these reasons, the VAR does not seem the right response to specification
uncertainty and data-mining. Economic theory remains an indispensable
tool for deriving useful empirical models. This does not mean that
the VAR is without merits. Two branches of empirical research should be
mentioned. The first is due to Sims' own work, the second is related to the
analysis of consumption and the permanent income theory. A third line
of research which owes much to the VAR methodology is co-integration,
which will not be discussed here.
Sims (1996, p. 117) provides a brief summary of useful findings about
economic regularities related to the role of money and interest in business
cycles. This research line is presented as an extension of the monetary
analysis of Friedman and Schwartz. The initial finding that `money
(Granger) causes income' (presented by Friedman and Schwartz and
confirmed by Sims, 1972) was re-adjusted when a VAR exercise showed
that short-term interest innovations accounted for most of the variation
in output and `absorbed' the direct effect of money (Sims, 1980). An
extensive research line on the interpretation of interest innovations as
policy variables resulted, including institutional studies. Sims believes
that this has resulted in a number of well-established `stylized facts'
which turned out to be stable across a range of economies. If this is
correct, then the VAR-methodology can claim an important accomplish-
ment. However, Sims had already complained that other researchers (in
particular real business cycle analysts) do not accept these stylized facts in
their own research. Might the explanation be that VARs are still consid-
ered as too idiosyncratic, sample-dependent and difficult to interpret
from an economic theorist's perspective? An estimated elasticity yields
clear information, whereas an impulse response picture is the miraculous
outcome of a mysterious model.
The second line of research in which VARs have indeed been enor-
mously helpful is the analysis of consumption, in particular the perma-
nent income consumption function (see e.g. Campbell, 1987). Ironically,
this line of research has a strong theoretical motivation, at odds with the
much more inductive intentions originally motivating the VAR metho-
dology. It has a highly simplied structure (e.g. a two-variable VAR).
The results are directly interpretable in terms of economic theory. It has
yielded a number of new theoretical research lines, like those on liquidity
constraints and excess sensitivity of consumption to income. The results
have reached the textbook level of macro-economics and will have a
lasting impact on macro-economic research.
In sum, time-series methods are most useful when economic theory can
be related to the statistical model. This is the case in the examples men-
tioned above. It also is a feature of some alternative approaches that
deserve to be mentioned, such as co-integration methods (which grew
out of the work of Granger) and the `structural econometric modelling
time series analysis' of Palm and Zellner (1974). But it is an illusion that
econometric inference can be successful by letting only the data speak.
4 The statistical theory of reduction
4.1 Reduction and experimental design
For Fisher, the `reduction of data' is the major task of statistics. The tool
to achieve this goal is maximum likelihood: `among Consistent Estimates,
when properly defined, that which conserves the greatest amount of
information is the estimate of Maximum Likelihood' (quoted on p. 44
above). One of the most important of Fisher's criteria is sufficiency, `a
statistic of this class alone includes the whole of the relevant information
which the observations contain' (Fisher, [1925] 1973, p. 15).
Fisher's statistical theory of reduction is not only based on his method
of maximum likelihood, but also on his theory of experimental design.
This is a necessary requirement in order to assess `the validity of the
estimates of error used in tests of significance', something that `was for
long ignored, and is still often overlooked in practice' (Fisher, [1935]
1966, p. 42). The crucial element in his theory of experimental design is
randomization (see chapters 3, section 3.3 and 6, section 3.4 above).
Randomization is Fisher's physical means for ensuring normality or
other `maintained hypotheses' of a statistical model. Student's t-test for
testing the null hypothesis that two samples are drawn from the same
normally distributed population is of particular interest for Fisher (see
also `Gosset' in Personalia). He regards it as one of the most useful tests
available to research workers, `because the unique properties of the nor-
mal distribution make it alone suitable for general application.' (Fisher,
[1935] 1966, p. 45). Fisher's disagreement with statisticians of the `math-
ematical school' (in particular Neyman) results largely from different
opinions about the effect of randomization on normality. According to
Fisher, proper randomization justifies the assumption of normality and
the use of parametric tests. Assuming normality is not a drawback of tests
of significance. Research workers can devise their experiments in such a
way as to validate the statistical methods based on the Gaussian theory of
errors. The experimenters `should remember that they and their collea-
gues usually know more about the kind of material they are dealing with
than do the authors of textbooks written without such personal knowl-
edge' (p. 49).
4.2 The Data Generation Process⁴
Fisher's theory of experimental design is unfit for the analysis of non-experimental
data. The metaphor `experiment' has remained popular in econometrics
(see Haavelmo, 1944; Florens, Mouchart and Rolin, 1990⁵) but is
rarely related to the kind of experiment that Fisher had in mind. Hence,
Fisher's methods of reduction usually do not apply to econometrics.
An alternative, which still owes much to (but rarely cites) Fisher, is the
theory of reduction and model design. This is sometimes called `LSE-econometrics',
but statistical reductionism, or simply reductionism, seems
more appropriate. David Hendry is the most active econometrician in
this school of thought.
Reductionism starts from the notion of a `Data Generation Process'
(DGP). It is a high-dimensional probability distribution for a huge
vector of variables, $w$:
$$\mathrm{DGP:}\quad f(W_T^1 \mid W_0, \theta_T^1) = \prod_{t=1}^{T} f(w_t \mid W_{t-1}^1, \theta_t), \qquad (10)$$
conditional on initial conditions ($W_0$), parameters $\theta_t \in \Theta_t$, continuity
(needed for a density representation) and time homogeneity
($f_t(\cdot) = f(\cdot)$). A DGP is a well-defined notion in Monte Carlo studies,
where the investigator is able to generate the data and knows exactly
the characteristics of the generation process. However, reductionists
transfer it to applied econometrics, which is supposed to deal with the
study of the properties of the DGP. The intention of the theory of reduction
is to bring this incomprehensible distribution down to a parsimonious
model, without loss of relevant information. Marginalizing and conditioning
are two key notions of this theory of reduction.
In applied econometrics, Hendry (1993, p. 73) argues, `the data are
given, and so the distributions of the dependent variables are already
fixed by what the data generating process created them to be – I knew
that from Monte Carlo'. Apart from the data, there are models (sets of
hypotheses). Given the data, the goal is to make inferences concerning
the hypotheses. In the theory of reduction, the idea is that the DGP (or
population) can be used as a starting point as well as a goal of inference.
Econometric models are regarded as reductions of the DGP (p. 247).
Although the DGP is sometimes presented as `hypothetical' (e.g.
p. 364), there is a tendency in Hendry's writings to view the DGP not
as an hypothesis (one of many possible hypotheses), but as fact or reality.
Sometimes the DGP is presented as `the relevant data generation process'
(p. 74). In other instances, the DGP becomes the `actual mechanism that
generates the data' (Hendry and Ericsson, 1991, p. 18), or simply `the
actual DGP' (Spanos, 1986). The DGP is reality and a model of reality at
the same time.⁶ Philosophers call this `reification'. Once this position is
taken, weird consequences follow. Consider the `information taxonomy'
that `follows directly from the theory of reduction given the sequence of
steps needed to derive any model from the data generation process'
(Hendry, 1993, pp. 271, 358). Or, econometric models which are `derived
and derivable from the DGP' (Hendry and Ericsson, 1991, p. 20; emphasis
added). Consider another example of reification: the proposition that
`it is a common empirical finding that DGPs are subject to interventions
affecting some of their parameters' (p. 373). Rather, those empirical
findings relate to statistical models of the data, not to the DGP or its
parameters. Paraphrasing de Finetti, the dgp does not exist. It is an
invention of our mind.
Models are useful approximations to reality. Different purposes
require different models (and often different data, concerning levels of
aggregation, numbers of observations). The idea that there is one DGP
waiting to be discovered is a form of `scientific realism' (see chapter 9,
section 2 for further discussion of realism). This view entails that the task
of the econometrician is `to model the main features of the data generation
process' (Hendry, 1993, p. 445; emphasis added). Instrumentalists do
not have a similar confidence in their relationship to the truth. They
regard models as useful tools, sometimes even intentionally designed to
change the economic process. This may involve self-fulfilling (or self-denying)
prophecies, which are hard (if not impossible) to reconcile
with the notion of a DGP.
This is not to deny that there may be many valid statistical models even
in the absence of `the' DGP. Statistical theory does not depend on such a
hypothesis. A time series statistician may formulate a valid univariate
model (see Chung, 1974, for necessary conditions for the existence of a
valid characterization), while a Cowles econometrician will usually design
a very different model of the same endogenous variable. Although both
may start with the notion of a DGP, it is unlikely that these hypothesized
DGPs coincide. Their DGPs depend on the purpose of modelling, as do
their reductions.
4.3 General to specific
An important characteristic of the reductionist methodology is the general-to-specific
approach to modelling. It was initially inspired by Denis
Sargan's Common Factor test, where a general model is needed as a
starting point for testing a lagged set of variables corresponding to an
autoregressive error process. Another source of inspiration was the
reductionist approach developed at CORE (in particular, by Jean-Pierre
Florens, Michel Mouchart and Jean-François Richard). The theory
of reduction deals with the conditions for valid marginalizing and
conditioning (relying on sufficient statistics). However, note that Fisher
does not claim to reduce the DGP, but rather to reduce a large set of data
without losing relevant information.
The basic idea of general-to-specific modelling is that there is a starting
point for all inference, the data generation process. All models are
reductions of the DGP, but not all of them are valid reductions. In the
following, a brief outline of reduction is sketched. Subsequently, a
methodological critique is given. Reconsider the DGP, (10). The variable
vector $w$ contains only a few variables that are likely to be relevant. These
are labelled $y^{+}$, the remainder is $w^{+}$ (nuisance variables). Using the definition
of conditional probability,⁷ and assuming parameter constancy for
simplicity, one can partition the DGP:
$$f(w^{+}_t, y^{+}_t \mid W^{+}_{t-1}, Y^{+}_{t-1}, \theta_1) = f_1(y^{+}_t \mid W^{+}_{t-1}, Y^{+}_{t-1}, \theta_1)\, f_2(w^{+}_t \mid W^{+}_{t-1}, y^{+}_t, Y^{+}_{t-1}, \theta_2) \qquad (11)$$
where $W^{+}_{t-1} = (W^1_{t-1})^{+}$. The interest is on the marginal density, $f_1$. If $\theta_1$
and $\theta_2$ are variation free ($(\theta_1, \theta_2) \in \Theta_1 \times \Theta_2$; i.e. $f_2$ does not provide
information about $\theta_1$), then a `sequential cut' may be operated: consider
only $f_1$. Assuming furthermore that $y^{+}_t$ and $W^{+}_{t-1}$ are independent conditionally
on $Y^{+}_{t-1}$ and $\theta_1$, the nuisance variables can be dropped from $f_1$,
which yields:
$$f(y^{+}_t \mid Y^{+}_{t-1}, \theta_1). \qquad (12)$$
This can be represented using a specific probabilistic and functional form
(e.g. a VAR). Note that this marginalizing stage of the reduction process
proceeds entirely implicitly. It is impossible to consider empirically all
conceivable variables of interest to start with. Note also that many hidden
assumptions are made (e.g. going from distributions to densities,
assuming linearity, parametric distributions, constant parameters).
Some of them may be testable but most are taken for granted.⁸
The second stage, conditioning, aims at a further reduction by introducing
exogeneity into the model. For this purpose, decompose $y^{+}$ into $y$
(endogenous) and $x$ (exogenous) variables. Rewrite (12) as:
$$f(y^{+}_t \mid Y^{+}_{t-1}, \theta_1) = f(y_t \mid X_{t-1}, x_t, Y_{t-1}, \lambda_1)\, f(x_t \mid X_{t-1}, Y_{t-1}, \lambda_2) \qquad (13)$$
where $\lambda_1$ are the parameters of interest. If $x_t$ is `weakly exogenous' ($\lambda_1$
and $\lambda_2$ are variation free), then `valid' and efficient inference on $\lambda_1$, or functions
thereof, on the basis of
$$f(y_t \mid X_{t-1}, x_t, Y_{t-1}, \lambda_1) \qquad (14)$$
is possible (where `valid' means no loss of information). This conditional
model can be represented in a specific form; usually the Autoregressive
Distributed Lag model is chosen for this purpose:
$$A(L)\, y_t = B(L)\, x_t + u_t. \qquad (15)$$
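A standard textbook illustration of variation-free parameters (a toy example of my own, not taken from the book) is the bivariate normal case:
$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \begin{pmatrix} \sigma_{yy} & \sigma_{yx} \\ \sigma_{yx} & \sigma_{xx} \end{pmatrix} \right),$$
which factors into the conditional model $y_t \mid x_t \sim N(\alpha + \beta x_t, \omega^2)$, with $\beta = \sigma_{yx}/\sigma_{xx}$, $\alpha = \mu_y - \beta\mu_x$ and $\omega^2 = \sigma_{yy} - \sigma_{yx}^2/\sigma_{xx}$, and the marginal model $x_t \sim N(\mu_x, \sigma_{xx})$. The conditional parameters $(\alpha, \beta, \omega^2)$ and the marginal parameters $(\mu_x, \sigma_{xx})$ are variation free, so if interest lies in $\beta$, $x_t$ is weakly exogenous and the marginal model can be discarded without loss of information.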
Note again the comments on implicit and explicit assumptions mentioned
with respect to the VAR representation of (12), which equally apply to
this case. The final stage of modelling is to consider whether further
restrictions on (15) can be imposed without loss of information.
Examples are common dynamic factors, or other exclusion restrictions.
Destructive testing (for example by digesting the (mis-)specification test
menus of the econometric software package PC-GIVE) aims at weeding
out the invalid models. Remaining rival models are still `comparable via
the DGP' (Hendry, 1993, p. 463). Encompassing tests are supposed to
serve this purpose (see chapter 7, section 4.5 below). The Wald test is the
preferred tool for, among others, testing common factors. One reason is
computational ease (p. 152), but the nature of the Wald test matches the
general to specific modelling strategy quite well. This test is also used in
the context of encompassing tests (p. 413).⁹
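To make the common factor idea concrete, the sketch below (not from the book; the simulated data and coefficient values are illustrative assumptions) estimates an ADL(1,1) model by OLS and computes a Wald statistic for the COMFAC restriction under which the model would collapse to a static regression with an AR(1) error:

# Sketch: Wald test of a common-factor (COMFAC) restriction in an ADL(1,1) model
#   y_t = c + a*y_{t-1} + b0*x_t + b1*x_{t-1} + u_t,
# where the restriction b1 + a*b0 = 0 would validate the AR(1)-error simplification.
import numpy as np

rng = np.random.default_rng(3)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    # Illustrative data generating scheme in which the restriction does NOT hold.
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + 0.3 * x[t - 1] + rng.normal()

X = np.column_stack([np.ones(T - 1), y[:-1], x[1:], x[:-1]])
Y = y[1:]
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ beta
s2 = resid @ resid / (len(Y) - X.shape[1])
cov = s2 * np.linalg.inv(X.T @ X)

# Wald statistic for g(beta) = b1 + a*b0 = 0, via the delta method.
a, b0, b1 = beta[1], beta[2], beta[3]
g = b1 + a * b0
grad = np.array([0.0, b0, a, 1.0])
wald = g ** 2 / (grad @ cov @ grad)
print("Wald statistic:", round(wald, 2), "(5% chi-square(1) critical value: 3.84)")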
Hendry's justication for general to specic is based on the invalidity
of the test statistics if inference commences with the simple model:
every test is conditional on arbitrary assumptions which are to be tested later, and
if these are rejected all earlier inferences are invalidated, whether `reject' or `not
reject' decisions. Until the model adequately characterizes the data generation
process, it seems rather pointless trying to test hypotheses of interest in economic
theory. A further drawback is that the significance level of the unstructured
sequence of tests actually being conducted is unknown. (p. 255)
Four objections to this argument can be made.
First, even the most general empirical model will be `wrong' or mis-specified.
Because of this possibility, Hendry (p. 257) advises testing the
general model, but does not explain how to interpret the resulting test
statistics. The validity of the general model is in the eyes of the
beholder.¹⁰ It is revealing that Hendry's interest has shifted from systems
of equations to single equations (see p. 3), which is hard to justify from a
general-to-specific perspective.
Second, one does not need an `adequate' statistical representation of
the DGP (using sufficient statistics) in order to make inferences. In many
cases, stylized facts, a few figures or rough summary statistics (averages
of trials in experimental economics; eyeball statistics) are able to do the
work (note, for example, that statistical modelling in small particle physics
is a rarity; the same applies to game theory). A crucial ingredient of a
theory of reduction should be a convincing argument for the optimal
level of parsimony (simplicity). No such argument is given.
Third, pre-test bias keeps haunting econometricians, whether they use
stepwise regression, iterative data-mining routines, or general-to-specific
modelling. Significance levels of the COMFAC test may be known
asymptotically in a general-to-specific modelling strategy (as COMFAC
uses a series of independent tests) (see e.g. pp. 152-3), but the sampling
properties of most other test statistics, and even this one, are obscure outside
the context of repeated sampling (Neyman-Pearson) or valid experimental
design (Fisher).
A more general fourth objection can be raised. Would econometrics be
better off if the general-to-specific methodology were to be adopted? This
is doubtful. Empirical econometrics is an iterative procedure, and very
few econometricians have the discipline or desire to obey the general-to-specific
straitjacket. Not surprisingly, econometric practice tends to be
gradual approximation, often by means of variable addition. Contrary to
Hendry's warnings, this practice may yield interesting and useful empirical
knowledge. Consider one of the best examples in recent econometric
research: the analysis of the Permanent Income Hypothesis, mentioned
above in the discussion of the VAR approach. This literature, spurred on
by publications of Hall, Campbell and Mankiw, Deaton, Flavin and
others, is an example of fruitful variable addition. First, there was the
random walk model (Harold Jeffreys would have loved this starting
point). Then came excess sensitivity and excess smoothness. These problems
had to be explained, and were so explained, by liquidity constraints among
others. New variables were added to the specifications, and better (more
adequate) empirical approximations were obtained (where those approximations
could be directly related to advances in economic theory). This
literature, not hampered by an overdose of testing and encompassing,
may not earn the reductionists' praise, but the economics profession
tends to disagree.¹¹ The empirical literature on the permanent income
hypothesis is viewed as a rare success story in macro-econometrics,
indeed one of the few cases where econometric analysis actually added
to economic understanding of macro-economic phenomena: a case
where the alternation of theory and empirical research produced a better
understanding of economics.¹²
4.4 Falsificationism and the three cheers for testing
An elaboration of Fisher's approach to statistics is the rise of diagnostic
testing in econometrics. Fisher claims that proper randomization justifies
a normality assumption. But economists rarely randomize, and hence
cannot be sure that the estimates have the desired properties. They use
unique data, often time series, for which the tacit statistical assumptions
are seldom warranted. Instead of randomization, one may design a model
such that the statistical assumptions are not blatantly violated. If diagnostic
validity tests suggest that the model is mis-specified, the modeller
has to change the model.
Indeed: `The three golden rules of econometrics are test, test and test;¹³
that all three rules are broken regularly in empirical applications is fortunately
easily remedied' (Hendry, [1980] 1993, pp. 27-8). Testing is proclaimed
the main virtue of a scientific econometrics. This idea is related to
a philosophy of science that was taught during Hendry's years at the
London School of Economics by Popper and Lakatos. Their methodological
falsificationism and the methodology of scientific research programmes
frequently recur in Hendry's writings.
In chapter 1 it was shown that, according to Popper, one can never
prove a theory right, but one may be able to falsify it by devising a
`crucial experiment'. Falsifiability separates science from metaphysics.
Real scientists should formulate bold conjectures and try not to verify
but to falsify. As argued in chapter 1, there are numerous problems with
this view.¹⁴ A crucial experiment is probably rare in physics (it may even
be a scientific myth), and even more so in economics. Indeed, economics
has hardly any experimentation to begin with (and where there is experimentation,
as in game theory, econometrics is rarely invoked for
purposes of inference). The claim that econometric modelling may
serve as a substitute for the experimental method (uttered by many
econometricians, among others Goldberger, 1964; also Morgan, 1990,
pp. 9-10, 259) deserves scepticism. Finally, for methodological falsificationism,
one needs bold conjectures. The empirical models of reductionist
exercises are but a faint reflection thereof: they are very able representations
of the data, no more, no less.¹⁵
So much for Popper, Hendry's favourite philosopher. His second idol
is Lakatos. Lakatos (1970) noted that, in practice, few scientific theories
are abandoned because of a particular falsification. Moreover, scientists
may try to obtain support for their theories. He accepted a `whiff of
inductivism'. In order to appraise scientific theories, Lakatos invented
the notion of `progressive' and `degenerative' scientific research programmes,
discussed in chapter 1. A research programme is theoretically
progressive if it predicts some novel, unexpected fact. It is empirically
progressive if some of those predictions are confirmed. Finally, a research
programme is heuristically progressive if it is able to avoid or even to
reduce the dependency on auxiliary hypotheses that do not follow from
the `positive heuristic', the general guidelines of the research programme.
It is very difficult to encounter a single novel fact in Lakatos' sense in
the econometric literature, and the papers of reductionism are no excep-
tion. Moreover, it is totally unclear what kind of heuristic in an economic
research programme drives reductionists' empirical investigations, apart
from a few rather elementary economic relations concerning consump-
tion or money demand. Hendry, for example, does not predict novel
facts. What he is able to do is provide novel interpretations of given
facts, which is something entirely different. Recalling Lakatos' blunt
comments on statistical data analysis in the social sciences, it will be
clear that it is not justifiable to invoke Lakatos without providing the
necessary ingredients of a research programme, in particular the driving
`heuristic'. Hendry does not provide a satisfactory heuristic, unless elementary
textbook macro-economics, combined with the view that people
behave according to an error-correcting scheme, would qualify.¹⁶
However, Hendry (1993, p. 417) also applies the notion of progressive-
ness to his methodology itself, rather than to the economic
implementations:
The methodology itself has also progressed and gradually has been able both to
explain more of what we observe to occur in empirical econometrics and to
predict the general consequences of certain research strategies.
Here a `meta research programme' is at stake: not neoclassical econom-
ics, for example, but the method of inference is regarded as a research
programme, with an heuristic, and all the other Lakatosian notions (hard
core, protective belt, novel facts). Hendry's positive heuristic is then to
`test, test and test'. The hard core might be the notion of a DGP to be
modelled using `dynamic econometrics'. Probably the most promising
candidate for a `novel fact' is the insight that, if autocorrelation in the
residuals is removed by means of a Cochrane-Orcutt AR(1) transformation,
this may impose an invalid common factor: the transformed residuals
may behave as a white noise process but they are not innovations.
This is an important insight. Whether the stated heuristic is sufficient for
defining a research programme, and whether this novel fact satisfies the
desire for progression, is a matter on which disagreement may (and does)
exist.
Apart from the dubious relation to the philosophy of science literature,
there is a very different problem with Hendry's golden rules, which is that
the statistical meaning of his tests is unclear. Standard errors and other
`statistics' presented in time series econometrics do not have the same
interpretation for statistical inference as they have in situations of experi-
mental data. In econometrics, standard errors are just measurements of
the precision of estimates, given the particular measurement system
(econometric model) at hand. They are not test statistics. At times,
Hendry seems to agree with this interpretation. For example (Hendry,
1993, p. 420),
Genuine testing can therefore only occur after the design process is complete and
new evidence has accrued against which to test. Because new data has been
collected since chapter 8 was published, the validity of the model could be inves-
tigated on the basis of Neyman-Pearson (1933) `quality control' tests.
This statement also suggests that encompassing tests (discussed in the
next section) are not `genuine' tests (indeed, sometimes encompassing is
presented as a form of mis-specification testing).
The reference to Neyman-Pearson in the quotation is only partly valid.
Neyman and Pearson emphasize the context of decision making in repetitive
situations (hence, the theory is based on repeated sampling). In Hendry's
case, a number of additional observations has been collected, but the aim
is not decision making but inference.
Occasionally, instead of the words `statistical tests', the words `diag-
nostic checks' are used. But not in the trinity, the three golden rules.
Would they have the same aural appeal if they were `check, check and
check', or `measure, measure and measure'? It is doubtful, although as
rules they seem more appropriate. Econometricians measure rather than
test, and the obsession with testing is rather deplorable. Indeed, a nal
problem with excessive testing is that, once the number of test statistics
exceeds the number of observations used to calculate them, one may
wonder how well the aim of reduction has been served, and what the
meaning of the tests really is. Hendry and Ericsson (1991) provide an
example. Given ninety-three annual observations, forty-six test statistics
are reported. If standard errors of estimates are included, this number
grows to 108. It grows further once the implicit (eyeball) test statistics
conveyed by the gures are considered.
Two final remarks on statistical tests: first, there is always the problem
of choosing the appropriate significance level (why does 5% deserve
special attention?), a problem not specific to Hendry's methodology;
second, a statistical rejection may not be an economically meaningful
rejection. If rational behaviour (in whatever sense) is statistically rejected
(even if the context is literally one of repeated sampling) at some significance
level, this does not mean that the objects of inference could increase
their utility by changing behaviour. Money-metric test statistics would be
more informative in econometric inference than statistical tests, but such
tests are not discussed by Hendry or most other econometricians
(Hashem Pesaran and Hal Varian are exceptions). This issue is of importance
in many (proclaimed) tests of perfect markets currently presented in
the finance literature.
4.5 Model design and inference
Hendry argues that all models are derived from the DGP and, therefore,
the properties of the models are also derived entities. One does not need
to invoke a DGP to arrive at this important insight of non-experimental
data analysis (e.g., replace `DGP' by `data'); after all, what is this thing
called `DGP'? However, Hendry's emphasis on this insight deserves
praise. Indeed (Hendry, 1993, p. 73),
the consequence of given data and a given theory model is that the error is the
derived component, and one cannot make `separate' assumptions about its proper-
ties.
One of the strengths of Hendry's methodological observations is his
recognition of the importance of model design. I agree that models are
econometricians' constructs, and the residuals are `derived rather than
autonomous' and, hence, models can `be designed to satisfy pre-selected
criteria' (p. 246). The distribution of the residuals cannot be stated a
priori, as may be done in specic situations of experimental design. The
`axiom of correct specication' does not hold, which is why the econo-
metrician has to investigate the characteristics of the derived residuals.
Model design aims at constructing models that satisfy a set of desired
statistical criteria. Models should be sufcient statistics.
However, this has strong implications for the interpretation of the
statistical inferences that are made in econometric investigations. Most
importantly, the tests are not straightforward instances of Neyman-Pearson
or Fisherian tests. Accommodating the data (description) is crucially
different from inference (prediction), in particular if the model that
accommodates the data is not supported by a priori considerations.¹⁷
In the case of model design, the model simultaneously `designs' its
hypothetical universe. Hence, the model, given the data, cannot be
used for the purpose of inference about the universe as the universe is
constructed during the modelling stage, and is unique to the particular set
of data which were used to construct it. Fisher (1955, p. 71), criticizing
repeated sampling in the Neyman-Pearson theory, argues:
if we possess a unique sample in Student's sense on which significance tests are to
be performed, there is always, as Venn ([1866] 1876) in particular has shown, a
multiplicity of populations to each of which we can legitimately regard our sample
as belonging: so that the phrase `repeated sampling from the same population'
does not enable us to determine which population is to be used to define the
probability level, for no one of them has objective reality, all being products of
the statistician's imagination.
This does not imply that Fisher rejected statistical inference in cases of
unique samples. However, he thought experimental design a crucial ele-
ment for valid inference. Economic theory is not an alternative to experi-
mental design, in particular in macro-economics where many rival
theories are available. Model design (accommodation) is not an alterna-
tive either, if the ad hoc element in modelling reduces the prior support of
the model. The real test, therefore, remains the ability to predict out of
sample and in different contexts. Friedman (1940, p. 659) made this point
when he reviewed Tinbergen's work for the League of Nations:
Tinbergen's results cannot be judged by ordinary tests of statistical significance.
The reason is that the variables with which he winds up . . . have been selected
after an extensive process of trial and error because they yield high coefficients of
correlation.
This conveys the same message as Friedman's reply (in the postscript to
Friedman and Schwartz, 1991) to Hendry and Ericsson (1991). Indeed,
Hendry (1993, pp. 425-6) also argues that a real test is possible if (genuinely)
new data become available, whereas the test statistics obtained in
the stage of model design `demonstrate the appropriateness (or otherwise)
of the design exercise'.
The position of Friedman and Schwartz, who do not rely on modern
econometric methods but are aware of the fundamental issues in statis-
tics, might be related to Leamer's (1978) emphasis on sensitivity analysis,
in combination with a strong emphasis on the importance of new data.
Econometrics may not be the only, or most useful, method to find, or
test, interesting hypotheses. Theory is an alternative, or even a careful
(non-statistical) study of informative historical events (like the stock
market crash).¹⁸ There are few occasions where statistical tests have
changed the minds of mainstream economists (with liquidity constraints
in consumption behaviour arguably being a rare exception).
The hypothesis that most human behaviour follows an error-correcting
pattern has not been accepted in economic textbooks, whereas Hall's
consumption model and the more recent related literature is a standard
topic in such books. Why is this so? Perhaps it is because the adequacy
criteria in model design do not necessarily correspond to the require-
ments for an increase in knowledge of economic behaviour. Even the
ability to `encompass' Hall's model does not necessarily contribute to
progress in economic knowledge. Encompassing, Hendry (1993, p. 440)
argues,
seems to correspond to a `progressive research strategy' . . . in that encompassing
models act like `sufficient statistics' to summarize the pre-existing state of
knowledge.¹⁹
As noted above, this is not what Lakatos meant by progress. Hendry is
able to provide novel interpretations of existing data or models, but this
is not equivalent to predicting novel facts. However, encompassing might
provide a viable alternative to the Popperian concept of verisimilitude
(closeness to the truth). Popper's formal definition of verisimilitude has
collapsed in view of the problem of false theories. Encompassing does not
necessarily suffer from this problem due to the availability of pseudo
maximum likelihood estimators (where the models do not have to be
correctly specified). Here is a potential subject where econometrics may
contribute to philosophy of science.
4.6 Alchemy or science?
The quest for the Holy Grail resulted in a shattering of the brotherhood
of the Round Table. Still, some of the knights gained worship. A quest
for the DGP in econometrics is more likely to gain econometricians dis-
respect. Like making gold for the alchemists, searching for the DGP is a
wrong aim for econometrics. Relying on the trinity of tests may be
respectable, but it does not deliver the status of science. The Popperian
straitjacket does not fit econometrics (or any other science). Model design
makes econometrics as scientific as fashion design. Modelling is an art,
and one in which Hendry excels.
Occasionally, Hendry interprets models as `useful approximations'
(1993, p. 276). Karl Pearson, a founder of statistics and an early positi-
vist, argued along those lines, and so did other great statisticians in the
tradition of British empiricism (in particular, Fisher and Jeffreys). Of
course, the issue is: approximations to what? If the answer is a Holy
Grail, known as the DGP, then econometrics is destined to be a branch
of alchemy or, worse, metaphysics. Popperian cosmetics will not confer
scientific status on econometrics. If, on the other hand, the answer is
approximations to the data, helping to classify the facts of economics,
econometrics may join the positivist tradition which, in a number of
cases, has yielded respectable pieces of knowledge. Criticism is an impor-
tant ingredient of this positivist tradition, but not a methodological
dogma.
5 Sensitivity analysis
5.1 Extreme Bounds Analysis
Whereas Sims aims at avoiding the problem of specification uncertainty
by evading specification choices (profligate modelling without incredible
restrictions), and Hendry deals with the problem by careful testing of the
assumptions which are at stake, Ed Leamer provides a very different
perspective. He regards, contrary to Sims, specification choices as una-
voidable, and, contrary to Hendry, test statistics as deceptive. The only
option which can make econometrics credible is to show how inferences
change if assumptions change. In other words, not only the final
result of a `specification search' should be reported, but also the sensitivity of
the parameters of interest to eligible changes in the specification. An inter-
esting economic question is rarely whether a parameter differs signifi-
cantly from zero and should be deleted or not. The magnitude of a
particular (set of) parameter(s) of interest, and the sensitivity to model
changes, are of much more importance. Extreme Bounds Analysis (EBA) is
a technical tool to obtain information on this issue. EBA itself is a tech-
nique, not a methodology. But the underlying argument, sketched
above, is methodologically very different from other econometric
methodologies.
Consider a regression model y = Xβ + u, u ~ N(0, σ²I). A researcher is
interested in how sensitive a subset of parameters of interest, β_i, is to
changes in model specification. For this purpose, one needs to distinguish
`free' variables, X_f, from `doubtful' variables, X_d. A parameter of interest
may be free or doubtful, depending on the perspective of a researcher (see
McAleer, Pagan and Volcker, 1985, who note that the fragility of estimates
is sensitive to the subjective choice of classification). For the sake of clarity
I will assume that the parameters of interest belong to the free variables. If
a researcher fiddles with X_d (for example, by dropping some of them from
the specification), it is possible that the estimates of β_i will change. This
depends on the collinearity of the variables as well as the contributions of
the free and doubtful variables to explaining the variation in y. EBA is an
attempt to convey how sensitive β_i is to such changes in model
specification (Leamer, 1978, chapter 5; Leamer, 1983).
As a first step, specification changes are limited to linear restrictions
on the general model, Rβ = r. If the least squares estimator is denoted by
b = (X'X)^{-1}X'y, then the constrained least squares estimator is

b_c = b - (X'X)^{-1}R'(R(X'X)^{-1}R')^{-1}(Rb - r).   (16)

Hence, any desired value for the constrained least squares estimates can
be obtained by an appropriate choice of R and r. But even ardent data-
miners tend to have a tighter straitjacket than the restrictions Rβ = r.
Leamer (1978, p. 127), therefore, considers the special case of r = 0. In
this case, it can be shown that the constrained least squares estimates lie
on an ellipsoid; this ellipsoid defines the extreme bounds.
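The calculation behind (16) is easy to carry out. The following sketch (in Python, with simulated data and illustrative variable names of my own choosing, not taken from Leamer) computes the unconstrained least squares estimator and the constrained estimator of equation (16) for a single linear restriction with r = 0:

# A minimal numerical sketch of constrained least squares, equation (16):
# b_c = b - (X'X)^{-1} R' (R (X'X)^{-1} R')^{-1} (R b - r).
# The simulated data and variable names are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 4                      # observations, regressors
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, 0.5, -0.3, 0.0])
y = X @ beta_true + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y              # unconstrained least squares

def constrained_ls(R, r):
    """Constrained least squares under the linear restriction R beta = r."""
    A = R @ XtX_inv @ R.T
    return b - XtX_inv @ R.T @ np.linalg.solve(A, R @ b - r)

# Leamer's special case r = 0: here, dropping the last ('doubtful') regressor.
R = np.array([[0.0, 0.0, 0.0, 1.0]])
r = np.array([0.0])
b_c = constrained_ls(R, r)
print("unconstrained:", b.round(3))
print("constrained  :", b_c.round(3))

Tracing b_c over a family of such restrictions on the doubtful variables is what generates the ellipsoid of extreme bounds.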
Extreme bounds convey two messages. The first is directly related to
specification uncertainty. This message can equally well be obtained
from more traditional approaches of diagnostic testing. An example is
theorem 5.7 (p. 160), which proves that there can be no change in the
sign of a coefficient that is more significant (i.e. has a higher t-statistic)
than the coefficient of an omitted variable. This can be generalized to
sets of regressors, as for example has been done in McAleer, Pagan and
Volcker (1985, Proposition 2b), who transform EBA into a comparison of
χ² test statistics of the significance of the set of doubtful parameters d
relative to the significance of the parameters of interest i. Their condition
for fragility, χ²_d > χ²_i, is a necessary (but not sufficient) one (for
sufficiency, one also has to know the degree of collinearity between the
doubtful variables and the variables of interest). It can be argued that
this traditional approach is easier to carry out and even more informative
(as it shows that by introducing statistically insignificant variables
to the doubtful set, the `robustness' of the parameters of interest
increases). It also avoids a specific problem of EBA, namely that a `focus
variable' in Leamer's framework automatically becomes `fragile' once it is
dubbed doubtful.20
A second message, though, distinguishes EBA from more traditional
approaches. This message is about the numerical value of parameters of
interest, and is related to importance testing (instead of significance
testing). A conventional significance test, Leamer (1978) argues, is but
a cumbersome way of measuring the sample size. It is of more interest to
know how the values of parameters change if the model specification is
changed.
Another difference with the reductionist programme of Hendry is
related to the problem of collinearity. According to Hendry, collinearity
is to be combated by orthogonal re-parameterization. Leamer counters
that collinearity is related to the question of how to specify prior infor-
mation. The fact that standard errors will be high if regressors are corre-
lated is not necessarily a problem to Leamer.
EBA is proposed as a Bayes–non-Bayes compromise. The motivation
given for EBA is Bayesian. Parameters are random and have probability
distributions, but, as in Keynes' and B. O. Koopman's writings, it is
argued that precise prior probability distributions are hard to formulate.
Instead of specifying them explicitly, Leamer advises searching over a
range of more or less convincing specications.
5.2 Sense and sensitivity
While Hendry and other reductionists intend measuring the amount of
mis-specification from the data, Leamer (1983) argues that it would be a
`remarkable bootstrap' if you could do so. In addition to the familiar
standard errors reported in econometric investigations (related to the
sample covariance matrix), one should consider the importance of a
mis-specification covariance matrix M, which largely depends on the a
priori plausibility of a model: `One must decide independent of the data
how good the nonexperiment is' (p. 33). According to Leamer, M figures
as Lakatos' protective belt, which protects hard core propositions from
falsification.
Leamer explicitly refers to the divergence between Fisherian statistics
as used by agricultural experimenters and econometrics. This difference is
measured by M. M may be small in randomized experiments, but it may
be quite large in cases of inference based on non-experimental data. The
resulting specification uncertainty remains whatever the sample size is
(although, if larger samples become available, some econometricians
increase the complexity of a model by, for example, using more flexible
functional forms, thereby reducing the mis-specification bias).
One option to decrease M is to switch to experimental settings and
randomize. This brings us into the Fisherian context of statistical
research, but as noted earlier is not much practised. This option is some-
times chosen by economists (and increasingly frequently by game
theorists). Another approach to reduce M is to gather qualitatively
different kinds of `nonexperiments'. Friedman's studies of consumption
and money are typical examples of this approach.
The third response is to take mis-specification and data-mining for
granted, but to analyse how sensitive results are to re-specifying the
model. EBA is a method designed for this purpose, perhaps not the
best method as the bounds are more `extreme' than the bounds of a
`thoughtful data miner'. Many specifications that underlie points on
the ellipsoid will belong to entirely implausible models. Paraphrasing
Jeffreys ([1939] 1961, p. 255), whoever persists in looking for evidence
against a hypothesis will always find it. Leamer's method tends to exag-
gerate the amount of scepticism (although papers in the reductionist
programme that do not investigate sensitivity at all, exaggerate the
amount of empirical knowledge that can be obtained from econometric
studies).
Granger and Uhlig (1990) suggest making the bounds more reasonable
by not taking the extremes, and by imposing constraints in addition to
Rβ = r. In particular, they propose considering only those specifications
that have an R² of at least 90 or 95% of the maximum R² that can be
obtained. This proposal is very ad hoc. It does not remove the real
problem of EBA: whatever the bound, it is not clear whether the resulting
specification is plausible. Plausibility is hard to formalize, but the minimum
description length (MDL) principle, discussed in chapter 5, may
serve as a heuristic guide. Goodness of fit (more precisely, the likelihood)
is one element of that principle. However, relying on R² as a statistic for
goodness of fit is not to be recommended, and even the adjusted R² is not
a very good guide for a sensible sensitivity analysis, as those statistics are
bad guides in small samples (Cramer, 1987, shows that R² in particular is
strongly biased upwards in small samples, and both R² and the adjusted R²
have a large dispersion; Cramer advises not using these statistics for
samples of less than fifty observations).
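Cramer's point is easily illustrated by simulation. The sketch below (an assumption-laden toy example, not Cramer's own calculations) regresses pure noise on irrelevant regressors and reports the average R² for several sample sizes:

# Illustrative simulation of the small-sample upward bias of R^2: with pure
# noise regressors, R^2 is far from zero when n is small. Sample sizes and the
# number of regressors are assumptions for the sketch.
import numpy as np

rng = np.random.default_rng(1)

def mean_r_squared(n, k, reps=2000):
    vals = []
    for _ in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
        y = rng.normal(size=n)                  # y unrelated to X
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        vals.append(1 - resid @ resid / ((y - y.mean()) @ (y - y.mean())))
    return np.mean(vals)

for n in (20, 50, 200):
    print(n, round(mean_r_squared(n, k=5), 3))  # the bias shrinks as n grows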
A sensible refinement of extreme bounds sensitivity analysis can be
found in Levine and Renelt (1992). They discuss the fragility of cross-
country growth regressions. Among other things, they consider regres-
sions of GDP per capita, y (n × 1), on a set of free variables X_f (n × k),
including investment shares, initial level of GDP, education and popula-
tion growth. For calculating extreme bounds, one of those variables in
turn is set apart as the variable of interest, x_i (n × 1). In addition, up to
three doubtful variables, X_d (n × l, l ≤ 3), that may potentially be of
importance in explaining cross-country growth differences are included.
These variables are taken from a list obtained from analysing previous
empirical studies.
The result is linear equations of the following kind:
y = β_i x_i + X_f β_f + X_d β_d + u.   (17)
The differences from Leamer's original approach are:
. the restriction l ≤ 3
. the (potentially very large) list of doubtful variables is restricted to
  seven indicators that are argued to be a `reasonable conditioning set'
. a further selection is used: for every variable of interest x_i, the pool of
  variables from which to choose X_i excludes those variables that are
  thought on a priori grounds to measure the same phenomenon as x_i.
The extra restrictions lead to smaller but more convincing extreme
bounds. This makes sense, as the bounds in Leamer's original approach
may be very wide, resulting from nonsense specifications.
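The mechanics of such an exercise can be sketched as follows. The Python fragment below loops over every subset of at most three doubtful variables and records the extreme point estimates of the coefficient of interest; the data are simulated and the reported bounds ignore the standard error margins of a full EBA, so it is only an illustration of the idea, not a replication of Levine and Renelt:

# A sketch of extreme-bounds analysis in the Levine-Renelt spirit: regress y on
# the free variables, the variable of interest and every subset of at most three
# doubtful variables, and record the extreme estimates of beta_i. The data and
# dimensions are simulated assumptions, not the Levine-Renelt data set.
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 80
X_free = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # constant + free
x_i = rng.normal(size=n)                                          # variable of interest
X_doubt = rng.normal(size=(n, 7))                                 # pool of doubtful vars
y = 0.4 * x_i + X_free @ np.array([1.0, 0.5, -0.2, 0.3]) + rng.normal(size=n)

estimates = []
cols = range(X_doubt.shape[1])
for l in range(0, 4):                                  # at most three doubtful vars
    for subset in itertools.combinations(cols, l):
        X = np.column_stack([x_i, X_free, X_doubt[:, list(subset)]])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(b[0])                         # coefficient on x_i

print("extreme bounds for beta_i:",
      round(min(estimates), 3), round(max(estimates), 3))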
Levine and Renelt not only report the extreme bounds, but also the set
of variables that yields those bounds. This gives further information to
readers who can judge for themselves whether this particular specification
makes sense at all. The authors also consider the effect of data quality on
the bounds. After an impressive data analysis (without statistical validity
testing, though) they conclude that the correlation between output and
investment growth is `robust', but a large assortment of other economic
variables are not robustly correlated with growth: `the cross country
statistical relationship between long-run average growth and almost
every particular macroeconomic indicator is fragile' (Levine and
Renelt, 1992, p. 960). This study has become a classic in the empirical
growth literature. With qualifications, EBA is a very useful method.
More importantly, the argument underlying EBA deserves more consid-
eration in econometrics.
6 Calibration
6.1 Elements of calibration
Kydland and Prescott (1994, p. 1) argue that performing experiments
with actual people at the level of national economics is `obviously not
practical, but constructing a model economy inhabited by people and
computing their economic behavior is'. Their `computational experiment'
re-introduces the old view that econometric models are an alternative to
physical experiments.21 A model economy is calibrated `so that it mimics
the world along a carefully specified set of dimensions'. Those dimensions
are usually first and second moments of some business cycle phenomena,
which are compared with the actual moments. The resulting model enables
counterfactual analysis which can be used for policy evaluation. The
choice of the model (and the dimensions on which it is evaluated)
depends entirely on the particular question of interest.
Calibration consists of a number of elements. The most important is a
`well-tested theory'. Kydland and Prescott's definition of theory (p. 2) is
taken from Webster's dictionary, i.e. `a formulation of apparent relation-
ships or underlying principles of certain observed phenomena which has
been verified to some degree'. As an example of this (a-Popperian) defini-
tion, they mention neoclassical growth theory which `certainly' satisfies
the criterion in the definition. One of the problems of econom(etr)ics is
that not everyone agrees. To make things worse, no operational definition
of verification is presented. The obvious candidate based on statis-
tical criteria is explicitly rejected: calibration is presented as an alternative
to econometrics (in the modern, narrow, sense of statistical analysis of
economic data).
The second important element of a calibrated model is a set of para-
meter values. Unlike other approaches in econometrics, calibrators do
not estimate these parameters in the context of their own model. Instead,
they pick estimates found elsewhere, often in (micro-)econometric inves-
tigations related to specific features of the model to be calibrated. A great
advantage of the calibration approach is the utilization of `established'
findings, including salient `stylized facts' of economies, which are imputed
into the model. Examples are labour and capital shares for the calibra-
tion of (Cobb–Douglas) production functions, or a unitary elasticity of
substitution between consumption and leisure.
The third element of calibration is the actual process of calibrating.
The `well-established theory' plus a selection of `deep parameters' are not
sufficient to specify the dynamics of the general equilibrium model:
`Unlike theory in the physical sciences, theory in economics does not
provide a law of motion' (Kydland and Prescott, 1994, p. 2). This law
is computed by computer simulations in such a way that the `carefully
specified set of dimensions' of the model are in accordance with reality.
The metaphor of a thermometer is invoked to justify this step. The level
of mercury in freezing water and in boiling water is marked, and it is
known that these levels correspond with 0° and 100° Celsius, respectively.
Add the information that mercury expands approximately linearly within
this range and the measurement device can be calibrated. Similarly, a
model economy should give approximately correct answers to some specific
questions where the outcomes are known. In practice, simulated
moments are compared with actual moments of business cycle phenomena.
If correspondence is acceptable, then it is argued that a necessary
condition is satisfied for having confidence in answers to questions where
the `true' answer is not known a priori.
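A stylized version of such a computational experiment can be written in a few lines. In the sketch below a one-parameter AR(1) process stands in for the model economy, and its parameter is tuned so that a single simulated moment mimics an assumed `known' counterpart; all numbers are invented for illustration and the example is mine, not Kydland and Prescott's:

# A toy 'computational experiment': simulate a one-parameter model economy
# (an AR(1) standing in for detrended output), tune the parameter so that one
# simulated moment mimics an assumed target, then inspect other moments.
import numpy as np

rng = np.random.default_rng(3)
target_autocorr = 0.84        # assumed 'known' first-order autocorrelation

def simulate(rho, sigma=0.01, T=5_000):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + sigma * rng.normal()
    return y

def first_autocorr(y):
    return np.corrcoef(y[:-1], y[1:])[0, 1]

# Calibrate: pick rho on a grid so the simulated moment matches the target.
grid = np.linspace(0.5, 0.99, 50)
rho_star = min(grid, key=lambda rho: abs(first_autocorr(simulate(rho)) - target_autocorr))

y = simulate(rho_star)
print("calibrated rho:", round(rho_star, 3))
print("simulated autocorrelation:", round(first_autocorr(y), 3))
print("simulated std. dev.:", round(y.std(), 4))   # a second moment, to compare with data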
6.2 The quest for deep parameters
The most important characteristic of calibration is its strong reliance on
economic theory. According to Eichenbaum (1995, p. 1610),
the models we implement and our empirical knowledge have to be organised
around objects that are interpretable to other economists – objects like the para-
meters governing agents' tastes, information sets and constraints. To the extent
that econometricians organise their efforts around the analysis of the parameters
of reduced form systems, their output is likely to be ignored by most of their
(macro) colleagues.
A similar argument was made in chapter 5 on simplicity: structural rela-
tions are more informative than idiosyncratic reduced form equations. It
is also the argument of the Cowles Commission, and I think it is a valid
explanation for the limited success of VAR business cycle research. The
difference between calibrators and Cowles econometricians is that cali-
brators reject Haavelmo's probability foundations in econometrics, i.e.
formal Neyman–Pearson methods of inference.
One may wonder how much calibration adds to the knowledge of
economic structures and the deep parameters involved. Calibration is
an interesting methodological procedure, as it extends the merits of
empirical ndings to new contexts. Micro estimates are imputed in gen-
eral equilibrium models which are confronted with new data, not used for
the construction of the imputed parameters. Imputation yields over-iden-
tifying restrictions which are needed to establish the dynamic components
(`laws of motion') of the model. However, this procedure to impute para-
meter values into calibrated models has serious weaknesses and over-
estimates the growth of knowledge about structures and deep parameters.
First, few `deep parameters' (those related to taste, technology, infor-
mation and reactiveness) have been established at all, by econometric or
other means (Summers, 1991). Like Summers, Hansen and Heckman
(1996, p. 90) are sceptical about the knowledge of deep parameters:
`There is no filing cabinet full of robust micro estimates ready to use in
calibrating dynamic stochastic general equilibrium models.'
Second, even where estimates are available from micro-econometric
investigations, they cannot be automatically imported (as is done in
calibration exercises) into aggregated general equilibrium models:
usually, they are based on partial equilibrium assumptions. Parameters
are context dependent. Calibrators suffer from a realist illusion if they
think that estimates can be transmitted to other contexts without
problem.22
Hansen and Heckman (1996, p. 90) claim that `Kydland and
Prescott's account of the availability and value of micro estimates for
macro models is dramatically overstated'. Kydland and Prescott combine
scientic realism about `deep parameters' with the opposite view about the
aggregate equilibrium models. This micro-realism versus macro-instru-
mentalism is an unusual methodological stance.
Third, calibration hardly contributes to growth of knowledge about
`deep parameters'. These deep parameters are confronted with a novel
context (aggregate time-series), but this is not used for inference on their
behalf. Rather, the new context is used to fit the model to presumed `laws
of motion' of the economy. The characteristics of the model (distribu-
tions or statistics of parameters of interest) are tuned to the known
characteristics of the data, like a thermometer which is tuned to the
(known) distribution of temperatures. The newly found parameters for
the laws of motion are not of primary interest (arguably, they are not
`deep' but idiosyncratic, i.e. specific, to the model which is at stake) and,
therefore, not subject to testing. This is not necessarily bad scientific
practice: the merits of testing are highly overvalued in mainstream econo-
metrics and the calibrators' critique of formal hypothesis testing is not
without grounds. But without criteria for evaluating the performance of a
thermometer, its reported temperature is suspect. Statistics is a useful tool
for measurement and Kydland and Prescott do not give a single argu-
ment to invalidate this claim (Eichenbaum, 1995, is more optimistic
about the possibility of using probabilistic inference in combination
with calibration, as he provides some suggestions to introduce Bayesian
inference and the Generalised Method of Moments procedure into the
calibration methodology).
This leads to the fourth weakness. The combination of different pieces
of evidence is laudable, but it can be done with statistical methods as well
(Bayesian or non-Bayesian approaches to pooling information). This
statistical approach has the advantage that it takes parameter uncertainty
into account: even if uncontroversial `deep parameters' were available,
they would have standard errors. Specication uncertainty makes things
even worse. Neglecting this leads to self-deception.
6.3 Calibration and probabilistic inference
Probability plays a limited role in calibration where stochastic uncer-
tainty in a competitive equilibrium model results in a stochastic process
which is simulated by a computer. The `shocks' to the model (e.g. in
technology) have a well-defined probability distribution (imposed by
the researcher). Sampling theory is invoked to measure the distribution
of statistics of interest generated by the model. The computer can gen-
erate as many (stochastic) repetitions as desired, and the notions of
repeated sampling or of a collective are justified in this context.23
It should
be noted that the relation between the models and reality is a totally
different issue to which this use of sampling theory does not relate.
As argued above, criteria for evaluating the results of a calibration
exercise are not given. Calibrators do not rely on a model in which good-
ness of fit and other statistical arguments are decisive criteria: `the goal of
a computational experiment is not to try to match correlations' (Kydland
and Prescott, 1994, p. 8). Statistical theory is not invoked to validate
economic models; it is `not a useful tool for testing theory' (p. 23).
Curve fitting (e.g. by means of the Hodrick–Prescott filter), based on
least squares techniques, is not supplemented with probabilistic argu-
ments. One of the arguments invoked against significance testing is
that all models are `false', and any significance test with a given size
will ultimately reject the model if a sufficient number of observations
is obtained. For example, Eichenbaum (1995, p. 1609) argues,
Since all models are wrong along some dimension of the data, the classic
Haavelmo programme is not going to be useful in this context [Probability
approach in econometrics, 1944]. We do not need high-powered econometrics
to tell us that models are false. We know that.
The calibration approach goes a step further than the Cowles
Commission in its trust in theory, while the probability approach to
econometrics is abandoned as `utopian' (p. 1620). But total rejection of
statistical analysis may lead to unnecessary loss of information.
Kydland and Prescott (1994, p. 23) claim that a useful way to measure
empirical adequacy is to see `whether its model economy mimics certain
aspects of reality'. No arguments are given as to how to assess the quality
of the correspondence and how to select the list of relevant aspects
(moment implications of the model). This is a serious weakness in the
calibration approach. How trustworthy is a model with a `significant'
empirical defect in some of its dimensions, even if it is not this particular
dimension which is the one of theoretical interest?
Pagan (1995) suggests that calibrated models are invalid reductions of
the data. Kydland and Prescott deny that this is an interesting criticism.
For example, they argue that
all models are necessarily abstractions. A model environment must be selected
based on the question being addressed. . . The features of a given model may be
appropriate for some questions (or class of questions) but not for others. (1996,
p. 73)
There is no unique way to compare models and rank them, as this is a
context-dependent activity. The authors invoke Kuhn (1962, pp. 145–6),
who argues that the resolution of scientic revolutions will not be settled
instantly by a statistical test of the old versus the new `paradigm'.
Kydland and Prescott (1996, p. 73, n. 4) again quote Kuhn (1962,
p. 145) who argues: `Few philosophers of science still seek absolute cri-
teria for the verification of scientific theories.' They do not quote the
remainder of Kuhn's argument, which says `[n]oting that no theory can
ever be exposed to all possible relevant tests, they ask not whether a
theory has been verified but rather about its probability in the light of
the evidence that actually exists'. Kuhn continues to show the difficulties
due to the fact that probabilistic verification theories all depend on a
neutral observation language. Kuhn's incommensurability theory,
which denies such a neutral language, undermines probabilistic testing
of rival theories. As a result, he concludes, `probabilistic theories disguise
the verification situation as much as they illuminate it' (p. 146). This
phrase is cited with approval by Kydland and Prescott.
However, Kuhn's argument differs fundamentally from the calibra-
tors'. The latter pretend that a neutral language does exist and even yields
information about `true' deep parameters. Where commensurability pre-
vails, Kuhn (p. 147) holds that it `makes a great deal of sense to ask which
of two actual and competing theories fits the facts better'. A different
motivation for rejecting probabilistic inference should be presented.
Kuhn is the wrong ally.
An effort to provide a motivation is made by the claim that some
model outcomes should obey known facts, but `searching within some
parametric class of economies for the one that best fits a set of aggregate
time series makes little sense' (Kydland and Prescott, 1994, p. 6). The best
fitting model may not be the most informative on the question at stake.
Moreover, it may violate `established theory' (p. 18), which should not be
readily forsaken. One of Prescott's most revealing paper titles is
`Theory ahead of business cycle measurement' (1986).
The definition of `theory' inspires the rejection of statistical inference.
Following Lucas (1980), theory is conceived as a set of instructions for
building an imitation economy to address specific questions. It is `not a
collection of assertions about the behaviour of the actual economy.
Consequently, statistical hypothesis testing, which is designed to test
assertions about actual systems, is not an appropriate tool for testing
economic theory' (Kydland and Prescott, 1996, p. 83). This negation in
the definition of `theory' is curious. Few of the great theorists in econom-
ics and the sciences would accept it. But it is true that testing theory is a
rare activity and, moreover, that statistical hypothesis testing is at best of
minor importance for this. Eichenbaum (1995, p. 1620) is correct in his
assertion that the Cowles programme, `with its focus on testing whether
models are ``true'', means abandoning econometrics' role in the inductive
process'. This does not imply the Kydland–Prescott thesis, that probabil-
istic inference is not useful for empirical economics. The latter thesis is
untenable.
7 Conclusion
Specification uncertainty is the scourge of econometrics: it has proven to
be the fatal stumbling block for the Cowles programme in econometrics.
This chapter has discussed various ways of dispelling this scourge: VAR,
reductionism, sensitivity analysis, and calibration. It is useful to summa-
rize their characteristics briey in terms of the underlying philosophy, the
probability theory, and the views on testing and simplicity. This is done
in table 1.
Table 1.

                         VAR                        Reductionism                Sensitivity analysis         Calibration

Philosophy of science    instrumentalism            realism, falsification      instrumentalism              micro-realism, macro-
                                                                                                             methodological
                                                                                                             instrumentalism

Probability theory       first frequentist,         Fisher, Neyman–Pearson      Bayesian                     probability-aversion
                         later Bayesian

Statistical testing      only relevant for          makes econometrics          not informative about        not informative about
                         determining lag length     scientific                  substantive issues           substantive issues

Simplicity               theory imputes             general to specific,        belongs to realm of          desirable feature of
                         incredible restrictions    `encompassing' model,       metaphysics, but models      models
                                                    should be evaluated by      should be evaluated by
                                                    significance test           information criterion
The great drawback of the VAR approach is that it produces idiosyn-
cratic models, where the criticism of `arbitrary exclusion restrictions'
leads to a violation of the simplicity postulate. A better alternative to
evaluate the effect of restrictions is Leamer's sensitivity analysis. His
method may be imperfect but the idea is sound and deserves much
more support than it has received. Hendry's reductionism collapses
under its Popperian and probabilistic pretences. Significance testing
does not make econometrics a science and the methodology is likely
once again to yield idiosyncratic models: good representations of specific
sets of data, but hardly successful in deriving novel knowledge on generic
features of the economy. Calibration, finally, throws away the baby with
the bathwater. I support the claim that all models are wrong, but there
are alternative statistical approaches that are able to accommodate this
claim. If econometrics were to escape the straitjacket of the Neyman–
Pearson approach, calibration might return to a fruitful collaboration
with econometrics.
Statistical inference may be an informative research tool but without
investigating the credibility of the assumptions of such a model, and the
sensitivity of the parameters of interest to changes in assumptions, econo-
metrics will not fulfil its task. Sensitivity analysis, in its different forms
(Leamer's, Friedman's) should be a key activity of all applied econome-
tricians. This would also re-direct the research interest, away from
Neyman–Pearson probabilistic tests and Popperian pretence, back to
the substantive analysis of economic questions.
Notes
1. Alternative identifying information may come from linear restrictions in the
model, or from restrictions on the stochastic characteristics of the model.
2. Summers (1991, p. 135) is `Popperian' in claiming that science `proceeds by
falsifying theories and constructing better ones'. In reality, the falsification of
theories is a rare event and not the main concern of scientists. In the remain-
der of his article, Summers is much closer to the instrumentalistic approach
that is supported in this book.
3. Sims (1980) can be regarded as an extension of Liu's argument.
4. The discussion of reductionism draws heavily on Keuzenkamp (1995).
5. These authors define a Bayesian experiment as `a unique probability measure
on the product of the parameter space and the sample space' (p. 25). Upon
asking the second author why the word `experiment' had been chosen, the
answer was that he did not know.
6. This conflation is explicitly recognized by Hendry (1993, p. 77). See also: `a
correct model should be capable of predicting the residual variance of an
incorrect model and any failure to do this demonstrates that the first model is
not the DGP' (p. 86).
7. P(w|y) = P(w, y)/P(y).
8. Hendry does not have a strong interest in non-parametric or non-linear mod-
els. This is hard to justify in a general-to-specific approach. It rules out
economic models with multiple equilibria.
9. A problem with the Wald test is that it is not invariant to mathematically
identical formulations of non-linear restrictions (Godfrey, 1989, p. 65). It is
usually assumed that this does not affect linear restrictions, such as β_1 = β_2.
However, an identical formulation of this restriction is β_1/β_2 = 1, which may
lead to the same problem of invariance. Hence, using a Wald-type general-to-
specific test may not always be advisable. Hendry is open minded with respect
to the choice of test statistics: usually they are chosen on pragmatic grounds.
10. For example, the practical starting point of most of Hendry's modelling, the
autoregressive distributed lag model (in rational form y_t = {b(L)/a(L)}x_t + {1/a(L)}u_t),
is less general than a transfer function model, y_t = {b(L)/a(L)}x_t + {m(L)/r(L)}u_t.
This arbitrary specific starting point
may lead to invalid inferences, for example, in investigating the permanent
income hypothesis (see Pollock, 1992).
11. Of course, one may object that economists are not the best judges of the
fruits of econometrics. This would violate a basic rule of economics (and
political science): the consumer (voter) is always right.
12. Hendry's contribution to this literature is contained as chapter 10 in Hendry
(1993). It is shown that an error correction model is able to `encompass'
Hall's random walk model of consumption in the context of UK data.
There is no doubt that Hendry's specification provides a better approxima-
tion to the data. However, it was not Hall's point to obtain the best possible
approximation. A similar remark applies to the discussion between Hendry
and Ericsson (1991) and Friedman and Schwartz (1991). See also the discus-
sion of calibration in section 6 below.
13. A footnote is added here: `Notwithstanding the difficulties involved in calcu-
lating and controlling type I and II errors' (p. 28).
14. See e.g. the excellent treatment in Hacking (1983).
15. Darnell and Evans (1990), who make an effort to uphold the Popperian
approach to econometrics (meanwhile entertaining a quasi-Bayesian interpre-
tation of probability) similarly complain that Hendry does not deliver the
Popperian goods.
16. Note, moreover, that the recent shift to co-integration, which can also be
found in Hendry's writings, weakens the error-correcting view of the world.
The links between co-integration and error correction are less strong than at
first sight may seem (see Pagan, 1995).
17. See Howson (1988b) for a useful view on accommodation and prediction,
from a Bayesian perspective. Howson claims that the predictionist thesis is
false. This thesis is that accommodation does not yield inductive support to a
hypothesis, whereas independent prediction does. Howson's claim is valid
(roughly speaking) if the hypothesis has independent a priori appeal.
Dharmapala and McAleer (1995), who deal with (objective) truth values
instead of degrees of belief, share Howson's claim, but ignore the condition
of a priori support. In the kind of econometric model design, where ad hoc
considerations play an important role and a priori support is weak, the pre-
dictionist thesis remains valid.
18. See also Summers (1991).
19. In the section omitted from the citation, there is a reference to Lakatos
amongst others. Hendry and Ericsson (1991, p. 22) make nearly the same
claim, using a very subtle change of words. Encompassing is `consistent with
the concept of a progressive research strategy . . . since an encompassing
model is a ``sufficient representative'' of previous empirical findings'. One
wonders whether the move from `statistic' to `representative' has any deeper
meaning (as in the distinction between a `test' and a `check').
20. See also Ehrlich and Liu (1999) for a wide range of examples showing the
misleading information which may result from thoughtless application of
EBA.
21. The computational experiment is the digital heir to Weber's
Gedankenexperiment. Like Weber's, it is not an experiment but an argumen-
tation or simulation. As shown before, `experiment' is a much abused term in
econometrics; calibration follows this tradition.
22. Philosophical realism is discussed in chapter 9, section 2.
23. An additional source of uncertainty may be introduced for the parameters of
the model. Bayesian methods can be used to specify such models.
8 In search of homogeneity
Many writers have held the utility analysis to be an integral and
important part of economic theory. Some have even sought to employ
its applicability as a test criterion by which economics might be
separated from the other social sciences. Nevertheless, I wonder how
much economic theory would be changed if either of the two conditions
above [homogeneity and symmetry] were found to be empirically
untrue. I suspect, very little.
Paul A. Samuelson ([1947] 1983, p. 117)1
1 Introduction
After discussing general problems of statistical and econometric infer-
ence, I will turn to a case study.2 This chapter deals with consumer
behaviour which belongs to the best researched areas in both economic
theory and econometrics.
The theory of consumer behaviour implies a number of properties of
demand functions. Such demand functions have been estimated and the
predicted properties have been compared with the empirical results.
Among the most interesting properties is the homogeneity condition, a
widely accepted piece of economic wisdom which, however, does not
seem to stand up against empirical tests.
This chapter discusses why the rejections were not interpreted as
Popperian falsifications of demand theory. Furthermore, the role of aux-
iliary assumptions is discussed. Finally, the merits of Leamer's (1978)
`specification searches' for interpreting the literature on testing homoge-
neity are assessed.
2 The status of the homogeneity condition: theory
2.1 Introduction
The condition of homogeneity of degree zero in prices and income of
Marshallian demand functions belongs to the core of micro-economic
wisdom. Simply put, this condition says that if all prices and income
change proportionally, expenditure in real terms will remain unchanged.
This has a strong intuitive appeal. Some economists call it the absence of
money illusion; they claim that it is a direct consequence of rational
behaviour.
The homogeneity condition results from an idealizing hypothesis, i.e.
linearity of the budget constraint which, jointly with a set of idealizing
hypotheses related to the preferences of individuals, implies the law of
demand. This law states that compensated demand curves do not slope
upward. Another ideal condition that can be derived from the idealizing
hypotheses is symmetry and negative definiteness of the so-called Slutsky
matrix of compensated price responses. Economists have tested these
conditions, and, in many cases, they have had to reject them statistically.
2.2 Falsifiable properties of demand
2.2.1 Axioms and restrictions
The economic theory of consumer behaviour is founded on a set
of axioms and assumptions. Let preferences of some individual for con-
sumption bundles q be denoted by the relations ≽ (`is weakly preferred
to') and ≻ (`is strictly preferred to'). The following requirements or
`axioms of choice' are imposed (see the excellent introduction to demand
theory by Deaton and Muellbauer, 1980b, pp. 26–9; much of the follow-
ing is explained in more detail in this or other intermediate textbooks on
micro-economics).
Axioms of choice
(A1) Irreflexivity: for no consumption bundle q_i is it true that q_i ≻ q_i
(A2) Completeness: for any two bundles in the choice set, either q_1 ≽ q_2 or q_2 ≽ q_1, or both
(A3) Transitivity (or consistency): if q_1 ≻ q_2 and q_2 ≻ q_3, then q_1 ≻ q_3
(A4) Continuity
(A5) Nonsatiation
(A6) Convexity of preferences.
The first three axioms define what economists regard as rational behaviour.
(A1) to (A4) imply the existence of a well-behaved utility function,

U(q_1, q_2, ..., q_n),   (1)

where q_i denotes the (positive) quantity of commodity i (i = 1, ..., n).
The utility function is continuous, monotone increasing and strictly
quasi-concave in its arguments, the quantities q_i. The consumer has to
pay (a positive) price p_i per unit of commodity i.
The fourth axiom is not required for rational behaviour (it rules out
lexicographic preferences even though these may be the taste of a per-
fectly rational individual) but is generally imposed for (mathematical)
convenience. (A5) ensures that the solution is on the budget constraint
instead of being inside the constraint (hence, there is no bliss point). (A6)
allows the use of standard optimization theory but is non-essential for
neoclassical demand theory.
Utility is maximized subject to the budget constraint. An important
additional restriction (R) usually imposed on the consumer choice pro-
blem is linearity of this constraint:
(R1)   Σ_{i=1}^n p_i q_i = m,   (2)

where m is the budget (or means) available for consumption.
Yet another restriction, needed if one wants to carry out empirical
analysis, is to impose a particular functional form on the utility function
or derived functions (in particular the indirect utility function, and the
expenditure function), and to select an appropriate set of arguments that
appear in the utility function. I will summarize these constraints in:
(R2) Choice of a particular functional form.
(R3) Choice of arguments.
Are these axioms and assumptions like Nowak's (1980) `idealizing
hypotheses'? Such idealizing hypotheses are false, in the sense that they
are in conflict with reality. Most economists would agree that this is the
case for (A1)–(A6) and (R1)–(R3). Economic behaviour can be explained
by this set of restrictions and axioms, but it is acknowledged that eco-
nomic subjects do not always fit the straitjacket. This provides the food
for psychologists.
Indeed, tests of the axioms (such as transitivity, and in particular tests
in a situation of choice under uncertainty) have repeatedly shown that the
axioms are violated by mortals, even by Nobel laureates in economics.
The papers in Hogarth and Reder (1986) provide ample evidence against
perfect rationality as defined above.
Not only are the axioms `false', but the additional restrictions are also
defective. We know, for example, that occasionally prices do depend on
the quantity bought by a particular consumer, either because the consu-
mer is a monopsonist, or because of quantity discounts. This would imply
a non-linear budget constraint. However, such non-linearities are thought
to be fairly unimportant in the case of choosing a consumption bundle.
Hence, the axioms and constraints are `false', but still they are very
useful for deriving economic theorems. One such theorem is the law of
demand, to be specified below. But the axioms and further restrictions are
not just instrumental in deriving ideal laws; they also serve to interpret
the real world. It has even been argued that a researcher who obtains
empirical findings or correlations that imply violations of any of (A1)–
(A6) and (R1) is not entitled to call such correlations `demand func-
tions': the estimates of the parameters are not valid estimates of elasti-
cities (Phlips, 1974, p. 34). Such violations can be found by analysing the
properties of demand functions that can be deduced from the axioms and
restrictions. I will now turn to this issue.
2.2.2 Demand functions and their properties
Reconsider the utility function which does not have income and
prices as arguments. Furthermore, the function presented is a static one,
independent of time. Both assumptions are examples of (R3). The indi-
vidual agent maximizes utility subject to the budget constraint. Solving
the optimization problem results in the Marshallian demand equations
that relate desired quantities to be consumed to income and prices, of
which the one for good i reads:
MD   q_i = f_i(m, p_1, p_2, ..., p_n).   (3)
It is by way of (R1) that the prices and the budget enter the demand
equation.
Alternatively, it is possible to formulate demand functions in terms of a
given level of utility, the so-called Hicksian or compensated demand
functions:
HD   q_i = h_i(U, p_1, p_2, ..., p_n).   (4)
It is now possible to be more explicit about the properties of demand
functions. There are four properties of interest (Deaton and Muellbauer,
1980b, pp. 43–4):
(P1) Adding up: the total value of demands for different goods equals
expenditure
(P2) Homogeneity: the Marshallian demand functions are homogeneous
of degree zero in prices and expenditure, the Hicksian demand func-
tions are homogeneous of degree zero in prices
(P3) Symmetry: the cross price derivatives of Hicksian demand functions
are symmetric
(P4) Negativity: the matrix of cross price derivatives of Hicksian demand
functions is negative semi-definite.
It is (P4) in isolation which is often called the `law of demand'. It is a
typical example of an `ideal law'. It implies downward sloping compen-
sated demand functions. Occasionally, the four properties taken together
are interpreted as defining demand functions (Phlips, 1974). Note that
Hicks ([1939] 1946, pp. 26, 32) shows that the downward sloping com-
pensated demand curve is (using Nowak's terminology) a concretization
of Marshall's original demand function. Marshall derived the condition
that the (uncompensated) demand function is downward sloping by
imposing that the marginal utility of money is constant. When this idea-
lizing assumption is dropped, the remaining results are still of interest
and, one might add, closer to reality (truth).
The homogeneity property, (P2), states that if m and all p_i are multiplied
by the same factor, the quantity demanded of good i (or any other
good) is not changed. Hence,

q_i = f_i(m, p_1, p_2, ..., p_n) = f_i(αm, αp_1, αp_2, ..., αp_n),   (5)
or ∂q_i/∂α = 0: Nowak's (1980, p. 28) condition to qualify as an idealizing
assumption. The price and income terms enter the demand equation via
the budget constraint, not the utility function. Multiplying both sides of
the constraint by α has no consequences for the composition of the set of
quantities demanded to satisfy the budget constraint. Hence, the homo-
geneity condition is just an implication of the (linear) budget constraint
and only a very weak condition of rational choice. Deaton (1974, p. 362)
argues that it is hard to imagine any demand theory without the homo-
geneity condition and even identifies non-homogeneity with irrationality
(in which case either (P2) or (R1) should be added to (A1)–(A3) in
defining rational behaviour). Deaton is not alone in viewing (P2) as an
indispensable condition. For example, Phlips (1974, p. 34) writes:
Every demand equation must be homogeneous of degree zero in income and
prices . . . In applied work, only those mathematical functions which have this
property can be candidates for qualification as demand functions.
I will now turn to the question of whether empirical work on estimating
demand functions served a particular role in `concretizing' the theory of
consumer behaviour. In particular, I will focus on testing (P2), the homo-
geneity condition.
For expository reasons, it is useful to write the Marshallian demand (3)
in double logarithmic form and to impose linearity:
ln q_i = β_i + η_i ln m + Σ_{j=1}^n ε_ij ln p_j,   i, j = 1, 2, ..., n,   (6)
where η_i is the income elasticity of demand and ε_ij are the price elasticities.
If multiplication of m and all p_i by the same factor leaves q_i
unchanged (see (5)), then the following condition should hold:

η_i + Σ_{j=1}^n ε_ij = 0.   (7)
This equation has been the basis for a number of empirical tests of the
homogeneity condition. An alternative formulation has been used as well.
This will be derived now. The price elasticity can be decomposed as:
ε_ij = ε*_ij - η_i w_j,   (8)

where w_j = p_j q_j/m defines the expenditure share of good j. Now (6)
can be rewritten as:
ln q_i = β_i + η_i (ln m - Σ_{j=1}^n w_j ln p_j) + Σ_{j=1}^n ε*_ij ln p_j.   (9)
As is obvious from (9), the η_i w_j component of ε_ij is the part of the price
reaction which can be neutralized by an appropriate change in m. Thus
the ε*_ij are known as the compensated (or Slutsky) price elasticities as
opposed to the uncompensated ones. It follows from the property that
Σ_{j=1}^n w_j = 1 that the homogeneity condition (7) is equivalent to:

Σ_{j=1}^n ε*_ij = 0.   (10)
Both (7) and (10) have been used to test homogeneity.
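A test based on (7) amounts to checking a single linear restriction on the coefficients of (6). The following sketch (simulated data and parameter values of my own choosing) estimates the unrestricted double-logarithmic demand equation, imposes homogeneity by deflating income and prices by one of the prices, and compares the two fits with an F test:

# A minimal sketch of testing the homogeneity restriction (7) in the double-
# logarithmic demand equation (6): the income elasticity plus the sum of the
# uncompensated price elasticities should be zero. The data are simulated under
# homogeneity; all numbers are assumptions for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 200
ln_m = rng.normal(5.0, 0.3, n)                       # log budget
ln_p = rng.normal(0.0, 0.2, (n, 3))                  # log prices of three goods
eta, eps = 1.0, np.array([-0.6, -0.3, -0.1])         # eta + sum(eps) = 0
ln_q = 0.5 + eta * ln_m + ln_p @ eps + rng.normal(0, 0.05, n)

X = np.column_stack([np.ones(n), ln_m, ln_p])        # unrestricted regressors
b, *_ = np.linalg.lstsq(X, ln_q, rcond=None)
ssr_u = np.sum((ln_q - X @ b) ** 2)

# Restricted model: impose eta + sum(eps) = 0 by deflating with the third price,
# i.e. regress on (ln m - ln p_3) and (ln p_j - ln p_3) for j = 1, 2.
Xr = np.column_stack([np.ones(n), ln_m - ln_p[:, 2], ln_p[:, :2] - ln_p[:, [2]]])
br, *_ = np.linalg.lstsq(Xr, ln_q, rcond=None)
ssr_r = np.sum((ln_q - Xr @ br) ** 2)

F = (ssr_r - ssr_u) / (ssr_u / (n - X.shape[1]))     # one restriction
p_value = 1 - stats.f.cdf(F, 1, n - X.shape[1])
print("F statistic:", round(F, 3), " p-value:", round(p_value, 3))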
2.2.3 Homogeneity and the systems approach
In principle, there are as many demand equations as there are
commodities and they should all satisfy the homogeneity condition. One
single contradiction invalidates it as a general property. A test of homo-
geneity should, therefore, involve all equations. A natural approach
would be a formal test of the homogeneity condition for all equations
at the same time.
The systems approach to demand analysis supplies an opportunity for
such a type of test. The empirical application of the Linear Expenditure
System (LES) by Stone (1954b) paved the way for this approach.
Normally, the same functional form is used for all equations of the
system. From the point of view of testing homogeneity, this is not really
needed. Johansen (1981) views it as a straitjacket, reducing the empirical
validity of the system. Departing from this practice, however, makes the
use (or test) of other properties of demand functions, like symmetry,
cumbersome. It would reduce the attraction of the systems approach as
a relatively transparent, internally consistent and empirically applicable
instrument of demand analysis.
Let us turn to the famous early example of an application of the
systems approach: the LES. The utility function is specified as:

Σ_{i=1}^n β_i ln(q_i - γ_i),   Σ_{i=1}^n β_i = 1,   (11)
with β_i and γ_i (γ_i < q_i, β_i > 0), and i = 1, ..., n. This utility function is
maximized subject to the budget equation (R1). The first order conditions
for a maximum are solved for the optimizing quantities to yield the
demand equations:

q_i = γ_i + β_i (m - Σ_{j=1}^n γ_j p_j)/p_i.   (12)
Multiplying through by p_i yields an equation for optimal expenditures as
a linear function of m and the prices. Note that all prices appear in each
demand equation. Note also from (12) that homogeneity is an integral
part of the specification. Indeed, the LES was not invented to test
demand theory, but to provide a framework for the measurement of
allocation. Significance tests reported should be interpreted as referring
to the accuracy of measurements (estimates), not the accuracy of the
theory.
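For concreteness, the demand equations (12) are easy to evaluate numerically. The sketch below uses invented parameter values to verify that the LES satisfies adding up and is homogeneous of degree zero by construction:

# A small sketch of the Linear Expenditure System, equation (12):
# q_i = gamma_i + (beta_i / p_i) * (m - sum_j gamma_j p_j). The parameter values
# and prices are assumptions for illustration only.
import numpy as np

beta = np.array([0.4, 0.35, 0.25])     # marginal budget shares, sum to one
gamma = np.array([2.0, 1.0, 0.5])      # 'subsistence' quantities
p = np.array([1.0, 2.0, 4.0])
m = 20.0

def les_demand(m, p):
    supernumerary = m - gamma @ p      # budget left after subsistence spending
    return gamma + beta * supernumerary / p

q = les_demand(m, p)
print("demands:", q.round(3))
print("budget exhausted:", np.isclose(p @ q, m))                  # adding up
print("homogeneous:", np.allclose(les_demand(2 * m, 2 * p), q))   # degree zero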
Other types of specification, however, do allow for testing the homo-
geneity condition. One important type makes use of duality and starts off
from the indirect utility function, expressing the maximum obtainable
utility for a given set (m,p) where p indicates a vector of prices:
U(f_1(m, p), f_2(m, p), ..., f_n(m, p)) = V(m, p).   (13)
Such a function is theoretically homogeneous of degree zero in (m, p), but
one does not need to impose this property on the specification. Using
Roy's identity, one can obtain Marshallian demand functions:
q_i = -(∂V/∂p_i)/(∂V/∂m).   (14)
with the same functional form for all i but not necessarily homogeneous
of degree zero in (m, p). One can replace the f_i(m, p) in (13) by f_i(π), where
π is the vector of prices p divided by m. Then V(m, p) specializes to V*(π).
Since π is invariant to the same proportional change in m and the p, this
reformulation conserves the homogeneity condition. The counterpart of
(14) is then:

q_i = -(∂V*/∂p_i)/(∂V*/∂m).   (15)
It is essential that (15) is distinguishable from (14), for example, in the
form of restrictions on the parameters to be estimated. One can compare
the empirical performance of (14) with that of (15) for all equations of the
system together. If that of (14) is significantly better than that of (15) one
has to reject the homogeneity condition. The Generalized Leontief
System of Diewert (1974) and the Indirect Translog System of
Christensen, Jorgenson and Lau (1975) are examples of (14). These sys-
tems were, however, not used to test homogeneity separately from the
other restrictions.
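The derivation behind (14) and the homogeneity property (5) can be checked symbolically. The sketch below applies Roy's identity to a simple Cobb–Douglas indirect utility function of my own choosing (not one discussed in the text) and verifies that the resulting Marshallian demand is homogeneous of degree zero:

# A symbolic check of Roy's identity, equation (14), for an assumed Cobb-Douglas
# indirect utility function V(m, p1, p2) = ln m - a*ln p1 - (1-a)*ln p2, together
# with the homogeneity property (5).
import sympy as sp

m, p1, p2, a, alpha = sp.symbols('m p1 p2 a alpha', positive=True)
V = sp.log(m) - a * sp.log(p1) - (1 - a) * sp.log(p2)

q1 = sp.simplify(-sp.diff(V, p1) / sp.diff(V, m))   # Roy's identity
print(q1)                                           # a*m/p1: the Marshallian demand

# Homogeneity of degree zero: scaling m and all prices by alpha leaves q1 unchanged.
q1_scaled = q1.subs({m: alpha * m, p1: alpha * p1, p2: alpha * p2})
print(sp.simplify(q1_scaled - q1) == 0)             # True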
A related approach stipulates the functional form of the expenditure
function. This function expresses the minimum m needed to reach a
certain utility level for a given p. Like the indirect utility function, it is
a concept that reflects the utility optimizing behaviour of the consumer.
The expenditure function can be obtained by solving m in the indirect
function (13) in terms of U and p, yielding, say, m̃(U, p). According to
Shephard's lemma the partial derivatives of the expenditure function with
respect to p_i are the Hicksian (or compensated) demand functions in
terms of U and p. Replacing U by the right-hand side of (13) gives the
desired demand functions. Homogeneity implies that m̃(U, p) is homo-
geneous of degree one in the prices: if prices double, expenditure must
double as well in order to reach the same utility level. If that is indeed the
case and V*(π) is used to replace U in the Hicksian demand functions,
they are homogeneous of degree zero. However, one does not need to
impose the restriction right away. By comparing the restricted and
unrestricted equations, the restriction may be tested on observable beha-
viour. A typical example of a system that corresponds with this approach
is the Almost Ideal Demand System (see Deaton and Muellbauer, 1980a).
Still another approach takes a short cut and writes down the demand
functions directly in terms of m and p and a set of coefficients. Byron
(1970), for example, applies the double logarithmic formulation of
demand functions, (9), with constant η_i and ε*_ij. The homogeneity condi-
tion is then (10), which can be tested. This specification is somewhat less
convenient if one also wants to test for Slutsky symmetry. Theil's (1965)
formulation of the Rotterdam System is in this respect more tractable
and at the same time also allows for testing homogeneity. Theil multiplies
both sides of (9) in differential form by the expenditure share w_i.
2.3 Sources of violations
2.3.1 Invalid axioms
It may be true that one or more of (A1)–(A6) are invalid, which
may lead to a violation of the homogeneity property, (P2). This is some-
times called `money illusion'. In that case, changes in absolute (rather
than relative) prices may not be recognized as such and, therefore, induce
changes in real expenditure patterns.
Locating the source of a violation of homogeneity in defective axioms
has been a rare response. Marschak (1943), an early tester of homogene-
ity, is an exception. The problem with this approach is that, apart from
`anything goes', a viable alternative theory of demand does not exist, at
least in the micro-economic domain. Deaton (1974, p. 362) is typical. He
rejects this response, as it would require an `unattractive' hypothesis of
irrational behaviour.
An alternative that still implies rational behaviour is to add a stochastic
element to the utility function (Brown and Walker, 1989). This is an
interesting point, as it shows the differences between the object (the
observing econometrician) and the subject (the rational individual).
Even though the subject behaves consistently with (A1)–(A6), the econo-
metrician cannot observe all relevant factors that determine the individ-
ual's preferences. In most empirical studies, this is reflected in adding a
disturbance term to demand equations `as a kind of afterthought'
(Barten, 1977). By relating a stochastic disturbance term directly to an
individual's utility, it is acknowledged that individuals differ and that
these differences are not observable to the econometrician. If the utility
function is extended with a disturbance term, the demand equations will
have random parameters. The econometrician not only has to stick an
additive disturbance term to his specification, but also has to take care of
random parameters (as, for example, in Brown and Walker, 1989).
2.3.2 Invalid restrictions
The restrictions (R1)–(R3) above may be invalid. First, consider
(R1), linearity of the budget constraint. In reality, prices may be a
decreasing function of the quantity bought. It is difficult to think of cases
where this invalidates the homogeneity condition directly.
The functional form, (R2), is more likely to generate a clash between
theory and empirical results. Since the pioneering work of Schultz, Stone
and others, extensive work has been done to create more general func-
tional forms for utility functions or demand equations. In addition, the
systems approach which takes advantage of the cross-equation restric-
tions meant a leap forward. During the 1970s, much effort was spent on
inventing more flexible functional forms, inspired by Diewert's (1971) use
of duality. It is not clear, however, how the choice of a functional form
affects tests of the homogeneity condition (apart from the fact that, for
some functional forms, tests of homogeneity cannot be carried out). The
increased subtlety of the specifications does not generally result in `better'
test results. More recently, non-parametric approaches to inference have
been used to test the theory of consumer behaviour.
The choice of arguments that appear in the demand equation, (R3),
may be inappropriate. For example, some variables are omitted. The
validity of an empirical test of homogeneity or symmetry is conditional
on the correct specification of the model, the maintained hypothesis.
Three examples of omissions are:
(a) snob goods (goods that are more desirable if their prices go up, also
known as Veblen goods). For such goods, the utility function must
include prices. However, such goods are rare and may, therefore, be
legitimately omitted. Snob goods have, to the best of my knowledge,
never been invoked to explain a rejection of homogeneity;
(b) the planning horizon. If consumers maximize lifetime utility rather
than contemporary utility, the optimization problem should be for-
mulated in lifetime terms. However, if m in (3) is sufficiently highly
correlated with the life-long concept of expected available means and
if the price expectations are also highly correlated with the prices in
(3), the included variables take over the role of the omitted ones and
homogeneity should hold. These considerations are an
empirical matter, not a purely theoretical one. Hence, homogeneity
may or may not hold in the specification. This is a problem that
cannot be settled by a priori deductive reasoning. In general, mis-
specification of (3) might result in biased estimates of the coefficients
in the demand equations. The bias could be away from or towards
homogeneity. Dynamic mis-specification is not the only possible
source of mis-specification. Omission of variables representing
changes in preferences can obscure homogeneity even if it exists;
(c) goods not included in the goods bundle considered. In theory, the
maximization problem should be formulated in the most general form
possible, with all conceivable arguments that any consumer might be
willing to consider. All conceivable qualities of goods should be spe-
cified in the choice set. This is impossible for the empirical economist.
In applied work, goods are aggregated. The level of aggregation dif-
fers but it is impossible to do without aggregation.
2.3.3 Invalid approximations
All empirical demand systems approximate some kind of ideal
demand system. It is possible that a rejection of homogeneity is caused by
approximation to the ideal rather than by truly non-homogeneous demand
equations. As Theil (1980, p. 151) remarks, translog specifications (such
as employed by Christensen, Jorgenson and Lau, 1975) have been recom-
mended on the ground that quadratic functions in logarithms can be
viewed as second-order Taylor approximations to the `true' indirect uti-
lity function. They are accurate when the independent variables take
values in sufficiently small regions. Still, even if these regions are suffi-
ciently small, the implied approximation for the demand functions may
be very unsatisfactory. It follows from Roy's identity that the demand
equations are only first-order approximations (in this respect the indirect
translog system is similar to the Rotterdam system).
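As a reminder (this is the textbook identity, not a reproduction of the exact equations used in the studies discussed), Roy's identity recovers demand from the indirect utility function V:

\[
q_i(p, m) = - \frac{\partial V(p, m)/\partial p_i}{\partial V(p, m)/\partial m} .
\]

Because the demand equations are obtained by differentiating V, one order of accuracy is lost: a translog specification that approximates the (logarithm of the) indirect utility function to the second order implies demand equations that, as Theil notes, are only first-order approximations.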
Clearly, there is a basis for doubt about the validity of the homogeneity
condition in a specific model. At the same time, it is not clear how
important deviations from the ideal world characteristics which underlie
the theory of demand are. Homogeneity, like symmetry, is an ideal world
consequence. For several reasons it may be useful to impose it on empiri-
cal models of actual behaviour. Whether this imposition is valid depends
not only on the resemblance of reality to the ideal world, but also on the
validity of the auxiliary assumptions which are used above. These con-
siderations inspired the investigations of the homogeneity condition in
consumer demand.
Another approximation relates to how the variables are measured. For
example, m is usually calculated using the budget equation, (R1). The
impact this has on the empirical validity of the homogeneity condition is
not clear. The measurement error is of relatively little interest; at least,
empirical researchers have rarely investigated errors in variables as a
source of rejecting homogeneity. (Stapleton, mentioned in Brown and
Walker, 1989, is an exception, but he does not provide sufficient evidence
to show that empirical results are driven by measurement error.)
2.3.4 Invalid aggregation
The properties of demand functions for individual agents and
elementary goods do not necessarily carry over to demand functions for
aggregates of goods by aggregates of individuals. All empirical investiga-
tions cited in this chapter deal with such aggregate demand functions. In
more recent empirical investigations, applied to cross-sectional micro-
economic data, homogeneity and symmetry remain occasionally proble-
matic (see the survey by Blundell, 1988).
Conditions for aggregation of goods are spelled out in Hicks ([1939]
1946). Most economists are not troubled by aggregation of goods.
Empirical deviations from Hicks' composite commodity theorem (if the
relative prices of a group of commodities remain unchanged, they can be
treated as a single commodity) are thought to be non-essential.
Apart from aggregating goods, economists often test demand theory
by aggregating individuals. Again, this may be inappropriate. Conditions
for aggregating individuals are sketched in Deaton and Muellbauer
(1980a, b). An interesting step has been the formulation of an Almost
Ideal Demand System (AIDS), which takes care of flexible functional form
and aggregation issues. Conditions for aggregation are so strong that
many researchers doubt that findings for aggregate data have any impli-
cation at all for the theory of consumer demand that deals with individual
behaviour. Blundell (1988) discusses examples of testing the theory using
non-aggregated data. Still, even if data for individuals are used, the prop-
erties of demand are occasionally rejected by the data.
An additional `problem' for testing the theory in such a context is that
the number of observations tends to be very large. A typical example is
Deaton (1990), who estimates demand functions given a sample of more
than 14,000 observations. With such large samples any sharp null
hypothesis is likely to be rejected given a conventional (5 or 1%) signifi-
cance level. Not surprisingly, Deaton reports a rejection (of symmetry;
homogeneity is not tested). Deaton's resolution of this rejection is to
become an expedient Bayesian. He invokes the Schwarz (1978) criterion,
which `saves' the symmetry condition.
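The mechanics can be illustrated with a deliberately simple, purely hypothetical calculation (the numbers below are invented and have nothing to do with Deaton's actual data): with roughly 14,000 observations a conventional 5% test rejects even a negligible deviation from a sharp null, whereas the Schwarz criterion, whose implicit threshold grows with log n, may still prefer the restricted model. A minimal Python sketch:

import math

# Hypothetical numbers: a sample mean of 0.02 (in units of one residual
# standard deviation) from n = 14,000 observations, testing the sharp
# null hypothesis that the true mean is zero.
n, xbar = 14_000, 0.02

z = math.sqrt(n) * xbar                  # z-statistic, roughly 2.37
reject_at_5pct = abs(z) > 1.96           # conventional 5% test rejects

# Schwarz (1978): the extra parameter must raise 2*log-likelihood by more
# than log(n); for this Gaussian example 2*(logL1 - logL0) equals z**2.
schwarz_keeps_null = z**2 < math.log(n)  # 5.6 < 9.55: restriction retained

print(z, reject_at_5pct, schwarz_keeps_null)

The same data can thus `reject' the restriction at the 5% level and retain it under the Schwarz criterion, which mirrors the way the criterion `saves' symmetry in the study just cited.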
2.3.5 Invalid stochastics
The statistical tests reported below all rely on assumptions con-
cerning the validity of the test statistics. Most important are assumptions
related to the appropriateness of asymptotic statistics to small samples
and assumptions about the distribution of the residuals. These assump-
tions are frequently doubtful.
Consider the early econometric efforts to estimate demand functions,
e.g. Moore (1914). Model specifications were selected after comparing the
goodness of fit of alternative models. Karl Pearson's (1900) goodness of
fit (χ²) test was the tool, but the properties of this test statistic (in parti-
cular the relevance of degrees of freedom) were not well understood
before 1921. Similarly, more recently, systems of demand have been esti-
mated. Testing homogeneity in a system of demand equations is depen-
dent on the unknown variance–covariance matrix. In practice, this has to
be replaced by an estimate, usually a least squares residual moment
matrix. Econometricians who used these tests were aware that the sam-
ples available for investigating consumer demand were not large enough
to neglect the resulting bias, but Laitinen (1978) showed that the impor-
tance of this problem had been grossly underestimated. Many statistical
tests do not have well understood small sample properties even today. I
will return to this issue in section 5 below.
Statistical tests of homogeneity also usually assume specific distribu-
tions of the residuals. Models of random utility maximization
(Brown and Walker, discussed above) suggest that the assumption of
homoskedasticity may be invalid. If the residuals are heteroskedastic
and no correction is made, tests tend to over-reject homogeneity.
In short, in the econometric evaluation of consumer behaviour, inves-
tigators started with two additional `axioms':
(A7) Axiom of correct specication
(A8) The sample size is infinite
(A7) relates to the validity of the `maintained hypothesis'. This is rarely
the case, taken literally, but econometricians behave as if their model is
correctly specified. Still, (A7) may be invalid, causing a rejection of
homogeneity. This is related to invalid restrictions (R2) and (R3), but
the purely statistical argument about the stochastics of the demand equa-
tions has been further exploited by those who favour a non-parametric
approach to inference, in which no specific distributional assumptions for
the residuals are presumed. Varian (1983) argues that a major drawback
of parametric testing is the fact that a rejection can always be attributed
to wrong functional forms or mis-specification.
(A8) might also be labelled as the asymptotic idealization. Many tests
are based on this assumption of the availability of an infinite amount of
data. Again, no one would claim that a particular sample obeys this
requirement, but it is a matter of dispute when `large' is large enough.
In econometrics, it is clear that sample sizes are not quite as large as (A8)
would require. This was acknowledged early on, for example by
Haavelmo (1944) who relies on a weakened version of (A8):
(A8′) The sample is a random drawing from a hypothetical infinite popu-
lation.
As shown in chapter 6, this version of Asymptopia has gained tremen-
dous popularity among econometricians, but an uncritical belief in
(A8′) may result in an unwarranted rejection of the properties of
demand.
3 Empirical work
The history of testing the homogeneity condition reads as a short
history of econometric inference. All kinds of problems of inference
are discussed, and different approaches in statistics are applied to estimat-
ing and testing demand equations. In this section, I use a number of
empirical studies to illustrate some of the problems of econometric
inference.
3.1 Early empirical work
3.1.1 Goodness of fit
The earliest known statement about the law of demand goes
back to Davenant (1699) who observed a negative relation between the
quantity of corn and its price. The relation differs from (3) in several
respects. First of all, it takes the quantity as given and the price to be
determined. This makes sense for agricultural goods. For the purpose at
hand, the fact that only one good and one price are considered is more
important because homogeneity involves a set of prices and means. The
simplified relation may be justified by an appeal to a ceteris paribus
clause: the other prices and means are taken to be constant. Whether
this can be maintained empirically was beyond the scope of Davenant.
Most likely, the assumption was invalid, which does not diminish the
importance of his effort as a crude first approximation.
Davenant's assumption was used unchanged until the nineteenth cen-
tury, when the quantity demanded was also made dependent on prices
other than its own. Walras ([1874–7] 1954) needed all quantities to
depend on all prices and vice versa for his general equilibrium framework.
According to Walras, the choice of the numeraire, the unit in which the
prices are expressed, is arbitrary. This, of course, is a way of stating the
homogeneity condition. Walras did not pursue an empirical analysis.
Henry Moore was one of the first economists to make a statistical inves-
tigation of demand functions. Homogeneity was not among his interests,
however.
What is of interest in Moore's work is the source of his statistical
theory. Moore attended courses in mathematical statistics (1909) and
correlation (1913) with Karl Pearson (Stigler, 1962). Pearson, one of
the champions of positivism and measurement, was a clear source of
inspiration in Moore's efforts to measure elasticities of demand. Two
of Pearson's innovations, the correlation coefficient and the goodness
of fit test, were instrumental in Moore's investigations. Stigler notes
that Moore was not aware of losses of degrees of freedom in his applica-
tions of the χ² test. This is no surprise as the issue was not solved before
1921 when Fisher provided the argument. Pearson himself was most
reluctant to accredit this point to Fisher (see Fisher Box, 1978,
pp. 84–5). Moore's use of goodness of fit tests was as modern as could
be expected.
Moore's legacy not only consists of his early empirical work on
demand, but also of his `professed disciple' (Stigler, 1962), Henry
Schultz. Schultz (1928), a pioneering authority in this field, estimates
demand functions for sugar. After a short discussion of Cournot and
Marshall (suggesting a demand equation with the demanded good's
own price as the only explanatory variable; a ceteris paribus clause
states that all other prices are constant), Schultz turns to the so-called
`mathematical school' of Walras and Pareto. Here, the quantity
demanded is a complicated function of many price variables. Schultz
(p. 26) asks:
How can we deal with such complicated functions in any practical problem? The
answer is that, although in theory it is necessary to deal with the demand function
in all its complexity in order to show that the price problem is soluble, in practice
only a small advantage is gained by considering more than the first few highly
associated variables.
Schultz follows the pragmatic approach, by estimating a simple sugar
demand equation with real income and prices deflated by the consumer
price index as explanatory variables. His motivation is neither to exploit
homogeneity explicitly, nor to test it, but to see if the fit is better than
when using absolute prices.
Schultz (1937) makes use of theoretical insights of Slutsky and
Hotelling and even provides an informal test of the symmetry condition
(pp. 633, 645). A formal test for homogeneity is not pursued, although
some casual information on its validity is provided. After discussing the
question of whether estimation of demand functions should use absolute
or relative prices, Schultz (p. 150) chooses the latter, because `competent
mathematical statisticians have long felt that every effort should be made
to reduce the number of variables in a statistical equation to a minimum'.
The choice of variables is inspired by this pragmatic statistical motiva-
tion, not by theoretical argument. The informal test is to compare a
demand equation using real price and consumption per capita on the
one hand, and nominal price and nominal consumption per capita on
the other hand. Scaling by a price index and a population index has the
same goal: to improve the fit. The results (p. 71 and p. 80) show no
difference in fit.
3.1.2 Money illusion?
Schultz is not interested in testing theory or the homogeneity
condition. Rather, the purpose is to estimate elasticities. Marschak
(1943) goes beyond this: he wants to test the null hypothesis of `absence
of money illusion'.³ The following motivation is given by Marschak
(p. 40):
For the economist, our `null hypothesis' has an additional interest on the grounds
(1) that it is a necessary (though not sufficient) condition of rational behavior,
defined as using one's income to one's best satisfaction, (2) that it supplies a
justification for using `deflators' in economic statistics and for discussing the
demand relationships in terms of so called `real' incomes and prices, and (3)
that it is incompatible with important theories of unemployment.
Two incompatible theories mentioned are Irving Fisher's theory that
people are more sensitive to changes in income than to changes in prices,
and Keynes' theory of unemployment. The theoretical background is the
Slutsky equation from which the restrictions on demand functions are
derived. Marschak's empirical model of demand is a combination of an
Engel curve and price substitution effects. Just one demand equation is
estimated: demand for meat. The explanatory variables are income and
the prices of meat, other food and non-food goods. The test carried out is
not a formal significance test, but standard errors of the estimates of
elasticities are given. Using (7) for the test, the estimated elasticities reject
Marschak's null hypothesis: the numerical magnitude of the income elas-
ticity is larger than the sum of the price elasticities. Marschak is almost
alone in the history of testing homogeneity for his willingness to interpret
his rejection as a falsification, although a `sweeping verdict' on the valid-
ity of the null hypothesis should not yet be made (p. 48).
3.1.3 Stone and the measurement of consumer behaviour
The next important study is Stone (1954a). His model is an
example of the set of double logarithmic demand equations, discussed
above. He (p. 259) employs (10) for the actual test. Estimates are given
for thirty-seven categories of food, based on data for the period 1920–38.
The equations are parsimonious (not all substitution terms are included –
this would be impossible, given the limited number of observations).
Although they result from a search for `the best significant equation'
(p. 328), no one would regard Stone's efforts as thoughtless data-mining.
He (p. 254) gives the following motivation for his investigation of
demand theory:
In the first place it gives rise to a number of hypotheses about the characteristics
of demand equations which, in principle at any rate, are capable of being tested by
an appeal to observations. These are the `meaningful theorems' of Samuelson,
which, however, will here be termed verifiable theorems. One such theorem is, for
example, that consumers do not suffer from `money illusion' or, put more specifi-
cally, that demand equations are homogeneous of degree zero in incomes and
prices so that a simultaneous multiplication of incomes and prices by the same
positive constant would leave the amounts demanded unchanged.
In the second place these implications of consistent behaviour usually involve
some restriction on the parameters of the demand equations. Thus the homoge-
neity theorem just referred to entails that the sum of the elasticities of demand
with respect to income and all prices is zero. (p. 254)
Stone refers to Samuelson ([1947] 1983, p. 4), who defines a meaningful
theorem as `a hypothesis about empirical data which could conceivably
be refuted, if only under ideal conditions'. From the actual test, Stone
(1954a, p. 328) infers that,
with two exceptions, home produced mutton and lamb, and bacon and ham, the
sums of the unrestricted estimates of the price (substitution) elasticities do not
differ significantly from zero. Accordingly, the assumption that the condition of
proportionality is satisfied is supported by the observations.
The significance level employed is the familiar 5%. Stone wants to verify,
not falsify. Had he been a falsificationist, he would have concluded that
the two rejections cast demand theory in doubt, as the theory should
apply to all demand equations together. Instead, he appears to be pleased
with the results and continues by imposing homogeneity, interpreting the
estimated equations as proper demand functions, and the parameters as
proper elasticities.
3.2 Empirical work based on demand systems
The first test of homogeneity in a systems approach is carried out in
Barten (1967). The test is a joint one for homogeneity and symmetry,
hence not a test of homogeneity alone. The homogeneity part of the test
is based on (10). Although it was calculated, the result of a separate test
for homogeneity was not published. This result suggested that the homo-
geneity condition was not well supported by the data, but it was unclear
whether the result was robust. The joint test of homogeneity and sym-
metry, which was published, did not lead to strong suspicion of these
conditions taken together. Testing demand theory is not the main goal of
the paper: the focus is on estimation.
In this context, and given the very strong a priori appeal of the homo-
geneity condition, Barten summarizes the evidence on homogeneity as
follows:
we may conclude that the empirical data used here are not inconsistent with the
homogeneity property. That is, we have not been able to find with any great
precision the effect of a monetary veil. (p. 81)
However, the first published formal test of homogeneity in a systems
approach, Barten (1969), is a rejection of homogeneity. The model uses
the Rotterdam specification for sixteen consumption categories and data
(for the Netherlands) covering 1922–39 and 1949–61. It is estimated
with maximum likelihood. Applying maximum likelihood estimation to a
complicated problem, showing its fruitfulness, is the real purpose of this
paper. An easy by-product is the application of a likelihood ratio statistic
to test homogeneity.
Hence, testing homogeneity itself is not the major goal of the paper,
but the result was as unexpected as it was disappointing (the disappointment
may have stimulated efforts to counter the rejection, making it a source
of inspiration for the subsequent specification search). The rejection con-
firmed the unpublished earlier one: homogeneity seemed to be on shaky
grounds, although a viable alternative was not considered. The basic
motivation for testing was that it could be done: it showed the scope of
the new techniques.
The strong a priori status of the homogeneity condition made Theil
suggest not publishing the rejection, as this might have been due to
computational or programming errors. Publication would unnecessarily
disturb the micro-economics community. When it became clear that the
results were numerically correct and the rejection was published,
responses varied. Some authors argued that demand systems which inva-
lidate homogeneity do not count as interesting ones (e.g. Phlips). Others
working with different specications and with different estimation and
testing techniques tried to replicate the experiment.
Byron (1970) replicates using the double-logarithmic approach to pos-
tulate a demand system. Using the same data as Barten (1969), Byron
duplicated the rejection of homogeneity (and symmetry). After providing
his statistical analysis, Byron (1970, p. 829) finds it `worthwhile to spec-
ulate why the null hypothesis was rejected'. He continues:
It is possible that the prior information is simply incorrect – that consumers do
not attempt to maximize utility and do experience a money illusion due to chan-
ging prices and income. It is quite likely, however, that this method of testing the
hypothesis is inappropriate, that aggregation introduces errors of a non-negligible
order of magnitude, that the applications of these restrictions at the mean is
excessively severe for data with such a long time span, that the imposition of
the restrictions in elasticity form implies an unacceptable form of approximation
to the utility function, that preferences change, or that the naive utility maximiza-
tion hypothesis needs modication for the simultaneous consideration of perish-
ables and durables. Such arguments are purely speculative: the only definite
information available is that the imposition of the restrictions in this form on
this data leads to the rejection of the null hypothesis.
In other words, Byron lists a number of auxiliary hypotheses that may
cause the rejection. The tested hypothesis itself may be right if one of the
auxiliary hypotheses in the test system is wrong. This is the Duhem–
Quine thesis, discussed in chapter 1, section 5.1. A theory is an intercon-
nected web; no test can force one to abandon a specific part of the web
(Quine, [1953] 1961, p. 43). I consider this issue on page 203 below. The
problem for Barten, Byron and other researchers is that they fail to have
a preconception of where to go: they do not know what lesson to learn
from the significance tests.
The rejection of homogeneity for Dutch data might be an artifact of
the Netherlands (although not all tests on Dutch data yielded a clear
rejection). Hence, other data sets were used for estimating and testing a
systems approach to consumer demand. The homogeneity condition was
rejected for Spanish data using both the Rotterdam specification and the
double-logarithmic one by Lluch in 1971. The UK is next in the search
for homogeneity. Deaton (1974) compares a number of specications of
demand systems (five versions of the Rotterdam system, as well as the
LES, Houthakker's direct addilog system and a model without substitu-
tion effects). The data range from 1900 to 1970 and distinguish nine
consumption categories for the UK. Using the likelihood ratio test,
Deaton (p. 362) concludes that the null hypothesis of homogeneity is
firmly rejected for the system as a whole. A closer look at the individual
demand equations shows that (P2) causes the problems. This result is
consistent with the observation in Brown and Deaton (1972, p. 1155):
Of the postulates of the standard model only one, the absence of money illusion,
has given consistent trouble; there is however some evidence to suggest that this
result can be traced to individual anomalies.
These results illustrate the failure to confirm homogeneity starting from
the direct postulation of demand functions. An example of the alterna-
tive, to start from indirect utility, is Christensen, Jorgenson and Lau
(1975). On the basis of rejecting symmetry, conditional on homogeneity,
they (p. 381) conclude that `the theory of demand is inconsistent with the
evidence'. Their evidence adds new information to previous results.
According to the authors, it might be that demand theory is valid, but
that utility is not linear logarithmic. But the results `rule out this alter-
native interpretation and make possible an unambiguous rejection of the
theory of demand'.
The continuous stream of publications based on the theory of demand
makes clear, however, that few economists did or do share this conclusion
of Christensen, Jorgenson and Lau, even though the sequence of rejec-
tions of the homogeneity condition does not stop here. A novel effort to
test homogeneity is made by Deaton and Muellbauer (1980a), who apply
the Almost Ideal Demand System using British data from 1954 to 1974
and eight consumption categories. F-tests on the restriction of homoge-
neity reject homogeneity in four of the eight cases. Because the Durbin
Watson statistic indicates problems in exactly the same cases, Deaton and
Muellbauer conjecture that the rejection of homogeneity is a symptom of
dynamic mis-specication. Here, a combination of tests points to where
the investigator might search. But this direction is not pursued in Deaton
and Muellbauer (1980a).
4 Philosophy
4.1 Rejection without falsification
Samuelson's Foundations aims at deriving `meaningful theorems', refut-
able hypotheses. The economic theorist should derive such theorems, and
`under ideal circumstances an experiment could be devised whereby one
could hope to refute the hypothesis' (Samuelson, [1947] 1983, p. 4). This
sounds quite Popperian (avant la lettre, as the English translation of The
Logic of Scientific Discovery appeared much later). But at the same time,
Samuelson is sceptical about testing the theory of consumer behaviour, as
revealed by the opening quotation of this chapter. It is hard to be
Popperian.
This is also the conclusion of Mark Blaug (1980, p. 256), who argues
that much of empirical economics is
like playing tennis with the net down: instead of attempting to refute testable
predictions, modern economists all too frequently are satised to demonstrate
that the real world conforms to their predictions, thus replacing falsification,
which is difficult, with verification, which is easy.
Following Popper ([1935] 1968) and Lakatos (1978), Blaug values falsifi-
cations much more highly than confirmation. Economists should use
every endeavour to try to falsify their theories by severe tests. Blaug
(1980, p. 257) complains that economists do not obey this methodological
principle, and stoop to `cookbook econometrics', i.e. they `express a
hypothesis in terms of an equation, estimate a variety of forms for that
equation, select the best fit, discard the rest, and then adjust the theore-
tical argument to rationalize the hypothesis that is being tested'.
It is doubtful that Blaug's general characterization of empirical eco-
nomics holds water. Of course, there are poor econometric studies but, in
order to criticize, one should also consider the best contributions avail-
able. The search for homogeneity certainly belongs to the best econo-
metric research one can find. The case of homogeneity is illuminating, as
the homogeneity condition forms part of a well-defined economic theory
which provides the empirical researcher with a tight straitjacket. Data-mining
or `cookbook econometrics' is ruled out.
The empirical investigations discussed above show how the game of
investigating consumer demand was played: with the net up. At various
stages in time, the most advanced econometric techniques available were
used. But there is a huge difference between playing econometrics with
the net up, and falsifying theories. Few researchers concluded that the
theory of consumer demand was falsified. Verification of homogeneity
turned out to be difficult, not easy.
One rejection might have induced a (`naive') Popperian to proclaim the
falsification of the neoclassical micro-economic theory of consumer beha-
viour. But such a `crucial test' is virtually non-existent in economics.
Blaug ([1962] 1985, p. 703) warns, `conclusive once-and-for-all testing
or strict refutability of theorems is out of the question in economics
because all its predictions are probabilistic ones'. But a sequence of rejec-
tions should persuade more sophisticated adherents of fallibilism.
Lakatosians might conclude that the neoclassical research programme
was degenerating. In reality, none of these responses obtained much
support. Instead, a puzzle emerged, waiting for its solution.
Homogeneity was rejected, not falsified. This stubbornness is hard to
justify in falsificationist terms.
Do Marschak and Christensen, Jorgenson and Lau qualify as
Popperians? Christensen et al. certainly not, as a recurring theme in
Popper's writings is that theories can only be falsified if a better theory,
with a larger empirical content, is available (see also Blaug, [1962] 1985,
p. 708). Such an alternative is not proposed. This point is sometimes
overlooked by statisticians, although most are aware of this limitation
of significance testing. According to Jeffreys ([1939] 1961, p. 391), the
required test is not whether the null must be rejected, but whether the
alternative performs better in predicting (i.e. is `likely to give an improve-
ment in representing future data'). Surprisingly perhaps, such a test has
not been carried out in the investigations of consumer demand presented
above (more recently, this gap has been filled; see the discussion of
Chambers, 1990, below). How, then, should we interpret the bold claim
made by Christensen, Jorgenson and Lau (1975) that the theory of
demand is inconsistent with the evidence? They may have been under
the spell of falsificationism (perhaps unconsciously; they do not refer to
any philosophy of science). The Popperian tale of falsification was popu-
lar among economists during the 'seventies (and alas still is). But if
Popper's ideal is praised, then the praise is not very deep. An example
is Deaton (1974, p. 345):
A deductive system is only of practical significance to the extent that it can be
applied to concrete phenomena and it is too easy to protect demand theory from
empirical examination by rendering its variables unobservable and hence its pos-
tulates unfalsifiable.
After embracing the falsificationist approach, and after finding that
homogeneity is on shaky grounds, Deaton (p. 362) remains reluctant to
conclude that demand theory is falsified.
4.2 Popper's rationality principle
Now consider Marschak. He tests homogeneity and has an alternative:
existence of money illusion (or more generally, a Keynesian model of
the economy, although this is not specified). This seems to be close to
the Popperian ideal; but there is a problem. Popper is aware that, in the
social sciences, the testability of theories is reduced by the lack of
natural constants (e.g. Popper, [1957] 1961, p. 143). This might diminish
the importance of the methodology of falsificationism for the social
sciences (this is the position of Caldwell, 1991; see also Hands, 1991).
On the other hand, an advantage of the social sciences relative to the
natural sciences is the element of rationality (Popper, [1957] 1961,
pp. 140–1). Popper proposes supplementing methodological falsification-
ism with the `zero method' of constructing models on the assumption of
complete rationality. Next, one should estimate the divergence between
actual behaviour and behaviour according to the model. Marschak, how-
ever, not only measures the divergence but constructs a test of rationality.
Popper is ambiguous on this point: it is unclear whether methodological
falsificationism extends to trying to falsify the rationality principle. The
fact that Popper refers to Marschak in favourable terms suggests that
falsificationism is the primal principle, with the rationality principle
second.
Whatever the verdict on Marschak's Popperian spirit, he
has not changed the general attitude towards demand theory in general or
money illusion specifically. Stone and most other investigators of consu-
mer demand continue to work with neoclassical demand theory. Even if
Marschak were Popperian, his successors in economics are not.
It has been noted by philosophers of science that actual science rarely
follows Popper's rules (Hacking, 1983; Hausman, 1989). In particular,
Lakatos has tried to create a methodology that combines some of
Popper's insights with the actual proceedings of science. Can we interpret
the history of testing homogeneity in Lakatos' terms?
4.3 Does Lakatos help?
Using Lakatos' (1978) terminology, the homogeneity condition belongs
to the `hard core', the undeniable part of the neoclassical scientific
research programme, together with statements like `agents have prefer-
ences' and `agents optimize'. Some economists argue that demand equa-
tions are meaningless if they violate the homogeneity condition. The hard
core status of homogeneity is clear in the writings of Phlips (1974, p. 34):
functions that do not have this property do not qualify as demand equa-
tions.
The essential characteristic of a hard core proposition is that a sup-
porter of the research programme should not doubt its validity. The hard
core proposition is irrefutable by the methodological decision of its pro-
ponents. It is an indispensable part of a research programme. At rst
sight, such an interpretation of the homogeneity postulate seems to be
warranted: see Samuelson's epigraph to this chapter and the quotations
of Marschak (1943, p. 40) and Stone (1954a, p. 254). Deaton (1974,
p. 362) concurs:
Homogeneity is a very weak condition. It is essentially a function of the budget
constraint rather than the utility theory and it is difficult to imagine any demand
theory which would not involve this assumption. Indeed to the extent that the
idea of rationality has any place in demand analysis, it would seem to be contra-
dicted by non-homogeneity.
Hence, Deaton claims that any demand theory based upon rational beha-
viour of consumers should satisfy the homogeneity condition: accepting
the rejection `implies the acceptance of non-homogeneous behavior and
would seem to require some hypothesis of ``irrational'' behavior; this is
not an attractive alternative'.
If we continue with the attempt to use Lakatos for an understanding of
the analysis of homogeneity, we must introduce the `negative heuristic'
(see chapter 1, section 5.2), the propositions which immunize the hard
core. In our case, the negative heuristic is not to test the hard core
proposition of homogeneity. But this picture clearly does not fit reality.
Empirical investigators were eager to test homogeneity. They did exactly
what this supposed negative heuristic forbade.
Hence, either homogeneity changed status, from hard core proposition
to something else, or the Lakatosian scheme is not suitable for an under-
standing of the history of consumer demand analysis. The rst option, a
change in status, would imply that homogeneity became part of the so-
called `protective belt of auxiliary hypotheses which has to bear the brunt
of tests and get adjusted and re-adjusted, or even completely replaced, to
defend the thus-hardened core' (Lakatos, 1978, p. 48). The protective belt
consists of the findings that may be used to support the hard core pro-
positions, or which may be investigated as puzzles, anomalies.⁴ But the
testers of homogeneity listed many hypotheses that might explain the
rejection of homogeneity. Byron's list, cited above, is one example. The
auxiliary hypotheses are related to dynamics, parameter drift, small sam-
ples, etc., but homogeneity itself is not considered as yet another auxiliary
hypothesis to the hard core of demand theory.
Lakatos' classification is of little help. And if this classification cannot
be used, it is not fruitful to go on with a rational reconstruction of the
history of research in consumer demand using the methodology of scien-
tific research programmes. There is no basis for concluding that neo-
classical demand theory was either degenerative or progressive. The
Lakatosian philosophy of science leaves us empty handed. This observa-
tion corresponds to the view of Howson and Urbach (1989, p. 96), who
argue that Lakatos was unable to clarify what qualifies elements of a
theory to become part of the hard core of a research programme.
Lakatos invokes a `methodological fiat', but what is this other than
pure whim? Unfortunately, Howson and Urbach write,
this suggests that it is perfectly canonical scientific practice to set up any theory
whatever as the hard core of a research programme, or as the central pattern of a
paradigm, and to blame all empirical difficulties on auxiliary theories. This is far
from being the case.
It is hard to provide a coherent Popperian or Lakatosian interpretation
of this episode in the history of testing: we observe a sequence of rejec-
tions, without falsification. Does this mean that `anything goes'
(Feyerabend, 1975)?
4.4 Auxiliary hypotheses and the Duhem–Quine thesis
The answer to the question at the end of the previous subsection is no.
The tests suggest that something is wrong in the testing system. Quine
(1987, p. 142) recommends using the maxim of minimum mutilation:
`disturb overall science as little as possible, other things being equal'.
Dropping homogeneity is far more disturbing than dropping, for
example, time-separability of consumption, homoskedastic errors, or the
`representative individual'. Dropping homogeneity, i.e. studying consu-
mer behaviour without demand theory (or a viable alternative), would
make the measurement of elasticities very complex.
Research was directed at saving the straitjacket; the attack was aimed at
the auxiliary hypotheses. Recall the sources of violation of homogeneity,
other than failure of demand theory (or the axioms underlying the the-
ory) itself. They are invalid restrictions, invalid approximations, invalid
aggregation and invalid stochastics.
Linearity of the budget restriction, (R1), has not been questioned by
the demand researchers. It has fairly strong empirical support, deviations
from linearity are small and, moreover, it is hard to imagine how these
could cause non-homogeneity. The choice of functional form, (R2), on
the other hand, is crucial. In some cases (the LES, for example) homo-
geneity is simply imposed. In other cases, it can be imposed by restricting
the parameters of the demand equation. The functional form of the
demand equation is open to much doubt. It is widely thought that para-
metric demand equations are only rough approximations to some ideal
demand function. Hence, the `truthlikeness' or a priori probability of the
specification is relatively low, and much research has been directed to
obtaining better approximations to this ideal. The same applies to the
choice of arguments, (R3). The variable `time' has received special atten-
tion (Deaton and Muellbauer, 1980a). Other ones, such as snob goods,
are thought to be innocuous and have not attracted research efforts to
save homogeneity.
Aggregation is of special interest. According to some, there is no rea-
son to believe that the micro relations (such as the law of demand) should
hold on a macro level. Others claim that economic theory can only be
applied to aggregates. This issue inspired research into the conditions for
aggregation, in particular of individuals (rather than goods). The effects
of income distribution are crucial. Still, even in aggregate demand sys-
tems where care is taken of income effects, homogeneity is violated. The
same happens occasionally with disaggregated demand studies.
Finally, the validity of the particular statistical tests that were
employed has been questioned, but it was not until the late seventies
that this approach convinced applied econometricians that
homogeneity, after all, could stand up against attempts to reject it if
the right small sample tests were used. The old results could be given a
new interpretation; a kind of Gestalt switch had taken place. It is inter-
esting that the validity of statistical tools is a crucial auxiliary assumption
that turned out to be a false one. I will return to this in the next section.
5 The theory versus the practice of econometrics
5.1 Statistical testing using frequency concepts
If we reconsider the dominant approaches in frequentist probability the-
ory, then Fisher's seems to be the most appropriate in the context of the
analysis of consumption behaviour. In most of the studies presented in
this chapter, one cannot speak of a von Mises collective. Von Mises
explicitly disavowed small sample analysis. It is only more recently that
econometricians have started to analyse consumer behaviour by means of
micro (cross section) data. Deaton (1990) is an example. However, these
recent investigations are not inspired by a quest for collectives, but by the
fact that the theory of consumer behaviour is about individuals, not
aggregates.
Fisher's (1922, p. 311) `hypothetical infinite population, of which the
actual data are regarded as constituting a random sample' is the meta-
phor by which the applied econometricians could salve their frequentist
consciences, even though the method of maximum likelihood does not
enter the scene before the late sixties. The significance tests are interpreted
as pieces of evidence, although in most cases it is not clear what evidence.
But a number of differences from Fisher's approach stand out. Initially,
the aim of inference is not the reduction of data: the real aim is measure-
ment of elasticities, which do not have to be sufficient statistics. Second,
fiducial inference is not a theme in demand analysis. Third, there is no
experimental design or randomization by which parametric assumptions
could have been defended and specification uncertainty reduced.
Is the Neyman–Pearson approach closer to econometric practice? Not
quite. First, the decision context is not clear. Why would the econome-
trician bother with making a wrong decision? This critique is anticipated
by Fisher, who argues that in the case of scientific inference, translating
problems of inference to decision problems is highly artificial.
Second, the Neyman–Pearson lemma is of little relevance in the case of
testing composite hypotheses, based on what most likely is a mis-specified
model, using a limited amount of data without repeated sampling. In
these circumstances, the properties of the test statistics are not well under-
stood. Indeed, the Neyman–Pearson framework has hardly been relevant
for testing the properties of demand, although references to `size' and
`power' pervade the literature.
Third, although the Neyman–Pearson approach stands out for its
emphasis on the importance of alternative hypotheses, there are no mean-
ingful alternatives of the kind in the case of testing homogeneity. Thus, as
Jeffreys ([1939] 1961, p. 390) wonders: `Is it of the slightest use to reject a
hypothesis until we have some idea of what to put in its place?'
Fourth, homogeneity is a singularity whereas the alternative is con-
tinuous. If enough data are gathered, it is most unlikely that the singu-
larity precisely holds (Berkson, 1938). This issue is recognized by Neyman
and Pearson. They introduce their theory of testing statistical hypotheses
with a short discussion of a problem formulated by Bertrand: testing the
hypothesis that the stars are randomly distributed in the universe. With
this in mind, they state (1933a, pp. 141–2) that
if x is a continuous variable – as for example is the angular distance between two
stars – then any value of x is a singularity of relative probability equal to zero. We
are inclined to think that as far as a particular hypothesis is concerned, no test
based upon the theory of probability can by itself provide any valuable evidence
of the truth or falsehood of that hypothesis.
Their proposal to circumvent this problem is to exchange inductive infer-
ence for rules of inductive behaviour. The problems with that approach
have been mentioned in chapter 3.
5.2 The lesson from Monte Carlo
An important approach to attacking the puzzle of rejections of homogeneity
comes from Laitinen (1978), who shows that the test statistic has proper-
ties different from what was believed. One of the problems of testing
homogeneity is the divergence between asymptotic theory and small
sample practice. (A8) may be invalid. Early on, the use of small samples
was recognized as a possible explanation for the rejection of homogeneity
(Barten, 1969, p. 68; Deaton, 1974, p. 361). However, Deaton remarks
that the small sample correction which has to be made in the case of
testing restrictions on parameters within separate equations, such as
homogeneity, is known. The correction was generally thought to be too
small to account for an unwarranted rejection of homogeneity. Laitinen
(1978, p. 187) however claims that the rejections of homogeneity are due
to the fact that `the standard test is seriously biased toward rejecting this
hypothesis'. The problem is that testing homogeneity for all equations is
dependent on the unknown variance–covariance matrix, which in prac-
tice is usually replaced by a least squares residual moment matrix.
Laitinen describes a Monte Carlo experiment, an artificial statistical
computer simulation, which shows that when estimating demand systems with
only a few degrees of freedom a small sample correction should be made for
the test statistic. Particularly in very small samples, the correction factor
was much larger than previously thought. This issue was resolved once
and for all by the small sample statistic proposed by Laitinen. Applying
this has led to less frequent rejections of homogeneity (although rejec-
tions still occur).
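The flavour of Laitinen's finding can be conveyed by a stripped-down Monte Carlo sketch (an illustrative analogue in Python, not a reconstruction of his experiment): test that an N-dimensional mean vector is zero when the error covariance matrix has to be estimated from few observations, and compare the Wald-type statistic with its asymptotic χ² critical value. The dimensions below are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
N, n, reps = 14, 30, 5_000        # 14 'equations', 30 observations: illustrative values
chi2_crit_5pct = 23.685           # 5% critical value of chi-squared with 14 df
rejections = 0

for _ in range(reps):
    x = rng.standard_normal((n, N))            # the null hypothesis is true by construction
    xbar = x.mean(axis=0)
    s = np.cov(x, rowvar=False)                # estimated covariance replaces the unknown one
    wald = n * xbar @ np.linalg.solve(s, xbar) # 'asymptotic' Wald-type statistic
    rejections += wald > chi2_crit_5pct

print(rejections / reps)  # far above the nominal 0.05 in a sample this small

In this sketch an exact correction of the Hotelling T² type restores the nominal rejection frequency; Laitinen's small sample statistic works in the same spirit.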
5.3 The epistemological perspective
After rejecting Popperian or Lakatosian rational reconstructions of the
search for homogeneity, and showing how frequentist probability inter-
pretations are hard to reconcile with the efforts to study consumer beha-
viour, a better interpretation should be given. A quasi-Bayesian approach
may do (a formal quantification of the history of evidence is an all but
impossible task, if only because it is hard to interpret the frequentist
statistical tests in an unambiguous Bayesian way). One advantage of
the Bayesian approach is that there is no longer any need for an artificial
distinction between `hard core' and `protective belt'. Second, Bayesians
value rejections as highly as confirmations: both count as evidence and
can, in principle at least, be used for a formal interpretation of learning
from evidence.
The distinction between a basic and an auxiliary hypothesis can be
refined in a Bayesian framework. Let S denote the hypothesis that a
particular specification is valid (for example, the Rotterdam system), let
H be the `meaningful theorems' (homogeneity, in our case) and D the
data. Although in most cases listed above the evidence rejects the joint
hypothesis or test system H&S, the effect on the probabilities P(H|D) and
P(S|D) may diverge, depending on the prior probability for H and S
respectively. Homogeneity has strong a priori appeal, for example, due
to experiences with currency reform (such as in France) or introspection.
The particular specification is less natural. As a result, rejection of the test
system will bear more strongly on the specification than on the hypothesis
of interest, homogeneity (see also Howson and Urbach, 1989, p. 99). The
posterior probability for homogeneity will decline only marginally, as
long as the specification is much more doubtful than homogeneity itself.
This observation explains why the rejections by a sequence of statistical
tests were not directly interpreted as a falsification of homogeneity. If
rejections had dominated in the long run, whatever specification
or data had been used, eventually the rational belief in homogeneity itself
would have declined.
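A minimal numerical sketch makes the mechanism explicit. All numbers below are invented for illustration; H and S are taken to be independent a priori, and the test is assumed to reject a true H&S with probability 0.05 and a false H&S with probability 0.80.

# invented priors: strong belief in homogeneity (H), doubtful specification (S)
p_h, p_s = 0.95, 0.50
alpha, power = 0.05, 0.80      # assumed P(reject | H&S true) and P(reject | H&S false)

p_joint = p_h * p_s
p_reject = alpha * p_joint + power * (1 - p_joint)

# posterior beliefs after observing a rejection of the joint hypothesis H&S
post_h = (alpha * p_joint + power * p_h * (1 - p_s)) / p_reject
post_s = (alpha * p_joint + power * (1 - p_h) * p_s) / p_reject

print(round(post_h, 2), round(post_s, 2))   # about 0.91 and 0.10

A rejection of the joint hypothesis barely dents the belief in homogeneity but sharply lowers the belief in the specification, which is the pattern described above.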
The homogeneity condition is, for some investigators, so crucial that it
should be imposed without further question. A researcher in the fre-
quency tradition would invoke the `axiom of correct specification'; a
Bayesian may formulate a very strong prior on it, equal to 1 in the
extreme case. If this were done, one would never find out that the
model specification is wanting. The specification search for homogeneity
is of interest because it violates both the frequentist and the Bayesian
maxim.
Following the theory of simplicity, a trade-off should be made between
goodness of fit and parsimony in demand systems. The trade-off is goal-
dependent. If measurement of elasticities is the real goal of inference (for
economic policy evaluation, for example), it may be right to impose
homogeneity (otherwise, it is not clear what the meaning of the para-
meters is). If the research is, first of all, of academic interest, homogeneity
should not be imposed unless the resulting model outperforms a model
that violates homogeneity given specific criteria (such as predictive qua-
lities). An approach, broadly consistent with the Minimum Description
Length principle presented in chapter 5, section 2.4, is chosen in
Chambers (1990). Using time series data (108 quarterly observations),
he compares different demand systems by means of their predictive per-
formance (using a root mean squared prediction error criterion). The
models are the LES, a LES supplemented with habit formation (introdu-
cing dynamics), the AIDS, an error correction system, and a Vector
Autoregressive model. Chambers finds that a relatively simple specifica-
tion, i.e. the LES supplemented by habit formation, outperforms more
complicated specifications, such as the AIDS. In the favourite model,
homogeneity and symmetry are not rejected by statistical tests.
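This kind of predictive comparison is easy to mimic in outline. The sketch below is generic Python, not Chambers's models or data; the function name, the synthetic series and the estimation window are all chosen here for illustration. It compares two candidate design matrices by the root mean squared error of rolling one-step-ahead least squares forecasts.

import numpy as np

rng = np.random.default_rng(1)

def rolling_rmse(y, X, window):
    """Root mean squared error of one-step-ahead least squares forecasts."""
    errors = []
    for t in range(window, len(y)):
        beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)  # re-estimate on data up to t-1
        errors.append(y[t] - X[t] @ beta)
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic stand-in for 108 quarterly observations; in an application the two
# design matrices would encode the restricted and unrestricted demand systems.
T = 108
X_unrestricted = np.column_stack([np.ones(T), rng.standard_normal((T, 3))])
X_restricted = X_unrestricted[:, :3]                     # hypothetical restricted version
y = X_restricted @ np.array([1.0, 0.5, -0.5]) + 0.1 * rng.standard_normal(T)

print(rolling_rmse(y, X_restricted, 40), rolling_rmse(y, X_unrestricted, 40))

Whichever specification yields the smaller out-of-sample error is retained, whether or not homogeneity is imposed in it; this is the goal-dependent trade-off described above.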
5.4 In search of homogeneity
Falsification was apparently not the aim of testing homogeneity, but to
many researchers the real aim was very vague. Most of the researchers of
consumer demand tend to interpret the statistical test of homogeneity as
a specification test rather than as an intentional effort to test and refute an
economic proposition (although this is never explicitly acknowledged and
is quite probably only an unconscious motive). Gilbert (1991a, p. 152)
makes a similar point. If homogeneity is immune, or nearly so, to statis-
tical tests, why then take the trouble to write sophisticated software and
invent new estimation techniques or specifications in order to test this
condition?
Some insights of Leamer (1978) (rarely exploited in methodological
investigations) may help to answer the question. Leamer interprets
empirical econometrics as different kinds of `specification searches'.
The empirical investigation of the homogeneity condition can be under-
stood as a sequence of different kinds of specification searches. Most tests
were neither Popperian efforts to falsify a theory, nor Lakatosian
attempts to trace a `degeneration' of consumer demand theory, but
specification tests that served a number of different goals. Leamer
(p. 9) argues that it is important to identify this goal, the type of specifi-
cation search, as the effectiveness of a search must be evaluated in terms
of its intentions. A bit too optimistic, Leamer claims that it is `always
possible for a researcher to know what kind of search he is employing,
and it is absolutely essential for him to communicate that information to
the readers of his report'. With the search for homogeneity in mind, a
more realistic view is that most research workers have only a vague idea
of what is being tested. Interpreting the episode with hindsight, we
observe that the goal of the empirical inference on homogeneity gradually
shifts. Schultz and his contemporaries are searching for models as simple
as possible. The significance of their `test' of homogeneity is the effort to
reduce the dimension of the model, which Leamer classifies as a `simplifi-
cation search'. A simplification search results from the fact that, in
theory, the list of explanatory variables in a demand equation is extre-
mely long, longer than any empirical model could adopt. Particularly
during the pre-World War II era, gaining degrees of freedom was impor-
tant. This simplification search has little to do with a metaphysical view
that `nature is simple' and much more with pragmatic considerations of
computability, (very) small sample sizes and efficient estimation.
The theme recurs with Phlips (1974, p. 56) who, writing on imposing
homogeneity, holds that `the smaller the number of parameters to be
estimated, the greater our chances are of being able to derive statistically
significant estimates'. Marschak's effort to test homogeneity has partly
the same motivation, but also arises from his interest in the validity of
two rival theories: neoclassical (without `money illusion') and Keynesian
(with `money illusion'). Although Marschak (like the other demand
researchers discussed in this chapter) does not use Bayesian language, we
may argue that he assigns a positive prior probability to both models and
then tests them. This is, in Leamer's terminology, a `hypothesis testing
search', the kind of search that some philosophers of science emphasize.
An obvious reason why Marschak's rejection of homogeneity is not taken
for granted by his contemporary colleagues (or their successors) is that
the large implicit prior against the homogeneity condition of consumer
demand was not shared. Furthermore, a general problem with the tests of
homogeneity is a lack of consideration of the economic significance of the
deviation between hypothesis and data. The interest in such a kind of
importance testing has declined while formal statistical significance test-
ing has become dominant. This development is deplored by Varian
(1990), who argues that economic significance is more interesting than
statistical significance. Varian suggests developing money-metric mea-
sures of significance.
Stone, in his book (1954a, p. 254), is motivated by a combination of
increasing degrees of freedom (simplification search) and testing the
`meaningful theorems' of demand theory. But with regard to the latter,
he does not provide an alternative model to which a positive prior prob-
ability is assigned. Hence, it is difficult to interpret Stone's work as a
`hypothesis testing search' similar to Marschak's. The same applies to
Schultz's successors. Their attitude is better described as an `interpretive
search'. Leamer defines this as the small sample version of a hypothesis
testing search, but it may be better to define it as an instrumentalistic
version of hypothesis testing. The interpretive search then lets one act as
if one or another theory is `true'. Take, as an illustration, Deaton's (1978)
work. Two specifications (the LES and a PIGLOG specification) of consumer demand are tested against each other, using Cox's test. The disturbing result is that, if either the LES or the PIGLOG specification is chosen as the null, the other specification is rejected. The statistical tests
leave us empty handed. Homogeneity is not investigated, but Deaton
makes an interesting general remark about this implication of testing,
which reveals his instrumentalistic attitude:
This is a perfectly satisfactory result . . . In this particular case there are clearly
better models available than either of those considered here. Nevertheless we
might still, after further testing, be faced with a situation where all the models
we can think of are rejected. In this author's view, there is nothing to suggest that
such an outcome is inadmissible; it is perfectly possible that, in a particular case,
economists are not possessed of the true model. If, in a practical context, some
model is required, then a choice can be made by minimizing some appropriate
measure of loss, but such a choice in no way commits us to a belief that the model
chosen is, in fact, true. (1978, p. 535)
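The predicament Deaton describes is easy to reproduce with any pair of non-nested regressions. A minimal sketch (synthetic data and variable names are mine, not Deaton's LES/PIGLOG systems; it assumes the compare_cox diagnostic shipped with statsmodels):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import compare_cox

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
# The true process draws on both regressors, so each single-regressor
# model is mis-specified relative to the other (non-nested rivals).
y = 1.0 + 0.8 * x1 + 0.8 * x2 + rng.normal(scale=0.5, size=n)

model_a = sm.OLS(y, sm.add_constant(x1)).fit()   # rival specification A
model_b = sm.OLS(y, sm.add_constant(x2)).fit()   # rival specification B

# Cox-type test in both directions: A as null against B, and B against A.
print(compare_cox(model_a, model_b))
print(compare_cox(model_b, model_a))
```

Run in both directions, the two p-values can easily fall below conventional levels simultaneously: exactly the outcome Deaton regards as admissible.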
Another type of Leamer's specification searches is the `proxy search', which is intended to evaluate the quality of the data. Although this
issue is raised in a number of the studies discussed, it has not played a
major role in practice. Instead, the issue of the quality of the test statistics
turned out to be more interesting.
This brings us to Laitinen. His work invalidates many of the rejections
of the homogeneity condition that were found during the search for
homogeneity. He points to judgemental errors that were made in the
interpretation of significance levels. This result is different from the variety of specification searches and needs another interpretation. A Bayesian would argue that the choice of significance levels is a weak cornerstone of classical (frequentist) econometrics anyway, and that these tests are useful only insofar as they refer to a well specified loss structure. For the
frequentist econometrician who believes strongly in homogeneity, and
worships the significance of testing statistical hypotheses, Laitinen's
result solves a puzzle that is otherwise hard to explain. The salvage is not
complete: occasionally, `strange' test outcomes do occur.
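The flavour of Laitinen's point, that asymptotic significance levels can grossly over-reject in small samples with many parameters, can be conveyed by a generic Monte Carlo (a stylized sketch of the over-rejection phenomenon, not a reconstruction of his demand-system experiment):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k, q = 25, 12, 8        # small sample, many regressors, q true zero restrictions
reps, nominal = 5000, 0.05
rejections = 0

for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    beta = np.zeros(k)                 # the restricted coefficients really are zero: H0 true
    y = X @ beta + rng.normal(size=n)
    b, ssr = np.linalg.lstsq(X, y, rcond=None)[:2]
    sigma2 = ssr[0] / n                # ML variance estimate, no degrees-of-freedom correction
    cov_b = sigma2 * np.linalg.inv(X.T @ X)
    R = np.eye(k)[-q:]                 # restriction matrix: last q coefficients equal zero
    w = (R @ b) @ np.linalg.solve(R @ cov_b @ R.T, R @ b)
    rejections += w > stats.chi2.ppf(1 - nominal, df=q)

print(f"empirical size of asymptotic chi2 test: {rejections / reps:.3f} (nominal {nominal})")
```

With samples this small the empirical rejection rate under a true null lies far above the nominal 5%, which is the kind of judgemental error in reading `significance' that Laitinen exposed.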
6 Summary
The popular attitude towards testing in econometrics is one of disap-
pointment and scepticism. Blaug (1980) exhibits disappointment.
McCloskey (1985, p. 182), who argues that no proposition about eco-
nomic behaviour has yet been overturned by econometrics, represents the
sceptical attitude to significance testing. This scepticism is shared by Spanos (1986). Blaug, who may be characterized as a dissatisfied Popperian, proposes to test better and harder. Spanos (like Hendry) suggests that significance testing (in particular, mis-specification testing)
is the key to better testing. McCloskey, on the other hand, proposes
reconsidering the ideal of testing and taking a look at other ways of
arguing, like rhetorics. In my view, those attitudes are erroneous.
Let us start with the Popperian ideal. Many contributors to the empiri-
cal literature on testing homogeneity, perhaps ignorant of philosophy of
science, pay lip-service to Popper. They are Popperian in their support of
the rationality principle, but turn out to be non-Popperian in their stub-
born refusal to give up demand theory. Should they be condemned,
should they test better and harder? I do not think so. These able econo-
metricians were not playing econometrics with the net down. Verification, not falsification, appears to be hard, and rewarding. It is difficult to
reconcile the episode described in this chapter with a Popperian metho-
dology. Both on the positive level, and on the normative level, the
Popperian approach is wrong. Not surprisingly, Blaug (1980, chapter 6)
is rather vague in his neo-Popperian interpretation of empirical research
of consumer behaviour.
Neither is outright scepticism warranted. The homogeneity condition
may not have been overturned, but many suggestions (made by economic
theorists) as to how to measure elasticities have been scrutinized by
econometricians. Some of the suggestions survived, others foundered.
The literature shows fruitful cross fertilization: theory stimulated
measurement, `strange' test results stimulated theory (in particular, by
inspiring new functional forms or a new meaning of particular test
statistics). Hamminga (1983, p. 100) suggests that theory development
and econometric research are independent from each other. The studies
in consumer demand do not confirm this thesis.
What has been the significance of testing the null hypothesis of homo-
geneity? A rejection may not affect the null, but instead it points to the
maintained hypothesis. A specification search is the result. A declaration of the falsification of the null of homogeneity would have been too simple
and easy a conclusion. The reason why it is worthwhile to attempt to
verify the `meaningful theorems' is that the measurement of variables of
interest, in particular elasticities, depends on the coherency of the mea-
surement system, the demand equations. Phlips holds that if such mea-
surement is to be meaningful, the restrictions of demand theory should be
imposed. `We find it difficult to take the results of these tests very seriously', Phlips (1974, p. 55) argues. If, however, the test is interpreted as a specification test, this difficulty disappears.
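For reference, the restriction at stake, stated in the standard textbook form rather than quoted from the studies above, is homogeneity of degree zero of Marshallian demand in prices and income, which ties the measured elasticities together:

```latex
% Homogeneity of degree zero of the demand for good i in prices p and income m,
% and its elasticity form: the price elasticities and the income elasticity of
% each good sum to zero.
q_i(\lambda p, \lambda m) = q_i(p, m) \quad \text{for all } \lambda > 0
\quad\Longrightarrow\quad
\sum_j \varepsilon_{ij} + \eta_i = 0 .
```

It is this cross-restriction on the elasticities, not any single coefficient, that makes the coherency of the measurement system and the measured elasticities stand or fall together.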
The research on homogeneity leads to the conclusion that the `small
world' of much empirical inference was not completely satisfactory. The
empirical results did not lead to mechanistic updating of the probability
of a hypothesis of homogeneity without scrutinizing auxiliary hypotheses.
The homogeneity condition survived thanks to its strong prior support.
Notes
1. The two conditions are derived from revealed preference theory, from which
the homogeneity condition can be deduced (Samuelson, [1947] 1983, pp. 111,
116). Compare with his remark (p. 97): `As has been reiterated again and
again, the utility analysis is meaningful only to the extent that it places
hypothetical restrictions upon these demand functions.'
2. This chapter is partly based on Keuzenkamp (1994) and Keuzenkamp and
Barten (1995).
3. Marschak's paper inspired Popper's `zero method', the method of construct-
ing a model on the assumption of complete rationality. See Popper ([1957]
1961, p. 141).
4. Cross (1982) suggests dropping the distinction between hard core and pro-
tective belt. This does not improve the transparency of Lakatos' methodol-
ogy, however.
9 Positivism and the aims of
econometrics
Science for the past is a description, for the future a belief; it is not, and
has never been, an explanation, if by this word is meant that science
shows the necessity of any sequence of perceptions.
Karl Pearson ([1892] 1911, p. 113)
1 Introduction
A standard characterization of econometrics is to `put empirical flesh and
blood on theoretical structures' (Johnston, [1963] 1984, p. 5), or `the
empirical determination of economic laws' (Theil, 1971, p. 1). For
Chow (1983, p. 1) economic theory is the primary input of a statistical
model. Since the days of Haavelmo (1944), econometrics almost serves as
an afterthought in induction: the inductive stage itself is left to economic
theory, the statistical work is about the `missing numbers' of a precon-
ceived theory.
This is not how the founders of probability theory thought about
statistical inference. As Keynes ([1921] CW VIII, p. 359) writes,
The first function of the theory is purely descriptive . . . The second function of the
theory is inductive. It seeks to extend its description of certain characteristics of
observed events to the corresponding characteristics of other events which have
not been observed.
The question is to what extent econometrics can possibly serve those
functions. In order to answer this question, I will characterize the aims
of econometrics and evaluate whether they are achievable. I will argue
that the only useful interpretation of econometrics is a positivist one.
Section 2 deals with the philosophical maxims of positivism. In the sub-
sequent sections, I discuss measurement (section 3), causal inference
(section 4), prediction (section 5) and testing (section 6). A summary is
given in section 7.
2 Positivism and realism
2.1 Maxims of positivism
What are `the' aims of science? There is no agreement on this issue. It is
useful to distinguish two views: positivism (related to empiricism, prag-
matism and instrumentalism) on the one hand, and realism (which
assumes the existence of `true' knowledge, as in Popper's falsificationism)
on the other hand. The following maxims (taken from Hacking, 1983,
p. 41) characterize positivism:
(i) Emphasis on verification, confirmation or falsification
(ii) Pro observation
(iii) Anti cause
(iv) Downplaying explanations
(v) Anti theoretical entities.
The first relates to the Wiener Kreis criterion of meaningfulness, but also
to the Popperian demarcation criterion. Popper does not support the
other positivist maxims and, therefore, does not qualify as a positivist
(p. 43). The second maxim summarizes the view that sensations and
experience generate empirical knowledge or rational belief. The third is
Hume's, who argues that causality cannot be inferred. The fourth maxim
again is related to Humean scepticism on causality. Newton's laws, so it is
argued, are not explanations but rather expressions of the regular occur-
rence of some phenomena (is gravitation a helpful though metaphysical
notion, or is it a real causal force?). The fth point deals with the debate
between realists and anti-realists. I will make a detour on realism in the
next section, but it may be helpful to state my own beliefs first.
De Finetti's one-liner `probability does not exist' (see chapter 4)
and my counterpart `the Data Generation Process does not exist' (see
chapter 7, section 4.2), are both implicit endorsements of the positivist
maxims. Interpreting the search for homogeneity as a specification
search, serving the measurement of elasticities and approximation of
demand equations, instead of a search for `truth', is a positivist interpre-
tation (chapter 8). Indeed, I claim that econometrics belongs to the posi-
tivist tradition. I am not at all convinced by Caldwell (1982), who argues
that we are `beyond positivism', and even less by post-modernists, like
McCloskey (1985) who hold that it is all rhetorics. With Bentham (1789,
p. 2), we should say: `But enough of metaphor and declamation: it is not
by such means that moral science is to be improved.'
I share the positivist's interest in confirmation and falsification (which is not the same as methodological falsificationism, which transforms
common sense into dogma) and reject the rationalism or Cartesian
doubt of someone like Frank Hahn. Hahn is sceptical about econometric
confirmation and relies on his ability to think. His cogito cannot be better
expressed than by Hahn himself: `It is not a question of methodology
whether Fisher or Hahn is right. It is plain that Hahn is' (1992, p. 5).
I also support the emphasis on measurement and observation. In sec-
tions 3.2 and 5.1 below, I will criticize economists who think that obser-
vation is unimportant. With respect to causality, I hold (like Hendry, in
Hendry, Leamer and Poirier, 1990, p. 184) that econometrics should be
Humean: we observe conjunctions, we do not `know' their deeper
mechanism. Still, it may be convenient to speak of causes, certainly if
we have specific interventions in mind (this is natural for economists who are interested in policy). I interpret the word `cause' as `the rational expectation that such and such will happen after a specific intervention'.
Finally, I believe explanations serve a purpose: they are useful in con-
structing analogies which may improve theoretical understanding. But I
do not view understanding and explanation as roads to truth: they are
psychological tools, useful in communicating policy analysis or interven-
tion. Econometric models aim at empirical adequacy and useful interpre-
tations, not truth. As truth does not have an objective existence in
economics, I think it does not make sense to take scientific realism as the philosophical starting point of economic inference. Realism is not my specific interest, but I will discuss it briefly now.
2.2 Scientific realism
Realism takes the notions in science as `real', existing objectively as the
`truth' which science should `discover'. Hacking (1983) notes that there
are two versions of realism: realism about theories (i.e. theories aim at the
truth) and realism about entities (i.e. the objects described in theories
really exist). Positivists do not care about the realism of either.
Feynman (1985) illustrates the problem of scientific realism by asking
whether the inside of a stone exists. No one has ever seen such an inside;
anyone who tries, by breaking the stone into two parts, only sees new
outsides. Another example is an electron: do electrons exist or are they
just models, inventions of the mind? Concepts like `electron' may be
useful to clarify our thoughts and to manipulate events, but that does
not mean they are to be taken as literally true descriptions of real entities.
They do not refer to a true state of nature, or true facts. If different
physicists mean different things when they speak of an electron, then
this electron is merely an image, an invention of the mind. This is
reflected in van Fraassen's (1980, p. 214) question: `Whose electron did Millikan observe; Lorentz's, Rutherford's, Bohr's or Schrödinger's?'
Economic counterparts of Millikan's electrons are not hard to think of.
Money is an example. In the form of a penny or a dime money is pretty
real, but its true value may be an illusion! And as an aggregate, like M1, it
is highly problematic. Aggregation and index number theory suggest that
M1 is a poor proxy for the money stock, and not a real entity correspond-
ing to the theoretical entity `money' occurring in a phrase like `money
causes income'. The money aggregate is a theoretical notion, a convenient
fiction, which has no real existence outside the context of a specific
theory.
Lawson, who supports realism, argues that realists hold that the object
of research exists independently of the inquiry of which it is an object.
`True' theories can be obtained, he argues, and the objective world does
exist (Lawson, 1987, p. 951). Lawson (1989), however, observes that
econometricians tend to be instrumentalists. Instead of wondering what
might be wrong with realism, he concludes that, because of their instru-
mentalist bias, econometricians have never quite been able to counter
Keynes' critique on econometrics (see chapter 4, section 2.4). But I fear
that a realist econometrics would make econometrics incomprehensible.
Lawson does not clarify how `realist econometrics' should look. Would it
search for deep parameters (the `true' constants of economics, usually
those of consumption and production functions), in the spirit of
Koopmans (1947) or Hansen and Singleton (1983)? Summers (1991) cri-
ticizes this approach: it yields a `scientific illusion'.
Moreover, in economics, the object is not independent of the subject.
In other words, behaviour (like option pricing) is affected by economics
(the option pricing model). An independent truth does not exist. This also
relates to the question of whether we invent or discover theories.
Discovery belongs to realism. In economics, inventions matter.¹ The difference is that inventions are creations of our minds, not external realities. The price elasticity of demand for doughnuts is not a real entity, existing outside the context of a specific economic model. I concur with
van Fraassen (1980, p. 5), who argues that
scientific activity is one of construction rather than discovery: construction of
models that must be adequate to the phenomena, and not discovery of truth
concerning the unobservable.
A problem is how to define `adequate'. Van Fraassen does not tell. I think
there is at least a heuristic guide to determine relative adequacy: the
Minimum Description Length principle (see chapter 5). It combines a
number of the aims of inference, in particular descriptive and predictive
performance. The MDL principle can also be invoked by those who
emphasize other aims, such as finding causal relations, or improving
understanding. This is because both aims can be viewed as derivatives of
descriptive and predictive performance.
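As a rough indication of how the MDL heuristic cashes out `adequacy', the two-part form chooses the model that minimizes total description length; the regression version below is my illustrative approximation, not a formula quoted from chapter 5:

```latex
% Two-part MDL: choose the hypothesis H minimizing the code length of the model
% plus the code length of the data given the model.
\hat{H} = \arg\min_{H}\; \bigl[\, L(H) + L(D \mid H) \,\bigr],
% which, for a linear regression with k parameters, n observations and residual
% variance \hat{\sigma}^2_k, is roughly
L(H) + L(D \mid H) \;\approx\; \frac{k}{2}\log n + \frac{n}{2}\log \hat{\sigma}^2_k ,
% rewarding fit through the second term and penalizing complexity through the first.
```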
Finally, I will make some very brief remarks on the `realism of assump-
tions'. The debate on realism raged in economics during the late fifties
and early sixties, but the issues then were not quite the same as those
discussed above. In 1953, Milton Friedman published a provoking Essay
on Positive Economics, in which he denies the relevance of realistic
assumptions in economic theory. What counts, according to Friedman,
is the predictive quality of a theory. Frazer and Boland (1983) reinterpret
Friedman's position as an instrumental kind of Popperianism. This seems
a contradiction in terms, given Popper's disapproval of instrumentalism.
Friedman's position is a clearly positivist one, the title of his work is well
chosen. Although Friedman recommends testing theories (by means of
their predictions), this does not make him a Popperian.²
3 Science is measurement
3.1 From measurement to statistics
The best expression of the view that science is measurement is Kelvin's
dictum: `When you can measure what you are speaking about and
express it in numbers, you know something about it, but when you
cannot measure it, you cannot express it in numbers, your knowledge
is of a meagre and unsatisfactory kind.' Francis Galton and Karl Pearson
advocate this principle in the domain of biometrics and eugenics. Galton
even wanted to measure boredom, beauty and intelligence. It definitely is
a nineteenth-century principle, but its popularity extends to the twentieth
century including econometrics. The original motto of the Cowles
Commission leaves no doubts: `science is measurement'.³ Measurement
belongs to maxims of positivism in its emphasis on observation, although
few positivists would argue that measurement is the ultimate goal of
science.
The quest for measurement started with the search for `natural con-
stants'. The idea of natural constants emerged in the nineteenth century
(when, for example, experimental physicists started to measure Newton's
constant of gravitation; see Hacking, 1990, p. 55). Nineteenth-century
social scientists tried to find `constants of society', such as birth rates and budget spending. For example, Charles Babbage made a heroic effort to list all relevant constants that should be measured (pp. 59-60). Babbage
and the Belgian statistician, Quetelet, are good examples of those who
endorsed measurement for the sake of measurement.⁴ If Babbage had
known Marshall's concept of an elasticity, he would have added a
large number of elasticities to his list. The first estimated consumer
demand equations, due to Moore and Schultz (see chapter 8), are the
economists' extensions of the positivist research programme. It is no
surprise that Moore was a student of Karl Pearson, and Schultz was a
pupil of Moore.
The rise of statistics parallels the growing availability of `data'. This is
one of the themes of Hacking (1990). The more numerical data that
became available, the higher the demand for their reduction. As related
in chapter 6, Galton was not only obsessed by measurement, but also
invented the theory of regression and correlation. His ratings of beauty
are based on the `law of error' (normal law or Gaussian distribution), and
this law serves his analysis of correlation. By making the steps from
measurement, via the law of error, to correlation, Galton `tamed chance'
(Hacking, 1990, p. 186). Galton bridged description and understanding
by means of statistics: correlation yields explanation, and the law of error
is instrumental in the explanation. Correlation can be interpreted as a
replacement for causes; causation is the conceptual limit to correlation.
3.2 Robbins contra the statisticians
During the founding years of econometrics, Lionel Robbins became a
spokesman against econometrics. Unlike Keynes, he is not one of the
founding (1933) fellows of the Econometric Society. Robbins ([1932]
1935, p. 104) views economics as `deductions from simple assumptions
reflecting very elementary facts of general experience'. The validity of the
deductions does not have to be established by empirical or inductive
inference, but is
known to us by immediate acquaintance . . . There is much less reason to doubt
the counterpart in reality of the assumption of individual preferences than that of
the assumption of an electron. (p. 105)
Hence, Robbins is a realist.⁵ Robbins combines realism with an aversion
to econometric inference. He rejects quantitative laws of demand and
supply. If this were the result of academic conservatism, Robbins could
be ignored. However, he makes a point that can be heard today as well.
Among contemporary contributors to economic methodology, at least
one (Caldwell, 1982) thinks that Robbins (and Austrian economists,
who share many of his ideas) has a defensible case. Therefore, consider
Robbins' argument. The problem with obtaining quantitative knowledge
of economic relations, Robbins argues, is that it is
plain that we are here entering upon a field of investigation where there is no
reason to suppose that uniformities are to be discovered. The `causes' which bring
it about that the ultimate valuations prevailing at any moment are what they are,
are heterogeneous in nature: there is no ground for supposing that the resultant
effects should exhibit significant uniformity over time and space. No doubt there
is a sense in which it can be argued that every random sample of the universe is
the result of determinate causes. But there is no reason to suppose that the study
of a random sample of random samples is likely to yield generalizations of any
significance. (p. 107)
Clearly, this view is at odds with the view of Quetelet and later statisti-
cians, who found that many empirical phenomena do behave uniformly,
according to the `law of error'. The central limit theorem (or its invalid
inversion, see chapter 6, section 2) is of special importance in the
investigations of those statisticians. In order to be persuasive, Robbins
should have shown why, if there is no reason to expect stable relations,
the early statistical researchers of social phenomena found so many regu-
larities. Robbins could also have pointed out which conditions for the
validity of the central limit theorem are violated in economics, or how it
could be used in economics. Such arguments are not given. He just writes
that it is `plain'. (Remember Hahn!) The investigations of early econo-
metricians on autonomous relations, identifications, etc., have been more
useful to the understanding of possible weaknesses of econometric infer-
ences, than Robbins' unsupported claims that there are simply no uni-
formities to be discovered.
Robbins attacks the idea that econometricians can derive quantitative
statistical laws of economics. In a sense, Robbins is Pyrrhonian. His
solution is the a priori strategy, which (he thinks) enables one to derive
qualitative laws of economics.⁶ Robbins ([1932] 1935, pp. 112-13) not
only attacks the statistical analysis of demand and supply, but also sta-
tistical macroeconomics, in particular Wesley Mitchell's (1913) Business
Cycles. Mitchell investigates the similarities between different business
cycles, whereas Robbins argues that the only significance in describing
different business cycles is to show their differences due to varying con-
ditions over space and time. `Realistic studies' (i.e. statistical investiga-
tions) may be useful in suggesting problems to be solved, but `it is theory
and theory alone which is capable of supplying the solution' (Robbins,
[1932] 1935, p. 120).
Robbins' views are related to the Austrian methodology (see section
5.2). Most Austrians reject the predictive aim of economics; Hayek is an
exception. He argues that statistics is useful only insofar as it yields
forecasts. Robbins, on the other hand, claims that forecasts do not result
from (unstable) quantitative statistical relations, but from (qualitative)
economic laws (such as the law of diminishing marginal utility). The
latter are `on the same footing as other scientic laws' (p. 121). Exact
quantitative predictions cannot be made. This is his reason for rejecting
the search for statistical relations or `laws'. Only if econometricians were
able to estimate all elasticities of demand and supply, and if we could
assume that they are natural constants, Robbins (pp. 131-2) claims, we
might indeed conceive of a grand calculation which would enable an
economic Laplace to foretell the economic appearance of our universe
at any moment in the future. But such natural constants are outside the
realm of economics. Robbins' view is not unlike Keynes' in his debate
with Tinbergen. Keynes, though, is much closer to Hume's strategy of
conventionalism and naturalism than to the neo-Kantian apriorism of
Robbins. Also, Keynes' objections to the inductive aims of econometrics
are better founded than those of Robbins. Unlike Keynes, Robbins cre-
ates a straw man, an economic Laplace to foretell the future. Few modern
econometricians aim at this and neither did the founders. Moore, Schultz
and other early econometricians followed Pearson, the positivist. He did
not want to catch Laplace's demon in a statistical net. Pearson opposed
necessity and determinism. His conception of (statistical) law is neither
Cartesian, nor Laplace's:
law in the scientific sense only describes in mental shorthand the sequences of our
perceptions. It does not explain why those perceptions have a certain order, nor
why that order repeats itself; the law discovered by science introduces no element
of necessity into the sequence of our sense-impressions; it merely gives a concise
statement of how changes are taking place. (Pearson, [1892] 1911, p. 113; see also
pp. 86-7)
3.3 Facts and theory
The pioneer of the econometrics of consumer behaviour, Schultz, argues
that in the `slippery field of statistical economics, we must seek the support of both theory and observation' (1937, pp. 54-5). This is sound
methodological advice. It neither gives theory nor observation a domi-
nant weight in scientic inference, unlike the authors of the textbooks
cited at the beginning of this chapter. Following Koopmans and
Haavelmo, they put theory rst.
Koopmans holds that fact nding is a waste of time if it is not guided
by theoryneoclassical theory, that is. But fact nding is an important
stage in science. This is acknowledged by Haavelmo (1944, p. 12; see also
chapter 6), who speaks of a useful stage of `cold-blooded empiricism'.
Take Charles Darwin, who writes that he `worked on true Baconian
principles, and, without any theory, collected facts on a wholesale scale'
(Life and Letters of Charles Darwin, cited with admiration in Pearson,
[1892] 1911, p. 32). Darwin even tried to suppress his theory: `I was so
anxious to avoid prejudice, that I determined not for some time to write
even the briefest sketch of it' (p. 33). Mitchell's (1913) method of business
cycle research is like Darwin's, but Mitchell was not able to make the
consecutive step: to invent a theory to classify the facts. This step has
been made by followers of Mitchell, in particular, Kuznets, Friedman
and Lucas. Lucas (1981, p. 16) praises the empirical work of Mitchell
and Friedman for providing the `facts' or evidence that a theory must
cope with. It may surprise some readers to nd Lucas, best known as a
macro-economic theorist, in the company of Mitchell instead of
Koopmans, but he rightly places himself in the positivist tradition.⁷
Stylized facts are a source of inspiration for economic theory. Kuznets'
measurements of savings in the USA inspired Friedman to his permanent
income theory of consumption. In particular, Kuznets found that the
savings rate is fairly stable and does not (as Keynes suggested) decline
with an increase in income. This is a good example of how measurement
can yield a fruitful stylized fact and inspire new theory and measure-
ments. Similarly, Solow's stylized facts about growth have inspired an
immense literature on the basic characteristics and determinants of
growth. There wouldn't have been an endogenous growth literature with-
out studies such as Solow's. The example illustrates the way science
proceeds: from simple (stylized fact) to more general (neoclassical growth
models, endogenous growth models). It would be wrong to reject studies
like Mitchell's, Friedman and Schwartz's, or Solow's because their
stylized facts are all gross mis-specifications of economic reality.
But theory is an essential ingredient to classify facts. An obvious exam-
ple is consumer behaviour. To qualify as a demand equation, an inferred
regularity has to satisfy certain theoretical restrictions. Only then are the
estimated elasticities credible. In the slippery field of empirical economics,
we do indeed need both theory and observation.
3.4 Constants and testing
The quest for constants of nature and society started with Charles
Babbage. Measuring nature's constants not only inspired positivist
research, but is of specific interest for (neo-) Popperians, such as Klant. Without constants, Klant (1979, pp. 224-5) argues, the falsifiability principle for the demarcation of science breaks down. Popper ([1957] 1961) acknowledges that lack of constancy undermines methodological falsificationism. In economics, there is little reason to assume that parameters
are really constant. Human beings change, as do their interactions and
regularities. Logical falsificationism, therefore, is beyond reach. In prin-
ciple, positivists do not have to be bothered by this problem. Their aim is
not to discover and measure `true' constants to single out scientic the-
ories, but to invent measurement systems in order to measure parameters
which are useful in specific contexts.
Charles Peirce contributed to this view. He denies the existence of
natural constants. There are no Laws of Nature, hidden in the Book of
God, with `true' natural constants to be discovered. Instead, laws evolve
from chance, their parameters are at best limiting values that will be
reached in an indefinite future (see Hacking, 1990, p. 214). Perhaps this
is even too optimistic about convergence to limiting values (see for exam-
ple Mirowski, 1995, who discusses the `bandwagon effect' of measure-
ments of parameters in physics). The claim that physicists can apply the
Popperian methodology of falsification, while economic theories are not strictly refutable, cannot be accepted. In both cases, strict methodological falsificationism is unfeasible. Econometricians face an additional problem: new knowledge, such as the publication of an empirical econometric model, may lead to a `self-defeating exercise in economic history' when the findings are used to change the object of inference (Bowden, 1989, p. 258). This is the problem of reflexivity.
Because of the absence of universal constants, Klant argues, econometricians cannot test (i.e. falsify) general theories. Instead, they estimate specific models. But there are other kinds of testing, more subtle and much more relevant to econometrics than falsification (see section 6). Lack of experimentation, weakly informative data sets and the problem of reflexivity are more fundamental to econometrics than the presumed absence of universal natural constants. Reflexivity and other
sources of changing behaviour may lead to time varying or unstable
parameters, but this does not prohibit making predictions or learning
from experience. Positivism does not leave the econometrician empty
handed.
4 Causality, determinism and probabilistic inference
`A main task of economic theory is to provide causal hypotheses that
can be confronted with data' (Aigner and Zellner, 1988, p. 1). `Cause' is
a popular word in econometrics, but whether causal inference really is a
main or even the ultimate goal of econometrics may be questioned. The
debate between rationalism (associated with scientic realism) and
empiricism (or positivism) is to a large extent a debate on causality.
Hence, we may expect at least two separate views on causation. Indeed,
the empiricist holds that causation is a metaphysical notion that is not
needed for scientific investigation. Science is about regularities, and that suffices. If you want to speak of causes, you may do so, but we cannot
obtain true knowledge about the causes that go on behind the regula-
rities observed. This view has been opposed by Kant, who argues that
causation is the conditio sine qua non for scientific inference.⁸ Without cause, there is no determinant of an effect, no regularity, no scientific
law. The view dates back to the initiator of scepticism, Sextus
Empiricus, who argues that `if cause were non-existent everything
would have been produced by everything and at random' (Outlines of
Pyrrhonism, cited in van Fraassen, 1989, p. 97). The difference is that
Kant provides an optimistic alternative to scepticism, by means of his
a priori synthetic truth. Science presupposes the causal principle.
An intermediate position, which is positivist in content but pragmatic
in form, is expressed by Zellner. He uses words such as `cause' and
`explain' with great regularity and sympathy. But `explain' means to
him: fits well to past phenomena over a broad range of conditions (Zellner, 1988). `Cause' is defined in terms of predictability: Zellner advocates Feigl's definition of `predictability according to a law or set of laws' (this definition takes account only of sufficient causation; see below). A law, as defined by Zellner, is a regularity that both `explains' (i.e. fits with past phenomena) and predicts well. He adds a probabilistic
consideration: models are causal with high probability when they per-
form well in explanation and prediction over a wide range of cases.
Zellner's work is in the positivist tradition; his favourite example of a
fruitful search for a causal law is Friedman's study of consumption.
Zellner's approach to causality is sensible, although it has to be refined. Causal inference is primarily of interest if one has a specific intervention
in mind, in which case `cause' can be interpreted as `the rational expec-
tation that such and such will happen'. The type of intervention has to
be made explicit, as well as the other conditions for the rational
expectation.⁹
Before discussing causation in econometrics, I will deal with the mean-
ing of causation in a deterministic world. Section 4.1.1 introduces
Laplace's demon, the deterministic representative of the founder of prob-
ability theory. In section 4.1.2, necessary and sufficient causation is dis-
cussed, and the subjunctive conditional interpretation of causation is
presented. Section 4.1.3 deals with Hume's critique on causal inference,
section 4.1.4 discusses Peirce on determinism. In section 4.2, the relation
between causation and probability is discussed. Section 4.3 extends the
discussion to the domain of econometrics.
4.1 Causation and determinism
4.1.1 Laplace's demon
It is an `evident principle that a thing cannot begin to be without
a cause producing it', writes Laplace (cited in Krüger, 1987, p. 63).
Laplace, a founder of probability theory, is a determinist. He subscribes
to Kant's principle of universal causation (see below). All events follow
the laws of nature. If one only knew those laws plus initial conditions,
one could know the future without uncertainty. Please let me introduce
Laplace's demon:
We ought then to regard the present state of the universe as the effect of its
previous state, and as the cause of that which will follow. An intelligence which
for a given instant knew all the forces by which nature is animated, and the
respective situations of the existences which compose it; if further that intelligence
were vast enough to submit these given quantities to analysis; it would embrace in
the same formula the greatest body in the universe and the lightest atom: nothing
would be uncertain to it and the future as the past would be present to its eyes.
(Laplace, 1814, p. vi, translation by Pearson, 1978, p. 656)
Nothing in the future is uncertain for the demon. The human mind
searches for truth, Laplace continues, by applying the same methods as
this vast intelligence. This makes men superior to animals. `All men's
efforts in the search for truth tend to carry him without halt towards
such an intelligence as we have conceived, but from which he will always
remain infinitely remote.' The reason why we need probability theory is
that we are less able than the demon. There is an incomprehensible
number of independent causal factors; probabilistic inference can help
to obtain knowledge about the probability of causes. Chance is the reflection of ignorance with regard to the necessary or true causes.
4.1.2 Necessary and sufficient causes
What constitutes a `cause'? Below, I discuss a number of definitions. A well known one is Galileo's efficient cause C of an event E, which is the necessary and sufficient condition for its appearing (Bunge, [1959] 1979, pp. 4, 33) if the following condition holds:
if and only if C, then E.
Inferring the efficient cause of event E is straightforward, for it implies that not-C results in not-E. A problem is that many events are determined by a very large or even infinite amount of other events. This definition is, therefore, of little use. A weakened version is the sufficient but not neces-
sary causal condition,
if C, then E.
This is how John Stuart Mill defines causes, with the additional provision that the cause precedes the event in time.¹⁰ The definition facilitates testing for causality. The principle of the uniformity of nature, stating that
the same cause must always have the same effect, supplements this causal
relation with immutable laws of nature.
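In schematic form (the notation is mine), the two definitions just given, with the negated reading of the efficient cause made explicit, are:

```latex
% Galileo's efficient cause: C is necessary and sufficient for E.
\text{efficient cause:}\quad C \Leftrightarrow E \qquad (\text{so } \neg C \Rightarrow \neg E)
% Mill's weaker, sufficient-but-not-necessary cause, with temporal precedence.
\text{sufficient cause:}\quad C \Rightarrow E, \qquad C \text{ precedes } E \text{ in time}
```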
These are the most elementary notions of cause. But how can we know
that causes are real, rather than spurious? One way might be to distrust
our sensations, the phenomena. Rational reasoning, deduction, will
reveal true causes. This view is exemplied by Descartes who holds
that `[n]othing exists of which it is not possible to ask what is the cause
why it exists' (cited in Bunge, [1959] 1979, p. 229, n. 10). This proposition
is better known as Leibniz's principle of sufficient reason, according to which nothing happens without a sufficient reason or cause (Leibniz uses
reason and cause as synonyms; for Descartes and Leibniz, reasoning and
obtaining causal knowledge were nearly identical). But this principle is of
little help to demarcate spurious and real causes. Hume's scepticism is a
critique of Descartes' rationalistic view on causation.
Kant tried to reconcile Descartes and Hume, by making a distinction
between `das Ding an sich', i.e. the `thing in itself', and its appearance,
`das Ding für sich'. Although Kant accepted that we cannot obtain
knowledge of things in themselves, we are able to make inferences with
respect to their appearances. The principle of universal causation states
that appearances of things, i.e. phenomena, are always caused by other
phenomena: `Alles zufällig Existierende hat eine Ursache' (everything which happens to exist has a cause; cited in Krüger, 1987, p. 84, n. 10; see also von Mises, [1928] 1981, p. 210).¹¹ Kant's principle is introduced
for the purpose of making inference possible and has an epistemological
intention. It relates to experiences, not to `das Ding an sich'. But the
principle is not empirically testable, it is an a priori synthetic proposition,
a precondition for empirical research (see also Bunge, [1959] 1979, pp. 27-8). Hume would have considered this requirement superfluous.
An alternative to the definitions of causation presented above is to
base the notion of cause on the so-called subjunctive conditional. This is
a proposition of what would happen in a hypothetical situation, i.e. if A
had been the case, B would be the case. According to the subjunctive
conditional definition of cause, C is a cause of E if the following condition is satisfied: if C had not occurred, E would not have occurred. This is anticipated in Hume ([1748] 1977, p. 51): `we may define a cause to be an object, followed by another, and where all the objects, similar to the first,
are followed by objects similar to the second. Or in other words, where, if
the first object had not been, the second never had existed' (n.b., this final
sentence is not correct).
4.1.3 Hume's critique of causality
According to Hume, you may call some things causes, if it
pleases you, but you cannot know they are. Causation is convenient
shorthand for constant conjunction. Reasoning will not enable one to
obtain causal knowledge. Hume ([1739] 1962) argues that there are three
ingredients in apparently causal relations:
(i) contiguity (closeness) in space and time
(ii) succession in time
(iii) constant conjunction: whenever C, then E, i.e. the same relation
holds universally.
The Cartesian theory of causality holds that everything must have a
sufficient cause, and some events must be or cannot be the causes of
other events. But, Hume argues, true knowledge of necessary causes
cannot be derived from observation, whence the problem of induction
and Hume's scepticism. At best, we observe relations of constant con-
junction. If they also obey (i) and (ii), by way of habit we call them causal
laws. They are subjective mental constructs, psychological anticipations,
not truth statements of which the validity can be ascertained. Bunge
([1959] 1979, p. 46), a modern realist, criticizes Hume:
the reduction of causation to regular association, as proposed by Humeans,
amounts to mistaking causation for one of its tests; and such a reduction of an
ontological category to a methodological criterion is the consequence of episte-
mological tenets of empiricism, rather than a result of an unprejudiced analysis of
the laws of nature.
Bunge is correct that Humean causation is indeed of an epistemological
nature, but this is not a mistake of Hume's: he explicitly denies that
ontological knowledge of causation is possible. Bunge's own approach,
summarized by the proposition that causation is a particular case of
production (p. 47),¹² suffers from the same problems as the ones which
Hume discusses: even if we experiment, we cannot know the true causes,
as there is always a possibility that the observed regularities are
spurious.
Habit, which results from experience, is Hume's ultimate justication
for inference. If we regularly experience a constant conjunction of heat
and ame, we may expect to observe such a conjunction in a next instance
(Hume, [1748] 1977, p. 28). Without the guidance of custom, there would
be an end to all human action. This view is shared by Keynes. Custom is
the very guide in life, Hume argues in his naturalistic answer to
scepticism.
There are no probabilistic considerations in Hume's discussion of caus-
ality (he deals with deterministic laws). Due to the influence of Newton's
Principia (published in 1687), until the end of the nineteenth century
`explanation' meant virtually the same as `causal description' (von
Mises, [1928] 1981, p. 204). Gravitation figures as the most prominent `cause' in Newton's laws of mechanics.¹³ It is no surprise that, after the
quantum revolution in physics, not only Newtonian mechanics came
under fire but also this association of explanation and causality. In a
famous essay, Russell (1913) argues that it is better to abandon the
notion of causality altogether. According to Russell, the causality prin-
ciple (same cause, same effect) was so popular because the idea of a
function was unfamiliar to earlier philosophers. Constant scientic laws
do not presume sameness of causes, but sameness of relations, more
specifically, sameness of differential equations (as noted on p. 102
above, Russell claims in his Outline of Philosophy that scientic laws
can only be expressed as differential equations; Russell, 1927, p. 122;
see, for the same argument, Jeffreys, [1931] 1957, p. 189, who concludes:
`the principle of causality adds nothing useful').
There are certainly problems with the Humean view on causality. For
example, the requirement of contiguity is not only questionable in physics
(think of `action at distance') but also economics (printing money in the
US may cause inflation in the UK). Moreover, succession in time is not
necessary for a number of relations that would be called causal by com-
mon sense (like `force causes acceleration', Bunge, [1959] 1979, p. 63, or
upward shift of demand curves causes higher prices in general equili-
brium). Other definitions of causality are, therefore, needed if the notion
of causality is to be retained. The alternative is more rewarding: aban-
doning the quest for true causes.
4.1.4 Peirce on the doctrine of necessity
Peirce (1892, p. 162) opposes Kant's principle of universal cau-
sation and other beliefs that `every single fact in the universe is precisely
determined by law'. Peirce exemplifies the changes of scientific thinking, from the deterministic nineteenth century, to the probabilistic twentieth century (see e.g. Hacking, 1990, pp. 200-3). Peirce provides the following
arguments against the `doctrine of necessity'.
First, inference is always experiential and provisional. A postulate of
universal causation is like arguing that `if a man should come to borrow
money . . . when asked for his security . . . replies he ``postulated'' the loan'
(Peirce, 1892, p. 164). Inference does not depend on such a postulate,
science deals with experience, not with things in themselves. Here, Peirce
joins Hume in his emphasis on observable regularities.
Secondly, Peirce denies that there are constant, objectively given para-
meters (see above). The necessitarian view depends on the assumption
that such parameters do have fixed and exact values. Peirce also denies
that exact relationships can be established by experimental methods:
there always remain measurement errors (p. 169). Errors can be reduced
by statistical methods (e.g. least squares), but `an error indefinitely small is indefinitely improbable'. This may be seen as an invalid critique of the
doctrine of necessity: the fact that we cannot measure without error the
exact value of a continuous variable in itself does not undermine the
possible existence of this value. But then, Peirce argues, a belief in such
existence must be founded on something other than observation. In other
words, it is a postulate and such postulates are redundant in scientic
inference.
The fact that we do observe regularities in nature does not imply that
everything is governed by regularities, neither does it imply that the
regularities are exact:
Try to verify any law of nature, and you will find that the more precise your
observations, the more certain they will be to show irregular departures from the
law. . . Trace their causes back far enough, and you will be forced to admit they
are always due to arbitrary determination, or chance. (Peirce, 1892, p. 170)
Peirce was writing before the quantum revolution in physics, but the
probabilistic perspective made him reject Laplace's determinism. The
fact that there still is regularity may be the result of chance events
given the `law of error', the normal distribution (as was argued by
Quetelet and Galton). Regularity may be the result of laws, even of
probabilistic laws. It can equally well result from evolution and habit.
But diversity can never be the result of laws which are immutable, i.e.
laws that presume that the intrinsic complexity of a system is given once
and for all (see Peirce, 1892, p. 173). Chance drives the universe.
4.2 Cause and chance
4.2.1 Laplace revisited
Despite his determinism, Laplace links probability and causality.
For him, there is no contradiction involved. Laplace regards probability
as the result of incomplete knowledge. He distinguishes constant (regular)
causes from dynamic (irregular) causes, `the action of regular causes and
constant causes ought to render them superior in the long run to the
effects of irregular causes' (cited from the Essai Philosophique sur les
Probabilités, pp. liii-liv, translation Karl Pearson, 1978, pp. 653-4). Our
ignorance of causes underlies probabilistic inference. As this ignorance is
the greatest in the `moral sciences', one might expect Laplace to recom-
mend probabilistic inference especially to those branches. This, indeed, he
does, in a section on application of the calculus of probabilities to the
moral sciences.
Consider an urn, with a fixed but unknown proportion of white and black balls. The constant cause operating in sampling balls from the urn is this fixed proportion. The irregular causes of selecting a particular ball
are those depending on movements of the hand, shaking of the urn, etc.
Laplace's demon can calculate the effects of the latter, but human beings
cannot. What can be done is to make probabilistic propositions about the
chance of drawing particular combinations of balls. The irregular causes
disappear, but a probability distribution remains. For a long period, this
view on probability and causation inuenced probability theory. For
example, Antoine Augustin Cournot, of a younger generation of statis-
ticians than Laplace, still sticks to the view that apparently random
events can result from a number of independent causal chains
(Hacking, 1975, p. 174; Krüger, 1987, p. 72). The change came at the
end of the nineteenth century, when the deterministic world-view began
to crumble. Meanwhile, the frequency interpretation of probability devel-
oped. How does it relate to causation?
4.2.2 Frequency and cause
Since Hume, one of the positivist maxims has been scepticism
about causes. This is reflected in the writings of Ernst Mach, and his
followers Karl Pearson and Richard von Mises. Mach holds that the
notion of causation is an out-dated fetish that is gradually being replaced
by functional laws (Bunge, [1959] 1979, p. 29). This is similar to Russell's
belief that causal laws will be replaced by differential equations. Pearson
goes a step further and claims that causal laws will be replaced by
empirical statistical correlations. Pearson ([1892] 1911) can be read as
an anti-causal manifesto. He is a radical Humean, arguing that cause is
meaningless, apart from serving as a useful economy of thought (pp. 128,
170). Cause is a `mental limit', based upon our experience of correlation
and contingency (p. 153).
Von Mises' thinking on causality is more delicate. He is a positivist,
inspired by Mach, and his probability theory owes some ideas to the
posthumously published book Kollektivmasslehre (1897) of G. Theodor
Fechner (see von Mises, [1928] 1981, p. 83). Both Mach and Fechner
reject the principle of universal causation. Fechner also rejects determin-
ism, largely for the same reasons as Peirce. Fechner's sources of indeter-
minism are inaccuracy, the suspension of causal laws, the occurrence of
intrinsically novel initial conditions and, finally, the occurrence of self-fulfilling prophecies (reflexivity).
While Mach, Fechner and Pearson reject causalism out of hand, von
Mises does not think that one needs to abandon the notion of causality in
(probabilistic) inference. In discussing the `small causes, large effects'
doctrine of Poincaré, von Mises notes that statistical theories (such as
Boltzmann's gas theory) do not contradict the principle of causality. Using
Galton's Board (the `quincunx', see chapter 6, section 3.1) as illustration,
von Mises argues that a statistical analysis of the resulting distribution of
the balls is not in disagreement with a deterministic theory (although the
latter would be very hard to implement to analysing the empirical dis-
tribution). The statistical theory does not compete with a deterministic
one, but is another form of it (von Mises, [1928] 1981, p. 209). Von Mises
even considers that the statistical distributions of events like throwing
dice, or rolling balls on Galton's Board, may be given a causal interpre-
tation. The meaning of the causal principle changes. Causality in the age
of quantum physics is not quite the same as it was during the heyday of
mechanical determinism.
Von Mises (p. 208) argues that classical mechanics is of little help in
`explaining' (by means of causal relations) semi-stochastic phenomena,
such as the motion of a great number of uniform steel balls on Galton's
Board. What is needed is a
simple assumption from which all the observed phenomena of this kind can be
derived. Then at last can we feel that we have given a causal explanation of the
phenomena under investigation.
Hence, von Mises claims that causal explanation is directly related to the
simplicity of assumptions. But cause is not an absolute notion, as it is in
the writings of Descartes and his followers. A theory is not conditional on
the existence of real causes (as it is in the views of Descartes and of Kant),
but conversely: causes are conditional on (fallible) knowledge of theories
based on empirical observation. Causality is relative to theories formu-
lated at some date, without claiming universal validity. Therefore, von
Mises (p. 211) argues, the principle of causality is subject to change, it
depends on our cognition. I regard this as a tenable position in positi-
vism.
4.2.3 Keynes' causa cognoscendi
Unlike Peirce, who was able to anticipate the full consequences
of probabilism, Keynes is a determinist: `none of the adherents of ``objec-
tive chance'' wish to question the determinist character of natural order'
([1921] CW VIII, p. 317); objective chance results from `the coincidence
of forces and circumstances so numerous and complex that knowledge
sufficient for its prediction is of a kind altogether out of reach' (p. 326).
Despite this support for Laplace's perspective, Keynes does not make
universal causation the ultimate foundation of his theory of probability
and induction. In this sense, he is a Humean. Still, in a brief exposition on
causality, Keynes (pp. 306-8) makes a few remarks on causation that anticipate the modern theory of probabilistic causation of Suppes (see e.g. Vercelli and Dimitri, 1992, pp. 416-17). Keynes distinguishes causa essendi (the ontological cause, where necessary and sufficient conditions
can be stated) from causa cognoscendi (causality relative to other knowl-
edge, which is a probabilistic notion dealing with regular conjunction).
Keynes provides a number of definitions for types of causes. There are
two given propositions, e and c, related to events E and C where C occurs
prior to E (remember that Keynes' theory of probability is formulated in
terms of propositions). Furthermore, we have laws of nature (indepen-
dent of time) l and other existential knowledge (facts), given by proposi-
tions f. The most elementary causal relations (that do not rely on f ) are
the following.
(i) If P(e|c.l) = 1, then C is a sufficient cause of E
(ii) If P(e|not-c.l) = 0, then C is a necessary cause of E.
The principle of universal causation (or law of causation, as Keynes calls
it) is that, if l includes all laws of nature, and if e is true, there is always
another true proposition c, such that P(e|c.l) = 1. Hence, this principle is about sufficient causes, i.e. case (i). But for the practical problem of
induction, Keynes (pp. 276, 306) argues, the laws of universal causation
and the uniformity of nature (same cause, same effect) are of little inter-
est. Science deals primarily with `possible' causes. As an example (not given by Keynes), one might think of smoking: neither a necessary, nor a sufficient, cause of lung cancer. In order to deal with possible causation, Keynes first weakens definitions (i) and (ii). Consider the sufficient cause
with respect to background knowledge f (case (iv) in Keynes, [1921] CW
VIII; I will skip the similarly weakened versions of necessary causes):
(iv) If P(e|c, l, f) = 1 and P(e|l, f) ≠ 1, then C is a sufficient cause of E under conditions f.
This introduces background knowledge and makes a more interesting
example of causation than the (unconditional) case (i), although smoking
still is not a sufficient cause (type (iv)) of cancer. The step to possible
causation (Suppes, 1970, uses the term prima facie cause in this context) is
made by introducing a further existential proposition h, such that
(viii) If P(h|c, l, f) ≠ 0 (i.e. the additional proposition is not inconsistent with the possible cause, laws and other facts),
P(e|h, l) ≠ 1 (i.e. the effect does not obtain with absolute certainty in absence of c), and
P(e|c, h, l, f) = 1 (i.e. the effect is `true'),
then C is, relative to the laws l, a possible sufficient cause of E under conditions f.
A possible cause can also be defined for the necessary case, yielding the somewhat odd `possible necessary' cause. An interesting feature of Keynes' analysis of causation is that he is one of the first to provide a
probabilistic treatment of causation. The recent theory of Suppes (1970)
is not much different from Keynes' short treatment. But, unlike Suppes,
Keynes does not have much interest in the types of causes such as pre-
sented above. They relate to causa essendi, whereas observations only
make inference with respect to causa cognoscendi possible. A definition
of causa cognoscendi is given:
If P(e|c, h, l, f) ≠ P(e|not-c, h, l, f), then we have `dependence for probability' of c and e, and c is causa cognoscendi for e, relative to data l and f.
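These conditions can be checked mechanically. The following sketch (a toy example in Python; the joint distribution over the propositions e and c and the helper function are illustrative assumptions, with the background knowledge l and f held fixed) verifies a sufficient cause, a necessary cause and `dependence for probability' by computing the relevant conditional probabilities directly.

```python
# A minimal sketch: Keynes' probabilistic cause conditions checked on a toy
# discrete joint distribution. The numbers, the restriction to two
# propositions (e, c) and the helper function are illustrative assumptions;
# background knowledge l (and f) is held fixed throughout.
prob = {
    # (e, c): probability of that combination of truth values
    (True, True): 0.30,
    (False, True): 0.00,   # e never fails when c holds
    (True, False): 0.20,
    (False, False): 0.50,
}

def p(event, given):
    """Conditional probability P(event | given) over the (e, c) outcomes."""
    num = sum(pr for (e, c), pr in prob.items() if given(e, c) and event(e, c))
    den = sum(pr for (e, c), pr in prob.items() if given(e, c))
    return num / den

# (i)  sufficient cause:  P(e | c, l) = 1
print("sufficient cause:", p(lambda e, c: e, lambda e, c: c) == 1.0)
# (ii) necessary cause:   P(e | not-c, l) = 0
print("necessary cause: ", p(lambda e, c: e, lambda e, c: not c) == 0.0)
# causa cognoscendi: 'dependence for probability' of c and e,
# P(e | c, l, f) != P(e | not-c, l, f)
print("causa cognoscendi:", p(lambda e, c: e, lambda e, c: c)
      != p(lambda e, c: e, lambda e, c: not c))
```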
Keynes' causa cognoscendi brings us back to Humean regularity, which in
his probabilistic treatment is translated to statistical dependence (and
correlation, in the context of the linear least squares model; see Swamy
and von zur Muehlen, 1988, who emphasize that uncorrelatedness does
not imply independence). This, and not causa essendi, is what matters in
practice:
The theory of causality is only important because it is thought that by means of its
assumptions light can be thrown by the experience of one phenomenon upon the
expectation of another. (Keynes, [1921] CW VIII, p. 308)
Keynes is aware that correlation is not the same as causation (although
Yule's famous treatment of spurious correlation appeared five years later,
in 1926). It is not unlikely that Keynes would hold that thinking (intuitive
logic) would be the ultimate safeguard against spurious causation.
4.2.4 From Fisher's dictum to the simplicity postulate
Fisher argues that his inductive methods are intended for reason-
ing from the sample to the population, from the consequence to the cause
(Fisher, 1955, p. 69), or simply for inferring the `probability of causes'
(p. 106). There is a clear link between randomness and causal inference in
Fisher's writings. He notes that there is a multiplicity of causes that
operate in agricultural experiments. Few of them are of interest. Those
few can be singled out by sophisticated experimental design.
Randomization is the first aid to causal inference (see the discussion of
randomization and of experiments in econometrics in chapters 6 and 7).
Cause is a regularly recurring word in Fisher's writings, but he does not
subscribe to a principle of causation such as Kant's. He is an exponent of
British empiricism, his views on causality are not much different from
Hume's (who is not mentioned in Fisher's books). Cause is used as a
convenient way of expression, not unlike Pearson's `mental shorthand'.
There is no suggestion in Fisher's writings that he supports either a
deterministic or an indeterministic philosophy.
One of Fisher's statements with regard to causality is interesting. When
Fisher was asked at a conference how one could make the step from
association to causation, Fisher's answer was `make your theories elabo-
rate' (Cox, 1992, p. 292). This dictum can be interpreted in various ways.
One is to control explicitly for nuisance causes, by including them as
variables in statistical models. This is discussed in Fisher ([1935] 1966,
Chapter IX) as the method of concomitant measurement.¹⁴ It amounts to
making implicit ceteris paribus clauses explicit. But modelling all those
additional factors can be fraught with hazards because the specific func-
tional relations and interactions may be very complex. The alternative is
to improve the experimental design.
The discussion of Pratt and Schlaifer (1988) on the difference between
(causal) laws and regressions elaborates Fisher's ideas on concomitant
measurement. Pratt and Schlaifer (p. 44, italics in original) argue that a
regression is only persuasive as a law, if one includes in the regression
every `optional' concomitant . . . that might reasonably be suspected of either affect-
ing or merely predicting Y given X or if the available degrees of freedom do not
permit this, then in at least one of several equations fitted to the data.
In macro-econometrics, degrees of freedom are usually too low to imple-
ment this approach. A combination with Leamer's (1978) sensitivity
analysis may be helpful. Levine and Renelt (1992) make such an effort.
In micro-econometrics, when large numbers of observations are avail-
able, the argument of Pratt and Schlaifer may clarify (implicitly) why
investigators do not vary significance levels of t-statistics with sample
size (the same 0.05 recurs in investigations with 30 and 3000 observa-
tions). Including more `concomitants', where possible, makes it more
likely that the inferences are like laws instead of being spurious.
The modified simplicity postulate suggests why a researcher should not
include all available regressors in a regression equation. One may even
argue that this postulate enables a researcher to ignore remote possibi-
lities, and constrain oneself to relatively simple models unless there is
specific reason to do otherwise. This, I think, is Jeffreys' (and, perhaps,
even Fisher's own) approach. Jeffreys ([1939] 1961, p. 11) criticizes the
principle of causality, or uniformity of nature, in its form `[p]recisely
similar antecedents lead to precisely similar consequences'. A first objec-
tion of Jeffreys is that antecedents are never quite the same, `[i]f ``precisely
the same'' is intended as a matter of absolute truth, we cannot achieve it'
([1939] 1961, p. 12). More interestingly, Jeffreys asks how we may know
that antecedents are the same. Even in carefully controlled experiments,
such knowledge cannot be obtained. The only thing that can be done is
to control for some conditions that seem to be relevant, and hope that
neglected variables are irrelevant. The question then arises, Jeffreys asks,
`How do we know that the neglected variables are irrelevant? Only by
actually allowing them to vary and verifying that there is no associated
variation in the result' (p. 12). This verification needs a theory of `significance tests' (Jeffreys' Bayesian approach, or in Fisher's case, the
approach outlined in chapter 3, section 3). Analysing the residuals in
regression equations is an application of this line of thought.
4.3 Causal inference and econometrics
4.3.1 Recursive systems
Not surprisingly, causality was much discussed during the found-
ing years of econometrics, in particular with respect to business cycle
research. Tinbergen (1927, p. 715) argues that the goal of correlation
analysis is to find causal relations. But he warns of the fallacy of `a certain statistician, who discovered a correlation between great fires and the use of fire engines, and wanted to abolish fire engines in order to prevent great fires' (my translation). The same warning appears in nearly all
introductory textbooks on statistics. Apart from this remark,
Tinbergen does not dig into the problem of causal inference.
Koopmans is more specific. Koopmans (1941, p. 160) argues that the
`fundamental working hypothesis' of econometric business cycle research
is that
causal connections between the variables are dominant in determining the fluctuations of the internal variables, while, apart from external influences readily recognized but not easily expressible as emanating from certain measurable phenomena, mere chance fluctuations in the internal variables are secondary in quantitative importance.
Koopmans defines causal connections as necessary. He also argues that
this working hypothesis is not contradicted by the data (suggesting that
it is testable). There are many interrelations between economic vari-
ables, which `leaves ample freedom for the construction of supposed
causal connections between them. . . In fact, it leaves too much freedom'
(pp. 161-2). Here, the economist should provide additional, a priori
information. The economist selects a number of possible causal rela-
tions. The econometrician uses the `principle of statistical censorship'
(p. 163) to purge those suggested relations that are in conflict with the
data.
Koopmans' methodological views on causality do not correspond to
the positivist views. Compare Pearson, who argues that causality is just
mental shorthand, adding nothing of interest to the statistical informa-
tion of contingency or correlation. Koopmans, on the other hand, takes
causality as a necessary requirement for business cycle research (like
Kant, who argued that scientific inference needs a causal principle).
Koopmans is less dogmatic than Kant, for he claims that his causal
principle is just a working hypothesis, not rejected by the data so far.
Furthermore, Koopmans (a physicist who grew up in the quantum revo-
lution) does not deny the possibility of pure chance phenomena.
The publications of the Cowles Commission that followed elaborate
Koopmans' point of view. Causality is an a priori notion, imposed as a
property of models. In this sense, Cowles-causality can be regarded as
different from Kant's causality, where it is a property of the `things'
(data) `in themselves', hence (if we consider this in the current context),
a property of the outcome space. Cowles-causation stands half way
between the positivist notion of causality and the a priori view.
Simon (1953) and Strotz and Wold (1960) have tried to elaborate the
definition of causality in terms of properties of (econometric) models (hence, not in terms of real world events). Simon's definition is based
on restrictions on an outcome space S, where there are (at least) two sets
of restrictions, A and B (the following is a simplied version of the
example given in Geweke, 1982). An econometric model is the conjunc-
tion of those restrictions, A ∩ B. One may think of restrictions on the
determination of money, M, and income, Y:
A: M = a,                (1)
B: Y − bM = c.                (2)

The outcome space of this example is S = {(M, Y) ∈ R²}. This is an
example of a causal ordering: condition (1) only restricts M, and condi-
tion (2) restricts Y without further restricting M. More generally, Simon
defines a causal ordering as follows. The ordered pair (A, B) of restrictions on S determines a causal ordering from M to Y if and only if the mapping G_Y(A) = Y and the mapping G_X(A ∩ B) = G_X(A). Hence, the causal ordering is a property of the model, not the data.
A related interpretation of causality is the causal chain model of
Herman Wold. This causal chain can be represented by triangular recur-
sive stochastic systems (see e.g. Wold and Juréen, 1953). Consider the
model
By + Γx = u,                (3)

where B is a triangular matrix (with unit elements on the diagonal), and Σ = E(uu′) a diagonal matrix. An example (taken from Strotz and Wold, 1960) which is the stochastic counterpart to (1)-(2) is:

x = u₁ ~ N(μ₁, σ₁²),                (4)
y − bx = u₂ ~ N(μ₂, σ₂²).                (5)
The reduced form equations of such a system can conveniently be esti-
mated with ordinary least squares, which yields consistent estimates.
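A small simulation may illustrate why; the parameter values, sample size and variable names below are assumptions for the sketch, not taken from Strotz and Wold. Because the system (4)-(5) is triangular with independent errors, ordinary least squares applied equation by equation recovers b.

```python
# A minimal simulation sketch of the recursive system (4)-(5); the parameter
# values, sample size and seed are illustrative assumptions. Because B is
# triangular and the errors are independent, equation-by-equation OLS is
# consistent.
import numpy as np

rng = np.random.default_rng(0)
n, b = 100_000, 0.6
u1 = rng.normal(1.0, 1.0, n)     # u1 ~ N(mu1, sigma1^2)
u2 = rng.normal(0.0, 0.5, n)     # u2 ~ N(mu2, sigma2^2), independent of u1
x = u1                           # equation (4): x is determined first
y = b * x + u2                   # equation (5): y is determined given x

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS estimate of b:", beta_hat[1])   # close to 0.6 in large samples
```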
The causal chain model has two motivations. The rst is that reality
is supposed to follow a causal chain: `truth' is recursive (a popular
method of the Scandinavian school of economics, sequence analysis,
is based on this postulate; see Epstein, 1987, p. 156). Wold even claims
that the only way to deal with causal models is via recursive systems
(see the critique of Basmann, 1963). A second motivation is that recur-
sive systems avoid cumbersome simultaneous equations estimation tech-
niques (in particular, Full Information Maximum Likelihood). Qua
method, Wold opposes the Cowles approach. Qua philosophy, they
are alike: they are close to the Kantian a priori synthetic truth. The
assumptions underlying Wold's arguments (triangular system, normal-
ity) may be invoked as useful simplifications, but they remain assumptions. The modified simplicity postulate suggests that Wold's argument
may be useful where the resulting descriptions perform well (evaluated
for example by the Minimum Description Length principle), but it is
not justified to argue that economic reality is per se a triangular recur-
sive system. Another objection to Wold's approach is that, if there are k
endogenous variables in the model, the number of possible causal order-
ings is k!. Wold does not say how to choose among those alternative
specifications.
Recursive systems underlie many early macro-econometric models,
such as Tinbergen's.
4.3.2 Causality and conditional expectation
Pearson ([1892] 1911, p. 173) argues that the unity of science is in
its method, and the methods of science are classication, measurement,
statistical analysis:
The aim of science ceases to be the discovery of `cause' and `effect'; in order to
predict future experience it seeks out the phenomena which are most highly
correlated . . . From this standpoint it finds no distinction in kind but only in
degree between the data, method of treatment, or the resulting `laws' of chemical,
physical, biological or sociological investigations.
Pearson's statement has a weakness. He does not consider the possibility
of spurious and nonsense correlation (although classication can be
viewed as his way of discriminating sense from nonsense). Here, the
paths of Pearson and his student, Yule, diverge. Yule is interested in
regression, conditional expectation. This surpasses Pearson's emphasis
on association (correlation). An important reason why econometricians
are interested in causation is that they want to avoid being accused of
inferring `spurious correlations', such as Tinbergen's (1927) example, fire engines cause great fires. The fallacy in his example can be exposed by a few simple experiments: it is straightforward to demonstrate that fire engines are neither sufficient nor necessary causes of great fires. In eco-
nomics, it is not easy to obtain such knowledge by an appropriate
experiment.
The conditional expectation model, which underlies regression analy-
sis, has been used to clarify the notion of causality in cases of non-experi-
mental inference. An example is Suppes (1970, p. 12). He distinguishes
prima facie causes and spurious causes. Basically the same distinction is
made in Cox (1992), who provides slightly simpler definitions. Cox's definitions are as follows. C is a candidate cause (prima facie cause, in Suppes' terminology) of E if P(E|C) > P(E|not-C). Hence, candidate causes result at least in positive association. Next, C is a spurious cause of E if B explains the association, i.e. P(E|C, B) = P(E|not-C, B). Cox's
and Suppes' approaches bring us back to Fisher's dictum: elaborate
your model. As it may always be the case that there is a variable B
that has not been considered for the model, statistical inferences generate
knowledge about candidate causes, not real causes. This is consistent
with Hume's scepticism about establishing knowledge for necessary
causes. Most econometricians, I think, will agree with Cox's view.
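The distinction is easy to reproduce with simulated data; the probabilities in the sketch below are illustrative assumptions. A common background factor B raises the probability of both C and E, so that C satisfies the candidate-cause condition yet is exposed as spurious once B is conditioned on.

```python
# A minimal sketch of Cox's distinction; the probabilities below are
# illustrative assumptions. A common background factor B raises the chance
# of both C and E, so C is a candidate cause of E but a spurious one given B.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
B = rng.random(n) < 0.5                      # background factor
C = rng.random(n) < np.where(B, 0.8, 0.2)    # B makes C more likely
E = rng.random(n) < np.where(B, 0.7, 0.1)    # B makes E more likely; C plays no role

def p(event, cond):
    """Empirical conditional probability P(event | cond)."""
    return event[cond].mean()

print("P(E|C)       =", p(E, C))             # clearly larger than ...
print("P(E|not-C)   =", p(E, ~C))            # ... this: C is a candidate cause
print("P(E|C, B)    =", p(E, C & B))         # roughly equal to ...
print("P(E|not-C, B)=", p(E, ~C & B))        # ... this: C is spurious, given B
```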
The philosopher Nancy Cartwright (1989), however, has tried to reha-
bilitate `true' causes or `capacities'. The initial motto of the Cowles
Commission, `Science is Measurement', is her starting point. However,
measurement is not her single aim; it is supplemented by the search for
causal relations, or more generally, `capacities'. She summarizes her views
as `Science is measurement; capacities can be measured; and science can-
not be understood without them' (p. 1). Cartwright prefers the study of
capacities to the study of laws. She follows J. S. Mill by opposing Hume's
view that the notion of cause should be replaced by the notion of regu-
larity and constant conjunction. Every correlation, Cartwright (p. 29)
argues, has a causal explanation, and the statisticians' warning that cor-
relation does not entail causation is too pessimistic. Her theory involves
the full set of (INUS) causal conditions for event E (each of them may be
a genuine cause). If each of the elements in this set has an `open back
path' with respect to E, then all of them are genuine causes (p. 33). C has
an open back path with respect to E if and only if C has another (preceding) cause C′, and if it is known to be true that C′ can cause E only by causing C. The condition is needed to rule out spurious causes. This,
Cartwright (pp. 34-5) argues, is how `you can get from probabilities to
causes after all'. The major weaknesses in the argument are the require-
ment for having a full set of INUS conditions, and the presumed true
knowledge in the `open back path' condition.
Cartwright claims that her philosophy applies to the social sciences.
The methods of econometrics, she argues (p. 158), presuppose that the
phenomena of economic life are governed by capacities.¹⁵ I think she
overstates her case. To see why, consider the investigations of consumer
demand presented in chapter 8. Measurement was clearly an aim of most
demand studies. But which `capacity' or cause was at stake? Should we
understand `capacity' in a very general sense, e.g. as rational (intentional)
behaviour, utility maximization? I do not think that it is helpful to main-
tain that an elasticity of 2 is `caused' by utility maximization of a group of
individuals. But perhaps the elasticity is to be interpreted as the bridge by
which a cause operates, a 1% change in income `causes' a 2% change in
consumption. Indeed, an elasticity is a proposition of the kind `if A, then
B'. But the econometric investigations of such elasticities are based on
regular conjunction; causation is only of interest if one wants to go
beyond measurement, in particular, if intervention is the aim. Even in
that case, `cause' is mental shorthand for rational expectation. We can do
without metaphysical capacities.
4.3.3 Wiener-Granger causality
Granger (1969) defines causality in terms of information precedence (or predictability) for stationary processes. It is an attempt to provide a statistical, operational definition that can help one to escape from the philosophical muddle. Granger's definition resembles Cox's of a non-spurious candidate cause, although Granger restricts the definition
to processes that have finite moments. Formally, let all information in the `universe' at time t be given by Uₜ. Uₜ − Xₜ represents all information except that contained in xₜ and its past. Then xₜ is said to Granger-cause yₜ₊₁ if

P(yₜ₊₁ ∈ C | Uₜ) ≠ P(yₜ₊₁ ∈ C | Uₜ − Xₜ),                (6)

for any region C.¹⁶ The definition contains a non-operational information set Uₜ. This requirement resembles Carnap's (1950) requirement of
total evidence, by which Carnap tried to avoid the so-called reference
class problem. In order to obtain a more useful notion of causality based
on the idea of precedence, one usually invokes a background theory on
which a restricted information set I is based. Granger suggests using the
term `prima facie cause' for this case, and this condition indeed corre-
sponds to Suppes' prima facie cause as well as Cox's candidate cause.
They all are probabilistic versions of conditions (ii) and (iii) of Humean
causation.
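A minimal sketch of such a test, under stated assumptions (simulated series, one lag, and the `universe' Uₜ replaced by the pair of series themselves), compares forecasting equations for y with and without lagged x by means of an F-test.

```python
# A minimal sketch of a restricted-information-set Granger test; the data are
# simulated and the single-lag specification is an illustrative assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T = 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t-1] + rng.normal()
    y[t] = 0.4 * y[t-1] + 0.3 * x[t-1] + rng.normal()   # x precedes y

Y = y[1:]
Z_r = np.column_stack([np.ones(T-1), y[:-1]])   # restricted: own lag only
Z_u = np.column_stack([Z_r, x[:-1]])            # unrestricted: adds lagged x

def rss(Z):
    beta = np.linalg.lstsq(Z, Y, rcond=None)[0]
    return np.sum((Y - Z @ beta) ** 2)

q = 1                                           # one exclusion restriction (lagged x)
k = Z_u.shape[1]                                # regressors in the unrestricted model
F = ((rss(Z_r) - rss(Z_u)) / q) / (rss(Z_u) / (len(Y) - k))
print("F =", F, " p-value =", 1 - stats.f.cdf(F, q, len(Y) - k))
```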
Granger's criterion has a number of weaknesses and many of them
have been discussed by Granger himself. I have already mentioned the
assumption of finite moments. For non-stationary processes, one has to filter the time series in order to make them stationary. There are many ways to do this. The causality test is sensitive to the choice of filtering. Furthermore, filtering data is likely to abolish relevant information and, therefore, violates Fisher's reductionist approach to inference as well as inference based on the modified simplicity postulate.
There are other weaknesses. According to Granger's definition, sunspots `Granger-cause' the business cycle (Sheehan and Grieves, 1982).
Sims (1972, p. 543) writes that the Granger causality test `does rest on
a sophisticated version of the post hoc ergo propter hoc principle'.
Whether the characterization `sophisticated' is right, depends on one's
assessment of the importance of the choice of background information, I,
in specific applications. This choice is crucial. Different choices of I may
give different causality results (as Sims' sequence of tests of money-income causality, starting with Sims, 1972, shows). Granger's test for causality is highly sensitive to specification uncertainty. The approach
is one of overreaching instrumentalism, where only predictive perfor-
mance matters. While the Wold approach to causal inference exaggerates
a priori economic theory, `Granger-causality' neglects considerations of
economic theory.
4.3.4 Cause and intervention
The notions of sufficient and necessary cause serve economic policy. If a policy maker knew the causes of recessions, he would like to neutralize the necessary cause of it, or create a sufficient cause for a boom. But such knowledge is still far from established. We have a number of candidate causes instead of necessary or sufficient causes, and
worse, those candidate causes operate in a changing environment. The
policy maker will be interested in how variables of interest react to
changes in policy variables (instruments). Hence, the interest in causal
inference is inspired by the desire to intervene. This notion is absent in the
interpretations of causality given so far (although Cartwright, 1989, chap-
ter 4, implicitly considers this issue).
Leamer (1985) discusses how causality should be related to interven-
tion. He arrives at the familiar notion of structural parameters (where a
parameter is said to be structural if it is invariant to a specified family of interventions). A causal relation is one with structural parameters, for a specific context of explicitly stated policy interventions. Leamer's definition adds an important element to Feigl's `predictability according to a law'. Without identifying the relevant intervention, Feigl's definition remains sterile. But without experimentation, the notion of intervention is obscure as well. A possible help may be the identification of great
historical events, such as wars and crashes. Leamer's view is that the
best way to learn causes is from studying history, econometrics being
of secondary interest only.
To summarize, while Wold imposes causality a priori, and Granger
proposes to test for causality by means of a prediction criterion,
Feigl's predictability according to a law takes a sensible position in
between. Zellner (1979) recommends Feigl's approach to causality, but
causality is only of interest if the law is relevant to a specific inter-
vention. The probable effect of the intervention is of interest to the
policy maker. This is a problem of probabilistic inference, not a meta-
physical speculation about the ontological meaning of cause or the
causal principle.
5 Science is prediction
5.1 Prediction as aim
5.1.1 Prediction and inference
Prediction may help to establish causality. But some investiga-
tors, in particular positivists, go further: they argue that prediction by
itself is the ultimate aim of science.¹⁷ Comte, for example, writes that prediction is the `unfailing test which distinguishes real science from vain erudition' (cited in Turner, 1986, p. 12): erudition, that is, plain description. A similar statement can be found in the writings of Ramsey (1929,
p. 151), who argues:
As opposed to a purely descriptive theory of science, mine may be called a fore-
casting theory. To regard a law as a summary of certain facts seems to me
inadequate; it is also an attitude of expectation for the future. The difference is
clearest in regard to chances; the facts summarized do not preclude an equal
chance for a coincidence which would be summarized by and, indeed, lead to a
quite different theory.
Ramsey's theory of science aims at prediction, as does Carnap's theory of
induction, and many other philosophies of science.
Econometricians tend to praise the predictive aim. Koopmans (1947,
p. 166) argues that prediction is, or should be, the most important objec-
tive of the study of business cycles.¹⁸ Why prediction is so important is
not always made clear. Sometimes, prediction is viewed as a means for
social engineering or control (Marschak, 1941, p. 448; Whittle, [1963]
1983). Others suggest that prediction is a prerequisite for testing, and
science is about testing (Friedman, Blaug). Koopmans invokes prediction
to demarcate superficial relations (those of Burns and Mitchell, 1946) from deep econometric relations (based on `theory'). Koopmans (1949, p. 89) claims (but does not substantiate his claim) that those `superficial
relations' lack stability and are, therefore, bad instruments for predic-
tions.
Another argument for prediction is that it underlies probabilistic infer-
ence. As Keynes and Fisher argue, statistics has a descriptive and an
inductive purpose. If description were the sole aim, it would be vacuous
as there would be no reason to `reduce' the data. The latter becomes
important if one wants to make inferences about new observations.
Here, the modified simplicity postulate becomes relevant, as it is a way
of dealing with the trade-off between accurate description and simplicity.
This trade-off is based on an inductive theory of prediction. The theory of
simplicity is based on Bayesian updating, a special version of prediction.
A radical point of view is expressed by Thomas Sargent. In the preface
to Whittle ([1963] 1983, p. v), Sargent claims that `to understand a phe-
nomena is to be able to predict it and to inuence it in predictable ways'.
This reflects the symmetry thesis of the logical positivists Hempel and
Oppenheim (see Blaug 1980, pp. 39), which states that prediction is
logically equivalent to explanation. Whittle is more cautious than
Sargent: prediction can be based upon recognition (`a compilation of
possible histories' analogous to the event to be predicted) of a regularity
as well as upon explanation of the regularity (Whittle, [1963] 1983, p. 2).
Explanation does not entail prediction, and one can extrapolate without
an underlying explanatory model. Prediction, Whittle argues, is rarely an
end in itself. For example, a prediction of next year's GNP may have as
its purpose the regulation of the economy. Here, the goal of inference is
not to optimize the accuracy of the prediction, but to optimize (for exam-
ple) a social welfare function.
5.1.2 Fishing for red herrings: prediction, novel facts and old
evidence
In his Treatise on Light (published in 1690) (alternatively, cited in
Giere, 1983, p. 274), Christiaan Huygens argues that the probability of a
hypothesis increases if three conditions are satisfied. First, if the hypoth-
esis is in agreement with observations, secondly, if this is the case in a
great number of instances, and most importantly,
when one conceives of and foresees new phenomena which must follow from the
hypothesis one employs, and which are found to agree with our expectations.
This view remains popular today, for example in the hypothetico-deduc-
tive philosophy of science (Carnap, Hempel) and in the conjecturalist
version of it (Popper, Lakatos).
Confirmations of predictions of novel facts increase the support for
theories. For this reason, prediction is an important theme in discussions
on the methodology of economics. Not only Friedman (1953) argues that
prediction is essential for appraising theories. Theil (1971, p. 545), Zellner
(1984, p. 30) and Blaug (1980) concur. There is a dissonant voice in this
choir, though: John Maynard Keynes, who states his opinion in a dis-
cussion of the views of Peirce. In 1883, Peirce wrote an essay on `a theory
of probable inference', in which he states the rule of predesignation: `a
hypothesis can only be received upon the ground of its having been
verified by successful prediction' (Peirce, 1955, p. 210). Peirce means to
say that one has to specify a statistical hypothesis in advance of examin-
ing the data (whether these data are new or already exist is irrelevant, this
seems to be why he prefers predesignation to prediction). Successful novel
predictions are necessary and sufficient for confirming a theory. Keynes
([1921] CW VIII, p. 337) does not agree:
The peculiar virtue of prediction or predesignation is altogether imaginary. The
number of instances examined and the analogy between them are the essential
points, and the question as to whether a particular hypothesis happens to be
propounded before or after their examination is quite irrelevant.
He continues:
[the view] that it is a positive advantage to approach statistical evidence without
pre-conceptions based on general grounds, because the temptation to `cook' the
evidence will prove otherwise to be irresistible, has no logical basis and need
only be considered when the impartiality of an investigator is in doubt. (p. 338)
Keynes distances himself from Peirce on the one hand, and the `truly
Baconian' approach of Darwin on the other hand. The point relates to
a problem in Bayesian inference, i.e. how to deal with old evidence. If
evidence has been observed already, how can it be interpreted probabil-
istically? According to one interpretation such evidence has a probability
value equal to one (it occurred). If so, old evidence cannot confirm a theory. This follows from Bayes' principle, if P(e) = 1:

P(h|e) = P(h)P(e|h) / P(e) = P(h).                (7)
However, in applications of Bayesian inference, one does not set P(e)
equal to one. Instead, P(e) figures as a normalizing constant by which the left hand side in (7) satisfies the axiom that probabilities sum to one.
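A small numerical sketch (hypothetical numbers) makes the contrast explicit.

```python
# A minimal numerical sketch of the 'old evidence' point; all probabilities
# are hypothetical. With P(e) used as a normalizing constant the evidence
# confirms h, whereas forcing P(e) = 1 (and hence P(e|h) = 1) reproduces (7):
# the posterior equals the prior.
p_h = 0.3                       # prior probability of hypothesis h
p_e_given_h = 0.9               # likelihood of the evidence e under h
p_e_given_not_h = 0.4           # likelihood of e under not-h

p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h   # normalizing constant
print("posterior, P(e) as normalizer:", p_h * p_e_given_h / p_e)   # about 0.49 > 0.3

print("posterior if P(e) = P(e|h) = 1:", p_h * 1.0 / 1.0)           # equals the prior
```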
In my view, the problem of old evidence is a philosophical red herring.
The contributions in Earman (1983), devoted to this issue, show that its
consumption is not nourishing. However, data-mining (fishing, it is some-
times called) is a problem for inference, whether Bayesian, frequentist or
not probabilistic at all. In his discussion with Tinbergen, Keynes com-
plains that he is not sure whether Tinbergen did not `cook' his results,
even though he had little reason to suspect the impartiality of Tinbergen.
Tinbergen's methods were rejected as a valid base for testing business
cycle theories. One reason is the lack of homogeneity. A second reason,
more clearly formulated by Friedman (1940, p. 639), is that empirical
regression equations are `tautological reformulations of selected eco-
nomic data' and as such not valid for testing hypotheses. The regression
equations remain useful for deriving hypotheses. The real test is to com-
pare the results with other sets of data, whether future, historical, or data
referring to different regions. Like Keynes, most econometricians accept
this view.
Friedman's point is (implicitly) supported in the survey on prediction
by the statisticians Ehrenberg and Bound (1993) as well. They note that
textbooks on statistics, and many papers in the literature on probability
theory, abound with claims like `least squares regression yields a ``best''
prediction', where `best' is defined according to some statistical criteria
(e.g. minimum mean square error). They complain that `statistical texts
and journals are obsessed with purely deductive techniques of statistical
inference' (p. 188). Although not unimportant, this preoccupation with
statistical criteria tends to blur the importance of specification uncer-
tainty. What is needed is less focus on `best' and more effort to try models
in `many sets of data' (MSOD) rather than single sets of data (SSOD). It
is essential to vary the conditions in which models are applied. This yields
more support to ceteris paribus conditions, hence more reliable inference.
What is of interest is to widen the range of conditions under which a
prediction stands up.
5.2 The Austrian critique on prediction and scientism
Not all phenomena can be predicted: many are unique or not reprodu-
cible. Economists in the Austrian tradition are characterized by a funda-
mental scepticism of the positivist aims of observation, measurement and
prediction. These economists reject all that econometricians stand for. In
his exposition of Austrian methodology, Israel Kirzner (1976, p. 43)
argues that
Our dissatisfaction with empirical work and our suspicion of measurement rest on
the conviction that empirical observations of past human choices will not yield
any regularities or any consistent pattern that may be safely extrapolated beyond
the existing data at hand to yield scientific theorems of universal applicability.
This is as radical a scepticism as scepticism can be. It goes well beyond
Humean scepticism (Hume argued that we merely observe regularities,
but cannot infer causal relations from these). Although some economists
claim universal applicability for their `laws', most economists are more
modest. Kirzner attacks a straw man, like Robbins did before.
Kirzner's argument is weak for another reason as well. The fact that
human action is to some degree erratic does not imply that it is wholly
unpredictable, either on the individual level or in the aggregate. If there is
no scope for predictions, human action is impossible. One might invoke
Hume's view that we cannot help but make inductions, or Keynes, who
praises convention, or philosophers of probability, who argue that it is
rational to make inferences and predictions. Although we are not abso-
lutely certain that the sun will rise tomorrow, or that consumption of gas
declines as its price rises, this does not prevent us from making predic-
tions and acting upon those predictions. The Austrian doctrine is that we
do have knowledge on the effect of price changes on gas consumption,
but this knowledge results from introspection. This is the Kantian a priori
response to scepticism.
Hayek (1989) is more subtle than Kirzner. Although Hayek criticizes
the pretence of knowledge that economists tend to have, he argues that
pattern prediction is feasible (and in 1933, the `young' Hayek even argued
that statistical research is `meaningless except in so far as it leads to a
forecast'; cited in Hutchison, 1981, p. 211). Pattern prediction should be
distinguished from prediction of individual events, which is not possible.
Hayek warns that we should not use pattern predictions for the purpose
of intervention, as interventions are unlikely to yield the desired effects.¹⁹ This is a hypothesis, though, not a fact, and econometric inference may
serve to appraise the hypothesis.
Mathematical reasoning and algebraic equations may be helpful in
gaining knowledge of the economy, but one should be careful in supple-
menting these tools with statistics. This leads to the illusion, Hayek (1989,
p. 5) writes,
that we can use this technique for the determination and prediction of the numer-
ical values of those magnitudes; and this has led to a vain search for quantitative
or numerical constants.
The pretence of knowledge lies in the claim that mathematical models,
supplemented by numerical parameters (obtained with econometric
tools) can be used to intervene in the economy and predict or obtain
specic goals. Hayek is sceptical about the achievements of econometri-
cians, as (he argues) they have made no significant contribution to the
theoretical understanding of the economy. But one could also accuse
Hayek, and, more generally, economists working in the Austrian tradi-
tion, of a very different pretence of knowledge. Their claim, that we may
trust an `unmistakable inner voice' (an expression due to Friedrich von
Wieser, cited in Hutchison, 1981, p. 206), easily leads to dogmatism: is it not a pretence of knowledge that our inner voice is unmistaken?
Caldwell, the advocate of methodological pluralism, accepts the
Austrian critique of forecasting for an odd reason. Austrians reject pre-
diction as a way of testing theories because of the absence of natural
constants. If we want to evaluate Austrian economics, Caldwell (1982,
p. 123) writes, then we should `focus on the verbal chain of logic rather
than on the predictions of the theory'. Hence, we should not engage in
`external criticism' (p. 248). By the same argument, we must accept that
Friedman's monetarism should be evaluated on ground of its predictions.
You cannot have it both ways: to evaluate monetarism on the basis of
predictions, while evaluating Austrian economics on the basis of intro-
spection or `sound reasoning'. The modified simplicity postulate takes account of `sound reasoning' as well as predictive performance. It does not depend on unjustified pluralism, but on a proper trade-off between the different ingredients of scientific arguments.
6 Testing
Testing belongs to the basic pastimes of econometricians. A casual inves-
tigation of titles of papers shows that there is a lot of `testing' in the
econometric literature, though not quite as much `evidence'.²⁰ Why do
econometricians test so much? And what is its significance? Sometimes
one wonders about the abundance of tests reported in empirical papers as
the purpose of all those tests is not always communicated to the reader.
The search for homogeneity in consumer demand illustrates the frequent
irrelevance of a positive or negative test result.
Statistical inference is usually understood as estimation and testing. In
the following, I discuss some aims and methods of testing.
6.1 Why test?
6.1.1 Theory testing
The founder of modern statistics, R. A. Fisher ([1925] 1973, p. 2)
wrote: `Statistical methods are essential to social studies, and it is princi-
pally by the aid of such methods that these studies may be raised to the
rank of sciences.' Fisher nowhere discusses the problem of testing rival
theories, which would be the Popperian aim. There is, however, an epi-
sode in his research activities in which he was involved in a real scientific dispute: the question whether smoking causes lung cancer (see Fisher Box, 1978, pp. 472-6).²¹ Confirmative evidence came from a large-scale
study performed in Britain in 1952: it revealed a clear positive association
between lung cancer and smoking. Fisher challenged the results: correla-
tion does not prove causation. Fisher's alternative was that there might
be a genetic cause of smoking as well as of cancer.
How can these rival theories be evaluated? Fisher's argument is one of
omitted variables. One either needs information on the disputed variable
in order to make correct inferences, or the experimental design should be
adapted. In the latter case, one could think of constructing two rando-
mized groups, force members of group I to smoke and force the others to
refrain from smoking. This is not practical. Fisher, the geneticist, resorted
to an alternative device: identical twins. As a research technique, it resem-
bles the `natural experiments' discussed in chapter 6. It did not resolve the
dispute.
Even in this conflict, Fisher had positivist aims: measurement of parameters. The influence of Popperian thought resulted in a shift from esti-
mating to testing as the hallmark of science. Prominent members of the
Cowles Commission, like Haavelmo and Koopmans, advocated this
approach. More recently, hypothetico-deductive econometrics (see
Stigum, 1990) has been embraced by followers of the new-classical
school, who search for `deep' (structural) parameters. A characteristic
view of a mainstream econometric theorist (Engle, 1984, p. 776), is that
if `the confrontation of economic theories with observable phenomena is
the objective of empirical research, then hypothesis testing is the primary
tool of analysis'.
The mainstream view is not without critics. Hahn (1992) writes, `I
know of no economic theory which all reasonable people would agree
to have been falsified.' This is a theorist's challenge to empirical research-
ers. Hahn's verdict is that we need more thinking (theory), less empirical
econometrics. McCloskey (1985, p. 182) agrees with Hahn's view: `no
proposition about economic behaviour has yet been overturned by
econometrics'. Summers (1991, p. 133) adds, `[i]t is difficult to think
today of many empirical studies from more than a decade ago whose
bottom line was a parameter estimate or the acceptance or rejection of a
hypothesis'. Summers argues that the formalistic approach to theory
testing, in the tradition of Cowles and elaborated by new-classical econo-
metricians, leads merely to a `scientific illusion'.
Formal econometric hypothesis testing has an unpersuasive track
record. One of the few econometricians who explicitly acknowledge this
is Spanos (1986, p. 660):
no economic theory was ever abandoned because it was rejected by some empiri-
cal econometric test, nor was a clear-cut decision between competing theories
made in lieu of such a test.
But despite the lack of success, Spanos still aims at theory testing. He
claims that it can be achieved by more careful specification of statistical models and relies strongly on mis-specification testing.
An important reason for the popularity of theory testing is that it is
thought to be a major if not the main ingredient of scientific progress (Popper, [1935] 1968; Stigler, 1965, p. 12; Blaug, 1980), the best way to move from alchemy to science (Hendry, 1980). Falsificationism has had a
strong impact on the minds of economists. Popper is about the only
philosopher of science occasionally quoted in Econometrica. In the phi-
losophy of science literature, however, falsificationism has become increasingly unpopular, not least because actual science rarely follows the Popperian maxims. As Hacking (1983, p. 15) notes, `accepting
and rejecting is a rather minor part of science' (see also the contributions
in Earman, 1983, and those in De Marchi, 1988).
Insofar as theory testing is an interesting aim at all, it does not follow that econometrics is the best tool for this purpose. Identifying informa-
tive historical episodes or devising laboratory experiments (increasingly
popular among game theorists, who rarely supplement their experiments
with statistical analysis, as casual reading of such experimental reports in
Econometrica suggests) may generate more effective tests than Uniformly
Most Powerful tests. In the natural sciences, theory testing rarely results
from sophisticated statistical considerations. Giere (1988, p. 190) dis-
cusses the different attitudes towards data appraisal in nuclear physics
and the social sciences. Nuclear physicists seem to judge the t between
empirical and theoretical models primarily on qualitative arguments. Test
statistics such as
2
are rarely reported in nuclear physics papers con-
tained in, for example, the Physical Review (Baird, 1988, makes a similar
observation). Hence, theory (or hypothesis) testing clearly does not
always depend upon the tools we learned in our statistics course.
6.1.2 Validity testing
A different kind of testing, ignored in philosophical writings but
probably more important in actual empirical research than theory testing,
is validity testing (mis-specification or diagnostic testing). Validity tests are performed in order to find out whether the statistical assumptions
made for an empirical model are credible. The paper of Hendry and
Ericsson (1991) is an example of extensive validity testing. These authors
argue that, in order to pursue a theory test, one first has to be sure of the
validity of the statistical assumptions that are made. In their view, valid-
ity testing is a necessary pre-condition to theory testing. However, even if
theory testing is not the ultimate aim, validity testing still may be impor-
tant. Much empirical work aims to show that a particular model (for-
mally or informally related to some theory) is able to represent the data.
If much information in the data remains unexploited (for example,
revealed by non-white-noise residuals), this representation will be suspect
or unconvincing to a large part of the audience.
The importance of validity tests should not be over-emphasized, at
least, they should be interpreted with care. One may obtain a very neat
`valid' statistical equation of some economic phenomenon, after extensive
torturing of the data. Such a specification may suggest much more precise
knowledge than the data actually contain. Sensitivity analysis, for exam-
ple along the lines of Leamer (1978) (see chapter 7), or such as performed
by Friedman and Schwartz (1963), is at least as important as validity
testing in order to make credible inferences. Illuminating in this context is
the exchange between Hendry and Ericsson (1991) and Friedman and
Schwartz (1991), discussed in detail in Keuzenkamp and McAleer (1999).
6.1.3 Simplication testing
A third aim of testing is simplication testing. Simple models
that do not perform notably worse than more complex ones are typically
preferred to the complex ones (see chapter 5). Inference conditional on
exogeneity assumptions is often preferred to analysing `romantic' (Bunge,
[1959] 1979) models where everything depends on everything and for
which Walrasian econometricians advocate full information methods
(Haavelmo and Koopmans belong to this category). Simplicity not
only serves convenience and communication, but is needed for an epis-
temological theory of inference.
6.1.4 Decision making
Finally, a frequently expressed goal of testing is decision making.
This view on testing, and its implementation in statistics, is primarily due
to the Neyman-Pearson (1928; 1933a, b) theory of inductive behaviour.
The decision-theoretic approach of testing for the purpose of inductive
behaviour has been further elaborated by Wald and, from a Bayesian
perspective, by Savage ([1954] 1972). Lehmann ([1959] 1986) is the
authoritative reference for the frequentist approach, while Berger
([1980] 1985) provides the Bayesian arguments.
Decision making, based on statistical acceptance rules, can be impor-
tant for process quality control, but may even be extended to the apprai-
sal of theories. This brings us back to theory testing as a principal aim of
testing. Lakatos claims that the NeymanPearson version of theory test-
ing `rests completely on methodological falsicationism' (Lakatos, 1978,
p. 25 n.). In chapter 3, section 4.4, I criticized this view as historically
incorrect and analytically dubious.
6.2 Methods of testing
6.2.1 General remarks
The most familiar methods of testing in econometrics are based
on the work of Fisher, and on the Neyman-Pearson theory of testing. To summarize the relevant issues that were discussed in chapter 3, Fisher's theory of significance testing is based on the following features. First, it
relies on tail areas (P-values). Secondly, it is intended for small samples.
Thirdly, it is meant for inductive scientific inference. These features are clearly different from those of the Neyman-Pearson theory of testing. This is based, first, on an emphasis on size and power, leading to UMP tests. Secondly, Neyman-Pearson tests are meant for the context of
repeated sampling. Finally, they are instruments for inductive behaviour
and making decisions.
In chapter 3, I analysed some shortcomings of both methods. Despite
these shortcomings, there are few close rivals to the blend of Fisher's and
Neyman-Pearson's methods that pervaded econometrics. Engle (1984), for example, gives an overview of test procedures, all based on Neyman-Pearson principles. Bayesian testing has become popular only in a few specific cases (e.g. the analysis of unit roots). Importance testing (practised by Tinbergen and, in different forms, advocated by Leamer and Varian) is not very popular. The implementation of Neyman-Pearson
methods at the practical level is not easy, though. There is a wide diver-
gence between empirical econometrics and the maxims of a `celibate
priesthood of statistical theorists', as Leamer (1978, p. vi) observes.
The popularity of data-mining is particularly hard to combine with the
equally popular Neyman-Pearson-based methods advocated in textbooks and many journal papers. Hendry (1992, p. 369) rightly notes that test statistics can be made insignificant by construction, as residuals are
derived, not autonomous processes. The problem of interpreting the
resulting test statistics remains unsolved today (see Godfrey, 1989, p. 3;
also Leamer, 1978, p. 5).
The following sections deal more explicitly with alternative statistical
methods to test rival theories: both from a frequentist and from a
Bayesian perspective.
6.2.2 Frequentist approaches to testing rival theories
In the frequentist approach there are three different strategies for
testing rival models: the embedding or comprehensive model strategy, the
generalized likelihood ratio strategy and the symmetric (or equivalence)
strategy. These approaches will be discussed below.
Comprehensive testing The comprehensive model is a nesting of
the competing models in a more general, embedding model. Two rival
models M₁ and M₂ have been formulated with reference to an N × 1 regressand vector y. Assume both are linear models, represented by:

Mᵢ: y = Wγᵢ + Xᵢβᵢ + uᵢ,   uᵢ ~ N(0, σᵢ²I),   i = 1, 2,                (8)
where W (N × K_w) is a regressor matrix common to both models (K_w ≥ 0), and Xᵢ are N × Kᵢ regressor matrices unique to either model. Testing these hypotheses can be done by regarding them as two restricted forms of a more general model, the embedding model:

M*: y = Wγ* + X₁β₁* + X₂β₂* + u*,   u* ~ N(0, σ²I).                (9)
The simplest test is to test the hypotheses Hᵢ: βᵢ* = 0 (using an F-test, for example; see MacKinnon, 1983, pp. 95-6 for discussion and comparison with other tests). A difficulty is that there are four possible test outcomes: the combinations of rejections or acceptances of the hypothesis that βᵢ* = 0. If either H₁ or H₂ is rejected, and the other accepted, then we have a `desirable' test outcome that provides the required information. If both hypotheses are rejected (or accepted), the researcher enters a state of confusion. This may become even worse, if, in addition, a joint hypothesis β₁* = β₂* = 0 is tested, as the outcome of this hypothesis test may be in conflict with the separate tests (see Gaver and Geisel, 1974, p. 56).
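A minimal sketch of this strategy, with simulated data and illustrative dimensions (the names and parameter values are assumptions), nests both rivals in (9) and F-tests the exclusion of each model's unique regressors in turn.

```python
# A minimal sketch of the embedding strategy (9); data, dimensions and
# parameter values are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N = 300
W  = np.ones((N, 1))                          # regressors common to both models
X1 = rng.normal(size=(N, 2))                  # regressors unique to M1
X2 = rng.normal(size=(N, 2))                  # regressors unique to M2
y  = W @ np.array([1.0]) + X1 @ np.array([0.8, -0.5]) + rng.normal(0, 1, N)  # M1 true

X_full = np.column_stack([W, X1, X2])         # embedding model M*

def rss(X):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ beta) ** 2)

def f_test(X_restricted, q):
    """F-test of q exclusion restrictions against the embedding model."""
    dof = N - X_full.shape[1]
    F = ((rss(X_restricted) - rss(X_full)) / q) / (rss(X_full) / dof)
    return F, 1 - stats.f.cdf(F, q, dof)

print("H1: beta1* = 0:", f_test(np.column_stack([W, X2]), X1.shape[1]))  # rejected
print("H2: beta2* = 0:", f_test(np.column_stack([W, X1]), X2.shape[1]))  # typically not rejected
```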
A further complication of the comprehensive testing approach arises
from the fact that the embedding model presented here is not the only
conceivable one. One can also construct a comprehensive model by
means of a weighted average of the two models. In fact, there are numer-
ous ways to combine rival models. For example, by considering the
likelihood functions of M₁ and M₂, one can obtain the comprehensive likelihood function L(L₁, L₂, λ) as an exponential mixture (due to A. C. Atkinson but already suggested by Cox, 1961, p. 110), or as a convex combination (due to Quandt), etc. The exponential mixture is particularly convenient, as (in the univariate linear model) it yields the comprehensive model (with W for simplicity incorporated in the matrices Xᵢ),

y = (1 − λ)X₁β₁ + λX₂β₂ + u.                (10)
One can test whether λ differs from zero or one, but a problem is that this parameter is not always identifiable. If λ is exactly equal to one of those extremes, the parameters of one of the sub-models do not appear in the comprehensive likelihood function which invalidates the testing procedure (see Pesaran, 1982, p. 268).
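One way to make the convex-combination idea operational, not discussed in the text, is the Davidson-MacKinnon J-test, in which the fitted values of the rival model M₂ are added to M₁ and the coefficient on them, an estimate of λ, is t-tested. The sketch below uses simulated data and illustrative names.

```python
# A minimal sketch of the Davidson-MacKinnon J-test, one operational version
# of the convex combination in (10); data, names and parameter values are
# illustrative assumptions, not taken from the text.
import numpy as np

rng = np.random.default_rng(4)
N = 400
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)                   # rival regressor, correlated with x1
X1 = np.column_stack([np.ones(N), x1])               # model M1
X2 = np.column_stack([np.ones(N), x2])               # model M2
y = X1 @ np.array([1.0, 2.0]) + rng.normal(0, 1, N)  # data actually generated by M1

def ols(X, z):
    return np.linalg.lstsq(X, z, rcond=None)[0]

y2_hat = X2 @ ols(X2, y)                             # fitted values of the rival model M2

Z = np.column_stack([X1, y2_hat])                    # M1 augmented with M2's fitted values
coef = ols(Z, y)
resid = y - Z @ coef
s2 = resid @ resid / (N - Z.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))
print("lambda_hat =", coef[-1], " t =", coef[-1] / se[-1])  # t should be insignificant here
```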
A final problem of comprehensive testing is that the test properties (in terms of Type I and Type II errors) are unknown if the variables are non-stationary (and, of course, more generally if the models result from a specification search).
The generalized likelihood ratio test An alternative to the com-
prehensive model approach is due to Cox (1961, 1962), who provides an
analysis of tests of `separate' families of hypotheses based on a general-
ization of the Neyman-Pearson likelihood ratio test. To define terms, consider again the models M₁ and M₂. By separate families of hypotheses, Cox (1962, p. 406) refers to models that cannot be obtained from each other by a suitable limiting approximation. Vuong (1989) gives more precise definitions. He calls this the strictly non-nested case, defined by the condition,

M₁ ∩ M₂ = ∅.                (11)
As an example, one may think of two regression models with different
assumptions regarding the stochastic process governing the errors (e.g.
normal and logistic). The comprehensive approach, on the other hand,
relies on nested models, defined by

M₁ ⊂ M₂.                (12)
An intermediate case, which is the most interesting for econometrics,
consists of overlapping models, with
M₁ ∩ M₂ ≠ ∅,   M₁ ⊄ M₂,   M₂ ⊄ M₁.                (13)
I will concentrate on the case of overlapping models. First note that the
Neyman-Pearson likelihood ratio test does not apply in this case. If Lᵢ denotes the maximized likelihood of Mᵢ, one can calculate the likelihood ratio,

L₁,₂ = L₁ / L₂.                (14)
The likelihood ratio test,

LR = −2 ln(L₁ / L₂),                (15)

has an asymptotic χ² distribution under the null, model M₁ for example. However, this is only the case if M₁ ⊂ M₂ (as χ² ≥ 0). In other words, the
plain likelihood ratio test can be applied in the context of the compre-
hensive approach, but not in cases where one wants to evaluate `separate'
families of hypotheses or overlapping models. If one is not particularly
interested in significance testing but in model discrimination, then it may still be useful to consider the maximized likelihoods and select the model with the higher one (this is also noted in Cox, 1962), perhaps after considering matters such as simplicity using the modified simplicity postulate. But Cox aims at constructing a proper significance test in the spirit of the Neyman-Pearson framework in the absence of prior probability distributions.²² This test is an elaboration of the simple likelihood ratio test (Cox, 1961, p. 114):
LRC = LR₁,₂ − E₁(L₁,₂),                (16)

where L₁,₂ is the log-likelihood ratio, E₁{L₁,₂} is the expected value (if M₁ were true) of the difference between the sample log-likelihood ratio given M₁ and the sample log-likelihood ratio given M₂ (the likelihoods are evaluated at the maximum likelihood estimates). Cox (1961) shows that under the null, LRC is asymptotically normal.
An essential feature of this procedure is that it is asymmetric: inference
may depend on whether M₁ or M₂ is chosen as the reference hypothesis, and it may happen that M₁ is rejected if M₂ is the reference model and vice versa, in which case there is inconsistent evidence. If the problem is not one of model selection but specification search, this may be prob-
lematic: it gives a hint that `something' is wrong.
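Short of a formal significance test, the model-discrimination route mentioned above, comparing maximized likelihoods, is straightforward to carry out. The sketch below fits two `separate' families (normal and Laplace errors, an illustrative assumption) to the same simulated sample and reports their maximized log-likelihoods.

```python
# A minimal sketch of model discrimination by maximized log-likelihoods for
# two 'separate' families; the simulated data and the choice of normal versus
# Laplace errors are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = rng.laplace(loc=0.0, scale=1.0, size=1000)   # data actually drawn from a Laplace

loc_n, scale_n = stats.norm.fit(y)               # maximum likelihood fit, normal family
loc_l, scale_l = stats.laplace.fit(y)            # maximum likelihood fit, Laplace family

ll_norm = stats.norm.logpdf(y, loc_n, scale_n).sum()
ll_lapl = stats.laplace.logpdf(y, loc_l, scale_l).sum()
print("max log-likelihood, normal :", ll_norm)
print("max log-likelihood, Laplace:", ll_lapl)   # typically the larger of the two here
```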
The Cox test first appeared in econometrics in the early 1970s (see, e.g.,
the surveys by Gaver and Geisel, 1974, and MacKinnon, 1983). Hendry,
Mizon and Richard use it in their theory of encompassing, which is used
for mis-specification analysis as well as testing rival theories (see, e.g.,
Mizon and Richard, 1986). In practice, they rely on linear embedding
models with simple tests for adding a subset (excluding the overlapping
variables) of variables of a rival model to the reference model. For this
purpose, a set of t-tests or an F-test (see the discussion of embedding,
above) can be used (`parameter encompassing'), plus a one degree
of freedom test on the variances of the rival models (`variance
encompassing').
Symmetric testing A different frequentist approach to the prob-
lem of model choice is presented in Vuong (1989), who drops the assump-
tion that either one or both models should be `true' in the statistical sense.
This is the third strategy in the frequentist domain, the symmetric or
equivalence approach (which dates back to work of Harold Hotelling
in 1940). Vuong provides a general framework in which nested, non-
nested or overlapping models can be compared, with the possibility
that one or both of the models being compared may be mis-specified.
He uses the Kullback–Leibler Information Criterion (KLIC), which mea-
sures the distance between a given distribution and the `true' distribution.
The idea is `to define the ``best'' model among a collection of competing
models to be the model that is closest to the true distribution' (Vuong
1989, p. 309).
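A minimal sketch (mine, not Vuong's exposition) of the idea: the statistic below contrasts the averaged pointwise log-likelihoods of two rival regression models, the sample counterpart of the KLIC comparison. The degrees-of-freedom correction and the variance pre-test that Vuong prescribes for overlapping models are omitted, and the data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def pointwise_loglik(y, X):
    """Per-observation Gaussian log-likelihood of a linear regression at its ML estimates."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)
    return -0.5 * (np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)

n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 0.6 * x1 + 0.3 * x2 + rng.normal(size=n)

l1 = pointwise_loglik(y, np.column_stack([np.ones(n), x1]))   # model M1
l2 = pointwise_loglik(y, np.column_stack([np.ones(n), x2]))   # model M2

d = l1 - l2                                       # pointwise log-likelihood ratios
vuong_z = d.sum() / (np.sqrt(n) * d.std(ddof=1))  # approximately N(0,1) for distinct models
print(vuong_z)
```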
Let the conditional density functions of the `true' model be given as
h_0(y_t|z_t). The data x_t = (y_t, z_t) are assumed to be independent and iden-
tically distributed; therefore, the approach is not directly applicable to
time series models. The `true' model is not known, but there are two
alternative models, parameterized by β and γ, respectively. If they do
not coincide with the `true' model, they are mis-specified. In this case,
one can use quasi-maximum likelihood techniques to obtain the `pseudo-
true' values of the parameters, denoted by an asterisk (see Greene, 1993,
for discussion of this method). Given the vectors of observations {y, z},
the KLIC for comparing the distance between the `true' model and one of
the alternatives, e.g. the one parameterized by γ, is defined as:
\mathrm{KLIC}(H_0^{y|z}; M_\gamma) = E_0[\ln h_0(y|z)] - E_0[\ln f(y|z; \gamma^*)],  (17)

where M_γ denotes the model parameterized by γ.

f(y|M_i) = \int f(y|\theta_i, M_i)\, f(\theta_i|M_i)\, d\theta_i,  (19)
where f(y|θ_i, M_i) is the familiar likelihood function. To obtain informa-
tion on the comparative credibility of the rival models given by the data,
one can calculate f(M_i|y). This is a straightforward application of Bayes'
theorem. The additional ingredients are the prior probabilities of the
models, denoted by f(M_i). If, a priori, both models are equally credible,
it is natural to set f(M_i) equal to 0.5. These probabilities are revised in the
light of the data to posterior probabilities:
light of the data to posterior probabilities:
f (M
i
[y) =
f (M
i
)f (y[M
i
)
f (y)
. (20)
The predictive density of future observations, y_f, given observed data y,
can be obtained from the predictive densities of those observations as given
by either of the rival models, weighted by the credibility of each model:

f(y_f|y) = f(M_1|y)\, f(y_f|y, M_1) + f(M_2|y)\, f(y_f|y, M_2).  (21)
where f(y_f|y, M_i) is obtained by integrating out the parameters of model
M_i from the predictive density of new observations (weighted by the prior
probability density function (pdf) of the parameters).
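A small numerical sketch (my numbers, not the book's) of how equation (21) combines the two predictive densities; the posterior model probabilities and the Gaussian predictive densities are simply assumed:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

w1, w2 = 0.6, 0.4          # posterior model probabilities f(M1|y), f(M2|y)
yf = 1.2                   # a candidate future observation

# f(yf|y, Mi): predictive densities with parameter uncertainty already integrated out
f1 = normal_pdf(yf, mu=1.0, sigma=0.5)
f2 = normal_pdf(yf, mu=2.0, sigma=0.8)

combined = w1 * f1 + w2 * f2   # eq. (21): evidence combined, not sieved
print(f1, f2, combined)
```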
The predictive approach to inference does not force one to choose
between the models. Optimal inference on future data depends on both
(two, or even more) models. Evidence is combined, not sieved. This is a
strength as well as a weakness. It does not make sense to invest in calcu-
lating the predictive pdf obtained by models that are quite off the mark
(such a model would carry a low weight in the combined prediction).
Similarly, parsimony suggests that if both models yield similar predictive
results, one might just as well take only one of the models for making a
prediction. Hence, it may be desirable to make a choice. This can be done
by extending the predictive approach with a decision-theoretic analysis.
The decision theoretic approach The decision theoretic approach
to the evaluation of rival models is based on the posterior odds
approach, due to Jeffreys ([1939] 1961). The marginal densities of the
observations for the rival models are compared by means of the posterior
odds ratio,
POR = \frac{f(M_1|y)}{f(M_2|y)} = \frac{f(M_1)}{f(M_2)} \cdot \frac{f(y|M_1)}{f(y|M_2)}.  (22)
The posterior odds ratio is equal to the prior odds times the ratio of the
weighted or averaged likelihoods of the two models. This is the so-called
Bayes factor, the ratio of the marginal densities of y in the light of M_1
(with parameters θ_1) and M_2 (with parameters θ_2) respectively. The
Bayes factor reveals the quality of the representations.
The averaging of the likelihoods is the result of the uncertainty about
the unknown parameters θ_i. The posterior odds ratio is obtained by
integrating this uncertainty out:

POR = \frac{f(M_1|y)}{f(M_2|y)} = \frac{f(M_1)}{f(M_2)} \cdot \frac{\int f(y|\theta_1, M_1)\, f(\theta_1|M_1)\, d\theta_1}{\int f(y|\theta_2, M_2)\, f(\theta_2|M_2)\, d\theta_2}.  (23)
This expression makes clear how Jeffreys' posterior odds approach
relates to the Neyman–Pearson likelihood ratio test. The latter is based
exclusively on the fraction f(y|θ_1, M_1)/f(y|θ_2, M_2), evaluated at the maxi-
mum likelihood estimates; hence, it only depends on the goodness of fit
of the models. There is no consideration of the prior odds of the models,
unlike in the posterior odds analysis. In the case of comparing two linear
regression models, Zellner (1971, pp. 310–11) shows that the posterior
odds, on the other hand, depend on the prior odds of the rival models, the
precision of the prior and posterior distributions of the parameters, good-
ness of fit, and the extent to which prior information agrees with infor-
mation derived from the data. The goodness-of-fit factor dominates if the
sample size grows to infinity.
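The following sketch (mine, not the book's) works out equations (22) and (23) for a case in which the integrals have closed forms: binomial data with a Beta prior on the success probability under each rival model. The data, the two priors and the equal prior odds are illustrative assumptions.

```python
from math import comb, exp, lgamma, log

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n, a, b):
    """log f(y|M): k successes in n trials with the success probability integrated
    out against a Beta(a, b) prior, as in the numerator and denominator of eq. (23)."""
    return log(comb(n, k)) + log_beta(a + k, b + n - k) - log_beta(a, b)

k, n = 62, 100                               # made-up data
log_m1 = log_marginal(k, n, 1.0, 1.0)        # M1: flat Beta(1,1) prior
log_m2 = log_marginal(k, n, 20.0, 20.0)      # M2: prior concentrated near one half

prior_odds = 1.0                             # f(M1)/f(M2): equal prior credibility
bayes_factor = exp(log_m1 - log_m2)          # ratio of averaged likelihoods
posterior_odds = prior_odds * bayes_factor   # eq. (22)
print(bayes_factor, posterior_odds)
```

In this example the two rivals share the same likelihood function and differ only in their priors, so the maximized likelihoods coincide while the averaged likelihoods do not: the discrimination comes entirely from the averaging, which is precisely the contrast with the Neyman–Pearson ratio drawn above.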
In practical applications, the Bayesian researcher usually sets this prior
odds ratio equal to one, in order to give both models an equal chance.
Private individuals can adjust this ratio with their subjective odds as they
like. A more fundamental difference between the Neyman–Pearson
approach and posterior odds is that the first is based on the maxima of
the likelihood functions, while the latter is based on averages of likeli-
hood functions. For averaging the likelihood functions, one needs prior
probability distributions (or densities), a reason why some investigators
do not like the posterior odds approach. One may interpret the difference
as a dilemma in constructing index numbers (Leamer, 1978, pp. 100–8):
few statisticians propose the highest observed price as the ideal `index' for
consumption goods' prices. The likelihood ratio test is like this `index'. If
likelihood functions are relatively flat (compare: most elements of the
class to be summarized in the index have values close to the maximum),
this may be not too bad an approximation to the ideal index. If, however,
data are informative (for example, due to a large sample size), the
Neyman–Pearson likelihood ratio test statistic and its relatives, Cox's
test and Vuong's test, are unsatisfactory: the maximum values of the
two likelihood functions are not representative for the likelihood func-
tions as a whole. This leads to conflicting results between likelihood ratio
tests and posterior odds ratios, if one hypothesis is a sharp null and the
alternative a composite hypothesis.
Another distinction between the Neyman–Pearson approach and
Jeffreys' posterior odds ratio is that the latter does not depend on
hypothetical events but on the observations and prior probability
densities.
The decision theoretic approach builds on Jeffreys' posterior odds by
considering the possible losses of making wrong decisions (see Savage,
[1954] 1972; Gaver and Geisel, 1974; Berger, [1980] 1985). An example is
the MELO, or Minimum Expected Loss, approach. Consider the follow-
ing very simple loss structure. Given the `correctness' of either model M_1
or M_2, a loss structure with respect to the rival models might be that
accepting a `correct' model involves zero loss, while accepting the wrong
model leads to a positive loss of L. A more complicated loss structure
might be one that is based on prediction errors (see Zellner, 1984, p. 239,
for discussion of squared error loss). The posterior expected loss of
choosing model i and rejecting j for the simple example given here is

EL(M_i) = 0 \cdot f(M_i|y) + L \cdot f(M_j|y).  (26)
In general, the model with the lowest expected loss will be chosen. The
Bayesian decision theoretic approach uses the posterior odds ratio but
adds an appropriate loss structure to the decision problem. M_1 will be
chosen if the weighted (averaged) likelihood ratio, f(y|M_1)/f(y|M_2),
exceeds the prior expected loss ratio.
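A tiny worked example (with assumed numbers) of the expected-loss comparison in equation (26):

```python
# Simple 0/L loss structure of eq. (26); the posterior model probabilities are assumed.
p_m1, p_m2 = 0.7, 0.3            # f(M1|y), f(M2|y)
L = 1.0                          # loss of accepting the wrong model

el_m1 = 0 * p_m1 + L * p_m2      # expected loss of choosing M1
el_m2 = 0 * p_m2 + L * p_m1      # expected loss of choosing M2
print(el_m1, el_m2, "choose M1" if el_m1 < el_m2 else "choose M2")
```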
7 Summary
The beginning of this chapter sketched the positivists' view on science.
This view, I argued, is still relevant to econometrics. It is characterized by
an emphasis on observation, measurement and verification. Furthermore,
positivists do not believe in the need to search for `true causes' and tend
to be sceptical about deep parameters. They are Humeans, without
embracing Hume's response to scepticism.
The aims of measurement, causal inference, prediction and testing
seem to compete. If, however, we take a step back and reconsider the
theory of simplicity, then it becomes possible to make order out of the
chaos of rival aims. Description is useful as it serves prediction.
Explanation is useful as well (here, some hard-line positivists might dis-
agree), as it allows the use of analogy and the construction of a coherent
theory (minimizing the number of auxiliary hypotheses), which is eco-
nomic and again, this serves prediction. The final section on testing rival
theories showed that the Bayesian approach allows for a very useful
combination of prediction and testing. A great advantage is that it
does not rely on the existence of one `true' model.
The discussion on the meaning of causal inference for econometrics
may be clarified by accepting Hume's scepticism, without concluding that
`everything would have been produced by everything and at random' (as
Sextus Empiricus wrote). To say that, for example, the minimum wage
causes unemployment, is `convenient shorthand' for the rational expecta-
tion that an increase of the minimum wage will be accompanied by an
increase in unemployment, and more importantly, that an intervention in
the minimum wage with a 1% change will result in an x% change in
unemployment. This is perfectly legitimate, even for a Humean sceptic.
The rational expectation is supported by empirical research: measure-
ment, inference, identification of `historical experiments', perhaps even
proper econometric tests. Conditional prediction is the principal aim of
economics: in the end, it is the rational expectation of the effect of inter-
ventions that makes economic analysis relevant.
The aims of econometric inference can be summarized as: finding reg-
ularities that are simple as well as descriptively accurate, and that are
stable with reference to intended interventions. Sceptics, who hold that
econometrics has not yielded much interesting empirical knowledge, may
be right when they search for `realist' knowledge: truth. Econometrics
does not deliver truth.
Notes
1. Pearson ([1892] 1911, p. 86) argues that the same is true in physics, e.g.
Newton's law of gravitation.
2. Note that Friedman had `long discussions' with Popper in 1947 at the found-
ing meeting of the Mont Pelerin Society. Friedman found Popper's ideas
`highly compatible' with his own. In a pragmatic sense, Friedman is right:
both advocate confrontation of predictions with the data. From a philoso-
phical perspective, there are great differences, where Friedman is too modest
when he writes that Popper's views are `far more sophisticated' (see Friedman
and Friedman, 1998, p. 215).
3. It is ironic that its prominent member, Koopmans, attacked the economists at
the National Bureau of Economic Research who took this slogan most ser-
iously: Burns and Mitchell. The emphasis of the Cowles Commission, from at
least 1940 onwards, was on theory (the first input in the hypothetico-deduc-
tive model of science), not measurement.
4. Quetelet measured anything he could. An example is the blooming of lilacs in
Brussels. He went beyond description, in this case. A law of lilacs is the result.
Let t be the mean daily temperature (degrees Celsius). The lilacs will blossom
if t² > 4264 °C, with t summed from the last day with frost onwards (see
Hacking, 1990, p. 62).
5. The problem with his realism on preferences can be exposed by asking whose
kind of preferences are not doubted: Hahn's? Stigler and Becker's? Veblen's?
6. Caldwell (1982) notes that Robbins does not use the words a priori in his
essay. Robbins' emphasis on the self-evident nature of economic propositions
justifies my classification of Robbins' response to scepticism as the a priori
strategy.
7. Lucas (1987, p. 45) does not claim that models should be `true', they are just
`workable approximations' that should be helpful in answering `a limited set
of questions'.
8. It is difficult to define Kant's position. I will follow Russell (1946) who holds
that Kant belongs to the rationalist tradition.
9. Note that Feigl considered causality both in the context where intervention is
relevant and where it is not (astronomy). His definition encompassed both
cases. Zellner at least implicitly has interventions in mind.
10. Another, more modern, approach is due to Mackie (1965), who analyses
causation by means of INUS conditions (Suppes, 1970, pp. 75–6, extends
this to the probabilistic context). An INUS condition is a condition that is an
Insufficient but Necessary part of a condition which is itself Unnecessary but
Sufficient for the result. An example is: if a short circuit (A) is an INUS
condition for fire (B), an additional condition C might be the presence of
oxygen, and the event D would be another, but distinct, way by which the fire
might have been caused. C and D serve as background information. An
INUS condition does not have to be a genuine cause (see the discussion in
Cartwright, 1989, pp. 25–7).
11. A similar statement of Kant's is that everything that happens `presupposes
something upon which it follows in accordance to a rule' (translation in
Krüger, 1987, p. 72).
12. More precisely, if C happens, then (and only then) E is always produced by it.
13. Although Leibniz rejected gravitation as an `inexplicable occult power', not a
true cause (see Hacking, 1983, p. 46).
14. J. S. Mill's fifth canon of induction, the method of concomitant variation,
says that if two variables move together, either is the cause or the effect of the
other, or both are due to some (other) fact of causation.
15. In Cartwright's book, the meaning of `capacity' gradually changes to autono-
mous relations, like the ones that generate the `deep parameters' that are the
principal aim of new-classical econometricians. I think stability and autono-
my are different issues. I would agree with the statement that econometric
inference for purposes of prediction and intervention needs relative stability,
but I am not persuaded that the notion of capacity is needed as well.
16. The original definition of Granger (1969) is given in terms of variances of
predictive error series, where roughly speaking x causes y if x helps to reduce
this variance in a linear regression equation.
17. Occasionally it is acknowledged that not all great works in empirical science
are of a predictive nature. Darwin's theory of evolution is a notorious exam-
ple. Indeed, it is hard to predict future evolution of man or other animals. But
Darwin's theory is not void of predictions: they are the so-called `missing
links', and discoveries such as the Java and Peking men are examples of
confirmed predictions of Darwin's theory (van Fraassen, 1980, p. 75).
18. Note that in his earlier writings, he warns that prediction is a dangerous
undertaking, as it may lead to speculation and instability (Koopmans,
1941, p. 180).
19. The Austrian aim of economics is to explain how unintended consequences
result from purposeful actions. Intervention (government policy to stabilize
the economy) will yield additional unintended consequences and should,
therefore, be avoided.
20. A search with `online contents' of all (not only economics) periodicals at the
Tilburg University Library from January 1991 to 12 August 1993 (containing
212,856 documents) yielded the following results:
2,702 papers have a title with `testing' (and variations thereof) as an entry
2,133 papers have a title with `estimating' (and variations) as an entry
92 papers have a title with `rejection' (and variations) as an entry
62 have a title with `confirmation' as an entry
1,443 papers have a title containing the word `evidence'.
21. A similar dispute raged in 1911 and 1912 between Karl Pearson and Keynes
about the influence of parental alcoholism on children. Pearson established
that there was no effect, Keynes disputed Pearson's inference. According to
Keynes, Pearson's analysis does not control for all the relevant factors: there
was what might be called spurious non-correlation. Keynes' main objection
to Pearson was ethical: it is very unlikely that alcoholism is irrelevant, you
therefore should control for everything that possibly might be relevant before
making statistical claims. It is immoral to make a wrong judgement in a case
like this. See Skidelski (1983, pp. 223–7) for discussion.
22. If proper prior probability distributions are available, then the Bayesian
posterior odds approach suggests itself. Cox (1961, p. 109) notes that if
such distributions are not available, and use is made of improper distribu-
tions, then the odds are driven by the normalizing constants of M_1 and M_2.
See the discussion of the Bayesian approach below for some comments.
10 Probability, econometrics and truth
Your problems would be greatly simplied if, instead of saying that you
want to know the `Truth,' you were simply to say that you want to
attain a state of belief unassailable by doubt.
C. S. Peirce (1958, p. 189)
1 Introduction
The pieces of this study can now be forged together. The first chapter
introduced the problems of induction and Humean scepticism. Different
responses to scepticism were mentioned: naturalism, apriorism, conjec-
turalism and probabilism. In the remaining chapters, I analysed the foun-
dations of a probabilistic approach to inference. Rival interpretations of
probability were presented and their merits compared. I provided argu-
ments as to why I think apriorism and conjecturalism are not satisfac-
tory, although I do not deny that the probabilistic approach is
problematic as well.
Chapter 3 discussed the frequency approach to probabilistic inference,
chapter 4 considered the epistemological (Bayesian) perspective.
Although the logical foundations of the frequency approach to probabil-
istic inference are problematic, probability theorists have been able to
develop an impressive box of tools that are based on this interpretation.
Unsatisfactory philosophical foundations do not necessarily prevent a
successful development of a theory. On the other hand, the logically
more compelling epistemological approach has not been able to gain a
strong foothold in applied econometric inference. Econometricians have
difficulties in specifying prior probability distributions and prefer to
ignore them. Still, they tend to give an epistemological interpretation to
their inferences. They are Bayesian without a prior.
The applications of frequentist methods in econometric investigations
reveal, however, that there is a discomforting gap between the (frequen-
tist) theory and practice of econometrics. This gap has led to increasing
scepticism about the accomplishments of econometric inference.
Econometric practice is not the success story that the founders of econo-
metrics expected. In 1944, Marschak, director of the Cowles
Commission, wrote a memorandum to the Social Science Research
Committee (quoted in Epstein, 1987, p. 62) that stated his research
agenda for the near future:
1945–6: work on method(ology) to be completed in the main
1946–8: final application of method to business cycle hypotheses and
to (detailed) single market problems
1948–9: discussion of policy. Extension to international economics
A few years before, a similar agenda had worked for the development of
the atomic bomb, but statistical evaluation of economic theories turned
out not to be that easy, not to speak of social engineering.¹
An old problem, specification uncertainty, subverted the credibility of
econometric investigations. The econometrician's problem is not just to
estimate a given model, but also to choose the model to start with.
Koopmans (1941, p. 179) and Haavelmo (1944) delegate this problem
to the economic theorists, and invoke the `axiom of correct specification'
(Leamer, 1978) to justify further probabilistic inference. This problem of
specification uncertainty was noticed by Keynes (remember his call for an
econometric analogy to the Septuagint). Econometricians have been
unable to respond convincingly to Keynes' challenge. This has discredited
the probabilistic response to Humean scepticism, and, perhaps for that
reason, methodological writings in economics are not characterized by an
overwhelming interest in econometric inference. Instead, falsificationism,
apriorism and `post-modernism' dominate the scene.
Section 2 of this chapter deals with the methodology of economics. I
reconsider why the probabilistic response to scepticism deserves support.
Which probability interpretation should underlie this probabilistic
response is the theme of section 3. In section 4, I discuss the temptations
of truth and objectivity. Section 5 provides a summary.
2 Methodology and econometrics
2.1 Popperians without falsifications
Philosophers of science are not regularly quoted in Econometrica. If,
however, an econometrician refers to the philosophy of science, it tends
to be in praise of Popper or Lakatos. However, econometricians rarely
try to falsify. Exceptions prove this rule. One exception is Brown and
Rosenthal (1990, p. 1080). Without referring to Popper these authors,
who test the minimax hypothesis, write that `the empirical value of the
minimax hypothesis lies precisely in its ability to be rejected by the data'.
And indeed, they conclude that the hypothesis is rejected by the data.
This is the Popperian way of doing science (albeit in a probabilistic
setting) and, occasionally, it yields interesting results (in particular, if a
rejection points to where to go next, which is the case in the example cited).
If one searches hard enough, one should be able to find additional exam-
ples of Popperian econometrics. However, they are rare. A similar con-
clusion was drawn in chapter 8 on the search for homogeneity.
Econometricians usually are neither Popperian nor Neyman–Pearson
decision makers. They tend to search for adequate empirical representa-
tions of particular data. This is what positivism is all about (see chapter 9,
section 2). Accepting an economic theory on the basis of econometric
arguments `involves as a belief only that it is empirically adequate' (van
Fraassen, 1980, p. 12). However, empirical adequacy is too vague to serve
as a guiding principle in econometrics. For example, if `adequate' means
`fits well', one has to confront the problem that some empirical models
cannot be less adequate than rival ones which are special cases of them,
as Lucas pointed out when he compared (Keynesian) disequilibrium
models to new-classical equilibrium models. Adequacy should be
supplemented by parsimony and internal consistency (`non-ad-hocness').
The trade off can be investigated by means of the modified simplicity
postulate.
2.2 Econometrics in the methodological literature
Econometricians who are interested in the philosophy of science would
benefit more from reading the methodological works of Karl Pearson,
Ronald A. Fisher and Harold Jeffreys, than from reading Popper,
Lakatos or methodological writings on economics such as Blaug (1980)
or Caldwell (1982). I will turn to the literature on the methodology of
economics to show how it deals with econometrics. Few writings in the
methodology of econometrics take a probabilistic perspective. I will
briefly illustrate the fate of econometrics in methodological writings, by
passing through the works of Blaug, Caldwell, Hamminga, Boland and
McCloskey.
Blaug is a Popperian. He is aware that economists do not obey Popper's
rules, but instead of concluding that Popper is wrong, he complains that
econometricians are essentially lazy or negligent in not confronting the-
ories with tough tests. Applied econometrics too often resembles `playing
tennis with the net down' (Blaug, 1980, p. 256). Blaug does not discuss the
scope and limits of statistical inference. Apart from a few casual remarks
about Neyman–Pearson (pp. 20–3), there is no analysis of the (practical or
philosophical) difficulties that may arise in econometric testing. In this
sense, Hendry's writings (that are claimed to follow the conjecturalist
model of inference) are complementary to Blaug, and despite my disagree-
ments with his view, he provides a better understanding of how econome-
tricians try to test than Blaug does.
A second popular methodologist is Caldwell. Caldwell (1982, p. 216)
complains that econometricians neglect, or only make gratuitous refer-
ences to, philosophical issues. This is the only occasion in his book where
the word econometrics occurs the index of Beyond Positivism has no
entry on related topics such as statistics, probability or NeymanPearson.
Spanos (1986, chapter 26), tries to meet Caldwell's challenge to integrate
economic methodology and econometrics. Spanos does not discuss
Caldwell's views on apriorism, which has become a more dominant
theme in later works of Caldwell.
Hamminga (1983, pp. 97–101), in a case study of the theory of inter-
national trade, discusses problems of identification and specification
uncertainty in econometrics and concludes that they result inescapably
in ad hoc procedures. He concludes that `results of econometric research
cannot in the least affect the dynamics of the Ohlin–Samuelson pro-
gramme' (p. 100). This means that these two elements are independent
of each other: the theoretical developments in the theory of international
trade could have gone the same way if no econometric research had ever
been done at all. If this claim holds universally, then we may wonder
what the use of econometrics is. But I do not think Hamminga's claim
has such universal status. The independence of these two disciplines sug-
gests that econometric research could similarly continue without taking
notice of theoretical developments, which, I think, is an entirely unwar-
ranted conclusion. For example, Leamer (1984) provides an extensive
discussion of the empirical literature as well as an econometric analysis
of the Heckscher–Ohlin theory. Leamer does not ask whether this theory
should be accepted or rejected, but rather how adequately it is able to
represent the data. Another example is the theory of consumer beha-
viour, which was stimulated by the empirical findings (think of flexible
functional forms, dynamic models, aggregation). A third example is the
surge in theoretical analysis of the Permanent Income Hypothesis, largely
driven by empirical econometrics (see the investigations in liquidity con-
straints and excess sensitivity).
A discussion of econometrics similar in scope and contents to
Caldwell's can be found in the well-known book of Boland (1982, p. 4).
He briefly turns to econometrics, and concludes:
Presentations of methodology in typical econometrics articles are really nothing
more than reports about the mechanical procedures used, without any hint of the
more philosophical questions one would expect to nd in a book on
methodology.
I prefer to turn this upside down. Presentations of econometrics in typical
methodological articles are really nothing more than badly informed
caricatures, without any hint of the more sophisticated discussions of
philosophical issues that can be found in econometric studies like
Haavelmo (1944), Vining (1949), Zellner (1971), Leamer (1978), Sims
(1980) or Hendry (1980) (I refrain from mentioning papers that appeared
after Boland wrote this passage and acknowledge all shortcomings in
these papers). It is not easy to take Boland's remark seriously. He briefly
discusses the Neyman–Pearson methodology (Boland, 1982, p. 126) but
does not analyse its merits or weaknesses.
A final example of methodological interest in econometrics comes from
McCloskey who (as already noted) claims that `no proposition about
economic behavior has yet been overturned by econometrics'
(McCloskey, 1985, p. 182) (Spanos, 1986, p. 660, makes the same asser-
tion). In Popperian terms, econometrics is a failure. I concur with
McCloskey's view and share the critique on the use of significance
tests. However, I disagree with the `rhetorical' cure. The fact that few
writers are so well versed in rhetorics as McCloskey herself, combined
with my feeling that she does not persuade (me personally, nor the major-
ity of the econom(etr)ics profession), should be sufficient reason to reject
the rhetorical approach to methodology on its own terms.
3 Frequency and beliefs
3.1 Frequentist interpretations of probability and econometrics
Von Mises' frequency theory is the most `objective' basis for probabilistic
inference. Von Mises argues that a statistician needs a collective: no
collective, no probability. His theory of induction can be summarized
by his `second law of large numbers', which states that if one has a large
number of observations, the likelihood function will dominate the prior
(see page 37 above). Therefore, we can safely ignore the problem of
formulating priors. For econometrics, this has three drawbacks. First,
in many cases (in particular, time-series econometrics), the number of
observations is rather small. Secondly, the validity of the randomness
condition is doubtful. In economics `stable frequency distributions' rarely
exist. Thirdly, even if the number of observations increases, the `sampling
uncertainty' may decrease but the `specification uncertainty' remains
large (Leamer, 1983). As a result of those three considerations, the `sec-
ond law of large numbers' cannot be easily invoked. Perhaps not surpris-
ingly, von Mises' theory of inference has not gained a foothold in
econometrics.
R. A. Fisher shares the frequentist perspective, without relying on the
notion of a `collective'. His small sample approach based on hypothetical
populations should appeal to econometricians. This is one explanation
for Fisher's (mostly implicit) popularity in early econometric writings.
The likelihood function is one of the basic tools of applied econometricians;
it is like a jackknife (Efron, 1986).²
Like von Mises, Fisher does not
want to rely on the specification of prior probability distributions when
they are not objectively given. But instead of the `second law of large
numbers', he proposes the problematic fiducial argument. In the econo-
metric literature, it received the same fate as von Mises' theory: fiducial
inference is simply neglected. Still, econometricians are tempted to inter-
pret the likelihood as a posterior, and frequently they interpret confidence
intervals in the Bayesian way.
Neyman and Pearson, finally, devise a theory of quality control in a
context of repeated sampling. They do not aim at inductive inference.
Although their approach is helpful in evaluating the qualities of different
test statistics, the underlying philosophy is not relevant to econometric
inference. Econometricians are not decision makers with explicit loss
functions. Their goals are much closer to Fisher's goals of science.
3.2 Frequentists without frequencies
The frequentist interpretation of probability uses the law of large num-
bers. Keynes ([1921], CW VIII, p. 368) argues that
the `law of great numbers' is not at all a good name for the principle which
underlies statistical induction. The `stability of statistical frequencies' would be
a much better name for it . . . But stable frequencies are not very common, and
cannot be assumed lightly.
Von Mises, I think, might well agree with this statement.
Moreover, `stable frequency distributions, as in biology, do not exist in
economic variables'. This observation is not made by a critic of econo-
metrics, like Keynes, or an opponent of frequentist econometrics, like
Leamer, but by Tjalling Koopmans (1937, p. 58). If von Mises' condi-
tions for collectives are not satised, it is not clear how to justify methods
of inference that are (explicitly or implicitly) based on them. Perhaps it is
by means of a leap of the imagination. For example, one might argue that
the `rigorous notions of probabilities and probability distributions
``exist'' only in our rational mind, serving us only as a tool for deriving
practical statements of the type described above'. This is not de Finetti
speaking, but Haavelmo (1944, p. 48).
Koopmans, Haavelmo and many of their followers use the frequentist
interpretation of probability as a convenient metaphor. Metaphors are
not forbidden in science; the `random numbers' that Bayesians use in
numerical integration are just another metaphor. Some metaphors are
more credible than others, though. In the case of the frequentist meta-
phor, the credibility is limited (in particular, in cases of macro-economic
time-series data). This credibility really breaks down if adherents of the
metaphor still pretend that their method is `objective', unlike rival meth-
ods such as Bayesian inference.
3.3 The frequentist–Bayes compromise
The debate on the foundations of probability has not resulted in a con-
sensus. However, some efforts to reconcile the frequentist approach with
the Bayesian have been made. De Finetti proposed the representation
theorem for this purpose (it was discussed in chapter 4, section 3.3). I
do not think that this theorem served to settle the debate.
A rather different motivation for introducing objective probability into
an overall subjective theory of inference is given by Box (1980). He
suggests using frequency methods to formulate a model (this is the spe-
cification search) and subsequently suggests using Bayesian methods for
purposes of inference and testing. This advice is not rooted in deep phi-
losophical convictions but is more pragmatically inspired by the view that
frequentist methods are better suited for diagnostic testing. Indeed, it is
not uncommon to encounter a Durbin–Watson statistic in Bayesian ana-
lyses. Such statistics reveal whether residuals contain relevant informa-
tion: such information may be useful in the context of Bayesian inference.
What is not valid in this context is to interpret frequentist statistics in
frequentist terms.
Good (1988) argues that in a frequentist–Bayes compromise, the Bayes
factor against a null hypothesis might be approximated by 1/(P√N), where
P is the P-value and N the sample size. In order to avoid Berkson's (1938)
paradox, a frequentist statistician should decrease the significance level of
a test if the sample size increases. According to Good's rule, a sample of
fifty observations would warrant a P-value of 0.14, whereas the conven-
tional 5% significance level is appropriate for sample sizes of 400 obser-
vations. This is rarely practised in econometrics. The correlation between
P and N in empirical papers in the Journal of Econometrics (1973–90) is
even positive (see Keuzenkamp and Magnus, 1995, for evidence). This is
consistent with the view that econometricians want to measure, not test.
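A quick check of the arithmetic, on my reading of the rule as holding P·√N roughly constant across sample sizes:

```python
from math import sqrt

# Both examples in the text give P * sqrt(N) close to one.
print(0.14 * sqrt(50), 0.05 * sqrt(400))   # roughly 0.99 and 1.00
```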
4 Farewell to truth
The alternative responses to Humean scepticism, presented in chapter 1,
have different views on the status of truth in scientific inference.
Apriorists believe they can obtain `true' knowledge by thinking.
Popperians come `closer to the truth' by weeding out false propositions.
Conventionalists, like Hume himself, and pragmatists like Peirce, do not
think that the notion `truth' is very helpful in the development of knowl-
edge. In the probabilistic response, two schools oppose each other: fre-
quentists, who praise `truth', and epistemologists, who join Hume and the
pragmatists in their farewell to `truth'.
4.1 Truth and probabilistic inference
The title of this monograph is a play on words, referring to the book of
the positivist probability theorist, Richard von Mises: Probability,
Statistics and Truth. Von Mises argues for a frequentist interpretation
of probability: use this for the statistical methods of inference, and apply
it in empirical research. This will pave the way to `truth', where his
understanding of truth is a positivist one. Von Mises' `truth' primarily
refers to the `true' value of the first moment of a distribution (cf. von
Mises, [1928] 1981, p. 222). He aims to show that, `[s]tarting from a
logically clear concept of probability, based on experience, using argu-
ments which are usually called statistical, we can discover truth in wide
domains of human interest' (p. 220).
The frequentist interpretation of probability presumes a state of
`truth', which not only is to be discovered but also enables one to define
consistency of estimators and similar notions. Von Mises' probability
limits are his `truth' (but the parameters will always remain stochastic!).
Fisher argues that the parameters that one tries to infer are `true' but
unknown constants, characterizing specific probability distributions. If
you know a priori that a specific distribution is valid, then this way of
arguing may be fruitful. And indeed, Fisher has tried to justify the appli-
cation of specific distributions, in particular the normal distribution, by
his theory of experimental design (Fisher, [1935] 1966). Haavelmo is
influenced by Fisher's interpretation of `truth'. Consider, for example,
the following statement of Haavelmo (1944, p. 49):
the question arises as to which probability law should be chosen, in any given
case, to represent the `true' mechanism under which the data considered are being
produced. To make this a rational problem of statistical inference we have to start
out by an axiom, postulating that every set of observable variables has associated
with it one particular `true,' but unknown, probability law.
The way to interpret this statement is that there is a specific probability
distribution, of which the `true' parameters are unknown. But, unlike
Fisher, econometricians do not have a theory of experimental design
that validates the choice of a specific distribution. The recent rise of
non-parametric inference is a (late) response to this defect; the alternative
is to elaborate a theory of specification search (Leamer, 1978).
Few scientists dare to acknowledge that they have problems with
`truth'. De Finetti ([1931] 1989) is one: `As a boy I began to comprehend
that the concept of ``truth'' is incomprehensible.' The epistemological
interpretation of probability has no need for truth. Instead, it emphasizes
belief, as for example in Keynes' writings:
Induction tells us that, on the basis of certain evidence, a certain conclusion is
reasonable, not that it is true. If the sun does not rise tomorrow, if Queen Anne
still lives, this will not prove that it was foolish or unreasonable of us to have
believed the contrary. ([1921] CW VIII, p. 273)
Keynes is interested in rational belief, not truth. This distinguishes him
from Popper, who is after truth. Keynes' perspective is shared by Peirce,
as the epigraph to this chapter shows. In his discussion of pragmatism,
Peirce writes:
If your terms `truth' and `falsity' are taken in such senses as to be definable in
terms of doubt and belief and the course of experience (as for example they would
be if you were to define the `truth' as that . . . belief in which belief would tend if it
were to tend indefinitely toward absolute fixity), well and good: in that case, you
are only talking about doubt and belief. But if by truth and falsity you mean
something not definable in terms of doubt and belief in any way, then you are
talking of entities of whose existence you can know nothing, and which Ockham's
razor would clean shave off. (1958, p. 189)
The only way to interpret `truth' is the limit of inquiry where a scientific
community settles down. But we can never be sure that this settlement is a
final one. There is no guarantee that a consensus will emerge; we can only
hope for it (see also Hacking, 1990, p. 212).
Peirce subscribes to the growth-of-knowledge school, which believes
that science becomes more and more accurate (empirically adequate).
This was a mainstream view held by physicists around the end of the
nineteenth century. It was even thought that most major discoveries had
been made, physics was virtually finished. This reminds us of Keynes
([1921] CW VIII, p. 275), who warns: `While we depreciate the former
probability of beliefs which we no longer hold, we tend, I think, to
exaggerate the present degree of certainty of what we still believe.'
4.2 The seduction of objectivity
Objective knowledge is tempting. It is a favourite goal in (neo-)
Popperian writing. Objectivity is also a theme that splits the statistics
(and econometrics) community. According to Efron (1986, p. 3), objec-
tivity is one of the crucial factors separating scientific thinking from
wishful thinking. Efron thinks that, `by definition', one cannot argue
with a subjectivist and, therefore, Efron rejects de Finetti's and
Savage's subjectivist interpretation of probability as `unscientific'.
Efron (p. 331) adds that measures of evidence that can be used and
understood by the `scientist in the street' deserve the title `objective',
because they can be directly interpreted by the members of the scientific
community. As the Fisherian methods of inference form the most popu-
lar language in statistics, they are the objective ones. If we take this
argument seriously, then objectivity is what the majority subscribes to.
This is like Peirce's conception of truth with a twist. According to
Efron, objectivity is something that pleases and seduces the scientist. If
objectivity is what the majority subscribes to, and objectivity is aimed for,
this leads to uncritical herd behaviour. Indeed: `The false idol of objec-
tivity has done great damage to economic science' (Leamer, 1983, p. 36).
Moreover, if objectivity is interpreted as a democratic group decision,
and the members of the group do not share the same utility (or loss)
function, then there may be no rational group decision (Savage, [1954]
1972, pp. 172–7). This argument resembles Arrow's impossibility theorem.
The debate on `objective' versus `subjective' is not fruitful. In a sense,
all methods that are well explained by a researcher are `objective'. If it
can be replicated, it is objective. Inference, in this sense, is usually objec-
tive, whether from a frequentist or a Bayesian perspective. Similarly, the
data are, both for Bayesians and for non-Bayesians, `objective'. The real
issue is that there is no unique way to get rid of specification uncertainty.
The interpretation is always bound to human caprice. If interpretations
of different researchers converge, then we might say that the state of
knowledge has increased. But this convergence should not be pre-
imposed by a frequentist tyrant.³
4.3 Limits to probabilism
A proper analysis of inference with respect to probabilistic hypotheses is
of central importance for a sound methodology. Epistemological prob-
ability is the tool that serves this analysis. However, cognitive limitations
prohibit an ideal use of Bayes' theorem. Scientists are not able to behave
like grand-world bookmakers. Even if they face true decision problems,
they usually have to construct a small world which may be an improper
reduction of the large world. This may lead to surprise and re-specifying
the `small world' (Savage, [1954] 1972). Fully coherent behaviour, such as
proposed by de Finetti, is superhuman. De Finetti's claim that the key to
every activity of the human mind is Bayes' theorem is untenable. It would
make the rational individual a robot (or Turing machine), which may be
a correct interpretation of rationality, but the creative individual would
be better off being able to escape from the probabilistic straitjacket
imposed by the principle of inverse probability. Creativity is the process
of changing horizons, moving from one small world to another one. If
large world inference is beyond human reach, then creativity (which is by
nature hard to model) and incoherent behaviour are indispensable. The
possibility to revise plans is typical for human beings. Perhaps the ability
to break with coherency is the most precious gift to humanity.
Incoherent behaviour is to sin against the betting approach of Bayesian
inference, the one proposed by Ramsey, de Finetti and Savage. However,
to sin may be useful, as was already known to the scholastics: `multae
utilitates impedirentur si omnia peccata districte prohiberentur' (Thomas
Aquinas, Summa Theologia, ii. ii, q. 78 i), i.e. much that is useful would
be prevented if all sins were strictly prohibited. Reverend Bayes should
allow us to sin and disregard the plea for strict coherency. Van Fraassen
(1989, p. 176) suggests adopting a liberal version of Bayesian inference in
which a certain amount of voluntarism is accepted. He compares this
voluntarism with English law, where anything not explicitly forbidden
is allowed. Orthodox Bayesianism, on the other hand, is better compared
with Prussian law, where anything not explicitly allowed is forbidden
(van Fraassen, 1989, p. 171).
Apart from the impossibility of fully coherent behaviour, supporters of
the probabilistic approach to inference have to face the problem of spe-
cifying prior probability distributions. Keynes ([1921] CW VIII) had
already pointed to the problem of obtaining numerical probabilities.
Many empirical investigators who have tried to use Bayesian methods
for empirical inference have experienced this problem. Jeffreys tried to
circumvent this by means of a theory of non-informative prior probabil-
ity distributions. Although this theory is useful, it is not always satisfac-
tory (for example, when models of different dimension have to be
compared). The theory of universal priors is the most elaborate effort
to improve Jeffreys' methods. In chapter 5, I showed that this approach
runs into a logical problem of non-computability, the so-called
halting problem. `True' prior distributions for empirical applications do
not generally exist. If sufficient data become available, this problem is
relatively unimportant. Furthermore, an investigator may try different
prior probability distributions in order to obtain upper and lower limits
in probabilistic inference (this approach is proposed by Good and
Leamer). Most often, the issue at stake is not whether a variable is `sig-
nificant', but whether it remains relevant in various acceptable specifica-
tions of the problem.
A third limitation to probabilistic inference is due to reflexivity: self-
fulfilling prophecies and other interference of the subject with the object.
This undermines the convergence condition in frequentist inference.
However, a Bayesian investigator does not have to convert to extreme
scepticism. The possibility of instability does not invalidate the rational-
ity of induction: see Keynes' statement about Queen Anne, cited above.
The limitations discussed here suggest that probabilistic inference
remains ultimately conjectural. This is nothing to worry about, and it
is a view shared by (neo-) Popperians.
4.4 Econometrics and positivism
In chapter 9, I argued that econometrics belongs to the positivist tradi-
tion. It is a tradition with different niches, encompassing the views of
divergent scientists such as Pearson, Fisher, Jeffreys, von Mises and
Friedman. In the recent philosophical literature, positivism has been
given a new impulse by van Fraassen (1980). He argues for `empirically
adequate representations' of the data. If this is supplemented by a theory
of simplicity, it becomes possible to formalize adequacy and to show how
both theory and measurement can be used in probabilistic inference.
The traditional positivist picture of science is one of steady progress.
Pearson ([1892] 1911, pp. 96–7), for example, argues that the progress of
science lies in the invention of more general formulae that replace more
complex but less comprehensive `laws' (mental shorthand for descriptions
of the phenomena). `The earlier formulae are not necessarily wrong,⁴ they
are merely replaced by others which in briefer language describe more
facts' (p. 97). Jeffreys ([1931] 1957, p. 78), who admires Pearson's
Grammar of Science, presents a view of science that resembles Pearson's:
Instead of saying that every event has a cause, we recognize that observations
vary and regard scientific method as a procedure for analysing the variation. Our
starting point is to consider all variation as random; then successive significance
tests warrant the treatment of more and more as predictable, and we explicitly
regard the method as one of successive approximation.
I feel sympathetic to this view, although this does not imply that there
should be a convergence of opinions (in Peirce's view, this is the same as
convergence to truth). This early positivist's view of the growth of knowl-
edge may be too simplistic in general, as in economics the researcher tries
to represent a `moving target', due, for example, to unstable behaviour.
Still, in many cases, the target moves slowly enough to enable the inves-
tigator to obtain better models (empirically adequate and parsimonious);
the analysis of consumer behaviour is a good example.
5 Conclusion
Economic methodology never had much interest in econometrics. This is
a pity, as probabilistic inference is one of the strong contenders among
the responses to Humean scepticism. Econometrics is, therefore, of par-
ticular interest. Analysing the merits and limits of econometrics should be
an important activity of methodologists. In this book, I have tried to fill
some of the gaps in the methodological literature.
I have argued that econometricians, who search for philosophical
roots, should turn to an old tradition that has been declared dead and
buried by most economic methodologists: positivism. The works of
Pearson, Fisher and Jeffreys deserve more attention than those of the
most popular philosopher among economists, Popper. Econometrics is
not a quest for truth. Econometric models are tools to be used, not truths
to be believed (Theil, 1971, p. vi). And I agree with a warning, expressed
by Haavelmo (1944, p. 3), `it is not to be forgotten that they [our expla-
nations] are all our own artificial inventions in a search for an under-
standing of real life; they are not hidden truths to be ``discovered'' '.
Clearly, econometricians have not been able to solve all their problems.
In particular, specification uncertainty hampers probabilistic inference.
Econometric inference is handicapped by the lack of experimental data
(Fisher's context) or large and homogeneous samples (von Mises' con-
text). Neither have econometricians, privately or as a group, well-speci-
fied loss functions on which Neyman–Pearson decisions can be based.
Hence, the fact that most econometricians still subscribe to the frequen-
tist interpretations of probability is striking. Some explanations can be
given. Frequentist tools serve to describe and organize empirical data.
They can be used as tools to obtain hypotheses. This is Friedman's
interpretation of Tinbergen's work; it is also the view of Box (1980).
The next step, inference (estimating parameters, testing or comparing
hypotheses, sensitivity analysis), is very often done in an informal,
Bayesian way, as if the investigator had used a non-informative prior
all along.
Neither the official frequentist methods nor the official Bayesian
methods are able to deal adequately with specication uncertainty. A
statement of the founder of the probabilistic approach to inference,
Laplace (cited in K. Pearson, 1978, p. 657), is particularly appropriate:
The theory of probability has to deal with considerations so delicate, that it is not
surprising that with the same data, two persons will find different results, espe-
cially in very complicated questions.
Economic questions deal frequently with complex matter. If complexity is
merged with misunderstanding of the foundations and aims of inference,
the result will be frustration. This is the fate of econometrics unless it
abandons the quest for truth and returns to its positivist roots.
Notes
1. It is fair to say that the amount of money spent on the making of the atomic
bomb is incomparable to the money spent on the development of econo-
metrics: Rhodes (1986) provides an astonishing account of the magnitude
of economic costs of this piece of physical engineering.
2. Of course, not all econometricians rely on maximum likelihood. Fisher was
opposed by Karl Pearson, who preferred the method of moments. A similar
discussion can occasionally be found in econometrics. For example, Sargent
(1981, p. 243) prefers the method of moments because it avoids using the
likelihood function.
3. See also A. F. M. Smith in his comment to Efron (1986). Note how statisti-
cians use rhetorical methods to make their points. Fisher rejects the Neyman–
Pearson methodology as a `totalitarian' perversion (see Fisher, 1956, p. 7, pp.
100–2; chapter 3, above). Similarly, Smith rejects Efron's interpretation of
Fisher's methodology, because `[a]ny approach to scientific inference which
seeks to legitimize an answer in response to complex uncertainty is, for me, a
totalitarian parody of a would-be rational human learning process' (Smith, in
Efron, 1986, p. 10). Rhetorical techniques are used. How successful they are
may be questioned.
4. Pearson adds a footnote here, that reads: `They are what the mathematician
would term ``first approximations'', true when we neglect certain small quan-
tities. In Nature it often happens that we do not observe the existence of these
small quantities until we have long had the ``first approximation'' as our
standard of comparison. Then we need a widening, not a rejection of ``natural
law''.'
Personalia
Thomas Bayes (1702–61). English reverend whose posthumous Essay Towards
Solving a Problem in the Doctrine of Chance (1763) laid the foundation for
Bayesian inference. See also Laplace.
Emile Borel (1871–1956). French mathematician who contributed to Bayesian
probability theory and measure theory.
Arthur Lyon Bowley (1869–1957). Professor of statistics at the London School of
Economics, founding member and later president of the Econometric
Society. Was sympathetic to Bayesian inference.
Rudolf Carnap (1891–1970), philosopher, founder of logical positivism. Carnap
studied in Vienna. Professor of philosophy in Chicago from 1936 to 1952.
Subsequently, he succeeded Reichenbach at the University of California at
Los Angeles, where he held a chair until 1961. Carnap belongs to the most
important opponents of Popper in epistemology through his support for
inductive inference.
Harald Cramér (1893–1985). Swedish probability theorist who contributed to
central limit theory. His probability theory is closely related to the theory
of von Mises (Cramér, 1955, p. 21). Cramér's (1946) book on mathematical
statistics was the standard reference book on probability theory for econo-
metricians of the postwar decade. One of his students was Herman Wold.
Bruno de Finetti (1906–85). Italian probability theorist. One of the most radical
twentieth-century contributors to the subjective approach to probability the-
ory. He invented the notion of exchangeability.
Pyrrho of Elis (c. 365–275 BC), the first and most radical Sceptic, his philosophy
came to be known as Pyrrhonism. He dismissed the search for truth as a vain
endeavour. See also Sextus Empiricus and David Hume.
Sextus Empiricus (c. second century AD). Greek philosopher, popularizer of
Pyrrhonism (in Outlines of Pyrrhonism and Against the Dogmatists). His
work was re-published in 1562 and had much influence on the philosophy
of, e.g., Descartes and Hume.
Ezekiel, Mordecai (1899–1974). Agricultural economist, who wrote an influential
textbook on econometrics-avant-la-lettre.
Ronald Aylmer Fisher (1890–1962). English statistician and geneticist, one of the
greatest probability theorists of the twentieth century. He invented the max-
imum likelihood method, provided the foundations for the t-statistic,
invented experimenal design based on randomization and contributed
many statistical concepts. Sadly, philosophers and methodologists rarely
read his work.
Gottlob Frege (1848–1925). German mathematician and logician. Influenced and inspired logical positivism via Russell and Wittgenstein.
Ragnar Frisch (1895–1973). Norwegian economist who invented the word `econometrics' and helped to found the Econometric Society. His main econometric interest was related to systems of equations, for which he invented the `bunch map' technique. Kalman may be regarded as one of his heirs.
Francis Galton (1822–1911). English eugenicist and statistician, known for his analysis of correlation and `regression'. Among his lesser-known contributions is a statistical test of the efficiency of prayer.
Carl Friedrich Gauss (1777–1855). German mathematician and probability theorist. He provided the probabilistic context for least squares approximation. Gauss is a contender for the title of inventor of the method of least squares, but this claim seems unwarranted (see Legendre).
William Gosset (1876–1937). English statistician who contributed to small-sample analysis while solving practical problems at Guinness. Known in particular for the Student t-distribution (published in 1908 under the pseudonym `Student').
Trygve Haavelmo (1911–99). Norwegian econometrician, student of Frisch. During the 1943–47 period, he was a research associate at the Cowles Commission. While in the US, he had contacts with Neyman, Wald and many other statisticians. Haavelmo (1944) strongly influenced the course of econometrics in the postwar years, with its emphasis on simultaneous equations bias and formal Neyman–Pearson analysis.
David Hume (1711–76). Scottish philosopher and economist. Known for his philosophical scepticism and empiricism (partly inspired by the writings of Sextus Empiricus). Hume claims that it is not possible to verify the truth of (causal) relations.
Harold Jeffreys (1891–1989). British scientist and probability theorist. He did much to make Bayesian inference practically useful, in particular by elaborating non-informative prior probability distributions and the method of posterior odds.
Immanuel Kant (1724–1804). German philosopher, known for the notion of `synthetic a priori' truth.
John Maynard Keynes (1883–1946). English probability theorist who also has some reputation as an economist. His Treatise on Probability counts as one of the most important twentieth-century works on the theory of scientific induction.
John Neville Keynes (1852–1949). English economist who contributed John Maynard Keynes to society.
Andrei Kolmogorov (1903–87). Russian mathematician and probability theorist. Known for his formal axiomatic approach to probability.
Tjalling Koopmans (1910–85). Dutch (later American) physicist who turned to econometrics and economic theory. He wrote his dissertation on econometrics under the supervision of Tinbergen (Koopmans, 1937). After a short stay at the League of Nations, he moved to the USA in 1940, joining the Cowles Commission in 1944, where he elaborated the formal approach to econometrics.
Imre Lakatos (1922–74). Hungarian philosopher who refined Popperian methodology into the methodology of scientific research programmes.
Pierre Simon de Laplace (1749–1827). Founder of probability theory who rediscovered `Bayes' Theorem'. The contributions of Laplace, in the Théorie Analytique des Probabilités (1812) among other works, go far beyond those of Bayes. Despite his interest in probability theory, his philosophy was strictly deterministic.
Adrien Marie Legendre (1752–1833). Successor to Laplace at the École Militaire and École