Our objective in editing the Handbook of Computational Economics was to provide

an introduction and selective overview of the rapidly emerging field of computational
economics. Actually, computational economics is not currently recognized as a sep-
arate field in most economics departments, and only a handful of departments offer
regularly listed courses in this area. However computer technology is rapidly changing
the way many economists approach their research, and a large body of published work
has already demonstrated the usefulness and importance of computational methods in
economics. In a recent report to the National Science Foundation [Kendrick (1993)],
a panel of economists surveyed the wide range of contributions that computer models
have made to our discipline in areas such as economic growth, econometrics, inter-
national trade, general equilibrium theory, environmental change, game theory and
mechanism design, static and dynamic optimization problems, and macroeconomics.
The report also identified "many areas where computational economics offers substan-
tial opportunities to extend the frontiers of knowledge" [Kendrick (1993, p. 269)]. The
rapid growth in this area is reflected in the large number of computationally oriented
articles in regular economics journals and the recent emergence of specialized jour-
nals in this area, including Computational Economics and Series A of the Journal of
Economic Dynamics and Control. In 1994 the Society for Computational Economics
(SCE) was founded, and starting in 1995 there will be annual SCE-sponsored interna-
tional conferences in computational economics in addition to the frequent workshops
and con~%rences that already take place on an ad hoc basis (e.g. NSF-supported con-
ferences and workshops on computational methods such as the workshop hosted by
Daniel McFadden and Paul Ruud at the University of California at Berkeley in Au-
gust, 1994). The first international conference of the SCE was held at the [C 2 Institute
in Austin Texas in May 1995 and the second international conference will be held at
the University of Geneva in June, 1996. There is also a World Wide Web home page
for the SCE at h t t p : / / w w w . u n i g e . c h / c e which serves as a reference point for
other conferences, workshops, and research opportunities in computational economics.
Even though computational economics is still in its adolescence - rapidly growing
and evolving-- there are a number of subareas that are already quite mature and highly
viii Prelace to the Handbook

developed. Therefore we felt that this is a good time to publish a handbook in this
area. We view computational economics as providing an important set of tools that an
increasing number of economists will need to acquire in order to understand and do
state of the art research in virtually all areas of economics. In the near future it will
be as difficult to do economics without having a good understanding of computing
hardware and software and numerical methods as it is to do economics without a solid
understanding of real analysis, probability, and statistics.
In the process of selecting topics for this Handbook we decided against trying to
be comprehensive or encyclopedic in our coverage of the subject. Our objective was
to have the Handbook sample a number of different styles and approaches, reflecting
the breadth of computational economics as it is practiced today. In particular, articles
in this volume range from very applied, policy oriented applications of computational
methods to highly theoretical and mathematically complex analyses of algorithms and
numerical methods. Although the main impetus for computational methods first arose
in econometric applications in the last 1940's and early 1950's, we made a conscious
decision not to attempt to cover econometric applications of computational methods
since we regard this as the territory for the Handbook o f Econometrics. In addition, we
made a conscious decision to minimize overlap with topics that have been treated in
recent handbooks in operations research and computer sciences since there are already
a number of high quality handbooks in these areas [see e.g. van Leeuwen (1990) and
Heyman and Sobel (1990)]. Our goal in this Handbook was to emphasize the unique
contributions of computational methods in economics and focus on problems for which
well developed solutions are not already available from the literatures in operations
research, numerical methods, and computer science. Nevertheless, the strong influence
of these literatures on the development of computational economics will be evident
throughout the volume.
Finally, while a number of chapters in this Handbook cover relatively mature areas
in computational economics, we also consciously included a number of chapters on
more speculative "frontier topics" to convey a sense of the exciting new applications
and rapid progress in this field. In particular, several chapters in this volume present
important recently discovered computational innovations and research results. While
it is quite possible that the field of computational economics will look very different
10 years from now than it does now, we designed this volume to provide a broad
overview of the current state of the art, and a set of tools that will have a big impact
on the way economics is done in the future.


We are indebted to many people who assisted in the production and editing of this
Handbook. First of all we would like to thank the Handbook series editors, Kenneth
Arrow and Michael Intriligator, for their constant support and encouragement. We
Pre[ace to the Handbook ix

want to make special acknowledgement to John Geweke, Preston Miller, and Arthur
Rolnick and the Federal Reserve Bank of Minneapolis for hosting the March 18-
19, 1994 conference on Research and Training in Computational Economics, during
which initial drafts of the chapters in this H a n d b o o k were presented and critiqued. We
also want to thank the the Federal Reserve staff who helped organize the conference,
including Carol Blunt, Jody Fahland, Correan Hanover, and Vicki Reupke. Finally
we wish to acknowledge the diligence and promptness of the many "official" and
unofficial referees who reviewed various versions of the chapters included in this
H a n d b o o k . Their insightful comments greatly improved the overall quality of this
volume (although of course we bear final responsibility for any of its defects). These
referees included:
Sudhakar Achath, University of Texas, Sumru Altug, Virginia Polytechnic Insti-
tute, David Belsley, Boston College, Eric van Damme, Tilburg University, Hennie
Daniels, Tilburg University, Paul Fisher, Bank of England, Manfred Gilli, University
of Geneva, Vassilis Hajivassiliou, Yale University, Kenneth Judd, Hoover Institution,
Michael Keane, University of Minnesota, Timothy Kehoe, University of Minnesota,
Michel Keyzer, Center for World Food Studies, Blake LeBaron, University of Wis-
consin, Martin Lettau, Tilburg University, Alan Manne, Stanford University, Ellen
McGrattan, Federal Reserve Bank, Minneapolis, John Miller, Carnegie Mellon Uni-
versity, Reinhard Neck, University of Bielefeld, Soren Nielsen, University of Texas,
Ariel Pakes, Yale University, Spassimir Paskov, Columbia University, Chris Phelan,
MEDS, Northwestern University, Edward S. Prescott, University of Chicago, Martin
Puterman, University of British Columbia, John Rowse, University of Calgary, Berc
Rustem, Imperial College, University of London, Thomas Rutherford, University of
Colorado, Herbert Scarf, Yale University, Sanjay Srivastava, Carnegie Mellon, Dale
Stahl, University of Texas, Charles Tapiero, ESSEC, George Tauchen, Duke Univer-
sity, Joseph Traub, Columbia University, John Tsitsiklis, MIT, Marco Tucci, University
of Siena, Kenneth Wallis, University of Warwick, Charles Whiteman, University of
Iowa, Robert Wilson, Stanford University, Peter Zadrozny, GTE Labs.

H.M. Amman
University of Amsterdam
D.A. Kendrick
University of Texas
J. Rust
University of Wisconsin
x Preface to the Handbook


Heyman, D.P. and Sobel, M.J., eds (1990) Handbooks in operations research and management science.
Amsterdam: Elsevier.
Kendrick, D. et al. (1993) 'Research opportunities in computational economics', Computational Economics,
van Leeuwen, J., ed. (1990) Handbook qf theoretical computer science. Amsterdam: Elsevier.

Introduction to the Series

Preface to the Handbook


Chapter 1


Monash University

1. Introduction 4
1. t. Definition 5
1.2. Brief history 6
2. Solving a CGE model 9
2.1. The programming approach 10
2.2. The derivative approach: The Johansei~Euler method 12
2.3. Solving a multi-period model 24
3. An illustrative CGE model 36
3.1. Input-output database 37
3.2. Equations 39
3.3. Coefficients, parameters, zero problems and initial solution 48
3.4. Closure of the illustrative model 53
3.5. Simulations 55
4. Concluding remarks: Success, partial success and potential of CGE
modelling 67
4.1. Success: Quantifying linkages between different parts of the economy 67
4.2. Partial success: Analysis of welfare effects 69
4.3. Potential: Disaggregated forecasting 76
References 79

*We thank Michael Malakellis, Mark Horridge, Ken Pearson, Dawie de Jongh, Sang Hec Han, John
Piggott and Tom Rutherford for advice and assistance.

Handbook ~)/'Computational Economics, Volume 1, Edited by H.M. Amman, D.A. Kendrick and J. Rust
(~) 1996 Elsevier Science B.V. All rights reserved.
P.B. Dixon and B.R. Parmenter

1. I n t r o d u c t i o n

Over the last 35 years, computable general equilibrium (CGE) models have been used
in the analysis of an enormous variety of questions. These have included
the effects on
• macro variables, including measures of nation-wide or even global economic
• industry variables;
• regional (both sub-national and super-national) variables;
• labour market variables;
• distributional variables; and
* environmental variables
of changes in
• taxes, public consumption and social security payments;
. tariffs and other interferences in international trade;
• environmental policies;
• technology;
• international commodity prices and interest rates;
• wage setting arrangements and union behavior; and
• known levels and exploitability of mineral deposits (the Dutch disease).
While most of these questions have been analyzed in single-country, single-period
models, there are now numerous CGE models which are either multi-regional or
multi-period (dynamic) or both. By going multi-regional, CGE modelling has thrown
light on both intra-country and inter-country regional questions. In the first category
are issues (important in federations) concerning the effects of tax and expenditure
activities of provincial governments. In the second category are issues such as the
effects of the formation of trading blocks and the effects of different approaches to
reducing world output of greenhouse gases. By going dynamic, CGE modelling has
the potential to broaden and deepen its answers to all questions with which it has
been confronted. It has also entered the forecasting arena. CGE models are now used
to generate forecasts of the prospects of different industries, labour force groups and
regions These forecasts feed into investment decisions by private and public sector
organizations affecting stocks of physical and human capital.
The main objective of this chapter is to show how CGE models can be constructed
and applied. We try to achieve this objective by describing the construction and
application of an illustrative model. Although the model is small, it illustrates the key
aspects of CGE modelling, including
• input-output data,
• elasticity parameters,
• theoretical specification,
cTl. 1: Computable General EquilibriumModelling 5

® solution algorithm and

• result interpretation.
The illustrative model can be used in two ways: as a single-period model suitable for
comparative-static analyses; and as a model for multi-period forecasting.
The chapter is organized as follows. In the remainder of this section (Subsections 1.1
and 1.2) we define CGE modelling and provide a brief history of its development.
Then in Section 2 we discuss the computation of solutions for CGE models. The
illustrative model is in Section 3. Section 4 is an overview of what we see as the
field's achievements, failures and potential.
Readers can choose their own path through the chapter. After completing this sec-
tion, some readers may like to skip the mathematics in Section 2 and move straight
to the illustrative model in Section 3. Others might like to start with Section 4 and
then work back through the more technical material in Sections 2 and 3.

1.1. Definition

The distinguishing characteristics of computable general equilibrium (CGE) models

are as follows.
(i) They include explicit specifications of the behavior of several economic actors
(i.e. they are general). Typically they represent households as utility maximizers and
firms as profit maximizers or cost minimizers. Through the use of such optimizing
assumptions they emphasize the role of commodity and factor prices in influencing
consumption and production decisions by households and firms. They may also include
optimizing specifications to describe the behavior of governments, trade unions, capital
creators, importers and exporters.
(ii) They describe how demand and supply decisions made by different economic
actors determine the prices of at least some commodities and factors. For each com-
modity and factor they include equations ensuring that prices adjust so that demands
added across all actors do not exceed total supplies. That is, they employ market
equilibrium assumptions.
(iii) They produce numerical results (i.e. they are computable). The coefficients and
parameters in their equations are evaluated by reference to a numerical database.
The central core of the database of a CGE model is usually a set of input-output
accounts showing for a given year the flows of commodities and factors between
industries, households, governments, importers and exporters. The input-output data
are normally supplemented by numerical estimates of various elasticity parameters.
These may include substitution elasticities between different inputs to production
processes, estimates of price and income elasticities of demand by households for
different commodities, and foreign elasticities of demand for exported products.
An alternative name for CGE models is applied general equilibrium (AGE) models.
This name emphasizes the idea that in CGE modelling the database and numerical
P.B. Dixon and B.R. Parmenter

results are intended to be more than merely illustrative. CGE models use data for
actual countries or regions and produce numerical results relating to specific real-
world situations.

1.2. Brief history

On our definition, the first CGE model was that of Johansen (1960). t His model was
general in that it contained 20 cost-minimizing industries and a utility-maximizing
household sector. For these optimizing actors, prices played an important role in
determining their consumption and production decisions. His model employed market
equilibrium assumptions in the determination of prices. Finally, it was computable (and
applied). It produced a numerical, multi-sectoral description of growth in Norway
using Norwegian input-output data and estimates of household price and income
elasticities derived using Frisch's (1959) additive utility method.
Following Johansen's contribution, there was a surprisingly long pause in the de-
velopment of CGE modelling with no further significant progress until the 1970s. The
1960s were a period in which leading general-equilibrium economists developed and
refined theoretical propositions on the existence, uniqueness, optimality and stability
of solutions to general equilibrium models. 2 Rather than being computable (numeri-
cal), their models were expressed in general, algebraic terms.
The most direct link between this theoretical work and CGE modelling was made
by Scarf (1967a, 1967b and 1973). Drawing on the mathematics of the theoretical ex-
istence theorems, Scarf designed an algorithm for computing solutions to numerically
specified general equilibrium models. This algorithm had finite convergence proper-
ties, i.e. for a wide class of general equilibrium models, the algorithm was certain to
produce a solution in a finite number of steps.
Scarf was of great importance in stimulating interest in CGE modelling in North
America. In the early 1970s, his students John Shoven and John Whalley became
leading contributors to the field [see, for example, Shoven and Whalley (1972,
1973, 1974)]. In 1991, when Scarf was awarded a distinguished fellowship of the
American Economic Association, the citation, read in part:

Scarf's path-breaking technique for the computation of equilibrium prices has re-
sulted in a new subdiscipline of economics: the study of applied general equilibrium
models... Scarf was the catalyst behind the creation of this subfield of the profession

IOn a broader definition, CGE modelling starts with Leontief's (1936, 1941) input-output models of
the 1930s and includes the economy-widemathematical programming models of Sandee (1960), Manne
(1963) and others developed in the 1950s and 60s. We regard these contributions as vital forerunners of
CGE models. On onr definition, input-output and programming models are excluded from the CGE class
because they have insufficient specification of the behavior of individual actors and the role of prices. Scarf
(1994) takes a similar view of the origins of the field.
2See Arrow and Hahn (1971).
Ch. 1: Computable General Equilibrium Modelling

and in the transformation of the general equilibrium model from a purely theoret-
ical construct to a useful toot for policy analysis. (American Economic Review,
82(4), September 1992.)
In our view, this misrepresents Scarf's contribution. Johansen had already solved a
relatively large CGE model by a simple, computationally efficient method 3 well before
the Scarf algorithm was invented. Scarf's technique was never the most effective
method for doing CGE computations. Even those CGE modellers who embraced the
Scarf technique in the 1970s had by the 1980s largely abandoned it. When dealing
with models capable of giving practical answers to policy and forecasting questions,
they switched to older methods such as Newton-Raphson and Euler algorithms. For
this reason, and also because it has been reviewed extensively elsewhere including
in other volumes of the Handbooks in Economics, 4 Scarf's approach receives only
passing mentions in the remainder of this chapter.
While the 1960s were not an active period in CGE modelling, they were a key
decade in the development of large-scale, economy-wide econometric models (e.g.
the Wharton, DRI, MPS, St Louis, Michigan and Brookings models). 5 Relative to
CGE models, the economy-wide econometric models paid less attention to economic
theory and more attention to time-series data. In CGE models, the specifications of
demand and supply functions are completely consistent with underlying theories of
optimizing behavior by economic actors. In economy-wide econometric models, the
role of optimizing theories of the behavior of individual actors is usually restricted to
that of suggesting variables to be tried in regression equations.
In the 1960s, the underlying philosophy of the econometric approach of "letting
the data speak" seemed attractive to applied economists. This may be part of the
explanation of the pause in the development of the CGE approach. In the 1970s
there were two factors, apart fi'om Scarf's bridge with the theoretical literature, which
stimulated interest in the CGE approach.
First, there were major shocks to the world economy including a sudden escalation
in energy prices, a sharp change in the international monetary system and rapid growth
in real wage rates. Without tight theoretical specifications, the econometric models
could not provide useful simulations of the effects of shocks such as these which
carried economies away from established trends.
CGE models are often vulnerable to the criticism that their behavioral specifica-
tions (e.g. utility maximization and cost minimization) are imposed without empirical
validation. 6 However, with these specifications in place, CGE models can offer in-

3Section 2 contains a description of Johansen's method.

4See, for example, Scarf (1982) and Kehoe (1991).
5For an historical perspective on these models, see the papers in Kmenta and Ramsey (1981).
6Among CGE modellers, Dale Jorgenson and his colleagues are the least vulnerable to this criticism.
Starting in 1974, Jorgenson has emphasized the need for econometric estimation of all parameters [see,
for example, Hudson and Jorgenson (1974), Jorgenson (1984) and Jorgenson and Wilcoxen (1994)]. To
support his many CGE applications to energy and environmental issues in the U.S., he has made econometric
estimates for cost functions, indirect utility functions and trade parameters at a detailed level.
P.B. Dixon and B.R. Parmenter

sights into the likely effects of shocks for which there is no historical experience.
For example, up to 1973, there was no modern experience of a sharp change in oil
prices. Consequently, in regression equations based on pre-1973 time-series data, the
price of oil has an insignificant or zero coefficient. This meant that models relying
heavily on time-series analysis implied that movements in oil prices would not be
an important determinant of economic activity. In detailed CGE models, inputs of
oil appear as variables in production functions. Then through cost-minimizing cal-
culations, increases in the price of oil act on economic activity in CGE simulations
in the same way as increases in the prices of other inputs. In the 1970s, interest in
CGE modelling increased as applied economists recognized the power of optimizing
assumptions in translating broad experience (e.g. experience of cost increases) into
plausible predictions of the effects of particular shocks for which we may have no
experience (e.g. the effects of an increase in oil prices).
The second factor driving the growth of CGE modelling over the last 20 years has
been its increasing ability to handle detail. The key ingredients have been improved
data bases (e.g. the availability of unit records from Censuses) and improved computer
programs (e.g. the availability of programs such as GEMPACK, GAMS, HERCULES
and CASGEN). 7 In our consulting work in Australia, we can now use CGE models to
satisfy demands for analyses disaggregated into effects on 120 industries, 56 regions,
280 occupations, and several hundred family types. At this level of detail, no other
technique has as much to offer as CGE modelling. 8 As CGE modellers have learnt to
handle more detail, CGE results have become of interest to public and private sector
organizations concerned with, among other things: industries; regions; employment;
education and training; income distribution; social welfare and the environment.
CGE modelling is now an established field of applied economics. Several detailed
surveys of CGE modelling have appeared in leading journals and in books from
prominent publishers [e.g. Shoven and Whalley (1984), Pereira and Shoven (1988),
Robinson (1989, 1991), Bandara (1991) and Bergman (1990)]. There are regular inter-
national meetings of CGE modellers, often followed by the production of a conference
volumes [e.g. Kelley, Sanderson and Williamson (1983), Scarf and Shoven (1984),
Piggott and Whalley (1985 and 1991), Srinivasan and Whalley (1986), Bergman, Jor-
genson and Zalai (1990), Bergman and Jorgenson (1990), Don, van de Klundert and
van Sinderen (1991), Devarajan and Robinson (1993) and Mercenier and Srinivasan
(1994)]. Numerous monographs have been published giving detailed descriptions of
the construction and application of CGE models [e.g. Johansen (1960), Adelman and
Robinson (1978), Keller (1980), Dixon, Parmenter, Sutton and Vincent (1982), Hart'is

7Descriptions of general-purpose software for solving CGE models include Pearson (1988), Codsi and
Pearson (1988), Bisschop and Meeraus (1982), Brooke, Kendrick and Meeraus (1988), Drud, Kendrick and
Meeraus (1986), Meeraus (1983), and Rutherford (1985a and b). The existence of this software means that
economists interested in building and applying CGE models no longer need either a high level of skill in
programming or a sophisticated understanding of algorithms for solving systems of equations.
SThe main alternative is input-output analysis and its extensions. We return to this in Section 4.
Ch. 1: Computable General Equilibrium Modelling

with Cox (1983), Ballard, Fullerton, Shoven and Whalley (1985), Whalley (1985),
McKibbin and Sachs (1991) and Horridge, Parmenter and Pearson (1993)]. At least
three CGE textbooks are now available for graduate students and advanced under-
graduates [Dervis, de Melo and Robinson (t982), Shoven and Whalley (1992) and
Dixon, Parmenter, Powell and Wilcoxen (1992) 9] and graduate students all over the
world are engaged in-writing CGE theses.
Is the field past its peak? Is it in danger of going stale? We don't think so. We
think that CGE modelling will generate high-profile academic careers for many years
to come. More importantly, it is likely to be increasingly influential in policy making
and in business.
For applied economists with a strong interest in theory, the CGE field offers the
challenge of incorporating into the models ideas from modern macro- and micro-
economics. Of the ideas emerging from macroeconomics, rational expectations and
the differences between the effects of anticipated and unanticipated shocks have re-
ceived considerable attention in CGE modelling [e.g. Ballard and Goulder (1985),
Bovenberg and Goulder (1991), Mercenier and Sampaio de Souza (1994), Jorgenson
and Wilcoxen (1994) and Dixon, Parmenter, Powell and Witcoxen (1992, Chapter 5)].
We expect that CGE models will soon appear incorporating other ideas from modern
macro such as technical change related to accumulation of human capital, and hys-
teresis in labour markets and international trade. Drawing from ideas in modern micro
theory, CGE models are being constructed which include product differentiation at the
firm level, economies of scale, free and costly entry and exit, price discrimination and
game-theoretic behavior [see, for example, Harris with Cox (1983), Harris (1984),
Cox and Harris (1985, 1986), Norman (1990) and Mercenier (1994a and b)].
For applied economists with a strong empirical/statistical interest, the challenges
offered by CGE modelling are unbounded. They include: compilation of timely input-
output and other data with economically meaningful industrial, regional, occupational,
environmental and social classifications; the measurement of outputs for difficult in-
dustries such as banking; the measurement of capital inputs; the estimation of elasticity
parameters; the estimation of trends in tastes and technology; the selection of impor-
tant issues for analysis; and the representation of results in a clear and persuasive

2. Solving a CGE model

There are two main approaches to solving CGE models: non-linear programming
and derivative methods. In Subsection 2.1 we describe briefly the programming ap-
proach. Then in Subsection 2.2 we give a fuller account of a particular derivative
method (the Johansen/Euler method). This latter method is the one that we use to

9This last textbook is accompanied by a set of teaching diskettes prepared by Pearson (1992).
10 P.B. Dixon and B.R. Parmenter

solve the illustrative model in Section 3. Subsection 2.3 is a discussion of how

to solve multi-period or intertemporal models. A special problem with the use of
derivative techniques in solving these models is the need to construct an initial solu-

2.1. The programming approach 1°

The programming approach relies on the idea that the solution to a CGE model can
often be deduced from the solution to an optimization problem. Consider, for example
a two consumer, pure-exchange (no production) model. A solution for this model is
a list of non-negative vectors,

Z = {P, C(1), C(2)}

satisfying the following conditions:

C(i) maximizes U.i(C(i)) subject to P'(C(i) - Z(i)) : 0, i : 1,2, (2.1)

C(i) = ~ Z(i) (2.2)

i i


P ' I = 1, (2.3)

C(i) is the consumption vector for consumer i,
P is the vector of commodity prices,
Z(i) is the exogenously given endowment vector of consumer i;
U,i is consumer i's utility function which we assume is strictly concave.
Condition (2.1) means that consumers maximize their utility functions, Ui, subject to
their budget constraints. Condition (2.2) ensures that demand equals supply for each
good. (For convenience, we assume that there are no goods in excess supply at zero
price.) Condition (2.3) sets the overall level of prices.

I(IThis approach to CGE computation was developed by several authors including Dixon (1975, 1978a)
and Ginsburgh and Waelbroeck (1981). For a recent contribution, see Rutherford (1992). The theoretical
underpinnings were given by Negishi (1960).
Ch. 1: Computable General EquilibriumModelling 11

One approach to finding ~ is to solve a sequence of non-linear programming

problems of the form:

choose C(1), C(2) to maximize

Wl UI (C(1)) -[- W2U2 (C(2)) (2.4)

subject to

c(i) = z(i), (2.s)

i i

where W1 and W2 are a pair of non-negative numbers adding to one.

At a solution (C(1), C(2)) to problem (2.4)-(2.5), there will exist a vector of
Lagrangian multipliers, H , such that

WivU, i(-C(i)) = / / , i = 1,2. (2.6)

Hence, for both i = 1 and 2,

U(i) maximizes Ui(C(i)) subject to H'(C(i) - C ( i ) ) = 0. (2.7)

It is also true that

E e(i) = C Z(i). (2.8)

i i

if it happens that

17'(-C(i)) = II'(Z(i)), (2.9)

then we have found a solution, ~~, to our CGE model as follows:

P =

C(i) = C(i), i--1,2.

If (2.9) is not satisfied, then we vary the weights, Wi, resolving problem (2.4)-(2.5)
until it is satisfied.
Non-linear programming methods can be extended well beyond the pure-exchange
case. They have been applied successfully in solving CGE models which include
12 P.B. Dixon and B.R. Parmenter

production, investment, capital accumulation, taxes and trade.ll Nevertheless, we have

found derivative methods to be more convenient and flexible. This is also the current
experience of most other people in the C G E field.

2.2. The derivative approach: The Johansen/Euler method

We consider a model for which a vector V of length n is a solution (or equilibrium)

if it satisfies a system of equations

F ( V ) = O, (2.10)

where F is a vector function of length m. The components of the vector V represent:

demands for and supplies of commodities and factors; prices; taxes and subsidies;
surpluses and deficits; technological coefficients and other economic variables. The
equation system (2.10) imposes conditions such as: demands equal supplies; prices
equal costs; and demands depend on relative prices and expenditure levels.
We assume that F is differentiable and that the number of variables, r~, exceeds the
number of equations, m. Exogenously given values are assigned to r~ - m variables.
Finally, we assume that an initial solution, V I, is known. That is, we have a vector
V I such that

F ( V l) = 0. (2.11)

For a one-period model, finding an initial solution is usually trivial. As we will see
in Section 3, V I can normally be read from the model's input-output database. The
problem is a little harder for some multi-period models where the initial solution, V l,
has to be constructed. As explained in Subsection 2.3, in most cases, this can be done
quite easily.
In computing solutions to a CGE model, we should take advantage of our knowl-
edge of V I. By not using an initial solution, non-linear programming methods 12 and
combinatorial approaches such as the Scarf algorithm neglect valuable information.
Given an initial solution, we can generate new solutions for our model by elementary
derivative techniques, e.g. variants of Newton's and Euler's methods. We have found
variants of Euler's method to be particularly easy to use and effective in large-scale
C G E computations. 13

t tFor a recent application of the non-linear programming method in a model including these features,
sce Dixon (1991).
12Non-linear programming packages normally store the latest solution to a problem and then proceed
from there when a new solution is required after a change in parameter values. However, with any alteration
in the structure of the problem to be solved, all information on old solutions is usually lost.
13This is the approach underlying the GEMPACK programs for solving CGE models, see Pearson
(1988) and Codsi and Pearson (1988).
Ch. 1: Computable General EquilibriumModelling 13

To describe the Euler method in the context of CGE modelling we start by rewriting
(2.10) as

F ( V I , V2) = 0, (2.12)

where VI is the vector, of length m of endogenous variables and 1/2 is the vector length
r~ - m of exogenous variables. Then we totally differentiate (2.12), recognizing that
if we are to continue to have a solution to our model, then deviations, dVl and dV2,
from V 1 must satisfy, to a linear approximation,

Yl (VI) dV1 + F2(V I) dV2 = 0, (2.13)

where F1 and F2 are matrices of partial derivatives of F evaluated at V I.

To make a one-step Euler or Johansen 14 approximation, we compute

dV1 = B ( V l) dV2 (2.14)


B ( V I) = _ F I - ' (VI)F2(VI). (2.15)

Provided we can evaluate B(VI), then Eq. (2.14) can tell us how, in the region
of V I, the endogenous variables (1/1) are affected by movements in the exogenous
variables (Vz). For example, using (2.14), we might compute the effects on output
and employment in the footwear industry (components of 1/1) of changes in various
taxes and tariffs (components of V2).
Four issues need to be discussed concerning the Johansen/Euler computation,

Is F1 (V I) invertible? First, is it legitimate to assume that F1 (V I) is invertible? From

the implicit functions theorem, 15 we know that the existence of F1-1 (V l) is a necessary
and sufficient condition for the existence of a unique m-vector function G satisfying

F(G(V2), 1/2) = 0 (2.16)

for all 1/2 in a neighbourhood of V2I. Consequently, if F1 (V x) is singular, then our

model either contains no answer to the question of how 1/1 is affected by variations in
V2 in the region of V2I, or it contains multiple answers. In either case, failure to be able
to apply (2.14)-(2.15) would not be the fundamental problem. Rather, our problem

14Equations (2.14)-(2.15) describe the computations made by Johansen (1960).

15See, for example, Apostol (1957, pp. 146-148).
14 P.B. Dixon and B.R. Parmenter

would be either that there were contradictions within our model or that the model was
under-specified to answer the question at hand. We can conclude that the assumption
that F1-1 (V I) exists does not limit the applicability of our computing method: the
assumption will be valid if we are attempting to compute the answer to a question to
which our model has a unique solution.

Approximation errors. The second issue concerns approximation errors. The matrix
B(V ~) shows the partial derivatives, evaluated at V I, of the endogenous variables (1/1)
with respect to the exogerrous variables (V2). That is, B ( V I) is the Jacobian matrix of
the solution function, G. Thus, for a given movement in the exogenous variables, the
valuation of dV1 via (2.14) provides only a first order approximation to the effects on
the endogenous variables implied by the model (2.12). Where (dV1)true is the exact
vector of effects implied by the model and (dV~)(.1) is the set of effects computed
via (2.14), we have

(dVl)true=G(Vl 4- dV2) - G(V2I)

= B ( V I) dV2 + HOT
= (dV1)(. U 4- HOT. (2.17)

HOT is the vector of values of higher order terms in a Taylor's series.

If dV2 is small, then (dV~)(.1) will be a good approximation to (dV1)true, i.e., the
components of HOT will be small. But what do we do if dV2 is not small? In any
case, how can we tell whether dV2 is small enough for (dVl)(.l) to be a satisfactory
approximation to (dV1)true?
One way to answer these questions is to make a multi-step Johansen/Euler compu-
tation. For example, we can compute the effects of the change, dV2, in the exogenous
variables in two steps rather than one. In the first step, we compute

(dV1)(1,2) =/3(1,2) (½ dV2). (2.18)

The subscripts (1,2) refer to the first step of a two-step computation. (dVl)(l,2) is our
approximation to the effect on the endogenous variables of a vector of shocks of half
the size of those in which we are ultimately interested. B0,2) is the/3 matrix used in
the first step of the two-step computation. It is the same as t3(V l) defined in (2.15).
Having computed (dgl)(1,2), we re-evaluate the B matrix as

= - F ( -1 (vo,2))&(v(,,2)) (2.19)

where V0,2) is the vector of values of the variables at the end of the first step of the
two-step computation. That is

v 1,2) -- I v ? + (dV,)v,2), V2' + ½(dV2)]. (2.20)

Ch. 1: Computable General Equilibrium Modelling 15
Slope = B ( V I ) = B(1,2 )

Yl + (dV1)('l) .............................................................
~Error in one-step computation " * < e/ i Slope = B(2,2)
VI1+ (dV1)(°2) ..........................................................
"1" ~- lb
VI+ (dV1).... ~ .~7.... in two-atep computation , VI= G(V2 )
...................................................... la
v[ + (dV0(La)


'I I '! I ' V2

Figure 1.1. Johansen/Euler solutions.

Now we can compute

(dV1)(2,2) = B(2,2)(½ dV2), (2.21)

i.e., we compute the effect on the endogenous variables of the remaining half of the
shocks to the exogenous variables.
Finally, we compute our two-step approximation to (dV1)t~e as

(dgl)(.2) = (dV1)(1,2) + (dVi)(2,2). (2.22)

in many (perhaps most) general equilibrium models the solution functions, G, are
well approximated by quadratic functions over variable ranges relevant to simulations.
In these cases, we find that the two-step computation of dVl involves about half the
error of the one-step computation. That is we find that

(dV1)(.2)- (dV1)tme "~ 1 [(dV1)(.1) -(dV~)true]. (Z23)

This is not an appropriate place to offer a rigorous justification of (2.23). 16 Instead

we offer a diagram and some suggestive algebra.
The diagram (Fig. 1.1) illustrates a 2 variable case in which we are concerned with
the effects on the endogenous variable (1/1) of moving the exogenous variable (172)

16For a complete discussion of results such as (2,23), see Dahlquist, Bjorck and Anderson (1974, pp.
16 P.B. Dixon and B.R. Parmenter

from V2I to V2t + dV2. We assume that the form of G is unknown but that we do know
the initial solution (1/1l, V2I) and also how to evaluate derivatives of G, e.g. via (2.14)-
(2.15). When we use a one-step Johansen/Euler calculation to compute the effect on
V1 of moving V2 from V2I to V2l + dV2, we obtain the answer (dVl)(.1), having an error
of ac. When we use a two-step computation, the error is reduced to ab. Notice that
with the G function drawn approximately quadratic as in our diagram, the two-step
error, ab, is approximately half the one-step error, ac.
Now we turn to the suggestive algebra. We return to the m-equation-n-variable
case and we assume that each of the m endogenous variables is a quadratic function
of the n - m exogenous variables. For the jth endogenous variable, we have

V l ( j ) = a j ( v 2 ) = aj + b¢V2 + ~1 V!,,-~
2 w j v17
2, (2.24)

where a j, bj and Qj are parameters with aj being a scalar, bj being a vector of length
n - m and Qj being a symmetric matrix of size (n - m) x (n - m).
The vector of first-order partial derivatives of Gj, i.e. the jth row of the Jacobian
matrix of G, is given by

! !
Bj (v2) = bj + V/%. (2.25)

We assume that we can evaluate Bj correctly at each step of a Johansen/Euler

procedure. 17 Then we find that the one- and two-step errors in the evaluation of
the effect on the jth endogenous variable of a movement in the exogenous variables
from V2I to V21 ÷ dV2 are as follows:

Error (one-step) = (dVl(j))(.l) - (dVl (j))true,

= B j ( V I) dV2 - [Gj(V21 + dV2) - Gj (V2I)],
= - ½(dV2yQj(dV2); (2.26)


Error (two-step)= (dV1 (J))(.2) - (dV1 (j))n-ue

= Bj (V2[)(½ dV2) + B j ( V ~ + -~
' dV2) (½ dV2)
- [ G j ( V 2 I + dV2) - Gj (V2')]
= - ¼(dV2)'Qj (dV2). (2.27)

17With Euler's method, we introduce small errors into the evaluation of the B matrix through errors in
the evaluation of the endogenous variables. Nevertheless, this does not normally invalidate approximations
such as (2.23). For a detailed mathematical discussion of Euler's method in the context of CGE modelling,
see Dixon, Pan-nenter, Sutton and Vincent (1982, pp. 235-244).
Ch. 1: Computable General Equilibrium Modelling 17


Error (two-step) = ½Error (one-step).

More generally, we can show that if G j is quadratic, then when we double the number
of steps, we halve the error, i.e.

Error (2'r-step) = ½ Error (r-step). (2.28)

Findings such as these suggest that simple extrapolation procedures may produce
accurate evaluations of (dVl)true based on just one- and two-step Johansen/Euler com-
putations. For example, using (2.23) we find that

(dV1)true -~ 2(dV1)(.2) - (dVl)(.1). (2.29)

The use of the right-hand side of (2.29) to evaluate dVl is an example of the application
of Richardson's extrapolation. 18 Our experience has been that such extrapolations are
highly effective in producing accurate simulation results in large models, using only
a small number of Johansen/Euler steps.
What if the solution functions, G, are not well approximated by quadratic func-
tions? Then we may require tour or even eight-step Johansen/Euler computations.
However, over a long period, working with many different models, we have found
very few occasions in which it has been necessary to go beyond a two-step compu-
tation supplemented by an extrapolation.

Convenience: deriving the differential form. The third issue which we will consider
in relation to the Johansen/Euler method is convenience. Is it difficult and time-
consuming to do the total differentiation involved in taking a model from its initial
form (2.12) into a differential form such as (2.13)?
In implementing the Johansen/Euler method, we have found it convenient to deal
mainly with percentage changes in variables rather than changes. 19 That is, instead of
solving the system (2.13) for ( d ½ ) , we solve the system

+ = 0, (2.30)

18See Dalquist, Bjorck and Anderson (1974, pp. 269-273).

19For variables which pass through zero (e.g. the balance of trade), the percentage change form is not
appropriate. For such variables, we continue, in the differential versions of our models, to use changes.
Thus, in reality, the variables in systems such as (2.30) are usually a mixture of percentage changes and
changes. There may even be a few levels variables. Nevertheless, we will, for ease of exposition, retcr to
all the variables in (2.30) as though they are percentage changes.
18 P.B. Dixon and B.R. Parmenter


Vl = B ~ ( V l ) v 2 (2.31)

where vl and v2 are vectors of percentage changes in the variables in V1 and V2, and

F?(V : FI(v )P(, (2.32)

F ~ ( V I) = /72(VI)%I, (2.33)


S * ( V I) : --/~i* - l ( V I ) F ~ ( V I ) . (2.34)

Vl~ and ?/I are diagonal matrices formed from VIl and V2I.
As we will see later in this section and in Section 3, the components in the Fl* and
F2* matrices are often easy to interpret as cost and sales shares which can be evaluated
as either column or row shares from input-output tables. A second advantage of
the percentage-change version of the differential form is that the components of the
solution matrix, B*, are elasticities: the i, j t h component of B* (V I) is the elasticity
of the ith endogenous variable with respect to the j t h exogenous variable evaluated
at the initial solution. Economists normally prefer to work with elasticities rather than
with derivatives which depend on the units in which variables are measured.
Going from the levels representation, (2.12), of a model to a differential, percentage-
change representation, (2.30), usually involves the application of only the three
rules shown in Table 1.1. After some practice, application of these rules becomes

Table 1.1
Rules for deriving the percentage-change version of a model

Representation in:
levels percentage changes
multiplicationrule X = YZ ~ x = y +z
power rule X = yc~ ~ x = ay
addition rules X = Y + Z ~ Xx = Yy + Zz
or x = S y y + S z z
X, Y and Z are levels of variables, x, y and z are percentage changes, c~ is a
parameter and Sy and Sz are shares evaluated at the current solution.In the first
step of a Johansen/Eulercomputation,the current solutionis the initial solution.
Hence Sv = y I / X I and Sz = Z I / X I. In subsequent steps, Sv and Sz are
recomputed as X, Y and Z move away from their initial values.
Ch. 1: Computable General Equilibrium Modelling 19

quite straightforward and the resulting percentage-change representations are often

more readily understood and interpreted than the corresponding levels representa-
In the case of demand and supply functions derived under the assumption that
economic actors are optimizers, we can usually derive the percentage-change repre-
sentation without bothering about the specification of the levels representation. For
example, in most general equilibrium models, we assume that the demands for inputs
( X l j , X 2 j , •. •, X n j ) by producer j are chosen to minimize the costs of producing a
given level of output. A popular specification for the production function is CES, 2°
giving a cost minimizing problem of the form:

where the P~ are input prices, Zj is the level of output and Aj and bij are positive
parameters with the bs summing to 1. pj is parameter with value greater than - 1 but
not precisely zero. 21

The Lagrangian conditions for a solution of problem (2.35)-(2.36) are

together with the production function constraint (2.36). From here, we can eliminate
the Lagrangian multiplier, A j, from the system (2.36), (2.37), eventually arriving at a
representation of the input demand functions which could appear in a levels version
of a CGE model:

2°The CES production function was introduced by Arrow, Chenery, Minhas and Solow (1961). For the
derivation of percentage-change forms of demand and supply equations arising under different specifications
of production, utility, transformation and unit cost fimctions (the duality approach), see Dixon, Patanenter,
Powell and Wilcoxen (1992, pp. 124-148).
21AS pj approaches zero, (2.36) approaches the Cobb-Douglas form.
20 I~B. Dixon and B.R. Parmenter

At this stage we can apply the three rules in Table I. 1 to (2.38) to obtain a percentage-
change representation. Alternatively, we could move directly to a percentage-change
representation by applying the rules in Table 1.1 to (2.37) and (2.36) giving

pk=)~j+(l +pj)(~4 Sijxij)--(l+pj)xkj, k= 1,...,n, (2.39)


zj = E Sijx,ij, (2.40)

where pk, Aj and x# are percentage changes in the variables denoted by the corre-
sponding upper-case symbols, and

S t j = ( b t j X t j-eJ ) / ( ~ b i j X i j-pJ) , for t=l,.. ,n. (2.41)

By multiplying (2.39) through by Ski and summing over all k, we obtain

)~J = EPtStj" (2.42)


Substituting (2.40) and (2.42) into (2.39) and rearranging gives a representation of
the input demand functions which could appear in a percentage-change version of a
CGE model:


where ~Tj is the elasticity of substitution between inputs and is given by ~Tj =
1/(1 + pj).
In interpreting (2.43), we start by noting that (2.41) and (2.37) imply that the Ss
are cost shares, i.e.

Stj = P t X t J / E P k X k j , t = 1,...,n. (2.44)


Now we can interpret (2.43) as follows. Reflecting the assumption of constant returns
to scale underlying (2.36), (2.43) implies that in the absence of price changes producer
j's demands for all inputs will change by the same percentage as its output. If the
price of input k rises relative to a cost-share-weighted average of the movements in
all input prices, then producer j will substitute away from input k, i.e. producer j's
Ch. 1: Computable General Equilibrium Modelling 21

demand for k will rise less quickly than output. The strength of this price-substitution
effect will depend on the value of the substitution parameter crj.
Not all equations from CGE models are more simply represented in percentage
changes of variables than in levels. Some equations (e.g., those specifying total tax
collections as the sum of collections of many different types of taxes) are straight-
forward summations in their levels representation. In their percentage-change repre-
sentation, they involve some clumsy notation to represent various share coefficients
(e.g., the share of total tax collections accounted for by the tax on the use of domes-
tically produced good i by industry j). With recent versions of GEMPACK, 22 users
can represent some of their equations in percentage-change [brm and some in levels
form. The programs do the algebra to convert levels equations into differential forms
before proceeding with the computations.

Inequalities and complementary slackness conditions. The final issue to be consid-

ered in relation to the Johansen/Euler method is the treatment of inequality and com-
plementary slackness conditions. For example, what can we do if our model contains
relationships such as

/~ ~< T, (2.45)

I 7> 0, (2.46)


I=0 if R<T? (2.47)

R, T and I are variables. They can be thought of as an industry's rate of return (R);
its required or target rate of return (T); and its level of investment (I). Under (2.45),
industries expand their capital stocks so that rates of return never exceed the target
rates. Under (2.46), investment cannot be negative, and under (2.47), investment will
be zero if the rate of return is below the target rate.
One approach to handling models containing relationships such as (2.45)-(2.47)
is to solve a sequence of linear complementarity problems (LCPs). Assume that the
original model can be written as:

f(:c) ~ 0 , z'f(cc)~-O, z)O,

where z is the vector of endogenous variables. (Here we assume that the exogenous
variables are part of the function f.) As described by Mathiesen (1985), we replace f
by a linear approximation (e.g. a first-order Taylor approximation), thereby converting

22See Harrison, Pearson, Powell and Small (1993).

22 P.B. Dixon and B.R. Parmenter

our model into an LCR The solution to the LCP can be used in making a new linear
approximation to f . After solving a sequence of LCPs, we can expect to arrive at an
accurate solution to our original model.
In the Johansen/Euler framework, Horridge and Malakellis [see Malakellis (1992)]
have used the following approach. First, they rewrite (2.45)-(2.47) as

R + S = T, (2.48)


rain{I, S} = 0, (2.49)

where S is a nonnegative slack variable. Then they include (2.49) in the 9th step of
an n-step Johansen/Euler computation as

D(a,n)(l(9,n)+ (dl)(a,~)) + (i - D(g,n))(S(g,n) + (dS)(g,n)) : 0, (2.50)

where D(g,n ) is a coefficient defined by

D(g,n) = 1 if I(g,~) < S(g,n), (2.51)


D(g,~) = 0 if !(~,n) ~> S(~,,.). (2.52)

To see how (2.50)-(2.52) works, assume that we are conducting an n-step com-
putation of the effect of a 100 per cent reduction the tariff protecting an industry's
domestic market from import competition. Assume that in the initial situation (i.e.,
before the tariff reduction), investment in the industry is positive, i.e.,

I0,~ ) > 0,

implying that

S0,n) = 0 and R(1,n ) = T0,n).

With I(1,n) > S0,n), we have D 0,n) = 0. Thus, in the first step of our computation,
(the effect of reducing protection from its initial level to ( n - 1 ) / n times that level),
(2.50) reduces to

(dS)(1,n) = O.
Ch. 1: Computable General Equilibrium ModeUing 23

This means that R will continue to equal T, i.e.,

(dR)(l,n) = (dT)0,n).

With lowered protection, we would expect our model to imply that the industry's
rate of return can be maintained at T only with a smaller capital stock and reduced
investment, i.e., we expect

(dI)o,n) < 0.

If - ( d I ) 0 , n ) < I(l,n) so that I(2,n) > 0 and D(2,n) = 0 (0 = S(2,n) < I(2,,~)), then in
the second step of our computation we would continue to fix R to T and we would
continue to allow I to decline as we simulated the effect of a further reduction in
protection. If I stays nonnegative over the n steps of our computation, i.e., if

I(g,,~) ) 0 for 9 = 1 , . . . , n + 1, where I(n+l,n ) is the final value of I,

then S will stay at zero, R will remain equal to T and our final result will be
compatible with relations (2.45)-(2.47).
Now assume that in the (9 - 1)th step, g - 1 < n, investment becomes negative,

I(g-l,~) > 0 and S(9_1,n ) = 0


-(dI)(g--1,n) > I(g-l,n),

so that

I(g,n) < 0 and S(g,n) = 0. (2.53)

Under (2.53) we will have D(g,,~) = 1, reducing (2.50) to

(dI) (g,n) = - I(g,,~). (2.54)

Hence, in the 9th step of the computation, investment will be nudged back to zero
and S will be free to move (i.e., R will no longer be fixed to T). Because in the
9th step we are both reducing protection and forcing investment to increase, we can
expect (dR)(9,~) to be negative. Assuming that T is fixed, this implies that

(dS)(g,n) > O.
24 P.B. Dixon and B.R. Parmenter

In the (9 + 1)th step of our computation, we will have

I(g+l,,~) = 0, S'(g+l,n) > 0 and D ( g + l , n ) = 1.

Thus, investment will stay at zero while we can expect S to increase as we implement
further reductions in protection. With the completion of our computations (i.e., with
the simulation of a 100 per cent reduction in protection) we would expect to arrive at
a solution, compatible with (2.45)-(2.47), in which I(~+l,n) = 0 and R(n+l,n) < T.
Horridge and Malakellis have found that their method allows the Johansen/Euler
approach to be implemented quite easily via GEMPACK in models containing a small
number of inequality constraints. Unfortunately they find that extrapolation procedures
(e.g., Richardson's extrapolation) are no longer effective. In applying their method,
we also need to exercise care to ensure that the differential form of the model stays
well defined when, during the computations, variables stray into illegitimate regions
(e.g., negative investment).
In our own applications of the Johansen/Euler approach we have usually avoided
running up against nonnegativity conditions and other inequality constraints by us-
ing: utility functions implying large marginal utility lbr any commodity consumed
at close to the zero level; production functions implying large marginal products for
any input close to the zero level; and investment specifications implying reductions
in required rates of return as investment levels approach zero. Nevertheless, Horridge
and Malakellis have shown that model builders wishing to use the Johansen/Euler
approach should not feel compelled to eschew theoretical specifications involving a
few inequality constraints.

2.3. Solving a multi-period model

We consider four cases, each concerned with a model in which capital stocks available
for use in year t + 1 are determined by investment which takes place before year t + 1
In Case 1 investment is exogenous. In Case 2, investment and capital accumulation
in year t + 1 depend on expected rates of return for year t + 2, which we assume are
determined by actual returns to and costs of capital in year t + 1. In Cases 1 and 2,
the models are recursive, i.e. they can be solved for year 1 and then for year 2 and
SO o n . 23
In Case 3 we assume that expected rates of return for year t + 2 are the actual rates
of return for year t + 2. That is, we assume that expectations are rational or model

23Until recently, nearly all multi-period CGE models were recursive. Leading examples of recursive
models are Hudson and Jorgenson's (1974) energy model for the US and the Norwegian model, MSG-4
documented in Longva, Lorentsen and Olsen (1985).
Ch. 1: ComputableGeneralEquilibriumModelling 25

consistent. Under this assumption, the model is no longer recursive. 24 Relative to the
recursive models in Cases 1 and 2, solution of our Case 3 model requires a more
sophisticated computational approach. The approach we describe is a Johansen/Euler
method for handling the computations for all of the years simultaneously.
In Case 4, the behavior of investors is explicitly optimizing. We continue to as-
sume model consistent expectations. The solution method described for Case 3 is still
applicable. However, we can also use various shooting methods.

Case 1. Exogenous investment, a recursive model. We start with a model of the


(t), c/2(t), O,(t), n(t), ±(t), K(t - 1)) = o, (2.55)

t = 1,2,...,T,


K(t) = ( I - D ) K ( t - 1 ) + l(t), t = 1,2,...,T, (2,56)

O(t) is a vector giving industry rentals or profits per unit of capital in year
t (Qj(t) is the rental per unit of capital in industry j);
l~(t) is a vector giving the costs in year t of constructing units of capital for
the different industries;
I(t) is a vector of investment levels in year t for the industries;
K ( t - 1) is a vector of industry capital stocks at the end of year t - 1 and
available for use during year t;
D is a diagonal matrix of depreciation rates;
V, (t) and V2(t) are other variables for year t. ~ (t) is the vector of endogenous
variables such as domestic prices and outputs and V2(t) is the vector
of exogenous variables such as world commodity prices, taxes and
technological coefficients. ~ (t) and V2(t) could have been defined to
include K ( t - 1), Q(t), H(t) and I(t). However, these latter variables
have important roles in our description of multi-period modelling and
we prefer to represent them explicitly.
For any given value of t, say t = r, Eq. (2.55) specifies a typical one-period CGE
model. It imposes conditions such as demands equal supplies, prices equal costs and

24An early example of a non-recursive CGE model is Dervis (1975). More recent examples include
Ballard and Goulder (1985), Goulder and Summers (1989), Bovenbergand Goulder (1991), Jorgenson and
Wilcoxen (1994) and Mercenier and Sampaio de Souza (1994).
26 P.B. Dixon and B.R. Parmenter

demands and supplies are consistent with optimizing behaviour by various economic
actors. K(T - 1), capital availabilities in year ~-, can be thought of as a vector of
exogenous or pre-determined variables in the year-T CGE model.
Equation (2.56) says that capital available for use in industry j in year t + 1 [i.e.,
K j (t)] equals capital available in year t depreciated at rate Dj [i.e., ( 1 - D j ) K j (t-1)]
plus investment in year t [i.e., Ij(t)]. Figure 1.2 illustrates the timing of events.

year t [ year t + 1 year t + 2
Q(t) I Q(t+ 1) Q(t+2)
K(t - 1) H(t) K(t) H(t+ 1) K(t+ 1) //(t+2) K(t + 2)
I(t) I(t+l) I(t+2)
V(t) V(t+ 1) V(t+2)

Figure 1.2. Timing in the multi-period model.

We assume that the year-~- model contains no theory of investment, but that if I(~-)
is set exogenously, then the year-r model [together with predetermined values for
K ( 7 - - 1) and exogenously given values for V2(~-)] is sufficient to determine the other
variables for year ~-: Q(~-), H(T) and ~ (T). This means that if we know K(0) and we
have an exogenously specified timepath for investment {I(1), I ( 2 ) , . . . , I(T)}, then
model (2.55)-(2.56) can be solved as a series of one-period CGE computations. First
we use (2.56) to compute the time path for capital stocks {K(1), K ( 2 ) , . . . , K ( T ) } .
Then given V2(~-), we can, in principle, compute ~ (T), Q(T) and H(~-) by solving
the one-period CGE model specified by (2.55) with t = 7-.
To do these computations we can use the Johansen/Euler approach discussed in
the previous subsection. Recognizing that (2.55) holds in each year, we see that
growth rates from year t to year t + 1 satisfy, to a first-order approximation, the

H l ( t ) '/]1(t ÷ 1) + H2(t ) ,o2(t ÷ 1) + Hq(t) q(t + 1) + H~(t) 1r(t + 1)

+Hi(t) i ( t + l ) + H k ( t ) k ( t ) = O , t: 1,2,...,T-1. (2.57)

Equation (2.57) is a percentage change version of (2.55) with the coefficients (H~,
u = 1,2, q, % i and k) evaluated at the solution for year t, i.e., the H~s are evaluated

v(t) = %(t), Q(t), u(t), ±(t), K(t - 1)) (2.58)

The variables denoted by lower-case symbols in (2.57) are percentage growth rates
in the corresponding upper-case variables. For example, q(t + 1) is the vector of
Ch. 1: Computable General Equilibrium Modelling 27

percentage growth rates between years t and t + 1 in rentals, i.e.

qj(t + 1) = lO0(Qj(t + i) - Qj(t))/Qj(t).

Consistent with our discussion of the Johansen/Euler method in Subsection 2.2, the
lower-case symbols in (2.57) can also be interpreted as percentage deviations from
an initial solution for the year-(t + 1) model. This initial solution is V(t) given by
Under our assumption that the year-(t + 1) model (together with K(t), I(t + 1)
and V2(t + 1)) is sufficient to determine Q(t + 1), H(t + 1) and Vl(t + 1), we can
rearrange (2.57) as

vl(t+l)=B(t)v2(t+l), t= 1,...,T- 1, (2.59)


v'l(t+l)= [5'l(t+l),q'(t+l),~r'(t+l)], t--=l,...,T-1, (2.60)

v~(t+l)= [5~(t+l),i'(t+l),k'(t)], t=l,...,T-1, (2.61)


B(t) = -[Hl(t), Hq(t), -1 [H2(t), Hi(t), Hk(t)],

t= 1.

With the time paths for investment in each industry given exogenously, we can
easily compute i(2), i ( 3 ) , . . . , i(T), and with K ( 0 ) known we can use (2.56) in com-
puting k(1), k ( 2 ) , . . . , k(T). Finally, we assume that our input-output data and other
data for the base-period give us a solution to (2.55) for t = 1, i.e., we assume that
V(1) is known.
We can now proceed recursively. Using V(1) we can evaluate B ( I ) . Then from
(2.59), we can compute v1(2). 25 Next we compute V(2) by using formulae of the

V(J)(2) = V(J)(1)(1 + v(J)(2)/lO0), (2.63)

where v(J)(t) is the value of the jth variable in year t. With V(2) in place, we can
evaluate B(2) and compute vt (3), and so on.

25Alternatively, vl (2) could be evaluated by a multi-step Johansen/Euler computation.

28 P.B. Dixon and B.R. Parmenter

Case 2. Endogenous investment but still recursive. Investment depends on rates of

return. As the first step in moving towards a multi-period model with endogenous
investment, we add to our previous model [(2.55)-(2.56)] the following definition of
the rate of return in year t 4- 1 on capital in industry j:

R j ( t + 1) = Qj(t + 1)/(1 + r) - Hj(t) + Hj(t 4- 1)(1 - D j ) / ( 1 + r)

Hi(t) , (2.64)

for a l l j a n d f o r t = 1,...,T-I,

where r" is the rate of interest, which we will treat as a parameter. In this definition,
we assume that an outlay of Hj (t) in year t buys a unit of capital ready for use in
year t + 1. This earns a rental in year t + 1 of Qj (t + 1). The unit of capital depreciates
at the rate Dj and can be sold in year t + 1 for Hj(t + 1)(1 - Dj). In other words,
Rj(t + 1) is the present value in year t of investing a dollar in industry j.
Next, we add the equation

Ky(t)/Kj(t- 1 ) = Fk(t)Fkj(t)(1 4- R;(t~,t 4- 1)) ~' , (2.65)

for a l l j a n d f o r t = 1,...,T.

That is, we assume that the rate of growth of capital through year t depends positively
(c~j > 0) on the rate of return expected in year t to apply in year t + 1. The two F
variables in (2.65) are shift terms which can be used in various ways. For example,
we could set

Fkj(~) - (1 + lrrg(j)/lO0)/(1 + N R R ( j ) ) "j , (2.66)

where lrrg(j) is the long-run trend rate of growth of capital in industry j and N R R ( j )
is j ' s normal rate of return. Fk could be used to simulate the effects of an overall
(not industry-specific) change in the level of business confidence. If Fk is set at 1 and
R~(t,t + 1) equals N R R ( j ) , then under (2.66) capital growth in industry j will be,
in year t, at its long-run trend rate.
One theory of the expected rate of return is static expectations. We take this to
mean that

R;(t,t + 1) - Qj(t) (1 + i n f ) (1 + inf)(1 - Dj)

1+ , (2.67)
j(t) l+r

~br all j and for t = 1 , . . . , T ,

Ch. 1: Computable GeneralEquilibrium Modelling 29

where inf is the rate of inflation. In deriving (2.67), we wrote out the formula for
Rj (t + 1) with rental and price variables for year t + 1 replaced by their levels in t
multiplied by (1 + inf). That is, we assumed that expectations concerning year t + 1
are formed in year t by inflating all nominal variables by the general rate of inflation.
By assuming that r = inf, we can simplify (2.67), obtaining

R~(t, t + 1) = ( Q j ( t ) / H j ( t ) ) - Dj, (2.68)

for all j and for t = 1 , . . . , T .

An advantage of expectations assumptions sucb as (2.67) or (2.68) is that they

give us a model with endogenous investment while still allowing us to solve recur-
sivety, applying the Johansen/Euler technique. To demonstrate this, we add to the
Johansen/Euler system (2.57) the following:
l~j(t 4. 1) - hi(t)
= fk(t 4- 1) 4-.fkj(t + 1)
+ a j ( Q j ( t ) / ( Q j ( t ) + (1 - D i ) I I 3 ( t ) ) ) ( q j ( t + 1) - ~rj(t + 1)), (2.69)

for all j and for t = 1 , . . . , T - 1


:j(t)kj(t + 1) = (1 - D j ) K j ( t - l)kj(t) + + 1), (2.70)

for a l l j a n d f o r t = 1,...,T- 1.

Equation (2.69) is a percentage-change version of (2.65) incorporating (2.68), and

Eq. (2.70) is a percentage-change version of (2.56). In this expanded Johansen/Euler
system [i.e., (2.57) and (2.69)-(2.70)], the variables are 51 (t 4. 1), 52(t 4. 1), q(t 4. 1),
7 @ + 1) and i ( t + 1) f o r t = 1 , . . . , T - 1; and k ( t + 1) f o r t = 0 , . . . , T - 1. All of
these are vectors of growth rates connecting years t and t 4- 1.
For t = 1, the addition of (2.69)-(2.70) expands the original system, (2.57), by 2h
equations where h is the number of industries. The expanded system for t - 1 also
contains 2h + 1 new variables: k 9 (2), fkj (2) and fk (2). Assuming that the f s are set
exogenously, the expanded system for t = 1 can now determine growth rates for h
previously exogenous variables: i3(2) for j - 1 , . . . , h. After solving the expanded
system at t = t, we can, as before, compute V(2). Then we can set t = 2 and
solve the expanded system [(2.57), (2.69)-(2.70)] for growth rates in the endogenous
variables for year 3, and s o o n . 26

26111 making these computations, we need to be clear about initial solutions. In the computationsreported
for our illustrative model in Section 3, the initial solution for year (t + I) is the set of values for the variables
30 P.B. Dixon and B.R. Parmenter

In our illusta'ative model in Section 3, we adopt Eqs (2.69)-(2.70). However, rather

than exogenizing fk (t + 1) for all t, we either exogenize aggregate investment in each
period or fix aggregate investment in relation to aggregate consumption. For period
+ 1, fk(t + 1) is determined endogenously to ensure sufficient growth in capital
stocks through year t + 1 to absorb the given aggregate level of investment.

Case 3. A non-recursive multi-period model. An alternative to (2.68) is

R~(t,t+l)=Rj(t+l), foralljandfort=l,2,...,T. 27 (2.71)

This is the assumption of model consistent or rational expectations. With (2.71)

replacing (2.68), we write (2.65) as
K j ( t ) / K j ( t - 1) -- Fk(t)Fkj(t)(1 + Rj(t + 1)) ~j , (2.72)
for all j and for t = 1 , . . . , T ,
and replace (2.69) by
kj(t + 1) - kj(t) = fk(t + l) + fkj(t + 1)
+ c~j ( R j ( t + 1)/(1 + R j ( t + 1)))rj(t + 2), (2.73)
for a l l j a n d f o r t = 1. . . . , T - l ,
where rj (t. + 2) is the percentage change in industry j ' s rate of return between years
t + l and t + 2.
Now with investment endogenous, we no longer have a recursive model. Before
we can work out the growth rates connecting years 1 and 2, we need to know rj (3).
But this depends on qj(3) and 7rj(3) [see (2.64)]. Values for these variables cannot
be found until we work out the growth rates connecting years 2 and 3. To work out
the growth rates for year 3, we need to know rj (4). But this depends on q:/(4) and
~rj (4), and so on.
CGE computer packages such as GEMPACK (Pearson, 1988) can handle linear
systems containing millions of equations and variabtesY Using the Johansen/Euler

in year t. In particular, in our year-(t + 1) computation, the initial values for K(t) and K ( t + 1), i.e., the
beginning and ending capital stocks in year t + 1, are given by the begining and ending capital stocks in
year t, i.e., K(t)initial - /£(t - 1) and K ( t + 1)initial = K(t) = (1 - D)K(t - 1) + I(t). With this initial
solution for year (t + 1), our percentage-change answers from the Johansen/Euler computation retain their
interpretation as both deviations from a base-case solution and as growth rates through time. For example,
k(t + 1) is the percentage deviation in K(t + 1) from its initial value, /(-(t). Thus, k(t + 1) is also the
growth in capital through year t + 1.
27This equation calls for a value of Rj (T + 1) which is beyond the range of (2.64). In applications of
a model such as the one we are describing, we would expect to set the Rj (T + 1)s exogenously to reflect
the assumption that in the long-run, rates of return settle down to normal levels.
2SGEMPACK uses sparse matrix methods. It also has convenient facilities for reducing the size of a
linear system by using some of the equations to eliminate some of the variables. For a discussion of this in
the context of a large-scale CGE model, see Dixon, Parmenter, Sutton and Vincent (1982, pp. 207-229).
Ch. 1: Computable General Equilibrium Modelling 31

method, we can solve very large, one-period CGE models. This suggests that we
can overcome the non-recursivity problem in multi-period models by using a Jo-
hansen/Euler approach with all years treated simultaneously. 29
To describe this approach, we start by representing a multi-period CGE model as

F(V(1), V(2), ..., V ( T ) ) = o, (2.74)

where V(t) is the vector of variables applying to year t.

All the equations of the model are included, i.e. (2.74) includes the equations linking
contemporaneous variables (e.g. demands at time t equal supplies at time t) and the
equations, such as (2.56), (2.64) and (2.72), linking variables from different times.
Providing that we have an initial solution

V I = (VI(1),VI(2),...,VI(T)),

then we can use the rules from Table 1.1 to form a percentage change version of

F*(V~)v : 0, (2.75)

where F * is the Jacobian matrix of F evaluated at V l and multiplied by ~1, and v is

the vector o f percentage deviations in variables from their initial values. (It is worth
emphasizing that components of v are not growth rates through time.)
Once we have system (2.75), we can divide the variables into endogenous and
exogenous sets. Then we follow the steps described in (2.30)-(2.31) to compute, for
example, the effect on output and employment in the footwear industry in all years
of an anticipated change in the tariff in year t. 3°
How do we find an initial solution, VI? Unlike the situation in a one-period model,
we cannot, for a multi-period model, simply read a solution from our input-output
In some models, it is easy to find a steady state or a balanced-growth path: i.e. we
can find a solution o f the form

g I ~__ ( V I ( t ) , g V I ( 1 ) , 02VI(1),..., 07'-1Vl(1)), (2.76)

where ~ is a diagonal matrix (possibly the identity matrix) of growth factors. 31 How-
ever, not all models have a solution of the form (2.76). In any case, a solution of

29To our knowledge, this potential was first recognized by Bovenberg (1985) and Wilcoxen (1985 mid
3°In models with rational expectations [i.e. with expectation assulnptions such as (2.71)], any change in
tariffs in year t is anticipated. It affects behavior in years t - 1, t - 2,... as well as in years t, t + 1, . . . .
31For example, Bovenberg (1985) uses this method.
32 t~B. Dixon and B.R. Parmenter

this form may be rather far from a realistic solution around which we would want to
compute deviations.
An alternative approach to finding a V ~ involves an initial recursive simulation,
followed by a correction. 32 For example, consider the model specified by (2.55),
(2.56), (2.64) and (2.72). Initially, we delete (2.72) from the system and solve the
reduced system with [ I ( 1 ) , I ( 2 ) , . . . , I ( T ) ] set exogenously. This solution can be
made recursively, as described in Case 1, using data for the initial year and a series
of one-period Johansen/Euler computations.
After completing the initial recursive computation, we will have found a solution
( V I a ( I ) , . . . , V~(T)) to the reduced system. This includes values for R j ( t + 1), ~ =
1 , . . . , T - 1, and K j ( t ) , t = O , . . . , T . These (together with a suitable value for
R j ( T + 1)) can be substituted into (2.72) to obtain implied values for Fkj(t), ~ =
1 , . . . , T. (We assume that Fk (t) is set at one for all t.)
At this stage, we have a solution VIl (an initial, initial solution) to the lull system
(2.56), (2.57), (2.64) and (2.72). Using V H, we can set up a percentage deviation
version of the full system:

/~* (VII)'/) ~ 0. (2.77)

The last step is to run a correction simulation using (2.77). In this simulation, we
include percentage changes in investment [ i j ( t + 1), t = 1 , . . . , T - 1] among the
endogenous variables while the fkj (t + 1), t = 1 , . . . , T - 1, are among the exogenous
variables. The correction simulation consists of shocking the F k j s from their values in
V n to realistic values, e.g. those specified in (2.66). After adjusting VII for the effects
of moving the Fkj s, we arrive at V I. This is a solution to the full model containing
economically sensible relationships between the paths of capital stocks and rates of

Case 4. A non-recursive multi-period model with optimizing investment behavior. In

Section 1, we claimed that a strength of C G E modelling is its reliance on optimizing
theories of behavior by different economic actors. Equation (2.72) rests uneasily with
this claim. It was not derived from any explicit optimizing specification.
An optimizing specification which is often used in the derivation of investment
equations for multi-period CGE models is as follows: 33 industry j chooses l j ( t + 1)

32This method was suggested by Mark Horridge and is related to the homotopy concept [see Zangwill
and Garcia (1981, Chapter 1)]. It has been developed and applied by his student Michael Malakellis (1992,
33The optimizationproblem is usually specifiedin continuous time with an infinite time horizon [see, for
example, Bovenberg (1985), Bovenberg and Goulder (1991) and Dixon, Parmenter, Powell and Wilcoxen
(1992, Chapter 5)]. Solution of the problem then involves the use of optimal control techniques and the
specification of transversality conditions. By adopting a discrete-time optimization problem with a finite
time horizon, we can use the Lagrangian method. Rather than imposing transversality conditions, we impose
a value on terminal capital stocks.
Ch. 1: Computable General Equilibrium Modelling 33

and Kj (t + 1) for t = 1 , . . . , T -- 1 to maximize

~-~ f Qj(t + 1)Kj(~) _ Hj(t + 1)(Ij(t + 1 ) + OI~(t + 1 ) ) \
(1 + r) T + (1 - Dj)Aj(T + l)
}Kj(T) (2.78)

subject to

Kj(t) = (1 - Dj)Kj(t - 1) + Ij(t), (2.79)

for all j and for t = 1 , . . . , T ,

with K j ( 0 ) given.
The only new symbols in (2.78)-(2.79) are 0 and Aj(T + 1). Both denote positive
parameters. The remaining notation is the same as that used earlier in this subsection.
The timing of events is also the same. That is, we assume that K j (t) can be used in
production in year t + 1 (Fig. 1.2).
Two features of the objective function (2.78) need explanation. The first is the term
0I~. This is often called a costs-of-adjustment term. 34 It makes rapid expansion of
the capital stock very costly. With no explicit recognition of risk in (2.78)-(2.79), the
inclusion of the costs-of-adjustment term plays a useful dampening role. Without it,
behavioral specifications such as (2.78)-(2.79) are inclined to imply unrealistically
large responses in investment to small changes in anticipated rentals and construction
The second feature of (2.78) which may be puzzling is the final term. This gives
units of capital a value at the end of the industry's planning period. If they were given
no value, then (2.78)-(2.79) would probably imply unrealistically low investment
levels for years close to T. We have chosen for notational reasons to represent this
terminal value by

TE- Q(1j ( T+ +r)I ) T + (1 - Dj)Aj(T + 1). (2.80)

As we will see, the use of this notation simplifies the presentation of the Lagrangian
conditions for a solution of (2.78)-(2.79).
These Lagrangian conditions are

Qj(t + 1)/(1 + r) t - Aj(t) + (1 - Dj)Aj(t + 1) - 0, (2.81)

for a l l j a n d f o r t = 1,...,T,

34The costs-of-adjustment term takes different forms in different models. The form used here is close
to that in Dixon, Parmenter, Powell and Wilcoxcn (1992, Chapter 5). Bovenberg and Goulder 0991)
use OI2/K.
34 P.B. Dixon and B.R. Parmenter

-nAt)(l + 2o±j(t))/(1 + + As(t) = O, (2.82)

for all j and for t = 1 , . . . , T,


K j ( t ) - (1 - D j ) K j ( t - 1) - I j ( t ) = 0, (2.83)
for all j and for t = 1 , . . . , T,

where the A j ( t ) for all j and for t = 1 , . . . , T are the Lagrangian multipliers 35
associated with the constraint (2.79). Notice in (2.81) that with our notational choice
(2.80) we do not have to make a special case for t = T.
How can we deal with (2.81)-(2.83) in a multi-period CGE model? To answer this
question, we consider the model formed by (2.81)-(2.83) together with (2.55). 36 As we
have done earlier, we will assume that for any given value of t, say t = T, (2.55) can
be solved for VI@), Q(7), and H @ ) in terms of V2(~-), I(~-) and K(~- - 1). We also
assume that we have a base-period solution for (2.55), i.e. we know V(1) = (VI(1),
<V2(1), Q(1), H(1), I(1), K(0)). With given values for I(1) and K ( 0 ) , we will treat
K ( 1 ) as known.
Our aim is to solve the system (2.55), (2.81)-(2.83) with investment deter-
mined endogenously in a way which is consistent with the optimizing specification

35Remember that Aj (T + 1) is a parameter of problem (2.78)-(2.79), not a Lagrangian multiplier.

36This model is the same as that considered in Case 3 [i.e. (2.55), (2.56), (2.64) and (2.72)] except that
(2.72) is replaced by

Rj(t+l)=20(lj(t) 1-DJ1+r H 'j ( tH+ l i~ )(I j (t t +) l ) )

for all j and for t = 1 , . . . ,T- 1, (2.84)


TVj = IIj (T)[1 + 20Ij(T)]/(1 + r) T-I, for all j. (2.85)

(2.84) and (2.85) can be derived by using (2.82) to eliminate Aj (t), t = 1,..., T, from (2.81) and then
calling on the definitions of the rates of return in (2.64) and the terminal value of units of capital in
(2.80). Formulation (2.84)-(2.85) not only helps to relate the present model to that studied in Case 3,
but it helps us to understand the relationship between models with and without adjustment costs. Models
without adjustment costs [e.g., Jorgenson and Wilcoxen (1992, 1994) and Malakellis (1994)] use arbitrage
equations of the form

% ( t + l) Dj 1 --
nj(t) + nj(t + 1)- 1 + r = O, for all j and for t = 1 , . . . , T - 1. (2.86)

(2.86) is what we obtain from (2.84) and (2.64) if we set 0 = 0. This implies that the models with
adjustment costs can be regarded as generalizations of the models without adjustment costs.
Ch. 1: Computable General Equilibrium Modelling 35

(2.78)-(2.79). If we happen to know the values of Aj(2) for all j, then we can do
this recursively. For year 2, we form the system:

H(~] (2), V2(2), Q(2), H(2), I(2), K(1)) = 0, (2.87)

-Hi(2)(1 + 20ij(2))/(1 + r) + Aj(2) = 0 (2.88)


Kj(2) = (1 - Dj)Kj(1) + Ij(2), for all j. (2.89)

With the Aj(2)s known, (2.88) provides the extra equation to enable I(2) to be
determined endogenously in the system (2.87)-(2.88), while (2.89) allows us to cal-
culate K(2). Once we have found all the year-2 variables from (2.87)-(2.89), then
we can move to year 3. For year 3 we have

H ( ~ (3), V2(3), Q(3), H(3), I(3), K(2)) = 0, (2.90)

--//7(3)(1 -~- 2 0 I j ( 3 ) ) / ( l @ 7")2 @ A j ( 3 ) - 0, for all j, (2.91)

Qj(3)/(l+r) 2 - A J ( 2 ) + ( 1 - D y ) A j ( 3 )=0, for all j, (2.92)


Kj(3) = (1 Dj)/~4(2) + Ij(3), for all j. (2.93)

Equations (2.91) and (2.92) provide the extra conditions to enable I(3) and A(3) to
be determined in the system (2.90)-(2.92), and K(3) can be calculated from (2.93).
Having found values for all the year 3 variables, we can move onto year 4, and so on,
With a given base-period solution, V(1), all these calculations could be carried out in
a recursive, Johansen/Euler, year-to-year, computation of the type already described
in Cases 1 and 2.
The main problem we face with a recursive approach is how to set Aj (2) for all j.
In any case, how do we know whether we have set the right values or not?
The question of whether the vector A(2) was set correctly is answered when we
do the calculations for year T. The year T calculation produces values for Aa (T) for
all j. We denote these by A5v) (T) where the superscript (g) refers to guess number 9.
That is A (g) (T) is the vector of values for A(T) obtained in a recursive calculation
36 PB. Dixon and B.R. Parmenter

based on the 9th guess of the vector A(2). If A~g)(T) equals the exogenously given
value of T ~ for all j, i.e. if

j (T) = Q j ( T + 1)/(1 + r) r + (1 - D j ) A j ( T + 1),

A(~) for all j, (2.94)

then we conclude that our 9th guess of A(2) was correct and that we have now found
a solution to the model (2.55), (2.81)-(2.83). If (2.94) is not satisfied, then we can
revise our guess of A(2), re-do the recursive calculations and hope that we manage
to satisfy (2.94) while 9 is still quite small.
The approach we have just described to solving the model (2.55), (2.81)-(2.83) is
a shooting algorithm. (We guess a value for A(2) and shoot forward, trying to hit a
terminal target value.) As explained in Dixon, Parmenter, Powell and Wilcoxen (1992,
Chapter 5, pp. 333-340), simple shooting algorithms often work poorly in economic
models. The difficulty is that small errors in the guess of A(2) can result in very large
differences between the left and right hand sides of (2.94). More success has been
achieved with multiple shooting methods 37 (where each set of recursive calculations
uses guesses of A(t) for several values of 4, not just t = 2) and with the Fair-Taylor
method 3s (where each set of recursive calculations uses guesses of A(~) for all values
of t).
In our forecasting and policy work for businesses and government departments in
Australia, we have not adopted the assumption of rational expectations. We solve a
large (112 industry) recursive model incorporating externally supplied, realistic macro
forecasts. Our approach, which is illustrated in Section 3 and discussed further in
Section 4, is an application of Case 2. However, our colleague, Michael Malakellis
(1994), has built a 13-sector, 30-period model for Australia along the lines of Case 4.
Rather than adopting shooting methods, he has preferred to use in his computations
the non-recursive, simultaneous, Johansen/Euler approach described in Case 3. Un-
der this method, all the A(t)s, Q(t)s and other variables appear in the computations
simultaneously, with A ( T + l) and Q ( T + 1) treated as exogenous variables. His ex-
perience suggests that there would be no serious computational difficulties in applying
this simultaneous method in the solution of very large multi-period models. The real
issue now is the empirical relevance of the rational expectations assumption.

3. An illustrative C G E model

In this section we describe the theory and data of an illustrative CGE model. We
show how CGE models can be used for comparative-static policy analysis and for
forecasting. The illustrative model has just three sectors. Its equation system uses only

37See Lipton, Poterba, Sachs and Summers (1982) and Roberts and Shipman (1972).
38See Fair (1979) and Fair and Taylor(1983).
Ch. 1: Computable General Equilibrium Modelling 37

simple functional forms. With the same techniques as are employed for the illustrative
model, models for real-world applications can be constructed which are much larger
and which have equation systems based on more general functional forms.
In describing the illustrative model, we begin with the input-output database (Sec-
tion 3.1). By examining this, we can set out the structure of the hypothetical economy
to be modelled. Then we proceed to the model's equation system (Section 3.2). In
Section 3.3 we describe the calibration of the equation system using the input-output
accounts, some elasticities and other data. We show how this data set constitutes an
initial solution to the model. Closure of the model is discussed briefly in Section 3.4.
Our illustrative simulations are described in Section 3.5.

3.1. Input-output database

The basic structure of the model is revealed by Table 1.2, the model's input-output
database. The columns identify the following purchasing agents:
(1) domestic producers in each of 3 industries;
(2) investors divided into 3 industries;
(3) a single representative household; and
(4) an aggregate foreign purchaser of exports.
The entries in the columns show the purchases made by these agents. Each of the
4 commodity types identified in the model can, in principle, be purchased locally
or imported from overseas. In our data there are no imports of commodity 3 and
no domestic supplies of commodity 4. The source-specific commodities are used
by industries as inputs to current production and capital formation, consumed by
households and exported. These commodity flows (in the first 8 rows of the table)
are shown at basic values, i.e., at the prices received by the sellers not those paid
by the purchasers: In the case of imports, basic values include import duties. Import
duties are assumed to be levied at rates which vary by commodity but not by user.
The revenue obtained is shown in the tariff vector labelled " ( - ) duty".
One domestically produced commodity (commodity 3) is used as a margins
service 39 which is required to transfer commodities from their sources to their users.
Commodity taxes are also payable on the purchases. The margins services and com-
modity taxes applying to the flows of domestic and imported commodities are shown
in rows 9-24 of the table. By adding the margins and commodity taxes to the cor-
responding basic commodity flows, we can compute the purchasers' values of those
As well as intermediate inputs, current production requires inputs of two categories
of primary factors: labour and fixed capital.

39This could be thought of as trade and transport services.

38 P.B. Dixon and B.R. Parmenter

Table 1.2
Input-output database for the illustrative model
Inputs to current Inputs to capital
production in formation in H'hold Exports (-)duty Total
industries industries consn sales
l 2 3 1 2 3
Domestic 1 10.00 20.00 10.00 0.25 0.13 0.62 4.00 35.00 80.00
commodities 2 15.00 15.00 10.00 3.80 1.90 9.30 55.00 10.00 120.00
3 18.00 8.00 58.00 0.00 0.00 0.00 35.00 0.00 119.00
4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

hnports 1 0.00 5.00 10.00 0.51 0.25 1.24 8.00 0.00 -4.00 21.00
2 0.00 6.00 0.00 1.52 0.76 3.72 18.00 0.00 -3.00 27.00
3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4 5.00 10.00 4.00 0.76 0.38 1.86 8.00 0.00 - 10.00 20.00

Margins on 1 1.00 4.00 3.00 0.00 0.00 0.00 2.00 6.00 16.00
domestic 2 3.00 5.00 2.00 1.27 0.63 3.10 21.00 4.00 40.00
commodities 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Margins on 1 0.00 1.00 3.00 0.25 0.13 0.62 1.00 0.00 6.00
imports 2 0.00 2.00 0.00 0.76 0.38 1.86 6.00 0.00 11.00
3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0,00 0.00
4 1.00 2.00 1.00 0.51 0.25 1.24 2.00 0.00 8.00

Taxes on 1 1.00 4.00 1.50 0.00 0.00 0.00 0.00 10.00 16.50
domestic 2 3.00 3.57 3.00 0.72 0.36 1.77 18.08 - 1.00 29.50
commodities 3 0.00 0.00 0.00 0.00 0.00 0.00 5.00 0.00 5.00
4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Taxes on 1 0.00 1.00 1.50 0.00 0.00 0.00 0.00 0.00 2.50
imports 2 0.00 1.43 0.00 0.29 0.14 0.71 5.92 0.00 8.50
3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 2.00

Labour 22.00 14.00 64.00 100.00

Capital 11.00 8.00 29.00 48.00

Total costs 90.00 110.00 200.00 10.63 5.32 26.05 191.00 64.00 -17.00 68(/.00


Domestic 1 60.00 20.00 0.00

commodities 2 30.00 90.00 0.00
3 0.00 0.00 200.00
4 0.00 0.00 0.00
Total output 90.00 110.00 200.00
Ch. 1: Computable General Equilibrium Modelling 39

in principle, each industry is capable of producing any of the 4 commodity types.

The make matrix at the bottom of Table 1.2 shows the basic value of the output
of each commodity by each industry. In our data, industries 1 and 2 both produce
commodities 1 and 2. Industry 3 is a single-product industry and the sole producer of
commodity 3. Commodity 4 is not produced domestically.

3.2. Equations

The model's theoretical structure describes the purchasing decisions of the industries,
investors, households and foreigners; the production decisions of the industries; price
formation; market clearing, capital accumulation; wage determination; and the defi-
nition of a number of macroeconomic variables. The equations of the model are set
out in Table 1.3, both in the levels of the variables and in their percentage changes.
The levels of the variables can be thought of as referring to year t, and the per-
centage changes to either comparative-static deviations or to growth rates connecting
years t and t + 1. The percentage-change equations follow straightforwardly from
the levels equations by application of the rules described in Table 1.1. The variables
are defined in Table 1.4. The other notation appearing in the equations is defined in
Table 1.5. The notational conventions should become apparent as we proceed through
the equations.
Equations 3.1-3.6 describe the demands by users 4° for source-specific inputs and
for composite inputs 41. All users are assumed to be price takers. As can be seen from
Table 1.2, producers use commodities and primary factors as inputs but other users
(investors, households and purchasers of exports) use commodities only. Commodities
can be sourced domestically or imported, although purchasers of exports use the
domestic source only. Labour and capital are the sources of primary factors.
Industries are assumed to choose inputs to current production and capital formation
to minimise the costs of these activities. Input-output separability is imposed on
the current-production fnnctions so that the composition of inputs is independent of
the composition of output. The structure of Eqs 3.1, 3.2 and 3.5 reflects the nested
structure of the input side of the production functions assumed for current production
and capital creation. The top level of the nests, describing the technology for the
use of composite inputs, are fixed-proportions (Leontiet) functions leading to demand

4°Users are denoted by the superscript (at) taking the values: (l j), producers in industry j; (2j),
investors in industry j; (3) households; or (4) purchasers of exports.
41Inputs are denoted by the first subscript taking the values: 1,..., 9, commodities; or g + 1, primary
factors. The sources of the inputs are denoted by the second subscript taking the values: 1, domestic supplies
in the case of commodities and labour in the case of primary factors; or 2, imports and capital. A "." in
place of this second subscript indicates aggregation over sources, i.e., a composite commodityor primary
40 P.B. Dixon and B.R. Parmenter

Table 1.3
Equations of the illustrative model

Identifier Equation No.

Substitution between domestic and imported products

3.1(a) X(~,) ''(i.)~(is) (p(U).

(is) = x.(~)ug(~) \" (il)' p(~)~
~ (i2)1

3.1(b) . - 2 ,
i = l , . . . , g ; s -= 1 and 2; and (u) = (3) and ( k j ) for k = 1 and 2 4gh -F 2g
andj = 1,...,h

Substitution between labour and capital

3.2(a) x(Ij) /A(lj) = 3((lj) t/;,(lj) [p(lj) /A(IJ) . p(lj) /4(lj) '1
(g-Fl,s)l''(9-t-l,8) "~(g-l-l')~(gq-l,s) \~(g4:-l,l)/''(g+l,1)' ~(g+1,2)/~'(94-1,2)]
z(lj) a(lj) x(lj) (tj) { (lj) a(lJ)
3.2(b) '(g+l,s) - - (g+l,s) ~ "(9+1.) - - O'g+l ~P(g+l,s) -- (g+l,s)
- ~ { V ( g + l , t , ( l J ) ) / V ( g + l , ' , ( l J ) ) } { P \ (U)
- a (lj) ~
j = l,...,h; s = l , 2 2h

Household demands for composite commodities

3.3(a) p(3)(i.)X(3)
U') ='TiP~iS'))Q + ~ i ( j~G ~/JP~'~Q)

3.3(b) V(i.(3))(pl3i! ) q_ x(3)

(i.)]"~ = "[ip ((i.)c¢[P(i
3 ) ~ / (3).) q_q)

-- "~3t~(i.)W~p(i.) +q , i= l,...,g

Prices of composite commodities to households

3.4(a) p(3)X(3) _ ~-, p(3) X(3)

(i.) ( i . ) - z_., (it) (u)

3.4(b) p(3)
(~.) ~ {V(i,t,(3))/V(i,.,(3))}pl~!), i = 1,..., 9

Intermediate and investment demands for composites, commodities and primary factors

3.5(a) X('~)
(i.) =Z(U)A~ u) u = (kj) f o r k = 1,2 and j = 1, • .. , h
If (u) = ( l j ) t h e n / = 1 . . . . . g + 1 2gh ÷ h
3.5(b) 0~)) = z(~)
z(i. If (u) = (2j) then i = 1 . . . . . g
Ch. 1: Computable General Equilibrium Modelling 41

Table 1.3
Identifier Equation No.
Foreign demands (exports) for domestic goods

3.6(a) p{4~) E = F{:,)) {\X(4)(/l)]]-l/r;i

3.6(b) P(/l)~(4) : f~4~) ( )!4! l,... 9

Margins demands for domestic goods

3.7(a) X(i~)(u)
(rl) = x.(~)
" ' ( ~ s ) "a (( ri ~l )) 0 , ) ' r,i= l,...,g
(u) = (3), (4) and (kj) for k = 1,2
and j = 1,...,h
3.7(b) x(is)(,~)
(rl) = x(U)
(is) If (u) = (4) then s = 1
If (u) # (4) then s -- 1,2 492h + 3g2

Composition of output by industries

3.8(a) x(OJ) = z(lj)ff/(oJ) (pO') p(O) p(O)

(il) -- --(il) \~(11)'*(21)''''''(91)}

3.8(b) z!(il)
"J) = z(U) + aO,J) r~ (°)
L~(i0 - ~ (o) 7
{Y(t'J)/Y("a))P(toJ' .,~

j=l,...,h, i=l,...,g gh

Demand equals supply for domestic commodities

y(oj) x~(~) + ~ x~(~1)(4)
"Xtl) =
jEH (u)EU iEG s : l , 2 (u)@U* iEG
" (,,j)
3.9(b) Y(t,o)x(tl) = 22 .(t, i,
jCH 0*)eu

s, ku))x(tl)
iCG s=l,2 (u)Cu*
.... (il)(4)
+ ~ M(t,i,l,k,~))ac(t 0 , t= 1,...,9

Industry revenue equals industry costs

3.10(a) ]((Oj)p(O) y ( l j ) p(lj)

"~(tl) "(tl) "-- ~ ~ "'(gs) "(is)
t6G t6G* s=l,2
3.10(b) j ]HO;I) = ~ ~ V(/~, s , ( b ) ) p, u ~ (lj)), j = 1..... h
LCG t6G* s:l~2
42 P.B. Dixon and B.R. Parmenter

Table 1.3

Identifier Equation No.

Basic price of imported commodities

3.11(a) p((O) ( p ( ~ ' ) / ~ ' ~ T(")

~2) = k" (i2) ,~,'*(i2)

3.1 l(b) p(O) .2(w) +t(~,, i= 1, .,g

(i2) = P(i2) -- e (.) ..

Purchasers' price related to basic prices and taxes

3.12(b) V(i,s,(u))p{::) : ( B ( i , s , ( u ) ) + T ( i , s , ( u ) ) ) [I,p(is)(°) + t(i,s,(u)))
.... (o)
+ ~ M(r, ~, s, Vu))p(~l),
i = 1 . . . . . g; (U) = (3), (4) and (kj) for k = 1,2
and j = 1,...,h
If ( u ) = (4) then s = 1. If ('~) ¢ (4) then s = 1,2 3g + 4gh

investment behaviour

3.13(a) y(lj) (1wy(lj) = F k F (j) [1 q- [ p ( l j ) /p(lj)~ _ eSj]ad

"'(9q-l,2)k*//"'(gq-l,2 ) \ (9+1,2) k )

3.13(b) x (gq-l,2)
(lj) (1"~ - W(gq-l,2)
k'/ ._(U) = f k + f~J)

L" (9+1,2)/\" (9+1,2) q- -- (P(g+l,2)


Capital accumulation

3.14(a) y(U) (1~ = y ( t j ) (l - 6j) q- Z (2j)

"~(g+l,2)x'] "'(9+1,2)

3.14(b) ,-(U) ,,, (lj)

A(g+l,a)kl)X(g+l,2)(l) = - -7g
( "(Ij) r . x (lj) ) q_ z(2j)z(2j)
{I --O3)X(g+l,2
g + l , a ) v.

j = 1,...,h

Costs of constructing units of capital for industries

315(a) P~J>Z(~J)=E E p(2J)x(ZJ)

(.) (.)
gee s=l,2

3.15(b) V(.,., (2j))p (,j) -- ~ ~ V(i, s, (29))P(is)

" (2d) , j = 1. . . . . h h
iEG s=l,2
Ch. 1: Computable General Equilibrium Modelling 43

Table 1.3
Identifier Equation No.
Wage determination

3.16(a) p(U)(g+,,,)= (CPi)F{~)l,,)F(9+l,,)

3.16(b) _(lj)
P(g+l,l) = cpi + f(lj)
J(g+l,0 -}- f(g+l,l), j = 1,..., h h

Consumer price index

317(a) CPI = r I II
i6G s=l,2

3.17(b) cpi= E E (V({'s'(3))/V(""(3)))PI3t)s) 1

iEG s=l,2

Tax rates on sales to households

3.18(a) T(i, s, (3)) = Tb(i,., (3))Ft (3)

3.18(b) t(i,s,(3))=tb(i,.,(3))+ft(3), i=1,...,9, s=l,2 29

Ratio of real investment to real consumption

3.19(a) IR/CR = FIC

3.19(b) iR/CR = tic

Other equations defining

• GDE real GDP, price deflator for GDP 3

® Real consumption (the definition of nominal consumption is implied by (3.3) and the price deflator
for consumption is defined in (3.17)) 1
• Investment, real investment, price deflator tbr investment 3
• Absorption (i.e. gross national expenditure), real absorption, price deflator for absorption 3
• Supplies of domestic commodities and volumes of imports by commodity 29
® Total employment, total usage of capital (rental-weighted sum of industry usage) 2
• Total values of imports (c.i.f) and exports (f.o.b.), and the balance of trade 3
• Indexes of import prices (c.i.f.) and export prices (f.o.b.), and the terms of trade 3
• Total tax collections, total collection of consumer taxes, total collection of tariff revenue 3
• Total tax collections in real terms (deflated by the price of absorption) 1
® Ratio of economy-wide average wage rate to average rental per unit of capital 1

Total number of equations is 492h + 3g 2 + 119A + 149 + 8A + 25

44 PB. Dixon and B.R. Parmenter

Table 1.4
Variables in the illustrative model*
Variables Index ranges Description No.
)),_~ (u) = (3), (4) and (kj) for k = 1,2 Demand by user (u) for good or 6g + 89h + 4h
andj = l,...,h. primary factor (is); and price paid
p(U) If (u) = (4) then s = 1; by (u) for (is)
(is) if (u) ¢ (4) then s = 1,2.
If (u) = ( l j ) then i -- 1 . . . . . 9 + 1;
if (u) 7~ ( l j ) then i = 1 . . . . . 9.

(i.) u = (3) and (kj) for k = 1,2 and Demand for composite good or g+29h+h
j= 1,...,h. primary factor i by user (u)
If (u) = ( l j ) then i = 1 . . . . ,g + 1;
if (u) ¢ ( l j ) then i = 1 , . . . ,9.
a(iJ) j= 1,...,h, s = 1,2. Primary-factor saving technologi- 2h
cal changes

c Total expenditure by households

q Number of households
p(3) Price to households of composite
(i-) i=l,..., 9.

z(~) ('u) = (kj) for k = 1,2 and Activity levels: current production 2h
j=l,...,h. (k = 1) and investment (k = 2)
by industry

(il) i= 1,... g. Shift in foreign demand curves

¢ Exchange rate

r,i = 1,. ,9. Demand for commodity (r 1) to be 4g2h ÷ 3g 2

(at) = (3), (4) and (kj) for k = 1,2 used as a margin to facilitate the
andj = 1,...,h. flow of (is) to (u)
If (u) = (4) then s = 1;
if (u) ¢ (4) then s -- 1,2.

x(OJ) i = l,...,g;j = l,...,h. Output of domestic good i by in- gh

dustry j

i-- l,...,g; s = 1,2. Basic price of good i from 29

source s

P(i2) i = l,...,g. Foreigh-currency c.i.f, price of im- g

ported commodity i

i-- 1,...,g. Power of the tariff on imports of i g

Ch. 1." Computable General Equilibrium Modelling 45

Table 1.4
Variables Index ranges Description No.

t(i,s,(u)) i= l,...,g. Power of the tax on sales o f e o m - 49h+39

(u) = (3), (4) and (kj) for k = 1,2 modity (is) to user (u). The power
and j = 1 , . . . , h. of a tax is t plus the rate of the tax
If (u) = (4) then s = 1;
if (u) # (4) then s = 1,2.

f~<J> j = l,...,h. Industry-specific capital shift terms h

Ik Capital shift term 1

x(lj) /,~ Capital stock in industry j at the h
(g+l,2) kl) j = 1,...,h.
end of the year, i.e., capital stock
available for use in the next year
Cost of constructing a unit of cap- h
ital for industry j

f ((g+l,l)
lj) j = l,...,h. Industry-specific wage shift term h

f(g+l,t) Wage shift term (often the real 1

wage rate)

cpi Consumer price index 1

tdi,.,(3)) i= l,...,g. Base value of power of consumer g

tax on good i, both domestic and

ft (3) Shift term allowing uniform per- 1

centage increase in powers of con-
sumer taxes

iR Real aggregate investment 1

CR Real aggregate consumption 1

tic Ratio of real investment to real 1


Other See list at the end of Table 1.3. Note that real investment and real consumption 21 + 29
have already appeared earlier in this table.

Total number of variables: 492h + 392 + 159h + 199 + 13h + 31

* We list the variables only in their percentage-change form, i.e., in the lower-case notation in which they
appear in the (b)-system of Table 1.3.
46 P.B. Dixon and B.R. Parmenter
Table 1.5
Other notation used in the equations of the illustrative model
Symbol Appears in Description
k~(u) 3.1(a), 3.2(a), 3.8(a) Fnnctions determining composition of composite commodities
and primary factors, and composition of industry outputs. They
are the outcome of CES-cost-minimizing and CET-revenue-
maximizing problems

o-}u) 3.1(b), 3.2("o) Parameter: elasticity of substitution for user (u) between alterna-
tive sources of commodity or factor i

o-(°j) 3.8(b) Parameter: elasticity of transformation in industry j between

outputs of different commodities

V(i,t, (u)) 3.1(b), 3.2(b), 3.10(b), Input-output flow: purchasers' value of good or factor i from
3.12(b), 3.15(b) source t used by user (u)

V(i,., (u)) 3.1(b), 3.2(b), 3.3(b), Input-output flow: V(i, s, (u)) summed over s
3.4(b), 3.15(b)

V ( . , . , (u)) 3.15(b) Input-output flow: V(i, s, (u)) summed over i and s

~/~ 3.3 Parameter: subsistence parameter in linear expenditure system

/3i 3.3 Parameter: marginal budget shares in linear expenditure system

A~ u) 3.5(a) Parameter: demand for composite i per unit of activity by user (u)

~i 3.6 Parameter: foreign elasticity of demand

3.7(a), 3.12(a) Parmneter: use of ( r l ) as a margin per unit of flow of (is) to
user (~)

Y(t, j) 3.8(b), 3.9(b) Input-output flow: basic value of output of domestic good t by
industry j

I/(., j ) 3.8(b) Input-output flow: sum of Y(t, j) over t, i.e., basic value of
output by industry j

B(t, s, (u)) 3.9(b), 3.12(b) Input-output flow: basic value of (ts) used by (u)

M(t, i, s, (u)) 3.9(b), 3.12(b) Input-output flow: basic value of domestic good t used as a
margin to facilitate the flow of (is) to (u)

T(i, s, (u) ) 3.12(b) Input-output flow: collection of taxes on the sale of (is) to (u)

6j 3.13, 3.14 Parmneter: rate of depreciation of industry j ' s capital

ozj 3.13 Parameter: sensitivity of capital growth to rates of return

V_(i, s, (3)) ]
V ( . , . , (3)) ] 3.17 Parametei's: initial values of V(i, s, (3)) and V ( . , . , (3))
Ch. 1: Computable General Equilibrium Modelling 47
Table 1.5
Symbol Appears in Description

G 3.3, 3.8(b), 3.9, Set: {1, 2, . . . , 9}, 9 is the number of composite goods
3.12, 3.15, 3.17
G* 3.10 Set: { 1,2,..., 9 + 1}, 9 + 1 is the number of composite goods
and primary factors

H 3.9 Set: {1,..., h}, h is the number of industries

U 3.9 Set: {(3), (4), (kj) for k = 1,2 a n d j = 1. . . . . h)

U* 3.9 Set: {(3), (kj) f o r k = 1,2 and j -~ 1 , . . . , h }

Eqs 3.5. The second level allows CES substitution between sources in the formation
of composites, leading to the source-specific d e m a n d Eqs 3.1 and 3.2. 42
The household is a s s u m e d to m a x i m i s e a nested utility f u n c t i o n subject to an
aggregate-expenditure constraint. The top level of the utility function, describing pref-
erences for composite commodities, is a K l e i n - R u b i n ( 1 9 4 8 - 1 9 4 9 ) 43 function, leading
to the linear-expenditure system, Eq. 3.3. The second level allows for CES substitu-
tion b e t w e e n sources of commodities as was the case for inputs to current production
and capital formation. The consequent source-specific d e m a n d equations are included
in 3.1.
E q u a t i o n 3.6 specifies constant-elasticity foreign d e m a n d curves for exports.
E q u a t i o n 3.7 specifies d e m a n d s for margins services. 44 We assume that margins
must be used in fixed proportions to the basic flows which they facilitate.
We assume that producers choose their output mixes, given their activity levels and
output prices, to m a x i m i s e r e v e n u e subject to C E T transformation frontiers [Powell
and G r u e n (1968)]. This leads to the supply functions 3.8. 45
E q u a t i o n 3.9 imposes market clearing for domestic commodities. O n the LHS it
sums over producers of commodities and on the RHS over uses of commodities.
Direct and m a r g i n uses are included.

42We include in the factor-demand Eqs 3.2 technology coefficients which we use to introduce labour-
saving technical change in the simulations reported in Section 3.5(c). In models for real-world applications
rather than illustration, we include a wider range of technology and taste coefficients.
43See also Geary (1950-1951) and Stone (1954).
44We denote the type and source of margin service in the double subscript, assuming by the value "1"
for the second subscript that all margin services are domestically sourced. In our data (Table 1.2), only
commodity 3 is used as a margin service. The basic flow which the margin service facilitates is specified
by the triple superscript, the three components of which show, in turn, the basic flow's commodity, source
and user,
45The "o" in the superscripts in these equations denotes output or, in the case of prices, basic values.
The second component of the superscript in the symbols denoting outputs and activity levels specifies the
producing industry.
48 P.B. Dixon and B.R. Parmenter

Equations 3.10-3.12 constitute the model's pricing system. Equation 3.10 relates
basic values to unit costs. In view of the constant returns to scale assumed in the
model's production functions, output and input quantities can be eliminated from the
percentage-change form leaving a relationship between percentage changes in the
basic prices of outputs and percentage changes in the purchasers' prices of inputs.
Equation 3.11 defines the basic prices of imports as their c.i.f., duty-paid prices. In
3.12 purchasers' prices are defined as the sums of basic values, margins costs and
commodity taxes.
Equations 3.13 and 3.14 have been discussed already in Section 2.3 (see (2.65)
and (2.68) which together correspond to 3.13(a); (2.69) which corresponds to 3.13(b);
(2.56) which corresponds to 3.14(a); and (2.70) which corresponds to 3.14(b)). Notice
that in 3.2(a) we used ~(g+1,2)
y(lj) to represent the flow of capital services to industry j.
In 3.13(a) and 3.14(a), X " is the capital stock in use in industry j. We assume
that each unit of ca pl"ta 1 stock
(g+1,2)in existence at the beginning of a year is capable of
providing one unit of capital services during the year and that the available capital
services are always fully used. Hence, in our notation we need not distinguish between
the capital stock and capital services.
Equation 3.15 specifies the unit costs of constructing capital. With constant returns
to scale in the capital production functions, quantities can be eliminated from the
percentage-change form, leaving percentage changes in the unit cost as functions of
percentage changes in input prices only (3.15(b)).
Equation 3.16 allows for indexation of nominal wage rates to the CPI, defined by
3.17. Together, the shift variables (F) in 3.16 represent industry-specific real wage
rates. In the percentage-change form, f(g+l,l) can be used to introduce shifts in the
overall real wage rate and the e(li)
a(g+l,l) can accommodate changes in industrial wage
Equation 3.18 allows us flexibility in setting rates of commodity taxes on house-
holds. This is required in the revenue neutral tariff-reform simulations which we report
in Section 3.5(b). Similar equations could be added to allow flexibility in the treatment
of other tax rates if required.
Equation 3.19 is included to allow us to exogenize the ratio of real aggregate
investment to real aggregate consumption, an option which we choose in the short-
run comparative-static simulations reported in Sections 3.5(a) and (b).
As well as the equations which we have set out in detail in Table 3.2, the model
includes definitions of the macroeconomic variables listed at the end of the table. The
definitions of these variables are orthodox and straightforward. We omit the details
for the sake of brevity.

3.3. Coefficients, parameters, zero problems and initial solution

To form a computable model, we must assign values to the parameters appearing in

the system of equations that we choose to use. If we choose a differential system
Ch. 1: Computable GeneralEquilibrium Modelling 49

(the (b)-system in Table 1.3), then we must also give initial values to the coefficients.
These are cost shares, sales shares or other functions of the model's variables. In each
step of a Johansen/Euler computation, the coefficients are held constant but they are
moved from step to step.
In this subsection we explain how we assigned values to the parameters and co-
efficients in the differential form of the illustrative model. Then we consider briefly
the problem of setting parameters for the (a)-system. In our applications of the illus-
trative model, we use only the (b)-system - we adopt the Johansen/Euler approach.
Nevertheless, it is still worthwhile to look at the (a)-system. This will allow us to
illustrate a key point of Section 2: that for CGE models we almost always have a
readily available initial solution or at least a readily available solution for an initial
Most of the information, required to implement the (b)-system of our illustrative
model is in the input-output data in Table 1.2. For example, a coefficient appearing
in Eq. 3.1(b) is V(1, 1, (12))/V(1, ", (12)), i.e., the share of domestically produced
good 1 in industry 2's expenditure on composite good 1 to be used as an input to
current production. The initial value of this share (for the first step of a Johansen/Euler
computation) can be calculated from Table 1.2 as

V(1,1, (12))] = 20+4+4

V~(1,., (12)) J i,itia, ( 2 0 + 4 + 4 ) + ( 5 + 1 + 1 ) =0.8.

Similar coefficients, each consisting of the share of a sourced input in the user's
expenditure on the relevant composite, appear in Eqs 3.2(b), 3.4(b) and 3.17(b). 46
Equation 3.8(b) contains a different type of share coefficient: Y(t, j ) / Y ( ' , j). This
is the share of the basic value of industry j ' s output of good t in the basic value of
industry j ' s total output. From Table 1.2, we see, for example, that the initial value
of Y(2, Z)/Y(.,2) = 90/110.
Rather than being input-output shares, many of the coefficients in the (b)-system
of Table 1.3 are input-output levels. These are purchasers' values of flows (the Vs in
3.3(b), 3.10(b), 3.12(b) and 3.15(b)); basic values of flows (the/3s and Ys in 3.9(b),
3.10(b) and 3.12(b)); tax collections (the Ts in 3.12(b)); and margins flows (the M s in
3.9(b) and 3.12(b)). The initial values of all these can be read directly from Table 1.2
or calculated by a small number of additions.
Equation 3.3(b) contains the coefficient Pi:.)Q. Setting the initial v,a lue of this
coefficient requires a decision about units. The approach we use is to assume that
quantity units for all composite commodities are chosen so that their initial purchasers'
prices are unity

i.e., [p0*)l
L (i')Jinitiat = 1' for all uEU, i= 1, ""
.,9ands= 1,2. (3.20) 47

46In 3.17(b), the share coefficients are weights in the consmner price index. They are not moved in a
multi-step Johansen/Euler computation.
50 t~B. Dixon and B.R. Parmenter

We define population units so that the initial value of Q is also 1.

The final group of coefficients are those in the differential forms of the investment
and capital accumulation Eqs 3.13(b) and 3.14(b). The data, beyond those in i n p u t -
output tables, required for setting the initial values of such coefficients are depreciation
rates and measures of rates of return. In our illustrative model, we assume that all
depreciation rates (the parameters 5j) are 10 per cent and that net rates of return
(i.e., rental/capital-value ratios less depreciation rates) are 5 per cent. We assume that
capital units are chosen so that ~k
p ( l j ) has an initial value of 1 for all j . Then for
industry 1, we have

[p(ll) x(ll) /p(ll)x(It ) ]

l_ (0+1,2) (0+1,2)/ k (g+l,2)J -- ~JJinitial

F r o m Table 1.2, we see that the initial gross earnings of capital in industry 1 are 11.

(11/X~1,2)) initia1 - 0 . 1 0 = 0 . 0 5


l ~ ( 9 + 1 , 2 ) j initial = 73.33.

Similarly, we find that capital stocks available in the initial year for industries 2 and 3
are 53.33 and 193.33. We can now compute the initial values for the rental rates
0 J ) 1,2) ~
]" They are 11/73.33, 8/53.33 and 29/193.33, i.e., 0.15 for all three industries.
F r o m here, we find that the initial value of the coefficient in square brackets in 3.13(b)
is 0.143 for all industries.
For Eq. 3.14(b), we have already found the initial values of the coefficients X (t j)
With the initial values of p~k( U ) being 1, we can read the initial investment levels
( Z (2j)) from Table 1.2. They are 10.63, 5.32 and 26.05. Now we can calculate the
initial quantity of next year's capital stock for industry 1 as

(X~l,2)(1))initia I = 7 3 . 3 3 ( 1 - 0 . 1 0 ) + 1 0 . 6 3 - - 76.63.

Similarly, we find that next year's capital stocks for industries 2 and 3 are, initially,
53.32 and 200.05.

47We usually assume that the quantity units of all sourced commodities are chosen so that their basic
prices are initially 1. Can (3.20) then be satisfied without violating conditions such as 3.4(a)? For each
user, u, composite i is a CES combination of ~'(il)
y ( u ) and "'(i2)'
x"(u) with each of the CES functions having
its own "A" parameter [see (2.36)]. By suitable choices for the values of these Al~.~s we can ensure that
(3.20) is fulfilled while still satisfying the condition: composite price times composite quantity equals the
sum of expenditures on sourced commodities.
Ch. 1: Computable General Equilibrium Modelling 51

Most of the parameters in the differential representation of our model are either
substitution or transformation elasticities [the ors in 3.1(b), 3.2(b) and 3.8(b)]. Ideally,
these should be estimated econometrically. In practice, they are often assigned val-
ues based on literature search. For our illustrative model we chose values typical of
those estimated for the ORANI model [Dixon, Parmenter, Sutton and Vincent (1982,
Chapter 4)]. We set the Armington elasticities 48 (i.e., the elasticities describing domes-
tic/import substitution, or}u) for all (u) and for i --- 1 , . . . , 9) at 2; the labour/capital
elasticities tcrg+l"(lj) for j = 1, . . . , h) at 0.5; and the transformation elasticities (or(°j),
j = 1 , . . . , h ) at 0.5.
In Eq. 3.6(b) we set the foreign demand elasticities (r/i) at 5 for commodity 1 (the
main export commodity) and at 20 for all other commodities. Again these numbers
are typical of those used in Australia's ORANI model. The ORANI foreign demand
elasticities are consistent with Australia's export volumes having a minor influence
on world prices of Australia's main exports (agricultural and mineral products), and
having barely any influence on all other world prices.
In the household demand Eqs, 3.3(b), the parameters are marginal budget shares
(fli) and subsistence parameters ("/~). In our illustrative model, the four/3s were set at
0.0785, 0.5446, 0.3141 and 0.0628, and the four 7s were set at 6.76, 66.85, 7.04 and
5.41. These parameter settings were chosen to give typical values for household ex-
penditure elasticities and for the ratio of subsistence expenditure to total expenditure. 49
In conjunction with the data in Table 1.2, our chosen parameter values imply initial
values for the four expenditure elasticities of 1, 0.84, 1.5 and 1, and for the subsistence
ratio of 0.45.
The last set of parameters are the c~j s in the investment Eqs, 3.13(b). These control
the sensitivity of capital growth in each industry to variations in rates of return. Guided
by our experience with ORANI, we assigned all of the cus in the illustrative model
the values of 2.0.
With initial values assigned to the coefficients and with the parameters set, we
are almost ready to compute Johansen/Euler solutions for our illustrative model. One
minor practical problem remains: zero input-output flows. These can cause difficulties
~n equations such as 3.12(b). Assume that ~(~) is endogenous. Then if V(i, s, (u))
is zero, Johansen/Euler computations will fail because the value of PI~!),oo will be
indeterminate. One cure is to modify the input-output database by replacing zero
flows with very small numbers. Another is to modify equations such as 3.12(b) to read

(V (i,s, (u) ) + TINY)p{~2)=etc.

where TINY is a parameter assigned a very small value.

4~This type of elasticity is named in recognition of the contribution of Armington (1969, 1970).
49Lluch, Powell and Williams (1977) is a valuable source of estimates of expenditure elasticities and
subsistence ratios for many countries.
52 P.B. Dixon and B.R. Parmenter

To complete this subsection, we consider briefly how to assign values to the pa-
rameters of the (a)-system in Table 1.3. 5° We start with Eq. 3.1 (a) for the cases where
i = 1, s = 1 and 2 and (u) = (12), that is, we look at the parameters of the demand
functions for domestic and imported good 1 to be used as inputs to current production
in industry 2.
From (2.38), we see that these demand functions have the form

~'~0")/~(1") J ~(1~) p02)~0e) , (3.21)

s = 1,2.

We have already set all j) at 2, implying that p112) is - 0 . 5 . As indicated in footnote

47, we assume that all basic prices are initially 1. Now from Table 1.2 we find that

(/9((112)))initial = ( 2 0 + 4 + 4 ) / 2 0 = 1.4


(aP~ll2¢)initia1 ~-~ (5 -}- 1 -t- 1)/5 = 1.4.

With our convention that the purchasers' prices of composites are initially 1, we have

(X~,~))initial-~ 28 + 7 = 35.

~(12) ~(12)
Remembering that ~01) and ~'02) sum to 1, we can solve (3.21) to obtain the vaiues
£(12) h(12) 4(12).
of the three parameters, ~(11)' ~(12) and ~,(1.)

(12) ( ~...~, .t,(12) /-- 1"4b(11)

20 = (35/A(1.)) t ~ ~'(lt) ~
\ t=l,2 \ . . . . (it)


5 = (35/AII.2~) ,(12))
o(u ~l'q°(12)
t . . . . (it) /

5°This is what many CGE modellers refer to as calibration, see for example, Shoven and Whalley
Ch. 1: Computable GeneralEquilibrium Modelling 53


(ll) ~- 0.666, ~'(12)= 0.333


A (J-)2) -- 2.520.

By similar methods, we could assign values to all the other parameters in the
(a)-system. Howe;eeL we use only the (b)-system in our computations. Consequently,
we will not go any further with the process of assigning parameter values in the
(a)-system. Our main reason for considering the topic at all is its relationship to the
idea, emphasized in Section 2, that the input-output database for most CGE models
provides an initial solution.
Given the way we have set the parameters in (3.21), it is clear that the equation is
satisfied by the initial values for X~]~I, y(12)
~(12)' j¢--(12)
~ (1.).' p{]2¢ and P~]~¢ implied by our
input--output data in combination with our conventions on quantity units and initial
prices. Similarly, all the other equations in the (a)-system are satisfied by an initial
solution generated in a straightforward way from the database. This is an implication
of the way the parameter values and the initial values of some of the shift variables 5l
are determined.

3.4. Closure of the illustrative model

F r o m Tables 1.3 and 1.4 it can be seen that the illustrative model contains (49h +
59 + 5h + 6) more variables than equations. 52 Hence, to close the model this number
of variables must be set exogenously. A strength of working with the linearized
percentage-change version of the model and the G E M P A C K software is that it is easy
to run simulations under a variety of closures. In Section 3.5 we report four different
simulations, each with a different closure. The closures are listed in Table 1.6. As we
will demonstrate in explaining the simulations, by changing the closure we are able
to use the model in many different modes.

51For exmnple, from Table 1.2 we have kA(ll))imtial

,,¢-(4) x = 35 and p~4]) = 51/35 = 1.457. The parameter
rll is set at 5. If we set the initial value of the exchange rate at 1, then we can satisfy 3.6(a) by setting the
initial value of the shift variable F{4~) at (51/35)(35) 0.2 = 2.97.
52As shown by Table 1.2, in our data 9 (the number of cmmnodities) is 4 and h (the number of
industries) is 3. Hence, the excess of variables over equations in the implemented version of the model
is 89.
54 I~B. Dixon and B.R. Parmenter

Table 1.6
Numbers of exogenous variables in the simulations reported in Tables 1.7 to 1.9

Standard Short run Revenue- Forecasting

short run for macro neutral (Table 1.9)
Variable (Table 1.7 package (Ta- short run
cots 1, 2) ble 1.7 col. 3) (Table 1.8)
Always exogenous
q number of households 1 1 1 1#

a (I j)
(g+l,s) tech. change 2h 2h 2h 2h #

tl~) tariffs g g g#

t(i, s, (kj)) sales taxes on current and capital 4gh 49h 4gh 49h

(il) export demand shifts g g g g#

p(~O) import prices (cif, $foreign) g g g g#

tb(i, ", (3)) base powers of consumer tax 9 g g 9

Macro targets and instruments

f(g+l,l) wage shift i# end 1 1#

CR real consumption 1# end 1 1#

no symbol total employment end~ 1# end end

no symbol balance of trade as share of GDP end 1 end end

Revenue constraint and wage setting

no symbol real tax collection (CPI deflated) end end 1 end

ft(3) consumer tax shifter 1 1 end 1

(g+l,1) nominal wage rates by industry end end h end

(g+l,l) industry specific wage shifters h h end h

Short-run comparative statics vs forecasting

(9-t-1,1) capital available for use in current h h h h#

iR real investment end end end 1#

tic investmenffconsumption 1 1 1 end

f(J) industry-specific capital shift terms h h h h

fk capital shift term end end end end

g exchange rate 1 1 1 end

cpi consumer price index end end end 1#

Ch. 1: Computable General Equilibrium Modelling 55

Table 1.6
Standard Short run Revenue- Forecasting
short run for macro neutral (Table 1.9)
Variable (Table 1.7 package (Ta- short run
cols 1, 2) ble 1.7 col. 3) (Table 1.8)
(il) [ export volume 9 - b* g - b 9 - b 9~
or l or b b b end
t(~, 1, (4)) export tax

Total number of exogenous variables in all columns is 49h + 59 + 5h + 6

t The entry "end" in the table means endogenous.
# These variables are shocked in the relevant simulations.
* In the simulations reported in Tables 1.7 and 1.8, b is 1, Export volumes are exogenous for coimnodities 2,
3 and 4 and the export tax rate is exogenous only for commodity 1.

Table 1.7
Short-run comparative-static effects of policies for employment stimulation: 1-step Johansen/Euler
solutions (percentage changes)

Variable Real wage cut Demand expansion Macro package

1. Real wage rate -1.00 0.00 -3.67
2. Real aggregate absorption 0.00 1.00 3.09
3. Aggregate employment 0.98 0.45 5.00
4. Wage/rental ratio - 1.39 -0.88 -9.96
5. Terms of trade -0.34 0.22 -0.58
6. GDP price index -0.77 0.64 0.87
7. Consumer price index -0.68 0.58 --0.71
8. Exports of commodity 1 2.14 -1.36 3.66
Activity levels
9. Sector 1 1.56 -0.64 3.79
10. Sector 2 0.19 0.61 2.59
11. Sector 3 0.45 0.57 3.42

12. 100 (Balance of trade)/GDP 0.47 -0.56 0.00

13. Import volume index -0.31 1.12 2.34

Note: Numbers in bold type are exogenous

3.5. Simulations

This section contains some simulations results from the illustrative model, computed
using GEMPACK [ C o d s i a n d P e a r s o n ( 1 9 8 8 ) a n d P e a r s o n ( 1 9 8 8 ) ] a p p l i e d to t h e
b - s y s t e m in T a b l e 1.3. W e p r e s e n t r e s u l t s o f t h r e e t y p e s : c o m p a r a t i v e - s t a t i c r e s u l t s
56 PB. Dixon and B.R. Parmenter
Table 1.8
Simulations of the effects of abolishing tariffs (percentage change)

Variable 1-step 2-step 1, 2-step 8, 16, 32-step

extrapolation extrapolation
1. Tariff revenue -94.92 -97.30 -99.69 -99.99
2. Revenuefrom taxes on h'holds 59.01 60.79 62.57 62.88
3. Importvolume index 5.40 5.82 6.25 6.32
4. Exportsof commodity 1 12.09 12.54 13.00 13.02
5. Termsof trade -1.93 -1.94 -1.95 -1.95
6. (Balanceof trade)/GDP 0.01 0.01 0.00 0.00
Activity levels
7. Sector 1 1.22 1.24 1.27 1.26
8. Sector 2 0.58 0.62 0.65 0.65
9. Sector 3 -0.27 -0.25 -0.24 -0.23

computed b y the 1-step Johansen/Euler method (Table 1.7), comparative-static results

computed by multi-step Johansen/Euler procedures (Table 1.8) and five-year forecasts
comprising annual 1-step Johansen/Euler computations (Table 1.9). Closures for the
simulations are listed in Table 1.6.

(a) Structural effects o f macro employment strategies: 1-step Johansen/Euler


As an example of comparative-static simulations, we have used the illustrative model

to project the effects of alternative strategies for employment generation. We used
O R A N I - m o d e l results like these as inputs to the debate in Australia about macroeco-
nomic policy, concentrating in particular on the structural effects of different strategies
[Dixon, Powell and Parmenter (1979)]. As we illustrate here, our conclusions were
that a combination of demand stimulation and wage moderation would increase em-
p l o y m e n t without adverse consequences for the trade balance and without disruption
to the structure of the economy. Corden and Dixon (1980) explore the prospects for
implementing such a strategy by use of wage-tax bargains with the trade unions. Anal-
ysis such as this underpinned the macroeconomic policy adopted by the Australian
government through the middle 1980s.
Selected results from our illustrative macro-strategy simulations are reported in
Table 1.7. The first column shows the effects of a one-per-cent cut in the CPI-deflated
wage rate with real domestic absorption held fixed. The second shows the effects of
a one-per-cent increase in real domestic absorption (consumption plus investment)
with the CPI-deflated real wage rate held fixed. In the third column are results from
a simulation in which we computed percentage changes in the real wage rate and
Ch. 1: Computable General Equilibrium Modelling 57

Table 1.9
Five-year forecasts: Annual percentage growth rates

Variable Year 1 Year 2 Year 3 Year 4 Year 5

(a) Exogenous scenario
Vertical shifts in export demand schedules*
commodity 1 1.00 10.00 11.00 2.00 2.00
commodity 2 4.00 4.00 4.00 4.00 4.00

World prices of imports

commodity 1 4.00 4.00 4.00 4.00 4.00
commodity 2 4.00 4.00 4.00 4.00 4.00
commodity 4 4.00 4.00 4.00 4.00 4,00

Export volumes
commodity 1 3.00 4.50 3.00 2.50 2.00
colmnodity 2 10.00 11.50 10,00 8.00 7.00
Real wage rate 0.80 0.80 1.50 1.50 1.00

Labour-saving tech. change

sector 1 2.00 4.00 2.00 1.50 1.00
sector 2 2.00 4.00 2.00 1.50 1.00
sector 3 0.00 0.00 0.00 0.00 0.00

Aggregate real investment 2.00 7.20 6.80 0.00 -5.00

Aggregate real consumption 2.50 3.50 2.30 2.00 2.00

Powers of tariffs
commodity 1 - 1.00 - 1.00 - 1,00 0.00 0.00
commodity 2 -4.00 -4.00 -4.00 0.00 0.00
commodity 4 0.00 0.00 0.00 0.00 0.00

Consumer price index 2.90 4.10 3.90 3.00 3.00

Numbers of households 1.40 1.40 1.40 1.40 1.40

(b) Endogenous variables

Terms of trade -2.97 3.86 4.88 -2.04 - 1.92
Wage/rental ratio 1.28 -2.51 1.73 4.73 4.75
Aggregate employment 2.15 3.58 2.31 1.31 0.87
Capital stock in use 3.13 2.96 3.50 3,94 3.41
Real GDP 2.77 4.24 3.08 2.35 1.73
Export volume index 4.42 6,03 4.54 3.71 3.17
Import volume index 2.97 5.44 4.43 1.47 0.01
Nominal devaluation 0.55 0.72 0.95 -0.50 0.07
GDP price index 2,02 5.13 5.30 2.37 2.42
Real devaluation 2.53 -0.41 -0.35 1.13 1.65
58 P..B~ Dixon and B.R. Parmenter

Table 1.9

Variable Year 1 Year 2 Year 3 Year 4 Year 5

Supplies of domestic goods
commodity 1 3.12 4.35 3.05 2.72 2.27
commodity 2 2,40 3.95 2.98 2.70 2.00
commodity 3 2.90 4.38 3.23 2.39 1.76
Capital in use
sector 1 4.50 3.69 3.32 3.57 3.38
sector 2 -0.02 1.09 1.98 3.16 3.01
sector 3 3.47 3.21 3.99 4.29 3.53
Activity levels
sector 1 3.98 4.82 3.20 2.87 2.52
sector 2 1.62 3.52 2.85 2.58 1.76
sector 3 2.90 4.38 3.23 2.39 1.76
sector 1 1.72 1.37 1.13 1.02 1.10
sector 2 0.57 0.93 1.39 0.73 0.01
sector 3 2.65 4.90 2.89 1.53 0.98
Capital growth through year
sector 1 3.69 3.32 3.57 3.38 2.63
sector 2 1.09 1.98 3.16 3.01 1.81
sector 3 3.21 3.99 4.29 3.53 2.34
sector 1 - 1,28 0.90 5.25 2.06 -2.37
sector 2 11,20 9.13 12.02 1.99 -6.51
sector 3 1.46 9.27 6.21 - 1.22 -5.63

* These are the percentage changes in the world prices of exports which would
occur in the absence of changes in export volumes.

real d o m e s t i c a b s o r p t i o n w h i c h t o g e t h e r g i v e a f i v e - p e r - c e n t i n c r e a s e in e m p l o y m e n t
w i t h n o c h a n g e in the ratio o f the trade b a l a n c e to the G D E T h e results are r e p o r t e d
as p e r c e n t a g e c h a n g e s in the variables. T h e s e are to b e i n t e r p r e t e d as p e r c e n t a g e
d i f f e r e n c e s b e t w e e n the v a l u e s w h i c h the v a r i a b l e s w o u l d take in s o m e t a r g e t y e a r
i f t h e s h o c k s h a d b e e n a p p l i e d a n d the v a l u e s w h i c h the v a r i a b l e s w o u l d take in the
s a m e t a r g e t y e a r in the a b s e n c e o f the shocks.
O u r e x p e r i e n c e s u g g e s t s that w i t h s h o c k s o f the sizes u s e d for t h e s e s i m u l a t i o n s
l i n e a r i z a t i o n e r r o r s a r i s i n g in 1-step J o h a n s e n / E u l e r s o l u t i o n s are not serious. H e n c e ,
w e r e l i e d o n t h e 1-step m e t h o d in c o m p i l i n g T a b l e 1.7.
Ch. 1: Computable General Equilibrium Modelling 59

Short-run comparative-static closures. Columns 1 and 2 of Table 1.6 give the clo-
sures used for the simulations reported in Table 1.7. The first group of variables,
which are exogenous in all of our simulations, begins with the number of house-
holds, the rates of factor-saving technical progress, and the rates of duty on imports
and of commodity taxes on inputs to current production and capital formation. The
next two variables alIow us to impose on the single-country model shifts in export
demand schedules and inthe foreign-currency prices of imports. The last variable in
the group is from the RHS of the Eq. 3.18 which determines our setting of the rates
of commodity taxes on household consumption.
The treatment of the next block of variables (headed "Macro targets and instru-
ments") distinguishes the simulations reported in the first two columns of Table 1.7
from that reported in its last column. In the first two columns, we shock "instru-
ments" (the overall real wage rate, f(g+l,1), and real aggregate consumption, CR) and
report the effects on "targets" (aggregate employment and the trade-balance/GDP ra-
tio). In the final column we assign values to the targets and report the changes in the
instruments required to attain these targets.
No revenue constraint is imposed in the simulations reported in Table 1.7. Hence,
real tax-revenue is endogenous and the consumption-tax shift variable ft (3) is exoge-
, (l j)
nous. Nominal wage rates/P(g+t,l)) " are endogenous in all the simulations in Table 1.7
but sectoral wage relativities are held constant. To implement this, the sector-specific
wage-shift variables <e(U)
~(g+1,1)) are exogenous.
In the first two columns of Table 1.6, the exogenous-endogenous assignments of the
variables included in the final block implement some other features of our short-run,
comparative-static environment. The availability of capital in each sector is assumed
to be unaffected by the shocks in the simulations reported in Table 1.7 - an orthodox
short-run assumption. Hence, capital stocks kX(g+l,2))
" (U) , are exogenous.
Aggregate investment (iR) is formally endogenous, the exogeneity of the vari-
able "tic" ensuring that real investment moves at the same (exogenous) rate as real
consumption. The allocation of investment between sectors, reflecting movements
in relative rates of return, is determined .by Eqs 3.13 and 3.14. The shift variable
f~J) is exogenous but the variable fk is endogenous, adjusting to ensure that sector-
specific investment changes are consistent with the movement in aggregate invest-
The nominal exchange rate (e) is the numeraire in the comparative-static simulations
with domestic prices, including the CPI, endogenous.
In the comparative-static simulations, we set the rate of export tax on commod.-
ity 1, the main exported commodity, exogenously and allow the model to determine
movements in the volume of exports. The basic price of domestically produced com-
modity 1 is then linked tightly to the world export price [cf., Eq. 3.12]. Exports of
60 P.B. Dixon and B.R. Parmenter

the other three commodities are exogenous, with their export-tax rates endogenous,
breaking the link between movements in their domestic and world prices. 53

Results. The results in row 3 of the first two columns of Table 1.7 show that, in
the standard short-run closure, employment is generated both by a cut in the CPI-
deflated wage rate (with aggregate real absorption held constant) and by an increase
in aggregate real absorption (with the CPI-deflated wage rate held constant). Both
shocks reduce the real wage from the employers' point of view, that is, they both
reduce the ratio of the wage to the average rental price of capital (row 4).
In the case of the increase in absorption, the main mechanism which triggers the
reduction in the employers' real wage is an increase in the terms of trade (row 5)
which raises the GDP price index (row 6) relative to the CPI (row 7). The increase in
absorption improves the terms of trade because it crowds out exports of commodity 1
(row 8), driving up its world price. Note that under the wage-cut shock exports expand
and the terms of trade deteriorate. This reduces the GDP price index relative to the
CPI, moderating the fall in the producers' real wage.
The elasticity of aggregate employment to the wage/rental ratio is greater for the
wage-cut shock than for the demand-expansion shock. This is explained by differences
in the compositional effects of the shocks. Relative to the wage cut, the demand
expansion stimulates sector 2 and inhibits sector 1 (the main producer of the exportable
commodity 1). Sector 2 is less labour-intensive than sectors 1 and 3. 54
As is implicit in our discussion of the results so far, the balance-of-trade effects
which accompany employment generation differ sharply between the two shocks
(row 12). Because it reduces domestic costs relative to world prices, the wage cut
stimulates exports (row 8) and inhibits imports (row 13), causing an increase in the
ratio of the balance of trade to the GDR Demand expansion has the opposite effect.
It raises domestic costs relative to world prices, crowds out exports and stimulates
imports. Hence, it causes a deterioration of the trade-balance/GDP ratio.
A package of the two policies could avoid balance-of-trade movements and produce
a more balanced expansion of the economy. The computation of such a package is
reported in the third column of Table 1.7. We computed the package via a closure
switch in which aggregate employment and the trade-balance/GDP ratio replace the
CPI-deflated wage rate and real aggregate consumption as exogenous variables. We
then use the model to compute the changes in the wage rate and in consumption which
are required to produce a 5 per cent increase in employment with no change in the
trade-balance/GDP ratio. As can be seen from column 3 of Table 1.7, a wage cut of

53This expedient reflects our inadequate understanding of what determines export volumes for the
economy's minor exports and a recognition that domestic prices of these products tend to move quite
independently of world prices. Some care is needed to ensure that collection of the endogenous export
taxes does not distort unduly a model's public-finance results.
54For a more extensive discussion of the employment effects of demand expansion in ORANI-style
models, see Malakellis and Peter (1991).
Ch. 1: Computable General Equilibrium Modelling 61

3.67 per cent and an absorption increase of 3.09 per cent is the required combination. 55
This produces an expansion of the economy in which all three sectors participate quite
With results like these [reported in Dixon, Powell and Parmenter (1979)], we were
able to counter the argument that in Australia's recessed economy of the late 1970s
assistance measures targeted at particular industries were required. Our argument was
that macroeconomic policy, suitably designed, would stimulate aggregate employment
without leaving structural problems.

(b) Effects' of tariff changes: Multi-step simulations

Another topic which we have analyzed via comparative-static simulations with

O R A N I is the short-run effects of reductions in tariffs on imports. The significance
of this issue in the history of CGE modelling and as a policy issue in Australia is
discussed in Section 4.
A closure suitable for short-run tariff simulations is given in the third column of
Table 1.6. It is identical to the closure used for the simulations underlying columns
1 and 2 of Table 1.7 (see the first column of Table 1.6) except for the treatment of
the variables listed in the section of Table 1.6 headed "Revenue constraint and wage
setting". In computing the effects of tariff changes we impose a real tax-revenue
constraint. By adjusting the consumption-tax rate, the model ensures that the tariff
changes are revenue-neutral. Hence, the real tax-revenue variable is switched with
the consumption-tax shifter (ft(3)) as exogenous. We also choose to hold constant
nominal, rather than real, wage rates. Hence, ~'(g+1,1) is exogenous, with Eq. 3.16
,e J(g+l,l)"
playing no role other than to determine the value v. r( lj)
Results of simulations of the abolition of tariffs under this closure are in Table 1.8.
Given our database (Table 1.2), abolishing the tariffs requires percentage reductions
in the powers of the tariff rates on the model's four commodities of, in turn, 16.00,
10.00, 0, 33.33. These shocks are quite large, raising doubts about the accuracy of
1-step Johansen/Euler solutions. Using Table 1.8 we can consider whether such doubts
are justified. Column 1 contains 1-step results, column 2 contains results computed via
a 2-step procedure with no extrapolation, column 3 contains results extrapolated by
the Richardson method from the 1-step and 2-step solutions, and column 4 contains
results extrapolated from 8-step, 16-step and 32-step solutions. We regard the results
in column 4 as not significantly different fiom the full non-linear solution. This is
confirmed by row 1 of the table which shows the percentage change in aggregate
tariff revenue asymptoting to - 1 0 0 as we increase the accuracy of the solution. With
a negative relationship between the c.i.f, value of imports (the revenue base) and the
tariff rates, it is clear that the 1-step solution will understate the revenue effects of

55Column 3 is just a weighted sum of columns 1 and 2 with the weights being 3.67 and 3.09 respectively.
62 P.B. Dixon and B.R. Parmenter

reducing tariff rates. It also understates the revenue effects of the offsetting increase
in the rate of taxes on consumption (row 2 of the table).
For all variables, the results in column 3 are close to those in column 4. This
confirms our experience that, in most cases, use of the Johansen/Euler procedure
with extrapolation allows very accurate solutions to be obtained with only modest
computing effort. To produce column 3 required just two solutions to the model (a
one-step solution and a two-step solution). The accuracy of even the 1-step solution
in column 1 is such that the policy conclusions which could be drawn from it are
not substantially different from those which could be drawn from the full non-linear
The simulations suggest that abolishing tariffs would stimulate imports (row 3)
but also exports (row 4). The increase in exports, given our assumptions about the
elasticities of the world demand schedules, would erode the terms of trade. The
sectoral effects (rows 7-9) show sector 1, the main exporter, as expanding strongly.
Sector 2 is held back by the large share of the main import-competing commodity
in its sales structure but enjoys a significant cost reduction because of the relatively
large share of imports in its input structure. Sector 3 contracts as consumers substitute
away from the non-traded commodity 3, the price of which has risen relative to the
prices of commodities 1 and 2.

(c) Five-year forecasts

Table 1.9 contains hypothetical five-year forecasts made with our illustrative model. In
forecasting mode, the model is recursive, using the investment specification described
for Case 2 in Section 2.3. The variables in the percentage change equations are to
be interpreted as year-on-year percentage growth rates. The forecasts in Table 1.9 are
designed to show how we are using the MONASH model 56 to make forecasts for the
Australian economy. They comprise five annual simulations each of which was made
with a 1-step Johansen/Euler computation. 57 After each annual simulation we use the
results to update the data and use the updated data as the basis for the next annual
In forecasting with MONASH we drive the very detailed CGE model with exoge-
nously specified scenarios for most macroeconomic variables and for some structural
variables. The macro scenarios are taken from a business forecasting group and the
structural forecasts from expert bodies like the Australian Bureau of Agricultural and
Resource Economics (ABARE). The role of macroeconomics in forecasting with CGE
models is discussed further in Section 4.

56MONASHis a multi-period forecastingversion of ORANI,see Adams, Dixon, McDonald, Meagher

and Parmenter (1994).
57If we are concerned about linearization errors in the annual simulations, multi-step procedm'escould
be substituted for the 1-step computations.
Ch. 1: Computable General Equilibrium Modelling 63

Closure for recursive Jbrecasts. The closure for the forecasting simulations is given
in the final column of Table 1.6. It differs from the standard short-run closure (column
1 of the table) only in the treatment of variables in the last of the four blocks.
,.(l j)
In each of the annual forecasts, the ~(g+l,2), i.e., the capital stocks used in the
forecast year, are exogenous, as they are in short-run comparative statics. Unlike in
comparative statics, for the forecasts we shock the x (o+1,2)"
Oj) Except for the first year,
the shocks are results for x}l--3')l~T2) (1) obtained from the forecasts for the previous
year. For the first year, the shocks are calculated from the data.
Aggregate investment (as well as aggregate consumption) is exogenous in the fore-
casts. This requires tic to b e endogenous, breaking the link in Eq. 3.19 between the
growth rates of investment and eo.nsumption. The allocation of investment between
sectors is determined in the forecasts'by.Eqs 3.13 and 3.14, as in our comparative
statics. Equation 3.13 determines through-the-year growth in the capital stock in each
sector according to the rate-of-return movement in the sector. The accumulation re-
lationship 3.14 then determines the sectoral investment growth required to support
the capital growth. The role of the endogenous variable fk is to ensure that th~ sec-
toral capital growth rates are consistent with the exogenous growth rate of aggregate
The numeraire in the forecasts is the rate of inflation of domestic consumer prices
rather than the nominal exchange rate. In the forecasts the export volumes of all
commodities are exogenous.

Forecast scenario for exogenous variables. The scenario which we have adopted for
the illustrative forecasts is listed in part (a) of Table 1.9. It is similar to the scenarios
which we have used for recent forecasts with MONASH [Adams, Dixon, McDonald,
Meagher and Parmenter (1994), Syntec (1993c)], showing the economy emerging
from a trough in the business cycle.
The illustrative scenario begins with projections of shifts in the foreign demand
schedules for the two exported commodities, i.e., projections of the changes in the
world prices which would occur if export volumes remain unchanged. The scenario
also includes projections of the growth rates of the world prices of the three imported
commodities and of export volumes. For our forecasts with MONASH, we get infor-
mation like this from ABARE. The projections for exports of commodity 1 exhibit
a cyclical pattern typical of the commodity markets which account for the bulk of
Australia's exports. For commodity 2, the projections show high but gradually falling
export growth rates. This is typical of the situation facing Australia's non-traditional
exports (manufacturing and services). In recent years these have exhibited very rapid
growth from a low base. Exports are expected to continue to be strong but as the
base grows it is unlikely that the very high growth rates will be sustained. Note in
Section (b) of the table that the world-price projections imply a cyclical movement
in the terms of trade.
64 t~B. Dixon and B.R. Parmenter

Projections for the real wage rate are for continuation of wage moderation. Wage
moderation was the cornerstone of the Australian government's macroeconomic policy
in the 1980s. With the high levels of unemployment produced by the recession of the
early 1990s, wage moderation is likely to continue. The projections incorporate mild
pro-cyclical movements in the rate of growth of the real wage rate.
Technical change is a particularly difficult component of the scenario required
for the forecasting simulations. For the MONASH forecasts, we set technical-change
scenarios in the light of past patterns of technical change. At the level of disaggregation
at which MONASH works, generating evidence about these past patterns requires a
major research effort [cf. Dixon and McDonald (1993a)]. The hypothetical scenario
in Table 1.9 is limited to projections of labour-saving technical change. 5s We have
built into it two stylised facts: pro-cyclical movements in labour productivity; and
slow measured technical progress in services (sector 3) relative to other sectors.
The scenario includes projections for the domestic demand aggregates. These fol-
low the cyclical pattern dictated by the export projections and their implications for
the terms of trade. Typical of historical experience, investment is projected to be
significantly more volatile than consumption.
Our hypothetical scenario includes a program of tariff reductions which is assumed
to be complete by the end of the third year.
The final two variables in the exogenous scenario are the CPI and the demographic
variable, the number of households. The first of these is the numeraire. Our projections
for import prices reveals our implicit assumption about the world rate of inflation.
Note that we are assuming that on average the domestic rate of inflation will be a
little lower than the world rate. This has been a feature of Australia's recent economic
history. The only role played in the forecasts by the demographic variable is in the
determination of the commodity composition of aggregate consumption (see Eq. 3.3).

Results' f o r endogenous variables. Part (b) of Table 1.9 reports forecasts for selected
endogenous variables. Space precludes a very extensive discussion of these results.
Our strategy for demonstrating the type of insights which this forecasting technique
allows is first to explain the results for year 1 in some detail. The type of explanation
which we give for the year-1 colunm should readily be applicable to the other columns.
We then point to some features of the across-years pattern of the results which illustrate
the implications of the underlying dynamics of the model.
In looking at the results for year 1, we start by noting that the terms-of-trade
forecast is a direct implication of the exogenous scenario. The modest upward shift of
the foreign demand schedule of the major export commodity together with the forecast
expansion of the export volume imply relatively slow growth in its world price. Hence,
the terms of trade fall. The terms-of-trade decline implies that the GDP deflator must

5SThis is not to suggest that other aspects of technical change are unimportant. Dixon and McDonald
(1993a) find, for example, that input biases in the patterns of intermediate-input-savingtechnical change
were very important in explaining the growth of imports in the Australian economyin the late 1980s.
Ch. 1: Computable General Equilibrium Modelling 65

rise less rapidly than the CPI. Because we have assumed that the nominal wage rate
rises faster than the CPI, the wage/rental ratio rises, reducing the labour/capital ratio.
For year i, growth in the capital stock is predetermined by our data. The rates of
growth of capital, together with the forecast rates of growth of employment and the
of rates of labour-saving technical change included in the exogenous scenario, imply
real GDP growth of a little less than 3 per cent.
As with the terms of trade, the rate of growth of the export volume index is
implied directly by our exogenous scenario. Since the rates of growth in the exogenous
scenario for domestic investment and consumption are less than the forecast rate
of growth of GDP, imports are forecast to grow less rapidly than exports, i.e., an
improvement in the real trade balance is implied. With no adjustment in the real
exchange rate, imports would grow more rapidly than the forecast rate, due mainly to
the tariff reductions which are included in the exogenous scenario. Hence, the model
forecasts a real depreciation. With the domestic rate of inflation assumed to be less
than the foreign rate only a small nominal depreciation is required.
Forecasts of structural variables are reported at the bottom of Table 1.9. We begin
with growth rates for outputs of domestically produced commodities. Output growth
for commodity 1 is forecast to be strong relative to GDP growth for two reasons:
because of the rate of export growth included in the exogenous scenario and because
the real devaluation is (notwithstanding the tariff cut) strong enough to allow domestic
production to increase its share vis-a-vis imports in the domestic market. Commodity 2
faces a larger tariff cut and loses market share to imports. Exports of commodity 2
grow very rapidly but are only a small share of its total sales and make a only a
small contribution to its aggregate output growth. As might be expected, the growth
rate forecast for the output of the non-traded commodity 3 is close to the growth rate
forecast for real GDR
Growth of capital used in year 1 depends entirely on initial conditions in our data.
As implied in footnote 26, growth rates in capital used in year 1 are given by through-
the-year growth rates in the data (year 0). The main feature of the forecasts for growth
of capital in use in year 1 is that the capital stock in sector 1 is growing much more
rapidly than that of sector 2. Recall that both these sectors produce commodities 1
and 2. The faster rate of capital growth in sector 1 allows it to gain market share
against sector 2 in the production of both these commodities. This is apparent in
forecasts of growth in sectoral activity levels. Activity in sector 3, a single-product
sector which is the sole producer of commodity 3, must expand at the same rate as
the output of commodity 3.
The employment forecasts are implied directly by the forecasts for growth in capital
usage, growth in activity, and technical change. With rapid capital growth and labour-
saving technical change, employment growth in sector 1 is quite modest despite its
strong activity growth. In sector 2 also, labour-saving technical change is forecast to
keep employment growth slower than activity growth, despite the absence of growth
in the capital stock. According to our exogenous scenario, sector 3 enjoys no technical
66 P.B. Dixon and B.R. Parmenter

improvement. Hence, even with quite strong capital growth, employment growth at a
rate not much less than activity growth is required.
Our sectoral results for through-the-year capital growth in year 1 are quite similar
to our results for growth in capital usage in year 1. Both in the through-the-year
and usage results, sectors 1 and 3 have much higher growth rates than sector 2. The
similarity in the through-the-year and usage growth rates reflects sluggishness in the
adjustment from year to year in through-the-year growth rates. (Recall that the growth
rates in capital usage in year 1 are the through-the-year growth rates in year 0.)
This sluggish adjustment is implemented via our treatment of the shift variables in
Eq. 3.13. According to Eq. 3.13(a), growth in sector j ' s capital stock depends on its
net rate of return and on the product of the shift variables Fk and P(J) As mentioned
in Section 3.3, the net rate of return on capital in the data for year 0 is 5 per cent
for all sectors. With capital use for sectors 1 and 3 growing strongly in year 1, we
can see that a property of our base solution must be that Fk(O)F(~2)(O)has a low
value compared with Fk(O)F(kJ)(o) for j = 1 and 3. In our simulation for year 1, the
percentage movement in F~F(kj) is determined by the percentage movement in Fk
and is, therefore, the same for all j.59 Hence, in year 1, FkF~2) remains low relative
to ~ k , j = 1 and 3. This explains the relatively low growth through year 1 in
sector 2's capital stock. Scarcity of capital raises the rate of return in sector 2 so that
in later years capital growth in sector 2 is not far below that in sectors 1 and 3.
Despite having the lowest capital growth through year 1, sector 2 has the highest
growth in investment. Growth in a sector's investment reflects the change in the
sector's rate of growth in capital. Between years 0 and l, sector 2's capital growth
increases (from - 0 . 0 2 to 1.09 per cent) whereas capital growth in both the other
sectors declines. As explained already, capital growth in sector 2 increases relative to
that of the other sectors because the rate of return in sector 2 increases relative to that
of the other sectors.
We now move to the second part of our explanation strategy. That is, by looking
across some rows in Table 1.9 we describe some aspects of the dynamic operation of
our forecasting model.
Starting with the GDP results we see strong growth in years 2 and 3. This reflects
improving terms of trade, higher rates of growth of export volumes, and rapid growth
in investment. In years 4 and 5, all these weaken as determinants of growth.
Employment follows this cycle closely, despite the pro-cyclical movement in labour-
saving technical change which we assume. The effects of the increase in the rate of
technical change in year 2 can be seen, nevertheless, in the employment forecasts and
productivity changes for sectors 1 and 2. Contributing to the strength of employment
growth in year 2 is the sluggish adjustment of the capital stock. Investment is assumed
to boom in years 2 and 3 but its growth in year 1 is quite low. With growth of capital

59In Table 1.6, F (j) is exogenous and unshocked.

Ch. 1: Computable General Equilibrium Modelling 67

used in year 2 depending on investment in year 1, capital growth in year 2 is unable to

keep pace with output growth. This results in a fall in the wage/rental ratio and very
strong employment growth, mainly in sector 3, which experiences no labour-saving
technical change. On the other hand, in years 3-5, following the investment boom,
capital growth exceeds output growth, increasing the wage/rental ratio and slowing
the rate of growth of employment.
Finally, we note that a number of rows in Table 1.9 show the effects of our between-
year updates of the model's data. The export volume index is one example. In years
1 and 3 the shocks to export volumes in the exogenous scenario are identical but the
export volume index grows more rapidly in the second of these two years. The reason
is that in year 3 the weight in the index of commodity 2 (the most rapidly growing
export) is greater than it was in year 1.

4. Concluding remarks: Success, partial success and potential of CGE modelling

This section contains three propositions:

(1) CGE models have provided useful insights on the likely effects of disturbances
in one part of the economy on activity in other parts;
(2) CGE-based analyses of the welfare effects of proposed policy changes have
been only partially successful; and
(3) CGE models are yet to fulfill their potential for providing guidance to people
concerned with investment and other business decisions.

4.1. Success: Quantifying linkages between different parts of the economy

Before CGE models there were input-output models. These emphasize input-output
linkages between industries. They imply that stimulation of the motor vehicle indus-
try, for example, perhaps from the imposition of a tariff, stimulates the sheet metal
industry. In turn, this stimulates the steel industry and so on.
Input-output computations imply that stimulation of any one industry stimulates all
industries with widespread employment gains. Not surprisingly, input-output models
have been and remain a popular tool of lobbyists seeking government favours for
their industries.
CGE models go beyond input-output models by linking industries via economy-
wide constraints. These include: constraints on the size of government budget deficits;
constraints on deficits in the balance of trade; constraints on the availability of labour,
capital and land; and constraints arising from environmental considerations such as
air and water quality. With these constraints in place, the economy-wide implications
of stimulation of one industry can be negative and a favourable outcome for some
industries can be at the expense of others.
68 P.B. Dixon and B.R. Parmenter

For many years, CGE models have provided results of this type for the effects of
changes in protection. 6° As in Table 1.8, these show protection as favouring import-
competing industries while harming export-oriented industries. For example, since
the mid-seventies, ORANI simulations for Australia have generated results with the
following flavour. 61

* An increase in protection for textiles, clothing footwear and motor vehicles saves
jobs in these industries.
• However, it increases the prices of their products, thereby increasing the CPI.
• With wage rates being linked to the CPI, there is an increase in nominal wage
rates. 62
• This causes cost increases throughout the economy with a profit squeeze and job
losses in those industries which are poorly placed to increase their selling prices.
• Industries in this category are those relying largely on exports for their sales.
Selling prices for these industries are determined on world markets, independently
of their costs.
• Thus in ORANI, with the protected sector and the exporting sector linked through
the labour market, the initial stimulation of TCF and motor vehicles arising from
an increase in protection is translated into a contraction for agriculture, mining
and other export-oriented activities.
• With the real wage rate fixed economy-wide, ORANI implies that increases in
protection have little effect on aggregate employment. The number of jobs gained
in protected industries is approximately balanced by the number of jobs lost in
export-oriented activities.
• Changes in protection change the regional allocation of activity in Australia
with Victoria gaining from increases in protection and Queensland and West-
ern Australia losing. Similarly, changes in protection change the occupational
composition of employment.
Apart from protection, there are many other issues for which adequate analysis
requires recognition of linkages arising from economy-wide constraints. Some of
these were indicated in the opening paragraph of Section 1. Here we give one more
example, again drawing on an ORANI application [see Adams and Parmenter (1993
and I995)].
The question to be answered was: what would be the implications for the states of a
general stimulation of international tourism in Australia? Part of the ORANI-generated

6°See, for example Srinivasml and Whalley (1986) and Whalley (1985).
61The main reference on ORANI is Dixon, Parmenter, Sutton and Vincent (1982). Chapter 7 of that
book is a detailed report of an ORANI tariff simulation. For an overview of the role of ORANI in the
Australian economic debate, see Powell and Shape (1993).
621n most ORANI simulations, it has been assumed that real (CPI deflated) pre-tax wage rates are fixed
and that increases in tariff rates are accompanied by cuts in income taxes, not by CPI-reducing cuts in
other indirect taxes.
Ch. 1: Cnmputable General Equilibrium Modelling 69

answer was that Queensland (Australia's sunshine state and the main destination of
many foreign tourists) would be a small loser. The explanation relies on linkages
between different parts of the economy provided by constraints on the trade accounts.
Certainly international tourists spend money in Queensland, although not as much
as we thought prior to becoming familiar with the relevant statistics. Although many
tourists travel in Queensland, the bulk of their money is spent in New South Wales,
especially on airline tickets for flights in and out of Sydney. Thus, the Queensland
economy experiences moderate (not huge) gains from the expenditures of international
The downside for Queensland comes from the trade accounts. With stimulation
of international tourism, there is, according to ORANI, a strengthening of the ex-
change rate. This impacts adversely on export-oriented-activities including mining
and agriculture. 63 With these activities representing a comparatively large share of its
gross state product, Queensland is left as a net loser from general tourism stimulation.
It is in the tracing out of linkages arising from economy-wide constraints that CGE
modelling has had its greatest successes. With the advent of CGE modelling, the
input-output approach, with its exclusive reliance on linkages arising from flows of
intermediate inputs, is no longer credible.

4.2. Partial success: Analysis of welfare effects

Much of CGE modelling has been concerned with the welfare implications of proposed
policy changes, for example changes in protection, changes in taxes and changes in
environmental regulations. Usually these welfare implications have been measured by
calculating the variation in consumer income which would produce the same variation
in consumer utility as that generated in the CGE simulation of the policy change under
Many interesting welfare results have been obtained, especially in the analysis of
tax changes. For example, using a 19-sector, 12-consumer, multi-period, CGE model,
Ballad, Shoven and Whalley (1985) calculated the marginal excess burden 64 (MEB)
of US taxes on labour, capital, consumption, income and output. Under a variety of
assumptions concerning the elasticity of saving with respect to the real after-tax rate of
return, and the elasticity of labour supply with respect to the real after-tax wage rate,
they found MEBs in the range 0.18 to 0.56. This has an important implication for the
assessment of publicly funded projects. Because of the necessity of generating finance
through increases in decision-distorting taxes, only those publicly funded projects

63This is ml exmnple of Dutch disease: a boomingexport sector (tourism) impacts adversely on other
trading sectors, see, tbr example, Corden (1984).
64The MEB of a tax is x if an increase in the rate of the tax sufficient to increase governmentrevenue
by $1 leaves householdswith the same level of welfare as the imposition of a lump-sum(non-distortionary)
tax of $(1 + x).
70 I~B. Dixon and B.R. Parmenter

which provide benefits valued by consumers well in excess of their costs should be
Nevertheless, we rate CGE work in the area of welfare analysis as only partially
successful. Typically, in the calculations supporting this work, it has been assumed that
the proposed policy change does not affect: the levels of involuntary unemployment
of labour and capital; the form of competition between firms; and rate of techno-
logical progress. It is assumed that welfare changes arise only from reallocations of
consumer budgets between different goods (including possibly leisure and savings)
and reallocations of scarce factors of production between different industries. Such
CGE calculations of welfare changes often produce small and unconvincing numbers.
We consider two examples: the costs of protection and the costs of reducing CO2

(a) The costs o f protection

Even for countries with high and non-uniform tariffs, typical CGE calculations show
gains from moving to free trade of less than 1 per cent of GDR This result could have
been anticipated from pre-CGE work on the costs of protection. For example, in a
theoretical article containing illustrative arithmetic, Johnson (1960) demonstrated that
under competitive assumptions with normal settings for demand and supply elastici-
ties, the costs of protection are likely to be a very small share of GDR Dixon (1978b,
p. 63) concluded that if the principal aim is to measure the costs of protection, "it
would be pointless to apply a model which failed to recognize intra-industry special-
ization and economies of scale. Such a model is virtually certain to generate a paltry
estimate for the costs of protection, whatever the true situation might be".
In Australia, where there have been considerable reductions in protection over the
last 10 years, costs-of-protection (welfare) numbers derived from CGE models have
been ignored. In implementing anti-protection measures, policy makers have referred
to mechanisms not usually included in CGE calculations. Among these omitted mech-
anisms are the effects of increased competition from imports on the structure of in-
dustries and on the behaviour of both management and unions. With lower protection,
policy makers have argued
(a) that there are likely to be reductions in numbers of firms and product lines allowing
lower costs through economies of scale, and
(b) that management is likely to work more effectively and that unions are less likely
to take actions imposing cost increases on firms.
Recognizing that they are missing the main motivations for reductions in protection,
CGE modellers have sometimes enhanced their welfare calculations by assuming that
tariff cuts are accompanied by exogenously given improvements in productivity. This
can produce welfare numbers more in keeping with the views of anti-protectionists.
However, such CGE calculations merely illustrate the implications of anti-protection
arguments. They neither explain these arguments~ nor provide evidence in their support.
Ch. 1." Computable General Equilibrium Modelling 7l

lvrd , l


/a /

lutput = Z I / 2 ~Output =Z i

O u t p u t = 2 Z-

Price setting line

(equation 4.8 with ~i.= 7)

v I
0 6 12 18 ~I Ni
Figure 1.3. Zero pure profits and price-setting with imperfectcompetition and economies of scale.

The most celebrated CGE work incorporating some of the features required for a
satisfactory analysis of the costs of protection is by Harris and Cox (1983) for Canada.
They allowed for economies of scale, intra-industry specialization and non-competitive
market structures. Their lead has been followed by Horridge (1987), Norman (1990),
Mercenier (1994a) and others.
An illustration of the theoretical approach adopted in these studies is given in
Fig. 1.3. The horizontal axis shows the number of firms, N i , in industry i and the
vertical axis shows the markup up in industry i over variable costs:

M U , g = Pg/V~ - 1 (4.1)

where Pi is the price of good i and Vi is variable cost per unit of output.
We assume that firms in industry i incur an annual fixed cost, Fi, and that their
variable cost, V~, per unit of output is independent of their output level. This implies
72 PB. Dixon and B.R. Parmenter

that each firm experiences increasing returns to scale. We also assume that the number
of firms in the industry adjusts to ensure zero pure profits, i.e.,

p~z~ = v~z~ + F~Ni, (4.2)

where Zi is industry output. By rearranging (4.2) we obtain

MU,~- (F~/vd(N~/zd. (4.3)

Equation (4.3) is represented in Fig. 1.3 for different levels of industry output by lines
OA, O X and OA ~. In drawing these lines, we assumed that Fi/V~ is constant with
respect to variations in Ni and Zi.
The line BC in Fig. 1.3 is a price-setting line for a typical firm in industry i. It is
drawn on the assumption that as the number of firms increases, the industry becomes
more competitive, i.e., as 2V,i rises, each firm perceives a higher (larger negative)
elasticity of demand for its product. With profit maximizing behaviour, the Lerner
(1934) condition will apply. This can be written as 65

MUiq = - 1/(1 + eiq) (4.4)

where MUiq is the markup by firm q in industry i, and Ciq is q's perception of the
elasticity of demand for its product.
Assuming all firms are alike, so that Mgiq = MUi and eiq = ci /'or all q, we can
rewrite (4.4) as

MU~ = - 1/(1 + ~(N~)). (4.5)

With the perceived elasticity, ci, becoming a larger negative number as Ni increases,
we see that MUi is negatively related to N~. This is reflected in the negative slope
of BC.
For CGE modelling, we need a numerically specified relationship between ci
and Nz. A possible starting point [Horridge (1987)] for obtaining such a relation-
ship is to assume that demanders of product i have preferences for different varieties
given by a CES function. This implies that the demand function for the product of
the qth firm in industry i has the form:

X(iq) = a i - - ° - i ( P ( i q ) - - ~ S ( i r ) P ( i r ) ) (4.6)

65profit maximization requires that marginal revenue equals marginal cost, i.e,,

~(PiqZiq)/~Ziq =-- Piq -t- ( O P i q / O Z i q ) Z i q : Viq.

By dividing through by Piq, we arrive at (4.4). We assume that elq <i -1 so that Mgiq > O.
Ch. 1: Computable General Equilibrium Modelling 73

X(iq) is the percentage change in the demand for product i of variety q (the variety
produced by the qth firm);
ai is the percentage change in the activity variable (e.g. household disposable
income) relevant in the determination of overall demand for product i;
P(iq) is the percentage change in the price of product (iq);
S(ir) is the share of the total sales of product i accounted for by variety r;
Gi is the elasticity of substitution by users of i between different varieties.

Now we assume that all firms in industry i are the same size (S(i~) = 1 / N i for
all r) and that they are Bertrand rivals, i.e., they behave as if they expect changes in
their prices to generate no price response from their competitors. Then, each firm's
perceived elasticity of demand is

= - l/Nd, (4.7)

giving a numerically implementable equation for the BC line in Fig. 1.3 of the form 66

M U , i = - 1 / ( 1 - Gi(1 - 1 / N i ) ) . (4.8)

Another approach to specifying the price-setting line is to adopt a price-leadership

model such as that of Eastman and Stykolt (1967). They assume that


where Pi"~ is the c.i.f, import price of commodity i, Ti is the tariff rate, and c~i
is a parameter. Under (4.9), the price-setting line is horizontal. Irrespective of their
number, all firms in industry i are led by the landed-duty-paid price of competitive
imports. The markup for each firm is

MUi = aiP~(1 + Ti)/V,i - 1. (4.10)

Harris and Cox (1983) and Horridge (1987) experimented with price-setting as-
sumptions combining both the Lerner/Bertrand and the Eastman/Stykolt specifica-
tions. For example, Horridge assumed that the markup of firms in import competing
industry i is

M U i = W1 [ - 1 / ( 1 - ~ i ( l - 1/Ni))] + W2 [aiPim(1 + T i ) / V i - 1], (4.11)

where W1 and W2 are nonnegative weights summing to one.

66We assume that ai > N i / ( N ~ - 1), ensuring that M U i > O.

74 t~B. Dixon and B.R. Parmenter

In an application of his model to the analysis of the effects of reductions in pro-

tection in Australia, Horridge found that when W1 is close to 1 (the Lerner/Bertrand
pricing specification) the results are similar to those obtained under competitive as-
sumptions with constant returns to scale. In particular, calculated costs of protection
are small. He also found that when W2 is close to 1 (the Eastman/Stykolt specifi-
cation), the calculated costs of protection are much larger than those obtained under
competitive, CRS assumptions. Both these results can be understood by reference to
our diagram.
For import-competing industries, Horridge assumed, on average, that Ni is about 12
and cr~ is about 7, giving a typical value for MUi in (4.8) of 0.185. As shown in the
diagram, variations in Ni from 6 to 24 cause relatively little variation (from 0.207
to 0.175) in MUi. Because his price-setting lines were quite flat, Horridge found
in the Lerner/Bertrand case that changes in protection could cause large changes in
industry outputs without causing much change in markups. Consequently, he found
that percentage movements in Ni were approximately equal to percentage movements
in Zi. (Notice in Fig. 1.3 that if output in industry i doubles, switching us from
ray OA / to OA", then Ni approximately doubles. Similarly, if Z~ halves, then Ni
approximately halves.) With N~ approximately proportional to Zi, total fixed costs
in industry i are also approximately proportional to Zi. Thus, despite allowing for
imperfect competition and for economies of scale at the firm level, the Lerner/Bertrand
version of Horridge's model behaves in a similar way to a competitive model with
CRS specified at the industry level.
With the Eastman/Stykolt specification (W2 = 1), a cut in protection causes a
downward shift in the price-setting line. Because the Eastman/Stykolt price-setting
line is completely flat, a downward shift causes a reduction in MUi, irrespective of
what happens to Zi. With a reduction in MUi, there is an increase in output per firm
and a reduction in fixed costs incurred in the industry per unit of output. Relative to
the Lerner/Bertrand case, Horridge found that this saving of fixed costs per unit of
output generates a considerably increased figure for the welfare gain from eliminating
On the basis of the work by Harris and Cox, Horridge and others, we can conclude
that the costs of protection depend critically on production technologies and on how
firms in protected industries compete with each other. However, we already knew this
from theoretical literature such as Corden (1974). While CGE modellers have made
considerable progress in dealing with imperfect competition and economies of scale,
they have, as yet, failed to incorporate sufficient empirical detail to allow a useful
narrowing of the range of possible estimates for the costs of protection.

(b) The costs of reducing C02 emissions

Our second example of the inadequacies of CGE-based welfare analysis concerns the
costs to Australia of reducing CO2 emissions by 20 per cent by 2005, i.e., the costs
Ch. 1: Computable General Equilibrium Modelling 75

of meeting the Toronto target. Using a version of ORANI, the Industry Commission
(1991) concluded that the main action required in Australia to meet the Toronto target
is the substitution in electricity generation of low CO2 fuels, such as oil and gas, for
high CO2 fuels, especially brown coal. They found that this would involve an annual
welfare cost of about 1.5 per cent of GDE
As with most CGE welfare calculations, the Industry Commission calculations were
comparative static. They compared two ,pictures of the Australian economy in 2005:
one in which Australian electricity generation continued to rely mainly on coal with
CO2 emissions being of no concern, and the other in which a major fuel switch
had taken place to facilitate a sharp reduction in CO2 emissions. As the Industry
Commission recognized, adjustment costs over the period between now and 2005
were omitted from their calculations. For example, no account was taken of the extra
investment needed over this period to replace brown-coal-fired generation plants in
the La Trobe valley (a brown-coal producing region with enormous investments in
generation capacity).
The Industry Commission's work on CO2 emissions indicates that for convincing
welfare analysis, we need to add dynamics and adjustments costs to the list of nec-
essary features of the model. Unfortunately, the dynamics required are complicated.
Because they do not deal adequately with scrappage, simple dynamic models, assum-
ing perfectly malleable capital stocks, are inadequate. Dynamic analyses of the costs
of meeting CO2 emission targets which have adopted the malleability assumption
(brown-coal-fired generation capacity can be converted effortlessly into oil/gas capac-
ity) include Jorgenson and Wilcoxen (1992 and 1994). These analyses, as with those
based on comparative statics, may seriously underestimate the costs of adjusting to
meet environmental objectives.
McKibbin and Wilcoxen (1993a and 1993b) analyze the effects of greenhouse-
gas reductions in a dynamic model with adjustment costs of the type discussed in
Subsection 2.3 (Case 4). While their work is an advance, it suffers from the following
limitations: the form of the adjustment-cost specification (they u s e OI2/K) has no
clear theoretical or empirical justification; the critical parameters, the 0s, are merely
assigned, not estimated; and their model is highly aggregated (for example electricity
is a single industry) meaning that the costs of moving resources within large sectors
of the economy are ignored.
Perhaps the most promising approach to creating models capable of generating
satisfactory estimates of the costs of environmental policies is that of Manne (1991).
He is attempting to absorb a detailed energy model such as MARKAL (Fishbone
et al. 1981) into a CGE framework. Under our definition (Section 1.1), MARKAL is
not a CGE model: it treats prices exogenously and includes insufficient specification
of the behavior of economic actors outside the energy sector. MARKAL's strength is
that it can include specifications of dozens of energy-producing technologies (such as
brown-coal-fired-electricity generation in the south-eastern area) based on engineering
data. Associated with each technology is a non-malleable capacity constraint and a
76 P.B. Dixon and B.R. Parmenter

specification of the costs of creating additional capacity. Progress in taking MARKAL-

like structures into a CGE framework has also been made by Adams et al. (1991) and
Jones et al. (1991).

4.3. Potential: Disaggregated forecasting

Most CGE modelling has been concerned with the effects of proposed policy changes
or the effects of exogenous events, e.g., the discovery of mineral deposits. However,
there is strong demand for forecasts. Disaggregated forecasts are required to help
policy makers, investors, trade unions and households to form realistic expectations
concerning: real wage growth; the costs of capital relative to labour; the industrial
composition of economic activity; employment growth in different occupations and
industries; and growth rates in different regions.
CGE models have not yet proved themselves to be valuable forecasting tools. While
their tight theoretical structure is an attractive feature, it is far from sufficient. In our
efforts to transform ORANI into a forecasting tool we have identified the following
areas as requiring major effort.

(a) Achieving good macro forecasts

The first attempt to use ORANI in forecasting mode was Dixon (1986). Forecasts
were produced for the period 1986 to 1990. The main feature of these forecasts at
the macro level was a sharp reduction in Australia in the costs of capital. This was
supposed to follow from two sources: a reduction in real interest rates world-wide in
response to a contraction in the US budgetary deficit; and the formation of market
expectations that the Australian exchange rate would be strong through the forecast
The assumed reduction in the costs of capital produced in our forecasts an in-
vestment boom, rapid real wage growth and average annual GDP growth of over
5 per cent. At the industry level, we projected good prospects for investment-related
industries such as construction.
In later papers, e.g., Dixon and Parmenter (1987), we argued that foreign financiers
would insist that Australia stabilize its foreign debt as a share of GDP by the end
of 1990. Through ORANI, we found that this implied a sharp real devaluation of the
exchange rate with high real interest rates and costs of capital. This led to forecasts of
only modest real GDP growth, poor prospects for real wage growth and poor prospects
for investment and investment-related industries.
None of our early ORANI forecasts have been close to reality. It is now clear that
we did not kno~w enough about how to forecast the macro economy. Because our
macro forecasts were inaccurate, our industry forecasts were unrealistic.
There are two approaches to macro forecasting in a CGE framework. One is to
rely on the CGE-generated macro implications of assumptions concerning the future
Ch. 1: Computable General EquilibriumModelling 77

paths of variables such as aggregate employment, required rates of return on capital,

technical change and changes in the terms of trade. This was the approach we used in
our early forecasting exercises with ORANI. In the second approach, we rely on the
CGE model only for structural forecasts, e.g. forecasts of the industrial composition
of GDP and the occupational and regional composition of employment. Under this
approach, we force the CGE model to produce results compatible with exogenously
supplied macro forecasts. These can be derived from a conventional macro model
emphasizing business cycle phenomena. Compatibility between the CGE model and
the macro forecasts is achieved by endogenizing in the CGE model such variables as:
an overall measure of technical change (allowing compatibility between exogenously
specified levels for GDP and for aggregate inputs of capital and labour); an overall
measure of import/domestic preferences (allowing compatibility between exogenously
specified levels of aggregate imports and of the real exchange rate); the average
propensity to consume (allowing compatibility between exogenous specified levels
for consumption, GDP and tax rates); and the overall required rates of return on
capital (allowing compatibility between exogenously specified levels for aggregate
investment and for overall real unit labour costs).
Eventually, it may be possible to generate realistic macro forecasts in a CGE model
without help from specialist macro forecasters. However, at this stage it seems sen-
sible to exploit the advantages of division of labour. For example, in forecasting for
Australia, it is necessary to pay close attention to overseas economies. This is because
Australia's business cycle is closely connected to that of the US and other major coun-
tries. The explanation is that growth in the world economy is the main determinant of
movements in Australia's terms of trade. These movements exert a strong influence
on GDR the exchange rate and other macro variables in the Australian economy. By
building a CGE model capable of using exogenously given macro forecasts, we have
been able to draw on the expertise of macro modellers and business forecasters spe-
cializing in the study of overseas economies and their macro economic influence on
Australia. This leaves us free to specialize in CGE modelling of industries, regions
and occupations. 67

(b) Creating and maintaining up-to-date input-output data

In most countries, input-output tables are published by the official statistical bureau
with a long lag. For example, until February 1994 the latest input-output tables
published by the Australian Bureau of Statistics (ABS) were for 1986-1987.
Out-of-date input-output data do not usually pose major difficulties for comparative
static applications of CGE models. For example, Dixon, Parmenter and Rimmer (1986)
found that the simulated effects of a given tariff cut varied little as they changed the

67In generating CGE forecasts for industries, regions and occupations, we are currently using inputs
from Murphy's(1988 and 1991)macroeconometricmodel and fromthe business forecastinggroup, Syntec
(1993a and b). See Adamset al. (1994) and Syntec (1993c).
78 P.B. Dixon and B.R. Parmenter

input-output database underlying their CGE model from 1969 to 1975. Nevertheless,
timeliness of input-output data is vital for forecasting. This is especially true for
forecasting the prospects of investment-supplying industries. Working from an out-of-
date database, a CGE model may be able to produce satisfactory forecasts of growth
in the housing stock reflecting demographic and income projections. However, for
forecasting the prospects of residential construction, cement, bricks, glass and other
industries closely associated with home building, we need current data on the level of
activity in these industries. If the construction activities are currently subdued, then
the achievement of a given growth path for the housing stock may imply that they
have strong growth prospects. Alternatively, if their current level of activity is high,
then the same path for the housing stock may imply a construction slowdown,
As explained in Dixon and McDonald (1993a), we have devoted considerable re-
sources to updating input-output tables published by the ABS. Out initial motivation
was to provide an up-to-date starting point for our CGE forecasts. A subsidiary benefit
has been a detailed quantification for the second half of the 1980s of technological
change and of changes in consumer tastes. This has helped us to develop forecasts
of these variables for the 1990s. In addition, the update project has given us a frame-
work for analysing structural changes in the Australian economy [see Dixon and
McDonald (1993b)].

(c) Disaggregating and understanding what the statistics for the industries represent

Most published CGE models have less than 30 industries. For many purposes this
provides inadequate disaggregation. For example, in Subsection 4.2(b) we argued that
convincing analysis of the costs of limiting CO2 emissions requires greater disag-
gregation of the energy sector than is normal in CGE models. In forecasting, even
with a 100-industry model it is difficult to meet the requirements of clients, both in
the public and private sectors, seeking guidance in the allocation of funds between
alternative investment possibilities.
The development of a relatively disaggregated CGE forecasting model is a major
task. We are finding that it is necessary to think carefully about what the statistics for
each industry really represent. It is not enough to follow the usual practice in CGE
modelling of adopting the same specification (e.g. Leontief, CES, nested-CES, etc.) to
describe each industry's technology, with only the parameter values differing between
industries. Similarly, a uniform specification of how imported products complete with
domestic products (e.g. the Armington specification) is adequate. We give two exam-
• Communication. In the input-output tables published by the Australian Bureau
of Statistics this industry has considerable imports and exports. Does this mean
that output and employment in the industry are highly sensitive to exchange rate
movements and to costs in Australia relative to costs overseas? This is the con-
clusion that follows in ORANI under standard specifications of the behaviour
Ch. 1: Computable General Equilibrium Modelling 79

of trade flows. On getting to known about the nature of trade flows in commu-
nication, we find that this is not a sensible conclusion. Communication imports
are mainly payments by Australia's Telecom to overseas telephone companies
for facilitation of the transmission of calls from Australia. There is also a rental
component for Australian use of foreign-owned communication satellites. Com-
munication exports are mainly payments to Australia's Telecom for facilitating the
transmission of calls coming from overseas. Given the nature of these trade flows,
we expect future movements in exports to be approximately in line with those in
imports (calls go to and fro). After modifying our specification of the industry
to recognize the links between its imports and exports, we no longer find that its
output and employment are highly sensitive to its international competitiveness.
• Aircraft. Does an upsurge of imports of aircraft harm employment and output
in the Australian aircraft industry? This is the result that O R A N I gives under
standard specifications. However, on looking into what the industry does we find
that its product is complementary with imports rather than competitive. The local
industry specializes in aircraft repairs and the manufacture of parts. On changing
the standard specification to reflect this, we find that the local industry is likely
to prosper during a period of strong growth in the volume of imported aircraft.
The availability of programs such as G A M S and G E M P A C K mean that computa-
tional difficulties are not currently a binding constraint in C G E modelling on either
disaggregation or on the use of industry-specific specifications. What is now required
for the creation of practical, decision-oriented CGE models is a willingness by model
builders to increase the amount of information incorporated in their models. To do
this, they will need to work closely with their national statistical agencies. They will
also need to work in teams rather than as individuals. Research teams will be neces-
sary to handle the work loads involved in implementing highly disaggregated C G E
models containing thoughtful theoretical specifications for each industry.


Chapter 2


Cal!fi)rnia Institute qf Technology

University of Minnesota


1. Introduction 88
2. Notation and problem statement 90
3. Computing a sample equilibrium 92
3.1. Two-person games: The Lernke-Howson algorithm 92
3.2. N-person games: Simplicial subdivision 102
3,3. Non-globally convergent methods 106
4. Extensive form games 109
4.1. Notation 109
4,2. Extensive versus normal form | 12
4.3. Computing sequential equilibria 113
5. Equilibrium refinements 115
5.1. Two-person games 115
5.2. N-person games 117
6. Finding all equilibria 118
6.1. Feasibility 119
6.2. Exemplary algorithms for semi-algebraic sets 120
6.3. Complexity of finding game theoretic equilibria 129
7. Practical computational issues 133
7.1. Software 133
7,2. Computational complexity 137
References 139

*This research was funded in part by National Science Foundation grants SBR-9308637 to the California
Institute of Technology and SBR-9308862 to the University of Minnesota. We are grateful to Robert Wilson
and an anonymous referee for detailed comments on earlier drafts.

Handbook of Computational Economics, Volume L Edited by H.M. Amman, D.A. Kendrick and J. Rust
(~) 1996 Elsevier Science B.E All rights reserved.
88 R,D. McKelvey and A. McLennan

1. Introduction

In this paper, we review the current state of the art of methods for numerical com-
putation of Nash equilibria - and refinements of Nash equilibria - for general finite
n-person games. For the most part, we simply survey existing literature. However,
we also provide proofs and technical details for certain results that may be well
known to practitioners, but for which there is not an accessible treatment in the liter-
Although our perspective will emphasize the concerns of economists, we exclude
from consideration algorithms that efficiently solve the specific games that arise in
various areas of application (e.g., auctions, bargaining, matching, repeated games,
stochastic games, games of perfect information). Such specialized procedures consti-
tute a subject area that is, at least potentially, almost as rich as the whole of economic
theory. We also do not attempt to describe the very extensive literature concerned
with procedures, such as programs for playing chess, that attempt to find relatively
effective choices of actions without completely solving the game. Finally, although
some of the algorithms we present have been implemented, so that we will be able
to discuss some applications, the bulk of the work to date has been theoretical, and it
is this aspect that will be emphasized.
Students of game theory, their teachers, and researchers who use these concepts,
are generally aware that solving for Nash equilibria can be a tedious, error-prone
affair, even when the game is very simple, and they also know that the need to solve
a game arises with fair frequency. We will therefore not argue for the general util-
ity of such software in any detail. It is, perhaps, less obvious that suitable software
could support styles of research that are currently infeasible. For theorists it could
be useful to test an hypothesis by systematically searching for a counterexample be-
fore launching an effort to prove it, and in some cases such a search could itself
constitute a proof. Experimentalists might wish to search parameter spaces in order
to obtain experimental designs that maximize the statistical distinction between com-
peting hypotheses. Mechanism designers might conduct such searches with different
goals in mind. Finally, econometric analysis of strategic decisions requires the ability
to solve a given extensive or normal form repeatedly with different parameter val-
ues, for the purpose of computation of likelihood functions and subsequent parameter
The appropriate method for computing Nash equilibria for a game depends on a
number of factors. The first and most important factor involves whether we want to
simply find one equilibrium (a sample equilibrium) or find all equilibria. The problem
of finding one equilibrium is a well studied problem, and there exist a number of
different methods for numerically computing a sample equilibrium. The problem of
finding all equilibria has been addressed more recently. While there exist methods for
computation of all equilibria, they are very computationally intensive. With current
methods, they are only feasible on small problems. We discuss those methods in this
Ch. 2: Computation of Equilibria in Finite Games 89

The second factor of importance concerns whether ~., the number of players, is
greater than two. The polynomials that arise in the definition of Nash equilibrium are of
degree n - 1 in the variables describing the agents' mixed strategies, so for games with
two players, a Nash equilibrium solves a system of linear equations over the variables
in the support of the equilibrium. Among other things, if the input data are rational
numbers, the set of all Nash equilibria is the union of a set of convex polyhedra, each
of which can be completely characterized by specifying a finite number of extreme
points (which also have rational coordinates). Because of this, for two person games
there exist methods for finding exact sample Nash equilibria, and for characterizing
exactly the entire set of Nash equilibria. For games with more than two players, even
if the input data are rational, the set of Nash equilibria with a given support need no
longer be a convex, or even connected set. Even if it is a singleton, it need not have
rational coordinates. The methods that work for two person games can not typically
be directly extended for ~-person games.
The third factor that determines the choice of method concerns the type of Nash
equilibria we wish to find. It is well known that not all Nash equilibria are equally
attractive. For example, Nash equilibria can be dominated, and if there are multi-
ple equilibria, they may be Pareto ranked. A large literature exists on equilibrium
refinements, which defines criteria for selecting among multiple equilibria (such as
perfect equilibria, proper equilibria, sequential equilibria, stable sets, etc.). The issue
of equilibrium refinements has not been extensively addressed in the literature on
computation o f Nash equilibria. We discuss here the limited results that are avail-
able. The methods for finding a sample equilibrium are only guaranteed to find a
Nash equilibrium. Thus there is no guarantee that the equilibrium found will sat-
isfy whatever refinement condition is deemed important. So any method intended
to find a sample Nash equilibrium needs to be modified to find a particular refine-
ment. Since the set of refinements is a subset of the set of all Nash equilibria, a
method that finds all Nash equilibria can serve as a basis for a method to find the
set of all refined Nash equilibria, as long as we can characterize the set of refined
Nash equilibria as a subset of the set of Nash equilibria in a computable way. The
Tarski-Seidenberg theorem implies that most of the equilibrium refinements that have
been proposed can be expressed as semi-algebraic sets. Thus, the same methods as
are used to find all Nash equilibria can in principle be used to find any of these
The remainder of the paper is organized as follows. Section 2 introduces notation
and states the problem. Section 3 reviews methods for computing sample equilibria in
normal form games. Section 4 deals with computation of equilibria on extensive form
games. Section 5 discusses the computation of equilibrium refinements. Section 6
discusses methods for finding all equilibria. Finally, Section 7 discusses practical
computational issues and experience.
90 R.D. McKelvey and A. McLennan

2. Notation and problem statement

Consider a finite n-person game in normal form: There is a set N = { 1, ... , n} of

players, and for each player i E N a strategy set Si = {sil, ... ,si,~}, consisting
of mi pure strategies. For each i E N, we are given a payoff function, ui : S ~+ IR,
where S = YIiEN Si.
Let 79i be the set of real valued functions on S.i. For elements pi E 79i we use the no-
tation Pij = pi(sij). Let 79 = ] - L N 79i and let m = Y~i~N ms. Then 79 is isomorphic
to IRm. We denote points in 79 by p = ( P l , . . . ,Pn), where p~ = (p~l,... ,pi,r~) E 79i.
If p E 79, and p~ E 79i, we use the notation (p~, P - i ) for the element q E 79 satisfying
qi = P{, and qj = pj for j ¢ i. We use similar notation for any vector.
The payoff function u is extended to have domain R '~ by the rule

ui(p) = Z p ( s ) u i ( s ) , (2.1)

where we define

p(s) = 1-IPi(Si). (2.2)


Let Ai = {Pi E 79i: Y~'~jPij = 1,pi >~ 0} be the set of probability measures on
Si. Let A HieN z~i ~ ~m. We use the abusive notation sij to denote the element

Pi E Ai with Pij = 1. Hence, the notation (sij,p-i) represents the strategy where i
adopts the pure strategy sij, and all other players adopt their components of p.

DEFINITION 1. We say p* E 79 is a Nash equilibrium if p* E A and for all i E N,

and all Pi C Ai, U i(Pi,P-i)
* <<.ui(p*).
We start by giving several alternative characterizations of a Nash equilibrium. We
first define three functions x, z, and 9 : 79 ~ Rm, derived from the normal form
game u. For any p E 79, and i E N, and sij E Si, define the i,jth component by

xij (p) = ui(sij, P-i), (2.3)

zij (p) - xiy (p) - ui(p), (2.4)
9ij (P) = max[zij (p), 0]. (2.5)

Nash equilibrium as a fixed point of a correspondence. Define the best response

correspondence A " A ~-~+ A by

A(P)=argmaxIZui(qi,P-i) ]• (2.6)
Ch. 2." Computation qf Equilibria in Finite Games 91

Then p* E A is a Nash equilibrium if and only if it is a fixed point of A, in other

words p* E A(p*).
It is well known (and easy to prove) that A has a closed graph, is nonempty, and
convex valued. It follows by the Kakutani fixed point theorem that there is a fixed

N a s h e q u i l i b r i u m as a fixed point o f a function. Define y • A ~+ zS, with i,jth


Pij + 9 i j ( P ) (2.7)
Y~J(;) - l + ~ j g ~ j ( v )

Then p* E A is a Nash equilibrium if and only if it is a fixed point of y, in other

words p* = y(p*).
Since y is a continuous function from a compact set A into itself, it follows from
the Brouwer fixed point theorem that y has a fixed point. This argument is used by
Nash (195 l) to prove existence of equilibrium for finite n-person games.

N a s h e q u i l i b r i u m as a solution to a non-linear c o m p l e m e n t a r i t y problem.
function z : 7~ ~ ]~m defined above satisfies pi. zi(p) = 0 for all p and each i (i.e.,
Pi and zi(p) are orthogonal). A point p* E A is a Nash equilibrium if and only if
z(p*) <~ O. The Nonlinear Complementarity Problem (NCP) on A then consists of
finding a point p C A with z(p) <~ O. Such a point p i g complementary to z(p). I.e.,
p~j. zij (p) = 0 for all i, j. Thus the set of Nash equilibria are the solution to a NCP
on A.

N a s h e q u i l i b r i u m as a stationary point problem. A point p* E A is a Nash

equilibrium if and only if it satisfies

( > - p ~ ) .x~(p*) 4 o (2.8)

for all i and p C gl. This is the stationary point problem for the function x " A ~+
R '~ on the polytope A C R "~.

N a s h e q u i l i b r i u m as a m i n i m u m of a function on a polytope. Define the real

valued function v • A F-+ ]R by

v(p) = ~ ~ [g~j(p)]2. (2.9)

This is a continuous, differentiable real valued function satisfying v(p) >1 0 for all p.
Further, p* is a Nash equilibrium if and only if it is a global minimum of v. In other
words v(p* ) = O.
92 R.D. McKelvey and A. McLennan

Nash equilibria as a semi-algebraic set. The set of Nash equilibria is the set of
points p E R ~ satisfying

p~ A and z(p) <~0. (2.10)

But z~ is defined by a set of linear inequalities, and z : R ~ ~-+ R ~ is a polynomial

in p. Hence the set of Nash equilibria is a semi-algebraic set.
From the above alternative formulations of a Nash equilibrium, it is clear that the
problem of finding a Nash equilibrium for a finite game can be expressed in terms of
a number of standard problems in the theory of optimization. Each of these problems
has been extensively studied, although not always with the application of computing
Nash equilibria as the goal. However, an array of methods can potentially be brought
to bear on the problem of computing a Nash equilibrium of a finite game.

3. Computing a sample equilibrium

Most of the literature on computation of Nash equilibria deals with the computation of
a sample Nash equilibrium. In this section we review this literature. We first discuss
methods which are globally convergent. As discussed in the introduction, different
methods must be used for two person and r~-person games. We conclude with a brief
discussion of methods that are not globally convergent.

3.1. Two-person games: The Lemke-Howson algorithm

Any review of methods of computation for game theory must start with the work
of Lemke and Howson (1964). Historically, the Lemke-Howson algorithm was the
first of the so called path following algorithms. Lemke and Howson's algorithm was
developed originally for two person games and was then extended to solve more
general linear complementarity problems [Lemke (1965), Eaves (1971)], of which a
two person game is a specific example. Shapley (1974) describes a nice geometrical
interpretation of the Lemke-Howson algorithm for the nondegenerate case, which
lends itself to easy visualization if the number of strategies for both players are small
enough. In this section, we give a precise description of the Lemke-Howson algorithm
as modified by Eaves (1971) to deal with degenerate problems.

3.1.1, Finding one Nash equilibrium

Consider a two person game with strategy sets S,i = {s~,,..,,si~r~} for player i,
i = 1, 2. Let Ui be the payoff matrix for player i, with rows indexed by the strategies
for player i and columns indexed by the strategies for player g ¢ i. In other words,
Ch. 2: Computation of Equilibria in Finite Games 93

the entry in row j , column k of Ui is u i ( s # , s i,k). We assume, without loss of

generality, that all entries in Ui are positive.
A Nash equilibrium is a pair of column vectors Pi C R '~' satisfying

Ui.p i+ri=v~.l (3.1)

subject to pi >~ O, r'i >~ O, pi • 1 = 1, and p~ .r~ = 0 for i = 1,2. Here, vi is a scalar
representing player i ' s payoff, and ri E IRm' is a column vector whose elements Tij
are "slack" variables which must be 0 if i uses strategy 8ij with positive probability.
The notation 1 indicates a column vector of l ' s of appropriate dimension.
If all entries in the Uii are positive, then we are assured that the vi are positive, and
we can reformulate the above problem. Define p~ = p i / v - i , and r~ = ri/vi. Then
Eq. (3.1) can be re-written as

U~. p t i + rli = 1 (3.2)

subject to p~ ~> 0, r~ ~> 0, p~. 1 = 1~v-i, and p~. r~ = 0 for i = 1,2. We can drop
the constraint that p~ • 1 = 1 / v - i at the expense of obtaining one additional solution
in which Pl = P~ = 0. This solution is called the extraneous solution.
Define U to be the m x rn matrix

0 U1 ] (3.3)
U= U2 0 '

and x, y, and q to be column vectors of length ,rn:

x = ' Y = Lp~ ' q =

[:3 . (3.4)

Then Eq. (3.2) can be rewritten as

Uy + x = q (3.5)

subject to x ~> 0, y >~ 0, and x . y -- 0. This is exactly in the form of a linear

complementarity problem. We now discuss methods of solving such problems.
A pair of vectors z , y E R m is said to be a solution of (3.5) if U y + x -- q, A
solution is feasible if x ~> 0 and y ~> 0. As the intersection of finitely many half
spaces of the form { ~ E R 2m: u - ~ ~< c }, the set of feasible solutions is a closed
convex polyhedron, which may be either empty or nonempty, and if nonempty, either
bounded or unbounded. In the case of interest to us U has non-negative entries, and
each of its columns contains a positive entry, while q has positive components. These
94 R.D. McKelvey and A. McLennan

conditions guarantee that the set of feasible solutions is both nonempty (since the
extraneous solution is feasible) and bounded.
A solution is said to be complementary if x • y = 0. Note that a feasible solution
is complementary if and only if, for each 1 ~ i ~< m, either xi = 0 or y~ = 0.
Clearly the extraneous solution (x = q, y = 0) is a complementary feasible solution.
Any Nash equilibrium corresponds to a complementary feasible solution. Further,
any complementary feasible solution corresponds either to a Nash equilibrium or the
extraneous solution.
Described geometrically, the L e m k e - H o w s o n algorithm starts at a given comple-
mentary feasible solution, then proceeds, from vertex to vertex, along a certain path
of one dimensional faces of the polyhedron of feasible solutions, until it reaches a
different complementary feasible solution. For "non-degenerate" problems (which are
generic in the space of parameters U and q) the specific construction of the path is
not difficult to describe, as we will see below. But for exceptional problems (which
can easily arise in games derived from extensive forms) there are ambiguities that
must be resolved.
We write the system (3.5) in the form

Let A : [U I m ] , and let z : [xY]. Consider a collection of indices /3 :

{bl,. •., b.~} with 1 <~ bi ~< 2m, and let B ~ be the m x m matrix whose ith column
is the /3ith column of A. We say that/3 is a basis if B z is nonsingular, and we say
that a solution z is a basic solution if there is a basis/3 such that z{ : 0 for all i ~/3.
Note that for each basis/3 there is exactly one solution z: the components of z with
indices in/3 are the components of ( B Z ) - l q . A basis is feasible if its corresponding
solution is feasible. Basic feasible solutions are vertices of the polyhedron of feasible
solutions, and conversely.
A basis/3 is complementary if, for each 1 ~< j ~< m, exactly one of j and m + j is
in/3. A basis is i-almost complementary if, for all 1 ~< j ~< m different from i, either
j ~ /3 or m + j ¢ /3. Note that a complementary basis is /-almost complementary
for every i. For a n / - a l m o s t complementary basis/3 that is not complementary there
is some 1 ~< j ~< m with j ~ /3 and m + j ~ /3; we call j the omitted index of/3.
The idea will be to start at a complementary feasible solution, then, for some i, move
along a sequence of/-almost complementary feasible solutions until we reach another
complementary solution.
For a basis/3 let qZ = (BZ)-lq and let A z : (B~)-lA. Then define

T z= [-q~ Az] =(BZ) - l [ - q A]. (3.7)

Ch. 2: Computation of Equilibria in Finite Games 95

Let " / = { 9 1 , . . . , g,~} be another basis. We are particularly interested in the case in
which /3 and 3' differ by one element, so assume that 7 = fl - {r} tO {s}, where
r = bh. Note that

T ~/= [-q~ A "y] =P'Y~[-q~ An], (3.8)

where p'r~ = ( B ' r ) - l B ~ is referred to as the pivot matrix.
At the numerical level, a key observation is that (p-~/~)-I = ((B.y)-IB/~)-I =
( B P ) - I B ~r is a collection of columns from A ~ = ( B ~ ) - I A . We think of the entries
in column u of A n = ( B S ) - I A as specifying the linear combination of the columns
of B ~ that coincides with column u of A, and with one exception, the columns of
B "r are columns of B z. So, writing aj~ for the element in row j, column k of A n,
we have

1 a nh - l , s
( p - r S ) - i = (B 5) - 1 B 3 , ~--- an (3.9)
an 1

Jms 1
(Although we have represented the matrix as a diagonal matrix with a column replaced,
in general it will not be of this form, since the indices of the rows may not be in
the same order as the corresponding columns. The essential point is that the column
whose h-component is 1 is replaced with the s-column of An.) Evidently we must have
aZh~ ¢ 0, else B "r is singular, and by simply multiplying (pfl-y)-I by the following
matrix to obtain the identity matrix, one can verify that

1 h--l,s

p'Yfl = 1
a~+l ,s

96 R.D. McKelvey and A. McLennan

Substituting into (3.8) yields, for any 1 ~< k ~< 2 m

jk = {a_~,
a~ a~ p
j k - - ~ - h a~k
ifj =h,


We use the notational convention that aj~o = - q ] , and similarly for/3. Then analogous
formuli for the q~. are obtained from the above.
The L e m k e - H o w s o n algorithm proceeds by "pivoting" along a sequence of/-almost
complementary feasible bases, for some given i, at each step maintaining the
"tableaux" T ~ = [ -q/~ A ~ ] in memory, along with the list of current basis elements
and the correspondence between the basis elements and the rows of the tableaux. The
specific numerical computations in moving from one basis to the next are given above,
and the further description of the algorithm is a matter of specifying the choice of basis
to move to next. In general we will want the new basis to be feasible, i-almost com-
plementary, and different from the basis we were at just before arriving at the current
basis. If we begin at a complementary feasible solution, then the basis reached on the
first pivot will be obtained by adding i or m + i to the basis. If we are at an i-almost
complementary feasible basis/3 that is not complementary, then the new element of
the basis will be either j or m + j, where j is the omitted index of/3. Specifically, the
new element will be j (m + j) if m + j (j) was the element that was dropped in the
last pivot. If the element that is dropped in this pivot is either i or m + i, then the new
basis is complementary, and otherwise the new basis is also i-almost complementary.
Geometrically, when we add an element to the basis we are looking at the edge of
the set of feasible solutions determined by the equation A z = q and the conditions
zj = 0 for all j, other than the one being added, that are not in the basis /3. The
basic feasible solution corresponding to/3 is one endpoint of this edge, and the basic
feasible solution corresponding to the new basis is the other endpoint. The pivoting
procedure described above is the numerical embodiment of this procedure.
For problems whose parameters are generic in the relevant sense, there will always
be a unique basis determining the other endpoint of the edge under consideration. It is
this case that was worked out by Lemke and Howson. In the general case it can happen
that several components of z vanish simultaneously when one reaches the new end-
point, so that any one of these variables could be the one dropped. Lemke and Howson
considered perturbation techniques for dealing with such degeneracies. The procedure
described below is a computational implementation of these ideas due to Eaves (1971).
A matrix of real numbers is lex-negative (positive) if the first nonzero entry of each
row is negative (positive). A basis is lex-feasible if T ~ is lex-negative. Note that as
long as q > 0, the extraneous solution is lex-feasible. Suppose that we are given T ;~ =
[ -q¢~ A ~ ], that/3 is a lex-feasible basis, and we have decided to add s to the basis,
so that we must choose an index r = bh to drop. The choice will be dictated by the
requirement that the new basis "~ = / 3 {r} t2 {s} be lex-feasible, as we now explain.
Ch. 2." Computation (~fEquilibria in Finite Games 97

We have already seen that we must choose r = bh with a~h~ ¢ 0. If r = bh were

a feasible choice with ahs < 0, it would have to be the case that qh~ = 0, in which
case the nonzero components of q~ are the same as the nonzero components of qZ.
In effect the passage from the basis/3 to the basis 3' has not changed the underlying
solution. We wish not to allow this possibility, and we therefore require that h should
be an element of S = { j: aj~ > 0 }. This set must be nonempty since otherwise the
set of feasible solutions would be unbounded. (Any solution would remain a solution
if one increased the component of z corresponding to s by t while increasing the
component corresponding to each h by --ah~.) "z Let So = argmin{q~j/a]~: j E S }.
If the components of q'Y are to be non-negative, h must be chosen from So, clearly.
Conversely, if So is a singleton, there is nothing more to it: since qZh~>0 and ahP~ > 0,
if a ~ ~< 0 then q~ ~> 0 automatically, and otherwise it is non-negative by virtue of
the choice of h. For generic choices of U and q it will be the case that So is always a
singleton; such problems are said to be nondegenerate. [Eaves (1971) points out that
a weaker notion of nondegeneracy suffices.]
When So has more than one element, we must refine our method of choosing which
element of the basis to drop. As above, we write a~0 = - 4 , and similarly for ~/.
Also write S = S - I . For k = 0 , . . . , 2 m define

:argmax{aJ } a-~-.s, j E Sk_l .


For sufficiently large g, Se is a singleton (otherwise two rows would be related by

scalar multiplication so that the rank of A ~ would be less than rn), and we let h be
its unique member.

THEOREM 2. If~3 is a lex-feasible basis and 7 = / 3 - {r} U {s} is" the basis obtained
when s is added to/3, as' p e r the procedure described above, then 7 is lex-feasible,
a n d / 3 is the lex:feasible basis obtained when r is added to 7, as p e r the procedure
described above.

PROOF. We first show that ? is lex-feasible. We must show that each row of T ~ is lex-
negative. Consider row j. If j = h, then the row is simply a positive scalar multiple of
the corre~onding row of T ~, which is lex-negative since/3 is lex-feasible. If j ~ S,
so that a ~ ~< 0, then row j of T "~ is equal to the corresponding row of T ;~ plus a
non-negative scalar multiple of row h of T/~. Since each of these is lex-negative, the
resulting sum must be lex-negative. The remaining rows are those with j c Se- 1 - Se,
for some g ~> 0.
It follows that for k < g, j E Sk, which implies

aJk =- a ~ ~. -at~k
- -- ~ -- ~ ajk = 0 .
98 R.D. McKeIvey and A. McLennan

For k = g, the second equality becomes an inequality, yielding ajVe < 0. Hence row
j is lex-negative. We have established that T "r is lex-negative, hence 7 is lex-feasible.
In the remainder of the proof, we show that/3 is obtained when r is added to 7. It
suffices to show when starting at the basis 7 and adding r, that h E Sk for all k. We
will use the fact that, since r c/3, a]~ is either 1 or 0 according to whether j = h.
Since a~s > 0 was a requirement of the construction of 7,

a~r _ 1
aL - > o,

it follows that h E S = S-1. Now, for g >~ O, assume by way of induction that
h E S t - l . Then for any k < [,

For any other j E S t - l ,

a~ ~ a~
aj~. = a~r - 7T-%," - an '
ahs hs

it follows that

a~k -- 1 p --
ajk fl anh k l = a~ k + "off"-ajk"
ay~ aj~. ahs ] --jr

Hence, since ajar > 0

a~k ~> a~ k
a~h~ ~ z~ aj~ ~ 0,

with equality on the left if and only if" there is equality on the right. Thus, if j C S t - l,
we must have aj~ = 0 for all k < m. By lex-feasibility of/3, it must be that air ~ 0,
which implies that

a~t ~> aj~

a~r ~air. -

Hence, h c St. The result now follows by induction on g. []

Ch. 2: Computation ~["Equilibria in Finite Games 99

Let /3* be the set of all complementary lex-feasible bases, and /3, be the set of
all /-almost complementary lex-feasible bases. For any /3, "~ E /3,, we say that ,6
is adjacent to 3' if 7 is obtained from/3 by adding either j or m + j, as described
above, where j is the omitted index for/3. If/3 C /3", we also say that/3 is adjacent
to 7 E /3* if 7 is adjacent to/3.
The important consequence of the theorem is that if/3 is adjacent to % then "7 is
adjacent to/3. A n y / 3 E/3* is adjacent to exactly one element in/3* namely the one
obtained by adding either i or m + i as the case may be, and any 3' E /3* - / 3 * is
adjacent to precisely two elements in/3*, since any adjacent basis must be obtained
by adding either j or rr~ + j, where j is the omitted index.
Now for any 1 ~< i ~< m, let ~-i be the transitive closure of the adjacency relation
o n / 3 , . Since/3~ is finite, ---i partitions/3* into a finite number of equivalence classes,
each of which has a finite number of members. Every member/3 of an equivalence
class is adjacent to either one or two other members. An element that is adjacent
to exactly one other member is called an endpoint. It follows that each equivalence
class must be of the form of a loop or a path. In a loop, there are no endpoints and
every member of the equivalence class is adjacent to exactly two other members. In
a path, there are exactly two endpoints, which are connected by a path through the
remaining elements. A member/3 is an endpoint if and only if it is a complementary
lex-feasible basis. It follows that/3* contains an even number of elements. We know
that/3* ¢ 0, since the extraneous solution is in/3*. This establishes the existence of
at least one complementary feasible basic solution other than the extraneous solution.
It follows from the above argument that there exists at least one Nash equilibrium.
In the non-degenerate case, lex-feasibility is equivalent to feasibility, and the above
argument establishes that the number of Nash equilibria is odd.
This leads to the Lemke-Howson algorithm, which amounts to just following the
adjacency chain, starting at a complementary lex-feasible basic solution:
(1) Pick ¢4o ~ /3*, and 1 ~< i <~ m. Find the unique new basis /31 resulting from
adding whichever of i or 'm + i is not in /3o, as per the procedure above. Set
(2) If/3k E/3, halt. Otherwise proceed to 3.
(3) Given/3k from which j (resp. m + j) has just been dropped, add m + j (resp. j)
and find the unique new basis/3k+1 allowed by the procedure above. Set k = k + 1
and return to 2.

From the above remarks, we see that the algorithm cannot cycle. Since there are
finitely many bases, the algorithm must eventually halt. Shapley (1974) defines an
index for an equilibrium of a bimatrix game and points out that in the non-degenerate
case, the L e m k e - H o w s o n algorithm connects equilibria with opposite indices. Since
the extraneous solution has index of - 1 , this means that any equilibria reached by
starting from the extraneous solution must have index of +1.
100 R.D. McKelvey and A. McLennan

We have described the Lemke-Howson algorithm for a somewhat larger class of

problems than that generated by two person games. In the case of a two person game,
U has a partitioned form, as in Eq. (3.3). In this case, by rearranging the columns,
we can write

A=FAI 0 l
0 A2 ' A~- [Ii U_~].

Then, we can write B ~ in the form

0 B2~
It follows that

where P]/~ : ( B ~ ) - ' B ~ , q~ = ( B ~ ) - I 1 , and A~ = ( B ~ ) - I A i . Thus, the pivot

operation leaves the zero blocks of A untouched. For the same reason, the above
shuffling of the columns does not affect lex-feasibility. Further, when/3 and "7 differ
by only one element, one of P ] ~ , i -- 1,2, must be an identity matrix.
From a numerical point of view the above observations mean that for two person
games, we need not do computations on (or even store) the whole tableau. Rather, the
tableau can be decomposed into two smaller tableaus T 9 = [ - q ~ A ~ ] , which can
be computed on independently, since any pivot operation only affects one of them.
(In fact the Lemke-Howson algorithm requires that pivot operations alternate between
the two tableaus.) Further computational simplification results since one only has to
maintain data for the current non-basic columns of each tableau.

3.1.2. Finding multiple Nash equilibria

In order to initiate the Lemke-Howson algorithm, one needs to choose an index

1 ~< i ~< m, and a complementary lex-feasible basic solution (clbs)/30 c 13". Typically,
one will start with /3o being the extraneous solution, but this is not necessary. The
algorithm can be initiated from any/3 c / 3 * , as long as one can find it. This suggests
a means of using the Lemke-Howson algorithm to find more than one solution. For
any set/3' C_/3* define

A e ( B ' ) = {/4 c/3*: /3 ~-e /31 for some /3t E B'},

Ch. 2: Computation of Equilibria in Finite Games 101


1) =

Thus A(B/) is the .set of clbs's that are immediately accessible from B ~ via the
L e m k e - H o w s o n algorithm. For any integer t, define At(B I) = A ( A t - I ( B ' ) ) , where
we define A ° ( B t) = B ~. Since B* is finite, clearly, there must exist a T for which
At(B ') = A~(B ') for all r,t >~T. Define A*(B') = AT(B/). This is the set of cIbs's
that are accessible via the Lemke-Howson algorithm from/3 ~.
If we want to find all Nash equilibria to a normal form game, one might try
to set /3 / = {g0}, where fl0 is the extraneous solution, and then use successive
applications of the Lemke-Howson algorithm to find A*(Bt). One would then hope
that A*(B/) = B*. Unfortunately, Wilson constructed an example of a two person
game [reported in Shapley (1974)] for which A*(B ~) ¢ 13". The example is generic,
in the sense that small perturbations of the payoffs do not change the property that
A* (13/) ¢ B*. Thus, we cannot in general hope to find all equilibria by computing the
accessible equilibria. Shapley (1981) has shown that it is possible to perturb a game
so that the set of Nash equilibria remains unchanged, but the accessibility relation
changes. However, he does not provide any general method of perturbing a game so
that all Nash equilibria become accessible.

3.1.3. Zero-sum games

For two person zero sum games, the problem of finding Nash equilibria simplifies
considerably, as we are able to express the problem as a linear program.
The value of a game is defined as the maximum that a player can guarantee
him/herself irrespective of the strategy of the opponent. For a two person game,
the value of the game, v~, to player i is expressed as the solution to the following
linear program:

minimize vi
subject to Ui "p-~ ~< v,i • 1 (3.13)
p~ />0

It follows from the minimax theorem that for two person zero sum gaines, vl = - v 2 ,
and that a strategy pair is minimax if and only if it is a Nash equilibrium. Hence,
the set of Nash equilibria is exactly the set of solutions to the above linear program.
Since the set of solutions to a linear program is a convex polyhedron, it follows that
the set of all Nash equilibria is a convex set, which can be completely characterized
by a finite number of extreme points. These are just the set of optimal basic feasible
solutions of the above linear program.
102 R,D. McKelvey and A. McLennan

3.1.4. Summary

In summary, the Lemke-Howson algorithm provides a way to find at least one Nash
equilibrium for any two person game. As long as the data of the problem are rational,
the algorithm can provide exact solutions, since all computations are done in the ra-
tional field. The Lemke-Howson algorithm is guaranteed to eventually find a solution
in any problem. But, because there is no objective function in the L e m k e - H o w s o n
algorithm, there is no way of determining how close one is to a solution during the
Regarding the computational complexity computational complexity of the L e m k e -
Howson algorithm, Murty (1978) showed an exponential lower bound for Lemke's
algorithm when applied to linear complementarity problems. However, there are no
results giving the computational complexity of the Lemke-Howson algorithm for finite
two person games.
The L e m k e - H o w s o n algorithm can also sometimes be used to find multiple Nash
equilibria. It can compute a set of accessible Nash equilibria. But there is no guarantee
that the accessible equilibria include all Nash equilibria.
In two person, zero sum games, the problem reduces to a linear program. In this
case, the set of all Nash equilibria is a convex set, which can be characterized as the
set of optimal basic feasible solutions to the same linear program.

3.2. N-person games: Simplicial subdivision

For n-person games, with n greater than two, the problem of finding a Nash equilib-
rium is no longer a linear complementarity problem, so we can not apply the L e m k e -
Howson algorithm any longer. Rosenmtiller (1971) and Wilson (1971) independently
extended the Lemke-Howson algorithm to find Nash equilibria for n-person games,
but this extension requires following a solution set of a set of non linear equations,
and has not (to the authors' knowledge) been implemented in computer code. More
tractable approaches use path following approaches derived from Scarf's algorithm
for finding fixed points for a continuous function on a compact set.

3.2.1. Fixed triangulation

The simplicial subdivision algorithms derive from the work of Scarf (1967, 1973).
Most of the original work in this area deals with finding fixed points of a function
f : A '~ ~ A N defined on an (n - 1)-dimensional simplex, A '~ = {p = ( P l , . . . ,P~):
~i~-l Pi = 1, p~ >i 0}. For application to game theoretic problems, we need a version
which operates on the product of unit simplices, z5 = I-L Ai, called the simplotope.
We describe a version of a simplicial subdivision algorithm that works on this product
Ch. 2." Computation of Equilibria in Finite Games 103

It is convenient to represent player i's jth pure strategy as sij = e i j , where e i j is

the i,jth basis vector in R "~. Thus cij is the vector with a 1 in element ~ - I rnz + j ,
and 0 everywhere else, and player i's set of pure strategies is Si = { e i l , . • •, ei,~ }.
We can write A~ = a(Si) = a ( e i l , . . . , e~rn~), and A = [ I i Ai = I]~ a(S.z).
The algorithm treats each agent's set of pure strategies as an ordered set. There
are no constraints on these orderings, and we will use the ordering given by the
indexation. Thus a set T C_ UiS i is said to be ordered if, for all i, there is a hi ~> 0
such that Ti = {sij: j <. ki}. For such an ordered set define Ti = Ti W {8i,ki+l }.
We now give a formal definition of simplicial subdivision. For any t + 1 affinely in-
dependent points { v o , . . . , vt} in R ~, the convex hull (7 = (7(vo,..., vt) of { v o , . . . , vt}
is called a t dimensional simplex with vertices vo,... ,vt. A simplex ~- is a face of
(7 if the set of vertices of ~- is a subset of the vertices in (7. A proper face is a face
that is not all of (7; the interior of (7 is the set of points in a that are not elements of
some proper face. If ~- is a face of (7 with one less vertex than (7, then ~- is called a
facet of (7. A finite collection G of simplices is a triangulation of a compact, convex
set C if the interiors of the faces of all simplices partition C. An equivalent condition
is that the union of the simplices in G is C, and if two simplices have a nonempty
intersection, their intersection is a face of each.
Suppose that we are given a triangulation G of the simplotope A. Then G induces
a triangulation of each face of A, as we now explain. For any T C UiSi, define
Ti = T N S~. Any face of A is of the form 1]i (7(Ti) where each T~ C S~ is nonempty.
If the interior of a face of a simplex in G intersects [ L cr(Ti), then that face is entirely
contained in 1-Ii (7(Ti). Every point in 1]i (7(Ti) is in the interior of some simplex in
G, and of course the interiors of such simplices are pairwise disjoint, since this is
true for G.
A labeling on A is a function 1 • A ~-~ UiSi. Assume that T is an ordered set,
and let (7 E G be a simplex in I-[i (7(Ti), of maximal dimension, so that (7 has
1 + ~ i #(Tz) vertices. Then (7 is said to be almost completely labeled if the labels
associated with the vertices of (7 include each Ti. An almost completely labelled
simplex cr is completely labeled if the labels associated with the vertices of (7 include
T.z for some i E N , in which case (7 is called an i-stopping simplex. The goal of
the algorithm will be to find such a simplex for any labelling that is Sperner-proper:
for all T _C UiSi with Ti ¢ 0 for all i, l ( [ L (7(Ti)) C_ T. Concretely, a labeling is
Sperner proper if, for any v C A, vii = 0 ~ l(v) ¢ eij.
Let Z T denote the set of almost completely labeled simplices of the ordered set
T C_ S. Let S = U T Z r , where the union is over all ordered sets T. We now define
adjacency, a binary relation on Z.
Unless (7 E ~T is O-dimensional, which happens precisely when Ti = 0 for all i,
any (7 C Z T has at least one facet ~- with an almost complete set of labels - i.e.,
l(~-) = T. Possibly ~- is also a facet of one other ( ~ i #(Ti))-dimensional simplex in
I-[i a(Ti), say (7'. Since ~- C (7', (7' is completely labelled. In this case (7 and (7' are
adjacent. If 7 is not a face of two simplices of maximal dimension in I ] i (7(Ti), then T
104 R.D. McKelvey and A. McLennan
_ _ m

must be contained in the boundary of this set, so that ~- C ~ ( T 3 - {s.iz }) × I - L c j ~7(T~)

for some j and sjz E S j . Since l is Sperner proper, sjz cannot be a label of a vertex
of T, i.e., sjl ¢ T, so we must have sjl = s j k j . Two ordered sets T and T / are said
to be adjacent if they differ by at most one element, say sjkj c T - T I. When T and
T z are adjacent, a simplex cr E Z T and a simplex ~- E ZT, are adjacent if l(T) = T
(so ~- is completely labeled) and ~- is a facet of a. The definition of adjacency is
now complete, in the sense that there are no cases in which two simplices in Z are
adjacent aside from the two just described.
Summarizing, given cr C Z T and a facet ~- with an almost complete set of labels, c~
is adjacent either to ~- itself or to another simplex in Z)T which also has 7- as a facet.
Let v be the vertex in (7 - 7-. If l(v) c T, then cr has one other facet with an almost
complete set of labels, so that cr must be adjacent to precisely two other simplices.
If l(v) • T, then (7 is completely labeled, say with l(v) = 8j,kj+l, then cr is adjacent
to the unique almost completely labeled simplex in ~r(Tj U {sjkj+z}) × I-[i#j cr(Ti),
unless k s + 1 = my, so that the labels f o r cr include all the pure strategies o f j .
By the definition of adjacency, every almost completely labeled simplex a c E~,
for some ordered set T C S is adjacent to at most two other simplices. A simplex
that is adjacent to at most one other simplex is called a terminal simplex. It follows
that if cr C Z and J is the set of simplices in Z that can be reached by starting at
o- and proceeding along a sequence of simplices in which each simplex is adjacent
to its predecessor, then J must be: (a) a loop - there is no terminal simplex; (b)
a string - there are two terminal simplices; or (c) a point - J contains a single
(terminal) simplex adjacent to no other simplex. A terminal simplex must either be
the 0-dimensional simplex ~r0 that is the unique element of I~i o-(Ti) when Ti = (3,
a n / - s t o p p i n g simplex for some i E N , or both. The last possibility (namely a point)
is ruled out if we require that m i ~> 2 for all i. With this trivial case eliminated, it
follows that there are an odd number o f / - s t o p p i n g simplices.
Given algorithmic p r o c e d u r e s / o r elaborating the triangulation and computing la-
bels, the above results define an algorithm for finding a stopping simplex: start at
cr0 and follow the adjacency relation to the other endpoint of the string. In order for
a stopping simplex to be of interest, it must approximate a Nash equilibrium to the
game. We now present a labeling function which achieves this.
For any n-person game, and any p E A, we define, as in Section 2, Eq. (2.5)

g J(P) --max[ui(sij,p_i) ui(p),0] and Yij(P) - 1 ÷ gi.4(;)

As noted previously, y : A ~-4 Z~ is a continuous function whose fixed points arc

precisely the Nash equilibria. For p ~ z~, define l(p) = s~j, where ( i , j ) is the
lexicographic least index in argminlc N,1 ~<k~<,~gzk (P) -- PZk.
N o w let {G~}~=I be a sequence of triangulations whose meshes converge to 0.
For each r let ¢7r be an i-stopping simplex for some i. Since we may pass to a
Ch. 2: Computation of Equilibria in Finite Games 105

subsequence, we may assume that it is the same i for all r, and we may also assume
that the sequence err converges, in the obvious sense, say to p*. Now for each sij E Si
and each r there is a vertex v of er'° with l(v) = sij. Among other things this implies
that y~j(v) <, v~j, and passing to the limit yields Yij(P*) ~ P~. This is true for all
1 ~ j ~< mi, so we must have Yij(P*) = Pi*j for all 1 <~ j <~ rni. In view of the
definition of l, it follows that 0 = minzCN.l~<k~<~, Yzk(P) -- Pzk, SO for all k c N and
1 <~ h <~ ink, Ykh(P*) >~ P*kh" But then Ykh(P*) = P*~h for each k and all relevant h.
That is, p* is a fixed point of y, hence a Nash equilibrium.
Thus we have described an algorithm which, for given e, halts at a point p C A with
IlY(P) - P l l < e, namely execute the procedure for finding an/-stopping simplex on a
sequence of successively finer triangulations of A until some vertex of such a simplex
satisfies the halting criterion. It should be stressed that we have not given a procedure
for finding a point that is within e of some fixed point of y. This is inherent in any
algorithm that assumes only that g is continuous, and which uses only the values of
y at particular points to regulate its behavior. To see this imagine such an algorithm
halting after computing the value of y at some sequence of points Pl, • • • ,Pz. Then
the algorithm would behave in the same way if y was replaced by h -~ o f o h where
h : A -+ A is a homeomorphism that leaves P l , . . . ,Pz and y ( P l ) , . . . ,Y(Pz) fixed,
but by choosing h appropriately we are free to locate the fixed points of h -1 o f o h
almost arbitrarily.
In fact for the problem of finding Nash equilibrium, additional information is avail-
able, in that the given functions are algebraic. One could imagine combining the
procedure above with an additional step using this information to test whether the
candidate approximate equilibrium was, in fact, near an actual equilibrium. To our
knowledge this issue has not been investigated.

3.2.2. Refining the triangulation

One of the problems with the Scarf type simplicial subdivision algorithms described
above is that they depend on a given triangulation. If one finds an approximation to
a solution using a triangulation G with a given mesh, and one then wants to get a
better approximation, one must start all over with a new triangulation with a smaller
mesh. The computational effort that went into finding an original approximation does
not help in finding a finer approximation.
This problem has been dealt with in two ways. The tirst is the homotopy method,
originally developed by Eaves (1972). Here the idea is to add a dimension to the
problem representing the accuracy of the approximation, say t E [0, 1]. Then one
triangulates the product space A x [0, 1] in such a way that (roughly) as t approaches
1, the mesh of the triangulation approaches 0. Doup and Talman (1987b) provide a
triangulation which is suitable for application of this method to the simplotope, and
which allows for arbitrary rate of grid refinement.
The second approach to getting better approximations is the restart method. Here
the idea is to develop path following algorithms that allow for start at an arbitrary
106 R.D. McKelvey and A. McLennan

point in the solution space. Merrill (1972), Kuhn and MacKinnon (1975) and Van der
Laan and Talman (1979) developed methods that allow restart at an arbitrary point
in the space, with a triangulation of smaller mesh. In a series of articles, Van der
Laan and Talman (1979, 1980, 1982), Van der Laan, Talman and Van der Heyden
(1987) and Doup and Talman (1987a) have developed versions of these algorithms
which can be applied to the simplotope, and also investigated in considerable detail
the advantages of different triangulations.

3.2.3. Computational complexity

Hirsch, Papadimitriou and Vavasis (1989) showed that Scarf's algorithm, (actually
any algorithm for computing a Brouwer fixed point based on evaluation of function
values), has a worst case complexity which is exponential in the dimension and the
number of digits of accuracy. Todd (1982) showed an exponential lower bound for
piecewise-linear path-following algorithms applied to triangulations of affine linear
On the other hand Saigal (1977) shows that if the underlying function is contin-
uously differentiable and satisfies a Lipschitz condition, that simplicial subdivision
methods can achieve quadratic rates of convergence (in the limit) if the change in
mesh size is chosen appropriately.
One problem with restart methods for use on game theory problems that apparently
does not arise in fixed point problems on the simplex is related to the definition of a
stopping simplex when one works in the simplotope rather than in a single simplex.
The results discussed above guarantee us that as we refine the triangulations, sending
the mesh size to zero, there will be a sequence of stopping simplices converging to
a Nash equilibrium. However, we are not guaranteed that the sequence of stopping
simplices will be stopping simplices for the same player. After a restart, the search for
the right player with a stopping simplex can sometimes cause the algorithm to go out to
the boundary of the simplotope before returning to the approximate starting location.
It is possible that these problems may be avoided in the homotopy methods applied
to the simplotope, or in algorithms such as that of Garcia, Lemke and Luethi (1973)
who, instead of using the simplotope, embed the problem in a simplex of dimension
m - 1. Unfortunately, there is not much in the way of comparative computational
experience of these algorithms as applied to game theory problems reported in the

3.3. Non-globally convergent methods

In this section, several methods for finding Nash equilibria are presented which are
not globally convergent. Despite the fact that they do not satisfy global convergence,
these methods are important because they offer other features, such as speed or the
ability to find a particular equilibrium for the purposes of doing comparative statics,
for example.
Ch. 2: Computation of Equilibria in Finite Games 107

3.3.1. Nash equilibrium as non-linear complementarity problem

We saw in Section 2 that an n-person game can be represented as a non-linear com-

plementarity problem. Mathiesen (1987) and others have been very successful at
finding economic equilibria by formulating the problem as a non-linear complemen-
tarity problem, and then solving that problem by a sequence of approximations by
linear complementary problems (SLCP), Harker and Pang (1990) survey recent de-
velopments in this area as well as the more encompassing set of variational inequality
The general idea of the SLCP approach is that one picks a starting point p0 c A
in the simplex, and then approximates the non-linear problem at that point with the
linear complementarity problem obtained by taking the first order Taylor expansion
around the point. One then uses Lemke's algorithm for the linear complementarity
problem to find an exact solution to that problem. This gives a new point, pl E A
from which one can repeat the process. One then hopes the sequence {pC) converges
to a solution of the original non-linear complementarity problem. The method can be
thought of as a generalization of Newton's algorithm for finding local optima of a
C 2 function, and is also related to the "global Newton method" of Smale (1976). 1
Like Newton's method, the SLCP method is not globally convergent, and requires
judicious choice of a starting point.
Since an n-person game can be formulated as a non-linear complementarity problem
also, the SLCP method should be applicable to solving large normal form games.
Van den Elzen and Talman (1992) have used similar ideas in the computation of
game theoretic equilibria. They formulate the Nash equilibrium as a stationary point
problem, and then approximate the stationary point problem by a sequence of linear
stationary point problems.
These methods do not satisfy global convergence. However, they do have the ad-
vantage of speed. Hence even without global convergence, the ability to repeatedly
try a number of starting points if convergence fails can make these methods attractive,
especially in large problems, if the only goal is to find a sample Nash equilibrium.

3.3.2. Nash equilibrium as a minimum o f a function

As discussed in Section 2, the problem of finding a Nash equilibrium can be reformu-

lated as a problem of finding the minimum of a real valued function on a polytope,
where the global minima of the function correspond to the Nash equilibria of the un-
derlying problem. Under this approach, every isolated Nash equilibrium has a basin
of attraction. So if one starts close enough to an isolated Nash equilibrium, then one
can guarantee to find it with any level of accuracy desired.

1Contrary to the suggestionof the name, the global Newton is not globally convergent. See, e.g, Doup
108 R.D. McKelvey and A. McLennan

Recall, we defined the function v : 79 ~ R in Eq. (2.9) by

iEN l<~j~m~

First, note thatv(p) is a non-negative function that is zero if and only i f p is a Nash
equilibrium for the game. Further, as is shown in McKelvey (1996), v is everywhere
We want to minimize v(p)subject to the constraints that ~j pij = 1 and pij >~
0 for all i,j. One can use methods for constrained optimization to find solutions.
Alternatively, we can impose the constraints as penalty functions, yielding a revised
version of the objective function:

w(p)=v(p)+ ~-~{min[pij,O]}2+~-~(1- ~-~pij) 2

ij ieN 3

This is also a differentiable function, defined over all of 79, and p* is a Nash
equilibrium if and only if it w(p*) = 0. Any of a number of standard algorithms for
finding the unconstrained minimum to a function on R '~ can be applied to find the
minima of w.
Note that it was not established in the above characterization that all minima to the
function v or w are global minima. Hence, there may be local minima to the objective
function that are not global minima, and hence not Nash. So it is important to check
the value of the objective function after convergence, to verify that the point found
is indeed a Nash equilibrium.
The speed of the algorithm (in the authors' experience) is generally slower than
other methods, and as is evident from the discussion in the preceding paragraph, the
algorithm may sometimes require judicious choice of a starting point. Nevertheless,
there are some situations in which the algorithm is preferred to existing methods. The
path following algorithms discussed above are not capable of finding all of the Nash
equilibria. Even if one uses a restart algorithm to start them close to a given Nash
equilibrium, there is no guarantee they will find the nearby equilibrium. However,
every isolated Nash equilibrium will have an open region around it where the value
of the objective function v is strictly greater than 0. Hence, every Nash equilibrium
will have a radius of convergence such that if one starts within this radius of the Nash
equilibrium, and uses any descent method which guarantees that the objective function
decreases at at each step, then the algorithm proposed in this section will converge
to the nearby Nash equilibrium. Therefore, the method proposed here is useful for
investigating thd comparative statics of a particular equilibrium, as a function of the
payoffs of the game.
The function v can also be used to define a differential equation system whose
zeros are Nash equilibria [see McKelvey (1996)]. However, since there may be local
Ch. 2: Computation of Equilibria in Finite Games 109

minima that are not global minima, the system is not globally convergent. Other
work has been done on differential equation systems for solving for Nash equilibria.
Brown and von Neumann (1950) construct a differential equation systems for finite
two person zero sum games which is globally convergent. Rosen (1964) defines a
differential equation system for n-person games, and he gives conditions for global
convergence. Rosen's. conditions require a property of "diagonal strict concavity" on
the matrix of cross derivatives of the payoff functions, which would not in general
be satisfied for finite n-person games.

4. Extensive form games

To this point we have discussed only normal form games, ignoring all computational
issues that might arise out of the derivation of the normal form from some underlying
extensive form game. This is reflective of the state of the literature, which has only
recently begun to address computation of extensive form equilibrium concepts. Never-
theless, the possibility of using information in the extensive form to guide computation
has enormous intuitive appeal, and is one of the principle factors that motivated Kreps
and Wilson (1982) (henceforth KW) to develop the concept of sequential equilibrium,
which we now review. With minor modifications, our notation is that of KW.

4.1. Notation

In an extensive form game the set of physically possible sequences of events is

described by a tree or, somewhat more generally, an arborescence (the accepted term
in computer science is 'forest'), which may be thought of as a finite collection of
trees. Let (T, -~) be a pair consisting of a finite set T of nodes and a binary relation
-4 on T. We interpret x -< t as meaning that the node x precedes the node t, and we
require that -< be transitive and acyclic. In addition we require that, for any t E T,
the set P(t) = { x E T: x -< t } of predecessors of t is completely ordered by -<.
In effect this amounts to a decision to treat equivalent positions reached by different
sequences of moves (e.g., the positions reached in chess after 1. P-Q4, N-KB3; 2. P-
QB4 and 1. P-QB4, N-KB3; 2. P-Q4) as distinct. For most purposes this is a harmless
simplification, but we hasten to point out that it is far from benign from the point of
view of minimizing computational burden.
The set of initial nodes is W = { w < T: P ( w ) = (3 }, and Y = T - W is the set of
noninitial nodes. Similarly, the set of terminal nodes is Z = { z E T: P - ~ (z) = 0 },
and X = T - Z is the set of nonterminal or strategic nodes. The assumption that
P(t) is always completely ordered implies that the immediate predecessor function
Pl : Y --+ X , Pl (Y) = max P(y), is well defined. We let g(t) be the cardinality of
P(t), and for g 1> 2 we define Pe : { Y E Y: g(y) >~ g} -+ X to be the g-fold
composition of pl with itself. Also, let P0 be the identity function on T. The number
of predecessors of t E T is g(t), the largest integer such that Pc(t) (t) is defined.
110 R.D. McKelvey and A. McLennan

We adopt the convention that 0 represents "nature" or "chance". There are finite
sets H , A, and I = { 1 , . . . , n } of information sets, actions, and agents, respectively,
and functions r/: X -+ H , c~ : Y --+ A, and L : H --+ { 0 } U I , interpreted as follows.
The information partition is the collection of sets of the form r~-l(h), h E H . (The
information partition is taken as given in most formal definitions of extensive form
games, which do not include the function r].) We will often abuse notation by equating
h with ~7-1 (h) so that we write x E h to indicate r/(x) = h. For h E H , the agent
who chooses an action when a node in h occurs is ~(h). Let H i = ~-1(i) be the
set of information sets a t ' w h i c h i chooses, i = 0 , . . . , n. When an action is chosen
at h c H i, agent i knows that some node in r1-1 (h) has occurred, but not which
one. For y E Y, c~(y) is interpreted as the action that was chosen at the immediate
predecessor of y, so that A(x) = ct(p1-1 (x)) is the set of actions that are available at
x C X . To make sense out of the choice problems in this structure we must assume
that: (a) if y, y' E Y are distinct with Pl(Y) = Pl(Y') = x, then c~(y) ¢ c~(y'); (b)
A(x) = A(x t) for all h and x, x ' E h. For h E H let A(h) be the set of actions that
can be chosen at h. It is conventional to assume that A(h) N A(h') = I~ for distinct
h, h ~ E H.
In K W it is assumed that nature has no decisions, so the range of ~ is I , and all
'physical' uncertainty is summarized in the initial assessment p E A ( W ) . For many
purposes this is a harmless simplification - one can shuffle once at the beginning
of the game, rather than before each card is drawn - but again this may not be
computationally efficient, so we do not adopt this assumption. The behavior of nature
at the information sets where it chooses is represented by 7r°, a vector of probability
measures 7r~ E A(A(h)). (The information partition of the nodes at which nature
chooses will play no essential role in the theory. It will be easy to see that postulating
this structure entails no loss of generality, since we could have each node at which
nature chooses be a singleton information set.)
In games played by teams it can be sensible to allow an agent (construed as the
team) to forget information from one move to the next, but we will take the usual
perspective and assume that this does not happen. What this means precisely is that
the game is one of perfect recall: for any information set h and any x, x ~ E h, if x*
is a predecessor of x at which the same agent chooses (that is, ~(rl(x*)) = ~(~7(x)))
then ,r/(x*) contains a node x ~* that is a predecessor of x ~, and if y and y~ are the
nodes with y E { x } U P(x), Pl (Y) = x* and y' E { x' } U P(x'), Pl (Y') = x'*, then
o~(y) = ct(y'). More verbally, at ~7(x), c(r/(x)) knows that r/(x*) occurred, and that
at that point c~(y) was chosen; we require that this is also true when x ~ occurs.
A behavior strategy for agent i is a vector 7ri = (~rh)hEH~
i of probability measures
7r]~ E A(A(h)). Let H i = YIhEH~ zh(A(h)) be the set of such objects. The interpre-
tation is that, conditional on h being reached, for a E A(h), 7dh(a) is the probability
that a will be chosen. The term behavior strategy is also used to refer to vectors
~r = (Tri),;E1 whose components are behavior strategies for the various agents. Let
H = [Ii~s Hi be the space of behavior strategies in this sense. (Formally we do not
Ch. 2: Computation of Equilibria in Finite Games 111

include H ° as a factor of H, but in many expressions we wilt treat the given 7r° as a
component of 7r.) It simplifies many expressions to write 7rh and 7rx in place of 7r~(h)
an d 7rn(x)
L(v(~)) respectively. Given a behavior strategy 7r, the probability that a node t is
reached is

p T r ( ~ ) ~__ p(pg(t)(t)) , HTrpe(t)(o~(pg_l(~))),

and if x -4 t, then the probability that t will be reached conditional of x having been
reached is

ix): 1-[

Recently Koller and Megiddo (1992), von Stengel (1996) and Koller, Megiddo
and von Stengel (1996) have pointed out that there are computational advantages in
working with the sequence form, which is essentially a representation of behavior
strategies in terms of the induced probabilities, for each agent and each sequence
of actions for that agent, that the sequence will be chosen if nature and the other
agents behave in a way that permits the sequence. More precisely, suppose that s =
( a l , . . . , aq) is a sequence of actions that might be chosen by agent i, meaning that
the information set at which a~ might be chosen can occur without any earlier choices
by agent i, and, for j = 2 , . . . , q, the information set at which aj may be chosen can
occur after a j_l is chosen, without any intervening actions by agent i. For each such
sequence for a g e n t / set p[~ (s) = [I~=l 7ri(aj); this defines an induced realization
plan/Ui ~ for agent i. The critical point is that for each node t, P ~ ( t ) is a product of
sequence form variables, with one factor for each agent. As a monomial in sequence
form variables, the degree of P'~(t) can easily be much lower than its degree as a
monomial in behavior strategy variables. It is computationally trivial to pass from a
realization plan to the set of behavior strategies that induce it, so for many solution
concepts it is likely to be efficient to solve for equilibria in terms of the sequence
form, then pass to behavior strategies. This is not possible for sequential equilibrium,
since the sequence form suppresses information about an agent's 'intended' behavior
at information sets that the agent never allows to be reached, whereas sequential
equilibrium imposes conditions on such intentions. Nonetheless, it seems likely that a
suitable generalization of the sequence form will be the natural vehicle for computation
of sequential equilibrium.
When every possible event has positive probability, there are unambiguous con-
ditional probabilities defining beliefs at the various information sets. For i c I let
H ~° = ~[heH ~ z~°(A(h)) be the set of totally mixed behavior strategies for i, and
112 R.D. McKelvey and A. McLennan

let H ° -- ll~cz
-- H ~°. For a totally mixed behavior strategy 7r, each node t is reached
with positive probability, so for each h there is an induced probability distribution
#h c A(h) given by


The vector p'~ = (#~)hEH C M - 1--[hEH A(h) is called the belief induced by It.
The equilibrium concept We are working towards requires not just ex ante rationality,
but also rationality, conditional on certain beliefs, at information sets that have no
probability of being reached under the equilibrium strategies. Thus there arises the
question of what is the appropriate requirement concerning the relation between the
behavior strategy and the belief. KW define the set of interior consistent assessments
to be ~po = { (#~, 7r) : 7r E H ° }, and the set ~P of consistent assessments' is its closure
in M x H. K W express doubts about this notion of consistency, but recently Kohlberg
and Reny (1993) have shown that it is equivalent to a conceptually natural condition
that is slightly stronger than independence.
We now introduce a payoffu E R zxx which specifies a reward for each agent at
each terminal node. A behavior strategy induces an expected payoff

EX(ui I t) = 2 P~(z t t). ui(z)


for each i E I at each t E T, and an assessment (#, 7r) induces an expected payoff


for each i E I at each h C H i. The assessment (#, 7r) is sequentially rational if, for
each i E I and h E H i, there does not exist ~.i E 17"i such that

E(",~l#~)(u~ l h ) > E(~'~)(ui l h),

where 7r I ~i is 7c with 7ri replaced by #i. A sequential equilibrium is a sequentially

rational consistent assessment.

4.2. Extensive versus normal form

There are computational aspects of many of the conceptual issues that arise in con-
nection with this model. Perhaps the most important questions, and the first to be
addressed historically, involve the comparison of the extensive with the normal form.
Ch. 2: Computation of Equilibria in Finite Games 113

A pure strategy for agent i is a function si : H i --+ A with si(h) 6 A(h) for all
h in the domain. Given a pure strategy vector s = ( s l , . . . , sn), p and ~r° induce a
distribution on terminal nodes, and, for a given payoff u E N z x I , a pure strategy
vector s = ( s l , . . . , sn) consequently induces expected payoffs u~(s), i c I. This
construction is the most common way of passing from an extensive to a normal form
game (there are other.s), and is what is generally understood as "the" normal form of
the given extensive game.
Any behavior strategy 7ri for agent i induces a normal form mixed strategy cr~~
according to the formula cri (si) = ~IhcH~ 7r~(si(h)). There are mixed strategies that
do not arise in this way, so one can ask whether an agent sacrifices any significant
strategic flexibility by using only behavior strategies. Kuhn (1953) showed that, pro-
vided the game is one of perfect recall, behavior strategies are strategically adequate
in the sense that for any mixed strategy there is a behavior strategy that is realization
equivalent: regardless of the mixed strategies of the other agents, the mixed strategy
and the behavior strategy induce the same distribution on terminal nodes. In particular,
a Nash equilibrium in behavior strategies, that is, a vector of behavior strategies such
that no agent can increase expected payoff by switching to another behavior strategy,
is a true Nash equilibrium in that no agent has an improving deviation in the space
of mixed strategies. Conversely, for any mixed strategy Nash equilibrium, a behavior
strategy consisting of realization equivalent behavior strategies for the various agents
is also a Nash equilibrium.
For agent i the dimension of the set of mixed strategies is ([IheH~ # A ( h ) ) - 1 while
the dimension of the set of behavior strategies is ~ h e H , ( # A ( h ) -- 1). The general
principle that the difficulty of solving an algebraic system is severely affected by the
dimension suggests that the ability to work with behavior strategies is an important
simplification, and indeed, in the authors' experience, for particular examples the
extensive form is generally much easier to solve. In this vein, Wilson (1972) developed
a version of the Lemke-Howson algorithm for two person extensive form games.
Koller, Megiddo and von Stengel (1996) show how, for two person games of perfect
recall, in the space of sequence strategies (which has the same dimension as the space
of behavior strategies) the definition of Nash equilibrium can be expressed as a linear
complementary problem, so that the Lemke-Howson algorithm can be applied without
first passing to the normal form, which could be much larger.

4.3. Computing sequential equilibria

The behavior strategies of a sequential equilibrium constitute a Nash equilibrium, but

there arc behavior strategy Nash equilibria that are not sequential, in the sense that
there is no consistent belief for which the behavior at unreached information sets is
sequentially rational. Sequential equilibrium is generally regarded as better founded
conceptually, and we now discuss the computational issues that are particular to it. One
114 R.D. McKelvey and A. McLennan

important simplification is that, for a game of perfect recall, a consistent assessment

is sequentially rational if and only if it satisfies the following weaker condition: the
assessment (#, 7r) is myopically rational if, for each i E I and h E H i, there does
not exist ¢r~ E A ( A ( h ) ) such that

E ("'~1#~) (u{ ]h) > E (u'~) (u{ I h).

(For a proof see KW.) Since E(~,~l*i~)(ui [ h) is a linear function, it is sufficient that
this condition hold with ~-~ = a for each a E A(h). Thus we can replace sequential
rationality, a nonlinear quantified condition, with a finite collection of unquantified
Since unquantified systems are generally easier to solve, it is also significant that
consistency can be expressed in an unquantified manner. A basis is a set b = bA U b x
where bA C A and bx C X . A basis is consistent if there is a consistent assessment
(#, 7r) in which bA is the set of actions that are assigned positive probability by the
various components of 7r and bx is the set of nodes that are assigned positive prob-
ability by the various components of/z. Kohlberg and Reny (1993) [see also Azhar,
McLennan and Reif (1992) and the Appendix of KW] establish the characterization
of consistency expressed in the following two results.

LEMMA 3. b is a consistent basis if and only/f: ( a ) f o r each h E H there is at least

one a E A ( h ) with a E bA; and (b) there is a function w : A -+ R+ with w(a) > 0
if and only if a E A - bA, such that f o r each h E H,

bx Nh= argmin E w(c~(pe(t))).
xEh g=O

LEMMA 4. If b is a consistent basis and 7r is a behavior strategy such that, f o r all h

and a c A(h), 7rh(a) > 0 if and only if a E hA, then (Iz, Tr) is consistent if and only
if there is a function ~b : A --+ R + + with ~ ( a ) = 7r(a) )"or all a E ba such that, )"or
all h E H and z E h

O, z ~t bx.

The conditions stated in L e m m a 1 can be expressed as a feasibility problem for

a linear program, so the simplex algorithm can be used to determine if a given
basis is feasible. For any feasible basis, L e m m a 2 parameterizes the associated set
of consistent assessments, and since sequential rationality is an algebraic condition
Ch. 2: Computation ()/'Equilibria in Finite Games 115

without quantifiers, once beliefs are given, we have a characterization of the sequential
equilibria for this basis in terms of an unquantified system of algebraic equations and
Practical experience in solving games suggests that significant increases in compu-
tational efficiency can be obtained by using dominance, and more sophisticated types
of reasoning, to eliminate certain bases from consideration without solving the associ-
ated algebraic systems. It is intuitively plausible that the beliefs can play an important
role in such analysis. In particular examples it is also easy to see how the beliefs can
facilitate generalizations of backward induction. For example, if an information set
contains two nodes, and the game beginning at this information set is self contained,
in that no descendant node is in an information set containing nodes not descended
from one of these nodes, then it is natural to solve the part of the game below this
information set, treating the beliefs at this information set parametrically, then solve
the rest of the game using the derived relation between beliefs at this information set
and expected payoffs conditional on each of its two nodes. At this point the efficacy
of this approach has not been given formal expression, either in computational theory
or in actual software.
It is known that the problem of solving for equilibrium in the extensive form
is computationally demanding. Blair and Mutchler (1993) show that deciding if an
extensive form game has an equilibrium in pure strategies is NP-hard.

5. Equilibrium refinements

The ability to compute not just Nash or sequential equilibrium, but also the various
refinements that have been proposed in a now quite voluminous literature, would
facilitate both application and testing of these concepts. The computational issues
associated with these concepts have not been studied in depth. We will first describe
the current literature, then discuss the subject in more general terms.

5.1. Two-person games

Eaves' modification of the Lemke-Howson algorithm to deal with degenerate games

is a method of introducing a class of perturbations of the game to make it non-
degenerate. As long as the lex-order (the ordering of the columns of the tableau) is
chosen appropriately, the algorithm will only terminate at a perfect equilibrium. We
now prove this assertion. For this discussion, we use the same notation as that of
Section 3.1.

LEMMA 5. Eaves' modification of the Lemke-Howson algorithm (as described in Sec-

tion 3.1) will only terminate at a perfect equilibrium.
116 R.D. McKelvey and A. McLennan

PROOF. Assume that the algorithm has terminated at the complementary lex-feasible
basic solution /3. Then, since /3 is a lex-feasible basis, T/~ = [_qm A m ] is lex-
negative. Let Z be a 2 m x (2m + 1) dimensional matrix, whose ith column, Z~, is
the basic solution for basis /3 (that is, components of Zi corresponding to nonbasis
indices are zero) to the equation AmZ~ = -T4], where, Tfl is the ith column of Tm.
For any r / > 0 define #7 to be the 2 m + 1 dimensional vector

~ = (l,@,~2,...,~m,o,...,O),

and set zv = Z#,~. Write zT~ = (y,~, x,~), where y,~ and zv are the first m and last m
components of z w
The ith row of Z is a vector of zeros if i is not in the basis, so z,~ is complementary.
If i is in the basis, then the ith row of Z is equal to the negative of the row of T m
containing a 1 in column i, so Z is lex-positive since T m is lex-negative. Therefore
z v >~ 0 for sufficiently small r/.
In view of the definition of Z we can write

[_qm A m] + A ~ Z = O . (5.1)

Multiplying this equation through by #,j yields

[-qZ Am ] PTl + Am zv = O.

Recalling that A = [ U I ~ ] , set U m = ( B ~ ) - I U . Then we can rewrite the hast

equation as

Ufl(yrl ÷ (~rl) _~ ( / ~ m ) - l x r / = qm or U(y, I + 5,1) + x,~ = q,

where 5 = (771 . . . , 7 ] m ) consists of the second through m ÷ 1 entries of #v.

But the last equation is just an expression of the restriction that z~ = (Yv, :%) is a
complementary basic feasible solution for a perturbed problem obtained by requiring
that each strategy be played with a probability of at least ~lJv_i (where v - i is the
value of the game to the player who does not choose strategy j). By taking a sequence
of such solutions as ~ goes to zero, we obtain a test sequence that converges to z0,
the lex-feasible basic solution of the original specification of the problem. Hence, z0
must be a perfect equilibrium. []

A stable set, as defined by Kohlberg and Mertens (1988), is (roughly) a set of

strategies that is an equilibrium, and which continues to intersect the set of Nash
equilibria for any sufficiently small perturbation of the game. Wilson (1992) uses
the properties of the lexical version of the Lemke-Howson algorithm to construct an
algorithm for computing "simply stable sets" of equilibria in finite two-person games.
Ch. 2: Computation q["Equilibria in Finite Games 117

A simply stable set is a weakening of the notion of a stable set defined by Kohlberg
and Mertens (1988). It is a set of strategies that is an equilibrium under a restricted
set of perturbations to the game. Wilson's algorithm only finds a sample stable set.
However the algorithm is a generalization of the Lemke-Howson algorithm, and like
that algorithm, it can be modified to find all 'accessible' stable sets.

5.2. N-person games

Yamamoto (1993) proposes a homotopy method for computation of a sample proper

equilibrium for a general n-person finite game. Since the path of the algorithm is
determined by nonlinear equations, there are issues of numerical approximation similar
to those raised by Wilson (1971) and Rosenmtiller (1971). Talman and Yang (1994)
propose a simplicial subdivision algorithm for finding a sample proper equilibrium of
finite n-person games.
Mertens (1989, p. 590) points out that there is in principle a finite procedure for
computation of stable sets for n-person games based on a nearly exhaustive procedure
based on triangulation of semi-algebraic sets (these are defined in Section 6), although
he does not provide a practical algorithm for implementing this. For extensive form
games, McKelvey and Palfrey (1995b) propose a homotopy based algorithm for com-
puting a generically unique selection from the set of sequential equilibria for extensive
form games.
Turning now to general considerations, perhaps the most important feature presented
by the refinement literature is its diversity of methods. While most refinements exclude
certain Nash equilibria from the set of solutions, the various notions of stability
[Kohlberg and Mertens (1988)] define a solution to be a set of equilibria, so to solve for
the set of stable equilibria means finding certain subsets of the set of Nash, or perhaps
sequential, equilibria. Some refinements are defined by requiring approximation by
certain types of approximate equilibria [Selten (1975)], Myerson (1977)]. Testing an
equilibrium by comparing it, in some sense, with the set of all equilibria [McLennan
(1985)], or selected subsets of the set of equilibria [Cho and Kreps (1987), Cho
(1987), Banks and Sobet (1987)] is a possible method. Kalai and Samet (1984) ask
for equilibria whose supports are, in a certain sense, minimal. Although it has not been
much discussed in this literature, it is also possible to use index theory to eliminate
some equilibrium, and it is noteworthy that, as is pointed out by Shapley (1974),
for nondegenerate games, starting from the extraneous solution the Lemke-Howson
algorithm halts at an equilibrium with positive index. (More generally, it always
locates an equilibrium with opposite index from the index of the starting point.)
To a very large extent the computability, in principle, of questions associated with
these concepts, is a consequence of their being describable as semi-algebraic sets.
[See, e.g., Schanuel, Simon and Zame (1989) and Blume and Zame (1994).] Since
the various notions of stability are set valued, they present potential counterexamples
to this general principle.
118 R.D. McKelvey and A. McLennan

6. Finding all equilibria

For many purposes, an algorithm that yields a single sample equilibrium is unsatis-
factory. Even if the resulting equilibrium is perfect, or satisfies some other criterion
posed in the literature on refinements of Nash equilibrium, we cannot eliminate the
possibility that other equilibria exist and are more salient. Some refinements pose
standards that involve comparison of a candidate equilibrium with the other equilibria
of the game. Even if (perhaps especially if) one believed that models with multiple
equilibria were treacherous, and therefore ill-suited for applications, one would still
have an interest in algorithmic methods for determining if there a r e multiple equilibria,
or other facts about the s e t of Nash equilibria.
In recent years computer scientists have made a great deal of progress in under-
standing algorithms that deal with systems of equations and inequalities of multivariate
polynomials. The set of Nash equilibria can be represented as such a system, as we
explained in Section 2, so this material is directly applicable to the task at hand, and
the goal of this section will be to give some feeling for the subject by presenting
some of the algorithms that appear, at this point, to have the greatest promise. At
this point there is no literature concerning how these algorithms might be customized
for the particular problems of noncooperative game theory. There is also no literature
concerning how these algorithms might effectively utilize knowledge of a sample
equilibrium. In the authors' experience, an important idea in organizing the analysis
of a game by hand is to find one equilibrium, then ask how other equilibria might
differ from this one; there is currently no substantiation of this wisdom in theory or
in computational experience.
Generally, the algorithms below are much slower than those discussed earlier, with
running times, and in some cases memory requirements, that grow exponentially as
various parameters of the input (in particular the dimension) are increased. Exponential
algorithms are sometimes loosely described as impractical, and problems for which
no better algorithms exist are thought to be intractable, but here these terms seem
inappropriate. If all the problems of interest were large in scale, then indeed these
algorithms would have little utility, but in fact the simplest games are the ones referred
to most frequently in the literature, and many interesting models can be expressed in
relatively small trees. While these procedures will not be useful for many problems of
interest, they should certainly vastly expand the set of examples for which complete
analysis is possible.
To begin with, it is important to recognize that a variety of computational tasks are
(1) Determine whether an equilibrium exists or, more commonly, since existence
is usually guaranteed, determine whether there exists an equilibrium with some
additional property.
(2) Determine the dimension of the set of equilibria with some property.
Ch. 2: Computation of Equilibria in Finite Games 119

(3) In the event that the set of equilibria with some property is 0-dimensional, de-
termine its cardinality.
(4) Compute numerical estimates of the equilibria.
(5) Determine the topology of the set of equilibria, for instance by presenting a
This list is exemplary rather than exhaustive. In fact many refinements of Nash equi-
librium depend on the topological relationship between the graph of the best response
correspondence and its intersection with the diagonal in Z x 27. Also, some of the
tasks subsume others, obviously. Even when the less demanding tasks are not the ul-
timate goal, they can be useful preparatory steps in more demanding calculations. In
particular, computation of numerical values of the various elements of a set of Nash
equilibria can be facilitated in several ways by a knowledge of how many equilibria
there are to compute.

6.1. Feasibility

Important theorems of pure mathematics show that all the tasks above are, in principle,
computationally feasible. A semi-algebraic set is a subset A c R m that is the set of
points satisfying a propositional formula P(x) built up from polynomial equations and
inequalities in the variables x l , . . . , x,~, the logical operators 'and', 'or', and 'not',
and parentheses. For example P(x) might be the condition '(xl /> 0 and xl 2 = x23)
or x2 < 1'. As we stressed in Section 2, the set of Nash equilibria is a semi-algebraic
There is a more general class of quantified propositional formulas of the form

P ( x ) = ( Q l y l ) . - . (Qkyk)R(x,y)

where Q I , . . . , Qk C { V, 3 } and R(x, y) is an unquantified propositional formula.

In this formula the variables Yl,...,Yk are said to be bound by the quantifiers V
and 3, while the variables x l , . . . , xm are said to be unbound. Perfect equilibrium is
an example of a solution concept for which all known definitions involve quantified
expressions such as "for all e > 0 there exists ~ > 0 such that for all trembles ...".
The celebrated Tarski-Seidenberg theorem asserts that any quantified propositional
formula is equivalent to some unquantified formula, in the sense of determining the
same subset of R m, and in fact the original proof essentially specified an algorithm for
generating the unquantified equivalent. This algorithm is obviously impractical, and
has never been implemented. More plausible algorithms for quantifier elimination have
been developed (cylindrical algebraic decomposition [Collins (1975)] can be adapted
to this task), but it has also been shown that the problem is inherently difficult.
It is easily seen that the class of semi-algebraic sets is closed under intersection,
union, and complementation. It was long an open problem to show that any such set
t20 R.D. McKelvey and A. McLennan

could be triangulated: Hironaka (1975) was the first to present an acceptable proof. An
algorithm developed by Collins (1975), cylindrical algebraic decomposition, supplies
a more general type of decomposition, from which a triangulation can be derived, so
this algorithm constitutes an alternative proof of Hironaka's theorem. These facts are
important because the topology of a space with a finite triangulation is completely
determined by the finite, combinatoric, data specifying the simplicial complex. Thus
a large amount of topological information is, in principle, computable.

6.2. Exemplary algorithms f o r semi-algebraic sets

During the last decade the literature in computer science concerning algorithmic anal-
ysis of algebraic systems has grown rapidly. There are now many methods that are,
at least in principle, applicable to the problems that arise in game theory. In contrast,
virtually nothing is known about how one might customize such procedures to take
advantage of the special properties of the systems that arise in game theory. (For
instance, expected payoffs have degree 0 or degree 1 in each of the strategic proba-
bilities.) Here we will describe two algorithms that currently seem among the most
promising from the point of view of the computation of equilibrium.
The support of a Nash equilibrium is the set of pure strategies that are assigned
positive probability. For a given support the definition of Nash equilibrium can be
expressed as a conjunction of polynomial equations (any two strategies in an agent's
support have equal expected payoff, probabilities sum to one) and weak inequalities
(a strategy for an agent outside his/her support does not have a h!gher expected payoff
than a strategy in the support, probabilities are non-negative).
We now abstract away from the game-theoretic origins of the problem. Let
Pl, • •., Pm and ql,. • •, qk be polynomial functions of z E R m. A sign assignment
for q l , . . . , qk is a vector cr = ( ~ , . . . , ~k) E { - , 0, +}k. The sign assignment is said
to be satisfied at a point x E IR~ if c~i is the sign of qi(x), and we let or(x) denote
the sign assignment that is satisfied at x. The computational problem studied here is:
determine the number of common roots of Pl,. •., P~ satisfying each possible sign
assignment f o r q l , . . •, qk. (We think of Pl,. • •, P~ as the polynomials used to express
the requirements that probabilities sum to one, and that each agent is indifferent be-
tween two alternatives that both receive positive probability, while ql, • •., qk are the
polynomials used to express the nonnegativity of probabilities, and the requirement
that the expected utility resulting from an unused alternatives does not exceed the
expected utility resulting from the alternatives that are used.) A related problem is to
determine the set of consistent sign assignments for ql, •. •, q~; by definition this is
{ ~r(x): x C R ~ }. Note that these problems are unaffected if one of the polynomials
is multiplied by a positive rational number.
The algorithm outlined here proceeds in two steps. First, we show how to pass from
an instance of the multidimensional problem to a sign assignment problem in which
m = 1. We then describe the Ben-Or, Kozen, Reif (1986) algorithm for unidimensional
sign assignment problems.
Ch. 2: Computation of Equilibria in Finite Games 121

6.2.1. The resultant

Let P l , . . . ,Prn+l be homogeneous polynomial functions of x E R m+l. That is, each

pi is a sum of monomials of the same degree, and in particular, if p i ( x ) = 0 then,
for any A c R, p~(Ax) = 0. For each i, the set of monomials whose coefficients in Pi
are allowed to be nonzero is fixed, and we regard the coefficients of Pi as variables,
so that each Pi may be viewed as a function of the vector of coefficients a~. The set
of vectors of coefficients e = ( a l , . . . , a,~+l) such that p ~ , . . . ,P,~+I have a common
nonzero root (hence a one-dimensional linear subspace of roots) is the set of points
satisfying a particular quantified formula, so the Tarski-Seidenberg theorem implies
that it can be expressed as the set of points, in the space of the coefficients, satisfying
some unquantified formula. In fact [cf. Van der Waerden (1950)] the closure of this
set of coefficient vectors is the set of roots of a single irreducible (i.e., unfactorable)
polynomial, and the resultant of p l , . . . ,P~+l is, by definition, the lowest degree
polynomial in the coefficients of P l , . • . , Pm+l that vanishes precisely on the closure
of the set of coefficient vectors for which p ~ , . . . ,Pm+l have a nonzero root. There
is a very familiar example: when each Pi is linear, the resultant is the determinant of
the (m + 1) × (m + 1)-matrix whose ith row is the vector of coefficients of p~.
Some methods for computing the resultant have been known for a long time. [See
Van der Waerden (1949).] These methods are "general" in that they allow all mono-
mials of the same degree as Pi to have a nonzero coefficient in p~. For most problems
of interest the numerical values of at least some of the coefficients are known at the
outset. It is possible to compute the relevant resultant symbolically, then substitute the
given coefficient values, but for all but the smallest problems this method is too slow
and consumes too much memory. Recently a literature in mathematics and computer
science has developed around the computation of the sparse resultant, which is the
polynomial derived from the resultant by setting some of the coefficients to zero.
Some of the proposed algorithms also feature methods of taking advantage of prior
knowledge of some of the nonzero coefficients. The subject is complex, utilizing sur-
prising and sophisticated methods. It is developing at a rapid pace, and does not seem
near resolution, so that although it is a central component of methods for computing
game theoretic equilibria, we are unable to treat it in any detail here. For the interested
reader we recommend Canny and Emeris (1993), which in many respects represents
the current state of the art, and references therein.
Now recall that we began with m polynomial functions P l , . . •, Pm on R ~, but more
recently we have been considering m + 1 polynomial P l , . . . , p ~ + I in the variables
x o , . • •, x m . We pass from the first situation to the second as follows. First, convert
the polynomials, originally functions of x ~ , . . . , x,~ into homogeneous polynomials by
multiplying each monomial by a suitable power of the "homogenizing variable" xo,
where by "suitable" we mean that, after all such multiplications, in each polynomial all
monomials have the same total degree. Geometrically this corresponds to the obvious
embedding of R "~ as the hyperplane in R ~+1 given by the condition :co = 1, in the
122 R.D. McKelvey and A. McLennan

sense that the zeros of the derived polynomials in this hyperptane correspond to the
zeros of the original polynomials. We will not distinguish notationally between the
two versions of Pl,. • • ,P,,~- Second, we add another linear homogeneous polynomial
Pm+l = uOXO + "" " + UmXm to the system.
The resultant of the derived system p ~ , . . . ,Pm+l is called the u-resultant of the
given system P l , . . . , P,~. Taking the coefficients of p l , . •., p ~ as given numerically,
we regard the u-resultant as a polynomial R ( u o , . . . , urn) in u 0 , . . . , u ~ . Suppose
the system P l , . • •, Pm (viewed as a system of homogeneous polynomials in the vari-
ables x 0 , . . . , x , 0 has finitely many one dimensional subspaces of solutions, and
( ( 1 ) , . . . , ~(q) are nonzero points on these lines. Then, according to our definition, the
resultant should vanish if and only if some inner product ~(~) • u vanishes. Since
the resultant is the lowest degree polynomial in the coefficients with this property, it
follows that R = I1~=1 ((~) " u. Note that ~(~) corresponds to a root of the original
m-variate system P l , . . . , Pm if and only if ~o(~) ¢ 0.
The following analysis, due to Canny (1988), is the reduction to a single dimension.
The construction begins with the choice of numerical constants Cl,.. •, cm such that
m ~)
(~) for all c~ = 1 , . . . , q, if ~ ) = 0, then ~ j = l cj ¢ O,
({) for k = 1 , . . . , m and all distinct c~,/3,7 = 1 , . . . , m ,

i=1 i=1 i=1

Of course we do not know the points ~(1), . . -, ((q) in advance, so we cannot know
that a particular choice of cl, • • •, Cm is acceptable, but it turns out that a bad choice
can be diagnosed at run time, so one can either take a Monte Carlo approach, choosing
cl,. • •, c~ randomly, which works "with probability one", or one can keep systematic
track of which choices fail, so that with enough such "failure" one will be able to
solve for ( 0 ) . . . , ((q) by linear algebra.
Form the univariate polynomials

p(x) : /~(--X, el,...,Cm)


t+(x) = R(-x, c,,..., + 1),...,

V(x) = R(-x,c,,..., - 1),...,

fori= t,...,m.
Recall that R(uo,. •., urn) factors as the product of the linear forms u0(~ ~) + . - . +
u m ( ( ~ ) . Now if ~ ) = O, then, as a function of x, the corresponding linear factor
Ch. 2." Computation of Equilibria in Finite Games 123

of p is a constant that is nonzero by (-~). Since R is defined only up to multiplica-

tion by a nonzero scalar, we may assume without loss of generality that the vectors
( ~ ) , . . . , ~(m~)) with ~ ) ~ 0 have been normalized to have ~0(c~) = 1. The roots of
p are then the numbers

for those a for which ~ ) ¢ 0.

It is possible that some t + (x) or t~-(x) has multiple roots, a situation we wish to
avoid. A polynomial is quadraO~rei if it has no square factors. Any polynomial has a
unique factorization as a product of powers of linear factors; the quadratfrei part of
the polynomial is the product of these factors. As we explain in greater detail below,
the Euclidean remainder sequence provides a method of computing all square factors
of a univariate polynomial. Let t+ (x) and t~- (x) be the quadratfrei parts of t + (x) and
t [ (x) respectively. From the factorization of R we conclude that the roots of t+ (x)
are the numbers of the form 0~ + ~}~), while the roots of t~- (x) are the numbers of
the form 0,~ - F (~) Evidently for given 0 the roots of t+(20 - x) and t~- (20 - x) are
the numbers 20 - 0~ - ~}~) and 20 - 0,~ + ~}~) respectively. If 0 = 0~, then t~- (x)
and t+ (20~ - x) have the root 0~ - ~}~) in common, and a little algebra shows that
(~) implies that they cannot have any other common root.
For each i = 1 , . . . , rn we form a sequence of polynomials in the variables (0, x)
by setting z°(O,x) = t+(20 - x), z~(O,x) = t~-(x), and, for j > 1, setting

z (O,x) = w (o)4-2(o,x) - J (o)z j-1 (o, x ) x c'

where the polynomials w~ (0), y~ (0), and the integer ei are chosen in some way that
insures that the degree of z~, as a function of x with coefficients that are polynomials
in 0, is less than the degree of z~ -1 . Thus ei will be the difference between the degrees
of z ij-2 and z~- l , and w~J and y~ could be the leading coefficients of ziJ-I and ziJ-2
respectively. (In practice one may wish to check whether these leading coefficients
have any common factors, in which case w,Ji and g~ could be simplified.) For some J,
z~J+t will be a function of 0 only. If ~ ) ¢ 0, then z/a+l (0~) = 0, since z°i(O~, .) and
z iI (0~, .) have a common root, as we saw above. Also, z~J must have degree one as
a function of x, since otherwise z°(Oc~, .) and z~ (0n, .) would have multiple common
Thus we may write z J (0, x) - di (0)x + ni (0). Since z J (0~, 0,~ - ~i(~)) = 0 for all
c~ such that ~ ) 5£ 0, setting

- d (0--5 + o,
124 R.D. McKelvey and A. McLennan

gives a rational function such that ri(O~) = ~}~). (Note that, while hi(O) and di(O)
may depend on the choices of wiJ and yJ made above, their quotient does not.)
Summarizing, once we have computed p and t +, t~-, i = t , . . . , m, the polynomi-
als di and ni are computed using polynomial remainder sequences, and the rational
tunction ri is computed according to the formula above. Since p and r l , . . . , r,~ have
the properties indicated at the outset, questions about the possible sign assignments
of q l , . . . , qk at such roots can be rephrased as univariate problems by substituting
the functions ri for the various arguments of ql, • •., qk.

6.2.2. Univariate systems

Now let p and q ~ , . . . , q k be polynomials in a single real variable x with rational

coefficients. Our goal is to find an algorithm determining the number of real roots of
p satisfying each possible sign assignment for ql,. • •, qk.
For any pair of univariate polynomials f and 9, the Euclidean remainder sequence
r o ( f , 9 ) , . . . , r n ( f , 9) is defined by

ro = f,

1"i+1 ~ Si • ri - - 7"i--1 with degri+l < degr~ (i= 1,...,n- 1),
0 ~ 8n • rn -- rn-1.

That is, 'r'i+~ is the remainder resulting from division of r i - i by ri, with rn being the
last nonzero remainder.
For most readers the Euclidean remainder sequence will be familiar as the algorithm
for determining the greatest common divisor rn of f and 9- In particular, writing
P = r I j (x - c~j), where the product is over the roots cU of p, then differentiating,
one can show that p has multiple roots if and only if gcd(p,p') is a polynomial of
positive degree. More generally, if p = pl • p2 2 . . . . . ph h, where each pe is quadratfrei
(has no square factor), then (up to multiplication by a nonzero constant)

gcd(p,p',... ,p(g)) = p~+l pc+2 2 . . . . . Ph h-e,

so that

gcd(p,p',.. ,p(e-~)) g c d ( p , p , , . . . , p ( ~ + l ) )
Pe = gcd(p,p', . . , p(e) )2

Evidently we have described an algorithm for decomposing p into a product of powers

of quadratfrei polynomials. The given problem, determining the number of roots of
p with each sign assignment, is evidently solved if we can determine the number of
Ch. 2: Computation of Equilibria in Finite Games 125

roots of each Pe with each sign assignment, so henceforth we will assume that p is
quadratfrei. Similarly, we assume that for each j, p and qj are relatively prime, since
computation of gcd's allows us to express the solution of the given problem in terms
of the solutions of smaller problems for which this is the case.
Consider relatively prime polynomials f and 9, and let r0 = to(f, 9 ) , . . . , r~ =
r n ( f , 9) be the Euclidean remainder sequence. For a point a ~ R at which none of
the polynomials in this sequence vanish, let

S(.f,g;a) = #{i = 1,...,n: sign(ri_,(a)) ¢ sign(r/(@)}.

For numbers - o c ~ a < b ~< oc at which none of the polynomials r0, • •., r~ vanish,

A(f,g;a,b) = S(f, 9;a) - S(f,g;b).

A classical result, k n o w n as Sturm's theorem, states that if p is a quadratfrei polyno-

mial, and a ~< b are numbers that are not roots of r0 = r 0 ( p , p ~ ) , . . . , r n = r n (p, p~),
then A(p, pt; a, b) is the number of roots of p between a and b. The Ben-Or, Kozen,
Reif (1986) algorithm (hereafter BKR) is based on the following generalization, which
they describe as essentially due to Tarski.

THEOREM 6. p, p~ = dp/ dx, and q are relatively prime polynomials, and a ~ b are
numbers that are not roots of to = r o ( p , p ' q ) , . . . , rn = rn(p,p' q), then

A ( p , p ' q ; a , b ) -- # { x ~ ( a , b ) : p(x) - 0 and q(x) > O}

- # { x E ( a , b ) : p(x) = 0 and q(x) < 0 } .

PROOF. To begin with note that (6.2) holds trivially when a -- b. We argue by
considering how each side of (6.2) changes as we pass from b -- e - e to b = c + e,
where c is a root of some ri and there are no other roots of any of the polynomials
t o , . . . , r,, in the interval [e - c, e + el. Since p and ptq are relatively prime, rn is a
nonzero constant, so i = n is not a possibility.
Suppose 1 <~ i <~ n - 1. Then (6.1) reduces to r,i-l(c) = -r,z+l(e), so there
is exactly one sign change in passing from ri-l(C ± e) to ri(e ± e) and then to
r~+l (c ± ~), regardless of the sign of ri(e + e). Thus the LHS of (6.2) is unaffected
by passing from b - c - e to b = c + e, and since i ~> 1 the RHS is also unaffected.
The interesting case occurs when e is a root of r0 -- p. To be concrete, suppose
mat q(c) > 0 and p'(c) > 0. Then p'(c)q(c) > 0, and since p'(c) > 0, we have
p(c - e) < 0 < p(c + e). Therefore

Z ( p , p ' q ; c + e) = S ( p , p ' q ; c - e) -- 1.
126 R.D. McKelvey and A. McLennan

We see that if (6.2) holds with b = c - e then it also holds with b = c + e. The other
three possibilities for the signs of q(c) and p'(c) are similar, so the proof is com-
plete. []

Fix p and q satisfying the hypotheses of the theorem: p, p', and q are pairwise
relatively prime. Let p+ and p_ be the numbers of roots of p between a and b at
which q is positive and negative, respectively. Let

1 1 '


A(;, ;'q; a, b) J "

Inverting this system yields

p+ : ½(ZX(p,p'; a, b) + A ( ; , p ' q ; a, b)),

p _ = ½ ( A ( p , p ' ; a, b) - A(p,p'q; a, b)). (6.3)

To generalize this we introduce a concept from linear algebra. I f / 3 = [bhj] is an

m × m matrix and C = [c~k] is an n x n matrix, then the Kronecker product of B
and C, denoted by B ® C, is the m n x m n matrix whose (hi,jk)-entry is bhjcik.
Below we will need the following property of this construct.

LEMMA 7. I f / 3 and C are nonsingular square matrices, then B ® C is nonsingular.

PROOF. Letting B and C be as described above, suppose v E ]Rmn is an element o f

the kernel of B ® C. Then, for all h and i,


This means that, for each i, the m - v e c t o r with j t h component ~ k cikvjk is in the
kernel of B and therefore 0. But then we see that for each j, ( v i i , . . . , Vim) is in the
kernel of C. Thus v = 0. []

Let p be a quadratfrei polynomial, and let q l , . • • , qk be polynomials, each of which

has no factor in common with either p or p'. Fix a < b, where these numbers are not
roots of p, p', qt, • • -, qk. Let ~r1, . . . , ~r be the sign assignments that are satisfied by at
least one root of p between a and b, and let p ~ l , . . . , p~. be the corresponding numbers
of such roots. Let 7r1, . . . , 7rr be a collection of products of the form q l e ~ . . . . . qk ek
Ch. 2: Computation of Equilibria in Finite Games 127

in which each ei is either 0 or 1. Finally let Z be the r x r matrix whose entry Zij
is 1 or - 1 according to whether the sign of 7rJ is positive whenever sign assignment

[11 Aplab,]
cri is satisfied. Then

Z • " = ~ : (6.4)
p~ Lz~(p,p'Trr;a,b)
If qk+l is another polynomial that has no factor in common with p or p', then (thinking
of Z ® A as obtained by replacing each entry of Z with a 2 x 2 cell containing A
multiplied by that entry) the theorem implies that

(Z ® A). [Pe~ +
p~rl. __

[po-~ +
A(p, fTd;a,b) ]

A(p,p'<a,b) /

L par_ A(p, p'~r"qk+l ; a, b) J

where (for example) ~ l + denotes the sign assignment for ql,..., qk+l obtained by
appending a 1 to c¢.
An algorithm for computing the number of roots of p satisfying each possible
sign assignment for ql, • • •, qk is now apparent. Let Z above be the k-fold Kronecker
product of the matrix A, compute the numbers A(p, ptTvJ;a, b) for the r = 2 k products
7r5, and solve (t) for the numbers p ~ . Since Z is a 2 k x 2 k matrix, the running time
of this method is evidently exponential in k. However, by performing the calculation
in an iterative fashion, adding the polynomials qi one at a time and keeping track
only of the sign assignments that are actually satisfied by roots of p between a and
b, this can be drastically improved. More precisely, we have the following inductive
step of the algorithm:
(1) Given an invertible Zk satisfying (6.4) above, let Z~t+l = Zk ® A.
(2) Solve Eq. (6.5) above for the numbers of roots of p between a and b satisfying
each sign assignment.
(3) Eliminate the zeros of the vector of numbers p~,+, and let Zk+ 1 be obtained
fi-om Z~Z+l by eliminating the corresponding columns.
(4) Let Zk+l be a square matrix consisting of a maximal collection of linearly in-
dependent rows of Z~+ 1. Eliminate the components of the vector of numbers
A(p,p'ccr;a, b) and A(p,p'Tdqk+l; a, b) that do not correspond to the rows of
With this modification, the dimension of Zk cannot exceed the number of roots of p,
which of course is bounded by the degree of p, so for given p the running time is
linear in k.
128 R.D. McKelvey and A. McLennan

This completes the description of the univariate component of the BKR algorithm.
They point out that, for the particular problem of determining the set of consistent sign
assignments, there is a multidimensional extension. First, observe that we have effec-
tively described an algorithm for determining the set or consistent sign assignments
for a collection P l , . . •, Pk of quadratfrei, pairwise relatively prime polynomials, since
for each Pi we may apply the procedure above with pi in the role of p. Now suppose
that pl, • - • ,Pk are polynomials in z j , . . . , xra, and consider the particular problem of
determining the set of sign assignments satisfied by the various points in R m. We treat
each polynomial in this sequence as a univariate polynomial, in the single variable
z,~, with coefficients in the field of rational functions in the variables z l , . . . , z,~-l.
Then each point in IR'~-1 determines a collection of univariate polynomials in z,~
obtained by evaluating the coefficients, for which some collection of sign assignments
are pogsible, and the sign assignments that are consistent for p ~ , . . . , p k will be the
ones obtained in this way as z l , . . . , z ~ - i vary over R m-m.
Now observe that the Euclidean remainder sequence is defined for univariate poly-
nomials over any field of coefficients, including the field of rational functions in
z 1 , . . . , z m - l . The consistent sign assignments for P l , . . . , P k will be those allowed
by the sign assignments of the polynomials, in the variables z l , . . . , xm-1, that arise
in the relevant Euclidean remainder sequences computed with respect to xm, then spe-
cialized to z,~ = a = - o e and :c,~ = b = ec. (That is, we need only the signs of the
leading coefficients.) In short, given a collection of polynomials in m variables, there
is a method of passing to a collection of polynomials in m - 1 variables whose consis-
tent sign assignments determine the consistent sign assignments of the given system.
The complexity of this calculation grows very rapidly as the number of variables
increases. An important idea in understanding why this is so is the general principle
that symbolic operations are expensive in comparison with the corresponding cal-
culations in which symbolic variables are replaced by numerical values as soon as
possible. In contrast with the method based on the u-resultant, the multidimensional
BKR algorithm offers very little opportunity for "specialization" of variables before
the inductive descent reaches the univariate case.

6.2.3. Numerical computation o f the solutions

In view of the reduction to one dimension presented earlier, for the purposes of
numerical computation it is important to compute real roots of a polynomial p(cc) of
one variable, which simultaneously satisfy a set of inequality constraints qi (z) >~ 0.
In the following discussion we will assume that the number of such roots has already
been determined. In particular, this gives a stopping rule.
There are a number of ways to approach the above problem. One is to find all
roots of p, and check each to see if it satisfies the constraint. To find all roots, we can
repeatedly apply Sturm's theorem to p until we obtain an interval containing just the
leftmost remaining root, and then apply any of a standard set of line search methods
Ch. 2: Computation of Equilibria in Finite Games 129

to find the root of p in the interval. We can inductively apply this procedure until
all roots have been found. For each root, we must check whether it satisfies the con-
straints qi(z) >~O. Since we only obtain an approximation to the root, the inequality
constraints may be difficult to verify by function evaluation. Thus another application
of BKR may be necessary to verify that the roots found satisfy the constraints.
An alternative procedure would be to successively add sets of linear equations to
F, applying BKR to the augmented set, until we reach a point where each consistent
sign assignment of the set of polynomials is contained in a unique interval. Then
a line search algorithm can be applied to each interval containing a consistent sign
There are also a number of algorithms which will find all roots of a system of poly-
nomials directly. See, e.g., Ben-Or, Feig, Kozen and Tiwari (1988), Drexler (1978),
Garcia and Zangwill (1979, 1980) and Huber and Sturmfels (1995). The homotopy
method of Garcia and Zangwill has been applied to game theoretic problems by
Kostreva (1989) and Kostreva and Kinard (1991).

6.3. Complexity of finding game theoretic equilibria

The domain of practical applicability of the algorithms described above, and of re-
lated algorithms, is largely determined by the rate at which the time and/or memory
requirements of the calculation grow as the 'size' of the input grows. For many of
the algebraic algorithms that deal with unquantified systems the time complexity is
bounded above by functions that grow exponentially as the dimension of the problem
increases. For a fixed dimension the time requirements are bounded by polynomial
functions in appropriate measures of the size of the inputs. Reflecting on the fact
that N "~ has 2 ~ orthants, it seems unlikely that one can do much better than this for
general algebraic computations.
These considerations leave open the possibility that there might be faster algorithms
specifically tailored to deal with the algebraic systems arising in game theory. For
large classes of problems a lower bound on the complexity is given by the size of the
desired output. This line of reasoning gives a particular theoretical significance to the
already interesting question of how many Nash equilibria a game can possess.
Relatively little is known theoretically about the number of Nash equilibria of a
game. Since the Nash equilibria are the fixed points of a continuous function, the
theory of the fixed point index allows one to assign an integer called the index to
each set of Nash equilibria that is both open and closed in the relative topology of
the set of fixed points. A crucial property of the index is additivity: the index of a
union of two disjoint 'relatively clopen' sets is the sum of the indices of the two
sets. Since the set of Nash equilibria is a semi-algebraic set, and any semi-algebraic
set is a union of finitely many path-connected components, the index is determined
by the indices of the components. The general theory of the index implies that the
indices assigned to the elements of any partition of the set of fixed points must sum
130 R.D. McKelvey and A. McLennan

to one. It is known that for a finite normal form game, generically in the space of
payoffs, each equilibrium is isolated, and its index is either t or - 1 , so that the the
number of Nash equilibria is odd. Gul, Pearce and Stachetti (1993) point out that
a strict pure equilibrium necessarily has index 1, so that for a generic finite normal
form game with 2c~ + 1 Nash equilibria, since all equilibria are strict, at least c~ are
non-degenerate mixed strategy equilibria.
If we are searching for all equilibria, and have found some number of Nash equi-
libria, these results may sometimes give us information that there are Nash equilibria
which have not yet been found. Thus, if we have found an even number of Nash
equilibria in a generic game, we know there must be at least one more. If we have
found k pure strategy equilibria in a generic game, then we know there must be at
least k - 1 non-degenerate mixed equilibria. However, these results are not useful in
informing us when we have found all Nash equilibria, and hence cannot be used to
give us a stopping rule in any numerical computational of Nash equilibria.
The total number of equilibria for a game is the sum over all possible supports of
the number of totally mixed equilibria for that support. So one approach to finding
the number of equilibria would be to study the number of totally mixed equilibria on
each support. Of course, in practice, we may not be dealing with a game in general
position. So the number of equilibria for a given game could be infinite. In the case
that we are not dealing with a generic game, a more tractable question may be to find
the number of regular equilibria. Regular equilibria are (roughly) Nash equilibria in
which unused strategies have strictly suboptimal payoffs, and in which the derivative
of the payoff function at the equilibrium is of full rank.
Pick an arbitrary s E S. Set D~ = S ~ - {si}, and D = UicND~" Let ~ be
the number of permutations of D that do not map any element of any Di into Di,
and let qSi = IDol! be the number of permutations of Di. McKelvey and McLennan
(1996) prove that the maximum number of completely mixed regular Nash equilibria
is A/(S) = ~ / l - [ i q~i. They also prove that this is a tight bound by showing it is
possible to construct games that achieve the bound. A similar computation gives an
upper bound on the number of regular Nash equilibria for any given support.
For an n-person game with n > 2, usually JV'(S) > 1. For example, in an n-person
game where each player has 2 pure strategies, N ' ( S ) is equal to the number of
derangements of the integers from one to n, where a derangement is defined as a
permutation with no fixed point. In this case JV'(S) is given by the recursive formula
An = (n - 1) • [ A n - l + An-2], where A1 = 0, and A2 = 1. In a five person game,
this is 44, and in a 10 person game, this is 1,334,961.
Let k = min ISd be the size of the smallest strategy space. For fixed n, N ' ( S )
is exponential in h and for fixed k, N ' ( S ) is exponential in n. Since N ' ( S ) is the
number of totally mixed regular Nash equilibria, the number of Nash equilibria must
be at least as large as N ' ( S ) . Thus, even if algorithms could be constructed whose
complexity is linear in these parameters, it follows from the above result that the worst
Ch. 2: Computation of Equilibria in Finite Games 131

case computational complexity of finding all Nash equilibria is at least exponential in

n and k.
For a two-person game, N ' ( S ) = 1. A two person game will have at most one
regular equilibrium per support. Hence the maximum number of regular equilibria is
no greater than (2 m' - 1). (2 "~2 - 1). On the other hand it is easy to construct games
with (2 m• - 1) regular Nash equilibria, where m* = min [re1, m2]: Assume ml = m*
and set u i ( s l j , s2k) = 0 if j ¢ k, and u i ( s l j , 82k) = a j > 0 if j = k. Then for any
non-empty support C C $I, pC = (Plc , P2c ) is a regular Nash equilibrium if

PiJ = II ak E ~ ak when jcC,

k~c-{j} zcc k~C-(O
C = 0
Pij otherwise.

Thus, the maximal number of regular equilibria for a two person game is somewhere
between (2 m* - 1) and (2 m' - 1)-(2 mE - 1). In any case, it is clear that the maximum
number of regular Nash equilibria is exponential in the minimal number of strategies.
The above complexity calculations represent worst case situations. We might hope
that an "average" game would be better behaved than the worst case game. We do
not know the answer to this.

6.3.1. Two-person games

In the case of two person games, each of the equations and inequalities in the defi-
nition of" Nash equilibrium is linear. So the problem can be re-formulated as a linear
complementarity problem. Further, for any support the set of equilibria with that sup-
port is a convex set (possibly empty), and its closure is the convex hull of its extreme
points. There are a finite number of extreme points, each of which corresponds to a
set of k equations and inequalities, whose matrix of coefficients is of full rank, which
yield a feasible solution. It follows that a "brute force" method of finding all solutions
is explicit enumeration: For each possible support, check for a feasible solution for
[5],[6]. If there is one, either it is unique, or we can find it's finite set of extreme
points, whose convex hull represents the set of Nash equilibria for that support.
No genericity assumption is required in the above algorithm. This procedure will
locate all equilibria even if there are an infinite set of equilibria. Such a procedure
was first suggested by Mangasarian (1964).
Since the number of possible supports for an n-person game is I ~ i r N ( 2 m~ -- 1),
it is clear that even in a two person game, the above method has computational
complexity that is at least exponential in the maximum size of the strategy spaces.
Thus we would not expect this method to be feasible on large games. However, this
procedure has been implemented by Dikhaut and Kaplan (1991) in Mathematica, and
by the first author in C. The authors' experience indicates one can solve an 8x 8
game in approximately 30 seconds on a 486/66MH machine, and games up to 12x 12
132 R.D. McKelvey and A. McLennan

are feasible. One would expect that more sophisticated implementations, which take
account of the dominance structure of the game to eliminate whole sets of supports,
would substantially improve such algorithms, at least on average games. Koller and
Megiddo (1996) give an algorithm for finding all equilibria of a two person extensive
form game (even without perfect recall) that runs in time that is exponential in the
size of the extensive form.

6.3.2. N-person games

For games with more than two players, even if the input data are rational, an isolated
Nash equilibrium need not be rational [see, e.g., Nash (1951) p. 294 for an example].
Second, the set of equilibria with a given support need no longer be a convex, or
even connected set, as the following example illustrates. In this example, n = 3,
S i = {8il , 8i2 } for i E N. The payoff function is given in the following table:

$31 $21 $22 832 821 822

Sll (9,8,12) (0, 0,0) Sll (0,0, 0) (3,4,6)
312 (0, O, O) (9, 8, 2) S12 (3, 4, 4) (0, O, O)

A mixed strategy p = (Pl, P2, P3) ~ z~ is of the form Pl = (P, 1 - p ) , P2 = (q, 1 -- q),
P3 = (r, 1 - r), for some p, q, r E [0, 1]. We abbreviate p by (p, q, r). Then any Nasb
equilibrium to the game with full support must satisfy the equations
9qr + 3(1 - q)(1 - r) = 3q(1 - r') + 9(1 - q)r,
8pr + 4(1 - p ) ( 1 - r) = 4 p ( 1 - r) + S(1 - p ) r ,
lZpq + 2(1 - p ) ( 1 - q) -- @ ( 1 - q) + 4(1 - p)q.
Collecting terms and factoring, this becomes
(6q - 3)(4r - 1) -- 0,
(127' - 4)(2p - 1) = 0,
C a p - 2 ) ( 3 q - 1) = o.
This system of equations has exactly two solutions, one at (p, q, r) = (1/4, 1/2, 1/3),
and one at (p, q,r) = (1/2, 1/3, 1/4). (This game has a total of nine equilibria:
four pure strategy equilibria, three equilibria where two players mix and the other
adopts a pure strategy, and two equilibria with full support.) It should be noted that
the equilibria in this example are all regular Nash equilibria, and this is a generic
example, in the sense that any small perturbation of the payoffs in the game will
yield a game which also has nine Nash equilibria, two of which are distinct and
isolated equilibria with full support.
So it is clear that the computational problem for the n-person case is substantially
more difficult than the two-person case.
Ch. 2: Computation of Equilibria in Finite Games 133

7. Practical computational issues

In the above sections, we have considered the problem of computation of Nash equi-
libria primarily from a theoretical point of view. From the point of view of practical
application, there are a number of questions that need to be addressed.

7.1. Software

Many of the algorithms we have discussed are quite complicated. Implementation

of the algorithms in computer code involves issues of numerical stability, which
we have not discussed, as well as questions of efficiency. One impediment to the
routine use of the methods we have discussed has been the lack of generally available
software that implements the algorithms. For extensive form games, this problem is
particularly acute, since every extensive form game is different. If it were necessary
for a researcher to select and implement their own version of a solution algorithm for
each game that they encounter, then one would not expect that these methods would
be applied very widely.
The authors have been involved in development of a computer software package
(GAMBIT) to address the above problem} An early version of the software was
developed by the first author. Both authors together with Ted Turocy are now involved
in making major revisions to GAMBIT.

7.1.1. Current version of" G A M B I T

The original version of GAMBIT is a program that allows for the interactive building
and solving of extensive form games. The user sees on the computer terminal a
graphics display of the current extensive form game, and is able to navigate around
the extensive form using commands from the keyboard. A series of editing options
allows the user to alter the existing tree by adding, inserting, changing or deleting
portions of the existing extensive form. Thus, at a given node, the user may choose
to insert a new node, change the player who has control of that node, change the
information set to which the node belongs, add or delete a branch from that node,
delete the node, copy that node to another part of the tree, or attach an outcome,
with payoffs to each of the players, to that node. After any change, the program
checks for consistency of the new extensive form, and then redraws the tree if the
changes are legal. In this manner, GAMBIT allows for the entry of any valid finite
extensive form game. Games of incomplete information can be entered by treating the
type distribution as an initial chance move, and specifying player information sets so
that each player knows only his/her own type. Stochastic games can be dealt with by

2Available at the GAMBITWorld WideWeb site at http://www.hss.caltech.edu/-gambillGmnbit.html.

134 R.D. McKelvey and A. McLennan

having several game elements, and having one game be the outcome of another game.
Infinitely repeated games with discounting can be dealt with in a similar manner.
For any extensive form game, GAMBIT can construct the corresponding normal
form (either the reduced, or full normal form). A number of algorithms for finding
equilibria for the normal form game are available. The normal form equilibrium
strategies that are found are then converted back to behavioral strategies to obtain
solutions to the extensive form.

7.1.2. Revisions to G A M B I T

We are currently making extensive revisions to GAMBIT to add several features: a

command language, a modular structure, and support for additional algorithms de-
scribed in Section 6 for computing all equilibria. The most significant feature from
the point of general application is the GAMBIT command language (GCL).
The basic philosophy behind the changes is to provide both a programming envi-
ronment and a computer language to make it easy for both programmers and applied
researchers to manipulate and do operations on extensive and normal form games.

Programming environment. Regarding the programming environment, a programmer

who wants to implement an algorithm for finding a particular solution or refinement
should only have to worry about writing the code for the algorithm, and not have to
worry about writing code to build up or manipulate the game that is to be solved.
For example, prior to executing any algorithm, in addition to building up the game,
one would typically want to iteratively eliminate either weakly or strongly dominated
strategies, and only invoke the algorithm on the reduced game.
To this end, the computer code is being completely rewritten in C++ to make it
modular and allow a standardized interface to the extensive and normal form. Most
algorithms which operate on normal form games need access to the normal form only
to evaluate the payoff at a given mixed strategy profile. Similarly, most algorithms that
operate on the extensive form could work either directly off of the agent normal form,
or only need raccess to the extensive form to obtain evaluation of certain attributes
of the extensive form (such as the payoff or the beliefs at an information set) for a
given mixed behavioral strategy profile. Thus, by providing standard data structures
for mixed strategy profiles and behavior strategy profiles and a standard interface to
obtain the necessary function evaluations, this would enable a programmer to write
such algorithms easily. By simply linking in the relevant normal and extensive form
libraries, a programmer could write a standalone program that would have available
all of the functionality for building and manipulating the game forms without the
programmer having to know or have to deal with the internal workings of this code.
Alternatively, the same code could be linked into GAMBIT or the GCL to make it
available within those platforms.
Ch. 2: Computation ()f"Equilibria in Finite Games 135

Command language. Similar issues arise from the point of view of an applied re-
searcher, trying to make use of existing code developed by others. Here the problem
is that code must be available in a way that it can be used by individuals without
any particular programming expertise, but in a way that allows researchers to address
questions that are unique to their own particular application. To some extent a pro-
gram like G A M B I T provides such an environment, since it allows easy building and
analysis of extensive form games in an interactive setting. However, while an interac-
tive environment is useful for some applications, it requires constant user interaction,
and hence is not suitable for many potential uses.
For many applications, it is necessary to solve a game repeatedly with different
values of the parameters, to be able to do intermediate calculations on the results
(perhaps conditional on what one has found out so far), and to have a record of
what has been done. This is particularly true in econometric applications or in certain
theoretical applications such as the search for counterexamples. For such usage, an
interactive environment is unsuitable. To facilitate such usage, we are currently (with
Ted Turocy) writing a command language for GAMBIT. This is a project that is
currently in progress. So the following description is intended only to give a flavor
for the intended final product.
The command language is a language with Mathematica style syntax. The language
contains a number of data types, some corresponding to standard numerical data types
(int, double, rational) and some corresponding to elements of games (NormalForm,
ExtensiveForm, Node, InformationSet, Player, Outcome, etc.). Variables can be as-
signed to take on any of these data types, and these variables can then be operated on
by an array of functions. Each function has a number of required arguments, (assigned
by the operator " - > " ) as well as some possible optional arguments.
Some functions are useful for building up extensive form games. These commands
each have natural analogues to functions that can be performed (more easily if one
only has to do it once) in the interactive version of GAMBIT. For example

efg := NewEfg[] ;
root :: RootNode[e-> efg];
AppendNode[n->root#2,pl->l,br->2] ;

is a sequence of commands to create the extensive form tree for a two person simulta-
neous move game of poker, illustrated in Fig. 2.1. It defines the variable "root" to be
the root node of a new extensive form game, then creates a decision node for player
0 (chance) at the root node, creates a decision node for player 2 at the first branch of
136 R.D. McKelvey and A. McLennan

iooo ooo
RAISE a ' - -
1.000 -i.000
2. 000 -2. 000

FOLD o -1.000 1.000

-(i, 2 ) ~ R A I S E / 1.000 -i.000
2.000 2.000

Figure 2.1. A simple poker game.

Table 2.1
Normal form for poker game
FF (-1,0 (-1,1)
FR (0, 0) (-1.5, 1.5)
RF (0, 0) (0.5, -0.5)
RR (1,-1) (0,0)

the root node (root#l), does the same at the second branch of the root node, and then
merges the two nodes for player 2 into one information set.
Labels and outcomes could then be attached to nodes of the tree by

SetActionNames[n->root,name->{''ACE'', ''KING''}];

SetOutcome[outc->l,value->{-l, i}];

Other functions are useful for manipulation and solution of games. For example

nfg := ExtToNorm[e >efg];


converts the extensive form game "efg" to a normal form game, as in the following
table, and then finds all solutions that are accessible via the L e m k e - H o w s o n algorithm.
(In the normal form, the strategies are labeled by the action taken at each information
set, so "RF" means that Player 1 Raises with an Ace and Folds with a King). This
game has a unique Nash equilibrium at Pl = (0, 0, 2/3, 1/3), and Pe = (1/3, 2 / 3 ) .
The G C L allows lists of any data type, and has flow control statements, (While, If,
For) which have similar functionality and syntax as those in Mathematica. So
Ch, 2: Computation of Equilibria in Finite Games 137

list :: {''e01.nfg'', ''e02.nfg'', ''e03.nfg''}

F o r [ i := i, i < = L e n g t h [ l - > l i s t ] , i := i + i,
n f g := R e a d N f g F i l e { f i l e - > l i s t [ [i]] ;
While[ElimStrongDom[n-> n f g ] , D e l e t e D o m [ n <-> nfg] ] ;
S i m p D i v [n->nfg] ;

would process three files. For each one it would first successively eliminate strongly
dominated strategies, and then apply the simplicial subdivision algorithm to the re-
duced game.

7.2. Computational complexity

We illustrated the functionality of GAMBIT and the GCL with a very simple example,
which can easily be solved by hand. Only slightly larger examples quickly grow in
complexity to where they are very difficult or impossible to solve by hand. Computer
programs such as GAMBIT are a useful tool in helping to analyze and solve such
games. For example, the authors have found these programs indispensible in the
design, solution and subsequent econometric analysis of game theory experiments
[E1-Gamal, McKelvey and Palfrey (1993) and McKelvey and Palfrey (1992, 1995a,
1995b)]. While these applications illustrate that there is a domain of problems for
which the algorithms discussed in this survey are useful, they do not indicate how
large the domain is. in this section, we discuss briefly this question.
Computational complexity considerations suggest that the worst case performance
of all of the algorithms we have discussed is at least exponential in the size of the
problem, so that the methods are inherently constrained in the scope of applicability
to problems of practical interest. However, the worst case performance may not be
the correct measure if problems that arise in practice tend to be more well behaved.
An more important question is what is the average case complexity?
Unfortunately, there does not seem to be any systematic evaluation in the literature
of the algorithms we have discussed in terms of their performance on average games.
Part of the problem is that it is not clear what an average game is.
To get a rough indication of the size of game that is soluble, we generated a set of
random normal form games. For each size game reported, we generated 100 random
games, and compared some of the algorithms on these games. In all cases, we just
searched for one equilibrium.
For two person games, the Lemke-Howson algorithm clearly outperforms the other
algorithms tested (as currently implemented in GAMBIT). We also tested a simpli-
cial subdivision with restart, and a function minimization algorithm. Table 2.2 gives
information for the Lemke-Howson algorithm, on these random games.
Based on the data in Table 2.2, it appears that the speed of the Lemke-Howson
algorithm, at least in this range, is approximately polynomial in the size, k, of the
strategy space.
138 R.D. McKelvey and A. McLennan

Table 2.2
Performance of Lemke-Howson on 100 random
games* (h = number of strategies per player)

Number Total
k pivots time
2 2.74 0.0200
3 3.84 0.0156
4 4.56 0.0198
6 6.46 0.0271
8 7.66 0.0383
12 13.16 0.0842
16 19.23 0.1737
24 33.87 0.5772
32 78.17 2.153
48 210.4 12.22
64 426.9 43.68
96 819.2 182.1
* Time in seconds on a SUN 4/ELC.

We have not performed similar comparisons on n-person games. Undoubtedly they

will be considerably slower, since the Lemke-Howson algorithm is no longer avai-
The above timings reflect the time required to solve a normal form of a given size.
There is at least one consideration that would ease the computational burden for a spe-
cific application. Namely, there is usually some pre-processing that can substantially
reduce the size of the game before application of any solution algorithms. In the case
of extensive form games, if there are non-trivial subgames, then the subgames can
be recursively solved, and from a computational point of view, the important variable
is the maximum size of a subgame. We have seen also that solving directly on the
extensive form can lead to substantial reduction of the strategy space over the normal
form. However, even if one moves to the normal form, it is important to note that one
should use the reduced normal form, which is typically much smaller than the normal
form. Finally, on both normal and extensive form games, one can successively elim-
inate strongly dominated strategies without eliminating any Nash equilibria. If one
wants only to find a sample equilibrium, then one can successively eliminate weakly
dominated strategies, and any equilibrium to the reduced game will be an equilibrium
to the original game. In short, the effective si'ze of the strategy space for which one
has to apply any solution algorithm is frequently much smaller than the original size
of the strategy space.
Ch. 2: Computation of Equilibria in Finite Games 139


Azhar, S., McLennan, A. and Reif, J.H. (1992) 'Computation of equilibria in noncooperative games',
University of Minnesota, mimeo.
Banks, J.S. and Sobel, J. (1987) 'Equilibrium selection in signalling games', Econometrica, 55:647-661.
Ben-Or, M., Kozen, D. and Reif, J. (1986) 'The complexity of elementary algebra and geometry', Journal
of Computer and System. Sciences, 32:251-264.
Ben-Or, M., Feig, E., Kozen, D. and Tiwari, P. (1988) 'A fast parallel algorithm for determining all roots
of a polynomial with real roots', SIAM Journal of Computation, 17:1081-1092.
Blair, J.R.S. and Mutchler, D. (1993) 'Pure strategy equilibria in the presence of imperfect information:
NP-hard problems', University of Tennessee, Dept. Computer Science. Tech. Report CS-93-220.
Blume, L. and Zame, W.R. (1994) 'The algebraic geometry of perfect and sequential equilibrium', Econo-
metrica, 62:783-794.
Brown, G.W. and von Neumann, J. (1950) 'Solutions of games by differential equations', in: H.W. Kuhn
and A.W. Tucker, eds, Contributions to the theory of games. Annals of Mathematical Studies Number 24,
Princeton, NJ: Princeton Univ. Press, pp. 73-79.
Canny, J. (1988) 'Some algebraic and geometric computations in PSPACE', in: ACM symposium on theory
~f computing, pp. 460-467.
Canny, J. and Emeris, I. (1993) 'An efficient algorithm for the sparse mixed resultant', in: Proceedings,
AAEEC, pp. 89-104.
Cbo, I. (1987) 'A refinement of sequential equilibrium', Econometrica, 55:1367-1389.
Cbo, I. and Kreps, D.M. (1987) 'Signalling games and stable equilibrium', Quarterly Journal of Economics,
Collins, G.E. (1975) 'Quantifier elimination for real closed fields by cylindrical algebraic decomposition',
in: Second G/cor~ference on automata theory and.fi)rrnal languages, Lecture notes in computer science.
Vol. 33. Berlin: Springer, pp. 134-183.
Dickhaut, J. and Kaplan, T. (1991) 'A program for finding Nash equilibria', The Mathematica Journal,
Doup, T.M. (1988) 'Simplicial algorithms on the simplotope', Lecture notes in economics and mathematical
systems, #318. Berlin: Springer.
Doup, T.M. and Taiman, A.J.J. (1987a) 'A new simplicial variable dimension algoritlun to find equilibria
on the product space of unit simplices', Mathematical Programming, 37:319-355.
Doup, T.M. and Talman, A.J.J. (1987b) 'A continuous deformation algorithm on the product space of unit
simplices', Mathematics of Operations Research, 12:485-521.
Drexler, EJ. (1978) 'A homotopy method for the calculation of all zeros of polynomial ideals', in:
H. Wacker, ed., Continuation methods. New York: Academic Press.
Eaves, B.C. ( 1971) 'The linear complementarity problem', Management Science, 17:612-634.
Eaves, B.C. (1972) 'Homotopies for computation of fixed points', Mathematical Programming, 3:1-22.
E1-Gamal, M., McKelvey, R.D. and Palfrey, T. (1993) 'A Bayesian sequential experimental study of learning
in games', Journal of the American Statistical Society, 42:428M-35.
Garcia, C.B., Lemke, C.E and Luethi, H. (1973) 'Simplicial approximation of an equilibrium point for
non-cooperative n-person games', in: T.C. Hu and S.M. Robinson, eds, Mathematical programming.
New York: Academic Press, pp. 227-260.
Garcia, C.B. and Zangwill, W.I. (1979) 'Finding all solutions to polynomial systems and other systems of
equations', Mathematical Programming, 16:159-176.
Garcia, C.B. and Zangwill, W.I. (1980) 'Global calculation methods for finding all solutions to polynomial
systems of equations in n variables', in: Extremal methods and systems analysis. Heidelberg/New York:
Gul, E, Pearce, D. and Stacchetti, E. (1993). 'A bound on the proportion of pure strategy equilibria in
generic games', Mathematics' of Operations Research, 18:548-552.
140 R.D. McKelvey and A. McLennan

Harker, ET. and Pang, J. (1990) 'Finite-dimensional variational inequality and nonlinear complementarity
problems: A survey of theory, algorithms and applications', Mathematical Programming, 48:161-220.
Hironaka, H. (1975) 'Triangulation of algebraic sets', AMS Symposium in Pure Mathematics, 29:165-185.
Hirsch, M.D., Papadimitriou, C.H. and Vavasis, S.A. (1989) 'Exponential lower bounds for finding Brouwer
fixed points', Journal qf Complexity, 5:379-416.
Huber, B. and Sturmfels, B. (1995) 'A polyhedral method for solving sparse polynomial systems', Mathe-
matics of Computation, 64:1541.
Kalai, E. and Samet, D. (1984) 'Persistent equilibria', International Journal of Game Theory, 13:129-144.
Kohlberg, E. and Mertens, J.E (1986) 'On the strategic stability of equilibria', Econometrica, 54:1003-1037.
Kohlberg, E. and Reny, EJ. (1993) 'An interpretation of consistent assessments', mimeo.
Koller, D. and Megiddo, N. (1992) 'The complexity of two person zero-sum games in extensive form',
Games and Economic Behavior, 4:528-552.
Koller, D. and Megiddo, N. (1996) 'Finding mixed strategies with small supports in extensive games',
International Journal of Game Themy, forthcoming.
Koller, D., Megiddo, N. and yon Stengel, B. (1996) 'Efficient solutions of extensive two-person games',
Games and Economic Behavior, forthcoming.
Kostreva, M.M. (1989) 'Nonconvexity in noncooperative game theory', International Journal q]~ Game
Theory, 18:247-259.
Kostreva, M.M. zmd Kinard, L.A. (1991) 'A differential homotopy approach for solving polynomial optimi-
zation problems and noncooperative games', Computers and Mathematics with Applications, 21:135-143.
Kreps, D.M. and Wilson, R. (1982) 'Sequential equilibria', Econometrica, 50:863-894.
Kuhn, H.W. (1953) 'Extensive games and the problem of information', in: H.W. Kuhn and A.W. Tucker,
eds, Contribution to the theory of games, Vol. II. pp. 193-216.
Kuhn, H.W. and MacKinnon, J.G. (1975) 'Sandwich method for finding fixed points', Journal of Opti-
m&ation Theory and Applications, 17:189-204.
Lemke, C.E. (1965) 'Bimatrix equilibrium points and mathematical programming', Management Science,
Lemke, C.E. and Howson, J.T., Jr. (1964) 'Equilibrium points in bimatrix games', SIAM Journal of Applied
Mathematics, 12:413-423.
Mangasarian, O.L. (1964) 'Equilibrium points in bimatrix games', Journal of the Society for Industrial and
Applied Mathematics, 12:778-780.
Mathiesen, L. (1987) 'An algorithm based on a sequence of linear complementarity problems applied to a
Walrasian equilibrium model: An example', Mathematical Programming, 37:1-18.
McKelvey, R.D. (1996) 'A Liapunov function for Nash equilibria', California Institute of Technology,
Social Science Working Paper #953.
McKelvey, R.D. and McLennan, A. (1996) 'The maximal generic number of totally mixed Nash equilibria',
Department of Economics, University of Minnesota, Journal qfEconomic Theory, forthcoming.
McKelvey, R.D. and Palfrey, T.R. (1992) 'An experimental analysis of the centipede game', Econometrica,
McKelvey, R.D. and Palfrey, T.R. (1995a) 'Quantal response equilibria for normal form games', Games
and Economic Behavior, 10:6-38.
McKelvey, R.D. and Palfrey, T.R. (1995b) 'Quantal response equilibria for extensive form games', Cali-
fornia Institute of Technology, Social Science Working Paper #947.
McLennan, A. (1985) 'Justifiable beliefs in sequential equilibrium', Econometrica, 53:889 904.
Merrill, O.H. (1972) 'Applications and extensions of an algorithm that computes fixed points of certain
upper semi-continuous point to set mappings', PhD thesis, University of Michigan, Ann Arbor, MI.
Mertens, J. (1989) 'Stable equilibria, a reformulation', Mathematics of Operations Research, 14:575-625.
Myerson, R.B. (1978) 'Refinements of the Nash equilibrium concept', International Journal of Game
Theory, 7:73-80.
Murty, K.G. (1978) 'Computational complexity mad linear pivot methods', Mathematical Programming
Study, 7:61-73.
Ch. 2: Computation of Equilibria in Finite Games 141

Nash, J.E (1951) 'Noncooperative games', Annals of Mathematic'z, 54:289-295.

Rosen, J.B. (1965) 'Existence and uniqueness of equilibrium points for concave N-person games', Econo-
metrica, 33:520-534.
Rosenmtitler, J. (1971) 'On a generalization of the Lemke-Howson algorithm to noncooperative N-person
games', SIAM Journal of Applied Mathematics, 1:73-79.
Saigal, R. (1977) 'On the convergence rate of algorithms for solving equations that are based on methods
of complementary pivoting', Mathematics ~f Operations Research, 2:108-124.
Scarf, H. (1967) 'The approximation of fixed points of a continuous mapping', SIAM Journal of Applied
Mathematics, 15:1328-1343.
Scarf, H. (1973) The computation of economic equilibria. New Haven, CT: Yale Univ. Press.
Schanuel, S.H., Simon, L.K. and Zame, W.R. (1991) 'The algebraic geometry of games and the tracing
procedure', in: R. Selten, ed., Game equilibrium models, Vol. II: Methods, morals and markets. Berlin:
Selten, R. (1975) 'Reexamination of the perfectness concept for equilibrium points in extensive games',
International Journal qf Game Theory, 4:25-55.
Shapley, L. (1974) 'A note on the Lemke-Howson algorithm', Mathematical Programming Study, 1:175-
Shapley, L.S. (1973) 'On balanced games without sidepayments', in: T.C. Hu and S.M. Robinson, eds,
Mathematical programming. New York: Academic Press, pp. 261-290.
Shapley, L. (1981) 'On the accessibility of fixed points', in: O. Moeschlin and D. Pallaschke, eds, Game
theory and mathematical economics. Amsterdam: North-Holland, pp. 367-377.
Smale, S. (1976) 'A convergent process of price adjustment and global Newton methods', Journal qf
Mathematical Economics, 3:107-120.
Talman, A.J.J. and Yang, Z. (1994) 'A simplicial algorithm for computing proper equilibria of finite games,
Center Discussion Paper 9418, Tilburg University, Tilburg.
Tarski, A. (1951) A decision method for elementary algebra and geometry. Berkeley, CA: Univ. of Cali-
fornia Press.
Todd, M.J. (1982) 'On the computational complexity of piecewise-linear homotopy algorithms', Mathe-
matical Ptvgramming, 24:216-224.
Todd, M.J. (1976) The computation of j2red points and applications. Berlin: Springer.
van den Elzen, AM. and Talman, A.EE (1994) 'Finding a Nash equilibrium in noncooperative n-person
games by solving a sequence of linear stationary point problems', ZOR-methods and Models of Opera-
tions Research, 35:27-43.
van der Loan, G. and Talman, A.J.J. (1979) 'A restart algorithm for computing fixed points without an
extra dimension', Mathematical hvgramming, 17:74-84.
van der Laan, G. and Talman, A.J.J. (1982) 'On the computation of fixed points in the product space of unit
simplices and an application to noncooperative n-person games', Mathematics qf Operations Research,
van der Laan, G. and Tahnan, A.J.J. (1987) 'Simplicial approximation of solutions to the nonlinear com-
plementarity problem with lower and upper bounds', Mathematical Programming, 38:1-15.
van der Laan, G., Talman, A.J.J. and van der Heyden, L. (1987) 'Simplicial variable dimension algo-
rithms for solving the nonlinear complementarity problem on a product of unit simplices using a general
labelling', Mathematics (?f Operations Research, 12:377-397.
van der Waerden, B.L. (1949) Modern algebra. New York: Ungar.
yon Stengel, B. (1996) 'Efficient computation of behavior strategies', Games and Economic Behavior,
Wilson, R. (1971) 'Computing equilibria of n-person games', SIAM Journal of Applied Mathematics,
Wilson, R. (1972) °Computing equilibria of two-person games from the extensive form', Management
Science, 18:448-460.
142 R.D. McKelvey and A. McLennan

Wilson, R. (1992) 'Computing simply stable equilibria', Econometrica, 60:1039-1070.

Yamamoto, Y. (1993) 'A path-following procedure to find a proper equilibrium of finite games', Interna-
tional Journal of Game Theory, 22:49-59.
ZangwiU, W.I. and Garcia, C.B. (1981) Pathways to solutions, fixed points, and equilibria. Englewood
Cliffs, NJ: Prentice-Hall.
Chapter 3


Yale University

1. Introduction 144
2. Notation 145
3. Two stage least squares 146
4. 3SLS and FIML 146
5. Two stage least absolute deviations 150
6. The Gauss-Seidel technique 152
7. Stochastic simulation 155
7.1. Numerical procedures for drawing values 156
8. Optimal control 157
8.1. Stochastic simulation option 158
9. Asymptotic distribution accuracy 159
10. Solution and FIML estimation of RE models 160
10.1, Introduction 160
10.2. The solution method 161
10.3. Computational costs 164
10.4. FIML estimation 165
10.5. Stochastic simulation 166
10.6. Conclusion 168
References 169

Handbook of Computational Economics, Volume L Edited by H.M. Amman, D.A. Kendrick and J. Rust
(~) 1996 Elsevier Science B. V. All rights reserved.
144 R . C IS'air

1. Introduction

Advances in computer hardware in the last two decades have considerably lessened
the computational burden of working with large scale macroeconometric models. Most
methods for single equations are now computationally trivial, and many methods for
complete models are now routine. In particular, the availability of fast, inexpensive
computers has made stochastic simulation routine, and this has greatly expanded the
ways in which models can be tested and analyzed.
This chapter discusses computational methods for the estimation and analysis of
macroeconometric models. The focus is on methods that, while possibly computation-
ally routine, are at least not trivial; computationally trivial methods are not discussed.
Most of the methods discussed are methods for complete models. Nonlinear optimiza-
tion algorithms, such as the DFP algorithm, are not discussed. The reader is assumed
to be familiar with these algorithms.
Much of the material in this chapter is taken from Fair (1984) and (1994), where the
methods are both discussed and applied, and the reader is referred to these two sources
for the applications. To save space, no applications are presented in this chapter. It
will help in what follows, however, to have an idea of the size of the model to which
the methods have been applied. The latest version of this model, which will be called
the "US" model, is presented in Fair (1994). It consists of 30 stochastic equations, 101
identities, and a little over 100 exogenous variables. The basic estimation period is
1954:1-1993:2, for a total of 158 observations. There are 166 unrestricted coefficients
to estimate.
No computer times are reported in this chapter. Advances in hardware are so rapid
that any times reported now would be out of date even by the time this book is
published. Suffice it to say that none of the methods discussed in this chapter - with
the possible exception of FIML estimation of models with rational expectations - are
currently impractical in the sense of requiring days of personal computer time to run.
All the methods discussed in this chapter have been programmed into the Fair-Parke
(1993) program, which is available for distribution. One advantage of this program
is that once a model has been set up in the program, the methods can be carried out
with a few simple commands. The only real work is setting up the model.
What I have tried to show in this chapter is that for the most part computational
issues are no longer a problem in macroeconometric model building. Computational
requirements should not be a major constraint in the advancement of the field in the
Finally, it should be stressed that this chapter is not meant to be a survey of the
field. It is obviously beyond the scope of one paper to survey the literature on all the
numerical methods that are used in macroeconometric modeling - methods for the
various estimators, for deterministic simulation, for Monte Carlo/stochastic simulation,
for optimal control, for nonlinear models, for rational expectations models, etc. There
are not only an enormous number of methods, but many of them pertain to non
Ch. 3: Computational Methods for Macroeconometric Models 145

macroeconomic models as well. I have instead focused on those methods that I have
found useful in my own work and that are in the Fair-Parke program.

2. Notation

The general model considered in this chapter can be dynamic, nonlinear, and simul-
taneous and can have autoregressive errors of any order. The model is written as:

.fi(Yt, X t , o ~ i ) = u i t , i= 1,...,n, t--1,...,T, (1)

where Yt is an n-dimensional vector of endogenous variables, x t is a vector of prede-

termined variables (including lagged endogenous variables), c~i is a vector of unknown
coefficients, and uit is the error term for equation i for observation t. It will be assumed
that the first m equations are stochastic, with the remaining q£it (i = ~ t + l , . . . , •)
identically zero for all t.
The following notation is also used. ui denotes the T-dimensional vector
(U/t,..., % i Z ) ' . G ~ denotes the ki × Z matrix whose tth column is O f i ( y t , xt, c~i)/Oc~i,
where/ci is the dimension of c~i. c~ denotes the vector of all the unknown coefficients
in the model: c~' = ( c ~ ] , . . . , din)'. The dimension of c~ is k, where k = 2 i = 1 ki.
Finally, Zi denotes a T x K i matrix of predetermined variables that are to be used as
first stage regressors for the 2SLS technique.
The following additional notation is needed when discussing the F I M L and 3SLS
estimators. Jt denotes the n x n Jacobian whose i j element is ~ f i / ~ Y j t , ( i , j =
1 , . . . , n). u denotes the m - T - d i m e n s i o n a l vector ( u H , . • •, u l T , . . . , u r n 1 , . . . , u ~ r ) ~.
G ~ denotes the h x m - T matrix

0 ...

i -. G"
Finally, u t denotes the m-dimensional vector ( u r n . . . , u m t ) , and Z denotes the m x m
covariance matrix of ut.
Each equation in (1) is assumed to have been transformed to eliminate any au-
toregressive properties of its error term. If the error term in the untranslbrmed
version, say wit in equation i, follows a rth order autoregressive process, w u --
Pli~coit-I + " " " + priWit r + uit, where uit is iid, then equation i is assumed to have
been transformed into one with u u on the right hand side. The autoregressive coeffi--
cients P l i , . . . , P,..i are incorporated into the c~i coefficient vector, and the additional
lagged values that are involved in the transformation are incorporated into the z t
vector. This transformation makes the equation nonlinear in coefficients if it were no~
146 R.C. b~ir

otherwise, but this adds no further complications to the model because it is already
allowed to be nonlinear. It does result in the "loss" of the first r observations, but this
has no effect on the asymptotic properties of the estimators, lzit in (1) can thus be
assumed to be iid even though the original error term may follow an autoregressive

3. Two stage least squares

Probably the most widely used estimation technique for single equations that produces
consistent estimates is two stage least squares (2SLS). Although the computation
of 2SLS estimates is trivial, it will be useful for reference purposes to present the
estimator. The 2SLS estimate of ai (denoted &i) is obtained by minimizing

, ( Z ; Z i ) - l , Ziui = uiDiu~
Si = u~Z~ t (3)

with respect to ai. Zi can differ from equation to equation. An estimate of the co-
variance matrix of &~ (denoted V2ii) is

= & (4)

where Gi is Gi evaluated at &i, 6i~ = T - l ~ L 1 u~t,^2and uit

^ = fi(Yt, xt, &i).
The 2SLS estimate of the k x k covariance matrix of all the coefficient estimates
in the model (denoted V2) is



~r2ij = O'ij (G~DiGi)-I

~, ~, A,i D i D j G^,j ) (GjDjGj)A' ~, -1
(G (6)

and crij
^ =
T -1 T Uit~tJ
^ ^ t"

4. 3SLS and F I M L

3SLS estimates of a are obtained by minimizing

S : u'[b-' ® Z(Z'Z)-'Z']'. - "~'Du (7)

Ch. 3." Computational Methods fi)r Macroeconometric Models 147

with respect to a, where ~ is a consistent estimate of Z and Z is a T x K matrix

of predetermined variables. An estimate of the covariance matrix of the coefficient
estimates (denoted ~ ) is

= (&DS)-' (8)

where G is G evaluated at the 3SLS estimate of a. Z is usually estimated from the

2SLS estimated residuals.
Under the assumption that ut is independently and identically distributed as multi-
variate normal N(0, Z), FIML estimates of a are obtained by maximizing

T log Izl + ~ t o g
L = --~- IJ~l (9)
with respect to a. An estimate of the covariance matrix of the FIML estimates (denoted
94) is

_( ~)2L .~-1

where the derivatives are evaluated at the optimum.

The FIML computational problem can be separated into two main parts; the first
is to find a fast way of computing L in (9) for a given value of a, and the second is
to find an algorithm capable of maximizing L.
The main cost of computing L is computing the Jacobian term. Two savings can be
made here. One is to exploit the sparseness of the Jacobian. The number of nonzero
elements in Jt is usually much less than n 2. For the US model, for example, n is 131
(so n 2 = 17,161), whereas the number of nonzero elements is only 556. Considerable
time can be saved by using sparse matrix routines to calculate the determinant of Jr.
The second saving is based on an approximation. Consider approximating
~ t =T l log lJtl by simply the average of the first and last terms in the summation
multiplied by T: (T/2)(log IJll + log IJTI). Let So denote the true summation, and
let $1 denote the approximation. It turns out in many applications that So - $1 does
not change very much as the coefficients change from their starting values (say 2SLS
estimates) to the values that maximize the likelihood function. In other words, So - $1
is nearly a constant. This means that $1 can be used instead of So in computing L,
and thus considerable computer time is saved since the determinant of the Jacobian
only needs to be computed twice rather than T times for each evaluation of L. As
noted above, T is 158 for the US model. Using 171 in place of So means, of course,
that the coefficient values that maximize the likelihood function are not the exact
FIML estimates. If one is concerned about the accuracy of the approximation, one
148 R.C. Fair

can switch from $1 to So after finding the maximum using $1. If the approximation
is good, one should see little further change in the coefficients; otherwise additional
iterations using the algorithm will be needed to find the true maximum.
The choice of algorithm turns out to be crucial in maximizing L for large nonlinear
models. My experience is that general purpose algorithms like DFP do not work for
large problems, and in fact the only algorithm that does seem to work is the Parke
(1982) algorithm, which is a special purpose algorithm designed for FIML and 3SLS
estimation. The algorithm exploits two key features of models. The first is that the
mean of a particular equation's estimated residuals is approximately zero for FIML
and 3SLS estimates. For OLS this must be u'ue, and empirically it turns out that it is
approximately true for other estimators. The second feature is that the correlation of
coefficient estimates within an equation is usually much greater than the correlation
of coefficients across equations.
The problem with algorithms like DFP that require first derivatives is that the
computed gradients do not appear to be good guides regarding the directions to move.
Gradients are computed by perturbing one coefficient at a time. When a coefficient
is changed without the constant term in the equation also being changed to preserve
the mean of the residuals, a large change in L may result (and thus a large computed
derivative), and this can be misleading. The Parke algorithm avoids this problem
by spending most of its time perturbing two coefficients at once, namely a given
coefficient and the constant term in the 'equation in which the coefficient appears.
The constant term is perturbed to keep the mean of the residuals unchanged. (The
algorithm does not, of course, do this all the time, since the means of the residuals
must also be estimated.) To take advantage of the generally larger correlation within
an equation than between equations, the Parke algorithm spends more time searching
within equations than between them. General purpose algorithms do not do this, since
they have no knowledge of the structure of the problem.
If only a few coefficients are changed before a new value of L is computed,
considerable savings can be made by taking advantage of this fact. If, for example,
the coefficients that are changed are not in the Jacobian, the Jacobian term does
not have to be recomputed. If only a few equations are affected by the change in
coefficients, only a few rows and columns in the ~ matrix have to be recornputed.
Since the Parke algorithm spends much of its time perturbing two coefficients at a
time, it is particularly suited for these savings.
The estimated covariance matrix for the FIML coefficient estimates, V4 in (10), is
difficult to compute. It is not part of the output of the Parke algorithm, and thus extra
work is involved in computing it once the algorithm has found the optimum. My
experience is that simply trying to compute the second derivatives of L numerically
does not result in a positive definite matrix. Although the true second derivative
matrices at the optimum are undoubtedly positive definite, they seem to be nearly
singular. If this is true, small errors in the numerical approximations to the second
derivatives may be sufficient to make the matrix not positive definite.
Ch. 3: Computational Methods for Macroeconometric Models 149

Fortunately, there is an approach to computing 94 that does work, which is derived

from Parke (1982). Parke's results suggest that the inadequate numerical approxima-
tions may be due to the fact that the means of the right hand side variables in the
estimated equations are not zero. if so, the problem can be solved by subtracting the
means from the right hand side variables before taking numerical derivatives. Let/3
denote the coefficient vector that pertains to the model after the means have been
subtracted, and let c~ denote the original coefficient vector. The relationship between
c~ and/3 is

c~ = M . / 3 (11)

where M is a k x k matrix that is composed of the identity matrix plus additional

nonzero elements that represent the means adjustments. Unless there are constraints
across equations, M is block diagonal. Assume, for example, that the first equation
of the model is

Ylt = /31 q- fl2(Y2t -- 'rrt2) q-fl3(Y3t --'rn,3) q- Ult, t = 1,... ,T, (12)

where m2 and m3 are the sample means of Y2t and Y3t respectively. This equation
can be written

Ylt z /31 -- fl2Tr~2 -- /337T~3 @/32Y2t @/33Y3t @ Ult

=oq +o~2Y2t-l-o~3Y3t q- Ult, t= 1,...,T. (13)
In this case the part of (11) that corresponds to the first equation is

c~2 = 1 0 /32 • (14)

c~3 0 1 f13

Parke found that the covariance matrix of fl could easily be computed numerically.
Let 94(/3) denote this matrix:

(O2L(M ._/3)) - ' (15)

W4(fl) = - ~ 0/30/3t

Given ~ ( ~ ) , the covariance matrix of c~ is simply

< = -/'~" 94(/3)" M r . (16)

V4 can thus be obtained by first computing the covariance matrix of the coefficients
of the transformed model (that it, the model in which the right hand side variables
150 R.C. Fair

have zero means) and then using (16) to get the covariance matrix of the original
Using the Parke algorithm and the various savings discussed above, I have found
it feasible to obtain FIML estimates of the US model. In fact, the main work in using
the FIML estimator is not the computational burden but making sure that no exTors
have been made in taking the derivatives for the Jacobian.
Consider now the 3SLS estimation problem, which is to minimize (7). The only
cost saving to note for this problem is that the D matrix, which is m . T x m . T, need
not be calculated from scratch each time (7) is computed if only a few coefficients
are changed. In other words, pieces of D can be saved and used many times before
needing to be recomputed. This saves considerable time because D is large. For
example, for T = 158 and m = 30, D is 4740 x 4740. I have also found the 3SLS
estimates easy to compute using the Parke algorithm.

5. Two stage least absolute deviations

A single equation estimator for which there are some computational issues is two
stage least absolute deviations (2SLAD). This estimator is as follows. It is assumed
for this estimator that the model in (1) can be written:

Yit =hi(yt,xt, o~i)+uit, i= 1,...,n, t= 1,...,T, (17)

where in the ith equation Yit appears only on the left-hand side. Let ~i = Diyi and
Tzi = Dihi, where, as above, Di = Zi(Z~Zi)
~ - l Zi, where Zi is a matrix of first stage
regressors. There are two ways of looking at the 2SLAD estimator. One is that it


and the other is that it minimizes


Amemiya (1982) has proposed minimizing

[qyit + (1 -q)9~t - h~t[ (20)
Ch. 3." Computational Methods ,fbr Macroeconometric Models 151

where q is chosen ahead of time by the investigator. The estimator that is based on
minimizing (20) will be called the 2SLAD estimator.
The 2SLAD computational problem is to minimize


with respectto c~i, where vie = W i t + (1 - q)git - hit. This computational problem is
not particularly easy, especially when vit is a nonlinear function of c~i. For example, I
have had no success in minimizing (21) using Powell's (1964) no derivative algorithm.
Because standard algorithms do not work, other approaches must be tried. I have
found the following approach to work well, which uses the fact that

T 2 ~ 2
= V" v;~. _ vi_at
t-1 = t=l

where wit = [vit]. For a given set of values of wit (t = 1 , . . . , T ) , minimizing (22)
is simply a weighted least squares problem. If vit is a linear function of ai, closed
form expressions exist for &i; otherwise a nonlinear optimization algorithm can be
used. This suggests the following iterative procedure:
1. Pick an initial set of values of wit. These can be the absolute values of the 2SLS
estimated residuals.
2. Given these values, minimize (22).
3. Given the estimate of ai from step 2, compute new values of vit and thus new
values of wit.
4. With the new weights, go back to step 2 and minimize (22) again. Keep repeating
steps 2 and 3 until successive estimates of c~i are within some prescribed tolerance
level. If on any step some value of wit is smaller than some small preassigned
number (say e), the value of wit should be set equal to e.
In the case in which the equation to be estimated is linear in coefficients, the closed
form expression for &~ for a given set of values of wit is

~'~-*t -';~*~--1
e~i = ( & &) X^*' ~*
i Yi (23)

X'* is the same as )(i, where Xi = D i X i , except that each element in row t of Xi is

divided by v/~7. The vector 9i equals qYi + (1 - q)~)i except that row t is divided
by ~ . (yi equals D i y i . )
The accuracy of the estimates using this approach is a function of e: the smaller
is e, the greater is the accuracy. If vit is a linear function of c~i, the estimates will
152 R.C Fair

never be exact because the true estimates correspond to ki values of wit being exactly
equal to zero, where ki is the number of elements of c~i. One might think that this
would be a serious problem in practice, but I have not found it to be. I typically use a
value of e of 0.0000001 and a percentage stopping criterion for successive coefficient
estimates of 0.001, and with these numbers it is seldom the case that any weight is
less than e. (I typically use a value of q of 0.5.) The method also works well when the
equation is nonlinear in coefficients, such as when the error term is assumed to follow
an autoregressive process and the autoregressive coefficients are estimated along with
the structural coefficients.

6. The Gauss-Seidel technique

Turn now from estimation to solution. Most macroeconometric models are solved us-
ing the Gauss-Seidel technique. It is a remarkably simple technique and in most cases
works remarkably well. Very early on it became the method of choice for solution
purposes. Because the technique is so widely used, it is important to understand what
it does. The technique is easiest to describe by means of an example.
Assume that the model in (1) consists of three equations, and let xit denote the
vector of predetermined variables in equation i. The model is as follows:

f l ( y l t , Y2t, Y3t, X l t , oq) = Ult, (24)

f 2 ( Y l t , Y2t, Y3t, 292t, OZ2) ~- U2t, (25)

f 3 ( Y l t , Y2t , Y3t , X3t , 0!3) = U3t (26)

where Ylt, Y2t, and Y3t are scalars. The technique requires that the equations be
rewritten with each endogenous variable on the left hand side of one equation. This is
usually quite easy for macroeconometric models, since most equations have an obvious
left hand side variable. If, say, the left hand side variable for (25) is log(Y2t/y3t),
then y2t can be written on the left hand side by taking exponents and multiplying
the resulting expression by Y3t. The technique does not require that each endogenous
variable be isolated on the left hand side; the left hand side variable can also appear on
the right hand side. It is almost always possible in macroeconometric work, however,
to isolate the variable, and this will be assumed in the following example.
The model (24)-(26) will be written:

Y l t ~- gl (Y2t, Y3t, X l t , Ctl, Ult), (24)'

Y2t = g2(Ylt, Y3t, X2t, 0~2, U2t), (25)'

Ch. 3: Computational Methods for Macroeconometric Models 153

Y3t -~ g3(Ylt, Y2t, X3t, OZ3,U3t). (26)'

In order to solve the model, values of the coefficients and the error terms are needed.
Given these values and given values of the predetermined variables, the solution
proceeds as follows. Initial values of the endogenous variables are guessed. These
are usually either the actual values or predicted values from the previous period.
Given these values, (24)~-(26) ~ can be solved for a new set of values. This requires
one "pass" through the model: each equation is solved once. One pass through the
model is also called an "iteration". Given this new set of values, the model can
be solved again to get another set, and so on. Convergence is reached if for each
endogenous variable the values on successive iterations are within some prescribed
tolerance level.
There are two main options that can be used when passing through the model.
One is to use the values from the previous iteration for all the computations for
the current iteration, and the other is to use, whenever possible, the values from the
current iteration in solving the remaining equations. Following the second option in
the example just given would mean using the current solution for Ylt in the solution
of y2t and Y3t and using the current solutions for Ylt and Y2t in the solution of Y3t.
In most cases convergence is somewhat faster using the second option. If the second
option is used, the order of the equations obviously matters in terms of the likely speed
of convergence. The first option is sometimes called the Jacobi technique rather than
the Gauss-Seidel technique, but for present purposes both options will be referred to
as the Gauss-Seidel technique.
There is no guarantee that the Gauss-Seidel technique will converge. It is easy to
construct examples in which it does not, and I have seen many examples in practice
where it did not. The advantage of the technique, however, is that it can usually be
made to converge (assuming an actual solution exists) with sufficient damping. By
"damping" is meant the following. Let ult denote the solution value of Ylt for
iteration n - 1 (or the initial value if n is 1), and let Y l t denote the value computed
by solving (24) ~ on iteration n. Instead of using Y l t as the solution value for iteration
n, one can instead adjust Ylt
~(n-l) only partway toward Y~(n)
lt "

^(,~) ^O~-l) , / =(~) -- 91~ -1) ) , )k 1.

Ylt - Ylt + "~{ Y i , 0 < <~ (27)

If A is 1, there is no damping, but otherwise there is. Damping can be done for any
or all of the endogenous variables, and different values of ~ can be used for different
My experience is that one can usually make ~ small enough to achieve convergence.
The cost of damping is, of course, slow convergence. In some cases I have seen values
as low as 0.05 needed. In the vast majority of the problems that I have solved, however,
154 R.C Fair

no damping at all was needed. Two other ways in which one can deal with problems
of convergence are to try different starting values and to reorder the equations. This
involves, however, more work than merely running the problem with lower values of
A, and I have generally not found it necessary to experiment with starting values and
the order of the equations.
Note that nothing is changed in the foregoing discussion if, say, Y l t is also on the
right hand side of (24)'. One still passes through the model in the same way. This
generally means, however, that it takes longer to converge, and more damping may
be required than if y l t is only on the left hand side; thus it is better to isolate variables
on the left hand side whenever possible.
The question of what to use for a stopping rule is not as easy as it might sound.
The stopping rule can either be in absolute or percentage terms. In absolute terms it

19}~r') m(n-1)l
-- Yit I < ei (28)

and in percentage terms it is

~(,~) ^(n-l)
Yit -- Yit
- ~'-L--~ ~ ~i (29)

where ei is the tolerance criterion for variable i. (If damping is used, ~)i(n) in (28) and
(29) should be replaced with Y i t .)
The problem comes in choosing the values for e~. It is inconvenient to have to
choose different values of the tolerance criterion for different variables, and one would
like to use just one value throughout. This is not, however, a sensible procedure if the
units of the variables differ and if the absolute criterion is used. Setting the tolerance
criterion small enough for the required accuracy of the variable with the smallest units
is likely to lead to an excess number of iterations, since a large number of iterations
are likely to be needed to satisfy the criterion for the variables with the largest units.
Setting the criterion greater than this value, on the other hand, runs the risk of not
achieving the desired accuracy for some variables. This problem is lessened if the
percentage criterion is used, but in this case one must be concerned with variables
that can be zero or close to zero.
My experience is that the number of iterations needed for convergence is quite
sensitive to the stopping rule. It does not seem to be the case, for example, that
once one has converged for most variables, one or two additional iterations increase
the accuracy for the remaining variables very much. There is no real answer to this
problem. One must do some initial experimentation to decide how many different
values of ei are needed and whether to use the absolute or percentage criterion for a
given variable.
Ch. 3." Computational Methods for Macroeconometric Models 155

7. Stochastic simulation

As noted in the Introduction, computer hardware advances have now made stochastic
simulation routine, and this has greatly expanded the ways in which models can
be tested and analyzed. Many applications of stochastic simulation are contained
in Fair (1994), but these will not be discussed here. The following discussion will
focus on computational aspects only. The notation in Section 2 will continue to be
Stochastic simulation requires that an assumption be made about the distribution
of ut. It is usually assumed that ut is independently and identically distributed as
multivariate normal N(0, Z), although other assumptions can clearly be used. Al-
ternative assumptions simply change the way the error terms are drawn. Stochastic
simulation also requires that consistent estimates of c~ be available for all i. Given
these estimates, denoted &i, consistent estimates of uit, denoted ~2it, can be computed

as fi(Yt, xt, &i). The covariance matrix S can then be estimated as (1/T)UU', where
is the m x T matrix of the values of ~2u.
Let u~' denote a particular draw of the m error terms for period t from the N(0, f2)
distribution. Given u~' and given &i for all i, one can solve the model for period t
(using, say, the Gauss-Seidel technique). This is merely a deterministic simulation for
the given values of the error terms and coefficients. Call this simulation a "repetition".
Another repetition can be made by drawing a new set of values of u~' and solving
again. This can be done as many times as desired. From each repetition one obtains a
prediction of each endogenous variable. Let yij denote the value on the jth repetition
of variable i for period t. For J repetitions, the stochastic simulation estimate of the
expected value of variable i for period t, denoted/2u, is



2j = (y~
c% j - #.)2 (31)

The stochastic simulation estimate of the variance of variable i for period t, denoted
-2 is then

-2 1 J
~it = -j ~ cri~J. (32)
156 R.C. F a i r

Given the data from the repetitions, it is also possible to compute the variances of
the stochastic simulation estimates and then to examine the precision of the estimates.
The variance of/2~t is simply ~r~t/J. The variance of 6-2, denoted var(6-2t), is

2 J
(Tit. . (33)

In many applications, one is interested in predicted values more than one period
ahead, i.e., in predicted values from dynamic simulations. The above discussion can
be easily modified to incorporate this case. One simply draws values for ut for each
period of the simulation. Each repetition is one dynamic simulation over the period
of interest. For, say, an eight quarter period, each repetition yields eight predicted
values, one per quarter, for each endogenous variable.
It is also possible to draw coefficients for the repetitions z Let & denote, say, the
2SLS estimate of all the coefficients in the model, and let V denote the estimate of
the k x k covariance matrix of &. Given V and given the normality assumption, an
estimate of the distribution of the coefficient estimates is N(&, V). When coefficients
are drawn, each repetition consists of a draw of the coefficient vector from N(&, V)
and draws of the error terms as above.

7.1. Numerical procedures f o r drawing values

A standard way of drawing values of a* from the N(&, V) distribution is to 1) factor

numerically l~ into P P ' , 2) draw k values of a standard normal random variable
with mean 0 and variance t, and 3) compute a* as & + Pe, where e is the k x 1
vector of the standard normal draws. Since E e # = I, then E ( a * - &)(a* - &)~
- E P e # P ~ = <V, which is as desired for the distribution of a*. A similar procedure
can be used to draw values of u~' from the N(0, ~ ) distribution: X7 is factored into
P P ' , and u~ is computed as Pe, where e is a ra x 1 vector of standard normal draws.
An alternative procedure for drawing values of the error terms, derived from Mc-
Carthy (1972), has also been used in practice. For this procedure one begins with the
m x T matrix of estimated error terms, U. T standard normal random variables are
then drawn, and u~ is computed as T-1/2Ue, where e is a T x 1 vector of the standard
normal draws. It is easy to show that the covariance matrix of u~ is ~ , where, as
earlier, S is ( 1 / T ) U U ' .
An alternative procedure is also available for drawing values of the coefficients.
Given the estimation period (say, 1 through T) and given Z, one can draw T values
of u~ (t = l , . . . , T). One can then add these errors to the model and solve the model
over the estimation period (static simulation, using the original values of the coefficient
estimates). The predicted values of the endogenous variables from this solution can
Ch. 3: Computational Methods fi~r Macroeconometric Models 157

be taken to be a new data base, from which a new set of coefficients can be estimated.
This set can then be taken to be one draw of the coefficients. This procedure is more
expensive than drawing from the N(&, V) distribution, since reestimation is required
for each draw, but it has the advantage of not being based on a fixed estimate of the
distribution of the coefficient estimates. It is, of course, based on a fixed value of
and a fixed set of original coefficient estimates.

8. Optimal control

There is a large literature on the use of optimal control techniques in macroecono-

m e t r i c s ] and it is beyond the scope of this chapter to review this literature. Instead,
I will simply discuss one technique that I have found to be very useful in solving
optimal control problems for large models. This technique is presented and applied
in Fair (1974).
The first step in setting up an optimal control problem is to postulate an objective
function. Assume that the period of interest is t = 1 , . . . , T. A general specification
of the objective function is

W = h(yl,..., YT, Xl,..., XT) (34)

where W , a scalar, is the value of the objective function corresponding to values of

the endogenous and exogenous variables for t = 1 , . . . , T. In most applications the
objective function is assumed to be additive across time, which means that (34) can
be written


where ht(fft, x t ) is the value of the objective function for period t. The model can be
taken to be the model in (1).
Let zt be a k-dimensional vector of control variables, where zt is a subset of xt,
and let z be the k • T-dimensional vector of all the control values: z = ( z l , . . . , zT).
Consider first the deterministic case where the error terms in (1) are all set to zero.
For each value of z one can compute a value of W by first solving the model (1) for
Yl, • • •, YT and then using these values along with the values for : e l , . . . , XT to compute
W in (34). Stated this way, the optimal control problem is choosing variables (the
elements of z) to maximize an unconstrained nonlinear function. By substitution, the
constrained maximization problem of maximizing W in (34) subject to the model is

1See Chow (1981) for a discussion of many of the techniques in this area.
158 R.C. Fair

transformed into the problem of maximizing an unconstrained function of the control


W =- ~(z) (36)

where ~ stands for the mapping z --+ Y l , . . . , y T , X l , . . . , X T -+ W . For nonlinear

models it is generally not possible to express y t explicitly in terms of x t , which
means that it is generally not possible to write W in (36) explicitly as a function of
x l , • • . , X T . Nevertheless, given values for Xl,. •., X T , values of W can be obtained
numerically for different values of z.
Given this setup, the problem can be turned over to a nonlinear maximization
algorithm like DFE For each iteration, the derivatives of ~ with respect to the elements
of z, which are needed by the algorithm, can be computed numerically. Each iteration
will thus require k T function evaluations for the derivatives plus a few more for
the line search. Each function evaluation requires one solution (dynamic simulation)
of the model for T periods plus the computation of W in (34) after y l , • • •, Y T are
There is one important cost saving feature regarding the method that should be
noted. Assume that there are two control variables and that the length of the period
is 30. The number of unknowns is thus 60, and therefore 60 function evaluations will
have to be done per iteration to get the numerical derivatives. In perturbing the control
values to get the derivatives, one should start from the end of the control period and
work backward. When the control values for period 30 are perturbed, the solution
of the model for periods 1 through 29 remains unchanged from the base solution, so
these calculations can be skipped. The model only needs to be resolved for period 30.
Similarly, when the control values for period 29 are perturbed, the model only needs
to be resolved for periods 29 and 30, and so on. This cuts the cost of computing the
derivatives roughly in half.
My experience is that algorithms like DFP are quite good at solving optimal control
problems set up in the above way. See, for example, Fair (1974) for the use of the
DFP algorithm to solve quite large problems.

8.1. Stochastic simulation option

Consider now the stochastic case, where the error terms in (1) are not zero. It is
possible to convert this case into the deterministic case by simply setting the error
terms to their expected values (usually zero). The problem can then be solved as
above, in the nonlinear case this does not lead to the exact answer because the values
of W that are computed numerically in the process of solving the problem are not
the expected values. In order to compute the expected values correctly, stochastic
simulation has to be done. In this case each function evaluation (i,e., each evaluation
of he expected value of W for a given value of z) consists of the following.
Ch. 3: Computational Methods"for Macroeconometric Models" 159

1. A set of values of the error terms in (1) is drawn from an estimated distribution.
2. Given the values of the error terms, the model is solved for y l , . . . , ~ and the
value of W corresponding to this solution is computed from (34). Let W j denote
this value.
3. Steps 1 and 2 are repeated J times, where J is the number of repetitions.
~ v

4. Given the J values of W j (j = 1 , . . . , J), the expected value of W is the mean

of these values:

m 1
w: (37)

This procedure increases the cost of solving control problems by roughly a factor of
J, and it is probably not worth the cost for most applications. The bias in predicting
the endogenous variables that results from using deterministic rather than stochastic
simulation is usually small, and thus the bias in computing the expected value of W
using deterministic simulation is likely to be small.

9. A s y m p t o t i c d i s t r i b u t i o n a c c u r a c y

Computer advances have also increased the ability to compute "exact" distributions
of the estimators that are used for macroeconometric models. These distributions can
then be compared to the asymptotic approximations of the distributions that are gen-
erally used for hypothesis testing to see how accurate the asymptotic approximations
are. If some variables are not stationary, the asymptotic approximations may not be
very good. In fact, much of the recent literature in time series econometrics has
been concerned with the consequences of non stationary variables. As will be seen,
computing the exact distributions requires stochastic simulation and reestimation.
The procedure for examining asymptotic distribution accuracy is as follows. Take
an estimator, say 2SLS, 3SLS, or FIML, and estimate the model. Take these coefficient
estimates, denoted &, as the base values, and compute Z using these estimates. From
the N(0, ~ ) distribution (assuming the normality assumption is used), draw a vector
of the ra error terms for each of the T observations. Given these error terms and &,
solve the model for the entire period 1 through T. This is a dynamic simulation of the
model over the entire estimation period. The lagged endogenous variable values in (1)
are updated in the solution process. Also, the matrices of first stage regressors, Z.it,,
are updated to incorporate the new lagged endogenous variable values if the matrices
are used in the estimation, as for 2SLS. The predicted values from this solution form
a new data set. Given this data set, estimate the model by the technique in question,
and record the set of estimates. This is one repetition. Repeat the draws, solution, and
estimation for many repetitions, and record each set of estimates. (Remember that the
160 R.C. Fair

draws of the errors are always from the N(0, ~ ) distribution and that the coefficient
vector used in the solution is always &.)
If J repetitions are done, one has J values of each coefficient estimate, which are
likely to be a good approximation of the exact distribution. For ease of exposition,
this distribution of the J values will be called the "exact distribution", although it
is only an approximation because 27 is estimated rather than known. The asymptotic
distribution can then be compared to this exact distribution to see how close the two
distributions are.
Once an exact distribution has been computed, there are a number of ways to exam-
ine the closeness of the asymptotic distribution to it. For the application in Fair (1994)
the median of the exact distribution was first compared to the coefficient estimate to
examine the bias of the estimate. Given the median from the exact distribution and
given the estimated standard error of the coefficient estimate from the asymptotic dis-
tribution, one can compute the value above which, say, 20 percent of the coefficient
estimates should lie if the asymptotic distribution is correct. For 20 percent, this value
is the median plus 0.84 times the estimated asymptotic standard error. One can then
compute the actual percent of the coefficient estimates from the exact distribution that
lie above this value and compare this percent to 20 percent. For the work in Fair
(1994), this comparison was made for 20, 10, and 5 percent values and for both left
and right tails.
The results that I have obtained so far show that the exact and asymptotic distri-
butions are generally quite similar regarding their tail properties. If this conclusion
holds up upon further work, it has important consequences. It means that the unit root
problems that have received so much attention lately may not be of much concern
to macro model builders. While the existence of unit roots can in theory cause the
asymptotic approximations that are relied on in macroeconometrics to be way off, in
practice they seem fairly accurate.

10. Solution and FIML estimation of RE models

10.1. Introduction

The rest of this chapter is concerned with the estimation and solution of models with
rational expectations. As will be discussed, these models have severe computational
The single equation estimation of equations with rational expectations can be carried
out using Hansen's (1982) method of moments estimator, and there are no serious
computational requirements here. It is also possible, however, to use FIML to estimate
models with rational expectations, and here there are serious computational issues.
Methods for the solution and FIML estimation of these models are presented in Fair
and Taylor (1983). The basic solution method, called the "extended path" (EP) method,
Ch. 3: Computational Methods for Macroeconometric M(Mels 161

has come to be widely used for deterministic simulations of rational expectations

models, 2 but probably because of the expense, the full information estimation method
has not been tried by others. This earlier work discussed a "less expensive" method for
obtaining full information estimates, but the preliminary results using the method were
mixed. Since this earlier work, however, more experimenting with the less expensive
method has been done~ and it seems much more promising than was originally thought.
This work is reported in Fair and Taylor (1990), and the following discussion is taken
from this paper.
The first part of this section discusses the new results using the less expensive
method that have been obtained and argues that full information estimation now seems
feasible for rational expectations models. In the process of doing this some errors in
the earlier work regarding the treatment of models with rational expectations and
autoregressive errors are corrected. The second part discusses methods for stochastic
simulation of rational expectations models, something that was only briefly touched
on in the earlier work.

10.2. The solution method

The notation for the model used here differs somewhat from the notation used in
Eq. (1). The lagged values of the endogenous variables are written out explicitly, and
a't is now a vector of only exogenous variables. The model is written as

fi(Yt, Yt-1,..., Yt-p, Et-lYt, Et-lYt+l,..., E t - l Y t + h , x t , cti) = uit, (38)

u,it = p i u i t - i + eit, i = 1,...,n, (39)

where Yt is an n-dimensional vector of endogenous variables, xt is a vector of ex-

ogenous variables, E t - 1 is the conditional expectations operator based on the model
and on information through period t - 1, c~i is a vector of parameters, pi is the serial
correlation coefficient for the error term uit, and eit is an error term that may be
correlated across equations but not across time. The function fi may be nonlinear in
variables, parameters, and expectations. The following is a brief review of the solution
method for this model. In what follows i is always meant to run from 1 through n.

2For exoanple,the extended path method has been programmed as part of the TROLL computer package
and is routinelyused to solve large scale rationalexpectations models at the IMF, the Federal Reserve, the
Canadian Financial Ministry,and other governmentagencies. It has also been used for simulationstudies
such as DeLong and Summers (1986) and King (1988). Other solution methods for rational expectations
models are summarized in Taylor and Uhlig (1990). These other methods do not yet appear practical
for medium size models and up. See also Fisher (1992) for a discussion of the EP method and various
extensions and alternatives and for an extensive bibliographyof work in this field.
162 R.C Fair

Case 1. Pi = O. Consider solving the model for period s. It is assumed that estimates
of c~i are available, that current and expected future values of the exogenous variables
are available, and that the current and future values of the error terms have been set to
their expected values (which will always be taken to be zero here). If the expectations
Es_lYs, Es_lYs+l,... ,Es_lYs+ h were known, (38) could be solved in the usual
ways (usually by the G a u s s - S e i d e l technique). The model would be simultaneous,
but future predicted values would not affect current predicted values. The EP method
iterates over solution paths. Values of the expectations through period s + h + k + h
are first guessed, where k is a fairly large number relative to h. 3 Given these guesses,
the model can be solved for periods s through s + h + h in the usual ways. This
solution provides new values for the expectations through period s + h + k - the new
expectations values are the solution values. Given these new values, the model can be
solved again for periods s through s + h + k, which provides new expectations values,
and so on. This process stops (if it does) when the solution values on one iteration are
within a prescribed tolerance criterion of the solution values on the previous iteration
for all periods s through s + h + k.
So far the guessed values of the expectations for periods s + h + k + 1 through
s + h + k + h (the h periods beyond the last period solved) have not been changed. If
the solution values for periods s through s + h depend in a nontrivial way on these
guesses, then overall convergence has not been achieved. To check for this, the entire
process above is repeated for k one larger. If increasing k by one has a trivial effect
(based on a tolerance criterion) on the solution values for s through s + h, then overall
convergence has been achieved; otherwise k must continue to be increased until the
criterion is met. In practice what is usually done is to experiment to find the value
of k that is large enough to make it likely that further increases are unnecessary for
any experiment that might be run and then do no further checking using larger values
of k.
The expected future values of the exogenous variables (which are needed for the
solution) can either be assumed to be the actual values (if available and known by
agents) or be projected from an assumed stochastic process. If the expected future
values of the exogenous variables are not the actual values, one extra step is needed
at the end of the overall solution. In the above process the expected values of the
exogenous variables would be used for all the solutions, the expected values of the
exogenous variables being chosen ahead of time. This yields values for E s - l y s ,
Es JYs+I,..., Es-lYt+h. Given these values, (38) is then solved for period s using
the actual value of x~, which yields the final solution value ~ . To the extent that the
expected value of x~ differs from the actual value, Es-ly.~ will differ from ~)s.

3Guessed values are usually taken to be the actual values if the solution is within the period for which
data exist. Otherwise, the last observed value of a variable can be used for the future values or the variable
can be extrapolated in some simple way. Sometimes information on the steady state solution (if there is
one) can be used to help form the guesses.
Ch. 3: Computational Methods for Macroeconometric Models 163

Two points about this method should be mentioned. First, no general convergence
proofs are available. If convergence is a problem, one can sometimes "damp" the so-
lution values to obtain convergence. In practice convergence is usually not a problem.
There may, of course, be more than one set of solution values, and so there is no
guarantee that the particular set found is unique. If there is more than one set, the set
that the method finds may depend on the guesses used for the expectations for the h
periods beyond s + h + k.
Second, the method relies on the certainty equivalence assumption even though the
model is nonlinear. Since expectations of functions are treated as functions of the
expectations in future periods in Eq. (38), the solution is only approximate unless
f i is linear. This assumption is like the linear quadratic approximation to rational
expectations models that has been proposed, for example, by Kydland and Prescott
(1982). Although the certainty equivalence assumption is widely used, including in
the engineering literature, it is, of course, not always a good approximation.

Case 2. Pi ~ 0 and data before s - l available. The existence of serial correlation

complicates the problem considerably. The error terms for period t - 1 ( u i t - l , i =
1,..., n) depend on expectations that were formed at the end of period t - 2, and so
a new viewpoint date is introduced. This case is discussed in Section 2.2 in Fair and
Taylor (1983), but an error was made in the treatment of the second viewpoint date.
The following method replaces the method in Section 2.2 of this paper.
Consider again solving for period s. If the values of u ~ - i were known, one could
solve the model as above. The only difference is that the value of an error term like
u ~ + , , - i would be Pir ~ is-I instead of zero. The overall solution method first uses
the EP method to solve for period s - j, where j > 0, based on the assumption
that u~_9_1 -- O. Once the expectations are solved for, (38) is used to solve for
u i s - j . The actual values of y s _ j and x~_j are used for this purpose (although the
solution values are used for the expectations) because these are structural errors being
estimated, not reduced form errors. Given the values for u i s - j , the model is solved for
period s - j + 1 using the EP method, where an error term like ui~-j+,, is computed
as p~ui.s-j. Once the expectations are solved for, (38) is used to solve for u i ~ - j + l ,
which can be used in the solution for period s - j + 2, and so on through the solution
for period s.
The solution for period s is based on the assumption that the error terms for period
s j 1 are zero. To see if the solution values for period s are sensitive to this
assumption, the entire process is repeated with j increased by 1. If going back one
more period has effects on the solution values for period s that are within a prescribed
tolerance criterion, then overall convergence has been achieved; otherwise j must
continue to be increased. Again, in practice one usually finds a value of j that is large
enough to make it likely that further increases are unnecessary for any experiment
that might be run and then do no further checking using larger values of j.
It should be noted that once period s is solved for, period s + 1 can be solved
for without going back again. From the solution for period s, the values of u,i,~ can
164 R.C. Fair

be computed, which can then be used in the solution for period s + 1 using the EP

Case 3. p~ 7~ 0 and data beJbre period s - 1 not available. This case is based on
the assumption that eis-1 = 0 when solving for period s. This type of an assump-
tion is usually made when estimating multiple equation models with moving average
residuals. The solution problem is to find the values of u~s-1 that are consistent with
this a s s u m p t i o n . T h e overall method begins by guessing values for ui~-2. Given these
values, the model can be solved for period s - 1 using the EP method and the fact
that %is+r-2 ~- p~'uis-2. From the solution values for the expectations, (38) and (39)
can be used to solve for ei~- j 4 If the absolute values of these errors are within a pre-
scribed tolerance criterion, convergence has been achieved. Otherwise, the new guess
for ui~-2 is computed as the old guess plus ei,-1/p~. The model is solved again for
period s - 1 using the new guess and the EP method, and so on until convergence is
At the point of convergence uis-1 can be computed as piUis_2, where u i s - z is the
estimated value on the last iteration (the value consistent with e~s-1 being within a
prescribed tolerance criterion of zero). Given the values of u i , - 1 , one can solve for
period 8 using the EP method, and the solution is finished.

10.3. Computational costs

The easiest way to think about the computational costs of the solution method is to
consider how many timcs the equations of a model must be "passed" through. Let
N1 be the number of passes through the model that it takes to solve the model for
one period, given the expectations. Nt is usually some number less than 10 when
the G a u s s - S e i d e l technique is used. The EP method requires solving the model for
h + h + 1 periods. Let N2 be the number of iterations it takes to achieve convergence
over these periods. Then the total number of passes for convergence is N2N1 ( h + k + 1).
If, say, h is 5, k is 30, N2 is 15, and N1 is 5, then the total number of passes needed
to solve the model for one period is 11,250, which compares to only 5 when there are
no expectations. If k is increased by one to check for overall convergence, the total
number of passes is slightly more than doubled, although, as noted above, this check
is not always done.
For Case 2 above the number of passes is increased by roughly a factor of j if
overall convergence is not checked. Checking for overall convergence slightly more
than doubles the number of passes, j is usually a number between 5 and 10. If q is
the number of iterations it takes to achieve convergence for Case 3 above, the number

4These are again estimates of the structural error terms, not the reduced form error terms. Step (iii) on
p. 1176 in Fair and Taylor (1983) is in error in this respect. The errors computed in step (iii) should be
the structural error terms.
Ch. 3: Computational Methods .fi)r Mactveconometric Models 165

of passes is increased by a factor of q + 1. In practice q seems to be between about 5

and 10. Note for both Cases 2 and 3 that the number of passes is increased relative
to the non serial correlation case only for the solution for the first period (period s).
If period s + 1 is to be solved for, no additional passes are needed over those for the
regular case.

10.4. F I M L estimation

Assume that the estimation period is I through T. The objective function that FIML
maximizes (assuming normality) ispresented in Eq. (9) above. In the present notation,
the i j element of S is ( I / T ) ~ t ~ 1 ci~ejL. Since the expectations have viewpoint
date t - 1, they are predetermined from the point of view of taking derivatives for
the Jacobian, and so no additional problems are involved /'or the Jacobian in the
rational expectations case. In what follows c~ will be used to denote the vector of
all the coefficients in the model. In the serial correlation case c~ also includes the Pi
In the standard case computing S for a give value of c~ is fairly inexpensive. One
simply solves (38) and (39) for the c~t error terms given the data and the value of c~.
This is only one pass through the model since it is the structural error terms that
are being computed. In the rational expectations case, however, computing the error
terms requires knowing the values of the expectations, which themselves depend on c~.
Therefore, to compute S for a given value of c~ one has to solve for the expectations
for each of the T periods. If, say, 11,250 passes through the model are needed to
solve the model for one period and if T is 100, then 1,125,000 passes are needed
for one evaluation of Z and thus one evaluation of L. In a 25 coefficient problem
discussed in Fair and Taylor (1990), the Parke algorithm required 2,817 evaluations
of L to converge, which would be over 3 trillion passes if done this way. 5
It should be clear that the straightforward combination of the EP solution method
and FIML estimation procedures is not likely to be computationally feasible for most
applications. There is, however, a way of cutting the number of times the model has to
be solved over the estimation period to roughly the number of estimated coefficients.
The trick is to compute numerical derivatives of the expectations with respect to
the parameters and use these derivatives to compute Z (and thus L) each time the
algorithm requires a value of L for a given value of ct.
Consider the derivative of E t _ l Y t + r with respect to the first element of c~. One can
first solve the model for a given value of c~ and then solve it again for the first element
of c~ changed by a certain percent, both solutions using the EP method. The computed
derivative is then the difference in the two solution values of E t - t yt+,. divided by
the change in the first element of c~. To compute all the derivatives requires K + 1

5Note that these solutions of the error term e~t are only approximations when f~ is nonlinear. Hence,
the method gives an approximation of the likelihood function.
166 R.C. Fair

solutions of the model over the T number of observations, where K is the dimension
of c~.6 One solution is for the base values, and the K solutions are for the K changes
in a , one coefficient change per solution. From these K + 1 solutions, K - T - (h + 1)
derivatives are computed and stored for each expectations variable, one derivative for
each length ahead for each period for each coefficient] Once these derivatives are
computed, they can be used in the computation of 27 for a given change in a , and
no further solutions of the model are needed. In other words, when the maximization
algorithm changes a and wants the corresponding value of L, the derivatives are first
used to compute the expectations, which are then used in the computation of Z . Since
one has (from the derivatives) an estimate of how the expectations change when c~
changes, one does not have to solve the model any more to get the expectations.
Assuming that the solution method in Case 3 above is used for the F I M L estimates,
derivatives of u~t- 1 with respect to the coefficients are also needed when the errors are
serially correlated. These derivatives can also be computed from the K + 1 solutions,
and so no extra solutions are needed in the serial correlation case.
Once the K + 1 solutions of the model have been done and the maximization
algorithm has found what it considers to be the optimum, the model can be solved
again for the T periods using the optimal coefficient values and then L computed. This
value of L will in general differ from the value of L computed using the derivatives
for the same coefficient values, since the derivatives are only approximations. At this
point the new solution values (not computed using the derivatives) can be used as new
base values and the problem turned over to the maximization algorithm again. This
is the second "iteration" of the overall process. Once the maximization algorithm
has found the new optimum, new base values can be computed, a new iteration
performed, and so on. Convergence is achieved when the coefficient estimates from
one iteration to the next are within a prescribed tolerance criterion of each other. This
procedure can be modified by recomputing the derivatives at the end of each iteration.
This m a y improve convergence, but it obviously adds considerably to the expense.
At a minimum, one might want to recompute the derivatives at the end of overall
convergence and then do one more iteration. If the coefficients change substantially
on this iteration, then overall convergence has not in fact been achieved.

10.5. Stochastic simulation

For models with rational expectations one must state very carefully what is meant by
a stochastic simulation of the model and what stochastic simulation is to be used for.

C'In the notation presented in Section 2 k rather than K is used to denote the dimension of a. K is
used in this section since k has already been used in the description of the EP method.
7Derivatives computed this way are "one sided". "Two sided" derivatives would require an extra K
solutions, where each coefficient would be both increased and decreased by the given percentage. My
experience is that two sided derivatives are generally unnecessary.
Ch. 3: Computational Methods .for Macroeconometric Models 167

In the present case stochastic simulation is n o t used to improve on the accuracy of

the solutions of the expected values. The expected values are computed exactly as
described above - using the EP method. This way of solving for the expected values
can be interpreted as assuming that agents at the beginning of period s form their
expectations of the endogenous variables for periods s and beyond by 1) forming
expectations of the exogenous variables for periods s and beyond, 2) setting the error
terms equal to their expected values (say zero) for periods s and beyond, 3) using the
existing set of coefficient estimates of the model, and then 4) solving the model for
periods s and beyond. These solution values are the agents' expectations.
For present purposes stochastic simulation begins once the expected values have
been solved for. Given the expected values for periods s through s + h, stochastic
simulation is performed for period s. The problem is how no different from the
problem for a standard model because the expectations are predetermined. Assume
that the errors are distributed N(0, ~ ) , where £2 is the FIML estimate of Z from
Section 10.4. From this distribution one can draw a vector of error terms for period s.
Given these draws (and the expectations), the model can be solved for period s in
the usual ways. This is one repetition. Another repetition can be done using a new
draw of the vector of error terms, and so on. The means and variances of the forecast
values can be computed using Eqs (30) and (32).
One can also use this approach to analyze the effects of uncertainty in the coeffi-
cients by assuming that the coefficients are distributed N(&, V4), where & is the FIML
estimate Of c~ and V4 is the estimated covariance matrix of &. In this case each draw
also involves the vector of coefficients.
if uit is serially correlated as in (39), then an estimate of uis-1 is needed for the
solution for period s. This estimate is, however, available from the solution of the
model to get the expectations (see Case 2 in Section 10.2), and so no further work is
needed. The estimate of u i s - 1 is simply taken as predetermined for all the repetitions,
and uis is computed as piu,zs-1 plus the draw for ei~. (Note that the e errors are
drawn, not the u errors.)
Stochastic simulation is quite inexpensive if only results for period s are needed
because the model only needs to be solved once using the EP method. Once the expec-
tations are obtained, each repetition merely requires solving the model for period s.
If, on the other hand, results for more than one period are needed and the simulation
is dynamic, the EP method must be used p times for each repetition, where p is the
length of the period.
Consider the multi period problem. As above, the expectations with viewpoint date
s - 1 can be solved for and then a vector of error terms and a vector of coefficients
drawn to compute the predicted value of Yis. This is the first step.
Now go to period s + 1. An agent's expectation of, say, Yis+a is different with
viewpoint date s than with viewpoint date s - 1. In particular, the value of y~ is in
general different from what the agent at the end of period s - 1 expected it to be
168 R.C. Fair

(because of the error terms that were drawn for period s). 8 A new set of expectations
must thus be computed with viewpoint date s. Agents are assumed to use the original
set of coefficients (not the set that was drawn) and to set the values of the error terms
for periods s + 1 and beyond equal to zero. Then given the solution value of yis and
the actual value of xs, agents are assumed to solve the model for their expectations
for periods s + 1 and beyond. This requires a second use of the EP method. Given
these expectations, a vector of error terms for period s + 1 is drawn and the model is
solved for period s + 1. If equation i has a serially correlated error, then uis+l is equal
to p,zuis_l plus the draw for eis+l. Now go to period s + 2 and repeat the process,
where another use of the EP method is needed to compute the new expectations. The
process is repeated through the end of the period of interest. At the end, this is one
repetition. The overall process is then repeated for the second repetition, and so on.
Note that only one coefficient draw is used per repetition, i.e., per dynamic simulation.
After J repetitions one can compute means and variances just as above, where there
are now means and variances for each period ahead of the prediction. Also note that
agents are always assumed to use the original set of coefficients and to set the current
and future error terms to zero. They do not perform stochastic simulation themselves.
Stochastic simulation has also been used to evaluate alternative international mone-
tary systems using the multicountry models in Carloyzi and Taylor (1985) and Taylor
(1988). For this work values of eit were drawn, but not values of the coefficients. The
vector of coefficients c~ was taken to be fixed.
It seems that stochastic simulation as defined above is computationally feasible
for models with rational expectations. Stochastic simulation is in fact likely to be
cheaper than even FIML estimation using the derivatives. If, for example, the FIML
estimation period is 100 observations and there are 25 coefficients to estimate, FIML
estimation requires that the model be solved 2600 times using the EP method to get
the derivatives. For a stochastic simulation of 8 periods and 100 repetitions, on the
other hand, the model has to be solved using the EP method only 800 times.

10.6. Conclusion

The results reported in Fair and Taylor (1990) using the methods discussed in this
section are encouraging regarding the use of models with rational expectations. FIML
estimation seems computationally feasible using the procedure of computing deriva-
tives for the expectations, and stochastic simulation is feasible when done in the
manner described above. FIML estimation is particularly important because it takes
into account all the nonlinear restrictions implied by the rational expectations hypoth-
esis. It is hoped that the methods discussed in this section will open the way for many
more tests of models with rational expectations.

8It may also be that the actual value of xs differs from what the agent expected it to be at the end of
Ch. 3: Computational Methods .fi)r Macroeconometric Models 169


Amemiya, T. (1982) 'The two stage least absolute deviations estimators', Econometrica, 50:689-711.
Carloyzi, N. and Taylor, J.B. (1985) 'Internafional capital mobility in the coordination of monetary policy
rules', in: J. Bandhari, ed., Exchange rate policy under uncertainty. Cambridge, MA: MIT Press.
Chow, G.C. (1981) Econometric analysis by control methods. New York: Wiley.
DeLong, J.B. and Summers~ L.H. (1986) 'Is increased price flexibility stabilizing?', American Economic
Review, 78:1031-1043.
Fair, R.C. (1974) 'On the solution of optimal control problems as maximization problems', Annals of
Economic and Social Measurement, 3:135-154.
Fair, R.C. (1984) Specification, estimation, and analysis of macroeconometric models. Cambridge, MA:
Harvard Univ. Press.
Fair, R.C. (1994) Testing macroeconometric models. Cambridge, MA: Harvard Univ. Press.
Fair, R.C. and Parke, W.R. (1993) 'The Fair-Parke program for the estimation and analysis of nonlinear
econometric models', mimeo.
Fair, R.C. and Taylor, J.B. (1983) 'Solution and maximum likelihood estimation of dynamic rational
expectations models', Econometrica, 51 : 1169-1185.
Fair, R.C. and Taylor, J.B. (1990) 'Full information estimation and stochastic simulation of models with
rational expectations', Journal of Applied Econometrics, 5:381-392.
Fisher, P. (1992) Rational expectations in macroeconomic models. London: Kluwer Academic Publishers.
Hansen, L.P. (1982) 'Large sample properties of generalized method of moments estimators', Econometrica,
King, S.R. (1988) 'Is increased price flexibility stabilizing?', American Economic Review, 78:267-272.
Kydland, EE. and Prescott, E.C. (1982) 'Time to build and aggregate fluctuations', Econometrica, 50:1342-
McCarthy, M.D. (1972) 'Some notes on the generation of pseudo-structural errors for use in stochastic
simulation studies', in: B.G. Hickman, ed., Econometric models ¢~fcyclical behavior. New York: Columbia
Univ. Press, pp. 185 191.
Parke, W.R. (1982) 'An algorithm for FIML and 3SLS estimation of large nonlinear models', Econometrica,
Powell, M.J.D. (1964) 'An efficient method for finding the minimum of a function of several variables
without calculating derivatives', Computer Journal, 7:155-162.
Taylor, J.B. (1988) 'Policy analysis with a multicountry model', NBER working paper.
Taylor, J.B. and Uhlig, H. (1990) 'Solving nonlinear stochastic growth models: A comparison of alternative
solution methods', Journal of Business and Economic Statistics, 8:1-21.
Chapter 4


University (~["Chicago University (~["Chicago


Federal Reserve Bank ()f"Minneapolis University (~f Chicago and
Hoover Institution, Stanford University


1. Introduction 173
2. Control problems 173
2.1. Deterministic regulator problem 174
2.2. Augmented regulator problem 176
2.3. Discounted stochastic regulator problem 177
2.4. A class of linear-quadratic economies 180
3. S o l v i n g the d e t e r m i n i s t i c linear r e g u l a t o r p r o b l e m 182
3.1. NonsingularAuv 184
3.2. Singular Avv 187
3.3. Continuous-time systems 189
4. C o m p u t a t i o n a l t e c h n i q u e s for s o l v i n g Riccati e q u a t i o n s 192
4.1. Schur algorithm 192
4.2. Doubling algoritlun 194
4.3. Matrix sign algorithm 200

*Lars Peter Hansen and Thomas J. Sargent acknowledge financial support from the National Scien-
ce Foundation, and Evan W. Anderson from a University of Chicago Century graduate fellowship.
This report benefited greatly from insightful comments by an anonymous referee. We especially thank
Peter Zadrozny for his invaluable comments. Conversations with Sherwin Rosen were very help-
ful in formulating two of our example economies and in estimating the cattle cycle model. To ob-
tain computer programs that implement the calculations described in the appendices, please send
an e-mail message to erm@ellen.mpls, frb.fed.us. To obtain computer programs that imple-
ment the algorithms for solving Riccati and Sylvester equations, please send an e-mail message to
ewandersOmidway, u c h i c a g o , edu. The views expressed in this paper are those of the authors and
not necessarily those of the Federal Reserve Bank of Minneapolis or the Federal Reserve System.

Handbook of Computational Economics, Volume L Edited by H.M. Amman, D.A. Kendrick and Z Rust
(~) 1996 Elsevier Science B.V. All rights reserved.
172 E.W. Anderson et al.

5. Solving the augmented regulator problem 202

6. Computational techniques for solving Sylvester equations 205
6.1. The Hessenberg-Schuralgorithm 205
6.2. Doublingalgorithm 207
7. Distorted economies 208
8. Example economies 210
8.1. A model of permanentincome with habit persistence 210
8.2. A model of education 212
8.3. A model of cattle cycles 215
9. Numerical comparisons 218
9.1. Solutionsto Riecati equations 219
9.2. Solutionsto Sylvesterequations 223
10. Innovations representations 224
10.1. Woldand autoregressiverepresentations 226
11. The likelihood function 227
12. Estimating the cattle cycles model 228
Appendix A. Computing ~L/i)O and ~Lt/~O for a state-space model 232
A.I. The formula for ~L/~O 232
A.2. Derivationof the formula 235
A.3. Standarderrors 242
Appendix B. Differentiating the state-space model with respect to
economic pariameters 242
B. 1. A linear-quadraticeconomywithout distortions 242
B.2. A nonlineareconomy without distortions 244
B.3. A linear-quadraticeconomywith distortions 246
References 250
Ch. 4: Mechanics ~["Forming and Estimating Dynamic Linear Economies 173

1. Introduction

This paper describes recent advances for rapidly and accurately solving matrix Riccati
and Sylvester equations and applies them to devise efficient computational methods
for solving and estimating dynamic linear economies. The paper surveys the most
promising solution methods available and compares their speed and accuracy for
some particular economic examples. Except for the simplest dynamic linear models,
it is necessary to compute solutions numerically. In estimation contexts, computation
speed is important because climbing a likelihood function can require that a model be
solved many times. We describe methods that are faster than direct iterations on the
Riccati equation and are more reliable than solutions based on eigenvalue-eigenvector
decompositions of the state-costate evolution equation. Our survey of these methods
draws heavily on Anderson (1978), Gardiner and Laub (1986), Golub, Nasb and Van
Loan (1979), Laub (1979, 1991) and Pappas, Laub and Sandell (1980).
This paper is organized as follows. Section 2 decomposes the optimal linear regula-
tor into sub-problems that are more efficient to solve and describes classes of economic
problems that give rise to such problems. Sections 3-6 describe recent algorithms for
solving these sub-problems. Section 7 extends the range of the basic algorithms to
the domain of "distorted economies" whose equilibria do not correspond to solutions
of optimum problems. Section 8 describes three particular economic models, one of
which is the cattle cycle model of Rosen, Murphy and Scheinkman (1994). Section 9
uses each of these models as contexts for speed and accuracy comparisons of al-
ternative algorithms. Sections 10 and 11 briefly describe innovations representations
and recursive computation of Gaussian likelihood functions. Two appendices (A and
B) provide formulas for computing derivatives of a Gaussian likelihood with respect
to a set of unknown parameters governing the tastes, technology, and information
flows of our economic models. These formulas, which build directly from the work
of Zadrozny (1988a, 1989), are designed to make numerical search algorithms for
maximizing a likelihood function more reliable and to assist in making statistical in-
ferences about the parameters of interest. Section 12 uses these formulas to estimate
Rosen, Murphy, and Scheinkman's model.

2. Control problems

In this section, we pose three optimal control problems. We begin with a problem close
to the much studied time-invariant deterministic optimal linear regulator problem.
We label this problem the deterministic regulator problem. We then consider two
progressively more general problems.
The first generalization introduces forcing sequences or "uncontrollable states" into
the deterministic regulator problem. While this generalization is also a deterministic
regulator problem, there are computational gains to exploiting the a priori knowledge
174 E.W. Anderson et al.

that some components of the state vector are uncontrollable. We refer to this gener-
alization as the augmented regulator problem. As we will see, a convenient first step
for solving an augmented regulatorproblem is to solve a corresponding deterministic
regulator problem in which the forcing sequence is "zeroed out". In other words, we
obtain a piece of the solution to the augmented regulatorproblem by initially solving
a problem with a smaller number of state variables.
The second generalization introduces, among other things, discounting and uncer-
tainty into the augmented regulatorproblem. We refer to the resulting problem as the
discounted stochastic regulator problem. Using well known transformations of the
state and control vectors, we show how to convert this problem into a corresponding
undiscounted augmented regulatorproblem without uncertainty. Therefore, while our
original problem is a discounted stochastic regulator problem, we solve it by first
solving a deterministic regulator problem with a smaller number of state variables,
then solving a corresponding augmented regulatorproblem, and finally using this lat-
ter solution to construct the solution to the original problem in the manner described

2.1. Deterministic regulatorproblem

Choose a control sequence {vt} to maximize
t 2
- + w Gyw),

subject to

yt+l = Ayyyt + Byvt,

~-~ (Ivtl 2 + lytl 2) < oo. (2.1)


This control problem is a standard time-invariant, deterministic optimal linear reg-

ulator problem with one modification. We have added a stability condition, (2.1), that
is absent in the usual formulation. This stability condition plays a central role in at
least one important class of dynamic economic models: permanent income models.
More will be said about these models subsequently. In these models, the stability
condition can be viewed as an infinite horizon counterpart to a terminal condition on
the capital stock.
Following the literature on the time-invariant optimal linear regulator problem, we
impose the following:
DEFINITION. The pair (Ayy, By) is stabilizable if y'B~ = 0 and ytAyy = Ay' for
some complex number A and some complex vector y implies that IAI < 1 or y = 0.
Ch. 4: Mechanics of Forming and EstimatingDynamicLinear Economies 175

ASSUMPTION 1. (Ayv, By) is stabilizable.

Stabilizability is equivalent to the existence of a time-invariant control law that

stabilizes the state [see Anderson and Moore (1979, Appendix C)]. For our applica-
tions, it can often be verified by showing that a trivial control law, such as setting
investment equal to zero, achieves this stability.
In solving this problem, we are primarily interested in specifications for which all
of the state variables are "endogenous", and hence the following stronger restriction
is met:

DEFINITION. The pair (Avy, By) is controllable if y~By = 0 and y~Ay u = Ay ~ for
some complex number A and some complex vector y implies that y is zero.

W h e n (Ayy,By) is controllable, starting from an initialization of zero, the state

vector can attain any arbitrary value in a finite number of time periods by an appro-
priate setting of the controls [see Anderson and Moore (1979, Appendix C)]. 1 For this
reason, we can think of a state vector sequence with evolution equation governed by
a pair (Auy , By) that is controllable as being an endogenous state vector sequence.
While Assumption 1 gives us a nonempty constraint set, it is still possible that the
supremum of the objective is not attained. We assume the following:

ASSUMPTION 2. The matrix Qyv is positive semidefinite, and the matrix R is positive

A m o n g other things, this concavity assumption puts an upper bound of zero on the
criterion function. Therefore, the supremum is finite (and nonpositive). We require
that the supremum is attained.

ASSUMPTION 3. There exists a solution to the deterministic regulator problem for each
initialization of Yo.

A commonly used sufficient condition in the control theory literature for there to
exist a solution is detectability. Factor Qvy = DyDy t.

DEFINITION. The pair (Avv , Dy) is detectable if Dv~y = 0 and Ayyy ~- Ay for some
complex number A and some complex vector y implies that IA[ < 1 or y = 0.

W h e n the pair (Ayy,Dy) is detectable, it is optimal to choose a control sequence

that stabilizes the state vector. In this case, the solution to the control problem is the

1This is one of five equivalent characterizations of reachabilitygiven in Appendix C of Anderson and

Moore (1979). However, many other control theorists take one of these characterizations as the definition
of controllability.For instance, see Kwakemaak and Sivan (1972) and Caines (1988). We choose to follow
this latter convention,
176 E.V~ Anderson et al.

same with or without the stability constraint (2.1). However, as we mentioned previ-
ously, for permanent income models the stability constraint is essential for obtaining
an interpretable solution to the problem. For these models, detectability is too strong
of a condition to impose. Chan, Goodwin and Sin (1984) give a weaker sufficient
condition for there to exist a solution (see (iii) of Theorem 3.10). In the context of
a continuous-time formulation, Hansen, Heaton and Sargent (1991) proposed a very
similar sufficient condition for stabilizable systems based on a spectral representa-
tion of the deterministic regulator problem. Unfortunately, these conditions may be
tedious to check in practice. Some of the solution algorithms we survey below could
in principle be modified to detect a violation of Assumption 3.
A sufficient condition for convergence of one of the solution algorithms that we
survey below is that the pair (Ayy, Dy) be observable:

DEFINITION. The pair (Ayy, Dy) is observable if Dy'y = 0 and Ayyy = Ay for some
complex number A and some complex vector y implies that y = 0.

Clearly, observability is stronger than detectability. Moreover, observability is guar-

anteed when the matrix Qyy is nonsingular. When the pair (Ayy, Dy) is observable,
the value function associated with the deterministic regulatorproblem is strictly con-
cave in the state vector y [Caines and Mayne (1970, 1971)].
The solution to the deterministic regulator problem takes the form

Vt = --Fyyt

lbr some feedback matrix Fy. Stability constraint (2.1) guarantees that the eigenvalues
of Ayy - ByFy have absolute values that are strictly less than one because the state
evolution equation when the optimal control is imposed is given by

Yt+l = (Ayv - BvFy)yt.

2.2. Augmented regulatorproblem

Choose a control sequence {vt} to maximize


- ~(v~'Rvt + yt ' Q~yyt + 2 yt ' Qy~zt),


subject to
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 177


(IVtl 2 q-lytl 2) < OO.


We have modified the linear regulator problem by including the exogenous forcing
sequence {zt}. The presumption here is that this partitioning may occur naturally in the
specification of the oi:iginal control problem. Of course, as is well known in the control
theory literature, we could always transform an original state vector into controllable
and uncontrollable components. Constructing this transformation, however, can be
difficult to do in a numerically reliable way. In the next section we will display a
class of optimal resource allocation problems associated with dynamic economies
for which zt contains a vector of taste and technology shifters. By assumption, this
component of the state vector cannot be influenced by a control vector such as the
level of investment.
For the augmented regulator problem to be well posed, we require that the forcing
sequence be stable:

ASSUMPTION 4. The eigenvalues of Az~ have absolute values that are strictly less
than one.

The solution to the deterministic regulator problem gives us a piece of the solution
to the augmented regulator problem. More precisely, the solution to the augmented
problem is

Vt -- - Fyyt -- Fz zt ,

where the matrix Fy is the same as in the solution to the regulator problem for which
the forcing sequence {zt} is zeroed out. Consequently, our solution methods entail
first computing Fy by solving a deterministic regulator problem of lower dimension
and then computing F~ given Fy.

2.3. Discounted stochastic regulator problem

Let {St: t = 0, 1,...} denote an increasing sequence of sigma algebras (informa-

tion sets) defined on an underlying probability space. We presume the existence
of a "building block" process of conditionally homoskedastic martingale differences
{wt: t -- l, 2, ...}, which o b e y s

ASSUMPTION 5. The process {wt: ~ = 1 , 2 , . . . } satisfies

(i) E(~+~ 17~) = 0;
(ii) E(wt+lwt+l' I f t) = I.
178 E.W. Anderson et al.

The discounted stochastic regulator problem is to choose a control process {ut},

adapted to {.Yt}, to maximize

[: ;1E::] .o)
subject to

Zt+l ~- Axt + But + CWt+l,

The state vector xt is taken to be the composite of the endogenous and exogenous
state variables. Let Uy = [I 0] be a matrix that selects' the endogenous state vector
Uyxt and Uz = [0 I ] be a matrix that selects the exogenous state vector Uzxt for
an optimization problem with discounting. To justify our partitioning, the matrix A is
restricted to satisfy U~AUy' = 0, and the matrix B is restricted to satisfy U~B = O.
Notice that in addition to incorporating discounting and uncertainty, the discounted
stochastic regulator includes cross-product terms between controls and states, which
are absent in the augmented control problem.
We now apply a standard trick for converting a discounted stochastic regulator
problem to an augmented regulator problem. Using the well known certainty equiv-
alence property of stochastic linear regulator problems, we zero out the uncertainty
without altering the optimal control law. That is, we are free to set the matrix C
to zero and instead solve the resulting deterministic control problem. We eliminate
discounting and cross-product terms between states and controls by using the trans-

Yt = flt/2Uyxt, zt = • t / 2 g z x t , vt = ~ t / 2 ( u t ~- R - I W t x t ) .

As is evident from these formulas, we have absorbed the discounting directly into
the construction of the transformed state and control vectors. In addition, the cross-
product matrix S is folded into the construction of the transformed control vector.
We are left with a version of the augmented regulator problem with the following

A ~ J = fll/2(A -- B R - I w t ) ' By = fll/2UvB ,

Q,z Q~] = Q- WR-1W'
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 179

Assumptions 1-4 are imposed on the constructed matrices on the left-hand side of
the equal signs in (2.2).
As before, write the solution to the augmented regulator problem as

vt ---- --Fuyt -- Fzzt.

Then the solution to the discounted stochastic regulator problem is

ut = -Fxt,

FY + R-1W '
F= Fz

Also as before, the matrix F u can be computed by solving the corresponding de-
terministic regulator problem with the forcing sequence "zeroed out". In subsequent
sections we will describe methods for computing F v and Fz.
In macroeconomics, the discounted stochastic regulator problem is often obtained
in the fashion of Kydland and Prescott (1982), who use it to replace a nonlinear-
quadratic problem. Thus consider the nonquadratic optimization problem: choose an
adapted (to {act}) control process {ut} to maximize


subject to

xt+l = A x t + B u t + Cwt+l.

Here r is not required to be a quadratic function of ut and xt. When the associated
constraints are nonlinear, sometimes we can substitute the nonlinear constraints into
the criterion function to obtain a problem of the form of (2.3). Kydland and Prescott
(1982) simply replace the function r by a quadratic form in [ut' z t ' ] ' as required for
the discounted stoChastic regulator problem, where the quadratic function is designed
to "approximate" r well near a particular value for the state vector. 2 In the next
subsection, we describe a different approach where, by design, the initial optimal
resource allocation problem can be directly converted into a discounted stochastic
regulator problem.

2While Kydland and Prescott (1982) apply an ad hoc global approximation to r in which the range
of approximation is adapted to the amount of underlying uncertainty, many subsequent researchers have
instead simply used a local Taylor series approximationaround some "nonstochastic" steady state produced
by shutting down all randomness in the model. Kydland and Prescott (1982) note that for the range of
uncertainty they considered, the two methods gave shnilar answers.
180 E.!~ Anderxon et al.

2.4. A class of linear-quadratic economies

We will consider several numerical examples that are members of a class of economies
used by Hansen (1987) and Hansen and Sargent (1994). As in the discounted stochastic
regulator problem, there is an exogenous information vector zt governed by

2 +1 = + (2.4)

where { w t } satisfies Assumption 5 and A ~ - v ~ A ~ satisfies Assumption 4. The

vector 2t determines a time t preference shock bt and a time t endowment shock dt

dt = Ud &,
bt = Ub Zt. (2.5)

A representative household has preferences ordered by

2 - + tg l fo (2.6)

where 9t is a vector of labor-using intern/ediate activities (designed to capture gener-

alized adjustment costs), and st is a vector of household services produced at time t
via the household technology

st = Aht-1 + Hct,
ht = A h h t - I + OhCt. (2.7)

In (2.7), ht is a vector of stocks of household durable goods at t, ct is a vector of

consumption flows, and A, H, Ah, Oh are matrices. There is a constant returns to
scale production technology

~5cct + qBiit + ~ggt = ffkt-1 + dr,

]gt = Akl~t-I q- Okit, (2.8)

where kt is a vector of capital goods used in production, it is a vector of invest-

ment goods, and Ak is a matrix) Hansen and Sargent (1994) describe a competitive
equilibrium for this economy. Associated with the competitive equilibrium is a social
planning problem, namely, to maximize (2.6) over Choices of contingency plans for

{st, ct, zt, co (adapted processes) subject to (2.4)-(2.8) with given initial-
gt, kt, h t}t=o
izations for (zo, h - l , k - l ) .

3Under the constant returns to scale interpretation, dt is taken as an additional "input" available in
fixed supply.
Ch. 4: Mechanics (?f"Forming and Estimating Dynamic Linear Economie.~" 18t

To map this problem into the notation of the previous section, we let

V ht-11
L#t J
We view the first two components of the state vector to be endogenous and the third
component to be exogenous. ]'he control vector ut can be chosen to be investment it
when the matrix qb = [~c ~a ] is nonsingular because in this case 4

Ct ] = (/.3-1(1"ht__l .q~ gd~t _ ~iit). (2.9)


Using this relation, the constraints (2.7) and (2.8) can be rewritten

xt+t = A x t + B u t + Cwt+l

for appropriately chosen matrices A, B, C. The matrix A is block triangular and the
bottom row block of B is zero as required for the discounted stochastic regulator
problem. Moreover, using (2.9) and (2.7), the time t terms Ist - btl 2 and Igtl 2 in the
objective function (2.6) of the social planner both can be expressed as quadratic forms
in the control it and the augmented state xt. Therefore, the social planner's problem
is a discounted stochastic regulator problem.
In permanent income economies, stability of the state vector process is not obtained
automatically as an implication of optimality. An example of such an economy is one
with a single consumption and capital good and no labor-using intermediate activities.
The counterpart to Eq. (2.9) is

ct = F k t - i + UdZt -- it.

We constrain the subjective discount factor to be the reciprocal of the physical return
to capital: fl = 1/(1" + A k ) . In the absence of a stability constraint, the solution to
the resulting control problem does not "stabilize" the capital stock sequence because
the sequence of capital stocks often diverges to minus infinity at a rate not even
dominated by 1/v/ft. This solution to the control problem is not interesting. Therefore,
we impose stability as an additional constraint, with the consequence that the solution
to the resulting infinite-horizon control problem is equal to the limit of the solutions
to a sequence of corresponding finite-horizon problems, each of which has a zero
restriction imposed on the terminal capital stock.

4When 4) is singular, the control vector can be augmented to include some of the components of
consumption or the labor-using intermediate activities.
182 E. • Anderson et aL

3. Solving the deterministic linear regulator problem

In this section we describe ways to solve for the matrix Fy. Recall that this matrix has
a double role. First, it gives the control law for a particular deterministic regulator
problem. More importantly for us, it also gives a piece of the solution to the discounted
stochastic regulator problem.
In describing methods for computing Fy, it is convenient to work with the state-
costate equations associated with the Lagrangian

C = -- ~ [Yt'QyyYt q- vttl:gvt q- 2]~t+l'(Ayyyt q- B y V t - yt+l)]. (3.~)


First-order necessary conditions for the maximization o f / ; with respect to {Vt}tC~=O

and {Yt}t=O a r e
Vt: R V t q- By']~t+l :- 0, t ~> 0, (3.2)
Yt: #t = QyyYt + AyyllZt+l, t >~O. (3.3)
To obtain a composite state-costate evolution equation, solve (3.2) for vt, substitute
the solution into the state evolution equation, and stack the resulting equation and
(3.3) and write the state-costate evolution equation as

I #t '


ByR-lpy ~ [ Ay:
] ' I -%y °,1
There is also a continuous-time counterpart to this system given by

D pt
=HIy I
L#t J '


Ayy -ByR-1By ']

H -= _Qyy -Ayy' " (3.6)

Equation (3.5) is the state-costate equation corresponding to the continuous-time reg-

oo !
ulator problem with criterion - f0 [y(t) Qyvy(t) + u(t)'Ru(t)] dt and law of motion
Dy(t) = Ayvy(t ) + Byu(t), where D is the time-differentiation operator. We describe
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 183
several methods for solving Eqs (3.4) and (3.5). Formally, we will devote most of our
attention to the discrete-time system (3.4). As we will see, methods designed for solv-
ing the continuous-time system (3.5) can be adapted easily to solve the discrete-time
system (3.4), and conversely.
The solution to (3.4) of interest to us is the one that stabilizes the state-costate
vector sequence for any initialization Y0, Since we have transformed the state vector
to eliminate discounting, we impose stability in the form of square summability:,

~ [Yt]2<oo, (3.7)
t=o #t J

for the discrete-time system (3.4). (We impose the analogous square integrability
restriction on the continuous time system (3.5).)
One way to ascertain the solution to the deterministic regulator problem is to
find an initial costate vector expressed as a function of the initial state vector Y0 that
guarantees the stability of system (3.4) or (3.5). The initialization of the costate vector
takes the form #o = PyYo and is replicated over time. Substituting PvYt for #t into
(3.4), we find that
( I @ B y R -1 B y ' P y ) y t + l = A y y y t ,

Ayy' Pyyt+l = - Q v y y t + Pyyt. (3.8)

It is straightforward to verify that

(I + B y R - I B y ' R u ) -' : I - B~(R + By'PyBy)-IBy'Ru. (3.9)

Solving the first equation in (3.8) for Yt+l

Yt+l = (Avv - BvFv)Yt, (3.10)


Fy = (R + By'PyBy)-lBv'PyAyy. (3.11)

Premultiplying (3.10) by Ayy'P~ gives

Auv' Pvy~+l = (Avv' PvAyy - Avu' PuBvfu)yt. (3.12)

For the right-hand side of Eq. (3.12) to agree with the right-hand side of the second
equation of (3.8) for any initialization Y0, it must be that
Py = Qvv + Ayv'PyAvy - Ayy'PyBv(R + B v ' P y B y ) - ' B y ' P ~ A v v
= Qyy + (Avy - ByFy)'Py(Ayy - BvFv) + Fy'RFv, (3,,13)
184 E. l,E A n d e r s o n et aL

which is the familiar Riccati equation. In other words, the matrix Pv used to set the
initial condition on the costate vector is also a solution to the Riccati equation (3.13).
With this initialization, the costate relation #t = PvYt holds for all t ~> 0. Finally, it
follows from (3.10) that this state-costate solution is implemented by the control law
Vt = -Fgyt.
The remainder of this section is organized as follows. In the first subsection, we
initially consider the case in which the matrix Ayv is nonsingular. While this case is
studied for pedagogical simplicity, it is also of interest in its own right. In the second
subsection, we then treat the more general case in which Ayy can be singular. As
emphasized by Pappas, Laub and Sandell (1980), singularity in Ayy occurs naturally
in dynamic systems with delays. One of our example economics used in our numerical
experiments has a singular matrix Auu. Finally, in the third subsection we study the
continuous-time counterpart to the deterministic regulator problem. We describe an
alternative solution method and show how to convert a discrete-time regulator problem
into a continuous-time regulator with the same relation between optimally chosen
state and costatc vectors. We defer the discussion of the numerical algorithms used
for implementing these methods until the next section.

3.1. Nonsingular Avy

When the matrix Avy is nonsingular, we can solve (3.4) for [ Vt+l ].
• L~t+l .1

~ t -t- 1 ~'t '


= I Avv+BvR - 1Bv'A~y - IQv v - B w~-]R
~ ~ y 'ta~
- v v -i] I
t --1
Qyy A~y
t --I
We find the matrix Pv by locating the stable invariant subspace of the matrix M .

DEFINITION. An invariant subspace of a matrix M is a linear space C of possibly

complex vectors for which MC = C.

Invariant subspaces are constructed by taking linear combinations of eigenvectors

of M . A stable invariant subspace is one for which the corresponding eigenvalues
have absolute values less than one. To solve the model, we aim to find the matrix
Py such that [ / y ] y is in the stable invariant subspace of M for every n dimensional
vector y. We now elaborate on how to compute this subspace.
Ch. 4: Mechanics c~]Forming and Estimating Dynamic Linear Economies 185
The matrix M has a particular structure that we can exploit in characterizing its
eigenvalues. To represent this structure, we introduce a matrix J given by

Notice that J - 1 = j~ = _ j .

DEFINITION. A matrix M is symplectic if M J M ~ = J.

It is straightforward to verify that M given by (3.15) is symplectic. It follows that

M ~= J-IM-IJ. (3.16)

Therefore, the transpose of M is similar to its inverse. Recall that similar matrices
define the same linear transformation but with respect to a different coordinate system.
Thus M ~ and M -1 share the same eigenvalues. For any matrix M, the eigenvalues of
M - 1 are the reciprocals of the eigenvalues of M, so it follows that the eigenvalues of
a real symplectic matrix come in reciprocal pairs, and the number of stable eigenvalues
cannot exceed the number of states n. However, merely requiring M to be symplectic
permits there to be eigenvatues with absolute values equal to one, and so we will
need an additional argument to show that there are exactly n stable eigenvalues.
To locate the stable invariant subspace of the symplectic matrix M, we follow Laub
(t979) and (block) triangularize M:
V - I M V = W,

W= [ Wll W221VVl
] ,2 (3.17)

where V is a nonsingular matrix. By construction, the matrices M and W are similar.

The matrix partitions in (3.17) are built to coincide with the number of stable and
unstable eigenvalues. In particular, the absolute values of the eigenvalues of WII are
A special case of this decomposition is an appropriately ordered Jordan decom-
position of M as was used by Vaughan (1970) in developing an invariant subspace
algorithm for computing Pu. Laub (1991) traces this solution strategy back to the 19th
century and credits MacFarlane (1963) and Potter (1966) with introducing it to the
control literature. As emphasized by Laub (1991), it is preferable to build algorithms
based on other upper triangular decompositions that are more numerically stable. The
Jordan decomposition is particularly problematic when the symplectic matrix M has
eigenvalues with multiplicities greater than one (see also Golub and Wilkinson 1976).
In the next section, we describe alternative Schur decompositions, which are more
reliable numerically.
186 E.W. Anderson et al.

To use this triangularization to calculate Pv, apply V -1 to both sides of the state
Eq. (3.14):

* = Wy~


.__V-1 [Yt]
Yt #t

This transformation permits us to study asymptotic properties in terms of two smaller

uncoupled subsystems. Partition y~" into two blocks with dimensions given by the
number of stable and unstable eigenvalues:

Yt 1_Y2,t .1


Y2,t+l = WzzYz,t,

and the solution sequence { y*2,t} fails to converge to zero unless it is initialized at
zero. Setting y*2,o at zero can be accomplished by an appropriate initialization of the
costate vector, as we now verify•
Partition the matrices V and V - I as

[ Vll V12] V-1 [V ll V 12]

v = v =j ' = Lv2, v 2j •

Since V is nonsingular and there exists a (stable) solution to the optimal control
problem, we must have

V 2 1 y t + v Z 2 # t = O. (3.18)

The rank of the matrix [ V 21 V22 t equals the number of unstable eigenvalues of
M , and thus the rank of its null space must equal the number of stable eigenvalues.
For a solution to exist for every initialization Y0 = Y, it follows from (3.18) that there
must exist a # such that

V21y + V22# -- O.
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 187

Thus the dimensionality of the null space of [ I/"2t V 22 ] must also be at least n.
Therefore, M has exactly n stable eigenvalues, and the matrix partition V 22 is non-
singular. Solving (3.18) for #t gives

P,t ~-- - (V 22) - t V21 Yr.

Consequently, the matrix Pu used to initialize the costate vector is given by

P.v ~---- ( V 2 2 ) -1V21 ~- V21Vl1-1, (3.19)

where the second equality follows since LV21 ] has rank n, and

L½1 = 0.

3.2. Singular Avy

We now extend the solution method to accommodate singularity in Auy. This method
avoids inverting the L matrix in (3.4). Instead of locating the stable invariant subspace
of M , a deflating subspace method finds the stable deflating subspace of the pencil
),L - N .

DEFINITION. A pencil AL - N is the family of matrices {AL - N } indexed by the

complex variable A.

DEFINITION. A deflating subspace of the pencil AL - N is the subspace C of complex

vectors such that the dimension of C is at least as large as the dimension of the sum
of the subspaces L C and N C .

For the matrices L and N of Eq. (3.4), it can be verified that the intersection of their
null spaces contains only the zero vector? This ensures that a generalized eigenvalue
problem is well posed. When a subspace C is deflating, there exists a vector' x in C
that solves the generalized eigenvalue problem

ALy -- N y

5See Theorem 3 of Pappas, Laub and Sandell (1980) for the case in which (Awu , Du) is detectable.
As we noted previously,the restrictionto a detectable system rules out some interestingeconomicmodels.
More generally, nonexistenceof a common nonzero vector in the null spaces of N and L can be shown
by way of contradiction. Suppose there is a common nonzero vector in the null space. Then the matrix
(I + QyvByR-IBy t) is singular. However, this singularitycontradicts Theorem 1 of Kimura (1988).
188 E. V~ Anderson et aL

[see Stewart (1972, Theorem 2.1)]. Implicitly, we are including the possibility of a
solution with A = e~, which occurs when y is in the null space of L but not in the null
space of N. As with the previous (invariant subspace) method, the deflating subspace
of interest for solving the optimal control problem is the deflating subspace associated
with the stable state-costate sequence. The stable deflating subspace is the subspace
associated with the stable generalized eigenvectors (the eigenvectors associated with
generalized eigenvalues with absolute values strictly less than one). Hence we solve
the model by finding a matrix Pv such that [ L ]Y is in the stable deflating subspace

of the pencil AL - N.
Recall that when Avy is nonsingular, the matrix M is symplectic. More generally,
system (3.4) is associated with a symplectic pencil

DEFINITION. A pencil AL - N is symplectic if L J L I = N J N ~.

Pappas, Laub and Sandell (1980, Theorem 4) show that the generalized eigenvalues
of the symplectic pencil (AL - N) come in reciprocal pairs, just as the eigenvalues
of M do when Ayy is nonsingular. Hence we again have that the number of stable
generalized eigenvalues is no greater than n. Furthermore, we can imitate our argument
in the case in which Ayy is nonsingular to show that there are exactly n stable
generalized eigenvalues. 6
We triangularize the state-costate system (3.4) using the solutions to the generalized
eigenvalue problem. As in Theorem 2.1 of Stewart (1972), there exists a decomposi-
tion of the pencil AL - N such that

T==j ' UNV = w = w== '

where U and V are unitary matrices and the matrix partitions have the same number,
n, of elements as the number of entries in the state vector yr. Premultiplication of the
pencil AL - N by the nonsingular matrix U preserves the solutions to the generalized
eigenvalue problem, and postmultiplication by V alters the generalized eigenvectors
but not the eigenvalues. A consequence of the triangularization is that the solutions
to the generalized eigenvalue problem for the original system are constructed directly
from the solutions to the following two smaller problems:

ATI:~? = W:t~,
,~Tz2y = W22Y. (3.21)

As with the invariant subspace method, we build the blocks of the triangularization
so that the generalized eigenvalues of the first problem in (3.21) satisfy I,,k[ < 1, and

6Theorems 3 and .4 of Pappas, Laub and Sandell (1980) establish this result when the pair (Ayv , Dv)
is detectable.
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 189

for the second problem IAI > 1. As a consequence, the span of the first rz columns of
V gives the vectors of the deflating subspace we seek. The span of the remaining rL
columns contains the problematic initializations of the state-costate vector for which
the implied sequence of state-costate vectors diverges exponentially. In addition, it
includes the span of the generalized eigenvectors associated with infinite eigenvalues.
Imitating the solution.method when Avv is nonsingular, we initialize the costate vector
as #t = P.vY~, where the matrix P.v is again given by (3.19).
To understand better the nature of this unstable subspace, recall that an eigenvec-
tot associated with an infinite eigenvalue is in the null space of T22. Suppose the
triangularization of L and N is built so that we can further partition the matrices:

'/722= 0

022 J '
where the matrices MH and 022 are nonsingular. Such a triangularization always
exists. Consider solving the following equation recursively for a sequence {yt+l }; for
each t solve for 9t+l given 9t by using

For this equation to have a solution, the second component of Yt must be zero for all
t because

O22fft,2 = 0, (3.22)

and 022 is nonsingular. In addition to eliminating the nonexistence problem, impos-

ing this restriction also resolves the multiplicity problem. Note that the multiplicity
problem for the triangular system is that for a given t, (3.22) does not restrict Xt+l,2.
However, (3.22) applied to time t + 1 resolves the problem.

3.3. Continuous-time systems

To conclude this section, we consider solving continuous-time Hamiltonian systems

of the form (3.5). The defining feature of a Hamiltonian matrix is:

DEFINITION. A matrix H is Hamiltonian if J H is symmetric.

The matrix H in (3.5), (3.6) clearly satisfies this property. It follows that

H' = - J H J -I,
190 E.W.Andersonet al.
which in turn implies that the matrix H ' is similar to - H . Consequently, the eigenval-
ues of a real Hamiltonian matrix come in pairs that are symmetric about the imaginary
axis of the complex plane. The stable eigenvalues of a Hamiltonian matrix are those
whose real parts are strictly negative. Similar arguments to those given above guaran-
tee that there are exactly n stable eigenvalues of H . Therefore, (3.5) can be solved by
using an invariant subspace method and its associated decomposition (3.17), provided
that the classification of stable and unstable eigenvatues is modified appropriately. 7
There is an alternative approach for solving a continuous-time Hamiltonian system.
Given a Hamiltonian matrix H , another Hamiltonian matrix G is constructed with the
same stable and unstable invariant subspaces. The matrix G is called the "sign" of
the matrix H , and is defined as follows. Take the Jordan decomposition of H :

H = V [ All 0 I V - I
0 A22 '
where A l l is an upper triangular matrix with the eigenvalues of H that have strictly
negative real parts on the diagonals, and A22 is an upper triangular matrix with the
eigenvalues of H that have strictly positive real parts on the diagonals. Then

Thus the sign of a matrix is a new matrix with the same eigenvectors as the original
matrix and with eigenvalues replaced by - 1 or 1 depending on the signs of the real
parts of the original eigenvalues.
The matrix Pv can be inferred directly from G. To see this, we use an insight from
Roberts (1980). By construction, all of the stable eigenvalues of G are equal to - 1 .
Consequently, the matrix Pv satisfies the following eigenvalue problem:

for any 'r~ dimensional vector y, and the matrix Py solves the affine equation


This method is implemented by finding fast ways to compute the "sign" of a matrix.

7Deflating subspace methods are not needed for solving the class of continuous-time quadratic control
problems considered here because we can form directly the Hamiltonian matrix and apply an invariant
subspace method. However, as we have formulated it, the continuous-timeproblem does not permit systems
with finite gestation lags in making investment goods productive or systems for which consumption services
depend on only a finite interval of past consumptions.
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 191

While the matrix sign method is directly applicable for solving continuous-time
Hamiltonian systems, Hitz and Anderson (1972) and Gardiner and Laub (1986) show
how to use it to locate deflating subspaces of discrete-time systems, Consider the
generalized eigenvalue problem for the symplectic pencil

ALy-- N.


(1 + A)(L - N ) y ---=(1 - A)(L + N ) y .

Since tile only common vector in the null space of L and N is zero, we construct the
solution to the eigenvalue problem

5y = ( N - L) - I ( L + N ) ,



Consequently, the stability relations (2.1) carry over here as well, and we apply the
matrix sign algorithm to (N - L ) - I ( L + N).
It also turns out that ( N - L ) - I ( L + N ) is a Hamiltonian matrix, which we can
exploit in computation. To verify the Hamiltonian structure, note that

(L- N)J(L' + N') = LJL' - NJN' - NJL' + LJN'

= -NJL l + LJN I
=- - ( L + N ) J ( L ' - N ' ) ,

where we have used the fact that AL - N is a symplectic pencil. Therefore,

J(L - N)-I(L+ N ) = (L' + N ' ) ( L ' + N ' ) - t J ( L - N ) - I (L + N )

= (L' + N ' ) [ - ( L - N)J(L' + N ' ) ] - I (L + N)
= (L' + N')[(L + N)J(L' - N')]-l(L + N)
= (L' + N')(L' - N')-'J',

which proves that (N - L) -1 (L + N) is a Hamiltonian matrix.

In summary, by construction, the stable (unstable) invm'iant subspace of the Hamil-
tonian matrix ( N - L ) - I ( L + N ) coincides with the stable (unstable) deflating
192 E.W.Andersonet al.

subspace of the symplectic pencil AL - N. This coincidence permits us to compute

the matrix Pv used for initializing the costate vector for the discrete-time system (3.4)
by applying a matrix sign algorithm to (N - L ) - I ( L + N).

4. Computational techniques for solving Riccati equations

We consider three types of algorithms for computing Py:

(i) Schur algorithm;
(ii) doubling algorithm;
(iii) matrix sign algorithm.
A Schur algorithm is based on locating a stable subspace using a Schur decomposition
of the state-costate system. As we noted in the previous section, once a stable subspace
is located, the relevant Riccati equation solution Pv is easily computed. There are two
versions of a Schur decomposition, depending on whether the matrix Avv is known to
be nonsingular or not. A Schur decomposition gives a more reliable way of locating
stable spaces than the familiar Jordan decomposition and its generalization for pencils.
A doubling algorithm is an iterative method for speeding up the dynamic pro-
gramming Riccati equation iteration by doubling the number of time periods in each
iteration. Recall from our discussion in the previous section that the stable deflating
subspace of the pencil {AL - N} coincides with the invariant subspace of the sign
of the matrix (L - N) -1 (L + N) associated with the eigenvalue - 1 . A matrix sign
algorithm is an iterative method for computing the sign of (L - N) -1 (L + N) from
which we can recover P'u easily.

4.1. Schur algorithm

Suppose the matrix Avv is nonsingular. As we noted in Section 3, the matrix Pv can
be found by locating the stable invariant subspace of the matrix M given in (3.15).
In some of our numerical calculations, we use what is referred to as a real Schur
decomposition of M to locate its invariant subspace.

DEFINITION. The real Schur decomposition of a real matrix M is an orthogonal matrix

<V and a real upper block triangular matrix W such that

1 WI2 ... WI~

W = . . . .

... 0 ~.~
Ch. 4.. Mechanics of"Forming and Estimating Dynamic Linear Economies 193

where Wii is either a scalar or a 2 x 2 matrix with complex conjugate eigenvalues. 8

A real Schur decomposition is a computationally convenient version of the block

triangular decomposition (3.17) used to compute Pv when Avv is nonsingular. Golub
and Van Loan (1989) describe how to compute the real Schur decomposition (in
particular, see Sections 7.4 and 7.5). Recall that the block triangular matrix W in
(3.17) results from partitioning the eigenvatues into stable and unstable eigenvalues.
Algorithms that compute the real Schur decomposition of a matrix typically do not
partition the diagonal blocks of W according to stability. Instead, given an arbitrary
real Schur decomposition M = V W V I, one can use the approaches described in either
Bai and Demmel (1993) or Stewart (1976) to construct a sequence of orthogonal
transformations that reorder the diagonal blocks of ~/, while updating V so that
M = V W V ~ holds at every step.
In summary, the steps for implementing a Schur algorithm are

(1) form the matrix M in (3.15);

(2) form a real Schur decomposition of M where the first n columns of V, written in
a partitioned form as [ Vii ! Y21! jr, are a basis for the stable invariant subspace
of M ;
(3) solve Py~rll~- V21 for Pv"

For the numerical computations which follow, we compute the real Schur decomposi-
tion of M using the L A P A C K 9 function DGEES. For comparisons, we also compute
an eigenvector decomposition using the built-in MATLAB function EIG. Our eigen-
vector routine assumes that the eigenvalues of M are distinct, and we do not attempt
to implement an algorithm designed for the more troublesome case in which there
are repeated eigenvalues. We compute Pv in step (3) using the built-in MATLAB
operator ' / ' , which solves a linear equation using Gaussian elimination with partial
A deflating subspace method is required when Auy is singular and likely to be more
stable numerically when Auy is nearly singular. To implement this approach in prac-
tice, we use an ordered real generalized Schur decomposition to find an appropriate
triangularization of the state-costate dynamical system [see Van Dooren (1982)].

DEFINITION. A generalized real Schur decomposition of a real matrix pencil AL - N

is a pair of orthogonal matrices U and V, a real upper triangular matrix T, and a real

8There is also a complex Schur decomposition of a real or complex matrix in which V is a unitary
matrix and l~ is upper triangular.
9The algorithms described in this paper use routines from the FORTRANpackages LAPACK,LINPACK
and RICPACK. All of these packages can be obtained by anonymous tip from netlib.att.com and various
nfirrors. MATLABis a commercial matrix algebra package available from The MathWorks, Inc. All of our
FORTRAN routines are implemented as MATLABMEX-files.
194 E. ~ Anderson et al.
upper block triangular matrix W, such that

. .

UL~" = T = T22 ...

",, ",,

• , . 0

• .. Wire 1
A A ~ W22
UNV = W = .

[. 0 ... 0

where the pencil k2~ii - ~ i i is either a 1 × 1 matrix pencil or a 2 x 2 matrix pencil

with complex conjugate generalized eigenvalues.

As with the real Schur decomposition, we initially compute a generalized real Schur
decomposition of ~L - N without regard to whether the generalized eigenvalues are
stable or not. We then reorder the diagonal blocks of T and W so that the generalized
eigenvalues are partitioned in the manner required by (3.20). This partitioning can be
done using the algorithms described in Van Dooren (1981, 1982) or in K~tgstr6m and
Poromaa (1994).
Thus the steps for implementing a generalized Schur algorithm are
(1) form the matrices L and N in (3.4);
(2) form a generalized real Schur decomposition of the pencil ~L - N where the
first n columns of V, written in a partitioned form as [ Vii ! V211it, span the
deflating s u b s p ~ e of the pencil ~L - N;
(3) solve P ~ 1 = V21 for P v .
For the numerical comparisons which follow, we implement the generalized Schu(
algorithm by using the routines QZHESW, QZITW, QVAL, and ORDER from RIC-
PACK. We also report results for a method that uses generalized eigenvectors to
compute deflating subspaces. This method takes the first n columns of the matrix
to be the generalized eigenvectors of AL - N that correspond to stable generalized
eigenvalues. We implement this method using the built-in MATLAB function EIG,
making no attempt to handle repeated generalized eigenvalues.

4.2. Doubling algorithm

Dynamic programming solves the infinite horizon problem by backward induction,

which leads to iterations on the Riccati equation (3.13). A doubling algorithm can be
viewed as a refinement of this approach. It preserves the idea of approximating the
solution to the infinite horizon problem by a sequence of finite horizon problems, but
Ch. 4." Mechanics of Forming and Estimating Dynamic Linear Economies 195

instead of increasing the horizon by one time period in each iteration, the number of
time periods gets doubled.
To see how this approach works, recall that the solution to the finite horizon problem
for periods 0, . . . , ( r - 1) can be viewed as a two point boundary value problem where
the initial state vector Y0 is set to some arbitrary vector y and the costate vector at
the terminal date #~- is set to zero. Suppose for simplicity that A y y is nonsingular. By
iterating on relation (3.14), we find that

[].0 41,

M =_ M -'~.

To approximate the matrix PAy, we solve (4.1) for the initial costate vector #0 as a
function of Y0. Partitioning M conformably to the state-costate partition, we see that

M l l Y r = YO, M21Yr = #o.

Therefore, the implicit initialization of the costate vector is

,o =

and our approximation for the matrix Py is given by M21(Mll) -1.

What is needed to implement this approach is a way to compute M when the horizon
r is large. Expanding the horizon one period at a time corresponds to multiplying the
matrix M - l , r times in succession. However, when r is chosen to be a power of
two, computations can be sped up by using

M - 2 k+t = ( m - 2 k ) M -2k. (4.2)

As a consequence, when r - 2J, the desired matrix can be computed in j iterations

instead of 2J iterations, which explains the name doubling algorithm.
Given that the matrix M - l has unstable eigenvalues, direct iterations on (4.2)
can be very unreliable. Clearly, the sequence of matrices { M -2k } diverges. One of
the features of a doubling algorithm is to transform these computations into matrix
iterations that converge. Another feature is that a doubling algorithm exploits the fact
that the matrix M is symplectic. Symplectic matrices have several nice properties, l°
We have already seen that their eigenvalues come in reciprocal pairs. In addition, the

t°There is a variation of the Schur algorithm that exploits the symplectic structurc of M. See pages
431-434 of Petkov et al. (1991) for an overview of this algorithm.
196 E.IE Anderson et al.

product of symplectic matrices is symplectic, and the inverse of a symplectic matrix

is symplectic. Moreover, for any symplectic matrix S, the matrices $21 ($11)-1 and
($11)-t Sl2 are both symmetric and

~22 = (~1I') -1 -l- ~21(~11)--1S12

= ( S l l ' ) -1 -~- S21 ( S l l ) - l S l l (Sll)-ls12 .

Therefore, a (2n × 2n) symplectic matrix can be represented in terms of the three
n × n matrices a --- ( S u ) - 1 fl __ ( S u ) - l & 2 , 7 = $21 ($11)-1, the latter two of which
are symmetric.
The doubling algorithm described by Anderson (1978) and Anderson and Moore
(1979) exploits such a representation by using the following parameterization of

M_2 ~: [ (O~k)-1 (Ozk)-l~k ]

= LT~(cx~) -1 o~k' + 7k(o~k)-lF3k I '
where the n × n matrices c~k,/3k, % are given by the recursions

ak+i = a k ( I + ~k"/k)-~ c~k,

/3k+l =/3k + ~k(I +/3k7~)-l/~kak ',
7 k + l = 7k + oe~'%(I +/3k%)-lak. (4.3)
While this alternative parameterization introduces a matrix inverse into the recursions
(4.3) that is absent in (4.2), the matrix I +/3k% being inverted is only n dimensional.
The nonsingularity of this matrix for all k is established in Kimura (1988). To initialize
the doubling algorithm, we simply deduce the implicit parameterization of M - I given
in partitioned form by

[ Avv -j Ayy-1ByR-IBy ' ]

M -1 = N - 1 L = [QyyAyy-I Q y y A y y - t B y R - I B . v ' + Ayy 1j ' (4.4)

which leads to the initializations

ao = Ayy, ~o = B v R - 1 B v ', 7o = Q v v

While our derivation took the matrix Avv to be nonsingular, Anderson (1978) argues
that the doubling algorithm is more generally applicable.
A convenient feature of this parameterization is that there are known conditions un-
der which the matrix sequences {ak}, {/3~}, {%} converge. When the pair (Avy , Dr)
is detectable, then the sequence {Tk} is nondec'reasing and converges to the matrix Pv.
(Here we are adopting the usual partial ordering for positive semidefinite matrices,)
As noted by Kimura (1988, Theorem 5), under the same restrictions, the sequence
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 197

{ilk} is nondecreasing and converges to a positive semidefinite matrix P.,~ associated

with a "dual" to the deterministic regulator problem.
The convergence of the {c~k} sequence is more problematic. Unfortunately, without
simultaneous convergence of {c~k}, it is not evident that iterations of the form given
in (4.3) can be used as the basis of a numerical algorithm. If this latter sequence
diverges, small numerical errors may get magnified, causing the resulting algorithm
to be poorly behaved. Kimura (1988) provides some sufficient conditions for {c~}
to converge to a matrix of zeros. His sufficient conditions are used to guarantee that
either P.v or Pv is nonsingular.
As we noted previously, a sufficient condition for Py to be nonsingular is that the
pair (Ayv, Dr) be observable. Sufficient conditions for the nonsingularity of the matrix
P~ are that (i) (Avv , By) is controllable; and (ii) (Avv,.Dv) is detectable [Kimura
(1988)]. Recall that controllability is often achieved by our a priori partitioning of
the state vector into endogenous and exogenous components. Thus for our purposes,
the restrictions guaranteeing the nonsingularity of P~ may be of particular interest.
Even so, detectability is too strong for some of our applications.
To apply a doubling algorithm more generally, we sometimes modify the control
problem by adding small quadratic penalties to linear combinations of the states and
controls. As long as these penalties are sufficient to guarantee that either Pv or P~ is
nonsingular, we are assured of convergence of all three sequences. Of course, there
is a danger that the penalty distorts the solution to the original control problem in a
nontrivial way, which must be checked in practice.

4.2.1. Initialization from a positive definite matrix

Instead of adding small quadratic penalties to the objective function for each calendar
date, we could add a terminal penalty to the finite horizon approximation to the con-
trol problem. From Chan, Goodwin and Sin (1984), it is known that iterations on the
Riccati difference equation converge to the unique stabilizing solution whenever the
Riccati equation is initialized at a positive definite matrix. I1 Initializing the Riccati
difference equation at a positive definite matrix is equivalent to imposing a terminal
penalty that is a negative definite quadratic form in the state vector. We will now show
how to initialize the doubling algorithm to impose a terminal penalty. This will permit
us to compute Py via a doubling algorithm for a richer class of control problems.
Consider first a finite time horizon problem with a quadratic penalty on the terminal
state. We select this penalty so that the terminal multiplier >~ = Poy~ for some
positive definite matrix Po. Then Eq. (4.1) is altered to be

A[I 1 Ey0]
M Po Y~ = #o ' (4.5)

llHere we axe using the fact that the pair (Ayy, By) is stabilizable and that there exists a solution to
the deterministic regulator problem when constraint (2.1) is imposed. The result follows from (i) and (iii)
of Theorem 3.1 and Theorem 4.2 of Chan, Goodwin and Sin (1984).
198 E.W. Anderson et al.

Build a matrix K

Then Eq. (4.5) can be rewritten as

K-1MKK-1 Po y~- = #o


- Poyo '


M* = K-1MK.

Partitioning M* consistently with the state-costate vector, the implicit initialization

of the costate vector is now

#o = PoYO + M12(Mll) y0,

and our approximation for Pu is given by M1*2(M~I)-1 + Po.

We are now left with computing the matrix M* when the horizon -r is very large.
Notice that

M* = ( K - 1 M K ) -~.

It is straightforward to verify that because M is symplectic, so is K - I M K . This

means that doubling algorithm (4.3) is applicable for computing ( I £ - l M K ) - 2 k ; how-
ever, the initializations must be altered. The new initializations can be deduced by
looking at the implicit parameterization of the symplectic matrix K - 1 M -1 K , and
they are given by

c~o = (I + B y R - 1 B y ' P o ) - I Ayy,

/3o = (I + B y R - 1 B v ' P o ) - 1 B v R - I Bv ', (4.6)
~YO: Qyv - Po + A y v ' P o ( I + B y R - 1 B y ' P o ) - l A v y .

Not surprisingly, the original initializations coincide with setting Po to zero in (4.6).
Ch. 4." Mechanics of Forming and Estimating Dynamic Linear Economies 199

There are two related advantages to these initializations over the previous ones.
First, the sequence {3'2} converges to P~ - Po whenever Po is positive definite. This
follows from the Riccati difference equation convergence described previously and
does not require that (Avv , Dy) be detectable. Second, the sequence {C/j} converges
and satisfies the bounds

0 < C/j ~ (/Do) - 1

even when (Ayy, Dy) is not detectable. 12 Although we do not have a complete char-
acterization of convergence of the resulting algorithm, all three matrix sequences
(including { % } ) are guaranteed to converge with these alternative initializations if
they converge with the original ones.
In summary, the steps for implementing the doubling algorithm are
(t) initialize ct0,/3o, and 7o according to (4.6);
(2) iterate in accordance with (4.3);
(3) form Py as the limit of {Tk} + Po.
We implement the doubling algorithm in FORTRAN, exploiting the fact that c/k and
% are symmetric matrices for all k. 13 We use two different settings for Po. To obtain
the original doubling algorithm, we set Po to zero; and to investigate the potential
advantages of including a terminal penalty, we set t='o to an identity matrix.

12The convergence and bound can be established as follows. Let {/3~} denote the sequence starting
from the original initialization. Then it is straightforward to show that

~j = (l + ~jPo)
* - ' [~j.

Exploiting the nonsingularity of.Po, the following equivalent formula can be deduced:

flj = (Po) -1 - ( P o + Pofl2 Po) - l ,

The reported bound follows immediately. The sequence {fl~ } is monotone increasing because it is a
subsequence of Riccati difference equation iterations for a dual problem initialized at zero. Therefore, the
sequence {flj } is also monotone increasing. Given the upper bound (Po) - I , this latter sequence must
13We iterate on (4.3) until

II~k -7k-Ill1 ~ ~ II~kll,,

where we set e = 1 x 10 -15 on a computer with a machine precision of 2 -52 ~ 2.2204 × 10 - 1¢,. Here
ltX[[l denotes the matrix l-norm of a matrix X:

IIXlll = ,~ax ~ Ix~jl •

200 E, W. Anderson et al.

4.2.2. Application to continuous time

As noted by Anderson (1978) and Kimura (1989), a doubling algorithm for a discrete-
time symplectic system can be used to solve a continuous-time Hamiltonian system.
Recall that in our discussion of solving control problems via a matrix sign algorithm,
we showed how to convert a discrete-time symplectic system into a continuous-
time Hamiltonian system. To apply a doubling algorithm, we want to "invert" this
mapping, e.g., given a Hamiltonian matrix H, we construct a symplectic pencil with
the same stable deflating subspace. The symplectic pencil associated with H is given
by A(I + H ) - (I - H). By adopting a very similar argument as before, we found it
easy to show that the generalized eigenvectors for the constructed pencil coincide with
the eigenvectors of the original Hamiltonian matrix H. Moreover, the classification
of stable and unstable (generalized) eigenvalues is preserved.

4.3. Matrix sign algorithm

In Section 3.3 we showed how to compute Py from the sign of tile Hamiltonian
matrix for a continuous-time state-costate system. To compute Py for a symplectic
pencil AL - N, we first form the Hamiltonian matrix

H = (L - N ) - l ( L + N)

and then compute sign(H). For this to be a viable solution method, we must be able
to compute sign(H) easily.
There are alternative matrix sign algorithms. An algorithm advocated by Roberts
(1980) and Denman and Beavers (1976) is to average a matrix and its inverse:

Go = H,

Gk+l = G k + ~ [ ( G k ) -1-Gk], k=0,1,.... (4.7)

To speed up convergence, Gardiner and Laub (1986) suggest using the recursion

Go = H,
Gk+l = ~£ck(Gk q:- ek2Gk-I),


e-k = t detGkt 1/n (4.8)

Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 201

Bierman (1984) and Byers (1987) propose a further refinement, which exploits the
fact that the matrix Gk is a Hamiltonian matrix for each k. Recall that if H is a
Hamiltonian matrix, then J H is symmetric where


JGk+l = 1 (JGk + ek2JJGk-lJ), (4.9)

where ek is either set to one as in the original sign algorithm or set via for-
mula (4.8) using JGk in place of Gk. Consequently, it suffices to compute the se-
quence of symmetric matrices {JGk} recursively via (4.9) starting from the initializa-
tion J H . 14
in summary, the steps for implementing a matrix sign algorithm are
(1) form the matrices L and N in (3.4);
(2) compute G, the sign of ( N - L ) - I ( L + N);
(3) compute Pv by solving the over-determined system

GI2 ] [GII +/]

G22 @ I Py = - L a21 J

for /~v"
For our numerical comparisons, we compute the sign of G by iterating on (4.9) until
convergence with ek = I detJGktl/'~. 15 To compute (JGk) -I we use the symmetric
inversion routines DSIFA and DSIDI from LINPACK. We solve (4.10) for Pv using
least squares.
As noted in Anderson (1978), the original sign algorithm (4.7) also can be viewed
as a doubling algorithm. Interpreted in this manner, it uses (at least implicitly) an
alternative parameterization of the symplectic matrix M -1 to that used in doubling
algorithm (4.3). Both recursions entail inverting a matrix. While recursion (4.9) re-
quires that a symmetric (2n x 2n) matrix be inverted in each iteration, the doubling
algorithm (4.3) requires that a nonsymmetric n x n matrix be computed at each
14Kenney, Laub and Papadopoulos (1993) and Lu and Lin (1993) discuss further improvements to the
matrix sign algorithm.
15More precisely, we iterate on (4.9) until

}lJGk JGk-llll ~ ellJGktll,


where e = 1 × 10 -15 .
202 E.W. Anderson et al.

5. Solving the augmented regulator problem

So far, we have shown how to compute the matrix Fy, which provides us with the
optimal control law for the deterministic regulator problem. This matrix also gives us
a piece of the solution to the augmented control problem and, hence, to the problem
of interest: the discounted stochastic regulator problem. The missing ingredient is the
matrix F,, where the optimal control law for the augmented regulator problem is
given by vt = ~FvYt - F~zt. In this section, we show that F~ can be calculated by
solving a particular Sylvester equation.
We start by forming a Lagrangian modified to incorporate the exogenous state
vector sequence {zt}:
/2 = - ~ [Yt'Qyvyt + 2yt'Qy~& + vt'tgvt + 2pt+l'(Ayyyt + Ay~z, + Byve - yt+,)],

where the evolution of the forcing sequence is given by

Zt+l = Azzzt. (5.1)

First-order necessary conditions for the maximization of/2 with respect to { V t}t=0
and {Yt}t=o are
vt: Rvt + ByZ#t+l = 0, t ~> 0, (5.2)
Yt: #t = Q,yyYt q- QyzZt + Ayy'ldt+l, [~ ) O. (5.3)
Solve Eq. (5.2) for vt; substitute it into the state equation; and stack the resulting
equation along with (5.3) and (5.1) as composite system

L a |Pt+1 --N ~ Pt ,
k Zt+l Zt


Ay u 0 Ay, ]
L a =_ Ayy I , Na- -Qyy I (5.4)
0 0 0 A~

As with the deterministic regulator problem, the relevant solution is the one that
stabilizes the state-costate vector for any initialization of Y0 and z0. Hence we seek a
characterization of the multiplier Pt of the form
Ch. 4: Mechanics of" Forming and Estimating Dynamic Linear Economies 203

such that the resulting composite sequence [Yt' #t' zt']' is in the stable deflat-
ing subspace of the augmented pencil AL a - N a. Assuming for the moment that a
solution P exists, it must be the case that P = [ Py P , ], where Pv is the Riccati
equation solution that was characterized in Section 3, and Pz is a matrix that has
not yet been characterized. To see why this must be the case, note that the solution
to the augmented regulator problem with z0 = 0 coincides with the solution to the
deterministic regulator problem. We have previously shown that Pu is a matrix, such
that all vectors in the deflating subspace of the pencil AL - N can be represented
as [ y' ytpy ]'. When the forcing sequence is initialized at zero, so it remains there
for all t, it must also be the case that [ y ' y'Py 0 ] ' is in the stable deflating sub-
space of the augmented pencil AL ~ - N a. This justifies our previous claim that the
solution to the deterministic regulator problem gives us a piece of the solution to the
augmented regulator problem.
To deduce the control law associated with the matrix P , we substitute P into (5.4),
which yields

L~ + = Na Pyyt q- Pzzt. •
k zt + 1 zt

If we write the three equations in this composite system separately,

(I + ByR-tBytPy)Yt+l + ByR-IBytPzzt+I = Ayyyt + Ayzzt,

Zt+l = Azzzt. (5.5)

Substitute tile last equation into the first and solve for Yt+l:

Yt+l = (I q-ByR-1By'Py) -1 [Auuyt + (A w - B v R - I B u ' P ~ A ~ ) z t ] .

It follows from relation (3.9) that this evolution equation for Yt can be rewritten as

Yt+l = (Avv - BuFv)yt + (Ay~ - BvF~)zt, (5.6)

where F u and Fz are given by

Fy =_ (R + B u! PvBy) --1 B v ! PvAu>


For the reasons given previously, our construction of F v coincides with (3.11) used to
represent the optimal control law for the deterministic regulator problem. Stability of
204 E.W. Anderson et al.

the state vector sequence {gt} is guaranteed by evolution Eq. (5.6) because the matrix
Avv - B vFy is the same matrix that appears in the state evolution equation for the
deterministic regulator problem under the optimal control law. Since the solution to
the deterministic regulator problem is stable by design, the eigenvalues of A v u - B y F y
have absolute values that are strictly less than one. The optimal control law for the
augmented regulator problem is given by

vt = -- Fyyt - Fzzt.

The matrix F~ can be computed using formula (5.7) once we know Pz. We
now show that P~ is the solution to a Sylvester equation. Premultiply (5.6) by
Ayy' Py:

A y v ' P y y t + l = A y f f Py(Avv - B v F v ) y t + A y v ' P y ( A v z - ByF~)zt. (5.8)

Using formula (5.7), we rewrite the coefficient matrix on zt as

A y v ' P y ( A y z - F~) = (Avv - B y F y ) ' ( P y A v z + P ~ A ~ ) - A v v ' P ~ A ~ .

To obtain an alternative formula for this coefficient, substitute the last equation of
(5.5) into the second equation and solve for AvffPyyt+l:

Avv' Pvyt+1 = ( Pz - Qv~ - A , d PzA~z)Zt + (Pv - Qvv)yt. (5.9)

Equating coefficients on zt in (5.8) and (5.9) results in

(Avv - B v F v ) ' ( P u A w + P~Azz) A v v ' P ~ A z z = Pz - Qvz - A v v ' P ~ A z z .

Rewriting this in the lbrm of a Sylvester equation (in the unknown matrix P~), we
have that

P~ : Q w + (Avv - B v F v ) ' P v A w + (Avy - B v F v ) ' P ~ A ~ . (5.1o)

As we noted previously, the matrix (Ay v - B v F y ) has only stable eigenvalues.

Also, we assumed that the matrix Azz has only stable eigenvalues (Assumption 4).
These restrictions are sufficient for there to exist a unique solution Pz to (5.10). Up
to now, our discussion proceeded under the presumption that there exists a matrix P,
such that by settingpt = p [LztJ
yt ], we stabilize the state, vector sequence. We can now
work backwards using the (unique) solution to the Sylvester equation to show that
indeed such a matrix P does exist.
Ch. 4: Mechanics of Forming and Estimating DynamicLinear Economies 205

6. Computational techniques for solving Sylvester equations

A Sylvester equation is represented by

M = W + SMT, (6.1)

where the matrices W, S, and T are specified in advance and M is the matrix to be
computed. Consistent with (5.10), the matrices S and T have stable eigenvalues. 16
There is a variety of ways to depict the solution to a Sylvester equation. One is to
vectorize (6.1) as

[I - T ' ® S]vec(M) = vec(W), (6.2)

where vec(.) denotes stacks of the columns of a matrix argument. [To derive (6.2)
from (6.1), use the identity v e c ( S M T ) = I T ' ® S]vec(M).] Hence vec(M) is the
solution to a linear equation system. Alternatively, M is given by the infinite sum


M = E Sj WTJ" (6.3)

This representation can be deduced by iterating on Eq. (6.1), starting from any initial
matrix with the appropriate dimensions.
We consider two types of algorithms for computing M:
(i) Hessenberg-Schur algorithm;
(ii) doubling algorithm.
The Hessenberg-Schur algorithm uses a Schur decomposition of the matrix T to
convert a single Sylvester equation to a collection of much smaller Sylvester equations,
each of which can be vectorized as in (6.2). A Hessenberg decomposition of the matrix
S is used further to simplify the calculations. The doubling algorithm is an iterative
algorithm that approximates the infinite sum on the right-hand side of (6.3) by a finite
sum. Similar to the doubling algorithm for solving a Riccati equation, the number of
terms included in the finite sum approximation "doubles" at each iteration.

6.1. The Hessenberg-Schur algorithm

As suggested by Bartels and Stewart (1972), one strategy for solving Sylvester equa-
tions entails block triangularizing the matrices T and/or S. We follow Golub, Nash

16Weoffer the following word of caution (or apology)to the reader. We are compelled to recycle some
of the notation used in previous sections.
206 E.W. Anderson et al.

and Van Loan (1979) by forming a Schur decomposition of the matrix T: V ' T V = T,
where V is an orthogonal matrix and T is upper block triangular with row and column
blocks that are either one or two dimensional (see Section 4.1 for a formal definition).
Postmultiply Sylvester equation (6.1) by V and rewrite the equation as

M = W + SMT, (6.4)

where ~ r = M V , W = W V , and S = S. Notice that (6.4) is in the form of a

Sylvester equation in the inatrix M.
The block triangularity of T can now be exploited to reduce (6.4) into m smaller
Sylvester e~t.uations, where m is the number of row and column blocks of T. Write
the matrix T in partitioned form as

t ...

[i "



A ~

Use the column partition of W to partition M and W, and let M j and W j denote the

corresponding jth partitions. Decompose Sylvester equation (6.4):

M1 = Wl + SM1T11, (6.5)

~rj = WJ + S ~ MkTkj + S M j T j j , J = 2 , . . . ,m. (6.6)


Notice that (6.5) is a SylvesteAr equation in M~ and that (6.6) is a Sylvester equation in
M j as long as the matrices Mk for k = 1 , 2 , . . . , j - 1 have already been computed.
Thus these m Sylvester equations can be solved sequentially as linear equations using
vectorization (6.2).
An additional refinement advocated by Golub, Nash and Van Loan (1979) entails
taking a Hessenberg decomposition of the matrix S. 17

DEFINITION. The Hessenberg decomposition of the square matrix S is an orthogonal

matrix U and a matrix S that has all zeros below the first subdiagonal, such that
S = USU'.

In addition to postmultiplying Eq. (6.1) by V, we now also premultiply this equation

by U'. Equation (6.4) continues to hold with M = U ' M V , W = U ' W V , and
S = U ' S U . This Sylvester equation can still be decomposed as in (6.5) and (6.6). With

17Alternatively,we could take the Schur decompositionof S as proposedby Barrels and Stewart (1972).
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 207

in Hessenberg form, we can solve these latter Sylvester equations more efficiently
using an equation solver designed for Hessenberg systems. 18
In summary, the steps for implementing a Hessenberg-Schur algorithm for com-
puting P~ are
(i) form the matrices W = Qvz + (Ayy - B y F y ) t P y A y z , S = (Ayy - B y [ v ) ' , and
T = Az~;
(ii) form a Hessenberg decomposition S = U S U ~ and a Schur decomposition
T -=- V T W ;
(iii) compute the solution M to (6.5) and (6.6) and form P~ = U M W .
Since the Hessenberg decomposition of a matrix can be computed faster than the real
Schur decomposition, one should always arrange the Sylvester equation so that the
Hessenberg decomposition is taken of the matrix (Ayy - B y F y ) ' or Azz, whichever
has more entries. The steps just described should be implemented if there are more el-
ements in the vector Yt than zt. If zt has more elements, then the alternative Sylvester

p~' = Qy~' + A v z ' P y ( A u u - BuFy ) + Az~'P~'(dvu - B v F y )'

should be solved for the matrix P~.

In the numerical comparisons that follow, we form the Hessenberg decomposition of
a matrix using MATLAB subroutine HESS and the Schur decomposition of a matrix
with SCHUR. We solve Hessenberg systems using the routines HSFA and HSSL,
which are part of the package described in Gardiner et al. (1992). 19

6.2. Doubling algorithm

The doubling algorithm for Sylvester equations iterates

OLk+ 1 z OLkO:k~

/3k+1 =/3k/3k, (6.7)

7k+1 = 7k + akTk/3k
to convergence, where a0 = S, •0 = T, and 70 = W. By repeated substitution, it can
be shown that

7k = S J W T j.

181nteresting variations on the Hessenberg-Schur algorithm have been proposed by Hammarling (1982)
and Gardiner et al. (1992).
19See pp. 364-370 of Golub and Van Loan (1989) for a discussion of how to compute the ltessenberg
208 E.g~ Anderson et al.

In other words, each iteration doubles the number of terms in the sum. 2°
To use this doubling algorithm to compute P~
(i) initialize C~o = (Ayy - ByFy) t, ~o = A~z, and 70 = Qyz + (dry - B y F y ) ' P v A w ;
(ii) iterate in accordance to (6.7);
(iii) form Pz as the limit of {'Yk}.
We implement the doubling algorithm in FORTRAN. 21

7. Distorted economies

Some of the algorithms described previously are directly applicable to solving models
whose equilibrium quantity allocations are not the solutions to optimal resource allo-
cation problems. To illustrate this point, we use a simplified version of McGrattan's
(1994) model of a distorted economy. 22 Consider a setup with a representative agent
who chooses a control sequence {vt} to maximize

I 2 ! ^


subject to

Yt+l = A y y Y t + Ayfjgt + B y v t ,

2 + Iwl 2) < (7.1)


where the sequence {~)t} is viewed by the agent as being beyond his control when
making decisions. As an equilibrium condition, ~)t is an exact function of yt and vt:

Yt = ~2yt q- ff~vt. (7.2)

In formulating the decision problem for the representative agent, we have abstracted
from uncertainty and used analogous tricks to those described earlier for eliminating

2°This algorithm is a slight generalization of the doubling algorithm for Lyapunov equations discussed
in Anderson and Moore (1979). A Lyapunov equation is a Sylvester equation in which S = T ~.
21We iterate on (6.7) until

II'~k - "Yk-llll <~ ~ II'Ykll~,

where we set e = 1 × 10 -15 .

22In Appendix B.3, we take another version of McGrattan's formulation and differentiate the equilibrium
law with respect to parameters in the control problem and equilibrium conditions.
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 209

discounting and cross products between states and controls. [See McGrattan (1994)
and Appendix B.3 of this paper for a more complete treatment.] Also, we have zeroed
out the forcing sequence {zt}, so this setup should be viewed as a distorted equilibrium
counterpart to the deterministic regulator problem.
To define an equilibrium for this model, we introduce a process {y~'} that in equi-
librium coincides with {Yt}. This additional process is used to capture the perceived
evolution of {~)t} by economic agents in making their decisions. Formally, the per-
ceived evolution equation is given by

Yt+l = A Yt ,

Yt = ~ * Y t*,
where the eigenvalues of A* are assumed to have absolute values that are strictly less
than one. Adding this evolution equation to the decision problem of the private agent
is sufficient to make his problem a fully specified deterministic regulator problem.
Writc the solution to this decision problem as

Vt = - F y y t - F ; y ; . (7.3)

Then a rational expectations equilibrium is a specification of (Fv, F~, A*, D*) such
A* = Ayy + Avgf2 - (Avgg~ + By)(Fy + r ; ) ,
f2* = g? - O(Fy + F;),
where control law (7.3) solves the decision problem of the private agent.
As an initial step in solving for an equilibrium, we obtain first-order necessary
conditions for the private agent's control problem:

vt: Rut + By~#t+l = O, t >~O, (7.4)

Yt: t~t = Q y y y t + Q y g y t -}- A y y t # t + l , t >>,O, (7.5)
where {#t} are Lagrange multipliers associated with the constraint Eq. (7.1). At
this stage, we are free to substitute for ~)t from equilibrium condition (7.2). Solving
Eq. (7.3) for vt, substituting it and Eq. (7.2) into Eqs (7.1) and (7.5), and rearranging


0 A' 1 , N= _ 01 ,
210 E.W. Anderson et al.

and A = Ayy + A v g . , Q. = Qvy + Qy#~, ~ = By + Av#~P, and A = Avv -

B y R - t ~ Q z p. Note how these equations generalize (3.4) to a distorted equilibrium
model. When distortions are active, the pencil AL - N may fail to be symplectic, so
the eigenvalues do not necessarily occur in reciprocal pairs. When the eigenvalues can
be split with half inside the unit circle and half outside and the analog of Vii in (3.19)
is nonsingular, then the deflating subspace and matrix sign methods described earlier
can be used tocompute the unique stable equilibrium. 23 Under the same conditions,
if either A or A is nonsingular and well conditioned, then invariant subspace methods
also can be used. Finally, Anderson (1995) describes a generalization of the doubling
algorithm for Riccati equations that can be used to solve distorted equilibria. Since the
pencil is not symplectic, this generalized doubling algorithm includes an additional
For economies with a forcing sequence {zt} with first-order dynamics, there is an
analogous formulation of a distorted economy equilibrium. As with the augmented
regulator problem, the equilibrium can be computed in two steps. First, a distorted
equilibrium for z0 set to zero can be computed using one of the methods described
above. Then the full equilibrium can be deduced by solving a Sylvester equation
analogous to that deduced for the augmented control problem. The Hessenberg-Schur
algorithm and the doubling algorithm described in Section 6 are both applicable in
this second step.

8. Example economies

In preparation for our numerical work, we describe three examples with features that
"stretch" our algorithms to the boundaries of their domains of applicability.

8.1. A model of permanent income with habit persistence

Our first example is an economy with two interacting unit roots in the endogenous
dynamics. As in Hall (1978), Flavin (1981), and Sargent (1987), one unit root comes
from the permanent income character of the model. The technology is specified so that
the rate of return on capital and the subjective rate of time discount are equated. As in
Hansen (1987), Becker and Murphy (1988) and Heaton (1993), we use an extended
version of the permanent income model to accommodate preferences that are not time
separable. The second unit root occurs because of the special way we model habit

23When applying matrix sign methods, one should iterate on (4.7) or (4.8) instead of (4.9), since the
matrix J ( L - N ) - t ( L + N ) is not, in general, symmetric.
Ch. 4: Mechanics (~f Forming and Estimating Dynamic Linear Economies 211

There is a single consumption good ct, a single investment good it, a single physical
capital stock kt, and a single household capital stock, ht, in each time period. The
household capital stock is constructed to be a geometric average of current and past

ht = 0 . 9 h t - i + O . l c t ,

where 0.9 dictates the geometric decay in the average. We capture habit persistence
by introducing a service process:

st = Ct -- h t - t .

One source for a unit root in the endogenous dynamics is that the magnitude of the
time t service is the difference between current and an average of past consump-
The production technology is given by the two relations:

ct + it = O.lkt-1 + dr,

kt = 0.95kt-1 + it.

To provide a permanent income character to this model, we set the subjective discount
rate/3 = 1/1.05.
The preference shock process is restricted to be constant over time (b = 30),
and the technology shock process {dr} is a first-order autoregression with mean 5
and autoregressive coefficient 0.8. We represent these processes using the setup of
Section 2.4 by introducing an exogenous state vector ~t with two components. Recall
that the exogenous state vector process is assumed to have first-order dynamics. The
autoregressive matrix for this process is given by

00 ]
where the first component of zt is initialized at one and remains constant over time.
While the second component of ~t can be subject to shocks in each time period,
certainty equivalence makes the magnitude of the uncertainty inconsequential for
solving the model. Hence it is unnecessary to specify the matrix Cz. The selection
matrices Ub and Ua are given by Ub = [30 0] a n d U a = [5 1].24

24In this economy,there are no intermediate goods 9t. As suggested in Section 2.4, we stilJ use it as
the control vector, and we can clearly solve for ct as a linear function of the control and state vectors.
212 K W . Anderson et aL

For this particular economy, there are potential problems in applying two of tile
algorithms we described in Sections 3 and 4. Since the economy has repeated unit roots
in the endogenous dynamics, an invariant subspace method that uses an eigenvector
routine designed for distinct eigenvalues might give a poor approximation to the
solution. Also, this is an economy in which the square summability constraint (2.1)
is binding. In other words, it is not optimal to stabilize the endogenous state vector
process in the absence of such a constraint. As a consequence, Riccati difference
equation iterations starting from the zero matrix converge to the wrong solution, as
does the corresponding partition of the Po = 0 doubling algorithm.
As a potential remedy for both of these pitfalls, we "approximate" our economy
by one in which there is a very tiny adjustment cost for physical capital. The cost is
captured by introducing a single intermediate good 9t, such that

¢ i t -- g t = O,

where we set ¢ = 1 x 10 -7. This small adjustment cost is enough to eliminate

the repeated unit roots in the endogenous dynamics. Moreover, it makes (Avy , Dr)
detectable, so that it is optimal to stabilize the endogenous state vector process. Since
the pair (Ayy, By) is controllable, this small adjustment cost is enough to guarantee
convergence of the Po = 0 version of,the doubling algorithm. One of the issues
considered in our numerical experiments is how well this "fix up" works in practice.
Does the introduction of small adjustment costs make either the eigenvector algorithm
or the doubling algorithm a viable method for solving the original control problem?
We shall also study this economy with the adjustment costs set equal to zero and with
the 19o = I version of the doubling algorithm.

8.2. A model of education

This example is a version of a time-to-build (or time-to-educate) model of wage skill

differentials that was formulated by Siow (1984). Siow's model interprets the premium
on educated labor as a present-value-equalizing differential required to compensate
for the income foregone during training years. To accord with the framework of Sec-
tion 2.4, we reformulate a version of Siow's model as an optimal resource allocation
Suppose there are three skill levels of labor: "low skill", "medium skill", and "high
skill". We adopt the notational convention that low skill work is engaged in home
production, while the other two skill levels produce market goods. We assume that
it takes four periods to train skilled workers and eight periods to train highly skilled
workers. Trainees are not permitted to switch training programs. This gives rise to
gestation lags in the production technology.
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 213

Let im,t denote the number of workers who choose the medium-skilled training
program and il~,~ the number who choose the high-skilled training program at time t.
Let km,t and kh,t be the corresponding stocks of workers. Then

km,t : 0.97krn,t-1 + 0.974irn,t-4,

kh,t = 0.97kh,t-1 + 0.97sih,t-8,

where (1 - 0.97) is the exit rate from the labor force. To capture this gestation lag
with the first-order specification of Section 2.4, we include in kt the following:

kt = [ krn,t kh,t 0.973ira,t-3 0.972i,m,t-2 0.97ira,t-1 im,t 0.977/h,t-7

• .. 0.97ih,t-1 ih,t] I.
The first-order evolution equation for {kt} can now be constructed in the obvious way.
Hence to capture the delays in the dynamic technology, we are compelled to augment
the endogenous state vector. This augmentation is the source of the singularity in the
matrix Ayy. The control vector is it = [ im,t ih,t ] t.
The rest of the people engage in home production. Let dl,t denote the time t flow
of newborn or raw labor. The difference

Cl,t = dl,t - im,k - ih,k

is the flow of workers into home production. We include cl,t as a component of the
consumption goods vector for notational convenience. In addition to cl,t, there are
two other components to ct: goods produced by medium-skilled workers and goods
produced by high-skilled workers. These goods are produced according to the (linear)
constant returns to scale technology:

c~r~,t = 0.7kin,t-l,
Ch,t : 0 . 9 k h , t - l .

To capture the disutility of working, we introduce two intermediate goods that sa-

9m,t = ]grn,t-l~
ffh,t : ]~h,t-1;
and to capture costs associated with matching new entrants with training programs,
we introduce two additional intermediate goods that satisfy

~)~,t = O.O002im,t,
9h,t = O.O003ih,t.
214 E.W. Anderson et al.

When these constraints are combined, the technology for producing intermediate goods
and consumption goods is given by

1 0 0- 0 0 0 0
1 0 0 0 0 0
[ gmt ]
01 Cl,t ] 0 0 0 0
0 0 Cra,t + I 0 0 0
0 0 Ch,t 0 1 0 0
0 0 0 0 -1 0 gh,t
0 0. .0 0 0 -1
1 1 o 0 -1
0 0 0 0
0 [i.mt] 0 0.9 0
+ 0 0 = 1 0 L'[km't-1]kht-1q- 0 dl,t.
0 0 L ~ht 0 1 O
0.0002 0 0 0 0
0 0.0003 0 0 .0.
Consider next the household technology. Recall that by our notational convention,
cl,t denotes the quantity of new entrants into household production. The stock of such
workers at time t (after including the new entrants) is denoted ht. This "household
capital stock" evolves according to

ht = 0.97ht_1 + Cl,t,

SO the depreciation factor is the same as for the other two types of labor. Consumption
services 81,t are produced according to the linear technology

81,t = 0.5ht-1.

To capture the disutility of working in the household, we introduce a second service

82,t = - - h t - 1 ;

and to capture the (utility) costs to matching new entrants to the household technology,
we introduce a third consumption service

s3,t = --O.O001cl,t.
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 215

All total, there are five components to the consumption service vector st because
we also include the consumption goods produced by medium-skilled and high-skilled

st = [ sl,t s2,t s3,t c,~,t oh,t]'.

The househotd's subjective rate of time discount is /3 = 1/1.05. Forcing process

{dl,t} is given recursively by
Pt = ~ OjPt-j -1- zt,
dl,t ~- Pt-18, (8.1)

where Pt are the new births at date t and the 0j's are set to match the birth rates in
the United States in 1990 as reported in the American Almanac: Statistical Abstract
of the United States 1993-1994. We abstract from long term population growth by
appropriately scaling the Oj's to sum to o n e Y The process {zt} has a first-order
autoregressive representation with coefficient 0.9. The variable pt-18 occurs with an
18 period lag in the second equation of (8.1) because we assume that it takes 18
periods (years) before a new born person is ready to enter a training program or
produce household goods.
The preference shock process has three nondegenerate components:

bt = [bl,t 0 0 bm,t bh,t]'.

The zeros in the preference shock process bt are associated with (dis)services to
working in the household and to matching labor to household production. The three
nondegenerate components are independent first-order autoregressive processes aug-
mented by 300. For each scalar autoregression, the autoregressive coefficient is 0.9.

8.3. A model of cattle cycles

In this subsection, we present three versions of Rosen, Murphy and Scheinkman's

(1994) model of cattle cycles. The versions differ according to whether the time units

25Formally, the 0j's were constructed as follows. We took birthrates for women from Table 93 of
the American Almanac: Statistical Abstract of the United States 1993-1994 in the year 1990 and divided
by two. Since birthrates are only recorded for women grouped in five year age brackets, we interpolated
linearly from the midpoints of each age bracket. Birthrates for ages 12 and 47 were set to zero when
doing this interpolation,and birth rates up to age 12 were set to zero. The resulting birthrates imply an
autoregression with an explosive root that induces geometaic growth in population. We then sealed the
birth rate parameters by the inverse of the growth factor raised to the appropriate powers to eliminatethe
growth. The resulting autoregressiveprocess has a unit root by construction.
216 E.W. Anderson et aL

are years, quarters, or months. To match the setup of Section 2.4, we reformulate
Rosen, Murphy, and Scheinkman's market equilibrium model as an optimal resource
problem. We initially describe the yearly model. For our numerical speed and accu-
racy comparisons with the annual version of this model, we estimated some of the
parameters using the methods to be described in subsequent sections. The parame-
ters for the versions of the model at the quarterly and monthly timing intervals were
deduced in ways described below.
Let kb,t denote the total stock of breeding cows. Each such animal gives birth to
r/calves, and calves become part of the adult stock after two years. For simplicity,
we set the death rate of cattle to zero. Therefore, the law of motion for the breeding
stock is given by

kb,t = kb,t-I + ~kb,t-3 + it, (8.2)

where it denotes deletions from the breeding stock due to slaughtering. Stacking the
breeding stocks so as to represent this evolution equation as a first-order system, we

kbt] [i0!][ bl I [!]

kb,t- 2
= 0
kb,t- 3
-t- it.

Consumption ct = - i t . We use one intermediate good to capture slaughtering costs

and three additional ones to capture the holding costs. Holding costs differ depending
on whether the animal is a calf, a yearling, or an adult. Let

gl,t = ect + (1/e)d~,t,

g2,t = ekb,t-1 + (~/lrl/e)dh,t,
g3,t = ekb,t-2 + ("/2rl/e)dh,t,
g4,t = 6kb,t -}- (1/e)dh,t. (8.3)

As specified, the holding and slaughtering costs are quadratic. The parameter e is set
to a small positive number to approximate the linear cost structure used by Rosen,
Murphy and Scheinkman (1994). The parameters 71 and 72 dictate the holding costs
for calves and yearlings, respectively, relative to those for fully grown animals. For
instance, the approximate holding period cost is dh,t for an adult, 3`ldh,t for a calf,
and 3`2dh,t for a yearling. In our computational experiments, the parameters 3`1 and
3'2 are set to 1/3 and to 2/3, respectively. Substituting for kb,t in (8.3) using (8.2) and
Ch. 4." Mechanics of Forming and Estimating Dynamic Linear Economies 217

I;] [; o o ;]
stacking the equations for consumption and intermediate goods into a system, we get

0 Ct + o° i,+ ~ °°
1 o ~ ]g2,t
0 0 1 [ 93,t
0 0 g4,t

= e

[i°i1 [i1
o o
1 0

+ (l/e)

Consumption goods and services are related trivially by

ds,t ]
dh,t ] "

where cq is positive. As a consequence, preferences for consumption are time sepa-

rable, and the slope of the Frisch demand function for beef is - c q .
The exogenous processes are specified as follows. The preference shock process
is given by the constant (c~0/c~l). The parameter C~o is the intercept in the Frisch
demand function. The two technology shock processes {ds,t} and {dh,t} are each
scalar first-order autoregressive processes with unconditional means #8 and #h and
autoregressive coefficients Ps and Ph, respectively.
As a device for proliferating endogenous state variables, we construct analogous
quarterly and monthly versions of a cattle cycle model. In so doing, we abstract from

Table 4.1
Parameter values for yearly, monthly, and quarterly formulations
of the cattle cycles model

Parameters Yearly Quarterly Monthly

/3 0.960 0.990 0.997
c~o 146.0 36.5 12.17
cq 1.270 0.318 0.106
1 + r/ 1.938 1.180 1.057
Ph 0.888 0.971 0.990
Ps 0.699 0.914 0.971
/zh 37.00 9.250 3.083
/~s 63.00 63.00 63.00
l x 10-o4 2.5 x 10-o5 8.33 × 10 - 0 6
218 E.W. Anderson et al.

any (realistic) periodic specification whereby, for example, a certain season of the
year is designated as a calving season. Also, we design the higher frequency models
to be only roughly compatible with the annual model. The parameter values selected
for all three versions are reported in Table 4.1. The higher frequency parameters are
obtained from the following algorithms. Let "r denote the number of seasons in a year
(either four or twelve). The higher frequency versions of t , 1 + r/, Ph, and Ps are
obtained by taking the annual parameters and raising them to the power 1/~-. The
higher frequency versions of a , e, and #h are constructed by dividing the annual
parameters by -r. The parameter #s is the same for all versions of the model. Finally,
as we proliferate time periods, we extend the number of periods it takes for a calf to
become a cow. Instead of two periods, it now takes the animal 2T periods to be an
adult. Accordingly, there are 2"1- cost parameters 7j, J = 1 , . . . , 2T. As in the annual
model, we assumed these parameters increased linearly from zero to one. Hence
~/j = j/(~- + 1).

9. N u m e r i c a l c o m p a r i s o n s

In this section, we study the performance of algorithms for computing solutions to

the optimal resource allocation problems described in Section 8. We report results
for six different economies: two permanent income/habit persistence economies, three
cattle cycle economies, and one time-to-educate model. Recall that the two permanent
income economies are very similar except the second one introduces a very small
adjustment cost term so that the resulting (Avy , Dr) is detectable. We label these
two economies Permanent Income and Permanent Income (with adjustment costs)
in the subsequent tables. The three cattle cycle economies differ with respect to the
presumed decision time interval. The three c a r t e cycle economies are calibrated to be
yearly, quarterly, and monthly decision periods and are labeled Yearly Cattle Cycles,
Quarterly Cattle Cycles, and Monthly Cattle Cycles, respectively. Finally, the time-
to-educate economy is labeled Education in our tables.
Table 4.2 gives the number of endogenous and exogenous state variables for each of
six optimal resource allocation problems. 26 There are four exogenous state variables
for the cattle cycle economy because we included a state that could be used to represent
a preference shock. The autoregressive parameter for this state was set to zero. Since
the gestation time period for a newborn calf to become a cow is held fixed across the
three cattle cycle economies, the number of endogenous state variables is larger for
Monthly Cattle Cycles than for the other two cattle cycle economies. Recall that the

26Wealso give approximatematrix one-norms for the true solutions. For the PermanentIncome economy
we used the true solutions to calculate the norms. For the other economies we used the solutions computed
by the Riccati Iteration algorithm and the doubling algorithm for Sylvester equations. Given the tables
that follow, these norms allow a reader to construct a relative measure of accuracy for the candidate
Ch. 4: Mechanics of Forming and EstimatingDynamic Linear Economies 219
Table 4.2
Number of state variables
Economy Endogenous Exogenous Norm of Py Norm of Pz
states states
Permanent income 2 2 2.45 × 10+m) 2.08 × 10 +o2
Yearly cattle cycles 3 4 1.37 × 10+m) 2.88 × 10+o2
Quarterly cattle cycles 9 4 3.53 x 10+°° 1.26 x 10+o3
Monthly cattle cycles 25 4 9.67 x 10+~) 3.93 x 10+o3
Education 15 52 8.76 x 10+°1 3.77 × 10+o4

number of exogenous state variables and endogenous state variables is large for the
Education economy because of the presumed population dynamics and the number
of time periods it takes to get highly skilled.
Associated with each of the six optimal resource allocation problems is a Riccati
equation and a Sylvester equation that are solved in finding the optimal decision rule.
We report the Riccati equation comparisons in the first subsection and the Sylvester
equation comparisons in the second subsection. Recall that Sylvester equations take
as one of their inputs a matrix constructed from the solution to the corresponding
Riccati equation. To simplify comparisons, we use the same input matrix for each of
the two Sylvester equation algorithms.

9.1. Solutions to Riccati equations

We compare the performance of seven of the Riccati equation solving algorithms

described in Section 4. We consider two invariant subspace algorithms: one is based
on an eigenvector decomposition labeled Eigenvector and the other on the Schur
decomposition labeled Schur in the tables described below. We study two deflating
subspace algorithms that are generalizations of the two invariant subspace algorithms
designed to permit the state evolution matrix (Avu) to be singular. (In fact, this matrix
is singular for the Education resource allocation problem.) We label these deflating
subspace algorithms Generalized Eigenvector and Generalized Schur. We investigate
two doubling algorithms that differ with respect to how they are initialized. The
first doubling algorithm uses the standard initialization (Po = 0), and the second
one initializes the doubling algorithm so that the terminal state and costate vectors
coincide (Po = I). Since the (Po = 0) doubling algorithm gives the wrong solution to
the Permanent Income resource allocation problem, it is not included for that control
problem. Both of these algorithms are labeled Doubling with the specification of Po
given in parentheses. Our seventh algorithm is the matrix sign algorithm and is labeled
accordingly. As a benchmark, one of the algorithms iterates on the Riccati difference
220 E.W.. Anderson et al.

equation from dynamic programming. 27 This algorithm is labeled Riccati Iteration in

the tables.
Table 4.3 reports comparisons of the performance of the eight algorithms
used to compute candidate solutions ( P ~ , F ( P ~ ) ) to the associated determinis-
tic regulator problem's given the inputs ( A y u , B v , Q v v , R). Here F ( P ) -= ( R +
B y ' P B y ) - 1 B y t P A u v . 28 To measure the accuracy of the computed solutions, we use
the matrix one-norm of the Riccati equation residual P~ - T ( P ~ ) where

T ( P ) = Qvy + A u y ' P A v v - A v v ' P B y ( R + B v ' P B y ) - ~ B v ' P A v u .

Gudmundsson, Kenney and Laub (1992) show that P~ is an accurate solution of the
Riccati equation (3.13) if it has a small residual and the Riccati equation is "well-
For the Permanent Income resource allocation problems, Table 4.4 reports the ab-
solute errors

IIP - p ll,, IIF - F<P>II,-

These errors were computed under the presumption that the first problem (without
adjustment costs) is the problem of interest. That is, we compare the true solutions to
the Permanent Income Economy to the computed solutions to the Permanent Income
Economy and the Permanent Income Economy (with adjustment costs). Recall that the
primary reason we introduced the adjustment costs is to make the doubling (Pu = O)
algorithm applicable. For the Permanent Income economy, we calculated the true
solutions for F v and Py by hand:

7/3 -7/60 ]
P'v= -7/60 7/1200J' Fu= [-1/3 1/60].

The results verify that (for the Permanent Income economy) the residual errors re-
ported in Table 4.3 are close proxies for the absolute errors reported in Table 4.4.

27The Riccati iteration algorithm iterates on

Pj+~ = ~),~y + (Ayy - By~))'Pj(A~y - B~Fy) + F / R F j ,


Fj -- (R + Bv~PjBu)-I Bv~PjAwu

until [IPj+I - Pill ~ c ItPjIII, where we set e = 1 × 10-15. We initialize this algorithm at 19o = I.
28A11comparisons reported in the section were performed on an HP-9000/730 computer with 64MB of
memory using version 4.2a of MATLAB and HP's FORTRAN compiler. We base our CPU times on 1100
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 221

S i n c e s o l u t i o n s to P e r m a n e n t I n c o m e (with adjustment costs) a p p r o x i m a t e closely

t h e s o l u t i o n s to P e r m a n e n t Income, a p p l y i n g t h e d o u b l i n g a l g o r i t h m t o t h e a d j u s t -
m e n t c o s t v e r s i o n g i v e s a r e l i a b l e s o l u t i o n to t h e r e s o u r c e a l l o c a t i o n p r o b l e m w i t h o u t
adjustment costs.

Table 4.3
Performance of algorithms that solve Riccati equations

Economy Algorithm CPU time Residual norm

Permanent income Riccati iteration 0.0334 2.8 × 10 -15
Eigenvector 0.0047 1.1 × 10 -o3
Schur 0.0039 4.4 × 10 -16
Generalized eigenvector 0.0045 1.5 × 10 -1t4
Generalized Schur 0.0037 4.6 × 10 -16
Doubling (Po = I) 0.0031 6.1 × 10 -16
Matrix sign 0.0058 9.7 × 10 -16

Permanent income Riccati iteration 0.0334 1.9 × 10-15

(with adjustment costs) Eigenvector 0.0057 2.4 × 10 -117
Schur 0.0046 1.4 × 10 -15
Generalized eigenvector 0.0048 2.9 × 10 -114
Generalized Schur 0.0037 1.1 × 10 -16
Doubling (Po = 0) 0.0022 9.2 × 10 -16
Doubling (Po = I) 0.0030 9.4 × 10 -16
Matrix sign 0.0062 3.7 × 10 -15

Yearly cattle cycles Riccati iteration 0.0056 9.7 × 10 -16

Eigenvector 0.0076 2.3 × 10 -15
Schur 0.0079 3.3 x 10 -16
Generalized eigenvector 0.0125 1.7 × 10 -15
Generalized Schur 0.0054 2.1 x 10 -15
Doubling (Po = 0) 0.0026 5.6 × 10 -16
Doubling (Po = I) 0.0036 3.9 × 10 -16
Matrix sign 0.0089 6.7 x 10 -16

Quarterly cattle cycles Riccati iteration 0.0520 2.6 × 10 -15

Eigenvector 0.0400 9.4 x 10 -15
Schur 0.0373 1.1 x 10 -14
Generalized eigenvector 0.1177 6.2 × 10 -15
Generalized Schur 0.0248 6.9 x 10 -15
Doubling (Po = 01 0.0125 6.7 x 10 -16
Doubling (Po = I) 0.0131 5.6 x 10 -16
Matrix sign 0.0314 2.3 x 10 -15
222 E. W. Anderson et al,

Table 4.3

Economy Algorithm CPU time Residual norm

Monthly cattle cycles Riccati iteration 1.3860 1.0 × 10 -14
Eigenvector 0.6904 2.9 × 10-{4
Schur 0.6575 8.2 N 10 -14
Generalized eigenvector 1.3100 5.9 × 10 -14
Generalized Schur 0.3370 6.1 × 10-14
Doubling (Po = 0) 0.1435 3.7 x 10 -15
Doubling (Po = I) 0.1437 1.4 x 10 -15
Matrix sign 0.2569 2.2 × 10-{4

Education Riccati iteration 0.2554 8.2 × 10 -14

Generalized eigenvector 0.2437 2.2 × 10+{54
Generalized Schur 0.0394 2.2 × 10 -{56
Doubling (Po = 0) 0.0371 3.1 × 10 -{17
Doubling (1:'o = I) 0.0447 2.7 x 10 .o7
Matrix sign 0.0841 1.9 × 10 -/17

Table 4.4
Accuracy of solutions to the permanent income model

Economy Algorithm Absolute error of P,~ Absolute error of/7~

Permanent income Riccati iteration 6.6 × 10 -14 8.8 × 10 -15
Eigenvector 2.4 x 10 -{12 3.0 × 10 -03
Schur 8.8 x 10 -15 1.1 x 10 -15
Generalized eigenvector 3.1 x 10 -03 4.0 x 10 -{54
Generalized Schur 1.9 × 10 -14 2.6 x 10 -15
Doubling (Po = I) 8.2 × 10 -13 1.3 x 10 .{3
Matrix sign 2.8 × 10 -14 3.7 × 10 -15

Permanent income Riccati iteration 5.7 × 10 -13 8.2 x 10 -14

(with adjustment costs) Eigenvector 4.9 x 10 -06 6.4 x 10 -{57
Schur 5.0 × 10 -14 1.1 x 10 -15
Generalized eigenvector 6.0 x 10 -03 7.8 × 10 -{54
Generalized Schur 1.4 × 10 -13 1.5 x 10 -14
Doubling (Po = 0) 5.0 x 10 -13 7.3 × 10 -14
Doubling (Po = I) 1.7 × 10 -12 2.8 × 10 -13
Matrix sign 5.7 x 10 -13 8.3 × 10 -14

R e t u r n i n g n o w to t h e r e s u l t in T a b l e 4 . 3 , t h e f o l l o w i n g c o m p a r i s o n s a r e n o t e w o r t h y .

(1) The eigenvector and generalized eigenvector algorithms are unreliable for three
of our six economies. Not suprisingly, the presence of repeated roots in the solu-
Ch. 4: Mechanics qf Forming and Estimating Dynamic Linear Economies 223

tion to the Permanent Income control problem caused the eigenvector algorithm
to give unreliable solutions. Shifting to the generalized eigenvector algorithm
resulted only in marginal improvements in accuracy. While introducing tiny ad-
justment costs to the Permanent Income control problem improved the accuracy
of the eigenvector method, it failed to make the eigenvector method as accurate
as the other methods. The generalized eigenvector method performed poorly for
both this control problem and the Education problem.
(2) The Riccati iteration algorithm computed accurate solutions for all of the control
problems and, in particular, computed the most accurate solution for the Educa-
tion problem. Hence if accuracy is the primary concern, rather than speed, this
algorithm is a reasonable choice. However, in situations in which repeated solu-
tions are required, other algorithms can save the researcher a significant amount
of time. 29 Speed gains are likely to be important in econometric estimation and
in determining the sensitivity of solutions to changes in parameter settings.
(3) Algorithms that allow A v v to be singular do not suffer any "penalties" in speed
or in accuracy. Hence for our discrete-time control problems, there does not seem
to be a good reason to use the invariant subspace algorithms.
(4) Both doubling algorithms performed relatively well across the six economies. The
Po = 0 algorithm is a little faster than the Po = I algorithm for the Permanent
Income (with adjustment costs) and for the Yearly Cattle Cycles control problems
with comparable accuracy. The Po = I algorithm is the quickest of the seven
applicable algorithms in solving the original Permanent Income control problem.
The Po = 0 doubling algorithm outperforms the generalized Schur and matrix
sign algorithms. A possible reason it is faster than the generalized Schur algorithm
is that the generalized Schur algorithm does not exploit the symplectic structure
of the control problem.

9.2. Solutions to Sylvester equations

Table 4.5 compares the performance of the Sylvester equation algorithms discussed
in Section 6 applied to the five control problems. The algorithms take as inputs the
matrices (S, T, W). To assess the accuracy of the solutions, we use the matrix one-
norm of the Sylvester equation residual W + S M U T - M e, where M ~ is a candidate
solution. For the Permanent Income control problem, the absolute error, I[M - M~]lt,
of the Hessenberg-Schur solution is 9.1 x 10-13 and the absolute error of the doubling
algorithm's solution is 1.0 x 10 -12.

ZgThe speed of the Riccati iteration algorithm can be increased by lowering the tolerance e. For instance,
if'e is changed to 1 x 10 -07, for the Permanent Income Economy the CPU is reduced to 0.0163 with an
absolute error of 5.3 x 10 -06 for Py. Comparable changes in tolerance settings for the other iterative
algorithms had very minor changes in speed and accuracy for the Permanent Income Economy. Our
experience with the matrix sign algorithm applied to other economies is that significantly lowering the
tolerance can have disastrous consequences for accuracy.
224 E. W. Anderson et aL

Table 4.5
Performance of algorithms that solve Sylvesterequations
Economy Algorithm CPU time Residual norm
Permanent i n c o m e Hessenberg-Schur 0.0017 3.6 × 10 - 1 5
Doubling 0.0010 3.6 × 10-15
Yearly cattle c y c l e s Hessenberg-Schur 0.0027 3.3 × 10-13
Doubling 0.0014 2.8 × 10-14
Quarterly cattle c y c l e s Hessenberg-Schur 0.0041 7.8 × 10-13
Doubling 0.0028 2.6 × 10-13
Monthly cattle c y c l e s Hessenberg-Schur 0.0154 2.6 x 10-12
Doubling 0.0186 6.5 × 10 -13
Education Hessenberg-Schur 0.2601 4.3 × 10 -11
Doubling 0.1233 5.2 × 10-12

The accuracy of the doubling and Hessenberg-Schur algorithms are comparable.

While the doubling algorithm is faster in solving four of the five Sylvester equations,
the Hessenberg-Schur algorithm is faster in solving the Sylvester equation for the
M o n t h l y Cattle Cycles control problem. Recall that this problem has 25 endogenous
states but only four exogenous states. The Hessenberg-Schur algorithm is apparently
better at exploiting this asymmetry.

10, Innovations representations

Constructing an i n n o v a t i o n s r e p r e s e n t a t i o n is a key step in deducing the implica-

tions of a model for vector autoregressions and for evaluating a Gaussian likelihood
f u n c t i o n ) ° An innovations representation is a state-space representation in which the
vector white noise driving the system is of the correct dimension (equal to that of the
vector of observables) and lives in the proper space (the space spanned by current
and lagged values of the observables).
Suppose that our theorizing and data collection lead us to a system of the form 31

Xt+l : Aoxt + Cwt+l,

z t = G x t + vt~

Vt+l = D v t + H w t + l , (10.1)
where D is a matrix whose eigenvalues are bounded in modulus by unity, and { w t }
is a martingale difference sequence with E ( w t + l w t + l ' ] f't) = I, where .T't is the

3°The calculationsin this section are versions of ones described by Andersonand Moore (1979). We
alert the reader that we are "recycling" or "reinitializing"some of the notation used in earlier sections,
such as zt, vt, ut, D, R.
31In particular, the solutionto the discounted swchastic regulawr problem can be expressed as xt+ i =
Aoxt + Cwt+l where Ao = A - BF.
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 225

sigma field generated by the history of ws up to t. We take zt to be the time t

vector of variables on which an econometrician has observations, and we interpret
vt as a serially correlated measurement error vector. We let R = H H ' , which is the
covariance matrix of H w t + l . We impose C H ~ = 0, by way of assuming that the
"state" and "measurement" errors are uncorrelated.
We define the following quasi-differenced process

2t =- zt+l - Dzt. (10.2)

From Eq. (10.1) and the definition (10.2), it follows that

2t = (GAo - D G ) x t + ( G C + H ) w t + l .

Then (xt, 2t) is governed by the state space system

xt+l = A o x , + Cwt+l,
zt = G x t + ( G C + H ) w t + l , (10.3)
where G = GAo - D G . This system has nonzero covariance between the state noise
CWt+l and the "measurement noise" ( G C + H ) w t + l . Let [Kt, St] be the Kalman
gain and state covariance matrix associated with the Kalman filter, namely,
K t = ( C C ' G ' + A o S t G ' ) ~ 2 ~ 1, (10.4)
f2 t = G ~ t G t + R + G C C G ~, (10.5)
St+l = A o S t A o ' + C C ' - ( C C ' G ' + A o S t G ' ) f 2 t l ( G S t A o ' + G C C ' ) . (10.6)
Then an innovations representation for system (10.3) is

:~t+l = Aoxt + K t u t ,
2t = Gc?t + ut, (10.7)

st - I et_l,

= - ! 5o1,
[?t = E u t u t ' = O,~tO' + R + G C C ' G ' . (10.8)
Initial conditions for the system are a:o and 27o. From definition (10.2), it follows that
[zt+l, z t , . . . , zo, 2o] and [2t, 2 t - l , . . . , 2o, ~:o] span the same space, so that

Xt -~ Blast I z t , z t - 1 , . . . , z o , a;o],

ut = Zt+l - / ~ [ Z t + l I Z t , . . . , Z o , Xo].
The process ut is said to be an innovation process in zt+l.
226 E. W. Anderson et al.

Equation (10.6) is a matrix Riccati difference equation. The Kalman filter has
a steady-state solution if there exists a time-invariant positive semi-definite matrix
£7 which satisfies Eq. (10.6) with Zt+l = St, i.e., one that satisfies the algebraic
matrix Riccati equation. In this case, the same computational procedures used for
the optimal linear regulator problem apply: a benefit of the duality of filtering and
control. The steady-state Kalman gain K is given by Eq. (10.4) with St = S and ~2t
= GZG I + R + GCCGq

10.1. Wold and autoregressive representations

The innovations representation is associated with a WoM representation or vector au-

toregression. Estimates of these representations are recovered in empirical work using
the vector autoregressive techniques promoted by Sims (1980) and Doan, Litterman
and Sims (1984). Wold and vector autoregressive representations are easy to obtain
when A - K G is a stable matrix. To get a Wold representation for zt, substitute
Eq. (10.2) into Eq. (10.7) to obtain

3~t+l = Ao2gt -t- K u t ,

zt+l - D z t = GYct + ut. (10.9)
A Wold representation for zt is

Zt+l = [f -- D L ] - I [ 1 -}- O ( I - A o L ) - ' K L ] u t , (10.10)

where, again, L is the lag operator. From Eq. (10.9) a recursive whitening filter for
obtaining { u t } from { z t } is given by

U t ~- Z t + I -- Dzt - Gxt,

kt+l = A o x t + K u t , (10.11)
Hansen and Sargent (1994) show that an autoregressive representation for zt is

zt+, = { D + ( I - D L ) O [ I - (Ao - K O ) L I - ' K L } zt + ut (10.12)



z,+~ : [D + CKlz, + ~ [ G ( A o - K G ) J K
- DG(Ao - KG)J-~K]zt-j + ut. (10.13)
This equation expresses zt+t as the sum of the one-step-ahead linear least squares
forecast and the one-step prediction error.
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 227

11. The likelihood function

Obtaining the Kalman gain sequence {Kt} of the previous section is a key step in con-
structing and manipulating a recursive representation of a Gaussian quasi-likelihood
function. It is often necessary to transform the observations into a form matching the
linear state-space form. Thus, we start with a "raw" time series {Yt} that determines
an adjusted series zt according to

zt = f(Yt, 0),
where O is the vector containing the free parameters of the model, including parame-
ters determining particular detrending procedures. For example, if our raw series has
a geometric growth trend equal to #t which is to be removed before estimation, then
the adjusted series is zt = Y t / t ~t. We assume that the state-space model of the form
(10.3) and the associated innovations representation (10.7) pertain to the adjusted
data {zt}. We can use the innovations representation (10.7) recursively to compute
the innovation series, then calculate the Gaussian log-likelihood function

L(@)=~{log,Dt]+trace(X?tlutut')-21og ~f(Yt'O) l} (11.1)
t=0 OYt

and find estimates, ~) = argminoL(@), where X2t = Eutut ~is the covariance matrix
of the innovations computed from (10.8). 32 To find the minimizer (9, we can use a
standard optimization program. In practice, it is best if we can calculate both the log-
likelihood function and its derivatives analytically. First, the computational burden
is much lower with analytical derivatives. Consider, for example, the model of Mc-
Grattan, Rogerson and Wright (1995), which has 64 elements in @. For each step of
a quasi-Newton optimization routine, L and ~L/~O are computed. To obtain ~L/~O
numerically for the McGrattan, Rogerson, Wright (1995) example, the log-likelihood
function must be evaluated 128 times if central differences are used in computing an
approximation for OL/OO, e.g.,

OL L(O + e.e) - L(O - ee)

-- ~ (11.2)
30 2e '

32The log likelihood is conveniently factored as

log Pr(zt, z t - t . . . . . Zo) = log Pr(zt I z t - i . . . . , zo)--" log Pr(zl I ztl) log Pr(zo).

For alternative ways of modelling ~:o, see Ansley and Kohn (1985), Hamilton (1994) and Hansen and
Sargent (1994).
228 E.V~Anderson et al.

where e is a vector of zeros except for a 1 in the element corresponding to 0 and

e is some positive number. Usually, the costs of computing L a large number of
times far outweigh the costs of computing OL/~)O once. If L and OL/OO are to
be computed many times, which is typically the case, then the costs of computing
numerical derivatives can be quite large. A second advantage to analytical derivatives
is numerical accuracy. If the log-likelihood function is not very smooth for the entire
parameter space, there may be problems with the accuracy of approximations such
as Eq. (11.2). With inaccurate derivatives, it is difficult to determine the curvature of
the function and, hence, to find a minimum.
For L(O) in Eq. (11.1), the derivatives ~L(O)/~O can be derived by following
procedures of Kashyap (1970), Wilson and Kumar (1982) and Zadrozny (1988a, 1989,
1992). We display these derivatives in Appendix B and distinguish formulas that are
steps in the derivation from those that would be put into a computer code. Note that
although the final expression for OL/OO derived in Appendix B is complicated, we
can use numerical approximations such as Eq. (11.2) to uncover coding errors.
Once we have the log-likelihood function and its derivatives, we can apply standard
optimization methods to the problem of finding the maximum likelihood estimates.
In practice, we will have a constrained optimization problem since the equilibrium
is not typically computable for all possible parameterizations. For example, we may
have simple constraints such as g < O < u, where g and u are the lower and upper
bounds for the parameter vector. In this case, we use either a constrained optimization
package or penalty functions [see Fletcher (1987)].
After computing the maximum likelihood estimates, we need to compute their
standard errors,

S~(0) = diag 3 0 ~O ] ' (11.3)

where Lt (0) is the logarithm of the density function of the date t innovation, i.e.,

, -1 Of(Yt, O)
Lt(O)=loglY2tl+utY2t ut - 2log Oyt " (11.4)

The formula for 3Lt/30 is also given in Appendix B.

12. Estimating the cattle cycles model

In this section, we present estimates of some of the parameters of Rosen, Murphy and
Scheinkman's (1994)model) 3 We let Pt be the price of freshly slaughtered beef, d~,t

33We have used estimates of key parameters from this section in the numerical experiments for the
annual model.
Ch. 4." Mechanics of"Forming and Estimating Dynamic Linear Economies 229

the feeding cost of preparing an animal for slaughter, dh,t the one-period holding cost
for a mature animal, "yldh,t the one-period holding cost for a yearling, and ~/2dh,t the
one-period holding cost for a calf. The c o s t s {dh,t, ds,t}t~°=o are exogenous stochastic
processes, while the stochastic p r o c e s s {Pt}t°°=o is determined by an equilibrium. Let
]gb,t be the breeding stock and Yt be the total stock of animals. Each animal that is
reserved for breeding, gives birth to r/calves. Calves that survive become part of the
adult stock after 2 years. Letting t index years, the law of motion for stocks is 34

kb,t = kb,t-1 q- rlkb,t-3 -- ct, (12.1)

where ct is a rate of slaughtering. The total head count of cattle is

Yt = kb,t 4- ?}kb,t-1 + ~]kb,t-2, (12.2)

which is the sum of adults, yearlings, and calves, respectively.

A representative farmer maximizes

EO Z ~t {ptct -- dh,tkb,t __ (~ldh,t)(?]kb,t_l) __ (~2dh,t)(?~]~b,t_2)

- d,,tct - ~ I ' t } (12.3)


~t = ( k b,t-l-lg2t-l
2 , + k b,t-Z-}-e
2 2) .

Here e is a small positive parameter which measures the quadratic costs of carrying
stocks and slaughtering.
Demand is governed by

ct = C~o -- cqpt (12.4)

where c~0 > 0 and c~l > 0. The stochastic processes {dh,t, ds,t} are univariate autore-
gressions with orthogonal innovations

dh,t+l = (1 - 19h)~h -}- phdh,t q- eh,t ,

ds,t+l = (1 - - Ps)#s + psdm,t + es,t,

where Ee~,t = o-h2 and E e 2 t = or.2 The disturbance processes {eh,t} and {es,t} are
white noises that are uncorrelated at all lags.

34We have set the death probability in Rosen, M u r p h y and S c h e i n k m a n ' s (1994) model to zero.
230 E. W. Anderson et al.

Table 4.6
Parameter estimates for "Cattle cycle" example
Parameters Estimates Standarderrors
oq~ 146 33.4
c~1 1.27 0.323
3`1 0.647 11.5
3'2 1.77 12.0
r/ 0.938 0.0222
Ph 0.888 0.115
ps 0.699 0.0417
~rh 6.82 10.6
as 4.04 1.05
ay 0.273 0.0383
c% 4.82 0.531

To compute parameter estimates, we use the data of Rosen, Murphy and Scheinkman
(1994), which include annual observations for Yt, ct, and Pt for the United States
during the period 1900-1990. 35 We assume that there is error in measuring the total
stock of cattle Yt and the slaughter rate ct. In particular, we assume that the (1,1)
element of R, the variance-covariance matrix of the measurement errors, is equal to
2 and we assume that the (2,2) element of R is equal to crc.
O-y~ 2 All other elements of
R are set equal to zero.
We are now equipped to estimate the parameters of this model by applying the
formulas of the previous sections. We start with some a priori restrictions. Assume
that/3 = 0.96, e = 1 x 10 -4, /~h ---- 37, and #s = 63. The remaining parameters
are elements of O, i.e., @ = [c~0, cq, 3'1, 3"2, r/, Ph, Ps, Crh, if.s, Cry, Crc]. In Table 4.6,
we report estimates of these parameters and standard errors for the estimates. Note
that from the values for c~0 and c~1 we can get an estimate of the demand elasticity.
For this model, the elasticity is given by - 0 . 6 1 . 36 The values of 3"1 and 3"2 give us
information about the holding costs. The estimates indicate that the costs are higher
for calves than for yearlings. However, the standard errors on 3'1 and 3'2 indicate that
these parameters are not precisely estimated. The value of r/implies that 0 . 9 4 k b , t - 1
calves are born at date t, where kb, t _ l is the breeding stock at t - 1. This estimate is
higher than Rosen, Murphy and Scheinkman's (1994) estimate of 0.85. The estimates
of Ph and Ps imply that there is persistence in the processes for holding and feeding
costs. Finally, the estimates of cry and crc indicate that the measurement error is higher
for the slaughter rate than for the total stock.

35The sources of these data are the Historical Statistics of the United States, Colonial Times to 1970
and Agricultural Statistics. In the data, y is the total stock of cattle excluding milk cows, c is the cattle
slaughtered, and p is price of slaughtered cattle.
36This estimate is c~l x po/co (-1.27×0.48).
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 231

In Figs 4.1 through 4.3, we plot the predicted and actual time series for the stock
of cattle, the slaughter rate, and the price. The predicted series are the one-step-ahead
forecasts. Using the notation of section 10 these are given by the vector G2t.

60 / I



190~ 19'10 19'20 19'30 19~40 19'50 19'60 19'70 19'80 1990
Figure 4, I. One-step-ahead forecast and actual total stock.

24 i-


20 /
18 / Forecast

16 / ~ /"\ I II

14 I [
12 x Actual ~


~00 1910 1920 1930 1940 1950 1960 1970 1980 1990
Figure 4.2. One-step-ahead forecast and actual slaughter rate.
232 E.W. Anderson et al.

120 - - 1 1"- i


Actual t~~ltt
110 I II I -
t I t
[ I I
t II
I tt

\~\l~l t ' Forecast


i i
901900 ld10 ld20 19'30 19'40 1dso 1960 1970 19180 1990
Figure 4.3. One-step-aheadforecast and actual price of slaughtered beef.

Appendix A. Computing OL/OOand OLt/OOfor a state-space model

Differentiating the log-likelihood function with respect to the free parameters of the
economic model can be broken into two steps: first, differentiating the log-likelihood
function with respect to matrices appearing in the state-space model (10.7); and sec-
ond, differentiating the parameters of the state-space model (10.3) with respect to the
free parameters of the underlying economic model. In this appendix, we derive OL/OO
in terms of the derivatives of Ao, C, G, D, R, :?o, Zo, and {zt, t = 0 , . . . , T}. We
ignore the Jacobian in Eq. (11.1) since it differs for each problem. In Appendix B,
we show how to compute derivatives of Ao for the linear-quadratic and nonlinear
economies with and without distortions.

A.1. The formula for OL/OO

For the first step, we take as given Ao, C, G, D, R, Yco, Zo, and {zt,t = 0,... ,T}
and their derivatives with respect to the deeper economic parameters. We shall show
that the derivative of the log-likelihood function is

[ 0
o, ~tutlg2t I 2trace{~o C G ' M t G }
~0 t=o
OG -, ^ , -1
+ 2 trace ~ (AoZtG Mt - ~tG'MtD + CC'G'Mt - Aoxtztt ~t

+ ~tutt~2~lD) }
Ch. 4." Mechanics of Forming and Estimating Dynamic Linear Economies 233

- 2 trace{-~o (GZtO,'Mt- ztuttx2tl + GYctut'g?~-l)}

+ trace{ ~---~Mr}+ trace/fO~Tt~O ' M t G } - 2 trace{~0t ut'l?tlG}

. [ O z t ~+ l t , £ 2 ~ - i
+ 2 ~race], } - 2 trace{ -8zt
~ u t ,,~-1
~t o,~'/]
j,] (A.1)

~_ OAoi 8C C OC'
OA° ZtAo' + Ao Ao' + AoZt-~- + - ~ +C-
O0 OO ~0
( i~C CI Gt ~)CIG, i~GI 8Ao -I
- 56 +c56 + c c ' T d +-gb-s~c
8Zt O' + AoZt~-~
+ Ao-~- 00"~) K t , + Kt ~-~-
OS2t Kt'

00 OZt . , ~Ao'
- K - ~ ZtAo' + G ' ~ - Ao + O,Zt ~0

c , + c c ~OCI)
+ ~~ c CC' + c ~C
~0 (A.2)

i~Kto0 0 - K t ~ zt + --~ 2,t

~0 ~° ~0 +
_ / i~zt+l
+ ~ , - -gF
-D ~0/" (A.3)

The expressions in (A.2) and (A.3) follow from the definitions of Zt in Eq. (10.6) and
5:t in Eq. (10.7). The initial conditions Y:o and Z0 and their derivatives are assumed
to be given.
If Z0 is given by the steady state solution of the Riccati equation, then the compu-
tation can be simplified. The formula for the derivative of the log-likelihood function
is given by
( ~Ao
0L~0- 2T trace/ ~ - ( Z G ' M G - F ~ . ~ - I G - F~(I - KG)

- ~O's?-lr,,W - KG) - ~TL'rJs?-'c + ~Ao'~(I - Ka)) }

{OC '(a'va - c's?-lr.~(I- Ka)

+ 2T trace ~ - C

- (I - a ' K ' ) r J s ? - ' a + (i - c ' K ' ) r I ( Z - KG)))

234 E. I~ Anderson et al.

~ (AoZ, G'M - IT,G'MD + CC'G'M

+ 2T trace -DG

- AoP~O -1 + F~f2-1D + AoF:~xK

-- F~xKD -- CC'(I -- ,~,K,~F,

tJ uA I~--I

+ CC'G'O-IE~xK _ AoZfio'F~,'~ -1

+ iT,.Ao'I'~x'g?-lD + AoZG'I?-I_E~:~K

- ~Otf2-1F~KD - Ao~Ao'HK + ~UfIoIHKD

- C C H K + CC'G'K'HK) }

- 2T trace{ ODG
( ZdM
''DO + (F~u - GF~)f2 -1

+ GF~xK - FzxK - GZJio'Fu~'f? -1

+ G EG' ~ - I F u A K -- GZAo' HK) }

+ 2T trace{ ~--~(½M + f 2 - 1 F ~ x K + ½ K ' H K ) }

+2trace{'l~(Dzt+l ~Dzt'~) }
t=o \ DO / d ~ - / u t ' f2-1

- 2 t r a c e { ' l ~ ( D\Z tDO

-D~)At'K} , (A.4)

where )3 is the asymptotic state covariance matrix found by iterating on Eq. (10.6)
and G, K, 12, ut and 0St are defined in Eqs (10.3)-(10.5), and (t0.7), and
At = (Ao-KG)tAt+l+Glf2-1ut, t=O,...,T-2,
~T--1 : Glf~-lur-l~
1 r-1
Pu~, =T- ~ utut~'
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 235

& ~ = [i1 E XtUtl'

P 1Z ztutt
~ T '
1 T-1
r u n = -r E ~t-l"~tt' (A.5)
F/cA = -~ X t - l At , (A.6)

F~x = ~
1~ zt-lAt ,

M = g2-1 - Y2-1F~J? -l,

rio = Ao - KG,
H = Aotl-fff[ o q- G t M G - Ot[2-1-]nuAdo - 2 ~ o / F u A t f 2 - 1 G .

In the remainder of this appendix, we derive the formulas in Eq. (A.1) and Eq. (A.4).
Readers who are not interested in this derivation can skip the rest of this appendix.

A.2. Derivation of the formula

The derivative of the log-likelihood function with respect to any element 0 of the
parameter vector is given by

- =
oo Etrace/
f Og?~
Mt + Ztrace /
~ { Out ,
+ ut Out'
00 ) }

= S1 q- ~ 2 , (A.8)

where Mt = Y?t 1 - Y?tlutut~Y2t 1 and Y?t = Eutut'. We start with the first term in
the expression for the derivative of the log-likelihood function $1. For this, we need
the derivative of the covariance matrix Y2t which satisfies

os?t oQ -, ~OZ~G,+O~O0' OR OG
oo - ~ ~ , G + oo 5g + ~d + ~ cc'a'

+ G -OC
~ C~G, + GC -OCIGt
~ + GCC' OOG

= ( ~ o A o + G O A ooo OD OG) NtG' + G -O- ~Z-tG

oo G - D ~
236 E.W.Andersonet al.
O i
- i G OAo' , GlaD' OG' ,\
+ GiTt Ao ~ ( + --~- G - DO -~ D )
+ -DR DG CC'G' + G -DC
~ + --~ ~ CIGI + G C -DC
~ G' + G C C ' -DGI
~ . (A.9)
The second equality follows from the definition of G. If we post-multiply the deriva-
tive of C2t by Mt and take the trace of the result, we have the first term of the
derivative of the log-likelihood function in Eq. (A.8):

T-1 [
SI = ~
[,DA 0
2 trace~--~- ZtG'MtG
) + 2 trace
(~0 C'G'Mta )
+ 2 trace\~-~{AolTtG
DG -' M~ - ITtQ'MtD + CC' G' Mt } )

- 2 trace(~o GEtO'Mt) + trace(~--~ Mr)

+ trace < ~-(-

Note that the formula for $1 depends on derivatives DAo/DO, DC/DO, DG/80, and
D/{/D0, which are known, and D17t/80, which is yet to be derived.
We now turn to the second term of the log-likelihood function derivative, $2 :
trace(D%tut'/DOg2tl). L e t / ' ~ , ( t ) = utut'. By definition, P~(t) : (5t - Gfft)(zt -
Gfft)' and, therefore, its derivative is given by
8 r ~ ( t ) _ ( De~ 80 Dfft'~ , [ 82~ 86 fft _ O Dfft']'
Do \8o 8o St - ° + u' t 8o 8o)
_ ( Dzt+l 8D Dzt DG ^ DAo
\ ~ DO z t - D DO ~Aozt-a--~-Sct

8Zt+l ' t DD' 8zt ' D' - 9 t t A o '8G' ^ 8Ao'G,

- - - oct'
+ u~ 80 zt 80 DO 80 -~-
OD' DG',
+ f f t ' G ' - ~ + 3ct'--~ D - DO
~}" (a.ll)
If we post-multiply this derivative by f2/-1, take the trace of the resulting matrix,
and sum over t, I then we have the second term of the derivative of the log-likelihood
function, i.e.,
r Ao }
$2 : - ~ 2 trace/~ fft'at' f2t-1G
Ch. 4: Mechanics of Forming and EstimatingDynamicLinear Economies 237
+ 2 trace{ ~OGiAoxtu
t , ^ , f~t i -- "~tut' n ; 1D) j

+ 2 trace --~(ztut - G:~tutl)~Qt -1 - 2 trace ~t=0 Y

Ozt+l ~ t t ~ t l

T-~ ~zt ut,Dt_lD + 2 trace

+ 2 trace i, t=0 ~ff t=o ~ ut " t ufj (A.12)

Sum the expressions in Eqs (A. 10) and (A. 12) to get the expression for the derivative
of the log-likelihood function in (A.1).
For the time-invariant case, several more steps are needed. First, we derive the last
term in Eq. (A.12) in terms of the derivatives that are taken as inputs. Following
Kashyap (1970), Wilson and Kumar (1982) and Zadrozny (1988a), we can simplify
the computations by working with sequences {dr} and {At} defined as follows

= (~0 o OK-
O - K sO&)~
- OK
+ -gg + . 02t
-gg, t = O, . . . , T - 1 ,

At=(Ao-KG)'At+~+G'£2-1ut, t=0,...,T-2,
AT-~ = G' f2-1UT-l. (A.13)

Notice that the time subscripts have been dropped from K and [2 since the time-
invariant case assumes that Zt = Z for all t. Let Ao = Ao - KG. Notice that since
Yct+l = -AoS:t + K2t, its derivative is given by

O0 .~o 00 + dr. (A.14)

Write out the last term in Eq. (A.12) and substitute in 97t = Ato + ~ts~fUo-'dt_s.
Then group terms involving Yc0 and dr, t = 0 , . . . , T - 2. These steps lead to

2 (E
T-1 ~Yct ) 2 (
050 Z--1 )
T trace\ t=0 -~ut'~Q-IG = - ~ t r a c e \ 00 A0' + ~t=, dt-~At'

2 ) {( Ao
- T trace\ 00 A0' - 2 t r a c e \ O0 0~ K-~Ao

~Ao ~D OG] , OK
- KG--~- + K - ~ G + K D - - ~ I~:, + -~-d-F~a
T--I }
] K~ OZt K OD F r i~Zt-I
q- T t=l -gg K D F_,--gV-
238 E.W. Anderson et al.

( OAo }
= - 2 trace / - - ~ _E~x(I - KG)

+ 2 trace { 0OG
0 (Ao-P¢xK - V 2 x K D }

- 2 t r a c e { - ~aD
(GP~,x K -Pz,xK)}

2 / T-1 T-1
trace K" Ozt It' azt-x At'~
k t=l t=l

T traceL-~- ,~0 ) - 2 trace - ~ F~x , (1.15)

where P~x, P¢;~, and F~), are the sums defined in Eqs (A.5) through Eq. (A.7) and
w-',T--1 - /~ t / r r .
F2A = 2.~t=l Zt-I t /1. The second equality follows from the definitions of dt-i
and O and some algebraic manipulation. The last term in Eq. (A.15) uses the fact
that ut = 2t - 0:~t. With the exception of OK/O0, the expression in Eq. (A.15) is a
function of known derivatives. The expression for OK/O0 follows from the definition
in Eq. (10.4) and is given by
- [oc c ' a ' .+ w - ~ oa' ox o,
+CC'--~ +----2-°XO'+Ao-~-ff
oo L oo
OGI + A o ~ -OA°IG
+ Ao~Ao'-~ ~ - ' - Ao~G' ~OD' - AoE ~OGID t]
i n -1

- (CC'G' +Ao20')9 -~ ~AoXO,+a ~0, - -OD

-~ g~o'

OG ITO' + OJ--~- -
OZ O, + GZAo ,OG'
- ~ + 012, OAotGt
- D.-~

OD' -
OEG' -~ GE
~0' D ' OR + -~
+ -~

+a FOC C'G' +cc-g~

OC'G' 0C'] o_t.
+ c c c ' oo ]

Note that we have written 0G/00 in terms of i3G/O0, OAo/O0, and OD/O0. Substi-
tuting OK/O0 into the expression in Eq. (A.15) and rearranging terms, we have
2 T--1 0~ t
T trace( E ~ ut'X2-1G
\ t:O
: - 2 trace (I'e.x(I - KG) + Z 0 ' f 2 - ' P u , x ( I - KG)
Ch. 4: Mechanics (?f Forming and Estimating Dynamic Linear Economies 239

+ 2Ao'G~'n-IG) }

- 2 trace { -~-
3C C'(G'~Q-1I'ux(I- KG) + (I-G'K')F~x'g?-IG) }

+ 2 trace AoI'~K -- _E~KD - CC'(I -- G,K,~E, ,g?-I

) u~

+ CC,G, f2-1E~aK - AoY]Ao

- ,Pu),,f2 -t + Zi{o'l"uk,'f2-1D

+ AoSG'n-IG, xK _ 2,(9,S2-1F, xKD) }

- 2 trace { --~
OD (GF~;~K - F~:,K - GSdo'F~;~'n-I + GSG'n-' F~:,K) }

+ 2 trace{ ~--~ f2-1-Pu~,K }

2 ~ 8zt A ' 3zt-i

T trace t.K --~ t - KD ~_~ - - ~ At'
t=l t=l

T2 trace{ ~0:~0.,~o, ]~ - 2 trace{ -~(G

3Z - g2- l Pu:~Ao)}.
- (A.17)

Therefore, the expression for the second term of the log-likelihood function derivative
$2 is given by

$2 = - 2 trace~.--~--~-(F~J2-1G + Pe;~(I - KG) + ZG'J?-1F~>,(I - KG)

+ }
- 2 trace{ ~o C'(G'S2-1F~x(I- KG) + (I-G'K')G~x'J2-1G) }

- igG (Aol_1.aj2_1 F~D_ID _ AoFs:xK + FexKD

2 trace -~-

+ CC'(I - G'K')F~x'g? -1 - CC'G'F~-IF~K

+ A o Z f i o ' P ~ x ' f 2 -I _ ~ f i o ' p ~ x ' f 2 - 1 D

240 E. V~ A n d e r s o n et al.

- Ao22~,£2-112,,~K + ZG'J2-1I~,~KD) }

- 2 trace ~ - ((Fz.. - GF~)~2 -1 + GF~,K - FzxK - G22fto'F,~x'~(2 a

+ GZ, Q'f2-'F~xK) }

+2 trace{...- F~K}

+ ~ trace --~ ut~L - ~ trace ~ ut'X2-1D

t=O t=O

T trace K E - ~ A t ' - ~At'

k t=l t=l

2 trace{ 0:~° Ao'}

T -gO-

-2trace{~oG'X2-1F~Xfto }. (1.18)

Our expressions for $1 in Eq, (A.10) and $2 in Eq. (A.18) depend on OAo/O0,
OC/O0, OG/O0, OD/O0, OR/a0, which are known, and OX/00, which we will now
derive. Using the expression in Eq. (A.2) with Zt+l = Zt = Z, we get

a-O = A o ~ flto' + W + W', (A. 19)


w= X Ao' +
-gg -
aO ZGIK I - Ao Z A o lag
K I - AoZ -OAo
~ - GIK I

+ AoZG' OD iK' + AoZ OC'D'K' + 1 K OR KI

00 O0 2 -~

- K D ~0 Z GIKI + K ~-~
OG C" C G ' K ' + K G -OC
~ CGtKI. (A.20)
Ch. 4: Mechanics (?[ Forming and Estimating Dynamic Linear Economies 241

The terms W and W' in Eq. (A.19) include all derivatives but 0~'/00. To get the
expression in Eq. (A.20), we substituted the expressions for 0O/00 and OG/OOinto
Eq. (A.19). Let H be a symmetric matrix that satisfies

17 = fito'H]to + ½(H + H'), (a.21)


H = G'MG' - 2G~'~ -1 _Fu~Ao. (A.22)

trace, gO-

= trace{ aZ--~-(Lr - ] t o ' H A o ) }

aZ H }
= trace{-~-~ _ trace{.Ao gg
~Z fito'H }

= trace ~ - - Ao - ~ - -Ao' / 7

= trace{ (W + W')I-I}
-- 2 trace{WH}. (A.23)
If we post-multiply W by /7 and take 2 times the trace, then we have an expression
for trace(OZ/gO)H in terms of known derivatives, i.e.,

trace( ~-o H) 2 trace{ aA~°Sfi.o'/7(I - KG) }

+ 2 trace{~o C'(l-G'K')/7(Z- KG) }
( aG -, _
- 2 trace / ~ - (AoZAo HK Zfto'HKD

+ CC'(1- G'K')/7K) }

+2trace{~---o G~fto'/TK}+trace{.~fo K'l-lK}. (A.24)

Sum $1, which appears in Eq. (A.10) with ~t = ~ and X'2t = S2, and $2 in (A.18).
Substitute in the expression for trace(~2/~O)H from Eq. (A.24). The result is the
derivative of the log-likelihood function which is given in Eq. (A.4).
242 E. ~ Anderson et al.

A.3. Standard errors

After we have computed parameter estimates, we want to compute their standard

errors as given in Eq. (11.3). For this we need to compute the derivative of

L~(@) ----log In~l + u/n?~t

with respect to any element 0 of the parameter vector. 37 This derivative is given by

OLt _ ( ODt ~ Out' Out ut Dt ODe -1

00 trace ~tl O0 ] -t- - ~ ~Qtlut + utQQt I - ~ _ t -1 ~ - ~t ut

= trace{(~t 1- 1 t 1" ~ 0 ~ t "1

= trace~ Mt + trace ~2tl 00 ' (A.25)

where Mt = ~2t I - ~?t-l ututQQt 1. Above, we calculated OX2t/O0 and i~(utut')/O0.

These expressions are given in Eq. (A.9) and Eq. (A. 11).

Appendix B. Differentiating the state-space model with respect to economic


In this appendix, we describe how to compute derivatives of Ao with respect to the

free parameters of an economic model. We do this for four economies: a linear-
quadratic economy without distortions; a nonlinear economy without distortions; a
linear-quadratic economy with distortions; and a nonlinear economy with distortions.
Because we use linear approximations for the nonlinear economies, most of the work
is in deriving the formulas for the linear-quadratic economies.

B. 1. A linear-quadratic economy without distortions

We consider a discounted stochastic regulator problem. The optimization problem is

max E o ~ / 3 t ( x t ' Q x t + ut'Rut + 2 x t ' W u t ) , (B. 1)
{u,} t=o
subject to xt+l = Axt + But + Cwt+l.

37Note that we are again ignoring the Jacobian since the relationship between z and y diffcrs ~br each
Ch. 4: Mechanics of Forming and Estimating DynamicLinear Economies 243

We assume that the matrices Q, R, W, A, and t3 depend on a vector of parameters

69. For the remainder of this section we assume that C = 0. Typically, the number of
elements in 69 is small relative to the combined number of elements in these matrices.
We also assume that the derivatives of the matrices in Eq. (B.1) with respect to the
elements of O are known.
The optimal decision function is given by ut = - F x t , where

F = (R +/3/3'PB)-I(/3B'PA + W') (B.2)

for P satisfying

P : Q +/3A'PA
- ( W + / 3 A t P B ) ( R +/3/3'PB)-l(/3/3'PA + W'). (B.3)
The law of motion for x in equilibrium is

Xt+l = Aoxt, Ao = A - BF. (B.4)

Therefore, the derivative of Ao with respect to an element of 69 is

- F -/3 --. (B.5)
00 00 00 00

The derivatives OA/O0 and OB/O0 depend on the specification of the problem in
Eq. (B.1) and are assumed to be known. The derivative of F is

(oR oB' oP oB' F

OFo0- (R + ~B'PB)-' \~-~ +/3 ~-~ PB +/3B' ~ B +/3/3'P O0/
(°' B
+ (R + / 3 B ' P B ) -1 /3 - ~ P A + fiB' ~
OP t 0A
A +/3/3 P - ~ + ~
]. (B.6)

Notice that this formula depends on the derivative of P, with the remaining derivatives
provided by the modeler. The derivative OP/00 satisfies the following equation:

OF 0Q 0A' 0P , 0A
0---0= 0--0 + fl ~ P A + f i x ~-~ A + fl A P --~
(aw OA' OP a/3)
- -~-~+fl--~-~PB+flA'--~B+flA'P-~- d F

~I['OR 0/31 OP I 0/3~

+1~ k-'~-~ + /3 ~ P B + /3/3' -'~-~/3 + /3/3 P - ' ~ / F

( -ff~ -A-~
OP , OA OW'~
~ F ~ /3 ,,, P A + /3/3' . , A + /3B P .v + O0 ]
244 E.W. Andersonet al.
aP aQ raA' aB'] , raA ]
=/3Ao'-~Ao+-~ff +/3[~ -F' ~ ]PA° + t 3 A o P [ ~ ®BooF

OW F _ F, aW' F' OR
- 0---0- -~ + - ~ F, (B.7)

Although this formula determines only an implicit function for OP/80, the gradient
of P can be represented explicitly in terms of things we know. Define the gradient
operator as follows: for any matrix A that depends on the parameter 0, VoA -
vec(OA/aO). Then,

VOP = (I - flAo' ® Ao')-' { VoQ + fl(Ao+P @ I) VOA' + fl(I @ Ao'P) VoA

-/3(Ao'P ® F') VoB' - t3(F' ® Ao'P) VOB - (F' ® I) VoW
- (s ® F') VoW' + (P' ® F') V0R}, (B.S)
which is a function of the gradients of A, t3, Q, R, and W. The gradient of P can
then be substituted into the following formula for VoF:

VoF = t3(I @ 7~13'P) VoA - 13(F' @ n B ' P ) \7o13 +/3(Ao'P ® Tt) VoB'
- (F' ® 7~) VOR + (I ® n ) VoW' + 13(Ao' ® riB') VoP, (B.9)

where 7~ = (R + t3B+PB) -1. Finally, we substitute this expression for VoF into

VoAo = VoA - (F' @ I) VoB - (I @ 13) VoF. (B.10)

B.2. A nonlinear economy without distortions

The optimization problem that we start with is

max Eo ~ / ~(zt, 0), (B.1 l)

{ud ~=0
subject to zt+l = Axt + But + Cwt+l,
zt = [zt', ut']',

where {wt+l } is a martingale difference sequence and E0 is the mathematical expec-

tation conditioned on time 0 information. We solve a related problem, namely:

max Eo Z f3tzt~Mzt' (B. 12)

{u~} t=o
Xt-]-I = Axt + But,
Ch. 4: Mechanics of Formingand EstimatingDynamicLinearEconomies 245

M=e r(2,0) ~ ~'+ 2 aZ,2 2 e'

1 ar( ,0) c' - 02r(e, 0)

02T(2, 0) 02~'(2, 0))

~2 2 2e t + ~ 05 , (B.13)

and where e is a vector of zeros except for a 1 in the element corresponding to

the constant term in zt, 2 and @ are the steady state values of zt and wt, and
S:~ = [In;0k,n] and S~ = [0,~,k;Ik] (where the " ; " denotes stacking) are selector
matrices and imply z t = S:~zt + &,ut, where n is the dimension of zt and k is
the dimension of ut. The latter problem yields the same decision function as that of
Eq. (B.1) (where Q = Sz'MSx, R = S.~'MSu, and W = Sz'MS.~).
In the nonlinear case, however, the derivatives are slightly more complicated. To
derive 8Ao/OO, we need to calculate derivatives of the coefficient matrices of the
objective function. For this, we need the derivative of M with respect to 0:

OM {~r(2,0)
02r'(2,0) 2q_ 1 ( 02r(2,0) OZ")
a0- a2a0 ae2 ga (:)2

1 2' 03r(2'0) ) e ,
+ ~ a2~a 0 2
1 (:)
+~ e a2O0 + ae0~ 022 a0

( 02r'(2, 0)02) 03T(.g.',0) 03/"(2,0)

- V~ 8~2 ~-~ ( : ) 2 e ' - e ~ ' 02,280 02200 2e'

+ ~ 0228 + Ve 022 ~ (:) , (B.14)

where V~A(z) = [13A(z)/Oz,,..., 8A(z)/Ozn] for A(z) which is n x n and b(:) ~s

an n x n matrix created from a vector of length n 2 by stacking the first n elements
of b into column 1, the next n elements of b into column 2, etc. As this formula
indicates, the modeler must provide first, second, and third-order derivatives of the
return function. The derivatives of Q, R, and W follow immediately from 8M/00,
e.g., 8Q/O0 = Sz'(OM/O0)S~. The remaining derivations are the same as in the
linear-quadratic case.
246 E.W. Anderson et al.

B.3. A linear-quadratic economy with distortions

The optimization problem that we start with is given by

max Eo
t=0 { Zt "022 J Zt l

subject to

Equilibrium conditions are imposed in the form of a set of linear equations

2t = OYt + ~Ctt.

In the notation of this subsection (which differs from that used in Section 7 in the
text), 9t denotes the endogenous state variables affected by the representative agent,
and 2t denotes variables that the agent takes as beyond its control. To ease notation,
we convert the problem to one without cross-products or discounting. Let

Yt = flt/29t,
Zt = f l t / Z 2 t ,

Ut ~ flt/2~t,

W t = flt/2ff)t,

q y = Qy - ~Vu~-lWy ,,
q~ = O~ - W , Ye-l W / ,
Q22 = 022 - 17VzR-lr~Vz 1,
Ay = x/-~(-ay - D ~ R - * w y ' ) ,
A~ = v ~ ( ~ L - B y R - ~ W / ) ,

By = v~B~,
o = (I + ~ R - ~ ~¢~') - ' ( 0 - ~ R - I w y ' ) ,
k~ = ( I + ~/~-lVV~')-lk~. (B.16)
Ch. 4: Mechanics of Forming and EstimatingDynamic Linear Economies 247

With these definitions, we can restate the optimization problem as follows

max ~-"
{ut } ~=0
lIl yt
+ 'tt,t' R U t }
, (B.17)

subject to

~/t+l = AyYt + Azzt + Bvut.

Let X = Av + A~O, Q, = Qv + Q~O, B = By + A~e, and X = Ay - B v R - l g t ' Q / .

The decision function in this case is given by

F= (R + By'PB)-'By'PA, (B.18)

where P satisfies

P = ~) + A ' P A - A ' P B ( R + B y ' P B ) - I B v ' P A . (B.19)

The decision function for the original problem is given by

= ( R + 17Vz'~)-I(RF + IYVv' + IEVz'O), (B.20)

and the equilibrium law of motion for Yt is

Yt+l = Ao'fh, Ao = fly + A~O - f i ~ P - BuF

=/3-1/2(2 - / 3 F ) . (B.2I)

Therefore, the derivative of Ao with respect to a parameter 0 is given by

~F- ~-~0-

To calculate OAo/O0 requires several steps. First, we need the derivatives of A, B,

and F with respect to 0:

0.4 OAv 0AAa~ OO

O 0 - O0 + - - O + O--O'
0/~ _ OBv 0A~ ~Og'

- - -(R + By'PB~ + B;PA~,)-' (k ~0R + --~0~'PBy + 8~' ~0P 8~
248 E. V~Anderson et al.

i aBv aBv ~ OP OA~

+ B, P-gg + --gg-PA~,+B~'~Az~'+B~ e~-~, ,

+ Bv'PAz --~ F + (R + Bv'PB v + Bv'PAz~ ) -1

x ( aBv' OP B 'P oAr

\ aO P A y + By' --~ Ay + y aO

+ -OBvl
- ~ PAzO + B vI -OP
~ A~O + By IP -8Az
- ~ O + Bv'P A, -0~0 )

- (R + By'PB)-' ( --~-ffF+~--OR aBy' p ( ~ - BF)

,aP ^ ^ (OAr OBv F)

+ By ~ ( A - BF) + Bv'P aO O0

+ By'P (@ - ~PF) + Bv'PAz ~0

O0 ~ Y . (B.25)

Note that these derivatives are functions of of OR/OO, OBy/OO, OAy/OO, OA~/OO,
~O/~0, Ok~/O0, and OP/OO. The derivative of R is given since R = ]~. The deriva-
tives for By, Ay, Az, O, and ~P follow from their definitions above, e.g.,

OBv _ ~ ally (B.26)

a0 O0 '
OAr _ V ~ ( a A y O/)v R _ , - a/~ - 1 -
oo \
o0 ao wy' + B~,/~-~ - ~ R - W~'
aWv'" ]
- B'~R-i aO J' (B.27)

aAzao v~( aA~ao a~aoR-~ w~' + B'~-~ ~aP~R-~Wz'


Y 00 J ' (B.28)

O0 (1 -[- I~/~-- 1V/]'z;) - 1 (ak~/~-1
~" iTd, 0 - ~ 1-~ --1 "a/~/~-11~,
~ 0

+~R_1OIEVz'o_ aO ak~ R_117dz, _ ~/~_ l OR R_, 17d,

ao -gg + -~ ao
+ ~/~_1 ~17¢'z'")- (B.29)
aO / '
Ch. 4: Mechanics of Forming and Estimating Dynamic Linear Economies 249

(I+~-lfG') -1 (0¢"
~ & - l V V z ' ~ - ~ - l - g ~ ROR - - 1w- ~,

- - 1 ~OW~'
+ ~PR- -~ - ~a ~ t . (B.30)

The derivative for P-is given by

- Ao + + v/fl I--~- - PAo

+ ~o,p[O~o o~
- ~ F j] + ~) -OR
~ F, (B.31)

where _P = (R + B v ' P B ) - I B P ' A , Ao = - A - B y F , and

oo - o~-+ .~ O+Oz a-~ (B.32)

OA _ OA v OBy R - I ~ U n ~ __OR
O0 ~0 O0 "~ + B y R - I oO R - I ~ U Q /

- B Y R - 1 ~0O~ ' Q~ , - ByR-t~P' 3Q~'

O0 (B.33)

The last two derivatives needed are OQ,v/O0 and 3Q~/O0:

- - ---1 ~ VFry,,
3Qyo0 00yO0 317Vv30 R-117VY' + W y R - I ~ O R-117VY' - W y R -~ , (B.34)

O0 = O0 - O0 z + WvR -1--~ z - WvR-1 00 (B.35)

We now have everything that we need to compute the derivatives of the matrices
in the decision rule and the law of motion for the state vector. To avoid iterating on
Eq. (B.31) for OP/30, we instead take the gradient, e.g.,

Vo P = (I - V/-fiAo ' ® -4o')-' { Vo O, + (I ® Ao' P' ) Vo A

+ Vzfi(Ao'P ' ® I) V o 2 - (F' ® fto'P') VoB
- V/-fi(Ao'P ' ® F') VoB v' + (F' ® F') Von}. (B.36)
Thus the gradient of F is given by
VoF = (I ® 7-4Bv'P ) VoA v + ((0 - ¢JF') ® 7-¢Bv'P ) VoA~
- (F' ® 7-4Bv'P ) VoB v + v ~ ( A o ' P ' ® 7-4) r o b v'
+ v/-fi(Ao ' ® TgBy') VoP - (F' ® 7"4') VoR
+ (I ® TCBv'PA~ ) VoO - (F' ® TiBy'PA~) Vog', (B.37)
250 E. W. Anderson et al.

w h e r e 7~ : ( R + B v ' P B ) - 1 . In terms o f the c o m p u t e r code, we start with E q s ( B . 2 6 ) -

(B.30) and Eqs (B.34)-(B.35), which relate the derivatives o f the original p r o b l e m
to those of the p r o b l e m without discounting or cross-product terms. To c o m p u t e the
gradients of these objects in terms of our inputs, we use the fact that v e c ( A B C ) =
( C ' ® A ) v e c ( B ) for any matrices A, B , and C with the appropriate d i m e n s i o n s such
that A B C exists. We next c o m p u t e the derivatives for A , / 3 , Q, and A w h i c h appear
in E q s (B.23), (B.24), (B.32), and (B.33). Finally, we c o m p u t e V o P in Eq. (B.36),
V o F in Eq. (B.37), and

VoAo : -ln(v02_ (F' ® - ® V0F).


Anderson, B.D.O. (1978) 'Second-order convergent algorithms for the steady-state Riccati equation', In-
ternational Journal of Control, 28:295-306.
Anderson, B.D.O. and Moore, J.B. (1979) Optimalfiltering. Englewood Cliffs, NJ: Prentice-Hall.
Anderson, E.W. (1995) 'Computing equilibria in linear-quadratic dynamic games and models with distor-
tions', University of Chicago, mimeo.
Ansley, C.E and Kohn, R. (1985) 'Estimation, filtering, and smoothing in state-space models with incom-
pletely specified initial conditions', Annals of Statistics, 13:1286-1316.
Bal, Z. and Demmel, J.W. (1993) 'On swapping diagonal blocks in real schur form', Linear Algebra and
Its Applications, 186:73-95.
Bartels, R.H. and Stewart, G.W. (1972) 'Algorithm 432 solution of the matrix equation A X + X B = C',
Communications of the ACM, 15:820-826.
Becker, G.S. and Murphy, K.M. (1988) 'A theory of rational addiction', Journal of Political Economy,
Bierman, G.J. (1984) 'Computational aspects of the matrix sign function solution to the ARE', Proceedings
23rd IEEE Cof!ference on Decision Control, pp. 514-519.
Byers, R. (1987) 'Solving the algebraic Riccati equation with the matrix sign function', Linear Algebra
and Its Applications, 85:267-279.
Caines, RE. (1988) Linear stochastic systems. New York: Wiley.
Caines, RE. and Mayne, D.Q. (1970) 'On the discrete time matrix Riccati equation of optimal control',
International Journal (~[Control, 12(5):785-794.
Caines, RE. and Mayne, D.Q. (1971) 'Correspondence: "On the discrete time matrix Riecati equation of
optimal control-a correction" ', International Journal of Control, 14(1):205-207.
Chan, S.W., Goodwin, G.C. and Sin, K.S. (1984) 'Convergence properties of the Riccati difference equation
in optimal filtering of nonstabilizable systems', 1EEE Transactions on Automatic Control, AC-29(2): 110-
Denman, E.D. and Beavers, A.N. (1976) ~The matrix sign function and computations in systems', Applied
Mathematics and Computations, 2:63-94.
Doan, T., Litterman, R. and Sims, C. (1984) 'Forecasting and conditional projection using realistic prior
distributions', Econometric Reviews, 3(1):1-100.
Economic and Statistics Administration, and Bureau of the Census (1993) The American almanac 1993-
1994. Austin, TX: The Reference Press.
Flavin, M.A. (1981) 'The adjustment of consumption to changing expectations about future income',
Journal of Political Economy, 89:975-1009.
Fletcher, R. (1987) Practical methods (~]:optimization. New York: Wiley.
Ch. 4: Meclumics of Forming and Estimating Dynamic Linear Economies 251

Gardiner, J.D. and Laub, A.J. (1986) 'A generalization of the matrix-sign-function solution for algebraic
Riccati equations', International Journal of Control, 44:823-832.
Gardiner, J.D., Wette, M.R., Laub, A.J., Amato, J.J. and Moler, C.B. (1992) 'A FORTRAN-77 software
package for solving the Sylvester matrix equation A X t 3 T + C X D T = E ' , ACM Transactions on
Mathematical Software, 18:232-238.
Golub, G.H., Nash, S. and van Loan, C. (1979) 'A Hessenberg-Schur method for the matrix problem
A X + X B = C', 1EEE. Transactions on Automatic Control, AC-24:909-913.
Golnb, G.H and van Loan, C. (1989) Matrix computations. Baltimore, MD: Johns Hopkins Univ. Press.
Golub, G.H. and Wilkinson, J.H. (1976) 'Ill-conditioned eigensystems and the computation of the Jordan
canonical form', SlAM Review, 18:578-619.
Gudmundsson, T., Kenney, C. and Laub, A.J. (1992) 'Scaling of the discrete-time algebraic Riccati equation
to enhance stability of the Schur method', IEEE Transactions on Automatic Control, 37:513-518.
Hall, R.E. (1978) 'Stochastic implications of the life cycle-permanent income hypothesis: Theory and
evidence', Journal of Political Economy, 86:971-987.
Hamilton, J.D. (1994) Time series analysis. Princeton, NJ: Princeton Univ. Press.
Hammarling, S.J. (1982) 'Numerical solution of the stable nonnegative Lyapunov equation', IMA Journal
of Numerical Analysis, 2:303-323.
Hansen, L.R (1987) 'Calculating asset prices in three exchange economies', Advances in econometrics,
fifth worm congress. Cmnbridge, MA: Cambridge Univ. Press.
Hansen, L.E, Heaton, J. and Sargent, T.J. (1991) 'Faster methods for solving continuous time recursive
linear models of dynamic economies', in: Rational expectations econometrics. Boulder, CO: Westview
Press, pp. 177-208.
Hansen, L.E and Sargent, T.J. (1994) 'Recursive linear models of dynamic economies', University of
Chicago, mimeo.
Heaton, J. (1993) 'The interaction between time-nonseparable preferences and time aggregation', Econo-
metricu, 61(2):353-385.
Hitz, K.L. and Anderson, B.D.O. (1972) 'Iterative method of computing the ihniting solution of the matrix
Riccati differential equation', Proceedings 23rd IEEE Conference on Decision and Control, Vol.119,
No 9.
Kfigstr6m, B. and Poromaa, P. (1994) 'Computing eigenspaces with specified eigenvalues of a regular
matrix pah (A,/3) and condition estimation: Theory, algorithms and software', LAPACK Working Note
87, mimeo.
Kashyap, R.L. (1970) 'Maximum likelihood identification of stochastic linear systems', IEEE Transactions
on Automatic Control, AC-15:25-34.
Kenney, C.S., Laub, A.J. and Papadopoulos, EM. (1993) 'A Newton-squaring algorithm for computing the
negative invariant subspace of a matrix', IEEE Transactions on Automatic Control, 38:1284-1289.
Kimura, M. (1988) 'Convergence of the doubling algorithm for the discrete-time algebraic Riccati equation',
International Journal ~f Systerns Science, 19(5):701-711.
Kimura, M. (1989) 'Doubling algorithm for continuous-time algebraic Riccati equation', International
Journal of Systems Science, 20(2): 191-202.
Kwakernaak, H. and Sivan, R, (1972) Linear optimal control systems. New York: Wiley/Interscience.
Kydland, E and Prescott, E.C. (1982) 'Time to build and aggregate fluctuations', Econometrica, 50:1345-
Laub, A.J. (1979) 'A Schur method for solving algebraic Riccati equations', IEEE Transactions on Auto-
matic Control, AC-24:913-921.
Laub, A.J. (1991) 'Invariant subspace methods for the numerical solution of Riccati equations', in: S.
Bittanti, A.J. Lanb and J.C. Willems, eds, The Riccati equation. New York: Springer, pp. 163-196.
Lm L. mad Lin, W. (1993) 'An iterative algorithm for solution of the discrete-time algebraic Riccati
equation', Linear Algebra and Its Applications, 188,189:465~88.
MacFarlane, A.G.J. (1963) 'An eigenvector solution of the optimal linear regulator problem', Journal ~#
Electronics and Control, 14:643-654.
252 E. ~ Anderson et al.

McGrattan, E. (1994) 'A note on computing competitive equilibria in linear models', Journal of Economic
Dynamics and Control, 18:149-160.
McGrattan, E., Rogerson, R. and Wright, R. (1995) 'An equilibrium model of the business cycle with
household production and fiscal policy', Staff Report 166, Federal Reserve Bank of Minneapolis, mimeo.
Pappas, T., Laub, A.J. and Sandell, N.R., Jr. (1980) 'On the numerical solution of the discrete-time algebraic
Riccati equation', IEEE Transactions on Automatic Control, AC-25(4):631-641.
Petkov, P., Jr., Christov, N.D. and Konstantinov, M.M. (1991) Computational methods for linear control
systems. Englewood Cliffs, NJ: Prentice-Hall.
Potter, J.E. (1966) 'Matrix quadratic solutions', SIAM Journal on Applied Mathematics, 14:496-501.
Roberts, J.D. (1980) 'Linear model reduction and solution of the algebraic equation by use of the sign func-
tion', International Journal of Control, 32:677-687 (reprint of Technical Report No. TR-13, CUED/B-
Control, Cambridge University, Engineering Department, 1971).
Rosen, S., Mm-phy, K.M. and Scheinkman, J.A. (1994) 'Cattle cycles', Journal of Political Economy,
Sat'gent, T.J. (1987) Macroeconomic theory. New York: Academic Press.
Sims, C.A. (1980) 'Macroeconomics and reality', Econometrica, 48(1):1-48.
Siow, A. (1984) 'Occupational choice under uncertainty', Econometrica, 52(3):631-645.
Stewart, G.W. (1972) 'On the sensitivity of the eigenvalue problem A z = ,~Bcc', SlAM Journal on
Numerical Analysis, 9:669-668.
Stewart, G.W. (1976) 'Algorithm 506 - HQR3 and EXCHNG: FORTRAN subroutines for calculating
and ordering the eigenvalues of a real upper Hessenberg matrix', ACM Transactions on Mathematical
Sq[h,vare, 2:275-280.
United States, Bureau of the Census (1975) Historical statistics ~,~[the United States, colonial times to
1970. Washington, DC: U.S. Department of Commerce.
United States, Bureau of the Census (1989) Agricultural statistics. Washington, DC: U.S. Department of
Van Dooren, P. (1981) 'A generalized eigenvalue approach for solving Riccati equations', SlAM Journal
on Scientific and Statistical Computing, 2:121- 135.
Van Dooren, P. (1982) 'Algorithm 590-DSUBSP and EXCHQZ: Fortran subroutines for computing deflating
subspaces with specified spectrum', ACM Transactions on Mathematical S~#ware, 8:376-382.
Vaughan, D.R. (1970) 'A nonrecursive algebraic solution for the discrete Riccati equation', 1EEE Trans-
actions on Automatic Control, AC-15(5):597-599.
Wilson, D.A. and Kumar, A. (1982) 'Derivative computations for the log likelihood function', IEEE
Transactions on Automatic Control, AC-27:230-232.
Zadrozny, P.A. (1988a) 'Analytic derivatives for estimation of discrete-time, linear quadratic, dynamic,
optimization models', Econometrica, 56:467-472.
Zadrozny, P.A. (1988b) 'Gaussian likelihood,:o~ Continuous-time ARMAX models when data are stocks
and flows at different frequencies', Econometric Theory, 4:108-124.
Zadrozny, P.A. (1989) 'Analytic derivatives for estimation of linear dynamic models', Comlmters and
Mathematics with Applications, 18:539-553.
Zadrozny, P.A. (1992) 'Errata to analytic derivatives for estimation of linear dynamic models', Computers
and Mathematics with Applications, 24:289-290.
Chapter 5


Star,lord Business School


0. Introduction 255
1. Mirrlees' formulation 255
1.1. Statement of the nonlinear pricing problem 255
1.2. The incomplete problem 257
1.3. Auxiliary constraints 258
1.4. Statement of the incomplete problem 258
1.5. Necessary conditions for a solution 259
1.6. Examples 261
1.7. Lessons from a discrete-types formulation 265
2. Computational methods 266
2.1. Direct optimization 267
2.2. Approximation via Fourier series 268
2.3. Introduction to finite-difference methods 269
2.4. Relaxation combined with Newton's method 270
2.5. The pure relaxation algorithm 277
2.6. Other boundary shapes 278
2.7. Higher dimensions 279
2.8. Nonlinear equations 280
2.9. Construction of the price schedules and the tariff 281
2.10. An alternative version 283
3. The complete problem 284
4. A mechanism design formulation 285
5. Summary and conclusions 288

*This work was supported by NSF grants SES9207850 and SBR9511209.

I am grateful for insights from Mark Armstrong, James Mirrlees, and Jean-Charles Rochet, and for valuable
comments from the editors and three other referees.

Handbook ~f' Computational Economics. Volume L Edited by H.M. Amman, D.A. Kendrick and Z Rust
(~) 1996 Elsevier Science B.V. All rights reserved.
254 R. W//~cn

Appendix A. Pseudo-codes 290

Appendix B. APL programs 292
References 292
Ch. 5." Nonlinear Pricing and Mechanism Design 255

O. Introduction

In applications of theories of incentives, the information known privately by an eco-

nomic agent is represented by a point in a Euclidean space. Other agents know the
probability distribution of this point, but not its realization, which is called the agent's
type. For models of this sort, designs of optimal incentive schemes present few difficul-
ties when agents' types are one-dimensional. The computational difficulties are severe,
however, when the types are multidimensional. When the types are m-dimensional,
the main task is to solve a family of partial differential equations to obtain a map
# : R m -+ R m that provides the Lagrange multipliers for each type's incentive-
compatibility constraints.
This chapter describes methods for solving simple versions that arise in nonlinear
pricing and mechanism design. The exposition concentrates initially on nonlinear pric-
ing, but later the similar problems that arise in mechanism design are described briefly.

1. Mirrlees' formulation

To illustrate the origin of the central computational problem, we present the formula-
tion of nonlinear pricing introduced by James Mirrlees (1971, 1976). 1 This formulation
characterizes the design of a tariff offered by a firm to a customer whose preferences
the firm does not know.

1.1. Statement of the nonlinear pricing problem

Consider a monopolist seller who charges a tariff P(q) for a bundle q of its products.
If wealth effects and risk aversion are absent, then a customer of type t is predicted to
respond with the purchase q(t) that maximizes his net benefit U(q, ~) - P(q) among
the feasible bundles q E Q.2 Here, the utility function U measures the customer's gross
benefit in money terms, depending on both the bundle purchased and the customer's
type. The customer knows his type but the seller does not, although the seller knows
the distribution of types in the population, or equivalently, the probability distribution

lMirrlees' initial formulation focused on optimal taxation. Nonlinear versions of Ramsey pricing and
taxation are variants of the general principal-agent problem affected by adverse selection and/or moral
hazard. A formulation in the context of mechanism design is presented in Wilson (1993b), which includes
extensions to cases where agents' types are correlated, and allowance is made for risk aversion and wealth
effects. For surveys of other applications, see Roger Guesnerie and Jean-Jacques Laffont (1984) and Wilson
(1993a, §15). Jean-Charles Rochet (1994) solves completely a class of discrete two-dimensional pricing
problems, and demonstrates the usefulness of the mathematical technique of 'measure sweeping'.
2Formulations that include risk aversion and wealth effects are presented in Mirrlees (1976, 1986),
Kevin Roberts (1979) and Wilson (1993), among many others. We focus on a formulation without these
effects, which captures the key ingredients of the main computational problem.
256 R. Wilson

of the type of a single customer. The seller's objective is to offer the tariff that
maximizes its expected revenue.
Assume hereafter that a' bundle is represented by a vector q = (ql,. • •, qe) in an
g-dimensional Euclidean space. Also, the set Q of feasible bundles is the nonnegative
orthant, and adopt the normalization U(0, t) = 0.
A model with these ingredients can be formulated with either discrete types or a
continuum of types. Presumably there are applications where discrete types are natural,
such as family size, ethnicity, or employment status. In most applications, however,
the type parameters are naturally interpreted as continuous. Typical parameters in-
clude the marginal utility of income; demand elasticities, cross elasticities, and other
parameters of demand functions; and socio-economic and demographic variables such
as wealth, age, and years of education. For instance, the specification of the utility
function U might be derived from a regression model that uses these parameters to
fit d e m a n d data. From a computational viewpoint it might be a matter of indifference
whether one formulates the model with discrete types initially, or instead uses a con-
tinuum formulation that is then analyzed using discrete approximations of the sort
presented in Section 2. This chapter is written, nevertheless, from the perspective that
the main objective is to approximate the solution of a formulation with a continuum
of types. A further motive is to take advantage of the simpler conditions and faster
computations obtained from the more powerful methods available for solving a model
with a continuum of types. For instance, when sufficient regularity assumptions are
satisfied, the continuum formulation enables the problem to be reduced to the solution
of a partial differential equation subject to boundary conditions. Thus the formulation
in this section focuses mainly on the formulation with a continuum of types, as a
prelude to the exposition of the numerical methods presented in Section 2. As will
be seen, it also has the advantage of clarifying the peculiar difficulties engendered by
multidimensional types for which there is no natural strict ordering. 3
In the continuum formulation, the set T of possible types is represented as a com-
pact, convex, full-dimensional subset of an m-dimensional Euclidean space, with a
piecewise-smooth boundary. Assume further that U is a smooth increasing function of
both the bundle and the type, and a concave function of the bundle. It is also usual to
impose conditions ensuring the existence of a tariff that induces self-selection by the
different types of the customer: in the one-dimensional case with a single commodity
and a single type parameter (i.e., g = m = 1), a typical condition requires that Uqt
is uniformly positive and the probability distribution of the type has an increasing
hazard rate [cf. Wilson (1993a, §8)]. 4

3This is actually essential to good modeling in any practical application, because it reflects an essential
aspect of heterogeneity among the population that would be erased by a unidimensional formulation, or
obscured by a discrete formulation that ignored the underlying multidimensionalcharacter of the types.
4In multidimensional versions, the relevant conditions on the utility function are the 'increasing differ-
ences' and the 'single crossing' properties [cf. Milgrom and Shannon (1994)]. McAfee and McMillan (1988)
state a 'generalized single crossing condition' that ensures that global incentive-compatibilityconditions
can be replaced by each type's first- and second-order local conditions.
Ch. 5." Nonlinear Pricing and Mechanism Design 257

1.2. The incomplete problem

The important aspect of Mirrlees' formulation of the seller's problem is to construct the
solution of an incomplete version in which some of the possibly relevant constraints
are omitted. In the single-type case it is known that fairly weak conditions suffice
to ensure that the solution of the incomplete problem is the solution of the complete
problem; or, the solution can be obtained from a simple modification (called the
ironing procedure) described by Mussa and Rosen (1978), Guesnerie and Laffont
(1984) and Wilson (1993a, §8). Although comparable sufficiency conditions have not
been established for multidimensional formulations, here we present computational
methods only for the incomplete problem until an example of the complete problem
is described in Section 3. 5
The motivation for the incomplete problem used in Mirrlees' formulation stems
from the following considerations. 6 Given the tariff P , the net benefit of type t is

W ( t ) - max U(q, t) - P ( q ) .

Alternatively, given the net benefit W ( t ) , the seller's revenue from type t is

]~(t) -- P ( q ( t ) ) = U ( q ( t ) , t ) - W ( t ) . (1)

In this alternative representation, the optimality of the bundle q(t) is conveyed, in

part, by two necessary conditions.
• Incentive-compatibility constraint:

w~(t) = W~(q(t), t) , (2)

if the bundle q(t) is in the interior of Q.

The incentive-compatibility constraint states the envelope property implied by the cus-
tomer's optimization. Because the type t has m dimensions, each side of the incentive-
compatibility constraint is a gradient vector; e.g., Ut (q, t) - (~iTU(q, t))i=j ........
® Participation constraint:

w ( t ) >1 o. (3)
5Various sufficiencyconditions invoked to enable representation of the customer's optimality condition
by the envelope property are described by Mark Armstrong (1992, 1993), Mirrlees (1976, 1986) and
Wilson (1993a, ~8), though mainly for one-dimensional formulations; and McAfee and McMillan (1988)
for multi-dimensional formulations.
6This specification assumes nonstochastic pricing. The formulation of a stochastic mechanism is illus-
trated in Section 4.
258 R. Wilson

The participation constraint recognizes that the customer retains the option to forego

1.3. Auxiliary constraints

These constraints are just two of the necessary conditions required for a solution of
the customer's optimization problem. To illustrate some of the other conditions that
are potentially relevant, we describe three. One is evident if U is a convex function
of the type: in this case W must be a convex function, since it is obtained as the
pointwise maximum of a family of convex functions indexed by the bundle] Another
is that the customer's local second-order necessary condition for a maximum must
be satisfied. For instance, suppose there exists a smooth mapping t(q) specifying
the type purchasing bundle q. From the customer's first-order necessary condition
one infers that the vector of marginal prices at q is Pq(q) = Uq(q,t(q)). Then the
second-order condition that requires Uqq(q, t) - Pqq(q) to be negative semi-definite
at q = q(t) implies that Uqt(q,t(q)). tq(q) must be positive semi-definite. In the
one-dimensional case, if Uqt > 0 then this requires that t(q), and therefore also
q(t), are nondecreasing functions. A third is the requirement that Ut(q(t), t) must be
integrable to obtain W(t), as described below in (7). For a detailed elaboration of these
auxiliary conditions and how the solution of the incomplete problem can be modified
to satisfy them in the one-dimensional case, see Guesnerie and Laffont (1984). Here
we ignore these second-order conditions and develop computational methods only for
the incomplete problem.

1.4. Statement of the incomplete problem

Assume for simplicity that the seller's costs are nil, so that its objective is to maximize
its expected revenue. Let f(t) be the probability density that the customer's type is
t (or the number of customers of that type), defined on the support T C R m of
possible types. Then the seller's optimizafion problem can be cast as choosing the
two functions q(t) and W(t) to maxiniize its expected revenue

JTR(t) dt =---fr[U(q(t),t) - W(t)]f(t) d t , . . . d t m ,

subject to the incentive-compatibility and participation constraints (2) and (3). Note
that this formulation converts the seller's optimization from the assignment of a tariff
to each bundle, to the choice of an assignment of a bundle and a net benefit to each


7If U is a linear function of the type then convexity of W is both necessary and sufficient, given the
other conditions, as shown by Rochet (1987).
Ch. 5: Nonlinear Pricing and Mechanism Design 259

1.5. Necessary conditions for a solution

To address this incomplete problem, let #(t) be an m-dimensional Lagrange multi-

plier attached to the incentive-compatibility constraint. The Lagrangian form of the
objective function is then

f T { [U (q(t), t) -- W ( t)]f (t) + [Wt(t) - Ut(q(t), t)] . #(t)} d t ,

where dt - dr1. • • dt,~. This objective presents a classical problem in the calculus of
variations. The Euler and transversality conditions therefore provide necessary condi-
tions characterizing an optimal solution. To understand the source of these conditions,
it is useful to eliminate the term involving the gradient Wt by using the form of mul-
tidimensional integration by parts obtained from the Divergence Theorem. According
to this theorem,

f T Wt(t) . #(t)dt = f~T W(t)[u(t) . #(t)]ds(t) - / T W ( t ) ~ ~(t) dr.


Here, u(t) is the outward-pointing normal vector at a point t on the boundary OT of T,

and f o r ' " ds(t) represents the surface integral over the boundary; the summation is
called the divergence of #, often denoted by V- #. With this substitution, the objective
is transformed into an ordinary maximand that can be optimized pointwise on T and
its boundary OT.
On the assumption that W and # are smooth functions, three necessary conditions
for an optimum are the following: 8

• Allocative optimality of the assignment of bundles:

U~(q(t), t)f(t) - Uqt(q(t), t) . #(t) <~ O, (4)

and this inequality is complementary to the feasibility constraint q(t) ~> 0. This
condition is sometimes represented as the necessary condition for maximization
of type t's virtual benefit, defined by 9

(?(q,t) =_ U(q,t) - Ut(q,t) . # ( t ) / f ( t ) .

8Two inequalities a <~0 and b >/0 are complementary if their inner product is nil: a • b = 0.
91n the Ramsey formulation of regulated nonlinear pricing the firm is allowed profits sufficient to
cover a fixed cost, and the objective is to maximize consumers' surplus subject to this constraint. This
modification adjoins a multiplier c~ E [0, 1] so that the virtual benefit is (J _~ U - c~Ut#/f.
260 R. Wilson

• Welfare optimality of the assignment of net benefits:

-f(t)- ~ ~(t) ~ O, (5)


and this inequality is complementary to the participation constraint W(t) >/0 on

the interior of the type domain. This is the Euler condition.
• Welfare optimality on the boundary:

~,(t). ,(t) ~< o, (6)

if t E OT, and this inequality is complementary to the participation constraint
W(t) >~ O. This is the transversality condition.
It is important to realize that the transversality condition (6) is essential. In one-
dimensional problems it tends to be innocuous because it binds at only a single
point; in multidimensional problems, however, it typically binds over most of the
boundary, and this accounts for considerable computational difficulties. Also, (5) and
(6) cannot be equalities throughout T because the Divergence Theorem would then
imply fT f(t) dt = 0. Typically, they are strict inequalities for a set of positive measure
comprising 'small' types for whom the participation constraint W(t) = 0 is binding
[cf. Armstrong (1992, 1993)].
These conditions indicate that the key step in obtaining a solution is to construct
the Lagrange multiplier # from (5) and (6). I° Once this multiplier has been obtained,
the optimal assignment of types to bundles is obtained by solving the ordinary Eq. (4).
Moreover, at the bundle q = q(t) the vector p(q) -~ Pq(q) of marginal prices must be
p(q) = Uq(q, t). The tariff P(q) is therefore obtained by integrating these marginal
prices, using the participation constraint (where it binds) to determine the constant of
The main computational task, therefore, is to construct the multiplier # by solving
the welfare-optimality condition (5) and the transversality condition (6), interpreted
as equalities on T and ~)T. It is important to realize, however, that if m > 1 then the
single partial differential Eq. (5) and the boundary condition (6) are insufficient to
determine the m components of the multiplier. For this one needs m - 1 additional
conditions derived from the requirement that the gradient vector of marginal prices
must be integrable to obtain the tariff P. Because W(~) = U(q(t),t) - P(q(t))
according to (1), an equivalent requirement is that the gradient Wt of the net benefit
must be an integrable function of the type [Mirrlees (1986, p. 1241)]. Further, the
incentive-compatibility condition (2) imposes Wt (t) = Ut (q(t), t) so this requirement
can be stated in terms of the following auxiliary condition.

raThe alternative eliminates /~(t) by substituting (4) into (5) and (6) to solve for q(t). If Uq is linear
then this has the same difficulty as solving for/~, and otherwise it involves nonlinear partial differential
Ch. 5: Nonlinear Pricing and Mechanism Design 261

® Integrability: the vector field

Ut(q(t),t) is integrable. (7)

Unfortunately, the integrability condition is stated in terms of the assignment q(t)

of bundles, which depends only indirectly on the Lagrange multiplier #(t) via the
allocative optimality condition (4). Only in special cases where (4) can be inverted to
identify the bundle as an explicit function of the Lagrange multiplier can one avoid
this awkward formulation by rephrasing (7) directly in terms of #(t) without reliance
on the bundle q(t).
If Wt(t) = Ut (q(t), t) is both integrable and differentiable then its Jacobian maU'ix
Wtt (t) is the Hessian matrix of W and therefore it must be symmetric. Conversely, it
is well known that with sufficient smoothness assumptions, symmetry of the Hessian
matrix is equivalent to integrability. Moreover, it is sufficient that only m - 1 of the
symmetry conditions are satisfied:

02W 02W
Otiti+------~(t)- Oti+lti(t), i 1,...,m-1.

In practice, these symmetry conditions are written out explicitly in terms of the cor-
responding terms of the Jacobian of Ut(q(t), t). The net result is that the integrability
condition (7) imposes r a - 1 auxiliary partial differential equations that with (5)
and (6) ordinarily suffice to determine the multiplier. As mentioned previously, these
auxiliary equations are often awkward to handle because they are cast in terms of
the bundle, which depends on the Lagrange multiplier via the allocative optimality
condition (4).

1.6. Examples

When m = l, the integrability condition is vacuous and the welfare-optimality and

transversality conditions have the trivial solution #(t) = 1 - F(t) independently
of U, where F is the distribution function for the density f ) l This simple result
produces most of the well-known properties of the single-dimensional problem; e.g.,
an immediate consequence of the allocative optimality condition (4) is that #(t*) = 0
for the highest type t*, and therefore the bundle obtained by this type has a marginal
price equal to the marginal cost of zero.

llWhen the hazard rate of F is not increasing the customer's second-order necessary condition for an
optimum need not be satisfied. In this case, it may be necessary to apply the ironing procedure to modify
the bundle assignment so that it is nondecreasing [cf. Guesnerie and Laffont (1984) and Wilson (1993a,
§8)]. In contexts of multiproduct bundling and of mechanism design, examples are also known in which
the optimal mechanism is stochastic [cf. Myerson (1981) and Rochet (1994)].
262 R. Wilson

To illustrate the form of the integrability condition when g = m ~> 2, suppose

U ( q , t ) = qr . [A . t - ½B . q] ,
so that Uq(q,t) = A . t - B . q, (8)

where A and B are matrices, and B is symmetric and positive definite. 12 Then the
vector field W t ( t ) = q(t) T . A must be integrable, where q(t) = B - 1 . A . [t - ~(t)]
if the nonnegativity constraint is not binding. The corresponding Hessian matrix is
W t t ( t ) = C . [I - #t(t)], where the matrix C =- A T . B -1 • A is symmetric and
positive definite. Consequently, integrability requires that C . #t(t) is a symmetric
matrix. When m -- 2, therefore, the condition that ensures integrability is an equality
between the off-diagonal elements of this matrix:

8#i 8#2 8#i 8#2

C 1 1 ~ 2 -]- C12-'~2 ~---e 2 1 - ~ -1 -[- C22 ~;----T

where c12 = e21 by construction.

The interplay among the four conditions when m > 1 is illustrated in the following
example, which allows a closed-form solution.

Example 1. For this example, g = ra and A = B = I so

U(q,t) = ~-~ (tiqi- 2q2) ,


and f ( t ) = 1 on the domain

T={tlti>~Oand ~'~t~Er2} '

which is the positive orthant of the ball with radius r. To derive the solution an-
alytically, we guess that W ( t ) = w ( y ( t ) ) , where w is a univariate function and
y ( t ) = ~ i t2. If this specification allows a choice of w satisfying the other conditions,
then the integrability condition (7) is satisfied automatically. From the condition (4) for
an optimal assignment we obtain q(t) = t - # ( t ) and from the incentive-compatibility
condition (2) we obtain W t (4) = q(t), so the specification implies that

w'(v(t))2t = t - #(t).

12The increasing-differences property requires essentially that A has nonnegative elements, and the
super-modularity version of the single-crossing property requires that the off-diagonal elements of B are
nonpositive [cf. Milgrom and Shannon (1994)]. Invoking super-modularity requires that the type space is
a complete lattice, such as an m-dimensional cube.
Ch. 5: Nonlinear Pricing and Mechanism Design 263

Differentiating this relationship and then summing yields

i i

Invoking the welfare-optimality condition (5) yields

w " ( y ) 4 y + w' ( y ) 2 m = rn + 1,

which is a differential equation for w' as a function of y. On the upper boundary

where y(t) = r 2, the outward-pointing normal is proportional to u(t) = 2t, so the
transversality condition (6) imposes the boundary condition 2t • #(t) = 0, or equiva-

2ti [ti - w 1 ( r z ) 2 t ~ ] = O,

which requires wl(r 2) = 1/2. The unique solution of the differential equation that
satisfies this boundary condition is

w'(y)=~ l+--rn 1-

Consequently, the multiplier is

p(t) = ~1 -1 t

Fortuitously, this multiplier also satisfies the transversality condition tt~(t) = 0 on the
lower boundary where ti = 0. Therefore, it satisfies all of the conditions required.
To obtain the optimal schedule p(q) of marginal prices, we use the fact that
the optimal purchase by a customer of type t satisfies the necessary condition
U q ( q ( t ) , t ) = p(q(t)), which in this example requires p(q(t)) = t - q(t), while
the allocative optimality condition (4) requires t - q(t) = #(t). Consequently,
p(q) = #(t(q)), where t(.) is the inverse of the map t ~+ q(t) = t - #(t). The
types making positive purchases are those for whom q(t) > 0, or equivalently
X/~ > r a ( m ) , where the parameter a(m) -~ [1 + m] -1/'~ increases from 1/2
to 1 as m increases. Each interior type purchases either all products or none.
This example illustrates one of the novel features introduced by formulations that
include multiple products and multiple type parameters. In this example, the marginal
price of each product is initially increasing; moreover, the marginal price of each
264 R. WiLwn

product depends on the quantities of the other products purchased. This aspect will
be elaborated further when we explain later the calculations leading to Figs 3-5.
Similarly, when the density is exponential, say f ( t ) = e -v(t) on the domain {t ]
y(t) ~ r 2 }, one obtains

i i i


= f(t) - w' (v(t))w (t)],

where yi =_ Oy/Oti, provided this yields w ~ as a function solely of y. For instance, if

y(t) = 1 / 2 ~ i t 2 so that the density is Normal with r = ec, and m is even, then

.(t)= f(t) t,

where al = 1 and ak+l = Ira/2 - k]ak. Caution is required, however; e.g., if

y(t) = ~ i ti then the resulting multiplier cannot satisfy the transversality conditions,
which stems from the feature that the resulting optimal bundles lie along a one-
dimensional ray. Hereafter we assume that the range of the bundles is full-dimensional
to exclude such complications.
Example 1 is one of a rare collection allowing closed-form solutions; others that
also rely on exploiting radial symmetries are described in Armstrong (1992, 1993) and
Wilson (1993a, §13.5; 1993b, p. 145 ff.).13 The following sections therefore describe
numerical methods for solving the differential Eqs (5) and (7) with the boundary
condition (6). We concentrate on the case g = m = 2. As we shall see, this case
with two products and two type parameters shows already that the problem poses
computational difficulties.
For later reference we mention also a second, closely related example for which
the method used for Example 1 is evidently insufficient.

Standard test problem. This example is the same as Example 1 with g = m and
f ( t ) = 1, but the support T is the unit hypercube.
In spite of its simplicity, the test problem does not seem to be solvable in closed
lbrm. In addition, it is peculiar in that it has a continuum of other 'solutions' whose

13In such examples the optimal tariff depends only on a one-dimensional measure of the aggregate
'size' of the bundle purchased. The analysis of Example 1 in Wilson (1993a, p. 340) has an error: the
leading coefficient of/zi should be 1/2.
Ch. 5: Nonlinear Pricing and Mechanism Design 265

non-optimality is revealed only by the fact that they are discontinuous. For instance,
one such 'solution' is

m[1-ti] ifti~>maxtj,
t~i~j= ti ift~<maxtj,

which for m = 2 is discontinuous along the diagonal of the unit square. These
extraneous solutions indicate that caution is required in using algorithms that rely
on discrete approximations. Such algorithms usually provide no assurance that the
differentiable solution, which is the only optimal one, is the one approximated by
numerical calculations.

1.7. Lessons f r o m a discrete-types a+brmulation

The standard test problem includes an implicit assumption that the type parameters are
distributed independently among the population of customers. Rochet (1994) finds that
correlations can have pronounced effects. He studies the two-dimensional problem in
the case that the support T comprises only the four corners of a translate of the unit
square. A review of his formulation shows the connections between the models with
continuous and discrete types.
In Rochet's setup, a customer of type t obtains the gross benefit U(q, t) = t. q from
the bundle q, and the seller's net revenue is R ( t ) = U(q(t), t) - W ( t ) C(q(t)),
where C(q) = l q . q and

W ( t ) = U(q(t), ~) - P ( q ( t ) ) : max U(q, t) - P ( q ) .


Notice that he interprets the quadratic term as a cost borne by the seller, and therefore
U is bilinear; this alters slightly the interpretation of the participation constraint. The
discrete form of the incentive-compatibility constraints requires in this case that

w(t) - [t -

for every two points t and s in T. This constraint requires that type t does not prefer the
bundle assigned to type s; if the constraint is binding, one says that s attracts t. Using
tz(t, s) as the multiplier on this constraint, the corresponding optimality conditions are,
for each point t c T where f ( t ) > 0 is the mass or number of customers of that type:

® Allocative optimality: [~ - q(t)]f(t) - ~ [ s - t]p(s, t) <~ O.

* Welfare optimality: - f ( t ) - ~ s [ P ( s , t) - #(t, s)] ~< 0.
266 R. Wilson

These conditions, along with the complementarity conditions, are evident analogs
of (4) and (5). Because the seller's maximization problem has a concave objective
and linear constraints, these conditions are both necessary and sufficient. From a
computational viewpoint, the problem is solvable by standard software for quadratic
Rochet assumes that the translate of the unit square considered moves the origin
to a point sufficiently positive that it is the only binding participation constraint. He
then determineshow the solution is affected by the mass function f ( t ) . He identifies
essentially three cases. In the first, f reflects a sufficiently strong positive correla-
tion between the two type dimensions, and in this case the problem is separable: the
problem is solved by treating each dimension separately using only the marginal distri-
butions along the type dimensions. In the second, the correlation is neither too positive
nor too negative, and the solution is obtained from the observation that the binding
incentive-compatibility constraints are the adjacent 'downward' ones: #(~, s) > 0
only if t - s = (1,0) or t - s = (0, 1). This is the 'regular' case implied by the
usual regularity conditions, and corresponds to the standard test problem. Asymmet-
ric versions are also possible in which not all the adjacent downward constraints are
The third case, in which the correlation is sufficiently negative, introduces novel
features. 14 For instance, suppose most customers have the type that is the translate of
(1,0): then all three of that type's incentive-compatibility constraints are binding. A
consequence is that, in contrast to the other cases, the 'highest' type at the translate
of (1, 1) receives an inefficiently large quantity of the second product, induced by a
marginal price that is below marginal cost. The motive for this result is that the seller
wants to obtain the greatest profit from type (1,0), which dominates the population.
Because (1, 1) attracts (1,0), the seller can do this only by assigning (1, 1) a large
quantity of the second product; this precludes (1,0) from preferring (1, 1)'s bundle,
because (1,0) values the second product less than (1, 1) does.
These observations demonstrate that even if the problem is quite regular in other
respects, local negative correlations in the distribution of types can produce results
with properties quite different from those inferred from studies of unidimensional
problems. From a computational perspective, it seems clear that in such cases there
is no practical alternative to solution via direct optimization methods using standard
software for constrained nonlinear programming.

2. Computational methods

In this section we describe numerical algorithms for solving Mirrlees' incomplete

problem. Several are described only for the standard test problem. Implicit throughout

14This case is excluded by regularity assumptions that require the distribution of types to be affiliated, as
in Milgrom and Shannon (1994). Affiliation requires that the type parameters ,are nonnegatively correlated
on every hypercube [cf. Milgrom and Weber (1982)].
Ch. 5." Nonlinear Pricing and Mechanism Design 267

this section is the assumption that the problem is sufficiently regular that only the
local incentive-compatibility constraints specified by the envelope condition (2) are

2.1. Direct optimization

We mention first the computational method used by Armstrong (1992). This method
solves directly a discrete version of the seller's optimization problem. The set T of
possible types is represented by a finite set T comprising a discrete grid of points
with mesh size 5. The seller's problem is then cast as the nonlinear constrained
optimization problem in which the objective is to choose the nonnegative variables
(q(t), W(t))tcT~ to maximize the expected profit

[U(q(t), t) - W(t)] f ( t ) 8 "~


subject to the discretized incentive-compatibility constraints

W ( t + Sei) - W ( t ) = ~ t (q(t),t)5 ,

for each point t C T and each i ~< m, where ei is the ith unit vector. If T is
the unit cube and 5 = 1 / n then this formulation requires [m + 1]n m variables and
rnn '~ (generally nonlinear) constraints, in addition to the nonnegativity constraints.
Thus when rn = 2 the coarse grid with n = 20 requires 1200 variables and 800
constraints. Nonlinear optimization problems of this size are feasible but storage and
time requirements are substantial. Armstrong (1992) reports results for the standard
test problem, for which the incentive-compatibility constraints are linear and symmetry
considerations allow half as many variables. This method seems to have little prospect
of being feasible and accurate when the number m of type dimensions exceeds two.
Direct optimization via constrained nonlinear programming is simplest in design,
and it has the great advantage that powerful software is readily available. However,
it is uneconomical in terms of storage and speed if the problem is sufficiently regular
that only local incentive-compatibility conditions bind; and presently infeasible if
the number of dimensions exceeds two. Software for constrained optimization must
ultimately solve an equivalent set of equations, but typically it takes no account of the
special structure of the problem - notably the simplification of the welfare-optimality
condition (5) via the Divergence Theorem. 15

tSA comparisonbetween the short speedy programs in Appendix B and the typically massive optimiza-
tion software conveys the difference. One skeptical referee reported that the capacity of his or her software
was exhausted by a grid with only eleven points along each of two type dimensions.
268 R. Wilson

2.2. Approximation via Fourier series

For problems in which the utility function U has a simple analytical form it may be
feasible to derive a Fourier (or other finite-element) approximation. The technique
relies on the derivation of an auxiliary second-order partial differential equation for
each multiplier, from which a class of solutions represented as Fourier series can be
We illustrate using the standard test problem. For this problem the bundle purchased
by type t is q(t) = t - Iz(t) and the incentive-compatibility constraint requires that
Wt (t) = q(t). Consequently, the integrability condition that the Hessian matrix Wtt (t)
is symmetric requires that the Jacobian matrix #t(t) is symmetric. Thus, a solution
requires that

0]A1 (~') -- - ~ 1 (~) = 0

in addition to the welfare-optimality condition

0lZl 0#2
(t)+~v-, (t)+f(t) =0.
0tl 0~2

Differentiating these two conditions with respect to t2 and t 1 respectively yields a

single second-order equation for #i (t):

0t, (t) + ' + (t) = O,

and analogously for #> Thus one obtains a classical Poisson equation for each multi-
plier separately. The problem is not of the standard Dirichlet 01" von Neumann form,
however, because the boundary condition where/~2 C {0, 1 } depends on the solution
of the analogous problem for the other multiplier.
Similarly, if Uq is linear as in (8) and e i i = 1 and el2 = 3' then one obtains the
elliptic equation

0t 2 (t) + ~ - 2 2 , , - 2 7 ~ ( t ) + (t) - ~, (t) -- 0.

Similar equations arise in higher-dimensional versions with constant coefficients.

In the standard test problem, the density is f ( t ) = 1 on the unit square, and therefore
Of lOt1 = 0, which yields a classical Laplace equation. For the transversality condition
that imposes the boundary condition #l (t) = 0 for ti C {0, 1 } and t2 > 0, Fourier
Ch. 5: Nonlinear Pricing and Mechanism Design 269

representations of the multiplicatively separable solutions of the Laplace equation take

the form [cf. Milne (1970, § 10) and Derrick and Grossman (1987, § 10)]:

#5 (t) = ~ ak sin(kTrtl ) cosh(kzr[1 - t2]),


where the constants ak are arbitrary. Moreover, if #5 has this form then the welfare-
optimality condition implies that

#2(t) = 1 - t2 + ~ ak cos(kTrtl) sinh(kTr[1 --t2]),


where we have fixed the constant of integration to satisfy the transversality condition
at t2 = 1. The transversality condition on the remaining portion of the boundary
where t2 = 0 imposes the boundary condition #2(t) = 0 there, at least for those types
for which tl is large enough that the net benefit W ( t ) is positive. This yields a set of
linear equations of the form

~--~ bk cos(kTrtl) + 1 = 0,

where bk ~- ak sinh(k~r), to be solved for a set of n values of tl to determine the

coefficients bk. In fact, if the chosen points are tl(k) = 1 - [k - 116 where the mesh
size is 3 = l / n , then bk = 2 except that bn = 1.
In sum, the Fourier method consists of the following steps. First one derives a
Poisson equation for one of the multipliers, say #5. This equation is derived by
differentiating the welfare-optimality condition and the integrability condition, and
then solving the resulting equations to eliminate the other multiplier, if this is possible
in closed form. From the Poisson equation one establishes a family of Fourier solutions
satisfying the boundary conditions for #5 on the segments of the boundary where tl =
0 or tl = 1. From the welfare-optimality condition, interpreted as an ordinary first-
order differential equation for #2, one then obtains a corresponding family of Fourier
solutions for /-52. The last step is to determine the coefficients from the boundary
conditions for #2 on the remaining segments of the boundary. This method for the
two-dimensional case indicates the outline of a method for higher-dimensional cases,
hut no examples have been solved.

2.3. Introduction to finite-difference methods

Algorithms based on finite-difference approximations encounter fundamental diffi-

culties. To motivate the circuitous approach taken in the design of the algorithms
270 R. Wilson

described later, we first describe the source of these difficulties. We use . i [j, hi to
indicate the value of #i(t)/5 at a grid point t = (tl, t2) where ti = ja and t2 = ha
for the standard test problem with mesh size 5.
With this notation, a naive approach to representing the welfare-optimality and
integrability conditions in terms of finite differences produces the two equations

(Pl [j + 1, k] -., [j, k]) + (.2[j, k + 1] - .2[j, k]) + f(ja, ha) = O,

(.1 [J, k + 1] - . , [j, k]) - (.2[j + 1, k] - .z[j, k]) = 0,

at each interior grid point, and analogous equations on the boundary (excluding the
origin). In fact, however, this approach is doomed to failure: such equations are
invariably both singular (redundant columns in the associated matrix) and inconsistent
(redundant rows with incompatible constant terms)!
These deficiencies stem from an important economic consideration. Although the
welfare-optimality condition is properly formulated in terms of forward differences,
the integrability condition must be formulated in terms of backward differences. This
reflects the orientation of the incentive-compatibility constraint, which in the discrete
version is binding for lower types.
Three schemes are described below. Each takes account in a different way of
the opposite directional orientations of the two conditions. The first two rely on a
relaxation algorithm, and later we describe an alternative scheme that relies on a
direct algorithm. 16

2.4. Relaxation combined with Newton's method

The Newton-relaxation algorithm takes the conservative view: all directions are in-
cluded in the construction of the values at one grid point from the values at its
neighbors. This tactic is implemented by invoking two refinements in the formulation.
The first refinement introduces the analogs of the Eqs (9) in which the stencil is
rotated. Solving (9) yields
"1 [j, k] = ~ ( f ( j a , /ga) ÷ .1 [j -]- 1, ]g] ÷ "1 [j, 1~ ÷ 1] + "2[j, k + 1]
- . 2 [ j + 1,

and analogously for .2[j, k]. This representation is the basis for a recursive formula
that relies on a three-point stencil in which the values .1 [J, k] and .2[j, hi are con-
structed from the values at the two adjacent points higher in the grid. This stencil
is shown schematically in the upper left of Fig. 5.1: the values at the corner • are
computed from the values at the two endpoints. Also shown in the upper right of

JtSee Golub and Ortega (1992, §9) for expositions of these standard algorittuns.
Ch. 5: Nonlinear Pricing and Mechanism Design 271

Basic Stencil Stencil Rotated 90 D

Composite Stencil
Constructed from the Four Rotations of the Basic Stencil
Figure 5.1. The two-dimensional asymmetric three-point basic stencil in which the values at the comer •
are computed from the values at the two endpoints higher in the grid. Also shown is this stencil rotated
90 ° , and the composite five-point stencil obtained from summing over the four rotations of the basic stencil
by multiples of 90 ° . The composite stencil is symmetric: values at the center • are computed from the
values at the four endpoints.

the figure is this stencil after its orientation is rotated 90 ° . Rotating this basic stencil
through 90, 180, and 270 degrees, and then adding the four f o r m u l a s produces the

#l[j,k] = ¼ ( # l [ j + 1,hi + # l [ j , k + 1] + # l [ j - 1,k] + # l [ j , k - 1])

for the standard test problem. This formula corresponds to the c o m p o s i t e s y m m e t r i c

five-point stencil s h o w n at the bottom of Fig. 5.1: the values at the center are c o m p u t e d
f r o m the values at the four endpoints.
N o t i c e that adding the four rotations o f the basic stencil eliminates the d e p e n d e n c e
on adjacent values o f the other multiplier. M o r e o v e r , this recursion is precisely the
one obtained f r o m the natural five-point stencil d e r i v e d f r o m the Laplace equation.
The r e c u r s i o n is s y m m e t r i c , produces a nonsingular system o f equations, and yields
a relaxation algorithm that is unconditionally stable, t7 In m o r e general problems,
one wants assurance that inclusion of the rotations o f the basic stencil will suffice to
assure that it is the differentiable solution that is a p p r o x i m a t e d by the calculations. No

17The latter property is well known for elliptic equations [cf. Golub and Ortega (1992)]. Milne (1970,
§10) uses the rotations based on multiples of 45 degrees to obtain a recursion based on a nine-point stencil
for which the error from the discrete approximation is smaller by an order of magnitude. In the sequel we
omit this refinement as well as others, such as implicit methods, successive over relaxation, and alternating
direction methods [cf. Golub and Ortega (1992, §9)].
272 R. WiLson

general theoretical justification for this conclusion seems to be known, however, and
therefore one must rely pragmatically on the encouraging evidence from computational
The second refinement adapts the transversality condition to obtain a formulation
that relies on only the one multiplier #1 occurring in the recursion used for the
relaxation algorithm. No modification is required on the two sides of the unit square
where tl 6 {0, 1 }: there the boundary condition is #1 (~) = 0. To obtain a boundary
condition on the side where t2 = 1, note that the transversality condition requires
#2(t) = 0 and therefore 0>2/0tl = 0. In combination with the integrability condition
this implies that the solution must satisfy the differential constraint

(t) = 0

on this segment of the boundary. This is a boundary condition of von Neumann form.
The corresponding discrete version imposes the constraint

~l[j,n] = ~l[j,'/'b -- 1].

in principle, a similar condition also applies on the opposite side where t2 = 0.
However, computational experience shows that it is practically impossible to enforce
a differential constraint there with any reasonable mesh size. The reason for the
disparity between the two sides where t2 E {0, 1 } is evident from solved examples:
near the upper boundary the differential constraint is nearly satisfied, whereas near
the lower boundary it is far from satisfied. Another relevant consideration is that on
the lower boundary the differential condition need not apply for small values of t.l
for which W(t) = 0, whereas the monotonicity assumptions imposed on U ensure
that it applies uniformly on the upper boundary. Therefore, we enforce the differential
constraint only on the upper boundary and seek an alternative condition on the lower
To obtain a useful boundary condition on the side where t2 = 0, we integrate the
welfare-optimality condition over the interval from t2 = 0 to t2 = 1 for each fixed
tl. Taking account of the welfare-optimality constraint yields

1 l 0#l
{b(~l) ~ fo ~(t)
z-01"2d~;2 fo ( ~-1 (~3)@ f(~;)) d~2"
Further, the transversality condition requires /t 2 = 0 at both extremes, so we obtain
the integral constraint that requires

= o
Ch. 5: Nonlinear Pricing and Mechanism Design 273

at each t 1 > 0. The corresponding discrete form of the integral value is

¢[Jl = - [J + 1,k] - k]) + I(j , (10)

In the algorithm, Newton's method is used to improve iteratively the estimates of the
r~ - 1 values (#I [3, 0] }0<j<n on the lower boundary until the corresponding integral
conditions q~[j] = 0 are satisfied approximately.

2.4.1. Summary of the Newton-relaxation algorithm

These ingredients provide the following summary statement of the relaxation algo-
rithm combined with Newton's method. We are given a symmetric stencil specifying
a discrete recursion amenable to implementation as a relaxation algorithm. Typically
this recursion is obtained by summation of the recursions derived from rotations of a
basic asymmetric stencil. For the standard test problem, this equation is linear and in-
volves only the one multiplier I~1, but more generally it may be nonlinear and involve
all the multipliers, as we illustrate later.
The only impediment to straightforward calculation of a solution by successive
relaxation, therefore, is the unusual set of boundary conditions. Typically these are
Dirichlet conditions on a portion of the boundary (where tl c {0, 1 }), but on other
portions (where t2 ¢ {0, 1 }) they depend on the other multiplier. To eliminate this
dependence, they are replaced by a differential condition (where L2 = 1) and an inte-
gral condition (where t2 = 0). On these portions the boundary values are estimated by
iterative improvement from an initial guess; in particular, Newton's method is used
to improve the approximation of the integral condition.
We use the standard test problem to illustrate. The algorithm starts with an initial
approximation of the boundary values #l (t) for t 2 C { 0 , 1 } and tl > 0. This approxi-
mation is then improved on the upper boundary by repeatedly applying the differential
condition; and on the lower boundary, by repeatedly applying Newton's method to the
integral constraints. Between these improvements of the boundary values, the values
at interior grid points are obtained by successive relaxation; that is, the symmetric
recursion is applied repeatedly.
To initiate the algorithm, we specify an initial guess #~ [j, k] of the values of #1 (t)
at all grid points (j6, kc~) c T of the grid, requiring only that #]'[0, k] -- 0 on the left
side and #~ [r~, k] = 0 on the right side of the boundary, for 0 ~< k ~< 'r~,where the mesh
size is d = 1/r~. The subsequent steps of the algorithm alternate between two phases.

Phase 1~ Construction of the interior values, in this phase the boundary values
remain fixed while the symmetric recursion is applied repeatedly. Thus, using an
274 R. Wilson

explicit form of relaxation, the new values in iteration r + 1 are obtained from the
previous values by the recursion
.[+1 [j, k] = I (#].[j + l, k] + #]'[j,k + 1] + #~[j - l,k]
r •
1]), (11)
for 0 < j, k < n, as derived previously for the symmetric five-point stencil, is These
iterations are repeated until a test of convergence is satisfied.

Phase 2. Improvement of the boundary values. To improve the estimates of the

values on the upper and lower boundaries, we proceed as follows.
• On the upper boundary where t2 = 1:

>~+l[j, n] = #]'[j, n - 1], (12)

which enforces the differential constraint 0 # l / 0 t 2 0 there.

- - -

• On the lower boundary where t2 = 0, Newton's method is employed to improve

the boundary values to meet the integral constraint:

0] = 0] - oa • ¢', (13)

where #~[.,0] indicates the vector (>]'[j, 0])o<j<n and similarly ¢~[j] is the
current value of the j t h integral in iteration r. Also, 0 C (0, 1) is a parameter
fixing the step size and J is an approximation of the Jacobian matrix of ¢ with
respect to #l [', 0].
After a single iteration of Phase 2, one returns to Phase 1 to adjust the interior values
to conform to the revised boundary values.
Computational experience shows that in practice it is sufficient to use a coarse ap-
proximation of the Jacobian matrix. A typical column of this matrix can be constructed
by calculating the difference between the integral values obtained from Phase 1 and
the integral values obtained when o n e boundary value (say #1 [n/2, 0]) is perturbed.
Figure 5.2 shows this difference for the standard test problem. It is evident in the
figure that the main effects are a positive increment in its own integral value and a
negative increment of equal magnitude in the adjacent integral value. Consequently,
it suffices to use a scaled version of the Jacobian matrix for which the diagonal ele-
ments are 1, the next-higher diagonal elements are - 1 , and other elements are zero.
This implies that the inverse of the scaled Jacobian, j - l , has elements that are 1
on and above the diagonal, and zero below. This approximation works well even for
problems that differ substantially fi-om the standard test problem, including the case
(8) in which Uq is linear. A typical value of the step size that works well is 0 = 0.2.
18This recursion can be improved in accuracy and/or convergence rate by using Milne's nine-point
stencil and/or an implicit form, possibly augmented by over relaxation.
Ch. 5: Nonlinear Pricing and Mechanism Design 275
Chanoe in the InteoralValueatt~


Figure 5.2. The changes in the integral values resulting from a perturbation of/~1(t) on the lower boundary
where tl = 1/2 and t2 = 0. These changes indicate that the Jacobian can be approximated by a matrix
with diagonal and super-diagonal elements that are 1 and -1 respectively, with other elements set to zero.

2.4.2. The program nra2L and a numerical example

Appendix A provides a pseudo-code outline of the algorithm for the standard test
problem based on relaxation combined with Newton's method. Appendix B in-
cludes the A P L program nra2L that implements this algorithm for the more general
two-dimensional linear version (8) in which the matrix C conveys the relevant infor-
mation about the two coefficient matrices A and B. This program assumes that the
matrices A and B are symmetric with diagonal elements 1 and off-diagonal elements
a and b. Using this algorithm for the case a = b = 1/2 and grid size 3 = 1/40, one
obtains the approximation of the multiplier/~l (t) shown in Fig. 5.3 for several values
of t2.
For small values of t2 the multiplier is large and declines steeply as tl increases.
For larger values of t2 the multiplier is smaller, and first increases before declining
as tt increases. The feature that ~/zl(t)/~t2 ~ 0 when t2 ~ 1 is evident in the
figure: for large values of t2 the curves are close together, and indeed the curve for
t2 = 0.9 is virtually indistinguishable from the curve along the upper boundary where
L2 = 1.
Because the optimal bundle requires ql (1~) = max{O, El --/~1 (~)} the customer's
purchase of commodity 1 is zero where # l ( t ) /> tl. In the figure, therefore, only
the region where ]zI (t.) ~ E l is relevant to the calculation of the optimal assignment
and the marginal prices. This eases somewhat the error produced by not enforcing
the differential condition on the lower boundary where t2 = O; however, one must
expect that the effects of this error propagate to some extent throughout the type
276 R. Wilson





0 0.2 0.4 0.6 0.8 1
Type parameter ta

Figure 5.3. The multiplier /~1 (t) for several values of t2 using the version (8) in which the diagonal and
off-diagonal elements of A, B, and G' are 1 and I/2 respectively. The data for the figure were computed
using the relaxation algorithm combined with Newton's method, and the grid size was 6 =- 0.025.

Marginal Price of Product 1

0.6 ~- . . . . . . . . . .

\ Quantity of Product 2

- ~ 0.3
0.4 --_ 0.4

0.2 ...... ---- ~-~

0 0.2 0.4 0.6 0.8 1

Quantity of Product 1
Figure 5.4. The marginal price Pl (q) of product 1 derived from Fig. 5.3 using polynomial interpolation.
Ch. 5: Nonlinear Pricing and Mechanism Design 277

Figure 5.4 shows the resulting schedule Pl (q) of marginal prices obtained via poly-
nomial interpolation. As with the multiplier, the marginal price Pl (q) first increases
and then decreases as ql increases. The resulting bundles have the property that each
interior type purchases either both products or neither, which Armstrong (1992, 1993)
has shown to be true for a wider class of cases.

2.5. The pure relaxation algorithm

The pure relaxation algorithm takes a bolder approach, relying on the presumption that
formulation of the integrability condition in terms of backward derivatives suffices.
It enables a much simpler algorithm, but caution is advised because it has not been
tested on enough examples to ensure that this presumption is always justified. As
before, we illustrate using the standard test problem.
Using backward derivatives for the integrability condition, the discrete formulation
of the welfare-optimality and integrability conditions specifies the two equations
(#, [j + 1, k] - #1 [j, k]) + (/-'2[j, k + l] - #2[j, k]) + f ( j 6 , k6) = O,
(.1 [j, k - 1] - .1 [j, k]) - (#a[J - 1, k] #2[j, k]) = O,
which yields the recursion
. l [j, k] = ' (f(j6, k~) + #l[j + 1,k] + F'l[J,k - 1] + #2[j, k + 1]
- #2[j - 1,hi), (14)
and similarly for P2[j, k], based on the symmetric five-point stencil. Phase 1 remains
unchanged except that this recursion is used for successive relaxation of both multipli-
ers at the interior grid points. In Phase 2 the differential condition remains unchanged
too: as before, it is derived solely from the integrability condition and the correspond-
ing transversality condition on each of the two upper boundaries. The significant dif-
ference is that now the boundary condition on each of the two lower boundaries can
be derived from the welfare-optimality condition and the transversality condition. For
instance, on the lower boundary where t2 = 0 the boundary condition for/~I (tl, 0) is

Or2 (tl,0) + f ( t l , O ) = O,

which is just the welfare-optimality condition when the transversality condition

>2(tl, 0) = 0 is invoked to get 0#2/0tl (tl, 0) = 0. As mentioned previously, how-
ever, when tl is small this last equality is difficult to enforce with any reasonable grid
size; consequently, in practice it is better to use the full form of the corresponding
discrete formulation:

#1 [3 -- l, 0] = #l [j, 0] + #2[J, 1] + f ( j t , 0a), (15)

278 R. Wilson

which can be solved recursively starting from/*1 [n, 0] = 0, as required by the transver-
sality condition on the boundary where tl = 1.
This algorithm is summarized in pseudo-code in Appendix A for the standard test
problem, and it is implemented in the APL program pra2L included in Appendix B.
The APL code is written for the general linear form (8) and the type density f is
arbitrary; also, it allows that the grid size can differ along the two dimensions of the
type domain.

2.6. Other boundary shapes

As one can see fi'om the contrast between Example 1 and the standard test problem,
which differ only in the shape of the upper boundary, the geometry of the type
domain is a critical determinant of the multiplier. When the domain is not square the
transversality condition involves both multipliers along each boundary segment that
is not horizontal or vertical. Such cases require minor modifications of the differential
condition and the integral condition.
We illustrate with Example t. The upper boundary is a segment of a circle and
therefore the transversality condition requires that

tl/Zl (~;) -~ t2#2(t) = 0

along this segment. To obtain the integral condition, therefore, one substitutes the
integral formula for #2(t) derived from the welfare-optimality condition to obtain:

tl/zl(t) -- t2 /0t2( f ( t l , 0#1,~ r)) d~-= 0 .

r) + -~l kcl,

The corresponding discrete version is

j # l [ j , k ] - k~_~ ( f ( j 5 , s5) + #1[j,+ 1,s] - #l[j, s]) = O,


where tl = j3 and t2 = k~ on the discrete upper boundary that approximates the

actual boundary - or one can use an appropriate interpolation for points on the actual
upper boundary.
Similarly, to identify the differential condition for the multiplier #1 along this seg-
ment, one substitutes the integrability condition 0/Zl/()~ 2 = O]~2/0tl into the derivative
of the transversality condition with respect to /'1 to obtain the formula

//,1(/;) q - t l ' ~ l (~) q-~; 2 (~;) ~-=0.

Ch. 5: Nonlinear Pricing and Mechanism Design 279

The corresponding discrete version is

#~+l[J' h] - 1 + j + k
1 (j#~[j - l, k] + k#~[j, k - 1])

2.7. Higher dimensions

Phase 1 presents no intrinsic difficulties in higher dimensions. For instance, for the
analog of the standard test problem when the dimension m is arbitrary, the equation
for #1 is the general Laplacian

02#1 (t~
i=1 - ' - ~ 2 . ', / = 0 .

The analog of the basic recursion used in the relaxation algorithm is therefore the
symmetric stencil with 2 m + 1 points that constructs #1 (t) at the grid point t as
the average of the 2 m values at the adjacent grid points in each of the positive and
negative orthogonal directions. In general, as in this example, the stencil is obtained by
summing over the asymmetric stencils obtained from the possible orthogonal rotations
of a basic stencil, but if an explicit second-order equation can be derived for each
multiplier separately then one can use its natural stencil directly) 9
In Phase 2, the differential condition is essentially unchanged. The integral condition
is complicated partly by the large number of integral constraints. For instance, if the
type space is the unit hypercube then the boundary condition #1 (t) = 0 applies on two
faces and the differential constraints apply on m - 1 additional faces, so there remain
m - 1 faces where [ m - 1In m - I boundary values must be determined from integral
constraints. For the standard test problem, the analog of the simple approximation of
the Jacobian used for two-dimension problems works equally well in three dimensions.
The main complication, however, is the fact that each integral constraint depends
on m - 1 of the multipliers. For the standard test problem, at one of these values on
the face where tl = 0, the integral constraint is

-- f(t) + ~ -~(t) dr1 = O,

which involves more than one of the multipliers if m > 2. Consequently, it is appar-
ently necessary to solve for all the multipliers simultaneously. In the next paragraphs,

19Standard software is available lbr solving fairly general elliptic equations with Dirichlet boundary
conditions of considerable complexity, One source of such software written in Fortran is NetLib at Oak
Ridge National Laboratory: the lnternet address is NetLib@ornl.gov.
280 R. Wilson


pl (q)


Figure 5.5. The marginal price Pl (q) of product 1 for the standard test problem, calculated via Delaunay

such a scheme is outlined for the general case, including nonlinear versions of the
integrability constraint; the comments regarding Phase 2 in that case apply here as
Appendix B includes the APL computer program nra3D that implements this al-
gorithm for the three-dimensional version of the standard test problem. The approxi-
mation of the multiplier #1 (t) obtained from this program is shown in Fig. 5.6 for all
values of tl and several values of the other two type parameters. Note that the two
values of t3 yield almost the same cur*e for t2 = 1.

2.8. Nonlinear equations

No examples for which the integrability condition is nonlinear have been solved,
but we hazard a guess about the modifications required. Phase 1 in such cases
evidently requires one to solve the bundle-optimality condition (4), the welfare-
optimality condition (5), and the integrability condition (7), jointly for the 2m values
(qi(t), #i(t))i=l ..... .~ at each grid point t. This involves 2m equations, of which all
but (5) is typically nonlinear; consequently, one of the many algorithms for solving
nonlinear equations must be used.
Ch. 5: Nonlinear Pricing and Mechanism Design 281

Based on experience from two-dimensional examples, the key modification required

for convergence and stability is that the m equations derived from (5) and (7) should
be formulated as the sum of the 2 m sets of m equations derived from each of the
m + 1-point stencils obtained from rotations of a basic stencil. With this modification,
Phase 1 is again a relaxation algorithm, albeit a nonlinear one, for all m multipliers
simultaneously, and incidentally the m quantities, at each grid point. 2°
In Phase 2, the differential condition derived from the joint application of the
transversality condition and the integrability condition is also nonlinear, so again
a nonlinear equation must be solved. The integral condition requires no significant
modification since it involves only the welfare-optimality condition, which is linear.
However, it should be noted that, because a solution is sought for all multipliers
simultaneously, at each point on each segment of the boundary one of" the boundary
values must be determined by either a differential and or an integral condition. As
mentioned above, each integral condition requires summing over values of the discrete
approximations of the derivatives of m - l of the multipliers.
These considerations about the Newton-relaxation algorithm are considerably sim-
plified if one uses the higher-dimensional version of the pure relaxation algorithm,
because the role of Newton's method is replaced by using the welfare-optimality
condition to specify the boundary conditions on the lower boundaries. Lacking com-
putational experience with this algorithm in more than two dimensions, however, we
are reluctant to venture a guess about its implementation.

2.9. Construction of the price schedules and the tariff

The end product of the algorithm is an approximation of the vector multiplier #(t) at
each point t in a discrete grid T used to approximate the domain of types. Using this
multiplier, one obtains the optimal assignment of the bundle q(t) to type t from the
ordinary Eq. (4). If this equation is nonlinear then one can use any standard method
for solving nonlinear equations, such as Newton's method, or one of the standard
software packages for nonlinear o~imization of type t's virtual benefit. Where the
bundle q(t) is nonzero, say for t E T+, the vector of marginal prices is required to be

p(q(t)) = t),

at least for those components for which qi(t) > 0, and the total tariff is

fi(t) =-- P(q(t)) = U(q(t), t) - W ( t ) ,

where W ( t ) = 0 and P ( t ) = 0 on the lower region of T+ where q(t) = O.

2°Some discussion and results about methods and conditions for convergenceand stability of nonlinear
systems are included in Golub and Ortega (1992, §5.3).
282 R. Wilson


t2= 0.25 t 3 = 1.0

0.5 f ~ t 3 = 0.5

/ \
/ \

/ 0.25 \
• /0.5o ~ \
/ /

0.2 /0.75 ~ \


0 0.2 0.4 0.6 0.8 1
Type parameter t 1
Figure 5.6. The three-dimensional standard test problem's multiplier ~1 (t) for several values of t2 and t3,
computed using the relaxation algorithm combined with Newton's method, using the grid size 6 = 0.025.
Note that when ta = 1 the two values of t3 yield nearly the same curve.

The next step is to construct the actual vector p(q) of marginal prices and then
the total tariff P(q). This can be done by Delaunay triangulation, which for the two-
dimensional case is implemented in Mathematica [Wolfram (1991), Boylan (1991)].
From the list (ql (t), qz(t),/3t (t)) for t E T+, for instance, this technique produces a
piecewise-linear approximation of Pl (q) as a function of the vector q = (ql, q2) E Q;
and P2(q) can be approximated similarly: Figure 5.5 shows the Delaunay triangulation
of the marginal price schedule Pl (q) for the standard test problem. Alternatively, the
isoquants of the marginal price schedules can be calculated by polynomial interpola-
tion, as in Fig. 5.4.
This completes the construction of the solution to the incomplete problem in Mir-
rlees' formulation. In principle, however, a final step is required. One must still verify
that the solution of the incomplete problem solves the complete problem. This in-
volves two checks, one to verify that each type's bundle is globally optimal given the
tariff, and another to verify that the tariff is a global solution of the seller's problem.
Although sufficient conditions (and the ironing procedure for modifying the solution
of the incomplete problem) ensuring these global properties are well-known for the
one-dimensional case, comparable results for multidimensional cases are known only
Ch. 5: Nonlinear Pricing and Mechanism Design 283

for formulations having the increasing-differences and super-modularity properties

studied by Milgrom and Shannon (1994), or the generalized single-crossing property
studied by McAfee and McMillan (1988).

2.10. An alternative version

An alternative version of the algorithm is described by Wilson (1993, §13.6). Although

it can be adapted to more general problems, we sketch it only for the standard test
The idea is to reinterpret the second-order equation for ILl as a first-order equation
for the two functions a(t) =-- -0jZl(t)/O~; 1 and b(t) ---- -O>l(t)/Ot2, linked by the
integrability condition Oa(t)/Ot2 = Ob(t)/Otl. The boundary conditions imply that
b(t) = 0 on the left and right boundaries where tl C {0, 1 }, and f2 a(t) dtt = 0 for all
values of t2. In fact, one can show that if the latter condition is satisfied on the upper
boundary where t2 = 1 then it is satisfied everywhere, and therefore the boundary
condition for b on the left boundary is extraneous. In a discrete version, therefore, one
implements the algorithm by initially guessing values of a on the upper boundary that
sum to zero, and specifying that the values of b on the right boundary are all zero. From
these values one can then use the discrete approximations of the first-order equation
and the integrabitity condition to calculate the values of a and b at all grid points from
the three-point stencil rotated 180 °, proceeding from upper-left to lower-right along
successively lower diagonals. The aim, therefore, is to find values of a on the upper
boundary that ensure that the integral condition is satisfied, expressed here in the form

~0 1[f(t) - a(t)] dr2 = 0

for each value of t 1 > 0, or the corresponding summation in discrete form. One can
therefore use Newton's method (as above) to improve iteratively the estimates of a
on the upper boundary.
One can do better than this, however, when the first-order equation is lineal- with
constant coefficients. The solutions for a and b throughout the grid can be expressed in
terms of the coefficients of the values of a from the upper boundary. Consequently, the
integral conditions provide a set of linear equations, which fortunately is nonsingular.
Solving these equations yields the required values of a on the upper boundary, and then
the values of a and b throughout the grid can be obtained from the coefficients. Finally,
one obtains the multiplier from the discrete version of the formula #l (~) = .f,~, a(]~) d~l.
The solution of the standard test problem obtained from this algorithm differs only
slightly from the solutions produced by nonlinear optimization, Fourier approximation,
and the relaxation algorithm.
Appendix B includes the APL program aalg2D that implements this algorithm for
the standard test problem.
284 R. Wilson

3. The complete problem

The chief indication about the relationship between the solutions of the incomplete
and complete problems is the example solved by Rochet (1994), which we summarize
here briefly.
Rochet considers the variant of the standard test problem in which the types are
assumed to be distributed uniformly on a translate of the unit square, say e ~< t~ ~< 1 + e
for i = 1 , 2 ; the customers' benefit function is bilinear, say U(q,t) = t . q ; and
the seller's cost is quadratic, say C(q) = ½qT . q. This switching of the cost term
from the buyer to the seller leaves unchanged the stated optimality conditions for
the incomplete problem except for the different interpretation of the participation
constraint. The type-linearity of the benefit function implies that the complete problem
differs from the incomplete problem only by imposing the second-order constraint that
the net benefit function W must be convex, as shown by Rochet (1987). When c = 0
this additional constraint is satisfied automatically and therefore the solution of the
incomplete problem (depicted in Fig. 5.5) solves the complete problem. However,
this is false if e > 0: if W is convex then it is continuous and differentiable almost
everywhere, and therefore Wt(t) = 0 for t E To - {t ] W ( t ) = 0}, which contradicts
the implication of the optimality conditions that

at t = (0, t2) on the upper boundary of To. Rochet shows that in fact the complete
solution introduces a third domain between the previous two, so that:

• W(t)=Oiftl+t2~</3.
• W(t) = w ( t l + t 2 ) i f / 3 < t l + t 2 < 7 .
• W ( t ) is the solution of the incomplete problem on the restricted domain 7 ~<
tl + t2, subject to the condition W ( t ) = w ( 7 ) on its lower boundary.
Here,/3 -- [ 4 e + 4x/~fi+ 6]/3, "y ~- [6e+ v/6]/3, and

w(z) -- 3z 2 -
~ s~z . le21n(z
. . . 2e) ~,

where ~ is chosen to ensure w(/3) = 0. On the middle domain, W ( t ) increases

smoothly from 0 to w(7), and its gradient from (0, 0) to (e, e), as tl + t2 increases.
Rochet uses a remarkable technique called 'measure sweeping' to establish this result.
The gist of Rochet's construction is the demonstration that, by excluding disconti-
nuities of the net benefit function and its gradient, the convexity constraint linearizes
the net benefit near To by inducing 'bunching' of customers. In this example, the
middle domain displays bunching in the sense that all types on a locus tl + t2 = s
purchase the same bundle q with ql = q2 so that their net benefit depends only on the
Ch. 5: NonlinearPricingand MechanismDesign 285

sum s of their type parameters. This collapse of the dimensionality of the commodity
space (in this case to a single dimension - solvable by unidimensional methods) near
To to preserve continuity may be a general feature of those cases where the complete
problem differs from the incomplete problem. Presently, however, no general analysis
of the complete problem is known. Evidently, the methods described in this chapter
apply directly only to those cases where full dimensionality is preserved, or more
generally, to those regions of the type space where full dimensionality persists.

4. A mechanism design formulation

Nonlinear pricing is only one application from the larger class addressed by the theory
of mechanism design. This section describes the formulation of a simple problem of
mechanism design to illustrate the more general contexts in which the basic com-
putational problem arises. This formulation addresses a problem of social choice by
a group of individuals with private information about their preferences. To simplify,
wealth effects and risk aversion are excluded, and consequently no allowance is made
for risk sharing or incentive problems stemming from moral hazard; also, correlations
among individuals' information is excluded. Wilson (1993b) provides a more general
lbrmulation that includes these aspects.
The decision problem is to choose one anaong several options indexed by d E D,
together with a system of transfer payments among the members of the group. The
members are indexed by j c J and a member j obtains the gross benefit Uja(tj)
(relative to a null option) if the option chosen is d, where tj is a vector of type
parameters privately known by member j21 A mechanism allows each member j
to submit a reported type t j , and then based on all reports t ~ ( t j ) j c J the option
d is chosen with probability z a ( t ) ; in addition, to provide incentives for truthful
revelation, j receives a transfer payment sj(~). Thus, on the assumption that other
members report truthfully, member j ' s net benefit is

Wj(~J) -- maxE {~d Xd(~'~'J)Ujd(~J) + t tJ } '

where (t,~j) indicates the combination of truthful reports from other members to-
gether with the stated report t j from member j . The operator E {- I ~.J} indicates the

2t Nonlinear pricing is the special case in which there are two members: a seller whose type is fixed
and who receives the entire welfare weight, and a representative customer whose type is uncertain and who
receives zero welfare weight. A principal-agentproblem in which the agent takes an observable action based
on private information is analogous [cf. Hurwicz and Shapiro (1978)]. If the agent's action is unobservable,
then transfers can depend only on observables; in this case the gross benefit function must be interpreted
as the indirect utility obtained after the agent's optimization of its private action. Explicit formulations arc
provided in Guesnerie and Laffont (1984), Prescott and Townsend (1984) and Wilson (1993a, Chapter 15;
1993b, Section 3).
286 R. Wilson

conditional expectation of other members' types conditional on j ' s type; however,

members' types are assumed here to be independent, so the distribution of others'
types is unaffected by j ' s type.
The mechanism-design problem is to maximize a social welfare function, say

where each ),j represents a nonnegative welfare weight, subject to incentive and
feasibility constraints. 22 The incentive constraints parallel the nonlinear pricing for-
® Incentive-compatibility constraints:

where on each side the prime indicates the gradient with respect to j's vector of
type parameters. As with nonlinear pricing, this condition expresses the envelope
condition implied by the requirement that truthful reporting is optimal.
• Participation constraints:

wj(tj) w;(tj).
This supposes that member j can obtain the net benefit W ] ( t j ) by rejecting
participation; hereafter, assume Wj* (t j) = 0.
The feasibility constraints include the obvious requirements on the choice probabilities
that Zd(t) >~ 0 and ~ a Zd(t) ~< 1, where the residual probability is assigned to the
null option d = o that preserves the status quo and yields a gross benefit of zero
to each member. In addition, one requkes that the transfer payments do not depend
on an infusion of funds: ~ j sj(t) ~< 0. Note that this formulation supposes that the
m e m b e r s ' allocation of the cost of an option is included in its description.
To see the connection to nonlinear pricing, suppose that the members are a seller
and a buyer, and each option corresponds to the transfer of a bundle q from the seller
to the buyer; also, the buyer's expected payment is the money transfer to the seller.
The buyer's incentive-compatibility constraint has the general form

w(t) - w(i) -

22This objective invokes the criterion of ex ante incentive efficiency. The weaker criterion of interim
incentive efficiency allows type-contingent welfare weights.
Ch. 5: Nonlinear Pricing and Mechanism Design 287

for every two types t and {. When the decision rule is nonstochastic, this condition
yields the discrete-types formulation at the end of Section l, and when the envelope
condition suffices to represent the buyer's first-order optimality condition, it yields
the incentive-compatibility condition (2).
The analysis of this problem parallels the analysis of nonlinear pricing. In particu-
lar, the 'incomplete problem' ignores the auxiliary second-order conditions (such as
convexity of Wj if Ujd is convex). The Lagrangian form of the objective assigns
a multiplier #j (tj) for each incentive compatibility constraint, and again the Diver-
gence Theorem is applied to convert the objective to a form that allows maximization
pointwise. One can also take advantage of the fact that the multiplier on the transfer
constraint must have the form 7r~Ij fj(tj) where fj@j) is the marginal density of tj
and 7r > 0. This yields necessary conditions for optimality in the incomplete problem
that parallel (4)-(7).
• Allocative optimality: after the report t, positive probability is assigned only to
those options that maximize ~ j (Yjd(tj), where j's virtual benefit is

(:jd(tj) ujd(tj) - u; (tj) . ,j(tj)/ fj(tj).

* Welfare optimality:
- - v. m(tj) 0, (16)

and this inequality is complementary to the participation constraint Wj(tj) >~0

on the interior of j's type domain. Recall that V • #j is the divergence of/zj as
in (5).
• Welfare optimality on the boundary:
uj(tj) . #j(tj) <~O, (17)

and this inequality is complementary to the participation constraint Wj(tj) >~0

on the boundary. As in (6), vj (tj) is the outward-pointing normal vector at a
point tj on the boundary OTj of j's type domain Tj.
• Integrability: the vector field

E {~d Xd(t)U~d(~j) I tj } isintegrable.

These conditions indicate that the social choice is determined from the allocative
optimality condition after the welfare optimality conditions (16) and (17) are solved,
subject to the integrability condition. The welfare optimality conditions, one for each
member, are exact analogs of the conditions (5) and (6) for nonlinear pricing.
288 R. Wilson

However, among the members these conditions are linked by two features. One
is the integrability condition, which depends on the solutions of all the members'
welfare optimality conditions. The second is the fact that the scalar multiplier 7r is
determined to ensure feasibility of the transfers; the method is described in Wilson
(1993b; Eq. (7) on p. 138). Due to these linkages, no multidimensional examples with
significant complexity have been solved. However, some examples where symmetry
can be exploited are solved in Wilson (1993b; p. 142 ft.).

5. Summary and conclusions

Nonlinear pricing is one instance of a wide variety of problems derived from the
principal-agent paradigm and its extensions to mechanism design. The key ingredient
of the standard formulation is the representation of an agent's private information as a
point in a Euclidean space. Characterization of a solution relies on the necessary con-
ditions derived from the seller's incomplete problem. This approach is fruitful when
the type space is unidimensional because the Lagrange multiplier on the incentive-
compatibility constraint can be obtained in closed form. When the type space is mul-
tidimensional, however, construction of the multipliers presents a classical problem
involving first-order partial differential equations, complicated by awkward boundary
conditions. The differential equations, moreover, involve both the simple linear form
in the welfare-optimality condition (5) and possibly nonlinear forms derived from the
integrability condition (7).
Computational procedures naturally divide into two phases. In Phase 1 the differ-
ential equations are solved with fixed (Dirichlet) boundary values. Straightforward
approaches to solving these equations based on discrete approximations encounter in-
consistencies. The explanation seems to be that the integrability condition is properly
formulated in terms of backward derivatives and the welfare-optimality condition, in
terms of forward derivatives. The pure relaxation algorithm relies on this asymme-
try to form the recursion from a symmetric stencil for successive relaxation on the
interior. The Newton-relaxation algori~m enforces a symmetric setup using only for-
ward derivatives by including all rotations of a basic stencil to form the recursion for
successive relaxation. In some examples, inclusion of the rotated stencils is equiv-
alent to solving the second-order equation derived from the welfare-optimality and
integrability conditions.
In Phase 2 the boundary values are adjusted to improve the approximation of the
boundary conditions imposed by the transversality condition. For both algorithms, the
differential condition derived from the integrability condition provides an adequate
yon Neumann condition on the upper boundary. In the pure relaxation algorithm,
the conditions on the lower boundaries can be derived from the welfare-optimality
condition. This yields an especially simple scheme for calculations. In the more cau-
tious Newton-relaxation algorithm, however, these boundary conditions are specified
Ch. 5." Nonlinear Pricing and Mechanism Design 289

by the integral constraints derived by integrating the welfare-optimality condition,

and Newton's method is used to obtain successive improvements. Fortunately, simple
approximations of the Jacobian suffice for Newton's method in the limited class of
examples that have been studied.
Limited experience with an alternative algorithm indicates that it may also be
feasible to use an m + 1-point stencil if the functions are interpreted as the gradients
of the multipliers, but this algorithm has been implemented only for the standard test
The examples that have been solved indicate that these difficult computations are
worthwhile. In particular, the qualitative properties of the tariff and its marginal prices
bear little relation to those predicted from studies of the one-dimensional case. For
example, in the standard test problem the multipliers are not monotone functions
of the types, and the marginal prices of commodities are not monotone functions
of the quantities purchased. Also, each customer purchases either both commodities
or neither, indicating that implicit 'bundling' is a significant feature, as Armstrong
(1992, 1993) has emphasized. Rochet's analysis of the discrete-types case indicates
further that negative correlation among the type parameters can invalidate the usual
conclusion that the last units are supplied to the highest types at marginal cost. In sum,
the chief qualitative features of multi-product pricing and taxation differ substantially
from the single-product case.
Essentially all the computational methods presented here address only the incom-
plete problem that relies on the envelope condition to represent incentive-compatibility
constraints; it therefore ignores customers' second-order conditions and their impli-
cations, such as convexity of the net benefit function. The only known method of
handling these auxiliary constraints is indicated by Rochet's technique of measure
sweeping applied to the example he solved. If one judges from this example, these
constraints can require a collapse of the dimensionality of the assigned bundles in
order to maintain the continuity of the net benefit function and its gradient along
linear segments of the type space. At this writing, no algorithmic procedures have
been developed to implement the insight obtained from Rochet's analysis.
The complexity of the multidimensional nonlinear pricing problem addressed here
suggests that an entirely different formulation might be useful in practice. An alter-
native approach is developed in Wilson (1993; §12, §14) by relying on a formulation
in which it is supposed that the seller has no information about the distribution of
types and about the dependence of customers' preferences on their types. Instead, the
seller knows only the aggregate distribution of bundles purchased in response to lin-
ear tariffs. This formulation precludes an exact analysis of the participation constraint
(3) because the demand data do not distinguish whether a price increase curtails a
customer's demand or extinguishes participation; in particular, pure bundling is gener-
ally not optimal. On the other hand, this formulation allows calculation of a solution
via a simple gradient algorithm, and quite complicated problems can be addressed
290 R. Wilson

Appendix A. Pseudo-codes

This appendix provides pseudo-codes for the two main algorithms in Section 2, applied
to the two-dimensional standard test problem. We allow in the first code that the
density f is not uniform to show its role. Complete implementations in APL code are
displayed in Appendix B. We use here some notation from the APL code to provide
a direct connection between the pseudo-code and the APL code.

A. Relaxation with Newton's method

The following summarizes the logic of the APL program nra2L, whose related line
numbers are shown in square brackets.

0. Initialization [5-7].
a. Input. Obtain a matrix f of the values of a discrete approximation of the density
of types on the unit square.
b. Parameters. Interpret the row and column dimensions of f as specifying the
number of grid points along the two dimensions. Specify a convergence criterion
(e.g., e = 0.001) and a step size (e.g., 0 = 0.2) for Newton's method. Recognize
that for the standard test problem the four relaxation coefficients are all bk = 0.25.
c. Initial values. Specify an initial approximation of #1 that is a matrix x of ze-
ros with the dimensions of f. This specification already satisfies the boundary
condition #l(t) -- 0 for t~ E {0, 1}.
1. Phase 2 [8, 9].
a. E r r o r measurement. At each grid point tl > 0 on the lower boundary where
t2 = 0, compute the integral value d = ~(~) via (10) using the approximation x
for #1. Note that except for the term involving f , d is just the difference between
two adjacent column sums of z.
b. Newton's method. Revise the lower-boundary values of z via (13), using the
stepsize 0 and the approximate inverse of the Jacobian. Because of the simple
form of the Jacobian, this step requires only partial sums of the integral values d.
c. Enforce the differential constraint. Revise the upper-boundary values of z via
(12). This step requires only that the last column of z be set equal to the next-
to-last column.
2. Phase 1 [10-12].
a. Relaxation. Compute new values of #1 at the non-boundary grid points via (11)
using :c. This step consists of computing at each grid point an average of the four
adjacent values of z and, if f is not uniform, the first difference of f along the
first dimension.
Ch. 5: Nonlinear Pricing and Mechanism Design 291

3. Recursion [12, 13].

a. If the integral values d were sufficiently small, and also the differences between
P,t and z were all sufficiently small, terminate with the output #1, consisting of
a matrix of values at the grid points.
b. Otherwise, let z take the values of #1 and return to Phase 2.
As mentioned, this procedure is implemented for the two-dimensional linear version
in the APL program nra2L, whose right argument is the matrix f and left argument
is the reduced form C of the matrix of linear coefficients. The three-dimensional
version of the standard test problem is implemented in the APL program nra3D; its
sole argument is the number N of grid points along each of the three dimensions.

B. Pure relaxation

The following summarizes the logic of the APL program pra2D, whose related line
numbers are shown in square brackets.

O. Initialization [3, 4].

a. Input. Obtain the number n of grid points along each dimension.
b. Parameters. Assign the density f = 1 at each grid point. Specify a convergence
criterion (e.g., c = 0.0001).
c. Initial values. Start with matrices of zeros for the initial approximations of the
values of FI,1 and #2 at the grid points.

1. Phase 1 [5-8].
a. Relaxation. Revise the non-boundary values via (14). The new values at each
grid point are obtained as weighted sums of the density and the values at adjacent
grid points.
b. Recursion. If the differences between the new and old values are not sufficiently
small then repeat the relaxation (a).
2. Phase 2 [9-13].
a. E n f o r c e the differential constraint. Revise the upper boundary values via (12)
for #1 and analogously for #2.
b. E n f o r c e the welfare optimality condition. Revise the lower boundary values
via (15) for #1, computing recursively from j : ~, to j = 1, and analogously for
3. Reeursion [14, 15].
a, If the difference between the new and old boundary values is sufficiently small
then terminate with the outputs #1 and #2, each consisting of a matrix of values
at the grid points.
b. Otherwise, return to Phase 1.
292 R. Wilson

As mentioned, this procedure is implemented in the APL program p r a 2 D for the

standard test problem. The program looks more complicated only because it allows
differing grid sizes along the two dimensions, and it separates the matrices of non-
boundary values (z and y for #I and #2) from the vectors of non-zero boundary
values (a0 and a l for # l , and b0 and bl for #2).
The program p r a 2 L implements the two-dimensional general linear version with an
arbitrary density; its right argument is the matrix of values of f at the grid points and
its left argument is the reduced form C of the matrix of linear coefficients.

Appendix B. APL programs

The A P L programs n r a 2 L and nra3D that implement the relaxation algorithm with
N e w t o n ' s method are shown on the following page. The program n r a 2 L is designed
for two-dimensional problems with linear coefficients as in (8), whereas n r a 3 D is
designed for the three-dimensional version of the standard test problem. Both use the
inverse of the coarse approximation of the Jacobian (described in the text), expressed
in terms of partial sums of the errors in the integral constraints.
On the subsequent page are the A P L programs p r a 2 D and p r a 2 L that implement
the pure relaxation algorithm for the two-dimensional version of the standard test
problem, and the linear-coefficients model as in (8).
Shown last is the program aalg2D that implements the alternative algorithm for the
standard test problem.
The parameter 'r~ in the programs plays the role of r~ + 1 in the text. All these
programs assume that the domain of the type parameters is the unit square or cube.


Arlnstrong, M. (1992) 'Optimal nonlinear pricing by a multiproduct monopolist', Chapter 4 of PhD thesis,
Institute of Economics and Statistics of Oxford University, Oxford, UK.
Armstrong, M. (1993) 'Multiproduct nonlinear pricing', Economic Theory Discussion Paper #185, Cam-
bridge University, UK. :"
Boylan, E (1991) '"Discrete Mathematics", and the function Triangular Surface Plot in the program
Discrete Math. "Computational Geometry" ', Chapter 4 in Guide to standard Mathematica packages,
Technical Report, Wolfram Research.
Derrick, W. and Grossman, S. (1987) A first course in di[]'erential equations with applications. St. Paul:
West Publishing Co.
Golub, G.H. and Ortega, J.M. (1992) Scientific computing and d!ffemntial equations. San Diego, CA:
Academic Press.
Guesnerie, R. and Laffont, J.-J. (1984) 'A complete solution to a class of principal-agent problems with
an application to the control of a self-managed firm', Journal q( Public Economics, 25:329-369.
Hurwicz, L. and Shapiro, L. (1978) 'Incentive structures maximizing residual gain under incomplete intor-
mation', Bell Journal (?[ Economics, 9:180-191.
McAfee, R.R and McMillan, J. (1988) 'Multidimensional incentivc compatibility and mechanism design',
Journal (?/'Economic Theory, 46:335 354.
Ch. 5: Nonlinear Pricing and Mechanism Design 293

Milgrom, ER. and Shannon, C. (1994) 'Monotone comparative statics', Econometrica, 62:157-180.
Milgrom, ER. and Weber, R.J. (1982) 'A theory of auctions and competitive bidding', Econometrica,
Milne, W.E. (1970) Numerical solution of differential equations. New York: Dover.
Mirrlees, J.A. (1971) 'An exploration in the theory of optimal taxation', Review of Economic Studies,
Mirrlees, J.A. (1976) 'Optimal tax theory: A synthesis', Journal of Public Economics, 6:327-358.
Mirrlees, J.A. (1986) 'The theory of optimal taxation', in: K.J. Arrow and M.D. Intriligator, eds, Handbook
of mathematical economics, Vol. Ill. New York: Elsevier, Chapter 24, pp. 1197 1249.
Mussa, M. and Rosen, S. (1978) 'Monopoly and product quality', Journal of Economic Theory, 18:301-317.
Myerson, R.B. (1981) 'Optimal auction design', Mathematics of Operations Research, 6:58-63.
Prescott, E.C. and Townsend, R.M. (1984) 'Pareto optima and competitive equilibria with adverse selection
and moral hazard', Econometrica, 52:2145.
Roberts, K.W.S. (1979) 'Welfare considerations of nonlinear pricing', E~'onomic Jnurnal, 89:66-83.
Rochet, J.-C. (1987) 'A necessary and sufficient condition for rationalizability in a quasi-linear context',
Journal c~fMathematical Economics, 16:191-200.
Rochet, J.-C. (1994) 'Optimal screening of agents with multiple characteristics', Institut d'Economie ln-
dustrielle, Universit6 de Toulouse I, France, mimeo.
Wilson, R. (1993a) Nonlinear pricing. New York: Oxford Univ. Press.
Wilson, R. (1993b) 'Design of efficient trading procedures', in: D. Friedman and J. Rust, eds, The dou-
ble auction market: Institutions, theories, and evidence, Santa Fe Institute Studies in the Sciences of
Complexity, Proceedings Volume XIV, Reading, MA: Addison-Wesley, Chapter 5, pp. 125--152.
Wolfram, S. (1991)Mathematica. Redwood City, CA: Addison-Wesley.
Chapter 6

The University of" Texas


1. Introduction 296
2. Methods 297
2.1. Activity analysis 298
2.2. Location 304
2.3. Economiesof scale 307
3. Software 310
4. Process industries 314
4.1. Small static 314
4.2. Large static 316
4.3. Small dynamic 317
4.4. Examples of sectoral models 318
5. The computer industry 320
6. Energy 322
7. Environment 324
8. Agriculture 326
9. Linkages to computable general equilibrium and growth models 327
10. Limitations 327
11. Conclusions 328
References 328

*1 am indebted to many colleagues for comments on earlier drafts of this report. Any remaining errors
are my responsibility.

Handbook of Computational Economics, Volume 1, Edited by H.M. Amman, D.A. Kendrick and J. Rust
(~) 1996 Elsevier Science B.V All rights reserved.
296 D.A. Kendrick

1. Introduction

Much of economic analysis falls into the two large domains of the microeconomics of
individual firms and the macroeconomics of national economies. In between these two
domains lies the study of individual sectors of the economy such as the automobile
industry or the oil industry. Perhaps this should be called mesoeconomics. 1 This has
traditionally been the area covered by industrial organization. However, the focus of
industrial organization has been narrowly on the regulation and behavior of firms in
monopolistic or competitive environments rather than on the broad set of economic
issues which arise in sectors such as energy, agriculture, environment, education or
Therefore, substantial areas of economics analysis have developed separately as
fields such as energy economics, agricultural economics or the economics of education.
A n important part of the work in these studies of separate sectors of the economy
has e m p l o y e d numerical economic models which are solved on computers. This has
occurred because when one studies a sector of the economy, analytical methods, with
their restriction to models having only a few equations, one or two commodities,
one or two locations, a focus on steady state results and only convex functions,
have proved to be inadequate. Sectoral models frequently address issues involving
many commodities, many locations, substitution and economies of scale and hence
computational methods have a decided advantage over analytical methods in sectoral
economics studies.
Three purposes underlie our discussion of sectoral economics. The first is to discuss
a c o m m o n set of methods which are not currently in wide use in either microeconomics
or macroeconomics but which constitute a core of the methodology for models in many
different sectors of the economy. Some of these methods will be discussed here and
pointers will be given to the location in the literature for discussion of others.
The second purpose is to provide a brief tour of the varieties of sectoral models. This
is a growing and vibrant area of economic research but one that is relatively unknown
to many economists. The aim here is not to provide a comprehensive discussion of
the models for any given sector but to.provide illustrations of the types of models in
use. Neither will a comprehensive bibliography be provided but rather only citations
to illustrative works.
The third purpose is to provide discussion of how some of the methods of sectoral
economics can be used to great advantage in microeconomics or macroeconomics.
One can think of a progression from the microeconomics of a single firm to sectoral
economics with many firms in a single industry to macroeconomics covering the
entire economy. The theme here is that the interfaces between these three fields are
not so much lines as permeable cell walls. The methods which have been developed

IThe nmne mesoeconomics was suggested to me by Bruce Smith. Others have reported that the term
has been used in the World Bank but do not know it provenance.
Ch. 6: Sectoral Economics 297

in sectoral economics offer substantial opportunities for improving some of the ways
we do microeconomics and macroeconomics.
For microeconomics in particular it is the view of this author that we economists
missed an early opportunity to incorporate these tools fully into our tool kit. Tjalling
Koopmans, Robert Dorfman, Paul Samuelson and Robert Solow and many other
economists were contributors to the beginning of mathematical programming. How-
ever, these tools were not widely adopted in standard microeconomics, but rather were
taken up by the fields of Operations Research and Management Science and fashioned
into some of the basic methods which are now used in industry and commerce. This
occurred because of too much emphasis in economics on analytical results and not
enough attention to the rapidly growing power of computational methods. So this
chapter will contain discussion at various points as to how some of the tools of sec-
toral economics might be adopted again in standard microeconomics with substantial
gain to that field.
Progress in sectoral economics depends not only upon the development of new
methods of specifying the economic and mathematical relationships in the models;
rather, it also depends crucially on improvements in the software which is used for
creating and solving the models. Therefore, this chapter includes a section on modeling
languages, expert systems and graphical user interfaces and the role they are playing
in sectoral economics.
The chapter begins with a discussion of a core of methods used in sectoral eco-
nomics. This is followed by a section on software and then by a section which cover
some of the major areas of sectoral modeling, i.e. process industries, the computer
industry, energy, environment and agriculture. Throughout the focus is on optimiza-
tion methods, i.e. mathematical programming and control theory. One could write a
similar chapter which focuses on the estimation and simulation of econometric models
of sectors. For a bibiography that contains both optimization and simulation sectoral
models see Labys (1987). For surveys in this area see Labys, Takayama and Uri

2. Methods

Three basic methods of sectoral economics are discussed in this section. The first
of these is the modeling of production with activity analysis. This draws on the
work of Koopmans (1951) and grew naturally out of the use of linear programming
methods in economics, viz Dorfman, Samuelson and Solow (1958). The methods have
proven to be so useful for modeling diverse production activities and for analyzing
substitution possibilities in the face of changing relative prices that they have remained
in substantial use even as nonlinear programming methods have been widely adopted.
The second set of methods concern the modeling of location. This springs in part
from the early work of Dantzig (1951) and of Koopmans and Reiter (1951) on the
298 D.A. Kendrick

linear programming transportation problem. Raw materials, plants and markets are
located in space and the shipment of materials among these locations is an impor-
tant part of the economics of a number of sectors. Therefore, sectoral models have
made wide use of the linear programming transportation problem and its generaliza-
The third method is the modeling of economies of scale. Much of microeconomic
theory is built on the assumption that production functions exhibit diminishing returns
to scale and are therefore concave. However, this assumption is not valid for many
production processes in modern industrial economies. Rather economies of scale are
widespread in the process industries and in transportation and communication. So
numerical methods were devised for solving models in which production was char-
acterized by economies of scale using non-concave production functions.
As these methods are discussed below both the mathematics and the computational
development will be shown. The computational development for sectoral economic
models is done with a wide variety of software systems. One of the most widely used
of these systems is GAMS, cf. Brooke et al. (1992) so this is the system which will
be used here for illustrative purposes. GAMS is useful for solving algebraic models
including linear, non-linear and mixed-integer mathematical programming problems.
Examples of similar software systems are AMPL by Fourer et al. (1990, 1992, 1994),
AIMMS by Bischopp and Entriken (1993) and Structured Modeling by Geoffrion
(1987, 1992a, 1992b).
As the mathematical and computational representations of various components are
developed, there will be some discussion of "style". This follows the approach of
the famous little book The Elements of Style by William Strunk, Jr. and E.B. White
(1959). This book has been the Bible for generations of American college students as
they learn to express themselves well in writing. It is the author's opinion that style
is as important in mathematical and computational communication as in written com-
munication, cf. Kendrick (1984). That theme is interlaced throughout the discussion
The following sub-sections are written in an introductory fashion. The reader who
is already familiar with these methods may prefer to scan the material on the way to
Section 3 which is devoted to sectoral models of the process industries.

2.1. Activity analysis

Activity analysis is widely used in sectoral studies, but is not commonly taught in
courses on microeconomics. These courses rather focus on neoclassical production
functions like the Cobb-Douglas or constant elasticity of substitution (CES). Therefore
a brief introduction to the subject of activity ~inalysis is provided here.
Consider an environmental study which is concerned with the production of carbon
dioxide in electric power production. Commodities are both inputs and outputs. For
Ch. 6: Secmral Economics 299

example, coal and labor are used to produce electric power. In addition, carbon dioxide
is produced as a by-product. Such a production process can be represented by a vector
in the tradition of Koopmans (1951) as

electric power
coal (tons) gcoal, elec power
labor (man-hours) glabor, elee power
electricity (kwh) 1
carbon dioxide acarbon--dioxide, elee power

This is a production process with two inputs and two outputs. The convention used
is that inputs are negative. The sign of the coal and labor inputs in the vector above
will be negative, while the sign of the electricity and carbon dioxide coefficients will
be positive. Thus acoal, elec power tons of coal and alabor, elec power man-hours of labor
are required to produce one kilowatt hour of electricity and acarbon--dioxide, elecpower of
carbon dioxide. While most neoclassical production functions have a single output, it
is natural in process analysis to have multiple outputs as is the case above. This is one
of the aspects of activity analysis which accounts for its widespread use in sectoral
Substitution is important in most sectoral models. For example one might want
to use an environmental model to analyze the effects of carbon taxes on electricity
production. As the carbon taxes are increased one would expect power producers to
shift from fuels like coal which yield larger quantities of carbon dioxide to those
which yield smaller quantities like fuel oil or natural gas. A model to study this
phenomenon could have three activities as shown below
In Table 6.1 the coefficients in the first four rows will be negative and those in the
last two rows will be positive since coal, fuel oil, natural gas and labor are inputs and
electricity and carbon dioxide are outputs. The three vectors in Table 6.1 represent the
three production processes. In the first vector coal is used to produce a kilowatt hour of
electricity as well a s acarbon--dioxide ' elec with coal of carbon dioxide. In the second vector
afuel oil, elec with fuel oil barrels of oil are used to produce the same amount of electricity,

Table 6.1
Three processes for producing electric power

electric power electric power electric power

with coal with fuel oil with natural gas
coal (tons) acoal, elee with etml
fuel oil (barrels) afuel ~lil, elec with oil
natural gas (mcf) ag~_~,elecwithgas
labor (hours) alabor, elec with coal alabor, elec with fuel oil alabor, elec with g~s
electricity (kwh) t 1 1
carbon dioxide acarbon--dioxide, clec with coal acarben--dioxide, elec with fuel oil acarbon--dioxide, elec with gas
300 D.A. Kendrick

but the carbon dioxide output is the smaller amount acarbon--dioxide ' elec with fuel oil. In the
last vector natural gas is used and the still smaller amount acarbon--dioxide, elec with gas of
carbon dioxide is produced.
Mathematically the relationships above can be stated

Y~ = Z acpZp' c E C, (1)

y,: - output (+) or input ( - ) of commodity c,
acp = input-output coefficient for commodity e in process p,
Zp : level of process p,
P = set of processes
= {elec with coal, dec with fuel oil, dec with gas},
C = set of commodities
= {coal, fuel oil, gas, labor, electricity, carbon dioxide}.
For computational purposes the constraint (1) can be written in GAMS. First the sets
are defined with the statements

C Commodities
/ coal,fuel-oil,gas,labor, elec,car-diox /
P Processes
/ elec-coal, elec-fo, elec-gas /

Many of the computer languages used for sectoral modeling are set-driven in the
sense that sets are defined as here and then parameters, variables and equations are
declared over these sets. This contrasts with most econometric modeling packages
where set operators are not used and each parameter, variable and equation is sepa-
rately defined.
Then the parameters, variables and equations are declared.

A(C,P) Input-Output Coefficients
Y(C) Output or Input Levels
Z(P) Process Levels
ONE(C) Input-Output Equation;

In each case the mathematical symbol is defined over the appropriate set or sets, i.e.
the input-output coefficients are defined over the commodity set C and the process
set P.
Ch. 6." Sectoral Economics 301

Recall that the mathematical statement of Eq. (1) is

y¢ : ~ aepZp, c E C. (1)

With the declarations.above Eq. (1) can be written in the GAMS language as
ONE(C).. Y(C) :E= SUN(P, A(C,P) * Z(P));

There is almost a one-to-one mapping from the mathematical statement to the com-
putational statement of the equation. This is an important property of high-level com-
putational languages like GAMS. Sectoral models may be notationally complex, and
this one-to-one mapping means that the mathematics of the model can be translated
with ease into the computational language. This reduces the number of errors and it
allows for rapid prototyping. For a more complete discussion of the GAMS language
see the chapter by Stavros Zenios in this volume.
The set-driven nature of GAMS makes it necessary to write only one equation
even though there may be hundreds of commodities in the model. Thus set driven
languages are extremely useful in sectoral modeling. There are usually numerous
commodities, processes, production units and locations. It would be a tedious and
error prone activity to define each parameter, variable and equation individually.
Consider next an example of the use of Eq. (1) to calculate the output of carbon
dioxide from a power plant using three alternative fuels. If coal were used to produce
20 kwh, fuel oil were used to produce 30 kwh and natural gas were used to produce
10 kwh, the output of carbon dioxide would be

Ycza'bon dioxide = acarbon dioxide, elec with coal(20) + acarbon dioxide, etec with fuel oi1(30)

+acarbon dioxide, elec with gas(lO)

An environmental model might include a constraint that carbon emissions should be

less than or equal to a certain level, ~. This constraint would be written in the model

Ycarbon dioxide ~ e (2)

or using (1) as

acarboo--dioxide, p Zp ~ e. (3)

This shows how a production function in activity analysis form might be incorporated
into a mathematical programming model of the economy and the environment. In
GAMS this would be represented by defining an additional parameter and inequality
as follows:
302 D.A. Kendrick

EBAR Upper Limit on Carbon Emissions;
THREE(C) Carbon dioxide limit constraint;
THREE(''car-diox'').. SUM(P,A(''car-diox'',P)*Z(P))=L=EBAR;

Equation (3) applies to only carbon dioxide; however, it is first declared over all
commodities and then defined over a single element of the set.
Models with activity analysis also frequently include cascades of production func-
tions in which the output "of one activity becomes the input to another activity. Con-
sider, for example, an oil refinery model with processes for distillation and catalytic
reforming. The vectors for these two processes could be

Table 6.2
Two processes in an oil refinery

distillation catalytic reforming

crude oil (barrels) -1.0
naphtha (barrels) 0.4 - 1.2
gasoline (barrels) 0.6 1.O

For each barrel of crude oil that is input to the distillation process there is production
of 0.4 of a barrel of naphtha and 0.6 of a barrel of gasoline. The naphtha in turn serves
as an input to the catalytic reforming process where 1.2 barrels of naphtha are required
to produce a barrel of gasoline. The constraint for naphtha takes the form

E acpZp >~O, c = naptha, (4)



anaptha, distillation Zrfistillation + anaptha,reforming Zreforming ) O.

In the summation when p = distillation the acv coefficient will be positive to indicate
that naphtha is produced as is shown in Table 6.2. When p = reforming the acp
coefficient will be negative to indicate that naphtha is consumed as is also shown in
Table 6.2. Thus the inequality (4) requires that at least as much naphtha be produced
by the first process as is used by the second process.
For an introductory discussion of the use of activity analysis in sectoral models
see Kendrick and Stoutjesdijk (1978, Chapter 3). Sectoral models frequently contain
dozens of commodities and processes in which commodities are produced in some
processes and used in others. Some examples are the fertilizer industry in Choksi,
Meeraus and Stoutjesdijk (1980), the personal computer industry in Ang (1992), the
Ch. 6: Sectoral Economics 303

Table 6.3
Three processes for producing steam

electric power electric power electric power

with coal with fuel oil with natural gas
coal (tons) acoal, elecwithco~
fuel oil (barrels) aluel oil, dec withfilel oil
natural gas (mcf) agas, elec with gas
stemn (cubic feet) 1 1 1

copper industry in Dammert and Palaniappan (1985) and in Gordon et al. (1987) or
the steel industry in Westphal (1971a, 1971b).
The discussion above introduces the concept of commodities and processes as they
are used in activity analysis models. A third concept that is important is that of a
productive unit. Consider a boiler used to produce steam which turns a turbine to
produce electric power. Three processes for these production activities are shown in
Table 6.3. Either coal, fuel oil or natural gas can be used to produce steam. In this
case the boiler is designed to burn coal, fuel oil or natural gas, so that as the relative
prices of these three fuels change the power company can switch among the fuels.
The boiler is called a productive unit and has a fixed capacity of so many cubic
feet of steam per year. The model then includes an inequality to represent the fact that
the three processes can be used in any mix during the year so long as the total usage
does not exceed the capacity of the productive unit. This is written mathematically as

E bmvzP <" k~, m ¢ M, (5)


km = the capacity of productive unit (machine) m,
b.mp = a coefficient which is one if process p uses productive unit m and is zero
M = the set of productive units.

Then the constraint (5) for the boiler would be written as

Zelec with coal -~- Zelec with fuel oil q- Zelec with gas ~'~ kboiler. (6)

Thus the three processes compete for the use of the capacity of the boiler and there
will be substitution among these processes in response to changes in the prices of
fuels including the level of any carbon taxes.
In summary, activity analysis offers a natural way to model the production side in
sectoral models. In these models commodities are transformed by processes from raw
304 D.A. Kendrick

materials to intermediates and then to final products and the processes compete for
the use of the capacity of the productive units.
However, one must use activity analysis models with care. Because of their linearity
they can result in all-or-nothing solutions in which one process is used 100 percent
and the alternatives are not used at all, in an industry where a mix of processes is
normally in use. For this reason it is important to provide substitute processes and to
include diminishing return specifications as appropriate. For example, of the pitfalls
of using activity analysis without adequate substitution activities see the description
of the results from the small static steel model in Kendrick, Meeraus and Alaton-e
(1984, p. 76). For general piecewise linear representation of nonlinear, but weakly
separable, functions see Charnes and Cooper (1956).

2.2. Location

One of the most important aspects of sectoral economics is space. Traditional spa-
tial economics is divided into regional economics which focuses on regions in a
single country, and international trade, which focuses on nations in a worldwide
economy. Sectoral economics deals with a wide variety of spatial settings which
include both of these traditional areas of study. Sectorat studies may range from a
city or province to a country or even to the globe. For example, transportation sector
models are frequently developed for a single city while global warming environ-
mental models are most commonly national or international. Electric power mod-
els usually cover states or provinces while petrochemical models may cover all the
countries in a region like the European Common Market or may be worldwide in
In sectoral models raw materials, processing plants and markets are usually spatially
separated and transportation costs play a substantial role in the economics of the sector.
Consider the petroleum industry. A petroleum model might begin with crude oil in
the Mid-East. The crude oil is shipped to Rotterdam or Houston to be transformed to
gasoline and marketed in Paris or Dallas, In the automobile industry the various part s
of an automobile might be manufactured in several countries, assembled in another
country and marketed in yet another.
For this reason sectoral models frequently consider location. This type of modeling
originated with the linear programming transportation problem which can be stated
simply as minimize


Ch. 6: Sectoral Economics 305

= total transportation cost,

#~j = unit transportation cost for shipments from plant i to market j,
xij = shipments from plant i to market j,
I = set of plants,
J = set of markets.
In this model there is a set of plants and a set of markets. One seeks to find the
shipments which minimize total transportation cost while providing the requirements
of each market and without violating the capacity constraints of the plants.
The computational representation of the criterion function (7) in GAMS is devel-
oped by first defining the mathematical elements as

I Plants
J Markets
MU(I, J) Transportation Cost Per Unit Shipped
XI Total Transportation Cost (Greek xi)
X(I,J) Number of Units Shipped
CRITERION Objective Function;

Then the objective function can be written as

CRITERION.. XI =E= SUM((I,J), MU(I,J) * x(I,J)) ;

The constraints for the model are

xij <~ hi, i E I, (8)


sum of shipments- [capacity of]

from p l a n t i t o ~ [ planti J'
all markets

i.e. the shipments to all markets from each plant must be less than or equal to the
capacity ki of the plant.
In GAMS this is represented as

K(I) Capacity
CAPACITY(I) Capacity Constraint;
CAPACITY(I).. SUM((J), X(I,J)) =L- K(I);
306 D.A. Kendrick

The second set of constraints for the model is

xij >~dj, j E J, (9)


sum of shipments" [ demand at ]

from all plants >~ [ market j J '
to market j

i.e. the shipments from H1 plants to market j must equal or exceed the demand in
that market. The style of using words in brackets like those under Eqs (8) and (9) is
called the "Alan Manne Notation" after its originator. This style of notation is one of
the methods which are used to make complex models easier to understand.
In GAMS Eq. (9) is represented as
D(J) Demand
DEMAND(J) Demand Constraint;
DEMAND(J).. SUM((I), X(I,J)) =G= m(J);

The complete representation of the simple linear programming transportation model

in the GAMS language is shown below. This model has two plants and three markets.
I Plants
/ CHI C h i c a g o
DAL D a l l a s /
/ CLEV Cleveland
PITTS P i t t s b u r g h
ATL Atlanta /
K(I) Capacity
/ CHI 42
DAL 20 /
m(J) Demand
/ CLEV 22
ATL 15 /
MU(I,J) T r a n s p o r t a t i o n Cost Per U n i t S h i p p e d


CHI 6.6 7.9 10.3
DAL 16.3 16.9 12.2
XI T o t a l T r a n s p o r t a t i o n Cost (Greek xi)
X(I,J) N u m b e r of U n i t s S h i p p e d
Ch. 6: Secmral Economics 307

CRITERION Objective Function
C A P A C I T Y (I) Capacity Constraint
D E M A N D (J) Demand Constraint;
CRITERION.. XI =E= SUM((I,J), MU(I,J) * X(I,J)) ;
CAPACITY(I) . . SUM(J, X(I,J)) =L= K(I) ;
DEMAND(J) . . SUM(I, X(I,J)) :G= D(J) ;

After the demand equations are specified, there is a "MODEL" statement which indi-
cates that all of the three equations are included in the model. The "SOLVE" statement
then determines that the model will be solved with linear programming (LP) meth-
ods and that the criterion value XI will be minimized. Finally, the results of the
optimization are displayed with the "DISPLAY" statement.
This model would look somewhat different in one of the other modeling languages
but the basic elements of sets, parameters, variables, equations, model, solve and dis-
play statements would be present though called by different names and have different
syntax and capabilities.
This simple linear programming transportation model is the basis for much more
elaborate specifications of location. A full model might include mines where raw
materials are extracted, plants where the raw materials are shaped into intermediate
commodities, other plants where the intermediate commodities are fashioned into final
products and markets where the final products are consumed. In all these locations
the simple principle remains that you cannot ship more than you have capacity to
produce and that you must receive as much as is required at each stage of processing
or final use.
The linear programming transportation problem above focuses on operations and
not on investment. In dynamic variants of the transportation problem there is capacity
for each productive unit in each plant in each time period. Also, there are investment
activities which can be used to add to capacity. This is the problem which is the focus
of much of sectoral economics. Given an existing set of plants, which productive units
should be expanded in which plants in which years and by how much? Also, where
should new plants be built, and which existing plants should be shut down?
W.W. Cooper has pointed out to me that it is useful to distinguish location models
from transportation models, viz. Koopmans and Beckman (1957). See also Thore
(1991) and Thompson and Thore (t992).
The economics of" investment location highlights another important attribute of
many sectoral models: namely, economies of scale.

2.3. Economies of scale

Consider a country with markets in a number of large cities. If economies of scale in

investment are strong and transportation costs are small, then a single large factory
308 D.A. Kendrick

Cost t

Figure 6.1. The investment cost function.

could most efficiently serve the country. If, on the other hand, transportation costs
are large and/or economies of scale in investment are weak, then many small plants
located near the markets could most efficiently serve the country.
This scene is replayed in many sectoral models, though the stage may not be a
single country, but rather a region within a country, a collection of countries in a
c o m m o n market or even the entire world economy. For the electric power industry
one may be interested in a region within a country like the southern part of India.
For the steel industry one may be concerned with a collection of countries like the
European Common Market. For the microprocessor industry, the relevant economic
playing field might be the whole world.
When there are economies of scale the investment cost function might look like
Fig. 6.1. The curved line in the figure above is an investment cost function which is
characterized by declining marginal cost, i.e.

¢ = .f(h) (10)

¢ = investment cost,
h = size of the productive unit.
For example, the function might be

¢=c~h ~ with c~>0, 0</3< 1.

The function (10) can be approximated by a linear function with a positive intercept
and a positive slope as in shown by the dashed line in Fig. 6.1, i.e.

¢ = wy + vh, (1 l)

E nvestment] = F.xedl [wiable t

cost J [costJ + [ cost J
Ch. 6." Sectoral Economics 309

y : a zero-one integer variable,
co = the "fixed-charge" or "site-entry" portion of the capital cost,
h = the size of the productive unit,
v = the slope of the approximate cost function.
In addition to Eq. (1 l) two other constraints must be added to the models

h ~< hy (12)


y : 0 or 1 (13)

= an upper bound on the size of the productive unit.
The constraints (12) and (13) require that the full fixed charge co be incurred if an
investment of even the smallest size is made.
In GAMS Eqs (12) and (13) are specified as

HBAR Maximum Size of P r o d u c t i v e Unit
H Size of P r o d u c t i v e Unit
Y Zero-One Integer Variable
FIXCOST Fixed Cost Constraint

The key to this specification is the use of the "BINARY VARIABLES" statement
which requires that the Y variables be restricted to the values of zero or one. This is
coupled with "MODEL" and "SOLVE" statements as follows:



Here "STEEL" is the model name, " P H I " is the criterion function value and MIP
(mixed integer programming) is the solver which is used to obtain the solution.
Strictly speaking, GAMS is a modeling system rather than a solution method, but
it interfaces with a number of solvers for linear, nonlinear, linear mixed integer and
nonlinear mixed integer programming. For example, both MINOS and CONOPT are
nonlinear programming solvers associated with GAMS. MINOS was developed by
Murtagh and Saunders (1987) and CONOPT was developed by Drud (1994). Thus the
310 D.A. Kendrick

user who has both of these solvers can solve a given nonlinear programming model us-
ing these two codes by specifying "OPTION NLP = MINOS5;" or "OPTION NLP
= CONOPT;" followed by "USING Nr.p" in the SOLVE statement. Since different
codes are effective in solving different nonlinear programming problems, this is an
important option for the user.
Constraint (13) requires that the model be solved as a mixed integer programming
model since the y variable cannot take on any non-negative value but rather only the
values zero or one. This approach to the approximation of non-convex investment cost
functions was proposed by Markowitz and Manne (1957). See also Dantzig (1957).
Essentially the investment problem in models with economies of scale is combina-
torial. Usually there are many different productive units which can be expanded as
well as new plant sites which can be opened at various locations. The problem is one
of finding which combination of these expansions is the most efficient. Mixed integer
programming codes essentially go through the combinations and search for the best
solution. However, the search is conducted in an efficient manner which makes use of
the fact that some solutions are dominated by others. Still the computational cost of
solving mixed integer programs increases very rapidly with the number of investment
options under consideration. High speed computers have made this work possible and
are now permitting the solution of substantial investment problems even on desktop
computers. A recent example is the work on electric power planning where there are
integer variables not only for productive units but also for transmission lines in the
distribution network, cf. Baughman, Siddiqi and Zarnikau (1993).
From this background in some of the core methods of sectoral modeling we turn to
a discussion first of the software which can be used to develop sectoral models and
then to the models themselves.

3. Software

The GAMS language has been used in this chapter to illustrate the development in a
computer language of the mathematics of sectoral modeling. While GAMS [Meeraus
(1983)] was one of the first high-level software systems widely used for sectoral
modeling, a number of competing systems have gained recognition in recent years.
These include the AMPL system of Fourer, Gay and Kernighan (1990, 1992). This
was developed at Northwestern University and Bell Laboratories and was originally
Unix-based. The MIMI (Manager for Interactive Modeling Interfaces) system has
been developed by Baker (1990) to include a stronger database orientation than other
modeling languages. Among other uses it has been applied to process industry models
including distribution planning, operations planning and production scheduling. The
Structured Modeling system of Geoffrion (1987, 1992a, 1992b) has a very carefully
worked out theoretical background and an emphasis on developing the structure of
the model independent of the data. One of the most recently developed systems
Ch. 6." Sectoral Economics 311

is AIMMS by Bisschop and Entriken (t993). This software is compatible with the
GAMS language, and offers a graphical interface for the construction and use of
interactive reports (what-if analysis).
One of the most interesting ways to come up to speed in sectoral modeling is to
look through the papers and books which describe the modeling languages mentioned
above. There are numerous references there to the use of these systems for modeling
a wide variety of sectors and different activities within those sectors.
A recent software development of potential importance for sectoral economic mod-
eling is the integration of optimizers into spreadsheets. I have tried the Excel imple-
mentation which incorporates Leon Lasdon's GRG nonlinear programming code and
found it easy to use. I introduced a small growth model and was fascinated to watch
the search process as the code converged toward the optimal solution. While these
spreadsheet optimizers do not have the set drive capabilities which make modeling
languages like GAMS so valuable for sectoral modeling, their widespread availabil-
ity and large number of well-trained users is indicative of the impact which these
optimizers may have on future sectoral modeling work.
Two recent development in software development hold interesting possibilities for
sectoral models. The first is the use of artificial intelligence or expert systems to
develop and maintain the models. An example of this is in Krishnan (1988, 1990,
1991) which utilizes the Prolog language to create an expert system called PM for
process industry model development. A small portion of the representation of a model
of the Mexican steel industry is shown in Table 6.4.
The declaration of an "index" in PM is similar to set specification in the GAMS
language except that the set is represented by the index rather than by the set sym-
Table 6.4
PM representation
model_name (io_forAron).
index (years, [t]).
index (mills, [i]).
index (input, [c]).
index (iron, [r]).
index (iron4~roduction, [p]).
isa (years, planning_period).
isa (mills, plant).
isa (input, rawanaterial).
isa (iron, product).
isa (iron_production, production_process).

ins_of ([1989], years).

ins_of ([hylsa], mills).
ins_of ([coking_coal], input).
ins_of ([iron_ore], input).
ins_of ([pig_iron], iron).
ins_of ([pig_iron_production], irond)roduction ).
312 D.A. Kendrick

File Edit
Data View Results Help

~ c / / / / ~ PITTS

~ 15

Figure 6.2. The graph window.

bol. The "isa" predicate is used to relate the sets (or indices) to internal knowledge
variables which are used by PM. Thus the set of years is a "planning-period" and
mills are "plant". This internal knowledge base gives PM some of its power by per-
mitring logical tests on the structure of the model and in helping the user to provide
a complete model. The "ins_of' predicates provides instances or elements of the sets.
Thus coking_coal and iron_ore are elements of the set of inputs.
Logic programming can provide powerful tools to help the user construct a model. It
can also provide a very useful query structure like a database to answer questions about
the sector. However, at this stage in their development logic programming models are
tedious to construct. Fortunately, this tedium will probably be relieved in the future by
the use of graphical interfaces for these systems. In the meantime, graphical systems
are beginning to provide some of the functions of logic programming systems but in
a more intuitive framework.
This leads to the second software development for sectoral modeling, namely graph-
ical interfaces which are specific to certain classes of models but which offer substan-
tial ease of use. An example is the production and transportation modeling system,
PTS, which was created by Kendrick (1991) to run under Windows. This system is
specific to production and transportation modeling but the graphical interface greatly
decreases the entry cost for first time modelers. A similar system for the Macintosh
computer has been developed by Jones (1990, 1991).
These graphical systems may be the wave of the future. They offer the possibility of
maintaining parallel representations of a model in mathematical, graphical, database,
spreadsheet, logic programming and upper-level language forms. Figure 6.2 shows
the graphical representation of a simple linear programming transportation model in
PTS running under Windows.
Ch. 6: Sectoral Economics 313


File Help

I Plants /
J Markets /
PARAMETER K(I) Capacities of Plants /
CHI 52
DAL 20

Figure 6.3. The GAMS window.

The square symbols represent plants and the rectangles represent markets. A new
plants can be added to the model by clicking on the "Plant" box in the upper left hand
corner of the window and then moving the mouse to the place in the graph where it
should be located and clicking again.
Figure 6.3 provides the GAMS representation of the same model. If the user adds
a plant to the graphical view, then the GAMS view will immediately be updated to
reflect the change.
Some changes to the model are most easily made by changes in the graph window
and others by changes in the GAMS window. Other views might also be used. For
example PTS maintains a spreadsheet view of the transportation cost between plants
and markets. This may be the view in which most users will prefer to enter changes
in the transportation cost though one could also make that type of change in the
graphical view or the GAMS view.
Each view may represent only a portion of the information about the model. For
example the spreadsheet view might be used only for the transportation cost but not
include data from the graphical view about where plants and markets are located.
PTS is experimental software designed to work with simple linear programming
transportation problems and only provides three or four different views. However,
it illustrates the principle of parallel model representations which will most likely
be provided in the graphical user interface sectoral modeling systems of the future.
PTS was implemented in the C language using the Software Development Kit for
Windows. At the time PTS was developed, OLE (Object Linking and Embedding)
was not yet available; however future sectoral modeling system will make use of
this capability. Then for example the spreadsheet implementation, instead of being
hand-crafted as in PTS, might make use of the Microsoft Excel spreadsheet.
314 D.A. Kendrick

From the preceding discussion of the core methods of sectoral modeling and the
software we now turn to a description of some of the sectoral models which have
been developed in recent years. We begin with models from the process industries.
A comprehensive survey of the literature in each area is beyond the scope of this
chapter. Rather highlights of the core methodology in each area are given along with
illustrative references which will enable the interested reader to get a start in the

4. Process industries

In order to highlight the attributes of various sectoral models, this section will contain
a discussion of an illustrative set of models followed by an annotated listing of a
group of models organized by geographical coverage and sector.
When one first approaches a sectoral economic analysis there is a tendency to build
a single model which encompasses all the matters of interest. However, this strategy
frequently runs afoul of the fact that, even as fast as modern computers are, they are
not up to the task of solving fully specified models of sectors. Moreover, large models
are difficult to understand and it is not easy to verify that they are correctly specified
and entered into the computer in an error-free way.
Therefore, it is usually valuable to plan on developing a number of different models
in a given project. For example, the first model or models may focus on the operations
problem of the firm and abstract from investment. Then at a later stage in the project
models which include investment can be developed. However, when investment is
added it is frequently necessary to use an aggregated version of the operations model
which contains fewer commodities, productive units and processes.
It is important as one progresses from one model to another not to throw away
the first models developed but rather to preserve them. For example, the first highly-
aggregated, static, operations-only model is usually the model which is easiest for
readers to fully comprehend. This model can therefore be put to important use as an
introduction to the project in a form that can be widely understood and therefore serve
as a useful basis for communications between the modelers and the supervisors.
As an example of this kind of a system of models, consider the models which
were developed in a World Bank project to analyze the steel industry in Mexico,
cf. Kendrick, Meeraus and Alatorre (1984). The project included small static, large
static and small dynamic models which are used below to illustrate the strengths and
weaknesses of such models.

4.1. Small static

The work began with a small static model of the industry. It included five plants,
three markets, five raw materials, two intermediate products, a single final product
Ch. 6: Secwral Economics 315

and a single time period. A list of the commodities gives an idea of the degree of
disaggregation in the model.
raw materials
natural gas
scrap iron
intermediate products
sponge iron
pig iron
final product
A more disaggregated model would begin with iron ore and coal as raw materials
rather than pellets and coke. Also, it would include shapes, reinforcing rods, hot
rolled sheets, cold rolled sheets and tin as final products. However, even in its highly
aggregated form, the small static model is sufficient to address many of the important
operational issues of the industry.
This model includes cascades of production functions of the sort that were discussed
in the Activity Analysis section above. In the first stage (i) pellets and coke are
transformed into pig iron or (ii) pellets and natural gas are transformed into sponge
iron. In the second stage (i) pig iron and scrap iron or (ii) sponge iron and electricity
are transformed into steel. Also, since there is more than one process for making steel
there is substitution among inputs as described in the first part of the section above
on Activity Analysis.
The criterion was to minimize the cost to transform the raw materials first into
intermediate and then into final products in order to satisfy the demand requirements
at the markets. The basic structure of the original linear programming transportation
model is maintained. This is a strength and a weakness. It is a strength in that cost
minimization enables one to focus clearly on the relative efficiencies of the different
steel mills and the productive processes and productive units within those plants. It
is also a strength in that three of the five steel mills in the model were owned by the
government at the time of the study and minimizing the cost to serve the steel needs
of the country was one of the key goals of the industry. Cost minimization in the
linear programming transportation model framework is a weakness in that such mod-
els usually focus on meeting fixed requirements and thereby ignore the relationship
between price and the quantity demanded. Also, the goal of cost minimization is not
appropriate for the two private steel mills among the five in the model.
While there are ways to modify the linear programming transportation problem
to include demand functions and to convert the model to profit maximization, cf.
Kendrick and Stoutjesdijk (1978, Chapter 7). One can also solve for a partial equilib-
rium by maximizing the sum of producers' and consumers' surplus. These methods
316 D.A. Kendrick

have so far not been widely adopted. However, there is a tendency in that direc-
tion which is being reinforced by the increased efficiency of nonlinear programming
solvers and by the existence of nonlinear mixed integer programming codes. An ex-
ample of this can be seen in the electric power study of Baughman, Siddiqi and
Zarnikau (1993).
While there are a number of ways in which this small static Mexican steel model
could be improved, it none-the-less permitted an analysis of some of the key issues
of that industry at that time. One of those issues was the substantial subsidies which
were effectively passed from the government to the private sector via the medium
of artificially low natural gas prices. The effects on the industry of modifications o f
these subsidies could be studied with even such a small model. And indeed because
the model was small and relatively easy to understand, the message was strengthened.

4.2. Large static

The large static model worked backward to include more details of raw material pro-
duction and preparation and forward to include more intermediate and final products.
Seven iron ore mines and a coal mine were added to the model as well as three pellet
plants and a coke oven facility. One more small steel mill was added to raise the total
from five to six. The number of market areas was increased from three to eight. All
of these changes added to the importance of transportation cost in the model. Also
the model was expanded to include about fifty commodities including twelve types
of final products.
Since the production of coking coal was explicitly included in this larger model,
the model could be used to study the effects of loosening the restrictions on imported
coal which were in place in Mexico at that time. Also, the industry was focused on
the domestic market and did not give much importance to potential exports; therefore
a study was made of the effects on the industry of greater export promotion.
There was a tendency to run each mill separately and to ignore the possibility
of substantial interplant shipments of intermediate materials to ease bottlenecks in
some of the plants. An experiment was conducted with the model to analyze these
possibilities. Also, an experimental run was done to study where strikes would be
most effective from the point of view of the unions or of management.
This large static model was sufficiently disaggregated that it included the kind of
detail which the operating officials in the industry would recognize as the level at
which they thought about the economics of the industry. It contained capacities for
many of the individual productive units in the steel mills and thus was able to analyze
the effects of various bottlenecks on production.
While the model was a large one for its day it may now be solved with ease on a
microcomputer. However, while speed increases help in understanding such a model,
the complexity is still such that one should not jump into such a model development
Ch. 6: Sectoral Economics 317

project without first constructing a small static model of the type described in the
previous section. As one measure of the complexity the GAMS statement of the
model included about 1200 lines, i.e. twenty to thirty pages of set specification, tables,
variable lists and equations.
Complexity of this degree is one of the reasons that graphical methods like those
discussed in the software section above will be important in the evolution of sectoral
models. It is easier to understand a graphical representation of an industry than it is
to sort out the details from a GAMS statement. For example, software like the PTS
system discussed above will eventually include icons of plants on a map. One will
be able to click on these plants and effectively "go inside" to find an array of icons
which represent the main productive units linked by lines of product flows. These
productive units could in turn be clicked on to permit the user to "go inside" and see
the alternative production processes which can be used in the productive unit.
The results of solving the model will not appear as tables in a GAMS output but
rather as graphs showing product flows between productive units as well as from plants
to markets. This will enable one to quickly digest the results of even large models
and to develop additional experiments in order to learn first about the functioning of
the model and then later about the functioning of the economics of the sector.

4.3. Small dynamic

The small dynamic model focuses on investment. The model not only includes mul-
tiple time periods but also specification of investment cost with economies of scale.
As discussed above, mixed integer programming methods are required to solve in-
vestment problems where there are economies of scale. Such problems are inherently
combinatorial. Thus if the problem has four productive units to be considered in each
of two time periods, the computational problem is of the order of 28 or 256. If just one
more productive unit is added, the order is 25° or 1024. Thus the computational com-
plexity increases exponentially with the number of productive units and the number
of time periods.
Because of the substantial computational cost associated with mixed integer pro-
gramming problems it is necessary to aggregate again when one moves from static
to dynamic models. In the case of the small dynamic model of the Mexican steel
industry, there are seven plants, three mines, three markets, eight commodities, five
productive units and five time periods. Thus in order to include the investment anal-
ysis it is necessary to give up much of the disaggregation of mines, plants, markets
and commodities which were used in the large static model. While the algorithms
for solving mixed integer programming problems are continuing to improve and the
speed of computers is advancing even more rapidly, the fundamental tradeoff between
representing investment opportunities and model size is likely to remain.
At the time of the study, natural gas prices in Mexico were controlled at a level
about one-tenth of the world price. Since this was a key input for the sponge iron
318 D.A. Kendrick

production process of the privately owned steel mills, it amounted to a substantial

subsidy from the government to the competition for its state-owned steel mills. At the
same time the quality of domestic coal and iron ore was declining. Also, there was
increasing concern about congestion and air pollution in Mexico's larger cities and
a willingness to provide subsidies to industries which would locate elsewhere. These
ingredients provided the motivation for the small dynamic model used to study the
sensitivity of the minimum cost plan for the sector.
The study found that the policies affecting the regulation of natural gas prices were
very important to the investment plan of the coming decades. It also found that the
expansion of the industry should be near ports because of the declining quality of the
domestic iron ore reserves and therefore the expectation that iron ore pellets would
be imported in the future. Finally it found that substantial investments in plants on
the Gulf Coast were not economical unless natural gas prices were held at the low
regulated levels.
In summary, a dynamic model can prove to be an effective tool for studying in-
vestment in productive units in a set of plants located near mines or ports and serving
both domestic and foreign markets. The strength of this type of model lies in its
ability to represent the transformation of commodities in a cascade of production pro-
cesses running in productive units. The strength also lies in the ability to represent
economies of scale in investment cost and to solve the resulting model using mixed
integer programming methods. The weakness lies in the lack of demand functions and
in the specification of cost minimization as the criterion function. Also, the weakness
lies in the fact that insufficient attention is given to the gaming of investment in heavy
industry. While the basic model described above could be used as a part of a game
analysis of investment plans in the various steel mills, this was not attempted in the
World Bank studies. It would seem to be an important part of future sectoral invest-
ment models. I am not thinking so much here about analytical solutions to small game
theory models as I am the use of the numerical models to analyze various opposing
investment strategies by the companies.

4.4. Examples of sectoral models

One of the most useful ways to review sectoral models in the process industries is by
the geographical scope of the models. As Table 6.5 shows there is a fair sampling of
models which are single country, multi-country and worldwide in scope. For a more
extensive discussion of models covering different geographical areas see Kendrick
Many of these models are in GAMSLIB, the model library which is included with
the GAMS software, see Chapter 19 of Brooke, Kendrick and Meeraus (1992). One
logical application of this class of sectoral models is to an industry located in a single
country. Since national governments regulate or control process industries in most
Ch. 6: SectoraI Economics 319
Table 6.5
Linear mixed-hlteger programming sectoral models of processes industries by geographic area
single country
Choksi, Meerans and Stoutjesdijk (1980)
Westphal (1971a, 1971b)
Sub (1981)
Kendrick (1967)
Westphal (1971a, 1971b)
Manne and Vietorisz (1963)
Mennes and Stoutjesdijk (1985)
Brown, Dalmnert, Meeraus and Stoutjesdijk (1983)
Dammert and Palaniappan (1985)
Adib (1985)
Wei (1984)

countries, this is the appropriate geographical scope for many studies. These studies
may cover a single country such as the fertilizer industry in Egypt in Choksi, Meeraus
and Stoutjesdijk (1980), the petrochemical industry in South Korea in Suh (1981) or
the steel industry in Brazil in Kendrick (1967). Or they may cover large projects in
more than one industry as is the case for petrochemicals and steel in South Korea in
Westphal (t971a, 1971b). Westphal's model demonstrates how large projects for two
sectors can be embodied in an economy-wide model.
One of the interesting aspects of Westphal's work is that it provides a means of
considering the effects of large projects in small countries. In some cases a project
or projects may be so large as to affect the total economy. Westphal's approach
permits considertation of the feedback effects of the project on the economy, such
the effects on interest rates, exchange rates, exports or imports. For additional studies
on economies and scale and investment planning see Westphal (1975), Chenery and
Westphal (1979)and Cremer and Westphal (1984).
The multi-country models have typically been motivated by prospective or existing
free trade areas or common markets. For example, one of the first sectoral models
was the Manne and Vietorisz (1963) study of the fertilizer industry in Latin America.
Similarly, the Mennes and Stoutjesdijk (1985) study of the fertilizer sector was done
for the Andean Common Market.
A different methodology for sectoral economic modeling was put forth by Manne
(1967). In these studies, Manne used dynamic programming methods to calculate the
320 D.A. Kendrick

optimal period between capacity expansions and to determine the amount of each
expansion. The methodology was applied to the aluminum, caustic soda, cement and
fertilizer industries in India. For an extension of this work see FreidenMds (1981).
While this methodology has not been widely used yet, it seems likely that it could
provide a basis for the use of stochastic control methods in analyzing the expansion
of an industry over time.

5. The computer industry

Two different methodologies have been applied in developing sectoral models of

various portions of the computer industry. The first of these is the use of the linear
programming transportation model framework to the personal computer industry by
Ang (1992).
The commodities in this model are shown in Table 6.6. The focus here is on
memory chips and microprocessors which are assembled into motherboards and then
into personal computers.
The model included sixteen raw material suppliers such as Toshiba, NEC, Mit-
subishi, Quantum, Seagate, Conner, Intel, AMD and Motorola; nine assembly plants
located in Texas, Florida, California, New York and Massachusetts; as well as thirteen
market areas in the U.S., Europe, Japan and the Far East.
Models of this type are potentially very useful because there are occasionally short-
ages of memory chips or other components and the capacity of the productive units
would quickly reflect this kind of problem. Also, from time to time there are tariffs
and quotas which affect this industry substantially and this class of models is ideal
for reflecting the impacts of these constraints on international trade.
However, there are at least three major problems with the development of models
of this type. First, there is a tendency in linear programming models for the low
cost supplier to gain the entire market, while the reality is that product differentiation
is very important in this industry and market shares do not shift rapidly between
suppliers. Therefore, adaptations of the models to permit slow shifts between suppliers
will be important. Second, demand fufictions need to be built into the models. Third
it is difficult to obtain the capacity and cost data which such a model requires. Private

Table 6.6
Commodities in personal computermodel

Raw material Intermediate Final

printed circuit boards IBM PC
memory chips 68xxx motherboard Macintosh
68xxx processors 80xxx motherboard Compaq
80xxx processors AST
hard disk DELL
Ch. 6: Sectoral Economics 321


\ /

number of I applications
workstations software

Figure 6.4. A computer workstation model.

firms in this highly competitive industry are reluctant to reveal these statistics. There
are some firms that collect and sell information on this industry, but academic projects
cannot easily afford to pay the price for such data.
All three of the problems can be solved with creativity and resources and this
industry will continue to be important for a number of countries. Therefore, it seems
likely that a number of sectoral models of the personal computer industry will be
developed in the years to come.
A different class of models of the computer industry is simulation models which
are driven by the decreasing feature size of elements on microprocessor and random
access memory chips. A schematic for a model of this type is in Fig. 6.4.
The decrease in the feature size of the elements on computer chips is the driving
force of the model. As this size decreased from around twenty microns in the early
1960's to less than one in the early 1990's the speed of the chips increased and their
cost decreased. This in turn resulted in increases in sales so that there were more
workstations in existence. This in turn augmented the development of applications
software which in turn added to the increase in sales.
Touma (1993) has developed a model of this type for the computer workstation in-
dustry using exponential functions. For example in his model the feature size decreases
by 5.5 percent per year. This model was developed in the C language. Touma's training
is electrical engineering and the model reflects this in the rich detail of the technical
information not only about microprocessors but also disk drives, displays, etc.
322 D.A. Kendrick

The model uses not only feature size but also die size and the number of critical
masks as driving forces. Dies are circular disks of several inches in diameter which
look a little like a compact disk. The chips are etched onto these dies. The number
of critical masks has to do with the number of levels of etching on the dies. These
three factors determine the number of flawed chips on a die and therefore the cost of
Touma's model was designed for parametric use so that one can alter the expected
rate of change of feature size, the size of the dies for the chips or the number of
critical mask. Then one can see the effect of these changes on the number of flawed
chips and the effect of this on microprocessor cost and eventually on workstation cost.
As a simulation model it runs very fast and is a useful tool for analyzing the effects
of changes in feature size, die size and number of masking levels on the future of the
computer workstation industry.

6. Energy

While most of the models discussed in this chapter are activity analysis models, some
of the most influential are not. Consider for example Pindyck's classic 1978 paper
on the OPEC cartel. This is a multiperiod mathematical programming model which
can also be thought of as a deterministic nonlinear control theory model. The model
was originally solved by Pindyck as a control theory problem; however the GAMS
solution discussed here is a mathematical programming problem.
The model is used to analyze oil supply and price in the world economy. Oil
suppliers are divided into two groups, OPEC and the "fringe" of oil producers not in
OPEC. The objective function is to maximize OPEC's discounted profit stream.
OPEC faces a dynamic tradeoff. If it raises prices, its immediate revenues are higher
but this also has the effect of decreasing demand with a lagged response and of slowly
but surely increasing the oil supply from the fringe. Since OPEC is modeled as the
residual supplier this increase in supply from the non-OPEC fringe erodes OPEC
profits over time. c
The model has fifteen time periods and roughly one hundred constraints. This is
large enough to capture the essence of the problem and small enough that it is a
manageable nonlinear programming problem. The model is in the GAMS Library, cf.
Brooke et al. (1992) and has been used by many students over the years.
In contrast to Pindyck's control theory model of oil supply and demand, the activity
analysis models have been used to great advantage for many years in modeling oil
refineries. One of the first mathematical programming models in the energy sector
was the refining model developed by Charnes, Cooper and Mellon (1954). This was
followed by Alan Manne's 1956 book on the scheduling of refinery operations and
his 1958 article on a U.S. model of the petroleum refining industry. A later model by
Langston (1983) extended this work to the world-wide oil refining industry. Among
Ch. 6: Sectoral Economics 323

other things, Langston analyzed the effects of increases in oil import taxes on the
use of U.S. oil refineries instead of European oil refineries to supply gasoline to U.S.
Much of the world oil supply is delivered to refineries in tankers. However, natural
gas is difficult to ship economically in tankers and therefore most of it is transported
in pipelines. These pipeline networks are naturally modeled with mathematical pro-
gramming models. For example, Waverman (1973) developed a model of natural gas
supply in Canada and the U.S. In this region most natural gas is produced in Texas
and Louisiana as well as in the western Canadian province of Alberta. In contrast, the
largest markets for the gas are in the eastern U.S. and Canada as well as in California.
The natural gas fields in Texas and Louisiana are closer to the markets of eastern
Canada than are the Alberta fields. Accordingly, Waverman's model suggests shipping
the Alberta gas to California and the Texas and Louisiana gas to eastern Canada rather
than building a trans-Canadian natural gas pipeline to attain autarky in the Canadian
natural gas industry.
A more recent use of mathematical programming models to analyze the natural
gas industry is in Beltramo, Manne and Weyant (1986). Also, Rowse (1986b) uses a
dynamic framework to analyze Canadian natural gas export issues and Rowse (1992)
provides a discussion of U.S.-Canadian natural gas trade using a group of models
analyzed by the Stanford University Energy Modeling Forum.
A rather different approach to natural gas pipeline modeling is in McCabe, Rassenti
and Smith (1991). This paper discusses the uses of linear programming models of gas
pipeline systems to assist market solutions.
Another important application of mathematical programming models to sectoral
models is in electric power. The pioneering work in this area was done by Masse and
Gibrat (1957). Some of the studies done in the 1970's were Gately (1971), Anderson
(1972), Fernandez, Manne and Valencia (1973) and Anderson and Turvey (1977).
Some studies in the 1980's were Kang and Kendrick (1985) and Kwun (1986).
The basic structure of most of these models is power plants linked by transmission
lines to markets for electric power. The plants may use coal, oil or natural gas. Also,
some of the plants may be nuclear and in most cases there are also hydroelectric
facilities. Each of these types of plants have different economic characteristics. For
example, hydroelectric facilities have high capital cost, low operating cost and a
comparative advantage in certain seasons of the year while coal-fired power plants
have relative low capital cost and high operating cost. There are substantial economies
of scale in investment particularly in nuclear and hydroelectric facilities so mixed
integer programming methods are frequently used to solve these models. Also, some
models such as Baughman, Siddiqi and Zarnikau (1993) consider investment not only
in power plants but also in transmission lines and model these investment options
with integer variables.
The earlier models were linear programs or mixed integer linear programs. The
more recent models have moved toward nonlinear programming and even nonlinear
324 D.A. Kendrick

mixed integer programming. Similarly, the earlier models were mostly deterministic,
and there is a tendency now to begin using stochastic methods to model uncertainty
in outage for power plants and transmission lines.
One of the most important contributors to energy modeling has been Alan Manne.
See for example Manne (1974, 1976, 1979) and Manne, Richels and Weyant (1979).
Also, some of Manne's work links the energy and environmental fields, viz. Manne
and Richels (1992). Other important contributions to energy modeling are the linear
programming models of Nordhaus (1973, 1979) and the quadratic programming model
of Kennedy (1974). Rowse (1986a) explores the measurement of user costs of ex-
haustible resources using nonlinear programming. For an integrated survey of natural
resource industry/sector programming models see Rowse and Copithorne (1982).
While most models in this area have confined themselves to a single industry such
as oil and gas or electricity, some models have considered the entire energy sector, viz
Fernandez and Manne (1973). In a recent model of this class Linden (1992) used a
mixed integer programming model to represent coal mining, oil production, refining,
natural gas production, and electric power production and distribution for Colombia.
This model demonstrates that we are moving into an era in which one can build
substantially disaggregated models of a number of related industries and solve them
as a single large sectoral model. At the time the model was being developed it was
solved on a Sun workstation, but Linden has reported that improvements in solvers
have enabled him to solve the model even on a 486 microcomputer.

7. E n v i r o n m e n t

Water pollution was one of the first concerns of the environmental movement in
the post World War II period in the U.S. In water pollution models point sources
of pollution such as urban sewage are processed in part before being dumped into
rivers and streams. The pollutants lower the dissolved oxygen level in the rivers down
stream from the point source. So the river plays the important role of processing the
pollutant but at the cost of degrading t~e quality of the stream.
The models of water pollution have therefore included activities which provide
various degrees of processing of wastewater at increasing cost before the effluent is
passed into the stream. Then flow equations are used to model the reduction in the
dissolved oxygen level downstream ti'om the sewage plant as the remaining waste is
processed. In most models there are a number of sewage plants, sometimes owned by
different municipalities, which affect the water quality in any given leg of the stream.
Some models in this class have sought the minimum operating and capital cost
to meet quality standards for dissolved oxygen levels in the steams. Other models
have considered the use of markets for dischargeable permits. A recent model of this
type is Letson (1992a) for the Colorado River in Texas. He used thirteen wastewater
treatment plants which were planned for the year 2000 to study the effects on two
Ch. 6: Sectoral Economics 325

segments of the river. In another aspect of the same study Letson (1992b) analyzed
investment issues.
Recently, the global warming problem has become a matter of increasing concern.
Fuels such as coal, oil and natural gas release carbon dioxide as they are burned. As the
carbon dioxide concentration increases in the atmosphere it affects the release of heat
from the earth and therefore tends to bring about an increase in the earth's temperature.
This increase in temperature in turn affects the output of the agricultural and other
sectors. There are different degrees of uncertainty associated with the various links
in this chain of causation, the most uncertain of which is the link between increases
in the carbon dioxide concentration and changes in the earth's temperature. Also, the
dynamics' are a matter of substantial concern since the full effect of burning fossil
fuels today will not be felt until some years in the future..
A small and elegant model for analyzing global warming is Nordhaus (1992). This
model is solved as a nonlinear programming model using the GAMS software. It can
be used to analyze the effects of various control policies including carbon taxes on
fossil fuels.
One aspect of the global warming problem is the potential future conflict between
developing and developed nations. While many analysts agree that it may be neces-
sary to reduce carbon emissions in order to stabilize or reduce the carbon dioxide
concentrations in the atmosphere; there is less agreement about how this responsibil-
ity should be shared around the globe. Consider for example India and China. Both
of these countries have substantial coal reserves and strong ambitions for economic
development. In the past the industrialized countries of Japan, the U.S. and Europe
have been the largest contributors to the increase in the carbon dioxide concentra-
tion in the atmosphere. China, India and the other developing countries feel that it is
their turn now and that the developed countries should curtail their emissions even
more sharply in order to permit the developing countries to develop their fossil fuel
This is one of the policy questions analyzed in Duraiappah (1993). This GAMS-
language model includes not only equations to represent the effects of the economy on
carbon dioxide emissions and thus the temperature, but also the effect of temperature
changes back on the economy. Also the model includes alternative technologies with
different degrees of capital intensity which can be used to ameliorate the emissions.
Thus one policy examined in the study is transfers from the developed to the devel-
oping countries to aid the developing countries to adopt capital intensive technologies
which are less polluting. Another result the model highlights is the important role
dynamics plays in environmental policy decisions. As the time horizon was altered,
the best policy decision changed from a "business-as-usual" approach to an "act-now"
approach. This kind o f results would have been difficult to predict using analytical
results alone.
For another approach to modeling the global warming problem see Manne and
Richels (1991). For a study of air pollution control see Kohn (1978).
326 D.A. Kendrick

8. Agriculture
Two of the main classes of agricultural models are those for annual crops and those
for tree crops and livestock. The central element in the annual crop models is the
allocation of land to crops in multiple growing seasons within a year. In contrast the
tree crop and livestock models are usually dynamic models with state equations for
the trees or for the various types of livestock.
The annual crop models usually have two or three growing seasons and a set
of crops which can occupy the land during certain months of the year. Frequently
the models also include irrigation activities as well as careful attention to the use
of both permanent and temporary labor. The problem is to find that mix of crops
which maximize profits within the land, labor and water constraints. Many of these
models are not cost-minimizing. Rather, the criterion is consumer-producer surplus
maximization. Also, economies of scale have not been considered as prominently
in agricultural as in industrial models. Thus it has not been necessary to employed
mixed integer programming techniques. On the other hand, uncertainty is important
in the agricultural sector and there has consequently been considerable work using
the mean-variance and other uncertainty modeling techniques.
One of the most well-known models in this class is the CHAC model which was
developed at the World Bank for Mexican agriculture, cf. Duloy and Norton (1973)
and Norton and Solis (1983). Following the methodology of Kutcher, Meeraus and
O'Mara (1986) and Hazell and Norton (1986), a recent model in this class is Lofgren's
(1990, 1993) study of Egyptian agriculture. The model includes eight crops and seven
sets of one or two-month time periods within the year. Each crop would occupy the
land in a subset of these time intervals. There are constraints not only for land and
irrigation water but also for fertilizers and for four types of labor.
During the time of Lofgren's study Egyptian agriculture was closely regulated by
the government, with many prices for both inputs and outputs fixed. So the model
is used to analyze the possible effects of freeing various combinations of regulations
and prices.
Tree crop and livestock models use dynamic equations to model the number of
trees or animals. For example a cattle model will include equations for the numbers
of steers, bulls, cows and calves. Births" add to the stocks and slaughter of cows and
steers produces meat for sale while reducing the stocks. When prices rise the desired
herd size rises so fewer cows are slaughtered in order to produce more calves. This
can produce short term price rises even greater than the initial uptick in prices.
Tree crop models may carry vintage information about the age of trees since there
is sometimes a long lag between planting of new trees and their first yield. In cocoa
this lag is about seven years and affects the price and production dynamics of the
sector substantially. For an example of the application of control theory methods to
stabilization of cocoa market prices, see Kim et al. (1975). The book edited by Labys
(1975) contains a number of sectoral model studies for agricultural and industrial
For a survey of agriculture sector modeling see McCarl and Spreen (1980).
Ch. 6: Sectoral Economics 327

9. Linkages to computable general equilibrium and growth models

The first outlines of this chapter extended the notion of models of a single sector to
economy wide models with multiple sectors such as computable general equilibrium
models (CGE's) and growth models. This is a logical step so there may eventually be
close ties between the class of models discussed in this chapter and those in Dixon
and Parmenter's chapter in this book. Also, it is reasonable to expect that sectoral
models will be linked to growth models directly as is outlined in Kendrick (1990)
and through ties to dynamic CGE's.
One example in the direction of links between sectoral and economy-wide models
is by Alan Manne. This is done in the integration of a sectoral model and a macro
model with ETA-MACRO [Manne (1979)] and its offspring GLOBAL 2100 by Alan

10. Limitations

In many PhD dissertations which I have supervised I have insisted that the student
add a section near the end which is called limitations. This section is designed to
provide the reader with a guide to the difficulties inherent in the development and use
of the models employed by the student. Also, this section is used to report on the fact
that it was difficult to get the models to converge or that the solutions did not prove
to be as robust as one could desire.
When I made a first draft of this chapter I failed to follow my own advice about
including a section on limitations. However, John Rowse's comments on that draft
reminded me of the need for such a section. John suggested that I add a paragraph
which points out that while linear programming methods are robust that nonlinear
programming can still be tricky. In my experience it is easy to formulate a nonlinear
programming model in a modeling language, but it is often difficult to obtain feasible
and then optimal solutions. As some protection against this I urge students to begin
with small models. Then as they advance to more complicated models they should
retain feasibility and optimality at each step. However, even this "start small" advice
is not enough to protect one from the difficulties which are inherent in solving non-
linear programming problems. Though there are exceptions where everything works
beautifully, any analyst planning to develop a nonlinear programming model should
anticipate difficulties in obtaining solutions and should proceed cautiously.
Also, one of the most important limitations of sectoral models is their complexity.
This is not "computational complexity" in the sense of np hard problems, but rather
that sectoral models of any size can be difficult to debug and to check for errors
because of the sheer number of commodities, processes, productive units, plants and
markets. Though the development of modeling languages has been of tremendous
help in dealing with this type of complexity, it remain important to develop small
328 D.A. Kendrick

models first and to add complications slowly so that errors can be eliminated at each
step along the way. If one begins with a large model and encounters problems, it can
be exceedingly difficult to track down the source of the error.

11. Conclusions

Studies of various sectors of the economy have been done for many years; however, the
increased power of computers is bringing new possibilities to this area of economics.
In the past it was possible to run a few regressions and to make policy prescriptions
on this basis. Similarly, rates of return could be computed on individual projects to
aid in investment decisions.
In contrast, it is now possible to estimate entire models and to solve these models
using nonlinear programming or stochastic control theory methods. These methods
permit the consideration of more accurate models and the inclusion in the analysis of
the effects of uncertainty. Also, it is no longer necessary to consider the rate of return
on a single project. Rather, models can be developed which include many investment
projects in a sector and computer codes can be used to find the best combinations of
these investments in the various productive units at dispersed mines and plants. For
an example applied to the $3 billion national capital expenditure budget of a major
U.S. corporation, see Geoffrion (1994).
This progress in sectoral modeling is based on the solid foundation of earlier work
in activity analysis, linear programming and the solution of models with economies
of scale which permits the development and solution of realistic models with many
commodities, processes, productive units, plants and markets. The progress is being
accelerated by the availability of modeling languages which permit the specification
and solution of substantially disaggregated models even on personal computers and
by the development of graphical interfaces for sectoral models which reduce the
complexity associated with large models. As the speed of microprocessors continues
to increase, there is strong reason to believe these trends will continue, allowing us
to develop substantially better ways to understand sectors in economies. This is turn
should provide us with mesoeconomics as a strong bridge between microeconomics
and macroeconomics


Adib, P.M. (1985) 'An investment planning model of the world petrochemical industry', PhD dissertation,
University of Texas, Austin, TX.
Anderson, D. (1972) 'Models for determining least-cost investments in electricity supply', The Bell Journal
()f Economics and Management Science, 3(1):267-299.
Anderson, A. and Turvey, R. (1977) Electric economics: Essays and case studies. Baltimore, MD: Johns
Hopkins Univ. Press.
Ch. 6: Sectoral Economics 329

Ang, B.K. (1992) 'Modeling the U.S. personal computer industry using GAMS', Masters Thesis, University
of Texas, Austin, TX.
Baker, T.E. (1990) 'A hierarchical/relational approach to modeling', Computer Science in Economics and
Management, 3(1):63-80.
Baughman, M.L., Siddiqi, S.N. and Zarnikau, J. (1993) 'Comprehensive electrical system planning',
Progress Report #3, Project RP3581-03, University of Texas, Austin, TX.
Beltramo, M.A., Manne, A.S. and Weyant, J.E (1986) 'A North American gas trade model (GTM)', Energy
Journal, 7:15-32.
Bisschop, J. and Entriken, R. (1993) 'AIMMS: The modeling system', Paragon Decision Technology, RO.
Box 3277, 2001 DG Haarlem.
Brooke, A., Kendrick, D.A. and Meeraus, A. (1992) GAMS, A userg" guide, release 2.25. Danvers, MA:
Boyd & Fraser Publishing Co.
Brown, M., Dmnmert, A., Meeraus, A. and Stoutjesdijk, A. (1983) "World investment analysis: The case
of aluminum', World Bank Staff Working Papers, Number 603, The World Bank, Washington, DC.
Charnes, A., Cooper, W.W. and Mellon, B. (1954) 'A model for programming and sensitivity analysis in
an integrated oil company', Econometrica, 22(2):193-217.
Charnes, A. and Cooper, W.W. (1956) 'Nonlinear power of adjacent extreme point methods in linear
programming', Econometrica, pp. 132-153, which is republished in: Chames and Cooper (1961) Man-
agement models and industrial applications of linear programming. New York: Wiley, Chapter X.
Chenery, tt.B. and Westphal, L.E. (1979) 'Economies of scale and investment over time', in: H.B. Chenery,
et at., eds, Structural clumge and development policy. New York: Oxford Univ. Press.
Choksi, A.M., Meeraus, A. and Stoutjesdijk, A.J. (1980) The planning qf investment programs in the
.fertilizer industry. Baltimore, MD: Johns Hopkins Univ. Press.
Cremer, J. and Westphal, L.E. (1984) '"The interdependence of investment decisions" revisited', in:
M. Syrquin, L. Taylor and L.E. Westphal, eds, Economic structure and perJbrmance: Essays in honor qf
Hollis B. Chenery. New York: Academic Press.
Dammert, A. and Palaniappan, S. (1985) Modeling investment in the worm copper secto~ Austin, TX:
Univ. of Texas Press.
Dantzig, G.B. (1951) 'Application of the simplex method to a transportation problem', in: T. Koopmans,
ed., Activity analysis of production and allocation. New York: Wiley, Chapter XXIIt, pp. 359-373.
Dantzig, G.B. (1957) 'Discrete variable extremum problems', Operations Research, pp. 266-277.
Dorfman, R., Samuelson, P. and Solow, R. (1958) Linear programming and economic analysis. New York:
Drud, A. (1994) 'CONOPT - A large-scale GRG code', ORSA Journal on Computing, forthcoming.
Duloy, J.H. and Norton, R.G. (1973) 'CHAC, a programming model of Mexican agriculture', in:
L.M. Goreux aad A.S. Manne, eds, Multi-level planning: Case studies in Mexico. Amsterdam: North-
Duraiappah, A.K. (1993) Global warming and economic development. Dordrecht: Kluwer Academic Pub-
Fernandez, G. and Manne, A.S. (1973) 'Energeticos, a process analysis of the energy sector', in:
L.M. Goreux and A.S. Manne, eds, Multi-level planning: Case studies in Mexico. Amsterdan~: North-
Holland, Chapter II1.2.
Fernandez, G., Manne, A.S. and Valencia, J.A. (1973) 'Multi-level planning of electric power projects', in:
L.M. Goreux and A.S. Manne, eds, Multi-level planning: Case studies in Mexico. Amsterdam: Nol'th-
Holland, Chapter III.1.
Fourer, R., Gay, D.M. and Kernighan, B.W. (1990) 'A modeling language for nmthematical progrzunming',
Management Science, 36:519-554.
Fourer, R., Gay, I),M. and Kemighan, B.W. (1992) AMPL: A modeling language for mathematical pro-
gramming. Danvers, MA: Boyd & Fraser Publishing Co.
Fourer, R. and Gay, D.M. (1994) 'Expressing special structures in an algebraic modeling language for
mathematical programming', ORSA Journal of Computing, forthcoming.
330 D.A. Kendrick

Freidenfelds, J. (1981) Capacity expansion. Amsterdam: Elsevier.

Gately, D.I. (1971) 'Investment planning for the electric power industry: A mixed-integer programming
approach, with applications to southern India', PhD dissertation, Princeton, NJ: Princeton Univ. Press.
Geoffrion, A.M. (1987) 'An introduction to structured modeling', Management Science, 33(5):547-588.
Geoffrion, A.M. (1992a) 'The SML language for structured modeling: Levels 1 and 2', Operations Re-
search, 40:38-57.
Geoffrion, A.M. (1992b) 'The SML language for structured modeling: Levels 3 and 4', Operations Re-
search, 40:58-75.
Geoffrion, A.M. (1994) 'Capital portfolio optimization: A managerial overview', Working paper, Graduate
School of Management, University of California, Los Angeles, CA.
Gordon, R.B., Koopmans, T.C., Nordhaus, W.D. and Skinner, B.J. (1987) Toward a new iron age? Quan-
titative modeling of resource exhaustion. Cambridge, MA: Harvard Univ. Press.
Goreux, L.M. and Manne, A.S., eds, (1973) Multi-level planning: Case studies in Mexico. Amsterdam:
Hazell, P.B.R. and Norton, R.D. (1986) Mathematical programming analysis in agriculture. New York:
Jones, C.V. (1990) 'An introduction to graph-based modeling systems: Part I: Overview', ORSA Journal
on Computing, 2(2):136-151.
Jones, C.V. (1991) 'An introduction to graph-based modeling systems: Part II: Graph grammars and the
implementation', ORSA Journal on Computing, 3(3):180-206.
Kang, K.-H. and Kendrick, D.A. (1985) 'The trade off between economies of scale and reliability in the
electric power industry', Journal t)f"Economic Development, 10(1):47-61.
Kendrick, D.A. (1967) Programming investment in the process industries. Cambridge, MA: MIT Press.
Kendrick, D.A. (1984) 'Style in multisectoral modeling', in: A.J. Hughes Hallett, ed., Applied decision
analysis and economic behavior. Dordrecht: Martinus Nijhoff, Chapter 15.
Kendrick, D.A. (1990) Models for analyzing comparative advantage. Dordrecht: Kluwer Academic Pub-
Kendrick, D.A. (1991) 'A graphical interface for production and transportation system modeling: PTS',
Computer Science in Economics and Management, 4(4):229-236.
Kendrick, D.A., Meeraus, A. and Alatorre, J. (1984) The planning of investment programs in the steel
industry. Baltimore, MD: Johns Hopkins Univ. Press.
Kendrick, D.A. and Stoutjesdijk, A.J. (1978), The planning of investment programs: A methodology. Bal-
timore, MD: Johns Hopkins Univ. Press.
Kennedy, M. (1974) 'An economic model of the world oil market', The Bell Journal ~)f Economics and
Management Science, 5(2):540-577.
Kim, H.K., Goreux, L.M. and Kendrick, D.A. (1975) 'Feedback control rule for cocoa market stabilization',
in: W.C. Labys, ed., Quantitative models of commodity markets. Cambridge, MA: Ballinger, Chapter 9,
pp. 233-264.
Kohn, R.E. (1978) A linear programming model.fi)r air pollution control. Cambridge, MA: MIT Press.
Koopmans, T. (1951) 'Analysis of production as an efficient combination of activities', in: T. Koopmans,
ed., Activity analysis t~fproduction and allocation. New York: Wiley, Chapter ti.
Koopmans, T. and Beckmann, M. (1957) 'Assignment problems and the location of economic activities',
Econometrica, pp. 53-76.
Koopmans, T. and Reiter, S. (1951) 'A model of transportation', in: T. Koopmans, ed., Activity analysis of
production and allocation. New York: Wiley, Chapter XIV.
Krishnan, R. (1988) 'PM: A logic-based language for production, distribution and inventory planning', Pro-
ceedings ~)['the Hawaii international cot~ference on the system sciences. See also Krishnan, R., Kendrick,
D. and Lee, R.M. (1988), 'A knowledge-based system for production and distribution economics', Com-
puter Science in Economics and Management, 1(1):53-72.
Krishnan, R. (1990) 'A logic modeling language for model construction', Decision Support Systems, 6:123--
Ch. 6: Sectoral Economics 331

I~ishnan, R. (1991) 'PDM: A knowledge-based tool for model construction', Decision Support Systems,
Kutcher, G.R, Meeraus, A. and O'Mara, G.T. (1986) 'Agricultural modeling for policy analysis', mimeo,
The World Bank, Washington, DC.
Kwun, Y. (1986) 'Joint optimal supply planning of industrial cogeneration and conventional electricity
systems', Economic research division, Public Utility Commission of Texas, Austin, TX.
Labys, W.C., ed. (1975) Quantitative models of commodity markets. Cambridge, MA: Ballinger.
Labys, W.C. (1987) Primary commodity markets and models: An international bibliography. Gower Press,
Aldershot, Hants.
Labys, W.C., Takayama, T. and Uri, N.D. (1989) Quantitative methods for market-oriented economic
analysis over space and time. Gower Press, Aldershot, Hants.
Langston, V.C. (1983) 'An investment model for the U.S. gulf coast refining/petrochemical complex',
University of Texas, Austin, TX.
Letson, D. (1992a) 'Simulation of a two-pollutant, two-season pollution offset system for the Colorado
river of Texas below Austin', Water Resources Research, 28(5): 1311-1318.
Letson, D. (1992b) 'Investment decisions and transferable discharge permits: An empirical study of water
quality management under policy uncertainty', Environmental and Resource Economics, 2:441-458.
Linden, G. (1992) 'An integrated approach to energy investment: A project level model for Colombia',
PhD dissertation, University of Texas, Austin, TX.
Lofgren, H.L. (1990) 'A quadratic programming study of Egyptian agriculture', PhD dissertation, University
of Texas, Austin, TX.
Lofgren, H.L. (1993) 'Liberalizing Egypt's agriculture: A quadratic programming analysis', Journal of
A~Ftican Economies, 2(2):238-261.
Manne, A.S. (1956) Scheduling (~:'petroleum refinery operations. Cambridge, MA: Harvard Univ. Press.
Manne, A.S. (1958) 'A linear programming model of the U.S. petroleum refining industry', Econometrica,
Manne, A.S., ed. (1967) Investments for capacity expansion. Cambridge, MA: MIT Press.
Manne, A.S. (1974) 'Waiting for the breeder', in Review of economic studies, Symposium issue on the
economics of natural resources.
Manne, A.S. (1976) 'ETA: A model for energy technology assessment', Bell Journal ~/' Economics,
Manne, A.S. (1979) 'ETA-MACRO: A model for energy-economy interactions', in: R. Pindyck, ed., Ad-
vances in the economics :~:"energy and resources, Vol. 2. Greenwich, CT: JAI Press.
Manne, A.S. and Richels, R.G. (1991) 'Buying greenhouse insurance', Energy Policy, (July/August):543-
Mannc, A.S. and Richels, R.G. (1992) Buying greenhouse insurance: The economic cost of C02 emission
limits. Cambridge, MA: MIT Press.
Manne, A.S., Richels, R.G. and Weyant, J.E (1979) 'Energy policy modeling: A survey', Operations
Research, 27(1): 1-36.
Manne, A.S. and Markowitz, H.M. (1963) Studies in process analysis. New York: Wiley.
Manne, A.S. and Vietorisz, T. (1963) 'Chemical processes, plant location, and economies of scale', in:
A. Manne and H. Markowitz, eds, Studies in process analysis. New York and London: Wiley.
Markowitz, H.M. and Manne, A.S. (1957) 'On the solution of discrete programming problems', Econo-
metrica, 25:84~110.
Masse, R and Gibrat, R. (1957) 'Application of linear programming to investments in the electric power
industry', Management Science, April.
McCabe, K.A., Rassenti, S.J. and Smith, V.L. (1991) 'Smart computer-assisted markets', Science, 254:534--
McCarl, B.A. and Spreen, T.H. (1980) 'Price endogenous mathematical programming as a tool for sector
analysis', American Journal (~/'Agricultural Economics, 62(1):87-102.
332 D.A. Kendrick

Meeraus, A. (1983) 'An algebraic approach to modeling', Journal of Economic Dynamics and Control,
Mennes, L. and Stoutjesdijk, A. (1985) Multicountry investment analysis. Baltimore, MD: Johns Hopkins
Univ. Press.
Murtagh, B.A. and Saunders, M.A. (1987) 'MINOS 5.1 user's guide', Report SOL 83-20R, December
1983, revised January 1987, Stanford University.
Nordhaus, W.D. (1973) 'The allocation of energy resources', Brookings Papers on Economic Activity,
Nordhaus, W.D. (1979) The efficient use of energy resources, Cowles foundation monograph 26. New
Haven, CT: Yale Univ. Press.
Nordhaus, W.D. (1992) 'An optimal transition path for controlling greenhouse gases', Science,
Norton, R.D. and Solis, L., eds (1983) The book of CHAC: Programming studies for Mexican agriculture.
Baltimore, MD: Johns Hopkins Univ. Press.
Pindyck, R.S. (1978) 'Gains to producers from the cartelization of exhaustible resources', Review of Eco-
nomics and Statistics, 60:238-251.
Rowse, J. (1986a) 'Measuring the user costs of exhaustible resource consumption', Resources and Energy,
Rowse, J. (1986b) 'Allocation of Canadian natural gas to domestic and export markets', Canadian Journal
qf Economics, 19(3):418-442.
Rowse, J. (1992) 'Whither long-tenn Canada-U.S. natural gas trade? A view from the (modeling) trenches',
Socio-Economic Planning Sci., 26(1):43-55.
Rowse, J. and Copithorue, L.W. (1982) 'Natural resource programming models and scarcity rents', Re-
sources and Energy, 4(1):59-85.
Strunk, W., Jr. and White, E.B. (1959) The elements of style. New York: Macmillan.
Sub, J.S. (1981) An investment planning model for the oil refining and petrochemical industries in Korea.
University of Texas, Austin, TX.
Thompson, G.L. and Thore, S. (1992) Computational economics: Economic modeling with optimization
software. Danvers, MA: The Scientific Press, Boyd & Fraser Publishing Co.
Thore, S. (1991), Economic logistics. New York: Quorum Books.
Touma, W.R. (1993) The dynamics of the computer industry. Boston: Kluwer Academic Publishers.
Waverman, L. (1973), Natural gas and national policy. Toronto: Univ. of Toronto Press.
Wei, Gwei-nyu D. (1984) 'A linear process model for the steel industries in the United States, the European
Community, and Japan', Master Thesis, Univ. of Texas.
Westphal, L.E. (1971 a) Planning investments with economies of scale. Amsterdam: North-Holland.
Westphal, L.E. (197 l b) 'An intertemporal planning model featuring economies of scale', in: H.B. Chenery,
ed., Studies in development planning. Cambridge, MA: Harvard Univ. Press.
Westphal, L.E. (1975) 'Planning with economies of scale', in: C.R. Blitzer, P.B. Clark and L. Taylor, eds,
Economy-wide models and development planning. London: Oxford Univ. Press.
Chapter 7

University ~f Massachusetts

1. Introduction 336
2. Technology for parallel computation 341
2.1. Parallel architectures 341
2.2. Parallel programming languages and compilers 348
2.3. Computer science issues in parallel algorithm development 350
3. Fundamental problem classes and numerical methods 354
3.1. Problem classes 355
3.2. Algorithms 361
4. Applications and numerical results 375
4.1. Nonlinear equations 375
4.2. Optimization problems 376
4.3. Parallel computation of variational inequality problems 384
4.4. Parallel computation of dynamical systems 393
Acknowledgements 400
References 401

* The author would like to thank the computer scientists, Marilynn Livingston and D.R. Mani, for many
helpful discussions in the course of preparing this work and Kathy Dhanda for assistance with the literature
searches. The author would also like to thank Manfred Gilli and an anonymous referee for ruany helpful
suggestions and comments.

Handbook t~f Computational Economics, Volume L Edited by H.M. Amman, D.A. Kendrick and ,L Rust
(~) 1996 Elsevier Science B.V. All rights reserved.
336 A. Nagurney

1. Introduction

The emergence of computation as a basic scientific methodology in economics has

given access to solutions of fundamental problems that pure analysis, observation,
or experimentation could not have achieved. Virtually every area of economics from
economeU'ics to microeconomics and macroeconomics has been influenced by ad-
vances in computational methodologies. Indeed, one cannot envision the solution of
large-scale estimation problems and general economic equilibrium problems without
the essential tool of computing. The ascent of economics to a computational science
discipline has been fairly recent, preceded by the earlier arrivals of physics, chem-
istry, engineering, and biology. Economics, as the other computational disciplines,
stands on the foundations established by computer science, numerical analysis, and
mathematical programming; unlike the aforementioned computational disciplines, it is
grounded in human behavior. As to be expected, it brings along unique computational
challenges, which have stimulated research into numerical methods.
For example, many algorithms for the solution of optimization problems, equi-
librium problems, including game theory problems, and dynamical systems, among
other problem-types, can trace the requirement of the solution of a prototypical eco-
nomic problem as the motivation for their discovery and subsequent development.
Indeed, one has only to recall the early contributions to computational economics
of Koopmans (1951), Dantzig (1951), Arrow, Hurwicz and Uzawa (1958), Dorfman,
Samuelson and Solow (1958) and Kantorovich (1965) in the form of the formulation
and solution of linear programming problems derived from resource allocation and
pricing problems.
Subsequently, the need to formulate and solve portfolio optimization problems in
financial economics, where the objective function is no longer linear and represents
risk to be minimized, helped to stimulate the area of quadratic programming [cf.
Markowitz (1952, 1959) and Sharpe (1970)]. Quadratic programming is also used
for the computation of certain spatial price equilibrium problems in agricultural and
energy economics [cf. Takayama and Judge (1964, 1971)]. The special structure of
these problems, as those encountered in portfolio optimization, has further given rise
to the development of special-purpose algorithms for their computation [cf. Dafermos
and Nagurney (1989)]. Quadratic programming has, in addition, provided a pow-
erful tool in econometrics in the case of certain problems such as the estimation
of input/output tables, social accounting matrices, and financial flow of funds ta-
bles [cf. Bacharach (1970), Nagurney and Eydeland (1992), Hughes and Nagurney
(1992), and the references therein]. In such problems, the different values of the
quadratic form depict different distributions. Econometrics has also made use of the
techniques of global optimization, which is appealed to in the case that the function
to be minimized is no longer convex [cf. Pardalos and Rosen (1987), Goffe, Ferrier
and Rogers (1994)].
Nonlinear programming, which contains quadratic programming as a special case,
has found wide application in the neoclassical theory of firms and households in
Ch. 7: Parallel Computation 337

microeconomics. In such problems the firms or households seek to maximize a cer-

tain objective function subject to constraints [see, e.g., Intriligator (1971), Takayama
(1974) and Dixon, Bowles and Kendrick (1980)].
Recently, there has been an increasing emphasis on the development and application
of methodologies for problems which optimization approaches alone cannot address.
For example, fixed point algorithms pioneered by Scarf (1964, 1973), complementarity
algorithms [cf. Lemke (1980)], homotopy methods, along with the global Newton
method and its many variants [cf. Smale (1976), Garcia and Zangwill (1981), and the
references therein], and variational inequality algorithms [cf. Dafermos (1983) and
Nagurney (1993)] have yielded solutions to a variety of equilibrium problems in which
many agents compete and each one seeks to solve his/her own optimization problem.
Such classical examples as imperfectly competitive oligopolistic market equilibrium
problems, in which firms are involved in the production of a homogeneous commodity,
and seek to determine their profit-maximizing production patterns, until no firm can
improve upon its profits by unilateral action, and Walrasian price equilibrium problems
in which agents, given their initial endowments of goods, seek to maximize their
utilities, by buying and selling goods, thereby yielding the equilibrium prices and
quantities, fall into this framework.
The drive to extend economic models to dynamic dimensions, as well as the desire
to better understand the underlying behavior that may lead to an equilibrium state,
led to the introduction of classical dynamical systems methodology [cf. Coddington
and Levinson (1955), Hartman (1964), Varian (1981)]. Recently, it has been shown in
Dupuis and Nagurney (1993) that the set of stationary solutions of a particular dynam-
ical system corresponds to the set of solutions of a variational inequality problem. The
dynamical system has its own application-specific constraints, and is non-classical in
that its right-hand side is discontinuous. Such a dynamical system, for example, could
guarantee that prices are always nonnegative as well as the commodity shipments.
Moreover, this dynamical system can be solved by a general iterative scheme. This
scheme contains such classical methods as the Euler method, the Heun method, and
the Runge-Kutta method as special cases.
Finally, the need to incorporate and evaluate the influence of alternative policies
in economic models in the form of dynamical systems has yielded innovations in
stochastic control and dynamic programming [cf. Kendrick (1973), Holbrook (1974),
Chow (1975), Norman (1976), Judd (1991)1. In contrast to the focus in dynamical
systems, where one is interested in tracing the trajectory to a steady state solution, in
control theory, the focus is on the path from the present state to an improved state.
Moreover, unlike the aforementioned mathematical programming problems, where the
feasible set is a subset of a finite-dimensional space, the feasible set is now infinite-
dimensional [see also Intriligator (1971)].
Computational methodologies, hence, have greatly expanded the scope and com-
plexity of economic models that can now be not only formulated, but analyzed and
solved. At the same time, the increasing availability of data is pushing forward the
338 A. Nagurney

demand for faster computer processors and for greater computer storage, as well
as for even more general models with accompanying new algorithmic approaches.
Faster computers also enable the timely evaluation of alternative policy interventions,
thereby, decreasing the time-scales for analysis, and minimizing the potential costs of
implementation. It is expected that algorithmic innovations in conjunction with com-
puter hardware innovations will further push the frontiers of computational economics.
To date, the emphasis in economics has been on serial computation. In this com-
putational framework an algorithm or simulation methodology is implemented on a
serial computer and all operations are performed in a definite, well-defined order.
The emphasis on serial computation is due to several factors. First and foremost,
serial computers have been available much longer than, for example, parallel comput-
ers, and, hence, users have not only a greater familiarity with them, but, also, more
software has been developed for such architectures. Secondly, many computer lan-
guages are serial in nature and, consequently, the use of such programming languages
subsumes basically serial algorithms and their subsequent implementation on serial
architectures. Moreover, humans naturally think sequentially although the brain also
processes information in parallel. In addition, the use of parallel architectures requires
learning not only, perhaps, other programming languages, but also new computer ar-
chitectures. Finally, the development of entirely new algorithms, which exploit the
features of the architectures may be required.
Computation, however, is evolving along with the technological advances in hard-
ware and algorithms, with an increasing focus on the size of the computational prob-
lems that can be handled within acceptable time and cost constraints. In particular,
parallel computation represents an approach to computation that can improve system
performance dramatically, as measured by the size of problem that can be handled.
In contrast to serial computation, with parallel computation, many operations are per-
formed simultaneously. Parallel processors may be relatively sophisticated and few in
number or relatively simple and in the thousands.
Parallel computation achieves its faster perforlnance through the decomposition
of a problem into smaller subproblems, each of which is allocated and solved si-
multaneously on a distinct processor. For example, in the case of a multinational or
multiregional trade problem, the level of decomposition may be "coarse", with the
decomposition being on the level of the number of nations or regions, or "fine", as on
the level of the commodity trade patterns themselves. In fact, for a given problem there
may be several alternative decompositions that may he possible, say on the level of the
number of commodities, on the number of markets, or on the number of market pairs.
The technology represented by massively parallel computation, in particular, sug-
gests that there is no obvious upper limit on the computational power of machines
that can be built. Two basic concepts emerge here - that of "data parallelism" and
that of "scalability". Data parallelism is a technique of coordinating parallel activi-
ties, with similar operations being performed on many elements at the same time, and
exploits parallelism in proportion to the amount of data [cf. Hillis (1992)]. Data level
Ch. 7: Parallel Computation 339

parallelism is to be contrasted with one o f the simplest and earliest techniques of co-
ordinating parallel activities, known as pipelining, which is analogous to an assembly
line operation, where operations are scheduled sequentially and balanced so as to take
about the same amount of time. Vector processors work according to this principle.
Another technique of parallelism that can be used along with pipelining is known
as functional parallelism. An example of this in a computer would occur when sep-
arate multiplication and addition units would operate simultaneously. Although both
of these techniques are useful, they are limited in the degree of parallelism that they
can achieve since they are not "scalable".
Scalability envisions building massively parallel computers out of the same com-
ponents used in, for example, desktop computers. Consequently, the user would be
able to scale up the number of processors as demanded by the particular application
without a change in the software environment.
Parallel computation represents the wave of the future. It is now considered to be
cheaper and faster than serial computing and the only approach to faster computation
currently foreseeable [cf. Deng, Glimm and Sharp (1992)].
Parallel computation is appealing, hence, for the economies of scale that are pos-
sible, for the potentially faster solution of large-scale problems, and also for the
possibilities that it presents for imitating adjustment or tatonnement processes. For
example, serial computation is naturally associated with centralization in an orga-
nization whereas parallel (and the allied distributed) computation is associated with
decentralization, in which units work, for the most part independently, with their ac-
tivities being monitored and, perhaps, synchronized, periodically. Market structures
associated with central planning are thus more in congruence with serial and cen-
tralized computation, with competitive market structures functioning more as parallel
systems. Indeed, one can envision the simulation of an economy on a massively par-
allel architecture with each competing agent functioning as a distinct processor in the
architecture, with price exchanges serving as messages or signals to the other agents.
The agents would then adjust their behavior accordingly.
To fix some ideas, we now mention several basic issues of parallel architectures,
which are discussed further in Section 2. The principal issues are: 1) the type of
processing involved, typically the combination of instruction and data parallelism,
2) the type of memory, either global (or shared) or local (or distributed), 3) the type of
interconnection network used, and 4) the processing power itself. The interconnection
network, for example, is not an issue in a serial architecture. It consists of links joining
pairs of processors and provides routes by which processors can exchange messages
with other processors, or make requests to read from, write to, or lock memory
locations. In designing interconnection networks for parallel processing systems, every
effort is made to minimize the potential for bottlenecks due to congestion on the
network. The basic distinction that can be made between processing power is whether
or not the processors are simple or complex. The latter are more common in "coarse-
grained" architectures, typically consisting of several processors, say on the order
340 A. Nagurney

of ten, whereas the former - in "fine-grained" architectures, typically consisting of

thousands of processors.
The design of a parallel algorithm, hence, may be intimately related to the particular
architecture on which it is expected to be used. For example, the parallelization of an
existing serial algorithm, a typical, first-cut approach, may fail to exploit any parallel
features and realize only marginal (if any) speedups. On the other hand, certain "serial"
algorithms, such as simulation methodologies may be, in fact, embarrassingly parallel
in that the scenarios themselves can be studied almost entirely independently (and on
different processors) with the results summarized at the completion of the simulation.
These two situations represent extremes with the most likely situation being that a
new and easier to parallelize algorithm will be developed for a particular problem
The last decade has revealed that parallel computation is now practical. The ques-
tions that remain to be answered are still many. What are the best architectures for
given classes of problems? How can a problem be decomposed into several, or into
numerous subproblems for solution on, respectively, coarse-grained or fine-grained
architectures? How do we design an algorithm so that the load across processors is
balanced, that is, all processors are kept sufficiently busy and not idle? What new
directions of research in numerical methods will be unveiled by the increasing avail-
ability of highly parallel architectures? What new applications to economics remain
to be discovered?
Parallel computation represents not only a new mode of computation, but a new
intellectual paradigm. It requires one to view problems from a new perspective and
to examine problems as comprised of smaller, interacting components, and at differ-
ent levels of disaggregation. It breaks down barriers between disciplines through a
common language. With expected dramatic shifts from serial to parallel computation
in the future, a significant change in scientific culture is also envisaged.
In this chapter the focus will be on parallel computation, numerical algorithms, and
applications to economics. The presentation will be on a high level of abstraction and
theoretically rigorous. Although parallelprocessing plays a role in symbolic processing
and artificial intelligence, such topics are beyond the scope of this presentation. We
also do not address distributed processing, where the processors may be located a
greater distance from one another, execute disparate tasks, and are characterized by
communication that is not, typically, as reliable and as predictable as between parallel
The chapter is organized as follows. In Section 2 we overview the technology of
parallel computation in terms of hardware and programming languages.
In Section 3 we present some of the fundamental classes of problems encountered
in economics and the associated numerical methodologies for their solution. In partic-
ular, we overview such basic problems as: nonlinear equations, optimization problems,
and variational inequality and fixed point problems. In addition, we discuss dynamical
Ch. 7: Parallel Computation 341

systems. For each problem class we then discuss state-of-the-art computational tech-
niques, focusing on parallel techniques and contrast them with serial techniques for
illumination and instructive purposes. The techniques as presented are not machine-
dependent, but the underlying parallelism, sometimes obvious, and sometimes not, is
emphasized throughout.
In Section 4, we present applications of the classes of problems and associated
numerical methods to econometrics, microeconomics, macroeconomics, and finance.
Here we discuss the implementations of the parallel algorithms on different architec-
tures and present numerical results.

2. Technology for parallel computation

In this section we further refine some of the ideas presented in the Introduction by
focusing on the technology of parallel computation in terms of both hardware and
software. In particular, we discuss some of the available parallel architectures as
well as features of certain parallel programming languages. We then highlight some
computer science issues of parallel algorithm development which reveal themselves
during the presentation of the technology.
The field of parallel computing is quite new and the terminology has yet to be
completely standardized. Nevertheless, the discussion that follows presents the fun-
damental terminology and concepts that are generally accepted.

2.1. Parallel architectures

Although the speed at which serial computers do work has increased steadily over
the last several decades, with a large part of the speedup process being due to the
miniaturization of the hardware components, there are natural limits to speedup that are
possible due to miniaturization. Specifically, two factors limit the speed at which data
can move around a traditional central processing unit: the speed at which electricity
moves along the conducting material and the length and thickness of the conducting
material itself. Both of these factors represent physical limits and today's fastest
computers are quickly approaching these physical constraints. Furthermore, as the
miniaturization increases, it becomes more difficult to dissipate heat from the devices
and more opportunities for electrical interference manifest themselves.
Other possibilities beyond the silicon technology used in miniaturization exist.
These include optical computing and the development of a quantum transistor. Nev-
ertheless, these technologies are not sufficiently advanced to be put into practical
Due to such limitations of serial computation, a paradigm for parallel computing
emerged. Metaphorically speaking, if an individual or processing unit cannot complete
342 A. Nagurney

the task in the required time, assigning, say, three individuals or units each a part of
the task, will, ideally, result in the completion of the task in a third of the time.
The most prevalent approach to parallelism today uses the von Neumann or control-
driven model of computation, which shall also be the focus here. Other approaches
presently under investigation are systolic, data flow, and neural nets. In addition, many
of the terms that have been traditionally used for architectural classification of parallel
machines can also be used to describe the structure of parallel algorithms and, hence,
serve to suggest the best match of algorithm to machine.

Flynn's taxonomy

One taxonomy of hardware is due to Flynn (1972) and although two decades old, it is
still relevant in many aspects today. His classification of computer architectures is with
respect to instruction stream and data stream, where instruction stream is defined as
a sequence of operations performed by the computer and data stream is the sequence
of items operated on by the instructions. In particular, the taxonomy may be depicted
as follows:

where SI refers to single instruction, MI to multiple instruction, SD to single data,

and MD to multiple data.
In single instruction, all processors are executing the same instruction at any given
time where the instruction may be conditional. If there is more than a single processor,
then this is usually achieved by having a central controller issue instructions. In
multiple instruction, on the other hand, different processors may be simultaneously
executing different instructions. In single data all processors are operating on the same
data items at any given time, whereas in multiple data, different processors may be
operating simultaneously on different data items.
Under this taxonomy, we have that a SISD machine is just the standard serial
computer, but, perhaps, one with multiple processors for fault tolerance. A MISD
machine, on the other hand, is very rare and considered to be impractical. Nevertheless,
one might say that some fault-tolerant schemes that utilize different computers and
programs to operate on the same input data are of this type.
A SIMD machine typically consists of N processors, a control unit, and an inter-
connection network. An example of the SIMD architecture is the Thinking Machines
CM-2 Connection Machine, which will be discussed later in greater detail. A MIMD
machine usually consists of N processors, N memory modules, and an interconnection
network. The multiple instruction stream model permits each of the N processors to
store and execute its own program, in contrast to the single instruction stream model.
An example of a MIMD architecture is the IBM SP1, which also will be discussed
later more fully.
Ch. 7." Parallel Computation 343

Shared versus distributed memory

The second principal issue is that of memory. In a global or shared memory, there is
a global memory space, accessible by all processors. Processors may, however, also
possess some local memory. The processors in a shared memory system are connected
to the shared (common global) memory by a bus or a switch. In local or distributed
(message-passing) memory, all the memory is associated with processors. Hence, in
order to retrieve information from another processor's memory, a message must be
sent there. M I M D machines, for example, are organized with either the memory being
distributed or shared.
Memory and bus contention must be considered in the case of algorithm develop-
ment for a shared memory system, since caution must be taken when two processors
try to simultaneously write to the same memory location. Distributed memory systems,
on the other hand, avoid the memory contention problem, but since access to non-local
data is provided by message passing between processors through the interconnection
network, contention for message passing channels is, thus, of concern.

Interconnection networks

Another common feature of parallel architectures is the interconnection network. Some

basic interconnection networks are: a bus, a switching network, a point to point
network, and a circuit-switched network. In a bus, all processors (and memory) are
connected via a common bus or busses. The memory access in this case is fairly
uniform but an architecture based on such an interconnection network is not very
scalable due to contention. In a switching network, all processors (and memory) are
connected to routing switches as in a telephone system. This approach is usually
scalable. In a point to point network, the processors are directly connected to only
certain processors and must go multiple hops to get to additional processors. In this
type of network, one usually encounters distributed memory and this approach is also
Some examples of a point to point network are: a ring, a mesh, a hypercube, and
binary tree (also a fat tree). The processors' connectivity here is modeled as a graph
in which nodes represent processors and edges the connections between processors.
In a ring network, for example, the N processors are connected in a circular manner
so that processor Pi is directly connected to processors Pi-1 and Pi+l. In a mesh
network, for example, the N processors of a two-dimensional square mesh are usually
configured so that an interior processor Pi,j is connected to its neighbors - processors
Pi-l,j, Pi+l,j, Pi,j-1, and ~ , j + l . The four corner processors are each connected to
their two remaining neighbors, while the other processors that are located on the edge
of the mesh are each connected to three neighbors. A hypercube with N processors,
on the other hand, where N must be an integral power of 2, has the processors indexed
by the integers{0, 1 , 2 , . . . , N - 1}. Considering each integer in the index range as a
344 A. Nagurney

(log 2 N)-bit string, two processors are directly connected only if their indices differ
by exactly one bit.
In a circuit-switched network, a circuit is sometimes established from the sender
to the receiver with messages not having to travel a single hop at a time. This can
result in significantly lower communication overhead, but at high levels of message
traffic, the performance may be seriously degraded.
Desirable features of the interconnection network are the following: 1) any proces-
sor should be able to communicate with every other processor, 2) the networks should
be capable of handling requests from all processors simultaneously With minimal de-
lays due to contention, 3) the distance that data or messages travel should be of lower
order than the number of processors, and 4) the number of wires and mesh points in
the network should be of lower order than the square of the number of processors.
As discussed in Denning and Tichy (1990), many interconnection networks satisfy
these properties, in particular, the hypercube. Although some computers utilize inter-
connection networks that do not satisfy these characteristics, they may, nevertheless,
be cost-effective because the number of processors is small. Examples of computers
that violate the fourth property above are the Cray X-MP and the Cray Y-MR which
utilize a crossbar switch as the interconnection network. These contain N 2 switch
points and become unwieldy as the number of processors grows. The Sequent Sym-
metry and the Encore Multimax, for example, make use of a shared bus, which may
result in violation of the second property above as the number of processors increases
due to congestion on the bus.

Another term that appears quite frequently in discussions of parallel architectures is
granularity. Granularity refers to the relative number and the complexity of the pro-
cessors in the particular architecture. A fine-grained machine usually consists of a
relatively large number of small and simple processors, while a coarse-grained ma-
chine usually consists of a few large and powerful processors. Speaking of early
1990s technology, fine-grained machines have on the order of 10,000 simple proces-
sors, whereas coarse-grained machines have on the order of 10 powerful processors
[cf. Ralston and Reilly (1993)]. Medium-grained machines have typically on the or-
der of 100 processors and may be viewed as a compromise in performance and size
between the fine-grained and coarse-grained machines.
Fine-grained machines are usually SIMD architectures, whereas coarse-grained ma-
chines are usually shared memory, MIMD architectures. Medium-grained machines
are usually distributed memory, MIMD architectures. According to Ralston and Reilly
(1993), by the mid 1990s, due to technological advances, one can expect that fine-
grained machines will have on the order of a million of processors, coarse-grained
machines will have on the order of one hundred processors, and medium-grained
machines will have on the order of ten thousand processors.
Ch. 7: Parallel Comlmtation 345

Examples of parallel architectures

We now provide examples, selected from commercially available machines that illus-
trate some of the above concepts. The examples include fine-, coarse-, and medium-
grained machines and SIMD and MIMD machines, along with a variety of inter-
connection networks; We first list some parallel computers and subsequently discuss
several of them more fully. Examples of distributed memory, SIMD computers are:
the Thinking Machines CM-1 and CM-2, the MasPar MP1, and the Goodyear MPR
Examples of shared memory, MIMD computers are: the BBN Butterfly, the Encore
Multimax, the Sequent Balance/Symmetry, the Cray X/MR Y/MR and C-90, the IBM
ES/9000, and the Kendall Square Research KSR1. Examples of distributed memory,
MIMD computers are: the Cray T3D, the Intel iPSC series, the IBM SP1, the Intel
Paragon, the NCUBE series, and the Thinking Machines CM-5.
It is also worth mentioning the INMOS transputer, with its name an amalgam of
transistor and computer. It is a microprocessor, or family of microprocessors, which
has been specially designed for the building of parallel machines. It consists of a RISC
(Reduced Instruction Set Computer) processor, and its own high level programming
language. It is well-suited to constructing MIMD architectures and can be used as
either a single processor or as a network of processors. The RISC processor, since
it maintains a minimum set of instructions, has room on its chip for other functions,
and its design makes it easier to coordinate the parallel activity among separate units.
Those interested in additional background material on parallel architectures are re-
ferred to the books by Hockney and Jesshope (1981), DeCegama (1989) and Hennessy
and Patterson (1990). For the historical role of parallel computing in supercomputing,
see Kaufmann and Smart (1993). For a comprehensive overview of parallel architec-
tures and algorithms, see Leighton (1992). For an overview of parallel processing in
general, see Ralston and Reilty (1993).
For illustrative purposes, we now discuss in greater detail several distinct paralM
architectures that highlight the major issues discussed above. Some of these architec-
tures are then used for the numerical computations presented in Section 4.

The CM-2

We begin with the fine-grained, massively parallel CM-2, manufactured by the Think-
ing Machines Corporation. The CM-2 is an example of a distributed memory, SIMD
architecture with 2 ~6, that is, 65,536 processors in its full configuration, each with 8KB
(kilobytes, that is, 2 l° or 1,024 bytes) of local memory, and 2,048 Weitek floating
point units. Other common configurations are CM-2's with 8K (8,192) or 32K (32,768)
Each processor performs very simple, 1 bit, operations. Each processor can operate
only on data that is in its own memory but the processors are interconnected so that
data can be transferred between processors. Sixteen bit serial processors reside on a
346 A. Nagut~ey

chip, and every disjoint pair of chips shares a floating point unit. The fiont-end system
(often a SUN workstation or a VAX) controls the execution on a CM-2. Programs
are developed, stored, compiled, and loaded on the front-end. The instructions issued
by the front-end are sent to the sequencers which break down the instructions into
low-level operations that are broadcast to the processors. Program steps that do not
involve execution on parallel variables are executed on the front-end. Communica-
tion between processors on a chip is accomplished through a local interconnection
network. Communication between the 4,096 chips is by a 12-dimensional hypercube
topology. The CM-2 has a peak performance of 32 GFLOPS (gigaflops or billions of
floating point operations per second). See Thinking Machines Corporation (1990) for
additional background on this architecture.

The CRAY X-MP/48

The CRAY series of supercomputers consists of coarse-grained shared memory ma-

chines where the vector processors may operate independently on different jobs or may
be organized to operate together on a single job. A vector processor is a processor con-
taining special hardware to permit a sequence of identical operations to be performed
faster than a sequence of distinct operations on data arranged as a regular array.
The CRAY X-MP/48 system, manufactured by Cray Research, is a coarse-grained
system with four vector processors and a total of 8 million 64-bit words. The peak
performance of the system is 0.8 GFLOPS. Each processor contains 12 functional
units that can operate concurrently. For additional information, see Cray Research,
Inc. (1986).
The C90, also manufactured by Cray Research, contains 16 connected processing
units, each of which is capable of performing a billion calculations a second. It is
a shared memory, MIMD machine. Central memory is 2 GB (gigabytes, that is, 23o
or 1.024 x 109 bytes) and the solid state storage device serves as an extension of
memory to provide an additional 4 GB. See Cray Research, Inc. (1993a) for further
reading on this architecture.
The T3D exists as a 32-processor prototype, and can be expanded to 128 proces-
sors and, ultimately, 512 processors. It is a distributed memory, MIMD machine. Each
of its processors is a DEC Alpha 64-bit microprocessor, with a theoretical peak of
150 MFLOPS. The topology of the T3D is that of a three-dimensional torus. Each pro-
cessors can contain 16MB of memory. See Cray Research, Inc. (1993b) for additional

The KSR1

The Kendall Square Research KSR1 computer is a medium-grained, shared memory

MIMD model whose shared memory is known as "ALLCACHE". The processors are
interconnected in levels of rings, with proprietary processors, up to 32 in 1 level,
Ch. 7: Parallel Computation 347

1066 in 2 levels, each with 512KB "subcache" and 32MB "local cache", it uses
a Unix operating system, run in a distributed manner and includes provisions for
dynamic load balancing and time-sharing nodes among different users. For additional
information, see Kendall Square Research (1992).

The ES/9000

The IBM ES/9000 is a shared memory, MIMD machine, which includes a number
of improvements in design and technology as compared to its predecessor, the IBM
ES/3090. It consists of 6 processors, each a vector unit in its own right, and with
8 GB of storage. Each processor has a separate cache for instructions and data,
thereby allowing for concurrent access of instructions and data. The ES/9000 also
allows for high-speed data-handling.
For additional information on this architecture, see IBM Corporation (1992).

The SP1

The IBM SP1 is a distributed memory, MIMD architecture with "many" off-the-shelf
IBM RS/6000 workstation processors in a single box. Hence, programs that have run
on an IBM RS/6000 workstation can be easily ported to the SP1, since it has the same
compilers available. The scalability of the SP1 lies in its switch technology. In the case
of 64 processors, the grouping of the processors is in 4 racks of 16 processors each,
with each rack containing its own switchboard, which handles all communication over
the switch. A typical processor on the SP1 has 32KB of cache memory, 128MB of
main memory, and 512MB of extended (virtual) memory.
For more information, see IBM Corporation (t993).

The Paragon

The lntel Paragon XP/S is a message passing, MIMD computer that can also support
the SPMD (Single Program Multiple Data) programming style. It is sometimes referred
to as a scalable heterogeneous multicomputer. It is compatible with the iPSC/860
family and has a 2D mesh interconnection network. Its "GP" nodes are involved in
service and I/O and its "MP" nodes have four i860 XP processors operating in a
shared memory implementation. We refer the interested reader to Intel Corporation

The CM-5

The Thinking Machines CM-5 is an example of a MIMD architecture. It is (typically)

medium-grained with distributed memory. It can contain from 16 to 16K processors,
with 16K processors being a theoretical upper bound where communication has to
take place at the speed of light. The processors are interconnected via a "fat tree" data
348 A. Nagurney

network with a regular binary tree for the control network. It consists of processing
nodes that are SPARC processors, each of which has 4 proprietary attached vector
units. Each vector unit controls 8MB of memory. A group of nodes under the control
of a single processor is called a partition and the control manager is called the parti-
tion manager. The nodes can be time-shared among different users [cf. Thinking Ma-
chines Corporation (1992a)]. The programming model associated with this machine is
referred to commonly as SPMD, Single Program Multiple Data, which can be viewed
as an extension of the SIMD approach to a medium grain MIMD architecture.

2.2. Parallel programming languages and compilers

There are two fundamental (and complementary) categories of parallel programming

languages. The first category consists of explicitly parallel languages, that is, languages
with parallel constructs, such as vector operations and parallel do-loops. Hence, in this
category, the parallelism in a program must be specified explicitly by the programmer.
Languages with explicitly parallel constructs can be further classified as being either
low level or high level. The second category consists of languages in which the
potential parallelism is implicit. For languages in this category, a parallelizing compiler
must be available to determine which operations can be executed in parallel.
Most of the parallel languages developed to date have been Fortran extensions, due,
in part, to the large investment in software development for numerical computations on
serial architectures. For example, CM Fortran, developed for the Connection Machine,
is an explicitly parallel programming language, and was influenced by Fortran 90. It is
also referred to as a data level programming language and exhibits the themes common
to such languages as: elementwise parallelism, replication, reduction, permutation, and
conditionals [cf. Steele (1988)].
As an illustration, in elementwise parallelism, when one adds two arrays, one adds
components elementwise. In terms of replication, one is interested in taking an amount
of data and making more of it in, for example, a few to many case, which is an example
of "spreading" in Fortran. On the other hand, one has "reduction" when one takes
many data items and reduces them to a few items. This occurs when one sums over
many values, or takes the "max" or "min" of an array. One encounters a permutation
when one does not change the amount of data but rearranges it in some fashion. CM
Fortran [cf. Thinking Machines Corporation (1992b, 1993a)], for example, uses the
Fortran 90 array features, whereas other data parallel languages usually incorporate
a new data type. Once the datasets are defined in the form of arrays or structures,
a single sequence of instructions causes the concurrent execution of the operations
either on the full datasets or on selected portions.
We now present one of the above constructs in CM Fortran. Others are given in
sample codes in Section 4. Consider the addition of two arrays A and B, each of
dimension 200 x 200. The statement in CM Fortran is then given by: C=A+B. This
Ch. 7: Parallel Computation 349

statement is executed as a single statement and yields the 40,000 elements of the
array C simultaneously. In the case where the number of array elements exceeds the
number of processors, the compiler configures the algorithm for processing on "virtual
processors" and assigns each element its own virtual processor.
In contrast, a serial Fortran 77 version yielding the values of C would consist of
the following statements:

Do 10 i=l,200
Do 20 j=l,200
20 Continue
i0 Continue

The flow of control in a data parallel language is almost identical to that of its serial
counterpart, withoutany code required to guarantee synchronization in the program
as is needed in functional parallelism. In functional parallelism, for example, one may
have to assign particular tasks to specific processors via specific parallel programming
task allocation constructs and then wait for the tasks to be completed, also explicitly
stated in the code, before reassignation. Such a feature is provided in Parallel Fortran
[cf. IBM Corporation (1988)], which is used in the IBM 3090-400 and IBM ES/9000
MIMD architecture series [cf. IBM Corporation (1992)]. In data level parallelism, in
contrast, the compilers and other system software maintain synchronization automati-
cally. Furthermore, since the sequence of events is ahnost identical to those that would
occur in a serial version of the program, program debugging, analysis, and evaluation
is simplified.
There also exist Fortran extensions for programming shared memory, MIMD ar-
chitectures. As an illustration, we provide the Fortran 77 code for the above matrix
addition, embedded with Parallel Fortran constructs for task allocation, for the IBM
3090/600E, which can have up to six processors.
n t a s k = n p r o c s ()
Do 5 i=l,ntask
o r i g i n a t e any task itask(i)
5 Continue
Do i0 i=i,200
irow(i) =i
d i s p a t c h any task next(i), sharing(Al), calling add(irow(i))
i0 Continue
wait for all tasks

This routine allocates the task of summing, term by term, the elements of the rows
of A and t3 to an available processor. The summing is accomplished in the subroutine
add, which shares the common A1 with the main routine, which, in turn, contains the
elements of the three arrays A, B, and C. An IBM Parallel Fortran compiler would be
needed for the compilation of the above code, illustrating the complementary nature
of explicit parallel programming and automatic parallelization.
350 A. Nagurney

The second category of languages, which relies on parallelizing compilers, is, in-
deed, common to the majority of shared memory MIMD machines and to supercom-
putcrs within this class. Languages in which potential parallelism is implicit are such
common programming languages as the already-mentioned Fortran, Pascal, and C.
The parallelizing compilers, for example, would automatically translate sequential
Fortran 77 code into parallel form. Sequential Fortran code could, hence, in princi-
ple, be more easily ported across different parallel platforms with the availability of
such compilers. These compilers have had their greatest success in translating Fortran
do-loops into vector operations for execution on pipelined vector processors. Never-
theless, more research is needed in the area of parallelizing compilers for distributed
memory architectures.
Finally, it is also worth noting the Thinking Machine's CM-5, which incorporates
a mix of parallel techniques. The extended model is referred to as coordinated paral-
lelism. The CM-5 retains the positive features of a SIMD machine, in that it is good
at synchronization and communication, and the positive feature of a MIMD machine,
that of independent branching. In order to program the CM-5 in a MIMD style, one
makes use of the CMMD library [cf. Thinking Machines Corporation (1993b)], which
supports such operations as sending and receiving messages between nodes, and such
global operations as scan, broadcast, and synchronization.
One must also be aware that there are libraries of software routines available for
parallel architectures. For example, the CMSSL (Connection Machine Scientific Soft-
ware Library) contains routines for solving systems of equations, ordinary differential
equations, and linear programming problems [see, e.g., Thinking Machines Corpora-
tion (1992c)]. We will illustrate the use of this library in an application in Section 4. In
addition, there is now a utility known as CMAX, which enables the translation of serial
Fortran 77 code to CM Fortran [see, e.g., Thinking Machines Corporation (1993c)].
The Intel Paragon and the Kendall Square Research KSR1 computers also support
Fortran as well as C. The KSR1, in addition, supports Cobol, since many of its
applications lie in database management. Of course, the Intel Paragon also makes
use of message libraries. Hence, one sees that one no longer must learn different
assembly languages in order to avail oneself of parallel computation. Furthermore, it is
expected that the major parallel architectures will also be supporting High Performance
Fortran, thus making codes more portable across the different architectures [cf. High
Performance Fortran Forum (1992)].

2.3. Computer science issues in parallel algorithm development

Before turning to the presentation of numerical methods for particular problem classes
in Section 3, we briefly highlight issues revealed above which impact parallel algo-
rithm development and which do not arise in serial computation. These issues should
be kept in mind when reading the subsequent sections. The major issues are: decom-
position, task scheduling, load balancing, and synchronization and communication.
Finally, the algorithm developer must have the target architecture in mind.
Ch. 7: Parallel Computation 351


The first and foremost step in the development of any parallel program for the solution
of a problem is to determine the level of decomposition that is possible. One typically
first considers the decomposition of the problem itself from the highest level to the
most refined. Naturally, one should exploit any obvious parallelism. In conjunction
with problem decomposition, one should also consider domain decomposition, that is,
whether or not one can break up the region of definition of the problem into smaller
subregions. In many parallel numerical methods, as we shall see in Sections 3 and 4,
problem and domain decomposition may go hand in hand. In addition, one must
consider the decomposition of the data structures themselves, since, for example, data
on a particular architecture may be stored in local memory, and how one designs the
data structures will influence the architectural level solution of the problem.

Task scheduling

After the problem is decomposed into subtasks, the subtasks must be allocated for
completion by the available processors. This is easy to understand in the manager-
worker parallel paradigm, in which the manager (a processor) partitions a task into
subtasks and assigns them to workers (other processors). If the tasks are homogeneous,
in that they take about the same amount of time to complete on the processors, and
relatively independent, then the scheduling of the tasks can be accomplished by a
deterministic or random assignment, or using some simple heuristic. If this is not the
case, then it may be difficult to schedule the tasks efficiently. Hence, one should aim
to decompose the problem into subproblems of relative difficulty and size. Such a
mechanism will be illustrated in Section 4 in the context of a multicommodity trade

Load balancing

This issue brings us to load balancing, which can be achieved satisfactorily by break-
ing up the problem into similar subproblems. However, if the workload cannot be
predicted well in advance, then one may have to make use of more advanced tech-
niques than static load balancing provides. For example, one may attempt dynamic
load balancing, adjusting the workload update and task reassignment throughout the
computation. This is, however, difficult to accomplish effectively and may require a
great deal of experimentation.

Synchronization and communication

After one has selected one (or more) decomposition strategies for a given problem,
one needs to assess the amount of synchronization and communication that will be
required. It terms of synchronization, one distinguishes between tightly and loosely
352 A. Nagurney

synchronous and asynchronous. In a synchronous strategy one focuses on the elements

of the data domain with the expectation that they are updated accordingly. With an
asynchronous strategy there is no natural synchronization and one cannot determine
what data will be available (and how old it may be) at a particular iteration increment.
For asynchronous strategies it is often very difficult to establish convergence of the
underlying algorithm. Communication requirements are distinguished between static,
deterministic and dynamic, non-deterministic.

Target machine

Finally, the importance of the target machine cannot be overestimated. In other words,
the selection of an algorithm to solve a particular problem should be made with a view
of the properties of the architecture on which the algorithm is to be implemented. For
example, one should keep the following questions in mind. Is the intended architecture
SIMD, MIMD, or a combination? Is the memory distributed or shared and what is
the available size? What is the structure of the interconnection network? Does the
parallel computer have vector capabilities? Does the architecture support any software
libraries, which may contain frequently used routines that are optimized? Are there
message passing utilities available? One must be cognizant of such issues in order
to make the best mapping of an algorithm to a particular architecture for a specific

Some performance measures

We conclude this section with a discussion of some performance measures. A com-

mon measure of the pertbrmance gain from a parallel processor is known as speedup.
Roughly defined, it is the ratio of the time required to complete the job with one pros
cessor to the time required to complete the job with N processors. Perfect speedup,
hence, would be N. The achievement of a perfect speedup, at least in principle, may
be feasible in the following situations: 1) in the case where each part of the problem
is permanently assigned to a processor and each such subproblem is computationally
equivalent, with the processors experiencing no significant delays in exchanging in-
formation and 2) in a machine where each subproblem can be dynamically assigned
to available processors, it may be attained only as long as the number of subproblems
ready for processing is at least N. The best that one can hope to achieve is speedup
that is linear in the number of processors.
More rigorously speaking and a model that is often utilized in measurements and
evaluations of parallel algorithms is the following. Let Tl* be the time required to solve
a particular problem using the best possible serial algorithm on a single processor and
let TN be defined as the amount of time required to solve the problem using a parallel
algorithm implemented on N processors. Then the ratio

sN = T~ (~.l)
Ch. 7." Parallel Computation 353

is known as the speedup of the algorithm, and the ratio

sN TI*
EN- N - (2.2)

as the efficiency of the algorithm. The above measures may also be evaluated as the
functions of the size of the problem n, with TN = T N ( n ) . In the ideal situation, the
speedup S N = N and the efficiency E N = 1.
Since there may be difficulty in determining the best or optimal serial algorithm
and, hence, TI*, this term is sometimes alternatively defined as: the time required by
the best existing serial algorithm, the time required by a benchmark algorithm, or
the time required for the problem using the particular parallel algorithm on a single
processor of the parallel processing system. Note that the last definition yields a
speedup measure which evaluates the parallelizability of the parallel algorithm but
provides no information as to its overall efficiency vis a v i s other existing algorithms.
Reporting of numerical results on parallel architectures must, hence, clearly state
precisely what measurement of Tl* is being used for calculating speedup and efficiency.
Another measure, known as Amdahl's Law [see Amdahl (1967)], attempts to take
into consideration the fact that parts of an algorithm (or code) may be naturally serial
and not parallelizable, and, therefore, when a large number of processors may be
available the parallel parts of the program may be quickly completed, whereas the
serial parts serve as bottlenecks. In particular, the law is expressed as follows

1 1
SN ~ J~ + ( l - f ) ~ -7' for all N, (2.3)

where f denotes the fraction of the total computation that is inherently serial. As f
approaches zero, the speedup approaches the idealized one.
Hence, based on Amdahl's Law, the maximum possible speedup, even with an
unlimited number of processors, i.e., as N ~ ec, would be: 1 / f . Consequently, an
application program that is 10% serial, as was found to be the case (at best) in many
scientific programs in the 1960's, would run no more than ten times faster, even with
an infinite number of processors. This law, however, fails to recognize that often a
problem is scaled with the number of processors, and f as a fraction of size may
be decreasing, that is, the serial part of the code may take a constant amount of the
time, independent of the size of the problem. This argument is sometimes used as a
refutation of Amdahl's Law.
The measure, mentioned already earlier, known as MFLOPS (or GFLOPS) is also
considered a yardstick for algorithm (and architecture) evaluation. This measure may
have to be determined by hand if there is not the software to compute it on a particular
354 A. Nagurney

All the above measures, although imperfect, can, nevertheless, provide useful guide-
In addition, time complexity is an important measure that allows one to compare
serial execution versus parallel execution on a particular architecture with a num-
ber of processors. We now illustrate this measure for some very common numerical
For example, [cf. Bertsekas and Tsitsiktis (1989)], the multiplication of two real
'r~ × 'r~ matrices, which in serial execution is of time complexity O(n 3) can be reduced
to O(log 2 r~), provided that one has a hypercube with n 3 processors. If, instead, one has
a mesh architecture with n 2 processors, then the time complexity would be O(r~ log 2).
The parallel execution of the multiplication of an n x n matrix and an n × 1 vector;' in
turn, has a time complexity of O(logn) on a hypercube with n 2 processors. Finally,
an inner product of two real n-dimensional vectors has a parallel execution time
complexity of O(log'r~) on a hypercube with n processors.

3. Fundamental problem classes and numerical methods

In Section 2 the focus was on the technology and computer science aspects of parallel
computing. In this section we turn to the mathematical programming and numerical
analysis aspects of parallel computing. We first overview some of the fundamental
mathematical problems encountered in economics and then discuss the numerical
methods for their solution. In particular, we emphasize problem classes and associated
computational schemes that can be (and have been) paralletized and that have been
subjected to rigorous theoretical analysis. This presentation is by no means exhaustive
but, instead, highlights problems that occur frequently in practice. The goal here is to
present unifying concepts in an accessible fashion.
We begin with systems of equations, which have served as the foundation for many
economic equilibrium problems. Moreover, computational schemes devised for this
class of problems have also been generalized to handle problems with objective func-
tions and inequalities. We then discuss optimization problems, both unconstrained and
constrained, and consider the state-of-the-art of parallel algorithms for this problem
We subsequently turn to the variational inequality problem, which is a general prob-
lem formulation that encompasses a plethora of mathematical problems, including,
among others, nonlinear equations, optimization problems, complementarity problems,
and fixed point problems. A variety of serial and parallel decomposition algorithms
are presented for this problem class.
The relationship between solutions to a particular dynamical system and the solu-
tions to a variational inequality is recalled, as well as a general iterative scheme for the
solution of such dynamical systems, which induces several well-known algorithms.
For this problem class we also discuss parallel computing issues.
We first describe the classes of problems under consideration and then the associated
numerical schemes.
Ch. 7: Parallel Computation 355

3.1. Problem classes

We briefly review certain problem classes, which appear frequently in economics, and
then recall their relationship to the variational inequality problem.
For standardization of notation, let x denote a vector in R n and F a given contin-
uous function from K to R ~, where K is a given closed convex set.

Systems of" equations

Systems of equations are common in economics, in particular, in the setting of defining

an economic equilibrium state, reflecting that the demand is equal to the supply
of various commodities at the equilibrium price levels, and in the formulation of
macroeconometric models. Let K = R n and let F : R '~ ~+ R '~ be a given function.
A vector x* c R n is said to solve a system of equations if

F(x*) =0. (3.1)

This problem class is, nevertheless, not sufficiently general to guarantee, for examp-
le, that x* ) 0, which may be desirable in the case where the vector x refers to

Optimization problems

Optimization problems, on the other hand, consider explicitly an objective function to

be minimized (or maximized), subject to constraints that may consist of both equalities
and inequalities. Let f be a continuously differentiable function where f : K ~-~ R.
Mathematically, the statement of an optimization problem is:

Minimize f(x) (3.2)

subject to x ~ K.

Note that in the case where / ( = R r~, then the above optimization problem is an
unconstrained problem.
Optimization problems occur frequently in economics not only in microeconolnics,
such as in the theory of households or firms, but also in econometrics.

Complementarity problems

Let R~_ denote the nonnegative orthant in R n, and let F " R n ~-~ R n. Then the
nonlinear complementarity problem over R~ is a system of equations and inequalities
stated as:
Find x* >~ 0, such that

F(x*) >~ 0 and F(x*) T . (x*) = 0. (3.3)

356 A. Nagurney

Whenever the mapping F is affine, that is, whenever F(z) = M x + b, where M

is an n x n matrix and b and n x 1 vector, the above problem is then known as the
linear complementarity problem.

The variational inequality problem

The finite-dimensional variational inequality problem, VI(F, K), is to determine a

vector :c* E K , such that

F(x*) T . ( z - z * ) >~0, for all zEK, (3.4)

where F is a given continuous function from K to/~n and K a given closed convex
Variational inequality problems have been used to formulate and solve a plethora of
economic equilibrium problems ranging from oligopolistic market equilibrium prob-
lems to general economic equilibrium problems.

Dynamical systems

Consider the dynamical system defined by the ordinary differential equation (ODE)

- rI(x, v(x)), 5(0) = xo c K, (3.5)

where given z E K and v c R k, the projection of the vector v at z is defined by

FI(:r, v) = lira (P(z + 6'v) - z)

&-+O d '

and the orthogonal projection P(z) with respect to the Euclidean norm by

P(z) = arg m i n
IIx - zll. (3.6)

The difficulty in studying the dynamical system (3.5), as discussed in Dupuis and
Nagurney (1993), lies in that the right-hand side, which is defined by a projection, is
discontinuous. Nevertheless, as established therein, the important qualitative properties
of ordinary differential equations hold in this new, nonclassical setting. The projection
ensures that the trajectory always lies within the feasible set K and, hence, satisfies
the constraints. This would guarantee, for example, that if K was the nonnegative
orthant, the production outputs in an oligopoly example would always be nonnega-
tive; similarly, the prices in a Walrasian equilibrium problem would also always be
nonnegative. For more background, see Nagurney and Zhang (1996).
Ch. 7: Parallel Computation 357

3.1.1. Relationship between the variational inequality problem and other problem
We now review the fact that the variational inequality problem contains the above
problem classes as special cases, discuss its relationship to the fixed point problem,
and recall that its set of solutions corresponds to the set of solutions of the above
dynamical system. For rigorous proofs, see Nagurney (1993).
For example, a system of Eqs (3.t) can be formulated as a variational inequality
problem. Indeed, a vector x* c R n solves VI(F, R n) if and only if F(x*) = 0, where
F : R n ~-~ R n.
Similarly, both unconstrained and constrained optimization problems can be for-
mulated as variational inequality problems. Consider the .optimization problem (3.2)
with x* as the solution. Then x* is a solution of the variational inequality problem:

Vf(x*)T.(x--x*)>/O, for all xcK.

On the other hand, if f(x) is a convex function and x* is a solution to V I ( V f , K ) ,

then x* is a solution to the above optimization problem•
If the feasible set K = R ~, then the unconstrained optimization problem is als0 a
variational inequality problem.
The variational inequality problem, however, can be reformulated as an optimiza-
tion problem, only under certain conditions. In particular, if we assume that F(x) is
continuously differentiable on K and that the Jacobian matrix

Dxl Dx~
DF._.__.kI . . . . DF~I t
~F~ ... ~F,~
~xl ~xn
is symmetric and positive semi-definite, so that F is convex, then there is a real-valued
function f : K ~-+ R satisfying

v f ( x ) = F(x)

with x* the solution of VI(F, K ) also being the solution of the optimization prob-
lem (3.2).
Hence, although the variational inequality problem encompasses the optimization
problem, a variational inequality problem can be reformulated as a convex optimiza-
tion problem, only when the symmetry condition and the positive semi-definiteness
condition hold. The variational inequality, therefore, is the more general problem in
that it can also handle a function F(x) with an asymmetric Jacobian.
The variational inequality problem contains the complementarity problem (3.3) as
a special case. The relationship between the complementarity problem defined on the
358 A. Nagurney

nonnegative orthant and the variational inequality problem is as follows. VI(F, R~_)
and the complementarity problem defined above have precisely the same solutions, if
The relationship between the variational inequality problem and the dynamical
system was established in Dupuis and Nagurney (1993). In particular, if one assumes
that the feasible set K is a convex polyhedron, and lets b = --F, then the stationary
points of (3.5), i.e., those that satisfy 5: = 0 = F i ( x , - F ( x ) ) , coincide with the
solutions of VI(F, K ) .
This identification is meaningful because it introduces a natural underlying dynam-
ics to problems which have, heretofore, been studied principally in a static setting at
an equilibrium point.

Fixed point problems

We now turn to a discussion of fixed point problems in conjunction with variational

inequality problems. In particular, we first recall that the variational inequality problem
can be given a geometric interpretation. Let K be a closed convex set in R n. Then
for each x E R n, there is a unique point y c K , such that

[Ix - y l l ~< IIx - z l [ , for all zEK,

and y is the orthogonal projection of x on the set K , i.e.,

y = P ( x ) = arg m i n IIx - zll.


Moreover, y = P(x) if and only if

yTC.(z_y))x r.(z-y), for all zCK


( y - x ) 7" . (z - y) ~ 0, for all zcK.

Recalling that for two vectors u, v C R n, the inner product u.v = ]u I• Iv t .cos 0, and,
hence, for 0 ~ 0 ~< 90 °, u - v ~> 0, then the last inequality above may be interpreted
We now present a property of the projection operator that is useful both in qualitative
analysis of equilibria and their computation. Let K again be a closed convex set. Then
the projection operator P is nonexpansive, that is,

] ] P x - Px']I <~ H x - x'II, for all x,x' c R n.

Ch. 7: Parallel Computation 359

The relationship between a variational inequality and a fixed point problem can
now be stated. Assume that K is closed and convex, z* E K is a solution of the
variational inequality problem if and only if x* is a fixed point of the map

P(I-"/F):K~-+K, for 7>0 that is, x* = P ( x * - y F ( x * ) ) .

3.1.2. Qualitative properties of the variational inequality problem

In this subsection, for completeness, we present certain qualitative results fbr the
variational inequality problem. We also review certain properties and recall definitions
which will be referred to in our discussions of the convergence of algorithms. The
interested reader is referred to Kinderlehrer and Stampacchia (1980) for accompanying
results in standard variational inequality theory.
Existence of a solution to a variational inequality problem follows from continuity
of the function F entering the variational inequality, provided that the feasible set K
is compact. Indeed, if K is a compact convex set and F ( z ) is continuous on K , then
the variational inequality problem admits at least one solution x*.
In the case of an unbounded feasible set K, the existence of a solution to a vari-
ational inequality problem can, nevertheless, be established under the subsequent
Let ZR denote a closed ball with radius R centered at 0 and let K R = K fq ZR.
K R is then bounded.
By VIR we denote the variational inequality problem

F(x*R) T . ( y - z*R) >~ O, for all y E KR.

In this case we have the following result. VI(F, K ) admits a solution if and only
if z ~ satisfies [IzT~ll < R for large enough R.
Although I[x~]I < R may be difficult to check, one may be able to identify an
appropriate R based on the particular application.
Qualitative properties of existence and uniqueness become easily obtainable under
certain monotonicity conditions. We first outline the definitions and then present the

DEFINITION 3.1 (Monotonicity). F(x) is monotone on K if

[ F ( z 1 ) - F ( x 2 ) ] T . (z 1 - z 2) >~0, for all x ' , x 2 E K.

DEFINITION 3.2 (Strict monotonicity). F(x) is strictly monotone on K if

[F(x 1 ) - F ( x 2 ) ] T . ( x ' - x 2)>0, for all x' ~ a 2, x ' , x 2 E K.

360 A. Nagurney

DEFINITION 3.3 (Strong monotonicity). F(x) is strongly monotone if

[ F ( x 1) -- F(x2)] T . (x 1 __ x 2)/> oLltx1 - x2[l 2,

for some a>0, and all x 1,x 2 E K .

DEFINITION 3.4 (Lipschitz continuity). F(x) is Lipschitz continuous if there exists a

positive constant L such that

[]F(x')- F(x2)ll ~ L l l x ' - x211, for all x ' , x 2 C K.

Recall now the following. Suppose that F(x) is strictly monotone on K . Then the
solution is unique, if one exists.
Monotonicity is closely related to positive definiteness and plays a role similar to
that of convexity in optimization problems. Indeed, suppose that F(x) is continuously
differentiable on K and the Jacobian matrix
13F~ ... ~FI \

VF(~) : •
... ~)F,~
is positive semi-definite (positive definite), that is,
vTVF(x)v >~O, for all v c R n,
(vrvF(x)v>O, for all v/0, vER~),

then F(x) is monotone (strictly monotone).

Under a slightly stronger condition, we have the following result. Assume that F(x)
is continuously differentiable on K and that V F ( x ) is strongly positive definite, that

v~'VF(~)v ~ ~llvlt 2, for all vC R n, for all x C K.

Then F(x) is strongly monotone.

The property of strong monotonicity guarantees both existence and uniqueness of
a solution. In particular, if one assumes that F(x) is strongly monotone, then there
exists precisely one solution x* to VI(F, K ) .
Assume now that F(x) is both strongly monotone and Lipschitz continuous. Then
the projection PK Ix -- 7 F ( x ) ] is a contraction with respect to x, that is, if we fix
7 <~ ~/La where a and L are the constants appearing, respectively, in the strong
monotonicity and the Lipschitz continuity condition definitions. Then

IIPK(X -- ~/F(x)) - PK(Y -- ~F(Y))II < / ~ l l x - Yll

Ch. 7: P a r a l l e l C o m p u t a t i o n 361

for all x, 9 E K, where

~ ( 1 -- "TOg)1/2 < 1.

It follows from this result and from the Banach fixed point theorem that the operator
P ~ ( x - 7 F ( x ) ) has a unique fixed point x*.
The above results are useful in establishing convergence of algorithmic schemes.

3.2. Algorithms

In this subsection we present some of the basic algorithmic schemes for the solution
of the above problem classes. In particular, we focus on those algorithms, which have
been successfully implemented in practice on both serial and parallel architectures, and
that have been subject to theoretical analysis. Conditions for convergence are briefly
discussed with an aim towards accessibility. References where complete proofs can
be obtained are included.
Many iterative methods for the solution of systems of equations, optimization prob-
lems, variational inequality and other problems, have the form

x ~+l = g ( S ) , r=O,l,..., (3.7)

where x ~ is an n-dimensional vector and 9 is some function from R '~ into itself with
components {gl, 9 2 : . . . , g,~}. For example, in the case where g ( x ) = A x + b , where A
is of dimension n x n, and b is an n-dimensional vector, one obtains a linear iterative
The principal iterations of the form (3.7) are the

Jacobi iteration."

X'r+l = gi(X]- ' :X~z), i = 1, n,

i . . . . . :

and the

Gauss-Seidel iteration."

z~ +1 = y "~°~-+J
ika,1 :...:X 7_+11 : X i ": " . . : S )n ~ 'i=l :" . . : r~.

As is well-known, the Gauss-Seidel algorithm incorporates the information as it

becomes available, whereas the Jacobi method updates the iterates simultaneously.
Hence, the Jacobi method is a natural parallel method. Indeed, each subproblem i,
for the evaluation of x~-+1 can be allocated to a distinct processor for simultaneous
362 A. Nagurney

solution. It is also worth noting that there are different Gauss-Seidel algorithms,
depending on the specific order with which the variables are updated. Moreover,
a Gauss-Seidel iteration may be totally unparatlelizable as when each function 9i
depends upon all of the components of the vector x. Of course, one may then wish to
select an ordering so that one could maximize the parallelism in any given iteration, but
recognize the fact that there exists a trade-off between parallelization and convergence
~,",ecd of Gauss~Seidel iterations.
In our statements of the algorithms for the various classes of problems, for the
sake of brevity, we present only the typical iteration. Of course, each algorithm must
be suitably initialized and also convergence must be verified through an appropriate
convergence criterion. This latter issue is discussed more fully in terms of specific
applications in the numerical section.

3.2.1. Algorithms for systems of equations

The principal iterative techniques for solving systems of equations [cf. (3.1)] are the
Gauss-Seidel and the Jacobi methods.
In the case where the system of equations itself is linear, say,

Ax ~- b~

where A is an n x n real matrix and b a vector in R n, then, in addition to iterative

methods, direct methods can be applied. Direct methods find an exact solution in
a finite number of operations (usually of the order n3), whereas iterative methods
converge to a solution asymptotically. Iterative methods are, nevertheless, preferable
to direct methods when n is very large, since often they provide an acceptable solu-
tion after a reasonable number of iterations and, usually, require less memory storage.
Similarly, the computation of the inverse of A, A - 5, can be accomplished via iterative
or direct methods.
We first present iterative methods and then highlight direct methods.
Note that, under the assumption that A is invertible, one is guaranteed a unique
solution x*. If we write the ith equation of A x - b as

~ aijxj = bi,

and assume that ai.i 5¢ O, then the statement of the Jacobi algorithm for a typical
iteration ~-, and beginning with an initial vector x ° C R '~, would be
Ch. 7: Parallel Computation 363

Jacobi iteration:

- aijx)-bi , i=l,...,n,
aii j¢ .

and the statement of the Gauss-Seidel algorithm for a typical iteration ~-:

Gauss-Seidel iteration:

= -- -- L.J t~ijwj q- aijx; -- bi , i = 1,..., n.

aii j<i j>i

Other variants of the above algorithms which make use of a relaxation parameter are
the Jacobi Oven-elaxation Method (JOR) and the Successive Overrelaxation Method
(SOR). These algorithms, under an appropriate choice of relaxation parameter, denoted
by 3' [cf. Bertsekas and Tsitsiklis (1989)], often converge faster.
In particular, an iteration of the JOR algorithm (note the similarity to the Jacobi
iteration) is given by

JOR iteration:

= --- aijxj-bi , i=l,...,n,

aii j#i

whereas an iteration of the SOR algorithm (note the similarity to the Gauss-Seidel
iteration) is given by

SOR iteration:

x~-+l=(1-3")x~- -7
- [ ~Xi j-a' ,' ~°~-+1
j + ~aijx; - - bi ] , i = l,...,n.
aii j<i j>i

In the case where 3' = 1, JOR and SOR collapse, respectively, to the Jacobi and
Gauss-Seidel methods.
We now briefly discuss some convergence results. If the matrix A is row diagonally
dominant, then the Jacobi method converges to the solution of the system of equations.
The Gauss-Seidel method converges to the solution if the matrix A is symmetric and
positive definite. Recall that a diagonally dominant matrix is also positive definite.
Both the JOR algorithm and the SOR algorithm converge, under the same conditions
as the Gauss-Seidel method, provided that the relaxation parameter 3' is sufficiently
small (and positive) in the case of JOR and in the range (0, 2) for SOR.
364 A. Nagurney

All the above methods, nevertheless, share the desirable, especially fi'om a practical
point of view, property that, if they converge, then they converge to a solution.
Direct methods, on the other hand, typically solve a system of linear equations
by applying a simple set of transformations on both sides of the equation until the
matrix A becomes triangular, in which case the resulting system is then solved by
back substitution. The classical direct method for solving a linear system of equations
is Gaussian elimination. Its time complexity for parallel execution is O(n) for n 2
processors on a hypercube and O(n 2) for n processors on a hypercube. For a mesh
architecture the time complexity is O(n) for n 2 processors. For additional complexity
as well as parallel implementation issues, see Bertsekas and Tsitsiklis (1989), and the
references therein.
Numerical procedures for solving linear systems of equations, including Gaussian
elimination, are standard features of many parallel software libraries [see, e.g., CMSSL

3.2.2. A l g o r i t h m s f o r u n c o n s t r a i n e d o p t i m i z a t i o n

Here we consider algorithms for minimizing a continuous function f : R '~ ~-+ /~,
in the absence of constraints [cf. (3.2)]. In this case, V f ( x * ) = 0 for every vector
x* that minimizes f and, hence, the problem of minimizing f is related to solving
the system V f ( x ) = 0 of generally nonlinear equations, where F ( x ) -- V / ( x ) .
In fact, the proofs of convergence of various schemes for solving linear equations,
discussed above, also make use of this fact. Indeed, cf. Section 3.1.1, under certain
assumptions, such as symmetry, in this case of A, one can reformulate the problem as
an optimization problem, which would here take the form of a quadratic programming
problem. This problem, in turn, would be strictly convex if A is assumed positive
definite. Hence, in the proof one establishes that the objective function must decrease
at each step. This is known as the "descent" approach to establishing convergence.
The statement of the nonlinear Jacobi algorithm for unconstrained optimization is
given by the following expression.

Nonlinear Jacobi method:

•x~•-+' = arg m i n f ( x ' ~ , . . . , x. i _. l , .x ~., x i + ~ , . .. , x~~~) , i--l,.. ,n.


The nonlinear Gauss-Seidel algorithm is defined by the following expression.

Nonlinear Gauss-Seidel method:

x~ +1 -- arg m i n f ( x ~ +1 , " . , x-r+l

i-l,xi, x~ i~r+ l , . . ' , x ~ ) ,~r i- 1,...,n.
Ch. 7: Parallel Computation 365

One assumes that a minimizing x~"+1 always exists, and that the algorithms are ini-
tialized with an x ° E R n.
The nonlinear Gauss-Seidel algorithm is guaranteed to converge to a solution x*
of (3.2) under the assumptions that f is continuously differentiable and convex, and
f is a strictly convex function of xi, when all the other components of the vector x
are held fixed. Both algorithms are guaranteed to converge if the mapping T, defined
by T ( x ) = x - 3 ' V f ( x ) , is a contraction with respect to a weighted maximum
norm, where the weighted maximum norm Ilzll~ = maxi 1~-7I- Xl The sequence { S }
generated by either of these algorithms then converges to the unique solution x*
geometrically. The contraction condition would hold if the matrix V 2 f ( x ) satisfies a
diagonal dominance condition [cf. Bertsekas and Tsitsiklis (1989)].
Note that different versions of the nonlinear algorithms are obtained if R n can
be decomposed into a Cartesian product: [ L = l R'~', where at each stage, the mini-
mization is done with respect to the ni-dimensional subvector x i . Cartesian products
will also play an important role in the construction of decomposition algorithms for
both constrained optimization problems and variational inequality problems. These
algorithms converge under analogous assumptions to those imposed previously.
Linearized counterparts of the above algorithms include a generalization of the JOR
algorithm for linear equations, where

JOR method:

x~-+' = x~" - 7 [ D ( S ) ] - ' V f ( x r)

where 7 > 0, and D ( x ) is a diagonal matrix whose ith diagonal element is V~if(x ),
assumed nonzero.
A generalization of the SOR algorithm is given by

SOR method:

V ' 4:(xr+l ; + l I "I-

X~+I 7- z d \ 1 : . . . , X _ ,Xi,... ,~r~)
i= l,...,n.
= Xi -- "~ V72 ¢_.( "r+l :+11 XT "
--iia\Wl ~...~X._ , ~...,X r)

Both the JOR algorithm, sometimes referred to as the Jacobi method, and the
SOR algorithm, sometimes referred to as a Gauss-Seidel method, are guaranteed to
converge to the unique solution x ~, provided that 7 is chosen positive and small
enough and V f is strongly monotone (cf. Definition 3.3). For this proof, rather than
using the descent property, one uses a contraction approach [cf. Bertsekas and Tsitsiklis
Both the nonlinear and linear Jacobi methods are easily parallelized with each
subproblem / being allocated to a distinct processor/. Also, note that these are syn-
chronous algorithms in that one obtains updates for all {xi}, i = 1 , . . . ,n, before
proceeding to the next iteration.
366 A. Nagurney

If f is assumed to be twice continuously differentiable, then another important al-

gorithm for the solution of nonlinear equations and optimization problems is Newton's
method described below.

Newton's method:

.r+l ~- X r _ (V2f(zr))-,Vf(zr).

In general, the points generated by Newton's method may not converge. The reason
for this is that the Hessian matrix, V 2 f ( S ) , may be singular, so that x '-+1 is not well-
defined. Even if ( V 2 / ( S ) ) - I exists, then f ( S +1) is not necessarily less than f ( S ) .
However, if the initial point z ° is sufficiently close to z*, so that V f ( z * ) = 0, and
V 2 f ( x *) is of full rank, then Newton's method is well-defined and converges to x*
[cf. Bazaraa and Shetty (1979)].
It is worth noting that if f ( x ) is a quadratic function, that is, f ( x ) = ½ x T A x - xTb,
then x ~'+l = x ~ - A - I ( A x ~ - b ) = A-lb, and Newton's method converges in a single
The evaluation of both the Hessian and its inverse can be done in parallel (cf.
Section 3.2.1), as can the evaluation of the Jacobian.
Related algorithms are approximate Newton's methods, which are based on the
preceding Newton iteration, but the inverse of the Hessian matrix is not determined
completely. An example follows:

An approximate Newton's method:

X r+l = a7r -1- ~ 8 ~

where s is computed by solving:

V2f(zr)s = --Vf(zr).

In an approximate Newton's method, one uses an iterative algorithm to solve this

linear system and terminates this algorithm after only a few iterations (before con-
vergence). An appropriate iterative algorithm, for example, would be the previously
described SOR method. This yields a direction vector g, which, under certain assump-
tions, is guaranteed to be a direction of descent. In particular, the assumptions are:
f ( x ) /> 0, for all x c R n, V f ( x ) is Lipschitz continuous with constant L, g is chosen
to satisfy lgTv2f(xr)g q-Vf(xr)Tg < 0, and I1~11> L111V/(x")ll, and there exists
a constant L2 such that ½V2f(x) - L J is nonnegative definite for every x. Then, if
7 is selected so that 0 < 3' < 2L2/L, convergence is guaranteed [cf. Bertsekas and
Tsitsiklis (1989)].
For additional results and theory with respect to algorithms for the solution of
both nonlinear equations and unconstrained optimization problems, see Dennis and
Schnabel (1983).
Ch. 7: Parallel Computation 367

3.2.3. Algorithms f o r constrained optimization

Assuming now that the feasible set K is a Cartesian product, where K = I]~:,
z Ki,
and each xi is an ni-dimensionM vector, then one has natural extensions of the
nonlinear algorithms introduced for the unconstrained case to optimization problems
with constraints [cf. (3.2)]. Indeed, the nonlinear Jacobi method is given by

Nonlinear Jacobi method:

x~-+I ~- arg rain f ( x ~ , ... , X i~-

_ 1~ xi, ~
Xi+l~" .. ~ x~), i= 1, .. . z,

and the nonlinear Gauss-Seidel algorithm by

N o n l i n e a r G a u s s - S e i d e l method:

x~-+t = arg xmin

~ c K , f ( x ~ +1 , . . . , X .;+l 1 , X i , x "~
i-t-l'''', XTz), i = 1 , . . . , z.

Hence, the overall problem is decomposed into z smaller subproblems, each of

which itself is a constrained optimization problem, but over a smaller and simpler
feasible set.
Convergence of the iterates {x ~-} generated by the Gauss-Seidel algorithm to a
minimizer of f over K is guaranteed under the assumptions that f is a continuously
differentiable function, convex on K, and a strictly convex function of xi when the
other components of the vector x are held fixed. Under the very same conditions,
hence, one was guaranteed convergence of the nonlinear Gauss-Seidel method for
unconstrained optimization problems, where Ki = R TM .
Convergence of the nonlinear Jacobi method can be established under an appropriate
contraction assumption on the mapping x : = x - " / V f ( x ) . The nonlinear Jacobi
method can be implemented on a parallel architecture by allocating a distinct processor
to each of the z subproblems for the computation of the respective xi.
The linearized algorithms for unconstrained optimization are no longer valid for
constrained optimization. This is due to the fact that, even if we begin within the
feasible set K , an update can take us outside of the feasible set. A simple solution
is to project back whenever such a situation occurs. Recall that the projection P ( x )
was defined as: P ( x ) = argminy~K I!x - v i i .
In particular, we have the well-known gradient projection method, where an iterate
is given by
368 A. Nagurney

Gradient projection method:

x = P(S - 3`vf(x'-))

with 3' > 0, a positive stepsize.

Convergence conditions will now be briefly discussed. In particular, if V f ( x ) is
strongly monotone and Lipschitz continuous (cf. Definition 3.4), and f(x) >~ 0,
Vx c K , then if 3' is selected to be small enough, the sequence {x ~-} defined by
the above statement of the gradient projection algorithm converges to the solution
x* geometrically. What is important to note is that, although this linear algorithm is
not at first appearance amenable to parallelization, there may, nevertheless, be appli-
cations in which the realization of the above algorithm yields a natural decoupling.
For example, this would be the case if the feasible set K = 1-~i=1 Ki, with each
Ki = [0, oo), where the projection would be obtained by projecting the ith compo-
nent of x on the interval [0, co), which is simple and can be done independently and
simultaneously for each component. A similar situation may arise in the case of the
solution of dynamical systems, as we shall demonstrate in the numerical section.
Further, if K is a Cartesian product, one can consider a Gauss-Seidel version of
the gradient projection algorithm defined by the iteration:

Gauss-Seidel version of the gradient projection method:

x~ +1 P[x~ 7V~]'(x[ +' x;_+l' xT x~)], i = 1, ,z.

For supporting proofs of convergence of the above schemes, see Bertsekas and
Tsitsiklis (1989).
The algorithms for optimization problems presented here have focused on problems
where the constraint set is a Cartesian product. In the case where this does not hold
one may, nevertheless, be able to exploit the underlying structure of a problem and
take advantage of parallelism by transforming the problem in an appropriate fashion.
One approach is to consider a dual optimization problem, which may be more suit-
able for parallelization than the original primal problem. A variety of decomposition
approaches for large-scale problems are:presented in Lasdon (1970) and parallel de-
composition algorithms, in particular, are discussed in Bertsekas and Tsitsiklis (1989).
Since many such algorithms are better understood in the context of a specific applica-
tion, we defer a discussion along these lines until we consider a specific application
in Section 4.
We also refer the interested reader to Lootsma and Ragsdell (1988) for an overview
of parallel computation of nonlinear optimization problems and for a discussion of
parallelization of the conjugate gradient method, variable-metric methods, and several
decomposition algorithms. For an examination of dynamic programming and parallel
computers, see Casti, Richardson and Larson (1973) and Finkel and Mnaber (1987).
Experimentation on the parallelization of combinatorial optimization algorithms can
be found in Kindervater and Lenstra (1988).
Ch. 7: Parallel Computation 369

3.2.4. Algorithms for variational inequality problems

We now focus on the presentation of variational inequality algorithms, which may

be applied for the computation of equilibria. In particular, we first present projec-
tion methods and then decomposition algorithms for when the variational inequality
to be solved is defined over a Cartesian product of sets. We discuss decomposition
algorithms of both the Jacobi and Gauss-Seidel type, the former being naturally im-
plementable on parallel computer architectures. We don't present algorithms for com-
plementarity problems, since these are special cases of variational inequality problems,
and the theory of variational inequality algorithms is more developed. Moreover, we
don't discuss fixed point algorithms since they may not be appropriate for large-scale
Variational inequality algorithms resolve the variational inequality problem (3.4)
into simpler vari~itional inequality subproblems, which, typically, are optimization
problems. The overall efficiency of a variational inequality algorithm, hence, will
depend upon the optimization algorithm used at each iteration. The subproblems un-
der consideration often have a special structure and special-purpose algorithms that
exploit that underlying structure can be used to solve the embedded mathematical
programming problems to realize further efficiencies. Projection methods. Projection methods resolve a variational inequality

problem, typically, into a series of quadratic programming problems. They have been
applied for the computation of a plethora of equilibrium problems [cf. Nagurney
(1993)] and, although they were not developed as parallel computational procedures,
per se, may, nevertheless, resolve the problem because of the underlying feasible
set K, into (numerous) subproblems, which can then be solved simultaneously. The
same characteristic was discussed in regards to the gradient projection method in
Subsection 3.2.3.

Projection method:

= _ _

where G is a symmetric positive definite matrix, and ~/> 0.

Convergence is guaranteed [cf. Bertsekas and Tsitsiklis (1989)] provided that the
function F is strongly monotone (cf. Definition 3.3) and Lipschitz continuous (cf.
Definition 3.4), for any ~/E (0, 3,°], such that the mapping induced by the projection
above is a contraction mapping with respect to the norm II" Ila. The sequence { S }
generated by the projection algorithm then converges to the solution x* of (3.4)
In the case where the function F is no longer strongly monotone, but satisfies
the tess restrictive monotonicity condition (cf. Definition 3.1), and is also Lipschitz
370 A. Nagurney

continuous, then the modified projection method of Korpelevich (1976) is guaranteed

to converge to the solution of the variational inequality problem. If the function F
is monotone, rather than strongly monotone, then a unique solution, however, is no
longer guaranteed.

Modified projection method:

= -

where Y¢" is given by

= p(x _ 7F(x ))

and 7, is, again, a positive scalar, such that 7 C (0, ~], where L is the Lipschitz
constant in Definition 3.4. Note that here G -1 = I. Decomposition algorithms. Here we assume that the feasible set K is a

Cartesian product, that is,

K = HKi (3.8)

where each K i C Rn~; ~']iz=l ni = n; xi denotes a vector in R TM, and Fi(x) • K F-+
R n~ for each i.
Many economic equilibrium problems are defined over a Cartesian product set and,
hence, are amenable to solution via variational inequality decomposition algorithms.
For example, a variety of game theory problems would fall into this framework,
where each player has his or her own objective function and feasible set, with the
feasible set depending upon only the individual's particular strategies, and not on
those of the other players. This would be the case in classical oligopolistic market
equilibrium problems [cf. Cournot (1838), Gabay and Moufin (1980)]. In addition,
multicommodity spatial price equilibrium problems [cf. Takayama and Judge (1971),
Dafermos (1986)] would also have a feag~le set defined as a Cartesian product, where
each commodity would have to satisfy its own conservation of flow equations.
The appeal of decomposition algorithms lies in their particular suitability for the
solution of large-scale problems. Moreover, parallel decomposition algorithms can be
implemented on parallel computer architectures and further efficiencies realized.
We emphasize that for any given equilibrium problem there may be several al-
ternative, albeit, equivalent, variational inequality formulations, which may, in turn,
suggest distinct, novel, and not immediately apparent, decomposition procedures.
We present the nonlinear decomposition methods and then the linear decomposition
methods. For each, we first present the Jacobi version and then the Gauss-Seidel
The statement of a typical iteration ~- of the nonlinear Jacobi method is given by
Ch. 7: Parallel Computation 371

Nonlinear Jacobi method:

x~ +1 = solution of:

.T T
Fi(xl,.. ., z~~_~,~,x~+~, . . . , ~ ; ) T • ( x ~ - x ~ ) / > 0, Vx~ e K~, Vi.

A typical iteration of the nonlinear Gauss-Seidel method is given by

Nonlinear Gauss-Seidel method."

x~ +1 = solution of:

f T-}-| ,~T-[-I ~ ~'r-1 ~--1 t

~x~ ,...,~_~ ,~,~+~,...,zz ) • (x~ - z ~ ) / > 0, Vz~ c K~, Vi.

The linear Jacobi method, on the other hand, is given by the expression

Linear Jacobi method:

x[ +1 = solution of:

[F~(x~-) + A~(x~) o (xi - x~)] T . [x~ - x~] >1 O, Vx~ E K~, Vi.

The linear Gauss-Seidel method is given by the expression

Linear Gauss-Seidel method:

x~-+1 = solution of:

[F,,i( x ", r + l , . . . ~ X i-_r +ll ~Xi~

... X~)
+ A i ( x ~ + l , " " , ~_~-+l
- 1 , ~_~-
,"" ,xz)"
~- ( x i - x~)]T. [x~ - x~] /> 0,
Vxi E Ki, Vi.
There exist many possibilities for the choice of Ai(.). If A i ( x ~-) = Vz~Fi(x~),
then we have a Newton's method. If we let A i ( x ~) = Di(x~-), where Di(.) denotes
the diagonal part of Vz~Fi('), then we have a linearization method. If Ai(.) = Gi,
where G~ is a fixed, symmetric and positive definite matrix, then we get a projection
Note that the variational inequality subproblems above should be easier to solve
than the original variational inequality since they are smaller variational inequality
problems, defined over smaller feasible sets. In particular, if in the linear methods we
select the Ai(') to be diagonal and positive definite, then each of the subproblems is
equivalent to a separable quadratic programming problem with a unique solution (cf.
Section 3.1.1).
The subproblems that must be solved at each iteration of the nonlinear methods are
themselves variational inequality problems. Hence, an algorithm such as the projection
method (cf. Subsection would have to be applied, where V f ( x ) would now
372 A. Nagurney

be replaced by F(x), and the relaxation parameter would need to lie in the range
(0, 1]. See Dafermos (1983) and Nagurney (1993) for additional discussion of the
projection method for variational inequality problems, as well as other algorithms,
including the relaxation method.
The linear methods are appealing since each variational inequality subproblem may
be expected to take on a simpler form for computational purposes than in the case of
the nonlinear methods. This is especially true, as already mentioned above, if A(-) is
selected to be diagonal.
We now present a convergence theorem for the above decomposition algorithms
that is due to Bertsekas and Tsitsiktis (1989) [see, also, Nagurney (1993)].

THEOREM 3.1. Suppose that the variational inequality problem (3.4) has a solution
x* and that there exist symmetric positive definite matrices Gi and some ~ > 0 such
that Ai(x) - 6Gi is nonnegative definite for every i and x E K, and that there exists
a 7 C [0, 1) such that

IIc; - F (y) - A i ( y ) . (xi - yi))ll~ < max IIxj - yjlIj, Vx, y c K,


where Ilxill = ( x T a i x i ) 1/2. Then the Jacobi and the Gauss-Seidel linear and non-
linear decomposition algorithms, with Ai(x) being diagonal and posit&e definite,
converge to the solution x*.

Variational inequality theory was originally introduced by Hartman and Stampac-

chia (1966) for the study of partial differential equations, that arise principally in
mechanics. Such problems, however, in contrast to the ones considered here, are
infinite-dimensional. For the parallel solution of partial differential equations, see Or-
tega and Voigt (1985).

3.2.5. Algorithms for the dynamical system

Although the dynamical system (3.5) provides a continuous adjustment process, a

discrete time process is needed for actual computational purposes. Towards this end,
in this subsection, we first review a general iterative scheme introduced in Dupuis and
Nagurney (1993), which induces a variety of numerical procedures, all of which, in
turn, are designed to estimate solutions to the variational inequality problem (3.4) and
to trace the trajectory of the dynamical system from the initial state. We then present
several schemes induced by the general iterative scheme. These schemes are not in
themselves parallel. Nevertheless, since they are based on a projection operation,
which in many applications takes on a very simple form that is decomposable in the
variables, one oftentimes obtains a parallel scheme. Indeed, this will be illustrated in
terms of concrete applications in Section 4.
Ch. 7: Parallel Computation 373

The proposed algorithms for obtaining a solution to the variational inequality prob-
lem all take the form

x ~-+1 = P ( S - a~-F~-(S)), (3.9)

where, without loss of generality, the "~-" denotes an iteration (or time period),
{a~-, ~- E T} is a sequence of positive scalars, and the sequence of vector fields
{F~-(.), T E T} "approximates" F(.).
We now present the Euler-type method, which is the simplest algorithm induced
by the above general iterative scheme.

Euler-type method:
In this case we have that

F (x) =

for all -r E T and x E K . This would correspond to the basic Euler scheme in the
numerical approximation of standard ODEs.
Another method is

H e u n - t y p e method."
In this case we have that

Finally, if the function F is defined in a sufficiently large neighborhood of K ,

another method is

Alternative Heun-type method:

In this case we set

Other methods, which are induced by this general iterative scheme, include Runge~
Kutta-type algorithms.
We now consider a situation where the above schemes would be parallelizable.
Suppose that the feasible set K = r I ~ l K~, where each Ks = [0, oe). Then it is easy
to see that the expression (3.9) takes on the following simple closed form expression:

x~-+l = max{0, x~ - a r F i r ( x ' r ) ) , i = 1,.,.,z.

All of the xi"r+l, s, hence, can be updated in parallel.

374 A. Nagurney

We now give the precise conditions for the general convergence theorem and present
its statement. For proofs, see Dupuis and Nagurney (1993).

ASSUMPTION 3.1. Fix an initial condition x ° C K and define the sequence { S , ~- c

T } by (3.9). Assume the following conditions.
1. ~ =c~
o ai = c~, ai > O, ai --+ 0 as i -+ c~z.
2. d ( F . r ( z ) , F ( z ) ) --+ 0 uniformly on compact subsets o f K as r --+ oo, where
d( x , A ) = inf{llx - yll, y c A }, and the overline indicates closure.
3. Define Cu to be the unique solution to ~ = F l ( x , - F ( x ) ) that satisfies Cy(O) =
y C K . The w - limit set

yE K t>/O s>/t

is contained in the set o f stationary points o f Jc = I I ( x , - F ( x ) ) .

4. The sequence { x ~, -r E T } is bounded.
5. The solutions to ~ = l-I(x, - F ( x ) ) are stable in the sense that given any compact
set K 1 there exists a compact set K2 such t h a t U y E K n K I Ut>~0 {¢y(t)} C /(2,

The assumptions are phrased as they are because they describe more or less what
is needed for convergence, and because there are a number of rather different sets of
conditions that imply the assumptions.

ASSUMPTION 3.2. There exists a B < oo such that the vector field - F : R n ~+ R n
satisfies the linear growth condition: II - F(x)l[ ~< B(1 + IixlI)for x c K , and also

(-F(x) + F ( y ) ) T • ( x - y) <. B I I x - yll ~

f o r all x, y C K .

THEOREM 3.2. Let S denote the solutions to variational inequality (3.4), and assume
Assumptions 3.1 and 3.2.
Suppose { x ' , "r E T } is the scheme generated by (3.9). Then d ( x r, S) --+ 0 as
T --+ O0.

COROLLARY 3.1. Assume the conditions of" Theorem 3.2, and also that S consists" o f
a finite set o f points. Then lim,-~oo x r exists and equals a solution to the variational
inequality (3.4).

The above classes of problems and accompanying numerical methods were selected
for their general applicability with an eye towards unifying principles. In the subse-
quent section we focus on applications and numerical results that help to illustrate and
synthesize the computer science and mathematical programming principles of parallel
computing discussed thus far.
Ch. 7: Parallel Computation 375

4. Applications and numerical results

In this section we discuss both applications and numerical results. We begin with
an application of systems of equations, and also discuss applications of optimization
problems, variational inequality problems~ and dynamical systems. The applications
are drawn from econometrics, microeconomics, macroeconomics, and finance.

4.1. Nonlinear equations

In this subsection we discuss an application of nonlinear equations [cf. (3.t)] that

illustrates the parallel computation of the solution via the Jacobi and the Gauss-Seidel
methods and by using a software library.

4.1.1. Econometric model simulation

Nonlinear equations are used in the formulation of macroeconometric systems. Such

problems can be very large, especially when one wishes to solve the same model
repeatedly for different data sets, as would be the case, for example, in stochastic
simulation and optimal control. Indeed, such problems were some of the first economic
problems that were solved using supercomputers with vectorization [see, e.g., Amman
(1985), Ando, Beaumont and Ando (1987), Petersen (1987), Petersen and Cividini
(1989) and Amman (1989)].
Recently, Gilli and Pauletto (1993) considered the solution of a system of Eqs (3.1)
consisting of linear and nonlinear equations on the CM-2 architecture. The model that
they solved was a macroeconometric model of the Japanese economy, developed at
the University of Tsukuba and consisting of 98 equations and 53 exogenous variables.
The model was solved for ten time periods from 1973 to 1982.
The model, when put into block recursive form, exhibited a pattern common to
macroeconometric models, in that a large fraction of the variables (77 of them) lay in
one interdependent block, and were both preceded and followed by recursive equa-
tions. 6 variables were defined recursively before the block, followed by 15 variables
that did not feed back on the block.
The authors studied the interdependent block for the purpose of the Gauss-Seidel
algorithm considering an ordering of the equations that would minimize the amount
of feedback. The use of a DAG (directed acyclic graph) representing the equations
without their feedback, enables the identification of those sets of equations that can
be updated in parallel. Thus, 77 equations were grouped into 18 levels or sets.
Different orderings can yield different DAGs, and one might look for an ordering
that minimizes the number of levels (which would correspond to a minimum coloring
of the graph representing the equations of the block) in order to achieve the highest
possible speedup. However, such an ordering can result in a slower convergence (a
larger number of iterations) and, according to the authors, the question of an optimal
ordering remains open.
376 A. Nagurney

The parallelization of the Gauss-Seidel iteration is not exploitable on the CM-2

because it implies the execution of different tasks with different data, which can
only be done on an MIMD computer. In order to take advantage of the CM-2, the
authors then proceeded to repeatedly solve the same model, but using 8,192 different
datasets, corresponding to 8,192 different sets of exogenous variables. The number
of datasets was selected to correspond to the number of processors in order to yield
the best speedup possible. The Gauss-Seidel algorithm, which was now serial in the
dimension of the equations and parallel in the dimension of the data, required 22.2
seconds on the CM-2 using the convergence tolerance

z~- + l - z ~ - ~e, for alli, w i t h e = 0 . 0 0 1 .

It required 1,109 seconds on a Sun ELC, yielding a speedup of 50. Since the con-
vergence verification step itself was found to be time-consuming, a modification of it
reduced the CPU time on the CM-2 to 12.7 seconds and on the Sun to 863 seconds,
yielding an improved speedup of 68.
The authors also experimented with a Newton method for solving the model, where
the numerical approximation of the Jacobian matrix is carried out in parallel. For
the computation of the step in the Newton method, that necessitates the solution of
a linear system (cf. Section 3.2.2), the authors made use of the library, CMSSL [cf.
Thinking Machines Corporation (1992c)], which contains routines for solving systems
of equations.

4.2. Optimizationproblems
In this subsection we focus on constrained optimization problems [cf. (3.2)], in partic-
ular, portfolio optimization problems and, subsequently, an estimation problem known
as the constrained matrix problem. Both of these constrained optimization problems
are quadratic programming problems. For the portfolio optimization problem we uti-
lize the gradient projection method in which we embed a special-purpose algorithm
for the solution of the simpler subproblems. For the constrained matrix problem, we
apply a dual method which has been specifically developed for this problem and ex-
ploits its special structure. Since the resulting subproblems are of the same structure as
those encountered in the application of the gradient projection method to the portfolio
optimization problem, the same special purpose algorithm is applied at each iteration.

4.2.1. Portfolio optim&ationproblems

As mentioned in the Introduction, portfolio optimization problems stimulated the inter-

est in the development of the area of mathematical programming known as quadratic
Ch. 7. Parallel Computation 377

Recall the classical portfolio optimization problem [cf. Markowitz (1959) and
Sharpe (1970)] where there are n financial instruments, x denotes the n-dimensional
vector of shares of the instruments, Q is the variance-covariance matrix of dimen-
sion n x n, and r is the n-dimensional vector of expected returns on the individual
instruments. Then the portfolio optimization problem may be formulated as follows:

Minimize f ( x ) = xT Qx - rT x (4.1)

subject to

E xj = t, (4.2)

xj ~> 0, for all j = 1 , . . . , n , (4.3)

where the objective function denotes the risk minus the expected returns.
One may also introduce a risk parameter ), in which case the objective function (4.1)
is modified to

Minimize f (x) = xT Qx - /~rTx (4.4)

and the full problem also incorporates the above constraints. The variational inequality
formulation of this problem is given by

[ 2 Q x * - A r ] T . [ x - x * ] ~>0, VxEK, (4.5)

where the feasible set K consists of the constraint (4.2) and the nonnegativity con-
straints (4.3).
An application of the gradient projection method discussed in Subsection 3.2.3 re-
solves this problem into simpler quadratic programming problems, where the quadratic
matrix at each iteration is now the diagonal identity matrix and where at iteration ~-
the subproblem is given by:

Minimizex,~g x * T s + ( 3 ' ( 2 Q S - kr) - S - ' ) T x ~-. (4.6)

In applications one may wish to vary the A over a horizon in which case we have
the problem

Minimize f(x) = ~ x i T Qx i - ~irT xi (4.7)
378 A. Nagurney

subject to

i = 1,
Xj i = 1, ..., m, (4.8)

xji ~>0, for all i= 1,...,m; j=l,...,n, (4.9)

where x i here denotes the n-dimensional vector with components {x~,.. • , x in} cor-
responding to the shares associated with the problem Ai.
Observe that this problem may be decomposed into m subproblems, each of the
form given by (4.4) with a distinct A. In fact, an even finer decomposition (on the level
of m x n is possible with an appropriate implementation of the exact equilibration
algorithm, which is discussed (in a more general context) in Subsection 4.2.2.
The dataset that we utilized consisted of a variance-covariance matrix that was esti-
mated using the Standard & Poor's index consisting of 500 firms. The data consisted
of monthly data from 1986 through 1992 and the resulting estimated Q matrix was
of dimension 500 x 500.
The system utilized f o r the implementation of the gradient projection method was
the CM-2 with a SUN as the front-end. The parameter 3' was set to 0.001 and the
convergence tolerance e was set equal to 0.0001. The convergence criterion was:
I x ' - - x ' - - I I ~< e. The gradient projection method with the embedded exact equilibration
scheme was implemented in CM Fortran.
First, the single portfolio optimization problem with A = 1 was solved; this problem
was equivalent to (4.1) subject to (4.2) and (4.3). Subsequently, A was varied from 1
to 100 [cf. (4.7)] in increments of 2 yielding 51 portfolio optimization problems with
a total of 25,500 variables.
The single problem with )~ = 1 required 252 iterations for termination, whereas the
51 problems, with ,~l = 1, ,~2 = 3 , . . . , A51 = 100, required a total of 312 iterations.
The CPU times on the CM-2 required for convergence are reported in Table 7.1.
We did not solve the single problem using 32K processors since it had only 500
variables. In regards to the relative times on 8K processors, the 51 problems required
less than 3 times the amount of CPU time as did the single problem. Indeed, when
32K processors were used the 51 problems required only approximately 20% more
CPU time than the single problem required on 8Kprocessors.

Table 7.1
Portfolio optimization CPU times in seconds
Example 8K 16K 32K
single problem 365.28 238.16 -
51 problems 9 6 2 . 9 2 694.85 421.40
Ch. 7: Parallel Computation 379

It is worth mentioning that multi-sector, multi-instrument general financial equi-

librium problems, formulated as variational inequality problems [cf. Nagurney, Dong
and Hughes (1992), Nagurney (1994)] can be decomposed into subproblems of the
form considered here and, hence, at least in principle, are also amenable to massively
parallel computation.

4.2.2. Constrained matrix problems

Constrained matrix problems arise in numerous applications, such as the estimation of

input/output tables, social accounting matrices, migration tables, and origin/destination
tables in transportation [see, e.g., Bacharach (1970)]. These problems can also be very
large-scale in practice and are usually formulated as optimization problems. Here we
shall consider a special-purpose algorithm for the solution of the problem which
allows for parallel computation. The algorithm is a dual method that decomposes the
problem into many simpler subproblems, each of which can then be solved explicitly
and in closed form.
In this subsection we briefly review the constrained matrix problem with known
row and column totals under consideration here. For the formulation of the general
quadratic constrained matrix problem with unknown row and column totals and other
variants, we refer the reader to Nagurney and Eydeland (1992), and the references
In particular, we consider the diagonal constrained matrix problem, which is for-
mulated as a minimization of the weighted squared sums of the deviations. We denote
the given m x n matrix by X ° = (x°j), and the matrix estimate by X = (x~j). Let
s o denote the known row i total, and si the estimate of the row i total. Let d o denote
the known column j total, and dj the estimate of the column j total. We assume that
the 7ij elements are all positive.
The diagonal quadratic constrained matrix problem is given by:

Tr~ 7t,

Minimize f(x) : ~ Z 7ij(xij - xi°) 2 (4.10)

i=l j=l

subject to the row constraints

Xij : Si~ i = l,...,m, (4.11)

and the column constraints

~xij = d o, j = l,...,n, (4.12)

380 A. Nagurney


xij >>-0, for all i,j. (4.13)

Note that this problem is of the lbrm (3.2), where the feasible set K is defined by
the set of x that satisfy constraints (4.11)-(4.13).
For generalizations of this model to the estimation of financial flow of funds ac-
counts, see Hughes and Nagurney (1992) and Nagurney and Hughes (1992). The splitting equilibration algorithm and the exact equilibration algorithm.
Neither a Gauss-Seidel algorithm nor a Jacobi algorithm can be applied for the solu-
tion of this problem because the feasible set K here is not a Cartesian product, that is,
of the form (3.8). Nevertheless, the problem has a lot of structure that can be exploited
for parallel computation. In particular, one can see that if the objective function was
subject to either only the constraints (4.11) and (4.13), or (4.12) and (4.13), then
each of the m, respectively, n subproblems could be solved simultaneously. Note,
for example, the similarity of this optimization problem to the portfolio optimization
problem (4.7) subject to constraints (4.8) and (4.9), in which case s 0i would be equal
to 1 but although Q is no longer diagonal in the portfolio optimization problem, the
gradient projection method approximates the problem by making use of the diagonal
identity matrix at each iteration. Hence, the decomposed subproblems have essentially
the same structure and can be solved by the same suitable algorithm.
The algorithm, known as the Splitting Equilibration Algorithm (SEA) [cf. Nagur-
ney and Eydeland (1992)], computes a solution to the quadratic programming problem
(4.10), subject to constraints (4.11) through (4.13), by first considering a modifica-
tion of the objective function subject to only the row constraints (4.11), and then
by considering a modification of the objective function subject to only the column
constraints (4.12). The former problem is referred to as the Row Equilibration Step,
whereas the latter problem to as the Column Equilibration Step. The algorithm can
be interpreted and analyzed as a dual method.
The simplicity of the procedure lies in that each of the row/column subproblems,
because of their special structure, can be solved exactly, and in closed form, us-
ing exact equilibration. Exact equilibration algorithms were originally introduced by
Dafermos and Sparrow (1969), and then later generalized and theoretically analyzed
in Eydeland and Nagurney (1989). The massively parallel implementation of the Split-
ting Equilibration Algorithm (as the implementation of the gradient projection method
for the portfolio optimization problem) depends crucially on the massively parallel
implementation of the (row/column) exact equilibration algorithm.
The statement of SEA is as follows:

The splitting equilibration algorithm

Step O. Initialization step:

L e t # l c R n = O . S e t ~ - : = 1.
Ch. 7: Parallel Computation 381

Step 1. R o w equilibration:
Find X ( # ~) such that

X ( # "~) - - + nf(x) - #j xij

j=l " i=1

subject to

E xij = s i 0
, i= 1,...,rn, (4.15)

xij ~> O, for all i , j .

Compute the Lagrange multipliers according to:

~+1 = 2 7 i j X i j ( # r ) _ 2.yijxio _ # ; , for i = 1,... ,m.

Step 2. Column equilibration:

Find X(M -+l) such that

X(A ~'+') ) Min f ( x ) - E )~-+1 Yij - s (4.16)

/=1 \ j=t

subject to

Exij =d o , j= 1,...,n, (4.17)


zij >~ O, for all i , j .

Compute the Lagrange multipliers according to:

_ )kr+ 1
#;+1 = 2 " / i j X i j ( A "~+t) - 2 % j x i ° i , for j = 1 , . . . , n .

Step 3. Convergence verification:

If I ( E j xij(A~+I) - s io) / s i l0 ~ e, for all i, terminate; else, set r := r + t, and go to
Step 1.

We now present an algorithm for the solution of each of the row/column subprob-
lems with special structure. The notable feature of this procedure is that it lends itself
to a massively parallel implementation. For simplicity, we develop the presentation
of the exact equilibration procedure in the context of the column equilibration step.
382 A. Nagurney

In particular, in the column equilibration step at iteration % we are interested in

computing for each subproblem j, the flows x u , . . . , Xmj from the rows 1 , . . . , m to
the column j that satisfy the constraint (4.17) and the following optimality/equilibrium

+ hii ~ = #J if x~j > 0, (4.18)

[ >~#j if x i j = 0

where the 9ij = 27ij terms, for i = 1 , . . . , m, are all greater than zero and hij =
-2~/ijxi° - A~-+l, for i = 1 , . . . , m . The term #j is simply the Lagrange multiplier
corresponding to the constraint (4.17) (where the superscript ~-+ 1 has been removed),
and not known a priori. The algorithm below computes the solution to the above
system (4.18) in closed form. It can then be applied to solve each of the n column

Exact equilibration

(i) Sort the hij's; i = 1 , . . . , m , in nondescending order and relabel the hij's
accordingly. Define hm+t,j "-~--:xD. Set v := 1.
(ii) Compute

V ~ = 1 hii/giJ + do (4.19)
/Aj =

If hvj < t~ <~ hv+l,j, then stop; s' := v, and go to (iii). Otherwise, set v := v + 1,
and go to (ii).
(iii) Set
8 !
pj -- hij
x~3----, i = 1 , . . . , s ~,
Xij : O , i = g + l,...,m.

8 t
Note that #~+l is then equal to the #j computed above.
In an analogous manner one can construct an exact equilibration algorithm for
solving the ith row equilibration subproblem (4.14), subject to constraint (4.15) for
the particular row. Note that the algorithm should then be applied for the solution of
all the m row subproblems.
As described above SEA decomposes the constrained matrix problem into m row
subproblems, each of which can be solved independently and simultaneously on a
distinct processor using exact equilibration, and into n column subproblems, each of
which can also be solved independently and simultaneously. In this context, hence,
Ch. 7: Parallel Computation 383

if m = n then at most m processors would be used for the parallel implementa-

tion; this is, indeed, the case with a coarse-grain architecture. However, in the next
subsection we will show how a massively parallel implementation of SEA with the
exact equilibration algorithm exploits all n x n processors, if such an architecture is
available. The massively parallel implementation of SEA. The massively parallel im-
plementation of SEA was earlier reported in Kim and Nagurney (1993). The language
that was used for the implementation was CM Fortran and the architecture, the CM-2.
We now briefly describe some of the intrinsic functions of C M Fortran that make
it very well-suited for implementing the exact equilibration algorithm. For example,
the intrinsic function c m f _ o r d e r sorts elements of a matrix either row - wise or
column - wise and returns the indices. The m i n v a l and m a x v a l functions, in turn,
return the smallest, respectively, largest element of a row or a column in an array.
The t r a n s p o s e feature of a matrix, in turn, is useful in minimizing the cost of
communication between processors in which the data elements are located.
Since matrix operations in CM Fortran must be conformable, i.e., the operated on
matrices must be of the same dimensions, one may need to change a matrix into a
vector, or vice versa; for such transformations the functions p a c k and u n p a c k are
very useful. Also one may use the s p r e a d command to replicate a vector into a
Finally, we note the availability of logic statements, such as the w h e r e , e l s e ,
e n d statement that checks conditions on vector/matrix elements in parallel.
Here we consider the estimation of an input/output table. Before presenting the
numerical example, we focus on the critical implementation issues.
Recall that S E A decomposes the constrained matrix problem into row subproblems
and column subproblems. Hence, in an n x n problem there would be n row subprob-
lems to be solved and then n column subproblems, until convergence. In particular,
the solution of each of the n subproblems of the form (4.18), which consisted of n
unknown xij variables, was carried out by using n of the processors to first compute
the #ff given in Eq. (4.19), for v = 1 , . . . ,n. A s h i f t c o m m a n d was then utilized
in order to bring the neighboring hvj, hv+l,j values to the same location, in order to
minimize the communication. The hvj < #~ ~< hv+l,i check condition was imple-
mented using the w h e r e , e l s e , e n d construct. All n column problems were solved
in the same fashion, simultaneously. The x i j ' s for i = 1 , . . . , n; j = 1 , . . . , n were
then updated, also simultaneously.
We report now the results of both the implementation of SEA on the CM-2 and on
the IBM 3090/600.
For the parallel version of SEA on the IBM 3090/600E we utilized as the base the
serial F O R T R A N code developed on this machine and added the Parallel Fortran (PF)
constructs in order to handle the task allocation, that is, the assignment of each of the
n r o w / n column subproblems to the six CPU's. The conversion of the serial Fortran
384 A. Nagurney

Table 7.2
Constrained matrix problem example IO72b
(485 rows x 485 columns)
# of physicN processors CPU time in seconds
8K 51.74
16K 29.58
32K 16.34

code to this parallel code was relatively straightforward in that only task origination
statements, dispatch statements that allocated a row/column subproblem to the next
available processor, a waiting statement for synchronization, and task termination
statements had to be added to the original serial code.
We now present the results of our implementations on the two architectures. The
convergence criterion was set at e = 0.01. The weights, the 3'ij's were set to 1/x°j
for z°j > 0, and to 1, otherwise.
In Table 7.2 we present the results of the computations on the CM-2 system for
a dataset based on an input/output matrix, IO72b, consisting of 485 rows and 485
columns and representing a dataset of a 1972 input/output matrix for the US. This
problem consisted of 235,225 variables. The problem was solved using 8K processors,
16K processors, and, finally, 32K processors.
Observe that the CM-2 CPU time decreases approximately linearly as the number
of processors is increased. We note that the same problem when solved on an IBM
3090/600E required 438.35 CPU seconds for the serial Fortran code [cf. Nagurney
and Eydeland (1992)], compiled using the FORTVS compiler, optimization level 3,
and 291.54 CPU seconds on an IBM 3090/600J. The number of iterations required
for convergence was 4 for SEA both on the CM-2 and on the IBM 3090/600. In
terms of the parallel runs on the IBM 3090/600E, the wall clock time required for
convergence of" the parallel implementation of the Splitting Equilibration Algorithm,
compiled using the PF compiler, was 444.18 seconds for 1 CPU, 229.85 seconds for
2 CPUs, 118.76 seconds for 4 CPUs, ahd 86.32 seconds for 6 CPUs.
For additional discussion on the constrained matrix problem and supplementary
references, see Nagurney (1993).

4.3. Parallel computation of variational inequality problems

Here we consider the parallel computation of variational inequality problems [cf.

(3.4)]. We first discuss the massively parallel computation of spatial price equilibria
with ad valorem tariffs via the modified projection method. As mentioned in Section, this algorithm is not a parallel decomposition method per se, but, may, never-
theless, due to the structure of the feasible set K in the application in question, yield
Ch. 7."Parallel Computation 385

subproblems that can be solved simultaneously. This is precisely the feature that is
illustrated in this application. We then discuss the parallel computation of multicom-
modity spatial price equilibrium problems via variational inequality decomposition
algorithms and their implementation on a coarse-grained architecture.

4.3.1. Spatial price equilibrium problems with ad valorem tariffs'

In this subsection we briefly review the perfectly competitive spatial market model
with ad valorem tariffs introduced in Nagurney, Nicholson and Bishop (1995).
We consider m supply markets involved in the production of a homogeneous com-
modity and n demand markets. We denote a typical supply market by / and a typical
demand market by j. Let si denote the supply at supply market i and dj the demand
at demand market j. We group the supplies into a column vector s E R '~ and the
demands into a column vector d c R n. Let Qij denote the nonnegative commodity
shipment between supply and demand market pair (i, j), and group the commodity
shipments into a column vector Q c R ~n.
The commodity shipments and the supplies and demands must satisfy the following
conservation of flow equations:

s~=~Q~j, i=l,...,m, (4.20)


dj = ~ Q i j , j=l,...,n. (4.21)

Hence, the supply at each supply market must be equal to the sum of the commodity
shipments from that market to all the demand markets, and the demand at each demand
market must be equal to the sum of the commodity shipments from all supply markets
to each demand market.
We now describe the price and cost structure. Let 7ri denote the supply price at
supply market i and pj the demand price at demand market j. We group the supply
prices into a row vector 7r E R m and the demand prices into a row vector p ~ R n. The
transportation cost associated with shipping the commodity between supply market i
and demand market j is denoted by cij. We group the transportation costs into a row
vector c E R ran.
We assume that the supply price at a supply market may, in general, depend upon
the supplies of the commodity at every supply market, that is,

~=~(s). (4.22)

Similarly, the demand price at a demand market may, in general, depend upon the
demands for the commodity at every demand market, that is,

p=p(d). (4.23)
386 A. Nagurney

The per unit transportation cost, in turn, associated with shipping the commodity
between a pair of supply and demand markets is assumed to be fixed, that is, it is
independent of the volume of commodity shipments, where the fixed per unit cost is
denoted by cij, and the associated ran-dimensional vector by E. Hence, we have that

c = e. (4.24)

Note that other fixed per unit transfer costs and per unit tariffs can be readily incor-
porated into the fixed ~ function.
We now introduce discriminatory ad valorem tariffs. Let tij denote the ad valorcm
tariff, assumed positive, and applied to imports by demand market j from supply
market i. The incorporation of ad valorem tariffs modifies the spatial price equilibrium
conditions [cf. Samuelson (1952), Takayama and Judge (1971)] as follows: For all
pairs of supply and demand markets (i,j); i = 1 , . . . , m; j = 1 , . . . , n, a commodity
supply, shipment, and demand pattern (s*, Q*, d*) satisfying (4.20) and (4.21) is said
to be in equilibrium if

(Tci(s*) + cij)" (1 + t i j ) ~ = pj(d*) if Qi*j > O,

pj(d*) ifQi*j = 0 -

Hence, in equilibrium, if a positive amount of the commodity is shipped between a

pair of supply and demand markets, then the effective supply price plus transportation
cost after the imposition of ad valorem tariffs must be equal to the demand price at
the demand market. If there is no commodity shipment between a pair of supply and
demand markets, then the effective supply price plus transportation cost can exceed
the demand price.
In view of constraints (4.20) and (4.21), one can define the functions ~i(Q) -
7ri(s), i = 1 , . . . ,m, and the functions ~j(Q) = pj(d), j = 1,..., n. The variational
inequality formulation of the equilibrium conditions governing the spatial market
model with ad valorem tariffs derived in Nagurney, Nicholson and Bishop (1995) is

~-~((~i(Q*)+~ij).(l+tij)-'fij(Q*)).(Qij-Qi*j)>~o, (4.26)
i=l j=l

VQ e R ;
If one then defines

fij(Q) ~ ((~i(Q) +~ij)" (1 +tij) - ~j(Q)), Vi,j, (4.27)

and lets F(Q) c R mn be the row vector with (i, j)th component Fij (Q), then varia-
tional inequality (4.26) may be expressed in standard form as:
Ch. 7: Parallel Computation 387

Determine Q* c K , such that

F(Q*). ( Q - Q*) >~o, VQ c K, (4.28)

where the feasible set K - {Q I Q c R~)'~}.

Note that since K is the nonnegative orthant in the reformulation of this problem,
this problem can also be formulated as a complementarity problem:
Determine Q* >~ 0, such that

F(Q*) T ~ O and F(Q*).Q*=O.

A variety of spatial price equilibrium problems, including net import models, and
models with quotas, have been formulated and solved as linear complementarity prob-
lems on parallel architectures, under the assumption of linear supply and demand price
functions, by Guder, Morris and Yoon (1992, 1993) using the Sequent Symmetry $81
with 20 processors.
The algorithm that we propose is the modified projection method (cf. Sec-
tion which resolves the variational inequality problem under consideration
here into subproblems that are very simple for computational purposes. Indeed, we
obtain a closed form expression for the determination of the commodity shipments at
each iteration. Moreover, since each of the commodity shipments between a pair of
supply and demand markets can be evaluated separately and simultaneously at any
iteration, this algorithmic scheme enables one to exploit the availability of (massively)
parallel computer architectures.
We now provide the closed form expressions for the solution of encountered sub-
problems. In particular, one must first compute: For all supply and demand market
pairs (i,j), i = 1,...,m; j = 1,...,n,

QiZj = max {0,7((-Tri(s ~-) - ~ij)(1 + tij) + pj(d~)) + Qi~}, (4.29)

and then: For all supply and demand market pairs (i, j), i = 1 , . . . , m; j = 1 , . . . , n,

= max {0, 7((-Tri(g ~-) - 6~)(1 + tij) + pj(d~)) + Qi~ }. (4.30)

In view of expressions (4.29) and (4.30), one sees that all of the m n commodity
shipments can be solved simultaneously at each iteration ~-. Hence, an "ideal" com-
puter architecture for the solution of such problems may be one in which there are as
many processors as there are pairs of markets. The convergence results can be found
in Nagurney, Nicholson and Bishop (1995).
388 A. Nagurney Implementation of the Modified Projection Method on the CM-2. In this

subsection some numerical results are presented for the implementations of the mod-
ified projection method on two distinct architectures, the IBM ES/9000, when the
algorithm is implemented in Fortran, compiled, and executed using a single proces-
sor, and the CM-2, when the algorithm is implemented in CM Fortran and executed
on 8K, 16K, and 32K processors. We consider the solution of large-scale spatial price
equilibrium problems with discriminatory ad valorem tariffs.
Specifically, we consider spatial price equilibrium problems in which the supply
price and demand price functions are asymmetric and linear and the transportation
cost functions are fixed.
The CM Fortran code for the implementation of the modified projection method
for the model consisted of an input and setup routine and a computation routine to
implement the iterative steps (4.29) and (4.30). The crucial feature in the design of the
program was the construction of the data structures to take advantage of the data level
parallelism and computation. We first constructed the array C, of dimension m x n,
to store the transportation costs {~{j}. We then constructed the array t to store the
tariff rates, with the (i, j)th component equal to tij.
The supply price coefficients were stored in an m × m array SC, and the demand
price coefficients were stored in an n × n array DC. We also introduced additional
arrays SP and DP to denote, respectively, the supply prices and the demand prices
at a given iteration, where the /th row of SP consisted of the identical elements
{Tri} and the jth column of DP consisted of the identical elements {pj}. To compute
the supply prices, we used the s p r e a d command to spread the supplies and then
multiplied the resulting matrix with the SC matrix. Specifically, the s p r e a d command
makes multiple copies of a vector along columns or along the rows to create a 2-
dimensional array. We then used the sum command to add the elements of each row
of the resulting arrays and added the resulting vector to the vector containing the fixed
supply price terms. The result was then s p r e a d to create the supply prices SP at the
particular iteration. The demand prices were obtained in an analogous fashion. The
array QO was used to storethe values of Q from the previous iteration and was used
for convergence purposes. ~'
We now present the critical steps in the CM Fortran computation section.

Implementation of the Modified Projection Method

Do w h i l e(elT.ge.e)
1. QO(:,:)=Q(:,:)
2. construct SP and DP
3. temp(:,:)=Q(:,:)+'y(DPC,:)-(C(:,:)+SP(:,:))*(I +t))
4. Q(:,:)=temp(:,:)
5. where(tempC,:).lt.0.) Q(:,:)=0.
6. update SP and DP with new Q
7. tempC,:)=QOC,:)+o'(DPC,:)-(C(:,:)+S P(:,:))*(1 +t))
8. Q(:,:)=ternp(:,:)
Ch. 7." Parallel Computation 389

9. w h e r e(temp(:,:).lt.0.)Q(:,:)=0.
10. err--maxval(abs(Q-QO))
11. update supplies and demands
e n d do

Hence, from step 3 above it can be seen that element (i, j) of the array "temp"
contains at the ~-th iteration the value of: 7(pj(d ~-) - (~ij + 7r~(s~-))(1 + tij)) + Q~:~
[cf. (4.29)]. In step 7 above, on the other hand, it can be seen that element (i,j) of
the array "temp" now contains at the ~-th iteration the value of: 7 ( p j ( d ~-) - (~j +
7ri(g'~))(1 + tij)) + Qi~ [cf. (4.30)]. All the variables above followed by a "(:,:)" are
2-dimensional arrays.
Q is updated by using a mask in steps 5 and 9, where the (i,j)th element is set to
zero if the value of temp(i,j) is negative. What is important to note is that, at each
iteration, all of the Q~j's, for i = 1 , . . . , m; j = 1 , . . . , n, are computed and updated
simultaneously. This is not possible when the algorithm is implemented on a serial
architecture with consequences that shall be highlighted subsequently.
Note that the above code can be easily adapted to solve spatial price equilibrium
problems without ad valorem tariffs, but with fixed unit transportation costs, by simply
removing the (1 + t) expression, which we did, as well.
We now turn to the presentation of the numerical results. The problems are large-
scale problems ranging in size from one hundred supply markets and one hundred
demand markets to five hundred supply markets and five hundred demand markets,
that is, with ten thousand to two hundred and fifty thousand commodity shipment
The numerical results for the large-scale problems are reported for both the serial
implementation of the algorithm in Fortran on the IBM ES/9000 and the parallel
implementation in CM Fortran presented above on the CM-2 architecture.
The tariffs t~j were generated randomly and uniformly in the range: [0, 2, ]. A full
description of the datasets, along with additional numerical results, can be found in
Nagurney, Nicholson, and Bishop (1995).
We set the convergence tolerance e = 0.01, and set " / - 0.0001 for all the numerical
examples. Also, we initialized the algorithm for each example w i t h QOj = 0 for all i, j.
The serial implementation of the modified projection method on the IBM ES/9000
yielded the same number of iterations as had been obtained on the CM-2 for each
example. The CPU times are reported in Table 7.3. We report the times for each
example both with the tariffs and with the tariffs removed.
The first example, 100 x 100, consisting of one hundred supply markets and one
hundred demand markets, required 185 iterations for convergence in the absence of
tariffs and 4,611 iterations in the presence of tariffs. The second example, 200 x 200,
consisting of two hundred supply markets and two hundred demand markets, required
286 iterations for the without-tariff case, and 2,951 iterations for the with-tariff case.
The third example, 300 x 300, consisting of three hundred supply markets and three
hundred demand markets, required 250 iterations for convergence for the without-tariff
390 A. Nagurney
Table 7.3
Numerical results for large-scale problems with ad valorem tariffs CPU times in seconds
IBM ES/9000 CM-2 (8K) CM-2 (16K) CM-2 (32K)
Example W i t h o u t With Without With Without With Without With
m xn tariffs tariffs tariffs tariffs tariffs tariffs tariffs tariffs
100x 100 8.80 241.50 7.58 186.72 5.22 130.90 - -
200x200 53.83 632.08 17.23 178.72 14.96 155.19 10.89 112.89
300x300 107.94 >900 21.28 239.96 14.04 158.19 10.05 113.07
400x400 246.31 >900 38.23 523.15 24.92 340.64 17.43 238.29
500x500 880.47 >900 158.82 657.48 120.78 499.26 68.88 284.58

case and 2,796 iterations for the with-tariff case. The fourth example in Table 7.3,
400 x 400, consisting of four hundred supply markets and four hundred demand
markets, required 305 iterations for the without-tariff case, and 4,140 iterations for
the with-tariff case. The final problem, 500 x 500, consisting of five hundred supply
markets and five hundred demand markets, required 686 iterations for convergence
for the problem without tariffs, and 2,825 iterations for convergence for the problem
with tariffs.
It is apparent that the use of a massively parallel architecture for these large-scale
problems realized substantial savings in CPU time over the time required on the serial
architecture. For example, in the smallest problem, 100 x 100, and without tariffs, the
time on the IBM ES/9000 was 8.8 seconds, whereas the time using 16K processors
of the CM-2 was 5.22 seconds. (We did not solve this problem on 32K processors
since there were only 10,000 variables in this size of problem.) In the next largest
problem, 200 x 200, the time on the ES/9000 for the problem without tariffs was 53.83,
whereas the same problem was solved in only 10.89 seconds using 32K processors
of the CM-2, a five-fold improvement. This improvement in relative performance
increased as the size of the problem increased, with the result that the largest problem
in this set, 500 x 500, required 880.47 seconds on the ES/9000 and less than a tenth
of that time, 68.88 seconds, when 32K processors of the CM-2 were utilized. The
largest problem, 500 x 500, and with tariffs, only required about 4 minutes using 32K
processors of the CM-2.

4.3.2. Multicommodity problems

In this subsection we consider a multicommodity version of the spatial price equilib-

rium model described in Subsection 4.3.1, but without ad valorem tariffs, which will
be used as a model for illustrating variational inequality decomposition algorithms.
For additional background, see Nagurney (1993).
Consider again m supply markets and n demand markets but now involved the pro-
duction/consumption of J different commodities, with a typical commodity denoted
Ch. 7." Parallel Computation 391

by k. As before, denote a typical supply market by i and a typical demand market

by j. Let @ denote the supply of the commodity k associated with supply market i
and let Tr~
k denote the supply price of this commodity associated with supply market i.
Let d~ denote the demand for commodity k associated with demand market j and
let p~ denote the demand price associated with demand market j and commodity k.
Group the supplies and supply prices, respectively, into a column vector s C R Jm
and a row vector 7r E R J~. Similarly, group the demands and the demand prices,
respectively, into a column vector d E R Jn and a row vector p E R Jn.
Let Qi~ denote the nonnegative commodity shipment of commodity k between
the supply and demand market pair ( i , j ) and let cijk denote the nonnegative unit
transaction cost associated with trading commodity k between (i, j). Assume that the
transaction cost includes the cost of transportation; depending upon the application,
one may also include a tax/tariff, fee, duty, or subsidy within this cost. Group then
the commodity shipments into a column vector Q E R J~n and the transaction costs
into a row vector c E R Jmn.
The market equilibrium conditions, assuming perfect competition take the following
form: For all pairs of supply and demand markets (i, j): i = 1 , . . . , m; j = 1 , . . . , n,
and all commodities/~ = 1 , . . . , J:

if Qikj* > 0,
~ ) + c~j pj(d*) (4.31)
if Qikj* = 0.

The condition (4.31) states that if there is trade of commodity k between a market
pair (i, j), then the supply price of k at supply market i plus the transaction cost
between the pair of markets associated with trading commodity k must be equal to
the demand price of k at demand market j in equilibrium; if the supply price plus
the transaction cost exceeds the demand price, then there will be no shipment of that
commodity between the supply and demand market pair.
Moreover, the following feasibility conditions must hold for every commodity k,
and markets i and j:

8i = ij (4.32)


392 A. Nagurney

The transaction cost between a pair of supply and demand markets associated with
trading a commodity may now depend upon the shipments of all the commodities
between every pair of markets, that is,

c = c(Q) (4.34)

where c is a known function.

The variational inequality formulation of the equilibrium conditions (4.3 l) is

~(s*). (s - s*) + c ( O * ) . (O - O*) - p(d*). (d - d*) > O,

v(,, Q, d) e K,

where K - I~k=l Kk, where Kk is defined as the set of (s, Q, d), such that constraints
(4.32) and (4.33) are satisfied.
The algorithm that was utilized for the solution of this problem was the linear
Jacobi method with a diagonal matrix A(-), which resolves the problem into single
commodity problems, each of which, in turn, is equivalent to a quadratic programming
problem. The algorithm that was utilized for the solution of the embedded subproblems
was the demand market equilibration algorithm of Dafermos and Nagurney (1989).
In particular, the algorithm in the context of this application, is expressed as follows.

Linear Gauss-Seidel method:

Start with an initial feasible (s °, QO, d o) c K.
At iteration r, construct new supply price, demand price, and transaction cost
functions, which are linear and separable, and given for each commodity k by

r,k,~, O~r~ ~ k ( 0~/k ~ ~,k"~

~r¢ t,s¢)= ~sk (S )s~ + ,Tr~(s')- -~ski (s )s~ ), i = 1,...,m,

pj'r k'dk'
v j j = OP~ gd%d~
, + p)(d r ) - (dr)dj, , j= 1,..,n,
Od~ ~ 3

r,~ k Oc~j r k (k 04 -\

i= 1,...,m; j= 1,...,n,

and solve the variational inequality subproblem for each k, of the form (4.35), which is
equivalent to a quadratic programming problem. The solution is (s r+l, Qr+l,dr+l).
In Nagurney and Kim (1989) a problem consisting of 50 supply markets, 50 demand
markets, and 12 commodities was solved, where the supply price and demand price
functions were quadratic, and the transaction cost functions were highly nonlinear (to
the fourth power).
Ch. 7: Parallel Computation 393

Table 7.4
Speedups and efficiencies for a
N fN f~ SN EN
2 73.34 128.72 1.76 88%
3 55.63 128.72 2.31 77%

In Table 7.4 the speedups and efficiencies [cf. (2.1) and (2.2)] obtained when the
algorithm was implemented on an IBM 3090/600E and compiled using the Parallel
Fortran compiler are reported. Tl* here denotes the time required for the algorithm im-
plemented on a single processor of the system. The task allocation was accomplished
by using the constructs provided in Parallel Fortran.
Additional numerical results can be found in Nagurney and Kim (1989) for both
the linear Jacobi algorithm and the linear Gauss-Seidel algorithm.

4.4. Parallel computation of dynamical systems

In this subsection we consider the computation of dynamical systems via the Eu-
ler method presented in Subsection 3.2.5. We first illustrate the method through an
application to the classical oligopoly problem and then discuss a massively parallel
implementation of the algorithm for the computation of a dynamical systems model
of spatial price equilibrium.

4.4.1. Oligopolistic market equilibria

In this subsection we first briefly review the oligopoly model and its variational
inequality formulation. We then present the dynamical system whose set of stationary
points corresponds to the set of solutions of the variational inequality problem.
Assume that there are m firms involved in the production of a homogeneous com-
modity and a single demand market. Let qi denote the nonnegative commodity output
produced by firm i and let d denote the demand for the commodity at the demand
market. Group the production outputs into a column vector q c R m +.
The following conservation of flow equation must hold:

d = Zqi. (4.36)

Associate with each firm i a production cost f,i, where

£ = fi(qi). (4.37)
394 A. Nagurney

The demand price for the commodity is given by

p = p(d). (4.38)

The profit or utility ui of firm i is then given by the expression

ui = pqi --fi. (4.39)

In view of (4.36)-(4.38), one may write the profit as a function solely of the production
output, i.e.,

u=u(q). (4.40)

Now consider the usual oligopolistic market mechanism [cf. Cournot (1838), Nash
(1950)], in which the m firms supply the commodity in a noncooperative fashion, each
one trying to maximize its own profit. We seek to determine a nonnegative production
pattern q* for which the m firms will be in a state of equilibriumas defined below.

DEFINITION 4.1. A commodity production pattern q* C R ~ is said to constitute a

Cournot-Nash equilibrium if for each firm i; i = 1 , . . . , m,

ui(qi,* q~) >~ ui(qi, qi~), Vqi E R+, (4.41)

where q~ = ( q ~ , . . . *
qi-1, qi*+l' *
" • " , qm)"

As established in Gabay and Moulin (1980), the variational inequality formulation

of the Cournot-Nash equilibrium is as follows.
Assume that for each firm i the profit function ui(q) is concave with respect to the
variables { q l , . . - , qm}, and continuously differentiable. Then q* E R ~ is a Cournot-
Nash equilibrium if and only if it satisfies the variational inequality

- -
£ Oui(q*) (qi . - -
q*) >~ O, Vq E RT+n, (4.42)
i=l Oq~

or, equivalently, q* is an equilibrium production pattern if and only if it satisfies the

variational inequality
m ~} ,
[ fi(q~) ;(Ei=, - ; q; x [q, - q;] o,
i=1 [ Oqi (4.43)

Vq C R~.
Ch. 7: P a r a l l e l Computation 395

We will now put the oligopolistic market equilibrium problem into standard varia-
tional inequality form. Let x be the column vector x = q c R '~, and let F(x) c R ~ be
the row vector with components: ( - ~ ( qi)qll ) ~ " " " ~ _ 0....
(q~)~t~ and K _= {q {q ~> 0},
then variational inequality (4.43) governing the classical Cournot-Nash oligopoly
problem can be placed in standard form.
We now state the ordinary differential equation (ODE) (cf. (3.5) and (3.6)). The
class of pertinent ODEs takes the form:'

~=[I(z,-F(z)), x ( O ) - xo El4. (4.44)

As established in Lemma l in Dupuis and Nagurney (1993), each stationary point

of (4.44), that is, each point in the set of x* satisfying

0 = I1(2",-F(x*)), (4.45)

also satisfies the variational inequality (4.42).

The ordinary differential Eq. (4.44), however, is iaot standard in that the right-hand
side is discontinuous. Nevertheless, as has been established in Dupuis and Nagurney
(1993), the important qualitative and quantitative results of "standard" ODEs will still
be applicable.
We now briefly interpret the ODE (4.44) in the context of the oligopoly model.
First, note that ODE (4.44) ensures that the production outputs are always nonnegative.
Indeed, if one were to consider, instead, the ordinary differential equation: 5: = - F ( x ) ,
such an ODE would not ensure that x(t) ) 0 for all t ) 0, unless additional restrictive
assumptions were to be imposed, such as the assumption that the solutions to the
oligopoly problems lie in the interior of the feasible set [cf. Okuguchi (1976) and
Okuguchi and Szidarovsky (1990)]. ODE (4.44), however, retains the interpretation
that if x at time t lies in the interior of K, then the rate at which x changes is greatest
when the vector field - F ( x ) is greatest. Moreover, when the vector field pushes x
to the boundary of the feasible set K, then the projection I1 ensures that x stays
within K .
Recall now the definition of F(x) for the oligopoly model, in which case the
dynamical system (4.44) states that the rate of change of the production outputs
is greatest when the firms' marginal utilities are greatest. If the marginal utilities
are positive, then the firms will increase their shipments; if they are negative, then
they will decrease their shipments. Therefore, ODE (4.44) is a continuous adjustment
or tatonnement process for the oligopoly problem. Although the dynamical system
provides a continuous adjustment process, a discrete time process is needed for actual
computational purposes. In particular, in the context of the classical oligopoly model,
396 A. Nagurney

one would, at each iteration ~- of the Euler method, compute the new production
outputs for each firm i in closed form as follows

i = max { 0, a~-oui(q'[)
~qi +q[}, foreach i= 1,...,m. (4.46)

Observe that (4.46) is a parallel adjustment process, where in the classical oligopoly
problem all of the production outputs are updated simultaneously. Proof of conver-
gence of the Euler method for this model, as well as for a spatial oligopoly model,
can be found in Nagurney, Dupuis and Zhang (t994).
It is worth noting the similarity between the EuleHype method and the Goldstein-
Levitin-Polyak gradient projection method, cf. Goldstein (1964, 1967) and Levitin
and Polyak (1966), who independently proposed a projection method for minimizing
a continuously differentiable function f : K ~ R, where the iteration ~- takes the
form: x+- = P(x~- - a-~Vf(x~-)) [see also Bertsekas (1976)]. The oligopoly problem,
however, is a variational inequality problem and not an optimization problem, and
although the Euler-type method can be used to solve an optimization problem, the
converse does not hold true, that is, an optimization algorithm cannot be used to solve
a variational inequality problem (unless it can also be cast as an optimization problem,
which would hold in the very special case where the Jacobian of F is symmetric).
It is also worth mentioning that Arrow and Hurwicz (1958a,b) earlier proposed a
gradient method for optimization problems, which was stated as solving a dynamical
system. A discussion of other gradient-type methods, based on both the Hildreth
(1957) and the Arrow-Hurwicz methods, and their application to classical spatial
price equilibrium problems, can be found in Takayama and Judge (1971). A numerical example. We now apply the Euler-type method to compute the
solution to a numerical example. The algorithm was coded in Fortran and the system
used for the numerical work was the IBM ES/9000.
The example is taken from M u r p ~ , Sherali and Soyster (1982). The oligopoly
consists of five firms, each with a production cost function of the form:

l /~i J , l

f+(q+) = c+q+ + (#+~i+ 1) hi ~ qi ~ , (4.47)

with the parameters given in Table 7.5. The demand price function is given by:

Ch. 7: Parallel Computation 397

Table 7.5
Parameters for the 5-firm
oligopoly exmnple

Firm i ci hi fli
1 10 5 1.2
2 8 5 1.1
3 6 5 1.0
4 4 5 0.9
5 2 5 0.8

The convergence criterion was: ]q~~" - q i r-1 ] ~< 0.001, for all i. The algorithm
was initialized at qO = (10, 10, 10, 10, 10). We utilized the sequence: {a-~}=lO x
{1, I l I I I ' .}.
2 ' 2 ' 3' 3 ' 3 ' "
The algorithm required 19 iterations and only a negligible amount of CPU time for
convergence. The algorithm converged to q* = (36.93, 41.81, 43.70, 42.65, 39.17),
reported to four digits Of accuracy.
As reported in Nagurney (1993), the projection method, which would in the above
general iterative scheme [cf. (3.9)] correspond to F,-(z~-) = F ( x T ) with a~ = 7,
for all iterations r, required 33 iterations for convergence to the same solution with
~, = 0.9, under the same initial conditions. The relaxation method, on the other
hand, cf. Nagurney (1993), required only 23 iterations but was more computationally
costly, since at each iteration nonlinear equations must be solved. Also, we emphasize
that the conditions for convergence of both the projection and the relaxation method
are more restrictive than those required by the general iterative scheme described in
Subsection 3.2.5.

4.4.2. Spatial price equilibria

In this subsection we consider a dynamical systems model of a single commodity

version of the spatial price equilibrium model described in Subsection 4.3.2. For
additional background and numerical results, see Nagurney, Takayama and Zhang
We do, however, use an alternative variational inequality formulation, which makes
the massively parallel decomposition by market pairs more apparent. The decomposi-
tion proposed here is of the finest possible for this problem. In particular, we consider
the variational inequality formulation of the problem given by

F ( Q * ) r . (Q - Q*) ) o, VQ c K , (4.49)

here F(-) is the ran-dimensional row vector whose (/,j)th component is given by:
Try(s) + c~j(Q) - pj(d), and the feasible set K is defined as the nonnegafive orthant:
398 A. Nagurney

We now present the ordinary differential equation (ODE), whose set of stationary
points corresponds to the set of solutions of variational inequality (4.49), or, equiva-
lently, to the set of spatial price equilibrium patterns satisfying conditions (4.31), with
the number of commodities J = 1. The pertinent ODE is given by:

(2 = n ( Q , - F ( Q ) ) , Q(o) - QO c K . (4.50)

The intuition behind the dynamical system in the context of the spatial price equi-
librium problem will now be briefly addressed. If Q(t) E K °, that is, in the context
of the spatial price equilibrium problem, all the commodity shipments at time t,
Q(t),.are positive, then the evolution of the solution is directly given in terms of
F : Q = - F ( Q ) , where recall that - F i j ( Q ) = pj(d) - cij(Q) - Try(s). In other
words, if the demand price at a demand market exceeds the supply price plus trans-
action cost associated with shipping the commodity between this pair of supply and
demand markets, then the commodity shipment between this pair of markets will in-
crease. On the other hand, if the supply price plus transaction cost exceeds the demand
price, then the commodity shipment between the pair of supply and demand markets
will decrease. If a stationary point is reached, that is, if (~ = 0 = - F ( Q ) , then the
supply price plus the transaction cost will be exactly equal to the demand price for
each pair of markets and the associated commodity shipments will be positive.
However, if the vector field F drives Q to the boundary of K (i.e. F ( Q ( t ) ) points
"out" of K ) the right-hand side of (4.50) becomes the projection of F onto OK. In
other words, if the commodity shipment is driven to be negative, then the projection
ensures that the commodity will be nonnegative, by setting it equal to zero.
For the computation of the solution to this problem, we applied the Euler method
(cf. Section 3.2.5), where the expression (3.10) now takes the form:

ij = max {0, a.r(pj(d ~) - cij(Q ~) - 1ri(s~)) + Qi~j }
i=l,...,m; j= 1,...,n.

Note that (4.51) is a parallel adjustment process in that each of the m n market
pair subproblems can be solved simultaneously at each iteration. Moreover, each such
subproblem can be solved explicitly in closed form.
In view of the similarity between the iterative step (4.51) and the iterative steps
(4.29) and (4.30), the massively parallel implementation of the Euler method in CM
Fortran is similar to the massively parallel implementation of a single step of the
modified projection method.
For completeness, and easy reference, we present some numerical results. In partic-
ular, we considered spatial price equilibrium problems with linear, asymmetric supply
price functions, linear, asymmetric demand price functions, and quadratic transaction
(transportation) cost functions. We report the results on a set of five examples, the
first example consisting of 100 supply markets and 100 demand markets, with 10,000
Ch. 7: Parallel Computation 399

Table 7.6
CM-2 times for spatial price equilibrium problems nonlinear transportation
Exmnple # of supply # of demand CM-2 time (see.)
markets markets 8K 16K 32K
ASP100 100 100 48.98 37.27 --
ASP200 200 200 165.70 154.19 111.88
ASP300 300 300 263.69 172.95 122.80
ASP400 400 400 544.84 352.61 245.17
ASP500 500 500 1772.58 1214.51 690.65

variables or unknown commodity shipments, and ending with a problem with 500
supply markets and 500 demand markets, that is, with 250,000 variables. The numer-
ical results are reported in Table 7.6 for the examples using 8K, 16K, and, finally,
32K processors of the CM-2.
The algorithm was initialized with Q0 = 0. The convergence criterion used was:
ij - Q i j l <~ e for all i, j, with the tolerance e set to 0.001.
Each example (except for the first, which had only 10,000 variables) was solved
with 8K processors, with 16K processors, and, finally, with 32K processors.
The first example in this set, ASP100, required 2,558 iterations for convergence,
the second example, ASP200, required 5,693 iterations, the third example, ASP300,
took 5,869 iterations, the fourth example, ASP400, took 8,188 iterations, and the fifth
example, ASP500, 13,264 iterations.
We then considered the solution of spatial price equilibrium problems in which
the transportation cost functions were fixed, that is, of the form (4.29) and applied
the Euler method on the CM-2, the CM-5, and the ES/9000. These problems had
been previously solved on the CM-2 in Nagurney, Takayama and Zhang (1995). The
problems ranged in size from 300 x 300 or 90,000 variables for SP300 to 500 x 500
or 250,000 variables for SP500 and the CPU times on the three distinct architectures
are reported in Table 7.7.
Table 7.7 reports the CPU times using 32K processors of the CM-2, and 128 nodes,
256 nodes, and 512 nodes of the CM-5. Only a single processor of the ES/9000 is also
used. The examples (as one would expect) required the same number of iterations for
convergence on the CM-5 as they did, respectively, on the CM-2 and on the ES/9000.
SP300 required 4,483 iterations, SP400 required 8,187 iterations, whereas SP500
required 13,262 iterations. The CPU time on the ES/9000 for SP500 is estimated,
since it became prohibitively expensive to solve it serially.
The numerical results clearly indicate the following. First, it is imperative that an
algorithm be mapped to the appropriate architecture. The Euler method in its real-
ization in the economic equilibrium problem under consideration here is a massively
400 A. Nagurney

Table 7.7
CM-2, CM-5, and ES/9000 CPU times for spatial price equilibrium problems
fixed transportation costs CPU times in seconds

Example CM-2 CM-5 ES/9000

32K 128 256 512 1
SP300 93.31 54.31 45.68 40.92 1,170.59
SP400 243.33 311.54 106.49 90.31 4,034.25
SP500 686.7g 305.88 180.38 133.80 9,600*
*estimmed true

parallel algorithm and, hence, should be implemented on a massively parallel archi-

tecture. Indeed, although the Euler method requires many iterations for convergence,
the total time required for convergence is minimal in the massively parallel implemen-
tation for spatial price equilibrium problems, since each iteration is computationally
inexpensive, because of its simplicity and because the problems are solved simulta-
Second, the ease of portability of the CM Fortran code between the CM-2 and CM-5
was demonstrated. No changes to the code were needed (except for the compilation)
in order to execute the CM Fortran code on the CM-5 which had been developed
for the CM-2. Third, the numerical results on the CM-5 suggest that very large-scale
problems in economics can be solved very efficiently. Indeed, the largest problem,
consisting of 250,000 variables required only approximately 2 minutes for solution.
This is due, partially, to the fact of the layout of the data structures and partially to
the algorithm itself and its implementation in CM Fortran.
Moreover, these results and those in the preceding numerical subsections suggest
that massively parallel computation can enable one to conduct many simulations of
alternative policy interventions such as, for example, different tariff structures, in a
timely fashion.


The research reported herein was supported, in part, by the National Science Foun-
dation under grant DMS 9024071 under the Faculty Awards for Women program.
This research was conducted at the National Center for Supercomputer Applications
at the University of Illinois at Urbana-Champaign, at the Pittsburgh Supercomputing
Center, and at the Cornell Theory Center at Cornell University in Ithaca, New York.
The use of these facilities and the technical assistance provided at these centers are
gratefully acknowledged.
Ch. 7: Parallel Computation 401


Amdahl, G. (1967)'The validity of single processor approach to achieving large scale computing capabil-
ities', in: AFIPS proceedings, pp. 483-485.
Amman, H.M. (1985) 'Applying the Cyber 205 for optimal control experiments in economics', Supercom-
puter, 8/9:71-74.
Amman, H.M. (1989) 'Nonlinear control simulation on a vector machine', Parallel Computing, 10:123-127.
Ando, A., Beaumont, P. and Ando, M. (1987) 'Efficiency of the CYBER 205 for stochastic simulations
of a simultaneous, nonlinear, dynamic econometric model', The International Journal c~fSupercomputer
Applications, 1:54-81.
Arrow, K.J. and Hurwicz, L. (1958a) 'Gradient method for concave programming, 1: Local results', in:
K.J. Arrow, L. Hurwicz and H. Uzawa, eds, Studies in linear and nonlinear programming. Stanford,
CA: Stanford Univ. Press, pp. 117-126.
Arrow, K.J. and Hurwicz, L. (1958b) 'Gradient method for concave programming, IlI: Further global results
and applications to resource allocation', in: KJ. Arrow, L. Hurwicz and H. Uzawa, eds, Studies in linear
and nonlinear programming. Stanford, CA: Stanford Univ. Press, pp. 133-145.
Arrow, K.J., Hurwicz, L. and Uzawa, H., eds, (1958) Studies in linear and nonlinear programming. Stan-
ford, CA: Stanford Univ. Press.
Bacharach, M. (1970) Biproportional scaling and input--output change. Cambridge, UK: Cambridge Univ.
Bazaraa, M.S. and Shetty, C.M. (1979) Nonlinear programming: Theory and algorithms. New York: Wiley.
Bertsekas, D.R (1976) 'On the Goldstein-Levitin-Polyak gradient projection method', IEEE Transactions
on Automatic Control, AC-21:174-184.
Bertsekas, D.P. and Tsitsiklis, LN. (1989) Parallel and distributed computation. Englewood Cliffs, NJ:
Casti, J., Richardson, M. and Larson, R. (1973) 'Dynamic programming and parallel computers', Journal
of Optimization Theory and Applications, 12:423-438.
Chow, G.C. (1975) Analysis and control e)fdynamical systems. New York: Wiley.
Coddington, E. and Levinson, N. (1955) Theory of dif_[erential equations. New York: McGraw-Hill.
Cournot, A. (1987) Researches into mathematical principles t)f tbe theory (~f"wealth. New York: Mcnfillan.
Cray Research, Inc. (1986) 'CRAY X-MP computer systems l:unctional description manual', Eagan, MN.
Cray Research, Inc. (1993a) 'CRAY C90 series functional description manual', Eagan, MN.
Cray Research, Inc. (1993b) 'CRAY T3D system architecture overview manual', Eagan, MN.
Dafermos, S. (1983) 'An iterative scheme for variational inequalities', Mathematical Programming, 16:40-
Dafermos, S. (1986) 'Isomorphic multiclass spatial price and multimodal traffic network equilibrium mod-
els', Regional Science and Urban Economics, 16:197-209.
Dafermos, S. and Nagurney, A. (1989) 'Supply and demmld equilibration algorithms for a class of market
equilibrium problems', Transportation Science, 23:118-124.
Dafermos, S. and Sparrow, ET. (1969) 'The traffic assignment problem for a general network', Journal (~f
Research of the National Bureau ()f Standards, 73B :91-118.
Dantzig, G.B. (1951) 'A proof of the equivalence of the programming problem and the game problem',
in: T.C. Koopmans, ed., Activity analysis (~fproducton and allocation. New York: Wiley, pp, 330-335.
DeCegama, A.L. (1989) The technology (~fparallel processing: Parallel processing architectures and VLSI
hardware. Englewood Cliffs, NJ: Prentice-Hall.
Deng, Y., Glimm, J. and Sharp, D.H. (1992) 'Perspectives on parallel cmnputing', Daedalus, pp. 31-52.
Denning, RJ. and Tichy, W.E (1990) 'Highly parallel computation', Science, 250(30):1217-1222.
Dennis, J.E. and Schnabel, R.B. (1983) Numerical methods Jor unconstrained optimization and nonlinear
equations. Englewood Cliffs, N J: Prentice-Hall.
Dixon, RB., Bowles, S. and Kendrick, D. (1980) Notes and problems in microeconomic theory. Amsterdam:
402 A. Nagurney

Dorfman, R., Samuelson, EA. and Solow, R. (1958) Linear programming and economic analysis. New
York: McGraw-Hill.
Dupuis, E and Naguruey, A. (1993) 'Dynamical systems and variational inequalities', Annals of Operations
Research, 44:9-42.
Eydeland, A. and Nagurney, A. (1989) 'Progressive equilibration algorithms: the case of linear transaction
costs', Computer Science in Economics and Management, 2:197-219.
Finkel, R. and Manber, U. (1987) 'DIB - A distributed implementation of backtracking', ACM Trans. Prog.
Lang. Syst., 9:235-256.
Flynn, M.J. (1972) 'Some computer organizations and their effectiveness', IEEE Transactions on Comput-
ers, C-21:948-960.
Gabay, D. and Moulin, H. (1980) 'On the uniqueness and stability of Nash equilibria in noncooperative
games', in: A. Bensoussan, E Kleindorfer and C.S. Tapiero, eds, Applied stochastic control in econo-
metrics and management science. Amsterdam: North-Holland, pp. 271-294.
Garcia, C.B. and Zangwill, W.I. (1981) Pathways to solutions, fixed points, and equilibria. Englewood
Cliffs, N J: Prentice-Hall.
Gilli, M. and Pauletto, G. (1993) 'Econometric model simulation on parallel computers', The International
Journal of Supelvomputer Applications, 7(3).
Goffe, W., Ferrier, G.D. and Rogers, J. (1994) 'Global optimization of statistical functions with simulated
annealing', Journal of Econometrics, 60:65-99.
Goldstein, A.A. (1964) 'Convex programming in Hilbert space', Bulletin qf the Mathematical Society,
Goldstein, A.A. (1967) Constructive real analysis. New York: Harper & Row.
Guder, E, Morris, J.G. and Yoon S. (1992) 'Parallel and serial successive overrelaxation for multicommodity
spatial price equilibrium problems', Transportation Science, 26:48.
Guder, E Morris, J.G. and Yoon, S. (1993) 'Parallel computation of intertemporal multicommodity spatial
price equilibria in the presence of quotas', Annals of Operations Research, 44:277-298.
Hartman, E (1964) Ordinary differential equations. New York: Wiley.
Hartman, E and Stampacchia, G. (1966) 'On some nonlinear elliptical differential functional equations',
Acta Mathematica, 115:271-310.
Hennessy, J.L. and Patterson, D.A. (1990) Computer architecture: A quantitative approach. San Mateo,
CA: Morgan Kaufmann.
High performance Fortran forum (1992) 'High performance Fortran language specification, version 4', Rice
University, Houston, TX.
Hildreth, C. (1957) "A quadratic programming procedure', Naval Research Logistics Quarterly, 4:79-85.
Hillis, W.D. (1992) 'What is massively parallel computing, and why is it important?' Daedalus, pp. 1-15.
Hockney, R. and Jesshope, C. (1981) Parallel computers. Bristol: Adam Hilger.
Holbrook, R.S. (1974) 'A practical method for controlling a large, nonlinear, stochastic system', Anna& of
Economic and Social Measurement, 3:155-176.
Hughes, M. and Nagurney, A. (1992) 'A network model and algorithm for the analysis and estimation of
financial flow of funds', Computer Science in Economics and Management, 5:23-39.
IBM corporation (1988) 'Parallel FORTRAN language and library reference', Document Number SC23-
IBM corporation (1992) 'IBM enterprise system/9000: Introducing the system', GA24-4186, Endicott, New
IBM corporation (1993) 'IBM 9076 scalable POWER parallel systems: General information', GH26-7219,
Kingston, New York.
Intel corporation (1992) 'Paragon supercomputers', Beaverton, Oregon.
Intriligator, M.D. (1971), Mathematical optimization and economic theory. Englewood Cliffs, NJ: Prentice-
Judd, K.L. (1991) Numerical methods in economics. Hoover Institution, Stanford University, Stanford, CA.
Ch. 7: Parallel Computation 403

Kantorovich, L.V. (1965) The best use of economic resources. Cambridge, MA: Harvard Univ. Press.
Kaufmann III, W.J. and Smarr, L.L. (1993) Supercomputing and the transformation of science. New York:
Scientific American Library.
Kendall square research (1992) 'KSR1 technical summary', Waltham, MA.
Kendriek, D.A. (1973) 'Stochastic control in macroeconomic models', IEEE conference publication no. 101,
Kim, D.S. and Nagnrney, A. (1993) 'Massively parallel implementation of the splitting equilibration algo-
rithm', Computational Economics, in press.
Kinderlehrer, D. and Stampacchia, D. (1980) An introduction to variational inequalities and their applica-
tions. New York: Academic Press.
Kindervater, G.A. and Lenstra, LK. (1988) 'Parallel computing in combinatorial optimization', Annals of
Operations Research, 14:245-289.
Koopmans, T.C. (1951) Activity analysis of production and allocation. New York: Wiley.
Korpelevich, G.M. (1976) 'The extragradient method for finding saddle points and other problems', Eko-
nomicheskie i Mathematicheskie Metody, 12:747-756.
Lasdon, L.S. (1970) Optimization theory for large systems. New York: Macmillan.
Leighton, ET. (1992) Introduction to parallel algorithms and architectures: Arrays, trees, hypercubes. San
Mateo, CA: Morgma Kaufmann.
Lemke, C.E. (1980) 'A survey of complementarity problems', in: R.W. Cottle, E Giannessi and J.L. Lions,
eds, Variational inequalities and complementarity problems. Chichester, UK: Wiley, pp. 213-239.
Lootsma, EA. and Ragsdell, K.M. (1988) 'State-of-the-m-t in parallel nonlinear optimization', Parallel
Computing, 6:133-155.
Markowitz, H. (1952) 'Portfolio selection', The Journal of Finance, 7:77-91.
Markowitz, H. (1959) Portfolio selection: Efficient diversification of investments. New York: Wiley.
Murphy, F.H., Sherali, H.D. and Soyster, A.L. (1982) 'A mathematical programming approach for deter-
mining oligopolistic market equilibrium', Mathematical Programming, 24:92-106.
Nagumey, A. (1993) Network economics: A variational inequality approach. Boston, MA: Kluwer Acade-
mic Publishers.
Nagurney, A. (1994) 'Variational inequalities in the analysis and computation of multi-sector, nmlti-instru-
ment financial equilibria', Journal of Economic Dynamics and Control, 18:161-184.
Nagurney, A., Dong, J. and Hughes, M. (1992) 'The formulation and computation of general financial
equilibrium', Optimization, 26:339-354.
Nagurney, A., Dupuis, E and Zhang, D. (1994) 'A dynamical systems approach for network oligopolies
and variational inequalities', Annals of Regional Science, 28:263-283.
Nagurney, A. and Eydeland, A. (1992) 'A Splitting Equilibration Algorithm for the computation of large-
scale constrained matrix problems: Theoretical analysis and applications', in: H.M. Amman, D.A. Belsley
and L.E Pan, eds, Computational economics and econometrics. Advanced studies in theoretical and
applied econometrics. Vol. 22, pp. 65-105.
Nagurney, A. and Hughes, M. (1992) 'Financial flow of funds networks', Networlc~, 22:145-161.
Nagurney, A. and Kim, D.S. (1989) 'Parallel mid serial variational inequality decomposition algorithms
for multicommodity market equilibrium problems', The International Journal of Supercomputer Appli-
cations, 3:34-58.
Nagurney, A., Nicholson, C.E and Bishop, EM. (1995) 'Spatial price equilibrium models with discrim-
inatory ad valorem tariffs: Formulation and comparative computation using variational inequalities',
in: J.C.J.M. van den Bergh, R Nijkamp and E Rietveld, eds, Recent advances in spatial equilibrium
modelling: Methodology and applications. Heidelberg: Springer, in press.
Nagurney, A., Takayama, T. and Zhang, D. (1995) 'Massively parallel computation of spatial price equilibria
as dynamical systems', Journal of Economic Dynamics and Control, 19:3-37.
Nagurney, A. and Zhang, D. (1996) Projected dynamical systems and wtriational inequalities with appli-
cations. Boston, MA: Kluwer Academic Publishers.
404 A. Nagurney

Nash, J.F. (1950) 'Equilibrium points in n-person games', Proceedings of the National Academy ~fSciences,
Norman, A.L. (1976) 'First order dual control', Annals ~)["Economic and Social Measurement, 5:311-322.
Okuguchi, K. (1976) Expectations and stability in oligopoly mode&, Lecture notes in economics and
mathematical systems, 138. Berlin: Springer.
Okuguchi, K. and Szidarovsky, E (1990) The theory of oligopoly with multi-product firms, Lecture notes
in economics and mathematical systems, 342, Berlin: Springer.
Ortega, J.M. and Voigt, R.G. (1985) 'Solution of partial differential equations on vector and parallel
computers', SIAM Review, 27:159-240.
Pardalos, P.M. and Rosen, J.B. (1987) Constrained global optimization: Algorithms and applications, Lec-
ture notes in computer science, 268. Berlin: Springer.
Petersen, C.E. (1987) 'Computer simulation of large-scale econometric models: Project LINK', The Inter-
national Journal of Supercomputer Applications, 1:31-54.
Petersen, C.E. and Cividini, A. (1989) 'Vectorization and econometric model simulation', Computer Science
in Economics and Management, 2:103-117.
Ralston, A. and Reilly, E.D., eds (1993) Encyclopedia (~f computer science. New York: Vail Nostrand
Samuelson, P.A. (1952) 'A spatial price equilibrium and linear programming', American Economic Review,
Scarf, H. (1964) 'The approximation of fixed points of continuous mappings', SIAM Journal of Applied
Mathematics, 15:1328-1343.
Scarf, H. with Hzmsen, T.E. (1973) The computation ~feconomic equilibria. New Haven, CT: Yale Univ.
Sharpe, W. (1970) PorO~olio theory and capital markets. New York: McGraw-Hill.
Smale, S. (197fi) 'A convergent process of price adjustment and Global Newton methods', Journal of
Mathematical Economics, 3:1-14.
Steele, Jr., G.L. (1988) 'Languages for massively parallel computers', in: Proceedings ~f the IEEE second
symposium on the f?ontiers ~)f massively parallel computations, pp. 3-13.
Takayama, A. (1974) Mathematical economics. Hillsdale, N J: Dryden Press.
Takayama, T. and Judge, G.G. (1964) 'Equilibrium among spatially separated markets: A reformulation',
Econometrica, 32:510-524.
Takayama, T. and Judge, G.G. (1971) Spatial and temporal price and allocation models. Amsterdam:
Thinking Machines Corporation (1990) 'CM-2 technical summary', Cambridge, MA.
Thinking Machines Corporation (1992a) 'CM-5 technical summary', Cambridge, MA.
Thinking Machines Corporation (1992b) 'Gettin~ started in CM fortran', Cambridge, MA.
Thinking Machines Corporation (1992c) 'CMSSL release notes for the CM-200', Cambridge, MA.
Thinking Machines Corporation (1993a) 'CM fortran user's guide', Cambridge, MA.
Thinking Machines Corporation (1993b) 'CMMD user's guide', Cambridge, MA.
Thinking Machines Corporation (1993c) 'Using the CMAX converter', Cambridge, MA.
Varian, H. (1981) 'Dynamical systems with applications to economics', in: K.J. Arrow and M.D. Intriligator,
eds, Handbook ~f mathematical economics, pp. 93-110.
Chapter 8

The real estate price and assets and liability analysis case
Digital Equipment Europe, European Technical Center, France


Institute ~f Systems Science, Singapore


1. Introduction 407
1.1. Historical perspective 407
1.2. Plan of the chapter 408
2. M o t i v a t i o n s for the use o f A I in e c o n o m i c s and finance 409
3. Real estate pricing and lending 411
3.1. Problem analysis and adaptation of AI techniques to decision making goals 41 l
3.2. Short definitions of main generic approaches 413
4. G e n e r i c tasks 415
5. C o n v e n t i o n a l A I approaches 416
5.1. Knowledge-based systems 417
5.2. Natural language processing 418
5.3. Qualitative simulation 419
6. M a c h i n e learning approaches 420
6.1. Introduction to alternative machine lezmling approaches 420
6.2. Tree induction: ID3 422
6.3. Unsupervized learning: Conceptual clustering 424
6.4. Neural processing 425
7. Case-based reasoning in e c o n o m i c s and finance 427
7.1. Introduction 427
7.2. Implementation of case-based reasoning in economics 428
7.3. Potential applications of CBR in economics and finance 432

Handbook of Computational Economics, Volume L Edited by H.M. Amman, D.A. Kendrick and J. Rust
@ 1996 Elsevier Science B.V. All rights reserved.
406 L.E Pau and Tan, Pan Yong

8. Conclusions: Areas ~br further research 433

Terminology (in alphabetical order) 434
References 437
Ch. 8: Artificial Intelligence in Economics and Finance 407

This chapter emphasizes specific contributions, advantages, and weaknesses of ar-

tificial intelligence (AI) techniques for the following purposes:
• knowledge-based reasoning that applies heuristic knowledge in order to make
policy search more robust, or to satisfy trading principles in a competitive econ-
® machine learning to learn the behavior of economic agents
® decision rationalization in decision support systems, and case-based reasoning
• integration of databases and knowledge bases via suitable knowledge represen-
A complete process is proposed in order to relate the business motivations of end
users (banks, financial companies, etc.) to the solution architecture of their problems.
Both AI and non-AI techniques are used in application-specific combinations. A single
case study, that of analyzing and forecasting residential real estate pricing and credit,
is used for illustration throughout the chapter. This case is useful in addressing a
diversity of motivations and techniques, and the problem can be organized into a multi-
level organizational and informational framework. The relative merits of common AI
techniques are discussed and compared with the findings of a comprehensive survey
of operational systems [27].
This chapter cannot by itself be a full textbook account of notions, research, algo-
rithms and software in the AI area; such information, adapted to economic or financial
problems, can be found in [3], while [8] is a Handbook on AI in general (3 volumes).
This chapter also includes a Terminology Index on general notions and/or computer
science terms. This chapter was written in order to reduce the complexities of AI
notions by relating them directly to application cases serving as illustrations of these
more abstract AI notions.
Similarly, this chapter cannot deeply explore the application cases or fields, and the
issues they are faced with; references are provided giving this information.

1. I n t r o d u c t i o n

1.1. Historical perspective

It is time to look back and forward into the applications and research challenges for
artificial intelligence in economics and finance. This broad area was in the late 1970's
labeled as "the one" field where AI would "with certainty and brilliance" make the
deepest inroads, owing to the relatively high proportion of simple formalized heuristic
knowledge and low labor productivity.
Prior to 1990, a large number of banks, insurance or finance companies embarked
on internal developments, hiring a few key AI specialists for overall software ar-
chitectural design and prototyping. Some specialized companies have also emerged,
offering generic approaches using mostly knowledge-based or natural language-based
408 L.E Pau and Tan, Pan Yong

tools tailored to the special needs of generic areas such as insurance, asset allocation,
and formatted message understanding. It was estimated that more than 2000 prototypes
were developed in industry worldwide, most with standard "shells", each requiring
customization efforts of about six man-months [51]. The term "shell" refers to com-
mercial software products offering standard capabilities of some non-specific expert
systems, but without the application to specific knowledge base(s) (see Terminology
For some strange reasons, "open" AI applications in economics and finance was
never a prime area for academic researchers in their academic capacities; the openness
refers here to the integration requirement with other applications or computer systems.
Economics and management science researchers had never showed any interest at that
In the late 1980s, some incisive remarks were made on the above phenomenon,
stressing that bad integration, poor tools and, above all, lack of trust by management,
affected incomplete fielding of these solutions, despite the lack of exhaustive or "ac-
cepted" knowledge. A thorough verification as to the actual use of these prototypes
and later ones [3, 27] led to the result that at most 350 systems were deployed, and
often on a trial basis. This leaves less than 80 "survivors" in 1994, which encom-
pass solutions used in an operational context (separate from development) and which
include identifiable AI technologies in the deployed solution.
These survivors and precursors shared a few recognizable characteristics:
• very narrow and focused functionalities
• non-strategic, but identified and measurable usefulness
• engineering methodologies developed by conventional software and mostly by
internal staff with a dual training/experience in banking and software, and with
only limited exposure to AI research
• heavy use of database systems, with A I representing less than 20% of the total
solution (and often not the critical part, as opposed to database engineering)
• growing use of "complementary" techniques, such as object oriented problem
specification, machine learning and neural networks
• PC target platforms combined with networking and, possibly, with client-server
A major trend of the early 1990s was the growing awareness and use of AI techniques
among management scientists and some economics academics, even though they tend
to "bypass" AI and embrace "newer" techniques such as constraint programming
or neural networks. As to commercial implementations, they are now carried out
routinely and with success.

1.2. Plan o f the chapter

Apart from describing common AI techniques, this chapter also relates them to ap-
plications, and describes each technique from a user perspective. To illustrate this
Ch. 8: Artificial Intelligence in Economics and Finance 409

approach, a unifying case study will be considered, that of real estate pricing and
credit (Section 3). This case study was selected not because it is a dominating one,
but on the basis that it is both a good illustration of the usefulness of AI and a direct
lead to some key areas of further research.
Sections 2 and 4 reflect concerns in the business community about the use of
AI techniques and offer a systematic goal-driven methodology for selecting solution
approaches combining AI and other techniques. In particular, in Section 2 we develop
a linkage between AI techniques and the implementation of "intelligent agents", i.e.,
autonomous programs performing financial or settlement tasks under the control of a
human coordinator.
Sections 5, 6, and 7 discuss common AI approaches, their advantages and dis-
advantages, as well as examples linked to the common real estate case study. The
topics covered include conventional AI approaches (knowledge-based systems, natu-
ral language, qualitative modeling), machine learning (induction, concept learning and
neural processing), and case-based reasoning. The conclusion in Section 8 presents
new and important research and application areas.
A Terminology Annex is added to give some definitions of AI specific terms used
throughout the chapter.

2. Motivations for the use of AI in economics and finance

In the areas of banking, financial services, insurance, economics, accounting, and

related industries, the main operational motivations (M~) for initiating and eventu-
ally fielding system solutions by using significant portions of artificial intelligence
techniques are as described in Table 8.1.
It is not possible here to detail full specifics of each application, mentioned in
Table 8.1, but a short introduction is given below (the reader who is familiar with the
examples above can skip this part):
- Lending advisors. They are decision assistants who analyze all data and qualitative
elements concerning a loan application (consumer, real estate or business loans),
check on consistence with lending policies and tariffs, and who offer advice as
to what to ask for in specific circumstances (exception handling), and ultimately
suggest alternative loan offers
- B o n d trading and allocation systems. These are decision support tools, coupled
to real-time data feeds on the one hand, and to fund holdings databases, on the
other hand; they allow bond traders to place orders and optimize through these
sequential decisions the overall bond portfolio structure in view of allocation targets
(maturities, cun'encies, risks, etc...)
- Real estate appraisal f o r credit. This is an integrated workflow solution [35] allow-
ing all parties in a real estate pricing and loan process to collect, verify, aggregate
all pertinent information, enable parallel processes such as property inspection, land
410 L.F. Pau and 7hn, Pan Yong

Table 8.1

M1 : Development of computer-based solutions allowing for the handling of tasks of a high relative
complexity, as measured versus operator skills or combinatorics, thus leveraging skills/staff
and reducing risks; the complexity involves achieving a compromise anaong the provision of
a consistent level of information, a wide range of capabilities around some specific goals, and
training capabilities
Example: lending advisors
M2: Setting up computer- and communication-based information services, with user specific inter-
faces and dialogue functions, offering a time and/or competitive edge over the other actors oper-
ating in the same domain, and pooling knowledge for the preparation of time-critical decisions
Example: bond trading and allocation system
M3: Replacing paperwork, information consolidation and cumbersome control procedures in routine
operations involving usually distributed agents/information sources, where a high consistency
is needed reflecting a common policy or legal/tax rules
Example: real estate appraisal Jbr credit
Example: liquidation of pension rights
M4: Outright labor, quality, time and money savings in centralized routine tasks, with reduced errors
Example: money transfer telex conversion into verified standard formats
Ms: Ability to upgrade a solution software incrementally with a higher software productivity
Example: cash point machines network management and fault diagnosis

registry queries, insurance coverage, etc .... and reach decisions about a loan and
its conditions
- Liquidation of pension rights'. This is another integrated workflow solution allowing
all parties in a pension payment process to collect, verify, aggregate all pertinent in-
formation, to enable parallel processes such as health checks, disability claims, etc ....
and to reach decisions about the pension payments
- Money transfer telex conversion into verified standard formats. Bank-to-bank money
transfers proceed from a free form specification of the payment to the audit of the
credit, made to the destination account and payee; because the payment data can be
ordered in many different ways due to language, format, address and bank reference
inconsistencies, and because electronic funds transfer systems operate with few strict
formats, a lot of content checking and validation goes into the formatting process;
this is very costly, tedious and suitable for knowledge-based processing
- Cash point machines network management and.fault diagnosis. It is essential for
service reasons to keep the teller machine availability high and, furthermore, to
manage them as one integrated asset, irrespective of communication systems physi-
cal locations, and failures (due to usage or to the machines or networks). Knowledge
can be accumulated as to the operations of the machines and networks, so as to
assist in diagnosing and repairing them quickly.
These classes of motivations may be further related to the location (L~) of the need in
terms of the organizational element (person, building, department, network segment)
Ch. 8: Artificial Intelligence in Economics and Finance 411

where this solution is needed. This is very important as the "cultures" and "business
performance metrics" are very different among these locations.

Table 8.2
L~: . front-officeservices facing the end customer
Lz: generalsupport functions including dealing rooms
L3: productor service specific support functions
L4: internalpolicy making and related information analysis
Ls: auditing, compliance and security functions

More fundamentally, the real question is: what will financial markets and economic
analysis processes look like two decades from now? This question is rarely asked
as vision is not common in those fields. As more financial, economic, and customer
information is disseminated electronically, the ability to manipulate it is also grow-
ing both for professionals and individuals. Transactions will be instantly verified and
settled through global real-time payment systems. The keys to these activities are "in-
telligent agents", stuffed with AI techniques and knowledge. They are the equivalent
of financial "private eyes". These "intelligent agent" programs will zoom along infor-
mation networks scanning all information sources for any figure, clause, or hypothesis
that might be relevant to the specific duties of this agent. They operate on case-based
reasoning principles, but extend them by doing well-understood tasks, from trading to
payment control. The coordinators will be either banks exploiting those niches so that
they can benefit from the collection of agents they control and deploy, or individuals
selecting other agents to suit their needs. These agents will all be categorized by their
motivations Mx and field of action L v.

3. Real estate pricing and lending

3.1. Problem analysis and adaptation of AI techniques to decision making goals

This case [32], at the cross-roads of housing finance and wider economic issues [55],
offers an excellent example of a mixture of techniques, integration, and multi-level
analysis (domains Dx) (from household economics, demographics [50] to macro-
economics) with different information sources and constraints. This combination of
characteristics certainly and fundamentally shows what makes the use of AI in eco-
nomics and finance so challenging. This case is pedagogically useful, as many people
may be able to understand it.
Table 8.3 gives the decision making level (national policy, bank, housing district,
or individual home) and the typical goals or concerns at each such level, with in-
formation analyzed by a variety of AI techniques, most suitable to meet these goals.
412 L.F. Pau and Tan, Pan Yong

Table 8.3
Levels of real estate pricing and credit case

Level Goals Information Approaches

D l: National Acceptablemonetary Demographics Macroeconomic models
aggregates Sectoral economic activity Neural networks
Inflation and employment Money and bond markets Qualitative simulation
Tax and investment laws Inductive logic
D2: Mortgage Assets and liabilities Balance sheet structure Optimization
bank allocation Credit policies Constraint programming
Profitability Branch policies Spreadsheet
Risk control Competition
D3: Housing Detectquality trends Local government policies Case based reasoning
district/ Real estate t a x a t i o n Environment Neural networks
bank branch Supply/demandof housing Land use structure/age Qualitative simulation
Public valuations
D4: Individual Qualityand risk Household finances Expert systems
home assessment Taxation Machine learning
Private valuation Visual observations Probability estimation
Historical/credit records

The techniques are not to be described fully here, but will be introduced in Sec-
tion 3.2.
The following Tables 8.4 and 8.5 will in turn relate the decision making levels, as
just introduced, to where in terms of locations (Lv) these decisions are formed, and
thereafter the tables will link the goals or concerns to the generic motivation classes
(Mx) of Table 8.1.
Such goal-driven analysis and organization-dependent adaptation are essential in
order to have a clear range of options as to generic implementation techniques. It
serves as a guide to the student or newcomer, prompting what to look for when
faced, at a given organizational level and in a given management structure, with some
specific motivations which A I can help achieve.
In the cases mentioned above, information is normally collected and analyzed at
one level, whereas the AI approaches in the rightmost columns must provide outputs
fed to levels upwards and downwards in order, for the techniques selected to achieve
their specific goals listed in the second column. It should also be noticed that there is
a conflict between some of the goals at different levels; for example risk minimization
in property valuation at level D4 will reduce profits at level D2 and increase long-term
interest rates at level D1.
The motivations and locations are different too. Assuming the actor is only
the mortgage bank (other actors would lead to different combinations), Tables 8.4
and 8.5 give the specific locations and connections between goals and motiva-
Ch. 8: Artificial Intelligence in Economics and Finance 413
Table 8.4
Locations in real estate pricing and credit case

Level Locations
DI: National level L4 (economic analysis)
D2: Mortgage bank L3 (assets and liability management)
L3 (treasury)
L4 (commercial lending policy)
L5 (auditing)
D3: Housing district/branch L2 (mortgage payments)
L4 (marketing)
D4: Individual home L1 (bank desk)
LI (valuation expert)

Table 8.5
Goals in relation to motivations in real estate pricing and credit case

Goal/level Motivation
Acceptable monetary aggregates, inflation and M3
Assets and liabilities allocation/D2 M2
Profitability/D2 M2
Risk control/D2 M4
Detect quality trends/D3 M1
Real estate taxation/D3 M2, M5
Quality and risk assessment/D4 MI
Private valuation/D4 M3

3.2. Short definitions o f main generic approaches

We will here briefly introduce some of the generic approaches appearing as alternatives
in Table 8.3:

- M a c r o e c o n o m i c models [30, 45]. They refer to the full range of econometric meth-
ods (based on statistics, identification or control techniques) w h e r e b y one can rep-
resent quantitative m o d e l s of economic systems and estimate the parameters in
such systems from past experience. The models usually i n v o l v e constants (to be
estimated), e x o g e n o u s data, instrument variables (exogenous), and e n d o g e n o u s vari-
ables c o m p u t e d from the model
- Probability estimation [33]. It is a set of statistical or probability estimation tech-
n i q u e s (e.g. m a x i m u m likelihood, least squares, Bayes rule, etc . . . . ) and - by ex-
tension - hypothesis testing procedures, which allow to estimate from quantitative
m e a s u r e m e n t s the probability (and by extension: certainty) or the confidence inter-
414 L.E Pau and Tan, Pan Yong

val of some states, while providing test statistics for them. The states can involve
one variable or a combination thereof
- Optimization [30, 31, 39, 46]. It is a set of linear, non-linear or stochastic algorithms
which will recursively update the variables in a quantitative model, eventually sub-
ject to constraints, in such a way that a quantitative function of these variables will
be maximized or minimized, or achieve a game theoretical equilibrium
- Neural networks (orprocessing) [9, 19]. They refer to approximations to cognitive
processes (see Section 6.4) where the stimuli (usually the input data or observations,
quantitative as well as qualitative) are being analyzed by a network of computational
cells characterized by individual non-linear responses akin to neurons in the brain.
When propagated through the network and after adding up, the stimuli result in
combined values which can be measured at specific cells called output neurons/cells.
By suitable algorithms applied to training data (e.g. input stimuli and corresponding
output responses), one can estimate recursively the weights and parameters applying
to all links between computational cells, and such that the computed output match
as closely as possible the nominal output responses. Once the training weights and
parameters have been estimated, new input stimuli can be applied to the network
corresponding to new situations and forecasts made as to their effects at all levels
including the output cells~this is called feed-forward networks. At the same time,
other algorithms can estimate the best input stimuli matching specified responses;
this is called feed-backward networks. By combining the two types, self-learning
can be achieved if some data (input stimuli and output responses) are supervised.
Whereas these algorithms are usually slow, the feed-forward/backward calculations
are very fast
- Qualitative simulation [3]. It is an advisory technique (see Section 5.3) producing
qualitative values for the endogenous variables in a macroeconomic quantitative
model from range or trend input values of the exogenous and instrument variables.
These qualitative output values are produced by propagating the inputs via essen-
tially grammatic or semantic expressions which specify how two or more range
or trend values (treated as linguistic elements) are combined depending on the
arithmetic or logic operator applying in the quantitative model for the underlying
- Machine learning (also abbreviated as "learning") [20]. It is a generic designation
of all cognitive techniques [33, 34] (see Section 6) leading to algorithms which allow
to elicit knowledge (represented in a suitable knowledge representation format) from
facts in a list, relational or semantic network form
- Inductive logic [22, 23]. It is a learning approach which generates predicates or
production rules from logic facts; these logic facts are logic combinations or asso-
ciations of truth values of some underlying variables. Inductive logic is efficient in
generating knowledge from logic facts and in synthesizing vast amounts of facts
by eliminating logic redundancies
Ch. 8: Artificial Intelligence in Economics and Finance 415

- C o n s t r a i n t p r o g r a m m i n g [56]. It is an approach which covers a range of program-

ming languages (PROLOG-IV, CHIP, etc .... ) which allow to express declaratively
quantitative bounds, as well as logic or qualitative constraints applying to all vari-
ables in any given knowledge rule. If a problem-solving method is enacted, the
language kernel will enforce, transparently to the user, the satisfaction of all con-
straints and prune the search space only for states which satisfy the constraints. In
this way, constraint programming is an advance over optimization as it allows to
perform the same, while including qualitative constraints and knowledge
- E x p e r t s y s t e m [3, 57]. It is a knowledge based system (see Section 5.1) composed
of one or several knowledge bases, combined with the same functionalities as a
"shell" (see above), the implementation being customized or via a commercial
software product
- Case b a s e d r e a s o n i n g [16]. It is a problem solving architecture (see Section 7) com-
bining a data base of facts, called cases, with a learning algorithm; this algorithm
elicits knowledge from these facts, while a proximity measure allows to retrieve
from the data base cases, similar to a new observed pattern, together with a li-
brary of problem-solving methods operating on the knowledge extracted and on the
exceptions hereto. Case-based reasoning also serves as an explanation mechanism
by providing explicit facts/cases and all the processes carried out on them, while
selecting these relevant facts by knowledge-based search methods. Case-based rea-
soning is intermediate between data base applications, and expert systems and calls
for learning when it is found appropriate to add new knowledge from validated
facts to the existing knowledge base.

4. Generic tasks

The analysis of the motivations in relation to the types of information sources and
AI (or other) approaches is essential to allow the users to select AI techniques for
testing and later deployment. It is also essential for researchers to identify which
features are still missing in a given context (formalized here by the triplet Mx, L v,
Dz). Most users do not care about a specific approach, but for the ability to carry out
certain generic tasks (denoted by Tn) linked to the motivations, but constrained by
the locations and information types. Much energy goes into identifying and assessing
whether a given technique in a given operational environment can or cannot perform
a task well understood by the "user". It is often regretted that AI research ignores
such categorization issues, which are on the contrary well practiced in statistics or
operations research, thus implicitly inducing these "users" to select such approaches
The motivations can be linked in tabular form to specific generic tasks. A tabular
relation can also be established between these generic tasks and AI approaches and
others [54]. Whereas these two tables are general, the final selection candidates for
416 L.E Pau and Tan, Pan Yong

a solution architecture, combining several techniques to meet the goals of a specific

application case, will require a table such as Table 8.5 above, which is case specific.
The more important issue in economics and finance, because of the strong influence
of statistics, econometrics, operations research, and control theory, is whether any AI
approach can supplement these established approaches (besides database and integra-
tion techniques), and not the other way round. Spreadsheets are mentioned here on
purpose, because it should be recognized that the current spreadsheet products (which
include conditional IF_THEN_ELSE jumps as well as mathematical programming)
offer most of the functionalities end users are looking for, including real-time data
feeds. Likewise economics formulae used in models are very important [30], and
many tools exist for user friendly quantitative modeling [45].
Such a systematic screening process fully represents a state-of-the-art goal-driven
thinking around AI approaches in economics and finance. It goes beyond the relative
enhanced interest in one given approach at any one point in time by showing why
new approaches become relevant as they serve to fulfill some very explicit goals.
Such a screening fails however at highlighting the importance of efficiency-related
techniques, such as knowledge representation, knowledge acquisition methodologies,
search/matching algorithm efficiency, conflict resolution strategies, semantic disam-
biguation, terminology databases, etc., which are all extremely important, but only
once the solution architecture is defined meeting user goals.
Irrespective of all motivations and tasks, there is always in economics and finance
a set of permanent constraints to be satisfied about risks. The Federal Reserve Board,
FDIC and the Office of the Comptroller of the currency have defined seven categories
of risks in all economic transactions, all of which must be bounded:
• (counterparty) credit risk
• market price risk in view of price variations
• settlement risk
• operating risk, i.e., having people, means and methods to engage in activities
• liquidity risk, i.e., risk of not being able to replace a non-marketable good
• legal risk .~
• interconnection risk between instruments
Probabilistic or uncertainty reasoning is far from enough to deal with these risks.
On the contrary, AI allows for explicit formulation of the risk knowledge and its
incorporation in problem solving approaches.

5. Conventional AI approaches

This section will briefly define and evaluate the more conventional AI approaches,
taken among those mos~ frequently tested in prototype economic and financial appli-
cations, and will illustrate them by the real estate case.
Ch. 8: Artificial Intelligence in Economics and Finance 417
Table 8.6
Typical knowledge-basedsystem structure

• heuristicknowledge, stored in a knowledgebase, formalizedand stored via a knowledgerepresenta-

tion scheme (such as IF_THEN_ELSErules, logic predicates, frames with nested slots, class-instance
object oriented variations, semanticnetworks, etc.)
• a conflict resolution strategy (first found, least recently used, antecedent ordered, consequenceor-
dered, etc.) [10, 28]
• one or several inferenceenginesimplementingsearch algorithms on the chosen knowledgerepresen-
tation data structures (such as forward chahfing,backward chaining,truth maintenance,breadth-first,
etc.) possibly specialized to differentknowledgebase segments
• goal seeking and querying facility, which will allow the user to enter his requests and decode them
into a data structure compatiblewith knowledgerepresentation,thus activatingthe search and letting
•alternative answers be evaluated by the conflictresolution strategy
• possibly:an explanationfacility to justify the reasoningpath from query to conclusion
• possibly:interfaces to external call facilities or from other software modules

5.1. Knowledge-based systems

A conventional knowledge-based system or expert system will consist of the following

key elements (see Table 8.6); the definitions of some terms can be found in the
Terminology Annex, and full tutorial examples can be found in [3, 35].

Example in relation to the case (see [35] for more details). The mortgage bank val-
uator or real estate assessor will visit the property and inspect it and the surroundings,
entering qualitative attributes about it. This goes into the same fact base as another
part of the same knowledge base system which will evaluate the loan applicant's
household finances, tax situation, and the ability to pay back the mortgage. Property
valuation by the assessor will be adjusted by heuristic knowledge specific to the dis-
trict and to the bank's commercial policy. The output will be: loan granting decision,
loan size, loan interest rate, and possibly some tax advice to the applicant.

Advantages. Thanks largely to the widespread use of expert systems shells, or of

libraries of basic elements thereof, many users have tried out expert systems, this
approach representing in 1994 about 85% of all prototypes in the database [27].
Provided the knowledge exists or can be acquired, validated and "accepted", it is a
simple and efficient solution technique, and integration facilities exist today which
did not exist 5 years ago.

Disadvantages. Expert systems or knowledge base systems "fail" almost always to

get into real exploitation, say with an average 80% likelihood, when the knowledge
was either not available or not already in a documented and agreed form, so that the
knowledge acquisition effort is either too costly or risky. Other reasons for failures
in the past were over-sophistication, lack of integration facilities, lack of knowledge
representation standardization, and management control issues.
418 L.E Pau and Tan, Pan Yong
Table 8.7
Typical natural lzulguagemodule

a terminology database containing,besides general language dictionaries, domain specific terms,

with linked pointers to acronyms, synonyms,validity periods, etc., and possibly other languages
® a formal syntactic, grammatical and semantic description of the correct sentences and layouts
• a parser (see TerminologyAnnex) which matches, possiblyby search, a given piece of text with its
formal description and the root instances in the terminologydatabase, and which labels each item
in the text with a whole range of attributes
• possiblya limited form of discourse analysis
• almostalways a document layout language obeying the major document layout architectural stan-
dards (Open Document Architecture ODA/CDA, or SGML), made specific to the particular text
layout, so that contents is spatially tagged
• possiblyan optical character reading (OCR) module

5.2. Natural language processing

A typical natural language module will either serve as a natural language interface
for querying a knowledge-based system or a database (or several), or to automate the
screening for correctness of formatted forms. The constituents of a natural language
module are given in Table 8.7.

Example in relation to the case. The granted loan contracts are scanned and par-
ticulars about the specific properties or loan conditions are parsed for storage in a
database and subsequent aggregation of loan data. It also serves the periodic audit
and review of the specific loans, and to "pool" or "securitize" packages of mortgages
having similar properties.

Example in relation to the case. Using the methodology developed in [49], one can
in limited cases build a semantic causal model from a text description of the real
estate market behavior and link the endogenous nodes in the semantic net to external
quantitative models.

Advantages. If the terminology database is small, the semantic ambiguity is minimal,

and if the document layout structure is relatively fixed, good results can be and have
been obtained by banks and security companies using natural language techniques.

Disadvantages. The costs of the terminology management database only, and of the
natural language technology, are a deterrent to wider use, not to speak of those cases
where semantic ambiguity or sheer text size makes impossible any robust routine

Natural language theory and terminology databases also rely on ontologies, especially
domain ontologies, which are essential, for example, to structure any object oriented
software in finance.
Ch. 8: Artificial b~telligence in Economics and Finance 419

Example. An ontology for developing any software in personal finances has the
following classes:
• "linguistic/terminology object instances" include: checks, checkbooks, ledgers,
paying agents, bank accounts, bank statements, budgets, tax forms, etc.
• "actions" include: writing a check, depositing a payment, balancing the ledger,
preparing a budget, preparing a tax return
• "standard practices" include: check writing, depositing, withdrawing, paying bills,
balancing the books, filing out tax forms, etc.
• "institutions" include: banks, payment agents, merchants, IRS
® "tools" include: checks, checkbooks, ledgers, data management systems
® "faults" include: improperly filed out checks, errors in checkbook, late payments,
missed deposits, negative cash flows, inability to prepare tax reports
• "concerns" include: financial solvency, timeliness of payments, creditworthiness,
payment of taxes

5.3. Qualitative simulation

In almost all economic and financial applications, dynamics, forecasting and model
identification, there are systematic requirements. Needless to say, there is a wide range
of powerful techniques to do this, both on numeric data and on graphical trends. This
last remark is about the so-called "technical analysis", essentially a parsing of temporal
curve shapes by syntactic pattern recognition which, when combined with a knowledge
base of market behavior, leads to powerful trend analysis [3, Chapter 5]. However,
there is no Delphi Oracle and AI comes in with qualitative simulation, where both
the underlying model structure and its dynamic behavior are based on causal links
and trends. The ambition here is to bypass the quantitative model identification and
parametric estimation in order to achieve robust qualitative long term forecasts. A
typical qualitative simulation environment is presented in Table 8.8.

Example in relation to the case. To model quantitatively the housing demand/supply

at the macroeconomic level D1 is very difficult. Qualitative simulation then attempts
to model over time the decision making psychology of investors, owners, and buyers

Table 8.8
Typical qualitative simulation structure
• interval logic to relate quantitative data and quantitative derivatives to trend labels (such as high
growth, low growth, strong decrease, etc.)
• fuzzymembership relations to measure the interaction between two label variables
• a temporal logic, which is a kind of a clock timer, with possibly time interval logic
• aggregationfunctions, spelling out how and when exogenous and endogenous variables are combined
• a temporal causal graph, describing explicitly how the variables interact over time, with strengths
set up by the interval logic and aggregated by the fuzzy membership relations
420 L.E Pau and Tan, Pan Yong

vis-5-vis macroeconomic and demographic aggregates so as to be able to predict

robust cycles and basic trends in housing supply, demands, and used property price

Advantages. There is great potential in just having robust qualitative forecasts, pos-
sibly filtered by heuristic knowledge.

Disadvantages. • The modeling effort is very high and the sensitivity of the results in
the economic or financial field to any of the parameters of the technique is very high.
Therefore, there is a risk of only being able to achieve robust forecasts at such an
aggregate level that their usefulness is only very limited or the delay to obtain them
is too long.

6. Machine learning approaches

6.1. Introduction to alternative machine learning approaches

In an economic world, there are intelligent economic agents who must be able to
change over time as they interact with the markets, as well as through the experience of
their own internal states and processes. Machine learning is relevant in such a changing
world, because it is likely that econometric modeling may fail due to unexpected
changes and that new economic relations are not known explicitly in a timely fashion.
Learning is important as a partial countermeasure to the "knowledge acquisi-
tion/engineering bottleneck", which refers to the cost and difficulty of obtaining
knowledge and afterwards maintaining the knowledge base up to date and consis-
The solution, as evidenced by most attempted uses in economics and finance, is
therefore to begin with a minimal amount of knowledge and learn from examples, ad-
vice, or explorations of the "domain. These techniques of partly automated knowledge
acquisition are known as machine learNng.
The research field of machine learning has been fruitful enough to yield many
different problems and solutions. These algorithms vary in their goals, in their training
data types, and in learning strategies and knowledge representation. However, all of
these algorithms learn by searching through a space of possible concepts to find an
acceptable generalization.
The primary learning task is induction, which is to learn a generalization from a set
of examples. For instance, concept learning in the ID3 algorithm produces IF_THEN
rules from examples. Given examples of some concept, such as "high bankruptcy
risk", the learning system can infer a formal knowledge-based definition that will
allow itself to correctly recognize future instances of that concept. Logic induction is
especially important in economics and management as it will produce logic predicates
Ch. 8: Artificial Intelligence in Economics and Finance 421

from structured tuples of examples, thus allowing for immediate validation against
legal or procedural knowledge expressed in logic or PROLOG. It should also be noted
that logic induction can be speeded up by stating economic "impossibility theorems"
[28] as negative tuples. For example, in our application case of mortgage credit,
mortgage interest rates cannot be very different from long-term bond interest rates.
The induction algorithms are data driven (thus appealing to many economists), do
not use prior knowledge of the learning domain and rely on a large number of examples
to define the essential properties of a general concept. Algorithms that generalize on
the basis of patterns in training data are referred to as similarity-based learning.
In contrast, explanation-based learning uses analogy and other techniques to utilize
prior knowledge and learn from a limited amount of training data. Case-based rea-
soning, for example, stores solutions to problems in their original or slightly modified
form. On addressing a new problem, a case-based reasoning system retrieves a case
it deems sufficiently similar and uses that case as a basis for solving a new problem.
Although similarity-based and explanation-based learning differ in search strategies,
representation languages and amount of prior knowledge used, they all assume that
training data are classified by a "teacher". The learning system is told whether an
instance is a positive example, which belongs to the target concept, or a negative
example, which does not. This reliance on training instances of known classification
defines the task of supervised learning.
In unsupervized learning, an intelligent learning system can acquire useful knowl-
edge in the absence of correctly classified data. For example, in conceptual clustering,
the system has to divide a given set of objects, such as households or companies, into
homogeneous categories. We will examine CLUSTER/2, an approach to the problem
of category formation.
Parallel distributed learning refers to a family of learning models, including neu-
ral networks, which examine the way in which intelligent behavior can arise from
the interactions of large numbers of small, individually simple elements. Whereas
the learning techniques previously mentioned work on symbolic knowledge, neural
networks operate usually on numeric or categorized input data. They represent knowl-
edge in a large number of interconnections among nodes, which are abstractions of
biological nerve cells.
In machine learning, the maturity of learning methods varies from one part of the
field to another.
Supervised inductive learning of classification rules is a well understood technology.
Although we shall only give one example of such methods (ID3) below, many other
algorithms exist and are turned into tools [34]. At least one company (Westinghouse)
reports substantial cost savings resulting from knowledge discovered through inductive
learning [40, 44]. Scaling up inductive methods to handle more complex inductive
learning tasks remains a formidable challenge.
Unsupervized learning and discovery techniques are much less mature, partly be-
cause the goals and criteria for success in these areas are still ill-defined. It has been
422 L.E Pau and Tan, Pan Yong

demonstrated that unsupervized learning algorithms can contribute to new scientific

discoveries [18].
Neural network techniques have been generating much interest in real world appli-
cations recently, although they have associated problems of being considered as "black
boxes" (not able to explain the decision). Furthermore, it is our guess that much work
remains to be done before practical neural network systems can be deployed.
A few of these techniques will be introduced and evaluated below, while case-based
reasoning will be dealt with specifically in the following section.

6.2. Tree induction: ID3

The ID3 algorithm [22, 23] represents concepts as decision trees, a representation that
allows us to determine the classification of an object by testing its attribute value for
certain properties.

Example related to the case. Consider the problem of estimating the mortgage risk
of an individual on the basis of such properties as credit history, current debt, housing
price, and income. Table 8.9 lists a sample of individuals with a known mortgage
risk. The decision tree produced by the ID3 algorithm will represent the relations in
Table 8.9. In the decision tree, each internal node represents a test on some prop-
erty, such as credit history or housing price. Each possible value of that property
corresponds to a branch of the tree. Leaf nodes represent classifications, such as low
or moderate risk. A n individual of unknown type may be classified by traversing
this tree: at each internal node, test the individual's value for that property and take

Table 8.9
Examples of mortgage credit history data from individuals

Individual Risk Credit history Housingprice Income

1. high bad high $0-15k
2. high unknown high $15-35k
3. moderate unknown low $15-35k
4. high unknown low $0-15k
5. low unknown low over $35k
6. high bad low $0-15k
7. moderate bad low over $35k
8. low good low over $35k
9. low good high over $35k
10. high good high $0-$15k
11. moderate good high $15-35k
12. low good high over $35k
Ch. 8: Artificial Intelligence in Economics and Finance 423

the appropriate branch. This continues until reaching a leaf node and obtaining the
individual's classification.
The ID3 algorithm partitions examples with respect to combinations of attribute
values of the examples in L, a set of positive and negative examples, so that it
classifies correctly the given cases. The decision tree algorithm builds the tree so
that all training samples on a leaf of the tree belong to the same class. It adopts an
information theoretic approach by generating splits in the tree each time information
is gained from it about some specific attribute.

Advantages. Tree based induction and most other induction algorithms offer the
capability to generate incremental knowledge tied to dynamic and changing situations
for example in trading situations or in risk evaluation cases. If indeed limited to
small or medium training and validation sets, this gives a significant performance
edge over other economic actors who must stick to one case instead of seeing the

Disadvantages. Although the ID3 algorithm produces simple decision trees, it is not
obvious that such trees will be effective in predicting the classification of unknown
examples. Variations of ID3 have been developed to deal with such problems as noise
and excessively large training sets. For more details, see [23, 24].

It should be noted that many economic induction problems are especially diffi-
cult when operating in the time domain; for example, a logic predicate rule of the

a(t) and b(t) > c not (d(t)) ~ rate-of-change (e(t + 1) > f ) or 9(t + 1) < h

is not a first-order logic rule because there is no existential or universal quantification.

Moreover, rules of this type cannot be expressed even as propositional Horn clauses
since the conclusion is disjunctive. This kind of rule induction is more suitable for
continuous-value prediction tasks by model tree algorithms like M5 (R. Quinlan). In
such induction systems, zero-order is assumed that is: each training example con-
sists of values for a fixed collection of attributes plus the value of the dependent

Example from case. The M5 model-based induction algorithm can learn rules
which predict the mortgage bank proft margin INTD(t) as a nonlinear function of
I N T D ( t - 1) and new mortgages issued price index on houses. This example uses
exactly the same data as the neural network example in Section 6.4.2, which does not
provide "compact" rules as this.
424 L.F. Pau and Tan, Pan Yong

6.3. Unsupervized learning: Conceptual clustering

The clustering techniques begin with a collection of unclassified objects and some
means of measuring the similarity of objects, say among securities or individual
accounts. Its goal is to organize these objects into a hierarchy of classes that meet
some standard of homogeneity, such as maximizing the similarity of objects in the
same class. This is typical of marketing research.
Numeric taxonomy is one of the most ancient approaches to the clustering problem.
Numerical methods rely upon the representation of objects as a collection of features,
each of which may have some numerical value. A reasonable similarity metric treats
each object (a vector of n feature values) as a point in n-dimensional space. The
similarity of two objects is the Euclidean distance between them in this space. We
may extend this numeric taxonomy to objects represented as sets of symbols, rather
than numeric features. A reasonable way to define the similarity of two objects is the
proportion of features they have in common.
However, similarity-based clustering algorithms do not adequately capture the un-
derlying role of semantic knowledge in cluster formation, for example, certain features
of an object might be more important than others.
Conceptual clustering addresses this problem by using machine learning techniques
to produce general concept definitions and apply background knowledge to the for-
mation of categories. CLUSTER/2 [20, 21] is a good example of this approach. It
uses background knowledge in the form of biases on the language used to represent
CLUSTER/2 forms k categories by constructing individuals around k seed objects,
which function as initial cluster centers, k is a parameter that may be adjusted by the
user. CLUSTER/2 evaluates the resulting clusters, selects new seeds and repeats the
process until its quality criteria are met.

Example in relation to the case. At level D3, it is of utmost importance to analyze

portfolios of mortgaged properties about their homogeneity in terms of quality versus
price, in order to avoid risks which are either too large or too small. Conceptual
clustering is applied to such spatially close mortgaged properties, and/or to bank
client groups (i.e. those which have an account just because of the loan, and those
who also buy financial products, or who cover living expenses and cash flow at the
same place).

Advantages. Conceptual clustering offers objective means to categorize the data

across operators and thus helps target marketing, back-office and other operations.

Disadvantages. The selection of the seeds and the interpretation can be delicate. It
is still too sophisticated for many users in banking and insurance.
Ch. 8: Artificial Intelligence in Economics and Finance 425

6.4. Neural processing

6.4.1. Neural network architectures

A neural network appears more attractive than traditional statistical or times-series

modeling approaches for certain problem domains due to its ability to learn from large
number of different input data sources and from the execution speed, once training
is completed. This is especially relevant in finance and risk assessment where clearly
identifiable dominant problem features often do not exist, as opposed to a wide array of
useful data with little explicit and tbrmally modeled interactions. The neural network
is able to abstract and generalize from past experience quantified from the problem
domain, and apply it to new problems.
A typical simple neural network can be visualized as a multi-layered interconnection
of nodes. Input nodes accept data, the hidden nodes extract features/generalizations
from the data, and the output nodes present the verdict of the network to the user (i.e.
either a forecast, a classification, a parameter estimation, or even a sub-optimum).
A neural network learns from examples by modifying, using some training algo-
rithm, the strength of its connections, which are represented by the weights of all
arcs linking the nodes at different levels. The most popular model used in practical
applications is the backpropagation network with its corresponding training algorithm.
The mechanism in a backpropagation network is simple: training data is fed to the
input layer of the network and passed on to the hidden layer, weighted by the inter-
connection strengths. The neurons at the hidden layer sum up the incoming values,
transform this value (nonlinearly) into their output value, and pass it on to the output
layer. Likewise, neurons at the output layer sum up the weighted incoming values.
The error is determined as the difference between the computed output and the de-
sired output (target), and this error is "back propagated" from the output layer to the
input layer. Based on the propagated error, the connection strengths of the neural net-
work, also called weights, are modified. After repeated presentation of training data
(drilling), the network gradually learns the underlying distribution of the data and is
able to generalize to the unknown data.
There are many kinds of neural networks, each classified according to their ar-
chitecture and training algorithms. For different problem domains, the performances
of these networks vary. A list of introductory references is provided for interested
readers to explore neural networks on their own [17, 19, 25, 26].

6.4.2. Neural network applications in economics and finance

Some typical examples are given below:

• Classification of firms
Consider the problem of determining whether a firm, based on its financial perfor-
mance, is likely to go bankrupt. This is a straight-forward classification problem, as
426 L.E Pau and Tan, Pan Yong

the goal is to classify a firm as either bankrupt or non-bankrupt (output class) on the
basis of its financial characteristics (inputs). Once a neural model has been trained,
i.e., the neural network has been presented with both bankrupt and non-bankrupt firms
it adequately discriminates among them. To determine the classification of a specific
firm, its financial ratios are input to the trained network and the network will in turn
provide the user with a classification of this firm.
• Forex forecasting
Time series forecasting is perhaps the most exciting application of neural networks.
The objective is to discover the underlying "structure" of the mechanism generat-
ing the data, i.e., to discover the relationship between present, past, and future ob-
servations. Conventional time series analysis restricts attention to cases where this
relationship is expressed linearly. The nonlinear nature of neural networks extends
their power in modeling this relationship. Very good short term predictions are often
produced with little effort.
• Abnormal trading pattern detection
The detection of abnormal trading patterns in certain financial activities may require
much effort from the auditor. For example, the fraudulent use of ATM cards may
be detected by looking for certain withdrawal or spending patterns. These patterns,
once quantified, can be used for training neural networks. A trained network is able
to screen a huge amount of transaction data for abnormal activities.

Example in relation to the case. A simple neural network using data about
• volumes of mortgages issued, and by whom
• liquidity positions in deposit banks versus national/federal bank
• interest rate time structure for money markets and bond markets
• interest rate time structure for mortgages
• global data on disposal income with households
• price indexes for apartments and houses
can produce excellent forecasts up to one year ahead for price indices for apartments
and houses, for risk exposure by mortgage banks (including risks of refinancing), and
profit margins for the mortgage banks.

6.4.3. Backpropagation neural network

The backpropagation training algorithm, the most frequently used [17, 19, 25, 26], is
an iterative gradient algorithm designed to minimize the mean square error between
the actual output of a multilayer feed-forward perception and the desired output. It
requires continuous differentiable nonlinearities.

Advantages. Neural networks are highly flexible, in that they can be trained in
any problem domain, as long as the data is well quantified and sufficient historical
Ch. 8: Artificial Intelligence in Economics and Finance 427

data is available. Areas where neural networks work best include data classification,
modeling, forecasting, and signal processing such as generating control outputs (for
example a trading position). Neural networks are also of high appeal in finance for
their real-time capability once trained to process data from real-time data feeds and
generate trades or estimates. The choice of the training algorithm is only of secondary
importance as long as the computer platform is powerful enough.

Disadvantages. Neural networks offer almost no explanation capability to justify

how the input nodes are exploited for the generation of the output node results. Also
the statistical performance generally underscores statistical methods, such as projec-
tion pursuit [33]. Computational architectures exist, however, allowing for parallel

7. Case-based reasoning in economics and finance

7.1. Introduction

If you look up case-based reasoning (CBR) in any. of the standard textbooks in Arti-
ficial Intelligence or Economics, or both (e.g. [3, 8]), you would not see CBR listed
in the tables of contents. The topic first emerged in 1990-91, although its premises
have been researched for some time [1, 2]. Very little, if anything at all, has been
published to show its very direct relevance to economic analysis and its extensions
which are the focus of this section.
From an analysis perspective, case-based reasoning refers to reasoning in which a
human problem-solver relies on previous cases, or examples that have been encoun-
tered, to tackle a new problem. The problem-solver recalls previous cases and decides
how they are similar or dissimilar to the present issue being considered. If any of the
previous cases provides any insight, the problem-solver tries to solve the present case
using strategies that have been proven to be actually effective in the previous ones.
On the other hand, if the problem-solver finds that the new problem is different from
the previous cases, the new problem and its proposed solution are stored. In other
words, learning occurs when a new type of case is dealt with.
From a theoretical perspective, CBR refers to a number of concepts and techniques
(e.g. data structures and algorithms) that can be used to record and index past cases,
and a search can be conducted to identify the ones that might be useful in solving
a new case. In addition, techniques are required to modify earlier cases to match
better the new cases, and still other techniques to synthesize new cases when they are
In view of the motivations and goals in the previous sections, CBR in economics
and finance is clearly only of emerging interest, but it offers what is closest to the
motivations M~, M2 and M3, while being also closest to the operating traditions in
428 L.E Pau and Tan, Pan Yong

locations L1, L3, L4 and Ls. It is typical of the "mixing" of different techniques, this
time within one single AI approach, thus supplementing also the integration described
in the real estate pricing and lending case, where there is an integration across levels,
by allowing different techniques to be nested together via results and data.
The primary reason for a strong interest in CBR in economics and finance is that
economic data, economic model structures, and even reasoning, can be heterogeneous
from case to case. In other techniques, one or several of these three dimensions are
usually fixed. Furthermore, there are many problems in reasoning just in terms of
rules, since the rules that embody exceptional cases are often hard to induce and
write and even harder to modify when new cases appear.

Example. How can we benefit for another country from experiences with a given
monetary policy in a given country in terms of selecting the instrument variables to be
manipulated and their amplitudes, considering that some of the data may be different
and the economic structures/models even more so? Likewise in financial analysis, how
can we devise sectoral trend and sectoral correction factors, when analyzing data from
a number of corn'panies from that sector as cases with heterogeneous sizes, product
structures and accounting methods?
The secondary reason why even imperfect CBR is highly relevant in economic
analysis is that it does derive solutions when querying solution goals (instead of just
providing some best match of a prior set of goals and measures) and eventually also
some elementary reasonings valid in the average sense over the case data taken jointly,
as in the case of machine learning.
To summarize, CBR is an attempt to fulfill needs, such as:
• for economists and financial analysts: simple tools that can represent cases and
can be modified quickly
• for model builders and knowledge engineers: simplifying the problems involved
in representing knowledge and rules, and in updating systems. CBR has been
heavily influenced by learning by analogy [4]. Here one set of information is
compared with another and a decision is made about whether they are similar.
One such technique is the derivational-analogy method.
Section 7.2 reviews the basic CBR approaches and concepts, illustrated by examples
of suggested applications to economics, and provides a step-by-step decomposition of
the case-based reasoning implementation and usage. In Section 7.3, we shall identify
the subjects of early CBR projects in economics and finance.

7.2. Implementation of case-based reasoning in economics

All CBR systems include two key ingredients:

Ch. 8: Artificial Intelligence in Economics and Finance 429

• case representation: via groups of features with associated values. Some of the
features, typically collections of objects, rules or predicates, may be actions that
have been taken in the past
• algorithms for indexing, searching for, and modifying cases

Thus a CBR implementation is an environment that allows a developer to represent

knowledge in terms of cases. Consequently, first generation type CBR tools are often
a collection of search algorithms that rest on top of an existing hybrid expert systems

7.2.1. Case representation

In several domains, the initially available information is called "surface" features

and has two characteristics: it covers real situations sparsely and does not allow the
system to reliably discriminate between cases. As a result, the retrieval operations
may return a number of inappropriate cases along with the relevant ones. However,
in many domains the retrieved cases provide a CBR system with a means of ob-
taining additional discriminating features called "deep features", through so-called
"probes". These are data structures which include means to acquire data, and domain-
specific interpretation knowledge which is used on this data to produce the deep
features. The knowledge about probes is encoded in a structure called a validation
Thus, case representation is still the least tested area of work on CBR, in which no
general guidelines exist for the selection of case features and their attributes. Whereas
the focus on rules, frames, predicates or objects alone is certainly not correct, a
mixture may be marginally better. The proposed IEEE Standard P1052 for frame-
based knowledge representation and management of knowledge assets, also called
IMKA, is a step in that direction but as yet untested in relation to other CBR steps.
Therefore, we propose for the time being the following format CR (Table 8.10) with
accompanying example (Table 8.11) as a probably more powerful and flexible case
representation, as tested out in other fields [6, 7]:
A CBR environment should be able'to store thousands of cases. Therefore, the
essence for successful implementation is a result of good case library development,
with identifiers for typical or representative cases. The representation CR above may
seem complex but it is intended to force the identification of the constituents of the
"case space" into segments and then entering typical cases from each segment of the
overall domain of cases.
Knowledge representation structures such as frame systems allow instances to have
unique attributes, while conventional object systems require a unique class to serve
as the template for any instance that is created. Both, however, are insufficient for
CBR and hence the proposed representation CR above.
430 L.E Pau and Tan, Pan Yong
Table 8.10
CR: Case representation in case-based reasoning (CBR)
(see also Table 8.11)
cR-I) each case C is an instance uf the class "case" and can have any number of attributes and values
CR-2) for each case C, its subdomains are subclasses fulfilling CR-I as well
CR-3) tbr each subdomain object (i) of a given case C a collection of structured triplets:
F(i/C) -- (X(j), L(i, j, X), N(j, G(i, j)))
C: case designation
i: object designation
j: vector number
F(i/C): is the symbol for any knowledge triplet pertaining to object "i" belonging to case G'
X ( j ) : quantitative and qualitative feature values vector about (i)
L(i, j, X): list uf feature nude names in X(j) and possibly types (string, word, character,
G(i,j): structured directed labeled graph between nodes in L(i, j, X) with semantic labels,
showing the causal sU'ucture between them
N(j,G(i,j)): list of rules or predicates invoked by features in L(i,j,X) representing the
causal structure in G(i, j)

Table 8.11
Example of case representation in monetary policy
Example: Take as problem area monetary policy as conducted by national banks in different periods; so
the cases in CR-1) is the collection of countries surveyed and of the periods considered.

Take for example G' = Germany.

The subdomalns in CR-2) would typically be any country specific collection of subdomains such as:
{Q: ( M l, M2, M3, fiscal policy, currency controls, investment controls, ...).

Take now for example i = fiscal policy; then we can have in F (fiscal policy/Germany) the following
X ( I ) : (50 000, progressive, 20%, losses on portfolio, 60 000, ...).
L (fiscal policy, *, 1): (ceiling for private taxation, scale, tax rate, deduction type, average net personal
G (fiscal policy, 1): structured directed labeled graph between nodes in L (fiscal policy, *, 1) with semantic
labels (types or name of regulation applicable), showing the causality. In this case many arcs would
go from the instrument variables to the node "average net personal income".
N (1, G (fiscal policy, 1)): list of rules or predicates representing the causal model described by G, with
possibly econometric formulas giving values of attributes in X(1).

Please note in this example that j = 1 is the only such vector instance and that others will normally exist.

7. 2.2. Indexing of cases

T h e r e a r e s e v e r a l w a y s to o r g a n i z e o r i n d e x t h e c a s e s . T h i s is d e s c r i b e d in s o m e d e t a i l
h e r e , s i n c e in S e c t i o n 1 t h e r e w a s a r e m a r k a b o u t a l a c k o f u s e o f d a t a b a s e t e c h n o l o g y
in m a n y A I s o l u t i o n s to p r o b l e m s in e c o n o m i c s a n d f i n a n c e . T h e f o u r c o m m o n c a s e
i n d e x i n g w a y s are as f o l l o w s :
Ch. 8: ArtificialIntelligence in Economics and Finance 431

1. Simple case-based inductive indexing. The CBR system will dynamically develop
a tree using an inductive algorithm like ID3 (see [3] for economic applications).
The end nodes of the CBR tree represent sets of cases rather than specific conclu-
sions. The steps along the way involve tests for the presence or absence of features.
This indexing can also support the retrieval of cases that are similar but not exact

2. Nearest neighbor indexing. If each attribute in each case has a value, matching
a new case among a set of existing ones can be done by standard nearest-neighbor
information retrieval criteria. We calculate the degree to which the new case matches
the existing ones, attribute by attribute. Fuzzy natural language matching and thesauri
equivalence are used. With the type text, a partial match occurs if any individual word
is used in the description of the feature of the presented case and the feature of the
stored case. The character type, a more flexible structure, provides spell-checking ca-
pabilities that achieve partial matches even when the feature being described contains
misspelled words.

3. Hierarchical indexing. A hierarchy tree is developed from the cases. This tree
can be used to screen quickly any new case and then match it with only those cases
in the case library that have characteristics similar to the prototype that is most alike
the new case. The algorithm calculates a score for the presented case and each of
the cases in the case library and selects those with the closest score and presents
them for indexing. This approach exploits the layers of abstract classes between the
root case class and the specific graphs of instances of cases derived from the causal

4. Knowledge guided indexing. If the CBR is built on top of an hybrid expert system
shell, one can use the rules in N ( j , G(i,j)) to reason about which subset of cases is
most alike the new case one tries to match.

5. Combination of indexing techniques. One can finally combine (1)-(4) to provide

efficient full indexing for large applications. This can be formalized by having an
indexing script stating which indexing technique is applied to which set of cases
and in what order. Normally in this event, nearest-neighbor indexing comes last as it
applies weight to attributes.

These weights are match and mismatch weights. The match weight indicates the
importance of the presence of a feature while the mismatch weight indicates how
important it would be if the feature were not present in the case being examined.
These default absent-present weights can be inherited from the root class case to
handle situations in which there is a feature in the new case that is not present in a
stored case. All these weights must be specified exogenously.
432 L.E Pau and Tan, Pan Yong

7.2.3. Case retrieval

There are also several techniques for retrieving cases from the case library when the
user presents a new case. All possible matches are identified and possibly ranked in
case the technique uses weights.

1. Template retrieval. Here one looks for a case in the case library that exactly
matches the new case. This is what database retrieval implements. Each attribute of
the new case must exactly, match the attributes of one or more cases in the case-base.
Otherwise nothing is returned.

2. Hierarchical retrieval. Whenever the cases have been indexed with either induc-
tive or hierarchical techniques, one will search the tree structure, making decisions at
each node until one can go no further. If a leaf-node is reached, then the set of cases
found there is returned. If the retrieval stops short of a leaf-node, then all the sets of
cases that lie beyond that point are returned.

3. Associative retrieval. Associative retrieval is used with nearest-neighbor indexing.

The associative retrieval algorithm [9] examines the attributes of the new case in the
case library, using weights assigned to each attribute of each stored case.

4. Knowledge-based retrieval. In this case the user can write a small set of rules to
guide or filter the search for appropriate cases. This approach relies on a conventional
search algorithm such as forward search or beam-search or constraint satisfaction [3].

The effectiveness of the retrieval is measured by recall and precision [12]. Recall
measures the percentage of relevant information that is retrieved from the database in
response to a particular query. Precision measures the percentage of retrieved infor-
mation that is relevant.

7.2.4. Learn and adopt new cases

In many situations, a good match cannot be found in Step and one must
provide formulas, procedural code, or rules that will allow the system to adapt an
existing case so that it will better match the new case. Also, the end user may adapt
and store new cases that are unique and should be included in the case library in the
future. This process is sometimes called case adaptation.

7.3. Potential applications o f CBR in economics and finance

This section in no way pretends to describe full or solved CBR problems in economics
or finance. Rather, it proposes the implementation of a solution (Section 7.2), and
provides pointers to some possible, yet .untested, applications (Section 7.3). We list
such potential applications below.
Ch. 8: Artificial Intelligence in Economics and Finance 433

Example in relation to the case. Property valuation in a district. As individual mort-

gage applications are being processed by a bank branch, it builds up a library of
cases. These cases are not independent in that property assessment criteria, zoning
infrastructure, municipal real estate taxation rules link them together. CBR can indeed
help carry out a "calibration" of property valuations, with case retrieval providing a
very pragmatic explanation and justification [32, 47].

Example in relation to the case. Securitization of mortgage portfolio. With the

introduction of suitable similarity measures, as explored before by conceptual clus-
tering, the case-based retrieval can provide both a rule base to the characteristics of
a given candidate pool of mortgages and a means to retrieve mortgages to be allo-
cated to pools of mortgage securities with given characteristics. It also provides a
documentation feature to the buyer or seller of the securitized mortgage pool.

Example. Mergers and acquisitions. There have been very few formal approaches
in this area, for very good reasons. Nevertheless, an ex-post case-based analysis of
M&A, based on later performances, would prove to be an interesting testing ground
both for CBR and for financial analysis.

Example. Marketing research. For targeting selected groups, there is a still unsolved
problem in selection of representative elements and evaluation whether a new case
(that is, a new completed survey questionnaire) falls in this or that action category.
Whereas statistical clustering and neural networks show some promise, they are poor
at handling qualitative and causality aspects, which CBR should be able to handle [38].

8. Conclusions: Areas for further research

In the real world of economic analysis, AI approaches must be scalable and tunable
to changing characteristics without having to change the underlying techniques or
algorithms. In economic applications, better explanation-based interfaces are necessary
between the user and the system. The discussion around a set of solutions generated
by a system will invariably turn up features or problems (and not just "knowledge"),
supported with lots of examples retrieved with varying degrees of match. It is essential
to learn from the evolution in economic analysis that feature-centered discussions are
important keys to the debates although sometimes hidden behind modeling issues
[10]. The selection of instrument variables and policies is a feature selection as the
CR case representation in Table 8.11 clearly shows. Reference [5] adds to this the
potential dimension of introducing the behavioral elements of the economic agents.
From the perspective of business, almost no AI research addresses economic com-
petition theories, either with reference to game theory [28, 31] or to disequilibrium
theories. AI search theory has much to gain from revisiting both economic theories
434 L.E Pau and Tan, Pan Yong

and game theory, especially as computer supported cooperative work environments

[29] arise in AI solutions.
On a more fundamental level, the business communities have over the past 5 to 10
years developed a strong understanding of the value of knowledge assets, these be-
ing people-based skills, intellectual property, information, trademarks, process/design
knowledge. However, even when restricted to such a clear issue as interoperable and
exchangeable knowledge bases, to be obtained as a result of the standardization of
some of the basic knowledge representations, the academic AI community prefers to
push for many unproven concepts. Banking, finance, services and administrations are
tired of this and want quickly accepted knowledge representation standards they can
stake their survival and competitiveness on. In this relation, knowledge acquisition is
a far tess important issue.
Economic and financial processes are extremely "brittle" and so are the techniques
used to analyze them. The choice is always between flexible techniques and accurate
ones, but AI can contribute to both, provided it is supplemented by learning and adap-
tation to face changing behaviors, experiments, etc. This brittleness and compromise
suggest that genetic algorithms (GA) [42], combined with heuristic knowledge, are
worth exploring so as to address fundamentally different systems from conventional
symbolic AI. Genetic algorithms allow for gaming, competition and optimization, as
work on double auction tournaments in economics has shown.
The strongest potential of some AI techniques lie in their data/information fusion
capability [6, 28], which was little explored. This holds especially for neural networks,
case-based reasoning, and knowledge representation standards.
In many of the above trends, final success will ultimately depend on a subtle bal-
ance between allowing some AI techniques to be implemented alongside mainstream
techniques and exploring challenging fields, such as machine learning and explanation.

T e r m i n o l o g y (in alphabetical order)

Backward reasoning involvesanother set of operators applied to the goals to convert them to sub-goals
that are easier to solve, this being done recursively untill final simplificationwithin the constraints of
the domain-taskdomain.
Case-based reasoning is a problem solving architecture (see Section 7) combining a database of facts,
called cases, with a learning algorithm which elicits knowledge from these facts, with a proximity
measure allowingto retrieve from the database cases similar to a new observed pattern, with a library
of problem solving methods operating on the knowledge extracted and/or the exceptions hereto. Case-
based reasoning also serves as an explanation mechanism by providing explicit facts/cases mid all
the processes carried out on them, while selecting these relevant facts by knowledge-based search
methods. Case-based reasoningis intermediate between database applicationsand expert systems, and
calls for learningwhenit is found appropriate to add new knowledgefrom validated facts to the existing
knowledge base.
Conflict resolution strategy is a set of control rules in problem solving used to prune among alternatives
or to make one choice when different reasoningpaths are possible. In those cases where performance
functions or attributes apply to each of these alternate paths, a game theory inspired strategy may apply
[31, 46].
Ch. 8: Artificial Intelligence in Economics and Finance 435

Constraint p r o g r a m m i n g is an approach which covers a range of programming languages (PROLOG-IV,

CHIP, etc....) which allow to express declaratively quantitative bounds, as well as logic or qualitative
constraints applying to all variables in any given knowledge rule. If a problem solving method is enacted,
the language kernel will enforce, transparently to the user, the satisfaction of all constraints and prune
the search space only for states which satisfy the constraints. In this way, constraint programming is
an advance over optimization, as it allows to perform the same, while including qualitative constraints
and knowledge.
Expert system (see also: Shell) is a knowledge-based system, composed of one or several knowledge
bases, combined with the same functionalities as a "shell" (see below), the implementation being
customized or via a commercial software product.
Explanation, also called explanation facility, is a tutorial assistant which justifies states resulting from a
problem solving process, w.r.t, to the specific knowledge elements having helped reach the solution(s).
It covers also annotations made to the knowledge besides what is encapsulated in the formal knowledge
Forward reasoning is the result of applying a sequence of operators on a database describing the task-
domain situation to produce modified situations until they satisfy a goal.
F r a m e is a knowledge representation which includes declarative and procedural information in predefined
internal relations belonging to generic types, called slots; a frame has a number of knowledge slots, or
a tree-based hierarchy hereof, for facts about the concept represented by the frame.
Fuzzy membership is a finite function giving the likelihood that a given random variable has a given
value; when opposed to probability theory, which assigns a probability value to a possible value, a
fuzzy membership function assigns an interval of likelihoods to the same. Fuzzy membership functions
are used in reasoning with uncertainty.
G r a m m a r is a formal generative description of the vocabulary found in some language, and of the rewriting
rules (also called production rules), whereby a chain of elements of the vocabulary and of additional
symbols can be replaced by another chain, this process being repetitive and starting with an initial
Heuristics is a process, situation independent, for guessing how to control a problem solving approach.
This involves identifying wrong guesses and recovering from them. The term heuristic knowledge refers
to formalized knowledge (situation independent) describing how this process is implemented.
Inductive logic is a learning approach which generates predicates or production rules from logic facts;
these logic facts are logic combinations or associations of truth values of some underlying variables.
Inductive logic is efficient in generating knowledge from logic facts and in synthesizing vast amounts
of facts by eliminating logic redundancies.
Inference. See: "Problem solving" or "Search".
Knowledge is the symbolic representation of aspects of some named universe of discourse.
Knowledge representation techniques involve routines that manipulate specialized data structures to
achieve problem solving.
Learning is defined as any process by which a system can increase its performance. This may or may not
involve the acquisition of explicit knowledge from examples, associations, analogies, etc ...:
- rote learning: new knowledge is supplied in a form that enhances the performance directly, i.e. by
searching for a wider context
- learning by being told and advice taking
- learning from examples or induction
- learning by analogy
- learning by successive approximations, i.e. by neural networks.
Machine learning is also the genetic designation of all cognitive techniques leading to algorittuns which
allow to elicit knowledge (represented in a suitable knowledge representation format) from facts in list,
frame, relational or semantic network form.
Logic predicate. See "Predicate".
436 L.E Pau and Tart, Pan Yong

Machine learning. See "Learning".

Neural networks refer to approximations to cognitive processes, where stimuli (usually input data or
observations, quantitative as well as qualitative) are being analyzed by a network of computational
cells characterized by individual non-linear responses akin to neurons in the brain. When propagated
through the network and after adding up, the stimuli result in combined values which can be measured
at specific cells called output neurons/cells. By suitable algorithms applied to training data (e.g. input
stimuli and corresponding output responses), one can estimate recursively the weights and parameters
applying to all links between computational cells, so that the computed output match as closely as
possible the nominal output responses. Once the training weights and parameters have been estimated,
new input stimuli can be applied to the network corresponding to new situations and forecasts made
as to their effects at 'all levels including the output cells; this is called feed-forward networks. At the
same time, other algorithms can estimate the best input stimuli matching specified responses; this is
called feed-backward networks. By combining the two types, self-learning can be achieved if some
data (input stimuli and output responses) are supervised. Whereas these algorithms are usually slow,
the feed-forward/backward calculations are very fast.
Nonmonotonic logic encompasses several forms of nondeductive reasoning, such as adopting assumptions
that may have to be abandoned in light of new information or knowledge taken into account. The two
broad classes of nonmonotonic reasoning are reasoning by default and reasoning by circumscription.
Ontology is a categorization of objects, concepts, events or terms, according to domain practice.
Parsing is a procedure whereby linguistic input is decomposed into a tree, or network of trees, helping
determine the function of each word from syntax and other knowledge sources. A parser implementing
parsing can be viewed as a recursive pattern matcher, seeking to map a string of words onto meaningful
syntactic, or grmnmatical, or semantic patterns.
Predicate is a logic expression made of a sentence involving terms and logic quantifiers.
Problem solving. A large body of core ideas, concepts and algorithms that deal with deduction, inference,
heuristics, planning, common-sense reasoning, theorem proving, and related processes (all as defined
in the usual language).
Problem solving model. A problem solving model specifies the problem solving procedure for a specific
problem at a level abstracting it from implementation details.
Qualitative simulation is an advisory technique, producing qualitative values for the endogenous vari-
ables in a macroeconomic quantitative model, from range or trend input values of the exogenous and
instrument variables. These qualitative output values are produced by propagating the inputs via es-
sentially grammatic or semantic expressions which specify how two or more range or trend values
(treated as linguistic elements) are combined depending on the arithmetic or logic operator applying in
the quantitative model for the underlying variables.
Record is a collection of one or more named fields, each containing a symbolic representation of some
aspect of the universe of discourse.
Rule is a statement between objects or concepts, including their quantified or qualitative attributes, of the

IF (sentence) THEN (sentence) ELSE (sentence)

and this constitutes one knowledge representation.

Search. See "Problem solving". The three components of a search process are:
- a database which describes both the current task domain and the goal
- a set of operators that are used to manipulate the database by e.g. heuristics, inference, proving, etc...
- a control strategy for deciding what to do next, in particular which operators to apply and where to
apply them.
Semantic nets are attributed graphs between nodes representing concepts, objects, and events found in the
domain-task space or in any abstract space generated during the search process; links between nodes
describe interrelations, and each link may have a label carrying a meaning.
Ch. 8: Artificial Intelligence in Economics and Finance 437

Shell (also called: expert system shell) refers to generally commercial expert system software products
composed of: a user interface, an editor for some knowledge representation systems, a set of search
procedures, some explanation facilities, and a range of utilities. They allow for fast prototyping of
expert systems.
Slots see "Frame",
Tuple (or t-uple) is a list of typed data elements.

[1] Proc. case based reasoning workshops, DARPA, 1988-1991, Menlo Park, CA: Morgan Kaufman,
[2] Shank, R. and Riesbeck, C. Inside case-based reasoning. Hillsdale, NJ: Lawrence Erlbanm, 1989.
[3] Pan, L.F. and Gianotti, C. Economic and financial knowledge-based processing. Heidelberg, NY:
Springer, 1990.
[4] Russell, S.J. The use of knowledge in analogy and induction, Research notes. London: Pitman, 1989.
[5] Pan, L.E 'Behavioral knowledge in sensor/data fusion', Journal of Robotic Systems, 7(3):295-308,
[6] Pau, L.E Sensor and data Jusion. London: Academic Press, 1992.
[7] Pan, L.F. 'Knowledge representation approaches in sensor fusion', Automatica, 25(2):207-214, 1989.
[8] Handbook (~fartificiat intelligence. Palo Alto, CA: Morgan Kaufman, 1985.
[9] Pat, Y.H. Adaptive pattern recognition and neural networks. Reading, MA: Addison-Wesley, 1989.
[10] Pan, L.F. 'Artificial intelligence in financial services', IEEE Trans. data and knowledge engineering,
[11] Pan, L.F. 'Feature extraction in the time domain: Application to the analysis of financial data in
strategies over time', in: K.S. Fu and A. Whinston, eds, Pattern recognition theory and applications.
NATO ASI Series, Vol. E-22, Leyden, NL: Noordhoff, pp. 75-90, 1977.
[12] Salton 'Another look at automatic text retrieval systems', Comm. of the ACM, 29:648-656, 1986.
[13] Fisher, D. and Langley, P. 'Conceptual clustering and its relation to numerical taxonomy', in: W. Gale,
ed., Artificial intelligence and statistics. Reading, MA: Addison-Wesley, pp. 77-116, 1988.
[14] Kolodner, J.L. and Simpson, R.L. 'The mediator: Analysis of an early case-based problem solver',
Cognitive Science Journal, 13:507-549, 1989.
[15] Lehnert, W.G. 'Case-base problem solving with a large knowledge base of learned cases', in: Proc.
National Col!f. on AI, pp. 301-306, 1988.
[16] Riesbeck, C.K. and Schank, R.C. Inside case-based reasoning. HiUsdale, NJ: Lawrence Erlbaum,
[17] Aleksander, I. and Morton, H. An introduction to neural computing. London: Chapman & Hall, 1990.
[18] Cheeseman, P., Kelly, J., Self, M., Stutz, J, Taylor, W. and Freeman, D. 'Autoclass: A bayesian
classification system', Proceedings of the fifth international conference on machine learning, pp.
54-64, 1988.
[19] March, A., Harston, C. and Robert, R. Handbook of neural computing applications. New York:
Academic Press, 1990.
[20] Michalski, R.S., Carbonell, J.G. and Mitchell, T.M. Machine intelligence: An artificial intelligence
approach, Vol. 1, 1983.
[21] Michalski, R.S. and Stepp, R.E. 'Learning from observation: Conceptual clustering', in: Michalski,
Carbonell and Mitchell, eds, Machine intelligence: An artificial intelligence approach, Vol. 1, 1983.
[22] Quinlan, J.R. 'Learning efficient classification procedures and their application to chess endgames',
in: Miehalski, Carbonell and Mitchell, eds, Machine intelligence: An artificial intelligence approach,
Vol. 1, 1983.
[23] Quinlan, J.R. 'Induction of decision trees', Machine Learning, 1(1):81-106, 1986.
438 L.E Pau and lhn, Pan Yong

[24] Quinlan, J.R. 'The effect of noise on concept learning', in: Michalski et al., eds, Machine intelligence:
An artificial intelligence approach, Vol. 1, 1983.
[25] Trippi, R. and Turban, E. 'Neural networks in finance and investing', Probus, 1993.
[26] Wasserman, P. Neural computing: Theory and practice. New York: Van Nostrand Reinhold, 1989.
[27] Pan, L.E 'Database of 400+ financial and economic fielded applications of AI with their solution
architecture (only available commercially)', continuously updated.
[28] Pan, L.E 'A survey of reasoning procedures in knowledge based systems for economics and manage-
ment', Journal Computer Science in Economies and Management, 3:3-22, 1990.
[29] Easterbrook, S.M., ed., 'CSCW: Cooperation or conflict', NATO ASI proc. Springer, 1992.
[30] Berck, P. Economist's mathematical manual. Springer 1993.
[31] Dutta, B. et al., eds, Game theory and economic applications, Lecture notes in economics and math-
ematical systems, 389. Springer, 1992.
[32] AI-ECON project, SPES Program, European community, Brussels, 1992-1995 (with the participation
of Herriot Watt University, ABN-AMRO Bank, Digital Equipment Europe, Tilburg University, AIS
[33] STATLOG Project, ESPRIT project no 5170, European Community, on the comparison of neural
networks, machine learning and statistics, 1990-1993.
[34] MLT Machine learning toolbox, ESPRIT project 2154, coordinated by ALCATEL Research, 1989-
[35] Pan, L.E and Tambo, T. 'Knowledge based mortgage loan credit granting and risk assessment',
Economic Dynamics and Control, 14:255-262, 1990.
[36] Richter, M.M., et al., eds, Proc. EWCBR-93 1st European workshop on case based reasoning, Uni-
versity of Kaiserslautern, 1-5 November 1993, 1993.
[37] Koselka, R. 'Businessman's dilemma: latest developments in game theory', FORBES, pp. 107-114,
11 October 1993.
[38] O'Leary, D.E. 'Case based reasoning and multiple agent systems for accounting regulation systems',
Journal of Intelligent Systems in Accounting, Finance and Management, 1:41-52, 1992.
[39] Aubin Optima and equilibria. Springer, 1993.
[40] Allen, B.P. 'Case based reasoning: Business applications', Communications qf'the ACM, 37(3):40-44,
[41] Roth, EH. and Jacobstein, N. 'The state of the art o f knowledge based systems', Communications qf
the ACM, 37(3):27-39, 1994.
[42] Bauer, R.J. Genetic algorithms and investment strategies. New York: Wiley, 1993.
[43] Sleeman, D. 'Towards a technology and a science of machine learning', Journal A1COM, 7(1):29-38,
[44] Aamodt, A. and Plaza, E. 'Case based reasoning: Foundation issues', Journal AICOM, 7(1):39-59,
[45] Varian, H. Economic and financial modeling with MATHEMAT1CA. Springer, 1993.
[46] Aumann, R.J. and Hart, S., eds, Handbook of game theory. Amsterdam: North-Holland, 1992.
[47] Gonzalez, A.J. and Laureano-Ornz, R. 'A case based reasoning approach to real-estate property
appraisal', J. Expert Systems with Applications, 4(2):89.
[48] Kolodner, J.L. 'An introduction to case based reasoning', Artificial Intelligence Review, 4, 1992.
[49] Pau, L.E 'Inference of the structure of economic reasoning from natural language analysis', Interna-
tional Journal of Decision Support Systems, 1(4).
[50] Mankiw, N.G. and Weil, D.N. 'The baby boom, the baby bust, mad the housing market', Journal qf
Regional Science and Urban Economics, 19(2):235, 1989.
[51] Kolodner, J. Case based reasoning. Los Altos, CA: Morgan Kaufmann, 1993.
[52] Srinivas, M. and Patnaik, L.M. 'Genetic algorithms: A survey', IEEE Computer, 27(6):17-27, t994.
[53] Peter, E. Chaos and order in capital markets. New York: Wiley, 1991.
Ch. 8: Artificial Intelligence in Economics and Finance 439

[54] Pau, L.E 'Artificial intelligence in economics: A formalization of applications selection', Proc. K1-
94, German AI Conference 1994, Saarbriicken, September 1994, Springer Research Notes Series.
Heidelberg, 1994.
[55] Miles, D. Housing finance market and the wider economy. New York: Wiley, 1994.
[56] Fron, A. Programmation par contraintes. Paris: Addison-Wesley, 1994.
[57] Frost, R.A. Introduction to knowledge based systems. London: Collins, 1986.
Chapter 9


Brown University

Smt~/brd University
University of Chicago

1. Introduction 443
2. Discriminant functions 444
3. The perceptron 445
4. Feedforward neural networks with hidden units 446
5. Classifier systems 448
5.1. The brain as an accounting system 448
5.2. Generality versus discrimination 450
6. Economic applications 450
6.1. Arthur's artificial bandit 451
6.2. Prisoner's dilemma 452
6.3. Representation 454
6.4. More general games 455
6.5. Capital and moral hazard without discounting 457
6.6. A model 458
6.7. Linear strategies 460

* We thank Blake LeBaron, John Rust, mad an anonymous referee for helpful comments. Parts of this
paper draw heavily on nmterials in Chapters 3 and 4 of Sargent's Bounded Rationality in Macroeconomics,
Oxford University Press, 1993.

Handbook qf" Computational Economics, Volume 1, Edited by H.M. Amman, D.A, Kendrick and J. Rust
(~) 1996 Elsevier Science B.V. All rights reserved.
442 L-K. Cho and ZJ. Sargent

7. Conclusions and open questions 462

8. Appendix 463
8.1. Analysis of Theorem 2 463
References 469
Ch. 9: Neural NetworL~.fi)r Encoding and Adapting in Dynamic Economies 443

1. Introduction

During the last 15 years, neural networks have engaged the attention of researchers
in diverse fields of science. This paper is an introduction to using neural networks in
economic dynamics. For macroeconomics and repeated games, we are always looking
for economical ways o f formulating and characterizing equilibria - collections of
strategies that satisfy our usual assumptions about individual rationality and common
understandings. In this paper we adduce several examples that show neural networks
to be fruitful sources of 'functional forms' for conveniently representing equilibria;
for formulating ways of computing them; and for studying their stability.
We begin by describing feedforward neural networks as approximators, and relate
them to statistical discriminant functions. Then we describe how neural nets of varying
complexity can represent equilibria in two repeated games and one dynamic economic
model. These three examples are designed to stretch neural networks in different
directions. We start with a repeated prisoner's dilemma game, and use it to show
the power of the perceptron as a device for encoding strategies. We define linear
strategies, display perceptrons that encode them, and show that for the prisoner's
dilemma game, these strategies are discriminating enough to support all subgame
perfect payoff vectors. This outcome stems from the feature of the payoff matrix for
the prisoner's dilemma game that linear strategies discriminate sufficiently among
alternative histories. In the spirit of Minsky and Papert, we study whether linear
strategies are enough in other two-player games. They are not, but for a general class
of two-player games only a slightly more complex network is discriminating enough
to support all equilibrium payoff vectors.
Inspired by work of Atkeson (1991) and Marcet and Marimon (1992), we then
study a mechanism design game associated with repeated moral hazard in a stochastic
growth model. The capital stock has dynamics chosen by a private sector assisted by
a planner who has access to outside resources but who cannot monitor the investment
level of the private sector and who must therefore rely on some sort of statistical test
to infer the private sector's past behavior. 1 We show how simple neural networks can
be applied to this example to sustain almost every average payoff vector beyond the
autarchy level.
Each of our examples starts from the observation that the simplest neural network -
the single unit perceptron - can implement a trigger strategy. This allows the neu-
ral network to encode enough history-dependent strategies to recover a large set of
equilibria in repeated games and other dynamic economic settings. Our encoding of
history-dependent strategies uses the insight that simple 'recurrent' neural networks
can implement accounting systems for efficiently tracking the rewards associated with
past behavior patterns. The classifier system of Holland, which we briefly recount,
also centers around an accounting system.

JThe existence of the capital stock prevents application of the review strategies of Radner (t985).
444 L-K. Cho and ZJ. Sargent

Thus, we import two themes from the artificial intelligence literature: classification
and accounting. We also touch briefly on a third theme, adaptation. Their thrift in
encoding makes us optimistic about neural networks as devices in terms of which we
shall eventually be able to formulate theories of learning and evolution over history-
dependent strategies. We conclude this essay with some speculations about ways that
these possibilities might soon be realized.

2. Discriminant functions

Players who repeatedly engage in a stage game with k actions must somehow clas-
sify histories into k categories to prompt one of the k actions. To prepare the way,
we briefly describe statistical discriminant functions, a classic tool for studying di-
chotomous or polychotomous choices, and a bridge between classical linear statistical
methods and neural networks. The simplest manifestations of these methods classify
vectors of independent variables z E X for the purpose of predicting some associated
characteristics y E Y. In inspecting these methods, we should keep in the back of our
minds that the X set that shall eventually interest us must somehow summarize the
history of play.
Fisher (1936) used least squares regression to find a linear discriminantfunction
for determining to which of two predetermined classes an individual belongs. 2 We are
given two populations, zl E X1 and z2 c X2, each of k x 1 random vectors, each with
c o m m o n covariance matrix V, but with different mean vectors EXl = #1, E z 2 = t.2.
The means 1.1 and /*2 and the common covariance matrix V are known. Vectors z
are to be drawn from a mixture of the two distributions, with equal probability. We
seek a rule for classifying an :c randomly drawn from this mixture of populations Xj
and X2, by which we mean saying that z belongs either to Xi or X2.
A solution of this problem is attained with the linear discriminantfunction/3'cc-/3o,
where/3 is a k x 1 vector and/3o is a scalar, and which guides our decision according
to the rule

i f / 3 ' z -/3o >>-O, then x is a member of X l ;

i f / 3 ' z --/3o < 0, then z is a member of X2.

For a given within population variance /3'V/3, we select /3 to separate the two
populations as much as possible, where the discrepancy between the two populations
is measured by the criterion /3'(>1 - #2). We choose /3 to maximize /3'(1.1 - 1.2),
subject to/3'V/3 = c, where e > 0 is a constant, by setting

/3 = A-'V-l(1*1 - 1.2), (2)

2See Kendall (1957).

Ch. 9: Neural Networks'./'orEncoding and Adapting in Dynamic Economies 445

where A is a Lagrange multiplier on the constraint/3'V/3 = e. By setting the multiplier

equal to one we pin down the variance c in the constraint, and so set 3o (1) according
to/30 = / 3 ' ( # t + #2)/2.
When the means (#1, #2) and covariance matrix V are not known a priori, they are
estimated by sample means and covariances, where the sample covariance is estimated
by pooling observations across the X1 and X2 populations. Sample estimates are sub-
stituted into (2) to obtain the sample discriminant function. When Xt and X2 are each
multivariate normal, the discriminant function (1) implements a likelihood ratio test.

3. The perceptron

Linear discriminant functions can he represented with a single-layer perceptron, the

simplest type of neural network) The perceptron models the interaction of k input
neurons x a , i = 1 , . . . , k , with one output neuron Yr. The neurons are elements
xi E X and y C Y, where the spaces X, Y can be specified in various ways. For
'Ising neurons', X = { - 1 , 1}; for classification perceptrons we take Y -- {0, 1}. a
perceptron is a function

Yt = A(w' act),

where w and xt are each (k x 1) vectors, and A is a 'squasher' function, i.e. a

monotonically nondecreasing function that maps R onto [0, 1]. Three popular squasher
functions are:
The Heaviside step function:

1, forz~>0;
A(z)= 0, forz<0.

The 'sigmoid function':

A(z) -
1 +e -z"

Any cumulative distribution function, e.g. the normal c.d.f.:

l f_ z ( -z2 )
A(z) - VLT~
~/a-Y-cr exp dx.

A perceptron first weights input xit by wi; then weighted inputs are summed across
i and squashed to yield output Yr. With the Heaviside squasher function, the neuron

3See White (1992) and Weisbuch (1990) for extensive treatment of the material of the next several
subsections. See Rubinstein (1993) for an economic application of perceptrons.
446 L-K Cho and T.J. Sargent

is either 'on' (Yt = 1) or 'off' (yt = 0). For non-negative inputs, a positive weight wi
from input / to the output neuron is called an 'exciting' connection, and a negative
weight is called an 'inhibiting' connection, because such connections make it more
or less likely that the output neuron will 'fire'.
For a given set of weights w and the Heaviside step function as the squasher, a
perceptron represents a particular linear discriminant function or classifier. The neuron
'fires' (Yt = 1) if and only if wtxt >~ 0. 4
Given a 'training sample' {Yt, xt}~=l, where t denotes a particular individual, the
perceptron training problem is to choose w to minimize ~']~t=~(Yt - A(w' xt)) 2. This
is a nonlinear least squares problem. The literature on perceptrons describes various
algorithms of the iterative form

wt : w t - t + a V A ( w t - 1 , xt)(Yt - A(w~_,xt)),
where {Tt} is a nonincreasing sequence of positive numbers, and V A ( w , x) is the
gradient of A with respect to w. One scheme is to pick 7t equal to a small positive
constant, and repeatedly to run the sample {yt,xt}t=l5° through the algorithm until
convergence occurs. This recursive algorithm has an interpretation as a stochastic
approximation scheme, a class of learning algorithms described in Ljung, Pflug, and
Walk (1992). 5 Implementatigons of nonlinear least squares found in the econometrics
literature also apply directly.

Perceptrons and discriminant analysis

Early work on the perceptron focused on determining classes of objects which could
or could not be separated by a perceptron. Results of Minsky and Papert (1969) caused
a long period of disenchantment about the utility of the single unit perceptron because
it can represent a linear discriminant function, but not a nonlinear discriminant func-
tion. Nonlinear discriminant functions are needed for many classification problems,
including the famous X O R problem studied by Minsky and Papert. Enthusiasm for
perceptrons revived in the early 1980s when it was recognized that networks of per-
ceptrons can approximate any nonlinear discriminant function by including 'hidden

4. Feedforward neural networks with hidden units

By arranging banks of perceptrons into rows and linking elements of successive

rows via weighted summation operators, we construct afeedforward neural network.

4With one of the other two squasher functions above, the value of yt renders a probability C (0, 1)
that an individual is classified in one of the sets.
5Also see Ljung (1977), Beneveniste, M6tivier, and Priouret (1990) and Ljung and S6derstr6m (1983).
Ch. 9." Neural Networkv for Encoding and Adapting in Dynamic Economies 447

Halbert White and various co-workers 6 have shown that feedforward neural networks
are well regarded as approximators of nonlinear functions 9 mapping vectors x c X
into vectors y E Y.
A feedforward neural network with one hidden layer is described by the two equa-
y~ = Oo + E Ojatj~
atj : A wjixit •
\i=l /

The second equation describes the output atj of hidden unit j , which is simply
a perceptron. The first equation generates the output Yt of the network by taking an
affine function of the (q × 1) vector at of outputs of the hidden units. The parameters of
the network are the weights wji and 0j. The network can be represented compactly as

yt = Oo + E OjA(wj 'x~). (4)


The literature has addressed two issues about models of this class: representation
and estimation. The representation or approximation literature delineates the class of
functions 9 : X --+ Y that can be arbitrarily well approximated by model (4). Hornik,
Stinchcombe and White (1989) have shown that a very wide class of functions can
be approximated by (4). 7 The parameter that controls the proximity of approximation
is q, the number of hidden units. For a given q, the best mean-square approximator
to a function 9(x) is determined by the values of (0, w) that minimize the squared

g(x) - Oo+ ~ OjA(wj tx) 2. (5)


Here [1 - 112 is the L2 norm. For a given q, the best approximator can be found by
versions of Newton's method. The approximation literature assures us that, if we
select q large enough, we can find a (0, w) that make this mean squared error as small
as we want.

6See White (1992) and Gallant and White (1988).

7They show that, if enough hidden units are included, then any Borel measurable function from one
finite dimensionalspace to another can be approximated arbitrarily well. Barron (1991) showed that to
achieve the same approximationrate a feedforward network uses only linearly many parameters (O(qn)),
while polynomial,spline, and trigonometricexpansionsuse parameters that grow exponentially(O(qn)).
448 L-K. Cho and ZJ. Sargent

The estimation problem occurs when we are given a sample {yt, x t}t.=l T and a
particular model of the form (4), with fixed q, and want to estimate the parameters.
This is a version of the nonlinear regression problem. The model to be estimated can
be written in the form

y t : g ( x t , / ~ ) + ct,

where we have stacked the parameters (0, w) into the vector/3. Estimation can pro-
ceed by utilizing one of a variety of algorithms based on the methods of stochastic
approximation. The 'training' algorithms discussed in the literature all use versions
of these iterations. There exist 'on-line' algorithms that have asymptotic properties
equivalent to 'off-line' algorithms. However, for small samples one can generally do
better by using an 'off-line' algorithm.

Recurrent networks

The second equation of (3) can be modified in a way that lets us model dynamics.
For example, we could specify

atj = A ( w j 'xt + "/j ' Y t - l ) ,

where 3'j is a vector of parameters. In the special case that wj = 0 for all j, this is
an autonomous dynamic system, one that can be used to represent the systematic part
of a nonlinear vector autoregression. Alternatively, we can specify

atj = A ( w j 'xt + 5j ' a t - l ) ,

where at-1 is the vector of values of a t - l , j , and d is a vector of additional parameters.

This kind of network has been used by Elman (1988), and captures feedback to the
hidden units from past values of hidden units.
A recurrent network can easily be set up to build up statistics (e.g., sums and sums of
squares) of past inputs to or outputs from a network, thereby providing the flexibility
to implement various accounting devices. Accounting devices play important roles in
various applications of networks to decision problems and economic models.

5. Classifier systems

5.1. The brain as an accounting system

A good illustration of an accounting system is John Holland's 'classifier system'.

This is a dynamic accounting system over pieces of decision rules. It consists of an
evolving and more or less comprehensive list of 'if-then' statements, which Holland
Ch. 9: Neural Networks ,for Encoding and Adapting in Dynamic Economies 449

calls classifiers that map conditions into actions; a set of interpretations of these
classifiers to enable them to make decisions through time; an accounting system
that cumulates payoffs associated with past experiences; and an auction for making
decisions in 'real time'. The system has a list with a fixed number of classifiers, which
compete for survival.
When a message from the environment enters the classifier system, sometimes
several classifiers have their 'if' parts satisfied, but their 'then' parts might differ, so
that the classifiers are offering conflicting advice. An 'auction' resolves the conflict,
with classifiers bidding to spend 'balances' accumulated via an accounting process
that registers the consequences of past decisions. The accounting system is the vehicle
by which the classifier system learns to alter its behavior over time.
Formally, a classifier system consists of the following objects.
Bit strings (classifiers). A classifier is a bit string of fixed length, written over the
trinary alphabet 0, 1,#, where # is interpreted as 'either 0 or 1' or 'I don't care'.
The first part of each bit string is interpreted as encoding a 'condition' statement,
while the remaining bits encode an 'action' statement. The presence of the # sign
accommodates generalization.
A decoding device. When a state occurs in the environment, this device identifies
which of a fixed collection of classifiers match in their 'condition' parts the condition
prevailing in the environment. The device thereby identifies a set of relevant classifiers,
from which one is to be selected actually to make a decision at the moment.
An accounting system. A measure of value called strength is assigned to each
classifier in the system at each point in time. Strengths for each classifier are updated
over time in response to the utilities and costs that flow from the environment when
the classifier acts. The accounting system computes cumulated averages of realized
utilities net of costs. In sequential settings, the accounting system taxes classifiers
operating at one stage, and awards the proceeds to the classifier at the immediately
preceding stage of the decision tree whose decision moved the system to the position
that gave the presently active classifier the opportunity to act. Setting up the accounting
system in this way is important to induce decisions whose only rewards are that they
facilitate subsequent decisions that will ultimately generate rewards.
An auction system. The auction system determines which of a set of matched
classifiers is granted the right to act in any given situation. Two alternative auction
principles are:
(a) The highest strength rule gets to make the decision.
(b) The right to make a decision is allocated probabilistically, with the probability of
being granted the decision made equal to a rule's relative strength.
A device for introducing new classifiers. New classifiers are introduced in several
(a) Uncovered situations. The most obvious occurs when the environment produces
a condition that matches no existing classifiers. In this situation, a new classi-
fier is generated whose condition statement matches the existing environmental
condition, and whose action part is randomly generated.
450 L-K. Cho and T.J. Sargent

(b) Try something new. New classifiers are generated and old ones are occasionally
extinguished in order to provide room for experimenting with untried actions.
(c) Generalize and specialize. New classifiers are synthesized to generalize (replace
0's and l ' s with #'s), or to specialize (replace #'s with 0's or l's in existing rules).

5.2. Generality versus discrimination

An author of a classifier system controls the list of states or conditions to encode, the
scheme for encoding them, and the accounting system that distributes rewards and
collects 'taxes'. 8 Thus, some 'hard wiring' goes into the construction of a classifier
system, much of it being done with an eye to the particular problem at hand. In some
environments, what is not hard-wired is the degree of generalization.
The problem of learning induces an incompletely understood tradeoff between 'gen-
eral' rules (those with 'conditions' that are coarse and therefore are often met) and
'specific' rules (those with conditions that are fine and therefore less frequently met).
An advantage of 'general' rules is that their conditions are frequently encountered,
which means that their performance can be assessed frequently. A disadvantage is that
they call for the same action for all states that satisfy the condition. In effect, general
classifiers give the advice: 'use a piece-wise constant decision rule over the subset of
the state space that I cover'. Specific decision rules have the opposite advantages and
disadvantages. They potentially permit fine-tuning the action to fit the specific point
in the condition space, but they pay lbr that advantage by requiring longer histories
of experience to learn. 9
An interesting property of classifier systems is that they can be set up in ways that
permit the degree of generalization or specificity to emerge adaptively. The presence of
the # (or 'I don't care') symbol in the alphabet, together with devices designed either
to generalize or specialize, 1° provide this capacity. The literature contains intriguing
simulation examples in which different degrees of generalization have emerged in
classifier systems, but at the present time little is known about general principles that
determine their propensity to generalize.

6. Economic applications

We now illustrate how some of these ideas can be combined, varied, and applied
to study economic equilibria. As promised, the following ideas will be exploited:
the representation of strategies in terms of hyperplanes, implemented in terms of

8In sequential problems, the author must also link classifier sub-systems 'intertemporally' in a ways
that permit learning to experience the rewards of patience.
9There are similar tradeoffs between parametric and non-parametric estimation strategies in econ-
mSee Marimon, McGrattan and Sargent (1990).
Ch. 9: Neural Networks fi)r Encoding and Adapting in Dynamic Economies 451

linear functions via perceptrons; the accounting for history by averaging past payoff
information; and an attempt to get by with very general rules that make minimal
distinctions among histories.

6.1. Arthur's artificial bandit

Brian Arthur's (1989) application of the Holland classifier system to a two armed
bandit problem illustrates both the role of the accounting system in recording past
payoff information, and the potential for the classifier system to 'learn'. The ith arm
pays a random variable xit drawn independently and identically from a distribution
Fi with mean #i. Assume that/£1 >/£2. A player must pull one arm for each t with
1 ~< t ~< T, with his reward being his total payoffs. The player knows neither F1
n o r f 2.
Arthur let the classifier system consist of the two classifiers,
Here the first entry encodes the 'condition' and the second entry encodes the 'ac-
tion'; # means 'whatever the history of observations', 1 means pull the first arm, and
0 means pull the second arm. The conditions of both classifiers are met all of the
time. The first classifier plays arm 1 all of the time, and the second plays arm 2 all
of the time. Let ri (t), i = 1,2, be a clock recording the cumulative number of times
that arm i has been pulled as of time t, which equals the cumulative number of times
that classifier i has acted. Arthur set up the accounting system

Xi~-~ -- ~'i~-~- 1

where S ~ is the 'strength' of classifier i after it has been rewarded with its payoff
when its clock is at ri. It is easy to show that Si~ records the average payoff associated
with playing arm i.
Arthur used a random device based on relative strengths to determine which clas-
sifier to obey at time t. In particular, at time t + 1, the system obeys classifier 1 with
probability 7rlt+l given by

S1~ (t)
7rlt+l = S l r l ( t ) -{- S2T2(t) '

By using methods of stochastic approximation, Arthur showed that the classifier sys-
tem eventually 'probability-matches', that is, plays the arms in fractions-over-time
that are proportional to their expected rewards. 11

llSee Arthur (1991) for discussion of various ways of altering the classifier system to improve its
452 L-If. Cho and T.J. Sargent

6.2. Prisoner's dilemma

In Arthur's model, histories are summarized by the average revenue generated by each
arm, and the probability of pulling an arm is determined by a sigmoid function. T h i s
section extends to repeated games the idea of using linear summaries of past histories.
We will focus on pure strategies and so instead of the sigmoid function, we shall use
the Heaviside function to assign a particular action conditioned on the linear summary.
To illustrate the main idea, we shall study the infinitely repeated prisoner's dilemma
game whose component game is

G C [3,3 0,4]
D [4,0 l, lJ" (6)

In period t, player i chooses an action s~ C {C, D}, and s t : (s~, s~) is the realized
outcome. Let h T = ( S l , . . . , s T ) (where s t E { C , D } 2 for ~ = 1 , . . . , T ) be a history
in period T + l, and let H T be the set of all histories in period T + 1.
A repeated game strategy fi is a mapping from a history to an action:

f{ : H : U HT` --+ { C , D }

where h ° is the null history. Let { o - t ( f ) } ~ l be the sequence of outcomes induced by

f = ( f l , f 2 ) . Player i's objective function is

liminf 1 ~ ui(st)"
7`--+00 T

The strategy f,i is complex because it takes a history h T C H T as an input, and

because the number of elements in H T is 227`, which increases exponentially as
the game proceeds. The astonishing eventual size of the argument of f.i makes it
implausible that f~ literally describes the strategic plan of a real human being. Yet
game theory routinely assumes that a rational player implements f~.
Models with bounded rationality are motivated by the suspicion that this theory
attributes excessive computational power to players. Here we take a particular way of
modeling a strategy in a repeated game. In each period, a player observes an outcome
s ¢ {C, D} 2 and forms a perception c~(s), where c~ : S = {C, D} 2 --+ R. a history
h T = ( s l , . . . , s T) is summarized by the aggregate ~tT_l oh(st). The player's decision
is determined by a single threshold decision rule: if the aggregate ~ t = l cei(st) is non-
negative, then he takes C, and otherwise, he chooses D.
Ch. 9: Neural Networl¢~for Encoding and Adapting in Dynamic Economies 453

Because the threshold for decision making is 0, one can consider

oz(s)Tr(s : h T ) , (7)
sc{G',D} 2

where 7r(s : h T) is the empirical frequency of s in h T. This class of strategies has a

simple geometric representation. The threshold of the decision rule is

:h : o, (8)
~E{O,D} 2

which is a hyperptane in the unit simplex of R 4. The player classifies histories into
two sets: one above the hyperplane, and the other below the hyperplane, and assigns
an action to each half space determined by (8). For this reason, we call it a linear
strategy, and say that histories are classified according to a linear classifier.
Each linear strategy can be represented by a single Ising neuron, a perceptron
of order 1, consisting of sensory, associative, and decision units. Here is how the
perceptron 'plays'. In the first period, the decision unit assigns a particular action s I
based on an 'empty' signal. At the beginning of period 2, the sensory unit observes
.s I = (sl, s~), and emits cgi(81). The associative unit receives this signal, and reports
to the decision unit 1 if c~i(s 1) ) O, and O, otherwise. Depending upon the report
from the associative unit, the decision unit chooses an action. Thus, our perceptron
of order 1 is defined formally in terms of the triple

~i -- (B{, c~{, s~).

At the beginning of the game, the machine selects an initial action s~. Once the game
starts, each history h T = ( s l , . . . , s T) is processed into


where c~{ is the synaptic weight generated by the sensory unit for each outcome. Then
the associative unit generates reports according to the Heaviside function

where A(z) = 1 if z ~> 0, and otherwise, 0. The decision unit chooses an action
according to

m {o, I} -+{c, D}.

454 L-K. Cho and T.J. Sargent

We have defined the perceptron in such a way that the input is the entire history h T.
Because the length of the input increases as the game continues, one may argue that
we still impose excessive computational capability. But as our discussion of recurrent
networks suggested, one can implement the same computing machine recursively by
taking the input in each period to be the outcome instead of a history, and by adding
a 'storage unit' of the summary statistics computed in the previous round.
One way to 'remember' a previous summary statistic is to introduce the 'threshold
adjustment rule' invented by Rosenblatt (1956). Instead of fixing the threshold of
the associative unit as 0 in the formal definition, we would construct a sequence
{0 t }t=0
oo of thresholds. Let 0 ° = 0 be the threshold of the associative unit at the
beginning of period 1. Once the new signal s I is processed in period 1, the threshold
becomes 01 = c~i(s 1) at the beginning of period 2. Following T - 1 periods of play,
o T _ l = ~ t =T--I
l Ozi(st) ' which is the threshold for the associative unit to process s T,
and after the information s T is processed, it is changed to 0 T = ~tT__t oh(st). It is
not difficult to prove that our formal definition and the recursive definition describe
identical computing machines.
Since only the sign of ~ t = l c~i(se) influences actions, we can take Y-~-ses o~i(s)r~(s"
hT), where 7r(s : h T) is the empirical frequency of s in h T, as the state of the machine.

6.3. Representation

Linear strategies in a repeated game presume little computational capability, but evi-
dently form a small subset of all repeated game strategies because they force actions
only to depend on the historical empirical frequency of outcomes. A natural question
is whether one reduces the set of equilibrium outcomes of a game by restricting the
feasible strategies to our class of linear strategies. For the prisoner's dilemma, the
answer is no.

THEOREM 1. For any individually rational payoff vector v* = (v~, v~), construct a
perceptron gh by assigning the synaptic weight ch of player i as

= j(s) - v;, ¢ i, vs {c, D} 2, (9)

so that player i chooses D if and only if ~-~s6s c~i(s)Tr(s : h T) > O, and the initial
action as sl = C. The perceptron thus constructed forms a Nash equilibrium whose
long run average payoff vector is v*.

Notice that

sCS scS
Ch. 9: Neural Networks fi)r Encoding and Adapting in Dynamic Economies 455

In ~ , player i is monitoring player j's average payoff from past plays. We have
constructed the decision rule so that player i punishes player j by playing D whenever
player j ' s average payoff exceeds v~. This implies that player j cannot receive more
than v~ against player i's linear strategy. On the other hand, whenever player j's
average payoff falls below v~, player i cooperates so that player j can improve his
own payoff) 2 This suggests that the pair of linear strategies can sustain v* as the
equilibrium payoff vector.
Theorem 1 is a representation theorem: any individually rational payoff vector can
be represented as an equilibrium outcome of linear strategies. Thus, although we
have restricted computational capability, we have not limited the set of equilibrimn
When we have many equilibrium outcomes, the selection of 'good' equilibria be-
comes a central question. A useful way to identify 'good' equilibrium outcomes is
to check their stability by seeking robustness against small mistakes. One can imag-
ine several forms of mistakes in the operation of a perceptron, including errors in
computing ~ c s c~i(s)Tr(s : h T) or in choosing a wrong action ('trembling hand').
Since any mistakes are captured in the state of the machine, one can incorporate 'ro-
bustness against mistakes' by requiring that the pair of perceptrons achieve a Nash
equilibrium following any pair of states of the machine that can be revealed by some
histories. This is in the spirit of requiring subgame perfection since we consider any
state induced by all possible histories. An interesting result is that the pair of Nash
equilibrium perceptrons constructed in Theorem 1 is stable in this sense.

COROLLARY 1. Following any history, the pair of perceptrons constructed in Theo-

rem 1 forms a Nash equilibrium whose long run average payoff vector is v*.

6.4. More general games

Theorem 1 and Corollary 1 demonstrate the potential benefit of neural networks as a

modeling tool. By using the simplest form of neural networks, we recover the folk
theorem for the repeated prisoner's dilemma game. However, in other games, some
outcomes cannot be sustained by a pair of linear strategies. To show this, we consider
another infinitely repeated prisoner's dilemma game where the component game is


G'" D [2,2
C [6,0 0,61]1," (10)

Notice that the cooperation outcome is Pareto dominated by some convex combination
of u(C, D) and u(D, C).

12Theactual proofof this second part is rather complex.

456 1.-K. Cho and 7:,I. Sargent

Since the security level payoff of each player is 1 (2, 3) is an individually rational
payoff vector. The folk theorem says that this payoff vector can be sustained by some
subgame perfect equilibrium. However, no linear strategies can sustain this payoff
vector as a Nash equilibrium. Instead of a formal proof, we will examine whether the
perceptron ~bi whose synaptic weights are determined by

~i(8)=uj(8)-v~, iS.j, (11)

can support v* = (2, 3) as a stable Nash equilibrium vector. Fix a history ~T =

@1,..., s T ) sO that


for each i = 1,2. In this case, following ~T, player i chooses C. By

c~,i(C,C) =uj(C,C)-v~ <~0, V i = 1,2, (12)

the summary statistic ~ c~i(st) remains negative as long as both players choose C'.
As a result, once (C, C) is played, the continuation play is stuck at (C, C) and player
2's long run average payoff becomes u2(C, C), which is strictly less than v~ = 3.
To support (2, 3) as an equilibrium payoff in (10), two players must coordinate their
actions not to play (C, C). That is because once (C, C) is played, both players get
stuck with playing (C, C) for the rest of the game. In order to avoid playing (C, C),
we need a more sophisticated way of classifying histories.
Notice that the target payoff vector can be generated by alternating (D, D), (C, D)
and (D, C). A natural guess is that we can support (2, 3) as a Nash equilibrium
payoff vector by forcing players to play only these three outcomes. To make players
avoid (C, C), we need a more 'sophisticated' perceptron for at least one player, say
player 2, which monitors not only the opponent's but also his own average payoff.
That is, player 2 chooses D if

t:l t=l
But if

T ) (13)
t=l t=l
Ch. 9: Neural Networlc~for Encoding and Adapting in Dynamic Economies 457

player 2 further considers his own average payoff from the same history. If (13) holds

= r 0, (14)
t=l t=l

then player 2 chooses D. Otherwise, he chooses C. If player 1 uses the perceptron with
a single associative unit built upon (9), then he will choose C if and only (14) holds. If
player 2 considers (13) and (14) simultaneously, then the two players never play (C, C)
following any history. We can show that this pair of perceptrons forms a Nash equilib-
rium that induces v* = (2, 3) following every history, i.e., a stable Nash equilibrium.
This example shows that even for 2 x 2 games, some individually rational pay-
off vectors cannot be sustained by linear strategies. Instead, some outcomes require
strategies with two linear classifiers, where one unit monitors the opponent's average
payoff, and the other one monitors his own payoff. At this point, one might conjecture
that we have to increase the number of linear classifiers as the number of actions in
the component game increases. Fortunately, this conjecture is false. It can be shown
that we require only strategies with at most three linear classifiers to sustain any in-
dividually rational payoff vector in general two person repeated games. This result is
attained in Cho (1994a).
The construction of linear Nash equilibria suggests that it might be necessary for
each party to monitor the opponent's payoff following any history. If so, one may
wonder whether the same construction of 'simple' strategies applies to games with
imperfect monitoring where some players cannot directly observe the action and the
payoff of the opponent. In this case, each player must rely on public information to
infer the activities of the other players. If the public outcome is completely uninfor-
mative about the underlying actions of the players, then no cooperation beyond the
one shot Nash equilibrium is possible. A natural question is whether the 'simple' con-
struction carries over to models where the public outcome is sufficiently informative.
Cho (1994a) examines a general two person repeated games with imperfect moni-
toring and with no discounting. Suppose that the public outcome is informative enough
to construct a least squares estimator of the frequency of actions of the opponent. The
sufficient condition to construct the least squares estimator is, roughly speaking, that
the number of public outcomes be as large as the product of the number of actions
available to each player. One can apply the same construction of strategies that have
three linear classifiers over the least squares estimators. By exploiting the consistency
of the least squares estimators, we can recover the folk theorem.

6.5. Capital and moral hazard without discounting

To demonstrate that linear strategies are useful in constructing simple equilibria in

models other than the repeated games, we now examine a stochastic growth model
458 L-K. Cho and T.J. Sargent

with moral hazard. As the previous examples demonstrate, through only a handful of
linear classifiers, we can discriminate histories finely enough to sustain any individu-
ally rational payoff vector in a repeated game. By computing average payoff of each
player, one can precisely figure out when to punish and when to cooperate.
In the presence of imperfect monitoring, however, the linear classifier needs to
extract information about hidden actions of the opponent. Because some player cannot
compute the average payoff of the opponent, one must be able to construct a linear
'proxy' of the hidden variables as the game continues. The central issue becomes how
well the linear classifier can learn about the history of hidden actions.
The existence of capital in the growth model aggravates the problem. As the capital
stock changes, the feasible set of strategies of the players also changes. In order to
monitor the activities of an agent through a series of public data, one may have
to accumulate substantial amounts of data. Unless the linear classifier can learn the
hidden activities swiftly and accurately, one may not carry out the punishment and
the cooperation available before the capital stock changes.
We shall show that linear strategies can overcome these problems. By exploiting the
resiliency of linear strategies, we shall construct a linear strategy sequential equilib-
rium that can sustain any payoff vector in a small neighborhood of the efficient frontier.

6.6. A model

We use an undiscounted version of a stochastic growth model studied by Marcet

and Marimon (1992). Consider a production economy where the representative en-
trepreneur decides consumption (c t) and investment (i t) in period t = 1 , 2 , . . . . Let
k t be the capital stock in period t which produces output according to f ( k t) where f
is a neoclassical production function satisfying Inada condition. Let ~-t be the subsidy
by the planner. The budget constraint of the entrepreneur in period t is

ct+i t--f(k t ) + 7 -t.

The capital stock evolves according to

kt+ 1 dk t + i t

where d C (0, 1).

Since f satisfies the Inada condition,

max f ( k ) - (1 - d)k < oo



k(d) -- arg max f ( k ) -- (1 - d)k

Ch. 9: Neural Networks for Encoding and Adapting in Dynamic Economies 459

is well-defined. Let u(e t) be the utility of the entrepreneur in period t. The en-
trepreneur's objective function is

liminf 1


v, = u ( f ( k ( d ) ) - (1 - d)k(d))

as the autarchy level payoff. That is, given k(d) as the level of capital, the entrepreneur
can maintain v I by investing (1 - d)k(d) in every period without any subsidy from
the planner.
Let the planner's objective function be

liminf 1

where A E R+ is the weight assigned to the entrepreneur's utility.

Suppose that the return from investment is stochastic. Given i t, the new capital
stock generated by i t is g(it). We assume that

Eg(i t) = i t,

g(o) = o

and if i > i', then g(i) first order stochastically dominates g(i'). The evolution of the
capital stock is now dictated by

kt+ 1 = dk t + 9(i t)

and the planner only observes the capital stock in each period as well as the transfer 7-t.
Since the entrepreneur is risk averse, in any Pareto efficient solution, the planner
must bear all the risk by offering ~.t so as to make the consumption level of the
entrepreneur independent of the outcome from the investment:

Au'(c t) = 1.

Without a proper mechanism to control the entrepreneur, however, the entrepreneur

has an incentive to under invest so that he can increase consumption in each period.
460 L-K. Cho and ZJ. Sargent

6. 7. Linear strategies

We will show that the planner can construct a 'linear strategy' that implements 'almost'
any payoff vl > v_1 as a sequential equilibrium. The (static) optimal solution (/¢, ?)
of the planner is characterized by

A u ' ( f ( k ) - (1 - d)k + $) = 1,

f ' ( k ) - (1 - d) = O.

Choose k* </¢, and 7" > ~- so that

A u ' ( f ( k * ) - (1 - d)k* + "r*) < 1,

f ' ( k * ) - (1 - d) > O.

Let i* = (1 - d)k* be the level of investment needed to maintain k* as a steady state,


c* = f ( k * ) - i* + r*

be steady state consumption. Let

~ = ~(S(k*) - ~* + 7*)


,~p* = ~ ( c * ) - ~-.

Obviously, (v~, Vp) is not efficient. But, by choosing k* and r* in a small neigh-
borhood of/c and ?, we can make the distance from the efficient frontier arbitrarily
Recall that

u - IR3 --+ R

is a differentialSle concave function that maps (k, i, r) into u ( f ( k ) - i + r ) . Differentiate

u at ( k * , i * , r * ) :

~'(c*)(I'(k*)(k - k*) - (~ - i*) + (7 - ~-*)).

Ch. 9: Neural Networks for Encoding and Adapting in Dynamic Economies 461

We can write the tangent hyperplane as

H '1 = {(k,i,T) • f'(k*)(k- k*) - (i - i * ) + (7-- 7-*) = 0 } .

By applying the same operation to the objective function of the planner, we obtain

Hp= ,{ (k,i,~-):f'(k*)(k-k*)-(i-i*)+ (')

1 )tu'-(c*) @-~-*)=0 } .

Notice that we have chosen h* and ?-* so that

1 <0.
By the concavity of the objective functions, H I and H p serve as inflated estimates
for u ( f ( k ) - i + "r) and A u ( c ) - ~-, respectively: if v~ = u ( f ( k ) - i -4- T), then
f ' ( k * ) ( k - k*) - (i - i*) + ( r - r * ) < 0, and if Vp = A u ( f ( k ) - i + ~-) - r, then
f'(k*)(k- k*) - ( i - i*) + (t :~u,(c*)
1 )(~- - ~-*) < 0.
We cannot use H~ and H p as the threshold to assign an action to the planner,
because the investment level cannot be observed by the planner. Thus, we need to
construct a proxy for i t. Define

~t = (kt+1 _ k*) - d(k t - k*).

Notice that for t >~ 1,

E [~t i t] it .,

since i* = (1 - d)k*. By replacing i t with it into H ' t, we have

f ' ( k * ) ( k t - k*) - (k t+l - k* - d('k t - k * ) ) + ('r t - ~-*).

By adding up with respect to t / > l, we construct a function

GTi = ( f ' ( k * ) -- (l -- d)) E ( k t - k*) - (k T+l - k i ) + E ( T t - T*)
t=l t=t

as a 'linear proxy' for the gross payoff of the entrepreneur above v~ so that if G ~ ~> 0,
then the planner assigns TT+l = 0, and if G T < 0, then he assigns T T+I = T*.
462 L-K. Cho and T.J. Sargent

Similarly, we construct
GTp = (f'(k*) - (1 - d)) Z ( U - k*) - (]gT+l _ ]gl)

+ l-~uTu~ ~=1

so that the entrepreneur plays i T+I = 0 if GpT > 0, and plays i T+1 = i*, otherwise.
We confine the formal analysis of this model to Appendix A, where we establish:

THEOREM 2. Let 991 and ~p be the pair of strategies for the entrepreneur and the
planner constructed as above. Then (¢Pl, qOp) is a sequential equilibrium whose long
run expected payoff vector is (v~, Vp).

PROOF. See the appendix. []

The planner learns the entrepreneur's activity through G T, which can be computed
from public information. Since the entrepreneur is risk averse, it appears that the
planner could potentially overestimate the entrepreneur's average payoff by using the
linear proxy G T. But because the average increment of the capital stock is a consistent
estimator of the average investment by the entrepreneur, the planner eventually can
accurately estimate the entrepreneur's investment level and average payoff.

7. Conclusions and open questions

The last example shows both the strengths and the weaknesses of perceptrons as a
learning tool. Because line~ strategies are simple to implement, we can represent a
large class of equilibrium outcomes with simple perceptrons. However, in the presence
of imperfect monitoring, the linear proxies can overestimate the hidden variables. We
had to rely on a strong law of large numbers to avoid the problem of overestimation.
As a result, our approach does not directly carry over to models with discounting
and moral hazard. Without moral hazard, Cho (1994b) showed how to recover the
folk theorem with a neural network with a finite number of associative units. Cho
(1994b) used a feedback rule to adjust the state variable in order to obtain subgame
perfect equilibrium, in contrast to Cho (1994a), where feedforward neural networks
were sufficient. With imperfect monitoring and discounting, a more sophisticated
feedback rule will be needed. So we are continuing to search for a neural network with
finite processors capable of recovering the folk theorem. If successful, this endeavor
promises to open efficient ways of computing equilibria.
Throughout this paper, we have regarded bounded rationality as a constraint on
computational capabilities. Thus, the main exercise was to construct equilibria sus-
tained by simple neural networks, and to study how much the restriction on strategies
Ch. 9." Neural Networks for Encoding and Adapting in Dynamic Economies 463

reduced the set of equilibrium outcomes. Rubinstein (1986) pioneered a different line
of research by modelling bounded rationality as a cost to implementing equilibrium
strategies. In the Rubenstein tradition, it is assumed that players are perfectly rational,
but constrained to carry out the plan only through computing machines with finite ca-
pabilities, and that the cost of computation depends upon some complexity measure of
the computing machi.ne. Rubinstein (1986) demonstrated that small complexity costs
may collapse the set of equilibria to a very small set.
The approach of Rubinstein (1986) presumes that players have access to finite au-
tomata that encode transition and outcome functions. We believe that neural networks
are useful tools to study the strategic implications of complexity cost, especially when
the feasible computing machines may have infinite states, and the computing machine
can adapt itself as the game progresses. We are pursuing two classes of extensions,
the first designed to build real-time adaptation into the neural networks (e.g., via ad-
justment of the 'target payoff' levels); and a second to impute some computational
costs to employing more complex networks. This second endeavor promises to extend
the notions of Rubenstein (1986) to computing machines with an infinite dimensional

8. Appendix

8. l. Analysis of Theorem 2




T T = ~-~(r t- r*).

To simplify notation, assume without loss of generality that



a=f'(k*) - (1 - d)
464 I.-K. Cho and T.Z Sargent


~=- [1

By construction,

c~,¢~ > 0.


n, = {(wT, kr+',T~) • ~w ~ - k ~+' +T T = 0}


Hp = { ( W T , k T + I , T T ) : a W T - ] g T + I flTT o}.

Given a hyperplane H, let H represent the open half space above H.

For later reference, let us summarize some useful results. First, VT ~> 1,


which implies that

limsup ~1 ~ Tt <~ T*. (15)

T-+oo t=l

By assumption, VT ~> 1, ~T+I ~ ~,.

Note that W T+I > W T if and only if k t+l > k*. Define

T = inf{t: dt~: <~ k*}.


wT ~ -- Z dt[~ =-- W, VT >1 1.

limsup 1 k*. (16)
T-+oo T ~ kt ~
Ch. 9: Neural Networks for Encoding and Adapting in Dynamic Economies 465

By the construction of the strategy, 7-T = 0 if and only if the state variable is in H1.
Thus, 7 -T+I < T T whenever the state variable is in H1. By exploiting the fact that
a,/3 > 0, one can show that

T T /> - a W - 7 - * =~.

Combining this result with (15),

lim , - ] - T = 0.

S i n c e {,~T} is bounded from below, one can compute how far W T is pushed down
while i t = 0 is being played so that the capital stock evolves according to k t+l = dk t.
Note that i T+I : 0 if and only if the state variable is in Hp:

c ~ W T - k T + I - ~ T T >~ O.

Thus, while i T + I = 0 is played,

kT+l _ /y/-T /37-

W T >~ >~ - ~ (17)
Oz O!

We warn, however, that the right hand side is not a lower bound of the random
variable W y , since W y can be smaller than the right hand side while the state is
in Hp.
By (16), we know that

limsup ~1 ~ kt k*.
T--+oo t=l

We shall prove that

1 7'
liminfr~ooT ~ kt >~ k*

by following the idea of Fudenberg, Kreps and Maskin (1990).

Define a new random variable

= (l - d ) w * + k + _Z_<
466 1.-K. Cho and T.J. Sargent

Notice that the last term is the 'lower bound' of W T while ( W r , k T + l , T T ) ~ H p .

Thus, if z t <~ O, then

( w ~, k ~+~ , T ~) c Hp


i t = i*. (18)

One can easily verify that

z t+: - z t = k t - k* + (1 - d ) g ( i t ) . (19)


-]"~T = {t ~ T - 1: z t ~ 0}

as the collection of periods before T when z t <<, O, and

x ~' = ~ k ~ - k* + (1 - d ) 9 < ) .
t E']~T

By (18) and (19),

E [ S +1 : x ~] = ~ .

Let _x he the lowest realized value of k - k* + (1 -- d ) 9 ( i ) for Vk ~> 0 and i ~> 0.


X T = max [ x ' , . . . , x T ] .


z T>l~+(z T - X T ) , VT>>I. (20)

PROOF. We prove the lemma by induction. For T = 1, the conclusion holds obviously.
Suppose that the conclusion holds for T. If Z T ~ O, then

Z T+I ) X -[- Z T ) X ) fl;-[- (X T -- X T ) .

Ch. 9: Neural Networlc~ .for Encoding and Adapting in Dynamic Economies 467

If z T <~ O, then
Z T+I == Z T q- (X T4-I __ X T )

: ( ~ + x ~ - x ~) + (x ~+' - x O
:Xq- ( x T + 1 -- X T )

> ~_ + (~r+~ _ xr+,).


Since {X T } is a martingale,

.T, T -- X T

almost surely. Thus,

liminf- ) 0
T-+~ T

which implies

liminf '--
T--+oo T E k t >/

We conclude that


e1 Eke: k*

Since i t = i* or 0, and Tt = ~-* or 0, we can conclude that ahnost surely,

limsup -1- # {t ~ T: i t = 0 or T t = 0 } = 0.
T-+oo T

This implies that

1 1
lim ~--~ u(c t ) lira i*
t=l t=l

almost surely. Similarly, the long run average payoff of the planner is exactly v~. This
proves that the pair of linear strategies constructed supports (v~, v~).
468 1.-K. Cho and TJ. Sargent

Next, we have to show that no party has an incentive to deviate from the linear
strategy. We only verify that the entrepreneur has no incentive to deviate. Similar
logic applies to the planner.
First, since { ~ T = I (k t+l - dk t) - it}~=l is a martingale with bounded increments,

-- k t+l -- dk t) - i t -----4 0 (21)

r kt=l

almost surely. By (15) and (16), regardless of the entrepreneur's repeated game strat-
limsup 1 k*

limsup ~ E zt <~r*.
T--+oo t=l

Since k* < k(d),

1 kt -(l-d) kt <<.f(k*)-(1-d)k*
f Tt=l

-1~ k t ~ k*..

Since { k T } is uniformly bounded from above, V6 > 0, 9T(6) > 0 so that VT ~>T(6),

(1 -- d) [ T ~t=l /~t] _ T1 t 1=
~z [k t+l - dk t] < 6. (22)

Since u is continuous and u' > 0, Ve > 0, 96 > 0 and T(6) so that VT ~> r(6), (22)
holds and

f k' - = [k '+~ - dk ~] + ~ Tt .< ~,~ + ~.

((Tt=~l) 1 t~l 1 t=~l )
Ch. 9: Neural Networksfor Encoding and Adapting in Dynamic Economies 469

By (21), V r / > 0, 3 T ( r / ) ~> T(6) so that V T ~> T ( ~ ) ,

Pr ( U +1 - dU) - i t < rl, VT ) T(~) >~ 1 - r].


Thus, since 'a and f are c o n c a v e , for a sufficiently small r1 > 0,

Pr ( 1 Tt~=lu(f(kt)--it+Tt) < v ~ + e + r l , VT>~T(~])) > - 1 - ~ ]

f r o m w h i c h we c o n c l u d e that the entrepreneur's long run e x p e c t e d p a y o f f cannot


(1 - ~l)(vt + e + rl) + riM

for s o m e positive constant M . Since e, 'q > 0 are arbitrary, we h a v e p r o v e d that the
pair of linear strategies sustains v* as the long run expected p a y o f f vector.

Anderson, T.W. (1958) Introduction to multivariate statistical analysis. New York: Wiley.
Arthur, B.W. (1989a) 'The dynamics of classifier competitions', 7 March, miIneo.
Arthur, B.W. (1991) 'Designing economic agents that act like human agents: A behavioral approach to
bounded rationality', American Economic Review, Papers and Proceedings, 81:353-359.
Atkeson, A. (1991) 'International lending with moral hazard and risk of repudiation', Econometricu,
B arron, A.R. (1991) 'Universal approximation bounds for superpositions of a sigmoidal function', Technical
Report No. 58, Department of Statistics, University of Illinois, mimeo.
Chen, X. and Halbert, W. (1992) 'Asymptotic properties of some projection-based Robbins-Monro proce-
dures in a Hilbert space', Department of Economics, University of California, San Diego, CA, November,
Chen, X. and Halbert, W. (1993) 'Convergence of nonparametric learning models', Department of Eco-
nomics, University of California, San Diego, CA, February, mimeo.
Cho, I.-K. (1994a) 'Perceptrons play repeated games with imperfect monitoring', Games and Economic
Behavior, forthcoming.
Cho, I.-K. (1994b) 'Bounded rationality, neural network and repeated games with discounting', Economic
Theory, 4:935-957.
Cho, I.-K. (1995) 'Perceptrons play repeated prisoner's dilemma', Journal qfEconomic Theory, 67(1):266-
Ehnan, J.L. (1988) 'Finding structure in time', CRL Report 8801, Center for Research in Language,
University of California, San Diego, CA, mimeo.
Fisher, R.A. (1936) 'The use of multiple measurements in taxonomic problems', Annals qt Eugenics,
470 L-IcL Cho and T,J. Sargent

Fudenberg, D., Kreps, D.M. and Maskin, E. (1990) 'Repeated games with long-run and short-run players',
Review of Economic Studies, 57:555-573.
Gallant, A.R. and White, H. (1988) 'There exists a neural network that does not make avoidable mistakes',
in: IEEE second international conference on neural networks, San Diego: SOS printing, pp. 657-664.
Goldberg, D.E. (1989) Genetic algorithms in search, optimization, and machine learning. Menlo Park, CA:
Hebb, D.O. (1949) The organization of behavior. New York: Wiley.
Hertz, J., Krogh, A. and Palmer, R. (1991) Introduction to the theory of neural computation. Redwood
City, CA: Addison-Wesley.
Holland, J.H. (1975) Adaptation in natural and artificial systems. Ann Arbor, MI: Univ. of Michigan Press.
Holland, J.H. (1986) 'Escaping brittleness: The possibilities of general-purpose learning algorithms applied
to parallel rule-based systems', in: R.S. Michalski, J.G. Carbonell and T.M. Mitchell, eds, Machine
learning: An artificial intelligence approach, II. Los Altos, CA: Morgan Kaufmann.
Hornik, K,, Stinchcombe, M. and Halbert, W. (1989) 'Multi-layer feedforward networks are universal
approximators', Department of Economics, University of California, San Diego, CA, February, mimeo.
Kendall, M.G. (1957) A course in multivariate analysis. London: Charles Griffin.
Kendall, M.G. and Halbert, W. (1991) 'Strong convergence of recursive m-estimators for models with
dynamic latent variables', University of Illinois, March, mimeo.
Kushner, H.J. and Clark, D.S. (1978) Stochastic approximation methods for constrained and unconstrained
systems. New York/Berlin: Springer.
Ljung, L. (1977) 'Analysis of recursive stochastic algorithms', IEEE Transactions on Automatic Control,
Ljung, L. and SOderstrOm,T. (1983) Theory and practice of recursive identification. Cmnbridge, MA: MIT
Ljung, L., Pflug, G. and Harro, W. (1992) Stochastic approximation and optimization of random systems.
Basel/Boston/Berlin: Birkh~nser.
Marcet, A. and Marimon, R. (1992) 'Communication, commitment, and growth', Journal of Economic
Theory, 58(2):219-249.
Marimon, R., McGrattan, E. and Sargent, T. (1990) 'Money as a medium of exchange in an economy with
artificially intelligent agents', .lournal ~f Economic Dynamics and Control, 14:329-374.
Minsky, M.L. and Papert, S.A. (1969) Perceptrons. Cambridge, MA: MIT Press.
Miiller, B. and Reinhardt, J. (1990) Neural networks: An introduction. Berlin/Heidelberg: Springer.
Radner, R. (1985) 'Repeated principal agent games with discounting', Econometrica, 53:1173-1198.
Rosenblatt, E (1956) Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Wash-
ington, DC: Spartan Books.
Rubinstein, A. (1986) 'Finite automata play the repeated prisoner's dilemma', Journal of Economic Theory,
Rubinstein, A. (1993) 'On price recognition and computational complexity in a monopolistic model',
Journal of Political Economy, 101:473-484.
Weisbuch, G. (1990) Complex systems dynamics: An introduction to automata networks. Lecture Note
Vol. II, Santa Fe Institute Studies in the Sciences of Complexity, Reading, MA: Addison-Wesley.
White, H. (1992) Artificial neural networks: Approximation and learning theory. Oxford: Basil Blackwell.
Chapter 10

University af Pennsylvania
University of Cyprus

1. Introduction 472
2, O v e r v i e w o f the G A M S m o d e l i n g l a n g u a g e 473
2.1. Domains of parameters and variables 474
2.2. The GAMS aritlunetic operations 475
2.3. The GAMS summation and product operators 475
2.4. Data entry and manipulations 475
2.5. The GAMS relational operators 477
2.6. Declaration and definition of equations 477
2.7. Exception handling capabilities 478
2.8. GAMS solvers 478
2.9. The GAMS libraries of economic and financial models 479
3. Example applications 479
3.1. A simple transportation model 480
3.2. Asset allocation model 482
3.3. The SAMBAL system: Estimating Social Accounting Matrices 484
References 488

Handbook of Computational Economics, Volume I, Edited by H.M. Amman, D.A. Kendrick and J. Rust
(~) 1996 Elsevier Science B.V All rights reserved.
472 S.A. Zenios

1. Introduction

Mathematical modeling and computer analysis is a cornerstone of computational eco-

nomics. A wide range of economic problems can be represented by systems of equa-
tions or by optimization programs over systems of inequalities. Such models, and the
underlying economic applications, are discussed in other Chapters of this Handbook
on General Equilibrium Models, Game Theory, and Sectoral Models.
These mathematical models are used to represent real and observable systems. The
models are, therefore, developed using economic observations (i.e. data). They are not
merely abstract mathematical descriptions. The instantiation of an abstract mathemat-
ical model using economic data, and its solution on the computer, is facilitated using
high-level modeling languages. This Chapter provides an introduction to one partic-
ular algebraic modeling language, GAMS of Brooke, Kendrick and Meeraus (1992).
It gives an overview of the language and illustrates its use in modeling problems in
transportation, asset allocation, and the estimation of social accounting matrices.
A significant part of the time required to develop a model involves data preparation
and transformation, and report generation. The model is transformed from a form that
is understandable to the modeler, to a form that is readable by the computer. Until
the 1970's such transformations were handled by programs tailored to each specific
application. Such programs, known as matrix generators, required several hours of
programming time, were difficult to alter to accommodate changes in the model and
were accessible only to the specialist who wrote them and not to the analyst who
developed the model. As a result, matrix generators were difficult to debug and correct.
In the late 1970's the case was made that matrix generators should give way to
algebraic modeling languages, Bisschop and Meeraus (1982) and Fourer (1983). A
modeling language could integrate ideas from relational database theory with the
rapidly expanding field of mathematical programming. Relational databases provide
the framework for data organization, while mathematical programming provides a
way for describing a variety of problems and offers algorithms for their solution.
The first steps towards the development of an algebraic modeling language concen-
trated on linear programming problems, Bisschop and Meeraus (1982). Extensions
were then made to handle nonlinear and integer programs, Brooke, Drud and Meer-
aus (1984,1992), network problems, Zenios (1990), variational and complementarity
problems, Rutherford (1992).
The first modeling language - GAMS, a General Algebraic Modeling System - was
developed at the World Bank in the late 1970's. This system provides a high-level
(algebraic) language for the representation of large and complex models. It allows
for unambiguous statements of algebraic relations that define an abstract system of
variables and equations. It also provides mechanisms for data management. The sys-
tem performs appropriate data transformations to create a specific instance of the
model, starting from the abstract representation. Since the model description is alge-
braic the GAMS statement of the model provides a readable documentation. The data
management mechanisms also facilitate the preparation of reports.
Ch. 10." Modeling Languages in Computational Economics: GAMS 473

The use of an appropriate algorithm to solve the model is also handled by GAMS.
From the user's perspective, the model is independent of the solution algorithm.
This approach ensures portability. Only the GAMS statement of the model needs
to be ported among different computer platforms, and the GAMS statements are
machine independent. A GAMS model can be ported, today, to a wide range of
computer platforms ranging from personal computers, mainframe and workstation
systems, attached array processors and CRAY vector supercomputers.
GAMS is not the only language available, although it is currently the most widely
used due to the experiences accumulated by the World Bank analysts. For other
developments in modeling languages refer to Geoffrion (1987) and Fourer, Gay and
Kernighan (1993). Two more recent systems are AMPL by Fourer, Gay and Kernighan
and AIMMS by Bisschop and Entriken (1993). Some of the differences between these
systems are the ways they interface with other packages, their ability to handle special
model structures such as networks, and the software systems they use for solving
the models. The treatment of this Chapter concentrates exclusively on GAMS. This
approach is sufficient for providing readers with an introduction to the fundamentals
of modeling languages. Readers who are interested in using a modeling language,
GAMS, AMPL or AIMMS, would need access to the respective users' guide.

2. Overview of the GAMS modeling language

A GAMS model is a collection of statements in the GAMS language. These statements

define the variables of the model, specify the symbolic relationships between them in
the form of equations, specify data structures and assign values to them, and instructs
the computer to generate and solve the model. Other GAMS statements are used to
handle output. In this section we describe the basic components of GAMS, without
going into details on the precise syntax. Readers will get here a general overview of the
language and its capabilities. Detailed documentation is provided in the GAMS User's
Guide, Brooke, Kendrick and Meeraus (1992). We use upper case, typewriter font for
all expressions that are part of the GAMS language, such as EQUATIONS and SOLVE.
Data structures, data initialization and symbolic relationships are specified by writ-
ing GAMS statements" on GAMS symbols. Symbols must first be declared as to type,
before they can be used. Each symbol must be declared to belong to one of the
following six classes:


GAMS statements are classified into one of two groups:

1. Declaration and definition statements.
2. Execution statements.
474 S.A. Zenios

Declaration statements specify the class of a symbol, and a definition statement

provides values for a declared symbol. For example, the GAMS statement:
SETS J markets ;

specifies a symbol J as being a set, which is explained in the declaration statement

to be the set of "markets". Elements of this set, i.e., actual markets, are defined in
the following GAMS statement that combines the declaration of the symbol with its
definition. This statement also illustrates the combined definition of two sets.
I c u s t o m e r s /BIG, S M A L L / ;

Declarations consist of a keyword for the symbol class, an identifier for the symbol,
the domain of the symbol, and some explanatory text. The declaration of a parameter
is illustrated in the following example:

PARAMETER A (l,J) Input-Output Matrix

Symbol-class-keywo~ Identifier Dommn(Sec. 2.1) Expl~o~ ~xt

Execution statements are instructions to carry out actions such as data transfor-
mation, model generation, model solution and preparation of reports. The execution
statements are:

2.1. Domains of parameters and variables

The sets in a GAMS model are used to specify the domains of parameters and vari-
ables. Equations may also be defined over domains as discussed in Section 2.6. For
PARAMETER D(J) demand of a p r o d u c t at market J ;

declares a parameter over the set of markets J. For the set J defined above, this
parameter definition is equivalent to the three parameters D ( ' ' N E W _ Y O R K ' ' ) ,
D(' 'CHICAGO' ') , D ( ' 'TOPEKA' ').
Variables may also be defined over sets, as in the example

which defines a variable over the product space of sets I and J.

The domain sets can be used to manipulate and transform data by using indexed
operators, as explained in Section 2.3. In the context of declaration and definition of
equations the domains are used to specify the symbolic relationships among variab-
Ch. 10: Modeling Languages in Computational Economics: GAMS 475

2.2. The GAMS arithmetic operations

GAMS supports arithmetic expressions on parameters and variables. These expres-

sions can be used either for data manipulation, or for defining symbolic relationships.
The standard arithmetic expressions are

* and / multiplication and division,
4- and - addition and subtraction.

In addition, GAMS supports many commonly used standard functions such as expo-
nentiation, logarithms, trigonometric functions, absolute value functions etc.

2.3. The CAMS summation and product operators"

Algebraic manipulations of GAMS symbols are facilitated with the use of indexed
operators such as SUM (for summation) and PROD (for product). The format of these
operators is based on the idea that both operators have two arguments: The domain,
i.e., the index set, over which the operator is executed, and an operand. A simple
example is:

SUM(J, X(I,J)) ;

which is equivalent to the standard algebraic expression ~ j e J x~j. Similarly

PROD ((l,J) , X ( l , J ) ) ;

is equivalent to ~I~eI [ I j 6 J Xij.

GAMS supports two additional indexed operators, SNAX and SMIN, that find
the largest and smallest values over the domain of an indexed set. For example,
SNAX ( J , D ( J ) ) finds the largest value of the parameter D (if).

2.4. Data entry and manipulations

GAMS allows three different formats for entering data:

1. Lists,
2. Tables,
3. Direct assignments.
Consider for example the PARAMETER D (J) for the demand of a product at each
member of the market set J. Assume that demands are identical, let us say 300, at all
markets. The simplest way to initialize the demand parameter is by the assignment
476 S.A. Zenios

D(J) : 300;

This statement implicitly loops over all elements of the set J. The following list
format is equivalent to the assignment statement:
PARAMETERS D(J) demand at market J in cases
T O P E K A 3 0 0 /;

The right-hand-side of the assignment statement does not have to be a number. It

could be a G A M S expression that operates on other data structures. For example, the
d e m a n d in every market region can be split over multiple customers. If the SET I
defines the set of customers, and TABLE C_DE~kND ( I , J ) is the demand for each
customer in each region, then the total demand in each region can be initialized by
using the following assignment statement:
D(J) : SUM(I, C DEMAND(I,J)) ;

C_DEI~ND ( I , J ) is an example of a two-dimensional table (or matrix). G A M S

provides the TABLE format for initializing tables. The following statement illustrates
the data initialization of the demand for two customers (BI@ and SMALL) that are
present in each one of the three markets in J .

TABLE C_DEMAND (I,J) demand for each customer in each market


BIG 200 200 200
SMALL i00 i00 i00

G A M S allows the declaration and initialization of tables with more than two dimen-
Data structures can be manipulated using standard arithmetic operations, indexed
operations and functions. For example, the statement

D(J) = SUM(I, C_DEMAND(I,J)) ;

manipulates the two-dimensional table C_DEHAND ( I , J ) using the SUM indexed op-
erator in order to initialize the parameter D ( J ) . Note that both the right and left
expressions of the assignment statement are defined over the domain set ft. G A M S
produces an error message if the domains of the two sides of an assignment statement
are not consistent with each other.
As another example, a simple arithmetic operation D ( J ) = 2 * D ( J ) doubles
the level of demand. Functions can also operate on G A M S data structures as in
LOGD(J) : LOG (D(J)) ;

that transforms the demand values to logarithmic scale.

Ch. 10: Modeling ~nguages in Computational Economics: GAMS 477

2.5. The GAMS relational operators

A relational operator allows the specification of relations between its left and right
arguments. GAMS supports relational operators in two ways: in the definition of
equations and in logical expressions.
In the definition of equations a relational operator is used to specify the type of the
relationship. For example, =E= is used to define equality relationships, :@: is used
to define greater-or-equal (~>) inequalities, and =L= is used to define less-or-equal
(~<) inequalities.
In logical expressions the symbols EQ, NE, LT and so on are used to specify
a required relationship between two values. These three symbols correspond to the
relationships =, 5£ and < respectively. GAMS also supports Boolean relational oper-
ators (NOT, AND, OR, XOR) although it does not support a Boolean data type. It
follows the convention that the result of a relational operator is 0 if the assertion is
false, and 1 if it is true. (Programmers familiar with the C programming language
will notice the similarity.)

2.6. Declaration and definition of equations

EQUATIONS, like all GAMS symbols, must first be declared before they can be
defined and used. The declaration is a list of names (these are the names of the
equations), each followed by a domain and by some explanatory text. We give two

EQUATIONS COST Cos[ definition

DEMAND(J) Constraint on required demand in market J;

The COST equation is a single equation, while for DEMAND(J) we have one
equation for each element of the set J. The domain of the demand equation is the
set J. The above statements define two blocks of equations. The actual number of
generated equations is equal to the cardinality of the set J, plus one more for the
COST equation.
The next statement specifies the symbolic relationships that define the equations.
First, we define the variables and some additional parameters that are needed in the
specification of the equations. The definition of the equations follows. It starts with
the equation identifier followed by . . , and then it gives the symbolic expression.
VARIABLES TRCOST Total transportation cost
X(I,J) S h i p m e n t f r o m o r i g i n I to d e s t i n a t i o n J;
PARAMETERS D{J) T o t a l d e m a n d at e a c h m a r k e t
C(I,J) Per unit transportation cost from I to J;

COST.. T R C O S T =E= S U N ( ( I , J ) , C ( I , J ) * X ( I , J ) ) ;
DEMAIXlD(J) . . S U M ( I , X ( I , J ) ) =G= D(J) ;
478 S.A. Zenios

2.7. Exception handling capabilities

The specification of complex relationships requires a mechanism for handling excep-

tions. One of the most powerful features of GAMS, in this respect, is the Dollar ($)
Operator. This operator can be used in both arithmetic expressions and in the defini-
tion of equations. Conceptually, the dollar operator is equivalent to an "IF" statement
of programming languages. Its general structure is the following:
It specifies that the expression A is evaluated (if it is a definition statement) or is
executed (if it is an execution statement) IF expression B is true. We illustrate the use
of this operator with two simple examples. Detailed explanations can be found in the
GAMS User's Guide.
Consider the following example:
S C A L A R X,Y;
Y=2; X:I;
X = 25(Y GT 1.5);

This statement assigns the value 2 to X if Y is greater than 1.5, and 0 otherwise.
The next example uses the dollar operator to control an indexed operation. Assume
that we are given the demagd parameter D (J), but for some markets in the set J
the demand is unavailable and is assigned a value of -INF (i.e., -infinity). The total
demand can be calculated by the following expression:
TOTAL : SUH(J$(m(a) NE -INF), D(O) ;

2.8. GAMS solvers

The GAMS language provides the flexibility for the specification of a wide vari-
ety of models. To support the solution of these models the GAMS system is inter-
faced with several optimization solvers. The basic system is usually configured with
two linear programming solvers (BDMLP and MINOS). GAMS/MINOS [Murtagh
and Saunders (1977)] can also handle nonlinear programs. Other linear and non-
linear programming codes include GRG2 of Lasdon et al. (1978) and CONOPT
[Drud (1985)]. Integer programs can be solved using GAMS/ZOOM [Singhal, Marsten
and Morin (1989)]. Network problems, linear and nonlinear, can be solved us-
ing GAMS/GENOS [Zenios (1990)]. More specialized solvers are also available,
like HERCULES of Drud and Kendrick (1986) for large, economywide models,
GAMS/MATBAL of Zenios, Drud and Mulvey (1989) for solving matrix estima-
tion problems, and GAMS/CPLIB of Dirkse et al. (1992) that allows the interfacing
of GAMS with solvers for the mixed complementarity problem.
Most of these solvers are available on several machines, ranging from personal
computers, to workstations, mainframes and vector supercomputers. More information
on the availability of solvers is given in the GAMS manual.
Ch. 10: Modeling Languages in Computational Economics: GAMS 479

2.9. The GAMS libraries of economic and financial models

It is often useful to build a model for an economic system by modifying the model of
a closely related economy. This practice facilitates the development of the conceptual
model, building on prior experiences by others. It also makes it easier to implement
the actual model on the computer. The GAMS system includes a large library of 100
models, called GAMSLIB. Some of the models in the library are included to illustrate
the capabilities of GAMS. Others are included because they represent classical and
widely used models. Of particular interests to economists are the models on agri-
cultural economics (those include several country-wide models for Pakistan, Egypt,
Turkey, Brazil), general equilibrium, economic development, energy economics (in-
cluding again country-wide models for Korea, Turkey, U.S.), as well as models in
micro- and macro-economics and econometrics. Detailed information on GAMSLIB
is given in Brooke, Kendrick and Meeraus (1992).
A library on financial models expressed in GAMS is also available, Dahl, Meeraus
and Zenios (1993). This library contains most of the standard optimization models
from corporate finance (Markowitz's mean-variance models, portfolio dedication and
portfolio immunization), as well as more specialized and complex models for structur-
ing collateralized mortgage obligations (CMOs), term-structure estimation and so on.

3. Example applications

We now illustrate complete GAMS models on three diverse applications from trans-
portation, finance, and estimation of social accounting matrices. A GAMS model
typically consists of the following statements:

Specification of the data. This part of the statement does the following:
Declare and define sets
Declare and define parameters
Assign data to parameters
Display the data for inspection purposes.

Specification of the model. This part of the statement does the following:
Declare variables
Declare equations
Define equations
Define a model.

Solution of the model. This part of the statement solves the model and displays
The next sections illustrate the use of these basic GAMS features.
480 S.A. Z e n i o s

3.1. A simple transportation model

We consider as an example a simple transportation problem. In this problem we are

given a set of production plants and a set of potential markets. Each plant has a
given level of production, and each market has a known level of demand. The cost
of shipping one unit of the product from the plants to the markets is also given.
A simple, algebraic, statement of the problem is the following: Let i and j be
indices for the plants and markets, respectively. Denote by si the supply of each
plant, and by dj the demand at each market. Let also xij denote the decision variables,
indicating the amount supplied to market j by plant i, and let cij denote the cost of
shipping one unit from i to j. The following linear program determines the least-cost


Minimize Z Z cijxij, (1)

i j
s.t. E xij <~ si, for all i, (2)
xij >~ dj, for all j. (3)

We consider now a specific instance of this problem, described in Dantzig (1963,

Chapter 3). In this example there are two plants and three markets. A complete GAMS
statement of this model is given next. Lines starting with a * are comments. Note that
the model is self explanatory.

* Declare two sets, and define their member elements.

I production plants /SEATTLE, SAN_DIEGO/

* Declare the supply and demand parameters, and define their numerical values.

s(i) c a p a c i t y of p l a n t I in c a s e s
/ S E A T T L E 350
S A N _ D I E G O 600 /
D(O) d e m a n d at m a r k e t J in c a s e s
/ N E W _ Y O R K 325
C H I C A G O 300
T O P E K A 275 /;
Ch. 10: Modeling Languages in Computational Economics: GAMS 481

* Declare a table of distances from each plant to each market,

* and define the numerical entries of the table.
TABLE D(I,J) distance in t h o u s a n d s of miles


SEATTLE 2.5 1.7 i. 8
SAN_DIEGO 2.5 I. 8 i. 4

* Define scalar parameters, and perform some data transformations to

* convert the distance between each pair of plant/market to a monetary cost.
SCALAR F f r e i g h t in d o l l a r s p e r c a s e p e r t h o u s a n d m i l e s /90/;
PARAMETER C(I,J) t r a n s p o r t c o s t in t h o u s a n d s of d o l l a r s p e r c a s e ;
C(I,J) = F * D(I,J) / i000 ;

* Declare the variables of the model

X(I,J) shipment quantity (cases) from plant I to m a r k e t J
Z total transportation c o s t s i n t h o u s a n d s of d o l l a r s

* Declare the equations, and specify the symbolic relationships that define
* each equation. There are three groups of equations. One equation is the
* objective function. A second group specifies constraints on the available
* supply at each plant. A third group specifies constraints on the required
* demand at each market.
COST define the objective function
SUPPLY(1) o b s e r v e s u p p l y l i m i t at p l a n t I
DEMAND(J) satisfy demand a t m a r k e t J;
COST .. Z = E = S U M ( ( I , J ) , C(I,J) * X ( I , J ) ) ;
SUPPLY(I).. SUM(J,X(I,J)) =L= S(I) ;
DEMAND(J) . . SUM(I,X(I,J)) =G= m(J) ;

• Define a model, called TRANSPORT, that contains all the

• equations declared and defined above.

• Solve the model, using a linear programming package.


• illustrate the use of a special purpose network optimization solver.

• Select the network optimizer GENOS as the linear programming solver.

• Display the level (.L) and marginal values (.M) of the variables
482 S.A. Zenios

3.2. Asset allocation model

We illustrate now the use of GAMS in modeling problems from corporate finance.
The problem of asset allocation is at the core of modern, practical, finance. It is
the problem of deciding how much to invest in each of the broad asset classes,
such as stocks, bonds, cash, foreign currency, fixed income securities and others. The
allocation aims at achieving the best portfolio, given the investors preferences and
Asset allocation decisions are, usually, made based on the principle of diversifica-
tion. Assuming that the (non-systematic) risk of the asset classes is captured by the
variance in their returns, the asset allocation model will diversify risk by selecting
securities whose returns are not highly correlated with each other. Harry Markowitz
(1952,1959) formulated the problem of portfolio selection as a mean-variance opti-
mization model. What, subsequently, became known as Markowitz model provides
the basis for models for asset allocation. The model also provided the foundations
for the development of modern portfolio theory, and Markowitz's contribution was
recognized by a Nobel prize in economics in 1990.
We illustrate the formulation of a mean-variance optimization model for asset allo-
cation in broad currency categories. The problem facing the investor is that of allocat-
ing cash reserves among securities denominated in different currencies to achieve a
target expected return for the portfolio, while minimizing the variance of the returns.
Define I, the set of available currencies, #i, the expected return of each asset class, and
Q = {q{j}, the covariance matrix of returns. Let also #p denote the target expected
return of the portfolio, and let zi, i 6 I, denote the fraction of the investor's cash
that is allocated to asset class i. The Markowitz mean-variance portfolio optimization
model is formulated as follows:


Minimize ~ E qijzizj, (4)
i6I j6I

s.t. ~#ix~=#p, (5)


~-~zi = 1. (6)

The GAMS statement of this model is illustrated in Fig. 10.1. Lines 13 and 14 define
the set of investments. These two statements illustrate the use of sets ( I ) and supersets
(S). The ALIAS command is used to give two names to the set of investments. The
need for this is made clear in the definition of the covariance matrix. A PARAMETER
statement specifies the expected return of each security, and a TABLE specifies the
Ch. 10: Modeling Languages in Computational Economics: GAMS 483

13 SET S investment set / CN,FR,GR,JP,SW,UK,US,WR /

14 I(S) analyzed investments / CN, F R , G R , J P , S W , U K , U S /;
16 ALIAS (l,J) ;
22 PARAMETER MU(S) expected return of security /
24 CN 0.1287
25 FR 0.1096
26 GR 0.0501
27 JP 0.1524
28 SW 0.0763
29 UK 0.1854
30 US 0.0620
31 WR 0.0916 /
34 TABLE Q(I,J) covariance matrix
37 CN 42.18
38 FR 20.18 70.89
39 GR 10.88 21.58 25.51
40 JP 5.30 15.4] 9.60 22.33
41 SW ]2.32 23.24 22.63 10.32 30.01
42 UK 23.84 23.80 13.22 10.46 ]6.36 42.23
43 US 17.41 12.62 4.70 1.00 7.20 9.90 16.42 ;
45 Q(I,J)$(0RD(J) GT ORD(I)) : Q(J,I) ;
48 SCALAR MUp target expected return for the portfolio / 0.115 /;
52 OMEGA objective function value
53 X(I) f r a c t i o n of t h e p o r t f o l i o that consists of security I ;
57 X.UP(I) = 1 ;
61 OBJ objective function
62 MBAL mean balancing constraint
63 BUDGET budget constraint ;
66 OBJ.. OMEGA :E: .5 * S U M ( ( I , J ) , Q(I,J)*X(I)*X(J)) ;
69 MBAL.. SUM(I, MU(I)*X(I)) :E: MUp ;
72 BUDGET.. SUM(I, X(I)) -E- 1 ;

Figure 10.1. A GAMS statement for the mean-variance asset allocation model.
484 S.A. Zenios

covariance matrix of those returns. Note that only the lower diagonal part of the
covariance matrix is specified by the TABLE command. The upper diagonal part is
easily constructed, due to symmetry, by the implied loop over r and J in line 45.
The rest of the model is self-explanatory. A target expected return is specified, a
quadratic expression of portfolio v~ariance is defined for the objective function, and
the model minimizes the variance of the portfolio, subject to the constraints that the
target expected return is achieved, and that 100% of the available budget is invested.
The model is a quadratic programming problem and is solved using a nonlinear
programming (NLP) code.

3.3. The SAMBAL system: Estimating Social Accounting Matrices

We now describe a GAMS based system that facilitates the representation and solution
of matrix estimation problems. The GAMS/SAMBAL system is a custom-made tem-
plate of the GAMS language that allows easy specification of models for estimating
a social accounting matrix. This system is specialized for the particular application,
but the full set of GAMS features explained above remain available.
The matrix estimation problem is typically posed as follows:

Given a rectangular matrix A, determine a matrix X that is close to A and satisfies

a given set of linear restrictions on its entries.

A matrix that satisfies the linear restrictions is said to be balanced. For a survey of
models and algorithms for this problem see Schneider and Zenios (1990).
In this section we are interested in the estimation of social accounting matrices, or
SAM. A SAM is a square matrix A whose entries represent the flow-of-funds between
the national income accounts of a country's economy at a fixed point in time. Each
index of a row or a column of A represents an account, or agent, in the economy.
Entry aij is positive if agent j receives funds from agent i. A SAM is a snapshot of
the critical variables in a general equilibrium model describing the circular flow of
financial transactions in an economy. For balancing problems arising from estimating
SAMs, the linear restrictions are the a priori accounting identities that each agent's
total expenditures and total receipts must be equal. That is, for each index i of the
matrix A, the sum of the entries in row i must equal the sum of the entries in column i.
The volume by Pyatt and Round (1985) provides an introduction to Social Accounting
The agents of an economy in a simplified SAM include institutions, factors of
production, households, and the rest-of-the-world (to account for transactions with the
economies of other countries). Briefly, the production activities generate value-added
which flows to the factors of production - land, labor, and capital. Factor income is the
primary source of income for institutions - households, government and firms - who
purchase goods and services supplied by productive activities, thereby completing the
Ch. 10: Modeling Languages in Computational Economics: GAMS 485

cycle. Of course, to be useful for equilibrium modeling, this highly aggregated model
must be disaggregated into subaccounts for each sector of the economy.
The compilation of a SAM is a difficult task due to data inconsistencies. Incon-
sistent data is an inherent problem when statistical methods are used to estimate
underlying economic models. Morgenstern (1963) devoted his book to the problem of
inconsistency in economic measurements. In particular, the direct estimate of a SAM
is never balanced. The following quote of Sir Richard Stone [Van der Ploeg (1982,
p. 186] summarizes the sources of inconsistency in SAM modeling.

. .. it is impossible to establish by direct estimation a system of national accounts

free of statistical discrepancies, residual errors, unidentified items, balancing en-
tries and the like since the information available is in some degree incomplete,
inconsistent and unreliable. Accordingly, the task of measurement is not finished
when the initial estimates have been made and remains incomplete until final es-
timates have been obtained which satisfy the constraints that hold between their
true values.

Therefore, the raw estimates of a SAM must be adjusted so that the consistency
requirements are satisfied. This problem motivates much of the work on matrix bal-
ancing for economic modeling.
A matrix balancing problem also arises when partial survey methods are used
to estimate a SAM. Frequently, estimates of the total expenditures and receipts are
available for each agent in an economy, but current data are not available for the
individual transactions between the agents. If a complete (balanced) SAM is available
from an earlier period, then the SAM must be updated to reflect the recent index
totals. The problem is then to adjust the entries of the old matrix A so that the
row and column totals equal the given fixed amounts. A similar balancing problem
occurs when the entries of an input-output matrix must be updated to be consis-
tent with exogenous estimates of the total levels of primary inputs and final de-
The matrix balancing application we describe here can be formulated as follows:

PROBLEM 3.3. Given an n x n nonnegative matrix A = (aij), determine a "nearby"

nonnegative matrix X = (xij ) (of the same dimensions) such that

~xij=2xji, i= 1,2,...,n, (7)

j=I j=l

and xij > 0 only if aij > O.

In general, there are infinitely many matrices satisfying the consistency restric-
tions (7). For the problem to be well-posed the notion of a nearby matrix has to
be defined. This notion is defined by some distance function f ( X ; A) which mea-
sures the "distance" between X and A. The choice f ( X ; A ) -~ II X - A N3, where
486 S.A. Zenios

l] " IIF denotes the Frobenius norm, leads to a linearly constrained quadratic opti-
mization problem. Another commonly used objective is the negative entropy func-

(i,j) \ aij /

The G A M S / S A M B A L system specifies exactly how data structures for a matrix

balancing model should be set up. A GAMS statement ACRONYHS is used to specify
the distance functions. Additional data structures are provided to hold the data of the
problem by using two sets of GAMS symbols. One set, prefixed with T, is used to
provide information about the entries of the SAM (e.g., initial values, upper/lower
bounds on the estimated values, specification of the distance measure). The other set,
prefixed by Y, is used to specify information about the row and column totals. Though
Problem 3.3 only specifies that row sums should be equal to column sums, it is also
possible that some a priori target values are given for these totals. It is possible to
specify a problem whereby row sums are required to be equal to column sums, while
minimizing some distance from the prespecified values.
The following data structures are available for the specification of the Social Ac-
counting Matrix.

TINIT the initial values of the matrix

TNAX upper bounds on the entries of the balanced matrix
TMIN lower bounds on the entries of the balanced matrix
TFUNC functional form of the distance function, chosen from the list of ACRONYMS
TWEIGHT weighing coefficients of the penalty term for each entry of the matrix
TBASE the balanced values of the matrix

The same data structures, prefixed with Y - Y I N I T , YNAX, YMIN, YFUNC,

YWEIGHT, YBASE are available to store information about the row and column

totals. For example, Y I N I T specifies the initial, target, values for the row and column
totals. Not all of the above data structures need to be provided for a well-specified
model. Minimal data requirements include the specification of the initial matrix values
T I N I T and the functional from of the distance norms. Other information is incorpo-
rated in the model if it is provided.
Figure 10.2 illustrates the output of a simple model in GAMS/SAMBAL. This
model estimates the entries of a 5 x 5 social accounting matrix. For three of these
rows/columns we are given an estimate of the totals (see line 31), and a quadratic
distance function is specified (line 49). For the remaining totals no prior information
is given, and a residual distance function (i.e., a penalty identically equal to zero) is
specified in line 50.
Ch. 10: Modeling Languages in Computational Economics." GAMS 487




9 LABOR 15 3 130 80
12 PROD1 15 130 20
13 PROD2 25 40 55
25 LABOR 0.167 0.833 0.038 0.063
28 PROD1 0.167 0.019 0.071
29 PROD2 0.400 0.063 0.091
32 L A B O R = 220, P R O D 1 = 190, P R O D 2 - 105
34 L A B O R = 22, P R O D 1 : 38, P R O D 2 : 21 /;
39 A C C N ( A C C ) = Y E S ; A C C N ( A C C A ) = NO;

Figure 10,2. The GAMS/SAMBAL statement of a simple model for estimating a Social Accounting Matrix.
488 S.A. Zenios

[1] Bisschop, J. and Entriken, R. AIMMS: The modeling system. Haarlem, The Netherlands: Paragon
Decision Technology, 1993.
[2] Bisschop, J. and Meerans, A. 'On the development of a general algebraic modeling system in a
strategic planning environment', Mathematical Programming Study, 20:1-29, 1982.
[3] Brooke, A., Drud, A. and Meeraus, A. 'High level modeling systems and nonlinear programming',
in: P.T. Boggs, R.H. Byrd and R,B. Schnabel, eds, Numerical optimization 1984. Philadelphia, PA:
SIAM, 1984.
[4] Brooke, A., Kendrick, D. and Meeraus, A. GAMS: A user~ guide, release 2.25. Danvers, MA: The
Scientific Press, Boyd and Fraser Publishing Company, 1992.
[5] Dahl, H., Meeraus, A. and Zenios, S.A. 'Some financial optimization models: 1. Risk management',
in: S.A. Zenios, ed., Financial optimization. Cambridge Univ. Press, pp. 3-36, 1993.
[6] Dantzig, G.B. Linear programming and extensions. Princeton, N J: Princeton Univ. Press, 1963.
[7] Dirkse, S., Fen:is, M., Preckel, P.V. and Rutherford, T. 'The GAMS callable program library for
variational and complementarity solvers', Working paper, Computer Science Department, University
of Wisconsin, Madison, WI, 1992.
[8] Drnd, A. 'CONOPT: A GRG code for large sparse dynamic nonlinear optimization problems', Math-
ematical Programming, 31:153-191, 1985.
[9] Drud, A. and Kendrick, D. 'HERCULES: A system for large economywide models', Technical report,
Development Research Department, The World Bank, Washington, DC, 1986.
[10] Fourer, R. 'Modeling languages versus matrix generators for linear programming', ACM Transactions
on Mathematical S¢~flware, 9(2):143-183, 1983.
[11] Fourer, R., Gay, D.M. and Kernighan, B.W. AMPL: A modeling language.for mathematical program-
ming. The Scientific Press, 1993.
[12] Geoffriou, A.M. 'Introduction to structured lnodeling', Management Science, 33(5):547-588, 1987.
[13] Lasdon, L.S., Waren, A.D., Jain, A. and Rather, M. 'Design and testing of a generalized reduced
gradient code for nonlinear programming', ACM Transactions on Mathematical S~flware, 4:34, 1978.
[14] Markowitz, H. 'Portfolio selection', Journal of Finance, 7:77-91, 1952.
[15] Markowitz, H. Portfolio selection, efficiency diversification of investments. Cowles Foundation Mono-
graph 16, New Haven, CT: Yale Univ. Press; 2nd edn - Princeton, NJ: Basil Blackwell, 1959.
[16] Morgenstern, O. On the accuracy ¢~feconomic observations. Princeton, N J: Princeton University Press,
[17] Murtagh, B.A. and Saunders, M.A. 'Minos user's guide', Report sol 77-9, Department of Operations
Research, Stanford University, California, CA, 1977.
[ 18] Van Der Ploeg, E 'Reliability and the adjustment of sequences of large economic accounting matrices',
Journal of the Royal Statistical Society, 145:169-194, 1982.
[19] Pyatt, G. and Round, J.I., eds Social accounting matrices: A basis for planning. Washington, DC:
The World Bank, 1985.
[20] Rutherford, T. 'Extensions of GAMS for complementarity problems and variational inequalities with
examples arising in economic equilibrium analysis', Working paper, Department of Economics, Uni-
versity of Western Ontario, 1992.
[21] Schneider, M.H. and Zenios, S.A. 'A comparative study of algorithms for matrix balancing', Opera-
tions Research, 38:439-455, 1990.
[22] SinghM, J., Marsten, R.E. and Morin, T.L. 'Fixed order branch-and-bound methods for mixed-integer
programming: The ZOOM system', ORSA Journal on Computing, 1(1):44-51, 1989.
[23] Zenios, S.A. 'Integrating network optimization capabilities into a high-level modeling language', ACM
Transactions on Mathematical Software, 16:113-142, 1990.
[24] Zenios, S.A., Drud, A. and Mulvey, J.M. 'Balancing large social accounting matrices with nonlinear
network programming', Networks, 17:569-585, 1989.
Ctutpter 11


University of Califbrnia at Berkeley

1. Introduction 490
2. Design of Mathematica 490
3. The front end 490
4. Programming 491
4.1. An example 491
4.2. Defining functions 492
4.3. Programming constructs 493
4.4. Pattern matching 493
4.5. Expressions 495
5. Packages 496
6. MathSource 496
7. Applications in economics 496
7.1, Comparative statics 496
7.2. Dynamic programming 498
7.3. Nash equilibria 499
7.4. Econometrics and statistics 499
7.5. Graphics 502
7.6. Teaching 504
8. Summary 505
References 505

*I would like to thank David Belsley and Bob Parks for comment on earlier drafts. This work was
supported by NSF grant SES-9223130.

Handbook of Computational Economics, Volume 1, Edited by H.M. Amman, D.A. Kendrick and J. Rust
~) 1996 Elsevier Science B. V. All rights reserved.
490 H.R. Varian

1. Introduction

Mathematica is a computer program that can help you do mathematics. You can use
it to do symbolic, numeric and graphical analysis. Mathematica is sold by Wolfram
Research, Inc and runs on a variety of computers including MS-Windows, Macintosh,
and Unix platforms. The cost of Mathematica depends on the version and the platform;
it ranges from about $200 for a student version to several thousand for a multiple-user
workstation version. You can contact Wolfram Research, Inc. by sending e-mail to
i n f o @ w r i , corn, or calling them at 217-398-0700.

2. Design of Mathematica

There are two parts to the Mathematica program: the kernel and the front end. The
kernel is the basic computational engine and is more-or-less platform independent;
the front-end is slightly different for each platform. These two programs can be run
separately: the front end can run on a lowly Macintosh while the kernel executes
on a remote workstation or a supercomputer. This allows you to do your computa-
tions on whatever size computer you choose and still work in exactly the same user
Wolfram Research has developed a set of protocols known as MathLink that allow
the Mathematica kernel to communicate with other programs running on a given ma-
chine. This feature allows the user to combine functionality of various programs in
a convenient way. For example, you can manipulate numbers in a spreadsheet and
then send them to Mathematica for further processing. Or you can process Mathe-
matica output in TEN or some other formatting system. You can also send parts of
Mathematica computations off to a special-purpose computer package such as I M S L
or S. Various packages are available from third party suppliers that make this kind of
inter-process communication very easy to implement.
You load packages into Mathematica using < < as in the following examples.
In[l]:= < < S t a t i s t i c s 'D e s c r i p t i v e S t a t i s t i c s '
< < / U s e r s /h a l / P a p e r s / M a t h e m a t i c a / n a s h , m

3. The front end

Mathematica keeps a record of a session in a format known as a Notebook. This is

an A S C I I file and is essentially machine independent. It allows the input and output
of Mathematica (including graphical output) to be organized in a convenient way.
A Notebook has an outline structure tIiat allows parts of the session to be hidden
or open as the user desires. A Notebook can serve as an "audit trail" to ensure that
calculations or manipulations of data can be easily reproduced.
Notebooks can themselves be used as inputs to other programs. For example, W R I
distributes a package that will convert Notebooks to TEN format so that the material in
Ch. 11: Mathematica./or Economists 491

the Notebook can be typeset. Several books have been produced using this technique.
In fact, this article has been produced using this system.

4. Programming

Mathematica contains a complete programming language that can be used to automate

various kinds of computations. Thejanguage is based on the philosophy of "functional
programming". This means that the fundamental operation in the language is the
application of a function.
Adherents of functional programming argue that it is a very efficient way to pro-
gram. Functional programming allows you to build up small pieces of a program,
interactively debug them, and string them together to achieve a desired end. Other
functional languages are APL and Lisp. People who have used these languages will
find Mathematica programming to be quite congenial.
Mathematica also has tools for procedural programming, which is the style of
programming used in Fortran, Pascal and C. However, these tools - DO loops, WHILE
loops, and the like - are normally not the best way to program in Mathematica. One
advocate has gone so far as to proclaim "If you .aren't programming functionally,
you're programming disfunctionally!"

4.1. An example

The operation of the Mathematica programming language is best illustrated by an

Suppose that you want to compute the square root of 3 using Newton's method.
The difference equation that you want to iterate is:

Xt+ 1 = 2 3;t -}-

In order to write a program to calculate this expression using C or Fortran you would
need to declare the variables, construct a DO or a for loop, and output the results. In
Mathematica you simply declare the function:
In[l]:= newton[x_] := N [ I / 2 (x + 3/x)]

The expression on the left is the function declaration, x_ defines the dummy variable
that will be the argument to the function, and N[ ] indicates that you want the
expression inside the bracket to be converted to a real number.
Once the function has been defined, you then apply the built-in function Ne s t L ± s t
to calculate the first 5 terms starting from x = 1:
492 H.R. Varian

tn[2]:= NestList [newton, 1.0,5 ]

Out[2]= {i., 2., 1.75, 1.73214, 1.73205, 1.73205]

If you want to iterate until the result no longer changes, simply use the FixedPoint
In[3]:= FixedPoint [newton, 1.0 ]
Out[3]= 1. 7 3 2 0 5

4.2. Defining functions

Mathematica contains a number of functions that operate on lists of objects. You

can also write your own functions. For example, Mathematica contains a derivative
function that will calculate the symbolic derivative of an expression:
In[l]:: D[x^n,x]

Out[l]= -1 + n
n x

You can define a gradient and Hessian function as follows:

In[2]:: Grad[f_,x_] := N a p [ D [ f , # ] & , x ]
Hessian[f_,x_] := G r a d [ G r a d [ f , x ] ,x]

Here the # symbol is a placeholder that will take on the values in the list x. The
definition of G r a d "maps" the D[ f , #] over the list x = { x l , x 2 } to produce a new
list {D [ f , x l ] , D [ f , x 2 ] }. The definition of Hessian applies G r a d to Grad. Here
are some examples.
In[3]:= Grad[xl^a x2^b, {xl,x2}]

Out[3]: -i + a b a -i + b
{a xl x2 , b xl x2 }

In[4]:= MatrixForm[Hessian[xl^a x2^b, {xl,x2)] ]

Out[4]: -2 + a b -i + a -I + b
(-i + a) a xl x2 a b xl x2

-i + a -i + b a -2 + b
a b xl x2 (-i + b) b xl x2

Similarly the following definition will produce the first-order conditions for optimizing
a function:
In[5]:= FOC[f ,x ] :: M a p [ ( D [ f , # ] - - 0 ) & , x ]

In[6]:= FOC[xl^a x2^b, {xl,x2}]

Out[6]: i + a b a -i + b
{a x l x2 == 0, b xl x2 == 0]
Ch. l l." Mathematica for Economists 493

4.3. Programming constructs

Mathematica contains a number of programming constructs for iterating, branching,

etc. However, in general it is best to avoid iteration and indices if possible. Often
there is a built in function that will do some particular sort of manipulation of a list
of values. For example, recently I had a list of price vectors, (pt) and associated
consumption bundles (z t) for t = 1 , . . . , T. I wanted to calculate the matrix (pS;ct)
tbr -L,s = 1 , . . . , T for some revealed preference calculations. This is simple to do
using iteration and indices of course, but that is quite inelegant. After a short search
through the Mathematica book and a bit of experimentation I came up with the
following solution that uses Mathematica's generalized inner product function.
In[l]:= vNatrix[p_,x_] := I n n e r [ T i m e s , p , T r a n s p o s e [ x ] , P l u s ]

To verify that this works, define some vectors and apply the function:
In[2]:= p:{ {pll,pl2}, {p21,p22}, {p31,p32] } ;
x={ {xll,xl2}, {x21,x22}, {x31,x32} ] ;

In[3]:= M a t r i x F o r m [vMatrix [p, x] ]

Out[3]= pll xll + p12 x12 pll x21 + p12 x22 pll x31 + p12 x32

p21 xll +' p22 x12 p21 x21 + p22 x22 p21 x31 + p22 x32

p31 xll + p32 x12 p31 x21 + p32 x22 p31 x31 + p32 x32

This example illustrates a nice point about Mathematica programming: if you know
the formula for what you want, you can apply your function to symbolic values to
see if it produces the right thing. Once it does, you can switch to numbers.

4.4. Pattern matching

At the most fundamental level, Mathematica operates by replacing patterns of ex-

pressions with other expressions. This means that Mathematica has a sophisticated,
built-in pattern matching engine. This pattern matching facility is also available to the
For example, economists often want to solve systems of equations that have a
"Cobb-Douglas" structure:
X l(lll X 2512 =hi,
X 10'21X 2c~22= b2.
An easy way to do this is to take a log transform to construct the linear system

all log :cl + al2 log z2 = log bl,

a21 log Zl + a22 log z2 = log b2.
494 H.R. Varian

Mathematica won't do this kind of transformation automatically, nor should it: this
particular transformation is only valid for positive real numbers. On the other hand,
it would be nice to automate this sort of thing. Here's how to do this in M a t h e m a t i c a .
First we define the rules that translate the pattern may b = c into

a log x + b log y = log c

and the reverse transformation. (The semicolon at the end of the expression inhibits
the output which, in this dase, is uninteresting.)
In[l]:: C l e a r [x]

In[2]:: logRules:{x_^a_ y ^b :: c_ -> a Log[x] + b L o g [y] : : L o g [c] } ;

In[3]:= eRules={(Log[x_] -> ( a ' L o g [ b _ ] + c_*Log[d_])/e_) ->

(x -> b ^ ( a / e ) d^(c/e))};

N o w let's apply this to solving the following system of equations.

In[4]:= eqns={xl^all x2^a21 == b l , x l ^ a l 2 x2^a22 =: b 2 }

Out[4]= all a21 a12 a22

{xl x2 == bl, xl x2 =: b 2 }

ln[5]:: logEqns:eqns/.logRules

Out[5]: {all Log[xl] + a21 Log[x2] :: Log[bl],

a12 Log[xl] + a22 Log[x2] =: Log[b2]}

In[6]:= ans=Simplify[Solve[logEqns, {Log[xl],Log[x2]}]]

Out[6]= -(a22 Log[bl]) + a21 Log[b2]

{{Log[xl] -> ............................ ,
a12 a21 - all a22

-(a12 Log[bl]) + all Log[b2]

Log[x2] -> ............................ }}
-(a12 a21) + all a22

In[7]:= ans/.eRules

Out[7]: b2/(a12 a21 - all a22)

{{xl -> . . . . . . . . . . . . . . . . . . . . . . . . . ,
a22/(a12 a21 - all a22)

b2/(-(a12 a 2 1 ) + a l l a22)
x2 -> . . . . . . . . . . . . . . . . . . . . . . . . . . . . }}
a12/(-(a12 a21) + a l l a22)
Ch. 11: M a t h e m a t i c a .]'or Economists 495

This particular set of rules is pretty minimal. A really useful set of rules for doing
and undoing log transforms should be more sophisticated. Nevertheless, this example
illustrates some of the power of the pattern matching capabilities.

4.5. Expressions

In Mathematica everything is an expression: a head followed by a list of items. A

generic list, for example, is represented by the expression LL s t [ a , b , c ] . ']'he sum
of 3 objects is represented by P l u s [ a , b , c ] . An assignment expression that says
"map a to b" is represented by R u l e [ a , b ] , and so on. This uniform representation
means that all objects can be operated upon in the same way.
Consider for example the operator Map [ f , e x p ]. This will distribute the function
f over the elements in the expression e x p . So
In[1]:= M a p [ f , L i s t [a,b, c]]
Map[f,Plus[a,b,c] ]
M a p [f , R u l e [a, b] ]

Out[l]= {f[a], f[b], f[c])

Out[l]= f[a] + f[b] + f[c]

Out[l]= f [a] -> f [b]

Functions can be defined "on the fly" by using the construction called a pure
function. The function f [ x ] = x ^ 2 can be written in "pure" form as #^2& or
Function [x, x^2 ].

In[2]:= Map [#^2&, {a,b,c}]

Out[2]= 2 2 2
{a , b , c )

You can also have functions with multiple definitions such as

In[3]:= f[x_] : = x ^ 2
f[x_,y_] := x^3 + y^3

Mathematica will try the first form first; if that doesn't work it will try the second. If
nothing fits, Mathematica just returns what you typed in.
In[4]:= f [a]
f [a,b,c]
Out[4]= 2
Out[4]-: 3 3
a + b

Out[4]= f [a, b, c]

This allows you to define functions that can accept different forms of arguments~
496 H.R. Varian

5. Packages

Sets of Mathematica commands can be collected together into Packages. These are
plain ASCII files that can be input into other Mathematica programs to do specific
calculations. The Mathematica distribution comes with a number of Packages designed
for specific sorts of calculations such as combinatorics, linear algebra, statistics, and
so on. Many authors have produced Packages and Notebooks that for various uses that
they have made available to other users through articles, books and on-line systems.

6. MathSource

Wolfram Research maintains a repository of contributed Mathematica materials that

are available via e-mail and ftp. This repository is known as MathSource. The easiest
way to start using MathSource is to send e-mail to m a t h s o u r c e @ w r i , corn that
contains the message h e l p i n t r o . MathSource will return some documents that
explain how to retrieve files.

7. Applications in economics

In the following sections I describe some applications of Mathematica in economics.

Obviously the list is not complete, but I hope to give the reader some idea of potential
uses. Many further examples can be found in Varian (1993).

7,1. Comparative statics

Economists spend a lot of time analyzing optimization problems using the techniques
of comparative statics. Although these computations are very simple, they can be
quite tedious. Mathematica can help to automate this process.
For example, here is a calculation that derives the comparative statics for a profit
maximizing firm with two inputs. First we define the objective function:
In[l]:= profit = f[xl,x2] - wl xl - w2 x2

Out[[]= -(wl xl) - w2 x2 + f[xl, x2]

Next we calculate the first order conditions for profit maximization and-the Hessian
using the functions that we have defined earlier.
In[2]:= focs=FOC [profit, {xl,x2} ]

Out[2]= (1, O) (0,1)

{-wl + f [xl, x2] :: O, -w2 + f [xl, x2] == O]
Ch. 11: Mathematicafi~r Economists 497

In[3]:= Hess = H e s s i a n [ p r o f i t , {xl,x2}]

Out[3]= (2, O) ( i , 1)
{{f [xl, x2], f [xl, x2]},

(i,i) (0,2)
{f [xl, x2], f [xl, x 2 ] } }

Note Mathematica's notation for derivatives: f(i,j) is the ith derivative of argument
1 and the jth derivative of argument 2.
Next we totally differentiate the first-order conditions.

In[4]:= totalDerivative = Dt [foes]

Out[4]= ( 1,1 )
{-Dt[wl] + Dt[x2] f [xl, x2] +

Dt[xl] f [xl, x2] == 0,

-DE[w2] + Dt[x2] f [xl, x2] +

mt[xl] f [xl, X2] =: 0}

Note that Mathematica uses the notation of D t [ x l ] for the differential element d z I .
N o w we simply solve the system of equations and substitute out for the determinate
of the Hessian:
In[5]:= S i m p l i f y [ S o l v e [ t o t a l D e r i v a t i v e , {Dr [xl] ,
Dt [x2 ] } ] ] /. {Det [Hess] - > d H e s s }
Out[5]= (0,2) (1, l )
Dt[wl] f [xl, x2] - D t [ w 2 ] f [xl, x2]
{{Dt[xl] -> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ,

D t [x2 ] ->

(i,i) (2,0)
-(Dt[wl] f [xl, x2]) + mt[w2] f [xl, x2]
................................................ }}

In standard notation, these expressions say

)22 dwl q- f12 d'w2

dz 1 =

fll dw2 + f21 dwl

dx 2 =

These conditions contain all of the normal comparative statics conclusions about cost
498 H.R. Varian

7.2. Dynamic programming

Dynamic programming is another calculation that is straightforward but tedious. Con-

sider, for example, the problem of allocating consumption over time. The optimal
solution can be characterized through the use of the Bellman value function:

V~(w) = max u(e) + c~½+l((W - c)r).


For certain classes of u(c) it is possible to find closed-form solutions for V~(w). Solv-
ing the Bellman recursion numerically or symbolically is simple using Mathematica.
For example, here is how you would write the Bellman equation for the case of log
utility and a five-period time horizon:
In[l]:= V[w , 5 ] :: Log[w]
V [ w _ , t_] :=
M o d u l e [{c},
Log[c] + alpha*V[ (w-c)*R,t+l]/.Solve[D[Log[c]
+ alpha*V[ ( w - c ) * R , t + l ] , c ] = = 0 , c ] [ [i] ] ]

The first definition gives the boundary condition. The second definition gives the
The M o d u l e construction declares c to be a local variable. Subsequently, we have
the recursive definition of the value function; the notation a / . b means "substitute b
into a". In this case b contains the optimal solution and a is the objective function.
Calculating V(w, 2), for example, gives us:
In[2]:= Simplify [V [w, 2 ] ]

Out[2]= w
Log[ ........................... ] +
2 3
1 + alpha + alpha + alpha

alpha R w
alpha Log[ ............................ ] +
2 3
I + alpha + alpha + alpha

2 2
2 alpha R w
alpha L o g [. . . . . . . . . . . . . . . . . . . . . . . . . . . . ] +
2 3
1 + alpha + alpha + alpha

3 3
3 alpha R w
alpha L o g [. . . . . . . . . . . . . . . . . . . . . . . . . . . ]
2 3
1 + alpha + alpha + alpha

The optimal consumption in period 2 is given by

Ch. 11. Mathematica ./br Economists 499

In[3]:= Solve[D[Log[c] + alpha*V[(w-c)*r,3],c]==0,c] [[i]]

Out[3]= w
{c -> ........................... }
2 3
1 + alpha + alpha + alpha

7.3. Nash equilibria

In a two-person game with a finite number of strategies, calculating all Nash equilibria
is a straightforward hut tedious enumeration of Kuhn-Tucker conditions. Dickhaut and
Kaplan (1992) have written up a Mathematica package that automates this calculation.
For example, here are all Nash equilibria in the Battle of the Sexes.

In[t]:= Nash[ { { {2, i}, {0,0} }, { {0,0), {1,2} ] } ]

Out[l]= 2 1 1 2
{{{0, i}, {0, i}}, {{-, -}, {-, -}}, {{i, 0}, {I, 0)})
3 3 3 3

7.4. Econometrics and statistics

One of the most promising and comparatively underexploited areas for applications
of Mathematica is in econometrics and statistics. Mathematica can serve as a com-
putational engine for special-purpose calculations, as a symbolic engine for deriving
expressions, and as a tool for data analysis.

7.4.1. Symbolic expressions

Mathematica can simplify various statistical calculations. As an example, let us define

a Normal distribution:

In[l]:= N o r m a l D i s t n [x ,m_, s_] _--

E x p [ - ( ( x - m ) / s ) ^2 / 2]/ (s Sqrt[2*Pi] )

Consider the problem of choosing a forecast y so as to minimize some expected loss

involving x and y. The L I N E X loss function [see Varian (1975) and Zellner (1986)]
has the form:
In[2]:= LinexLoss[a_,x_,y_] := Exp[a*(y-x)] - a*(y-x)

We are interested in the expected loss. The easy way to calculate this is to recognize
that the first term is just the moment generating function for the Normal distribution.
But if we don't recognize this, we can easily calculate the expected loss between - o c
and + o o using Mathematica:
500 H.R. Varian

ln[31:= Exloec t e d L o s s 1 = T o g e t h e r [P o w e r E x p a n d [
Integrate [LinexLoss [a,x,y] *
NormalDistn [x,m, s} , {x, - I n f i n i t y , Infinity} ]]]

Out[3]: 2
(a (-2 m + a s + 2 y))/2
(2 E + 2 am- 2 ay-

(a - --) s
2 2
(a (-2 m + a s + 2 y))/2 s
E r f [. . . . . . . . . . ] -
S q r t [2 ]

+ --) s
2 2
(a (-2 m + a s + 2 y))/2 s
Erf[ ........... ]) / 2
S q r t [2 ]

In Mathematica notation,

Erf[z] : ~ lf0 e -t 2dr.

From the definition it is easy to see that

Erf[-z] : -Erf[z].

In order to get Mathematica to make this substitution, we use the function

ExpandAl I.

In[4]:= Expec tedLos s 3 :ExpandAl i [E x p e c t e d L o s s l ]

Out[4]= 2 2
-(a m) + (a s ) /2 + a y
E + a m - a y

Finally we solve for the loss-minimizing estimate and use Simplify and
PowerExpand to put it into a simple form.

In[5]:: ans=Solve[D[ExpectedLoss3,y] ::0,y] [ [i] ]

Out[5]: 2 2 2 2
a m - a s a s
Log[E Sqrt[E ]]
{y -> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . }

[n[6]:: Simplify [P o w e r E x p a n d [ans ] ]

Out[6]: 2
a s
{y - > m . . . . . }
Ch. l h Mathematica for Economists 501

The loss minimizing estimator is the mean, biased downwards by an amount that
depends on the variance of the distribution.

7.4.2. Special purpose calculations

The bootstrap is a well-known tool for estimating the sampling distribution of" an esti-
mator. Implementing this in Mathematica is trivial. Here is a function that resamples
from a list.
In[l]:= Resample[list_] :: list[[
Table [Random[ Integer, { I, Length [list ] } ] ,
{Length[list] ]] ] ]

To test this we apply it to a symbolic list:

In[2]:= Resample [ ta, b, c, d, e, f, g,h} ]

Out[2]: {b, f, b, d, a, f, h, d}

In[3]:= Resample [ {a, b, c, d, e, f, g, h} ]

Out[3]: {e, f, c, c, e, d, h, h]

N o w I generate a sample of 25 random numbers; the semicolon tells Mathematica

not to print them out.
In[4]:= theSample:Table [Random [] , {25}] ;

Here I resample from the list lO times and compute the mean of each resample.
In[5]:= bootList=Table [Mean [Resample [theSample] ] , {i0}]
Out[5]= {0.529655, 0.473363, 0.62481, 0.586473, 0.543386,

0.547182, 0.387835, 0.464449, 0.593265, 0.633631}

We can write a function that will take a sample and a statistic and apply this
statistic to resampled draws from the original sample. (The M e d i a n is defined in
Statistics 'DescriptiveStatistics.)

lnl6l:= bootlt[sample_,n ,stat_] := Table[stat[Resample[sample]],{n}]

In[7]:= bootIt[theSample,10,Median]
Out[7]= {0.520299, 0.482829, 0.398446, 0.526447, 0.482829,

0.520299, 0.398446, 0.392464, 0.445672, 0.526447}

7.4.3. Data analysis

Mathematica comes with a number of standard statistical routines. Belsely (1992)

has provided a number of additional routines for classical econometric calculations.
502 H.R. Varian

Ley and Steel (1993) have provided routines for Bayesian calculations. Stine (1992)
describes some methods for time series analysis.
It is also easy to write your own routines. For example, I recently used Mathematica
to calculate "efficiency indices". Suppose that you have a set of factor prices, factor
choices, and output levels for n firms denoted by (wi, x~, Yi). The efficiency index of
firm i is given by

ei : min W i X j
Yj)Y{ WiXi

Here we look at each firm that produces at least as much output as firm / to see if it's
production plan would cost less than firm i's plan assuming both firms faced factor
prices wi. Furthermore, I wanted to keep track of all the elements over which the
minimization was taken so I could see how many firms appeared to be more efficient
than the firm in question.
This is not a difficult calculation, but there is no ready-made package to do it.
In Mathematica all I had to do was to write:

In[l]:= efficOf[i_,j_,wt_,xt_] := (wt[[j]].xt[[i]])/(wt[[j]].xt[[j]]

eff[ws ,xs_,y_] :=
Map [Union,
Table[If[y[[i]] <= y[[j]],Min[efficOf[j,i,ws,xs],l],l],
{i,l,Length[y] }, {j,l,Length[y] }] ]

In[2]:= effList=eff [w,x,y] ;

In[3]:= minEf f=Map [Min, ef fList ]

The first expression simply calculates the cost ratio• The second expression compiles a
list of all the efficiencies that are less than 1 for each firm. The third expression actually
does the calculations, and the fourth expression calculates the minimum efficiency.
Once I had these efficiencies it was easy to look at a histogram, see how they changed
when different inputs were used, and so on. Because the calculations were so flexible
it was much easier to experiment than it would be if I had used a Fortran or C program
to do this.

7.5. Graphics

One of the most useful things that Mathematica can do is to produce plots. Here are
a few economics graphs.
Ch. 1 h Mathematicafor Economists 503

7.5.1. Revealed preference

This is the output of a function that takes as input a list of price-quantity pairs, plots
them, and then highlights the observations that violate the Weak A x i o m of Revealed
Preference. (I omit the function definition since it is not very illuminating.)




20 40 60 80 i00

7.5.2. Cournot equilibrium

Here are the commands to generate the isoprofit lines and reaction curves depicting
a Cournot equilibrium. Although I've chosen a very simple example, Mathematica
has no problem dealing with quite complicated profit functions - including ones with
kinks and discontinuities.
In[l]:= Profitl[xl_,x2_] := (100-xl-x2)*xl
Profit2[xl ,x2_] := (100-xl-x2)*x2

ln[2]:: prl=ContourPlot[Profitl[xl,x2],{xl,0,50],{x2,0,50},

Out[21= -ContourGraphics-

In[3]:= pr2=ContourPlot[Profit2[xl,x2],{xl,0,50),{x2,0,50},

Out[3]: -ContourGraphics-

In[4]:= Solve[D[Profitl [xl,x2] ,xl] =:0,xl] [ [I] ]

Out[4]= i00 - x2
{xl -> . . . . . . . . }
504 H.R. Varian

In[5]:= rl:ParametricPlot[{(100-x2)/2,x2),{x2,0,50),
r2:ParametricPlot[{xl, (100-xl)/2),(xl,0,50),

Finally we combine the plots to produce the picture of the Cournot equilibrium:
In[6]:= Show [{prl, pr2, rl, r2 }, DisplayFunction->$DisplayFunction]






0 i0 20 30 40 5

Out[6]= -Graphics-

7.6. Teaching

I have used Mathematica to prepare problem sets for both graduate and undergraduate
courses. It makes it easy to get the graphs right and to make sure that the calculations
come out in round and/or realistic numbers. Lately I have realized that it is silly to
compose problems on Mathematica and have the students do them by hand: they
should have access to the same kinds of tools the professor has access to. In the near
future I hope to have some self-contained economic exercises and examples available
in Mathematica.
I have prepared a set of Notebooks that go through some of the calculations used in
my textbook Microeconomic Analysis. These are available via MathSource as items
0202-419. I have also experimented with a number of undergraduate exercises. Since
many universities are now introducing students to Mathematica and other symbolic
algebra systems in calculus courses, it should be easy to use these tools in more
advanced undergraduate economics courses.
Ch. 11: Mathematica.for Economists 505

8. Summary

I have found M a t h e m a t i c a to be very useful for prototyping, debugging, and executing

many sorts of computations and calculations of interest to economists. Its strong
point is that it does many different things well. It has a user-friendly front end and a
sophisticated and flexible programming language; these features make it very handy
for quick analysis of small-scale problems. However, it is probably not an ideal
platform for large-scale computations that are going to be repeated many times, or for
manipulation of large data sets. Standard statistical packages such as SAS or SPSS,
or general purpose programming languages such as C or Fortran, are probably better
suited for such tasks.


Belsely, D. (1993) 'Econometrics.m: A package for doing econometrics in Mathematica', in: H. Varian,
ed., Economic and financial modeling with Mathematiea. New York: Springer.
Dickhaut, J. and Kaplan, T. (1993) 'A program for finding Nash equilibria', in: H. Varian, ed., Economic
and financial modeling with Mathematica. New York: Springer.
Ley, E. and Steel, M.EJ. (1993) 'Bayesian econometrics: Conjugate analysis and rejection sampling', in:
H. Varian, ed., Economic and financial modeling with Mathematica. New York: Springer.
Stine, R.A. (1993) 'Time series models and Mathematica', in: H. Varian, ed., Economic and fnancial
modeling with Mathematica. New York: Springer.
Varian, H. (1974) 'A Bayesian approach to real estate assessment', in: S.E. Fienberg and A. Zellner, eds,
Studies in Bayesian econometrics and statistics. Amsterdam: North-Holland.
Varian, H., ed., (1993) Economic and financial modeling with Mathematica. New York: Springer.
Wolfram, S. (1991) Mathematica. Reading, MA: Addison-Wesley.
Zellner, A. (1986) 'Bayesian estimation and prediction using asymmetric loss functions', Journal ~[ the
American Statistical Association, 81:446451.
Chapter 12


Hoover Institution, Stanford University
National Bureau of Economic Research


1. Introduction 511
2. The uses of approximation ideas: An overview 513
3. The mathematical foundations of regular perturbation methods 515
3.1. The meaning of "approximation" 515
3.2. Taylor series approximation 516
3.3. Rational approximation 516
3.4. hnplicit function theorem 517
3.5. Generalizations to function spaces 517
4. Applications of regular perturbation methods to economics 519
4.1. Comparative statics: A simple rule of thumb in tax theory 520
4.2. Comparative dynanaics: A canonical problem 520
4.3. Perturbing dynamic equilibria 521
4.4. The stable manifold theorem and applications to economic theory 524
4.5. Pel~turbingfunctional equations from recursive equilibrium analyses 527
5. Bifurcation methods 54O
5.1. Applications of the Hopf bifurcation to dynamic economic theory 541
5.2. Gauge functions 542
5.3. Bifurcation applications to stochastic modelling 542
6. Asymptotic expansions of integrals 545
6.1. Econometric applications of asymptotic methods 547
6.2. Theoretical applications of Laplace's method 547

* This research was supported by NSF Grant SBR-9309613. The author gratefully acknowledges commeuts
from Bo Li, Ariel Pakes, John Rust, and Ben Wang.

Handbook of Computational Economics, Volume I, Edited by H.M. Amman, D.A. Kendrick and J. Rust
~) 1996 Elsevier Science B.!~ All rights reserved.
510 K.L. dudd

7. The mathematics of L p approximations 548

7.1. Orthogonalpolynomials 548
7.2. Least-squaresorthogonal polynomial approximation 549
7.3. Interpolation 551
7.4. Approximationthrough interpolation 552
7.5. Approximationthrough regression 554
7.6. Piecewisepolynomial interpolation 554
7.7. Shape-preserving interpolation 556
7.8. Multidimensional approximation 557
8. Applications of approximation to dynamic programming 560
8.1. Discretizationmethods 562
8.2. Multilinear approximation 563
8.3. Polynomialapproximations 563
9. Projection methods 563
9.1. Generalprojection algorithm 565
10. Applications of projection methods to rational expectations models 569
10.1. Discrete-time deterministic optimal growth 569
10.2. Stochasticoptimal growth 572
10.3. Problemswith inequality constraints 574
10.4. Dynamicgames 574
10.5. Continuous time problems 575
10.6. Modelswith asymmetric information 575
10.7. Convergence properties and accuracy of projection methods 577
11. Hybrid perturbation-projection method 578
12. Conclusions 58O
References 581
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 511

This article examines local and global approximation methods which have been used
or have potential future value in economic and econometric analysis. While these
methods are familiar, they are seldom developed within a general, formal analytical
framework, a fact which has hindered understanding of these techniques and limited
their application. We attempt to review and unify this literature, showing connections
which have been ignored, and pointing out potential new directions. We first re-
view the foundations of basic asymptotic, or, perturbation, methods, and discuss their
applications to economic modelling and econometrics. We next discuss global ap-
proximation methods, including orthogonal polynomials, interpolation theory, shape-
preserving splines, and neural networks. We present the related projection method for
solving operator equations, and illustrate its application to dynamic economic analysis,
dynamic games, and asset market equilibrium with asymmetric information. Finally,
we discuss how the hybrid perturbation-projection method combines the complemen-
tary strengths of local approximation procedures and the projection method to produce
a promising new method.

1. Introduction

The key technical problem in much of economic analysis is the determination of

some unknown function. Important examples include the optimal policy functions of
economic agents (such as the consumption function in macroeconomics), equilibrium
price functions dynamic models, equilibrium strategies in games, and inference rules
and price functions in asymmetric information problems. The usual approach is to
make functional form assumptions on the structural elements of a model which lead
to closed-form solutions for these functions; prominent examples of this approach
are the linear-quadratic competitive structures discussed in Hansen and Sargent [60],
the linear-quadratic dynamic game structure exposited in Kydland [82, 83], the linear
risk tolerance and Gaussian returns assumptions in Merton [96], and the exponential-
Gaussian structure in Grossman [54]. Unfortunately, the desire for a closed-form
solution often restricts the analysis. While these special cases may suffice for some
purposes, they are often inadequate for a robust analysis. Such robustness is important
for both theoretical analysis, where important elements may be ignored in cases with
closed-form solutions, and in empirical work where misspecification of tastes and
technology can ruin an otherwise valid approach.
The alternative is to assume more general and flexible functional forms and use
approximation ideas to compute functions which are "close" to the true solution. In
the first section we remind the reader of a variety of theoretical and empirical prob-
lems for which these methods are useful. In the rest of the paper, we will review
the two basic approaches to the approximation of functions and the approximate so-.
lution of operator equations, representing two different kinds of data and objectives,
and introduce a third which combines the strengths of the first two methods. L o c a l
512 K.L. Judd

approximations take as data the value of the unknown function f and its derivatives
at a point x0 and constructs a function which matches those properties at x0. These
constructions rely on Taylor's theorem, the implicit function theorem, and bifurcation
theory, and lead to the construction of Taylor or Pad6 series, or other approximations
of a simple form. These methods are called perturbation, or asymptotic, methods. The
basic idea of asymptotic methods is to formulate a general problem, find a particular
case which has a known solution, and then use that particular case and its solution
as a starting point for computing approximate solutions to "nearby" problems. These
methods are widely used in mathematical physics, particularly in quantum mechanics
and general relativity theory, with much success. While economists have often used
special versions of perturbation and asymptotic techniques, such as linearizing around
a steady state, they often provide little formal justifications for their procedures, and
sometimes proceed in ad hoc and potentially invalid fashions. This has lead to some
coniusion as to the differences among various procedures. This is plausibly one rea-
son why economists have generally not exploited the full range and power of these
approximation techniques.
We will give simple examples of the perturbation methods and indicate the more
substantive uses which have appeared in the economics literature. These applications
include theoretical analyses of sunspot equilibria as well as quantitative analyses of
economic policies and business cycles. We will interpret the phrase "computational
economics" broadly in this chapter. The perturbation analyses which theorists have
done has been viewed as pure theory, and the authors made no apparent use of a com-
puter. However, much of this work is really the outcome of algebraic manipulations
which could be automated by symbolic mathematics software, such as Mathematica,
Maple, or Macsyma. We take the view that in the future such theoretical analyses will
be done by computer software, and is an interesting new avenue for computational
economics. This literature is included here also because the linear approximations
which these authors compute do have value as numerical approximations, and it
is instructive to compare these methods with other "linear approximation" methods
used in economics. Furthermore, these linear approximations are just the first step in
higher-order Taylor series expansions which themselves may have substantial numer-
ical value, even though this fact is generally not utilized in either the theoretical or
applied literatures.
The other approaches to approximation are more global in nature. L p approximation
takes a given function f and finds a "nice" function 9 which is "close to" f in
the sense of some L p norm. To compute an L p approximation of f, one ideally
needs the entire function, whereas we generally have information about f at only a
finite number of values. Interpolation is any procedure which finds a "nice" function
which exactly fits a finite set of prescribed conditions. Regression is similar to L p
approximation in that a some L p norm is minimized, an L 2 norm in the case of
least squares and L ~ in the case of minimum absolute deviation. Regression also
lies between L p approximation and interpolation in that it uses n points of data to
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 513

produce an approximation with m < n free parameters which "nearly" satisfies the
data. These approximation methods form the basis for projection methods, also known
as weighted residual methods, for solving functional equations. Projection methods
have been increasingly used in the physical sciences over the past twenty years. They
have been used to solve various economic problems, ranging from dynamic growth
models, dynamic games, and asset market equilibria with incomplete information.
Both perturbation and L p approximation methods are important because of the in-
creasing role of computation in economic analysis. Many computational economists
eschew sophisticated approximation techniques, believing that simple methods of ap-
proximation combined with supercomputer technology will solve any problem they
might have. This is not the attitude taken in other computationally intensive fields.
In fact, an examination of the numerical analysis literature shows that over the past
fifty years advances in numerical analysis have improved algorithm speed as much as
hardware advances. Rice [108] presents a formal and substantive discussion of this
issue for the problem of solving two- and three-dimensional elliptic partial differen-
tial equations, a class of numerical problems which arise naturally in continuous-time
stochastic economic modelling. He argues that we were able to solve these problems
4 million to 50 billion times faster in 1978 than in 1945, of which a factor of 2,000
to 25 million can be attributed to software improvements, and a factor of 2,000 to
hardware improvements. One reason for this improvement has been the application
of the basic approximation ideas we present below. It is clear from examination of
the mathematical and economic literature that even a modest application of modern
approximation techniques can substantially improve the efficiency of most computa-
tional methods in economics. The objective of this review is to be retrospective and
review actual applications, but also to be prospective and indicate where a more in-
tensive use of well-known mathematical techniques can expand the range and quality
of these applications in economics.
After discussing perturbation and projection methods, we move to a third approach
to approximation which combines perturbation and projection methods. The perturba-
tion and projection methods of solution differ substantially in their focus and proce-
dures. However, we shall see that their strengths and weaknesses are complementary.
This complementarity implies that a combined analysis using both methods will al-
low economists to analyze many economic problems in a robust and reliable fashion.
This combined method is called the hybrid perturbation-Galerkin procedure. We will
illustrate its advantages and potential in a simple example.

2. The uses of approximation ideas: An overview

Economic modelling problems have used a variety of approximation methods. In dy-

namic programming problems, one wants to solve out for the value function and the
corresponding policy rule, which in turn are needed for an empirical analysis of the
514 K.L. Judd

data. The closed-form approach l to this problem is exemplified in Sargent's [116]

analysis of dynamic labor demand. However, the linear-quadratic approach has limi-
tations. Rust [111] exemplifies the alternative approach where one assumes arbitrary
tastes and technology and approximately solves the dynamic programming problem
of the agents and for likelihood models for the data. However, Rust uses the very
conservative discrete-state approximation method which is reliable but slow. The ap-
proximation ideas we discuss below have been successful in solving many dynamic
programming problems which are more general than the linear-quadratic case but with
substantially greater efficiency than the discrete-space approximation method. These
solutions could also be used in maximum likelihood econometric procedures where
such an increase in speed would be important.
The approximation ideas we discuss below have also been used in rational expec-
tations equilibrium analysis. Closed-form solutions are rare; agricultural economists
realized the futility of this back in 1958 with Gustafson's [56] work on grain stock-
piling. A critical aspect of that problem is the nonnegativity constraint on grain stock-
piles. This constraint leads to kinks in the storage rules and price functions. Gustafson
used piecewise linear functions to approximate the relation between current price and
the current total grain stock. Williams and Wright [123-125] extended the Gustafson
analysis to include elastic supply. An important innovation in their solution was their
observation that the conditional expectation of the future grain price is a smooth func-
tion of the current state of the market, and that this conditional expectation function
characterizes equilibrium. This observation suggests that equilibrium can be approxi-
mated by low-order polynomial approximation of the conditional expectation function
which characterizes equilibrium. This leads to a considerable improvement in effi-
ciency over the alternative of using discrete-state or piecewise linear approximations
of the current price law. Helmburger and Miranda [98] also use this approximation
idea to solve equilibrium. More recently, Christiano and Fisher [32] use the same idea
to model general equilibrium where a nonnegativity constraint on gross investment
will occasionally bind.
These approximation methods are also important in empirical work on structural
models of commodity markets. Deaton and Laroque [43] used approximations of the
rational expectations equilibrium to compute methods of moments estimates in a fully
structural model of several commodity markets,
Dynamic games also have a similar dichotomy. Kydland exemplifies the closed-
form approach to linear-quadratic games, whereas Kotlikoff, Shoven, and Spivak [80]

l Some may argue that the linear-quadratic model typically does not have a closed-formsolution because
it is generally necessary to solve a Riccati equation, or, as in the case of dynamic games, a coupled system
of Riccati equations. While there are nontrivial problems associated with solving Riccati equations, we
currently have methods which are so reliable and accurate that the solutions are treated as if they were
closed-form solutions with no computational error. Since the approximation problems are much worse when
we leave the linear-quadratic paradigm, linear-quadratic modelling is, for the purposes of this review, more
like closed-form modelling than the approximate solutions we will discuss.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 515

take a smooth approximation approach to solving a more general dynamic game.

Judd [71] and Miranda and Rui [99] use modern approximation theory to solve non-
linear dynamic games.
The most common use of perturbation methods is the method of "linearizing around
a steady state". Such linearizations tell us how a dynamical system evolves near a
steady state, and we can also use them to compute how a system reacts to shocks
which move the steady state, such as tax policy or monetary policy changes. A
particularly important case of this was Magill [93], who suggested that the linear
approximations of stochastic growth models be used in macroeconometric analysis.
Kydland and Prescott [85], and many later macroeconomists have successfully used a
linear approximation computational approach to examine the empirical strength of the
Real Business Cycle hypothesis. Similarly, many authors used linearization methods
to analyze the impact of macroeconomic policy on dynamic equilibrium.
The key fact is that perturbation methods are just ways to take derivatives in com-
plex problems. This implies that they have a variety of uses. For example, in maximum
likelihood estimation, one must repeatedly compute derivatives of the likelihood func-
tion. Zadrozny [127] discusses how to compute such derivatives analytically in the
case of linear quadratic models. For more general models, computing such derivatives
is generally done numerically. However, perturbation methods could be used to solve
for these derivatives analytically with considerable gains in accuracy and speed.
These are just a few examples of how approximation ideas are important in com-
putational aspects of both theory and econometrics. We shall now discuss the formal
mathematics behind these approximation ideas and illustrate their applications in sim-
ple examples.

3. The mathematical foundations of regular perturbation methods

The most basic local approximation techniques are called regular perturbation meth-
ods. They are based on a few basic theorems including the well-known Taylor's
theorem and the implicit function theorem for/~n as well as extensions to operators
on infinite-dimensional spaces. We will first state the basic theorems which provide
the foundation for regular perturbation methods in this section, and give examples of
their use in the next section.

3.1. The meaning of "approximation"

We often use the phrase " f ( x ) approximates 9(z) for z near z0", but the meaning of
this phrase is seldom made clear. One trivial sense of the term is that f(zo) =- 9(zo).
While this is certainly a necessary condition, it is generally too weak to be a useful
concept. Approximation usually means at least that f~(zo) = 91(zo) as well. In this
516 K.L. Judd

case, we say that " f is a first-order (or linear) approximation to 9 at x = x0". In

general, " f is an nth order approximation of 9 at x = xo" if and only if

lim I] f ( x ) - g ( x ) = O.
~" Ilx-xo[I n

3.2. Taylor series approximation

The most basic local approximation is described by Taylor's theorem:

THEOREM 1 (Taylor's theorem). Suppose f • R n --+ R 1, and is C k+l. Then for x ° c
R n If f E C n+l [a, b] and x, xo E [a, b], then

f ( x ) = f ( x °) + ~x~ (x°) (x~ - x °)


1 ov

M7 ~,, "'" ~Xil,,.~Xi k (X0)(X~I--xOi)'''(Xik--,~7Ok)

i1=I ik=l
+ O (ll x - x ° Ilk+l). (1)
The Taylor series approximation of f ( x ) based at x °, (1), uses derivative informa-
tion at x ° to construct a polynomial approximation, f is analytic on [a, b] exactly
when this approximation converges to f on [a, b] as k increases. Generally, this ap-
proximation is good only near x ° and decays rapidly away from x °.

3.3. Rational approximation

Padd approximation uses the same derivative information as does a Taylor series
approximation, but instead constructs a rational function to approximate f. The (m, n)
Pad6 approximant of f at x0 is a rational function

r(x)- p(x) (2)

where p(x) and q(x) are polynomials of degree m and n, and

0= -d-x~(p-fq) (x0), k=0,...,m+n. (3)
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 517

The m + n + 1 derivative conditions in (3) suffice since q(xo) can be normalized to

be I. The problem of computing the coefficients of p and q is a (generally nonsingular)
linear problem,
The experience is that Pad6 approximants are better global approximants than Taylor
series approximations, that is, the error grows less rapidly as we move away from
x0. There are strong theorems confirming this for some functions; see Bender and
Orszag [8] for an accessible treatment.
Rational approximation ideas have not been as widely used in economic analysis as
Taylor series methods. Pad6 approximation has proved useful in econometric analysis.
See Phillips [103] for a discussion of various generalizations of Pad6 expansions; in
particular, he discusses the idea of using information at several points, not just one.
Phillips also reviews applications to finite sample distribution theory. Below we will
discuss another kind of application of Pad6 approximations.

3.4. Implicit function theorem

The next important tool is the Implicit function theorem in Euclidean spaces.
THEOREM 2 (Implicit function theorem). If H ( x , y) : R ';~ x R "~ --+ R "~ is C j and
Hv(xo, fro) is not singular, then there is a unique function C° function h : R n -+ R m
such that f o r (x, y) near (xo, Yo)

H(x, h(x)) = o.

Furthermore, if H is C k then h is C k - j and its derivatives can be computed by

implicit differentiation of the identity H ( x , h(x) ) -~ O.
The Implicit function theorcm states that h can be uniquely defined for x near zero
by a relation of the form H ( x , h(x)) = 0 whenever Hy(0, h(0)) is not singular. This
allows us to implicitly compute the derivatives of h with respect to x as a functions
of x. When we combine Taylor's theorem and the Implicit function theorem, we have
a way to compute a locally valid degree k polynomial approximation of the implicit
function h(x) whenever H is sufficiently differentiable. The derivative information
could also be used to compute a Pad6 approximant.
The previous theorem applied to finite-dimensional problems. Frequently in eco-
nomics we need to solve for unknown functions which are solutions to some operator
equations. In these cases we need implicit function theorem for infinite dimensional

3.5. Generalizations to function spaces

To solve dynamic economic problems, we need generalizations of these theorems to

functional spaces. It is necessary, therefore, to first introduce some terminology from
5l8 K.L. Judd

functional analysis, and state a generalization of the implicit function theorem which
has a straightforward computational implementation.
Suppose that X and Y are Banach spaces, i.e., normed complete vector spaces. A
map M : X k -+ Y is k-linear if it is linear in each of its k arguments. It is a power map
if it is symmetric and k-linear, in which case it is denoted by M:C k = M ( x , x , . . . , x).
The norm of M is constructed from the norms on X and Y, and is defined by

IIMII : sup IIM(xl,:C2,..., xk)ll.

I I : ~ d l = l , i = 1 , 2 ..... k

For any fixed :Co in X , consider the infinite sum in Y


Tx : Z - (4)

where each of the Mk is a k-linear power map from X to Y. When the infinite series
in (4) converges, T is a map from X to Y. The majorant series for T is

IIMk[I - x01l k.

The important fact is that T will converge whenever its majorant series does.
DEFINITION 3. r is analytic at xo if and only if, for some neighborhood of x0, it is
defined and its majorant series converges.
With these definitions, we can now state an analytic operator version of the Implicit
Function Theorem, taken from Zeidler [128].
THEOREM 4 (Implicit function theorem for analytic operators). Suppose that
= "M,j (5)

defines an analytic operator, F : U C R × X -+ Y, where U is a neighborhood

o f (0, O) in R × X . Furthermore, assume that F(0, 0) = 0 and that the operator
Mol : X --+ Y, representing the Frechet cross-partial derivative at (0, 0), is invertible.
Consider the equation

F(e, x(e)) = 0 (6)

implicitly defining a function x(e) : R --~ X . The following are true:

1. There is a neighborhood of 0 E R, V, and a positive number, r > O, such that
(6) has a unique solution x(e) with [Ix(c)ll < r for each e E V.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 519

2. The solution, x(e), of(6) is analytic at e = O, and, ]or some sequence of x~ in X ,

can be expressed as

: n (7)

where the coefficients xn can be determined by substituting (7) into (6) and equating
coefficients of like powers of e.
3. The radius of convergence of the power series representation in (7) is no less than
that of the analytic map, z(e) : R --+ R, defined implicitly for some neighborhood
of O by


o : Z IIMnkll <8)

Furthermore, fbr some sequence Zn of real numbers,



represents the solution to (8) and IZnl > [IX,~ll.

See Zeidler [128] for a proof and discussion of this implicit function theorem. The
mathematics of applying this method turns out to be elementary since the task is
reduced to recursive computation of xn terms, in term-by-term approach described
above. The only requirement is to set up the problem so that it is expressed as an
analytic operator with a nondegenerate radius of convergence. This theorem shows
that the logic and intuition from the finite-dimensional implicit function theorem
generalizes naturally and straightforwardly for analytic operators.

4. Applications of regular perturbation methods to economics

There have been many uses of local approximations in economics, implicit and ex-
plicit. The topic of comparative statics is nothing more than applications of the implicit
function theorem. Comparative dynamics are technically more difficult problems, but
fit into the same general framework. Recognizing these similarities will help us solve
difficult problems. We will review some basic applications which have appeared and
give examples of some possible future uses.
520 ICL Judd

4.1. Comparative statics: A simple rule of thumb in tax theory

Comparative statics are just basic applications of the implicit function theorem. One
simple example of applying perturbation ideas is the impact of a tax on equilibrium.
Suppose that D(p) is demand at consumer price p, that S(p) is supply at producer
price p, and that a per unit tax of r is applied. Then the equilibrium consumer price
at tax rate ~- can be expressed as the function p ( r ) which is implicitly defined by
D(p(r)) = S(p(r) - r). We can expand this relation around r = 0, the tax-free
equilibrium case, to study the impact of the tax on equilibrium. This analysis leads,
for example, to the useful rule of thumb that the efficiency cost of a tax equals
l(r/D + r/s)r 2 where rlD and Us are the demand and supply elasticities at the r = 0
case. This quadratic approximation has been used extensively to intuitively discuss tax
policies and as the formal basis for some quantitative tax analysis, as in the Barro [5]
analysis of optimal tax policy.
This tax example is just one simple case where simple perturbation formulas, more
commonly described as comparative statics, are useful approximations. We next ex-
amine dynamic applications of these perturbation ideas.

4.2. Comparative dynamics: A canonical problem

Since it will be frequently used below, we will now describe a simple continuous-
time 2 model of economic growth. Let k be the capital stock, e the rate of consumption,
and f ( k ) the rate of output. Assume that the intertemporal utility function of the
representative agent is f ~ e-ptu(c(t)) dr, and that the capital stock evolves according
to k, = f ( k ) - e. The corresponding optimal growth problem is

V(ko) ~ max foo e -pt u(c) dt,

~(t) Jo
= f(k) - c (9)
k(0)- k0
where V ( k ) is the value function. Our examples will study the solution to this op-
timal growth problem. We will also examine the representative agent version of this
problem. The competitive equilibrium will correspond to the social planning prob-
lem in the perfectly competitive, distortion free case, but not otherwise. We will also
examine the equilibrium problem when taxes are present. While this model and its
stochastic generalization appears to be special, it is in the same general family of
dynamic optimization problems investigated by the papers of Sargent and Rust.

2It will be obvious that all of these methods can be applied in the same way to discrete-time models.
Since there is no substantive distinction between the discrete-time and continuous-time literatures, I will
discuss continuous-time and discrete-time papers together.
Ch, 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 521

4.3. Perturbing dynamic equilibria

To illustrate the essential features of perturbation methods applied to dynamic equi-

libria, we apply them to study the effects of policy changes in a dynamic model of
equilibrium with taxation. Brock and Turnovsky [21] shows that if we take the simple
growth model behind (9) and add a tax on capital income, the resulting equilibrium
solves the system of differential equations

~=3'(c) c (p - f ' ( k ) ( 1 - r)),

k=f(k) - e- g

where 3@) ~ u'(c)/(cu"(c)) is the rate of intertemporal substitution in consumption,

r(t) is the tax on capital income at time t, 9(t) is government expenditure (on goods
which do not affect utility) at t. The tax rates are exogenous, and c and k are the
unknowns to be determined. Note that this includes the special case of r = g = 0,
which characterizes (9). The boundary conditions for (10) are the initial condition on
the capital stock

k(O) = too (11)

and a stability condition on consumption

0< lim c(t) < o c . (12)


The conceptual experiment is as follows. We assume that the "old" tax policy was
constant, r ( t ) = f, and that it has been in place so long that, at t = 0, the economy
is at the steady state corresponding to ¢. Note that this also assumes that for t < 0,
agents assumed that r ( t ) = ? for all t, even t > 0. Hence, at t = 0, k(0) = k *s.
Suppose, however, that at t = 0, agents are told that future tax policy will be different.
Say that they find out that the new tax rates are ? + r(t), t / > 0, that is r(t) will be
the change in the tax rate at time t. Similarly, they are told that the new expenditure
policy is ~0 + 9(@ We also allow the possibility that the capital stock at t = 0 is
changed by n. The new system is

d=7(c)c(p - ft(h)(¢ + -r(t))),

] c = f ( k ) - c - (~ + 9(t))

together with k(0) = k s* + ~, and (12). We will use perturbation methods to approx-
imate the effects of the new policies r and g on the dynamic paths for k and c.
522 K.L. Judd

We need to parameterize the new policy so that it fits the perturbation approach;
that is, we need to imbed the shocked system (13) in a parameterized collection set
of problems of the form F(c, k, t, e) = 0. We do this by defining

~(t, ~) = e + ~ ( t ) , g(t, ~) = o + ~9(t), k(o, ~) = k ss + ~

and the corresponding continuum of BVP's

ct(t, e) = 7(c(t, e)) c(t, e) (p - f ' ( k ( t , e))(1 - T(t, e))),

kt(t, e) = f ( k ( t , e)) - c(t, e) - g(t, e), (14)
k (0, e) = k ss + e~
plus (12).
The system (14) implicitly defines consumption and capital paths for any value of
e. In that way, it fits into our general implicit function framework in that we have an
expression F(c, k, t, c) = 0 which implicitly defines the paths c(t) and k(t). As long
as the functions involved in (14) are locally analytic, we can apply Theorem 4 above.
With this apparatus in hand, we can now solve for the first-order perturbation of (14).
To solve for first-order approximations of the impact of e on c and k, we differentiate
(14) with respect to e, evaluate the resulting differential equation at e = 0, and arrive
at the following linear differential equation system for the unknown functions c~ (t, 0)
and k~ (t, 0):

c~t(t,O) = 7 ( c s~) c s~ ( - f"(kSS)(1 - f ) k e ( t , O ) + ( p - ft(k~S)(-~-~(t,O)))),

k~,(t, o) = f'(k~')k~(t, O) - c~(t, O) - g(t), (15)

k~ (0, O) =

plus the condition that c~ and k~ are both bounded. This is a linear boundary value
problem with constant coefficients, which can be solved analytically. This is typical
of perturbation methods: differentiate a nonlinear problem and one will arrive at a
linear problem of the same type.
We then solve for c~(t, 0) and k~(t, 0) from (15). The result will allow us to compute
a linear approximation for c(t, 1) and k(t, 1), the consumption and capital paths under
the tax and spending changes; they are

c(t, 1) ~ f ( k ~ ) - ~ + c~(t, 0),

k(t, 1) ~ k ss + k~(t, 0).
One can also compute the derivative of any dynamic quantity, such as lifetime utility
and tax revenue, with respect to e, thereby computing the marginal change in the
consumption and capital path per dollar of extra revenue, per util of extra utility, or
relative to any other quantity.
Ch. 12: Approximation, Perturbation, and Projection Methods" in Economic Analysis 523

The resulting solutions can be very informative. For example, the initial shock to
net investment (denoted by the derivative of I - f (k) - e with respect to e at t = 0)

I~(O)- 7cP T(p)+(f'(k s~)-p)ec÷pG(~)-g(O) (16)


# -- I
2(1 - ?) 1+ 1+ crOK

is the positive eigenvalue of the linearized system (15), OK is capital's share of

income, Or is labor's share, and 0c is the steady state share of output which goes to
consumption. G ( s ) and T ( s ) are the Laplace transforms 3 of the policy perturbations
9(t) and ~-(t).
Perturbation methods yield algebraic formulas for quantities of interest. For exam-
ple, the formula (16) tells us many things. First, future tax increases reduce investment.
However, their effect is proportional to T(#), which is essentially the average tax in-
crease discounted at the positive eigenvalue, #. From (17) it is clear that # exceeds
i f ( k ) , the marginal product of capital and p, the after-tax return. Hence, future tax
increases are heavily discounted when determining their impact on current investment.
Second, government spending has an ambiguous impact on investment - current gov-
ernment spending depresses investment and future spending increases investment, but
again the future impact is discounted at rate #. Third, since investment and output
are related, we also know the initial impact of this policy shock on output. For ex-
ample, if a future tax increase causes current investment to fall, then output in the
future will also fall. Note that these shocks could be nonconstant, allowing us to
consider partially anticipated shocks. These simple calculations address basic issues
in macroeconomics.
Fourth, the presence of t~ in (14) allows us to use the same approach to compute the
effect of changes in the initial capital stock on consumption. The effect is intuitive: an
increase in the capital stock of ~ will increase output by ~ f ' ( h ss) = ~p/(1 - "~) but
will increase consumption by #t~, but # > f ' ( k ss) implies that the increase in con-
sumption is greater. Therefore, this procedure also tells us that the slope of the equi-
librium policy function for consumption is #.
We can also use this method to approximate solutions to the optimal growth model.
We chose the tax example to make clear that the presence of a social planning equiv-
alent plays no role in this procedure. However, if taxes and government spending are

3If f(t) : R 1 -~ R n, then the Laplace transform of f(t) is L ( f } : R l -+ R n, where L(f}(s)

f, e-~tf(t) dt.
524 K.L. Judd

zero, then the problem reduces to the social planner's optimal growth problem. For
example, the presence of the parameter ~ in (14) means that the linear approximation
to the consumption policy function near the steady state is F~-

4.4. The stable manifold theorem and applications to economic theory

The analysis above is just a simple example of what perturbation analysis can do.
Extending this type of analysis to several states is important in economics. These
additional states will arise when we include heterogeneous capital or heterogeneous
agents to our model. In this section we review the stable manifold theorem, 4 which
is the general statement of the linear approximation theory in dynamical systems,
and its applications to economics. However, we will also note that we can compute
approximations which go beyond those derived from the stable manifold theorem.

4.4.1. Multidimensional dynamics

The methods used above can be extended to the case of several state variables by
applying basic linear algebra and differential equation theory. The most used math-
ematical theorem in this regard is the stable manifold theorem. Suppose we have a
dynamic system

2 = g(z) (18)

with a stationary point at Z*; that is, 9(Z*) = 0. Then the local behavior of (18) for
Z near Z* is linearly approximated by the linear system

= A z (19)

where A = 9 z ( Z * ) and z -= Z - Z*. The solution to (19) is z(t) = e At zo .5 The

stable manifold theorem essentially says that the local behavior of (18) near Z* is
approximated with first-order accuracy by the local behavior of (19). In particular,
if the linear system (19) has a k-dimensional stable space near Z*, then (18) has a
k-dimensional stable manifold 6 near Z*.
This is a common situation in dynamic growth models, with and without distortions.
Let Z = (X, Y) where X is a list of predetermined variables and Y is a list of free
variables; we use here the terminology of linear rational expectations models, as in

4We shall just discuss the procedure which is .justified by the stable mmlifold theorem. An interested
reader can find a formal statement of the stable manifold theorem in Coddington and Levinson [35].
SFor discrete time systems, Zt+l = 9(Zt), Z* = 9(Z*), zt+l = Azt, and zt = Atzo.
6A stable manifold is a manifold, M, such that if z(to) is in M than Z(t) is in M for t >tll and
converges to Z*.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 525

Blanchard and Kahn [12], for example. The predetermined variables are the state
variables, such the distribution of the capital stock across sectors, or the distribution
of wealth. The free variables are the decision variables, such as consumption and labor
supply, and prices, all of which are endogenous at each moment. Suppose that there is
a stationary point at Z* = (X*, Y*). Then the local behavior of the system is linearly
approximated by (19)and the solution is z(t) = e At (xo, Yo), where x - X - X*,
y - Y - Y*, and y0 is chosen to keep z(t) bounded asymptotically. Let 3;(x0) be the
set of all possible values for the free variables which together with the predetermined
variables being equal to x0 will imply a bounded path for z(t). 3;(xo) may be a single
value or a set of values.
In many economic models, Y(x0) is a single-valued function which generates much
valuable information, such as the dependence of prices, output, labor supply, and
consumption on the state variables. As in the one-dimensional case, in general, they
will allow one to compute linear approximations to the multidimensional equilibrium
decision rules, even when the equilibrium cannot be reduced to a social planning
problem. This procedure (which is equally valid for continuous-time and discrete-
time systems) for computing a linear approximation is well-known; it is presented,
for example, in detail in Chapter 6 of Stokey and Lucas [119]. Anderson [2] presents
computer programs in Mathematica for solving such problems in discrete-time.

4.4.2. Comparative dynamics

The general theory of such perturbations for optimal control problems has been worked
out in a variety of papers. Oniki [101] and Araujo and Scheinkman [3] proved that
optimal paths were differentiable with respect to parameters. Treadway [121] and
Mortenson [100] used a heuristic approach to derive explicit formulas for local ap-
proximations near steady states. Lucas [92] and Otani [102] provided approximation
formulas and formal justifications for them, the latter for the general optimal control
problem. Caputo [25, 24] derived Slutsky - like expressions for comparative dynam-
ics problems, and Lafrance and Barney [86] extended the analysis to the case of
nondifferentiable constraints.
While the tax example in (15) above was quite simple, the robustness of the method
to dynamic equilibrium analysis is obvious. This approach has been used to analyze
many questions in dynamic economic policy. One can add labor supply, and other
tax instruments. Judd [66, 68, 67] used this method to calculate the marginal effi~
ciency cost of various tax innovations, and related impulse responses to tax changes
for several macroeconomic variables. Laitner [87-89, 91] has written a series of pa-
pers on comparative dynamics, and applying them to difficult problems in dynamic
tax incidence. His work includes Overlapping generations applications of perturba-
tion methods and large-dimension applications of the linearization procedure. Bovem
berg [t6, 18, 17] has used these methods to analyze international economic questions.
526 K.L. Judd
He has computed the impact of taxation on capital flows, trade patterns, and terms
of trade in dynamic models of international trade. All of these authors 7 use lineariza-
tions around the steady state to compute quantitative estimates of the impact of policy
shocks. It is clear that these procedures can be used to analyze models with imperfect
competition and externalities as well.
The linearization procedures appear to be much faster than alternative numerical
methods, such as shooting. The disadvantage is that linearization procedures can
produce only the first-order effects, and may miss higher-order effects. We next turn
to that issue.

4.4.3. Higher-order approximations

The stable manifold theorem calculation yields just linear approximations. However,
proceeding as we did above, one could also compute second order approximations.
This is typically not done, but there is no theoretical difficulty. In fact, when we
compute the second differential of (14) one finds that the differential equations for
c~(t) and k~(t) are the same as the differential equations for c~(t) and k~(t) in (15)
except for different forcing terms. More specifically, if we write (15) in the form
gc = Ax + ~(t), where x = (c~, k~), then the corresponding equation for e~(t) and
k,~(t) has the same form except for the ~p(t) term. Since the difficult part of solving
any linear differential equation lies in dealing with the linear operator A, we see that
solving for c,,(t) and k,,(t) is essentially the same as solving for c,(t) and k~(t). More
generally, the methods used in Bensoussan [ 10] presents the mathematical foundations
for these methods in the finite-horizon case. In many models, these higher-order terms
will be as easy to compute as the first-order effects. By adding a few higher-order
terms to the linear term, one will end up with an accurate procedure far faster than
standard differential equation solution methods. Below we will return to the problem
of higher-order approximations in recursive equilibrium contexts.

4.4.4. Determinacy of perfect foresight equilibria

The discussion above presumed that Y(xo) is a single-valued function. There are
many interesting cases where Y(xo) is a correspondence, indicating that there are
many choices for Y0 which satisfy the boundedness conditions. In fact, when there
are too few unstable eigenvalues of the Jacobian 9z(Z*), that is, the number of stable
eigenvalues exceeds the number of predetermined variables, then Y(xo) is a linear
space. This implies that we have indeterminacy, that is, there is a linear continuum
of prices and/or allocations which are consistent with equilibrium. They have proven
useful for qualitatively analyzing many issues in dynamic general equilibrium. Ke-
hoe and Levine [79] used this approach to study indeterminacy in infinite-horizon

7This is by no means a completelist of such analyses.

Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 527

economic models. This, and many other papers, show that indeterminacy is possible
in robust examples, and that the dimension of the indeterminacy can be large. Local
determinacy of equilibrium is an important example where a key qualitative property
of a model can be determined by straightforward computation.

4.5. Perturbing functional equations from recursive equilibrium analyses

A large variety of economic problems can be reduced to various kinds of functional

equations, some more complex than the simple ordinary differential equations in time
as in the example above. Stochastic models, in particular, do not generally reduce
to such equations. In this section we shall take a functional approach to a simple
growth model to illustrate the general applicability of perturbation methods to those
functional equations arising from dynamic programming and recursive equilibrium.

4.5.1. Stationary, deterministic growth

We will first look at a single-sector, single good, continuous-time optimal growth

problem, (9). The Bellman equation defining V(k) is

pV(k) = max
u(c) + V ' ( k ) ( f ( k ) - c). (20)

By the concavity of u and f , at each k there is a unique optimal choice of c, which

satisfies the first order condition u'(c) = V'(k). We will let C(k), the policy function,
denote that choice. (20) implies a differential equation for C(k):

u " ( C ( k ) ) C'(k)(y - C(k)) + u ' ( C ( k ) ) ( f ' ( k ) - p) = O. (21)

At the steady state, k ~, f ( k ~ ) = C ( k ~ ) , which, when substituted into (21) implies

the condition p = f~(k ~) which determines k ~.
Our goal is to compute the Taylor series expansion of the policy function around
the steady state. Specifically, we want to compute the coefficients of

C(k) - C(k ~ ) + C ' ( k ~ ) ( k - k ~) + C " ( k ~ ) ( k - k ~ ) 2 / 2 + . . . . (22)

We have so far computed k ~s, C(kSS), and f'(ks~). We next move to C'(k~s). At
this point we must assume that C(k) is C ~ . This assumption is clearly excessive,
but not unrealistic if we also assume that u(c) and f ( k ) are also C ~ . In fact, Santos
and Vila [115] shows that if u and f are C k then the policy function is C k--z near
any stable steady state.
528 K.L. Judd

Differentiating (21) with respect to k yields s

0 = u " ' C ' C ' ( f - C) + u " C " ( f - C) + u " C t ( f ' - C')

+ u't(f' - p) + u ' f t' (23)

which holds at each k and at the steady state, k ~, reduces to

0 = -u"(C') 2 + u'lCtf ' + u'f". (24)

Hence C ~(h ss) must solve the quadratic equation (24), implying

C' = u " f ' ± V / ( u " f ' ) 2 + 4 u " u t f "


where all derivatives are evaluated at the steady-state levels for the capital stock and
consumption. Since u and f are increasing and concave, (25) has two real solutions
of opposite signs. Since C r > 0 is known, we choose the positive root.
To demonstrate the ease with which higher-order terms can be calculated, we next
compute C"(kSS). Differentiating (23) with respect to k and imposing the steady state
conditions yields an equation linear in C"(k~Q. Therefore, solving for CH(k ~) is
easier than solving for C ' ( k ~ ) . In fact, the solution for C"(k ~ ) is

C,,(k~s) = 2(p - C')u'"C'C' + 3 u " C ' f " + u'f'"

u"(3C' - 2p)

where all functions are evaluated at h ss. Note that the solution for C " ( k ~ ) involves
C ( k ~ ) . The critical simplifying feature is that once we have solved the quadratic
equation for C ' ( k ~ ) , we have a linear equation for C"(k~S). Similarly, continued
differentiation of (21) shows that every other derivative of C at k ~ can be defined
linearly in terms of the steady-state derivatives of u, f , and lower order derivatives.
Judd and Guu [75] present Mathematica programs which compute arbitrary order
Taylor and Pad6 expansions based on the derivatives of C at the steady state. Judd [74]
shows that the 100 degree polynomial approximation to C is easily computed via a
recursive formula. Table 1 displays the results for a variety of approximations. The
assumptions are that u(c) = c('+'Y)/(1 + 7) and f ( k ) = pUVc~ with p = 0.04,
7 = - 2 , and c~ = 0.25. To evaluate the quality of the approximations, we compute a
normalized, unit-free version of (21), which is the Euler equation error

E ( k ) = u ' r ( c ( k ) ) C ' ( k ) ( f - C(k)) + u ' ( C ( k ) ) ( f ' ( k ) - p)


8We drop arguments when they can be understood from context.

Ch. 12: Approximation, Perturbation, and Projection Me#uMs in Economic Analysis 529

Table 12.1 displays E ( k ) for various degrees of approximation, and types of ap-
proximation. The notation a ( - n ) denotes a x 10 - n . The theoretical properties of
the Taylor and Pad6 approximations are displayed in this example. As the degree of
approximation increases, both approximations improve at all capital stocks in [0, 2].
Outside of [0, 2], the Taylor approximation is poor and getting worse; however, the
Pad6 approximation is doing very well even at k = 3 when n = 15.

Table 12.1
Eulerequation errors
k n=6 n=10 n=15
Taylor Pad6 Taylor Pad6 Taylor Pad6
0.1 9.7(--1) 2.7(-1) 5.2(-1) 3,0(-2) 2.6(-1) 1.5(-3)
0.3 6.3(-2) 5.0(-3) 1.2(--2) 5,3(-5) 1.6(-3) 1.3(-5)
0.6 6.2(--4) 1.5(-5) 1.2(--5) 5,5(-9) 1.0(-7) 6.3(-8)
0.8 3.6(-6) 4.7(-8) 4.4(-9) 1,5(-12) 1.2(-12) 7.8(-9)
1.0 0(0) 6.3(-16) 0(0) 6,3(16) 0(0) 0(0)
1.3 3.6(-5) 1.5(-7) 2.3(--7) 3.8(-12) 4.6(-10) 7.9(-10)
1.6 3.7(-3) 8.7(-6) 3.7(-4) 2.2(-9) 2.4(-5) 1.4(-9)
2.0 1.0(-1) 1.3(-4) 7.9(-2) 1.5(.-7) 6.8(-2) 3.1(-9)
2.5 9.6(-1) 8.7(-4) 7.9(-1) 3.0(-6) 1.7(2) 7.1(-9)
3.0 4.3(1) 3.0(-3) 1.3(3) 2.0(-5) 7.1(5) 3.7(-8)

Just becausethe Euler equation error is small does not imply that the approximation
is close to the true solution. We make two points. First, in this case, we can check for
accuracy and we do find that the Euler equation error is a good indicator of accuracy.
Second, if the Euler equation errors are small then the associated decision rule is one
in which the agents are making decisions which are nearly optimal in the sense that
the gain from doing the exactly optimal action improves the agent's welfare slightly.
Since computation is costly for economic agents, we can only expect them to follow
rules which are nearly optimal, and the appropriate sense of nearly optimal is not the
distance from their decision and the optimal decision but the value to the agent of
determining and taking the optimal action.

4.5.2. Non-steady state perturbations

The examples above computes a Taylor series for C(h) around a particular capital
stock, the steady state. There are other formulations which can also produce useful
approximations. Recall that perturbation methods begin with a soluble case out of a
continuum of cases, and uses differentiation to produce an approximation based on the
soluble case. Instead of constructing an approximation based on knowing the value
of C at some point, we can begin with a case where we know the entire solution
530 K.L. Judd

and use that case to construct approximations. An example of this alternative is the
continuum of problems

0 = C'(k, e) ( f ( k , e) - C(k, e)) + 7 C ( k , e) (p - f ' ( k , e)) (27)

where 3' is the constant relative risk aversion parameter, and

f ( k , e) = (1 - c)pk + ~k°~p/oe.

At e = 0, we hav~ a linear production function with a marginal product of capital

equal to p, the pure rate of time preference; in this degenerate case, the solution is
C(k, e) = pk, that is, consumption equals output. At all positive values for c, the
production function is concave and the unique steady state is k = 1. Suppose that
we are really interested in the e = 1 case where f is the standard Cobb-Douglas
production function.
The first perturbation of (27) implies that for all k and e,

0 = Cke(f - C) + Ck(f~ - C~) + "TCe (p - A ) + "TC ( - f a t ) .

which at e = 0 and C = pk reduces to

o = Ck(A - + zC

and implies the solution

c (k, 0) = - ' - 7) + ( z p - p ) k .

Continued differentiation will yield more terms which can be use in a Taylor series
approximation for the Cobb-Douglas production function (c -- 1) case of the form

C(k,l)- C(k,O)+C~(k,O)+C~(k,O)/2+C~(k,O)/6+.... (28)

Note that this approximation is an approximation at all k, and theory tells us that it
is good only for small c. To determine how good this approximation is for C(k, 1)
we could substitute it into the Euler equation and check to see if the Euler equation
errors are small. They often turn out to be acceptable, but we will see below that even
if (28) does not solve (27) well, the C~(k, 0), C~(k,O), etc., functions can still turn
out to be very useful.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 531

4.5.3. Single-sector, stochastic growth

We next take the deterministic model above, add uncertainty, and show how to use
the approximation to the deterministic policy function around k ss in the deterministic
case to compute an approximate policy function in the model with a small amount of
uncertainty. While the assumption of small shocks may seem limiting, it is sensible
in many applications, such as macroeconomic and related financial analysis.
The stochastic problem is

V(k) = s u p E
{/o e -or u(c) dt

dk = ( f ( k ) - c)dt + ~ d z .
} ,

The Bellman equation becomes

0 = max [ - p V ( k ) + u(c) + Vk(k) ( f ( h ) - c) + ec~ (k) Vkk (k)].


It is straightforward to show that C(k) solves

o - ~(k)~'"(C(k)) + ¢(k) ~"(C(k)) + ,y(k) ~'(C(k)) (30)


~(k) = ~ ( k ) [c'(k)] 2,
¢(k) = [f(h) - h(k) + ea'(k)] C'(k) + ecr(k) C " ( k ) ,

-y(k) = f'(k) - p.
Formally, we are again looking for the terms of the Taylor expansions of C,

C(k, e) -- C(kSs, O) + Ck(kSS,0)(k - k ~ ) + C~(k~S,0)~

+ C k k ( k % 0)(k - k ~ ) 2 / 2 + C~k(k ~s , 0)~(k - k ~ )
+ C ~ ( k ~,0)e2/2 + . . . . (31)

Before proceeding as before, we should note that the validity of these simple methods
in this case is surprising. Note that (30) is a second order differential equation when
e ~ 0, but that it degenerates to a first-order differential equation when e = 0.
Changing e from zero to a nonzero value is said to induce a singular perturbation
in the problem because of this change of order. Normally much more subtle and
sophisticated techniques must be used to use the e = 0 case as a basis of approximation
for nonzero c. The remarkable feature of stochastic control problems, proved by
532 K.L. Judd

Fleming [48], is that this is not the case here, that perturbations of e, the instantaneous
variance, can be analyzed as a regular perturbation in e. 9
With Fleming's analysis in hand, we will now proceed. We assume that we know
all the k derivatives of C at k = k 8s and e = 0. This is what the previous section on
deterministic problems produced. We now move to computing C, by differentiating
(30) with respect to e. When we impose the deterministic steady state conditions
f ( k ss) = C(kS~), f ' ( k ~ ) = p, and c = 0, we arrive at a linear equation which
implies that

I t l .r"¢2
Ce- u t~k +Ckk
u"Ck or(k) + a'(k) (32)

where all the derivatives of C are evaluated at k = k s8 and e = 0. Note that the
solution for C, is a function not only of the deterministic steady state value of u,
u t, and u", it also depends on u m, and Ckk, which in turn depends on fro. If u
were quadratic, f linear, and ~r~(k) = 0, then (32) shows that C, = 0, as we expect
from the certainty equivalence results for linear-quadratic control. Again, continued
differentiation of (30) with respect to e and k leads to solutions for C ~ , C~k, Ck~,, etc.
Judd and Guu [75] present Mathematica programs for computing these coefficients.
They also show that the approximations are valid over a substantial range of values
for e and k.

4.5.4. Dynamic programming

The optimal growth examples above are just special cases of dynamic programming
problems. Albrecht et al. [1] showed that one could differentiate the Bellman equa-
tion with respect to an exogenous parameter. Even the higher-order aspects of the
computations above can be justified. Blume, Easley and O'Hara [14] discuss when
dynamic programming solutions are smooth in the state variables. Bensoussan [10]
also provides a general treatment.

4.5.5. Adjustment cost models

The problems above were based essentially on first-order conditions. We can apply
perturbation methods to other problems which are not as simple. Dixit [44] studied
the dynamics of models where a controller wants to keep the state of a system close
to some optimal value and incurs a fixed adjustment costs whenever he adjusts the
state. This leads to (S, t, s) rules; that is, when the state moves up to S or down
to s the controller incurs the adjustment cost and pushes the state to a target t.

9A more modem analysis of this problem relying on viscosity methods instead of probabilistic methods
is in Fleming and Souganides [49]. Their approach is also more general, possibly including distorted
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 533

There are many models which fit this description, but they seldom have analytical
solutions to the problem of determining S, s, and t in terms of structural parameters.
Dixit used perturbation methods to derive algebraic formulas for S and s in terms
of structural parameters. He also demonstrates that first-order approximations yield
very good approximations when the adjustment cost is empirically reasonable. On
the qualitative side, he makes rigorous the fact that the region of inaction, S - s, is
quite large for small variance; more precisely, he proves a fourth-power law which
states that S - s e( c 1/4 when the adjustment cost is c. This result is quite important
since it says that the region of inaction is quite large relative to the cost for small
costs. Dixit [44] discusses a number of applications of this result. This is an excellent
example of how one can use the perturbation method to get an analytically simple
rule of thumb which provides important intuition about a problem.

4.5.6. S t o c h a s t i c e q u i l i b r i u m a n a l y s i s w i t h o u t P a r e t o e f f i c i e n c y

Many equilibrium problems do not reduce to optimal control problems, such as dy-
namic equilibria with taxation or money. While the discussion above concerned an
optimal control problem, the same methods can be used to study the behavior of an
economy distorted by taxation. The basic fact is that near the deterministic steady
state, the linear approximation to the law of motion in the stochastic model is

d x = A ( x - ( x ~ - A ) ) d t + 52 d z (33)

where A is the linearization of the deterministic model and S is the covariance

matrix of the shocks to the state. In the deterministic model, x ss is the target state
and A "pushes" the state towards the target. This expressions shows that the linear
approximation to the stochastic model involves the same linear law of motion locally
but with a new target, where the adjustment A arises due to certainty nonequivalence.
With this observation, Balcer and Judd [4] studied the effects of taxation in a simple
capital accumulation model where (33) reduces to

dk = A(k - (k ss - A ) ) d t + Z d z (34)

where ~ is the negative eigenvalue of the linearization of the dynamic system de-
scribing the taxed equilibrium (similar to (10)). Therefore, the effects of taxation on
business cycle fluctuations reduce to its effect on ~. They show how the level and the
composition of the effective tax rate affect important business cycle statistics.
One can also compute equilibrium utility under distortions. If we have a tax of ~-
on all income but have all revenues rebated in a lump-sum fashion, the equilibrium
value function is

. v ( k ) = ~(c(k)) + Vk(k) (f(k) - C ( k ) ) + c~ (k)Vkk(k). (35)

534 K.L. Judd

An optimal policy chooses consumption to maximize the fight-hand side, but the
equilibrium policy under taxation does not. To see the difference, recall that the de-
terministic steady state is the k ~- which satisfies i f ( U ) = p/(1 --r) in the deterministic
case. Then, differentiating (35) with respect to k yields

pVk(k) = u'Ck + Vk(k) (ff - Ck) + Vkk(f -- C) (36)

which reduces at k ~- to

u'-Vk= Vk(f'-p) Vk r
Ok = P C'k 1 - ~-

which shows that the social marginal value of capital deviates from marginal utility
of consumption when the tax rate is not zero.
This fact is important when we come to evaluate the impact of uncertainty on the
equilibrium value function. Differentiating (35) at e = 0 and k = k ~- we find

pYo = ( u ' - v k ) c , + ~ V k k ( k ) (37)

which implies that the true first-order approximation to V around the deterministic
steady state is

V(k, ~) - (k - k')Vk + ~((~' - Vk)C~ + ~ Vkk(k))/p. (38)

This shows that the impact of uncertainty on the equilibrium value function depends
on the degree of certainty nonequivalence, C,, when the tax rate is nonzero, and that
dependence increases with increasing taxation. This is an example of a question where
certainty-equivalent methods of approximating a stochastic economy will not produce
reliable, first-order accurate answers.

4.5. 7. Multidimensional, high-order approximations

The examples explored above have only one dimension. These methods can be ex-
tended to multidimensional problems, yielding high-order approximations to multidi-
mensional problems. Bensoussan [10] discusses these problems for the finite-horizon
case, and Judd [74] presents these procedures for the infinite-horizon case. A nontrivial
difficulty in dealing with the higher-order approximations is the messy notation asso-
ciated with multivariate versions of Taylor's theorem. Judd [74] extends the Einstein
tensor notation, which was introduced to drastically simplify expressions in general
relativity theory, to make these higher-order approximation techniques in optimal con-
trol contexts more tractable. Judd [74], following Fleming [48], further extends the
multidimensional case to include uncertainty. The basic fact is that all the higher-
order terms of the Taylor series expansion, even in the stochastic multidimensional
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 535

case, are solutions to linear problems once one computes the first-order term in the
state variables. This indicates that the higher-order terms are easy to compute. Initial
experiments indicate that they are also good approximations well beyond the steady
state values. These procedures have not been exploited much, but can be obviously
applied to problems in the real business cycle, finance, public finance, and dynamic
general equilibrium literatures.
The other development is the work of Fleming and Souganides [49]. They derive
asymptotic results for problems written in viscosity form. One advantage of these
problems is that they can handle infinite-horizon problems, whereas the results de-
scribed in Bensoussan are proven mostly for finite-horizon cases. While discussing
viscosity, a relatively recent advance in nonlinear partial differential equations, is be-
yond the scope of this paper, we should note that these methods surely cover the
equations which arise in dynamic programming, and might generalize to cover equi-
librium problems.

4.5.8. Dynamic games

Perturbation techniques can also be used to analyze dynamic games. Because of the
notational burden of a formal treatment, I will here just give the basic idea behind
the perturbation approach. Suffice it to say here that we are discussing dynamic
game equilibrium concepts which can be written as solutions to ordinary or partial
differential equations, or some similar system of functional equations.
As with any perturbation method, we begin with a "point" (possibly in a function
space) where we know the solution. In game theory, such cases do arise. For example,
suppose that we have two players who each influence their own state variables, but
that the payoff functions and the laws of motion are such that neither player is affected
by the actions of the other. This would, for example, be the case of two differentiated
duopolists where the cross-elasticity of demand is zero, and the state variable of the
game is the vector of the firm's capital stocks. Then the equilibrium of such a "game"
is trivial, reducing to an optimal control problem for each player. Using the techniques
above, we can compute local approximations for each player's strategy around steady
states of the degenerate game.
Now suppose that the payoffs and/or laws of motion are slightly perturbed so that
each player now cares about the other's actions. By differentiating the functional
equations which characterize equilibrium with respect to the perturbation parameter
and imposing the implicit function theorem and Taylor's theorem, we will be able to
compute how equilibrium is affected by the alteration.
Another kind of starting point is to specify a game with general interactions, but
make some parametric assumption such that the players have no interest in the dy-
namics. This is the case when the interest rate is infinite. In such cases, the dynamic
game reduces to a static game and, in equilibrium, neither player expends any effort to
affect the future. With this degenerate case in hand we can then compute expansions
536 K,L. Judd

in the inverse of the interest rate to determine what happens as the firms begin to care
about the future. There are two examples of papers using these methods.
Judd [69] applied Theorem 4 to a patent race model. He assumed a duopoly model
where the players had two kinds of research strategies and it is necessary to complete
a sequence of steps. Analytic solution of such a general problem is clearly impossible.
He began by assuming that the patent race had a zero prize for the winner, which, of
course, implies a Nash equilibrium of no effort. This is also equivalent to the infinite
interest rate case. He then proved local existence of equilibrium as well as constructed
local linear and quadratic approximations.
Budd et al. [23] contains the most complex perturbation analysis of a dynamic
game. They analyzed a stochastic market share duopoly game. Specifically, current
profits for each firm is a function of firm one's market share, s, which is the state of the
game. Each player expends effort to increase his share, which moves stochastically.
The result is a stochastic dynamic game. The two degenerate cases they use are the
infinite interest rate case and the case of infinite instantaneous variance of random
movements in s. In these cases the firms either don't care about future market share
or essentially have no control over future market share, implying a Nash equilibrium
of zero effort. They compute asymptotic expansions in the inverses of the interest
rate and the disturbance variance. With these expansions they are able to examine the
dynamics of competition, determining when, for example, the laggard firm will work
hard to catch up, when the leading firm will work hard to keep its advantage, etc.
There have been few applications of perturbation methods to game analyses thus
far, but they do indicate the potential of the method. Srikant and Basar [118] develops
regular perturbation methods for a large class of dynamic games different from those
examined in Judd and Buddet al. Given the general applicability of these methods, the
increased interest in dynamic econometric analyses, and the difficulties of game theory
computation, one suspects that these procedures will become increasingly popular.

4.5.9. The macroeconomic "Linear-quadratic approximation"

The perturbation methods described have been used to approximate a wide variety
of optimal control and economic equilibrium problems, and can be used much more
extensively. While many macroeconomists have also studied stochastic growth models,
many have eschewed the procedures above and instead use ad hoc procedures which
replace nonlinear growth models with hopefully similar linear-quadratic models. Since
the latter strategy bears some similarity to perturbation methods and often uses similar
terminology, we will next describe it and discuss the many differences between it and
perturbation methods.
As discussed in Magill [93] and Kydland and Prescott [84, 85], one basic idea
is to replace a stochastic nonlinear control problem with a "similar" linear-quadratic
control problem which "approximates" the nonlinear model, and then apply linear-
Ch. 12: Approximation, Perturbation, and ProjectionMethods"in EconomicAnalysis 537

quadratic methods to solve the m o d e l ) ° This procedure is described precisely in

McGrattan [95]. 11 She takes the nonlinear stochastic optimal control problem

V(z0) -- m a x E tTr(u,x) ,

z~+l = g(z~, ut, ct)

where x is a vector of state variables, u is a vector of controls, and 7c is concave. She

solves for the steady state of the deterministic version of (39), and replaces (39) with
the linear regulator problem

V(xo) =-- max E /3t (xtQxt + u~Rut + 2x~Wut) ,
Ut (40)

Xt+l = Axt + B u t + Cet

where x~Qx + u~Ru + 2x~Wu is the second-order Taylor expansion of 7r, and
A x + B u is the first-order Taylor expansion of 9, both taken at the deterministic
steady state. 12
The linear-quadratic procedure outlined in McGrattan [95] differs from the pertur-
bation method in its approach, objective, and results. Despite using the term "lineal"
approximation," the objective is not to compute a locally valid Taylor series for the
equilibrium behavior rules. In fact, this procedure may produce an "approximation"
which differs substantially from the Taylor series produced by perturbation methods.
This is immediately seen by applying it to (29): f " ( k ss) appears in the solution to
C~(k ~ ) in (25) but appears nowhere in (40) after we apply McGrattan's procedure
to (29); therefore, the linear decision rule computed by McGrattan's method applied
to (29) would not be the linear approximation of the true decision rule at the steady
state, (22), even in the deterministic model. In fact, those who use this procedure

raThe procedure described here is applicable only to optimal control problems, and those equilibrium
problems which reduce to optimal control problems.
l IWhile many have used the "linear-quadratic" method, McGrattan's is the only precise statement of
the procedure for the general discrete-time multidimensionalcontrol problem which I have seen in the
published literature. The Magill procedure is the correct procedure, differs from McGrattan, but has been
largely ignored.
12Kydland and Prescott use a slightly different procedure. They choose linear rules which satisfy the
Euler equation at a collection of points near the steady state, where the collection is determined by the
variance of the shocks. In this respect, their procedure is similarto the projectionmethod we discuss below.
They commentthat in the case they examine,the differences are slight, but there is no reason to believethat
this is always true. I include their procedure here since their stated goal is to simplify "the determination
of the equilibriumprocess by reducingit to solving a linear-quadraticmaximizationproblem".
538 K.L. Judd

generally make no claim that they are computing the linear approximation of the true
decision rule.
If one were to use investment instead of consumption as the decision rule in (29)
then the result from McGrattan's procedure does yield the true linear approximation
in many cases (further work is needed to see how general this fact is). This does
not say that McGrattan's procedure is correct. Instead it points out an undesirable
sensitivity to economically inessential details in the formulation of the problem. In
contrast, perturbation methods are not sensitive to such changes.
No matter how one proceeds, the approach in McGrattan, Magitl, and Kydland
and Prescott, makes no adjustment for variance. The approximation is a certainty
equivalent approximation even though the true problem is generally not certainty
equivalent. At best, this procedure computes the first two terms of (31) above but
drops the third and later terms. The result is only half of the true linear approximation
at the deterministic steady state since the approximation includes the linear Taylor
expansion term for the state variables but excludes all Taylor expansion terms for the
variance terms. Multidimensional generalizations of the rules computed in Judd and
Guu [75] have no such problems.
This intuitive way of approaching the problem can lead to some conceptual prob-
lems in thinking about approximations. The linear-quadratic intuition behind (40)
says to replace a nonlinear problem with a similar linear-quadratic problem because
the latter is solvable. Suppose that you wanted a higher-order approximation of the
optimal decision rule. This approach suggests that the way to compute a quadratic
approximate decision rule would be to take a third-order polynomial approximation
of the objective around the deterministic steady state and solving exactly the resulting
cubic optimal control problem. Of course, there is no exact solution in general for
third-order problems, making it appear difficult to compute a quadratic approximation
to the decision rule. In contrast, the perturbation methods described above show that
the higher-order terms are in fact easy to compute.
Christiano [30] adopts a different approach to the "linear-quadratic" approach. He
writes down the Euler equations for the nonlinear model in the form (18), and then
linearizes these equations around the steady state to create a linear system of the form
(19). Two comments are in order. First, this essentially reduces the problem to a calcu-
lus of variations problem. Since this cannot be done for all optimal control problems,
this approach is limited. However, it is justified by the stable manifold theorem. Sec-
ond, he also imposes certainty equivalence on his approximation to stochastic models.
Therefore, he also ends up with only a "half-linear" approximation.
Dotsey et al. [45], Christiano [30] and McGrattan [95] have documented the qual-
ity of some implementations of the macroeconomic linear-quadratic approach. The
results follow what one would expect from the perturbation analysis. The Christiano
and McGrattan implementations of the linear-quadratic method do fairly well when it
comes to modeling movements of quantities, but not as well with asset prices. This
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 539

is expected since perturbation methods show that the linear approximation of quan-
tity movements depend on only linear-quadratic terms whereas asset pricing move-
ments are more likely to involve higher-order terms. In particular, the extra terms
produced in Judd and Guu [75] show that the deviations from certainty equivalence
depend on higher-order derivatives of the utility function. The linear-quadratic ap-
proximation also does less well as the variance of the productivity shocks increases
since the linear-quadratic approach ignores the effects of the variance on the decision
The linear-quadratic scheme in (40) is used to solve for equilibria which solve a
social planning problem. Macroeconomists have devised complex iterative schemes to
compute equilibria of distorted economies. They also revolve around linear-quadratic
approximations of the individual agents' problems (see, for example, Cooley and
Hansen [37]). These procedures are offered without any rigorous justification, and
offer no reason why they should be used instead of the earlier linearization methods
derived from standard mathematical methods. As pointed out above, the standard
perturbation methods used by Laitner, Judd, Bovenburg, and described in Stokey and
Lucas will compute first-order valid linear approximations in nonlinear equilibrium
models, and do so in a nonrecursive, hence much faster, fashion.
Furthermore, the problems with this macroeconomic approach are even greater
when dealing with distorted models. These approximations also ignore the impact
of variance. The point of many of these exercises is to compute the welfare effects
of various policies. This requires the computation of an equilibrium value function.
We saw above that the first-order approximation to such functions, (38) includes the
deviation from certainty equivalence when taxes are present. Therefore, their compu-
tations of utility are not reliable. Another example of the inadequacy of an informal
approach is in Chari et al. [27]. They show that the resulting "linear approximation"
does poorly relative to a global nonlinear procedure. Since they do not take an explicit
perturbation approach (that is, formulate it as an application of the implicit function
theorem or one of its generalizations, and compute an appropriate number of terms),
this is not evidence against the use of perturbation methods, only against the informal
approach they use.

4.5.10. Linear model computation

The linear approximation approach could also be used, but has been overlooked, when
it comes to the analysis of linear-quadratic models. The idea is simple: if one has
a model with a deterministic steady state and globally linear equilibrium behavioral
rules, then the linear rules which are locally valid near the steady state are the globally
valid rules. All of the perturbation methods outlined above are direct, noniterative,
methods in contrast to the complex, iterative procedures often used by economists to
solve linear dynamic economic models.
540 K.L. Judd

5. Bifurcation methods

Sometimes we will want to compute an approximation to an implicitly defined function

at a point where the conditions of the Implicit function theorem do not hold, in
particular when Hy(xo, yo) is singular. In some cases, there is additional structure
which can be exploited by bifurcation methods, to which we now turn.
Suppose that H ( x , e) is C 2. One way to view the equation H ( x , e) = 0 is that
for each e it defines a collection of x which solves the equation. We say that e0 is
a bifurcation point if the number of solutions to H ( x , e) = 0 changes as c passes
through co. Two situations are summarized in the following theorem.

THEOREM 5 (Bifurcation theorem). Suppose H ( x , O) = O for all x, where H : R 2 -+

R. Furthermore, suppose that

H (xo,O)=O =H,(xo, O),

f o r some (xo, 0). Then, /f Hcc(xo, 0) ~ 0, there is an open neighborhood A/" of (x0, 0)
and a function h(e), h(e) 7~ O f o r e ~ O, such that

H ( h ( e ) , e ) = 0 on A f

and H ( x , e) is" locally diffeomorphic to e ( e - x ) or e ( e + x ) . Otherwise, if H,,(xo, O) =

0 ¢ H ~ , (x0, 0), then there is an open neighborhood .Af of (x0,0) and a function
h(e), h(e) 7/= O f o r e ~ O, such that

H(h(,),,) = 0 on N

and H ( x , e) is locally diffeomorphic to e3 - xe or e3 + xe. In both cases, c = 0 is a

bifurcation point.

It may seem that Theorem 5 has limited applicability given its low-dimension
character. Fortunately, there is a procedure, the Lyapunov-Schmidt method, which is
used to transform high dimension (even infinite dimension) problems into appropriate
low-dimension problems at which point one applies the procedures above. This greatly
increases the applicability of this approach.
Theorem 5 also seems limited in that H has domain R 2. Theorem 5 generalizes
to H : R n+1 --4 R ~. Furthermore, the Bunch theorem (see Zeidler) generalizes the
bifurcation methods in Theorem 5 to allowing both x and e to be in Banach spaces.
Space limitations prevent our discussing these generalizations here, but below we will
see that economic applications are obvious.
There are many kinds of bifurcations; the simple ones in Theorem 5 are referred
to as the transcritical and pitchfork bifurcations. Another, more complex, bifurcation
Ch. 12: Approximation, Perturbation,and ProjectionMethods in EconomicAnalysis 541

which arises naturally in economics is the Hopf bifurcation. We present the statement
in Benhabib and Nishimura [9].

THEOREM 6 (Hopfbifurcation). Suppose that {c = [i'(z,#), z E G C R n, # E

[-c, c]C R, F E C k. Suppose that there exists stationary solutions, that is, for I~1
< c, there is ~(#) such that F(2(#), #) = O. Suppose that the Jacobian Fx(2(#), #)
has a parametric pair of eigenvalues which can be expressed as a(#) 4-/3(#)i where
a(O) = O, /3(0) ¢ O, and a'(O) ¢ O. Then there exists a family of parametric so-
lutions z(t, e) and #(e) of ~c(t, e) -- F(z(t, e), t~(e)) such that z(t,O) = "z(O) but
z(t,e) 7~ Y(#(e))for e ¢ 0. Furthermore, #(c) is C k-' and the period of the cycle
is 27r/ t/3(0)t.

The result stated above is just a first-order result; higher order approximations are
available. There are also conditions which guarantee that the periodic solutions are
stable orbits. There are further generalizations of the Bifurcation Theorem which cov-
ers cases where there are many nondegenerate branches passing through a bifurcation
point. Such cases may correspond naturally to multiple equilibria in economic models.
For a more complete discussion of these issues see Zeidler [128], Chow and Hale [34]
and Golubitsky and Schaeffer [53].

5.1. Applications of the Hopf bifurcation to dynamic economic theory

The Hopf bifurcation has been extensively used to study the possibility of deterministic
cycles in economic models. Benhabib and Nishimura [9] explored the possibility of
cycles in multisector growth models. They showed how to use the Hopf bifurcation to
check for the presence of Hopf bifurcations and offered plausible numerical examples
of Hopf bifurcations in optimal growth models. Zhang [131] presented a simplified
version of this analysis and also showed how to compute expressions for the period
of such cycles and how to check their stability.
The Hopf bifurcation has also been used in Industrial Organization theory. Feich-
finger [46] used the Hopf bifurcation in a dynamic model of advertising to argue that
cycles in advertising expenditure were quite plausible. The Hopf bifurcation has also
been used to analyze general equilibrium with financial imperfections. The theme of
these papers is that while the equilibrium dynamics of an economy may be stable
with perfect capital markets, capital market imperfections may lead to more complex
dynamics. Again, the Hopf bifurcation is used to demonstrate the existence of stable
cycles. These papers include Franke [50], who investigated a Keynes-Wicksell model
with adaptive expectations for inflation and found that periodic orbits were possible.
There are possibly many other applications of the Hopf bifurcation. Most dynamic
analyses examine only the steady state, not its local dynamic structure. At the least,
one can check the local linear structure to see if the number of stable and unstable
eigenvalues is consistent with local asymptotic stability. Since stable cycles are often
542 K.L. Judd

associated with unstable steady states, the presence of too many unstable eigenvalues
should lead an analyst to check for the possibility of a nearby Hopf bifurcation. Since
this checking is purely an algebraic exercise, easily done by symbolic computation
methods, such checks should become standard.

5.2. Gauge functions

The methods described above, commonly referred to as regular perturbations, compute

expansions of the form Y~'~i=l ~ aieL" There are many cases where we will want to
compute different expansions. In general, a system of gauge functions is a sequence
of functions, {Sn(e)}~°°__1, such that

lim ~n+l(e) _ 0.
~-~o O~n(~)

An asymptotic expansion of f(x) near x = 0 is denoted

f(x) ~ f(O) + ~ a i S i ( x )

where, for each n,

lim f(x) - (f(0) + Y~-a ai 5~(x)) = 0.

• -~o 6~(x)

In regular perturbations, the sequence of gauge functions is 5k(e) = e k. Another

example of a gauge system is 5k(c) = c k/2. In many problems, the main task is
determining the correct gauge system. The next sections present examples of this
more general problem.

5.3. Bifurcation applications to stochastic modelling

Whereas the Hopf bifurcation is useful in analyzing deterministic systems, the simpler
pitchfork and transcritical bifurcations examined above in Theorem 5 can be used to
study stochastic problems. In this section, we will illustrate this with two examples.
First we will discuss the details of a simple portfolio problem. Second, we will discuss
a much more sophisticated application.

5.3.1. Por(olio choices with small risks

Suppose that an investor has W in wealth to invest in two assets. The safe asset yields
R per dollar invested and the risky asset yields Z per dollar invested. If a proportion
Ch. 12." Approximation, Perturbation, and Projection Methods in Economic Analysis 543

co of his wealth is invested in the risky asset, final wealth is Y = W((1 - c o ) R + c o Z ) .

We assume that he chooses co to maximize E { u ( Y ) } for some concave utility function
One way in which economists have gained insight into this problem is to ap-
proximate u with a quadratic function and solve the resulting quadratic optimization
problem. It is argued that this is valid for small risks. The bifurcation approach allows
us to examine this rigorously. We first create a continuum of portfolio problems by

= R + e5 + e2¢c.

At e = 0, Z is degenerate and equal to R. If 7r > 0, we model the intuitive case

of risky assets paying a premium. Note that we multiply £ by e and 7r by e2. Since
the variance of ez is e 2 a2, this models the observation that risk premia are the same
order as the variance.
Judd and Guu [77] applied the Bifurcation theorem to this problem, producing
a procedure which involves solving only linear equations and which can be used
for noncompact distributions. We will briefly outline their analysis. The first-order
condition for co is (assuming W = 1)

o - E{~'(R + co(~z + ~2.)) (z + ~ ) } - G(co, ~). (41)

We want to analyze this problem for small e. We cannot apply the implicit function the-
orem since 0 = G(co, 0) for all co implying that co is indeterminate at e_ = 0. Since we
want to solve for co as a function of e near 0, we first need to compute which of these
co values is the "correct" solution to the e = 0 case; specifically, we want to compute
t" \
coo ~- lira co<e).

Implicit differentiation of (41) implies

0 = G~ co' + G~. (42)

Differentiating G we find
Ge = E { u " ( Y ) (coz + 2coeTr) (z + eTr) + u'(Y)zr},

G~ : E { ~ " ( 9 ) (z + ~)2~}.
At e = 0, G~ = 0. co'(0) can be well-defined in (42) only if G~(co, 0) = 0 also.
Therefore, we look for a bifurcation point, coo, defined by 0 = Ge(coo, 0). At e = 0,
this reduces to 0 = u"(R)coo~z2 + u'(R)Tr, which implies

coo- ~'(R) ~z2"

544 K.L. Jb!dd

This is the simple portfolio rule indicating that co is the product of risk tolerance
and the risk premium per unit variance. If coo is well-defined, then this must be its
value. Since the conditions of tile Bifurcation theorem are satisfied at (coo, 0), there
is a function co(e) which satisfies (41) and goes through (w0, 0).
Note that this is not an approximation to the portfolio choice at any particular
variance. Instead, coo is the limiting portfolio share as the variance vanishes. Some
authors treat this as an approximation to the true solution, co(e), for small c. That,
however, is not the case. If we want the linear approximation of co(e) at (coo, 0), we
must go one more step since the linear approximation is co(e) - co(0) + e w'(0). To
calculate co'(O) we need to do one more round of implicit differentiation. Differentiat-
ing (42) with respect to c yields 0 = G ~ colw~+ 2G,o~ co! + G~oco"+ G~. At (coo,0),
G ~ = u'"(R)co~ E{z3}, G ~ = O, G ~ = u " ( R ) E{z2}. Therelbre,

1 ,.'"(R) E{z 3}
co'(o)-- 2 E{z2} cog

This formula tells us how the share of wealth invested in the risky asset changes as
the riskiness increases, highlighting the importance of the third and second derivatives
of utility and the ratio of skewness to variance. If the distribution of the risky asset is
symmetric, then E { z 3} = O, and the constant coo is the linear approximation of co(e).
This is also true if urn(R) -=- O, such as in the quadratic utility case. However, if the
utility function not quadratic and the risky return is not symmetrically distributed, then
co~(0) ~ 0, and the linear approximation is a nontrivial function. Note that this says that
a linear approximation to co(e) requires a cubic approximation to the utility function
and third moments of ~. This fact, also demonstrated in Samuelson [114], shows
how the simple approach of using only a quadratic approximation to the objective
function does not produce a valid linear approximation for co(e). The advantage of the
bifurcation approach demonstrated here is that the structure of the problem indicates
exactly what information is needed.
Samuelson [114] earlier analyzed this problem in a less formal fashion. Also, his
formal analysis was limited to Z with compact distributions, that is, random variables
whose support goes to a point as e goes to zero, a detail which substantially limits
practical interest. The perturbation arguments used above make no such restriction.
While the Samuelson method worked in this example, using Theorem 5 allows us to
proceed in a more general fashion, and provides the necessary formal justification for
these calculations.
One example of where the true problem has a bifurcation structure and standard
linear approximation procedure is unacceptable is Huffman [63]. Huffman examined
an overlapping generations model of capital accumulation, and tried to examine the
impact of a business cycle shock on asset trading. He computed the deterministic
steady state and computed the impulse function for individual wealth and asset trading
arising from unanticipated shocks to endowments and output. Since this was all done
Ch. 12: Approximation, Perturbation, and ProjectionMethods in EconomicAnalysis 545

in an otherwise deterministic, perfect foresight model, the tacit assumption, appropriate

for such models, was that equity was the only traded asset. However, Huffman then
interpreted the impulse functions from the deterministic model as impulse response
functions for a stochastic model, where all agents know that these shocks occur
frequently. In the stochastic case, there would also be demand for trade in bonds
as well as equity, and business cycle shocks would generate disturbances to bond
holdings and equity holdings. This was ignored by Huffman, who implicitly assumed
that even in the stochastic model the only asset was equity. Such a capital market
imperfection will have important impacts on the predicted asset trading, and is not an
appropriate approximation assumption; just because one of the deterministic equilibria
has no bond trading does not mean that the absence of bonds is an appropriate
approximation for the stochastic model, even one with shocks with small variance.
The importance of including bond trading into the analysis depends on the question
being investigated. It is likely that the welfare loss is small when the variance is
small. However, including bond trading could have substantial impact on the volume
of trade in assets. A bifurcation approach can include bonds and equity and analyze
such trade.

5.3.2. Bifurcation and sunspots

A particularly sophisticated application of bifurcation techniques appeared in Chi-

appori et al. [29]. They analyzed the existence of stationary sunspot equilibria near
steady states of overlapping generations model of arbitrary dimension. They show that
when a steady state has indeterminate local deterministic dynamics, i.e., there exists a
continuum of perfect foresight paths converging to the steady state, then there exists
a continuum of sunspot equilibria which have support in neighborhoods of the steady
state. They also are able to determine the possible qualitative character of the sunspot
equilibria. This paper displays a very sophisticated and rich application of the ideas
behind Theorem 5. These methods will also allow researchers to assess quantitative
aspects of sunspot equilibria.

6. Asymptotic expansions of integrals

In economic and econometric problems, integrals frequently take the form

1()~) ==-JD e-Xg(x) f(x) dx (43)

where A is a large parameter. Simply differentiating (43) with respect to A at A = ec

will not work here. Laplace's method provides a useful way to approximate (43). The
basic idea is that the major contribution of the integrand is at the minimum of 9(x).
546 K.L. Judd

Suppose g(x) is minimized at x = a. For large A, if x ¢ a then e-)'.q(x) << e-)W(a).

As long as f ( x ) does not offset this for Ix - a I >> 0, I(),) is determined largely by
the behavior of the integrand, e-'Xg(x)f(x), for x near a.
The one-dimensional case is easy to state. Assume that g and f satisfy the asymp-
totic series
g(x) ~ g(a) + E ai(x - a) i+t', f ( x ) "-' E b i ( x - a) i + a - I .
i=o i=0

Under modest assumptions (see Wong [126], or Bleistein and Handelsman [13]) if
the integral I(A) = f : f ( x ) e -)'9(~) dx converges absolutely for sufficiently large A,
and if g is minimized on [a, b] at a, then

I(A) ~ e -)'g(a) F A(i+cO/t~ (44)


where .U(A) _= f0~ e - ~ x ;~-1 dx is the gamma function, and the ci depend on the ai
and bi. In particular

bo ( bl (o~ +#2aol)albo) ao(O~+l)/..

cO -- c~/l~ ~ Cl =
#a o #
To compute these coefficients and others one essentially expands the integrand in
terms of A and matches like powers. 13 One of the byproducts of this theorem is the
construction of an integrand which is close to e -xg(~) f ( x ) but also integrable. This
approximation to the integrand is then integrated to produce Laplace's approximation.
Note that the gauge functions of A in (44) depend on the asymptotic expansions of f
and g.
One elementary application of Laplace's method is Stirling's formula for n!. Recall
that n! = F ( n + 1). We would like to approximate F(n) for large n. To use Laplace's
method, let x = yA; then

F(A) = A ~ e-)'(Y-lnY)y -1 dy.

~0 °°
The minimum of y - In y is at y = 1. Break the integral into two integrals over
[0, 1] and [1,o c), and add the two one-sided approximations to get the two-term

ff()~) ~ ~ /@-l/2e-A
(1) 1 q- 1 ~ "

13Bender and Orszag gives an intuitive presentation of this procedure.

Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 547

Stirling's formula is just the one-term expansion, n! ~ v/-2-~(n + l)n~-U2e-(n+l).

While the operating assumption in Laplace's approximation is that A is a "large"
parameter, practical use of such methods rely on what "large" means quantitatively.
Fortunately, these expansions may do very well even when A is actually small.
For example, Stirling's approximation for 1! is 0.9595 and the two-term expansion
yields 0.9995.
There is a multidimensional extension of Laplace's method. Suppose D C R ~,
f, 9 E C2[D]. Suppose the minimum of 9(x) for x c D is achieved at x0 in the
interior of D. Then the leading term of the expansion is

e_~g(~o) ( ~ ) n/2
Z(,~) -- [Nil/2 f(xo)

where H -~ (gx~xj) is the Hessian of 9 at x0. Higher order terms can also be computed.
While computing higher-order terms would be very tedious, symbolic languages such
as Mathematica, Maple, and Macsyma, are ideally suited to do this.

6.1. Econometric applications of asymptotic methods

Asymptotic methods in econometrics are essentially perturbation methods where the

properties of an estimator are computed in terms of the size of the data set and the
expansion is around the case of an infinite sample size. In some cases, the asymp-
totic problem can be handled by relatively simple procedures, such as Edgeworth
In other cases, the full power of Laplace's method is needed to compute asymptotic
properties of statistics. In this case, the integral is the likelihood function, and it is
written in the form (43) where the parameter ~ is the sample size. Phillips [104] used
Laplace's method to approximate small sample marginal densities of instrumental
variables estimators. Ghysels and Lieberman [52] use Laplace's method to compute
small sample biases which arise from using filtered data in dynamic regressions.
Laplace's method has been more popular among statisticians; see the citations in
[52]. Holly and Phillips [62] use the related saddlepoint procedure. These methods
work well, but are not often used, possibly because their implementation requires
much algebra. One suspects that a more intensive use of symbolic computational
tools would make them more accessible.

6.2. Theoretical applications of Laplace's method

While the theoretical applications of asymptotic methods for evaluating integrals are
few currently, they are likely to increase. Brock [ 19] discusses where Laplace's method
548 K.L. Judd

is useful in evaluating statistical mechanical systems adapted to economic issues.

As this modelling approach matures, it is likely that Laplace's method and related
asymptotic procedures will be quite useful.

7. The mathematics of Lp approximations

We will often want to approximate functions over a broad range of values with
relatively uniform accuracy. In this case, we turn to L v approximations. L p approxi-
mations finds a "nice" function 9 which is "close to" a given function f in the sense
of a L p norm. To compute an L v approximation of f , one ideally needs the entire
function, an informational requirement which is generally infeasible. Interpolation
is any procedure which finds a "nice" function which goes through a collection of
prescribed points. When using interpolation, the objective is to assure that if the data
comes from a function 9 then the interpolant is close to 9. Regression lies between L 2
approximation and interpolation in that the amount of data used exceeds the number
of free parameters, producing an approximation which "best" fits the data. In all cases,
we need to formalize the notions of "nice" and "close to".

7.1. Orthogonal polynomials

We will next use basic vector space ideas to construct representations of functions
which will lead to good approximations. Since the space of continuous functions is
spanned by the polynomials, x n, it is natural to think of the ordinary polynomials as a
basis for the space of continuous functions. However, recall that good bases for vector
spaces possess useful orthogonality properties. We will develop those orthogonality
ideas to construct orthogonal polynomials.

DEFINITION 7. A weighting function, w(x), on [a, b] is any function which is positive

and has a finite integral on [a, b]. Given a weighting function w(x), we define an
inner product on integrable functions over [a, b]:

(f, 9) = f(x) 9(x) w(x) dx.

The family of polynomials {~n(x)} are mutually orthogonal with respect to the
weighting function w(x) if and only if

{~,,~, ~ ) - - 0 , n¢m.
There are several examples of orthogonal families of polynomials, each defined by
a different weighting function and interval. Some common ones useful in economics
Ch. 12."Approximation, Perturbation, and Projection Method9 in Economic Analysis 549

are Legendre, Chebyshev, Laguerre, and Hermite polynomials. Legendre polynomials

assume w(x) = 1 on the interval [ - 1 , 1]; the nth Legendre polynomial is

(-1) n d~
Pn(X) =--- 2nni dx n [(1 - x2)n].

The Chebyshev polynomials arise from w(x) = (1 - X2) -1/2 on [--1, 1]; the nth
Chebyshev polynomial is

T n ( X ) ~ C O S ( n C O S - 1 X).

The Chebyshev and Legendre polynomials are useful in solving problems which live
on compact sets since a linear change of variables will transform in compact interval
into [ - 1 , 1]. The Laguerre polynomials correspond to w(x) = e - x on [0, cxz); the nth
member is

ex dn
Ln(x) =- n! dx n (xn e-x)"

Laguerre polynomials are useful when one needs to approximate time paths of vari-
ables in a deterministic analysis. Hermite polynomials arise from w(x) = e -~2 on
( - o c , ec); the nth member is

Hn(x) - ( - l ) n e ~2 ~ddn
z (e-X2) •

Hermite polynomials are used to approximate functions of normal random variables.

7.2. Least-squares orthogonal polynomial approximation

Given f(x) defined on [a, b], one approximation concept is least-squares with respect
to the weighting function w(x). That is, given f(x), the least-squares polynomial
approximation of f with respect to weighting function w(x) is the degree n polynomial
which solves

min fa (f(x) -p(x)) 2w(x) dx.
deg(p) ~<n

In this problem, the weighting function w(x) indicates how we care about approxi-
mation errors as a function of x. For example, if one has no preference over where
the approximation is good (in a squared-error sense) then we take w(x) = 1. If one
550 K.L. Judd

cared more about the error around x = 0 we should choose a w ( x ) which is larger
near 0.
The connections between orthogonal polynomials and least-squares approximation
are immediately apparent in solving for the coefficients of p(x) in the least-squares
approximation problem. If {~}~=1 is an orthogonal sequence with respect to w ( x ) ,
and we define (f, g) - f:o f ( x ) g(x) w ( x ) dx the induced metric is 11 f II2=
- (f, f),
the least-squares solution minimizes [1 f - p H, and can be expressed

;(x) =

Note the similarity between least-squares approximation and linear regression. The
formula for p(x) is essentially the same as regressing the function f on n + t orthog-
onal regressors; the coefficient of the ith "regressor", qoi(x), equals the "covariance"
between f and the ith "regressor" divided by the variance of the ith regressor. This
is no accident since regression is a least-squares approximation.

7.2.1. Chebyshev approximation

We will next describe some of the features of Chebyshev approximation since they
play an important role in many applications.

THEOREM 8 (Chebyshev approximation theorem). Assume f C C k [-1, 1]. Let

- c0 + cjTj(x)


2 /_1 f ( x ) T ~ ( x ) d x
cj = -
7r i v;l- x 2

Then there is a b such that, ]'or all n >~ 2

I]f-Cn ]]oo <~ nr

Hence Cn -4 f uniformly as n -4 oo. Furthermore, there is a constant c such that

[cjl< T, j> l.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 551

This theorem has many useful aspects. First, if we compute a n-term Chebyshev
approximation, we need to assess the likelihood of it being "nearly" as good as the
full approximation. If the last few terms of the n-term approximation do not appear to
be dropping at the j - k rate indicated in the theorem, we would take this as evidence
for adding more terms; if the coefficients are dropping at the indicated rate we feel
more comfortable in accepting the n-term approximation. Note that, even though the
construction is a least-squares approach, the convergence is uniform, a far stronger
form of convergence. Since uniform approximation is a more difficult problem, we
instead use Chebyshev approximation which, according to Theorem 8, will work
nearly as well in the uniform norm.

7.3. Interpolation

Interpolation is any method which takes a finite set of pointwise restrictions and finds
a function f : R n --+ R m satisfying those restrictions.

7.3.1. Lagrange interpolation

Lagrange interpolation takes a collection of n points in R 2, (Xi, Yi), i ~-- l, . . . , n,

where the xi are distinct, and finds a degree n - 1 polynomial, p ( x ) , such that
yi = p ( x i ) , i = 1, . . . , n. The Lagrange formula demonstrates that there is such
interpolating polynomial. Define

x -- X j
II - x5

Note that gi(x) is unity at x = xi and zero at x = x j for i ¢ j . This property implies
that the polynomial


interpolates the data, that is, Yi = p ( x i ) , i = 1, . . . , n. Furthermore, this is the unique


7.3.2. Hermite interpolation

We may want to find a polynomial p which fits slope as well as level requirements.
Suppose we have data

p(xi)=yi, p'(xi)=y~, i= t,...,n

552 K.L. Judd
where the xi are distinct. Since we have 2n conditions, we are looking for at least a
degree 2n - t polynomial which satisfies the conditions above.
We will construct the unique solution, p(x). First define the functions

f i(x) = (z - xi) ei(z) 2,

hi(z) = (1 - 2f~ (x) (x - xi)) gi(Z) 2.

The critical facts are that hi is a function which is zero at all xj nodes except at xi,
where it is unity, and its derivative is zero at all xj, and the reverse is true for f~i(x).
The unique solution to the Hermite interpolation problem is

i=1 i=1

7.4. Approximation through interpolation

Interpolation is extremely powerful since it uses a minimal amount of information to

construct an approximation. It is also dangerous since the number of free parameters
equal the amount of data. Furthermore, we want the approximation to be valid gen-
erally, not just at the interpolation nodes. This is not generally true for interpolation
schemes. Consider the function f(x) = 1/(1 ÷ z 2) over the interval [ - 5 , 5]. Let
p~(x) be the nth degree polynomial which agrees with f at the n + 1 uniformly
spaced (including the endpoints) nodes. Not only does Pn not converge to f, but
for Izl > 3.64, limsup,~_~o ~ If(x) -pT~(x)l =- e~. Therefore, for a seemingly well-
behaved C ~ function, interpolation at the uniformly spaced nodes does not improve
as we use more points.

7.4.1. Interpolation error

The last example may discourage one from approximating a function through inter-
polation. While the example does indicate that caution is necessary, with care we
can reduce the likelihood of perverse behavior by interpolants. To see what we can
do, we examine the general interpolation error. Recall that the Lagrange polynomial
interpolating f at points xi is pn(x) = ~in=l f(xi)gi(x). Define


,zn) -- II(z -
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 553

The following theorem provides a bound on the interpolation error of the Lagrange

THEOREM 9. A s s u m e a = xo < x l < .." < x n = b. T h e n

sup If(x) -p~(x)14 IIf('~+l)ll~((n+ 1)!) -1 sup tP(x;xo,...,xT~). (45)

~ [ , , b] xc[a, b]

This bound decomposes the interpolation error into three pieces. The first two are
independent of any analysts choice. However, the third term depends on the choice
of interpolation points. By making good choices for the x i we can substantially affect
the interpolation error.
Here we see a significant difference between the problem facing a numerical analyst
and the problems of an econometrician. An econometrician must take the values of
f evaluated at whatever points some data generating process provides. In contrast, in
approximation problems we get to choose where to evaluate f. In general, interpolation
would be a bad procedure for econometricians since there is in general no assurance
that our data comes from a good choice of x's. When we can choose the points,
there is some hope that we can choose them to keep down the interpolation error.
Furthermore, econometricians have to deal with significant error in the observations
of f , whereas in numerical contexts we evaluate f with high accuracy.

Z4.2. Chebyshev interpolation

We will next determine a good collection of interpolation nodes. Note that our choice
of -¢x .~'~ affects only the maximum value of ~P(x), which in turn does not depend
l 7,Ji=l
on f . So if we want to choose interpolation points so as to minimize their contribution
to (45), the problem is

min m a x l - I ( x - xk).
:EI~...~X n x

The solution to this problem on [ - 1 , 1] is

2k - 1 )
xk=cos\ 2n 7r , k= 1,...,n,

which are the zeros of T ~ ( z ) . Therefore, the interpolation nodes which minimize the
error bound (45) are the zeros of a Chebyshev polynomial adapted to the interval;
we call this C h e b y s h e v interpolation. This shows that the Chebyshev interpolant is
the best in terms of minimizing the worst-case error. Furttlermore, it also keeps the
maximum error acceptably small, as the next theorem shows.
554 K.L. Judd

THEOREM 10 (Chebyshev interpolation theorem). Suppose f E Ck[a, b]. lf I~ is the

degree n Chebyshev interpolant, then there is some dk such that for all n

II f - I f IIoo---
< log(n+l)+2 ~ II Iloo.

This theorem says that the Chebyshev interpolant converges to f rapidly as we

use more Chebyshev zeros. Furthermore, if f has k derivatives, then the convergence
rate is O ( n - ~ . l o g ( n ÷ 1)). If f E C °°, then we have O ( n - k l o g ( n + 1)) conver-
gence for all k; of course, the proportionality constants, dk, are also increasing in k.
Convergence may seem to be an unremarkable property, but recall that interpolation
at uniformly spaced points does not necessarily converge. Given these properties,
Chebyshev methods are valuable whenever the approximated function is smooth.

7.5. Approximation through regression

Another way to approximate a function is to use regression. In regression, one eval-

uates the function f ( x ) at m points, and use the resulting evaluations to choose a
parametric approximation with n parameters, n << m, which minimizes some loss
function. The methods closest in spirit to the material above are the seminonpara-
metric methods. A key asymptotic result in the seminonparametric literature is that
if m and n grow at appropriate rates, then the approximation converges to f ( x ) as
n -+ oc. While regression methods can be used, they are based on "random" choices
of the xi, whereas other approximation methods make efficient choices of the xi and
will generally dominate regression by using fewer points.

7.6. Piecewise polynomial interpolation

Lagrange interpolation computes a C °o function to interpolate the given data. An

alternative is to construct a function which is only piecewise smooth. Two common
schemes are Hermite polynomials and splines.

7.6.1. Step Jhnction approximation

One common approximation strategy in economics is to use step functions. Step

lunction approximations on [a, b] are generated by a basis of step functions, {~i:
i = 1 , . . . , n } where h = a - b/n and -

0, a~x<a+(i ~l)h,
~(x)= 1, a + ( i - ~ ) h ~ < x < a + i h ,
O, a + i h ~ x ~ b .
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 555

lfthe interpolation data are (xi, yi) and ~i(xi) = 1, then the step function ~ i yi~i(x)
interpolates the data. To get better approximations, one increases n.

7.6.2. Piecewise linear approximation

Piecewise linear approximations take a sequence of data, (xi, Yi), and creates a piece-
wise linear function which interpolates ,the data. If the z~ are uniformly distributed,
then they are generated by a basis of tent functions, that is, functions of the form, for

0, a ~< z ~< a + ( i - 1)h,

~i(x) = (x - (a + (i - 1 ) h ) ) / h , a + (i - 1)h <~ x ~ a + ih,
1 - (x - (a + i h ) ) / h , a + ih <~ x <~ a + (i + 1)h,
O, a+(i+l)h~x~b.

These are called tent functions since ~ ( x ) is zero to the right of a + (i - 1)h,
rises linearly to a peak at a + ih, and then falls back to zero at a + (i + 1)h, and
remains zero. While both step function and piecewise linear approximations fit into
our general linear approach, they differ in that the basis elements are zero over most
of the domain, and at each point in the domain most basis functions are zero. This is
the defining feature offinite element approaches to approximation. While the resulting
bases are not strictly orthogonal, they are close to being so since the inner product of
most distinct pairs of basis elements is zero.

7.6.3. Hermite interpolation polynomials

Next, suppose that we have both level and slope information at xt, • • •, xn. Within
each [xi, xi+l] interval, we construct the Hermite interpolation polynomial given the
level and slope information at xi and xi+l. The collection of interval-specific Hennite
interpolations constitute a piecewise polynomial approximation. The resulting function
is a cubic polynomial almost everywhere. However, at the interpolation nodes, it is
only C l . This lack of smoothness is often undesirable and is addressed by splines.

7.6.4. Splines

Another piecewise smooth scheme is to construct a spline. A spline is any smooth

function which is piecewise polynomial but also almost as smooth where the poly-
nomial pieces connect. Formally, a function s(x) on [a, b] is a spline of order k if s
is C k-2 on [a, b], and there is a grid of points (called nodes) a = xo < x~ < • • - <
xn = b such that s is a polynomial of degree at most k - 1 on each subinterval
[xi, xi+l], i = 0 , . . . , n - 1. Note that order 2 splines are just the common piecewise
linear functions.
556 K.L. Judd

The cubic spline (that is, of order 4) is popular. Suppose that we have Lagrange
interpolation data {(x~, y~) I i = 0 , . . . , n}. The xi will be the nodes of the spline,
and we want to construct a spline, s(x), such that s(xi) = Yi, i = 0 , . . . , n. On each
interval [xi, xi+l], s(x) will be a cubic a~ + b~ x + e~ x 2 + d~ x 3. The definition of a
cubic spline together with the Lagrange data provides us with 4n - 2 conditions on
the 4n coefficients. Various splines are differentiated by the two additional conditions
imposed. One way to fix the spline is to pin down s'(xo) and s'(xn). For example, the
natural spline imposes s'(xo) = 0 = s'(xn). Hermite splines give s'(xo) and s'(x~)
values f'(xo) and f ' ( x ~ ) when these are known.
In general, degree k splines with data at n nodes will yield O ( n - ( k + l ) ) convergence
for f c C k+1 [a, b]. Splines are excellent for approximations for two general reasons.
First, evaluation is cheap since splines are locally cubic. To evaluate a spline at x
you must first find which interval [xi, Xi+l] contains x, then find the coefficients for
the particular cubic polynomial used over [xi, xi+1], and evaluate that cubic at x.
The second reason for using splines is that good fits are possible even for functions
which are not C °~ or have regions of large higher-order derivatives, situations where
orthogonal polynomials do not do as well since global approximation schemes have
difficulties in dealing with small regions of high curvature. On the other hand, if a
function is well-behaved, orthogonal polynomials will generally do better.

7. 7. Shape-preserving interpolation

Above we have focused on the pointwise convergence properties of various approxi-

mation schemes. Sometimes we will want to both interpolate data and preserve some
shape in the data. For example, if the interpolation data indicates an increasing func-
tion, we may want to compute an approximation which is increasing everywhere, not
only node-to-node but also between the interpolation nodes. Even though a scheme
which converges pointwise will asymptotically preserve shape, these methods are not
satisfactory since we will want to preserve the shape when we have a small amount of
data, not just when we have large amounts of data. It is on this dimension where the
difference between orthogonal polynomials and piecewise polynomial approximations
are important since orthogonal polynomials will not generally preserve shape.
Schumaker [117] presents a particularly simple way to construct shape-preserving
quadratic splines. Suppose we want to find a function s E C 1[tl, t2] such that

s(t ) - - s,, i = 1,2,

and, furthermore, suppose zl(z2, sl)s2, implying that the data are consistent with
a concave function. The task is to find an interpolating function s which is also
concave. Schumaker accomplishes this by adding one interpolation node ~i E [~i, ~i+1]
and constructs quadratic functions over [/.i, ~i] and [~i, ~+l] which together make a
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 557

concave C 1 function s on [t~, t~+l]. In general, if the data on [t~, t~+l] are consistent
with monotonicity, concavity, convexity, or nonnegativity, then one can construct a
piecewise quadratic function which is monotone, concave, convex, or nonnegative on
[ti,t~+l]. The nontrivial fact here is such a {i exists for any interpolation data. By
piecing together these functions over subintervals, we can preserve shape globally. If
one does not have stope information, one need only to choose the slope parameters
so as to be consistent with the shape of the data. Schumaker also shows how to make
judicious estimates of the slopes.
There are many papers on this topic; see Judd [74] for several references.

7.8. Multidimensional approximation

Most economic problems involve several dimensions - physical and human capital,
capital stocks of competitors, wealth distribution, etc. When we attempt to approx-
imate functions of several variables, many difficulties present themselves. We will
discuss multidimensional interpolation and approximation methods, first by generaliz-
ing the one-dimensional methods via product formulations, and then by constructing
inherently multidimensional schemes.

7.8.1. Tensor product bases

Tensor product methods build multidimensional basis functions up from simple one-
dimensional basis functions. If {qoi(x)}~__1 is a basis for functions of one real variable,
then the set of pairwise products, {~i(x)~j(y)}~,°°j= 1 is the tensor product basis for
functions of two variables. To handle n dimensional problems in general, one can
take all the n-wise products to create the n-fold tensor product of a one-dimensional
basis. The tensor approach can extend orthogonal polynomials and spline approxima-
tion methods to several dimensions. One advantage of the tensor product approach
is that if the one-dimensional basis is orthogonal in a norm, the tensor product is
orthogonal in the product norm. The disadvantage is that the number of elements
increases exponentially in the dimension.

7.8.2. Complete polynomials

There are many ways to form multidimensional bases and avoid the "curse of di-
mensionality". One way is to use complete polynomial bases, which grow only poly-
nomially as the dimension increases. To motivate the complete polynomials, recall
Taylor's theorem for R n in Theorem 1 above. Notice the terms used in the kth de-
gree Taylor series expansion. For k = 1, Taylor's theorem uses the linear functions
~f)l ~ {1,Xt,X2,... ,Xn}. For h = 2, Taylor's theorem uses

7")2 ~ J~l (J {x21~ . . • ~ X 2n ~ X ~l X 2 ~ X l X 3 ~ . . . Xn_lXn}.

558 K.L. Judd

792 contains some cross-product terms, but not all; for example, XlX2X 3 is not in 5o2.
In general, the kth degree expansion uses functions in

{ I g=l
The set 79k is the complete set of polynomials of total degree k.
Complete sets of polynomials are often superior to tensor products for multivariate
approximation. The n-fold tensor product of { 1, x , . . . , x k } contains ( k + 1)n elements,
far more than 79k. For example, 7)2 contains 1 + n + n(n + 1)/2 elements compa-
red to 3 n for the tensor product. Taylor's theorem tells us that many of the tensor
product elements add little to the approximation, saying that the elements of 79k will
yield a kth order approximation near x °, and but that the n-fold tensor product of
{1, x , . . . , x k} can do no better than kth order convergence since it does not contain
all degree k + 1 terms. This suggests that a complete family of polynomials will give
us nearly as good an approximation as the tensor product of the same order, but with
far fewer elements.

7.8.3. Finite element approaches

Finite element methods use bases whose elements have small support. One simple
example is bilinear interpolation. Suppose we have the values of f(x, y) at (x, y) =
(+ 1, + 1). Then, the following 4 functions form an interpolation basis:

= ¼(1 ~2(x,y)= ¼(l÷x)(1-y),

~4(x,y)= ¼(1-x)(l÷y).

The bilinear approximation to f on the square [ - 1 , 1] × [-1, 1], which is an example

of an element, is


The approximation is linear at each edge, but generally has a saddlepoint curvature
on the interior. To interpolate data on a two-dimensional lattice, we create the bilinear
approximation on each square.
Finite element methods consist of partitioning a domain into several elements, and
patching together the local approximations on the elements, but this is not easy. Since
we generally want the result to be a continuous function, care must be taken that
resulting approximation is continuous across element boundaries. With bilinear inter-
polation, this will hold since any two approximations overlap only at the edges of
rectangles, and on those edges the approximation is the linear interpolant between
Ch. 12: Approximation, Perturbation, and Prajection Methods in Economic Analysis 559

the common vertices. If we know that we are approximating a smooth function, then
the kinks at the edges of the elements may make bilinear approximation unappealing.
Assuring smoothness at element boundaries is an increasingly difficult problem as
we increase the desired degree of differentiability and the dimension. The bilinear
finite element scheme is just the simplest of a large number of finite element ap-
proximation schemes. There is a large literature on finite element approximations of
multidimensional functions (see Burnett [22]).

7.8.4. Neural networks

The previous approximation procedures are based on linear combinations of polyno-

mial and trigonometric functions. Neural networks provide us with an alternative and
inherently nonlinear functional form for approximation. A single-layer neural network
is a function of the form

F(x;/3) -2 h ~9 xi (47)

where x E R n is the vector of inputs and h and g are scalar functions. A common
form chooses 9(x) = x, reducing (47) to the form h(t3Tx). A single hidden-layer
feedforward network is the form

F(x;/3,7) ~ 7jl~, g xi • (48)

\ i=1 ~ /

Note the simplicity of the functional forms; this simplicity makes neural network
approximations easy to evaluate.
The data for a neural network consists of (yj, xJ) pairs such that yj C R is supposed
to be the output of a neural network if xJ E R n is the input. This requirement imposes
conditions on the parameters/3 in (47) and/3 and 3/in (48). One fits single-layer neural
networks by finding fl to solve

rain E ( y j - F(xJ;fl)) 2

and the objective of a single hidden-layer feedforward network is to solve

min E ( y j - F ( x J ; p , 3')) 2,

which are just instances of nonlinear least squares fitting.

560 K.L. Judd

The approximating power of neural network approximation is indicated by theorems

of Horni, Stinchcombe and White (see White [122] for a wide-ranging discussion of
neural networks and their properties). Let G be a continuous function, G : R -+ R,
such that f~-~ooG ( x ) d x is finite and nonzero and G is L p for 1 ~< p < ec. Let

+ b j), bj,/3j C R,
L j=l

w j C R n, w j ¢ 0 , m= 1,2,...}

be the set of all possible single-hidden layer feedforward neural networks using G as
the hidden layer activation function.

THEOREM 11. Let f : R ~ -+ R be continuous. Then f o r all e > O probability measure

#, and compact sets K C R ~, there is a 9 E z , n ( G ) such that

sup If(z) - g ( z ) l <



L If(x) - g(x)[ d/z < e.

This also holds when G is a squashing functions, i.e., G : R --+ [0, 1], G is nonde-
creasing, lim:~-+oo G ( x ) = 1, and limx~-oo G ( x ) = 0.14

These are universal approximation results which justify the use of neural network
approximation, and help explain its success. There is some evidence that neural net-
work approximation methods may be particularly efficient at, multidimensional ap-
proximation in the sense of needing relatively few free parameters; see Barton [6] for
a recent result. The theoretical development of neural networks is proceeding, but is
inherently difficult because of the nonlinearity of this approach.

8. Applications of approximation to dynamic programming

Approximation methods are a key part of most numerical procedures. They are par-
ticularly important in discrete-time dynamic programming problems. These problems

14Notethat a squashing function is a cumulativedistribution function and vice-versa.A coimnonchoice

for G is the sigmoid function, G(x) = 1/(1 + e-X).
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 561

are among the most useful and basic of dynamic economic problems, with well-
understood theoretical properties. We will briefly discuss them and the approximation
aspects of numerical dynamic programming.
Let 7r(u, x)be profit flow if the state is x and the control is u. Suppose the law of
motion is

z~+l =g(xt, ~t).

Then the value function, V(x), solves

V(x) = maxTr(u, x) + flV(g(x, u) ) - (TV)(x). (49)

The standard theoretical procedure is to iterate on the basic functional equation,

(49). If we could handle arbitrary functions, we would start with a guess, V0, and then
compute the sequence {Vn} generated by

v~ - T E n - 1 . (50)

This procedure converges when viewed as a mapping in the space of value functions.
On the computer, however, one cannot store arbitrary functions. There are several
details which need to be decided to compute approximations to (50). Since we cannot
deal directly with the space of continuous functions, we focus on a finite-dimensional
subspace. We will approximate V(x) as a finite linear sum of basis functions.


Numerical procedures construct a V(x) which approximately satisfies the Bellman

equation, (49). More specifically, the objective is to find a vector, ~ E R N, such that
V solves (49) as closely as possible.
The basic task is to replace T, an operator mapping continuous functions to con-
tinuous functions, with a finite-dimensional approximation, 2r, which maps functions
of the form in (51) to functions of the same form. We construct 2~ in two steps. First,
we choose a finite collection, X, of points x, and evaluate (T~')(z) at x C X. We
will refer to this as the maximization step since it is the maximization problem in (49)
at x. The resulting values are points on the function T V . Since we are approximating
a continuous value function, we use that information to choose a value function of
form (51) which "best" summarizes the information generatedconcerning T V . This
is the critical approxima~on step, and we denote the result TV. In essence, T takes
a function of form (51), V, and maps it to another function of the same form, and is
therefore a mapping in the space of the ~ coefficients, and the objective is to find a
562 K.L. Judd

fixed point for T in the space of coefficients. We can also view T as a mapping from
continuous functions to the finite-dimensional subspace representable as V ( x , ~), in
which case the problem is to find a fixed point of T in the space of functions of form
The details of the approximation aspects of this procedure - choosing a basis for
the expression of V, choosing points X to evaluate T V , and fitting the data - are
important. We next discuss some basic approaches.

8.1. Discretization methods

The simplest approximation procedure is to discretize the state space, that is, they re-
place the problem on a continuous state space with one with a finite number of points.
This has the advantage of reducing the problem to one of finite matrices. The other
advantage is that the resulting analysis exactly solves some similar economic problem.
See Rust [112] for a discussion of numerical dynamic programming procedures. Even
in the case of discrete-state dynamic programming, projection solution ideas come
into use. The key computation in the discrete-state approach is the solution of a large
linear system. This can be accomplished approximately by using the GMRES method
(see Saad and Schulz [113]) which essentially computes a few directions and finds
an approximate solution which is spanned by these directions and minimizes a loss
While the discretization method does not obviously fit the description above, it
is generally equivalent to approximating the value function with a step function. 15
However, step functions are highly inefficient ways to approximate a smooth value
function. Because of this, the discretized state space method is unlikely to be of much
value in economic analysis outside of naturally discrete problems, one-dimensional
problems, or problems where the solutions are so nonsmooth that discretization is
competitive with smooth approximation schemes. The impracticality of discretization
is indicated by the fact that supercomputers are often used. Multidimensional problems
are practically impossible, even for supercomputers, since the "curse of dimensionali-
ty" is particularly vexing for this method; if N points are used for a one-dimensional
problem, then N a points will be used for a d -dimensional problem. There are several
ingenious methods for making discrete state problems more efficient; see Rust [112]
for a description of these algorithms. We will focus on the alternatives presented by
the application of approximation ideas.

15After computing the solution to (50), many users then use linear interpolation to estimate the value
function at points not part of the discretized state space. Since this linear interpolation is done only after
the value iteration is completed, it does not affect these comments and it's contribution to improving the
algorithm's accuracy is limited.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 563

8.2. Multilinear approximation

While the discretization approach has been popular in economics, other economists
and the operations research literature in general have moved instead to continuous
approximations of the value function. The simplest example of this is the DYGAM
package discussed in Dantzig et al. [41 ] which used multilinear interpolation on hyper-
cubes when computing Vn+l from the information generated by TV~. In economics,
Zeldes [130] used piecewise linear approximations.
This procedure has several advantages. Far fewer nodes are needed compared to
a discretization method since the continuity of V is being exploited. There are some
difficulties. First, the kinks make the optimization step more difficult, and are unrep-
resentative of V if V is C 2. Second, multilinear approximation generates curvature
properties which may cause multiple local optima in the optimization step. The prob-
lem is that the interpolation may not have the same shape as the data.

8.3. Polynomial approximations

If a little continuity is good, then more should be better if V is sufficiently contin-

uous. In this spirit, Bellman et al. [7] proposed the use of polynomials, Daniel [38]
discussed the use of splines, and Johnson et al. [64] report computing experience
with a variety of approximation schemes. Judd [74] presents an example of using a
tensor-product basis of Chebyshev polynomials to solve a three-dimensional optimal
growth problem. The advantages of polynomial approximations are that fewer points
are evaluated and increased smoothness makes the optimization step more rapid. All
of the approximation methods discussed above are potentially useful for dynamic
There are, however, some problems which may arise with polynomial approxima-
tion which don't arise with discretizafion or multilinear approximation. The difficulty
is that many interpolation schemes do not preserve shape. Even if we use the best
possible interpolation scheme, the resulting approximation may not be good in be-
tween the nodes in X, and can lead to instabilities in the value iteration algorithm.
To deal with this, Judd and Solnick [78] proposes the use of shape-preserving poly-
nomials to construct value function approximations, and computes upper bounds on
the error which are superior, to those from the discretization approach. In particular,
this leads to convergence proofs for value function iteration with shape-preserving
approximation, an important fact in itself since there can be no such proof for value
iteration with polynomial approximation schemes in general.

9. Projection methods

Our discussion of dynamic programming indicated that approximation ideas may be

useful in solving the operator equations which arise in dynamic programming. We
564 K.L. dudd

next discuss how these ideas from approximation theory naturally lead to algorithms
for solving many of the operator equations which arise in economics. They are called
p r o j e c t i o n methods, also known as w e i g h t e d residual methods. We will describe the
general projection approach for solving general operator problems. In fact, most of
the techniques currently used by economists are also projection methods when viewed
from the general perspective.
The first important observation is that in many economic models, equilibrium can
be expressed as a collection of functions. In dynamic programming problems, that
unknown function will be either the value function and/or the optimal policy func-
tion. In dynamic games, the unknown functions are the agents' strategy functions. In
optimal growth models, the unknown function may be the optimal policy function. In
dynamic equilibrium models, the unknown functions would include functions which
indicate consumption demand, labor supply, asset trading strategies, and asset and
commodity prices, all as functions of the underlying state variables. For specificity,
consider the following simple deterministic growth problem:



where capital obeys the law of motion

kt+l = f ( k t ) - ct.

To calculate the optimal consumption policy (and competitive equilibrium consump-

tion function), h ( k ) , it is enough to focus on the Euler equation,

0 = u'(h(k)) - flu'(h(f(k) - h(k)))f'(f(k) - h ( h ) ) =_ (Af(h))(k). (52)

The basic idea of projection techniques is to first express equilibrium conditions on

these functions as a zero of an operator, .M : B1 --+ B2, where B 1 and/32 are function
spaces. In (52) above, the Euler equation error is defined to be that operator, where B1
and B2 are both equal to the space of continuous functions on [0, oc). In general, the
operator N" can be an ordinary differential equation, as in optimal control problems, a
partial differential equation, as in continuous-time dynamic programming, or a more
general functional equation, as in Euler equations expressing necessary conditions for
recursive equilibria (as formulated in Prescott and Mehra [26]). Of course, space and
time limitations make it impossible for computers to store and evaluate all possi-
ble elements of B1. To make the problem tractable, projection methods focus on a
finite-dimensional subspace of candidates in B1 which can be easily represented on a
computer and is likely to contain elements "close" to the true solution. The selection
of this finite-dimensional space naturally exploits approximation methods. It may be
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 565

difficult for the computer to evaluate N , in which case we find a computable oper-
ator, J~, which is "similar" to N'. Within the finite-dimensional space of candidate
solutions, we find an element which is "almost" a zero of ~ ' .
While the basic idea is natural, there are many details. The key details are specifying
the approximation method we will use, the finite-dimensional subspace within which
we look for an approximate solution, and the computer representation of N', defining
what "close" and "almost" mean, and finding the approximate solution. By studying
these details, we will see how to implement these ideas efficiently to solve numerically
interesting dynamic nonlinear economic problems.

9.1. General projection algorithm

We next describe the projection method in a general context. One begins with an
operator equation representation of the problem, that is, one reduces the economic
problem to finding an operator iV" and a function f such that equilibrium is represented
by the solution to

A/" ( f ) = 0

where f : D C R N --+ R M, N" : B1 -+ B2, and the Bi are function spaces. Typically
34* is a composition of algebraic operations, differential and integral operators, and
functional compositions, and is frequently nonlinear. We shall show how to implement
the canonical projection technique in a step-by-step fashion. We first give an overview
of the approach, then highlight the critical issues for each step, and discuss how the
steps interact.
The first step is to decide how to represent approximate solutions. One general way
is to assume that our approximation, f , is built up as a linear combination of simple
functions. We will also need a concept of when two functions are close. Therefore,
the first step is to choose a basis and an appropriate concept of distance:

Step 1. Choose bases, ~bj = {(~:~i}~?°l, and inner products, (., .)j, over Bj,
j = 1,2.
The basis over B1 should be flexible, capable of yielding a good approximation for
the solution, and the inner products should induce useful norms on the spaces.
Next, we decide how many basis elements to use and how to implement iV:
Step 2. Choose a degree of approximation r~ for f, a computable approximation N"
of iV", and a collection of n functions from B2, p~ : D --+ /~M, i = l , . . . , ~r~,.

The approximate solution will be f -- 2in.=1 aicpi(:c). The convention is that the
~i increase in "complexity" and "nonlinearity" as i increases, and that the first n
elements are used. The best choice of n cannot be determined a priori. Generally,
566 K.L Judd

the only "correct" choice is n = ec. Larger n should yield better approximations, but
one is most interested in the smallest n which yields an acceptable approximation.
One initially begins with small n and increases n until some diagnostic indicates that
little is gained by continuing. Similar issues arise in choosing .~. Sometimes we can
take . ~ = N', but more generally some approximation is necessary. The Pi are the
projection directions we will use to determine &
Step 1 lays down the topological structure of our approximation and Step 2 fixes
the flexibility of the approximation. Once we have made these basic decisions, we
begin our search for an approximate solution to the problem. Since the true solution
f satisfies N ' ( f ) = 0, we will choose as our approximation some f which makes
2~(f) "nearly" equal to the zero function. Since f is parameterized by 6, the problem
reduces to finding a coefficient vector d which makes . ~ ( f ) nearly zero. This search
for 6 is the focus of Steps 3-5.

Step 3. For a guess & compute the approximation, f _= ~i"-, ai~.i(x), and the
residual function,

R(x; -

The first guess of d should reflect some initial knowledge about the solution. After
the initial guess, further guesses are generated in Steps 4 and 5, where we see how we
use the inner product, (., ")2, defined in the space 132, to define what "near" means.

Step 4. For each guess of & compute the n projections,

- (R(.; i = l,...,,J.

Step 5. By making a series of guesses over ff and iterating over Steps 3 and 4, find
which sets the n projections equal to zero.
This general algorithm breaks the numerical problem into several distinct steps. It
points out the many distinct techniques of numerical analysis which are important.
First, in Steps 1 and 2 we choose the finite-dimensional space wherein we look for
approximate solutions, hoping that within this set there is something "close" to the
real solution. These steps require us to think seriously about approximation theory
methods. Second, Step 4 will involve numerical integration if we cannot explicitly
compute the integrals which define the projections. Third, Step 5 is a distinct numerical
problem, involving the solution of a nonlinear set of simultaneous equations or the
solution of a minimization problem. We shall now consider each of these numerical
problems in isolation.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 567
9.1.1. Choice of basis and inner product

There are many criteria which the basis and inner product should satisfy. The full
basis ~1 for the space of candidate solutions should be "rich"; in particular, it should
be complete in/3i. We will generally use inner products of the form

(f(x), g(x))~ =_/D f(x)g(x)w(x)dx

for some weighting function w(x) >>O.

Computational considerations also play a role in choosing a basis. The ~ should
be simple to compute. They should be similar in size to avoid scaling problems.
While asymptotic results such as the Chebyshev interpolation theorem may lull one
into accepting polynomial approximations, practical success requires a basis where
only a few elements will do the job. This requires that the basis elements should
"look something like" the solution. In particular, our discussion of approximation
methods above shows that we should use smooth functions to approximate smooth
functions, but use splines to approximate functions which may have kinks or other
extreme local behavior. We will also see that the use of orthogonal bases will enhance
efficiency and accuracy. Because of its special properties, a generally useful choice
is the Chebyshev polynomial family. If, on the other hand, one has a basis which is
known to efficiently approximate the solution, one should use that instead or combine
it with the Chebyshev polynomials. A good, problem-specific, choice of basis can
substantially improve algorithmic performance over the generic approximation meth-
ods discussed above. However, the generic approaches are usually acceptable if one
has no apparent problem-specific alternative.

9.1.2. Choice and evaluation of projection conditions

Projection techniques include a variety of special methods. Generally we use (-, ')2
to measure the "size" of the residual function, R(x; g). The general strategy is to find
an ~ which makes R(x; if) small. There are several ways to proceed.
First, we have the least-squares approach which chooses d so as to minimize the
"weighted sum of squared residuals":

rn~n</~(x; d), R(x; d)>2"

This replaces an infinite-dimensional operator equation with a nonlinear minimization

problem in R "~. The standard difficulties may arise; for example, there may be local
minima which are not global minima. The objective may be poorly conditioned.
However, there is no reason for these problems to arise more often here than in any
568 K.L. Judd

other context, such as maximum likelihood estimation, where optimization problems

are solved numerically.
While the least-squares method is a direct approach to making R(x; d) small, most
projection techniques find approximations by fixing n projections and choosing d
to make the projection of the residual function in each of those n directions zero.
Formally, these methods find ~7 such that (R(x; d), pi(x))2 = 0 for some specified
collection of functions, Pi. Different choices of the Pi defines different implementa-
tions of the projection method.
One such technique is the Galerkin method. In the Galerkin method we use the
first n elements of the basis for the projection directions. Therefore, g is chosen to
solve the equations:

_= (R(z; =o, i=

Notice that here we have reduced the problem of solving a functional equation to
solving a finite set of finite-dimensional nonlinear equations. In some cases in physics,
the Galerkin projection equations are the first-order conditions to some least-squares
minimization problem, in which case the Galerkin method is also called the Rayleigh-
Ritz method. This is not as likely to happen in economics problems because of the
inherent nonlinearities.
There are obviously many ways to implement the projection idea. A collocation
method takes n points from the domain D, { X"~}i=1, and chooses g to solve

R(xi; 6) =0, i= 1,...,n.

This is a special case of the projection approach since R(xi; 6) equals the projection
of R(x; d) against the Dirac delta function at x~, (R(x; if), 5(x - xi))2. Orthogonal
collocation chooses the collocation points in a special way. The chosen xi are the zeros
of the n ' t h basis element, where the basis elements are orthogonal with respect to
the inner product. The Chebyshev interpolation theorem suggests its power. Suppose
D = [ - 1 , 1] and we have found an d such that R(z~ ; 6) = O, i = 1, ..., n, where
the z~~ are the n zeros of Tn. As long as R(x; 6) is smooth in x, the Chebyshev
interpolation theorem says that these zero conditions force R(x; 6) to be close to zero
for all x, and that these are the best possible points to use if we are to force R(x; 6) to
be close to zero. It is not certain that even orthogonal collocation is a reliable method;
fortunately, its performance turns out to be surprisingly good.
Choosing the projection conditions is a critical decision since the major computa-
tional task is the computation of those projections. The collocation method is fastest
in this regard since it only uses the value of _R at n points. More generally, the pro-
jections will involve integration. In some cases one may be able to explicitly perform
the integration. This is generally possible for linear problems, and possible for special
nonlinear problems. However, our experience is that this will generally be impossible
Ch. 12: Approximation, Perturbation, and Projection Methods in EconomicAnalysis 569

for nonlinear economic problems. We instead need to use numerical quadrature tech-
niques to compute the integrals associated with evaluating (., .). A typical quadrature
formula approximates f : f ( x ) w ( x ) d x with a finite sum 2 i n 1 w i f ( z i ) where the
xi are the quadrature nodes and the wi are the weights. Since these formulas also
evaluate R(x; d) at just a finite number of points, xi, quadrature-based projection
techniques are essentially weighted collocation methods. The advantage of quadra-
ture formulas over collocation is that information at more points is used to compute
the approximation, hopefully yielding a more accurate approximation of the projec-

9.1.3. Finding the solution

Step 5, which determines 6 by solving the projection conditions computed in Step 4,

uses either a minimization algorithm (in the least-squares approach) or a nonlinear
equation solver to solve the system P(6) = 0. Many alternatives exist, including
successive approximation, Newton's method, and homotopy methods, all of which
have been used in the economics applications of the projection method.

10. Applications of projection methods to rational expectations models

Most methods used in numerical analysis of economic models fall within the general
description above. We will see this below when we compare how various methods
attack growth problems. The key fact is that the methods differ in their choices of
basis, fitting criterion, and quadrature techniques. With the general method laid out,
we will now report on a particularly important application to show its usefulness.

]O.l. Discrete-time deterministic optimal growth

We examine optimal growth problems in discrete time and show how projection
techniques can be adapted to calculate solutions. The stochastic case is one which has
been studied by many others with various numerical techniques. In fact, one point
we make below is that most of these procedures are really projection methods. By
recognizing the common projection approach underlying these procedures, we can
better understand their differences, particularly in accuracy and speed. We conjecture
that the comparative performances of these various implementations of projection
ideas in the discrete-time stochastic optimal growth problem is indicative of their
relative value in other future problems.
We first examine the deterministic growth problem described above which is char-
acterized in (52). We shall now describe the details of a projection approach to that
problem. The domain D of our approximation will be [km, kM]. km and kM are
chosen so that the solution will have k confined to [k~, kM]. In particular, [k~, kM]
570 K.L. Judd

must contain the steady state, a point which we can determine before calculations
begin. Our approximation to h is parametrically given by


where n is the number of terms used. Common choices include the Chebyshev poly-
nomials ¢~(k) ~ T~-i (2(k - km)/(kM - kin) - 1), the tent functions, or the ordinary
In this problem, N" is a simpleoperator using only arithmetic operations and com-
position. Therefore, we can take N" = N'. Since h is continuous, we define .Af to have
domain and range in C°[k,~, kM]. Hence, B1 = B2 = C°[km, kM], the continuity of
N" in the L ~ norm following from the u, f , and/z being C 1 in all their arguments.
Given the Euler equation (52), the residual function becomes
R ( k ;6) = u' (h(k ; 6)) - flu' (h(f(k) - h(k ; 6); 6 ) ) f ' ( f ( k ) - h(k ; 6))

To compute & we can do one of several things. First, we consider orthogonal
collocation. We choose n values of k, denoted by ki, i = 1 , . . . , n. We then choose
6 so that R(ki; 6) = 0 for each i. Orthogonal collocation chooses the ki to be the n
zeros of ~b. The Chebyshev interpolation theorem strongly argues for using Chebyshev
polynomials in this case. If R(ki; 6) = 0 for each ki, then we would like to conclude
that R(ki; 6) is the zero function on the domain D. The Chebyshev interpolation
theorem says that this is most justified if the ki were the Chebyshev zeros, and that
if we use Chebyshev zeros, R(k; ~) will likely be nearly zero.
We could also implement the Galerkin method. If we use Chebyshev polynomials
as a basis, then we use projections with the inner product

( f ( k ) , g(k)) =- f ( k ) g ( k ) w ( k ) dk


1/2) 2 __±

With this choice of inner product, the basis is orthogonal. The Galerkin method
computes the n projections

P i ( a ) = fk kM R ( k ; 6 ) ¢ i ( k ) w ( k ) d k , i= l,...,n,
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 571

and chooses d so that P ( d ) = O. Here the difficulty is that each P~(d) is an integral
which needs to be computed numerically. The form of w ( k ) implies the use of Gauss-
Chebyshev quadrature. That is, we approximate Pi (~) = 0 conditions with

o = R(kj;

for some ra > n, with the kj being the m zeros of ~b~+l.

When we have calculated our estimate of ~, we would like to check if this procedure
yields reliable approximations. Several diagnostics can be used to see if the proposed
solution is acceptable. First, the ak coefficients decline rapidly in k, as predicted by
the Chebyshev approximation theorem. Second, the low-order coefficients should be
insensitive to the choice of n. While these facts do not prove that the approximation
is good, we would be uncomfortable if the high-order coefficients were not small, or
if the coefficient estimates were not stable as we increase n. We also want to examine
test cases to see if the results from the projection method agree with the answer from
another method known to be accurate. Judd [72] performs these tests on a variety of
empirically interesting cases, finding that the projection method applied to this model
is very accurate and very fast.
Table 12.2 (which is taken from Judd [72]) indicates the kind of accuracy which
can be achieved. We assume that f ( k ) was Cobb-Douglas with capital share of
0.25, and that the steady state capital stock is k = 1. We first solved the problem
with an 800,000 point discretization over the range [0.5, 1.3]. We then used the
projection method to solve the problem. The entry under PROD indicates the output
at k, and CONS indicates the optimal consumption as computed by a 800,000 point
discretization method. The entries under n = 9, 6, 4, 2 columns indicate the error of
the degree n polynomial approximation. The notation a ( - m ) means a x 10 - ~ . The
results indicate that even a low-order approximation does quite well.
The tent function approach was used in Bizer and Judd [11] in a similar model.
There the interpolation nodes were chosen to be uniformly distributed in D. The
advantage of the piecewise linear approximation is that the resulting interpolation is

Table 12.2
Errors in consumption policy function
k PROD CONS n =9 n= 6 n = 4 n = 2
0.50 0.1253211 0.1147611 3(-7) 3(--7) 0.01 -1(-4)
0.70 0.1401954 0.1335954 -3(-7) -3(-7) -1(-6) 1(--4)
0.80 0.1465765 0.1421165 -2(-7) -1(--7) --5(-6) 2(-4)
0.90 0.1524457 0.1501957 4(-7) 0.04 -5(-6) 2(-4)
1.00 0.1578947 0.1578947 0 -0.01 -3(-6) 2(-4)
1.10 0.1629916 0.1652816 --2(-7) -2(-7) 2(-6) 9(-5)
1.30 0.1723252 0.1792852 2(-7) 2(-7) 4(-6) --1(--4)
572 K.L. Judd

shape-preserving. This may be useful since we know that h is monotone increas-

ing. However, the shape-preservation considerations turn out not to be important
relative to the differentiability considerations which argue for Chebyshev polynomi-
als. The policy functions computed in Judd [72] using Chebyshev polynomials were
monotone increasing, and using tent functions substantially reduced the algorithm's

10.2. Stochastic optimal growth

We next turn to a stochastic optimal growth model. This example will show us how
to handle multidimensional problems and the conditional expectations which arise
in stochastic dynamic problems. We will also be able to describe the parameterized
expectations method of solving rational expectations models.
More specifically, we examine the problem

k t + l = O t f ( ] c t ) -- et,

ln0t+l = plnOt + :t+~ (53)

where Ot is a stationary AR(1) multiplicative productivity parameter. We will assume

that the productivity shocks et "- N(0, o-2) are independent. In this problem, both the
beginning-of-period capital stock and the current value of 0 are needed for a sufficient
description of the state. Hence, consumption is a function of both k and 0, h(h, 0),
and the Euler equation is

u/(h(k,O)) = /3E{u/(h(Of(h)-h(k,O),O) )Of'(Of(k)-h(k,O)) I O}. (54)

At this point, we will rewrite the Euler equation to make it more linear. We know
that projection algorithms work well for linear problems. Perhaps our algorithm will
do better if we make it more like a linear problem. To that end, rewrite (54) as

0 = h(k, 0)

Note that (55) has two terms, one linear in h(k, 0), and the other is similar to a CRTS
function of next period's potential consumption values. Similar stochastic growth
problems were investigated in Judd [72] and in the Taylor-Uhlig [120] symposium.
We shall now describe and compare the various methods.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 573

The procedure for finding h is similar to the deterministic case. First of all, we need
to approximate the policy function. Judd [72] and Coleman [36] use approximations
of the form

nk 7~,0

h(k, O; a) = ~ ~ aij¢ij(k O)
i=l j=l

where the ~/Jij functions are Chebyshev functions of k and 0 in Judd, and tent func-
tions of Ink and log 0 in Coleman [36]. Judd also considered complete polynomials.
Comparisons followed the considerations outlined above. Since the policy function
is smooth, the smooth approximation procedures did better with the complete poly-
nomial approach doing best, that is, the greatest accuracy per unit of computer time.
Coleman's choice of a finite element approach reduced efficiency since it used far
more basis elements; furthermore, it cannot switch to a complete polynomial ap-
proach. These differences between the spectral approach advocated in Judd and the
finite element approach used in Coleman become even larger as we move to higher
In their approach to the stochastic growth model, Den Haan and Marcet [57] pa-
rameterized the policy function to be

h(k, O) = (k62053e 61)l/'y = (exp{Sl + 52 in k + 53 In 0}) I/'Y (56)

that is, they assume that log consumption is a linear function of In k and log 0.
However, this basis is not orthogonal. When they tried to improve the approximation
to a quadratic form in In k and log0, the lack of orthogonality lead to difficulties
which prevented them from improving on the linear approximation. They argue that
the collinearity of their basis elements is "a fortunate situation" and justifies their focus
on low-order polynomial approximations. In contrast, the use of orthogonal bases in
Judd and the use of a finite element approach in Coleman leads to no difficulties in
finding substantially better approximations beyond low-order polynomials.
The comparisons of the Coleman, Den Haan and Marcet, and Judd approaches to
solving (54) illustrates the importance of approximation ideas. Den Haan and Marcet,
and Judd use polynomials to approximate what is presumed to be a smooth function.
Coleman's contrasting use of finite elements introduces kinks in the approximation
which forces him to use many elements. The finite element approach and the or-
thogonal polynomial approach avoids the multicollinearity problems which limited
Den Haan and Marcet to low-order approximations. As these papers discuss, these
differences lead to considerable differences in computational speed and accuracy in
the final result.
574 K.L. Judd

10.3. Problems with inequality constraints

The optimal growth problem described above was simple in that the equilibrium was
described in terms of an Euler equation which always had an interior solution. In some
problems, constraints mean that the first-order conditions must be complemented with
complementary slackness conditions. This was the nature of the problems which were
the first to lead to numerical solutions of nonlinear rational expectations equilibria.
Gustafson [56] investigated the problem of equilibrium storage of a storable commod-
ity. He assumed that output in period t is an exogenous random variable, Yt, which is
divided between a change in storage, St+l - S t (cot is the beginning-of-period-t stock),
and consumption, ct. In equilibrium, price is a function of total stock, p(St + zt), and
obeys the conditions

p(s~ + x~) - E{p(S~+l + x~+~)}/> 0,

(p(St + xt) -- E{p(St+I -t- xt+l) })St+l = 0

where St+l = St + xt - D(p(St + xt)) and D(p) is the demand function.

In some states of the world, the equilibrium storage level will be zero and the price
function will not be a smooth. Gustafson [56] used a piecewise linear approximation
of the equilibrium price function in his solution method. Piecewise linear approxi-
mations are relatively inefficient because many pieces are necessary to get a good
approximation. In their analysis of the problem (which also generalized Gustafson to
handle endogenous output) Williams and Wright solved for E{pt+l [ St+l } as a func-
tion of St+l, expressing this year's expectation of next year's price conditional on the
amount stored for next year. This function determines the current price, future supply,
and current stockpiling through an Euler equation similar to (57), but is smooth be-
cause it is a conditional expectation. Hence, they found that a low-order polynomial
approximation was sufficient to solve the problem. Miranda and Helmburger [98] also
used this insight in their analysis of stockpiling. Christiano and Fisher [32] applied
the Wright-Williams technique for handling the inequality constraint to a constrained
version of (53) and found similar advantages to using a smooth approximation.
This discussion points out two facts when dealing with inequality constraints. First,
we can still use the same approximation ideas but we may have to adapt to handle
the kinks which may arise. Second, skillful construction of the problem may result
in finding a smooth function which characterizes equilibrium and allows us to use
the more efficient smooth approximation schemes. Again, approximation ideas can be
exploited to produce superior methods.

10.4. Dynamic games

Methods which are useful for dynamic programming are also naturally natural for
computing closed-loop (also known as Markov) equilibria of dynamic games. This
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 575

holds since each player solves a dynamic programming problem, and equilibrium
can be expressed as a coupled collection of Bellman equations for each player's
dynamic programming problem. Kotlikoff, Shoven and Spivak [80] used ordinary
polynomials in their study of strategic saving and bequests. Miranda and Rui [99]
computed closed-loop equilibria for dynamic stockpiling games among commodity
producing countries. They used Chebyshev polynomial approximations to players'
value functions and projection methods to determine equilibrium value functions. In
both cases, equilibrium was computed with relative ease.

10.5. Continuous time problems

The examples above have been of discrete-time systems. Projection methods have
also been used to solve continuous-time models. One simple example is the canonical
continuous-time optimal growth problem described above in (21), which reduced to
solving the differential equation:

o : C'(k) (f(k) - C(k)) u " ( C ( k ) ) (p - f ' ( k ) ) - E(k; C).

Judd [71] used a basis of Chebyshev polynomials to approximate C(k) with C(k, a) =
En-i=0 ai on a large interval of capital stocks. Again, the performance of the
algorithm was very good, independent of the details of the implementation. In fact,
it easily outperformed the more commonly used shooting approach to the problem.
Judd also extended this model to allow for taxation and uncertainty in continuous time.
In all cases, accurate results were obtained quickly. Since projection methods were
initially developed to deal with continuous-time systems represented by ordinary and
partial differential equations, this is not surprising. One suspects that continuous-time
systems in general will be readily computed with projection methods.

10.6. Models with asymmetric information

Many of the examples discussed above reduced to applying the projection method
to standard mathematical problems - ordinary and partial differential equations and
integral equations. To demonstrate the flexibility of the projection method, we next
examine a very different kind of problem - equilibrium where individual agents have
different information. These problems do not reduce to any of the standard operator
problems discussed in applied mathematical literature. However, one can attack them
successfully with the projection method. We will first describe an application to asset
markets with asymmetric information. We will then discuss other economic problems
where these methods have potential.
576 K.L. Judd

10.6.1. Information and asset markets'

Asset market equilibrium with imperfect information have been rigorously studied in
recent years. Grossman [54] and Grossman and Stiglitz [55] began a long literature
on the partial equilibrium analysis of security markets with asymmetric information.
However, much of this literature makes very special and simple assumptions about
the distribution of returns, the information asymmetries, investor tastes, and asset
structure. The restrictions substantially limit the generality of the results and the range
of questions which can be addressed.
Recently, Judd and Bernardo [76] applied projection methods to analyze these
models without special functional form assumptions. A simple one-period investment
problem illustrates the method. Suppose each investor invests in two assets. The safe
asset pays out R dollars per dollar invested, and the basic risky asset (we will call
it stock) pays out Z dollars per share. If an investor begins the first period with W
dollars in cash and coo shares of stock, and ends the first period with co shares of stock
which trade at a price of p dollars per share, his second, and final, period consumption
will be

= (w - (co - coo)p) R + c o 2 . (58)

The first-order condition for the choice of co will be

0 : E { u ' ( ~ ) ( 2 - pR) 1 I} (59)

where I is the investor's information set.

10.6.2. Computing conditional expectations

The conditional expectation in (59) implies that our equilibrium concept involves
a conditional expectation. Numerical implementation of the conditional expectation
conditions is the most challenging aspect of this problem. To solve this problem, Judd
and Bernardo used the following definition of conditional expectation:

Z(X) = E { Y I X }

if and only if

E{(Z(X) - Y)f(X)} : 0

for all bounded measurable functions, f ( X ) , of X. Intuitively, this says that the predic-
tion error of the conditional expectation, E { Y [ X}, is uncorrelated with any measur-
able function of the conditioning information, X. This definition replaces the condi-
tional expectation with an infinite number of unconditional expectation conditions.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 577

10.6.3. Computing an asymmetric inJbrmation rational expectations equilibrium

We now show how to compute an equilibrium. Assume three types of investors, with
type i investors observing Yi. The state of the market includes all private signals,
g = (Yl, Yz, Y3), but each investor sees only the market-clearing price and his own
information. Therefore, a rational expectations equilibrium includes a price function
p(y) and type-specific demand policy functions, Oi(p, Yi) for i = 1,2, 3, such that
given p(y), O~ solves (59) for i = 1,2,3, and ~ = 1 0 . i ( y i , p ( y ) ) : 1 for all states y,
where total supply is 1.
In their solution, Judd and Bernardo [76] approximate the price law, P(yl, Y2, Y3),
and the demand rules, Oi(p(y), Yi), with multivariate polynomials. To determine the
unknown coefficients in those polynomials, they impose projection conditions on the
investors' first-order conditions. The first-order-condition for a type i investor

Ey,z{u'(Si)(Z-pR) lyi,p } = 0 , i = 1,2,3. (60)

Using the definition of conditional expectation given above they impose projection
conditions of the form

Ey,z - p(y)R)p(v/ } : o, (61)

for various choices of j, k >~ 0. The condition in (61) states that the prOduct of the
excess return and the marginal utility of consumption for a type i agent is uncorrelated
with polynomials in p(y) and yi.
After imposing a sufficient number of such conditions, the result is a system of
projection conditions constituting a finite nonlinear system of algebraic equations. This
reduces an infinite dimensional functional problem to a finite-dimensional algebraic
problem. The projection conditions given in Eq. (61) are only part of the conditional
expectation condition given in Eq. (60). The hope is that a small number of projections
can yield a useful approximation. Judd and Bernardo document the accuracy for this
approximation method for a variety of distributions. Overall, their experience is that
this method is reliable and reasonably fast.

10.7. Convergence properties and accuracy of projection methods

When using numerical procedures, it is desirable to know something concerning its
errors. An important focus of theoretical numerical analysis is the derivation of bounds
on errors. Two kinds of error results are desirable. First, it is desirable to derive an
upper bound on the error for a given level of approximation. Second, if such upper
bounds are not possible, it may still be valuable to know that the error goes to zero
asymptotically, that is, as one lets the degree of approximation become arbitrarily
578 K.L. Judd

large. The first kind of error information is rarely available. More typical in numer-
ical algorithms for differential equations are asymptotic results. There has been little
work on proving that the algorithms used by economists are asymptotically valid.
Fortunately, there are general theorems concerning the consistency of the Galerkin
method. Zeidler [128] and Krasnosel'skii and Zabreiko [81] demonstrate consistency
under a variety of conditions. Even though it remains to be seen whether these the-
orems cover our problems, they do indicate that projection methods are potentially
valid for our economic problems.
Even if one had a convergence theorem for a method, it is clear that one cannot just
blindly accept any answer one gets from a computation. Asymptotic theorems have
a nasty feature of telling you only that the error goes to zero as your computational
effort approaches infinity, but generally not telling you at what finite level of effort you
may stop. Therefore, a more pragmatic approach is to ignore convergence theorems
and instead use diagnostics to ask whether a solution is acceptable. We actually did
that above in our construction of (26). There the issue was how well a perturbation
expansion solved a continuous-time Euler equation. We constructed the approximation,
substituted it in the Euler equation, and used the result, (26) to measure the amount
of irrationality an economic agent is guilty of in equilibrium if each agent followed
our approximate rule. If that number is small, say a dollar per million spent, then we
argue that the approximate rule is as reasonable a prediction for behavior as the "true"
equilibrium since people generally do not optimize beyond one part in a million.
In economic problems we can generally compute such diagnostics and measure the
level of implied "irrationality". This diagnostic approach to evaluating a candidate
solution can be applied independent of the computational method which produced
the candidate solution. It does not rely on convergence; in fact, even if one uses a
convergent method, one should still use such diagnostics to make sure that one did
not stop the method too early. Furthermore, if one uses a method for which there is
no convergence theorem, but it produces a solution which passes such diagnostics,
the lack of a convergence theorem is irrelevant.

11. Hybrid perturbation-projection method

We have discussed both perturbation and projection methods for solving economic
models. While they are different approaches to approximation problems, we will next
describe a method, the hybrid perturbation-projection method which synergistically
exploits their differences and similarities.
Suppose that there are a continuum of problems to be solved indexed by a parameter
e with the form

A/'(f(z, e); e) = O.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 579

Suppose that we can solve the c = 0 instance. The result of applying perturbation
methods near the e = 0 solution is the calculation of a series of the form

f(x, ~ (62)

where the ~Pi(x) functions are computed by the perturbation calculations and the 6i (e)
are the generally prespecified gauge functions. Similarly, the result of a projection
approach is an approximation of the form

f(x,e)~-~ai(e)Wi(x) (63)

where the ~i(x) functions are the prespecified basis elements of the approximation
system and the ai(e) coefficients are computed by the projection method. The strength
of perturbation methods is that the approximations are quite good (in fact, asymp-
totically optimal) for small e, but the weakness is that the quality may not hold up
as e increases. The projection approach tries to be good for any e, but the difficulty
is finding good bases which will allow the series in (63) to be short. Therefore, the
strengths and weaknesses of these methods are complementary.
This observation turns out to be substantive. The idea of the hybrid perturbation-
projection method is to use the ~i(x) functions from perturbation calculations as
the basis functions to be used in a projection method. We know that these functions
constitute an optimal basis for small e, and that the optimal weight on ~i(x) is
3~(e) for small e. The conjecture is that the ~i functions still form a good basis for
approximating f(x, e) but that the weight on ~p~ should not be the prespecified 5i(e)
but rather should be computed by (63).
Our continuous-time growth model gives a simple example of this approach. Recall
the continuum of problems represented (27) and the related expansion (28). The
objective there was to use a perturbation method to solving (21). We will use the
results of the perturbation approach to develop a projection approach to solving (21).
The first perturbation was the function

c (k) - k"p( -' - 7) + (Tp - p)k. (64)

Note that this function has a singularity at k = 0, a feature which is possibly also true
of the solution to C(k, 1). This feature is absent in the orthogonal bases we discussed
above. We see here already that this procedure has produced a basis element which
has some advantages. To see if this is a good basis element, one can compare the
basis {1, k, C~ (k)} with the basis {1, k, k2}. Computations show that the custom-made
basis lead to solutions which had much smaller errors.
580 K.L. Judd

Note what (64) really suggests. Since the function k is already in the basis, C~ (k)
essentially adds the production function, k s, to the basis. This reflects the general idea
that the basis should be augmented by functions which are natural to the problem.
Above we used differentiability properties to motivate basis elements. The hybrid
approach attacks more precisely the problem of developing problem-appropriate bases.
Continuing the perturbation approach will generate a series of functions which
can be used as a basis for a projection approach. For example, C~(k) is a compli-
cated function which essentially adds k 2c~-1 to the basis {1, k, C~(k)}. This addi-
tional element is not as intuitive as C, (k), showing that the perturbation method will
bring in elements other than obvious ones. Again, computations show that the basis
{1, k, Ce(k), C,~(k)} does much better than the basis {1, k, k 2, k 3} in solving
These additional basis elements will possibly be collinear with previous elements.
However, for any specified inner product, we can use a standard Gram-Schmidt
procedure to construct a basis which spans the same space and is orthogonal. In
this way, we can combine the conditioning advantages of orthogonal bases with the
desirable shape properties of the perturbation functions.
Judd [73] discusses further the usefulness of this approach to producing bases.
The example above just hints at the method's potential. In this example, reducing the
number of basis elements is not important since the basis size is not a limiting factor in
one-dimensional problems with smooth well-behaved solutions. However, basis size
is a very important consideration in multidimensional problems. In those problems,
a few well-chosen basis elements may allow for drastic reduction in basis size. One
suspects that the hybrid perturbation-projection method has substantial potential in
multidimensional problems where economizing on the basis size is important.
The hybrid perturbation-projection method also points out the value of combining
methods. Since economics problems do not fit into standard mathematical classifica-
tions, it is likely that skillful combinations of various techniques will prove to be a
powerful technique.

12. Conclusions

In this chapter we have reviewed a collection of approximation ideas which have

proved themselves useful in computational analyses of economic models. We have
also shown that a general class of techniques from the numerical partial differential
equations literature can be usefully applied and adapted to solve nonlinear economic
problems. Despite the specificity of the applications discussed here, the general de-
scription makes clear the general usefulness of perturbation and projection methods
lbr economic problems, both theoretical modelling and empirical analysis. The appli-
cation of perturbation and projection methods and the underlying approximation ideas
have already substantially improved the efficiency of economic computations. Further
exploitation of these ideas will surely lead to further progress.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 581

[1] Albrecht, J.W., Hohnlund, B. and Lang, H. 'Comparative statics in dynamic programming models
with an application to job search', Journal of Economic Dynamics and Control, 15(4):755-769,
[2] Anderson, G.S. 'Symbolic algebra programming for analyzing the long-run dynamics of economic
models', in: H. Variaaa, ed, Economic and financial modeling with mathematica. New York: Springer,
[3] Aranjo, A. and Scheinkman, J.A. 'Smoothness, comparative dynamics, and the turnpike property',
Econometrica, 45:601-620, 1977.
[4] Balcer, Y. and Judd, K.L. 'Dynamic effects of tax policy', mimeo, 1985.
[5] Bah'o, R. 'On the determination of public debt', Journal of Political Economy, 87:940-971, 1989.
[6] Barron, A.R. 'Universal approximation bounds for superpositions of a sigmoidal function', IEEE
Transactions on Information Theory, 39(3):930-945, 1993.
[7] Belhnan, R.~ Kalaba, R. and Kotkin, B. 'Polynomial approximation: A new computational technique
in dynamic programming: Allocation processes', Mathematics c~t"Computation, 17:155-161, 1963.
[8] Bender, C.M. and Orszag, S.A. Advanced mathematical methods fi~r scientists and engineers. New
York: McGraw-Hill, 1978.
[9] Benhabib, J. and Nishimura, K. 'The Hopf bifurcation and the existence and stability of closed orbits
in multisector models of optimal economic growth', Journal (~f Economic Theory, 21(3):421-444,
[10] Bensoussan, A. Perturbation methods in optimal control. Wiley, 1988.
[11] Bizer, D.S. and Judd, K.L. 'Uncertainty and taxation', American Economic Review, 79:331-336,
[12] Blanchard, O.J. and Kahn, C.M. 'The solution of linear difference models under rational expecta-
tions', Econometrica, 48(5):1305-1311, 1980.
[13] Bleistein, N. and Handelsman, R.A. Asymptotic expansions ~f' integrals'. New York: Holt, Rinehart
& Winston, 1976.
[14] Blume, L., Easley, D. and O'Hara, M. 'Characterization of optimal plans for stochastic dynamic
programs', Journal of Economic Theory, 28:221-234, 198Z
[15] de Boor, C. A practical guide m splines. New York: Springer, 1978.
[16] Bovenberg, A.L. 'The effects of capital income taxation on international competitiveness and trade
flows', American Economic Review, 79:1045-1064, 1989.
[17] Bovenberg, A.L. 'Capital income taxation in growing open economies', Journal qfPublic Economics,
31:347-377, 1986.
[18] Bovenberg, A.L 'The corporate income tax in an intertemporal equilibrium model with imperfectly
mobile capital', International Economic Review, 29:321-340, 1988.
[19] Brock, W.A. 'Pathways to randomness in the economy: Emergent nonlinearity and chaos in eco-
nomics and finance', Estudios Economicos, 8(1):3-55.
[20] Brock, W.A. and Mirman, L.J. 'Optimal economic growth and uncertainty: The discounted case',
Journal ql" Economic Theory, 4:479-513, 1972.
[21] Brock, W.A. and Turnovsky, S.J. 'The analysis of macroeconomic policies in perfect foresight equi-
librium', International Economic Review, 22:179-209, 1981.
[22] Burnett, D.S. Finite element analysis. Reading, MA: Addison-Wesley, 1987.
[23] Budd, C , Harris, C. and Vickers, J. 'A model of the evolution of duopoly: Does the asymmetry
between firms tend to increase or decrease?', Review of Economic Studies, 60(3):543-573.
[241 Caputo, M.R. 'How to do comparative dynamics on the back of an envelope in optimal control
theory', Journal ~f Economic Dynamics and Control, 14:655-683, 1990.
[25] Caputo, M.R. 'Comparative dynamics via envelope methods in variational calculus', Review ~[
Economic Studies, 57(4):689-697, 1990.
[26] Chao, J.C. and Phillips, EC.B. 'Bayesian posterior distributions in limited information analysis of
the simultaneous equations model', Yale University, mimeo, 1994.
582 K.L. Judd

[27] Chaff, V.V., Christiano, L.J. and Kehoe, EJ. 'Policy analysis in business cycle models', mimeo, 1993.
[28] Cheney, E.W. Introduction to approximation theory. New York: McGraw-Hill, 1966.
[29] Chiappori, P.A., Geoffard, P.Y. and Guesnerie, R. 'Sunspot fluctuations around a steady state: The
case of multidimensional, one-step forward looking economic models', Econometrica, 60(5):1097-
1126, 1992.
[30] Cl~ristiano, L.J. 'Solving the stochastic growth model by linear-quadratic approximation and by
value-function iteration', Journal of Business and Economic Statistics, 8:23-26, 1990.
[31] Christiano, L.J. 'Linear-quadratic approximation and value-function iteration: A comparison', Journal
¢)f Business and Economic Statistics, 8:99-113, 1990.
[32] Christiano, L.J. and Fisher, J.D.M. 'Algorithms for solving dynamic models with occasionally binding
constraints', University of Western Ontario, mimeo, 1994.
[33] Christiano, L.J. and Valdivia, V. 'Notes on solving models using a linearization method', Northwest-
ern University, mimeo, 1993.
[34] Chow, S.-N. and Hale, J.K. Methods t~fbifitrcation theory. New York: Springer, 1982.
[35] Coddington, E.A. and Levinson, N. Theory qfordinary differential equations. New York: McGraw-
Hill, 1965.
[36] Coleman, W.J. II 'Solving the stochastic growth model by policy function iteration', Journal of
Business and Economic Statistics, 8:27-29, 1990.
[37] Cooley, T. and Hansen, G. 'The inflation tax in a real business cycle model', American Economic
Review, 79:733-748, 1989.
[38] Daniel, J.W. 'Splines and efficiency in dynamic programming', Journal of Mathematical Analysis
and Applications, 54:402-407, 1976.
[39] Danthine, J.-P. and Donaldson, J.B. 'Stochastic properties of fast vs. slow-growing economies',
Eeonometrica, 49:1007-1033, 1981.
[40] Danthine, J.-P., Donaldson, J.B. and Mehra, R. 'On some computational aspects of equilibrium
business cycle theory', Journal of Economic Dynamics and Control, 13:449-470, 1989.
[41] Dantzig, G.B., Harvey, R.P., Landowne, Z.E and McKnight, R.D. DYGAM - a computer system.fi)r
the solution t)fdynamic programs. Palo Alto, CA: Control Analysis Corporation, 1974.
[42] Dasgupta, S. and McKenzie, L.W. 'The comparative statics and dynamics of stationary states', in:
J. Chipman, D. McFadden, M. Richter, eds, Pre.ferences, uncertainty, and optimality: Essays in honor
of Leonid Hurwicz. Boulder/Oxford: Westview Press, pp. 280-303, 1990.
[43] Deaton, A. and Laroque, G. 'On the behavior of commodity prices', Review of Economic Studies,
59:1-23, 1992.
[44] Dixit, A. 'Analytical approximations in models of hysteresis', Review of Economic Studies,
58(1):141-151, 1991.
[45] Dotsey, M. and Mao, C.S. 'How well do linear approximation methods work?', Journal of Monetary
Economics, 29:25-58, 1992.
[46] Feichtinger, G. 'Hopf bifurcation in an advertising diffusion model', Journal of Economic Behavior
and Organization, 17(3):401-411, 1992.
[47] Fletcher, C.A.J. Computational Galerkin techniques. New York: Springer, 1984.
[48] Fleming, W. 'Stochastic control for small noise intensities', SlAM Journal of Control, 9(3):473-517,
[49] Fleming, W. and Souganides, P.E. 'Asymptotic series and the method of vanishing viscosity', Indiana
University Mathematics Journal, 35(2):425-447, 1986.
[50] Franke, R. 'Stable, unstable, and persistent cyclical behaviour in a Keynes-Wicksell monetary growth
model', Oxford Economic Papers, 44(2):242-256, 1992.
[51] Friedman, A. 'Stochastic differential games', Journal ~fDifferential Equations, 11:79-108, 1972.
[52] Ghysels, E. and Lieberman, O. 'Dynamic regression and filtered data series: A Laplace approximation
to the effects of filtering in small samples', University of Montreal, mimeo, 1993.
[53] Golubitsky, M. and Schaeffer, D.G. Singularities and groups in bifurcation theory, Vol. I. New York:
Springer, 1985.
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 583

[54] Grossman, S. 'On the efficiency of competitive stock markets where agents have diverse information',
Journal of Finance, 18:81-101, 1976.
[55] Grossman, S.J. and Stiglitz, J.E. 'On the impossibility of informationally efficient markets', American
Economic Review, 70:393-408, 1980.
[56] Gustafson, R.L. 'Carryover levels for grains: A method for determining amounts that are optimal
under specified conditions', USDA Technical Bulletin 1178, 1958.
[57] den Haan, W. and Marcet, A. 'Solving the stochastic growth model by parameterizing expectations',
Journal of Business and Economic Statistics, 8:31-34, 1990.
[58] Hansen, G.D. and Prescott, E.C. 'Recursive methods for computing equilibria of business cycle
models', IEM Discussion Paper 36, Federal Reserve Bank of Minneapolis, 1991.
[59] Hansen, L.E and Sargent, T.J. 'Formulating and estimating dynamic linear rational expectations
models', Journal of Economic Dynamics and Control, 2:7-46, 1980.
[60] Hansen, L.E and Sargent, T.J. 'Recursive linear models of dynamic economies', Hoover Institution,
manuscript, 1990.
[61] Hansen, L.E and Singleton, K. 'Generalized instrumental variables estimators of nonlinear rational
expectations models', Econometrica, 50:1269-1286, 1982.
[62] Holly, A. and Phillips, EC.B. 'A saddlepoint approximation to the distribution of the k-class estimator
in a coefficient in a simultaneous system', Econometrica, 47:1527-1548, 1979.
[63] Huffman, G.W. 'A dynamic equilibrium model of assset prices and transaction volume', Journal of
Political Economy, 95(1):138-159, 1987.
[64] Johnson, S., Stedinger, J.R., Shoemaker, C.A., Li, Y. and Tehada-Guibert, J.A. 'Numerical solution
of continuous-state dynamic programs using linear and spline interpolation', Operations Research,
41(3):484-500, 1993.
[65] Judd, K.L. 'An alternative to steady-state comparisons in perfect foresight models', Economics
Letters, 10:55-59, 1982.
[66] Judd, K.L. 'Short-run analysis of fiscal policy in a simple perfect foresight model', Journal qf
Political Economy, 93:298-319, 1985.
[67] Judd, K.L. 'Debt and distortionary taxation in a simple perfect foresight model', Journal of Monetary
Economics, 20:51-72, 1987.
[68] Judd, K.L. 'Welfare cost of factor taxation in a perfect foresight model', Journal of Political Econ-
omy, 95, 1987.
[69] Judd, K.L. 'Closed-loop equilibrium in a multi-stage innovation race', mimeo, 1985.
[70] Judd, K.L. 'Asymptotic methods in dynamic economic models', Hoover Institution, mimeo, October,
[71] Judd, K.L. 'Minimum weighted residual methods for solving dynamic economic models', Hoover
Institution, mimeo, 1990.
[72] Judd, K.L. 'Projection methods for solving aggregate growth models', Journal of Economic Theory,
58(2):410-452, 1992.
[73] Judd, K.L. 'Hybrid perturbation-projection methods applied to economic problems', Hoover Insti-
tution, mimeo, 1996.
[74] Judd, K.L. Numerical methods in economics. Cambridge, MA: MIT Press, forthcoming.
[75] Judd, K.L. and Guu, S.-M. 'Perturbation solution methods for economic growth models', in: H. Var-
ian, ed., Economic and financial modeling with mathematica. New York: Springer, 1993.
[76] Judd, K.L. and Bernardo, A. 'Asset market equilibrium with general securities, tastes, returns, and
information asymmetries', Hoover Institution, mimeo, 1994.
[77] Judd, K.L. and Guu, S.-M. 'Bifurcation approximation methods applied to asset market equilibrium',
Hoover Institution, mimeo, 1996.
[78] Judd, K.L. and Solnick, A. 'Numerical dynamic programming with shape-preserving splines', mimeo,
[79] Kehoe, T. and Levine, D. 'Comparative statics and perfect foresight in infinite horizon economies',
Econometrica, 53:433-454, 1985.
584 K.L. Judd

[80] Kotlikoff, L., Shoven, J. and Spivak, A. 'The effect of annuity insurance on savings and inequality',
Journal of ~ b o r Economics, 4(3):S183-$207, 1986.
[81] Krasnoselskii, M.A. and Zabreiko, P. Geometrical methods of nonlinear analysis. Berlin: Springer,
[82] Kydland, E 'Equilibrium solutions in dynamic dominant-player models', Journal ~/"Economic The-
ory, 15:307-324, 1977.
[83] Kydland, E 'Noncooperative and dominant player solutions in discrete dynamic games', International
Economic Review, 16(2):321-335, 1975.
[84] Kydland, EE. and Prescott, E.C. 'A competitive theory of fluctuations and the feasibility and de-
sirability of stabilization policy', in: S. Fischer, ed., Rational expectations and economic policy.
Chicago, IL: Univ. of Chicago Press, 1980, pp. 169-198.
[85] Kydland, EE. and Prescott, E.C. 'Time to build and aggregate fluctuations', Econometrica, 50:1345-
1370, 1982.
[86] Lafrmlce, J.T. and Barney, L.D. 'The envelope theorem in dynamic optimization', Journal ~[ Eco-
nomic Dynamics and Control, 15(2):355-385, 1991.
[87] Laitner, J. 'The stability of steady states in perfect foresight models', Econornetrica, 49:319-333,
[88] Laitner, J. 'Transition time paths for overlapping-generations models', Journal qfEconomic Dynamics
and Control, 7:111-129, 1984.
[89] Laitner, J. 'The dynamic analysis of continuous-time life-cycle savings growth models', Journal of
Economic Dynamics and Control, 11:331-357, 1987.
[90] Laitner, J. 'Dynamic determinacy and the existence of sunspot equilibria', Journal qf Economic
Theory, 47(1):39-50, 1989.
[91] Laitner, J. 'Tax changes and phase diagrmns tor an overlapping generations model', Journal of
Political Economy, 96:193-220, 1990.
[92] Lucas, R.E. 'Optimal investment policy and the flexible accelerator', International Economic Review,
8:78-85, 1967.
[93] Magill, J.RM. 'A local analysis of N-sector capital accumulation under uncertainty', Journal of
Economic Theory, 15(1):211-219, 1977.
[94] Marcet, A. 'Simulation analysis of dynamic stochastic models: Applications to theory and estimation',
in: C. Sims, ed., Advances in econometrics sixth world congress, Vol. II. Cambridge Univ. Press, pp.
[95] McGrattan, E.R. 'Solving the stochastic growth model by linear-quadratic approximation', Journal
of Business and Economic Statistics, 8(1):41~3, 1990.
[96] Merton, R.C. 'Optimal consumption and portfolio rules in a continuous-time model', Journal (~f
Economic Theory, 3(4):373-413, 1971.
[97] Milnor, T. Topology from the d!fferentiable viewpoint. Charlottesville, VA: Univ. Press of Virginia,
[98] Miranda, M.J. and Hehnberger, RG. 'The effects of commodity price stabilization programs', Amer-
ican Economic Review, 78:46-58, 1988.
[99] Miranda, M. and Rui, X. 'Solving dynamic games via polynomial projection methods: An application
to international commodity markets', Ohio State University, mimeo, 1994
[100] Mortensen, D.T. 'Generalized costs of adjustment and dynamic factor demand theory', Econometrica,
41:657-665, 1973.
[101] Oniki, H. 'Comparative dynamics (sensitivity analysis) in optimal control theory', Journal of Eco-
nomic Theory, 6:265-283, 1973.
[ 102] Otani, K. 'Explicit formulae of comparative dynamics', International Economic Review, 23:411.4 19,
[103] Phillips, RC.B. 'Best uniform and modified pad6 approximants to probability densities in econoinet-
tics', in: W. Hildebrand, ed., Advances in econometrics. Cambridge, MA: Cambridge Univ. Press,
Ch. 12: Approximation, Perturbation, and Projection Methods in Economic Analysis 585

[104] Phillips, EC.B. 'Marginal densities of instrument variable estimators in the general single equation
case', Adwmces in Econometrics, 2:1-24.
[105] Phillips, P.C.B. 'To criticize the critics: An objective Bayesian analysis of stochastic trends', Journal
of Applied Econometrics, 6:333-364, 1991.
[106] Prescott, E.C. and Mehra, R. 'Recursive competitive equilibrium: the case of homogeneous agents',
Econometrica, 48:1365-1379, 1980.
[107] Rivlin, T.J. An introduction to the approximation of functions. Waltham, MA: Blaisdell, 1969.
[108] Rice, J.R. Numerical methods, software, and analysis. Boston: Academic Press, 1993.
[109] Rivlin, T.J. Chebyshev polynomials: From approximation theory to algebra and number theory. New
York: Wiley/Interscience, 1990.
[110] Rust, J. 'Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher', Eeono-
metrica, 55(5):999-1033, 1987.
[111] Rust, J. 'Maximum likelihood estimation of discrete control processes', SlAM Journal on Control
and Optimization, 26:1006-1023, 1988.
[112] Rust, J. 'Numerical dynamic programming in economics', in: this Handbook.
[113] Saad, Y. and Schulz, M.H. 'GMRES: A generalized minimum residual algorithm for solving non-
symmetric linear systems', SlAM Journal on Scientific and Statistical Computing, 7(3):856-869,
[114] Samuelson, P.A. 'The fundamental approximation theorem of portfolio analysis in terms of means,
variances and higher moments', Review of Economic Studies, 37:537-542, 1970.
[115] Santos, M.S. and Vila, J.-L. 'Smoothness of the policy function in continuous time economic models:
The one-dimensional case', mimeo, 1988.
[116] Sargent, T.J. 'Estimation of dynamic labor demand schedules under rational expectations', Journal
of Political Economy, 86:1009-1044, 1978.
[117] Schumaker, L.L. 'On shape-preserving quadratic sptine interpolation', SlAM Journal ~fNumerical
Analysis, 20(4):854-864, 1983.
[118] Srikant, R. and Basar, T. 'lterative computation of noncooperative equilibria in nonzero-sum differ-
ential games with weakly coupled players', Journal c)f"Optimization Theory and Applications, 71 (1):
137-168, 1991.
[119] Stokey, N.L. and Lucas, R.E., Jr. Recursive methods in economic dynamics', Cambridge: Harvard
Univ. Press, 1989.
[120] Taylor, J.B. and Uhlig, H. 'Solving nonlinear stochastic growth models: A comparison of alternative
solution methods', Journal t)f Business and Economic Stutistics, 8:1-18, 1990.
[121] Treadway, A.B. 'Adjustment costs and variable inputs in the theory of the competitive firm', Journal
t~f Economic Theory, 2:329-347, 1970.
[122] White, H. Artificial neural networks. Cambridge: Blackwell, 1992.
[123] Wright, B.D. and Williams, J.C. 'The economic role of commodity storage', Economic Journal,
92:596-614, 1982.
[124] Wright, B.D. and Williams, J.C. 'The welfare effects of the introduction of storage', Quarterly
Journal t~f Economics, 99:169-182, 1984.
[125] Williams, J. and Wright, B. Storage and commodity markets. Cambridge Univ. Press, 1991.
[126] Wong, R. Asymptotic approximation of integrals. San Diego: Academic Press, 1989.
[127] Zadrozny, E 'Analytic derivatives for estimation of discrete-time, linear quadratic, dynamic, opti-
mization models', Econometrica, 56:467-472, 1988.
[128] Zeidler, E. Nonlinear functional analysis and its applications, Vol. 1. New York: Springer, 1986.
[129] Zeidler, E. Nonlinear functional analysis, Vol. II. New York: Springer, 1989.
[130] Zeldes, S.E 'Optimal consumption with stochastic income deviations from certainty equivalence',
Quarterly Journal of Economics, 104:275-298, 1989.
[131] Zhang, W.-B. 'Hopf bifurcations in multisector models of optimal economic growth', Econmnic
Letters, 26(4):329-334, 1988.
Chapter 13


University ~f Amsterdam


1. Introduction 588
2. Deterministic optimization 588
2.1. Linear quadratic control: Discrete time 588
2.2. Linear quadratic control: Continuous time 602
3. Stochastic control 604
3.1. Stochastic linear quadratic control: Discrete time 604
3.2. Stochastic linear quadratic control: Continuous time 613
4. Summary 614
A note on software 615
References 615

*In the period that I have been writing this chapter I benefited greatly from the discussions I had with
a number of people. My thanks go to Sudhakar Achath, Masanao Aoki, Tamer Basar, Willem Buiter, Ray
Fair, Manfred Gilli, Steven Hall, Henk Jager, Ken Judd, Finn Kydland, Reinhard Neck, Heinz Neudecker,
John Rust, Berc Rustem, Mark Salmon, Charles Tapiero, Boe Thio, Victor Wesseling and three anonymous
referees. Special thanks go to David Kendrick who gave a lot of useful advise on an earlier draft of this
chapter. I am indebted to Brigit Bokhorst and Ada Kromhout for correcting the English. Any remaining
deficiencies are entirely my own.

Handbook of Computational Economics, Volume L Edited by H.M. Amman, D.A. Kendrick and J. Rust
(~) 1996 Elsevier Science B.V. All rights reserved.
588 H. Amman

1. Introduction

The use of dynamic optimization techniques in economics goes back to the 1920s
with the work of Ramsey (1928) and Hotelling (1931). However, it was not until
the 1950s that dynamic optimization techniques and optimal control theory gained
a wider acceptance in economics. The work of Tustin (1953), Holt (1962), Phillips
(1954, 1957) started a period of deterministic control theory in economics that was
further boosted by the work of Pontryagin et al. (1964). Pontryagin and his co-workers
extended the calculus of variation, by developing the Maximum Principle.
Another line of research is the use of stochastic control methods in economics. In the
beginning of the Seventies stochastic control was introduced with the works of Chow
(1975), MacRea (1972), Kendrick (1973) and Norman (1976), adding the element of
uncertainty to the model dynamics. More recently, stochastic control in economics
has been successfully pursued by Neck and Matulka (1994a, 1994b), Becket et al.
(1986, 1994) and Rustem (1990, 1994).
We will first present here the most widely used control model in economics, the
Linear Quadratic Control Model (LQCM). The major advantage of LQCM is that it
has, in general, a unique analytical solution. However, this is not necessarily true when
dealing with forward-looking behavior or stochastic models. The same holds for most
nonlinear dynamic control problems. Consequently the solution of the optimization
problem has to be obtained through an approximation procedure of some sort.
In the next section we will briefly present the standard deterministic LQCM and
extend its formulation to allow for rational expectations. Section 3 will deal with
the stochastic linear quadratic control model. The stochastic version of the LQCM
explicitly models stochastic parameters and hence provides the opportunity to address
the Lucas critique within the control framework. At the end of each section we will
present some examples to highlight the power of the control models. Control models
with both rational expectations and parameter uncertainty will generally have no
analytical solution. Therefore, our approach will be to obtain a solution in a numerical
fashion, emphasizing the importance of numerical methods for economic modeling.
In Section 2 we will start with the LQCM for finite and infinite horizon problems.
Section 3 will extend the LQCM for stochastic models. For the more general nonlinear
models the reader is referred to the next chapter Numerical dynamic programming by
John Rust.

2. Deterministic optimization

2.1. Linear quadratic control: Discrete time

The LQCM has frequently been discussed in both the systems engineering literature
and economics. A partial list of references in both fields is: Aoki (1976), Bertsekas
Ch. 13: Numerical Methods ]br Linear-Quadratic Models 589

(t976), Kwakernaak and Sivan (1972), Pindyck (1973), Chow (1975), Kailath (1980),
Kendrick (1981a, 1981b), Whittle (1982), Jamshidi and Malek-Zavarei (1986). The
finite horizon LQCM or tracking I problem can be formulated as follows:
Find for the linear system

xt+l = A t x t + B t u t + Ctzt,
x0 given (2.1)

with xt E R n and t E {0, 1 , . . . , T } , a set of admissible controls U = {u0, ul,

., UT-1 } which minimize the quadratic cost function

Jr = LT(XT) + E L t ( x t , ut) (2.2)


LT = ~I X ti Q T X T ,

LT = ~1 x /t~4txt
r, t
+ xtFtut 1 ,'
+ 2utRtu (2.3)

where xt E R n is the state vector, ut E R m the control vector, zt E R s the vector of

exogenous (predetermined) variables, At ~ R nxn the transition matrix, Bt E N n x m ,
Ct E R nxs, Qt c IRn x n a symmetric semi-positive definite matrix, Rt c R ~xTT~ a
symmetric positive definite matrix and Ft E ]Rnxm a cross-term matrix, all at time t.
Very often the LQCM is written in the tracking or regulator form, tracking the state
target vectors ~?t E R n and the target control vector ~%t E R m. Hence, the criterion
function has the form

1 ^ !
LT -- [(XT - XT) Q T ( X T - :~T) (2.4)

for t = T and

: ½(x - - + - -

+ l ( u t - £ t ) ' R t ( u t - ut) (2.5)

for t ~ { 0 , . . . , T - 1}.
The solution of the problems formulated in (2.1)-(2.3) or (2.1), (2.2), (2.4) and (2.5)
can be obtained through Bellman's dynamic programming, Bellman (1957), which in

t Sometimes the Linear-Quadratic tracking problem is referred to as Linear-Quadratic regulator problem.

590 H. Amman

our case is equivalent to solving these equations through backward induction. By

applying Bellman's functional recurrence equation 2 for t E {T - 1 , . . . , 0}

Vt = min { L t ( x t , ut)
Xt+l ~Ut~t+l

+ At+, (xt+, - A t x t - Btut - Ctzt) + Vt; 1} (2.6)

we can move backward in time, thus solving the problem for the value function Vt
with the Lagrange multipliers At+l. Vt~_l is the optimal value of the value function at
time t + 1. The above backward induction will lead to a set of four equations for each
time step, in the variables {Kt, pt, Gt, gt}, that incorporate the necessary conditions
for an optimum. These equations are

K t = A~Kt+IAt + Qt
- [A~Kt+lBt + Ft] [B~Kt+IBt + Rt]-1 [B~Kt+,At + Ft], (2.7)

Pt = -[A~tKt+lBt + Ft] [B~Kt+,Bt + Rt]-1 [B~(Kt+,Ctzt + Pt+,) - Rtftt]

+ A~ [Kt+lCtZt + Pt+l] - QtYct, (2.8)

Gt = - [ B ; K t + 1 B t + R t ] - l [ B ; K t + l A t + Ft], (2.9)

gt = - [ B ~ K t + l B t +/~t] -1 [B~(gt+lCtZt + Pt+l)- Rt~tt] • (2.10)

Equation (2.7) is the so-called Riccati equation and (2.8) the tracking equation. These
two equations have the fixed end condition

K T = QT , (2.11)

PT = -- QT3C. (2.12)

Starting at t = T with (2.11), (2.12) we can backtrack (2.7) to obtain the Kt's and Pt'S.
Then (2.9) and (2.10) can be used to calculate the Gt's and 9t's. Once this is done, the
initial condition x~ = x0 can be used to derive u~ from the linear feedback equation

"at =: Gtxt + gt. (2.13)

Then the system Eq. (2.1), with the help of the feedback rule (2.13), can be in-
tegrated forward in time to obtain the whole set of optimal state vectors X* =

2For a tutorial see Leonard and Van Long (1992).

Ch. 13: Numerical Methods .fi)r Linear-Quadratic Model.; 591

{x~, x ~ , . . . , z~v_l } and the optimal control vectors U* = {u~, u ~ ' , . . . , u~,_ 1}. Given
the fact that we have assumed that R t is positive definite and Q t is semi positive
definite the optimum obtained is a minimum. In summary the simple algorithm is: 3

Algorithm 1. Computing the L Q C M

1 input x0
2 compute and store fixed end conditions K T = Q T , PT : --Qt3CT
3 for t = T - 1 , . . . , 1 repeat /* backward loop */
4 retrieve K t + l , Pt+l
5 c o m p u t e a n d store K t , Pt
6 end
7 for t = 0, 1 , . . . , T - 1 repeat /* forward loop */
8 retrieve K t + l , Pt+l, xt and c o m p u t e Gt, gt
9 c o m p u t e a n d store ut, xt+l
10 end
If we assume that the target values, matrices and predetermined variables of the control
problem are time-independent, there may be a infinite horizon solution, {k, ~2), for the
LQCM. The infinite horizon problem, Aoki (1989), Sargent (1987), can be formulated
as follows:
Find for the time-invariant system

xt+t = A x t + B u t + C z (2.14)

the stationary control vector ~ that minimizes

J~ = lim ~Lt(xt,ut) (2.15)
T ---) c ~


L t = ½(xt - Yc)'Q(xt - ~) + (xt - Yc)'F(ut - ~z)

1 u t - ¢ z ) ' n ( u t - ¢z).
+ ~( (2.16)

We can derive the exact solution of the state and control vector {~, ~2} from the
Bellman recurrence equation for infinite horizon problems, Aoki (1989), Bertsekas
(1976), Sargent (1987),

V = m i n { L ( x , u ) + V ( A x + Bu + C z ) } . (2.17)

3For the algorithms presented in this chapter we are using "pseudo code" as generally used in numerical
analysis. For examples, see Scales (1985) or Stewart (1973).
592 14. Amman

By assuming that the value function has the form V ( x ) = x ' K x + f i x , we obtain the
algebraic Riccati equation

K = A ' K A + Q - [ A ' K B + F ] [ B ' K B + R ] - I [ B ' K A + F'], (2.18)

p = -[A'KB + F][B'KB + R] -1 [ B ' K ( C z + p) -- R~t]

+ A ' [ K C z +p] - Q:L (2.19)

If we insert the solution of (2.18), (2.19), {K,~5} in (2.9), (2.10) we get {G, ~}. The
stationary value of the state equation are then

Yc = (1 - A - B G ) - I ( B [ 7 + C z ) (2.20)


= ~:~ + ~. (2.21)

However there is no guarantee that such a stationary solution actually exists. A suf-
ficient condition of a stable stationary solution is that the moduli of the eigenvalues
of the matrix A + B G fall into the unit circle. It can be shown that if this stability
condition holds the matrix (I - A - / 3 G ) is invertible, enabling us to derive the
unique solution (2.20) and (2.21). Bertsekas (1976, p. 75) gives an alternative set of
conditions for which (2.18) has a stationary solution.
The simplest way to compute the stationary solution of the infinite horizon problem
is by using successive substitution. By reversing the time index in Eq. (2.7) and setting
Ko "- Q, PT = --Qt~zT we can iterate on Kt. First iterate on K~ until

tlK~ -- K~-lll < Ep1/2

Sp being the smallest representable machine number and 11. It an appropriate norm,
for t C {0, 1,...} and then repeat the procedure for Pt until

ttp~ - pc-, II < cp1/2 .

IIp~-~ II
More sophisticated schemes using a Newton method can be found in Amman and
Neudecker (1995) and a nonrecursive method for solving the algebraic Riccati equa-
tion is described in Vaughan (1970) and McGratten (1994). The latter method is
restricted, however, because it assumes that the transition matrix A is nonsingular.
Ch. 13: Numerical Methods .fi)r Linear-Quadratic Models 593

Implementation issues

The computation complexity of this backward/forward induction is quite simple, Am-

man (1986), Amman and Kendrick (1992a). Apart from the matrix inversion, the
operations in (2.7)-(2.10) belong to the BLAS Level-3 operation set, 4 which means
that with the current state of computing machinery these operations can be carried out
with a high degree of efficiency. Either through pipelining and superscalar processing
in RISC architectures or through parallel-vector processing on most supercomputing
systems. 5
The maximum order of the operations is O(~3), h = max(n, m). 6 In economics the
number of control variables is generally far less than the number of state variables.
The amount of CPU time required to obtain U* is for all practical purposes linear
with n 3 x T. The remaining issue is the matrix inversion. Normally, you will avoid
carrying out the matrix inversion, because of the numerical instability and inefficiency
connected to matrix inversion. In general it is better to solve a system instead. How-
ever, by backward recurrence we can easily show that B~Kt+IBt + Rt in (2.18),
(2.19) is positive definite and has full rank, rk(B~Kt+lBt + Rt) = m, so the size of
the inverse is modest and its numerical properties are superior to those of a general
matrix inverse. Therefore, using a Choleski decomposition procedure will normally
produce satisfactory results.

Example 1.

To conclude this part we will present a numerical example to demonstrate the LQCM.
Let us take a simple optimal growth model with intertemporal substitution between
capital and consumption, which is of the following form:
Find for the system

yd = ct + it output demand equation,

Y~ : O1]~t @ Oalt production function,
market clearing condition,
kt+j = (1 - a)k.~ + ii capital formation, (2.22)
It labor supply,
et consumption demand,
it investment demand,

a set of admissible controls

4An expos6 on Basic Linear Algebra Subroutines can be found in Coleman and Van Loan (1988).
5Amman (1986, 1989, 1990) gives performance figures on a number of supercomputing systems.
6Modi (1988) provides an introduction to the computational analysis of most linear algebra operations.
594 H. Amman

lo '''" 19

so as to minimize the quadratic penalty function

Jlo = qlo(klo - 1) 2 + ~ Z [(1 - c t ) 2 + l~] (2.23)

subject to the constraints k + 0 = 1 and kt0 = 1. In this example consumption and

labor are normalized variables in the interval [0, 1]. By giving ql0 a sufficiently large
number we impose the terminal condition, hi0 = 1, for the capital stock. Applying
Algorithm 1, from this section we obtain

Solution of optimal growth model

0 1 2 3 4 5 6 7 8 9 10 cxD
k 1.00 1.39 1.66 1.83 1.93 1.96 1.93 1.83 1.66 1.39 1.00 2.50
l 0.40 0.28 0.21 0.14 0.10 0.07 0.05 0.04 0.03 0.02 0.00
c 0.20 0.43 0.59 0.71 0.79 0.85 0.89 0.92 0.94 0.96 1.00

with the functional value Jl0 = 0.40 and the stationary solution in the last column.

Rational expectations: The Fair and Taylor method

In economic decision-making expectations play an important part. Normally economic

agents will form expectations on key economic indicators like prices and income, and
display a forward-looking behavior in accordance with their expectations. Broadly
speaking there are two ways for modeling expectations, see e.g. Sheffrin (1983),
Adaptive and Rational Expectations. The first hypothesis, the adaptive expectations
hypothesis (AEH), assumes that economic agents are backward-looking and that they
base their expectations on the past states of the economy. Because of this backward-
looking nature, the AEH fits easily into the standard economic control models. Un-
fortunately, the AEH is inconsistent in the sense that the expectations will generally
be biased and will systematically deviate from the actual outcome of the economic
In his well-known article, Muth (1961) introduced another type of expectation for-
mation that does not suffer from this inconsistency property: the Rational Expectations
Hypothesis (REH). According to the REH, the subjective expectations of economic
agents will be equal to the mathematical expectations based on the economic model
employed by economic agents. Consequently, the expectation fbrmation will be unbi-
ased and no systematic errors occur when predicting the future. This REH has had a
major impact on the economic literature. Under the REH, economic policy tends to be
less effective than under the AEH. Furthermore, Kydland and Prescott (1977) pointed
Ch. 13: Numerical Methods ]br Linear-Quadratic Models 595

out that Bellman's principle of optimality is not valid if expectations are developed
according to the REH. Rational expectations will generally lead to a time-inconsistent
Suppose that the maximum lead in xt+l is k. Because of its expectational effect, the
control vector ut then has impact on {xt+l, xt+2, • • • , X t + k } . This effect is ignored in
(2.1) and consequently the dynamic programming approach produces a non-optimal
solution. To resolve this issue we have to reformulate Eq. (2.1) of our LQCM slightly
to form an augmented state equation

Xt+l = Atxt + Btut + Ctzt -I- ~ Dj,tEtxt+j (2.24)

where E t x t + j is the expected state vector for time t + j formed at time t and Dj,t E
N nxn is a known matrix. Under the REH the subjective expectations of economic
agents, t X ~ + j , will coincide with the mathematical expectations, so evidently tx~+j =
Etxt+j. In the above deterministic case, rational expectations are reduced to perfect
foresight, which means that Vj t x t+j ¢ -- Xt+ j, Note that in this case the value function
becomes a function of future states other than xt+l

¼=V(zt+l,...,zt+k,u, At+l). (2.25)

Through the expectations the minimization of Vt becomes functionally dependent on

{ V t + l , . . . , Vt+k} and consequently we have a fully dependable set of minimization
problems, which are very complicated to solve analytically. There have been a number
of attempts to design computational strategies to derive the set of controls in the
presence of rational expectations. The procedures that have been proposed are in
most cases extensions of the Fair and Taylor (1983) 7 numerical iterative approach
for solving models with rational expectations. Cohen and Michel (1988), Amman
and Kendrick (1992b, 1995b) and Fisher (1992) discuss how to derive a sub-optimal
strategy that is time-consistent and expectation-consistent. An alternative approach to
this problem can be found in Fisher, Holly and Hughes Hallett (1986) and Hughes
Hallett (1987).
The Fair and Taylor (1983) procedure for determining the value of the rational
expectations in simulation models is an iterative scheme, in which values of these
variables after convergence are the same as the solution from the model. This requires
choosing some initial path for the expected values for the first iteration and then
solving the model repeatedly. After each iteration the expected values are updated to
be the same as the solution values for the corresponding state variable in the previous
solution. This process is continued until convergence is obtained. Fisher (1992) and

7See also Fair (1984).

596 H. Amman

A m m a n and Kendriek (1992b) employ a similar procedure for optimization models.

The iterative scheme will be outlined here by using the notation Etx~+j to represent
the expected value of the state variable at iteration v. The simplest way of beginning
the iterative scheme is by setting Etx~+j = 0, Vt, Vj, and then solving the model.
In this case there are no rational expectations and the standard solution procedure
for quadratic-linear tracking problems can be used. Call the optimal state variables
for this solution X N L = { x g L , . . . , xNL}, that is the no-lead ( N L ) solution. Then
set the values of the rational expectations equal to the no-lead solution for the first
iteration, v = 1,

Etxt+ j = itN
+ Lj , Vt, Vj. (2.26)

The system equation on the first iteration then becomes

xv+l A ~v+l t9 . v + l ~ v (2.27)
t+l = ~ t . ~ t + J-,t~ t + C t z t + ,...., Dj,tEtzt+ j .

However the terms

Ctzt + ~ Dj,tEtxtV+j (2.28)

are all known so the system equation can be rewritten as

_v+l A ~v+l Z:~ o v + l ~t~, ~ (2.29)

it+ 1 : zatx t + L'tC~t +


Ct = [Ct Dl,t . . . Dk,t] (2.30)


~,~ = . . (2.31)

k Etz~+k J
The system Eq. (2.29) is now of the same form as the original system Eq. (2.1) and
the standard LQCM algorithm can be used to solve the model for iteration v + 1. Call
the solution to this problem
*,v+l *,v+l'~
xt ,u t ), Vt. (2.32)
Ch. 13: Numerical Methods for Linear-Quadratic Models 597

The iterations as described above are then repeated until convergence is obtained, so

Ilx~ +1 - x~ll
_1/2 ,
~p Vt, (2.33)
where Cp is the tolerance of convergence. If convergence is obtained for all the
expectational variables at all the time periods, then the condition

E t x t + j = E t x t +* j = xt+j,
* Vt, Vj, (2.34)

will hold and the solution conforms to the REH. There is one important loose end in
tile above algorithm, viz. the expectational variables { E t X T + I , . . . , EtXT+k}. As we
do not have any values for {XT+1, • • •, XT+k} we do not have any values to substitute
for the expectational variables. We can do two things to solve this problem. First, we
can assume that XT reaches a stationary value. If so, then Vj EtXT+j = XT = X* and
we have satisfactory fixed end conditions. However, this stationarity condition will not
always hold. In those cases we have to extend the terminal horizon sufficiently to say
T + s, to get satisfactory computational results} By extending the horizon s periods
we can mitigate the effect of setting the expectations to an arbitrary level and therefore
reduce the effect of the forward-looking behavior to a minimum. In summary, the Eqs
(2.23)-(2.30) describe the Fair-Taylor iterative procedure for expectational variables
which is performed at each time period. The basic algorithm is as follows.

Algorithm 2. L Q C M expectations augmented

1 input Ep, X0, set Etxot+j = S t x t ,NL+ j Vt E [0, T - 1], Vj E [0, k]
2 for v = 0, 1 , . . . , repeat
3 set terminate_exp -- false /* start expections loop */
4 set Ct and ~[, Vt • [0, T - 1], Vj • [0, k],
5 compute mr+ 1-v+l, uV+lt , Vt • [0, T - 1], /* use algorithm 1 here */
6 if Vtllx;~+~ -x~'ll/llx~ll < ~/2 then /* check on convergence */
7 set terminate_exp = t r u e
8 else
9 Etx~+j =- xt+
_~+1j , Vt • [0, T - 1], Vj E [0, k],
10 endif
11 end until terminate_exp
The first step of the algorithm sets the tolerance level of the algorithm, ~p , fo1-
lowed by an initial guess of the state variables, xt+NLj. Steps 3 through l l are the
loop that derives the expectational consistent optimal solution predetermined vari-.
ables. Steps 4 defines the expectations augmented state equation so it can be fitted

8See also Fair (1984, Chapter 11).

598 H. Amman

into the LQCM. Step 7 computes a norm for checking whether or not we have to
j u m p out of the expectations loop.
Before we move to the next section, it is worthwhile to note that the L Q C M has
many extensions and reformulations. Examples are constrained controls and non-fixed
targets for state and control variables. A number of these additions are discussed in
Kendrick (1981) and A m m a n and Jager (1987, 1988). Furthermore, in the tradition
o f Theil (1964) the L Q C M can be rewritten in a stacked f o r m allowing for a non-
dynamic programming solution. Theil's method has been extended and improved by
Hughes Hallett, Holly and Rustem. 9

Example 2.

To conclude this part we will present a numerical example to demonstrate the L Q C M

with forward-looking variables. Like in the previous example, let us take a simple
optimal growth model with expectations of the form:
Find for the system

yd = ct + it output demand equation,

y~ = O1kt + 02lt linear production function,
market clearing condition,
EtCt+l = 1 - It labor supply, (2.35)
kt_l- 1 = (1 -- 5)kt + ii capital formation,
ct consumption demand,
it investement demand,

a set of admissible controls U = { c o , . . . , C9} SO as to minimize the quadratic penalty


J,o = qlo(klo - 1): + i ~ [(1 - c t ) : + l~] (2.36)

subject to the constraints ko = 1 and klo = 1. Applying Algorithm 2, from this section
we obtain
Solution of optimal growth model with expectations
0 1 2 3 4 5 6 7 8 9 10 oe
k 1.00 1.37 1.61 1.75 1.81 1.79 1.68 1.48 1.17 0.70 1.00 2.50
l 0.51 0 . 3 7 0 . 2 6 0 . 1 9 0 . 1 3 0 . 0 9 0 . 0 7 0 . 0 5 0 . 0 3 0.01 0.00
c 0.28 0 . 4 9 0 . 6 3 0 . 7 4 0.81 0.87 0.91 0.93 0 . 9 5 0.97 1.00

with the functional value J10 = 0.26.

9For a survey on the stacked form solution method see Hughes Hallett and Rees (1983) and Holly,
Rustem and Zarrop (1979). An analysis of the computational efficiency of the stacked form versus dynamic
programming can be found in Amman (1989).
Ch. 13: Numerical Methodsfor'Linear-Quadratic Models 599

The above table clearly reflects the expectational effects of the control model. In
the first two periods the solution jumps to its long run solution. Much quicker than
would be the case without the expectational effect•

Rational expectations: The Blanchard and Kahn method

The major drawback of the method described in Algorithm 2 is that one needs bound-
ary conditions for the state vector. However, for a certain class of models one can
derive a solution that depends on a different set of boundary conditions of the state
vector and uses the so called saddle point properties of the model employed, see
Amman and Kendrick (1995b). Let us assume we have a system equation of the form

xt+l = Axt + But + Czt + ~_j DjEtxt+j. (2.37)

There are two major differences with (2.24)• First, there is a minimal lead in the ex-
pectation formation of two instead of one, and second, the matrices are time-invariant.
If we assume additionally that Dk is nonsingular we can reduce (2.37) to the first-
order difference form. In order to compute the admissible set of controls we have
to eliminate the rational expectation variables from the model. For this we need the
Blanchard and Kahn (1980) method (BK). Equation (2.37) may be transformed into
the BK's so called first order linear form, cf. Chow (1983), by augmenting the state
vector with the lead variables, that is

Xt+l 1 0 I 0
• , , 0
Et+lxt+2 [ = 0 0 I

Et+l Xt+k .1
I - D k-1
,tAt - D k -1
,t - D-1

X I Et+lXt+2 + zt (2.38)

ILEt+~xt+k-1 -D Bt -Dk,tCt

assuming that the Dk,t matrix is invertibleJ ° Applying the BK notation this simplifies

L tPt+; Pt L 7~,2 ] L "[z,2 J

mFor applied models this may be a strong assumption.

600 H. Amman


Et+tPt+ 2 =

~/u,t = ' and %,t = • (2.40)

-Dk,{Bt -D , Ct

Once we have the model in first order form we can apply the BK method for solving
first order systems with rational expectations. Applying the Jordan canonical form
method, ll dropping the time index for the matrices, we get

t+l -- + + ct (2.41)


v [ xt+l ]
xt+l = [ E t + l P t + 2 J


A = Bll J1B~II 1,

~-'Yu,1 -- ( B I 1 J 1 C 1 2 -I- B12J2C22)C221J21(C21~{u,1 + ~22~u,2),

c~ = ~/z,l Zt - ( N i l J 1 C I 2 -/- B I Z J 2 C 2 2 ) C ~ 1

X~ {J(2-i-l)(c21~[z,1 +~22~[z,2)Etz,.t_i} (2.42)


-- ( N i l J l C12 nt- BI2J2C22)C221



Thus in moving from Eq. (2.37) to (2.41)-(2.42) we have used the BK method
to reduce the economic model to a form without rational expectations. Now that we
have the model in the form (2.41), it is easy to set up an iterative scheme to solve

11In order to compute the actual numeric solution of a model we must assume that the matrix At can
be diagonalized.
Ch. 13." Numerical Methods for Linear-Quadratic Models 601

our optimization model. If we set the instruments at the first iterationstep to'zero,
Vt ut° = 0, we can solve the optimization model using (2.41). This produces a new
set of instruments Vt ut° = 0 allowing us to iterate on Eq. (2.41) c~. This iterative
procedure is repeated until the values of the model x~' converge.
Through the BK method we have reduced the system equation to a form without
expectations. Having the system equation in the form of (2.41) and taking into account
that (2.42) depends on future predetermined variables and controls, it is easy to set up
an iterative scheme to solve our control model. If we set the initial control variables
to zero, Vt ut° = 0, we can solve the control model by applying Algorithm 1. As soon
as we have convergence on all states, like in (2.32) we have obtained the optimal
solution of our expectation augmented control model. The difference with the Fair-
Taylor approach is, that we now exploit the saddle point property of the control model
to "capitalize" with the c~ term in (2.42) the effect of future, out of sample, controls
on the solution of the model.

Example 3.

To conclude this section we will extend a numerical example to demonstrate the Fair-
Taylor approach and the BK approach. Let us take a simple macro model with output
pt, consumption ct, investment it, government expenditures 9t, and taxes taxt. The
corresponding state vector has the form zt = [Yt ct it 9t taxt]' and the control vector
ut = [9t+1]. The L Q C M is then like:
Find for the system

Yt+l = et+l q- i t + l q- g t + l ,

ct+l = 0.8(yt - t a x t ) + 200,

it+l = 0 . 2 E t y t + 2 + 100, (2.43)
gt+ 1 ~ '~t,

taxt+l = 0.25yt+l

with xo = [ 1 5 o o 1100 400 0 375]', a set of admissible controls U = { g l , . . . , 97} so

as to minimize the quadratic penalty function

J7 = ~1 ~ [(Yt- 1600) 2 q- d] (2.44)

and Joo for the time consistent/expectation consistent optimal policy. Applying Algo-
rithm 2, setting ETXT+1 = ETXT+2 = xT, from this section we obtain
602 H. A m m a n

Solution of LQ control model by Algorithm 2

t 0 1 2 3 4 5 6 7 8 9 10 cx~
y 1500 1567 1585 1 5 9 1 1 5 9 2 1592 1 5 9 3 1 5 9 3 1 5 9 2 1590 1585 1592.59
g 53 28 21 20 19 19 19 19 19 16 18.52
Jl0 = 8962

Solution of LQ control model by Blanchard and Kahn method

t 0 1 2 3 4 5 6 7 8 9 10 cx~
y 1500 1571 1589 1 5 9 3 1 5 9 5 1 5 9 5 1 5 9 5 1 5 9 5 1 5 9 5 1 5 9 3 1587 1596.15
g 53 28 21 20 19 19 19 19 19 16 19.23
Jlo = 8902

So in this example, clearly the BK method performs differently then the Fair-Taylor
method, due to different handling of the boundary conditions.

2.2. Linear quadratic control: Continuous time

The power of the LQ model in the discrete time case lies in its practical relevance.
Most econometric models are models in discrete time and consequently the discrete
version of the LQ model can be used in modeling a policy evaluation problem.
However, for analytical models more often the continuous version of the LQCM is
used. Various examples can be found in Tapiero and Sulem (1994), Bergstrom (1987).
The continuous version of the LQCM is formulated as follows:
Find for the linear system

Sc(t) = A ( t ) x ( t ) + B ( t ) u ( t ) + C ( t ) z ( t ) (2.45)

or simpler

Jc = A x + B u + C z (2.46)

with x(0) given and t E [0, T], an admissible path of the control vector u(t), which
minimizes the quadratic penalty function

J =
/0 L ( x , u) dt + L T ( x ( T ) ) (2.47)


L = !2( x - Yc)'Q(x - Yc) + (x - Yc)'F(u - ¢~) + '-(u

2 - ~ ) ' n ( u - ~),

L T -- ½ ( x ( T ) - Y c ( T ) ) ' Q ( T ) ( x ( T ) - ]c(T)). (2.48)

Ch. 13: Numerical Methods .fi)r Linear-Quadratic Models 603

The solution of the continuous LQ model can be obtained in several ways. The most
common approaches are the Maximum Principle, Pontryagin (1964), or the Hamilton-
Jacobian-Bellman equation (HJBe). Following the latter we can define the dynamic
optimization problem in a similar fashion as in Eq. (2.5). The HJBe for the above
problem, dropping the time variable for the sake of convenience is

~t { iron L ( x , u ) +
(Ax + S u + Cz) }
. (2.49)

Differentiation with respect to the control vector yields the feedback rule

~z " (2.50)

If we insert the optimality condition lbr the control vector into the HJBe we end
up with a partial differential equation (PDE) that can be solved through numerical
techniques directly. In doing so, we obtain the time path of the functional J and its
numerical derivatives ~J/~t and ~J/~x, enabling us to compute the time path of
the optimal control vector. Numerical methods for solving this type of PDE are the
so-called ADI method and the Crank-Nickelson method. Good references for solv-
ing PDEs are found in Ames (1977) and Sewell (1988). Easy accessible references
are Burden and Faires (1989), Cheney and Kincaid (1985) and Myint-U (1987). Ap-
plications that solve PDEs numerically are mainly found in the finance literature,
especially for solving option models like the Brennan and Schwartz bond model [cf.
Foster (1989) and Hull (1989)1. In our case there is a simpler way of finding the
solution for the control vector by assuming a solution of the functional of the form
[cf. Chow (1979) or Tapiero and Sulem (1994)]

J = l x ' K ( t ) x + p(t)'x + h(t) (2.51)

K, p, and h being time dependent variables. By inserting (2.45) and the appropriate
derivatives of (2.48) in (2.46) we can derive a set of equations which have a similar
form as the Eqs in (2.6)-(2.11). These equations are

- K = Q - (F + K B ) R - 1 ( F ' + B ' K ) + A ' K + K A , (2.52)

-!3 : ( K B + F ) R - I ( F ' ~ - B'p) - Q~? + K ( C z + B~) + A'p, (2.53)

: ½ 'm + + 'Fa + p'Cz. (2.54)

Given the set of initial conditions

K(T) = Q(T),
604 H. Amman

p(T) .... Q(T)9(T),

h,(T) : ½2(T)'Q(T)9(T) (2.55)

we can integrate these equations backward in time. Once we have obtained the solution
we can compute the optimal path of the control vector from z(0) by using

u* : 9 + Gcc (2.56)

where now 9 : ¢z + R - l (F'Yc- B'p) and G = - R -1 ( F ' + B ' K ) . Note that (2.51) is
the continuous tim e counterpart of the Eq. (2.12). Instead of computing the solution of
the HJBe directly we now have to solve a set of ordinary differential equations (ODEs)
in (2.49), which is much simpler. Appropriate ways to solve (2.49) are, for instance, a
Runga-Kutta scheme as a good workhorse or a more sophisticated Predictor-Corrector
scheme. See Press et al. (1986) or Stoer and Burlirsch (1980).

3. Stochastic control

3.1. Stochastic linear quadratic control: Discrete time

Essential in stochastic control experiments is the way in which information is handled.

With information we mean the way we deal with new measurements of the system,
the economic model, we use for our control experiments. The three most common in-
formation structures in control theory are Certainty Equivalence (CE) with or without
stochastic parameter updating, Passive Learning or Open Loop Feedback (OLF) and
Active Learning or Dual Control (DUAL).
The Certainty Equivalence concept with parameter updating uses, at any given time
period t, new measurements of the system to reestimate the stochastic parameter vector
of the system, Or. Once this parameter vector is obtained a new set of optimal controls
for the remaining periods, {u~'+l,..., U~_I} , is computed, ignoring the uncertainty
connected to these stochastic parameters.
In the Passive Learning concept the parameter uncertainty is taken into account
when computing the optimal controls. The uncertainty of the stochastic parameters is
captured by the covariance matrix of these parameters, St°°. This covariance matrix
gives information on the level of uncertainty in the system equation. It is evident that
the degree of uncertainty in the system equation determines the degree to which the
desired path can be followed. By using S °° we can distinguish more risky policy
outcomes from less risky policy outcomes. To put it in a economic perspective, a
less risky tax reduction might be preferred over a more risky monetary expansion in
pursuing a GNP target.
Ch. 13: Numerical Methods for Linear-Quadratic Models 605

Active Learning even goes a step further. With Active Learning the policy-maker
will, at a given time step ~, try to use the controls of the system, ut, to reduce the
so-called Cost-to-Go, which can be interpreted as the expected value of the value func-
tion discussed in the previous section [cf. Bertsekas (1976) and Kendrick (1981)]. The
Cost-to-Go is the anticipated cost that will be incurred in the system for the remaining
periods, t + 1 , . . . , T . E l e m e n t s which have an important effect on the Cost-to-Go are
the magnitudes of the covariance matrices { Z ° ° l , . . . , Z °°} and { Z ~ l , . . . , ~U~}. 12
By reducing the magnitude of these matrices the uncertainty in the state variable
outcomes is reduced. With Active Learning the controls have a dual character in the
sense that they are used for controlling the system along the desired path, but also
for reducing the uncertainty in the system. A reduction in the system uncertainty
may improve the level of control over the system and hence give a better perfor-
mance. Because of space limitations we will not present all the "equational details"
for computing the optimal controls, but concentrate on the conceptual and algorithmic
scheme for solving the stochastic LQCM under the REH. The equational details are
presented in Kendrick (1981 a, 1981 b) and Amman and Kendrick (1991). Other ref-
erences on stochastic control and rational expectations can be found in Basar (1988,
1989a, 1989b) and Rustem (1990).

Certainty equivalence

The standard stochastic quadratic linear tracking problem is written as

Find for the linear system

xt+, = A t ( O t ) x t + Bt(Ot)ut + Ct(Ot)zt + ~ D j , t ( O t ) E t x t + j + et (3.1)

with EOo c R p, E x o C R n given, t E {0, 1 , . . . , T } , a set of admissible controls

U = {u0, u l , . . . , u T - i } so as to minimize the quadratic penalty function

JT -- t~, L T ( X T ) -l- L t ( z t , ut) (3.2)


LT = ½(xT -- -

n t = ½(xt - Yct)'Qt(xt - xt) + (xt - :~t)'Ft(ut - ut) (3.3)

12Strictly speaking the matrices LjZ00t, ' " , ~tx x } have a prior estimate {Ntlt_l,
" " ' ~tlt-I}
a posterior estimate {Et°l°t,... , Ztl t }. For notational convenience we have left out this distinction. The
same holds for Ot and xt.
606 H. Amman

The variables have the same meaning as in Section 2. In addition we use the following
variables: Ot C ]Rp a time varying stochastic parameter vector containing the elements
of the matrices At, Bt, Ct, Dj,t, and ~t E R n as a random term vector. As in the
previous section, we have augmented the system equation with rational expectations.
The variable Etxt+ j is the expected value of the state variable at period t + j as
projected from period t. The solution of the augmented system in the stochastic
version is derived in roughly the same way as in the deterministic model. Should
convergence be obtained for all the expectational variables in all the time periods, then
the condition Vt, Vj tx~+ i = E t x t + j will hold and the solution conforms to the REH.
The possibility of measurement error is also entertained here. These measurement
relations are

Yt = Ht(Ot)xt + ~t (3.4)

where Yt E R z is a measurement vector, Ht c R lx~ a measurement coefficient matrix

and ~t c R t is a measurement error vector. Finally, the stochastic parameter vector
Ot cannot only be uncertain, but may also follow a first-order (time-varying) Markov

Ot+l = f2tOt + tit (3.5)

where [2t E R pxp is a known Markov transition matrix and r/t E R s a time-varying
parameter error vector. The vectors et, Ct, ~Tt, x0, 00 are assumed to be mutually inde-
pendent, normally distributed, random vectors with known means and covariances. In
addition we assume that xo ~ N ( E x o , Z ~ ) , Oo N N(EOo, zoo), et ~ N(O, Z ~ ) ,
~t ~ N ( 0 , Z ~ ) and tit ~ N(O, Zvv). Furthermore, E x o E R n is the expected state
vector at initial period, EOo E IRt is the expected parameter vector at the initial period,
Z ~ x c R r~xn is the covariance matrix for the initial period state variables, Z oo ~ ]Rtxl
is the covariance matrix for the initial period parameter estimates, Z ~ E R r~xn is
the covariance matrix for system disturbances, Z ~( E R pxp is the covariance ma-
trix for measurement disturbances, Z ~ c R sxs is the covariance matrix for Markov
disturbances. Ezo, EOo, Z ~ , Z ~ , ~ v , Z ~ x and Z oo are assumed to be known.
For proper understanding, if we set ~2t = I , ~U'm = 0 we have a simple OLS es-
timation problem for the parameters. If f2t = I, 2 2 ~ > 0, the parameters follow a
random walk. The matrices Z ce, Z ¢¢, Z:nv can be estimated using a Full Information
Likelihood Estimation procedure, Cuthbertson et al. (1992).
If we rewrite the system equation in the form of Eq. (2.24), the model is in the
format of the standard control model without rational expectations. We will not discuss
all the computational details when deriving the solution of the optimal controls, U*.
The details can be found in Kendrick (1981a, 1981b). We will concentrate on the
solution concept for the rational expectations and their interaction with the optimal
control vector and the updating of the parameter vector. We apply the Fair-Taylor
Ch. 13: Numerical Methods .fi)r Linear-Quadratic Models 607

iterative procedure from the previous section, this is the inside loop of the algorithm
shown in Algorithm 3, steps 4-12.
Consider next the outside or time-period loop in Algorithm 3, step 2 and 14. In the
Certainty Equivalence procedure the new observation of the state variables is obtained
at the end of each time period and this is used to update the stochastic parameter vector.
This updating is done with the Kalman filter as described in Kendrick (1981, p. 90).
Moreover, there is an additive random shock et in the system equation and this is
added to the state variable for period ~ + 1.
Algorithm 3. Certainty equivalence method
1 input ep, Exo, EOo, set EtxO+j = Etxt+j
,NL Vt E [0, T - 1], Vj c [0, hi,
2 for t = 0, 1 , . . . , T - l, repeat /* time loop */
3 set terminate_exp = false
4 for v = O, 1 , . . . , repeat /* expectations loop */
5 set Cl and ~ , Vl E [~, T],
6 compute U~ +1, x:~t
]t~ .,v+l
a~l+ 1 Vl C [t, T],
7 if I I E t x ~ ÷ 1 - E~x~ll/lIE, x? II < ~p , VZ ~ It, T] then
8 set terminate_exp = t r u e
9 else
10 Etx?+j = "~t~z+j~ - ~ + ' , V1 E [t, T], Vj E [0, k],
11 endi f
12 end until terminate_exp
13 measure y t + l /* move to next time step */
14 compute EO~ + l , E x ~ + l , S ° t ° l , S ~ 1 /* reestimation based on measurement */
15 end

Passive learning

We use the name Passive Learning here to refer to the method which is called Open
Loop Feedback in Bertsekas (1976) and Kendrick (1981). In this method the uncer-
tainty in the parameters is used for choosing the control variables in each period,
but no consideration is given to the influence of this choice on future learning. The
system equation for this approach is the same as in the certainty equivalence case,
Eq. (3.1). The stochastic parameter vector in (3.1), 0t, has a covariance matrix S oo
obtained when estimating the system equation. When rational expectations are added
to the system equation the result is the same as in the Certainty Equivalence method.
The algorithm is somewhat more complicated in the Passive Learning case than in
the Certainty Equivalence case. Not because there are more loops in the algorithm,
but rather because it is necessary to discuss more of the loops. In the Certainty
Equivalence case it was not necessary to discuss the iteration backward across time
periods, which is done as the Riccati equations are integrated backward in time from
the terminal conditions to the initial period. Neither was it necessary to discuss the loop
608 H. Amman

for the integration of the system equation from the initial conditions to the terminal
conditions. However, both of these additional loops are displayed in Algorithm 4.

Algorithm 4. Passive learning method

1 input %, Exo, EOo, set EtxOt+j = EtXt+jNL, Vt C [0, T - 1], Vj C [0, k],
2 for t = 0, 1 , . . . , T - 1, repeat /* time loop */
3 set terminate_exp = false
4 for v = 0, 1 , . . . , repeat /* expectations loop */
5 set Cz and 2~, Vl E It, T],
6 f o r i = t , . . . , T - 1 repeat /* forward projection */
• . r~OO~v r~XX~V r~ ~ V
7 project 2..,i+ 1 : 2 a i + 1 , 125tUi+ 1
8 compute Etx~+ l , u v
9 end_/
10 if IlE~x~ + ' - E , ~ x ~ l l / l l E ~ x ~ l [ < c p , vz ~ [t,T] then
11 set terminate_exp = t r u e
12 else
13 v j
EtXl+ . .:v+,j , V1 e [t, T], Vj C [0, k],

14 endi f
15 end until terminate_exp
16 m e a s u r e Yt+l /* move to next time step */
17 compute EOt+l, Ext+l, ~°° 1, 12~ 1 /* reestimation based on measurement */
18 end_t
In order to have a consistent solution for the Passive Learning case it is necessary
at any time step tto integrate the covariance matrices forward in time to the terminal
period. This forward integration occurs in the system loop shown in Algorithm 4 and
is done given all the information available at the time. The system loop is embedded
in the rational expectation iteration loop, so the system loop must be repeated with
each iteration until a consistent solution is obtained for Eq. (3.1) at each period <
The Fair-Taylor iterations are done in the same way as for the Certainty Equiva-
lence case, except that a Passive Learning problem is solved at each iteration until
convergence is obtained. The outside or time-period loop is similar to the Certainty
Equivalence procedure, except that after each observation is obtained it is necessary
to update not only the mean values of the stochastic parameter estimates, 0t, but also
the covariance matrix Z °°. These covariance estimates are needed to compute the
solution of (3.1)-(3.5).

Active learning

in this method one considers at a given time step not only the present uncertainty
in the stochastic parameter estimates, Z oo , but also the potential impact of the
choice of the controls on the future uncertainty in the stochastic parameter estimates,
{ Z ° ° ~ , . . . , Z°°}. The algorithm considers the possibility of improving the estimates
Ch. 13." Numerical Methods fi)r Linear-Quadratic Models 609

of the stochastic parameters, thus increasing the control over the system. By employ-
ing the control ut, we could try to reduce tsS00 t + l , . . ' , Z T00 }- The criterion function
and system equation for the Active Learning approach is the same as for the other
two methods, Eqs (3.1)-(3.3).
The model can be solved by using the Active Learning algorithm which is outlined
in Kendrick (1981, Chapter 10). The algorithm for is even more complicated in the
Active Learning case than in the Passive Learning case. This time we have once
again one extra loop in the algorithm. This loop is the search loop, as is shown in
Algorithm 5.

Algorithm 5. Active learning method

1 input ep, Exo, EOo, set Etx°+j = EtXt+jNL,Vt • [0, T 1], Vj E [0, hi,
2 for t = 0, 1 , . . . , T - 1 repeat /* time loop */
3 set terminate_exp = false
4 for v = 0, 1 , . . . , repeat /*expectations loop */
5 set Cl and ~ , Vl c It, T],
6 set terminate_search = false
7 for j = 0, 1 , . . . , repeat /* search loop */

8 m i n i m i z e E t ~ ~_~(Jd,l
T + J~,z + J,,z) } /* evaluate C o s t - t o - G o */

9 fori=t,...,T-1 repeat /* forward projection */

10 project ~y:OO,v
/+1 ~,v
' z ~ i + l ' EtOV+l
11 c o m p u t e EtxV+l ' u vi(iCt)
12 end_/
13 if Ilu3t•+l'~ - u t 5,~ll/llu~,~ll
< @/2 then
14 set terminate~earch = t r u e /* optimal control */
15 else
16 u~+l,~ =
j,v ,
17 endi f
18 end until terminate_search
19 if I I E ~ z ~ +~ - E~x?ll/llEtz~ll < ~ , Vl • [t,T], then
20 set terminate_exp : t r u e
21 else
22 Etx~+j = ~txt+ v+l, VI • It, T], Vj C [0, k],
23 endi f
24 end until terminate_exp
25 measure Yt+~ /* move to next time step */
26 c o m p u t e EOt+l, E x t + l , S ° ° l , S~.~_! /* reestimation based on
measurement */
27 end_t
610 H, Amman

The search loop is used in the dual control algorithm to search for the optimal value
of the control vector u~, which strikes the best balance between: (i) perturbations
which can be used to reduce future uncertainties in the parameter estimates; and (ii)
control actions which keep the state and control variables on the desired paths. Each
time a search value for the control vector is chosen it is necessary to integrate the
Riccati equations backwards in time and the system equation forward in time; so these
two loops are shown in Algorithm 5.
The rational expectations iterations are done in the same way as for the Passive
Learning case except that an Active Learning problem is solved at each iteration until
convergence is obtained. The outside or time-period loop is similar to the Passiye
Learning procedure, except that after each observation is obtained it is necessary to
update not only the mean values of the stochastic parameters, but also the covariance
matrices. As with the Passive Learning case an extended Kalman filter is used for
this purpose.

Related issues

The Active learning strategy is the most mathematically sophisticated learning strat-
egy of the strategies presented here. However, there are a number of drawbacks
connected to the Active Learning strategy. First, it is highly computational intensive,
and second, the Active Learning strategy might suffer from nonconvexities. We have
encountered this nonconvexity phenomenon in a number of our simulation studies,
Kendrick (1978), Amman and Kendrick (1993, 1995a). This is caused by the fact
that in the Active Learning case the algorithm tries to dimish future (projected) un-
certainty of the system by perturbing the current state of the system. This produces
a trade-off between future gains, because of better learning, and current loss due to
the perturbation. A possible solution to the nonconvexity problem is to use a global
solver. Goffe et al. (1992) recently applied the global solving method of Simulated
Annealling for a number of nonconvex likelihood functions and Jerrell (1994) applied
interval analysis to the same set of functions. Though computationally intensive, both
methods seem to be promising when dealing with nonconvex functions.
An other issue concerns the Lucas critique, Lucas (1976). The Lucas critique states
that in deriving the optimal strategy through a control model, the future value of
the parameters can be effected by these controls. As a result any deterministic type
of control model may deviate from the optimal path because this parameter-drift
effect is ignored. The way in which the control model is specified in Eqs (3.1)
and (3.2) explicitly allows for the parameters to change over time. In computing
the expectations-consistent strategy we incorporate the effect that our controls may
have on the stochastic parameters of the system. In the forward projection we in-
clude the effect of our controls on future learning of our parameters. The way the
system Eq. (3.1) is set up, the effect of the rational expectations are an integral part
of the learning process of the parameters. The presence of rational expectations in
Ch. 13: Numerical Methods fi)r Linear-Quadratic Models 611

Eq. (3.1) allows the policy maker to learn the effect expectations may have on the
future value of the parameters. By embedding a feedback mechanism from future
states expectations on current parameter values, we escape to some extent the Lucas

Example 4: The Taylor model

The example we will examine in this section is the Taylor model, Taylor (1979).
The Taylor model is an empirical model, with the advantage that it supplies all the
necessary econometrics to carry out a stochastic control simulation on a real-life model
with rational expectations. The stochastic control version of the Taylor model has the
following form:
Find for the system

Yt+l = 01,ty~ + 02,ty~-i + 03,tht+l + 04,tht + 05,tEtTh+1

+ 06,tt + Oo,t - 0.38wt + el,t,
7rt+l = 7rt + 08,tEtYt+l + 09,t + Wt+l - 0.67wt + c2,t, (3.6)
qd3t+l : Z3,t


Yt = output level,
7rt : rate if inflation,
ht = real money balance,
w~ = random term used for transformation


EOo - [1.t67 - 0.324 0578 - 0.484 - 0.447 8.43E-0.5 0.072 0.018 5.15E-0.4]'

and E x o = [yo 7to wo]' = [0.1 0 0], the set of admissible controls U = { h o , . . . , hlo}

{9 }
to minimize

Jxo : + + (3.7)


- 0.79E-2 0 0 ]
~EC = 0 1.0E-32 0 ] ,
0 0 0.37E-2
612 H. Amman

27,vv = S ¢¢ : 22 ~x : 0, Ht : f2t = I , EtZT+l ----EtXT,

(3.77E-2 -0.68E-2 -0.39E-2 0.31E-2 --0.10E-2 0.16E-5 0.59E-3 0.41E-5 0.15E-6"

0.81E-2 0.54E-2 --0.61E-2 0.42E-2 0.38E~5 0.51E-3 0.43E-4 0.71E-6
0.31E-1 --0.33E-1 0.39E-1 --0.19E-5 0.20E-2 0.17EM. 0.31E~5
0.38E-1 --0.47E-1 -0.58E-6 -0.36E-2 --0.llE~- --0.25E-6
~¢qq = 0.96E-1 --0.32E-5 --0.56E-2 0.31E~ 0.76E-6
0.61E-8 0.18E-5 -0.10E-7 --0.26E-10
0.12E-2 --0.39E-5 --0.58E-8
0.34E-4 0.65E-6

In t h e figure b e l o w we h a v e p l o t t e d the m o n e y s u p p l y u n d e r the v a r i o u s strategies

GNP in t h e Taylor Model

Value GNP

0.45 - - G N P Active Learning

O.4 - - G N P Passive Lenming







(I i I
5 9 l(}


Money Supplyin the Taybr model

~ ' - e k , . 1 + MoneySupplyActive Leming ]
---e-- MoneySupplyPassive L~,-~ng]
-0,4 ]~ Money supplyCE ]
o= -0,45
-0,55 1 2 3 4 ~"--..,~,~,~,~ 6 7 8 9
Ch. 13: Numerical Methods .fbr Linear-Quadratic Models 613

Below you find the Monte Carlo results (10000 runs) of the various strategies.

Performance Taylor model

Strategy Mean Standard deviation % Score
No action (ut = 0) 0.9737 0.0054 3.1
Certainty equivalence 0.1629 0.0014 22.3
Passive learning 0.1614 0.0014 63.3
Active learning 0.1620 0.0014 11.3

In the Taylor model the Passive learning strategy turns out to be the best strategy in
63.3 percent of the simulation runs, although the difference with the Active learning
strategy is not very significant. This is a somewhat disappointing result, considering
the extra computational complexity involved in the Active. learning strategy. However,
this result is consistent with results we encountered earlier with other macro models.
If the level of uncertainty in the model is modest, perturbing the economic model in
order to improve the quality of the estimation normally does not seem to pay off in
terms of lower loss function values.

3.2. Stochastic linear quadratic control: Continuous time

In the last part of this section on LQCM we will derive the continuous version of
the SLQCM under the assumption of certainty equivalence. If we add uncertainty
to the model in (2.43) we end up with a stochastic differential equation. The the-
ory of stochastic differential equations is exhaustively dealt with in Arnold (1974)
and Oksendal (1985). Various examples can be found in Chow (1979), Kamien and
Schwartz (1981), Malliaris and Brock (1982), Bergstrom (1987) and Amman and
Velden (1992). Computational aspects are discussed in Kushner and Dupuis (1992)
and Tapiero and Sulem (1994). The continuous version of the LQCM is formulated
as follows:
Find for the linear system

dx = (Ax + Bu + Cz) dt + ~l/2dv (3.8)

with x(0) given and t < [0, T], an admissible path of the control vector u(t), which
minimizes the expected value of the quadratic penalty function

Y Eo{~oTL(x,u)dt+LT(x(T))} (3.9)

the function L having the same shape as in Eq. (2.44). The system noise v follows
a Brownian motion with usual properties E(dv), E(dv 2) = dt, and the matrix yjl/2
is the cholesky decomposition of the system noise covariance matrix Z, such that
614 14. A m m a n

= ($1/2)~$1/z. The solution of the stochastic model can once again be obtained
through the Hamilton-Jacobian-Bellman equation (HJBe). Following the latter we
can define the dynamic optimization problem in a similar fashion as in Eq. (2.5). The
HJBe for the above problems is

Ot - - m i n L(x, u) + OJ
-~x (Ax + Bu + Cz) + tr~Z-~--~zz) (3.1o)

Like in Paragraph 2, if we insert the optimality condition for the control vector into the
HJBe we end up with a partial differential equation (PDE) that can be solved through
numerical techniques directly. In doing so, we obtain the time path of the functional J
and its numerical derivatives OJ/Ot and OJ/Ox, enabling us to compute the time path
of the optimal control vector. Like for the deterministic model we assume a solution
of the type

J = ½x'K(t)x + p(t)'x + h(t).

The only difference between the deterministic solution in Paragraph 2.2 and stochastic
one is that the h(t) term gets an additional component

' ^' Qx^' + :~'F~ + p'Cz + ½tr(ZK)

- h = ½~'R~ + ~z (3.11)

which adds the covariance matrix to the solution. The other equations of (2.49) remain
unchanged. The solution we obtain through (2.49), (2.50) and (3.12) is the solution for
the control vector u(t) that gives the lowest expected value for the penalty function.
As (2.51) is independent of the h(t) term, the optimal control vector is independent
of the covariance matrix Z. Hence, the solution is of a certainty equivalence type.
For Monte Carlo purposes it is possible to produce a sampling x(t) by simulating
(3.9) with a random term. The Brownian random term can be realized by setting
dv = ev/-~, where ~ ~ N(0, 1). With the help of (2.51) and the Brownian motion
component we can propagate (3.9) through time and derive the path of the state vector
x(t) for different sets of random numbers.

4. Summary
In this chapter we have presented the deterministic and stochastic version of the
Linear-Quadratic Control Model, with a number of extensions, as often applied in the
various fields of economics. This type of modeling has not reached its final point yet.
At present there are two different branches of research that need further investigation.
The stochastic version still has some major drawbacks that have to be worked upon.
For instance, all through Section 3 we have assumed that the random components were
Ch. 13: Numerical Methods Jbr Linear-Quadratic Models 615

of a Gaussian nature. In a large number of cases this may be a considerable drawback.

Especially for financial modeling where you know with a high degree of certainty
that most of the phenomena you describe do not follow a Gaussian distribution.
Consequently, the stochastic version of the LQCM has to be extended for other than
Gaussian distributions. Monte Carlo integration, as described in the chapter of John
Geweke, may be the right vehicle for doing this, see Kendrick (1995).
Another element is the nonconvex character of stochastic dynamic optimization
problems. In dealing with a stochastic optimization problems you basically want to
control the deterministic part of the model and the stochastic elements inherent to the
problem. In general, there might be a trade off between those two components, giving
rise to a nonconvex structure of the objective function.

A note on software

The numerical examples presented in Section 2 of this chapter are written in GAUSS 13
and the examples in Section 3 are done through the computer program DUAL. The
examples in both GAUSS and DUAL can be obtained from the author.


Ames, W.E (1977) Numerical methods fi~r partial differential equations. New York: Academic Press.
Amman, H.M. (1986) 'Are supercornputersuseful for optimal control experiments?',Journal of Economic
Dynamics and Control, 10:127-130.
Amman, H.M. (1990) 'Implementingstochastic control software on supercomputingmachines', Journal of
Economic Dynamics and Control, 14:265-279.
Amman, H.M. and Jager, H. (1987) 'Optimaleconomicpolicy under a crawlingpeg exchange rate system',
in: C. Carraro and D. Sartore, eds, Developments of control theory for economic analysis. Dordrecht:
Khiwer Academic Publishers.
Amman, H.M. and Jager, H. (1988) 'A constrained control algorithm for non-linear control problems',
International Journal of System Science, 19:1781-1794.
Amman,H.M. (1989) 'Nonlinearcontrol simulationon a vector machine', Parallel Computing, 10:123-127.
Amman H.M. and Kendrick, D.A. (1991) 'An user's guide for DUAL: A program for quadratic-linear
stochastic control problems', TechnicalPaper 90-4, Austin,TX: Center of Economics Research.
Amman H.M. and Kendrick,D.A. (1992a) 'Parallelprocessingfor large-scalenonlinearcontrolexperiments
in economics', The International Journal of Supercomputer Applications, 5:90-95.
Amman H.M. and Kendrick,D.A. (1992b) 'Forward lookingvariables in deterministiccontrol', Annals of
Operation Research, to appear.
Amman, H.M. and Kendrick, D.A. (1993) 'Forward looking variables and learning in stochastic control',
The International Journal of Supercomputer Applications, 7:201-211.
Amman H.M. and Kendrick, D.A. (1994) 'Active learning;Some empiricalresults', Journal of Economic
Dynamics and Control, 18:119-124.
Amman H.M. and Kendrick, D.A. (1995a) 'Nonconvexitiesin stochastic control models', International
Economic Review, 36:455-475.

13GAUSS is a product of Aptech Systems, Inc, Maple Valley, U.S.A.

616 H. Amman

Amman, H.M. and Kendrick, D.A. (1995b) 'Solving stochastic optimization models with learning and
rational expectations', Economic Letters, 48:9-13.
Amman, H.M. and Neudecker, H. (1995) 'Numerical solutions of the algebraic matrix Riccati equation',
Research Memorandum University of Amsterdam.
Amman, H.M. and van Velden, L.M.T. (1992) 'Exchange rate uncertainty in imperfect markets: A simulation
approach', in: H.M. Amman, D.A. Belsley and L.E Pan, eds, Computational economics and econometrics.
Dordrecht: Khiwer Academic Publishers.
Aoki, M. (1976) Optimal control and system theory in dynamic economic analysis. New York: North-
Aoki, M. (1989) Optimization qf stochastic" systems: Topics in discrete-time dynamics. Boston: Academic
Arnold, L. (1974) Stochastic differential equations: Theory and applications. New York: Wiley.
Basar, T. (1988) 'Solutions to a class of nonstandard stochastic control problems with active learning',
IEEE Transactions on Aromatic Control, 33:1122-1129.
Basar, T. (1989a) 'Dynamic optimization of some forward looking stochastic models', in: A. Blagviere,
ed., Modeling and control systems. Heidelberg: Springer, pp. 121,315-336.
Basar, T. (1989b) 'Some thoughts on rational expectations models and alternative formulations', Computers,
Mathematics and Applications, 18:591-604.
Bellman, R. (1957)Dynamic programming. Princeton, N J: Princeton Univ. Press.
Becker, R.G., Dwolatzky, B., Karakitos, E. and Rustem, B. (1986) 'The simultaneous use of rival models
in policy optimisation', The Economic Journal, 96:425-448.
Becker, R., Hall, S. and Rustem, B. (1994) 'Robust optimal decisions with stochastic nonlinear economic
systems', Journal of Economics Dynamics and Control, 18:125-148.
Bergstrmn, A.R. (1987) 'Optimal control in wide-sense stationary continuous-time stochastic models',
Journal of Economic Dynamics and Control, 11:425-443.
Bertsekas, D.E (1976) Dynamic programming and stochastic control. Boston: Academic Press.
Blanchard, O.J. and Kahn, C.M. (1980) 'The solution of linear difference models under rational expecta-
tions', Econometrica, 48:1305-1311.
Burden, R.L. and Faires, J.D. (1989) Numerical analysis. Boston: PWS-KENT Publishing Company.
Cheney, W. and Kincaid, D. (1985) Numerical mathematics and computers. Pacific Grove: Brooks/Cole
Publishing Company.
Chow, G.C. (1975) Analysis and control of dynamic economic systems. New York: Wiley.
Chow, G.C. (1979) 'Optimum control of stochastic differential equation systems', Journal of Economic
Dynamics and Control, 3:143-175.
Chow, G.C. (1981) Econometric analysis by control methods. New York: Wiley.
Chow, G.C. (1983) Econometrics. Tokyo: McGraw-Hill.
Cohen° D. and Michel, E (1988) 'How should control theory be used to calculate a time-consistent gov-
ernment policy?', Review of Economic Studies, 55:263-274.
Coleman, T.F. and Van Loan, C. (1988) Handbook.fi)r matrix computation. Philadelphia, PA: SIAM.
Cuthbertson, K., Hall, S.G. and Taylor, M,E (1992) Applied econometric techniques. Hemel Hempstead:
Philip Allan.
Fair, R.C. (1984) Specification, estimation and analysis of macroeconometric models. Cambridge, MA:
Harvard Univ. Press.
Fair, R.C. and Taylor, J.B. (1983) 'Solution and maximum likelihood estimation of dynamic rational
expectations models', Econometrica, 51 : 1169-1185.
Fisher, EG. (1992) Rational expectations in macroeconomic models. Dordrecht: Kluwer Academic Pub-
Fisher, EG., Holly, S. and Hughes Hallett, A.J. (1986) 'Efficient solutions techniques for dynamic non-linear
rational expectations models', Journal of Economic, Dynamics and Control, 10:139-145.
Foster, G.H. (1989) 'Bond pricing in APL2: A study in numerical solution of the Brennan and Schwartz
bond pricing model using vector processor', Computational Economics, 2:179-198.
Ch. 13: Numerical Methods for Linear-Quadratic Models 617

Goffe, W.L., Ferfier, G.D. and Rogers, J. (1992) 'Simulated annealing: An initial application in economet-
rics', Computational Economics, 5:133-146.
Holly, S., Rustem, B. and Zarrop, M.B. (1979) Optimal control.fi)r econometric models: An approach to
economic policy formulation. New York: St. Martin's Press.
Holt, C.C. (1962) 'Linear decision rules for economic stabilization and growth', Quarterly Journal qf'
Economics, 76:20M5.
Hotelling, H. (1931) 'The economics of exhaustible resources', Journal ¢)f'PoIitical Economy, 39:137-175.
Hughes Hallett, A.J. (1987) 'Forecasting and policy evaluation in economies with rational expectations:
The discrete time case', Bulletin q[ Economic Research, 39:40-70.
Hughes Hallett, A.J. and Rees, H. (1983) Quantitative economic policies and interactive planning. Cam-
bridge, UK: Cambridge Univ. Press.
Hull, J. (1989) Options, futures and other derivative securities. Englewood Cliffs, NJ: Prentice-Hall.
Jamshidi, M. and Malek-Zavarei, M. (1986) Linear control systems: A computer aided approach. Oxford:
Jerrell, M. (1994) 'Global optimizatiou using interval arithmetic', Computational Economics, 7:55-62.
Kalaith, T. (1980) Linear systems. Englewood Cliffs, NJ: Prentice-Hall.
Kamien, M.I. and Schwartz, N.L. (1981) Dynamic optimization: The calculus q{ variations and optimal
control in economics and management. Amsterdam: North-Holland.
Kendrick, D.A. (1973) 'Stochastic control in macroeconomic models', IEEE Con/brence Publication,
Kendrick, D.A. (1978) 'Nonconvexities from probing in adaptive control problems', Economics Letters,
Kendrick, D.A. (1981a) 'Control theory with applications to economics', in: KJ. Arrow and M.D. lntrili-
gator, eds, Handbook on mathematical economics. Amsterdam: North-Holland, Chapter 4.
Kendrick, D.A. (1981b)Stochastic control./or economic models. New York: McGraw-Hill.
Kendrick, D.A. (1995) 'Ten wishes', Computational Economics, 8:65-80.
Kushner, H.J. and Dupuis, P.G. (1992) Numerical methods for stochastic control problems in continuous
time. New York: Springer.
Kwakenmak H. and Sivan, R. (1972) Linear optimal control systems. New York: Wiley.
Lronard, D. and Van Long, N. (1992) Optimal control theory and static optimization in economics. Cam-
bridge, UK: Cambridge Univ. Press.
Lucas, R.E. (1976) 'Econometric policy evaluation: A critique', in: K. Brunner and A.H. Meltzer, eds,
The Phillips curve and labor markets. Supplementary Series to the Journal of Monetary Economics, pp.
McGratten, E.R. (1994) 'A note on computing competitive equilibria in linear models', Journal qf Eco
nomics Dynamics and Control, 18:149-.160.
MacRea, E.C. (1972) 'Linear decision with experimentation', Annals o{ Economic Social Measurement,
Malliaris, A.G. and Brock, W.A. (1982) Stochastic methods in economics and finance. Amsterdam: North-
Mangasarian, O.L. (t966) 'Sufficieny conditions for the optimal control of nonlinear systems', SlAM
Journal on Control, 4: t 39-t 52.
Modi, J.J. (1988) ParaUel algorithms and matrix computation. Oxford: Clarendon Press.
Muth, J.K (1961) 'Rational expectations mad the theory of price movements', Econometrica, 29:315-335.
Myint-U, T. (1987) Partial d!f/erential equations for scientists and engineers. Amsterdam: North-Holland
Norman, A.L. (1976) 'First order dual control', Annals (~f'Economic and Social Measurement, 6:437M47.
Neck, R. and Matulka, J. (1994a) 'Stochastic optimum control of macroeconometric models using the
algorithm OPTCON', European Journal ()f"Operational Research, 73:384-405.
Neck, R. and Matulka, J. (1994b) "Stochastic control of nonlinear economic models', in: W.W. Cooper
and A.B. Whinston, eds, New directions in computational economics. Dordrecht: Kluwer Academic
618 H. Amman

~ksendal, B. (1985) Stochastic differential equations. Berlin: Springel:

Petit, M.L. (1990) Control theory and dynamic games in economic policy analysis. Cambridge, UK: Cam-
bridge Univ. Press.
Pindyck, R.S. (1973) Optimal planning.for economic stabilization. Amsterdam: North-Holland.
Phillips, A.W. (1954) 'Stabilization policy in a closed economy', Economic Journal, 64:290-323.
Phillips, A.W. (1957) 'Stabilization and the time form of the lagged responses', Economic Journal, 67:265-
Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V. and Mishenko, E.E (1964) The mathematical theory
of'optimal processes. Oxford: Pergamon.
Press, W., Flannery, B.E, Teukolsky, S.A. and Vetterling, W.T. (1986) Numerical recipies. Cambridge, UK:
Cambridge Univ. Press.
Ramsey, EE (1928) 'A mathematical theory of saving', Economic Journal, 38:543-559.
Rustem, B. (1990) 'Optimal consistent robust feedback rules under parameter, forecast and behavioral
uncertainty', in: N.M. Christadoulakis, ed., Dynamic modeling and control of national economies 1989.
Oxford: Pergamon, pp. 195-202.
Rustem, B. (1994) 'Stochastic and robust control of nonlinear economic systems', European Journal of
Operation Research, 73:304-318.
Sargent, T.J. (1987) Dynamic macroeconomic theory. Cambridge, MA: Harvard Univ. Press.
Scales, L.E. (1985) Introduction to non-linear optimization. London/Basingstoke: Macmillan.
Sewell, G. (1988) The numerical solution t~f ordinary and partial differential equations. Boston: Academic
Sheffrin, S.M. (1983) Rational expectations. Cambridge, UK: Cambridge Univ. Press.
Stewart, G.W. (1975) Introduction to matrix computation. New York: Academic Press.
Stockey, N.L. and Lucas, R.E., Jr. (1989) Recursive methods in economic dynamics. Cambridge, MA:
Harvard Univ. Press.
Stoer, J. and Bulirsch, R. (1980) Introduction to numerical analysis. New York: Springer.
Tapiero, C.S. and Sulem, A. (1994) 'Computational aspects in applied stochastic control', Computational
Economics, 7:109-146.
Taylor, J.B. (1979) 'Estimation and control of a macroeconomic model with rational expectations', Econo-
metrica, 45:1377-1385.
Theil, H. (1964) Optimal decision rules .f~)r government and industry. Amsterdam: North-Holland.
Tustin, A. (1953) The mechanism of economic systems. Cambridge, MA: Harvard Univ. Press.
Vaughan, D.R. (1970) 'A nonrecursive algebraic solution for the Riccati equation', IEEE Transactions on
Automatic Control, AC-15:597-599.
Whittle, E (1982) Optimization over time: Dynamic programming and stochastic control. New York: Wiley.
Chapter 14

University of Wisconsin


1. Introduction 620
2. MDPs and the theory of dynamic programming: A brief review 632
2.1. Definitions of MDPs, DDP's and CDP's 632
2.2. Belhnan's equation, contraction mappings, and Blackwell's theorem 633
2.3. Examples of analytic solutions to Bellman's equation for specific "Test Problems" 636
3. Computational complexity and optimal algorithms 639
3.1. Discrete computational complexity 640
3.2. Continuous computational complexity 641
4. Numerical solution methods for general MDPs 648
4.1. Discrete finite horizon MDPs 649
4.2. Discrete infinite horizon MDPs 652
4.3. Conthmous finite horizon MDPs 669
4.4. Continuous infinite horizon MDPs 697
5. Conclusion 717
References 722

*I tun grateful for helpful conunents by Hans Armnan, Dimitri Bertsekas, Ken Judd, David Kendrick,
Eduardo Ley, Michael Keane, Sam Kortum, Martin Puterman, Michael Sandfort, Kenneth Wolpin and
two not very anonymous referees, Charles Tapiero and John Tsitsiklis. Indirect financial support t'rom
the Bradley Foundation, the Graduate School of the University of Wisconsin, and the National Science
Foundation is gratefully acknowledged.

Handbook qt Computational Economics, Volume L Edited by H.M. Amman, D.A. Kendrick and .L Rust
(~ 1996 Elsevier Science B.V All rights reserved.
620 J. Rust

1. Introduction

This chapter surveys numerical methods for solving dynamic programming (DP) prob-
lems. The DP framework has been extensively used in economics because it is suffi-
ciently rich to model almost any problem involving sequential decision making over
time and under uncertainty. 1 Economic applications include the pioneering work on
optimal inventory policy by Arrow, Harris and Marschak (1951), studies of invest-
ment under uncertainty by Lucas and Prescott (1971), analyses of optimal intertempo-
ral consumption/savings and portfolio selection under uncertainty by Phelps (1962),
Hakansson (1970), Levhari and Srinivasan (1969), Merton (1969) and Samuelson
(1969), the work on optimal growth under uncertainty by Brock and Mirman (1972)
and Leland (1974), models of asset pricing by Lucas (1978) and Brock (1982), and
the studies of equilibrium business cycles by Kydland and Prescott (1982) and Long
and Plosser (1983). By the early 1980's the use of MDPs had become widespread
in both micro and macroeconomic theory as well as in finance and operations re-
search. 2
By a simple re-definition of variables virtually any DP problem can be formulated
as a M a r k o v decision process (MDP) in which a decision maker who is in state st
at time t = 1 , . . . , T takes an action at that determines current utility u(st, at) and
affects the distribution of next period's state St+l via a Markov transition probability
p(st+l 1 st, at). The problem is to determine an optimal decision rule ~ that solves
V ( s ) =_ max,~ E,~{~-~Tt=ofltu(st, at) I so = s} where E,~ denotes expectation with
respect to the controlled stochastic process {st, at} induced by the decision rule o~ =
{c~1,..., aT}, and/3 E (0, 1) denotes the discount factor. What makes these problems
especially difficult is that instead of optimizing over an ex ante fixed sequence of
actions {a0,. • •, aT} one needs to optimize over sequences of functions {c~0,..., C~T}
that allow the ex post decision at to vary as a best response to the current state
of the process, i.e. at = c~t(st). The method of dynamic programming provides a
constructive, recursive procedure for computing o~ using the value ]unction V as
a "shadow price" to decentralize a complicated stochastic/multiperiod optimization
problem into a sequence of simpler deterministic/static optimization problems. 3

1DP problems are also known as "stochastic control problems" in the mathematics and engineering
2Dp models can also be used to model "learning" behavior in which agents update beliefs about
unobserved state variables and unknown parameters of the state transition probabilities according to Bayes
Rule. See Stokey and Lucas (1987) for additional examples of DP models in economic theory. See Rust
(1994a,b, 1995a) for surveys of applications of DP models in econometrics.
3In finite horizon problems V actually denotes an entire sequence of value functions, V --= (Vc~ ,
. . . , V~), just as a denotes a sequence of decision rules. One can view dynamic programming from a
game-theoretic perspective as a backward induction method for finding a "subgame perfect equilibrium
of a game against nature". See Rust (1996) for a discussion of this interpretation and the related issue of
"time consistency" of preferences and decision rules.
Ch. 14: Numerical Dynamic Programmingin Economics 621

Although there are extensions of dynamic programming to problems with non-

time separable and "long run average" specifications of the agent's objective func-
tion, this chapter focuses on discounted MDPs. Much of our discussion will focus
on the infinite-horizon case, where V is the unique solution to Bellman's equation,
V = F ( V ) , where the Bellman operator F is defined by:

= max + t (1.1)

The optimal decision rule can be recovered from V by finding a value a(s) E A(s) that
attains the m a x i m u m in (1.1) for each s E S. We review the main theoretical results
about MDPs in Section 2, providing an intuitive derivation of Bellman's equation in
the infinite horizon case. The Bellman operator has a particularly nice mathematical
p r o p e r t y : / " is a contraction mapping. A large number of numerical solution methods
exploit the contraction property to yield convergent, numerically stable methods for
computing approximate solutions to Bellman's equation, including the classic method
of successive approximations. Since V can be recovered from c~ and vice versa,
the rest of this chapter focuses on methods for computing the value function V,
with separate discussions of numerical problems involved in approximating a where
appropriate. 4
F r o m the standpoint of computation, there is an important distinction between dis-
crete MDPs whose state and control variables can assume only a finite number of
possible values, and continuous MDPs whose state and control variables can assume
a continuum of possible values. The value functions for discrete MDPs are elements
a s u b s e t / 3 of the finite-dimensional Euclidean space R Isl (where IS[ is the number
of elements in S), whereas the value functions of continuous MDPs are elements
of a subset B of the infinite-dimensional Banach space 13(S) of bounded, measur-
able real-valued functions on S. Thus, discrete MDP problems can be solved exactly
(modulo rounding error in arithmetic operations), whereas the solutions to continuous
MDPs can generally only be approximated. Approximate solution methods may also
be attractive for solving discrete MDPs with a large number of possible states [S] or
actions [A[. Continuous MDPs arise frequently in economic applications since state
variables such as income or wealth, and control variables such as consumption or
investment are naturally treated as continuous quantities.
The use of any type of approximate solution method presents us with a tradeoff
between the m a x i m u m allowable error e in the numerical solution and the amount of
computer time (and storage spac e ) needed to compute it. Solution time will also be

4Special care must be taken in problems with a continuum of actions, since discontinuities or kinks in
numerical estimates of the conditional expectation of V can create problems for numerical optimization
algorithms used to recover o~. Also, certain solution methods focus on computing a directly rather than
indirectly by first computing V. We will provide special discussions of these methods in Section 4.
622 J. Rust

an increasing function of any relevant measure of the size or dimension d of the MDP
problem. It goes without saying that economists are interested in using the fastest
possible algorithm to solve their MDP problem given any specified values of (c, d).
Economists are also concerned about using algorithms that are both accurate and nu-
merically stable since the MDP solution algorithm is often embedded or "nested" as
a subroutine inside a larger optimization or equilibrium problem. Examples include
computing competitive equilibria of stochastic general equilibrium models with in-
complete markets [Hansen and Sargent (1993), Imrohoro~lu and Imrohoro~lu (1993),
McGuire and Pakes (1994)], maximum likelihood estimation of unknown parameters
of u and p using data on observed states and decisions of actual decision-makers
[Eckstein and Wolpin (1989), Rust (1994a,b) and Sargent (1978, 1981)], and com-
putation and econometric estimation of Bayesian Nash equilibria of dynamic games
[McKelvey and Palfrey (1992)]. All of these problems are solved by "polyalgorithms"
that contain MDP solution subroutines. 5 Since the MDP problem must be repeatedly
re-solved for various trial sequences of the "primitives" (/3, u,p), speed, accuracy,
and numerical stability are critical.
There are two basic strategies for approximating the solution to a continuous MDP:
1) discrete approximation and 2) smooth approximation. Discrete approximation meth-
ods solve a finite state MDP problem that approximates the original continuous MDP
problem over a finite grid of points in the state space S and action space A. Since the
methods for solving discrete MDPs have been well developed and exposited in the op-
erations research literature [e.g. Bertsekas (1987), Porteus (1980) and Puterman (1990,
1994)], this chapter will only briefly review the standard approaches, focusing instead
on research on discretization methods for solving continuous MDPs [see, e.g. Bert-
sekas (1975), Fox (1973), Whitt (1978) and Santos and Vigo (1995a,b)]. Although
smooth approximation methods also have a long history [dating back to Bellman
(1957), Bellman and Dreyfus (1962) and Bellman, Kalaba and Kotkin (1963)], there
has been a recent resurgence of interest in these methods as a potentially more efficient
alternative to discretization methods [Johnson et al. (1993), Judd and Solnick (1994),
Miranda and Schnitkey (1995) and Smith (1991)]. Smooth approximation methods
treat the value function V and/or the decision rule c~ as smooth, flexible functions of
s and a finite-dimensional parameter vector 0: examples include linear interpolation
and a variety of more sophisticated approximation methods such as polynomial series
expansions, neural networks, and cubic splines. The objective of these methods is to
choose a parameter 0" such that the implied approximations V0 or c~g "best fit" the
true solution V and c~ according to some metric. 6 In order to ensure the convergence

5We will see that the MDP subroutine is itself a fairly complicatedpolyalgorithmconsisting of individual
subroutines for numerical integration, optimization, approximation, and solution of systems of linear and
nonlinear equations.
6Other variants of smooth approximation methods use nonlinear equation solvers to find a value 0*
that satisfies a set of "orthogonality conditions". In finite horizon MDPs interpolation methods are used to
store the value function. Other methods such ,as ~rohnsonet al. (1993) use local polynomial interpolation
of V over a grid of points in S.
Ch. 14." Numerical Dynamic Programming in Economics 623

of a smooth approximation method, we need a parameterization that is sufficiently

flexible to allow us to approximate arbitrary value functions V in the set B. One way
to do this is via an expanding sequence of parameterizations that are dense in B in
the sense that for each V E ]~7

lim inf IIVo =- v i i = o, (1.2)

k--->oo OcR k

where IIvii = sup~es IV(s)l denotes the usual "sup-norm". For example, consider an
infinite horizon MDP problem with state space S = [ - 1 , 1]. A natural choice for an
expanding family of smooth approximations to V is


where pi(s) = s ~ is t h e / t h standard polynomial. A more sophisticated choice is to use

an orthogonalfamily such as the Chebyshev polynomials pi(s) = cos(/ c o s - l (s)). 8
The Weierstrass approximation theorem implies that the Vo parameterizations in (1.3)
are ultimately dense in the space B = C [ - 1 , 1] of continuous functions on [ - 1 , 1].9
Under the least squares criterion of goodness of fit, the problem is to choose a pa-
rameter 0 c O C / ~ k to minimize the error function crN defined by:

i N (1.4)

where F is some computable approximation to the true Bellman o p e r a t o r / ' and the
si are a pre-specified grid of points in S. Methods of this form are known as minimum
residual (MR) methods since the parameter 0 is chosen to set the residual function
R(Vo)(s) = Vo ( s ) - F ( V o ) ( s ) as close to the zero function as possible. The philosophy
of these methods is that if the true value function V can be well approximated by a
flexible parametric function 170 for a small number of parameters k, we will be able
to find a better approximation to V in significantly less cpu time by minimizing an
error function such as aN(O ) in (1.4) than by "brute force" discretization methods.

71f the MDP problem has further structure that allows us to restrict the true value function to some
subset of B such as the space of continuousfunctions,C(S), then the limit in Eq. (1.2) condition need
only hold for each V in this subset.
8See Judd Chapter 12 for a definitionof Chebyshev and other families of orthogonal polynomials.
9tn fact we have the followingerror bound for Eq. (1.2): inf0ERk ]IVo - VII ~ blog(k)/k for some
constant b > 0. See Judd (1996, Chapter 12) for a formal statement of this result and Rivlin (1969) for a
624 J. Rust

There has been considerable controversy in the economics literature about the rel-
ative merits of discrete versus smooth approximation methods for solving continuous
MDP problems. Some of the controversy grew out of a "horse race" in the 1990
Journal of Business and Economic Statistics [Taylor and Uhlig (1990)] in which a
number of alternative solution methods competed in their ability to solve the classical
Brock-Mirman stochastic growth model which we will describe in Section 2.6. More
recently Judd (1994a) has claimed that

Approximating continuous-state problems with finite state Markov chains limits the
range of problems which can be analyzed. Fortunately, state-space discretization
is unnecessary. For the past thirty years, the standard procedure in Operations
Research literature [see Bellman (1963), Dantzig (1974), Daniel (1976)] has been
to approximate value functions and policy rules over continuous state spaces with
orthogonal polynomials, splines, or other suitable families of functions. This results
in far faster algorithms and avoids the errors associated with making the problem
unrealistically "lumpy" (p. 3).

This chapter offers some new perspectives on this debate by providing a conceptual
framework for analyzing the accuracy and efficiency of various discrete and smooth
approximation methods for continuous MDP problems. The framework is the theory
of computational complexity [Garey and Johnson (1983), Traub and Wo~niakowski
(1980) and Traub, Wasilikowski and Wo2niakowski (TWW) (1988)]. We provide a
brief review of the main results of complexity theory in Section 3. Complexity theory
is of particular interest because it has succeeded in characterizing the form of optimal
algorithms for various mathematical problems. Chow and Tsitsiklis (1989) used this
theory to establish a lower bound on the amount of computer time required to solve of
a general class of continuous MDPs. A subsequent paper, Chow and Tsitsiklis (1991),
presented a particular discrete approximation method - a multigrid algorithm - that
is approximately optimal in a sense to be made precise in Section 4.3. Thus, we will
be appealing to complexity theory as a way of finding optimal strategies forfinding
optimal strategies.
There are two main branches of complexity theory, corresponding to discrete and
continuous problems. Discrete (or algebraic) computational complexity applies to fi-
nite problems that can be solved exactly such as matrix multiplication, the traveling
salesman problem, linear programming, and discrete MDPs. 1° The size of a discrete
problem is indexed by an integer d and the (worst case) complexity, comp(d), denotes
the minimal number of computer operations necessary to solve the hardest possible
problem of size d (or ec if there is no algorithm capable of solving the problem).
Continuous computational complexity theory applies to continuous problems such as
multivariate integration, function approximation, nonlinear programming, and contin-
uous MDP problems. None of these problems can be solved exactly, but in each case

mln the latter two problems we abstract from rounding error in computerarithmetic.
Ch. 14: Numerical Dynamic Programming in Economics 625

the true solution can be approximated to within an arbitrarily small error tolerance e.
Problem size is indexed by an integer d denoting the dimension of the space of the
continuous variable (typically a subset of Ra), and the complexity, comp(e, d), is
defined as the minimal computational cost (cpu time) of solving the hardest possible
d-dimensional problem to within a tolerance of e.
Complexity theory enables us to formalize an important practical limitation to our
ability to solve increasingly detailed and realistic MDP problems: namely Bellman's
(1955) curse ofdimensionality. This is the well-known exponential rise in the amount
of time and space required to compute the solution to a continuous MDP problem as
the number of dimensions d~ of the state variable or the number of dimensions da of
the control variable increases. Although one typically thinks of the curse of dimen-
sionality as arising from the discretization of continuous MDPs, it also occurs in finite
MDPs that have many state and control variables. For example, a finite MDP with d,
state variables each of which can take on ISI > 1 possible values has a total of ISI a~
possible states. The amount of computer time and storage required to solve such a
problem increases exponentially fast as d~ increases. Smooth approximation methods
cannot escape the curse of dimensionality: for example using the Chebyshev series ap-
proximation in Eq. (1.3), the dimension/c of the parameter vector 0 must increase at a
sufficiently rapid rate as ds increases in order to guarantee that HVo - V H <~ e. Specif-
ically, the literature on approximation theory [Pinkus (1985), Novak (1988)] shows
that k must increase at rate (l/e) (a/~) in order to obtain uniform e-approximations
of smooth multidimensional functions which are r-times continuously differentiable.
Although there are certain nonlinear approximation methods such as neural networks
that can be shown to require a polynomially rather than exponentially increasing num-
ber of parameters k to obtain c-approximations for certain subclasses of functions [i.e.
only k = O(1/e 2) parameters are required to approximate the subclass of/'unctions
considered in Barron (1993) and Hornik, Stinchombe, White and Auer (1993)], these
methods are still subject to the curse of dimensionality. In the case of neural networks
one must fit the parameter vector 0 for the neural net by finding a global minimum of
the nonlinear least squares problem in Eq. (1.4). However neural nets typically lead
to problems with many local minima, and one can prove that the amount of com-
puter time required to find an c-approximation to a global minimizer of an arbitrary
smooth function increases exponentially fast as the number of parameters k increases
[Nemirovsky and Yudin (1983)], at least on a worst case basis. Variations of smooth
approximation methods such as projection methods that involve solutions of systems
k nonlinear equations or which require multivariate interpolation or approximation
are also subject to the curse of dimensionality [Sikorski (1985), TWW (1988)].

DEFINITION. A class of discrete MDP problems with ds state variables and da control
variables is subject to the curse of dimensionality if comp(ds,da) = Y2(2a~+da).
626 ,L Rust

A class of continuous MDP problems is subject to the curse of dimensionality if

comp(e, da, d~) = ~O(1/C(d~+d~)).11

In the computer scmnce literature problems that are subject to the curse of di-
mensionality are called intractable.12 If the complexity of the problem has an upper
bound that only grows polynomially in d we say that the problem is in the class P
of polynomial-time problems. Computer scientists refer to polynomial-time problems
as tractable. 13
A large number of mathematical problems have been proven to be intractable on
a worst case basis. These problems include multivariate integration, optimization,
and function approximation [TWW (1988)], and solution of multidimensional partial
differential equations (PDE's), and Fredhom integral equations [Werschulz (1991)].
Lest the reader get too depressed about the potential usefulness of numerical methods
at the outset, we note that there are some mathematical problems that are not subject
to the curse of dimensionality. Problems such as linear programming and solution of
ordinary differential equations have been proven to be in the class P of polynomial
time problems [Nemirovsky and Yudin (1983) and Werschulz (1991)]. Unfortunately
Chow and Tsitsiklis (1989) proved that the general continuous MDP problem is also
intractable. They showed that the complexity function for continuous MDPs with ds
state variables and d,, control variables is given by:

c°mp(e'd~'d~'i3) = O ( (( 1 ) (1.5)
1 - 9)2c) 2ds+d° '

liThe notation ~2 denotes a "lower bound" on complexity, i.e. comp(d) = g2(9(d)) if

l-~ma~lg(d)/comp(d)l < cx~.

12Weprefer the terminology"curse of dimensionality"since the commonuse of the term "intractable"

connotes a problem that can't be solved. Computer scientists have a specific terminology for problems
that can't be solved in any finite amount time: these problems have infinitecomplexity, and are classified
as non-comlmtable.However even though intractable problems are computable problems in the computer
science terminology, as the problem grows large the lower bound on the solution time grows so quickly
that large scale versions of these problems are not computable in any practical sense.
13Here again it is importantto note the differencebetween the commonmeaningof the term "tractable"
and the computer science definition. Even so-called "tractable" polynomial-timeproblems can quickly
become computationallyinfeasible if complexity satisfies comp(d) /t- O(dk) for some large exponent
k. However it seems to be a fortunate act of nature that the maximum exponent k for most common
polynomial time problems is fairly small; typically k E [2, 4].
Ch. 14: Numerical Dynamic Programming in Economics 627

where the symbol O denotes both an upper and lower bound on complexity. 14 In sub-
sequent work, Chow and Tsitsiklis (1991) developed a "one way multigrid" algorithm
that comes within a factor of 1/I l°g(/3) l of achieving their complexity bound, so it
can be viewed as an approximately "optimal algorithm" for the MDP problem. As
we will see, the multigrid algorithm is a particularly simple example of a discrete
approximation method that is based on simple equi-spaced grids of S and A ) 5
The fact that the Chow-Tsitsiklis lower bound on complexity increases exponen-
tially fast in d8 and da tells us that the curse of dimensionality is an inherent aspect
of continuous MDP problems that can't be circumvented by any solution algorithm,
no matter how brilliantly designed. There are, however, three potential ways to le-
gitimately circumvent the curse of dimensionality: 1) we can restrict attention to a
limited class of MDPs such as the class of linear-quadratic MDPs, 2) we can use
algorithms that work well on an average rather than worst case basis, or 3) we can
use random rather than deterministic algorithms.
Chapter 4 by Anderson, McGrattan, Hansen and Sargent and Chapter 13 by Amman
demonstrate the payoffs to the first approach: they present highly efficient polynomial-
time algorithms for solving the subclass of linear-quadratic MDPs (LQ-MDPs) and
its associated "matrix Ricatti equation". These algorithms allow routine solution of
very high dimensional LQ problems on desktop workstations.
The idea behind the second approach to circumventing the curse of dimensionality
is that even though an algorithm may perform poorly in solving a "worst case"
problem, it may in fact perform very satisfactorily for most problems that are typically
encountered in economic applications. A classic example is the simplex algorithm for
linear programming. Klee and Minty (1972) concocted an artificial sequence of LP
problems that force the simplex algorithm to visit an exponentially increasing number
of vertices before it finds an optimal solution. Nevertheless the simplex algorithm
performs quite satisfactorily for most problems encountered in practical applications.
There are alternative algorithms that are guaranteed to solve all possible LP problems
in polynomial-time [e.g. Khachian's (1979) algorithm or Karmarkar's (1985) interior
point algorithm], but numerical comparisons reveal that for "typical" problems the
simplex method is as fast if not faster. One can resolve this paradoxical finding by
appealing to the concept of average case complexity. This requires us to specify
a prior probability measure # to represent the likelihood of encountering various
problems and to define complexity by minimizing the expected time required to solve
problems with an expected error of e or less. Section 3 provides a formal definition

14Formally, comp(d) = O(g(d)) if there exist constants 0 ~ Cl ~ c2 such that ctg(d) ~ comp(d)
c2g(d). Chow and Tsitsiklis were primarily interested in studying complexity as a function of e and/3 for
a fixed d. It is possible that the bounding constants Cl and c: are also functions of d, but it is not known
whether these constants increase exponentially or only polynomially fast in d.
15Discrete approximation methods have been found to be approximately optimal algorithms for other
mathematical problems as well. For example Werschulz (1991) showed that the standard finite element
method (FEM) is a nearly optimal complexity algorithm for solving linear elliptic PDE's.
628 Z Rust

of average case complexity. This concept is of interest since several problems such as
multivariate integration, elliptic PDE's, and Fredholm integral equations of the second
kind that are intractable on a worst case basis have been shown to be tractable on an
average case basis [Wo2niakowski (1991) and Werschulz (1991)]. Since multivariate
integration constitutes a major part of the work involved in solving for V, this result
provides hope that the MDP problem might be tractable on an average case basis. In
Section 4.3 we conjecture that recent results of Wo£niakowski (1991, 1992) can be
used to show that certain classes of MDPs become tractable on an average case basis,
i.e. the average case complexity function is bounded above by a polynomial function
of the dimension d, and inverse error 1/c. Section 4.4 describes several algorithms
that we believe will perform well on an average case basis. One of these methods uses
the optimal integration algorithm of Wo2niakowski (1991) that evaluates integrands
at the set of shifted Hammersley points, and another evaluates integrands at the set
of Sobol' points [Paskov (1992, 1994)].
A final strategy for circumventing the curse of dimensionality is to use random
rather than deterministic algorithms. Randomization can sometimes succeed in break-
ing the curse of dimensionality because it relaxes the requirement that the error in the
numerical approximation is less than e with probability one. However similar to the
notion of average complexity, it can only offer the weaker assurance that the expected
error in an approximate solution is less than c. A classic example where randomiza-
tion succeeds in breaking the curse of dimensionality is multivariate integration. The
Monte Carlo estimate of the integral f f ( s ) ds of a function f on the d-dimensional
hypercube S = [0, 1]a is formed by drawing a random sample of points { g l , - . . , gN}
uniformly from S and forming the sample average }-]~U1 f ( s i ) / X . HSlder's inequal-
ity implies that the expected error in a Monte Carlo integral converges to zero at rate
1/x/N, independent of the dimension d. However randomization does not always
succeed in breaking the curse of dimensionality: problems such as multivariate op-
timization [Nemirovsky and Yudin (1983)], function approximation [TWW (1988)],
and solution of linear elliptic PDE's and Fredhom integral equations of the second
kind [Werschulz (1991)] are intractable regardless of whether or not randomization is
Section 4 discusses a recent result due to Rust (1995b) that proves that randomiza-
tion does break the curse of dimensionality for a subclass of MDPs known as discrete
decision processes (DDP's), i.e. MDPs with a finite number of possible actions. Rust's
upper bound on the worst case randomized complexity of an infinite horizon DDP
problem with IA] possible choices and a d-dimensional continuous state vector st is
given by:

compW°r-ran(G d) o ( IAI3 (1.6)

\ I log(/3)l(1 -/3)se4 J '

which implies that the DDP problem can be solved in polynomial time once random-
ization is allowed. Rust's proof is constructive since he presents a "random multigrid
Ch. 14: Numerical Dynamic PIvgramming in Economics 629

algorithm" that attains the upper bound on complexity in (1.6). The multigrid algo-
rithm is based on the random Bellman operator FN : B -+ B defined by:

FN(V)(s)= ~eA(~)
max [u(s'a)+ -Nfl EV(~i)P(gi
N ]
s,a) , (1.7)

where {gl, • • •, gN} is an IID random sample of points uniformly distributed over
the d-dimensional state space S = [0, 1]a. A particularly nice feature of the random
Bellman operator is that it is self-approximating, i.e. FN(V)(s) is a well-defined
smooth function of s E S whenever u and p are smooth functions of s. Thus, one
doesn't need to resort to auxiliary interpolation or approximation procedures to be
able to evaluate l~(V)(s) at any point s E S.
Unfortunately randomization cannot break the curse of dimensionality for the class
of continuous decision processes (CDP's), i.e. MDPs with continuous choice sets
A(s). The reason is quite simple: since the general nonlinear programming problem
is a special case of the MDP problem when/3 = 0, the general MDP problem must be
at least as hard as the general (static) nonlinear programming problem. However as
we noted above, the general nonlinear programming problem is intractable regardless
of whether deterministic or random algorithms are employed. In Section 4.4 we con-
sider whether there are certain subclasses of CDP's for which randomization might
succeed in breaking the curse of dimensionality. One such class is the set of" "convex
CDP's" with convex choice sets A(s) for each s E S and u(s, a) and p(s' [ s, a)
are concave functions of a for each pair (s', s) c S x S. An algorithm that might
succeed in breaking the curse of dimensionality for this class of problems is a random
multigrid algorithm using the random Bellman operator defined in (1.7), modified to
use numerical optimization to find an approximate maximizer over the continuum of
actions in A(s). Due to the convex structure of the problem, there are polynomial-time
numerical optimization methods that are guaranteed to find an e-approximation to the
maximizing element a E A(s).
While complexity theory provides a useful general guide to the analysis and design
of efficient numerical methods for MDPs, there are limits to its ability to make precise
efficiency comparison of specific algorithms. Probably the most useful way to evaluate
the performance of different methods is to compare their performance over a suite
of test problems that are commonly encountered in economic applications. While we
provide comparisons of some of the standard solution methods for discrete MDPs
in Section 4.2.2 and discuss what we know about the actual performance of various
methods wherever possible, we do not have sufficient space in this chapter to provide a
rigorous comparison of the wide variety of different solution methods for a reasonably
comprehensive set of multidimensional test problems. This is a task we leave to future
Our review of MDPs and numerical methods is necessarily selective given the space
constraints of this chapter. To provide some perspective of how our review fits into
630 ,L Rust

the larger literature on numerical dynamic programming and stochastic control, it is

useful to briefly review the main variations of control problems encountered in the

• deterministic vs. stochastic

• Markovian vs. non-Markovian
• discrete vs. continuous time
• finite vs. infinite horizon
• linear/quadratic vs. general nonlinear problems
• discrete vs. continuous state
• discrete vs. continuous action
• discounted vs. long-run average rewards
• perfect vs. imperfect state information

This chapter focuses on stochastic control since deterministic control problems are a
special case and can generally be effectively solved using the same algorithms devel-
oped for stochastic problems. We also focus on Markovian decision processes with
additively separable utility functionals. Although the method of dynamic program-
ming can be extended in a straightforward manner to solve finite-horizon problems
with non-Markovian uncertainty and general utility functionals [see Hinderer (1970)
or Gihman and Skorohod (1979)], these methods require keeping track of complete
histories which make them computationally intractable for solving all but the small-
est problems. We also focus on MDPs formulated in discrete rather than continuous
time. While there is an elegant theory of continuous time dynamic programming in
which the value function V can be shown to be the solution to a system of partial
differential equations known as the H a m i l t o n - B e l l m a n - J a c o b i (HBJ) equation [see,
e.g. Doshi (1976) and Fleming and Soner (1993)], under very general conditions one
can show that a continuous time MDP can be approximated arbitrary closely by a
discrete time M D P when the time interval is sufficiently small [Van Dijk (1984)].
Indeed, the predominant solution method for solving approximate discrete time for-
mulations of the M D P problem rather than attempting to numerically solve the HBJ
equation, which frequently reduces to a nonlinear system of PDE's of the second
order [see Kushner (1990) and Kushner and Dupuis (1992)]. 16 As we already noted,
even systems of linear elliptic PDE's are intractable on a worst case basis, which may
explain why the best approach to solving these problems is to solve an approximate

16Chapter 9 of Fleming and Soner reviews closely related numerical methods based on solving finite-
difference approximationsto the HJB equations, which Kushner and Dupuis (1992) show are closely related
to methods based on solving approximate discrete time MDPs. Semmler (1994) provides an interesting
application, of the time discretization approach that demonstrates the ability of discrete time models to
approximate chaotic trajectories in the limiting continuous time model as the time interval tends to zero.
See Tapiero and Sulem (1994) for a recent survey of numerical methods for continuous time stochastic
control problems and Ortega and Voigt (1985) for a review of the literature on numerical methods for
Ch. 14: Numerical Dynamic Programming in Economics 631

discrete time MDR In order to obtain good approximations, we need discrete time
MDPs with very short time intervals At whose discount factors/3 = e At are very close
to 1. However we can see from the Chow and Tsitsiklis complexity bound (1.5) that
the complexity of this problem tends to infinity as/3 -+ 1. This provides an indication
of the inherent difficulty of the continuous time MDP problem. A similar problem
is encountered in solving MDPs under the long-run average rather than discounted
criterion. Although there is a special theory for the solution of MDP problems under
the long-run average reward criterion [see Chapter 7 of Bertsekas (1987)], we focus
on solving problems with discounted returns since a generalization of a classical the-
orem due to Abel [see Bhattacharya and Majumdar (1989), Dutta (1991)] shows that
under weak regularity conditions (1 - p ) V converges to the optimal long-run average
reward as/3 tends to 1. This implies that we can approximate the stationary optimal
decision rule under long-run average rewards by solving a discounted MDP with/3
close to 1.17
We also focus on problems of perfect state information, since one can always
reduce a problem with imperfect state information to a problem with perfect state
information by treating the conditional distribution of the unobserved components of
the state variable given the observed components as an additional state variable [see,
e.g. Bertsekas (1987)]. However while this Bayesian approach is conceptually simple,
the reformulation falls prey to the curse of dimensionality since the new state variable
in the reformulated problem is a conditional probability distribution, an element of an
infinite-dimensional space. These problems are inherently much more difficult. For
example Papadimitriou and Tsitsiklis (1987) have shown that discrete state MDPs
with partially observed states are significantly harder problems than discrete MDPs
with fully observed states. It is an open question whether these non-Bayesian learning
algorithms can avoid the curse of dimensionality inherent in the Bayesian approach.
The remaining subject divisions: finite vs. infinite horizon, discrete vs. continuous
states and controls are all covered i n separate sections of this chapter. Readers who
are interested in efficient algorithms for solving LQ-MDP problems should consult
Amman, Chapter 13, and Anderson, McGrattan, Hansen and Sargent, Chapter 4.
A final caveat is that the MDP framework involves the implicit assumption that
decision-makers have time-separable preferences and are expected utility maximizers.
Experimental tests of human decision-making show that neither of these assump-
tions may be valid [e.g. the famous "Allais paradox", see Machina (1987)]. Recently,
a new theory of sequential decision-making has emerged that allows for time and
state non-separable preferences [see Epstein and Zin (1989, 1990) and Ozaki and
Streufert (1994)], and it's possible that this generalized dynamic choice theory could
result in more realistic economic models. Apart from some special cases that have
been solved numerically [e.g. Hansen and Sargent (1995)] there is virtually no theory

17See Puterman (1994) Chapters 8 and 9 for detailed discussion of algorithms for solving problems
under the long-run averagereward criterion.
632 J, Rust

or well developed numerical procedure for solving general versions of these prob-
lems. However these problems have a recursive structure, so it is quite likely that
many of the methods outlined in this chapter can ultimately be adapted to compute
approximate solutions to these more general sequential decision processes (SDP's).
We content ourselves with the observation, formalized in Rust (1994a), that the class
of MDPs is already sufficiently broad to enable one to generate virtually any type of
decision rule (behavior) via an appropriate specification of preferences u and law of
motion p.

2. MDPs and the theory of dynamic programming: A brief review

This section reviews the main theoretical results on dynamic programming in finite
and infinite horizon problems. Readers who are already familiar with this theory may
simply wish to skim over this section to determine the notation we are using, and
skip directly to the presentation of numerical methods in Section 4. We focus on time
stationary problems for notational simplicity: it will become clear that in finite horizon
problems the utility and transition probability functions can be general functions of
time t without affecting any of our results.

2.1. Definitions o f MDPs, DDP's and CDP's

DEFINITION 2.1. A (discrete-time, discounted) Markovian Decision Process (MDP)

consists of the following objects:
• A time index t E {0, 1 , 2 , . . . , T } , T <<,oc.
• A state space S.
• An action space A.
• A family of constraint sets {A(s) c_ A I s E S}.
• A transition probability p(ds' [ s, a) = Pr{st+l ~ ds' [ st = s, at = a}.
• A discount f a c t o r / 3 E (0, 1).
• A single period utility function u(s, a) such that the utility functional UT has the
additively separable decomposition: 18


lSThe boldface notation denotes sequences: s = (so, • .., ST). We will subsequently impose explicit
topological structure on S and A and smoothness conditions on u and p later in this section and in
Section 4.
Ch. 14: Numerical Dynamic Programming in Economics 633

The agent's optimization problem is to choose an optimal decision rule c~ = (c~0,

• . . , aT) to solve the following problem:

max E~{UT(S, d)}

~=(~0 ..... ~T)

-/...o T
I st_l,O~t_,(st_,))po(dso), (2.2)

where P0 is a probability distribution over the initial state so. Stated in this form,
the optimization problem is extremely daunting. We must search over an infinite-
dimensional space of sequences of potentially history-dependent functions (s0,
. . . , ST), and each evaluation of the objective function in (2.2) requires (T + t)-
fold multivariate integration• We now show how dynamic programming can be used
to vastly simplify this potentially intractable optimization problem. Before doing this,
we distinguish two important subclasses of MDPs known as DDP's and CDP's:

DEFINITION 2.2. A Discrete Decision Process (DDP) is an MDP with the following
• There is a finite set A such that A(s) C A for each s E S.

For simplicity, Section 4 will make the further assumption that A(s) = A for all
s E S. This apparent restriction actually does not involve any loss of generality, since
we can mimic the outcome of a problem with state-dependent choice sets A(s) by a
problem with a state-independent choice set A by choosing the utility function u(s, a)
so that the utility of any "infeasible" action a E A A A(s) c is so low that it will in
fact never be chosen.

DEFINITION 2.3. A Continuous Decision Process (CDP) is an MDP with the following
• For each s E S, the action set A(s) is a compact subset of R d° with non-empty

In Section 4 we will see that there are important differences in the computational
complexity and the methods used to solve DDP's and CDP's.

2.2. Bellman's equation, contraction mappings, and Blackwell's theorem

In the finite-horizon case (T < oo), dynamic programming amounts to simple back-
ward induction. In the terminal period VT and ST are defined by:

ST(ST)=-arg max [U(8T, aT)], (2.3)

634 J. Rust

~r(ST)= max [U(ST,aT)]. (2.4)

aTEA(sT) ~ "

In periods t =- 0 , . . . , T - 1, Vt and at are defined by the recursions:

o~t(st)=arg max [u(st,at)+/3/Vt+l(St+l)p(dSt+llst,at) 1 (2.5)


F ~ "I

Vt(st) = max lu(st,at)+/3/Vt+l(St+l)p(dSt+llst,at)l. (2.6)

arEA(st) [ J ]

In principle, the optimal decision rule at time t, c~t, can depend not only on the
current state st, but on the entire previous history of the process, at = ozt(st, Ht),
where Ht = (so, ao,..., st-l, at-l). However it is easy to see that the Markovian
property of p and the additive separability of U imply that it is unnecessary to keep
track of the entire previous history Ht: without loss of generality we can restrict
attention to Markovian strategies where the optimal decision rule c~t depends only on
the current time t and current state st.
It's straightforward to verify that at time t = 0 the value function Vo(so) represents
the maximized expected discounted value of utility in all future periods. Since dynamic
programming has recursively generated the optimal decision rule c~ = (c~0,..., c~r),
it follows that

v0(s) : I s0 : s}. (2.7)

In the infinite horizon case T = oo there is no "last" period from which to start the
backward induction to carry out the dynamic programming algorithm. However if the
per period utility function u is uniformly bounded and the discount factor/3 is in the
(0, 1) interval, then we can approximate the infinite horizon utility functional Uoo (s, d)
arbitrarily closely by a finite horizon utility functional UT(S, d) for T sufficiently
large. This is the basic idea underlying many numerical methods for solving infinite
horizon MDPs such as successive approximations.
Almost all infinite-horizon MDPs formulated in economic applications have the
further characteristic of stationarity: i.e. the transition probabilities and utility func-
tions are the same for all t. In the finite horizon case the time homogeneity of u and p
does not lead to any significant simplifications since there still is a fundamental non-
T j
stationarity induced by the fact that the remaining utility ~ j = e / 3 u(sj, aj) depends
on t. However in the infinite-horizon case, the stationary Markovian structure of the
problem implies that the future looks the same whether the agent is in state st at time
t or in state st+k at time t + k provided that st = st+k. In other words, the only
variable that affects the agent's view about the future is the value of his current state s.
This suggests that the optimal decision rule and corresponding value function should
Ch. 14: Numerical Dynamic Programming in Economics 635

be time invariant. Removing time subscripts from the recursions (2.5) and (2.6), we
obtain the following equations characterizing the optimal stationary decision rule c~
and value function V:

a ( s ) = argaEA(s)max[u(s,a)+~/V(s')p(ds'l s,a)], (2.8)

where V is the solution to:

V ( s ) = aEA(s)max[u(s,a)+~ f V(s')p(ds' I (2.9)

Equation (2.9) is known as Bellman's equation. 19 In mathematical terms Bellman's

equation is a functional equation and the value function V is a fixed point to this
functional equation.
To establish the existence and uniqueness of a solution V to Bellman's equation
we need to impose some additional regularity conditions. The following conditions
are stronger than necessary to prove the existence of (11, c~) but are typically made in
applications: 1) S and A are compact metric spaces, 2) u(s, a) is jointly continuous and
bounded in (s, a), 3) s -+ A(s) is a continuous correspondence. Let B(S) denote the
Banach space of all measurable, bounded functions f : S -+ R under the (essential)
supremum norm, Ilfll = suPsES If(s)l. Define the Bellman operator F : B(S) --+
B(S) by:

/n(W)(8) = aEA(s)max [u(s,a) + /3J W(s')p(ds'f s,a)1. (2.10)

Bellman's equation can then be re-written in operator notation as:

V = r(V), (2.11)

i.e. V is a fixed point of the mapping/7. Blackwell (1965) and Denardo (1967) noted
that the Bellman operator has a particularly nice property: it is a contraction mapping.
This means that for any V, W c B we have:

lit<v)- r<w)ll llv- wll. (2.12)

The theory of contraction mappings allows us to establish the existence and unique-
ness of the solution V to Bellman's equation. In addition, the theory of contraction

19Bellman was not the first to discover this equation (for example versions of it appear in prior work by
Arrow et al. (1951) on optimal inventory policy), however the equation bears his name due to Bellman's
systematic application of the approach to solving a wide variety of problems.
636 J. Rust

mappings provides the basis for many of the solution methods and error bounds de-
veloped in Section 4. The following two theorems are the cornerstones of the theory
of infinite horizon MDPs.

CONTRACTION MAPPING THEOREM. If ff is a contraction mapping on a Banach Space

B, then F has a unique fixed point V C 13.

BLACKWELL'S THEOREM. The stationary, Markovian, infinite-horizon policy given by

c~ = (c~, c~,...) where c~ is defined in (2.8) and (2.9) constitutes an optimal decision
rule for the infinite-horizon MDP problem (2.2).

The optimality of the infinite horizon policy c~ can be proved under weaker con-
ditions that allow u(s, a) to be an unbounded, upper semicontinuous function of s
and a [see Bhattarcharya and Majumdar (1989)]. Although problems with unbounded
state spaces S, decision spaces A, and utility functions u arise frequently in economic
applications, in practice most of the computational methods presented in Section 4
require bounded domains and utility functions. We can approximate the solution to
a problem with unbounded state space, decision space, and utility function via a
sequence of bounded problems where the bounds tend to infinity. Furthermore, the
solution to problems with upper-semicontinuous utility functions can be approximated
as the limit of a sequence of solutions to problems with continuous utility functions
[see Gihman and Skorohod (1979, L e m m a 1.8)]. 2o

2.3. Examples of analytic solutions to Bellman's equation for specific "Test Problems"

We now provide several concrete examples of MDPs that arise in typical economic
problems, showing how the theory in the previous sections can be used to solve them.
Although the underlying thesis of this chapter is that analytical solutions to DP prob-
lems are rare and non-robust (in the sense that small perturbations of the problem
formulation leads to a problem with no analytic solution), we present analytical solu-
tions in order to provide a "test bed" of problems to compare the accuracy and speed
of various numerical solution methods presented in Section 4. Examples 2 and 3 pro-
vide examples of a CDP and a DDP, respectively. In order to differentiate these two
cases we will use the notation a E A(s) to denote the case of discrete choice (where
A(s) contains a finite or countably infinite number of possible values), and c c A(s)
to denote the case of continuous choice (where A(s) contains a continuum of possible
values, such as a convex subset of Euclidean space).

2°Subtle mathematical issues arise when we move to more abstract state and action spaces. For example
Puterman (1994, Section 6.2.5) provides an example where V need not be a measurable function. One
also must establish sufficient conditions for the existence of "measurable selections" to guarantee that a is
measurable. See Bertsekas and Shreve (1978) for a rigorous treatment of these existence and measurability
Ch. 14: Numerical Dynamic Programming in Economics 637

EXAMPLE 1 (A trivial problem). Consider a problem where u(8, a) = 1 for all a c

A(~) and all s E S. Given that the utility function is a constant, it is reasonable
to conjecture that V is a constant also. Substituting this conjecture into Bellman's
equation we obtain:

F f 1
V= max |l+/3]Vp(dJ,s,a)|, (2.13)
aCA(s) L J J

the unique solution to which is easily seen to be V = 1/(1 - / 3 ) . This is the well
known formula for a geometric series, V = [1 + / 3 +/32 -t. . . . ] which is clearly equal
to expected utility in this case since u is identically equal to 1. This provides a simple
and basic test of any solution method for infinite horizon MDPs: the method should
return a value function identically equal to 1/(1 - / 3 ) whenever we solve the problem
with a utility function that is identically 1.

EXAMPLE 2 (A problem with continuous state and control variables). Consider the
problem of optimal consumption and savings analyzed by Phelps (1962). In this case
the state variable s denotes a consumer's current wealth, and the decision a is how
much to consume in the current period. Since consumption is a continuous decision,
we will use ct rather than at to denote the value of the control variable, and let
wt denote wealth at time t. The consumer is allowed to save, but is not allowed to
borrow against future income. Thus, the constraint set is A(w) = {c I 0 ~ c <~ w}.
The consumer can invest his savings in a single risky asset with random rate of return
{Rt} is an IID process (i.e. independently and identically distributed over time) with
marginal distribution F . Thus there is a two-dimensional state vector for this problem,
st = (wt, Rt), although since Rt is IID it is easy to see that wt is the only relevant
state variable entering the value function. Let the consumer's utility function be given
by u(w, c) = log(c). Then Bellman's equation for this problem is given by:

V ( w ) = max log(c) + / 3 V(R(w- c))F(dR . (2.14)


Working backward from an initial conjecture V = 0 we see that at each stage

Vt = F t ( 0 ) has the form, Vt(w) = ftlog(w) + 9t for constants ft and gt- Thus,
it is reasonable to conjecture that this form holds in the limit as well. Inserting the
conjectured functional form V ( w ) = f ~ log(w) + 90o into (2.14) and solving for the
unknown coefficients f0o and 90o we find:

f ~ - 1-/3'

log(1 - fl) 3 log(/3) 3E{log(/~)}

g0o - 1 -/3 + (1 -/3)------~+ (1 - / 3 ) 2 , (2.15)
638 J. Rust

and the optimal decision rule or consumption function is given by:

c~(w) = (1 - [3)w. (2.16)

Thus, the logarithmic specification implies that a strong form of the permanent income
hypothesis holds in which optimal consumption is a constant fraction of current wealth
independent of the distribution F of investment returns.

EXAMPLE 3 (A problem with discrete control and continuous state variable). Consid-
er the problem of optimal replacement of durable assets analyzed in Rust (1985, 1986).
In this case the state space S = R+, where st is interpreted as a measure of the ac-
cumulated utilization of the durable (such as the odometer reading on a car). Thus
st = 0 denotes a brand new durable good. At each time t there are two possible deci-
sions {keep, replace} corresponding to the binary constraint set A(s) = {0, 1} where
at = 1 corresponds to selling the existing durable for scrap price P and replacing it
with a new durable at cost P. Suppose the level of utilization of the asset each period
has an exogenous exponential distribution. This corresponds to a transition probability
p given by:

p(dSt+l ] st,at)
i- e x p { - A ( d s t + l - st)} if at = 0 and St+l ) st,
= - exp{-A(dst+l 0)} if at = 1 and 8t+l • O, (2.17)
Assume the per-period cost of operating the asset in state s is given by a function
c(s) and that the objective is to find an optimal replacement policy to minimize
the expected discounted costs of owning the durable over an infinite horizon. Since
minimizing a function is equivalent to maximizing its negative, we can define the
utility function by:

if at = O,
= -
if at = 1.

Bellman's equation takes the form:

V ( s ) = max
[ - c(s) +/3
F V(s')A e x p { - A ( s ' - s)} ds',

- [-fi - P] - c(O) + fl
/5 1
V ( s ' ) A e x p { - A ( s ' ) } ds' . (2.19)

Observe that V is a non-increasing, continuous function of s and that the second

term on the right hand side of (2.19), the value of replacing the durable, is a constant
independent of s. Note also that P > __P implies that it is never optimal to replace
Ch. 14: NumericalDynamic Programmingin Economics 639

a brand-new durable s = 0. Let 7 be the smallest value of s such that the agent is
indifferent between keeping and replacing. Differentiating Bellman's equation (2.1 9),
it follows that on the continuation region, [0, 3'), c~(s) = 0 (i.e. keep the current
durable) and V satisfies the differential equation:

V'(8) = -c'(8) +/~c(s) + A(1 -/3)V(s). (2.20)

This is known as a free boundary value problem since the boundary condition:

- -c(3)
v ( ~ ) = I F - P ] + v ( o ) = - c ( ~ ) + Zv(-y) - ~7~, (2.21)

is determined endogenously. Equation (2.20) is a linear first order differential equation

that can be integrated to yield the following closed-form solution for V:

V(s) = max [lC(~7) -c(7) + f~ e'(Y) [ 1 - /3e -x(1-~)(u-s) ]dy 1 (2.22)

' -1- - - 5

where 3' is the unique solution to:

. -P] = Cjo~ a---JFe'(v)[1 - fle-;,(l--e)~]dy. (2.23)

It follows that the optimal decision rule is given by:

a(s) = { 01 ifs>7.if
s E [0,~y], (2.24)

3. Computational complexity and optimal algorithms

To keep this chapter self contained, this section provides a brief review of the main
concepts of the theory of computational complexity. There are two main branches
of complexity theory: discrete (or algebraic) complexity theory and continuous (or
information based) complexity. The two branches differ in their assumptions about
the underlying model of computation (Turing vs. real), the types of problems being
solved (discrete vs. continuous), and the nature of the solution (exact vs. approximate).
We appeal to computational complexity theory in Section 4 as a tool for helping us
think about the relative efficiency of various algorithms for solving MDPs.
640 J. Rust
3.1. Discrete computational complexity

Discrete computational complexity theory deals with mathematical problems such as

matrix multiplication or the traveling salesman problem that can be fully specified by
a finite number of parameters and whose exact solution can be computed in a finite
number of steps. 21 This theory is based on the Turing model of computation, which
is universal model of computation that assumes that a computer can perform a finite
number of possible operations on a finite number of possible symbols, although with
an unbounded memory and disk storage capacity. Thus, a Turing machine is a generic
model of a computer that can compute functions defined over domains of finitely
representable objects such as the integers or rational numbers, Z. A function f is said
to be computable if there exists a computer program or algorithm that can compute
f(x) for any input x in its domain in a finite amount of time using a finite amount of
memory and storage. 22 However knowing that an algorithm can compute the value of
a function in finite amount of time is not very useful if the amount of time or storage
required to compute it is unreasonably large. The theory of computational complexity
classifies various problems in terms of their complexity defined as the minimal amount
computer time or space required by any algorithm to compute f(x) for any particular
value of x. In order to formally define complexity, we need some notion of the "size"
or inherent difficulty of computing a particular input x. Although the notation x
suggests the input is a number, inputs can be more abstract objects such as a finite list
of symbols. For example in the traveling salesman problem, the input x consists of a
finite list { e l , . . . , On} of cities, and a corresponding list {d(ci, cj)} of the distances
between each of the cities ci and cj. The function f returns a solution to the traveling
salesman problem, i.e. an ordering ( c ~ 0 ) , . . . , c,~(n)) that minimizes the length of a
tour of all n cities beginning in city c~(1), visiting each city in sequence, and then
returning to city c~(1) from city c~(n). Here it is natural to index the "size" or difficulty
of the input x by the number of cities n. The complexityfunction comp(n) denotes the
minimal time required by any algorithm to compute a solution to a problem with input
size n. Computer scientists have classified various mathematical problems in terms
of their inherent difficulty, as measured by comp(n). A polynomial-time problem has
comp(n) = O(P(n)) for some polynomial function P(n). If the problem complexity
function cannot be uniformly bounded above by any polynomial function P(n) it
is referred to as an exponential-time problem. In the computer science literature,
polynomial-time problems are referred to as tractable and exponential-time problems
are referred to as intractable. Examples of tractable problems include multiplication
of two n x n matrices (for which comp(n) = O(7"/,2"376)) and linear programming. In
Section 4 we show that an n-dimensional discrete dynamic programming problem is

21Good references to this theory include Garey and Johnson (1979) and Kronsj6 (1985).
22Conversely a problem is said to be non-computable(or undecidable)if there is no algorithm that can
be guaranteed to compute f(x) in a finite amount of time. Examples of non-computableproblems include
the haltingproblemand Hilbert's tenth problem, i.e. solvability of polynomial equations over the integers.
Ch. 14: NumericalDynamic Programmingin Economics 641

an example of an intractable problem, since its complexity is given by comp(r~) =

~?(IS]~), where IS] > 1 is the number of possible states in each dimension.

3.2. Continuous computational complexity

Continuous computational complexity deals with continuous mathematical problems

defined over infinite-dimensional spaces such as multivariate integration, solution
of PDE's, and solution of continuous-state MDPs. None of these problems can be
solved exactly on ordinary computers, although one can in principle compute arbi-
trarily accurate approximations to these solutions, to within an arbitrary error tol-
erance c > 0. Unlike discrete complexity theory which is based on the Turing
model of computation, continuous complexity theory is based on the real number
model of computation, i.e. a computer that can conduct infinite precision arithmetic
and store exact values of arbitrary real numbers such as 7r. Thus, continuous com-
plexity theory abstracts from certain practical issues such as numerical stability and
round-off error. The other key distinguishing feature of a continuous mathematical
problem is that it generally cannot be fully specified by a finite list of parameters.
Since computers can only store finite sets of real numbers, it follows that algo-
rithms for solving continuous mathematical problems must make due with partial
information on the problem to be solved. For example in the integration problem
algorithms must ordinarily rely on evaluations of an integrand f at only a finite
number of points in the domain. Recognizing the fact that information on the prob-
lem inputs is generally only partial, continuous complexity theory is also referred
to as information-based complexity (IBC). A number of important developments
in IBC theory over the last decade has lead to a fairly complete categorization of
the difficulty of computing approximate solutions to a wide variety of continuous
mathematical problems, including the continuous MDP problem. It has also lead to
fairly precise characterizations of the form of optimal and near-optimal algorithms for
various problems. The remainder of this section reviews the main elements of this
theory. 23
A continuous mathematical problem can be defined abstractly as a mapping A :
F -+ /3 from an infinite-dimensional space F into an infinite-dimensional space
/3. The mapping A is known as the solution operator. For example, in the case of
multivariate integration the solution operator A is given by:

A(f) =/[1,1] d f(s)/~(ds), (3.1)

23This section closely follows the excellent treatment in Traub, Wasilikowski and Wo2niakowski (1988)
with only minor changes in notation. Tsitsiklis (1994)provides a review of applications of complexitytheory
to MDPs. Traub (1993) presents a review of recent developments in complexity theory for solving more
general continuous problems such as multivariate integration and function approximation.
642 J. Rust

where/~ denotes Lebesgue measure. The range of A is the set B = / ~ and the domain
of A, the set of admissible integrands f , is given by:

F = { f : [0, 1]a -+ R I D r f is continuous and IID"fll <. t}, (3.2)

where ][Drfll denotes the largest mixed partial derivatives of order r of f , i.e.

I[D"fll = max sup Orf(_s!,..:,Sd)

kl,...,kd sl,...,Sd ()kiS1,. ,okd8d '
suNect to: r = kl + " " + ha. (3.3)

In the case of the M D P problem, the set F of problem elements consist of pairs
f = (u,p) satisfying certain regularity conditions. For example, in Section 4 we will
require that u and p are elements of a certain class of Lipschitz continuous functions.
The solution operator A : F --+ B can be written as V = A(u,p), where V denotes
the T + 1 value functions ( V 0 , . . . , VT) given by the recursions ( 2 . 2 ) , . . . , (2.5) in
the finite horizon case, or the unique solution to Bellman's equation (2.10) in the
infinite horizon case. Under appropriate regularity conditions on the set of problem
elements F , the range of A is in the set B = C(S), where C(S) is the Banach space
of bounded continuous functions on S.
Since the problem elements f c F live in an infinite-dimensional space and com-
puters are finite-state devices, we will only be able to approximate the true solution
A(f) using a "computable" mapping U : F --+ B that uses only a finite amount of
information about f and can be calculated using only a finite number of elementary
operations (e.g. addition, multiplication, comparisons, etc.). Given a norm on the space
B we can define the error involved in using the approximate operator U instead of the
true solution operator A by HA(Z)- g(f)][. We say that U(I) is an c-approximation
o f f c F if llA(f) - U(f)ll <~ e. Since we restrict U(I) to depend only on a finite
amount of information about f , we can represent it as the composition of two distinct

U(f) = CN ( I N ( f ) ) , (3.4)

where 1N (f) : F -+ R N represents an information operator providing information

about the problem element f c F , and q5N " R N --+ B is an algorithm that maps the in-
formation about f into an approximate solution. 24 In general, the information about f
will be given by a finite number N of functionals of f , I N ( f ) = (L1 ( f ) , . . . , LN(I)).
In the problems we are considering here, F : S --+ R is a function space and we will

24The information operator IN is sometimes referred to as an oracle. See, e.g. Nemirovsky and
Yudin (1983).
Ch. 14: Numerical Dynamic Programming in Economics 643

be focusing on the evaluation functionals, Li(f) - f(si) for some point si ~ S. The
resulting information operator IN (f) is referred to as the standard information:

IN(f) = ( f ( s , ) , . . . , f(sN)), (3.5)

where {sl, • •., 8N} can be thought of as defining "grid points" in the domain of f .
The information operator IN, the algorithm CN, and the number of grid points N are
all treated as variables to be chosen in order to obtain approximations of a specified
degree of accuracy, e. Thus, we can think of an arbitrary algorithm as consisting of
the pair U = (IN, q)N), 25
The total computational cost of computing an approximate solution U(f), denoted
by c(U, f), is assumed to consist of two components:

c(U, f) = e, (IN, f) + c2 (¢N, IN ( f ) ) , (3.6)

where cl(IN, f) is the cost of computing the N information points IN(f) (i.e. of
evaluating the function f at the N grid points { s l , . . . , SN}), and c2(ON, IN(f)) de-
notes the combinatorial cost of using the information IN(f) C R N to compute the
approximate solution U(f) = CN(IN(f)). In many problems such as the multivariate
integration problem one can prove that C(IN, S) and C(¢N, IN(S)) are both propor-
tional to N. For example, this is true of the multivariate integration problem when
IN(f) is the standard information (3.5) and CN is a linear algorithm such as the
sample average algorithm:

dpN(SN(f)) = f fl f(si). (3.7)

We are now ready to define the e-complexity, which roughly speaking is the minimal
cost of computing an e-approximation to a mathematical problem A(f). We begin by
defining the concept of worst-case complexity which measures the cost of computing
the hardest possible problem element f E F.

DEFINITION 3.1. The worst case deterministic complexity of a mathematical problem

A is given by:

compW°r-det(e) = inf {c(U) [ e(U) <<.¢}, (3.8)

25We have oversimplified the treatment of information here in the interest of brevity. TWW (1988)
distinguish between adaptive it~fi~rmation versus nonadaptive infi)rmatian. In the latter case, we "allow each
of the N functionals of f to be conditioned on the outcome of the previous functionals, i.e. I N ( f ) --
{Ll (f), L2(f, Y l ) , . . . , L N (f, Y1 . . . . , YN-1)} where the y~ are defined recursively by Yl = L1 ( f ) and
y~ = Li (f, Yl, • • . , Yi- 1), i = 2 , . . . , N . We have presented the special case of nonadaptive information
since for many linear problems, one can prove that there is no gain to using adaptive information.
644 J. Rust

where the functions e(U) and c(U) are defined by:

e(g) = sup IIA(/) - g ( / ) f l ,

c(U) = sup c(U, f). (3.9)

Thus, worst case complexity is a minimax bound on the cost of computing an

e-approximation to the problem A. Tight upper and lower bounds on the complexity
have been established for various mathematical problems. For example, the worst case
complexity of integration of functions that are differentiable to order r over the set
S = [0, 1] a is known to have a worst case complexity given by 26

compW°r-det(G d, r) = O (')
C~77/r . (3.1o)

Thus, multivariate integration is subject to the curse of dimensionality. In Section 4

we review recent results of Chow and Tsitsiklis (1989, 1991) that proves that the
general M D P problem is also subject to the curse of dimensionality - at least using
deterministic algorithms and measuring complexity on a worst case basis.
One way to break the curse of dimensionality is to use random rather than determin-
istic algorithms. Similar to the case of deterministic algorithms, a random algorithm
U can be decomposed as a composition of a randomized information operator IN and
a randomized combinatory algorithm qSN:

U(f) = ~N (iN(f)). (3.11)

Monte Carlo integration is a simple example of a randomized algorithm U. It is based

on the randomized information

iN(f) = {.t(al),... ,f(aN)}, (3.12)

where { g ~ , . . . , 8N} are IID uniform random draws from [0, 1] a, and the deterministic
linear algorithm CN : R N --+ R given in Eq. (3.7). In order to ensure that a ran-
dom algorithm U is well defined we need to define an underlying probability space
(~,-.~, >) and measurable mappings _TN : g? -+ R N and CN : R 5r x ~Q -~> /3 such
that U ( f ) is a well defined random element o f / 3 for each f ~ F . We can think of
a random algorithm as a kind of mixed strategy ~7 = (¢N, fN, ~2, ~,/~). Clearly just

26We include the parameters d and r as arguments to.the complexity function to emphasize its depen-
dence on the dimension and smoothness of the set of problem elements F. Obviously we do not treat d as
a variable in F since otherwise Definition 3.1 implies that complexity will generally be infinite (i.e. take
the supremum as d -+ oo).
Ch. 14: NumericalDynamicProgrammingin Economics 645

as pure strategies are a special case of mixed strategies, deterministic algorithms are
special cases of random algorithms when # is restricted to be a unit mass. It follows
immediately that randomized complexity can never be greater than deterministic com-
plexity, although for certain mathematical problems (such as multivariate integration)
randomized complexity is strictly less than deterministic complexity.
DEFINITION 3.2. The worst case randomized complexity of a mathematical problem
A is given by:

compW°r-ran(e) ~ inf {e(U) I e(U) ~< e}, (3.13)


where e(U) and c(U) are defined by:

e(U) = sup [ IlA(f) - 0(co, f)]l#(dco),


e(U) = s u p / ' e(U(co, f), f)#(&o). (3.14)

To illustrate Definition 3.2, consider the randomized complexity of the multivariate
integration problem A defined in Eqs (3.1) and (3.2). The simple Monte Carlo in-
tegration algorithm puts an upper bound on the complexity of the problem given

compW°r-ran(e, d ) = O ( e ~ ) . (3.15,

To see this, note that F is a uniformly bounded class of uniformly equicontinuous

functions, so the classical Ascoli theorem implies that F is a compact subset of
C[0, 1]a. Define the covariance operator 27 : C[0, 1]a × C[0, 1]a -+ R by:

f v= j'f(s)v(8)a(ds)- . f f(s)A(ds) f v(s)a(ds). (3.16)

It is elementary to show that ~ is a bounded, bilinear, positive definite operator.

Indeed, the "standard deviation" operator ~7 : C[0, 1]d --+ _R satisfies:

or(f) - v / f ~ ~< t]fll, (3.17)

which not only implies that ~ is bounded, but that the functional (7 is Lipschitz
continuous. Since the set F is compact, it follows that supfcF c~(f) < oc. Using
H61der's inequality we see that

e(U) sup E
f(gi) - j J'(s)A(ds) } ~< supfcF (7(f)
646 J. Rust

Assuming that c(U) = 7(d)N for some polynomial function 7(d) (which will typi-
cally be linear in d), it follows that by setting N = (K/e) 2, K = supf~F or(f), we
are assured that e(U) <~ e. It follows that the worst case randomized complexity of
multivariate integration is given by:

compW°r-ran(e, d) = 0 ('y(d)
\-g-} • (3.19)

Comparing this bound to the deterministic complexity bound in (3.10), it follows that
randomization does succeed in breaking the curse of dimensionality of multivariate
An implicit assumption underlying the use of random algorithms is that computers
are capable of generating true IID random draws from arbitrary distributions such as
the uniform distribution on [0, 1]. Of course, actual computers only generate pseudo-
random numbers using deterministic algorithms. To the extent that these algorithms are
deterministic it would appear that the results on deterministic computational complex-
ity ought to apply, leading to the conclusion that Monte Carlo integration using actual
random number generators cannot be capable of breaking the curse of dimensionality
of the multivariate integration problem. There has been a large, almost philosophical
debate over this issue. However we agree with the view expressed in Traub, Wasi-
likowski and Wo~niakowski (1988) that "pseudo-random computation may be viewed
as a close approximation of random computation, and that randomness is a very pow-
erful tool for computation even if implemented on deterministic computers" (p. 414).
Indeed, Traub and Wo/.niakowski (1992) have shown that Monte Carlo algorithms
based on a linear congruential generator of period M with a uniformly distributed
initial seed "behaves as for the uniform distribution and its expected error is roughly
N -1/2 as long as the number N of function values is less than M 2 " (p. 323).
As we noted in the introduction, randomization (even if implemented using "truly
random" generators) does not always succeed in breaking the curse of dimensionality:
for example randomization doesn't help for problems such as nonlinear optimization,
solution of PDE's or Fredholm integral equations, as well as the general approximation
problem. A final way to break the curse of dimensionality is to evaluate algorithms
on an average rather than worst case basis. To do this we need to specify a prior
probability measure # over the space F of problem inputs.

DEFINITION 3.3. The average case deterministic complexity of a mathematical prob-

leln A is given by:

compavg-det(¢) ~ ipf {c(U) I e(U ) ~< ¢}, (3.20)

Ch. 14: NumericalDynamic Programmingin Economics 647

where e(U) and e(U) are defined by:

= £ IIA(f) u(f) ll # ( d f ) ,

e(U) = / ) c ( U ( f ) , f) # ( d r ) . (3.21)

Thus, the average case complexity measure is based on minimizing the expected
cost of solving a problem with an expected error no greater than e, where the expecta-
tion is taken with respect to the prior probability measure #. One can also define the
concept of average randomized complexity, compaVg-ran@). However T W W (1988)
showed that. randomization does not help in the average case setting in the sense
that compavg-ran(e) = compavg-det(@ Thus, there is no loss in generality in restricting
attention to deterministic information and algorithms in the average case setting. By
choosing the prior # appropriately, the average case complexity measure can reflect
the types of problems one typically encounters in practical applications. However in
practice it is difficult to specify priors over infinite dimensional spaces and most results
assume fairly specific prior probability measures # such as "Wiener sheet measure"
over the space F - C[0, lid. Relative to this prior, Wogniakowski (199 l) showed that
multivariate integration is a tractable problem:

THEOREM. The average case deterministic complexity of the multivariate integration

problem (3.1) and (3.2) is given by:


Wo~niakowski provided a full characterization of a nearly optimal integration al-

gorithm: it is the Sample average algorithm (3.7), where the integrand f is evaluated
at the HammersIey points in [0, lid. 27
A number of other problems that are intractable in the worst case and randomized
settings have been shown to be tractable in the average case setting. These prob-
lems include multivariate function approximation [Wo2niakowski (1993)], solution of
linear elliptic PDE's, and Fredholm integral equations [Werschulz (1991)]. However
very little is known about the average case complexity of nonlinear problems; for

27See Niederreiter (1992) /'or a definition of the Hammersley points. Wo2niakowski actually charac-
terized the optimal integration algorithm: the optimal integration algorithm is a sample average of shit?ed
Hammersley points. This is not fully constructive since the amount that the Hammersley points need to be
shifted is a quantity that appears to be very difficult to compute. Paskov (1993) characterized the optimal
algorithm for integration of smooth functions under the folded Wiener sheet measure prior. He derived a
nearly optimal algorithm that breaks the curse of dimensionality: a simple average of the integrand f at
points derived from the hyperbolic crosspoints.
648 J. Rust

example it is still an open question whether multivariate optimization is tractable on

an average case basis. Another problem with average case complexity is the difficulty
of specifying meaningful priors over the infinite dimensional space F of problem
elements such as the space of possible MDP problems. Many papers use priors that
yield analytically convenient results, such as Gaussian priors. However once we con-
sider arbitrary alternative priors, methods that have good average case complexity
will also have to have good worst case complexity as Nemirovsky and Yudin (1983)
have noted:

The motivation for our choice of the minimax method for characterizing methods
on classes "according to the worst case" undoubtedly requires some explanation.
There is a widespread belief that the "minimax approach" is too pessimistic; it is
considered more sensible to average the characteristics of the methods on individ-
ual problems of the class according to some a priori distribution. Such a "Bayesian
approach" postulates that "in life" problems of a given type are distributed in a
definite way. We point out, however, that for arbitrary broad classes of problems
there is no way, justified in any degree, of giving such an a priori distribution
over the class of problems. Hopes of an "experimental determination" of such a
distribution are unfounded: if the class of problems is parametric, with 50 param-
eters say, then any reliable "direct" construction of their joint distribution would
require a selection of fantastic size, which is certainly unrealizable. So, even in the
simplest linear problems, an empirical approach to the construction of the a priori
distribution is hopeless. Thus the "Bayesian" approach to the study of methods of
solving arbitrarily wide classes of problems has no future in practice; the recom-
mended methods would have to work well with an arbitrary a priori distribution;
but then they would also be good in the minimax sense (pp. 26-27).

Despite the theoretical merit of this view, the next section will present several exam-
ples of less "conservative" algorithms that seem to work well in practice even though
they can be proven to do poorly on a worst case basis. The classic example is the
simplex algorithm of linear programming which in its modified 'constraint genera-
tion' form has been successfully used to solve fairly large scale MDPs by Trick and
Zin (1993). Another example is low discrepancy numerical integration. Even though
these methods are deterministic (and thus intractable on a worst case basis) the nu-
merical results of Paskov (1994) show that they significantly outperform Monte Carlo
integration in terms of speed and accuracy even in fairly high dimensional problems
(d = 360). In Section 4.3.1 we suggest that these methods could be effective for
continuous MDPs.

4. Numerical solution methods for general MDPs

This section surveys the solution methods for MDPs. We cover both standard ap-
proaches that have been used for many years as well as several promising recent
Ch, 14: Numerical Dynamic Programming in Economics 649

methods for which we have little or no practical experience. We devote separate sub-
sections to the main categories of MDPs: finite and infinite horizon problems and
discrete and continuous problems. This organization is dictated by the substantial
differences in solution methods for these various categories of problems. We also
provide separate treatment of solution methods for CDP's and DDP's since the nature
of the control variable also has a big impact on the computational complexity of these

4.1. Discrete finite hot&on MDPs

The main numerical method for solving finite horizon MDPs is simple backward
recursion (see Eqs (2.3)-(2.6) in Section 2). The integration operator in the finite
state case reduces to simple summation:

tsl ]
Vt(s) = F ( V t ) ( s ) = max u(s,a) + ~ Z Vt+l(s')p(s' j s, a ) . (4.1)
acA(8) #=1

Assume for simplicity that the choice sets A(s) contain a common finite number of
possible actions A. Then it is easy to see that computation of (4.1) for each state s
requires a total of 2(tAIJS I + IAI) additions, multiplications, and comparison opera-
tions, or a total of 2(IAItSl 2 + IAItSI) operations to compute the entire time t value
function Vt. Thus the total operation count in carrying out the dynamic programming
procedure is is 2T(IAIISI 2 + IAIIXl), which is dominated by the squared term for
problems where 1.91 is large. Note that the storage requirements are also O(IAljSI2),
representing the space required to store the transition probabilities {p(s' I s, d)}. It
follows that the complexity of solving discrete finite horizon MDPs is O(TJAIISJ2),
which implies that it is a member of the class P of polynomial time problems, pro-
vided that we measure the size of the problem by the number of discrete states 15'1.
However if we measure the size of the discrete MDP problem by the dimension of
the state vector st rather than by the total number of states ISl, then one can show
that the MDP problem is in the class of exponential-time problems.
Notice that the main bulk of the work required to solve a discrete finite horizon
MDP problem is the computation of the conditional expectations of the value function
for each possible combination of the state s, action a, and time period t: the remaining
summation and maximization operations are of order O(TLA ]ISI) which are negligible
compared to the O(TJAIIS12) operations needed to compute the conditional expec-
tations. There are four main ways to speed up the latter calculations: 1) exploiting
special "sparsity structure" of the transition probability p(s' I s,d), 2) using mas-
sively parallel processors, 3) using fast matrix multiplication algorithms, 4) "action
elimination" methods, and 5) using smooth approximation methods to reduce the cost
of computing and storing value functions for discrete MDPs with huge numbers of
states and controls.
650 z Rust

Exploiting special structure of" the MDP problem. Many economic problems such
as the discretized version of the optimal replacement problem and the optimal con-
sumption and saving problems presented in Section 2 have transition probabilities
that are sparse and they often have a highly regular, recursive structure. For example,
by restricting the distribution of investment returns to a finite interval one obtains a
discrete representation of the consumption/savings problem with a banded transition
probability matrix p. In order to obtain the largest speedups and storage reductions
we need to fully exploit our a priori knowledge of the particular economic problem.
However determining the best way to do this gets more complicated in problems with
multidimensional state variables. To apply formula (4.1) we typically recode the mul-
tidimensional state vector st to have a linearly ordered discrete state representation
s = 1 , . . . , ISI. By doing this coding in the right way we can obtain a representation
for the finite state transition probability matrix that has a desirable sparsity pattern
that substantially reduces the burden of computing the conditional expectation of the
value functions in (4.1). Rust (1991) provides an example of this approach in the
case of an MDP model of optimal retirement behavior where the state variable st has
seven components, i.e. ds = 7. It turns out that for this problem any coding procedure
yields a matrix representation for p(s I I s, d) that is a direct product of a circulant
matrix C, a banded matrix t3, and a dense matrix D. Depending on how we order
these component matrices to form the overall transition probability matrix p we ob-
tain various sparsity patterns which are more or less amenable to rapid computation
on parallel and vector computers. It turns out that the optimal ordering for a vector
processor like the Cray-2 is p = C ®/3 ® D, which yields an upper block triangular
representation for p [for details see Rust (1991)]. This strategy is even more effective
for solving infinite horizon problems by the policy iteration approaches presented in
the next section since policy iteration involves solution of systems of ISI equations
in ISt unknowns, which takes O(ISt 3) time using conventional algorithms for solving
systems of linear equations (e.g. LU factorization and recursive back-substitution).
Relative to naive policy iteration methods that treat p as a dense matrix, Rust showed
that using the optimal ordering p = C ® t3 ® D fully exploits the sparsity pattern
of p and results in speedups of @(ICI 3) where ICI is the order of the circulant ma-
trix C. For the retirement problem considered in Rust's (1991) paper, this resulted in
a speed-up of 27,000 times relative to standard policy iteration algorithms that don't
exploit the sparsity structure of p.

Massive parallel processing technology. Massive parallel processing is just begin-

ning to be applied in computational economics. Note that the backward recursion
approach to solving DP problems is inherently sequential and cannot be paralMized.
However, the majority of the work in each step of backward induction are the
O([A]IS] 2) operations necessary to compute the conditional expectations of the value
function. This task can clearly be performed in parallel. For example, if we have
access to an expandable massive parallel processor with O([S])-processors, then we
Ch. 14: Numerical Dynamic Programming in Economics 651

can assign each separate processor to compute the summation in (4.1) for each state
s = 1 , . . . , 15'1. It follows that the MDP problem can now be solved in O(TIAIISI)
time using ISl processors as opposed to the O(TIA[ISI 2) time required by a single
processor. Rust (1996b) showed that simple backward induction using O(]SI 3) pro-
cessors is capable of solving an MDP with IAI = o(Isl) actions and ISl states in
O(log ISl) time. In the next section we argue that a massively parallel linear equation
algorithm of Pan and Reif (1985) can be'used to solve the infinite horizon MDP prob-
lem by policy iteration in O(log(ISt) 2) time using O ( [ S [ 2"376) p r o c e s s o r s . Although it
seems clear that massively parallel computers will enable us to significantly speed up
the calculation of solutions to finite horizon MDPs as well, most current applications
of this technology [e.g. Coleman (1993)] have focused on infinite horizon problems
so we will defer further discussion of this approach to Sections 4.2 and 4.4.

Fast matrbc multiplication algorithms. Although fast matrix multiplication meth-

ods also offer the potential for large speedups on either serial or parallel machines,
there have been few practical implementations of these algorithms. However this may
change in the near future since the potential gains to using these algorithms are get-
ting increasingly large - especially in large scale problems where the "overhead" of
these algorithms can be "amortized". For example, the number of operations required
to multiply two 'r~ x 'r~ matrices using standard matrix multiplication algorithms is
2'r~3 - r~2. Strassen's (1972) algorithm computes the product in 4.7z~e'8°7 operations,
Pan's (1980) algorithm requires O(r~2'795) operations, and Coppersmith and Wino-
grad's (1987) algorithm requires only O(n 2376) operationsY However these fast ma-
trix multiplication algorithms have larger space requirements and higher fixed costs
relative to the conventional matrix nmltiplication algorithm. For example, Strassen's
algorithm requires 11/3n e memory locations which exceeds the 3n 2 locations needed
for conventional matrix multiplication, and Pan's algorithm requires 24 operations to
multiply two 2 x 2 matrices compared to only 8 operations required by the conven-
tional algorithm. However the break-even point for overcoming the overhead costs
associated with fast matrix multiplication may not be very large. For example when
r~ = 128 the traditional algorithm requires 2,097,152 operations versus 823,723 for
Strassen's algorithm and 797,184 for Pan's algorithm. 29 This suggests that implemen-
tation of fast matrix multiplication algorithms may be a path worth investigating for
solving discrete MDPs with large, ill-structured, and non-sparse transition probability

28Surprisingly, the computation',d complexity of matrix multiplication is not known. All that is known
is that its complexity, comp(n), lies within the following lower and upper bounds: comp(r~) -- Y2(n2) and
coInp(7/,) = O(n2"376).
29These figures are taken from Table 2.4.2 of Kronsj/5 (1985).
652 J. Rust

Action elimination methods. These are methods for identifying and eliminating non-
optimal actions in order to reduce the size of the set of possible actions that must
be searched at each backward induction step. The methods are usually described
in the context of infinite horizon models, see Section 6.7 of Puterman (1994) and
Hubner (1977).

Smooth approximation methods. We defer discussion of this approach to Sec-

tions 4.3.2 and 4.4.2.

4.2. Discrete infinite horizon MDPs

We begin by reviewing the "standard methods" for MDPs. We do not attempt to

offer a comprehensive survey of the huge number of different methods that have
been proposed, but rather focus on the methods which we view as most effective for
problems arising in economic applications. As mentioned in Section 2, the solution
to infinite horizon MDP problems is mathematically equivalent to computing a fixed
point of the Bellman operator V = F ( V ) . The fixed point problem can also be
posed as the problem of finding a zero to the nonlinear functional F ( V ) = 0, where
F = [I - F]. There are two mains ways to compute fixed points to contraction
mappings, successive approximations and the Newton-Kantorovich method. Most of
the other methods that have been proposed for the solution of infinite horizon MDPs
are applications of more general methods for solving systems of nonlinear equations.
This has spawned an almost overwhelmingly large number of iterative techniques
for solving infinite horizon MDPs including variants of Newton's method, Gauss-
Seidel, successive overrelaxation, and so forth. For a review of these methods in the
case of general nonlinear equations, we refer the reader to the classic text by Ortega
and Rheinboldt (1970). Readers who are interested in seeing how these methods
can be applied to the MDP problems should consult Kushner and Kleinman (1971),
Porteus (1980) and Puterman (1990, 1994).

4.2.1. Successive approximation, policy iteration, and related methods

This subsection provides brief descriptions of the methods that are most commonly
used for solving discrete infinite horizon MDPs that arise in economic and operations
research applications. The main issue is the relative efficiency of successive approxi-
mations versus policy iteration in solving large scale problems. Although the standard
implementation of policy iteration becomes computationally infeasible in problems
with ISt ~> 10,000, there are modified versions of policy iteration that appear to dom-
inate successive approximation and its variants. Section 4.2.2 verifies this conclusion
in a numerical comparison of the accuracy and efficiency of the various methods
in solving a discretized version of the auto replacement problem in Example 3 of
Section 2.3.
Ch. 14: Numerical Dynamic Programming in Economics 653

Successive approximations. 3° Starting with an arbitrary initial guess V0 of the so-

lution of Bellman's equation, one simply iterates the Bellman operator

vk+~ = r ( v k ) = r k+~ (v0). (4.2)

Note that in the case where we set V0 = 0, the method of successive approximations is
equivalent to solving an approximate finite-horizon problem by backward induction.
The Contraction Mapping theorem (see Theorem 2.1 in Section 2) guarantees the
consistency of this algorithm: in particular the contraction property implies that 11V -
Vkl] converges to zero at a geometric rate, sometimes referred to as linear convergence.
Furthermore this same error bound can be used to determine an upper bound on the
number of iterations required in order to be within some pre-specified tolerance e
of the true solution V. Indeed, a straightforward calculation shows that a maximum
of T(e, fl) successive approximation steps are required to obtain an c-approximation,
where T(c, fi) is given by:

T(e, fi) = J 1' ~ ' ]l°g ( ( 1 _ - f l ) e ) " (4.3)

This upper bound is tight: one can construct MDPs that take exactly T(e, fl) - 1
successive approximation steps in order to compute an e-approximation to V (for ex-
ample consider the case where u(s, a) = 1 so the MDP reduces to a simple geometric
series). In problems where the discount factor fl is very close to 1 such as problems
where the time intervals are relatively short (such as daily or weekly), so we can see
from (4.3) that an unacceptably large number of successive approximation steps will
be required to obtain any desired level of numerical accuracy e.

Accelerated successive approximations. In certain circumstances the method of suc-

cessive approximations can be significantly accelerated by employing the McQueen-
Porteus Error Bounds. If V is the true solution to Bellman's equation, V = F ( V ) ,
and 170 is any initial starting estimate for successive approximations, then after k
successive approximations steps V must lie within upper and lower bounds given by:

Fk(Vo) +_bke < v ~< Fk(Vo) + g~e, (4.4)

where e denotes an ] S I x 1 vector of l's, and:

_bk --/3/(1 - / 3 ) min [Ck (Vo) - ] - k - I (Vo)], (4.5)

bk = fl/(1 - / 3 ) max [F k (17o) - F k - I (Vo)].

3°The method also goes by name value iteration, contraction iteration, backward induction, or simply
dynamic programming.
654 ,I. Rust

The contraction property guarantees that _bk and bk approach each other geometrically
at rate/3. The fact that the fixed point V is bracketed within these bounds suggests
that we can obtain an improved estimate of V by terminating the iterations (4.2) when
[bk - bk ]< e, setting the final estimate of V to be the median bracketed value:

Vk = F k ( V 0 ) +
2 e. (4.6)

Bertsekas (1987) p. 195 sl~ows that the rate of convergence of {Vk} to V is geometric
at rate/31A=I, where A2 is the subdominant eigenvalue of Ms. In cases where 1 21<
the use of the error bounds can lead to significant speed-ups in the convergence of
successive approximations at essentially no extra computational cost. However in
problems where M s has multiple ergodic sets ]A21= 1 and the error bounds will not
lead to an appreciable speed improvement as illustrated in computational results in
Table 5.2 of Bertsekas (1987).

Policy iteration methods'. In relatively small scale problems (ISI < 500) with dis-
count factors sufficiently close to 1 (e.g./3 > 0.95) the method of Policy Iteration is
generally regarded as one of the fastest methods for computing V and the associated
optimal decision rule a. The method starts by choosing an arbitrary initial policy,
a 0 ) l Next a policy valuation step is carried out to compute the value function V~0
implied by the stationary decision rule so. This requires solving the linear system
Vso = uso +/3Mc~oVc~o where M s is the Markov operator, which in the finite state
case reduces to the S x S transition probability matrix with (i, j) element equal to
M s ( i , j ) = p(s' = sj [ si, ~(si)). More generally the Markov operator is the linear
conditional expectations operator defined by

M s V ( s ) = f V(s')p(ds' I s, a(s)). (4.7)

Once the solution V~0 is obtained, a policy improvement step is used to generate an
updated policy al:

al(s) = arg max i u(s,a) +/3 ZISl V~o(s')p(s'ls, a )1 . (4.8)

aEA(s) s/=l

Given al one continues the cycle of policy valuation and policy improvement steps
until the first iteration k such that ak = ak-1 (or alternatively Vsk = Vsk_~). It
is easy to see that such a Vsk satisfies Bellman's equation, so that by Theorem 2.3

31One possible choice is so(s) = arg aEmA(axs)[u(s,a)].

Ch. 14: Numerical Dynamic Programming in Economics 655

the stationary Markovian decision rule a = ak is optimal. Policy iteration always

generates an improved policy, i.e. V~k ~> V~_~, see Bertsekas (1987) or Puter-
man (1994). In fact, policy iteration always generates a strict improvement in V~k,
since if V~ k = V~k_ ~ the method has already converged. Since there are only a finite
number IA(1)I x . . . x IA(ISI)t of feasible stationary Markov policies, it follows that
policy iteration always converges to the optimal decision rule a in a finite number of
Policy iteration is able to discover the optimal decision rule after testing an amaz-
ingly small number of trial policies ak: in our experience the method typically con-
verges in under 20 iterations. The reason for the fast convergence of this method is
closely connected to the very rapid quadratic convergence rates of Newton's method
for nonlinear equations: indeed Puterman and Brumelle (1979) showed that policy
iteration is a form of the Newton-Kantorovich method for finding a zero to the non-
linear mapping F : R 1SI --+ RISI defined by F ( V ) = [I - F](V). The error bounds
for Newton-Kantorovich iterations are valid for general Banach spaces, which may
be the reason why in many applications the number of policy iteration steps required
to discover the optimal policy appears to be independent of the number of states,
ISt. However the amount of work per iteration of policy iteration does depend on
ISI and is significantly larger than for successive approximations. Since the number
of algebraic operations needed to solve the linear system (3.3) for V~ k is O(tSI3),
the standard policy iteration algorithm becomes impractical for problems with ISI
more than several hundred thousand elements. 32 To solve very large scale MDP prob-
lems, it seems that the best strategy is to use policy iteration, but to only attempt to
approximately solve for V~ in each policy valuation step.

Modified policy iteration. This approach uses successive approximations to approx-

imate the solution to the linear system V~ = G~(V~) rather than computing the
exact solution V~ = [I - p M , ] - I u a at each policy valuation step. The successive
approximations iterations for the operator G~ can be further accelerated by using
the McQueen-Porteus error bounds described above. The following Theorem of Put-
erman and Shin (1978) shows that asymptotically, each modified policy iteration is
equivalent to performing a large number of successive approximation steps.

THEOREM 4.1. Let a be an optimal policy for a stationary discounted MDP problem,
and let ak be the decision rule generated at step k of the modified policy iteration
algorithm that uses N successive approximation steps as an approximate solution for
V~ k in each policy valuation step. If'.

lim M,~ k --
M~ I= O,
32Supercomputers using combinations of vector processing and multitasking regularly now solve dense
linear systems exceeding 1,000 equations and unknowns in under l CPU second. See for example, Don-
garra (1986).
656 J. Rust


li-m IVN+J - - v l -- /3N+1. (4.9)

IvN- vl
Thus, modified policy iteration can also be thought of as an accelerated form of
successive approximations. It can be effective in problems where/3 is relatively low,
although in problems where /3 is close to 1 it tends to suffer from the same slow
convergence properties as" successive approximations. However our numerical results
in Section 3.5 demonstrate that when we use the McQueen-Porteus error bounds to
accelerate the use of successive approximations to solve the fixed point problem V~ =
G~(V~) modified policy iteration comes close to achieving the rapid convergence
properties of standard policy iteration, but requires significantly less work to compute
each policy valuation step.

Policy iteration with state aggregation. The strategy of this class of methods is to
use policy iteration and standard linear equation solvers to compute the exact solution
~ to a lower dimensional version of (3.3) that groups the IS] elemental states of
the original problem into a smaller number K of aggregate states. In an aggregation
step we choose a partition of the state space SI, • • •, SK (methods for choosing the
partition will be described shortly). If aggregate state i has Ni elements, we can define
a transition probability matrix M ~ on the aggregate d state space by:

m i
(4.1 o)
sES~ s ' E S j

Let ~ be an K x 1 vector of the average utilities in each of the K aggregate states.

Then if K is sufficiently small, one can use standard Gaussian elimination algorithms
to rapidly compute the exact solution go = [I - / 3 M ~ 1 - 1 to the K-state problem.
In a disaggregation step we use ~ of to construct an approximation of the ISI-state
value function Vc~.The partition $1,..., SK can be represented by an ISI x K partition
matrix W defined by:

1 i f / C SA, (4.11)
Wij = 0 otherwise.

Using W we can then compute an approximate solution V~ = W ~ to the original

problem (3.3). Notice this approximate solution will be a step function equal to ~ ( i )
for each elemental state s c Si.
Bertsekas and Castafion (1989) have shown that a better approach is to use W ~
as an additive correction in an iterative procedure that takes an initial estimate V and
Ch. 14: Numerical Dynamic Programming in Economics 657

generates a more accurate solution V that avoids the jaggedness of the step function
approximation W~3s. In this case the appropriate formula for g s becomes:

~s(,:) = N ~ [Cs (V) (s) -- V(s)]. (4.12)

To see why (4.12) is appropriate, suppose there exists an K x 1 vector y that solves
the equation

Vs = V + W y , (4.13)

where V~ is the solution to (3.3). Using the equations G s ( V ) - us +/3MsV and

Vs = G s ( V s ) = u s + f l M s V ~ we have:

[I - ZMs] (Vs - V ) : C~(V) - V. (4.14)

Multiplying both sides of (4.14) by ( W ~ W ) - 1 W ' and substituting W y - (Vs - V )

in the left-hand side we obtain:

(w'w)-xw'[I - ZMs]wy : (w'w)-'w'(cs(v) - v). (4.15)

Notice that since W is a partition matrix, ( W ' W ) 1 is an K x K diagonal matrix

whose ith diagonal element is I/N~. It follows that the solution to Eq. (4.15) can be
written as:


where I is the K x K identity matrix and go is given by (4.12). Thus, (4.16) is the
appropriate form of the aggregation step when the aggregation term W ~ s is treated
as an additive correction to an initial ISI x 1 estimate V as in (4.13). An alternative
formula for the disaggregation step can be found by applying G s to both sides of the
equation Vs = V + W g s yielding:

v~, ~_ c s ( v s ) + # M ~ W v s . (4.17)

Bertsekas and Castafion show that by interspersing a number of successive approxi-

mation steps Vt+l = Gs(Vt) between each pair of aggregation/disaggregation steps
(4.16) and (4.17), one can guarantee that the method will converge to the fixed point
Vs. Thus, in the initial stages the additive correction term Wg,~ succeeds in shifting
an initial estimate V to a neighborhood of Vs, and this estimate is refined by a small
658 J. Rust

number successive approximation steps, This cycle is then repeated, and as the re-
sulting sequence {Vt} converges to V it is easy to see from (4.12) that the additive
corrections W g ~ in (4.13) converge to 0.
The aggregate states can be chosen on basis of a fixed partition of the state space,
or can be chosen adaptively after each aggregation step k in order to minimize the
residual variation in G~(Vk) - Vk. One way to do this is to divide the total variation
A = max[G~(Vk) - Vk] - min[G~(Vk) - Vk] into K equal intervals, assigning states
with residuals G~(Vk)(s) - Vk(s) in the highest interval to aggregate state 1, states
with residuals in the next highest interval to aggregate state 2, and so on, i.e.

s E Sk if G,~(V)(s) - V(s) b- (k- 1)A C (O,A], (4.1S)

where b = m i n [ a ~ ( v k ) - Vk].

4.2.2. Comparison of methods in the auto replacement problem

As an illustration, we report the results of a comparison of the performance and accu-

racy of each of the methods presented in the previous section in the finite-state version



.-Y 7
'<:' co

ff l i

7> - - Continuous-Store S=[O,~)

I] ............ Discrete-Stote S=(1 ..... 100)

O4 / - - - Discrete Stote S={1 ..... 10]

0 10 20 50 40 50 60 70 80 90 O0

Stote VGriGble st

Figure 14.1. Comparison of discrete vs. continuous value functions in auto replacement problem.
Ch. 14: Numerical Dynamic Programming in Economics 659

of the automobile replacement problem in Example 3 of Section 2.3, a problem that

was originally posed and solved by policy iteration in Howard's (1960) monograph.
The parameters of the finite-state problem were chosen to match the analytical so-
lution given in (2.22) in the case where ,k = 0.5, /3 = 0.95, c(s) -= 200s, and
P - P_ = 100,000. Calculating the optimal stopping boundary 3' in (2.23), we see
that it is optimal to replace the automobile when st > ~/-- 52.87. The corresponding
value function is plotted as the solid line in Fig. 14.1.
An approximate discrete-state version of the problem was solved using ]S I = 100
states and the same discount factor, cost function, and replacement costs. The expo-
nential specification (2.17) for p(. [ st, dt) in the continuous case was approximated
with a 12-point probability distribution in the discrete case, using a simple continuity
correction. Thus, each of the 12 mass points were computed as the probability an
exponential random variable falls within plus or minus 0.5 of the integer values j that
st assumes in the discrete case:

o k exp{-Ay}dy, j~0~

f j+0.5 • exp{-)~y}dy, j = 1,...,10, (4.19)

/.oo/k exp{-/ky}dy,

The discrete-state value function (computed by policy iteration) is plotted as the

dotted line in Fig. 14.1, with one dot for each point in the state space. One can see
from Fig. 14.1 that the discrete-state value function approximates the continuous-state
version quite closely: the maximum absolute deviation between the two functions was
1164, and the maximum percentage deviation was just over 1%. Figure 14.1 also plots
an interpolated value function solved with only IS[ = 10 states. The maximum error
in this case is 3553, representing a 1.8% deviation. One can see that even very coarse
discretizations are able to do a good job of approximating a continuous underlying
value function. This provides some insight into the potential effectiveness of solution
methods based on state aggregation.
Tables 14.1 and 14.2 present a comparison of six alternative solution algorithms
used to solve the 100 state replacement problem. 33 The tables present the number of
iterations and CPU times required by each algorithm to compute an estimate of V
to within a tolerance of 1164, the maximum deviation between the continuous and

33Each of these methods were programmed in Gauss and run on an IBM 386/SX computer. The code,
also written in Matlab and C, is available from the author upon request.
660 J. Rust

Table 14.1
Comparison of solution methods/3 = 0.95

Method Iterations CPU seconds [V - V* [ /~ - _b IV - F (V) [

1. Successive approximations 114 48.9 571.7 26.5 30.6
2. Error bounds 65 29.7 503.4 1141.0 48.5
3. Policy iteration 6 46.1 1.1E-9 I. 1E-9 5.8E-11
4. Modified policy iteration 5 21.8 52.3 301.4 8.6
5. Fixed state aggregation 6 31.4 20.9 336.0 8.8
6. Adaptive state aggregation 8 171.8 40.4 291.2 7.7

Table 14.2
Comparison of solution methods/3 = 0.9999

Method Iterations CPU seconds IV - V*[ /~ - b__ IV - F(V)[

1. Successive approximations >10,000 > 4,600 > 30, 000 1.1E-8 3.0
2. Error bounds 166 75.7 4.4E-1 114.5 5.7E-3
3. Policy iteration 8 71.0 2.9E-7 2.9E-7 1.5E-11
4. Modified policy iteration 11 50.0 174.9 219.4 2.9E-2
5. Fixed state aggregation 10 51.9 3.4 93.8 4.7E-3
6. Adaptive state aggregation 15 1296 2.4E-1 58.4 2.9E-3

d i s c r e t e - s t a t e f o r m u l a s for V. 34 T h e m o d i f i e d p o l i c y iteration a l g o r i t h m u s e d N = 2 0
s u c c e s s i v e a p p r o x i m a t i o n steps to c o m p u t e a n a p p r o x i m a t e fixed p o i n t o f G , . T h e
M c Q u e e n - P o r t e u s e r r o r b o u n d s w e r e e m p l o y e d to test for c o n v e r g e n c e a n d p r o d u c e an
i m p r o v e d final e s t i m a t e o f V~. 35 T h e state a g g r e g a t i o n m e t h o d s u s e d M = 10 aggre-
gate states. T h e fixed state m e t h o d s i m p l y p a r t i t i o n e d the states into 10 e q u a l groups,
S~ = { 10 * (i - 1) + 1 , . . . , 10 * i}. T h e a d a p t i v e state a g g r e g a t i o n m e t h o d p a r t i t i o n e d
t h e states e n d o g e n o u s l y , a c c o r d i n g to the m a g n i t u d e o f the r e s i d u a l s G ~ ( V ) - V as
g i v e n in (4.18). In b o t h o f the a g g r e g a t i o n m e t h o d s , we set a r e l a t i v e l y l o o s e i n n e r
c o n v e r g e n c e t o l e r a n c e : iterate Vk was d e e m e d to b e a sufficiently g o o d a p p r o x i m a -

34Successive approximations were terminated when the crude estimate of the maximum deviation
between Vt and V was less than 1164. Successive approximations with error bounds were stopped when
the more refined McQueen-Porteus error bounds indicated that the maximum deviation of Vt and V was
less than 1164. The various policy iteration algorithms were terminated at the first iteration k such that
ak -- ak--l, although in the case of the approximate policy iteration algorithms, additional steps were
continued until the McQueen-Porteus error bounds for the operator F indicated that the estimated value
function Vc~k was within 1164 of V.
35More precisely, one successive approximations step V21 = F(V2o) was performed using the operator
F after 20 successive approximations steps Vt+l = Gc~k (Vt) were performed with the operator G=~.
The final estimate of Vak in modified policy iteration step k is then given by: Vk = V21 + (b + __b)/2
where b and _bare the McQueen-Porteus error bounds for the operator 1".
Ch. 14."Numerical Dynamic Programming in Economics 661
Successive Apgro~icnotions Policy Iteration

;YL, /

/ ~ . . . . . . . . . . . . . . . . . . . . . . . . . . . .


, , , , . . . . . , , , , , , , , , ,
o m ~o 3v 4o so so 7o ao 9o Ioo i0 20 30 40 50 60 70 ~lo 90 I00
Stole Vormbfe s t stole V~riobie S t

Successive Approximations with Error Bounds Modified Policy I t e r a t i o n

< - -.'- . . . . . . . . . . . . . . . . =.2.2
........ ' 5 . . . . . . . . . . . . . . . . . . _ 2 - ' - - -- -- --

~._ / --- Iteration 3

/ . ~...f>

, , , , , , , , , , , , , , , , , , ,
10 2o 30 40 50 60 70 ao 90 I00 °0 to 20 30 40 50 ~0 70 ao 90 mo

State Variable St Slete Vorloble S t

Figure 14.2. Convergence of Vt to V when fl = 0.95.

tion of the fixed point V~ = G,~(Vc~) if the McQueen-Porteus error bounds _bk, bk for
the operator G~ satisfy bk - _bk < 30000(1 - / 3 ) / / 5 . As soon as the policy converged
(c~k = ak t), additional policy valuation steps were carried out with a tighter conver-
gence tolerance of 500(1 - / 5 ) / / 3 until the resulting estimate for V~ k was within 1164
of V (computed via the McQueen-Porteus error bounds for F). Table 14.2 presents
a comparison of the six methods in the case where/5 = 0.9999.
Figure 14.2 illustrates the convergence trajectories of 4 of the algorithms. Succes-
sive approximations converges to V from below, reflecting the fact that it is solving
a finite-horizon approximation to the infinite-horizon problem. Policy iteration con-
verges to V from above, reflecting the fact that each successive policy a t represents
an improvement over the previous policy as discussed earlier) 6 Convergence to V
under the error bounds and modified policy iteration procedures is not necessarily
monotonic. In the former case one can see that the addition of the correction term
(b + _b)/2 quickly translates the iterates ~ into a region much closer to V (com-
pare the first iterate V1 using the error bounds correction to the the second iterate I~

36Note that we have cast the automobile problem as a minimization problem, in maximization prob-
lems the direction of convergence of policy iteration and successive approximations is reversed. Also the
monotonicity of successive approximationsiterations is a result of our choice for the initial value, Vo = 0,
662 Z Rust

under successive approximations). Once in a general neighborhood of V, successive

approximations succeed in "bending" lit into the precise shape of V. Modified policy
iteration is not monotonic because approximating the fixed point V~ = Go (V~) by N
successive approximation steps implies that each iterate Vk = GNk (Vk-1) will be an
underestimate of the true solution V,~. Thus, the sequence {Vk} generated by mod-
ified policy iteration is essentially the same as under policy iteration, but translated
downward. In this case the sequence {Vk} ended up converging to V from below.
Note also the inexactness of the approximate solutions under modified policy itera-
tion imply that more iterations are required to get close to V in comparison to policy
Overall the results indicate that in this problem policy iteration using an approxi-
mate rather than exact solution to the inner fixed point problem V, = Gc~(V~) was
the fastest of the methods considered. In the early stages of policy iteration, it is not
necessary to solve (3.3) exactly to insure rapid progress towards V and the optimal
decision rule a. Since the decision rules c~k are the results of finite maximizations, V~
does not have to be an extremely precise estimate of V to insure that the correspond-
ing policy at coincides with a. However once at = a (which is indicated by the fact
that c~t does not change in successive iterations) one can set finer tolerances for the
approximate solution of V~ = G~ (V~) in (3.3) to generate a more accurate estimate of
V and insure that the solution ak really does correspond to the optimal decision rule a.
We found that the Bertsekas-Castafion adaptive state aggregation procedure was not
effective in this particular problem. We noticed that the adaptively chosen states typi-
cally varied quite substantially over successive aggregation steps. The variations in the
membership of the aggregate states frequently resulted in an approximate disaggregate
solution G~(V) +/3Po~Wg~ that would tend to move away from the fixed point V~,
requiring a large number of intervening successive approximations steps to move the
solution back towards V~. Use of a fixed set Of aggregate states performed much more
effectively in this problem, reflecting our ability to closely approximate a continuous
value function using very coarse discrefizations as demonstrated in Fig. 14.1. 37 The
McQueen-Porteus error bounds are also a very cost-effecfive method for accelerating
convergence, particularly when used to accelerate the method of successive approxi-
mations for the fixed point problem V~ = G~(V~) encountered at each of the policy
valuation steps.

4.2.3. New approaches to solving large scale problems

The rest of this subsection discusses several promising recent ideas for speeding up
the solution to large scale discrete infinite horizon MDPs. The first example describes

37However in less structured problems it may not be clear how to choose the aggregate states, and
in problems with multiple ergodic classes use of a fixed set of aggregate states is typically ineffective.
In these problems, Bertsekas and Castafion (1989) have found that the adaptive aggregation method can
outperform the modifiedpolicy iteration algorithm.
Ch. 14: Numerical Dynamic Programming in Economics 663

the use of linear programming methods as a potentially attractive alternative to policy

iteration. The second example describes an idea of Pan and Reif (1985) to use massive
parallel processors and Newton's method to solve the linear system V~ = us +
/3Mo,V,~ in O(log(]SI) 2) time using O(ISl ~°) processors, where co ~ 2.376 is the
best lower bound on fast matrix multiplication. The third example discusses the use
of a parametric approximation method known as the "minimum residual algorithm"
(MR) to approximately solve the system of linear equations involved in carrying out
the policy iteration algorithm. For certain MDPs with sparse transition probabilities
the MR algorithm can reduce the time required to solve the policy valuation step
from O(ISt 3) to as little as O(ISl) time depending on the sparsity of the transition
probability matrix and the level of precision ¢ required in the solution. A fourth
approach, action elimination was already mentioned in Section 4.1. We refer the
reader to Puterman (1994) Section 6.7 for a detailed discussion of this approach.

Linear programming. It is well known [see, e.g. Bertsekas (1987)] that the value
function for a discrete infinite horizon MDP problem is the solution to the following
linear programming problem:

min Z V ( s ) (4.20)
VCRIsl s = l

subject to:
>1 a) + I

a = 1 , . . . , tA(s)], s = 1 , . . . , ISl. (4.21)

Many people have presumed that solving the MDP problem as an LP problem is a
relatively inefficient way to solve large-scale MDPs. However a recent paper by Trick
and Zin (1993) observes that generally only IS I of the IAIISl constraints (4.21) will be
binding since there will generally be a unique optimal action. Therefore they propose
solving the LP problem using a technique known as constraint generation which
solves an expanding sequence of LP's starting from a very small LP problem that only
imposes a small subset of the constraints (4.21) and then sequentially re-solves the LP
problem adding discarded constraints that were violated by the previous trial solution
until the optimal solution V is found. Trick and Zin claim that constraint generation
has been "spectacularly successful" in solving large scale LP's in other applications
including "a formulation for the traveling salesman problem that is estimated to have
26o constraints is solved in a matter of hours on a workstation" (p. 8). Trick and Zin
used constraint generation to solve a discretized version of the continuous MDP test
problem used in the JBES (1990) "horse race". Their discretization used a two-state
664 z Rust

Markov chain approximation for the exogenous "technology shock" process {zt} and
varying size discretizations for the endogenous capital stock variable kt. They found
that even without employing constraint generation, the standard commercial linear
programming code "CPLEX", was able to solve the MDP problem between 33 to 13
times faster than policy iteration as the number of capital grid points ranged from
33 to 513. By switching to constraint generation they were able to speed up the
solution of the 513 state problem by another 2.5 times, or a total speedup of nearly
33 times relative to policy iteration. Trick and Zin also describe multigrid versions
of their LP/MDP constraint generation algorithm that are similar in many respects to
the Bertsekas-Castafion adaptive aggregation approach to policy iteration. Their first
version, which they refer to as "grid generation" begins by solving the LP problem
using only a subset of the states s c S and then sequentially adds additional states to
the linear programming problem using the previous solution as a starting point to the
problem with a larger set of states. A second version, which they refer to as "adaptive
grid generation" is described as follows:

We begin by optimizing over a coarse evenly spaced grid using constraint gener-
ation. Each state s on this coarse grid has an optimal action c~(s) and a shadow
price A(s) generated by the LP solution. This shadow price measures the impact
that on the sum of the value function ordinates (the objective function (4.20)), of a
small change in the constraint, specifically u(s, c~(s)). We next calculate the slack
of each constraint adjacent to to the constraint for c~(s). This amounts to finding
the change in u that would result in the constraint holding with equality. We then
multiply these slacks by the shadow price to obtain a measure of the impact on
the objective function of placing a new grid point adjacent to c~(s). We do this for
all s. We then add the actions and the states corresponding to the highest values
of this product of shadow price and slack. The new points are always the mid-
point between two existing points [paraphrase of Trick and Zin (1993) p. 17 in the
notation of this chapter].

Trick and Zin found that their adaptive grid generation procedure does a good job
of approximating value functions to continuous state MDP problems, especially in
regions where the value function has lots of curvature since the method tends to place
more grid points in these regions.

Massively parallel policy iteration. This is the standard policy iteration algorithm
using the massively parallel linear equation solver of Pan and Reif (1985) to ap-
proximately solve each policy valuation step. The Pan-Reif algorithm uses Newton's
method to compute an approximate inverse of the IS] x IS] matrix M = [ I - ~M~] of
the linear system that must be solved at each policy iteration step in O(log(ISI) 2)
time using O(]S[ ~) processors, where co is the best lower bound on fast matrix
multiplication, currently co = 2.376. The Newton algorithm for matrix inversion
Ch. 14."NumericalDynamicProgrammingin Economics 665

works as follows. Given an initial estimate L0 of the inverse M -1 which satisfies

HI - LoMII = q < 1, one can use Newton's method to solve the matrix equation
F ( X ) = [I - X M ] = 0 yielding iterations of the form:

Lk = (21 - L k - I M ) L k - 1 . (4.22)

An alternative iterative procedure that yields similar results is

Lk = L o l l [I + ( I - LoM)] 21. (4.23)

As is characteristic of Newton's method, the iterates Lk converge quadratically to the

zero M -1 of F ( X ) :

I I M -1 - Lk[[ ~< q2k IIL°I-------~[ (4.24)

(1 - q )
Using this error bound, Pan and Reif established the following lemma:

LEMMA 4.1. Suppose that the initial [S I x ISI starting matrix Lo for Newton's iteration
(4.22) is chosen so that q ~ III - LoMIL satisfies:

q= 1 ,c.,,OQ. ~ as ISI-+ oo. (4.25)

Let e be a positive constant. Then O(log(ISI) iterations of the Newton algorithm

suffice in order to compute an approximate inverse ~ - 1 that satisfies:

IIM-'- -'11 llLo11. (4.26)

Now observe that Newton's method involves repeated multiplication of [S I x [S t

matrices. This can be done in O(log(ISI) ) time using O(ISl ~) processors where ~
2.376. By L e m m a 4.1 only O(log(IS[) ) Newton iterations are required to obtain an
approximate inverse Lk satisfying (4.26). Therefore we have the following theorem:

THEOREM 4.2. Suppose that the initial estimate Lo of M -1 satisfies inequality (4.25),
and let c be a positive constant. Then a massive parallel processor with O([Sp °)
processors can find an approximate solution Lk satisfying the convergence criterion
(4.26) in O ( l o g ( I S I ) 2) time.
666 ,L Rust

Since policy iteration is also a version of Newton's method, we can use the P a n -
Reif Newton algorithm to generate a sufficiently precise approximate solution to the
value function Vs at each policy iteration step, guaranteeing that policy iteration
using the approximate solution Vs is still quadratically convergent. If the number of
policy iteration steps required to converge to the optimal policy is independent of the
number of states IS[ (or at least does not grow systematically with ]El), it follows that
the Pan-Reif massively parallel policy iteration algorithm is capable of solving the
MDP problem in O(log(IS[) ~) time steps using a massively parallel processor with
o(Isl w) Pr o c essors. 38 Although the Pan-Reif method has not yet been implemented
to our knowledge, Archibald, McKinnon and Thomas (1993) provide a discussion of
the practical implementation and performance of parallel policy iteration algorithms.

Policy iteration using minimum residual algorithms. The result of the previous sec-
tion is currently primarily of theoretical interest due to the fact that most economists
do not have access to the truly massive parallel processing technology required by the
Pan-Reif method as well as the fact that fast matrix multiplication algorithms required
by the Pan-Reif algorithm described above are not yet widely available in standard
numerical libraries. A further drawback to the use of Newton's method to invert the
IS] x IS] matrix I - / 3 M s is that it involves unnecessary work if our objective is
to simply solve a single system of linear equations V,~ = us + flMsVs at each pol-
icy valuation step. We now describe a class of "minimum residual algorithms" (MR)
for solving this system that is highly efficient, especially in cases where the transi-
tion probability matrix M s corresponding to p(s ~ I s, a) is sparse. MR algorithms
have been proven extremely effective in solving very large scale systems arising in
chemistry and physics applications. The M R algorithm is similar to modified policy
iteration in that it is an iterative method that only attempts to approximately solve
the linear system Vs = Gs(Vs) at each policy valuation step. In order to motivate
the method, recall that modified policy iteration produces an approximate solution Vs
given by the following linear combination:

i ir (4.27)

where M s is me ISI x IS[ matrix representation o f p ( s ' I s, c~(s)) and u s is the tSI × 1
vector with elements u(s, c~(s)). If M s is a dense matrix, each policy valuation step

38The issue as to whether the number of policy iteration steps required to solve an MDP problem is
independent of ISI is not fully resolved at this point. The results of Puterman and Brumelle (1979) show
that policy iteration is a form of the Newton-Kantorovichmethod whose error bounds are valid for general
Banach spaces and thus apparently independent of the problem dimension ISI. However John Tsitsiklis
(private communication) has suggested that it may be possible to imitate the Klee and Minty (1972) worst
case arguments for the complexity of linear programming and construct worst case MDPs for Which the
number of policy iteration steps increases linearly with ]SI.
Ch. 14: Numerical Dynamic Programming in Economics 667

of modified policy iteration requires O(ktSt 2) operations, since each matrix-vector

multiplication requires O(ISI 2) operations. Modified policy iteration will dominate
ordinary policy iteration if k can be substantially less than ISI without substantially
increasing the total number of iterations necessary to achieve convergence. In addition
if M s is a sparse matrix with O(ISI) nonzero elements, then only O(klSI) operations
are required to carry out each policy valuation step.
We now show that there are better methods for approximately solving the policy
valuation step that require a significantly smaller number k of matrix-vector multi-
~ications than required by modified policy iteration to obtain approximate solutions
V~ of specified accuracy. We can write the linear system u = [I - / 3 M ] V in generic
form as:

u = LV, (4.28)

where the matrix L = [I - ¢3M] has condition number IILIIIIL-111 that is bounded
above by 1. The MR method computes an approximate solution to the linear sys-
tem (4.28) as the linear combination Vk given by:

Vk z E c * L i - i1 u (4.29)

where the coefficients ( c T , . . . , c~) are chosen by the method of ordinary least squares
to minimize the residual difference t t u - MVk If2:

( c ~ , . . . , c~) = arg min U - ~ciLi-lu . (4.30)

(Cl'""ck)ERk i=t 112

The MR algorithm is just one example of many algorithms that use the Krylov
information: 39

Ik(L,u) = (u, Lu, L 2 u , . . . , L k u ) . (4.31)

Let F denote the class of orthogonally invariant matrices. 40 Traub and Wo~niakowski
(1984) proved that the MR algorithm is nearly optimal for solving the linear system
(4.28) provided the matrix L is in the class F of orthogonally invariant matrices. More

39Other methods that use this information include the generalized conjugate residual method (GCR),
the Chebyshev algorithm, and simple successive approximations.
4°A class of matrices F is orthogonally invariant if for each L E F we have QLQ ~ c F for any
orthogonal matrix Q. It is easy to verify that the class of all matrices with uniformly bounded condition
numbers is orthogonally invariant.
668 J. Rust

precisely, they proved that the value of k required by the MR algorithm to produce
an c-approximation of the true solution V to (4.28) is at most one step larger than
the lower bound k(e, F ) on the value of k required to produce an c-approximation
for each L E/7,41 Further Chou (1987) proved that the Krylov information is almost
optimal in the sense that the value of k(e, F ) under the ga'ylov information is at most
twice the lower bound on this index using the "optimal information" for this problem.
Traub and Wo£niakowski derived an explicit expression for the value of k required
by the MR algorithm to obtain an c-approximation for the class of matrices F defined
by F = {L I L = I - / 3 M , [IMII ~< 1}. Notice that this class includes the matrices
arising in MDP problems. For this class F their formula for the minimal value of k,
denoted by kMR(e, F), is given by:

k M R ( e ' F ) = m i n ( IS'' k,og/p

)'l°g!e~!(1-~)] + 1 ) ' (4.32)

where 3 E [0, log(1 -/32 +/32e2)/log(e2)]. For problems where extremely high pre-
cision is not required we have kMR(e, F ) << ISl, so if the matrix L is sparse with
O(ISI) nonzero elements, then the total work involved in carrying out each policy
valuation step using the MR algorithm is only o(Isl) compared to O(IS[ 3) using
standard linear equation solvers.
Saad and Schultz (1986) present numerical comparisons of a generalized iterative
version of the MR algorithm known as GMRES. This algorithm has been successfully
used to solve extremely high-dimensional systems in chemistry and physics. 42 The
GMRES algorithm uses Gram-Schmidt orthogonalization of the Krylov information
(u, L u , . . . , Lku) to allow sequential solution of the least squares problem (4.30).
This allows periodic restarting of the GMRES algorithm and adaptive monitoring of
the approximation error IIu - LVk II, allowing one to stop the algorithm when desired
solution tolerance e is obtained rather than precommitting ex ante to a fixed value
of kMR(e, F). Elman (1982) proved that the sequence of iterates from the GMRES
algorithm with m restarts satisfy the following error bound:

- - LV011, (4.33)

where 7 = Amax(L'L), and Amax(UL) denotes the largest eigenvalue of the matrix
L~L (which is equal to (1 -/3)2 in this case). This formula shows that increasing
the number of restarts improves the rate of convergence of the GMRES algorithm. In

41The function k(c, F) is called the optimal class index and can be regarded as a type of complexity
bound, see Traub and Wo/niakowski (1984) for a definition.
42Someapplicationsreportusing GMRESto approximatelysolvesystemswith over 100,000equations
and unknowns.
Ch. 14: Numerical Dynamic Programming in Economics 669

their numerical experiments Saad and Schultz (1986) found that m E [5, 20] seemed
to work well, yielding significant speed-ups over other approximate solution methods.
The fact that GMRES is considered to be an "industrial strength" equation solver in
numerical chemistry and physics suggests that this algorithm could have substantial
promise in economic problems for use as a subroutine to approximately solve the
policy valuation steps in discrete MDPs with very large numbers of states.

4.3. Continuous finite horizon MDPs

Recall from Section 2.1 that continuous MDPs have state variables that can assume
a continuum of possible values. For concreteness, we will assume here that the state
space S is an uncountable subset of the ds-dimensional Euclidean space R d~. As
we noted in Section 2.1, there two main subclasses of continuous MDPs: 1) discrete
decision processes (DDP's) for which the action set A is finite, and 2) continuous
decision processes (CDP's) for which the constraint set A(s) is a compact, convex
subset of the da-dimensional Euclidean space R a~ . We will see that there are important
differences in the solution algorithms and associated computational complexity bounds
for CDP's and DDP's. In particular, we show that there are randomized algorithms
that succeed in breaking the curse of dimensionality of approximating the solution to
DDP's whereas no deterministic or random algorithm succeeds in breaking the curse
of dimensionality for CDP's, at least on a worst case basis.

4.3.1. Discrete approximation methods

•Discrete approximation methods compute the value functions ~ and decision rules
st, t = 1 , . . . , T, at a finite set of points in the state space S and constraint sets A(s).
We will refer to a finite subset of points of S as a grid, and denote it by ( s l , . . . , SN}.
Note that this definition does not presume that the points in the grid need to be evenly
spaced. There are many variants of discrete approximation, but the general structure
can be described as follows. One first chooses grids for the state and action spaces,
and then performs backward induction for a finite MDP problem defined on these
grids. Thus, discretization amounts to replacing the continuous-state version of the
backward induction step
v (8) =

-- max [
u(s,a) +/3 / I
Vt+l(s')p(ds' ] s,a) , 8 C S, (4.34)

by a discretized equivalent

-= max u(8, a) + /3~_Vt+l(sk)pN(Sk l S, a) , S E {Sl,...,SN}. (4.35)

aEA(s) k=l
670 J. Rust

We can represent the recursion (4.35) more compactly as successive applications of

F i r starting from the 0 function:

=FNT-t(0), t=0,...,T. (4.36)

Note there is no particular reason to assume that the number of points N making
up the grid of S is the same as the number of grid points making up the grids of
the constraint sets A(s). Also, there is no reason why one can't choose different
grids for each constraint set A(si), i = 1,..., N. Indeed, it is not even necessary
to discretize the constraint sets if continuous optimization methods such as gradient
hill climbing algorithms, the N e l d e r - M e a d (1965) polytope algorithm, or stochastic
search algorithms such as simulated annealing are used to solve the maximization
problem in (4.35). 43 The key restriction is that the state space is discretized, which
implies that discrete approximation methods amount to a replacement of the true
Bellman operator F : B(S) --+ 13(S) (where 13(S) denotes the infinite dimensional
Banach space of measurable, bounded functions), by an approximate "discretized"
Bellman operator F N " R N --+ -RN but otherwise the backward induction process is
identical. However even though the discrete solution V = ( V 0 , . . . , % ) consists of
T + 1 vectors in _RN, we can use these vectors as "data" for a variety of interpolation
or "smoothing" methods to produce continuous solutions defined over the entire state
space S. For example, in our plots of the numerical solutions to the automobile
replacement problem in Fig. 14.1 we used simple linear interpolations of successive
values of the IS] × 1 vector V. In multidimensional problems other more elaborate
interpolation schemes can be employed such as cubic splines (which guarantee that
the interpolated function is twice continuously differentiable at the "knot points"
{sl,.. •, SN}), or various versions of local polynomial fits such multivariate Hermite
polynomials and tensor product splines. The details of these various interpolation
and approximation methods will be presented in our survey of smooth approximation
methods in the next subsection.
All of the discrete approximation methods that we will be presenting in this sub-
section have discretized transition probabilities pN(S k I s, a) that are nice functions
(i.e. continuous or continuously differentiable) of s. This is significant since it im-
plies that the the approximate Bellman operator ~ N has a very important property: it
is self-approximating, i.e. we can evaluate FN(Vt) at any point s in the continuous

43To keep the length of this chapter within bounds, we do not survey the wide range of algorithms
available for solving the embedded constrained optimization problems required to evaluate (4.35). We refer
the reader to Gill, Murray and Wright (1985) for an excellent survey of constrained and unconstrained
optimization methods. Hwang (1980) and Solis and Wets (1981) are good starting points for the theory
of and practical experience with randomized maximization methods. Geman and Hwang (1986), Goffe,
Ferrier and Rogers (1992) and Bertsekas and Tsitsiklis (1993) are good starting points for the theory of
and practical experience with simulated annealing methods.
Ch. 14." Numerical Dynamic Programming itt Economics 671

state space S and not just at the finite set of grid points { S 1 , . . . , SN}. Furthermore
Vt(s) = F N ( ~ + 1 ) ( s ) is a maximum of continuous functions of s and is therefore
a well-defined continuous function of s. The self-approximating property leads to a
dual interpretation of FN: we can regard it as a contraction mapping on R N when
we consider its restriction to the finite grid {sl,. • •, sly} or we can regard it as a
contraction mapping directly on the infinite-dimensional space C ( S ) . This simplifies
computation since we don't need to employ any auxiliary interpolation, approximation
or smoothing methods to evaluate F N ( W ) at any point s E S, and it also makes it
easier to analyze the convergence of these methods as N --+ oo. However the most
important reason for emphasizing the self-approximating property is that under cer-
tain circumstances it allows us to break the curse of dimensionality of approximating
the value functions to certain MDPs. More specifically, we will show that backward
induction using a random Bellman operator FN breaks the curse of dimensionality
for the class of DDP's, i.e. MDPs with finite action sets A.
We now provide a brief overview of four main approaches to discretization. There
are two key problems involved in developing an effective discretization procedure:
1) how to choose the N grid points { s l , . . . , SN} at which to evaluate Vt+l in the
summation in (4.35), and 2) how to construct a discretized estimate PN of the con-
tinuous transition probability p. It turns out that the first problem is the most im-
portant, since the methods for estimating p typically follow naturally once we have
chosen a grid. We now present four different strategies for selecting the grid points
{ s l , . . . ,sN}: 1) uniform grids, 2) quadrature grids, 3) random grids, and 4) "low
discrepancy" grids. The first two methods are subject to the curse of dimensionality
of multivariate integration discussed in Section 3: i.e. the minimal number of grid
points N(c,/3) required to approximate the Bellman operator F defined in (4.34) to
within a maximum error of (1 - / 3 ) e uniformly for all V satisfying tVl ~<K/(1 - / 3 )
is N = O ( l / ( e ( 1 -/3)2)d~) where d, is the dimension of the state vector s. Further-
more the worst case complexity bounds established by Chow and Tsitsiklis (1989)
show that general CDP's are subject to an inherent curse of dimensionality, at least
on a worst case basis using deterministic methods. However Rust (1995b) proved that
the third approach, randomly chosen grid points, succeeds in breaking the curse of
dimensionality for DDP's. The intuition behind the result is that essentially all the
work involved in solving a DDP is the numerical integrations required to compute
the conditional expectations of the value function. Since Monte Carlo integration can
be used to break the curse of dimensionality of the numerical integration subproblem,
it follows that it also breaks the curse of dimensionality of the DDP problem. The
final approach uses "low .discrepancy" grid points such as the Hammersley points
and Sobol' points. As we discussed at the end of Section 3.2, Wo2niakowski (1991)
proved that a quasi-Monte Carlo integration algorithm using the Hammersley points
succeeds in breaking the curse of dimensionality associated with the multivariate in-
tegration problem on an average case basis. We conjecture that these grid points may
672 Z Rust

also break the curse of dimensionality for the subclass of DDP's on an average case
In order to establish the convergence properties of the various discretization proce-
dures presented below we need to impose some additional regularity conditions on the
state space S and the functions {u, p}. The following assumptions are stronger than
necessary to prove many of the results stated below, but we will impose them here in
order to simplify the proofs and unify the exposition. We believe that the assumption
that the state and control spaces are subsets of the ds and rid-dimensional unit cubes is
without significant loss of generality. The key restriction is that S and A are compact
sets. If we have compactness then we can always do a change of coordinates to map
these spaces into compact subsets of the hypercube.

DEFINITION 4.1. Recall the formal definition of information based complexity in Sec-
tion 3.2. Let A(u, p) = V denote the (finite or infinite horizon) MDP problem given by
the dynamic programming recursion equations in Section 2.2. The class of admissible
problem elements F = {(u, p)} for the MDP problem is given by:
(A1) S = [0, lids; A(s) = [0, l]d% Vs E S.
(A2) p(ds ~ I s, a) has a continuous and uniformly bounded density with respect to
Lebesgue measure on S for each a E A(s) and s E S.
(A3) u(s, a) is jointly Lipschitz continuous in (s, a) with Lipschitz bound/~u .44
(A4) p(s ~ I s, a) is a jointly Lipschitz continuous function of (s', s, a) with Lipschitz
bound Kp.
(A5) The mapping s --+ A(s) is a Lipschitz continuous correspondence. 45
We have assumed for convenience that the Lipschitz constants Ku and Kp are
independent of the problem dimension d. There are many problems for which this
may not be a good assumption. The results that follow will also go through in a more
generalized framework where Ku and Kp are allowed to be polynomially increasing
functions of d. However Rust's (1995b) result on the ability of randomization to break
the curse of dimensionality breaks down if K~ or Kp increase exponentially fast in d.
Thus, (A3) and (A4) are important assumptions that must be carefully verified in
practical applications.

Uniform grid points. The obvious way to discretize a continuous state space is
to use a uniform grid, consisting of equi-spaced points in S and A. Specifically, we
partition the ds-dimensional cube [0, 1]d" into equal subcubes of length h on each side.
Assuming that 1/h is an integer, this results in a total of N = (l/h) d~ subcubes. In
to obtain arbitrarily accurate approximations, it should be clear that we will need to

44Lipschitz continuity means that lu(s, d) - u ( s ' , d')l ~ g ~ l ( s , d) - (s', d/)t, for all feasible pairs
(s, d) and (s', d').
45This means that for any s, s / E S and any a I E A(s t) there exists some a E A(s) such that
Ch. 14: NumericalDynamicProgrammingin Economics 673

have h --+ 0. It follows that the simple uniform discretization is subject to the curse
of dimensionality since the number of grid points, and hence the amount of work
required to solve the approximate discrete MDP problem, increases exponentially
in the dimensions ds and da. For this reason, the uniform discretization has been
traditionally been regarded as a "naive" approach to numerical approximation of MDP
problems. Surprisingly, Chow and Tsitsiklis (1991) showed that the simple uniform
discretization is about the best we can do in the sense that a multigrid algorithm using
a sequence of uniform grids for S and A(s) nearly attains the lower bound on the
worst case complexity of the MDP problem. Since their result was proven for the
case of infinite horizon MDPs we will defer further discussion of this paradox until
Section 4.4. Here we will simply illustrate how they used a uniform grid to construct
an approximate Bellman operator/~h that converges t o / " as h -+ 0.
Let Sh denote the partition of S induced by the uniform partition of the unit cube,
[0, 1] ds . Similarly let Ah denote the partition induced by the discretization of the action
space A = [0, 1] aa. Thus, Sh consists of N~ = (1/h) a~ equal subcubes of length h on
each side, and Ah consists of N~ = (l/h) d~ equal subcubes of length h on each side.
Let sk denote an arbitrary element (grid point) in the kth partition element of Sh and
let k(s) denote the index of the partition element of Sh that contains a given point
s C 6;. Similarly, let ak denote an arbitrary element (grid point) in the kth partition
element of Ah. The discretized utility and transition probability are defined by:

Uh(S,a) =u(sk(~),a),

I a) (4.37)
ph(s' I s,a)= fp(sk(~,)lsk(~),a)ds,,

where the normalization of the second equation insures that Ph is a well defined tran-
sition probability density on S. Note that in many cases we will need to do numerical
integration to compute the normalizing factors in the denominator of the discretized
transition probability Ph as can be seen from formula (4.37). In practice the required
numerical integration can be carried out by any numerical integration method such as
the quadrature method to be described next. Note also that (4.37) defines uh and Ph
as discrete step function approximations to u and p. This is primarily for notational
convenience: in practice one would probably want to use spline interpolations so that
Uh and Ph are also Lipschitz continuous functions of (s ~, s, a). Given these objects~
we can define the discretized Bellman operator by

max [uh(s, a k ) + f l / V ( s ~ ) p h ( J ] s , a ~ ) d J J
= ak, ...... No

= max uh(s, ak) + fl V(sk)ph(Sk I Sk(s), ak) , (4.38)

ak~ ]e=l~...,aNa N k=l
674 .I. R u s t

where the second equality holds whenever V is a step function taking value V(sk)
in partition element k. The following theorem is the basic consistency result for the
uniform discretization procedure, adapted from Theorem 3.1 of Chow and Tsitsiktis

THEOREM 4.3. There exist constants K1 and t£2 such that Jbr all h sufficiently small
and all V ~ t3 we have:

lit(v) - +/3K211Vll)h. (4.39)

Using Lemma 4.2, we need to choose h small enough so that the right hand side
of inequality (4.39) is less than (1 - /3)e uniformly for all V satisfying ]]Vll ~<
K/(1 -/3). It is easy to see that choosing h proportional to (1 - fl)2e is sufficient
to guarantee that this is satisfied. This implies that the discretized MDP requires a
total of IS I = O(1/((1 - fl)zc)~z~) grid points. A similar analysis shows that ]A] =
O(1/((1 -/3)2)e) a° grid points are required in order to guarantee that the discrete
maximizations in the approximate Bellman operator Fh(V) are within (1 - / 3 ) e of
F(V) uniformly for all V satisfying ]]V]] ~ K/(1 -/3). Recalling from Section 4.1
that O(TtA ]IS] 2) operations are required to solve a discrete, finite horizon MDP, it
follows that the uniform discretizationrequ~es O(T/((1 - fl)ze)(zas+a=)) operations
to compute an approximate solution V = (170,..., VT) that is uniformly within e of
the true sequence of value functions. It follows that discrete approximation methods
using uniform grids are subject to the curse of dimensionality.

Quadrature grids. Tauchen and Hussey (1991) suggested the use of quadrature ab-
scissa as a good set of grid points for discrete approximation of integral equations
arising in linear rational expectations models. Their approach can be easily translated
to provide a discrete approximation method for continuous MDPs. The motivation
for this approach is the use of quadrature methods for approximating the integral of
a function f : S --+/~ with respect to a weighting function p : S --+/L In the unidi-
mensional case, S = [0, 1], quadrature involves the choice of N quadrature abscissa
{ S l , . . . , su} and corresponding weights { W l , . . . , WN} such that

j f(s)p(s) ds ~ ~

f(sk)'wk. (4.40)

The weighting tunction p will typically be a probability density function, although in

some cases it can be interpreted as an "importance sampling" weighting reflecting the
analyst's a priori knowledge about the parts of the state space that are most important
for calculating the integral. In the special case of Gaussian quadrature, { S l , . . . , SN}
and {Wl,... ,WN} are determined by requiring that Eq. (4.40) to be exact for all
Ch. 14: Numerical Dynamic Programming in Economics 675

polynomials of degree less than or equal to 2 N - 1. The resulting grid points and
weights depend only on the weighting function p and not the integrand f . Notice that
the weights { w l , . . . , W N } generally do not satisfy w~ = p(sk), although they are
always guaranteed to be positive. Applying Gaussian quadrature to approximate the
conditional expectation of the value function V with respect to a weighting function
p we obtain

v<)p(s' I s, a)ds'= f V(s') ) p(s')ds'

J Pt )

~- ~-' V(sk)P(skl_s,a
" p(sk) ) wk. (4.41)

The quadrature weights and abscissa allow us to define an N - s t a t e Markov chain

with state space consisting of the quadrature abscissa { s l , . • •, SN} with transition
probability PN defined probability by a simple normalization:

p(sk I sj,a)wk/p(sk) j,k = 1,...,N. (4.42)

pN(Sk l sj,a) = N

Using PN we can define a discretized Bellman operator F N " C(S) -+ C(s) by:

-FN(V)(s) = max u(s,a) +13 V(s~)pN(sk I s,a) , (4.43)

aEAN(.S) k=l

where AN(S) denotes a finite choice set with N points, a discretized version of A(s)
using the same procedure defined for the uniform discretization procedure above. 46
Notice that since FN(V)(s) is the maximum of a sum of continuous functions of s,
it has the self-approximating property that we discussed earlier. Adapting Tauchen
and Hussey's theorem 4.2 yields the following error bound on the deviation between
F ( V ) and F N ( V ) :

THEOREM 4.4. Suppose S = [0, 1]. There exist constants Kt and 1422such that for all
V C C(S;) and all N sufficiently large we have:

Ilr(v) _ A
,,rN(V)I[ ~< (4.44)

46Of course there is no reason beyond notational simplicity to require that tile number of points N
used for the quadrature abscissa equals the number of points for the choice set discretization as long as
both tend to infinity and the maximum grid size of the discretization of A(s) tends to zero as N --+ oc.
676 J. Rust

Note that this is the same basic bound as obtained in the case of uniform discretiza-
tion in Theorem 4.1 with inverse of the number of quadrature abscissa 1IN playing
the same role as the grid size parameter h. Thus, just as in the case of uniform dis-
cretization, the quadrature approach produces a consistent estimate of the true value
function V as N -4 e~.
Note that although Gaussian quadrature possesses several optimum properties
[Davis and Rabinowitz (1975)] the e-complexity of the quadrature algorithm is the
same as uniform discretization since quadrature is a deterministic integration algo-
rithm and we saw from Section 3.2 that all deterministic integration algorithms are
subject to the curse of dimensionality. We can see this more explicitly by consider-
ing the product rule formula for numerical quadrature of multidimensional functions
f : [0, 1]a - 4 R:

.N N d
/ ' f ( s ) ds~- ~ ... ~fi-~f(sk~,...,s~d) H w k ~, (4.45)
kl=l kd=l i=1

where the {sk~ } and {wk~ } are the same quadrature abscissa and weights as in the one-
dimensional case. Clearly the amount of cpu time required to evaluate a d-dimensional
product quadrature rule is proportional to N d, even though the approximation error for
f ~ C[0, 1] d still decreases at rate inversely proportional to the number of quadrature
abscissa used in each dimension, i.e. 1/N. It follows that in the multivariate case the
amount of computer time required by quadrature methods to approximate an integral
with a maximum error of e on a worst case basis is of order O(1/ed~), the same as
for the simple uniform discretization method described above. Thus, while the use of
quadrature abscissa may be relatively efficient for low-dimensional MDP problems,
quadrature cannot break the curse of dimensionality of the continuous MDP problem,
at least on a worst case basis. If we calculate the number of operations required to
guarantee that backward induction using quadrature grids produces approximate value
functions that are uniformly within e of the true solution V, we find that the amount
of work required is of order O(T/((1 fl)2~)(2d~+do)), which is same order of effort
required by backward induction using a simple uniform discretization. The Chow and
Tsitsiklis (1989) complexity bound presented in Section 4.4 shows that this curse of
dimensionality is an inherent feature of the MDP problem that can't be circumvented
by any choice of deterministic solution algorithm. In order to circumvent the curse
of dimensionality we must either focus on a subclass of MDP problems with further
structure, or we must adopt a different notion of computational complexity such as
randomized complexity or average case complexity.

Random grid points. In Section 3.2 we discussed the use of randomization as a way
to break the curse of dimensionality associated with approximate solution of various
mathematical problems. Certain problems such as multivariate integration that axe
Ch. 14: Numerical Dynamic Programming in Economics" 677

intractable when deterministic algorithms are used become tractable when random
algorithms are allowed. Of course the use of random algorithms does not come without
cost: we can't guarantee that the approximate solution is within e of the true solution
with probability 1 but only with probability arbitrarily close to 1. Also randomization
does not always succeed in breaking the curse of dimensionality: Nemirovsky and
Yudin (1978) showed that the general nonconvex multivariate optimization problem is
intractable whether or not random or deterministic algorithms are used. This implies
that randomization cannot succeed in breaking the curse of dimensionality of the
general M D P problem with continuous action space A since it includes the general
nonconvex multivariate optimization problem as a special case when fl = 0 and
IS] = 1. We record this result as

THEOREM 4.5. Randomization cannot succeed in breaking the curse of dimensionality

of the class of all continuous MDP problems, i.e. a lower bound on the computational
complexity is given by:

compW°r-ran(G/3, da, ds) = O

(1) [(1 - fl)2e]d= " (4.46)

However Rust (1995b) showed that randomization does succeed in breaking the
curse of dimensionality of the subclass of DDP problems given in Definition 2.2.
The intuition underlying the result is fairly simple: a DDP has a finite number IAt of
possible actions but a continuous multidimensional state space S = [0, 1] d. Therefore
essentially all the work involved in approximating the Bellman operator _r'(V) is
the multivariate integration problem to compute f V(s')p(s' I s, a)ds' at each pair
(s, a). However we know from Section 3.2 that randomization succeeds in breaking
the dimensionality of the multivariate integration, so it stands to reason that it can
also break the curse of dimensionality of the DDP problem. Proving this result is not
quite as simple as this heuristic argument would indicate since the Bellman operator
effectively involves the evaluation of infinitely many multidimensional integrals for
each possible conditioning pair (s, a). The problem can be solved by introducing a
random Bellman operator I N ( V ) defined by

u(s,a) + ~ ~V(gi)PN(gi 18,a)

-FN(V)(s) - max
aEA(s) [ ,N i=1 1
, (4.47)

where ( g l , . . . , gN) are N IID random uniform draws from the d-dimensional hyper-
cube [0, 1] d, and the transition probability pN over this grid is defined by:

I j, a) i,j = I , . . . , N . (4.48)
pN(d~lgj,a) : N - ,
~ k = l p ( s i I gj, a)
678 J. Rust

The resulting discrete approximation algorithm, Vt = -F~-t (0), t = 0 , . . . , T, can be

interpreted as simple backward induction over a randomly chosen set of grid points in
[0, 1]d. Notice that the random Bellman operator is a self-approximating operator, i.e.
we can use formula (4.47) to evaluate -FN (V)(s) at any point s E S rather than just at
the set of random grid points (Sl, •. •, su). This self-approximating property is the key
to the proof of the convergence of the random Bellman operator. Rust (1995b) proved
that if the transition probability p(s' ] s, a) is a sufficiently smooth function of (s', s)
(i.e. Lipschitz continuous), then the sample average 1IN ~i=IN v(gi)pN(gi I s , a)
will not only be close to its expectation f V(s')p(s' t s, a)ds' at any specific pair
(s, a) c S x A, it will be close uniformly for all (s, a) E S x A. This can be formalized
in the following error bound established in Rust (1995b):

THEOREM 4.6. Suppose S = [0, 1]d and (u,p) satisfy the Lipschitz conditions (A3)
and (A4). Then for each N >~ 1 the expected error in the random Bellman operator
satisfies the following uniform bound:

E {II N(v)- F(v)II} (1 - / 3 ) , / N '

where Kp is the Lipschitz bound on p and "/(d) is a bounding constant given by:


and C is an absolute constant independent of u, p, V, /3 or d.

Inequality (4.49) implies that randomization breaks the curse of dimensionality for
the subclass of DDP problems because the expected error in the random Bellman
operator decreases at rate 1/,¢/N independent of the dimension d of the state space
and the constant of proportionality 3'(d) grows linearly rather than exponentially in
d. Using Lemma 4.1 we can insure that the expected error in the random sequence
of value functions V = (~'o,. • •, VT) generated by backward induction using/n N will
be within e of the true sequence V = (V0,..., VT) provided we choose N ) N(c,/3)
where N(e,/3) is given by

(7(d)IAIKpK~ 2
- -

Since the complexity of solving the DDP with N states and IAI actions is O(TIAIN2),
we obtain the upper bound on complexity of Rust (1995b):
Ch. 14: Numerical Dynamic Programming in Economics 679

THEOREM 4.7. Randomization breaks the curse of dimensionality of solving DDP's:

i.e. the worst case complexity of the class of randomized algorithms for computing
an e-approximation to V = (Vo,..., liT) in DDP problems satisfying ( a l ) , . . . , (A4)
is given by:

compW°r-ran(e,/3, d) -- O \ (1 - ¢3)se4 / " (4.52)

Rust's algorithm has not yet been programmed and evaluated in actual applications,
so we do not know how well it works in practice. The fact that simple backward
recursion using the random Bellman operator has the nice theoretical property of
breaking the curse of dimensionality does not necessarily imply that the method
will outperform deterministic methods on low-dimensional problems in much the
same way that deterministic integration algorithms such as quadrature systematically
outperform crude Monte Carlo integration in low dimensional problems. However
Monte Carlo integration methods have been successfully applied in a recent study by
Keane and Wolpin (1994), although their method differs in many respects from the
random Bellman operator approach described above. In particular, their method does
not have the self-approximating feature of the random Bellman operator, so auxiliary
interpolation routines must be employed to generate estimates of the continuous value
function for points s that do not lie on a preassigned grid of the state space S. Since this
approach requires the use of smooth approximation methods, we defer its presentation
until Section 4.3.2.

"Low discrepancy" grid points. We conclude our survey of discrete approximation

methods by suggesting that certain deterministically generated sets of grid points
known as "low discrepancy sets" may be very effective for reducing the compu-
tational burden of the multivariate numerical integration subproblem of the general
MDP problem. Recent research has shown that various low discrepancy sets such as
the Hammersley, Halton, and Sobol' points possess certain optimality properties for
numerical integration problems [for definitions of these sets, see Neiderreiter (1992)].
For example, Section 3.2 summarized Wo2niakowski's (1991) result that the sample
average 1/N }-]~N_I f(si) where ( s l , . . . , 8N) a r e the first N Hammersley points is a
nearly optimal algorithm for computing an integral f[0,1]e f(s) ds in the sense that the
average cost of this algorithm is close to the lower bound on the average complexity of
numerical integration using a Gaussian prior over the space F of possible integrands.
We can get some insight into why these particular deterministic sequences seem to
perform so well in numerical integration from an inequality Neiderreiter (1992) refers
to as the Koksma Hlwaka inequality:

EN f ( s i ) - f f ( s ) ~ ( d s ) ~< V(f)D*N(S,,..., 8N), (4.53)

680 ,L Rust

where A is Lebesgue measure on [0, 1]a, V ( f ) is the total variation in f in the sense
of Hardy and Krause (see p. 19 of Neiderreiter for a definition), and D~v is the

D*N(Sl,...,SN) ~- sup IAN(B) -- A ( B ) ] , (4.54)


where 13 is the class of (open) suborthants of [0, 1]d, (B = {[0, s) d C [0, 1]d[s C
[0, lid}) and AN is the empirical CDF corresponding to the points ( s t , . . . , sN). The
Koksma-Hlwaka inequality suggests that by choosing grid points ( s l , . . . , SN) that
make the discrepancy D ~ ( S l , . . . , 8N) small we will be able to obtain very accurate
estimates of the integral of functions f whose total variation V(f) is not too large. An
interesting literature on discrepancy bounds [surveyed in Neiderreiter (1992)] provides
upper and lower bounds on the rate of decrease of certain "low discrepancy" point
sets such as the Hammersley, Halton, and Sobol' points. For example Roth's (1954)
lower bound on the discrepancy of any set of points (s~,..., SN) in [0, 1]d is given

D*N(Sl,...,sN) >~K(d)(1,_og_N5 (d-l)~ 2

where K(d) is a universal constant that only depends on the dimension of the hyper-
cube, d. An upper bound for the N-element Hammersley point set P = { s ~ , . . . , ShN}
is given by:

~/(d)(log N) a-'
D*N(P ) <~ + O ( N - ' (log N)a-z).

Obviously for this class, the Hammersley points do much better than randolnly se-
lected points since the rate of decrease in the expected error in the latter is at the
slower rate O(1/v/N). A recent study by Paskov (1994) compared the accuracy of
deterministic integration using deterministic low discrepancy grids such as the Halton
and Sobol' points and standard Monte Carlo in high dimensional (d = 360) integra-
tion problems arising in finance applications. For these problems the deterministic
algorithms provided much more accurate estimates of the integral in less cpu time.
This suggests that these points might be better choices than randomly chosen grid
p o i n t s ( g l , - • • , 8 N ) in the definition of the random Bellman operator in Eq. (4.47).
That fact that these deterministic algorithms outperform Monte Carlo integration
for certain classes of functions F does not contradict the complexity bounds pre-
sented in Section 3.2. The deterministic low-discrepancy integration algorithms are
indeed subject to a curse of dimensionality, but on a worst case basis. However the
theoretical work of Wo~niakowski and Paskov shows that low discrepancy methods
break the curse of dimensionality on an average case basis. Paskov's numerical results
Ch. 14: Numerical Dynamic Programming in Economics 681

are consistent with the average case complexity bounds based on prior distributions
which are concentrated on a subclass of problem elements with stronger regularity
conditions than considered under the worst case complexity measure. Although Monte
Carlo integration does a good job in approximating any measurable integrand with
finite variance, certain deterministic integration algorithms may be more effective for
certain restricted subclasses of functions that have more smoothness. For example,
the Koksma-Hlwaka inequality suggests that low discrepancy point sets such as the
Halton or Sobol' points will be effective in integrating the set of functions that have
small Hardy-Krause variations V ( f ) , although they generally won't do a good job
of integrating arbitrary continuous functions in high dimensional spaces since V ( f )
can be shown to increase exponentially fast as the dimension of the space increases,
at least for functions in certain subsets of the space C[0, 1]d of continuous functions
on the d-dimensional hypercube.
An important unanswered question is whether an analog of Paskov's and
WoZniakowski's results extend to certain classes of MDPs, i.e. are certain classes
of MDP problems tractable on an average case basis? We conjecture that the subclass
of DDP problems is tractable on an average case basis. As we noted at the end of
Section 3 the difficulty in applying an average case analysis to the DDP problem is
the necessity of specifying a reasonable prior over the space of admissible transi-
tion probabilities. The typical prior used in multivariate integration problems, folded
Wiener sheet measure, does not ensure that the transition probabilities are nonnegative
and integrate to 1.

4.3.2. Smooth approximation methods

Smooth approximation methods differ from discrete approximation methods by ap-

proximating the value function V = (V0,..., VT) or decision rule ct = (cx0,..., c~T)
by smooth functions of the state variable s rather than over a finite grid of points. For
example many methods parameterize the value function or policy function in terms of
a vector 0 C R k of unknown parameters, choosing 0" so that the resulting estimates of
V~ or c~0 are as close as possible to the true solutions V and c~. Smooth approximation
methods can be viewed from an abstract perspective as selecting estimates of V or &
from a finite-dimensional submanifold of the infinite dimensional space of value or
policy functions. We shall see that the differences in various approximation methods
are a result of different ways of paraln~erizing these submanifolds and of different
metrics for locating particular elements V and & that are "closest" to the true solution
V and c~.
In order to understand the general structure of smooth approximation methods and
appreciate the role that approximation and interpolation methods play in carrying out
the backward induction process, consider how one would compute the basic recursion
relation (2.6) determining the value function V~ at time t. Assume that we have already
computed a smoothed version of the time t + 1 value function so that we can quickly
682 .I. Rust
evaluate ~+1 (s) at any point s E 5;. Using Vt+l we can compute estimates of the time
t value function Vt at any finite collection of points { s ~ , . . . , sN} using the standard
backward induction formula of dynamic programming:

~--- ae.A(s~)max[ u ( s i , a ) + f l / V t + l ( s ' ) p ( d s ' t si, a)], i = 1,... , N. (4.55)


We use the notation l"(Vt+l)(si) to denote an approximation to the value of the true
Bellman operator F(~+~)(si) at s~ in view of the fact that we will typically not be
able to find analytic solutions to the integration and maximization problems in (4.55)
so we must rely on approximate numerical solutions to these problems. Since it is
quite costly to compute approximate solutions to the right hand side of (4.55) at
each of the N points { S l , . . . , SN}, we would like to choose _N as small as possible.
However in order to increase the accuracy of solutions to the numerical optimization
and integ~t~n problems the are involved in calculating the time t - 1 value Nnction
Vt--1 ~- -F(Vt) we need to evaluate ~ at as large a number of points s E 5; as
possible. Rather than evaluating Vt(s) at a large number of points s C 5; we will
consider approximation methods thatonly require us to evaluate ~ at a relatively
small number of points {Vt (s 1 ) , . . . , Vt (SN)} and use these points as "data" in order
to generate accurate "predicted" values ~ ( s ) at other points s E S at much lower
cost than evaluating Eq. (4.55) directly at each s E 5;. A general approach is to
approximate the value function by nonlinear least squares using a parametric family
Vo = (Vo,o,..., VT,O) that are smooth functions o f s C 5; and a ( T + 1)k × 1 vector of
unknown parameters 0 = (00,..., OT). Then our estimate of the time t value function
is ~ = Vt, 0 where 0"t is defined recursively by:

O~ = arg min aN(O) ~ INZ Vot (Si) -- F(Vot+I ) (si) 2 , (4.56)

OtERk i=1

where V0~+I is the least squares estimate of the time t + 1 value function Vt+l, The
degree to which the smooth approximation approach succeeds in providing accurate
approximations for a small number N depends on the degree of smoothness in the
true solution: much larger values of N will be required if we are unable to rule out
the possibility that Vt is highly oscillatory as opposed to the case where we know a
priori that Vt has certain monotonicity and concavity properties.
Notice that smooth approximation is similar to discrete approximation in that both
methods require specification of a "grid" { s l , . . . , SN}. It follows that the same issues
of optimal choice of grid points that arose in our review of discrete approximation
methods will also arise in our discussion of continuous approximation methods. In
Ch. 14: N u m e r i c a l D y n a m i c P r o g r a m m i n g in E c o n o m i c s 683

addition to tile choice of grid points (which is similar in many respects to the problem
of selecting "optimal design points" in a regression/prediction problem) there are a
number of other decisions that have to be made in order to implement a smooth
approximation method:
1. Which object should we parametrize: V, c~, or something else?
2. How should this object be parameterized: via a linear-in-parameters specification
such as a polynomial series approximation, or a nonlinear-in-parameters approxi-
mation such as a neural network, or by piecewise polynomial approximations such
as splines or Hermite polynomials?
3. How should 0 be determined: via nonlinear least squares fit, or via a projection
method such as Chebyshev interpolation or Galerkin's method, or some other (pos-
sibly randomized) procedure?
4. Which numerical optimization and integration methods should be used to compute
the right hand side of (4.55) at each of the grid points { s l , . . . , SN}?
One can see that we obtain a wide variety of algorithms depending on how we
answer each of these questions. We do not have the space here to provide an exhaustive
analysis of all the possible variations. Instead we will survey some of the leading
smooth approximation methods that have been used in practical applications, and
then conclude with some general observations on the relative efficiency and accuracy
of these methods. We will concentrate on providing intuitive presentations of methods
rather than on formal proofs of their convergence.

Piecewise linear interpolation. 47 This is perhaps the simplest smooth approximation

method, also referred to as multilinear interpolation. Versions of this approach have
been implemented in the computer package DYGAM by Dantzig et al. (1974). The
basic idea is to use simple piecewise linear interpolation of Vt in each coordinate of
the state vector s. Multilinear interpolation requires an underlying grid on the state
space defined by a cartesian product of unidimensional grids over each coordinate of
the vector s = ( s l , . . . , sa). Let the grid over the kth component of s be given by N
points 8k, 1 < 8 k , 2 < • • • "~ 8k,N, SO that the overall grid of the state space S contains
N d points. In order to estimate Vt(s) at an arbitrary point s E S, the multilinear
interpolation procedure locates the grid hypercube that contains s and successively
carries out linear interpolation over each coordinate of s, yielding an estimate of Vt(s)
that is a linear combination of the values of Vt at the vertices of the grid hypercube
containing s. For example, consider a two-dimensional MDP problem d = 2 and
suppose we want to evaluate Vt at the point s = (sl,s2). First we locate the grid
hypercube containing s: sl E ( S l , i , 81,i_}_1) and s2 C (s2,j, s2,j+l) for some indices i
and j between 1 and N. Define the weights wl and w2 by:
81 - - 8 t #
,tU 1 - -
81#+1 - - 81, i

47This section borrows heavily from the presentation in Johnson ¢t al. (1993).
684 J. Rust

8,2 -- 82, j
w2 - . (4.57)
82,j_1_1 -- 82, j

then the multilinear approximation Vt(8') is defined by:

~zt(8" ) = W2V2(8,2,j+1) + (1 - w2)vu(8"2,j), (4.58)

where the points V2(82,j) and vz(8"2,j+l) are linear interpolations over the first coor-
dinate Sl of s:

v2(8'2,j) = wlvds'l#+~, 8'2,j) + (~ - wl)v~(8~,l, 8'2,j),

V2(82,j-t-1) = W l V t ( 8 l , i + l , s2,j_t_l) @ (1 - l/)l)Vt(8i,1, s 2 , j + l ) . (4.59)

In a d-dimensional problem, one applies the above steps recursively, generating a total
of 2 d - 1 linear interpolations and function evaluations to produce the interpolated
value Vt(s). Since the amount of work required to perform all of these interpolations
increases exponentially in d it is clear that simple multilinear interpolation is subject
to the curse of dimensionality. The method also produces an interpolated value func-
tion with kinks on the edges of each of the grid hypercubes. In certain MDP problems
the kinks in Vt are not smoothed out by the integration operator f Vt(s')p(ds'] s, a)
which creates difficulties for nonlinear optimization algorithms that attempt to com-
pute the maximizing value of the action a at each grid point s. The main advantage
of linear interpolation, noted in Judd and Solnick (1994), is that in one dimensional
problems linear interpolation preserves monotonicity and concavity properties of V.
Unfortunately Judd and Solnick showed that these properties are not necessarily pre-
served by multilinear interpolation in higher dimensional problems.
The fact that multilinear interpolation is subject to the curse of dimensionality
is not just of academic interest: practical experience with the method demonstrates
that it is feasible only in relatively low-dimensional problems. In their numerical
application of the multilinear DP algorithm to a multidimensional stochastic water
reservoir problem, Johnson et al. (1993) found that while numerical calculation of
a reasonable approximation of V for a problem with T = 3 and d -- 2 took only
l0 seconds on an IBM 3090 supercomputer, solving a problem with d = 5 was
estimated to require 70 cpu hours. Thus, they concluded that "The effort required to
solve practical problems having nonlinear objectives with a dimension d of more than
3 to 5 with reasonable accuracy is generally prohibitive with the linear interpolation
method commonly employed" (p. 486).
Ch. 14: Numerical Dynamic Programming in Economics 685

Piecewise cubic interpolation. 4s Piecewise linear interpolation is an example of

a spline approximation, which in one dimension is simply a piecewise polynomial
approximation to a function f(s) (s c R) with boundary conditions at knot points
(Sl,. • •, SN) that ensure that the local polynomial approximations on each subinterval
(si, si+1) join up to have a specified degree of smoothness. A spline of order r is an
approximation to f that is (r - 2) times continuously differentiable and therefore a
polynomial of degree not greater than (r - 1) in each subinterval (si, Si+l). Thus, a
piecewise linear approximation is also known as a 2-spline. If the value function is
known to be twice continuously differentiable, then much better approximations can
be obtained using a 4-spline, i.e. piecewise cubic polynomials, since they produce an
approximation that is everywhere twice continuously differentiable. The extra smooth-
ness of cubic splines often result in much better approximations, allowing one to get
by with a smaller number of knot points than are required by linear splines to achieve
a specified degree of accuracy. In general if f has a bounded rth derivative, then it
can be approximated by an r-spline over a finite interval with N equally space knot
points with a maximum error of order O(1/N~). This result suggests that higher order
splines ought to be able to obtain the same error of approximation using a smaller
number of knot points N.
Multidimensional problems can be solved by forming tensor product splines. These
are simply products of one-dimensional splines in each of the coordinate variables.
For example, a d-dimensional cubic spline approximation to f(s) takes the form:

4 4 d

f(s) = Z "'" E e(k,(s) ..... kd(s)) ( i l ' ' ' " 'id) H ( S j - 8j,kd(s)) i j - I (4.60)
i~=1 id=l j=l

where k(s) = (kl ( s ) , . . . , ha(s)) is the index of the grid hypercube element containing
the point s (i.e. for each i = 1 , . . . , d we have si E (8i,ki,8i,ki+l) where si is
the ith component of s, and 8i, 1 < 8i, 2 < ... < 8i, N are the knot points on the
ith coordinate axis). The coefficients ck(~)(il,...,i~z) are determined by requiring
that f(s) interpolate the true function f(s) at each of the grid points and that the
first and second derivatives of f(s) match up along all of the edges of the grid
hypercubes so that the resulting function is twice continuously differentiable over
all s c S. However there are two extra degrees of freedom in each dimension due
to the fact that the N knots define only N - 1 intervals in each dimension. The
polynomial spline algorithm developed in Johnson et al. (1993) resolves this using a
construction due to De Boor (1978) that fits only N - 3 cubic polynomials in each
dimension. Since each cubic polynomial is defined by 4 coefficients, the total number
of coefficients ck(~) (il, • • • , id) in a multidimensional cubic spline with N knot points
on each coordinate axis is [4(N - 3)] d.

48This section borrows from the exposition of splines in Daniel (1976) and Johnson et al. (1993).
686 Z Rust

Johnson (1989) showed that the number of floating point operations required to
compute these coefficients is of order O(4dNa). After these coefficients have been
computed and stored, it takes an additional (4 a - 1) floating operations to compute an
interpolated estimate Vt (s) at any point s c 5'. Since this must be done at many points
3' E S in order to compute the numerical integrals at each of the N a grid points on
the right hand side of (4.55), it dominates the cpu requirements of the cubic spline DP
algorithm. Thus the method is clearly subject to the curse of dimensionality. However
even though evaluating the cubic spline in (4.55) requires approximately 2 a-1 times
more work than evaluation of the linear spline in (4.58), the increased accuracy of
a cubic spline approximation may allow one to use fewer knot points N in each
dimension to obtain an estimate of Vt with the same level of accuracy. For example
in their numerical comparisons of the linear and cubic spline methods Johnson et al.
(1993) found that solving a four dimensional stochastic reservoir test problem to
within an error tolerance of approximately 1% of the true solution required N = 13
knot points per dimension using linear spline interpolation but only N = 4 knot
points per dimension with cubic spline interpolation. The linear spline algorithm took
4,043 cpu seconds as opposed to only 26 cpu seconds for the cubic spline algorithm,
representing a speedup of 155 times. Part of the speedup is also due to the fact that
the cubic spline interpolation is everywhere twice continuously differentiable which
enabled the use of faster gradient optimization algorithms to compute the maximizing
value of a at each of the grid points, whereas the linear spline interpolation had kinks
that required the use of the slower non-gradient Nelder and Mead (1965) optimization
algorithm. They concluded that "In a case with a 250-fold reduction in CPU time,
this implies that a 10-fold reduction is associated with the use of a quasi-Newton
algorithm and a 25-fold reduction is associated with the increased accuracy of the
(cubic) spline" (p. 494).
These speedups provide fairly convincing evidence for the superiority of cubic
spline interpolation over simple linear interpolation, at least in the class of problems
considered. However a drawback of cubic spline interpolation is that it is not guar-
anteed to preserve concavity or monotonicity properties of the value function. Judd
and Solnick (1994) discuss a quadratic spline interpolation procedure of Schumache r
(1983) that enables one to impose monotonicity and concavity restrictions relatively
easily. They argue that these properties are quite important in a variety of economic
problems and their numerical examples demonstrate that the extra effort involved in
enforcing these constraints has a significant payoff in terms of enabling one to use a
smaller number of knot points to produce approximate solutions of a specified degree
of accuracy.

Chebyshev polynomial approximation. 49 Chebyshev polynomial approximation is

a special case of polynomial series approximation which is in turn a special case

49This section bon'owsheavily from papers by Judd (1990 and 1994) and Judd and Solnick (1994).
Ch. 14." Numerical Dynamic Programming in Economics 687

of the class of linear approximation algorithms discussed in Novak (1988). These

algorithms approximate the time t value function Vt(s) by a linear combina-
tion of the first k elements of an infinite sequence of polynomial basis functions
{pl(s),pZ(S),..~,pk(s),...} defined over all s E Sil o By specifying a particular set
of coefficients Ot = (Ot,1,...,Ot,k) we obtain an estimate Vg, of the time t value
function Vt given by


To implement this method we need to specify the basis functions and a method
for choosing the "best fitting" coefficient vector 0"t. A natural choice for the set
of basis functions are the "ordinary" polynomials p~ (s) = s i since the Weierstrass
approximation theorem shows that if S is compact, then the set of all linear com-
binations of the form (4.61) is dense in the space C(S) of all continuous functions
on S. A natural way to determine 0"t is via the nonlinear least squares approach in
Eq. (4.56). A problem with ordinary least squares estimation of 0"t using the ordi-
nary polynomial basis {/, s, s 2 , . . . } is that successive terms in the series approxi-
mation (4.61) become highly collinear as k gets large and this can create numeri-
cal problems (e.g. near-singular moment matrices) in process of computing the least
squares estimate of 0"t. For example, if the grid points {sl . . . . , su} are not well dis-
tributed over the entire domain S, the approximate value function VO, can display
explosive oscillations as k gets large, especially in regions of S where there are few
grid points.
These problems motivate the use of uniformly bounded, orthogonal polynomial
series approximations. There are (infinitely) many such orthogonal bases for C(S). In
addition, different orthogonal bases can be generated depending on how one defines
the inner product on elements of C(S). If # is a measure on S, then an inner product
can be constructed using # as a "weighting function":

(f, 9)~ ~ fs f(s)g(s)Iz(ds)" (4.62)

The Chebyshev polynomials {pi} arc an orthogonal collection of polynomials de-

fined on the domain S = [ - 1 , 1] where orthogonality is defined using the weight--

5°We use the term "basis" in a loose sense here. In particular we do not require the basis functions to
be orthogonal or linearly independent.
688 Z Rusl

ing function p,(ds) = d s / ~ . 51 The explicit formula for the {p/} is given by
pi(s) = c o s ( / c o s -1 (s)), but they can also be defined via the recursion:

Pi (s) = 2spi_ 1(s) - Pi-2 (s), (4.63)

with initial conditions po(s) = 1 and Pl (s) = s. By definition, the C h e b y s h e v poly-

nomials satisfy the continuous orthogonality relation

(Pi'PJ)" =
f , ~-i----~ ds = O, i ¢ j, (4.64)

h o w e v e r the key to C h e b y s h e v a p p r o x i m a t i o n is the fact that the {pi} also satisfy the
discrete orthogonality relation:

E p i ( s l k ) ; j ( s l )k = O, i ¢ j, (4.65)

where szk are the the zeros of Pk given by:

-- , l = 1,...,k. (4.66)

This result implies that the C h e b y s h e v zeros { s ~ , . . . , s~} can serve as a set of grid
points for interpolation of an arbitrary continuous function f . Using these grid points,
we can interpolate an arbitrary continuous function f by the function f k - 1 @ ) defined

fk-l(s) =-20~ + E ~+,pi(s), (4.67)

where the coefficients 0" = ( 0 1 , . . . , 0"k) are given by:

~ f(8 lk )pi_l(8 lk ). (4.68)

51The fact that the Chebyshev polynomials are only defined on the [-1, 1] interval does not limit
the applicability of this approach, since a function f(s) defined on an arbitrary interval [a, b] can be
transformed into an equivalent function on [-1, 1] via the change of variables s -+ 2(s - a)/(b - a) - 1.
Ch. 14."NumericalDynamic Programmingin Economics 689

The function ik--1 defined in Eq. (4.67) is called the degree k - 1 Chebyshev in-
terpolant of the function f. It is easy to see that the discrete orthogonality rela-
tion (4.65)implies that f k - ~ ( s ) agrees with f ( s ) at each of the Chebyshev zeros,
s E { 8 ~ , . . . , s~}, so that fk-1 is a polynomial of degree k - 1 that interpolates f at
these knot points. 52
The amount of work (i.e. number of additions and multiplications) involved in
computing the coefficients 0"t at each stage t using a degree k - 1 Chebyshev interpolant
is O(k2). Once these coefficients are computed, the effort required to evaluate fk at any
s C S is only O(k). Similar to cubic splines, the smoothness properties of Chebyshev
polynomials lead to the presumption that we can obtain good approximations for small
values of k: this has been verified in many numerical applications, although all of
the numerical examples that we are aware of are for infinite horizon problems so we
defer discussion of their numerical performance until Section 4.4.2.
The last remaining issue is whether one can use Chebyshev polynomials to inter-
polate continuous multidimensional functions f : S -+ R, where S C R d. We can do
this using tensor products of the one dimensional Chebyshev polynomials:

k k
i1=1 id=l
where the coefficients 0(il ..... id) can be computed using a d-fold extension of for-
mula (4.68). Besides the Chebyshev polynomials, other commonly used families of
orthogonal polynomials include the Legendre polynomials (defined on the interval
[ - 1 , 1] using the weight function #(ds) = ds), the Laguerre polynomials (defined
on the interval [0, ec) using the weight function #(ds) = se-Sds), and the Hermite
polynomials (defined on ( - e c , ec) using weight function #(ds) = exp{-s2/2}ds).
Besides the direct formula for 0"given by the Chebyshev interpolation formula (4.68)
or the indirect nonlinear least squares approach to estimating 0 in formula (4.56), the
0 coefficients can be determined using a broad class of algorithms that Judd (1992)
has referred to as projection methods. An example of a projection method is the
Galerkin method which determines the k x 1 vector 0"t as the solution to the system
of k nonlinear equations given by:

(VO,--F(V~,+I),Pi)u~ : O, i = 1,...,~. (4.70)

The Galerkin method is closely connected to least squares: VO~ l- -/v'(Vgt+

) is a "resid-
ual function", and the Galerkin method requires that the projections of this residual
52Note that the Chebyshev coefficient estimates 0 given in (4.68) are not the same as the least squares
estimates based on the k Chebyshev polynomials {P0.... ,Pk-l} and grid points {s~ ..... s~}.
690 z Rust

function on each of the k basis functions { P l , . . . , Pk} should be zero, i.e. the residual
function is required to be orthogonal to the linear space spanned by { P l , . . . ,pk}.
Unfortunately one can not literally carry out the Galerkin procedure since the inner
products in (4.70) involve integrals that must be computed numerically. A problem
with the Galerkin procedure relative to least squares or Chebyshev interpolation is that
it is not clear what to do if the system (4.70) has no solution or more than one solution.
The other problem is that all of these procedures are subject to a compound version of
the curse of dimensionality: not only do the number of basis functions k necessary to
attain an e-approximation increase exponentially fast in d, but the amount of computer
time required to find an e-approximation to a nonlinear system of k equations and
unknowns increases exponentially fast on a worst case basis [Sikorski (1984, 1985)].

Approximation by neural networks. The key problem with all of the spline and poly-
nomial series approximation methods discussed so far is the curse of dimensionality:
in each case the number of 0 coefficients k required to obtain an e-approximation to a
given function f increases exponentially fast in the dimension d of the state space S.
Recent results of Barron (1993) and Hornik, Stinchcombe, White and Auer (HSWA)
(1992) that show that neural networks can approximate functions of d variables on
a worst case basis without using an exponentially increasing number of coefficients.
This work suggests that we can break the curse of dimensionality of solving the MDP
problem by approximating value functions by single layer feedforward networks of
the form:

~,0(s) = ;~o + ~ a~e(~s + ~), (4.71)

where Ot = ( A 0 , . . . , A k , S l , . . . , d k , p t , . . . , p k ) has a total of N = ( d + 2)k + 1 com-

ponents (since the 5i are d x 1 vectors) and ~b : R ~ R is a sigmoidal "squashing
function" such as the logistic function, qS(z) = exp{z}/(1 + exp{:c}). Unfortunately
the general results of Novak (1988) and T W W (1988) on the computational com-
plexity of the approximation problem imply that neural networks cannot succeed in
breaking the curse of dimensionality associated with solving the dynamic program-
ming problem. The reason is that although neural nets are capable of achieving very
good approximations to functions using relatively few parameters, the Achilles heel is
the curse of dimensionality associated with solving the global minimization problem
necessary to find a set of neural network parameters 0"t such that the associated net-
work output function Vt, g best fits the true value function Vt. While there are certain
tricks that can reduce the computational burden of computing "Or,they do not succeed
in circumventing the curse of dimensionality of the global minimization problem. 53

53Barron showed that the approximation error in the neural network will continue to decrease at rate
l/v'~ if we optimize the neural net parameters sequentially starting with only one hidden unit and min-
Ch. 14: Numerical Dynamic Programming in Economics 691

Barton concluded that "We have avoided the effects of the curse of dimensionality in
terms of the accuracy of approximation but not in terms of computational complex-
ity" (p. 933). This is exactly what a number of investigators have found in practice:
neural nets lead to nonlinear least squares problems that have many local minima, and
the effort required to find a local minimizer that yields good approximations to the
value or policy function can be extremely burdensome. For example a recent study
by Coleman and Liu (1994) used a neural net with k = 2, 3 and 5 hidden units and a
logistic squashing function ¢ to approximate the solution to a dynamic programming
problem of a household's optimal consumption/savings decisions in the presence of
incomplete markets. Paraphrasing their results in our notation, they concluded that

For low values of k the method seems to perform quite poorly. For k = 2 the
approximation was uniformly above the true solutions, and for k = 3 the ap-
proximation exhibits an S-shaped pattern around the true solution. To address this
latter problem, for the second method we fixed 27 uniformly spaced grid points
and selected the neural net that came the closest to the true solution at all these
points. This method substantially improves the results, which leads to approxima-
tions which are about as good as the Chebyshev polynomial method. We wish to
point out, however, that fitting the Chebyshev polynomial was trivial, but fitting
the neural net was quite difficult: there evidently existed a variety of distinct local
minima that needed to be ruled out (p. 20).

Regression-based interpolation and Monte Carlo simulation methods'. In addition to

the curse of dimensionality associated with solving the multivariate function approx-
imation and optimization subproblems of an MDP problem, there is also a curse of
dimensionality associated with the multidimensional numerical integration subprob-
lem required to evaluate the Bellman operator F(V)(s) at any point s E S. As we
discussed in the introduction and in Section 3, randomized Monte Carlo integration
algorithms succeed in breaking the curse of dimensionality of the integration prob-
lem. This suggests that Monte Carlo simulation methods might be more effective than
deterministic methods such as multivariate quadrature for solving the integration sub-
problem in high-dimensional MDPs. Keane and Wolpin (1994) used this approach in
their algorithm for approximating solutions to finite horizon DDP's, i.e. MDPs with
finite choice sets A(s). The Keane-Wolpin algorithm computes the alternative-specific
value functions ½(s, a), t = 0 , . . . , T, defined by the recursion:

Vt(s, a) = u(s, a) +/3 [ ,max, [Vt+l (s', a')]p(ds' [ s, a), (4.72)

J a eA(s )

imizing over (A1,61,pl) and then successively adding additional hidden units 2, 3 , . . . , k treating the
parameter values for the previous hidden units as fixed. Due to the fact that the cpu-time required to find
an c-approximation to a global minimum of a smooth function increases exponentially in the number of
variables on a worst case basis, it follows that it is significantly easier to solve k individual (d + l)-
dimensional optimization problems than a single k(d + 1)-dimensional problem.
692 Z Rust

with the terminal condition VT(S, a) = u(s, a). The Vt(s, a) functions are related to
the usual value functions Vt(s) by the identity:

Vt(s) = max [Vt(s,a)]. (4.73)


To implement the Keane-Wolpin algorithm, one needs to specify a grid over the
state space S, say, { s l , . . . , su}. Keane and Wolpin used randomly generated grids,
although in principle they could also be deterministically chosen. At each point sj on
the grid, and for each action a E A(sj) one draws N realizations from the transition
density p(ds' I sj, @.54 We denote these realizations ( 8 1 j a , . . . , 8Uja) to emphasize
the fact that separate realizations are drawn for each conditioning pair (sj, a). Then
the Keane-Wolpin estimate of Vt(sj, a) is given by:

(sj, a) = '.(sj, a)

Z max [Vt+l(gija,,a')], j = 1 ... N. (4.74)

-}-N i=1 a'eA(gi4,~) ' '

Notice that Vt is only defined at the pre-assigned grid points { s l , . . . , SN} and not at
other points s C S. Since the realizations (slja,..., gNja) will generally not fall on
the pre-assigned grid points { s l , . . . , SN}, Keane and Wolpin fit a linear regression
using the N simulated values y _= ( ~ ( s , ) , . . . , ~(SN)) as an N × 1 dependent vari-
able and various functions of the components of the estimated expected value function
EVt+I(S, a) -~ f Vt+l(s')p(ds'ls , a) evaluated at the grid points {81,... , 8N} to
form the N x K matrix of regressors, X. The ordinary least squares coefficient esti-
mates ~) = (X'X)-~X'y were then used to generate estimates of Vt(s) at arbitrary
points s E S outside of the preassigned grid {sl,. •., sN}. In principle one could
use a variety of other non-parametric regression procedures (e.g. kernel estimation,
nearest neighbor estimation, regression trees etc.) or any of the sptine or polynomial
interpolation methods discussed above to generate smoothed estimates Vt(s) defined
at any s C S. Note that Eq. (4.74) implicitly assumes that some sort of interpola-
tion procedure has already been carried out on the stage t ÷ 1 value function Vt+l
so that it can be evaluated at all the random points gija, i = 1 , . . . , N, that were
drawn at the stage t grid point (sj, a). Although Keane and Wolpin have not derived
explicit error bounds or provided a formal proof of the convergence of their method,
they did perform extensive numerical tests of their algorithm and concluded that
"our approximation method ameliorates Bellman's 'curse of dimensionality' problem,
obtaining approximate solutions for problems with otherwise intractably large state

54There is no reason why the number of Monte Carlo draws must equal the number of grid points: the
only reason we have imposed this restriction is to simplify notation. In principle one can choose different
numbers of realizations Nj,a at each pair (sj, a).
Ch. 14: Numerical Dynamic Programming in Economics 693

space" (p. 35). Although their use of Monte Carlo integration does break the curse
of dimensionality of the integration subproblem, their algorithm does not avoid the
curse of dimensionality of the approximation subproblem due to the general results
of Novak (1988) and TWW (1988) which show that the worst case complexity of
multivariate function approximation increases exponentially in the problem dimension
d regardless of whether deterministic or randomized approximation methods are used.

Smooth approximation of the decision rule. All of the smooth approximation meth-
ods considered so far have focused on approximating the value function V. How-
ever in many problems the primary interest is in approximating the optimal de-
cision rule a. Given an estimate V of V we can compute the implied estimate
&(s) = (&0(s),..., &T(S)) of the optimal decision rule a(s) = (c~0(s) . . . . , aT(s))
at any point s E S via the dynamic programming recursion:

&t(s) = argaEA(s)max[u(s,a)+/3/Vt+l(s')p(ds' I s,a)], t= 0,... T-, 1. (4.75)

In practice &t(s) will differ from at(s) not only because Vt differs from V, but
also due to any additional approximation errors introduced by numerical solution
of the integration and constrained maximization subprobtems in Eq. (4.75). If we
only need to compute &t(s) at a small number of points s E S, the cost of finding
accurate approximate solutions to these subproblems might not be too burdensome.
However if we need to evaluate & at a large number of points in S, say for purposes
of stochastic simulation of the optimal decision rule, it will generally be too time-
consuming to compute &t(s) at many different points s E S. In these cases it makes
sense to incur the up-front cost of using one of the function approximation methods
discussed previously (Chebyshev polynomial approximation, piecewise polynomial
approximation, neural networks, etc.) to compute an approximate decision rule &t
given information IN -= (&(sl),..., ~(SN))
o n the optimal decisions computed over

a finite grid of points { s t , . . . , SN} in S. This results in a smooth approximation c~

to the optimal decision rule ct that can be quickly evaluated at any s E S.
However an alternative approach, first suggested by Smith (1991), is to develop
a solution procedure that approximates a directly, completely bypassing the need to
approximate V. Let ao(s) denote a parametric approximation to ct, i.e. a smooth
mapping 0 -+ A(s) depending on a h x 1 vector of parameters 0 which are chosen to
best approximate the true optimal decision rule a. For example ao could be given by
a polynomial series approximation or the output of a neural network. If c~ represents a
control variable such as consumption expenditures that can take on only nonnegative
values it is easy to define parameterizations that automatically enforce this constraint,

ao(s) = exp O~pi(s , (4.76)

694 Z Rust

where the {pi(s)} are some polynomial basis such as the Chebyshev polynomials,
etc. Other parameterizations such as splines, etc. can obviously be used as well. The
motivation for Smith's procedure is the static formulation of the dynamic optimization
introduced in Section 2:

V ( s ) = arg max E,~ /3tu ~, I so = s . (4.77)

~=(~0 ...... T)

We have largely avoided consideration of the direct static characterization of a in

this chapter in favor of the indirect dynamic programming characterization since
the static optimization problem is over the infinite-dimensional space of all deci-
sion rules and none of the standard static optimization procedures can be expected
to work particularly well over such large spaces. However the formulation suggests
that if there are flexible parameterizations 0 --+ c~0 that provide good approxima-
tions to a wide class of potential decision rules using a relatively small number of
parameters k, then we can approximate the optimal decision rule c~ by the paramet-
ric decision rule c~0 where 0 is the solution to the k-dimensional static optimization

0" = arg max V~ o (s), (4.78)


and V~ o is the value function corresponding to the decision rule c~0 = (s0,0, •. •, C~T,O):

V.o =- : s • (4.79)

For certain MDP problems, the optimal decision rule might be a relatively simple
function of s, so that relatively parsimonious parameterizations C~o with small num-
bers of parameters k might do a good job of approximating c~. In these cases Smith
refers to the approximation ~ as a simple rule of thumb.
The difficulty with this method as it has been stated so far is that we generally
do not know the form of the value function V~ o (s) corresponding to the decision
rule c~0. Calculation of V~ o (s) is a major source of the numerical burden of solving
the dynamic programming problem. However Monte Carlo integration can be used
to break the curse of dimensionality of approximating V~ o (s). Let {g~(0)} denote a
realization of the controlled stochastic process induced by the decision rule c~o. This
process can be generated recursively by

p(. I (4.8o)
Ch. 14." Numerical Dynamic Programming in Economics 695

where go(O) = s is the initial condition for the simulation. Let {gtm(0) I m =
1 , . . . , M } denote M independent simulations of this process. Then we can construct
an unbiased simulation estimate of V~ o (s) as follows:

1 M T
m=l t=0

Then we can define OM to be the solution to the following optimization problem

Ov = arg max V,~o(s). (4.82)


However this straightforward simulation approach is subject to a number of severe

problems. First, the simulated value function V'~o will generally not be a continuous
function of 0, complicating the solution to the maximization problem (4.82). Second,
we must re-simulate the controlled process in (4.80) each time we choose a new trial
value of 0 in the process of solving the maximization problem (4.82). This introduces
extra noise into the optimization creating a phenomenon known as "chattering" which
can prevent the simulation estimate 0"M from converging to its intended limit O as
M --+ co. The chattering problem can make it impossible to consistently estimate the
optimal decision rule c~ even if the parameterization of the decision rule c~0 (4.76) is
sufficiently flexible to enable it to provide arbitrarily accurate approximations to an
arbitrary decision rule as k --+ oo.
Smith (1991) showed that we can avoid both of these problems if the MDP is a
CDP of the following form: the state vector can be partitioned as s = (y, z) where
z is an exogenous state variable with Markov transition probability q(z' I z) and
y is an endogenous state variable with law of motion given by y' = r(y, e, z', z).
Given a stochastic realization of the exogenous state variable ( 5 1 , . . . , 2T) with initial
value zo = z, define the corresponding realization of the endogenous state variables
(~11( 0 ) , . . . , ~IT(O)) by the recursion

Y/.+I (0) -- r ( y t ( 0 ) , Ct(O), ff.t+l, Zt), (4.83)

where ~t(O) - c~t,o(~t(O), zt) and Yo = Y is the initial condition for the endogenous
state variable. Smith's algorithm begins by generating M independent realizations of
the exogenous state variables which remain fixed for all subsequent trial values of O.
However each time 0 changes we update the M corresponding realizations of the
696 J. Rust

endogenous state variables usin~the recursion (4.83). Using these simulations we can
define a Monte Carlo estimate V,~o (s) of V,~ o (s) by: 55

l M T
m=l /,=0

It is easy to verify that as long as the function r(y, c, z t, z) is a continuously differ-

entiable function of y and c that V,~0(s) will be a continuously differentiable function
of 0.
If the parameterization of the decision rule c~0 is ultimately dense in the space of
possible decision rules (i.e. for any possible decision rule c~ we have inf0 11c~0-c~ll --+ 0
as k --+ c~ where k is the dimension of 0), then it seems reasonable to conjecture
that Smith's theorem should imply that C~M -+ C~ provided that k and M tend
to infinity at the "right" rates, although to our knowledge this result has not yet
been formally established. However Smith compared his numerical solutions to the
analytical solution of a special case of the Brock-Mirman (t972) model. His numerical
solutions were based on linear and quadratic rules of thumb for c~ that depend on k -- 3
and k = 6 unknown parameters, respectively. He also considered a third rule of thumb
given by a simple one parameter "partial adjustment model" about the steady state
capital stock that would result if the technology shock zt were assumed to remain
indefinitely at its current level. Smith evaluated the quality of these three rules of
thumb in terms of the toss in discounted utility relative to the optimal decision rule
and in terms of how well the fitted rules approximated the optimal decision rule. He
found that:

from a welfare perspective, the three rules of thumb are nearly indistinguishable
from the optimal rule. For example, the welfare loss from using the optimal linear
rule rather than the truly optimal rule is equivalent to losing only 0.01% of per
period consumption. The corresponding losses for the optimal quadratic rule and
the "partial adjustment of the capital stock" rule are, respectively, 8.3 x 10-5% and
2.0 x 10-5%. The "partial adjustment of the capital stock" and the optimal quadratic
rule yield capital stock decisions that deviate only rarely by more than 0.5% from
optimal behavior. The results of this section show that ... parsimoniously param-
eterized rules of thumb can mimic optimal behavior very closely, according to a
variety of metrics. Indeed, a surprising finding is that a one-parameter family of
decision rules (the "partial adjustment of the capital stock" rule) outperforms a six
parameter family (the optimal quadratic rule) along all dimensions considered. The
success of the "partial capital stock adjustment" rule shows that rules of thumb

55In practice, Smith uses a procedure known as antithetic variates to reduce the variance in the Monte
Carlo estimates of Vc~o (s). Geweke in Chapter 15 of this Handbook discusses antithetic variates and other
variance reduction methods.
Ch. 14: Numerical Dynamic Programming in Economics 697

which incorporate some of the economic structure underlying the optimization

problem can perform better than "brute force" polynomial expansions (pp. 18-20).

Geweke, Slonim and Zarkin (t992) extended Smith's approach to DDP's. Although
they do not provide a proof of the convergence of their procedure, they did conduct
numerical tests of the method and reported favorable results.

Concluding observations. While many of the smooth approximation methods pre-

sented in this section appear very promising for the solution of small to medium scale
continuous MDP problems, especially in their use of Monte Carlo simulation to break
the curse of dimensionality of the high dimensional integration subproblems required
to compute V~o, the Achilles heel of all of these methods is the curse of dimen-
sionality of the embedded optimization and approximation problems. For example
Smith's method encounters a compound form of the curse of dimensionality since the
amount of computer time required to find an e-approximation to the global maximum
in (4.82) increases exponentially fast as k increases and the value of k required to
guarantee that c~o(s) provides an e-approximation to an arbitrary decision rule c~(s)
increases exponentially fast in the dimension d of the state variables increases. The
GSZ method avoids the curse of dimensionality of the optimization problem (since
their analog of (4.82) is globally concave in 0 and Nemirovsky and Yudin have shown
that the worst case complexity of concave optimization problems increases linearly
in the number of variables, k). However their method does not avoid the curse of
dimensionality of the approximation problem, since the number of basis functions
k used in the choice probabilities must increase exponentially fast as d increases in
order to guarantee an e-approximation to c~. However it is not clear that the curse of
dimensionality is really a practical problem for the kinds of MDPs that are currently
being solved: both Smith's and GSZ's results show that very parsimonious param-
eterizations of c~ can succeed in generating reasonably accurate approximations for
problems of current interest in economic applications. If their results carry over to
higher dimensional problems, it suggests the possibility that these smooth approxi-
mation methods might be able to circumvent the curse of dimensionality, at least for
some commonly encountered problems in economics.

4.4. Continuous infinite horizon MDPs

Most of the basic approaches to approximating continuous infinite horizon MDPs have
already been outlined in the previous section on continuous finite horizon MDPs. The
main additional complication in the infinite horizon case is that there is no terminal
period from which to begin a backward induction calculation: since V is the fixed
point to Bellman's equation V = F(V) we need to employ iterative methods to
approximate this fixed point in addition to approximating the infinite-dimensional
Bellman operator F : 13(S) --+ B(S). As in the previous section we present two
698 J. Rust

general approaches to approximating V: 1) discrete approximation, and 2) smooth

As we noted in the introduction, smooth approximation methods typically do not
exploit the fact that _P is a contraction operator, so proving the convergence of these
methods is more difficult. Smooth approximation methods may do a better job of
exploiting other features of the MDP problem such as the smoothness of V or the
fact that the MDP is in a special class of problems for which Euler equations can be
derived. However the curse of dimensionality of the multivariate function approxima-
tion problem implies that all of the standard smooth approximation methods for MDP
problems are also subject to a curse of dimensionality, at least on a worst case basis.
Discrete approximation methods compute an approximate fixed point V~v to an ap-
proximate finite-dimensional Bellman operator/'N : R N -+ R N that fully preserves
the contraction property of _P. Since these methods involve relatively straightforward
extensions of the general methods for contraction fixed points and the more special-
ized methods for discrete infinite horizon MDPs in Section 4.2, it has been much
easier to derive computable error bounds and relatively tight upper and lower bounds
on their computational complexity. However the lack of corresponding error bounds
for most of the existing smooth approximation methods for MDPs makes it is diffi-
cult to say much in general about the the relative efficiency of discrete versus smooth
approximation methods. A starting point for such an analysis is the complexity bound
for continuous infinite horizon MDPs derived by Chow and Tsitsiklis (1989). Their
complexity bound shows that no algorithm is capable of breaking the curse of dimen-
sionality, at least on a worst case basis.

THEOREM 4.8. The worst case deterministic complexity of the class of continuous in-
finite horizon MDP problems satisfying regularity conditions ( A 1 ) , . . . , (A5) of Sec-
tion 4.3.1 using the standard information (i.e. function evaluations) on (u, p) is given

c°mpW°r-det(e'ds' da' fJ) = O ( 1 ) (4.85)

((1 - f l ) 2 e ) 2dS+a~ "

Recall from Section 3 that (9 symbol denotes both an upper and lower bound
on the amount of cpu-time required by any deterministic algorithm to compute an
e-approximation to V on a worst case basis. A heuristic explanation of the expo-
nential lower bound (4.85) is that it is a product of the complexity bounds of three
"subproblems" involved in computing an approximate solution to a continuous MDP
in (4.55): 1) a O(1/e d°) bound on the complexity in finding an e-approximation to
the global maximum of the constrained optimization problem, 2) a O(1/e a~) bound
on the complexity of finding an e-approximation to the multivariate integral, and
3) a O ( 1 / e aS) bound on the complexity of the approximation problem, i.e. the prob-
lem of finding a (Lipschitz) continuous function V that is uniformly within e of
Ch. 14: Numerical Dynamic Programming in Economics 699

the fixed point V = F ( V ) . 56 Interestingly, in a subsequent paper Chow and Tsit-

siktis (1991) show that a simple discrete approximation method - a multigrid al-
gorithm - nearly attains the complexity bound (4.85) and hence constitutes an ap-
proximately optimal algorithm for the general continuous MDP problem. We are
not aware of any corresponding optimality results for smooth approximation meth-
It is important to note three key assumptions underlying the Chow and Tsitsiklis
complexity bound: 1) the class of MDP problems considered are CDP's (continuous
decision processes), 2) only deterministic algorithms are considered, and 3) complexity
is measured on a worst case basis. We review a recent paper of Rust (1995b) that shows
that another discrete approximation method, a random multigrid algorithm, succeeds
in breaking the curse of dimensionality of infinite horizon DDP's (i.e MDPs with finite
choice sets). This result is due to the fact that 1) essentially all of the work involved
in approximating the solution to a DDP is the multivariate integration problem, and 2)
randomization breaks the curse of dimensionality of the integration problem. However
randomization does not succeed in breaking the curse of dimensionality of the problem
of multivariate function approximation. This implies that none of the standard smooth
approximation methods are capable of breaking the curse of dimensionality of the
DDP problem.
Randomization is also incapable of breaking the curse of dimensionality associated
with the maximization subproblem of the MDP problem. In this sense CDP's are
inherently more difficult problems than DDP's. The only hope for breaking the curse
of dimensionality for CDP's is to focus on subclasses of CDP's with further structure
(i.e. CDP's where the maximization subproblem in (2.9) is concave for each s C S),
or to evaluate the complexity of MDP problems on an average instead of a worst
case basis. However we do not know whether there are algorithms that are capable
of breaking the curse of dimensionality of MDPs on an average case basis: although
recent results of Wo2niakowski (1992) that proves that the approximation problem is
tractable on an average case basis, it is still an open question whether the optimization
problem is tractable an average case basis [Wasilkowski (1994)].

56The surprisingaspect of the Chow and Tsitsikliscomplexity bound is that the bound is proportional
to the complexity bound for the problem of approximating the function F(W) for a fixed Lipschitz
continuous function W. One would expect that the problem of finding a fixed point to /1 to be inherently
more difficult than simply evaluating r at a particular value W, and we would expect that this extra
difficulty ought to be reflected by an extra factor of e representingthe extra work involvedin findingmore
accurate approximationsto the fixed point. This is certainlythe case for standard iterative algorithms such
as successive approximationsor Newton-Kantorovichiterations in which the number of evaluationsof F
for various trial valnes of Vk increases to infinity as e ~ 0, but not for the optimal algorithm. We will
provide further intuition as to why this is true below.
700 J. Rust

4.4.1. Discrete approximation methods'

Discrete approximation methods involve replacing the continuous state version of

Bellman's equation (2.9) by a discretized version

= max U(si,a)+flEVN(sj)PN(SjlSi,a) , (4.86)

aEA(s~) j=l

defined over a finite grid of points { s t , . . . , SN} in the state space S. 57 Section 4.3.1
discussed four possible ways of choosing grid points: 1) uniform (deterministic) grids,
2) quadrature grids, 3) uniform (random) grids, and 4) low discrepancy grids. Each
of these grids lead to natural definitions of an approximate discrete Markov transition
probabilities PN over the finite state space { S l , . . . , SN}, which implies that the ap-
proximate Bellman operator/~lv is a well defined contraction mapping from R N into
itself for each N. As we showed in Section 4.3.1, each of the "discretized" transition
probabilities PN(" t s, a) are also continuous functions of s and are defined for all
s E S. This implies that each of the approximate Bellman operators that we considered
in Section 4.3.1 also have the property of being self-approximating, i.e. FN(V)(s)
can be evaluated at any s E S by replacing si by s in formula (4.86) without the
need to resort to any auxiliary interpolation or approximation procedure. Thus, FN
has a dual interpretation: it can be viewed as mapping from R Iv into R N when we
restrict attention to its values on the grid { s t , . . . , sN}, but it can also be viewed as a
mapping from B(S) into B(S). Thus, quantities such as IIrN(w) - r ( w ) I I are well
defined, and quantities such as ]IVN -- V[[ are well defined provided we interpret VN
as its "extension" to B(S) via FN, i.e. VN(s) = FN(VN)(s) for s ¢ { S l , . . , , SN}.
Even though VN (viewed now as the fixed point to -PN in -RN) can be computed
exactly (modulo roundoff error) by the policy iteration algorithm described in Sec-
tion 4.2, we will generally make do by calculating an approximate fixed point VN
of FN. Using the tr~ngle inequality and Lemma 2.1, we can derive the following
bound on the error ttVN - vll between the true and approximate solution to the MDP
problem (where we now treat VN and VN as their extensions to B(S) via FN):

II N - vii = II N + V,,, - VN - vii

II N - '/Nil + IIV - vii (4.87)
IIrN(v) - r(v)ll
IIPN- v ll + (1 - 9 )
57As we discussed in Section 4.3.1 discrete approximation methods do not necessarily require dis-
cretization of the action sets A(si), i -- 1 , . . . , N. A variety of deterministic or stochastic constrained
optimization algorithms including "continuous" optimization methods such as gradient hill climbing algo-
rithms, Nelder-Mead polytope algorithms, or random algorithms such as simulated annealing could be used
as well.
Ch. 14: Numerical Dynamic Programming in Economics 701

The "fixed point approximation error" is represented by the quantity IIVN -- VN I] and
the "discretization error" is represented by the quantity ]].FN(V) - F ( V ) ] ] . The error
bounds derived in Section 4.2 allow one to make the fixed point approximation error
less than e/2 and the error bounds presented in Section 4.3.1 allow one to choose
a sufficiently fine grid (i.e. a sufficiently large N) so that the discretization error is
less than (1 - / 3 ) e / 2 uniformly for all V. It then follows from inequality (4.87) that
the maximum error in the approximate solution VN will be less than e. We now turn
to descriptions of two broad classes of discrete approximation algorithms: 1) single
grid methods and 2) multigrid methods. Multigrid methods reduce the fixed point
approximation error and the discretization error in tandem, beginning with a coarse
grid in order to reduce the computational burden of getting to a neighborhood of V,
but generating successively finer grids to refine the level of accuracy as the algorithm
begins to home in on the solution.

Single grid methods. Single grid methods consist of two steps: 1) choose a procedure
for generating a grid and generate N grid points { s l , . . . , SN} in S, 2) use one of the
solution methods for discrete infinite horizon MDPs described in Section 4.2 to find an
approximate fixed point VN to the approximate Bellman operator FN corresponding to
this grid. Once one has chosen a method for generating grid points and has determined
a number of grid points N to be generated, the level of discretization error (i.e. the
second term in inequality (4.87)) is fixed. The only remaining decision is how many
iterations of the discrete infinite horizon MDP solution algorithm should be performed
to reduce the fixed point approximation error to an appropriate level. A standard choice
for the latter solution algorithm is the method of successive approximations, especially
in view of the results of Section 4 which suggest that successive approximations may
be an approximately optimal algorithm for solving the fixed point problem, at least if
the dimension N is sufficiently large. Using the upper bound T(e,/3) on the number
of successive approximation steps from Eq. (4.3) of Section 4.2,1, it follows that the
e-complexity of successive approximations is O(T(e,/3) lAIN2), where ]A I denotes the
amount of cpu-time required to solve the constrained optimization problem in (4.86).
Now consider the use of uniformly spaced grid points. Using inequality (4.39) it is easy
to see that N = O(1/[(1 -/3)2e]a~) grid points are required in order to guarantee that
lIEN(V)-F(V)I I <~(1-/3)e uniformly for all V satisfying ]IV]] ~< K/(1-/3), where
K - maxs~s maxa~A(s) [u(s, a)]. If we solve the maximization problem in (4.86) by
discretizing the da-dimensional action sets A(s), it can be shown that a total of [A] =
O(1 /[(1 -/3)2e]a") points are required to guarantee that ]1FN(V) - F(V)II < (1 -/3)
uniformly for all V satisfying [[V[[ ~< K/(1 -/3). Using inequality (4.87) it follows
that if tA] and N are chosen this way and if T(e, fl) successive approximation steps
are performed, then the e-complexity of successive approximations over a uniform
grid, compSa'~°(c,/3, da, ds), is given by:

comp a, g (e, fl, d,. =0 ( T(e, fl) (4.88)

702 J. Rust

Notice that the cpu-time requirement is a factor of T(e,/3) higher than the lower
bound in (4.85). Thus we see that successive approximations using a single fixed
uniform discretization of the state and action spaces is not even close to being an
optimal algorithm for the continuous MDP problem. In view of the additional difficulty
involved in computing the approximate transition probability PN (discussed following
Eq. (4.37) in Section 4.3.1) this method is not recommended for use in practical
A potentially superior choice is the quadrature grid and the simpler Nystr6m/Tauchen
method for forming the approximate finite state Markov chain given in Eq. (4.42) of
Section 4.3.1. Tauchen (1990) used this approach to approximate the value function
to the Brock-Mirman stochastic optimal growth problem (example 2 in Section 2.4).
Recall that this problem has a two-dimensional continuous state variable, st = (kt, zt)
where kt is the endogenous capital stock state variable and zt is the exogenous tech-
nology shock state variable. Tauchen used a 20 point Gauss-Hermite quadrature rule
to calibrate a discrete Markov chain approximation to the technology shock tran-
sition density q(zt+l ] zt) and 90 equispaced gridpoints for log(kt). The resulting
grid over the two-dimensional state space S contained N = 20 x 90 = 1800 points.
Given this grid, Tauchen used successive approximations, terminating when the per-
centage change in successive iterations was less than 0.001. Using a Compaq 386/25
megahertz computer and the Gauss programming language, it took 46 cpu minutes to
compute an approximate solution VN. The resulting solution is extremely accurate:
comparing the implied optimal decision rule &N to the analytic solution in the special
case of log utility and 100% depreciation, Tauchen found that "The algorithm is only
off by one digit or so in the fourth decimal place" (p. 51). Although Tauchen's method
simplifies the numerical integration problem involved in computing the normalizing
factors in the denominator of the formula for PN, the use of quadrature grids does
not eliminate the curse of dimensionality of the underlying integration problem since
it is subject to the same curse of dimensionality as any other deterministic integration
algorithm. As a result, the complexity bound for quadrature methods is basically the
same as for uniform grids:

compS~'qg (e, /3, da, ds) = O (comp . . . . g(e, ~, d~, d,)), (4.89)

where qg stands for quadrature grid. More generally, the results of Section 3.2 imply
that the worst case complexity of any single grid method that uses successive approx-
imations and a deterministically chosen grid will be at least as large as for the naive
uniform grid.
Rust (1995b) showed that a variation of discrete approximation using random uni-
form grids (i.e. N IID uniformly distributed draws from S) does succeed in breaking
the curse of dimensionality of the integration subproblem. Although we have noted
that randomization in incapable of breaking the curse of dimensionality of the maxi-
mization subproblem in CDP's, it does break the curse of dimensionality of the DDP
Ch. 14." Numerical Dynamic Programming in Economics 703

problem since essentially all the work involved approximating the Bellman operator
to a DDP problem is in the multivariate integration subproblem. Rust proved that
successive approximations using the random Bellman operator I"N defined Eq. (4.47)
of Section 4.3.1 succeeds in breaking the curse of dimensionality of solving the DDP
problem on a worst case basis. For simplicity, assume that S -- [0, 1]a, and that a
random grid { S l , . . . , g~v} consists of N liD uniform random draws from S. The error
bound in Theorem 4.6 shows that randomization breaks the curse of dimensionality
of the integration problem. Theorem 4.7 provides an upper bound on the randomized
complexity of the DDP problem for finite horizon problems, and the bound extends to
infinite horizon problems by substituting T(e,/3) for the horizon length T in Eq. (4.52).
For completeness, we state this result as the following corollary to Theorem 4.7:

COROLLARY, Under the regularity conditions of Theorem 4.7 randomization breaks'

the curse of dimensionality of finding an e-approximation to the value function V
of an infinite horizon DDP problem. An upper bound on the worst case randomized
complexity of the infinite horizon DDP problem is given by:

.... 0 [ll°g ( 1 - / ( 1 - /3)e)d4]Al3K4K4"l

compW°r-ran(e,/3, d) (4.90)
\ i tog(/3)l( 1 _/3)8£4 ]'

where the constants Ku and Kp are the Lipschitz bounds' for u and p given in
Definition 4.1.

Finally, consider the use of low discrepancy grids { s i , . . . , SN}. These are deter-
ministically chosen grids such as the Hammersley, Halton, or Sobol' points described
in Section 4.3.1. The approximate discrete Markov chain approximation pN to the
transition density is formed exactly the same as in Eq. (4.48) for the randomly chosen
grid, but with the deterministically chosen grid points {81,... , 8N} in place of the
random grid points { g l , - . - , gt¢}. Since low discrepancy grids are deterministically
chosen, they will be subject to the same curse of dimensionality that any other de-
terministic integration method faces on a worst case basis. However the results of
Wo~niakowski and Paskov discussed in Section 3.2 show that low discrepancy grids
do succeed in breaking the curse of dimensionality on an average case basis. These
theoretical results have been impressively confirmed in result numerical comparisons
by Paskov (1994) that show that low discrepancy grids are significantly faster than
uniform random grids for solving high dimensional (d = 360) integration problems
arising in finance. These results suggest that successive approximations based on an
approximate Bellman operator ['N formed from low discrepancy grids could also suc-
ceed in breaking the curse of dimensionality of solving DDP problems on an average
case basis. However to our knowledge this conjecture has not been formally proven.
So far we have only considered the use of successive approximations to calculate
an approximate fixed point VN of -FN . Clearly any of the other solution methods
704 J. RusI

discussed in Section 4.2 could be used as well. Particularly promising algorithms

include the Trick-Zin linear programming approach using constraint generation, policy
iteration methods that approximately solve the linear system (I -/3M~)V~ - us
using the GMRES algorithm, Puterman's modified policy iteration algorithm, and
the state aggregation methods presented in Section 4.2. Although the 1987 result of
Sikorski-Wo~niakowski suggests that simple successive approximations is an almost
optimal algorithm in high dimensional problems (i.e. for large N), their theorem
assumes that the cost of evaluating the Bellman operator is independent of N which
is clearly not the case for discrete approximation methods. It is not clear whether
their result will continue to hold in a complexity analysis that recognizes that cost
of evaluating/'N (V) increases with N. However, we will now consider the broader
class of multigrid algorithms that successively refine the grid over S. It turns out that
simple successive approximations is indeed almost optimal within this larger class.

Multigrid methods. Multigrid methods have attracted substantial attention in the

literature on numerical solution of partial differential equations where they have been
found to lead to substantially faster convergence both theoretically and in practice. 5s
To our knowledge Chow and Tsitsiklis (1991) were the first to show how multigrid
methods can be applied to continuous MDPs. They proposed a "one way" multigrid
algorithm that works as follows:
1. Beginning with a coarse uniform grid over S containing N1 = ( l / h ) a~ grid
points {sl,. •., sN~} (where h is a sufficiently small positive constant), use the im-
plied approximate Bellman operator FNI to carry out T1 successive approximation
steps starting from an arbitrary initial estimate V0 of V where T1 is the smallest
number of successive approximation iterations until the following inequality is satis-

/r~Tl (V0) -- F T t - I ( v 0 ) ~. -
- (4.91)
/3(1 - / 3 ) '

where K ' is a bounding constant determined by the Lipschitz constants for ~ and p
in assumptions ( A 1 ) , . . . , (15) and h satisfies h ~< 1 / 2 K ' .
2. At iteration k of the multigrid algorithm generate a new uniform grid
with Nk = 2d~Nk_l points. Use the value function I~k = iNk_lk, V k _ l1) r~~ T k from
the final successive approximation step from the previous grid as the starting
point for successive approximations using the new grid. 59 Carry out out Tk suc-

58See Hackbusch (1985) for a survey, and the Web server

h t t p : / / i n f o . desy. de/pub/ww/proj ects/MG, htral for a comprehensive and up to date bibli-
ography of multigrid methods and applications.
59Actually Chow and Tsitsiklis suggested a better estimate of Vk, namely the average of the upper
and lower McQueen-Porteus error bounds after Tk-i successive approximation steps, see Eq. (4.6) in
Section 4.2.1. We have used the value of the last successive approximation step only to simplify notation.
Ch. 14."Numerical Dynamic Programming in Economics 705
~ o o u , , ~ f o , , , , Gr~d P o i n t s 4OO U n ; f o , r,, (.;,id Po~,,t~

!iiiiiiiiiii}iiiiiii;iil;iiii! ...........

ol . . . . . . . . . . . . . . . . . . .
::::::::::::::::::::: ;~i????!???????????~??????????E??i???iE
::::::::::::::::::::: iiiiiiiiiii}!iiiiiiiiiiii!!ii;iiiiiiiiii
::::::::::::::::::::: :!iiiiiiiliiiiiiiiiii?ii!iiiil}iiiiiiiiii
::::::::::::::::::::: :~!iiiiiiiii!iiiiiiiiiii!iiiiiiiiiiiiiiil
o n~ 02 03 O~ 0.5 0.6 0.70e O.e ,.{~ e, oz 0.3 o4 os o.~ ~.7 o.8 o.o 1.o ol o.2'o3'oi 05 06 0.7 0.80g IO

Figure 14.3. Exampleof successivegrids generated by multigridalgorithm.

cessive approximations steps starting from Vk where Tk is the smallest num-

ber of successive approximations steps until the following inequality is satis-

FTN~(Vk) .["/VTk--l(yk) <~ (4.92)
2 k 1/3(1 -/3)"

3. Terminate the outer grid generation iterations k = l, 2, 3 , . . . at the smallest

value/c satisfying:

~< e (4.93)
2~-1/3(1 - ~ )

where e is a predefined solution tolerance.

Chow and Tsitsiklis proved that the multigrid algorithm terminates with a
value function Vk+l that satisfies []Vk+~ - V I I ~< e. Furthermore they proved
that the multigrid algorithm is almost optimal in the sense that its complex-
ity compS",ma(e,/3, ds, da) (where mg denotes "multigrid") is within a factor of
1/1 log(/3)] of the lower bound on the complexity of the continuous MDP problem
given in Eq. (4.85) of Theorem 4.8:

THEOREM 4.9. Under assumptions (A1),..., (A5) the rnultigrid algorithm outlined
in steps 1 to 3 above is almost optimal, i.e. its complexity,

comp......g(e,/~,d~,d~)=O ilog(fl)i ((1-/3 2do+d. , (4.94)

is within a factor of 1/[ log(/3) I of the lower bound on the worst case complexity of
the M D P problem.
706 J. Rust

This result formalizes the sense in which simple successive approximations is %l-
most optimal" for solving the MDP problem. The multigrid algorithm can obviously
be generalized to allow other deterministic methods for generating grid points, includ-
ing quadrature grids and low discrepancy grids. Similar to our results for single grid
methods, the worst case complexity bounds for multigrid methods using other reason-
able deterministic methods for generating grids (e.g. quadrature grids) will be similar
to the complexity bound for the multigrid method using uniform grids of Chow and
Rust (1995b) presented a "random multigrid algorithm" for DDP's that is basically
similar to the deterministic multigrid algorithm of Chow and Tsitsiklis, with the ex-
ception that it does not require an exponentially increasing number of grid points in
order to obtain accurate approximations to the integrals underlying the random Bell-
man operator -FN. In particular, at each stage k of the random multigrid algorithm we
only need to increase the number of grid points by a factor of 4 as opposed to the
factor 2 a~ required by the deterministic multigrid algorithm. The following steps lay
out the details of the random multigrid algorithm:
1. Beginning with a coarse uniform grid over S containing some initial number
of grid points, N1 /> 1, use the implied random Bellman operator FN1 to carry out
Tl successive approximation steps starting from an arbitrary initial estimate V0 of
V where T1 is the smallest number of successive approximation iterations until the
following inequality is satisfied:

hT, ( Vo ) _ -T, - I ( Vo ) (4.95)
v 7(1

where K is a bounding constant determined by the Lipschitz constants for u and p

in assumptions ( A 1 ) , . . . , (a5), K = 7(d)lAIK=Igp, and 7(d) is given in Eq. (4.50)
of Theorem 4.8.
2. At iteration k of the multigrid algorithm generate a new uniform grid with
Ark = 4Nk-1 points. Use the value function #dk = -FNr~ ~l(7/k_t) from the final suc-
cessive approximation step from the previous grid as the starting point for successive
approximations using the new grid. 6° Carry out out Tk successive approximations
steps starting from <dk where Tk is the smallest number of successive approximations
steps until the following inequality is satisfied:

4- 7(1 -

6°As per our previous footnote, we actually take the average of the upper and lower McQueen-Porteus
error bounds after T k - I successive approximation steps, see Eq. (4.6) in Section 4.2.1.
Ch. 14: Numerical Dynamic Programming in Economics 7O7
I 0 0 R~mdom Glid Polnt~ 400 R~,~dom Grid Po~nls I~oo R~naom Grid P o m t ~

• . . •


• U .. . .- .
~".": -'," ".:" .2:.>',T"""::.'¢ % :2 ?
:::. %" ,,,o~...:,.2. m .4 •" .' • " . ' ° " : °. :

, ".'.. :..7..~:-:'.~.'-'.~'.-:,.~,--.k--( . :;.~ .:

[i-..rT.:.,:...:-" ,:, :;"2.>" ", ':. >...'.", "

ot o~ 03 a4 os ~ 7 ~ o ,o o o.i o.z 03 0.4 o.5 o.6 0.7 o,8 o9 i.o

Figure 14.4. Example of successive grids generated by random multigrid algorithm.

3. Terminate the outer grid generation iterations k = 1,2, 3 , . . . at the smallest

value ~: satisfying:

K 2
N,~ • (1 -- fl)4~2 (4.97)

where e is a predefined solution tolerance.

Since Ark is increasing by a factor of 4 at each iteration k of the random multigrid
algorithm, Theorem 4.6 implies that the expected error is being halved at each itera-
tion, and therefore the random multigrid algorithm terminates with a finite value of/c
with probability 1. Rust showed that the multigrid approach is faster than the single
grid random successive approximations algorithm by a factor of log(1 / (1 - fl)e), lead-
ing to the following upper bound on the randomized complexity of infinite horizon

THEOREM 4.10. An upper bound on the worst case complexity of infinite horizon DDP
problems is given by."

compW°r-ran(e,/3, d) = O i lo~-~_ ~-8~4 j . (4.98)

Note that Theorem 4.10 is an upper bound on the randomized complexity of the
DDP problem. The lower bound is not yet known. However in view of Bakhvalov's
(1964) result that simple Monte Carlo integration is an almost optimal algorithm when
the dimension d is large relative to the degree of smoothness r of the integrand, it is
reasonable to conjecture that the upper bound on the randomized complexity of the
DDP problem given in Theorem 4.10 is not far (i.e. within a factor of 1/I log(fl)I) of
the lower bound on the randomized complexity of this problem.
708 J. Rust
~oo s , , b , ) ) ' G,id P o ~ , , t s 4 0 0 S o b o r Grid P o i n t s 1600 Sobor G~id P~,,,~s
, ~ ., . . '~I--" : . .'" ~. - ., . . . ~ , . '
~,._ .- ,,:. -....._.-..-• ... . . . . ..-,,

• o

~, . " . . . •. . . . . . . . . . .
ol o.2 o.~ o.~ os o.e 07 oa o.~ I.D ol o.2 03 o.4 05 0.6 o7 og o~ la

Figure 14.5~ Example of successive grids generated by a "Low Discrepancy" multigrid algorithm•

Figure 14.5 presents successive grids generated by a "low discrepancy" multigrid

algorithm that uses the Sobol point sequence described in Section 4.3.1 above. 61
Recall that these sequences (sl,...,SN) are strategically chosen to minimize the
discrepancy between empirical Lebesgue measure based on the "sample" ( s l , . . . , sN)
and true Lebesgue measure, but otherwise the operation of the multigrid algorithm
is exactly as described for the uniform and random multigrid methods described
above. The theoretical and numerical results of Paskov (1993, 1994) suggest that
sample averages of an integrand using Sobol points result in much more accurate
estimates of the integral than taking sample averages of the integral evaluated at
pseudo-random points• Comparing Figs 14.4 and 14.5 it is in fact visually apparent
that the Sobol points are more "uniform" in the sense that there are fewer overlapping
points and fewer empty gaps in the unit square• Although there has been no theoretical
or numerical analysis of the use of Sobol points for solving MDPs, I believe that the
combination of Sobol point multigrid algorithm could yield very fast and accurate
numerical approximations to the value function•
It is easy to define still other variants of the multigrid algorithm corresponding
to the other discrete solution algorithms presented in Section 4.2, such as multigrid
policy iteration, multigrid linear programming, and so forth• We do not yet have any
computational experience with these methods, so it is not clear whether they will out-
perform multigrid successive approximations. 6~ However the near optimality of the
slJccessive approximations multigrid algorithm suggests that these other methods may
not do much better: at most they might eliminate the remaining factor of 1/J log(/3)[
difference between the complexity of the multigrid successive approximations algo-
rithm and Chow and Tsitsiklis's lower bound on deterministic complexity of MDPs
given in Theorem 4.8.

61These points were generated using the algorithm available in the 1992 edition of Numerical Recipes
by Press, Flannery, Teukolsky and Vetterling.
62This statement is not quite true, since the "adaptive grid generation" approach of Trick and Zin
described in Section 4.2.3 can be viewed as a multigrid linear programming algorithm for continuous
MDPs. Unfortunately their method has not yet been compared to the multigrid successive approximation
Ch. 14: Numerical Dynamic Programming in Economics 709

4.4.2. Smooth approximation methods'

Section 4.3.2 presented a large number of different smooth approximation methods

for continuous finite horizon MDPs. We begin this section by observing that all of the
methods in Section 4.3.2 extend in a straightforward manner to smooth approxima-
tion methods for continuous infinite horizon MDPs: one simply uses these methods to
compute a finite horizon approximation to the infinite horizon problem with T(c,/3)
periods, where T(e,/3) is the maximum number of successive approximation steps
required to ensure that IIFt(Vo) - VII ~< ~ for any starting estimate 17o. The draw-
back of this approach is that it suffers from the slow rate of convergence of the
method of successive approximations (discussed in Sections 4 and 4.2), which is es-
pecially problematic when /3 is close to 1. In addition there are difficult unsolved
problems involved in establishing sufficient conditions for these "approximate suc-
cessive approximations" methods to converge at all. In this section we focus on linear
approximation algorithms of the form:


where {pl, p2,-.-} are a pre-specified sequence of basis functions such as the Cheby-
shev polynomials. One can show that approximation algorithms of this form are op-
timal error algorithms for approximating functions in any class F that is convex and
symmetric [see Novak (1988)]. The first class of methods we discuss, introduced by
Kortum (1992), might be called "parametric successive approximations" since it con-
verts the successive approximations iterations for V into successive approximations
iterations for the coefficient vector 0. Next we consider a potentially faster method that
we refer to as "parametric policy iteration" that was first suggested by Schweitzer and
Seidman (1985): it is the standard policy iteration algorithm except that each policy
valuation step uses a linear approximation VO~ to the exact value function Vat implied
by the trial decision rule c~t. Both of these approaches use the method of ordinary
least squares to compute the best fitting parameter estimates 0t at each stage t. Thus,
these approaches can be viewed as iterative versions of the class of minimum residual
methods and the closely related class of projection methods that are described in nmch
more generality by Judd in Chapter 12. The basic idea behind these methods is to
find a parameter 0" such that the residual function

R(vo)(s) : Vo(s) - (4.100)

as close to the 0 function as possible.

710 J. Rust

Parametric successive approximations. In continuous MDPs the standard method of

successive approximations Vt = F t (V0), t = 1 , 2 , . . . , generates a globally convergent
sequence of iterations in the function space B(S). We wish to illustrate a number of
difficulties involved in proving the convergence of successive approximations when
one is simultaneously trying to approximate the functions F t (V0) by any of the smooth
approximation methods outlined in Section 4.3.1. We will focus on linear approxima-
tions Vo of the form (4.99) due to their optimality properties noted above and also
because it simplifies notation. If we consider approximations Vo using a fixed number
k of basis functions { s l , . . . , sk} then unless the true value function V lies within the
subspace spanned by {Pl, • • •, Pk} we have inf0eRk IIVo - r ( V o ) I I > o, which implies
that we will be unable to obtain arbitrarily accurate approximations to V. However if
the sequence { p l , p 2 , . . . } is dense in the space B of all possible value functions in
the sense of Eq. (1.2) in the introduction, then by choosing k sufficiently large we can
make inf0cRk IIVo - F ( V o ) l l as small as desired. Suppose we choose k large enough
to guarantee that inf0cRk ]]Vo - F(Vo)]] <~ (1 - fl)e, where e is some predefined so-
lution tolerance. Then we can show that there exists a k > 0 and a 0 c R k such that
the estimate V 0 is uniformly within e of the true solution V. The problem now is to
describe a feasible algorithm that that is capable of finding the required k and 0" for
any pre-specified solution tolerance c. Consider first the problem of finding 0 when k
is fixed at some arbitrary value.
Kortum (1992) suggested the method of successive approximations in 0-space.
Starting from an arbitrary initial guess 0"0 (corre~onding to an initial approximate
value function, Voo), we form updated estimates {Or} according to the formula:

0t = ( x x t) -1
xut , (4.101)

where the N x 1 vector Yt -- ( y t ( 1 ) , . . . , y t ( N ) ) ' , and the N x K matrix X =

{ X ( i , j ) } are defined by:

= 2(vo,_1)(sd,
X (i, j ) = pj (si), (4.102)

where i = 1 , . . . , N, j -- 1 , . . . , k, F is an approximation to the Bellman operator

/7 induced by the need to numerically approximate the maximization and integration
subproblems, and { s l , . . . , sN} is a grid of N >~ k points in S. The interpretation of
Eq. (4.101) is that Ot is the ordinary least sqtmres estimate that minimizes the sum of
squared errors of the residual function Vo - F(Vo~_I ):

O't - arg min E Vo(si) -

^ 2 .
Ch. 14." Numerical Dynamic Programming in Economics 711

It is not essential that ordinary least squares be used to compute 0"t: any linear or
nonlinear approximation method that is capable of yielding arbitrarily accurate ap-
proximations to functions of the form F ( W ) will do as well. Any of the methods
discussed in Section 4.3.1 could be used: for example 0"t could be calculated by the
Chebyshev interpolation formula (4.68) using a grid {sl,. • •, SN} consisting of the
first N = k Chebyshev zeros, or Ot could be calculated by the Galerkin method
in Eq. (4.70), or by any of the broader class of projection methods which we will
discuss in more detail below. Finally Ot could be calculated by least squares b u t
using a nonlinear-in-parameters functional form for Vo such as the neural network
approximation given in Eq. (4.71) of Section 4.3.2.
A more important question is when to terminate the successive approximations
iterations. Kortum (1992) suggested termination when successive changes in the pa-
rameter vector is less than some predefined solution tolerance, H0"t- Ot-I II ~ ~" The
triangle inequality implies that

Ilvo - vo _,ll Z ttplt, (4.104)


where IlOt--0"t-ill denotes the maximum change in h coefficients from iteration

t - 1 to t. Thus, if the 0"t coefficients change by a sufficiently small amount the
corresponding value functions VO~ will also change by only a small amount.
Unfortunately we are not aware of any formal proofs of the convergence of this
algorithm and we are unaware of numerical applications of the method. To date most
numerical applications of the successive approximations approach to solving continu-
ous infinite horizon MDPs have not attempted to generate continuous approximations
Vgt at each step of the backward induction process. Instead, most of the applications of
which we are aware [including Christiano (1990) and Tauchen (1990)] have adopted
the discrete approximation approach, carrying out all of the successive approximation
steps on a discretized finite state MDP and using multilinearinterpolation or some
other approximation method to generate a smooth extension V defined over all of 5'
from "data" consisting of an N x 1 approximate fixed point Vlv to an approximate
Bellman operator defined over a finite grid { s l , . . . , SN}.

Parametric policy iteration. Schweitzer and Seidman (1985) and Kortum (1992) in-
dependently proposed a parametric version of the Bellman-Howard policy iteration
algorithm described in Section 4.2.1. Their method differs from the standard imple-
mentation of policy iteration in its use of a polynomial series approximation Vo given
in Eq. (4.99) to approximate the value function Vs corresponding to a given station-
ary policy c~. Recall from Section 4.2 that Vs is the solution to the linear system
V~ = u s - 3 M ~ V s where M s is the Markov operator corresponding to the policy c~.
712 J. Rust

This linear system must be solved at each "policy valuation" step of the policy itera-
tion algorithm, and in the case of continuous MDPs it is actually a linear functional
equation whose solution can only be approximated. Let {si, • • •, si} be a grid over
the state space S. The parametric policy iteration algorithm works as follows: given
a guess of a policy c~t at stage t of the policy iteration algorithm we approximate
the true value function Vat corresponding to the policy cq by the value function Vot
where 0"t is given by:

= (XtXt)
t -1
t (4.105)

where the 2V x k matrix Xt and the N x 1 vector Yt are defined by

Xt(i,j) = pj(si) -/3 fpj(~')v(dJ I ~,, ~(~,)). (4.106)

Given 0"t we form an updated policy c~t+l defined by:


=arg max
[u(si, a) + /3f Vo~(s')p(ds' ] si,a)] i= 1,...,N. (4.107)

We continue iterating on Eqs (4.106) and (4.107) until

max {c~t+l(si) - ~ ( ~ ) l ~<


in which case we can use VOt+las the corresponding estimate of V. Just as in the case
of parametric successive approximations, the formula for Ot in Eq. (4.105) correspond
to the ordinary least squares estimate:
Ot=arg min ~lu(si, c~t(si)) +/3M~(Vo)(si)- Vol2
(0t ..... Ok) i=1

N k
= arg min ~-"
(0~..... 0h) .=
~(s~,~(~)) +/3 ~2o~ fpj(s')p(ds' I s~,~,(s~)) (4.108)
k 2
- ~ ojpj(s~) i
Ch. 14: Numerical Dynamic Programming in Economics 713

Similar to the previous case, there is no particular reason to restrict attention to OLS:
any of the other linear or nonlinear approximation methods considered in Section 4.3.2
could be used as well.
In certain cases the integrals appearing in Eqs (4.108) and (4.107) can be computed
analytically. For example Kortum shows that when the basis functions { P l , . . . ,Pk}
are polynomials the integrals are the derivatives of the moment generating function of
the transition probability p(ds' t s, a), and in some cases there are analytic expressions
for these moment generating functions. If not, then we need to invoke some sort of
numerical integration method such as the quadrature or Monte Carlo integration meth-
ods discussed in Section 4.3.1. Similar to the parametric successive approximations
method described above, the parametric policy iteration algorithm does not converge
monotonically to the true solution V, so special care must be taken to prevent cycling.
We are not aware of error bounds or formal proofs of the convergence of this method.
Just as in the case of parametric successive approximations, a proof of convergence
will require specification of the rates at which the number of iterations t the number of
basis functions k and the number of grid points N tend to infinity in order to guarantee
that the sequence of value functions {Vo~} converges to V. We can imagine multi-
level level algorithms that iteratively adjust t, k, and N until a certain convergence
tolerance is attained. Although we are not aware of any formal procedures to insure
convergence, Schweitzer and Seidman and Kortum report very favorable experience
with the method in numerical experiments. Kortum used the method to approximate
the value and policy functions for a MDP model of a firm's optimal investment in
research and development. He used a basis consisting of the first k = 8 Chebyshev
polynomials. 63 Paraphrasing his conclusions in our terminology, he found that:
The algorithm often requires less than six iterations before it converges to a toler-
ance of 0.0000001 for the policy function. Running on an IBM-compatible com-
puter with an 80486 processor, the algorithm requires 5-60 seconds to converge.
The algorithm was successful in matching the closed form solutions for V(s) and
o~(s), even in cases where V was not a polynomial (p. 14).

Minimum residual methods. These methods are based on the equivalent formulation
of the Bellman equation as determining V as the unique zero to the residual function,
i.e. the nonlinear operator R(V) = [ I - F ] (V) - 0. Given a parametric approximation
Vo, these methods approximate V by V0 where 0 is chosen to make /~(V0) as close
to zero as possible in the Lp n o r m :

O : arg rain ttR(Vo)[Iv


- IVo(s)-c(vo)(s)]P~(ds) , pc [l,~]. (4.109)

63Kortum did not report the number N of grid points used in his regressions for Ot and his procedure
for choosing {sl, . . . , SN}.
714 J. Rust

The function # entering (4.109) represents a probability measure reflecting the ana-
lyst's view of the relative importance of deviations in the residual function at various
values of s. For this reason, the MR method is also known as the method of mini-
mum weighted residuals (MWR). The important point is that MR methods convert the
infinite-dimensional fixed problem into a finite-dimensional minimization problem.
In theory we are interested in minimizing the sup-norm of R(Vo) which corresponds
to the choice p = oc. However in practice the sup-norm is difficult to work with
since it leads to objective function that is generally non-differentiable in 0. Therefore
standard practice is to use the L2 norm. It is important to note that in general the L2
and L ~ norms generate different topologies on the space B(S). In particular the fact
that V is close to V in the L2 norm does not necessarily imply the V is close in the
L ~ form. However if V is known to be an element of certain compact subspaces
B of 13(S) (such as the Sobolev space of all functions whose rth derivative have
uniformly bounded Lp norm), then the Rellich-Kondrachev theorem [Adams (1975,
Theorem 6.2)] provides sufficient conditions for the Lp norm to be equivalent to the
La norm, p ~> 2, i.e. conditions under which there exists a constant Cp > 0 such that

- vllp cpll - vii2, p 2, If,V C/3. (4.110)

Let F~ denote a smoothed Bellman operator given by:

= l°g [eA(s)exp{l [u(s,a)+Z f v(s')P(dJ I s,4}] (4.111)

Dini's theorem implies that for each V E 13(S) we have:

lim F~(V) = F(V), (4.112)


i.e. the smoothed Bellman operator converges to the Bellman operator as the smooth-
ing parameter c~ --+ 0. This result is useful for simplifying the analysis of vari-
ous algorithms for computing approximate value functions since F~(V)(s) is a nice
smooth function of V for all s E S whereas the max operator inside the Bellman
operator can introduce kinks or nondifferentiabilities in F(V)(s) at certain points
s E S. These kinks create complications for the practical application of the MWR
algorithm which are avoided by using the smoothed Bellman operator. Let P~ denote
an approximate smoothed Bellman operator that employs some numerical integration
method to approximately calculate the integral entering /~. Numerical integration
must also be used to approximate the integrals defining the L2 norm. Nearly all of the
standard numerical integration algorithms can be represented as a weighted average
of the integrand over a grid of points { s l , . . •, sN} in the domain S using weights
Ch. 14." Numerical Dynamic Programming in Economics 715

{ w l , . . . , WN}. For example Gaussian quadrature has this representation, and Monte
Carlo integration has this representation for a randomly chosen grid { g l , . . . , gN}. and
uniform weights wi = l/N, i = 1 , . . . , N. Using this representation for the L2 norm
in (4.109) leads to the definition of the MR estimate of 0":

0 " = arg min ~ Vo(si)- ~(Vo)(si)2wi. (4.113)

One obtains a large number of variations of the basic MR method depending on how
one specifies the basis functions { P l , . . . , Pk} determining the approximation method
for Vo and the grid points { S l , . . . , sN} and weights { w l , . . . , WN} determining the
numerical integration used to approximate integrals entering F~ and the L2 norm for
the objective function in (4.109) (we assume that N is the same in both cases only for
notational simplicity). Assuming that we have chosen procedures that can generate
arbitrarily accurate estimates for sufficiently large values of k and N, there are 3
basic parameters governing the ability of the MR estimate of the value function to
approximate the true solution V: k, N and the smoothing parameter ~r. By letting c~
tend to zero and k and N tend to infinity at the right rates, it is possible to show that
V0 converges to V.

Projection methods. We introduced projection methods in our discussion of Cheby-

shev approximation in Section 4.3.2. Our discussion in this section will be brief since
Judd (1996, Chapter 12) discusses projection methods in detail, showing how they
apply to many different numerical problems in addition to MDPs. The basic idea of
projection methods is that the k x 1 parameter vector 0" should be chosen to set the
projections of the residual function R(Vo) against a set of k "projection directions"
{q51,..., q~k} equal to zero:

<R(V0), 4'~>,

f [Vo(s)-F~(VO)(s)]~i(s)#(ds ) = 0 , i= 1,...,k, (4.114)

where # is a probability measure defining the weighted inner product (f, 9),.
Equation (4.114) is a nonlinear system of k equations in the k unknowns 0, and
we assume that at least one solution 0 exists. Since the true residual function
t~(V)(s) = V(s) - F(V)(s) is identically zero, its projection on any set projec-
tion directions {q51,..., q~k) will be zero. Therefore it seems reasonable to assume
that if V0 is a good approximation to V and if F~ is a good approximation to F,
then (4.114) should have a solution 0".
Judd describes many different possible choices for the projection directions: here
we only summarize the most prominent choices. The Galerkin method is a special
716 J. Rust

case when ¢i = Pi where Pi are the basis functions for the linear approximation to
Vo in Eq. (4.99). The interpretation of the Galerkin method is that since the residual
function R(Vg) represents the "error" in the approximation VO, we can make the error
small by forcing it to be orthogonal to each of the first k basis functions. It is easy to
see that the M R method can be viewed as a special case of a projection method with
projection directions given by


A final variant of projection method is the collocation method which is a special case
where # has its support on a finite grid { s l , . . . , sk} and the projection directions are
the indicator functions for these points, i.e. ¢i(s) = I{s = si} where I { s = si} = 1
if s = si, and 0 otherwise. Thus, the collocation method amounts to choosing a grid
of k points in S, {sl, • • •, sk}, and choosing 0" so that the residual function is zero at
these k points:

R(V~)(si) =0, i= 1,...,k. (4.116)

In addition to the four types of grids described in Section 4.3.1, another choice for
{ s l , . . . , sk} are the Chebyshev zeros, or more generally the zeros of the basis func-
tions { P l , . - - , pk}. Judd refers to the resulting method as orthogonal collocation and
provides the following rationale for choosing this particular grid in the special case
where the Pi are the Chebyshev polynomials:

As long as R(Vo)(s) is smooth in s, the Chebyshev Interpolation Theorem says

that these zero conditions force R(Vo)(s) to be close to zero for all s E S, and that
these are the best possible points to use if we are to force R(Vo) to be close to the
zero function. Even after absorbing these considerations, it is not certain that even
orthogonal collocation is a reliable method; fortunately, its performance turns out
to be surprisingly good (Judd, Chapter 12).

The main conceptual problems with projection methods are: 1) determining what to
do if the system (4.114) doesn't have a solution, 2) determining which solution 0" to
choose if the system (4.114) has more than 1 solution, 3) deriving bounds on the
difference between the approximate and exact solution ItVo - vii. We are not aware
of error bounds or even proofs that V0 --+ V as k -+ ec and F~ ~ / 7 , at least for the
case of MDPs. To date we are not aware of any applications of projection methods
to approximating the value function V as a fixed point to the Bellman o p e r a t o r / ' .
Krasnosel'skii et al. (1972) and Ziedler (1993) present proofs of the convergence of
projection methods for certain classes of nonlinear operators. It seems likely that these
approaches could be modified to establish convergence proofs in the case of MDPs.
Ch. 14: Numerical Dynamic Programming in Economics 717

Remarks. While a number of the smooth approximation methods described in

this section appear to be effective for approximating solutions to relatively low-
dimensional MDP problems, all of these methods are subject to the curse of dimen-
sionality that makes it virtually impossible to use these methods in high-dimensional
problems. In particular, multi-dimensional approximation methods such as the tensor
product versions of Chebyshev polynomial or other forms of approximations require
the number of basis functions ra to increase exponentially fast in d in order to main-
tain an e-approximation to the function to be approximated. Many of the methods we
have described, including the minimum residual and projection methods, also involve
compound forms of the curse of dimensionality since they require solution to mini-
mization problems of dimension m or solution to systems of nonlinear equations of
dimension m, but the amount of computer time required to find e-approximations to
these solutions increases exponentially fast in m on a worst case basis. Furthermore
randomized procedures for solving these approximation and optimization subprob-
lems cannot break the curse of dimensionality due to the fact that randomization is
incapable of breaking the curse of dimensionality of the optimization subproblem [see
Yudin and Nemirovsky (1983)].

5. Conclusion

This chapter has surveyed an almost overwhelmingly large literature on numerical

methods for MDPs that has developed over the last 40 years. Although the field of
numerical dynamic programming could probably be described as "mature", there is
no sign that research in this area is slowing down. If anything, contributions to this
literature seem to have accelerated over the last five years. The combination of im-
proved solution methods and rapid improvements in computer hardware have enabled
economists to greatly reduce computation time, improve accuracy, and solve larger
and more realistic MDP models. We have focused on "general purpose" methods that
are capable of providing arbitrarily accurate approximations to generic MDP prob-
lems. Due to the space constraints in this already long chapter, we have not been
able to describe a number of more specialized but potentially promising methods that
have performed well in specific applications. These include Euler equation methods
[see, for example, Christiano and Fisher (1994), Coleman (1990), Marcet (1990),
Marshall and Marcet (1994) and the survey in Rust (1996)], the linear-quadratic ap-
proximation method of Christiano (1990) and McGrattan (1990), the "backsolving"
approach of Sims (1990) and Ingram (1990), the "extended path method" of Fair and
Taylor [see Gagnon (1990)] and the Lagrange multiplier approach of Chow (1992,
1993a,b, 1994). Space Constraints have also forced us to ignore an interesting liter-
ature on learning algorithms for approximating solutions to MDPs with incomplete
information about (u, p). We refer the reader to Narendra and Wheeler (1986), Barto,
Bradtke and Singh (1993) and Tsitsiklis (1994) for studies that show how learning
718 J. Rust

algorithms are able to eventually "discover" optimal strategies. Rust (1989b) provides
references to an older literature on adaptive control.
We have focused most of our attention on continuous MDPs since these problems
arise quite frequently in economic applications and methods for discrete MDPs have
been adequately covered elsewhere. As we have seen, there is a controversy over
the relative efficiency of discrete versus continuous approximation methods for ap-
proximating the solutions to continuous MDPs. We have appealed to the theory of
computational complexity to provide a framework for analyzing the speed and accu-
racy of the large number" of discrete and smooth approximation methods that have
been proposed. Complexity theory allows us to formalize the well known "curse of
dimensionality", that has prevented us from approximating solutions to large scale
MDP problems. Complexity theory enables us to determine whether the curse of
dimensionality is an inherent aspect of the MDP problem, or whether there exist so-
lution methods that are capable of breaking it. Complexity theory has also enabled
us to characterize the form of optimal or nearly optimal algorithms for the general
continuous-state/continuous-action infinite horizon MDP problem. We have used this
theory to argue that smooth approximation methods are subject to an inherent curse of
dimensionality inherited from the curse of dimensionality of the underlying problem
of multivariate function approximation. Although discrete approximation methods are
also subject to the curse of dimensionality (at least on a worst case basis for MDPs
with continuous action spaces known as CDP's), we have shown that a discrete ap-
proximation method, the multigrid/successive approximations algorithm of Chow and
Tsitsiklis (1991), is a nearly optimal algorithm for the general infinite horizon con-
tinuous MDP problem. We are not aware of any corresponding optimality results for
smooth approximation methods, and indeed at present we do not even have conver-
gence proofs or error bounds for most of these methods. For the subclass of MDPs
with finite action spaces (i.e. DDP's) we have shown that there exist randomized
versions of discrete approximation methods that do succeed in breaking the curse
of dimensionality. An example of such a method is the random multigrid/successive
approximations algorithm of Rust (1994c). The intuition of why this algorithm breaks
the curse of dimensionality is that in DDP problems essentially all the work is the
numerical integrations required to compute the conditional expectations of the value
function. Since Monte Carlo methods break the curse of dimensionality of the integra-
tion subproblem, it follows that these methods also break the curse of dimensionality
of the DDP problem.
Unlike integration, complexity theory shows that randomization is incapable of
breaking the curse of dimensionality of multivariate function approximation. How-
ever this conclusion seems to contradict the result that randomization succeeds in
breaking the curse of dimensionality of approximating the class of multivariate value
functions V ( s ) = F ( V ) ( s ) for DDP problems. It turns out that there is no contradic-
tion here: the reason that randomization does not break the curse of dimensionality of
the general approximation problem is due to the fact that complexity bound for the
Ch. 14: Numerical Dynamic Programming in Economics 719

general approximation problem is defined in terms of a much broader class of func-

tions F (e.g. the class of all Lipschitz continuous functions with uniformly bounded
Lipschitz norm). Randomization succeeds in breaking the curse dimensionality of ap-
proximating value functions since they are elements of a restricted subset B C F that
can be represented as the maximum of a finite number of linear integral operators. On
the other hand, smooth approximation methods based on general multivariate approx-
imation algorithms such as tensor products of Chebyshev polynomials, splines, neural
networks, etc. are designed to approximate the much wider class F and therefore fail
to exploit the special structure of the set B of value functions. Thus, our conclusion
that smooth approximation methods are subject to an inherent curse of dimensionality
irregardless of whether deterministic or random algorithms are used only applies to the
class of standard smooth approximation methods (i.e. those based on approximation
methods that are capable of approximating arbitrary Lipschitz continuous functions in
F rather than exploiting the special structure of the class of value functions V). It is
entirely possible that a smooth approximation algorithm that exploits the structure of
this subset can succeed in breaking the curse of dimensionality of the DDP problem.
Indeed, as we noted in Section 4, discrete approximation methods such as the ran-
dom multigrid algorithm can also be viewed as smooth approximation methods due
to the self-approximating property of the random Bellman operator FN. This implies
that certain nonstandard smooth approximation methods do succeed in breaking the
curse of dimensionality of approximating value functions to certain classes of MDP
problems such as DDP's.
We have devoted relatively little space to the maximization subproblem beyond the
observation that when there are a continuum of actions, the maximization subproblem
is subject to an inherent curse of dimensionality regardless of whether deterministic
or random algorithms are used. This result immediately implies that randomization
cannot break the curse of dimensionality involved in solving the general CDP prob-
lem, so in this sense CDP's are inherently more difficult problems than DDP's. Indeed
we view the practical possibilities for solving general CDP's that have complicated
multidimensional constraint sets A(s) to be fairly hopeless. Most of the constrained
optimization algorithms that we are aware of require a good deal of "babying" to en-
sure convergence, and there is generally no guarantee that these methods will converge
to a global as opposed to local maximum. Given that these constrained maximization
problems must be solved hundreds and thousands of times in the course of computing
an approximate solution to the MDP problem, it seems reasonably clear that given the
current state of the art we must either resort to discretization of the constraint sets and
determine the approximate maximizing value &(s) by simple enumeration, or we must
focus on highly structured problems such as convex optimization problems where fast
globally convergent algorithms exist. Since convex constrained optimization problems
can be solved in polynomial time on a worst case basis, it may even be possible to
extend the random Bellman operator approach of Rust (1994c) to break the curse of
720 J. R u s t




o t r J i p r i i i ,__

0 2 4 6 8 10 12 14 16 18 20


Figure 14.6. Hypothetical cost functions for exponential and polynomial time algorithms.

dimensionality of the class of convex CDP's on a worst case basis. It is an open theo-
retical question whether there exist optimization algorithms that succeed in breaking
the curse of dimensionality on an average case basis [see, e.g. Wasilkowski (1994)].
At present there seems to be little cause for hope for a breakthrough in this area.
While these are useful general results, complexity theory (whether done on a worst
or average case basis) can take us only so far. Although the theory tells us a lot
about the the rate of growth in cpu time as the problem dimension d increases and
the permissible error tolerance e decreases, the fact that complexity bounds typically
depend on unspecified proportionality constants make it quite difficult to determine
the exact value of the complexity bound which we need in order to characterize the
optimal algorithm for any fixed values of d and e. It is also typically very difficult to
verify a number of important assumptions underlying the complexity analysis, such as
the assumption that the Lipschitz b o u n d s / ( ~ and Kp on u and p are independent of d
or grow only polynomially in d. As we noted in Section 4.3.1, if these bounds grow
exponentially fast in practical applications, then none of the approaches considered in
this chapter succeed in breaking the curse of dimensionality.
Even in "well-behaved" problems it is entirely possible that an exponential-
time time algorithm E might be faster for use in approximating solutions to low-
dimensional MDPs than a polynomial-time algorithm P. This is illustrated in Fig. 14.6
which plots hypothetical cost functions for algorithm E which increases exponen-
tially fast in d and algorithm P which increases at a rate proportional to the fourth
power of d. The figure shows that the exponential-time algorithm is actually twice as
fast for solving low-dimensional problems than the polynomial-time algorithm. The
Ch. 14: Numerical Dynamic Programming in Economics 721

polynomial-time algorithm is preferable only for problems of dimension d ~> 6 since

the exponential-time algorithm "blows up" after this point. It follows that the mere
fact that an algorithm is subject to the curse of dimensionality does not necessarily
imply that it is a bad method if we are only interested in solving relatively small
scale problems. Indeed given that the computational complexity function comp(c) is
the lower envelope of all the algorithm-specific cost curves, it is entirely possible
that the exponential-time algorithm E could actually be an optimal algorithm for
sufficiently low-dimensional problems. In the case of the DDP problem algorithm
E might represent a smooth approximation method such as a Galerkin method that
approximates the value function by a linear combination of Chebyshev polynomials
and P might represent the random multigrid algorithm.
The phenomenon illustrated in Fig. 14.6 may be part of the source of the paradoxical
findings in Section 4.4 that discrete approximation methods such as the method of
successive approximations using approximate Bellman operator defined over a finite
grid of points are orders of magnitude slower than smooth approximation methods
such as Galerkin-Chebyshev. Note also that the discrete approximation methods that
have been considered so far are deterministic single grid methods: the theoretical
results of Section 4.4.1 suggest that the random multigrid or low discrepancy multigrid
methods will be orders of magnitude faster than the single methods that have been
investigated to date. Unfortunately these approaches have yet to be implemented and
compared to existing methods, with the recent uxception of Santos and Vigo (1996).
We conclude that although complexity theory has suggested a number of useful
algorithms, the theory has relatively little to say about important practical issues
such as determining the point at which various exponential-time algorithms such as
Chebyshev approximation methods start to blow up, making it optimal to switch
to polynomial-time algorithms such as the random multigrid algorithm described in
Section 4.4.1. It seems that the only way to obtain this information is via "hands
on" practical experience with the various methods. This kind of experience has been
accumulated for other numerical problems. For example in a section entitled "The
Present State of the Art" in their book on numerical integration, Davis and Rabinowitz
(1984) identified optimal methods for different ranges of d. They concluded that
product quadrature rules were the best for low dimensional problems (i.e. d ~< 4) and
that for high dimensional problems (d > 15) "Sampling or equidistribution methods
are indicated. They are time consuming, but some care, are reliable" (p. 417). What
we need is a similar statement of the state of the art for MDPs.
In the final analysis there is no substitute for rigorous numerical comparisons of
alternative methods such as we have presented in Section 4. However we have only
presented comparisons for a limited number of methods for one low dimensional
test problem - the auto replacement problem. In future work it will be essential
to provide numerical comparisons of a much broader range of methods over a much
broader range of test problems including problems of moderate to high dimensionality.
The curse of dimensionality has been stumbling block to such a comparison in the
722 J. Rust

past. Perhaps an equally serious p r o b l e m has been the lack of analytic solutions to
m u l t i d i m e n s i o n a l problems. This has m a d e it difficult to compare the relative accuracy
of various n u m e r i c a l methods. E v e n w h e n an analytic solution exists there are m a n y
ways to j u d g e the accuracy of an approximate solution: do we m e a s u r e it in terms of
the error in the value function, the policy function, or in the stochastic properties of
the simulated state and control variables? A n important direction for future research
will be to develop a suite of m u l t i d i m e n s i o n a l test problems with analytic solutions
and a c o m p r e h e n s i v e set of criteria for m e a s u r i n g the accuracy of solutions along
m u l t i p l e d i m e n s i o n s and to carry out a systematic numerical comparison of the various
approaches, i n c l u d i n g some of the p r o m i s i n g n e w methods suggested in this chapter.
At that point we will have a m u c h better picture of the true state of the art in n u m e r i c a l
dynamic programming.


Adams, R.A. (1975) Sobolev spaces. New York: Academic Press.

Anderson, E., McGrattan, E., Hansen, L.R and Sargent, TJ. (1996) 'Mechanics of forming and estimating
dynamic linear economies' Chapter 4 in this Handbook.
Archibald, T.W., McKinnnn, K.I.M. and Thomas, L.C. (1993) 'Serial and parallel value iteration algorithms
for discounted Markov decision processes', European Journal of Operations Research, 67:188-203.
Arrow, K.J. (1968) 'Applications of control theory to economic growth', in: G.B. Dantzig and A.F. Veinott,
eds, Mathematics for decision sciences, Part 2. Providence, RI: Amer. Mathematical Soc..
Arrow, K.J., Harris, T. and Marschak, J. (1951) 'Optimal inventory policy', Econometrica, 19(3):250-272.
Bakhvalov, N.S. (1964) 'On optimal estimates of the rate of convergence of quadrature processes and inte-
gration methods of the Monte Carlo type', In the miscellany: Numerical methods.[br solving differential
and integral equations and quadrature formulae. Moscow: Nanka, pp. 5-63.
Barron, A.R. (1993) 'Universal approximation bounds for superpositions of a sigmoidal function', lEEE
Transactions on Information Theory, 39(3):930-945.
Barto, A.G., Bradtke, SJ. and Singh, S.R (1995) 'Learning to act using real-time dynamic programming',
Artificial Intelligence, 75:81-138.
Bellman, R. (1955) 'Functional equations in the theory of dynamic progrmmlfing: Positivity and quasilia-
earity', Proceedings (~f the National Academy of Sciences, 41:743-746.
Bellman, R. (1957) Dynamic programming. Princenton, NJ: Princeton Univ. Press.
Bellman, R. and Dreyfus, S. (1962) Applied dynamic programming. Princenton, NJ: Princeton Univ. Press.
Bellman, R., Kalaba, R. and Kotkin, B. (t963) 'Polynomial approximation: A new computational technique
in dynamic programming: Allocation processes', Mathematics of Computation, 17:155-161.
Bertsekas, D. (1987) Dynamic programming deterministic and stochastic models. New York: Prentice-Hall.
Bertsekas, D. (1977) 'Monotone mappings with application in dynamic programming', SlAM Journal of
Control and Optimization, 15(3):438-464.
Bertsekas, D. (1975) 'Convergence of discretization procedures in dynamic programming', IEEE Transac-
tions on Automatic Control, 20:415-419.
Bertsekas, D. and Castafion, D. (1989) 'Adaptive aggregation methods for infinite horizon dynmnic pro-
gramming', 1EEE Transactions on Automatic Control, 34(6):589-598.
Bertsekas, D.P. and Shreve, S.E. (1978) Stochastic optimal control: The discrete time case. New York:
Academic Press.
Bertsekas, D.P. and Tsitsildis, J.N. (1993) 'Simulated annealing', Statistical Science, 8:1-16.
Ch. 14: Numerical Dynamic Programming in Economics 723

Bhattacharya, R.N. and Majumdar, M. (1989) 'Controlled semi-Markov models - the discounted case',
Journal of Statistical Planning and Inference, 21:365-381.
Blackwell, D. (1962) 'Discrete dynamic programming' Annals of Mathematical Statistics, 33:719-726.
Blackwell, D. (1965) 'Discounted dynamic programming' Annals of Mathematical Statistics, 36:226-235.
Blackwell, D. (1967) 'Positive dynamic programming', in: Proceedings of the 5th Berkeley symposium
on mathematical statistics and probability, Vol. 1. Berkeley, CA: University of California Press, pp.
Brown, P.N. and Saad, Y. (1990) 'Hybrid Krylov methods for nonlinear systems of equations', SlAM
Journal on Scientific and Statistical Computing, 11(3):450-481.
Brock, W.A. (1982) 'Asset prices in a production economy', in: J.J. McCall, ed., The economics ~?finfor-
mation and uncertainty. Chicago, IL, Univ. of Chicago Press.
Brock, W.A. and Mirman, L.J. (1972) 'Optimal economic growth under uncertainty: The discounted case',
Journal ~)f Economic Theory, 4:479-513.
Chow, C.S. and Tsitsiklis, J.N. (1989) 'The complexity of dynamic programming', Journal of Complexity,
Chow, C.S. and Tsitsiklis, J.N. (1991) 'An optimal multigrid algorithm for continuous state discrete time
stochastic control', IEEE Transactions on Automatic Control, 36(8):898-914.
Chow, G.C. (1975) Analysis and control of dynamic economic systems. New York: Wiley.
Chow, G.C. (1979) 'Optimum control of stochastic differential equation systems', Journal of Economic
Dynamics and Control, 1:143-175.
Chow, G.C. (1981) Econometric analysis by control methods. New York: Wiley.
Chow, G.C. (1992) 'Dynamic optimization without dynamic programming', Economic Modelling, 3:3-9.
Chow, G.C. (1993a) 'Optimal control without solving the Bellman equation', Journal of Economic Dy-
namics and Control, 17:621-630.
Chow, G.C. (1993b) 'Computation of optimum control functions by Lagrange multipliers', in: D.A. Belsley,
ed., Computational techniques for econometrics and economic analysis. Dordrecht: Kluwer Academic
Publishers, pp. 65-72.
Chow, G.C. (1996) 'Dynamic economics: An exposition by the Lagrange method', manuscript, Princeton
Chew, S.H. and Epstein, L.G. (1989) 'The structure of preferences and attitudes towards the timing and
resolution of uncertainty', International Economic Review, 30(1): 103-118.
Chou, A.W. (1987) 'On the optimality of the Krylov information', Journal of Complexity, 3:545-559.
Christiano, L.J. (1990) 'Linear-quadratic approximation and value function iteration: A comparison', Jour-
nal of Business and Economic Statistics, 8(1):99-113.
Christiano, L.J. and Fisher, J.D.M. (1994) 'Algorithms for solving dynamic models with occasionally
binding constraints', Department of Economics, University of Western Ontario, manuscript.
Coleman, W.J. (1990) 'Solving the stochastic growth model by policy-function iteration', Journal of Busi-
ness and Economic Statistics, 8:27-29.
Coleman, W.J. (1993) 'Solving nonlinear dynamic models on parallel computers', Journal of Business and
Economic Statistics, 11(3):325-330.
Coleman, W.J. and Liu, M. (1994) 'Incomplete markets and inequality constraints: Some theoretical and
computational issues', Fuqua School of Business, Duke University, manuscript.
Cryer, C.W. (1982) Numerical functional analysis. Oxford: Clarendon Press.
Daniel, J.W. (1976) 'Splines and efficiency in dynamic programming', Journal of Mathematical Analysis
and Applications, 54:402-407.
Dantzig, G.B., Harvey, R.P., Landowne, Z.E and McKnight, R.D. (1974) DYGAM - A computer system
for the solution ¢)fdynamic programs. Palo Alto, CA: Control Analysis Corporation.
Davis, P.J. and Rabinowitz, P. (1975) Methods of numerical integration. New York: Academic Press.
de Boor, C. (1978) A practical guide to splines. New York: Springer.
Denardo, E.V. (1967) 'Contraction mappings underlying the theory of dynamic programming', SlAM Re-
view, 9:165-177.
724 J. Rust

Doshi, B.T. {1976) 'Continuous time control of Markov processes on an arbitrary state space: Discounted
rewards', Annals of Statistics, 4(6): 1219-1235.
Dudley, R.M. (1989) Real analysis and probability. Pacific Groves, CA: Wadsworth.
Dutta, P. (1991) 'What do discounted optima converge to?: A theory of discount rate asymptotics in
economic models', Journal of Economic Theory, 55(1):64-94.
Dynkin, E.B. and Juskevic, A.A. (1979) Controlled Markov processes. New York: Springer.
Eckstein, Z. and Wolpin, K. (1989) 'The specification and estimation of dynamic stochastic discrete choice
models', Journal ~f Human Resources, 24(4):562-598.
Epstein, L.G. and Zin, S.E. (1989) 'Substitution, risk aversion, and the temporal behavior of consumption
and asset returns: A theoretical framework', Econometrica, 57(4):937-970.
Epstein, L.G. and Zin, S.E. (1990) 'First-order risk aversion and the equity premium puzzle', Journal ~f
Monetary Economics, 26:387-407.
Evtushenko, Y.G. (1985) Numerical optimization techniques. Optimization Software Division. New York:
Fleming, W.H. and Mete Soner, H. (1993) Controlled Markov processes and viscosity solutions. New York:
Fox, B.L. (1973) 'Discretizing dynamic programming', Journal ~f Optimization Theory and Its Applications,
Garey, M.R. and Johnson, D.S. (1983) Computers and intractibility: A guide to the theory of NP-
completeness. New York: Freeman.
Geman, S. and Hwang, C. (1986) 'Diffusions for global optimization', SlAM Journal on Control and
Optimization, 24(5):1031-1043.
Geweke, J., Slonim, R. and Zarkin, G. (1992) Econometric solution methods for dynamic discrete choice
problems. University of Minnesota, manuscript.
Gihman, I.I. and Skorohod, A.V. (1974, 1975, 1979) The theory of stochastic processes, Volumes I, II, III.
New York: Springer.
Goldfarb, D. and Todd, M.J. (1989) 'Linear programming', in: A.H.G. Rinooy Kan and M.J. Todd, eds,
Handboolr~ in operations research, Volume I: Optimization. Amsterdam: North-Holland.
Gihman, I.I. and Skorohod, A.V. (1979) Controlled stochastic processes. New York: Springer.
Goffe, W.L., Ferrier, G.D. and Rogers, J. (1992) 'Global optimization of statistical functions with simulated
annealing', Computational Economics, 5(2):133-146.
Hackbusch, W. (1985) Multi-grid methods and applications. New York: Springer.
Hakansson, N. (1970) 'Optimal investment and consumption strategies under risk for a class of utility
functions', Econometrica, 38:587-607.
Hammersley, J.J. and Handscomb, D.C. (1992) Monte Carlo methods. London: Chapman & Hall.
Hansen, L.P. and Sargent, T.J. (1980a) 'Formulating and estimating dynamic linear rational expectations
models', Journal of Economic Dynamics and Control, 2(1):7-46.
Hansen, L.E and Sargent, T.J. (1980b) 'Linear rational expectations models for dynamically interrelated
variables', in: R.E. Lucas, Jr. and T.J. Sargent, eds, Rational expectations and econometric practice.
Minneapolis, MN: Univ. of Minnesota Press.
Hansen, L.E and Sargent, Y.J. (1995) 'Discounted linear exponential quadratic Gaussian control', IEEE
Transactions on Automatic Control, 40(5):968-971.
Hansen, L.E and Sargent, T.J. (1996) 'Recursive models of dynamic linear economies', manuscript, Hoover
Institution, Stanford, CA.
Hansen, L.E and Singleton, K. (1982) 'Generalized instrumental variables estimation of nonlinear rational
expectations models', Econometrica, 50:1269-1281.
Hansen, L.E and Singleton, K. (1983) 'Stochastic consumption, risk aversion, and the temporal behavior
of asset returns', Journal ~)fPolitical Economy, 91(2):249-265.
Hawkins, D. and Simon, H.A. (1949) 'Note: Some conditions for macroeconomic stability', Econometrica,
Ch. 14: Numerical Dynamic Programming in Economics 725

Hinderer, K. (1970) Foundations of nonstationary dynamic programming with discrete time parameter,
Berlin: Springer.
Hornik, K.M., Stinchcombe, M. and White, H. (1990) 'Universal approximation of an unknown function
and its derivatives using multilayer feedforward networks', Neural Networks, 3:551-560.
Homik, K., Stinchcombe, M., White, H. and Auer, E (1994) 'Degree of approximation results for
feedforward networks approximating unknown mappings and their derivatives', Neural Computation,
Howard, R. (1960) Dynamic programming and Markov processes. New York: Wiley.
Howard, R. (1971) Dynamic probabilistic systems, Volume II: Semi-Markov and decision processes. New
York: Wiley.
Hubner, G. (1977) 'Improved procedures for eliminating suboptimal actions in Markov programming by
the use of contraction properties', in: Transactions of the seventh Prague conference on in~mmation
theory, statistical decision .functio~v, random processes. Dordrecht: D. Reidel, pp. 257-263.
Hwang, C. (1980) 'Laplace's method revisited: Weak convergence of probability measures', Annals of
Probability, 8(6):1177-1182.
Imrohoro~lu, A. and Imrohoro~lu, S. (!993) 'A numerical algorithm for solving models with incomplete
markets', International Journal c~fSupercomputer Applications, 7(3):212-230.
Johnson, S.A. et al. (1993) 'Numerical solution of continuous-state dynamic progrmns using linear and
spline interpolation', Operations Research, 41(3):484-500.
Judd, K. (1993) 'Projection methods for solving aggregate growth models', Journal of Economic Theory,
Judd, K. (1994a) 'Comments on Marcet, Rust and Pakes', in: C. Sims, ed., Advances in econometrics sixth
worm congress, Vol. II. Cambridge Univ. Press, pp. 261-276.
Judd, K. and Solnick, A. (1994b) 'Numerical dynamic programming with shape-preserving splines', Hoover
Institution, manuscript.
Judd, K. (1996a) 'Approximation methods and projection methods in economic analysis', Chapter 12 in
this Handbook.
Judd, K. (1996b) 'Numerical methods in economics', Hoover Institution, manuscript.
Khachian, L.C. (1979) 'A polynomial time algorithm for linear programming', DokI. Acad. Nauk SSSR,
145(2):293-294. (Translated in Soviet Phys. Dokl., 7:595-596.)
Kantorovich, L.V. and Akilov, G.E (1982) Functional analysis. Oxford: Pergamon.
Keane, M.E and Wolpin, K.I. (1994) 'The solution and estimation of discrete choice dynamic programming
models by simulation: Monte Carlo evidence', Review of Economics and Statistics, 76(4):648-672.
Klee, V. and Minty, G.J. (1972) 'How good is the simplex algorithm?', in: O. Shisha, ed., Inequalities 1II.
New York: Academic Press, pp. 159-175.
Kortum, S. (1992) 'Value function approximaton in an estimation routine', Boston University~ manuscript.
Krasnoselskii, M.A., Vainikko, G.M., Zabreiko, EE, Rutitskii, Ya.B. and Stetsenko, V.Ya. (1972) Approx-
imate solution ~f operator equations. Translated by D. Louvish. Groningen: Wolters-Noordhoff.
Kreps, D.M. and Porteus, E.L. (1978) 'Temporal resolution of uncertainty and dynamic choice theory',
Econometrica, 46(1):185-200.
Kreps, D.M. and Porteus, E.L. (1979) 'Dynamic choice theory and dynamic programming~, Econometrica,
I<xonsj/5, L. (1985) Computational complexity of sequential and parallel algorithms. New York: Wiley.
Kushner, H.J. (1990) 'Numerical methods for stochastic control problems in continuous time', SIAM Journal
on Control and Optimization, 28(5):999-1048.
Kushner, H.J. and Kleinman, A.J (1971) 'Accelerated procedures for the solution of discrete Markov control
problems', IEEE Transactions on Automatic Control, 16(2):147-152.
Kushner, H.J. and Dupuis, EG. (1992) Numerical methods ~¢?~rstochastic control problems in continuous
time. New York: Springer.
Kydland, E and Prescott, E.C. (1982) 'Time to build and aggregate fluctuations', Econometrica, 50:1345-
726 J. Rust

Leland, H. (1974) 'Optimal growth in a stochastic in a stochastic environment', Review (~f Economic
Studies, 41:75-86.
Leonteif, W. (1966) Input-output economics. London: Oxford Univ. Press.
Lettan, M. and Uhlig, H. (1995) 'Rules of thumb and dynamic programming', Discussion paper 9527,
Center, Tilburg University.
Levhari, D. and Srinivasan, T. (1969) 'Optimal savings under uncertainty', Review of Economic Studies,
Long, J.B. and Plosser, C. (1983) 'Real business cycles', Journal ()f Political Economy, 91(1):39-69.
Lucas, R.E., Jr. (1978) 'Asset prices in an exchange economy', Econometrica, 46:1426-1446.
Lucas, R.E., Jr. and Prescott, E.C. (1971) 'Investment under uncertainty', Econometrica, 39(5):659-681.
Luenberger, D.G. (1969) Optimization by vector space methods. New York: Wiley.
Machina, M.J. (1987) 'Choice under uncertainty: Problems solved and unsolved', Journal of Economic
Perspectives, 1(1):121-154.
Marcet, A. (1994) 'Simulation analysis of dynamic stochastic models: Applications to theory and estima-
tion', in: C. Sims, ed., Advances in econometrics sixth worm congress, Vol. II. Cambridge Univ. Press,
pp. 91-118.
Marcet, A. and Marshall, D.A. (1994) 'Solving nonlinear rational expectations models by parameterized
expectations: Convergence to stationary solutions', Working paper WP-94-20, Federal Reserve Bank of
Chicago, Chicago, IL.
McKelvey, R.D. and Palfrey, T.R. (1992) 'An experimental study of the centipede game', Econometrica,
Merton, R.C. (1969) 'Lifetime portfolio selection under uncertainty: The continuous-time case', Review (~t"
Economics and Statistics, 51:247-257.
Miranda, M. and Schnitkey, G. (1995) 'Estimation of dynamic agricultural decision models: The case of
dairy cow replacement', Journal of Applied Econometrics, 10(5):41-56.
Nelder, J. and Mead, R. (1965) 'A simplex method for function minimization', Computational Journal,
Nemirovsky, A.S. and Yudin, D.B. (1983) Problem complexity and method efficiency in optimization. New
York: Wiley.
Niederreiter, H. (1992) Random number generation and quasi-Monte Carlo methods', SIAM CBMS-NSF
Vol. 63. Philadelphia, PA: SIAM.
Novak, E. (1988) Deterministic and stochastic error bounds in numerical analysis. Lecture Notes iu Math-
ematics, Vol. 1349, Berlin: Springer.
Ortega, J.M. (1988) Introduction to parallel and vector solution of linear systems. New York: Plenum.
Ortega, J.M. and Rheinboldt, W.C. (1970) lterative solution on nonlinear equations in several variables.
New York: Academic Press.
Ortega, J.M. and Voigt, R.G. (1985) 'Solution of partial differential equations on vector and parallel
processors', SlAM Review, 27(2):149-240.
Ozaki, H. and Streufert, P. (1996) 'Nonlinear dynamic programming for non-additive stochastic objectives',
Journal of Mathematical Economics, forthcoming.
Pakes, A. and McGuire, P. (1994) 'Computing Markov-perfect Nash equilibria: Numerical implications of
a dynamic differentiated products model', Rand Journal of Economics, 25(4):555-589.
Pan, V. and Reif, J. (1985) 'Efficient parallel solution of linear systems', Transactions of the ACM,
Papadimitriou, C.H. and Tsitsiklis, J.N. (1987) 'The complexity of Markov decision processess', Mathe-
matics of Operations Research, 12(3):441-450.
Paskov, S.H. (1993) 'Average case complexity of multivariate integration for smooth functions', Journal
t)f Complexity, 9:291-312.
Paskov, S.H. (1996) 'New methodologies for valuing securities', in: S. Pliska and M. Dempster, eds,
Mathematics t~fDerivative Securities. Isaac Newton Institute, Cambridge, UK.
Ch. 14: Numerical Dynamic Programming in Economics 727

Phelps, E. (1962) 'Accumulation of risky capital', Eeonometrica, 30:729-743.

Porteus, E.L. (1980) 'Overview of iterative methods for discounted finite Markov and semi-Markov deci-
sion chains', in: R. Hartley et al., eds, Recent developments in Markov decision processes. New York:
Academic Press.
Powell, M.J.D. (1981) Approximation theory and methods. Cambridge Univ. Press.
Press, W., Flannery, B.E, Teukolsky, S.A. and Vetterling, W.T. (1992) Numerical recipies. Cmnbridge:
Cambridge Univ. Press.
Puterman, M.L. (1990) 'Markov decision processes', in: D.E Heyman and M.J. Sobel, eds, Handbooks in
operations research and management science, Vol. 2, Amsterdam: North-Holland.
Puterman, M.L. (1994) Markov decision processes. New York: Wiley.
Puterman, M.L. and Shin, M.C. (1978) 'Modified policy iteration algorithms for discounted Markov decision
problems', Management Science, 24:1127-1137.
Rivlin, T.J. (1969) An introduction to the approximation of functions. Waltham, MA: Blalsdell.
Rust, J. (1985) "Stationary equilibrium in a Market for durable assets', Econometrica, 53(4):783-806.
Rust, J. (1986) 'When is it optimal to kill off the market for used durable goods?', Econometrica, 54(1):
Rust, J. (1987) 'Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher',
Econometrica, 55(5):999-1033.
Rust, J. (1988) 'Maximum likelihood estimation of discrete control processes', SlAM Journal on Control
and Optimization, 26(5):1006-1023.
Rust, J. (1989a) 'A dynamic programming model of retirement behavior', in: D. Wise, ed., The economics
(~[aging. Univ. of Chicago Press, pp. 359-398.
Rust, J. (1989b) 'Comment on "optimal collection of information by partially informed agents" ', Econo-
metric Reviews, 7(2):155-160.
Rust, J. (1994a) 'Structural estimation of Markov decision processes', in: D. McFadden and R. Engle, eds,
Handbook of econometrics, Vol. 4. Amsterdam: North-Holland, pp. 3081-3143.
Rust, J. (1994b) 'Estimation of dynamic structural models, problems and prospects: Discrete decision
processes', in: C. Sims, ed., Advances in econometrics sixth worm congress, Vol. lI. Cambridge Univ.
Press, pp. 119-170.
Rust, J. (1995a) 'Do people behave according to Bellman's principle of optimality?', University of Wis-
consin, manuscript.
Rust, J, (1995b) 'Using randomization to break the curse of dimensionality', Econometrica, forthcoming.
Rust, J. (1996a) 'Stochastic decision processes: Theory, computation, and estimation', University of Wis-
consin, manuscript.
Rust, J. (1996b) 'Is dynamic programming inherently sequential?', manuscript, University of Wisconsin.
Saad, Y. (1984) 'Practical use of some Krylov subspace methods for solving indefinite and nonsymmetric
linear systems', SIAM Journal on Scientific and Statistical Computing, 5( l):203-228.
Sand, Y. (1989) 'Krylov subspace methods on supercomputers', SIAM Journal on Scientific and Statistical
Computing, 10(6):1200-1232.
Saad, Y. mad Sehultz, M.H. (1986) 'GMRES: A generalized minimum residual algorithm for solving
nonsymmetric linear systems', SlAM Journal on Scientific and Statistical Computing, 7(3):856-869.
Samuelson, EA. (1969) 'Lifetime portfolio selection by dynamic stochastic programming', Review of
Economics and Statistics, 51:239-246.
Santos, M. and Vigo, J. (1995a) 'Accuracy estimates for a numerical approach to stochastic growth models',
manuscript, ITAM, Mexico.
Santos, M. and Vigu, J. (1995b) 'Error bounds for a numerical solution for dynamic economic models',
manuscript, ITAM, Mexico.
Sargent, T.J. (1978) 'Estimation of dynamic labor demand schedules under rational expectations', Journal
of Political Economy, 86(6):1009-1044.
Sargent, T.J. (1981) 'Interpreting economic time series', Journal of Political Economy, 89(2):213-248.
728 Z Rust

Schumacher, L.L. (1983) 'On shape preserving quadratic spline interpolation', SIAM Journal of Numerical
Analysis, 20(4):854-864.
Schweitzer, EJ. and Seidmann, A. (1985) 'Generalized polynomial approximations in Markovian decision
processes', Journal of Mathematical Analysis and Applications, 110:568-582.
Semmler, W. (1995) 'Solving nonlinear dynamic models by iterative dynamic progrmmning', Computational
Economics, 8(2):127-154.
Sikorski, K. (1984) 'Optimal solution of nonlinear equations satisfying a Lipschitz condition', Numerische
Matematik, 43:225-240.
Sikorski, K. (1985) 'Optimal solution of nonlinear equations', Journal of Complexity, 1:197-209.
Simon, H.A. (1956) 'Dynamic programming under uncertainty with a quadratic criterion function', Econo-
metrica, 24:74-81.
Sims, C.A. 'Solving the stochastic growth model by backsolving with a particular nonlinear form for the
decision rule', Journal of Business and Economic Statistics, 8(1):45-47.
Smith, A.A., Jr. (1991) 'Solving stochastic dynamic progrmnming problems using rules of thumb', Discus-
sion Paper 816, Institute for Economic Research, Queen's University, Ontario, Canada.
Solis, EJ. and Wets, R.J. (1981) 'Minimization by random search techniques', Mathematics ~[` Operations
Research, 6(1):19-30.
Stokey, N.L. mad Lucas, R.E., Jr. (with Prescott, E.C.) (1989) Recursive methods in economic dynamics.
Cambridge, MA: Harvard Univ. Press.
Tapiero, C.S. and Sulem, A. (1994) 'Computational aspects in applied stochastic control', Computational
Economics, 7(2):109-146.
Tanchen, G. (1990) 'Solving the stochastic growth model by using quadrature methods and value function
iterations', Journal of Business & Economic Statistics, 8(1):49-51.
Tauchen, G. and Hussey, R. (1991) 'Quadrature-based methods for obtaining approximate solutions to
nonlinear asset pricing models', Econometrica, 59(2):371-396.
Tranb, J.E (1994) 'Breaking intractability', Scientific American, 270(1):102-107.
Traub, J.E, Wasilkowski, G.W. and Wo~niakowski, H. (1988) Information-based complexity. New York:
Academic Press.
Traub, J.E and Wogniakowski, H. (1980) A general theory of optimal algorithms. New York: Academic
Traub, J.E and Wo2niakowski, H. (1984) 'On the optimal solution of large linear systems', Journal ~)f the
Association .['or Computing Machinery, 31(3):545-559~
Traub, J.E and Wo2niakowski, H. (1991) 'Information-based complexity: New questions for mathemati-
cians', The Mathematical lntelligencer, 13(2):34-43.
Traub, J.E and Wo~niakowski, H. (1992) 'The Monte Carlo algorithm with a pseudorandom generator',
Mathematics of Computation, 58(197):323-339.
Trick, M.A. and Zin, S.E. (1993) 'A linear programming approach to solving stochastic dynamic programs',
Carnegie-Mellon University, manuscript.
Tsitsiklis, J.N. (1994) 'Asynchronous stochastic approximation and Q-learning', Machine Learning, 16:185-
Tsitsiklis, J.N. (1994) 'Complexity-theoretic aspects of problems in control theory', Center for Intelligent
Control Systems, MIT Press, manuscript.
van Dijk, N.M. (1984) Controlled Markov processes: Time discretization. CWI Tract 11. Amsterdam:
Mathematische Centrum.
Wasilkowski, G.W. (1992) 'On the average complexity of global optimization problems', Mathematical
Programming, 57:313-324.
Wershculz, A.G. (1991) The computational complexity o[` differential and integral equations. New York:
Oxford Univ. Press.
Wheeler, R.M., Jr. and Narendra, K.S. (1986) 'Decentralized learning in finite Markov chains', 1EEE
Transactions on Automatic Control, AC-31:519-526.
Ch. 14: Numerical Dynamic Programming in Economics 729

Whitt, W. (1978) 'Approximations of dynamic programs I', Mathematics (~f Operations Research, .3:23t-
Whittle, P. (1982) Optimization over time: Dynamic programming and stochastic control, Vol. I and II.
New York: Wiley.
Wo~niakowski, H. (1991) 'Average case complexity of multivariate integration', Bulletin of the American
Mathematical Society, 24:185-194.
Wo2niakowski, H. (1992) 'Average case complexity of linear multivariate problems', Journal of Complexity,
Chapter 15




University of Minnesota
Federal Reserve Bank of Minneapolis


1. Introduction 733
2. Deterministic methods of integration 734
2.1. Unidimensional quadrature 734
2.2. Multidimensional quadrature 735
2.3. Low discrepancy methods 738
2.4. Other deterministic methods 741
3. Pseudorandom number generation 742
3.1. Uniform pseudorandom number generation 743
3.2. General methods for nonuniform distributions 745
3.3. Selected univariate distributions 752
3.4. Selected multivariate distributions 754
4. Independence Monte Carlo 756
4.1. Simple Monte Carlo 757
4.2. Acceptance methods 759
4.3. Importance sampling 761
4.4. A note on the choice of method 764
5. Variance reduction 769
5.1. Antithetic Monte Carlo 770
5.2. Systematic sampling 772
5.3. The use of conditional expectations 773
5.4. Control variables 774

*Comments from John Rust and two anonymous referees, who bear no responsibility for errors or
omissions, are gratefully acknowledged. This work was supported in part by National Science Foundation
Grants SES-9210070 and SBR-9514865. The views expressed in this paper are those of the author and not
necessarily those of the Federal Reserve Bank of Minneapolis or the Federal Reserve System.

Handbook of Computational Economics, Volume L Edited by H.M. Amman, D.A. Kendrick and J. Rust
@ 1996 Elsevier Science B.E All rights reserved.
732 •1. G e w e k e

6. M a r k o v chain M o n t e Carlo methods 775

6.1. Two Markov chain Monte Carlo algorithms 777
6.2. Mathematical background 779
6.3. Convergence of the Gibbs sampler 781
6.4. Convergence of the Metropolis-Hastings algorithm 784
6.5. Assessing convergence and numerical accuracy 785
7. S o m e e x a m p l e s 788
7.1. Stochastic volatility 788
7.2. Integration and optimization 793
References 797
Ch. 15: Monte Carlo Simulation and Numerical Integration 733

1. Introduction

Optimization problems in dynamic, stochastic environments are an increasingly im-

portant part of economic theory and applied economics. Inspired by the potential
returns to richer and more realistic models of a variety of policy problems and the
promise of ever-growing computational power, economists have turned more and more
to models that can be simulated but not solved in closed form. Simulation methods
can provide solutions for two related integration problems. One integration problem
arises in model solution, for agents whose expected utilities cannot be expressed as a
closed function of state and decision variables. The other occurs when the investigator
combines sources of uncertainty about models to draw conclusions about policy.
This chapter concentrates on simulation methods that are both important and useful
in the solution of these integration problems. In mathematics there is a long-standing
use of simulation in the solution of integration problems, notably partial differential
equations, where the form of the simulation is often suggested by the problem it-
self. The history of simulation methods to solve integration problems in economics is
shorter, but these methods are appealing there for the same reason: integration gen-
erally involves probability distributions in the integrand, which thereby suggests the
simulation methods to be employed.
This pervasive use of simulation methods in science persists despite the well-known
asymptotic advantages of deterministic approaches to integration. This continued use
of simulation methods occurs in part because astronomical computing time is often
required to realize the promise of deterministic methods. A more important fact is that
simulation methods are generally straightforward for the investigator to implement,
relying on an understanding of a few principles of simulation and the structure of
the problem at hand. By contrast, deterministic methods typically require much larger
problem-specific investments in numerical methods. Simulation methods economize
the use of that most valuable resource, the investigator's time.
The objective of this chapter is to convey an understanding of principles for the
practical application of simulation in economics, with a specific focus on integration
problems. It begins with a discussion of circumstances in which deterministic methods
are preferred to simulation, in Section 2. The next section takes up general procedures
for simulation from univariate and multivariate distributions, including acceptance and
adaptive methods. The construction and use of independent, identically distributed
random vectors to solve the multidimensional integration problems that typically arise
in economic models is taken up in Section 4, with special attention to combination
of different approaches and assessment of the accuracy of numerical approximations
to the integral. Section 5 discusses some modifications of these methods to produce
identically but not independently distributed random vectors, that often greatly reduce
approximation error in applications in economics. Recently developed Markov chain
Monte Carlo methods, which make use of samples that are neither independently nor
identically distributed, have greatly expanded the scope of integration problems with
734 J, Geweke

convenient practical solutions. These procedures are taken up in Section 6. The chapter
concludes with some examples of recent applications of simulation to integration
problems in economics.

2. Deterministic methods of integration

The evaluation of the integral I = f£o f ( x ) d x is a problem as old as the calculus

itself', and is equivalent to solution of the differential equation dy/dx = f(x) subject
to the boundary condition y(a) = 0. In well-catalogued instances analytical solutions
are available. [Gradshteyn and Ryzhik (1965) is a useful standard reference.] The
literature on numerical approaches to each problem is huge, a review of any small
part of which would occupy a substantial part of this volume. This section focuses on
those procedures that provide the most useful tools in economics and econometrics
and are readily available in commercial software. This means neglecting the classical
but dated approaches using equally-spaced abscissas, like Newton-Cotes; a useful
overview of these methods is provided by Press et al. (1986, Chapter 4) and a more
extended discussion may be found in Davis and Rabinowitz (1984, Chapter 2).

2.1. Unidimensional quadrature

The principle underlying most state-of-the-art deterministic evaluations of I =

fb f ( z ) d z is Gaussian quadrature. If f(z) = p(x)w(x), where p(z) is any poly-
nomial of degree 2n - 1 or lower, and w(z) is a chosen basis function, then there
exist points x~ c [a, b] and a weight wi associated with each point such that

~a b jfab
f(x) dx = p(x)w(x) dx = w@(xi).
The points and weights depend only on a, b and the function w(x), and if they are
known for a = 0 and b = 1, then it is straightforward to determine their values for
any other choices of a and b. If r(x) = f(x)/w(x) is not a polynomial of degree
2rL - 1 or lower, then


may be taken as an approximation to I = fb f(x)dx. If r(x) is "smooth" relative to a

polynomial of degree 2n - 1, then the approximation should be good. More precisely,
Ch. 15: Monte Carl() Simulation and Numerical Integration 735

one may show that if r(x) is 2n-times differentiable then

jfab n
f(x) dx - Z ~oir(xi) = cnr (zn) (~)

tbr some ~ e [a, b], where {on} is a sequence of constants with l i m n ~ cn = 0. For
example, if w(x) = 1, a = - 1 , b = +1, then cn = 22n+t(n!)4/{(2n + 1)![2n!] 3}
[Judd (1996, Section 7.2)].
This approach can be applied to any subinterval of [a, b] as well. As long as
r(x) is 2n-times differentiable the accuracy of the approximation may be improved
by summing over subintervals. In fact in this case, one may satisfy prespecified
convergence or error criteria through successive bisection. Error criteria are usually
specified as the absolute or relative difference in the computed approximation to I =
f(~ f(x) dx using an n-point and an m-point quadrature [Golub and Welsch (1969)].
Infinite and semi-infinite intervals can be treated by appropriate transformation of
variable to a finite interval [Piessens et al. (1983)]. Existence and boundedness of
r (2n) depends in part on the choice of basis function w(x). Some of the most useful
are indicated in the following table.

w(x) Interval Name

1 ( - 1, 1) Legendre
1/~//1 - x 2 (-1, 1) Chebyshev first kind
~¢//1 X2 - - (--1, 1) Chebyshev second kind
e x p ( - z 2) (-0% +c~) Hermite
(1 + x)C~(l - x) ~ (--1, t) Jacobi
exp ( - x) x a (0, c~) Generalized Laguerre
1/ cosh(x) ( - o% +c~) Hyperbolic cosine

For many purposes Gauss-Legendre rules are adequate, and there is a subst,antiat
stock of commercially supplied software to evaluate one-dimensional integrals up to
specified tolerances. These methods have been adapted to include functions having
singularities at identified points in the interval of integration [Piessens et al. (1983)].

2.2. Multidimensional quadrature

Some multidimensional integration problems in fact reduce to an integration in a single
variable that must be carried out numerically. For example, all but one dimension may
be integrable analytically, or the multidimensional integral may in fact be a product
736 ,L Geweke

of integrals each in a single variable, perhaps after a suitable change of variable. In

such cases quadrature for one-dimensional integrals usually provides a neat solution.
Such cases are rare in economics and econometrics. If the dimension of the domain
of integration is not too high and the integrand is sufficiently smooth, then one-
dimensional methods may be extended with practical results. These cases cover a
small subset of integration problems in economics and econometrics, but they deserve
discussion because quadrature-based methods are then quite efficient and may be easy
tO USe.
The straightforward extension of quadrature methods to higher dimensions shows
both its strengths and weaknesses. Following Davis and Rabinowitz (1984, pp. 354--
359), suppose that _R is an m-point rule of integration over B C_ 9V', leading to the


R(f)=Ewjf(xj)~f(x)dx, xj c B ,

and that S is an n-point rule over G C_ 9l s, leading to the approximation

S ( f ) = ~ ' r'kf(Yk) ~ JG.f(Y)dy, Yk C G.


The product rule of R and S is the ran-point rule applicable to B x G,

m n

R x S ( f ) = ~-'~-~c~jr,~f(xj,y~) ~ / B f ( x , y ) dxdy, xy E B, Yk E G.
j=l k=l ×G

If h(x, y) = ~-~i=l
k fi(x)gi(y), and if R" integrates f.i(x) exactly over B and S inte-
grates gi(Y) exactly over G (i = 1 , . . . , k), then R x S will integrate h(x, y) exactly
over B × G. The obvious extensions to the product of three or more rules can be made.
These extensions can be expected to work well when (a) quadrature is adequate in
the lower dimensional marginals of the function at hand, (b) h(x, y) ~ f(x)g(y), and
(c) the product m n is small enough that computation time is reasonable. Condition (c)
and perhaps (a) are violated when the support of h is concentrated on a set small rela-
tive to the Cartesian boundaries for that support, as illustrated in Fig. 15.1 (a). A more
common occurrence in economics and econometrics involves violations of (b) and (c):
B × G = 9l r x 9l s, but the function is concentrated on a small subset of its support
that cannot be expressed as a Cartesian product, as illustrated in Fig. 15.1(b). Whether
these difficulties are present or not, the number of function evaluations and products
required in any product rule increases geometrically with the number of arguments of
the function, a phenomenon sometimes dubbed "the curse of dimensionality'.
Ch. 15: Monte Carlo Simulation and Numerical Integration 737

(a) (b)


Figure 15.1. Contours of the function to be integrated are shown.

These constitute the dominant problems for quadrature methods in economics and
econometrics. To a point, one may extend quadrature to higher dimensions using
extensions more sophisticated than product rules. These extensions are usually specific
to functions of a certain type, and for this reason the literature is large, but reliable
software for a problem at hand may be hard to come by. For example, there has
been considerable attention to monomials (polynomials for which the highest degree
in any one product is bounded); e.g. McNamee and Stenger (1967), Genz and Malik
(1983), Davis and Rabinowitz (1984, Section 5.7). Compound, or subregion, methods
provide the most widely applied extensions of quadrature to higher dimensions. In
these procedures, a finer and finer subdivision of the original integration region is
dynamically constructed, with smaller subregions concentrated where the integrand
is most irregular. Within each subregion a local rule with a moderate number of
points is used to approximate the integral. If, at a given step, a prespecified global
convergence criterion is not satisfied, those regions for which the convergence criterion
is farthest from being satisfied are subdivided, and the local rule is applied to the
new subdivisions [Van Dooren and De Ridder (1976), Genz and Malik (1980), Genz
(1991)]. For these procedures to work successfully, it is important to have a scheme
for construction of subregions well suited to the problem at hand, as reconsideration
of Fig. 15.1(b) will make clear. For example Genz and Kass (1996) provides an
algorithm that copes well with the isolated peaks in high-dimensional spaces often
found in Bayesian multiparameter problems.
These extensions of quadrature are routinely successful for integrals through di-
mension four or five. Beyond four or five, success depends on whether the problem
at hand is of a type for which existing subregion methods are well suited. Whereas
the application of quadrature to a function of a single variable can be successful as
a "black box" procedure, problems of dimensions three and four are more likely to
738 J. Geweke

require transformations or other analytical work before quadrature can be applied.

There are very few applications of quadrature-based methods to integrals of more
than five dimensions in the literature.

2.3. Low discrepancy methods

A low discrepancy method defines a deterministic sequence of points {xj}~_ l and

a corresponding m-point integration rule m - ' ~ j = l f ( x j ) ~ fB f ( x ) dx. Gaussian
quadrature organizes the choice of points to evaluate interactions of polynomials with
basis functions exactly. Low discrepancy methods choose the sequence to minimize
the difference between the number of points in a set and its measure. [The discussion
here closely follows parts of Niederreiter (1992, Chapters 2 and 3).]
The canonical problem sets B = fa, the d-dimensional hypercube. (This stipulation
is less restrictive than it might seem, and we shall return to this point in an example
in Section 4.4.) For arbitrary S _C B define
A S;{xj}j=l = Xs(xj),

where Xs (x) is the characteristic function of S, Xs (x) = 1 if x E S and Xs (x) = 0

if x ~ S. Thus A(S; {xj}~=l) is the counting function that indicates the number of j
with 1 ~< j ~< m for which xj E S. If S is a nonempty family of Lebesgue measurable
subsets of fd, then the discrepancy of the point set {xj}}"=l is

Dm(S ; (Xj}?_l) ~- ~ : p A(S; {xj}~I~ 1)/TI~-- ~d(S) ,

where /~d(') denotes d-dimensional Lebesgue measure. Let S* be the family of all
subintervals of fd of the form 1-[i=l [0, ui]. Then the star discrepancy of {xj}~= 1 is

D~({Xj}jm=l) z Drrt(S*;{xj}jm'-l)-

The star discrepancy of {xj}~= 1 may be used to bound the error of approximation
of flu f(x) dx by m -1 ~ j = , f(xj). To do so, first define the variation o f f on i d in
the sense of Vitali,

, 1 _d_f dxl

for functions f for which the individual partial derivatives are continuous on x~d, Next,
let V (k) (f; i l , . . . , 'ik) be the variation in the sense of Vitali of the restriction of f to
Ch. 15." Monte Carlo Simulation and Numerical Integration 739

the k-dimensional face { ( x l , . . . , xd) C id: xj = 1 for j ¢ i i , . . . , ik}. The variation

of f on [d in the sense of Hardy and Krause is
V(f) = Z Z V(~)(f;ii"'" 'i~)'
k=l l<~il~".<~ik<~d

[See Niederreiter (1992, Section 2.2) for a extension of this definition to functions f
that are not d times continuously differentiable.] For any sequence {xj}, xj c fa,

~f(xj)- f / a f ( x ) dx ~ V(f)D;~(Xl,...,Xm),

the Koksma-Hlawka inequality [Hlawka (1961), Niederreiter (1992, Theorem 2.11)].

The bound is strict [Niederreiter (1992, Theorem 2.12)].
Low discrepancy methods choose sequences {xj } so as to minimize D,,~ * ({ X"3 }j=l)-
Intuitively, the star discrepancy can be kept small by spacing the points xj evenly. A
naive grid on fd will achieve this, but requires an impractically large number of points
for d >~ 5 in the same way as quadrature does. Low discrepancy methods substantially
extend the range of practical d before succumbing to the curse of dimensionality. To
describe two such sequences, begin with the unique base-b expansion of any integer n,


where b is an integer exceeding 1 and 0 ~< aj(n) < b. The radical-inverse function
Cb in base b is defined by


This function maps the integers 1 , . . . , m into m distinct points in the unit interval,
maintaining a regular spacing between the points: if m = bk - 1, k integer, then there
are m evenly spaced points beginning with b -k and ending with 1 - b -k. Let (bj} be
a sequence of relatively prime integers all exceeding 1. (For example, bl = 2, b2 = 3,
b3 = 5 , . . . . ) The Halton sequence in bases bt,..., bd is

{Xj}~'~=I, xj = [ ¢ b l ( J ) , . . . , ¢ b d ( J ) ] #

[Hilton (1960)]. The m-element Hammersley sequence in bases bl,..., bd is

{xj}~=,, xj [j/m, Ob~(j),...,¢bd_,(3)],

740 J. Geweke
[Hammersley (1960)]. [An even earlier, closely related sequence is that of Richtmeyer
(1952, 1958) described in Hammersley and Handscomb (1964).]
It may be shown [Niederreiter (1992, Theorem 3.6)] that for a Halton sequence in
the pairwise relatively prime bases bl,. • •, ba,

• m --+--m
m \=logm+

~ (j=~ll~)bj-1 "~_~(logm,d+o[_.~(logm,d_ll" (2.3.1)

For the corresponding Hammersley sequence there is the somewhat better bound

• m ~ -d- + - - | ~--,
l 12[ //\z,ogojbj-1 1o g m + ~-~-)
m m

"~~___(logm)a_,+ O[_~(logm)a 7}. (2.3.2)
The second inequalities in (2.3.1) and (2.3.2) imply that the optimal bases are the
primes themselves, bl = 2, b2 = 3, b3 = 5 , . . . .
If the upper bounds in (2.3.1)-(2.3.2) are used to govern accuracy, then the number
of function evaluations increases faster than geometrically with dimension, d, because
of the presence of the term

d ( b i - 1) d-I ( b i - 1)
II or II
i=1 i=1

Table 15.1 provides the number of evaluations required to assure that

"~ f ( x j ) - j [ . f(x) dx <~c ( c = 10 -2 or 10 -5)


for a function whose Hardy-Krause total variation is d. It also provides the actual
number of evaluations required to guarantee an approximation error of c or less for
the function f(x) = 2 J = i zj. While the upper bound on the number of evaluations
required increases faster than exponentially in the dimension d, the actual number
required increases not much faster than linearly and is much smaller. In general,
however, one will not know the value of the actual error of approximation. The
difficulty of assessing this error is a major disadvantage of low discrepancy and other
deterministic algorithms for integration.
Ch. 15. Monte Carlo Simulation and Numerical Integration 741

Table 15.1
Evaluations required to approximate f i d f ( x ) d x , f ( x ) = ~ ; = 1 f ( x j ) , with
maximum error c: Actual number and upper bound

d 2 3 4 5
c = 0.01:
Actual " 228 442 661 1060
Bound 19,335 1,014,825 9.154 × 107 1.522 × 1010
c = 10-5:
Actual 640,426 1,039,188 1,523,433 2,379,162
Bound 52,477,915 3.469 x 109 3,513 × l0 II 6.114 × 10 t3

2.4. Other deterministic methods

In specialized settings integration in high dimensions can be made more tractable.

The obvious limiting case is the one in which the entire problem may be solved
analytically. But there are also classes of problems that cannot be solved analytically,
with common features that suggest specific approximations. An example is provided
by Tierney and Kadane (1986) for a class of problems arising in Bayesian statistics
and econometrics:

En(g) = fo 9(8) exp[g(e)]rr(O)de z

f e exp[nL* (8)] dO
fo exp[g(O)]re(e) de fo exp[nL(O)] dO '

where g(O) is a log-likelihood function; re(0) is a prior density kernel; 9(0) is a

strictly positive function of interest; n is the number of observations entering the log-
likelihood function; L(O) = [log re(0) + g.(O)]/n;and L* (0) = [log g(O) + log re(0) +
Let 0" denote the mode of L, and let Z = aL(a)/0ooo'. Laplace's approximation

fo exp[nL(O)] dO ~ Jo [ n L ( O ) - ln(O - 0")', ( 0 - 0)] d0

= (2re)k/2l,~l 1/2 exp [nL(O)].

Similarly, if 0"* is the mode of L* and S* = ~2L*(O*)/OOgO',

exp[nL* (8)] dO ~ (2re)k/2lZ* l'/2 exp [nL* (0")].

742 A Geweke

The error of approximation in each case is O(n--1/2), but in the corresponding ap-

the leading terms in the numerator and denominator cancel and the resulting error of
approximation for En (9) is O(r~ - ] ) [Tierney and Kadane (1986)].
The approximate solution provided by this method is a substantial improvement on
previous approximations of this kind, which worked with a single expansion about O.
It exhibits two attractions shared by most specialized approximations to integration in
higher dimensions. First, it avoids the need for specific adaptive subregion analysis
required for quadrature, if indeed quadrature can be made to work at all. Second,
once function-specific code has been written the computations involve standard ascent
algorithms to find 0" and O* and are usually extremely fast. This example also shares
some limitations of this approach. First, there is no way to reduce approximation error,
whereas in quadrature one can increase the number of points or subregions used and
in Monte Carlo one can increase the number of iterations. Second, there is no way
to evaluate the error of approximation; again, quadrature and Monte Carlo will prove
error estimates. Third, there is possibly time intensive analytical work required for
each problem in forming derivatives for different 9 as well as different g. And finally,
the requirement that 9 be strictly positive is restrictive. The method may be extended
to more general functions at the cost of some increase in complexity [Tierney, Kass
and Kadane (1989)].

3. Pseudorandom number generation

The analytical properties of virtually all Monte Carlo methods for numerical integra-
tion, and more generally for simulation, are rooted in the assumption that it is possible
to observe sequences of independent random variables, each distributed uniformly on
the unit interval. Given this assumption, various methods, described in Section 3.2,
may be used to construct random variables and vectors with more complex distribu-
tions. Specific transformations from the uniform distribution on the unit interval to
virtually all of the classical distributions of mathematical statistics have been con-
structed using these methods. Some examples are reviewed in Sections 3.3 and 3.4.
These distributions, in turn, constitute building blocks for the solutions of integration
and simulation problems described subsequently in this chapter.
The assumption that it is possible to observe sequences of independent random
variables, distributed uniformly or otherwise, constitutes a model or idealization of
what actually occurs. In this regard it plays the same role here with respect to what
follows as does the assumption of randomness in much of economic theory with
respect to the derived implications for optimizing behavior or does the assumption
Ch. 15: Monte Carlo Simulation and Numerical Integration 743

of randomness with respect to the development of methods of statistical inference in

econometrics. In current methods tor pseudorandom number generation, the observed
sequences of numbers for which the assumption of an i.i.d, uniform distribution on the
unit interval is the model, are in fact deterministic. Since the algorithms that produce
these observed sequences are known, the properties of the sequences may be studied
analytically in a way that events in the real world corresponding to assumptions
of randomness in economic models may not. Thus, the adequacy or inadequacy of
stochastic independence as a model for these sequences is on a surer footing than
is this assumption as a model in economic or econometric theory. We begin this
section with an overview of current methods of generating sequences for which the
independent uniform assumption should be an adequate model.

3.1. Uniform pseudorandom number generation

Virtually all pseudorandom number generators employed in practice are linear congru-
ential generators and their elaborations. In the lineal" congruential generator a sequence
of integers {J/} is determined by the recursion

Ji = ( a J i - i + c) m o d m . (3.1.1)

The parameters a, e, and m determine the qualities of the generator. If c = 0, the

resulting generator is a pure multiplicative congruential generator. For example, the
multiplicative generator with m = 231 - 1 = 2147483647 (a prime) and a = 16807,
a = 397204094, or a = 950706376 is used in the IMSL scientific library [IMSL
(1994)], and the user may choose between different values of c as well as set the seed
J0. The sequence {Ji} is mapped into the pseudorandom uniform sequence {Ui} by
the transformation

~ = Ji/m. (3.1.2)

If m is prime, the sequence will cycle after producing exactly m distinct values;
clearly one can do no better than m = 231 - 1 for a sequence of positive integers
with 32-bit arithmetic. There are many criteria for evaluating the i.i.d, uniform distri-
bution on the unit interval as a model for the resulting sequences {Ui}. Informal but
useful discussions are provided by Press et al. (1986, pp. 192-194) and Bratley, Fox
and Schrage (1987, pp. 216-220). More technical and detailed evaluations, includ-
ing discussion of the choice of c, may be found in Coveyou and McPherson (1967),
Marsaglia (1972), Knuth (1981) and Fishman and Moore (1982, 1986).
There are many elaborations on pseudorandom number generation that build on
the primitive of the linear or multiplicative congruential generator. In the shuffled
generator, a table is initialized with q seeds. The generator is then used in the obvious
744 J. Geweke

way to select a table entry pseudorandomly, and J1 and Ul are generated as described
in the preceding paragraph. Then a new entry is selected pseudorandomly, U2 is
generated from that entry, and so on. If the congruential generator produced i.i.d.
uniform random variables, so would the shuffled generator, and shuffled generators
extend the upper bound on cycle length to mq; this option is provided conveniently in
IMSL. A shuffled generator described by L'Ecuyer (1986) has cycle length over 1019.
However, the analytical properties of the shuffled generator are harder to evaluate.
In another elaboration on the basic approach, one may combine two pseudorandom
sequences {Ji} and {K.~} from the congruential generator to produce a third sequence
{L~} that is then mapped into Ui, Ui = L i / m , in one of two ways: (a) Let Li =
(Ji + K i ) m o d m , or (b) use {Ki} to randomly shuffle {J~} and then set {Li} to the
shuffled sequence. Both of these generators extend cycle length, but subtle issues arise
in the combination of sequences. For a discussion of these issues and comparison of
properties, consult Wichmann and Hill (1982) or L'Ecuyer (1986) for (a), Marsaglia
and Bray (1968) or Knuth (1981, p. 32) for (b).
The add with carry generator [Marsaglia and Zaman (1991)] has a base b, lags r and
s (r > s), and a seed vector j' = ( j l , . . . , jr, C) with integer elements ji: 0 ~< .j~ < b
(i = 1 , . . . , r) and carry bit c = 0 or 1. The generated sequence is j , / ( j ) , f [ / ( j ) ] , . . .

(j2,... ,j~, jr+~-s +j~ + c, 0)

if j,.+l-~ + jl + c < b,
f(jl,...,jr,c) = (j2, , 3 ~ . , j r + l - s + j l + c - b , 1)
if jT+I-~ + jl + c ~> b.

With appropriately chosen base b, lags r and s, and seed vector j, the generated
sequence has period b~ + bs - 2. Marsaglia and Zaman (1991) discuss appropriate
choices of these values. One example is b = 232 - 5, r = 43, s = 22, and seed vector
consisting of any 43 integers in [0, 232 - 6]. The sequence of vectors has a cycle
exceeding 10 414, and all possible sequences of 43 integers appear within a cycle. [The
add with carry generator is one of a family of closely related generators. Marsaglia
and Zaman (1991) discuss the family.]
Since pseudorandom numbers are in fact deterministic, some consideration must be
given to systematic differences between the two. One important quality is the cycle
length. Most simulations on personal computers or workstations are unlikely to exceed
the cycle length of 231 of typical good linear congruential generators. But a study
carried out with vector or parallel processors could well exceed this length, and in such
cases the shuffled or add with carry generator should be considered. Another quality
is absence of serial correlation. This is easily tested but generally is not a problem.
Greenberger (1961) shows that the first order serial correlation coefficient of any linear
congruential generator is bounded above by a -1 [1 - ( 6 c / m ) + 6 ( e / m ) 2] + (a + 6 ) / m ,
Ch. 15: Monte CarloSimulation and Numerical Integration 745

and Knuth (1981, p. 84) points out that for nearly all m the serial correlation coefficient
is less than 1/v/-m.
Evidence of pseudorandomness is usually exhibited in high dimensional spaces. If
one plots successive overlapping sequences of n pseudorandom numbers, then the
sequences typically lie in a few hyperplanes of dimension n - 1 each. For example, in
the case of linear congruential generators the number of hyperplanes is no more than
( n ! / m ) 1/n [Marsaglia (1968)]: e.g., if m = 231 - 1, then sequences of length 6 lie on at
most 108 distinct hyperplanes. In the add with carry generator, successive overlapping
sequences of more than r values lie on hyperplanes with a separating distance of at
least 1/v'~ [Tezuka et al. (1993)]. One can determine the existence of such hyperplanes
using the spectral test first proposed in Coveyou and MacPherson (1967). Accessible
descriptions of this test are provided in Knuth (1981) and Bratley, Fox and Schrage
(1987). Most simulation methods employ highly nonlinear transformations of {Ui},
as we shall see subsequently, so the distribution of sequences on hyperplanes does
not carry over. (However, new problems can arise: see the discussion below of the
Box and Muller transformation to construct normally distributed random variables.)
A few practical steps will avoid most problems. First, use only uniform pseudo-
random number generators that are completely documented with references to the
academic literature. Second, questions of execution time, often discussed in the aca-
demic literature, are irrelevant in computational economics: subsequent computations
using pseudorandom uniform random sequences take much longer than the most
elaborate variants on linear congruential generators, so that even if execution time for
these generators could be driven to zero, there would be no significant improvement
in overall execution time. Third, one should ensure that cycle length is substantially
greater than the length of the pseudorandom sequence to be generated. Finally, any
publicly reported result based in part on a sequence of pseudorandom numbers should
be checked for sensitivity to the choice of generator. This does not imply numerical
analysis that takes the investigator far from the problem of interest. A key advantage
of Monte Carlo methods, to be discussed in Section 4, is that measures of accuracy
are produced as a by-product based on the assumption that successive pseudorandom
numbers are independently and identically distributed. Results obtained using vari-
ants of methods for producing these sequences should agree within these measures
of accuracy. For example, computations can be executed with different seeds, with
different values of c in (3.1.1), with or without shuffling, or using an add with carry
or related generator. This requires only minor changes in code for most software.

3.2. General methods for nonuniform distributions

Throughout this section, z will denote a random variable with cumulative distribution
function (c.d.f.) F and support C, and u will denote a random variable with uniform
distribution on the unit interval. If x is continuous, its probability density function
(p.d.f.) will be denoted by f. We turn first to several general methods for mapping u
into z.
746 J. Geweke
Inverse c.d.f Suppose x is continuous, and consequently the inverse c.d.f.

F -l(p) = {c:P(x < c) = p}

exists. Then x and F-l(u) have the same distribution: P[F-I(u) ~ d] = P [ u ~<
F(d)] = F(d). Hence pseudorandom drawings {xi}i=l N of x may be constructed as
F-1 (ui), w h e r e {ui}~_l is a sequence of pseudorandom uniform numbers.
A simple example is provided by the exponential distribution with probability
density f(x) = A e x p ( - A x ) , x >/ 0. Correspondingly, F(x) = 1 - e x p ( - A x ) ,
F - I (p) = log(1 - p)/A, and consequently, x = - l o g ( u ) / A .
The inverse c.d.f, method is very easy to apply if an explicit, closed form expression
for the inverse c.d.f, is available. Since most inverse c.d.f.s require the evaluation of
transcendental functions, the method may be inefficient relative to others. [That is the
case in the foregoing example; see von Neumann (1951) or Forsythe (1972) for a more
efficient alternative.] In some cases, evaluation of the c.d.f, is superficially closed form
to the user of a mathematical software library but in fact involves nontrivial numerical
integration of the kind discussed in Section 2. A leading example is provided by the
standard normal distribution, for which specialized methods can be applied to the
computation of F -~ [Hart et al. (1968), Strecok (1968)], but for which acceptance
and composition methods (discussed below) are more efficient.

Discrete distributions. Suppose that the random variable X takes on a finite number
of values, without loss of generality the integers 1 , . . . , n and P ( X = i) = Pi. The
preferred methods will depend (among other things) on the number of draws to be
made from the distribution. If only a few draws are to be made (as may be the case
with the Markov chain Monte Carlo methods discussed in Section 6), then the obvious
inverse mapping from the unit interval to the integers 1 , . . . , n can be constructed and
subsequently used to search for the appropriate integer corresponding to the drawn u.
The disadvantage of this method is that the search time can be substantial. If many
draws are to be made, then the alias method due to Walker (1974) and refined by
Walker (1977) and Kronmal and Peterson (1979) is more efficient. The basic idea is
to draw an integer i from an equiprobable distribution on the first n integers, and
choose i with probability r~ and a corresponding alias ai with probability 1 - ri. If
the values of a~ and ri are chosen correctly, then the resulting choice probabilities
are p~ for i (i = 1 , . . . , n). Setting up the table of r~ and ai requires O(n) time [see
Bratley, Fox and Schrage (1987, pp. 158-160) for an accessible discussion]; whether
this overhead is worthwhile depends on the value of n and the number of draws to be
made from the discrete distribution. The aliasing algorithm is implemented in many
mathematical software libraries.

Acceptance methods. Suppose that x is continuous with p.d.f, f(x) and support C.
Let g be the p.d.f, of a different continuous random variable z with p.d.f. 9(z) which
Ch, 15: Monte Carlo Simulation and Numerical Integration 747

has a distribution from which it is possible to draw i.i.d, random variables and for

sup [f(x)/g(x)] = a < oo.


The function g is known as an envelope or majorizing density of f , and the distribution

with p.d.f, g is known as the source distribution. To generate xi,
(a) Generate u;
(b) Generate z;
(c) It" u > f(z)/[ag(z)], go to (a);
(d) xi = z.
The unconditional probability of proceeding from step (c) to step (d) in any pass is

f(z) g(z) dz = a - 1 '
oo " '

and the unconditional probability of reaching step (d) with value at most c in any
pass is

/~- f(z) .g(z)dz = a-iF(e).

oo '

Hence the probability that xi is at most c at step (d) is F(c).

The principle of acceptance sampling is illustrated in Fig. 15.2. The two essentials
of applying this procedure are the ability to generate z and the finite upper bound
on f(x)/9(x ). The efficiency of the method depends on the efficiency of generating
z and the unconditional probability of acceptance, which is just the inverse of the
upper bound on f(x)/g(x). (In this respect, acceptance sampling is closely related
to importance sampling discussed in Section 4.3.) The great advantage of acceptance


Figure 15.2. The target density is f ( x ) , the source density is 9(x), and a = s u p [ f ( x ) / g ( x ) ] ,
748 ,L Geweke
sampling is its ability to cope with arbitrary probability density functions as long as the
two essential conditions are met and efficiency is acceptable for the purposes at hand.
Notice that the method will work in exactly the same way if f(x) is merely the kernel
of the p.d.f, of x (i.e., proportional to the p.d.f.) as long as a = sup~ec[f(x)/g(x)]
(although in this case a -1 no longer provides the unconditional acceptance probabil-
ity). This property can be exploited to advantage to avoid numerical approximation
of unknown constants of integration.
Specific examples providing insight into the method may be found in the family
of truncated univariate normal distributions. As a first example, consider the standard
normal probability distribution truncated to the interval (0, 0.5):

f(x)- 0.19146v/~exp - =2.0837exp - , 0<x~<0.5.

The standard normal distribution itself is a legitimate source distribution, but since
suPo<x<o.5[f(x)/9(x)] = ( 0 . 1 9 t 4 6 ) - ' , the efficiency of this method is low. However,
for a source distribution uniform on (0, 0.5], supo<x<~o.5[f(x)/9(x)] = 2.0837/2.0 =
1.0418: the unconditional probability of acceptance is (1.0418) -1 = 0.95985. As a
second example, consider the same distribution truncated to the interval (5, 8]:

f(x) = 2.8665 x 10 -7 x/ff~ exp -

= 1.3917 x 106 exp ( - ~), 5<x~<8.

The standard normal fails as a source distribution since the acceptance probabil-
ity is 2.8665 x 10 -7. A uniform source density yields an acceptance probability of
only 0.064271. An exponential distribution translated to the truncation point is for
many purposes an excellent approximation to a severely truncated normal distribution
[Marsaglia (1964), Geweke (1986)], and for the exponential source density, setting the
parameter equal to the truncation point is an optimal or near optimal choice [Geweke
(1991)]. One can readily verify that the acceptance probability for the source density

9(x) - 5exp[-5(x-- 5)], 5 < x ~ < 8 ,

is 0.96406.

Optimizing acceptance sampling. Acceptance methods may readily be extended to

multivariate distributions. This topic is taken up in detail in Section 4.2. We turn now
to the question of finding an optimal source distribution for a specified problem and
develop results for the general case of univariate or multivariate distributions.
Ch. 15: Monte Carlo Simulation and Numerical Integration 749

In general, suppose' that it is desired to draw i.i.d, variables from a distribution

with target density kernel f(x;0), 0 E (9, having support C(O) c 9l'~; the param-
eter vector 0 indexes a family of density kernels f(.). Suppose that a family of
source distributions with densities 9(x; a), a E A C 9l~, having support D(c 0, has
been identified, with the property that for all 0 c @, there exists at least one
for which SUpxec(0 ) f(x; O)/9(x; a) < ec. To accomplish the goal of i.i.d, sampling
from f(x; 0), draws from 9(x; a) are retained with probability q(a, 0)f(x; O)/9(x; o~),

F f(x;O)] -1
q(o~,O) ~ ] sup
LxcC(0) 9(x; ct) J "

Suppose the family of source densities 9(" ;') has been fixed, but not the value of a,
and that the objective is to maximize the unconditional probability of accepting the
draw from the source distribution. Just as in the foregoing examples, this unconditional
probability is proportional to

fD (a) q(a,g(X; O~) 0) g(X; a) dx = q(c~,0).


Hence the problem is to determine the saddle point


Given the usual regularity conditions, a necessary condition is that a be part of a

solution of the (ra + p)-equation system

[log f ( x ; 0 ) - logg(x; c~)] = 0,

0~ logg(x; c~) = 0.

As an example, consider the target density kernel

f(z; T, 7]) --- _P exp(-r/z),

which arises as a conditional posterior density kernel for the degrees-of-freedom pa-
rameter in a Student-t distribution [Geweke (1992b, Appendix B)]. For the exponential
750 Z Geweke

family of source densities g(x; c~) = c~exp(-c~x), the regular necessary conditions
are that

TIlog(2 ) +1 ¢(2)] +(c~-~)=0,

O~-1 - - X = 0 ~

where ¢(.) = F ' ( . ) / F ( . ) is the digamma function. The desired value of c~ is the
solution of

T2 - 1 ° g ( 2 @ + 1 - ¢ ~ +(c~-r/)=0,

which may be found using standard root-finding algorithms. Acceptance rates of about
0.15 are reported in Geweke (1992b).

Adaptive methods. It may be possible to improve upon a source distribution, using

information about the target distribution acquired in the sampling process itself. A
very useful application of this idea has been made to the problem of sampling from
distributions with log-concave probability density functions. It is especially attractive
when it is costly to evaluate the target density kernel at a point or when known source
densities are inefficient or nonexistent. The exposition here closely follows Gilks and
Wild (1992), who build on some earlier work by Devroye (1986); see Wild and Gilks
(1993) for a published algorithm. An application of this algorithm is discussed in
Section 7.1.
Let h(x) = log f(x). The support D of f(x) is connected, and h(x) is differentiable
and weakly concave everywhere in D; i.e., h~(x) is monotonically nonincreasing in
Suppose that h(x) and h'(x) have been evaluated at k points in D, xl ~< ... ~< xk,
k ~> 2. We assume that if D is unbounded below, then h~(xl) > 0 and that if D
is unbounded above, then h~(xk) < 0. Let the piecewise linear upper hull u(x) of
h(x) be formed from the tangents to h(.) at the xj, as shown in Fig. 15.3. For
j = 1 , . . . , k - 1 the tangents at xj and xj+l intersect at

h(xj+l) - h(xj) - xj+lh'(xj+1) + xjh'(xj)

~J = h'(xj) - h'(xj+~)

Further let w0 denote the lower bound of D (possibly - o o ) and wk the upper bound
of D (possibly +c~). Then

u(x) = h(xj) + (x - xj)h'(xj), x E (wj-l,wj].

Ch. 15: Monte Carlo Simulation and Numerical Integration 751


ia / > x
xl wl x2 w2 xa \

Figure 15.3. The function h(x) = l o g f ( x ) , where f ( x ) is a log-concave p.d.f. The lower hull g(x) is
formed by the chords joined at the xj, and the upper hull u ( z ) is formed by the tangents at the z# which
are joined at the wj.

Similarly the piecewise linear lower hull g(x) of h(x) is formed from the chords
between the x#,

g ( x ) = (xj+l - x)h(x#) + (x - x#)h(Xj+l)

, X • ( Z j , Xjq-1].
xj+l - xj

For subsequent purposes it is useful to extend the definition to include

e(x)=-oo, X<Xl or x>xk.

At the start of an acceptance/rejection iteration, the function exp[u(x)] forms a

source density kernel, and exp[g(x)] is a squeezing density kernel. The iteration begins
by drawing a value z from the distribution with kernel density function exp[u(x)].
This may be done in two steps:
(a) Compute p# = P(w#_, < x ~ wj) = I j / I (j = 1 , . . . , k), where

h'(~j) if h'(x#) 7/=O,

( h(xj)(wj - W#-l) if h'(x#) = 0

and I = ~ = l I#. Choose an interval (wj-1, w#] from this discrete distribution as
described above.
(b) Conditional on the choice of interval the source distribution is exponential.
Draw z from this distribution as previously discussed.
The draw z is accepted or rejected by means of the acceptance sampling algorithm
described above, but using the following shortcut. Having drawn u, we know that z
will be accepted if u ~ exp[g(z) - u(z)], and in this case no further computations are
752 A Geweke

required. If u > exp[e(z) - u(z)], then evaluate h(z) and h'(z) accept z if and only
if u ~< exp[h(z) - u(z)]. In the latter case add z to the set of points ( x l , . . . , xk),
reordering the xjs, and update u(.) and g(.), unless z is accepted and no more draws
from the target distribution are needed. This completes the acceptance iteration.
Notice that this algorithm is more likely to update the source and squeezing densities
the more discordant are these functions at a point. As the algorithm proceeds, the
probability of acceptance of any draw increases toward 1, and the probability that an
evaluation of h will be required for any draw falls to 0.

Composition algorithms. Formally, composition arises from a p.d.f, representation

f(x) =
gy(X) d H ( y ) .

A random variable Y from distribution H is generated, followed by a random variable

X with p.d.f, gy. This method goes back at least to Marsaglia (1961), who used it to
generate normal random variables. It is also the natural method to use for mixture dis-
tributions. For example, suppose that x is drawn from a N'(0, 0.12) distribution with
probability 0.95 and a N'(0, 102) distribution with probability 0.05. The probability

0.95 - -
1 ( x2)
0.1v/~ exp --- exp - 2--66
is strongly leptokurtic and not well suited to acceptance sampling. But the construction
of the random variable in fact corresponds to a composition with

P(Y=O)=0.95, P(Y= 1)=0.05,

gY=°(x) - 0.1 exp 07d2 '

9 g = l ( x ) = 0.05 1 0 v ~ exp - 2-~ "

3.3. Selected univariate distributions

In most cases there is associated with each of the classical univariate distributions
a substantial literature on the generation of corresponding pseudorandom variables.
Good mathematical and statistical software libraries have drawn on this literature and
are widely available. In many cases the most efficient and accurate routines are not
simply implementations of the constructions that appear in the mathematical statistics
literature, and the user is well-advised to take advantage of the capital embodied
Ch. 15." Monte Carlo Simulation and Numerical Integration 753

in good libraries. The discussion here is limited to illustrating how the techniques
discussed in Section 3.2 are used in specific cases. More thorough surveys in the
literature are provided by Bratley, Fox and Schrage (1987, pp. 164-189) and Devroye
(1986). All of the methods discussed here are implemented in good software libraries,
which should always be used. This discussion is not intended to form the basis of
reliable code.

Binomial distribution. The binomial distribution indicates the probability of k suc-

cesses in n independent trials if p is the probability of success in any given trial:

p(k) = (nk)Pk(1-- p) (n-k).

The definition provides a direct method for generating the random variable k, but is
acceptably rapid only if n is small. For small values of np, the inverse c.d.f, method is
practical since p(k) will typically require evaluation for only a few values of k. In all
other cases, however, composition algorithms with acceptance methods are more effi-
cient. Examples are given by Ahrens and Dieter (1980) and Kachitvichyanukul (1982).

Univariate normal distributions. Inverse c.d.f, methods for the standard normal have
already been mentioned. Acceptance sampling methods are not hard to design, espe-
cially if one exploits the exponential source distribution as first noted by Marsaglia
(1964). Related and succeeding work by Marsaglia and Bray (1964); Marsaglia,
MacLaren and Bray (1964); and Kinderrnan and Ramage (1976) combining accep-
tance sampling and composition form the basis for the generation of standard normal
variables in most software libraries.
Box and Muller (1958) showed that if U1 and U2 are mutually independent standard
uniform random variables, then

X = cos(27rUl)x/-21ogU2, Y = sin(27rU1)x/'C21og U2

are independent standard nol-mal random variables. (The key to the demonstration lies
in a transformation to polar coordinates.) The combination of this method with the
linear congruential random number generator produces a pathology, however. If U,i
and U,i+l are successive realizations of (3.1.1)-(3.1.2) then
Ui+, = [(amUi + c)modm]/m
J" cos(27rVi+,) = cos [27r(aUi + c/m)],
==>[ sin(27rUi+l) = sin [27r(aUi + c/m)]
and hence
X~ = cos [2~(o,U,~ + c/,r~)] ~ : - 2 l o g U~,
Y / = sin [27c(aUi + c/m)] , ~ l o g Ui.
754 z Geweke

All possible values of (X~, Y~) fall on a spiral. As an approximation to a pair of

independent variables the distribution of (X~, Yi) could hardly be worse. However, if
one discards Y/, the sequence {X~} suffers from no known problems of this kind. This
is one of the reasons that acceptance sampling and composition rather than the Box-
Muller transformation is used in statistical libraries. It illustrates the risks involved
in seemingly straightforward combinations of distribution theory with pseudorandom
uniform variables.
Given a sequence of standard normal random variables {zi}, a sequence from the
general univariate normal distribution A/'(#, crz) can be generated through the familiar
transformation xi = # + crzi.

Gamma distributions. The gamma distribution is important in its own right, for in-
cluded special cases like the chi-square, and as a building block for other distributions
like the beta. The g a m m a distribution with scale parameter A and shape parameter a
has probability density

f ( z ) = ;~exp(-Ax)(Ax)a-l /r(a), x >10.

In general, random variables from this distribution may be generated efficiently using
composition algorithms andacceptance methods. Fast and accurate methods are com-
plicated but readily available in statistical software libraries. For example, IMSL uses
the composition-acceptance methods of Ahrens and Dieter (1974) and Schmeiser and
Lal (1980). A few special cases are worth note.
(a) If a = 1, then the distribution is exponential with parameter A and the inverse
c.d.f, method discussed above is much more efficient.
(b) If a = 0.5, then x = z2/2, z ,-~ .N'(0, A2).
(c) If A = 0.5, then x ~ X2(u), u = 2a. If a is an integer, then x is the sum of a
independent exPonentially distributed random variables each with parameter A = 0.5.
If u is an odd integer, then x is the sum of [u/2] independent exponentially distributed
random variables plus the square of an independent standard normal. For integers up
to u = 17, these representations provide the basis for more efficient generation from
the chi-square distribution, but for larger integers it is more efficient to use the more
general composition-acceptance methods.

3.4. Selected multivariate distributions

Generation of random vectors typically builds upon the ability to generate univariate
random variables. Just how this should be done is not always obvious, however, and
sometimes the obvious method is not the most efficient. The examples that follow are
intended only to illustrate this fact. Statistical software libraries should be consulted
for implementation of these methods.
Ch. 15." Monte Carlo Simulation and Numerical Integrution 755

Multinomial distribution. The multinomial distribution indicates the probability of

kj realizations of outcome j, from m possible outcomes, in n independent trials. If
pj is the probability of outcome j in any given trial, then

mn! f i pjkj , kj >~0 and £ kj = n.

p(kj) [Ij=l kj! j=l
- -


The decomposition of this distribution into its full conditionals, p(kl), p(k2 i k l ) , . . . ,
p(kj [ k l , . . . , k j - l ) , . . . ,p(km I kl .... ,k,~-l), may be used to generate the kj. We

P(~l) = k, kl]. 1 --

p(kj I k~, ,k:i-~) = fc~jS-'~j'l

£~j =- n - ~ ki, pj = pj / 1-- Pi •

These distributions are all binomial.

Multivariate normal distribution. The generation of a multivariate normal random

vector x from the distribution A/'(/z, ~ ) is based on the familiar decomposition

z~.N'(0, I~), x=/~+Az with A A ' = E .

While any factorization A of 27 will suffice, it is most efficient to make A upper
or lower triangular so that m ( m + 1)/2 rather than T/~2 products are required in
the transformation from z to x. The Cholesky decomposition, in which the diagonal
elements of the upper or lower triangular A are positive, is typically used.

Wishart distribution. If xi l~ At(o, •), the distribution of A = ~i~=l (x~ - 2)(xi -

7r~× l
2) ~ is Wishart, with p.d.f.

IAI½('~-'~) exp ( - ½tr 27-1A)

f ( A ) :- 2½(~_l)mTr,~(,~_l)/4[yT[½(,~_l ) I - [ ~ - 1 F rn!L( 2 - i)];

for brevity, A ~ W(27, n - t). (For obvious reasons this distribution arises frequently
in simulations. It is also important in Bayesian inference, where the posterior distri-
bution of the inverse of the variance matrix for a normal population often has this
756 J. Geweke

form.) Direct construction of A through generation of { X i}i=1

n becomes impractical
for large n. A more efficient indirect method follows Anderson (1984). Let ~ have
lower triangular Choleski decomposition 57 = L L ' , and suppose Q ,-~ W(I,~, n - 1).
Then L Q L ' ~ W(22, n - 1) [Anderson (1984, pp. 254-255)]. Furthermore Q has
Q = UU', u~j=O(i<j<m),
uij ~ N'(O, 1), uii2 ,'-, x 2 ( n - i)

(i = 1 , . . . , m ) , with the uij mutually independent for i ~> j [Anderson (1984,

p. 247)]. Even if n is quite small, this indirect construction is much more efficient
than the direct construction.

4. Independence Monte Carlo

Building on the ability to produce sequences of vectors that are well described as
i.i.d, random variables, we return to the integration problem with particular attention
to high dimensions. There are two distinct but closely related problems that arise in
economics and econometrics.
Problem I is to evaluate

I = / D f ( x ) dx.

Problem E is to evaluate

E - E[g(x)],

where x is a random vector with c.d.f. P(x). To simplify notation, assume that/9 is
absolutely continuous and that x has a probability density function p(x). It is implicit
in Problem E that fD 9(x)P(x) dx is absolutely convergent in its domain D. Detailed
examples of Problems E and I are provided in Section 7.
If a random vector z has p.d.f, p(z), then any function r(z) = a • p(z), a > 0,
is said to be a kernel density function for z. In order to express some key moments
compactly, let ET [9(z)] denote the expectation of g(z) if z has kernel density function
r(z); similarly varT[9(z)] for variance.
Many of the procedures discussed in this section are straightforward applications
of two results in basic mathematical statistics. Let {yi} be an i.i.d, sequence from a
population, and let

1 ~-~ Yi and z 1
iJN = -~ ~ 8N -- N - 1 E (Yi - YN) 2.
i=l i=1
Ch. 15." Monte Carlo Simulation and Numerical Integration 757

If the population has finite first moment, then E(gN) = E(y) and the strong law of
large numbers states that

Y N a.s~ E(y);

i.c., P[limN--+oo YN = E(y)] = I. If the same population also has a finite variance
(72, then the central limit theorem establishes that

v~[~ - E(y)] ~ ~¢(o, ~);

i.e., limN-+oo P{v/N[ON-E(y)] <~ccr} = ~(c), where as(.) is the c.d.f, of the.h/'(O, 1)
distribution. In this case E(s 2 ) = 0 2, and from the strong law of large numbers,

,~ e:e+ 02

4.1. Simple Monte Carlo

In the case of Problem I, suppose that

f(x) = g(x)p(x),

with p(x) >~ O, where fD p(x) dx = p* is a known positive constant. Then p(x) is a
kernel density function. Suppose further that it is possible to draw pseudorandom vec-
tors {x~} from the distribution with probability density function p(x)/p*, as described
in Section 3. Since

I= £ f(x) dx=, /~ p*g(x)p(x)

p* dx=E v[p*g(x)],

it follows that

IN = (4.1.1)

The requirement that p* is known may be weakened by replacing p* with a sequence

P~v 2:_%p* in the last expression. (Some practical methods of producing P~v at essen-
tially no incremental cost are taken up in Section 4.2.) If p* is known, then E(IN) = I,
but if p* must be replaced by a consistent estimator, then in general E(IN) ¢ I but
(4.1.1) is still true.
758 J. Geweke

If in addition fD g2(x)P(x) dx is absolutely convergent, this result can be extended

to provide a measure of the accuracy of IN. Let

.: : var~ [V*9(x)] : p-: 1 /D [p*g(x) I]2p(x)d(x).


V~(I~-I) d>N(O,~2), ~[p*g(x~)--±U]~°'~ ~.

(The result may be extended to include cases in which p* is approximated by a

sequence of P~v, but some changes are required; see Section 4.2.) This result makes
exact the intuitive notion that p(.) should be chosen to mimic the shape of f(-).
The solution of Problem E by simple Monte Carlo is even simpler, as long as
it is possible to construct an i.i.d, sequence from the probability distribution of x
in E[9(x)], for then EN = 1 ~N1 g(Xi) - ~ E and E(EN) = E, VN. It is not
necessary to know the integrating constant of the kernel probability density for x. If
cr2 = var[9(x)] exists, then v/-N(EN - E) ~ + Af(0, c~2) as well.
As an example, consider the problem

I= f~ f(x)dx= f~ a(x)p(x)dx
=/; 1 - .)'I-I(x -
a(x) exp I- ~(x ,)] dx,
where H is positive definite. Since p(x) is a multivariate normal kernel density func-

IN = (27r)k/ZlHl-1/2N-1~ g(x~), x~ ~ At(u, rI-J).

Because p(x) ) 0, Vx E ~k, IN ~ 4 1 regardless of the form of f(x). However,

convergence will be impractically slow if 9(x) is ill conditioned or (equivalently) #
and H are chosen so that p(.) poorly mimics f(.). If varp[g(x)] exists, then

~2 = (2~) k IX-II-~varp[g(x)]

provides the pertinent measure of the adequacy of IN as an approximation of I. Only

this expression - not the dimensionality k - matters.
Ch. 15: Monte Carlo Simulation and Numerical Integration 759

4.2. Acceptance methods

Acceptance methods may be used to evaluate integrals in much the same way as they
are used to produce pseudorandom numbers. In Problem I, suppose that 0 ~ g(x) ~<
a < ec, Vx C D. Suppose further that p* is known or equivalently that p(x) is a
probability density function and not merely a kernel. Let {xi} be an i.i.d, sequence
drawn from a distribution function with p.d.f, p(x), and let ui be a corresponding
Bernoulli random variable,

u i = 0 o r 1, P(u, = 1) = 1 g(xd.


IN = N a

ui ...." aEp(ui) = a /o1 g(x)p(x) dx = I,

E ( I N ) = I VN, v / N ( I N - I) - ~ N'(0, a2),

0-2 = a l - 12, aIN - I2N a.~ 0-2. (4.2.1)

This method may be extended to g(x) for which - o o < g < g(x) ~< u < oo, by defin-
ing g+ (x) = sup[0, g(x)], g - ( x ) = - i n f [ 0 , g(x)], and approximating fD g+ ( x ) d x and
fD g - (X) dx separately. Observe that 0-2 is an increasing function of a and the uncon-
ditional probability of acceptance P ( u i = l) = a - l I is a decreasing function of a,
If p(x) c< g(x), then P ( u i ) = 1 and 0-2 = 0, but this is tantamount to being able to
integrate f ( x ) analytically. In general one seeks to minimize a. If a is too large, then
very few ui will be accepted, and the method will be impractical.
In Problem E, acceptance methods may be applied to draw from the distribution
with probability density p(x). If h(x) is a source density as described in Section 3.2,
0 ~ p ( x ) / h ( x ) ~< a < (x~, Vx c D, then a sequence ofi.i.d, draws from the distribution
with p.d.f, p(x) may be constructed. If we take {xi}.i=l g to be the accepted draws,
EN = -~ 9(x0 E, E(EN) = E, VN,

V~(EN -- E ) d > J k f ( 0 , 0-2), 0-2 = v a r v [ g ( x ) ] , (4.2.2)


760 J. Geweke

If we take { Z i}i=l
N to be draws from the source density, and u,i = 1 if zi is accepted
and u~ = 0 if not, then
v~(EN -- E) - % N/V'(O, 0"2),
i=l i=l

i=l i=l

(In this case one again seeks to choose h(x) so as to minimize a.) Which expression
is more relevant depends on the particulars of the problem. We shall return to this
topic in Section 4.4.
The acceptance method just described assumes that the probability density is known,
including its constant of integration - i.e., fD p(x) dx = 1. This assumption may be
strong in practice. In Problem I, one may recognize p(x) as a probability density
kernel, not knowing the constant of integration. Acceptance or adaptive methods might
be applied to draw from the distribution with kernel density p(x); these methods do
not require that one know the constant of integration for p(x). If p(x) is the kernel
and p* = fD p(X) dx, it is then the case for acceptance methods in Problem I that

1 a.8.


Whether or not consistent evaluation of p* is possible depends on the method used

to draw variables from the distribution with kernel p*. If the method is acceptance
sampling or a variant on acceptance sampling (e.g., the adaptive method for log-
concave densities described in Section 3.2), one can approximate p* using the methods
just described as long as the actual probability density (not just the kernel) of the
source distribution for the target kernel p(x) is known. This produces a sequence p~
_. . a s~ p..
with the property PN -- ~ ~ 1 P~ In this case clearly

~ a.8.
v (iN - 1) -% N(0, 0-2),

but 0-2 is affected by the substitution of/5~v for p*.

One may work out expressions for ~r2 and a corresponding consistent (in N ) ap-
proximation of 0-2, as has been done already in several cases. Such expressions are
quite useful in the analytical comparison of approximation methods. But if the goal is
simply to assess approximation error, straightforward asymptotic expansion is much
simpler. To illustrate the method, return to the case of simple Monte Carlo integration
Ch. 15." Monte Carlo Simulation and Numerical Integration 761

with p* unknown, (4.1.1). Let M be the number of i.i.d, draws from source density
h(z) for target density p(z), define a = supD[p(z)/h(z)], and let

Yi = p(zi)/h(zi),
1 with probability p(zi)/ah(zi),
u~ = 0 otherwise,
w,~ = u ~ g ( z d .

1 1 1

i=1 i=1 i=1

IM = YMWM/~ZM ~ I.

As long as fD 92(x) p(x) dx is absolutely convergent, ~ ( I M -- I) --~ N'(0, or2),

[ #-~(yi) ~(wi) v'~(ui) 2 c ~ ( y i , wl) 2c~'~(yi, ui)

2c~(w~, ] a.s~ a2.

"ff)M ~ M J
(This expression may be derived by the delta method, i.e. by linearizing IM in tjM,
gM and WM. The terms ggr(yi), co"-~(yi,wi), etc., are computed in the usual way
from {Yi, w~, ui}i=l.)

4.3. Importance sampling

The method of importance sampling may be used to solve Problem I or Problmn E,
under similar circumstances: one has available a probability distribution with p.d.f.
somewhat similar to the integrand f(x) in Problem I or the probability density function
p(x) in Problem E and wishes to use an independent, identically distributed sample
from this distribution to approximate I or E. Rather than use acceptance to generate
an i.i.d, sample from the distribution with p.d.f, p(x), importance sampling uses all
of the draws from the source probability distribution but weights that sample to
obtain a convergent approximation. In this method the probability density function
of the source distribution is called the importance sampling density, a term due to
Hamanersley and Handscomb (1964), who were among the first to propose the method.
It appears to have been introduced to the economics literature by Kloek and Van Dijk
(1978). We shall denote the importance sampling density j(x).
762 £ Geweke
Suppose that for Problem I one can draw an i.i.d, sequence of random vectors {xi}
from the importance distribution and that the support of this distribution includes D.

Ej [f(x~)/j(xi)] = /D ~f(x)
( ~ 3~x"
( ) dx = f D f ( x ) d x = I -

Since f(xi)/j(xi) is also an i.i.d sequence,

1 ~ f(xi) a.s~ I
IN =- -~ i=l j(xi)

by the strong law of large numbers. Furthermore, E(/N) = I VN. This result is
remarkable for its weakness: no upper bound on f(x)/j(x) is required as is the
case for f(x)/h(x) in acceptance sampling. The requirement that the support of j(x)
include D is necessary and usually trivial to verify.
In Problem E importance sampling may be attractive if there is no simple method
of constructing pseudorandom numbers drawn from the distribution P(.) underlying
the expectation operator. If the constant of integration for the probability density is
known, then

1 ~ g(xi)p(xi) a.s. E and E(EN) = E, VN,

EN = ~ i=l j(xi)

as long as the support of the importance sampling distribution includes that of P(-).
If the constant of integration is not known and p(x) is merely the kernel of the
probability density function, fD p(x) dx = t9", then

1 ~ 9(xdp(,,d % 1~ p(xi) a.~ .

i=1 j(xi) p'E, i=l j(xi) + p '

and hence

EN--1 [g(xi)p(xi)/j(xi)] a.s~ E, (4.3.1)

=- ES,
but of course E(EN) ¢ E in general. In either case w(x) = p(x)/j(x) may be re-
garded as a weight function, large weights being assigned to those g(xi) for which the
importance sampling distribution assigns smaller probability than does the probability
distribution P(.).
Ch. 15: Monte Carlo Simulation and Numerical Integration 763

To assess the accuracy of importance sampling approximations using a central limit

theorem, more is required. In the case of Problem I, suppose that fD[f2(x)/j(x)] dx
is absolutely convergent. Then f(xi)/j(xi) is an i.i.d, sequence and

IN - ~ ±, v~(~rN- ±) ~ H(0, ~2),

~. __ [ [f'(x) l z= [f(x) _ I
J~ i J--~)J dx- = Ej (4.3.2)
1 kf2(xd G%~2.
8 2 -~" - N i=l ja(xi)

It is therefore practical to assess the accuracy of IN as an approximation of I.

The convergence of fD [fe(x)/j(x)] dx must be established analytically, however. If
If(x)/j(x)] is bounded above on D or if D is compact and f2(x)/j(x) is bounded
above, then convergence obtains. If neither of these conditions is satisfied, then veri-
fying convergence may be difficult. In choosing an importance sampling density, it is
especially important to insure that the tails of j(x) decline no faster than those of f(x).
If these conditions are not met, but one still proceeds with the approximation, then
convergence is usually quite slow. Violation of the central limit theorem convergence
condition then may be evidenced by values of s 2 that increase with N.
Assessing the accuracy of EN as an approximation of E is complicated by the
ratio of terms in (4.3.1). If both

[ p2(x)
Ep[w(x)] = JDJ~-dx (4.3.3a)


Ep [g2(x)w(x)] = ~ [g2(x)p(x)] dx (4.3.3b)

are absolutely convergent, then

EN a.s.) E, v/N(EN - E) d .Af(0, or2),

, NE~=~
N [ g ( x d - E ]2 w (x~) _ ~ O.2"
8N ~ 2

[Derivations are given in Geweke (1989).] This result provides a practical way to
assess approximation error and also indicates conditions in which the method of
764 J. Geweke

importance sampling will work well for Problem E. A small value of Ep[w(x)], perhaps
as reflected in a small upper bound on w(x), combined with small varp [g(x)], will lead
to small values of cr2. As in the case of Problem I, central limit theorem convergence
conditions must be verified analytically.
There has been little practical work to date on the optimal choice of importance
sampling distributions. Using a result of Rubinstein (1981, Theorem 4.3.1) one can
show that the importance sampling density with kernel 19(x) - Elp(x) provides the
smallest possible value of cr~. This is not very useful, since drawing pseudorandom
vectors from this distribution is likely to be awkward at best. There has been some
attention to optimization within families of importance sampling densities [Geweke
(1989)], but optimization procedures themselves generally involve integrals that in
turn require numerical approximation. Adaptive methods use previously drawn xi to
identify large values of f ( x ) / j ( x ) , w(x), or 92(x)w(x) and modify j(x) accordingly
[Evans (1991)]. Such procedures can be convenient but are limited by the fact that xi
is least likely to be drawn where j(x) is small. Informal, deterministic methods for
tailoring j(x) have worked well in some problems in Bayesian econometrics [Geweke
In Problem I the objective in choosing the importance sampling density is to find
j(x) that mimics the shape of f(x) as closely as possible; the relevant metric is (4.3.2).
Finding j(x) e ( f ( x ) will drive O-2 t o zero, but this amounts to analytical solution of
the problem since f D j ( x ) d x = 1. In Problem E the relevant metric (4.3.4) is more
complicated, involving both the variance of g(x) and the closeness of j(x) to p(x)
as reflected in w(x) = p(x)/j(x). As long as varp[9(x)] > 0, no choice of j(x) will
drive cr2 to zero, and if varp[9(x)] = 0, then Problem E reduces to Problem I. If
j(x) e(p(x), then ~r2 = varv[9(x)], which can serve as a benchmark in evaluating
the adequacy of j(x). The ratio c~2/varp[9(x)] has been termed the relative numerical
efficiency of j (x) [Geweke (1989)]: it indicates the ratio of iterations using p(x) itself
as the importance sampling density, to the number using j(x), required to achieve the
same accuracy of approximation of E. Relative numerical efficiency much less than
1.0 (less than 0.1, certainly less than 0.01) indicates poor imitation of p(x) by j(x) in
the metric (4.3.4), possibly the existence of a better importance sampling distribution
or the failure of the underlying convergence conditions (4.3.3).

4.4. A note on the choice of method

There is considerable scope for combining the methods discussed in Sections 3 and 4.
For example, the pseudorandom number generation in making draws from the popu-
lation with probability density h(x), in the case of acceptance sampling, or j(x), in
the case of importance sampling, generally will involve several of the methods dis-
cussed in Section 3.2. In even moderately complex problems, the investigator needs
to tailor these methods, balancing computational efficiency against demands for the
development and checking of reliable code.
Ch. 15: Monte Carlo Simulation and Numerical Integration 765

Acceptance sampling and importance sampling are clearly similar. In fact, given
a candidate source density, one has the choice of undertaking either acceptance or
importance sampling. A straightlorward comparison of approximation errors indicates
the issues involved in the choice. In Problem I, the variance in acceptance sampling

o-~-- /D [g(x)-- I]2p(x)dX = /D92(x)p(x)dx-- I2

if by draw we mean accepted
draw. But if instead we mean every draw from the
source distribution, the variance is

o-~=aI-1 2, a=supg(x),
from (4.2.1). In importance sampling, where all draws are used but differentially
weighted, the variance is

O'2Z/D 92(x)p(x) dx - 1 2,

from (4.3.2). Hence given a choice between acceptance and importance sampling in
Problem I, importance sampling is clearly preferred: it conserves information from
all draws, whereas the rejected draws in acceptance sampling require execution time
but do not further improve the accuracy of the approximation.
For Problem E the situation is different. The variance is

cr~= /D [9(x) -- E]2p(x)dx

for acceptance sampling (see (4.2.2)) if we count only accepted draws and

o-52= a [ [ 9 ( z ) - Z]Sp(z) dz, a : sup[p(z) / ]~(z) ] ,


if we count all draws (see (4.2.3)). For importance sampling, expressing (4.3.4) in the
notation of acceptance sampling, we have

o-2 fD[9(,, ) - El2"w(x)p(,,)dx, w(x) = p(x)/h(x).

Since o-~ ~ o-~ ~ o-h, a choice between acceptance and importance sampling on
grounds of computational efficiency rests on the particulars of the problem. If evalu-
ation of 9(x) is sufficiently expensive relative to evaluation of p(x)/h(x),
sampling will be more efficient; otherwise, importance sampling will be the choice.
In fact one may combine acceptance and importance sampling. Let c be any positive
constant, and define

'p(zi)/ch(zi) ifp(zi)/h(zi) >~c,

w(zi) = 1 with probability p(zi)/ch(zi) if p(z~)/h(zi) < c,
0 otherwise.

n n

i=1 i=1

For any given problem there will be a value of c that minimizes the variance of
approximation error relative to required computing time. This may be found experi-
mentally; or for some analytical methods, see Mfiller (1991, Chapter 2). The hybrid
method can result in dramatic increases in efficiency when computation of 9(x) is
relatively expensive (or there are many such functions to be evaluated) and the weight
function w(x) is small with high probability.
A more fundamental choice is that between the simulation methods discussed in
this and the previous section and the deterministic algorithms outlined in Section 2.
Many problems in economics require integration in very high dimensions. (Two ex-
amples are presented in Section 7.) For such problems the most practical deterministic
procedures are the low discrepancy methods of Section 2.3. Tables 15.2 and 15.3 pro-
vide some specific comparisons for dimensions as high as d = 100. (Execution time
for quadrature methods in these problems is approximately 8 × 4 a - l ° seconds on a
Sun 10/51 workstation: 0.01 seconds for d = 5, 8 seconds for d = 10, 3 months for
d = 20, about 104 times the estimated age of the universe for d = 40, . . . . )
Table 15.2 extends the analysis of the same problem taken up in Section 2.3. As
noted there, the bounds in (2.3.1) and (2.3.2) are useless for this problem and most
others. The actual Halton errors presented in Table 15.2 were found by direct com-
putation, using the first d primes as the bases. The Monte Carlo errors were found
analytically. Two error bounds are presented, one based on a 95% confidence interval
(:t:1.96cr) and a second based on a 100(1-10-12)% confidence interval (i7.13cr).
For lower dimensions the comparison is dominated by the convergence of the Hal-
ton sequence at rate logm/m compared with Monte Carlo at rate m-1/2: the Halton
sequence is much more accurate. But for any reasonable fixed value of m, the compar-
ison in higher dimensions is dominated by an approximately exponential rate of error
increase in d for the Halton sequence, contrasted with the rate d 1/2 for Monte Carlo.
For m = 1,000 iterations, Monte Carlo is more efficient for d exceeding about 25 if
one applies the p = 0.05 standard and for d exceeding about 45 for the p = 10 -12
standard. For m = 50,000 the breakpoints occur around d = 35 and d = 110, respec-
tively. (The Halton error is not monotone decreasing in m because of the systematic
way in which points are selected.)
Table 15.2
Error comparison for Halton sequence and independence Monte Carlo

d Halton error Halton bound MC error(p = 0.05) MC error(p = 10-12)

rr~ = 1,000
5 --7.526 x 10.3 9.302 x 102 0.04000 0.1455
10 --0.02807 6.053 x 1019 0.05658 0.2058
20 --0.1097 2.616 x 1029 0.08002 0.2911
40 -0.3824 8.225 x 1072 0.1132 0.4117
60 -0.8202 2.467 x 10121 0.1386 0.5042
80 - 1.476 1.250 x 10173 0.1600 0.5822
• 100 -2.062 1.447 × 10227 0.1789 0.6509
m = 50, 000
5 --2.786 × 10-4 1.071 x 102 5.658 × 10.3 0.02058
10 -8.861 × 10-4 3.533 x 101° 8.002 x 10-3 0.02911
20 -3.537 x 10-3 3.225 x 1030 0.01132 0.04117
40 -0.02216 3.356 x 1075 0.01600 0.05822
60 -0.02768 2.2990 × 10127 0.01960 0.07131
80 -0.05681 2.4186 x 10181 0.02263 0.08234
100 -0.08779 5.235 x 10237 0.02530 0.09205

Table 15.3 provides a comparison of these methods for an e x a m p l e of Prob-

l e m E. T h e Halton s e q u e n c e is first m a p p e d into the normal distribution applying
the inverse-c.d.f, transformation in each dimension. Each of the five panels pro-
vides approximations to successively higher m o m e n t s , p , of the multivariate nor-
m a l distribution. W i t h i n each panel, the comparison is d o m i n a t e d by the same fea-
tures noted for Table 15.2. C o m p a r i s o n s across panels are d o m i n a t e d by important
characteristics of each method. M o n t e Carlo errors are proportional to El(z) 2p] =
{1-I~=112(P- j ) + 1]} 1/2, where z ~ iV'(0, 1). Halton errors reflect an interaction
b e t w e e n the ordering o f the points and the characteristics of :v~. W h e n p is odd, z Pi is
an odd m o n o t o n e increasing function of zi, whereas the standard n o r m a l probability
density f u n c t i o n is even. F o r any fixed ra, the Halton points systematically exclude
positive z i values for which the corresponding - z i value has b e e n included. H e n c e
the error is always negative (as it was in Table 15.2 for the same reason). W h e n p is
even, this is not the case and the size of the error is smaller as well. The tendency of the
Halton sequence to systematically exclude larger z i has m o r e severe consequences for
evaluation of the integral the higher the value of odd p. Thus, for p = 5 i n d e p e n d e n c e
M o n t e Carlo b e c o m e s d o m i n a n t for values of d exceeding a fairly small threshold.
The largest problems worked for Table 15.3 (d = 100, rn = 5 0 , 0 0 0 ) required about
75 seconds on a Sun 10/51 w h e n solved using a Halton sequence. I n d e p e n d e n c e M o n t e
Carlo was about 15 times faster in every case. The difference reflects the inherent
speed of linear congruential generators, contrasted with the floating point operations
r e q u i r e d to g e n e r a t e a H a l t o n s e q u e n c e . F o r m o r e c o m p l e x and realistic problems

the relative speed of independence M o n t e C a r l o is l e s s i m p o r t a n t , s i n c e c o m p u t a t i o n
t i m e t y p i c a l l y will b e d o m i n a t e d b y s u b s e q u e n t c o m p u t a t i o n s i n v o l v i n g t h e s e q u e n c e s
produced by either method.

Table 15.3
Error comparison for Hahon sequence and Monte Carlo

m= 1,000 m=50,000
Halton error Monte Carlo error Halton error Monte Carlo error
d (p=0.05) ( p = 10 -12) (./9=0.05) ( p = 10 -12)
f(x) = ~i=l x l , x ~ N'(0, Id); evaluate E[f(x)]
5 -0.04190 0.1386 0.5042 - 1 . 8 0 8 x 10 - 3 0.01960 0.07131
10 -0.1411 0.1960 0.7131 - 5 . 5 5 2 × 10 - 3 0.02772 0.1008
20 -0.5497 0.2772 1.008 -0.02076 0.03920 0.1426
40 -1.731 0.3920 1.426 -0.06548 0.05544 0.2017
60 -3.362 0.4801 1.747 -0.1461 0.06790 0.2470
80 --5.658 0.5544 2.017 -0.2573 0.07840 0.2852
100 --7.807 0.6198 2.255 -0.2336 0.08765 0.3189

f(x) = ~id=l X2z, X ~ A/'(0, Id); evaluate E[f(x)]

5 -0.0496 0.2400 0.8733 -1.664 x 10 - 3 0.03395 0.1235

10 -0.0941 0.3395 1.235 -2.418 × 10 - 3 0.04801 0.1747
20 -0.0864 0.4801 1.746 -4.611 × 10 - 3 0.06790 0.2470
40 0.2436 0.6790 2.470 -6.367 × 10 -3 0.0962 0.3493
60 0.5680 0.8316 3.025 -3.662 × 10 -3 0.1176 0.4278
80 0.4982 0.9602 3.493 0.0243 0.1358 0.4940
100 1.449 1.074 3.906 -0.04932 0.1518 0.5523

f(x) :2,\1 3
Xi , x ~ Af(0, Id); evaluate E[f(x)]
5 --0.3500 0.5368 1.953 -0.02286 0.07591 0.2761
10 -1.083 0.7591 2.762 -0.06800 0.1073 0.3906
20 -4.072 1.074 3.906 -0.2386 0.1518 0.5523
40 -11.86 1.518 5.523 -0.6821 0.2174 0.7811
60 -19.56 1.859 6.765 -1.411 0.2630 0.9567
80 -27.78 2.147 7.811 -2.641 0.3036 1.104
100 -36.18 2.400 8.733 -2.218 0.3395 1.235

f(x) ~d x4 x ~ Af(0, Id); evaluate E[f(x)]

5 -0.7442 1.420 5.167 -0.03612 0.2008 0.7307

10 -1.046 2.008 7.307 -0.04667 0.2840 1.0333
20 -0.8494 2.840 10.33 -0.07076 0.4017 1.461
40 -7.504 4.016 14.61 0.03523 0.5681 2.067
60 16.88 4.919 17.90 0.1150 0.6929 2.521
80 23.48 5.681 20.66 -0.1898 0.8034 2.923
100 32.94 6.351 23.10 -0.7909 0.8982 3.268
Table 15.3

m = l, 000 m = 50, 000

Halton error Monte Carlo error Hahon error Monte Carlo e~or
d (p=o.05) (v=lO -~2) (v=o.05) (p=lO -~2)
S(x) = ~ : 1 x~, x ~ N'(0, Id); evMuate E[f(x)]
5 -3.216 4.260 15.50 -0.3365 0.6026 2.192
10 -1.043 6.025 21.92 -1.006 0.8521 3.100
20 --36.44 8.521 31.00 --3.433 1.205 4.384
40 -118.9 12.05 43.84 --9.549 1.704 6.200
60 --202.8 14.76 53.69 --14.50 2.087 7.593
80 --281.6 17.04 62.00 --13.11 2.410 8.760
100 --359.7 19.05 69.32 --23.97 2.695 9.803

These comparisons illustrate the general rule that simulation methods are preferred
for higher dimensional problems. If the dimension is very low, then quadrature meth-
ods are much faster and more accurate. For intermediate dimensions, quadrature is
impractical and low discrepancy methods are more accurate than simulation methods.
Just where the breakpoints occur is problem-specific, and the situation is complicated
by the fact that there are no useful independent assessments of approximation error
for low discrepancy methods. Simulation methods always provide an assessment of
numerical en'or as a by-product, for square-integrable functions. Combined with the
checks for robustness of results with respect to alternative uniform random num-
ber generators and seed values, these methods are practical and reliable for a much
wider range of problems than is any deterministic algorithm. As we shall see, their
application in complex problems can be very natural.

5. Variance reduction

In any of the independence Monte Carlo methods a single draw can be replaced by
the mean of M identically but not independently distributed draws. For example, in
simple Monte Carlo for Problem I,

i=1 Lj=I .1

For any i ¢ k xij and xkt are independent, whereas xij and xiz are dependent. Since
all x~j are drawn from the distribution with probability density p(x),

I N , M a.s~ I , ~ / - N ( f N , M -- f ) d ~ J r ' ( 0 , O"2),

M 2
a.s~ 0 . . 2 .
i=1 j=l

The idea is to set up the relation among x i l , . . . , X i M in such a way that a .2 <
1/Mvarp[9(x~j)]. If in addition the cost of generating the M-tuple is insignificantly
greater than the cost of generating M independent variables from p(x), then IN,M
provides a computationally more efficient approximation of I than does IN.
There are numerous variants on this technique. This section takes up four that
account for most use of the method: antithetic variables, systematic sampling, con-
ditional expectations, and control variables. The scope for combining these variance
reduction techniques with the methods of Section 4 or Section 6 is enormous. Rather
than list all the possibilities, the purpose here is to provide some appreciation of the
circumstances in which each variant may be practical and productive.

5.1. Antithetic Monte Carlo

This technique is due to Hammersley and Morton (1956) and has been widely used in
statistics, experimental design, and simulation [e.g., Mikhail (1972), Mitchell (1973),
Geweke (1988)]. In antithetic simple Monte Carlo integration M = 2 correlated var-
iables are drawn in each of N replications. Then,

= ½{ + cov[9(x ,), }.
As long as cov[g(x~l), 9(x~z)] < 0, antithetic simple Monte Carlo integration with
N/2 replications will have smaller error variance than simple Monte Carlo iteration
with N replications, and the computational requirements will be about the same.
To focus on the main ideas, consider the situation in which p(x) is symmetric about
a point # in Problem I set out in Section 4. In this case xil = # + wi, x~2 = # - w~
describes a pair of variables drawn from the distribution with p.d.f, p(x) with corre-
lation matrix - I . If 9(x) were a linear function, then var{l[9(xil) + 9(xi2)]} = 0,
and variance reduction would be complete. (Clearly I = 9(#); this case is of interest
only as a limit for numerical integration problems.) At the other extreme, if g(x) is
also symmetric about #, then var{½[g(x~l) + 9(xi2)]} = var[9(x)]: U replications of
antithetic simple Monte Carlo integration will yield as much information as N repli-
cations of simple Monte Carlo, but will usually require about double the number of
computations. As an intermediate case, suppose that d(y) = 9(xy) is either monotone
nondecreasing or monotone nonincreasing for all x. Then g ( X i l ) - - I and g ( x i 2 ) -- f
must be of opposite sign if they are nonzero. This implies cov[g(xil),g(xi2)] < O,
whence cr.2 ~< ½var[9(x)] = cr2/2, and so antithetic simple Monte Carlo integration
produces gains in efficiency.
The use of antithetic Monte Carlo integration is especially powerful in an important
class of Bayesian learning and inference problems. In these problems x typically rep-
resents a vector of parameters unknown to an economic agent or an econometrician,
and p(x) is the probability density of that vector conditional on information available.
The integral I could correspond to an expected utility or a posterior probability. If the
available information is based on an i.i.d, sample of size T, then it is natural to write
pT(x) for p(x). As T increases, the distribution pT(X) generally becomes increasingly
symmetric and concentrated about the true value of the vector of unknown parame-
ters, reflecting the operation of a central limit theorem. In these circumstances g(x)
is increasingly well described by a linear approximation of itself over most of the
support of pr(x), as T increases. Suppose that the agent or econometrician approx-
imates I using simple Monte Carlo with accuracy indicated by ~r~r or by antithetic
simple Monte Carlo with accuracy indicated by cr~r2. Given some side conditions,
mainly continuous differentiability of 9(x) in a neighborhood of the true value of the
parameter vector x and a nonzero derivative of 9(x) at this point, it may be shown
that cr~r2/c@ --+ 0 [Geweke (1988)]. Given additional side conditions, mainly twice
continuous differentiability of 9(x) in a neighborhood of the true value of the param-
eter vector x, it may be shown that T~r~2/c~ converges to a constant. The constant is
inversely related to the magnitude of O9(x)/Ox and directly related to the magnitude
of ~2g(X)/~x~x t, each evaluated at the true value of the parameter vector x [Geweke
(1988)]. This result is an example of acceleration, because it indicates an interesting
sequence of conditions under which the relative advantage of a variance reduction
method increases without bound.
Application of the method of antithetic variables with techniques more complicated
than simple Monte Carlo is generally straightforward. In the case of importance sam-
pling, Xil and x~2 are drawn from the importance sampling density j(x). In Problem I
the term ½[f(xil)/j(xil) + .f(x~2)/j(xi2)] replaces f(x~)/j(x~). In Problem E, define
w(x) = p(x)/j(x) as before. Then

N E,
~ i = , [W(Xil) -~ W(Xi2)]
- E) _ 2 + / ( 0 ,

N~i=IN [g(xil)w(xi,)+g(xi2)w(xi2) __ ~ N W(Xil) -~- W(Xi2)] a.s.

S~v = [ ~(~")+~(~:) ~ cr.2-
4{ ~i~1 [w(xi,)+ w(xi2)] }2
772 Z Geweke

These results are valid for any antithetic variables algorithm, even if j (x) is not
symmetric and even if the variance of the approximation error cr2 is increased rather
than decreased in moving to the use of antithetic variables. The essential requirements
are that the xijs be drawn from the importance sampling distribution and that x~ and
xm be independent for i ¢ k.
In complex problems involving multivariate x, pseudorandom variables often may
be generated by use of successive conditionals for x t = ( xto ) , . . . , X !(~)),

p(x) = p(x(l))p(x(=)I x o ) ) . . . p ( x ( ~ ) I x o ) , . . . , x ( ~ - l ) ) .

In such cases a pair of antithetic variables Xil and xi2 may be created by constructing
a pair for a single, convenient subvector xu). Especially if g(x) = g(x(j)), the benefits
of antithetic Monte Carlo will then be realized in both Problem I and Problem E. An
example of this use of antithetic variables is taken up in Section 7.2.

5.2. Systematic sampling

Systematic sampling [McGrath (t970)] combines certain advantages of determinis-
tic and Monte Carlo methods. The former achieve great efficiency by systematically
choosing points for evaluation in specific low-dimensional problems; the latter pro-
duce indications of accuracy as a byproduct and are amenable to high-dimensional
problems. Systematic sampling specifies an m-tuple of points as a deterministic func-
tion of a random vector u,

xj = f j ( u ) (j- 1,...,'m),

with the property that the induced distribution of every xj is that of the probability
density function p(x).
As a leading example consider the case of univariate x, with pseudorandom vari-
ables from the distribution of x constructed using the inverse c.d.f, method (Sec-
tion 3.2). Denote F(e) = P[x ~< c], suppose u~ (i = 1 , . . . , N) are independently and
uniformly distributed on the unit interval, and take

- F + (5 --

where "[-]" denotes greatest fractional part. Clearly the method need not be limited
to evenly spaced grids; e.g., a Halton sequence (Section 2.3) could just as easily be
applied. Extension to higher dimensions is straightforward, but is subject to all of the
problems of deterministic methods there. The advantage of systematic methods is that
approximation error is generally O(Tr/, - 1 ) w h e r e a s that in Monte Carlo is Or(N-I~2).
In high-dimensional problems systematic sampling can be advantageous when con-

fined to a subset of the vector x that is especially troublesome for Monte Carlo and/or
is an important source of variation in the function 9(x). As an example of the former
condition, suppose it is difficult to find an importance sampling density that mimics
p(x), but

x'= (' ')x o ) , x(2)

k, 1 x'm,l txm2

a good importance sampling density for the marginal p.d.f, p(x0) ) is available, and
the inverse c.d.f. F -1 (p I x0)) of the conditional distribut!on of x(2) can be evaluated.
One may generate Xl(i) together with corresponding importance sampling weight w~;
draw ( u l , . . •, urn2) independently distributed on the unit interval; create the systematic

x(2)~j,...y.., = F -~ ([u~ + j l / g l ] , . . . , [u.,~, + j~./e~.])

(Jk= 1,...,gk; k= 1,...,m2).

Then record

.qi = [ ~ I gk "'" g X(1)i,X(2)ij,...j,,2)

Lk=l _1 jl=l jm2=l

along with each weight w~. Previous expressions in Section 4.3 for IN, G2, and @¢
are then valid with g~ in place of 9(xi). In particular (4.3.2) is still true, and s 2 may
be used to assess the increase in accuracy yielded by systematic sampling with higher
values of the gk.

5.3. The use of conditional expectations

Suppose there is a partition of x, x' = (x~l), x~2)), such that

(J(X) ~- g ( X ( I ) , X ( 2 ) ) --~ g*(X(1))~(X(2)) ,

where g(-) is linear; p(x) -- p(x0),x(2)) = p(x0))p(x(a ) I x(1)); it is possible to draw

pseudorandom vectors from the marginal distribution for x0) with p.d.f, p(x0)); and
E(x(2) I x0)) is known analytically. Then
774 J. Geweke


varp(x(,~){9* (x0))g[E(x(2) I x0))] } ..< varv(x)[g(x)].

Consequently, application of Monte Carlo methods directly in (5.3.1) will produce

an approximation error with smaller variance than would Monte Carlo in the general
framework set forth in Section 4.
The use of conditional expectations in fact bears a close relationship to antithetic
Monte Carlo integration. In particular, if one could draw antithetic variables x(2)il and
X(2)i 2 from the distribution with p.d.f, p(x(2) I x(1)) with perfectly negative correlation,
then 1 (x(2)i 1 q-x(2)i2) = E(x(2) [ x(l)), and exactly the same result would be obtained.
More generally, whenever 9(x) is a function of x(1) only, it is usually worth noting
whether E[g(x0) ) [ x(2)] can be evaluated analytically. If so, then the variance of
approximation error can be reduced by using the function of interest E[9(x(1 )) I x(2)~]
rather than g(x(1)i). Since

9(x(1)):E[g(x(,))lx(2)l+r / with cov{r/,E[g(x(1))lx(2)]}--0,

varp(x(2~){E[9(x(1))lX(z)]}<~varp(x(l~)[9(xo)) 1•

Against this improvement should be balanced the time required for the additional
computations, which are generally of no further use in generation of the xi; this time
is usually small.

5.4. Control variables

It is often the case that one is able to solve approximations to Problem I or Problem E
analytically. For example, if the mean # of the distribution with p.d.f p(x) is known
and one has available a linear approximation 9 (e) (x) of the function 9(x), then the
mean of 9 (e) (x) is 9 (~)(#). Moreover if {xi}~=l is a pseudorandom sample drawn from
the distribution with p.d.f, p(x), then 9(x~) and 9 (e) (x~) will be positively correlated
if the linear approximation is good for most xi. In this situation the method of control
variables, introduced by Kahn and Marshall (1953) and Hammersley and Handscomb
(1964), can be used to reduce the variance of the approximation error in I N or Ejv.
We develop the specific method for simple Monte Carlo integration in Problem I; ex-
tension to more involved methods is straightforward. Let JN = N-~ ~ = l h(xi) have
known mean J. (In the example given h(x~) = g(e)(xi), JN -= N -1 ~iN-.l g(g)(xi)
and J = 9 (e) (#).) Consider approximations of the form

I~N = I N + ~ ( J g -- J),
where IN is computed as before. It is the case that I~v k ~ I, and as long as varp[h(xi)]
exists, a central limit theorem may still be used to evaluate numerical accuracy. One
can easily verify that var(I~v ) is minimized by/3 = --cov(JN, IN)/var(JN), and in
this case

covZ(JN, IN) __ var(IN)[1 -- corr2(JN, IN)].

var(I~v) = var(IN) var(Ju )

Usually the parameter/3 is unknown. It may be estimated in the obvious way from
the replications.
This method is easily extended to the case in which a vector of estimates JN =
( J ~ ) , . . . , J ~ ) ) ' with known mean J = (J('),..., J(q))' is available. If we denote

= var(JN), c = COV(JN,IN),
qxq qxl

then the variance of the approximation

& = IN +/~'(JN -- J)

is minimized by/3 -- •-Ic, and in this case

v a r ( & ) = var(IN) -- c ' . , V % = var(±~) [ 1 e'.Z'-le 1 "


6. Markov chain Monte Carlo methods

All of the independence Monte Carlo methods for integration assume the ability to
efficiently generat e pseudorandom variables from a distribution with specified prob-
ability density function p(x). But in many economic problems it is difficult or im-
possible to find a generation algorithm that is sufficiently efficient to be practical. An
instructive limiting case is the one in which the constituents of x are independently


One could construct an acceptance sampling algorithm with a source density hi(zi)
corresponding to each pi(zi), and accept the draw with probability p(z)/ah(z), where

p(z) v(zO
a=sup~=Hai, ai=sup~-Z-~ (i= 1,...,m).
z I~k~} i=1 z i i%~i)
776 J. Geweke

Since a is directly proportional to the time required to obtain an accepted draw (see
Section 3.2) this expression makes clear that acceptance sampling can be subject to its
own curse of dimensionality if the source density is constructed element-by-element.
Essentially the same difficulty can arise in importance sampling, where it is manifested
in only a few weights w(xi) accounting for the sum.
This example is of interest only as a limiting case. If the xi really were independent,
one could employ acceptance sampling element-by-element, and computation time
would then be proportional to ~ i ~ 1 ai. An obvious extension of this idea to the
general case is to write

p(x) ~-p(Xl)I~Pill ..... i--l(Xi t Xl,'..,Xi l)

and employ acceptance or importance sampling for each conditional. The difficulty
here is that construction of probability density kernels for the marginal in xl and
all but the last conditional require analytic integration. Notable simple cases aside,
this is not possible, and it remains impossible lbr subvectors as well as individual
This section takes up a recently developed generalization of independence Monte
Carlo that has become known as Markov chain Monte Carlo. The idea is to construct
a Markov chain with state space D and invariant distribution with p.d.f, p(x). Fol-
lowing an initial transient or burn-in phase, simulated values from the chain form a
basis for approximating Ep[g(x)], thus solving Problem E. If the p.d.f, p(x) does not
contain an unknown factor of proportionality p*, then Problem I is solved as well.
What is required is to construct an appropriate algorithm and verify that its invariant
distribution is unique, with p.d.f, p(x).
Markov chain methods have a history in mathematical physics dating back to the
algorithm of Metropolis et al. (1953). This method, which is described in Hammersley
and Handscomb (1964, Section 9.3) and Ripley (1987, Section 4.7), was generalized
by Hastings (1970), who focused on statistical problems, and was further explored by
Peskun (1973). A version particularly suited to image reconstruction and problems in
spatial statistics was introduced by Geman and Geman (1984). This was subsequently
shown to have great potential for Bayesian computation by Gelfand and Smith (1990).
Their work, combined with data augmentation methods [Tanner and Wong (1987)],
has proven very successful in the treatment of latent variables and other unobservables
in economic models. (An example is given in Section 7.1.) Since 1990 application of
Markov chain Monte Carlo methods has grown rapidly; new refinements, extensions,
and applications appeal- almost continuously.
This section concentrates on developing the methods, deferring serious examples to
Section 7. We begin with a heuristic introduction to two widely used variants of these
methods, the Gibbs sampler and the Metropolis-Hastings algorithm (Section 6.1).
Some theory of continuous state Markov chains required to demonstrate convergence
is given in Section 6.2. Easily verified sufficient conditions for convergence of the
Gibbs sampler are set forth in Section 6.3 and for convergence of the Metropolis-
Hastings algorithm in Section 6.4• Some practical issues in assessing the error of
approximation are treated in Section 6.5. Much of the treatment here draws heavily
on the work of Tierney (1991, 1994), who first used the theory of general state
space Markov chains to demonstrate convergence, and Roberts and Smith (1994),
who elucidated sufficient conditions for convergence that turn out to be applicable in
a wide variety of problems in economics.

6. I. Two Markov chain Monte Carlo algorithms

Motivated by the role of p(x) in Problem I or Problem E, discussion here proceeds

assuming that x is continuously distributed. However, there is no harm in regarding
x as discrete on a first reading. A full development covering both the continuous and
discrete cases is given in Section 6.2.
The Gibbs sampler begins with a partition, or blocking, of Xmxl,

Xtz (', • . k)"

X !

For i = 1 , . . . ,k, x}i ) = ( x i l , . . . ,x,i,~(i)) and re(i) ~> 1; ~ik 1 'rrt(i) = m; and the
xij are the components of x. Let p(x(i) I x(_i)) denote the conditional p.d.f.s induced
by p(x), where x(_i) = {x(j), j # i}.
Suppose we were given a single drawing x°,x '° = (x}°),... ,x(k)) tO , from the distri-
bution with p.d.f, p(x). Successively make drawings from the conditional distribution
as follows:

x ( 1 ) , x ( 3 ) , . . . , x k) ,

Xll), ..., x'( j _ I ) , X ( jo+ I ) , . . . , X ( k ) o ) ,

xl ))
This defines a transition process from x ° to x I = (x}Jl) , " ' , x'l(k))" The Gibbs sampler
is defined by the choice of blocking and the forms of the conditional densities induced
by p(x) and the blocking• Since x ° ~ p(x), (xll), • " , x (j-l),
1 x (j j ) , x 0( j + ~ ) , . . . , x (0~ ) ) ~
p(x) at each step in (6.1.1) by definition of the conditional density. In particular,
x ~ p(x).
778 J. Geweke

Iteration of the algorithm produces a sequence x °, x l , . . . , x t , . . . which is a real-

ization of a Markov chain with probability density function kernel for the transition
from point x to point y given by

Kc(x,y) = 1-Ip[y(e) lx(j)(J > e), y u ) ( j < £)].


Any single iterate x t retains the property that it is drawn from the distribution with
p.d.f, p(x).
For the Gibbs sampler to be practical, it is essential that the blocking be chosen
in such a way that one can make the drawings (6.1.1) in an efficient manner. For
many problems in economics, the blocking is natural and the conditional distributions
are familiar; Section 7.1 provides an example. In making the drawings (6.1.1) all
the methods of Sections 3 and 4 are at our disposal. Observe that in this context
acceptance sampling is attractive relative to importance sampling, since the former
produces independent, identically distributed, unweighted drawings from the condi-
tional distribution.
Of course, it is generally difficult or impossible to make even one initial draw from
the distribution with p.d.f, p(x). The purpose of that assumption here is to marshal an
informal argument that p(x) is the p.d.f, of the invariant distribution of the Markov
chain. A leading practical problem is to elucidate conditions in which the distribution
of x t will converge to that corresponding to p(x) for any choice of x ° in the domain
D, and we turn to this in Section 6.3.
The Metropolis-Hastings algorithm begins with an arbitrary transition probability
density function q(x, y) and a starting value x °. If x t = x, the random vector generated
from q(x, y) is considered as a candidate value for x t+l. The algorithm actually sets
x t+l = y with probability

p(y)q(y,x) }
a(x,y) :min[~y),t ;

otherwise, the algorithm sets x t+~ - x : x t. This defines a Markov chain with a
generally mixed continuous-discrete transition probability from x to y given by

{q(x,y)~(x,y) if y # x,
K ( x , y) : 1 - fD q(x, z)c~(x, z) dz if y : x.

This form of the algorithm is due to Hastings (1970). The Metropolis et al. (1953)
form takes q(x, y) = q(y, x). A simple variant that is often useful is the independence
chain [Tierney (1991, 1994)], q(x, y) = j(y). Then

~p(y)j(x) } . fw(y) }
c~(x,y)=min~~,l =mm~w---~,l ,
where w(x) -- p ( x ) / j ( x ) . The independence chain is closely related to acceptance

s~unpling (Section 4.2) and importance sampling (Section 4.3). But rather than place
a low (high) probability of acceptance or a low (high) weight on a draw that is
too likely (unlikely) relative to p(x), the independence chain assigns a high (low)
probability of accepting the candidate for the next draw.
There is a simple two-step argument that motivates the convergence of the sequence
{xt } generated by the Metropolis-Hastirtgs algorithm to p(-). [This approach is due to
Chib and Greenberg (1995).] First, observe that if any transition probability function
p(x, y) satisfies the reversibility condition

p(x)p(x, y) = p(y)p(y, x),

then it has p(.) as its invariant distribution. To see this, note that

f p(x)p(x, y) dx = f p(y)p(y, x) dx = p(y) / p(y, x) dx = p(y).

The second step is to consider the implications of the requirement that K(x, y) be
reversible: p(x)K(x, y) = p(y)K(y, x). For y # x it implies that


Suppose (without loss of generality) that p(x)q(x,y) ~> p(y)q(y,x). If we take

c~(y, x) = 1 and c~(x,y) = p(y)q(y, x ) / p ( x ) q ( x , y), this equality is satisfied.
In implementing the Metropolis-Hastings algorithm, the transition probability den-
sity function must share two important properties. First, it must be possible to generate
y efficiently from q(x, y). All the methods of Sections 3 and 4 are potential tools for
these drawings. (Once again, acceptance sampling is attractive relative to importance
sampling.) A second key characteristic of a satisfactory transition process is that the
unconditional acceptance rate not be so low that the time required to generate a
sufficient number of distinct x' is too great.

6.2. Mathematical background

Let tSxt't~Jt=0be a Markov chain defined on D C_ ~r~ with transition kernel K " D x
D --+ ~+ such that, with respect to a c~-finite measure u on the Borel c~-field of ~"~,
for u-measurable A,

P ( x ~ c A lx ~-* = x) = f / ¢ ( x , y ) d u ( y ) + ~'(X)XA(X),
780 J. Geweke


r 1 if x c A,
r(x)=l- K(x,y)du(y) and XA(X)='[0 ifxCA.

The measure u will be Lebesgue for continuous distributions and discrete for discrete
The transition kernel K is substochastic: it defines only the distribution of accepted
candidates. Assume that K has no absorbing states, so that r(x) < 1 Vx E D. The
corresponding substochastic kernel over t steps is then defined iteratively,

K (t) (x, y) = f K (t- 1)(x, z)/((z, y) du(z) @ K (t- 1)(x, y)r(y)

+ [.r'(x)]t--lK(x, y).

This describes all t-step transitions that involve at least one accepted move. As a
function of y it is the p.d.f, with respect to u of x t, given x ° = x, excluding realizations
with x t = x Vj = 1 , . . . , t .
An invariant distribution for the Markov chain is a function p(x) that satisfies

P(A)=/AP(X) du(x) = ~ { ./A K ( x , y ) d u ( y ) + r(x)XA(X)}p(x) du(x)

= ~ P(x t E A I x t - ' = x)p(x) du(x)

for all u-measurable A. Let D* = {x c D: p(x) > 0}. The kernel K is p-irreducible
if for all x C D*, P(A) > 0 implies that P(x t E A I x° = x) > 0 for some t /> 1.
Situations like the one shown in Fig. 15.4, where the support is disconnected and the
Markov chain is the Gibbs sampler, cannot arise if K is p-irreducible. Note that if
x ° E Di, it is impossible that x t C Dj (j 7~ i, any t > 0). In the situation portrayed
in Fig. 15.4, there are two invariant distributions, one for D~ (reached if x ° E D1)
and one for D2 (reached if x ° E D2).
The kernel K is aperiodic if there exists no u-measurable partition D = Us=o/3s
(r ~> 2) such that

P ( x t < B~,,,o~(,. ) I x ° - x e B0) = l, yr.

It is Harris recurrent if P[xt C B i.o.] = 1 for all u-measurable B with JBp(x) du(x) >
0 and all x ° E D. If a kernel is Harris recurrent, then it is p-irreducible.
Tierney (1994) shows the following.
(A) If K is p-irreducible, then p(x) is its unique invariant distribution.
)- Xl~)

Figure 15.4. The disconnected support D = D 112 D 2 for the probability distribution implies that a Gibbs
sampler with blocking (x(0 , x(2)) will not be Harris recurrent, In the example shown it cannot converge
from any starting value,

(B) If K is p-irreducible and aperiodic, then

lim f
t ---+c~ J D
IK~(xo,y)- p(y)[du(y) = 0
except possible for a set of xo of p-measure zero. If K is also Harris recurrent,
then this occurs for all xo.
(C) If K is aperiodic and Harris recurrent and 191 is p-integrable, then for all x ° E D,

N -~ ~g(x') ~ f~ 9(x)p(x)d.(x).

(A) and (B) follow immediately from Theorem 1 of Tierney (1994). Since K is
Harris recurrent and its invariant distribution is a proper distribution, it is positive
Harris recurrent and hence K is ergodic. Result (C) then follows from Tierney (1994),
Theorem 3.

6.3. Convergence of the Gibbs sampler

The Gibbs sampler requires that the conditional probability density functions

p[x(~) Ix(_~)] =p(x)/~ p(x)d~,(x(~)) (~-- l, .,k)


be well-defined on their supports. In this case the transition kernel density is

KG(x,y) = I ] p [ y ( e ) Ix(j) (j > g), y(j) (j < g)].
782 J. Geweke

If x ° E D, then p(x) is the density of an invariant distribution of the chain defined

by Ko:

DKG(x,y)p(x) du(x)
=P(Y(k) IY(-k)) J'P[Y(k-,)IX(k), Y(j) (J < k - 1}]

×/p[y(k-2) f ,,(k), ,,(~-,), ytj) (5 < k - 2)] ×...

× f~[y(2)I y(,), ,,(j)(J > :))/p[y(,)Ix(j)(5 > 1)]

×/p[xo) I x(j) (j > 1)] d~,~(,,(1))
x p[x(2) I xo) (j > 2)] du2(x(:z))p[x(3) [ x(.i) (j > 3)] d,/3(xo) ) x
x p[x(~¢_l)I x(k)] duk-i (x(k-t))p[x(k)] d'k(x(k))

=p(y(~) ¢y(_~)) fp[y(~_,) [ x(~), y(j) (j < k - 1)]

×/p[y(~_~) j x(~), x(~_,), y(j) (j < k - :)] ×...

×/PLY(:) I YO), xo) (J > 2)]

× / P[Yo) [x(j) (j > 2)]p[xo) Ix(j) (j > 3)] du3(xo) ) ×...

X p[X(k-,)I X(k)] dMk-I(X(k-l))p[X(k)] dUk(X(k))

=P(Y(k) IY(-k)) JP[Y(k-1))X(k),Y(j) (J < k'- 1)]

x/p[y(~_:)lx(k), x(k-~), y(y) (j < k-2)] ×...

×/P[Yo), y(:) ] x(j) (j > 3)] ×-.-

=-P(Y(k) I Y(-a)) / .PlY(k-I) t x(k), y(j) (j < k - 1)]

Ch. 15: Monte Carlo Simulation and Numerical Integration 783

x fp[y( -2) I x(k), x(k-l), y(j) (J < k - 2)]

× PlY(l), Y(2),-" ,Y(k-3) I x(k-1),x(k)]

~-P(Y(k) IY(-k)) J P [ Y ( k - 1 ) lx(k), Y(j) (J < k - 1)]

× PIY(I),Y(2),'-', Y(k-2)I x(k)lP[X(k)] d~'k (x(k))

=P(Y(k) I Y(-k))P[Y0),Y(2),'' .,Y(k-1)] ----P(Y)-

If ~, is discrete, p-irreducibility of KG is sufficient for results (A), (B), and (C) in
Section 6.2 [Tierney (1994)]. The continuous (Lebesgue measure) case is technically
more difficult, but it may be shown that three simple conditions are jointly sufficient
for results (A), (B), and (C) [Roberts and Smith (1994)]:
(1) p(x) is lower semicontinuous at 0;
(2) f p ( x ) dxi is locally bounded (i = 1 , . . . , k);
(3) D * is connected.
A function h(x) is lower semicontinuous at 0 if, for all x with h(x) > 0, there exists
an open neighborhood Nx 3 x and g > 0 such that for all y E Nx, h(y) ~> ~ > 0. This
condition rules out situations like the one shown in Fig. 15.5, where the probability
density is uniform on a closed set. For any point x on the boundary there is no open
neighborhood Nx 3 x such that for all y c Nx, h(y) is bounded away from 0. The
point A is absorbing.
The local boundedness condition, together with lower semicontinuity at 0, ensures
that the Markov chain is aperiodic. It does so by guaranteeing that for the sequence
of support sets B ' ( x ) = {y ¢ D*: K(~)(x,y) > 0}, Bt(x) C B ' + l ( x ) for all t >~ 1
and all x E D* [Roberts and Smith (1994, Lemma 3)].


y.- x(~)

Figure 15.5. The probability density p(x) is uniform on the closed set D and consequently is not lower
semicontinuous at 0. The point A is absorbing for the Gibbs sampler with blocking (x(~),x(2)), so if
x° = A convergencewill not occur.
Connectedness of D*, together with conditions (1) and (2), implies that the Gibbs
sampler is p-irreducible [Roberts and Smith (1994, Theorem 2)]. Conditions (2) and
(3) further imply that the probability measure P corresponding to p(x) is absolutely
continuous, and consequently [Tierney (1994, Corollary 1)] the Gibbs sampler is Har-
ris recurrent. Therefore p(x) is the unique invariant probability density of the Gibbs
These conditions are by no means necessary for convergence of the Gibbs sampler;
Tierney (1994) provides substantially weaker conditions. However, the conditions
stated here are satisfied for a very wide range of problems in economics and are
much easier to verify than the weaker conditions.

6.4. Convergenceof the Metropolis-Hastingsalgorithm

Take the transition probability density function q(x, y) of Section 6.1 to be a Markov
chain kernel with respect to u, q : D* x D* --+ ~+. Defining c~ : D* x D* --+ [0, 1]
as before, define KH : D* x D* --+ ~+ by

KH(x, y) = q(x, y)a(x, y).

This is the substochastic kernel governing transitions of the chain from x to y that are
accepted according to the probability a(x, y). The distribution p(x) dr(x) is invariant
if for all v-measurable sets A,

P(A) = fA p(x)dr(x)= fD
P[y E Z l x]p(x)dv(x).
Recalling that

P[y ~ A I x] = LKH(X,y)dv(y)+ [1--/DKH(x,z)dv(z)IXA(X),

Z) P[y C A I x]p(x) dr(x)

= L L KH(x,y)dv(y)p(x)dv(x)

+fDXA(X)p(x)dv(x)- fDJ; KH(X'y) dv(Y)XA(X)p(x)du(x)

: fD L KH(X,y) dv(y)p(x)dv(x)

÷iA,IXl IA KH(X, y ) d v ( y ) p ( x ) d r ( x ) .
Since p(x)KH(X, y) = min[p(x)q(y, x), p(x)q(x, y)] is symmetric in x and y, the last
expression reduces to fA p(x)du(x) = P ( x E A).
From this derivation it is clear that invariance is unaffected by an arbitrary scaling
of KH(x, y) by a constant c. The choice of c affects the properties of the Metropolis-
Hastings algorithm in important practical ways. Larger values of c result in fewer
rejected draws but slower convergence to p(x), whereas smaller values of c increase
the proportion of rejected candidates but accelerate the rate of convergence to p(x).
Roberts and Smith (1992) show that the convergence properties of the Metropolis-
Hastings algorithm are inherited from those of q(x,y): if q is aperiodic and
p-irreducible, then so is the Metropolis-Hastings algorithm. If q(x, y) is constructed
as a Gibbs sampler (as is often the case), then the conditions set forth in Section 6.3
may be used to verify aperiodicity and p-irreducibility. A Metropolis-Hastings chain
is always Harris recurrent, and therefore the invariant distribution p is unique.

6.5. Assessing convergence and numerical accuracy

In any practical application one is concerned with the discrepancy between E[9(x)] =
fD 9(x)p(x)dx and its numerical approximation N -I ~-~N=I g(x/). Consider the de-

2 - E[g(x)] = E I x° - E[ (x)l
N t=l t=l

+ t,, ° = A (x °) +
t=l t=l

The term AN(x °) is nonstochastic and in general nonzero, but l i m N - ~ AN (x°) = 0

if conditions set forth earlier in this section are satisfied. The purpose of a transient
or burn-in phase is to reduce AN(x°), but for any finite transient period it will still
be the case in general that AN(X °) ¢ 0. This difficulty is termed the convergence
or sensitivity to initial conditions problem. The term BN(x °) is stochastic and is
the analog of E,N -- t?, or IN -- I for acceptance or importance sampling. This term
vanishes as N -+ ec, but assessing its size is complicated by the fact that {x t} is
neither independently nor identically distributed. This difficulty may be termed the
numerical accuracy problem.
A leading cause of slow convergence is multimodality of the probability distribu-
tion, for example, as shown in Fig. 15.6 for a Gibbs sampler. In the limit multimodality
approaches disconnectedness of the support, and increasingly large values of N are
required for AN(X °) to be close to 0. This difficulty is essentially undetectable given
a single Markov chain: for a chain of any fixed length, one can imagine multimodal
distributions for which the probability of leaving the neighborhood of a single mode
786 J. Geweke



Figure 15.6. Iso-probabilitydensity contours of a multimodal bivariate distribution are shown. (Arrows
indicate directions of increased density.)Given sufficientlysteep gradients the Gibbs samplerwill converge
very slowly.

is arbitrarily small. This sort of convergence problem is precisely the same as the
multimodality problem in optimization, where iteration from a single starting value
can by itself never guarantee the determination of a global optimum. Multimodal
disturbances are difficult to manage by any method, including those discussed in Sec-
tion 4. In the context of the Markov chain Monte Carlo algorithms, the question may
be recast as one of sensitivity to initial conditions: different initial conditions will lead
to quite different chains, in Fig. 15.6, unless the simulations are sufficiently long.
A Markov chain Monte Carlo algorithm can be made fully robust against sensitivity
to initial conditions by constructing many very long chains. Just how one should
trade off the number of chains against their length for a given budget of computation
time is problem specific and as a practical matter not yet fully understood. Many
of the issues involved are discussed by Gelman and Rubin (1992), Geyer (1992),
and their discussants and cited works. In an extreme variant of the multiple chains
approach, the chain is restarted many-times, with initial values chosen independently
and identically distributed from an appropriate distribution. But finding an appropriate
distribution may be difficult: one that is too concentrated reintroduces the difficulties
exemplified by Fig. 15.6; one that is too diffuse may require excessively long chains
for convergence. These problems aside, proper use of the output of Markov chain
Monte Carlo in a situation of multimodality requires specialized diagnostics; Zellner
and Min (1992) have obtained some interesting results of this kind. At the other
extreme a single starting value is used. This approach provides the largest number of
iterations toward convergence, but diagnostics of the type of problem illustrated in
Fig. 15.6 will not be as clear.
In specific circumstances a central limit theorem applies to BN(x°), which may
therefore be used to assess the numerical accuracy problem. To develop one set
of such circumstances, suppose that the Markov chain is stationary. This could be
guaranteed by drawing x ° from the stationary distribution. Such a drawing would
Ch. 15." Monte Carlo Simulation and Numerical Integration 787

be time consuming (if not, i.i.d, sampling from p(x) is possible), but only one is
required. Alternatively, one could iterate the chain many times beginning from an
arbitrary initial value, discard all but the last iteration, and take this value as drawn
from the stationary distribution to begin a new chain. Suppose G = Ep[9(x)] and
varp[g(x)] are finite and denote "7i = covK[g(xt),g(xt+i)]. A Markov chain with
kernel K is reversible if K ( x , y) = K ( y , x) for all x, y E D. Metropolis-Hastings
chains are always reversible; Gibbs sampling chains are not [Geyer (1992, Section 2)].
If the Markov chain is stationary, p-irreducible, and reversible, then



and if cr2 < e~, then

V (gN -- c ) d H(o,

[Kipnis and Varadhan (1986)].

In the absence of reversibility, known sufficient conditions for central limit theorems
are strong and difficult to verify. For example, if for some m < e~ P ( x t+m c A I
x t = x ) / f A p(x) du(x) is bounded below uniformly in x, then D is a small state space
and {x t } is uniformly ergodic [Tierney (1991b, Proposition 2)]. Then if varp[9(x)] is
finite, there exists Or2 < 0<3 such that V ~ ( g u -- G) d J~(0, o'2). The boundedness
condition, however, is generally difficult to establish.
In neither circumstance is there a known sufficient condition for approximation of
the variance term O"2 of the central limit theorem. The problem is formally quite similar
to estimating the variance of the sample mean £'N = N -1 ~U=l Zt of a stationary
time series {zt}. In the time series problem, well-established mixing conditions (rates
of decay for cov(zt, zt+i)) are sufficient for consistent estimation of vat (2N) [e.g.,
Hannah (1970, pp. 207-210)]. In time series applications these conditions remain
assumptions. The difficulty in applying these conditions to Markov chain Monte Carlo
is that they cannot be established from verifiable fundamentals.
Nevertheless, applications of the time series procedures as if sufficient mixing
conditions obtain appear to give quite reliable results for real problems in economics.
That is, applying a central limit theorem as if the output of the Markov chain Monte
Carlo algorithm were a stationary process satisfying the mixing conditions yields
accurate probability statements about the output of the same algorithm applied to
the same problem with a new starting value and initial seed for the random number
generator [Geweke (1992a), Geyer (1992)]. This leads to a conservative but practical
procedure for assessing the accuracy and reliability of Markov chain Monte Carlo.
First, execute several short runs - a burn-in of 50 to 100 iterations followed by a chain
of length N = 500 or N = 1000 is sufficient for many problems. Examine the 9N and
788 Z Geweke

their standard errors as assessed by conventional time series procedures for a single
time series to see whether the scatter of each 9N across the short runs is consistent
with these standard errors. If necessary, increase the length of the short runs until this
consistency is achieved. Second, choose the last value of one of the short runs, and
use it as the starting value of a long run of from N = 1 0 4 to N = 1 0 6 iterations. As
a final check, compare the 9N from the single long run with the confidence intervals
implied by the short runs. Report the final value of 9N, together with its numerical
standard error as computed by time series methods for a single series.

7. Some examples

The usefulness of all of these methods lies as much in their appropriate combination
as in the application of any one individually. We turn now to some examples that
illustrate some useful combinations, and in the process treat a few topics closely
related to integration and simulation.

7.1. Stochastic volatility

Models in which the volatility of asset returns varies smoothly over time have re-
ceived considerable attention in recent years. [For a survey of several approaches, see
Bollerslev, Chou and Kroner (1992).] Persistent but changing volatility is an evident
characteristic of returns data. Since the conditional distribution of returns is relevant
in the theory of portfolio allocation, proper treatment of volatility is important. Time-
varying volatility also affects the properties of real growth and business cycle models.
A simple model of time-varying volatility is the stochastic volatility model, whose
descriptive properties have been examined by a series of investigators beginning with
Taylor (1986). The approach here closely follows that of Jacquier, Poison and Rossi
(1994). Let rt denote the one-period return of a single asset and let xt be a vector
of deterministic time series such as indicators for day of the week, holidays, etc. A
simple stochastic volatility model is

rt -= ~txt 4- ct, ~1/2 ut,

~t = nt (7.1.1)

loght = c~ + ~log ht-1 + cruut, (7.1.2)

(ut'] Im ./V'(O,12). (7.1.3)

At time T an economic agent is concerned with future returns ~"T-t-I ~ • • - , rTq-q through
an expected utility function

E [ V ( r T + l , . . . , r T + q ; z ) I~T] = E [ V ( r q ; . . . ) [ ~bT], (7.1.4)

Ch. 15." Monte Carlo Simulation and Numerical Integration 789

where z is a generic vector of other arguments which may be known or unknown at

time T.
Evaluation of this expected utility function requires the solution of an integration
problem. We will consider this problem for three different specifications of the in-
formation set ~T in turn. Denoting rT = (rl,... ,rT)', XT+s = ( X l , . . . ,XT+s) t,
O' = (/Y, a, 5, ~r~) and hT = (hi,..., hT)', these are

~)= {rT,XT+q,O, hT}; ~)= {rT,XT+q,O}; ~(~)= {rT,XT+q}.

As one may readily verify, deterministic approximations of the type discussed in
Section 2 are inconvenient for this problem. Even explicit expressions of the integrals
in closed form are awkward and unrevealing. Simulation methods are much more
direct, and have the added advantage that one set of simulations can suffice for several
alternative values of the other arguments z in (7.1.4). These arguments might include
taste parameters, or the values of decision variables which themselves do not affect
rT. (Section 7.2 provides an example involving explicit optimization.)
The solution for the problem for ~ ) is simple. In the notation of Section 4, repeated
period-by-period simulation of x = rq provides an independent identically distributed
sample {~i)}N 1 whose probability density p(x) = p(rq I ~(~)) we have not even
expressed. Then,

where 9(x) = V ( x ; . . . ) = V(rT+I,..., rr+q;...). Consequently,

1 N


The problem for ~(~) is more difficult. Rather than hT itself the agent has available

p(hT I rT,XT,O) = p(hT,rT [ XT,O)/p(rT)

= p(rT [hT,XT,O)p(hT [Xy, O)/p(rT) c<p(rT [hT,XT,O)p(hT [ XT, O)

= (2~r)-x/2 I I h ; l n - = 2ht J

x (27r)-T/2~rTTh;l exp [ - ~ -2~fu

790 J. Geweke

o( H h~3/2exp -
t=l =

xexp[_~-~ (l°ght-°~-51°ght-1) a
t=l 2~r~ ' (7.1.5)

where ct = rt - fltxt. The simple Monte Carlo solution of the previous problem
could be extended to this one if one could draw an i.i.d, sample L,rg(i)aN ' ° T J i = l from
the distribution implied by the last kernel. This is clearly not possible, nor are there
obvious source or importance sampling distributions for the methods of Sections 4.2
or 4.3.
This problem can be solved in a number of ways, and a comparison of three
alternatives is instructive. All begin with the kernels of the conditional probability
densities for individual ht implied by (7.1.5). For t = 2 , . . . , T - 1 the kernel is

p[ht I h~(t # s),O,~ d

- 2ht J ~-~ , (7.1.6)


c~(1 - 5 ) + 5 ( l o g h t - 1 + loght+,) 02 _ _ (72v

1+~2 ' 1+~2"

(Similar expressions for hi and hT may be constructed.)

The first two approaches construct a Gibbs sampler for the ht, drawing and
successively replacing hi, h2, • • •, hT. Each cycle of drawing and replacement pro-
duces the next realization of h~) in the Markov chain. Note from (7.1.5) that
limh~op(ht I rt,Xt,O) = 0 for any t = 1 , . . . , T , and since the support of hT
is the positive orthant of ~T the probability density function of hT is lower semi-
continuous at 0. The remaining sufficient conditions for convergence of the Gibbs
sampler are clearly satisfied. Conditional on each [a~) in the chain, draw a single ~i)
as in the problem for ~(~). Since

p(h~)) - p(h~ I rT,XT,O) --+ O,

it follows that

_ p(rq t 0)l -, 0
Ch. 15: Monte Carlo Simulation and Numerical Integration 791

Both approaches work directly with the conditional distribution of Ht = log ht, which
from (7.1.6) is given by

logp(Ht [ Hs(s ~ t), O, et)

= -exp (- @ ) exp(-He) - (Ht - #;)2/2~ 2 (7.1.7)

(up to an additive constant) where #~' = #t - 0.5cr2, but differ in the method for
obtaining Hr.
The first approach is to use acceptance sampling. A reasonable source distribution
is N'(/4' , ~r2), for which the acceptance probability is

exp[-(~-)exp(-Ht)] =exp( - et2)'2htj

The acceptance probability falls below 0.01 if and only if e2/ht exceeds 9.2, which is
highly unlikely if the model reasonably well describes the distribution of the returns
rt. The acceptance probability could be improved somewhat using the optimizing
procedures set out in Section 3.2, but given the favorable acceptance probabilities for
the N'(#~', ~r2) source distribution the additional overhead might not be warranted.
The second approach is to note that the log conditional kernel densities (7.1.7)
are strictly concave, and apply the adaptive method of Gilks and Wild (1992). Their
algorithm (described in Section 3.2) may be initialized by noting that Ht = #~
lies to the left of the mode of the log-conditional and a solution of (1 - Ht +
H2/2) exp(-et2/2) - (Ht -/~)/O "2 lies to the right of the mode. Except for the
method of drawing Ht, the solution of the problem proceeds as in the first approach.
The third approach is to construct a Metropolis-Hastings independence chain. This
is done by forming a Metropolis step Mt for each ht, and then combining all T steps
into a single transition M = M1M2.'. MT, At each Mt either a candidate new value
is accepted or the old value of ht is retained. Thus, when M operates on the old hT it
generally produces a mixture of old and new ht in the new hT. The transition kernel
M is p-irreducible and aperiodic, and an argument like the one in Section 6.4 shows
that p(hT [ rr, XT+q, O) is the invariant distribution of M [Jacquier, Polson and Rossi
(1994, Section 2)]. A useful distribution for the Metropolis-Hastings independence
chain is the gamma distribution for h t I with shape parameter a = [ 1 - 2 exp(cr2)]/[1-
exp(c~2)] + 0.5 and scale parameter/~ = (a - 1) exp(#t + 0.5or2) + 0.5e 2. Combined
with an appropriate scaling of the transition kernel, as discussed in Section 6.4, this
chain produces convergence at a practical rate [see Jacquier, Poison and Rossi (1994,
Section 2.4, for details)].
The solution of the problem for ~(r2) is directly usable in the solution of the problem
for ¢~(~), in the context of the Gibbs sampler. From the form of (7.1.1)-(7.l.3) the
792 J. Geweke

probability density kernel for 0 and hT underlying the expectations operator in (7.1.4)

Ilhtl/2exp - ~/xt)2]
t=l t=l 2ht J

- t=l 2cr~
1/ 1j p(/3, a, (~,cru), (7.1.8)

where p(/3, c~, 3, a~) is the prior probability density function of 0 ~ = (/3r, a, d, or,).
A Gibbs sampler with blocking (hr, 0) will alternate drawing and substitution for
hT [ rT,XT,O and 0 I rT,XT,hT. The drawing for hT is the same one constructed
to solve the problem for ~b(~). The second drawing is facilitated by noting that the
kernel of (7.1.8) in 0 may be expressed

o( i i ex p _ (rt-/3'xt)~

×°v(T+l) e x p [ - - ~ 20-2

if the prior probability distribution has the conventional improper kernel p(/3, a, 3, ~r~,)
c< cr~-1. Thus, /3 and (a, 3, or,) are conditionally independent. In each case the dis-
tribution follows from standard treatments of Bayesian learning about a linear model
[e.g. Poirier (1995, Section 9.9)]:

F3 ~ N'(b, Q-l),

7' T
Q : E ht-IXtX~ and b : Q-1 E h~-lxtrt,
t=l t=l

for/3 and

$2/o~ ~ x2(T - 2), (a, ~)' Iav ~ N'(e, . ~ p - l ) ,

[ T
~ t = 1 log ht_ l
P = ~ t =Tl log ht-l ~tT-=- I log 2 ht-1 J ' c P[ ~tT=l loght log ht-I
Ch. 15." Monte Carlo Simulation and Numerical Integration 793

S 2 = ~ (loght - c 1 - c21oght-1),

for (a, 6, ¢7u).

7.2. Integration and optimization

The solution of all but the simplest dynamic optimization problems cannot be ex-
pressed in closed form. Since the objective function in these problems is expected
atility, integration is required to evaluate a candidate solution. Finding a good numeri-
cal approximation to the solution therefore requires optimizations of a function which
can be evaluated only inexactly. Moreover this evaluation must in general be repeated
many times in the process of approximating the solution. Several approaches to this
very important problem have been proposed: a good introduction is provided by Tay-
lor and Uhlig (1990) and the papers following that article; more recent work includes
McGrattan (1996). Here we discuss a widely applicable procedure that uses Monte
Carlo integration to solve dynamic optimization problems subject to an imposed pa-
rameterization of the decision rule, and then loosens the parametric restrictions so
as to approach the optimum. The description here closely follows Smith (1991) who
invented the method. The notation and assumptions are largely those of Stokey and
Lucas (1989, Chapter 9).

The problem, Many dynamic optimization problems can be expressed

max E0z.._~/3tr x t , x t + l , / Z 1 (7.2.1)

{xt}~l t=0 \pxl
given x0,z0 and subject to xt+l C F(xt,zt), Vt.
The sequence of state v e c t o r s {Zt}tC~=l is a Markov process with transition density

"O(Zt+l I Zt); Zt E Z C_ ~e, Vt; (7.2.2)

and Z is either compact or countable. The decision vector xt c X C_ ~p; X is closed

and convex. The agent observes the state vector s~ = (x~, z~) E S = X x Z prior
to choosing xt+l. The operator E0 denotes expectations conditional on the period 0
information set so. The return function r is bounded, continuous in (xt, Xt+l, zt), and
concave in ( X t , X t + l ) , VZt E W. The correspondence/' is nonempty, compact- and
convex-valued, and continuous. The convexity of/7 precludes problems with discrete
choice sets; for a treatment of discrete choice see Keune and Wolpin (1994).
794 J. Geweke

These assumptions imply the existence of a unique, time-invariant continuous de-

cision rule: w : S -+ X that expresses optimal xt+l = w(xt,zt) [Stokey and Lucas
(1989, Chapter 9)]. The optimization problem is to determine the decision rule. The
approach taken here is to replace w with a rule of thumb characterized by a vector of
parameters ~b :

Xt+ 1 = h(Xt,Z,~3), ~3 E C C_ ~Rk, C compact. (7.2.3)

This rule closes the model. Given So, z = {zt}t=l, T and ~b, (7.2.2)-(7.2.3) determines
fX / T + I
x = ~ tlt=l = q(z; ~b, so) through the obvious iterations.
Let b(x, z; so) = ~ = 0 ~tr( xt, Xt+l, zt) denote the utility delivered by the sequences
x and z given so for the dynamic optimization problem with horizon truncated at
T. Repressing so to maintain notational simplicity, 9(z,~O) = b[q(z;~b,s0),z;s0])
is delivered utility for decision rule h with parameterization ~b. Given h, the agent
chooses the best possible ~b, which we shall denote

~bo =: arg m~xE0[g(z, 0)]. (7.2.4)

Problem (7.2.4) is a simplification of Problem (7.2.1), but it still cannot be solved

analytically. The chief complication is the evaluation of the integral associated with E0
in (7.2.4). The key idea in the solution described here is to simulate the behavior of s
for different values of ~b, thereby providing approximations to Eo [9(z, ~b)]. As we shall
see, arbitrarily good approximations to ~bo may be obtained in this way. By increasing
T and employing a sequence of functions h that are increasingly flexible through a
longer parameter vector ~b, the solution of (7.2.4) may be made to approximate that
of (7.2.1) [Smith (1991)].

The algorithm. Generate n i.i.d, sequences fz(i) = {z{}~1 according to (7.2.2), and
take O = {~(i)}~_~ to be the collection of these sequences. If we let Q~(O,~b) =
~i~=l 9(~ ('0, ~b), then n-lQn(@, ~b) 3~4 E0[9(z, ~b)]. Since the set of sequences @ is

~ = arg maxr~-lQn(O, ~b)


is a well-defined, deterministic optimization problem that can be solved using stan-

dard hill climbing methods. These methods will be more efficient to the extent that
Or~Oh and Oh/O~b (better yet, 02r/Oh 2 and O2h/O~b~b' in addition) can be evaluated
Ch. 15: Monte Carlo Simulation and Numerical Integration 795

Asymptotic properties. Given four further assumptions, ~,, a.% ~bo and central limit
theorems may be used to assess the accuracy of the approximation of ~b0 by ~n and
of Eo[g(z, ~b)] by n-lQn(O, ~b).
(1) g(z, ~') is twice continuously differentiable in ~b for all z.
(2) The following functions are regular:
(a) g(z, ~b), 89(z, ¢)/8~b, 02g(z, ~b)/SOS~b';
(b) [0g(z, ~b)/8¢] [Sg(z, ~b)/8¢'];
(c) g2(z, ¢).
Regular is used in the sense of Tauchen (1985). Denoting the probability density
function of z by f(z), d(z, ¢ ) is regular if
(i) d(z, ~b) is measurable in z, V~b C C;
(ii) d is separable (Huber, 1967);
(iii) d is dominated -i.e., ~b ~ f 5(z) dz < ~ and ]d(z,~b)] < b(z), V~b c C;
(iv) d(z, ~/,) is continuous in ¢, Vz.
(3) E[g(z, ~b)] (the existence of which is guaranteed by Assumption 2(a)) is uniquely
maximized at ~bo, an interior point of C.
(4) E[82g(z, ~bo)/8~bS~b'] (whose existence is also guaranteed by Assumption 2(a))
is nonsingular.
Given these four further assumptions, one can usefully approximate ¢o:

+n ~ + ¢o, n'/2(+n - ¢o) d> N(0, V);

V = A - I B A -I ,

[ OO(z, ¢o)] B = E [09~_~¢o)0g(z,_¢0)].
A=E 0¢0¢' j'
0¢' J'

., 0¢0¢' n i=1 0¢ 0¢'

Under exactly the same conditions, one can also usefully approximate E[g(z, ~b)]:

(o, ~,,) °-~ E[9(z, ¢)1,

~n n 92(~('),~ ) - on(o,~ Z+ ~2 := var[g(z,¢)].

796 J. Geweke

Proofs are given by Smith (1991) who uses asymptotic theory developed by Amemiya
(1985) and Tauchen (1985). The second result is especially useful in valuing the
approximation error: see Smith (1991, Section 5).

Antithetic variables. In many applications the conditional distribution of the exoge-

nous state vector zt, with probability density function v(zt I zt-1), is smooth and
symmetric or nearly symmetric. The return function r is commonly monotone in-
creasing or decreasing in each element of zt and may be nearly linear over most of
the support of the distribution of zt. In such circumstances there are substantial gains
in the use of antithetic variables as described in Section 5.1. Let ~(il) and f~(i2) denote
such an antithetic pair. (Exactly how the pair is drawn will depend on the particulars
of the problem. What is essential, as discussed in Section 5.1, is that ~(il) and ~(i2)
be identically distributed.) Consider n/2 replications of f~(i~) and ~(i2) in lieu of n
replications of ~(i). Redefine

Qo(o,,) = E
[.(~("),~) +.(:), ~)]
with O = (~(il) ~(i2)),i~/~ and t a k e @n : argmax,l,n-lQn(O,¢). Then ~n and
n-lQ,~(O, ¢) are consistent for ~b and E[g(z, ~b)] as before. There are again central
limit theorems, but now

V = A - I B * A -1, with B*=B+½(C+C'),


C = E [ 0 g ( z ~ , ~bo)Og(z(i2), ~b0)J



o-~ = var[.(z, ~o)] + coy [.(~("),.o),.(:~, ~o)],

TL i = l i=t
1[ )]2
Ch. 15: Monte Carlo Simulation and Numerical Integration 797

S m i t h ( t 9 9 1 ) applies this m e t h o d to a v a r i a n t of the B r o c k a n d M i r m a n ( 1 9 7 2 ) g r o w t h

m o d e l . T h e c h a r a c t e r i s t i c o f the m o d e l that is i m p o r t a n t for the s u c c e s s o f the use o f
a n t i t h e t i c v a r i a b l e s is t h a t the e x o g e n o u s state v a r i a b l e s m o v e s m o o t h l y o v e r t i m e a n d
t h e r e t u r n f u n c t i o n is o n l y m o d e s t l y n o n l i n e a r o v e r m o s t o f the s u p p o r t o f z. U s i n g
o n l y 100 a n t i t h e t i c pairs a n d T = 800, S m i t h d e t e r m i n e s ~p up to f o u r s i g n i f i c a n t
figures. T h e s u b o p t i m a l i t y o f the r e s u l t i n g d e c i s i o n rules t u r n s o u t to b e e q u i v a l e n t to
a p e r - p e r i o d d e c r e a s e in c o n s u m p t i o n o f 2 x 1 0 - 5 % .


Ahrens, J.H. and Dieter, U. (1974) 'Computer methods for sampling from galmna, beta, Poisson, and
binomial distributions', Computing, 12:223-246.
Ahrens, J.H. and Dieter, U. (1980) 'Sampling from binomial and Poisson distributions: A method with
bounded computation times', Computing, 25:193-208.
Amemiya, T. (1985)Advanced econometrics. Cambridge, MA: Harvard Univ. Press.
Anderson, T.W. (1984) An introduction to multivariate statistical amdysis, 2nd edn. New York: W!ley.
Bollerslev, T., Chou, R. and Kroner, K.E (1992) 'ARCH modelling in finance', Journal of Econometrics,
Box, G.E.P. ,and Muller, M.E. (1958) 'A note on the generation of random normal deviates', Annals of
Mathematical Statistics, 29:610-611.
Bratley, P., Fox, B.L. and Schrage, L.E. (1987) A guide to simulation, 2nd edn. New York: Springer.
Brock, W.A. and Mirman, L.J. (1972) 'Optimal economic growth and uncertainty: The discounted case',
Journal of Economic Theory, 4:497-513.
Chib, S. and Greenberg, E. (1995) 'Understanding the Metropolis-Hastings algorithm', The American
Statistician, 49:327-335.
Coveyou, R.R. and MacPherson, R.D. (1967) 'Fourier analysis of uniform random number generators',
Journal of the ACM, 14:100-119.
Davis, P.J. and Rabinowitz, P. (1984) Methods of numerical integration. Orlando, FL: Academic Press.
Dcvroye, L. (1986) Non-un!fi)rm random variate generation. New York: Springer.
Evans, M. (1991) 'Adaptive importance sampling and chaining', Contemporary Mathematics, 115 (Statis-
tical Multiple Integration):137-142 (Providence: American mathematical Society).
Fishman, G.E and Moore, L.R., III (1982) "A statistical evaluation of multiplicative randoln number
generators with modulus 231_1', Journal of the American Statistical Association, 77:129-136.
Fishman, G.E and Moore, L.R., II1 (1986) 'An exhaustive analysis of multiplieative congruential random
number generators with modulus 231-1', SIAM Journal on Scientific and Statistical Computing, 7:24--45.
Forsythe, G.E. (1972) 'Von Neumann's comparison method for random sampling from the norm~ and
other distributions', Mathematical Computation, 26:817-826.
Gelfand, A.E. and Smith, A.EM. (1990) 'Sampling based approaches to calculating marginal densities',
Journal of the American Statistical Association, 85:398~-09.
Gelman, A. and Rubin, D.B. (1992) 'Inference from iterative simulation using multiple sequences', Statis-
tical Science, 7:457-472.
Geman, S. and Geman, D. (1984) 'Stochastic relaxation, Gibbs distributions and the Bayesian restoration
of images', 1EEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741.
Genz, A. ( 1991) 'Subregion adaptive algorithms for multiple integrals', Contemporary Mathematics', 115
(Statistical Multiple Integration):23-31 (Providence: American mathematical Society).
Genz, A. and Kass, R. (1996) 'Subregion adaptive integration of functions having a dominant peak',
Washington State University, working paper.
798 ~ Geweke

Genz, A. and Malik, A. (1980) 'An adaptive algoritlun for numerical integration over an N-dimensional
rectangular region', Journal ~f Computational and Applied Mathematics, 6:295-302.
Genz, A. and Malik, A.A. (1983) 'An imbedded fmnily of fully symmetric numerical integration rules',
SIAM Journal (~f'Numerical Analysis, 20:580-588.
Geweke, J. (1986) 'Exact inference in the inequality constrained normal linear regression model', Journal
(~['Applied Econometrics, 1: 127-141.
Geweke, J. (1988) 'Antithetic acceleration of Monte Carlo integration in Bayesian inference', Journal ~/'
Econometrics, 38:73-89.
Geweke, J. (1989) 'Bayesian inference in econometric models using Monte Carlo integration', Economet-
rica, 57:1317-1340.
Geweke, J. (1991) 'Efficient simulation from the multivariate normal and student-t distributions subject to
linear constraints', in: E.M. Keramidas, ed., Computing science and statistics: Proceedings (~t"the 23rd
symposium on the inte~'ace, pp. 571-578.
Geweke, J. (1992a) 'Evaluating the accuracy of sampling-based approaches to the calculation of poste-
rior moments', in: J.M. Bernardo et at., eds, Bayesian statistics 4: Proceedings of" the .f?mrth Valencia
international meeting. Oxford: Clarendon Press.
Geweke, J. (1992b) 'Priors for macroeconomic time series', Federal Reserve Bank of Minneapolis Institute
for Empirical Macroeconomics, Discussion Paper No. 64.
Geyer, C.J. (1992) 'Practical Markov chain Monte Carlo', Statistical Science, 7:473-481.
Gilks, W.R. and Wild, E (1992) 'Adaptive rejection sampling for Gibbs sampling', Applied Statistics (JRSS
Series C), 41:337-348.
Golub, G.H. and Welsch, J.H. (1969) 'Calculation of Gaussian quadrature rules', Mathematics ~[' Compu-
tation, 23:221-230.
Gradshteyn, I.S. and Ryzhik, I.M. (1965) Tables ~/" integrals, series, and products. New York: Academic
Greenberger, M. (1961) 'Notes on a new pseudo-random number generator', Journal of the ACM, 8:163-
Halton, J.M. (1960) 'On the efficiency of evaluating certain quasi-random sequences of points in evaluating
multi-dimensional integrals', Numerische Mathematik, 2:84-90.
Hammersley, J.M. (1960) 'Monte Carlo methods for solving multivariable problems', Annals ~f" the New
York Academy ~f Sciences, 86:844-874.
Hammersley, J.M. and Handscomb, D.C. (1964) Monte Carlo methods. London: Methuen.
Hammersley, J.M. and Morton, K.W. (1956) 'A new Monte Carlo technique: Antithetic variates', Proceed-
ings ~/-the Cambridge Philosophical Society, 52:449-474.
Hannan, E.J. (1970) Multiple time series. New York: Wiley.
Hart, H.E Cheney, E.W., Lawson, C.L., Maehly, H.J., Mesztenyi, C.K., Rice, J.R., Thacher, H.G., Jr. and
Witzgall, C. (1968) Computer approximations. New York: Wiley.
Hastings, W.K. (1970) 'Monte Carlo sampling methods using Markov chains and their applications',
Biometrika, 57:97-109.
Hlawka, E. (1961) 'Funktionen yon Beschrankter Variation in der Theorie der Gleichverteilung', Annali di
Matematica Pura Ed Applicata, 54:325-333.
Huber, P.J. (1967) 'The behavior of maximum likelihood estimates under nonstandard conditions', in:
L.M. LeCam and J. Neyman, eds, Proceedings t~f the fi.fth Berkeley symposium on mathematical statistics
and probability, Vol. 1. Berkeley: Univ. of California Press, pp. 221-234.
IMSL (1994) IMSL stat/Iibrary. Houston: Visual Numerics.
Jacquier, E., Poison, N.G. and Rossi, EE. (1994) 'Bayesian analysis of stochastic volatility models', Journal
~/ Business and Economic Statistics, forthcoming.
Judd, K.L. (1996) Numerical methods in economics. Cambridge, MA: MIT Press, forthcoming.
Kachitvichyanukul, V. (1982) 'Computer generation of Poisson, binomial, and hypergeometric random
variates', PhD dissertation, Purdue University.
Ch. 15: Monte Carlo Simulation and Numerical Integration 799

Kahn, M. and Marshall, A.W. (1953) 'Methods of reducing sample size in Monte Carlo computations',
Operations Research, 1:263-278.
Keune, M. and Wolpin, K. (1994) 'The solution and estimation of discrete choice dynamic programming
models by simulation: Monte Carlo evidences', Review of Economics and Statistics, 76:648-672.
Kinderman, A.J. and Ramage, J.G. (1976) 'Computer generation of normal random variables', Journal of
the American Statistical Association, 71:893-896.
Kipnis, C. and Varadhan, S.R.S. (1986) 'Central limit theorem for additive functionals of reversible Markov
processes and applications to simple exclusions',, Communications in Mathematical Physics, 104:1-19.
Kloek, T. and van Dijk, H.K. (1978) 'Bayesian estimates of equation system parameters: An application
of integration by Monte Carlo', Econometrica, 46:1-20.
Knuth, D.E. (1981) The art of computer programming, Vol. 2: Seminumerical algorithms, 2nd edn. Reading,
MA: Addison-Wesley.
Kronmal, R.A. and Peterson, A.V. (1979) 'On the alias method for generating random variables from a
discrete distribution', American Statistician, 33:214-218.
L'Ecuyer, P. (1986) 'Efficient and portable combined pseudorandom number generators', Communications
o[' the A CM, 29:304-313.
Marsaglia, G. (1961) 'Expressing a random variable in terms of uniform random variables', Annals ~/
Mathematical Statistics, 32:894-899.
Marsaglia, G. (1964) 'Generating a variable from the tail of a normal distribution', Technometrics, 6:101-
Marsaglia, G. (1968) 'Random numbers fall mainly in the planes', Proceedings qf the National Academy
of Sciences, 60:25-28.
Marsaglia, G. (1972) 'The structure of linear congruential sequences', in: S.K. Zarema, ed., Applications
of number theory W numerical analysis. New York: Academic Press.
Marsaglia, G. and Bray, T.A. (1964) 'A convenient method for generating normal variables', SlAM Review,
Marsaglia, G. and Bray, T.A. (1968) 'On-line random number generators and their use in combinations',
Communications of the ACM, 11:757-759.
Marsaglia, G., MacLaren, M.D. and Bray, T.A. (1964) 'A fast procedure for generating normal random
variables', Communications ~)f the ACM, 7:4-10.
Marsaglia, G. and Zaman, A. (1991) 'A new class of random number generators', The Annals qfApplied
Probability, 1:462--480.
McGrath, E.I. (1970) Fundamentals of operations research. San Francisco: West Coast Univ. Press.
McGrattan, E. (1996) 'Solving the stochastic growth model with a finite element method', Journal of
Economic Dynamics and Control, 20:1942.
McNamee, J. and Stenger, E (1967) 'Construction of fully symmetric numerical integration formulas',
Numerical Mathematics, 10:327-344.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953) 'Equation of state
calculations by fast computing machines', The Journal of Chemical Physics, 21:1087-1092.
Mikhall, W.M. (1972) 'Simulating the small-sample properties of econometric estimators', Journal ~f the
American Statistical Association, 67:620-624.
Mitchell, B. (1973) 'Variance reduction by antithetic variates in G1/G/1 queueing simulation', Operations
Research, 21:988-997.
Mtiller, E (1991) 'Numerical integration in Bayesian analysis', Purdue University unpublished PhD disser-
Niederreiter, H. (1992) Random number generation and quasi-Monte Carlo methods. Philadelphia, PA:
Peskun, EH. (1973) 'Optimum Monte-Carlo sampling using Markov chains', Biometrika, 60:607-612
Piessens, R., DeDoncker-Kapenga, E., l~lberhuber, C.W. and Kahaner, D.K. (1983) QUADPACK. New York:
800 J. Geweke

Poirier, D. (1995) Intermediate statistics and econometrics: A comparative approach. Cambridge, MA:
MIT Press.
Press, W.H., Flannery, B.E, Teukolsky, S.A. and Vetterling, W.T. (1986) Numerical recipes: The art oJ
scientific computing. Cambridge, MA: Cambridge Univ. Press.
Richtmeyer, R.D. (1952) 'On the evaluation of definite integrals and a quasi-Monte Carlo method based
on properties of algebraic numbers', Report LA-1342. Los Alamos: Los Alamos Scientific Laboratories.
Richtmeyer, R.D. (1958) 'A non-random sampling method, based on congruences for Monte Carlo prob-
lems', Report NYO-8674. New York: Institute of Mathematical Sciences, New York University.
Ripley, R.D. (1987) Stochastic simulation. New York: Wiley.
Roberts, G.O. and Smith, A.EM. (1994) 'Simple conditions for the convergence of the gibbs sampler and
Metropolis-Hastings algorithms', Stochastic Processes and Their Applications, 49:207-216.
Rubinstein, R.Y. (1981) Simulation and the Monte Carlo method. New York: Wiley.
Schmeiser, B.W. and Lal, R. (1980) 'Squeeze methods for generating gamma variates', Journal of the
American Statistical Association, 75:679-682.
Smith, A.A. (1991) 'Solving stochastic dynamic programming problems using rules of thumb', Queen's
University, Department of Economics, Discussion Paper No. 816.
Stokey, N.L. and Lucas, R.E. Jr. (1989) Recursive methods in economic dynamics. Cambridge, MA: Harvard
Univ. Press.
Strecok, A.J. (1968) 'On the calculation of the inverse of the error function', Mathematics of Computation,
Tanner, M.A. and Wong, W.H. (1987) 'The calculation of posterior distributions by data augmentation',
Journal of the American Statistical Association, 82:528-550.
Tauchen, G. (1985) 'Diagnostic testing and evaluation of maximum likelihood models', Journal ~)fEcono-
metrics, 30:415-443.
Taylor, S. (1986) Modelling financial time series. New York: Wiley.
Taylor, J. and Uhlig, H. (1990) 'Solving nonlinear stochastic growth models: A comparison of alternative
solution methods', Journal of Business and Economic Statistics, 8:1-17.
Tezuka, S., L'Ecuyer, P. and Couture, R. (1993) 'On the lattice structure of the add-with-carry and subtract-
with-borrow random number generators', ACM Transactions on Modeling and Computer Simulation,
Tieruey, L. (1991) 'Exploring posterior distributions using Markov chains', in: E.M. Keramaidas, ed.,
Computing science and statistics: Proceedings ¢~fthe 23rd symposium on the interface. Fairfax: Interface
Foundation of North America, pp. 563-570.
Tierney, L. (1994) 'Markov chains for exploring posterior distributions', Annals of Statistics, 22:1701-1762.
Tierney, L. and Kadane, J.B. (1986) 'Accurate approximations for posterior moments and marginal densi-
ties', Journal of the American Statistical Association, 81:82-86.
Tierney, L., Kass, R.E. and Kadane, J.B. (1989) 'Fully exponential Laplace approximations to expectations
and variances of nonpositive functions', Journal of the American Statistical Association, 84:710-716.
van Dooren, P. and de Ridder, L. (1976) 'An adaptive algorithm for numerical integration over an N-
dimensional rectangular region', Journal of Computational and Applied Mathematics, 2:207-217.
von Neumann, J. (1951) 'Various techniques used in connection with random digits', National Bureau of
Standards Applied Mathematics, Series 12, pp. 36-38.
Walker, A.J. (1974) 'New fast method for generating discrete random numbers with arbitrary frequency
distributions', Electronics Letters, 10:127-128.
Walker, A.J. (1977) 'An efficient method for generating discrcle random variables with general distribu-
tions', ACM Trunsactions on Mathematical S~?[~vare, 3:253-256.
Wichmann, B.A. and Hill, I.D. (1982) 'An efficient and portable pseudo-random number generator', Applied
Statistics, 31 : 188-190.
Wild, P. and Gilks, W.R. (1993) 'Adaptive rejection sampling from log-concave density functions', Applied
Statistics (JRSS Series C), 42:701-708.
Zellner, A. and Min, C. (1995) 'Gibbs sampler convergence criteria', Journal qf the American Statistical
Association, 90:921-927.

Aamodt, A. 438 Algorithm 640, 642, 647

Abilock, H. 81 BKR algorithm 125
Abnormal trading pattern detection 426 determinis•c algorithms 627, 628, 644, 646,
Acceptance methods 746, 753, 754, 759 698
Acceptance sampling 747, 748, 751-754, 762, doubling algorithm 194, 207
764, 765, 775, 776, 778, 779, 791 Euler's algorithm 7, 12, 13, 16
Accessible 102 Fair-Taylor algorithm 36
Accounting 409 for constrained optimization 367
Action elhnination 663 for systems of equations 362
Action elimination methods 652 for the dynamical system 372
Actions 110 for unconstrained optimization 364
Active Learning 605, 608-610, 613 for variational inequality problems 369
Active Learning algorithm 609 Johansen/Ealer (see also Johansen/Euler
Active Learning method 609 method for solving CGE models) 9,
Active Learning or Dual Control (DUAL) 604 12-24
Active Learning strategy 610, 613 minimum residual algorithm 663, 666
Activity analysis 298 multigrid algorithms 624, 627, 629, 699, 705
Adams, ED. 62, 63, 68, 76, 77, 79 low discrepancy multigrid algorithm 708
Adams, R.A. 714, 722 random multigrid algorithm 629, 699, 706,
Adaptive Expectations Hypothesis (AEH) 594 707, 721
Adaptive information 643 Nelder and Mead optimization algorithm 686
Adaptive methods 750, 764, 791 Newton-Raphson algorithm 7, 12
Adelman, 1. 8, 79 optimal algorithms (see also computational
ADI method 603 complexity) 624, 627, 639, 647, 707,
Adib, EM. 319, 328 718
Adjacent 99, 104 polyalgorithms 622
AEH 594 projection methods algorithm 625, 689, 709,
Affiliation 266 715, 717
Agents 110 random algorithms 627-629, 644
Aggregation functions 419 Scarf's algorithm 6, 7, 12
Agricultural economics 479 shooting algorithm 25, 35, 36
Ahrens, J.H. 753, 754, 797 simplex algorithm 627, 648
AI 407 successive approximations algorithm 621,652,
AIMMS 298, 311,473 653, 655, 656, 660, 661,706, 709, 713
Akilov, G.P. 725 Allen, B.E 438
Alan Manne 322, 327 Allocative optimality 259, 265, 287
Alan Manne notation 306 Almost completely labeled 103
Alatorre, J. 304, 314, 330 Alternative Heun-type method 373
Alberta 323 Aluminum 319
Albrecht, J.W. 581 Amato, J.J. 251
Aleksander, I. 437 Amdahl, G. 353, 401
Algebraic languages 472 Amdahl's law 353
Algebraic Riccati equation 592 Amemiya, T. 150, 169, 796, 797
802 Index

Ames, W.F. 603, 615 Arnold, L. 613, 616

Amman, H.M. 401,403, 592, 593, 595, 596, Arrow, K.J. 6, 19, 80, 293, 336, 401,404, 617,
598, 599, 605, 610, 613, 615, 616, 627 620, 635, 722
AMPL 298, 310, 473 Arthur, B.W. 451,469
Andean common market 319 Artificial intelligence 311,405
Anderson, A. 328 Asset allocation 482
Anderson, B.D.O. 250, 251 Assets and liability 405
Anderson, D. 323, 328 Assignment statement 475
Anderson, E.W. 173, 175, 191, 196, 200, 250, Associative retrieval 432
627, 722 Asymmetrie information 575-577
Anderson, G.S. 581 Asymptotic distribution accuracy 159
Anderson, N. 15, 17, 80 Asymptotic expansions of integrals 545
Anderson, T.W. 469, 756, 797 Atkeson, A. 443, 469
Ando, A. 401 Altracts 265
Ando, M. 401 Auer, E 625, 690, 725
Ang, B.K. 302, 320, 329 Augmented regulator problem 176
Ansley, C.F. 250 Aumann, R.J. 438
Antithetic Monte Carlo 770-772, 774 Australian Bureau of Agricultural and Resource
Antithetic simple Monte Carlo 770, 771 Economics (ABARE) 62, 63
Antithetic variables 771,772, 774, 796, 797 Australian Bureau of Statistics (ABS) 77, 78
Aoki, M. 588, 591,616 Autoregressive representation 226
APL 491 Auxiliary constraints 258, 289
APL computer program 280 Average case complexity 628
APL program 275, 278, 283, 290-292 Average case deterministic complexity 646, 647
Apostol, T.M. 13, 80 Azhar, S. 114, 139
Applied General Equilibrium (AGE) models (see
also Computable General Equilibrium
(CGE) models) 5 Bacharach, M. 336, 401
Approximate Newton's method 366 Backpropagation network 425
Approximation Backpropagation neural network 426
L p approximation 512, 548 Backsolving 717
c-approximation 642 Backward chaining 417
Approximation methods (see also interpolation Backward derivatives 277
methods and Chebyshev polynomials) Backward differences 270
Chebyshev series approximation 625 Backward induction 653, 669
discrete approximation 627, 669, 700, 718 Backward reasoning 434
function approximation 624, 628 Backward recursion 649
Galerkin method 689, 711,715, 721 Bai, Z. 193, 250
neural networks 414, 421,425, 436, 445, 559, Baker, T.E. 310, 329
622, 625, 690, 711,719 Bakhvalov, N.S. 707, 722
orthogonal polynomial families 548, 623, 624 Balanced matrices 484
polynomial series approximations 686 Balcer, Y. 581
smooth approximation 649, 652, 679, 681,709 Ballard, C.L. 9, 25, 69, 80
splines 624, 719 Bandara, J.S. 8, 80
Araujo, A. 581 Banking 409
Arborescence 109 Banks 407
Archibald, T.W. 666, 722 Banks, J.S. 117, 139
Arithmetic operations 475 Barney, L.D. 584
Armington, ES. 51, 78, 80 Barro, R. 581
Armstrong, M. 257, 264, 267, 277, 292 Barron, A.R. 469, 581,625, 690, 722
Index 803

Bartels, R.H. 205, 250 Bishop, EM. 403

Barto, A.G. 717, 722 Bisschop, J. 8, 80, 298, 311, 329, 472, 473, 488
Basar, T. 585, 605, 616 Bittanti, S. 251
Basic solution 94 Bizet, D.S. 581
Basis 94, 114, 565, 567 BKR algorithm 125
Bauer, R.J. 438 Bjorck, A. 15, 17, 80
Baughman, M.L. 310, 316, 323, 329 BK 599, 601
Bayesian calculations 502 Blackwell, D. 635, 723
Bazaraa, M.S, 366, 401 Blackwell's theorem (see also Markov decision
BDMLP 478 process) 633, 636
Beaumont, P. 401 Blagviere, A. 616
Beavers, A.N. 200, 250 Blair, J.R.S. 115, 139
Becker, G.S. 250 Blanchard and Kahn method (BK) 599
Becker, R.G. 588, 616 Blanchard, O.J. 581, 599, 616
Beckmann, M. 307, 330 BLAS Level-3 593
Behavior strategy t 10 Bleistein, N. 581
Beliefs 111 Blitzer, C.R. 332
Bellman, R. 581,589, 616, 622, 624, 625, 722 Block recursive form 375
Bellman equation (see also operators, Bellman) Blume, L. 117, 139, 581
531,575, 621,633, 635-638, 653, 654, Boggs, ET. 488
713 Bollerslev, T. 788, 797
Bellman recurrence equation 591 Boltyanskii, V.G. 618
Bellman value function 498 Bond trading 409
Bellman dynamic progrmnming 589 Bootstrap 501
Bellman functional recurrence equation 590 Bound variable 119
Bellman principle of optimality 595 Boundary conditions 268, 269, 272, 277, 283,
Belsely, D. 501,505 288, 602
Belsley, D.A. 403, 616, 723 Boundary shapes 278
Beltramo, M.A. 323, 329 Bounded rationality 452
Ben-Or, M. 120, 125, 129, 139 Bovenberg, A.L. 9, 25, 31-33, 80, 581
Bender, C.M. 581 Bowles, S. 337, 401
Beneveniste 446 Box, G.E.P. 753, 797
Benhabib, J. 581 Boylan, P. 282, 292
Bensoussan, A. 402, 581 Bradtke, S.J. 717, 722
Berck, E 438 Bratley, P. 743, 745, 753, 797
Bergman, L. 8, 80 Bray, T.A. 744, 753, 799
Bergstrom, A.R. 602, 613, 616 Brazil 3t9
Bernardo, A. 583 Breaking the curse of dimensionality 646
Bemardo, J.M. 798 Brock, W.A. 581,613, 617, 620, 696, 723, 797
Bertsekas, D.R 354, 363-366, 368, 369, 372, Brooke, A. 8, 80, 298, 322, 329, 472, 473, 479,
401, 588, 591,605, 607, 616, 622, 488
631,636, 654-656, 662, 663, 670, 722 Brown, G.W. 109, 139
Best response correspondence 90 Brown, M. 319, 329
Bhattacharya, R.N. 631,636, 723 Brown, EN. 723
Bierman, G.J. 201,250 Brumelle 655, 666
Bifurcation applications 542 Brunner, K. 617
Bifurcation methods 540 Budd, C. 581
Bifurcation theorem 540 Budget constraints 10
Binding incentive-compatibility constraints 266 Bulirsch, R. 604, 618
Binomial distribution 753 Bunching 284
804 Index

Bundling 289 Chebyshev polynomials (see also approximation

Burden, R.L. 603, 616 methods) 549, 570, 572, 623, 694,
Burnett, D.S. 581 713, 717, 719, 721
Byers, R. 201,250 Cheeseman, IF'. 437
Byrd, R.H. 488 Chen, X. 469
Chenery, H.B. 19, 80, 319, 329, 332
Cheney, E.W. 582, 603, 616, 798
C90 346 Chew, S.H. 723
C language 313, 321,491,502, 505 Chiappori, P.A. 582
Caines, P.E. 175, 176, 250 Chib, S. 779, 797
Calculus of variation 588 China 325
Canada 323 Chipman, J. 582
Canny, J. 121, 122, 139 Cho, I.-K. 117, 139, 457, 458, 463, 469
Caputo, M.R. 581 Choksi, A.M. 302, 319, 329
Carbon taxes 325 Choleski decomposition 593, 613
Carbonell, J.G. 437, 470 Chou, R. 788, 797
Carloyzi, N. 168, 169 Chow, C.S. 624, 626, 627, 644, 673, 674, 676,
Carraro, C. 615 704, 7t7, 723
Cartesian product 365 Chow, G.C. 169, 337, 401,588, 589, 603, 613,
Cascades 302, 315 616
Case 428 Christadoulakis, N.M. 618
Case Representation (CR) 429, 430 Christiano, L.J. 582, 711,717, 723
Case retrieval 432 Christov, N.D. 252
Case-based inductive indexing 431 Circumvent the curse of dimensionality 627
Case-based reasoning 407, 415, 427, 434 Cividini, A. 404
CASGEN 8 Clark, D.S. 470
Cash point machines 410 Clark, EB. 332
Castafion, D. 656, 662, 722 Class-instance object 417
Casti, J. 368, 401 Classification 422
Cattle cycles model 215 Classification of firms 425
Causal model 418 Classifier 446
CBR 427 Classifier system 448
Cement 320 CLUSTER/2 424
Central limit theorem 757, 763, 764, 771,775, CM Fortran 348
786, 787, 795, 796 CM-2 345
Certainty Equivalence (CE) 604, 607, 613 CM-5 347
Certainty Equivalence case 607 C Q emissions
CES function 78 costs of reducing 70, 74-76, 78
demand for product variety 72 Toronto targets 75
in illustrative CGE model 47, 50 Coarse-grained machine 344
percentage-change form of corresponding input Cobb-Douglas function 19
demand functions 19, 20 Cobb-Douglas structure 493
CGE 327 Coddington, E. 337, 401,582
Chan, S.W. 176, 197, 250 Codsi, G. 8, 12, 55, 80
Chao, J.C. 581 Cohen, D. 595, 616
Chari, V.V. 582 Coleman, T.E 593, 616
Cha:~aes, A. 304, 322, 329 Coleman, W.J. 582, 691,717, 723
Chebyshev approximation 550 Collapse of dimensi0nality 285, 289
Chebyshev interpolation 553, 554 Collins, G.E. 119, 120, 139
Chebyshev polynomial approximation 686 Collocation 568
Index 805

Collocation method 716 welfare analysis in 69-76

Colombia 324 Computational complexity 102, 106, 129, 624,
Commodities 298 639, 649, 676, 718
Comparative dynamics 520, 525 algebraic complexity 640
Comparative statics 496, 520 average case complexity 627, 628, 647, 648,
Competition theories 433 676
Complementarity problems 355 continuous complexity 624, 641
Complementary 94, 259 discrete complexity 624, 640
Complete polynomials 557, 573 intractable problems 626, 629, 640, 677
Complete problem 282 randomized complexity 645, 676, 703
Completely labeled 103 tractable problems 626, 628, 640
Complexity 327, 640, 643 worst case complexity 626, 643, 644, 648,
e-complexity 643 680, 707
Complexity fnnction 640 Computational complexity theory 629
Composition algorithms 752-754 Computational cost 643
Composition methods 746 Computer industry 320
Computable functions 640 Computer science issues 350
Computable General Equilibrium (CGE) models Computer software for solution of CGE models 8
calibration of 52, 53 Computer workstation industry 321
compared to economy-wide econometric Conceptual clustering 421,424
models 7, 8 Conditional expectations 576, 773, 774
costs of protection in 70-74 Conflictresolution strategy 417, 434
costs of reducing CO2 emissions in 70, 74-76 CONOPT 309, 478
definition of 5 Consistent 114
economies of scale in 9, 70-74 Consistent assessments 112
forecasting with 4, 62-67, 76-78 Constant returns to scale
game theory in 9 and costs of protection 74
history of 6-9 in CES production function 20
illustrative model 36-67 in illustrative CGE model 48
imperfect competition in 71-74 Constrained matrix problems 379
initial solution to 10, 12, 22, 27, 29, 31, 32, Constraint generation 663, 664
49-53, 66 Constraint programming 415, 435
investment in 21-24 Constraint satisfaction 432
key aspects of 4, 5 Constraint sets 632
level of disaggregation in 75, 78 Constraints 415
monographs describing 8, 9 Consumption function 638
multi-period versions of (see also multi-period Continuous action 630
CGE models) 24-36 Continuous and discrete types 265
percentage-change forms of 17-21, 27, 29-32 Continuous (or information based) complexity
potential of 76 639
price discrimination in 9 Continuous complexity theory 641
product differentiation in 9 Continuous computational complexity 624, 641
range of applications of 4 Continuous finite horizon MDPs 669
rational expectations in 9 Continuous MDP problems 624
software for solution of 8 Continuum of types 256
solving via derivative methods t2-24 Contraction 360
solving via programming methods 10 12 Contraction iteration 653
success of 67 69 Contraction mapping 62t, 633, 635, 652
surveys of 8 Contraction Mapping theorem 636, 653
textbooks on 9 Control variables 774
806 bldex

Control vector 589 Danthine, J.-R 582

Controllability 175 Dantzig, G.B. 297, 310, 329, 336, 401,480, 488,
Controlled stochastic process 620 582, 624, 683, 722, 723
Convexity constraint 284 Dasgupta, S. 582
Cooley, T. 582 Data entry 475
Cooper, W.W. 304, 307, 322, 329, 617 Databases 312, 407
Copithorne, L.W. 324, 332 Davis, EJ. 676, 721,723, 734, 736, 737, 797
Copper 319 de Boor, C. 581,723
Corden, W.M. 56, 69, 74, 80 de Melo, J. 9, 81
Cost minimization 315 de Ridder, L. 737, 800
Cost-to-Go 605 Deaton, A. 582
Costs of protection CGE analysis of 70-74 DeCegama, A,L. 345, 401
Cottle, R.W. 403 Decision making level 411
Cournot equilibrium 503 Decision support systems 407
Cournot, A. 370, 401 Decision trees 422
Cournot-Nash equilibrium 394 Declaration 477
Couture, R. 800 Declaration, definition statements 473
Covariance 482 Decomposition 351
Covariance estimates 608 Decomposition algoritluns 370
Covariance matrices 605 DeDoncker-Kapenga, E. 799
Covariance operator 645 Definition of equations 477
Coveyou, R.R. 743, 745, 797 Deflating subspace 187
Cox, D. 9, 71, 73, 74, 80, 82 Delaunay triangulation 282
CR (Case Representation) 429, 430 DeLong, J.B. 161, 169
Crank-Nickelson method 603 Deminel, J.W. 193, 250
CRAY X-MP/48 346 den Haan, W. 583
Credit risk 416 Denardo, E.V. 635, 724
Cremer, J. 319, 329 Deng, Y. 339, 401
Critical masks 322 Denman, E.D. 200, 250
Crops Denning, P.J. 344, 401
annual 326 Dennis, J.E. 366, 401
tree 326 Depreciation 25, 26, 28, 50
Cryer, C.W. 723 Derangements 130
Cubic spline interpolation 686 Derivative methods for solving CGE models (see
Cubic splines 622 also Johansen/Euler method for solving
Curse of dimensionality (see also computational CGE models) 9, 12-24
complexity) 625, 626 629, 631,644, Derrick, W. 269, 292
646, 669, 671,672, 676, 677, 679, Dervis, K. 9, 25, 80, 81
690, 692, 694, 697-699, 702, 703, 717, Detectability 175
718, 721 Deterministic complexity 645, 646, 698
Cuthbertson, K. 606, 616 Deterministic control 588
Cylindrical algebraic decomposition 120 Deterministic LQCM 588
Deterministic regulator problem 174
Devarajan, S. 8, 81
Dafermos, S. 336, 337, 370, 372, 401 Devroye, L. 750, 753,797
DAG (directed acyclic graph) 375 Dickhaut, J. 139, 499, 505
Dahl, H. 479, 488 Dies 322
Dahlquist, G. 15, 17, 80 Dieter, U. 753, 754, 797
Dammert, A. 303, 319, 329 Differential condition 273, 277-279, 288
Daniel, J.W. 582, 624, 685, 723 Differential constraint 272, 274
Index 807

Dikhaut 131 Dudley, R.M. 724

Direct methods 362 Duloy, J.H. 326, 329
Dirichlet boundary conditions 279 Dupuis, EG. 356, 358, 372, 374, 402, 403, 613,
Dirichlet conditions 273 617, 630, 725
Dirkse, S. 478, 488 Duraiappah, A.K. 325, 329
Discounted criterion 631 Dutta, B. 438
Discounted stochastic regulator problem 177, Dutta, P. 631,724
179, 242 Dwolatzky, B. 616
Discourse analysis 418 Dynamic games 574
Discrepancy (see also numerical integration) 680 Dynamic optimization 588, 793, 794
Discrete approximation 622, 669, 679, 698 Dynamic Programming (DP) (see also Markov
Discrete (or algebraic) complexity theory 639, decision process) 319, 498, 560, 563,
641 595, 619, 620, 632, 653
Discrete (or algebraic) computational complexity Dynamical systems 356
624, 640 Dynkin, E.B. 724
Discrete finite horizon MDPs 649
Discrete formulation 277
Discrete infinite horizon MDPs 652 Easley, D. 581
Discrete types 265 Easterbrook, S.M. 438
Discretization 562 Eastman, H.C. 73, 81
Discretized Bellman operator 673 Eaves, B.C. 92, 96, 97, 105, 139
Discretized incentive compatibility 267 Eckstein, Z. 622, 724
Discriminant functions 444 Econometric calculations 501
Distorted economy 208, 246 Econometric model simulation 375
Distributed memory 343 Economies of scale 307
Divergence 259 and costs of protection 70-74
Divergence theorem 259, 267, 287 in CGE models 9
Diversification 482 Economy-wide econometric models compared to
Dixit, A. 582 CGE models 7,. 8
Dixon, EB. 8-10, 12, 16, 19, 30, 32, 33, 36, 51, Education model 212
56, 61-64, 68, 70, 76-81, 327, 337, Efficiency of the algorithm 353
401 Egyptian agriculture 326
Doan, T. 250 E1-Gamal, M. 137, 139
Document layout language 418 Elasticity parameters in illustrative CGE model
Dollar ($) operator 478 51
Domain 474, 477 Elasticity parameters 5, 9
Don, H. 8,81 Electric power 316, 323
Donaldson, J.B. 582 Elliptic equations 268, 271,279
Dong, J, 403 Elliptic PDEs 628
Dongana 655 Elman, J.L. 448, 469, 668
Dorfman, R. 297, 329, 336, 402 Emeris, I. 121, 139
Doshi, B.T. 630, 724 Endpoint 99
Dotsey, M. 582 Energy economics 479
Doubling algorithm 194, 207 Energy sector 324
Doup, T.M. 105-107, 139 Engle, R. 727
Drexler, EJ. 129, 139 Entriken, R. 298, 311, 329, 473, 488
Dreyfus, S. 622, 722 Entropy functional 486
Drud, A. 8, 81, 309, 329, 472, 478, 488 Envelope condition 289
DUAL 611 Envelope property 257
Dual method 379 Epstein, L.G. 631,723, 724
808 Index

EQUATIONS 477 Fakes, J.D. 603, 616

Equilibria finding all 118 Far East 320
Equilibrium refinements 115 Fast matrix multiplication 651,663
ES/9000 347 Fast matrix multiplication algorithms 649
Euclidean remainder sequence 124 Feasibility 288
Euler and transversality conditions 259 Feasible 93, 94
Euler condition 260 Feature size 321
Euler equation error 529 Feedback rule 590
Euler equation methods 717 Feed-backward networks 414
Euler method 396, 398 Feed-forward networks 414
Euler-type method 373, 396 Feedforward neural network 446
Europe 320 Feichtinger, G. 582
European Common Market 304, 308 Feig, E. 129, 139
Evans, M. 764, 797 Fernandez, G. 323, 324, 329
Evtushenko, Y.G. 724 Ferrier, G.D. 336, 402, 617, 670, 724
Exact equilibration 382 Ferris, M. 488
Exact equilibration algorithm 380 Fertilizer 319
Exception handling 478 Fienberg, S.E. 505
Execution statements 473 F1ML 146, 160, 165
Existence of a solution to a variational inequality Finance 407
problem 359 Financial models 479
Expectation consistent 595 Financial services 409
Expectational variables 597, 606 Fine-grained machine 344
Expectations Finite-difference approximations 269
about exchange rates 76 Finite element 555, 558, 573
model consistent (see rational) 9, 24, 25, Finite-element approximation 268
30-36 Finkel, R. 368, 402
rational 9, 24, 25, 30-36 First-order necessary condition 258
static 24, 28-30 Fishbone, I.G. 75, 81
Expectations-consistent strategy 610 Fisher, D. 437
Expected revenue 258 Fisher, J.D.M. 582, 717, 723
Expert system 311,415, 435 Fisher, RG. 595, 616
Explanation 415, 417, 427, 435 Fisher, E 161, 169
Explanation-based learning 421 Fisher, R.A. 469
Exponential distribution 746, 748 Fishman, G.E 743, 797
Exponential-time algorithm 625, 720 Fixed point 90, 91,635, 656, 716
Exponential-time problem (see also Fixed point problem 358, 652
computational complexity) 640, 649 Flannery, B.E 618, 727, 800
Extended Path (EP) method 160 Flavin, M.A. 250
Extensive form 109 Fleming, W.H. 582, 630, 724
Extraneous solution 93, 265 Fletcher, C.A.J. 582
Eydeland, A. 336, 402, 403 Fletcher, R. 250
Flynn, M.J. 342, 402
Flyun's taxonomy 342
Face 103 Folk theorem 456
Facet 103 Forex forecasting 426
Fair, R.C. 36, 81, 144, 157, 158, 160, 161, Forsythe, G.E. 746, 797
163-165, 168, 169, 595, 597, 616, 717 Fortran 491, 502, 505
Fair-Taylor iterations 608 Forward differences 270
Fair-Taylor iterative procedure 597, 607 Forward integration 608
Index 809

Forward reasoning 435 GAMS statements 473

Forward-looking behavior 594 declaration statements 473
Forward-looking variables 598 definition statements 473
Foster, G.H. 603, 616 execution statements 473
Fourer, R. 298, 310, 329, 472, 473, 488 GAMS symbols 473
Fourier series 268 ACRONYMS 473, 486
Fourier solutions 269 MODELS 473
Fox, B.L. 622, 724, 743, 745, 753, 797 PARAMETER 473, 475
Frame 417, 435 SETS 473
Frame-based knowledge representation 429 GAMS/SAMBAL 484
Franke, R. 582 Garcia, C.B. 32, 85, 106, 129, 139, 142, 337, 402
Fredhohn integral equations 628, 646, 647 Gardiner, J.D. 173, 191,200, 207, 251
Free boundary value problem 639 Garey, M.R. 624, 640, 724
Freeman, D. 437 Gately, D.I. 323, 330
Freidenfelds, J. 320, 330 Gauge functions 542, 546, 579
Friedman, A. 582 GAUSS 611
Friedman, D. 293 Gauss programming language 702
Frisch, R. 6, 81 Gauss-Hermite quadrature 702
Frobenius norm 486 Gauss-Seidel 652
Fron, A. 439 Ganss-Seidel algorithm 376
Frost, R.A. 439 Gauss-Seidel iteration 361,363
Fu, K.S. 437 Gauss-Seidel technique 152
Fudenberg, D. 466, 470 Ganss-Seidel version of the gradient projection
Full information likelihood estimation 606 method 368
Fullerton, D. 9, 80 Gaussian 611
Functional equation 635 Gaussian quadrature 675, 676, 715, 734, 738
Functional programming 491 Gay, D.M. 310, 329, 473, 488
Fuzzy membership 419, 435 GCL 134
Geary, R.C. 47, 81
Gelfand, A.E. 776, 797
Gabay, D. 370, 402 Gelman, A. 786, 797
Gagnon 717 Geman, D. 797
Gale, W. 437 Geman, S. 724, 797
Galerkin 570 GEMPACK 8, 12, 21, 24, 30, 53, 55, 79
Galerkin method (see also approximation General Algebraic Modeling System (GAMS) 8,
methods, Galerkin method) 689, 711, 79, 298, 310, 472
715, 721 General equilibrium 479
Gallant, A.R. 446, 470 Generalized real Schur decomposition 193
GAMBIT 133 Generalized Shur algorithm 193
Game analysis 318 Generalized single crossing condition 256
Game theory 434 Generic AI tasks 415
in CGE models 9 Genetic algorithms 434
price leadership 73 Genz, A. 737, 797, 798
Bertrand rivalry 73 Geoffard, P.Y. 582
Gamkrelidze, R.V. 618 Geoffrion, A.M. 298, 310, 328, 330, 473, 488
Gamma distribution 754, 791 Geographical scope 318
GAMS (General Algebraic Modeling System) 8, Geweke, J. 697, 724, 748--750, 764, 771,787,
79, 298, 310, 472 798
GAMS libraries 479 Geyer, C.J. 786, 787, 798
810 Index

Ghysels, E. 582 Gustafson, R.L. 583

Giannessi, E 403 Guu, S.-M. 583
Gianotti, C. 437
Gibbs sampler 776-778, 780, 781,784, 785,
790-792 Hackbusch, W. 724
Gibrat, R. 323, 331 Hahn, EH. 6, 80
Giesen, G. 81 Hakansson, N. 620, 724
Gihman, I.I. 630, 636, 724 Halbert, W. 469, 470
Gilks, W.R. 750, 791,798, 800 Hale, J.K. 582
Gilli, M. 402 Hall, R.E. 251
Ginsburgh, V.A. 10, 81 Hall, S.G. 616
Glimm, J. 339, 401 Hallett 598
Global warming 325 Halting problem 640
GMRES (Generalized Minimum Residual Halton points 679, 703
Algorithm) 668, 704 Halton sequence 739, 740, 766-768
Goals 413 Halton, J.M. 679, 703, 739, 798
Goffe, W.L. 336, 402, 610, 617, 670, 724 Hamilton, J.D. 251
Goldberg, D.E. 470 Hamilton-BeUman-Jacobi (HBJ) equation (see
Goldfarb, D. 724 also Markov decision process,
Goldstein, A.A. 402 continuous time) 630
Goldstein, G. 81 Hmnikon-Jacobian-Bellman equation (HJBe)
Golub, G.H. 173, 185, 193, 205-207, 251,270, 603, 614
271, 281,292, 735, 798 Hamiltonian 189
Golubitsky, M. 582 Hammarling, S.J. 251
Gonzalez, A.J. 438 Hammersley points (see also numerical
Goodwin, G.C. 176, 197, 250 integration, low descrepancy mthods)
Gordon 303 628, 647, 671,679, 703
Goreux, L.M. 330 Hammersley sequence 739, 740
Goulder, L.H. 9, 25, 32, 33, 80, 82 Hammersley, J.J. 679, 703, 724
Gradient 492 Harnmersley, J.M. 740, 761,770, 774, 776, 798
Gradient hill climbing algorithms 700 Handscomb, D.C. 724, 740, 761,774, 776, 798
Gradient projection method 368, 377 Handelsman, R.A. 581
Gradshteyn, I.S. 734, 798 Hannah, E.J. 787, 798
Grammar 435 Hansen, G.D. 583
Granularity 344 Hansen, L.P. 160, 169, 176, 180, 251, 583, 622,
Graphical systems 312 627, 631,722, 724
Greenberg, E. 779, 797 Hansen, T.E. 404
Greenberger, M. 744, 798 Hardy-Krause variations 680, 681
Greenhouse gas (see CO2 emissions) Hargreaves, C. 79, 83
GRG2 478 Harker, P.T. 107, 140
Grid 669, 700 Harris, C. 581
Grid points 643 Harris, R.G. 8, 9, 71, 73, 74, 80, 82
Grossman, S. 269, 292, 583 Harris, T. 620, 722
Growth models 327 Harrison,W.J. 21, 82
Gruen, EH. 47, 84 Harro, W. 470
Guder, F. 402 Harston, C. 437
Gudmundsson, T. 251 Hart, H.E 746, 798
Guesnerie, R. 255, 257, 258, 261,285, 292, 582 Hart, S. 438
Guide 473 Hartman, P. 337, 372, 402
Gul, E 130, 139 Harvey, R.P. 582, 723
Index 811

Hastings, W.K. 776, 778, 798 Huber, P.J. 798

Hawkins, D. 724 Hubner, G. 652, 725
Hazard rate 256, 261 Hudson, E.A. 7, 24, 82
Hazell, P.B.R. 326, 330 Huffman, G.W. 583
Heaton, J. 176, 251 Hughes Hallett, A.J. 598, 616, 617
Hebb, D.O. 470 Hughes, M. 336, 402, 403
Helmberger, P.G. 584 Hull, J. 603, 617
Hennessy, J.L. 345, 402 Hurwicz, L. 285, 292, 336, 401
HERCULES 8, 478 Hussey, R. 674, 728
Hermite interpolation 552 Hwang, C. 670, 724, 725
Hermite polynomials 549, 689 Hybrid perturbation-projection method 578-580
Hertz, J. 470 Hymmen, H.A. 81
Hessenberg decomposition 206 Hypercube 343
Hessenberg-Schur algorithm 205
Hessian 492
Hessian matrix 261,268 ID3 421,422
Heterogeneity 256 Illustrative CGE model
Heun-type method 373 closure 53-56, 59, 61, 63
Heuristic knowledge 407 comparative-static simulations with 55-62. 63
Heuristics 435 elasticity parameters 51
Heyman, D.E 727 equations 39-48
Hickman, B.G. 169 forecasting with 62-67
Hierarchical indexing 431 initial solution 49-53, 66
Hierarchical retrieval 432 input-output database 37-39
Higher dimensions 279 Immediate predecessor 109
Hilderbrand, W. 82, 584 Immediately accessible 101
Hildreth, C. 402 Imperfect competition in CGE models 71-74
Hill, I.D. 744, 800 Imperfect state information 630
Hillis, W.D. 338, 402 Implementation Modified Projection Method on
Hinderer, K. 630, 725 CM-2 388
Hironaka, H. 120, 140 Implicit function theorem 13, 517
Hirsch, M.D. 106, 140 Implicit fnnction theorem for analytic operators
Hitz, K.L. 191,251 518
Hlawka, E. 739, 798 Importance sampling 747, 761-776, 771-773,
Hockney, R. 345, 402 776, 778-790
Holbrook, R.S. 337, 402 hnpossibility theorems 421
Holland, J.H. 470 Imrohoro~lu, A. 725
Holly, A. 583 hnrohoro~lu, S. 725
Holly, S. 598, 616, 617 IMSL 490
Holt, C.C. 588, 617 Incentive compatibility 265, 287
Holmlund, B. 581 Incentive efficiency 286
Homotopy method 105 Incentive-compatibility constraint orientation 270
Hopf bifurcation 541 Incentive-compatibility constraints 255, 257,
Hornik, K.M. 470, 625, 690, 725 259, 268, 286, 288, 289
Horridge, J.M. 9, 22, 24, 32, 71-74, 82 Incentives 255
Hotelling, H. 588, 617 Incomplete problem 257, 258, 288
Howard, R. 725 Increasing differences 256, 283
Howson, J.T., Jr. 140 Increasing-differences property 262
Hu, T.C. 139, 141 Increasing returns to scale (see economies of
Huber, B. 129, 140 scale)
812 Index

independence chain 778, 779 Interconnection networks 343

Independence Monte Carlo 756, 767-769, 775 Interconnection risk 416
Index 129 Interdependent block 375
Indexed operators 475 Interplant shipments 316
P R O D 475 Interpolation 512, 548, 551
S M A X 475 Interpolation methods (see also approximation
St4IN 475 methods)
SUM 475 local polynomial interpolation 622
Indexing of cases 430 Interpolation error 552
India 325 Interval logic 419
Induction 420 Intractable problems (see also computational
Inductive logic 414, 435 complexity, intractable problems) 626,
Industry Commission 75, 82 629, 640, 677
Inference 435 Intriligator, M.D. 293, 337, 402, 404, 617
Inference engines 417 Invariant subspace 184
Infinite horizon problem 591, 592 Inverse c.d.f. 746, 753, 754, 772, 773, 767
Information fusion 434 Investment cost 308
Information operator 642, 643 Investment in CGE models
Information partition 110 adjustment costs in 33, 75
Information sets 110 in illustrative model 42, 48, 50, 51, 59, 62, 66,
Information theoretic approach 423 67
Information-Based Complexity (IBC) 641,672 in multi-period models 24-36
Ingram 717 inequality constraints 21-24
Initial assessment 110 Ironing procedure 257, 261
Initial nodes 109 Irrigation 326
INMOS transputer 345 Iterative methods 361,362
Inner product 565, 567 Iterative techniques 652
Innovations representation 224
Input-output accounting concepts
basic values 37 Jacobi iteration 361,363
make matrix 39 Jacobi Overrelaxation Method (JOR) 363
margins 37 Jacobian 279
purchasers' values 37 Jacobian matrix 14, 16, 31,261,268, 274
Input-output data 5, 9, 18 Jacobstein, N. 438
an illustrative example 37-39 Jacquier, E. 788, 791,798
interpretation of 78, 79 ]ager, H. 598, 615
sensitivity of CGE results to 77, 78 Jain, A. 488
updating 77 Jamshidi, M. 589, 617
Input-output model 6, 8, 67, 69 Japan 320
Insurance 407, 409 Jerrell, M. 610, 617
Integer programs 478 Jesshope, C. 345, 402
Integrability 261,287 Johansen, L. 6, 7, 8, 13, 82
Integrability condition 268-270, 272, 277, 278, Johansen/Euler method for solving CGE models
280, 283, 288 9, 12-24
Integral condition 273, 278, 279 approximation errors in 14-17, 58, 61, 62
Integral constraint 272, 274, 289 extrapolation procedures with 17, 24, 61, 62
Integration 734, 742, 789, 793 for illustrative model 49-51, 55-57
Integration operator 649 inequalities and complementary slackness
Intelligence, artificial 405 conditions in 21-24
Intelligent agents 411 invertability of derivative matrices in 13, 14
Index 813

multi-step computations in 14-17, 27, 61, 62 Kenney, C.S. 201,251

non-recursive multi-period models 25, 30, 36 Keramaidas, E.M. 798, 800
recursive multi-period models 26-30 Kernighan, B.W. 310, 329, 473, 488
using percentage changes in 17-21 Keune, M. 799
Johnson, D.S. 624, 640, 724 Khachian, L.C. 627, 725
Johnson, H.G. 70, 82 Kim, D.S. 403
Johnson, S.A. 583, 622, 683-686, 725 Kim, H.K. 326, 330
Jones, B. 76, 82 Kimura, M. 187, 196, 197, 200, 251
Jones, C.V. 312, 330 Kinard, L.A. 129, 140
JOR (Jacobi Overrelaxation Method) 363 Kincaid, D. 603, 616
JOR iteration 363 Kinderlehrer, D. 359, 403
JOR method 365 Kinderman, A.J. 753, 799
Jordan cauonical form method 600 Kindervater, G.A. 368, 403
Jorgeusen, D.W. 7, 8, 9, 24, 25, 34, 75, 80, 82 King, S.R. 161, 169
Judd, K.L. 337, 402, 581,583, 622-624, 684, Kipnis, C. 799
686, 709, 715, 725, 735, 798 Klee, V. 627, 725
Judge, G.G. 336, 370, 404 Klein, L.R. 47, 82
Juskevic, A.A. 724 Klein-Rubin 47
Kleindorfer, E 402
Kleinman, A.J. 652, 725
Kachitvichyanukul, V. 753, 798 Kloek, T. 761,799
Kadane, J.B. 741,742, 800 Kmenta, J. 7, 82
Kahaner, D.K. 799 Kneese, A.V. 82
Kahn, C.M. 581, 599, 616 Knowledge 417, 435
Kahn, M. 774, 799 Knowledge assets 434
Kalaba, R. 581,622, 722 Knowledge guided indexing 43 l
Kalai, E. 140 Knowledge representation 417, 429, 435
Kalaith, T. 589, 617 Knowledge-based reasoning 407
Kalman filter 610 Knowledge-based retrieval 432
Kamien, M.I. 613, 617 Knowledge-based systems 417
Kang, K.-H. 323, 330 Knuth, D.E. 743-745, 799
Kantorovich, L.V. 336, 403, 725 Kohlberg, E. 114, 116, 140
Kaplan, T. 131, 139, 499, 505 Kohn, R.E. 250, 325, 330
Karakitos, E. 616 Koksma-Hlwaka inequality 679, 681
Karmarkar 627 Koller, D. 111, 132, 140
Kashyap, R.L. 251 Kolodner, J.L. 437, 438
Kass, R.E. 737, 742, 797, 800 Konstantinov, M.M. 252
Kaufmann III, W.J. 345, 403 Koopmans, T.C. 297, 299, 307, 330, 336, 401,
Keane, M.R 679, 691,725 403
Kehoe, T.J. 7, 82, 582, 583 Korpelevich, G.M. 370, 403
Keller, W. 8, 82 Kortum, S. 709-711,725
Kelley, A.C. 8, 82 Koselka, R. 438
Kelly, J. 437 Kostreva, M.M. 129, 140
Kendall, M.G. 470 Kotkin, B. 581,622, 722
Kendrick, D.A. 80, 81,298, 302, 304, 312, 314, Kotlikoff, L. 584
315, 319, 323, 327, 329, 330, 337, Kozen, D. 120, 125, 129, 139
401,403, 472, 473, 478, 479, 488, Krasnoselskii, M.A. 584, 716, 725
588, 589, 593, 595, 596, 598, 599, Kreps, D.M. 109, 117, 139, 140, 466, 470, 725
605-607, 609-611, 615-617 Krishnan, R. 330, 331
Kennedy, M. 324, 330 Krogh, A. 470
814 lndex

Kronecker product 126 Lehnert, W.G. 437

Kroner, K.F. 788, 797 Leighton, ET. 345, 403
Kronmal, R.A. 746, 799 Leland, H. 620, 726
Kronsj6, L. 725 Lemke, C.E. 92, 106, 139, 140, 337, 403
Krylov information 668 Lemke-Howson algorithm 92, 115, 137
KSR1 346 Lending advisors 409
Kuhn, H.W. 106, 113, 139, 140 Lenstra, J.K. 368, 403
Kumar, A. 252 Leontief, W.W. 6, 39, 78, 82, 726
Kushner, H.J. 470, 613, 617, 630, 652, 725 Lerner, A.P. 72, 82
Kutcher, G.P. 326, 331 Letson, D. 324, 325, 331
Kwakernaak, H. 175, 251,589, 617 Lettau, M. 726
Kwun, Y. 323, 331 Levhari, D. 620, 726
Kydland, RE. 163, 169, 179, 251,584, 594, 620, Levine, D. 583
725 Levinson, N. 337, 401, 582
K~gstrt~m, B. 251 Lex-feasible matrix 96
Lex-negative matrix 96
Ley, E. 502, 505
Labeling 103 Li, Y. 583
Labys, W.C. 297, 326, 331 Lieberman, O. 582
Laffont, J.-J. 255, 257, 258, 261,285, 292 Likelihood function 227
Lafrance, J.T. 584 Limitations 327
Lagrange interpolation 551 Lin, W. 201,251
Lagrange multiplier 259, 260 Linden, G. 324, 331
Lagrange multiplier approach 717 Linear Complementarity Problems (LCP) 21, 22,
Lagrangian methods 11, 19, 32, 33 93
Laguerre polynomials 549, 689 Linear congruential generators 743-745, 767
Laitner, J. 584 Linear convergence 653
Lal, R. 754, 800 Linear elliptic PDEs 647
Landowne, Z.E 582, 723 Linear expenditure system 47
Lang, H. 581 Linear Ganss-Seidel method 371,392
Langley, P. 437 Linear growth condition 374
Langston, V.C. 322, 331 Linear interpolation 683
Laplace approximation 741 Linear Jacobi method 371
Laplace equation 268 Linear program 480
Laplace method 546, 547 Linear Programming (LP) 624, 627, 648, 663,
Laplacian 279 664, 704, 708
Laroque, G. 582 Linear programming transportation problem 298
Larson, R. 368, 401 Linear Quadratic Control Model Control Model
Lasdon, L.S. 368, 403, 478, 488 (LQCM) 588, 614
Laub, A.J. 173, 185, 187, 188, 191,200, 201, Linear-quadratic approximation 717
251,252 Linear-quadratic problems (see also Markov
Laureano-Ornz, R. 438 decission process, linear quadratic)
Lawson, C.L. 798 630
Learning 414, 435 Linear strategies 454
Learning process 610 Linearization method 371
Least squares 567 LINEX loss function 499
LeCam, L.M. 798 Lions, J.L. 403
L'Ecuyer, P. 744, 799, 800 Lipschitz continuity 360
Lee, R.M. 330 Lipton, D. 36, 82
Legendre polynomials 549, 689 Liquidity risk 416
Index 815

Lisp 491 Magill, J.RM. 584

Lists 475 Majumdar, M. 631,636, 723
Litterman, R. 250 Malakellis, M. 22, 24, 32, 34, 36, 60, 83
Liu, M. 691,723 Malek-Zavarei, M. 589, 617
Livestock 326 Malik, A. 737, 798
Ljung, L. 446, 470 Malliaris, A.G. 613, 617
Lluch, C. 51, 82 Management science 297
Load balancing 351 Manber, U. 402
Local incentive-compatibility constraints 267 Mangasarian, O.L. 131, 140, 617
Locations 304, 410, 411,413 Mankiw, N.G. 438
Lofgren, H.L. 326, 331 Manne, A.S. 6, 75, 83, 310, 319, 323-325,
Logic predicate 417, 435 329-331
Logic predicate rule 423 Mao, C.S. 582
Logic programming 312 Marcet, A. 443, 458, 470, 583, 584, 717, 726
Long, J.B. 620, 726 Maren, A. 437
Long-run average criterion 631 Marginal excess burden of taxation 69
Long-ran average rewards 630 Marginal prices 260, 263, 281, 282, 289
Longva, S. 24, 83 Marimon, R. 443, 450, 458, 470
Loop 99 MARKAL model 75
Lootsma, EA. 368, 403 Market equilibrium assumptions 5
Lorentsen, L. 24, 83 in illustrative CGE model 41, 47
Low discrepancy 648 in Johansen model 6
Low discrepancy grid points 679 Market equilibrium conditions 391
Low discrepancy grids 671,700, 703, 706 Market price risk 416
Low discrepancy methods 738, 739, 766, 769 Marketing research 424, 433
LQ-MDP 631 Markov chain 779, 780, 783-787, 790
LQCM 588, 589, 591,593, 596, 598, 602, 613, Markov chain Monte Carlo 746, 775-777, 786,
611 787
Lu, L. 201,251 Markov Decision Process (MDP) (see also
Lucas critique 610 controlled stochastic process) 620, 698
Lucas, R.E, Jr. 584, 585, 610, 617, 618, 620, action space 632
724, 726, 728, 793, 794, 800 Continuous Decision Processes (CDPs) 629,
Luenberger, D.G. 726 633, 669
Luethi, H. 106, 139 continuous MDPs 621
Ldonard, D. 617 continuous state MDPs 630
continuous time MDPs 630
deterministic MDPs 630
M5 423 discount factor 632, 653
MacFarlane, A.G.J. 185, 251 discounted MDPs 630
Machina, MJ. 631,726 Discrete Decision Processes (DDPs) 633, 669
Machine learning 407, 414, 420 discrete MDPs 621
Macintosh 312 finite horizon MDPs 630, 632
MacKinnon, J.G. 106, 140 infinite horizon 630, 632
MacLaren, M.D. 753, 799 Linear Quadratic MDPs (LQ-MDPs) 627
MacPherson, R.D. 743, 745, 797 state space 632
MacRea, E.C. 588, 617 stochastic MDPs 630
Macroeconometric model 7, 8, 77, 375, 413 Markov operator 654
Macroeconomic forecasts 62-64, 76, 77 Markov process 654
Macroeconomic policy 56-61 transition probabilities 632
Maehly, H.J. 798 Markov transition matrix 606
816 Index

Markov transition probability 620 McKnight, R.D. 582, 723

Markovian 630 McLennan, A. 114, 117, 130, 139, 140
Markovian decision processes 630, 632 McMillan, J. 256, 257, 283, 292
Markovian decision rule 655 McNamee, J. 737, 799
Markovian strategies (see also optimal decision McQueen-Porteus error 660
rules) 634 McQueen-Porteus error bounds 653, 656, 660,
Markowitz, H. 83, 310, 331,336, 403, 482, 488 661,706
Markowitz model 482 MDP (Markov Decission Process) 629, 698
Marsaglia, G. 743, 744, 745, 748, 752, 753, 799 MDP problems 664
Marschak, J. 620, 722 Mead, R. 670, 726
Marshall, A.W. 774, 799 Meagher, G.A. 62, 63, 79
Marshall, D.A. 717, 726 Mean-variance models 479
Marsten, R.E. 478, 488 Mean-variance optimization model 482
Maskin, E. 466, 470 Measure sweeping 284, 289
Masse, E 323, 331 Measurement error vector 606
Massive parallel processing 650 Measurement vector 606
Massive parallel processor 650, 663, 665 Mechanism design 285, 288
Massively parallel computers 651 Medium-grained machines 344
Massively parallel implementation of SEA 383 Meeraus, A. 8, 80, 81, 83, 302, 304, 310, 314,
Massively parallel policy iteration 664 319, 326, 329, 331,332, 472, 473,
Massively parallel processors 649 479, 488
Mathematica 490 Megiddo, N. 111, 132, 140
Mathiesen, L. 21, 83, 107, 140 Mehra, R. 582, 585
Matrix estimation 484 Mellon, B. 322, 329
Matrix generators 472 Meltzer, A.H. 617
Matrix pencil 187 Mennes, L. 319, 332
Matrix Riccati equation (see also linear quadratic Mercenier, J. 8, 9, 25, 71, 83, 84
problems) 627 Mergers and acquisitions 433
Matrix sign 190 Merrill, O.H. 106, 140
Matrix sign algorithm 200 Mertens, J.E 116, 117, 140
Matulka, J. 588, 617 Merton, R.C. 584, 620, 726
Maximum Principle 588, 603 Mesh network 343
Mayne, D.Q. 176, 250 Mesztenyi, C.K. 798
McAfee, R.P. 256, 257, 283, 292 Mete Soner, H. 724
McCabe, K.A. 323, 331 Method of minimum weighted residuals 714
McCall, J.J. 723 Metropolis, N. 776, 778, 799
McCarl, B.A. 326, 331 Metropolis-Hastings algorithm 776, 778, 779,
McCarthy, M.D. 156, 169 784, 785
McDonald, D. 62, 64, 78, 79, 81 Metropolis-Hastings independence chain 791
McFadden, D. 582, 727 Mexican agriculture 326
McGrath, E.I. 772, 799 Mexican steel industry 311
McGrattan, E. 252, 450, 470, 584, 627, 717, Michalski, R.S. 437, 438, 470
722, 793, 799 Michel, E 595, 616
McGratten, E.R. 592, 617 Microprocessor 321
McGuire, E 622, 726 Mikhail, W.M. 770, 799
McKelvey, R.D. 108, 117, 130, 137, 139, 140, Miles, D. 439
622, 726 Milgroln, ER. 256, 262, 266, 283, 293
McKenzie, L.W. 582 Milne, W.E. 269, 271,293
McKibbin, W.J. 9, 75, 83 Milnor, T. 584
McKinnon, K.LM. 666, 722 M1MD machine 342
Index 817

MIMI 310 Motivations 409, 410, 413

Min, C. 786, 800 Moulin, H. 370, 402
Minhas, B.S. 19, 80 MSG-4 model of Norway 24
Minimum residual 717 Muller, M.E. 753, 797
Minimum Residual (MR) methods 623, 709, 713 Multi-period CGE models
MINOS 309 and welfare analysis 75
Minsky, M.L. 446, 470 different types of 24, 25
Minty, G.J. 627, 725 timing in 26, 33
Miranda, M. 584, 622, 726 with exogenous investment 25-27
Mirman, L.J. 581,620, 696, 723, 797 with optimizing investors 32-36
Mirrlees' formulation 257 with rational expectations 30-36
Mirrlees, J.A. 255, 257, 260, 293 with static expectations 28-30
Mishenko, E.E 618 Multicommodity problems 390
Mitchell, B. 770, 799 Multidimensional integration by parts 259
Mitchell, T.M. 437, 470 Multigrid methods 701,704
Mixed complementarity problem 478 Multigrid versions 664
Mixed integer programming 310 Multilinear approximation 563
Mixed strategy 644 Multilinear interpolation 683
Mnaber 368 Multinomial distribution 755
Model of computation (Turing vs. real) 639 Multiproduct bundling 261
Model tree algorithms 423 Multivariate function approximation 647, 699,
Modeling languages 472 718
AIMMS 298, 311,473 Multivariate function approximation problem 698
AMPL 298, 310, 473 Multivariate integration 624, 628, 641,643-647
GAMS 298, 310, 472 Multivariate normal distribution 755, 767
Models with rational expectations 160 Multivariate optimization 628, 648
Modi, J.J. 593, 617 Mulvey, J.M. 478, 488
Modified policy iteration 655, 656, 661,667, 704 Murphy, C.W. 77, 83
Modified policy iteration algorithm 660 Murphy, F.H. 403
Modified projection method 370, 387 Murphy, K.M. 173, 250, 252
Moeschlin, O. 141 Murtagh, B.A. 309, 332, 478, 488
Moler, C.B. 251 Murty, K.G. 102, 140
MONASH model 62-64 Mussa, M. 257, 293
Monetary policy 430 Mutchler, D. 115, 139
Money transfer telex conversion 410 Muth, J.E 594, 617
Monotonicity 359 Myerson, R.B. 117, 140, 261,293
Monte Carlo 613, 742, 745, 766, 767, 790, 793 Myint-U, T. 617
Monte Carlo algorithms 646 Myopically rational 114
Monte Carlo integral 628 M6tivier 446
Monte Carlo integration 611,644, 671,679, 694, Mtiller, B. 470
707, 713 Mtfller, R 766, 799
Monte Carlo simulation method 691
Moore, J.B. 175, 196, 250
Moore, L.R., III 743, 797 Nagurney, A. 336, 337, 356-358, 369, 372, 374,
Morgenstem, O. 485, 488 401-403
Morin, T.L. 478, 488 Narendra, K.S. 717, 728
Morris, J.G. 402 Nash equilibrium 90, 455, 499
Mortensen, D.T. 584 Nash, J.E 132, 141,404
Morton, H. 437 Nash, S. 173, 205, 206, 251
Morton, K.W. 770, 798 Natural gas 323
818 Index

Natural language processing 418 Nonlinear programs 478

Naughton, B. 82 Nonmonotonic logic 436
Nearest neighbor indexing 431 Nonstochastic pricing 257
Necessary conditions 259 Nordhaus, W.D. 324, 325, 332
Neck, R. 588, 617 Normal distribution 746, 748, 767
Negative correlation 266, 289 Normal form 90
Negishi, T. 10, 83 Norman, A.L. 337, 404, 588, 617
Nelder and Mead optimization algorithm 686 Norman, V.D. 9, 71, 83
Nelder, J. 670, 726 Norton, R.D. 330, 332
Nelder-Mead polytope algorithms 700 Norton, R.G. 329
Nemirovsky, A.S. 625, 626, 642, 648, 677, 717, Novak, E. 625, 687, 690, 709, 726
726 Numeric taxonomy 424
Net benefit 257 Numerical dynamic programming 588
Network optimizer 481 Numerical integration 648, 697
Network problems 478
Neudecker, H. 592, 616
Neural networks (see also approximation Object Linking and Embedding 313
methods, neural networks) 414, 421, Observability 176
425, 436, 445, 559, 622, 625, 690, O'Hara, M. 581
711,719 Oil prices
Neural processing 414, 425 modelling effects of changes in 8
Newton method 273, 283, 289, 366, 376, 491, Oksendal, B. 613, 618
652, 655, 663 Okuguchi, K. 404
Newton-Kantorovich method 652 OLE 313
Newton-relaxation algorithm 270, 288 O'Leary, D.E. 438
Neyman, J. 798 Oligopolistic market equilibria 393
Nicholson, C.F. 403 Olsen, O. 24, 83
Niederreiter, H. 647, 679, 680, 726, 738-740, O'Mara, G.T. 326, 331
799 One way multigrid 627
Nijkamp, P. 403 Oniki, H. 584
Nine-point stencil 271 Ontology 418, 436
Nishimura, K. 581 OPEC 322
Nodes 109 Open loop feedback 607
Nonadaptive information 643 Operating risk 416
Noncomputable 626, 640 Operations research 297, 620
Nonconvex likelihood functions 610 Operators
Nonconvexity 610 Bellman operator 621,635, 652, 653,670,
Nondegenerate 97 674, 677, 691,698, 700, 703, 704,
Nondynamic progranaming solution 598 710, 714, 716, 719
Nonexpansive projection operator 358 random Bellman operator 629, 671,677 679,
Noninitial nodes 109 706
Nonlinear complementarity problem 91, 107 smoothed Bellman operator 714
Nonlinear constrained optimization 267 Optimal assignment 281
Nonlinear equations 375 Optimal bundles 264
Nonlinear functional 652 Optimal consumption and savings 637
Nonlinear Gauss-Seidel method 364, 367, 371 Optimal control 32
Nonlinear Jacobi method 364, 367, 371 Optimal control techniques 157
Nonlinear optimization 267, 646 Optimal decision rule 620, 655
Nonlinear pricing 255, 285, 288, 289 Optimal information 668
Nonlinear programming 624, 629 Optimal replacement of durable assets 638
Index 819

Optimal taxation 255 Parameter vector 606

Optimising assumptions Pardalos, RM. 336, 404
about investors 25, 32-36 Parke, W.R. 144, 149, 169
in CGE models 5, 19, 72 Parmenter, B.R. 8, 9, 16, 19, 30, 32, 33, 36, 51,
in economy-wide econometric models 7 56, 61-63, 68, 76, 77, 79, 81, 82, 327
in illustrative CGE model 39, 47 Parser 418, 436
in Johansen model 6 Parsing 436
power of 8 Partial Differential Equation (PDE) 255, 268,
Optimization 414, 764, 786, 793, 794 288, 603, 614
Optimization problems 355, 376 eliptic PDEs 628
ORANI model 51, 56, 61, 62, 68, 69, 75-79 linear PDEs 647
Ordered set 103 Partial survey 485
Organizational location 410 Participation constraint 257, 265, 286, 289
Ortega, J.M. 270, 271,281,292, 372, 404, 630, Pascal 491
652, 726 Paskov, S.H. 628, 647, 648, 680, 708, 726
Orthogonal collocation 568, 570, 7t6 Passive Learning 604, 607, 608, 610, 613
0rthogonal family 623 Passive Learning or Open Loop Feedback (OLF)
Orthogonal polynomials 548, 624 604
Orthogonal projection 356 Path 99
Orthogonal rotations 279 Path following algorithms 92
Orszag, S,A. 581 Patnaik, L.M. 438
Otani, K. 584 Patterson, D.A. 345, 402
Ozaki, H. 631,726 Pau, L.E 403, 437-439, 616
Pauletto, G. 402
Payoff 112
Pad6 approximation 516, 517 Payoff function 90
Pad6 expansions 528 Pearce, D. 130, 139
Pakes, A. 622, 726 Pearson, K.R. 8, 9, 12, 21, 30, 55, 80, 82, 83
Palaniappan, S. 303, 319, 329 Pension rights 410
Palfrey, T,R. 117, 137, 139, 140, 622, 726 Percentage-change form of CGE models
Pallaschke, D. 141 convenience of using 17, 18
Palmer, R. 470 of CGE models for multi-period models 27,
Pan, V. 651,663, 664, 726 29-32
Pang, J. 107, 140 of CGE models of CES input demand function
Pao, Y.H. 437 19, 20
Papadimitriou, C.H. 106, 140, 726 of CGE models of illustrative model 39~-8
Papadopoulos, RM. 201,251 of CGE models rules for deriving 18, 19
Papert, S.A. 446, 470 Perceptron 445
Pappas, T. 173, 187, 188, 252 Pereira, A.M. 8, 83
Paragon 347 Perfect 630
Parallel architectures 341 Perfect equilibrium 115
Parallel computation of dynamical systems 393 Perfect recall 110
Parallel computation of variational inequality Perfect state information 631
problems 384 Performance measures 352
Parallel Fortran 349 Permanent income economy 210
Parallel programming languages and compilers Permanent income hypothesis 638
348 Personal finances 419
Parallel representations 312 Perturbation 579
Parallel-vector processing 593 Perturbation method 535
Parameter uncertainty 604 Peskun, RH. 799
820 Index

Peter, E. 438 Potter, J.E. 185, 252

Peter, M.W. 60, 83 Powell, A.A. 9, 19, 21, 32, 33, 36, 47, 51, 56,
Petersen, C.E. 404 61, 68, 81, 82, 84
Peterson, A.V. 746, 799 Powell, M.J.D. 151, 169, 727
Petit, M.L. 618 Precedes 109
Petkov, E, Jr. 195, 252 Preckel, EV. 488
Petrochemicals 319 Predetermined variables 59 l
Pflug, G. 446, 470 Predicate 436
Phelps, E. 620, 637, 727 Predictor-corrector scheme 604
Phillips, A.W. 588, 618 Prescott, E.C. 163, 169, 179, 251,285, 293,
Phillips, EC.B. 581,583-585 583-585, 594, 620, 725, 726, 728
Piecewise cubic interpolation 685 Press, W.H. 604, 618, 727, 734, 743, 800
Piecewise polynomial approximation 685 Price discrimination in CGE models 9
Piecewise-linear approximation 282 Price indices 426
Piessens, R. 735, 799 Principal agent 288
Piggott, J. 8, 80, 83, 84 Principal-agent problem 285
Pindyck, R.S. 332, 589, 618 Prisoner's dilemma 452
Pipelines 323 Private information 285, 288
Pipelining 593 Probability estimation 413
Pivot matrix 95 Probes 429
Players 90 Product differentiation in CGE models 9
Plaza, E. 438 Production process 299
Plosser, C. 620, 726 Production rules 435
PM 311 Productive unit 303
Poirier, D. 792, 800 Programming models 6
Poisson equation 268, 269 methods for solving CGE models 9, 10-12
Policy iteration 650, 652, 654, 655, 659, 661, Projection 566, 570, 579
664, 666, 709, 711 Projection methods (see also algorithms,
Policy iteration algorithm 663 projection methods) 369, 371,564,
Policy iteration methods 654 565, 576, 625, 689, 709, 715, 717
Policy iteration with state aggregation 656 Projection of the vector 356
Poison, N.G. 788, 791,798 Prolog 311,421
Polynomial interpolation 277, 282 Proper equilibrium 117
Polynomial series expansions 622 Property valuation 433
Polynomial time 627-629 Pseudo-codes 290
Polynomial-time algorithm 627, 720 Pseudorandom 743
Polynomial time problems (see also Pseudorandom number 646, 742-745
computational complexity) 626, 640, PTS 312
649 Pure relaxation algorithm 277, 288
Polynomials family 623 Pure strategy 90, 113, 645
Polytope algorithm 670 Puterman, M.L. 622, 631,636, 652, 655, 666,
Pontryagin, L.S. 588, 618 727
Poromaa, E 194, 251 Pyatt, G. 484, 488
Porteus, E.L. 622, 652, 725, 727
Portfolio dedication 479
Portfolio immunization 479 Quadratfrei 123
Portfolio optimization problems 376 Quadratfrei part 123
Positive and negative examples 423 Quadratic convergence rates 655
Positive correlation 266 Quadratic cost function 589
Poterba, J. 36, 82 Quadratic programming 266
Index 821

Quadratic programming problems 377 Regression 512, 548, 554

Quadratic-linear tracking problems 596 Regression-based interpolation method 691
Quadrature 734-737, 739, 742, 766, 769 Regular equilibria 130
Quadrature abscissa 674 Regular perturbation methods 515
Quadrature grids 671,674, 700, 702, 706 REH (Rational Expectations Hypothesis) 594,
Qualitative properties of the variational inequality 595, 597, 605, 606
problem 359 Reif, J.H. 114, 120, 125, 139, 651,663, 664, 726
Qualitative simulation 414, 419, 436 Reilly, E.D. 344, 345, 404
Quantified propositional formulas 119 Reinhardt, J. 470
Quinlan, J.R. 437, 438 Reiter, S. 297, 330
Relational operators 477
Relaxation algorithm 273, 279, 281
Rabinowitz, R 676, 721,723, 734, 736, 737, 797 Rellich-Kondrachev theorem 714
Radial symmetries 264 Reny, P.J. 114, 140
Radner, R. 443, 470 Repeated game strategy 452
Ragsdell, K.M. 368, 403 Reported type 285
Ralston, A. 344, 345, 404 Representation
Ramage, J.G. 753, 799 GAMS 313
Ramsey pricing 259 Graphical 312, 317
Ramsey, F.R 588, 618 Residual function 566, 567
Rmnsey, J.B. 7, 82 Restart method 105
Random grid points 676 Resultant 121
Random grids 671 Revealed preference 493, 503
Random multigrid/successive approximations Rewriting rules 435
718 Rheinboldt, W.C. 652, 726
Random uniform grids 702 Riccati equation 184
Randomization (see also algorithms) 628, 646, Rice, J.R. 585, 798
672, 677, 679, 699, 718 Richardson's extrapolation 17, 24, 61, 62
Rassenti, S.J. 323, 331 Richardson, M. 368, 401
Rate of return Richels, R.G. 324, 325, 331
definition of 28 Richter, M.M. 438, 582
in illustrative model 50, 65, 66 Richtmeyer, R.D. 740, 800
Rational expectation iteration 608 Riesbeck, C.K. 437
Rational expectation variables 599 Rietveld, P. 403
Rational expectations 594, 595, 599, 607, 610 Rimmer, R.J. 77, 81
Rational expectations equilibrium 577 Ring network 343
Rational Expectations Hypothesis (REH) 594 Rinooy Kan, A.EG. 724
Ratner, M. 488 Ripley, R.D. 776, 800
Rayleigh-Ritz method 568 Risks 416
Real estate 405 Rivlin, T.J. 585, 727
Real estate appraisal for credit 409 Robert, R. 437
Real estate pricing 409 Roberts, G.O. 777, 783-785, 800
Real number model of computation 641 Roberts, J.D. 190, 200, 252
Real Schur decomposition 192 Roberts, K.W.S. 255, 293
Realization equivalent 113 Roberts, S.M. 36, 84
Realization plan 111 Robinson, S.M. 8, 9, 79, 81, 84, 139, 141
Record 436 Rochet, J.-C. 255, 258, 261,265, 284, 293
Recursive models 24-30, 62-67 Rogers, J. 336, 402, 617, 670, 724
Rees, H. 598, 617 Rogerson, R. 252
Refining industry 322 Rosen, J.B. 109, t41, 336, 404
822 Index

Rosen, S. 173, 252, 257, 293 Scales, L.E 591,618

Rosenblatt, E 454, 470 Scarf, H.E. 6-8, 84, 102, 141,337, 404
Rosenbluth, A.W. 799 Schaeffer, D.G. 582
Rosenbluth, M.N. 799 Schank, R.C. 437
Rosenmiiller, J. 102, 117, 141 Schanuel, S.H. 117, 141
Rossi, P.E. 788, 791,798 Scheinkmau, J.A. 173, 252, 58t
Rotated stencils 288 Schmeiser, B.W. 754, 800
Roth, EH. 438 Schnabel, R.B. 366, 401,488
Round, J.I. 484, 488 Schneider, M.H. 484, 488
Rowse, J. 323, 324, 332 Schnitkey, G. 622, 726
Rubin, D.B. 786, 797 Schrage, L.E. 743, 745, 753, 797
Rubin, H. 47, 82 Schultz, M.H. 585, 668, 669, 727
Rubinstein, A. 445, 463, 470 Schumacher, L.L. 686, 728
Rubinstein, R.Y. 764, 800 Schumaker, L.L. 585
Rui, X. 584 Schur algorithm 192
Rule 417, 436 Schwartz, N.L. 613, 617
Rule of thumb 694 Schweitzer, P.J. 709, 711,728
Runga-Kutta scheme 604 Search 432, 436
Russell, S.J. 437 Second-order conditions 258, 287, 289
Rust, J. 293, 585, 620, 622, 628, 632, 650, 671, Secufitization of mortgage portfolio 433
672, 677, 678, 699, 706, 717-719, 727 Seed objects 424
Rustem, B. 588, 598, 605, 616-618 Seidmann, A. 709, 711,728
Rutherford, T.E 8, 10, 84, 472, 488 Self, M. 437
Rutitskii, Ya.B. 725 Self-approximating 670, 700
Ryzhik, I.M. 734, 798 Self-approximating operator 629
Self-selection 256
Seller's revenue 257
S, computer package 490 Selten, R. 117, 141
Saad, Y. 585, 668, 669, 723, 727 Semantic nets 436
Sachs, J.D. 9, 36, 82, 83 Semi-algebraic set 92, 119
Saddle point property 601 Semmler, W. 630, 728
Saigal, R. 106, 141 Sequence form 111
Salton 437 Sequential equilibrium 109, 112
SAM 484 Sequential, equilibria 117
SAMBAL 484 Set-driven nature 301
Samet, D. 140 Settlement risk 416
Sampaio de Souza, M. 9, 25, 83 Sewell, G. 603, 618
Sample equilibrium 88, 92 Shank, R. 437
Samuelson, P.A. 297, 329, 336, 402, 404, 585, Shannon, C. 256, 262, 266, 283, 293
620, 727 Shape preservation 572
Sandee, J. 6, 84 Shape-preserving approximation 563
Sandell, N.R., Jr. 173, 187, 188, 252 Shape-preserving interpolation 556
Sanderson, W.G. 8, 82 Shapiro, L. 285, 292
Santos, M. 585, 622, 727 Shapley, L. 92, 99, 101, 117, 141
Sargent, T.J. 176, 180, 251,252, 450, 470, 583, Shared memory 343
585, 591,618, 622, 627, 631,722, Sharp, D.H. 339, 401
724, 727 Sharpe, W. 336, 404
Sartore, D. 615 Sheffrin, S.M. 594, 618
SAS 505 Shell 408, 437
Saunders, M.A. 309, 332, 478, 488 Sherali, H.D. 403
Index 823

Shetty, C.M. 366, 401 Smooth approximation of the decision rule 693
Shin, M.C. 727 Snape, R.H. 68, 84
Shipman, J.S. 36, 84 Sobel, J. 117, 139
Shisha, O. 725 Sobel, M.J. 727
Shoemaker, C.A. 583 Sobol points (see also numerical integration)
Shoven, J.B. 6, 8, 9, 52, 69, 80, 83, 84, 584 628, 671,679, 703, 708
Shreve, S.E. 636, 722 Social accounting matrices 484
Siddiqi, S.N. 310, 316, 323, 329 Social choice 285
Sign assignment 120 Social welfare function 286
Sikorski, K. 625, 690, 728 Software engineering 408
SIMD machine 342 Solis, EJ. 670, 728
Simon, H.A. 724, 728 Solis, L. 326, 332
Simon, L.K. 117, 141 Solnick, A. 583, 622, 684, 686, 725
Simple Monte Carlo 757, 758, 769-771 Solow, R.M. 19, 80, 297, 329, 336, 402
Simplex 102, 103 Solution of linear elliptic PDE's 628
i-stopping simplex 103 Solution operator 641
Simplex method (see also algorithms, simplex Solvers 478
algorithm) 627, 648 CONOPT 309, 478
Simplicial subdivision 102 GAMS/CPLIB 478
Simplotope 102 GAMS/GENOS 478
Simply stable sets 116 GAMS/MATBAL 478
Simpson, R.L. 437 GAMS/MINOS 478
Sims, C.A. 217, 250, 252, 584, 725, 727, 728 GAMS/ZOOM 478
Simulated annealling 610, 670, 700 MINOS 478
Sin, K.S. 176, 197, 250 Sonnenschien, H. 82
Singh, S.P. 717, 722 SOR (Successive Overrelaxation Method) 363
Singhal, J. 478, 488 SOR iteration 363
Single grid 701 SOR method 365
Single grid methods 701 Souganides, EE. 582
Single period utility function 632 South Korea 319
Single-crossing property 256, 262, 283 Soyster, A.L. 403
Singleton, K. 583, 724 Sparrow, ET. 401
Siow, A. 252 Sparse resultant 121
Sivau, R. 175, 251,589, 617 Sparsity pattern 650
Skorohod, A.V. 630, 636, 724 Sparsity structure 649
Skykolt, S. 73, 81 Spatial price equilibrium conditions 386
SLCP 107 Spatial price equilibrium problems with ad
Sleeman, D. 438 valorem tariffs 385
Slonim, R. 697, 724 Speedup of the algorithm 353
Slots 435, 437 Sperner-proper 103
SLQCM 613 Spivak, A. 584
Smale, S. 107, 141,337, 404 Spline approximation 685
Small models 327 Spline of order 685
Small, E.J. 21, 82 Splines 554, 555, 624, 719
Smarr, L.L. 345, 403 SPI 347
Smith, A.A., Jr. 622, 722, 728, 776, 793, 796, Splitting equilibration algorithm 380
800 Spreadsheets 312, 416
Smith, A.EM. 777, 783-785, 797, 800 Spreen, T.H. 326, 331
Smith, V.L. 323, 331 SPSS 5O5
Smooth approximation 622, 698 Srikant, R. 585
824 Index

Srinivas, M. 438 Strecok, A.J. 746, 800

Srinivasan, T.N. 8, 68, 83, 84, 620, 726 Streufert, R 631,726
Stabilizability 174 Strong monotonicity 359, 360
Stable manifold theorem 524 Structured modeling 298, 310
Stable set 116 Strunk, W., Jr. 298, 332
Stacchetti, E. 130, 139 Sturmfels, B. 129, 140
Stacked form 598 Stutz, J. 437
Stampacchia, D. 359, 403 Substitution 299
Stampacchia, G. 372, 402 Successive approximations 660
Standard information 643 Successive overrelaxation 652
Standardization 434 Successive Overrelaxation Method (SOR) 363
Stanford university energy modeling forum 323 Successive relaxation 273, 277
State equation 592 Suh, J.S. 319, 332
Stationarity 634 Sulem, A. 602, 603, 613, 618, 630, 728
Stationarity condition 597 Summation product operators 475
Stationary Markov policies 655 Summers, L.H. 25, 36, 82, 161, 169
Stationary point problem 91, 107 Sun workstation 324
Stationary solution 592 Sunspots 545
Steel, M.EJ. 319, 502, 505 Super-modularity 262, 283
Steele, G.L., Jr. 348, 404 Supercomputers with vectorization 375
Stedinger, J.R. 583 Superscalar 593
Stencil 270, 274, 277, 279, 281,288, 289 Supervised learuing 421
Stenger, E 737, 799 Support 120
Step functions 554 Surface features 429
Stepp, R.E. 437 Sutton, J. 8, 16, 30, 51, 68, 81
Stetsenko, V. 725 Sweeney, J.L. 82
Stewart, G.W. 188, 193, 205, 250, 252, 591,618 Sylvester equation 204, 205
Stiglitz, J.E. 583 Symbolic derivative 492
Stimuli 414 Symmetric five-point stencil 271
Stinchcombe, M. 470, 625, 690, 725 Symplectic 185, 188
Stine, R.A. 502, 505 Synchronization and communication 351
Stirling's formula 546 Syntactic pattern recognition 419
Stochastic approximation 446 Syntec 63, 77, 84
Stochastic control 588 System equation 604
Stochastic control experiments 604 Systematic sampling 772, 773
Stochastic dynamic optimization 611 Systems of equations 355
Stochastic LQCM 605 Szidarovsky, E 404
Stochastic models 588 S6derstr{Sm, T. 446, 470
Stochastic parameter estimates 608
Stochastic parameters 588, 604, 609
Stochastic simulation 155, 158, 166 Tables 475
Stocha~stic volatility 788 Takayama, A. 297, 337, 404
Stockey, N.L. 618 Takayama, T. 331,336, 370, 403, 404
Stocks, K.J. 81 Talman, A.J.J. 105-107, 117, 139, 141
Stoer, J. 604, 618 Tambo, T. 438
Stokey, N.L. 585, 620, 728, 793, 794, 800 Tanner, M.A. 776, 800
Stone, R.A. 47, 84 Tapiero, C.S. 402, 602, 603, 613, 618, 630, 728
Stoutjesdijk, A.J. 302, 315, 319, 329, 332 Target control vector 589
Strategic nodes 109 Target machine 352
Strategy set 90 Tariff 255, 260, 281, 289
Index 825

effects of changes in 4 Tiwafi, E 129, 139

analysed in illustrative CGE model 61, 62, 64, Todd, M.J. 106, 141,724
65 Total variation 680
analysed in ORANI 68, 77, 78 Totally mixed 111
Tarski, A. 141 Touma, W.R. 321, 332
Task scheduling 351 Tourism analysed in ORANI 68, 69
Tauchen, G. 674, 702, 711,728, 796, 800 Townsend, R.M. 285, 293
Taxation 521, 533, 534 Tracking equation 590
Taylor expansions 531 Tracking problem 589
Taylor series 14, 21,516, 527, 529 Tractable problems (see also computational
Taylor, J.B. 36, 81, 160, 161, 163-165, 168, 169, complexity, tractable problems) 626,
528, 585, 595, 611,616, 618, 624, 628, 640
717, 788, 800 Training algorithm 425
Taylor, L. 332 Transfer payments 285
Taylor, S. 793, 800 Transition matrix 589
Taylor, W. 437 Transportation model 480
Teacher 421 Transversality condition 32, 260, 268, 269, 272,
Technical analysis 419 277, 278, 288
Technical change 4, 70, 77 Traub, J.F. 624, 641,646, 667, 668, 728
in illustrative CGE model 47, 59, 64-67 Traveling salesman problem 624, 640
quantification of 78 Treadway, A.B. 585
Tehada-Guibert, J.A. 583 Tree induction 422
Teller, A.H. 799 Trend analysis 419
Teller, E. 799 Triangulation 103
Template retrieval 432 Trick, M.A. 648, 663, 728
Temporal causal graph 419 Trippi, R. 438
Temporal logic 419 Truth maintenance 417
Tensor product 557 Tsitsiklis, J.N. 354, 364--366, 368, 369, 372,
Tensor product splines 685 401,624, 626, 627, 631,636, 641,
Tent functions 555, 570 644, 670, 673, 674, 676, 704, 717,
Terminal nodes 109 718, 722, 723, 726, 728
Terminal simplex 104 Tuple (t-uple) 437
Terminology 434 Tucker, A.W. 139, 140
Terminology database 418 Turban, E. 438
Terms of trade 60, 62-64, 66, 77 Turing machine 640
Teukolsky, S.A. 618, 727, 800 Turing model of computation 640, 641
TEX 490 Turnovsky, S.J. 581
Tezuka, S. 745, 800 Turvey, R. 323, 328
Thacher, H.G., Jr. 798 Tustin, A. 588, 618
Theil, H. 618 Two armed bandit 451
Thomas, L.C. 666, 722 2SLAD 150
Thompson, G.L. 307, 332 2SLS 146
Thore, S. 307, 332 Type 255
3SLS 146 Type parameters 289
Tichy, W.E 344, 401 Type linearity 284
Tierney, L. 741,742, 778, 783, 784, 787, 800 T3D 346
Time complexity 354
Time series analysis 502
Time consistent 595 U-resultant 122
Time inconsistent 595 Uberhuber, C.W. 799
826 Index

Uhlig, H. 161, 169, 585, 624, 726, 793, 800 Voigt, R.G. 372, 404, 630, 726
Unbound variable 119 von Neumann, J. 109, 139, 746, 800
Uniform 700 yon Stengel, B. 111, 140, 141
Uniform distribution 742, 743, 745
Uniform grid points 672
Uniform grids 671,700 Waelbroeck, J.L. 10, 81
Uniqueness 360 Wage-tax bargains 56
Univariate normal distributions 748, 753, 754 Walk 446
Unsupervized learning 421,424 Walker, A.J. 746, 800
Uri, N.D. 297, 331 Waren, A.D. 488
Utility functions 10, 24 Wasilkowski, G.W. 624, 641,646, 699, 720, 728
Utility functions in illustrative CGE model 47 Wasserman, E 438
Uzawa, H. 336, 401 Watts, S. 82
Waverman, L. 323, 332
Weber, R.J. 266, 293
Vainikko, G.M. 725 Wei, Gwei-nyu D. 319, 332
Valdivia, V. 582 Weighted residual methods 564
Valencia, J.A. 323, 329 Weights 414, 425
Value function 620 Weil, D.N. 438
Value iteration 653 Weisbuch, G. 445, 470
van de Klundert, T. 8, 81
Welfare optimality 260, 265, 268, 287
van den Bergh, J.C.J.M. 403
Welfare optimality condition 269, 270, 272, 277,
van den Elzen, A.H. 107, 141
278, 288
van der Heyden, L, 106, 141
Welfare optimality on the boundary 287
van der Laan, G. 106, 141
Welfare weight 286
van der Ploeg, E 485, 488
Welsch, J.H. 735, 798
van der Waerden, B.L. 121, 141
Wershculz, A.G. 626, 628, 647, 728
van Dijk, H.K. 761,799
Westphal, L.E. 303, 319, 329, 332
van Dijk, N.M. 630, 728
Wets, R.J. 670, 728
van Dooren, E 193, 194, 252, 737, 800
Van Loan, C. 173, 193, 206, 207, 251,593, 616 Wette, M.R. 251
Van Long, N. 590, 617 Weyant, J.E 323, 324, 329, 331
van Sinderen, J. 8, 81 Whalley, J. 6, 8, 9, 52, 68, 69, 80, 83, 84
van Velden, L.M.T. 613, 616 Wheeler, R.M., Jr. 717, 728
Varadhan, S.R.S. 799 Whinston, A.B. 437, 617
Varian, H. 337, 404, 438, 496, 499, 505, 581, White, E.B. 298, 332
583 White, H. 445, 446, 470, 585, 625, 690, 725
Variance reduction 769 Whitt, W. 622, 729
Variational inequality problem 356 Whittle, E 589, 618, 729
Variational inequality subproblems 371 Wichmann, B.A. 744, 800
Vaughan, D.R. 185, 252, 592, 618 Wilcoxen, EJ. 7, 9, 19, 25, 31-34, 36, 75,
Vavasis, S.A. 106, 140 81-83, 85
Veinott, A.E 722 Wild, P. 750, 791,798, 800
Vetterling, W.T. 618, 727, 800 Wilde, D. 81
Vickers, J. 581 Wilkinson, J.H. 185, 251
Vietorisz, T. 319, 331 Willems, J.C. 251
Vigo, J. 622, 727 Willimns, J.C. 585
Vila, J.-L. 585 Williams, R. 51, 82
Vincent, D.E 8, 16, 30, 51, 68, 81 Williamson, J.G. 8, 82
Virtual benefit 259, 287 Wilson, D.A. 252
Index 827

Wilson, R. 102, 109, 113, 116, 117, 140, 142, Yoon S. 402
255-257, 261,264, 283, 285, 288, 289, Yudin, D.B. 625, 626, 642, 648, 677, 717, 726
Windows 312
Wishart distribution 755 Zabreiko, RE 584, 725
Witzgall, C. 798 Zadrozny, RA. 173, 252, 585
Wold representation 226 Zalai, E. 8, 80
Wolfrmn, S. 282, 293, 505 Zaman, A. 744, 799
Wolpin, K.I, 622, 679, 691,724, 725, 799 Zame, W.R. 117, 139, 141
Wong, R. 585 Zangwill, W.I. 32, 82, 85, 129, 139, 142, 337,
Wong, W.H. 776, 800 402
World Bank 314, 318 Zarkin, G. 697, 724
Worst case 644 Zarnikau, J. 310, 316, 323, 329
Worst case basis 626 Zarrop, M.B. 5-98, 617
Worst case deterministic complexity 643 Zeidler, E. 585
Worst case randomized complexity 645, 646 Zeldes, S.E 585
Wo2niakowski, H. 624, 628, 641,646, 647, 667, Zellner, A. 499, 505, 786, 800
668, 671,679, 699, 728, 729 Zenios, S.A. 301,471,472, 478, 479, 484, 488
Wright, B.D. 585 Zero sum games 101
Wright, R. 252, 670 Zhang, D. 403
Zhang, W.-B. 585
Yamamoto, Y. 117, 142 Ziedler 716
Yang, Z. 117, 141 Zin, S.E. 631,648, 663, 724, 728

