Otanasy
Anders Warne
January 4, 2013
Abstract: YADA (Yet Another DSGE Application) is a Matlab program for Bayesian estimation and
evaluation of Dynamic Stochastic General Equilibrium and vector autoregressive models. This
paper provides the mathematical details for the various functions used by the software. First,
some well-known examples of DSGE models are presented; all of these models are included as
examples in the YADA distribution. YADA supports a number of different algorithms for solving
log-linearized DSGE models. The primary, or workhorse, algorithm is the so-called Anderson-Moore algorithm (AiM), but the approaches of Klein and Sims are also covered. The AiM parser
is used to translate the DSGE model equations into a structural form that the solution algorithms
can make use of. The solution of the DSGE model is expressed as a VAR(1) system that represents
the state equations of the state-space representation. Thereafter, the supported prior distributions,
the state-space representation, and the Kalman filter used to evaluate the log-likelihood are presented. Furthermore, the paper discusses how the posterior mode is computed, including
how the original model parameters can be transformed internally to facilitate posterior mode
estimation. Next, the paper provides some details on the algorithms used for sampling from
the posterior distribution: the random walk Metropolis and slice sampling algorithms. In order to
conduct inference based on the draws from the posterior sampler, tools for evaluating convergence
are considered next. Here we are concerned with both simple graphical tools and formal
tools for single and parallel chains. Different methods for estimating the marginal likelihood
are considered thereafter. Such estimates may be used to evaluate posterior probabilities for
different DSGE models. Various tools for evaluating an estimated DSGE model are provided,
including impulse response functions, forecast error variance decompositions, and historical forecast
error and observed variable decompositions. Forecasting issues, such as the unconditional and
conditional predictive distributions, are examined in the following section. The paper thereafter
considers frequency domain analysis, such as a decomposition of the population spectrum into
shares explained by the underlying structural shocks. Estimation of a VAR model with a prior
on the steady-state parameters is also discussed. The main concerns are prior hyperparameters,
posterior mode estimation, posterior sampling via the Gibbs sampler, and marginal likelihood
calculation (when the full prior is proper), before the topic of forecasting with Bayesian VARs is
considered. Next, the paper turns to the important topic of misspecification and goodness-of-fit
analysis, where the DSGE-VAR framework is considered in some detail. Finally, the paper provides
information about the various types of input that YADA requires and how these inputs should be
prepared.
Remarks: Copyright © 2006–2013 Anders Warne, Monetary Policy Research Division, Directorate General Research,
European Central Bank. I have received valuable comments and suggestions from past and present members of
the NAWM team: Kai Christoffel, Günter Coenen, José Emilio Gumiel, Roland Straub, Michal Andrle (Česká Národní
Banka, IMF), Juha Kilponen (Suomen Pankki), Igor Vetlov (Lietuvos Bankas), and Pascal Jacquinot, as well as our
consultant from Sveriges Riksbank, Malin Adolfson. A special thanks goes to Mattias Villani for his patience when
trying to answer all my questions on Bayesian analysis. I have also benefited greatly from a course given by Frank
Schorfheide at the ECB in November 2005. Moreover, I am grateful to Juan Carlos Martínez-Ovando (Banco de
México) for suggesting the slice sampler. And last but not least, I am grateful to Magnus Jonsson, Stefan Laséen,
Ingvar Strid and David Vestin at Sveriges Riksbank, to Antti Ripatti at Suomen Pankki, and to Tobias Blattner, Boris
Glass, Wildo González, Markus Kirchner, Mathias Trabandt, and Peter Welz for helping me track down a number of
unpleasant bugs and improve the generality of the YADA code. Finally, thanks to Dave Christian for the Kelvin
quote.
Contents
1. Introduction
2. DSGE Models
2.1. The An and Schorfheide Model
2.2. A Small Open Economy DSGE Model: The Lubik and Schorfheide Example
2.3. A Medium-Sized Closed Economy DSGE Model: Smets and Wouters
2.3.1. The Sticky Price and Wage Equations
2.3.2. The Flexible Price and Wage Equations
2.3.3. The Exogenous Variables
2.3.4. The Steady-State Equations
2.3.5. The Measurement Equations
3. Solving a DSGE Model
3.1. The DSGE Model Specification and Solution
3.2. The Klein Approach
3.3. The Sims Approach
3.4. YADA Code
3.4.1. AiMInitialize
3.4.2. AiMSolver
3.4.3. AiMtoStateSpace
3.4.4. KleinSolver
3.4.5. SimsSolver
4. Prior and Posterior Distributions
4.1. Bayes Theorem
4.2. Prior Distributions
4.2.1. Monotonic Functions of Continuous Random Variables
4.2.2. The Gamma and Beta Functions
4.2.3. Gamma, χ², Exponential, Erlang and Weibull Distributions
4.2.4. Inverted Gamma and Inverted Wishart Distributions
4.2.5. Beta, Snedecor (F), and Dirichlet Distributions
4.2.6. Normal and Log-Normal Distributions
4.2.7. Left Truncated Normal Distribution
4.2.8. Uniform Distribution
4.2.9. Student-t and Cauchy Distribution
4.2.10. Logistic Distributions
4.2.11. Gumbel Distribution
4.2.12. Pareto Distribution
4.2.13. Discussion
4.3. Random Number Generators
4.4. YADA Code
4.4.1. logGammaPDF
4.4.2. logInverseGammaPDF
4.4.3. logBetaPDF
4.4.4. logNormalPDF
4.4.5. logLTNormalPDF
4.4.6. logUniformPDF
4.4.7. logStudentTAltPDF
4.4.8. logCauchyPDF
4.4.9. logLogisticPDF
4.4.10. logGumbelPDF
4.4.11. logParetoPDF
4.4.12. PhiFunction
4.4.13. GammaRndFcn
4.4.14. InvGammaRndFcn
4.4.15. BetaRndFcn
4.4.16. NormalRndFcn
4.4.17. LTNormalRndFcn
4.4.18. UniformRndFcn
4.4.19. StudentTAltRndFcn
4.4.20. CauchyRndFcn
4.4.21. LogisticRndFcn
4.4.22. GumbelRndFcn
4.4.23. ParetoRndFcn
5. The Kalman Filter
5.1. The State-Space Representation
5.2. The Kalman Filter Recursion
5.3. Initializing the Kalman Filter
5.4. The Likelihood Function
5.5. Smoothed Projections of the State Variables
5.6. Smoothed and Updated Projections of State Shocks and Measurement Errors
5.7. Multistep Forecasting
5.8. Covariance Properties of the Observed and the State Variables
5.9. Computing Weights on Observations for the State Variables
5.9.1. Weights for the Forecasted State Variable Projections
5.9.2. Weights for the Updated State Variable Projections
5.9.3. Weights for the Smoothed State Variable Projections
5.10. Simulation Smoothing
5.11. Chandrasekhar Recursions
5.12. Square Root Filtering
5.13. Missing Observations
5.14. Diffuse Initialization of the Kalman Filter
5.14.1. Diffuse Kalman Filtering
5.14.2. Diffuse Kalman Smoothing
5.15. A Univariate Approach to the Multivariate Kalman Filter
5.15.1. Univariate Filtering and Smoothing with Standard Initialization
5.15.2. Univariate Filtering and Smoothing with Diffuse Initialization
5.16. Observation Weights for Unobserved Variables under Diffuse Initialization
5.16.1. Weights for the Forecasted State Variables under Diffuse Initialization
5.16.2. Weights for the Updated State Variable Projections under Diffuse Initialization
5.16.3. Weights for the Smoothed State Variable Projections under Diffuse Initialization
5.17. YADA Code
5.17.1. KalmanFilter(Ht)
5.17.2. UnitRootKalmanFilter(Ht)
5.17.3. StateSmoother(Ht)
5.17.4. SquareRootKalmanFilter(Ht)
5.17.5. UnitRootSquareRootKalmanFilter(Ht)
5.17.6. SquareRootSmoother(Ht)
5.17.7. UnivariateKalmanFilter(Ht)
5.17.8. UnitRootUnivariateKalmanFilter(Ht)
5.17.9. UnivariateStateSmoother
5.17.10. KalmanFilterMO(Ht)
5.17.11. UnitRootKalmanFilterMO(Ht)
5.17.12. SquareRootKalmanFilterMO(Ht)
5.17.13. UnitRootSquareRootKalmanFilterMO(Ht)
5.17.14. UnivariateKalmanFilterMO(Ht)
5.17.15. UnitRootUnivariateKalmanFilterMO(Ht)
5.17.16. StateSmootherMO(Ht)
5.17.17. SquareRootSmootherMO(Ht)
5.17.18. UnivariateStateSmootherMO
5.17.19. DiffuseKalmanFilter(MO)(Ht)
5.17.20. DiffuseSquareRootKalmanFilter(MO)(Ht)
5.17.21. DiffuseUnivariateKalmanFilter(MO)(Ht)
5.17.22. DiffuseStateSmoother(MO)(Ht)
5.17.23. DiffuseSquareRootSmoother(MO)(Ht)
5.17.24. DiffuseUnivariateStateSmoother(MO)
5.17.25. DoublingAlgorithmLyapunov
6. Parameter Transformations
6.1. Transformation Functions for the Original Parameters
6.2. The Jacobian Matrix
6.3. YADA Code
6.3.1. ThetaToPhi
6.3.2. PhiToTheta
6.3.3. logJacobian
6.3.4. PartialThetaPartialPhi
7. Computing the Posterior Mode
7.1. Comparing the Posterior Modes
7.2. Checking the Optimum
7.3. A Monte Carlo Based Optimization Routine
7.4. YADA Code
7.4.1. VerifyPriorData
7.4.2. logPosteriorPhiDSGE
7.4.3. logPosteriorThetaDSGE
7.4.4. logLikelihoodDSGE
7.4.5. logPriorDSGE
7.4.6. YADAcsminwel
7.4.7. YADAnewrat
7.4.8. YADAgmhmaxlik
7.4.9. YADAfminunc*
8. Posterior Sampling
8.1. The Random Walk Metropolis Algorithm
8.2. The Slice Sampling Algorithm
8.3. Discussion
8.4. Credible Sets and Confidence Intervals
8.5. YADA Code
8.5.1. NeweyWestCovMat
8.5.2. DSGERWMPosteriorSampling
8.5.3. DSGESlicePosteriorSampling
8.5.4. ExponentialRndFcn
9. Markov Chain Monte Carlo Convergence
9.1. Single Chain Convergence Statistics
9.2. Multiple Chain Convergence Statistics
9.3. YADA Code
9.3.1. CUSUM
9.3.2. SeparatedPartialMeansTest
9.3.3. MultiANOVA
10. Computing the Marginal Likelihood
10.1. The Laplace Approximation
10.2. Modified Harmonic Mean Estimators
10.2.1. Truncated Normal Weighting Function
10.2.2. Truncated Elliptical Weighting Function
10.2.3. Transformed Parameters
10.3. The Chib and Jeliazkov Estimator
In physical science the first essential step in the direction of learning any subject
is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what
you are speaking about, and express it in numbers, you know something about
it; but when you cannot measure it, when you cannot express it in numbers,
your knowledge is of a meagre and unsatisfactory kind; it may be the beginning
of knowledge, but you have scarcely in your thoughts advanced to the state of
Science, whatever the matter may be.
William Thomson, 1st Baron Kelvin, May 3, 1883 (Thomson, 1891, p. 73).
1. Introduction
Among some econometricians and statisticians, the use of Bayesian methods is highly controversial. A dogmatic frequentist may argue that using subjective information through a prior
pollutes the information content of the data by deliberately introducing small sample biases.
Even if the data can be regarded as objective information, however, the choice of model or set
of models to use in an empirical study is subjective and, hence, the use of subjective
information is difficult to avoid; for entertaining discussions about the pros and cons of the
Bayesian and frequentist approaches to statistical analysis, see Efron (1986), Poirier (1988)
with discussions, and Little (2006).
Several arguments for using a Bayesian approach are listed in the introduction of Fernández-Villaverde and Rubio-Ramírez (2004). The first, and perhaps most important, argument listed
there concerns misspecification, namely that Bayesian inference relies on the insight that (all)
models are false. The subjective information that comes from the prior may therefore, to a
certain extent, correct for the effects that misspecification of a model has on the information
content in the data.¹ For some recent ideas about model validation in a Bayesian setting, see
Geweke (2007).
YADA is a Matlab program for Bayesian estimation of and inference in Dynamic Stochastic
General Equilibrium (DSGE) and Vector Autoregressive (VAR) models. DSGE models are microfounded, optimization-based models that have become very popular in macroeconomics over the
past 25 years. The most recent generation of DSGE models is not just attractive from a theoretical perspective but also shows great promise in areas such as forecasting and quantitative
policy analysis; see, e.g., Adolfson, Laséen, Lindé, and Villani (2007b), Christiano, Eichenbaum,
and Evans (2005), Smets and Wouters (2003, 2005, 2007), and An and Schorfheide (2007).
For a historical overview, the reader may consult, e.g., Galí and Gertler (2007) and Mankiw
(2006).
The software has been developed in connection with the New Area-Wide Model (NAWM) project
at the ECB; cf. Christoffel, Coenen, and Warne (2008). Detailed descriptions of how the
software has been coded and how functionality can be added to it are given in the document
Extending YADA, which is included in the YADA distribution; cf. Warne (2012).
YADA takes advantage of code made available to the NAWM project by colleagues at both
central bank institutions and the academic world. In particular, it relies to some extent on the
code written by the group of researchers at Sveriges Riksbank who developed the Riksbank
DSGE model (Ramses). This group includes Malin Adolfson, Stefan Laséen, Jesper Lindé, and
Mattias Villani.
A Matlab version of the Anderson-Moore algorithm for solving linear rational expectations
models (AiM) and writing them in state-space form is used by YADA; see, e.g., Anderson and
Moore (1985), Anderson (1999, 2008, 2010), or Zagaglia (2005). Since only linearized DSGE
models can be parsed with and solved through AiM, models based on higher order approximations are not supported; see Fernández-Villaverde and Rubio-Ramírez (2005). In addition to
the AiM algorithm, the QZ-decomposition (generalized Schur form) based algorithms of Klein
(2000) and Sims (2002) are also supported. Although the Klein algorithm based model solver
in YADA does not directly use his solab code, I have most definitely taken a peek into it.²
Moreover, YADA includes csminwel and gensys, developed by Christopher Sims, for numerical
optimization and solving linear rational expectations models, respectively, as well as code from
Stixbox by Anders Holtsberg, from the Lightspeed Toolbox by Tom Minka, from Dynare by Michel
Juillard and Stéphane Adjemian, and from the Kernel Density Estimation Toolbox by Christian
Beardah.³

¹ See also Fernández-Villaverde (2010) for further discussions on the advantages and disadvantages of Bayesian
inference.
In contrast with other software that can estimate DSGE models, YADA has a Graphical User
Interface (GUI) from which all actions and settings are controlled. The current document does
not give much information about the GUI. Instead, it primarily focuses on the mathematical
details of the functions needed to calculate, for instance, the log-likelihood function. Whenever this document refers to the GUI, it does so in connection with functions that need certain
data from the GUI. The help file in the YADA distribution covers the GUI functionality.
This document is structured as follows. In the next section, we present three log-linearized
DSGE models from the literature. When we have established the general form of these models,
Section 3 provides an overview of the matrix representation of linear rational expectations
models and how these can be analysed with the Anderson-Moore algorithm. In addition, the
Klein (2000) and Sims (2002) algorithms for solving a DSGE model are discussed, focusing on
how the AiM form can be rewritten into a form compatible with these algorithms.
Once we know how to solve the models, the issue of estimation can be addressed. The
starting point for Bayesian estimation is Bayes' theorem, which gives the relationship between
the prior density, the conditional and the marginal density of the data, and the posterior density,
and which also fixes the notation for these objects. We then present the density functions that can be used in YADA for
the prior distribution of the DSGE model parameters. Parametric definitions of the densities are
provided and some of their properties are stated. This leads us into the actual calculation of the
likelihood function via the Kalman filter in Section 5. In addition, this section is concerned with
smooth estimation of unobserved variables, such as the structural shocks, the computation of
weights on the observed variables for estimates of unobservables, simulation smoothing, square
root filtering for handling numerical issues, how to deal with missing observations, univariate
filtering for speed concerns, and diffuse initialization.
Since some of the parameters can have bounded support, e.g., the standard deviation
of a shock only takes positive values, the optimization problem for posterior mode estimation
typically involves inequality restrictions. Having to take such restrictions into account may slow
down the optimization considerably. A natural way to avoid this issue is to transform
the original parameters for estimation such that the domain of the transformed parameters is
the real line. In this way we shift from a constrained optimization problem to an unconstrained
one. The specific transformations that YADA can apply are discussed in Section 6, and how these
transformations affect the estimation of the posterior mode is thereafter covered in Section 7.
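As an illustration of the idea (not of YADA's own code), the sketch below maps a positive parameter to the real line with a log transformation and a parameter on an interval $(a, b)$ with a logit transformation. It also returns the log Jacobian $\ln|d\theta/d\phi|$ that has to be added when a density for $\theta$ is expressed in terms of the transformed parameter $\phi$. The function names are hypothetical, and the transformations YADA actually applies are those described in Section 6.

```python
import math

# Hypothetical helpers, not YADA functions: a log map for a lower bound a
# and a logit map for an interval (a, b).

def to_real_lower(theta, a=0.0):
    """Map theta in (a, inf) to phi on the real line."""
    return math.log(theta - a)

def from_real_lower(phi, a=0.0):
    """Inverse of to_real_lower: theta = a + exp(phi)."""
    return a + math.exp(phi)

def to_real_interval(theta, a=0.0, b=1.0):
    """Map theta in (a, b) to phi on the real line (logit)."""
    return math.log((theta - a) / (b - theta))

def from_real_interval(phi, a=0.0, b=1.0):
    """Inverse of to_real_interval."""
    return a + (b - a) / (1.0 + math.exp(-phi))

def log_jacobian_lower(phi):
    """log |d theta / d phi| for theta = a + exp(phi)."""
    return phi

def log_jacobian_interval(phi, a=0.0, b=1.0):
    """log |d theta / d phi| for the inverse logit map."""
    s = 1.0 / (1.0 + math.exp(-phi))
    return math.log((b - a) * s * (1.0 - s))

# Example: a shock standard deviation and an AR(1) persistence parameter
phi_sigma = to_real_lower(0.008)
phi_rho = to_real_interval(0.95)
```

With these maps, the log posterior kernel of $\phi$ equals the log posterior kernel of $\theta(\phi)$ plus the log Jacobian, so an unconstrained optimizer or sampler can work on $\phi$ directly.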
Once the posterior mode and the inverse Hessian at the mode have been calculated, we can
construct draws from the posterior distribution using a Markov chain Monte Carlo (MCMC)
sampler, such as the random walk Metropolis algorithm. In fact, it may be argued that the
parameter transformations discussed in Section 6 are more important for posterior sampling
than for posterior mode estimation. Specifically, when a Gaussian proposal density is used
for the random walk Metropolis algorithm, the proposal draws have the real line as support
and, hence, all draws for the transformed parameters are valid candidates. In contrast, when
drawing from a Gaussian proposal density for the original parameters it is highly likely that
many draws need to be discarded directly since the bounds may often be violated. Moreover,
we often have reason to expect the Gaussian proposal density to be a better approximation of
the posterior density for parameters that are unbounded than for parameters whose domain
is bounded. As a consequence, YADA always performs posterior sampling for the transformed
parameters, while the user can select between the original and the transformed parameters for
posterior mode estimation. Posterior sampling with the random walk Metropolis algorithm as
well as the slice sampler is discussed in Section 8. The importance sampler, used by, for instance,
DeJong, Ingram, and Whiteman (2000), is not supported by YADA.
2
Paul Klein's homepage can be found through the link: http://www.paulklein.se/. A copy of solab for Matlab,
Gauss, and Fortran can be obtained from there.
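The sketch below illustrates why sampling in the transformed space is convenient: a scalar random walk Metropolis sampler with a Gaussian proposal on the real line never produces an inadmissible candidate. It is a minimal illustration under stated assumptions, not YADA's sampler. The proposal scale is a fixed, hypothetical number rather than being derived from the inverse Hessian at the posterior mode, and the target is a made-up Gamma(3, 1) kernel for a positive parameter $\theta$, sampled through $\phi = \ln\theta$.

```python
import math
import random

def rw_metropolis(log_post, phi0, scale, n_draws, seed=0):
    """Scalar random walk Metropolis sampler for a parameter phi on the real line.

    log_post must be the log posterior kernel of phi (including the log Jacobian).
    Returns the list of draws and the acceptance rate.
    """
    rng = random.Random(seed)
    phi, lp = phi0, log_post(phi0)
    draws, accepted = [], 0
    for _ in range(n_draws):
        cand = phi + scale * rng.gauss(0.0, 1.0)  # every candidate is admissible
        lp_cand = log_post(cand)
        # Symmetric proposal: accept with probability min(1, exp(lp_cand - lp))
        if math.log(1.0 - rng.random()) < lp_cand - lp:
            phi, lp = cand, lp_cand
            accepted += 1
        draws.append(phi)
    return draws, accepted / n_draws

# Hypothetical target: theta > 0 with kernel theta^2 * exp(-theta), i.e. Gamma(3, 1).
# With phi = ln(theta) the log Jacobian phi is added: 2*phi - exp(phi) + phi.
def log_post_phi(phi):
    return 3.0 * phi - math.exp(phi)

draws, acc_rate = rw_metropolis(log_post_phi, phi0=0.0, scale=1.0, n_draws=20000, seed=1)
thetas = [math.exp(p) for p in draws]  # back-transformed draws are always positive
```

Had the sampler instead proposed $\theta$ directly from a Gaussian, every negative candidate would have to be rejected outright, which is the inefficiency described above.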
Given a sample of draws from the posterior distribution, it is important to address the question
whether the posterior sampler has converged or not. In Section 9 we deal with simple but effective
graphical tools as well as formal statistical tools for assessing convergence in a single Markov
chain and in parallel chains. When we are satisfied that our posterior sampler has converged, we
may turn to other issues regarding Bayesian inference. In Section 10 we examine the problem
of computing the marginal likelihood of the DSGE model. This object can be used for comparisons
across DSGE models as long as the same data are covered, but also for comparisons with
alternative models, such as Bayesian VARs and DSGE-VARs.
In Section 11 we turn to various tools for analysing the properties of a DSGE model. These
tools include impulse response functions, forecast error variance decompositions, conditional
correlations, correlation decompositions, observed variable decompositions, and ways of addressing identification concerns. Thereafter, out-of-sample forecasting issues are discussed in
Section 12. Both unconditional and conditional forecasting are considered as well as a means
for checking if the conditional forecasts are subject to the famous Lucas (1976) critique or not.
The following topic concerns frequency domain properties of the DSGE model and of the VAR
model and this is covered in Section 13. For example, the population spectrum of the DSGE
model can be decomposed at each frequency into the shares explained by the underlying economic shocks. Furthermore, Fisher's information matrix can be computed via the frequency
domain using only values for the parameters of the model as input. Provided that we regard
identification as a concern which is directly related to the rank of this matrix, identification
may therefore be studied in some detail at a very early stage of the analysis, i.e., without the
necessity of having access to parameter estimates based on data.
The next topic is Bayesian VARs. In particular, YADA supports VAR models for forecasting
purposes. The types of prior that may be used, computation of posterior mode, posterior sampling with the Gibbs sampler, the computation of the marginal likelihood, and forecasting with
such models are all given some attention in Section 14. One specific feature of the Bayesian
VAR models that YADA supports is that the steady state parameters are modelled explicitly.
Next, an important aspect of Bayesian analysis is that it does not rely on the assumption that
the model is correctly specified. The so-called DSGE-VAR approach, advocated in a series of
articles by Del Negro and Schorfheide (2004, 2006, 2009) and Del Negro, Schorfheide, Smets,
and Wouters (2007), has been suggested as a tool for measuring the degree of misspecification
of a DSGE model by approximating it by a VAR; see also An and Schorfheide (2007). This
approach may also be used as a measure of fit of the DSGE model, and can be used to
compute the posterior distribution of the DSGE model parameters when viewed through the
lens of the VAR. The setup of DSGE-VARs and their estimation are discussed in some detail in
Section 15, while Section 16 turns to the prior and posterior analyses that can be performed
through a DSGE-VAR.
Finally, the issue of setting up the DSGE model, VAR, and DSGE-VAR input for YADA is discussed in Section 17. This involves writing the model equations in an AiM model file, specifying
a prior for the parameters to estimate, having appropriate parameter functions for parameters
that are either calibrated or which are functions of the estimated parameters, the construction
of the file for the measurement equations that link the model variables to the observed and the
exogenous variables, and finally the file that reads observed data into YADA. Since this paper
concerns the implementation of various mathematical issues in a computer program, most sections end with a part that discusses some details of the main functions that are made use of by
YADA.
2. DSGE Models
Before we turn our attention to issues related to the estimation of DSGE models it is only natural
to first have a look at a few examples and how DSGE models can be solved. In this section I will
therefore provide a bird's eye view of some well-known DSGE models, which happen to share the
unfortunate fate of being directly supported by YADA. That is, the YADA distribution contains
all the necessary files for estimating them. The problem of solving DSGE models is discussed
thereafter in Section 3.
A natural point of entry is the benchmark monetary policy analysis model studied in An and
Schorfheide (2007). This model is only briefly presented, as it is close to the simplest possible
example of a DSGE model. An open economy version of this model from Lubik and Schorfheide
(2007a) is thereafter considered before the well-known Smets and Wouters (2007) model is
examined in more detail. The problem of solving a DSGE model will thereafter be addressed in
Section 3.
2.1. The An and Schorfheide Model
The model economy consists of a final goods producing firm, a continuum of intermediate
goods producing firms, a representative household, and a monetary and a fiscal authority. As
pointed out by An and Schorfheide (2007), this model has become a benchmark specification for
monetary policy analysis; for a detailed derivation see, e.g., King (2000) or Woodford (2003).
Let $\hat{x}_t = \ln(x_t/x)$ denote the natural logarithm of some variable $x_t$ relative to its steady-state
value $x$. The log-linearized version of the An and Schorfheide (2007) model we shall consider
has 6 equations describing the behavior of (detrended) output ($\hat{y}_t$), (detrended) consumption
($\hat{c}_t$), (detrended) government spending ($\hat{g}_t$), (detrended) technology ($\hat{z}_t$), inflation ($\hat{\pi}_t$), and a
short-term nominal interest rate ($\hat{R}_t$). The equations are given by
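$$
\begin{aligned}
\hat{c}_t &= E_t\hat{c}_{t+1} - \frac{1}{\tau}\left(\hat{R}_t - E_t\hat{\pi}_{t+1} - E_t\hat{z}_{t+1}\right),\\
\hat{\pi}_t &= \beta E_t\hat{\pi}_{t+1} + \kappa\left(\hat{y}_t - \hat{g}_t\right),\\
\hat{y}_t &= \hat{c}_t + \hat{g}_t,\\
\hat{R}_t &= \rho_R\hat{R}_{t-1} + (1-\rho_R)\psi_1\hat{\pi}_t + (1-\rho_R)\psi_2\left(\hat{y}_t - \hat{g}_t\right) + \sigma_R\varepsilon_{R,t},\\
\hat{g}_t &= \rho_G\hat{g}_{t-1} + \sigma_G\varepsilon_{G,t},\\
\hat{z}_t &= \rho_Z\hat{z}_{t-1} + \sigma_Z\varepsilon_{Z,t}.
\end{aligned}
\qquad (2.1)
$$
The shocks $\varepsilon_{i,t} \sim \text{iid } N(0,1)$ for $i \in \{R, G, Z\}$ are called the monetary policy or interest rate
shock, the government spending shock, and the technology shock.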
The first equation is the log-linearized consumption Euler equation, where $\tau$ is the inverse
of the intertemporal elasticity of substitution. The second equation is the log-linearized price
Phillips curve, with $\beta$ being the discount factor and $\kappa$ a function of the inverse of the elasticity
of demand ($\nu$), steady-state inflation ($\pi$), and the degree of price stickiness ($\phi$) according to
$$\kappa = \frac{\tau(1-\nu)}{\nu\pi^2\phi}.$$
The third equation is the log-linearized aggregate resource constraint, while the fourth is the
monetary policy rule with $\hat{y}_t - \hat{g}_t$ being equal to the output gap. This output gap measure
is a flexible price measure, i.e., based on $\phi = 0$, with potential output thus being equal to
government spending. Accordingly, the output gap is equal to consumption in this version of
the model. The final two equations in (2.1) are AR(1) processes for the exogenous government
spending and technology variables.
The steady state for this model is given by $r = \gamma/\beta$, $R = r\pi^*$, $\pi = \pi^*$, $y = g(1-\nu)^{1/\tau}$, and
$c = (1-\nu)^{1/\tau}$. The parameter $\pi^*$ is the steady-state inflation target of the central bank, while $\gamma$
is the steady-state growth rate.
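The measurement equations linking the data on quarter-to-quarter per capita GDP growth
($\Delta y_t$), annualized quarter-to-quarter inflation rates ($\pi_t$), and annualized nominal interest rates
($R_t$) to the model variables are given by
$$
\begin{aligned}
\Delta y_t &= \gamma^{(Q)} + 100\left(\hat{y}_t - \hat{y}_{t-1} + \hat{z}_t\right),\\
\pi_t &= \pi^{(A)} + 400\hat{\pi}_t,\\
R_t &= \pi^{(A)} + r^{(A)} + 4\gamma^{(Q)} + 400\hat{R}_t.
\end{aligned}
\qquad (2.2)
$$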
Additional parameter definitions are:
$$\gamma = 1 + \frac{\gamma^{(Q)}}{100}, \qquad \pi = 1 + \frac{\pi^{(A)}}{400}, \qquad \beta = \frac{1}{1 + r^{(A)}/400}, \qquad (2.3)$$
where only the parameter $\beta$ is of real interest since it appears in (2.1). In view of the expression
for $\kappa$, it follows that the $\nu$ and $\phi$ parameters cannot be identified in the log-linearized version
of the model. The parameters to estimate for this model are therefore given by
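$$\theta = \left(\tau,\ \kappa,\ \psi_1,\ \psi_2,\ \rho_R,\ \rho_G,\ \rho_Z,\ r^{(A)},\ \pi^{(A)},\ \gamma^{(Q)},\ \sigma_R,\ \sigma_G,\ \sigma_Z\right). \qquad (2.4)$$
When simulating data with the model, the value of $\theta$ was given by:
$$\theta = \left(2.00,\ 0.15,\ 1.50,\ 1.00,\ 0.60,\ 0.95,\ 0.65,\ 0.40,\ 4.00,\ 0.50,\ 0.002,\ 0.008,\ 0.0045\right).$$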
These values are identical to those reported by An and Schorfheide (2007, Table 2) for their
data generating process.
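To make the mapping between the estimated steady-state parameters and the model concrete, the sketch below computes $\gamma$, $\beta$, and $\pi$ from (2.3), the steady-state real and nominal rates $r = \gamma/\beta$ and $R = r\pi^*$ (with $\pi^* = \pi$), and evaluates the measurement equations (2.2) for given model variables. The function names are hypothetical; the numbers used below are the data generating process values of $r^{(A)}$, $\pi^{(A)}$, and $\gamma^{(Q)}$ listed above.

```python
def as_steady_state(r_A, pi_A, gamma_Q):
    """Parameters (2.3) and the steady-state real and nominal interest rates."""
    gamma = 1.0 + gamma_Q / 100.0      # gross quarterly growth rate
    pi = 1.0 + pi_A / 400.0            # gross quarterly inflation target
    beta = 1.0 / (1.0 + r_A / 400.0)   # discount factor
    r = gamma / beta                   # steady-state real rate
    R = r * pi                         # steady-state nominal rate
    return {"gamma": gamma, "beta": beta, "pi": pi, "r": r, "R": R}

def as_observables(y_hat, y_hat_lag, pi_hat, R_hat, z_hat, r_A, pi_A, gamma_Q):
    """Measurement equations (2.2): model variables to observed data."""
    dy = gamma_Q + 100.0 * (y_hat - y_hat_lag + z_hat)
    infl = pi_A + 400.0 * pi_hat
    rate = pi_A + r_A + 4.0 * gamma_Q + 400.0 * R_hat
    return dy, infl, rate

ss = as_steady_state(r_A=0.40, pi_A=4.00, gamma_Q=0.50)
# At the steady state (all hatted variables zero) the observables equal their means
obs = as_observables(0.0, 0.0, 0.0, 0.0, 0.0, r_A=0.40, pi_A=4.00, gamma_Q=0.50)
```

At the steady state the observed interest rate equals $\pi^{(A)} + r^{(A)} + 4\gamma^{(Q)}$, i.e., 6.4 percent for these values.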
2.2. A Small Open Economy DSGE Model: The Lubik and Schorfheide Example
As an extension of the closed economy An and Schorfheide model example, YADA also comes
with a small open economy DSGE model which has been investigated by, e.g., Lubik and
Schorfheide (2007a) and Lees, Matheson, and Smith (2011) using actual data. The YADA
example is based on data simulated with the DSGE model.
The model is a simplification of Galí and Monacelli (2005) and, like its closed economy
counterpart, consists of a forward-looking IS-equation and a Phillips curve. Monetary policy
is also given by a Taylor-type interest rate rule, where the exchange rate is introduced via the
definition of consumer prices and under the assumption of PPP. In log-linearized form the model
can be expressed as
1
t Et t1 Et zt1 Et
Et y
t1 R
qt1 1
t1
,
y
t Et y
y
t xt ,
qt1
qt
t Et t1 Et
(2.5)
t1 1 R t y y
t R R
t e
et R R,t ,
R
qt t ,
et t 1
where 1/1 r A /400, 2 1 , is the intertemporal elasticity of
substitution, and 0 < < 1 is the import share. The closed economy version of the model is
obtained when 0.
Variables denoted with an asterisk superscript are foreign, the nominal exchange rate is given
by $e$, terms-of-trade (defined as the relative price of exports in terms of imports) by $q$, while
potential output in the absence of nominal rigidities, $x_t$, is determined by the equation:
$$x_t = -\alpha(2-\alpha)\frac{1-\tau}{\tau}\hat{y}_t^*.$$
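To close the model, the remaining 4 variables are assumed to be exogenous and determined by:
$$
\begin{aligned}
\Delta\hat{q}_t &= \rho_Q\Delta\hat{q}_{t-1} + \sigma_Q\varepsilon_{Q,t},\\
\hat{y}_t^* &= \rho_{Y^*}\hat{y}_{t-1}^* + \sigma_{Y^*}\varepsilon_{Y^*,t},\\
\hat{\pi}_t^* &= \rho_{\pi^*}\hat{\pi}_{t-1}^* + \sigma_{\pi^*}\varepsilon_{\pi^*,t},\\
\hat{z}_t &= \rho_Z\hat{z}_{t-1} + \sigma_Z\varepsilon_{Z,t},
\end{aligned}
\qquad (2.6)
$$
where $\varepsilon_{i,t} \sim N(0,1)$ for $i \in \{R, Q, Y^*, \pi^*, Z\}$. When the model is taken literally, the terms-of-trade
variable is endogenously determined by the equation: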
t y
t .
qt y
However, Lubik and Schorfheide note that such a specification leads to numerical problems
when estimating the posterior mode, and to implausible parameter estimates and low likelihood
values when a mode is located.
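The measurement equations for this model are:
$$
\begin{aligned}
\Delta y_t &= \gamma^{(Q)} + \Delta\hat{y}_t + \hat{z}_t,\\
\pi_t &= \pi^{(A)} + \hat{\pi}_t,\\
R_t &= \pi^{(A)} + r^{(A)} + 4\gamma^{(Q)} + \hat{R}_t,\\
\Delta e_t &= \Delta\hat{e}_t,\\
\Delta q_t &= \Delta\hat{q}_t.
\end{aligned}
\qquad (2.7)
$$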
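The model therefore has a total of 19 unknown parameters, collected into
$$\theta = \left(\psi_\pi,\ \psi_y,\ \psi_e,\ \rho_R,\ \alpha,\ r^{(A)},\ \kappa,\ \tau,\ \rho_Q,\ \rho_Z,\ \rho_{Y^*},\ \rho_{\pi^*},\ \pi^{(A)},\ \gamma^{(Q)},\ \sigma_Q,\ \sigma_Z,\ \sigma_R,\ \sigma_{Y^*},\ \sigma_{\pi^*}\right).$$
When simulating data using this model, the value for $\theta$ was given by:
$$\theta = \left(1.30,\ 0.23,\ 0.14,\ 0.69,\ 0.11,\ 0.51,\ 0.32,\ 0.31,\ 0.31,\ 0.42,\ 0.97,\ 0.46,\ 1.95,\ 0.55,\ 1.25,\ 0.84,\ 0.36,\ 1.29,\ 2.00\right).$$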
With the exception of the parameters that reflect the steady-state values of the observables,
the values are equal to the benchmark estimates for Canada in Lubik and Schorfheide (2007a,
Table 3).
2.3. A Medium-Sized Closed Economy DSGE Model: Smets and Wouters
A well known example of a medium-sized DSGE model is Smets and Wouters (2007), where the
authors study shocks and frictions in US business cycles. Like the two examples discussed above,
the Smets and Wouters model is also provided as an example with the YADA distribution. The
equations of the model are presented below, while a detailed discussion of the model is found in
Smets and Wouters (2007); see also Smets and Wouters (2003, 2005). It should be emphasized
that since the model uses a flexible-price based output gap measure in the monetary policy rule,
the discussion will first consider the sticky price and wage system, followed by the flexible price
and wage system. The equations for the 7 exogenous variables are introduced thereafter, while
the steady-state of the system closes the theoretical part of the empirical model. Finally, the
model variables are linked to the observed variables via the measurement equations.
2.3.1. The Sticky Price and Wage Equations
The log-linearized aggregate resource constraint of this closed economy model is given by
$$\hat{y}_t = c_y\hat{c}_t + i_y\hat{\imath}_t + z_y\hat{z}_t + \varepsilon_t^g, \qquad (2.8)$$
where $\hat{y}_t$ is (detrended) real GDP. It is absorbed by real private consumption ($\hat{c}_t$), real private
where $k_y$ is the steady-state capital-output ratio, $\gamma$ is the steady-state growth rate, and $\delta$ is the
depreciation rate of capital. Finally,
$$z_y = r^k k_y,$$
where $r^k$ is the steady-state rental rate of capital. The steady-state parameters are shown in
Section 2.3.4, but it is noteworthy already at this stage that $z_y = \alpha$, the share of capital in
production.
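The dynamics of consumption follows from the consumption Euler equation and is equal to
$$\hat{c}_t = c_1\hat{c}_{t-1} + (1-c_1)E_t\hat{c}_{t+1} + c_2\left(\hat{l}_t - E_t\hat{l}_{t+1}\right) - c_3\left(\hat{r}_t - E_t\hat{\pi}_{t+1}\right) + \varepsilon_t^b, \qquad (2.9)$$
where $\hat{l}_t$ is hours worked, $\hat{r}_t$ is the policy controlled nominal interest rate, and $\varepsilon_t^b$ is proportional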
to the exogenous risk premium, i.e., a wedge between the interest rate controlled by the central
bank and the return on assets held by households. It should be noted that in contrast to Smets
and Wouters (2007), but identical to Smets and Wouters (2005), I have moved the risk premium
variable outside the expression for the ex ante real interest rate. This means that $\varepsilon_t^b = -c_3\tilde{\varepsilon}_t^b$,
where $\tilde{\varepsilon}_t^b$ is the risk premium variable in Smets and Wouters (2007), while $\varepsilon_t^b$ is referred to as
a preference variable that affects the discount rate determining the intertemporal substitution
decisions of households in Smets and Wouters (2005). I have chosen to consider the expression
in (2.9) since it is also used in the Dynare code that can be downloaded from the American
Economic Review web site in connection with the 2007 article.4
The parameters of the consumption Euler equation are:
$$c_1 = \frac{\lambda/\gamma}{1+\lambda/\gamma}, \qquad c_2 = \frac{(\sigma_c-1)\,w^hl/c}{\sigma_c\left(1+\lambda/\gamma\right)}, \qquad c_3 = \frac{1-\lambda/\gamma}{\sigma_c\left(1+\lambda/\gamma\right)},$$
where $\lambda$ measures external habit formation, $\sigma_c$ is the inverse of the elasticity of intertemporal
substitution for constant labor, while $w^hl/c$ is the steady-state hourly real wage bill to consumption ratio. If $\sigma_c = 1$ (log-utility) and $\lambda = 0$ (no external habit), then the above equation reduces
to the familiar purely forward-looking consumption Euler equation.
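A small sketch of the coefficient formulas, which also checks the special case just mentioned. The function name is hypothetical and the numbers in the usage line are illustrative only.

```python
def euler_coefficients(lam, gamma, sigma_c, wh_l_over_c):
    """Coefficients c1, c2, c3 of the consumption Euler equation (2.9)."""
    h = lam / gamma                      # habit parameter adjusted for trend growth
    c1 = h / (1.0 + h)
    c2 = (sigma_c - 1.0) * wh_l_over_c / (sigma_c * (1.0 + h))
    c3 = (1.0 - h) / (sigma_c * (1.0 + h))
    return c1, c2, c3

# Log utility and no habit: c1 = c2 = 0 and c3 = 1, the purely forward-looking case
print(euler_coefficients(0.0, 1.004, 1.0, 0.7))  # prints (0.0, 0.0, 1.0)
```

Note that $c_2$ vanishes whenever $\sigma_c = 1$, since hours then drop out of the Euler equation regardless of the degree of habit formation.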
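The log-linearized investment Euler equation is given by
$$\hat{\imath}_t = i_1\hat{\imath}_{t-1} + (1-i_1)E_t\hat{\imath}_{t+1} + i_2\hat{q}_t + \varepsilon_t^i, \qquad (2.10)$$
where $\hat{q}_t$ is the real value of the existing capital stock, while $\varepsilon_t^i$ is an exogenous investment-specific technology variable. The parameters of (2.10) are given by
$$i_1 = \frac{1}{1+\beta\gamma^{1-\sigma_c}}, \qquad i_2 = \frac{1}{\left(1+\beta\gamma^{1-\sigma_c}\right)\gamma^2\varphi},$$
where $\beta$ is the discount factor used by households, and $\varphi$ is the steady-state elasticity of the
capital adjustment cost function.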
The dynamic equation for the value of the capital stock is
$$\hat{q}_t = q_1E_t\hat{q}_{t+1} + (1-q_1)E_t\hat{r}_{t+1}^k - \left(\hat{r}_t - E_t\hat{\pi}_{t+1}\right) + c_3^{-1}\varepsilon_t^b, \qquad (2.11)$$
where $\hat{r}_t^k$ is the rental rate of capital. The parameter $q_1$ is here given by
$$q_1 = \beta\gamma^{-\sigma_c}(1-\delta) = \frac{1-\delta}{r^k + 1 - \delta}.$$
Turning to the supply-side of the economy, the log-linearized aggregate production function
can be expressed as
$$\hat{y}_t = \phi_p\left(\alpha\hat{k}_t^s + (1-\alpha)\hat{l}_t + \varepsilon_t^a\right), \qquad (2.12)$$
4
The links to the code and the data as well as the Appendix of Smets and Wouters (2007) can be found next to the
electronic version of the paper.
The capital services variable is used to reflect that newly installed capital only becomes effective with a one period lag. This means that
$$\hat{k}_t^s = \hat{k}_{t-1} + \hat{z}_t, \qquad (2.13)$$
where $\hat{k}_t$ is the installed capital. The degree of capital utilization is determined from cost
minimization of the households that provide capital services and is therefore a positive function
of the rental rate of capital. Specifically,
$$\hat{z}_t = z_1\hat{r}_t^k, \qquad (2.14)$$
where
$$z_1 = \frac{1-\psi}{\psi},$$
and $\psi$ is a positive function of the elasticity of the capital adjustment cost function and normalized to be between 0 and 1. The larger $\psi$ is, the costlier it is to change the utilization of
capital.
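The log-linearized equation that specifies the development of installed capital is
$$\hat{k}_t = k_1\hat{k}_{t-1} + (1-k_1)\hat{\imath}_t + k_2\varepsilon_t^i, \qquad (2.15)$$
where
$$k_1 = \frac{1-\delta}{\gamma}, \qquad k_2 = \left(1 - \frac{1-\delta}{\gamma}\right)\left(1+\beta\gamma^{1-\sigma_c}\right)\gamma^2\varphi.$$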
p
t k
t
where the real wage is given by $\hat{w}_t$. Similarly, the real marginal cost is
$$\hat{\mu}_t^c = \alpha\hat{r}_t^k + (1-\alpha)\hat{w}_t - \varepsilon_t^a, \qquad (2.17)$$
where (2.17) is obtained by substituting for the optimally determined capital-labor ratio in
equation (2.19).
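Due to price stickiness, as in Calvo (1983), and partial indexation to lagged inflation of those
prices that cannot be reoptimized, prices adjust only sluggishly to their desired markups. Profit
maximization by price-setting firms yields the log-linearized price Phillips curve
$$\hat{\pi}_t = \pi_1\hat{\pi}_{t-1} + \pi_2E_t\hat{\pi}_{t+1} - \pi_3\hat{\mu}_t^p + \varepsilon_t^p = \pi_1\hat{\pi}_{t-1} + \pi_2E_t\hat{\pi}_{t+1} + \pi_3\hat{\mu}_t^c + \varepsilon_t^p, \qquad (2.18)$$
where $\varepsilon_t^p$ is an exogenous price markup process. The parameters of the Phillips curve are given
by
$$\pi_1 = \frac{\iota_p}{1+\beta\gamma^{1-\sigma_c}\iota_p}, \qquad \pi_2 = \frac{\beta\gamma^{1-\sigma_c}}{1+\beta\gamma^{1-\sigma_c}\iota_p}, \qquad \pi_3 = \frac{\left(1-\beta\gamma^{1-\sigma_c}\xi_p\right)(1-\xi_p)}{\left(1+\beta\gamma^{1-\sigma_c}\iota_p\right)\xi_p\left((\phi_p-1)\epsilon_p+1\right)}.$$
The degree of indexation to past inflation is determined by the parameter $\iota_p$, $\xi_p$ measures the
degree of price stickiness such that $1-\xi_p$ is the probability that a firm can reoptimize its price,
and $\epsilon_p$ is the curvature of the Kimball (1995) goods market aggregator.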
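The Phillips curve coefficients can likewise be computed directly from the structural parameters. The sketch below uses a hypothetical function name and illustrative parameter values; it also makes visible that a higher degree of price stickiness $\xi_p$ or a larger Kimball curvature $\epsilon_p$ lowers the slope $\pi_3$.

```python
def phillips_coefficients(beta, gamma, sigma_c, iota_p, xi_p, phi_p, eps_p):
    """Coefficients pi1, pi2, pi3 of the price Phillips curve (2.18)."""
    b = beta * gamma ** (1.0 - sigma_c)      # growth-adjusted discount factor
    denom = 1.0 + b * iota_p
    pi1 = iota_p / denom                      # weight on lagged inflation
    pi2 = b / denom                           # weight on expected inflation
    pi3 = (1.0 - b * xi_p) * (1.0 - xi_p) / (
        denom * xi_p * ((phi_p - 1.0) * eps_p + 1.0))  # slope on marginal cost
    return pi1, pi2, pi3

# Illustrative values: no indexation, so the curve is purely forward-looking
pi1, pi2, pi3 = phillips_coefficients(0.99, 1.0, 1.5, 0.0, 0.5, 1.5, 10.0)
```

With full indexation ($\iota_p = 1$) and $\beta\gamma^{1-\sigma_c} = 1$, the weights on lagged and expected inflation both equal one half.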
Cost minimization of firms also implies that the rental rate of capital is related to the capital-labor ratio and the real wage according to
$$\hat{r}_t^k = -\left(\hat{k}_t^s - \hat{l}_t\right) + \hat{w}_t. \qquad (2.19)$$
In the monopolistically competitive labor market the wage markup is equal to the difference
between the real wage and the marginal rate of substitution between labor and consumption
$$\hat{\mu}_t^w = \hat{w}_t - \left(\sigma_l\hat{l}_t + \frac{1}{1-\lambda/\gamma}\left(\hat{c}_t - \frac{\lambda}{\gamma}\hat{c}_{t-1}\right)\right), \qquad (2.20)$$
where $\sigma_l$ is the elasticity of labor supply with respect to the real wage.
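Due to wage stickiness and partial wage indexation, real wages respond gradually to the
desired wage markup
$$\hat{w}_t = w_1\hat{w}_{t-1} + (1-w_1)\left(E_t\hat{w}_{t+1} + E_t\hat{\pi}_{t+1}\right) - w_2\hat{\pi}_t + w_3\hat{\pi}_{t-1} - w_4\hat{\mu}_t^w + \varepsilon_t^w, \qquad (2.21)$$
where $\varepsilon_t^w$ is an exogenous wage markup process. The parameters of the wage equation are
$$w_1 = \frac{1}{1+\beta\gamma^{1-\sigma_c}}, \qquad w_2 = \frac{1+\beta\gamma^{1-\sigma_c}\iota_w}{1+\beta\gamma^{1-\sigma_c}}, \qquad w_3 = \frac{\iota_w}{1+\beta\gamma^{1-\sigma_c}}, \qquad w_4 = \frac{\left(1-\beta\gamma^{1-\sigma_c}\xi_w\right)(1-\xi_w)}{\left(1+\beta\gamma^{1-\sigma_c}\right)\xi_w\left((\phi_w-1)\epsilon_w+1\right)}.$$
The degree of wage indexation to past inflation is given by the parameter $\iota_w$, while $\xi_w$ is the
degree of wage stickiness. The steady-state labor market markup is equal to $\phi_w - 1$ and $\epsilon_w$ is
the curvature of the Kimball labor market aggregator.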
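The sticky price and wage part of the model is closed by adding the monetary policy reaction
function
$$\hat{r}_t = \rho\hat{r}_{t-1} + (1-\rho)\left(r_\pi\hat{\pi}_t + r_y\left(\hat{y}_t - \hat{y}_t^f\right)\right) + r_{\Delta y}\left(\left(\hat{y}_t - \hat{y}_t^f\right) - \left(\hat{y}_{t-1} - \hat{y}_{t-1}^f\right)\right) + \varepsilon_t^r, \qquad (2.22)$$
where $\hat{y}_t^f$ is potential output measured as the level of output that would prevail under flexible
prices and wages in the absence of the two exogenous markup processes, whereas $\varepsilon_t^r$ is an
exogenous monetary policy shock process.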
2.3.2. The Flexible Price and Wage Equations
The flexible price equations are obtained by assuming that the two exogenous markup processes
are zero, while $\xi_w = \xi_p = 0$ and $\iota_w = \iota_p = 0$. As a consequence, inflation is always equal to
the steady-state inflation rate while real wages are equal to the marginal rate of substitution
between labor and consumption as well as to the marginal product of labor. All other aspects
of the economy are unaffected. Letting the superscript $f$ denote the flexible price and wage
economy versions of the variables we find that
f
f
f
f
g
y
t cy ct iyit zy zt t ,
f
f
f
f
f
f
ct c1 ct1 1 c1 Et ct1 c2 lt Etlt1 c3 rt tb ,
ta
k,f
z1 rt ,
f 1 k1 if
k1 k
t
t1
k,f
f
rt 1 w
t ,
(2.23)
k2 ti ,
f
s,f lf w
k
t ,
t
t
1
f
f
f
f
c ct1
,
w
t l lt
1 / t
k,f
rt
where $\hat{r}_t^f$ is the real interest rate of the flexible price and wage system.
2.3.3. The Exogenous Variables
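There are 7 exogenous processes in the Smets and Wouters (2007) model. These are generally
modelled as AR(1) processes with the exception of the exogenous spending process (where the
process depends on both the exogenous spending shock $\eta_t^g$ and the total factor productivity
shock $\eta_t^a$) and the exogenous price and wage markup processes, which are treated as ARMA(1,1)
processes. This means that
$$
\begin{aligned}
\varepsilon_t^g &= \rho_g\varepsilon_{t-1}^g + \sigma_g\eta_t^g + \rho_{ga}\sigma_a\eta_t^a,\\
\varepsilon_t^b &= \rho_b\varepsilon_{t-1}^b + \sigma_b\eta_t^b,\\
\varepsilon_t^i &= \rho_i\varepsilon_{t-1}^i + \sigma_i\eta_t^i,\\
\varepsilon_t^a &= \rho_a\varepsilon_{t-1}^a + \sigma_a\eta_t^a,\\
\varepsilon_t^p &= \rho_p\varepsilon_{t-1}^p + \sigma_p\eta_t^p - \mu_p\sigma_p\eta_{t-1}^p,\\
\varepsilon_t^w &= \rho_w\varepsilon_{t-1}^w + \sigma_w\eta_t^w - \mu_w\sigma_w\eta_{t-1}^w,\\
\varepsilon_t^r &= \rho_r\varepsilon_{t-1}^r + \sigma_r\eta_t^r.
\end{aligned}
\qquad (2.24)
$$
The shocks $\eta_t^j$, $j \in \{a, b, g, i, p, r, w\}$, are $N(0,1)$, where $\eta_t^b$ is a preference shock (proportional
to a risk premium shock), $\eta_t^i$ is an investment-specific technology shock, $\eta_t^p$ is a price markup
shock, $\eta_t^r$ is a monetary policy or interest rate shock, and $\eta_t^w$ is a wage markup shock.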
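The ARMA(1,1) markup processes and the AR(1) processes in (2.24) can be simulated with a few lines of code. This is a hypothetical helper, not part of YADA; it starts the recursion at zero and uses the standard-library normal generator, and the parameter values in the usage line are illustrative. When $\mu = \rho$ the AR and MA roots cancel, so the simulated process reduces to $\sigma\eta_t$, which gives a convenient check.

```python
import random

def simulate_arma11(rho, mu, sigma, n, seed=0):
    """Simulate eps_t = rho*eps_{t-1} + sigma*eta_t - mu*sigma*eta_{t-1}, eta_t ~ N(0, 1)."""
    rng = random.Random(seed)
    eps_prev, eta_prev = 0.0, 0.0          # start from the steady state (zero)
    path = []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        eps = rho * eps_prev + sigma * eta - mu * sigma * eta_prev
        path.append(eps)
        eps_prev, eta_prev = eps, eta
    return path

def simulate_ar1(rho, sigma, n, seed=0):
    """AR(1) special case (mu = 0), as for the b, i, a and r processes in (2.24)."""
    return simulate_arma11(rho, 0.0, sigma, n, seed)

price_markup = simulate_arma11(rho=0.9, mu=0.7, sigma=0.1, n=200, seed=42)
```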
1
1 r k
k
l
l
1 1
,
p
,
,
w
k
k
w
y
k
p r
where $k_y = k/y$. From these relationships it is straightforward, albeit tedious, to show that
$$z_y = r^kk_y = \alpha.$$
The steady-state relation between real wages and hourly real wages is
$$w = \phi_ww^h,$$
so that the steady-state hourly real wage bill to consumption ratio is given by
$$\frac{w^hl}{c} = \frac{(1-\alpha)\,r^kk_y}{\alpha\,\phi_wc_y} = \frac{1-\alpha}{\phi_wc_y},$$
where the last equality follows from the relationship $z_y = r^kk_y = \alpha$.
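given by
$$
\begin{bmatrix}\Delta y_t\\ \Delta c_t\\ \Delta i_t\\ \Delta w_t\\ l_t\\ \pi_t\\ r_t\end{bmatrix}
=
\begin{bmatrix}\bar{\gamma}\\ \bar{\gamma}\\ \bar{\gamma}\\ \bar{\gamma}\\ \bar{l}\\ \bar{\pi}\\ 4\bar{r}\end{bmatrix}
+
\begin{bmatrix}\hat{y}_t - \hat{y}_{t-1}\\ \hat{c}_t - \hat{c}_{t-1}\\ \hat{\imath}_t - \hat{\imath}_{t-1}\\ \hat{w}_t - \hat{w}_{t-1}\\ \hat{l}_t\\ \hat{\pi}_t\\ 4\hat{r}_t\end{bmatrix}.
\qquad (2.25)
$$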
Since all observed variables except the federal funds rate (which is already reported in percent)
are multiplied by 100, it follows that the steady-state values on the right hand side are given by
$$\bar{\gamma} = 100(\gamma-1), \qquad \bar{\pi} = 100(\pi-1), \qquad \bar{r} = 100\left(\frac{\pi}{\beta\gamma^{-\sigma_c}} - 1\right),$$
where $\pi$ is steady-state inflation. The federal funds rate is measured in quarterly terms in Smets
and Wouters (2007) through division by 4, and is therefore multiplied by 4 in (2.25) to restore
it to annual terms.5 At the same time, the model variable $\hat{r}_t$ is measured in quarterly terms.
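The steady-state constants in (2.25) follow mechanically from the gross quarterly rates $\gamma$ and $\pi$, the discount factor $\beta$, and $\sigma_c$. A minimal sketch with hypothetical function names, which also applies the factor 4 used for the annualized federal funds rate; the input values in the checks are illustrative, not estimates.

```python
def sw_observation_constants(gamma, pi, beta, sigma_c):
    """Steady-state constants of (2.25) in percent; gamma and pi are gross quarterly rates."""
    gamma_bar = 100.0 * (gamma - 1.0)
    pi_bar = 100.0 * (pi - 1.0)
    r_bar = 100.0 * (pi / (beta * gamma ** (-sigma_c)) - 1.0)  # quarterly rate
    return gamma_bar, pi_bar, r_bar

def fed_funds_observation(r_bar, r_hat):
    """Annualized federal funds rate implied by the quarterly model rate, as in (2.25)."""
    return 4.0 * r_bar + 4.0 * r_hat

g_bar, p_bar, r_bar = sw_observation_constants(gamma=1.0, pi=1.0, beta=0.99, sigma_c=1.5)
```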
Apart from the steady-state exogenous spending-output ratio, only four additional parameters
are calibrated. These are $\delta = 0.025$, $\phi_w = 1.5$, and $\epsilon_p = \epsilon_w = 10$. The remaining 19 structural
and 17 shock process parameters are estimated. The prior distributions of the parameters are
given in Smets and Wouters (2007, Table 1) and are also provided in the YADA example of their
model.
For the YADA example of the Smets and Wouters model, the data on the federal funds rate has been redefined into
annual terms.
where $L > 0$ is the number of lags and $U > 0$ the number of leads. The $z_t$ ($p \times 1$) vector contains
the endogenous variables, while $\varepsilon_t$ ($q \times 1$) are pure innovations, with zero mean and unit
variance conditional on the time $t-1$ information. The $H_i$ matrices are of dimension $p \times p$
while $D$ is $p \times q$. When $p > q$ the covariance matrix of $D\varepsilon_t$, $DD'$, has reduced rank since
the number of shocks is less than the number of endogenous variables.6
Adolfson, Laséen, Lindé, and Svensson (2008a) show in some detail how the AiM algorithm
can be used to solve the model in equation (3.1) when $U = L = 1$.7 As pointed out in that
paper, all linear systems can be reduced to this case by replacing a variable with a long lead or a
long lag with a new variable. Consider therefore the system of stochastic difference equations:
$$H_{-1}z_{t-1} + H_0z_t + H_1E_tz_{t+1} = D\varepsilon_t. \qquad (3.2)$$
The AiM algorithm takes the $H_i$ matrices as input and returns $B_1$, called the convergent autoregressive matrix, and $S_0$, such that the solution to (3.2) can be expressed as an autoregressive
process
$$z_t = B_1z_{t-1} + B_0\varepsilon_t, \qquad (3.3)$$
6
The specification of i.i.d. shocks and the matrix $D$ is only used here for expositional purposes. AiM does not make
any distinction between endogenous variables and shocks. In fact, $z_t$ would include $\varepsilon_t$ and, thus, $H_0$ would include
$D$.
7
Their paper on optimal monetary policy in an operational medium-sized DSGE model is published in Adolfson,
Laséen, Lindé, and Svensson (2011).
where
$$B_0 = S_0^{-1}D, \qquad (3.4)$$
$$S_0 = H_0 + H_1B_1, \qquad (3.5)$$
$$H_{-1} + H_0B_1 + H_1B_1^2 = 0. \qquad (3.6)$$
The last equation can be seen by leading the system in (3.2) one period and taking the expectation with
respect to time $t$ information. Evaluating the expectation through (3.3) yields the identity.
From equations (3.5) and (3.6) it can be seen that B1 and S0 only depend on the Hi matrices,
but not on D. This is consistent with the certainty equivalence of the system.
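For intuition about (3.3) to (3.6), consider the scalar case $p = q = 1$. Equation (3.6) then becomes a quadratic in $B_1$, and a unique convergent solution requires exactly one root inside the unit circle. The sketch below covers only this textbook special case; it is not the AiM algorithm, and the function name and example coefficients are hypothetical.

```python
import math

def scalar_aim(h_m1, h_0, h_p1, d):
    """Scalar version of (3.2)-(3.6): return (b1, b0, s0) with z_t = b1*z_{t-1} + b0*eps_t."""
    if h_p1 == 0.0:
        roots = [-h_m1 / h_0]                     # purely backward-looking case
    else:
        # (3.6) in the scalar case: h_p1*b^2 + h_0*b + h_m1 = 0
        disc = h_0 ** 2 - 4.0 * h_p1 * h_m1
        if disc < 0.0:
            raise ValueError("complex roots are not handled in this sketch")
        sq = math.sqrt(disc)
        roots = [(-h_0 + sq) / (2.0 * h_p1), (-h_0 - sq) / (2.0 * h_p1)]
    stable = [b for b in roots if abs(b) < 1.0]
    if len(stable) != 1:
        raise ValueError("no unique convergent solution")
    b1 = stable[0]
    s0 = h_0 + h_p1 * b1      # (3.5)
    b0 = d / s0               # (3.4)
    return b1, b0, s0

# z_t = 0.5 z_{t-1} + 0.3 E_t z_{t+1} + eps_t, i.e. H_{-1} = -0.5, H_0 = 1, H_1 = -0.3, D = 1
b1, b0, s0 = scalar_aim(-0.5, 1.0, -0.3, 1.0)
```

Note that `b0` depends on `d` while `b1` and `s0` do not, which is the certainty equivalence property stated above; the sketch also reproduces $B_1 = -S_0^{-1}H_{-1}$.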
More generally, the conditions for the existence of a unique convergent solution (Anderson
and Moore, 1983, 1985, and Anderson, 2008, 2010) can be summarized as follows:
Rank condition:
$$\operatorname{rank}\left(\sum_{i=-L}^{U} H_i\right) = \dim(z).$$
{zi }1
iL
The rank condition is equivalent to requiring that the model has a unique non-stochastic steady
state, while the boundedness condition requires that the endogenous variables eventually converge to their steady-state values; see also Blanchard and Kahn (1980) for discussions on existence and uniqueness.8
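Given that a unique convergent solution exists, AiM provides an autoregressive solution path
$$z_t = \sum_{i=1}^{L} B_iz_{t-i} + B_0\varepsilon_t. \qquad (3.7)$$
In first-order (companion) form this becomes
$$
\begin{bmatrix} z_t\\ z_{t-1}\\ \vdots\\ z_{t-L+1}\end{bmatrix}
=
\begin{bmatrix} B_1 & B_2 & \cdots & B_L\\ I & 0 & \cdots & 0\\ \vdots & \ddots & \ddots & \vdots\\ 0 & \cdots & I & 0\end{bmatrix}
\begin{bmatrix} z_{t-1}\\ z_{t-2}\\ \vdots\\ z_{t-L}\end{bmatrix}
+
\begin{bmatrix} B_0\varepsilon_t\\ 0\\ \vdots\\ 0\end{bmatrix}.
\qquad (3.8)
$$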
With $\xi_t = \begin{bmatrix} z_t' & \cdots & z_{t-L+1}'\end{bmatrix}'$ the $F$ matrix of the state-space form is immediately retrieved from
(3.8), while the state shocks, $v_t$, are given by the second term on the right hand side; see
Section 5. The $Q$ matrix is equal to the zero matrix, except for the upper left corner which is
given by $B_0B_0'$. If $L = 1$, then $Q = B_0B_0'$, while $F = B_1$.
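The construction of $F$ and $Q$ from the AiM output can be sketched as follows, using plain nested lists for the matrices. The function name and the argument layout (a list $[B_1, \ldots, B_L]$ of $p \times p$ matrices and a $p \times q$ matrix $B_0$) are assumptions made for the illustration.

```python
def companion_form(B_lags, B0):
    """Build F and Q of the state equation from (3.8); matrices are nested lists."""
    L = len(B_lags)
    p = len(B0)
    q = len(B0[0])
    n = p * L
    F = [[0.0] * n for _ in range(n)]
    for i in range(p):                        # first block row: [B_1 B_2 ... B_L]
        for lag, B in enumerate(B_lags):
            for j in range(p):
                F[i][lag * p + j] = B[i][j]
    for i in range(p, n):                     # identity blocks shift the lags down
        F[i][i - p] = 1.0
    Q = [[0.0] * n for _ in range(n)]
    for i in range(p):                        # upper-left block: B_0 B_0'
        for j in range(p):
            Q[i][j] = sum(B0[i][k] * B0[j][k] for k in range(q))
    return F, Q

# Scalar AR(2) example: z_t = 0.5 z_{t-1} + 0.2 z_{t-2} + 2 eps_t
F, Q = companion_form([[[0.5]], [[0.2]]], [[2.0]])
```

For $L = 1$ the function simply returns $F = B_1$ and $Q = B_0B_0'$, in line with the special case noted above.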
Anderson and Moore (1985) present a 14-step algorithm for solving the system of equations in (3.1). The AiM Matlab code is set up in a different way, where the first 8 steps are
performed without relying on a singular value decomposition. Before the coded version of the
AiM algorithm is presented, let
$$H = \begin{bmatrix} H_{-L} & \cdots & H_{-1} & H_0 & H_1 & \cdots & H_U\end{bmatrix}$$
be a $p \times p(L+U+1)$ matrix, while $Q$ is a zero matrix with dimension $pU \times p(L+U)$. The
auxiliary initial conditions in $Q$ are first set up from a series of shift-rights. These are based on
locating rows of zeros in the last (right-most) $p$ columns of $H$ (initially therefore of $H_U$). Once
there are no such rows of zeros, the algorithm is complemented with an eigenvalue computation
for determining $Q$.
8
For generalizations and discussions of the boundedness condition, see Sims (2002) and Meyer-Gohde (2010); see
also Burmeister (1980) for further discussions on convergence and uniqueness.
Assuming that the matrix $H_U$ has $n_1 > 0$ rows of zeros with indices $i_1$, the shift-right procedure begins with setting the first $n_1$ rows of $Q$ equal to the rows with indices $i_1$ and the first
$p(U+L)$ columns of $H$. The rows in $H$ with indices $i_1$ are prepended with $p$ columns of zeros
while the last $p$ columns are deleted, i.e., the first $p(U+L)$ elements in the $n_1$ rows with
indices $i_1$ are shifted to the right by $p$ columns. The procedure next tries to locate rows of zeros
in the last $p$ columns of this reshuffled $H$ matrix. Let $n_2$ be the number of such rows of zeros
with indices $i_2$. If $n_2 > 0$ and $n_1 + n_2 \le pU$, rows $n_1 + 1$ until $n_1 + n_2$ of $Q$ are set equal
to the rows with indices $i_2$ and the first $p(U+L)$ columns of $H$. Furthermore, the rows in $H$
with indices $i_2$ are prepended with $p$ columns of zeros while the last $p$ columns are deleted.
The procedure thereafter checks for rows of zeros in the last $p$ columns of $H$ and repeats the
procedure if necessary until no more rows of zeros can be located in the last $p$ columns of $H$.
Notice that the procedure also breaks off if $n_1 + n_2 + \cdots + n_k > pU$, where $k$ is the number of
searches for rows of zeros. In that case, AiM reports that there have been too many shift-rights
and that it cannot locate a unique convergent solution to the system of equations (too many
auxiliary conditions).
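A simplified sketch of the shift-right loop just described, written with nested lists. It follows the steps in the text (store the first $p(U+L)$ columns of each zero row in $Q$, shift that row $p$ columns to the right, stop when no zero rows remain or when more than $pU$ auxiliary conditions would be collected), but it is not the AiM code and omits the subsequent eigenvalue step. The function name is hypothetical.

```python
def shift_rights(H, p, U, L):
    """Shift-right loop on H = [H_{-L} ... H_U] (p rows, p*(L+U+1) columns).

    Returns the reshuffled H and the rows of auxiliary initial conditions Q,
    each with p*(U+L) entries. Raises ValueError on too many shift-rights.
    """
    H = [row[:] for row in H]                 # work on a copy
    Q = []
    width = p * (U + L)
    while True:
        # rows whose last p columns (the current "lead" block) are all zero
        zero_rows = [i for i, row in enumerate(H) if all(x == 0.0 for x in row[-p:])]
        if not zero_rows:
            break
        if len(Q) + len(zero_rows) > p * U:
            raise ValueError("too many shift-rights: no unique convergent solution")
        for i in zero_rows:
            Q.append(H[i][:width])            # first p(U+L) columns become a row of Q
            H[i] = [0.0] * p + H[i][:-p]      # prepend p zeros, drop the last p columns
    return H, Q

# p = 2, L = U = 1: the first equation has no leads, the second does
H = [[1.0, 0.0, 2.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 3.0, 0.0, 4.0]]
H_shifted, Q = shift_rights(H, p=2, U=1, L=1)
```

In the example, the first equation contains no lead, so it is shifted one block to the right and its original coefficients become one auxiliary initial condition.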
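Once the shift-right loop has finished, the AiM procedure computes the $p \times p(U+L)$ matrix $\Gamma$
from the reshuffled $H$ matrix according to
$$\Gamma = -H_U^{-1}\begin{bmatrix} H_{-L} & \cdots & H_{-1} & H_0 & \cdots & H_{U-1}\end{bmatrix},$$
where $H_U$ is now the $p \times p$ matrix in the last $p$ columns of the reshuffled $H$, while the $p(U+L) \times p(U+L)$ companion matrix $A$ is obtained as
$$A = \begin{bmatrix} \begin{matrix} 0 & I_{p(U+L-1)} \end{matrix}\\ \Gamma \end{bmatrix}.$$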
(3.9)
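Following, e.g., Meyer-Gohde (2010), we choose to stack the variables in the system (3.2)
such that the period $t-1$ variables appear above the period $t$ variables. That way we know that the first $p$ variables are predetermined.
This turns out to simplify the handling of output from the QZ decomposition. In particular, we
rewrite the AiM system matrices such that
$$
\begin{bmatrix} 0 & H_1\\ I & 0\end{bmatrix}
\begin{bmatrix} z_t\\ E_tz_{t+1}\end{bmatrix}
=
\begin{bmatrix} -H_{-1} & -H_0\\ 0 & I\end{bmatrix}
\begin{bmatrix} z_{t-1}\\ z_t\end{bmatrix}
+
\begin{bmatrix} D\\ 0\end{bmatrix}\varepsilon_t.
\qquad (3.10)
$$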
The matrix $B_1$ in (3.3) can now be computed from a QZ decomposition of the matrices $(A, B)$
conditional on a given ordering of the generalized eigenvalues of the matrix pencil $Az - B$,
where $z$ is a complex variable. The QZ decomposition of $(A, B)$ is given by $QAZ = S$ and
$QBZ = T$, where $S$ and $T$ are upper triangular and possibly complex, $Q$ and $Z$ are unitary, i.e.,
(3.13)
term, i.e., the $H_1$ matrix is typically singular. In fact, it would be sufficient to define the vector $e_t$
such that only those elements of $E_tz_{t+1}$ that enter the model would be used. YADA here takes
the easy way out and includes all such variables in the definition of $e_t$. This avoids the hassle of
letting the code try to figure out which variables are (always) included and which are (always)
excluded, a potentially error-prone operation. One cost of this decision is that the solution time
for the Sims approach is longer than necessary. At the same time, this is not expected to be
very important, since the time for solving the model is short relative to the time needed for
computing the value of the log-likelihood.
9
The complex conjugate or conjugate transpose of a complex matrix $A = B + iC$ is equal to $A^* = B' - iC'$. Hence,
the complex conjugate of a real matrix is simply the transpose of the matrix.
Given the form (3.13), the solution method by Sims is also based on the QZ decomposition
with the generalized eigenvalues sorted the same way as in Klein; see Sims (2002, equations
44-45) for details on the computation of $B_1$ and $B_0$. This also means that the existence of a
unique convergent solution can be checked with the same tools.
3.4. YADA Code
YADA uses only Matlab functions for running the AiM procedures, whereas most Matlab implementations include script files. The main reason for this change is that the construction of
various outputs can more easily be traced to a particular Matlab file when functions are used,
while script files tend to hide a lot of variables, most of which are only needed locally. Moreover,
inputs are also easier to keep track of, since they can be given local names in a function.
The main AiM functions in YADA for solving the DSGE model and setting up the output
as required by the Kalman filter are: AiMInitialize, AiMSolver, and AiMtoStateSpace. A
number of other functions are also included for utilizing AiM, but these are not discussed here.10
It should be noted that the computationally slowest function, AiMInitialize, needs to be run only once for a given model specification. The other two main functions need to be run for each set of parameter values to be analysed by the code.
YADA also supports the Klein (2000) and Sims (2002) approaches to solving a DSGE model.
The AiM parser, run through the AiMInitialize function, is still required for these approaches
since we need to write the DSGE model on the structural form in (3.2). The Klein approach
is handled with the function KleinSolver, while Sims' gensys solver is run via the function SimsSolver.
3.4.1. AiMInitialize
The function AiMInitialize runs the AiM parser on ModelFile, a text file that sets up the DSGE
model in a syntax that the AiM parser can interpret. YADA refers to this file as the AiM model
file. If parsing is successful (the syntax of the ModelFile is valid and the model is otherwise
properly specified), the AiM parser writes two Matlab files to disk. The first is the function
compute_aim_data.m and the second the script file compute_aim_matrices.m. The latter file
is then parsed internally by AiMInitialize, which rewrites it as a function that accepts a structure ModelParameters as input and provides the necessary output; the fields of this structure are simply the parameter names as they are given in the AiM model file. For example, if the model file has a parameter called omega, then the structure ModelParameters has a field with the same name, i.e., ModelParameters.omega.
The functions compute_aim_data.m and compute_aim_matrices.m are stored on disk in a
sub-directory to the directory where the AiM model file is located. By default, the name of this
directory depends only on the name of the model specification (which can be different from the
AiM model file, since the latter can be shared by many model specifications). AiMInitialize
therefore also takes the input arguments NameOfModel and (optionally) OutputDirectory.
AiMInitialize also runs the function compute_aim_data.m and stores the relevant output
from this function in a mat-file located in the same directory as the compute_aim_data.m file.
Finally, AiMInitialize provides as output the status of the AiM parsing, and the output given
by the compute_aim_data.m function. The status variable is 0 when everything went OK; it
is 1 if the parsing did not provide the required output; 2 if the number of data variables did
not match the number of stochastic equations; 3 if illegal parameter names were used;11 and
4 if the number of lags ($L$) is greater than 1. All output variables from AiMInitialize are required, while the required input variables are ModelFile, a string containing the full path plus name and extension of the model file, and NameOfModel, a string containing the name of the model specification. The final input variable, locally called OutputDirectory, is optional and gives the directory where the AiM output is stored. The NameOfModel variable determines the name of the mat-file that is created when running the function compute_aim_data.m.
10
Most, if not all, of these Matlab functions originate from the AiM implementation at the Federal Reserve System; see, e.g., Zagaglia (2005).
11
YADA has only reserved 4 names as illegal. First of all, in order to allow for parameters called g and h (matrix names in the function compute_aim_matrices.m), YADA temporarily renames them YADAg and YADAh, respectively, when it rewrites the file from a Matlab script file to a Matlab function. For this reason, parameters cannot be named YADAg and YADAh. Furthermore, the name UserVariables is reserved for passing user determined data to the parameter functions that YADA supports, while the name YADA is reserved for internally disseminating the state equation matrices to the measurement equation function; see Section 17.4.
3.4.2. AiMSolver
The function AiMSolver attempts to solve the DSGE model. To this end it requires as inputs the
ModelParameters structure (containing values for all model parameters), the number of AiM
equations (NumEq, often being at least $p + q + 1$), the number of lags (NumLag, being $L$), the
number of leads (NumLead being U ), and the numerical tolerance for AiM and the other DSGE
model solvers (AIMTolerance).
As output the function provides a scalar mcode with information about the solvability properties of the DSGE model for the parameter values found in the ModelParameters structure.
When a unique convergent solution exists the mcode variable returns 1, while other values reflect various problems with the selected parameters (see the AiMSolver file for details).
Given that a unique convergent solution exists, the solution matrices as well as the maximum absolute error (MaxAbsError) when computing the solution are calculated. The solution
matrices are given by all the $B_i$'s, provided in BMatrix ($= [B_L \cdots B_1]$), and all the $S_j$'s, returned as the matrix SMatrix ($= [S_{-L} \cdots S_{-1}\; S_0]$). These matrices have dimensions NumEq × ($L$·NumEq) and NumEq × (($L+1$)·NumEq), respectively. Since YADA only accepts DSGE models that have been specified such that $L \leq 1$, these matrices are no larger than they need to be.
Finally, the function yields the output variable ModelEigenvalues, a structure that contains
information about the eigenvalues of the reduced form, i.e., the solution of the model.
3.4.3. AiMtoStateSpace
The function AiMtoStateSpace creates the F matrix for the state equation (5.2) based on
the input matrix BMatrix and B0 from SMatrix. Since the output from AiMSolver treats all
equations in a similar fashion, the vectors $z_t$ and $\eta_t$ are both often included as separate equations. Hence, NumEq $\geq p + q$. The additional input variables StateVariablePositions and StateShockPositions are therefore needed to locate the rows and columns of BMatrix and SMatrix that contain the coefficients on the $z$ and $\eta$ variables. These input vectors are created with the YADA GUI.
3.4.4. KleinSolver
The function KleinSolver requires 6 input variables to perform its task, i.e., to solve the
DSGE model with the Klein (2000) approach. The variables are: ModelParameters, NumLead,
StateVariablePositions, StateShockPositions, AIMTolerance, and OrderQZ. The first two and the fifth input variables are identical to the same variables in AiMSolver, while the third and the fourth variables are used by AiMtoStateSpace. The last input variable is a boolean that is unity if the function ordqz is a built-in Matlab function, and zero otherwise. All Matlab versions from version 7 onward have this function.
The function provides 3 required and 2 optional output variables. The required outputs
are F, B0, and mcode. The first two are the matrices on the lagged state variables and the current state shocks in the state-space representation, i.e., the solution to the DSGE model, and are therefore the same as the output variables from AiMtoStateSpace. The mcode variable is shared with AiMSolver, but supports slightly different values. The optional variables are MaxAbsError and
ModelEigenvalues which are also provided by AiMSolver. The latter structure is now based on
the generalized eigenvalues of the structural form of the model, and has fewer fields than the
variable provided by AiMSolver.
3.4.5. SimsSolver
The function SimsSolver supports the same input and output variables as KleinSolver. It
rewrites the structural form of the DSGE model into the Sims (2002) form in equation (3.15)
and sends the matrices to gensys, the Sims solver. The function used by YADA for this is called YADAgensys and is a slight rewrite of Sims' original function. In particular, it makes it possible to run the Matlab function ordqz rather than gensys' own qzdiv. The built-in Matlab function is considerably faster than qzdiv, but is not included in older versions of Matlab.
with $\Theta$ being the support of $\theta$. Since $p(Y)$ is a constant when $Y$ has been realized, we know that the posterior density of $\theta$ is proportional to the product $p(Y|\theta)p(\theta)$. Hence, if we can characterize the distribution of this product we would know the posterior distribution of $\theta$. For complex models like those belonging to the DSGE family this characterization is usually not possible. Methods based on Markov Chain Monte Carlo (MCMC) theory can instead be applied to generate draws from the posterior.
Still, without having to resort to such often time-consuming calculations, it should be noted that the mode of the posterior density can be found by maximizing the product $p(Y|\theta)p(\theta)$. Since this product is usually highly complex, analytical approaches to maximization are ruled out from the start. Instead the posterior mode, denoted by $\tilde{\theta}$, can be estimated using numerical methods. In Section 4.2 we provide details on the individual prior distributions for the elements of $\theta$ that YADA supports. Through the independence assumption, the joint prior $p(\theta)$ is simply the product of these individual (and marginal) prior densities.12 The computation of the likelihood function for any given value of $\theta$ is thereafter discussed in Section 5.
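As a toy illustration of locating the posterior mode numerically, the sketch below assumes a hypothetical model with $y_i \sim N(\theta, 1)$ and a single prior $\theta \sim N(0, \tau^2)$. For this model the mode of $p(Y|\theta)p(\theta)$ is known in closed form, $\sum_i y_i/(n + 1/\tau^2)$. The sketch maximizes the log of the product with a golden-section search. This is a simple stand-in chosen for clarity, not the optimizer YADA uses for DSGE models. With several a priori independent parameters, the log prior would be the sum of the individual log prior densities.

```python
import math


def log_posterior_kernel(theta, data, tau):
    """log p(Y|theta) + log p(theta) up to an additive constant.

    Hypothetical model: y_i ~ N(theta, 1) independently, prior theta ~ N(0, tau**2).
    """
    log_lik = -0.5 * sum((y - theta) ** 2 for y in data)
    log_prior = -0.5 * (theta / tau) ** 2
    return log_lik + log_prior


def golden_section_max(f, lo, hi, tol=1e-8):
    """Locate the maximum of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum cannot lie in (d, b]
        else:
            a = c  # the maximum cannot lie in [a, c)
    return 0.5 * (a + b)


data = [0.8, 1.2, 1.1, 0.9, 1.5]
tau = 2.0
mode = golden_section_max(lambda t: log_posterior_kernel(t, data, tau), -10.0, 10.0)
closed_form = sum(data) / (len(data) + 1.0 / tau ** 2)
```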
4.2. Prior Distributions
In the Bayesian DSGE framework it is usually assumed that the parameters to be estimated, denoted here by $\theta$, are a priori independent. For parameters that have support $\mathbb{R}$, the prior distribution is typically Gaussian. Parameters that instead have support $\mathbb{R}_{+}$ tend to have either gamma or inverted gamma prior distributions, while parameters with support $(c, d)$, where $d > c$ and both are finite, are usually assumed to have beta prior distributions; see, e.g., An and Schorfheide (2007). In some cases, e.g., Adolfson et al. (2007b), the distribution may be left truncated normal for a certain parameter. The density functions of these distributions as well as of the uniform, the Student-t (and Cauchy), the logistic, the Gumbel, and the Pareto distributions are given below. Some of these have, to my knowledge, not been used in the empirical DSGE modelling literature, but it seems reasonable to, e.g., consider using a Student-t or a logistic as an alternative to the normal prior.
12
If one wishes to make use of parameters that are a priori dependent, one may formulate parameter functions and treat elements of $\theta$ as auxiliary parameters for the ones of interest. YADA supports such parameter functions and, hence, an assumption of a priori independent parameters is not restrictive.
YADA can also support a number of additional distributions through parameter transformation functions. These include but are not limited to the Weibull and the Snedecor (better known as the F or Fisher) distributions.13 The densities of such additional distributions are derived through a useful result which directly relates the density of a monotonic transformation of a continuous random variable to the density of that variable. Next, the gamma and beta functions are presented since they often appear in the integration constants of certain important distributions. Thereafter, we examine the prior distributions which are directly supported by YADA, focusing on the specific parameterizations used and relating these parameters to moments of the distributions. Furthermore, we discuss some distributions which can be derived from the directly supported ones, and which are therefore indirectly supported. In addition, we also reflect on some interesting special cases of the directly supported priors. The section ends with a discussion about random number generators.
4.2.1. Monotonic Functions of Continuous Random Variables
Suppose that a continuous random variable $x$ has density function $p_X(x)$. The general principle for determining the density of a random variable $z = f(x)$, where $f(\cdot)$ is monotonic (order preserving), is the following:
$$p_Z(z) = \frac{1}{\left|f'\!\left(f^{-1}(z)\right)\right|}\, p_X\!\left(f^{-1}(z)\right). \qquad (4.3)$$
The derivative of $f$ is given by $f' = dz/dx$, while $f^{-1}(\cdot)$ is the inverse function, i.e., $x = f^{-1}(z)$; see, e.g., Bernardo and Smith (2000, p. 111). This powerful result makes it straightforward to determine the density of any monotonic transformation of a random variable.
An intuition for the result in equation (4.3) can be obtained by recalling that the integral of
the density of x over its domain is equal to unity. To calculate this integral we multiply the
density of x by dx and then perform the integration. When integrating over the domain of z
we instead multiply the density of x by dz. To ensure that the integral is still equal to unity, we
must therefore multiply this expression by $|dx/dz| = |1/(dz/dx)|$, where the absolute value guarantees that the sign of the integral does not change.
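Equation (4.3) translates directly into code. The helper below builds $p_Z$ from $p_X$, the inverse $f^{-1}$ and the derivative $f'$; its name is only illustrative. As a check, it uses $x \sim N(0,1)$ and $z = \exp(x)$. For this transformation $f'(f^{-1}(z)) = z$, so the result should coincide with the standard lognormal density.

```python
import math


def transformed_density(p_x, f_inv, f_prime):
    """Return p_Z for z = f(x) with f monotonic, following equation (4.3)."""
    def p_z(z):
        x = f_inv(z)
        return p_x(x) / abs(f_prime(x))
    return p_z


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# z = exp(x) with x ~ N(0, 1): f^{-1}(z) = log(z) and f'(x) = exp(x).
lognormal_pdf = transformed_density(std_normal_pdf, math.log, math.exp)
```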
Since YADA supports functions of parameters, the relationship in equation (4.3) means that
YADA indirectly supports all prior distributions where the corresponding random variable can
be expressed as a monotonic function of one of the basic priors directly supported by YADA.
Furthermore, the relationship between the joint density and the conditional and the marginal densities (i.e., the foundation for Bayes' theorem) makes it possible to further enhance the set of density functions which YADA can indirectly support to include also mixtures and multivariate priors.
4.2.2. The Gamma and Beta Functions
The gamma function is defined by the following integral identity:
$$\Gamma(a) = \int_{0}^{\infty} x^{a-1}\exp(-x)\,dx, \quad a > 0. \qquad (4.4)$$
13
YADA can indirectly support multivariate extensions of the distributions. For example, one may wish to have a
Dirichlet (multivariate beta), a multivariate normal prior, or an inverted Wishart prior for a vector of parameters.
For these cases, parameter transformation functions can be used to allow for a multivariate prior. In the case of a
multivariate normal prior, we would define the prior as univariate normal priors for auxiliary parameters, e.g., for
one conditional and for one marginal parameter, while the transformation would be applied to the conditional
parameter. Similarly, the Dirichlet distribution is supported through univariate beta priors for auxiliary parameters;
see, e.g., Connor and Mosimann (1969). In other words, YADA can support multivariate priors through its use of a
parameter transformation function; see Section 17.3.
$$a = \frac{\mu_G^2}{\sigma_G^2}, \quad b = \frac{\sigma_G^2}{\mu_G}. \qquad (4.8)$$
In practice, most economists (and econometricians) are probably more comfortable formulating a prior in terms of the mean and the standard deviation than in terms of $a$ and $b$.
The mode can, when it exists, also be expressed in terms of the mean and the variance parameters. Equation (4.8) and the expression for the mode give us
$$\tilde{\mu}_G = \mu_G - \frac{\sigma_G^2}{\mu_G}.$$
The mode therefore exists when $\mu_G^2 > \sigma_G^2$, i.e., when the mean is greater than the standard deviation.
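The mapping in (4.8) and the mode expression are easy to evaluate. The sketch below assumes the shape-scale parameterization implied by (4.8): mean $ab$, variance $ab^2$, mode $(a-1)b$ for $a > 1$, and skewness $2/\sqrt{a}$. The function names are hypothetical. The values used correspond to the three gamma examples with mean 0.2 discussed next.

```python
import math


def gamma_from_moments(mean, sd):
    """Shape a and scale b of G(a, b) with the given mean and std. dev., eq. (4.8)."""
    if mean <= 0.0 or sd <= 0.0:
        raise ValueError("mean and standard deviation must be positive")
    return mean ** 2 / sd ** 2, sd ** 2 / mean


def gamma_mode(mean, sd):
    """Mode (a - 1)*b = mean - sd**2/mean; returns None unless mean > sd."""
    if mean <= sd:
        return None
    return mean - sd ** 2 / mean


def gamma_skewness(mean, sd):
    """Skewness 2/sqrt(a), which falls as the shape parameter a rises."""
    a, _ = gamma_from_moments(mean, sd)
    return 2.0 / math.sqrt(a)
```

For mean 0.2 and standard deviation 0.1 this gives $a = 4$, $b = 0.05$ and a mode of 0.15. With standard deviation 0.2 we get $a = 1$ and no interior mode.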
A few examples of the gamma distribution have been plotted in the upper left panel of Figure 1 (with the lower bound being equal to zero). The mean has been set to 0.2 in three cases while the standard deviation takes the values (0.05, 0.1, 0.2). For the two cases when the mean is greater than the standard deviation the mode exists, while $\sigma_G = \mu_G$ results in $a = 1$ so that the mode does not exist. For the cases when the mode exists, the height of the density is negatively related to the standard deviation. Furthermore, for a given mean, the mode lies closer to the mean as the standard deviation decreases since the ratio $\sigma_G^2/\mu_G$ becomes smaller. Moreover, since a lower standard deviation for a fixed mean implies that $a$ increases, we also know that skewness decreases. Hence, the gamma distribution with mean 0.2 and standard deviation 0.1 (blue solid line in Figure 1) is more skewed than the gamma with mean 0.2 and standard deviation 0.05 (red dashed line).
The last example covers the case when the mean increases while the standard deviation is fixed, i.e., the blue solid line relative to the magenta colored dotted line. The distance between the mode and the mean now also decreases since the ratio $\sigma_G^2/\mu_G$ becomes smaller. In terms of the shape and scale parameters $a$ and $b$, we know from (4.8) that $a$ increases with $\mu_G$ while $b$ decreases. Moreover, since $a$ increases it follows from the skewness expression that skewness decreases.
Figure 1. Examples of gamma, inverted gamma, beta, normal and left truncated normal distributions.
[Panels: Gamma Distribution; Beta Distribution; Normal and Left Truncated Normal Distributions.]
One special case of the gamma distribution is the $\chi^2(q)$. Specifically, if $z \sim \chi^2(q)$ then this is equivalent to stating that $z \sim G(q/2, 2)$, with mean $q$ and variance $2q$. The mode of this distribution exists and is unique when $q \geq 3$ and is equal to $q - 2$.
Another special case of the gamma distribution is the exponential distribution. This is obtained by letting $a = 1$ in (4.7). With $z \sim G(1, b)$, we find that the mean is equal to $\mu_G = b$ and the variance is $\sigma_G^2 = b^2$.
Similarly, the Erlang distribution is the special case of the gamma when $a$ is an integer. For this case we have that $\Gamma(a) = (a-1)!$, where $!$ is the factorial function. The parameterization of the Erlang density is usually written in terms of $\lambda = 1/b$, a rate parameter.
YADA can also support the Weibull distribution through the gamma prior and the so called file with parameters to update; see Section 17.3. Specifically, if $x \sim G(1, 1)$ and $x = (z/a)^b$ with $a, b > 0$, then $z$ has a Weibull distribution with scale parameter $a$ and shape parameter $b$, i.e., $z \sim W(a, b)$. In YADA one would specify a prior for the random variable $x$ and compute $z = a x^{1/b}$ in the file with parameters to update. The density function is now
$$p_W(z|a,b) = \frac{b}{a^b}\, z^{b-1}\exp\left(-\left(\frac{z}{a}\right)^{b}\right), \quad z > 0, \qquad (4.9)$$
since $dx/dz$ is positive and given by the term $(b/a^b)z^{b-1}$.14
The mean of the Weibull distribution is $\mu_W = a\Gamma(1 + 1/b)$, the variance is $\sigma_W^2 = a^2\left[\Gamma(1 + 2/b) - \Gamma(1 + 1/b)^2\right]$, whereas the mode exists and is given by $\tilde{\mu}_W = a\left((b-1)/b\right)^{1/b}$ when $b > 1$.
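The Weibull construction can be checked by simulation. In this sketch, whose function names are illustrative, $x$ is drawn from $G(1,1)$ (a standard exponential) with Python's random.expovariate and transformed to $z = a x^{1/b}$. The sample mean is compared with $a\Gamma(1 + 1/b)$, and the mode formula is checked against the density (4.9).

```python
import math
import random


def draw_weibull(a, b, rng):
    """Draw z ~ W(a, b) as z = a * x**(1/b) with x ~ G(1, 1), a standard exponential."""
    x = rng.expovariate(1.0)
    return a * x ** (1.0 / b)


def weibull_pdf(z, a, b):
    """Density (4.9)."""
    return (b / a ** b) * z ** (b - 1.0) * math.exp(-((z / a) ** b))


def weibull_mean(a, b):
    return a * math.gamma(1.0 + 1.0 / b)


def weibull_mode(a, b):
    """Mode a*((b - 1)/b)**(1/b); exists only for b > 1."""
    return a * ((b - 1.0) / b) ** (1.0 / b) if b > 1.0 else None


rng = random.Random(12345)
draws = [draw_weibull(2.0, 1.5, rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
```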
4.2.4. Inverted Gamma and Inverted Wishart Distributions
A random variable $z > 0$ has an inverted gamma distribution with shape parameter $a > 0$ and scale parameter $b > 0$, denoted by $z \sim IG(a, b)$,15 if and only if its pdf is given by
$$p_{IG}(z|a,b) = \frac{2}{\Gamma(a)\,b^{a}}\, z^{-(2a+1)}\exp\left(-\frac{1}{bz^{2}}\right). \qquad (4.10)$$
This pdf has a unique mode at $\tilde{\mu}_{IG} = \left(2/(b(2a+1))\right)^{1/2}$; cf. Zellner (1971). Moreover, the statement $z \sim IG(a, b)$ is equivalent to $z = 1/\sqrt{x}$ where $x \sim G(a, b)$.
The inverted gamma distribution is an often used prior for a standard deviation parameter. Letting $\sigma = z$, $a = q/2$, and $b = 2/(qs^2)$, we get
$$p_{IG}(\sigma|s,q) = \frac{2}{\Gamma(q/2)}\left(\frac{qs^{2}}{2}\right)^{q/2}\sigma^{-(q+1)}\exp\left(-\frac{qs^{2}}{2\sigma^{2}}\right), \qquad (4.11)$$
where $s, q > 0$. The parameter $q$ is an integer (degrees of freedom) while $s$ is a location parameter. This pdf has a unique mode at $\tilde{\mu}_{IG} = s\left(q/(q+1)\right)^{1/2}$. Hence, the mode is below $s$ for finite $q$ and converges to $s$ when $q \to \infty$.
The moments of this distribution exist when $q$ is sufficiently large. For example, if $q \geq 2$, then the mean is
$$\mu_{IG} = \frac{\Gamma((q-1)/2)}{\Gamma(q/2)}\left(\frac{q}{2}\right)^{1/2}s,$$
while if $q \geq 3$ then the variance is given by
$$\sigma_{IG}^{2} = \frac{q}{q-2}s^{2} - \mu_{IG}^{2}.$$
Hence, both the mean and the variance are decreasing functions of $q$; see Zellner (1971) for details.
Moreover, if $q \geq 4$ then the third moment also exists. The exact expression can be found in Zellner (1971, eq. (A.48)), but since that expression is very messy an alternative skewness measure may be of interest. One such simpler alternative is the Pearson measure of skewness, defined as the mean minus the mode divided by the standard deviation. For the inverted gamma we here find that
$$S_{P,IG} = \frac{\mu_{IG} - \tilde{\mu}_{IG}}{\sigma_{IG}} = \frac{\dfrac{\Gamma((q-1)/2)}{\Gamma(q/2)}\left(\dfrac{q}{2}\right)^{1/2} - \left(\dfrac{q}{q+1}\right)^{1/2}}{\left[\dfrac{q}{q-2} - \left(\dfrac{\Gamma((q-1)/2)}{\Gamma(q/2)}\right)^{2}\dfrac{q}{2}\right]^{1/2}}, \quad q \geq 3.$$
This expression is positive for finite $q$ and the inverted gamma distribution is therefore right-skewed. As $q$ gets large, the skewness measure $S_{P,IG} \to 0$. Both the numerator and the denominator are decreasing in $q$, and for $q > 5$ the ratio is decreasing.16
14
With $p_X(x) = \exp(-x)$ and $z = a x^{1/b}$ we find from equation (4.3) that $f^{-1}(z) = (z/a)^b$. Moreover, $f'(x) = (a/b)x^{(1-b)/b}$, so that $f'(f^{-1}(z)) = (a^b/b)z^{1-b}$. By multiplying terms we obtain the density function in (4.9). Notice that $dx/dz = 1/f'(f^{-1}(z))$, the Jacobian of the transformation of $z$ into $x$.
15
Bauwens, Lubrano, and Richard (1999) refer to the inverted gamma distribution as the inverted gamma-1 distribution. The inverted gamma-2 distribution is then defined for a variable $x = z^2$, where $z$ follows an inverted gamma-1 distribution.
16
Skewness is defined as the third standardized central moment, i.e., the third central moment divided by the
standard deviation to the power of 3. There is no guarantee that the sign of this measure always corresponds to the
sign of the Pearson measure.
A few examples of the inverted gamma distribution have been plotted in the upper right
panel of Figure 1. The location parameter is for simplicity kept fixed at 0.1, while the number of degrees of freedom is given by $q = 1, 2, 5, 10$. It can be seen that the height of the density increases as $q$ becomes larger.17 Moreover, the variance is smaller while skewness appears to be lower for $q = 10$ than for $q = 5$. The latter is consistent with the results for the Pearson measure of skewness, $S_{P,IG}$.
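The moment formulas for (4.11) can be evaluated directly, which makes it easy to verify the ranking of the Pearson skewness measure across $q$. The sketch below is illustrative, and its function name is an assumption. It computes the gamma-function ratio on the log scale via math.lgamma to avoid overflow for large $q$. For $s = 0.1$ it gives a Pearson measure of roughly 0.55 for $q = 5$ and roughly 0.47 for $q = 10$.

```python
import math


def ig_moments(s, q):
    """Mean, variance, mode and Pearson skewness of IG(s, q) in parameterization (4.11).

    Requires q >= 3 so that the variance exists.
    """
    if q < 3:
        raise ValueError("the variance requires q >= 3")
    # Gamma((q-1)/2) / Gamma(q/2), computed on the log scale for numerical stability.
    ratio = math.exp(math.lgamma((q - 1) / 2.0) - math.lgamma(q / 2.0))
    mean = ratio * math.sqrt(q / 2.0) * s
    var = q / (q - 2.0) * s ** 2 - mean ** 2
    mode = s * math.sqrt(q / (q + 1.0))
    pearson = (mean - mode) / math.sqrt(var)
    return mean, var, mode, pearson
```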
Another parameterization of the inverted gamma distribution is used in the software developed by Adolfson et al. (2007b). Letting $a = d/2$ and $b = 2/c$, the pdf in (4.10) can be written as:
$$p_{IG}(z|c,d) = \frac{2}{\Gamma(d/2)}\left(\frac{c}{2}\right)^{d/2} z^{-(d+1)}\exp\left(-\frac{c}{2z^{2}}\right).$$
The mode of this parameterization is given by $\tilde{\mu}_{IG} = \left(c/(d+1)\right)^{1/2}$. With $c = qs^2$ and $d = q$ this parameterization is equal to that in equation (4.11) with $z = \sigma$.
A multivariate extension of the inverted gamma distribution is given by the inverted Wishart distribution. Specifically, when a $p \times p$ positive definite matrix $\Sigma$ is inverted Wishart, denoted by $\Sigma \sim IW_p(A, v)$, its density is given by
$$p(\Sigma|A,v) = \frac{|A|^{v/2}}{\Gamma_p(v)\,2^{vp/2}\,\pi^{p(p-1)/4}}\,|\Sigma|^{-(v+p+1)/2}\exp\left(-\frac{1}{2}\mathrm{tr}\left[\Sigma^{-1}A\right]\right), \qquad (4.12)$$
where $\Gamma_b(a) = \prod_{i=1}^{b}\Gamma\left((a-i+1)/2\right)$ for positive integers $a$ and $b$, with $a \geq b$, and $\Gamma(\cdot)$ being the gamma function in (4.4). The parameters of this distribution are given by the positive definite location matrix $A$ and the degrees of freedom parameter $v \geq p$. The mode of the inverted Wishart is given by $\left(1/(p+v+1)\right)A$, while the mean exists if $v \geq p+2$ and is then given by $E[\Sigma] = \left(1/(v-p-1)\right)A$; see, e.g., Zellner (1971, Appendix B.4) and Bauwens et al. (1999, Appendix A) for details.
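Suppose for simplicity that $p = 2$ and let us partition $\Sigma$ and $A$ conformably
$$\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{12} & \Sigma_{22}\end{bmatrix}, \quad A = \begin{bmatrix}A_{11} & A_{12}\\ A_{12} & A_{22}\end{bmatrix}.$$
It now follows from, e.g., Bauwens et al. (1999, Theorem A.17) that:
(1) $\Sigma_{11}$ is independent of $\Sigma_{12}/\Sigma_{11}$ and of $\Sigma_{22\cdot1} = \Sigma_{22} - \Sigma_{12}^{2}/\Sigma_{11}$;
(2) $\Sigma_{11} \sim IW_1(A_{11}, v-1)$;
(3) $\Sigma_{12}/\Sigma_{11}\,|\,\Sigma_{22\cdot1} \sim N(A_{12}/A_{11}, \Sigma_{22\cdot1}/A_{11})$, where $N(\mu, \sigma^2)$ denotes the univariate normal distribution with mean $\mu$ and variance $\sigma^2$ (see, e.g., Section 4.2.6 for details); and
(4) $\Sigma_{22\cdot1} \sim IW_1(A_{22\cdot1}, v)$, where $A_{22\cdot1} = A_{22} - A_{12}^{2}/A_{11}$.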
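From these results it is straightforward to deduce that the multivariate random matrix $\Sigma$ may be represented by three independent univariate random variables. Specifically, let
$$\sigma_1 \sim IG(s_1, v-1), \quad \sigma_2 \sim IG(s_2, v), \quad \varepsilon \sim N(0, 1). \qquad (4.13)$$
With $\Sigma_{11} = \sigma_1^2$ and $A_{11} = (v-1)s_1^2$, it can be shown that $\Sigma_{11} \sim IW_1(A_{11}, v-1)$ by evaluating the inverted gamma density at $\sigma_1 = \Sigma_{11}^{1/2}$, and multiplying this density with the inverse of the derivative of $\Sigma_{11}$ with respect to $\sigma_1$, i.e., by $1/(2\Sigma_{11}^{1/2})$; recall equation (4.3) in Section 4.2.1. Furthermore, letting $\Sigma_{22\cdot1} = \sigma_2^2$ and $A_{22\cdot1} = vs_2^2$, we likewise find that $\Sigma_{22\cdot1} \sim IW_1(A_{22\cdot1}, v)$. Trivially, we also know that $\Sigma_{12}/\Sigma_{11} = A_{12}/A_{11} + (\Sigma_{22\cdot1}/A_{11})^{1/2}\varepsilon$ implies that $\Sigma_{12}/\Sigma_{11}\,|\,\Sigma_{22\cdot1} \sim N(A_{12}/A_{11}, \Sigma_{22\cdot1}/A_{11})$.
Together, these results therefore ensure that $\Sigma \sim IW_2(A, v)$, where
$$\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{12} & \Sigma_{22}\end{bmatrix}, \quad \Sigma_{12} = \Sigma_{11}\left(\frac{A_{12}}{A_{11}} + \left(\frac{\Sigma_{22\cdot1}}{A_{11}}\right)^{1/2}\varepsilon\right), \quad \Sigma_{22} = \Sigma_{22\cdot1} + \frac{\Sigma_{12}^{2}}{\Sigma_{11}}.$$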
It may also be noted that one can derive an inverted Wishart distribution for the general $p \times p$ case based on $p$ univariate inverted gamma random variables and $p(p-1)/2$ univariate
standard normal random variables, and where all univariate variables are independent. The
precise transformations needed to obtain from these univariate variables can be determined
by using Theorem A.17 from Bauwens et al. (1999) in a sequential manner.
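The three-variable representation around (4.13) can be turned into a simple sampler for the $2 \times 2$ case. The sketch below assumes that the standard normal variable in the representation is independent of $\sigma_1$ and $\sigma_2$; the name eps is only illustrative. It draws $\Sigma_{11}$ and $\Sigma_{22\cdot1}$ as reciprocals of gamma variates, using $\sigma^2 = 1/x$ with $x \sim G(q/2, 2/C)$ when $\sigma \sim IG(s, q)$ and $C = qs^2$. A Monte Carlo average is then compared with the mean $A/(v-p-1)$.

```python
import math
import random


def draw_iw2(A, v, rng):
    """Draw a 2x2 Sigma ~ IW_2(A, v) from three independent univariate draws.

    Sigma11 ~ IW_1(A11, v - 1), Sigma22.1 ~ IW_1(A22.1, v) and a standard normal
    eps (illustrative name), combined as in the representation following (4.13).
    """
    a11, a12, a22 = A[0][0], A[0][1], A[1][1]
    a22_1 = a22 - a12 ** 2 / a11
    # sigma ~ IG(s, q) in (4.11) means sigma**2 = 1/x with x ~ G(q/2, 2/(q*s**2));
    # random.gammavariate(shape, scale) draws x.
    s11 = 1.0 / rng.gammavariate((v - 1) / 2.0, 2.0 / a11)
    s22_1 = 1.0 / rng.gammavariate(v / 2.0, 2.0 / a22_1)
    eps = rng.gauss(0.0, 1.0)
    s12 = s11 * (a12 / a11 + math.sqrt(s22_1 / a11) * eps)
    s22 = s22_1 + s12 ** 2 / s11
    return [[s11, s12], [s12, s22]]


rng = random.Random(2013)
A = [[2.0, 0.5], [0.5, 1.0]]
v, n = 8, 100_000
draws = [draw_iw2(A, v, rng) for _ in range(n)]
mc_mean = [[sum(S[i][j] for S in draws) / n for j in range(2)] for i in range(2)]
theory = [[A[i][j] / (v - 2 - 1) for j in range(2)] for i in range(2)]  # A/(v-p-1), p = 2
```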
4.2.5. Beta, Snedecor (F), and Dirichlet Distributions
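A random variable $c < x < d$ has a beta distribution with parameters $a > 0$, $b > 0$, $c \in \mathbb{R}$ and $d > c$, denoted by $x \sim B(a, b, c, d)$, if and only if its pdf is given by
$$p_B(x|a,b,c,d) = \frac{1}{(d-c)\,\beta(a,b)}\left(\frac{x-c}{d-c}\right)^{a-1}\left(\frac{d-x}{d-c}\right)^{b-1}. \qquad (4.14)$$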
The standardized beta distribution can directly be determined from (4.14) by defining the random variable $z = (x-c)/(d-c)$. Hence, $0 < z < 1$ has a beta distribution with parameters $a > 0$ and $b > 0$, denoted by $z \sim B(a, b)$, if and only if its pdf is given by
$$p_{SB}(z|a,b) = \frac{1}{\beta(a,b)}\, z^{a-1}(1-z)^{b-1}. \qquad (4.15)$$
For $a, b > 1$, the mode of (4.15) is given by $\tilde{\mu}_{SB} = (a-1)/(a+b-2)$. Zellner (1971) provides general expressions for the moments of the beta pdf in (4.15). For example, the mean of the standardized beta is $\mu_{SB} = a/(a+b)$, while the variance is $\sigma_{SB}^{2} = ab/\left((a+b)^{2}(a+b+1)\right)$.
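The $a$ and $b$ parameters of the beta distribution can be expressed as functions of the mean and the variance. After some algebra we find that
$$a = \frac{\mu_{SB}}{\sigma_{SB}^{2}}\left(\mu_{SB}(1-\mu_{SB}) - \sigma_{SB}^{2}\right), \quad b = \frac{1-\mu_{SB}}{\mu_{SB}}\,a. \qquad (4.16)$$
From these expressions we see that $a$ and $b$ are defined from $\mu_{SB}$ and $\sigma_{SB}^{2}$ when $\mu_{SB}(1-\mu_{SB}) > \sigma_{SB}^{2} > 0$ with $0 < \mu_{SB} < 1$.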
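Letting $\mu_B$ and $\sigma_B^2$ be the mean and the variance of $x \sim B(a, b, c, d)$, it is straightforward to show that:
$$\mu_B = c + (d-c)\mu_{SB}, \quad \sigma_B^{2} = (d-c)^{2}\sigma_{SB}^{2}. \qquad (4.17)$$
This means that we can express $a$ and $b$ as functions of $\mu_B$, $\sigma_B$, $c$, and $d$:
$$a = \frac{\mu_B - c}{(d-c)\sigma_B^{2}}\left((\mu_B - c)(d - \mu_B) - \sigma_B^{2}\right), \quad b = \frac{d - \mu_B}{\mu_B - c}\,a. \qquad (4.18)$$
The conditions that $a > 0$ and $b > 0$ mean that $c < \mu_B < d$, while $(\mu_B - c)(d - \mu_B) > \sigma_B^{2}$.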
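The mode still exists when $a, b > 1$ and is in that case given by
$$\tilde{\mu}_B = c + (d-c)\tilde{\mu}_{SB} = c + (d-c)\frac{a-1}{a+b-2}.$$
The skewness of the beta distribution is given by
$$S_{SB} = \frac{2(b-a)ab}{(a+b)^{3}(a+b+1)(a+b+2)}\left[\frac{ab}{(a+b)^{2}(a+b+1)}\right]^{-3/2}.$$
Hence, if $a = b$, then the beta distribution is symmetric, while $b > a$ ($a > b$) implies that it is right-skewed (left-skewed). Since $b > a$ implies that $\mu_B < (c+d)/2$, it follows that the mean lies below the mid-point of the range $(c, d)$.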
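Equation (4.18) is what one needs when a beta prior is specified through its mean and standard deviation. The sketch below implements (4.18), the mode, and the skewness in its simplified form $2(b-a)\sqrt{a+b+1}/((a+b+2)\sqrt{ab})$, which is positive exactly when $b > a$. The function names are hypothetical. A mean of 0.5 and a standard deviation of $1/\sqrt{12}$ on (0, 1) give $a = b = 1$, the uniform case. The (0.35, 0.2) example from Figure 1 gives a right-skewed prior.

```python
import math


def beta_from_moments(mean, sd, c=0.0, d=1.0):
    """Shape parameters (a, b) of B(a, b, c, d) with the given mean and std. dev., eq. (4.18)."""
    var = sd ** 2
    if not (c < mean < d) or (mean - c) * (d - mean) <= var:
        raise ValueError("need c < mean < d and (mean - c)(d - mean) > variance")
    a = (mean - c) / ((d - c) * var) * ((mean - c) * (d - mean) - var)
    b = (d - mean) / (mean - c) * a
    return a, b


def beta_mode(a, b, c=0.0, d=1.0):
    """Mode c + (d - c)(a - 1)/(a + b - 2); exists when a, b > 1."""
    return c + (d - c) * (a - 1.0) / (a + b - 2.0) if a > 1.0 and b > 1.0 else None


def beta_skewness(a, b):
    """Skewness; positive (right-skewed) when b > a and zero when a == b."""
    return 2.0 * (b - a) * math.sqrt(a + b + 1.0) / ((a + b + 2.0) * math.sqrt(a * b))
```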
The beta distribution is related to the gamma distribution in a particular way. Suppose $x \sim G(a, 1)$ and $y \sim G(b, 1)$, with $x$ and $y$ independent. As shown by, e.g., Bauwens, Lubrano, and Richard (1999, Theorem A.3), the random variable $z = x/(x+y) \sim B(a, b)$.
The beta distribution is plotted in the lower left panel of Figure 1 for a few examples. In all cases the lower bound $c = 0$ and the upper bound $d = 1$. For the baseline case the mean is 0.5 while the standard deviation is $1/\sqrt{12} \approx 0.28868$, and this is displayed as the horizontal blue solid line in the figure. This means that the beta distribution is identical to the uniform distribution. When the standard deviation drops, the distribution becomes bell shaped (red dashed line), and since the mean is exactly at the center between the lower and the upper bound, the distribution becomes symmetric; cf. equation (4.18), where $a = b$ in this case. As noted above, when the mean of the beta distribution is smaller (greater) than the mid-point in the support, the distribution is right-skewed (left-skewed) since $b > a$ ($a > b$).
The beta distribution is also related to the Snedecor or F (Fisher) distribution. For example, suppose that $x \sim B(a/2, b/2)$ with $a, b$ being positive integers. Then $z = bx/(a(1-x))$ can be shown to have an $F(a, b)$ distribution; cf. Bernardo and Smith (2000, Chapter 3). That is,
$$p_F(z|a,b) = \frac{a^{a/2}\,b^{b/2}}{\beta(a/2, b/2)}\, z^{a/2-1}(b + az)^{-(a+b)/2}, \quad z > 0.$$
The mean of this distribution exists if $b > 2$ and is then $\mu_F = b/(b-2)$. The mode exists and is unique with $\tilde{\mu}_F = (a-2)b/(a(b+2))$ when $a > 2$. Finally, if $b > 4$ then the variance exists and is given by $\sigma_F^{2} = 2b^{2}(a+b-2)/\left(a(b-4)(b-2)^{2}\right)$.
Although YADA does not directly support the F distribution, the combination of the beta prior
and the file with parameters to update (see Section 17.3) makes it possible to indirectly support
this as a prior.
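A short simulation illustrates the beta-to-F mapping. The sketch draws $x \sim B(a/2, b/2)$ with Python's random.betavariate, applies $z = bx/(a(1-x))$, and compares the sample mean with $b/(b-2)$. The function names and the choice $a = 4$, $b = 10$ are illustrative. In YADA the same mapping would instead be coded in the file with parameters to update.

```python
import random


def draw_f(a, b, rng):
    """Draw z ~ F(a, b) via z = b*x / (a*(1 - x)) with x ~ B(a/2, b/2)."""
    x = rng.betavariate(a / 2.0, b / 2.0)
    return b * x / (a * (1.0 - x))


def f_moments(a, b):
    """Mean (b > 2), mode (a > 2) and variance (b > 4) of F(a, b); None if undefined."""
    mean = b / (b - 2.0) if b > 2 else None
    mode = (a - 2.0) * b / (a * (b + 2.0)) if a > 2 else None
    var = (2.0 * b ** 2 * (a + b - 2.0) / (a * (b - 4.0) * (b - 2.0) ** 2)
           if b > 4 else None)
    return mean, mode, var


rng = random.Random(42)
draws = [draw_f(4, 10, rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
```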
The multivariate extension of the beta distribution is the so called Dirichlet distribution. The
marginal distribution for one element of a Dirichlet distributed random vector is the beta distribution; cf. Gelman, Carlin, Stern, and Rubin (2004, Appendix A). YADA does not directly
support prior distributions that include dependence between parameters. However, by using
the file with parameters to update (see Section 17.3) the user can circumvent this restriction.
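Specifically, suppose $x_i \sim B(a_i, b_i)$ are mutually independent for $i = 1, \ldots, k-1$ with $k \geq 3$. Defining $z_i = x_i\prod_{j=1}^{i-1}(1 - x_j)$ for $i = 2, \ldots, k-1$, $z_1 = x_1$, and assuming that $b_{i-1} = a_i + b_i$ for $i = 2, \ldots, k-1$, it is shown in Connor and Mosimann (1969) that the density for $z_1, \ldots, z_{k-1}$ is given by
$$p_D(z_1, \ldots, z_{k-1}|\alpha_1, \ldots, \alpha_k) = \frac{\Gamma\left(\sum_{i=1}^{k}\alpha_i\right)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\prod_{i=1}^{k} z_i^{\alpha_i - 1},$$
where $z_k = 1 - \sum_{i=1}^{k-1} z_i$, $\alpha_i = a_i$ for $i = 1, \ldots, k-1$ and $\alpha_k = b_{k-1}$. This is the density function of the Dirichlet distributed vector $z = (z_1, \ldots, z_{k-1}) \sim D(\alpha_1, \ldots, \alpha_k)$.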
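The first two moments of the standardized Dirichlet distribution exist and are given in, e.g., Gelman et al. (2004, Appendix A). Specifically, let $\alpha_0 = \sum_{j=1}^{k}\alpha_j$. The mean of $z_i$ is $\mu_{D,i} = \alpha_i/\alpha_0$, while the mode is equal to $\tilde{\mu}_{D,i} = (\alpha_i - 1)/(\alpha_0 - k)$ when it exists. The variance and the covariance are given by
$$\sigma_{D,i}^{2} = \frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^{2}(\alpha_0 + 1)}, \quad \sigma_{D,ij} = -\frac{\alpha_i\alpha_j}{\alpha_0^{2}(\alpha_0 + 1)}.$$
From the expressions for the mean and the variance of the Dirichlet, the relation to the mean and the variance of the (univariate) beta distribution can be seen: setting $\alpha_i = a$ and $\alpha_0 = a + b$ gives $\mu_{SB}$ and $\sigma_{SB}^2$.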
To use the Dirichlet prior in YADA, the user should set up a prior for the auxiliary parameters $x_i$ and compute $z_i$ in the file with parameters to update. Cases when the Dirichlet may be of interest include models that have parameter pairs that are restricted to, e.g., be positive and to sum to something less than unity.18
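The Connor and Mosimann (1969) construction is a stick-breaking scheme that can be sketched in a few lines. The function below, whose name is hypothetical, sets $a_i = \alpha_i$ and $b_i = \alpha_{i+1} + \cdots + \alpha_k$, which satisfies $b_{i-1} = a_i + b_i$. It then draws the independent betas and forms $z_i = x_i\prod_{j<i}(1 - x_j)$. Drawing many vectors and comparing the averages with $\alpha_i/\alpha_0$, and the covariance with $\sigma_{D,ij}$, is a quick consistency check.

```python
import random


def draw_dirichlet_cm(alpha, rng):
    """Draw (z_1, ..., z_{k-1}) ~ D(alpha_1, ..., alpha_k) via independent betas.

    Uses x_i ~ B(a_i, b_i) with a_i = alpha_i and b_i = alpha_{i+1} + ... + alpha_k,
    and z_i = x_i * prod_{j<i} (1 - x_j).
    """
    k = len(alpha)
    z, remaining = [], 1.0
    for i in range(k - 1):
        x = rng.betavariate(alpha[i], sum(alpha[i + 1:]))
        z.append(x * remaining)
        remaining *= 1.0 - x
    return z  # z_k = 1 - sum(z) is implied
```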
18
One such example is when the model contains an AR(2) process and where the AR parameters should both be positive and add up to something less than unity. This is sufficient but not necessary for stability. By transforming $z_i$ even further one could also consider the general conditions for stability of an AR(2) process. For instance, let
$$y_1 = 2z_2 + 4(1 - z_2)z_1 - 2, \quad y_2 = 2z_2 - 1.$$
It can now be shown that $y_1 + y_2 < 1$, $y_2 - y_1 < 1$, and $-1 < y_2 < 1$ for $(z_1, z_2) \sim D(\alpha_1, \alpha_2, \alpha_3)$. The last condition follows from $z_2 = (y_2 + 1)/2 \in (0, 1)$. The first two conditions are satisfied if we notice that $y_2 - 1 < y_1 < 1 - y_2$. We may therefore let $z_1 = (y_1 - y_2 + 1)/(2 - 2y_2) \in (0, 1)$. Based on the means and covariance of $z_i$ we can directly determine the means of $y_i$. Notice that the stability conditions are also satisfied if we let $z_i \sim U(0, 1)$, with $z_1$ and $z_2$ independent.
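The mapping in footnote 18 can be verified numerically. The sketch below, with illustrative function names, maps a grid of $(z_1, z_2) \in (0,1)^2$ to $(y_1, y_2)$ and checks the three triangle conditions. It also confirms that both roots of $\lambda^2 - y_1\lambda - y_2 = 0$ lie inside the unit circle, and it recovers $z_1$ from $(y_1, y_2)$.

```python
import cmath


def ar2_from_unit_square(z1, z2):
    """Map (z1, z2) in (0, 1)^2 to AR(2) coefficients (y1, y2) as in footnote 18."""
    y2 = 2.0 * z2 - 1.0
    y1 = 2.0 * z2 + 4.0 * (1.0 - z2) * z1 - 2.0
    return y1, y2


def is_stationary_ar2(y1, y2):
    """x_t = y1*x_{t-1} + y2*x_{t-2} + e_t is stable if both roots of
    lambda**2 - y1*lambda - y2 = 0 lie strictly inside the unit circle."""
    disc = cmath.sqrt(y1 ** 2 + 4.0 * y2)
    roots = ((y1 + disc) / 2.0, (y1 - disc) / 2.0)
    return all(abs(r) < 1.0 for r in roots)
```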
where
1 a/2/2
if a > 0
a
LT N c.
Three examples of the left truncated normal distribution along with the normal distribution are plotted in the lower right panel of Figure 1. As $c$ increases, the height of the density relative to its highest point based on the normal for the same support increases.
4.2.8. Uniform Distribution
A random variable $z$ is said to have a uniform distribution with parameters $a$ and $b$ with $b > a$, denoted by $z \sim U(a, b)$, if and only if its pdf is given by
$$p_U(z|a,b) = \frac{1}{b-a}, \quad a \leq z \leq b. \qquad (4.22)$$
The mean and the variance of this distribution are:
$$\mu_U = \frac{a+b}{2}, \quad \sigma_U^{2} = \frac{(b-a)^{2}}{12}.$$
The beta distribution is equivalent to a uniform distribution with lower bound $c$ and upper bound $d$ when $\mu_B = (c+d)/2$ and $\sigma_B^2 = (d-c)^2/12$; see, also, Bauwens, Lubrano, and Richard (1999) for additional properties of the uniform distribution.
4.2.9. Student-t and Cauchy Distribution
A random variable $z$ is said to have a Student-t distribution with location parameter $\mu \in \mathbb{R}$, scale parameter $\sigma > 0$, and degrees of freedom parameter $d$ (a positive integer), denoted by $z \sim t_d(\mu, \sigma)$, if and only if its pdf is given by
$$p_S(z|\mu,\sigma,d) = \frac{\Gamma((d+1)/2)}{\Gamma(d/2)\sqrt{d\pi}\,\sigma}\left(1 + \frac{1}{d}\left(\frac{z-\mu}{\sigma}\right)^{2}\right)^{-(d+1)/2}. \qquad (4.23)$$
The Student-t distribution is symmetric around the mode $\mu$, while the mean exists if $d > 1$, and the variance exists if $d > 2$. The first two central moments are then given by
$$\mu_S = \mu, \quad \sigma_S^{2} = \frac{d}{d-2}\sigma^{2}.$$
The distribution has heavier tails (higher kurtosis) for finite $d$ than the normal distribution.19 When $d \to \infty$, the density in (4.23) converges to the density of the normal distribution. At the other extreme, i.e., $d = 1$, the distribution is also known as the Cauchy, denoted by $z \sim C(\mu, \sigma)$. The density function now simplifies to
$$p_C(z|\mu,\sigma) = \frac{\sigma}{\pi\left(\sigma^{2} + (z-\mu)^{2}\right)}. \qquad (4.24)$$
[Figure 2. Panels include the Gumbel Distribution and the Pareto Distribution; legends include N(0,1), t(0,1,1), t(0,1,2), t(0,1,10), L(0,1,1), L(0,1,0.5) and L(0,1,1.5).]
The mean and the variance for the Cauchy distribution do not exist; see, also, Bauwens, Lubrano, and Richard (1999) for additional properties of the Student-t distribution.
The normal, the Cauchy, and some other Student-t distributions are shown in the upper left panel of Figure 2. As d increases, the height of the t-density approaches the height of the normal density from below. For the standard distributions with location parameter zero and unit scale parameter, the tails of the Student-t become thicker than those of the normal once the distance is at least 2 standard deviations away from the mean of the normal distribution.
4.2.10. Logistic Distributions
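A random variable z is said to have a logistic distribution with location parameter $\mu \in \mathbb{R}$ and scale parameter $\sigma > 0$, denoted by $z \sim L(\mu,\sigma)$, if and only if its density function is
$$p_L(z|\mu,\sigma) = \frac{\exp\bigl(-(z-\mu)/\sigma\bigr)}{\sigma\left[1 + \exp\bigl(-(z-\mu)/\sigma\bigr)\right]^2}. \qquad (4.25)$$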
Because the pdf can be expressed in terms of the square of the hyperbolic secant function20 it is
sometimes referred to as the sech-squared distribution.
20
The logistic distribution receives its name from its cdf, which is an instance of the family of logistic functions. Specifically, the cdf is given by
$$F_L(z|\mu,\sigma) = \frac{1}{1 + \exp\bigl(-(z-\mu)/\sigma\bigr)}.$$
The logistic distribution is symmetric and resembles the normal distribution in shape, but has
heavier tails (higher kurtosis). The mean and variance of the distribution are:
$$\mu_L = \mu, \qquad \sigma_L^2 = \frac{\pi^2\sigma^2}{3}, \qquad (4.26)$$
while excess kurtosis is 6/5.
One extension of the logistic distribution that may be of interest is the so-called Type I generalized (or reversed) logistic distribution; cf. Balakrishnan and Leung (1988). A random variable z is said to have a Type I generalized logistic distribution with location parameter $\mu \in \mathbb{R}$, scale parameter $\sigma > 0$, and shape parameter $c > 0$ if and only if its density function is
$$p_{GL}(z|\mu,\sigma,c) = \frac{c\exp\bigl(-(z-\mu)/\sigma\bigr)}{\sigma\left[1 + \exp\bigl(-(z-\mu)/\sigma\bigr)\right]^{1+c}}. \qquad (4.27)$$
In this case the cdf is given by
$$F_{GL}(z|\mu,\sigma,c) = \frac{1}{\left[1 + \exp\bigl(-(z-\mu)/\sigma\bigr)\right]^{c}}. \qquad (4.28)$$
The distribution is left-skewed for c < 1, right-skewed for c > 1, and for c = 1 it is symmetric and identical to the logistic. The mode of the distribution is given by $\tilde{\mu}_{GL} = \mu + \sigma\ln(c)$.21
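The mean and the variance of the Type I generalized logistic distribution exist and are given by
$$\mu_{GL} = \mu + \sigma\bigl(\gamma + \psi(c)\bigr), \qquad \sigma_{GL}^2 = \sigma^2\left(\frac{\pi^2}{6} + \psi'(c)\right). \qquad (4.29)$$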
The term $\gamma$ is Euler's constant, while $\psi(c)$ is the digamma function and $\psi'(c)$ its first derivative, the so-called trigamma function.22 In Matlab, $\psi(c)$ and its derivatives can be computed through the function psi. But since this function was not present until version 6.5 of Matlab, the functions DiGamma and TriGamma are included in YADA, using the algorithms in Bernardo (1976) and Schneider (1978), respectively.23
23
The algorithm also makes use of modifications suggested by Tom Minka in the functions digamma.m and
trigamma.m from the Lightspeed Toolbox. See http://research.microsoft.com/en-us/um/people/minka/ for
details.
The logistic distribution is compared with the normal distribution in the upper right panel of Figure 2. Focusing on a mean of zero, we find that the height of the symmetric logistic (c = 1) is greater than the height of the normal close to the mean. Between around one and 2.5 standard deviations away from the mean of the normal, the height of the normal is somewhat greater than that of the logistic, whereas further out in the tails the height of the logistic is greater than the height of the normal. For c < 1 the distribution becomes left-skewed (green dash-dotted line), and for c > 1 it is right-skewed (magenta dotted line). The skewness effect in Figure 2 seems to be larger in absolute terms when c < 1 than when c > 1. This is also confirmed when applying the skewness formula. For c = 0.5 we find that skewness is approximately $-2.19$, whereas for c = 1.5 (c = 100) it is roughly 0.61 (1.45). As c becomes very large, skewness appears to converge to around 1.46. Hence, a higher degree of left-skewness than right-skewness can be achieved through the Type I generalized logistic distribution.
4.2.11. Gumbel Distribution
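A random variable z is said to have a Gumbel distribution with location parameter $\mu \in \mathbb{R}$ and scale parameter $\sigma > 0$, denoted by $z \sim Gu(\mu,\sigma)$, if and only if its density function is
$$p_{Gu}(z|\mu,\sigma) = \frac{1}{\sigma}\exp\left(-\frac{z-\mu}{\sigma}\right)\exp\left(-\exp\left(-\frac{z-\mu}{\sigma}\right)\right). \qquad (4.30)$$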
The Gumbel distribution is the most common of the three types of Fisher-Tippett extreme value distributions. It is therefore sometimes called the Type I extreme value distribution. The cdf of the Gumbel is given by
$$F_{Gu}(z|\mu,\sigma) = \exp\left(-\exp\left(-\frac{z-\mu}{\sigma}\right)\right). \qquad (4.31)$$
The Gumbel distribution is right-skewed for z and therefore left-skewed for x = −z. The mean and the variance exist and are given by:
$$\mu_{Gu} = \mu + \gamma\sigma, \qquad \sigma_{Gu}^2 = \frac{\pi^2\sigma^2}{6}. \qquad (4.32)$$
Skewness is roughly equal to 1.1395, while excess kurtosis is exactly 12/5.24 The mode is given by the location parameter, $\tilde{\mu}_{Gu} = \mu$, while the median is $\mu - \sigma\ln(\ln 2)$; see Johnson, Kotz, and Balakrishnan (1995).
A few examples of the Gumbel distribution have been plotted in the lower left panel of Figure 2. It is noteworthy that a shift in the mean for fixed standard deviation implies an equal size shift in the location parameter $\mu$. This is illustrated by comparing the solid blue line for Gu(0,1) to the dash-dotted green line for Gu(1,1). This is a direct consequence of equation (4.32), where $d\mu/d\mu_{Gu} = 1$. Moreover, and as also implied by that equation and the fact that $\mu$ is the mode, a higher (lower) standard deviation for fixed mean results in a lower (higher) mode, i.e., $d\tilde{\mu}_{Gu}/d\sigma_{Gu} = -\gamma\sqrt{6}/\pi \approx -0.45$.
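The inversion of (4.32) and the two derivatives discussed above can be checked numerically. The following is a minimal Python sketch, not YADA's Matlab code; the function names gumbel_moments and gumbel_params are hypothetical and only restate the formulas in the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma in (4.32)


def gumbel_moments(mu, sigma):
    """Mean and standard deviation of Gu(mu, sigma) according to (4.32)."""
    return mu + EULER_GAMMA * sigma, math.pi * sigma / math.sqrt(6.0)


def gumbel_params(mean, std):
    """Invert (4.32): location mu (which is also the mode) and scale sigma."""
    sigma = math.sqrt(6.0) * std / math.pi
    return mean - EULER_GAMMA * sigma, sigma


# A unit shift in the mean, for a fixed standard deviation, shifts mu by one unit.
shift = gumbel_params(1.0, 1.0)[0] - gumbel_params(0.0, 1.0)[0]

# Forward difference of the mode with respect to the standard deviation, fixed mean.
h = 1e-6
slope = (gumbel_params(0.0, 1.0 + h)[0] - gumbel_params(0.0, 1.0)[0]) / h

print(shift, slope)
```

Since the mode is linear in the standard deviation, the finite difference reproduces $-\gamma\sqrt{6}/\pi \approx -0.45$ up to rounding error.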
4.2.12. Pareto Distribution
A random variable $z \geq b$ is said to have a Pareto distribution with shape parameter $a > 0$ and location parameter $b > 0$, denoted by $z \sim P(a,b)$, if and only if its pdf is equal to
$$p_P(z|a,b) = ab^a z^{-(a+1)}. \qquad (4.33)$$
This distribution was originally used by Vilfredo Pareto to describe the allocation of wealth among individuals. The idea was to represent the 80-20 rule, which states that 20 percent of the population controls 80 percent of the wealth. The power law probability distribution has the simple cdf
$$F_P(z|a,b) = 1 - \left(\frac{b}{z}\right)^a. \qquad (4.34)$$
24
To be precise, skewness is given by $S_{Gu} = 12\sqrt{6}\,\zeta(3)/\pi^3$, where $\zeta(3)$ is equal to Apéry's constant; cf. footnote 21.
The mode of the distribution is b and the density declines toward zero at a polynomial (power-law) rate as z becomes larger. The mean exists if a > 1, while the variance exists if a > 2. The mean and the variance are then given by
$$\mu_P = \frac{ab}{a-1}, \qquad \sigma_P^2 = \frac{ab^2}{(a-1)^2(a-2)}.$$
Moreover, if a > 3 then skewness exists and is given by:
$$S_P = \frac{2(1+a)}{a-3}\sqrt{\frac{a-2}{a}}.$$
Since $S_P > 0$ it follows that the distribution is right-skewed. Moreover, $S_P$ is a decreasing function of a and $S_P \to 2$ as $a \to \infty$.
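The moment formulas can be evaluated directly. A minimal Python sketch, where pareto_moments is a hypothetical helper (not part of YADA) that returns None whenever a moment does not exist for the given shape parameter:

```python
import math


def pareto_moments(a, b):
    """Mean, variance and skewness of P(a, b); None when the moment does not exist."""
    mean = a * b / (a - 1.0) if a > 1 else None
    var = a * b ** 2 / ((a - 1.0) ** 2 * (a - 2.0)) if a > 2 else None
    skew = 2.0 * (1.0 + a) / (a - 3.0) * math.sqrt((a - 2.0) / a) if a > 3 else None
    return mean, var, skew


for a in (2.5, 5.0, 10.0, 100.0):
    print(a, pareto_moments(a, 2.0 / 3.0))
```

Evaluating the skewness for increasing a illustrates that it declines toward 2 and does not depend on b.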
The Pareto distribution is plotted for a three-parameter case in the lower right panel of Figure 2. The third parameter is called the origin parameter, c, and is applied such that z = c + x, where $x \sim P(a,b)$. In other words, the density function for z is given by
$$p_P(z|a,b,c) = ab^a(z-c)^{-(a+1)}.$$
Comparing the baseline case where $z \sim P(3, 2/3, 0)$ (blue solid line) to the case when c = b (red dashed line), it is clear that the origin parameter merely affects the lower bound of the distribution. On the other hand, a drop of the shape parameter a from 3 to 2 (green dash-dotted line) lowers the height of the distribution around the mode and increases the mass of the distribution in the right tail.25 Finally, an increase in b has no effect on skewness and, hence, it increases the height of the distribution over its support.
4.2.13. Discussion
YADA needs input from the user regarding the type of prior to use for each parameter it should
estimate. In the case of the beta, normal, logistic, and Gumbel distributions the parameters
needed as input are assumed to be the mean and the standard deviation. If you wish to have
a general beta prior you need to provide the upper and lower bounds as well or YADA will set
these to 1 and 0, respectively. If you wish to have a Type I generalized logistic distribution you
need to provide the shape parameter c in (4.27).
For the gamma and the left truncated normal distribution, the parameters to assign values for are given by $\mu$, $\sigma$, and a lower bound c. For the gamma distribution the first two parameters are the mean and the standard deviation, while for the left truncated normal they are defined in equation (4.20) and are the location and scale parameters, respectively.
Similarly, for the inverted gamma distribution the parameters to select values for are s, q, and a lower bound c. The s parameter is, as mentioned above, a location parameter and q is a degrees of freedom parameter that takes on integer values. The location parameter s can, e.g., be selected such that the prior has a desired mode. Relative to equation (4.11) we are now dealing with the location parameter s − c and the random variable $\sigma - c$, since the density is expressed for a random variable that is positive. The mode for this parameterization is
$$\tilde{\mu}_{IG} = s\left(\frac{q}{q+1}\right)^{1/2} + c\left(1 - \left(\frac{q}{q+1}\right)^{1/2}\right).$$
As can be expected, the uniform distribution requires the lower and upper bound, i.e., a and
b in (4.22).
For the Student-t, the required parameters are $\mu$, $\sigma$, and d, while the Cauchy only takes the first two parameters. Finally, the Pareto distribution takes the shape and location parameters (a, b). In addition, YADA also accepts an origin parameter c which shifts z by a constant, i.e., y = z + c, where $z \sim P(a,b)$. This means that $y \geq b + c$, i.e., b + c is both the mode and the lower bound of y.
25
Excess kurtosis is also a function of a only. Specifically, provided that a > 4 it is given by:
$$K_P = \frac{6\left(a^3 + a^2 - 6a - 2\right)}{a\left(a^2 - 7a + 12\right)}.$$
It can be shown that excess kurtosis is a decreasing function of a and that $K_P \to 6$ from above as $a \to \infty$.
For all distributions but the beta, gamma, logistic, and Gumbel there is no need for internally transforming the distribution parameters. For these 4 distributions, however, transformations into the a and b parameters ($\mu$ and $\sigma$ for the logistic and Gumbel) are needed, using the mean and the standard deviation as input (as well as the shape parameter c for the logistic). YADA has 4 functions that deal with these transformations: MomentToParamGammaPDF (for the gamma), MomentToParamStdbetaPDF (for the standardized beta), MomentToParamLogisticPDF (for the logistic), and MomentToParamGumbelPDF (for the Gumbel distribution). These functions take vectors as input and provide vectors as output. The formulas used are found in equations (4.8) and (4.16) above for the gamma and the beta distributions, the inverse of (4.26) for the logistic distribution, i.e., $\mu = \mu_L$ and $\sigma = \sqrt{3}\,\sigma_L/\pi$ when c = 1, the inverse of (4.29) when $c \neq 1$, and the inverse of (4.32) for the Gumbel distribution. Since YADA supports a lower bound that can be different from zero for the gamma prior, the mean minus the lower bound is used as input for the transformation function MomentToParamGammaPDF. Similarly, the mean and the standard deviation of the standardized beta distribution are computed from the mean and the standard deviation as well as the upper and lower bounds of the general beta distribution. Recall that these relations are $\mu_{SB} = (\mu_B - c)/(d - c)$ and $\sigma_{SB} = \sigma_B/(d - c)$.
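A minimal Python sketch of these moment-to-parameter mappings follows. It is not the code of the MomentToParam functions; the function names are hypothetical, and it assumes the shape-scale parameterization of the gamma (mean ab, variance ab²) and the standard beta moment formulas. The Type I generalized logistic case with c ≠ 1 needs the digamma and trigamma functions and is left out.

```python
import math


def gamma_params(mean, std, lower=0.0):
    """Shape a and scale b of a gamma prior with lower bound `lower`.

    Assumes (mean - lower) = a*b and std**2 = a*b**2, so the mean minus the
    lower bound is what gets transformed.
    """
    m = mean - lower
    return (m / std) ** 2, std ** 2 / m


def beta_params(mean, std, lower=0.0, upper=1.0):
    """Shape parameters (a, b) of a beta prior on [lower, upper].

    The general beta moments are first mapped to the standardized beta
    moments, mu_SB = (mu_B - c)/(d - c) and sigma_SB = sigma_B/(d - c).
    """
    m = (mean - lower) / (upper - lower)
    s = std / (upper - lower)
    total = m * (1.0 - m) / s ** 2 - 1.0  # equals a + b for the standardized beta
    if total <= 0.0:
        raise ValueError("standard deviation too large for a beta distribution")
    return m * total, (1.0 - m) * total


def logistic_params(mean, std):
    """Location and scale of the symmetric logistic (c = 1), inverting (4.26)."""
    return mean, math.sqrt(3.0) * std / math.pi


print(gamma_params(0.5, 0.1, lower=0.1))
print(beta_params(2.5, 0.4, lower=1.0, upper=3.0))
print(logistic_params(0.0, 1.0))
```

The explicit check in beta_params reflects that not every (mean, standard deviation) pair on a bounded interval corresponds to a beta distribution.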
4.3. Random Number Generators
For each prior distribution that YADA has direct support for, it can also provide random draws.
These draws are, for instance, used to compute impulse responses based on the prior distribution. In this subsection, the specific random number generators that YADA makes use of will be
discussed.
The basic random number generators are given by the rand and randn Matlab functions. The first provides draws from a standardized uniform distribution and the second from a standardized normal distribution. Supposing that $p \sim U(0,1)$, it follows from the relationship $p = (z - a)/(b - a)$, where b > a are the upper and lower bounds for z, that
$$z = a + (b - a)p.$$
Hence, random draws for z are obtained by drawing p from U(0,1) via the function rand and computing z from the above relationship.
Similarly, with $x \sim N(0,1)$ and $x = (z - \mu)/\sigma$ it follows that random draws for $z \sim N(\mu,\sigma^2)$ can be obtained by drawing x from N(0,1) via the function randn and using the relationship $z = \mu + \sigma x$.
To obtain draws of z from $LTN(\mu,\sigma,c)$, YADA uses a very simple approach. First of all, x is drawn from $N(\mu,\sigma^2)$. All draws such that $x \geq c$ are given to z, while the draws $x < c$ are discarded.
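The uniform and normal transformations and the rejection step for the left truncated normal can be sketched in Python as follows. This is an illustration under the stated relationships, not YADA code; the function names are hypothetical, and Python's random.random and random.gauss stand in for Matlab's rand and randn.

```python
import random


def uniform_draws(a, b, n, rng=random):
    """Draws from U(a, b) via z = a + (b - a) p with p ~ U(0, 1)."""
    return [a + (b - a) * rng.random() for _ in range(n)]


def normal_draws(mu, sigma, n, rng=random):
    """Draws from N(mu, sigma^2) via z = mu + sigma x with x ~ N(0, 1)."""
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


def ltn_draws(mu, sigma, c, n, rng=random):
    """Draws from LTN(mu, sigma, c) by rejection: keep normal draws with x >= c."""
    out = []
    while len(out) < n:
        x = mu + sigma * rng.gauss(0.0, 1.0)
        if x >= c:  # draws below the truncation point are discarded
            out.append(x)
    return out


rng = random.Random(12345)  # fixed seed for reproducibility
z = ltn_draws(0.0, 1.0, 0.5, 1000, rng)
print(min(z), sum(z) / len(z))
```

The acceptance probability of the rejection step is $1 - \Phi((c - \mu)/\sigma)$, so this simple approach becomes slow when c lies far out in the right tail of the untruncated normal.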
If $z \sim G(a,b)$, then YADA checks if the Statistics Toolbox is available on the computer. If its existence is confirmed, then the function gamrnd is used. On the other hand, if this toolbox is missing, then YADA uses the function YADARndGammaUnitScale to obtain $x \sim G(a,1)$, while $z = bx \sim G(a,b)$ follows from this relationship. The function that draws from a gamma distribution with unit scale (b = 1) is based on the function rgamma from the Stixbox Toolbox by Anders Holtsberg.26 Since gamma distributed random variables can have a lower bound (c) different from zero, this parameter is added to the gamma draws.
Similarly, to obtain draws from a beta distribution, YADA first checks if the Statistics Toolbox is available. If the test provides a positive response, the function betarnd is used to generate $z \sim B(a,b)$. With a negative response to this test, YADA uses the random number generator for the gamma distribution for $y_1 \sim G(a,1)$ and $y_2 \sim G(b,1)$, and determines z from $z = y_1/(y_1 + y_2)$. For both cases, YADA makes use of the relationship $x = c + (d - c)z$, with d > c. It now follows that $x \sim B(a,b,c,d)$.
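The toolbox-free route can be sketched in Python, whose standard library provides random.gammavariate(alpha, beta) with shape alpha and scale beta. The function names below are hypothetical and the toolbox checks are omitted; Python also has random.betavariate, but the sketch deliberately goes through gamma draws to mirror the text.

```python
import random


def gamma_draws(a, b, c, n, rng=random):
    """Gamma draws with shape a, scale b and lower bound c: c + b*x with x ~ G(a, 1)."""
    return [c + b * rng.gammavariate(a, 1.0) for _ in range(n)]


def beta_draws(a, b, c, d, n, rng=random):
    """Draws from B(a, b, c, d) without a dedicated beta generator.

    z = y1/(y1 + y2) with y1 ~ G(a, 1) and y2 ~ G(b, 1) gives z ~ B(a, b);
    x = c + (d - c) z then has support [c, d].
    """
    out = []
    for _ in range(n):
        y1 = rng.gammavariate(a, 1.0)
        y2 = rng.gammavariate(b, 1.0)
        out.append(c + (d - c) * y1 / (y1 + y2))
    return out


rng = random.Random(7)
x = beta_draws(2.0, 3.0, 1.0, 3.0, 20000, rng)
print(sum(x) / len(x))  # should be near c + (d - c) a/(a + b) = 1.8
```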
Furthermore, to draw $z \sim IG(s,q)$ YADA again makes use of the gamma distribution. First of all, the pair $(a,b) = \bigl(q/2,\, 2/(qs^2)\bigr)$ is computed such that $IG(a,b)$ and $IG(s,q)$ describe the same distribution. Next, YADA uses the fact that z being $IG(a,b)$ is equivalent to $x = z^{-2} \sim G(a,b)$. Hence, x is drawn from the gamma distribution, while $z = 1/\sqrt{x}$.
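A minimal Python sketch of this route, assuming (as in the text) that G(a, b) has shape a and scale b, which matches the argument order of Python's random.gammavariate; inv_gamma_draws is a hypothetical name, not a YADA function:

```python
import math
import random


def inv_gamma_draws(s, q, n, rng=random):
    """Draws of sigma ~ IG(s, q) via x = sigma^(-2) ~ G(q/2, 2/(q s^2)).

    random.gammavariate takes (shape, scale), matching G(a, b) in the text.
    """
    a, b = q / 2.0, 2.0 / (q * s ** 2)
    return [1.0 / math.sqrt(rng.gammavariate(a, b)) for _ in range(n)]


rng = random.Random(2013)
sig = inv_gamma_draws(0.5, 4, 20000, rng)
# E[sigma^(-2)] = a*b = 1/s^2 = 4 for these parameter values
print(sum(1.0 / v ** 2 for v in sig) / len(sig))
```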
For the Cauchy and, more generally, the Student-t we make use of the result that a standardized Student-t random variable is the ratio between a standard normal random variable and $\sqrt{z/q}$, where $z \sim \chi^2_q$ is independent of the normal variable; see, e.g., Bauwens et al. (1999, p. 318). That is, we let $y_1 \sim N(0,1)$ and $y_2 \sim G(d/2, 2)$, i.e., a $\chi^2_d$ variable, and let $x = y_1\sqrt{d/y_2}$. This means that $x \sim t_d(0,1)$. Finally, to get the random draws $z \sim t_d(\mu,\sigma)$, we employ the relationship $z = \mu + \sigma x$. Again, random draws from the Cauchy distribution $C(\mu,\sigma)$ are given by setting d = 1 for the draws from the Student-t distribution; see, e.g., Gelman et al. (2004, Appendix A, p. 581).27
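The normal-over-chi-square construction can be sketched in Python as below. It is a hedged illustration with hypothetical function names; unlike the YADA routine described in footnote 27, it does not exclude extreme draws.

```python
import math
import random


def student_t_draws(mu, sigma, d, n, rng=random):
    """Draws from t_d(mu, sigma) as mu + sigma * y1 * sqrt(d / y2).

    y1 ~ N(0, 1) and y2 ~ G(d/2, 2), i.e. chi-square with d degrees of freedom.
    No draws are excluded, so very large values can occur for small d.
    """
    out = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rng.gammavariate(d / 2.0, 2.0)
        out.append(mu + sigma * y1 * math.sqrt(d / y2))
    return out


def cauchy_draws(mu, sigma, n, rng=random):
    """Cauchy C(mu, sigma) draws: the Student-t case with d = 1."""
    return student_t_draws(mu, sigma, 1, n, rng)


rng = random.Random(42)
t10 = student_t_draws(1.0, 2.0, 10, 40000, rng)
cau = sorted(cauchy_draws(1.0, 1.0, 10001, rng))
print(sum(t10) / len(t10), cau[len(cau) // 2])  # mean near 1, median near 1
```

For the Cauchy the sample median, rather than the sample mean, is the meaningful summary since the mean does not exist.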
To obtain random draws from the Type I generalized logistic distribution, YADA makes use of the cdf in equation (4.28). That is, we replace $F_{GL}(z|\mu,\sigma,c)$ with $p \sim U(0,1)$ and compute z by inverting the cdf. This provides us with
$$z = \mu - \sigma\ln\left(\frac{1}{p^{1/c}} - 1\right).$$
The same approach is used for the Gumbel and the Pareto distributions. By inverting (4.31) we obtain
$$z = \mu - \sigma\ln\bigl(-\ln(p)\bigr),$$
where $p \sim U(0,1)$. It now follows that $z \sim Gu(\mu,\sigma)$.
Similarly, by inverting equation (4.34), and taking the origin parameter into account, it is straightforward to show that
$$z = c + b\left(\frac{1}{1-p}\right)^{1/a}.$$
With $p \sim U(0,1)$ we find that $z \sim P(a,b,c)$.
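The three inverse-cdf samplers can be sketched in Python as follows. The function names are hypothetical. Two numerical choices belong to this sketch rather than to the text: p is drawn from the open interval (0, 1) so that the logarithms stay finite, and $p^{-1/c} - 1$ is computed as expm1(−ln(p)/c), which avoids cancellation when p is close to one.

```python
import math
import random


def open_unit(rng=random):
    """p ~ U(0, 1) on the open interval, so that logarithms below stay finite."""
    p = rng.random()
    while p == 0.0:
        p = rng.random()
    return p


def gen_logistic_draw(mu, sigma, c, rng=random):
    """Inverse cdf of (4.28); expm1(-ln(p)/c) = p**(-1/c) - 1, computed accurately."""
    p = open_unit(rng)
    return mu - sigma * math.log(math.expm1(-math.log(p) / c))


def gumbel_draw(mu, sigma, rng=random):
    """Inverse cdf of (4.31): z = mu - sigma * ln(-ln p)."""
    return mu - sigma * math.log(-math.log(open_unit(rng)))


def pareto_draw(a, b, c=0.0, rng=random):
    """Inverse cdf of (4.34) with origin c: z = c + b * (1 - p)**(-1/a)."""
    return c + b * (1.0 - open_unit(rng)) ** (-1.0 / a)


rng = random.Random(1)
print(gen_logistic_draw(0.0, 1.0, 0.5, rng), gumbel_draw(0.0, 1.0, rng),
      pareto_draw(3.0, 2.0 / 3.0, rng=rng))
```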
4.4. YADA Code
The density functions presented above are all written in natural logarithm form in YADA. The main reason for this is to keep the scale manageable. For example, the exponential function in Matlab, like that of any other software relying on double precision floating point arithmetic, cannot deal with large numbers. If one attempts to calculate $e^{700}$ one obtains exp(700) = 1.0142e+304, while exp(720) = Inf. Furthermore, and as discussed in Section 4.2.2, the gamma and beta functions return infinite values for large (and finite) input values, while the natural logarithms of these functions return finite values for the same large input values.
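The overflow problem and its log-space remedy can be illustrated in Python, assuming the shape-scale form of the gamma density in (4.7). The function names are hypothetical. Note that Python's math.exp raises OverflowError instead of returning Inf, so a small wrapper mimics Matlab's behaviour.

```python
import math


def exp_or_inf(x):
    """exp(x) that returns inf on overflow (Python's math.exp raises instead)."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def naive_gamma_pdf(z, a, b):
    """Gamma density with shape a and scale b evaluated on the original scale."""
    return z ** (a - 1.0) * math.exp(-z / b) / (math.gamma(a) * b ** a)


def log_gamma_pdf(z, a, b):
    """The same density in logs, using lgamma instead of gamma."""
    return (a - 1.0) * math.log(z) - z / b - math.lgamma(a) - a * math.log(b)


print(exp_or_inf(700.0), exp_or_inf(720.0))

try:
    naive = naive_gamma_pdf(200.0, 200.0, 1.0)
except OverflowError:
    naive = None  # 200**199 and Gamma(200) exceed the double precision range
print(naive, log_gamma_pdf(200.0, 200.0, 1.0))
```

The log density stays finite even though its building blocks on the original scale do not, and exponentiating it gives a perfectly ordinary density value.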
4.4.1. logGammaPDF
The function logGammaPDF calculates $\ln p_G(z|a,b)$ in (4.7). Required inputs are z, a, b, while
the output is lnG. Notice that all inputs can be vectors, but that the function does not check
that the dimensions match. This is instead handled internally in the files calling or setting up
the input vectors for logGammaPDF.
4.4.2. logInverseGammaPDF
The function logInverseGammaPDF calculates $\ln p_{IG}(\sigma|s,q)$ in (4.11). Required inputs are sigma,
s, q, while the output is lnIG. Notice that all inputs can be vectors, but that the function does
not check that the dimensions match. This is instead handled internally in the files calling or
setting up the input vectors for logInverseGammaPDF.
27
The Statistics Toolbox comes with the function trnd which returns draws from the standardized Student-t distribution $t_d(0,1)$. This function is not used by YADA, mainly because it does not seem to improve upon the routines provided by YADA itself. In fact, for small d many draws from the Student-t appear to be extremely large or small. To avoid such extreme draws, YADA excludes all draws that are outside the range $[\mu - 6\sigma, \mu + 6\sigma]$. While the choice of 6 times the scale factor is arbitrary, this seems to work well in practice. In particular, when estimating the density via a kernel density estimator using, e.g., an Epanechnikov or normal kernel, the resulting density seems to match a grid-based estimator of the Student-t density very well based on ocular inspection.
4.4.3. logBetaPDF
The function logBetaPDF calculates $\ln p_B(z|a,b,c,d)$ in (4.14). Required inputs are z, a, b, c,
and d, while the output is lnB. Notice that all inputs can be vectors, but that the function does
not check that the dimensions match. This is instead handled internally in the files calling or
setting up the input vectors for logBetaPDF.
4.4.4. logNormalPDF
The function logNormalPDF calculates $\ln p_N(z|\mu,\sigma)$ in (4.19). Required inputs are z, mu, sigma,
while the output is lnN. Notice that all inputs can be vectors, but that the function does not check
that the dimensions match. This is instead handled internally in the files calling or setting up
the input vectors for logNormalPDF.
4.4.5. logLTNormalPDF
The function logLTNormalPDF calculates $\ln p_{LTN}(z|\mu,\sigma,c)$ in (4.20). Required inputs are z, mu, sigma, c, while the output is lnLTN. This function calls the PhiFunction described below. Notice that the function does not check if $z \geq c$ holds. This is instead handled by the functions that call logLTNormalPDF.
4.4.6. logUniformPDF
The function logUniformPDF calculates $\ln p_U(z|a,b)$ in (4.22), i.e., the log height of the uniform density, $-\ln(b - a)$. The required inputs are a and b, where the former is the lower bound and the latter the upper bound of the uniformly distributed random vector z.
4.4.7. logStudentTAltPDF
The function logStudentTAltPDF calculates $\ln p_S(z|\mu,\sigma,d)$ in (4.23), i.e., the log height of the
Student-t density. The required input variables are z, mu, sigma, and df, while the output is lnS.
Notice that all inputs can be vectors, but that the function does not check that the dimensions
match. This is instead handled internally in the files calling or setting up the input vectors for
logStudentTAltPDF.
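As an illustration of evaluating (4.23) and (4.24) in logs, the following Python sketch uses math.lgamma in place of the gamma function and math.log1p for the kernel. The function names are hypothetical and the sketch handles scalars only, whereas the YADA functions work on vectors.

```python
import math


def log_student_t_pdf(z, mu, sigma, d):
    """ln p_S(z | mu, sigma, d) from (4.23), with lgamma in place of the gamma function."""
    u = (z - mu) / sigma
    return (math.lgamma((d + 1.0) / 2.0) - math.lgamma(d / 2.0)
            - math.log(sigma) - 0.5 * math.log(d * math.pi)
            - 0.5 * (d + 1.0) * math.log1p(u * u / d))


def log_cauchy_pdf(z, mu, sigma):
    """ln p_C(z | mu, sigma) from (4.24)."""
    return math.log(sigma) - math.log(math.pi) - math.log(sigma ** 2 + (z - mu) ** 2)


print(log_student_t_pdf(0.7, 0.2, 1.5, 1), log_cauchy_pdf(0.7, 0.2, 1.5))
```

With d = 1 the two functions agree, and for very large d the Student-t log density approaches the normal log density, as stated in Section 4.2.9.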
4.4.8. logCauchyPDF
The function logCauchyPDF calculates $\ln p_C(z|\mu,\sigma)$ in (4.24), i.e., the log height of the Cauchy
density. The required input variables are z, mu, and sigma, while the output is lnC. Notice that
all inputs can be vectors, but that the function does not check that the dimensions match. This
is instead handled internally in the files calling or setting up the input vectors for logCauchyPDF.
4.4.9. logLogisticPDF
The function logLogisticPDF calculates $\ln p_{GL}(z|\mu,\sigma,c)$ in (4.27), i.e., the log height of the
(Type I generalized) logistic density. The required input variables are z, mu, sigma and c, while
the output is lnL. Notice that all inputs can be vectors, but that the function does not check that
the dimensions match. This is instead handled internally in the files calling or setting up the
input vectors for logLogisticPDF.
4.4.10. logGumbelPDF
The function logGumbelPDF calculates $\ln p_{Gu}(z|\mu,\sigma)$ in (4.30), i.e., the log height of the Gumbel density. The required input variables are z, mu, and sigma, while the output is lnGu. Notice that all input variables can be vectors, but that the function does not check that the dimensions
match. This is instead handled internally in the files calling or setting up the input vectors for
logGumbelPDF.
4.4.11. logParetoPDF
The function logParetoPDF calculates $\ln p_P(z|a,b)$ in (4.33), i.e., the log height of the Pareto
density. The required input variables are z, a, and b, while the output is lnP. Notice that all
inputs can be vectors, but that the function does not check that the dimensions match. This is
instead handled internally in the files calling or setting up the input vectors for logParetoPDF.
4.4.12. PhiFunction
The function PhiFunction evaluates the expression for $\Phi(a)$ in (4.21). The required input is
the vector a, while the output is PhiValue, a vector with real numbers between 0 and 1.
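A minimal Python sketch of the standard normal cdf and of the left truncated normal log density follows. It assumes that (4.20) is the normal density rescaled by $1 - \Phi((c - \mu)/\sigma)$, and it computes $\Phi$ through math.erfc rather than the branch-wise error function formula, a substitution made here to avoid cancellation for large negative arguments. The function names are hypothetical.

```python
import math


def phi_function(a):
    """Standard normal cdf Phi(a), written with erfc to avoid cancellation for a << 0."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))


def log_ltn_pdf(z, mu, sigma, c):
    """ln p_LTN(z | mu, sigma, c) for z >= c (not checked here either).

    Assumes (4.20) is the normal density rescaled by 1 - Phi((c - mu)/sigma),
    which equals Phi((mu - c)/sigma) by symmetry.
    """
    u = (z - mu) / sigma
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * u * u
            - math.log(phi_function((mu - c) / sigma)))


# Trapezoidal check that the truncated density integrates to one over [c, c + 10].
c, n = 0.5, 10000
grid = [c + 10.0 * i / n for i in range(n + 1)]
vals = [math.exp(log_ltn_pdf(x, 0.0, 1.0, c)) for x in grid]
total = (10.0 / n) * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
print(phi_function(0.0), phi_function(1.96), total)
```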
4.4.13. GammaRndFcn
The function GammaRndFcn computes random draws from a gamma distribution. The function
takes two required input variables, a and b. These contain the shape and the scale parameters
from the gamma distribution with lower bound 0 and both are treated as vectors; see equation
(4.7). Notice that the function does not check that the dimensions of a and b match. An
optional input variable for total number of draws, NumDraws, is also accepted. The default value
for this integer is 1. The algorithms used by the function are discussed above in Section 4.3.
The function provides one output variable, z, a matrix with row dimension equal to the length
of a and column dimension equal to NumDraws.
4.4.14. InvGammaRndFcn
The function InvGammaRndFcn computes random draws from an inverted gamma distribution.
The function takes two required input variables, s and q. These contain the location and degrees
of freedom parameters from the inverted gamma distribution with lower bound 0 and both
are treated as vectors; see equation (4.11). Notice that the function does not check that the
dimensions of s and q match. An optional input variable for total number of draws, NumDraws,
is also accepted. The default value for this integer is 1. The algorithms used by the function are
discussed above in Section 4.3.
The function provides one output variable, sigma, a matrix with row dimension equal to the
length of s and column dimension equal to NumDraws.
4.4.15. BetaRndFcn
The function BetaRndFcn computes random draws from a beta distribution. The function takes
4 required input variables, a, b, c and d. These contain the shape parameters as well as the
lower and upper bound from the beta and are all treated as vectors; see equation (4.14). The
function does not check if the dimensions of these variables match; this is instead handled by
the function that calls it. An optional input variable for total number of draws, NumDraws, is
also accepted. The default value for this integer is 1. The algorithms used by the function are
discussed above in Section 4.3.
The function provides one output variable, x, a matrix with row dimension equal to the length
of a and column dimension equal to NumDraws.
4.4.16. NormalRndFcn
The function NormalRndFcn computes random draws from a normal distribution. The function takes two required input variables, mu and sigma. These contain the mean (location) and
standard deviation (scale) parameters; see equation (4.19). The function does not check if the
dimensions of these variables match; this is instead handled by the function that calls it. An
optional input variable for total number of draws, NumDraws, is also accepted. The default value
for this integer is 1. The algorithm used by the function is discussed above in Section 4.3.
The function provides one output variable, z, a matrix with row dimension equal to the length
of mu and column dimension equal to NumDraws.
4.4.17. LTNormalRndFcn
The function LTNormalRndFcn computes random draws from a left truncated normal distribution. The function takes 3 required input variables, mu, sigma and c. These contain the location,
scale and lower bound (left truncation) parameters; see equation (4.20). The function does
not check if the dimensions of these variables match; this is instead handled by the function
that calls it. An optional input variable for total number of draws, NumDraws, is also accepted.
The default value for this integer is 1. The algorithm used by the function is discussed above in
Section 4.3.
The function provides one output variable, z, a matrix with row dimension equal to the length
of mu and column dimension equal to NumDraws.
4.4.18. UniformRndFcn
The function UniformRndFcn computes random draws from a uniform distribution. The function takes two required input variables, a and b. These contain the lower and upper bound
parameters; see equation (4.22). The function does not check if the dimensions of these variables match; this is instead handled by the function that calls it. An optional input variable for
total number of draws, NumDraws, is also accepted. The default value for this integer is 1. The
algorithm used by the function is discussed above in Section 4.3.
The function provides one output variable, z, a matrix with row dimension equal to the length
of a and column dimension equal to NumDraws.
4.4.19. StudentTAltRndFcn
The function StudentTAltRndFcn computes random draws from a Student-t distribution. The
function takes 3 required input variables, mu, sigma and d. These contain the location, scale
and degrees of freedom parameters; see equation (4.23). The function does not check if the
dimensions of these variables match; this is instead handled by the function that calls it. An
optional input variable for total number of draws, NumDraws, is also accepted. The default value
for this integer is 1. The algorithm used by the function is discussed above in Section 4.3.
The function provides one output variable, z, a matrix with row dimension equal to the length
of mu and column dimension equal to NumDraws.
4.4.20. CauchyRndFcn
The function CauchyRndFcn computes random draws from a Cauchy distribution. The function
takes two required input variables, mu and sigma. These contain the location and scale parameters; see equation (4.24). The function does not check if the dimensions of these variables
match; this is instead handled by the function that calls it. An optional input variable for total number of draws, NumDraws, is also accepted. The default value for this integer is 1. The
algorithm used by the function is discussed above in Section 4.3.
The function provides one output variable, z, a matrix with row dimension equal to the length
of mu and column dimension equal to NumDraws.
4.4.21. LogisticRndFcn
The function LogisticRndFcn computes random draws from a Type I generalized logistic distribution. The function takes 3 required input variables, mu, sigma and c. These contain the
location, scale and shape parameters; see equation (4.27). The function does not check if the
dimensions of these variables match; this is instead handled by the function that calls it. An
optional input variable for total number of draws, NumDraws, is also accepted. The default value
for this integer is 1. The algorithm used by the function is discussed above in Section 4.3.
The function provides one output variable, z, a matrix with row dimension equal to the length
of mu and column dimension equal to NumDraws.
4.4.22. GumbelRndFcn
The function GumbelRndFcn computes random draws from a Gumbel distribution. The function
takes two required input variables, mu and sigma. These contain the location and scale parameters; see equation (4.30). The function does not check if the dimensions of these variables
match; this is instead handled by the function that calls it. An optional input variable for total number of draws, NumDraws, is also accepted. The default value for this integer is 1. The
algorithm used by the function is discussed above in Section 4.3.
The function provides one output variable, z, a matrix with row dimension equal to the length
of mu and column dimension equal to NumDraws.
4.4.23. ParetoRndFcn
The function ParetoRndFcn computes random draws from a Pareto distribution. The function
takes two required input variables, a and b. These contain the shape and location parameters;
see equation (4.33). The function does not check if the dimensions of these variables match;
this is instead handled by the function that calls it. An optional input variable for total number
of draws, NumDraws, is also accepted. The default value for this integer is 1. The algorithm used
by the function is discussed above in Section 4.3. Since the function is based on a zero value for the origin parameter, the origin parameter can be added to the draws afterwards as needed.
The function provides one output variable, z, a matrix with row dimension equal to the length
of a and column dimension equal to NumDraws.
$$y_t = A'x_t + H'\xi_t + w_t. \qquad (5.1)$$
The vector $x_t$ is k-dimensional and contains only deterministic variables. The vector $\xi_t$ is r-dimensional and is known as the state vector and contains possibly unobserved variables. The term $w_t$ is white noise and is called the measurement error.
The state (or transition) equation determines the dynamic development of the state variables. It is here given by:
$$\xi_t = F\xi_{t-1} + v_t, \qquad (5.2)$$
where F is the state transition matrix. The term $v_t$ is white noise and it is assumed that $v_t$ and $w_\tau$ are uncorrelated for all t and $\tau$, with
$$E\left[v_t v_\tau'\right] = \begin{cases} Q & \text{for } t = \tau, \\ 0 & \text{otherwise,} \end{cases}$$
while
$$E\left[w_t w_\tau'\right] = \begin{cases} R & \text{for } t = \tau, \\ 0 & \text{otherwise.} \end{cases}$$
The parameter matrices are given by A (k × n), H (r × n), F (r × r), Q (r × r), and R (n × n). These matrices are known once we provide a value for $\theta$. To initialize the process described by (5.1) and (5.2), it is assumed that $\xi_1$ is uncorrelated with any realizations of $v_t$ or $w_t$.
5.2. The Kalman Filter Recursion
Let $\mathcal{Y}_t = \{y_t, y_{t-1}, \ldots, y_1, x_t, x_{t-1}, \ldots, x_1\}$ denote the set of observations up to and including period t. The Kalman filter provides a method for computing optimal 1-step ahead forecasts of $y_t$ conditional on its past values and on the vector $x_t$, as well as the associated forecast error covariance matrix.28 These forecasts and their mean squared errors can then be used to compute the value of the log-likelihood function for y. Given the state-space representation in (5.1) and (5.2), it can directly be seen that the calculation of such forecasts requires forecasts of the state vector $\xi_t$ conditional on the observed variables.
Let $\xi_{t+1|t}$ denote the linear projection of $\xi_{t+1}$ on $\mathcal{Y}_t$. The Kalman filter calculates these forecasts recursively, generating $\xi_{1|0}$, $\xi_{2|1}$, and so on. Associated with each of these forecasts is a mean squared error matrix, represented by
$$P_{t+1|t} = E\left[\left(\xi_{t+1} - \xi_{t+1|t}\right)\left(\xi_{t+1} - \xi_{t+1|t}\right)'\right].$$
Similarly, let $y_{t|t-1}$ be the linear projection of $y_t$ on $\mathcal{Y}_{t-1}$ and $x_t$. From the measurement equation (5.1) and the assumption about $w_t$ we have that:
$$y_{t|t-1} = A'x_t + H'\xi_{t|t-1}. \qquad (5.3)$$
The forecast error for the observed variables can therefore be expressed as
$$y_t - y_{t|t-1} = H'\left(\xi_t - \xi_{t|t-1}\right) + w_t. \qquad (5.4)$$
It follows that the 1-step ahead forecast error covariance matrix for the observed variables is given by
$$E\left[\left(y_t - y_{t|t-1}\right)\left(y_t - y_{t|t-1}\right)'\right] \equiv \Sigma_{y,t|t-1} = H'P_{t|t-1}H + R. \qquad (5.5)$$
To compute the forecasts and forecast error covariance matrices for the observed variables ($y_t$) we therefore need to know the sequence of forecasts and forecast error covariance matrices of the state variables ($\xi_t$).
Projecting the state variables in period t + 1 on $\mathcal{Y}_t$ we find from equation (5.2) that
$$\xi_{t+1|t} = F\xi_{t|t}, \qquad (5.6)$$
where $\xi_{t|t}$ is called the updated or filtered value of $\xi_t$. From standard results on linear projections (see, e.g., Hamilton, 1994, Chapter 4.5), the updated value of $\xi_t$ relative to its forecasted value is given by:
$$\xi_{t|t} = \xi_{t|t-1} + P_{t|t-1}H\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right). \qquad (5.7)$$
Substituting (5.7) into (5.6) gives us:
$$\xi_{t+1|t} = F\xi_{t|t-1} + K_t\left(y_t - y_{t|t-1}\right), \qquad (5.8)$$
where
$$K_t = FP_{t|t-1}H\Sigma_{y,t|t-1}^{-1}. \qquad (5.9)$$
28
The forecasts are optimal in a mean squared error sense among any functions of $(x_t, \mathcal{Y}_{t-1})$.
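$$r_{t|t} = H\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right), \qquad (5.11)$$
$$\Sigma_\xi = E\left[\xi_t\xi_t'\right]$$
exists. From the state equation (5.2) we find that:
$$\Sigma_\xi = F\Sigma_\xi F' + Q. \qquad (5.15)$$
Applying the vec operator, this can be solved as
$$\operatorname{vec}(\Sigma_\xi) = \left[I_{r^2} - F \otimes F\right]^{-1}\operatorname{vec}(Q), \qquad (5.16)$$
where vec is the column stacking operator, and $\otimes$ the Kronecker product. One candidate for $P_{1|0}$ is therefore $\Sigma_\xi$.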
However, if $r$ is large then the calculation of $\Sigma_\xi$ in (5.16) may be too cumbersome, especially if this has to be performed frequently (e.g., during the stage of drawing from the posterior). In such cases, it may be better to make use of the doubling algorithm. Equation (5.15) is a Lyapunov equation, i.e., a special case of the Sylvester equation. Letting $\gamma_0 = Q$ and $\alpha_0 = F$ we can express the iterations
$$\gamma_k = \gamma_{k-1} + \alpha_{k-1}\gamma_{k-1}\alpha_{k-1}', \quad k = 1, 2, \ldots \tag{5.17}$$
where
$$\alpha_k = \alpha_{k-1}\alpha_{k-1}.$$
The specification in (5.17) is equivalent to expressing:
$$\gamma_k = \sum_{j=0}^{2^k - 1}F^jQ\left(F'\right)^j.$$
From this relation we can see that $\lim_{k\to\infty}\gamma_k = \Sigma_\xi$. Moreover, each iteration doubles the number of terms in the sum and we expect the algorithm to converge quickly since $\|\alpha_k\|$ should be close to zero also for relatively small $k$ when all the eigenvalues of $F$ lie inside the unit circle.
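A short sketch of the doubling iterations (5.17) in NumPy follows. It stops when the largest singular value of the latest increment $\gamma_k - \gamma_{k-1}$ falls below a tolerance; the function name, the tolerance, and the iteration cap are illustrative choices, not YADA defaults.

```python
import numpy as np

def lyapunov_doubling(F, Q, tol=1e-12, max_iter=60):
    """Approximate Sigma_xi = F Sigma_xi F' + Q with the doubling algorithm (5.17)."""
    gamma, alpha = Q.astype(float), F.astype(float)   # gamma_0 = Q, alpha_0 = F
    for _ in range(max_iter):
        increment = alpha @ gamma @ alpha.T           # gamma_k - gamma_{k-1}
        gamma = gamma + increment
        alpha = alpha @ alpha                         # alpha_k = alpha_{k-1} alpha_{k-1}
        if np.linalg.norm(increment, 2) < tol:        # largest singular value
            break
    return gamma
```

Each pass squares $\alpha$, so after $k$ passes the sum covers $F^0, \ldots, F^{2^k - 1}$; for a stable $F$ only a handful of passes are needed.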
Alternatively, it may be better to let $P_{1|0} = cI_r$ for some constant $c$. The larger $c$ is, the less informative the initialization is for the filter. YADA allows for both alternatives to using equation (5.16) for initializing $P_{1|0}$. In addition, the exact treatment of diffuse initialization when $c \to \infty$ is discussed in Section 5.14.
5.4. The Likelihood Function
One important reason for running the Kalman filter for DSGE models is that as a by-product it provides us with the value of the log-likelihood function when the state shocks and the measurement errors have known distributions. The log-likelihood function for $y_T, y_{T-1}, \ldots, y_1$ can be expressed as:
$$\ln L\left(y_T, y_{T-1}, \ldots, y_1 | x_T, \ldots, x_1; \theta\right) = \sum_{t=1}^{T}\ln p\left(y_t | x_t, \mathcal{Y}_{t-1}; \theta\right), \tag{5.18}$$
by the usual factorization and assumption regarding $x_t$. To compute the right hand side for a given $\theta$, we need to make some distributional assumptions regarding $\xi_1$, $v_t$, and $w_t$.
In YADA it is assumed that $\xi_1$, $v_t$, and $w_t$ are multivariate Gaussian with $Q$ and $R$ being positive semidefinite. This means that:
$$\ln p\left(y_t | x_t, \mathcal{Y}_{t-1}; \theta\right) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln\left|\Sigma_{y,t|t-1}\right| - \frac{1}{2}\left(y_t - y_{t|t-1}\right)'\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right). \tag{5.19}$$
The value of the sample log-likelihood can thus be calculated directly from the 1-step ahead forecast errors of $y_t$ and the associated forecast error covariance matrix.
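The prediction error decomposition in (5.18)-(5.19) can be sketched as follows (hypothetical function name; `Z` is assumed to hold the rows $y_t - A'x_t$). With a stationary model initialized at $\xi_{1|0} = 0$ and $P_{1|0} = \Sigma_\xi$, the result should coincide with the exact Gaussian log density of the stacked observations, which is what the check below relies on.

```python
import numpy as np

def kalman_loglik(Z, F, H, Q, R, xi0, P0):
    """Sum over t of (5.19); Z is T x n with rows y_t - A'x_t."""
    xi, P = xi0.astype(float), P0.astype(float)
    n = Z.shape[1]
    loglik = 0.0
    for z in Z:
        Sigma_y = H.T @ P @ H + R                     # (5.5)
        e = z - H.T @ xi                              # (5.4)
        _, logdet = np.linalg.slogdet(Sigma_y)        # log |Sigma_y| without overflow
        loglik -= 0.5 * (n * np.log(2.0 * np.pi) + logdet + e @ np.linalg.solve(Sigma_y, e))
        K = np.linalg.solve(Sigma_y, H.T @ P @ F.T).T # (5.9)
        xi = F @ xi + K @ e                           # (5.8)
        P = F @ P @ F.T + Q - K @ Sigma_y @ K.T
    return loglik
```

Only $\Sigma_{y,t|t-1}$ enters through a solve and a log-determinant; no $r \times r$ matrix is ever inverted.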
5.5. Smoothed Projections of the State Variables
Since many of the elements of the state vector are given structural interpretations in the DSGE framework, it is important to use as much information as possible to project this vector. That is, we are concerned with the smooth projections $\xi_{t|T}$. In equation (5.7) we already have such a projection for $t = T$. It thus remains to determine the backward looking projections for $t < T$.
Hamilton (1994, Chapter 13.6) shows that the smooth projections of the state vector are given by:
$$\xi_{t|T} = \xi_{t|t} + J_t\left(\xi_{t+1|T} - \xi_{t+1|t}\right), \quad t = 1, 2, \ldots, T-1, \tag{5.20}$$
where the Kalman smoothing matrix $J_t$ is given by
$$J_t = P_{t|t}F'P_{t+1|t}^{-1}.$$
The mean squared error matrix of the smooth projection $\xi_{t|T}$ is now:
$$P_{t|T} = P_{t|t} + J_t\left(P_{t+1|T} - P_{t+1|t}\right)J_t'. \tag{5.21}$$
To calculate $\xi_{t|T}$ and $P_{t|T}$ one therefore starts in period $t = T-1$ and then iterates backwards until $t = 1$. Derivations of these equations can be found in, e.g., Ansley and Kohn (1982).
The smoothing expression in (5.20) requires that $P_{t+1|t}$ is a full rank matrix. This assumption is violated if, for instance, $R = 0$ (no measurement errors) and $\operatorname{rank}(Q) < r$ with $P_{1|0} = \Sigma_\xi$. In this case we may use an eigenvalue decomposition such that $P_{t+1|t} = S\Lambda S'$, where $S$ is $r \times s$, $r > s$, $\Lambda$ is a diagonal full rank $s \times s$ matrix, and $S'S = I_s$. Replacing $J_t$ with
$$J_t = P_{t|t}F'S\left(S'P_{t+1|t}S\right)^{-1}S',$$
the smoothing algorithm remains otherwise intact; see, e.g., Kohn and Ansley (1983).$^{29}$
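A compact sketch of the fixed-interval smoother (5.20)-(5.21) is given below, assuming $P_{t+1|t}$ has full rank (the generalized-inverse variant above is not covered). It is a hypothetical NumPy helper, not YADA's routine; row `t` of each output array corresponds to period $t+1$, and `Z` holds $y_t - A'x_t$.

```python
import numpy as np

def rts_smoother(Z, F, H, Q, R, xi0, P0):
    """Forward filter, then the backward recursions (5.20)-(5.21)."""
    T, r = Z.shape[0], F.shape[0]
    xi_f, P_f = np.zeros((T, r)), np.zeros((T, r, r))   # xi_{t|t}, P_{t|t}
    xi_p, P_p = np.zeros((T, r)), np.zeros((T, r, r))   # xi_{t|t-1}, P_{t|t-1}
    xi, P = xi0.astype(float), P0.astype(float)
    for t in range(T):
        xi_p[t], P_p[t] = xi, P
        Sigma_y = H.T @ P @ H + R
        G = np.linalg.solve(Sigma_y, H.T @ P).T          # P_{t|t-1} H Sigma^{-1}
        xi_f[t] = xi + G @ (Z[t] - H.T @ xi)             # update, (5.7)
        P_f[t] = P - G @ H.T @ P
        xi, P = F @ xi_f[t], F @ P_f[t] @ F.T + Q        # (5.6) and P_{t+1|t}
    xi_s, P_s = xi_f.copy(), P_f.copy()                  # start from xi_{T|T}, P_{T|T}
    for t in range(T - 2, -1, -1):
        J = np.linalg.solve(P_p[t + 1], F @ P_f[t]).T    # J_t = P_{t|t} F' P_{t+1|t}^{-1}
        xi_s[t] = xi_f[t] + J @ (xi_s[t + 1] - xi_p[t + 1])          # (5.20)
        P_s[t] = P_f[t] + J @ (P_s[t + 1] - P_p[t + 1]) @ J.T         # (5.21)
    return xi_s, P_s
```

For a small Gaussian model the output can be checked against the conditional mean and variance computed directly from the joint covariance of states and observations.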
There is an alternative algorithm for computing the smoother which does not require an explicit expression for the inverse of the state forecast covariance matrix $P_{t+1|t}$; see, e.g., De Jong (1988, 1989), Kohn and Ansley (1989), or Koopman and Harvey (2003). The smoothed state vector can be rewritten as
$$\xi_{t|T} = \xi_{t|t-1} + P_{t|t-1}r_{t|T}, \quad t = 1, 2, \ldots, T, \tag{5.22}$$
$$r_{t|T} = H\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right) + \left(F - K_tH'\right)'r_{t+1|T}, \tag{5.23}$$
where $r_{T+1|T} = 0$.$^{30}$ The only matrix that has to be inverted is $\Sigma_{y,t|t-1}$, the forecast error covariance matrix of the observed variables, i.e., the same matrix that needs to be invertible for the Kalman filter to be valid.$^{31}$ Furthermore, notice that $P_{t|t-1}$ times the first term of the expression for $r_{t|T}$ gives the update part for $\xi_t$ relative to the forecast part, while the same matrix times the second term generates the forward looking part. Hence, the alternative expression for the smoother has an intuitive interpretation, where the smoother is directly linked to the information in the data at $t-1$, $t$, and thereafter from $t+1$ until $T$.
29
To show that $P^+ = S\Lambda^{-1}S'$ is a generalized inverse of $P = S\Lambda S'$ we need to establish that $PP^+P = P$; see, e.g., Magnus and Neudecker (1988, p. 38). This follows directly from $S'S = I_s$, while $\Lambda^{-1} = (S'PS)^{-1}$ means that the generalized inverse may also be written as in the expression for $J_t$.
30
These expressions can be derived by first substituting for $\xi_{t|t}$ in (5.20) using equation (5.7), replacing $J_t$ with its definition, and substituting for $P_{t|t}$ from (5.10). Next, we take the definition of the Kalman gain matrix $K_t$ into account and rearrange terms. This gives us (5.22), with
$$r_{t|T} = H\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right) + \left(F - K_tH'\right)'P_{t+1|t}^{-1}\left(\xi_{t+1|T} - \xi_{t+1|t}\right), \quad t = 1, 2, \ldots, T-1. \tag{5.23'}$$
The difference between the smoothed value and the forecast value of $\xi_T$ in period $t = T-1$ is obtained from (5.7), yielding
$$\xi_{T|T} - \xi_{T|T-1} = P_{T|T-1}H\Sigma_{y,T|T-1}^{-1}\left(y_T - y_{T|T-1}\right).$$
Substituting this into (5.23') for $t = T-1$ gives us
$$\begin{aligned} r_{T-1|T} &= H\Sigma_{y,T-1|T-2}^{-1}\left(y_{T-1} - y_{T-1|T-2}\right) + \left(F - K_{T-1}H'\right)'H\Sigma_{y,T|T-1}^{-1}\left(y_T - y_{T|T-1}\right)\\ &= H\Sigma_{y,T-1|T-2}^{-1}\left(y_{T-1} - y_{T-1|T-2}\right) + \left(F - K_{T-1}H'\right)'r_{T|T}, \end{aligned}$$
where $r_{T|T} = H\Sigma_{y,T|T-1}^{-1}(y_T - y_{T|T-1})$. This implies that (5.23) holds for $t = T-1$ and, in addition, for $t = T$ provided that $r_{T+1|T} = 0$. Notice also that if we substitute the definition of $r_{t|T}$ into equation (5.22) for $t = T$ it gives us the identical expression to what we have in (5.7).
Returning to (5.22) we now have that $\xi_{T-1|T} = \xi_{T-1|T-2} + P_{T-1|T-2}r_{T-1|T}$ so that $P_{T-1|T-2}^{-1}(\xi_{T-1|T} - \xi_{T-1|T-2}) = r_{T-1|T}$. From (5.23') we then find that (5.23) holds for $T-2$ as well. Continuing backwards recursively it can be established that (5.23) holds for $t = 1, 2, \ldots, T-1$.
31
When the Kalman filter and smoother are used to estimate unobserved variables of the model, then the inverse of $\Sigma_{y,t|t-1}$ may be replaced with a generalized inverse; see, e.g., Harvey (1989, p. 106). However, a singular covariance matrix is not compatible with the existence of a density for $y_t|\mathcal{Y}_{t-1}, \theta$. Hence, the log-likelihood function cannot be calculated for models where $\Sigma_{y,t|t-1}$ does not have full rank.
The mean squared error matrix for the smoothed projections can likewise be expressed as
$$P_{t|T} = P_{t|t-1} - P_{t|t-1}N_{t|T}P_{t|t-1}, \quad t = 1, 2, \ldots, T, \tag{5.24}$$
where
$$N_{t|T} = H\Sigma_{y,t|t-1}^{-1}H' + \left(F - K_tH'\right)'N_{t+1|T}\left(F - K_tH'\right), \tag{5.25}$$
and $N_{T+1|T} = 0$. It can again be seen that the only matrix that has to be inverted is the forecast error covariance matrix of the observed variables. Notice also that the first term in the expression for $N_{t|T}$ is related to the update part of the covariance matrix for the state variables, while the second term is linked to the forward looking part. Moreover, these terms clarify why $P_{t|T} \leq P_{t|t}$ in a matrix sense.
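The smoother in (5.22)-(5.25) can be sketched as a forward filtering pass that stores the forecast errors, the inverted $\Sigma_{y,t|t-1}$, and $F - K_tH'$, followed by a backward pass over $r_{t|T}$ and $N_{t|T}$. This is a hypothetical NumPy helper (`Z` holds $y_t - A'x_t$); on a small model it should reproduce the same smoothed means and variances as the $J_t$-based recursions.

```python
import numpy as np

def r_smoother(Z, F, H, Q, R, xi0, P0):
    """Smoother (5.22)-(5.25): only Sigma_{y,t|t-1} is ever inverted."""
    T, r = Z.shape[0], F.shape[0]
    xi_p, P_p, E, S_inv, L = [], [], [], [], []
    xi, P = xi0.astype(float), P0.astype(float)
    for z in Z:                                            # forward pass
        Sigma_y = H.T @ P @ H + R
        Si = np.linalg.inv(Sigma_y)
        K = F @ P @ H @ Si                                 # (5.9)
        e = z - H.T @ xi
        xi_p.append(xi); P_p.append(P); E.append(e); S_inv.append(Si)
        L.append(F - K @ H.T)                              # F - K_t H'
        xi = F @ xi + K @ e                                # (5.8)
        P = F @ P @ F.T + Q - K @ Sigma_y @ K.T
    r_next, N_next = np.zeros(r), np.zeros((r, r))         # r_{T+1|T} = 0, N_{T+1|T} = 0
    xi_s, P_s = np.zeros((T, r)), np.zeros((T, r, r))
    for t in range(T - 1, -1, -1):                         # backward pass
        r_t = H @ S_inv[t] @ E[t] + L[t].T @ r_next        # (5.23)
        N_t = H @ S_inv[t] @ H.T + L[t].T @ N_next @ L[t]  # (5.25)
        xi_s[t] = xi_p[t] + P_p[t] @ r_t                   # (5.22)
        P_s[t] = P_p[t] - P_p[t] @ N_t @ P_p[t]            # (5.24)
        r_next, N_next = r_t, N_t
    return xi_s, P_s
```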
5.6. Smoothed and Updated Projections of State Shocks and Measurement Errors
Smoothing allows us to estimate the shocks to the state equations using as much information as possible. In particular, by projecting both sides of equation (5.2) on the data until period $T$ we find that
$$v_{t|T} = \xi_{t|T} - F\xi_{t-1|T}, \quad t = 2, \ldots, T. \tag{5.26}$$
To estimate $v_1$ using the full sample, it would seem from this equation that we require an estimate of $\xi_{0|T}$. However, there is a simple way around this issue.
For $\xi_{t-1|T}$ we premultiply both sides of (5.20) with $F$, substitute (5.6) for $F\xi_{t-1|t-1}$, and use (5.22) for $\xi_{t|T} - \xi_{t|t-1}$. Next, take the definition of $J_{t-1}$ into account and use (5.13) for $FP_{t-1|t-1}F'$. These manipulations give us
$$F\xi_{t-1|T} = \xi_{t|t-1} + \left(P_{t|t-1} - Q\right)r_{t|T}.$$
At the same time equation (5.22) gives us an expression for $\xi_{t|T}$. Using (5.26) it follows that
$$v_{t|T} = Qr_{t|T}, \quad t = 2, \ldots, T. \tag{5.27}$$
The variable $r_{t|T}$ in (5.22) is therefore seen to have the natural interpretation that when premultiplied by the covariance matrix of the state shock ($Q$), we obtain the smoothed projection of the state shock. In other words, $r_{t|T}$ in (5.22) is an innovation with mean zero and covariance matrix $N_{t|T}$.
Equation (5.27) provides us with a simple means for estimating $v_{t|T}$ for $t = 1$ since $r_{t|T}$ is defined for that period. In other words, equation (5.27) can be applied to compute the smooth projection of the state shocks not only for periods $t = 2, \ldots, T$ but also for period $t = 1$, i.e.,
$$v_{1|T} = Qr_{1|T}.$$
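The following sketch computes $v_{t|T} = Qr_{t|T}$ for every $t$, including $t = 1$, alongside $\xi_{t|T}$ from (5.22). The helper name and the two-state example in the check are hypothetical; the check simply confirms numerically that the shortcut (5.27) agrees with the definition (5.26) for $t = 2, \ldots, T$.

```python
import numpy as np

def smoothed_state_shocks(Z, F, H, Q, R, xi0, P0):
    """Return xi_{t|T} from (5.22) and v_{t|T} = Q r_{t|T} from (5.27), t = 1..T."""
    T, r = Z.shape[0], F.shape[0]
    store, xi, P = [], xi0.astype(float), P0.astype(float)
    for z in Z:                                  # forward filtering pass
        Sigma_y = H.T @ P @ H + R
        Si = np.linalg.inv(Sigma_y)
        K = F @ P @ H @ Si
        e = z - H.T @ xi
        store.append((xi, P, e, Si, F - K @ H.T))
        xi = F @ xi + K @ e
        P = F @ P @ F.T + Q - K @ Sigma_y @ K.T
    xi_s, v_s, r_next = np.zeros((T, r)), np.zeros((T, r)), np.zeros(r)
    for t in range(T - 1, -1, -1):               # backward pass
        xi_p, P_p, e, Si, L = store[t]
        r_t = H @ Si @ e + L.T @ r_next          # (5.23)
        xi_s[t] = xi_p + P_p @ r_t               # (5.22)
        v_s[t] = Q @ r_t                         # (5.27), also used for t = 1
        r_next = r_t
    return xi_s, v_s
```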
To calculate update projections of the state shocks, denoted by $v_{t|t}$, we need the smooth projection $\xi_{t-1|t}$. Since equation (5.20) holds for all $T > t$ we can replace $T$ with $t+1$. Using the definition of $J_t$ and equation (5.7) for $\xi_{t+1|t+1} - \xi_{t+1|t}$ we obtain
$$\xi_{t|t+1} = \xi_{t|t} + P_{t|t}F'H\Sigma_{y,t+1|t}^{-1}\left(y_{t+1} - y_{t+1|t}\right), \quad t = 1, \ldots, T-1. \tag{5.28}$$
Premultiplying $\xi_{t-1|t}$ with $F$, replacing $FP_{t-1|t-1}F'$ with $P_{t|t-1} - Q$ (see equation 5.13), and taking equation (5.6) into account we find that
$$F\xi_{t-1|t} = \xi_{t|t-1} + \left(P_{t|t-1} - Q\right)H\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right).$$
It now follows from (5.7) and (5.11) that
$$v_{t|t} = QH\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right) = Qr_{t|t}, \quad t = 2, \ldots, T. \tag{5.29}$$
Hence, the state shock projected on observations up to period $t$ is a linear combination of the forecast error for the observed variables. Notice also that $v_{t|t}$ is equal to $Q$ times the first term on the right hand side of (5.23) and therefore has a natural connection with the smoothed projection of the state shock.
Moreover, a simple estimator of $v_{1|1}$ is suggested by equation (5.29). Specifically,
$$v_{1|1} = QH\left(H'P_{1|0}H + R\right)^{-1}\left(y_1 - A'x_1 - H'\xi_{1|0}\right) = Qr_{1|1}.$$
Implicitly, this assumes that $F\xi_{0|1}$ is equal to
$$F\xi_{0|1} = \xi_{1|0} + \left(P_{1|0} - Q\right)H\left(H'P_{1|0}H + R\right)^{-1}\left(y_1 - A'x_1 - H'\xi_{1|0}\right).$$
It is also interesting to note that the covariance matrix of the updated projection of the state shock is given by
$$E\left[v_{t|t}v_{t|t}'\right] = QH\Sigma_{y,t|t-1}^{-1}H'Q.$$
This matrix is generally not equal to $Q$, the covariance matrix of $v_t$, unless $\xi_t$ is observable at $t$.$^{32}$ In fact, it can be shown that $Q \geq QH\Sigma_{y,t|t-1}^{-1}H'Q$ in a matrix sense since the difference between these matrices is equal to the covariance matrix of $v_t - v_{t|t}$. That is,
$$E\left[\left(v_t - v_{t|t}\right)\left(v_t - v_{t|t}\right)' | \mathcal{Y}_t\right] = Q - QH\Sigma_{y,t|t-1}^{-1}H'Q, \quad t = 1, \ldots, T. \tag{5.30}$$
Moreover, the covariance matrix of the smoothed projection of the state shock is
$$E\left[v_{t|T}v_{t|T}'\right] = QN_{t|T}Q.$$
From (5.25) it also follows that $QN_{t|T}Q \geq QH\Sigma_{y,t|t-1}^{-1}H'Q$ for $t = 1, \ldots, T$. Moreover, we find that
$$E\left[\left(v_t - v_{t|T}\right)\left(v_t - v_{t|T}\right)' | \mathcal{Y}_T\right] = Q - QN_{t|T}Q, \quad t = 1, \ldots, T. \tag{5.31}$$
Additional expressions for covariance matrices of the state shocks are given in Koopman (1993); see also Durbin and Koopman (2012, Chapter 4.7).
The measurement errors can likewise be estimated using the sample information once the update or smoothed state variables have been computed. In both cases, we turn to the measurement equation (5.1) and replace the state variables with the update or smooth estimator. For the update estimator of the state variables we find that
$$w_{t|t} = y_t - A'x_t - H'\xi_{t|t} = R\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right), \quad t = 1, \ldots, T, \tag{5.32}$$
where we have substituted for the update estimate of the state variables using equation (5.7) and rearranged terms. It follows from (5.32) that the update estimate of the measurement error is zero when $R = 0$. Moreover, we find from this equation that the covariance matrix is given by
$$E\left[w_{t|t}w_{t|t}'\right] = R\Sigma_{y,t|t-1}^{-1}R.$$
It follows that $R \geq R\Sigma_{y,t|t-1}^{-1}R$ in a matrix sense since the difference is a positive semidefinite matrix. In fact,
$$E\left[\left(w_t - w_{t|t}\right)\left(w_t - w_{t|t}\right)' | \mathcal{Y}_t\right] = R - R\Sigma_{y,t|t-1}^{-1}R, \quad t = 1, \ldots, T. \tag{5.33}$$
It may also be noted that the population cross-covariance matrix for the update estimates of the state shocks and the measurement errors is given by
$$E\left[v_{t|t}w_{t|t}'\right] = QH\Sigma_{y,t|t-1}^{-1}R.$$
Unlike the population cross-covariance matrix for the state shocks and measurement errors, this matrix is generally not zero.
Similarly, the smoothed estimate of the measurement error can be computed from the measurement equation. If we substitute for the smooth estimate of the state variables and rearrange terms, we obtain
$$w_{t|T} = y_t - A'x_t - H'\xi_{t|T} = R\Sigma_{y,t|t-1}^{-1}\left(y_t - y_{t|t-1} - H'P_{t|t-1}F'r_{t+1|T}\right), \quad t = 1, \ldots, T. \tag{5.34}$$
If the covariance matrix $R = 0$, then $w_{t|T} = 0$ by construction. Notice also that the covariance matrix is equal to:
$$\begin{aligned} E\left[w_{t|T}w_{t|T}'\right] &= R\Sigma_{y,t|t-1}^{-1}\left(\Sigma_{y,t|t-1} + H'P_{t|t-1}F'N_{t+1|T}FP_{t|t-1}H\right)\Sigma_{y,t|t-1}^{-1}R\\ &= R\left(\Sigma_{y,t|t-1}^{-1} + K_t'N_{t+1|T}K_t\right)R\\ &= E\left[w_{t|t}w_{t|t}'\right] + RK_t'N_{t+1|T}K_tR. \end{aligned} \tag{5.35}$$
32
Observability of $\xi_t$ at $t$ means that $P_{t|t} = 0$. From equation (5.13) we therefore know that $P_{t|t-1} = Q$. The claim follows by noting that $P_{t|t}$ always satisfies equation (5.10) and that $\Sigma_{y,t|t-1} = H'QH + R$.
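The closed forms (5.32) and (5.34) can be checked against the measurement equation itself with a short sketch (hypothetical helper; `Z` holds $y_t - A'x_t$). It returns the update and smoothed measurement errors together with $\xi_{t|t}$ and $\xi_{t|T}$, so that $w_{t|t} = z_t - H'\xi_{t|t}$ and $w_{t|T} = z_t - H'\xi_{t|T}$ can be verified directly.

```python
import numpy as np

def measurement_error_estimates(Z, F, H, Q, R, xi0, P0):
    """Update (5.32) and smoothed (5.34) measurement errors, plus xi_{t|t} and xi_{t|T}."""
    T, n, r = Z.shape[0], Z.shape[1], F.shape[0]
    store, xi, P = [], xi0.astype(float), P0.astype(float)
    for z in Z:                                              # forward pass
        Sigma_y = H.T @ P @ H + R
        Si = np.linalg.inv(Sigma_y)
        K = F @ P @ H @ Si
        e = z - H.T @ xi
        store.append((xi, P, e, Si, K))
        xi = F @ xi + K @ e
        P = F @ P @ F.T + Q - K @ Sigma_y @ K.T
    w_upd, w_sm = np.zeros((T, n)), np.zeros((T, n))
    xi_upd, xi_sm = np.zeros((T, r)), np.zeros((T, r))
    r_next = np.zeros(r)                                     # r_{T+1|T} = 0
    for t in range(T - 1, -1, -1):                           # backward pass
        xi_p, P_p, e, Si, K = store[t]
        w_upd[t] = R @ Si @ e                                # (5.32)
        w_sm[t] = R @ Si @ (e - H.T @ P_p @ F.T @ r_next)    # (5.34), uses r_{t+1|T}
        xi_upd[t] = xi_p + P_p @ H @ Si @ e                  # (5.7)
        r_t = H @ Si @ e + (F - K @ H.T).T @ r_next          # (5.23)
        xi_sm[t] = xi_p + P_p @ r_t                          # (5.22)
        r_next = r_t
    return w_upd, w_sm, xi_upd, xi_sm
```

Setting `R` to a zero matrix makes both error estimates vanish, in line with the remarks after (5.32) and (5.34).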
The measurement equation can also be extended to the case when the measurement matrix $H$ is time-varying. With $H$ replaced by $H_t$ in (5.1), the filtering, updating, and smoothing equations are otherwise unaffected as long as $H_t$ is treated as known. YADA is well equipped to deal with a time-varying measurement matrix; see Sections 5.17 and 17.4 for details.
The mean squared error matrix for the $\xi_{t+h|t}$ forecast is simply:
$$P_{t+h|t} = F^hP_{t|t}\left(F'\right)^h + \sum_{i=0}^{h-1}F^iQ\left(F'\right)^i = FP_{t+h-1|t}F' + Q. \tag{5.38}$$
Finally, the mean squared error matrix for the $y_{t+h|t}$ forecast is:
$$E\left[\left(y_{t+h} - y_{t+h|t}\right)\left(y_{t+h} - y_{t+h|t}\right)'\right] = H'P_{t+h|t}H + R. \tag{5.39}$$
The $h$-step ahead forecasts of $y$ and the mean squared error matrix can thus be built iteratively from the forecasts of the state variables and their mean squared error matrix.
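The iterative form of (5.38)-(5.39) can be sketched as below, starting from an update estimate $\xi_{t|t}$, $P_{t|t}$. The helper name is hypothetical and the term $A'x_{t+h}$ is left out, which amounts to assuming the exogenous variables are handled separately.

```python
import numpy as np

def h_step_forecasts(xi_upd, P_upd, F, H, Q, R, h):
    """Return [(xi_{t+j|t}, P_{t+j|t}, MSE of y_{t+j|t}) for j = 1..h], cf. (5.38)-(5.39)."""
    xi, P = xi_upd, P_upd
    results = []
    for _ in range(h):
        xi = F @ xi                       # state forecast
        P = F @ P @ F.T + Q               # recursive form of (5.38)
        results.append((xi, P, H.T @ P @ H + R))   # (5.39)
    return results
```

The check below compares the recursion with the closed-form sum in (5.38) for $h = 3$.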
From this it immediately follows that the autocovariance function of the state variables, given the parameter values, based on the state-space model is:
$$E\left[\xi_t\xi_{t-h}'\right] = F^h\Sigma_\xi. \tag{5.41}$$
The autocovariance function of the observed variables can now be calculated from the measurement equation (5.1). With $E[y_t] = A'x_t$, it follows that for any non-negative integer $h$
$$E\left[\left(y_t - A'x_t\right)\left(y_{t-h} - A'x_{t-h}\right)'\right] = \begin{cases} H'\Sigma_\xi H + R, & \text{if } h = 0,\\ H'F^h\Sigma_\xi H, & \text{otherwise}. \end{cases} \tag{5.42}$$
From equations (5.41) and (5.42) it can be seen that the autocovariances tend to zero as $h$ increases, given that all eigenvalues of $F$ lie inside the unit circle. For given parameter values we may, e.g., compare these autocovariances to those obtained directly from the data.
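A small sketch of (5.41)-(5.42): solve for $\Sigma_\xi$ with (5.16) and then form the observed autocovariances for $h = 0, 1, \ldots$. The function name is hypothetical; for a scalar AR(1) state with $F = 0.5$, $Q = 1$, $H = 1$ and $R = 0.5$ the values are $4/3 + 0.5$, $2/3$ and $1/3$ for $h = 0, 1, 2$.

```python
import numpy as np

def observed_autocovariances(F, H, Q, R, h_max):
    """Autocovariances of y_t - A'x_t for h = 0..h_max, cf. (5.41)-(5.42)."""
    r = F.shape[0]
    vec_Sigma = np.linalg.solve(np.eye(r * r) - np.kron(F, F), Q.reshape(-1, order="F"))
    Sigma_xi = vec_Sigma.reshape(r, r, order="F")          # (5.15)-(5.16)
    gammas = [H.T @ Sigma_xi @ H + R]                      # h = 0
    Fh = np.eye(r)
    for _ in range(h_max):
        Fh = Fh @ F                                        # F^h
        gammas.append(H.T @ Fh @ Sigma_xi @ H)             # h > 0
    return gammas
```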
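$t = 1, \ldots, T-1$. (5.43)
Substituting for $\xi_{t|t-1}, \xi_{t-1|t-2}, \ldots, \xi_{2|1}$ in (5.43) and rearranging terms it can be shown that
$$\xi_{t+1|t} = \sum_{\tau=1}^{t}\alpha_\tau\left(\xi_{t+1|t}\right)z_\tau + \beta_0\left(\xi_{t+1|t}\right)\xi_{1|0}, \quad t = 1, \ldots, T-1, \tag{5.44}$$
where
$$\alpha_\tau\left(\xi_{t+1|t}\right) = \beta_\tau\left(\xi_{t+1|t}\right)K_\tau, \quad \tau = 1, \ldots, t, \tag{5.45}$$
$$\beta_\tau\left(\xi_{t+1|t}\right) = \begin{cases}\prod_{i=0}^{t-\tau-1}L_{t-i}, & \text{if } \tau = 0, 1, \ldots, t-1,\\ I_r, & \text{if } \tau = t.\end{cases} \tag{5.46}$$
Notice that $\beta_\tau(\xi_{t|t-1}) = L_{t-1}\cdots L_{\tau+1}$ for $\tau = 0, 1, \ldots, t-1$. We can therefore also express this matrix as
$$\beta_\tau\left(\xi_{t+1|t}\right) = L_t\beta_\tau\left(\xi_{t|t-1}\right), \quad \tau = 0, 1, \ldots, t-1.$$
If the intention is to compute the state variable forecasts for all $t = 1, 2, \ldots, T-1$, this latter relationship between $\beta$-matrices is very useful in practice. It means that for the forecast of $\xi_{t+1}$ we premultiply all the $\beta$-matrices for the forecast of $\xi_t$ by the same matrix $L_t$ and then add $\beta_t(\xi_{t+1|t}) = I_r$ to the set of $\beta$-matrices.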
33
Their version of the state-space model allows the matrices $H$, $R$, $F$, and $Q$ to be time-varying and known for all $t = 1, \ldots, T$ when $t \geq 1$. See also Gómez (2006) for conditions when the weighting matrices of Koopman and Harvey (2003) converge to those obtained from steady-state recursions, i.e., to the asymptotes.
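Moreover, the $\alpha$-matrices for the forecast of $\xi_{t+1}$ are related to those for the forecast of $\xi_t$ through
$$\alpha_\tau\left(\xi_{t+1|t}\right) = \begin{cases} L_t\alpha_\tau\left(\xi_{t|t-1}\right), & \text{if } \tau = 1, \ldots, t-1,\\ K_t, & \text{if } \tau = t.\end{cases}$$
In fact, the relationship between $\alpha_\tau(\xi_{t+1|t})$ and $\alpha_\tau(\xi_{t|t-1})$ can be inferred directly from (5.43). Furthermore, from (5.44) we see that the only $\beta$-matrix we need to keep track of is $\beta_0(\xi_{t+1|t})$ since it is the weight on the initial condition $\xi_{1|0}$.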
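The recursive weight updates can be sketched as follows, with $L_t = F - K_tH'$ and $z_t = y_t - A'x_t$ assumed to be formed beforehand. In this hypothetical helper, `alphas[tau - 1]` holds the weight matrix on $z_\tau$ and `beta0` the weight on $\xi_{1|0}$; each period premultiplies the existing weights by $L_t$ and appends $K_t$ as the weight on the newest observation. The check rebuilds the final forecast from the weights.

```python
import numpy as np

def forecast_weights(Z, F, H, Q, R, xi0, P0):
    """Weights such that xi_{T+1|T} = sum_tau alpha_tau z_tau + beta0 xi_{1|0}."""
    r = F.shape[0]
    xi, P = xi0.astype(float), P0.astype(float)
    alphas, beta0 = [], np.eye(r)
    for z in Z:
        Sigma_y = H.T @ P @ H + R
        K = np.linalg.solve(Sigma_y, H.T @ P @ F.T).T      # K_t
        L = F - K @ H.T                                    # L_t = F - K_t H'
        alphas = [L @ a for a in alphas] + [K]             # old weights times L_t; new weight K_t
        beta0 = L @ beta0                                  # weight on xi_{1|0}
        xi = F @ xi + K @ (z - H.T @ xi)                   # (5.8)
        P = F @ P @ F.T + Q - K @ Sigma_y @ K.T
    return xi, alphas, beta0
```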
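where
$$\alpha_\tau\left(\xi_{t|t}\right) = \begin{cases}\left(I_r - P_{t|t-1}G_t\right)\alpha_\tau\left(\xi_{t|t-1}\right), & \text{if } \tau = 1, \ldots, t-1,\\ P_{t|t-1}M_t, & \text{if } \tau = t,\end{cases} \tag{5.49}$$
while
$$\beta_0\left(\xi_{t|t}\right) = \left(I_r - P_{t|t-1}G_t\right)\beta_0\left(\xi_{t|t-1}\right). \tag{5.50}$$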
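The general expression for $r_{t|T}$ as a weighted series of the observed data and the initial conditions is given by
$$r_{t|T} = \sum_{\tau=1}^{T}\alpha_\tau\left(r_{t|T}\right)z_\tau + \beta_0\left(r_{t|T}\right)\xi_{1|0}, \quad t = 1, \ldots, T. \tag{5.51}$$
For $t = T$ the weights are
$$\alpha_\tau\left(r_{T|T}\right) = \begin{cases} -G_T\alpha_\tau\left(\xi_{T|T-1}\right), & \text{if } \tau = 1, \ldots, T-1,\\ M_T, & \text{if } \tau = T,\end{cases}$$
while for $t = T-1, \ldots, 1$ they are
$$\alpha_\tau\left(r_{t|T}\right) = \begin{cases} L_t'\alpha_\tau\left(r_{t+1|T}\right) - G_t\alpha_\tau\left(\xi_{t|t-1}\right), & \text{if } \tau = 1, \ldots, t-1,\\ M_t + L_t'\alpha_t\left(r_{t+1|T}\right), & \text{if } \tau = t,\\ L_t'\alpha_\tau\left(r_{t+1|T}\right), & \text{if } \tau = t+1, \ldots, T.\end{cases} \tag{5.52}$$
The weights on the initial condition are given by $\beta_0(r_{t|T}) = L_t'\beta_0(r_{t+1|T}) - G_t\beta_0(\xi_{t|t-1})$.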
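The smoothed state variable projections are expressed as a weighted sum of the observed variables and the initial conditions as follows:
$$\xi_{t|T} = \sum_{\tau=1}^{T}\alpha_\tau\left(\xi_{t|T}\right)z_\tau + \beta_0\left(\xi_{t|T}\right)\xi_{1|0}, \quad t = 1, \ldots, T, \tag{5.53}$$
where the weights on the observations $z_\tau$ are obtained through (5.22) and are given by:
$$\alpha_\tau\left(\xi_{t|T}\right) = \begin{cases}\alpha_\tau\left(\xi_{t|t-1}\right) + P_{t|t-1}\alpha_\tau\left(r_{t|T}\right), & \text{if } \tau = 1, \ldots, t-1,\\ P_{t|t-1}\alpha_\tau\left(r_{t|T}\right), & \text{if } \tau = t, \ldots, T.\end{cases} \tag{5.54}$$
The weights on the initial conditions are $\beta_0(\xi_{t|T}) = \beta_0(\xi_{t|t-1}) + P_{t|t-1}\beta_0(r_{t|T})$.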
These expressions allow us to examine a number of interesting questions. For example, column $j$ of the $\alpha_\tau(\xi_{t|T})$ weight matrix tells us how much the smooth estimate of the state variables for period $t$ will change if the $j$:th observed variable changes by 1 unit in period $\tau$. Letting this column be denoted by $\alpha_{\tau,j}(\xi_{t|T})$ while $z_{j,\tau}$ denotes the $j$:th element of $z_\tau$, we can decompose the smooth estimates such that
$$\xi_{t|T} = \sum_{j=1}^{n}\sum_{\tau=1}^{T}\alpha_{\tau,j}\left(\xi_{t|T}\right)z_{j,\tau} + \beta_0\left(\xi_{t|T}\right)\xi_{1|0}.$$
Hence, we obtain an expression for the share of the smoothed projection of the state variables at $t$ which is determined by each observed variable and by the initial condition. Similar expressions can be computed for the state forecasts and the updates.
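Since $z_t = y_t - A'x_t$ it follows that the weights for the state forecast, update and smoother are equal to the weights on $y_t$. If we wish to derive the specific weights on $x_t$ we simply postmultiply the $\alpha$-matrices with $-A'$. For example, in case $x_t = 1$ we have that
$$\xi_{t|T} = \sum_{j=1}^{n}\sum_{\tau=1}^{T}\alpha_{\tau,j}\left(\xi_{t|T}\right)y_{j,\tau} - \sum_{\tau=1}^{T}\alpha_\tau\left(\xi_{t|T}\right)A' + \beta_0\left(\xi_{t|T}\right)\xi_{1|0},$$
where
$$\alpha_\tau\left(w_{t|T}\right) = \begin{cases} -H'\alpha_\tau\left(\xi_{t|T}\right), & \text{if } \tau = 1, \ldots, t-1, t+1, \ldots, T,\\ I_n - H'\alpha_\tau\left(\xi_{t|T}\right), & \text{if } \tau = t.\end{cases} \tag{5.56}$$
Moreover, the $n \times r$ matrix with weights on the initial condition is $\beta_0(w_{t|T}) = -H'\beta_0(\xi_{t|T})$.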
Observation weights for the state shocks can be determined from the observation weights for
the economic shocks. This topic is covered in Section 11.1.
5.10. Simulation Smoothing
The smooth estimates of the state variables, shocks and measurement errors that we discussed
in Sections 5.5 and 5.6 above give the expected value from the distributions of these unobserved
variables conditional on the data (and the parameters of the state-space model). Suppose instead that we would like to draw samples of these variables from their conditional distributions.
Such draws can be obtained through what is generally known as simulation smoothing; see, e.g.,
Durbin and Koopman (2012, Chapter 4.9).
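Frühwirth-Schnatter (1994) and Carter and Kohn (1994) independently developed techniques for simulation smoothing of the state variables based on
$$p\left(\xi_1, \ldots, \xi_T | \mathcal{Y}_T\right) = p\left(\xi_T | \mathcal{Y}_T\right)p\left(\xi_{T-1} | \xi_T, \mathcal{Y}_T\right)\cdots p\left(\xi_1 | \xi_2, \ldots, \xi_T, \mathcal{Y}_T\right). \tag{5.57}$$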
The methods suggested in these papers are based on drawing recursively from the densities on the right hand side of (5.57), starting with $p(\xi_T|\mathcal{Y}_T)$, then from $p(\xi_{T-1}|\xi_T, \mathcal{Y}_T)$, and so on. Their techniques were improved on in De Jong and Shephard (1995) who concentrated on sampling the shocks (and measurement errors) and thereafter the state variables. The algorithm that I shall discuss relies on the further significant enhancements suggested by Durbin and Koopman (2002), which have made simulation smoothing both simpler and faster, and which are also discussed by Durbin and Koopman (2012).
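Let $\omega = [w_1' \; v_1' \; \cdots \; w_T' \; v_T']'$, a vector of dimension $(n+r)T$ with the measurement errors and state shocks. We know that the conditional distribution of $\omega$ is normal, where the smooth estimates in Section 5.6 give the conditional mean, $E[\omega|\mathcal{Y}_T]$, and that the covariance matrix is independent of $\mathcal{Y}_T$, i.e., $\operatorname{Cov}[\omega|\mathcal{Y}_T] = C$. We also know that the marginal distribution of $\omega$ is $N(0, \Omega)$, where $\Omega = I_T \otimes \operatorname{diag}[R, Q]$, while $\xi_1 \sim N(\mu_\xi, \Sigma_\xi)$ and independent of $\omega$.
Drawing vectors $\tilde{\omega}$ from $p(\omega|\mathcal{Y}_T)$ can be achieved by drawing vectors from $N(0, C)$ and adding the draws to $E[\omega|\mathcal{Y}_T]$. Durbin and Koopman suggest that this can be achieved through the following three steps.
(1) Draw $\omega^{(i)}$ from $N(0, \Omega)$ and $\xi_1^{(i)}$ from $N(\mu_\xi, \Sigma_\xi)$. Substitute these values into the state-space model in (5.1)-(5.2) to simulate a sequence for $y_t^{(i)}$, $t = 1, \ldots, T$, denoted by $\mathcal{Y}_T^{(i)}$.
(2) Compute $E[\omega|\mathcal{Y}_T^{(i)}]$ via equations (5.27) and (5.34).
(3) Take
$$\tilde{\omega}^{(i)} = E\left[\omega|\mathcal{Y}_T\right] + \omega^{(i)} - E\left[\omega|\mathcal{Y}_T^{(i)}\right]$$
as a draw from $p(\omega|\mathcal{Y}_T)$.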
If the covariance matrices $R$ and $Q$ have reduced rank, we can lower the dimension of $\omega$ such that only the unique number of sources of error are considered. Moreover, if we are only interested in the simulation smoother for, say, the state shocks, then we need only compute the smooth estimates $v_{t|T}$ and $v_{t|T}^{(i)} = E[v_t|\mathcal{Y}_T^{(i)}]$, while $\tilde{v}_t^{(i)} = v_{t|T} + v_t^{(i)} - v_{t|T}^{(i)}$, $t = 1, \ldots, T$, is a draw from $p(v_1, \ldots, v_T|\mathcal{Y}_T)$. Notice that the first step of the algorithm always has to be performed.
To see why the third step of the algorithm gives a draw from $p(\omega|\mathcal{Y}_T)$, note that $\omega^{(i)} - E[\omega|\mathcal{Y}_T^{(i)}]$ is independent of $\mathcal{Y}_T$. This means that, conditional on $\mathcal{Y}_T$, the vector $\tilde{\omega}^{(i)}$ is drawn from a normal distribution with mean $E[\omega|\mathcal{Y}_T]$ and covariance matrix $E[(\omega^{(i)} - E[\omega|\mathcal{Y}_T^{(i)}])(\omega^{(i)} - E[\omega|\mathcal{Y}_T^{(i)}])'] = C$. Furthermore, a draw of $\xi_t$, $t = 1, \ldots, T$, from $p(\xi_1, \ldots, \xi_T|\mathcal{Y}_T)$ is directly obtained as a by-product from the above algorithm by letting $\tilde{\xi}_t^{(i)} = \xi_{t|T} + \xi_t^{(i)} - \xi_{t|T}^{(i)}$.
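A sketch of the three steps, including the antithetic partner discussed next, is given below. It is a NumPy illustration under stated assumptions rather than YADA's sampler: the function names are hypothetical, `Z` holds $y_t - A'x_t$, and the model is assumed stationary with $\xi_{1|0} = 0$ and $P_{1|0} = \Sigma_\xi$. To keep the simulated $v_1$ consistent with $v_{1|T} = Qr_{1|T}$, step 1 generates $\xi_1 = F\xi_0 + v_1$ with $\xi_0 \sim N(0, \Sigma_\xi)$, which gives $\xi_1 \sim N(0, \Sigma_\xi)$ as required.

```python
import numpy as np

def smoothed_disturbances(Z, F, H, Q, R, xi0, P0):
    """E[omega | data] as a T x (n + r) array with rows [w_{t|T}', v_{t|T}'], via (5.34) and (5.27)."""
    T, n, r = Z.shape[0], Z.shape[1], F.shape[0]
    store, xi, P = [], xi0.astype(float), P0.astype(float)
    for z in Z:
        Sigma_y = H.T @ P @ H + R
        Si = np.linalg.inv(Sigma_y)
        K = F @ P @ H @ Si
        e = z - H.T @ xi
        store.append((P, e, Si, K))
        xi = F @ xi + K @ e
        P = F @ P @ F.T + Q - K @ Sigma_y @ K.T
    out, r_next = np.zeros((T, n + r)), np.zeros(r)
    for t in range(T - 1, -1, -1):
        P_p, e, Si, K = store[t]
        w = R @ Si @ (e - H.T @ P_p @ F.T @ r_next)       # (5.34)
        r_t = H @ Si @ e + (F - K @ H.T).T @ r_next       # (5.23)
        out[t] = np.concatenate([w, Q @ r_t])             # v_{t|T} = Q r_{t|T}, (5.27)
        r_next = r_t
    return out

def simulation_smoother(Z, F, H, Q, R, Sigma_xi, rng):
    """One Durbin-Koopman draw of omega, its antithetic partner, and E[omega | data]."""
    T, n, r = Z.shape[0], Z.shape[1], F.shape[0]
    # Step 1: omega^(i) ~ N(0, I_T kron diag(R, Q)); simulate data from (5.1)-(5.2).
    w = rng.multivariate_normal(np.zeros(n), R, size=T)
    v = rng.multivariate_normal(np.zeros(r), Q, size=T)
    xi = F @ rng.multivariate_normal(np.zeros(r), Sigma_xi) + v[0]   # xi_1 = F xi_0 + v_1
    Z_sim = np.zeros_like(Z, dtype=float)
    for t in range(T):
        if t > 0:
            xi = F @ xi + v[t]
        Z_sim[t] = H.T @ xi + w[t]
    omega = np.hstack([w, v])
    # Step 2: smoothed disturbances for the actual and the simulated data.
    xi0 = np.zeros(r)
    mean_data = smoothed_disturbances(Z, F, H, Q, R, xi0, Sigma_xi)
    mean_sim = smoothed_disturbances(Z_sim, F, H, Q, R, xi0, Sigma_xi)
    # Step 3: the draw and its antithetic counterpart.
    draw = mean_data + omega - mean_sim
    antithetic = mean_data - omega + mean_sim
    return draw, antithetic, mean_data
```

By construction the average of a draw and its antithetic partner equals $E[\omega|\mathcal{Y}_T]$ exactly, which is the property the antithetic variable is meant to exploit.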
Durbin and Koopman (2002) also suggest that antithetic variables can be computed via these results. Recall that an antithetic variable in this context is a function of the random draw which is equiprobable with the draw, and which when used together with the draw increases the efficiency of the estimation; see, e.g., Mikhail (1972) and Geweke (1988). It follows that $\tilde{\omega}_-^{(i)} = E[\omega|\mathcal{Y}_T] - \omega^{(i)} + E[\omega|\mathcal{Y}_T^{(i)}]$ is one such antithetic variable for $\tilde{\omega}^{(i)}$. If a simulation smoother of $\omega$ is run for $i = 1, \ldots, N$, then the simulation average of $\tilde{\omega}^{(i)}$ is not exactly equal to $E[\omega|\mathcal{Y}_T]$, while the simulation average of $\tilde{\omega}^{(i)}$ plus
improve the computational time; see also Anderson and Moore (1979, Chapter 6.7). For these
recursions to be valid, the system matrices need to be constant or, as in Aknouche and Hamdi
(2007), at least periodic, while the state variables are stationary.
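Defining $\Delta P_{t+1|t} = P_{t+1|t} - P_{t|t-1}$ and utilizing, e.g., Lemma 7.1 in Anderson and Moore (1979) (or equivalently the Lemma in Herbst, 2012) we find that
$$\Delta P_{t+1|t} = \left(F - K_tH'\right)\left[\Delta P_{t|t-1} + \Delta P_{t|t-1}H\Sigma_{y,t-1|t-2}^{-1}H'\Delta P_{t|t-1}\right]\left(F - K_tH'\right)'. \tag{5.59}$$
With $\Delta P_{t+1|t} = W_tM_tW_t'$ the following recursive expressions hold
$$\Sigma_{y,t|t-1} = \Sigma_{y,t-1|t-2} + H'W_{t-1}M_{t-1}W_{t-1}'H, \tag{5.60}$$
$$K_t = \left(K_{t-1}\Sigma_{y,t-1|t-2} + FW_{t-1}M_{t-1}W_{t-1}'H\right)\Sigma_{y,t|t-1}^{-1}, \tag{5.61}$$
$$M_t = M_{t-1} + M_{t-1}W_{t-1}'H\Sigma_{y,t-1|t-2}^{-1}H'W_{t-1}M_{t-1}, \tag{5.62}$$
$$W_t = \left(F - K_tH'\right)W_{t-1}. \tag{5.63}$$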
Notice that the computation of $P_{t+1|t}$ involves multiplication of $r \times r$ matrices in (5.14) and (5.58). Provided that $n$ is sufficiently smaller than $r$, as is typically the case with DSGE models, the computational gains from the Chandrasekhar recursions in (5.60)-(5.63) can be substantial. This is particularly the case when we only wish to evaluate the log-likelihood function in (5.19) since the state covariance matrix is not explicitly needed in this expression. Being able to recursively compute $\Sigma_{y,t|t-1}$ without involving multiplications of $r \times r$ matrices can therefore be highly beneficial.
To initialize these recursions we need values of $W_1$ and $M_1$. Herbst (2012) notes that for $t = 1$ the matrix Riccati equation in (5.58) can be expressed as
$$P_{2|1} = FP_{1|0}F' + Q - K_1\Sigma_{y,1|0}K_1'.$$
Provided that we have initialized the Kalman filter with $P_{1|0} = \Sigma_\xi$ in equation (5.15), it follows that
$$P_{1|0} = FP_{1|0}F' + Q,$$
so that
$$\Delta P_{2|1} = -K_1\Sigma_{y,1|0}K_1' = W_1M_1W_1'.$$
This suggests that
$$W_1 = K_1, \quad M_1 = -\Sigma_{y,1|0}. \tag{5.64}$$
Alternatively, we can replace recursions of the Kalman gain matrix with recursions of $\tilde{K}_t = FP_{t|t-1}H$. This means that we replace equation (5.61) with
$$\tilde{K}_t = \tilde{K}_{t-1} + FW_{t-1}M_{t-1}W_{t-1}'H.$$
This equation involves fewer multiplications than the recursive equation for the Kalman gain matrix and may therefore be preferred. Furthermore, equation (5.63) can be rewritten as
$$W_t = \left(F - \tilde{K}_t\Sigma_{y,t|t-1}^{-1}H'\right)W_{t-1}.$$
The initializations of $W_1$ and $M_1$ are affected by these changes to the Chandrasekhar recursions and are now given by
$$W_1 = \tilde{K}_1, \quad M_1 = -\Sigma_{y,1|0}^{-1}.$$
This also emphasizes that the matrices $W_t$ and $M_t$ are not uniquely determined.
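A sketch of the $\tilde{K}_t$ variant of the Chandrasekhar recursions follows; it returns the sequence $\Sigma_{y,t|t-1}$, which is all that (5.19) needs. The helper name is hypothetical, and $P_{1|0} = \Sigma_\xi$ is computed here with the exact vec formula (5.16) because the recursions rely on that initialization being accurate. Note that $W_t$ is $r \times n$ and $M_t$ is $n \times n$, so the state covariance matrix itself is never formed.

```python
import numpy as np

def chandrasekhar_sigma_y(F, H, Q, R, T):
    """Sigma_{y,t|t-1} for t = 1..T via the K-tilde form of the Chandrasekhar recursions."""
    r = F.shape[0]
    P0 = np.linalg.solve(np.eye(r * r) - np.kron(F, F),
                         Q.reshape(-1, order="F")).reshape(r, r, order="F")   # P_{1|0} = Sigma_xi
    Sigma = H.T @ P0 @ H + R                       # Sigma_{y,1|0}
    Kt = F @ P0 @ H                                # K~_1 = F P_{1|0} H
    W, M = Kt.copy(), -np.linalg.inv(Sigma)        # W_1 = K~_1, M_1 = -Sigma_{y,1|0}^{-1}
    out = [Sigma]
    for _ in range(1, T):
        HW = H.T @ W                               # H' W_{t-1}, n x n
        Sigma_new = Sigma + HW @ M @ HW.T          # (5.60)
        Kt = Kt + F @ W @ M @ HW.T                 # K~_t recursion
        M = M + M @ HW.T @ np.linalg.solve(Sigma, HW @ M)        # (5.62), uses Sigma_{y,t-1|t-2}
        W = (F - Kt @ np.linalg.solve(Sigma_new, H.T)) @ W       # W_t = (F - K~_t Sigma^{-1} H') W_{t-1}
        Sigma = Sigma_new
        out.append(Sigma)
    return out
```

The check below compares the output with the standard Riccati recursion started from the same $P_{1|0}$.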
As pointed out by Herbst (2012), the Chandrasekhar recursions are sensitive to the numerical accuracy of the solution to the Lyapunov equation in (5.15). When the doubling algorithm in Section 5.3 is used to compute $P_{1|0} = \Sigma_\xi$ it is therefore recommended that the number of recursions $k$ is large enough for the error matrix
$$\Gamma_{\xi,k+1} = \gamma_{k+1} - \gamma_k = \sum_{j=2^k}^{2^{k+1}-1}F^jQ\left(F'\right)^j$$
to be sufficiently close to zero. The convergence criterion for the doubling algorithm in YADA concerns the norm of $\Gamma_{\xi,k+1}$, i.e., the largest singular value of this matrix. This makes it straightforward to have control over the numerical accuracy of the solution to the Lyapunov equation when this fast, numerical approach is used.
It may also be noted that if $\Sigma_\xi = \gamma_k$, then the error of the Lyapunov equation (5.15) is given by
$$F\gamma_kF' + Q - \gamma_k = F^{2^k}Q\left(F'\right)^{2^k}.$$
This is equal to the term in $\Gamma_{\xi,k+1}$ with the smallest exponent and since each increment to the error matrix is positive semidefinite the norm of the solution error for $\gamma_k$ is smaller than (or equal to) the norm of the error matrix. Furthermore, using $\gamma_{k+1}$ as the solution for the Lyapunov equation yields a solution error which is never bigger than that of $\gamma_k$.
5.12. Square Root Filtering
It is well known that the standard Kalman filter can sometimes fail to provide an error covariance matrix which is positive semidefinite. For instance, this can happen when some of
the observed variables have a very small variance or when pairs of variables are highly correlated. The remedy to such purely numerical problems is to use a square root filter; see Morf
and Kailath (1975), Anderson and Moore (1979, Chapter 6.5), or Durbin and Koopman (2012,
Chapter 6.3).
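To set up a square root filter we first need to define the square roots of some of the relevant matrices. Specifically, let $P_{t|t-1}^{1/2}$ be the $r \times r$ square root of $P_{t|t-1}$, $R^{1/2}$ the $n \times n$ square root of $R$, while $B_0$ is an $r \times q$ square root of $Q$, with $r \geq q$.$^{34}$ That is,
$$P_{t|t-1} = P_{t|t-1}^{1/2}P_{t|t-1}^{1/2\prime}, \quad R = R^{1/2}R^{1/2\prime}, \quad Q = B_0B_0'.$$
Next, let
$$U_t = \begin{bmatrix} H'P_{t|t-1}^{1/2} & R^{1/2} & 0\\ FP_{t|t-1}^{1/2} & 0 & B_0\end{bmatrix}. \tag{5.65}$$
Notice that
$$U_tU_t' = \begin{bmatrix}\Sigma_{y,t|t-1} & H'P_{t|t-1}F'\\ FP_{t|t-1}H & FP_{t|t-1}F' + Q\end{bmatrix}. \tag{5.66}$$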
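The matrix $U_t$ can be transformed to a lower triangular matrix using the orthogonal matrix $G$, such that $GG' = I_{n+r+q}$. Postmultiplying $U_t$ by $G$ we obtain
$$U_tG = U_t^*, \quad U_t^* = \begin{bmatrix}U_{1,t} & 0 & 0\\ U_{2,t} & U_{3,t} & 0\end{bmatrix}, \tag{5.67}$$
where
$$U_{1,t} = \Sigma_{y,t|t-1}^{1/2}, \quad U_{2,t} = FP_{t|t-1}H\left(\Sigma_{y,t|t-1}^{1/2\prime}\right)^{-1}, \quad U_{3,t} = P_{t+1|t}^{1/2}. \tag{5.68}$$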
This means that the Kalman filter step in (5.8) can be expressed as
$$\xi_{t+1|t} = F\xi_{t|t-1} + U_{2,t}U_{1,t}^{-1}\left(y_t - y_{t|t-1}\right). \tag{5.69}$$
Moreover, the state covariance matrix in (5.14) can instead be computed from
$$P_{t+1|t} = U_{3,t}U_{3,t}'. \tag{5.70}$$
The right hand side of (5.70) is positive semidefinite by construction and the same is true for $\Sigma_{y,t|t-1} = U_{1,t}U_{1,t}'$.
34
The matrix $B_0$ is defined in Section 3 and satisfies $v_t = B_0\eta_t$, where $\eta_t \sim N(0, I_q)$ and $q$ is the number of i.i.d. economic shocks. Hence, $Q = B_0B_0'$ and $B_0$ may therefore be regarded as a proper square root matrix of $Q$.
The crucial step in the square root filter is the computation of the orthogonal matrix $G$. One approach to calculating this matrix along with $U_t^*$ is the so called Q-R factorization. For an $n \times m$ matrix $A$ this factorization is given by $A = QR$, where the $n \times n$ matrix $Q$ is orthogonal ($QQ' = I_n$), while the $n \times m$ matrix $R$ is upper triangular; see, e.g., Golub and van Loan (1983, p. 147) or the qr function in Matlab. To obtain $U_t^*$ we therefore compute the Q-R factorization of $U_t'$ and let $U_t^*$ be the transpose of the upper triangular matrix $R$ (which should not be confused with the covariance matrix of the measurement errors), while $G$ is given by the $Q$ matrix.
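One step of the square root filter can be sketched with NumPy's `np.linalg.qr` standing in for Matlab's `qr`. The helper name is hypothetical. The default reduced mode returns only the first $n + r$ columns of $G$, which is enough because the remaining columns of $U_t^*$ are zero; the diagonal of the triangular factor may carry negative signs, which does not affect $U_{1,t}U_{1,t}'$ or $U_{3,t}U_{3,t}'$.

```python
import numpy as np

def sqrt_filter_step(xi, P_sqrt, z, F, H, R_sqrt, B0):
    """One step of (5.65)-(5.70): returns xi_{t+1|t} and a square root U_{3,t} of P_{t+1|t}."""
    r, n, q = F.shape[0], H.shape[1], B0.shape[1]
    U = np.block([[H.T @ P_sqrt, R_sqrt, np.zeros((n, q))],
                  [F @ P_sqrt, np.zeros((r, n)), B0]])     # U_t, (n + r) x (r + n + q)
    # U_t' = G R  implies  U_t G = R', a lower triangular matrix, cf. (5.67)-(5.68)
    _, upper = np.linalg.qr(U.T)                           # reduced QR
    U_star = upper.T
    U1, U2, U3 = U_star[:n, :n], U_star[n:, :n], U_star[n:, n:]
    e = z - H.T @ xi                                       # z is y_t - A'x_t
    xi_next = F @ xi + U2 @ np.linalg.solve(U1, e)         # (5.69)
    return xi_next, U3                                     # P_{t+1|t} = U3 U3', (5.70)
```

Passing Cholesky factors for $P_{t|t-1}$, $R$ and $Q$ should reproduce the ordinary filter step, which the check below confirms.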
It is also possible to calculate square root versions of the update and smooth estimates of the states and of the state covariance matrices. Concerning the update estimates we have that $\xi_{t|t}$ is determined as in equation (5.12), where the square root filter means that $P_{t|t-1}$ is based on the lag of equation (5.70), while
$$r_{t|t} = H\left(U_{1,t}'\right)^{-1}U_{1,t}^{-1}\left(y_t - y_{t|t-1}\right). \tag{5.71}$$
Furthermore, the update covariance matrix is determined by equation (5.10) where $P_{t|t-1}$ and $\Sigma_{y,t|t-1}$ are computed from the square root expressions.
The smoothed state variables can similarly be estimated using the square root filter by utilizing equation (5.22), where $P_{t|t-1}$ is again determined from the output of the square root filter, while the expression for the smooth innovations in equation (5.23) can now be expressed as
$$r_{t|T} = r_{t|t} + \left(F - U_{2,t}U_{1,t}^{-1}H'\right)'r_{t+1|T}, \tag{5.72}$$
with $r_{T+1|T} = 0$ and where $r_{t|t}$ is given by (5.71).
The state covariance matrix for the smooth estimates is computed as in equation (5.24), but where a square root formula for $N_{t|T}$ is now required. Following Durbin and Koopman (2012) we introduce the $r \times r$ lower triangular matrix $\tilde{N}_{t+1|T}$ which satisfies
$$N_{t+1|T} = \tilde{N}_{t+1|T}\tilde{N}_{t+1|T}'.$$
Next, let
$$N_{t|T}^* = \begin{bmatrix} H\left(U_{1,t}'\right)^{-1} & \left(F - U_{2,t}U_{1,t}^{-1}H'\right)'\tilde{N}_{t+1|T}\end{bmatrix}, \tag{5.73}$$
from which it follows that $N_{t|T} = N_{t|T}^*N_{t|T}^{*\prime}$; see equation (5.25). The last step is now to compute
$t = 1, \ldots, T$, (5.75)
where the matrix $S_t$ is $n \times n_t$ and $n_t \leq n$ with rank equal to $n_t$. The columns of this matrix are given by columns of $I_n$, whose $i$:th column is denoted by $e_i$. The $i$:th element of $y_t$, denoted by $y_{i,t}$, is observed at $t$ if and only if $e_i$ is a column of $S_t$, i.e., $e_i'S_t \neq 0$. In the event that observations on all variables are missing for some $t$, then $S_t$ is empty for that period. Hence, the vector $y_t^o$ is given by all observed elements in the vector $y_t$, while the elements with missing values are skipped.
The measurement equation of the state-space model can now be expressed as
$$y_t^o = A_t^{o\prime}x_t + H_t^{o\prime}\xi_t + w_t^o, \quad t = 1, \ldots, T, \tag{5.76}$$
where $A_t^o = AS_t$, $H_t^o = HS_t$, and $w_t^o = S_t'w_t$. This means that the measurement errors are given by $w_t^o \sim N(0, R_t^o)$, where the covariance matrix is given by $R_t^o = S_t'RS_t$.
The Kalman filtering and smoothing equations under missing observations have the same general form as before, except that the parameter matrices in (5.76) replace $A$, $H$, and $R$ when $n_t \geq 1$. In the event that $n_t = 0$, i.e., there are no observations in period $t$, then the 1-step ahead forecasts of the state variables become
$$\xi_{t+1|t} = F\xi_{t|t-1}, \tag{5.77}$$
while the covariance matrix of the 1-step ahead forecast errors of the state variables is
$$P_{t+1|t} = FP_{t|t-1}F' + Q. \tag{5.78}$$
That is, the Kalman gain matrix, $K_t$, is set to zero; cf. equations (5.8) and (5.14). For the update estimates we likewise have that $r_{t|t} = 0$, $\xi_{t|t} = \xi_{t|t-1}$, and $P_{t|t} = P_{t|t-1}$.
Furthermore, when $n_t = 0$ then the smoothing equation in (5.23) is replaced by
$$r_{t|T} = F'r_{t+1|T}. \tag{5.79}$$
Regarding the log-likelihood function in (5.19) we simply set it to 0 for the time periods when $n_t = 0$, while $n_t$ replaces $n$, $y_t^o$ is used instead of $y_t$, and $\Sigma_{y,t|t-1}^o$ instead of $\Sigma_{y,t|t-1}$ for all time periods when $n_t \geq 1$.
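A sketch of one filtering step with missing values is given below, where a NaN in `y` (assumed already net of $A'x_t$) marks an unobserved element. The helper name and the NaN convention are assumptions for illustration. The selection matrix $S_t$ is built from the columns of $I_n$, and a period with no observations falls back on (5.77)-(5.78) and contributes zero to the log-likelihood.

```python
import numpy as np

def filter_step_missing(xi, P, y, F, H, Q, R):
    """Return xi_{t+1|t}, P_{t+1|t} and the period's log-likelihood term, cf. (5.75)-(5.78)."""
    observed = ~np.isnan(y)
    n_t = int(observed.sum())
    if n_t == 0:                                   # K_t = 0, contribution set to 0
        return F @ xi, F @ P @ F.T + Q, 0.0
    S = np.eye(y.size)[:, observed]                # selection matrix S_t, n x n_t
    H_o, R_o, y_o = H @ S, S.T @ R @ S, y[observed]    # H^o_t, R^o_t, y^o_t = S_t' y_t
    Sigma = H_o.T @ P @ H_o + R_o
    e = y_o - H_o.T @ xi
    K = F @ P @ H_o @ np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    ll = -0.5 * (n_t * np.log(2.0 * np.pi) + logdet + e @ np.linalg.solve(Sigma, e))
    return F @ xi + K @ e, F @ P @ F.T + Q - K @ Sigma @ K.T, ll
```

Dropping a variable through $S_t$ gives the same result as running the ordinary step on the reduced system, as the check below illustrates.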
5.14. Diffuse Initialization of the Kalman Filter
The Kalman filters and smoothers that we have considered thus far rely on the assumption that the initial state $\xi_1$ has a (Gaussian) distribution with mean $\mu_\xi$ and finite covariance matrix $\Sigma_\xi$, where the filtering recursions are initialized by $\xi_{1|0} = \mu_\xi$ and $P_{1|0} = \Sigma_\xi$; this case may be referred to as the standard initialization.
More generally, the initial state vector $\xi_1$ can be specified as
$$\xi_1 = \mu_\xi + S\delta + S_\perp v_1, \tag{5.81}$$
after some point in time $t = d$ is here optional, but advisable for computational efficiency. The discussion below follows the treatment in Koopman and Durbin (2003).
It should already at this point be stressed that diffuse initialization will not converge at a finite point in time $t = d$ if some state variable has a unit root, such as for a random walk. Moreover, near non-stationarity is also a source for concern. In this situation $d$ will be finite, but can still be very large, difficult to determine numerically, and on top of this greater than the sample size. Numerical issues are likely to play an important role under near unit roots with certain covariance matrices possibly being close to singular or with problems determining their rank. For such models it is recommended to stick with a finite initialization of the state covariance matrix.
5.14.1. Diffuse Kalman Filtering
From equation (5.81) we define the covariance matrix of the initial value for the state variables with finite $c$ as
$$P = cP_\infty + P_*,$$
where $P_\infty = SS'$ and $P_* = S_\perp\Sigma_vS_\perp'$. The case of full diffuse initialization means that $s = r$ so that $P_\infty = I_r$ while $P_* = 0$, while $0 < s < r$ concerns partial diffuse initialization. The implications for the Kalman filter and smoother are not affected by the choice of $s$, but the implementation in YADA mainly concerns $s = r$. In fact, if none of the variables have been specifically selected as a unit-root process, then full diffuse initialization is used. Similarly, state variables that have been selected as unit-root processes, whether they are or not, will be excluded from the $s$ diffuse variables such that the corresponding diagonal element of $P_\infty$ is zero, while the same diagonal element of $P_*$ is unity. Further details on the selection of unit-root state variables are found in the YADA help file, while such variables can be selected via the Actions menu in YADA.
The 1-step ahead forecast error covariance matrix for the state variables can be decomposed in the same way as $P$ so that
$$P_{t|t-1} = cP_{\infty,t|t-1} + P_{*,t|t-1} + O\left(c^{-1}\right), \quad t = 1, \ldots, T,$$
where $P_{\infty,t|t-1}$ and $P_{*,t|t-1}$ do not depend on $c$.$^{35}$ It is shown by Ansley and Kohn (1985) and Koopman (1997) that the influence of the term $P_{\infty,t|t-1}$ will disappear at some $t = d$ and that the usual Kalman filtering equations in Section 5.2 can be applied for $t = d+1, \ldots, T$.
Consider the expansion of the inverse of the matrix
$$\Sigma_{y,t|t-1} = c\Sigma_{\infty,t|t-1} + \Sigma_{*,t|t-1} + O(c^{-1}), \quad t = 1,\ldots,T, \tag{5.82}$$
which appears in the definition of $K_t$ in (5.9). The covariance matrices on the right hand side are given by
$$\begin{aligned} \Sigma_{\infty,t|t-1} &= H'P_{\infty,t|t-1}H,\\ \Sigma_{*,t|t-1} &= H'P_{*,t|t-1}H + R. \end{aligned} \tag{5.83}$$
Koopman and Durbin (2003) provide the Kalman filter equations for the cases when $\Sigma_{\infty,t|t-1}$ is nonsingular and when $\Sigma_{\infty,t|t-1} = 0$. According to the authors, the case when $\Sigma_{\infty,t|t-1}$ has reduced rank, i.e., $0 < \operatorname{rank}(\Sigma_{\infty,t|t-1}) < n$, is rare, although an explicit solution is given by Koopman (1997). The authors suggest, however, that for the reduced rank case the univariate approach to multivariate Kalman filtering in Koopman and Durbin (2000) should be used instead. This is also the stance taken by YADA when $\Sigma_{\infty,t|t-1}$ is singular but not zero. The univariate approach for both the standard initialization and diffuse initialization is discussed in Section 5.15.
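The exact initial state 1-step ahead forecast equations when $\Sigma_{\infty,t|t-1}$ is nonsingular and $c \to \infty$ are given by
$$\begin{aligned}
\xi_{t+1|t} &= F\xi_{t|t-1} + K_{\infty,t}\left(y_t - y_{t|t-1}\right),\\
P_{\infty,t+1|t} &= FP_{\infty,t|t-1}L_{\infty,t}',\\
P_{*,t+1|t} &= FP_{*,t|t-1}L_{\infty,t}' - K_{\infty,t}\Sigma_{\infty,t|t-1}K_{*,t}' + Q,
\end{aligned} \tag{5.84}$$
for $t = 1,\ldots,d$, with
$$\begin{aligned}
K_{\infty,t} &= FP_{\infty,t|t-1}H\Sigma_{\infty,t|t-1}^{-1},\\
K_{*,t} &= \left(FP_{*,t|t-1}H - K_{\infty,t}\Sigma_{*,t|t-1}\right)\Sigma_{\infty,t|t-1}^{-1},\\
L_{\infty,t} &= F - K_{\infty,t}H',
\end{aligned}$$
and with the initialization given by the initial state vector $\xi_{1|0}$, $P_{\infty,1|0} = P_{\infty}$, and $P_{*,1|0} = P_{*}$. The forecast $y_{t|t-1}$ is given by equation (5.3).[36]

[35] The term $O(c^{-1})$ refers to a function $f(c)$ such that $cf(c)$ is finite as $c \to \infty$.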
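From the forecast equations in (5.84) it can be deduced that the Kalman update equation can be written as
$$\xi_{t|t} = \xi_{t|t-1} + P_{\infty,t|t-1}H\Sigma_{\infty,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right) = \xi_{t|t-1} + P_{\infty,t|t-1}r_{\infty,t|t}, \tag{5.85}$$
while the update state covariance matrices are
$$\begin{aligned}
P_{\infty,t|t} &= P_{\infty,t|t-1} - P_{\infty,t|t-1}H\Sigma_{\infty,t|t-1}^{-1}H'P_{\infty,t|t-1},\\
P_{*,t|t} &= P_{*,t|t-1} - P_{*,t|t-1}H\Sigma_{\infty,t|t-1}^{-1}H'P_{\infty,t|t-1} - P_{\infty,t|t-1}H\Sigma_{\infty,t|t-1}^{-1}H'P_{*,t|t-1}\\
&\quad + P_{\infty,t|t-1}H\Sigma_{\infty,t|t-1}^{-1}\Sigma_{*,t|t-1}\Sigma_{\infty,t|t-1}^{-1}H'P_{\infty,t|t-1}.
\end{aligned} \tag{5.86}$$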
If $\Sigma_{\infty,t|t-1} = 0$, the exact initial state 1-step ahead forecast equations are instead
$$\begin{aligned}
\xi_{t+1|t} &= F\xi_{t|t-1} + K_{*,t}\left(y_t - y_{t|t-1}\right),\\
P_{\infty,t+1|t} &= FP_{\infty,t|t-1}F',\\
P_{*,t+1|t} &= FP_{*,t|t-1}L_{*,t}' + Q,
\end{aligned} \tag{5.87}$$
where
$$K_{*,t} = FP_{*,t|t-1}H\Sigma_{*,t|t-1}^{-1}, \qquad L_{*,t} = F - K_{*,t}H'.$$
It can here be seen that the forecast equations are identical to the standard Kalman filter equations, where $P_{*,t+1|t}$ serves the same function as $P_{t+1|t}$, while an additional equation is provided for $P_{\infty,t+1|t}$. When $P_{\infty,d|d-1} = 0$ it follows that $\Sigma_{\infty,d|d-1} = 0$ and that it is no longer necessary to compute the equation for $P_{\infty,t+1|t}$. Accordingly, the diffuse Kalman filter has automatically shifted to the standard Kalman filter.
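The two forecast steps (5.84) and (5.87) can be sketched in Python with NumPy as below (YADA itself is written in Matlab). This is an illustrative sketch only: the function name `diffuse_forecast_step` is hypothetical, the choice between the two branches uses an assumed numerical tolerance `tol` for deciding that $\Sigma_{\infty,t|t-1} = 0$, and the reduced-rank case is deliberately left to the univariate filter of Section 5.15.

```python
import numpy as np


def diffuse_forecast_step(xi, P_inf, P_star, y, x, A, H, F, Q, R, tol=1e-12):
    """One exact initial forecast step, eqs. (5.84) and (5.87) (hypothetical helper).

    Returns xi_{t+1|t}, P_{inf,t+1|t}, P_{*,t+1|t}. Only the cases
    Sigma_inf nonsingular and Sigma_inf = 0 are handled.
    """
    err = y - A.T @ x - H.T @ xi                 # y_t - y_{t|t-1}
    S_inf = H.T @ P_inf @ H                      # Sigma_{inf,t|t-1}, eq. (5.83)
    S_star = H.T @ P_star @ H + R                # Sigma_{*,t|t-1},   eq. (5.83)
    if np.max(np.abs(S_inf)) < tol:
        # Sigma_inf = 0: standard filter in P_star, P_inf propagated by F, eq. (5.87)
        K_star = F @ P_star @ H @ np.linalg.inv(S_star)
        L_star = F - K_star @ H.T
        xi_next = F @ xi + K_star @ err
        P_inf_next = F @ P_inf @ F.T
        P_star_next = F @ P_star @ L_star.T + Q
    else:
        # Sigma_inf assumed nonsingular, eq. (5.84)
        S_inf_inv = np.linalg.inv(S_inf)
        K_inf = F @ P_inf @ H @ S_inf_inv
        K_star = (F @ P_star @ H - K_inf @ S_star) @ S_inf_inv
        L_inf = F - K_inf @ H.T
        xi_next = F @ xi + K_inf @ err
        P_inf_next = F @ P_inf @ L_inf.T
        P_star_next = F @ P_star @ L_inf.T - K_inf @ S_inf @ K_star.T + Q
    return xi_next, P_inf_next, P_star_next
```

A simple way to check such a sketch is to run the standard filter with $P = cP_{\infty} + P_{*}$ for a large $c$: the standard $P_{t+1|t}$ minus $cP_{\infty,t+1|t}$ should then be close to $P_{*,t+1|t}$, with an error of order $c^{-1}$.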
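From (5.87) it is straightforward to show that the Kalman state variable update equation when $\Sigma_{\infty,t|t-1} = 0$ is
$$\xi_{t|t} = \xi_{t|t-1} + P_{*,t|t-1}H\Sigma_{*,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right) = \xi_{t|t-1} + P_{*,t|t-1}r_{*,t|t}, \tag{5.88}$$
while the update covariance matrices are given by
$$\begin{aligned}
P_{\infty,t|t} &= P_{\infty,t|t-1},\\
P_{*,t|t} &= P_{*,t|t-1} - P_{*,t|t-1}H\Sigma_{*,t|t-1}^{-1}H'P_{*,t|t-1}.
\end{aligned} \tag{5.89}$$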
Regarding the automatic collapse to the standard Kalman filter, for stationary state-space models Koopman (1997) notes that
$$\operatorname{rank}\left(P_{\infty,t+1|t}\right) \leq \min\left\{\operatorname{rank}\left(P_{\infty,t|t-1}\right) - \operatorname{rank}\left(\Sigma_{\infty,t|t-1}\right),\ \operatorname{rank}(F)\right\},$$
provided that the difference $\operatorname{rank}(P_{\infty,t|t-1}) - \operatorname{rank}(\Sigma_{\infty,t|t-1})$ is nonnegative; see also Ansley and Kohn (1985). For example, with the initialization $P_{*} = 0$ and $P_{\infty} = I_r$, it is straightforward to show that
$$\begin{aligned} P_{\infty,2|1} &= FP_{H}F',\\ \Sigma_{\infty,2|1} &= H'FP_{H}F'H, \end{aligned}$$
where $P_H = I_r - H(H'H)^{-1}H'$ and $\operatorname{rank}(P_H) = r - n$. This means that the rank of $P_{\infty,2|1}$ is less than or equal to the minimum of $r - n$ and the rank of $F$. As long as $\Sigma_{\infty,t|t-1}$ is nonsingular it follows that the rank of $P_{\infty,t+1|t}$ drops quickly towards zero.

[36] Notice that the second term within parentheses in the expression for $K_{*,t}$ is subtracted from the first rather than added to it. The sign error in Koopman and Durbin (2003, page 89) can be verified using equations (5.7), (5.10), (5.12), and (5.14) in Durbin and Koopman (2012).
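The rank argument in the example with $P_{*} = 0$ and $P_{\infty} = I_r$ can be checked numerically. The sketch below (Python/NumPy; the function name is hypothetical) forms $P_H = I_r - H(H'H)^{-1}H'$ and $P_{\infty,2|1} = FP_HF'$, assuming that $H$ has full column rank $n$, and reports ranks using NumPy's default singular-value tolerance in `numpy.linalg.matrix_rank`.

```python
import numpy as np


def rank_after_first_step(F, H):
    """Ranks of P_H and P_{inf,2|1} = F P_H F' when P_inf = I_r and P_* = 0."""
    r = H.shape[0]
    P_H = np.eye(r) - H @ np.linalg.solve(H.T @ H, H.T)   # annihilates the columns of H
    P_inf_21 = F @ P_H @ F.T
    return int(np.linalg.matrix_rank(P_H)), int(np.linalg.matrix_rank(P_inf_21))


# r = 5 state variables, n = 2 observed variables
H = np.array([[1.0, 0.0], [0.0, 1.0], [0.3, 0.2], [0.1, 0.4], [0.0, 0.5]])
F_full = np.diag([0.9, 0.8, 0.7, 0.6, 0.5])   # rank(F) = 5
F_low = np.diag([0.9, 0.8, 0.0, 0.0, 0.0])    # rank(F) = 2
print(rank_after_first_step(F_full, H))       # (3, 3): bounded by r - n = 3
print(rank_after_first_step(F_low, H))        # (3, 2): bounded by rank(F) = 2
```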
From the updating equation for $P_{\infty,t+1|t}$ in (5.87) it can be seen that if a row of $F$ is equal to a row of the identity matrix, i.e., the corresponding state variable has a unit root, then the diagonal element in $P_{\infty,t+1|t}$ is equal to the same element in $P_{\infty,t|t-1}$. This means that the rank of $P_{\infty,t|t-1}$ will never converge to zero, and whether the diffuse Kalman filter shifts to the standard filter depends on the development of the covariance matrix $\Sigma_{\infty,t|t-1}$. Moreover, in the case of near non-stationarity where, say, a state variable follows an AR(1) process with autoregressive coefficient close to unity, the diffuse Kalman filter need not converge to the standard filter within the chosen sample period.
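When $\Sigma_{\infty,t|t-1}$ is nonsingular, we may deduce from the update equation (5.85) that the update estimator of the state shock is given by
$$v_{t|t} = Qr_{\infty,t|t}, \tag{5.90}$$
while the update measurement error is zero, i.e., $w_{t|t} = 0$. Similarly, when $\Sigma_{\infty,t|t-1} = 0$ it can be shown from equation (5.88) that the update estimator of the state shock is
$$v_{t|t} = Qr_{*,t|t}, \tag{5.91}$$
while the update estimator of the measurement error is
$$w_{t|t} = R\Sigma_{*,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right). \tag{5.92}$$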
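The log-likelihood function can also be computed for periods $t = 1,\ldots,d$ as shown by Koopman (1997). In case $\Sigma_{\infty,t|t-1}$ is nonsingular, the time $t$ log-likelihood is equal to
$$\ln p\left(y_t|x_t,\mathcal{Y}_{t-1};\theta\right) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln\left|\Sigma_{\infty,t|t-1}\right|, \tag{5.93}$$
and if $\Sigma_{\infty,t|t-1} = 0$ we have that
$$\ln p\left(y_t|x_t,\mathcal{Y}_{t-1};\theta\right) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln\left|\Sigma_{*,t|t-1}\right| - \frac{1}{2}\left(y_t - y_{t|t-1}\right)'\Sigma_{*,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right). \tag{5.94}$$
As expected, the log-likelihood in the latter case is exactly the same as for the standard filter; cf. equation (5.19).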
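A minimal sketch of the period-$t$ log-likelihood contributions (5.93) and (5.94) follows (Python/NumPy). The function name is hypothetical, and deciding that $\Sigma_{\infty,t|t-1} = 0$ through an absolute tolerance is an assumption. Log-determinants are computed with `numpy.linalg.slogdet` to avoid overflow in larger systems; note that the forecast error does not enter (5.93) at all.

```python
import numpy as np


def diffuse_loglik_term(err, S_inf, S_star, tol=1e-12):
    """ln p(y_t | x_t, Y_{t-1}; theta) during the diffuse phase (hypothetical helper).

    err = y_t - y_{t|t-1}; S_inf and S_star are Sigma_{inf,t|t-1} and
    Sigma_{*,t|t-1}. Assumes S_inf is either (numerically) zero or nonsingular.
    """
    n = err.shape[0]
    if np.max(np.abs(S_inf)) < tol:
        # eq. (5.94): same form as the standard filter
        _, logdet = np.linalg.slogdet(S_star)
        quad = err @ np.linalg.solve(S_star, err)
        return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
    # eq. (5.93): no quadratic form in the forecast error
    _, logdet = np.linalg.slogdet(S_inf)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet)
```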
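The diffuse smoother runs backwards for $t = d, d-1,\ldots,1$ and is initialized by
$$r_{d+1|T}^{(0)} = r_{d+1|T}, \qquad r_{d+1|T}^{(1)} = 0, \tag{5.95}$$
$$N_{d+1|T}^{(0)} = N_{d+1|T}, \qquad N_{d+1|T}^{(1)} = N_{d+1|T}^{(2)} = 0. \tag{5.96}$$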
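For the case when $\Sigma_{\infty,t|t-1}$ is nonsingular it is shown by Koopman and Durbin (2003) that
$$\begin{aligned}
r_{t|T}^{(0)} &= L_{\infty,t}'r_{t+1|T}^{(0)},\\
r_{t|T}^{(1)} &= H\left[\Sigma_{\infty,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right) - K_{*,t}'r_{t+1|T}^{(0)}\right] + L_{\infty,t}'r_{t+1|T}^{(1)},
\end{aligned} \tag{5.97}$$
$$\begin{aligned}
N_{t|T}^{(0)} &= L_{\infty,t}'N_{t+1|T}^{(0)}L_{\infty,t},\\
N_{t|T}^{(1)} &= H\Sigma_{\infty,t|t-1}^{-1}H' + L_{\infty,t}'N_{t+1|T}^{(1)}L_{\infty,t} - HK_{*,t}'N_{t+1|T}^{(0)}L_{\infty,t},\\
N_{t|T}^{(2)} &= -H\Sigma_{\infty,t|t-1}^{-1}\Sigma_{*,t|t-1}\Sigma_{\infty,t|t-1}^{-1}H' + L_{\infty,t}'N_{t+1|T}^{(2)}L_{\infty,t} - L_{\infty,t}'N_{t+1|T}^{(1)}K_{*,t}H'\\
&\quad - HK_{*,t}'N_{t+1|T}^{(1)\prime}L_{\infty,t} + HK_{*,t}'N_{t+1|T}^{(0)}K_{*,t}H'.
\end{aligned} \tag{5.98}$$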
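If instead $\Sigma_{\infty,t|t-1} = 0$, the smoothing recursions become
$$\begin{aligned}
r_{t|T}^{(0)} &= H\Sigma_{*,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right) + L_{*,t}'r_{t+1|T}^{(0)},\\
r_{t|T}^{(1)} &= F'r_{t+1|T}^{(1)},
\end{aligned} \tag{5.99}$$
$$\begin{aligned}
N_{t|T}^{(0)} &= H\Sigma_{*,t|t-1}^{-1}H' + L_{*,t}'N_{t+1|T}^{(0)}L_{*,t},\\
N_{t|T}^{(1)} &= F'N_{t+1|T}^{(1)}L_{*,t},\\
N_{t|T}^{(2)} &= F'N_{t+1|T}^{(2)}F.
\end{aligned} \tag{5.100}$$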
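The smoothing recursions in (5.97)–(5.98) and (5.99)–(5.100) may also be written more compactly as in Koopman and Durbin (2003, Section 4). Specifically, let $\tilde{H} = I_2 \otimes H$, whereas
$$\tilde{r}_{t|T} = \begin{bmatrix} r_{t|T}^{(0)} \\ r_{t|T}^{(1)} \end{bmatrix}, \quad \tilde{N}_{t|T} = \begin{bmatrix} N_{t|T}^{(0)} & N_{t|T}^{(1)\prime} \\ N_{t|T}^{(1)} & N_{t|T}^{(2)} \end{bmatrix}, \quad \tilde{z}_t = \begin{bmatrix} y_t - y_{t|t-1} \\ 0 \end{bmatrix},$$
and the vector $\tilde{z}_t$ is $2n$-dimensional. If $\Sigma_{\infty,t|t-1}$ is nonsingular we let
$$\tilde{\Sigma}_{t|t-1} = \begin{bmatrix} 0 & \Sigma_{\infty,t|t-1}^{-1} \\ \Sigma_{\infty,t|t-1}^{-1} & -\Sigma_{\infty,t|t-1}^{-1}\Sigma_{*,t|t-1}\Sigma_{\infty,t|t-1}^{-1} \end{bmatrix}, \quad \tilde{L}_t = \begin{bmatrix} L_{\infty,t} & -K_{*,t}H' \\ 0 & L_{\infty,t} \end{bmatrix}.$$
On the other hand, when $\Sigma_{\infty,t|t-1} = 0$ we instead define
$$\tilde{\Sigma}_{t|t-1} = \begin{bmatrix} \Sigma_{*,t|t-1}^{-1} & 0 \\ 0 & 0 \end{bmatrix}, \quad \tilde{L}_t = \begin{bmatrix} L_{*,t} & 0 \\ 0 & F \end{bmatrix}.$$
For both cases, the recursions in (5.97)–(5.98) and (5.99)–(5.100) may be expressed as
$$\tilde{r}_{t|T} = \tilde{H}\tilde{\Sigma}_{t|t-1}\tilde{z}_t + \tilde{L}_t'\tilde{r}_{t+1|T}, \tag{5.101}$$
$$\tilde{N}_{t|T} = \tilde{H}\tilde{\Sigma}_{t|t-1}\tilde{H}' + \tilde{L}_t'\tilde{N}_{t+1|T}\tilde{L}_t. \tag{5.102}$$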
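The compact recursions (5.101)–(5.102) translate directly into block-matrix code. The sketch below (Python/NumPy, hypothetical function name) performs one backward step for either case. It recomputes the gain matrices from $P_{\infty,t|t-1}$ and $P_{*,t|t-1}$ instead of storing them from the filter, which is a simplification made for illustration and not necessarily how YADA organizes the computation; the zero test on $\Sigma_{\infty,t|t-1}$ again uses an assumed tolerance.

```python
import numpy as np


def compact_diffuse_smoother_step(r_next, N_next, err, P_inf, P_star, H, F, R, tol=1e-12):
    """One backward step of (5.101)-(5.102) (hypothetical helper).

    r_next: stacked (r0', r1')' of length 2r; N_next: 2r x 2r matrix
    [[N0, N1'], [N1, N2]]; err = y_t - y_{t|t-1}.
    """
    r, n = H.shape
    S_inf = H.T @ P_inf @ H
    S_star = H.T @ P_star @ H + R
    Z_nn, Z_rr = np.zeros((n, n)), np.zeros((r, r))
    if np.max(np.abs(S_inf)) < tol:                       # Sigma_inf = 0
        S_star_inv = np.linalg.inv(S_star)
        K_star = F @ P_star @ H @ S_star_inv
        Sig = np.block([[S_star_inv, Z_nn], [Z_nn, Z_nn]])
        L_tilde = np.block([[F - K_star @ H.T, Z_rr], [Z_rr, F]])
    else:                                                 # Sigma_inf nonsingular
        S_inf_inv = np.linalg.inv(S_inf)
        K_inf = F @ P_inf @ H @ S_inf_inv
        K_star = (F @ P_star @ H - K_inf @ S_star) @ S_inf_inv
        L_inf = F - K_inf @ H.T
        Sig = np.block([[Z_nn, S_inf_inv],
                        [S_inf_inv, -S_inf_inv @ S_star @ S_inf_inv]])
        L_tilde = np.block([[L_inf, -K_star @ H.T], [Z_rr, L_inf]])
    H_tilde = np.kron(np.eye(2), H)                       # I_2 (x) H
    z_tilde = np.concatenate([err, np.zeros(n)])          # (err', 0')'
    r_t = H_tilde @ Sig @ z_tilde + L_tilde.T @ r_next
    N_t = H_tilde @ Sig @ H_tilde.T + L_tilde.T @ N_next @ L_tilde
    return r_t, N_t
```

Splitting `r_t` into its first and last $r$ elements reproduces $r^{(0)}_{t|T}$ and $r^{(1)}_{t|T}$ in (5.97) or (5.99), which is a convenient consistency check on the block layout.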
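For the state shocks and the measurement errors, a nonsingular $\Sigma_{\infty,t|t-1}$ matrix yields the following smooth estimates
$$v_{t|T} = Qr_{t|T}^{(0)}, \qquad w_{t|T} = -RK_{\infty,t}'r_{t+1|T}^{(0)}. \tag{5.103}$$
When $\Sigma_{\infty,t|t-1} = 0$, the smooth estimate of the state shocks is again $v_{t|T} = Qr_{t|T}^{(0)}$, while the measurement errors are estimated by
$$w_{t|T} = R\left[\Sigma_{*,t|t-1}^{-1}\left(y_t - y_{t|t-1}\right) - K_{*,t}'r_{t+1|T}^{(0)}\right]. \tag{5.104}$$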
Kalman filtering and smoothing when $\Sigma_{\infty,t|t-1}$ is singular but has at least rank one is covered by the univariate approach discussed in Section 5.15.2. When YADA encounters this case it automatically switches to a univariate routine for this time period and reverts back to the multivariate routine when $\Sigma_{\infty,t|t-1}$ is either nonsingular or zero. The multivariate approach in Ansley and Kohn (1985) and Koopman (1997) is not covered since it is mathematically complex and computationally inefficient.
5.15. A Univariate Approach to the Multivariate Kalman Filter
The standard approach to Kalman filtering and smoothing is based on taking the full observation vector into account at each point in time. The basic idea of univariate filtering is to incrementally add the individual elements of the vector of observed variables; see Anderson and Moore (1979, Chapter 6.4), Koopman and Durbin (2000), and Durbin and Koopman (2012, Chapter 6.4). One important reason for considering such an approach is computational efficiency, which can be particularly relevant when diffuse initialization is considered. In this subsection we shall first consider the univariate approach, also known as sequential processing, under standard initialization of the state vector, and thereafter the case of diffuse initialization.
5.15.1. Univariate Filtering and Smoothing with Standard Initialization
In order to make use of univariate filtering and smoothing, one must first deal with the issue of correlated measurement errors. One solution is to move them to the vector of state variables and shocks. Alternatively, the measurement equation can be transformed such that the measurement errors become uncorrelated through the transformation. This latter case may be handled by a Schur decomposition of $R = S\Lambda S'$, where $\Lambda$ is diagonal and holds the eigenvalues of $R$, while $S$ is orthogonal, i.e., $S'S = I_n$.[37] By premultiplying the measurement equation by $S'$ we get
$$\bar{y}_t = \bar{A}'x_t + \bar{H}'\xi_t + \bar{w}_t, \quad t = 1,\ldots,T,$$
where $\bar{y}_t = S'y_t$, $\bar{A} = AS$, $\bar{H} = HS$, and $\bar{w}_t = S'w_t$. This means that $E[\bar{w}_t\bar{w}_t'] = \Lambda$, a diagonal matrix, while the state equation is unaffected by the transformation. Since $r$ can be a large number for DSGE models, YADA always transforms the measurement equation when $R$ is not diagonal, rather than augmenting the state vector.
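The decorrelating transformation can be sketched with NumPy's symmetric eigendecomposition `numpy.linalg.eigh`, which for a symmetric $R$ coincides with the Schur decomposition. The function name below is hypothetical; `y` may hold a single observation vector or an $n \times T$ matrix of observations.

```python
import numpy as np


def decorrelate_measurement(y, A, H, R):
    """Transform y_t = A'x_t + H'xi_t + w_t by S', where R = S Lambda S'.

    Returns S'y, A S, H S, the eigenvalues Lambda (as a vector) and S, so that
    the transformed measurement errors S'w_t have covariance diag(Lambda).
    """
    lam, S = np.linalg.eigh(R)          # R symmetric: S orthogonal, lam real
    return S.T @ y, A @ S, H @ S, lam, S
```

The state equation is untouched, so only $A$, $H$, $R$ and the data need to be passed through this step before running the univariate recursions.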
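Assuming that $R$ is diagonal, the Kalman forecasting and updating equations can be determined from the following univariate filtering equations
$$\xi_{t,i+1} = \xi_{t,i} + K_{t,i}\sigma_{t,i}^{-2}z_{t,i}, \tag{5.105}$$
$$P_{t,i+1} = P_{t,i} - K_{t,i}\sigma_{t,i}^{-2}K_{t,i}', \tag{5.106}$$
where
$$z_{t,i} = y_{t,i} - A_i'x_t - H_i'\xi_{t,i}, \tag{5.107}$$
$$\sigma_{t,i}^{2} = H_i'P_{t,i}H_i + R_i, \tag{5.108}$$
$$K_{t,i} = P_{t,i}H_i, \tag{5.109}$$
for $i = 1,\ldots,n_t$. Observed element $i$ in $y_t$ is denoted by $y_{t,i}$ and the corresponding columns of $A$ and $H$ are given by $A_i$ and $H_i$, respectively, while $R_i$ is the measurement error variance for $w_{t,i}$. The univariate transition equations are
$$\xi_{t+1,1} = F\xi_{t,n_t+1}, \tag{5.110}$$
$$P_{t+1,1} = FP_{t,n_t+1}F' + Q. \tag{5.111}$$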
It follows that the forecasting and updating estimates for the state variables and the corresponding covariance matrices are
$$\xi_{t+1|t} = \xi_{t+1,1}, \quad \xi_{t|t} = \xi_{t,n_t+1}, \quad P_{t+1|t} = P_{t+1,1}, \quad P_{t|t} = P_{t,n_t+1}. \tag{5.112}$$

[37] Since $R$ is a symmetric matrix, the Schur decomposition is identical to the eigenvalue decomposition.
Note that although the univariate Kalman filter requires looping over all $i = 1,\ldots,n_t$, it also avoids inverting $\Sigma_{y,t|t-1}$ and two matrix multiplications. For larger models, computational gains can therefore be quite important. Furthermore, the log-likelihood can also be determined without inverting $\Sigma_{y,t|t-1}$. Specifically, it can be shown that
$$\ln p\left(y_t|x_t,\mathcal{Y}_{t-1};\theta\right) = -\frac{1}{2}\sum_{i=1}^{n_t}\left[\ln(2\pi) + \ln\sigma_{t,i}^{2} + z_{t,i}^{2}/\sigma_{t,i}^{2}\right]. \tag{5.113}$$
Recall that we are assuming a diagonal covariance matrix for the measurement errors. This assumption is not a restriction for the log-likelihood since the orthogonal transformation matrix $S$ has determinant $\pm 1$, i.e., the Jacobian of the transformation is unity in absolute value and the value of the log-likelihood function is invariant to $S$.
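The sequential-processing recursions (5.105)–(5.109), together with the log-likelihood (5.113), can be sketched as follows for one time period. This Python/NumPy illustration assumes that the measurement equation has already been transformed so that $R$ is diagonal; the function name is hypothetical, and the transition (5.110)–(5.111) is left to the caller. The per-element quantities are returned because the smoother needs them.

```python
import numpy as np


def univariate_filter_period(xi, P, z, H, R_diag):
    """Process the n_t observed elements of period t one at a time.

    xi, P: xi_{t|t-1}, P_{t|t-1}; z = y_t - A'x_t; H: r x n_t; R_diag: diagonal of R.
    Returns xi_{t|t}, P_{t|t}, ln p(y_t | ...), and the stored z_{t,i},
    sigma^2_{t,i} and K_{t,i}.
    """
    loglik, zs, s2s, Ks = 0.0, [], [], []
    for i in range(z.shape[0]):
        h = H[:, i]
        z_i = z[i] - h @ xi                 # (5.107)
        s2 = h @ P @ h + R_diag[i]          # (5.108)
        K = P @ h                           # (5.109)
        xi = xi + K * (z_i / s2)            # (5.105)
        P = P - np.outer(K, K) / s2         # (5.106)
        loglik -= 0.5 * (np.log(2.0 * np.pi) + np.log(s2) + z_i**2 / s2)   # (5.113)
        zs.append(z_i)
        s2s.append(s2)
        Ks.append(K)
    return xi, P, loglik, zs, s2s, Ks
```

For the next period one would then set $\xi_{t+1|t} = F\xi_{t|t}$ and $P_{t+1|t} = FP_{t|t}F' + Q$. No matrix is inverted: only scalar divisions by $\sigma^2_{t,i}$ occur.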
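Smoothed estimates of the unobserved variables can likewise be determined from univariate smoothing recursions. Let $r_{T,n_T} = 0$ and $N_{T,n_T} = 0$ initialize the univariate smoother while
$$r_{t,i-1} = H_i\sigma_{t,i}^{-2}z_{t,i} + L_{t,i}'r_{t,i}, \tag{5.114}$$
$$N_{t,i-1} = H_i\sigma_{t,i}^{-2}H_i' + L_{t,i}'N_{t,i}L_{t,i}, \tag{5.115}$$
where
$$L_{t,i} = I_r - K_{t,i}H_i'\sigma_{t,i}^{-2}, \tag{5.116}$$
for $i = n_t,\ldots,1$ and $t = T,\ldots,1$. The transition from $t$ to $t-1$ is given by
$$r_{t-1,n_{t-1}} = F'r_{t,0}, \qquad N_{t-1,n_{t-1}} = F'N_{t,0}F, \tag{5.117}$$
and the smoothed innovation vector and its covariance matrix are
$$r_{t|T} = r_{t,0}, \qquad N_{t|T} = N_{t,0}. \tag{5.118}$$
Smooth estimates of the state variables and their covariances satisfy equations (5.22) and (5.24), while smooth estimates of the measurement errors and the state shocks can be computed from the smooth innovations as in Section 5.6.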
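The backward recursions (5.114)–(5.118) for one period can be sketched as below, reusing the quantities $z_{t,i}$, $\sigma^2_{t,i}$ and $K_{t,i}$ stored during the forward pass. This Python/NumPy function has a hypothetical name and is not YADA's UnivariateStateSmoother; it takes $r_{t+1|T}$ and $N_{t+1|T}$ as inputs (zeros after period $T$) and applies the transition (5.117) first.

```python
import numpy as np


def univariate_smoother_period(r_next, N_next, zs, s2s, Ks, H, F):
    """Return r_{t|T} and N_{t|T} from r_{t+1|T} and N_{t+1|T}."""
    dim = F.shape[0]
    r = F.T @ r_next                    # (5.117): r_{t,n_t} = F' r_{t+1,0}
    N = F.T @ N_next @ F                # (5.117): N_{t,n_t} = F' N_{t+1,0} F
    for i in reversed(range(len(zs))):
        h = H[:, i]
        L = np.eye(dim) - np.outer(Ks[i], h) / s2s[i]    # (5.116)
        r = h * (zs[i] / s2s[i]) + L.T @ r                 # (5.114)
        N = np.outer(h, h) / s2s[i] + L.T @ N @ L          # (5.115)
    return r, N                         # (5.118): r_{t|T} = r_{t,0}, N_{t|T} = N_{t,0}
```

In exact arithmetic the result coincides with the multivariate smoother, $r_{t|T} = H\Sigma_{y,t|t-1}^{-1}(y_t - y_{t|t-1}) + L_t'r_{t+1|T}$ with $L_t = F - K_tH'$, which is what the checks below compare against.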
The above algorithm may also be used for computing update estimates of the state shocks (and measurement errors) with the $r_{t|t}$ innovation vector. Let $u_{t,n_t} = 0$ for $t = 1,\ldots,T$ and consider the recursion
$$u_{t,i-1} = H_i\sigma_{t,i}^{-2}z_{t,i} + L_{t,i}'u_{t,i}, \quad i = n_t,\ldots,1. \tag{5.119}$$
The update estimates of the state shocks and the measurement errors are then obtained from $u_{t,0}$ in equation (5.120).
For the calculation of the measurement error in (5.120) it is important to note that the original $H$ and $R$ matrices are used, instead of those obtained when transforming the measurement equation such that the errors have a diagonal covariance matrix. Naturally, the update estimator of the measurement errors may also be computed directly from the measurement equation using the update estimator of the state variables.
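5.15.2. Univariate Filtering and Smoothing with Diffuse Initialization
The diffuse initialization of the Kalman filter considered in Section 5.14 implies that the matrix $P_{t,i}$, the vector $K_{t,i}$ and the scalar $\sigma_{t,i}^{2}$ can be decomposed as
$$\begin{aligned} P_{t,i} &= cP_{\infty,t,i} + P_{*,t,i} + O(c^{-1}),\\ K_{t,i} &= cK_{\infty,t,i} + K_{*,t,i} + O(c^{-1}),\\ \sigma_{t,i}^{2} &= c\sigma_{\infty,t,i}^{2} + \sigma_{*,t,i}^{2} + O(c^{-1}), \end{aligned} \tag{5.121}$$
where
$$\begin{aligned} K_{\infty,t,i} &= P_{\infty,t,i}H_i, & K_{*,t,i} &= P_{*,t,i}H_i,\\ \sigma_{\infty,t,i}^{2} &= H_i'P_{\infty,t,i}H_i, & \sigma_{*,t,i}^{2} &= H_i'P_{*,t,i}H_i + R_i. \end{aligned} \tag{5.122}$$
Notice that $\sigma_{\infty,t,i}^{2} = 0$ implies that $K_{\infty,t,i} = 0$ since $P_{\infty,t,i}$ is positive semidefinite.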
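To obtain the diffuse filtering recursions, the scalar $\sigma_{t,i}^{-2}$ needs to be expanded as a power series in $c^{-j}$. This yields
$$\sigma_{t,i}^{-2} = \begin{cases} c^{-1}\sigma_{\infty,t,i}^{-2} - c^{-2}\sigma_{*,t,i}^{2}\sigma_{\infty,t,i}^{-4} + O(c^{-3}), & \text{if } \sigma_{\infty,t,i}^{2} > 0,\\ \sigma_{*,t,i}^{-2}, & \text{otherwise.} \end{cases}$$
This is easily established by verifying that $(c\sigma_{\infty,t,i}^{2} + \sigma_{*,t,i}^{2})\sigma_{t,i}^{-2}$ equals unity up to terms of order $c^{-2}$. Provided $\sigma_{\infty,t,i}^{2} > 0$, equations (5.105) and (5.106) then give us
$$\xi_{t,i+1} = \xi_{t,i} + K_{\infty,t,i}\sigma_{\infty,t,i}^{-2}z_{t,i}, \tag{5.123}$$
$$P_{\infty,t,i+1} = P_{\infty,t,i} - K_{\infty,t,i}K_{\infty,t,i}'\sigma_{\infty,t,i}^{-2}, \tag{5.124}$$
$$P_{*,t,i+1} = P_{*,t,i} + K_{\infty,t,i}K_{\infty,t,i}'\sigma_{*,t,i}^{2}\sigma_{\infty,t,i}^{-4} - \left(K_{*,t,i}K_{\infty,t,i}' + K_{\infty,t,i}K_{*,t,i}'\right)\sigma_{\infty,t,i}^{-2}, \tag{5.125}$$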
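for $i = 1,\ldots,n_t$. When $\sigma_{\infty,t,i}^{2} = 0$, the univariate filtering equations for the standard initialization apply, i.e.,
$$\xi_{t,i+1} = \xi_{t,i} + K_{*,t,i}\sigma_{*,t,i}^{-2}z_{t,i}, \tag{5.126}$$
$$P_{\infty,t,i+1} = P_{\infty,t,i}, \tag{5.127}$$
$$P_{*,t,i+1} = P_{*,t,i} - K_{*,t,i}K_{*,t,i}'\sigma_{*,t,i}^{-2}, \tag{5.128}$$
for $i = 1,\ldots,n_t$. Notice that $P_{*,t,i+1}$ plays the role of $P_{t,i+1}$ under standard initialization and that the filter under diffuse initialization is augmented with equation (5.127). The transition from $t$ to $t+1$ satisfies the following
$$\xi_{t+1,1} = F\xi_{t,n_t+1}, \tag{5.129}$$
$$P_{\infty,t+1,1} = FP_{\infty,t,n_t+1}F', \tag{5.130}$$
$$P_{*,t+1,1} = FP_{*,t,n_t+1}F' + Q. \tag{5.131}$$
During the initialization sample the log-likelihood can be computed from the univariate terms[38]
$$\ln p\left(y_t|x_t,\mathcal{Y}_{t-1};\theta\right) = \sum_{i=1}^{n_t}\ln p_{t,i}, \tag{5.133}$$
where
$$\ln p_{t,i} = \begin{cases} -\frac{1}{2}\left[\ln(2\pi) + \ln\sigma_{\infty,t,i}^{2}\right], & \text{if } \sigma_{\infty,t,i}^{2} > 0,\\ -\frac{1}{2}\left[\ln(2\pi) + \ln\sigma_{*,t,i}^{2} + z_{t,i}^{2}/\sigma_{*,t,i}^{2}\right], & \text{otherwise.} \end{cases}$$

[38] See Ansley and Kohn (1985, 1990), Koopman (1997), or Section 5.14.1.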
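The exact diffuse univariate step for one period, (5.122)–(5.128), together with the terms $\ln p_{t,i}$, can be sketched as follows. In this Python/NumPy illustration the function name is hypothetical, and the test $\sigma^{2}_{\infty,t,i} > 0$ uses an absolute tolerance `tol`, which is an assumption: as noted at the start of Section 5.14, the choice of such a threshold matters under near unit roots.

```python
import numpy as np


def diffuse_univariate_period(xi, P_inf, P_star, z, H, R_diag, tol=1e-10):
    """Exact diffuse univariate update for period t (hypothetical helper).

    xi, P_inf, P_star: xi_{t,1}, P_{inf,t,1}, P_{*,t,1}; z = y_t - A'x_t with
    diagonal R. Returns xi_{t|t}, P_{inf,t|t}, P_{*,t|t} and sum_i ln p_{t,i}.
    """
    loglik = 0.0
    for i in range(z.shape[0]):
        h = H[:, i]
        z_i = z[i] - h @ xi                                   # (5.107)
        K_inf, K_star = P_inf @ h, P_star @ h                 # (5.122)
        s2_inf, s2_star = h @ K_inf, h @ K_star + R_diag[i]   # (5.122)
        if s2_inf > tol:
            xi = xi + K_inf * (z_i / s2_inf)                  # (5.123)
            P_star = (P_star + np.outer(K_inf, K_inf) * s2_star / s2_inf**2
                      - (np.outer(K_star, K_inf) + np.outer(K_inf, K_star)) / s2_inf)  # (5.125)
            P_inf = P_inf - np.outer(K_inf, K_inf) / s2_inf   # (5.124)
            loglik -= 0.5 * (np.log(2.0 * np.pi) + np.log(s2_inf))
        else:
            xi = xi + K_star * (z_i / s2_star)                # (5.126); P_inf kept, (5.127)
            P_star = P_star - np.outer(K_star, K_star) / s2_star               # (5.128)
            loglik -= 0.5 * (np.log(2.0 * np.pi) + np.log(s2_star) + z_i**2 / s2_star)
    return xi, P_inf, P_star, loglik
```

Both covariance updates in the first branch use the gains computed from the period's incoming $P_{\infty,t,i}$ and $P_{*,t,i}$, which is why $P_{*,t,i+1}$ is formed before $P_{\infty,t,i}$ is overwritten.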
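Defining
$$\tilde{r}_{t,i} = \begin{bmatrix} r_{t,i}^{(0)} \\ r_{t,i}^{(1)} \end{bmatrix}, \quad \tilde{N}_{t,i} = \begin{bmatrix} N_{t,i}^{(0)} & N_{t,i}^{(1)\prime} \\ N_{t,i}^{(1)} & N_{t,i}^{(2)} \end{bmatrix},$$
it can be shown using equations (5.114) and (5.115) that for $\sigma_{\infty,t,i}^{2} > 0$
$$\tilde{r}_{t,i-1} = \begin{bmatrix} 0 \\ H_i\sigma_{\infty,t,i}^{-2}z_{t,i} \end{bmatrix} + \begin{bmatrix} L_{\infty,t,i} & L_{0,t,i} \\ 0 & L_{\infty,t,i} \end{bmatrix}'\tilde{r}_{t,i}, \quad i = n_t,\ldots,1, \tag{5.134}$$
where
$$\begin{aligned} L_{\infty,t,i} &= I_r - K_{\infty,t,i}H_i'\sigma_{\infty,t,i}^{-2},\\ L_{0,t,i} &= \left(K_{\infty,t,i}\sigma_{*,t,i}^{2}\sigma_{\infty,t,i}^{-2} - K_{*,t,i}\right)H_i'\sigma_{\infty,t,i}^{-2}. \end{aligned}$$
Furthermore,
$$\tilde{N}_{t,i-1} = \begin{bmatrix} 0 & H_iH_i'\sigma_{\infty,t,i}^{-2} \\ H_iH_i'\sigma_{\infty,t,i}^{-2} & -H_iH_i'\sigma_{*,t,i}^{2}\sigma_{\infty,t,i}^{-4} \end{bmatrix} + \tilde{L}_{t,i}'\tilde{N}_{t,i}\tilde{L}_{t,i}, \quad \tilde{L}_{t,i} = \begin{bmatrix} L_{\infty,t,i} & L_{0,t,i} \\ 0 & L_{\infty,t,i} \end{bmatrix}, \tag{5.135}$$
for $i = n_t,\ldots,1$, with transitions
$$\tilde{r}_{t-1,n_{t-1}} = (I_2 \otimes F')\tilde{r}_{t,0}, \qquad \tilde{N}_{t-1,n_{t-1}} = (I_2 \otimes F')\tilde{N}_{t,0}(I_2 \otimes F), \tag{5.136}$$
for $t = d,\ldots,1$.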
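If $\sigma_{\infty,t,i}^{2} = 0$, it can likewise be shown that
$$\tilde{r}_{t,i-1} = \begin{bmatrix} H_i\sigma_{*,t,i}^{-2}z_{t,i} \\ 0 \end{bmatrix} + \begin{bmatrix} L_{*,t,i} & 0 \\ 0 & I_r \end{bmatrix}'\tilde{r}_{t,i}, \quad i = n_t,\ldots,1, \tag{5.137}$$
$$\tilde{N}_{t,i-1} = \begin{bmatrix} H_iH_i'\sigma_{*,t,i}^{-2} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} L_{*,t,i} & 0 \\ 0 & I_r \end{bmatrix}'\tilde{N}_{t,i}\begin{bmatrix} L_{*,t,i} & 0 \\ 0 & I_r \end{bmatrix}, \tag{5.138}$$
for $i = n_t,\ldots,1$, where $L_{*,t,i} = I_r - K_{*,t,i}H_i'\sigma_{*,t,i}^{-2}$ and the identity block reflects that $K_{\infty,t,i} = 0$ in this case. The transition from $t$ to $t-1$ is again covered by equation (5.136).
For both these cases, the diffuse state smoothing equations are given by
$$\xi_{t|T} = \xi_{t,1} + \tilde{P}_{t,1}\tilde{r}_{t,0}, \tag{5.139}$$
$$P_{t|T} = P_{*,t,1} - \tilde{P}_{t,1}\tilde{N}_{t,0}\tilde{P}_{t,1}', \tag{5.140}$$
where $\tilde{P}_{t,1} = [P_{*,t,1}\ \ P_{\infty,t,1}]$, while the smooth estimates of the state shocks are
$$v_{t|T} = Qr_{t,0}^{(0)}. \tag{5.141}$$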
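We can also determine update estimates of the state shocks using the above algorithm. Specifically, let $u_{t,n_t}^{(0)} = u_{t,n_t}^{(1)} = 0$ for all $t$ and define the stacked vector
$$\tilde{u}_{t,i} = \begin{bmatrix} u_{t,i}^{(0)} \\ u_{t,i}^{(1)} \end{bmatrix}.$$
If $\sigma_{\infty,t,i}^{2} > 0$ we let
$$\tilde{u}_{t,i-1} = \begin{bmatrix} 0 \\ H_i\sigma_{\infty,t,i}^{-2}z_{t,i} \end{bmatrix} + \begin{bmatrix} L_{\infty,t,i} & L_{0,t,i} \\ 0 & L_{\infty,t,i} \end{bmatrix}'\tilde{u}_{t,i}, \tag{5.142}$$
and if $\sigma_{\infty,t,i}^{2} = 0$
$$\tilde{u}_{t,i-1} = \begin{bmatrix} H_i\sigma_{*,t,i}^{-2}z_{t,i} \\ 0 \end{bmatrix} + \begin{bmatrix} L_{*,t,i} & 0 \\ 0 & I_r \end{bmatrix}'\tilde{u}_{t,i}, \tag{5.143}$$
for $i = n_t,\ldots,1$. The update estimates of the state shocks are now given by equation (5.120) with $u_{t,0}$ being replaced with $u_{t,0}^{(0)}$. Update estimates of the measurement errors can likewise be determined by replacing $u_{t,0}$ with $u_{t,0}^{(0)}$ in this equation. As in the case of standard initialization for the univariate case, care has to be taken to ensure that the original $H$ and $R$ matrices are used, rather than those implied by the transformation $S$.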
5.16.1. Weights for the Forecasted State Variables under Diffuse Initialization
To determine the weights on the observed variables for the forecasted state variables under the assumption of diffuse initialization we need to take three cases into account. Namely, the covariance matrix $\Sigma_{\infty,t|t-1}$ has full rank $n_t$, it has rank zero, or it has rank less than $n_t$ but greater than zero. The last possibility is, as in Section 5.14, handled through the univariate filter with diffuse initialization.
When $\Sigma_{\infty,t|t-1}$ has full rank $n_t$, equation (5.84) for the state variable forecast can be rewritten as
$$\xi_{t+1|t} = L_{\infty,t}\xi_{t|t-1} + K_{\infty,t}z_t,$$
where $z_t = y_t - A'x_t$. Similarly, when $\Sigma_{\infty,t|t-1}$ has rank zero, equation (5.87) can be rearranged according to
$$\xi_{t+1|t} = L_{*,t}\xi_{t|t-1} + K_{*,t}z_t.$$
These two equations have the same general form as (5.43) in Section 5.9, thus suggesting how we may proceed to determine the observation weights.
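The third case to examine is when $\Sigma_{\infty,t|t-1}$ has reduced rank but is nonzero. From the univariate filter in Section 5.15.2 it may be recalled that $\xi_{t+1|t} = \xi_{t+1,1} = F\xi_{t,n_t+1} = F\xi_{t|t}$, while
$$\xi_{t,i+1} = \begin{cases} \xi_{t,i} + K_{\infty,t,i}\sigma_{\infty,t,i}^{-2}z_{t,i}, & \text{if } \sigma_{\infty,t,i}^{2} > 0,\\ \xi_{t,i} + K_{*,t,i}\sigma_{*,t,i}^{-2}z_{t,i}, & \text{otherwise.} \end{cases}$$
From equation (5.107) we know that $z_{t,i} = y_{t,i} - A_i'x_t - H_i'\xi_{t,i}$ and, hence, we may rewrite the above equation as
$$\xi_{t,i+1} = \bar{L}_{t,i}\xi_{t,i} + \bar{K}_{t,i}\bar{z}_{t,i}, \tag{5.144}$$
where $\bar{z}_{t,i} = y_{t,i} - A_i'x_t$, and
$$\bar{L}_{t,i} = \begin{cases} L_{\infty,t,i}, & \text{if } \sigma_{\infty,t,i}^{2} > 0,\\ L_{*,t,i}, & \text{otherwise,} \end{cases} \qquad \bar{K}_{t,i} = \begin{cases} K_{\infty,t,i}\sigma_{\infty,t,i}^{-2}, & \text{if } \sigma_{\infty,t,i}^{2} > 0,\\ K_{*,t,i}\sigma_{*,t,i}^{-2}, & \text{otherwise,} \end{cases}$$
with $L_{*,t,i} = I_r - K_{*,t,i}H_i'\sigma_{*,t,i}^{-2}$. From recursive substitution for $\xi_{t,i}$ in (5.144) it can be shown that
$$\xi_{t,n_t+1} = \left(\prod_{i=1}^{n_t}\bar{L}_{t,i}\right)\xi_{t,1} + \sum_{i=1}^{n_t}\left(\prod_{j=i+1}^{n_t}\bar{L}_{t,j}\right)\bar{K}_{t,i}\bar{z}_{t,i}, \tag{5.145}$$
where the matrix products are ordered with the highest index to the left and an empty product is the identity matrix.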
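Since the univariate filter is based on uncorrelated measurement errors, the transformation $R = S\Lambda S'$ discussed at the beginning of Section 5.15.1 must be taken into account. Letting $\bar{z}_t$ be the $n_t$-dimensional vector with typical element $\bar{z}_{t,i}$, we know that $\bar{z}_t = S'z_t$. This means that equation (5.145) can be rewritten as
$$\xi_{t+1|t} = F\bar{L}_t\xi_{t|t-1} + F\bar{K}_tz_t, \tag{5.146}$$
where
$$\bar{L}_t = \prod_{i=1}^{n_t}\bar{L}_{t,i}, \qquad \bar{K}_t = \begin{bmatrix} \prod_{j=2}^{n_t}\bar{L}_{t,j}\bar{K}_{t,1} & \cdots & \bar{L}_{t,n_t}\bar{K}_{t,n_t-1} & \bar{K}_{t,n_t} \end{bmatrix}S'.$$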
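The composition in (5.145)–(5.146) can be sketched as below (Python/NumPy; the helper name is hypothetical). The per-element transition matrices and gain vectors of (5.144) are taken as given, and the products are formed with later elements multiplying from the left, so that one call collapses the $n_t$ univariate steps of a period into a single linear map of $\xi_{t|t-1}$ and $z_t$.

```python
import numpy as np


def compose_period_weights(L_list, K_list, S):
    """Collapse xi_{t,i+1} = L_i xi_{t,i} + K_i zbar_i, i = 1..n_t, into
    xi_{t,n_t+1} = Lbar xi_{t,1} + Kbar z_t, where zbar_t = S' z_t.

    L_list: r x r matrices; K_list: r-vectors; S: orthogonal n_t x n_t matrix.
    """
    r, n_t = L_list[0].shape[0], len(L_list)
    Lbar = np.eye(r)
    for L in L_list:                      # Lbar = L_{n_t} ... L_1
        Lbar = L @ Lbar
    cols = []
    for i in range(n_t):                  # column i: L_{n_t} ... L_{i+1} K_i
        prod = np.eye(r)
        for j in range(i + 1, n_t):
            prod = L_list[j] @ prod
        cols.append(prod @ K_list[i])
    Kbar = np.column_stack(cols) @ S.T
    return Lbar, Kbar
```

Multiplying by $F$ afterwards gives the matrices $F\bar{L}_t$ and $F\bar{K}_t$ that enter (5.146).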
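Combining the results for the three cases concerning the rank of $\Sigma_{\infty,t|t-1}$ we find that
$$\xi_{t+1|t} = \tilde{L}_t\xi_{t|t-1} + \tilde{K}_tz_t, \quad t = 1,\ldots,d-1, \tag{5.147}$$
where
$$\tilde{L}_t = \begin{cases} L_{\infty,t}, & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) = n_t,\\ L_{*,t}, & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) = 0,\\ F\bar{L}_t, & \text{otherwise,} \end{cases}$$
and $\tilde{K}_t$ is defined analogously from $K_{\infty,t}$, $K_{*,t}$, and $F\bar{K}_t$. Letting $\omega_{\tau}(\cdot)$ denote the weight on $z_{\tau}$ and $\omega_0(\cdot)$ the weight on $\xi_{1|0}$, the observation weights for the forecasted state variables satisfy
$$\omega_{\tau}(\xi_{t+1|t}) = \begin{cases} \tilde{L}_t\,\omega_{\tau}(\xi_{t|t-1}), & \text{if } \tau = 1,\ldots,t-1,\\ \tilde{K}_t, & \text{if } \tau = t. \end{cases}$$
The weight on $\xi_{1|0}$ for the initialization sample is likewise given by
$$\omega_0(\xi_{t+1|t}) = \prod_{i=0}^{t-1}\tilde{L}_{t-i} = \tilde{L}_t\tilde{L}_{t-1}\cdots\tilde{L}_1.$$
For $t = d,\ldots,T-1$, the weights on the observed variables are determined as in Section 5.9 but with the caveat that the new weights for the initialization sample need to be used.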
It should be emphasized that if the univariate filtering under diffuse initialization is applied, then the following matrices are redefined as follows
$$\tilde{K}_t = F\bar{K}_t, \qquad \tilde{L}_t = F\bar{L}_t,$$
for the initialization sample.
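Because (5.147) is a linear recursion, the observation weights can be accumulated forward period by period. The Python/NumPy function below is a hypothetical sketch that takes the matrices $\tilde{L}_t$ and $\tilde{K}_t$ of the initialization sample as given and keeps every weight in memory; that storage choice is for illustration only and is not necessarily how YADA proceeds.

```python
import numpy as np


def forecast_weights(L_list, K_list):
    """Weights for xi_{t+1|t} = L_t xi_{t|t-1} + K_t z_t, t = 1..m.

    For the forecast xi_{t+1|t} (list index t-1), W0[t-1] is the weight on
    xi_{1|0} and W[t-1][tau-1] is the weight on z_tau, tau = 1..t.
    """
    r = L_list[0].shape[0]
    w0, w = np.eye(r), []
    W0, W = [], []
    for L, K in zip(L_list, K_list):
        w = [L @ w_tau for w_tau in w] + [K]   # tau < t: L_t times old weight; tau = t: K_t
        w0 = L @ w0                            # L_t L_{t-1} ... L_1
        W0.append(w0)
        W.append(w)
    return W0, W
```

The same pattern carries over to the update weights of Section 5.16.2, with $\tilde{M}_t$ and $\tilde{N}_t$ in place of the forecast matrices.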
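5.16.2. Weights for the Updated State Variable Projections under Diffuse Initialization
It is now straightforward to determine the weights on the observed variables over the initialization sample for the Kalman updater. Specifically, we now find that
$$\xi_{t|t} = \tilde{M}_t\xi_{t|t-1} + \tilde{N}_tz_t, \tag{5.148}$$
where
$$\tilde{M}_t = \begin{cases} I_r - P_{\infty,t|t-1}H\Sigma_{\infty,t|t-1}^{-1}H', & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) = n_t,\\ I_r - P_{*,t|t-1}H\Sigma_{*,t|t-1}^{-1}H', & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) = 0,\\ \bar{L}_t, & \text{otherwise,} \end{cases}$$
$$\tilde{N}_t = \begin{cases} P_{\infty,t|t-1}H\Sigma_{\infty,t|t-1}^{-1}, & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) = n_t,\\ P_{*,t|t-1}H\Sigma_{*,t|t-1}^{-1}, & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) = 0,\\ \bar{K}_t, & \text{otherwise.} \end{cases}$$
This means that equation (5.48) is also valid for the initialization sample $t = 1,\ldots,d$, except that the weights on the observed variables, with $\omega_{\tau}(\cdot)$ denoting the weight on $z_{\tau}$, are now determined by
$$\omega_{\tau}(\xi_{t|t}) = \begin{cases} \tilde{M}_t\,\omega_{\tau}(\xi_{t|t-1}), & \text{if } \tau = 1,\ldots,t-1,\\ \tilde{N}_t, & \text{if } \tau = t, \end{cases}$$
while the weights on the initial state for the initialization sample are
$$\omega_0(\xi_{t|t}) = \tilde{M}_t\,\omega_0(\xi_{t|t-1}).$$
If univariate filtering has been used, then the following matrices are redefined as follows
$$\tilde{M}_t = \bar{L}_t, \qquad \tilde{N}_t = \bar{K}_t,$$
for the initialization sample.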
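5.16.3. Weights for the Smoothed State Variable Projections under Diffuse Initialization
The smooth estimates of the state variables for the initialization sample can be expressed as
$$\xi_{t|T} = \xi_{t|t-1} + \tilde{P}_{t|t-1}\tilde{r}_{t|T}, \quad t = 1,\ldots,d,$$
where $\tilde{P}_{t|t-1} = [P_{*,t|t-1}\ \ P_{\infty,t|t-1}]$ and $\tilde{r}_{t|T}$ is the extended innovation vector in (5.101) or in (5.139). To determine observation weights for the smoothed state variables it is therefore convenient to first compute the observation weights for the innovation vector $\tilde{r}_{t|T}$.
In the event that $\Sigma_{\infty,t|t-1}$ has rank $n_t$ or zero it can be shown that
$$\tilde{r}_{t|T} = \tilde{H}\tilde{\Sigma}_{t|t-1}J_nz_t - \tilde{H}\tilde{\Sigma}_{t|t-1}J_nH'\xi_{t|t-1} + \tilde{L}_t'\tilde{r}_{t+1|T}, \tag{5.149}$$
where $J_n = [I_n\ \ 0]'$ is a $2n \times n$ matrix. When $\Sigma_{\infty,t|t-1}$ has reduced rank, the univariate smoothing recursions instead give
$$\tilde{r}_{t,i-1} = A_{t,i}z_{t,i} + B_{t,i}\tilde{r}_{t,i}, \quad i = n_t,\ldots,1, \tag{5.150}$$
where
$$A_{t,i} = \begin{cases} \begin{bmatrix} 0 \\ H_i\sigma_{\infty,t,i}^{-2} \end{bmatrix}, & \text{if } \sigma_{\infty,t,i}^{2} > 0,\\[1ex] \begin{bmatrix} H_i\sigma_{*,t,i}^{-2} \\ 0 \end{bmatrix}, & \text{otherwise,} \end{cases}$$
while
$$B_{t,i} = \begin{cases} \begin{bmatrix} L_{\infty,t,i} & L_{0,t,i} \\ 0 & L_{\infty,t,i} \end{bmatrix}', & \text{if } \sigma_{\infty,t,i}^{2} > 0,\\[1ex] \begin{bmatrix} L_{*,t,i} & 0 \\ 0 & I_r \end{bmatrix}', & \text{otherwise.} \end{cases}$$
By recursive substitution for $\tilde{r}_{t,i}$ in equation (5.150), making use of the transition equation (5.136) for the innovations, and the relation $\tilde{r}_{t,0} = \tilde{r}_{t|T}$, it can be shown that
$$\tilde{r}_{t|T} = \sum_{i=1}^{n_t}\left(\prod_{j=1}^{i-1}B_{t,j}\right)A_{t,i}z_{t,i} + \left(\prod_{i=1}^{n_t}B_{t,i}\right)(I_2 \otimes F')\tilde{r}_{t+1|T}, \tag{5.151}$$
where the products of $B_{t,j}$ are here ordered with the lowest index to the left.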
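To obtain an equation similar to (5.149) we need to find an expression for $z_{t,i}$ in terms of $z_t$ and $\xi_{t|t-1}$. By making use of equation (5.144) along with recursive substitution and recalling that $\xi_{t,1} = \xi_{t|t-1}$, some algebra leads to
$$\xi_{t,i} = \left(\prod_{j=1}^{i-1}\bar{L}_{t,j}\right)\xi_{t|t-1} + \sum_{k=1}^{i-1}\left(\prod_{j=k+1}^{i-1}\bar{L}_{t,j}\right)\bar{K}_{t,k}\bar{z}_{t,k}, \quad i = 1,\ldots,n_t, \tag{5.152}$$
where the products of $\bar{L}_{t,j}$ are ordered with the highest index to the left. Next, by substituting the right hand side of (5.152) into equation (5.107) for $\xi_{t,i}$ we obtain
$$z_{t,i} = \bar{z}_{t,i} - H_i'\sum_{k=1}^{i-1}\left(\prod_{j=k+1}^{i-1}\bar{L}_{t,j}\right)\bar{K}_{t,k}\bar{z}_{t,k} - H_i'\left(\prod_{j=1}^{i-1}\bar{L}_{t,j}\right)\xi_{t|t-1}, \quad i = 1,\ldots,n_t. \tag{5.153}$$
The first two terms on the right hand side of (5.153) can be further simplified to $C_{t,i}S'z_t$, where $C_{t,i}$ is the $1 \times n_t$ vector given by
$$C_{t,i} = \begin{bmatrix} -H_i'\prod_{j=2}^{i-1}\bar{L}_{t,j}\bar{K}_{t,1} & \cdots & -H_i'\bar{L}_{t,i-1}\bar{K}_{t,i-2} & -H_i'\bar{K}_{t,i-1} & 1 & 0 & \cdots & 0 \end{bmatrix},$$
for $i = 1,\ldots,n_t$. It follows that (5.151) can be rewritten as
$$\tilde{r}_{t|T} = \sum_{i=1}^{n_t}\left(\prod_{j=1}^{i-1}B_{t,j}\right)A_{t,i}C_{t,i}S'z_t - \sum_{i=1}^{n_t}\left(\prod_{j=1}^{i-1}B_{t,j}\right)A_{t,i}H_i'\left(\prod_{j=1}^{i-1}\bar{L}_{t,j}\right)\xi_{t|t-1} + \left(\prod_{i=1}^{n_t}B_{t,i}\right)(I_2 \otimes F')\tilde{r}_{t+1|T},$$
or more compactly
$$\tilde{r}_{t|T} = \bar{\mathcal{F}}_tz_t - \bar{\mathcal{G}}_t\xi_{t|t-1} + \bar{\mathcal{H}}_t\tilde{r}_{t+1|T}. \tag{5.154}$$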
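Combining the results for the rank of $\Sigma_{\infty,t|t-1}$ being $n_t$, zero, or between these numbers in equations (5.149) and (5.154) we find that
$$\tilde{r}_{t|T} = \mathcal{F}_tz_t - \mathcal{G}_t\xi_{t|t-1} + \mathcal{H}_t\tilde{r}_{t+1|T}, \tag{5.155}$$
where
$$\mathcal{F}_t = \begin{cases} \tilde{H}\tilde{\Sigma}_{t|t-1}J_n, & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) \in \{0, n_t\},\\ \bar{\mathcal{F}}_t, & \text{otherwise,} \end{cases}$$
$$\mathcal{G}_t = \begin{cases} \tilde{H}\tilde{\Sigma}_{t|t-1}J_nH', & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) \in \{0, n_t\},\\ \bar{\mathcal{G}}_t, & \text{otherwise,} \end{cases}$$
$$\mathcal{H}_t = \begin{cases} \tilde{L}_t', & \text{if } \operatorname{rank}(\Sigma_{\infty,t|t-1}) \in \{0, n_t\},\\ \bar{\mathcal{H}}_t, & \text{otherwise.} \end{cases}$$
If univariate filtering and smoothing has been applied, then the following equalities hold: $\mathcal{F}_t = \bar{\mathcal{F}}_t$, $\mathcal{G}_t = \bar{\mathcal{G}}_t$, and $\mathcal{H}_t = \bar{\mathcal{H}}_t$.
The observation weights for the innovations $\tilde{r}_{t|T}$ over the initialization sample ($t = 1,\ldots,d$) can now be established from (5.155). Specifically, with $\omega_{\tau}(\cdot)$ denoting the weight on $z_{\tau}$ and $\omega_0(\cdot)$ the weight on $\xi_{1|0}$,
$$\omega_{\tau}(\tilde{r}_{t|T}) = \begin{cases} \mathcal{H}_t\,\omega_{\tau}(\tilde{r}_{t+1|T}) - \mathcal{G}_t\,\omega_{\tau}(\xi_{t|t-1}), & \text{if } \tau = 1,\ldots,t-1,\\ \mathcal{F}_t + \mathcal{H}_t\,\omega_{t}(\tilde{r}_{t+1|T}), & \text{if } \tau = t,\\ \mathcal{H}_t\,\omega_{\tau}(\tilde{r}_{t+1|T}), & \text{if } \tau = t+1,\ldots,T. \end{cases}$$
The weights in the $2r \times n_{\tau}$ matrix $\omega_{\tau}(\tilde{r}_{t|T})$ are initialized at $t = d+1$ by
$$\omega_{\tau}(\tilde{r}_{d+1|T}) = \begin{bmatrix} \omega_{\tau}(r_{d+1|T}) \\ 0 \end{bmatrix}.$$
Furthermore, the weights on the initial state for $t = 1,\ldots,d$ are
$$\omega_0(\tilde{r}_{t|T}) = \mathcal{H}_t\,\omega_0(\tilde{r}_{t+1|T}) - \mathcal{G}_t\,\omega_0(\xi_{t|t-1}), \qquad \omega_0(\tilde{r}_{d+1|T}) = \begin{bmatrix} \omega_0(r_{d+1|T}) \\ 0 \end{bmatrix}.$$
The observation weights for the smoothed state variables over the initialization sample are then
$$\omega_{\tau}(\xi_{t|T}) = \begin{cases} \omega_{\tau}(\xi_{t|t-1}) + \tilde{P}_{t|t-1}\,\omega_{\tau}(\tilde{r}_{t|T}), & \text{if } \tau = 1,\ldots,t-1,\\ \tilde{P}_{t|t-1}\,\omega_{\tau}(\tilde{r}_{t|T}), & \text{if } \tau = t,\ldots,T. \end{cases}$$
The weights for the initial state are similarly equal to
$$\omega_0(\xi_{t|T}) = \omega_0(\xi_{t|t-1}) + \tilde{P}_{t|t-1}\,\omega_0(\tilde{r}_{t|T}).$$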
Since the measurement errors can be directly estimated using either updated or smoothed estimates of the state variables, the observation weights under diffuse initialization can be determined from the same relations. That is, the relationships in equations (5.55) and (5.56) are still valid for the smooth estimates of the measurement errors. When calculating weights for the update estimates of the measurement errors, the weights for the smooth state variables in (5.56) are replaced with the weights for the update estimates of the state variables, recalling that the update estimate $\xi_{t|t}$ puts zero weight on $z_{\tau}$ for $\tau \geq t+1$.
5.17. YADA Code
5.17.1. KalmanFilter(Ht)
The function KalmanFilter in YADA computes the value of the log-likelihood function in (5.18) for a given set of parameter values. It requires an $n \times T$ matrix $Y = [y_1 \cdots y_T]$ with the observed variables, a $k \times T$ matrix $X = [x_1 \cdots x_T]$ with exogenous variables, and parameter matrices $A$, $H$, $F$, $Q$, and $R$. The $F$ and $Q$ matrices are constructed based on the output from the solution to the DSGE model, while the $A$, $H$, and $R$ matrices are specified in a user-defined function that determines the measurement equation; cf. Section 17.4. Moreover, the vector with initial state values $\xi_{1|0}$ is needed. This input is denoted by KsiInit and is by default the zero vector.
Furthermore, KalmanFilter requires input on the variable initP. If this variable is 1, then the function calculates an initial value for the matrix $P_{1|0}$ as described in equation (5.15). If this variable is set to 2, then the doubling algorithm is used to calculate an approximation of this matrix (see DoublingAlgorithmLyapunov below). Next, the input variables MaxIter and Tolerance are accepted and are used by the doubling algorithm function. The input variable StartPeriod is used to start the sample at period $t_m \geq 1$. The default value of this parameter is 1, i.e., not to skip any observations. Moreover, the boolean variable AllowUnitRoot is needed to determine if undefined unit roots are accepted in the state equation or not. Finally, if initP is 3, then $P_{1|0} = cI_r$, where $c > 0$ needs to be specified; its default value is 100.
The function KalmanFilterHt takes exactly the same input variables as KalmanFilter. While the input variable H is $r \times n$ for the latter function, it is now $r \times n \times T$ for the former function. This means that KalmanFilterHt allows for a time-varying measurement matrix.
As output, KalmanFilter (KalmanFilterHt) provides lnL, the value of the log-likelihood function in (5.18), where the summation is taken from $t_m$ until $T$. Furthermore, output is optionally provided for $y_{t|t-1}$, $H'P_{t|t-1}H + R$ (or $H_t'P_{t|t-1}H_t + R$ when the measurement matrix is time-varying), $\xi_{t|t-1}$, $P_{t|t-1}$, and $\ln p(y_t|x_t,\mathcal{Y}_{t-1};\theta)$ from $t_m$ until $T$, etc. The dimensions of the outputs are:
lnL: scalar containing the value of the log-likelihood function in (5.18).
status: indicator variable being 0 if all the eigenvalues of $F$ are inside the unit circle, and 1 otherwise. In the latter case, KalmanFilter (KalmanFilterHt) sets initP to 3. In addition, this variable takes the value -1 in the event that the value of the log-likelihood function is not a real number. The latter can happen if the forecast covariance matrix of the observed variables, $H'P_{t|t-1}H + R$, is not positive definite for some time period $t$.
lnLt: $1 \times (T-t_m+1)$ vector $[\ln p(y_{t_m}|x_{t_m},\mathcal{Y}_{t_m-1};\theta) \cdots \ln p(y_T|x_T,\mathcal{Y}_{T-1};\theta)]$. [Optional]
Yhat: $n \times (T-t_m+1)$ matrix $[y_{t_m|t_m-1} \cdots y_{T|T-1}]$. [Optional]
MSEY: $n \times n \times (T-t_m+1)$ 3 dimensional matrix where MSEY(:,:,t-t_m+1) $= H'P_{t|t-1}H + R$. [Optional]
Ksitt1: $r \times (T-t_m+1)$ matrix $[\xi_{t_m|t_m-1} \cdots \xi_{T|T-1}]$. [Optional]
Ptt1: $r \times r \times (T-t_m+1)$ 3 dimensional matrix where Ptt1(:,:,t-t_m+1) $= P_{t|t-1}$. [Optional]
The inputs are given by Y, X, A, H, F, Q, R, KsiInit, initP, MaxIter, Tolerance, StartPeriod,
and c. All inputs are required by the function. The integer MaxIter is the maximum number
of iterations that the doubling algorithm can use when initP is 2. In this case, the parameter
Tolerance, i.e., the tolerance value for the algorithm, is also used.
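For reference, the doubling idea behind the initP = 2 option can be sketched as follows, assuming that all eigenvalues of $F$ lie inside the unit circle so that the unconditional state covariance solves the Lyapunov equation $P = FPF' + Q$. This Python/NumPy function is a hypothetical illustration, not YADA's DoublingAlgorithmLyapunov; its `max_iter` and `tol` arguments merely mirror the roles of MaxIter and Tolerance described above.

```python
import numpy as np


def doubling_lyapunov(F, Q, max_iter=100, tol=1e-12):
    """Approximate the solution of P = F P F' + Q by doubling.

    After k iterations P equals sum_{j=0}^{2^k - 1} F^j Q F'^j, using
    P_{k+1} = P_k + A_k P_k A_k' and A_{k+1} = A_k A_k with A_0 = F.
    """
    P, A = Q.copy(), F.copy()
    for _ in range(max_iter):
        P_new = P + A @ P @ A.T
        A = A @ A
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    return P
```

Each iteration doubles the number of terms in the series, so convergence typically takes only a handful of steps even when the largest eigenvalue of $F$ is fairly close to the unit circle.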
5.17.2. UnitRootKalmanFilter(Ht)
The function UnitRootKalmanFilter (UnitRootKalmanFilterHt) takes all the input variables that KalmanFilter (KalmanFilterHt) accepts. In addition, this unit-root consistent version of the Kalman filter needs to know the location of the stationary state variables. This input vector is given by StationaryPos. Using this information the function sets up an initial value for the rows and columns of $P_{1|0}$ using the algorithm determined through initP. If this integer is 1 or 2, then the rows and columns of $F$ and $Q$ determined by StationaryPos are used. The remaining entries of $P_{1|0}$ are set to zero if off-diagonal and to $c$ if diagonal.
The output variables from UnitRootKalmanFilter (UnitRootKalmanFilterHt) are identical to those from KalmanFilter (KalmanFilterHt).
5.17.3. StateSmoother(Ht)
The function StateSmoother (StateSmootherHt) computes $\xi_{t|t}$, $\xi_{t|T}$, $P_{t|t}$, $P_{t|T}$, and $\xi_{t-1|t}$ using $y_t$, $y_{t|t-1}$, $\xi_{t|t-1}$ and $P_{t|t-1}$ as well as the parameter matrices $H$, $R$, $F$, and $B_0$ as input. The dimensions of the outputs are:
Ksitt: $r \times (T-t_m+1)$ matrix $[\xi_{t_m|t_m} \cdots \xi_{T|T}]$.
Ptt: $r \times r \times (T-t_m+1)$ 3 dimensional matrix where Ptt(:,:,t-t_m+1) $= P_{t|t}$.
KsitT: $r \times (T-t_m+1)$ matrix $[\xi_{t_m|T} \cdots \xi_{T|T}]$.
PtT: $r \times r \times (T-t_m+1)$ 3 dimensional matrix where PtT(:,:,t-t_m+1) $= P_{t|T}$.
Ksit1t: $r \times (T-t_m+1)$ matrix with the 1-step smoothed projections $\xi_{t-1|t}$.
rtvec: $r \times 1$ vector with the smoothed innovation vector $r_{t_m|T}$. [Optional]
NtMat: $r \times r \times (T-t_m+1)$ matrix with the smoothed innovation covariances $N_{t|T}$. [Optional]
The required inputs are given by Y, Yhat, Ksitt1, Ptt1, H, F, R, and B0. For the StateSmoother version, the H matrix has dimension $r \times n$, while for StateSmootherHt it has dimension $r \times n \times T$.
5.17.4. SquareRootKalmanFilter(Ht)
The functions SquareRootKalmanFilter and SquareRootKalmanFilterHt compute the value of the log-likelihood function using the square root filter rather than the standard filter; cf. KalmanFilter and KalmanFilterHt. The functions take the same input variables as the standard filter functions except that Q is replaced with B0.
The required output variables are lnL and status. The optional variables are the same as for the standard Kalman filter functions. In addition, the functions can compute output variables Yerror ($n \times (T-t_m+1)$ matrix with the 1-step ahead forecast errors of the observed variables), SigmaSqRoot ($n \times n \times (T-t_m+1)$ matrix with the square root of the 1-step ahead forecast error covariance matrix of the observed variables), InvMSEY ($n \times n \times (T-t_m+1)$ matrix with the inverse of $\Sigma_{y,t|t-1}$), and KalmanGain ($r \times n \times (T-t_m+1)$ matrix with the Kalman gain matrix based on the square root calculations).
5.17.5. UnitRootSquareRootKalmanFilter(Ht)
The unit root consistent function UnitRootSquareRootKalmanFilter calculates the value of the log-likelihood function using the square root filter rather than the standard filter; cf. UnitRootKalmanFilter. The function takes the same input variables as the standard filter functions except that Q is replaced with B0.
The output variables are exactly the same as the function SquareRootKalmanFilter provides. The functions with Ht appended are mirror images except that they allow for a time-varying H matrix on the state variables in the measurement equation.
5.17.6. SquareRootSmoother(Ht)
The square root smoothers are computed with the aid of the function SquareRootSmoother for the case of a constant H matrix and by SquareRootSmootherHt for a time-varying H matrix. The input variables are slightly different than those accepted by the standard smoother functions, StateSmoother and StateSmootherHt. In particular, the square root smoother functions accept the input variables: Yerror, SigmaSqRoot, InvMSEY, KalmanGain, Ksitt1, Ptt1, H, F, and B0. The first 4 variables are output variables from the square root filter functions discussed above, while the remaining input variables are shared with the standard smoother functions.
The output variables are the same as those given by StateSmoother and StateSmootherHt.
5.17.7. UnivariateKalmanFilter(Ht)
The functions UnivariateKalmanFilter and UnivariateKalmanFilterHt compute the value of the log-likelihood function using the univariate filter rather than the standard filter; cf. KalmanFilter and KalmanFilterHt. The functions take the same input variables as the standard filter functions.
The required output variables are lnL and status. The optional variables are the same as for the standard Kalman filter functions. In addition, the functions can compute output variables sigma2i (cell array of dimension $1 \times T$ where the cells contain the $1 \times n$ vector with scalars $\sigma_{t,i}^{2}$ for $i = 1,\ldots,n$), Kti (cell array of dimension $1 \times T$ where the cells contain the $r \times n$ matrix whose columns are given by $K_{t,i}$ for $i = 1,\ldots,n$), zti (cell array of dimension $1 \times T$ where the cells contain the $1 \times n$ vector with scalars $z_{t,i}$ for $i = 1,\ldots,n$), and Hti (cell array of dimension $1 \times T$ where the cells contain the $r \times n$ matrix $H$, whose columns are given by $H_i$ for $i = 1,\ldots,n$). The variables are presented in Section 5.15.1.
5.17.8. UnitRootUnivariateKalmanFilter(Ht)
The function UnitRootUnivariateKalmanFilter (UnitRootUnivariateKalmanFilterHt) is responsible for computing the value of the log-likelihood function using the univariate filter rather
than the standard filter; see, for instance, UnitRootKalmanFilter. The function takes the same
input variables as the standard filter functions.
The required output variables are lnL and status. The optional variables are the same as for
the univariate Kalman filter function UnivariateKalmanFilter.
5.17.9. UnivariateStateSmoother
The univariate Kalman smoother is computed with the function UnivariateStateSmoother. The 11 input variables are given by zti, sigma2i, Hti, Kti, Ksitt1, Ptt1, Yerror, H, F, R, and B0. The first 4 input variables are given by the same named output variables of the univariate Kalman filter. The final 7 input variables are all used by the other smoother functions.
The output variables are the same as those given by the standard and square root smoother
functions.
5.17.10. KalmanFilterMO(Ht)
The standard Kalman filtering subject to possibly missing observations is handled by the function KalmanFilterMO for the case of a constant H matrix and by KalmanFilterMOHt for a time-varying matrix. The input and output variables are identical to those for the standard Kalman filter functions.
5.17.11. UnitRootKalmanFilterMO(Ht)
The standard Kalman filtering allowing for unit roots and possibly subject to missing observations is handled by UnitRootKalmanFilterMO for the case of a constant H matrix and by
UnitRootKalmanFilterMOHt for a time-varying matrix. The input and output variables are
identical to those for the standard Kalman filter functions that allow for unit roots.
5.17.12. SquareRootKalmanFilterMO(Ht)
The square root Kalman filtering subject to possibly missing observations is handled by the
function SquareRootKalmanFilterMO for the case of a constant H matrix and for a time-varying
H matrix by SquareRootKalmanFilterMOHt. The input and output variables are identical to
those for the square root Kalman filter functions.
5.17.13. UnitRootSquareRootKalmanFilterMO(Ht)
The square root Kalman filtering allowing for unit roots and possibly subject to missing observations is handled by UnitRootSquareRootKalmanFilterMO for the case of a constant H matrix
and for a time-varying H matrix by UnitRootSquareRootKalmanFilterMOHt. The input and
output variables are identical to those for the square root Kalman filter functions which allow
for unit roots.
5.17.14. UnivariateKalmanFilterMO(Ht)
The univariate Kalman filtering subject to possibly missing observations is handled by the function UnivariateKalmanFilterMO for the case of a constant H matrix and for a time-varying H
matrix by UnivariateKalmanFilterMOHt. The input and output variables are identical to those
for the univariate Kalman filter functions.
5.17.15. UnitRootUnivariateKalmanFilterMO(Ht)
The univariate Kalman filtering allowing for unit roots and possibly subject to missing observations is handled by UnitRootUnivariateKalmanFilterMO for the case of a constant H matrix
and for a time-varying H matrix by UnitRootUnivariateKalmanFilterMOHt. The input and
output variables are identical to those for the univariate Kalman filter functions which allow for
unit roots.
5.17.16. StateSmootherMO(Ht)
The standard Kalman smoothing subject to possibly missing observations is handled by the
function StateSmootherMO when the H matrix in the measurement equation is constant and
by StateSmootherMOHt when it is time-varying. The input and output variables are exactly the
same as those provided by the standard Kalman smoothing functions.
5.17.17. SquareRootSmootherMO(Ht)
The square root Kalman smoothing subject to possibly missing observations is handled by
SquareRootSmootherMO when the H matrix in the measurement equation is constant and by
SquareRootSmootherMOHt when it is time-varying. The input variables are exactly the same
as those needed by the square root Kalman smoothing functions without missing observations.
The output variables are extended with two optional variables, namely the r × T matrices rtt
and rtT with estimates of the update and smooth innovations, $r_{t|t}$ and $r_{t|T}$, respectively.
5.17.18. UnivariateStateSmootherMO
The univariate Kalman smoothing calculations subject to possibly missing observations are handled by the function UnivariateStateSmootherMO for the cases of a constant or a time-varying
H matrix in the measurement equation. The input variables are exactly the same as those
needed by the function UnivariateStateSmoother. The output variables are extended with
two optional variables, namely the r × T matrices rtt and rtT with estimates of the update
and smooth innovations, $r_{t|t}$ and $r_{t|T}$, respectively.
5.17.19. DiffuseKalmanFilter(MO)(Ht)
The functions DiffuseKalmanFilter(MO)(Ht) compute the standard Kalman filter with diffuse
initialization, where functions with MO added to the name handle possibly missing observations, and functions with Ht added cover a time-varying H matrix in the measurement
equation. The 11 input variables are Y, X, A, H, F, Q, R, KsiLast, StartPeriod, AllowUnitRoot,
and StationaryPos, which are all shared with other functions for the standard Kalman filter.
The functions provide 2 required and 7 optional output variables: lnL, status, lnLt, Yhat,
MSEY, Ksitt1, Ptt1, SigmaRank, and SmoothData. Only the last two are unique to functions
with diffuse initialization. The variable SigmaRank is a matrix with at most 2 rows and $d$
columns. The first row holds the rank of $\Sigma_{\infty,t|t-1}$ over the initialization sample, while the second
row (when it exists) holds the number of observed variables for the same sample. The last
output variable, SmoothData, is a structure with fields containing data needed for computing
the smooth and update estimates over the initialization sample.
5.17.20. DiffuseSquareRootKalmanFilter(MO)(Ht)
The functions DiffuseSquareRootKalmanFilter(MO)(Ht) compute the square-root Kalman filter with diffuse initialization. They share their 11 input variables with the functions for
the standard Kalman filter with diffuse initialization, except that Q is replaced with B0.
39
The Settings tab in YADA contains options for selecting the doubling algorithm rather than the vectorized solution
technique, and for selecting the maximum number of iterations and the tolerance level for the algorithm. The default
values are 100 and 1.0e-8, respectively.
6. Parameter Transformations
If some of the parameters in $\theta$ have a gamma, inverted gamma, left truncated normal, or Pareto
prior distribution, then the support for these parameters is bounded from below. Similarly, if
some of the parameters have a beta or uniform prior distribution, then the support is bounded
from below and above. Rather than maximizing the log posterior of $\theta$ subject to these bounds
on the support, it is common practice to transform the parameters of $\theta$ such that the support
of the transformed parameters is unbounded. Before turning the attention to posterior mode
estimation, the discussion will first consider the parameter transformations that YADA can apply.
6.1. Transformation Functions for the Original Parameters
For the $p_1$ parameters with a gamma, inverted gamma, left truncated normal, or Pareto prior
distribution, denoted by $\theta_1$, the transformation function that is typically applied is the natural
logarithm

$$\phi_{i,1} = \ln\left(\theta_{i,1} - c_{i,1}\right), \quad i = 1, \ldots, p_1, \tag{6.1}$$

where $c_{i,1}$ is the lower bound. Please note that YADA sets the lower bound of the Pareto distribution to $c + b$, where $c$ is the origin parameter and $b$ the location parameter for $y = c + z$ with
$z$ having lower bound $b$; cf. equation (4.33).
Letting $\theta_2$ denote the $p_2$ parameters of $\theta$ with a beta or uniform prior distribution, the transformation function is the generalized logit

$$\phi_{i,2} = \ln\left(\frac{\theta_{i,2} - a_i}{b_i - \theta_{i,2}}\right), \quad i = 1, \ldots, p_2, \tag{6.2}$$

where $b_i > a_i$ gives the upper and the lower bounds.$^{40}$ The remaining $p_0$ parameters are given
by $\phi_0 = \theta_0$, while $\phi = [\phi_0' \ \phi_1' \ \phi_2']'$ and $\theta = [\theta_0' \ \theta_1' \ \theta_2']'$. The overall transformation of $\theta$ into $\phi$
may be expressed as $\phi = g(\theta)$, where $g(\theta)$ is a vector of monotonic functions.$^{41}$
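We can likewise define a transformation from $\phi$ back into $\theta$ by inverting the above relations.
That is,

$$\theta_{i,1} = \exp\left(\phi_{i,1}\right) + c_{i,1}, \quad i = 1, \ldots, p_1, \tag{6.3}$$

and

$$\theta_{i,2} = \frac{a_i + b_i \exp\left(\phi_{i,2}\right)}{1 + \exp\left(\phi_{i,2}\right)}, \quad i = 1, \ldots, p_2, \tag{6.4}$$

while $\theta_0 = \phi_0$. The full transformation can be expressed as $\theta = g^{-1}(\phi)$.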
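The inverse in (6.4) follows from (6.2) with a few lines of algebra. The last step also gives an equivalent form which makes it evident that $\theta_{i,2}$ stays inside $(a_i, b_i)$ for every real $\phi_{i,2}$:

```latex
\begin{aligned}
\phi_{i,2} = \ln\!\left(\frac{\theta_{i,2} - a_i}{b_i - \theta_{i,2}}\right)
&\iff \exp(\phi_{i,2})\left(b_i - \theta_{i,2}\right) = \theta_{i,2} - a_i \\
&\iff a_i + b_i \exp(\phi_{i,2}) = \theta_{i,2}\left(1 + \exp(\phi_{i,2})\right) \\
&\iff \theta_{i,2} = \frac{a_i + b_i \exp(\phi_{i,2})}{1 + \exp(\phi_{i,2})}
  = a_i + \frac{b_i - a_i}{1 + \exp(-\phi_{i,2})}.
\end{aligned}
```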
6.2. The Jacobian Matrix
When the parameters $\phi$ are used for evaluating the posterior distribution, it should be noted
that the log-likelihood function is invariant to the transformation, i.e., $p(Y|\theta) = p(Y|g^{-1}(\phi)) = p(Y|\phi)$. Next, the value of the joint prior density $p(\phi)$ can be determined in the usual way by
using the fact that $\phi = g(\theta)$; recall Section 4.2.1. That is, we need to take the Jacobian in the
transformation into account.
Since the individual parameters $\theta$ are assumed to be independent, the prior density of $\theta$ is equal
to the product of the marginal prior densities for each individual parameter. Moreover, since
the matrix with partial derivatives of $\phi$ with respect to $\theta$ is diagonal and equal to the inverse
of the matrix with partial derivatives of $\theta$ with respect to $\phi$, it follows that the individual $\phi$
parameters are also a priori independent, that the individual Jacobians are given by

$$\frac{d\theta_{i,j}}{d\phi_{i,j}} = \left[g'\left(g^{-1}(\phi_{i,j})\right)\right]^{-1}, \quad i = 1, \ldots, p_j, \; j = 0, 1, 2,$$

where $g'$ denotes the derivative of the corresponding element of $g$, and that $p(\phi)$ is equal to the product of the individual prior densities.
40
We may think of this transformation as a generalized logit since $a_i = 0$ and $b_i = 1$ imply the logit function.
41
There is no need to order the parameters according to their prior distribution; this is only done here for clarity.
YADA knows from reading the prior distribution input which parameters have a beta, a gamma, etc., prior
distribution.
For the parameters that have a gamma, inverted gamma, left truncated normal, or Pareto
distribution, the log of the Jacobian is simply

$$\ln\left(\frac{d\theta_{i,1}}{d\phi_{i,1}}\right) = \phi_{i,1}, \quad i = 1, \ldots, p_1. \tag{6.5}$$

For the parameters with a beta or a uniform prior, the log of the Jacobian is

$$\ln\left(\frac{d\theta_{i,2}}{d\phi_{i,2}}\right) = \ln\left(b_i - a_i\right) + \phi_{i,2} - 2\ln\left(1 + \exp(\phi_{i,2})\right), \quad i = 1, \ldots, p_2. \tag{6.6}$$

Finally, for the parameters with a normal, Student-t (and Cauchy), logistic, or Gumbel prior
the log of the Jacobian is zero since $d\theta_{i,0}/d\phi_{i,0} = 1$.
Notice that the Jacobians are positive for all parameter transformations and, hence, each
Jacobian is equal to its absolute value. The sum of the log Jacobians in equations (6.5) and
(6.6) should now be added to the log prior of $\theta$, evaluated at $\theta = g^{-1}(\phi)$, to obtain the value of
the log prior of $\phi$; cf. equation (4.3) in Section 4.2.1.
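Equation (6.6) follows from differentiating (6.4) with respect to $\phi_{i,2}$ and taking logs; (6.5) is immediate since the derivative of (6.3) is $\exp(\phi_{i,1})$:

```latex
\begin{aligned}
\frac{d\theta_{i,2}}{d\phi_{i,2}}
&= \frac{b_i \exp(\phi_{i,2})\left(1 + \exp(\phi_{i,2})\right) - \left(a_i + b_i \exp(\phi_{i,2})\right)\exp(\phi_{i,2})}{\left(1 + \exp(\phi_{i,2})\right)^2}
 = \frac{(b_i - a_i)\exp(\phi_{i,2})}{\left(1 + \exp(\phi_{i,2})\right)^2}, \\
\ln\left(\frac{d\theta_{i,2}}{d\phi_{i,2}}\right)
&= \ln(b_i - a_i) + \phi_{i,2} - 2\ln\left(1 + \exp(\phi_{i,2})\right).
\end{aligned}
```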
6.3. YADA Code
YADA has four functions that handle the parameter transformations discussed above. The
mapping $\phi = g(\theta)$ is handled by ThetaToPhi, the mapping $\theta = g^{-1}(\phi)$ by PhiToTheta, while
logJacobian takes care of calculating the log of the Jacobian. In addition, there is a function
(PartialThetaPartialPhi) that computes the matrix with partial derivatives of $\theta$ with respect
to $\phi$.
6.3.1. ThetaToPhi
The function ThetaToPhi calculates the mapping $\phi = g(\theta)$. It requires the inputs theta, the $\theta$
vector; thetaIndex, a vector with the same length as $\theta$ with unit entries for all parameters
that have a gamma, an inverted gamma, a left truncated normal, or a Pareto prior distribution, zero entries for all parameters with a normal, a Student-t, a Cauchy, a logistic, or a
Gumbel prior, the value 2 for the beta prior, and 3 for the uniform prior; UniformBounds, a matrix
with the lower and upper bounds of any uniformly and beta distributed parameters (for all other
parameters the elements are 0 and 1); and LowerBound, a vector of the same length as $\theta$ with
the lower bound parameters $c_{i,1}$; see also Section 7.4 regarding the function VerifyPriorData.
The output is given by phi.
6.3.2. PhiToTheta
The function PhiToTheta calculates the mapping $\theta = g^{-1}(\phi)$. It requires the inputs phi, the $\phi$
vector; thetaIndex; UniformBounds; and LowerBound. The output is given by theta.
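A minimal Python sketch of the two mappings, assuming the thetaIndex coding described in Section 6.3.1 (0 for untransformed parameters, 1 for the log transformation, 2 and 3 for the generalized logit of beta and uniform parameters). The names theta_to_phi and phi_to_theta are hypothetical stand-ins for the Matlab functions ThetaToPhi and PhiToTheta, and the bounds are passed as (lower, upper) pairs rather than as an m × 2 matrix; this illustrates equations (6.1) to (6.4) and is not YADA's code.

```python
import math

# Hypothetical sketch of ThetaToPhi / PhiToTheta (YADA itself is written in Matlab).
# thetaIndex codes: 0 = identity, 1 = log, eqs. (6.1)/(6.3),
# 2 = beta and 3 = uniform, generalized logit, eqs. (6.2)/(6.4).


def _logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def theta_to_phi(theta, theta_index, uniform_bounds, lower_bound):
    """phi = g(theta), applied element by element."""
    phi = []
    for th, code, (a, b), c in zip(theta, theta_index, uniform_bounds, lower_bound):
        if code == 0:
            phi.append(th)
        elif code == 1:
            phi.append(math.log(th - c))               # eq. (6.1)
        elif code in (2, 3):
            phi.append(math.log((th - a) / (b - th)))  # eq. (6.2)
        else:
            raise ValueError(f"unknown thetaIndex code: {code}")
    return phi


def phi_to_theta(phi, theta_index, uniform_bounds, lower_bound):
    """theta = g^{-1}(phi), applied element by element."""
    theta = []
    for ph, code, (a, b), c in zip(phi, theta_index, uniform_bounds, lower_bound):
        if code == 0:
            theta.append(ph)
        elif code == 1:
            theta.append(math.exp(ph) + c)             # eq. (6.3)
        elif code in (2, 3):
            # eq. (6.4) written as a + (b - a) / (1 + exp(-phi))
            theta.append(a + (b - a) * _logistic(ph))
        else:
            raise ValueError(f"unknown thetaIndex code: {code}")
    return theta


# Example with one parameter of each type
theta = [0.5, 2.0, 0.85, 0.3]
theta_index = [0, 1, 2, 3]
uniform_bounds = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 0.5)]
lower_bound = [0.0, 0.5, 0.0, 0.0]
phi = theta_to_phi(theta, theta_index, uniform_bounds, lower_bound)
theta_back = phi_to_theta(phi, theta_index, uniform_bounds, lower_bound)
print(phi, theta_back)
```

The inverse logit uses a sign-dependent branch because math.exp raises OverflowError for large arguments rather than returning infinity.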
6.3.3. logJacobian
The function logJacobian calculates the sum of the logs of the Jacobians for the mapping
$\theta = g^{-1}(\phi)$. It requires the inputs phi, the $\phi$ vector; thetaIndex; and UniformBounds. The output
is given by lnjac.
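The log Jacobian in (6.5) and (6.6) can be sketched the same way. The function log_jacobian below is a hypothetical Python analogue of logJacobian (same thetaIndex coding, bounds as (lower, upper) pairs), and numerical_log_jacobian is an added central-difference check that is not part of YADA.

```python
import math


def _log1pexp(x):
    """Numerically stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_jacobian(phi, theta_index, uniform_bounds):
    """Sum over i of ln(d theta_i / d phi_i) for theta = g^{-1}(phi)."""
    lnjac = 0.0
    for ph, code, (a, b) in zip(phi, theta_index, uniform_bounds):
        if code == 1:
            lnjac += ph                                           # eq. (6.5)
        elif code in (2, 3):
            lnjac += math.log(b - a) + ph - 2.0 * _log1pexp(ph)  # eq. (6.6)
        elif code != 0:
            raise ValueError(f"unknown thetaIndex code: {code}")
        # code 0: identity mapping contributes ln(1) = 0
    return lnjac


def numerical_log_jacobian(ph, code, a, b, c=0.0, h=1e-6):
    """Central-difference approximation of ln(d theta / d phi) for one parameter."""
    def inverse(x):
        if code == 0:
            return x
        if code == 1:
            return math.exp(x) + c                      # eq. (6.3)
        e = math.exp(x)
        return (a + b * e) / (1.0 + e)                  # eq. (6.4)
    return math.log((inverse(ph + h) - inverse(ph - h)) / (2.0 * h))


print(log_jacobian([0.7], [2], [(-1.0, 3.0)]), numerical_log_jacobian(0.7, 2, -1.0, 3.0))
```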
6.3.4. PartialThetaPartialPhi
The function PartialThetaPartialPhi calculates the partial derivatives of $\theta$ with respect to $\phi$.
The required inputs are phi, thetaIndex, and UniformBounds. The output is given by the diagonal
matrix ThetaPhiPartial. This function is used when approximating the inverse Hessian at the
posterior mode of $\theta$ with the inverse Hessian at the posterior mode of $\phi$.
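A sketch of the diagonal derivative matrix, together with one common way of using it: the delta-method approximation Σ_θ ≈ D Σ_φ D', where D holds dθ_i/dφ_i on its diagonal. That YADA maps the inverse Hessian in exactly this way is an assumption here, and the function names are hypothetical.

```python
import math


def partial_theta_partial_phi(phi, theta_index, uniform_bounds):
    """Diagonal matrix D with D[i][i] = d theta_i / d phi_i (thetaIndex codes 0-3)."""
    m = len(phi)
    d = [[0.0] * m for _ in range(m)]
    for i, (ph, code, (a, b)) in enumerate(zip(phi, theta_index, uniform_bounds)):
        if code == 0:
            d[i][i] = 1.0
        elif code == 1:
            d[i][i] = math.exp(ph)
        elif code in (2, 3):
            # (b - a) e^phi / (1 + e^phi)^2 is symmetric in phi, so use e^{-|phi|}
            e = math.exp(-abs(ph))
            d[i][i] = (b - a) * e / (1.0 + e) ** 2
        else:
            raise ValueError(f"unknown thetaIndex code: {code}")
    return d


def delta_method_covariance(sigma_phi, d):
    """Sigma_theta = D Sigma_phi D' for a diagonal D."""
    m = len(sigma_phi)
    return [[d[i][i] * sigma_phi[i][j] * d[j][j] for j in range(m)] for i in range(m)]


D = partial_theta_partial_phi([0.0, math.log(2.0)], [0, 1], [(0.0, 1.0), (0.0, 1.0)])
print(delta_method_covariance([[1.0, 0.5], [0.5, 2.0]], D))
```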
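The matrix $Y$ represents the observed data, $L$ is the likelihood function, $\theta = g^{-1}(\phi)$, while
$J(\phi)$ is the determinant of the (diagonal) Jacobian; cf. Section 4.2.1. The posterior estimate of $\theta$
is given by $\tilde{\theta} = g^{-1}(\tilde{\phi})$, while the posterior mode of $\theta$ is

$$\hat{\theta} = \arg\max_{\theta \in \Theta}\; \ln L(Y;\theta) + \ln p(\theta), \tag{7.2}$$

where $\Theta \subseteq \mathbb{R}^m$. In Section 7.1 we shall discuss when the posterior estimate $\tilde{\theta}$ is close to the
posterior mode $\hat{\theta}$.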
The actual optimization of the log posterior of $\phi$ or $\theta$ is performed numerically in YADA. The
user can choose between Christopher Sims' csminwel routine, Marco Ratto's newrat, Dynare's
gmhmaxlik, and Matlab's fminunc (provided that the Optimization Toolbox is installed and the
YADA-related diff-files have been taken into account), and can also choose whether the transformed parameters ($\phi$) or the original parameters ($\theta$) should be targeted. All these optimization routines
provide an estimate of the posterior mode of the targeted parameters and of the inverse Hessian at the mode, denoted by $\tilde{\Sigma}$.$^{42}$ The inverse Hessian at the mode is one candidate for the
covariance matrix of the proposal density that the random walk Metropolis algorithm discussed
in Section 8.1 needs for generating candidate draws from the posterior distribution of $\phi$ and of
$\theta$.
Note that the YADA-specific version of fminunc is not supplied with the public version
of YADA. Moreover, the original Matlab version of fminunc is not supported by the posterior mode estimation routine in YADA since it uses an edited version of the function (named
YADAfminuncx, where x should be replaced with 5 or 7). The YADA-specific version has some
additional output fields and also supports a progress dialog. To make it possible for users who
have Matlab's Optimization Toolbox installed to use fminunc for posterior mode estimation in
YADA, diff-files are available for download from the YADA website. In addition, instructions on
how to make use of the diff-files are provided in the YADA help file.$^{43}$
42
The Matlab function fminunc actually produces an estimate of the Hessian at the mode.
43
The YADA website is located at: http://www.texlips.net/yada/. The YADA help file can also be read there.
The prior density of $\phi$ can be found using the result in Section 4.2.1. This means that

$$p(\phi) = \frac{1}{\Gamma(a)\, b^{a}} \exp\left(a\phi\right) \exp\left(-\frac{\exp(\phi)}{b}\right).$$
This density resembles an extreme value distribution where, for instance, $a = 1$ yields a Gumbel
distribution for $-\phi$ with location parameter $-\ln b$ and scale parameter 1; see
Section 4.2.11. It is straightforward to show that the mode of the density is $\tilde{\phi} = \ln(ab)$. This
translates to the prior estimate $\tilde{\theta} = ab$ and, hence, the mode of $p(\theta)$ is not equal to the value of
$\exp(\phi)$ when $\phi$ is evaluated at the mode of $p(\phi)$.
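The claim that the mode of $p(\phi)$ is $\ln(ab)$, while the gamma density $p(\theta)$ has its mode at $(a-1)b$ for $a > 1$, can be checked numerically. The sketch assumes the gamma parametrization with shape $a$ and scale $b$ used above and locates the mode of $\ln p(\phi)$ by brute-force grid search; the parameter values are arbitrary.

```python
import math


def log_prior_phi(phi, a, b):
    """ln p(phi) = a*phi - exp(phi)/b - ln Gamma(a) - a*ln(b), for phi = ln(theta)."""
    return a * phi - math.exp(phi) / b - math.lgamma(a) - a * math.log(b)


def grid_mode(f, lower, upper, n=100001):
    """Brute-force arg max of f over n equally spaced points on [lower, upper]."""
    step = (upper - lower) / (n - 1)
    best_x, best_f = lower, f(lower)
    for k in range(1, n):
        x = lower + k * step
        fx = f(x)
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x


a, b = 2.0, 0.5
phi_mode = grid_mode(lambda x: log_prior_phi(x, a, b), -5.0, 5.0)
print(phi_mode, math.log(a * b))        # numerical vs analytical mode of p(phi)
print(math.exp(phi_mode), (a - 1) * b)  # exp of the phi mode vs the mode of p(theta)
```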
Furthermore, we know that the posterior distribution of $\theta$ is equal to its prior distribution
when the data is not informative about the parameter $\theta$. Similarly, the posterior distribution
of $\phi$ is also equal to its prior in this case and, since the distributions have different modes, it
follows that the posterior distributions do as well.
One implication of this is that it may be useful to perform the optimum check discussed in
Section 7.2 also for the original parameters. Should the posterior estimate $\tilde{\theta} = g^{-1}(\tilde{\phi})$ be close
to the posterior mode for $\theta$ for all parameters, it suggests that the likelihood may be dominating
and, hence, that the data may be informative about all the parameters.$^{44}$ On the other hand,
if the posterior mode of some element of $\theta$ appears to be far away from its posterior estimate
using $\tilde{\phi}$, then the data is unlikely to be informative about this parameter. Hence, not only can
the plots of the log posteriors be useful when tuning a proposal density, but they may also be
informative about identification issues.
7.2. Checking the Optimum
In order to check if the value $\tilde{\phi}$ is a local optimum, YADA makes use of some tools suggested
and originally coded by Mattias Villani. For each element $i$ of the vector $\phi$ a suitable grid with
$d$ elements is constructed from the lower and upper bounds $(\tilde{\phi}_i - c\tilde{\Sigma}_{i,i}^{1/2},\ \tilde{\phi}_i + c\tilde{\Sigma}_{i,i}^{1/2})$, where
$c > 0$ and $\tilde{\Sigma}_{i,i}$ is element $(i,i)$ of $\tilde{\Sigma}$, the inverse Hessian of the log posterior at the mode. Let $\tilde{\phi}_{-i}$
be a vector with all elements of $\tilde{\phi}$ except element $i$. For each $\phi_i$ in the grid, the log posterior is
evaluated at $(\phi_i, \tilde{\phi}_{-i})$. For parameter $i$ this provides us with $d$ values of the log posterior of $\phi_i$
conditional on $\tilde{\phi}_{-i}$.
One proposal density that YADA can use for posterior sampling is $N(\phi, \tilde{\Sigma})$, where the value
of $\phi$ is determined from the previous draw from the posterior. Since the computed values of the
log posterior of $\phi_i$ are conditional on all parameters being equal to their values at the posterior
mode, it is natural to compare them to a conditional proposal density for $\phi_i$. For the grid values
of $\phi_i$ that were used to calculate the conditional log posterior values of the parameter, one such
density is the log of the normal density with mean $\tilde{\phi}_i$ and variance

$$\tilde{\Sigma}_{i|-i} = \tilde{\Sigma}_{i,i} - \tilde{\Sigma}_{i,-i}\tilde{\Sigma}_{-i,-i}^{-1}\tilde{\Sigma}_{-i,i}.$$

The vector $\tilde{\Sigma}_{i,-i}$ is equal to the $i$:th row of $\tilde{\Sigma}$ with element $i$ removed. Similarly, the matrix
$\tilde{\Sigma}_{-i,-i}$ is equal to $\tilde{\Sigma}$ with row $i$ and column $i$ removed.
44
Note that this does not imply that the data is informative. It is possible that the prior mode of $\theta$ is close to the
mode of the prior for $\phi$. In the one-parameter example, large values for $a$ imply that the prior mode of $\theta$, $(a-1)b$, is close to $ab$ in relative terms.
45
Such an estimated conditional variance can also be transformed into a marginal variance if, e.g., we are willing
to use the correlation structure from the inverse Hessian at the posterior mode. Let $C = \tilde{\Sigma} \oslash (\tilde{\varsigma}\tilde{\varsigma}')$, where $\oslash$ denotes
element-by-element division, and $\tilde{\varsigma}$ is the square root of the diagonal of $\tilde{\Sigma}$. Let $\sigma_{i|-i}$ be the conditional variance, while
$\sigma_{i,i}$ is the marginal variance. For a normal distribution we know that $\sigma_{i|-i} = \sigma_{i,i} - \Sigma_{i,-i}\Sigma_{-i,-i}^{-1}\Sigma_{-i,i}$. This relationship
can also be expressed through the correlation structure as $\sigma_{i|-i} = \left(1 - C_{i,-i}C_{-i,-i}^{-1}C_{-i,i}\right)\sigma_{i,i}$. Hence, if we have an
estimate of the conditional variance $\sigma_{i|-i}$ and the correlation structure $C$, we can compute the marginal variance $\sigma_{i,i}$
by inverting this expression.
Figure 3. Plot of the conditional log posterior density around the estimated posterior mode along with two conditional proposal densities (left) and
the log-likelihood (right) for the parameters $\tau$, $\rho_G$, and $\sigma_Z$.
[Figure: horizontal axes $\ln(\tau)$, $\ln(\rho_G/(1-\rho_G))$, and $\ln(\sigma_Z)$. Left panels: Mode, Log Posterior, Norm Approx Hessian, Norm Approx Modified Hessian. Right panels: Mode, Log Posterior, Log Likelihood (scaled).]
Using these ideas, the 3 plots to the left in Figure 3 provide graphs of the conditional log
posterior density (blue solid line) of the parameters $\tau$, $\rho_G$, and $\sigma_Z$ from the An and Schorfheide
model in Section 17. The transformed ($\phi$) space for the parameters is used here, i.e., the log of
$\tau$ and $\sigma_Z$ and the logit of $\rho_G$. Since the support for $\phi$ is the real line, it seems a priori more
likely that a normal distribution can serve well as a proposal density for $\phi$ than for $\theta$, where,
for instance, $\rho_G$ is restricted to the 0-1 interval; cf. Section 6. The red dotted line shows the
normal approximation of the log posterior using the posterior mode as mean and conditional
variance based on the inverse Hessian at the mode. The green dashed line is similarly based on
the normal approximation with the same mean, but with the conditional variance estimated as
discussed in the previous paragraph.
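The ingredients of the optimum check for a single parameter $i$ can be sketched in pure Python: the grid $\tilde{\phi}_i \pm c\tilde{\Sigma}_{i,i}^{1/2}$, the conditional variance $\tilde{\Sigma}_{i|-i}$, the log normal density used as a conditional proposal, and the inversion from footnote 45 that turns a conditional variance into a marginal one. All function names are hypothetical, the small Gaussian-elimination solver stands in for a linear-algebra library, and this is not YADA's implementation.

```python
import math


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def _explained_part(S, i):
    """S_{i,-i} S_{-i,-i}^{-1} S_{-i,i} for a symmetric positive definite S."""
    rest = [k for k in range(len(S)) if k != i]
    if not rest:
        return 0.0
    s_row = [S[i][k] for k in rest]
    s_rest = [[S[r][k] for k in rest] for r in rest]
    x = solve(s_rest, s_row)  # S_{-i,-i}^{-1} S_{-i,i}, using symmetry of S
    return sum(u * v for u, v in zip(s_row, x))


def conditional_variance(sigma, i):
    """Sigma_{i|-i} = Sigma_{i,i} - Sigma_{i,-i} Sigma_{-i,-i}^{-1} Sigma_{-i,i}."""
    return sigma[i][i] - _explained_part(sigma, i)


def correlation(sigma):
    """C: sigma divided element by element by the outer product of standard deviations."""
    sd = [math.sqrt(sigma[k][k]) for k in range(len(sigma))]
    return [[sigma[r][k] / (sd[r] * sd[k]) for k in range(len(sd))] for r in range(len(sd))]


def marginal_from_conditional(cond_var, corr, i):
    """Invert sigma_{i|-i} = (1 - C_{i,-i} C_{-i,-i}^{-1} C_{-i,i}) sigma_{i,i}."""
    return cond_var / (1.0 - _explained_part(corr, i))


def make_grid(phi_mode, sigma, i, c, d):
    """d equally spaced points on [phi_i - c*sqrt(Sigma_ii), phi_i + c*sqrt(Sigma_ii)]."""
    half = c * math.sqrt(sigma[i][i])
    lo, hi = phi_mode[i] - half, phi_mode[i] + half
    return [lo + k * (hi - lo) / (d - 1) for k in range(d)]


def log_normal_density(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


if __name__ == "__main__":
    phi_mode = [0.5, -1.0]
    sigma = [[4.0, 1.0], [1.0, 2.0]]
    grid = make_grid(phi_mode, sigma, 0, c=2.0, d=5)
    var0 = conditional_variance(sigma, 0)
    # log of the conditional normal proposal for phi_0 over the grid
    print([log_normal_density(x, phi_mode[0], var0) for x in grid])
```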
It is worth noticing from the figure that the normal approximations based on the inverse
Hessian and on the modification are close to the log posterior for all these parameters except
$\rho_G$. For $\rho_G$, however, the differences appear particularly severe and indicate
that if the proposal density has the inverse Hessian, $\tilde{\Sigma}$, as its covariance matrix, then it may take
a long time before the support of $\rho_G$ is sufficiently covered. By comparison, the proposal density
based on the modification lies closer to the log posterior and is therefore a better approximation.
The posterior mode checking facilities in YADA also produce a second set of plots. These
graphs operate over the same grid as those discussed above, but instead of studying proposal
densities they plot the log-likelihood against the log posterior over the grid for each parameter.
To avoid scaling problems in the graphs, YADA adds the value of the log prior at the mode to
the log-likelihood. One important feature of these plots is that potential identification issues
can be detected from the slope of the log-likelihood. Moreover, they give an idea of how far
away a local maximum for the log-likelihood is relative to the local maximum of the posterior
for each parameter.
The 3 plots to the right in Figure 3 provide graphs of the conditional log posterior density
(blue solid line) along with the log-likelihood (red dashed line) for the same 3 parameters. As
mentioned above, the log-likelihood has been scaled such that the value of the log prior at the
mode has been added to each value. For the parameter $\tau$ it can be seen that the local maximum
of the log-likelihood lies at a value of $\tau$ that is somewhat smaller than the posterior mode value,
while the local maximum of the log-likelihood for $\sigma_Z$ is very close to the posterior mode. It is
also noteworthy that the log-likelihood for $\rho_G$ is increasing for values of this parameter that are
greater than the posterior mode value, but that the slope is rather flat. This suggests that, at
least locally, this parameter is not well identified. This is also supported by the result that the
prior mode of $\rho_G$ ($11/13 \approx 0.8462$) is very close to the posterior estimate (0.8890).
$s = 1, 2, \ldots, S$,   (7.3)
where $S$ is the maximum number of draws to use, $m$ is the dimension of the parameter vector,
$c$ is a positive scale factor, and $\Sigma$ is a positive definite matrix. The latter matrix may, e.g., be
diagonal with the prior variances of $\theta$ in the diagonal. Should the variances not exist, then the
corresponding element may be replaced with a large constant. Furthermore, the initial value
$\theta_0$ may be taken from the prior distribution information in YADA.
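The vector $\theta_s$ can now be updated as in Section 8.1,$^{46}$ while the posterior mean and posterior covariance matrix are updated according to

$$\bar{\theta}_s = \bar{\theta}_{s-1} + \frac{1}{s}\left(\theta_s - \bar{\theta}_{s-1}\right), \qquad \bar{\Sigma}_s = \bar{\Sigma}_{s-1} + \bar{\theta}_{s-1}\bar{\theta}_{s-1}' - \bar{\theta}_s\bar{\theta}_s' + \frac{1}{s}\left(\theta_s\theta_s' - \bar{\Sigma}_{s-1} - \bar{\theta}_{s-1}\bar{\theta}_{s-1}'\right), \tag{7.4}$$

and the posterior mode from

$$\hat{\theta}_s = \begin{cases} \theta_s & \text{if } \ln L(Y;\theta_s) + \ln p(\theta_s) > \ln L(Y;\hat{\theta}_{s-1}) + \ln p(\hat{\theta}_{s-1}), \\ \hat{\theta}_{s-1} & \text{otherwise.} \end{cases} \tag{7.5}$$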
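The recursions in (7.4) and (7.5) only need the previous mean, covariance, and best draw, so they can be run on a stream of draws. The sketch below assumes the draws and the log posterior function are given (both hypothetical inputs). Note that the covariance in (7.4) divides by $s$ rather than $s - 1$, and that for $s = 1$ the starting values of the mean and covariance drop out.

```python
def update_moments(mean, cov, draw, s):
    """One step of (7.4); mean and cov hold the values after s - 1 draws."""
    m = len(draw)
    new_mean = [mean[i] + (draw[i] - mean[i]) / s for i in range(m)]
    new_cov = [[cov[i][j] + mean[i] * mean[j] - new_mean[i] * new_mean[j]
                + (draw[i] * draw[j] - cov[i][j] - mean[i] * mean[j]) / s
                for j in range(m)] for i in range(m)]
    return new_mean, new_cov


def update_mode(mode, mode_value, draw, draw_value):
    """One step of (7.5): keep the draw with the highest log posterior so far."""
    if draw_value > mode_value:
        return draw, draw_value
    return mode, mode_value


def run(draws, log_post):
    m = len(draws[0])
    mean = [0.0] * m                     # arbitrary: the s = 1 step overwrites it
    cov = [[0.0] * m for _ in range(m)]
    mode, mode_value = None, float("-inf")
    for s, draw in enumerate(draws, start=1):
        mean, cov = update_moments(mean, cov, draw, s)
        mode, mode_value = update_mode(mode, mode_value, draw, log_post(draw))
    return mean, cov, mode


draws = [[1.0, 2.0], [2.0, 0.0], [4.0, 1.0], [-1.0, 3.0]]
mean, cov, mode = run(draws, lambda x: -(x[0] ** 2 + x[1] ** 2))
print(mean, cov, mode)
```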
The Dynare function gmhmaxlik also includes a portion of code that attempts to tune the
scale factor $c$ before it simulates the posterior mean, mode, and covariance matrix. During this
tuning step, the initial value for $\theta$ (or $\phi$) also changes as draws are accepted by the Monte Carlo
procedure. Moreover, once it has estimated the posterior covariance matrix via (7.4), the Dynare
code optionally uses this estimate to first re-tune the scale factor $c$ and thereafter to make a final
attempt at climbing the posterior hill to its peak, i.e., the mode of the posterior distribution. For
this optional case, as well as for the initial tuning case, the tuning of the scale parameter is
based on a suitable target for the acceptance rate of the simulation procedure; YADA works
with a targeted acceptance rate of 1/4, while the original Dynare function has a target rate of
1/3.
7.4. YADA Code
The posterior mode is computed in YADA by the function PosteriorModeEstimation. The main
inputs for this function are the structures DSGEModel, CurrINI, and controls. The first contains
paths to the user files that specify the log-linearized DSGE model, the prior distribution of its
parameters, the data, the measurement equations, and any parameter functions that should be
dealt with. It also contains information about options for the Kalman filter, the sample to use,
names of observed variables, of exogenous variables, of state variables, and names of the state
shocks, the choice of optimization routine and parameters to target (transformed or original),
the tolerance value, the maximum number of iterations to consider, as well as some other useful
features.
46
For the transformed parameters this means that the rule in equation (8.1) is used. For the original parameters,
the terms of the ratio on the right-hand side of the selection rule are computed from the log posteriors for this parameterization.
The CurrINI structure contains initialization information needed by YADA. This
structure contains non-model related information, while the DSGEModel structure contains the
model related information. Finally, the controls structure holds handles to all the controls on
the main GUI of YADA.
7.4.1. VerifyPriorData
Based on the input that PosteriorModeEstimation receives, the first task it performs is to
check the data in the prior distribution file. This is handled by the function VerifyPriorData.
Given that the prior distribution data is complete (cf. Section 17.2), this function returns the
prior distribution data in various variables. These variables are given by theta, thetaIndex,
thetaDist, LowerBound, ModelParameters, thetaPositions, PriorDist, ParameterNames, and
UniformBounds.
The vector theta contains the initial values of the parameters to be estimated, i.e., $\theta$. The
vectors thetaIndex and thetaDist have the same length as theta with integer entries indicating the type of prior distribution that is assumed for each element of theta. The difference
between these two vectors is that thetaIndex indicates the type of transformation that should
be applied to obtain $\phi$, while thetaDist gives the prior distribution. The vector LowerBound
gives the lower bound specified for the parameters in the prior distribution file. This bound is,
for example, used by the parameter transformation functions discussed in Section 6.3.
The structure ModelParameters has fields given by the parameter names assigned in the
prior distribution file; see, e.g., Table 2. Each field is assigned a value equal to the initial
value for that parameter. Both estimated and calibrated parameters are given a field in the
ModelParameters structure. Similarly, the vector structure thetaPositions has dimension
$m$, the dimension of $\theta$. Each element in the vector structure has a field parameter
that contains a string with the name of the parameter. The vector structure is constructed such
that thetaPositions(i).parameter gives the name of the parameter in position $i$ of $\theta$.
The structure PriorDist has 11 fields: beta, gamma, normal, invgamma, truncnormal, cauchy,
student, uniform, logistic, gumbel, and pareto. Each such field contains a matrix whose
number of rows depends on the number of parameters assigned a given prior distribution. The
number of columns is 2 for the normal, uniform, gamma, inverted gamma, Cauchy and Gumbel, while it is 3 for the left truncated normal, the Student-t, the logistic, and the Pareto; the
third column holds the lower bound for the left truncated normal, the number of degrees of
freedom for the Student-t, and the origin for the Pareto. For the beta distribution, finally,
the matrix has 4 columns, where the last two hold the lower and upper bounds. Columns
1 and 2 have the values of prior parameters 1 and 2 that were given in the prior distribution file. The PosteriorModeEstimation function later creates 4 new fields in the PriorDist
structure. These fields are beta_ab, gamma_ab, logistic_ab, and gumbel_ab, containing the
$a, b$ parameters for the beta (eq. (4.15)) and the gamma (eq. (4.7)) distributions, the location, scale, and shape parameters for the logistic distribution, and the location and
scale parameters for the Gumbel distribution. The functions MomentToParamStdbetaPDF,
MomentToParamGammaPDF, MomentToParamLogisticPDF, and MomentToParamGumbelPDF, respectively, deal with the transformations from mean and standard deviation (and shape parameter
$c$ for the logistic) to the needed parameters; cf. Section 4.4.
The structure ParameterNames has fields all, calibrated, beta, gamma, normal, invgamma,
truncnormal, uniform, student, cauchy, logistic, gumbel, pareto, and estimated. Each
field returns a string matrix with the parameter names. One further field, additional,
is added to this structure by the function ReadAdditionalParameterNames. This field holds a string matrix
with the names of all the new parameters defined in the file with parameters to update; cf.
Section 17.3. Moreover, the function extends the string matrix ParameterNames.calibrated with any
new calibrated parameters that YADA has found in the file with parameters to initialize.
Finally, the matrix UniformBounds has dimension $m \times 2$. For all prior distributions but the
uniform and the beta each row has 0 and 1. For the uniform and the beta the rows have the
lower and the upper bounds.
7.4.2. logPosteriorPhiDSGE
Since csminwel, newrat, and fminunc are minimization routines, the function to minimize when
transformed parameters are targeted by the optimizer is given by minus the expression within
parentheses on the right-hand side of (7.1); this function is called logPosteriorPhiDSGE in
YADA. Before attempting to minimize this function, YADA runs a number of checks on the user
defined functions. First of all, all user defined Matlab functions are copied to the tmp directory
to make sure they are visible to Matlab.
Second, YADA attempts to run the user defined Matlab functions. The first group contains
any functions with additional parameters that the user has included in the model setup; cf.
Section 17.3. Given that such functions exist, the order in which they are executed depends
on the user input on the DSGE Data tab on the YADA GUI; see Figure 6. If both types of
additional parameter files exist, the order is determined by the checkbox Run file
with parameters to initialize before file with parameters to update. Executing the additional
parameter files updates the ModelParameters structure with fields and values.
Given that these files are executed without errors (or that they do not exist), the following
step is to check the validity of the measurement equation function; cf. Section 17.4. Assuming
this function takes the necessary input and provides the necessary output (without errors),
YADA then tries to solve the DSGE model; see Section 3.4. If the model has a unique convergent
solution at the initial values (see Section 3.1), YADA proceeds with the final preparations for
running the optimization routine. If not, YADA returns an error message, reporting which
problem AiM discovered.
The final preparations first involve collecting additional parameters into the ParameterNames
structure in the field additional. Next, the initial values of the additional parameters as
well as the initial values of the calibrated parameters are located and stored in the vectors
thetaAdditional and thetaCalibrated, respectively. These two tasks are handled by the
functions ReadAdditionalParameterNames and ReadAdditionalParameterValues. Next, the
actual sample to use is determined by running the function CreateSubSample. This sub-sample
does not take into account that the user may have selected a value for the StartPeriod variable
different from $t_m + 1$; see Section 5.17. The choice of StartPeriod is determined by the choice
of First observation after Kalman filter training sample on the Settings tab of the YADA dialog.
The last task before the chosen minimization routine is called is to check if the function
that calculates the log posterior returns a valid value at the initial parameter values. The
logPosteriorPhiDSGE function takes 11 input arguments. The first is phi, the transformed
parameters $\phi$. Next, 6 inputs originally created by VerifyPriorData are required. They
are: thetaIndex, UniformBounds, LowerBound, thetaPositions, thetaDist, and the structure
PriorDist. Furthermore, the structure with model parameters ModelParameters, the DSGE
model structure DSGEModel, and the AIMData structure are needed. The latter structure is created when the DSGE model is parsed through the AiMInitialize function. That function saves
to a mat-file the outputs from the compute_aim_data function. When this mat-file is loaded into
YADA it creates the AIMData structure with fields having names equal to the output variables
of compute_aim_data. Finally, the variable OrderQZ is needed by the Klein (2000) and Sims
(2002) DSGE model solvers. Recall that this is a boolean variable which is unity if ordqz is a
built-in Matlab function (true for version 7 and later) and zero otherwise. Furthermore, the
choice of DSGE model solver is determined by the setting in the DSGE Model Data frame on the
DSGE Data tab; see Figure 6.
Based on this input the log posterior evaluation function logPosteriorPhiDSGE first transforms $\phi$ into $\theta$ by calling PhiToTheta. Next, it makes sure that ModelParameters is correctly updated for the parameters that are estimated. This is achieved through the function
ThetaToModelParameters. Apart from ModelParameters it needs theta and thetaPositions
to fulfill its task. With ModelParameters being updated for the estimated parameters, any remaining parameters are re-evaluated next, i.e., the user-defined function with parameters to
update is executed.
With the parameter point determined, the log-likelihood function is examined and evaluated if the parameter point implies a unique convergent solution for the DSGE model and
the state vector is stationary (the largest eigenvalue of F is inside the unit circle). The function logLikelihoodDSGE deals with the calculation. The inputs for the function are the three
structures ModelParameters, DSGEModel, and AIMData. The function returns three variables:
logLikeValue, the value of the log-likelihood; mcode, indicating if the DSGE model has a
unique convergent solution or not; and status, a boolean variable, indicating if the F matrix in the state equation (5.2) has all eigenvalues inside the unit circle or not. Given that mcode
is 1 and that status is 0, the value of the log-likelihood is considered valid. If either of these
variables returns a different value, the function returns 1000000; otherwise it proceeds with the
evaluation of the log of the prior density at $\theta = g^{-1}(\phi)$ through logPriorDSGE and, if the prior
density value is not a NaN, the log of the Jacobian. The latter function is given by logJacobian,
presented in Section 6.3. If the log prior density value is NaN, then logPosteriorPhiDSGE again
returns 1000000. Otherwise, the function returns minus the sum of the log-likelihood, the log
prior density, and the log Jacobian.
In addition to the mandatory logPost output variable, the logPosteriorPhiDSGE function
also supports the optional logLike output variable. It is equal to NaN when logPost returns
1000000, and to logLikeValue when all computations could be carried out successfully.
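The penalty logic just described can be sketched as follows. This is a minimal Python illustration, not YADA's Matlab code. The callables `phi_to_theta`, `log_likelihood`, `log_prior_fn`, and `log_jacobian` are hypothetical stand-ins for PhiToTheta, logLikelihoodDSGE, logPriorDSGE, and logJacobian. The likelihood stub is assumed to return the triple (logLikeValue, mcode, status).

```python
import math

PENALTY = 1000000.0  # value returned when a parameter point is rejected


def log_posterior_phi(phi, phi_to_theta, log_likelihood, log_prior_fn, log_jacobian):
    """Return (logPost, logLike) following the control flow described in the text.

    All callables are hypothetical stand-ins for PhiToTheta, logLikelihoodDSGE,
    logPriorDSGE and logJacobian.
    """
    theta = phi_to_theta(phi)
    log_like, mcode, status = log_likelihood(theta)
    # Valid only if the model has a unique convergent solution (mcode == 1)
    # and F has all eigenvalues inside the unit circle (status == 0).
    if mcode != 1 or status != 0:
        return PENALTY, math.nan
    log_prior = log_prior_fn(theta)
    if math.isnan(log_prior):
        return PENALTY, math.nan
    # The function is set up for minimization, hence the minus sign.
    return -(log_like + log_prior + log_jacobian(phi)), log_like
```

Suppose the stubs return a log-likelihood of -10, a log prior of -2, and a log Jacobian of 0.5. The sketch then returns (11.5, -10.0). With mcode different from 1 it returns the penalty and NaN.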
7.4.3. logPosteriorThetaDSGE
The function logPosteriorThetaDSGE works in a very similar way as logPosteriorPhiDSGE.
One difference is, of course, that it takes the input vector theta with the original parameters
instead of phi. Another difference is that the log posterior for the original parameters takes the
input variable ParameterBounds instead of UniformBounds. This matrix has two columns with
the lower and upper bounds for the parameters in theta. Finally, the output variable logPost
is equal to minus the sum of the log-likelihood and the log prior of theta; no Jacobian term is needed. In all other respects, the
two log posterior functions are equal.
7.4.4. logLikelihoodDSGE
The function logLikelihoodDSGE directs the main tasks when using the DSGE model for computing the log-likelihood with the standard Kalman filter or the square root filter; see Sections 5.4 and 5.12. The inputs are, as already mentioned, the three structures ModelParameters,
DSGEModel, AIMData, and the boolean variable OrderQZ.
First, logLikelihoodDSGE runs either the AiMSolver, the KleinSolver or the SimsSolver
function. Given that it returns an mcode equal to unity and AiM is used to solve the model, the
AiMtoStateSpace function is executed. This provides us with the data to determine F and Q
that the Kalman filter needs. Next, the measurement equation function is executed to obtain
the A, H, and R matrices for the current value of the parameters. Once this task is completed, the appropriate Kalman filter function is executed, yielding the outputs logLikeValue
and status.47
7.4.5. logPriorDSGE
The function logPriorDSGE computes the log height of the joint prior density function at a
given value of $\theta$. It requires the four inputs theta, thetaDist, PriorDist, and LowerBound. If
the value of the log prior density is not a real number, logPriorDSGE returns NaN.
47
The standard Kalman filter function is KalmanFilter, while KalmanFilterHt handles a time-varying H matrix.
The unit root consistent functions are similarly called UnitRootKalmanFilter and UnitRootKalmanFilterHt, respectively; see Section 5.17 for details.
7.4.6. YADAcsminwel
Given that the user has chosen to use Christopher Sims' csminwel function, the YADA implementation YADAcsminwel is utilized to minimize either the function logPosteriorPhiDSGE or
logPosteriorThetaDSGE. If the optimization algorithm converges within the maximum number
of iterations that the user has selected, YADA makes use of the main output variables from this
function. First of all, the vector phiMode or thetaMode is collected. Furthermore, the value of
(minus) the log posterior at the mode is saved into LogPostDensity, while the inverse Hessian
is located in the variable InverseHessian. In case the log posterior of $\phi$ is used for estimation,
the mode of $\theta$ is obtained from the parameter transformation function PhiToTheta.
If, for some reason, csminwel is unable to locate the mode, YADA presents the return code
message of csminwel indicating what the problem may be.
7.4.7. YADAnewrat
When Marco Ratto's newrat function has been selected, the YADA implementation YADAnewrat
is utilized to minimize either the function logPosteriorPhiDSGE or logPosteriorThetaDSGE.
Since newrat uses an outer product gradient for calculating the Hessian, newrat requires values
of the log posterior for all time periods in the estimation sample. The additional functions
logPosteriorPhiDSGE4Time and logPosteriorThetaDSGE4Time therefore support as a second
output argument a vector logPostT with all time t values of the log posterior.48 The input
variables are identical to those in logPosteriorPhiDSGE and logPosteriorThetaDSGE.
7.4.8. YADAgmhmaxlik
When Dynare's gmhmaxlik Monte Carlo based optimization procedure has been chosen, the
YADA version YADAgmhmaxlik is applied to maximize either minus logPosteriorPhiDSGE or
minus logPosteriorThetaDSGE; recall that both these functions are set up for minimization. The
Monte Carlo based simulation scheme that the function uses takes input from the posterior
sampling and the specifics of these inputs are discussed in Sections 8.1 and 8.5.2. The initial
scale factor, c, is taken from the Scale factor for the posterior sampler selection; see also Figure 4.
Moreover, the number of Monte Carlo draws for estimating the posterior covariance matrix
in equation (7.4) is equal to the number of posterior draws per chain minus the number of
posterior draws discarded as burn-in period. As initial values for the posterior mean and the
posterior covariance matrix, YADA supplies the function with estimates based on draws from the
prior distribution.
The function works in four different steps. First, it tunes the scale factor c such that the acceptance rate is close to the targeted rate of 1/4. The posterior mode estimate is also updated
as in equation (7.5) during this step, but the posterior mean and covariance matrix are not considered. The maximum number of parameter draws during this stage is equal to the minimum
of 200,000 and 10 times the number of Monte Carlo draws. Second, based on this tuned scale
factor and the initial covariance matrix, the posterior mean and posterior covariance matrix are
estimated via (7.4) using the number of selected Monte Carlo draws, with the posterior mode
estimate again being updated as in equation (7.5).
Third, the scale factor c is retuned based on the estimated posterior covariance matrix. This
step is otherwise the same as the first, with the posterior mode estimate being updated using
the rules of the sampler and equation (7.5). Once retuning has finished, the last step involves
an attempt to climb to the summit of the log posterior. The maximum number of parameter
draws is the same as during steps one and three, with the same covariance matrix as in step
three, but now all draws are accepted as possible mode candidates and only the test in (7.5)
is applied. The scale factor is also retuned during this last step to cool down the system as it
comes closer to the summit of the log posterior.
48
The original version of newrat in Dynare expects function names ending with _hh. This has been changed in
YADAnewrat to 4Time.
7.4.9. YADAfminunc*
YADA has three versions of Matlab's fminunc at its disposal. For Matlab versions prior to version 7, an older fminunc function is called: it is named YADAfminunc5 and its original version
is dated October 12, 1999. For version 7 and later, YADAfminunc7 is used; for Matlab
versions prior to 7.5 its original version is dated April 18, 2005, while for Matlab version 7.5 and
later it is dated December 15, 2006. The YADAfminunc* function attempts to minimize the
function logPosteriorPhiDSGE or logPosteriorThetaDSGE, and if it is successful the vector
phiMode or thetaMode, minus the value of the log posterior at the mode, and the Hessian at the
mode are provided. This Hessian is inverted by YADA and the result is stored in the variable
InverseHessian. In case the log posterior of $\phi$ is used for estimation, the mode of $\theta$ is
obtained from the parameter transformation function PhiToTheta.
Again, if YADAfminunc* fails to locate the posterior mode within the maximum number of
iterations, the return code message of fminunc is presented.
YADAfminunc* is only available in the version of YADA that is exclusive to the New Area-Wide
Model team at the European Central Bank. As mentioned in Section 7, the publicly available
version of YADA does not include these functions. Instead, diff-files are provided from the YADA
website. These files provide the information the user needs to properly edit some files from the
Optimization Toolbox such that fminunc can be used in YADA. It should be emphasized that the
original files from the Optimization Toolbox should not be edited, only copies of them that are
subject to the name changes required by YADA.
8. Posterior Sampling
8.1. The Random Walk Metropolis Algorithm
The Random Walk Metropolis (RWM) algorithm is a special case of the class of Markov Chain
Monte Carlo (MCMC) algorithms popularly called Metropolis-Hastings algorithms; see Hastings (1970) and Chib and Greenberg (1995) for an overview.
The Metropolis version of this MCMC algorithm is based on a symmetric proposal density,
i.e., $q(\vartheta, \phi \mid Y) = q(\phi, \vartheta \mid Y)$, while the random walk part follows when the proposal density is
symmetric around zero, $q(\vartheta, \phi \mid Y) = q(\vartheta - \phi, 0 \mid Y)$. The random walk version of the algorithm
was originally suggested by Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953) and
was first used to generate draws from the posterior distribution of DSGE model parameters by
Schorfheide (2000) and Otrok (2001).
The description of the RWM algorithm here follows An and Schorfheide (2007) closely.
(1) Compute the posterior mode of $\phi$; cf. (7.1) or (7.2). The mode is denoted by $\tilde{\phi}$.
(2) Let $\tilde{\Sigma}$ be the inverse of the Hessian evaluated at the posterior mode $\tilde{\phi}$. YADA actually
allows this matrix to be estimated in four different ways, and it allows its correlations to be
scaled jointly towards zero.
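$$\bar{\Sigma}_N = \sum_{s=-\bar{N}}^{\bar{N}}\left(1 - \frac{|s|}{\bar{N}+1}\right)\bar{\Gamma}_s, \tag{8.2}$$
where
$$\bar{\Gamma}_s = \frac{1}{N}\sum_{n=s+1}^{N}\left(\phi^{n} - \bar{\phi}\right)\left(\phi^{n-s} - \bar{\phi}\right)', \quad \text{if } s \geq 0,$$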
of the diagonal elements of . The matrix has the usual property that it converges to 0 in
probability as $N \to \infty$ when the Markov chain that has generated the $\phi^s$ sequence is ergodic;
see, e.g., Tierney (1994).
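The Bartlett-weighted sum of autocovariances in (8.2) can be sketched for a single scalar series as below. This is an illustration under stated assumptions. The function names are hypothetical, and the bandwidth is passed in directly. The numerical standard error of the posterior mean is taken as the square root of the estimated long-run variance divided by N, which is the usual Newey-West convention.

```python
import math


def newey_west_variance(draws, bandwidth):
    """Scalar version of (8.2): Bartlett-weighted sum of autocovariances."""
    n = len(draws)
    mean = sum(draws) / n
    dev = [x - mean for x in draws]

    def autocov(s):
        # Gamma_s = (1/N) * sum_{n=s+1}^{N} dev_n * dev_{n-s}, written with 0-based indices.
        return sum(dev[k] * dev[k - s] for k in range(s, n)) / n

    total = autocov(0)
    for s in range(1, bandwidth + 1):
        weight = 1.0 - s / (bandwidth + 1.0)
        # Gamma_{-s} equals Gamma_s for a scalar series, so each lag counts twice.
        total += 2.0 * weight * autocov(s)
    return total


def numerical_standard_error(draws, bandwidth):
    """Assumed convention: NSE of the sample mean is sqrt(Sigma_N / N)."""
    return math.sqrt(newey_west_variance(draws, bandwidth) / len(draws))
```

With a bandwidth of zero the estimator reduces to the ordinary sample variance. Positive serial correlation in the draws raises the estimate and hence the numerical standard error.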
8.2. The Slice Sampling Algorithm
One drawback with Metropolis-Hastings algorithms is that they require quite a lot of tuning
to work well. Tuning involves the selection of a proposal density and its parameterization. If
the proposal density is a poor approximation of the posterior distribution, convergence of the
sampler may be very slow or, even worse, the sampler may not cover important subsets of the
support for the posterior distribution.
The issue of convergence of the MCMC sampler is covered in Section 9 and the tools discussed
there apply to any MCMC sampler. Assuming that, for instance, the RWM sampler has converged
we may nevertheless find that it is not very efficient. In particular, a large number of draws
may be required to achieve convergence due to high serial correlation in the MCMC chain. To
examine how efficient the sampling algorithm is, variance ratios, such as the inefficiency factor,
can be computed from draws of the individual parameters as well as for the value of the log
posterior; see, e.g., Roberts (1996, Section 3.4).49
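Footnote 49 defines the inefficiency factor as 1 plus 2 times the sum of the autocorrelations, and RNE as its inverse. A minimal sketch follows. Truncating the sum at a user-chosen lag `max_lag` is an assumption of this sketch. Practical implementations typically use a kernel or a data-driven cutoff instead, and this need not match YADA's choice.

```python
def inefficiency_factor(draws, max_lag):
    """1 + 2 * (sum of sample autocorrelations up to max_lag)."""
    n = len(draws)
    mean = sum(draws) / n
    dev = [x - mean for x in draws]
    var = sum(d * d for d in dev) / n
    total = 0.0
    for lag in range(1, max_lag + 1):
        # Sample autocorrelation at this lag.
        total += sum(dev[t] * dev[t - lag] for t in range(lag, n)) / (n * var)
    return 1.0 + 2.0 * total


def relative_numerical_efficiency(draws, max_lag):
    """RNE, the inverse of the inefficiency factor."""
    return 1.0 / inefficiency_factor(draws, max_lag)
```

For the trending toy series [1, 2, 3, 4] the lag-1 autocorrelation is 0.25. The inefficiency factor with `max_lag=1` is therefore 1.5, and the RNE is 2/3.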
An MCMC sampler that requires less tuning and which relies on standard distributions is the
so called slice sampler. This sampler is based on the idea that to sample a random variable one
can sample uniformly from a region under a slice of its density function; see Neal (2003). This idea
is particularly attractive for models with non-conjugate priors, such as DSGE models, since for
the region under the slice to be unique the height of the density need only be determined up to
a constant.
To formalize the idea of slice sampling we shall first consider the univariate case. Let $f(x)$ be
proportional to the density of $x$, denoted by $p(x)$. A slice sampling algorithm for $x$ follows these
steps:
(1) Draw $y$ from $U(0, f(x_0))$, where $x_0$ is a value of $x$ such that $f(x_0) > 0$. This defines
the horizontal slice $S = \{x : y < f(x)\}$, where by construction $x_0 \in S$.
(2) Find an interval, $I = (L, R)$, around $x_0$ that contains all, or much, of the slice.
(3) Draw a new point, $x_1$, uniformly from the part of the slice within $I$.
Notice that the horizontal slice $S$ is identical to the slice $S^* = \{x : y^* < p(x)\}$, where $y^* \equiv
[p(x_0)/f(x_0)]\,y$.
The first step of the sampler involves the introduction of an auxiliary variable, y, which need
not be stored once a new value of x has been drawn. It only serves the purpose of determining a
lower bound for $f(x)$, i.e., we are only interested in those values of $x$, on the horizontal axis, for
which $f(x)$ is greater than $y$. The difficult step in the algorithm is the second, where a suitable
interval on the horizontal axis needs to be determined.50 Unless the interval $I$ is identical to $S$,
the third step in the algorithm may require that the function $f(x)$ is evaluated more than once
per iteration.
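The three steps above leave open how the interval $I$ is found. One common choice, taken from Neal (2003), is the "stepping out" procedure. It starts from a randomly placed interval of width `width`, steps out until both ends leave the slice, and then shrinks the interval after each rejected proposal. This is assumed here and not confirmed as YADA's method. A minimal univariate sketch, working with $\ln f(x)$ to avoid underflow:

```python
import math
import random


def slice_sample(log_f, x0, width, n_draws, rng=None, max_steps=100):
    """Univariate slice sampler with stepping out and shrinkage (Neal, 2003)."""
    rng = rng or random.Random()
    x = x0
    draws = []
    for _ in range(n_draws):
        # Step 1: auxiliary height y ~ U(0, f(x)), kept on the log scale.
        # `or 1e-300` guards against log(0) if random() returns exactly 0.0.
        log_y = log_f(x) + math.log(rng.random() or 1e-300)
        # Step 2: randomly position an interval of length `width` around x and
        # step out until both ends leave the slice (at most max_steps in total).
        left = x - width * rng.random()
        right = left + width
        j = int(max_steps * rng.random())
        k = max_steps - 1 - j
        while j > 0 and log_f(left) > log_y:
            left -= width
            j -= 1
        while k > 0 and log_f(right) > log_y:
            right += width
            k -= 1
        # Step 3: sample uniformly from (left, right), shrinking towards x
        # after each point that falls outside the slice.
        while True:
            x_new = left + (right - left) * rng.random()
            if log_f(x_new) > log_y:
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        x = x_new
        draws.append(x)
    return draws
```

For a standard normal target, `log_f = lambda x: -0.5 * x * x`. This is the case in footnote 50, where the slice is available in closed form. The generic sketch instead locates the slice numerically, at the cost of a few extra evaluations of $f$ per draw.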
A multivariate case using hyperrectangles is described by Neal (2003, Section 5.1). The three
steps given above are still valid, except that $x$ is now $m$-dimensional, and the interval $I$ is replaced
by the hyperrectangle $H = (L_1, R_1) \times \cdots \times (L_m, R_m)$. Let $w_i$ be scale estimates for each variable
$i = 1, \ldots, m$ while $x_0$ is the previous value of $x$. The multivariate slice sampler based on
hyperrectangles can be implemented as follows:
(1) Draw $y$ from $U(0, f(x_0))$;
49
The inefficiency factor is equal to the ratio between the variance of the draws when autocorrelation is taken into
account and the variance under the assumption that the draws are i.i.d. That is, the inefficiency factor is equal to 1
plus 2 times the sum of the autocorrelations. For positively autocorrelated draws this factor is greater than unity, and
the larger its value is, the more inefficient the sampler is. Roberts (1996) considers the inverse of this ratio, i.e., the
efficiency factor, which YADA presents as RNE (relative numerical efficiency); see also Geyer (1992).
50
Draws from a normal distribution can be produced with this algorithm. Specifically, given a value for $x$, draw
$y$ uniformly from $\bigl(0, \exp(-x^2/2)/\sqrt{2\pi}\bigr)$. Define the upper bound of the horizontal slice by $R = \sqrt{-2\ln\bigl(y\sqrt{2\pi}\bigr)}$,
while $L = -R$. Finally, draw $x$ uniformly from $(L, R)$.
The credible set is generally not unique; see, e.g., Bernardo and Smith (2000, Chapter 5.1) or
Geweke (2005, Chapter 2.5). For example, we may select $C = [\theta_l, \theta_u]$, $\theta_l < \theta_u$, such that
$\Pr(\theta < \theta_l) = \Pr(\theta > \theta_u) = \alpha/2$, i.e., an equal tails credible interval. One advantage of this
choice is that it is always unique. At the same time, a disadvantage is that it is generally not the
shortest possible credible set in terms of distance from the maximum to the minimum, or even
the set with the highest probability density values.
The highest probability density (HPD) credible set is a popular choice in Bayesian inference.
Such a set, $C_{HPD}$, is defined with respect to $p(\theta)$ such that
(i) $\int_{C_{HPD}} p(\theta)\,d\theta = 1 - \alpha$; and
(ii) $p(\theta_1) \geq p(\theta_2)$ for all $\theta_1 \in C_{HPD}$ and $\theta_2 \notin C_{HPD}$.
If $p(\theta)$ is unimodal and symmetric, then the HPD credible region is typically equal to the equal
tails credible interval. This is, for example, true when $p(\theta)$ is Gaussian. However, when $p(\theta)$ is
skewed, the equal tails credible interval is not equal to the HPD set. At the same time,
while the equal tails credible interval is unique, the HPD credible set need not be; for instance,
when $\theta$ is uniformly distributed.
The HPD set may be estimated directly from the posterior draws. Let the draws $\theta_i^s$ of parameter $i$ be ordered as $\theta_{i,j}$,
where $j = 1, \ldots, N$ and $\theta_{i,j} \leq \theta_{i,j+1}$ for all $j = 1, \ldots, N-1$. Furthermore, let $[(1-\alpha)N]$ denote
the closest integer and define the interval $R_j(N) = [\theta_{i,j}, \theta_{i,j+[(1-\alpha)N]}]$ for $j = 1, \ldots, N - [(1-\alpha)N]$. Provided that the posterior density of $\theta_i$ is unimodal and that the sample of draws is
ergodic, it is shown by Chen and Shao (1999, Theorem 2) that
$$R_{j^*}(N) = \min_{1 \leq j \leq N - [(1-\alpha)N]} R_j(N), \tag{8.3}$$
where the minimum is taken with respect to the length of the intervals, converges almost surely to $C_{HPD}$. Since $j^*$ need not be unique, Chen and Shao suggest using
the lowest value of $j$ satisfying (8.3) to obtain a unique estimate.
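A minimal sketch of the Chen and Shao (1999) estimator in (8.3) for a single parameter is given below. The integer closest to $(1-\alpha)N$ is computed with Python's `round`, capped so that the upper endpoint stays within the sample. Ties are resolved by keeping the lowest $j$, as suggested above. The function name is hypothetical.

```python
import math


def hpd_interval(draws, alpha):
    """Shortest interval spanning about (1 - alpha) N sorted draws, cf. (8.3)."""
    ordered = sorted(draws)
    n = len(ordered)
    # Closest integer to (1 - alpha) N, capped so that j + k stays in range.
    k = min(int(round((1.0 - alpha) * n)), n - 1)
    best_j, best_len = 0, math.inf
    for j in range(n - k):
        length = ordered[j + k] - ordered[j]
        if length < best_len:  # strict '<' keeps the lowest j among ties
            best_j, best_len = j, length
    return ordered[best_j], ordered[best_j + k]
```

For the skewed sample 0, 1, ..., 9, 100 and $\alpha = 0.2$, the estimate is [0, 9]. An equal tails interval would instead stretch towards the outlier.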
Furthermore, since the HPD is not invariant to transformations of $\theta_i$, such as $h(\theta_i)$, one needs
to estimate the HPD directly on the transformation. Likewise, when the transformation is a function
of $\theta$ (and the data), HPDs for such transformations should also be estimated directly from the
transformed values.
, unless $\bar{N} < 100$ and $N > 200$, when $\bar{N} = 100$ is selected. This ensures that
$\lim_{N\to\infty}\bar{N} = \infty$, while $\lim_{N\to\infty}\bar{N}^2/N = 0$; see, e.g., Geweke (2005, Theorem 4.7.3). The
output is given by StdErr, an $m \times m$ covariance matrix such that the numerical standard errors
are available as the square root of the diagonal elements.
8.5.2. DSGERWMPosteriorSampling
The function DSGERWMPosteriorSampling handles the actual run of the RWM algorithm. It
uses the same inputs as PosteriorModeEstimation, i.e., DSGEModel, CurrINI, controls. One
important difference relative to the posterior mode estimation function is that some fields in
the DSGEModel structure are ignored in favor of values saved to disk while running the posterior
mode estimation routine. In particular, YADA stores data about the prior distribution and sample dates information to disk and DSGERWMPosteriorSampling makes sure that the same prior
and dates are used when sampling from the posterior as when estimating the posterior mode.
Before starting up the RWM algorithm, the function performs a number of tests. First, it
attempts to execute the additional parameter functions that are present. If they run without
giving any errors, the measurement equation function is executed next and, thereafter, YADA
attempts to solve the DSGE model at the posterior mode estimate. Given that all checks return
positive results, the log posterior function logPosteriorPhiDSGE is executed for the posterior
mode estimate of $\phi$.
The RWM algorithm in YADA has been designed to be flexible and to avoid computing things
more often than necessary. The Posterior Sampling frame on the Options tab is displayed in
Figure 4. First, the choice of posterior sampler is provided: the RWM algorithm or the slice
sampler. Next, the total number of draws from the posterior per sampling chain can be selected
as well as the number of sample batches per chain. YADA always stores the data to disk when
a sample batch has been completed. The number of draws in each batch depends on the total
number of draws per chain and the number of sample batches per chain. This allows the user
to abort a run and later on restart the posterior sampler from the last saved position.
Furthermore, the number of sampling chains can be selected as well as the length of the
burn-in period for the sampler. The draws obtained during the burn-in period are later on
discarded from the total number of posterior draws per chain, although they will still be saved
to disk.
The selections thereafter turn to the proposal density. First, the method for estimating the
inverse Hessian at the posterior mode can be selected. YADA makes use of the output from the
selected optimization routine by default. Second, YADA can fit a quadratic to the evaluated
log posterior to estimate the diagonal of the inverse Hessian; cf. Section 7.2. In addition, when
the option Transform conditional standard deviations for modified Hessian to marginal using
correlations from Hessian on the Miscellaneous tab is check marked, then these estimates are
scaled up accordingly. For both possibilities, the correlation structure is thereafter taken from
the inverse Hessian that the optimization routine provides. Third, a finite difference estimator
can be applied. Here, the step length is determined by the user and this selection is located
on the Miscellaneous tab. Finally, a user specified parameter covariance matrix for the proposal
density is also supported. Such a matrix may, for instance, be estimated using old draws from
the posterior distribution. YADA supports this feature from the View menu, but any user defined
matrix is also allowed provided that it is stored in a mat-file and that the matrix has the name
ParameterCovarianceMatrix.
The estimator of the inverse Hessian can be influenced through a parameter that determines
its maximum absolute correlation. This parameter is by default set to 1 (no restriction), but
values between 0.95 and 0 can also be selected. This parameter interacts with the largest
absolute correlation for the inverse Hessian such that the off-diagonal elements of the new
inverse Hessian are given by the old off-diagonal elements times the minimum of 1 and the
ratio between the desired maximum absolute correlation and the estimated maximum absolute
correlation. Hence, if the desired maximum absolute correlation is greater than the estimated
maximum absolute correlation, then the inverse Hessian is not affected. On the other hand, if
the ratio between these correlations is less than unity, then all correlations are scaled towards
zero by this ratio. At the extreme, all correlations can be set to zero by selecting a maximum
absolute correlation of zero.
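The correlation cap described above can be sketched in a few lines. This is a minimal Python illustration with a hypothetical function name. It takes a covariance matrix as nested lists and scales every off-diagonal element by the minimum of 1 and the ratio of the desired to the estimated largest absolute correlation. The variances on the diagonal are left unchanged.

```python
import math


def cap_correlations(cov, max_abs_corr):
    """Scale all off-diagonal elements by min(1, max_abs_corr / largest |correlation|)."""
    m = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(m)]
    largest = max(
        (abs(cov[i][j]) / (sd[i] * sd[j]) for i in range(m) for j in range(m) if i != j),
        default=0.0,
    )
    factor = 1.0 if largest == 0.0 else min(1.0, max_abs_corr / largest)
    # Diagonal elements (variances) are kept; covariances are scaled jointly.
    return [[cov[i][j] if i == j else factor * cov[i][j] for j in range(m)] for i in range(m)]
```

For a 2 x 2 matrix with variances 4 and 1 and covariance 1.6, the estimated correlation is 0.8. A cap of 0.4 halves the covariance to 0.8, a cap of 1 leaves the matrix unchanged, and a cap of 0 removes all correlation.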
The following two selections concern the c0 and c scale parameters for the initial density
and for the proposal density, respectively. The selection of the c parameter strongly influences
the sample acceptance probability. If you consider this probability to be too low or high, then
changing c will often help; see, e.g., Adolfson, Lind, and Villani (2007c). The c0 parameter
gives the user a possibility to influence the initial value $\phi^0$. For instance, if $c_0 = 0$, then
$\phi^0 = \tilde{\phi}$, i.e., the posterior mode.
The next parameter in the Posterior Sampling frame is only used under multiple sampling
chains. The weight on randomization refers to the weight given to a randomly drawn $\phi$, determined as in the case of the single chain but with $c_0 = 4$, relative to the posterior mode when
setting up $\phi^0$. Hence, if the weight on randomization is 1, then each sampling chain uses
$\phi^0$ drawn from $N_m(\tilde{\phi}, 16\tilde{\Sigma})$, and if the weight on randomization is 0, then each sampling chain starts
from the posterior mode. This means that the weight on randomization plays the same role as $c_0$, except
that the implied value of $c_0$ is restricted to the 0-4 interval. Since multiple chains are used to check, for instance,
convergence related issues, it is not recommended to start all chains from the posterior mode.
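The text does not spell out how the randomized draw and the posterior mode are combined. The sketch below assumes a convex combination: initial value = weight times a random draw plus (1 - weight) times the mode. The random draw is taken from a normal distribution centred on the mode with covariance 16 times the inverse Hessian, i.e., scale factor 4. Under that assumption the initial value is normal with scale 4 times the weight, which matches the statement that the weight acts like $c_0$ restricted to the 0-4 interval. Function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L' = a for a small positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def initial_value(mode, inv_hessian, weight, rng):
    """Assumed form: phi0 = weight * phi_r + (1 - weight) * mode,
    with phi_r ~ N(mode, 16 * inv_hessian)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight on randomization must lie in [0, 1]")
    low = cholesky(inv_hessian)
    m = len(mode)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # Scale factor 4 gives the covariance 4**2 * inv_hessian = 16 * inv_hessian.
    phi_r = [mode[i] + 4.0 * sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
    return [weight * phi_r[i] + (1.0 - weight) * mode[i] for i in range(m)]
```

With weight 0 every chain starts exactly at the mode. With weight 0.5 and a scalar inverse Hessian of 0.25, the starting values have a standard deviation of 1.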
The following parameter on this frame determines the maximum number of draws from the
posterior that will be used in prediction exercises. When comparing the number of posterior
draws minus the length of the burn-in period to the desired maximum number of draws to use
in such exercises YADA selects the smallest of these two numbers. Among the stored posterior
draws the parameter values used in prediction are obtained either by using a fixed interval (default) or by drawing randomly from the available draws using a uniform distribution. The fixed interval
is chosen as the largest one that still yields the desired number of parameter draws. The option Randomize draws from posterior distributions on the Miscellaneous tab
determines which option is used.
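The two selection rules can be sketched as follows. This is a hypothetical Python helper. It assumes that the fixed interval starts at the first stored draw and that random selection is uniform without replacement. The text does not state either detail.

```python
import random


def select_draws(num_available, max_draws, randomize=False, rng=None):
    """Indices of stored posterior draws to use in prediction exercises."""
    n = min(num_available, max_draws)
    if n < 1:
        return []
    if randomize:
        rng = rng or random.Random()
        # Uniform selection without replacement (an assumption of this sketch).
        return sorted(rng.sample(range(num_available), n))
    # Largest fixed step such that n equally spaced indices fit in the sample.
    step = (num_available - 1) // (n - 1) if n > 1 else 1
    return [k * step for k in range(n)]
```

For example, with 10 available draws and 4 requested, the fixed-interval rule picks indices 0, 3, 6, and 9. When more draws are requested than are available, all of them are used.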
The final parameter on the Posterior Sampling frame is the percentage use of posterior draws
for impulse responses, variance decompositions, etc. It allows the user to consider only a share
of the available posterior draws when computing the posterior distributions of such functions. It
may be useful to make use of a small share of the number of available draws when preliminary
results are desired or when the user is mainly concerned with point estimates, such as the
posterior mean. As above for the prediction case, if fewer than 100 percent are used, the user
can choose between the largest fixed interval between draws (default) or uniform draws from
the posterior draws.
One additional user determined parameter influences how the RWM algorithm is executed.
This parameter is located on the DSGE Posterior Sampling frame on the Settings tab; see Figure 5. If the checkbox Overwrite old draws is check marked, then previous draws will be overwritten. Conversely, when this box is not checked, then old draws will be used. This allows the
user to recover from a previously aborted run of the RWM algorithm provided that the number
of sample batches is greater than 1 and that at least one batch was saved to disk.
There is also another case when YADA can recover previously saved posterior draws. Given
that the checkbox Overwrite old draws is check marked and the posterior draws are only obtained from one sampling chain, YADA will check if, for your current selection about the number of sample batches, posterior draws exist on disk such that the number of posterior draws is
lower than what you have currently selected. For instance, suppose you have selected to save
10 sample batches per chain and that you currently consider 100,000 posterior draws. YADA
will then check if you have previously run the posterior sampler with fewer than 100,000 draws for
10 sample batches. The highest such number of posterior draws will then be considered as a
candidate. Supposing that you have already run the posterior sampler for 75,000 draws with
10 sample batches, YADA will ask you if you would like to make use of these 75,000 draws.
When the RWM algorithm has finished, YADA first allows for (but does not force) sequential
estimation of the marginal likelihood. The alternative estimators are discussed in Section 10.
The choice of algorithm is stated on the DSGE Posterior Sampling frame, where certain other
parameters that are needed by the marginal likelihood estimation function can also be selected.
Before DSGERWMPosteriorSampling has completed its mission it sends the results to a function that writes a summary of them to file. This file is finally displayed for the user.
8.5.3. DSGESlicePosteriorSampling
The function DSGESlicePosteriorSampling handles the actual run of the slice sampler. It uses
the same inputs as DSGERWMPosteriorSampling, i.e., DSGEModel, CurrINI, controls. Moreover,
it behaves in essentially the same way as the RWM function with the exception of the actual
sampling part. In addition, the RWM algorithm keeps track of the acceptance rate, while the
slice sampler counts the number of times the log posterior is computed. In all other respects,
the functions perform the same tasks except that the slice sampler will never call the Chib and
Jeliazkov (2001) marginal likelihood estimator function; see Section 10.4.4.
8.5.4. ExponentialRndFcn
The function ExponentialRndFcn computes random draws from an exponential distribution.
The function takes one required input variable, mu, the mean of the distribution. An optional
input variable for total number of draws, NumDraws, is also accepted. The default value for this
integer is 1. The algorithm used is discussed in footnote 51 on page 96.
Figure 5. The DSGE Posterior Sampling frame on the Settings tab in YADA.
The function provides one output variable, z, a matrix with row dimension equal to the length
of mu and column dimension equal to NumDraws.
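Footnote 51, which describes the algorithm actually used, is not reproduced in this section. As a stand-in, the sketch below uses the standard inverse-CDF method: $z = -\mu\ln(1-U)$ with $U \sim U[0,1)$. This is an assumption and need not match ExponentialRndFcn. The output layout mirrors the description above, with one row per element of `mu` and `num_draws` columns.

```python
import math
import random


def exponential_rnd(mu, num_draws=1, rng=None):
    """Matrix with len(mu) rows and num_draws columns; row i has mean mu[i]."""
    rng = rng or random.Random()
    # Inverse-CDF method: if U ~ U[0, 1), then -m * log(1 - U) is exponential with mean m.
    return [[-m * math.log(1.0 - rng.random()) for _ in range(num_draws)] for m in mu]
```

Calling `exponential_rnd([0.5, 2.0], 20000)` gives a 2 x 20000 matrix whose row means are close to 0.5 and 2.0.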
The cusum path plot is obtained by plotting $\{C_i\}$ against $i = 1, \ldots, N$. If $N$ is very large it may
be practical to plot the statistic against, say, $i = N_0, 2N_0, \ldots, N$ instead for some suitable integer
$N_0$.
The cusum statistic in (9.1) is zero for $i = N$. In YADA the value of the sample mean of the summary statistic, $\hat{\mu}$, is added to $C_i$, and the
summary statistics are either the log posterior ($\ln L(Y; g^{-1}(\phi)) + \ln p(g^{-1}(\phi)) + \ln J(\phi)$), the
original parameters ($\theta$), or the transformed parameters ($\phi$). Moreover, YADA calculates moving
window cusum paths for a fixed window size of $N_1 = N/10$, i.e.,
$$C_i = \sum_{j=i+1-N_1}^{i}\left(S(X_j) - \hat{\mu}\right), \quad i = N_1, \ldots, N, \tag{9.2}$$
where, again, $\hat{\mu}$ is added to $C_i$.
A separated partial means test for a single MCMC chain has been suggested by Geweke;
see, e.g., Geweke (2005, Theorem 4.7.4). Let N be the number of draws and suppose that
$N_p = N/(2p)$ and $p$ are positive integers. For instance, with $N = 10{,}000$ and $p = 5$ we have that
$N_5 = 1{,}000$. Define the $p$ separated partial means:
$$\bar{S}^N_{j,p} = \frac{1}{N_p}\sum_{m=1}^{N_p} S^{(m + N_p(2j-1))}, \quad j = 1, \ldots, p, \tag{9.3}$$
where $S$ is some summary statistic of the transformed parameters $\phi$ (such as the original parameters $\theta$). Let $\hat{\sigma}_{j,p}$ be the Newey and West (1987) numerical standard error of $\bar{S}^N_{j,p}$ for $j = 1, \ldots, p$.
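Define the $(p-1)$ vector $\bar{S}^N_p$ with typical element $\bar{S}^N_{j+1,p} - \bar{S}^N_{j,p}$ and the $(p-1) \times (p-1)$
tridiagonal matrix $V^N_p$ where
$$V^N_{j,j} = \hat{\sigma}^2_{j,p} + \hat{\sigma}^2_{j+1,p}, \quad j = 1, \ldots, p-1,$$
and
$$V^N_{j,j+1} = V^N_{j+1,j} = -\hat{\sigma}^2_{j+1,p}, \quad j = 1, \ldots, p-2.$$
The statistic
$$G^N_p = \bar{S}^{N\prime}_p\left(V^N_p\right)^{-1}\bar{S}^N_p \xrightarrow{d} \chi^2(p-1), \tag{9.4}$$
as $N \to \infty$ under the hypothesis that the MCMC chain has converged with the separated partial
means being equal.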
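The statistic in (9.3)-(9.4) can be sketched as below. This sketch makes several assumptions. Each block's squared numerical standard error uses a Bartlett-weighted Newey-West estimator with a user-chosen `bandwidth`. The tridiagonal system is solved with the Thomas algorithm. The result must be compared with a $\chi^2(p-1)$ critical value looked up elsewhere, since the standard library has no chi-square quantile function. All function names are hypothetical.

```python
def nw_mean_variance(x, bandwidth):
    """Squared Newey-West numerical standard error of the mean of x."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    total = sum(v * v for v in d) / n
    for s in range(1, bandwidth + 1):
        autocov = sum(d[k] * d[k - s] for k in range(s, n)) / n
        total += 2.0 * (1.0 - s / (bandwidth + 1.0)) * autocov
    return total / n


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    for i in range(n):
        sub = off[i - 1] if i > 0 else 0.0
        denom = diag[i] - (sub * c[i - 1] if i > 0 else 0.0)
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - (sub * d[i - 1] if i > 0 else 0.0)) / denom
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0.0)
    return x


def separated_partial_means_stat(draws, p, bandwidth=0):
    """G statistic in (9.4); compare with chi-square(p - 1) critical values."""
    n = len(draws)
    if p < 2 or n % (2 * p) != 0:
        raise ValueError("need p >= 2 and N / (2p) to be an integer")
    n_p = n // (2 * p)
    means, variances = [], []
    for j in range(1, p + 1):
        start = n_p * (2 * j - 1)  # 0-based index of draw m + N_p(2j - 1) for m = 1
        block = draws[start:start + n_p]
        means.append(sum(block) / n_p)
        variances.append(nw_mean_variance(block, bandwidth))
    diff = [means[j + 1] - means[j] for j in range(p - 1)]
    diag = [variances[j] + variances[j + 1] for j in range(p - 1)]
    off = [-variances[j + 1] for j in range(p - 2)]
    x = solve_tridiagonal(diag, off, diff)  # x = V^{-1} * diff
    return sum(a * b for a, b in zip(diff, x))
```

The odd-numbered blocks of length $N_p$ are discarded, so neighbouring partial means are separated by $N_p$ draws. This is why only adjacent differences share a variance term in the tridiagonal matrix.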
9.2. Multiple Chain Convergence Statistics
One approach for monitoring convergence using draws from multiple MCMC chains is to use
analysis of variance. The approach outlined below is based on Brooks and Gelman (1998),
which generalizes the ideas in Gelman and Rubin (1992); see also Gelman (1996) and Gelman
et al. (2004).
For a univariate scalar summary $S$ we may assume that we have $N$ draws from $M$ chains.
Let $S_{ij}$ denote draw $i$ from chain $j$. We may then define the average of $S$ for chain $j$ as $\bar{S}_j = (1/N)\sum_{i=1}^{N} S_{ij}$, while the overall average is $\bar{S} = (1/M)\sum_{j=1}^{M}\bar{S}_j$. The between-chain variance
$B$ and the within-chain variance $W$ are now given by
$$B = \frac{N}{M-1}\sum_{j=1}^{M}\left(\bar{S}_j - \bar{S}\right)^2, \tag{9.5}$$
and
$$W = \frac{1}{M(N-1)}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(S_{ij} - \bar{S}_j\right)^2. \tag{9.6}$$
The between-chain variance $B$ contains a factor of $N$ because it is based on the variance of the
within-chain means, $\bar{S}_j$, each of which is an average of $N$ draws $S_{ij}$.
From the two variance components two estimates of the variance of $S$ in the target distribution, $\Sigma_S$, can be constructed. First,
$$\hat{\Sigma}_S = \frac{N-1}{N}W + \frac{1}{N}B \tag{9.7}$$
is an unbiased estimate of the variance under the assumption of stationarity, i.e., when the
starting points of the posterior draws are actually draws from the target distribution. Under the
more realistic assumption that the starting points are overdispersed, (9.7) overestimates the variance of $S$.
Second, for any finite $N$, the within-chain variance $W$ in (9.6) should underestimate the variance of $S$. In the limit as $N \to \infty$, both $\hat{\Sigma}_S$ and $W$ approach $\Sigma_S$, but from opposite directions.
Accounting for the sampling variability of the estimator $\bar{S}$ yields a pooled posterior variance estimate of $\hat{V} = \hat{\Sigma}_S + B/(MN)$.
To monitor convergence of the posterior simulation, Gelman and Rubin (1992) therefore suggest estimating the ratio of the upper and the lower bounds of the variance of $S$ through
$$\hat{R} = \sqrt{\frac{\hat{V}}{W}} = \sqrt{\frac{N-1}{N} + \frac{(M+1)B}{MNW}}. \tag{9.8}$$
As the simulation converges, the potential scale reduction factor in (9.8) declines to 1. This
means that the M parallel Markov chains are essentially overlapping.
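Equations (9.5)-(9.8) translate directly into code. A minimal sketch for a scalar summary follows, taking `chains` as a list of M equally long lists of draws. The function name is hypothetical, and the within-chain variance is assumed to be positive.

```python
import math


def psrf(chains):
    """Potential scale reduction factor (9.8) for M chains of N scalar draws."""
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least two")
    chain_means = [sum(c) / n for c in chains]
    grand_mean = sum(chain_means) / m
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)  # (9.5)
    w = sum(sum((x - cm) ** 2 for x in c)
            for c, cm in zip(chains, chain_means)) / (m * (n - 1))  # (9.6)
    sigma_hat = (n - 1) / n * w + b / n  # (9.7)
    v_hat = sigma_hat + b / (m * n)  # pooled variance estimate
    return math.sqrt(v_hat / w)  # (9.8)
```

As a check, the chains [0, 1] and [2, 3] give B = 4 and W = 0.5. Both forms of (9.8) then yield the square root of 6.5.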
The scalar summary $S$ can here be individual parameters of the DSGE model. A multivariate
version of the potential scale reduction factor is suggested by Brooks and Gelman (1998). Now
$S$ is, e.g., a vector of all the model parameters, with $B$ and $W$ being covariance matrix estimators
of the between-chain and within-chain covariance. The multivariate potential scale reduction
factor (MPSRF) is now given by
$$\hat{R} = \sqrt{\frac{N-1}{N} + \frac{M+1}{M}\lambda_1}, \tag{9.9}$$
where $\lambda_1$ is the largest eigenvalue of the positive definite matrix $(1/N)W^{-1}B$; see Brooks and
Gelman (1998, Lemma 2). The multivariate potential scale reduction factor in (9.9) declines
towards 1 as the simulation converges. Gelman et al. (2004) suggest that values for $\hat{R}$ less
than 1.1 may be regarded as an indication that the MCMC sampler has converged.
In addition to monitoring the MPSRF in (9.9), Gelman et al. (2004) also suggest monitoring
the determinants of $W$ and $B$. This allows the user to also check if both the within-chain
covariance matrix $W$ and the between-chain covariance matrix $B$ stabilize as functions of $N$.
columns, where the first holds the number of draws used in the calculation, the second the $\hat{R}$
value, the third the determinant of $\hat{V}$, while the fourth holds the determinant of $W$. The rows
correspond to the sample sizes used for the sequential estimates, as can be read from the first
column.
$$\frac{1}{p(Y)} = \int \frac{f(\theta)}{p(Y|\theta)\,p(\theta)}\,p(\theta|Y)\,d\theta, \tag{10.2}$$
where $f(\theta)$ is a proper probability density function such that $\int f(\theta)\,d\theta = 1$; see Gelfand and
Dey (1994).52 Given a choice for $f(\theta)$, the marginal likelihood $p(Y)$ can then be estimated
using:
$$\hat{p}_H(Y) = \left[\frac{1}{N}\sum_{s=1}^{N}\frac{f(\theta^s)}{L(Y;\theta^s)\,p(\theta^s)}\right]^{-1}, \tag{10.3}$$
where $\theta^s$ is a draw from the posterior distribution and $N$ is the number of draws. As noted
by, e.g., An and Schorfheide (2007), the numerical approximation is efficient if $f(\theta)$ is selected
such that the summands are of equal magnitude. It can be shown that the harmonic mean
estimator is consistent but not unbiased. In fact, due to Jensen's inequality it is upward biased.
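In practice (10.3) is evaluated on the log scale, because the posterior kernel values are far too small to exponentiate directly. A minimal sketch using the log-sum-exp shift follows. The function name is hypothetical. The inputs are assumed to be the log weight function and the log posterior kernel evaluated at each draw. Draws outside a truncation region can be passed with a log weight of `-math.inf`.

```python
import math


def log_marginal_likelihood_hm(log_weights, log_kernels):
    """ln of the harmonic mean estimate (10.3), computed on the log scale.

    log_weights[s] = ln f(theta^s); log_kernels[s] = ln L(Y; theta^s) + ln p(theta^s).
    """
    terms = [lw - lk for lw, lk in zip(log_weights, log_kernels)]
    top = max(terms)  # log-sum-exp shift avoids overflow and underflow
    log_mean = top + math.log(sum(math.exp(t - top) for t in terms)) - math.log(len(terms))
    return -log_mean  # (10.3) inverts the average, so the log changes sign
```

If $f$ happened to equal the posterior density exactly, every summand would be $1/p(Y)$ and the estimate would be exact. This illustrates why summands of equal magnitude make the approximation efficient.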
10.2.1. Truncated Normal Weighting Function
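Geweke (1999) suggested using the density of a truncated multivariate normal distribution in (10.3). That is, for $0 < p < 1$

$$f(\theta) = \frac{\exp\left[-\frac{1}{2}(\theta-\bar{\theta})'\bar{\Sigma}_\theta^{-1}(\theta-\bar{\theta})\right]}{p\,(2\pi)^{m/2}\,|\bar{\Sigma}_\theta|^{1/2}} \times \left\{(\theta-\bar{\theta})'\bar{\Sigma}_\theta^{-1}(\theta-\bar{\theta}) \leq \chi^2_p(m)\right\}, \qquad (10.4)$$

where $\bar{\theta}$ is the mean and $\bar{\Sigma}_\theta$ the covariance matrix from the output of the posterior simulator, i.e., $\bar{\theta} = N^{-1}\sum_{s=1}^{N}\theta^{(s)}$ and $\bar{\Sigma}_\theta = N^{-1}\sum_{s=1}^{N}(\theta^{(s)}-\bar{\theta})(\theta^{(s)}-\bar{\theta})'$. The expression $\{a \leq b\}$ is 1 if true and 0 otherwise, while $\chi^2_p(m)$ is the $100p$ percentile value of the $\chi^2$ distribution with $m$ degrees of freedom; see also Geweke (2005, Section 8.2.4 and Theorem 8.1.2).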
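To make (10.4) concrete, here is a minimal Python sketch of $\ln f(\theta)$ for the truncated normal weighting function. It is an illustration under stated assumptions, not YADA code: the function names are hypothetical, the covariance matrix is assumed positive definite, and because the Python standard library has no $\chi^2$ quantile function the critical value $\chi^2_p(m)$ is passed in by the caller. A helper is included only for the case $m = 1$, where $\chi^2_p(1)$ is the square of the $(1+p)/2$ standard normal quantile.

```python
import math
from statistics import NormalDist


def _cholesky(a):
    """Lower-triangular L with L L' = a for a symmetric positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chi2_crit_one_dof(p):
    """100p percentile of chi2(1): the square of the (1+p)/2 standard normal quantile."""
    return NormalDist().inv_cdf(0.5 * (1.0 + p)) ** 2


def log_truncated_normal_weight(theta, mean, cov, p, chi2_crit):
    """ln f(theta) in (10.4); chi2_crit is the 100p percentile of chi2(m)."""
    m = len(theta)
    L = _cholesky(cov)
    d = [t - mu for t, mu in zip(theta, mean)]
    # Forward substitution L z = d gives the quadratic form d' cov^{-1} d = z'z.
    z = []
    for i in range(m):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    if quad > chi2_crit:
        return -math.inf  # outside the truncation region, f(theta) = 0
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(m))
    return -math.log(p) - 0.5 * m * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad
```

The division by $p$ is what makes the truncated density integrate to one over the ellipsoid, which the usage check below confirms numerically for $m = 1$.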
10.2.2. Truncated Elliptical Weighting Function
The accuracy of the harmonic mean estimator depends on the degree of overlap between the
numerator (weight function) and denominator (posterior kernel) in (10.3). When the posterior
density of $\theta$ is far from Gaussian, the Gaussian weighting function is less likely to work well. First of all, the height of the posterior density can be very low at the mean, especially when it is multimodal. Second, the truncated normal is often a poor local approximation to a non-Gaussian posterior density. Third, the likelihood can get close to zero in the interior of the
parameter space. To deal with these three issues, Sims, Waggoner, and Zha (2008) (SWZ) have
suggested an alternative weighting function based on a truncated elliptical distribution.
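Let $\hat{\theta}$ be the posterior mode and $\hat{\Omega}$ the scaling matrix $\hat{\Omega} = N^{-1}\sum_{s=1}^{N}(\theta^{(s)}-\hat{\theta})(\theta^{(s)}-\hat{\theta})'$. Next, let the nonnegative scalar $r$ be given by

$$r = \left[(\theta-\hat{\theta})'\hat{\Omega}^{-1}(\theta-\hat{\theta})\right]^{1/2}. \qquad (10.5)$$

The elliptical density is then given by

$$g(\theta) = \frac{\Gamma(m/2)\, h(r)}{2\pi^{m/2}\,|\hat{\Omega}|^{1/2}\, r^{m-1}}, \qquad (10.6)$$

where $m$ is the dimension of $\theta$, and $h(r)$ is a density function, defined for nonnegative $r$, that does not depend on $m$ and is to be estimated.[53]

[52] The identity follows from noticing that $p(\theta|Y) = p(Y|\theta)p(\theta)/p(Y)$, so that the right hand side of (10.2) is equal to $(1/p(Y))\int f(\theta)\,d\theta = 1/p(Y)$.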
Let $r^{(s)}$ be the value of $r$ when $\theta = \theta^{(s)}$, i.e., it represents the value of $r$ for the posterior draws $s = 1,\ldots,N$. Suppose that $h(r)$ has support on $[a, b]$ and is defined as

$$h(r|a,b,c) = \frac{c\, r^{c-1}}{b^c - a^c}. \qquad (10.7)$$
The parameters $a$, $b$, and $c$ are chosen as follows. Let $Q_x$ be the $x$ percentile of the $r^{(s)}$ values, ordered from the smallest to the largest, for $x = 1, 10, 90$. The values $b$ and $c$ are selected such that the probability of $r \leq Q_{10}$ from $h(r|0,b,c)$ is equal to 0.1 and the probability of $r \leq Q_{90}$ from $h(r|0,b,c)$ is equal to 0.9. Since the distribution function of $h(r|0,b,c)$ is $(r/b)^c$ for $0 \leq r \leq b$, this means that

$$c = \frac{\ln(1/9)}{\ln(Q_{10}/Q_{90})}, \qquad b = \frac{Q_{90}}{0.9^{1/c}}.$$

Furthermore, $a = Q_1$ is used to keep $h(r)$ bounded above.
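The percentile rules above translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, and the percentile is computed with a simple nearest-rank rule, which is an assumption (YADA may interpolate differently). It returns $(a, b, c)$ and evaluates $h(r|a,b,c)$ from (10.7).

```python
import math


def swz_h_parameters(r_draws):
    """Choose (a, b, c) of h(r|a,b,c) in (10.7) from the r^(s) values.

    Percentiles use a nearest-rank rule; this choice is an assumption.
    """
    r = sorted(r_draws)
    n = len(r)

    def pct(x):
        # nearest-rank x percentile of the sorted values
        return r[max(1, math.ceil(x * n / 100)) - 1]

    q1, q10, q90 = pct(1), pct(10), pct(90)
    c = math.log(1.0 / 9.0) / math.log(q10 / q90)  # (Q10/Q90)^c = 1/9
    b = q90 / 0.9 ** (1.0 / c)                      # (Q90/b)^c = 0.9
    a = q1                                          # keeps h(r) bounded above
    return a, b, c


def h_density(r, a, b, c):
    """h(r|a,b,c) = c r^(c-1) / (b^c - a^c) on [a, b], zero elsewhere."""
    if r < a or r > b:
        return 0.0
    return c * r ** (c - 1.0) / (b ** c - a ** c)
```

For example, if the $r^{(s)}$ values are spread like $\sqrt{u}$ with $u$ uniform on $(0, 1]$, the rules recover $c \approx 2$ and $b \approx 1$, since then $(r/b)^c$ is exactly the uniform distribution function of $u$.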
In order to truncate the elliptical distribution, SWZ suggested using iid draws from $g(\theta)$. These are constructed as

$$\theta^{(i)} = \hat{\theta} + \frac{r^{(i)}}{\|x^{(i)}\|}\,\hat{\Omega}^{1/2}x^{(i)}, \qquad i = 1,\ldots,M, \qquad (10.8)$$

[53] The normal, Student t, logistic, and Laplace distributions are examples of elliptical distributions; see, e.g., Fang, Kotz, and Ng (1990) and Landsman and Valdez (2003).
weight function and the posterior kernel have very little overlap and the estimated marginal likelihood is likely to be misleading.[54]
10.2.3. Transformed Parameters
Since YADA works internally with the transformed parameters, the expression for the modified harmonic mean estimator of the marginal likelihood is slightly different. Specifically,

$$\hat{p}_H(Y) = \left[\frac{1}{N}\sum_{s=1}^{N} \frac{f(\phi^{(s)})}{L(Y;g^{-1}(\phi^{(s)}))\, p(g^{-1}(\phi^{(s)}))\, J(\phi^{(s)})}\right]^{-1}, \qquad (10.11)$$

where $f(\phi)$ is either the truncated normal or the truncated elliptical density for the transformed parameters.
The numerical standard error can easily be computed for the modified harmonic mean estimator of the (log of the) marginal likelihood. Let

$$m(\phi) = \frac{f(\phi)}{k(\phi|Y)}, \qquad (10.12)$$

where $k(\phi|Y) = L(Y;g^{-1}(\phi))\,p(g^{-1}(\phi))\,J(\phi)$ is the posterior kernel of $p(\phi|Y)$. To handle possible numerical issues we introduce a constant $c$ which guarantees that a rescaling of $m(\phi)$ is bounded for all $\phi$. An example of such a numerical problem is that the log posterior kernel is equal to, say, $-720$ for some $\phi$. While $\exp(-720)$ is a small positive number, computer software such as Matlab states that $1/\exp(-720)$ is infinite.[55]
One suitable choice is often $c = \max_s k(\phi^{(s)}|Y)$, i.e., the largest posterior kernel value over all posterior draws of $\phi$. For any suitable choice of $c$, it follows that

$$n(\phi) = \frac{f(\phi)}{\exp\left[\ln k(\phi|Y) - \ln c\right]} = c\, m(\phi). \qquad (10.13)$$

The rescaling needs to be accounted for when computing the marginal likelihood. Let $\bar{n}$ be the sample mean of $n(\phi^{(s)})$. It follows that the log marginal likelihood based on the modified harmonic mean estimator is given by

$$\ln \hat{p}_H(Y) = \ln c - \ln \bar{n}. \qquad (10.14)$$

The $n(\phi^{(s)})$ values can now be used to compute a numerical standard error of the log marginal likelihood based on the modified harmonic mean estimator. The numerical standard error of $\bar{n}$ can be computed by applying the Newey and West (1987) estimator in (8.2) to the $n(\phi^{(s)})$ series. The delta method can thereafter be applied such that the numerical standard error of the log marginal likelihood is equal to the numerical standard error of $\bar{n}$ divided by $\bar{n}$.
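The rescaling in (10.13)-(10.14) and the delta-method standard error can be sketched as follows. This Python version is an illustration, not YADA's code. It assumes a Bartlett-weighted (Newey-West) variance of the mean with lag truncation $\lfloor N^{1/2.01}\rfloor$, borrowed from the choice described for MargLikeChibJeliazkov, and it takes the weighting function and the kernel in logs. All names are hypothetical.

```python
import math


def newey_west_var_of_mean(x, lags):
    """Bartlett-weighted (Newey-West) variance of the sample mean of x."""
    n = len(x)
    mu = sum(x) / n
    e = [v - mu for v in x]
    s = sum(v * v for v in e) / n
    for l in range(1, lags + 1):
        gamma = sum(e[t] * e[t - l] for t in range(l, n)) / n
        s += 2.0 * (1.0 - l / (lags + 1.0)) * gamma
    return s / n


def log_mhm_marginal_likelihood(log_f, log_kernel):
    """Return ln p_H(Y) from (10.13)-(10.14) and its delta-method NSE.

    log_f[s] = ln f(phi_s) (may be -inf outside the truncation region),
    log_kernel[s] = ln k(phi_s|Y) for the posterior draws s = 1, ..., N.
    """
    ln_c = max(log_kernel)  # c = largest posterior kernel value over the draws
    n_vals = [math.exp(lf - (lk - ln_c)) for lf, lk in zip(log_f, log_kernel)]
    n_bar = sum(n_vals) / len(n_vals)
    lags = int(len(n_vals) ** (1.0 / 2.01))  # lag truncation: an assumption
    nse = math.sqrt(max(newey_west_var_of_mean(n_vals, lags), 0.0)) / n_bar
    return ln_c - math.log(n_bar), nse
```

If $f$ happened to equal the normalized posterior, every $n(\phi^{(s)})$ would be identical, the estimate would be exact, and the standard error would be zero. This is a handy sanity check even when the log kernel values are around $-720$, where working with $1/\exp(\cdot)$ directly would overflow.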
10.3. The Chib and Jeliazkov Estimator
The Chib and Jeliazkov (2001) estimator of the marginal likelihood starts from the so called
marginal likelihood identity
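$$p(Y) = \frac{L(Y;\theta)\,p(\theta)}{p(\theta|Y)} = \frac{L(Y;g^{-1}(\phi))\,p(g^{-1}(\phi))\,J(\phi)}{p(\phi|Y)}. \qquad (10.15)$$

This relation holds for any value of $\phi$ (and $\theta$), but a point with high posterior density should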
[55] In practice, it is important that rescaled log posterior kernel values are bounded from below by about $-700$ and from above by approximately 700. This guarantees that the exponential function yields finite values.
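The numerator of (10.15) can be determined directly once the posterior mode $\tilde{\phi}$ has been found. The denominator, however, requires a numerical approximation. Hence,

$$\hat{p}_{CJ}(Y) = \frac{L(Y;g^{-1}(\tilde{\phi}))\,p(g^{-1}(\tilde{\phi}))\,J(\tilde{\phi})}{\hat{p}(\tilde{\phi}|Y)}, \qquad (10.16)$$

where $\hat{p}(\tilde{\phi}|Y)$ remains to be calculated.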
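Based on the definition of $r(\phi,\tilde{\phi}|Y)$ in equation (8.1), let

$$\alpha(\phi,\tilde{\phi}|Y) = \min\left\{1,\, r(\phi,\tilde{\phi}|Y)\right\}. \qquad (10.17)$$

The proposal density is

$$q(\phi,\tilde{\phi}|Y) = (2\pi)^{-m/2}\left|c^2\tilde{\Sigma}\right|^{-1/2}\exp\left[-\frac{1}{2c^2}(\tilde{\phi}-\phi)'\tilde{\Sigma}^{-1}(\tilde{\phi}-\phi)\right]. \qquad (10.18)$$

This density is symmetric, i.e., $q(\phi,\tilde{\phi}|Y) = q(\tilde{\phi},\phi|Y)$.[56] The posterior density at the mode can now be approximated by

$$\hat{p}(\tilde{\phi}|Y) = \frac{N^{-1}\sum_{s=1}^{N}\alpha(\phi^{(s)},\tilde{\phi}|Y)\,q(\phi^{(s)},\tilde{\phi}|Y)}{J^{-1}\sum_{j=1}^{J}\alpha(\tilde{\phi},\phi^{(j)}|Y)} = \frac{N^{-1}\sum_{s=1}^{N}q(\phi^{(s)},\tilde{\phi}|Y)}{J^{-1}\sum_{j=1}^{J}\alpha(\tilde{\phi},\phi^{(j)}|Y)}, \qquad (10.19)$$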
where $\phi^{(s)}$, $s = 1,\ldots,N$, are sampled draws from the posterior distribution with the RWM algorithm, while $\phi^{(j)}$, $j = 1,\ldots,J$, are draws from the proposal density (10.18). The second equality stems from the fact that $\alpha(\phi^{(s)},\tilde{\phi}|Y) = 1$ for all $s$ when $\tilde{\phi}$ is the posterior mode, i.e., the transition from $\phi^{(s)}$ to $\tilde{\phi}$ is always accepted by the algorithm. Hence, the Chib and Jeliazkov estimator of $p(\tilde{\phi}|Y)$ is simply the sample average of the proposal density height for the accepted draws relative to the posterior mode, divided by the sample average of the acceptance probability, evaluated at the posterior mode.
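A one-dimensional Python sketch of the second expression in (10.19) shows how little is needed once the posterior draws are available. This is a simplified illustration, not YADA's implementation. It assumes a scalar parameter and the normal proposal $N(\phi, c^2)$, and it uses iid posterior draws in the usage check instead of RWM output. The function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def cj_posterior_height(log_kernel, mode, post_draws, c_sd, rng, J=None):
    """One-dimensional sketch of the second expression in (10.19).

    The proposal is N(current, c_sd^2), so q is symmetric and alpha(phi_s, mode) = 1.
    """
    J = len(post_draws) if J is None else J  # YADA sets J = N
    # Numerator: average proposal density height q(phi_s, mode) over posterior draws.
    num = sum(normal_pdf(mode, s, c_sd) for s in post_draws) / len(post_draws)
    # Denominator: average acceptance probability alpha(mode, phi_j), phi_j ~ q(mode, .).
    lk_mode = log_kernel(mode)
    den = 0.0
    for _ in range(J):
        prop = rng.gauss(mode, c_sd)
        den += min(1.0, math.exp(log_kernel(prop) - lk_mode))
    return num / (den / J)
```

With the unnormalized kernel $\exp(3 - \phi^2/2)$, the exact posterior height at the mode is $1/\sqrt{2\pi}$ and the exact log marginal likelihood is $3 + \tfrac{1}{2}\ln 2\pi$. The usage check below compares the sketch against these values.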
The parameter J is always equal to N in YADA. In contrast with the modified harmonic mean
estimator of the marginal likelihood, the Chib and Jeliazkov estimator requires J additional
draws. When J is large and the parameter space is high dimensional, the Chib and Jeliazkov
estimator will be considerably slower to compute since the log posterior function on the right
hand side of equation (7.1) needs to be evaluated an additional J times.
The numerical standard error of the log marginal likelihood estimate based on $\hat{p}(\tilde{\phi}|Y)$ in (10.19) can be computed from the vectors

$$h^{s,j} = \begin{bmatrix} h_1^{s,j} \\ h_2^{s,j} \end{bmatrix} = \begin{bmatrix} q(\phi^{(s)},\tilde{\phi}|Y) \\ \alpha(\tilde{\phi},\phi^{(j)}|Y) \end{bmatrix}.$$

The average of $h^{s,j}$ is denoted by $\bar{h}$. This means that

$$\ln \hat{p}(\tilde{\phi}|Y) = \ln \bar{h}_1 - \ln \bar{h}_2,$$

and the numerical standard error of the log marginal likelihood estimate can be calculated from the sample variance of $h^{s,j}$ via the delta method. The sample variance of the latter can be computed using the Newey and West (1987) estimator.
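The delta-method step can be sketched by projecting each $h^{s,j}$ onto the gradient $(1/\bar{h}_1, -1/\bar{h}_2)$ of $\ln\bar{h}_1 - \ln\bar{h}_2$. A Bartlett-weighted long-run variance of this scalar series is then algebraically the same as the quadratic form of the gradient with the Newey-West covariance matrix of $h^{s,j}$. The Python function below is a hypothetical illustration. It assumes $J = N$, so that the $s$-th posterior draw is paired with the $j = s$ proposal draw, and the number of lags is left to the caller.

```python
import math


def cj_log_height_nse(h1, h2, lags):
    """ln(mean h1) - ln(mean h2) and its delta-method NSE.

    h1[s] = q(phi_s, mode | Y) and h2[s] = alpha(mode, phi_j | Y), paired with J = N.
    The long-run variance uses Bartlett (Newey-West) weights with `lags` lags.
    """
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    # Delta method: the gradient of ln(m1) - ln(m2) is (1/m1, -1/m2).
    z = [(a - m1) / m1 - (b - m2) / m2 for a, b in zip(h1, h2)]
    s = sum(v * v for v in z) / n
    for l in range(1, lags + 1):
        s += 2.0 * (1.0 - l / (lags + 1.0)) * sum(z[t] * z[t - l] for t in range(l, n)) / n
    return math.log(m1) - math.log(m2), math.sqrt(max(s, 0.0) / n)
```

Because the numerator of (10.16) is evaluated exactly at the mode, this standard error of $\ln\hat{p}(\tilde{\phi}|Y)$ is also the standard error of the log marginal likelihood estimate.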
10.4. YADA Code
Geweke's (1999) modified harmonic mean estimator of the marginal likelihood with a truncated normal density as weighting function is computed with MargLikeModifiedHarmonic. This function can also be used to estimate the marginal likelihood sequentially. The sequential estimation is quite flexible: a starting value and an increment for the sequence can be selected on the Settings tab. By default, YADA sets the starting value to 100 and the increment to 100. For a posterior sample with 10000 draws, this means that the marginal likelihood is estimated for the sample sizes 100, 200, ..., 9900, 10000. The sequential estimation sample is selected on the DSGE Posterior Sampling frame on the Settings tab; cf. Figure 5.

[56] Notice that $\alpha(\phi,\tilde{\phi}|Y) = \min\{1,\, r(\phi,\tilde{\phi}|Y)\,q(\tilde{\phi},\phi|Y)/q(\phi,\tilde{\phi}|Y)\}$ in Chib and Jeliazkov (2001); see, e.g., Section 2.1, above equation (7).
Similarly, the modified harmonic mean estimator of the log marginal likelihood with a truncated elliptical density as weighting function suggested by Sims et al. (2008) is computed with
the function MargLikeSWZModifiedHarmonic. It can also perform a sequential estimation of the
log marginal likelihood.
The function MargLikeChibJeliazkov calculates the Chib and Jeliazkov (2001) estimator of
the marginal likelihood as well as its numerical standard error. Like the modified harmonic
mean estimator function, MargLikeModifiedHarmonic, the calculations can be performed sequentially using the same sample grid.
The Laplace approximation is calculated by MargLikeLaplace. It is only run towards the
end of the posterior mode estimation routine and should only be viewed as a quick first order
approximation when comparing models.
10.4.1. MargLikeLaplace
The function MargLikeLaplace takes 3 inputs. First, it requires the value of the log posterior at the mode of $\phi$, LogPost; second, minus the Hessian at the mode, or its inverse, Hessian; and third, IsInverse, a boolean variable that takes the value 1 if Hessian is the inverse Hessian and 0 otherwise. Based on equation (10.1) the log marginal likelihood is calculated and provided as the output LogMarg.
10.4.2. MargLikeModifiedHarmonic
The function MargLikeModifiedHarmonic takes 5 inputs. First of all, PostSample, an $N \times m$ matrix with $N$ being the number of posterior draws that are used. This value is often smaller than the total number of posterior draws that have been computed, either because burn-in draws are skipped or because the log marginal likelihood is computed from a subsample of the available posterior draws (or both). Next, the values of the log posterior kernel are needed. These are assumed to be given by the $N$ dimensional vector LogPost. Third, the function accepts a boolean variable ComputeSequential that is 1 if the marginal likelihood should be estimated sequentially and 0 otherwise. Fourth, a vector with coverage probabilities is needed. This vector, denoted by CovProb, can be empty or contain numbers between 0 and 1. Finally, the function requires the structure DSGEModel with model related information. This structure should contain the fields SequentialStartIterationValue and SequentialStepLengthValue, where the former gives the starting value of the sequential estimates and the latter gives the increment. In case CovProb is empty, the structure should also include the fields CovStartValue, CovIncValue, and CovEndValue. These fields determine the starting probability value, the increment, and the upper bound of the coverage probability $p$ in (10.4); cf. the DSGE Posterior Sampling frame on the Settings tab in Figure 5.
The output of MargLikeModifiedHarmonic is given by the variables LogMargs, CovProb, and NWStdErr. The first output variable is a matrix whose number of rows equals the number of successful computations of the marginal likelihood for the given coverage probabilities and whose number of columns equals the number of coverage probabilities plus 1. The first column of this matrix contains the number of draws used for the computations, while the remaining columns contain the marginal likelihood values for each given coverage probability. The second output argument is simply the vector of coverage probabilities that was used by the function. Finally, the third output variable is a vector with the numerical standard errors of the estimated log marginal likelihood for each coverage probability at the full sample estimates using the Newey and West (1987) estimator; see equation (8.2).
10.4.3. MargLikeSWZModifiedHarmonic
The function MargLikeSWZModifiedHarmonic needs nine input variables to complete its mission. These are: PostSample, LogPost, TotalDraws, PValue, ComputeSequential, ModeData, lambda, DSGEModel, and CurrINI. Four of these variables are identical to the inputs with the same names for MargLikeModifiedHarmonic. The integer TotalDraws is the number of parameter draws that could be used and is often greater than $N$; see the explanation in Section 10.4.2. The integer PValue lies between 0 and 100 and is the percent of log posterior kernel values that are greater than a lower bound that is defined through its value. The structure ModeData has fields that contain data from the posterior mode estimation of the DSGE or DSGE-VAR model. The scalar lambda is the DSGE-VAR hyperparameter (see Section 15) and is empty for DSGE models. The structure CurrINI is discussed above.
The results from the calculations are provided in the four output variables LogMargs, qL,
LBound, and NWStdErr. The first is a matrix with rows equal to the number of successful computations of the log marginal likelihood and two columns. The first column holds the number of
parameter draws used, and the second the estimated log marginal likelihood at that number of
parameter draws. The scalar qL is the fraction of iid draws from the elliptical distribution such
that the log posterior kernel values at these draws are greater than LBound, the lower bound
for the log posterior kernel implied by PValue when all the N posterior draws are used. Finally,
the variable NWStdErr is the numerical standard error, based on the Newey and West (1987)
estimator, of the estimated log marginal likelihood value for all the N parameter draws.
10.4.4. MargLikeChibJeliazkov
The function MargLikeChibJeliazkov needs 17 input arguments. The first two are the matrix
PostSample and the vector LogPost that are also used by MargLikeModifiedHarmonic. Next, the posterior mode $\tilde{\phi}$ and the inverse Hessian at the posterior mode $\tilde{\Sigma}$ are needed, along with the scale factor $c$ that is used by the proposal density. These inputs are denoted by phiMode,
SigmaMode, and c, respectively. Furthermore, the function takes the inputs logPostMode (the
value of the log posterior at the mode), NumBurnin (number of burn-in draws), and the boolean
variable ComputeSequential that is also used by MargLikeModifiedHarmonic.
The remaining 9 input arguments are the last 9 inputs for the logPosteriorPhiDSGE function, used to compute the $\alpha(\tilde{\phi},\phi^{(j)}|Y)$ term in the denominator of equation (10.19). The function MargLikeChibJeliazkov always sets $J = N$ in (10.19).
The output matrix is given by LogMargs. The first column gives the number of draws used for
estimating the marginal likelihood, the second column the estimated value of the log marginal
likelihood, while the numerical standard error of the log marginal likelihood is provided in the
third column. The standard error is computed using the Newey and West (1987) estimator, with $\bar{N} = N^{1/2.01}$.
T
t|T z 0 t|T 1|0 ,
t 1, . . . , T.
(11.5)
1
(11.6)
(11.7)
H
YADA actually checks this by trying to find columns of zeros in $B_0$. Such columns may appear when a certain shock is forced to have no effect on the variables in the AiM model file, but the shock has not been deselected in YADA.
according to
t|t
t
t|t z 0 t|t 1|0 ,
t 1, . . . , T.
(11.8)
1
H
if 1, . . . , t 1,
B0 H1
t|t1
y,t|t1
t|t
B H1
if t.
0
y,t|t1
The weights on the initial condition are now given by
H 0 t|t1 .
0 t|t B0 H1
y,t|t1
(11.9)
(11.10)
0
Ir 0 rt|T Jr rt|T .
rt|T
Given the relationship between the economic shocks and the state shocks in (11.1), we therefore find that smoothed estimates of the economic shocks for the initialization sample are
where
t t|t1 , if 1, . . . , t 1,
G
rt|t
if t,
Ft ,
while the r r matrices with weights on the initial state are
t 0 t|t1 .
0 rt|T G
The updated estimates of the economic shocks over the initialization sample may now be computed from the updated innovations $r_{t|t}$ according to
t|t B0 Jr rt|t ,
t 1, . . . , d.
It therefore follows that equation (11.8) is also valid for the initialization sample, but where the
q nt weighting matrices are now
t|t B0 Jr rt|t , 1, . . . , t,
(11.13)
while the q r matrices with weights on the initial state are
0 t|T B0 Jr 0 rt|t .
11.1.3. Simulation Smoother
Since the dimension of $\eta_t$ is typically lower than the dimension of $v_t$, i.e., $q < r$, a more efficient version of the simulation smoother from Section 5.10 would take this into account. This can directly be achieved by replacing $v_t$ by $\eta_t$ in the definition of the stacked disturbance vector and by letting its covariance matrix be $I_T \otimes \mathrm{diag}[R, I_q]$.

Furthermore, the covariance matrix $R$ will typically have reduced rank. We may therefore apply the decomposition $R = S\Lambda S'$, where $S$ is $n \times n_R$, $\mathrm{rank}(R) = n_R \leq n$, $\Lambda = \mathrm{diag}[\lambda_1,\ldots,\lambda_{n_R}]$ holds the non-zero eigenvalues of $R$, ordered from the largest to the smallest, while $S'S = I_{n_R}$. We then let $w_t = R^{1/2}\varsigma_t$, with $R^{1/2} = S\Lambda^{1/2}$, where $\varsigma_t$ has covariance matrix $I_{n_R}$. Finally, we replace $w_t$ with $\varsigma_t$ in the definition of the stacked disturbance vector, with the consequence that its covariance matrix becomes $I_{T(n_R+q)}$.
It should be emphasized that if $\eta_t$ (or $w_t$) is observable at $t$, such that the right hand side of (11.2) (or (5.35)) is zero, then the covariance matrix of the simulation smoother for the economic shocks is zero. This may happen when $q + n_R = n$. Assuming that $n_R = 0$, this can easily be checked by inspecting if $B_0'N_{T|T}B_0 = I_q$. In addition, if a diagonal element of the
covariance matrix in (11.2) is zero, then the corresponding economic shock can be observed
from the data and the parameter values; a similar conclusion about the measurement errors
can be drawn by inspecting the right hand side of (5.35).
To improve the efficiency of the simulation smoother we may use antithetic variables; see Section 5.10. Given our redefinition of the stacked disturbance vector as a vector of dimension $T(n_R+q)$ of $N(0,1)$ random variables, we can now provide some additional antithetic variables that allow the simulation sample to be balanced for scale. Recall first that
i E |T
i E |Ti
,
where i is a draw from N0, , while the antithetic variable
i
i E |T
d/c
while a fourth antithetic is
i E |T
i
d/c
E |T
.
structural, economic shocks, but also on measurement errors and prediction errors of the unobserved state variables. Since the prediction errors of these variables depend on the economic
shocks and on the measurement errors, one cannot really speak about a unique decomposition
since one may further decompose the variable prediction errors.
Let $\varepsilon_{t+h} = y_{t+h} - y_{t+h|t}$ denote the $h$-step ahead forecast error of the observed variables when we condition on the parameters. Furthermore, notice that $y_{t+h} = y_{t+h|T}$ provided that $t+h \leq T$. From the measurement error estimation equation (5.34) and the multistep forecasting equation (5.37) we thus have that for any $h$ such that $t+h \leq T$:

$$\varepsilon_{t+h} = H'\left(\xi_{t+h|T} - \xi_{t+h|t}\right) + w_{t+h|T}, \qquad t = 1,\ldots,T-h. \qquad (11.14)$$
By making use of equation (5.40) we can rewrite the difference between the smoothed and the $h$-step forecasted state vector on the right hand side of (11.14) as

$$\xi_{t+h|T} - \xi_{t+h|t} = F^h\left(\xi_{t|T} - \xi_{t|t}\right) + \sum_{i=0}^{h-1}F^i v_{t+h-i|T}.$$

Substituting this expression into (11.14) and noticing that $v_{t+h-i|T} = B_0\eta_{t+h-i|T}$, we obtain the following candidate of a historical forecast error decomposition

$$\varepsilon_{t+h} = H'F^h\left(\xi_{t|T} - \xi_{t|t}\right) + H'\sum_{i=0}^{h-1}F^iB_0\eta_{t+h-i|T} + w_{t+h|T}, \qquad t = 1,\ldots,T-h. \qquad (11.15)$$
Unless the state vector can be uniquely recovered from the observed variables,[58] the first term on the right hand side is non-zero. It measures the improvement in the projection of the state vector when the full sample is observed relative to the partial sample. As the forecast horizon $h$ increases, this term converges towards zero when all eigenvalues of $F$ lie inside the unit circle. It may be argued that the choice of time period $t$ for the state vector is here somewhat arbitrary. We could, in principle, decompose this term further until we reach period 1. However, the choice of state variable period $t$ for the forecast error is reasonable since this is the point in time when the forecasts are made. Moreover, shifting it back to, say, period 1 would mean that the historical forecast error decomposition would include estimates of the economic shocks that are based not only on the full sample, but also on the period $t$ information set. Furthermore, the timing of those shocks would be for time periods $2,\ldots,t$, i.e., prior to period $t+1$, the first time period of the forecast period.
Apart from the timing of the terms, the decomposition in (11.15) also has the advantage that
the dynamics of the state variables enter the second moving average term that involves only
the economic shocks, while the measurement errors in the third term do not display any serial
correlation. We may regard this as a model consistent decomposition in the sense that only
the state variables display dynamics and the measurement errors are independent of the state
variables. Hence, serial correlation in the forecast errors should stem from the shocks that affect
the dynamics, i.e., the economic shocks.
11.3. Impulse Response Functions
The responses of the observed variables $y_t$ from shocks to $\eta_t$ can easily be calculated through the state-space representation and the relationship between the state shocks and the economic shocks. Suppose that $\eta_t = e_j$ and zero thereafter, with $e_j$ being the $j$:th column of $I_q$. Hence, we consider the case of a one standard deviation impulse for the $j$:th economic shock. From the state equation (5.2) the responses in $\xi_{t+h}$ for $h \geq 0$ are:

$$\mathrm{resp}\left(\xi_{t+h}\,|\,\eta_t = e_j\right) = F^hB_0e_j, \qquad h \geq 0. \qquad (11.16)$$
If the model is stationary, then the responses in the state variables tend to zero as h increases.
[58] This would be the case if the state vector could be expressed as a linear function of the observed variables (and the deterministic variables) by inverting the measurement equation.
From the measurement equation (5.1) we can immediately determine the responses in the observed variables from changes to the state variables. These changes are here given by equation (11.16) and, hence, the responses of the observed variables are:

$$\mathrm{resp}\left(y_{t+h}\,|\,\eta_t = e_j\right) = H'F^hB_0e_j, \qquad h \geq 0. \qquad (11.17)$$
Again, the assumption that the state variables are stationary implies that the responses of the
observed variables tend to zero as the response horizon h increases.
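Equations (11.16) and (11.17) amount to repeated multiplication by $F$. The following pure-Python sketch illustrates this under the conventions used above: $F$ is $r \times r$, $B_0$ is $r \times q$, and $H$ is $r \times n$ so that the measurement equation uses $H'$. The function names are hypothetical, and matrices are stored as lists of rows.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def impulse_responses(F, B0, H, j, horizon):
    """(11.16)-(11.17) for h = 0, ..., horizon: F^h B0 e_j and H' F^h B0 e_j."""
    x = [[row[j]] for row in B0]         # B0 e_j as an r x 1 column
    Ht = [list(col) for col in zip(*H)]  # H' (n x r)
    state, obs = [], []
    for _ in range(horizon + 1):
        state.append([v[0] for v in x])
        obs.append([v[0] for v in matmul(Ht, x)])
        x = matmul(F, x)                 # one period ahead: F^(h+1) B0 e_j
    return state, obs
```

For a scalar state with $F = 0.5$, $B_0 = 1$, and $H = 2$, the observed responses are $2 \times 0.5^h$, i.e., 2, 1, 0.5, 0.25 for $h = 0, \ldots, 3$, and they die out because $F$ is stable.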
11.4. Forecast Error Variance Decompositions
The conditional forecast error covariance matrix for the h-step ahead forecast of the observed
vector y is given in equation (5.39). It can be seen from this equation that this covariance
matrix is time-varying. Although this is of interest when we wish to analyse the forecast errors
at a particular point in time, the time-variation that the state-space model has introduced is
somewhat artificial since it depends only on the choice of t 1. For this reason we may wish
to consider an unconditional forecast error covariance matrix, where the value of (5.39) no
longer depends on the chosen initialization period.
Assume that a unique asymptote of the forecast error covariance matrix $P_{t+h|t}$ exists and let it be denoted by $P_h$ for $h = 0,1,\ldots$. By equation (5.38) it follows that

$$P_h = FP_{h-1}F' + Q, \qquad h \geq 1. \qquad (11.18)$$

Similarly, from the expression for $P_{t|t}$ below equation (5.20) we have that

$$P_0 = P_1 - P_1H\left(H'P_1H + R\right)^{-1}H'P_1. \qquad (11.19)$$
Let $h = 1$ in equation (11.18) and substitute for $P_0$ from (11.19). We then find that the asymptote $P_1$ must satisfy

$$P_1 = FP_1F' - FP_1H\left(H'P_1H + R\right)^{-1}H'P_1F' + Q. \qquad (11.20)$$

Given that we can solve for a unique asymptote $P_1$, all other $P_h$ matrices can be calculated using (11.18) and (11.19).
The assumptions that (i) $F$ has all eigenvalues inside the unit circle, and (ii) $Q$ and $R$ are positive semidefinite, are sufficient for the existence of an asymptote, $P_1$, that satisfies (11.20); see, e.g., Proposition 13.1 in Hamilton (1994). Let the asymptote for the Kalman gain matrix in (5.9) be denoted by $K$, where

$$K = FP_1H\left(H'P_1H + R\right)^{-1}.$$

The assumptions (i) and (ii) also imply that all the eigenvalues of $F - KH'$ lie on or inside the unit circle.
In fact, if we replace (ii) with the stronger assumption that either $Q$ or $R$ is positive definite, then the asymptote $P_1$ is also unique; see Proposition 13.2 in Hamilton (1994). In the case of DSGE models this stronger assumption is often not satisfied, since the number of economic shocks tends to be lower than the number of state variables ($Q$ singular) and not all observed variables are measured with error ($R$ singular). Nevertheless, from the proof of Proposition 13.2 in Hamilton it can be seen that the stronger assumption about $Q$ and $R$ can be replaced with the assumption that all the eigenvalues of $F - KH'$ lie inside the unit circle. From a practical perspective this eigenvalue condition can easily be checked once an asymptote $P_1$ has been found; see also Harvey (1989, Chapter 3.3).
The expression in (11.20) is a discrete algebraic Riccati equation and we can therefore try to
solve for P1 using well known tools from control theory. The matrix Q is typically singular for
DSGE models since there are usually fewer economic shocks than state variables. Moreover,
the matrix $R$ is not required to be of full rank. For these reasons, YADA cannot directly make use of the function dare from the Control System Toolbox in Matlab or the procedures discussed by Anderson, Hansen, McGrattan, and Sargent (1996). Instead, YADA uses a combination of iterations (with $\Sigma_\xi$ as an initial value) and eigenvalue decompositions, where a solution to
(11.20) is attempted in each iteration using the dare function for a reduction of P1 . The details
on this algorithm are presented in Section 11.5.
Prior to making use of such a potentially time consuming algorithm it makes sense to first consider a very simple test. From equation (11.18) it can be seen that if $P_0 = 0$ then $P_1 = Q$. This test can be performed very quickly and, when successful, saves considerable computing time. Moreover, $P_0 = 0$ has the natural interpretation that all the time $t$ state variables and measurement errors can be observed from the information available at $t$, i.e., $\xi_t = \xi_{t|t}$ for all $t$ so that $w_t = w_{t|t}$.
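The quick test and a plain solution of (11.20) can be sketched in Python as follows. This is not the algorithm YADA uses. YADA starts from $\Sigma_\xi$ and combines iterations with eigenvalue decompositions and dare, whereas this sketch starts from $Q$ and applies the fixed-point map $P_1 \leftarrow FP_0(P_1)F' + Q$ implied by (11.18)-(11.19). It also assumes that $H'P_1H + R$ is invertible. All function names are hypothetical.

```python
def _mm(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _t(a):
    return [list(col) for col in zip(*a)]


def _add(a, b, scale=1.0):
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _inv(a):
    """Gauss-Jordan inverse with partial pivoting (assumes a is non-singular)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def p0_from_p1(P1, H, R):
    """(11.19): P0 = P1 - P1 H (H' P1 H + R)^{-1} H' P1."""
    PH = _mm(P1, H)
    S = _add(_mm(_t(H), PH), R)
    return _add(P1, _mm(_mm(PH, _inv(S)), _t(PH)), -1.0)


def riccati_asymptote(F, H, Q, R, tol=1e-12, max_iter=10000):
    """Solve (11.20) by iterating P1 <- F P0(P1) F' + Q, starting from Q."""
    P1 = [list(row) for row in Q]
    # Quick test: if P0 = 0 when P1 = Q, then Q already solves (11.20).
    if max(abs(v) for row in p0_from_p1(P1, H, R) for v in row) < tol:
        return P1
    for _ in range(max_iter):
        new = _add(_mm(_mm(F, p0_from_p1(P1, H, R)), _t(F)), Q)
        diff = max(abs(x - y) for rn, ro in zip(new, P1) for x, y in zip(rn, ro))
        P1 = new
        if diff < tol:
            return P1
    raise RuntimeError("Riccati iteration did not converge")


def forecast_error_covariances(F, H, Q, R, horizon):
    """H' P_h H + R for h = 1, ..., horizon, using (11.18) and (11.21)."""
    P = riccati_asymptote(F, H, Q, R)
    out = []
    for _ in range(horizon):
        out.append(_add(_mm(_mm(_t(H), P), H), R))
        P = _add(_mm(_mm(F, P), _t(F)), Q)
    return out
```

In the scalar case $F = 0.9$, $H = Q = R = 1$, equation (11.20) reduces to $P_1^2 - 0.81P_1 - 1 = 0$, whose positive root the iteration reaches. With $R = 0$ instead, the quick test succeeds immediately and returns $Q$.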
The asymptotic forecast error covariance matrix for $y_{t+h}$ is now given by:

$$\Sigma_{y,h} = H'P_hH + R, \qquad h \geq 1. \qquad (11.21)$$

Under the assumption of linearity, it is relatively straightforward to calculate forecast error variance decompositions for $y$ based on (11.21). The total $h$-step ahead forecast error variances are simply the diagonal elements of this matrix. To compute the variance due to all the measurement errors, we let $Q = 0$ and, therefore, $P_h = 0$ for all $h \geq 0$. The forecast error covariance matrix of $y_{t+h}$ is now $R$ for all $h$ when the economic shocks are shut down. The share of the forecast error variance due to each one of the measurement errors is now the inverse of the diagonal of (11.21) times the diagonal of $R$.
Similarly, to compute the variance due to economic shock $j$ we may first consider letting $R = 0$ and $Q = B_{0j}B_{0j}'$, where $B_{0j}$ is the $j$:th column of $B_0$. Given these new error covariance matrices we first check if $P_0 = 0$ when we let $P_1 = Q$. If the condition is not satisfied we attempt to solve for $P_1$ using (11.20). Since the $n \times n$ matrix $H'P_1H$ is singular when $\mathrm{rank}(Q) < n$, the inverse of this matrix is replaced by $S(S'H'P_1HS)^{-1}S'$, where the $n \times s$ matrix $S$ is computed from an eigenvalue decomposition of $H'P_1H$.[59] Furthermore, $\Sigma_\xi$ is no longer the best initial value. Instead we may calculate a new $\Sigma_\xi$ using the new $Q$, i.e., $B_{0j}B_{0j}'$.
If the Riccati equation has a unique asymptote, $P_1$, under the new $Q$, we can compute the new $P_h$ matrices. Next, the shares of the forecast error variance of the observed variables at horizon $h$ for economic shock $j$ can be determined by premultiplying the vector of diagonal elements of the newly computed $H'P_hH$ matrix with the inverse of the diagonal elements of $\Sigma_{y,h}$.
When the model has measurement errors, the sum of the share due to the measurement errors and the share due to the economic shocks is less than one for finite $h$ when the impact of the economic shocks is computed with $R = 0$. To see this, let $P_h^R$ denote the forecast error covariance matrix of the state variables for any positive semidefinite $R$, while $P_h^0$ denotes this covariance matrix when $R = 0$. The right hand side of (11.21) can be written as

$$H'P_h^RH + R = H'P_h^0H + R + H'\left(P_h^R - P_h^0\right)H. \qquad (11.22)$$

The first term on the right hand side of (11.22) is the share explained by the economic shocks, while the second term is the share explained by the measurement errors. The share of the forecast error variances of the observed variables that cannot directly be attributed to these two sources of uncertainty is given by the third term. Unless $P_0^R = 0$ so that $P_1^R = Q = P_1^0$, this third term is generally not zero.
In fact, this error term appears because of the way we have computed the impact of the
economic shocks on the forecast error variance, i.e., by only allowing for one economic shock
at a time. To see why, notice that an alternative way of computing the effect of shock $j$ on
the forecast error variance of the observed variables would be to compute the forecast error
variance when shock j is set to zero, and subtract the resulting matrix from the forecast error
covariance when all shocks and measurement errors are taken into account. For this latter case,
the matrix R is always included in the computations of P1 and the sum of all the terms due to
the q economic shocks plus R yields the forecast error covariance matrix on the left hand side
of (11.22).
The problem with $P_h^R - P_h^0$ can therefore be avoided by computing the forecast error covariances for the individual shocks in a different way. It should also be noted that the error term problem on the right hand side of (11.22) can be avoided by simply including $R$ when solving for $P_1$ in (11.20) for each individual economic shock, and this is the approach taken in YADA.

[59] Let $\Lambda$ be an $s \times s$ diagonal matrix whose diagonal elements are the non-zero eigenvalues of $H'P_1H$. The eigenvalue decomposition in YADA is then $S\Lambda S' = H'P_1H$, where $S'S = I_s$. The inverse of $H'P_1H$ is therefore replaced with $S\Lambda^{-1}S'$. Moreover, since $\mathrm{rank}(Q) = 1$ here we expect that $s = 1$.
However, there is a more fundamental problem with using a linear decomposition that we have not discussed yet. Namely, the forecast error covariance matrix in (11.21) is a non-linear function of the economic shocks and the measurement errors, and any decomposition based on the assumption of linearity can therefore only be regarded as an approximation. The approximation error is, in my experience, smaller when computing the decompositions by setting $Q = B_{0j}B_{0j}'$ and including a non-zero $R$ in the Riccati equation solver for the individual economic shocks, rather than the alternative of subtracting the forecast error covariance matrix for all shocks but $j$ and the measurement error covariance from $\Sigma_{y,h}$. Moreover, this approach seems to be faster, presumably because $P_1 = Q$ is more often a good initial value. Nevertheless, the assumption of linearity is an approximation and the error may be noticeable, especially in models where the number of shocks and measurement errors is greater than the number of observed variables.
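The $h$-step ahead forecast error for the observed variables when the forecasts are performed based on the observations in period $T$ can be expressed as:

$$\varepsilon_{T+h} = H'\left(\xi_{T+h} - \xi_{T+h|T}\right) + w_{T+h}, \qquad h = 1,2,\ldots. \qquad (11.23)$$

We can also derive an alternative asymptotic forecast error variance decomposition from (11.23). The forecast error of the state variables can, as we have already seen in, for example, equations (5.36) and (5.40), be expressed as

$$\xi_{T+h} - \xi_{T+h|T} = F^h\left(\xi_T - \xi_{T|T}\right) + \sum_{i=0}^{h-1}F^iB_0\eta_{T+h-i},$$

so that the asymptotic forecast error covariance matrix of the observed variables can be written as

$$\Sigma_{y,h} = H'F^hP_0F^{h\prime}H + \sum_{i=0}^{h-1}H'F^iB_0B_0'F^{i\prime}H + R. \qquad (11.24)$$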
This decomposition can also be derived by recursively substituting for Ph based on equation
(11.18) into equation (11.21). Although the decomposition in (11.24) requires P0 to be determined, we may opt to leave this term otherwise undetermined, i.e., the P0 matrix is not
decomposed into the shares of the individual economic shocks. For the impact of the individual
shocks on the forecast error covariances we may focus on the second term, while the third term
gives us the share of the measurement errors. This has the benefits that the Riccati equation solver would at most have to be used once (per value of $\theta$), thereby saving plenty of time, and that a linear decomposition is indeed exact.
However, YADA does not directly use this alternative decomposition. It is clearly an attractive decomposition since it is linear in terms of state variable uncertainty, economic shock uncertainty, and measurement error uncertainty. The main reason for instead using the linear approximation of the non-linear case with only shock and measurement error uncertainty is simply that YADA can also compute the so-called conditional forecast error variance decomposition. This latter decomposition focuses on the second term on the right hand side of (11.24), i.e., it only covers the effects on the forecast errors due to the economic shocks over the forecast horizon. Accordingly, the most interesting aspect of the decomposition in (11.24) is already covered. Conditional forecast error variance decompositions are discussed in Section 11.6.
The long-run forecast error variance decomposition can also be calculated using the above relations. First, we note that if the unique asymptote $P_\infty$ exists, then all $P_h$ exist and are unique. Moreover, $\lim_{h\to\infty} P_h = \Sigma_\xi$, so that the long-run forecast error covariance of the observed variables is simply the contemporaneous covariance matrix $\Sigma_y = H'\Sigma_\xi H + R$. The long-run forecast error variance decomposition can now be calculated, noting first that $\Sigma_\xi$ is obtained by solving the Lyapunov equation (5.15). The covariance matrix due to measurement errors is $R$, while the covariance matrix due to economic shock $j$ is given by $H'\Sigma_\xi^{(j)}H$, where $\Sigma_\xi^{(j)}$ is the solution to the Lyapunov equation:

$$
\Sigma_\xi^{(j)} = F\Sigma_\xi^{(j)}F' + B_{0j}B_{0j}', \quad j = 1, \ldots, q, \tag{11.25}
$$

and $B_{0j}$ denotes column $j$ of $B_0$. Summing these $q$ shock terms, applying the $H$ matrix, and adding the measurement error covariance matrix we obtain the asymptote of the contemporaneous covariance matrix of the observed variables. In this case, a linear decomposition can be performed without an approximation error since $\Sigma_\xi = \sum_{j=1}^{q}\Sigma_\xi^{(j)}$.
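A sketch of the long-run decomposition, solving each Lyapunov equation (11.25) through the vec operator, $\operatorname{vec}(S) = (I - F \otimes F)^{-1}\operatorname{vec}(Q)$. The helper names and the example matrices are hypothetical, and a stable $F$ is assumed so that the linear system is non-singular.

```python
import numpy as np

def lyapunov_vec(F, Q):
    """Solve S = F S F' + Q using vec(S) = (I - F kron F)^{-1} vec(Q)."""
    r = F.shape[0]
    vec_q = Q.reshape(-1, order="F")                 # column-major vec
    vec_s = np.linalg.solve(np.eye(r * r) - np.kron(F, F), vec_q)
    return vec_s.reshape((r, r), order="F")

def long_run_fevd(H, F, B0, R):
    """Per-shock parts H' Sigma_xi^(j) H, the total Sigma_y, and variance shares."""
    parts = []
    for j in range(B0.shape[1]):
        b = B0[:, [j]]
        parts.append(H.T @ lyapunov_vec(F, b @ b.T) @ H)
    Sigma_y = H.T @ lyapunov_vec(F, B0 @ B0.T) @ H + R
    shares = np.column_stack([np.diag(p) for p in parts]) / np.diag(Sigma_y)[:, None]
    meas_share = np.diag(R) / np.diag(Sigma_y)
    return parts, Sigma_y, shares, meas_share

# Hypothetical stable system (eigenvalues of F are 0.7, 0.5 and -0.3).
F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])
R = np.diag([0.1, 0.05])

parts, Sigma_y, shares, meas_share = long_run_fevd(H, F, B0, R)
```

Row $i$ of `shares` together with `meas_share[i]` gives the long-run decomposition for observed variable $i$; no approximation enters because the per-shock Lyapunov solutions add up to $\Sigma_\xi$.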
If we assume that all observed variables are expressed as first differences, the levels are given by the accumulation of the first differences and an initial value. This means that the forecast error of the levels is the accumulation of the forecast errors for the observed variables. Letting $\bar{\varepsilon}_{T+h}$ denote the $h$-step ahead forecast error of the levels variables, it is given by

$$
\bar{\varepsilon}_{T+h} = \sum_{j=1}^{h}\varepsilon_{T+j} = \sum_{j=1}^{h} H'\left(\xi_{T+j} - \xi_{T+j|T}\right) + \sum_{j=1}^{h} w_{T+j}. \tag{11.26}
$$

The covariance matrix of the levels $h$-step ahead forecast error is denoted by $\bar{\Sigma}_h$. For $h = 1$ we find that $\bar{\Sigma}_1 = \Sigma_1$, while a bit of algebra reveals that

$$
\bar{\Sigma}_h = \bar{\Sigma}_{h-1} + \Sigma_h + \sum_{i=1}^{h-1} H'\left(F^i P_{h-i} + P_{h-i}(F')^i\right)H, \tag{11.27}
$$

where the third term on the right hand side is due to the covariance between the state variable errors in period $T+h$ and periods $T+1$ until $T+h-1$, i.e., the covariance between $\varepsilon_{T+h}$ and $\bar{\varepsilon}_{T+h-1}$.
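The recursion (11.27) can be compared with a brute-force sum of all cross covariances of the first-difference forecast errors. As before, the sketch assumes $P_i = FP_{i-1}F' + B_0B_0'$ and uses hypothetical matrices and helper names.

```python
import numpy as np

def state_fe_covs(F, B0, P0, h):
    """[P_0, P_1, ..., P_h] from P_i = F P_{i-1} F' + B0 B0' (assumed recursion)."""
    P = [P0]
    for _ in range(h):
        P.append(F @ P[-1] @ F.T + B0 @ B0.T)
    return P

def levels_cov_recursive(H, F, B0, R, P0, h):
    """Sigma_bar_h from (11.27), started at Sigma_bar_1 = Sigma_1."""
    P = state_fe_covs(F, B0, P0, h)

    def sigma(k):                                   # Sigma_k = H' P_k H + R
        return H.T @ P[k] @ H + R

    S_bar = sigma(1)
    for k in range(2, h + 1):
        cross = np.zeros_like(S_bar)
        for i in range(1, k):
            Fi = np.linalg.matrix_power(F, i)
            cross += H.T @ (Fi @ P[k - i] + P[k - i] @ Fi.T) @ H
        S_bar = S_bar + sigma(k) + cross
    return S_bar

def levels_cov_direct(H, F, B0, R, P0, h):
    """Sum of Cov(eps_{T+j}, eps_{T+k}) over j, k = 1, ..., h."""
    P = state_fe_covs(F, B0, P0, h)
    n = H.shape[1]
    total = np.zeros((n, n))
    for j in range(1, h + 1):
        for k in range(1, h + 1):
            lo, hi = min(j, k), max(j, k)
            # Cov(eps_{T+hi}, eps_{T+lo}) = H' F^(hi-lo) P_lo H, plus R when hi == lo
            C = H.T @ np.linalg.matrix_power(F, hi - lo) @ P[lo] @ H
            if j == k:
                C = C + R
            total += C if j >= k else C.T
    return total

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])
R = np.diag([0.1, 0.05])
P0 = 0.2 * np.eye(3)

S_rec = levels_cov_recursive(H, F, B0, R, P0, 5)
S_dir = levels_cov_direct(H, F, B0, R, P0, 5)
```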
11.5. The Riccati Equation Solver Algorithm in YADA
The algorithm used by YADA to (try to) solve the Riccati equation (11.20) for the forecast error variances uses a combination of iterative and non-iterative techniques. Let the Riccati equation be given by

$$
P = FPF' - FPH\left(H'PH + R\right)^{-1}H'PF' + Q, \tag{11.28}
$$

where $Q$ and $R$ are positive semidefinite and $P$ is used instead of $P_\infty$. I will discuss the main ingredients of the algorithm below, where each iteration follows the same steps. It is assumed that $P = Q$ has already been tested and rejected.

First, a positive semidefinite value of $P$ is required. This value is used to evaluate the right hand side of (11.28), yielding a new value for $P$ that we shall explore.60 Since the new value of $P$ may have reduced rank, we first use an eigenvalue decomposition such that for the $r \times r$ positive semidefinite matrix $P$

$$
P = N\Lambda N',
$$

where $N$ is $r \times s$ such that $N'N = I_s$, while $\Lambda$ is an $s \times s$ diagonal matrix with the non-zero eigenvalues of $P$. Substituting this expression for $P$ into (11.28), premultiplying both sides by $N'$ and postmultiplying by $N$, we obtain the new Riccati equation

$$
\Lambda = A\Lambda A' - A\Lambda B\left(B'\Lambda B + R\right)^{-1}B'\Lambda A' + C, \tag{11.29}
$$

where $A = N'FN$, $B = N'H$, and $C = N'QN$.

60 In the event that $H'PH + R$ has reduced rank, its inverse is replaced by $S\left(S'\left[H'PH + R\right]S\right)^{-1}S'$, where $S$ is obtained from the eigenvalue decomposition $H'PH + R = S\Gamma S'$.
Next, if the matrix $B'\Lambda B + R$ in (11.29) has reduced rank, YADA performs a second eigenvalue decomposition. Specifically,

$$
B'\Lambda B + R = D\Delta D',
$$

where $D$ is $n \times d$ such that $D'D = I_d$, while $\Delta$ is a $d \times d$ diagonal matrix with the non-zero eigenvalues of $B'\Lambda B + R$. Replacing the inverse of this matrix with $D(\tilde{B}'\Lambda\tilde{B} + \tilde{R})^{-1}D'$, with $\tilde{B} = BD$ and $\tilde{R} = D'RD$, the Riccati equation (11.29) can be rewritten as

$$
\Lambda = A\Lambda A' - A\Lambda\tilde{B}\left(\tilde{B}'\Lambda\tilde{B} + \tilde{R}\right)^{-1}\tilde{B}'\Lambda A' + C. \tag{11.30}
$$

When the matrix $B'\Lambda B + R$ in (11.29) has full rank, YADA sets $D = I_n$.

YADA now tries to solve for $\Lambda$ in (11.30) using dare from the Control System Toolbox; see Arnold and Laub (1984) for details on the algorithm used by dare. If dare flags that a unique solution to this Riccati equation exists, $\Lambda^*$, then YADA lets $P = N\Lambda^*N'$. When the call to dare does not yield a unique solution, YADA instead compares the current $P$ to the previous $P$. If the difference is sufficiently small it lets the current $P$ be the solution. Otherwise, YADA uses the current $P$ as input for the next iteration.
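For intuition, the sketch below solves (11.28) with a plain fixed-point iteration started from $P = Q$. It is a swapped-in technique, not YADA's algorithm: the eigenvalue reductions and the call to dare are omitted. It uses `np.linalg.pinv` for the inverse of $H'PH + R$; for a symmetric positive semidefinite matrix the Moore-Penrose inverse coincides with the replacement described in footnote 60. Convergence is only expected under standard conditions, such as the stable $F$ of the hypothetical example.

```python
import numpy as np

def riccati_rhs(P, F, H, Q, R):
    """Right hand side of (11.28)."""
    G = H.T @ P @ H + R
    return F @ P @ F.T - F @ P @ H @ np.linalg.pinv(G) @ H.T @ P @ F.T + Q

def riccati_fixed_point(F, H, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate P <- rhs(P) from P = Q until successive values are close."""
    P = Q.copy()
    for _ in range(max_iter):
        P_new = riccati_rhs(P, F, H, Q, R)
        P_new = 0.5 * (P_new + P_new.T)      # remove rounding asymmetry
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    raise RuntimeError("no convergence within max_iter iterations")

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])
R = np.diag([0.1, 0.05])
Q = B0 @ B0.T

P_inf = riccati_fixed_point(F, H, Q, R)
```

A fixed-point iteration is simple but can be slow when $F$ has eigenvalues close to the unit circle, which is one reason to prefer a direct solver such as dare when it succeeds.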
11.6. Conditional Variance Decompositions
An alternative approach to the forecast error variance decomposition in Section 11.4 is suggested through the historical forecast error decomposition in equation (11.15). The $h$-step ahead forecast error using data until period $T$ can be expressed as:

$$
\varepsilon_{T+h} = H'F^h\left(\xi_T - \xi_{T|T}\right) + H'\sum_{i=0}^{h-1}F^iB_0\eta_{T+h-i} + w_{T+h}, \quad h = 1, 2, \ldots, H. \tag{11.31}
$$

If we condition the forecast error $\varepsilon_{T+h}$ on the state projection error (first term) and the measurement error (third term), the conditional $h$-step ahead forecast error variance is given by the variance of the second term on the right hand side. That is,

$$
V_h = \sum_{i=0}^{h-1} H'F^iB_0B_0'(F')^iH. \tag{11.32}
$$
This forecast error variance is identical to the forecast error variance that we obtain when a VAR model is written in state-space form. It is therefore analogous to a variance decomposition that is calculated from the impulse response functions in (11.17).

Letting $R_i = H'F^iB_0$, the conditional forecast error variance decomposition can be expressed as the $n \times q$ matrix

$$
v_h = \left[\left(\sum_{i=0}^{h-1}R_iR_i'\right) \odot I_n\right]^{-1}\sum_{i=0}^{h-1}\left(R_i \odot R_i\right), \tag{11.33}
$$

where $\odot$ is the Hadamard (element-by-element) product. With $e_i^{(n)}$ being the $i$:th column of $I_n$ and $e_j^{(q)}$ the $j$:th column of $I_q$, the share of the $h$-step ahead conditional forecast error variance of the $i$:th observed variable that is explained by the $j$:th economic shock is given by $e_i^{(n)\prime}v_h e_j^{(q)}$.
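A direct translation of (11.33) follows. Because the diagonal of $\sum_i R_iR_i'$ equals the row sums of $\sum_i R_i \odot R_i$, premultiplying by the inverted diagonal matrix amounts to normalizing each row to sum to one. The function name and the system are hypothetical.

```python
import numpy as np

def cond_fevd(H, F, B0, h):
    """n x q matrix v_h of (11.33) together with V_h of (11.32)."""
    n, q = H.shape[1], B0.shape[1]
    num = np.zeros((n, q))
    V_h = np.zeros((n, n))
    Fi = np.eye(F.shape[0])
    for _ in range(h):                   # i = 0, ..., h-1
        Ri = H.T @ Fi @ B0               # R_i = H' F^i B0
        num += Ri * Ri                   # Hadamard product R_i o R_i
        V_h += Ri @ Ri.T
        Fi = Fi @ F
    v_h = np.linalg.inv(V_h * np.eye(n)) @ num   # [V_h o I_n]^{-1} sum_i R_i o R_i
    return v_h, V_h

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])

v_h, V_h = cond_fevd(H, F, B0, h=8)
```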
YADA can also handle conditional variance decompositions for levels variables. To illustrate how this is achieved, assume for simplicity that all observed variables are expressed as first differences so that the levels are obtained by accumulating the variables. This means that the $h$-step ahead forecast error for the levels is the accumulation of the error in (11.31), i.e.,

$$
\bar{\varepsilon}_{T+h} = H'\sum_{j=1}^{h}F^j\left(\xi_T - \xi_{T|T}\right) + H'\sum_{j=1}^{h}\sum_{i=0}^{j-1}F^iB_0\eta_{T+j-i} + \sum_{j=1}^{h}w_{T+j}, \quad h = 1, 2, \ldots, H. \tag{11.34}
$$

The conditional $h$-step ahead forecast error for the levels variables is the second term on the right hand side of (11.34). This can be expressed as

$$
\bar{\varepsilon}^{\,c}_{T+h} = \sum_{j=1}^{h}\sum_{i=0}^{j-1}R_i\eta_{T+j-i} = \sum_{j=0}^{h-1}\bar{R}_j\eta_{T+h-j}, \tag{11.35}
$$

where $\bar{R}_j = \sum_{i=0}^{j}R_i$. It therefore follows that the conditional forecast error variance for the levels of the observed variables is

$$
\bar{V}_h = \sum_{j=0}^{h-1}\bar{R}_j\bar{R}_j'. \tag{11.36}
$$

We can then define the levels variance decomposition as in equation (11.33) with $\bar{R}_j$ instead of $R_i$ (and summing over $j = 0, 1, \ldots, h-1$).
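The levels version (11.35)-(11.36) only requires accumulating the $R_i$ matrices. The sketch below also evaluates $h$ times the right hand side of (11.37), i.e. $\sum_{i,m}(h - \max(i,m))R_iR_m'$, as an independent check; all names and matrices are hypothetical.

```python
import numpy as np

def levels_cond_fevd(H, F, B0, h):
    """V_bar_h of (11.36) and the levels shares, using Rbar_j = R_0 + ... + R_j."""
    n, q = H.shape[1], B0.shape[1]
    Rbar = np.zeros((n, q))
    V_bar = np.zeros((n, n))
    num = np.zeros((n, q))
    Fi = np.eye(F.shape[0])
    for _ in range(h):                  # j = 0, ..., h-1
        Rbar = Rbar + H.T @ Fi @ B0
        V_bar += Rbar @ Rbar.T
        num += Rbar * Rbar              # Hadamard product Rbar_j o Rbar_j
        Fi = Fi @ F
    shares = num / np.diag(V_bar)[:, None]
    return V_bar, shares

def levels_cond_var_weighted(H, F, B0, h):
    """h times (11.37): sum over i, m of (h - max(i, m)) R_i R_m'."""
    Rs = [H.T @ np.linalg.matrix_power(F, i) @ B0 for i in range(h)]
    V = np.zeros((H.shape[1], H.shape[1]))
    for i in range(h):
        for m in range(h):
            V += (h - max(i, m)) * (Rs[i] @ Rs[m].T)
    return V

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])

V_bar, shares = levels_cond_fevd(H, F, B0, h=6)
```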
By collecting the products $R_iR_i'$ into one group and all other products into a second group and dividing both sides of equation (11.36) by $h$, it can be rewritten as:

$$
\tilde{V}_h = \frac{\bar{V}_h}{h} = \sum_{i=0}^{h-1}\frac{h-i}{h}R_iR_i' + \sum_{m=1}^{h-1}\sum_{i=0}^{m-1}\frac{h-m}{h}\left(R_iR_m' + R_mR_i'\right). \tag{11.37}
$$

Taking the limit of $\tilde{V}_h$ as the forecast horizon approaches infinity we obtain a finite expression of the long-run forecast error covariance. We here find that

$$
\lim_{h\to\infty}\tilde{V}_h = \sum_{i=0}^{\infty}R_iR_i' + \sum_{m=1}^{\infty}\sum_{i=0}^{m-1}\left(R_iR_m' + R_mR_i'\right) = \left(\sum_{i=0}^{\infty}R_i\right)\left(\sum_{i=0}^{\infty}R_i\right)' = H'\left(I_r - F\right)^{-1}B_0B_0'\left(I_r - F'\right)^{-1}H. \tag{11.38}
$$

Hence, if we divide the $h$-step ahead forecast error covariance matrix by $h$ and take the limit of this expression, we find that the resulting long-run covariance matrix is equal to the cross product of the accumulated impulse responses.

These results allow us to evaluate how close the forecast error covariance matrix at the $h$-step horizon is to the long-run forecast error covariance matrix. The ratio between the $l$:th diagonal element in (11.37) and in (11.38) is an indicator of such convergence. A value close to unity can be viewed as long-run convergence at forecast horizon $h$, while a very large or very small value indicates a lack of convergence.

We can also use the result in (11.38) to calculate the long-run conditional forecast error variance decomposition. Letting $R_{lr} = H'(I_r - F)^{-1}B_0$, we find that

$$
v_{lr} = \left[\left(R_{lr}R_{lr}'\right) \odot I_n\right]^{-1}\left(R_{lr} \odot R_{lr}\right) \tag{11.39}
$$

provides such a decomposition.
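The long-run results (11.38)-(11.39) and the convergence ratio can be sketched as follows, assuming all eigenvalues of $F$ lie inside the unit circle so that $I_r - F$ is invertible and the accumulated responses converge. The matrices and helper names are hypothetical.

```python
import numpy as np

def long_run_cond_fevd(H, F, B0):
    """V_lr of (11.38) and v_lr of (11.39)."""
    r = F.shape[0]
    R_lr = H.T @ np.linalg.solve(np.eye(r) - F, B0)   # H'(I_r - F)^{-1} B0
    V_lr = R_lr @ R_lr.T
    v_lr = (R_lr * R_lr) / np.diag(V_lr)[:, None]
    return V_lr, v_lr

def convergence_ratio(H, F, B0, h):
    """Diagonal of V_bar_h / h, as in (11.37), divided by the diagonal of V_lr."""
    n, q = H.shape[1], B0.shape[1]
    Rbar = np.zeros((n, q))
    V_bar = np.zeros((n, n))
    Fi = np.eye(F.shape[0])
    for _ in range(h):
        Rbar = Rbar + H.T @ Fi @ B0
        V_bar += Rbar @ Rbar.T
        Fi = Fi @ F
    V_lr, _ = long_run_cond_fevd(H, F, B0)
    return np.diag(V_bar / h) / np.diag(V_lr)

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])

V_lr, v_lr = long_run_cond_fevd(H, F, B0)
ratio_short = convergence_ratio(H, F, B0, h=4)
ratio_long = convergence_ratio(H, F, B0, h=2000)
```

The gap between `ratio_long` and unity shrinks at rate $1/h$, which is what the text's convergence indicator measures.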
11.7. Conditional Correlations and Correlation Decompositions
The basic idea behind conditional correlations is to examine the correlation pattern between a set of variables conditional on one source of fluctuation at a time, e.g., technology shocks. Following the work by Kydland and Prescott (1982) the literature on real business cycle models tended to focus on matching unconditional second moments. This was criticized by several economists since a model's ability to match unconditional second moments well did not imply that it could also match conditional moments satisfactorily; see, e.g., Galí (1999).

We can compute conditional correlations directly from the state-space representation. Let column $j$ of $B_0$ be denoted by $B_{0j}$, while the $j$:th economic shock is $\eta_{j,t}$. Recall from equation (11.25) that the covariance matrix for the state variables conditional on only shock $j$ is

$$
\Sigma_\xi^{(j)} = F\Sigma_\xi^{(j)}F' + B_{0j}B_{0j}', \tag{11.40}
$$

where $\Sigma_\xi^{(j)} = E[\xi_t\xi_t' \mid \eta_{j,t}]$. We can calculate $\Sigma_\xi^{(j)}$ at $\theta$ either by solving (11.40) analytically through the vec operator, or numerically using the doubling algorithm discussed in Section 5.3. The conditional correlation for the observed variables can thereafter be calculated from the conditional covariance matrix

$$
\Sigma_y^{(j)} = H'\Sigma_\xi^{(j)}H. \tag{11.41}
$$
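The conditional covariances (11.40)-(11.41) and the implied conditional correlations can be computed through the vec operator as sketched below (the doubling algorithm is not shown). The example also confirms that the conditional state covariances sum to $\Sigma_\xi$. Names and matrices are hypothetical, and positive conditional variances are assumed when forming the correlations.

```python
import numpy as np

def lyapunov_vec(F, Q):
    """Solve S = F S F' + Q through the vec operator."""
    r = F.shape[0]
    vec_s = np.linalg.solve(np.eye(r * r) - np.kron(F, F), Q.reshape(-1, order="F"))
    return vec_s.reshape((r, r), order="F")

def conditional_corr(H, F, B0, j):
    """Sigma_xi^(j), Sigma_y^(j) and the conditional correlation matrix for shock j."""
    b = B0[:, [j]]
    S_xi = lyapunov_vec(F, b @ b.T)            # (11.40)
    S_y = H.T @ S_xi @ H                       # (11.41)
    sd = np.sqrt(np.diag(S_y))                 # assumes positive conditional variances
    return S_xi, S_y, S_y / np.outer(sd, sd)

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])

results = [conditional_corr(H, F, B0, j) for j in range(B0.shape[1])]
S_xi_total = lyapunov_vec(F, B0 @ B0.T)
```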
As an alternative to conditional population moments we can also consider simulation methods to obtain estimates of conditional sample moments. In that case we can simulate a path for the state variables conditional on only shock $j$ being non-zero, by drawing $T$ values for $\eta_{j,t}$ and letting

$$
\xi_t^{s} = F\xi_{t-1}^{s} + B_{0j}\eta_{j,t}, \quad t = 1, \ldots, T, \tag{11.42}
$$

where $\xi_0^{s}$ is drawn from $N(0, \Sigma_\xi^{(j)})$. The conditional sample correlations for simulation $s$ can now be computed from the covariance matrix

$$
\Sigma_y^{(j,s)} = \frac{1}{T}\sum_{t=1}^{T}H'\xi_t^{s}\xi_t^{s\prime}H, \quad s = 1, \ldots, S. \tag{11.43}
$$

By repeating the simulations $S$ times we can estimate the distribution of the conditional sample correlations for a given $\theta$.
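A simulation sketch of (11.42)-(11.43) for one shock. The initial state is drawn from $N(0, \Sigma_\xi^{(j)})$ through an eigenvalue factorization, which also works when $\Sigma_\xi^{(j)}$ is singular. The standard normal draws for $\eta_{j,t}$, the seed, and all names are assumptions of the sketch.

```python
import numpy as np

def lyapunov_vec(F, Q):
    """Solve S = F S F' + Q through the vec operator."""
    r = F.shape[0]
    vec_s = np.linalg.solve(np.eye(r * r) - np.kron(F, F), Q.reshape(-1, order="F"))
    return vec_s.reshape((r, r), order="F")

def simulate_cond_corr(H, F, B0, j, T, S, seed=0):
    """S draws of the conditional sample correlation matrix for shock j."""
    rng = np.random.default_rng(seed)
    r, n = F.shape[0], H.shape[1]
    b = B0[:, j]
    S_xi = lyapunov_vec(F, np.outer(b, b))
    w, V = np.linalg.eigh(0.5 * (S_xi + S_xi.T))
    L = V * np.sqrt(np.clip(w, 0.0, None))          # L @ L.T reproduces Sigma_xi^(j)
    corrs = np.empty((S, n, n))
    for s in range(S):
        xi = L @ rng.standard_normal(r)             # xi_0^s ~ N(0, Sigma_xi^(j))
        acc = np.zeros((r, r))
        for _ in range(T):
            xi = F @ xi + b * rng.standard_normal() # (11.42): only shock j is non-zero
            acc += np.outer(xi, xi)
        S_y = H.T @ (acc / T) @ H                   # (11.43)
        sd = np.sqrt(np.diag(S_y))
        corrs[s] = S_y / np.outer(sd, sd)
    return corrs

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])

corrs = simulate_cond_corr(H, F, B0, j=0, T=2000, S=20)
S_y_pop = H.T @ lyapunov_vec(F, np.outer(B0[:, 0], B0[:, 0])) @ H
pop_corr = S_y_pop[0, 1] / np.sqrt(S_y_pop[0, 0] * S_y_pop[1, 1])
```

The draws in `corrs[:, 0, 1]` approximate the sampling distribution of the conditional correlation between the two observed variables, centred near the population value `pop_corr`.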
Correlation decompositions have been suggested by Andrle (2010) as a means of decomposing autocorrelations for pairs of variables in a linear model. These decompositions relate the conditional correlations to the correlations by weighting them with the ratio of the conditional variances and the variances. From equation (5.42) we know that the state-space model provides us with

$$
\Sigma_y(h) = \begin{cases} H'\Sigma_\xi H + R, & \text{if } h = 0,\\ H'F^h\Sigma_\xi H, & \text{otherwise}. \end{cases} \tag{11.44}
$$

In addition, the covariance matrix of the state variables is related to the conditional covariance matrices in (11.40) linearly with

$$
\Sigma_\xi = \sum_{j=1}^{q}\Sigma_\xi^{(j)}.
$$

Hence, for $h \neq 0$, and also for $h = 0$ when $R = 0$,

$$
\Sigma_y(h) = \sum_{j=1}^{q}\Sigma_y^{(j)}(h), \quad\text{where}\quad \Sigma_y^{(j)}(h) = \begin{cases} H'\Sigma_\xi^{(j)}H, & \text{if } h = 0,\\ H'F^h\Sigma_\xi^{(j)}H, & \text{otherwise}, \end{cases} \tag{11.45}
$$

while

$$
\rho_{y_k,y_l}(h) = \frac{\Sigma_{y_k,y_l}(h)}{\sqrt{\Sigma_{y_k,y_k}(0)\Sigma_{y_l,y_l}(0)}} = \sum_{j=1}^{q}\frac{\Sigma_{y_k,y_l}^{(j)}(h)}{\sqrt{\Sigma_{y_k,y_k}(0)\Sigma_{y_l,y_l}(0)}}, \tag{11.46}
$$

where $\Sigma_{y_k,y_k}(0)$ is the variance of $y_{k,t}$. Assuming that all conditional variances $\Sigma_{y_k,y_k}^{(j)}(0)$ are positive, Andrle (2010) notes that the right hand side of (11.46) can be rewritten as

$$
\rho_{y_k,y_l}(h) = \sum_{j=1}^{q}\sqrt{\frac{\Sigma_{y_k,y_k}^{(j)}(0)\Sigma_{y_l,y_l}^{(j)}(0)}{\Sigma_{y_k,y_k}(0)\Sigma_{y_l,y_l}(0)}}\;\frac{\Sigma_{y_k,y_l}^{(j)}(h)}{\sqrt{\Sigma_{y_k,y_k}^{(j)}(0)\Sigma_{y_l,y_l}^{(j)}(0)}} = \sum_{j=1}^{q}\sigma_{y_k}^{(j)}\sigma_{y_l}^{(j)}\rho_{y_k,y_l}^{(j)}(h). \tag{11.47}
$$

In other words, the correlation between $y_{k,t}$ and $y_{l,t-h}$ is equal to the sum over all shocks $j$ of the product of the shares of the standard deviations of $y_k$ and $y_l$ due to shock $j$ ($\sigma_{y_k}^{(j)}$ times $\sigma_{y_l}^{(j)}$) and the conditional correlation between the variables, $\rho_{y_k,y_l}^{(j)}(h)$. The two standard deviation shares are non-negative (positive if all the conditional variances are positive) and sum to at most unity over the different shocks (unity when $R = 0$) and can be interpreted as the weights needed to obtain the correlations from the conditional correlations.

In case one of the conditional variances is zero for some shock, we let all terms be zero for that shock in (11.47). Moreover, to deal with non-zero measurement errors we add the correlation $R_{y_k,y_l}/\sqrt{\Sigma_{y_k,y_k}(0)\Sigma_{y_l,y_l}(0)}$ to the expressions for $\rho_{y_k,y_l}(0)$ that involve sums of conditional covariances. Finally, the population moments used in the above expressions can be substituted for simulated sample moments, such as (11.43), to decompose the sample correlations.
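The decomposition (11.47), including the measurement error correction for $h = 0$ described above, can be verified numerically. This is a sketch with hypothetical names and matrices (the measurement error covariance is given a non-zero off-diagonal element so that the correction is visible); terms for shocks with a zero conditional variance are set to zero as in the text.

```python
import numpy as np

def lyapunov_vec(F, Q):
    """Solve S = F S F' + Q through the vec operator."""
    r = F.shape[0]
    vec_s = np.linalg.solve(np.eye(r * r) - np.kron(F, F), Q.reshape(-1, order="F"))
    return vec_s.reshape((r, r), order="F")

def correlation_decomposition(H, F, B0, R, k, l, h):
    """rho_{y_k,y_l}(h), the per-shock terms of (11.47) and the measurement error term."""
    Fh = np.linalg.matrix_power(F, h)
    S_xi = [lyapunov_vec(F, np.outer(B0[:, j], B0[:, j])) for j in range(B0.shape[1])]
    S0 = [H.T @ S @ H for S in S_xi]                 # Sigma_y^(j)(0)
    Sh = [H.T @ Fh @ S @ H for S in S_xi]            # Sigma_y^(j)(h)
    total0 = sum(S0) + R                             # Sigma_y(0)
    totalh = sum(Sh) + (R if h == 0 else 0.0)        # Sigma_y(h), cf. (11.44)
    denom = np.sqrt(total0[k, k] * total0[l, l])
    rho = totalh[k, l] / denom
    terms = []
    for Sj0, Sjh in zip(S0, Sh):
        if Sj0[k, k] <= 0.0 or Sj0[l, l] <= 0.0:
            terms.append(0.0)                        # zero conditional variance: drop shock j
            continue
        sig_k = np.sqrt(Sj0[k, k] / total0[k, k])
        sig_l = np.sqrt(Sj0[l, l] / total0[l, l])
        rho_j = Sjh[k, l] / np.sqrt(Sj0[k, k] * Sj0[l, l])
        terms.append(sig_k * sig_l * rho_j)
    meas = R[k, l] / denom if h == 0 else 0.0
    return rho, terms, meas

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])
R = np.array([[0.1, 0.02], [0.02, 0.05]])

rho0, terms0, meas0 = correlation_decomposition(H, F, B0, R, k=0, l=1, h=0)
rho3, terms3, meas3 = correlation_decomposition(H, F, B0, R, k=0, l=1, h=3)
```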
$$
\xi_{t|T} = F^t\xi_{0|T} + \sum_{i=0}^{t-1}F^iB_0\eta_{t-i|T}, \tag{11.48}
$$

where $\xi_{0|T}$ is unobserved and can be set to its expected value (zero) when the observed data begins in period $t = 1$, while the smooth estimate of the economic shocks for period 1 is then $\eta_{1|T} = (B_0'B_0)^{-1}B_0'\xi_{1|T}$. From (11.48) we can decompose the smooth estimates of the state variables into terms due to the $q$ economic shocks. Substituting this expression into the measurement equation based on smoothed estimates, we obtain

$$
y_t = A'x_t + H'F^t\xi_{0|T} + \sum_{i=0}^{t-1}H'F^iB_0\eta_{t-i|T} + w_{t|T}, \quad t = 1, \ldots, T. \tag{11.49}
$$

More generally, taking period $t_0$ as the starting point we have

$$
y_t = A'x_t + H'F^{t-t_0}\xi_{t_0|T} + \sum_{i=0}^{t-t_0-1}H'F^iB_0\eta_{t-i|T} + w_{t|T}, \quad t = t_0+1, \ldots, T. \tag{11.50}
$$

This provides a decomposition of the observed variables $y_t$ into (i) deterministic variables, (ii) the estimated history of the state until $t_0$ ($\xi_{t_0|T}$), (iii) the $q$ economic shocks from $t_0+1$ until $t$, and (iv) the measurement error.
Assuming that the model has a unique and convergent solution at both $\theta_b$ and $\theta_a$, YADA provides two approaches for parameter scenario analysis. The first is to calculate smooth estimates of the economic shocks under the two parameter vectors. The path for the observed variables is, in this situation, the same for both parameter vectors.61

The second approach takes the economic shocks and measurement errors based on $\theta_b$ as given and calculates the implied observed variables from the state-space representation with the parameter matrices $A$, $H$, $F$ and $B_0$ determined by the alternative $\theta_a$ vector.62 This path can then be compared with the actual data for the observed variables.
11.10. Linking the State-Space Representation to a VAR
To address the issue of whether the economic shocks and measurement errors of the state-space representation can be uncovered from a VAR representation of the observed variables, Fernández-Villaverde, Rubio-Ramírez, Sargent, and Watson (2007) provide a simple condition for checking this.

To cast equations (5.1) and (5.2) into their framework we first rewrite the measurement error as:

$$
w_t = \Phi\nu_t,
$$

where $R = \Phi\Phi'$ while $\nu_t \sim N(0, I_m)$. The matrix $\Phi$ is of dimension $n \times m$, with $m = \operatorname{rank}(R) \leq n$. Substituting for $\xi_t$ from the state equation into the measurement equation we get:

$$
y_t = A'x_t + H'F\xi_{t-1} + H'B_0\eta_t + \Phi\nu_t = A'x_t + H'F\xi_{t-1} + D\phi_t, \tag{11.51}
$$

where $D = [H'B_0 \;\; \Phi]$ is an $n \times (q+m)$ matrix and $\phi_t = [\eta_t' \;\; \nu_t']'$. The state equation can similarly be written as

$$
\xi_t = F\xi_{t-1} + B\phi_t, \tag{11.52}
$$

where $B = [B_0 \;\; 0]$ is an $r \times (q+m)$ matrix.

The state-space representation has a VAR representation when $\phi_t$ can be retrieved from the history of the observed variables. The first condition for this is that $D$ has full column rank $q+m$, so that its Moore-Penrose inverse is given by $D^+ = (D'D)^{-1}D'$. A necessary condition for this is clearly that $n \geq q+m$, i.e., that we have at least as many observed variables as there are economic shocks and unique measurement errors.

Assuming $D$ has full column rank, we can write $\phi_t$ in (11.51) as a function of $y_t$, $x_t$, and $\xi_{t-1}$. Substituting the corresponding expression into the state equation and rearranging terms yields

$$
\xi_t = \left(F - BD^+H'F\right)\xi_{t-1} + BD^+\left(y_t - A'x_t\right). \tag{11.53}
$$

If the matrix $F - BD^+H'F$ has all eigenvalues inside the unit circle, then the state variables are uniquely determined by the history of the observed (and the exogenous) variables. The state vector $\xi_t$ can therefore be regarded as known, and, moreover, this allows us to express the measurement equation as an infinite order VAR model. Accordingly, the economic shocks and the measurement errors are uniquely determined by the history of the observed data (and the parameters of the DSGE model). The eigenvalue condition is called a poor man's invertibility condition by Fernández-Villaverde et al. (2007).
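The poor man's invertibility condition can be checked with a few lines of linear algebra. The sketch builds $\Phi$ from the eigenvalue decomposition of $R$ (one choice among many factorizations with $R = \Phi\Phi'$), forms $D$ and $B$ as in (11.51)-(11.52), and returns the spectral radius of the matrix on lagged states in (11.53). Names and matrices are hypothetical; this is not the DSGEtoVARModel code.

```python
import numpy as np

def poor_mans_invertibility(H, F, B0, R, tol=1e-10):
    """Return (rank condition holds, all eigenvalues inside unit circle, spectral radius)."""
    r, q = B0.shape
    w, V = np.linalg.eigh(R)
    keep = w > tol
    Phi = V[:, keep] * np.sqrt(w[keep])            # R = Phi Phi', n x m with m = rank(R)
    m = Phi.shape[1]
    D = np.hstack([H.T @ B0, Phi])                 # n x (q + m)
    B = np.hstack([B0, np.zeros((r, m))])          # r x (q + m)
    if np.linalg.matrix_rank(D) < q + m:
        return False, False, np.nan                # D lacks full column rank: no VAR form
    D_plus = np.linalg.solve(D.T @ D, D.T)         # (D'D)^{-1} D'
    M = F - B @ D_plus @ H.T @ F                   # matrix on lagged states in (11.53)
    radius = float(np.max(np.abs(np.linalg.eigvals(M))))
    return True, radius < 1.0, radius

F = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.0, 0.0, -0.3]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8], [0.0, 0.5]])
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.4]])

# n = 2 observed variables and q = 2 shocks: no room for measurement errors.
rank_ok0, invertible0, radius0 = poor_mans_invertibility(H, F, B0, np.zeros((2, 2)))
# Full-rank R adds m = 2 errors, so q + m = 4 > n and the rank condition fails.
rank_ok1, invertible1, radius1 = poor_mans_invertibility(H, F, B0, np.diag([0.1, 0.05]))
```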
11.11. Fisher's Information Matrix
It is well known that when the vector of parameters $\theta$ is estimated with maximum likelihood, then the inverse of Fisher's information matrix is the asymptotic covariance matrix for the parameters. If the information matrix has full rank when evaluated at the true parameter values, the parameters are said to be locally identified; cf. Rothenberg (1971).
61 Alternatively, one may wish to compare smooth estimates of the state variables under the two parameter vectors.
62 One alternative to the second approach is to simulate the model under $\theta_a$ by drawing the economic shocks and the measurement errors from their assumed distribution a large number of times, compute the implied path for the observed variables, and then compare, say, the average of these paths to the actual data for the observed variables.
Since DSGE models are typically estimated with Bayesian methods, identification problems can likewise be viewed through the behavior of the Hessian of the log-posterior distribution. However, such problems can be dealt with by changing the prior such that the log-posterior has more curvature. Still, using prior information to deal with identification problems is unsatisfactory. One way to examine how much information there is in the data about a certain parameter is to compare the plots of the prior and the posterior distributions. If these distributions are very similar, then it is unlikely that the data are very informative about this particular parameter.

The comparison between prior and posterior distributions requires that we have access to draws from the posterior. Since drawing from the posterior may be very time consuming, it may be useful to consider an alternative approach. In this respect, Fisher's information matrix may also be useful when considering identification issues from a Bayesian perspective. This approach has been investigated in a series of articles by Nikolay Iskrev; see Iskrev (2007, 2010).63
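Using standard results from matrix differential calculus (see Magnus and Neudecker, 1988) it has been shown by Klein and Neudecker (2000) that with $\tilde{y}_t = y_t - y_{t|t-1}$ the second differential of the log-likelihood function in (5.18) can be written as:

$$
\begin{aligned}
d^2\ln L(\mathcal{Y}_T;\theta) ={}& \frac{1}{2}\sum_{t=1}^{T}\operatorname{tr}\left[\Sigma_{y,t|t-1}^{-1}d\Sigma_{y,t|t-1}\Sigma_{y,t|t-1}^{-1}d\Sigma_{y,t|t-1}\right] - \sum_{t=1}^{T}d\tilde{y}_t'\Sigma_{y,t|t-1}^{-1}d\tilde{y}_t\\
&- \sum_{t=1}^{T}\operatorname{tr}\left[\Sigma_{y,t|t-1}^{-1}d\Sigma_{y,t|t-1}\Sigma_{y,t|t-1}^{-1}d\Sigma_{y,t|t-1}\Sigma_{y,t|t-1}^{-1}\tilde{y}_t\tilde{y}_t'\right]\\
&+ 2\sum_{t=1}^{T}\operatorname{tr}\left[\Sigma_{y,t|t-1}^{-1}d\Sigma_{y,t|t-1}\Sigma_{y,t|t-1}^{-1}\left(d\tilde{y}_t\right)\tilde{y}_t'\right] - \sum_{t=1}^{T}\operatorname{tr}\left[\Sigma_{y,t|t-1}^{-1}\left(d^2\tilde{y}_t\right)\tilde{y}_t'\right].
\end{aligned}
$$

Taking expectations of both sides for a given value of $\theta$, the Lemma in Klein and Neudecker implies that the last two terms on the right hand side are zero. Moreover, with $E[\tilde{y}_t\tilde{y}_t';\theta] = E[\Sigma_{y,t|t-1};\theta]$, the above simplifies to

$$
E\left[d^2\ln L(\mathcal{Y}_T;\theta)\right] = -\frac{1}{2}\sum_{t=1}^{T}E\left[\left(d\operatorname{vec}\Sigma_{y,t|t-1}\right)'\left(\Sigma_{y,t|t-1}^{-1}\otimes\Sigma_{y,t|t-1}^{-1}\right)d\operatorname{vec}\Sigma_{y,t|t-1}\right] - \sum_{t=1}^{T}E\left[d\tilde{y}_t'\Sigma_{y,t|t-1}^{-1}d\tilde{y}_t\right],
$$

so that

$$
-E\left[\frac{\partial^2\ln L(\mathcal{Y}_T;\theta)}{\partial\theta\,\partial\theta'}\right] = \sum_{t=1}^{T}E\left[\frac{\partial\tilde{y}_t'}{\partial\theta}\Sigma_{y,t|t-1}^{-1}\frac{\partial\tilde{y}_t}{\partial\theta'}\right] + \frac{1}{2}\sum_{t=1}^{T}E\left[\frac{\partial\operatorname{vec}(\Sigma_{y,t|t-1})'}{\partial\theta}\left(\Sigma_{y,t|t-1}^{-1}\otimes\Sigma_{y,t|t-1}^{-1}\right)\frac{\partial\operatorname{vec}(\Sigma_{y,t|t-1})}{\partial\theta'}\right]. \tag{11.54}
$$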
The partial derivatives of $\tilde{y}_t$ and $\Sigma_{y,t|t-1}$ with respect to $A$, $H$, $R$, $F$, $Q$ (the reduced form parameters) can be determined analytically. The form depends on how the initial conditions for the state variables relate to the parameters; see Zadrozny (1989, 1992) for details. The step from reduced form parameters to $\theta$ is explained by Iskrev (2007).

Instead of making use of these analytic results, YADA currently computes numerical derivatives of $\tilde{y}_t$ and $\Sigma_{y,t|t-1}$ with respect to $\theta$.

63 See also Beyer and Farmer (2004), Canova and Sala (2009), Consolo, Favero, and Paccagnini (2009), Komunjer and Ng (2011), and Bonaldi (2010) for discussions of identifiability issues in DSGE models.
11.12. A Rank Revealing Algorithm
In practice it may be difficult to assess the rank of the information matrix. In most cases we may expect that the determinant is positive but perhaps small. One approach to disentangling the possible parameters that are only weakly identified is to compute the correlations based on the information matrix. If two parameters are highly correlated in absolute terms, it may be difficult to identify both. However, it should be kept in mind that the information matrix can be of full rank also when the correlation between some pairs of parameters is close or equal to unity. Moreover, the matrix can have less than full rank also when all correlation pairs are less than unity. Hence, the information content in such correlations is of limited value when it comes to determining the rank of the information matrix.
As pointed out by Andrle (2009), a natural tool for examining the rank properties of any real valued matrix is the singular value decomposition; see, e.g., Golub and van Loan (1983). For symmetric positive semidefinite matrices, such as the information matrix, the decomposition simplifies to the eigenvalue decomposition. The determinant of a square matrix is equal to the product of its eigenvalues and, hence, the eigenvectors of the smallest eigenvalues of the information matrix give the linear combinations of the parameters that are the most difficult to identify. Provided that an eigenvalue is very small and its eigenvector has large weights on a pair of parameters, then the correlation between these two parameters will be large in absolute terms. However, if the eigenvector has large weights for more than two parameters, then the correlations between pairs of these parameters need not be large. This suggests that correlations are potentially unreliable for determining weakly identified parameters, but also that the eigenvectors of the smallest eigenvalues do not suffer from such a problem.
In fact, Andrle (2009) has suggested a heuristic procedure, based on Q-R factorization with column pivoting (Golub and van Loan, 1983, Section 6.4), that sorts the parameters from the most identified to the least identified. For a positive definite matrix $A$, the rank revealing approach may be applied as follows:64
(1) Compute the singular value decomposition of $A = USV'$ ($= VSV'$ since $A$ is symmetric) where $U$ and $V$ are orthogonal ($U'U = I$ and $V'V = I$). It is assumed that the singular values in $S$ are sorted from the largest to the smallest.
(2) Calculate the Q-R factorization with column pivoting on $V'$, i.e.,
$$
V'P = QR,
$$
where $P$ is a permutation matrix, $Q$ is orthogonal ($Q'Q = I$), and $R$ is upper triangular.
(3) The last column of the permutation matrix $P$ is a vector with unity in position $j$ and zeros elsewhere, linking column $j$ of $A$ to the smallest singular value. Discard row and column $j$ of $A$ and return to (1) until $A$ is a scalar.
The first column that is removed from $A$ is the least identified and the last to remain is the most identified. To keep track of the column numbers from the original $m \times m$ matrix $A$, we use the vector $v = [1, 2, \ldots, m]'$ as the column number indicator. Now, element $j$ of $v$ determines the column number $c_j$ of $A$ that is linked to the smallest singular value. In step (3) we therefore discard element $j$ from $v$ and row and column $c_j$ from $A$ before returning to step (1).
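A sketch of the elimination loop, using the variant discussed next in the text in which the Q-R factorization with column pivoting is applied directly to $A$. NumPy has no pivoted QR, so the sketch implements the standard greedy pivoting rule (largest remaining column norm, followed by Gram-Schmidt deflation) itself; with SciPy available, `scipy.linalg.qr(A, pivoting=True)` returns a permutation of the same kind, although ties may be broken differently. The example information matrix is hypothetical, with parameters 0 and 2 nearly collinear.

```python
import numpy as np

def pivot_order(M):
    """Column order from Q-R with column pivoting (largest remaining norm first)."""
    A = np.array(M, dtype=float)
    order = list(range(A.shape[1]))
    for k in range(A.shape[1]):
        p = k + int(np.argmax(np.linalg.norm(A[:, k:], axis=0)))
        A[:, [k, p]] = A[:, [p, k]]                    # swap columns k and p
        order[k], order[p] = order[p], order[k]
        norm_k = np.linalg.norm(A[:, k])
        if norm_k == 0.0:
            break                                      # remaining columns are zero
        q = A[:, k] / norm_k
        A[:, k + 1:] -= np.outer(q, q @ A[:, k + 1:])  # remove the q direction
    return order

def identification_order(A):
    """Parameter indices from least to most identified (heuristic sketch)."""
    A = np.array(A, dtype=float)
    v = list(range(A.shape[0]))                        # column number indicator
    removed = []
    while len(v) > 1:
        j = pivot_order(A[np.ix_(v, v)])[-1]           # last pivot: least identified
        removed.append(v.pop(j))
    removed.append(v[0])
    return removed

J = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1e-3]])
info = J.T @ J                                         # hypothetical information matrix
order = identification_order(info)
```

Note that `v.pop(j)` plays the role of discarding element $j$ of $v$, so the returned indices always refer to the columns of the original matrix.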
As an alternative to performing the Q-R factorization with column pivoting on $V'$, one may calculate the factorization directly on $A$.65 In that case, the singular value decomposition can be skipped and only the last two steps need be performed. This latter option is the preferred

64 Note that the algorithm may also be applied when $A$ has reduced rank or when it is not a square matrix; see Andrle (2009) for details on how it then needs to be modified.
65 In practice, a factorization based on $A$ seems to yield the same ordering of the columns as a factorization based on $S^{1/2}V'$.

66 An example of this latter case is, trivially, when the parameter concerns a first order autoregression and its support is given by $(-1, 1)$.
$\theta_i$, the statistic is

$$
D_{N,\bar{N}} = \sup_{\theta_i}\left|F_N(\theta_i) - \bar{F}_{\bar{N}}(\theta_i)\right|. \tag{11.57}
$$

$$
\sqrt{\frac{N\bar{N}}{N+\bar{N}}}\,D_{N,\bar{N}} \Rightarrow \sup_{t\in[0,1]}\left|B(t)\right| = K, \tag{11.58}
$$

where $\Rightarrow$ denotes weak convergence, $B(t)$ is a Brownian bridge,67 and $K$ has a Kolmogorov distribution. The cdf of $K$ is given by

$$
\Pr\left[K \leq x\right] = 1 - 2\sum_{i=1}^{\infty}(-1)^{i-1}\exp\left(-2i^2x^2\right). \tag{11.59}
$$

This means that the test value on the left hand side of (11.58) is simply substituted for $x$ in (11.59), where the infinite sum is approximated by a finite sum based on the exponential term being sufficiently close to zero. An approximate p-value for the Kolmogorov-Smirnov test that the two distributions are equal, $1 - \Pr[K \leq x]$, can therefore be computed from such a truncation.68
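The truncated series can be evaluated as sketched below. The sketch uses (11.59) for $x \geq 1$ and the alternative series in footnote 68, which is better suited for small $x$, below that; the switch point and the tolerance are choices of the sketch. The two-sample statistic is computed from empirical cdfs, whose supremum is attained at one of the pooled sample points. All function names are hypothetical.

```python
import bisect
import math

def kolmogorov_cdf(x, tol=1e-16, max_terms=10_000):
    """Pr[K <= x]: series (11.59) for x >= 1, footnote 68 series for 0 < x < 1."""
    if x <= 0.0:
        return 0.0
    total = 0.0
    if x >= 1.0:
        for i in range(1, max_terms + 1):
            term = math.exp(-2.0 * i * i * x * x)
            total += term if i % 2 == 1 else -term      # (-1)^(i-1) exp(-2 i^2 x^2)
            if term < tol:
                break
        return 1.0 - 2.0 * total
    for i in range(1, max_terms + 1):
        term = math.exp(-((2 * i - 1) ** 2) * math.pi ** 2 / (8.0 * x * x))
        total += term
        if term < tol:
            break
    return math.sqrt(2.0 * math.pi) / x * total

def ks_statistic(a, b):
    """D = sup |F_a(z) - F_b(z)| over the pooled sample points."""
    a, b = sorted(a), sorted(b)
    D = 0.0
    for z in a + b:
        Fa = bisect.bisect_right(a, z) / len(a)
        Fb = bisect.bisect_right(b, z) / len(b)
        D = max(D, abs(Fa - Fb))
    return D

def ks_pvalue(a, b):
    """Approximate p-value 1 - Pr[K <= sqrt(N Nbar / (N + Nbar)) D]."""
    N, Nbar = len(a), len(b)
    x = math.sqrt(N * Nbar / (N + Nbar)) * ks_statistic(a, b)
    return 1.0 - kolmogorov_cdf(x)
```

For example, `kolmogorov_cdf(1.3581)` is close to 0.95, matching the familiar 5 percent critical value of the Kolmogorov distribution.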
When parameters which are important for the target behavior have been located, a graphical illustration may prove useful. It is well known that certain parameters of the monetary policy reaction function are candidates for yielding, e.g., a unique convergent solution or indeterminacy. In the case of the An and Schorfheide model in Section 2.1, the parameter $\psi_1$ can produce such behavior. Scatter plots of pairs of parameters may provide useful information about the regions over which the model has a unique convergent solution, indeterminacy, and when there is no convergent solution.
11.15. Moments of the Observed Variables
The population autocovariances conditional on a given value of $\theta$ (and the selected model) have already been presented in Section 5.8. In this section we shall discuss how one can compute the moments of the observed variables conditional only on the model. To this end, let $p(\theta)$ be a proper density function of the DSGE model parameters. From a notational point of view I have suppressed additional variables from the density, but it should nevertheless be kept in mind that it could either be the prior or the posterior density of the model parameters.

Recall that the population mean of the observed variables conditional on $\theta$ and the exogenous variables is:

$$
E\left[y_t \mid x_t, \theta\right] = A'x_t = \mu_{y|\theta}x_t.
$$

The population mean conditional on the exogenous variables (and the model) is therefore given by

$$
E\left[y_t \mid x_t\right] = \left(\int_{\Theta}\mu_{y|\theta}\,p(\theta)\,d\theta\right)x_t = \mu_y x_t, \tag{11.60}
$$

where $\Theta$ is the domain of the parameters and $\mu_y$ is an $n \times k$ matrix. In the event that $A$ is calibrated, then $\mu_{y|\theta} = \mu_y$ for all $\theta \in \Theta$.
67 Recall that a Brownian bridge, $B(t)$, is defined from a Wiener process, $W(t)$, over the unit interval $t \in [0, 1]$ such that
$$
B(t) = W(t) - tW(1),
$$
where $W(t)$ is normally distributed with mean 0 and variance $t$, and has stationary and independent increments.
68 An equivalent expression for the cdf is
$$
\Pr\left[K \leq x\right] = \frac{\sqrt{2\pi}}{x}\sum_{i=1}^{\infty}\exp\left(-\frac{(2i-1)^2\pi^2}{8x^2}\right).
$$
This expression is better suited when $x$ is small. The mean of the Kolmogorov distribution is $\mu_K = \sqrt{\pi/2}\,\ln 2 \approx 0.868731$, while the variance is $\sigma_K^2 = \pi^2/12 - \mu_K^2 \approx 0.260333^2$. For discussions on computational issues, see, e.g., Marsaglia, Tsang, and Wang (2003).
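Turning next to the covariance matrix of $y_t$ conditional on $\theta$ and the exogenous variables, we have from equation (5.42) that

$$
E\left[\left(y_t - \mu_{y|\theta}x_t\right)\left(y_t - \mu_{y|\theta}x_t\right)' \mid x_t, \theta\right] = H'\Sigma_\xi H + R = \Sigma_{y|\theta}.
$$

From the law of iterated expectations and by utilizing the mean expansion

$$
y_t - \mu_y x_t = \left(y_t - \mu_{y|\theta}x_t\right) + \left(\mu_{y|\theta} - \mu_y\right)x_t,
$$

the population covariance matrix for $y_t$ conditional only on the exogenous variables can be shown to satisfy

$$
E\left[\left(y_t - \mu_y x_t\right)\left(y_t - \mu_y x_t\right)' \mid x_t\right] = E\left[E\left[\left(y_t - \mu_{y|\theta}x_t\right)\left(y_t - \mu_{y|\theta}x_t\right)' \mid x_t, \theta\right] \Bigm| x_t\right] + E\left[\left(\mu_{y|\theta} - \mu_y\right)x_tx_t'\left(\mu_{y|\theta} - \mu_y\right)' \mid x_t\right],
$$

where the cross terms vanish since $E[y_t - \mu_{y|\theta}x_t \mid x_t, \theta] = 0$. This may be written more compactly as

$$
\Sigma_y = E\left[\Sigma_{y|\theta} \mid x_t\right] + C\left[\mu_{y|\theta}x_t \mid x_t\right]. \tag{11.61}
$$

That is, the covariance matrix of $y_t$ (conditional on the exogenous variables) is equal to the mean of the covariance matrix conditional on the parameters plus the covariance matrix of the mean conditional on the parameters. If $\mu_{y|\theta} = \mu_y$ for all $\theta \in \Theta$, then the second term is zero. Similar expressions can be determined for all the autocovariances as well.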
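With a discrete distribution standing in for $p(\theta)$, the integrals in (11.60)-(11.61) become weighted sums (taking $x_t = 1$). The three parameter points, their probabilities and the conditional moments below are hypothetical; the last line computes the covariance through raw second moments as an independent check.

```python
import numpy as np

def unconditional_mean_cov(probs, mus, Sigmas):
    """mu_y and Sigma_y from (11.60)-(11.61) with x_t = 1 and a discrete p(theta)."""
    p = np.asarray(probs, dtype=float)             # p(theta_s), summing to one
    M = np.asarray(mus, dtype=float)               # S x n conditional means
    C = np.asarray(Sigmas, dtype=float)            # S x n x n conditional covariances
    mu_y = p @ M
    mean_of_cov = np.einsum("s,sij->ij", p, C)     # E[Sigma_{y|theta}]
    dev = M - mu_y
    cov_of_mean = np.einsum("s,si,sj->ij", p, dev, dev)   # C[mu_{y|theta}]
    return mu_y, mean_of_cov + cov_of_mean

probs = [0.2, 0.5, 0.3]
mus = [[0.0, 1.0], [0.5, 0.8], [1.5, 0.2]]
Sigmas = [np.array([[1.0, 0.3], [0.3, 0.5]]),
          np.array([[0.8, 0.1], [0.1, 0.6]]),
          np.array([[1.2, -0.2], [-0.2, 0.4]])]
mu_y, Sigma_y = unconditional_mean_cov(probs, mus, Sigmas)

# Check: E[y y'] = sum_s p_s (Sigma_s + mu_s mu_s'), so Sigma_y = E[y y'] - mu_y mu_y'.
raw2 = sum(p * (S + np.outer(m, m)) for p, m, S in zip(probs, np.asarray(mus), Sigmas))
```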
To keep notation as simple as possible, I assume that $n = 1$ and that $x_t = 1$ when examining higher central moments. Concerning the third central moment, the law of iterated expectations and the mean expansion can be applied to show that

$$
E\left[(y_t - \mu_y)^3\right] = E\left[E\left[(y_t - \mu_{y|\theta})^3 \mid \theta\right]\right] + E\left[(\mu_{y|\theta} - \mu_y)^3\right] + 3C\left[\sigma_{y|\theta}^2, \mu_{y|\theta}\right]. \tag{11.62}
$$

Hence, skewness69 of $y_t$ is equal to the mean of skewness of $y_t$ conditional on the parameters plus skewness of the mean conditional on the parameters plus three times the covariance between the conditional variance and the conditional mean.70 It can now be deduced that if $\mu_{y|\theta} = \mu_y$ for all $\theta \in \Theta$, then the second and third terms on the right hand side of (11.62) are both zero. In addition, the third term is zero if $\sigma_{y|\theta}^2 = \sigma_y^2$ for all $\theta \in \Theta$, but this case is only expected to occur in state-space models where the matrices $H$, $R$, $F$, $B_0$ are fully determined from calibrated parameters.

For the log-linearized DSGE model with normally distributed structural shocks and measurement errors we know that $y_t|\theta$ has zero skewness since it is normally distributed. If the distribution of the conditional mean is skewed, then $y_t$ inherits the skewness from the conditional mean but its skewness is also affected by the covariance between the conditional variance and the conditional mean. Since both these conditional moments may be non-linear functions of $\theta$ we typically do not know the sign of the third term on the right hand side in equation (11.62).
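The fourth central moment of $y_t$ can likewise be determined from the law of iterated expectations and the mean expansion. This gives us

$$
E\left[(y_t - \mu_y)^4\right] = E\left[E\left[(y_t - \mu_{y|\theta})^4 \mid \theta\right]\right] + E\left[(\mu_{y|\theta} - \mu_y)^4\right] + 4C\left[E\left[(y_t - \mu_{y|\theta})^3 \mid \theta\right], \mu_{y|\theta}\right] + 6E\left[\sigma_{y|\theta}^2\left(\mu_{y|\theta} - \mu_y\right)^2\right]. \tag{11.63}
$$

69 Skewness is usually defined as the third moment of the standardized random variable $(y_t - \mu_y)/\sigma_y$. This means that skewness is equal to the ratio between the third central moment of the random variable $y_t$ and the standard deviation of $y_t$ to the power of three.
70 Since $E[\sigma_y^2(\mu_{y|\theta} - \mu_y)] = 0$, the third term on the right hand side in equation (11.62) comes from the relationship
$$
E\left[\sigma_{y|\theta}^2\left(\mu_{y|\theta} - \mu_y\right)\right] = E\left[\left(\sigma_{y|\theta}^2 - \sigma_y^2\right)\left(\mu_{y|\theta} - \mu_y\right)\right].
$$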
Hence, kurtosis71 is equal to the expectation of conditional kurtosis plus kurtosis of the conditional mean plus 4 times the covariance between conditional skewness and the conditional mean plus 6 times the expected value of the product between the conditional variance and the square of the conditional mean in deviation from the mean. If $\mu_{y|\theta} = \mu_y$ for all $\theta \in \Theta$, then the last three terms on the right hand side of (11.63) are zero, and kurtosis of $y_t$ is given by the mean conditional kurtosis. Furthermore, the third term is zero when $y_t|\theta$ is symmetric so that conditional skewness is zero.

Since $y_t|\theta$ is normal for the state-space model with normal measurement errors and structural shocks, it follows from the properties of the normal distribution72 that conditional kurtosis of $y_t$ is given by

$$
E\left[(y_t - \mu_{y|\theta})^4 \mid \theta\right] = 3\sigma_{y|\theta}^4.
$$

Since the normal distribution is symmetric we also know that the third term in (11.63) is zero. Hence, kurtosis of $y_t$ is determined by the mean of conditional kurtosis of $y_t|\theta$, kurtosis of the conditional mean, and the mean of the product between the conditional variance and the square of the conditional mean in deviation from the mean.
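The third and fourth central moments in (11.62)-(11.63) can be checked for a normal model in which $p(\theta)$ is replaced by a discrete distribution over three hypothetical parameter points. Under normality the conditional third moment is zero and the conditional fourth moment is $3\sigma_{y|\theta}^4$, so only the remaining terms are evaluated; the second function recomputes the same moments from the raw moments of the normal components as an independent check.

```python
def mixture_moments(probs, mus, sig2s):
    """mu_y and the third and fourth central moments from (11.62)-(11.63),
    assuming y | theta ~ N(mu_{y|theta}, sigma^2_{y|theta})."""
    mu_y = sum(p * m for p, m in zip(probs, mus))

    def E(f):                                           # expectation over the discrete p(theta)
        return sum(p * f(m, s2) for p, m, s2 in zip(probs, mus, sig2s))

    third = (E(lambda m, s2: (m - mu_y) ** 3)           # third moment of the conditional mean
             + 3.0 * E(lambda m, s2: s2 * (m - mu_y)))  # 3 C[sigma^2, mu]; conditional third moment is 0
    fourth = (E(lambda m, s2: 3.0 * s2 ** 2)            # conditional fourth moment 3 sigma^4
              + E(lambda m, s2: (m - mu_y) ** 4)
              + 6.0 * E(lambda m, s2: s2 * (m - mu_y) ** 2))  # the 4 C[...] term is 0 under normality
    return mu_y, third, fourth

def moments_from_raw(probs, mus, sig2s):
    """Same quantities from raw moments of the normal components (independent check)."""
    raw = [0.0] * 5
    for p, m, s2 in zip(probs, mus, sig2s):
        comp = [1.0, m, m**2 + s2, m**3 + 3*m*s2, m**4 + 6*m**2*s2 + 3*s2**2]
        raw = [acc + p * c for acc, c in zip(raw, comp)]
    mu = raw[1]
    third = raw[3] - 3*mu*raw[2] + 2*mu**3
    fourth = raw[4] - 4*mu*raw[3] + 6*mu**2*raw[2] - 3*mu**4
    return mu, third, fourth

probs, mus, sig2s = [0.2, 0.5, 0.3], [-1.0, 0.5, 2.0], [0.5, 1.0, 2.0]
formula = mixture_moments(probs, mus, sig2s)
check = moments_from_raw(probs, mus, sig2s)
```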
11.16. YADA Code
The impulse responses are handled by the function DSGEImpulseResponseFcn, historical forecast error decompositions by DSGEHistDecompFcn, while the variance decompositions are calculated by the function DSGEVarianceDecompFcn for the original data and by DSGEVarianceDecompLevelsFcn for the levels. Since the variance decompositions require that a Riccati equation can be solved, the code includes the function RiccatiSolver. The conditional correlations are computed by the function DSGEConditionalCorrsTheta, which can deal with both population-based and sample-based correlations, while correlation decompositions are carried out by DSGECorrelationDecompTheta.

The conditional variance decompositions are handled by DSGECondVarianceDecompFcn, while output on estimates of unobserved variables and observed variable decompositions is provided by the function CalculateDSGEStateVariables. The levels of the conditional variance decompositions are handled by DSGECondLevVarianceDecompFcn, while the levels of the impulse response functions are taken care of by DSGELevImpulseResponseFcn. Parameter scenarios for fixed parameter values are handled by the function DSGEParameterScenariosTheta. The function DSGEtoVARModel checks the poor man's invertibility condition of Fernández-Villaverde et al. (2007), i.e., whether all the eigenvalues of the matrix on lagged states in equation (11.53) lie inside the unit circle.
Annualizations of the conditional variance decompositions are computed by the function DSGECondAnnVarianceDecompFcn, while impulse responses for annualized data are calculated directly from the output of DSGEImpulseResponseFcn. With s being the data frequency, this typically involves summing s consecutive impulse responses, provided that the variable is annualized by summing s consecutive observations. For response horizons prior to period $s-1$, all responses from period 0 until the response horizon are summed. Fisher's information matrix from Section 11.11 is estimated by the function DSGEInformationMatrix, which can be evaluated at any selected value of $\theta$. The observation weight computations for the forecast, update and smooth projections of the state variables and economic shocks are handled by the function DSGEObservationWeightsTheta. The singular value decomposition and the Q-R factorization with column pivoting used to determine identification patterns from a matrix are taken care of by DSGEIdentificationPatterns. The Monte Carlo filtering checks are performed via the function MonteCarloFiltering.
71
Kurtosis is usually defined as the fourth moment of the standardized random variable $(y_t - \mu_y)/\sigma_y$. This means that kurtosis is actually equal to the fourth central moment divided by the square of the variance.
72
From, e.g., (Zellner, 1971, Appendix A) we have that the even central moments of $z \sim N(\mu, \sigma^2)$ can be expressed as
$$E\left[(z-\mu)^{2r}\right] = \frac{2^r \sigma^{2r}}{\sqrt{\pi}}\, \Gamma\!\left(r + \tfrac{1}{2}\right), \quad r = 1, 2, \ldots.$$
Since $r + 1/2 > 1$ it is straightforward to show that the gamma function (see Section 4.2.2) can be expressed as:
$$\Gamma\!\left(r + \tfrac{1}{2}\right) = \left[\prod_{j=0}^{r-1} \left(r - j - \tfrac{1}{2}\right)\right] \Gamma\!\left(\tfrac{1}{2}\right),$$
where $\Gamma(1/2) = \sqrt{\pi}$. Hence, the even central moments of z can also be written as
$$E\left[(z-\mu)^{2r}\right] = \sigma^{2r} \prod_{j=0}^{r-1} 2\left(r - j - \tfrac{1}{2}\right) = \sigma^{2r} \prod_{j=0}^{r-1} \left(2(r-j) - 1\right), \quad r = 1, 2, \ldots.$$
11.16.1. DSGEImpulseResponseFcn
The function DSGEImpulseResponseFcn is used to calculate the responses in the state variables
and the observed variables from the economic shocks. It provides as output the structure
IRStructure using the inputs H, F, B0, and h. The $r \times n$ matrix H is given by the measurement equations, while F and B0 are obtained from the DSGE model solution as determined by the function AiMtoStateSpace. The last input h is a positive integer denoting the maximum horizon for the impulse responses.
The output structure IRStructure has two fields, Ksi and Y, which contain the responses of the state variables and of the observed variables, respectively. These are provided as 3D matrices. The first dimension is the number of states ($r$) for the field Ksi and the number of observed variables ($n$) for Y, the second dimension is the number of shocks ($q$), and the third is the number of responses plus one ($h+1$). For instance, IRStructure.Y(:,:,i+1) holds the results for response horizon i.
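The responses collected in Ksi and Y can be built by repeated multiplication with F. The sketch below assumes, as is standard when the shocks enter the state equation through $B_0$, that the horizon-i responses are $F^i B_0$ for the states and $H' F^i B_0$ for the observed variables. The function name is hypothetical, and Python's 0-based third index i corresponds to Matlab's (:,:,i+1).

```python
import numpy as np


def impulse_responses(H, F, B0, h):
    """Return (Ksi, Y) with Ksi[:, :, i] = F^i B0 and Y[:, :, i] = H' F^i B0."""
    r, q = B0.shape
    n = H.shape[1]                       # H is r x n
    Ksi = np.zeros((r, q, h + 1))
    Y = np.zeros((n, q, h + 1))
    resp = np.array(B0, dtype=float)     # horizon 0: F^0 B0 = B0
    for i in range(h + 1):
        Ksi[:, :, i] = resp
        Y[:, :, i] = H.T @ resp
        resp = F @ resp                  # advance one horizon
    return Ksi, Y
```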
11.16.2. DSGELevImpulseResponseFcn
The function DSGELevImpulseResponseFcn is used to calculate the accumulated responses in the
state variables and the levels responses of the observed variables from the economic shocks. It
provides as output the structure IRStructure using the inputs H, F, B0, AccMat, and h. The $r \times n$ matrix H is given by the measurement equations, while F and B0 are obtained from the DSGE model solution as determined by the function AiMtoStateSpace. The matrix AccMat is an $n \times n$ diagonal 0-1 matrix. It is used to accumulate the responses in the observed variables provided that they are viewed as being in first differences. The last input h is a positive integer denoting the maximum horizon for the impulse responses.
The output structure IRStructure has the same dimensions as for the original data function DSGEImpulseResponseFcn. The fields contain the responses of the state variables (Ksi) and of the observed variables (Y), respectively. While the responses in the state variables are pure accumulations of the response function in (11.16), the levels responses for the observed variables are only accumulated for those variables which are viewed as being in first differences. Specifically, with C being the 0-1 diagonal matrix AccMat, the levels responses for the observed variables are given by
$$\mathrm{resp}^L\!\left(y_{t+h} \,|\, \eta_t = e_j\right) = C\, \mathrm{resp}^L\!\left(y_{t+h-1} \,|\, \eta_t = e_j\right) + H' F^h B_0 e_j, \quad h \geq 1,$$
where $\mathrm{resp}^L(y_t \,|\, \eta_t = e_j) = H' B_0 e_j$. Observed variables are viewed by YADA as being in first differences based on the user defined input in the Data Construction File; cf. Section 17.5.
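The levels recursion can be illustrated in the same hypothetical style. Here the diagonal of C (AccMat) is passed as a 0-1 vector `acc`, the state responses are fully accumulated, and the observed-variable responses follow $\mathrm{resp}^L_h = C\,\mathrm{resp}^L_{h-1} + H' F^h B_0$ with $\mathrm{resp}^L_0 = H' B_0$. The function name is an assumption, not part of YADA.

```python
import numpy as np


def levels_impulse_responses(H, F, B0, acc, h):
    """Accumulated state responses and levels responses of the observed variables.

    acc : length-n 0-1 vector, the diagonal of C (AccMat); 1 = first-differenced variable
    """
    C = np.diag(np.asarray(acc, dtype=float))
    r, q = B0.shape
    n = H.shape[1]
    Ksi = np.zeros((r, q, h + 1))
    Y = np.zeros((n, q, h + 1))
    resp = np.array(B0, dtype=float)           # F^0 B0
    Ksi[:, :, 0] = resp
    Y[:, :, 0] = H.T @ resp                    # levels response at horizon 0 is H' B0
    for i in range(1, h + 1):
        resp = F @ resp                        # F^i B0
        Ksi[:, :, i] = Ksi[:, :, i - 1] + resp
        Y[:, :, i] = C @ Y[:, :, i - 1] + H.T @ resp
    return Ksi, Y
```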
11.16.3. CalculateDSGEStateVariables
The function CalculateDSGEStateVariables provides output on estimates of various unobserved variables. To achieve this it needs 5 inputs: theta, thetaPositions, ModelParameters, DSGEModel, and ObsVarDec. The first three inputs are discussed in some detail in connection with the prior file handling function VerifyPriorData, while the structure DSGEModel is discussed in connection with the posterior mode estimation function PosteriorModeEstimation; cf. Section 7.4. The input variable ObsVarDec is a boolean variable that determines if the function should compute the observed variable decompositions or not. This input is optional and defaults to 0 if not supplied. The last two input variables are related to recursive smooth estimation of state variables, state shocks, and measurement errors. The boolean variable RecEst is unity if recursive estimation should be performed and zero otherwise. The string vector HeaderStr is shown in the window title of the wait dialog that is displayed during recursive estimation. Both these input variables are optional and are equal to 0 and an empty string, respectively, by default.
The only required output from the function is the structure StateVarStructure. In addition, the function can provide output on status, the mcode output from the DSGE model solving function that has been used, and kalmanstatus, the status output from the KalmanFilter function. The latter two outputs are only taken into account by YADA when initial parameter estimates are given to CalculateDSGEStateVariables.
The structure StateVarStructure has at most 27 fields. First of all, the fields Y and X hold the data matrices for the observed variables and the exogenous variables for the actual sample used by the Kalman filter. The field TrainSample holds a boolean variable which reports if a training sample was used or not when computing the log-likelihood through the Kalman filter. Furthermore, the output on the state variable estimates is given through the fields Ksitt1 (forecast), Ksitt (update), and KsitT (smooth), while the forecasted observed variables are held in the field Yhat. Next, the field lnLt stores the vector with sequential log-likelihood values, i.e., the left hand side of equation (5.19).
The smooth estimates of the economic shocks are located in the field etatT. The matrix stored in this field has the same number of rows as there are shocks with a non-zero impact on at least one variable. This latter issue is determined by removing zero columns from the estimated $B_0$ matrix. The columns that are non-zero are stored as a vector in the field KeepVar. Update estimates of the economic shocks are located in etatt. The smooth innovation population covariance matrices $N_{t|T}$ in equation (5.25) are located in the field NtMat, an $r \times r \times T$ matrix. These covariance matrices are used to compute the average population covariance matrix for the smoothed economic shocks, i.e.,
$$\frac{1}{T} \sum_{t=1}^{T} B_0' N_{t|T} B_0.$$
If the model contains measurement errors, then the smoothed estimates of the non-zero measurement errors are located in the field wtT, while names of equations for these non-zero measurement errors are stored as a string matrix in the field wtNames. At the same time, all estimated measurement errors are kept in a matrix in the field wthT. Similarly, update estimates of the measurement errors are found in wtt and wtht.
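The average population covariance matrix is a single reduction over the third dimension of NtMat. A minimal NumPy sketch, assuming NtMat is an r × r × T array and B0 holds the retained (non-zero) shock columns; the function name is hypothetical:

```python
import numpy as np


def average_shock_covariance(NtMat, B0):
    """Return (1/T) * sum_t B0' N_{t|T} B0 for an r x r x T array NtMat."""
    T = NtMat.shape[2]
    # einsum contracts the two state indices and sums over t in one step
    total = np.einsum("ai,abt,bj->ij", B0, NtMat, B0)
    return total / T
```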
Given that the boolean input ObsVarDec is unity, the historical observed variable decompositions are calculated. The field XiInit contains a matrix whose column t is given by $H' F^t \xi_{0|T}$. Similarly, the field etaDecomp holds a 3D matrix with the contributions to the observed variables of the non-zero economic shocks. For instance, the contribution of shock i for observed variable j can be obtained for the full sample as etaDecomp(j,:,i), a $1 \times T$ vector. The historical decompositions for the state variables are similarly handled by the fields XietaInit and XietaDecomp.
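The decomposition behind XiInit and etaDecomp follows from iterating the state equation $\xi_t = F\xi_{t-1} + B_0\eta_t$ back to the initial state, so that $H'\xi_t = H'F^t\xi_{0|T} + \sum_{s=1}^{t} H'F^{t-s}B_0\eta_{s|T}$ (the terms $A'x_t$ and measurement errors are left out). The NumPy sketch below is an illustration under that assumption with a hypothetical function name; it accumulates each shock's state contribution recursively instead of forming matrix powers.

```python
import numpy as np


def historical_decomposition(H, F, B0, xi0, eta):
    """Split H' xi_t into an initial-state term and one term per shock.

    xi0 : length-r smoothed initial state xi_{0|T}
    eta : q x T smoothed shocks eta_{t|T}, columns t = 1, ..., T
    Returns XiInit (n x T) and etaDecomp (n x T x q).
    """
    r, q = B0.shape
    n, T = H.shape[1], eta.shape[1]
    XiInit = np.zeros((n, T))
    etaDecomp = np.zeros((n, T, q))
    init = np.array(xi0, dtype=float)
    contrib = np.zeros((r, q))               # column i: state contribution of shock i
    for t in range(T):
        init = F @ init                      # F^t xi_{0|T}
        # B0 * eta[:, t] scales column i of B0 by the i-th shock in period t
        contrib = F @ contrib + B0 * eta[:, t]
        XiInit[:, t] = H.T @ init
        etaDecomp[:, t, :] = H.T @ contrib
    return XiInit, etaDecomp
```

Summing etaDecomp over its third dimension and adding XiInit reproduces $H'\xi_t$ for every period.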
Recursive estimates of the state variables, the state shocks, and the measurement errors are computed when the boolean variable RecEst is unity. The field recursive is then set up and defined as a vector (or structure array) such that recursive(t).KsitT holds the t:th recursive values of the smoothed state variables. Similarly, state shocks and measurement errors are located in recursive(t).etatT and recursive(t).wtT, respectively, for the t:th recursive smooth estimate.
The structure StateVarStructure also has 5 fields with parameter matrices A, H, and R from
the measurement equation, and F and B0 from the state equation. The last field is given by
MaxEigenvalue, which, as the name suggests, holds the largest eigenvalue (modulus) of the
state transition matrix F.
11.16.4. DSGESimulationSmootherTheta
The function DSGESimulationSmootherTheta calculates the distribution of unobserved variables conditional on the data and the parameters of the DSGE model. The input variables are
This means that the part of the h-step ahead forecast error that is due to economic shock j is obtained from etathT(:,:,j), an $n \times T \times h$ matrix.
is 1 (0) if sample-based (population-based) conditional correlations should be computed. Finally, the integers FirstPeriod and LastPeriod mark the sample start and end points when the sample-based approach should be used.
The function provides one required and one optional output. The required output variable is CondCorr, a structure with fields Mean, Quantiles, and ShockNames. When the field Quantiles is not empty, it has length equal to the number of quantiles, and each sub-entry has fields percent and Mean. The former stores the percentile value of the distribution, while the latter stores the conditional correlations at that percentile. The optional output variable is status, which indicates if the solution to the DSGE model is unique or not. The value is equal to the variable mcode given by either the function AiMSolver, the function KleinSolver, or the function SimsSolver.
11.16.7. DSGECorrelationDecompTheta
The function DSGECorrelationDecompTheta computes the correlation decompositions for the
observed variables and the state variables. It takes 6 input variables: theta, thetaPositions,
ModelParameters, VarStr, DSGEModel, and CurrINI. All these input variables have been discussed above except for VarStr. It is a string that supports the values "Observed Variables" and "State Variables".
The function yields one required and one optional output variable. The required variable is CorrDec, a structure with 7 fields. The first field, Y, is an $n(n+1)/2 \times (2h+1) \times (q+1)$ matrix with the decompositions of the observed variable correlations over the horizons $-h$ until $h$ into the q shocks and the measurement error. Similarly, the second field, Xi, is an $r(r+1)/2 \times (2h+1) \times q$ matrix with the correlation decompositions for the state variables. The third field, AutoCovHorizon, is an integer with the value of h, the autocorrelation horizon. The ShockNames and the ShockGroupNames fields are string matrices where the rows hold the names of the shocks and the shock groups, respectively, where the number of rows is q for the shocks and g for the shock groups, with $q \geq g$. The ShockGroups field is a vector of dimension q with integers that map each shock to a certain shock group, while the ShockGroupColors field is a $g \times 3$ matrix, where each row gives the color as an RGB triple for a shock group. The RGB triple holds values between 0 and 1, representing the combination of red, green and blue, and this scale can be translated into the more common 8-bit scale that is used to represent colors with integer values between 0 and 255.
The optional output variable is mcode, determined by either the function AiMSolver, the
function KleinSolver, or the function SimsSolver. It is only used when theta is given by the
initial parameter values.
11.16.8. DSGEParameterScenariosTheta
The function DSGEParameterScenariosTheta calculates the parameter scenario for two values
of the parameter vector, the baseline value and the alternative value. It takes 10 input variables: DSGEModel, theta, thetaScenario, thetaPositions, ModelParameters, FirstPeriod, LastPeriod, BreakPeriod, CopyFilesToTmpDir, and finally CurrINI. The structures DSGEModel, ModelParameters, thetaPositions and CurrINI have all been discussed above. The vector theta holds the baseline values of the parameters, while thetaScenario holds the alternative (scenario) values of the parameters. The integers FirstPeriod and LastPeriod simply indicate the first and the last observation in the estimation sample (not taking a possible training sample for the state variables into account). The integer BreakPeriod indicates the position in the sample (taking the training sample into account) where the parameters change, while the boolean CopyFilesToTmpDir indicates if certain files should be copied to the tmp directory of YADA or not.
The function provides 8 required output variables. The first is Status, a boolean that indicates if all calculations were completed successfully or not. Next, the function gives the actual path for the observed variables in the matrix Y, as well as the matrix YScenario, holding the alternative paths. As mentioned in Section 11.9, these paths are based on feeding the smooth estimates of the economic shocks (and measurement errors) based on the baseline parameters into the state-space model for the alternative parameters. Next, the function gives two matrices with smooth estimates of the economic shocks: OriginalShocks and ScenarioShocks. The former holds the values of the economic shocks under the baseline parameter values, while the latter gives the values of the economic shocks under the alternative parameter values. Similarly, two matrices with state variable estimates are provided: OriginalStates and ScenarioStates, where the former holds the smooth estimates of the state variables for the baseline parameter values, while the latter holds the implied state variables for the alternative parameter values. That is, ScenarioStates is obtained by applying the state equation to the smoothly estimated economic shocks under the baseline parameter values together with the F and B0 matrices for the alternative parameter values. Finally, the function provides a vector with positive integers, KeepShocks, signalling which of the economic shocks have a non-zero influence on the variables of the DSGE model.
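The construction of ScenarioStates can be sketched as a forward recursion with the alternative matrices, driven by the baseline smoothed shocks. In this hypothetical NumPy sketch, states before the break column are kept at their baseline values and the recursion starts from the last baseline state; that timing convention is an assumption made for illustration, since the details in Section 11.9 are not reproduced here.

```python
import numpy as np


def scenario_states(F_alt, B0_alt, xi_base, eta_base, break_period):
    """Implied states from baseline smoothed shocks and alternative F, B0.

    xi_base      : r x T smoothed states under the baseline parameters
    eta_base     : q x T smoothed shocks under the baseline parameters
    break_period : 0-based column from which the alternative parameters apply
    """
    xi = np.array(xi_base, dtype=float)       # columns before the break stay at baseline
    for t in range(max(break_period, 1), xi.shape[1]):
        xi[:, t] = F_alt @ xi[:, t - 1] + B0_alt @ eta_base[:, t]
    return xi
```

If the alternative matrices equal the baseline ones, the function returns the baseline states unchanged, which is a useful sanity check.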
11.16.9. DSGEtoVARModel
The function DSGEtoVARModel is used to check if the state-space representation of the DSGE
model satisfies the poor man's invertibility condition of Fernández-Villaverde et al. (2007). The function takes 4 inputs: H, R, F, and B0. These are, as before, the matrices H and R from the measurement equation, and the matrices F and B0 of the state equation; see, e.g., the details on DSGEImpulseResponseFcn.
As output the function provides status and EigenValues. The integer status is unity if the state-space model can be rewritten as a VAR model, and 0 if some eigenvalues are on or outside the unit circle. In the event that the number of economic shocks and unique measurement errors exceeds the number of observed variables, status is equal to $-1$. The vector EigenValues provides the modulus of the eigenvalues from the invertibility condition when status is non-negative.
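Equation (11.53) is not part of this excerpt, so the sketch below uses the textbook form of the poor man's invertibility condition under two simplifying assumptions: no measurement errors ($R = 0$) and as many economic shocks as observed variables. Solving the measurement equation for $\eta_t$ and substituting into the state equation gives $\xi_t = (I - B_0(H'B_0)^{-1}H')F\xi_{t-1} + B_0(H'B_0)^{-1}(y_t - A'x_t)$, so the eigenvalues of the matrix on $\xi_{t-1}$ are checked. The function name and the handling of non-square cases are hypothetical.

```python
import numpy as np


def poor_mans_invertibility(H, F, B0, tol=1e-12):
    """Check the eigenvalues of (I - B0 (H'B0)^{-1} H') F.

    Assumes no measurement errors (R = 0). Returns (status, moduli) with
    status 1 if all moduli are below one, 0 otherwise, and -1 if there are
    more shocks than observed variables.
    """
    D = H.T @ B0                                   # n x q impact matrix
    n, q = D.shape
    if q > n:
        return -1, np.array([])
    if q < n:
        raise ValueError("this sketch only covers the case q = n")
    G = (np.eye(F.shape[0]) - B0 @ np.linalg.solve(D, H.T)) @ F
    moduli = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]   # largest first
    return int(moduli[0] < 1 - tol), moduli
```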
11.16.10. DSGEInformationMatrix
The function DSGEInformationMatrix is used to estimate Fisher's information matrix stated in equation (11.54). To achieve this, 6 input variables are required: theta, thetaPositions, ModelParameters, ParameterNames, DSGEModel, and CurrINI. The first 3 variables are identical to the input variables with the same names in the CalculateDSGEStateVariables function. The 4th input is a string matrix with the names of the estimated parameters. The last two input variables are structures that have been mentioned above; see also Section 7.4.
The function provides the output variable InformationMatrix, which is an estimate of the right hand side in (11.54) with the partial derivatives $\partial \tilde{y}_t / \partial \theta'$ and $\partial\, \mathrm{vec}(\Sigma_{y,t|t-1}) / \partial \theta'$ replaced by numerical partials. By default, each parameter change is equal to 0.1 percent of its given value. If the model cannot be solved at the new value of $\theta$, YADA tries a parameter change of 0.01 percent. Should YADA also be unsuccessful at the second new value of $\theta$, estimation of the information matrix is aborted.
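The step-size rule can be illustrated with the hypothetical Python helper below. It is a sketch under stated assumptions: the quantity being differentiated is represented by a generic function f (for instance stacking the forecasts and the vectorized covariances), f signals an unsolvable model by returning None, and two-sided differences are used here even though the text above does not say whether YADA uses one- or two-sided differences.

```python
import numpy as np


def numerical_jacobian(f, theta, rel_steps=(1e-3, 1e-4)):
    """Two-sided numerical Jacobian of f at theta using relative step sizes.

    f returns a 1-D array, or None when the model cannot be solved at its argument.
    Each parameter is perturbed by 0.1 percent of its value; if that fails, by
    0.01 percent; if both fail, None is returned (the computation is aborted).
    """
    theta = np.asarray(theta, dtype=float)
    base = np.asarray(f(theta), dtype=float)          # assumed solvable at theta
    J = np.zeros((base.size, theta.size))
    for k in range(theta.size):
        if theta[k] == 0.0:
            raise ValueError("relative steps are undefined for a zero parameter")
        for rel in rel_steps:
            step = rel * abs(theta[k])
            up, down = theta.copy(), theta.copy()
            up[k] += step
            down[k] -= step
            f_up, f_down = f(up), f(down)
            if f_up is not None and f_down is not None:
                J[:, k] = (np.asarray(f_up) - np.asarray(f_down)) / (2.0 * step)
                break
        else:                                         # no step size worked
            return None
    return J
```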
11.16.11. DSGEIdentificationPatterns
The function DSGEIdentificationPatterns computes the eigenvectors and eigenvalues of a
square (symmetric) matrix and attempts to order the columns of the input matrix through Q-R
factorization with column pivoting; see Section 11.11. The function takes one input variable,
the matrix InformationMatrix, which is typically positive semidefinite.
Three output variables are determined from the input: EigenVectorMatrix, EigenValues, and ParameterOrdering. The first two variables are taken from the singular value decomposition of InformationMatrix, where the first variable is given by V in $USV' = A$ (where A is given by InformationMatrix) and the second is equal to the diagonal of S. The third variable is a matrix with two columns. The first column contains the column ordering based on the Q-R factorization with column pivoting of $V'$, while the second column contains the ordering based on the same algorithm applied directly to InformationMatrix. When printing output on parameter ordering, YADA currently only writes the ordering based on InformationMatrix.
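The two steps can be illustrated in NumPy. The SVD part mirrors the text ($A = USV'$, returning V and the diagonal of S). The pivoting step is written out explicitly in its Gram-Schmidt form, which in exact arithmetic selects columns in the same order as Householder QR with column pivoting: at each step, the remaining column with the largest norm after projecting out the already chosen columns is picked. Function names are hypothetical, and only the ordering of the matrix itself (the second column of ParameterOrdering) is shown.

```python
import numpy as np


def pivot_order(X):
    """Column order chosen by QR with column pivoting (Gram-Schmidt form)."""
    X = np.array(X, dtype=float)
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        norms = [np.linalg.norm(X[:, j]) for j in remaining]
        best = int(np.argmax(norms))
        j = remaining.pop(best)
        order.append(j)
        if norms[best] > 0.0:
            qvec = X[:, j] / norms[best]
            for k in remaining:              # project out the chosen direction
                X[:, k] -= (qvec @ X[:, k]) * qvec
    return order


def identification_patterns(A):
    """Return V and the singular values from A = U S V', plus a pivoted-QR ordering of A."""
    U, S, Vt = np.linalg.svd(A)
    return Vt.T, S, pivot_order(A)
```

In the second usage case below, two identical columns compete: once one of them is chosen, the other has a zero residual norm and is ordered last, which is how near-collinear parameters are pushed to the end of the ordering.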
11.16.12. MonteCarloFiltering
The function MonteCarloFiltering is used to check the DSGE model solution property for a given set of draws from the prior distribution. The 6 input variables are given by: thetaDraws, thetaPositions, ModelParameters, AIMData, DSGEModel, and CurrINI. The last five input variables have been discussed above, while the first input variable is a matrix of dimension $p \times n_d$, where p is the dimension of $\theta$, the vector of parameters to estimate, and $n_d$ is the number of draws from the prior distribution of $\theta$.
As output the function provides the $n_d$-dimensional vector SolutionCode, where a unit value indicates a unique and convergent solution of the model, a value of 2 that there is indeterminacy, and a value of 0 that there is no convergent solution.
11.16.13. DSGEObservationWeightsTheta
The function DSGEObservationWeightsTheta is used to estimate the observation weights and
the associated decompositions that were discussed in Section 5.9 for the state variables and
in Section 11.1 for the economic shocks. The computations require 6 input variables: theta,
thetaPositions, ModelParameters, VarType, DSGEModel, and CurrINI. The first 3 variables
are identical to the input variables with the same names in the CalculateDSGEStateVariables
function. The 4th variable is an integer which indicates the type of output to prepare. The
supported values are: (1) decompositions for state variables; (2) decompositions for economic
shocks; (3) weights for state variables; and (4) weights for economic shocks. The last two input
variables are structures that have been mentioned above; see also Section 7.4.
The function provides the output variable StateDecomp, a structure whose fields depend on
the value for VarType. In addition, the function can provide output on status, the mcode output
from the DSGE model solving function, and kalmanstatus, the status output from the Kalman
filter.
11.16.14. DSGECondVarianceDecompFcn
The function DSGECondVarianceDecompFcn computes the conditional forecast error variance
decomposition in (11.33). The function needs 4 inputs: H, F, B0, and h. These are exactly the
same as those needed by DSGEImpulseResponseFcn.
As output the function provides the 3D matrix FEVDs. This matrix has dimension $n \times q \times h$, with n being the number of observed variables, q the number of economic shocks, and h the forecast horizon.
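Equation (11.33) is stated earlier in the document and not reproduced here, so the sketch below assumes the standard form: with $R_i = H' F^i B_0$ and mutually uncorrelated unit-variance shocks (consistent with $Q = B_0 B_0'$), the share of shock j in the k-step forecast error variance of variable l is $\sum_{i<k} (R_i)_{lj}^2$ divided by $\sum_{i<k} \sum_m (R_i)_{lm}^2$. Measurement errors are ignored and the function names are hypothetical. Because the $R_i$ matrices are passed in as a list, the levels and annualized versions described in the following subsections could be fed to the same function.

```python
import numpy as np


def response_matrices(H, F, B0, h):
    """R_i = H' F^i B0 for i = 0, ..., h-1."""
    R, M = [], np.array(B0, dtype=float)
    for _ in range(h):
        R.append(H.T @ M)
        M = F @ M
    return R


def conditional_fevd(R_list):
    """fevd[:, j, k-1]: share of shock j in the k-step forecast error variance.

    Assumes unit-variance, mutually uncorrelated shocks and ignores measurement errors.
    """
    n, q = R_list[0].shape
    fevd = np.zeros((n, q, len(R_list)))
    acc = np.zeros((n, q))
    for i, R in enumerate(R_list):
        acc += R ** 2                                   # squared responses per shock
        total = acc.sum(axis=1, keepdims=True)          # total variance per variable
        fevd[:, :, i] = np.divide(acc, total, out=np.zeros_like(acc), where=total > 0)
    return fevd
```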
11.16.15. DSGECondLevVarianceDecompFcn
The function DSGECondLevVarianceDecompFcn computes the conditional forecast error variance
decomposition in (11.33), but where $R_i$ is partly an accumulation of $R_{i-1}$. Specifically, let C denote a diagonal 0-1 matrix, where a diagonal element is 1 if the corresponding variable should be accumulated and 0 otherwise. The $R_i$ matrices are here calculated according to
$$R_i = C R_{i-1} + H' F^i B_0, \quad i = 1, 2, \ldots,$$
while $R_0 = H' B_0$. This allows YADA to compute levels effects of observed variables that only appear in first differences in the $y_t$ vector, e.g., GDP growth. At the same time, variables that already appear in levels, e.g., the nominal interest rate, are not accumulated. The function needs 5 inputs: H, F, B0, AccMat, and h. These are identical to the inputs accepted by DSGELevImpulseResponseFcn.
As output the function provides the 3D matrix FEVDs. This matrix has dimension $n \times q \times h$, with n being the number of observed variables, q the number of economic shocks, and h the forecast horizon.
11.16.16. DSGECondAnnVarianceDecompFcn
The function DSGECondAnnVarianceDecompFcn calculates the conditional variance decomposition in (11.33), but where the Ri matrices need to take annualization information into account.
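In particular, let N denote a diagonal 0-1 matrix, with unit diagonal elements for the variables that are annualized, while s is the frequency of the data, e.g., $s = 4$ ($s = 12$) for quarterly (monthly) data. The $R_i$ matrices are now given by:
$$R_i = N \sum_{j=1}^{\min\{i,\, s-1\}} H' F^{i-j} B_0 + H' F^i B_0, \quad i = 0, 1, 2, \ldots.$$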
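The annualized $R_i$ can be computed directly from the formula. The sketch reads the upper summation limit as $\min\{i, s-1\}$, which makes an annualized variable sum the responses at horizons $\max\{0, i-s+1\}, \ldots, i$, in line with the description of annualized impulse responses earlier in this section. The function name and the 0-1 vector `ann` holding the diagonal of N are hypothetical.

```python
import numpy as np


def annualized_response_matrices(H, F, B0, ann, s, h):
    """R_i = N sum_{j=1}^{min(i, s-1)} H' F^(i-j) B0 + H' F^i B0 for i = 0, ..., h-1.

    ann : length-n 0-1 vector, the diagonal of N (1 = annualized variable)
    s   : data frequency (4 = quarterly, 12 = monthly)
    """
    N = np.diag(np.asarray(ann, dtype=float))
    base, M = [], np.array(B0, dtype=float)
    for _ in range(h):
        base.append(H.T @ M)            # base[k] = H' F^k B0
        M = F @ M
    R = []
    for i in range(h):
        lagged = np.zeros_like(base[0])
        for j in range(1, min(i, s - 1) + 1):
            lagged += base[i - j]
        R.append(N @ lagged + base[i])
    return R
```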
the convergence criterion is given by the Matlab norm function applied to the change in $P_1$, unless the call to dare indicates a solution of the Riccati equation. The details of the algorithm are given in Sections 11.4 and 11.5.
The Riccati solver function gives three required output variables, P1, status, and NumIter, as well as one optional variable, TestValue. The first required variable is the solution candidate for $P_1$. The solution is considered valid by YADA if the iterations have converged within the maximum number of iterations. If so, the variable status is assigned the value 0; otherwise it is 1, unless infinite values or NaNs were located, in which case status is 2. The third output, NumIter, is simply the number of iterations used, while the fourth (optional) variable gives the value of the convergence criterion for the last iteration.
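Sections 11.4 and 11.5, where the Riccati equation is derived, are not part of this excerpt, so the sketch below assumes the standard steady-state Kalman filter form $P = FPF' - FPH(H'PH + R)^{-1}H'PF' + Q$. It does not attempt a call to Matlab's dare; it only mirrors the fixed-point iteration and the status codes 0, 1 and 2 described above. Note that Matlab's norm of a matrix defaults to the 2-norm, whereas NumPy's default matrix norm used here is the Frobenius norm. The function name and the starting value $P = Q$ are assumptions.

```python
import numpy as np


def riccati_solver(F, H, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate P <- F P F' - F P H (H' P H + R)^{-1} H' P F' + Q to a fixed point.

    Returns (P, status, num_iter, test_value): status 0 = converged,
    1 = no convergence within max_iter, 2 = infinite values or NaNs found.
    """
    P = np.array(Q, dtype=float)
    test_value = np.inf
    for it in range(1, max_iter + 1):
        S = H.T @ P @ H + R
        K = F @ P @ H @ np.linalg.inv(S)              # Kalman-type gain
        P_new = F @ P @ F.T - K @ H.T @ P @ F.T + Q
        if not np.all(np.isfinite(P_new)):
            return P_new, 2, it, test_value
        test_value = np.linalg.norm(P_new - P)        # Frobenius norm of the change
        P = P_new
        if test_value < tol:
            return P, 0, it, test_value
    return P, 1, max_iter, test_value
```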
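$$p\left(y_{T+1}, \ldots, y_{T+h} \,|\, x_{T+1}, \ldots, x_{T+h}, \mathcal{Y}_T\right) = \int_{\Theta} p\left(y_{T+1}, \ldots, y_{T+h} \,|\, x_{T+1}, \ldots, x_{T+h}, \mathcal{Y}_T; \theta\right) p\left(\theta \,|\, \mathcal{Y}_T\right) d\theta, \tag{12.1}$$
where $\Theta$ is the support of $\theta$ and where, for convenience, we have neglected to include an index for the model in the above expressions.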
From a Bayesian perspective, it may be noticed that for a given model there is no uncertainty
about the predictive density and, thus, there is no uncertainty about a point or a density forecast
which is determined from it. This can be seen in equation (12.1), where posterior parameter uncertainty is integrated out and what remains is a deterministic function of the data and the model. In practice, numerical methods typically need to be applied, but the induced simulation uncertainty can be controlled by the econometrician.74
The problem of numerically determining the predictive distribution is examined in the context of a cointegrated VAR by Villani (2001). Although the full predictive distribution cannot be obtained analytically, a procedure suggested by Thompson and Miller (1986) may be applied. Their procedure is based on a double simulation scheme, where S draws of $\theta$ from its full posterior are first obtained. In the second stage, prediction paths are simulated for $y_{T+1}, \ldots, y_{T+h}$ conditional on the data and $\theta$. From the conditional predictive distribution we have that
$$p\left(y_{T+1}, \ldots, y_{T+h} \,|\, x_{T+1}, \ldots, x_{T+h}, \mathcal{Y}_T; \theta\right) = \prod_{i=1}^{h} p\left(y_{T+i} \,|\, x_{T+i}, \mathcal{Y}_{T+i-1}; \theta\right). \tag{12.2}$$
The right hand side of equation (12.2) is obtained by the usual conditioning formula and by noting that $y_{T+i}$ conditional on $x_{T+i}$, $\mathcal{Y}_{T+i-1}$ and the parameters is independent of $x_{T+j}$ for all $j > i$. Moreover, for the state-space model the density of this conditional expression is multivariate normal with mean $y_{T+i|T+i-1}$ and covariance $H' P_{T+i|T+i-1} H + R$.
The approach suggested by Thompson and Miller (1986) may be implemented for the state-space model as follows. For a given draw of $\theta$ from the posterior distribution, the DSGE model is solved and the matrices A, H, R, F, and Q are calculated. A value for $y_{T+1}$ is now generated by drawing from the normal distribution with mean $y_{T+1|T}$ and covariance matrix $H' P_{T+1|T} H + R$, using the expressions in Section 5.
For period $T+2$ we treat the draw $y_{T+1}$ as given. This allows us to compute $y_{T+2|T+1}$ and $H' P_{T+2|T+1} H + R$ and draw a value of $y_{T+2}$ from the multivariate normal using these values as the mean and the covariance. Proceeding in the same way until we have drawn $y_{T+h}$, we have obtained one possible future path for $y_{T+1}, \ldots, y_{T+h}$ conditional on $x_{T+1}, \ldots, x_{T+h}$, $\mathcal{Y}_T$ and the given draw of $\theta$ from the posterior distribution.
We may now continue and draw a total of P paths for $y_{T+1}, \ldots, y_{T+h}$ conditional on this information. Once all these paths have been drawn, we pick a new value of $\theta$ from its posterior and recalculate everything until a total of $P \cdot S$ draws from the density in (12.2) have been drawn.
74
This may be contrasted with a frequentist approach to forecasting where a point or a density forecast is conditioned on the unknown parameters of the model, i.e., it is based on the first density term on the right hand side of (12.1). Once the unknown $\theta$ is replaced with a point estimate, the resulting point or density forecast is subject to the estimation uncertainty inherent to the selected estimator and sample period, which cannot be influenced by the econometrician. At the same time, the true predictive density, based on the true values of the parameters, is deterministic but remains unknown; see Geweke and Whiteman (2006) for discussions on Bayesian forecasting.
Alternatively, and as suggested by Adolfson, Lindé, and Villani (2007d), we can directly utilize the state-space form. Specifically, for a given draw of $\theta$ from its posterior distribution, the period T state vector can be drawn from $N(\xi_{T|T}, P_{T|T})$, where $\xi_{T|T}$ and $P_{T|T}$ are obtained from the final step of the Kalman filter computations; cf. Section 5. Next, a sequence of future states $\xi_{T+1}, \ldots, \xi_{T+h}$ can be simulated from the state equation (5.2) given draws of the state shocks $v_{T+1}, \ldots, v_{T+h}$. The latter are drawn from the normal distribution with mean zero and covariance matrix $Q = B_0 B_0'$. Next, vectors of measurement errors $w_{T+1}, \ldots, w_{T+h}$ are drawn from a normal distribution with mean zero and covariance matrix R. Adding the sequence of state variables and the measurement errors, a path for $y_{T+1}, \ldots, y_{T+h}$ is obtained via the measurement equation (5.1). For the given value of $\theta$, we can now generate P paths of the observed variables, and by repeating this for S draws from the posterior distribution of $\theta$ a total of $P \cdot S$ paths from the predictive density of $y_{T+1}, \ldots, y_{T+h}$ may be obtained.
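The Adolfson, Lindé, and Villani simulation for a single parameter draw can be sketched in NumPy as below. It is an illustration, not YADA's implementation: the function name is hypothetical, the state-space matrices and the final Kalman filter output $\xi_{T|T}$, $P_{T|T}$ are assumed to be available for the given $\theta$, and the state shocks are generated as $B_0\eta$ with $\eta \sim N(0, I_q)$, which gives the covariance $Q = B_0 B_0'$. Repeating the call for S posterior draws and stacking the results gives the $P \cdot S$ paths.

```python
import numpy as np


def simulate_predictive_paths(A, H, R, F, B0, xi_TT, P_TT, X_future, n_paths, rng):
    """Draw n_paths paths of y_{T+1}, ..., y_{T+h} for one parameter draw.

    X_future : k x h matrix with x_{T+1}, ..., x_{T+h} in its columns
    rng      : a numpy.random.Generator
    Returns an array of shape (n_paths, n, h).
    """
    q = B0.shape[1]
    n, h = H.shape[1], X_future.shape[1]
    paths = np.zeros((n_paths, n, h))
    for p in range(n_paths):
        xi = rng.multivariate_normal(xi_TT, P_TT)          # xi_T ~ N(xi_{T|T}, P_{T|T})
        for i in range(h):
            v = B0 @ rng.standard_normal(q)                # v ~ N(0, Q) with Q = B0 B0'
            xi = F @ xi + v                                # state equation
            w = rng.multivariate_normal(np.zeros(n), R)    # measurement errors
            paths[p, :, i] = A.T @ X_future[:, i] + H.T @ xi + w
    return paths
```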
The Adolfson, Lindé, and Villani (2007d) approach is faster than the first approach since the underlying computations are more direct. For this reason, the Adolfson, Lindé, and Villani approach has been implemented in YADA. Moreover, this procedure highlights the fact that the uncertainty in the forecasts stems from four sources: parameter uncertainty ($\theta$), uncertainty about the current state ($\xi_T$), uncertainty about future shocks ($v$), and measurement errors ($w$). Based on the law of iterated expectations, the forecast uncertainty for $y_{T+i}$ can be decomposed (Rao-Blackwellization) as follows:
$$C\left(y_{T+i} \,|\, \mathcal{Y}_T\right) = E_T\left[C\left(y_{T+i} \,|\, \mathcal{Y}_T; \theta\right)\right] + C_T\left[E\left(y_{T+i} \,|\, \mathcal{Y}_T; \theta\right)\right], \tag{12.3}$$
where $E_T$ and $C_T$ denote the expectation and covariance with respect to the posterior of $\theta$ at time T and where, for notational simplicity, the sequence of exogenous variables $x_{T+1}, \ldots, x_{T+h}$ has been suppressed from the expressions. Adolfson, Lindé, and Villani (2007d) show that the first term on the right hand side of (12.3) is given by
$$E_T\left[C\left(y_{T+i} \,|\, \mathcal{Y}_T; \theta\right)\right] = E_T\left[H' F^i P_{T|T} (F')^i H\right] + E_T\left[H' \left(\sum_{j=0}^{i-1} F^j Q (F')^j\right) H\right] + E_T\left[R\right], \tag{12.4}$$
providing the uncertainties regarding the current state, the future shocks, and the measurement errors. Similarly, for the second term in (12.3) we have that
$$C_T\left[E\left(y_{T+i} \,|\, \mathcal{Y}_T; \theta\right)\right] = C_T\left[A' x_{T+i} + H' F^i \xi_{T|T}\right], \tag{12.5}$$
which thus reflects the influence of parameter uncertainty on forecast uncertainty.
To simplify the expression for the shock uncertainty term, consider the difference equation
$$\bar{Q}_i = Q + F \bar{Q}_{i-1} F', \quad i = 1, \ldots, h,$$
with $\bar{Q}_0 = 0$, so that
$$\bar{Q}_i = \sum_{j=0}^{i-1} F^j Q (F')^j.$$
This expression allows for fast computation of the shock uncertainty term since loops over j are not required.
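The recursion makes it cheap to evaluate all four terms of (12.3) to (12.5) horizon by horizon. In the hypothetical sketch below, the posterior expectation $E_T$ and covariance $C_T$ are approximated by equally weighted sample moments over a list of S parameter draws, each stored as a dictionary (a layout chosen only for illustration), and the shock term uses $\bar{Q}_i = Q + F\bar{Q}_{i-1}F'$ with $\bar{Q}_0 = 0$.

```python
import numpy as np


def forecast_uncertainty(draws, X_future, h):
    """Decompose forecast covariances as in (12.3)-(12.5) over posterior draws.

    draws    : list of dicts with keys A, H, R, F, Q, xi (= xi_{T|T}) and P (= P_{T|T})
    X_future : k x h matrix of exogenous variables x_{T+1}, ..., x_{T+h}
    Returns a dict mapping 'state', 'shock', 'measurement', 'parameter' to lists
    with one n x n matrix per horizon i = 1, ..., h.
    """
    S = len(draws)
    out = {key: [] for key in ("state", "shock", "measurement", "parameter")}
    Fi = [np.eye(d["F"].shape[0]) for d in draws]                # F^i per draw
    Qbar = [np.zeros_like(d["Q"], dtype=float) for d in draws]  # Qbar_i per draw
    meas = sum(d["R"] for d in draws) / S
    for i in range(1, h + 1):
        state, shock, means = 0.0, 0.0, []
        for s, d in enumerate(draws):
            Fi[s] = d["F"] @ Fi[s]
            Qbar[s] = d["Q"] + d["F"] @ Qbar[s] @ d["F"].T       # Qbar_i = Q + F Qbar_{i-1} F'
            state = state + d["H"].T @ Fi[s] @ d["P"] @ Fi[s].T @ d["H"]
            shock = shock + d["H"].T @ Qbar[s] @ d["H"]
            means.append(d["A"].T @ X_future[:, i - 1] + d["H"].T @ Fi[s] @ d["xi"])
        dev = np.array(means) - np.mean(means, axis=0)            # S x n deviations
        out["state"].append(state / S)
        out["shock"].append(shock / S)
        out["measurement"].append(meas)
        out["parameter"].append(dev.T @ dev / S)
    return out
```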
It is interesting to note that not all terms in equation (12.3) necessarily increase as the forecast horizon increases. For example, the uncertainty due to the state at time T is positive semidefinite at $i = 1$ when some state variables are unobserved and have an impact on the observed variables; cf. the first term on the right hand side of (12.4). As $i \to \infty$, the state uncertainty term converges to zero when all eigenvalues of F are inside the unit circle. Hence, there exists some finite forecast horizon i beyond which the state uncertainty factor is decreasing. Similarly, the parameter uncertainty term may also have this property; cf. equation (12.5). For example, if all elements of A are known, then parameter uncertainty is entirely due to variation in $F^i \xi_{T|T}$ across different parameter draws from the posterior, and the uncertainty of this term will decrease beyond some forecast horizon i when all eigenvalues of F are inside the unit circle. In
fact, the only term that becomes larger and larger for finite forecast horizons is the shock uncertainty term (second term on the right hand side of equation 12.4) and, hence, we expect this term to dominate the forecast error covariance matrix of the observed variables at and beyond some finite forecast horizon. Moreover, the sum of the state, shock, and measurement error uncertainty terms on the right hand side of equation (12.4) is increasing as the forecast horizon increases, and we therefore expect the overall forecast error uncertainty in (12.3) to increase with the horizon as well, at least beyond some finite horizon when the share due to parameter uncertainty is sufficiently small.75
12.2. Conditional Forecasting with a State-Space Model
Conditional forecasting concerns forecasts of endogenous variables conditional on a certain
path and length of path for some other endogenous variables; see, e.g., Waggoner and Zha
(1999). While use of such conditioning assumptions may at first seem to be of limited interest,
one important forecasting situation that should be kept in mind is when real-time data vintages76
are used by the forecaster. Values for some of the observed variables for period T, the last historical time period, have often not yet been released by the statistical authority and are therefore missing from the relevant data vintage, i.e., the data set is unbalanced. Accordingly, some of the time T values need to be forecasted, and these forecasts need to take into account that values for other variables are available for the same time period.
In this section I will discuss conditional forecasting as it has been implemented in YADA
for a DSGE model. Specifically, it is assumed that the conditioning information satisfies hard
conditions, i.e., a particular path, rather than soft conditions (a range for the path).
Let $K_1$ and $K_{2j}$ be known $n \times q_m$ matrices with $q_m \leq \min\{n, q\}$ such that $\mathrm{rank}(K_1) = q_m$ and $j = 1, \ldots, g-1$. Furthermore, consider the following relation:
$$z_{T+i} = K_1' y_{T+i} + \sum_{j=1}^{i-1} K_{2j}' y_{T+i-j} + u_T, \quad i = 1, \ldots, g. \tag{12.6}$$
The specification in equation (12.6) is general enough to satisfy our purposes. In the special case where $K_{2j} = 0$ and $u_T = 0$, the vector $z_{T+i}$ is determined directly from $y_{T+i}$, e.g., one particular observed variable. Although such a specification covers many interesting cases, it does not allow us to handle the case when y includes the real exchange rate and the first differences of the domestic and foreign prices, but where z is the nominal exchange rate. Let $p_t$ and $p_t^*$ denote the domestic and foreign prices, respectively, while $s_t$ denotes the nominal exchange rate. We may then let $K_1$ be defined such that $K_1' y_{T+i} = (s_{T+i} - p_{T+i} + p_{T+i}^*) + \Delta p_{T+i} - \Delta p_{T+i}^*$, whereas $K_{2j} = K_2$ for all j with $K_2' y_{T+i-j} = \Delta p_{T+i-j} - \Delta p_{T+i-j}^*$, and $u_T = p_T - p_T^*$. Another interesting case which requires $K_{2j}$ to vary with j is when the conditioning assumptions involve annual inflation and the observed variables have quarterly inflation.
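To see that this choice of $K_1$, $K_2$ and $u_T$ indeed delivers the nominal exchange rate, the hypothetical helper below evaluates the right hand side of (12.6) from a real exchange rate and two inflation series; it is a plain Python sketch, and the numbers in the check are arbitrary values chosen only to test the identity.

```python
def nominal_exchange_rate(q, dp, dp_star, p_T, p_star_T):
    """Evaluate z_{T+i} from (12.6) for the exchange-rate example.

    q       : real exchange rates s - p + p* for T+1, ..., T+g
    dp      : domestic price changes for T+1, ..., T+g
    dp_star : foreign price changes for T+1, ..., T+g
    """
    u_T = p_T - p_star_T                               # u_T = p_T - p*_T
    z = []
    for i in range(len(q)):
        k1_term = q[i] + dp[i] - dp_star[i]            # K1' y_{T+i}
        k2_terms = sum(dp[i - j] - dp_star[i - j] for j in range(1, i + 1))
        z.append(k1_term + k2_terms + u_T)
    return z
```

Because the price changes telescope back to $p_T - p_T^*$, which $u_T$ then offsets, every $z_{T+i}$ equals $s_{T+i}$.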
To keep the values in zT i fixed over the given horizon, the method we consider requires that
a subset of the economic shocks are adjusted to take on certain values. The selection of shocks
is defined by the user while the values are calculated by taking equation (12.6) into account.
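The selection of economic shocks is determined by the $q \times q_m$ matrix $M$, where $q > q_m$ and $\mathrm{rank}(M) = q_m$. Let $M_\perp$ be the $q \times (q - q_m)$ orthogonal matrix, i.e., $M_\perp' M = 0$. It now follows that $N = [M_\perp \;\; M]$ is a full rank $q \times q$ matrix. With $\tilde{M} = M(M'M)^{-1}$ and $\tilde{M}_\perp = M_\perp(M_\perp' M_\perp)^{-1}$, we then have

$$\eta_t = \tilde{M}_\perp \eta_t^{(q-q_m)} + \tilde{M} \eta_t^{(q_m)}, \qquad \eta_t^{(q-q_m)} = M_\perp' \eta_t, \quad \eta_t^{(q_m)} = M' \eta_t. \tag{12.7}$$

The shocks $\eta_t^{(q_m)}$ will be adjusted over the time interval $t = T+1, \ldots, T+g$ so that (12.6) holds for every forecast path of the observed variables over this interval.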
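The decomposition in (12.7) can be checked numerically. The following Python sketch builds $M_\perp$ for a 0-1 selection matrix $M$ and confirms that the two shock components recombine to $\eta_t$. The function name `split_shocks` is hypothetical. Obtaining $M_\perp$ from a singular value decomposition is one convenient choice among many and is not necessarily what YADA does.

```python
import numpy as np


def split_shocks(M, eta):
    """Split eta into M' eta and M_perp' eta as in (12.7).

    M is q x q_m with full column rank. Returns M_perp, M_tilde,
    M_perp_tilde and the two shock components."""
    q, qm = M.shape
    # The last q - q_m left singular vectors of M span the orthogonal
    # complement of the column space of M, so that M_perp' M = 0.
    U, _, _ = np.linalg.svd(M)
    M_perp = U[:, qm:]
    M_tilde = M @ np.linalg.inv(M.T @ M)
    M_perp_tilde = M_perp @ np.linalg.inv(M_perp.T @ M_perp)
    eta_qm = M.T @ eta          # eta_t^{(q_m)}
    eta_rest = M_perp.T @ eta   # eta_t^{(q - q_m)}
    return M_perp, M_tilde, M_perp_tilde, eta_qm, eta_rest


# Select shocks 2 and 4 out of q = 4 for the conditioning exercise.
M = np.zeros((4, 2))
M[1, 0] = 1.0
M[3, 1] = 1.0
eta = np.array([0.3, -1.2, 0.5, 2.0])
M_perp, M_tilde, M_perp_tilde, eta_qm, eta_rest = split_shocks(M, eta)
eta_rebuilt = M_perp_tilde @ eta_rest + M_tilde @ eta_qm
print(np.allclose(eta_rebuilt, eta))
```

Because this $M$ has a single unit element per column, $\tilde{M} = M$. The orthonormal $M_\perp$ from the SVD likewise gives $\tilde{M}_\perp = M_\perp$.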
75
For a more detailed discussion of forecasting with DSGE models in practise, see Adolfson et al. (2007d), Christoffel, Coenen, and Warne (2011), and Del Negro and Schorfheide (2012).
76
See, for instance, Croushore (2011a,b) for further details on forecasting with real-time data vintages.
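Using the decomposition in (12.7), the state equation (5.2) can be rewritten as

$$\xi_t = F\xi_{t-1} + B_0\tilde{M}_\perp \eta_t^{(q-q_m)} + B_0\tilde{M}\eta_t^{(q_m)}. \tag{12.8}$$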
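Turning first to period $T+1$: if we substitute for $y_{T+1}$ using the measurement equation (5.1) and the rewritten state equation (12.8), the conditioning on the vector $z_{T+1}$ in (12.6) implies that:

$$z_{T+1} = K_1'A'x_{T+1} + K_1'w_{T+1} + K_1'H'F\xi_T + K_1'H'B_0\tilde{M}_\perp\eta_{T+1}^{(q-q_m)} + K_1'H'B_0\tilde{M}\eta_{T+1}^{(q_m)} + u_T. \tag{12.9}$$

Provided that the $q_m \times q_m$ matrix $K_1'H'B_0\tilde{M}$ is invertible, (12.9) can be solved for $\eta_{T+1}^{(q_m)}$. The observed variables in period $T+1$ are then given by

$$y_{T+1}^{(q_m)} = A'x_{T+1} + H'\xi_{T+1}^{(q_m)} + w_{T+1},$$

where $\xi_{T+1}^{(q_m)}$ is obtained from (12.8) using these shocks.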
We may now continue with period $T+2$. For given parameter values, the shocks $\eta_{T+2}^{(q_m)}$ can be determined from $z_{T+2}$, $x_{T+2}$, $w_{T+2}$, $\xi_{T+1}^{(q_m)}$, $\eta_{T+2}^{(q-q_m)}$, $y_{T+1}^{(q_m)}$, and $u_T$. In fact, it is now straightforward to show that the values for the economic shocks which guarantee that the conditioning path $z_{T+1}, \ldots, z_{T+g}$ is always met are:
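$$\eta_{T+i}^{(q_m)} = \bigl(K_1'H'B_0\tilde{M}\bigr)^{-1}\Bigl[z_{T+i} - K_1'A'x_{T+i} - K_1'w_{T+i} - K_1'H'F\xi_{T+i-1}^{(q_m)} - K_1'H'B_0\tilde{M}_\perp\eta_{T+i}^{(q-q_m)} - \sum_{j=1}^{i-1}K_{2j}'y_{T+i-j}^{(q_m)} - u_T\Bigr], \quad i = 1, \ldots, g, \tag{12.10}$$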
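while the states and the observed variables evolve according to:

$$\xi_{T+i}^{(q_m)} = F\xi_{T+i-1}^{(q_m)} + B_0\tilde{M}_\perp\eta_{T+i}^{(q-q_m)} + B_0\tilde{M}\eta_{T+i}^{(q_m)}, \quad i = 1, \ldots, g, \tag{12.11}$$

and

$$y_{T+i}^{(q_m)} = A'x_{T+i} + H'\xi_{T+i}^{(q_m)} + w_{T+i}, \quad i = 1, \ldots, g, \tag{12.12}$$

while $\xi_T^{(q_m)} = \xi_T$.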
For i > g there are not any direct restrictions on the possible paths for the observed variables
other than that the state vector at T g needs to be taken into account.
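The recursion in (12.10)-(12.12) can be sketched in Python for a single draw of the initial state, the other shocks and the measurement errors. The function `conditional_path` and its argument layout are hypothetical and do not reflect YADA's Matlab interface. The toy system below is arbitrary. The sketch only confirms that the simulated path reproduces the conditioning assumptions (12.6) exactly.

```python
import numpy as np


def conditional_path(F, B0, A, H, M, K1, z, x, xi_T, eta_rest, w, K2=None, u_T=None):
    """Solve (12.10)-(12.12) sequentially for i = 1, ..., g.

    Shapes: F (r, r), B0 (r, q), A (k, n), H (r, n), M (q, qm), K1 (n, qm),
    z (g, qm), x (g, k), xi_T (r,), eta_rest (g, q - qm), w (g, n).
    K2 is an optional list of (n, qm) matrices K_{2j}, j = 1, ..., g - 1."""
    g, qm = z.shape
    U, _, _ = np.linalg.svd(M)
    M_perp = U[:, qm:]
    M_t = M @ np.linalg.inv(M.T @ M)
    M_perp_t = M_perp @ np.linalg.inv(M_perp.T @ M_perp)
    u_T = np.zeros(qm) if u_T is None else u_T
    S = K1.T @ H.T @ B0 @ M_t          # q_m x q_m, assumed invertible
    xi = xi_T
    ys, etas = [], []
    for i in range(g):
        # State before the conditioning shocks eta^{(q_m)} are added.
        xi_pre = F @ xi + B0 @ M_perp_t @ eta_rest[i]
        lagged = np.zeros(qm)
        if K2 is not None:
            for j in range(1, i + 1):   # K_{2j}' y_{T+i+1-j}, j = 1, ..., i
                lagged += K2[j - 1].T @ ys[i - j]
        rhs = z[i] - K1.T @ (A.T @ x[i] + w[i] + H.T @ xi_pre) - lagged - u_T
        eta_qm = np.linalg.solve(S, rhs)          # (12.10)
        xi = xi_pre + B0 @ M_t @ eta_qm           # (12.11)
        ys.append(A.T @ x[i] + H.T @ xi + w[i])   # (12.12)
        etas.append(eta_qm)
    return np.array(ys), np.array(etas)


rng = np.random.default_rng(0)
r, q, qm, n, k, g = 3, 2, 1, 2, 1, 4
F = 0.5 * np.eye(r)
B0 = rng.standard_normal((r, q))
A = rng.standard_normal((k, n))
H = rng.standard_normal((r, n))
M = np.array([[1.0], [0.0]])
K1 = np.array([[1.0], [0.0]])
K2 = [np.array([[0.0], [1.0]])] * (g - 1)
z = np.linspace(0.5, 2.0, g).reshape(g, qm)
x = np.ones((g, k))
Y, eta_cond = conditional_path(F, B0, A, H, M, K1, z, x, np.zeros(r),
                               rng.standard_normal((g, q - qm)),
                               0.1 * rng.standard_normal((g, n)), K2=K2)
```

Repeating the call with fresh draws of $\xi_T$, $\eta^{(q-q_m)}$ and $w$ yields the $P$ paths per parameter draw described below.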
The procedure described here makes it straightforward to calculate conditional predictive distributions. For a given draw of $\theta$ from its posterior distribution, the period $T$ state vector $\xi_T$ is drawn from $N(\xi_{T|T+g}^z, P_{T|T+g}^z)$. YADA sets $\xi_{T|T+g}^z = \xi_{T|T}$ and $P_{T|T+g}^z = P_{T|T}$ and, hence, ignores the conditioning assumptions. Next, the economic shocks $\eta_{T+i}^{(q-q_m)}$ are drawn from $N(0, M_\perp'M_\perp)$ and $w_{T+i}$ from $N(0, R)$. Based on the conditioning assumptions, the shocks $\eta_{T+i}^{(q_m)}$ are calculated from (12.10), the state vector from (12.11), and the observed variables from (12.12), sequentially until $i = g+1$. From that period until $i = h$, the economic shocks $\eta_{T+i}^{(q_m)}$ are drawn from $N(0, M'M)$. This provides one path for $y_{T+1}, \ldots, y_{T+h}$. Given the value of $\theta$, we may next repeat this procedure, yielding $P$ paths. By considering $S$ draws of $\theta$, we can calculate a total of $PS$ sample paths for the observed variables that take the conditioning into account.
To evaluate the forecast error covariances, let us first combine the measurement and the state equation such that $y_{T+i}$ is expressed as a function of the parameters, $x_{T+i}$, $\xi_T$, $w_{T+i}$, and the shocks $\eta_{T+1}, \ldots, \eta_{T+i}$:
77
In many situations we will have that $\tilde{M} = M$, since $M$ is typically a 0-1 matrix with only one unit element per column. Similarly, we can always select $M_\perp$ such that $\tilde{M}_\perp = M_\perp$.
$$y_{T+i} = A'x_{T+i} + H'F^i\xi_T + H'\sum_{j=0}^{i-1}F^jB_0\eta_{T+i-j} + w_{T+i}, \quad i = 1, \ldots, h. \tag{12.13}$$
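$$\begin{bmatrix} y_{T+g} \\ y_{T+g-1} \\ \vdots \\ y_{T+1}\end{bmatrix} = \begin{bmatrix} A'x_{T+g} \\ A'x_{T+g-1} \\ \vdots \\ A'x_{T+1}\end{bmatrix} + \begin{bmatrix} H'F^g \\ H'F^{g-1} \\ \vdots \\ H'F\end{bmatrix}\xi_T + \begin{bmatrix} w_{T+g} \\ w_{T+g-1} \\ \vdots \\ w_{T+1}\end{bmatrix} + \begin{bmatrix} H'B_0 & H'FB_0 & \cdots & H'F^{g-1}B_0 \\ 0 & H'B_0 & \cdots & H'F^{g-2}B_0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & H'B_0\end{bmatrix}\begin{bmatrix}\eta_{T+g} \\ \eta_{T+g-1} \\ \vdots \\ \eta_{T+1}\end{bmatrix},$$

or

$$Y_{T+g} = X_{T+g} + G\xi_T + W_{T+g} + DN_{T+g}. \tag{12.14}$$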
The conditioning relations in equation (12.6) can also be stacked for the time periods T 1
until T g. Here we find that
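$$\begin{bmatrix} z_{T+g} \\ z_{T+g-1} \\ \vdots \\ z_{T+1}\end{bmatrix} = \begin{bmatrix} K_1' & K_{21}' & \cdots & K_{2,g-1}' \\ 0 & K_1' & \cdots & K_{2,g-2}' \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & K_1'\end{bmatrix}\begin{bmatrix} y_{T+g} \\ y_{T+g-1} \\ \vdots \\ y_{T+1}\end{bmatrix} + \begin{bmatrix} u_T \\ u_T \\ \vdots \\ u_T\end{bmatrix},$$

or

$$Z_{T+g} = K'Y_{T+g} + U_T. \tag{12.15}$$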
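Let us now substitute for $Y_{T+g}$ from (12.14) into the conditioning assumptions (12.15). Moreover, define the stacked decomposition of the structural shocks

$$N_{T+g} = (I_g \otimes \tilde{M}_\perp)N_{T+g}^{(q-q_m)} + (I_g \otimes \tilde{M})N_{T+g}^{(q_m)},$$

where $N_{T+g}^{(q_m)} = [\eta_{T+g}^{(q_m)\prime} \; \cdots \; \eta_{T+1}^{(q_m)\prime}]'$ and $N_{T+g}^{(q-q_m)}$ is defined analogously. Provided that the $q_mg \times q_mg$ matrix $K'D(I_g \otimes \tilde{M})$ is invertible, a unique solution for $N_{T+g}^{(q_m)}$ exists. Collecting these results, the stacked vector of shocks is given by

$$N_{T+g} = \mu_{N,T+g} + \Bigl[I_{qg} - (I_g\otimes\tilde{M})\bigl(K'D(I_g\otimes\tilde{M})\bigr)^{-1}K'D\Bigr](I_g\otimes\tilde{M}_\perp)N_{T+g}^{(q-q_m)}, \tag{12.16}$$

where

$$\mu_{N,T+g} = (I_g\otimes\tilde{M})\bigl(K'D(I_g\otimes\tilde{M})\bigr)^{-1}k_{T+g}, \qquad k_{T+g} = Z_{T+g} - U_T - K'\bigl(X_{T+g} + G\xi_T + W_{T+g}\bigr).$$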
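With this in mind, it is straightforward to show that, conditional on $\mathcal{Y}_T$, $Z_{T+g}$ and $\theta$, the mean of the stacked shock vector $N_{T+g}$ is

$$\bar{\mu}_{N,T+g} = (I_g\otimes\tilde{M})\bigl(K'D(I_g\otimes\tilde{M})\bigr)^{-1}\Bigl(Z_{T+g} - U_T - K'\bigl(X_{T+g} + G\xi_{T|T+g}^z\bigr)\Bigr). \tag{12.17}$$

The conditional mean predictions of the observed variables are therefore

$$E\bigl[Y_{T+g}\big|\mathcal{Y}_T, Z_{T+g};\theta\bigr] = X_{T+g} + G\xi_{T|T+g}^z + D\bar{\mu}_{N,T+g}. \tag{12.18}$$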
Hence, premultiplying both sides of (12.18) by $K'$ and adding $U_T$, we find that the conditioning assumptions are satisfied by the conditional mean predictions. Notice that the expectation is here also taken with respect to $M$, i.e., the selection of shocks used to ensure that the conditioning assumptions are satisfied. For notational convenience, however, it has been left out of the conditioning information.
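A small numerical check of the stacked formulas is given below. The Python sketch builds $G$, $D$ and $K'$ as in (12.14)-(12.15) for an arbitrary toy system and evaluates (12.17)-(12.18). It then confirms that $K'E[Y_{T+g}] + U_T = Z_{T+g}$ and that $K'\tilde{D} = 0$, where $\tilde{D} = I_{ng} - D(I_g\otimes\tilde{M})\bigl(K'D(I_g\otimes\tilde{M})\bigr)^{-1}K'$. The function name and the block ordering (periods $T+g, \ldots, T+1$, as in the text) are choices made for illustration only.

```python
import numpy as np


def stacked_conditional_mean(F, B0, H, M, K1, K2, X, U, Z, xi_z, g):
    """Conditional mean (12.17)-(12.18) of the stacked system (12.14)-(12.15).

    Stacked vectors are ordered T+g, ..., T+1. K2 is a list of the
    n x q_m matrices K_{2j}, j = 1, ..., g - 1."""
    q = B0.shape[1]
    n, qm = K1.shape
    Fp = [np.linalg.matrix_power(F, p) for p in range(g + 1)]
    G = np.vstack([H.T @ Fp[g - row] for row in range(g)])
    D = np.zeros((n * g, q * g))
    Kt = np.zeros((qm * g, n * g))          # the matrix K' in (12.15)
    for row in range(g):
        for col in range(row, g):
            D[row*n:(row+1)*n, col*q:(col+1)*q] = H.T @ Fp[col - row] @ B0
            Kj = K1 if col == row else K2[col - row - 1]
            Kt[row*qm:(row+1)*qm, col*n:(col+1)*n] = Kj.T
    Ig_Mt = np.kron(np.eye(g), M @ np.linalg.inv(M.T @ M))
    S = Kt @ D @ Ig_Mt                      # assumed invertible
    mu_bar = Ig_Mt @ np.linalg.solve(S, Z - U - Kt @ (X + G @ xi_z))   # (12.17)
    EY = X + G @ xi_z + D @ mu_bar                                     # (12.18)
    D_tilde = np.eye(n * g) - D @ Ig_Mt @ np.linalg.solve(S, Kt)
    return EY, D_tilde, Kt


rng = np.random.default_rng(1)
r, q, qm, n, g = 3, 2, 1, 2, 3
F = 0.6 * np.eye(r)
B0 = rng.standard_normal((r, q))
H = rng.standard_normal((r, n))
M = np.array([[0.0], [1.0]])
K1 = np.array([[1.0], [0.0]])
K2 = [np.array([[0.0], [0.5]])] * (g - 1)
X = rng.standard_normal(n * g)
U = np.tile(np.array([0.2]), g)
Z = rng.standard_normal(qm * g)
xi_z = rng.standard_normal(r)
EY, D_tilde, Kt = stacked_conditional_mean(F, B0, H, M, K1, K2, X, U, Z, xi_z, g)
```

The second check, $K'\tilde{D} = 0$, is the reason the covariance matrix in (12.19) vanishes when premultiplied by $K'$.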
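Furthermore, it can also be shown that the covariance matrix of the observed variables for fixed $\theta$ is given by

$$C\bigl(Y_{T+g}\big|\mathcal{Y}_T, Z_{T+g};\theta\bigr) = \tilde{D}GP_{T|T+g}^zG'\tilde{D}' + \tilde{D}(I_g\otimes R)\tilde{D}' + \tilde{D}D\bigl(I_g\otimes M_\perp\tilde{M}_\perp'\bigr)D'\tilde{D}', \tag{12.19}$$

where

$$\tilde{D} = I_{ng} - D(I_g\otimes\tilde{M})\bigl(K'D(I_g\otimes\tilde{M})\bigr)^{-1}K'.$$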
Premultiplying the covariance matrix in (12.19) by $K'$, or postmultiplying it by $K$, gives zero. Hence, the uncertainty about the conditioning assumptions is zero. Moreover, the first term on the right hand side of (12.19) gives the share of the forecast error covariances for fixed $\theta$ due to state variable uncertainty, the second term yields the measurement error share, while the third provides the shock uncertainty share.
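For forecast horizons $i$ beyond the conditioning horizon $g$ we have that

$$y_{T+i} = A'x_{T+i} + H'F^i\xi_T + H'\sum_{j=0}^{i-g-1}F^jB_0\eta_{T+i-j} + H'F^{i-g}BN_{T+g} + w_{T+i}, \quad i = g+1, \ldots, h, \tag{12.20}$$

where $B = [B_0 \;\; FB_0 \;\; \cdots \;\; F^{g-1}B_0]$. With $\bar{\mu}_{N,T+g}$ denoting the conditional mean of $N_{T+g}$ in (12.17), the conditional mean predictions are

$$E\bigl[y_{T+i}\big|\mathcal{Y}_T, Z_{T+g};\theta\bigr] = A'x_{T+i} + H'F^i\xi_{T|T+g}^z + H'F^{i-g}B\bar{\mu}_{N,T+g}, \tag{12.21}$$

for $i = g+1, \ldots, h$. Moreover, the forecast error covariance matrix for these horizons can be shown to be given by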
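$$\begin{aligned} C\bigl(y_{T+i}\big|\mathcal{Y}_T, Z_{T+g};\theta\bigr) ={}& H'F^{i-g}\tilde{G}P_{T|T+g}^z\tilde{G}'F^{(i-g)\prime}H + H'F^{i-g}\tilde{K}(I_g\otimes R)\tilde{K}'F^{(i-g)\prime}H + R \\ &+ \sum_{j=0}^{i-g-1}H'F^jB_0B_0'F^{j\prime}H + H'F^{i-g}B\tilde{N}\bigl(I_g\otimes M_\perp\tilde{M}_\perp'\bigr)\tilde{N}'B'F^{(i-g)\prime}H, \end{aligned} \tag{12.22}$$

for $i = g+1, \ldots, h$, and where

$$\tilde{K} = B(I_g\otimes\tilde{M})\bigl(K'D(I_g\otimes\tilde{M})\bigr)^{-1}K', \qquad \tilde{G} = F^g - \tilde{K}G, \qquad \tilde{N} = I_{qg} - (I_g\otimes\tilde{M})\bigl(K'D(I_g\otimes\tilde{M})\bigr)^{-1}K'D. \tag{12.23}$$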
The first term on the right hand side of (12.22) represents state uncertainty for fixed parameters,
the second and third give the measurement error uncertainty, while the last two provide a
measure of shock uncertainty. Moreover, it is worth pointing out that when all the eigenvalues
of $F$ are less than unity in absolute terms, the first, second, and fifth terms on the right hand side converge to zero as $i \to \infty$, so that the forecast error covariance matrix approaches the
covariance matrix of the observed variables for fixed parameters.
As in equation (12.3), we now find that the forecast error covariance matrix for the observed variables can be decomposed as

$$C\bigl(y_{T+i}\big|\mathcal{Y}_T, Z_{T+g}\bigr) = E_T\Bigl[C\bigl(y_{T+i}\big|\mathcal{Y}_T, Z_{T+g};\theta\bigr)\Bigr] + C_T\Bigl[E\bigl(y_{T+i}\big|\mathcal{Y}_T, Z_{T+g};\theta\bigr)\Bigr], \tag{12.24}$$

for $i = 1, \ldots, h$, and where $E_T$ and $C_T$ denote the expectation and covariance with respect to the posterior of $\theta$ at time $T$.⁷⁸ The first term on the right hand side gives us a decomposition of the forecast uncertainty into state, measurement error, and shock uncertainty, while the second term provides a measure of parameter uncertainty.
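Given posterior draws, the decomposition (12.24) can be computed by averaging the per-draw covariances and adding the dispersion of the per-draw means. The following Python sketch does this. The function name is hypothetical. Using equal weights and the divisor $S$ for the between-draw covariance is an assumption; an $S-1$ divisor would differ only marginally for large $S$.

```python
import numpy as np


def predictive_covariance(means, covs):
    """Combine per-draw conditional moments as in (12.24).

    means: (S, n) array of E(y_{T+i} | ...; theta^(s)) over S posterior draws.
    covs:  (S, n, n) array of C(y_{T+i} | ...; theta^(s)).
    Returns the total covariance and its two components."""
    within = covs.mean(axis=0)                        # state, measurement error, shock uncertainty
    centered = means - means.mean(axis=0)
    between = centered.T @ centered / means.shape[0]  # parameter uncertainty
    return within + between, within, between


means = np.array([[0.0, 1.0], [2.0, 1.0]])
covs = np.stack([np.eye(2), 3.0 * np.eye(2)])
total, within, between = predictive_covariance(means, covs)
```

Here the first variable carries parameter uncertainty because its mean varies across the two draws. The second variable has only the averaged within-draw term.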
12.3. Modesty Statistics for the State-Space Model
Conditional forecasting experiments may be subject to the well known Lucas (1976) critique.
Leeper and Zha (2003) introduced the concept of modest policy interventions along with a
simple metric for evaluating how unusual a conditional forecast is relative to the unconditional
forecast. Their idea has been further developed by Adolfson, Laséen, Lindé, and Villani (2005).
I shall present three modesty statistics, two univariate and one multivariate.
The general idea behind the modesty statistics is to compare the conditional and the unconditional forecast. The forecasts are subject to uncertainty concerning which shocks will hit
the economy during the prediction period. For the conditional forecasts some shocks have to
take on certain values over the conditioning period to ensure that forecasts are consistent with
the conditioning information. If these restricted shocks behave as if they are drawn from their
distributions, then the conditioning information is regarded as modest. But if the behavior of
the shocks over the conditioning period is different from what is assumed, then the agents in the
economy may be able to detect this change. In this case, the conditioning information need no
longer be modest and might even be subject to the famous Lucas (1976) critique.
Within the context of the state-space representation of the DSGE model, the univariate statistic suggested by Leeper and Zha (2003) is based on setting $\eta_{T+i}^{(q-q_m)} = 0$ and $w_{T+i} = 0$ for $i = 1, \ldots, g$ in equations (12.10)-(12.12). The alternative univariate statistic suggested by
Adolfson et al. (2005) does not force these shocks to be zero over the conditioning horizon.
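For both approaches, the difference between the conditional and the unconditional forecasts at $T+g$ for a given $\theta$ is:

$$\Phi_{T,g}(\bar{\eta};\theta) = y_{T+g}(\bar{\eta};\theta) - E\bigl[y_{T+g}\big|\mathcal{Y}_T;\theta\bigr] = H'F^g\bigl(\xi_T - \xi_{T|T}\bigr) + \sum_{j=0}^{g-1}H'F^jB_0\bigl(\tilde{M}\bar{\eta}_{T+g-j}^{(q_m)} + \tilde{M}_\perp\bar{\eta}_{T+g-j}^{(q-q_m)}\bigr) + w_{T+g}, \tag{12.25}$$

where $\bar{\eta} = \{\bar{\eta}_t^{(q_m)}, \bar{\eta}_t^{(q-q_m)}\}_{t=T+1}^{T+g}$. Under the Leeper and Zha approach, the measurement errors and the other shocks $\{\bar{\eta}_t^{(q-q_m)}\}_{t=T+1}^{T+g}$ are set to zero. Under the Adolfson et al. approach, they are drawn from their distributions. The covariance matrix of $\Phi_{T,g}(\bar{\eta};\theta)$ under modest conditioning shocks is then

$$\Omega_{T+g} = H'P_{T+g|T}H + R, \qquad P_{T+i|T} = FP_{T+i-1|T}F' + B_0B_0', \quad i = 1, \ldots, g. \tag{12.26}$$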
Under the hypothesis that $\xi_T$ can be observed at $T$, the matrix $P_{T|T} = 0$. This assumption is used by Adolfson et al. (2005) and is also imposed in YADA.
The assumption that the state vector in T can be observed at T in the modesty analysis is
imposed to make sure that it is consistent with the assumptions underlying the DSGE model.
That is, the agents know the structure of the model, all parameters, and all past and present
shocks. Hence, there cannot be any state uncertainty when evaluating the current state. This
also has an implication for equation (12.25), where we set $\xi_T = \xi_{T|T}$.
78
This posterior distribution provides an approximation, since it does not incorporate the conditioning assumptions in $Z_{T+g}$, but for convenience this fact is overlooked here.
The multivariate modesty statistic is given by

$$\mathcal{T}_{T,g}(\bar{\eta};\theta) = \Phi_{T,g}(\bar{\eta};\theta)'\,\Omega_{T+g}^{-1}\,\Phi_{T,g}(\bar{\eta};\theta). \tag{12.27}$$

Under the hypothesis that the conditioning shocks are modest, i.e., $\{\bar{\eta}_t^{(q_m)}\}_{t=T+1}^{T+g}$ can be viewed as being drawn from a multivariate standard normal distribution, this statistic is $\chi^2(n)$. Instead of using the chi-square as a reference distribution for the multivariate modesty statistic, one may calculate the statistic in (12.27) using shocks $\{\eta_t\}_{t=T+1}^{T+g}$ drawn from the $N_q(0, I_q)$ distribution. The tail probability $\Pr\bigl[\mathcal{T}_{T,g}(\eta;\theta) \geq \mathcal{T}_{T,g}(\bar{\eta};\theta)\bigr]$ then determines whether the conditioning information is modest.
One univariate statistic suggested by Adolfson et al. is the following

$$\mathcal{T}_{T,g}^{i}(\bar{\eta};\theta) = \frac{\Phi_{T,g}^{i}(\bar{\eta};\theta)}{\sqrt{\Omega_{T+g}^{i,i}}}, \quad i = 1, \ldots, n, \tag{12.28}$$

where $\Phi_{T,g}(\bar{\eta};\theta)$ is calculated with the other shocks and the measurement errors drawn from a normal distribution. This statistic has a standard normal distribution under the assumption of modest conditioning shocks.
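The Adolfson et al. statistics can be computed in a few lines. The Python sketch below evaluates $\Omega_{T+g}$ as in (12.26) with $P_{T|T} = 0$, the multivariate statistic (12.27), the univariate statistics (12.28), and a simulated tail probability based on shocks drawn from $N(0, I_q)$ and measurement errors from $N(0, R)$. All function names and the toy matrices are hypothetical. In practice the observed difference `phi_bar` would come from (12.25) with the conditioning shocks computed from (12.10).

```python
import numpy as np


def modesty_covariance(F, B0, H, R, g):
    """Omega_{T+g} in (12.26), with P_{T|T} = 0 since xi_T is treated as observed."""
    P = np.zeros_like(F)
    for _ in range(g):
        P = F @ P @ F.T + B0 @ B0.T
    return H.T @ P @ H + R


def modesty_statistics(phi, Omega):
    """Multivariate statistic (12.27) and univariate statistics (12.28)."""
    multivariate = float(phi @ np.linalg.solve(Omega, phi))
    univariate = phi / np.sqrt(np.diag(Omega))
    return multivariate, univariate


def modesty_tail_probability(phi_bar, F, B0, H, R, g, draws=10000, seed=0):
    """Estimate Pr[T(eta) >= T(eta_bar)] with eta ~ N(0, I_q) and w ~ N(0, R)."""
    rng = np.random.default_rng(seed)
    r, q = B0.shape
    n = H.shape[1]
    Omega = modesty_covariance(F, B0, H, R, g)
    stat_bar, _ = modesty_statistics(phi_bar, Omega)
    chol_R = np.linalg.cholesky(R)
    stats = np.empty(draws)
    for d in range(draws):
        xi = np.zeros(r)          # accumulates sum_j F^j B0 eta_{T+g-j}
        for _ in range(g):
            xi = F @ xi + B0 @ rng.standard_normal(q)
        phi = H.T @ xi + chol_R @ rng.standard_normal(n)
        stats[d], _ = modesty_statistics(phi, Omega)
    return float(np.mean(stats >= stat_bar)), stats


F = np.array([[0.7, 0.1], [0.0, 0.5]])
B0 = np.array([[1.0, 0.0], [0.3, 0.8]])
H = np.eye(2)
R = 0.1 * np.eye(2)
g = 4
Omega = modesty_covariance(F, B0, H, R, g)
prob, stats = modesty_tail_probability(np.array([1.5, -0.5]), F, B0, H, R, g)
```

The simulated statistics should have a mean close to $n = 2$, the mean of the $\chi^2(2)$ reference distribution.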
For the alternative Leeper-Zha related statistic, we let $\Phi_{T,g}(\bar{\eta};\theta)$ be computed for zero measurement errors and other shocks, while

$$\Omega_{T+g} = H'P_{T+g|T}H, \qquad P_{T+i|T} = FP_{T+i-1|T}F' + B_0\tilde{M}M'B_0', \quad i = 1, \ldots, g.$$
i,i
0.
(12.30)
Like Kilian and Manganelli I have here adopted the convention of defining downside risk as a
negative number and upside risk as a positive number.79
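These risk measures are related to the loss function

$$L(y) = \begin{cases} a(y_L - y)^\alpha & \text{if } y < y_L, \\ 0 & \text{if } y_L \leq y \leq y_U, \\ (1-a)(y - y_U)^\beta & \text{if } y > y_U. \end{cases} \tag{12.31}$$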
This means that the expected loss is given by

$$E\bigl[L(y)\bigr] = -aDR_\alpha(y_L) + (1-a)UR_\beta(y_U), \tag{12.32}$$

where $0 \leq a \leq 1$ gives the weight on downside risk relative to upside risk in the loss function.
As pointed out by Kilian and Manganelli it is common in discussions of risk to stress the need
to balance the upside and downside risks. Risk balancing in this sense may be considered as
taking a weighted average of the upside and downside risks. As shown by Kilian and Manganelli
(2007) such a balance of risk measure may be derived under optimality arguments and is here
given by
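$$BR_{\alpha-1,\beta-1}(y_L, y_U) = a\alpha DR_{\alpha-1}(y_L) + (1-a)\beta UR_{\beta-1}(y_U). \tag{12.33}$$

For a quadratic loss function ($\alpha = \beta = 2$) with equal weights on upside and downside risks ($a = 1/2$), the balance of risk measure in (12.33) is

$$BR_{1,1}(y_L, y_U) = E\bigl[y - y_L\big|y < y_L\bigr]\Pr\bigl[y < y_L\bigr] + E\bigl[y - y_U\big|y > y_U\bigr]\Pr\bigl[y > y_U\bigr].$$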
That is, the balance of risks is a weighted average of the expected value of y given that it is
below the lower bound and the expected value of y given that it is above the upper bound with
weights given by the probabilities that the events occur. Such a measure is, for instance, used
by Smets and Wouters (2004) in their study on the forecasting properties of a DSGE model.
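The risk measures can be estimated directly from predictive draws. The Python sketch below assumes the definitions $DR_\alpha(y_L) = -E\bigl[(y_L - y)^\alpha \mathbf{1}\{y < y_L\}\bigr]$ and $UR_\beta(y_U) = E\bigl[(y - y_U)^\beta \mathbf{1}\{y > y_U\}\bigr]$. These follow the Kilian and Manganelli sign convention described above and are the definitions under which (12.32) holds for the loss function (12.31). The balance of risks is computed as in (12.33). The function names and the normal predictive draws are illustrative only.

```python
import numpy as np


def downside_risk(draws, y_low, alpha):
    """DR_alpha(y_L) = -E[(y_L - y)^alpha 1{y < y_L}], a non-positive number."""
    below = draws[draws < y_low]
    return -np.sum((y_low - below) ** alpha) / draws.size


def upside_risk(draws, y_up, beta):
    """UR_beta(y_U) = E[(y - y_U)^beta 1{y > y_U}], a non-negative number."""
    above = draws[draws > y_up]
    return np.sum((above - y_up) ** beta) / draws.size


def expected_loss(draws, y_low, y_up, a, alpha, beta):
    """Monte Carlo mean of the loss function (12.31)."""
    loss = np.where(draws < y_low, a * np.abs(y_low - draws) ** alpha, 0.0)
    loss = np.where(draws > y_up, (1 - a) * np.abs(draws - y_up) ** beta, loss)
    return loss.mean()


def balance_of_risks(draws, y_low, y_up, a, alpha, beta):
    """BR_{alpha-1, beta-1}(y_L, y_U) as written in (12.33)."""
    return (a * alpha * downside_risk(draws, y_low, alpha - 1)
            + (1 - a) * beta * upside_risk(draws, y_up, beta - 1))


rng = np.random.default_rng(5)
draws = rng.normal(2.0, 1.0, 100000)
y_low, y_up, a = 1.0, 3.0, 0.3
dr = downside_risk(draws, y_low, 2.0)
ur = upside_risk(draws, y_up, 1.5)
```

With $\alpha = \beta = 2$ and $a = 1/2$, `balance_of_risks` reduces to $DR_1(y_L) + UR_1(y_U)$, which is the sample version of the $BR_{1,1}$ expression above.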
12.5. The Predictive Likelihood and Log Predictive Score
The calculation of the height of the joint or the marginal predictive density is often needed by
methods for comparing or evaluating density forecasts; see, e.g., Geweke and Amisano (2010).
As Gneiting, Balabdaoui, and Raftery (2007) point out, the assessment of a predictive distribution on the basis of its density and the observed data only (the predictive likelihood) is
consistent with the prequential approach of Dawid (1984), according to which forecasts are
both probabilistic and sequential in nature, taking the form of probability distributions over a
sequence of future values; see also Geweke (2010) and Geweke and Amisano (2011, 2012).
The use of the predictive likelihood as a valid Bayesian approach to model selection has long
been recognized. Box (1980), for example, has emphasized the complementary roles in the
model building process of the posterior and predictive distributions, where the former is used
for diagnostic checking of the model, while the latter provides a general basis for robustness
checks. Moreover, for models with improper priors the predictive likelihood can still be used for
model selection provided that the sample being conditioned on is large enough to train the prior
to a proper one; see, e.g., Gelfand and Dey (1994), Eklund and Karlsson (2007), and Strachan
and van Dijk (2011).
A forecast comparison exercise is naturally cast as a decision problem within a Bayesian
setting and therefore needs to be based on a particular preference ordering. Scoring rules can
be used to compare the quality of probabilistic forecasts by giving a numerical value using the
predictive distribution and an event or value that materializes. A scoring rule is said to be proper
if a forecaster who maximizes the expected score provides its true subjective distribution; see
Winkler and Murphy (1968). If the maximum is unique then the rule is said to be strictly proper.
Proper scoring rules are important since they encourage the forecaster to be honest.
79
Risk measures of this type were first proposed in the portfolio allocation literature by Fishburn (1977); see also,
e.g., Holthausen (1981).
A widely used scoring rule that was suggested by, e.g., Good (1952) is the log predictive score. Based on the predictive density function of $y_{t+1}, \ldots, y_{t+h}$, it can be expressed as

$$S_J(h, m) = \frac{1}{N_h}\sum_{t=T}^{T+N_h-1}\ln p\bigl(y_{t+1}, \ldots, y_{t+h}\big|\mathcal{Y}_t, m\bigr), \quad h = 1, \ldots, H, \tag{12.34}$$
where $N_h$ is the number of time periods over which the h-step-ahead predictive density is evaluated, $\mathcal{Y}_t$ is the observed data of $y$ until period $t$, and $m$ is an index for the model. If the scoring rule
depends on the predictive density only through the realization of y over the prediction sample,
then the scoring rule is said to be local. Under the assumption that only local scoring rules are
considered, Bernardo (1979) showed that every proper scoring rule is equivalent to a positive
constant times the log predictive score plus a real valued function that only depends on the
realized data; see Bernardo and Smith (2000) for general discussions on related issues and
Gneiting and Raftery (2007) for a recent survey on scoring rules.
When the log score is evaluated at the realized values of $y$ over the prediction sample, the difference between the log predictive scores of models $m$ and $k$ is equal to the average log predictive Bayes factor of these two models. A positive value indicates that, on average, model $m$ is better at predicting the variables over the given sample than model $k$. It is furthermore straightforward to show that the log predictive likelihood of model $m$ is equal to the difference between the log marginal likelihood value when the historical data, $\mathcal{Y}_t$, and the realisations $y_{t+1}, \ldots, y_{t+h}$ are used and the log marginal likelihood value obtained when only the historical data are employed; see, e.g., Geweke (2005, Chapter 2.6.2). In fact, based on the observations $\mathcal{Y}_{T+N_1}$ and with $N_h = N_{h-1} - 1$, we can rewrite the log predictive score in (12.34) as

$$S_J(h, m) = \frac{1}{N_h}\sum_{i=0}^{h-1}\Bigl[\ln p\bigl(\mathcal{Y}_{T+N_1-i}, m\bigr) - \ln p\bigl(\mathcal{Y}_{T+i}, m\bigr)\Bigr], \quad h = 1, \ldots, H. \tag{12.35}$$
This means that the log predictive score of model $m$ for one-step-ahead forecasts is proportional to the difference between the log marginal likelihood for the full sample $\mathcal{Y}_{T+N_1}$ and that for the historical sample $\mathcal{Y}_T$. Moreover, the calculation of the score for h-step-ahead forecasts based on the joint predictive likelihood requires exactly $2h$ marginal likelihood values. The first $h$ are based on the samples $\mathcal{Y}_{T+N_1-i}$ and the last $h$ on $\mathcal{Y}_{T+i}$, for $i = 0, \ldots, h-1$.
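The telescoping identity behind (12.35) can be verified in a model where marginal likelihoods are available in closed form. The Python sketch below assumes a hypothetical conjugate model, $y_t|\mu \sim N(\mu, 1)$ i.i.d. with $\mu \sim N(0, \tau^2)$, which is not a model from the text. It computes $S_J(h, m)$ both from the joint predictive densities in (12.34) and from the $2h$ marginal likelihoods in (12.35), using $N_h = N_1 - h + 1$.

```python
import numpy as np


def log_mvn(y, mean, cov):
    """Log density of a multivariate normal distribution."""
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (y.size * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def log_marglik(y, tau2=1.0):
    """ln p(Y_t) when y_t | mu ~ N(mu, 1) iid and mu ~ N(0, tau2)."""
    t = y.size
    return log_mvn(y, np.zeros(t), np.eye(t) + tau2 * np.ones((t, t)))


def log_joint_predictive(y_hist, y_future, tau2=1.0):
    """ln p(y_{t+1}, ..., y_{t+h} | Y_t) for the same conjugate model."""
    v = 1.0 / (1.0 / tau2 + y_hist.size)   # posterior variance of mu
    m = v * y_hist.sum()                   # posterior mean of mu
    h = y_future.size
    return log_mvn(y_future, m * np.ones(h), np.eye(h) + v * np.ones((h, h)))


def score_direct(y, T, N1, h):
    """S_J(h) in (12.34): average of joint h-step log predictive densities."""
    Nh = N1 - h + 1
    return np.mean([log_joint_predictive(y[:t], y[t:t + h]) for t in range(T, T + Nh)])


def score_telescoped(y, T, N1, h):
    """S_J(h) via the 2h marginal likelihoods in (12.35)."""
    Nh = N1 - h + 1
    return sum(log_marglik(y[:T + N1 - i]) - log_marglik(y[:T + i]) for i in range(h)) / Nh


rng = np.random.default_rng(2)
y = 0.4 + rng.standard_normal(20)
T, N1 = 8, 12
scores = [(score_direct(y, T, N1, h), score_telescoped(y, T, N1, h)) for h in (1, 2, 3)]
```

Both columns of `scores` agree to numerical precision, which illustrates why only $2h$ marginal likelihoods are needed.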
It can also be seen that the log predictive likelihood in (12.34) can be rewritten as a sum of one-step-ahead log predictive likelihoods. Hence, in essence the log score $S_J(h, m)$ covers one-step-ahead forecasts only and is therefore not well suited for comparing h-step-ahead forecasts when $h > 1$. When comparing the density forecasts of the NAWM and alternative forecast models, Christoffel et al. (2011) therefore focus on the marginal predictive likelihood of the h-step-ahead forecasts rather than the joint predictive likelihood in (12.34). The log predictive score can now be expressed as

$$S_M(h, m) = \frac{1}{N_h}\sum_{t=T}^{T+N_h-1}\ln p\bigl(y_{t+h}\big|\mathcal{Y}_t, m\bigr), \quad h = 1, \ldots, H. \tag{12.36}$$
The relationship between the marginal likelihood and the log predictive score in (12.36) holds when $h = 1$. For other forecast horizons, both Christoffel et al. (2011, p. 114) and Adolfson et al. (2007d, p. 324) claim that this connection breaks down and, hence, that the marginal likelihood cannot detect if some models perform well on certain forecast horizons while other models do better on other horizons. Furthermore, Adolfson et al. (2007d, p. 325) remark that computing $S_M(h, m)$ for $h > 1$ is not an easy task, since $p(y_{t+h}|\mathcal{Y}_t, m)$ does not have a closed form solution and kernel density estimation from the predictive draws is not practical unless the dimension of $y_{t+h}$ is small. They therefore suggest using a normal approximation of the predictive likelihood based on the mean and the covariance of the marginal predictive distribution.
However, going back one step, one realizes that Christoffel et al. (2011) and Adolfson et al. are incorrect, since

$$p\bigl(y_{t+h}\big|\mathcal{Y}_t, m\bigr) = \frac{p\bigl(y_{t+h}, \mathcal{Y}_t, m\bigr)}{p\bigl(\mathcal{Y}_t, m\bigr)}, \quad h = 1, \ldots, H. \tag{12.37}$$

The denominator is the marginal likelihood of model $m$ when using the data $\mathcal{Y}_t$. The numerator is likewise the marginal likelihood for this model when using the data $(y_{t+h}, \mathcal{Y}_t)$. Hence, the connection between the predictive likelihood and the marginal likelihood remains also for $h > 1$. The problem of calculating the log predictive score in (12.36) for $h > 1$ therefore reduces to the question: is it possible to compute the marginal likelihood for the sample $(y_{t+h}, \mathcal{Y}_t)$?
A solution to this problem is suggested by Warne, Coenen, and Christoffel (2012). Suppose we replace the realizations of $y_{t+i}$, $i = 1, \ldots, h-1$, in $\mathcal{Y}_{t+h}$ with missing observations and apply a valid method for dealing with incomplete data when evaluating the likelihood function for fixed parameters of model $m$. This effectively means that we treat missing observations as a method for integrating out variables at certain points in time from the likelihood. The marginal likelihood of the model for $(y_{t+h}, \mathcal{Y}_t)$ can thereafter be computed via standard tools.⁸⁰ Such an approach can also be used to estimate the marginal likelihood for the data $(y_{t+h}^*, \mathcal{Y}_t)$, where $y_{t+h}^*$ is a subset of the elements of $y_{t+h}$, as well as for, e.g., the data $(y_{t+1}^*, \ldots, y_{t+h}^*, \mathcal{Y}_t)$. In fact, we may replace data points with missing observations anywhere in the predictive sample $y_{t+1}, \ldots, y_{t+h}$ when calculating the likelihood function.
In the case of linear state-space models with Gaussian shocks and measurement errors, the
likelihood function can be calculated using a Kalman filter which allows for missing observations; see, e.g., Durbin and Koopman (2012, Chapter 4.10) or Harvey (1989, Chapter 3.4.7).
Once we turn to non-linear, non-normal state-space models a missing observations consistent
filter, such as the particle filter (sequential Monte Carlo), may instead be applied when computing the likelihood; see Giordani, Pitt, and Kohn (2011) for a survey on filtering in state-space
models, or Durbin and Koopman (2012, Chapter 12) for an introduction to particle filtering.
When the joint predictive density for fixed parameters is Gaussian, marginalization can also
be conducted directly via the predictive mean and the covariance matrix for given parameters (provided these moments can be determined analytically) by utilizing well-known properties of the normal distribution.⁸¹ In the case of the linear Gaussian models this approach to
marginalization is equivalent to the Kalman filter approach, where the Kalman filter approach to
marginalization provides a unifying framework and is as parsimonious as possible when dealing
with potentially large matrices.
The predictive likelihood for a subset of the variables may now be estimated with, e.g., the harmonic mean; see Gelfand and Dey (1994), the modification in Geweke (1999, 2005), or the extension in Sims et al. (2008).⁸² For this estimator of the predictive likelihood we formally have
80
This idea is related to, but also different from, data augmentation and other such EM algorithm extensions. For
these algorithms, the model is used to replace missing observations with model-based draws of the latent variables
and then use complete-data methods to address the incomplete-data problem; see, e.g., Tanner and Wong (1987)
and Rubin (1991).
81
In fact, any distribution where the marginal density can be determined analytically from the joint for given
parameters, such as the Student t, can be marginalized in this way.
82
Other methods, such as bridge sampling, annealed importance sampling, power posteriors (thermodynamic integration), steppingstone sampling, or cross-entropy may also be considered; see Meng and Wong (1996), Frühwirth-Schnatter (2004) [bridge sampling], Neal (2001) [annealed importance sampling], Lartillot and Philippe (2006),
Friel and Pettitt (2008) [power posteriors], Xie, Lewis, Fan, Kuo, and Chen (2011) [steppingstone sampling], and
Chan and Eisenstat (2012) [cross-entropy with importance sampling].
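$$\hat{p}_H\bigl(y_{t+h}^*\big|\mathcal{Y}_t\bigr) = \left[\frac{1}{S_h}\sum_{s=1}^{S_h}\frac{f(\theta_h^{(s)})}{L(y_{t+h}^*|\mathcal{Y}_t;\theta_h^{(s)})\,p(\theta_h^{(s)}|\mathcal{Y}_t)}\right]^{-1} = \left[\frac{1}{S_h}\sum_{s=1}^{S_h}\frac{f(\theta_h^{(s)})}{L(y_{t+h}^*, \mathcal{Y}_t|\theta_h^{(s)})\,p(\theta_h^{(s)})}\right]^{-1}\left[\frac{1}{S}\sum_{s=1}^{S}\frac{f(\theta^{(s)})}{L(\mathcal{Y}_t;\theta^{(s)})\,p(\theta^{(s)})}\right], \tag{12.38}$$

where $L(y_{t+h}^*|\mathcal{Y}_t;\theta)$ denotes the conditional likelihood and $L(\mathcal{Y}_t;\theta)$ the likelihood. The term $L(y_{t+h}^*, \mathcal{Y}_t|\theta)$ is the product of the conditional likelihood and the likelihood, and we have used the fact that $p(\theta|\mathcal{Y}_t) = L(\mathcal{Y}_t;\theta)p(\theta)/p(\mathcal{Y}_t)$. The function $f$ is either the truncated normal density, chosen as in equation (10.4), or the truncated elliptical density, computed as in Sims et al. (2008).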
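To avoid having to generate posterior draws for each sample $(y_{t+h}^*, \mathcal{Y}_t)$, it is tempting to replace the draws $\theta_h^{(s)}$ in (12.38) with the draws $\theta^{(s)}$. If the dimension of $y_{t+h}^*$ is small, this approximation is likely to work well in practise, but it also implies that the resulting estimator is not consistent. With $f$ being a proper probability density function, the justification for using the harmonic mean estimator is due to the following:

$$E\left[\frac{f(\theta)}{p(y_{t+h}^*|\mathcal{Y}_t,\theta)\,p(\theta|\mathcal{Y}_t)}\,\middle|\, y_{t+h}^*, \mathcal{Y}_t\right] = \int\frac{f(\theta)}{p(y_{t+h}^*|\mathcal{Y}_t,\theta)\,p(\theta|\mathcal{Y}_t)}\,p(\theta|y_{t+h}^*,\mathcal{Y}_t)\,d\theta = \frac{1}{p(y_{t+h}^*|\mathcal{Y}_t)}\int f(\theta)\,d\theta = \frac{1}{p(y_{t+h}^*|\mathcal{Y}_t)}. \tag{12.39}$$

If the expectation is instead taken with respect to $p(\theta|\mathcal{Y}_t)$, as is implicitly done when the $\theta^{(s)}$ draws are used, we obtain

$$E\left[\frac{f(\theta)}{p(y_{t+h}^*|\mathcal{Y}_t,\theta)\,p(\theta|\mathcal{Y}_t)}\,\middle|\,\mathcal{Y}_t\right] = \int\frac{f(\theta)}{p(y_{t+h}^*|\mathcal{Y}_t,\theta)}\,d\theta = \frac{1}{p(y_{t+h}^*|\mathcal{Y}_t)}\int\frac{f(\theta)\,p(\theta|\mathcal{Y}_t)}{p(\theta|y_{t+h}^*,\mathcal{Y}_t)}\,d\theta.$$

The second equality follows from applying Bayes' rule to the posterior $p(\theta|y_{t+h}^*,\mathcal{Y}_t)$ and $p(y_{t+h}^*|\mathcal{Y}_t)$. The resulting expression is generally not equal to the inverse of the predictive likelihood. Accordingly, replacing the $\theta_h^{(s)}$ draws with $\theta^{(s)}$ when computing the harmonic mean estimator in (12.38) is unappealing, since the estimator is no longer consistent.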
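If we insist on using only one set of parameter draws for all forecast horizons when computing the predictive likelihood, we may instead use an importance sampling (IS) estimator; see, e.g., Geweke (2005). With $\theta^{(i)}$ being draws from the importance density $g(\theta)$, a general expression of the IS estimator is

$$\hat{p}_{IS}\bigl(y_{t+h}^*\big|\mathcal{Y}_t\bigr) = \frac{1}{N}\sum_{i=1}^{N}\frac{L(y_{t+h}^*|\mathcal{Y}_t;\theta^{(i)})\,p(\theta^{(i)}|\mathcal{Y}_t)}{g(\theta^{(i)})}. \tag{12.40}$$

Letting $g(\theta) = p(\theta|\mathcal{Y}_t)$, such that $\theta^{(i)} = \theta^{(s)}$ with $N = S$, the estimator of the predictive likelihood in (12.40) is simply the average of the conditional likelihood over the $S$ posterior draws $\theta^{(s)}$, i.e., standard Monte Carlo integration based on the conditional likelihood. Under certain conditions, the right hand side of (12.40) converges almost surely to the expected value of $L(y_{t+h}^*|\mathcal{Y}_t;\theta)$ with respect to $p(\theta|\mathcal{Y}_t)$, that is, to the predictive likelihood $p(y_{t+h}^*|\mathcal{Y}_t)$.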
83
The inverse of $x$ is a convex function, such that Jensen's inequality implies that $E[x]^{-1} \leq E[x^{-1}]$.
84
Importance sampling is based on iid draws from the importance density; see, for instance, Geweke (2005, Chapter 4.2.2) for further details. In the case of DSGE and DSGE-VAR models, the posterior draws are typically obtained
via Markov chain Monte Carlo, such as the random walk Metropolis sampler, and are therefore not independent.
However, under certain conditions (Tierney, 1994) the estimator in (12.40) is consistent also when the draws from
Equipped with the posterior draws $\theta^{(s)}$ and the conditional likelihood, $L(y_{t+h}^*|\mathcal{Y}_t;\theta)$, the predictive likelihood can be consistently estimated directly. There is no need to compute it from two marginal likelihoods, nor to sample from the distribution of the parameters conditional on $(y_{t+h}^*, \mathcal{Y}_t)$ for $h = 1, \ldots, H$.
A further important property of the IS estimator is that it is unbiased (see Chan and Eisenstat, 2012, Proposition 1), while the harmonic mean estimator is not. Furthermore, the latter estimator is sensitive to the choice of $f$ and can be difficult to pin down numerically when the dimension of $\theta$ is large. The IS estimator based on the posterior $p(\theta|\mathcal{Y}_t)$ should be less hampered by this. In the case of DSGE models, which are typically tightly parameterized, numerical issues with the harmonic mean should not be a major concern. For DSGE-VARs and BVAR models, however, the computations need to take all the VAR parameters into account, and this is therefore likely to be an important issue.
The IS estimator works well in practise when the draws from the importance density cover the parameter region where the conditional likelihood is large well enough. When computing the marginal predictive likelihood with $g(\theta) = p(\theta|\mathcal{Y}_t)$ this is typically the case, but it is questionable when dealing with the joint predictive likelihood as $h$ becomes large.⁸⁵ For such situations it may be useful to consider cross-entropy methods for selecting the importance density optimally, as in Chan and Eisenstat (2012), or to apply one of the other methods mentioned above or in footnote 82.
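We may also compute the numerical standard error of the IS estimator in (12.40). Assuming that $g(\theta) = p(\theta|\mathcal{Y}_t)$, the numerical standard error of $\hat{p}_{IS}(y_{t+h}^*|\mathcal{Y}_t)$ can be calculated with the Newey and West (1987) estimator, as in equation (8.2) but applied to the sequence of conditional likelihood values $L(y_{t+h}^*|\mathcal{Y}_t;\theta^{(s)})$. The numerical standard error of $\ln\hat{p}_{IS}(y_{t+h}^*|\mathcal{Y}_t)$ is then obtained by dividing this value by $\hat{p}_{IS}(y_{t+h}^*|\mathcal{Y}_t)$. That is, we multiply the standard error of the predictive likelihood by the derivative of the log predictive likelihood with respect to the predictive likelihood, as prescribed by the delta method.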
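A minimal sketch of the IS estimator with $g(\theta) = p(\theta|\mathcal{Y}_t)$, together with the delta-method standard error of its log, is given below in Python. For simplicity it uses i.i.d. draws and the plain standard error of the mean. With dependent MCMC draws, the Newey and West estimator referred to above would replace `cond_lik.std(ddof=1) / sqrt(S)`. The conjugate normal model used for the check is a hypothetical example in which the exact predictive likelihood is known.

```python
import numpy as np


def is_predictive_likelihood(cond_lik):
    """IS estimator (12.40) with g = p(theta | Y_t): the average conditional likelihood.

    cond_lik holds L(y*_{t+h} | Y_t; theta^(s)) for S posterior draws. Returns the
    estimate, its numerical standard error (iid draws assumed here) and the
    standard error of its log via the delta method."""
    S = cond_lik.size
    p_hat = cond_lik.mean()
    nse = cond_lik.std(ddof=1) / np.sqrt(S)
    return p_hat, nse, nse / p_hat          # d ln p / d p = 1 / p


# Hypothetical check: y_{t+h} | mu ~ N(mu, 1) and mu | Y_t ~ N(m, v),
# so that the exact predictive density is N(m, 1 + v).
rng = np.random.default_rng(3)
m, v, y_star = 0.5, 0.2, 1.1
mu_draws = m + np.sqrt(v) * rng.standard_normal(50000)
cond_lik = np.exp(-0.5 * (y_star - mu_draws) ** 2) / np.sqrt(2 * np.pi)
p_hat, nse, nse_log = is_predictive_likelihood(cond_lik)
exact = np.exp(-0.5 * (y_star - m) ** 2 / (1 + v)) / np.sqrt(2 * np.pi * (1 + v))
```

The estimate should lie within a few numerical standard errors of `exact` on the log scale.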
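For models with a normal likelihood, the log of the conditional likelihood for $y_{t+h}^*$ given the history and the parameters is given by

$$\ln L\bigl(y_{t+h}^*\big|\mathcal{Y}_t;\theta\bigr) = -\frac{n^*}{2}\ln(2\pi) - \frac{1}{2}\ln\bigl|\Sigma_{y^*,t+h|t}\bigr| - \frac{1}{2}\bigl(y_{t+h}^* - y_{t+h|t}^*\bigr)'\Sigma_{y^*,t+h|t}^{-1}\bigl(y_{t+h}^* - y_{t+h|t}^*\bigr), \tag{12.41}$$

where $n^*$ is the dimension of $y_{t+h}^*$, and $y_{t+h|t}^*$ and $\Sigma_{y^*,t+h|t}$ are the elements of

$$y_{t+h|t} = \mu + H'F^h\xi_{t|t}, \qquad \Sigma_{y,t+h|t} = H'P_{t+h|t}H + R, \qquad P_{t+h|t} = FP_{t+h-1|t}F' + BB', \quad h = 1, \ldots, H,$$

that correspond to $y_{t+h}^*$. Here $\xi_{t|t}$ is the filter estimate of the state variables, and $P_{t|t}$ the corresponding filter estimate of the state variable covariance matrix based on the data $\mathcal{Y}_t$. The matrices $\mu$, $H$, $R$, $F$, $B$ are all evaluated for a given value of $\theta$.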
The Kalman filter for missing observations can also be used when we are interested in the
joint predictive likelihood for subsets of variables across a sequence of future dates. To this end,
$g(\theta) = p(\theta|\mathcal{Y}_t)$ are not independent and the same conditions can be used to verify that the harmonic mean estimator
in (12.38) is consistent. In strict terms, the estimator in (12.40) is not an IS estimator when the iid assumption is
violated, but we shall nevertheless use this term also when the draws from the posterior are dependent.
85
For sufficiently large $h$ the situation resembles the case when the marginal likelihood is computed by averaging
the likelihood over the prior draws. Such an estimator typically gives a poor estimate of the marginal likelihood.
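the joint conditional log-likelihood can be decomposed as

$$\ln L\bigl(y_{t+1}^*, \ldots, y_{t+h}^*\big|\mathcal{Y}_t;\theta\bigr) = \sum_{i=1}^{h}\ln L\bigl(y_{t+i}^*\big|\mathcal{Y}_{t+i-1}^*, \mathcal{Y}_t;\theta\bigr), \tag{12.42}$$

where $\mathcal{Y}_{t+i-1}^* = \{y_{t+j}^*\}_{j=1}^{i-1}$ and

$$\ln L\bigl(y_{t+i}^*\big|\mathcal{Y}_{t+i-1}^*, \mathcal{Y}_t;\theta\bigr) = -\frac{n^*}{2}\ln(2\pi) - \frac{1}{2}\ln\bigl|\Sigma_{y^*,t+i|t+i-1}\bigr| - \frac{1}{2}\bigl(y_{t+i}^* - y_{t+i|t+i-1}^*\bigr)'\Sigma_{y^*,t+i|t+i-1}^{-1}\bigl(y_{t+i}^* - y_{t+i|t+i-1}^*\bigr). \tag{12.43}$$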
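We assume, for notational simplicity but without loss of generality, that $n^*$ and $K$ are constant across the $h$ periods, i.e., that the same variables are conditioned on and that the realizations are available. We now have that $\Sigma_{y^*,t+i|t+i-1} = K'\Sigma_{y,t+i|t+i-1}K$, $H^* = HK$, $A^* = AK$, $R^* = K'RK$, while

$$y_{t+i|t+i-1}^* = A^{*\prime}x_{t+i} + H^{*\prime}\xi_{t+i|t+i-1}, \qquad \Sigma_{y^*,t+i|t+i-1} = H^{*\prime}P_{t+i|t+i-1}H^* + R^*.$$

The 1-step-ahead state variable forecasts are given by

$$\xi_{t+i|t+i-1} = F\xi_{t+i-1|t+i-2} + G_{t+i-1}\bigl(y_{t+i-1}^* - y_{t+i-1|t+i-2}^*\bigr),$$

where the Kalman gain matrix is

$$G_{t+i-1} = FP_{t+i-1|t+i-2}H^*\Sigma_{y^*,t+i-1|t+i-2}^{-1}.$$

The 1-step-ahead state variable forecast error covariance matrix is

$$P_{t+i|t+i-1} = \bigl(F - G_{t+i-1}H^{*\prime}\bigr)P_{t+i-1|t+i-2}\bigl(F - G_{t+i-1}H^{*\prime}\bigr)' + G_{t+i-1}R^*G_{t+i-1}' + Q.$$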
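The recursion (12.42)-(12.43) can be sketched in Python and compared with a brute-force evaluation of the joint Gaussian density of the stacked $y_{t+1}^*, \ldots, y_{t+h}^*$. In this sketch a time-invariant constant `mu` stands in for $A'x_{t+i}$, and $Q$ is the state shock covariance. Both are simplifying assumptions, as are the function names and the toy matrices. The Kalman filter version needs only small matrices, while the direct version builds an $hn^* \times hn^*$ covariance matrix.

```python
import numpy as np


def log_mvn(y, mean, cov):
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (y.size * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def subset_predictive_loglik(y_star, K, xi_tt, P_tt, F, Q, H, R, mu):
    """ln L(y*_{t+1}, ..., y*_{t+h} | Y_t; theta) via (12.42)-(12.43), y* = K'y."""
    Hs, Rs, mus = H @ K, K.T @ R @ K, K.T @ mu
    xi = F @ xi_tt                    # xi_{t+1|t}
    P = F @ P_tt @ F.T + Q            # P_{t+1|t}
    total = 0.0
    for y in y_star:
        y_hat = mus + Hs.T @ xi
        Sig = Hs.T @ P @ Hs + Rs
        total += log_mvn(y, y_hat, Sig)
        gain = F @ P @ Hs @ np.linalg.inv(Sig)       # Kalman gain G
        xi = F @ xi + gain @ (y - y_hat)
        L = F - gain @ Hs.T
        P = L @ P @ L.T + gain @ Rs @ gain.T + Q      # covariance update
    return total


def direct_joint_loglik(y_star, K, xi_tt, P_tt, F, Q, H, R, mu):
    """Brute-force check: joint Gaussian density of the stacked y*."""
    h, ns = y_star.shape
    r = F.shape[0]
    Fp = [np.linalg.matrix_power(F, p) for p in range(h + 1)]
    # P_{t+i|t} = F^i P_{t|t} F^i' + sum_{k<i} F^k Q F^k'
    Pi = [Fp[i] @ P_tt @ Fp[i].T + sum((Fp[k] @ Q @ Fp[k].T for k in range(i)), np.zeros((r, r)))
          for i in range(h + 1)]
    mean = np.concatenate([K.T @ (mu + H.T @ Fp[i] @ xi_tt) for i in range(1, h + 1)])
    cov = np.zeros((h * ns, h * ns))
    for i in range(1, h + 1):
        for j in range(i, h + 1):
            block = K.T @ H.T @ Pi[i] @ Fp[j - i].T @ H @ K   # Cov(y*_{t+i}, y*_{t+j})
            if i == j:
                block = block + K.T @ R @ K
            cov[(i-1)*ns:i*ns, (j-1)*ns:j*ns] = block
            cov[(j-1)*ns:j*ns, (i-1)*ns:i*ns] = block.T
    return log_mvn(y_star.ravel(), mean, cov)


rng = np.random.default_rng(4)
F = np.array([[0.8, 0.1], [0.0, 0.6]])
Q = 0.5 * np.eye(2)
H = rng.standard_normal((2, 3))
R = 0.2 * np.eye(3)
mu = rng.standard_normal(3)
K = np.eye(3)[:, [0, 2]]              # condition on variables 1 and 3
xi_tt, P_tt = rng.standard_normal(2), 0.3 * np.eye(2)
y_star = rng.standard_normal((3, 2))
ll_kalman = subset_predictive_loglik(y_star, K, xi_tt, P_tt, F, Q, H, R, mu)
ll_direct = direct_joint_loglik(y_star, K, xi_tt, P_tt, F, Q, H, R, mu)
```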
While the outlined solution to the problem of how to calculate the log predictive score based
on the marginal predictive likelihood for a subset of the observed variables is straightforward,
the calculation of marginal likelihoods for large systems is computationally expensive when
based on posterior draws. An approximate but computationally inexpensive estimator of the
marginal likelihood is the Laplace approximation, discussed above in Section 10.1; see also
Tierney and Kadane (1986), Gelfand and Dey (1994), and Raftery (1996). It requires that the
mode of the log posterior, given by the sum of the log likelihood and the log prior, can be
computed and that its Hessian is available.
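Letting $\tilde{\theta}_t$ be the posterior mode of $\theta$ and $\tilde{\Omega}_t$ be minus the Hessian, the Laplace approximation based on the sample $\mathcal{Y}_t$ is given by

$$\ln\hat{p}_L(\mathcal{Y}_t) = \ln L(\mathcal{Y}_t;\tilde{\theta}_t) + \ln p(\tilde{\theta}_t) + \frac{1}{2}\Bigl(d\ln(2\pi) + \ln\bigl|\tilde{\Omega}_t^{-1}\bigr|\Bigr), \tag{12.44}$$

where $d$ is the dimension of $\theta$. The third term on the right hand side approximates $-\ln p(\tilde{\theta}_t|\mathcal{Y}_t)$ with $O(t^{-1})$ accuracy. Hence, the expression in (12.44) is a reflection of Bayes' theorem through what Chib (1995) calls the basic marginal likelihood identity.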
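In a model whose log posterior is exactly quadratic, the Laplace approximation (12.44) is exact, which makes for a convenient sanity check. The Python sketch below assumes the same hypothetical conjugate model as before, $y_t|\mu \sim N(\mu, 1)$ with $\mu \sim N(0, \tau^2)$ and $d = 1$. It compares (12.44) with the closed-form marginal likelihood. For DSGE models, the mode and Hessian would instead come from numerical optimisation.

```python
import numpy as np


def laplace_log_marglik(y, tau2=1.0):
    """Laplace approximation (12.44) for y_t | mu ~ N(mu, 1), mu ~ N(0, tau2), d = 1."""
    t = y.size
    omega = t + 1.0 / tau2                 # minus the Hessian of the log posterior
    mode = y.sum() / omega                 # posterior mode of mu
    log_lik = -0.5 * (t * np.log(2 * np.pi) + np.sum((y - mode) ** 2))
    log_prior = -0.5 * (np.log(2 * np.pi * tau2) + mode ** 2 / tau2)
    return log_lik + log_prior + 0.5 * (np.log(2 * np.pi) - np.log(omega))


def exact_log_marglik(y, tau2=1.0):
    """ln p(Y_t) from y ~ N(0, I + tau2 * 11')."""
    t = y.size
    cov = np.eye(t) + tau2 * np.ones((t, t))
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (t * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(cov, y))


rng = np.random.default_rng(6)
y = 0.3 + rng.standard_normal(15)
pairs = [(laplace_log_marglik(y, tau2), exact_log_marglik(y, tau2)) for tau2 in (1.0, 0.5)]
```

Outside the Gaussian case, the two values differ by the $O(t^{-1})$ error term discussed in the text.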
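Applying (12.44) to the samples $(y_{t+h}^*, \mathcal{Y}_t)$ and $\mathcal{Y}_t$, the Laplace approximation of the log marginal predictive likelihood is

$$\ln\hat{p}_L\bigl(y_{t+h}^*\big|\mathcal{Y}_t\bigr) = \ln L\bigl(y_{t+h}^*, \mathcal{Y}_t;\tilde{\theta}_{t+h}\bigr) - \ln L\bigl(\mathcal{Y}_t;\tilde{\theta}_t\bigr) + \ln p(\tilde{\theta}_{t+h}) - \ln p(\tilde{\theta}_t) + \frac{1}{2}\Bigl(\ln\bigl|\tilde{\Omega}_t\bigr| - \ln\bigl|\tilde{\Omega}_{t+h}\bigr|\Bigr), \tag{12.45}$$

where $\tilde{\theta}_{t+h}$ and $\tilde{\Omega}_{t+h}$ are the posterior mode and minus the Hessian for the sample $(y_{t+h}^*, \mathcal{Y}_t)$. The expression takes into account that $\ln|A^{-1}| = -\ln|A|$ when $A$ has full rank. Gelfand and Dey (1994) refer to (12.45) as their case (ii), and they note that the approximation has $O(t^{-2})$ accuracy. In other words, the Laplace approximation of the marginal predictive likelihood in (12.45) is more accurate than the Laplace approximation of the marginal likelihood.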
Alternatively, the log posterior for the sample $(y_{t+h}^*, \mathcal{Y}_t)$ and its Hessian can be evaluated at the parameter value $\tilde{\theta}_t$ instead of $\tilde{\theta}_{t+h}$. This has the advantage that only one posterior mode estimation is required, rather than one plus one for each $y_{t+h}^*$ ($h = 1, \ldots, H$) that we are interested in. However, the use of $\tilde{\theta}_{t+h}$ in (12.45) ensures that the first derivatives of the log posterior for the sample $(y_{t+h}^*, \mathcal{Y}_t)$ are equal to zero, while the use of $\tilde{\theta}_t$ only ensures that they are, at best, approximately zero. This fact implies an additional error source for the approximation, with the effect that its accuracy is reduced to $O(t^{-1})$, i.e., the same order of accuracy as that of the marginal likelihood approximation in (12.44).
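Replacing $\tilde{\theta}_{t+h}$ with $\tilde{\theta}_t$ in (12.45) yields the following simplified expression of the log of the marginal predictive likelihood:
$$\ln \hat{p}_L\big(y_{t+h}\big|\mathcal{Y}_t\big) = \ln L\big(y_{t+h}\big|\mathcal{Y}_t;\tilde{\theta}_t\big) + \frac{1}{2}\Big(\ln\big|\tilde{\Omega}_t\big| - \ln\big|\tilde{\Omega}_{t+h}\big|\Big), \tag{12.46}$$
where $\tilde{\Omega}_{t+h}$ and $\tilde{\Omega}_t$ are minus the Hessians of the log posteriors based on the samples $\{y_{t+h},\mathcal{Y}_t\}$ and $\mathcal{Y}_t$, respectively, evaluated at $\tilde{\theta}_t$. The first term on the right hand side of (12.46) is the log of the conditional likelihood, evaluated at the posterior mode using the data up to period $t$, and we often expect the marginal predictive likelihood to be dominated by this term. Concerning the second term, it may be noted that if the two log determinants are equal, then the Laplace approximation is equal to the plug-in estimator of the predictive likelihood. Moreover, it is straightforward to show that
$$\tilde{\Omega}_{t+h} = \tilde{\Omega}_t - \frac{\partial^2 \ln L\big(y_{t+h}\big|\mathcal{Y}_t;\tilde{\theta}_t\big)}{\partial\theta\,\partial\theta'}. \tag{12.47}$$
Since $\tilde{\Omega}_t$ accumulates the curvature from all $t$ observations, while the second term in (12.47) only reflects the curvature of the conditional likelihood for $y_{t+h}$, we may expect that the second term in (12.46) is often close to zero. Furthermore, the overall computational cost when using (12.46) is not reduced by taking the expression on the right hand side of (12.47) into account unless analytical derivatives are available.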
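The difference between (12.45) and (12.46) can be checked by hand in a toy model. The Python sketch below uses such a model, which is not a DSGE model: $y_i \sim N(\theta,1)$ with prior $\theta \sim N(0,1)$. In this model the log posterior is exactly quadratic. The Laplace approximation in (12.45) therefore reproduces the exact predictive density. The simplified version (12.46), by contrast, evaluates everything at the mode for $\mathcal{Y}_t$. All function names are illustrative and are not YADA functions.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_posterior_kernel(theta, ys):
    """ln L(Y; theta) + ln p(theta) for y_i ~ N(theta, 1) and prior theta ~ N(0, 1)."""
    return sum(log_normal_pdf(y, theta, 1.0) for y in ys) + log_normal_pdf(theta, 0.0, 1.0)

def mode_and_omega(ys):
    """Posterior mode and minus the Hessian of the log posterior (both exact here)."""
    omega = len(ys) + 1.0
    return sum(ys) / omega, omega

def laplace_full(ys, y_new):
    """Analogue of (12.45): separate posterior modes for Y_t and {y_new, Y_t}."""
    theta_t, omega_t = mode_and_omega(ys)
    theta_h, omega_h = mode_and_omega(ys + [y_new])
    return (log_posterior_kernel(theta_h, ys + [y_new])
            - log_posterior_kernel(theta_t, ys)
            + 0.5 * (math.log(omega_t) - math.log(omega_h)))

def laplace_simplified(ys, y_new):
    """Analogue of (12.46): everything is evaluated at the mode for Y_t."""
    theta_t, omega_t = mode_and_omega(ys)
    # (12.47): minus the Hessian of ln N(y_new; theta, 1) w.r.t. theta equals 1
    omega_h = omega_t + 1.0
    return (log_normal_pdf(y_new, theta_t, 1.0)
            + 0.5 * (math.log(omega_t) - math.log(omega_h)))

def exact_predictive(ys, y_new):
    """Exact log predictive density N(theta_t, 1 + 1/omega_t) in this toy model."""
    theta_t, omega_t = mode_and_omega(ys)
    return log_normal_pdf(y_new, theta_t, 1.0 + 1.0 / omega_t)

ys = [0.3, -0.1, 0.8, 0.5]
values = (laplace_full(ys, 1.2), laplace_simplified(ys, 1.2), exact_predictive(ys, 1.2))
```

In this example the log-determinant term of (12.46) coincides with the exact variance correction. The two expressions therefore differ only through the quadratic term in $y_{t+1} - \tilde{\theta}_t$. That term is scaled by 1 in (12.46) and by $1 + 1/\tilde{\Omega}_t$ in the exact density.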
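$$p\Big(\sum_{j=0}^{3}q_{t+h-j},\ldots,\sum_{j=0}^{3}q_{t+1-j}\Big|\mathcal{Y}_t\Big) = \prod_{i=1}^{h}p\Big(\sum_{j=0}^{3}q_{t+i-j}\Big|\sum_{j=0}^{3}q_{t+i-1-j},\ldots,\sum_{j=0}^{3}q_{t+1-j},\mathcal{Y}_t\Big) \tag{12.48}$$
Here $s$ denotes the number of terms that are added; it would be 4 for quarterly data and 12 for monthly. Next, for $i = 1,\ldots,s-1$ let
$$F_i = F_{i-1} + F^i,\qquad Q_i = Q_{i-1} + F_iQF_i',\qquad P_{t|t}^{(i)} = F_iP_{t|t}F_i',$$
with initialization values $F_0 = I_r$ and $Q_0 = Q$, resulting in $P_{t|t}^{(0)} = P_{t|t}$. Define the covariance matrices:
$$C_t^{(i)} = C\Big(\sum_{j=\iota}^{i}\xi_{t+j}\Big|\mathcal{Y}_t;\theta\Big), \tag{12.49}$$
$$\Sigma_t^{(i)} = C\Big(\sum_{j=\iota}^{i}y_{t+j}\Big|\mathcal{Y}_t;\theta\Big), \tag{12.50}$$
where $\iota = \max\{1, i-s+1\}$. For example, when $s = 4$ this means that for $i = 1,\ldots,4$ we add $\xi_{t+1}$ until $\xi_{t+i}$ in the $C_t^{(i)}$ covariances, and for $i \geq 5$ we add $\xi_{t+i-3}$ until $\xi_{t+i}$.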
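By throwing oneself into a good deal of tedious algebra it can be shown that
$$C_t^{(i)} = \begin{cases} FP_{t|t}^{(i-1)}F' + Q_{i-1}, & \text{if } i = 1,\ldots,s,\\[4pt] FC_t^{(i-1)}F' + Q_{s-1} - FQ_{s-2}F', & \text{if } i \geq s+1,\end{cases} \tag{12.51}$$
where $Q_{-1} = 0$. Furthermore, it can also be shown that
$$\Sigma_t^{(i)} = \begin{cases} H'C_t^{(i)}H + iR, & \text{if } i = 1,\ldots,s,\\[4pt] H'C_t^{(i)}H + sR, & \text{if } i \geq s+1.\end{cases} \tag{12.52}$$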
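The recursion for the covariances of the moving sums can be checked numerically. The Python sketch below is not part of YADA. It treats the scalar case $r = 1$ with state equation $\xi_{t+1} = f\xi_t + v_{t+1}$, $\mathrm{Var}(v_{t+1}) = q$ and $P_{t|t} = p$, and builds $F_i$, $Q_i$ and $P_{t|t}^{(i)}$ as above. It then compares two computations of the covariance. The first is the recursion $C_t^{(i)} = fP_{t|t}^{(i-1)}f + Q_{i-1}$ for $i \leq s$ and $C_t^{(i)} = fC_t^{(i-1)}f + Q_{s-1} - fQ_{s-2}f$ (with $Q_{-1} = 0$) for $i \geq s+1$. The second is a brute-force computation from the moving-average representation of the sum. The parameter values are arbitrary.

```python
def brute_force_C(f, q, p, s, i):
    """Var(sum_{j=iota}^{i} xi_{t+j} | Y_t) for xi_{t+1} = f * xi_t + v_{t+1}."""
    iota = max(1, i - s + 1)
    # Loading of the sum on xi_t, whose conditional variance is p
    var = sum(f ** j for j in range(iota, i + 1)) ** 2 * p
    # Loading of the sum on each future shock v_{t+k}, each with variance q
    for k in range(1, i + 1):
        c = sum(f ** (j - k) for j in range(max(iota, k), i + 1))
        var += c ** 2 * q
    return var

def recursive_C(f, q, p, s, i_max):
    """Scalar recursion for C_t^(i), i = 1, ..., i_max, using F_i, Q_i, P^(i)."""
    F, Q = [1.0], [q]                      # F_0 = I_r, Q_0 = Q
    for i in range(1, s):
        F.append(F[-1] + f ** i)           # F_i = F_{i-1} + F^i
        Q.append(Q[-1] + F[i] * q * F[i])  # Q_i = Q_{i-1} + F_i Q F_i'
    Q_sm2 = Q[s - 2] if s >= 2 else 0.0    # Q_{s-2}, with Q_{-1} = 0
    C = []
    for i in range(1, i_max + 1):
        if i <= s:
            P_prev = F[i - 1] * p * F[i - 1]  # P_{t|t}^{(i-1)}
            C.append(f * P_prev * f + Q[i - 1])
        else:
            C.append(f * C[-1] * f + Q[s - 1] - f * Q_sm2 * f)
    return C
```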
Provided that all variables in the y vector are in first dierences, the conditional likelihood
where $\Sigma_{z,t}^{(i)} = K'\Sigma_t^{(i)}K$ and $z_{t+i|t} = \sum_{j=\iota}^{i}y_{t+j|t}$.
When some variables in the $y$ vector are in first differences and some in levels, the actuals ($y_{t+i}$), the forecasts ($y_{t+i|t}$), and the covariance matrices ($\Sigma_t^{(i)}$) need to take this into account. For the actuals and the forecasts we simply avoid summing current and past values for the variables in levels, while the diagonal elements of the covariances $\Sigma_t^{(i)}$ corresponding to levels variables are replaced with the diagonal elements of $\Sigma_{y,t+i|t}$ from these positions. It remains to replace the off-diagonal elements of the $\Sigma_t^{(i)}$ matrices, which should represent covariances between first differenced variables and levels variables, with the correct covariances.
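To determine the covariances between sums of first differenced variables and levels variables, define
$$\Upsilon_t^{(i)} = C\Big(\xi_{t+i},\sum_{j=\iota}^{i}\xi_{t+j}\Big|\mathcal{Y}_t;\theta\Big), \tag{12.54}$$
$$\Psi_t^{(i)} = C\Big(y_{t+i},\sum_{j=\iota}^{i}y_{t+j}\Big|\mathcal{Y}_t;\theta\Big). \tag{12.55}$$
It can be shown that
$$\Upsilon_t^{(i)} = \begin{cases} F\Upsilon_t^{(i-1)} + P_{t+i|t}, & \text{if } i = 1,\ldots,s,\\[4pt] F\Upsilon_t^{(i-1)}F' + F_{s-1}Q, & \text{if } i \geq s+1,\end{cases} \tag{12.56}$$
with $\Upsilon_t^{(0)} = 0$.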
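A similar scalar check can be made for the covariance between $\xi_{t+i}$ and the moving sum $\sum_{j=\iota}^{i}\xi_{t+j}$, denoted $\Upsilon_t^{(i)}$ here. The sketch assumes the same scalar state equation as in the previous check and uses $P_{t+i|t} = fP_{t+i-1|t}f + q$. It runs the recursion $\Upsilon_t^{(i)} = f\Upsilon_t^{(i-1)} + P_{t+i|t}$ for $i \leq s$ and $\Upsilon_t^{(i)} = f\Upsilon_t^{(i-1)}f + F_{s-1}q$ for $i \geq s+1$, starting from $\Upsilon_t^{(0)} = 0$. The function names are hypothetical.

```python
def brute_force_D(f, q, p, s, i):
    """Cov(xi_{t+i}, sum_{j=iota}^{i} xi_{t+j} | Y_t) for the scalar AR(1) state."""
    iota = max(1, i - s + 1)
    # Common dependence on xi_t: loadings f^i and sum_j f^j
    cov = f ** i * sum(f ** j for j in range(iota, i + 1)) * p
    # Common dependence on each shock v_{t+k}
    for k in range(1, i + 1):
        c_sum = sum(f ** (j - k) for j in range(max(iota, k), i + 1))
        cov += f ** (i - k) * c_sum * q
    return cov

def recursive_D(f, q, p, s, i_max):
    """Scalar recursion for Upsilon_t^(i), i = 1, ..., i_max."""
    F_sm1 = sum(f ** m for m in range(s))  # F_{s-1} = I + F + ... + F^{s-1}
    P, D = p, 0.0                          # P_{t|t} and Upsilon_t^(0) = 0
    out = []
    for i in range(1, i_max + 1):
        P = f * P * f + q                  # P_{t+i|t}
        D = f * D + P if i <= s else f * D * f + F_sm1 * q
        out.append(D)
    return out
```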
i 1, . . . , h.
(12.57)
(12.59)
t 0,1
There are some attempts to determine test statistics for the multivariate case; see, e.g., Fasano and Franceschini
(1987) who consider two and three dimensions for the test. For some more recent work on the problem of comparing
multivariate distributions, see Loudin and Miettinen (2003).
12.7.1. DSGEPredictionPathsTheta
The function DSGEPredictionPathsTheta needs 11 inputs. First of all, a set of values for the parameters $\theta$ is supplied through the variable theta. To use the values properly, the vector structure thetaPositions and the structure ModelParameters that were discussed in Section 7.4 are also needed. Furthermore, the DSGE model information structure DSGEModel and the generic initialization structure CurrINI must be supplied to the function. The next input is the $k \times h$ matrix X with the values of the exogenous variables over the $h$ period long prediction sample. Next, the value of $h$ is accepted since X is empty if $k = 0$. The 8th input is called NumPaths and specifies how many prediction paths to compute, while the boolean variable AnnualizeData indicates if the prediction paths should be annualized or not. Similarly, the boolean variable TransData indicates if the data should be transformed or not. The final input is given by NameStr, which indicates the type of values that are used for $\theta$, e.g., the posterior mode estimate.
The main output from the function is the 3-dimensional matrix PredPaths and the matrices PredEventData and YObsEventData. The dimensions of the PredPaths matrix are given by the number of observed variables, the length of the prediction sample, and the number of prediction paths. The matrix PredEventData has as many rows as the number of observed variables and 8 columns; the matrix is computed through the function CalculatePredictionEvents; see Section 12.7.5. A prediction event can, for instance, be defined as non-negative inflation for a given number of consecutive periods over the prediction period; this number is always less than or equal to the length of the prediction period. Similarly, the matrix YObsEventData has the same dimension as PredEventData and holds prediction event data when the mean of the predictive distribution is equal to the realized values for the observed data.
12.7.2. DSGEPredictionPaths
The function DSGEPredictionPaths requires 13 inputs. Relative to the previous function, DSGEPredictionPathsTheta, there are two additional inputs (the first and the last), and the second last input variable is different from the last input of DSGEPredictionPathsTheta. Before the thetaMode input, the current function accepts the matrix thetaPostSample with NumDraws rows and NumParam columns. Despite the name, this matrix can hold draws either from the posterior or from the prior distribution. Similarly, thetaMode can be the posterior mode estimates as well as the initial values of the parameters $\theta$.
The number of posterior draws that are used can vary irrespective of how many posterior draws are available, while the number of draws from the prior is, in principle, arbitrary. Typically, the number of draws of $\theta$ that are sent to this function is small, such as 500 or 1,000. The last two input variables used by the function are CurrChain, an integer that indicates the MCMC chain number, and IsPosterior, a boolean variable which is one if the parameter draws are taken from the posterior distribution and zero if they are taken from the prior.
As output the function provides 6 variables. The first is the boolean DoneCalc that indicates if all calculations were performed or not. The next is the matrix PredEventData with prediction event results. The final 4 variables provide the data on the prediction variance decompositions for the observed variables over the whole prediction horizon. These variables are called StateCov, ShockCov, MeasureCov, and ParameterCov, respectively. Apart from MeasureCov, these are all 3D matrices with dimensions $n \times n \times h$, where $h$ is the length of the prediction horizon, while MeasureCov is $n \times n$.
The prediction paths are not directly sent as output from the function. They are instead written to disk in mat-files, one for each parameter draw. In each file the 3D matrix PredPaths is stored; its dimensions are given by the number of observed variables, the length of the prediction sample, and the number of prediction paths.
12.7.3. DSGECondPredictionPathsTheta
The function DSGECondPredictionPathsTheta needs 13 inputs. The first 6 and the last 5 are the same inputs as the function DSGEPredictionPathsTheta takes. The 2 additional inputs naturally refer to the conditioning information. Specifically, the 7th input is given by Z, a $q_m \times g$ matrix with the conditioning data $[z_{T+1}\ \cdots\ z_{T+g}]$, while the 8th input variable is called U, a $q_m \times g$ matrix with the initial values $[u_{T-g+1}\ \cdots\ u_T]$ for the conditioning; cf. equation (12.6).
The main output from the function is the 3-dimensional matrix PredPaths, the matrices with prediction event test results, PredEventData and YObsEventData, and the modesty results, MultiModestyStat, UniModestyStat, and UniModestyStatLZ. The dimension of PredPaths is given by the number of observed variables, the length of the prediction sample, and the number of prediction paths. The matrix PredEventData (and YObsEventData) has as many rows as the number of observed variables and 8 columns. These matrices are computed by the function CalculatePredictionEvents. A prediction event can, for instance, be defined as non-negative inflation for a given number of consecutive periods over the prediction period; this number is always less than or equal to the length of the prediction period. The difference between PredEventData and YObsEventData is that the latter matrix holds prediction event results when the mean of the predictive distribution has been set equal to the realized values of the observed variables.
The modesty statistics are only calculated when AnnualizeData is zero. When this condition is met, MultiModestyStat is a matrix of dimension NumPaths times 2, where the two columns hold two versions of the multivariate modesty statistic $\mathcal{T}_{T,g}$. The matrix UniModestyStat has dimension NumPaths times $n$ and gives the univariate modesty statistics in equation (12.28), while UniModestyStatLZ is a vector with the $n$ values of the univariate Leeper-Zha related modesty statistic.
12.7.4. DSGECondPredictionPaths
The function DSGECondPredictionPaths for computing the conditional predictive distribution
requires 15 inputs. The first 7 and the last 6 input variables are the same as those that the
function DSGEPredictionPaths accepts. The two additional inputs refer to the same data that
the function DSGECondPredictionPathsTheta requires, i.e., to Z and U.
Moreover, as in the case of DSGEPredictionPaths for the unconditional predictive distribution, the majority of the output from this function is not sent through its output arguments but is written to disk. For instance, the prediction paths are written to disk in mat-files, one for
each parameter draw. In each file the 3D matrix PredPaths is stored. Its dimensions are given
by the number of observed variables, the length of the prediction sample, and the number of
prediction paths. Moreover, the multivariate and univariate modesty statistics are calculated
and saved to disk provided that the AnnualizeData variable is zero (no annualization).
The function provides the same 6 output arguments as the DSGEPredictionPaths function.
12.7.5. CalculatePredictionEvents
The function CalculatePredictionEvents computes prediction event and risk analysis data
from the simulated prediction paths. The function requires 2 input variables: PredPaths and
PredictionEvent. The first variable is the familiar matrix with all prediction paths for a given
parameter value, while the second is a matrix that holds the prediction event information. The
number of rows of PredictionEvent is equal to the number of variables, and the number of columns is three: the upper bound of the event, the lower bound, and the number of consecutive periods for the event.
As output the function provides the matrix PredEventData. The number of rows is equal to the number of variables, while the number of columns is 8. The first column gives the number of times the event is true, i.e., the number of times that the paths contain values that fall within the upper and lower bound for the number of periods of the event. The second column holds the number of times that the paths are below the lower bound of the event for the length of the event, the third provides the number of times that the paths are above the upper bound of the event for the length of the event, while the 4th column has the total number of times that the event could be true. The 5th and 6th columns store the sum of deviations and the sum of squared deviations from the lower bound of the event when the paths are below the lower bound for the required length of the event. Similarly, the 7th and 8th columns hold the sum of deviations and the sum of squared deviations from the upper bound of the event when the paths are above the upper bound for the required length of the event.
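The counting in the first four columns can be sketched for a single variable. The Python code below is a hypothetical illustration and not the code of CalculatePredictionEvents. It assumes that an event is checked over every window of the required number of consecutive periods along each path, which is one possible reading of the description above. It also leaves out the deviation sums in columns 5 to 8.

```python
def prediction_event_counts(paths, lower, upper, length):
    """Count windows of `length` consecutive periods, per path, lying entirely
    inside [lower, upper], entirely below lower, or entirely above upper.

    Hypothetical helper for one variable; `paths` is a list of equally long lists.
    Returns (inside, below, above, possible), mirroring columns 1-4 above."""
    inside = below = above = possible = 0
    for path in paths:
        for start in range(len(path) - length + 1):
            window = path[start:start + length]
            possible += 1
            if all(lower <= x <= upper for x in window):
                inside += 1
            elif all(x < lower for x in window):
                below += 1
            elif all(x > upper for x in window):
                above += 1
    return inside, below, above, possible

example_paths = [[0.5, 1.0, 1.5], [-1.0, -2.0, 0.5], [3.0, 4.0, 5.0]]
counts = prediction_event_counts(example_paths, lower=0.0, upper=2.0, length=2)
```

Under this reading, column 4 equals the number of paths times the number of windows per path.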
12.7.6. DSGEPredictiveLikelihoodTheta
The function DSGEPredictiveLikelihoodTheta computes the joint and the marginal predictive
likelihood for fixed parameter values using the Laplace approximation. The function requires
17 input variables: theta, thetaPositions, thetaIndex, thetaDist, PriorDist, LowerBound,
UniformBounds, ModelParameters, YData, X, IsOriginal, IsPlugin, and also FirstPeriod,
LastPeriod, StepLength, DSGEModel, and CurrINI. Most of these input variables have been
described above. The matrix YData contains the realizations of the observed variables over the
forecast sample, X is a matrix with data on the deterministic variables over the same sample,
IsOriginal is a boolean variable that takes the value 1 if the original data should be forecasted
and 0 if annualized data should be predicted, while IsPlugin is a boolean variable that takes the
value 1 if only the plugin estimator should be computed and 0 if also the Laplace approximation
should be calculated. Finally, StepLength is the step length used by the function that estimates the Hessian matrix using finite differences; see Abramowitz and Stegun (1964, p. 884), formulas 25.3.24 and 25.3.27.
The function provides six required output variables and two optional ones. The required variables are JointPDH and MargPDH, vectors with the joint and marginal predictive likelihood values, respectively. Furthermore, the variable LaplaceMargLike is a scalar with the value of the marginal likelihood for the historical sample using the Laplace approximation, while the cell array PredVars contains vectors with positions of the predicted variables among all observed variables. The vectors MargPlugin and JointPlugin give the plugin estimates of the marginal and joint predictive likelihood, i.e., when the second term involving the Hessian matrices in equation (12.46) is dropped. The optional output variables are status and kalmanstatus, which have been discussed above.
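$$s_y(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\Gamma_y(h)\exp(-i\omega h), \tag{13.1}$$
where $i = \sqrt{-1}$ and $\omega$ is the angular frequency measured in radians, while $\exp(-i\omega h)$ is a point on the unit circle for all $\omega \in \mathbb{R}$ and integers $h$; see, e.g., Fuller (1976, Chapter 4) or Hamilton (1994, Chapters 6 and 10). De Moivre's theorem allows us to write
$$\exp(-i\omega h) = \cos(\omega h) - i\sin(\omega h).$$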
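Making use of some well known results from trigonometry,87 together with $\Gamma_y(-h) = \Gamma_y(h)'$, it can be shown that the population spectrum can be rewritten as
$$s_y(\omega) = \frac{1}{2\pi}\Big[\Gamma_y(0) + \sum_{h=1}^{\infty}\big(\Gamma_y(h)+\Gamma_y(h)'\big)\cos(\omega h) - i\sum_{h=1}^{\infty}\big(\Gamma_y(h)-\Gamma_y(h)'\big)\sin(\omega h)\Big]. \tag{13.2}$$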
From (13.2) it can be seen that the diagonal elements of the population spectrum are real and symmetric around zero.88 The off-diagonal elements (cross-spectrum) are complex, while the modulus of each such element is symmetric around zero. This follows by noticing that the real part of the spectrum is symmetric, while the imaginary part is skew-symmetric.89 Hence, the population spectrum is a Hermitian matrix, so that $s_y(\omega) = s_y(\omega)^*$, i.e., the complex equivalent of a symmetric matrix. The real part of the cross-spectrum is called the co-spectrum, while the imaginary part is known as the quadrature spectrum.
Moreover, since the sine and cosine functions are periodic such that $\cos(a) = \cos(a+2\pi k)$ and $\sin(a) = \sin(a+2\pi k)$ for any integer $k$, the population spectrum is also a periodic function of $\omega$ with period $2\pi$. Hence, if we know the value of a diagonal element of the spectrum for all $\omega$ between 0 and $\pi$, we can infer the value of this element for any $\omega$. Similarly, if we know the value of the modulus of an off-diagonal element of the spectrum for all $\omega$ between 0 and $\pi$, we can infer the value of the modulus for any $\omega$.
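The equivalence of (13.1) and (13.2), and the Hermitian property, can be verified numerically for a process whose autocovariances vanish beyond lag 1. The Python sketch below uses made-up autocovariance matrices $\Gamma_y(0)$ and $\Gamma_y(1)$ for a bivariate series and sets $\Gamma_y(-h) = \Gamma_y(h)'$. It then evaluates both expressions with plain nested lists.

```python
import cmath
import math

def transpose(A):
    """Standard (not conjugate) transpose of a list-of-lists matrix."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def spectrum_13_1(gammas, omega):
    """(13.1) with Gamma(-h) = Gamma(h)' and Gamma(h) = 0 beyond the given lags."""
    n = len(gammas[0])
    s = [[0j] * n for _ in range(n)]
    for h, G in enumerate(gammas):
        terms = [(G, h)] if h == 0 else [(G, h), (transpose(G), -h)]
        for M, lag in terms:
            z = cmath.exp(-1j * omega * lag)
            for a in range(n):
                for b in range(n):
                    s[a][b] += M[a][b] * z / (2 * math.pi)
    return s

def spectrum_13_2(gammas, omega):
    """(13.2): cosine terms with Gamma(h) + Gamma(h)', sine terms with Gamma(h) - Gamma(h)'."""
    n = len(gammas[0])
    s = [[gammas[0][a][b] / (2 * math.pi) + 0j for b in range(n)] for a in range(n)]
    for h in range(1, len(gammas)):
        G, Gt = gammas[h], transpose(gammas[h])
        for a in range(n):
            for b in range(n):
                s[a][b] += ((G[a][b] + Gt[a][b]) * math.cos(omega * h)
                            - 1j * (G[a][b] - Gt[a][b]) * math.sin(omega * h)) / (2 * math.pi)
    return s

# Illustrative autocovariances: Gamma(0) (symmetric) and Gamma(1)
gammas = [[[2.0, 0.5], [0.5, 1.0]], [[0.4, 0.3], [-0.2, 0.1]]]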
Translating frequencies into time periods we let $\omega_j = 2\pi j/T$, where the frequency $\omega_j$ has a period of $T/j$ time units (months, quarters, or years). This means that the frequency is equal to $2\pi$ divided by the period in the selected time units. As an example, suppose that low frequencies are regarded as those with a period of 8 years or more, business cycle frequencies those with a period of, say, 1 to 8 years, while high frequencies are those with a period less than 1 year. For quarterly time units, these periods imply that low frequencies are given by $\omega \in [0,\pi/16]$, business cycle frequencies by $\omega \in [\pi/16,\pi/2]$, while high frequencies are $\omega \in [\pi/2,\pi]$.
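The mapping from periods to frequencies in this example takes one line of code. The hypothetical Python helper below also assigns a frequency to one of the three quarterly bands used in the text. Boundary frequencies are assigned to the lower band.

```python
import math

def frequency_from_period(period):
    """Angular frequency (in radians) of a cycle that lasts `period` time units."""
    return 2.0 * math.pi / period

def quarterly_band(omega):
    """Band of omega for quarterly data, using cut-offs at 8 years and 1 year."""
    low_cut = frequency_from_period(8 * 4)   # 32 quarters -> pi/16
    high_cut = frequency_from_period(1 * 4)  # 4 quarters  -> pi/2
    if omega <= low_cut:
        return "low"
    if omega <= high_cut:
        return "business cycle"
    return "high"
```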
13.2. Spectral Decompositions
There is an inverse transformation of (13.1) that allows us to retrieve the autocovariance matrices from the spectral density. Specifically,
$$\Gamma_y(h) = \int_{-\pi}^{\pi}s_y(\omega)\exp(i\omega h)\,d\omega. \tag{13.3}$$
87 Specifically, recall that $\cos(0) = 1$, $\cos(-a) = \cos(a)$, $\sin(0) = 0$, and $\sin(-a) = -\sin(a)$; see, e.g., Hamilton (1994, Appendix A) for some details.
88 In addition, the diagonal elements of the population spectrum are non-negative for all $\omega$; see, e.g., Fuller (1976, Theorem 3.1.9).
89
For $h = 0$ it thus follows that the area under the population spectrum is equal to the contemporaneous covariance matrix of $y_t$. That is,
$$\Gamma_y(0) = \int_{-\pi}^{\pi}s_y(\omega)\,d\omega. \tag{13.4}$$
For a diagonal element of the population spectrum of the observed variables we can therefore consider 2 times the area between 0 and $\omega$ to represent the share of the variance of the corresponding element of $y$ that can be attributed to periodic random components with frequency less than or equal to $\omega$.
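As a numerical illustration of (13.4), consider a univariate AR(1) process $y_t = \phi y_{t-1} + e_t$ with $\mathrm{Var}(e_t) = \sigma^2$. Its population spectrum is $\sigma^2/\big(2\pi(1 - 2\phi\cos\omega + \phi^2)\big)$ and its variance is $\sigma^2/(1-\phi^2)$. The Python sketch below integrates the spectrum with the midpoint rule. The AR(1) example and the parameter values are assumptions chosen for illustration.

```python
import math

def ar1_spectrum(omega, phi, sigma2):
    """Population spectrum of y_t = phi * y_{t-1} + e_t with Var(e_t) = sigma2."""
    return sigma2 / (2.0 * math.pi * (1.0 - 2.0 * phi * math.cos(omega) + phi ** 2))

def twice_area(upper, phi, sigma2, n=4000):
    """2 * integral of the spectrum over [0, upper], using the midpoint rule."""
    step = upper / n
    return 2.0 * step * sum(ar1_spectrum((k + 0.5) * step, phi, sigma2) for k in range(n))

phi, sigma2 = 0.8, 1.0
variance = sigma2 / (1.0 - phi ** 2)                          # Gamma_y(0)
share_low = twice_area(math.pi / 16, phi, sigma2) / variance  # periods of 8 years or more
```

For quarterly data, share_low is the fraction of the variance due to components with a period of 8 years or more.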
The conditional covariances of the state variables in the state-space model are given in equation (11.40), while the conditional covariances of the observed variables are shown in equation (11.41). The covariance matrix of the state variables, $\Sigma_\xi$, is related to the conditional covariances of these variables through
$$\Sigma_\xi = \sum_{j=1}^{q}\Sigma_\xi^{(j)}. \tag{13.5}$$
This means that the contemporaneous covariance matrix of the observed variables, $\Gamma_y(0)$, can be expressed as
$$\Gamma_y(0) = \sum_{j=1}^{q}H'\Sigma_\xi^{(j)}H + R. \tag{13.6}$$
Let $s_\xi(\omega)$ be the population spectrum of the state variables, while $s_\xi^{(j)}(\omega)$ is the spectrum of the state variables when all shocks are zero except $\eta_j$, for $j = 1,\ldots,q$. Since the state-space model is linear and the shocks and measurement errors are uncorrelated, the spectrum of the state variables is equal to the sum of these spectra; see, e.g., Fuller (1976, Theorem 4.4.1) for a generalization of this property. That is,
$$s_\xi(\omega) = \sum_{j=1}^{q}s_\xi^{(j)}(\omega). \tag{13.7}$$
Moreover, let $s_w(\omega)$ denote the spectrum of the measurement errors. Since these errors are just white noise, it follows that $s_w(\omega) = (1/2\pi)R$ is constant.
The population spectrum of the observed variables is equal to the weighted sum of the spectrum of the state variables plus the spectrum of the measurement errors. Specifically,
$$s_y(\omega) = H's_\xi(\omega)H + \frac{1}{2\pi}R = \sum_{j=1}^{q}H's_\xi^{(j)}(\omega)H + \frac{1}{2\pi}R. \tag{13.8}$$
Combining this with the results on the conditional covariances above it follows that
$$H'\Sigma_\xi^{(j)}H = \int_{-\pi}^{\pi}H's_\xi^{(j)}(\omega)H\,d\omega.$$
The matrix $H's_\xi^{(j)}(\omega)H$ is the spectrum of the observed variables when there are no measurement errors and when all shocks are zero except for $\eta_j$. From equation (13.4) it follows for a diagonal element of $H's_\xi^{(j)}(\omega)H$ that 2 times the area under this spectrum between 0 and $\pi$ is equal to the share of the contemporaneous variance of the corresponding element of the observed variables that is due to state shock $j$.
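The decomposition can be illustrated with a scalar state-space model with two shocks: $\xi_t = \phi\xi_{t-1} + b_1\eta_{1,t} + b_2\eta_{2,t}$ and $y_t = H\xi_t + w_t$ with $\mathrm{Var}(w_t) = R$. All shocks are assumed mutually uncorrelated with unit variance, and the parameter values below are assumptions. Two times the area under $H^2 s_\xi^{(j)}(\omega)$ between 0 and $\pi$ should equal $H^2b_j^2/(1-\phi^2)$. Adding $R$ to the sum of these contributions should give $\Gamma_y(0)$.

```python
import math

def shock_spectrum(omega, phi, b):
    """State spectrum when only the shock with loading b is active (unit shock variance)."""
    z = complex(math.cos(omega), -math.sin(omega))  # exp(-i * omega)
    return b ** 2 / (2.0 * math.pi * abs(1.0 - phi * z) ** 2)

def twice_area(func, n=4000):
    """2 * integral of func over [0, pi], using the midpoint rule."""
    step = math.pi / n
    return 2.0 * step * sum(func((k + 0.5) * step) for k in range(n))

phi, loadings, H, R = 0.6, [1.0, 0.5], 2.0, 0.3
# Contribution of shock j: 2 * area under H^2 * s_xi^(j)(omega) on [0, pi]
contributions = [twice_area(lambda w, b=b: H ** 2 * shock_spectrum(w, phi, b)) for b in loadings]
# The flat measurement error spectrum R / (2 pi) contributes exactly R
gamma0 = sum(contributions) + R
```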
13.3. Estimating the Population Spectrum for a State-Space Model
From equation (5.42) we know that if $F$ has all eigenvalues inside the unit circle, then the autocovariances of the state-space model exist and can be expressed as
$$\Gamma_y(h) = \begin{cases} H'\Sigma_\xi H + R, & \text{if } h = 0,\\[4pt] H'F^h\Sigma_\xi H, & \text{for } h = 1,2,\ldots\end{cases} \tag{13.9}$$
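$$\hat{\Gamma}_y(h) = \frac{1}{T}\sum_{t=h+1}^{T}\big(y_t - \bar{y}_t\big)\big(y_{t-h} - \bar{y}_{t-h}\big)', \quad h = 0,1,\ldots,\bar{h},$$
where $\bar{h} < T$ and
$$\bar{y}_t = \Big[\sum_{t=1}^{T}y_tx_t'\Big]\Big[\sum_{t=1}^{T}x_tx_t'\Big]^{-1}x_t.$$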
t1
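$$\hat{s}_y(\omega) = \frac{1}{2\pi}\Big[\hat{\Gamma}_y(0) + \sum_{h=1}^{\bar{h}}\kappa(h)\big(\hat{\Gamma}_y(h)+\hat{\Gamma}_y(h)'\big)\cos(\omega h) - i\sum_{h=1}^{\bar{h}}\kappa(h)\big(\hat{\Gamma}_y(h)-\hat{\Gamma}_y(h)'\big)\sin(\omega h)\Big], \tag{13.12}$$
where $\kappa(h)$ denotes the weight that the chosen kernel assigns to lag $h$.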
Notice that $\exp(-i\omega)$ takes us along the unit circle from 1 to $-1$ for the real part and from 0 into negative values and back to 0 for the imaginary part as $\omega$ goes from 0 to $\pi$. Similarly, $\exp(i\omega)$ traces out the positive region of the unit circle for the imaginary part as $\omega$ goes from 0 to $\pi$, while the real part again begins at 1 and ends at $-1$. That is, $\exp(-i\omega)$ and $\exp(i\omega)$ are mirror images along the unit circle.
90
One popular estimate of the spectrum uses the modified Bartlett kernel, which is given by
$$\kappa(h) = \frac{\bar{h}+1-h}{\bar{h}+1}, \quad h = 1,2,\ldots,\bar{h}.$$
From a practical perspective, the main problem with the non-parametric estimator in (13.12) is how to choose $\bar{h}$; see, e.g., Hamilton (1994) for additional discussions. For the selected value of $\bar{h}$ we can compare the estimated population spectrum for the simulated data to the estimated population spectrum for the observed data. With simulated data we may consider, say, $S$ simulated paths per parameter value for $\theta$ and $P$ different parameter values from the posterior distribution, thus yielding an estimate of the posterior distribution of the non-parametric estimate of the population spectrum.
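A scalar version of the estimator in (13.12) with the modified Bartlett weights can be sketched in Python. The sketch assumes that $x_t$ only contains a constant, so that $\bar{y}_t$ is the sample mean. In the scalar case $\hat{\Gamma}_y(h) + \hat{\Gamma}_y(h)' = 2\hat{\Gamma}_y(h)$ and the sine terms vanish. The series y is an arbitrary deterministic example.

```python
import math

def sample_autocov(y, h):
    """(1/T) * sum_{t=h+1}^{T} (y_t - ybar)(y_{t-h} - ybar), with x_t a constant."""
    T = len(y)
    ybar = sum(y) / T
    return sum((y[t] - ybar) * (y[t - h] - ybar) for t in range(h, T)) / T

def bartlett_spectrum(y, omega, h_bar):
    """Scalar version of (13.12) with modified Bartlett weights (h_bar + 1 - h) / (h_bar + 1)."""
    s = sample_autocov(y, 0)
    for h in range(1, h_bar + 1):
        kappa = (h_bar + 1 - h) / (h_bar + 1)
        s += 2.0 * kappa * sample_autocov(y, h) * math.cos(omega * h)
    return s / (2.0 * math.pi)

y = [math.sin(0.7 * t) + 0.05 * t for t in range(60)]
estimate = [bartlett_spectrum(y, k * math.pi / 10, h_bar=5) for k in range(11)]
```

The cosine terms integrate to zero over $[0,\pi]$. Two times the area under $\hat{s}_y(\omega)$ between 0 and $\pi$ therefore returns $\hat{\Gamma}_y(0)$ for any choice of $\bar{h}$, which mirrors (13.4).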
13.5. Filters
The population spectra for the state-space model and the VAR model are special cases of the filter $v_t = A(L)w_t$, where $w_t$ is a covariance stationary vector time series of dimension $k$ and the $m \times k$ polynomial
$$A(z) = \sum_{j=-\infty}^{\infty}A_jz^j$$
is absolutely summable. Assuming that the population spectrum of $w_t$ is given by $s_w(\omega)$, the population spectrum of the $m$ dimensional time series $v_t$ is
$$s_v(\omega) = A\big(\exp(-i\omega)\big)s_w(\omega)A\big(\exp(i\omega)\big)', \quad \omega \in [-\pi,\pi]; \tag{13.13}$$
see, e.g., Fuller (1976, Theorem 4.4.1) and Hamilton (1994, Chapter 10). A very simple example of such a filter is when we wish to transform a vector $w_t$ from quarterly differences, $(1-z)$, to annual differences, $(1-z^4)$. This means that $A(z) = (1+z+z^2+z^3)I_k$, so that $v_t$ is the sum of the current and previous 3 quarterly changes.
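In the scalar case, (13.13) says that the spectrum of the annual differences equals $|A(\exp(-i\omega))|^2$ times the spectrum of the quarterly differences. The Python sketch below computes this squared gain for $A(z) = 1 + z + z^2 + z^3$. Through the lags argument it also covers the monthly analogue. The function name is hypothetical.

```python
import cmath

def filter_gain(omega, lags=4):
    """|A(exp(-i omega))|^2 for the scalar filter A(z) = 1 + z + ... + z^(lags - 1)."""
    value = sum(cmath.exp(-1j * omega * j) for j in range(lags))
    return abs(value) ** 2

# Squared gain of the quarterly annualization filter at a few frequencies
gains = {w: filter_gain(w) for w in (0.0, cmath.pi / 4, cmath.pi / 2, cmath.pi)}
```

The gain equals 16 at frequency zero and zero at $\omega = \pi/2$ and $\omega = \pi$. The annualization filter thus amplifies the low frequencies and removes cycles with periods of exactly four and two quarters.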
13.6. Coherence
Let $s_{y_k,y_l}(\omega)$ denote the element in row $k$ and column $l$ of $s_y(\omega)$. The coherence between $y_k$ and $y_l$ is given by
$$R^2_{y_k,y_l}(\omega) = \frac{|s_{y_k,y_l}(\omega)|^2}{s_{y_k,y_k}(\omega)s_{y_l,y_l}(\omega)}, \quad \omega \in [0,\pi], \tag{13.14}$$
where $|a|$ denotes the modulus of $a$ and the terms in the denominator are assumed to be non-zero. In the event that one of them is zero, the statistic is zero. It can be shown that $0 \leq R^2_{y_k,y_l}(\omega) \leq 1$ for all $\omega$ as long as $y$ is covariance stationary with absolutely summable autocovariances; see, e.g., Fuller (1976, p. 156). The coherence statistic thus measures the squared correlation between $y_k$ and $y_l$ at frequency $\omega$, i.e., $R^2_{y_k,y_l}(\omega) = R^2_{y_l,y_k}(\omega)$ for all pairs $(y_k,y_l)$.
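The Python sketch below combines (13.13) and (13.14) for a hypothetical bivariate example: $y_{1,t} = e_{1,t}$ and $y_{2,t} = e_{1,t-1} + e_{2,t}$. Here $e_{1,t}$ and $e_{2,t}$ are uncorrelated white noise with standard deviations sigma1 and sigma2. With unit variances the coherence is 0.5 at every frequency, since half of the variation in $y_{2,t}$ comes from the lagged $y_{1,t}$.

```python
import cmath
import math

def bivariate_spectrum(omega, sigma1=1.0, sigma2=1.0):
    """s_y(omega) via (13.13) for y1_t = e1_t and y2_t = e1_{t-1} + e2_t."""
    def A(z):
        # Rows give y1 and y2 as functions of (e1_t, e2_t); z plays the role of the lag operator
        return [[1.0, 0.0], [z, 1.0]]
    A_minus, A_plus = A(cmath.exp(-1j * omega)), A(cmath.exp(1j * omega))
    s_w = [[sigma1 ** 2 / (2 * math.pi), 0.0], [0.0, sigma2 ** 2 / (2 * math.pi)]]
    # s_y = A(exp(-i w)) s_w A(exp(i w))', with ' the standard transpose
    return [[sum(A_minus[k][a] * s_w[a][b] * A_plus[l][b] for a in range(2) for b in range(2))
             for l in range(2)] for k in range(2)]

def coherence(s, k, l):
    """(13.14); defined as zero when a diagonal element is zero."""
    denom = (s[k][k] * s[l][l]).real
    return 0.0 if denom == 0 else abs(s[k][l]) ** 2 / denom
```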
We can similarly define a coherence statistic for pairs of state variables in the state-space model. Letting $s_{\xi_k,\xi_l}(\omega)$ denote the element in row $k$ and column $l$ of the population spectrum of the state variables, $s_\xi(\omega)$, it follows that
$$R^2_{\xi_k,\xi_l}(\omega) = \frac{|s_{\xi_k,\xi_l}(\omega)|^2}{s_{\xi_k,\xi_k}(\omega)s_{\xi_l,\xi_l}(\omega)}, \quad \omega \in [0,\pi],$$
is the coherence between $\xi_k$ and $\xi_l$. Again we note that the statistic is zero whenever one of the terms in the denominator is zero.
Finally, it may be noted that if $q = 1$ and $R = 0$, then the coherence between two observed variables is either zero or one, since the spectral density matrix then has rank one at each frequency. Hence, there is not any point in computing coherence statistics conditional on all shocks being zero except one.
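$k,l = 1,\ldots,n$, $k \neq l$.
$$s_{y_k,y_l}(\omega) = a_{y_k,y_l}(\omega)\exp\big(i\varphi_{y_k,y_l}(\omega)\big), \tag{13.15}$$
where $a_{y_k,y_l}(\omega) = |s_{y_k,y_l}(\omega)|$ is called the cross amplitude spectrum and $\varphi_{y_k,y_l}(\omega)$ the phase spectrum; see, e.g., Fuller (1976, p. 159) or Sargent (1987, p. 269). The latter object can be computed from
$$\varphi_{y_k,y_l}(\omega) = \arctan\bigg(\frac{q_{y_k,y_l}(\omega)}{c_{y_k,y_l}(\omega)}\bigg), \tag{13.16}$$
where $c_{y_k,y_l}(\omega)$ and $q_{y_k,y_l}(\omega)$ are the co-spectrum and the quadrature spectrum, i.e., the real and imaginary parts of $s_{y_k,y_l}(\omega)$, and arctan denotes the arctangent, i.e., the inverse tangent.91 The gain of $y_l$ over $y_k$ can be defined as
$$g_{y_k,y_l}(\omega) = \frac{a_{y_k,y_l}(\omega)}{s_{y_k,y_k}(\omega)}, \tag{13.17}$$
for those $\omega$ where $s_{y_k,y_k}(\omega) > 0$. The gain is sometimes also defined as the numerator in (13.17); see, e.g., Sargent (1987) or Christiano and Vigfusson (2003). The cross amplitude tells how the amplitude in $y_l$ is multiplied in contributing to the amplitude of $y_k$ at frequency $\omega$. Similarly, the phase statistic gives the lead of $y_k$ over $y_l$ at frequency $\omega$. Specifically, if $\varphi_{y_k,y_l}(\omega) > 0$, then $y_k$ leads $y_l$ by $\varphi_{y_k,y_l}(\omega)/\omega$ periods, while $\varphi_{y_k,y_l}(\omega) < 0$ means that $y_l$ leads $y_k$ by $-\varphi_{y_k,y_l}(\omega)/\omega$ periods.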
However, the phase is not unique since there are multiple values of $\varphi_{y_k,y_l}(\omega)$ that satisfy equation (13.15) for each $\omega \in [-\pi,\pi]$. For example, if $\varphi_{y_k,y_l}(\omega)$ solves (13.15), then so does $\varphi_{y_k,y_l}(\omega) + 2\pi h$ for $h = 0,\pm1,\pm2,\ldots$. In other words, the lead and the lag between two sinusoidal functions with the same period ($T/j$) is ill-defined.
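The lead interpretation and its ambiguity can be illustrated with the pure-delay example $y_{k,t} = e_{1,t}$ and $y_{l,t} = e_{1,t-d} + e_{2,t}$. With unit variances its cross-spectrum is $\exp(i\omega d)/(2\pi)$. The Python sketch below computes the phase with math.atan2, a quadrant-aware version of the arctangent in (13.16) that returns values in $(-\pi,\pi]$. Both this choice and the example are assumptions made for illustration.

```python
import cmath
import math

def phase_and_gain(omega, d):
    """Phase and gain (13.17) when y_k = e1_t and y_l = e1_{t-d} + e2_t (unit variances)."""
    s_kk = 1.0 / (2.0 * math.pi)
    s_kl = cmath.exp(1j * omega * d) / (2.0 * math.pi)
    phase = math.atan2(s_kl.imag, s_kl.real)  # arctan(q / c) placed in the correct quadrant
    gain = abs(s_kl) / s_kk
    return phase, gain
```

For $d = 3$ and $\omega = 0.5$, the phase divided by $\omega$ recovers the delay of 3 periods. For $\omega = 1.5$, however, $\omega d$ exceeds $\pi$ and the returned phase is negative. Adding $2\pi$ is then needed to recover the delay, which is exactly the non-uniqueness discussed above.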
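As a method for resolving the ambiguity in characterizing the lead-lag relationship between variables, Christiano and Vigfusson (2003) propose an alternative approach. From equation (13.3) we find that
$$\Gamma_{y_k,y_l}(h) = \int_{-\pi}^{\pi}s_{y_k,y_l}(\omega)\exp(i\omega h)\,d\omega. \tag{13.18}$$
Using the fact that the population spectrum is Hermitian,92 equation (13.15) and De Moivre's theorem, Christiano and Vigfusson (2003) show that
$$\begin{aligned}\Gamma_{y_k,y_l}(h) &= \int_{0}^{\pi}\Big[s_{y_k,y_l}(\omega)\exp(i\omega h) + s_{y_k,y_l}(-\omega)\exp(-i\omega h)\Big]d\omega\\ &= \int_{0}^{\pi}2a_{y_k,y_l}(\omega)\cos\big(\varphi_{y_k,y_l}(\omega) + \omega h\big)\,d\omega\\ &= \int_{0}^{\pi}\gamma_{y_k,y_l}(h,\omega)\,d\omega,\end{aligned} \tag{13.19}$$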
91 Recall that the tangent is given by $\tan(x) = \sin(x)/\cos(x)$. The inverse tangent can be defined as
$$\arctan(x) = \tan^{-1}(x) = \frac{i}{2}\ln\Big(\frac{i+x}{i-x}\Big),$$
where $\arctan(x) \in (-\pi/2,\pi/2)$ for real valued $x$. The inverses of the trigonometric functions do not meet the usual requirements of inverse functions since their ranges are subsets of the domains of the original functions. The latter is a consequence of, e.g., the cosine function being periodic such that $\cos(0) = \cos(2\pi)$, and so on. This means that an inverse of the cosine function will deliver multiple values, unless its range is restricted to a certain subset. For values of $x$ within this subset, the usual requirements of inverse functions apply.
92 For the cross amplitude spectrum and the phase spectrum it follows from this property that $a_{y_k,y_l}(-\omega) = a_{y_k,y_l}(\omega)$ and $\varphi_{y_k,y_l}(-\omega) = -\varphi_{y_k,y_l}(\omega)$, respectively.
where $\gamma_{y_k,y_l}(h,\omega)$ is the covariance between the component of $y_{k,t}$ at frequency $\omega$ and the component of $y_{l,t-h}$ at frequency $\omega$.
Christiano and Vigfusson note that it is common to characterize the lead-lag relation between two variables by the value of $h$ for which $\Gamma_{y_k,y_l}(h)$ is the largest (in absolute terms). From (13.19) it can be seen that $\gamma_{y_k,y_l}(h,\omega)$ is maximized for $h = -\varphi_{y_k,y_l}(\omega)/\omega$, since $\cos(0)$ is the unique maximum of the cosine function on $(-\pi,\pi]$.
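The maximization argument can be seen directly by evaluating the integrand $2a\cos(\varphi + \omega h)$ of (13.19) over a grid of integer lags. In the Python sketch below the values of a, $\varphi$ and $\omega$ are arbitrary. They are chosen so that $-\varphi/\omega = -3$ is an integer.

```python
import math

def gamma_component(h, a, phi, omega):
    """Integrand of (13.19): 2 * a * cos(phi + omega * h)."""
    return 2.0 * a * math.cos(phi + omega * h)

a, phi, omega = 1.0, 0.3, 0.1
# Search over integer lags; the maximum is where phi + omega * h = 0, i.e. h = -phi / omega
h_best = max(range(-40, 41), key=lambda h: gamma_component(h, a, phi, omega))
```

Shifting the phase by $2\pi$ leaves the integrand unchanged at $h = -3$ but moves the maximizing $h$ to $-3 - 2\pi/\omega$. This is why a single frequency cannot pin down the lead.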
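To resolve the ambiguity of $\varphi_{y_k,y_l}(\omega)$, Christiano and Vigfusson suggest that the phase is chosen such that
$$h_{y_k,y_l}(\omega,\varpi) = \arg\max_{h}\int_{\omega-\varpi}^{\omega+\varpi}\gamma_{y_k,y_l}(h,\tilde{\omega})\,d\tilde{\omega}, \quad \varpi > 0, \tag{13.20}$$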
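$$K_{k,l}(\theta,\omega) = \mathrm{tr}\bigg[s_y(\omega)^{-1}\frac{\partial s_y(\omega)}{\partial\theta_k}s_y(\omega)^{-1}\frac{\partial s_y(\omega)}{\partial\theta_l}\bigg], \quad k,l = 1,\ldots,p. \tag{13.22}$$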
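Using the relationship between the trace and the vec operator (see Magnus and Neudecker, 1988, Theorem 2.3) it can be shown that
$$K(\theta,\omega) = \bigg[\frac{\partial\,\mathrm{vec}\big(s_y(\omega)'\big)}{\partial\theta'}\bigg]'\Big[\big(s_y(\omega)^{-1}\big)'\otimes s_y(\omega)^{-1}\Big]\frac{\partial\,\mathrm{vec}\big(s_y(\omega)\big)}{\partial\theta'}. \tag{13.23}$$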
Recall that $s_y(\omega)$ is not symmetric, but Hermitian. Note also that the transpose used in (13.23) is not the complex conjugate but the standard transpose.93 It is now straightforward to show that $K(\theta,\omega)$ is Hermitian, i.e., $K(\theta,\omega) = K(\theta,\omega)^*$. It follows that the information matrix in (13.21) is real and symmetric since $K(\theta,\omega) + K(\theta,-\omega)$ is real and symmetric for each $\omega \in (0,\pi]$, while $K(\theta,0)$ is real and symmetric since $s_y(0)$ is; see equation (13.2).
It is important to note that the population spectrum need not be invertible at all frequencies. In particular, it may be the case that it is singular at frequency zero (the long-run covariance matrix is singular). YADA checks such cases and uses the rule that $K(\theta,\omega)$ is not computed for the frequencies where the population spectral density is singular.
93 Recalling footnote 9, the complex conjugate or conjugate transpose for a complex matrix $A = B + iC$ is equal to $A^* = B' - iC'$. This means that a matrix $A$ is Hermitian if $A = A^*$. The standard transpose of $A$ is instead given by $A' = B' + iC'$.
The population spectrum of the state-space model is given by equations (13.8) and (13.10). Since we exclude parameters that concern the mean of the observed variables, we let all partial derivatives with respect to the elements of $A$ be equal to zero. Next, using the matrix differential rules in Magnus and Neudecker (1988) it can be shown that
$$\frac{\partial\,\mathrm{vec}\big(s_y(\omega)\big)}{\partial\,\mathrm{vec}(R)'} = \frac{1}{2\pi}I_{n^2}. \tag{13.24}$$
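With $K_{nn}$ being defined as the $n^2 \times n^2$ dimensional commutation matrix,94 it can be shown that
$$\frac{\partial\,\mathrm{vec}\big(s_y(\omega)\big)}{\partial\,\mathrm{vec}(H)'} = K_{nn}\big(I_n\otimes H's_\xi(\omega)'\big) + \big(I_n\otimes H's_\xi(\omega)\big). \tag{13.25}$$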
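Letting $N_r = (1/2)(I_{r^2} + K_{rr})$, as in Magnus and Neudecker (1988, Theorem 3.11), it can be shown that
$$\frac{\partial\,\mathrm{vec}\big(s_y(\omega)\big)}{\partial\,\mathrm{vec}(B_0)'} = \frac{1}{\pi}\Big[H'\big(I_r - F\exp(i\omega)\big)^{-1}\otimes H'\big(I_r - F\exp(-i\omega)\big)^{-1}\Big]N_r\big(B_0\otimes I_r\big). \tag{13.26}$$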
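Furthermore, the partial derivatives with respect to the state transition matrix are given by
$$\begin{aligned}\frac{\partial\,\mathrm{vec}\big(s_y(\omega)\big)}{\partial\,\mathrm{vec}(F)'} &= \exp(-i\omega)\Big[H's_\xi(\omega)'\otimes H'\big(I_r - F\exp(-i\omega)\big)^{-1}\Big]\\ &\quad + \exp(i\omega)K_{nn}\Big[H's_\xi(\omega)\otimes H'\big(I_r - F\exp(i\omega)\big)^{-1}\Big].\end{aligned} \tag{13.27}$$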
The next step is to determine the partial derivatives of $R$, $H$, $F$, $B_0$ with respect to $\theta$ and, using the chain rule, postmultiply the matrices on the right hand side of equations (13.24) to (13.27) by $\partial\mathrm{vec}(R)/\partial\theta'$, $\partial\mathrm{vec}(H)/\partial\theta'$, and so on. The partial derivatives of the reduced form parameters of the state-space model with respect to the structural parameters can be obtained either numerically or analytically; see Iskrev (2007). To determine which elements of $\theta$ are relevant for the spectrum, we may simply check which elements of $\theta$ have a non-zero column in at least one of the matrices with partial derivatives $\partial\mathrm{vec}(M)/\partial\theta'$, with $M \in \{R,H,F,B_0\}$. The columns of $\partial\mathrm{vec}(M)/\partial\theta'$ that correspond to the elements of $\theta$ that do not meet this condition are then removed. Finally, the partial derivative of $s_y(\omega)'$ with respect to $\theta$ is obtained by premultiplying the above matrices of partial derivatives by the commutation matrix, i.e., we use $\partial\mathrm{vec}(s_y(\omega)')/\partial\theta' = K_{nn}\,\partial\mathrm{vec}(s_y(\omega))/\partial\theta'$.
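The analytic expressions (13.24) to (13.27) can be checked against finite differences. The Python sketch below does this for the scalar case $n = r = q = 1$, where the commutation matrices and $N_r$ equal one and $s_\xi(\omega)' = s_\xi(\omega)$. The state-space form $y_t = H\xi_t + w_t$, $\xi_t = F\xi_{t-1} + B_0\eta_t$ and the parameter values are assumptions for the illustration.

```python
import cmath
import math

def s_y(omega, H, F, B0, R):
    """Scalar spectrum of y_t = H xi_t + w_t, xi_t = F xi_{t-1} + B0 eta_t, Var(w_t) = R."""
    zm, zp = cmath.exp(-1j * omega), cmath.exp(1j * omega)
    s_xi = B0 ** 2 / (2 * math.pi * (1 - F * zm) * (1 - F * zp))
    return H * s_xi * H + R / (2 * math.pi)

def analytic_derivatives(omega, H, F, B0, R):
    """Scalar forms of (13.24)-(13.27); K_nn = N_r = 1 and s_xi' = s_xi when n = r = 1."""
    zm, zp = cmath.exp(-1j * omega), cmath.exp(1j * omega)
    s_xi = B0 ** 2 / (2 * math.pi * (1 - F * zm) * (1 - F * zp))
    return {
        "R": 1 / (2 * math.pi),                                                    # (13.24)
        "H": 2 * H * s_xi,                                                         # (13.25)
        "B0": (1 / math.pi) * (H / (1 - F * zp)) * (H / (1 - F * zm)) * B0,        # (13.26)
        "F": zm * H * s_xi * H / (1 - F * zm) + zp * H * s_xi * H / (1 - F * zp),  # (13.27)
    }

params = {"H": 0.8, "F": 0.6, "B0": 1.3, "R": 0.4}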
A frequency domain expression of Fisher's information matrix may also be derived as in Harvey (1989, Chapters 4 and 8). When following his approach, it may be expressed as
$$\mathcal{I}(\theta) = \frac{1}{2}\sum_{j=0}^{T-1}K(\theta,\omega_j), \tag{13.28}$$
where $\omega_j = 2\pi j/T$ as above. In practice we may use either (13.21) or (13.28) as the information matrix, but the numerical integration needed for Whittle's expression is likely to give a better approximation of the information matrix than the right hand side of (13.28). The reason is that a smaller numerical representation of $d\omega$ may be used in (13.21) than the value $\omega_j - \omega_{j-1} = 2\pi/T$ used in (13.28).95
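For a univariate AR(1) model with coefficient $\phi$, the frequency domain information can be compared with the known asymptotic information of $1/(1-\phi^2)$ per observation. In the scalar case (13.22) reduces to $K(\theta,\omega) = (\partial\ln s_y(\omega)/\partial\phi)^2$. The Python sketch below evaluates the sum in (13.28). The AR(1) example is an assumption, chosen because the answer is known.

```python
import math

def score_log_spectrum(omega, phi):
    """d ln s(omega) / d phi for s(omega) = sigma2 / (2 pi |1 - phi exp(-i omega)|^2)."""
    return (2.0 * math.cos(omega) - 2.0 * phi) / (1.0 - 2.0 * phi * math.cos(omega) + phi ** 2)

def K(omega, phi):
    """Scalar (13.22): tr[s^-1 (ds/dphi) s^-1 (ds/dphi)] = (d ln s / d phi)^2."""
    return score_log_spectrum(omega, phi) ** 2

def harvey_information(phi, T):
    """(13.28): (1/2) * sum_{j=0}^{T-1} K(omega_j) with omega_j = 2 pi j / T."""
    return 0.5 * sum(K(2.0 * math.pi * j / T, phi) for j in range(T))
```

Dividing by $T$ gives a per-observation value. With $\phi = 0.5$ and $T = 400$ it agrees closely with $1/(1-\phi^2) = 4/3$, because the sum in (13.28) is a rectangle rule for a smooth periodic integrand.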
13.9. YADA Code
The decomposition of the population spectrum for the state-space model is undertaken by the
function DSGESpectralDecompTheta. This function works for a fixed value of the parameter
vector and computes the decomposition for either the observed variables or for the state
variables.
94 For any $m \times n$ matrix $A$, the $mn \times mn$ dimensional commutation matrix is defined from $\mathrm{vec}(A) = K_{nm}\mathrm{vec}(A')$, with $K_{mn}K_{nm} = I_{mn}$; see Magnus and Neudecker (1988, Chapter 3) for some properties of this matrix.
95 It may be noted that the information matrix in (13.28) is based on an asymptotic approximation of the log-likelihood function when transforming it from the time domain to the frequency domain; see Harvey (1989, Chapter 4.3) for details.
When computing spectral densities for the observed variables, the function can also deal with annualized variables. For variables in first differences of quarterly observations the annualization filter is $A(z) = 1 + \sum_{j=1}^{3}z^j$, while for variables in first differences of monthly observations the filter is $A(z) = 1 + \sum_{j=1}^{11}z^j$. The filter computation is taken care of by the function AnnualFilter.
The function DSGEPopulationSpectrum computes the population spectrum for a fixed parameter value, while DSGECoherenceTheta computes the coherence statistic either for the observed variables or for the state variables. Whittle's estimator of the information matrix in (13.21) is calculated by DSGEFrequencyDomainInfMat for $T = 1$.
13.9.1. DSGESpectralDecompTheta
The function DSGESpectralDecompTheta requires 6 input variables: theta, thetaPositions,
ModelParameters, VarStr, DSGEModel, and CurrINI. The first three and last two variables have
often appeared above; see, e.g., CalculateDSGEStateVariables in Section 11.16.3. The 4th
input variable VarStr is a string that supports the values 'Observed Variables' and 'State Variables', thus indicating if spectral density decompositions should be performed for observed variables or state variables.
The function provides the output variable SpecDec, a structure whose fields depend on the
value of VarStr. In addition, the function can provide output on status, the mcode output
variable from the DSGE model solving function that has been used, and kalmanstatus, the
status output from the Kalman filter.
For spectral decompositions of observed variables, the SpecDec structure has at least 13 fields
and at most 18. The additional 5 fields are all related to annualization of the data. The latter
operation can only be performed if the vector stored in the field annual of the DSGEModel input variable has at least one value which is greater than unity. Specifically, a value of 4 means that the corresponding observed variable is annualized by adding the current and previous 3 values, while a value of 12 means that the variable is annualized by adding the current and previous 11 values.
The 5 fields with annualized spectral densities are called AnnualY, AnnualMeasureError,
AnnualYStates, AnnualYShocks, and AnnualVariance. The first three hold $f$ dimensional cell arrays with $n \times n$ matrices, where $f$ is the number of used frequencies between 0 and $\pi$. The spectral densities for the annualized observed variables are located in the AnnualY cell array, the term related to the measurement errors in AnnualMeasureError, and the total influence of the state variables in AnnualYStates. The field AnnualYShocks is a $q \times f$ cell array with $n \times n$ matrices, holding the influence of the individual economic shocks in the rows of the cell array. Finally, the field AnnualVariance is a vector with the population variances of the observed variables.
Among the always present fields are Frequencies, Y, States, Shocks, MeasureError, and OriginalVariance. The first is an $f$ dimensional vector with the frequencies that have been used for the spectral densities. The value of $f$ is 300, with $\omega \in \{0, \pi/299, 2\pi/299, \ldots, \pi\}$. The following three fields are cell arrays of dimension $f$, $f$ and $q \times f$, respectively, with the spectral densities for the observed variables, the share of all the state variables, and the shares due to the individual shocks. The latter two are given by $H' s_{\xi}(\omega) H$ and $H' s_{\xi}^{j}(\omega) H$; cf. equations (13.10)–(13.11). The field MeasureError is an $n \times n$ matrix with the frequency independent influence of the measurement errors on the spectral density, i.e., $(1/2\pi)R$, while the field OriginalVariance is a vector with the population variances of the observed variables.
Three fields are further related to the population covariance of the observed variables. They
are called SigmaY, SigmaXi, and R. The first is the population covariance matrix itself, while
the second field is the share due to the state variables, and the third the share due to the
measurement errors.
Furthermore, the structure SpecDec has 4 fields that hold data for using shock groups: ShockNames, ShockGroups, ShockGroupNames, and ShockGroupColors. The first and the third are string matrices where the rows hold the names of the shocks and the shock groups, respectively, where the number of rows is $q$ for the shocks and $g$ for the shock groups, with $q \geq g$. The second field is a vector of dimension $q$ with integers that map each shock to a certain shock group, while the last field is a $g \times 3$ matrix, where each row gives the color as an RGB triple for a shock group. The RGB triple holds values between 0 and 1, representing the combination of red, green and blue, and this scale can be translated into the more common 8-bit scale that is used to represent colors with integer values between 0 and 255.
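The conversion between the two color scales is a linear rescaling by 255. A minimal sketch, assuming rounding to the nearest integer (the text does not say whether YADA rounds or truncates); the function names are hypothetical:

```python
def rgb_unit_to_8bit(rgb):
    """Map an RGB triple with entries in [0, 1] to integers in 0..255."""
    if len(rgb) != 3 or any(not 0.0 <= c <= 1.0 for c in rgb):
        raise ValueError("expected three values between 0 and 1")
    # Python's round() uses round-half-to-even, which only matters at exact .5 cases.
    return tuple(int(round(255 * c)) for c in rgb)


def rgb_8bit_to_unit(rgb):
    """Map an 8-bit RGB triple back to the [0, 1] scale."""
    if len(rgb) != 3 or any(not 0 <= c <= 255 for c in rgb):
        raise ValueError("expected three integers between 0 and 255")
    return tuple(c / 255 for c in rgb)


print(rgb_unit_to_8bit((0.0, 0.5, 1.0)))   # (0, 128, 255)
```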
For spectral decompositions of the state variables, the number of fields in the SpecDec structure is smaller. Among the 13 fields that are always present for the observed variables, 4 are no longer available for the spectral decomposition of the state variables. The missing fields are: Y, MeasureError, SigmaY, and R. Regarding the cell arrays, the dimension of the matrices stored in each cell is now $r \times r$. This means that, for example, the States field is an $f$ dimensional cell array of the $r \times r$ matrices $s_{\xi}(\omega)$. Moreover, the field OriginalVariance is a vector with the population variances of the state variables, while SigmaXi is the corresponding $r \times r$ population covariance matrix.
13.9.2. DSGEPopulationSpectrum
The function DSGEPopulationSpectrum computes the population spectrum for a set of frequencies $\omega \in [0, \pi]$ using 4 input variables: H, R, F, B0. These variables correspond to the $H$, $R$, $F$, and $B_0$ matrices of the state-space model.
The function provides 7 output variables. The first is given by the cell array SyOmega that has 300 entries. Each element in the array is equal to the $n \times n$ matrix $s_y(\omega_j)$ for a given frequency $\omega_j = \pi(j-1)/299$, $j = 1, \ldots, 300$. The second variable, Omega, is a vector of length 300 with the different frequencies $\omega_j$, while the following, SxiOmega, is a cell array with 300 entries. This time the array holds the $r \times r$ matrices $s_{\xi}(\omega_j)$.
All the remaining output variables concern input for the partial derivatives in (13.26) and (13.27). The variables Fi and Fmi are cell arrays with 300 elements that hold the inverses of the matrix expressions in $F\exp(i\omega_j)$ and $F\exp(-i\omega_j)$, respectively, that enter these derivatives, while ei and emi are vectors with 300 elements, where each entry is $\exp(i\omega_j)$ and $\exp(-i\omega_j)$, respectively.
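For intuition about SyOmega and SxiOmega, the sketch below evaluates the population spectrum on the same 300-point grid for a hypothetical model with one state variable and one observed variable. It assumes the state equation $\xi_t = F\xi_{t-1} + B_0\eta_t$ with $\operatorname{Var}(\eta_t) = 1$ and $|F| < 1$, and the observation $y_t = H\xi_t + w_t$ with measurement error variance $R$ (the scalar analogue of the measurement equation without deterministic terms). Then $s_{\xi}(\omega) = B_0^2/(2\pi|1 - Fe^{-i\omega}|^2)$ and $s_y(\omega) = H^2 s_{\xi}(\omega) + R/(2\pi)$. This is a scalar illustration with hypothetical function names, not YADA's matrix implementation.

```python
import cmath
import math


def frequency_grid(f=300):
    """Frequencies omega_j = pi * (j - 1) / (f - 1), j = 1, ..., f."""
    return [math.pi * j / (f - 1) for j in range(f)]


def scalar_population_spectrum(H, R, F, B0, omegas):
    """Spectra of a one-state, one-observable model; returns (s_y, s_xi)."""
    s_y, s_xi = [], []
    for w in omegas:
        transfer = 1.0 / (1.0 - F * cmath.exp(-1j * w))      # (1 - F e^{-iw})^{-1}
        sx = B0 ** 2 * abs(transfer) ** 2 / (2.0 * math.pi)  # state spectrum
        s_xi.append(sx)
        s_y.append(H ** 2 * sx + R / (2.0 * math.pi))        # add the measurement error term
    return s_y, s_xi


def variance_from_spectrum(spectrum, omegas):
    """Integrate an even spectrum over [-pi, pi] via the trapezoid rule on [0, pi]."""
    step = omegas[1] - omegas[0]
    half = step * (sum(spectrum) - 0.5 * (spectrum[0] + spectrum[-1]))
    return 2.0 * half


grid = frequency_grid()
s_y, s_xi = scalar_population_spectrum(H=1.0, R=0.5, F=0.5, B0=1.0, omegas=grid)
print(variance_from_spectrum(s_y, grid))   # close to 1 / (1 - 0.25) + 0.5
```

Integrating $s_y$ over $[-\pi, \pi]$ recovers the population variance $H^2B_0^2/(1 - F^2) + R$, the kind of quantity stored in OriginalVariance.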
13.9.3. DSGECoherenceTheta
The function DSGECoherenceTheta requires 6 input variables to compute coherence for a fixed
value of the DSGE model parameters: theta, thetaPositions, ModelParameters, VarType,
DSGEModel, and CurrINI. The only unfamiliar variable is VarType, a boolean variable which is
unity if coherence for the observed variables should be computed and zero if state variables
have been selected.
As output, the function provides SOmega and Omega. The former is a matrix of dimension $m(m-1)/2 \times f$, where $m$ is the number of variables, i.e., equal to $n$ for the observed variables and to $r$ for the state variables. The dimension $f$ is equal to the number of frequencies between 0 and $\pi$. The Omega variable is a vector of dimension $f$ with the frequency values $\omega_j$. Following the convention used by the DSGEPopulationSpectrum function above, $f = 300$ so that $\omega_j = \pi(j-1)/299$ for $j = 1, \ldots, 300$.
13.9.4. DSGEFrequencyDomainInfMat
The function DSGEFrequencyDomainInfMat takes the same input variables as the time domain
function DSGEInformationMatrix (see Section 11.16.10). That is, it accepts the 6 variables:
theta, thetaPositions, ModelParameters, ParameterNames, DSGEModel, and CurrINI.
As output the function returns informationMatrix, the $p \times p$ information matrix in (13.21) with $T = 1$. The dimension $p$ is equal to the number of entries in $\theta$ that affect the population spectrum, i.e., the number of entries that affect at least one of the state-space matrices $H$, $R$, $F$, $B_0$. The positions of these parameters are provided in the output variable ParamIncluded, while the third output variable, ParamExcluded, holds the positions in $\theta$ of the parameters that do not have an effect on the above state-space matrices (and, hence, only concern the $A$ matrix).
$$x_t = \Psi d_t + \sum_{l=1}^{k} \Phi_l \left( x_{t-l} - \Psi d_{t-l} \right) + \varepsilon_t, \qquad t = 1, \ldots, T. \tag{14.1}$$
The vector $d_t$ is deterministic and assumed to be of dimension $q$. The residuals $\varepsilon_t$ are assumed to be i.i.d. Gaussian with zero mean and positive definite covariance matrix $\Omega$. The $\Phi_l$ matrix is $p \times p$ for all lags, while $\Psi$ is $p \times q$ and measures the expected value of $x_t$ conditional on the parameters and other information available at $t = 0$. All Bayesian VAR models that are supported by YADA have an informative prior on the $\Psi$ parameters, the steady state of $x_t$. Moreover, the elements of the vector $x_t$ ($d_t$) are all elements of the vector $y_t$ ($x_t$) in the measurement equation of the DSGE model. It is hoped that this notational overlap will not be confusing for you.
14.1. The Prior
The setup of the VAR model in (14.1) is identical to the stationary VAR process in mean-adjusted
form that Villani (2009) examines. The prior on the steady state is also the same as that considered in his paper and used by, e.g., Adolfson, Anderson, Lind, Villani, and Vredin (2007a).
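That is, with $\psi = \operatorname{vec}(\Psi)$ I assume that the marginal prior is given by
$$\psi \sim N_{pq}\left(\theta_{\psi}, \Sigma_{\psi}\right), \tag{14.2}$$
where $\Sigma_{\psi}$ is positive definite. YADA allows the user to select any values for $\theta_{\psi}$ and for the diagonal of $\Sigma_{\psi}$. The off-diagonal elements of the prior covariance matrix are assumed to be zero.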
Let $\Phi = [\Phi_1 \; \cdots \; \Phi_k]$ be the $p \times pk$ matrix with parameters on lagged $x$. The prior distributions for these parameters that YADA supports are as follows:
(i) a Minnesota-style prior similar to the one considered by Villani (2009);
(ii) a normal conditional on the covariance matrix of the residuals (see, e.g., Kadiyala and Karlsson, 1997); and
(iii) a diffuse prior.
I will address the details about each prior distribution below.
First, for the Minnesota-style prior the marginal prior distribution of $\Phi$ is given by:
$$\operatorname{vec}(\Phi) \sim N_{p^2k}\left(\theta_{\Phi}, \Sigma_{\Phi}\right), \tag{14.3}$$
where the prior mean $\theta_{\Phi}$ need not be unity for the first own lagged parameters and zero for the remaining. In fact, the general setup considers $x_t$ to be stationary with steady state determined by the prior mean of $\Psi$ and $d_t$.
Let $\theta_{\Phi} = \operatorname{vec}(\mu_{\Phi})$, where $\mu_{\Phi} = [\mu_{\Phi_1} \; \cdots \; \mu_{\Phi_k}]$ is a $p \times pk$ matrix with the prior mean of $\Phi$. The assumption in YADA is that $\mu_{\Phi_l} = 0$ for $l \geq 2$, while $\mu_{\Phi_1}$ is diagonal. The diagonal entries are determined by two hyperparameters, $\gamma_d$ and $\gamma_l$. With $\Phi_{ii,1}$ being the i:th diagonal element of $\Phi_1$, the prior mean of this parameter, denoted by $\mu_{ii,1}$, is equal to $\gamma_d$ if variable $i$ in the $x_t$ vector is regarded as being first differenced (e.g., output growth), and $\gamma_l$ if variable $i$ is in levels (e.g., the nominal interest rate).
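The Minnesota feature of this prior refers to the covariance matrix $\Sigma_{\Phi}$. Let $\Phi_{ij,l}$ denote the element in row (equation) $i$ and column (on variable) $j$ for lag $l$. The matrix $\Sigma_{\Phi}$ is here assumed to be diagonal with
$$\operatorname{Var}\left(\Phi_{ij,l}\right) = \begin{cases} \dfrac{\lambda_o}{l^{\lambda_h}}, & \text{if } i = j, \\[1ex] \dfrac{\lambda_o \lambda_c}{l^{\lambda_h}} \dfrac{\Omega_{ii}}{\Omega_{jj}}, & \text{otherwise.} \end{cases} \tag{14.4}$$
The parameter $\Omega_{ii}$ is simply the variance of the residual in equation $i$ and, hence, the ratio $\Omega_{ii}/\Omega_{jj}$ takes into account that variable $i$ and variable $j$ may have different scales.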
Formally, this parameterization is inconsistent with the prior being a marginal distribution since it depends on $\Omega$. YADA tackles this in the standard way by replacing the $\Omega_{ii}$ parameters with the maximum likelihood estimate. The hyperparameter $\lambda_o > 0$ gives the overall tightness of the prior around the mean, while $0 < \lambda_c < 1$ is the cross-equation tightness hyperparameter. Finally, the hyperparameter $\lambda_h > 0$ measures the harmonic lag decay.
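A minimal sketch of the diagonal of $\Sigma_{\Phi}$ in (14.4), with entries ordered as in $\operatorname{vec}(\Phi)$ for $\Phi = [\Phi_1 \; \cdots \; \Phi_k]$: since vec stacks the columns of the $p \times pk$ matrix, the equation index $i$ runs fastest, then the variable index $j$, then the lag $l$. The function name is hypothetical and, unlike MinnesotaPrior, it returns only the diagonal rather than the full $p^2k \times p^2k$ matrix.

```python
def minnesota_prior_variances(overall, cross, decay, omega_diag, k):
    """Diagonal of Sigma_Phi from (14.4), in vec(Phi) order.

    overall = lambda_o, cross = lambda_c, decay = lambda_h; omega_diag holds the
    residual variances Omega_11, ..., Omega_pp (ML estimates in YADA's setup).
    """
    p = len(omega_diag)
    variances = []
    for l in range(1, k + 1):              # lag
        lag_scale = overall / l ** decay
        for j in range(p):                 # column of Phi_l (variable)
            for i in range(p):             # row of Phi_l (equation)
                if i == j:
                    variances.append(lag_scale)
                else:
                    variances.append(lag_scale * cross * omega_diag[i] / omega_diag[j])
    return variances


v = minnesota_prior_variances(overall=0.2, cross=0.5, decay=1.0, omega_diag=[1.0, 4.0], k=2)
print(len(v))   # p^2 k = 8 entries
```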
Second, when the prior distribution of $\Phi$ is no longer marginal but conditional on the covariance matrix of the residuals we use the following:
$$\operatorname{vec}(\Phi) \mid \Omega \sim N_{p^2k}\left(\theta_{\Phi}, \Omega_{\Phi} \otimes \Omega\right), \tag{14.5}$$
where $\Omega_{\Phi}$ is a positive definite $pk \times pk$ matrix, while $\theta_{\Phi}$ is determined in exactly the same way as for the Minnesota-style prior above. A prior of this generic form is, for instance, examined by Kadiyala and Karlsson (1997), where it is also discussed relative to, e.g., the Minnesota prior. The matrix $\Omega_{\Phi}$ is assumed to be block diagonal in YADA, where block $l = 1, \ldots, k$ (corresponding to $\Phi_l$) is given by
$$\Omega_{\Phi_l} = \frac{\lambda_o}{l^{\lambda_h}} I_p. \tag{14.6}$$
Hence, the overall tightness as well as the harmonic lag decay hyperparameter enter this prior, while the cross-equation hyperparameter cannot be included. This is the price for using the Kronecker structure of the prior covariance matrix. At the same time, different scales of the variables are now handled by conditioning on $\Omega$ instead of using sample information.
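Under (14.5)–(14.6) the prior variance of $\Phi_{ij,l}$ given $\Omega$ is the $l$-th block entry of $\Omega_{\Phi}$ times $\Omega_{ii}$, so it is the same for every variable $j$ in equation $i$; this is another way of seeing why $\lambda_c$ cannot enter. The hypothetical sketch below builds the diagonal of $\Omega_{\Phi}$ and uses the fact that the diagonal of a Kronecker product depends only on the two diagonals.

```python
def normal_conditional_prior_diag(overall, decay, p, k):
    """Diagonal of Omega_Phi in (14.6): block l equals (overall / l**decay) * I_p."""
    return [overall / l ** decay for l in range(1, k + 1) for _ in range(p)]


def kron_diag(a_diag, b_diag):
    """Diagonal of A kron B, which only depends on the diagonals of A and B."""
    return [a * b for a in a_diag for b in b_diag]


omega_phi = normal_conditional_prior_diag(overall=0.2, decay=1.0, p=2, k=2)
# Var(vec(Phi) | Omega) when Omega_11 = 1 and Omega_22 = 4
prior_var = kron_diag(omega_phi, [1.0, 4.0])
print(prior_var)
```

The first two entries (column $j = 1$ of $\Phi_1$) equal the next two (column $j = 2$), confirming that only the equation index matters.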
Finally, in the case of the diffuse prior we simply assume that the prior density $p(\Phi) = 1$.
The marginal prior distribution for $\Omega$ is either assumed to be diffuse or inverted Wishart. Let the marginal prior density of $\Omega$ be denoted by $p(\Omega)$. In the former case, we simply make use of the standard formula (see, e.g., Zellner, 1971)
$$p(\Omega) \propto |\Omega|^{-(p+1)/2}. \tag{14.7}$$
In the latter case, the density function of $\Omega$ is proper and given in equation (4.12) in Section 4.2.4. Recall that the mode of the inverted Wishart is equal to $(1/(p+v+1))A$, while the mean exists if $v \geq p+2$ and is then given by $E[\Omega] = (1/(v-p-1))A$; see, e.g., Zellner (1971, Appendix B.4) and Bauwens et al. (1999, Appendix A) for details.
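The two scale factors can be checked directly. The hypothetical sketch below returns the scalars $c$ such that the mode or the mean equals $cA$, and shows that $v = p + 2$ gives $E[\Omega] = A$.

```python
def inverted_wishart_mode_scale(p, v):
    """Scalar c such that the mode of IW_p(A, v) equals c * A, i.e. 1 / (p + v + 1)."""
    return 1.0 / (p + v + 1)


def inverted_wishart_mean_scale(p, v):
    """Scalar c such that E[Omega] = c * A; the mean requires v >= p + 2."""
    if v < p + 2:
        raise ValueError("the mean of the inverted Wishart does not exist for v < p + 2")
    return 1.0 / (v - p - 1)


p = 3
print(inverted_wishart_mean_scale(p, p + 2))   # 1.0, so E[Omega] = A
print(inverted_wishart_mode_scale(p, p + 2))   # 1 / (2p + 3)
```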
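The hyperparameter $A$ can be selected in two ways in YADA. The first route is to let $A$ equal the maximum likelihood estimate of $\Omega$. This was suggested by, e.g., Villani (2005). The alternative is to let $A = \lambda_A I_p$, where the hyperparameter $\lambda_A$ gives the joint marginal prior residual variance; see, e.g., Warne (2006). By selecting the degrees of freedom as small as possible (given finite first moments) the impact of the parameterization for $A$ is minimized, i.e., by letting $v = p + 2$.⁹⁶
Finally, it should be pointed out that the joint prior distribution of $(\Psi, \Phi, \Omega)$ satisfies certain independence conditions. Specifically, $\Psi$ is assumed to be independent of $\Phi$ and $\Omega$. Under the Minnesota-style prior for $\Phi$ it is also assumed that $\Phi$ is a priori independent of $\Omega$.
96
Given that the inverted Wishart prior has been selected for $\Omega$ and the normal conditional on $\Omega$ prior for $\Phi$, it follows by standard distribution theory that the marginal prior of $\Phi$ is matricvariate t.
14.2. Posterior Mode
Before we turn to the estimation of the posterior mode we need to introduce some additional notation. Let $x$ be a $p \times T$ matrix with $x_t$ in column $t$, while $\varepsilon$ is constructed in the same way. Similarly, let $d$ be a $q \times T$ matrix with $d_t$ in column $t$. Furthermore, let $D$ be a $q(k+1) \times T$ matrix with column $t$ given by
$$D_t = \begin{bmatrix} d_t' & -d_{t-1}' & \cdots & -d_{t-k}' \end{bmatrix}'. \tag{14.8}$$
With $y = x - \Psi d$, $Y$ the $pk \times T$ matrix with column $t$ given by $[y_{t-1}' \; \cdots \; y_{t-k}']'$, $X$ the corresponding $pk \times T$ matrix of lagged $x$, and $z = x - \Phi X$, the VAR model in (14.1) can be written as $y = \Phi Y + \varepsilon$ or
$$z = \Theta D + \varepsilon, \tag{14.9}$$
where $\Theta = [\Psi \; \Phi_1\Psi \; \cdots \; \Phi_k\Psi]$. Applying the vec operator on $\Theta$ gives us:
$$\operatorname{vec}(\Theta) = \begin{bmatrix} I_{pq} \\ I_q \otimes \Phi_1 \\ \vdots \\ I_q \otimes \Phi_k \end{bmatrix} \operatorname{vec}(\Psi) = U \operatorname{vec}(\Psi). \tag{14.10}$$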
The nonlinearity of the VAR model means that an analytical solution for the mode of the joint posterior distribution of $(\Psi, \Phi, \Omega)$ is not available. However, from the first order conditions we can express three systems of equations that the mode must satisfy, and by iterating on these equations it is possible to quickly solve for the posterior mode. Naturally, the choice of prior influences the three systems of equations.
First, the choice of prior for $\Phi$ and $\Omega$ does not have any effect on the equations that $\Psi$ has to satisfy at the mode conditional on $\Phi$ and $\Omega$. Here we find that
$$\psi = \left[ U'\left(DD' \otimes \Omega^{-1}\right)U + \Sigma_{\psi}^{-1} \right]^{-1} \left[ U' \operatorname{vec}\left(\Omega^{-1} z D'\right) + \Sigma_{\psi}^{-1}\theta_{\psi} \right]. \tag{14.11}$$
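Second, in case the Minnesota-style prior is applied to $\Phi$, the posterior mode estimate must satisfy the system of equations
$$\operatorname{vec}(\Phi) = \left[ \left(YY' \otimes \Omega^{-1}\right) + \Sigma_{\Phi}^{-1} \right]^{-1} \left[ \operatorname{vec}\left(\Omega^{-1} y Y'\right) + \Sigma_{\Phi}^{-1}\theta_{\Phi} \right]. \tag{14.12}$$
Similarly, when a normal conditional on the residual covariance matrix prior is used for $\Phi$, then the posterior mode must satisfy:
$$\Phi = \left( yY' + \mu_{\Phi}\Omega_{\Phi}^{-1} \right)\left( YY' + \Omega_{\Phi}^{-1} \right)^{-1}. \tag{14.13}$$
The system of equations that $\Phi$ needs to satisfy when a diffuse prior is used on these parameters is, for instance, obtained by letting $\Omega_{\Phi}^{-1} = 0$ in (14.13), i.e., $\Phi = yY'(YY')^{-1}$.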
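Third, in case a Minnesota-style prior is used on $\Phi$, then the posterior mode of $\Omega$ must satisfy:
$$\Omega = \frac{1}{T + p + v + 1}\left( \varepsilon\varepsilon' + A \right). \tag{14.14}$$
If the prior on $\Omega$ is diffuse, i.e., given by (14.7), we simply set $v = 0$ and $A = 0$ in (14.14). Similarly, when the prior on $\Phi$ is given by (14.5), then the posterior mode of $\Omega$ satisfies
$$\Omega = \frac{1}{T + p(k+1) + v + 1}\left( \varepsilon\varepsilon' + A + \left(\Phi - \mu_{\Phi}\right)\Omega_{\Phi}^{-1}\left(\Phi - \mu_{\Phi}\right)' \right). \tag{14.15}$$
If the prior on $\Omega$ is diffuse we again let $v = 0$ and $A = 0$. Similarly, if the prior on $\Phi$ is diffuse, we set $k = 0$ and $\Omega_{\Phi}^{-1} = 0$ in (14.15).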
14.3. Gibbs Samplers for a Bayesian VAR
The posterior samplers used by YADA for drawing from the posterior distribution of the parameters of the Bayesian VAR models that it supports are simple Gibbs samplers; see, e.g., Geman and Geman (1984), Casella and George (1992), Tierney (1994), or Geweke (1999, 2005). This means that the full conditional posterior distributions are needed for $(\Psi, \Phi, \Omega)$.⁹⁷
97
The Gibbs sampler is a special case of the Metropolis-Hastings algorithm where the proposal density is equal to the full conditional posterior, with the effect that the acceptance rate is always unity. The sampler was given its name by Geman and Geman (1984), who used it for analysing Gibbs distributions on lattices.
The full conditional posterior distribution of $\psi$ is given by Villani (2009, Proposition A.1). Let $\mathcal{X}_T = \{x_{1-k}, \ldots, x_0, x_1, \ldots, x_T, d_{1-k}, \ldots, d_0, d_1, \ldots, d_T\}$. We can now express this distribution as:
$$\psi \mid \Phi, \Omega, \mathcal{X}_T \sim N_{pq}\left(\bar{\theta}_{\psi}, \bar{\Sigma}_{\psi}\right), \tag{14.16}$$
where $\bar{\Sigma}_{\psi}^{-1} = U'(DD' \otimes \Omega^{-1})U + \Sigma_{\psi}^{-1}$ and $\bar{\theta}_{\psi} = \bar{\Sigma}_{\psi}[U'\operatorname{vec}(\Omega^{-1}zD') + \Sigma_{\psi}^{-1}\theta_{\psi}]$. Notice that the mean of this conditional distribution has the same general construction as the first order condition expression for $\psi$ in (14.11).
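The full conditional posterior distribution of $\Phi$ when a Minnesota-style prior is used is also given by Villani (2009, Proposition A.1). Given our notation, this distribution can be expressed as
$$\operatorname{vec}(\Phi) \mid \Psi, \Omega, \mathcal{X}_T \sim N_{p^2k}\left(\bar{\theta}_{\Phi}, \bar{\Sigma}_{\Phi}\right), \tag{14.17}$$
where $\bar{\Sigma}_{\Phi} = [(YY' \otimes \Omega^{-1}) + \Sigma_{\Phi}^{-1}]^{-1}$ and $\bar{\theta}_{\Phi} = \bar{\Sigma}_{\Phi}[\operatorname{vec}(\Omega^{-1}yY') + \Sigma_{\Phi}^{-1}\theta_{\Phi}]$. The full conditional posterior of $\Omega$ is in this case
$$\Omega \mid \Psi, \Phi, \mathcal{X}_T \sim IW_p\left(\varepsilon\varepsilon' + A, T + v\right). \tag{14.18}$$
If the prior on $\Omega$ is assumed to be diffuse, then we simply let $v = 0$ and $A = 0$ in (14.18). This results in the full conditional posterior of $\Omega$ in Villani (2009).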
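The case when the prior distribution of $\Phi$ is normal conditional on $\Omega$ instead implies that the full conditional posterior of the autoregressive parameters is given by:
$$\operatorname{vec}(\Phi) \mid \Psi, \Omega, \mathcal{X}_T \sim N_{p^2k}\left(\operatorname{vec}(\bar{\mu}_{\Phi}), \bar{\Omega}_{\Phi} \otimes \Omega\right), \tag{14.19}$$
where $\bar{\Omega}_{\Phi} = (YY' + \Omega_{\Phi}^{-1})^{-1}$ and $\bar{\mu}_{\Phi} = (yY' + \mu_{\Phi}\Omega_{\Phi}^{-1})\bar{\Omega}_{\Phi}$. For this case we find that the full conditional posterior of $\Omega$ is:
$$\Omega \mid \Psi, \Phi, \mathcal{X}_T \sim IW_p\left(A + \varepsilon\varepsilon' + \left(\Phi - \mu_{\Phi}\right)\Omega_{\Phi}^{-1}\left(\Phi - \mu_{\Phi}\right)', T + pk + v\right). \tag{14.20}$$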
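$$\hat{p}\left(\tilde{\Phi} \mid \mathcal{X}_T\right) = \frac{1}{N}\sum_{i=1}^{N} p\left(\tilde{\Phi} \mid \psi^{(i)}, \Omega^{(i)}, \mathcal{X}_T\right). \tag{14.23}$$
The density on the right hand side of (14.23) is normal and parameterized as shown in equation (14.17) or (14.19).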
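There remains to estimate the conditional posterior density $p(\tilde{\psi} \mid \tilde{\Phi}, \mathcal{X}_T)$ at the selected parameter point. In this case we cannot, as Chib (1995) explains, use the posterior draws from the Gibbs sampler from the full conditional posteriors above. Instead, we can apply Gibbs samplers for $(\psi, \Omega)$ for the fixed value of $\Phi = \tilde{\Phi}$. That is, we draw $\psi^{(j)}$ from (14.16) with $\Phi = \tilde{\Phi}$. Similarly, we draw $\Omega^{(j)}$ from either (14.18) or from (14.20) with $\Phi = \tilde{\Phi}$. This gives us $N$ posterior draws $(\psi^{(j)}, \Omega^{(j)})$ that are all based on a fixed value of $\Phi$. We can now estimate $p(\tilde{\psi} \mid \tilde{\Phi}, \mathcal{X}_T)$ at $(\tilde{\psi}, \tilde{\Phi})$ using
$$\hat{p}\left(\tilde{\psi} \mid \tilde{\Phi}, \mathcal{X}_T\right) = \frac{1}{N}\sum_{j=1}^{N} p\left(\tilde{\psi} \mid \tilde{\Phi}, \Omega^{(j)}, \mathcal{X}_T\right). \tag{14.24}$$
The density on the right hand side is normal with parameters given by equation (14.16).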
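The averaging in (14.24) can be sketched for a hypothetical univariate VAR(1) with a constant, $x_t - \psi = \phi(x_{t-1} - \psi) + \varepsilon_t$ with $\operatorname{Var}(\varepsilon_t) = \omega$. Then $d_t = 1$, $D_t = [1 \;\; -1]'$ and $U = [1 \;\; \phi]'$, so (14.16) reduces to a scalar normal density for $\psi$. The draws $\omega^{(j)}$ are taken as given here; in practice they would come from the reduced Gibbs run with $\phi$ fixed at $\tilde{\phi}$. Function names are illustrative, not YADA's.

```python
import math


def normal_pdf(value, mean, var):
    return math.exp(-0.5 * (value - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def psi_conditional_moments(x, phi, omega, theta_psi, sigma2_psi):
    """Scalar analogue of (14.16); x[0] is x_0 and x[1:] holds x_1, ..., x_T."""
    T = len(x) - 1
    z_sum = sum(x[t] - phi * x[t - 1] for t in range(1, T + 1))   # sum of z_t
    precision = T * (1.0 - phi) ** 2 / omega + 1.0 / sigma2_psi  # U'(DD' kron Omega^-1)U + Sigma_psi^-1
    mean = ((1.0 - phi) * z_sum / omega + theta_psi / sigma2_psi) / precision
    return mean, 1.0 / precision


def chib_psi_ordinate(psi_tilde, x, phi_tilde, omega_draws, theta_psi, sigma2_psi):
    """Average the normal full conditional density of psi over the draws omega^(j)."""
    total = 0.0
    for omega in omega_draws:
        mean, var = psi_conditional_moments(x, phi_tilde, omega, theta_psi, sigma2_psi)
        total += normal_pdf(psi_tilde, mean, var)
    return total / len(omega_draws)


x_obs = [1.0, 1.4, 0.9, 1.2, 1.1]
print(chib_psi_ordinate(1.1, x_obs, 0.3, [0.2, 0.25, 0.3], theta_psi=1.0, sigma2_psi=1.0))
```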
There are, of course, alternative ways of estimating the marginal likelihood $p(\mathcal{X}_T)$ for the
Bayesian VAR model; see, e.g., Geweke (1999). The approach advocated by Chib (1995) may be
regarded as reliable when a parameter point with a high posterior density is used. The posterior
mode, discussed in Section 14.2, is one such point, but one may also consider an estimate of the
joint posterior mean. YADA always makes use of the posterior mode, thus explaining why the
posterior mode must be estimated prior to running the posterior sampler for the Bayesian VAR.
Moreover, when the Gibbs sampler based on the full conditional posteriors is applied, then the
point of initialization is given by the posterior mode.
The chosen order of factorization in equation (14.22) influences how the Chib estimator of the marginal likelihood is carried out. Since $\Phi$ generally has (a lot) more parameters than $\Psi$ or $\Omega$, it is useful to fix $\Phi$ first. The additional Gibbs steps for $\Psi$ and $\Omega$ can be carried out much more quickly than the more time consuming $\Phi$ step. The choice between using the full conditional posterior for $\Psi$ or $\Omega$ is not so important. From a computational perspective it should generally not matter much if we estimate $p(\Psi \mid \Phi, \mathcal{X}_T)$ or $p(\Omega \mid \Phi, \mathcal{X}_T)$ since the dimensions of $\Psi$ and $\Omega$ are generally fairly low.
14.5. Unconditional Forecasting with a VAR Model
The Thompson and Miller (1986) procedure may also be applied to the Bayesian VAR in Section 14. In this case we let $\theta = (\Psi, \Phi, \Omega)$, where the draws are obtained using the Gibbs samplers discussed in Section 14.3. For a given draw $\theta$ from its posterior distribution we may first draw residuals $\varepsilon_{T+1}, \ldots, \varepsilon_{T+h}$ from a normal distribution with mean zero and covariance matrix $\Omega$. Next, we simulate $x_{T+1}, \ldots, x_{T+h}$ by feeding the residual draws into the VAR system in (14.1). Repeating this $P$ times for the given $\theta$ gives us $P$ sample paths conditional on $\theta$. By using $S$ draws of $\theta$ from its posterior we end up with $PS$ paths of $x_{T+1}, \ldots, x_{T+h}$ from its predictive density.
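A minimal sketch of the simulation step for a hypothetical univariate VAR(1) with a constant, where each posterior draw is a triple $(\psi, \phi, \omega)$ for $x_t = \psi + \phi(x_{t-1} - \psi) + \varepsilon_t$. The draws are passed in as a list rather than produced by the Gibbs sampler, and a seeded random.Random instance keeps the output reproducible.

```python
import math
import random


def simulate_predictive_paths(x_T, posterior_draws, h, P, rng):
    """Return P paths of length h for each draw (psi, phi, omega), i.e. P * S paths."""
    paths = []
    for psi, phi, omega in posterior_draws:
        sd = math.sqrt(omega)
        for _ in range(P):
            x_prev, path = x_T, []
            for _ in range(h):
                # Feed a fresh residual draw eps ~ N(0, omega) into the VAR(1).
                x_prev = psi + phi * (x_prev - psi) + rng.gauss(0.0, sd)
                path.append(x_prev)
            paths.append(path)
    return paths


rng = random.Random(12345)
draws = [(2.0, 0.5, 1.0), (2.1, 0.6, 0.8)]
paths = simulate_predictive_paths(x_T=3.0, posterior_draws=draws, h=4, P=100, rng=rng)
print(len(paths))   # 200 = P * S
```

Averaging the paths for a fixed draw and then across draws gives a simulation counterpart of the two terms in the decomposition that follows.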
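For the Bayesian VAR we may decompose the prediction uncertainty into two components, residual uncertainty and parameter uncertainty. That is,
$$C\left(x_{T+i} \mid \mathcal{X}_T\right) = E_T\left[C\left(x_{T+i} \mid \mathcal{X}_T; \theta\right)\right] + C_T\left[E\left(x_{T+i} \mid \mathcal{X}_T; \theta\right)\right], \tag{14.25}$$
where the deterministic process $d_{T+1}, \ldots, d_{T+h}$ has been suppressed from the expressions to simplify notation. The first term on the right hand side measures the residual uncertainty, while the second measures parameter uncertainty. To parameterize these two terms, we first rewrite the VAR model in (14.1) in first order form:
$$Y_{T+i} = BY_{T+i-1} + J_k\varepsilon_{T+i}, \qquad i = 1, \ldots, h, \tag{14.26}$$
where $Y_t = [y_t' \; \cdots \; y_{t-k+1}']'$ is $pk \times 1$ with $y_t = x_t - \Psi d_t$, $J_k = [I_p \; 0 \; \cdots \; 0]'$ is $pk \times p$, and
$$B = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_{k-1} & \Phi_k \\ I_p & 0 & \cdots & 0 & 0 \\ 0 & I_p & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_p & 0 \end{bmatrix}. \tag{14.27}$$
Using these well known expressions we first find that the residual uncertainty term is:
$$E_T\left[C\left(x_{T+i} \mid \mathcal{X}_T; \theta\right)\right] = E_T\left[\sum_{j=0}^{i-1} J_k'B^jJ_k\Omega J_k'(B^j)'J_k\right], \qquad i = 1, \ldots, h, \tag{14.28}$$
while the parameter uncertainty term is
$$C_T\left[E\left(x_{T+i} \mid \mathcal{X}_T; \theta\right)\right] = C_T\left[\Psi d_{T+i} + J_k'B^iY_T\right], \qquad i = 1, \ldots, h. \tag{14.29}$$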
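It may be noted that the parametric expression for the residual uncertainty term can be simplified such that the summation over $j$ is avoided for all $i$. Define the $pk \times pk$ matrix $\bar{X}_i$ from the difference equation
$$\bar{X}_i = J_k\Omega J_k' + B\bar{X}_{i-1}B', \qquad i = 1, \ldots, h,$$
with $\bar{X}_0 = 0$. It then follows that
$$J_k'\bar{X}_iJ_k = \sum_{j=0}^{i-1} J_k'B^jJ_k\Omega J_k'(B^j)'J_k, \qquad i = 1, \ldots, h.$$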
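The recursion and the direct sum can be compared numerically. The sketch below uses plain nested lists and a hypothetical univariate VAR(2) with $\Phi_1 = 0.5$ and $\Phi_2 = 0.2$, so that $B$ is $2 \times 2$ and $J_k = [1 \;\; 0]'$; both functions should return the same sequence of residual uncertainty terms.

```python
def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def residual_uncertainty_recursive(B, Jk, Omega, h):
    """J_k' Xbar_i J_k for i = 1..h, with Xbar_i = J_k Omega J_k' + B Xbar_{i-1} B' and Xbar_0 = 0."""
    n = len(B)
    X = [[0.0] * n for _ in range(n)]
    base = matmul(matmul(Jk, Omega), transpose(Jk))
    out = []
    for _ in range(h):
        X = madd(base, matmul(matmul(B, X), transpose(B)))
        out.append(matmul(matmul(transpose(Jk), X), Jk))
    return out


def residual_uncertainty_direct(B, Jk, Omega, h):
    """sum_{j=0}^{i-1} (J_k' B^j J_k) Omega (J_k' B^j J_k)' for i = 1..h."""
    n = len(B)
    Bj = [[float(r == c) for c in range(n)] for r in range(n)]   # B^0 = I
    acc, out = None, []
    for _ in range(h):
        M = matmul(matmul(transpose(Jk), Bj), Jk)                # J_k' B^j J_k
        term = matmul(matmul(M, Omega), transpose(M))
        acc = term if acc is None else madd(acc, term)
        out.append(acc)
        Bj = matmul(Bj, B)
    return out


B = [[0.5, 0.2], [1.0, 0.0]]
Jk = [[1.0], [0.0]]
Omega = [[1.0]]
print(residual_uncertainty_recursive(B, Jk, Omega, 3))   # 1, 1.25, 1.4525 (as 1 x 1 matrices)
```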
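$$z_{T+i} = K_1'x_{T+i} + \sum_{j=1}^{i-1} K_{2,j}'x_{T+i-j} + u_T, \qquad i = 1, \ldots, g, \tag{14.30}$$
$$Y_{T+i} = B^iY_T + \sum_{j=0}^{i-1} B^jJ_k\Omega^{1/2}\eta_{T+i-j}, \qquad i = 1, \ldots, g,$$
where $\Omega = \Omega^{1/2}\Omega^{1/2\prime}$ and $\varepsilon_t = \Omega^{1/2}\eta_t$ with $\eta_t \sim N(0, I_p)$. The endogenous variables over the conditioning horizon can therefore be stacked as
$$\begin{bmatrix} x_{T+g} \\ x_{T+g-1} \\ \vdots \\ x_{T+1} \end{bmatrix} = \begin{bmatrix} \Psi d_{T+g} \\ \Psi d_{T+g-1} \\ \vdots \\ \Psi d_{T+1} \end{bmatrix} + \begin{bmatrix} J_k'B^{g} \\ J_k'B^{g-1} \\ \vdots \\ J_k'B \end{bmatrix} Y_T + \begin{bmatrix} \Omega^{1/2} & J_k'BJ_k\Omega^{1/2} & \cdots & J_k'B^{g-1}J_k\Omega^{1/2} \\ 0 & \Omega^{1/2} & \cdots & J_k'B^{g-2}J_k\Omega^{1/2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega^{1/2} \end{bmatrix} \begin{bmatrix} \eta_{T+g} \\ \eta_{T+g-1} \\ \vdots \\ \eta_{T+1} \end{bmatrix},$$
or
$$X_{T+g} = (I_g \otimes \Psi)\,\mathbf{d}_{T+g} + GY_T + DN_{T+g},$$
where $\mathbf{d}_{T+g} = [d_{T+g}' \; \cdots \; d_{T+1}']'$ and $N_{T+g} = [\eta_{T+g}' \; \cdots \; \eta_{T+1}']'$.
The conditioning assumptions in (14.30) can be stacked as:
$$\begin{bmatrix} z_{T+g} \\ z_{T+g-1} \\ \vdots \\ z_{T+1} \end{bmatrix} = \begin{bmatrix} K_1' & K_{2,1}' & \cdots & K_{2,g-1}' \\ 0 & K_1' & \cdots & K_{2,g-2}' \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & K_1' \end{bmatrix} \begin{bmatrix} x_{T+g} \\ x_{T+g-1} \\ \vdots \\ x_{T+1} \end{bmatrix} + \begin{bmatrix} u_T \\ u_T \\ \vdots \\ u_T \end{bmatrix}, \tag{14.34}$$
or
$$Z_{T+g} = K'X_{T+g} + U_T. \tag{14.35}$$
Hence, the conditioning assumptions hold when
$$K'DN_{T+g} = k_{T+g}, \tag{14.36}$$
where $k_{T+g} = Z_{T+g} - U_T - K'[(I_g \otimes \Psi)\mathbf{d}_{T+g} + GY_T]$.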
Like in Waggoner and Zha (1999), the distribution of the shocks $N_{T+g}$ conditional on the restriction (14.36) is normal with mean $\mu_{N,T+g}$ and idempotent covariance matrix $\Sigma_{N,T+g}$. We here find that
$$\begin{aligned} \mu_{N,T+g} &= D'K\left(K'DD'K\right)^{-1}k_{T+g}, \\ \Sigma_{N,T+g} &= I_{pg} - D'K\left(K'DD'K\right)^{-1}K'D. \end{aligned} \tag{14.37}$$
This concludes the first step of deriving the mean and the covariance matrix that the standardized shocks should be drawn from to ensure that the conditioning assumptions are satisfied.
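For the second step, where we will show that the choice of $\Omega^{1/2}$ does not have any effect on the reduced form shocks subject to the conditioning assumptions, let $\boldsymbol{\varepsilon}_{T+g}$ be the stacking of $\varepsilon_{T+g}, \ldots, \varepsilon_{T+1}$. This means that
$$\boldsymbol{\varepsilon}_{T+g} = \left(I_g \otimes \Omega^{1/2}\right)N_{T+g}. \tag{14.38}$$
The restriction (14.36) can now be expressed in terms of $\boldsymbol{\varepsilon}_{T+g}$ as
$$K'\tilde{D}\boldsymbol{\varepsilon}_{T+g} = k_{T+g}, \tag{14.39}$$
where
$$\tilde{D} = \begin{bmatrix} I_p & J_k'BJ_k & \cdots & J_k'B^{g-1}J_k \\ 0 & I_p & \cdots & J_k'B^{g-2}J_k \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & I_p \end{bmatrix},$$
so that $D = \tilde{D}(I_g \otimes \Omega^{1/2})$.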
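Moreover, the definition in (14.38) also means that the distribution of the reduced form shocks $\boldsymbol{\varepsilon}_{T+g}$ conditional on the restriction (14.39) is normal with mean $\mu_{\varepsilon,T+g}$ and covariance matrix $\Sigma_{\varepsilon,T+g}$. These moments are equal to
$$\begin{aligned} \mu_{\varepsilon,T+g} &= \left(I_g \otimes \Omega\right)\tilde{D}'K\left[K'\tilde{D}\left(I_g \otimes \Omega\right)\tilde{D}'K\right]^{-1}k_{T+g}, \\ \Sigma_{\varepsilon,T+g} &= \left(I_g \otimes \Omega\right) - \left(I_g \otimes \Omega\right)\tilde{D}'K\left[K'\tilde{D}\left(I_g \otimes \Omega\right)\tilde{D}'K\right]^{-1}K'\tilde{D}\left(I_g \otimes \Omega\right). \end{aligned} \tag{14.40}$$
From these expressions we find that the moments do not depend on a particular choice of $\Omega^{1/2}$ and the claim has therefore been established.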
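For a single period ($g = 1$), two variables, and one restriction, $\tilde{D} = I_p$ and $D$ is simply a square root of $\Omega$, so (14.37) and (14.40) can be checked by hand. The hypothetical sketch below computes $\mu_{N,T+1}$ and $\Sigma_{N,T+1}$, and shows that two different square roots (a Cholesky factor and a rotated version of it) give the same reduced-form mean $\Omega K(K'\Omega K)^{-1}k_{T+1}$. The values of $\Omega$, $K$ and $k_{T+1}$ are made up for the illustration.

```python
import math


def chol2(Omega):
    """Lower-triangular Cholesky factor of a 2 x 2 positive definite matrix."""
    a = math.sqrt(Omega[0][0])
    b = Omega[1][0] / a
    c = math.sqrt(Omega[1][1] - b * b)
    return [[a, 0.0], [b, c]]


def matmul2(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def conditional_shock_moments(D, K, k):
    """Mean and covariance of N ~ N(0, I_2) given the single restriction K'D N = k, cf. (14.37)."""
    v = [D[0][0] * K[0] + D[1][0] * K[1], D[0][1] * K[0] + D[1][1] * K[1]]   # D'K
    s = v[0] ** 2 + v[1] ** 2                                               # K'DD'K, a scalar here
    mean = [vi * k / s for vi in v]
    cov = [[float(i == j) - v[i] * v[j] / s for j in range(2)] for i in range(2)]
    return mean, cov


def reduced_form_mean(D, K, k):
    """Conditional mean of eps = D N, i.e. D times the conditional mean of N."""
    mean_n, _ = conditional_shock_moments(D, K, k)
    return [D[i][0] * mean_n[0] + D[i][1] * mean_n[1] for i in range(2)]


Omega = [[1.0, 0.3], [0.3, 0.5]]
K = [1.0, -1.0]   # hypothetical restriction on x_1 - x_2 in period T + 1
k = 0.4
D_chol = chol2(Omega)
angle = 0.7
Q = [[math.cos(angle), -math.sin(angle)], [math.sin(angle), math.cos(angle)]]
D_rot = matmul2(D_chol, Q)   # another square root: D_rot D_rot' = Omega
print(reduced_form_mean(D_chol, K, k))
print(reduced_form_mean(D_rot, K, k))   # the same vector up to rounding
```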
The computation of the conditional predictive distribution can now proceed as follows. For a draw $\theta = (\Psi, \Phi, \Omega)$ from the joint posterior distribution, we may first draw $\boldsymbol{\varepsilon}_{T+g}$ from $N(\mu_{\varepsilon,T+g}, \Sigma_{\varepsilon,T+g})$, thus yielding a sequence of shocks, $\varepsilon_{T+1}, \ldots, \varepsilon_{T+g}$, which guarantees that the conditioning assumptions (14.30) are met. Next, we draw $\varepsilon_{T+i}$ for $i = g+1, \ldots, h$ from $N(0, \Omega)$. With these shocks we can simulate the path $x_{T+1}, \ldots, x_{T+h}$ by feeding the residuals into the VAR system (14.1). Repeating this $P$ times for the given $\theta$ gives us $P$ sample paths from the predictive distribution conditional on the historical data, the conditioning assumptions, and $\theta$. Repeating the above procedure for $S$ draws of $\theta$ from its joint posterior distribution means that we end up with $PS$ paths of $x_{T+1}, \ldots, x_{T+h}$ from the conditional predictive distribution.
For each draw $\theta$ we can also estimate the population mean of $x_{T+1}, \ldots, x_{T+h}$ by letting $\boldsymbol{\varepsilon}_{T+g}$ be equal to $\mu_{\varepsilon,T+g}$. The shocks $\varepsilon_{T+i}$ are next set to zero for $i = g+1, \ldots, h$. By feeding these shock values into the VAR system we obtain a path for $E[x_{T+i} \mid \mathcal{X}_T, Z_{T+g}; \theta]$, $i = 1, \ldots, h$. Repeating this $S$ times for the different draws we may estimate the population mean of the conditional predictive distribution by taking the average of these $S$ paths.
The prediction uncertainty of the conditional forecasts can be decomposed into error (or residual) uncertainty and parameter uncertainty. The equivalent to equation (14.25) is now
$$C\left(x_{T+i} \mid \mathcal{X}_T, Z_{T+g}\right) = E_T\left[C\left(x_{T+i} \mid \mathcal{X}_T, Z_{T+g}; \theta\right)\right] + C_T\left[E\left(x_{T+i} \mid \mathcal{X}_T, Z_{T+g}; \theta\right)\right], \tag{14.41}$$
for $i = 1, \ldots, h$, where $E_T$ and $C_T$, as in Section 12.1, denote the expectation and covariance with respect to the posterior of $\theta$ at time $T$. Once again, the deterministic variables over the prediction horizon have been suppressed from the expression.
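To parameterize these terms we first note that
$$E\left(x_{T+i} \mid \mathcal{X}_T, Z_{T+g}; \theta\right) = \Psi d_{T+i} + J_k'B^iY_T + \sum_{j=0}^{i-1} J_k'B^jJ_k\mu_{\varepsilon,T+i-j},$$
where $\mu_{\varepsilon,T+i-j}$ is the subvector of $\mu_{\varepsilon,T+g}$ that corresponds to $\varepsilon_{T+i-j}$, and $\mu_{\varepsilon,T+i-j} = 0$ if $i - j > g$. These expected values satisfy the conditioning assumptions for $i = 1, \ldots, g$. Moreover, the forecast error for a given $\theta$ is
$$x_{T+i} - E\left(x_{T+i} \mid \mathcal{X}_T, Z_{T+g}; \theta\right) = \sum_{j=0}^{i-1} J_k'B^jJ_k\left(\varepsilon_{T+i-j} - \mu_{\varepsilon,T+i-j}\right).$$
Next, the covariance matrix $\Sigma_{\varepsilon,T+g}$ is invariant to $T$ since $\tilde{D}$ and $K$ are both invariant to $T$.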
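Partitioning this $gp \times gp$ matrix as follows
$$\Sigma_{\varepsilon,T+g} = \begin{bmatrix} \Sigma_{g,g} & \cdots & \Sigma_{g,1} \\ \vdots & \ddots & \vdots \\ \Sigma_{1,g} & \cdots & \Sigma_{1,1} \end{bmatrix},$$
where $\Sigma_{i,j} = \Sigma_{j,i}'$ is the $p \times p$ covariance matrix for the vector pair $(\varepsilon_{T+i}, \varepsilon_{T+j})$ for all $i, j = 1, \ldots, g$. The forecast error covariance matrix of $x_{T+i}$ conditional on $\mathcal{X}_T$, $Z_{T+g}$ and $\theta$ is now equal to
$$C\left(x_{T+i} \mid \mathcal{X}_T, Z_{T+g}; \theta\right) = \sum_{j=0}^{i-1}\sum_{l=0}^{i-1} J_k'B^jJ_k\tilde{\Sigma}_{i-j,i-l}J_k'(B^l)'J_k,$$
where
$$\tilde{\Sigma}_{i,j} = \begin{cases} \Sigma_{i,j}, & \text{if } i, j = 1, \ldots, g, \\ \Omega, & \text{if } i = j, \; i = g+1, \ldots, h, \\ 0, & \text{otherwise.} \end{cases}$$
These covariance matrices also satisfy the conditioning assumptions, meaning that, for instance, $K_1'\Sigma_{1,1}K_1 = 0$.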
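The moment expression for fixed parameters may also be expressed more compactly. For forecast horizons until $g$ we have that
$$E\left(X_{T+g} \mid \mathcal{X}_T, Z_{T+g}; \theta\right) = \begin{bmatrix} \Psi d_{T+g} \\ \vdots \\ \Psi d_{T+1} \end{bmatrix} + GY_T + \tilde{D}\mu_{\varepsilon,T+g}, \tag{14.42}$$
which gives us the mean predictions from $T+1$ until $T+g$. The forecast error covariance matrix is here given by
$$C\left(X_{T+g} \mid \mathcal{X}_T, Z_{T+g}; \theta\right) = \tilde{D}\Sigma_{\varepsilon,T+g}\tilde{D}'. \tag{14.43}$$
Let $\bar{B} = [\,J_k \;\; BJ_k \;\; \cdots \;\; B^{g-1}J_k\,]$. The forecast error covariance matrix for fixed $\theta$ is now given by
$$C\left(x_{T+i} \mid \mathcal{X}_T, Z_{T+g}; \theta\right) = \sum_{j=0}^{i-g-1} J_k'B^jJ_k\Omega J_k'(B^j)'J_k + J_k'B^{i-g}\bar{B}\Sigma_{\varepsilon,T+g}\bar{B}'(B^{i-g})'J_k, \tag{14.45}$$
for $i = g+1, \ldots, h$. Notice that for a stationary VAR model the mean prediction converges to the mean of $x$ as $i \to \infty$ and the prediction covariance matrix to the covariance matrix of $x$.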
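The modesty analysis can also be performed in the VAR setting. Like in the case of the state-space model we can consider one multivariate and two univariate statistics. These are again based on the ideas of Adolfson et al. (2005) and Leeper and Zha (2003). For a given draw $\boldsymbol{\varepsilon}_{T+g}$ from $N(\mu_{\varepsilon,T+g}, \Sigma_{\varepsilon,T+g})$ the difference between the period $T+g$ simulated conditional forecast value of the endogenous variables and the unconditional forecast (given $\theta$) is
$$\Delta_{T,g}\left(\boldsymbol{\varepsilon}_{T+g}\right) = x_{T+g}\left(\boldsymbol{\varepsilon}_{T+g}; \theta\right) - E\left(x_{T+g} \mid \mathcal{X}_T; \theta\right) = \sum_{j=0}^{g-1} J_k'B^jJ_k\varepsilon_{T+g-j} = J_k'\bar{B}\boldsymbol{\varepsilon}_{T+g}, \tag{14.46}$$
where $\boldsymbol{\varepsilon}_{T+g} = [\varepsilon_{T+g}' \; \cdots \; \varepsilon_{T+1}']'$. The forecast error covariance matrix for the unconditional forecast of $x_{T+g}$ is
$$\Sigma_{T+g} = \sum_{j=0}^{g-1} J_k'B^jJ_k\Omega J_k'(B^j)'J_k. \tag{14.47}$$
The multivariate modesty statistic is given by
$$\mathcal{M}_{T,g}\left(\boldsymbol{\varepsilon}_{T+g}\right) = \Delta_{T,g}\left(\boldsymbol{\varepsilon}_{T+g}\right)'\Sigma_{T+g}^{-1}\Delta_{T,g}\left(\boldsymbol{\varepsilon}_{T+g}\right). \tag{14.48}$$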
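Under the hypothesis that the conditioning shocks are modest this statistic is $\chi^2(p)$. An alternative reference distribution can be generated by computing the same statistic with $\boldsymbol{\varepsilon}_{T+g}$ replaced with $\tilde{\boldsymbol{\varepsilon}}_{T+g}$, whose components $\tilde{\varepsilon}_{T+i}$ are drawn from $N(0, \Omega)$ for $i = 1, \ldots, g$, and defining this reference statistic as $\mathcal{M}_{T,g}(\tilde{\boldsymbol{\varepsilon}}_{T+g})$. The event $\{\mathcal{M}_{T,g}(\tilde{\boldsymbol{\varepsilon}}_{T+g}) \geq \mathcal{M}_{T,g}(\boldsymbol{\varepsilon}_{T+g})\}$ can then be tested for each one of the $PS$ conditional forecast paths that is computed, making it possible to estimate the probability of this event. If the probability is sufficiently small we may say the hypothesis of modest conditioning assumptions is rejected.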
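Given $\Delta_{T,g}$ and $\Sigma_{T+g}$ for each simulated path, the statistics and the tail probability are straightforward to compute. A minimal sketch for $p = 2$, using an explicit $2 \times 2$ inverse; the function names are hypothetical and the inputs would in practice come from (14.46)–(14.47):

```python
import math


def quad_form_2x2(delta, Sigma):
    """delta' Sigma^{-1} delta for a 2-vector and a 2 x 2 positive definite matrix, cf. (14.48)."""
    a, b = Sigma[0]
    c, d = Sigma[1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return sum(delta[i] * inv[i][j] * delta[j] for i in range(2) for j in range(2))


def univariate_modesty(delta, Sigma):
    """Elementwise statistics delta_i / sqrt(Sigma_ii), cf. (14.49)."""
    return [delta[i] / math.sqrt(Sigma[i][i]) for i in range(len(delta))]


def modesty_tail_probability(actual, reference):
    """Share of paths where the reference statistic is at least as large as the actual one."""
    if len(actual) != len(reference) or not actual:
        raise ValueError("need equally many actual and reference statistics")
    return sum(1 for a, r in zip(actual, reference) if r >= a) / len(actual)


print(modesty_tail_probability([1.0, 2.0, 3.0], [0.5, 2.0, 4.0]))   # 2 of 3 paths
```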
Univariate modesty statistics can now be specified by selecting elements from the vector in (14.46) and the matrix in (14.47). Specifically, we let
$$\mathcal{M}_{T,g}^{(i)}\left(\boldsymbol{\varepsilon}_{T+g}\right) = \frac{\Delta_{T,g}^{(i)}\left(\boldsymbol{\varepsilon}_{T+g}\right)}{\sqrt{\Sigma_{T+g}^{(i,i)}}}, \qquad i = 1, \ldots, p. \tag{14.49}$$
This statistic has a standard normal distribution under the assumption that the conditioning information is modest and, like the multivariate statistic, it takes into account that there is uncertainty about all shocks.
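For a Leeper-Zha type of univariate modesty statistic we set the reduced form shocks equal to the mean value $\mu_{\varepsilon,T+g} = [\mu_{\varepsilon,T+g}' \; \cdots \; \mu_{\varepsilon,T+1}']'$. The covariance matrix for the forecast errors thus becomes singular and is given by
$$\tilde{\Sigma}_{T+g} = \sum_{j=0}^{g-1} J_k'B^jJ_k\tilde{\Omega}_{\varepsilon,T+g-j}J_k'(B^j)'J_k, \tag{14.50}$$
where the singular $p \times p$ matrices $\tilde{\Omega}_{\varepsilon,T+j}$, $j = 1, \ldots, g$, depend on $\Omega$ and the conditioning matrix $K_1$. The univariate Leeper-Zha type of modesty statistic is now given by
$$\mathcal{M}_{T,g}^{(i)}\left(\mu_{\varepsilon,T+g}\right) = \frac{\Delta_{T,g}^{(i)}\left(\mu_{\varepsilon,T+g}\right)}{\sqrt{\tilde{\Sigma}_{T+g}^{(i,i)}}}, \qquad i = 1, \ldots, p. \tag{14.51}$$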
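$$\Phi(z) = I_p - \sum_{l=1}^{k}\Phi_l z^l.$$
If all roots of $\Phi(z)$ lie outside the unit circle it follows that $x_t$ is covariance stationary and that the autocovariances are absolutely summable.⁹⁸
The population spectrum of $x$, denoted by $s_x(\omega)$, exists for all $\omega$ and is now equal to
$$s_x(\omega) = \frac{1}{2\pi}\Phi\left(\exp(-i\omega)\right)^{-1}\Omega\left[\Phi\left(\exp(i\omega)\right)'\right]^{-1}, \qquad \omega \in [-\pi, \pi]. \tag{14.53}$$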
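For a univariate VAR ($p = 1$) with real coefficients, $\Phi(\exp(i\omega))$ is the complex conjugate of $\Phi(\exp(-i\omega))$, so (14.53) reduces to $\Omega / (2\pi|\Phi(\exp(-i\omega))|^2)$. A minimal sketch of that special case, with a hypothetical function name:

```python
import cmath
import math


def var_spectrum_univariate(phis, omega_var, w):
    """s_x(w) = omega_var / (2 pi |Phi(exp(-i w))|^2) with Phi(z) = 1 - sum_l phis[l-1] z^l."""
    phi_z = 1.0 - sum(phi * cmath.exp(-1j * w * l) for l, phi in enumerate(phis, start=1))
    return omega_var / (2.0 * math.pi * abs(phi_z) ** 2)


print(var_spectrum_univariate([0.5], 1.0, 0.0))   # 1 / (2 pi (1 - 0.5)^2) = 2 / pi
```

Integrating this spectrum over $[-\pi, \pi]$ gives the variance of $x_t$, here $1/(1 - 0.5^2)$ for the VAR(1) case.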
Rather than comparing the population spectrum of the state-space model to a non-parametric
estimate it may be useful to compare it to a VAR based estimate. The latter model often gives a
reasonably good approximation of the covariance properties of the data; see, for instance, King
and Watson (1996) and Christiano and Vigfusson (2003) for papers using unrestricted VARs
when comparing frequency domain properties of estimated models to the data.
14.8. YADA Code
YADA contains a wide range of functions for the Bayesian VAR analysis. In this section I will
limit the discussion to the four main topics above, i.e., the prior, the posterior mode estimation,
the Gibbs sampler, and the marginal likelihood calculation.
14.8.1. Functions for Computing the Prior
This section concerns two functions, MinnesotaPrior and NormalConditionPrior. These functions both compute elements needed for the prior covariance matrix of the $\Phi$ parameters. The first function is used when the Minnesota-style prior is assumed for these parameters, i.e., when $\Sigma_{\Phi}$ in (14.3) is determined as in equation (14.4). Similarly, the second function is applied when the normal conditional on $\Omega$ prior is assumed. In this case, the matrix $\Omega_{\Phi}$ in (14.5) is determined as in equation (14.6).
14.8.1.1. MinnesotaPrior
The function MinnesotaPrior requires 5 inputs: OverallTightness ($\lambda_o$), CrossEqTightness ($\lambda_c$), HarmonicLagDecay ($\lambda_h$), OmegaVec, and k. While the first three inputs are hyperparameters, the fourth is a vector with the diagonal elements of $\Omega$, i.e., with the residual variances. YADA always uses the maximum likelihood estimate of $\Omega$ to generate these residual variances. Finally, the fifth input is the lag order of the Bayesian VAR. The function provides the $p^2k \times p^2k$ matrix SigmaPi ($\Sigma_{\Phi}$) as output.
14.8.1.2. NormalConditionPrior
The function NormalConditionPrior takes 4 inputs: OverallTightness, HarmonicLagDecay, p,
and k. The first two are the same hyperparameters as the MinnesotaPrior function uses, while
the third input is the number of endogenous variables, and the fourth the lag order. As output
the function provides the $pk \times pk$ matrix OmegaPi ($\Omega_{\Phi}$).
14.8.2. Functions for Estimating the Mode of the Joint Posterior
The function BVARPosteriorModeEstimation is used to estimate the posterior mode of the
Bayesian VAR parameters. It handles all the types of priors discussed in Section 14.1. The
main inputs for this function are the structures DSGEModel and CurrINI; see Section 7.4.
From the perspective of analysing a Bayesian VAR model, the DSGEModel structure contains
information about the type of prior to use for VAR parameters, their hyperparameters, the lag
order, as well as which endogenous and exogenous variables to use, for which sample, the data,
the maximum number of iterations to consider, and the tolerance value of the convergence
criterion. This information allows the function to compute the maximum likelihood estimates
of $\Psi$, $\Phi$, and $\Omega$ and fully set up the prior. The maximum likelihood estimates are used as initial values for the posterior mode estimation algorithm. The maximum likelihood estimate of $\Omega$ is adjusted to take the prior into account. For example, if a diffuse prior is used for $\Omega$ and for $\Phi$, then the maximum likelihood estimate of $\Omega$ is multiplied by $T/(T+p+1)$. Similarly, if the inverted Wishart prior in (4.12) is assumed with $v \geq p+2$, then the maximum likelihood estimate is multiplied by $T$, $A$ is added to this, and everything is divided by $T+p+v+1$.
As discussed in Section 14.2, it is not possible to solve for the posterior mode analytically. Instead, it is possible to iterate on the first order conditions until a set of values that satisfy these conditions can be found. The posterior mode estimation routine in YADA first evaluates the log posterior at the initial values. For each iteration $i$ YADA computes:
$\Phi^{(i)}$ given $\Psi^{(i-1)}$ and $\Omega^{(i-1)}$ as shown in equation (14.12) or (14.13);
$\Psi^{(i)}$ given $\Phi^{(i)}$ and $\Omega^{(i-1)}$ as shown in equation (14.11); and
$\Omega^{(i)}$ given $\Phi^{(i)}$ and $\Psi^{(i)}$ as shown in equation (14.14) or (14.15).
With a new set of parameter values, the log posterior is recalculated and if the absolute change is not sufficiently small, the algorithm computes iteration $i+1$. Otherwise it exits.
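The iteration can be illustrated for the simplest case covered by (14.11)–(14.15): a hypothetical univariate VAR(1) with a constant ($p = q = k = 1$, $d_t = 1$), the normal steady-state prior (14.2), a diffuse prior on $\Phi$, and the diffuse prior (14.7) on $\Omega$. Then the $\Phi$ step is least squares on the demeaned data, the $\Psi$ step is the scalar version of (14.11), and the $\Omega$ step divides the residual sum of squares by $T + p + 1 = T + 2$. This is a minimal sketch under these assumptions, not YADA's implementation; its starting values are cruder than the maximum likelihood estimates YADA uses.

```python
import math
import random


def residuals(x, psi, phi):
    """eps_t = (x_t - psi) - phi (x_{t-1} - psi) for t = 1..T; x[0] is x_0."""
    return [(x[t] - psi) - phi * (x[t - 1] - psi) for t in range(1, len(x))]


def log_posterior(x, psi, phi, omega, theta_psi, sigma2_psi):
    """Log posterior kernel: Gaussian likelihood, psi ~ N(theta_psi, sigma2_psi),
    flat prior on phi, and p(omega) proportional to omega^{-(p+1)/2} with p = 1."""
    T = len(x) - 1
    sse = sum(e * e for e in residuals(x, psi, phi))
    return (-0.5 * (T + 2) * math.log(omega) - 0.5 * sse / omega
            - 0.5 * (psi - theta_psi) ** 2 / sigma2_psi)


def posterior_mode(x, theta_psi, sigma2_psi, tol=1e-10, max_iter=1000):
    T = len(x) - 1
    psi, phi = sum(x[1:]) / T, 0.0                       # crude starting values
    omega = sum(e * e for e in residuals(x, psi, phi)) / T
    lp = log_posterior(x, psi, phi, omega, theta_psi, sigma2_psi)
    for _ in range(max_iter):
        # Phi given Psi: (14.13) with Omega_Phi^{-1} = 0, i.e. least squares on y = x - psi.
        y = [value - psi for value in x]
        phi = (sum(y[t] * y[t - 1] for t in range(1, T + 1))
               / sum(y[t - 1] ** 2 for t in range(1, T + 1)))
        # Psi given Phi and Omega: scalar version of (14.11) with d_t = 1.
        z_sum = sum(x[t] - phi * x[t - 1] for t in range(1, T + 1))
        precision = T * (1.0 - phi) ** 2 / omega + 1.0 / sigma2_psi
        psi = ((1.0 - phi) * z_sum / omega + theta_psi / sigma2_psi) / precision
        # Omega given Psi and Phi: (14.14) with v = 0 and A = 0.
        omega = sum(e * e for e in residuals(x, psi, phi)) / (T + 2)
        new_lp = log_posterior(x, psi, phi, omega, theta_psi, sigma2_psi)
        if abs(new_lp - lp) < tol:
            return psi, phi, omega
        lp = new_lp
    raise RuntimeError("posterior mode iterations did not converge")


rng = random.Random(7)
x_sim = [2.0]
for _ in range(400):
    x_sim.append(2.0 + 0.5 * (x_sim[-1] - 2.0) + rng.gauss(0.0, 1.0))
psi_hat, phi_hat, omega_hat = posterior_mode(x_sim, theta_psi=2.0, sigma2_psi=1.0)
print(psi_hat, phi_hat, omega_hat)
```

Each step maximizes the log posterior over one block given the others, so the log posterior cannot decrease between iterations, which is what makes the absolute-change stopping rule sensible.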
YADA has three functions for calculating the log posterior. First, if a diffuse prior on $\Phi$ is assumed, then BVARLogPosteriorDiffuse is called. Second, if the prior on $\Phi$ is of the Minnesota-style, then the function BVARLogPosteriorMinnesota is used. Finally, the function BVARLogPosteriorNormalCond is considered when a normal conditional on the residual covariance matrix prior on $\Phi$ is assumed.
Furthermore, YADA has 5 functions for dealing with the computations in (14.11)–(14.15). These functions are: BVARPsiMean (equation 14.11) for $\Psi$, BVARPiMeanMinnesota (equation 14.12) or BVARPiMeanNormalCond (equation 14.13) for $\Phi$, and BVAROmegaMinnesota (equation 14.14) or BVAROmegaNormal (equation 14.15) for $\Omega$. The functions are also used by the Gibbs sampling routine for the Bayesian VAR; cf. Section 14.8.3.
14.8.2.1. BVARLogPosteriorDiffuse
The function BVARLogPosteriorDiffuse needs 11 inputs. First of all it requires the parameter values Omega ($\Omega$), Pi ($\Phi$), and Psi ($\Psi$). Next, the hyperparameters of the prior of the residual covariance matrix are needed as Amatrix ($A$) and qDF ($v$), and the hyperparameters of the steady state prior thetaPsi ($\theta_{\psi}$) and SigmaPsi ($\Sigma_{\psi}$). Finally, the function needs information
about the endogenous and exogenous variables in terms of the matrices x (x), X (X), d (d), and
the required output is needed by the posterior mode estimation routine, the optional output is
needed by the Gibbs sampler for drawing from the posterior distribution.
14.8.2.5. BVARPiMeanMinnesota
The function BVARPiMeanMinnesota accepts 8 inputs: Omega, Psi, thetaPi, invOmegaPi, x, X,
d, and dLag. Apart from invOmegaPi, all these inputs are described above for the log posterior
function BVARLogPosteriorMinnesota. The input invOmegaPi is simply the inverse of $\Omega_\Pi$.
The output ThetaBarPi is required and is a $p \times pk$ matrix. The vector $\bar\theta_\Pi$ in (14.17) is obtained
by applying the vec operator to this output. Optionally, the matrix SigmaBarPi is provided. This
output is equal to $\bar\Sigma_\Pi$ in the same equation.
14.8.2.6. BVARPiMeanNormalCond
The function BVARPiMeanNormalCond needs 7 inputs: Psi, muPi, invOmegaPi, x, X, d, and
dLag. All these inputs apart from invOmegaPi are discussed above for the log posterior function
BVARLogPosteriorNormalCond. The input invOmegaPi is, of course, the inverse of $\Omega_\Pi$.
The output muBarPi is required and is the $p \times pk$ matrix $\bar\mu_\Pi$ in (14.19). Optionally, the matrix
OmegaBarPi is provided. This output is equal to $\bar\Omega_\Pi$ in the same equation.
14.8.2.7. BVAROmegaMinnesota
The function BVAROmegaMinnesota needs 8 inputs: Pi, Psi, A, qDF, x, X, d, and dLag. These inputs are also required by BVARLogPosteriorMinnesota. As output the function provides Omega,
the $p \times p$ matrix $\Omega$ in equation (14.14).
14.8.2.8. BVAROmegaNormal
The function BVAROmegaNormal accepts 10 inputs: Pi, Psi, A, qDF, muPi, invOmegaPi, x, X, d,
and dLag. These inputs are discussed above; see the BVARLogPosteriorNormalCond and the
BVARPiMeanNormalCond functions. As output the function provides Omega, the $p \times p$ matrix $\Omega$ in
equation (14.15).
14.8.3. Gibbs Sampling
The function BVARPosteriorSampling controls the events regarding the posterior sampling algorithm. The function takes exactly the same inputs as the posterior mode estimation function
BVARPosteriorModeEstimation. The Bayesian VAR estimation routines follow the same type
of logic as the DSGE model estimation routines. This means that you have to run the posterior
mode estimation before the posterior sampling function can be run. The reason is simply that
the posterior sampling routine for the Bayesian VAR uses the posterior mode estimates of the
parameters to initialize the Gibbs sampler.
As in the case of posterior sampling for the DSGE model, the sampling function for the
Bayesian VAR model reads a number of variable entries from the posterior mode estimation
output file. These data are read from file to ensure that exactly the same data is used for
posterior sampling as was used for posterior mode estimation. Hence, if you have changed
some hyperparameter after running the posterior mode estimation part, the new value will not
be used by YADA. Similarly, changes to the sample will be ignored as well as any other changes
to the data.
The posterior sampling function can look for sampling data that you have already generated and can load this data. These features operate in exactly the same way for the posterior
sampling function for the DSGE and the Bayesian VAR model.
The precise Gibbs sampler for the Bayesian VAR depends on the prior you are using for the
$\Psi$, $\Pi$, and $\Omega$ parameters. The posterior mode estimates are always used as initial values for
the sampler. To generate draw number $i$ of the parameters, YADA first draws $\Omega_i$ conditional
on $\Psi_{i-1}$ and $\Pi_{i-1}$ with the function InvWishartRndFcn. Since the marginal likelihood is
also estimated as described in Section 14.4, YADA also draws a value for $\Omega$ conditional on $\Psi$
fixed at the posterior mode and a theoretically consistent previous value of $\Pi$, i.e., one
that is also conditioned on $\Psi$ at the posterior mode. Next, YADA draws $\Pi_i$ conditional on
$\Psi_{i-1}$ and $\Omega_i$. Here it utilizes the function MultiNormalRndFcn. To finish the $i$:th draw, a
value of $\Psi_i$ conditional on $\Pi_i$ and $\Omega_i$ is obtained. Again, YADA makes use of the function
MultiNormalRndFcn.
As in the case of $\Omega$, YADA also draws a value of $\Pi$ conditional on $\Psi$ fixed at the posterior mode
and the draw obtained for $\Omega$ when $\Psi$ is fixed at the mode. This value of $\Pi$ is used for the next
draw of $\Omega$ conditional on $\Psi$ fixed at the posterior mode. In this fashion YADA generates two
sets of Gibbs sampler draws. The first full set $(\Psi_i, \Pi_i, \Omega_i)$ consists of draws from the joint posterior
and is also used to estimate the posterior density of $\Psi$ in equation (14.23). The second partial set
$(\Pi_j, \Omega_j)$ is only used to estimate the conditional posterior density of $\Omega$, given $\Psi$ at the
posterior mode, in equation (14.24).
14.8.3.1. InvWishartRndFcn
The function InvWishartRndFcn requires two inputs to generate a draw from the inverted
Wishart distribution. These inputs are A and df, representing the location parameter and the
degrees of freedom parameter respectively. As output the function provides Omega.
14.8.3.2. MultiNormalRndFcn
The function MultiNormalRndFcn generates a desired number of draws from the multivariate
normal distribution. As input the function needs mu, Sigma, and NumDraws. These inputs provide
the mean, the covariance matrix and the number of desired draws, respectively. The last input
is optional and defaults to 1.
As output the function gives z, a matrix with the same number of rows as the dimension
of the mean and the number of columns given by NumDraws.
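A hedged Python counterpart of MultiNormalRndFcn is shown below. It uses the Cholesky factor of the covariance matrix and returns one draw per column, matching the output layout described above. The function name and the `rng` argument are illustrative assumptions.

```python
import numpy as np


def multi_normal_draws(mu, Sigma, num_draws=1, rng=None):
    """Return a (len(mu), num_draws) array of N(mu, Sigma) draws."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)      # column vector
    L = np.linalg.cholesky(Sigma)                         # Sigma = L L'
    eps = rng.standard_normal((mu.shape[0], num_draws))   # iid N(0, 1)
    return mu + L @ eps                                   # one draw per column


rng = np.random.default_rng(7)
mu = [1.0, -2.0, 0.5]
Sigma = np.array([[1.0, 0.3, 0.0], [0.3, 2.0, 0.4], [0.0, 0.4, 0.5]])
z = multi_normal_draws(mu, Sigma, num_draws=100000, rng=rng)
single = multi_normal_draws(mu, Sigma, rng=rng)          # NumDraws defaults to 1
```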
14.8.4. Marginal Likelihood of the Bayesian VAR
Estimation of the marginal likelihood is handled by the function MargLikeChib in YADA. Before discussing this function in more detail it is worthwhile to keep Lindley's (also known as
Bartlett's) paradox in mind; see Bartlett (1957) and Lindley (1957). That is, as a rule of thumb
we should only compare the marginal likelihood value across two models if the prior is proper
in the dimensions where they differ. By proper we mean that the prior density should integrate
to unity (a finite constant) over these parameters. For instance, if the Minnesota-style prior is
assumed for $\Pi$, then we can compare the marginal likelihood for models that differ in terms of
the lag order, given that the same sample dates and variables are covered by $x_t$. If, instead, the
diffuse prior $p(\Pi) \propto 1$ is used for these parameters, then the marginal likelihoods should not
be compared in this dimension. The paradox here states that the model with fewer lags will
have a greater marginal likelihood value regardless of the information in the data.
14.8.4.1. MargLikeChib
The function MargLikeChib computes the log marginal likelihood using Chib's marginal likelihood identity. As input the function requires 9 inputs (and accepts a 10th). First of all
i , i , T
it takes two vectors with NumIter elements of values of the log densities p|
j
used for a particular estimate, the second column the estimated log marginal likelihood, and
the third column the numerical standard error of the estimate based on the Newey and West
(1987) correction for autocorrelation.
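This sketch shows how a Newey and West (1987) style numerical standard error can be attached to an average of density values taken over Gibbs draws. It uses Bartlett weights and applies the delta method to move to the log scale. The function names and the choice of lag length are assumptions; this is not the code inside MargLikeChib.

```python
import math


def newey_west_nse(values, lags):
    """Numerical standard error of a sample mean using Bartlett weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    def autocov(lag):
        return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n

    long_run = autocov(0)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)             # Bartlett kernel weight
        long_run += 2.0 * weight * autocov(lag)
    return math.sqrt(max(long_run, 0.0) / n)


def log_mean_density(log_dens, lags):
    """Log of the average of exp(log_dens) and its delta-method NSE."""
    top = max(log_dens)
    scaled = [math.exp(v - top) for v in log_dens]    # guard against overflow
    mean_scaled = sum(scaled) / len(scaled)
    estimate = top + math.log(mean_scaled)
    nse = newey_west_nse(scaled, lags) / mean_scaled  # d ln(m) = dm / m
    return estimate, nse


example_nse = newey_west_nse([1.0, 2.0, 1.0, 2.0], lags=1)   # 0.125
```

The negative first-order autocovariance in the alternating example lowers the standard error below the naive i.i.d. value. This is the kind of correction the Newey–West weighting provides.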
14.8.5. Forecasting with a Bayesian VAR
14.8.5.1. BVARPredictionPathsPostMode
The function BVARPredictionPathsPostMode requires 9 inputs. The first group is Psi, Pi, and
Omega with fixed values for $\Psi$, $\Pi$, and $\Omega$, respectively. Next, the function takes the structures
DSGEModel and CurrINI. Furthermore, the $p \times h$ matrix DPred with data on the exogenous
variables over the $h$ periods in the prediction sample, as well as h, the prediction sample length,
are needed. Finally, the function requires the integer NumPaths and the boolean AnnualizeData.
The former determines the number of prediction paths to compute at the fixed parameter value,
while the latter indicates if the prediction paths should be annualized or not.
The number of output variables is equal to 6. The first is the 3-dimensional matrix PredPaths,
whose dimensions are given by the number of observed variables, the length of the prediction
sample, and the number of prediction paths. The second output variable is PredMean, a matrix
with the population mean predictions of the observed variables. The following output variables
are the matrices PredEventData, which stores the prediction event results, and YObsEventData,
which stores the observed event paths, i.e., those when the mean of the paths is equal to the
observed path. These two matrices are obtained from the function CalculatePredictionEvents
and have as many rows as variables and 7 columns; see Section 12.7.5. The last two output
variables are called KernelX and KernelY, 3-dimensional matrices with kernel density estimates
of the marginal predictive densities. The dimensions of both matrices are equal to the number
of observed variables, the number of grid points, and the prediction sample length.
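Here is a hedged sketch of prediction paths at fixed parameter values. It assumes the mean-adjusted BVAR form $x_t = \Psi d_t + \sum_{l=1}^{k}\Pi_l(x_{t-l} - \Psi d_{t-l}) + \epsilon_t$ with $\epsilon_t \sim N(0, \Omega)$ and $\Pi = [\Pi_1 \cdots \Pi_k]$, which this section does not restate. The function name and argument layout are hypothetical, and annualization, prediction events, and kernel densities are left out.

```python
import numpy as np


def simulate_prediction_paths(Psi, Pi, Omega, x_hist, d_hist, d_pred,
                              num_paths, rng):
    """Simulate paths from a mean-adjusted VAR at fixed parameters.

    x_hist (p x k) and d_hist (q x k) hold the last k observations, oldest
    first; d_pred (q x h) holds the exogenous data over the prediction sample.
    Returns an array of shape (p, h, num_paths).
    """
    p = Psi.shape[0]
    k = Pi.shape[1] // p
    h = d_pred.shape[1]
    L = np.linalg.cholesky(Omega)
    paths = np.empty((p, h, num_paths))
    for n in range(num_paths):
        # deviations from the steady state, most recent last
        dev = list((x_hist - Psi @ d_hist).T)
        for t in range(h):
            lagged = np.concatenate([dev[-l] for l in range(1, k + 1)])
            new_dev = Pi @ lagged + L @ rng.standard_normal(p)
            dev.append(new_dev)
            paths[:, t, n] = Psi @ d_pred[:, t] + new_dev
    return paths


Psi = np.array([[1.0], [2.0]])                  # constant steady state
Pi = 0.5 * np.eye(2)                            # VAR(1) with k = 1
Omega = 0.01 * np.eye(2)
paths = simulate_prediction_paths(Psi, Pi, Omega,
                                  x_hist=np.array([[3.0], [0.0]]),
                                  d_hist=np.ones((1, 1)),
                                  d_pred=np.ones((1, 8)),
                                  num_paths=4000,
                                  rng=np.random.default_rng(3))
```

Averaging over the third dimension approximates the population mean prediction. In this example it is $\Psi + 0.5^{t}(x_T - \Psi)$ at horizon $t$.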
14.8.5.2. BVARPredictionPaths
The function BVARPredictionPaths also requires 9 inputs. The final 6 input variables are identical to the final 6 input variables for the BVARPredictionPathsPostMode function. The first 3,
however, are now given by the matrices PsiPostSample, PiPostSample, and OmegaPostSample.
The number of rows of these matrices is NumDraws, while the number of columns is equal to the
number of parameters of , , and , respectively.
The function gives 4 variables as output. First, the boolean variable DoneCalc indicates if
the calculations were finished or not. The second output is PredEventData, a $p \times 2$ matrix
with the prediction event results. Furthermore, the prediction uncertainty decomposition into
the residual uncertainty and the parameter uncertainty is provided through the 3D matrices
ShockCov and ParameterCov. The dimensions of these matrices are $p \times p \times h$, where $h$ is the
length of the prediction sample. This decomposition is only calculated when the boolean input
variable AnnualizeData is zero.
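The split into residual and parameter uncertainty can be understood through the law of total variance. The predictive covariance equals the average of the covariances given each parameter draw, plus the covariance of the conditional mean predictions across draws. The sketch below assumes this is the decomposition behind ShockCov and ParameterCov. The array layout and function name are illustrative.

```python
import numpy as np


def decompose_prediction_covariance(cond_means, cond_covs):
    """Split the predictive covariance into shock and parameter parts.

    cond_means: (S, p, h) mean predictions, one per posterior draw
    cond_covs:  (S, p, p, h) prediction covariances given each draw
    Returns (shock_cov, parameter_cov), each of shape (p, p, h).
    """
    shock_cov = cond_covs.mean(axis=0)                  # E[Cov(y | params)]
    centered = cond_means - cond_means.mean(axis=0)     # deviations of means
    # Cov(E[y | params]) per horizon, using 1/S weighting
    parameter_cov = np.einsum("sih,sjh->ijh", centered, centered) / len(cond_means)
    return shock_cov, parameter_cov


# Two equally likely draws: means 0 and 2, variances 1 and 3.
cond_means = np.array([[[0.0]], [[2.0]]])
cond_covs = np.array([[[[1.0]]], [[[3.0]]]])
shock, param = decompose_prediction_covariance(cond_means, cond_covs)
```

In this two-draw example the shock part is 2 and the parameter part is 1. They add up to the variance 3 of the equal-weight mixture.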
14.8.5.3. BVARCondPredictionPathsPostMode
The function BVARCondPredictionPathsPostMode for computing conditional forecasts with the
BVAR at posterior mode values of the parameters requires 11 input variables. Nine of these variables are shared with BVARPredictionPathsPostMode, the function for unconditional forecasts
at posterior mode. The additional two input variables are Z and U. The matrix Z is an $m \times g$
matrix with the conditioning data used by the Bayesian VAR, while the vector U holds the initial
values $u_T$; see equation (14.30).
The number of output variables supplied by the function is equal to 10. In addition to the
variables given by BVARPredictionPathsPostMode, it also provides MeanShocks, a $p \times h$ matrix
with the mean value of the shocks at the population mean prediction, and three variables
related to modesty analysis: MultiModestyStat, UniModestyStat, and UniModestyStatLZ. The
modesty statistics are only calculated when AnnualizeData is 0. In that case, MultiModestyStat
is a NumPaths $\times$ 2 matrix, with the multivariate modesty statistic and the reference statistic
in the two columns. The univariate Adolfson et al. (2005) statistics are stored in the NumPaths
$\times$ $p$ matrix UniModestyStat, while the univariate Leeper-Zha related statistics are given by
the $p$-dimensional vector UniModestyStatLZ.
14.8.5.4. BVARCondPredictionPaths
The function BVARCondPredictionPaths calculates conditional forecasts with the BVAR using
draws from the posterior distribution of the model parameters. It uses 11 input variables, where
nine are shared with BVARPredictionPaths, and the additional two inputs are given by Z and
U, discussed above for BVARCondPredictionPathsPostMode.
The function supplies 5 output variables and 4 of these are shared with the unconditional
forecasting function BVARPredictionPaths. The remaining variable is ShockMean, a $p \times h$ matrix
with the estimated population mean of the residuals over the posterior draws.
Similarly, the central DSGE model based population autocovariances of $y_t$ conditional on $\theta$ may
be expressed as:
$$\Sigma_{y,t}^{(j)} = E\big[(y_t - A'x_t)(y_{t-j} - A'x_{t-j})'; \theta\big] = \begin{cases} H_t'\Sigma_\xi H_t + R, & \text{if } j = 0, \\ H_t'F^j\Sigma_\xi H_{t-j}, & \text{for } j = 1, 2, \ldots. \end{cases} \tag{15.3}$$
Recall that $\Sigma_\xi$ is the central population covariance matrix of the state variables conditional on $\theta$,
i.e., it satisfies the Lyapunov equation (5.15). It now follows from equations (15.2) and (15.3)
that the non-central population autocovariances are
$$E\big[y_t y_{t-j}'; \theta\big] = \Sigma_{y,t}^{(j)} + A'x_t x_{t-j}'A, \quad t = 1, \ldots, T, \text{ and } j = 0, 1, \ldots. \tag{15.4}$$
We may next define average DSGE model based population moments from these expressions.
Specifically, the sample average of the products for the deterministic variables is
$$\bar\Gamma_x^{(j)} = \frac{1}{T}\sum_{t=1}^{T} x_t x_{t-j}', \quad j = 0, 1, \ldots, \tag{15.5}$$
while
$$\bar\Sigma_y^{(j)} = \frac{1}{T}\sum_{t=1}^{T} \Sigma_{y,t}^{(j)}, \quad j = 0, 1, \ldots, \tag{15.6}$$
and
$$\frac{1}{T}\sum_{t=1}^{T} E\big[y_t x_{t-j}'; \theta\big] = A'\bar\Gamma_x^{(j)}, \quad j = 0, 1, \ldots. \tag{15.7}$$
We shall use these average moments below when parameterizing the prior distribution of the
DSGE-VAR.
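Let $\Gamma_{YY}(\theta)$ be the $(np+k) \times (np+k)$ matrix with the average non-central population moments based on the $Y_t$ vector:
$$\Gamma_{YY}(\theta) = \begin{bmatrix}
\bar\Gamma_x^{(0)} & \bar\Gamma_x^{(1)}A & \cdots & \bar\Gamma_x^{(p)}A \\
A'\bar\Gamma_x^{(1)\prime} & \bar\Sigma_y^{(0)} + A'\bar\Gamma_x^{(0)}A & \cdots & \bar\Sigma_y^{(p-1)} + A'\bar\Gamma_x^{(p-1)}A \\
\vdots & \vdots & \ddots & \vdots \\
A'\bar\Gamma_x^{(p)\prime} & \bar\Sigma_y^{(p-1)\prime} + A'\bar\Gamma_x^{(p-1)\prime}A & \cdots & \bar\Sigma_y^{(0)} + A'\bar\Gamma_x^{(0)}A
\end{bmatrix}. \tag{15.11}$$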
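Similarly, let $\Gamma_{yY}(\theta)$ be an $n \times (np+k)$ matrix with the average non-central population moments
based on the $y_t$ and $Y_t$ vectors:
$$\Gamma_{yY}(\theta) = \begin{bmatrix} A'\bar\Gamma_x^{(0)} & \bar\Sigma_y^{(1)} + A'\bar\Gamma_x^{(1)}A & \cdots & \bar\Sigma_y^{(p)} + A'\bar\Gamma_x^{(p)}A \end{bmatrix}. \tag{15.12}$$
If $x_t$ is a constant and the $H$ matrix in the measurement equation is not time-varying, these
average population moments are the same as those given in Del Negro and Schorfheide (2004,
Section A.2).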
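A population based regression can now determine the mapping from the DSGE model parameters to the VAR parameters. Specifically, suppose that $\Gamma_{YY}(\theta)$ is invertible. Then
$$\Phi^*(\theta) = \Gamma_{yY}(\theta)\Gamma_{YY}(\theta)^{-1}, \tag{15.13}$$
$$\Sigma_\epsilon^*(\theta) = \Gamma_{yy}(\theta) - \Gamma_{yY}(\theta)\Gamma_{YY}(\theta)^{-1}\Gamma_{yY}(\theta)', \tag{15.14}$$
where $\Gamma_{yy}(\theta) = \bar\Sigma_y^{(0)} + A'\bar\Gamma_x^{(0)}A$ is an $n \times n$ matrix with average non-central population moments
based on the $y_t$ vector. The matrices $\Phi^*(\theta)$ and $\Sigma_\epsilon^*(\theta)$ are restriction functions that will be used
to center the prior distribution of $(\Phi, \Sigma_\epsilon)$, conditional on $\theta$ and a hyperparameter $\lambda \geq 0$ that
measures the deviation of the DSGE-VAR from the VAR approximation of the DSGE model.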
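Below is a minimal sketch of the population regression in (15.13)–(15.14). It makes several simplifying assumptions: there are no deterministic variables (so the $A$ terms drop out), $H$ is time-invariant, and the state equation is $\xi_t = F\xi_{t-1} + B_0\eta_t$ with $\eta_t \sim N(0, I)$. The function name is hypothetical. When the DSGE model is itself a fully observed VAR(1), the regression should return $F$ and $B_0B_0'$ exactly.

```python
import numpy as np


def var_approximation(F, B0, H, R, p):
    """Return (Phi_star, Sigma_star) for a VAR(p) fitted to population moments.

    Assumed model: xi_t = F xi_{t-1} + B0 eta_t, eta_t ~ N(0, I),
                   y_t = H' xi_t + w_t, w_t ~ N(0, R), no deterministic terms.
    """
    r = F.shape[0]
    Q = B0 @ B0.T
    # Lyapunov equation Sigma_xi = F Sigma_xi F' + Q, solved in vectorized form
    vec_sigma = np.linalg.solve(np.eye(r * r) - np.kron(F, F), Q.reshape(-1))
    Sigma_xi = vec_sigma.reshape(r, r)

    def autocov(j):                        # E[y_t y_{t-j}'] for j >= 0
        S = H.T @ np.linalg.matrix_power(F, j) @ Sigma_xi @ H
        return S + R if j == 0 else S

    n = H.shape[1]
    Gamma_yy = autocov(0)
    Gamma_yY = np.hstack([autocov(j) for j in range(1, p + 1)])
    Gamma_YY = np.zeros((n * p, n * p))
    for i in range(p):                     # block (i, j) = E[y_{t-1-i} y_{t-1-j}']
        for j in range(p):
            block = autocov(j - i) if j >= i else autocov(i - j).T
            Gamma_YY[i * n:(i + 1) * n, j * n:(j + 1) * n] = block
    Phi_star = np.linalg.solve(Gamma_YY.T, Gamma_yY.T).T    # Gamma_yY Gamma_YY^{-1}
    Sigma_star = Gamma_yy - Phi_star @ Gamma_yY.T
    return Phi_star, Sigma_star


F = np.array([[0.5, 0.1], [0.0, 0.3]])
B0 = np.array([[1.0, 0.0], [0.2, 0.5]])
Phi1, Sig1 = var_approximation(F, B0, np.eye(2), np.zeros((2, 2)), p=1)
Phi2, Sig2 = var_approximation(F, B0, np.eye(2), np.zeros((2, 2)), p=2)
```

With $p = 2$ the coefficients on the second lag come out as zero, because the second lag carries no extra information when the true model is a VAR(1).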
The density function of the inverted Wishart is provided in equation (4.12); see Section 4.2.4.
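Zellner (1971). Let the non-central sample product moment matrices be given by:
$$\hat\Gamma_{yy} = \frac{1}{T}\sum_{t=1}^{T} y_t y_t', \qquad \hat\Gamma_{YY} = \frac{1}{T}\sum_{t=1}^{T} Y_t Y_t', \qquad \hat\Gamma_{yY} = \frac{1}{T}\sum_{t=1}^{T} y_t Y_t'.$$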
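From, e.g., Del Negro and Schorfheide (2004), the conditional posterior distributions of the
DSGE-VAR parameters can now be expressed as:
$$\Sigma_\epsilon \mid y, Y_1, \theta, \lambda \sim IW_n\Big((1+\lambda)T\hat\Sigma_\epsilon(\lambda),\; (1+\lambda)T - (np+k)\Big), \tag{15.18}$$
$$\operatorname{vec}(\Phi) \mid y, Y_1, \Sigma_\epsilon, \theta, \lambda \sim N_{n(np+k)}\Big(\operatorname{vec}\big(\hat\Phi(\lambda)\big),\; \big[T\big(\lambda\Gamma_{YY}(\theta) + \hat\Gamma_{YY}\big)\big]^{-1} \otimes \Sigma_\epsilon\Big), \tag{15.19}$$
where
$$\hat\Phi(\lambda) = \Big(\frac{\lambda}{1+\lambda}\Gamma_{yY}(\theta) + \frac{1}{1+\lambda}\hat\Gamma_{yY}\Big)\Big(\frac{\lambda}{1+\lambda}\Gamma_{YY}(\theta) + \frac{1}{1+\lambda}\hat\Gamma_{YY}\Big)^{-1}, \tag{15.20}$$
$$\begin{aligned}\hat\Sigma_\epsilon(\lambda) = {} & \Big(\frac{\lambda}{1+\lambda}\Gamma_{yy}(\theta) + \frac{1}{1+\lambda}\hat\Gamma_{yy}\Big) - \Big(\frac{\lambda}{1+\lambda}\Gamma_{yY}(\theta) + \frac{1}{1+\lambda}\hat\Gamma_{yY}\Big) \\ & \times \Big(\frac{\lambda}{1+\lambda}\Gamma_{YY}(\theta) + \frac{1}{1+\lambda}\hat\Gamma_{YY}\Big)^{-1}\Big(\frac{\lambda}{1+\lambda}\Gamma_{yY}(\theta) + \frac{1}{1+\lambda}\hat\Gamma_{yY}\Big)'.\end{aligned} \tag{15.21}$$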
Notice that the posterior distributions in (15.18) and (15.19) depend on the initial values $Y_1$.
When we wish to compare marginal likelihoods for the DSGE-VAR to the DSGE model, the
information sets need to be the same. This may be handled in the DSGE model by using the
sample $y_{1-p}, \ldots, y_0$ as a training sample for the Kalman filter.
From the expressions in (15.20) and (15.21) it can also be seen that the larger $\lambda$ is, the closer
the posterior mean of the VAR parameters is to $\Phi^*(\theta)$ and $\Sigma_\epsilon^*(\theta)$, the values that respect the cross-equation restrictions of the DSGE model. At the same time, the smaller $\lambda$ becomes, the closer
the posterior mean is to the classical maximum likelihood estimates of $\Phi$ and $\Sigma_\epsilon$.
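The limiting behavior described above can be checked numerically. The sketch assumes the posterior mean takes the $\lambda$-weighted form of Del Negro and Schorfheide (2004), i.e., weights $\lambda/(1+\lambda)$ on the model-implied moments and $1/(1+\lambda)$ on the sample moments. The function name and the moment matrices are made up for illustration.

```python
import numpy as np


def dsge_var_posterior_mean(lam, G_yY, G_YY, Ghat_yY, Ghat_YY):
    """Posterior mean of Phi as a lambda-weighted mix of moment matrices."""
    w_model, w_data = lam / (1.0 + lam), 1.0 / (1.0 + lam)
    num = w_model * G_yY + w_data * Ghat_yY      # weighted cross moments
    den = w_model * G_YY + w_data * Ghat_YY      # weighted regressor moments
    return np.linalg.solve(den.T, num.T).T       # num @ inv(den)


G_YY = np.array([[2.0, 0.3], [0.3, 1.0]])       # model-implied moments
G_yY = np.array([[0.8, 0.1]])
Ghat_YY = np.array([[1.5, -0.2], [-0.2, 0.9]])  # sample moments
Ghat_yY = np.array([[0.2, 0.5]])

phi_star = np.linalg.solve(G_YY.T, G_yY.T).T    # Phi*(theta)
phi_ml = np.linalg.solve(Ghat_YY.T, Ghat_yY.T).T
near_model = dsge_var_posterior_mean(1e8, G_yY, G_YY, Ghat_yY, Ghat_YY)
near_data = dsge_var_posterior_mean(1e-8, G_yY, G_YY, Ghat_yY, Ghat_YY)
```

A very large $\lambda$ reproduces the restriction function $\Phi^*(\theta)$, while a tiny $\lambda$ reproduces the least squares (maximum likelihood) estimate.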
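$$\cdots \times \frac{\Gamma_n\big((1+\lambda)T - np - k\big)}{\Gamma_n\big(\lambda T - np - k\big)},$$
where $\Gamma_b(a) = \prod_{i=1}^{b}\Gamma\big((a+1-i)/2\big)$ for positive integers $a, b$ with $a \geq b$, and $\Gamma(\cdot)$ is the gamma
function; see equation (4.4).100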
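The gamma product $\Gamma_b(a)$ and the log ratio $\ln\Gamma_n((1+\lambda)T - np - k) - \ln\Gamma_n(\lambda T - np - k)$ are best computed on the log scale. The sketch below uses Python's `math.lgamma`. The function names are hypothetical. As an added assumption, the sketch also accepts non-integer $a$ whenever every gamma argument stays positive, which matters when $\lambda T$ is not an integer.

```python
import math


def log_multi_gamma(a, b):
    """ln Gamma_b(a) = sum_{i=1}^{b} ln Gamma((a + 1 - i) / 2)."""
    if b < 1 or a <= b - 1:
        raise ValueError("need b >= 1 and a > b - 1")
    return sum(math.lgamma((a + 1 - i) / 2.0) for i in range(1, b + 1))


def log_gamma_product_ratio(lam, T, n, p, k):
    """ln Gamma_n((1+lam)T - np - k) - ln Gamma_n(lam T - np - k)."""
    return (log_multi_gamma((1 + lam) * T - n * p - k, n)
            - log_multi_gamma(lam * T - n * p - k, n))


# n = p = k = 1, T = 10, lambda = 1: ln Gamma(9) - ln Gamma(4) = ln(8!/3!)
ratio = log_gamma_product_ratio(1.0, 10, 1, 1, 1)
```

Working with logs avoids overflow, since $\Gamma(\cdot)$ grows faster than exponentially for the sample sizes typically used.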
100
Relative to equation (A.2) in Del Negro and Schorfheide (2004), the expression in (15.24) takes into account
that all terms involving powers of 2 cancel out in the numerator and denominator. The expression in (15.24) can
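These results are valid when $\lambda$ is finite. The case of $\lambda = \infty$ implies that the VAR parameters
$(\Phi, \Sigma_\epsilon)$ are equal to $(\Phi^*(\theta), \Sigma_\epsilon^*(\theta))$. The two densities for the VAR parameters in (15.23) are
therefore unity, so that
$$p(y \mid Y_1, \theta, \lambda = \infty) = p\big(y \mid Y_1, \Phi^*(\theta), \Sigma_\epsilon^*(\theta), \theta\big). \tag{15.25}$$
The right hand side of (15.25) is the likelihood function of the VAR and, hence, the multivariate
normal density provides us with
$$p\big(y \mid Y_1, \Phi^*(\theta), \Sigma_\epsilon^*(\theta), \theta\big) = (2\pi)^{-nT/2}\big|\Sigma_\epsilon^*(\theta)\big|^{-T/2}\exp\Big(-\frac{T}{2}\operatorname{tr}\Big[\Sigma_\epsilon^*(\theta)^{-1}\Big(\hat\Sigma_\epsilon + \big(\Phi^*(\theta) - \hat\Phi\big)\hat\Gamma_{YY}\big(\Phi^*(\theta) - \hat\Phi\big)'\Big)\Big]\Big), \tag{15.26}$$
where $\hat\Sigma_\epsilon = \hat\Gamma_{yy} - \hat\Gamma_{yY}\hat\Gamma_{YY}^{-1}\hat\Gamma_{yY}'$ and $\hat\Phi = \hat\Gamma_{yY}\hat\Gamma_{YY}^{-1}$ is the maximum likelihood estimator of $\Phi$.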
The posterior density of the original DSGE model parameters for a given $\lambda$ is proportional to
the marginal likelihood in (15.24) times the prior of $\theta$. That is
$$p(\theta \mid y, Y_1, \lambda) \propto p(y \mid Y_1, \theta, \lambda)\,p(\theta). \tag{15.27}$$
Since the marginal likelihood in (15.24) is equal to the marginal likelihood for the transformed
parameters $\phi$, the posterior density of the transformed DSGE model parameters is proportional
to the marginal likelihood times the prior of $\phi$. The latter prior is, as noted in Section 6, equal
to the product of the Jacobian in the transformation from $\theta$ into $\phi$ and the prior of $\theta$; see, e.g.,
Section 4.2.1. With $\theta = g^{-1}(\phi)$, this means that
$$p(\phi \mid y, Y_1, \lambda) \propto p\big(y \mid Y_1, g^{-1}(\phi), \lambda\big)\,p(\phi), \tag{15.28}$$
where $p(\phi) = J(\phi)\,p\big(g^{-1}(\phi)\big)$.
It is now possible to sample from the posterior distribution of $\phi$ for each $\lambda$ by relying
on one of the MCMC algorithms discussed in Section 8. The RWM algorithm is considered in
Del Negro and Schorfheide (2004, 2006) and Del Negro et al. (2007), but a slice sampler can
also be used. If the RWM algorithm is used, then the proposal density from the DSGE model
may be used. Alternatively, the posterior mode and inverse Hessian at the mode of $\phi$ can be
computed from the above expressions via a numerical optimization routine, as in Section 7,
and then used for the posterior sampler. If the DSGE model is severely misspecified, this latter
approach may result in a better proposal density. YADA allows for both possibilities, and when
the posterior mode exists for all $\lambda$ as well as for the DSGE model, the user will be asked which
approach to take.
1T npk/2
1 T
Y Y 1/
Y Y
|Y Y |
1T npk/2
nT/2
1 1/
.
T npk/2
T npk/2
|T |
| |
The expressions on the right hand side of these two relations are preferable from a numerical perspective since they
are less likely to involve matrices with large numerical values.
If we assign equal probabilities to the elements of $\Lambda$, then the posterior probabilities for this
hyperparameter are proportional to the marginal likelihood. This means that $\hat\lambda$ in (15.29) is the
posterior mode of $\lambda$.101
The function $p(y \mid Y_1, \lambda)$ summarizes the time series evidence on model misspecification and
documents by how much the DSGE model must be relaxed to balance within-sample fit and
model complexity. To estimate the marginal likelihood we may use either the modified harmonic mean due to Geweke (1999, 2005) that was discussed in Section 10.2, or the marginal
likelihood identity based estimator due to Chib and Jeliazkov (2001) that was presented in Section 10.3. The latter approach should only be used if the RWM algorithm has been used for
posterior sampling.
The posterior distribution of the DSGE-VAR parameters can be computed by generating a
pair $(\Phi_s, \Sigma_{\epsilon,s})$ from the normal-inverted Wishart distribution in (15.18)–(15.19) for each $\theta_s$
that was obtained under $p(\theta \mid y, Y_1, \lambda)$ for $s = 1, \ldots, N$. Once these parameters have been sampled, the
DSGE-VAR model can be applied to any exercise that is valid for a reduced form VAR model,
such as forecasting or estimating the implied population or sample moments of the model.
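The sketch below generates one pair $(\Phi_s, \Sigma_{\epsilon,s})$ for given posterior parameters. It assumes the column-stacking convention in which $\operatorname{vec}(\Phi)$ has covariance $[T(\lambda\Gamma_{YY}(\theta) + \hat\Gamma_{YY})]^{-1} \otimes \Sigma_\epsilon$, and it passes $M = \lambda\Gamma_{YY}(\theta) + \hat\Gamma_{YY}$ directly. The inverted Wishart draw uses the Bartlett decomposition, so non-integer degrees of freedom are allowed. All names are illustrative, not YADA functions.

```python
import numpy as np


def draw_inv_wishart(S, df, rng):
    """Sigma ~ IW(S, df) via Bartlett: Sigma^{-1} ~ Wishart(S^{-1}, df)."""
    n = S.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(S))
    A = np.tril(rng.standard_normal((n, n)), k=-1)          # N(0,1) below diagonal
    A[np.diag_indices(n)] = np.sqrt(rng.chisquare(df - np.arange(n)))
    W = L @ A @ A.T @ L.T                                    # Wishart(S^{-1}, df)
    return np.linalg.inv(W)


def draw_phi_sigma(Phi_hat, Sigma_hat, M, lam, T, rng):
    """One draw from the normal-inverted Wishart posterior of (Phi, Sigma)."""
    n, m = Phi_hat.shape                                     # m = np + k
    Sigma = draw_inv_wishart((1 + lam) * T * Sigma_hat, (1 + lam) * T - m, rng)
    C_sigma = np.linalg.cholesky(Sigma)
    C_v = np.linalg.cholesky(np.linalg.inv(T * M))
    Z = rng.standard_normal((n, m))
    # vec(C_sigma Z C_v') has covariance (C_v C_v') kron (C_sigma C_sigma')
    return Phi_hat + C_sigma @ Z @ C_v.T, Sigma


rng = np.random.default_rng(11)
Phi_hat = np.array([[0.5, 0.1, 0.0], [0.2, 0.3, 1.0]])
Sigma_hat = np.array([[1.0, 0.2], [0.2, 0.5]])
M = np.array([[1.5, 0.2, 0.0], [0.2, 1.0, 0.1], [0.0, 0.1, 0.8]])
lam, T = 0.5, 40
draws = [draw_phi_sigma(Phi_hat, Sigma_hat, M, lam, T, rng) for _ in range(4000)]
Phi_mean = np.mean([d[0] for d in draws], axis=0)
Sigma_mean = np.mean([d[1] for d in draws], axis=0)
```

With $(1+\lambda)T = 60$, $np+k = 3$, and $n = 2$, the inverted Wishart mean is $60\,\hat\Sigma_\epsilon(\lambda)/54$. The average of $\Phi$ draws should be close to $\hat\Phi(\lambda)$.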
As noted by Adolfson et al. (2008b), an appealing feature of the comparison of the DSGE
model to DSGE-VARs is that the same prior distribution is used to weight the likelihood functions across models when forming the marginal likelihood. Bayesian model probabilities have
great appeal, but they are sensitive to the choice of prior. This sensitivity may not be so large
when the models are similar and the prior is elicited in a similar way. A comparison between
a DSGE model and a Bayesian VAR using a statistically motivated prior, as in Section 14.1, is
more likely to be sensitive to the selected priors.102
Even if the DSGE model does not have a finite order VAR representation, the VAR model
mainly functions as a tool to relax the cross-equation restrictions and to obtain a specification
with superior empirical fit. The VAR model does not have to nest the DSGE model for this
analysis to remain sensible since the moments of the DSGE model that are used to form the
prior on the VAR are exact regardless of how good the approximation is. This means that a
large $\hat\lambda$ indicates that the cross product moments of the DSGE model that are used to form the
prior agree well with the likelihood function.
15.7. Posterior Mode of the DSGE-VAR
The mode of the marginal posterior of $\phi$ can be determined by maximizing $p(\phi \mid y, Y_1, \lambda)$ in
(15.28) numerically with respect to $\phi$. The mode of the marginal posterior of $\theta$ is thereafter
obtained by using the transformation function $\theta = g^{-1}(\phi)$ when the posterior density for the
transformed parameters has been utilized; cf. Section 6. Alternatively, the marginal posterior
mode of $\theta$ can be computed by maximizing (15.27) numerically with respect to the original
parameters. Given that we have located the posterior mode for each $\lambda$, the Laplace approximation of the marginal likelihood for $p(y \mid Y_1, \lambda)$ may be computed using the expression
101
To elicit a proper prior for a continuous $\lambda \geq \lambda_l \geq (n(p+1)+k)/T$ which we are willing to regard as fair is not
a trivial problem. To see why, notice that a natural transformation of $\lambda$ is $\varphi = \lambda/(1+\lambda)$, where $\lambda \in [\lambda_l, \infty)$
and $0 < \varphi_l = \lambda_l/(1+\lambda_l) \leq \varphi < 1$. The transformation is natural in the sense that it delivers a parameter which is defined
over a finite interval, thus making it feasible to use a uniform distribution. Although such a prior for $\varphi$ may seem
fair since it gives different values an equal weight, it implies that $\lambda$ has the density
$$p(\lambda \mid \lambda_l) = \frac{1+\lambda_l}{(1+\lambda)^2}.$$
Hence, $\lambda$ is Pareto distributed, with cdf $F(\lambda \mid \lambda_l) = 1 - (1+\lambda_l)/(1+\lambda)$; see Section 4.2.12. Since the shape parameter
($a$ in equation 4.33) is unity, the moments do not exist. The location parameter, $\lambda_l$, is equal both to the lower bound
and to the mode of the distribution (while the origin parameter, $c$, is $-1$). Hence, this prior puts an extreme weight
on values of $\lambda$ close to the lower bound, $\lambda_l$, and therefore on models which are as far away as possible from the
DSGE model. Moreover, the density height falls off quickly, at the rate $(1+\lambda)^{-2}$, as $\lambda$ increases. While such a prior may seem
appropriate among economists who think DSGE models are of little or no value, the penalties on models with larger
$\lambda$ values are extreme. Although this prior for $\lambda$ is proper, it is not even in the neighborhood of being fair. In fact, it
is not even in the fringes of the fringes of the fringes of satisfying such a (loose) concept.
102
The Bayesian VAR prior is radically different from the economic prior of the DSGE model.
in equation (10.1). Although this may be a rather crude approximation unless sufficient care
is taken when calculating the Hessian matrix numerically, it can nevertheless yield some information about which values of $\lambda$ the data tend to favor. In fact, this is the approach taken in
Adolfson et al. (2008b, Table 2) when comparing their DSGE models to DSGE-VARs with and
without cointegration relations.
When $\lambda = \infty$, the VAR parameters are fully determined by $\theta$ and, hence, the joint posterior
mode of all parameters is determined directly from the mode of the marginal posterior of $\theta$.
When $\lambda$ is finite, however, we only approximate the mode of the joint posterior of $(\Phi, \Sigma_\epsilon, \theta)$
through the mode of the joint conditional posterior of $(\Phi, \Sigma_\epsilon)$ if we plug the marginal mode
of $\theta$ into the relations that determine the joint conditional mode. The approximation result
follows when we note that the joint conditional posterior of $(\Phi, \Sigma_\epsilon)$ depends on $\theta$ and this
dependence is not taken into account when we compute the mode of the marginal posterior of
$\theta$. Given that the marginal mode of $\theta$ is close to the joint mode of $\theta$, the approximation of the
joint posterior mode of $(\Phi, \Sigma_\epsilon)$ can be expected to be very good.
Still, once we have determined the mode of the joint conditional posterior of the VAR parameters, we can compute the concentrated likelihood for $\theta$. From this likelihood, the joint
mode of $\theta$ may be computed through numerical optimization of the concentrated posterior of $\theta$.
The resulting value may then be used in the expressions determining the joint mode of the VAR
parameters and, thus, provide us with the mode of the joint posterior of $(\Phi, \Sigma_\epsilon, \theta)$.
The joint posterior of $(\Phi, \Sigma_\epsilon, \theta)$ can be factorized as in (15.22), where the first term on the
right hand side can be rewritten as
$$p(\Phi, \Sigma_\epsilon \mid y, Y_1, \theta, \lambda) = p(\Phi \mid y, Y_1, \Sigma_\epsilon, \theta, \lambda)\,p(\Sigma_\epsilon \mid y, Y_1, \theta, \lambda). \tag{15.30}$$
Since the full conditional posterior of $\Phi$ is given by the normal distribution, it follows that at
the mode of the joint distribution $\Phi$ is equal to its mean
$$\bar\Phi = \hat\Phi(\lambda), \tag{15.31}$$
where the term on the right hand side is given in equation (15.20). Substituting this value into
the first density function on the right hand side of (15.30), we obtain an expression from which
the joint mode of $\Sigma_\epsilon$ can be determined as a function of $\theta$. Maximizing this function with respect
to the covariance matrix of the VAR residuals, one arrives at
$$\bar\Sigma_\epsilon = \frac{(1+\lambda)T}{(1+\lambda)T + n + 1}\hat\Sigma_\epsilon(\lambda), \tag{15.32}$$
where the second term on the right hand side is given in equation (15.21). It is now straightforward to show that the mode of $\Sigma_\epsilon$ for the joint posterior is less than the mode of the conditional
posterior density in (15.18).103 As noted above, if we plug the marginal mode of $\theta$ into equations (15.31) and (15.32), we may use these values as an approximation of the joint mode of
the DSGE-VAR. However, to actually determine the joint mode we need to continue a few more
steps.
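First, substituting the mode expressions in (15.31) and (15.32) for $\Phi$ and $\Sigma_\epsilon$, respectively, into
(15.30) and rearranging terms, we find that
$$p\big(\bar\Phi, \bar\Sigma_\epsilon \mid y, Y_1, \theta, \lambda\big) = c(\lambda, T)\,\big|\lambda T\Gamma_{YY}(\theta) + T\hat\Gamma_{YY}\big|^{n/2}\,\big|(1+\lambda)T\hat\Sigma_\epsilon(\lambda)\big|^{-(n(p+1)+k+1)/2}, \tag{15.33}$$
where
$$c(\lambda, T) = \frac{\big((1+\lambda)T + n + 1\big)^{n((1+\lambda)T+n+1)/2}\exp\big(-n((1+\lambda)T + n + 1)/2\big)}{\pi^{n(np+k)/2}\,2^{n(1+\lambda)T/2}\,\pi^{n(n-1)/4}\,\Gamma_n\big((1+\lambda)T - np - k\big)}.$$
103
It was noted in Section 14.1 that if $\Omega \sim IW_p(A, v)$, then its mode is equal to $(1/(p+v+1))A$. This means
that the mode of the conditional posterior of $\Sigma_\epsilon$ in (15.18) is equal to
$$\bar\Sigma_\epsilon^{c} = \frac{(1+\lambda)T}{(1+\lambda)T + n + 1 - (np+k)}\hat\Sigma_\epsilon(\lambda).$$
Accordingly, $\frac{(1+\lambda)T}{(1+\lambda)T + n + 1} < \frac{(1+\lambda)T}{(1+\lambda)T + n + 1 - (np+k)}$.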
When evaluating the conditional posterior of $(\Phi, \Sigma_\epsilon)$ at the joint mode, we therefore have two
terms that depend on $\theta$ and that will influence the determination of the joint posterior mode
for all parameters.
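Second, to estimate the mode of $\theta$ from the joint posterior of $(\Phi, \Sigma_\epsilon, \theta)$, we need to multiply
the marginal likelihood of $\theta$ in (15.24) by the right hand side of (15.33). The corresponding
concentrated likelihood of $\theta$ is given by:
$$p_c(y \mid Y_1, \theta, \lambda) = \tilde c(\lambda, T)\,\frac{\big|\lambda T\Gamma_{YY}(\theta)\big|^{n/2}\,\big|\lambda T\Sigma_\epsilon^*(\theta)\big|^{(\lambda T - np - k)/2}}{\big|(1+\lambda)T\hat\Sigma_\epsilon(\lambda)\big|^{((1+\lambda)T + n + 1)/2}}, \tag{15.34}$$
where
$$\tilde c(\lambda, T) = \frac{\big((1+\lambda)T + n + 1\big)^{n((1+\lambda)T+n+1)/2}\exp\big(-n((1+\lambda)T + n + 1)/2\big)}{\pi^{n(T+np+k)/2}\,2^{n(1+\lambda)T/2}\,\pi^{n(n-1)/4}\,\Gamma_n\big(\lambda T - np - k\big)}.$$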
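Finally, with $m$ being the dimension of $\phi$, the last step is to solve the following numerical
problem
$$\bar\phi = \arg\max_{\phi \in \mathbb{R}^m}\; p_c\big(y \mid Y_1, g^{-1}(\phi), \lambda\big)\,p(\phi), \tag{15.35}$$
with $\bar\theta = g^{-1}(\bar\phi)$.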
Y Y
,
tr
2
where $\theta = g^{-1}(\phi)$, while $c(\lambda, T)$ is a function that does not affect the Hessian matrix. Using
the tools for matrix differential calculus that are described in Magnus and Neudecker (1988), it
can be shown that:
2 ln p , , |y, Y1 ,
Y Y
1 ,
T
(15.38)
Y
Y
vecvec
2 ln p , , |y, Y1 ,
0,
(15.39)
vecvech
2 ln p , , |y, Y1 ,
Y Y
1 F ,
T
(15.40)
Y
Y
vec
where vech is the column stacking operator that only takes the elements on and below the
is given in equation (15.43)
vec
Y Y 1 In
Y
Y
(15.43)
Dnpk GY Y .
GyY Inpk
Furthermore, the nn 1/2 m matrix F is given by:
vech
Gyy Dn
Dnpk GY Y
1
2Dn In GyY .
(15.44)
For the third term within large brackets on the right hand side, the result $D_nD_n^{+} = \frac{1}{2}(I_{n^2} + K_{nn}) = N_n$
has been used (see Magnus and Neudecker, 1988, Theorem 3.12) so that $D_n'N_n = D_n'$.
The last matrix in the partitioned Hessian at the mode is given by
2 ln p , , |y, Y1 ,
Y Y
1 F
TF
Y
Y
1
1 T 2 tr
n 2 ln |Y Y |
(15.45)
2
2
T np k 2 ln | | 2 ln p
.
2
The last four terms on the right hand side can be computed numerically. Since they are all
m m matrices, the dimensions are kept down substantially relative to a numerical Hessian for
the joint log posterior. It may be noted that for the second term on the right hand side, the
is kept fixed at ,
while the matrix
varies with .
matrix
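For the inverse Hessian at the mode it is recommended to make use of results for partitioned
matrices and inverses of Kronecker products. Let the Hessian be described by the matrix
$$H = \begin{bmatrix} H_{\Phi,\Phi} & 0 & H_{\Phi,\phi} \\ 0 & H_{\Sigma,\Sigma} & H_{\Sigma,\phi} \\ H_{\phi,\Phi} & H_{\phi,\Sigma} & H_{\phi,\phi} \end{bmatrix},$$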
The duplication matrix is defined from the relationship vecA Dn vechA for a symmetric n n matrix A,
where vech is the column stacking operator that only takes the elements on and below the diagonal.
104
where
Y Y
1 ,
H, Y Y 1/
i.e., we have multiplied equation (15.38) with 1/T. The inverse of this matrix is simply
1
Y Y 1
.
Y Y 1/
H,
Similarly,
H,
1 T n 1
1 Dn ,
Dn 1
2T
The inverse matrix on the right hand side exists and is positive definite if H is positive definite.
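The inverse of $H$ can now be expressed as:
$$H^{-1} = \begin{bmatrix} H^{\Phi,\Phi} & H^{\Phi,\Sigma} & H^{\Phi,\phi} \\ H^{\Sigma,\Phi} & H^{\Sigma,\Sigma} & H^{\Sigma,\phi} \\ H^{\phi,\Phi} & H^{\phi,\Sigma} & H^{\phi,\phi} \end{bmatrix},$$
where the $m \times m$ matrix $H^{\phi,\phi}$ has already been determined. The remaining 5 partitions of the
inverse of $H$ are given by:
$$H^{\Phi,\Phi} = H_{\Phi,\Phi}^{-1} + H_{\Phi,\Phi}^{-1}H_{\Phi,\phi}H^{\phi,\phi}H_{\phi,\Phi}H_{\Phi,\Phi}^{-1},$$
$$H^{\Sigma,\Sigma} = H_{\Sigma,\Sigma}^{-1} + H_{\Sigma,\Sigma}^{-1}H_{\Sigma,\phi}H^{\phi,\phi}H_{\phi,\Sigma}H_{\Sigma,\Sigma}^{-1},$$
$$H^{\Phi,\Sigma} = H_{\Phi,\Phi}^{-1}H_{\Phi,\phi}H^{\phi,\phi}H_{\phi,\Sigma}H_{\Sigma,\Sigma}^{-1},$$
$$H^{\Phi,\phi} = -H_{\Phi,\Phi}^{-1}H_{\Phi,\phi}H^{\phi,\phi},$$
$$H^{\Sigma,\phi} = -H_{\Sigma,\Sigma}^{-1}H_{\Sigma,\phi}H^{\phi,\phi}.$$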
Finally, the inverse Hessian at the mode is obtained by multiplying $H^{-1}$ by $1/T$; see, e.g.,
Magnus and Neudecker (1988, Theorem 1.3) for the inverse of the partitioned matrix $H$.
It is noteworthy that if the inverse Hessian is computed by combining these analytical results
with numerical derivatives, the matrices that need to be inverted have dimensions
$np+k$, $n$, and $m$, while the dimension of the inverse Hessian itself is $n(np+k) + n(n+1)/2 + m$.
For medium-sized DSGE models we can therefore expect that numerical precision can
be greatly improved by using the above procedure and that the computation time itself is also
shortened.
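The partitioned-inverse recipe can be checked numerically. In this hedged sketch the Hessian has a zero block between the $\Phi$ and $\Sigma_\epsilon$ parts. The upper-left block is a Kronecker product whose inverse is formed cheaply as $(G \otimes \Sigma^{-1})^{-1} = G^{-1} \otimes \Sigma$. The remaining blocks follow from the Schur complement of the last block. The function name and test matrices are made up for illustration. Inside the function, only the $m \times m$ Schur complement is inverted.

```python
import numpy as np


def block_inverse(A_inv, C_inv, B, D, E):
    """Invert H = [[A, 0, B], [0, C, D], [B', D', E]] from A^{-1} and C^{-1}."""
    S = E - B.T @ A_inv @ B - D.T @ C_inv @ D     # Schur complement of last block
    Hff = np.linalg.inv(S)                         # only an m x m inversion
    Hpf = -A_inv @ B @ Hff
    Hsf = -C_inv @ D @ Hff
    Hpp = A_inv + A_inv @ B @ Hff @ B.T @ A_inv
    Hss = C_inv + C_inv @ D @ Hff @ D.T @ C_inv
    Hps = A_inv @ B @ Hff @ D.T @ C_inv
    return np.block([[Hpp, Hps, Hpf],
                     [Hps.T, Hss, Hsf],
                     [Hpf.T, Hsf.T, Hff]])


rng = np.random.default_rng(0)
G = np.array([[2.0, 0.3], [0.3, 1.0]])
Sig = np.array([[1.0, 0.2], [0.2, 0.5]])
A = np.kron(G, np.linalg.inv(Sig))                 # Kronecker-structured block
A_inv = np.kron(np.linalg.inv(G), Sig)             # inverse without a 4 x 4 solve
C = np.array([[3.0, 0.5], [0.5, 2.0]])
B = 0.1 * rng.standard_normal((4, 2))
D = 0.1 * rng.standard_normal((2, 2))
E = np.array([[5.0, 1.0], [1.0, 4.0]])
H = np.block([[A, np.zeros((4, 2)), B],
              [np.zeros((2, 4)), C, D],
              [B.T, D.T, E]])
H_inv = block_inverse(A_inv, np.linalg.inv(C), B, D, E)
```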
(15.46)
The non-singular $A_0$ matrix is of dimension $n \times n$ and each column is equal to the contemporaneous response in $y_t$ from a unit impulse to the corresponding element of the structural shocks. To be exactly
identified we need to impose $n(n-1)/2$ identifying restrictions in addition to the $n(n+1)/2$
that have already been implicitly imposed through the assumed covariance matrix of the structural shocks. The
(15.47)
1/2
(
(15.49)
106
The Q-R factorization of an $n \times n$ matrix $A$ of rank $n$ is given by $A = QR$. Here, the $n \times n$ matrix $Q$
is orthogonal, while the $n \times n$ matrix $R$ is upper triangular; see, e.g., Golub and van Loan (1983, p. 147) or the
qr function in Matlab. Hence, to obtain the matrices in (15.49) we instead compute the Q-R factorization for
$A_0'$. Moreover, since some diagonal elements of $R$ may be negative, it is necessary to premultiply this matrix
with a diagonal $n \times n$ matrix $S$, whose diagonal entries are 1 ($-1$) when the corresponding diagonal elements of
$R$ are positive (negative). The resulting matrix $SR$ is upper triangular with only positive diagonal elements and is
therefore a suitable candidate for $\Sigma_\epsilon^{1/2}$. Furthermore, we need to postmultiply $Q$ with $S$. Since $S$ is orthogonal it
follows that $QS$ is also orthogonal and may be used as the orthogonal matrix in (15.49).
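The sign normalization in the footnote above can be sketched in Python. The sketch uses `numpy.linalg.qr` in place of Matlab's qr; this substitution and the function name are assumptions, not YADA code. Because $S^2 = I$, the normalized factors still multiply back to the original matrix.

```python
import numpy as np


def qr_positive_diagonal(M):
    """Q-R factorization with R's diagonal forced positive: M = (Q S)(S R)."""
    Q, R = np.linalg.qr(M)
    S = np.diag(np.where(np.diag(R) < 0, -1.0, 1.0))   # +1 / -1 sign matrix
    return Q @ S, S @ R                                  # S is its own inverse


M = np.array([[2.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.5, 0.0, -2.0]])
Qs, Rs = qr_positive_diagonal(M)
```

Without the sign matrix, the signs of the diagonal of $R$ depend on the Householder implementation. This is why the normalization is needed before $SR$ can serve as a square-root factor.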
15.9.1.2. GetDSGEVARPriorParameters
The function GetDSGEVARPriorParameters needs 8 input variables: theta, thetaPositions,
ModelParameters, PriorDist, thetaDist, LowerBound, DSGEModel, and CurrINI. These are
needed to solve the DSGE model and provide the state-space form at the value of the original
DSGE model parameters; see Section 11.16.
As output the function returns 11 variables. First of all, the matrices Phi and SigmaEpsilon
with the prior parameters in equations (15.13) and (15.14). The following 5 output variables
are the matrices from the state-space representation of the DSGE model, A, H, R, F, and B0.
Next, the variable AIMData is given, which is followed by DetProductMoments and HSample that
are input variables for DSGEVARPrior above. Finally, the function provides a boolean Status
variable, which is unity if the prior parameters could be computed and 0 otherwise.
15.9.1.3. DSGEVARParameters
The function DSGEVARParameters requires 15 input variables: A, H, R, F, B0, lambda, GammaHatyy,
GammaHatyY, GammaHatYY, DetProductMoments, p, initP, MaxIter, Tolerance, and HSample.
Most of these variables are discussed above for the DSGEVARPrior function. The variable lambda
is equal to the hyperparameter $\lambda$ that determines how close the DSGE-VAR model is to the VAR
approximation of the DSGE model. The variables GammaHatyy, GammaHatyY and GammaHatYY are
simply the non-central sample product moment matrices $\hat\Gamma_{yy}$, $\hat\Gamma_{yY}$, and $\hat\Gamma_{YY}$.
The function provides 6 output variables: Phi, SigmaEpsilon, Gammayy, GammaYY, GammayY,
VAR ($np+k$). Finally, the variable logGPR is the log of the gamma product ratio in equation
(15.24), i.e., $\ln\Gamma_n((1+\lambda)T - np - k) - \ln\Gamma_n(\lambda T - np - k)$.
The function provides one required output variable, logPost, i.e., minus the height of the
log posterior density up to the constant determined by the value of the log marginal likelihood.
In addition, the logPosteriorPhiDSGEVAR function also supports the optional logLike output
variable. It is equal to logLikeValue when all computations could be carried out successfully,
and to NaN otherwise.
15.9.2.2. logPosteriorThetaDSGEVAR
The function logPosteriorThetaDSGEVAR computes the log marginal posterior for the original
parameters. Like its counterpart logPosteriorPhiDSGEVAR it needs 22 input variables and, with
the exception of phi and UniformBounds, they are identical. Specifically, these two variables are
replaced with theta and ParameterBounds. The first is the vector of original parameters, while
the second is a matrix with the lower and upper bounds for the parameters in the columns.
The output variables are identical to those provided by logPosteriorPhiDSGEVAR.
15.9.2.3. logLikelihoodDSGEVAR
The function logLikelihoodDSGEVAR computes the value of the log of the marginal likelihood in equation (15.24) for finite values of $\lambda$. To accomplish this it needs 15 input variables: ModelParameters, DSGEModel, AIMData, lambda, T, n, p, npk, GammaHatyy, GammaHatyY,
GammaHatYY, DetProductMoments, HSample, logGPR, and OrderQZ. These variables have already
been discussed above as well as in Section 15.9.1 and in Section 7.4.
The function provides 3 output variables. The first is logLikeValue, the value of the log
marginal likelihood at the given value of $\theta$, the DSGE model parameters. Next, it provides mcode to
indicate if the DSGE model has a unique convergent solution at $\theta$. Finally, it gives PriorStatus,
a boolean variable that is 1 if the log-likelihood could be calculated and 0 otherwise.
15.9.2.4. logLikelihoodDSGEVARInf
The function logLikelihoodDSGEVARInf computes the value of the log marginal likelihood in
equation (15.26) for $\lambda = \infty$. The function needs 12 input variables to achieve this, namely the
same variables as logLikelihoodDSGEVAR except lambda, npk, and logGPR. The output variables
are the same as logLikelihoodDSGEVAR.
15.9.2.5. logConcPosteriorPhiDSGEVAR
The function logConcPosteriorPhiDSGEVAR computes the value of the concentrated log posterior for the transformed parameters using the concentrated likelihood in (15.34). The function
takes the same 22 input variables as logPosteriorPhiDSGEVAR, except for logGPR being replaced with logCLC, the log of the constant term for the concentrated likelihood. The latter
variable is determined by DSGEVARLogConcLikelihoodConstant, which requires the input variables
lambda, T, n, and npk.
The function only provides one output variable, logPost, i.e., minus the height of the log
of the concentrated posterior density of the transformed parameters up to the constant determined by the value of the log marginal likelihood. At the posterior mode of the DSGE model
parameters, this value is equal to minus the height of the log of the joint posterior density up to
the constant determined by the value of the log marginal likelihood.
15.9.2.6. logConcPosteriorThetaDSGEVAR
The function logConcPosteriorThetaDSGEVAR calculates the value of the concentrated log posterior for the original parameters based on the expression on the right hand side of equation
(15.36). The function takes the same 22 input variables as logPosteriorThetaDSGEVAR, except
for logGPR being replaced with logCLC.
The function only provides one output variable, logPost, i.e., minus the height of the log of
the concentrated posterior density of the original parameters up to the constant determined by
the value of the log marginal likelihood.
15.9.2.7. logConcLikelihoodDSGEVAR
The function logConcLikelihoodDSGEVAR calculates the value of the concentrated log-likelihood in
(15.34) for finite $\lambda$. It makes use of the same 15 input variables as logLikelihoodDSGEVAR, except for logGPR being replaced with logCLC. The function also gives the same 3 output variables
as logLikelihoodDSGEVAR.
15.9.2.8. Additional Density Function
When computing the marginal or joint posterior mode with Marco Ratto's newrat, the YADA
implementation needs log posteriors and log-likelihoods that have names ending with 4Time
and which provide as a second output variable the time t values of the respective function. For
the log posteriors, these functions have exactly the same input variables as their counterparts,
e.g., logPosteriorPhiDSGEVAR4Time has the same input variables as logPosteriorPhiDSGEVAR.
For the log-likelihood functions, they also require the input variables y and Y with the actual
data on the endogenous and the deterministic and lagged endogenous variables, respectively.
To compute the time t log-likelihood based on the marginal likelihood in (15.24) the code
utilizes a very simple approach. Namely, it takes into account that the time t likelihood is equal to the
ratio of the likelihood for a sample of t observations and the likelihood for a sample of $t-1$ observations. The log-likelihood for
period t can thus be computed recursively once the full sample value has been determined. It
should be noted that when t becomes too small for the multivariate gamma function terms in (15.24) to be defined, the time $t-1$ value of the log-likelihood
can no longer be computed, since the gamma function is only defined for positive values; see
Section 4.2.2. Furthermore, the same approach is used when time t values for the concentrated
likelihood function are computed based on equation (15.34).
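The differencing idea can be illustrated with a short Python sketch. It is not YADA code. It assumes a hypothetical callable `cum_loglik(t)` that returns the log-likelihood of the first t observations, for example (15.24) evaluated on a sample of length t. It also assumes the caller supplies `t_min`, the smallest t for which that value is defined. Periods that cannot be formed are returned as NaN.

```python
import math
from typing import Callable, List


def time_t_loglik(cum_loglik: Callable[[int], float], T: int, t_min: int) -> List[float]:
    """Per-period log-likelihood values obtained by differencing cumulative values.

    cum_loglik(t) is a hypothetical function returning ln L for the first t
    observations; it is only evaluated for t >= t_min, where it is assumed to
    be well defined. Periods whose value cannot be formed are returned as NaN.
    """
    cum = {t: cum_loglik(t) for t in range(t_min, T + 1)}
    out = []
    for t in range(1, T + 1):
        if t - 1 >= t_min:
            # ln L_t = ln L(first t obs) - ln L(first t-1 obs)
            out.append(cum[t] - cum[t - 1])
        else:
            out.append(float("nan"))
    return out


if __name__ == "__main__":
    # Toy check with i.i.d. standard normal data, where each term is known.
    data = [0.3, -1.2, 0.5, 2.0]

    def logpdf(v: float) -> float:
        return -0.5 * math.log(2 * math.pi) - 0.5 * v * v

    def cum(t: int) -> float:
        return sum(logpdf(v) for v in data[:t])

    print(time_t_loglik(cum, len(data), 0))
```

In the toy case each differenced value equals the log density of the corresponding observation. That is the property the newrat-style `4Time` functions rely on.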
15.9.3. Estimation Functions
15.9.3.1. DSGEVARMargPosteriorModeEstimation
The function DSGEVARMargPosteriorModeEstimation can estimate the marginal posterior mode
of either the transformed DSGE model parameters or the original parameters through the lens
of the DSGE-VAR for each $\lambda$ value that is determined in the data construction file; see Section 17.5.4. The function therefore works in much the same way as PosteriorModeEstimation
which deals with posterior mode estimation of these parameters for the DSGE model; see Section 7.4. The input variables of these two functions are identical.
In contrast to PosteriorModeEstimation, the DSGE-VAR marginal posterior mode estimation
routine provides one output variable. Namely, the vector MarginalLambda which holds the
positions in the vector Lambda of the values the user selected to use. This makes it possible to
consider different $\lambda$ values for the marginal and the joint posterior mode estimation. Moreover,
it allows the user to estimate the posterior mode in suitable batches.
Apart from computing the posterior mode of $\phi$ or $\theta$ following the procedure laid out in the posterior
mode estimation function for the DSGE model, the function plugs these values into the posterior
mode expressions for the VAR parameters in equations (15.31) and (15.32). It then compares the
log marginal likelihood across the $\lambda$ values to determine which $\lambda$ gives the DSGE-VAR
the largest posterior probability. The marginal likelihood is here computed using the Laplace
approximation in (10.1), where $\ln L(Y; g^{-1}(\phi))$ is replaced with $\ln L(y|Y_1, g^{-1}(\phi), \lambda)$ for each
$\lambda$. These log marginal likelihood values are thereafter plotted and, if available, compared
with the log marginal likelihood value for the DSGE model. Again, the Laplace approximation
is used, and this value is available if the posterior mode estimation has been completed for the
DSGE model.
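A minimal Python sketch of the Laplace-based comparison across $\lambda$ follows. Equation (10.1) is not reproduced in this section, so the sketch assumes the standard Laplace form. That form adds the log-likelihood and log prior at the mode, $(d/2)\ln 2\pi$, and half the log determinant of the inverse Hessian. For transformed parameters the log prior is assumed to already include any Jacobian term. The container `mode_results` is a hypothetical input, not a YADA structure.

```python
import math
import numpy as np


def laplace_log_marglik(log_lik_mode, log_prior_mode, inv_hessian):
    """Laplace approximation of the log marginal likelihood.

    ln p(Y) ~= ln L(mode) + ln p(mode) + (d/2) ln(2*pi) + 0.5 * ln det(Sigma),
    where Sigma is the inverse Hessian of minus the log posterior kernel at
    the mode and d is the number of estimated parameters.
    """
    sigma = np.atleast_2d(np.asarray(inv_hessian, dtype=float))
    d = sigma.shape[0]
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("inverse Hessian must be positive definite")
    return log_lik_mode + log_prior_mode + 0.5 * d * math.log(2 * math.pi) + 0.5 * logdet


def best_lambda(mode_results):
    """Pick the lambda with the largest Laplace log marginal likelihood.

    mode_results maps lambda -> (log_lik_mode, log_prior_mode, inv_hessian).
    With equal prior probabilities on the lambda grid this is also the lambda
    with the largest posterior probability.
    """
    scores = {lam: laplace_log_marglik(*res) for lam, res in mode_results.items()}
    return max(scores, key=scores.get), scores
```

For a posterior kernel that is exactly Gaussian, the approximation is exact. This makes a convenient sanity check: a scalar normal likelihood with a normal prior reproduces the closed-form marginal likelihood.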
15.9.3.2. DSGEVARPosteriorModeEstimation
The function DSGEVARPosteriorModeEstimation estimates the joint posterior mode of the VAR
parameters ($\Phi$ and $\Sigma_\varepsilon$) and either the transformed DSGE model parameters ($\phi$) or the original
parameters ($\theta$). It operates in the same manner as the function that computes the marginal
posterior mode in DSGEVARMargPosteriorModeEstimation. The input variables are the same
and the type of output that is produced is similar, except that it refers to the joint posterior
mode. In particular, the output variable is given by JointLambda, a vector with the positions in
the vector Lambda that the user selected to use for joint posterior mode estimation.
15.9.3.3. DSGEVARJointPosteriorInvHessian
The function DSGEVARJointPosteriorInvHessian attempts to compute the inverse Hessian at
the posterior mode of the joint posterior distribution of all the DSGE-VAR parameters using a
combination of analytical results and numerical approximations; see Section 15.7. The function
takes the same 21 input variables as logConcPosteriorPhiDSGEVAR, except for logCLC being
replaced with StepLength. The latter variable determines the step length when computing numerical approximations of the matrices with second partial derivatives in equation (15.45).108
15.9.4. Sampling Functions
15.9.4.1. DSGEVARPriorSampling
The function DSGEVARPriorSampling computes a sample of draws from the prior distribution of the DSGE ($\theta$) and VAR ($\Phi$, $\Sigma_\varepsilon$) parameters. To achieve its objective it requires 9 input variables: theta, thetaPositions, ModelParameters, PriorDist, thetaDist, LowerBound,
NumPriorDraws, DSGEModel, and CurrINI. All these variables are familiar with the exception of
NumPriorDraws, which is an integer determining the total number of draws from the joint prior
distribution $p(\Phi, \Sigma_\varepsilon, \theta|\lambda)$. The function computes the prior draws for all $\lambda$ and saves them
to disk.
15.9.4.2. DSGEVARRWMPosteriorSampling
The function DSGEVARRWMPosteriorSampling operates in much the same way as the corresponding RWM sampling function for the DSGE model; see Section 8.5.2. In addition to the
input variables accepted by DSGERWMPosteriorSampling, the DSGE-VAR version requires the
maingui input variable. This variable is the handle to the YADA dialog, typically taking the
value 1. Posterior sampling of the DSGE model parameters ($\theta$) through the lens of the VAR
model is always performed for a certain value of the hyperparameter $\lambda$. When determining the
parameters of the proposal density for the RWM algorithm, the function checks which posterior
mode results exist on disk. The function can make use of the posterior mode results for the
DSGE model, the marginal or the joint posterior mode results for the DSGE-VAR.
The function can compute the marginal likelihood for the DSGE-VAR using either the modified harmonic mean (Section 10.2) or the Chib and Jeliazkov marginal likelihood identity based
estimator (Section 10.3). In addition, the function can compute the conditional marginal likelihood of the DSGE model when draws from the posterior distribution of the DSGE model exist
on disk, by using a training sample with minimum length p.
The calculation of the conditional marginal likelihood for the DSGE model is performed
by the function CondMargLikeModifiedHarmonic for the modified harmonic mean estimator,
and by CondMargLikeChibJeliazkov for the Chib and Jeliazkov estimator. In addition, when
the latter estimator is used to compute the marginal likelihood for the DSGE-VAR, the function MargLikeChibJeliazkovDSGEVAR is employed. This function is identical to the function
employed by the DSGE model (MargLikeChibJeliazkov) except that it calls the log posterior
of the DSGE-VAR (logPosteriorPhiDSGEVAR) instead of the log posterior of the DSGE model
(logPosteriorPhiDSGE) to evaluate the function; see the denominator in equation (10.19).
15.9.4.3. DSGEVARSlicePosteriorSampling
The function DSGEVARSlicePosteriorSampling performs posterior sampling with the slice sampling algorithm. It uses the same input variables as DSGEVARRWMPosteriorSampling and behaves in essentially the same way as the RWM function except for the actual sampling part.
108
The matrices with second partial derivatives are computed as in Abramowitz and Stegun (1964, equations 25.3.24
and 25.3.27, p. 884). This classic book is also hosted online: see, for instance, the homepage of Colin Macdonald at
the University of Oxford.
Furthermore, while the RWM keeps track of the acceptance rate, the slice sampler counts the number of times the log posterior is evaluated. Moreover, when the posterior draws are obtained
through the slice sampler, YADA will only compute the log marginal likelihood of the DSGE-VAR
as well as of the DSGE model with the modified harmonic mean estimator.
15.9.4.4. DSGEVARPosteriorSampling
The function DSGEVARPosteriorSampling performs posterior sampling of the VAR parameters.
It takes the same input variables as the posterior samplers for the DSGE model parameters and
computes draws from the conditional posteriors of $\Phi$ and $\Sigma_\varepsilon$; cf. equations (15.18) and (15.19)
in Section 15.4. Since the dimensions of the VAR parameter matrices can be huge, posterior
sampling of the VAR parameters can be based on only a fraction of the posterior draws. The size of the subset
is determined by the 'percentage use of posterior draws for impulse responses, etc' option on
the 'Posterior Sampling' frame on the 'Options' tab; see Figure 4.
$$\eta_t = A_0^{-1}\Bigl(y_t - \Phi_0 x_t - \sum_{j=1}^{p}\Phi_j y_{t-j}\Bigr), \quad t = 1,\ldots,T, \qquad (16.1)$$
where $A_0^{-1}$ … $^{1/2}$. These estimates may be compared either to the update estimates of the
shocks in $\eta_t$ that are included in $\xi_t$ or to the smoothed estimates.
(16.2)
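$$\Psi = \begin{bmatrix} \Phi_1 & \cdots & \Phi_{p-1} & \Phi_p \\ I_n & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & I_n & 0 \end{bmatrix}. \qquad (16.3)$$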
Suppose that $\eta_t = e_j$ and zero thereafter, with $e_j$ being the j:th column of $I_n$. That is, we
shall consider a one standard deviation impulse for the j:th structural shock. From equation
(16.2) it now follows that the responses of the endogenous variables are:
$$\operatorname{resp}(y_{t+h} \mid \eta_t = e_j) = J_p'\Psi^h J_p A_0 e_j, \quad h \ge 0. \qquad (16.4)$$
Provided that yt is covariance stationary, the responses of the endogenous variables tend to
zero as the response horizon h increases.
When some of the endogenous variables are expressed in first differences, we can also compute the levels responses of all observed variables. As in Section 11.16.2, let $C$ be an $n \times n$
diagonal matrix with 0 (1) in diagonal element i if endogenous variable i is measured in levels
(first differences). This means that the levels responses to the shock $\eta_t = e_j$ are given by:
$$\operatorname{resp}(y^L_{t+h} \mid \eta_t = e_j) = C\,\operatorname{resp}(y^L_{t+h-1} \mid \eta_t = e_j) + J_p'\Psi^h J_p A_0 e_j, \quad h \ge 1, \qquad (16.5)$$
with $\operatorname{resp}(y^L_t \mid \eta_t = e_j) = A_0 e_j$. The matrix $C$ acts as an indicator for the variables where an
accumulation of the responses in (16.4) gives the responses in the levels.
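The response recursions (16.4) and (16.5) can be sketched in Python with numpy. The sketch assumes that $\Psi$ is the companion matrix in (16.3) and that $J_p = [I_n\ 0 \cdots 0]'$ selects $y_t$ from the stacked vector. `c_diag` is a hypothetical argument holding the diagonal of $C$ (1 for first-differenced variables, 0 for variables in levels).

```python
import numpy as np


def companion(phis):
    """Companion matrix Psi for VAR coefficients [Phi_1, ..., Phi_p] (each n x n)."""
    n, p = phis[0].shape[0], len(phis)
    psi = np.zeros((n * p, n * p))
    psi[:n, :] = np.hstack(phis)
    if p > 1:
        psi[n:, :-n] = np.eye(n * (p - 1))
    return psi


def selection(n, p):
    """J_p = [I_n 0 ... 0]', so that J_p' Y_t = y_t."""
    jp = np.zeros((n * p, n))
    jp[:n, :] = np.eye(n)
    return jp


def impulse_responses(phis, a0, j, horizon):
    """Rows h = 0..horizon hold J_p' Psi^h J_p A0 e_j, as in (16.4)."""
    n, p = a0.shape[0], len(phis)
    psi, jp = companion(phis), selection(n, p)
    state = jp @ a0[:, j]            # Psi^0 J_p A0 e_j
    out = []
    for _ in range(horizon + 1):
        out.append(jp.T @ state)
        state = psi @ state
    return np.array(out)


def levels_responses(resp, c_diag):
    """Levels responses via (16.5): L_0 = resp_0 and L_h = C L_{h-1} + resp_h.

    c_diag[i] is 1 if variable i is in first differences and 0 if in levels.
    """
    c = np.diag(np.asarray(c_diag, dtype=float))
    levels = np.empty_like(resp)
    levels[0] = resp[0]
    for h in range(1, resp.shape[0]):
        levels[h] = c @ levels[h - 1] + resp[h]
    return levels
```

When every variable is in first differences, the levels responses are the cumulative sums of (16.4). When every variable is in levels, they coincide with (16.4).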
16.3. Forecast Error Variance Decompositions
Forecast error variance decomposition in DSGE-VAR models can be performed just like in any
other VAR model. From Section 11.6 we know that the conditional variance decompositions of
the DSGE model work in a similar way. Let
$$R_i = J_p'\Psi^i J_p A_0$$
be the $n \times n$ matrix with all impulse responses in $y$ for period $i$. The $h$-step ahead forecast error
variance decomposition can then be expressed as the $n \times n$ matrix $v_h$ given in equation (11.33) with
$q = n$.
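Equation (11.33) is not reproduced in this section. The Python sketch below therefore assumes the usual VAR form of the decomposition: the structural shocks have unit variance, so the share of shock j in the h-step forecast error variance of variable k is $\sum_{i<h} R_i[k,j]^2$ divided by the row total. The input list `resp_mats` holding $R_0, R_1, \ldots$ is hypothetical.

```python
import numpy as np


def fevd(resp_mats, h):
    """h-step forecast error variance decomposition.

    resp_mats[i] is the n x n matrix R_i of responses in period i (rows:
    variables, columns: unit-variance structural shocks). Element (k, j) of
    the result is the share of the h-step forecast error variance of
    variable k that is due to shock j; each row sums to one.
    """
    r = np.asarray(resp_mats[:h], dtype=float)
    contrib = (r ** 2).sum(axis=0)                  # sum_{i<h} R_i[k, j]^2
    total = contrib.sum(axis=1, keepdims=True)      # h-step FEV of variable k
    return contrib / total
```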
The variance decompositions for the levels (or the accumulation of the observed variables)
can also be computed as in Section 11.6. Moreover, the long-run forecast error variance decomposition for the levels is given by equation (11.39) with
$$R_{lr} = \sum_{i=0}^{\infty} J_p'\Psi^i J_p A_0 = J_p'\bigl(I_{np} - \Psi\bigr)^{-1} J_p A_0.$$
The endogenous variables can thus be decomposed into three terms given by (i) the deterministic variables, (ii) the impact of the initial value (Y0 ), and (iii) the structural shocks. This is
analogous to the historical observed variable decomposition in (11.49); cf. Section 11.8.
When $x_t = 1$ for all $t$, the above can be developed further. Specifically, we then know that
the VAR based population mean of $y_t$ conditional on the parameters is given by
$$\mu_y = J_p'\bigl(I_{np} - \Psi\bigr)^{-1} J_p \Phi_0. \qquad (16.7)$$
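Both the population mean in (16.7) and the long-run response matrix $R_{lr}$ involve $(I_{np}-\Psi)^{-1}$. The minimal Python sketch below computes both under the assumption of a stable VAR with $x_t = 1$. Here $\Phi_0$ is passed as an $n$-vector, and $\Psi$ and $J_p$ are built as in (16.3).

```python
import numpy as np


def companion(phis):
    """Companion matrix Psi for VAR coefficients [Phi_1, ..., Phi_p]."""
    n, p = phis[0].shape[0], len(phis)
    psi = np.zeros((n * p, n * p))
    psi[:n, :] = np.hstack(phis)
    if p > 1:
        psi[n:, :-n] = np.eye(n * (p - 1))
    return psi


def mean_and_long_run(phi0, phis, a0):
    """Return mu_y = J_p'(I - Psi)^{-1} J_p Phi_0 (with x_t = 1) and
    R_lr = J_p'(I - Psi)^{-1} J_p A_0. Requires a stable VAR."""
    n, p = a0.shape[0], len(phis)
    psi = companion(phis)
    jp = np.zeros((n * p, n))
    jp[:n, :] = np.eye(n)
    inv_term = np.linalg.inv(np.eye(n * p) - psi)
    mu = jp.T @ inv_term @ jp @ phi0
    r_lr = jp.T @ inv_term @ jp @ a0
    return mu, r_lr
```

For a univariate VAR(2) with constant $c$, this reduces to the familiar $c/(1-\phi_1-\phi_2)$.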
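The decomposition in (16.6) can now be expressed as
$$y_t = \mu_y + J_p'\Psi^t\bigl(Y_0 - (\imath_p \otimes I_n)\mu_y\bigr) + \sum_{i=0}^{t-1} J_p'\Psi^i J_p A_0 \eta_{t-i}, \quad t = 1,\ldots,T, \qquad (16.8)$$
where $\imath_p$ is a $p \times 1$ vector with ones. The historical decomposition of the endogenous variables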
is now given by (i) its population mean, (ii) the initial value in deviation from its population
mean, and (iii) the structural shocks.
The decompositions can be generalized into decompositions for all possible subsamples $\{t_0+1, \ldots, T\}$, where $t_0 = 0, 1, \ldots, T-1$. For an arbitrary point of initialization, $t_0$, the decomposition
in (16.6) gives us
$$y_t = J_p'\Psi^{t-t_0} Y_{t_0} + \sum_{i=0}^{t-t_0-1} J_p'\Psi^i J_p \Phi_0 x_{t-i} + \sum_{i=0}^{t-t_0-1} J_p'\Psi^i J_p A_0 \eta_{t-i}, \quad t = t_0+1,\ldots,T. \qquad (16.9)$$
We now find that the endogenous variables are decomposed into (i) deterministic variables,
(ii) the history of the endogenous variables until period t0 , and (iii) the structural shocks from
period t0 until period t.
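$$\Sigma_Z = \begin{bmatrix} \Sigma_y(0) & \Sigma_y(1) & \cdots & \Sigma_y(p-1) \\ \Sigma_y(1)' & \Sigma_y(0) & \cdots & \Sigma_y(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_y(p-1)' & \Sigma_y(p-2)' & \cdots & \Sigma_y(0) \end{bmatrix}.$$
It follows from (16.11) that this covariance matrix satisfies the Lyapunov equation
$$\Sigma_Z = \Psi \Sigma_Z \Psi' + J_p \Sigma_\varepsilon J_p'. \qquad (16.12)$$
From this expression we can determine $\Sigma_y(j)$ for $j = 0, \ldots, p-1$. For all other autocovariances
we have that
$$\Sigma_y(j) = \sum_{i=1}^{p} \Phi_i \Sigma_y(j-i), \quad j = p, p+1, \ldots. \qquad (16.13)$$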
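One way to solve (16.12) and then apply (16.13) is sketched below in Python. It uses the vectorized identity $\operatorname{vec}(\Sigma_Z) = (I - \Psi\otimes\Psi)^{-1}\operatorname{vec}(J_p\Sigma_\varepsilon J_p')$, which is a standard approach but not necessarily the one YADA uses; for large $np$ a dedicated Lyapunov solver would be cheaper. The sketch takes $\Sigma_y(j) = E[y_t y_{t-j}']$, so that lags $0,\ldots,p-1$ sit in the first block row of $\Sigma_Z$.

```python
import numpy as np


def companion(phis):
    """Companion matrix Psi for VAR coefficients [Phi_1, ..., Phi_p]."""
    n, p = phis[0].shape[0], len(phis)
    psi = np.zeros((n * p, n * p))
    psi[:n, :] = np.hstack(phis)
    if p > 1:
        psi[n:, :-n] = np.eye(n * (p - 1))
    return psi


def autocovariances(phis, sigma_eps, max_lag):
    """Population autocovariances Sigma_y(j) = E[y_t y_{t-j}'] of a stable VAR(p).

    Sigma_Z solves Sigma_Z = Psi Sigma_Z Psi' + J_p Sigma_eps J_p' (16.12);
    lags 0..p-1 are read off its first block row and higher lags follow from
    Sigma_y(j) = sum_i Phi_i Sigma_y(j-i) (16.13).
    """
    n, p = sigma_eps.shape[0], len(phis)
    m = n * p
    psi = companion(phis)
    jp = np.zeros((m, n))
    jp[:n, :] = np.eye(n)
    q = jp @ sigma_eps @ jp.T
    # vec(Psi X Psi') = (Psi kron Psi) vec(X) holds for row- and column-major vec
    vec_z = np.linalg.solve(np.eye(m * m) - np.kron(psi, psi), q.reshape(-1))
    sigma_z = vec_z.reshape(m, m)
    gammas = [sigma_z[:n, j * n:(j + 1) * n] for j in range(p)]
    for j in range(p, max_lag + 1):
        gammas.append(sum(phis[i - 1] @ gammas[j - i] for i in range(1, p + 1)))
    return sigma_z, gammas[: max_lag + 1]
```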
Rather than calculating population based autocovariances, we can instead simulate data from
the DSGE-VAR and compute sample based estimates of the autocovariances from this data.
Since $\varepsilon_t \sim N(0, \Sigma_\varepsilon)$, we simulate a path for the endogenous variables by drawing T values for
$\varepsilon_t^s$ from its distribution and letting
$$Z_t^s = \Psi Z_{t-1}^s + J_p \varepsilon_t^s, \quad t = 1,\ldots,T.$$
In case $x_t = 1$ we may draw $Z_0^s$ from $N(0, \Sigma_Z)$, while a time varying mean can be handled by
simply letting $Z_0^s = Z_0$. Finally, we use the relationship between $y_t$ and $Z_t$ such that
$$y_t^s = \mu_{y,t} + J_p' Z_t^s, \quad t = 1,\ldots,T.$$
The autocovariances can now be estimated directly from the simulated data, taking only the
deterministic variables $x_t$ into account. By repeating this S times, we obtain a sample based
distribution of the autocovariances for a given $(\Phi, \Sigma_\varepsilon)$.
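A minimal Python sketch of one simulation replication follows. It assumes $x_t = 1$ and that $\Sigma_Z$ is available from (16.12). Because the only deterministic variable is a constant, "taking $x_t$ into account" amounts to demeaning, so the population mean is never added. Calling the function S times with the same generator gives the sample-based distribution described above.

```python
import numpy as np


def companion(phis):
    """Companion matrix Psi for VAR coefficients [Phi_1, ..., Phi_p]."""
    n, p = phis[0].shape[0], len(phis)
    psi = np.zeros((n * p, n * p))
    psi[:n, :] = np.hstack(phis)
    if p > 1:
        psi[n:, :-n] = np.eye(n * (p - 1))
    return psi


def simulated_autocovariances(phis, sigma_eps, sigma_z, T, max_lag, rng):
    """Sample autocovariances from one simulated path of a stable VAR(p).

    Assumes x_t = 1: Z_0 is drawn from N(0, Sigma_Z), the path
    Z_t = Psi Z_{t-1} + J_p eps_t is simulated, and the constant is handled
    by demeaning y_t = J_p' Z_t (the population mean mu_y cancels).
    """
    n, p = sigma_eps.shape[0], len(phis)
    psi = companion(phis)
    jp = np.zeros((n * p, n))
    jp[:n, :] = np.eye(n)
    chol = np.linalg.cholesky(sigma_eps)
    z = rng.multivariate_normal(np.zeros(n * p), sigma_z)
    ys = np.empty((T, n))
    for t in range(T):
        z = psi @ z + jp @ (chol @ rng.standard_normal(n))
        ys[t] = jp.T @ z
    ys -= ys.mean(axis=0)                      # regress on the constant x_t = 1
    return [ys[j:].T @ ys[: T - j] / T for j in range(max_lag + 1)]
```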
be computed from $\Sigma_Z$ in (16.12) using (16.13), where $\Sigma_\varepsilon$ has been replaced with $A_{0j}A_{0j}'$.
From these population-based conditional covariances we may compute the correlation decompositions for the DSGE-VAR using the same tools as in Section 11.7. Relative to equation
(11.46), the only change is that the (conditional) covariance matrices of the DSGE-VAR are used
instead of the (conditional) covariance matrices of the DSGE model.
Instead of computing such population-based conditional central second moments, we can
make use of simulation methods to obtain estimates of the sample moments. Since $\eta_t \sim N(0, I_n)$
we can simulate a path for $Z$ conditional on only shock j being non-zero by drawing T values
for $\eta_{j,t}^s$ from a standard normal and letting
$$Z_t^s = \Psi Z_{t-1}^s + J_p A_{0j}\eta_{j,t}^s, \quad t = 1,\ldots,T. \qquad (16.14)$$
In case $x_t = 1$ we may draw $Z_0^s$ from $N(0, \Sigma_Z^{(j)})$, while a time varying mean can be handled
by simply letting $Z_0^s = Z_0$. Like for the autocovariances, we may take the simulation one step
further. Namely, we let $y_t^s = \mu_{y,t} + J_p' Z_t^s$ and then estimate $Z_t^s$ by regressing $y_t^s$ on $x_t$
and stacking it. With $\hat Z_t^s$ denoting the estimated stacked vector, the sample estimate of the
conditional covariance matrix for simulation s is given by
$$\hat\Sigma_Z^{(j),s} = \frac{1}{T}\sum_{t=1}^{T} \hat Z_t^s \hat Z_t^{s\prime}, \quad s = 1,\ldots,S. \qquad (16.15)$$
By repeating the simulations S times we can estimate the distribution of the conditional sample
correlations for a given $(\Phi, A_0)$ for a DSGE-VAR.
16.7. Spectral Decomposition
The spectral decomposition of the DSGE model was discussed in Section 13.2. A similar decomposition can also be determined for DSGE-VARs. The population spectrum of the latter is given
by
$$s_y(\omega) = \frac{1}{2\pi}\Phi\bigl(\exp(-i\omega)\bigr)^{-1}\Sigma_\varepsilon\Phi\bigl(\exp(i\omega)\bigr)^{-1\prime}, \quad \omega \in [-\pi, \pi], \qquad (16.16)$$
where $\Phi(z) = I_n - \sum_{l=1}^{p}\Phi_l z^l$. When the structural shocks, $\eta_t$, can be identified we know that
$\Sigma_\varepsilon = A_0A_0'$. Letting $A_{0j}$ be the j:th column of $A_0$ it follows that
$$s_y(\omega) = \sum_{j=1}^{n} s_y^{(j)}(\omega) = \sum_{j=1}^{n}\frac{1}{2\pi}\Phi\bigl(\exp(-i\omega)\bigr)^{-1}A_{0j}A_{0j}'\Phi\bigl(\exp(i\omega)\bigr)^{-1\prime}. \qquad (16.17)$$
The contemporaneous population covariance matrix of the endogenous variables conditional on
the parameters satisfies equation (13.4). Let $\Sigma_y^{(j)}(0)$ be the covariance matrix of the endogenous
variables conditional on the parameters and on all shocks of the DSGE-VAR being zero except
for $\eta_j$; cf. Section 16.6. We then find that
$$\Sigma_y^{(j)}(0) = \int_{-\pi}^{\pi} s_y^{(j)}(\omega)\,d\omega. \qquad (16.18)$$
That is, the conditional contemporaneous covariance matrix of the endogenous variables based
on only $\eta_j$ being non-zero is equal to the integral of the spectral decomposition based on this
shock.
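The relationship between the spectrum and the contemporaneous covariance can be checked numerically with the Python sketch below. It evaluates (16.16) on an equally spaced periodic grid. Because the integrand is smooth and periodic, a plain Riemann sum is very accurate. The sketch then compares the integral with the Lyapunov solution for a VAR(1). The grid size `n_grid` is an arbitrary choice for illustration.

```python
import numpy as np


def spectral_density(phis, sigma, omega):
    """s_y(omega) = (1/2pi) Phi(e^{-i omega})^{-1} Sigma Phi(e^{i omega})^{-1}' as in (16.16).

    Passing A_{0j} A_{0j}' as sigma gives the shock-j term in (16.17).
    """
    n = sigma.shape[0]

    def phi_of(z):
        return np.eye(n) - sum(ph * z ** (l + 1) for l, ph in enumerate(phis))

    left = np.linalg.inv(phi_of(np.exp(-1j * omega)))
    right = np.linalg.inv(phi_of(np.exp(1j * omega)))
    return left @ sigma @ right.T / (2 * np.pi)


def integrate_spectrum(phis, sigma, n_grid=1024):
    """Integral over [-pi, pi] using an equally spaced periodic grid."""
    omegas = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
    total = sum(spectral_density(phis, sigma, w) for w in omegas) * (2 * np.pi / n_grid)
    return total.real
```

Summing the shock-specific integrals over j recovers the full covariance matrix, since $\Sigma_\varepsilon = \sum_j A_{0j}A_{0j}'$.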
16.8. Unconditional Forecasting
It is straightforward to apply the sampling the future procedure of Thompson and Miller (1986)
to a DSGE-VAR model; cf. Section 14.5. For a given draw $(\Phi, \Sigma_\varepsilon)$ from the posterior distribution
of a DSGE-VAR we first simulate residuals $\varepsilon_{T+1}, \ldots, \varepsilon_{T+h}$ from a normal distribution with mean
zero and covariance matrix $\Sigma_\varepsilon$. Next, we simulate a path for $y_{T+1}, \ldots, y_{T+h}$ by feeding the
residuals into the VAR system in equation (15.9). Repeating this P times for the given $(\Phi, \Sigma_\varepsilon)$
yields P sample paths conditional on the parameters. By taking S draws of $(\Phi, \Sigma_\varepsilon)$ from its
posterior we end up with $PS$ paths of $y_{T+1}, \ldots, y_{T+h}$ from its predictive density.
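The procedure can be sketched in Python as follows. The function names and the layout of `draws` (a list of `(phi0, phis, sigma_eps)` tuples standing in for posterior draws) are hypothetical. The VAR is written as $y_t = \Phi_0 x_t + \sum_j \Phi_j y_{t-j} + \varepsilon_t$.

```python
import numpy as np


def sample_future(y_lags, phi0, phis, sigma_eps, x_future, n_paths, rng):
    """Simulate n_paths paths y_{T+1..T+h} for one parameter draw.

    y_lags: p x n array with rows y_T, y_{T-1}, ..., y_{T-p+1}.
    phi0:   n x k matrix on the deterministic variables; x_future: h x k.
    Returns an array of shape (n_paths, h, n).
    """
    p, n = y_lags.shape
    h = x_future.shape[0]
    eps = rng.multivariate_normal(np.zeros(n), sigma_eps, size=(n_paths, h))
    paths = np.empty((n_paths, h, n))
    for s in range(n_paths):
        lags = list(y_lags)                 # lags[0] is the most recent value
        for i in range(h):
            y = phi0 @ x_future[i] + eps[s, i]
            for j, phi_j in enumerate(phis):
                y = y + phi_j @ lags[j]
            paths[s, i] = y
            lags = [y] + lags[:-1]
    return paths


def predictive_paths(draws, y_lags, x_future, n_paths, rng):
    """Stack P paths for each of the S draws (phi0, phis, sigma_eps): S*P paths in total."""
    return np.concatenate(
        [sample_future(y_lags, phi0, phis, sig, x_future, n_paths, rng) for phi0, phis, sig in draws]
    )
```

The explicit loops mirror the "feed the residuals into the VAR" description. The recursions (16.20) to (16.22) give an equivalent, loop-light alternative.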
The VAR system can be conveniently rewritten for a forecasting exercise. Starting from equation (16.9) we set $t = T+i$ and $t_0 = T$ such that
$$y_{T+i} = \sum_{j=0}^{i-1} J_p'\Psi^j J_p \Phi_0 x_{T+i-j} + J_p'\Psi^i Y_T + \sum_{j=0}^{i-1} J_p'\Psi^j J_p \varepsilon_{T+i-j}, \quad i = 1,\ldots,h. \qquad (16.19)$$
Compared with calculating forecast paths from equation (16.2) premultiplied by $J_p'$, the expression in (16.19) has the advantage that the lagged endogenous variables are fixed at the same
value for all i. At the same time the terms capturing the influence of the exogenous variables
and the innovations involve sums of weighted current and past values and therefore appear to
be more complex than those in (16.2). To simplify these two terms, let
$$\bar x_{T+i} = J_p\Phi_0 x_{T+i} + \Psi\bar x_{T+i-1}, \qquad (16.20)$$
$$\bar\varepsilon_{T+i} = J_p\varepsilon_{T+i} + \Psi\bar\varepsilon_{T+i-1}, \quad i = 1,\ldots,h, \qquad (16.21)$$
where these np-dimensional vectors are initialized through $\bar x_T = \bar\varepsilon_T = 0$. We can then express
the value of the endogenous variables at $T+i$ as
$$y_{T+i} = J_p'\bar x_{T+i} + J_p'\Psi^i Y_T + J_p'\bar\varepsilon_{T+i}, \quad i = 1,\ldots,h. \qquad (16.22)$$
This equation makes it straightforward to compute a path for the endogenous variables over the
forecast sample, since it is not necessary to loop over $j = 0, 1, \ldots, i-1$.
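The recursions (16.20) to (16.22) translate directly into code. The Python sketch below assumes $Y_T = [y_T' \cdots y_{T-p+1}']'$ and given paths for $x_{T+i}$ and $\varepsilon_{T+i}$. The accompanying check confirms that it reproduces a direct simulation of the VAR.

```python
import numpy as np


def companion(phis):
    """Companion matrix Psi for VAR coefficients [Phi_1, ..., Phi_p]."""
    n, p = phis[0].shape[0], len(phis)
    psi = np.zeros((n * p, n * p))
    psi[:n, :] = np.hstack(phis)
    if p > 1:
        psi[n:, :-n] = np.eye(n * (p - 1))
    return psi


def forecast_path(phi0, phis, y_stack, x_future, eps_future):
    """Evaluate (16.22) using the recursions (16.20)-(16.21).

    y_stack is Y_T = [y_T', ..., y_{T-p+1}']'; x_future (h x k) and
    eps_future (h x n) hold x_{T+i} and eps_{T+i} for i = 1..h.
    """
    n, p = phi0.shape[0], len(phis)
    psi = companion(phis)
    jp = np.zeros((n * p, n))
    jp[:n, :] = np.eye(n)
    x_bar = np.zeros(n * p)                 # x_bar_T = 0
    e_bar = np.zeros(n * p)                 # eps_bar_T = 0
    psi_y = np.asarray(y_stack, dtype=float)
    out = []
    for x_i, e_i in zip(x_future, eps_future):
        x_bar = jp @ (phi0 @ x_i) + psi @ x_bar
        e_bar = jp @ e_i + psi @ e_bar
        psi_y = psi @ psi_y                 # Psi^i Y_T
        out.append(jp.T @ (x_bar + psi_y + e_bar))
    return np.array(out)
```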
We can decompose the prediction uncertainty for the DSGE-VAR into two components, residual or shock uncertainty and parameter uncertainty. That is,
$$C(y_{T+i} \mid \mathcal{T}_T) = E_T\bigl[C(y_{T+i} \mid \mathcal{T}_T; \Phi, \Sigma_\varepsilon)\bigr] + C_T\bigl[E(y_{T+i} \mid \mathcal{T}_T; \Phi, \Sigma_\varepsilon)\bigr], \qquad (16.23)$$
where $E_T$ and $C_T$ denote the expectation and covariance with respect to the posterior of
$(\Phi, \Sigma_\varepsilon)$ at time T and where, for notational simplicity, the sequence of exogenous variables
$x_{T+1}, \ldots, x_{T+h}$ has been suppressed from the expressions.
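To develop a simple expression for the first term on the right hand side of (16.23), let $\bar\Sigma_Y^{(i)}$ be
defined from the difference equation
$$\bar\Sigma_Y^{(i)} = J_p\Sigma_\varepsilon J_p' + \Psi\bar\Sigma_Y^{(i-1)}\Psi', \quad i = 1,\ldots,h, \qquad (16.24)$$
with $\bar\Sigma_Y^{(0)} = 0$, so that
$$C(y_{T+i} \mid \mathcal{T}_T; \Phi, \Sigma_\varepsilon) = J_p'\bar\Sigma_Y^{(i)}J_p = \sum_{j=0}^{i-1} J_p'\Psi^j J_p\Sigma_\varepsilon J_p'\Psi^{j\prime}J_p, \quad i = 1,\ldots,h, \qquad (16.25)$$
while from (16.22) we find that the parameter uncertainty term is given by:
$$C_T\bigl[E(y_{T+i} \mid \mathcal{T}_T; \Phi, \Sigma_\varepsilon)\bigr] = C_T\bigl[J_p'\bar x_{T+i} + J_p'\Psi^i Y_T\bigr], \quad i = 1,\ldots,h. \qquad (16.26)$$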
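The residual-uncertainty term can be computed with the recursion (16.24), as in the Python sketch below. The accompanying check confirms that the recursion agrees with the explicit sum in (16.25). The function name is illustrative only.

```python
import numpy as np


def companion(phis):
    """Companion matrix Psi for VAR coefficients [Phi_1, ..., Phi_p]."""
    n, p = phis[0].shape[0], len(phis)
    psi = np.zeros((n * p, n * p))
    psi[:n, :] = np.hstack(phis)
    if p > 1:
        psi[n:, :-n] = np.eye(n * (p - 1))
    return psi


def forecast_error_covariances(phis, sigma_eps, horizon):
    """J_p' Sigma_bar_Y^(i) J_p for i = 1..horizon via the recursion (16.24)."""
    n, p = sigma_eps.shape[0], len(phis)
    psi = companion(phis)
    jp = np.zeros((n * p, n))
    jp[:n, :] = np.eye(n)
    s_bar = np.zeros((n * p, n * p))        # Sigma_bar_Y^(0) = 0
    out = []
    for _ in range(horizon):
        s_bar = jp @ sigma_eps @ jp.T + psi @ s_bar @ psi.T
        out.append(jp.T @ s_bar @ jp)
    return out
```

Averaging these matrices over posterior draws of $(\Phi, \Sigma_\varepsilon)$ gives the first term in (16.23).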
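$(M_\perp\ M)$ is therefore $n \times n$ and has full rank n. With $\bar M_\perp = M_\perp(M_\perp'M_\perp)^{-1}$ and $\bar M = M(M'M)^{-1}$ this
means that
$$\eta_t = \bar M_\perp\eta_t^{(n-q_m)} + \bar M\eta_t^{(q_m)}, \qquad (16.27)$$
where $\eta_t^{(n-q_m)} = M_\perp'\eta_t$ is the $n-q_m$ dimensional vector with the free shocks, while $\eta_t^{(q_m)} = M'\eta_t$
is the $q_m$ dimensional vector with the manipulated (or controlled) shocks. The VAR model
in (16.19) can therefore be rewritten as
$$y_{T+i} = \Phi_0x_{T+i} + J_p'\Psi Y_{T+i-1} + A_0\bar M_\perp\eta_{T+i}^{(n-q_m)} + A_0\bar M\eta_{T+i}^{(q_m)}, \quad i = 1,\ldots,g, \qquad (16.28)$$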
the history of the observed variables, and the shocks $\eta_{T+i}^{(n-q_m)}$ is that the $q_m \times q_m$ matrix $K_1'A_0\bar M$
has full rank. With this in mind, we find that
$$\eta_{T+i}^{(q_m)} = \bigl(K_1'A_0\bar M\bigr)^{-1}\Bigl(z_{T+i} - K_1'\Phi_0x_{T+i} - K_1'J_p'\Psi Y_{T+i-1} - \sum_{j=1}^{i-1}K_{2j}'y_{T+i-j} - u_T - K_1'A_0\bar M_\perp\eta_{T+i}^{(n-q_m)}\Bigr), \quad i = 1,\ldots,g, \qquad (16.29)$$
where the $np \times 1$ vector $Y_{T+i-1} = [y_{T+i-1}' \cdots y_{T+1}'\ y_T' \cdots y_{T+i-p}']'$, while
$$y_{T+i} = \Phi_0x_{T+i} + J_p'\Psi Y_{T+i-1} + A_0\eta_{T+i}, \quad i = 1,\ldots,g, \qquad (16.30)$$
where $\eta_{T+i}$ satisfies equation (16.27) with $\eta_{T+i}^{(q_m)}$ determined by (16.29), and where the shocks
$\eta_{T+i}^{(n-q_m)} \sim N(0, M_\perp'M_\perp)$.
For $i > g$ there are no conditioning assumptions to take into account and $\eta_{T+i} \sim N(0, I_n)$
when applying the sampling the future procedure.
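In order to derive moments from the predictive distribution of the conditional forecasts we
begin by stacking the system in (16.19) for $T+1, \ldots, T+g$, and using equation (16.20) along
with the relationship $\varepsilon_t = A_0\eta_t$. This gives us
$$\begin{bmatrix} y_{T+g} \\ \vdots \\ y_{T+1}\end{bmatrix} = \begin{bmatrix} J_p'\bar x_{T+g} \\ \vdots \\ J_p'\bar x_{T+1}\end{bmatrix} + \begin{bmatrix} J_p'\Psi^g \\ \vdots \\ J_p'\Psi\end{bmatrix} Y_T + \begin{bmatrix} I_n & J_p'\Psi J_p & \cdots & J_p'\Psi^{g-1}J_p \\ 0 & I_n & \cdots & J_p'\Psi^{g-2}J_p \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & I_n\end{bmatrix}\begin{bmatrix}\varepsilon_{T+g} \\ \varepsilon_{T+g-1} \\ \vdots \\ \varepsilon_{T+1}\end{bmatrix},$$
or
$$Y_{T+g} = X_{T+g} + GY_T + D(I_g \otimes A_0)\eta_{T+g}. \qquad (16.31)$$
Furthermore, based on equation (16.27) we can decompose the shocks such that
$$\begin{bmatrix}\eta_{T+g} \\ \vdots \\ \eta_{T+1}\end{bmatrix} = \begin{bmatrix}\bar M_\perp & & 0 \\ & \ddots & \\ 0 & & \bar M_\perp\end{bmatrix}\begin{bmatrix}\eta_{T+g}^{(n-q_m)} \\ \vdots \\ \eta_{T+1}^{(n-q_m)}\end{bmatrix} + \begin{bmatrix}\bar M & & 0 \\ & \ddots & \\ 0 & & \bar M\end{bmatrix}\begin{bmatrix}\eta_{T+g}^{(q_m)} \\ \vdots \\ \eta_{T+1}^{(q_m)}\end{bmatrix},$$
or
$$\eta_{T+g} = (I_g \otimes \bar M_\perp)\eta_{T+g}^{(n-q_m)} + (I_g \otimes \bar M)\eta_{T+g}^{(q_m)}. \qquad (16.32)$$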
The stacked conditioning assumptions are given by equation (12.15). Substituting for $Y_{T+g}$
from equation (16.31), using (16.32) and rearranging, we find that
$$K'D(I_g \otimes A_0\bar M)\eta_{T+g}^{(q_m)} = Z_{T+g} - U_T - K'\bigl(X_{T+g} + GY_T\bigr) - K'D(I_g \otimes A_0\bar M_\perp)\eta_{T+g}^{(n-q_m)}.$$
The matrix expression on the left hand side is invertible when the $q_m \times q_m$ matrix $K_1'A_0\bar M$ has
full rank. In that case we obtain a stacked version of equation (16.29) where the manipulated
shocks are given as a function of the conditioning assumptions, the exogenous variables, the
historical data on the observed variables, and the freely determined structural shocks $\eta_{T+g}^{(n-q_m)} \sim N(0, I_g \otimes M_\perp'M_\perp)$.
With these results in mind it can be shown that
q
q
1 K D Ig A0 M
nqm ,
(16.33)
m m K D Ig A 0 M
T g
T g
,T g
where
K D Ig A 0 M
1 K D Ig A0 .
Ing Ig M
D
Premultiplication of the covariance matrix in (16.35) by K or postmultiplication by K yields a
zero matrix. Hence, the conditional predictive distribution of the observed variables satisfies
the conditioning assumptions.109
For forecast horizons i beyond the conditioning horizon g it can be shown that
yT i
Jp xT i
Jp i YT
ig1
Ig A0 T g ,
Jp j Jp T ij Jp ig
(16.36)
j0
p
p
p
The mean of the predictive distribution is therefore given by
Ig A 0 M
qm .
T i Jp i YT Jp ig
E yT i |T , ZT g ; , A0 Jp x
,T g
(16.37)
C yT i |T , ZT g ; , A0
ig1
j0
j
Ig A
Jp j Jp A0 A0 Jp Jp Jp ig
0
ig
M D
Ig A
Ig M
Jp .
D
0
(16.38)
Provided that all the eigenvalues of $\Psi$ are less than unity in absolute terms, the conditional
predictions approach the mean of the observed variables as $i \to \infty$, while the covariance
matrix of the forecast error converges to the unconditional covariance matrix of the variables.
109
This can also be seen by noting that the forecast errors $Y_{T+g} - E[Y_{T+g} \mid \mathcal{T}_T, Z_{T+g}; \Phi, A_0]$ are orthogonal to $K$.
From the moments of the conditional predictive distribution for fixed parameter values, we
may determine the mean and covariance of the predictive distribution once the influence of
the model parameters has been integrated out. As in equation (12.24), the corresponding
expression for the covariance matrix of the DSGE-VAR is
$$C(y_{T+i} \mid \mathcal{T}_T, Z_{T+g}) = E_T\bigl[C(y_{T+i} \mid \mathcal{T}_T, Z_{T+g}; \Phi, A_0)\bigr] + C_T\bigl[E(y_{T+i} \mid \mathcal{T}_T, Z_{T+g}; \Phi, A_0)\bigr],$$
for $i = 1, \ldots, h$, and where $E_T$ and $C_T$ denote the expectation and covariance with respect to the
posterior of $(\Phi, A_0)$ at time T. The first and the second term on the right hand side represent
shock and parameter uncertainty, respectively.
16.9.2. Control of the Distribution of the Shocks
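The Waggoner and Zha (1999) approach can be implemented for the DSGE-VAR in the same
basic way as for the BVAR considered in Section 14.6. The conditioning assumptions, however,
are expressed as in equation (12.6) or in the stacked version (12.15).
Unlike the case when particular shocks of the DSGE-VAR are manipulated to ensure that the
conditioning assumptions are met, the Waggoner and Zha approach does not require identification of the structural shocks. Hence, we may consider a variant of the stacked system (16.31)
where $\varepsilon_t$ is used instead of $\eta_t$. That is,
$$Y_{T+g} = X_{T+g} + GY_T + D\varepsilon_{T+g}, \qquad (16.39)$$
where $\varepsilon_{T+g} = [\varepsilon_{T+g}' \cdots \varepsilon_{T+1}']'$. The restrictions that the $ng$ dimensional vector of innovations
needs to satisfy can therefore be expressed as
$$K'D\varepsilon_{T+g} = k_{T+g}, \qquad (16.40)$$
where the $q_mg \times 1$ vector
$$k_{T+g} = Z_{T+g} - U_T - K'\bigl(X_{T+g} + GY_T\bigr).$$
Like in Section 14.6 it can now be shown that if $\varepsilon_{T+g} \sim N(\mu_{\varepsilon,T+g}, \Sigma_{\varepsilon,T+g})$ then the restrictions
in (16.40) are satisfied for all values of $\varepsilon_{T+g}$. The moments of the distribution are here given by
$$\mu_{\varepsilon,T+g} = (I_g \otimes \Sigma_\varepsilon)D'K\bigl(K'D(I_g \otimes \Sigma_\varepsilon)D'K\bigr)^{-1}k_{T+g},$$
$$\Sigma_{\varepsilon,T+g} = (I_g \otimes \Sigma_\varepsilon) - (I_g \otimes \Sigma_\varepsilon)D'K\bigl(K'D(I_g \otimes \Sigma_\varepsilon)D'K\bigr)^{-1}K'D(I_g \otimes \Sigma_\varepsilon). \qquad (16.41)$$
The properties of the conditional predictive distribution can now be derived without much
ado. First of all, the mean of the distribution for fixed values of the parameters $(\Phi, \Sigma_\varepsilon)$ is
$$E[Y_{T+g} \mid \mathcal{T}_T, Z_{T+g}; \Phi, \Sigma_\varepsilon] = X_{T+g} + GY_T + D\mu_{\varepsilon,T+g}, \qquad (16.42)$$
while the covariance matrix is
$$C[Y_{T+g} \mid \mathcal{T}_T, Z_{T+g}; \Phi, \Sigma_\varepsilon] = D\Sigma_{\varepsilon,T+g}D'. \qquad (16.43)$$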
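The moments in (16.41) can be computed and checked numerically. The Python sketch below takes $D$, $K$, $\Sigma_\varepsilon$ and $k_{T+g}$ as given inputs; constructing $D$ and $K$ from (16.31) and (12.15) is left out. It then verifies the two properties that make the approach work: $K'D\mu_{\varepsilon,T+g} = k_{T+g}$, and $K'D\Sigma_{\varepsilon,T+g} = 0$, so every draw satisfies (16.40).

```python
import numpy as np


def waggoner_zha_moments(D, K, sigma_eps, k, g):
    """Mean and covariance (16.41) of the stacked innovations eps_{T+g}.

    D (ng x ng) and K (ng x q_m g) are taken as given; sigma_eps is n x n and
    k is the q_m g vector in (16.40). B = K'D is the restriction matrix.
    """
    omega = np.kron(np.eye(g), sigma_eps)           # I_g kron Sigma_eps
    B = K.T @ D
    gain = omega @ B.T @ np.linalg.inv(B @ omega @ B.T)
    mu = gain @ k
    cov = omega - gain @ B @ omega
    return mu, cov
```

The test draws generic matrices rather than a $D$ built from an estimated VAR. The identities hold for any $D$ and $K$ for which $K'D(I_g\otimes\Sigma_\varepsilon)D'K$ is invertible.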
K
Jp x
T i
Jp i YT
ig1
T g ,
Jp j Jp T ij Jp ig
j0
(16.44)
ig Jp . (16.45)
Jp j Jp Jp Jp Jp ig
C yT i |T , ZT g ; ,
,T g
j0
Given that the eigenvalues of $\Psi$ are less than unity in absolute terms, both these moments converge to the unconditional moments of the observed variables when $i \to \infty$. A decomposition
of the overall predictive uncertainty can now be obtained as in the end of Section 16.9.1. This
provides us with the overall share of the predictive uncertainty under the conditioning assumptions which is due to uncertainty about the shocks and the share which is due to the uncertainty
about the parameters.
(16.46)
(16.47)
where
$$\eta_{T+g}^{(n-q_r)} \sim N\bigl(0, I_g \otimes M_\perp'M_\perp\bigr)$$
are the free shocks over the conditioning horizon. The conditioning problem is now one of
determining the distribution of $\eta_{T+g}^{(q_r)} \mid \eta_{T+g}^{(n-q_r)}$ such that the conditioning assumptions in (12.15)
are satisfied.
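The restrictions that the shocks $\eta_{T+g}^{(q_r)}$ have to satisfy can here be expressed as
$$K'D(I_g \otimes A_0\bar M)\eta_{T+g}^{(q_r)} = k_{T+g}^{(q_r)}, \qquad (16.48)$$
where
$$k_{T+g}^{(q_r)} = Z_{T+g} - U_T - K'\bigl(X_{T+g} + GY_T + D(I_g \otimes A_0\bar M_\perp)\eta_{T+g}^{(n-q_r)}\bigr).$$
The distribution of $\eta_{T+g}^{(q_r)}$ conditional on $\eta_{T+g}^{(n-q_r)}$ can now be shown to be normal with mean and
covariance matrix given by
$$\mu_{\eta,T+g}^{(q_r)} = (I_g \otimes \bar M'A_0')D'K\bigl(K'D(I_g \otimes A_0\bar M\bar M'A_0')D'K\bigr)^{-1}k_{T+g}^{(q_r)},$$
$$\Sigma_{\eta,T+g}^{(q_r)} = I_{q_rg} - (I_g \otimes \bar M'A_0')D'K\bigl(K'D(I_g \otimes A_0\bar M\bar M'A_0')D'K\bigr)^{-1}K'D(I_g \otimes A_0\bar M). \qquad (16.49)$$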
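The mean of the conditional predictive distribution for fixed parameters is now given by
$$E[Y_{T+g} \mid \mathcal{T}_T, Z_{T+g}; \Phi, A_0] = X_{T+g} + GY_T + D(I_g \otimes A_0\bar M)\bar\mu_{\eta,T+g}^{(q_r)}, \qquad (16.50)$$
where
$$\bar\mu_{\eta,T+g}^{(q_r)} = (I_g \otimes \bar M'A_0')D'K\bigl(K'D(I_g \otimes A_0\bar M\bar M'A_0')D'K\bigr)^{-1}\bar k_{T+g}^{(q_r)},$$
$$\bar k_{T+g}^{(q_r)} = Z_{T+g} - U_T - K'\bigl(X_{T+g} + GY_T\bigr).$$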
M D
Ig M
C YT g |T , ZT g ; , D Ig A0 D
qr
Ig A0 D ,
Ig M
,T g Ig M
(16.51)
where
1 K D Ig A .
M
A D K K D Ig A M
Ing Ig M
D
0
0 M A0 D K
0
It is now straightforward to show that the predictive distribution of $Y_{T+g}$ satisfies the conditioning assumptions. Premultiplying the conditional mean in (16.50) by $K'$, we find that
the right hand side is equal to $Z_{T+g} - U_T$. Furthermore, premultiplication of the covariance in
(16.51) by $K'$ and postmultiplication by $K$ gives us a zero matrix.
In the event that $q_r = q_m$ we find that the covariance matrix $\Sigma_{\eta,T+g}^{(q_r)}$ is zero, since the $q_mg \times q_mg$
matrix $K'D(I_g \otimes A_0\bar M)$ is invertible. As a consequence, the subset of shocks $\eta_{T+g}^{(q_r)} = \mu_{\eta,T+g}^{(q_r)}$ and
T g
qm
T g
,T g
j0
qr
ig
M D
Ig M
Ig M
Jp .
Ig A0
D
,T g Ig M
(16.53)
Provided that the eigenvalues of $\Psi$ are less than unity in absolute terms, the moments in
(16.52) and (16.53) converge to the unconditional moments of the observed variables when
$i \to \infty$. A decomposition of the overall predictive uncertainty can now be obtained as in the
end of Section 16.9.1. This provides us with the overall share of the predictive uncertainty
under the conditioning assumptions which is due to uncertainty about the shocks and the share
which is due to the uncertainty about the parameters.
Provided that the manipulated shocks are modest, i.e., $\{\eta_t^{(q_m)}\}_{t=T+1}^{T+g}$ can be regarded as being
drawn from a multivariate normal distribution with mean zero and covariance matrix $M'M$, the
statistic in (16.55) is $\chi^2(n)$. Alternatively, a reference distribution for the statistic may be simulated by feeding a sequence of standard normal shocks $\tilde\eta_{T+g}$ into the statistic in (16.55), denoted
by $\mathcal{T}_{T,g}(\tilde\eta_{T+g})$, and thereafter computing the tail probability $\Pr\bigl[\mathcal{T}_{T,g}(\tilde\eta_{T+g}) \ge \mathcal{T}_{T,g}(\eta_{T+g})\bigr]$ to
determine if the conditioning assumptions are modest or not.
A set of univariate statistics based on these ideas may also be considered. Following Adolfson
et al. (2005) we here consider
$$\mathcal{T}_{T,g}^{(i)}(\eta_{T+g}) = \frac{\Delta_{T,g}^{(i)}(\eta_{T+g})}{\sqrt{e_i'J_p'\bar\Sigma_Y^{(g)}J_pe_i}}, \quad i = 1,\ldots,n, \qquad (16.56)$$
where the numerator is element i of the vector in (16.54), and $e_i$ is the i:th column of $I_n$. This
statistic has a standard normal distribution under the assumption that the manipulated shocks
are modest.
The third modesty statistic for the case of direct control of the structural shocks is the Leeper-Zha inspired statistic. All shocks are here set to zero except for the manipulated ones. This
means that we require the covariance matrix based on using only the manipulated shocks
$$\tilde\Sigma_Y^{(i)} = J_pA_0MM'A_0'J_p' + \Psi\tilde\Sigma_Y^{(i-1)}\Psi', \quad i = 1,\ldots,g,$$
,T g
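Under the Waggoner and Zha approach we replace the forecast difference in equation (16.54)
with
$$\Delta_{T,g}(\varepsilon_{T+g}) = J_p'\bar\varepsilon_{T+g},$$
where $\varepsilon_{T+g}$ is a draw from $N(\mu_{\varepsilon,T+g}, \Sigma_{\varepsilon,T+g})$. The multivariate modesty statistic in (16.55) is
therefore given by
$$\mathcal{T}_{T,g}(\varepsilon_{T+g}) = \Delta_{T,g}(\varepsilon_{T+g})'\bigl(J_p'\bar\Sigma_Y^{(g)}J_p\bigr)^{-1}\Delta_{T,g}(\varepsilon_{T+g}).$$
The univariate statistics in (16.56) are likewise computed with element i from $\Delta_{T,g}(\varepsilon_{T+g})$ rather
than this element from $\Delta_{T,g}(\eta_{T+g})$. The denominator is unaffected by the choice of conditioning
method.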
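For the Leeper-Zha approach we use element i from $\Delta_{T,g}(\mu_{\varepsilon,T+g})$ in the numerator of the test
statistic, while the covariance matrix $\bar\Sigma_Y^{(g)}$ is replaced with $\tilde\Sigma_Y^{(g)}$, where
$$\tilde\Sigma_Y^{(i)} = J_p\Sigma_\varepsilon K_1\bigl(K_1'\Sigma_\varepsilon K_1\bigr)^{-1}K_1'\Sigma_\varepsilon J_p' + \Psi\tilde\Sigma_Y^{(i-1)}\Psi', \quad i = 1,\ldots,g,$$
and where $\tilde\Sigma_Y^{(0)} = 0$.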
Finally, under the distribution for a subset of the shocks method discussed in Section 16.9.3, the multivariate and univariate modesty statistics based on the Adolfson, Laséen, Lindé, and Villani approach are calculated as in the case of direct control of the shocks, i.e., based on the structural shocks rather than on the VAR residuals. For the univariate Leeper-Zha based modesty statistic, however, the covariance matrix is calculated as under the Waggoner and Zha approach, except that $\Sigma_\epsilon$ is replaced with $A_0MM'A_0'$. When $q_r = q_m$ we find that $K_1'A_0M$ is invertible and, hence, that the Leeper-Zha covariance matrix is equal to the one determined under the direct control of the shocks method. When $q_r = q$ we likewise find that $M = I_n$ while $A_0A_0' = \Sigma_\epsilon$, so that the Leeper-Zha covariance matrix is identical to the one obtained under the Waggoner and Zha method.
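The equality claimed for $q_r = q_m$ can be checked numerically. The sketch below assumes $K_1$ is an $n \times q_r$ matrix of conditioning coefficients and $M$ an $n \times q_m$ selection matrix of manipulated shocks, and uses hypothetical function names. It compares the Waggoner and Zha style recursion with $\Sigma_\epsilon$ replaced by $A_0MM'A_0'$ against the direct control recursion.

```python
import numpy as np


def lz_cov_residual(Psi, J, S, K1, g):
    """Sigma^(i) = Psi Sigma^(i-1) Psi' + J S K1 (K1' S K1)^{-1} K1' S J', Sigma^(0) = 0."""
    P = S @ K1 @ np.linalg.solve(K1.T @ S @ K1, K1.T @ S)
    Sig = np.zeros_like(Psi)
    for _ in range(g):
        Sig = Psi @ Sig @ Psi.T + J @ P @ J.T
    return Sig


def lz_cov_direct(Psi, J, A0, M, g):
    """Sigma^(i) = Psi Sigma^(i-1) Psi' + J A0 M M' A0' J', Sigma^(0) = 0."""
    B = J @ A0 @ M
    Sig = np.zeros_like(Psi)
    for _ in range(g):
        Sig = Psi @ Sig @ Psi.T + B @ B.T
    return Sig


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, g = 3, 2, 4
    Psi = np.zeros((n * p, n * p))
    Psi[:n, :n], Psi[:n, n:], Psi[n:, :n] = 0.4 * np.eye(n), 0.1 * np.eye(n), np.eye(n)
    J = np.vstack([np.eye(n), np.zeros((n * (p - 1), n))])
    A0 = rng.standard_normal((n, n)) + 2 * np.eye(n)
    M = np.eye(n)[:, :2]                  # q_m = 2 manipulated shocks
    K1 = rng.standard_normal((n, 2))      # q_r = 2 conditioning assumptions
    mixed = lz_cov_residual(Psi, J, A0 @ M @ M.T @ A0.T, K1, g)
    print(np.max(np.abs(mixed - lz_cov_direct(Psi, J, A0, M, g))))   # close to zero
```

The check works because, with $B = K_1'A_0M$ square and invertible, $B'(BB')^{-1}B = I$, so the projection collapses to $A_0MM'A_0'$.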
16.10. The Predictive Likelihood for DSGE-VARs
The general expressions for estimating the predictive likelihood that were presented in Section 12.5 are also valid for DSGE-VAR models. In this regard it is important to keep in mind that the DSGE-VAR has parameters beyond $\theta$, except for the case when $\lambda = \infty$, and that the VAR parameters need to be dealt with when computing the marginal likelihood for the sample $(y_{t+h}^{*}, \mathcal{Y}_t)$. The simplified expressions for the Laplace approximation in equation (12.41) can also be used in the DSGE-VAR framework, where we only need to provide formulas for $y_{t+h|t}$ and $\Sigma_{y,t+h|t}$. This is easily achieved through the results in Section 16.8 regarding the population moments of the marginal predictive distribution, but the caveat about taking all the DSGE-VAR parameters into account when computing the marginal likelihood for the sample $(y_{t+h}^{*}, \mathcal{Y}_t)$ again applies.
We can directly make use of equations (16.22) and (16.24) to determine $y_{t+h|t}$ and $\Sigma_{y,t+h|t}$. That is, setting the residuals over the forecast horizon to zero,
$$y_{t+h|t} = J_p'\bar{x}_{t+h} + J_p'\Psi^hY_t,$$
$$\Sigma_{y,t+h|t} = J_p'\Sigma_Y^{(h)}J_p, \quad h = 1, \ldots, H,$$
where the parameters $(\Phi, \Sigma_\epsilon, \theta)$ are evaluated at the posterior mode. Based on the joint conditional posterior density of the VAR parameters, the joint posterior mode of the VAR parameters conditional on $\theta$ is given by equations (15.31) and (15.32).
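A minimal sketch of these moments for a VAR($p$) in companion form is given below. It assumes that the exogenous term follows $\bar{x}_{t+h} = \Psi\bar{x}_{t+h-1} + J_p\Phi_0x_{t+h}$ with $\bar{x}_t = 0$, and that $\Sigma_Y^{(h)} = \Psi\Sigma_Y^{(h-1)}\Psi' + J_p\Sigma_\epsilon J_p'$ with $\Sigma_Y^{(0)} = 0$. Both recursions are assumptions about the content of (16.22) and (16.24), and the function names are hypothetical.

```python
import numpy as np


def companion(Phis):
    """Companion matrix Psi of y_t = Phi_1 y_{t-1} + ... + Phi_p y_{t-p} + ..."""
    n, p = Phis[0].shape[0], len(Phis)
    Psi = np.zeros((n * p, n * p))
    Psi[:n, :] = np.hstack(Phis)
    Psi[n:, :-n] = np.eye(n * (p - 1))
    return Psi


def predictive_moments(Phi0, Phis, Sigma_eps, Y_t, X):
    """Means y_{t+h|t} and covariances Sigma_{y,t+h|t} for h = 1, ..., H.

    Phi0: (n x k), Phis: list of p (n x n), Y_t: stacked [y_t; ...; y_{t-p+1}],
    X: (k x H) exogenous data over the forecast horizon.
    """
    n, p, H = Sigma_eps.shape[0], len(Phis), X.shape[1]
    Psi = companion(Phis)
    J = np.zeros((n * p, n)); J[:n, :] = np.eye(n)
    xbar = np.zeros(n * p)                # exogenous part, xbar_t = 0
    PsiH = np.eye(n * p)                  # Psi^h
    SigY = np.zeros((n * p, n * p))       # Sigma_Y^(0) = 0
    means, covs = [], []
    for h in range(H):
        xbar = Psi @ xbar + J @ (Phi0 @ X[:, h])
        PsiH = Psi @ PsiH
        SigY = Psi @ SigY @ Psi.T + J @ Sigma_eps @ J.T
        means.append(J.T @ xbar + J.T @ PsiH @ Y_t)
        covs.append(J.T @ SigY @ J)
    return np.array(means), np.array(covs)
```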
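The joint predictive likelihood for a subset of variables can be derived from the Kalman filter equations in Section 5.2. The conditional likelihood is given by equation (12.42), where the period $t+i$ term is shown in (12.43). It therefore remains to provide expressions for the forecast
$$y^{*}_{t+i|t+i-1} = K'J_p'Y_{t+i|t+i-1}, \quad i = 1, \ldots, h,$$
where $y^{*}_t = K'y_t$ denotes the subset of variables and
$$Y_{t+i|t+i-1} = \begin{cases} J_p\Phi_0x_{t+i} + \Psi Y_{t+i-1|t+i-2} + G_{t+i-1}\big(y^{*}_{t+i-1} - y^{*}_{t+i-1|t+i-2}\big), & \text{if } i \geq 2,\\ J_p\Phi_0x_{t+1} + \Psi Y_t, & \text{if } i = 1. \end{cases}$$
The Kalman gain matrix is
$$G_{t+i-1} = \Psi\Sigma_{t+i-1|t+i-2}J_pK\Sigma_{y,t+i-1|t+i-2}^{-1}, \quad i \geq 2,$$
where $\Sigma_{y,t+i|t+i-1} = K'J_p'\Sigma_{t+i|t+i-1}J_pK$ is the covariance matrix of the forecast error for $y^{*}_{t+i}$, and
$$\Sigma_{t+i|t+i-1} = \begin{cases} \big(\Psi - G_{t+i-1}K'J_p'\big)\Sigma_{t+i-1|t+i-2}\Psi' + J_p\Sigma_\epsilon J_p', & \text{if } i \geq 2,\\ J_p\Sigma_\epsilon J_p', & \text{if } i = 1. \end{cases}$$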
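The filter above can be sketched in NumPy as follows. The sketch assumes Gaussian forecast errors, a selection matrix $K$ with $y^{*}_t = K'y_t$, and a gain that already includes $\Psi$ (as in the state recursion above); the function names are hypothetical and the code is not YADA's implementation.

```python
import numpy as np


def companion(Phis):
    """Companion matrix Psi of a VAR(p) with lag matrices Phi_1, ..., Phi_p."""
    n, p = Phis[0].shape[0], len(Phis)
    Psi = np.zeros((n * p, n * p))
    Psi[:n, :] = np.hstack(Phis)
    Psi[n:, :-n] = np.eye(n * (p - 1))
    return Psi


def joint_predictive_loglik(Phi0, Phis, Sigma_eps, Y_t, X, Y_future, K):
    """log p(K'y_{t+1}, ..., K'y_{t+h} | Y_t) for a VAR(p) via the Kalman filter.

    X and Y_future are (k x h) and (n x h); K is (n x n_star).
    """
    n, p, h = Sigma_eps.shape[0], len(Phis), Y_future.shape[1]
    Psi = companion(Phis)
    J = np.zeros((n * p, n)); J[:n, :] = np.eye(n)
    Q = J @ Sigma_eps @ J.T                    # J_p Sigma_eps J_p'
    Hm = J @ K                                 # maps the state into y* = K'y
    Ypred = J @ (Phi0 @ X[:, 0]) + Psi @ Y_t   # Y_{t+1|t}
    P = Q.copy()                               # Sigma_{t+1|t}
    loglik = 0.0
    for i in range(h):
        F = Hm.T @ P @ Hm                      # Sigma_{y,t+i|t+i-1}
        e = K.T @ Y_future[:, i] - Hm.T @ Ypred
        _, logdet = np.linalg.slogdet(F)
        loglik -= 0.5 * (e.size * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(F, e))
        if i + 1 < h:
            G = Psi @ P @ Hm @ np.linalg.inv(F)    # gain including Psi
            Ypred = J @ (Phi0 @ X[:, i + 1]) + Psi @ Ypred + G @ e
            P = (Psi - G @ Hm.T) @ P @ Psi.T + Q
    return loglik
```

When $K = I_n$ the update reduces to $Y_{t+i|t+i-1} = J_p\Phi_0x_{t+i} + \Psi Y_{t+i-1}$, i.e., the filter simply conditions on the realized data.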
Posterior mode estimation of $\theta$ via a DSGE-VAR model can, as discussed in Section 15.7, be conducted in at least two different ways. That is, we may either use the marginal likelihood in equation (15.24) or the concentrated likelihood in equation (15.34) when $\lambda$ is finite, and the likelihood in equation (15.26) when $\lambda = \infty$. When combined with the prior of $\theta$ (or $\phi$), the first gives us a marginal posterior mode estimate of $\theta$, while the second gives us a joint posterior mode estimate. When $\lambda = \infty$, the marginal and the joint mode are equal. Given any such estimate, the corresponding posterior mode of the VAR parameters is obtained by plugging the value of the estimated DSGE model parameters into equations (15.31) and (15.32). Both approaches yield legitimate candidate estimates of the posterior mode when computing the height of the predictive density in (12.41).
A peculiar property of the DSGE-VAR model is that its prior in equations (15.16) and (15.17) depends on $T$. That is, the prior changes as the number of observations in the likelihood increases. Specifically, when going from $T$ to $T+1$ observations on $y_t$, the change of the prior of
ln n T np k ln | | ln | | tr 1
.
2
2
2
If the predictive likelihood for, say, $y_{T+1|T}$ is calculated using the marginal likelihood values from the Laplace approximations for periods $T+1$ and $T$, it follows that the predictive likelihood is directly influenced by this feature of the prior. In addition, the three terms involving $\theta$ and $\Sigma_\epsilon$ imply that the link between the Hessian matrices in equation (12.47) is not valid. Both these effects of the DSGE-VAR prior being dependent on $T$ are clearly unfortunate when using such models in a forecast comparison exercise.
If we instead make use of the more reasonable assumption that the prior remains unchanged when evaluating forecasts, i.e., we fix $T$ at $\bar{T}$ in the prior, and compute the Laplace approximation through (12.46), the Hessian matrix $\tilde{H}_t$ is available through equations (15.38)-(15.45). Furthermore, equation (12.47) is now valid, with the effect that the Hessian matrix $\tilde{H}_{t+h}$ is equal to the sum of $\tilde{H}_t$ and $\tilde{H}_{t+h|t}$. For the DSGE-VAR models with finite $\lambda$ it follows that
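$$\tilde{H}_{t+h|t} = \begin{bmatrix} \dfrac{\partial^2\ell}{\partial\operatorname{vec}(\Phi)\,\partial\operatorname{vec}(\Phi)'} & \dfrac{\partial^2\ell}{\partial\operatorname{vec}(\Phi)\,\partial\operatorname{vech}(\Sigma_\epsilon)'} & 0\\[2ex] \dfrac{\partial^2\ell}{\partial\operatorname{vech}(\Sigma_\epsilon)\,\partial\operatorname{vec}(\Phi)'} & \dfrac{\partial^2\ell}{\partial\operatorname{vech}(\Sigma_\epsilon)\,\partial\operatorname{vech}(\Sigma_\epsilon)'} & 0\\[2ex] 0 & 0 & 0 \end{bmatrix}, \tag{16.57}$$
where $\ell = \ln L(y_{t+h}^{*}\,|\,y, Y_1; \Phi, \Sigma_\epsilon)$ and the last row and column correspond to $\theta$.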
The zeros are due to the fact that the conditional likelihood is invariant to the DSGE model parameters and that we have assumed that the prior of the VAR parameters is fixed, with $T$ held at $\bar{T}$ over the forecast horizon $t+h$ for $h = 1, \ldots, H$.
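By contrast, for the DSGE-VAR models with $\lambda = \infty$, the VAR parameters are functions of the DSGE model parameters. To determine $\tilde{H}_{t+h|t}$ we may then apply the chain rule to the partial derivatives. Letting the nonzero matrices in (16.57) be denoted by $\mathcal{H}_{\Phi,\Phi}$, $\mathcal{H}_{\Phi,\Sigma}$, and $\mathcal{H}_{\Sigma,\Sigma}$, respectively, it follows that
$$\tilde{H}_{t+h|t} = \begin{bmatrix} \dfrac{\partial\operatorname{vec}(\Phi)}{\partial\theta'}\\[2ex] \dfrac{\partial\operatorname{vech}(\Sigma_\epsilon)}{\partial\theta'} \end{bmatrix}' \begin{bmatrix} \mathcal{H}_{\Phi,\Phi} & \mathcal{H}_{\Phi,\Sigma}\\ \mathcal{H}_{\Phi,\Sigma}' & \mathcal{H}_{\Sigma,\Sigma} \end{bmatrix} \begin{bmatrix} \dfrac{\partial\operatorname{vec}(\Phi)}{\partial\theta'}\\[2ex] \dfrac{\partial\operatorname{vech}(\Sigma_\epsilon)}{\partial\theta'} \end{bmatrix}.$$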
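The partial derivatives of $\Phi$ and $\Sigma_\epsilon$ with respect to $\theta$ are:
$$\frac{\partial\operatorname{vec}(\Phi)}{\partial\theta'} = \big(\Gamma_{YY}^{-1}\otimes I_n\big)G_{yY} - \big(\Gamma_{YY}^{-1}\otimes\Phi\big)D_{np+k}G_{YY},$$
and
$$\frac{\partial\operatorname{vech}(\Sigma_\epsilon)}{\partial\theta'} = G_{yy} + D_n^{+}\big(\Phi\otimes\Phi\big)D_{np+k}G_{YY} - 2D_n^{+}\big(\Phi\otimes I_n\big)G_{yY}.$$
Here $G_{yy}$, $G_{yY}$, and $G_{YY}$ denote the partial derivatives of $\operatorname{vech}(\Gamma_{yy})$, $\operatorname{vec}(\Gamma_{yY})$, and $\operatorname{vech}(\Gamma_{YY})$ with respect to $\theta'$, $D_m$ is the $m^2 \times m(m+1)/2$ duplication matrix, and $D_n^{+}$ is the Moore-Penrose inverse of $D_n$.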
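The two derivative formulas can be verified by finite differences. In the sketch below, $\theta$ is taken to be a hypothetical scalar that moves the joint moment matrix of $(y_t, Y_t)$ along a random symmetric direction, so that $G_{yy}$, $G_{yY}$, and $G_{YY}$ are the vech/vec of the corresponding blocks of that direction. It uses $\Phi = \Gamma_{yY}\Gamma_{YY}^{-1}$ and $\Sigma_\epsilon = \Gamma_{yy} - \Phi\Gamma_{Yy}$; all helper names are illustrative.

```python
import numpy as np


def vec(A):
    return A.reshape(-1, order="F")               # column-major stacking


def vech(A):
    m = A.shape[0]
    return np.array([A[i, j] for j in range(m) for i in range(j, m)])


def duplication(m):
    """D_m with vec(A) = D_m vech(A) for symmetric (m x m) A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):
            D[i + j * m, col] = 1.0
            D[j + i * m, col] = 1.0
            col += 1
    return D


def var_params(G, n):
    """Phi = Gamma_yY Gamma_YY^{-1} and Sigma = Gamma_yy - Phi Gamma_Yy."""
    Gyy, GyY, GYY = G[:n, :n], G[:n, n:], G[n:, n:]
    Phi = GyY @ np.linalg.inv(GYY)
    return Phi, Gyy - Phi @ GyY.T


def derivative_check(n=2, m=3, seed=0, h=1e-6):
    """Max abs gap between analytical and central-difference derivatives."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n + m, n + m))
    G0 = A @ A.T + (n + m) * np.eye(n + m)        # joint moments of (y, Y)
    Dlt = rng.standard_normal((n + m, n + m)); Dlt = Dlt + Dlt.T
    Phi, _ = var_params(G0, n)
    iGYY = np.linalg.inv(G0[n:, n:])
    Dm, Dn_plus = duplication(m), np.linalg.pinv(duplication(n))
    g_yy, g_yY, g_YY = vech(Dlt[:n, :n]), vec(Dlt[:n, n:]), vech(Dlt[n:, n:])
    dPhi = np.kron(iGYY, np.eye(n)) @ g_yY - np.kron(iGYY, Phi) @ Dm @ g_YY
    dSig = (g_yy + Dn_plus @ np.kron(Phi, Phi) @ Dm @ g_YY
            - 2 * Dn_plus @ np.kron(Phi, np.eye(n)) @ g_yY)
    Pp, Sp = var_params(G0 + h * Dlt, n)
    Pm, Sm = var_params(G0 - h * Dlt, n)
    err_phi = np.max(np.abs(dPhi - vec(Pp - Pm) / (2 * h)))
    err_sig = np.max(np.abs(dSig - vech(Sp - Sm) / (2 * h)))
    return err_phi, err_sig
```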
M
M
K,h K Jp
i,h
Y
vec
i0
h1i
vec
Y
I
M
M
I
np
i,h
np
i,h
vec
h1i xth1i
vec
Y
1 i
K
J
I
M
np
p
h K,h
vec
vec
vech1i
th1i
1 x
i
M Inp Jp KK,h h
vec
vec
Inp Jp K1
Y i Cnp M M Cnp
Inp i Yt h
K,h h t
)
h1i
h
h
1 vec
1
K,h K Jp
K,h
vec
vec
vec
vec
h
h
1
Y
1
1
K
J
K
J
p
p
h K,h
K,h
2 vec
vec
h
1 vec
h
Y
1
1
J
K
J
p
p
K,h
K,h h vec .
2
vec
Furthermore,
,
1
2
h
vec
Y
vec
1
K,h K Jp
1
Jp K1
K,h K Jp Jp K 2K,h h h In
vec
h
Y
h2
h1i
vec
Y
M Inp Gi,h
vech
vech i0
vec
h
h
1
Y
1
1
K
J
K
J
,
p
p
h K,h
K,h
2 vec
vech
and
,
1
2
(16.58)
h
vec
Y
vech
(16.59)
1
Jp K1
K,h K Jp Jp K 2K,h h h In
vec
h
Y
(16.60)
K Jp
1
K,h
.
vech
These expressions are evaluated at the posterior mode and rely on the following:
h K yth Jp xth Jp h Yt ,
i
1
K Jp i , i 0, 1, . . . , h 2,
Gi,h Jp K In 1
K,h h h
K,h
h Jp K,
K,h K Jp
Y
h1
h
h1
vec
vec
Y
Y
Inp M
In2 p2 Cnp
,
Y
vec
vec
h 1, 2, . . . , H,
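where $\Sigma_Y^{(0)} = 0$, $\partial\operatorname{vec}(\Sigma_Y^{(0)})/\partial\operatorname{vec}(\Phi)' = 0$, and $C_{np}$ is the $n^2p^2 \times n^2p^2$ commutation matrix such that $C_{np}\operatorname{vec}(\Psi) = \operatorname{vec}(\Psi')$. Moreover,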
h1
h
vec
vec
Y
Y
Jp Jp Dn
, h 1, 2, . . . , H,
vech
vech
where again $\partial\operatorname{vec}(\Sigma_Y^{(0)})/\partial\operatorname{vech}(\Sigma_\epsilon)' = 0$. Next,
xth1
xth
Jp N xth1 Inp M
,
vec
vec
where $\bar{x}_t = 0$ and $\partial\bar{x}_t/\partial\operatorname{vec}(\Phi)' = 0$. Furthermore,
vec h1
vec h
h1
M Inp
Inp
,
vec
vec
h 1, 2, . . . , H,
h 1, 2, . . . , H,
vec h
h
th
K Jp
Yt K Jp
,
vec
vec
vec
h 1, 2, . . . , H,
represent the n structural shocks of the DSGE-VAR. The third dimension gives the horizon for
the responses, where the first concerns the contemporaneous responses.
16.11.2. DSGEVARVarianceDecompositions
The function DSGEVARVarianceDecompositions computes DSGE-VAR based variance decompositions for the original or the levels data. It requires 6 input variables to achieve its task: Phi, A0,
p, h, VDType, and levels. All variables except VDType are used by DSGEVARImpulseResponses
discussed above. The VDType input is an integer that takes the value 1 if the forecast errors
concern the original data and 2 if they refer to the levels data.
The three output variables from the function are called FEVDs, LRVD and VarShares. The first is a matrix of dimension $n \times n \times h$ with the forecast error variance decompositions for the $n$ endogenous variables in the rows, the $n$ structural shocks in the columns, and the forecast horizon in the third dimension, with $h$ being the maximum horizon. The second output variable, LRVD, is an $n \times n$ matrix with the long-run variance decompositions. For the original data, the long run is determined by the maximum of $h$ and 200, and for levels data it is computed via the matrix $R_{lr}$ in Section 16.3 along with the expression in equation (11.39). The final output variable is given by VarShares, which measures the convergence to the long run by dividing the $h$-step ahead forecast error variances by the long-run variances. Values close to unity suggest convergence, while small values indicate that the model has not yet converged to the long run.
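For the original data, the three outputs can be sketched from the companion matrix $\Psi$ and $A_0$, using that the contribution of shock $j$ to the $h$-step forecast error variance of variable $i$ is the sum of squared responses up to horizon $h-1$. The sketch assumes $\operatorname{Var}(\eta_t) = I_n$ and approximates the long run by horizon $\max(h, 200)$ as described above; the function name is hypothetical and the levels-data case is not covered.

```python
import numpy as np


def fevd(Psi, A0, p, h, long_run=200):
    """FEVDs (n x n x h), long-run decomposition (n x n) and VarShares (n x h)."""
    n = A0.shape[0]
    J = np.zeros((n * p, n)); J[:n, :] = np.eye(n)
    B = J @ A0
    contrib = np.zeros((n, n))            # [i, j]: variance of y_i due to shock j
    fevds = np.zeros((n, n, h))
    var_h = np.zeros((n, h))
    Pi = np.eye(n * p)                    # Psi^s
    for s in range(max(h, long_run)):
        R = J.T @ Pi @ B                  # impulse responses at horizon s
        contrib += R ** 2
        if s < h:
            fevds[:, :, s] = contrib / contrib.sum(axis=1, keepdims=True)
            var_h[:, s] = contrib.sum(axis=1)
        Pi = Psi @ Pi
    total_lr = contrib.sum(axis=1, keepdims=True)
    return fevds, contrib / total_lr, var_h / total_lr
```

Since each row of `contrib` only accumulates nonnegative terms, the VarShares output is nondecreasing in $h$ and bounded by one.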
16.11.3. DSGEVARObsVarDecomp
The function DSGEVARObsVarDecomp calculates the observed variable decompositions from a
DSGE-VAR model. It takes 7 input variables: Phi, A0, p, k, y, Y, and IsConstant. The first 4
variables have been discussed above, but it may be noted that Phi includes the parameters on
the exogenous variables. The input variable y is an $n \times T$ matrix with the data on the endogenous variables, while Y is an $(np+k) \times T$ matrix with the data on the exogenous and lagged endogenous variables. Finally, the variable IsConstant is a boolean variable that is unity if $x_t = 1$ for all $t$ and zero otherwise. The decomposition in (16.6) is used in the latter case, while (16.8) is used in the former.
The three output variables are given by yMean, yInitial, and yShocks. The first is an $n \times T$ matrix with the population mean of the data given the parameters (and the exogenous variables term when IsConstant is zero). Similarly, the variable yInitial is an $n \times T$ matrix with the initial value term in the decomposition. The last output variable is yShocks, an $n \times T \times n$ matrix with the decomposition of the endogenous variables (rows) for each time period (columns) due to the $n$ structural shocks (third dimension).
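For the constant-only case, one standard way to write such a decomposition is $Y_t - \mu_Y = \Psi^t(Y_0 - \mu_Y) + \sum_{s=1}^{t}\Psi^{t-s}J_pA_0\eta_s$, with the structural shocks backed out from the VAR residuals. The sketch below implements this identity; whether it matches (16.8) term by term is an assumption, and the function names are hypothetical.

```python
import numpy as np


def companion(Phis):
    """Companion matrix Psi of a VAR(p) with lag matrices Phi_1, ..., Phi_p."""
    n, p = Phis[0].shape[0], len(Phis)
    Psi = np.zeros((n * p, n * p))
    Psi[:n, :] = np.hstack(Phis)
    Psi[n:, :-n] = np.eye(n * (p - 1))
    return Psi


def obs_var_decomp(phi0, Phis, A0, y, Y0):
    """Split y_t into mean, initial-value and shock terms (constant-only case).

    phi0: (n,) intercept, Phis: list of p (n x n), A0: (n x n),
    y: (n x T) data, Y0: stacked pre-sample values [y_0; ...; y_{1-p}].
    Returns yMean (n x T), yInitial (n x T) and yShocks (n x T x n).
    """
    n, T, p = y.shape[0], y.shape[1], len(Phis)
    Psi = companion(Phis)
    J = np.zeros((n * p, n)); J[:n, :] = np.eye(n)
    muY = np.linalg.solve(np.eye(n * p) - Psi, J @ phi0)   # stacked mean
    B = J @ A0
    Y = Y0.astype(float).copy()
    PsiT = np.eye(n * p)                  # Psi^t
    state = np.zeros((n * p, n))          # column j: accumulated effect of shock j
    y_mean = np.tile((J.T @ muY)[:, None], (1, T))
    y_init = np.zeros((n, T))
    y_shocks = np.zeros((n, T, n))
    for t in range(T):
        eta = np.linalg.solve(A0, y[:, t] - phi0 - Psi[:n, :] @ Y)   # structural shocks
        state = Psi @ state + B * eta     # scales column j of B by eta_j
        PsiT = Psi @ PsiT
        y_init[:, t] = J.T @ PsiT @ (Y0 - muY)
        y_shocks[:, t, :] = J.T @ state
        Y = np.concatenate([y[:, t], Y[:-n]])
    return y_mean, y_init, y_shocks
```

By construction the three terms add up to the data, which is a convenient sanity check on any implementation of this kind of decomposition.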
16.11.4. DSGEVARCorrelationParam
The function DSGEVARCorrelationParam requires only three input variables: Phi, SigmaEpsilon and DSGEModel. The matrix Phi is of dimension $n \times (np+k)$ and thus includes the parameters on the exogenous variables as well as those on the lagged endogenous variables. The matrix SigmaEpsilon is the residual covariance matrix, while the last input variable is well known.
The output variable is called SVEs and is a structure with 7 fields. The DSGE-VAR model population mean matrix of dimension $n \times k$ is stored in the field Mean. To get the actual population mean, this matrix needs to be multiplied by $x_t$ for each time period. The population mean computation is based on the approximation
$$\tilde{\mu}_{y,t} = J_p'\big(I_{np} - \Psi\big)^{-1}J_p\Phi_0x_t,$$
i.e., an expression which is correct for $x_t = 1$. The actual population mean is given by
$$\mu_{y,t} = \sum_{j=0}^{\infty}J_p'\Psi^jJ_p\Phi_0x_{t-j}.$$
To get an idea about the approximation error, YADA computes $\mu_{y,t}$ through the recursion
$$\mu_{y,t} = \Phi_0x_t + \sum_{i=1}^{p}\Phi_i\mu_{y,t-i},$$
matrix obtained from $\Phi$ in equation (15.10) by excluding the parameters on exogenous variables ($\Phi_0$). All other variables are shared with DSGEVARCorrelationSimulationParam above.
Two output variables are provided, SOmega and Omega. The first is an $n(n-1)/2 \times f$ matrix with the unique coherence values for pairs of observed variables across the $f$ frequencies, obtained from the vector Omega. As for all other spectral density functions in YADA, the integer $f = 300$, while the entries in Omega are given by $\omega_j = \pi(j-1)/299$, $j = 1, \ldots, f$; see, e.g., the DSGECoherenceTheta function in Section 13.9.3.
16.11.9. DSGEVARSpectralDecomposition
The function DSGEVARSpectralDecomposition requires 6 input variables: Phi, A0, lambda,
EstStr, DSGEModel, and CurrINI. The Phi matrix does not include parameters on the deterministic variables and is therefore set up in the same way as for the function that computes the forecast error variance decomposition; cf. DSGEVARVarianceDecompositions above. The scalar lambda gives the hyperparameter, while EstStr is a string that provides information about which values of the VAR parameters are used, i.e., the initial values, the marginal or the joint posterior mode values. The structures DSGEModel and CurrINI are by now well known.
The output variables from the function are first of all given by SyOmega and SyOmegaAnnual. The first variable is an $n \times f$ cell array of $n \times n$ matrices with the spectral decompositions $s_y^{(j)}(\omega)$ for shock $j$ and frequency $\omega$; see equation (16.17). The second variable is the annualized version of the conditional population spectrum; see Section 13.5. Next, the vectors OriginalVariance and AnnualVariance are given as output. These hold the model-based population variances of the endogenous variables and of the annualized endogenous variables, respectively. The fifth and last output variable is Omega, an $f$-dimensional vector with the 300 frequencies that are considered, i.e., $0, \pi/299, 2\pi/299, \ldots, \pi$.
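A sketch of a shock-by-shock spectral decomposition on the same 300-point frequency grid is shown below. It assumes the convention $s_y^{(j)}(\omega) = (2\pi)^{-1}T(\omega)A_0e_je_j'A_0'T(\omega)^{*}$ with $T(\omega) = J_p'(I_{np} - \Psi e^{-i\omega})^{-1}J_p$; the scaling used in (16.17) may differ, and the function name is hypothetical.

```python
import numpy as np


def spectral_decomposition(Psi, A0, p, f=300):
    """Per-shock spectral densities S[j, k] (n x n) at omegas[k] = pi*k/(f-1)."""
    n = A0.shape[0]
    J = np.zeros((n * p, n)); J[:n, :] = np.eye(n)
    omegas = np.pi * np.arange(f) / (f - 1)       # 0, pi/(f-1), ..., pi
    S = np.zeros((n, f, n, n), dtype=complex)
    I = np.eye(n * p)
    for k, w in enumerate(omegas):
        # frequency response of y to each structural shock (n x n)
        R = J.T @ np.linalg.solve(I - Psi * np.exp(-1j * w), J @ A0)
        for j in range(n):
            c = R[:, j:j + 1]
            S[j, k] = (c @ c.conj().T) / (2 * np.pi)
    return omegas, S
```

Summing `S` over the first axis gives the total spectral density, since the structural shocks are mutually uncorrelated.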
16.11.10. DSGEVARPredictionPathsParam
The function DSGEVARPredictionPathsParam computes unconditional forecasts of the endogenous variables for fixed values of the parameters. To achieve this objective, 12 input variables are needed: Phi0, Phi, SigmaEpsilon, X, LastPeriod, h, lambda, NumPaths, EstStr,
ForecastType, DSGEModel, and CurrINI. The matrix Phi0 contains the parameters on the exogenous variables, while Phi are the parameters on lagged endogenous variables and the residual
covariance matrix is given by SigmaEpsilon. Data over the forecast sample on the exogenous
variables are located in the $k \times h$ matrix X, while the integer LastPeriod gives the position in
the full sample of the last period that is viewed as observed. This means that yT is located in
position LastPeriod of the matrix DSGEModel.Y. The integer h is the maximum forecast horizon, while lambda is the usual hyperparameter that defines the weight on the prior for the
DSGE-VAR. The integer NumPaths is equal to the number of forecast paths to calculate, EstStr
is a string that indicates if initial values, marginal or joint posterior mode values are used for
the computations. The integer ForecastType is equal to 1 if the original data format of the endogenous variables is forecasted, 2 if annualized data is forecasted, and 3 if the function should
forecast transformed data. The last two input variables are familiar.
The number of output variables provided by the function is 4. They are given by PredPaths, PredData, PredEventData, and YObsEventData. These variables are nearly the same as those supplied by the unconditional forecasting function, DSGEPredictionPathsTheta, for the DSGE model. The difference lies in the fields of PredData. The DSGE-VAR forecasting function has the following fields: PredMean, epsShocks, epsMean, Shocks, KernelX, and KernelY. The PredMean field gives the $n \times h$ matrix with population mean forecasts of the endogenous variables. The field epsShocks is an $n \times h$ matrix with the population mean of the residuals over the forecast horizon, i.e., a zero matrix, while epsMean is the sample mean of the simulated residuals. The paths for all residuals are stored in the matrix Shocks, with dimensions $n \times h \times P$ (with $P$ being the number of simulated paths). The last two fields of the PredData structure give the horizontal and vertical axes values for kernel density estimates of the marginal predictive densities. These matrices have dimensions $n \times 28 \times h$.
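The simulation behind PredPaths, PredMean, Shocks and epsMean can be sketched as below for fixed parameter values. The sketch assumes Gaussian residuals with covariance $\Sigma_\epsilon$ drawn through a Cholesky factor; it ignores ForecastType, the kernel densities and the event data, and the function name is hypothetical.

```python
import numpy as np


def prediction_paths(Phi0, Phis, Sigma_eps, Y_T, X, num_paths, seed=0):
    """Return PredPaths (n x h x P), PredMean (n x h), Shocks (n x h x P), epsMean (n x h)."""
    n, h = Sigma_eps.shape[0], X.shape[1]
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma_eps)
    Phi = np.hstack(Phis)                 # [Phi_1 ... Phi_p]
    shocks = np.einsum("ij,jtp->itp", L, rng.standard_normal((n, h, num_paths)))

    def run(eps):
        """Iterate the VAR forward from Y_T = [y_T; ...; y_{T-p+1}]."""
        Y, out = Y_T.astype(float).copy(), np.zeros((n, h))
        for t in range(h):
            y = Phi0 @ X[:, t] + Phi @ Y + eps[:, t]
            out[:, t] = y
            Y = np.concatenate([y, Y[:-n]])
        return out

    paths = np.stack([run(shocks[:, :, i]) for i in range(num_paths)], axis=2)
    pred_mean = run(np.zeros((n, h)))     # population mean forecast (zero residuals)
    return paths, pred_mean, shocks, shocks.mean(axis=2)
```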
16.11.11. DSGEVARPredictionPaths
The function DSGEVARPredictionPaths needs 16 input variables. The prior or posterior draws of the parameters of the VAR are given by PhiDraws and SigmaDraws. The matrices have dimensions $d \times n(np+k)$ and $d \times n(n+1)/2$, respectively, where $d$ is the number of draws from the corresponding parameter distribution. Each row of the PhiDraws matrix is given by the transpose of the column vectorization of $\Phi$, while the rows of SigmaDraws contain the draws of $\operatorname{vech}(\Sigma_\epsilon)$. The following two input variables are given by PhiMode and SigmaModeEpsilon, fixed parameter values of $\Phi$ and $\Sigma_\epsilon$ which are used as backup values for the VAR parameters.
For the prior distribution these matrices are equal to the mode values given the initial values
of the DSGE model parameters, and for the posterior by the mode values given the posterior
mode values of the DSGE model parameters. The latter parameter vector depends on how the
posterior sampler was parameterized.
The next four input variables are directly related to forecasting. They are: X, LastPeriod, h,
and ForecastType and are identical to the variables with the same names in the unconditional
forecast function for fixed parameter values discussed above (DSGEVARPredictionPathsParam).
Thereafter we have 5 input variables related to sampling of the distribution for the DSGE-VAR model: CurrChain, IsPosterior, PME, lambda, and NumPaths. The first is an integer that denotes the number of the selected Markov chain; for the prior distribution this variable can be empty. The following is a boolean variable which is unity if draws from the posterior distribution are provided to the function and 0 if they stem from the prior. Next, PME is an integer that is 1 if the posterior mode estimator of the DSGE model parameters used by the posterior sampler of the DSGE-VAR is taken from the DSGE model, 2 if it was given by the marginal posterior mode estimator of $\theta$ from the DSGE-VAR model, and 3 if it was the joint posterior mode from the DSGE-VAR. The following variable simply gives the hyperparameter $\lambda$ of the DSGE-VAR, while the number of paths per parameter value is NumPaths. The last 3 input variables are DSGEModel, CurrINI and controls, all being documented above.
As output the function provides 4 variables. The first is the boolean DoneCalc that indicates if
the computations were completed or not. The next is the matrix PredEventData with prediction
event results. The last two variables give the decomposition of the prediction uncertainty into the residual/shock uncertainty and parameter uncertainty terms; see (16.25) and (16.26). Both these variables are 3D matrices with dimensions $n \times n \times h$. The actual prediction paths are stored in mat-files on disk and are not provided as output variables from the function.
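A decomposition of this kind can be sketched with the law of total variance: the total predictive covariance equals the average within-draw covariance (residual/shock uncertainty) plus the covariance of the within-draw means across parameter draws (parameter uncertainty). The sketch assumes the simulated paths are available as a $d \times n \times h \times P$ array and uses population (divide-by-count) moments; it is not necessarily the exact estimator in (16.25) and (16.26).

```python
import numpy as np


def uncertainty_decomposition(paths):
    """Split prediction covariance into residual and parameter uncertainty.

    paths: (d x n x h x P) simulated paths, P paths for each of d parameter draws.
    Returns (shock_unc, param_unc), each of dimension (n x n x h).
    """
    d, n, h, P = paths.shape
    means = paths.mean(axis=3)                            # d x n x h
    dev_w = paths - means[..., None]                      # within-draw deviations
    shock = np.einsum("ditp,djtp->ijt", dev_w, dev_w) / (d * P)
    dev_b = means - means.mean(axis=0)                    # between-draw deviations
    param = np.einsum("dit,djt->ijt", dev_b, dev_b) / d
    return shock, param
```

With an equal number of paths per draw, the two terms add up exactly to the covariance computed from all $dP$ paths pooled together.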
16.11.12. DSGEVARCondPredictionPathsParam(WZ/Mixed)
The function DSGEVARCondPredictionPathsParam computes conditional forecasts of the observed variables for fixed values of the parameters using the direct control of the shocks method
discussed in Section 16.9.1, DSGEVARCondPredictionPathsParamWZ makes use of the Waggoner
and Zha approach discussed in Section 16.9.2, while DSGEVARCondPredictionPathsParamMixed
is based on the distribution of a subset of the shocks method in Section 16.9.3. In addition
to the 12 variables required by the function DSGEVARPredictionPathsParam for unconditional predictions, the conditional forecasting functions all require two further variables: Z and U. The former holds the conditioning data, while the latter takes care of initial conditions ($u_T$ in equation (12.6)). It should also be noted that since DSGEVARCondPredictionPathsParam uses the direct control of shocks method it needs the matrix $A_0$, which is multiplied by the structural shocks, instead of the residual covariance matrix $\Sigma_\epsilon$. Hence, the input variable A0
replaces SigmaEpsilon in this case. Moreover, this function also needs the input variable
DSGEVARShocks, a vector with integer values determining the positions among the shocks in
the DSGE model of the shocks used in the DSGE-VAR model. This variable is not needed by the
DSGEVARCondPredictionPathsParamWZ function since it does not require structural shocks.
These functions provide seven output variables. The first four are identical to the outputs given by DSGEVARPredictionPathsParam. The final three variables are the modesty statistics called MultiModestyStat, UniModestyStat, and UniModestyStatLZ, and they are only calculated when ForecastType is unity. When this condition is met, MultiModestyStat is a matrix of dimension NumPaths × 2, where the first column holds the values of the multivariate statistic in equation (16.55) evaluated at the shocks that satisfy the conditioning assumptions, while the second column gives the statistic evaluated at simulated standard normal shocks, i.e., the draws used for the reference distribution, under the direct control of the shocks method. When the Waggoner and Zha approach is applied, these shock processes are replaced with the corresponding VAR residual processes. The matrix UniModestyStat has dimension NumPaths × n and gives the univariate modesty statistics, while UniModestyStatLZ is a vector with the n values of the univariate Leeper-Zha related modesty statistic.
16.11.13. DSGEVARCondPredictionPaths(WZ/Mixed)
The function DSGEVARCondPredictionPaths computes conditional forecasts of the observed
variables for a sample of parameters from the prior or the posterior distribution using the direct
control of shocks method; see Section 16.9.1. Similarly, DSGEVARCondPredictionPathsWZ performs the same task using the Waggoner and Zha method that was discussed in Section 16.9.2
and the function DSGEVARCondPredictionPathsMixed is based on the distribution of a subset of
the shocks method in Section 16.9.3. These functions need a total of 18 input variables. To begin with, 16 input variables are shared with the function DSGEVARPredictionPaths. Since the direct control of shocks method requires the structural form of the DSGE-VAR, the matrix SigmaDraws is replaced with A0Draws and the covariance matrix SigmaModeEpsilon with A0Mode. The remaining two input variables are given by Z and U, which contain the conditioning assumptions; see the previous section.
These functions provide 6 output variables, 4 of which are shared with the unconditional forecasting function DSGEVARPredictionPaths. In addition, the conditional forecasting functions provide the variables ShockMean and ShockNames. The former variable is a matrix of dimension $n \times h$ with the population mean over the prediction horizon of the residuals needed to ensure that the conditioning assumptions are satisfied. The population mean is estimated as the average over the used VAR parameter draws of the population mean of the residuals for a fixed value of these parameters. Finally, the ShockNames output variable is a string matrix with the names of the structural shocks used by the DSGE-VAR.
16.11.14. DSGEVARPredictiveLikelihoodParam
The function DSGEVARPredictiveLikelihoodParam calculates the joint and marginal predictive
likelihood of a DSGE-VAR model for fixed parameter values using the Laplace approximation.
The function needs 28 input variables to achieve this: theta, thetaPositions, thetaIndex,
thetaDist, PriorDist, LowerBound, UniformBounds, ModelParameters, AIMData, lambda, T, n,
p, npk, GammaHatyy, GammaHatyY, GammaHatYY, DetProductMoments, HSample, logGPR, YData,
X, IsOriginal, IsPlugin, ModeValue, StepLength, DSGEModel, and CurrINI. ModeValue is an
integer that takes the value 1 if posterior mode values from the DSGE model or the initial values are supplied, the value 2 when marginal posterior mode values are given, and 3 if joint posterior mode values are provided. In the latter case, the value of logGPR is interpreted as logCPC, the constant term in the concentrated likelihood function of the DSGE-VAR.
The six output variables are JointPDH, MargPDH, LaplaceMargLike, PredVars, MargPlugin,
and JointPlugin. All these variables have been discussed in Section 12.7.6 in connection with
the function DSGEPredictiveLikelihoodTheta.
the model has 6 state variables (cf. equation (2.1)), but an additional state variable has been included to account for the need of $\hat{y}_{t-1}$ in the measurement equation for $Y_t$ in (2.2). Hence, $r = 7$ is the dimension of $\xi_t$. The model also has $q = 3$ shocks (the $\eta_{i,t}$ variables) which are listed among the variables, and one constant. The total number of variables (and equations) is therefore NumEq = 11. The names of the variables given in Table 1 will also appear in the string matrix that will be sent as input to the measurement equation function, i.e., the string matrix StateVarNames. Similarly, the names of the equations, e.g., EQ1Euler, will also show up in a string matrix that can be used to help YADA determine exactly which are the state equations of the structural form.
Table 1. The AiM model file code for the An and Schorfheide example in equation (2.1).
MODEL> ASmodel

ENDOG>
    yhat      _NOTD
    pihat     _NOTD
    chat      _NOTD
    rhat      _NOTD
    ghat      _NOTD
    zhat      _NOTD
    yhatlag   _NOTD
    one       _DTRM
    etaR      _NOTD
    etaG      _NOTD
    etaZ      _NOTD

EQUATION> EQ1Euler
EQTYPE>   IMPOSED
EQ>       yhat = LEAD(yhat,1) + ghat - LEAD(ghat,1) - (1/tau)*rhat
                 + (1/tau)*LEAD(pihat,1) + (1/tau)*LEAD(zhat,1)

EQUATION> EQ2Phillips
EQTYPE>   IMPOSED
EQ>       pihat = beta*LEAD(pihat,1) + kappa*yhat
                  - kappa*ghat

EQUATION> EQ3Consumption
EQTYPE>   IMPOSED
EQ>       chat = yhat - ghat

EQUATION> EQ4MonPolicyRule
EQTYPE>   IMPOSED
EQ>       rhat = rhoR*LAG(rhat,1) + (1-rhoR)*psi1*pihat + (1-rhoR)*psi2*yhat
                 - (1-rhoR)*psi2*ghat + sigmaR*etaR

EQUATION> EQ5GovConsumption
EQTYPE>   IMPOSED
EQ>       ghat = rhoG*LAG(ghat,1) + sigmaG*etaG

EQUATION> EQ6Technology
EQTYPE>   IMPOSED
EQ>       zhat = rhoZ*LAG(zhat,1) + sigmaZ*etaZ

EQUATION> EQ7YLag
EQTYPE>   IMPOSED
EQ>       yhatlag = LAG(yhat,1)

EQUATION> EQ8OneDef
EQTYPE>   IMPOSED
EQ>       one = 0*LAG(one,1)

EQUATION> EQ9MonPolShock
EQTYPE>   IMPOSED
EQ>       etaR = 0*one

EQUATION> EQ10GovConsShock
EQTYPE>   IMPOSED
EQ>       etaG = 0*one

EQUATION> EQ11TechShock
EQTYPE>   IMPOSED
EQ>       etaZ = 0*one

END
Table 2. An example of the required and optional data for the prior distribution file.
Model parameter  Status     Initial value  Prior type  Prior parameter 1  Prior parameter 2  Lower bound  Upper bound
tau              estimated  1.87500        gamma       2.000              0.50               1.000
kappa            estimated  0.15000        gamma       0.200              0.10               0.000
psi1             estimated  1.45830        gamma       1.500              0.25               0.000
psi2             estimated  0.37500        gamma       0.500              0.25               0.000
rhoR             estimated  0.50000        beta        0.500              0.20
rhoG             estimated  0.84620        beta        0.800              0.10
rhoZ             estimated  0.70590        beta        0.660              0.15
rA               estimated  0.50000        gamma       0.500              0.50               0.000
piA              estimated  6.42860        gamma       7.000              2.00               0.000
gammaQ           estimated  0.40000        normal      0.400              0.20
sigmaR           estimated  0.00358        invgamma    0.004              4.00               0.000
sigmaG           estimated  0.00859        invgamma    0.010              4.00               0.000
sigmaZ           estimated  0.00447        invgamma    0.005              4.00               0.000
It may be noted that the first and the third equation in Table 1 are written exactly as in An and
Schorfheide (2007, equations 29 and 31) and not as in (2.1). These two ways of writing the loglinearized consumption Euler equation and the aggregate resource constraint are equivalent for
this model. Furthermore, the use of the constant one is a simple trick which allows us to ensure
that the iid shocks are indeed exogenous. Regarding the AiM notation used in the Table, _NOTD
refers to not data, meaning that AiM treats the variable as an unobserved variable. Similarly,
_DTRM means that the variable is deterministic. For more information, see, e.g., Zagaglia (2005,
Section 4.1).
The AiM code in Table 1 is also found in the file AnSchorfheideModel.aim located in the
sub-directory example\AnSchorfheide. The file can be parsed by AiM and, as mentioned in
Section 3.4, this is handled by the function AiMInitialize. Similar aim files exist for the Lubik
and Schorfheide (2007a) and the Smets and Wouters (2007) examples in sub-directories of the
example directory in YADA.
17.2. Specification of the Prior Distribution
The file with the prior distribution data must be given by either a Lotus 1-2-3 spreadsheet (file
extension .wk1) or an Excel spreadsheet (extension .xls). The An and Schorfheide example
comes with a number of such prior distribution files, e.g., AnSchorfheidePrior.wk1.
The prior distribution file should list all the parameters that are going to be estimated. It
may also list parameters that are calibrated.110 The 7 required column headers in this file are
given by model parameter, status, initial value, prior type, prior parameter 1, prior
parameter 2, and lower bound. The entries under the lower bound header are in fact ignored
unless the prior distribution is gamma, inverted gamma, or left truncated normal. Furthermore,
all the headers are case insensitive in YADA.
YADA also supports two optional headers: upper bound and prior parameter 3. The upper
bound header is used for the beta distribution only. When YADA locates this header it will also
take the lower bound for beta distributed parameters into account. If the header is missing
YADA assumes that any beta distributed parameters have lower bound 0 and upper bound
1. The prior parameter 3 header is used by the Student-t distribution and should contain a positive integer, the number of degrees of freedom in (4.23). If this header is missing, then YADA will set the degrees of freedom parameter to unity for the Student-t prior.
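The header rules above can be summarized in a small, hypothetical Python checker. It mirrors only what the text states: the seven required headers are case insensitive, the lower bound is used only for gamma, inverted gamma and left truncated normal priors, beta priors default to the unit interval unless an upper bound header exists, and the Student-t degrees of freedom default to unity. The prior type labels are illustrative and need not match the exact strings YADA accepts.

```python
REQUIRED = ("model parameter", "status", "initial value", "prior type",
            "prior parameter 1", "prior parameter 2", "lower bound")


def check_prior_headers(headers):
    """Return which optional headers are present; raise if a required one is missing."""
    found = {h.strip().lower() for h in headers}      # headers are case insensitive
    missing = [h for h in REQUIRED if h not in found]
    if missing:
        raise ValueError("missing required headers: " + ", ".join(missing))
    return {"upper bound": "upper bound" in found,
            "prior parameter 3": "prior parameter 3" in found}


def effective_bounds(row, optional):
    """Bounds actually used for one parameter (row keys are lower-case headers).

    The prior type labels below are illustrative, not YADA's exact strings.
    """
    prior = row["prior type"].strip().lower()
    if prior == "beta":
        if optional["upper bound"]:
            return float(row["lower bound"]), float(row["upper bound"])
        return 0.0, 1.0                                # default support of the beta prior
    if prior in ("gamma", "invgamma", "left truncated normal"):
        return float(row["lower bound"]), None         # only a lower bound applies
    return None, None                                  # lower bound entry is ignored


def student_t_df(row, optional):
    """Degrees of freedom for a Student-t prior; unity if the header is absent."""
    return int(row["prior parameter 3"]) if optional["prior parameter 3"] else 1
```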
110
In Section 17.3, we discuss alternative and more flexible ways of specifying additional parameters that YADA
needs to know about in order to solve the DSGE model.
The measurement equation file is always checked for internal consistency before commencing
with, e.g., posterior mode estimation.
17.5. Reading Observed Data into YADA
To estimate with YADA, the data on $y_t = [Y_t\ \pi_t\ I_t]'$ and $x_t = 1$ need to be read from a file. Since the user may wish to transform the raw data prior to defining $y_t$ by, e.g., taking logs, YADA requires that the construction of data is handled in a Matlab m-file.
As an example, consider the data construction file DataConstFile.m that is located in the
sub-directory example\AnSchorfheide. It assumes that there are no inputs for the function.
The requirements on its setup concern the structuring of the output. Specifically, the data
construction file should return a structure, named e.g., StructureForData. The actual name of
the structure is not important as it is a local variable to the function. The fields of this structure,
however, have required names and setup. The matrix with observed data should appear as
StructureForData.Y.data. It should preferably be of dimension n T with n < T; if not,
YADA will take its transpose.
If you need to transform your data prior to using them for estimation, you can always do so in your data construction file. For instance, you may want to take natural logarithms of some of the variables in your data input file, you may wish to rescale some variables, take first differences, remove a linear trend, etc. All Matlab functions located on the matlabpath can be used for this purpose. It is important to note, however, that any files you instruct YADA to read data from should be specified in the data construction file with their full path. The reason is that YADA copies all the Matlab m-files that you specify on the DSGE Data tab (see Figure 6) to the directory tmp and executes them from there. That way, YADA avoids having to deal with temporary changes to the path. At the same time, a data file located in, e.g., the same directory as your data construction file will not be copied to the tmp directory. Hence, a command like
wk1read(['data\AnSchorfheideData.wk1']),
will not work unless you manually create a directory data below YADA's working directory and copy the file AnSchorfheideData.wk1 to this directory.
The working directory for YADA is always given by pwd, i.e., the directory where YADA.m is located. Hence, if you store your data file in a sub-directory of YADA's working directory, e.g., example\AnSchorfheide\data, you can use the command pwd to set the root for the path where your data file is located. In this case,
wk1read([pwd '\example\AnSchorfheide\data\AnSchorfheideData.wk1']).
When you exit YADA, all files found in the tmp directory are automatically deleted.
The names of the observed variables should appear in the field StructureForData.Y.names. It should be given as a cell array with n string elements. In DataConstFile.m, n = 3 while
StructureForData.Y.names = {'YGR' 'INFL' 'INT'}.
The data for any exogenous variables should be given by StructureForData.X.data, a matrix of dimension k × T, e.g., a vector with ones for a constant term. Similarly, the cell array StructureForData.X.names provides the names of these variables, e.g., being given by {'const'}. If the model has no exogenous variables, then these two fields should be empty.
Given that the model has exogenous variables, it is possible to add extra data on these variables to the entry StructureForData.X.extradata. This is an optional entry that, if used, should either be an empty matrix or a $k \times T_h$ matrix. YADA views this data on the exogenous variables as having been observed after the data in StructureForData.X.data. Hence, the extra data can be used in, for instance, out-of-sample forecasting exercises where it will be regarded as observations $T+1$ until $T+T_h$.
Next, the field StructureForData.sample should contain a 4-dimensional vector with entries giving the start year, start period, end year and end period. For instance,
StructureForData.sample = [1980 1 2004 4].
This sample data refers to the data in the matrix StructureForData.Y.data for the observed
variables and the matrix StructureForData.X.data for the exogenous variables. The sample
used for estimation can be changed on the Settings tab in YADA.
YADA requires that the data frequency is specified as a string. Valid string entries are
'quarterly', 'monthly', and 'annual'. The first letter of each of these strings is also permitted. The
name of the field for the data frequency is simply StructureForData.frequency.
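Collecting the fields discussed so far, a minimal data construction file might look as follows. This is a sketch with random placeholder data rather than the contents of DataConstFile.m, and it assumes that StructureForData.Y.data is stored as an $n \times T$ matrix, in line with the $k \times T$ convention for StructureForData.X.data. The last lines check that the quarterly sample 1980:1 to 2004:4 covers exactly $T = 100$ observations.

```matlab
% Sketch of the core fields of a data construction file (placeholder data).
T = 100;                                    % number of observations
StructureForData.Y.data = randn(3, T);      % n x T observed variables (assumed layout)
StructureForData.Y.names = {'YGR' 'INFL' 'INT'};

StructureForData.X.data = ones(1, T);       % k x T, here a constant term
StructureForData.X.names = {'const'};
StructureForData.X.extradata = ones(1, 8);  % optional k x Th, periods T+1..T+Th

% Start year, start period, end year, end period.
StructureForData.sample = [1980 1 2004 4];
StructureForData.frequency = 'quarterly';   % or 'q'

% Number of quarters implied by the sample field.
s = StructureForData.sample;
nObs = (s(3) - s(1))*4 + (s(4) - s(2)) + 1;
if nObs ~= size(StructureForData.Y.data, 2)
    error('Sample length does not match the number of observations.');
end
```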
and that one initial value is given for YGR. Data for the variable TREND are stored in x. Since one
observation is prepended to the vector of data for YGR, the dimension of the data for TREND must
match the dimension for YGR, i.e., $T_x = T+1$. This means that x is a $1 \times (T+1)$ vector with
data on TREND. For instance, this may be the vector $[0\ 1\ \cdots\ T]$.
Suppose we instead consider the following transformation function:
'YGR-diff(N)+0.2*TREND'
and that the variable N is located in the first row of x and TREND in the second row. In this case,
N needs to have one more element than YGR once initial values for the latter have been taken into
account, since diff shortens a vector by one element. At the same time, TREND should have the same
number of elements as YGR. By letting the first element in the second row of x be NaN (while the
first element of the first row is a real number), this transformation can be achieved.
Since the inline function in Matlab, when executed with one input argument, creates a function whose input arguments are ordered as the variables appear
in the string provided to inline, YADA requires that the variable to be transformed appears first in the function string, followed by all the additional variables. The ordering of the
additional variables is assumed to match the ordering of these variables in the matrix x.
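The sketch below illustrates both rules for the transformation 'YGR-diff(N)+0.2*TREND'. It is not YADA's internal code: it builds the function with str2func rather than inline (which is not recommended in recent Matlab releases), lists the arguments explicitly with the transformed variable first and the rows of x in order, and assumes that the NaN entries of each row of x are dropped before the function is evaluated. All data are random placeholders.

```matlab
% Hypothetical illustration of the transformation 'YGR-diff(N)+0.2*TREND'.
Ty = 8;                               % elements of YGR, incl. initial values
YGR = randn(1, Ty);                   % placeholder data for YGR
N = cumsum(randn(1, Ty + 1));         % N needs one more element than YGR
TREND = 1:Ty;                         % TREND has as many elements as YGR
x = [N; NaN TREND];                   % NaN pads the start of the TREND row

% Variable to be transformed first, then the rows of x in their order.
expr = 'YGR-diff(N)+0.2*TREND';
fcn = str2func(['@(YGR,N,TREND) ' expr]);

% Assumption: NaN entries in each row of x are dropped before evaluation.
Nrow = x(1, ~isnan(x(1, :)));
TRENDrow = x(2, ~isnan(x(2, :)));
z = fcn(YGR, Nrow, TRENDrow);         % 1 x Ty vector
```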
Assuming that the transformation has been performed successfully, YADA checks if the dimension of the transformed variable is greater than that of the original variable. Should this be
the case, data at the beginning of the variable created by the transformation are removed such
that the dimensions match. If instead the dimension of the created variable is smaller
than that of the original, then YADA assumes that the transformation used up data at the beginning
of the sample. This would, for example, be the case if the transformation uses the
diff function and no initial values are made available.
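The dimension check can be sketched as follows. Here diff is applied without initial values, so the transformed series is one observation short. How YADA stores the lost observations is not stated above; padding them with NaN at the beginning is an assumption made only for this illustration.

```matlab
% Sketch of the dimension check described above (not YADA's own code).
y = 1:10;                  % original variable, 10 observations
z = diff(y);               % transformation without initial values: 9 obs
ny = numel(y);
nz = numel(z);
if nz > ny
    % Too many observations: drop the surplus at the beginning.
    z = z(nz-ny+1:end);
    lost = 0;
else
    % Too few: the transformation used up data at the beginning.
    lost = ny - nz;
    z = [NaN(1, lost) z];  % assumption: mark lost periods with NaN
end
```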
The field invertfcn holds a string that describes how the function in the field fcn should
be inverted. In analogy with the fcn field, the fields invertinitial and invertx hold initial
values and data on all additional variables that are needed by the inversion function. Similarly,
the function for exporting data is stored in the field exportfcn. Likewise, the information about
initial values and additional variables required for the export transformation is located in the
fields exportinitial and exportx. Finally, the fields exporttitle and exportname hold strings
that make it possible to use a different name for the variable in the file with exported
data. The exporttitle string is written to the line above the exportname string.
The following example for GDP growth is found in the file DataConstFile.m:
% Quarterly growth rate of GDP in percent and its first-order partial.
StructureForData.Y.transformation.YGR.fcn = '100*(exp(YGR/100)-1)';
StructureForData.Y.transformation.YGR.partial = 'exp(YGR/100).*PartYGR';
% Annual growth rate from four consecutive quarters of YGR.
StructureForData.Y.transformation.YGR.annualizefcn = ...
  ['100*(exp((1/100)*(YGR(4:length(YGR))+YGR(3:length(YGR)-1)+' ...
   'YGR(2:length(YGR)-2)+YGR(1:length(YGR)-3)))-1)'];
StructureForData.Y.transformation.YGR.annualizepartial = ...
  ['exp((1/100)*(YGR(4:length(YGR))+YGR(3:length(YGR)-1)+' ...
   'YGR(2:length(YGR)-2)+YGR(1:length(YGR)-3))).*' ...
   '(PartYGR(4:length(PartYGR))+PartYGR(3:length(PartYGR)-1)+' ...
   'PartYGR(2:length(PartYGR)-2)+PartYGR(1:length(PartYGR)-3))'];
% No initial values or additional variables are needed by fcn.
StructureForData.Y.transformation.YGR.initial = [];
StructureForData.Y.transformation.YGR.x = [];
% Inverse of fcn.
StructureForData.Y.transformation.YGR.invertfcn = ...
  '100*log(1+(YGR/100))';
StructureForData.Y.transformation.YGR.invertinitial = [];
StructureForData.Y.transformation.YGR.invertx = [];
% Levels of real GDP for export, based on the initial value in exportinitial.
StructureForData.Y.transformation.YGR.exportfcn = ...
  '100*(exp(cumsum(YGR)/100))';
StructureForData.Y.transformation.YGR.exportinitial = 0;
StructureForData.Y.transformation.YGR.exportx = [];
StructureForData.Y.transformation.YGR.exporttitle = 'Real GDP';
StructureForData.Y.transformation.YGR.exportname = 'DY';
Since YGR is the log first difference of GDP, the general function in the fcn field calculates the
(quarterly) growth rate of GDP. The annualization function similarly provides the annual growth
rate of GDP; no initial data are supplied and the computations do not need any additional
variables. The field partial (annualizepartial) is the first-order partial derivative of the fcn
(annualizefcn) function times a part of the variable that is transformed. This function is used
when the transformation of the variable is applied to a linear decomposition of the variable,
such as the observed variable decomposition in Section 11.8.
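The relationship between fcn and partial can be verified numerically: for a small step h, the change in fcn when YGR moves in the direction of PartYGR, divided by h, should be close to the partial expression. The sketch below does this with placeholder numbers and function handles built with str2func (an illustrative choice, not necessarily how YADA evaluates the strings), and also evaluates annualizefcn, which returns length(YGR)-3 values.

```matlab
% Numerical check that 'partial' is the derivative of 'fcn' times the part.
fcn = str2func('@(YGR) 100*(exp(YGR/100)-1)');
partial = str2func('@(YGR,PartYGR) exp(YGR/100).*PartYGR');

YGR = [0.5 0.8 -0.2 0.6];        % placeholder values of YGR
PartYGR = [0.1 -0.3 0.2 0.05];   % placeholder part, e.g. one shock's contribution
h = 1e-6;
fd = (fcn(YGR + h*PartYGR) - fcn(YGR)) / h;   % finite-difference derivative
maxErr = max(abs(fd - partial(YGR, PartYGR)));

% annualizefcn sums four consecutive quarters, so it returns
% length(YGR)-3 values (a single value here).
annualizefcn = str2func(['@(YGR) 100*(exp((1/100)*(YGR(4:length(YGR))+' ...
    'YGR(3:length(YGR)-1)+YGR(2:length(YGR)-2)+YGR(1:length(YGR)-3)))-1)']);
annualGrowth = annualizefcn(YGR);
```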
It is also possible to transform the observed variables via linear combinations of the transformations performed by the above setup. With three observed variables, the following defines
such a transformation matrix:
StructureForData.Y.TransMatrix = [1 0 0;0 1 0;0 -1 1];
The linear combinations of the individually transformed variables are therefore the first, the
second, and the third minus the second (the real rate). Notice that the TransMatrix matrix is
only applied to the vector of observed variables after the individual transformation functions
under the field fcn have been utilized. This implies that linear transformations of the observed
variables can only be performed when the transformation functions have been properly set up
for all observed variables.
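As a small numerical illustration with placeholder values, premultiplying the $n \times T$ matrix of individually transformed variables by TransMatrix leaves YGR and INFL unchanged and replaces INT by the real rate, INT minus INFL.

```matlab
% Apply TransMatrix to individually transformed YGR, INFL and INT.
TransMatrix = [1 0 0; 0 1 0; 0 -1 1];
Yfcn = [0.6 0.7;                  % YGR after fcn (placeholder values)
        2.5 2.7;                  % INFL after fcn
        4.0 4.5];                 % INT after fcn
Ylin = TransMatrix * Yfcn;        % third row: INT minus INFL (real rate)
```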
The field invertfcn inverts the calculation in fcn, while the field exportfcn gives the expression for calculating the levels data for YGR based on a constant initial value for the variable.
The variables INFL and INT have their own transformation functions; see DataConstFile.m for
details.
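The sketch below checks with placeholder numbers that invertfcn undoes fcn and shows what exportfcn produces. It assumes, for illustration only, that the value in exportinitial is prepended to YGR before exportfcn is evaluated, so that the exported series starts at 100 and its log differences (times 100) reproduce YGR.

```matlab
% Round-trip check of fcn and invertfcn, and a levels series from exportfcn.
fcn = str2func('@(YGR) 100*(exp(YGR/100)-1)');
invertfcn = str2func('@(YGR) 100*log(1+(YGR/100))');
exportfcn = str2func('@(YGR) 100*(exp(cumsum(YGR)/100))');

YGR = [0.5 0.8 -0.2 0.6];        % placeholder values of YGR
growth = fcn(YGR);               % quarterly growth rates in percent
recovered = invertfcn(growth);   % should reproduce YGR

exportinitial = 0;
levelsIndex = exportfcn([exportinitial YGR]);   % assumed: initial value prepended
```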
17.5.2. Levels or First Differences
To inform YADA about which variables appear in levels (like the interest rate) and which appear in first differences (output and inflation), the field levels should contain a vector whose
elements are either 0 (first difference) or 1 (level). In our example this means that
StructureForData.Y.levels = [0 0 1].
This information makes it possible for YADA to compute, e.g., the levels responses in all observed variables to a certain economic shock. If the levels field is missing from the data construction file, then YADA displays a message box to remind you. The file is, however, regarded
as valid, with all observed variables in levels. Once you add the field to the data construction
file, YADA will stop nagging you about the missing information.
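To see why the levels field matters, consider impulse responses: for a variable measured in first differences, the response of its level is the cumulative sum of the responses over the horizons, whereas a variable in levels needs no adjustment. The sketch uses hypothetical response values and is meant to illustrate the idea, not to reproduce YADA's code.

```matlab
% Cumulate responses of first-differenced variables to obtain level responses.
levels = [0 0 1];                 % YGR and INFL in first differences, INT in levels
irf = [0.5 0.3 0.1  0.0;          % hypothetical responses, n x horizons
       0.2 0.1 0.05 0.0;
       0.4 0.2 0.1  0.05];
levelIRF = irf;
isDiff = (levels == 0);
levelIRF(isDiff, :) = cumsum(irf(isDiff, :), 2);   % sum over horizons
```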
17.5.3. Simple Annualization
Furthermore, to tell YADA how to annualize observed variables, the field annual should contain a vector whose elements are either 0 (do not annualize, e.g., because the variable is already annualized) or 1 (annualize). YADA annualizes a variable by adding to the current value the previous 3 (11) observations for quarterly (monthly) data. For instance, the YGR variable measures quarterly log
GDP (per capita) growth. Summing this variable over 4 consecutive quarters thus gives the
annual log GDP (per capita) growth. The inflation and interest rate variables are already
assumed to be measured in annual terms. This means that we set:
StructureForData.Y.annual = [1 0 0].
The data frequency is specified through the field StructureForData.frequency. It may be noted that the annual field
is optional, with the default being a zero vector, and that YADA does not display a nag screen
when this field is missing.
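The simple annualization rule can be sketched with filter, which here forms the sum of the current and the previous 3 quarterly values. The data are placeholders, and setting the first 3 (incomplete) sums to NaN is an assumption of this illustration rather than documented YADA behavior.

```matlab
% Simple annualization of quarterly data: current value plus previous 3.
annual = [1 0 0];                 % annualize YGR only
Y = [0.5 0.8 -0.2 0.6 0.4;        % YGR (placeholder quarterly values)
     2.0 2.1 1.9 2.2 2.0;         % INFL (already annual)
     4.0 4.1 4.0 3.9 4.2];        % INT (already annual)
nSum = 4;                          % 4 for quarterly data, 12 for monthly data
Yann = Y;
for i = find(annual)
    s = filter(ones(1, nSum), 1, Y(i, :));   % running sum over nSum periods
    s(1:nSum-1) = NaN;                        % incomplete sums (assumption)
    Yann(i, :) = s;
end
```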
In some situations you may want the annualized data to be multiplied by a constant. For example, the inflation equation in (2.2) annualizes quarter-to-quarter inflation by
multiplying the quarterly price changes by 4. The annualized inflation series can therefore be
calculated by adding four consecutive quarters of $\pi_t$ and dividing the sum by 4. For YADA to
compute such an annualized inflation series, the field annualscale should be specified. Here, we
may set
conditional forecasting for periods $T+1$ and onwards until $T_z$. Furthermore, if Z has some
NaN values, then the corresponding conditioning assumptions are skipped by YADA. The subfield
names should hold a cell array with length equal to m, the number of variables to condition on.
Note that you may select a subset of these variables through the YADA GUI.
Second, the field Z has two subfields that contain the matrices $K_1$ and $K_2$ that are needed to map
the observed variables into the conditioning information; cf. equation (12.6). The first matrix is
required to have full rank m, while the second can be a zero matrix and can also be 3D, where
the third dimension covers the j-dimension for $K_{2j}$. Their common first and second dimensions
should be $n \times m$, where n as before is the number of observed variables and m is the number of
conditioning variables in StructureForData.Z.data.
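The dimension requirements on $K_1$ and $K_2$ can be expressed as a few checks. The sketch below is hypothetical: it uses plain variables K1 and K2 rather than the actual subfield names, and an example where the single conditioning variable is linked to the interest rate (the third observed variable), with two zero matrices stacked along the third dimension of K2.

```matlab
% Hypothetical checks of the dimensions of K1 and K2.
n = 3;                         % number of observed variables
m = 1;                         % number of conditioning variables
K1 = [0; 0; 1];                % n x m, e.g. a unit vector for INT
K2 = zeros(n, m, 2);           % n x m x 2, here zero matrices

if ~isequal(size(K1), [n m])
    error('K1 must be n x m.');
end
if rank(K1) ~= m
    error('K1 must have full rank m.');
end
if size(K2, 1) ~= n || size(K2, 2) ~= m
    error('Each K2j must be n x m.');
end
```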
Since the mapping from the observed variables to the conditioning variables may require
an initial value, YADA also needs data in the field U of the structure StructureForData. This
field should contain the subfield data that provides an $m \times T_u$ matrix with initial conditions
for the mapping; cf. (12.6) for details. Again, YADA assumes that the first observation in this
matrix comes from the same time period as the first observation in the matrix with observed
variables. These initial conditions can be used provided that $T_u$ is at least equal to the last
period used for estimation of the parameters in the DSGE model. Finally, the conditioning
data can also be linked with transformation functions, and the logic is the same as for
the transformation functions for the observed variable scenario data, i.e., through the field
StructureForData.Z.transformation.
17.5.7. Percentiles for Distributions
It is possible to control the percentiles that are used for plotting confidence bands for certain
distributions. This is currently handled by StructureForData.percentiles. This optional
entry should be a vector with integer values greater than 0 and less than 100. If there are
at least 2 valid entries, YADA will make use of them. For instance, we may write
StructureForData.percentiles = [85 50 15 5 95].
Unless the values are already in ascending order, YADA will sort them. Moreover, if
the vector has an odd number of entries, then the middle entry is ignored. In the above example,
YADA would thus treat the 5-dimensional vector as equal to
StructureForData.percentiles = [5 15 85 95].
YADA then assumes that the first and the last elements may be used to construct the outer
confidence band, while the second and the third are used for an inner confidence band.
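The processing of the percentiles entry can be sketched as below. It reproduces the example in the text, but it is an illustration of the stated rules, not YADA's own code.

```matlab
% Sketch of the percentile rules described above.
p = [85 50 15 5 95];

% Keep integer values strictly between 0 and 100.
p = p(p > 0 & p < 100 & p == round(p));

p = sort(p);                          % ascending order
if mod(numel(p), 2) == 1
    p((numel(p) + 1)/2) = [];         % ignore the middle entry
end

outerBand = [p(1) p(end)];            % first and last elements
if numel(p) >= 4
    innerBand = [p(2) p(3)];          % second and third elements
else
    innerBand = [];                   % no inner band with only 2 entries
end
```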
References
Abramowitz, M. and Stegun, I. A. (1964), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, New York.
Adolfson, M., Anderson, M. K., Lindé, J., Villani, M., and Vredin, A. (2007a), Modern Forecasting Models in Action: Improving Macroeconomic Analyses at Central Banks, International Journal of Central Banking, December, 111-144.
Adolfson, M., Laséen, S., Lindé, J., and Svensson, L. E. O. (2008a), Optimal Monetary Policy in an Operational Medium-Sized DSGE Model, Sveriges Riksbank Working Paper Series No. 225.
Adolfson, M., Laséen, S., Lindé, J., and Svensson, L. E. O. (2011), Optimal Monetary Policy in an Operational Medium-Sized DSGE Model, Journal of Money, Credit, and Banking, 43, 1287-1331.
Adolfson, M., Laséen, S., Lindé, J., and Villani, M. (2005), Are Constant Interest Rate Forecasts Modest Policy Interventions? Evidence from a Dynamic Open-Economy Model, International Finance, 8, 509-544.
Adolfson, M., Laséen, S., Lindé, J., and Villani, M. (2007b), Bayesian Estimation of an Open Economy DSGE Model with Incomplete Pass-Through, Journal of International Economics, 72, 481-511.
Adolfson, M., Laséen, S., Lindé, J., and Villani, M. (2008b), Evaluating an Estimated New Keynesian Small Open Economy Model, Journal of Economic Dynamics and Control, 32, 2690-2721.
Adolfson, M., Lindé, J., and Villani, M. (2007c), Bayesian Analysis of DSGE Models - Some Comments, Econometric Reviews, 26, 173-185.
Adolfson, M., Lindé, J., and Villani, M. (2007d), Forecasting Performance of an Open Economy DSGE Model, Econometric Reviews, 26, 289-328.
Aknouche, A. and Hamdi, F. (2007), Periodic Chandrasekhar Recursions, arXiv:0711.3857v1.
An, S. and Schorfheide, F. (2007), Bayesian Analysis of DSGE Models, Econometric Reviews, 26, 113-172, with discussion, p. 173-219.
Anderson, B. D. O. and Moore, J. M. (1979), Optimal Filtering, Prentice-Hall, Englewood Cliffs.
Anderson, E. W., Hansen, L. P., McGrattan, E. R., and Sargent, T. J. (1996), Mechanics of Forming and Estimating Dynamic Linear Economies, in H. M. Amman, D. A. Kendrick, and J. Rust (Editors), Handbook of Computational Economics, 171-252, Elsevier.
Anderson, G. and Moore, G. (1983), An Efficient Procedure for Solving Linear Perfect Foresight Models, Mimeo, Board of Governors of the Federal Reserve System.
Anderson, G. and Moore, G. (1985), A Linear Algebraic Procedure for Solving Linear Perfect Foresight Models, Economics Letters, 17, 247-252.
Anderson, G. S. (1999), The Anderson-Moore Algorithm: A MATLAB Implementation, Mimeo, Board of Governors of the Federal Reserve System.
Anderson, G. S. (2008), Solving Linear Rational Expectations Models: A Horse Race, Computational Economics, 31, 95-113.
Anderson, G. S. (2010), A Reliable and Computationally Efficient Algorithm for Imposing the Saddle Point Property in Dynamic Models, Journal of Economic Dynamics and Control, 34, 472-489.
Andrle, M. (2009), A Note on Identification Patterns in DSGE Models, Manuscript, Czech National Bank.
Andrle, M. (2010), Correlation Decompositions, Manuscript, IMF.
Ansley, C. F. and Kohn, R. (1982), A Geometrical Derivation of the Fixed Smoothing Algorithm, Biometrika, 69, 486-487.
Ansley, C. F. and Kohn, R. (1985), Estimation, Filtering, and Smoothing in State Space Models with Incompletely Specified Initial Conditions, The Annals of Statistics, 13, 1286-1316.
Ansley, C. F. and Kohn, R. (1990), Filtering and Smoothing in State Space Models with Partially Diffuse Initial Conditions, Journal of Time Series Analysis, 11, 275-293.
Arnold, III., W. F. and Laub, A. J. (1984), Generalized Eigenproblem Algorithms and Software for Algebraic Riccati Equations, Proceedings of the IEEE, 72, 1746-1754.
Balakrishnan, N. and Leung, M. Y. (1988), Order Statistics from the Type I Generalized Logistic Distribution, Communications in Statistics - Simulation and Computation, 17, 25-50.
Bartlett, M. S. (1957), A Comment on D. V. Lindley's Statistical Paradox, Biometrika, 44, 533-534.
Bauwens, L., Lubrano, M., and Richard, J. F. (1999), Bayesian Inference in Dynamic Econometric Models, Oxford University Press, Oxford.
Bernardo, J. M. (1976), Psi (Digamma) Function, Applied Statistics, 25, 315-317, algorithm AS 103.
Bernardo, J. M. (1979), Expected Information as Expected Utility, The Annals of Statistics, 7, 686-690.
Bernardo, J. M. and Smith, A. F. M. (2000), Bayesian Theory, John Wiley, Chichester.
Beyer, A. and Farmer, R. E. A. (2004), On the Indeterminacy of New-Keynesian Economics, ECB Working Paper Series No. 323.
Beyer, A. and Farmer, R. E. A. (2007), Testing for Indeterminacy: An Application to U.S. Monetary Policy: Comment, American Economic Review, 97, 524-529.
Blanchard, O. J. and Kahn, C. M. (1980), The Solution of Linear Difference Models under Rational Expectations, Econometrica, 48, 1305-1311.
Bonaldi, J. P. (2010), Identification Problems in the Solution of Linearized DSGE Models, Borradores de Economia 593, Banco de la Republica de Colombia.
Box, G. E. P. (1980), Sampling and Bayes Inference in Scientific Modelling and Robustness, Journal of the Royal Statistical Society Series A, 143, 383-430.
Brooks, S. P. and Gelman, A. (1998), General Methods for Monitoring Convergence of Iterative Simulations, Journal of Computational and Graphical Statistics, 7, 434-455.
Burmeister, E. (1980), On Some Conceptual Issues in Rational Expectations Modeling, Journal of Money, Credit, and Banking, 12, 800-816.
Calvo, G. A. (1983), Staggered Prices in a Utility-Maximizing Framework, Journal of Monetary Economics, 12, 383-398.
Canova, F. and Sala, L. (2009), Back to Square One: Identification Issues in DSGE Models, Journal of Monetary Economics, 56, 431-449.
Carter, C. K. and Kohn, R. (1994), On Gibbs Sampling for State Space Models, Biometrika, 81, 541-553.
Casella, G. and George, E. I. (1992), Explaining the Gibbs Sampler, The American Statistician, 46, 167-174.
Chan, J. C. C. and Eisenstat, E. (2012), Marginal Likelihood Estimation with the Cross-Entropy Method, Manuscript, Centre for Applied Macroeconomic Analysis, Australian National University.
Chen, M.-H. and Shao, Q.-M. (1999), Monte Carlo Estimation of Bayesian Credible and HPD Intervals, Journal of Computational and Graphical Statistics, 8, 69-92.
Chib, S. (1995), Marginal Likelihood from the Gibbs Output, Journal of the American Statistical Association, 90, 1313-1321.
Chib, S. and Greenberg, E. (1995), Understanding the Metropolis-Hastings Algorithm, The American Statistician, 49, 327-335.
Chib, S. and Jeliazkov, I. (2001), Marginal Likelihood from the Metropolis-Hastings Output, Journal of the American Statistical Association, 96, 270-281.
Christiano, L. J. (2002), Solving Dynamic Equilibrium Models by a Method of Undetermined Coefficients, Computational Economics, 20, 21-55.
Christiano, L. J., Eichenbaum, M., and Evans, C. (2005), Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy, Journal of Political Economy, 113, 1-45.
Christiano, L. J. and Vigfusson, R. J. (2003), Maximum Likelihood in the Frequency Domain: The Importance of Time-to-Plan, Journal of Monetary Economics, 50, 789-815.
Christoffel, K., Coenen, G., and Warne, A. (2008), The New Area-Wide Model of the Euro Area: A Micro-Founded Open-Economy Model for Forecasting and Policy Analysis, ECB Working Paper Series No. 944.
Christoffel, K., Coenen, G., and Warne, A. (2011), Forecasting with DSGE Models, in M. P. Clements and D. F. Hendry (Editors), The Oxford Handbook of Economic Forecasting, 89-127, Oxford University Press, New York.
Clarida, R., Galí, J., and Gertler, M. (2000), Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory, Quarterly Journal of Economics, 115, 147-180.
Connor, R. J. and Mosimann, J. E. (1969), Concepts of Independence for Proportions with a Generalization of the Dirichlet Distribution, Journal of the American Statistical Association, 64, 194-206.
Consolo, A., Favero, C. A., and Paccagnini, A. (2009), On the Statistical Identification of DSGE Models, Journal of Econometrics, 150, 99-115.
Cowles, M. K. and Carlin, B. P. (1996), Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review, Journal of the American Statistical Association, 91, 883-904.
Croushore, D. (2011a), Forecasting with Real-Time Data Vintages, in M. P. Clements and D. F. Hendry (Editors), The Oxford Handbook of Economic Forecasting, 247-267, Oxford University Press, New York.
Croushore, D. (2011b), Frontiers of Real-Time Data Analysis, Journal of Economic Literature, 49, 72-100.
Dawid, A. P. (1984), Statistical Theory: The Prequential Approach, Journal of the Royal Statistical Society, Series A, 147, 278-292.
De Jong, P. (1988), A Cross-Validation Filter for Time Series Models, Biometrika, 75, 594-600.
De Jong, P. (1989), Smoothing and Interpolation with the State-Space Model, Journal of the American Statistical Association, 84, 1085-1088.
De Jong, P. (1991), The Diffuse Kalman Filter, The Annals of Statistics, 19, 1073-1083.
De Jong, P. and Chu-Chun-Lin, S. (2003), Smoothing with an Unknown Initial Condition, Journal of Time Series Analysis, 24, 141-148.
De Jong, P. and Shephard, N. (1995), The Simulation Smoother for Time Series Models, Biometrika, 82, 339-350.
Deistler, M., Dunsmuir, W., and Hannan, E. J. (1978), Vector Linear Time Series Models: Corrections and Extensions, Advances in Applied Probability, 10, 360-372.
DeJong, D. N., Ingram, B. F., and Whiteman, C. H. (2000), A Bayesian Approach to Dynamic Macroeconomics, Journal of Econometrics, 98, 203-223.
Del Negro, M. and Eusepi, S. (2011), Fitting Observed Inflation Expectations, Journal of Economic Dynamics and Control, 35, 2105-2131.
Del Negro, M. and Schorfheide, F. (2004), Priors from General Equilibrium Models, International Economic Review, 45, 643-673.
Del Negro, M. and Schorfheide, F. (2006), How Good Is What You've Got? DSGE-VAR as a Toolkit for Evaluating DSGE Models, Federal Reserve Bank of Atlanta Economic Review, 91, 21-37.
Del Negro, M. and Schorfheide, F. (2009), Monetary Policy Analysis with Potentially Misspecified Models, American Economic Review, 99, 1415-1450.
Del Negro, M. and Schorfheide, F. (2012), DSGE Model-Based Forecasting, Prepared for Handbook of Economic Forecasting, Volume 2.
Del Negro, M., Schorfheide, F., Smets, F., and Wouters, R. (2007), On the Fit of New-Keynesian Models, Journal of Business & Economic Statistics, 25, 123-143, with discussion, p. 143-162.
Dunsmuir, W. (1979), A Central Limit Theorem for Parameter Estimation in Stationary Vector Time Series and Its Application to Models for a Signal Observed with Noise, The Annals of Statistics, 7, 490-605.
Dunsmuir, W. and Hannan, E. J. (1976), Vector Linear Time Series Models, Advances in Applied Probability, 8, 339-364.
Durbin, J. and Koopman, S. J. (1997), Monte Carlo Maximum Likelihood Estimation in Non-Gaussian State Space Models, Biometrika, 84, 669-684.
Durbin, J. and Koopman, S. J. (2002), A Simple and Efficient Simulation Smoother for State Space Time Series Analysis, Biometrika, 89, 603-615.
Durbin, J. and Koopman, S. J. (2012), Time Series Analysis by State Space Methods, Oxford University Press, Oxford, 2nd edition.
Efron, B. (1986), Why Isn't Everyone a Bayesian? The American Statistician, 40, 1-5, with discussion, p. 5-11.
Eklund, J. and Karlsson, S. (2007), Forecast Combinations and Model Averaging using Predictive Measures, Econometric Reviews, 26, 329-363.
Fang, K.-T., Kotz, S., and Ng, K. W. (1990), Symmetric Multivariate and Related Distributions, Chapman & Hall, London.
Fasano, G. and Franceschini, A. (1987), A Multidimensional Version of the Kolmogorov-Smirnov Test, Monthly Notices of the Royal Astronomical Society, 225, 115-170.
Fernández-Villaverde, J. (2010), The Econometrics of DSGE Models, SERIEs, 1, 3-49.
Fernández-Villaverde, J. and Rubio-Ramírez, J. F. (2004), Comparing Dynamic Equilibrium Models to Data: A Bayesian Approach, Journal of Econometrics, 123, 153-187.
Fernández-Villaverde, J. and Rubio-Ramírez, J. F. (2005), Estimating Dynamic Equilibrium Economies: Linear versus Non-Linear Likelihood, Journal of Applied Econometrics, 20, 891-910.
Fernández-Villaverde, J., Rubio-Ramírez, J. F., Sargent, T. J., and Watson, M. W. (2007), ABCs (and Ds) of Understanding VARs, American Economic Review, 97, 1021-1026.
Fishburn, P. C. (1977), Mean-Risk Analysis with Risk Associated with Below-Target Returns, American Economic Review, 67, 116-126.
Friel, N. and Pettitt, A. N. (2008), Marginal Likelihood Estimation via Power Posteriors, Journal of the Royal Statistical Society Series B, 70, 589-607.
Frühwirth-Schnatter, S. (1994), Data Augmentation and Dynamic Linear Models, Journal of Time Series Analysis, 15, 183-202.
Frühwirth-Schnatter, S. (2004), Estimating Marginal Likelihoods for Mixture and Markov Switching Models Using Bridge Sampling Techniques, Econometrics Journal, 7, 143-167.
Fuller, W. A. (1976), Introduction to Statistical Time Series, John Wiley, New York.
Galí, J. (1999), Technology, Employment, and the Business Cycle: Do Technology Shocks Explain Aggregate Fluctuations? American Economic Review, 89, 249-271.
Galí, J. and Gertler, M. (2007), Macroeconomic Modeling for Monetary Policy Evaluation, Journal of Economic Perspectives, 21(4), 25-45.
Galí, J. and Monacelli, T. (2005), Monetary Policy and Exchange Rate Volatility in a Small Open Economy, Review of Economic Studies, 72, 707-734.
Gelfand, A. and Dey, D. (1994), Bayesian Model Choice: Asymptotics and Exact Calculations, Journal of the Royal Statistical Society Series B, 56, 501-514.
Gelfand, A. E. and Smith, A. F. M. (1990), Sampling Based Approaches to Calculating Marginal Densities, Journal of the American Statistical Association, 85, 398-409.
Gelman, A. (1996), Inference and Monitoring Convergence, in W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (Editors), Markov Chain Monte Carlo in Practice, 131-143, Chapman & Hall/CRC, Boca Raton.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004), Bayesian Data Analysis, Chapman & Hall/CRC, Boca Raton, 2nd edition.
Gelman, A., Roberts, G. O., and Gilks, W. R. (1996), Efficient Metropolis Jumping Rules, in J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith (Editors), Bayesian Statistics 5, 599-607, Oxford University Press, Oxford.
Gelman, A. and Rubin, D. B. (1992), Inference from Iterative Simulations Using Multiple Sequences, Statistical Science, 7, 457-511.
Geman, S. and Geman, D. (1984), Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
Geweke, J. (1988), Antithetic Acceleration of Monte Carlo Integration in Bayesian Inference, Journal of Econometrics, 38, 73-89.
Geweke, J. (1999), Using Simulation Methods for Bayesian Econometric Models: Inference, Development, and Communication, Econometric Reviews, 18, 1-73.
Geweke, J. (2005), Contemporary Bayesian Econometrics and Statistics, John Wiley, Hoboken.
Geweke, J. (2007), Bayesian Model Comparison and Validation, American Economic Review, 97(2), 60-64.
Geweke, J. (2010), Complete and Incomplete Econometric Models, Princeton University Press, Princeton.
Geweke, J. and Amisano, G. (2010), Comparing and Evaluating Bayesian Predictive Distributions of Asset Returns, International Journal of Forecasting, 26, 216-230.
Geweke, J. and Amisano, G. (2011), Optimal Prediction Pools, Journal of Econometrics, 164, 130-141.
Geweke, J. and Amisano, G. (2012), Prediction and Misspecified Models, American Economic Review, 102, 482-486.
Geweke, J. and Whiteman, C. H. (2006), Bayesian Forecasting, in G. Elliott, C. W. Granger, and A. Timmermann (Editors), Handbook of Economic Forecasting, 3-80, North Holland, Amsterdam, volume 1.
Geyer, C. J. (1992), Practical Markov Chain Monte Carlo, Statistical Science, 7, 473-511.
Giordani, P., Pitt, M., and Kohn, R. (2011), Bayesian Inference for Time Series State Space Models, in J. Geweke, G. Koop, and H. van Dijk (Editors), The Oxford Handbook of Bayesian Econometrics, 61-124, Oxford University Press, New York.
Gneiting, T., Balabdaoui, F., and Raftery, A. E. (2007), Probabilistic Forecasts, Calibration and Sharpness, Journal of the Royal Statistical Society Series B, 69, 243-268.
Gneiting, T. and Raftery, A. E. (2007), Strictly Proper Scoring Rules, Prediction, and Estimation, Journal of the American Statistical Association, 102, 359-378.
Golub, G. H. and van Loan, C. F. (1983), Matrix Computations, Johns Hopkins University Press, Baltimore, 1st edition.
Gómez, V. (2006), Wiener-Kolmogorov Filtering and Smoothing for Multivariate Series with State-Space Structure, Journal of Time Series Analysis, 28, 361-385.
Good, I. J. (1952), Rational Decisions, Journal of the Royal Statistical Society Series B, 14, 107-114.
Gouriéroux, C., Monfort, A., and Renault, E. (1993), Indirect Inference, Journal of Applied Econometrics, 8, S85-S188.
Hamilton, J. D. (1994), Time Series Analysis, Princeton University Press, Princeton.
Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge.
Harvey, A. C. and Pierse, R. G. (1984), Estimating Missing Observations in Economic Time Series, Journal of the American Statistical Association, 79, 125-131.
Hastings, W. K. (1970), Monte Carlo Sampling Methods Using Markov Chains and Their Applications, Biometrika, 57, 97-109.
Havil, J. (2003), Gamma: Exploring Euler's Constant, Princeton Science Library, Princeton, N. J.
Herbst, E. P. (2012), Using the Chandrasekhar Recursions for Likelihood Evaluation of DSGE Models, Finance and Economics Discussion Series, No. 2012-35, Federal Reserve Board, Washington, D. C.
Holthausen, D. M. (1981), A Risk-Return Model with Risk and Return Measured as Deviations from a Target Return, American Economic Review, 71, 182-188.
Ingram, B. F. and Whiteman, C. H. (1994), Supplanting the Minnesota Prior: Forecasting Macroeconomic Time Series Using Real Business Cycle Model Priors, Journal of Monetary Economics, 34, 497-510.
Iskrev, N. (2007), Evaluating the Information Matrix in Linearized DSGE Models, Economics Letters, 99, 607-610.
Iskrev, N. (2010), Local Identification in DSGE Models, Journal of Monetary Economics, 57, 189-202.
Johnson, N., Kotz, S., and Balakrishnan, N. (1995), Continuous Univariate Distributions, volume 2, John Wiley, New York, 2nd edition.
Kadiyala, K. R. and Karlsson, S. (1997), Numerical Methods for Estimation and Inference in Bayesian VAR-Models, Journal of Applied Econometrics, 12, 99-132.
Kalman, R. E. (1960), A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, 82, 35-45.
Kalman, R. E. and Bucy, R. S. (1961), New Results in Linear Filtering and Prediction Theory, Journal of Basic Engineering, 83, 95-108.
Kass, R. E. and Raftery, A. E. (1995), Bayes Factors, Journal of the American Statistical Association, 90, 773-795.
Kilian, L. and Manganelli, S. (2007), Quantifying the Risk of Deflation, Journal of Money, Credit, and Banking, 39, 561-590.
Kilian, L. and Manganelli, S. (2008), The Central Banker as a Risk Manager: Estimating the Federal Reserve's Preferences under Greenspan, Journal of Money, Credit, and Banking, 40, 1103-1129.
Kimball, M. S. (1995), The Quantitative Analytics of the Basic Neomonetarist Model, Journal of Money, Credit, and Banking, 27, 124-177.
King, R. G. (2000), The New IS-LM Model: Language, Logic, and Limits, Federal Reserve Bank of Richmond Economic Quarterly, 86, 45-103.
King, R. G. and Watson, M. W. (1996), Money, Prices, Interest Rates and the Business Cycle, Review of Economics and Statistics, 78, 35-53.
King, R. G. and Watson, M. W. (1998), The Solution of Singular Linear Difference Systems under Rational Expectations, International Economic Review, 39, 1015-1026.
Klein, A. and Neudecker, H. (2000), A Direct Derivation of the Exact Fisher Information Matrix of Gaussian Vector State Space Models, Linear Algebra and its Applications, 321, 233-238.
Klein, A. and Spreij, P. (2006), An Explicit Expression for the Fisher Information Matrix of a Multiple Time Series Process, Linear Algebra and its Applications, 417, 140-149.
Klein, A. and Spreij, P. (2009), “Matrix Differential Calculus Applied to Multiple Stationary Time Series and an Extended Whittle Formula for Information Matrices,” Linear Algebra and its Applications, 430, 674–691.
Klein, P. (2000), “Using the Generalized Schur Form to Solve a Multivariate Linear Rational Expectations Model,” Journal of Economic Dynamics and Control, 24, 1405–1423.
Kohn, R. and Ansley, C. F. (1983), “Fixed Interval Estimation in State Space Models when some of the Data are Missing or Aggregated,” Biometrika, 70, 683–688.
Kohn, R. and Ansley, C. F. (1989), “A Fast Algorithm for Signal Extraction, Influence and Cross-Validation in State Space Models,” Biometrika, 76, 65–79.
Komunjer, I. and Ng, S. (2011), “Dynamic Identification of Dynamic Stochastic General Equilibrium Models,” Econometrica, 79, 1995–2032.
Koopman, S. J. (1993), “Disturbance Smoother for State Space Models,” Biometrika, 80, 117–126.
Koopman, S. J. (1997), “Exact Initial Kalman Filtering and Smoothing for Nonstationary Time Series Models,” Journal of the American Statistical Association, 92, 1630–1638.
Koopman, S. J. and Durbin, J. (2000), “Fast Filtering and Smoothing for Multivariate State Space Models,” Journal of Time Series Analysis, 21, 281–296.
Koopman, S. J. and Durbin, J. (2003), “Filtering and Smoothing of State Vector for Diffuse State-Space Models,” Journal of Time Series Analysis, 24, 85–98.
Koopman, S. J. and Harvey, A. (2003), “Computing Observation Weights for Signal Extraction and Filtering,” Journal of Economic Dynamics and Control, 27, 1317–1333.
Kydland, F. E. and Prescott, E. C. (1982), “Time to Build and Aggregate Fluctuations,” Econometrica, 50, 1345–1370.
Landsman, Z. M. and Valdez, E. A. (2003), “Tail Conditional Expectations for Elliptical Distributions,” North American Actuarial Journal, 7, 55–71.
Lartillot, N. and Philippe, H. (2006), “Computing Bayes Factors Using Thermodynamic Integration,” Systematic Biology, 55, 195–207.
Leeper, E. M. and Zha, T. (2003), “Modest Policy Interventions,” Journal of Monetary Economics, 50, 1673–1700.
Lees, K., Matheson, T., and Smith, C. (2011), “Open Economy Forecasting with a DSGE-VAR: Head to Head with the RBNZ Published Forecasts,” International Journal of Forecasting, 27, 512–528.
Lindley, D. V. (1957), “A Statistical Paradox,” Biometrika, 44, 187–192.
Little, R. J. (2006), “Calibrated Bayes: A Bayes/Frequentist Roadmap,” The American Statistician, 60, 213–223.
Loudin, J. D. and Miettinen, H. E. (2003), “A Multivariate Method for Comparing N-dimensional Distributions,” PHYSTAT2003 Proceedings, 207–210.
Lubik, T. A. and Schorfheide, F. (2004), “Testing for Indeterminacy: An Application to U.S. Monetary Policy,” American Economic Review, 94, 190–217.
Lubik, T. A. and Schorfheide, F. (2007a), “Do Central Banks Respond to Exchange Rate Movements? A Structural Investigation,” Journal of Monetary Economics, 54, 1069–1087.
Lubik, T. A. and Schorfheide, F. (2007b), “Testing for Indeterminacy: An Application to U.S. Monetary Policy: Reply,” American Economic Review, 97, 530–533.
Lucas, Jr., R. E. (1976), “Econometric Policy Evaluation: A Critique,” in K. Brunner and A. H. Meltzer (Editors), Carnegie-Rochester Series on Public Policy, Vol. 1, 19–46, North-Holland, Amsterdam.
Machina, M. J. and Rothschild, M. (1987), “Risk,” in J. Eatwell, M. Milgate, and P. Newman (Editors), The New Palgrave Dictionary of Economics, 203–205, Macmillan, London.
Magnus, J. R. and Neudecker, H. (1988), Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley, Chichester.
Warne, A. (2012), “Extending YADA: A Guide to The User Interface, The Controls and The Data Structures,” Manuscript, European Central Bank. Available with the YADA distribution.
Warne, A., Coenen, G., and Christoffel, K. (2012), “Predictive Likelihood Comparisons with DSGE and DSGE-VAR Models,” Manuscript, European Central Bank.
Whittle, P. (1953), “The Analysis of Multiple Stationary Time Series,” Journal of the Royal Statistical Society Series B, 15, 125–139.
Winkler, R. L. and Murphy, A. H. (1968), “Good Probability Assessors,” Journal of Applied Meteorology, 7, 751–758.
Woodford, M. (2003), Interest and Prices: Foundations of a Theory of Monetary Policy, Princeton University Press, Princeton.
Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M.-H. (2011), “Improving Marginal Likelihood Estimation for Bayesian Phylogenetic Model Selection,” Systematic Biology, 60, 150–160.
Yu, B. and Mykland, P. (1998), “Looking at Markov Samplers Through CUSUM Path Plots: A Simple Diagnostic Idea,” Statistics and Computing, 8, 275–286.
Zadrozny, P. A. (1989), “Analytical Derivatives for Estimation of Linear Dynamic Models,” Computers and Mathematics with Applications, 18, 539–553.
Zadrozny, P. A. (1992), “Errata to Analytical Derivatives for Estimation of Linear Dynamic Models,” Computers and Mathematics with Applications, 24, 289–290.
Zagaglia, P. (2005), “Solving Rational-Expectations Models through the Anderson-Moore Algorithm: An Introduction to the Matlab Implementation,” Computational Economics, 26, 91–106.
Zellner, A. (1971), An Introduction to Bayesian Inference in Econometrics, John Wiley, New York.