The Rmgarch Models
(Version 1.3-0)
Alexios Ghalanos
1 Introduction
The ability to dynamically and jointly model the full multivariate density has very important implications for risk and portfolio management, and more generally for economic policy decision making. However, feasible large-scale multivariate GARCH modelling has proved very challenging ever since the direct extension of the univariate model to a vector representation by Bollerslev et al. (1988). The rmgarch package provides a subset of multivariate GARCH models which can handle large-scale estimation through separation of the dynamics, so that parallel processing may be used. Methods for fitting, filtering, forecasting and simulation are included where applicable, together with some additional methods aimed at portfolio and risk applications. This document provides a summary of the theoretical background of the models and their properties.
While there are a number of open source and commercial packages implementing the DCC based
models, the rmgarch package uniquely implements and introduces the GO-GARCH model with
ICA using the multivariate affine Generalized Hyperbolic distribution and the relevant methods
for working with this model in an applied setting.
The rmgarch package is on CRAN and the development version on bitbucket (https://
bitbucket.org/alexiosg). Some online examples and demos are available on my website
(http://www.unstarched.net).
The package is provided AS IS, without any implied warranty as to its accuracy or suitability. A lot of time and effort has gone into the development of this package, and it is offered under the GPL-3 license in the spirit of open knowledge sharing and dissemination. If you do use the model in published work DO remember to cite the package and author (type citation("rmgarch") for the appropriate BibTeX entry), and if you have used it and found it useful, drop me a note and let me know.
IMPORTANT:
The package is still in development and some functions/methods MAY change
over time, and bugs are certain to exist. Please report any suspected bugs in
the code, mistakes in the models or general questions to the R-SIG-FINANCE
mailing list and not directly to my email, unless solicited. I maintain a blog
(http://www.unstarched.net) which contains some examples and posts which I up-
date when time permits.
$$\mathrm{Var}[z_t] = I_N, \tag{3}$$

with $I_N$ denoting the identity matrix of order $N$. The conditional covariance matrix $H_t$ of $x_t$ may be defined as:

$$\mathrm{Var}(x_t | I_{t-1}) = \mathrm{Var}_{t-1}(x_t) = \mathrm{Var}_{t-1}(\varepsilon_t) = H_t^{1/2}\,\mathrm{Var}_{t-1}(z_t)\left(H_t^{1/2}\right)' = H_t. \tag{4}$$
The literature on the different specifications of $H_t$ may be broadly divided into direct multivariate extensions, factor models and conditional correlation models. The usual trade-off between model parametrization and dimensionality clearly applies here, with the fully parameterized models offering the richest dynamics at the cost of a rapidly growing parameter count, making them infeasible for modelling anything beyond a couple of assets. There is also a less evident trade-off between models which allow flexible univariate dynamics (in the motion dynamics and the distributions) to enter the specification, at the cost of giving up some multivariate dynamics. The next sections review these models and some of the trade-offs they present for the decision maker. A more complete review of multivariate GARCH (MGARCH) models is provided by Bauwens et al. (2006) and Silvennoinen and Teräsvirta (2009).
where $D_t = \mathrm{diag}\left(\sqrt{h_{11,t}}, \ldots, \sqrt{h_{nn,t}}\right)$ and $R$ is the positive definite constant conditional correlation matrix. The conditional variances $h_{ii,t}$, which can be estimated separately, can be written in vector form based on GARCH(p,q) models:

$$h_t = \omega + \sum_{i=1}^{p} A_i \left(\varepsilon_{t-i} \circ \varepsilon_{t-i}\right) + \sum_{i=1}^{q} B_i h_{t-i} \tag{6}$$

where $\omega \in \mathbb{R}^n$, $A_i$ and $B_i$ are $N \times N$ diagonal matrices, and $\circ$ denotes the Hadamard (element-wise) product.
The conditions for the positivity of the covariance matrix H t are that R is positive definite,
and the elements of ω and the diagonal elements of the matrices Ai and B i are positive. In the
extended CCC model (E-CCC) of Jeantheau (1998), implemented in the ccgarch package, the
assumption of diagonal elements on Ai and B i was relaxed, allowing the past squared errors
and variances of the series to affect the dynamics of the individual conditional variances, and
hence providing for a much richer structure, albeit at the cost of a lot more parameters. The
decomposition in (5) allows the log-likelihood at each point in time ($LL_t$), in the multivariate normal case, to be expressed as

$$\begin{aligned}
LL_t &= \frac{1}{2}\left(\log(2\pi) + \log|H_t| + \varepsilon_t' H_t^{-1}\varepsilon_t\right)\\
&= \frac{1}{2}\left(\log(2\pi) + \log|D_t R D_t| + \varepsilon_t' D_t^{-1} R^{-1} D_t^{-1}\varepsilon_t\right)\\
&= \frac{1}{2}\left(\log(2\pi) + 2\log|D_t| + \log|R| + z_t' R^{-1} z_t\right)
\end{aligned} \tag{7}$$

where $z_t = D_t^{-1}\varepsilon_t$. This can be described as a term ($D_t$) for the sum of the univariate GARCH model likelihoods, a term for the correlation ($R$) and a term for the covariance which arises from the decomposition.
Because the restriction of constant conditional correlation may be unrealistic in practice, a class of models termed Dynamic Conditional Correlation (DCC), due to Engle (2002) and Tse and Tsui (2002), was introduced which allows the correlation matrix to be time varying with motion dynamics, such that

$$H_t = D_t R_t D_t. \tag{8}$$
In these models, apart from the fact that the time varying correlation matrix, $R_t$, must be inverted at every point in time (making the calculation that much slower), it is also important to constrain it to be positive definite. The most popular of these DCC models, due to Engle (2002), achieves this constraint by modelling a proxy process, $Q_t$, as:

$$\begin{aligned}
Q_t &= \bar{Q} + a\left(z_{t-1} z_{t-1}' - \bar{Q}\right) + b\left(Q_{t-1} - \bar{Q}\right)\\
&= (1 - a - b)\bar{Q} + a\, z_{t-1} z_{t-1}' + b\, Q_{t-1}
\end{aligned} \tag{9}$$

where $a$ and $b$ are non-negative scalars, with the condition $a + b < 1$ imposed to ensure stationarity and positive definiteness of $Q_t$. $\bar{Q}$ is the unconditional matrix of the standardized errors $z_t$, which enters the equation via the covariance targeting part $(1 - a - b)\bar{Q}$, and $Q_0$ is positive definite. The correlation matrix $R_t$ is then obtained by rescaling $Q_t$ such that,

$$R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2}. \tag{10}$$
The log-likelihood function in equation (7) can be decomposed more clearly into a volatility and a correlation component by adding and subtracting $\varepsilon_t' D_t^{-1} D_t^{-1}\varepsilon_t = z_t' z_t$,

$$\begin{aligned}
LL &= \frac{1}{2}\sum_{t=1}^{T}\left(N\log(2\pi) + 2\log|D_t| + \log|R_t| + z_t' R_t^{-1} z_t\right)\\
&= \frac{1}{2}\sum_{t=1}^{T}\left(N\log(2\pi) + 2\log|D_t| + \varepsilon_t' D_t^{-1} D_t^{-1}\varepsilon_t\right) + \frac{1}{2}\sum_{t=1}^{T}\left(-z_t' z_t + \log|R_t| + z_t' R_t^{-1} z_t\right)\\
&= LL_V(\theta_1) + LL_R(\theta_1, \theta_2)
\end{aligned} \tag{11}$$

where $LL_V(\theta_1)$ is the volatility component with parameters $\theta_1$, and $LL_R(\theta_1, \theta_2)$ the correlation component with parameters $\theta_1$ and $\theta_2$. In the Multivariate Normal case, where no shape or skew
parameters enter the density, the volatility component is the sum of the individual GARCH
likelihoods which can be jointly maximized by separately maximizing each univariate model.
In other distributions, such as the multivariate Student, the existence of a shape parameter
means that the estimation must be performed in one step so that the shape parameter is jointly
estimated for all models. Separation of the likelihood into 2 parts provides for feasible large
scale estimation. Together with the use of variance targeting, very large scale systems may be
estimated in a matter of seconds with the use of parallel and grid computing. Yet as the system
becomes larger and larger, it becomes questionable whether the scalar parameters can adequately
capture the dynamics of the underlying process. As such, Cappiello et al. (2006) generalize the
DCC model with the introduction of the Asymmetric Generalized DCC (AGDCC ) where the
dynamics of Qt are:
$$Q_t = \bar{Q} - A'\bar{Q}A - B'\bar{Q}B - G'\bar{Q}^- G + A' z_{t-1} z_{t-1}' A + B' Q_{t-1} B + G' z_{t-1}^- \left(z_{t-1}^-\right)' G \tag{12}$$

where $A$, $B$ and $G$ are $N \times N$ parameter matrices, $z_t^-$ are the zero-threshold standardized errors, equal to $z_t$ when less than zero and zero otherwise, and $\bar{Q}$ and $\bar{Q}^-$ are the unconditional matrices of $z_t$ and $z_t^-$ respectively. Because of its high dimensionality, restricted models have been used, including the scalar, diagonal and symmetric versions, with the nested specifications being
$$\begin{aligned}
\text{DCC} &: G = [0],\ A = \sqrt{a},\ B = \sqrt{b}\\
\text{ADCC} &: G = \sqrt{g},\ A = \sqrt{a},\ B = \sqrt{b}\\
\text{GDCC} &: G = [0].
\end{aligned}$$
Variance targeting in such high dimensional models, where the parameters are no longer scalars, creates difficulties in imposing positive definiteness during estimation while at the same time guaranteeing a globally optimal solution. Methods which directly check and penalize the eigenvalues of the intercept matrix introduce non-smoothness and discontinuities into the likelihood surface, for which inference is likely to be difficult.⁶ More substantially, Aielli (2009) points out that the estimation of $\bar{Q}$ as the empirical counterpart of the correlation matrix of $z_t$ in the DCC model is inconsistent, since $E[z_t z_t'] = E[R_t] \neq E[Q_t]$. He proposes instead the cDCC model, which includes a corrective step that eliminates this inconsistency, albeit at the cost of losing the ability to use targeting.
⁶ This has not prevented a plethora of papers using these models and making inference based on questionable convergence criteria.
One model which tries to balance dimensionality with more realistic dynamics is the Flexible
DCC (FDCC) model of Billio et al. (2006), which allows groups of securities to have the same dynamics. The model may parsimoniously be represented as:

$$Q_t = cc' + \sum_{j=1}^{P} (I_g a_j)(I_g a_j)' \circ \varepsilon_{t-j}\varepsilon_{t-j}' + \sum_{j=1}^{Q} (I_g b_j)(I_g b_j)' \circ Q_{t-j} \tag{13}$$
where $I_g$ is the assets × groups logical matrix of mutually exclusive group membership. This is a very flexible representation, allowing a large range of specifications, from a single group driving all dynamics (like the DCC) to each asset having its own group (like the GDCC). Unfortunately, without special restrictions correlation targeting is lost, but the model still remains feasible for a not too large number of groups. In the rmgarch package, the intercept is estimated using correlation targeting, with the intercept set to $(\mathbf{1}\mathbf{1}' - aa' - bb') \circ \bar{Q}$ and the restriction that $a_i a_j + b_i b_j < 1\ \forall i, j$ in order to avoid explosive patterns. Positive definiteness of the matrices is achieved by construction, subject to a suitable starting point for $Q_t$. Also note that only the FDCC(1,1) model is allowed (i.e. P = 1, Q = 1) because of the large number of pairwise constraints needed, which makes higher order models prohibitively expensive to calculate (and in any case it is quite rare to use anything beyond this for DCC type models).
In the rmgarch package, the DCC, aDCC and FDCC models are implemented using the 2-stage approach, with a choice of 3 distributions: the multivariate Normal (MVN), Student (MVT) and Laplace (MVL). For the MVT distribution, it is understood that this is based on a known shape parameter (which may be fixed for the first and second stage estimation using the fixed.pars method on the specification object); otherwise the first stage estimation is QML based, as in Bauwens and Laurent (2005).
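As an illustration of the 2-stage approach, the sketch below estimates a DCC(1,1) model under the multivariate Normal; it assumes a hypothetical T × 4 matrix of returns X (not part of the package), with function names as documented in rmgarch and rugarch:

    # Illustrative sketch of 2-stage DCC estimation; X is a hypothetical
    # T x 4 matrix of returns
    library(rmgarch)   # also attaches rugarch

    # First stage: identical univariate GARCH(1,1) specifications
    uspec <- multispec(replicate(4, ugarchspec(
        mean.model = list(armaOrder = c(0, 0)),
        variance.model = list(model = "sGARCH", garchOrder = c(1, 1)))))

    # Second stage: DCC(1,1) dynamics; model = "aDCC" or "FDCC" (with a
    # groups argument) and distribution = "mvt" or "mvlaplace" are the
    # other choices discussed above
    spec <- dccspec(uspec = uspec, dccOrder = c(1, 1),
                    model = "DCC", distribution = "mvnorm")
    fit  <- dccfit(spec, data = X)
    R    <- rcor(fit)   # array of fitted conditional correlation matrices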
2.2.1 Forecasting
Because of the nonlinearity of the DCC evolution process, the multi-step ahead forecast of the correlation cannot be directly solved, and is instead based on the approximation suggested in Engle and Sheppard (2001). Consider the multi-step ahead evolution of the proxy process $Q_{t+n}$:

$$E_t[Q_{t+n}] = (1 - a - b)\bar{Q} + a\, E_t\left[z_{t+n-1} z_{t+n-1}'\right] + b\, E_t\left[Q_{t+n-1}\right] \tag{14}$$

where $E_t[z_{t+n-1} z_{t+n-1}'] = E_t[R_{t+n-1}]$ and $R_{t+n} = \mathrm{diag}(Q_{t+n})^{-1/2}\, Q_{t+n}\, \mathrm{diag}(Q_{t+n})^{-1/2}$. Engle and Sheppard (2001) suggest two possible approximations for solving for $R_{t+n}$, and the package adopts the one which, based on their findings, exhibits the least bias. That is, set $\bar{Q} \approx \bar{R}$ and $E_t[Q_{t+1}] = E_t[R_{t+1}]$, so that:

$$E_t[R_{t+n}] = \sum_{i=0}^{n-2} (1 - \alpha - \beta)\,\bar{R}\,(\alpha + \beta)^i + (\alpha + \beta)^{n-1} R_{t+1}. \tag{15}$$
Importantly, for the rolling 1-ahead method, the estimate of $\bar{Q}$ at time $T+n$ is updated with data up to time $T+n-1$, which will lead to small differences in results (growing with $n$) relative to applying the DCC filter method on new data, which uses a fixed value for $\bar{Q}$ (and which can be controlled by the n.old option).
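Continuing from the hypothetical fit object in the earlier sketch, the n-step correlation forecast based on (15) may be obtained as follows (method names as documented in the package):

    # n-step ahead forecasts based on the approximation in (15),
    # continuing from the hypothetical fit object above
    fc <- dccforecast(fit, n.ahead = 10)
    rcor(fc)   # forecast correlation matrices R_{t+1}, ..., R_{t+10}
    rcov(fc)   # corresponding conditional covariance forecasts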
analysis and the actuarial sciences for many years before being introduced in the finance literature more than a decade ago by Frey and McNeil (2000) and Li (2000). They have since become very popular for investigating the dependence of financial time series of various asset classes and frequencies. Breymann et al. (2003) investigate bivariate hourly FX spot returns, finding that the Student Copula best fit the data at all horizons (with the shape parameter increasing with the time horizon), while Malevergne and Sornette (2003) find that the Normal Copula fits pairs of currencies and equities well on the whole, but unsurprisingly fails to capture tail events, where the Student Copula does best.⁸ Junker and May (2005) use a Frank copula with a transformation generator and GARCH dynamics for the margins using the empirical distribution, to analyze the bivariate dependency of the daily returns of 6 stocks and 3 Euro swap rates (with horizons of 2, 5 and 10 years). A comparison with a range of popular copulas, including the Normal and Student, in a risk exercise shows that asymmetric tail dependency is important and usually not accommodated by the Student distribution.⁹ While most studies are predominantly focused on bivariate copulas, the extension to n-variate models is not overly challenging, particularly for elliptical distributions, or with the use of the more recent Vine pair copulas (see for example Joe et al. (2010)).
2.3.1 Copulas
An n-dimensional copula $C(u_1, \ldots, u_n)$ is an n-dimensional distribution in the unit hypercube $[0,1]^n$ with uniform margins. Sklar (1959) showed that every joint distribution $F$ of the random vector $X = (x_1, \ldots, x_n)$ with margins $F_1(x_1), \ldots, F_n(x_n)$ can be represented as:

$$F(x_1, \ldots, x_n) = C\left(F_1(x_1), \ldots, F_n(x_n)\right) \tag{16}$$

for a copula $C$, which is uniquely determined in $[0,1]^n$ for distributions $F$ with absolutely continuous margins and obtained as:

$$C(u_1, \ldots, u_n) = F\left(F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n)\right). \tag{17}$$

The joint density may then be written as:

$$f(x_1, \ldots, x_n) = c\left(F_1(x_1), \ldots, F_n(x_n)\right)\prod_{i=1}^{n} f_i(x_i) \tag{18}$$

where $f_i$ are the marginal densities and $c$ is the density function of the copula given by:

$$c(u_1, \ldots, u_n) = \frac{f\left(F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n)\right)}{\prod_{i=1}^{n} f_i\left(F_i^{-1}(u_i)\right)} \tag{19}$$
with $F_i^{-1}$ being the quantile function of the margins. A key property of copulas is their invariance under strictly increasing transformations of the components of $X$, so that, for example, the copula of the multivariate Normal distribution $F_n(\mu, \Sigma)$ is the same as that of $F_n(0, R)$, where $R$ is the correlation matrix implied by the covariance matrix; the same holds for the copula of the multivariate Student distribution, reviewed in detail in Demarta and McNeil (2005).
⁸ Interestingly, the authors argue that since such events are rare, the goodness of fit test they use cannot always reject the Normal Copula.
⁹ An alternative would be to use the skew Generalized Hyperbolic Student distribution analyzed in Aas and Haff (2006), which allows for the modelling of one heavy (with polynomial behavior) and one semi-heavy (with exponential behavior) tail.
The density of the Normal Copula, of the $n$-dimensional random vector $X$ in terms of the correlation matrix $R$, is then:

$$c(u; R) = \frac{1}{|R|^{1/2}}\, e^{-\frac{1}{2} u'\left(R^{-1} - I\right) u} \tag{20}$$

where $u_i = \Phi^{-1}(F_i(x_i))$ for $i = 1, \ldots, n$, representing the quantiles of the Probability Integral Transformed (PIT) values of $X$, and $I$ is the identity matrix. Because the Normal Copula cannot account for tail dependence, the Student Copula has been more widely used for the modelling of financial assets. The density of the Student Copula, of the $n$-dimensional random vector $X$ in terms of the correlation matrix $R$ and shape parameter $\nu$, can be written as:
$$c(u; R, \nu) = \frac{\Gamma\left(\frac{\nu+n}{2}\right)\left[\Gamma\left(\frac{\nu}{2}\right)\right]^{n}\left(1 + \nu^{-1} u' R^{-1} u\right)^{-(\nu+n)/2}}{|R|^{1/2}\,\Gamma\left(\frac{\nu}{2}\right)\left[\Gamma\left(\frac{\nu+1}{2}\right)\right]^{n}\prod_{i=1}^{n}\left(1 + \frac{u_i^2}{\nu}\right)^{-(\nu+1)/2}} \tag{21}$$
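As a numerical illustration, the Normal copula density in (20) can be checked against the ratio definition in (19); the sketch below assumes the mvtnorm package for the joint Normal density:

    # Check the Normal copula density (20) against the ratio form (19);
    # a sketch assuming the mvtnorm package is available
    library(mvtnorm)
    R <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
    u <- c(0.3, 0.7)    # PIT values in (0,1)^2
    q <- qnorm(u)       # quantile-transformed margins

    # Equation (20): |R|^(-1/2) exp(-0.5 q'(R^{-1} - I)q)
    c20 <- exp(-0.5 * drop(t(q) %*% (solve(R) - diag(2)) %*% q)) / sqrt(det(R))

    # Equation (19): joint density over the product of the margins
    c19 <- dmvnorm(q, sigma = R) / prod(dnorm(q))
    all.equal(c20, c19)   # TRUE up to numerical precision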
For elliptical distributions, Lindskog et al. (2003) proved that there is a one-to-one relationship between Kendall's $\tau$ and Pearson's correlation coefficient $\rho$, given by:

$$\tau(x_i, x_j) = \left(1 - \sum_{x \in \mathbb{R}} \left(P\{X_i = x\}\right)^2\right)\frac{2}{\pi}\arcsin\rho_{ij} \tag{23}$$

which under certain assumptions (such as in the case of the multivariate Normal) simplifies to $\frac{2}{\pi}\arcsin\rho_{ij}$.¹⁰ Kendall's $\tau$ is also invariant under monotone transformations, making it rather
more suitable when working with non-elliptical distributions. A useful application arises in the
case of the multivariate Student distribution, where a maximum likelihood approach for the estimation of the correlation matrix $R$ becomes infeasible for large dimensions. In this case, an alternative approach is to estimate the sample counterpart of Kendall's $\tau$¹¹ from the transformed margins and then translate that into the correlation matrix as detailed in (23), providing for a method of moments type estimator.¹² The shape parameter $\nu$ may then be estimated keeping the correlation matrix constant, with little loss in efficiency vis-à-vis the full maximum likelihood method.
¹⁰ Another popular measure is Spearman's correlation coefficient $\rho_s$, which under Normality equates to $\frac{6}{\pi}\arcsin\frac{\rho_{ij}}{2}$, and is usually very close in value to Kendall's measure.
¹¹ The matrix is built up from the pairwise estimates.
¹² It may be the case that the resultant matrix is not positive definite, in which case a variety of methods exist to tweak it into one.
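A minimal sketch of this method of moments type estimator, assuming U is a hypothetical T × n matrix of PIT-transformed margins:

    # Method-of-moments estimator of R via Kendall's tau, as in (23);
    # U is a hypothetical T x n matrix of PIT-transformed margins
    tau   <- cor(U, method = "kendall")   # pairwise sample Kendall's tau
    R.hat <- sin(pi / 2 * tau)            # invert tau = (2/pi) arcsin(rho)
    diag(R.hat) <- 1

    # The result need not be positive definite; one common adjustment:
    # R.hat <- as.matrix(Matrix::nearPD(R.hat, corr = TRUE)$mat)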
generated by the past realizations of $r_t$, and the conditional variance $h_{it}$ follows a GARCH(1,1) process:

$$r_{it} = \mu_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} = \sqrt{h_{it}}\, z_{it}, \tag{25}$$
$$h_{it} = \omega_i + \alpha_i \varepsilon_{i,t-1}^2 + \beta_i h_{i,t-1} \tag{26}$$
where $z_{it}$ are i.i.d. random variables which conditionally follow the standardized skew Student distribution of Fernandez and Steel (1998), $z_{it} \sim f_i(0, 1, \xi_i, \nu_i)$, with skew and shape parameters $\xi_i$ and $\nu_i$ respectively, as derived in the rugarch vignette. The dependence structure of the margins is then assumed to follow a Student copula with conditional correlation $R_t$ and constant shape parameter $\eta$. The conditional density at time $t$ is given by:

$$c_t(u_{1t}, \ldots, u_{nt} | R_t, \eta) = \frac{f_t\left(F_1^{-1}(u_{1t}|\eta), \ldots, F_n^{-1}(u_{nt}|\eta)\,\middle|\, R_t, \eta\right)}{\prod_{i=1}^{n} f_i\left(F_i^{-1}(u_{it}|\eta)\,\middle|\,\eta\right)} \tag{27}$$
where $u_{it} = F_{it}(r_{it} | \mu_{it}, h_{it}, \xi_i, \nu_i)$ is the PIT transformation of each series by its conditional distribution $F_{it}$, estimated via the first stage GARCH process, $F_i^{-1}(u_{it}|\eta)$ represents the quantile transformation of the uniform margins subject to the common shape parameter of the multivariate density, $f_t(\cdot | R_t, \eta)$ is the multivariate density of the Student distribution with conditional correlation $R_t$ and shape parameter $\eta$, and $f_i(\cdot | \eta)$ are the univariate margins of the multivariate Student distribution with common shape parameter $\eta$. The dynamics of $R_t$ are assumed to follow an AGDCC model as described in the previous section, though it is more common to use a restricted scalar DCC model for not too large a number of series. Finally, the joint density of the 2-stage estimation is written as:
$$f(r_t | \mu_t, h_t, R_t, \eta) = c_t(u_{1t}, \ldots, u_{nt} | R_t, \eta)\prod_{i=1}^{n}\frac{1}{\sqrt{h_{it}}}\, f_{it}(z_{it} | \nu_i, \xi_i) \tag{28}$$
where it is clear that the likelihood is composed of a part due to the joint DCC copula dynamics
and a part due to the first stage univariate GARCH dynamics.
A similar model, with Student margins, was estimated by Ausin and Lopes (2010) using a Bayesian setup, with an empirical risk management application, albeit once again on only a bivariate series (the DAX and Dow Jones indices), used to illustrate its applicability and appropriateness.
In the rmgarch package, the Normal and Student copulas are implemented, with either a static
or dynamic correlation model (aDCC).
2.3.5 Forecasting
Because of the nonlinear transformation of the margins, there is no closed form solution for the
multi-step ahead forecast. As such, the cgarchsim method must be used. The inst folder of the
package contains a number of examples.
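A sketch of the copula-GARCH workflow (estimation plus simulation-based forecasting), assuming a hypothetical data matrix X; argument names follow the package documentation:

    # Student copula with aDCC correlation dynamics and skew Student
    # margins; X is a hypothetical T x 4 matrix of returns
    library(rmgarch)
    uspec <- multispec(replicate(4, ugarchspec(
        distribution.model = "sstd")))
    cspec <- cgarchspec(uspec = uspec, dccOrder = c(1, 1), asymmetric = TRUE,
        distribution.model = list(copula = "mvt", method = "Kendall",
                                  time.varying = TRUE,
                                  transformation = "parametric"))
    cfit <- cgarchfit(cspec, data = X)

    # No closed-form multi-step forecast exists, so simulate forward
    csim <- cgarchsim(cfit, n.sim = 25, m.sim = 1000)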
by a set of unobserved underlying factors that are conditionally heteroscedastic. The dependence framework is non-dynamic, a consequence of making large scale estimation feasible in a multivariate setting. The dependence structure of the unobserved factors then determines the type of factor model: correlated factors make up the F-ARCH type models, while uncorrelated and independent factors comprise the Orthogonal and Generalized Orthogonal models, respectively.¹⁸ Because one can always recover uncorrelated or independent sources by certain statistical transformations, the correlated factor assumption of F-ARCH models does appear restrictive. GO-GARCH models, on the other hand, make use of those transformations to place the factors in an independence framework, with unique benefits such as separability and weighted density convolution, giving rise to truly large scale, real-time and feasible estimation. Consider a set of $N$ assets whose returns $r_t$ are observed for $T$ periods, with conditional mean $E[r_t | F_{t-1}] = m_t$, where $F_{t-1}$ is the σ-field generated by the past realizations of $r_t$, i.e. $F_{t-1} = \sigma(r_{t-1}, r_{t-2}, \ldots)$. The GO-GARCH model of van der Weide (2002) maps $r_t - m_t$ onto a set of unobserved independent factors $f_t$ (or "structural errors"),
$$r_t = m_t + \varepsilon_t, \qquad t = 1, \ldots, T \tag{30}$$
$$\varepsilon_t = A f_t, \tag{31}$$

where $A$ is invertible and constant over time and may be decomposed into the de-whitening matrix $\Sigma^{1/2}$, representing the square root of the unconditional covariance matrix, and an orthogonal matrix $U$, so that:

$$A = \Sigma^{1/2} U, \tag{32}$$
and $f_t = (f_{1t}, \ldots, f_{Nt})'$. The rows of the mixing matrix $A$ therefore represent the independent source factor weights assigned to each asset (i.e. rows are the assets and columns the factors). The factors have the following specification:

$$f_t = H_t^{1/2} z_t, \tag{33}$$

where $H_t = E[f_t f_t' | F_{t-1}]$ is a diagonal matrix with elements $(h_{1t}, \ldots, h_{Nt})$, the conditional variances of the factors, and $z_t = (z_{1t}, \ldots, z_{Nt})'$. The random variable $z_{it}$ is independent of $z_{jt-s}$ for all $j \neq i$ and all $s$, with $E[z_{it} | F_{t-1}] = 0$ and $E[z_{it}^2] = 1$; this implies that $E[f_t | F_{t-1}] = 0$ and $E[\varepsilon_t | F_{t-1}] = 0$. The factor conditional variances, $h_{it}$, can be modelled as a GARCH-type process. The unconditional distribution of the factors is characterized by:

$$E[f_t] = 0, \qquad E[f_t f_t'] = I_N,$$

which in turn implies $E[\varepsilon_t] = 0$ and $E[\varepsilon_t \varepsilon_t'] = AA' = \Sigma$.
¹⁸ Under the assumption that all $A_i$ and $B$ are restricted to have the same eigenvectors $Z$, with the eigenvalues of $A_i$ all zero except the $i$-th one, and that $C$ can be decomposed into $ZDZ'$ where $D$ is some positive definite diagonal matrix, this is a GO-GARCH (with GARCH(1,1) univariate dynamics) model where $Z$ is the linear ICA map. However, the GO-GARCH model is not limited to GARCH(1,1) or any particular process for the factors.
The conditional covariance matrix, $\Sigma_t \equiv E[(r_t - m_t)(r_t - m_t)' | F_{t-1}]$, of the returns is given by:

$$\Sigma_t = A H_t A'. \tag{37}$$
The Orthogonal Factor model of Alexander (2001),¹⁹ which uses only information in the covariance matrix, leads to components which are uncorrelated but not necessarily independent, unless one assumes a multivariate normal distribution. However, while whitening is not sufficient for independence,
it is nevertheless an important step in the preprocessing of the data in the search for independent
factors, since by exhausting the second order information contained in the covariance matrix it
makes it easier to infer higher order information, reducing the problem to one of rotation (orthog-
onalization). The original procedure of van der Weide (2002) used a 1-step maximum likelihood
approach to jointly estimate the rotation matrix and dynamics making the procedure infeasible
for anything other than a few assets. Alternative approaches such as nonlinear least squares
and method of moments for the estimation of U have been proposed in van der Weide (2004)
and Boswijk and van der Weide (2011), respectively. In the rmgarch package, I estimate the
matrix U by ICA as in Broda and Paolella (2009) and Zhang and Chan (2009). One of the
computational advantages offered by the Generalized Orthogonal approach is that following the
estimation of the independent factors, the dynamics of the marginal density parameters of those
factors may be estimated separately.
2.4.1 ICA
The estimation of the factor loading matrix $A$ exploits the decomposition in (32). The estimate of $\Sigma^{1/2}$, representing the square root of the unconditional covariance matrix, is usually obtained from the OLS residuals $\hat{\varepsilon}_t = r_t - \hat{m}_t$, while the orthogonal matrix $U$ can be estimated using ICA (see Broda and Paolella (2009), Zhang and Chan (2009)). ICA is a computational method for separating multivariate mixed signals, $x = [x_1, \ldots, x_n]'$, into additive, statistically independent and non-Gaussian components $s = [s_1, \ldots, s_n]'$ and a linear mixing matrix $B$, such that $x = Bs$. The independent source vector $s \in \mathbb{R}^n$ is assumed to be sampled from a joint distribution $f(s)$,

$$f(s_1, \ldots, s_n) = f(s_1) f(s_2) \cdots f(s_n), \tag{38}$$
where $s$ is not directly observable, nor is the particular form of the individual distributions $f(s_i)$ usually known.²⁰ This forms the key property of independence, namely that the joint density of independent signals is simply the product of their margins. The estimate of the linear mixing matrix $B$ can be obtained via estimation methods based on a choice of criteria for measuring independence, which include the maximization of non-Gaussianity through measures such as kurtosis and negentropy, the minimization of mutual information, maximum likelihood and infomax. The non-Gaussianity criterion follows from the Central Limit Theorem, which states that linear mixtures of independent variables tend to be more Gaussian in distribution than the underlying sources; hence maximizing non-Gaussianity leads to independent components (see Hyvärinen and Oja (2000) for more details).²¹ Entropy may be thought of as the amount of information inherent within a random variable, being an increasing function of the amount of randomness in that variable.
¹⁹ When $U$ is restricted to be an identity matrix, the model reduces to the Orthogonal Factor model.
²⁰ If the distributions are known, the problem reduces to a classical maximum likelihood parametric estimation.
²¹ Estimation by minimization of the mutual information was first proposed by Comon (1994), who derived a fundamental connection between cumulants, negentropy and mutual information. The approximation of negentropy by cumulants was originally considered much earlier in Jones and Sibson (1987), while the connection between infomax and likelihood was shown in Pearlmutter and Parra (1997), and the connection between mutual information and likelihood was explicitly discussed in Cardoso (2000).
For a
discrete random variable $X$ it is defined as

$$H(X) = -\sum_i P(X = b_i)\log P(X = b_i), \tag{39}$$

with $b_i$ denoting the possible values of $X$. In the continuous case, for a continuous random variable $X$ with density $f_X(x)$, the entropy²² $H$ is defined as

$$H(X) = -\int f_X(x)\log f_X(x)\, dx. \tag{40}$$
A key result from information theory states that among all random variables of equal variance, a Gaussian variable has the largest entropy; hence entropy can be used as a measure of non-Gaussianity. A related measure of non-Gaussianity is the negentropy, which is always non-negative and zero for a Gaussian variable. It is defined as

$$J(X) = H(X_{gauss}) - H(X), \tag{41}$$

where $H(X_{gauss})$ is the entropy of a Gaussian random variable having the same covariance matrix as $X$. As shown by Comon (1994), negentropy is invariant under invertible linear transformations and is an optimal estimator of non-Gaussianity with regard to its statistical properties (i.e. consistency, asymptotic variance and robustness). In practice, because we do not know the density, approximations of negentropy are used, such as the one by Hyvärinen and Oja (2000),

$$J(X) \approx \sum_{i=1}^{p} k_i \left[E(G_i(X)) - E(G_i(V))\right]^2, \tag{42}$$
where $k_i$ are positive constants, $V$ is a standardized Gaussian variable and $G_i$ are non-quadratic functions. The choice of the non-quadratic function has an impact on the robustness of the estimators of negentropy, with $G(x) = x^4$ (kurtosis based) being the least robust, while more robust choices include

$$g_1(u) = \frac{1}{a_1}\log\cosh(a_1 u), \qquad g_2(u) = -\exp\left(-u^2/2\right). \tag{43}$$
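The sketch below evaluates the approximation in (42) for the single robust contrast $g_2$; the constant $k_i$ is omitted (so the value is only proportional to $J(X)$), and $E[G(V)] = -1/\sqrt{2}$ is exact for a standard Gaussian $V$:

    # Negentropy approximation (42) with the robust contrast
    # g2(u) = -exp(-u^2/2); k_i omitted, so the value is proportional to J(X)
    negentropy.approx <- function(x, G = function(u) -exp(-u^2 / 2)) {
        z   <- (x - mean(x)) / sd(x)   # standardize: zero mean, unit variance
        EGv <- -1 / sqrt(2)            # exact E[G(V)] for standard Gaussian V
        (mean(G(z)) - EGv)^2
    }
    negentropy.approx(rnorm(1e5))        # ~ 0 for Gaussian data
    negentropy.approx(rt(1e5, df = 4))   # > 0 for heavy-tailed data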
Because these non-quadratic functions present a complex nonlinear optimization problem, sophisticated numerical algorithms are usually necessary. Two main classes of algorithm are used, online and batch methods, the former based on stochastic gradient methods, while in the latter case a popular choice is natural gradient ascent of the likelihood. The FastICA of Hyvärinen and Oja (2000) is a very efficient batch algorithm with a range of options for the non-quadratic functions. It can be used to estimate the components either one at a time, by finding maximally non-Gaussian directions, or in parallel, by maximizing non-Gaussianity or the likelihood. The estimation procedure of the GO-GARCH model can be summarized as follows. First, FastICA is applied to the whitened data $z_t = \hat{\Sigma}^{-1/2}\hat{\varepsilon}_t$, where $\hat{\Sigma}^{1/2}$ is obtained from the eigenvalue decomposition of the OLS residual covariance matrix, returning an estimate of $f_t$, i.e. $y_t = W z_t$. Second, because of the assumption of independence, the likelihood function of the GO-GARCH model is greatly simplified, so that the conditional log-likelihood is expressed as the sum of the individual conditional log-likelihoods, derived from the conditional marginal densities of the factors, i.e. $GH_{\lambda_i}(y_{it}) \equiv GH(y_{it}; \lambda_i, \mu_i\sqrt{h_{it}}, \delta_i\sqrt{h_{it}}, \alpha_i/\sqrt{h_{it}}, \beta_i/\sqrt{h_{it}})$, plus a term for the mixing matrix $A$, estimated in the first step by FastICA:
$$L(\hat{\varepsilon}_t | \theta, A) = T\log\left|A^{-1}\right| + \sum_{t=1}^{T}\sum_{i=1}^{N}\log\left(GH_{\lambda_i}(y_{it} | \theta_i)\right) \tag{44}$$
²² In the continuous case this is usually called differential entropy.
where $\theta$ is the vector of unknown parameters of the marginal densities. Because ICA is a linear noiseless model, the implication for this two-stage estimation of the GO-GARCH model is that uncertainty plays no part in the derivation of the mixing matrix $A$ and hence does not affect the standard errors of the independent factors.
The possibility of modelling the independent factors separately not only increases the flexibility of the model but also its computational feasibility, since the multivariate estimation reduces to $N$ univariate optimization steps plus a term which depends on the factor loading matrix. Thus the independence property of the model allows the estimation of very large scale systems on modern computational grids, with the time required to calculate any n-dimensional model equivalent to the time it takes to estimate a single factor in this framework.
In the rmgarch package, two algorithms for ICA are implemented locally: the FastICA of Hyvärinen and Oja (2000), based on a direct translation of their Matlab code, and the RADICAL of Learned-Miller and Fisher III (2003), which offers a robust alternative. Both allow a choice of common options, such as the type of covariance estimator to use for the whitening stage (e.g. Ledoit-Wolf, EWMA), as well as the possibility of dimensionality reduction during the PCA stage. In the latter case, some results for the model have yet to be derived and it is therefore considered experimental at this stage.
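A sketch of GO-GARCH estimation with NIG factor distributions and a choice of ICA algorithm, again assuming a hypothetical returns matrix X:

    # GO-GARCH with multivariate affine NIG factors; X is a hypothetical
    # matrix of returns
    library(rmgarch)
    gspec <- gogarchspec(mean.model = list(model = "constant"),
                         distribution.model = "manig",  # or "magh", "mvnorm"
                         ica = "fastica")               # or "radical"
    gfit  <- gogarchfit(gspec, data = X)
    A     <- as.matrix(gfit, which = "A")   # estimated mixing matrix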
to a benchmark such as the normal distribution), a positive co-skewness of an asset with another asset means that when the price volatility goes up, the return of this asset also goes up. The general acceptance that the conditional density of asset returns is not completely and adequately characterized by the first two moments implies that the derivation of any measure of risk from that density requires estimates of the higher order co-moments of the return distribution, if one is to work within a multivariate setting. The linear affine representation of the GO-GARCH model makes it possible to identify closed-form expressions for the conditional co-skewness and co-kurtosis of asset returns, as described in de Athayde and Flôres Jr (2000). The conditional co-moments of $r_t$ of order 3 and 4 are represented as tensor matrices,
$$M^3_t = A M^3_{f,t}\left(A' \otimes A'\right), \qquad M^4_t = A M^4_{f,t}\left(A' \otimes A' \otimes A'\right), \tag{45}$$
where $M^3_{f,t}$ and $M^4_{f,t}$ are the $(N \times N^2)$ conditional third co-moment matrix and the $(N \times N^3)$ conditional fourth co-moment matrix of the factors, respectively, defined as

$$M^3_{f,t} = \left[M^3_{1,f,t},\ M^3_{2,f,t},\ \ldots,\ M^3_{N,f,t}\right] \tag{46}$$
$$M^4_{f,t} = \left[M^4_{11,f,t},\ M^4_{12,f,t},\ \ldots,\ M^4_{1N,f,t}\ \middle|\ \ldots\ \middle|\ M^4_{N1,f,t},\ M^4_{N2,f,t},\ \ldots,\ M^4_{NN,f,t}\right] \tag{47}$$
2.4.3 The Portfolio Conditional Density
An important question that can be addressed in this framework is the determination of the portfolio conditional density, an issue of vital importance in risk management applications. The $N$-dimensional NIG distribution, being closed under convolution, is well suited to problems in portfolio and risk management where a weighted sum of assets is considered. However, when the distributional parameters $\alpha$ and $\beta$, representing shape and skew, are allowed to vary per asset, as in the GO-GARCH case, this property no longer holds, and numerical methods such as the Fast Fourier Transform (FFT) are needed to derive the weighted density by inversion of the characteristic function of the scaled parameters.²⁵ In the case of the NIG distribution, this is greatly simplified because of the representation of the modified Bessel function for the GIG shape index ($\lambda$) with value $-0.5$, derived in Barndorff-Nielsen and Bläsild (1981); otherwise the characteristic function of the GH involves the evaluation of the modified Bessel function with complex arguments, which complicates the inversion. Appendix 3 derives the characteristic functions used in the case of independent margins for both the NIG and full GH
distributions. Let $R_t$ be the portfolio return:

$$R_t = w_t' r_t = w_t' m_t + \left(w_t' A H_t^{1/2}\right) z_t \tag{49}$$

where $H_t^{1/2}$ is estimated from the GARCH dynamics of $y_t$. The model allows the portfolio variance, skewness and kurtosis to be expressed in closed form,

$$\sigma_{p,t}^2 = w_t' \Sigma_t w_t, \qquad s_{p,t} = \frac{w_t' M^3_t (w_t \otimes w_t)}{\left(w_t' \Sigma_t w_t\right)^{3/2}}, \qquad k_{p,t} = \frac{w_t' M^4_t (w_t \otimes w_t \otimes w_t)}{\left(w_t' \Sigma_t w_t\right)^{2}}, \tag{50}$$
where $\Sigma_t$, $M^3_t$ and $M^4_t$ are as derived in (45). The portfolio conditional density may be obtained via the inversion of the characteristic function through the FFT method, as in Chen et al. (2007) (see Appendix 3 for details), or by simulation. The former is used in this package for its accuracy and speed. Provided that $z_t$ is an $N$-dimensional vector of innovations, marginally distributed as 1-dimensional standardized GH, the density of the weighted asset return, $w_{it} r_{it}$, is
$$w_{i,t} r_{i,t} = \left(w_{i,t} m_{i,t} + \bar{w}_{i,t} z_{i,t}\right) \sim GH_{\lambda_i}\!\left(\bar{w}_{i,t}\mu_i + w_{i,t} m_{i,t},\ \left|\bar{w}_{i,t}\right|\delta_i,\ \frac{\alpha_i}{\left|\bar{w}_{i,t}\right|},\ \frac{\beta_i}{\left|\bar{w}_{i,t}\right|}\right) \tag{51}$$

where $\bar{w}_t'$ is equal to $w_t' A H_t^{1/2}$, $\bar{w}_{i,t}$ is the $i$-th element of $\bar{w}_t$, and $m_{i,t}$ is the conditional mean of the $i$-th underlying asset. In order to obtain the density of the portfolio, we must sum the individual weighted densities of $z_{i,t}$. The characteristic function of the portfolio return $R_t$ is
$$\varphi_R(u) = \prod_{i=1}^{n}\varphi_{\bar{w}Z_i}(u) = \exp\left(iu\sum_{j=1}^{d}\bar{\mu}_j + \sum_{j=1}^{d}\left(\frac{\lambda_j}{2}\log\frac{\gamma}{\upsilon} + \log\frac{K_{\lambda_j}\left(\bar{\delta}_j\sqrt{\upsilon}\right)}{K_{\lambda_j}\left(\bar{\delta}_j\sqrt{\gamma}\right)}\right)\right) \tag{52}$$

where $\gamma = \bar{\alpha}_j^2 - \bar{\beta}_j^2$, $\upsilon = \bar{\alpha}_j^2 - (\bar{\beta}_j + iu)^2$, and $(\bar{\alpha}_j, \bar{\beta}_j, \bar{\delta}_j, \bar{\mu}_j)$ are the scaled versions of the parameters $(\alpha_i, \beta_i, \delta_i, \mu_i)$ as shown in (51). The density may be accurately approximated by FFT as follows,

$$f_R(r) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-iur}\varphi_R(u)\, du \approx \frac{1}{2\pi}\int_{-s}^{s} e^{-iur}\varphi_R(u)\, du. \tag{53}$$
²⁵ This effectively means that the weighted density is not necessarily NIG distributed.
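As a self-contained illustration of the inversion in (53) (a generic sketch, not the package's internal implementation), the code below recovers a density from its characteristic function by simple numerical quadrature, validated on the standard Normal where $\varphi(u) = e^{-u^2/2}$:

    # Generic density recovery by inversion of a characteristic function,
    # as in (53); validated on the standard Normal, phi(u) = exp(-u^2/2)
    cf.invert <- function(phi, r, s = 50, n = 2^12) {
        u  <- seq(-s, s, length.out = n)
        du <- u[2] - u[1]
        pu <- phi(u)
        sapply(r, function(x) Re(sum(exp(-1i * u * x) * pu)) * du / (2 * pi))
    }
    r <- seq(-3, 3, by = 0.5)
    # error is near machine precision for this rapidly decaying integrand
    max(abs(cf.invert(function(u) exp(-u^2 / 2), r) - dnorm(r)))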
Once the density is formed by FFT inversion of the characteristic function, distribution, quantile and sampling functions can be created. In the rmgarch package these are represented by dfft, pfft, qfft and rfft, which operate on the point-in-time conditional density approximation, an object of class goGARCHfft returned from calling the convolution method on a fitted (goGARCHfit), filtered (goGARCHfilter), forecast (goGARCHforecast), simulated (goGARCHsim) or rolling (goGARCHroll) object. Finally, the nportmoments method applied to a goGARCHfft object returns the FFT-based semi-analytic portfolio moments.
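A sketch of this workflow, continuing from the hypothetical gfit object of the earlier GO-GARCH example; method and argument names follow the package documentation:

    # Portfolio density by FFT convolution of the weighted factor densities
    w    <- rep(1/4, 4)                      # equal portfolio weights
    conv <- convolution(gfit, weights = w)   # goGARCHfft object
    pm   <- nportmoments(conv)               # semi-analytic portfolio moments
    q05  <- qfft(conv, index = 1)(0.05)      # 5% quantile at one time index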
2.4.4 Forecasting
The multi-step ahead forecast of the GO-GARCH model is based entirely on the univariate factor dynamics, already covered in the rugarch package. Additionally, all methods available for working with a fitted (goGARCHfit) object are also available for the resulting forecast (goGARCHforecast) object, and are covered in detail in the help files and the examples in the inst folder of the package.
3 Miscellaneous
Like the rugarch package, parallel functionality is implemented by passing a pre-created cluster
object from the parallel package. Unlike the rugarch package, there is a much higher cost to
the use of a socket (snowfall) rather than fork (multicore) based setup, and depending on the
number of sockets used, it may be the case that the data communication overhead is so high
that non-parallel estimation is faster.
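For example (a sketch, reusing the hypothetical spec and X objects from the earlier DCC example):

    # Parallel estimation via a pre-created cluster from the parallel
    # package; fork clusters avoid the communication overhead noted above
    library(parallel)
    cl  <- makeCluster(4, type = "FORK")   # use type = "PSOCK" on Windows
    fit <- dccfit(spec, data = X, cluster = cl)
    stopCluster(cl)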
A comprehensive set of examples is available in the rmgarch.tests folder of the source. There
are 5 main files, covering the Copula, DCC, FDCC and GO-GARCH models and the fScenario
and fMoments methods for use in portfolio and risk management applications (see the parma
package).
Appendices
The GH characteristic function
The moment generating function (MGF) of the GH Distribution is,

$$M_{GH(\lambda,\alpha,\beta,\delta,\mu)}(u) = e^{\mu u} M_{GIG\left(\lambda,\, \delta\sqrt{\alpha^2-\beta^2}\right)}\left(\frac{u^2}{2} + \beta u\right) = e^{\mu u}\left(\frac{\alpha^2 - \beta^2}{\alpha^2 - (\beta + u)^2}\right)^{\lambda/2}\frac{K_\lambda\left(\delta\sqrt{\alpha^2 - (\beta + u)^2}\right)}{K_\lambda\left(\delta\sqrt{\alpha^2 - \beta^2}\right)} \tag{54}$$
where $M_{GIG}$ represents the moment generating function of the Generalized Inverse Gaussian, which forms the mixing distribution in this variance-mean mixture subclass. Powers of the MGF, $M_{GH}(u)^p$, only have the representation in (54) for $p = 1$, which means that GH distributions are not closed under convolution, with the exception of the NIG, and only in the case when the shape and skew parameters are the same. The MGF of the NIG is,

$$M_{NIG(\alpha,\beta,\delta,\mu)}(u) = e^{\mu u}\,\frac{e^{\delta\sqrt{\alpha^2 - \beta^2}}}{e^{\delta\sqrt{\alpha^2 - (\beta + u)^2}}}. \tag{55}$$

Raising this to a power $p$ is equivalent to multiplying $\delta$ and $\mu$ by $p$, so that,

$$M_{NIG(\alpha,\beta,\delta,\mu)}(u)^p = M_{NIG(\alpha,\beta,p\delta,p\mu)}(u).$$
When the distribution is not closed under convolution, numerical methods are required, such as the inversion of the characteristic function by FFT. Because the MGF is a holomorphic function for complex $z$ with $|z| < \alpha - \beta$, we can obtain the characteristic function of the GH distribution using the following representation,

$$\varphi_{GH}(u) = M_{GH}(iu).$$
In order to find the portfolio density in the case of the GO-GARCH (maGH/maNIG) model, the characteristic function required for the inversion of the NIG density was already used in Chen et al. (2010) and is given below,

$$\varphi_{port}(u) = \exp\left(iu\sum_{j=1}^{d}\bar{\mu}_j + \sum_{j=1}^{d}\bar{\delta}_j\left(\sqrt{\bar{\alpha}_j^2 - \bar{\beta}_j^2} - \sqrt{\bar{\alpha}_j^2 - \left(\bar{\beta}_j + iu\right)^2}\right)\right) \tag{60}$$
where $\bar{\alpha}_j$, $\bar{\beta}_j$, $\bar{\delta}_j$ and $\bar{\mu}_j$ represent the parameters scaled as described in the main text. In the case of the GH characteristic function, this is a little more complicated, as it involves the evaluation of the modified Bessel function of the third kind with complex arguments.²⁶ Taking logs and summing,
$$\varphi_{port}(u) = \exp\left(\sum_{j=1}^{d}\left\{iu\bar{\mu}_j + \frac{\lambda_j}{2}\log\left(\bar{\alpha}_j^2 - \bar{\beta}_j^2\right) - \frac{\lambda_j}{2}\log\left(\bar{\alpha}_j^2 - \left(\bar{\beta}_j + iu\right)^2\right) + \log K_{\lambda_j}\left(\bar{\delta}_j\sqrt{\bar{\alpha}_j^2 - \left(\bar{\beta}_j + iu\right)^2}\right) - \log K_{\lambda_j}\left(\bar{\delta}_j\sqrt{\bar{\alpha}_j^2 - \bar{\beta}_j^2}\right)\right\}\right) \tag{61}$$
which is more than 30 times slower to evaluate than the equivalent NIG function because of the
Bessel function evaluations.
²⁶ The Bessel package of Maechler (2012) is used for this purpose.
References
K. Aas and I.H. Haff. The generalized hyperbolic skew student’s t-distribution. Journal of
Financial Econometrics, 4(2):275–309, 2006.
C. Alexander. Orthogonal garch. In Mastering Risk, volume 2, pages 21–38. Financial Times-
Prentice Hall, 2001.
M.C. Ausin and H.F. Lopes. Time-varying joint distribution through copulas. Computational
Statistics and Data Analysis, 54(11):2383–2399, 2010.
L. Bauwens and S. Laurent. A new class of multivariate skew densities, with application to
generalized autoregressive conditional heteroscedasticity models. Journal of Business and
Economic Statistics, 23(3):346–354, 2005.
L. Bauwens, S. Laurent, and J.V.K. Rombouts. Multivariate garch models: A survey. Journal
of Applied Econometrics, 21(1):79–109, 2006.
T. Bollerslev, R.F. Engle, and J.M. Wooldridge. A capital asset pricing model with time-varying
covariances. Journal of Political Economy, 96(1):116, 1988.
P.H. Boswijk and R. van der Weide. Method of moments estimation of go-garch models. Journal
of Econometrics, 163(1):118–126, 2011.
S.A. Broda and M.S. Paolella. CHICAGO: A fast and accurate method for portfolio risk calculation.
Journal of Financial Econometrics, 7(4):412, 2009.
L. Cappiello, R.F. Engle, and K. Sheppard. Asymmetric correlations in the dynamics of global
equity and bond returns. Journal of Financial Econometrics, 4(4):537–572, 2006.
J.F. Cardoso. Entropic contrasts for source separation: geometry and stability. In Simon Haykin,
editor, Unsupervised adaptive filters, pages 139–190. John Wiley & sons, 2000.
Y. Chen, W. Härdle, and V. Spokoiny. Portfolio value at risk based on independent component
analysis. Journal of Computational and Applied Mathematics, 205(1):594–607, 2007.
Y. Chen, W. Härdle, and V. Spokoiny. GHICA – risk analysis with GH distributions and independent
components. Journal of Empirical Finance, 17(2):255–269, 2010.
P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314,
1994.
C. Croux and K. Joossens. Robust estimation of the vector autoregressive model by a least
trimmed squares procedure. In Paula Brito, editor, COMPSTAT 2008, pages 489–501.
Physica-Verlag HD, 2008. ISBN 978-3-7908-2084-3. URL http://dx.doi.org/10.1007/978-3-7908-2084-3_40.
A.C. Davison and R.L. Smith. Models for exceedances over high thresholds. Journal of the
Royal Statistical Society: Series B (Methodological), 52(3):393–442, 1990.
G.M. de Athayde and R.G. Flôres Jr. Introducing higher moments in the capm: Some basic
ideas. In C. Dunis, editor, Advances in Quantitative Asset Management. Springer, 2000.
S. Demarta and A.J. McNeil. The t copula and related copulas. International Statistical Review,
73(1):111–129, 2005.
R.F. Engle. Dynamic conditional correlation. Journal of Business and Economic Statistics, 20
(3):339–350, 2002.
R.F. Engle and K. Sheppard. Theoretical and empirical properties of dynamic conditional
correlation multivariate garch. NBER Working Paper, 2001.
R.F. Engle, V. Ng, and M. Rothschild. Asset pricing with a factor arch covariance structure:
Empirical estimates for treasury bills. Journal of Econometrics, 45:213–237, 1990.
Hsing Fang and Lai Tsong-Yue. Co-kurtosis and capital asset pricing. Financial Review, 32(2):
293–307, 1997.
C. Fernandez and M.F. Steel. On bayesian modeling of fat tails and skewness. Journal of the
American Statistical Association, 93(441):359–371, 1998.
C. Harvey and A. Siddique. Conditional skewness in asset pricing tests. Journal of Finance, 55
(3):1263–1295, 2000.
A. Hyvärinen and E. Oja. Independent component analysis: Algorithms and applications. Neural
Networks, 13(4-5):411–430, 2000.
T. Jeantheau. Strong consistency of estimators for multivariate arch models. Econometric
Theory, 14(1):70–86, 1998.
H. Joe. Multivariate models and dependence concepts, volume 73. Chapman & Hall/CRC, 1997.
H. Joe, H. Li, and A.K. Nikoloulopoulos. Tail dependence functions and vine copulas. Journal
of Multivariate Analysis, 101(1):252–270, 2010.
M.C. Jones and R. Sibson. What is projection pursuit? Journal of the Royal Statistical Society:
Series A (General), 150(1):1–37, 1987.
M. Junker and A. May. Measurement of aggregate risk with copulas. Econometrics Journal, 8
(3):428–454, 2005.
A. Kraus and R. Litzenberger. Skewness preference and the valuation of risk assets. Journal of
Finance, 31(4):1085 – 1100, 1976.
W.H. Kruskal. Ordinal measures of association. Journal of the American Statistical Association,
53(284):814–861, 1958.
E.G. Learned-Miller and J.W. Fisher III. ICA using spacings estimates of entropy. Journal
of Machine Learning Research, 4:1271–1295, December 2003. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=945365.964306.
DX Li. On default correlation: A copula function approach. Journal of Fixed Income, 9(4):
43–54, 2000.
F. Lindskog, A. Mcneil, and U. Schmock. Kendall’s tau for elliptical distributions. In Credit
Risk: Measurement, Evaluation and Management. Physica-Verlag, 2003.
M. Maechler. Bessel: Bessel – Bessel Functions Computations and Approximations, 2012. URL
http://CRAN.R-project.org/package=Bessel. R package version 0.5-4.
Y. Malevergne and D. Sornette. Testing the gaussian copula hypothesis for financial assets
dependences. Quantitative Finance, 3(4):231–250, 2003.
A.J. Patton. Modelling asymmetric exchange rate dependence. International Economic Review,
47(2):527–556, 2006.
S.A. Ross. The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3):
341–360, 1976.
R.S. Sears and K.C.J. Wei. Asset pricing, higher moments, and the market risk premium: A
note. Journal of Finance, 40:1251–1253, 1985.
A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ.
Paris, 8(1):11, 1959.
Y.K. Tse and A.K.C. Tsui. A multivariate generalized autoregressive conditional heteroscedas-
ticity model with time-varying correlations. Journal of Business and Economic Statistics, 20
(3):351–362, 2002.
R. van der Weide. Go-garch: a multivariate generalized orthogonal garch model. Journal of
Applied Econometrics, 17(5):549–564, 2002.
R. van der Weide. Wake me up before you go-garch. In Computing in Economics and Finance.
Society for Computational Economics, 2004.
A. Zeevi and R. Mashal. Beyond correlation: Extreme co-movements between financial assets.
Columbia University, 2002.
K. Zhang and L. Chan. Efficient factor garch models and factor-dcc models. Quantitative
Finance, 9(1):71–91, 2009.