Machine Learning
Unit-V: Time Series Models of Heteroscedasticity
Dr. Suresh, R
Assistant Professor
Department of Statistics
Bangalore University, Bengaluru-560 056
Contents

1 Time Series Models of Heteroscedasticity
1.1 Introduction
1.2 Some Common Features of Financial Time Series
1.3 ARCH Model
1.3.1 Introduction
1.3.2 ARCH(m) Model
1.3.3 ARCH(1) Model
1.4 GARCH Model
1.4.1 Introduction
1.4.2 GARCH(1,1) Model
1.5 Test for ARCH Effect
1.6 Identifying an ARCH/GARCH Model in Practice
1.7 Maximum Likelihood Estimation
1.7.1 ML Estimation of ARCH Model
1.7.2 ML Estimation of GARCH Model
Syllabus
References
[1] Box, G. E. P., Jenkins, G. M., Reinsel, G. C. and Ljung, G. M., Time Series Analysis: Forecasting and Control, 5/e, Wiley, 2016.
[2] Brockwell, P. J. and Davis, R. A., Introduction to Time Series and Forecasting, 3/e, Springer, Switzerland, 2016.
[3] Chatfield, C. and Xing, H., The Analysis of Time Series: An Introduction with R, 7/e, CRC Press, 2019.
[4] Cryer, J. D. and Chan, K. S., Time Series Analysis with Applications in R, 2/e, Springer, New York, 2008.
[5] Enders, W., Applied Econometric Time Series, 4/e, Wiley, 2015.
[6] Kirchgassner, G., Wolters, J. and Hassler, U., Introduction to Modern Time Series Analysis, 2/e, Springer, Berlin, 2013.
[7] Tsay, R. S., Analysis of Financial Time Series, 3/e, Wiley, New Jersey, 2010.
1 Time Series Models of Heteroscedasticity
1.1 Introduction
In the previous units, our focus was on time series with time-varying mean
processes. We were concerned with stationary and nonstationary variables.
The nonstationary nature of the variables (time series) implied that they had
means that change over time. All models discussed so far use the conditional
expectation to describe the mean development of one or more time series.
The optimal forecast, in the sense that the variance of the forecast errors will
be minimised, is given by the conditional mean of the underlying model.
Here, it is assumed that the residuals are not only uncorrelated but also
homoscedastic, i.e. that the unexplained fluctuations have no dependencies
in the second moments.
In this unit, we are concerned with stationary series, but with conditional
variances that change over time. The models we focus on are the autoregressive conditional heteroscedastic (ARCH) model and its generalized version, the generalized autoregressive conditional heteroscedastic (GARCH) model.
Nobel Prize winner Robert Engle's original work on ARCH was concerned with the volatility (i.e., means changing a lot over time) of inflation.
However, it was applications of the ARCH model to financial time series
that established and consolidated the significance of his contribution. For
this reason, the examples used in this unit will be based on financial time
series.
Financial time series have characteristics that are well represented by
models with dynamic variances. The particular aims of this unit are to discuss the modeling of dynamic variances using the ARCH class of volatility models (note: in statistics we use the variance to measure volatility) and the estimation of these models.
Note:
1. The importance of volatility models stems from the fact that the price
of an option crucially depends on the variance of the underlying se-
curity price. Thus with the surge of derivative markets in the last
decades the application of such models (models of volatility) has seen
a tremendous rise.
2. Another use of volatility models is to assess the risk of an investment.
In the computation of the so-called value at risk (VaR), these models
have become an indispensable tool. In the banking industry, due to
the regulations of the Basel accords, such assessments are particularly relevant for the computation of the required equity capital backing up
assets of different risk categories.
3. In risk management, volatility models provide a simple approach to
calculating the value at risk of a financial position. Volatility also
plays an important role in asset allocation and portfolio optimization.
4. Volatility is an important factor in options trading. Here volatility
means the conditional standard deviation of the underlying asset re-
turn.
5. A special feature of stock volatility is that it is not directly observable. Although volatility is not directly observable, it has some characteristics (refer Section 1.2) that are commonly seen in asset returns.
6. Volatility evolves over time in a continuous manner; that is, volatility jumps are rare.
7. Volatility does not diverge to infinity; that is, volatility varies within some fixed range.

11. The conditional variance of the return is $\sigma_t^2 = \mathrm{Var}(r_t \mid r_{t-1}, r_{t-2}, \ldots) = E(r_t^2 \mid r_{t-1}, r_{t-2}, \ldots)$, because we want to use the past history to forecast the variance. The last equality holds if $E(r_t \mid r_{t-1}, r_{t-2}, \ldots) = 0$, which is true in most cases.
12. The ARCH process has the property of time-varying conditional variance and can therefore capture volatility clustering.

13. ARCH and GARCH models are non-linear models.
Remark: It is pertinent to note that the first differences of most of the
financial time series often exhibit wide swings, or volatility, suggesting that
the variance of financial time series varies over time. We can think of modeling such “varying variance” (modeling changes in variance or volatility). This is where the so-called autoregressive conditional heteroscedasticity (ARCH) model, originally developed by Engle, comes in handy.
These models do not generally lead to better point forecasts of the mea-
sured variable, but may lead to better estimates of the (local) variance.
This, in turn, allows more reliable prediction intervals to be computed and
hence a better assessment of risk.
1.2 Some Common Features of Financial Time Series

2. There are periods when large changes are followed by further large changes and periods when small changes are followed by further small changes. In this case the series are said to display time-varying volatility as well as “clustering” (‘volatility clustering’ or ‘volatility pooling’) of changes. Within these periods (periods of large changes and periods of small changes) volatility seems to be positively autocorrelated. Statistically, volatility clustering implies time-varying conditional variance. (In the case of financial data, for example, large and small errors tend to occur in clusters, i.e., large returns are followed by more large returns, and small returns by more small returns.)
3. The financial time series may seem serially uncorrelated, but it is dependent. (ACFs of the financial time series may suggest no significant serial correlations except for small ones at lags 1 or 2. However, the sample ACFs of some functions of the financial time series, say absolute or squared values, show strong dependence over all lags.)
4. These series display non-normal properties. (We see more observations around the mean and in the tails, i.e., the distribution is more peaked around the mean and has relatively fat tails.) This results in ‘excess kurtosis’, i.e. values of the kurtosis above three. Distributions with these properties are said to be leptokurtic.
5. The financial time series may also be asymmetric(skewed).
6. A feature of most of these financial time series is that in their level form (i.e., mean) they are random walks; that is, they are nonstationary. On the other hand, in first-difference form, they are generally stationary.
1.3 ARCH Model

1.3.1 Introduction

ARMA models were used to model the conditional mean of a process when the conditional variance was constant. Using an AR(1) as an example, we assumed
$$E(X_t \mid X_{t-1}, X_{t-2}, \ldots) = \phi X_{t-1}, \quad \mathrm{Var}(X_t \mid X_{t-1}, X_{t-2}, \ldots) = \mathrm{Var}(a_t) = \sigma_a^2. \tag{1}$$
In many problems, however, the assumption of a constant conditional variance will be violated. Models such as the autoregressive conditionally heteroscedastic, or ARCH, model, first introduced by Engle (1982), were developed to model changes in volatility. These models were later extended to generalized ARCH, or GARCH, models by Bollerslev (1986).
1.3.2 ARCH(m) Model

The first model that provides a systematic framework for volatility modeling is the ARCH model of Engle (1982). The basic idea of ARCH models is as follows. Suppose $r_t = \mu_t + a_t$, where $\mu_t$ is the mean return conditional on $F_{t-1}$, the information available through time $(t-1)$. Then:
a) the shock/innovation $a_t$ of an asset return is serially uncorrelated, but dependent, and

b) the dependence of $a_t$ can be described by a simple quadratic function of its lagged values. Specifically, an ARCH(m) model assumes that
$$a_t = \sigma_t \epsilon_t, \tag{4}$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \cdots + \alpha_m a_{t-m}^2, \tag{5}$$
where $\{\epsilon_t\}$ is a sequence of independent and identically distributed (iid) random variables with mean zero and variance 1, $\alpha_0 > 0$, and $\alpha_i \geq 0$ for $i > 0$. A model for $a_t$ satisfying Equations (4) and (5) is called an autoregressive conditionally heteroscedastic model of order $m$ (ARCH(m)).
Note:
1. The coefficients $\alpha_i$ must satisfy the regularity condition $\alpha_1 + \alpha_2 + \cdots + \alpha_m < 1$ to ensure that the unconditional variance of $a_t$ is finite. This additional constraint ensures that the $a_t$ are covariance stationary with finite unconditional variance $\sigma_a^2$.

2. In practice (for modeling purposes), $\epsilon_t$ is often assumed to follow the standard normal or a standardized Student-$t$ distribution or a generalized error distribution.
1.3.3 ARCH(1) Model

The ARCH(1) model is
$$a_t = \sigma_t \epsilon_t, \tag{6}$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2. \tag{7}$$

The unconditional variance of $a_t$ is obtained as
$$\mathrm{Var}(a_t) = E(a_t^2) = E[E(a_t^2 \mid F_{t-1})] = E(\alpha_0 + \alpha_1 a_{t-1}^2) = \alpha_0 + \alpha_1 E(a_{t-1}^2) = \alpha_0 + \alpha_1 \mathrm{Var}(a_{t-1}) = \alpha_0 + \alpha_1 \mathrm{Var}(a_t),$$
since $\mathrm{Var}(a_{t-1}) = E(a_{t-1}^2)$ and, by stationarity, $\mathrm{Var}(a_{t-1}) = \mathrm{Var}(a_t)$. Hence
$$\sigma_a^2 = \frac{\alpha_0}{1 - \alpha_1}. \tag{9}$$
Since the variance of at must be positive, we require 0 ≤ α1 < 1.
Note: Substituting $\alpha_0 = \sigma_a^2(1 - \alpha_1)$ from (9) into (7), we see that
$$\sigma_t^2 = \sigma_a^2 + \alpha_1 (a_{t-1}^2 - \sigma_a^2), \tag{10}$$
or, equivalently,
$$\sigma_t^2 - \sigma_a^2 = \alpha_1 (a_{t-1}^2 - \sigma_a^2). \tag{11}$$
Hence, the conditional variance of $a_t$ will be above the unconditional variance whenever $a_{t-1}^2$ is larger than the unconditional variance $\sigma_a^2$.
3. The $a_t$ are serially uncorrelated: since for $j > 0$,
$$E(a_t a_{t-j}) = E[E(a_t a_{t-j} \mid F_{t-1})] = E[a_{t-j} E(a_t \mid F_{t-1})] = 0. \tag{12}$$
But the at ’s are not mutually independent since they are interrelated
through their conditional variances. The lack of serial correlation is an
important property that makes the ARCH model suitable for model-
ing asset returns that are expected to be uncorrelated by the efficient
market hypothesis.
4. Unconditional kurtosis of $a_t$: we can show that the unconditional kurtosis of $a_t$ is given by
$$\kappa = \frac{3(1 - \alpha_1^2)}{1 - 3\alpha_1^2}, \qquad \text{provided } 3\alpha_1^2 < 1. \tag{13}$$
This value exceeds 3, the kurtosis of the normal distribution. Hence,
the marginal distribution of at has heavier tails than those of the
normal distribution. Hence the innovation process at in a Gaussian
ARCH(1) model tends to generate more ‘outliers’ than a Gaussian
white noise process. This is in agreement with the empirical finding
that ‘outliers’ appear more often in asset returns than is implied by an iid sequence of normal random variates. This is an additional feature of the ARCH model that makes it useful for modeling financial time series.
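The ARCH(1) properties above can be checked in a minimal simulation sketch (hypothetical parameter values $\alpha_0 = 1$, $\alpha_1 = 0.5$; numpy assumed). It generates $a_t = \sigma_t \epsilon_t$ with $\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2$, then verifies that the sample variance is near $\alpha_0/(1-\alpha_1) = 2$ from (9) and that the sample kurtosis is well above 3, the leptokurtic behaviour predicted by (13) ($\kappa = 9$ for $\alpha_1 = 0.5$).

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Simulate an ARCH(1) series: a_t = sigma_t * eps_t,
    sigma_t^2 = alpha0 + alpha1 * a_{t-1}^2, eps_t ~ iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    a = np.zeros(n)
    sigma2 = alpha0 / (1.0 - alpha1)          # start at the unconditional variance
    for t in range(n):
        a[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = alpha0 + alpha1 * a[t] ** 2  # conditional variance for the next step
    return a

a = simulate_arch1(alpha0=1.0, alpha1=0.5, n=100_000)
sample_var = a.var()                                      # near alpha0/(1-alpha1) = 2
sample_kurt = ((a - a.mean()) ** 4).mean() / a.var() ** 2 # well above 3 (leptokurtic)
print(sample_var, sample_kurt)
```

Note that because the eighth moment of $a_t$ is infinite at $\alpha_1 = 0.5$, the sample kurtosis fluctuates considerably around its theoretical value of 9, but it stays clearly above the normal value 3.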
1.4 GARCH Model
1.4.1 Introduction
The ARCH model has a disadvantage in that it often requires a high lag order m to adequately describe the evolution of volatility over time. An extension of the ARCH model, called the generalized ARCH, or GARCH, model, was introduced by Bollerslev (1986) to overcome this issue. The generalized ARCH or GARCH model is a parsimonious alternative to an ARCH(m) model.
Definition 1.2. For a log return series $r_t$, let $a_t = r_t - \mu_t$ be the innovation at time $t$. Then $a_t$ follows a GARCH(m, s) model if
$$a_t = \sigma_t \epsilon_t, \tag{15}$$
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{m} \alpha_i a_{t-i}^2 + \sum_{j=1}^{s} \beta_j \sigma_{t-j}^2, \tag{16}$$
where $\{\epsilon_t\}$ is a sequence of iid random variables with mean 0 and variance 1, $\alpha_0 > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$, and $\sum_{i=1}^{\max(m,s)} (\alpha_i + \beta_i) < 1$ (with $\alpha_i = 0$ for $i > m$ and $\beta_j = 0$ for $j > s$).
Note:
1. The constraint on αi + βi implies that the unconditional variance of at
is finite, whereas its conditional variance σt2 evolves over time.
2. To carry out inference, $\epsilon_t$ is often assumed to follow a standard normal or standardized Student-$t$ distribution or a generalized error distribution.

3. Equations (15) and (16) reduce to a pure ARCH(m) model if $s = 0$.

4. The $\alpha_i$ and $\beta_j$ are referred to as ARCH and GARCH parameters, respectively.

5. $\{\epsilon_t\}$ is independent of $a_{t-j}$, $j \geq 1$.
6. A GARCH (generalized autoregressive conditionally heteroscedastic)
model uses values of the past squared observations (ARCH terms )
and past variances (GARCH terms) to model the variance at time t.
The simplest and most widely used model in the class of GARCH models
is the GARCH(1, 1) model.
Definition 1.3. The GARCH(1, 1) model is
$$a_t = \sigma_t \epsilon_t, \tag{17}$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \tag{18}$$
where $a_t$ is the shock/innovation of an asset return and $\{\epsilon_t\}$ is a sequence of independent and identically distributed (iid) random variables with mean zero and variance 1, $\alpha_0 > 0$, $0 \leq \alpha_1, \beta_1 \leq 1$ and $\alpha_1 + \beta_1 < 1$.
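Definition 1.3 can be sketched directly as a simulation (hypothetical parameter values $\alpha_0 = 0.4$, $\alpha_1 = 0.1$, $\beta_1 = 0.5$, so $\alpha_1 + \beta_1 = 0.6 < 1$; numpy assumed). The recursion (18) is applied step by step, and the sample variance should settle near the unconditional variance $\alpha_0/(1 - \alpha_1 - \beta_1) = 1$.

```python
import numpy as np

def simulate_garch11(alpha0, alpha1, beta1, n, seed=42):
    """Simulate a GARCH(1,1) series: a_t = sigma_t * eps_t,
    sigma_t^2 = alpha0 + alpha1 * a_{t-1}^2 + beta1 * sigma_{t-1}^2."""
    assert alpha0 > 0 and alpha1 >= 0 and beta1 >= 0 and alpha1 + beta1 < 1
    rng = np.random.default_rng(seed)
    a = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)   # start at the stationary variance
    a[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * a[t - 1] ** 2 + beta1 * sigma2[t - 1]
        a[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return a, sigma2

a, sigma2 = simulate_garch11(alpha0=0.4, alpha1=0.1, beta1=0.5, n=50_000)
print(a.var())   # near alpha0/(1 - alpha1 - beta1) = 1
```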
4. The $a_t$ are serially uncorrelated: since for $j > 0$,
$$E(a_t a_{t-j}) = E[E(a_t a_{t-j} \mid F_{t-1})] = E[a_{t-j} E(a_t \mid F_{t-1})] = 0. \tag{22}$$
But the $a_t$'s are not mutually independent, since they are interrelated through their conditional variances.
5. Unconditional kurtosis of $a_t$: we can show that the unconditional kurtosis of $a_t$ is given by
$$\kappa = \frac{3[1 - (\alpha_1 + \beta_1)^2]}{1 - (\alpha_1 + \beta_1)^2 - 2\alpha_1^2}. \tag{23}$$
This value exceeds 3, the kurtosis of the normal distribution. Consequently, similar to ARCH models, the tail distribution of a GARCH(1, 1) process is heavier than that of a normal distribution.
Remark: These properties also hold for general GARCH models with higher orders, but the arguments become more complicated.
1.5 Test for ARCH Effect

Two tests are commonly used to detect ARCH effects: the McLeod–Li portmanteau test and Engle's Lagrange-multiplier test. In both tests the null hypothesis is that there is no heteroskedasticity, i.e. that there are no ARCH effects.

1. McLeod–Li Test:

(a) Fit an appropriate mean model (e.g., an ARMA model) to the series and obtain the residuals $\hat{a}_t$.

(b) Compute the sample autocorrelation coefficients of the squared residuals,
$$\hat{\rho}_{a_t^2}(k) = \frac{\sum_{t=k+1}^{n} (\hat{a}_t^2 - \hat{\sigma}_a^2)(\hat{a}_{t-k}^2 - \hat{\sigma}_a^2)}{\sum_{t=1}^{n} (\hat{a}_t^2 - \hat{\sigma}_a^2)^2}. \tag{26}$$
(c) Use the Ljung–Box test statistic to test the hypothesis that all correlation coefficients up to order K are simultaneously equal to zero. McLeod and Li (1983) proposed the portmanteau statistic
$$\tilde{Q}_{\hat{a}^2} = n(n+2) \sum_{k=1}^{K} \frac{\hat{\rho}_{a_t^2}^2(k)}{n-k}. \tag{27}$$
Under the null hypothesis this statistic is distributed as $\chi^2$ with K degrees of freedom (McLeod and Li (1983) showed that the statistic $\tilde{Q}_{\hat{a}^2}$ has approximately the $\chi^2$ distribution with K degrees of freedom under the assumption that the ARMA model alone is adequate). The decision rule is to reject the null hypothesis if $\tilde{Q}_{\hat{a}^2} > \chi^2_K(\alpha)$, where $\chi^2_K(\alpha)$ is the upper $100(1-\alpha)$th percentile of $\chi^2_K$, or if the p-value of $\tilde{Q}_{\hat{a}^2}$ is less than $\alpha$, the type-I error.
2. Engle’s Lagrange-Multiplier Test: Engle (1982) proposed a Lagrange-multiplier test. This test rests on an auxiliary regression of the squared residuals against a constant and the lagged values $\hat{a}_{t-1}^2, \hat{a}_{t-2}^2, \ldots, \hat{a}_{t-m}^2$, where the $\hat{a}_t^2$ are again obtained from a preliminary regression, i.e. $\hat{a}_t = r_t - \hat{\mu}_t$.
The auxiliary regression thus is
$$\hat{a}_t^2 = \alpha_0 + \alpha_1 \hat{a}_{t-1}^2 + \cdots + \alpha_m \hat{a}_{t-m}^2 + e_t, \quad t = m+1, \ldots, n \tag{28}$$
(i.e., fit an AR(m) model to $\{\hat{a}_t^2\}$, $t = 1, 2, \ldots, n$). Here $e_t$ denotes the error term of this auxiliary regression. Then the null hypothesis $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$ is tested against the alternative hypothesis $H_1: \alpha_i \neq 0$ for at least one $i$. As a test statistic one can use the coefficient of determination times $n$, i.e. $nR^2$. Therefore the LM test statistic is
$$LM = nR^2. \tag{29}$$
Under $H_0$, this test statistic is asymptotically distributed as $\chi^2$ with $m$ degrees of freedom. The decision rule is to reject the null hypothesis if $LM > \chi^2_m(\alpha)$, where $\chi^2_m(\alpha)$ is the upper $100(1-\alpha)$th percentile of $\chi^2_m$, or if the p-value of $LM$ is less than $\alpha$, the type-I error.
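The McLeod–Li portmanteau statistic of equations (26) and (27) can be sketched with plain numpy (a minimal sketch; the residual series below is hypothetical, simulated from an ARCH(1) recursion so that the test should strongly reject "no ARCH effects"):

```python
import numpy as np

def mcleod_li(resid, K):
    """Portmanteau statistic (27) built from the sample autocorrelations
    of the squared residuals, equation (26)."""
    n = len(resid)
    s = resid ** 2
    s_centered = s - s.mean()           # hat{a}_t^2 - hat{sigma}_a^2
    denom = (s_centered ** 2).sum()
    q = 0.0
    for k in range(1, K + 1):
        rho_k = (s_centered[k:] * s_centered[:-k]).sum() / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# Hypothetical residuals with ARCH effects: a_t = sqrt(1 + 0.5*a_{t-1}^2) * eps_t
rng = np.random.default_rng(1)
a = np.zeros(2000)
for t in range(1, len(a)):
    a[t] = np.sqrt(1.0 + 0.5 * a[t - 1] ** 2) * rng.standard_normal()

q = mcleod_li(a, K=10)
print(q)   # far above the chi^2_10 upper 5% point 18.31: reject "no ARCH effects"
```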
Note:

1. An ARCH(m) model should only ever be applied to a series that has already had an appropriate model fitted, sufficient to leave the residuals looking like discrete white noise. Since we can only tell whether ARCH is appropriate or not by squaring the residuals and examining the ACF, we also need to ensure that the mean of the residuals is zero.

2. These tests can also be useful in a conventional regression setting.
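Engle's LM test, equations (28) and (29), can likewise be sketched with a numpy least-squares fit of the auxiliary regression (a minimal sketch; the residual series is again a hypothetical ARCH(1) simulation, so LM should far exceed the $\chi^2_5$ upper 5% point 11.07):

```python
import numpy as np

def engle_lm(resid, m):
    """Engle's LM test: regress resid_t^2 on a constant and m of its own lags
    (auxiliary regression (28)) and return LM = n * R^2, equation (29)."""
    s = resid ** 2
    y = s[m:]                                   # hat{a}_t^2 for t = m+1, ..., n
    X = np.column_stack([np.ones(len(y))] +
                        [s[m - i:-i] for i in range(1, m + 1)])  # lags 1..m
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    r2 = 1.0 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return len(y) * r2

rng = np.random.default_rng(7)
a = np.zeros(3000)
for t in range(1, len(a)):
    a[t] = np.sqrt(1.0 + 0.5 * a[t - 1] ** 2) * rng.standard_normal()

lm = engle_lm(a, m=5)
print(lm)   # far above the chi^2_5 upper 5% point 11.07: ARCH effects present
```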
1.7 Maximum Likelihood Estimation

1.7.1 ML Estimation of ARCH Model

Under the normality assumption, the likelihood function of an ARCH(m) model is
$$f(a_1, a_2, \ldots, a_n \mid \alpha) = f(a_n \mid F_{n-1}) f(a_{n-1} \mid F_{n-2}) \cdots f(a_{m+1} \mid F_m) f(a_1, a_2, \ldots, a_m \mid \alpha) = \left[\prod_{t=m+1}^{n} \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\!\left(-\frac{a_t^2}{2\sigma_t^2}\right)\right] \times f(a_1, a_2, \ldots, a_m \mid \alpha), \tag{30}$$
where $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_m)'$. Since the exact form of $f(a_1, \ldots, a_m \mid \alpha)$ is complicated, it is commonly dropped from the likelihood, giving the conditional log-likelihood
$$l(a_{m+1}, a_{m+2}, \ldots, a_n \mid \alpha, a_1, a_2, \ldots, a_m) = \sum_{t=m+1}^{n} \left[-\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(\sigma_t^2) - \frac{1}{2}\frac{a_t^2}{\sigma_t^2}\right]. \tag{32}$$
Since the first term $-\frac{1}{2}\ln(2\pi)$ does not involve any parameters, the log-likelihood function becomes
$$l(a_{m+1}, a_{m+2}, \ldots, a_n \mid \alpha, a_1, a_2, \ldots, a_m) = -\sum_{t=m+1}^{n} \frac{1}{2}\left[\ln(\sigma_t^2) + \frac{a_t^2}{\sigma_t^2}\right]. \tag{33}$$
2. Nelson (1991) suggested using the generalized error distribution (GED)
for the estimation.
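The conditional log-likelihood (33) is easy to evaluate in practice. Here is a minimal numpy sketch for the ARCH(1) case (the data and the "true" parameters $\alpha_0 = 1$, $\alpha_1 = 0.5$ are hypothetical); as a sanity check, the true parameters attain a higher conditional log-likelihood on the simulated data than a homoscedastic alternative. A full ML fit would maximize this function numerically over $(\alpha_0, \alpha_1)$.

```python
import numpy as np

def arch1_loglik(alpha0, alpha1, a):
    """Conditional log-likelihood (33) of an ARCH(1) model:
    -sum_{t=2}^{n} 0.5 * (ln sigma_t^2 + a_t^2 / sigma_t^2),
    conditioning on the first observation."""
    sigma2 = alpha0 + alpha1 * a[:-1] ** 2      # sigma_t^2 for t = 2, ..., n
    return -0.5 * np.sum(np.log(sigma2) + a[1:] ** 2 / sigma2)

# Simulate from hypothetical true parameters alpha0 = 1, alpha1 = 0.5
rng = np.random.default_rng(3)
a = np.zeros(5000)
for t in range(1, len(a)):
    a[t] = np.sqrt(1.0 + 0.5 * a[t - 1] ** 2) * rng.standard_normal()

ll_true = arch1_loglik(1.0, 0.5, a)
ll_homo = arch1_loglik(a.var(), 0.0, a)   # constant-variance (alpha1 = 0) benchmark
print(ll_true > ll_homo)                  # the ARCH dynamics fit the data better
```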
1.7.2 ML Estimation of GARCH Model

The likelihood function of a GARCH model can be readily derived for the case of normal innovations. We illustrate the computation for the case of a stationary GARCH(1, 1) model. Extension to the general case is straightforward. Given the parameters $\alpha_0$, $\alpha_1$, and $\beta_1$, the conditional variances can be computed recursively by the formula
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2 \tag{34}$$
for $t \geq 2$, with the initial value $\sigma_1^2$ set, under the stationarity assumption, equal to the stationary unconditional variance $\sigma_a^2 = \dfrac{\alpha_0}{1 - \alpha_1 - \beta_1}$. We use the conditional pdf
$$f(a_t \mid a_{t-1}, a_{t-2}, \ldots, a_1) = \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\!\left(-\frac{a_t^2}{2\sigma_t^2}\right). \tag{35}$$
Iterating this last formula and taking logs gives the following formula for the log-likelihood function:
$$l(\alpha_0, \alpha_1, \beta_1) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{n} \left[\ln(\sigma_t^2) + \frac{a_t^2}{\sigma_t^2}\right]. \tag{37}$$
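The recursion (34) and the log-likelihood (37) translate directly into code. This is a minimal numpy sketch (hypothetical parameters $\alpha_0 = 0.2$, $\alpha_1 = 0.2$, $\beta_1 = 0.6$, and simulated data); as a sanity check, the true parameters give a higher log-likelihood than a constant-variance benchmark, which is the comparison an ML fit exploits when maximizing over $(\alpha_0, \alpha_1, \beta_1)$.

```python
import numpy as np

def garch11_loglik(alpha0, alpha1, beta1, a):
    """Gaussian log-likelihood (37) of a GARCH(1,1) model, with sigma_t^2 from
    recursion (34) and sigma_1^2 = alpha0 / (1 - alpha1 - beta1)."""
    n = len(a)
    sigma2 = np.empty(n)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)   # stationary initial variance
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * a[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(sigma2) + a ** 2 / sigma2))

# Hypothetical data from a GARCH(1,1) with alpha0=0.2, alpha1=0.2, beta1=0.6
rng = np.random.default_rng(5)
a = np.zeros(5000)
s2 = 1.0                                          # stationary variance 0.2/(1-0.8)
for t in range(1, len(a)):
    s2 = 0.2 + 0.2 * a[t - 1] ** 2 + 0.6 * s2
    a[t] = np.sqrt(s2) * rng.standard_normal()

ll_true = garch11_loglik(0.2, 0.2, 0.6, a)
ll_flat = garch11_loglik(a.var(), 0.0, 0.0, a)    # constant-variance benchmark
print(ll_true > ll_flat)
```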
***END***