Time Series Analysis in Python With Statsmodels
Abstract—We introduce the new time series analysis features of scikits.statsmodels. This includes descriptive statistics, statistical tests and several linear model classes: autoregressive, AR, autoregressive moving-average, ARMA, and vector autoregressive models, VAR.

Index Terms—time series analysis, statistics, econometrics, AR, ARMA, VAR, GLSAR, filtering, benchmarking

Introduction

Statsmodels is a Python package that provides a complement to SciPy for statistical computations including descriptive statistics and estimation of statistical models. Beside the initial models, linear regression, robust linear models, generalized linear models and models for discrete data, the latest release of scikits.statsmodels includes some basic tools and models for time series analysis. This includes descriptive statistics, statistical tests and several linear model classes: autoregressive, AR, autoregressive moving-average, ARMA, and vector autoregressive models, VAR. In this article we would like to introduce and provide an overview of the new time series analysis features of statsmodels. In the outlook at the end we point to some extensions and new models that are under development.

Time series data comprises observations that are ordered along one dimension, that is time, which imposes specific stochastic structures on the data. Our current models assume that observations are continuous, that time is discrete and equally spaced, and that we do not have missing observations. This type of data is very common in many fields, in economics and finance.

Consider a linear model y_t = x_t β + ε_t that relates an endogenous variable y_t to regressors x_t with an additive error term ε_t. In the simplest case, the errors are independently and identically distributed. Unbiasedness of OLS requires that the regressors and errors be uncorrelated. If the errors are additionally normally distributed and the regressors are non-random, then the resulting OLS or maximum likelihood estimator (MLE) of β is also normally distributed in small samples. We obtain the same result if we consider the distributions as conditional on x_t when they are exogenous random variables. So far this is independent of whether t indexes time or any other index of observations.

When we have time series, there are two possible extensions that come from the intertemporal linkage of observations. In the first case, past values of the endogenous variable influence the expectation or distribution of the current endogenous variable; in the second case, the errors ε_t are correlated over time. If we have either one case, we can still use OLS or generalized least squares, GLS, to get a consistent estimate of the parameters. If we have both cases at the same time, then OLS is not consistent anymore, and we need to use a non-linear estimator. This case is essentially what ARMA does.

Linear Model with autocorrelated error (GLSAR)

This model assumes that the explanatory variables, regressors, are uncorrelated with the error term, but the error term is an autoregressive process, i.e.

E(x_t ε_t) = 0

ε_t = a_1 ε_{t-1} + a_2 ε_{t-2} + ... + a_k ε_{t-k}
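To make this structure concrete, the following short simulation is a purely illustrative sketch (not from the original text; the AR(1) coefficient 0.8 and the slope β = 2 are arbitrary choices). It generates data in which the regressor is uncorrelated with the error, but the error itself is autocorrelated:

import numpy as np

np.random.seed(12345)
nobs = 500
x = np.random.normal(size=nobs)          # exogenous regressor, independent of the error
beta = 2.0

# AR(1) error term: eps_t = 0.8 * eps_{t-1} + u_t
u = np.random.normal(size=nobs)
eps = np.zeros(nobs)
for t in range(1, nobs):
    eps[t] = 0.8 * eps[t - 1] + u[t]

y = beta * x + eps   # regressors uncorrelated with the error, errors correlated over time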
For the linear model with lagged dependent variables, the autoregressive process AR, we can use OLS to estimate, adding past endog to the exog. The vector autoregressive model (VAR) has the same basic statistical structure, except that we now consider a vector of endogenous variables at each point in time, and it can also be estimated with OLS conditional on the initial information. (The stochastic structure of VAR is richer, because we now also need to take into account that there can be contemporaneous correlation of the errors, i.e. correlation at the same time point but across equations, but still uncorrelated across time.) The second estimation method that is currently available in statsmodels is maximum likelihood estimation. Following the same approach, we can use the likelihood function that is conditional on the first observations. If the errors are normally distributed, then this is essentially equivalent to least squares. However, we can easily extend conditional maximum likelihood to other models, for example GARCH, linear models with generalized autoregressive conditional heteroscedasticity, where the variance depends on the past, or models where the errors follow a non-normal distribution, for example Student-t distributed, which has heavier tails and is sometimes more appropriate in finance.

The second way to treat the problem of initial conditions is to model them together with other observations, usually under the assumption that the process has started far in the past and that the initial observations are distributed according to the long run, i.e. stationary, distribution of the observations. This exact maximum likelihood estimator is implemented in statsmodels for the autoregressive process in statsmodels.tsa.AR, and for the ARMA process in statsmodels.tsa.ARMA.

Autoregressive Moving average model (ARMA)

ARMA combines an autoregressive process of the dependent variable with an error term, moving-average or MA, that includes the present and a linear combination of past error terms. An ARMA(p,q) process is defined as

E(ε_t ε_s) = 0, for t ≠ s

y_t = μ + a_1 y_{t-1} + ... + a_p y_{t-p} + ε_t + b_1 ε_{t-1} + ... + b_q ε_{t-q}

As a simplified notation, this is often expressed in terms of lag polynomials. The estimation then works the same as described before; statsmodels provides estimators for both methods in tsa.ARMA, which will be described in more detail below.

Time series analysis is a vast field in econometrics with a large range of models that extend the basic linear models with the assumption of normally distributed errors in many ways, and it provides a range of statistical tests to identify an appropriate model specification or test the underlying assumptions.

Besides estimation of the main linear time series models, statsmodels also provides a range of descriptive statistics for time series data and associated statistical tests. We include an overview in the next section before describing AR, ARMA and VAR in more detail. Additional results that facilitate the usage and interpretation of the estimated models, for example impulse response functions, are also available.

OLS, GLSAR and serial correlation

Suppose we want to model a simple linear model that links the stock of money in the economy to real GDP and the consumer price index, CPI, example in Greene (2003, ch. 12). We import numpy and statsmodels, load the variables from the example dataset included in statsmodels, transform the data and fit the model with OLS:

import numpy as np
import scikits.statsmodels.api as sm
tsa = sm.tsa  # as shorthand

mdata = sm.datasets.macrodata.load().data
endog = np.log(mdata['m1'])
exog = np.column_stack([np.log(mdata['realgdp']),
                        np.log(mdata['cpi'])])
exog = sm.add_constant(exog, prepend=True)

res1 = sm.OLS(endog, exog).fit()

print res1.summary() provides the basic overview of the regression results. We skip it here to save space. The Durbin-Watson statistic that is included in the summary is very low, indicating that there is a strong autocorrelation in the residuals. Plotting the residuals shows a similar strong autocorrelation.
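As a quick cross-check of that diagnostic (a sketch, not part of the original example), the Durbin-Watson statistic can also be computed directly from the OLS residuals using its textbook definition; values near 2 indicate no first-order autocorrelation, values near 0 strong positive autocorrelation:

resid = res1.resid
dw = np.sum(np.diff(resid)**2) / np.sum(resid**2)
# a value close to 0 points to strong positive serial correlation in the residuals
print dw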
If we assume that the second case, autocorrelated errors, is correct, then we can estimate the model with GLSAR. As an example, let us assume we consider four lags in the autoregressive error.

mod2 = sm.GLSAR(endog, exog, rho=4)
res2 = mod2.iterative_fit()

iterative_fit alternates between estimating the autoregressive process of the error term using tsa.yule_walker and feasible sm.GLS. Looking at the estimation results shows two things: the parameter estimates are very different between OLS and GLS, and the autocorrelation in the residual is close to a random walk:

res1.params
#array([-1.502, 0.43 , 0.886])
res2.params
#array([-0.015, 0.01 , 0.034])
mod2.rho
#array([ 1.009, -0.003, 0.015, -0.028])

This indicates that the short run and long run dynamics might be very different and that we should consider a richer dynamic model, and that the variables might not be stationary and that there might be unit roots.

One common methodology for choosing an ARMA specification is to look at the pattern in the autocorrelation (acf) and partial autocorrelation (pacf) functions. scikits.statsmodels.tsa.arima_process contains a class that provides several properties of ARMA processes and a random process generator. As an example, statsmodels/examples/tsa/arma_plots.py can be used to plot autocorrelation and partial autocorrelation functions for different ARMA models.
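For instance, the residual autocorrelation from the OLS example above can be inspected numerically. The following is a sketch that assumes the acf and pacf helpers are exposed under tsa (they live in statsmodels.tsa.stattools in current releases); the number of lags is an arbitrary choice:

# sample autocorrelation and partial autocorrelation of the OLS residuals
acf_vals = tsa.acf(res1.resid, nlags=20)
pacf_vals = tsa.pacf(res1.resid, nlags=20)
print acf_vals[:5]   # slow decay again points to strong serial dependence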
ARMA Modeling

Statsmodels provides several helpful routines and models for working with Autoregressive Moving Average (ARMA) time-series models, including simulation and estimation code. For example, after importing arima_process as ap from scikits.statsmodels.tsa we can simulate a series

>>> ar_coef = [1, .75, -.25]
>>> ma_coef = [1, -.5]
>>> nobs = 100
>>> y = ap.arma_generate_sample(ar_coef,
...                             ma_coef, nobs)
>>> y += 4 # add in constant

We can then estimate an ARMA model of the series

>>> mod = tsa.ARMA(y)
>>> res = mod.fit(order=(2,1), trend='c',
...               method='css-mle', disp=-1)
>>> res.params
array([ 4.0092, -0.7747, 0.2062, -0.5563])

The estimation method, 'css-mle', indicates that the starting parameters for the optimization are to be obtained from the conditional sum of squares estimator and then the exact likelihood is optimized. The exact likelihood is implemented using the Kalman Filter.
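To compare the conditional and the exact estimator directly, the fit call can be repeated with a different method. This is a sketch, not output from the paper, and it assumes the 'css' and 'mle' options of the method argument behave as in later statsmodels releases:

>>> res_css = mod.fit(order=(2,1), trend='c',
...                   method='css', disp=-1)  # conditional sum of squares only
>>> res_mle = mod.fit(order=(2,1), trend='c',
...                   method='mle', disp=-1)  # exact likelihood via the Kalman Filter
>>> res_css.params, res_mle.params            # the two sets of estimates are usually close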
Filtering

We have recently implemented several filters that are commonly used in economics and finance applications. The three most popular methods are the Hodrick-Prescott, the Baxter-King, and the Christiano-Fitzgerald filters. These can all be viewed as approximations of the ideal band-pass filter; however, discussion of the ideal band-pass filter is beyond the scope of this paper. We will give an overview of each of the methods and then present some usage examples.

The Hodrick-Prescott filter was proposed by Hodrick and Prescott [HPres], though the method itself has been in use across the sciences since at least 1876 [Stigler]. The idea is to separate a time-series y_t into a trend τ_t and a cyclical component ζ_t

y_t = τ_t + ζ_t

The components are determined by minimizing the following quadratic loss function

min_{τ_t} Σ_{t=1}^{T} ζ_t^2 + λ Σ_{t=1}^{T} [(τ_t - τ_{t-1}) - (τ_{t-1} - τ_{t-2})]^2

where τ_t = y_t - ζ_t and λ is the weight placed on the penalty for roughness. Hodrick and Prescott suggest using λ = 1600 for quarterly data. Ravn and Uhlig [RUhlig] suggest λ = 6.25 and λ = 129600 for annual and monthly data, respectively. While there are numerous methods for solving the loss function, our implementation uses scipy.sparse.linalg.spsolve to find the solution to the generalized ridge-regression suggested in Danthine and Girardin [DGirard].
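As an illustration of that last point, the minimization has the closed form τ = (I + λ K'K)^{-1} y, where K is the (T-2) x T second-difference matrix. The following is a minimal sketch of that sparse solve, not the statsmodels implementation itself:

import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import spsolve

def hp_trend(y, lamb=1600):
    """Return HP trend and cycle by solving (I + lamb*K'K) tau = y."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # K is the (T-2) x T second-difference operator
    K = (sparse.eye(T - 2, T, k=0)
         - 2 * sparse.eye(T - 2, T, k=1)
         + sparse.eye(T - 2, T, k=2))
    A = (sparse.eye(T, T) + lamb * (K.T * K)).tocsc()
    tau = spsolve(A, y)
    return tau, y - tau   # trend and cyclical component

For quarterly data one would call hp_trend(series, lamb=1600), mirroring the default suggested above.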
Baxter and King [BKing] propose an approximate band-pass filter that deals explicitly with the periodicity of the business cycle. By applying their band-pass filter to a time-series y_t, they produce a series y_t^* that does not contain fluctuations at frequencies higher or lower than those of the business cycle. Specifically, in the time domain the Baxter-King filter takes the form of a symmetric moving average

y_t^* = Σ_{k=-K}^{K} a_k y_{t-k}

where a_k = a_{-k} for symmetry and Σ_{k=-K}^{K} a_k = 0, such that the filter has trend elimination properties. That is, series that contain quadratic deterministic trends or stochastic processes that are integrated of order 1 or 2 are rendered stationary by application of the filter. The filter weights a_k are given as follows

a_j = B_j + θ for j = 0, ±1, ±2, ..., ±K

B_0 = (ω_2 - ω_1) / π

B_j = (1 / (π j)) (sin(ω_2 j) - sin(ω_1 j)) for j = ±1, ±2, ..., ±K

where θ is a normalizing constant such that the weights sum to zero

θ = - Σ_{j=-K}^{K} B_j / (2K + 1)

and

ω_1 = 2π / P_H, ω_2 = 2π / P_L

with the periodicity of the low and high cut-off frequencies given by P_L and P_H, respectively. Following Burns and Mitchell's pioneering work, which suggests that US business cycles last from 1.5 to 8 years, Baxter and King suggest using P_L = 6 and P_H = 32 for quarterly data, or 1.5 and 8 for annual data. The authors suggest setting the lead-lag length of the filter, K, to 12 for quarterly data. The transformed series will be truncated on either end by K. Naturally the choice of these parameters depends on the available sample and the frequency band of interest.
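These weights are simple enough to compute directly; the following sketch (illustrative only, not the statsmodels implementation) builds them for quarterly data with the suggested defaults:

import numpy as np

def bk_weights(K=12, PL=6, PH=32):
    """Baxter-King moving-average weights a_{-K},...,a_K for the band [PL, PH]."""
    w1 = 2 * np.pi / PH
    w2 = 2 * np.pi / PL
    j = np.arange(-K, K + 1)
    B = np.empty(2 * K + 1)
    B[j == 0] = (w2 - w1) / np.pi
    jj = j[j != 0]
    B[j != 0] = (np.sin(w2 * jj) - np.sin(w1 * jj)) / (np.pi * jj)
    theta = -B.sum() / (2 * K + 1)      # normalization so the weights sum to zero
    return B + theta

a = bk_weights()
# the filtered series is the moving average y*_t = sum_k a_k * y_{t-k},
# e.g. via np.convolve(y, a, mode='valid'), losing K observations at each end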
The last filter that we currently provide is that of Christiano and Fitzgerald [CFitz]. The Christiano-Fitzgerald filter is again a weighted moving average. However, their filter is asymmetric about t and operates under the (generally false) assumption that y_t follows a random walk. This assumption allows their filter to approximate the ideal filter even if the exact time-series model of y_t is not known. The implementation of their filter involves the calculation of the weights in

y_t^* = B_0 y_t + B_1 y_{t+1} + ... + B_{T-1-t} y_{T-1} + B̃_{T-t} y_T + B_1 y_{t-1} + ... + B_{t-2} y_2 + B̃_{t-1} y_1

for t = 3, 4, ..., T-2, where

B_j = (sin(j b) - sin(j a)) / (π j), j ≥ 1

B_0 = (b - a) / π, a = 2π / P_U, b = 2π / P_L

B̃_{T-t} and B̃_{t-1} are linear functions of the B_j's, and the values for t = 1, 2, T-1, and T are also calculated in much the same way. See the authors' paper or our code for the details. P_U and P_L are as described above with the same interpretation.
Figure 2: Unfiltered Inflation and Unemployment Rates 1959Q4-2009Q1
[Figures 3-5 show the corresponding Hodrick-Prescott, Baxter-King, and Christiano-Fitzgerald filtered series; see the code below.]
Below we demonstrate the API and resultant filtered series for each method. We use series for unemployment and inflation to demonstrate (Figure 2). They are traditionally thought to have a negative relationship at business cycle frequencies.

>>> from scipy.signal import lfilter
>>> data = sm.datasets.macrodata.load()
>>> infl = data.data.infl[1:]
>>> # get 4 qtr moving average
>>> infl = lfilter(np.ones(4)/4, 1, infl)[4:]
>>> unemp = data.data.unemp[1:]

To apply the Hodrick-Prescott filter to the data (Figure 3), we can do

>>> infl_c, infl_t = tsa.filters.hpfilter(infl)
>>> unemp_c, unemp_t = tsa.filters.hpfilter(unemp)

The Baxter-King filter (Figure 4) is applied as

>>> infl_c = tsa.filters.bkfilter(infl)
>>> unemp_c = tsa.filters.bkfilter(unemp)

The Christiano-Fitzgerald filter is similarly applied (Figure 5)

>>> infl_c, infl_t = tsa.filters.cfilter(infl)
>>> unemp_c, unemp_t = tsa.filters.cfilter(unemp)
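To reproduce plots like Figures 2-5, the cyclical components can simply be plotted against each other; a minimal matplotlib sketch (not taken from the paper) would be:

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> ax.plot(infl_c, label='inflation (cycle)')
>>> ax.plot(unemp_c, label='unemployment (cycle)')
>>> ax.legend()
>>> plt.show()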
We also provide for another frequent need of those who work with time-series data of varying observational frequency: that of benchmarking. Benchmarking is a kind of interpolation that involves creating a high-frequency dataset from a low-frequency one in a consistent way. The need for benchmarking arises when one has a low-frequency series that is perhaps annual and is thought to be reliable, and the researcher also has a higher frequency series that is perhaps quarterly or monthly. A benchmarked series is a high-frequency series consistent with the benchmark of the low-frequency series.

We have implemented Denton's modified method, originally proposed by Denton [Denton] and improved by Cholette [Cholette]. To take the example of turning an annual series into a quarterly one, Denton's method entails finding a benchmarked series X_t that solves

min_{X_t} Σ_{t=2}^{T} (X_t / I_t - X_{t-1} / I_{t-1})^2
subject to

Σ_{t ∈ year y} X_t = A_y,  y = {1, ..., β}

That is, the sum of the benchmarked series must equal the annual benchmark in each year. In the above, A_y is the annual benchmark for year y, I_t is the high-frequency indicator series, and β is the last year for which the annual benchmark is available. If T > 4β, then extrapolation is performed at the end of the series. To take an example, given the US monthly industrial production index and quarterly GDP data from 2009 and 2010, we can construct a benchmarked monthly GDP series

>>> iprod_m = np.array([ 87.4510, 86.9878, 85.5359,
        84.7761, 83.8658, 83.5261, 84.4347,
        85.2174, 85.7983, 86.0163, 86.2137,
        86.7197, 87.7492, 87.9129, 88.3915,
        88.7051, 89.9025, 89.9970, 90.7919,
        90.9898, 91.2427, 91.1385, 91.4039,
        92.5646])
>>> gdp_q = np.array([14049.7, 14034.5, 14114.7,
        14277.3, 14446.4, 14578.7, 14745.1,
        14871.4])
>>> gdp_m = tsa.interp.dentonm(iprod_m, gdp_q,
...                            freq="qm")
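For intuition about the optimization behind dentonm, the quadratic program above can also be solved directly as a linear system (first-order conditions plus constraints). The following is a simplified sketch for the annual-to-quarterly case with a sum constraint; it is illustrative only and does not reproduce the modified Denton/Cholette adjustments that dentonm applies:

import numpy as np

def denton_sketch(indicator, benchmarks):
    """Distribute annual benchmarks over quarters, guided by a quarterly indicator."""
    I = np.asarray(indicator, dtype=float)   # quarterly indicator, length T = 4*n_years
    A = np.asarray(benchmarks, dtype=float)  # annual benchmarks, length n_years
    T, n = len(I), len(A)
    # objective: sum_t (X_t/I_t - X_{t-1}/I_{t-1})^2 = || D W X ||^2
    W = np.diag(1.0 / I)
    D = np.eye(T - 1, T, k=1) - np.eye(T - 1, T, k=0)   # first-difference matrix
    M = 2.0 * W.dot(D.T).dot(D).dot(W)
    # constraints: the four quarters of each year must sum to the annual benchmark
    C = np.kron(np.eye(n), np.ones((1, 4)))
    # solve the KKT system [[M, C'], [C, 0]] [X, lam] = [0, A]
    kkt = np.vstack([np.hstack([M, C.T]),
                     np.hstack([C, np.zeros((n, n))])])
    rhs = np.concatenate([np.zeros(T), A])
    X = np.linalg.solve(kkt, rhs)[:T]
    return X

The resulting series satisfies the yearly constraints by construction (C.dot(X) equals the benchmarks up to numerical precision), while moving as smoothly as possible relative to the indicator.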
References

[BKing] Baxter, M. and King, R.G. 1999. "Measuring Business Cycles: Approximate Band-pass Filters for Economic Time Series." Review of Economics and Statistics, 81.4, 575-93.
[Cholette] Cholette, P.A. 1984. "Adjusting Sub-annual Series to Yearly Benchmarks." Survey Methodology, 10.1, 35-49.
[CFitz] Christiano, L.J. and Fitzgerald, T.J. 2003. "The Band Pass Filter." International Economic Review, 44.2, 435-65.
[DGirard] Danthine, J.P. and Girardin, M. 1989. "Business Cycles in Switzerland: A Comparative Study." European Economic Review, 33.1, 31-50.
[Denton] Denton, F.T. 1971. "Adjustment of Monthly or Quarterly Series to Annual Totals: An Approach Based on Quadratic Minimization." Journal of the American Statistical Association, 66.333, 99-102.
[HPres] Hodrick, R.J. and Prescott, E.C. 1997. "Postwar US Business Cycles: An Empirical Investigation." Journal of Money, Credit, and Banking, 29.1, 1-16.
[RUhlig] Ravn, M.O. and Uhlig, H. 2002. "On Adjusting the Hodrick-Prescott Filter for the Frequency of Observations." Review of Economics and Statistics, 84.2, 371-6.
[Stigler] Stigler, S.M. 1978. "Mathematical Statistics in the Early States." Annals of Statistics, 6, 239-65.
[Lütkepohl] Lütkepohl, H. 2005. A New Introduction to Multiple Time Series Analysis. Springer.
Figure 6: VAR sample autocorrelation

Figure 9: VAR forecast error variance decomposition

Various tests, such as testing for Granger causality, can be carried out using the results object:

>>> res.test_causality('realinv', 'realcons')
H_0: ['realcons'] do not Granger-cause realinv
Conclusion: reject H_0 at 5.00% significance level
{'conclusion': 'reject',
 'crit_value': 3.0112857238108273,
 ...
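The VAR results object used here comes from a part of the paper not reproduced above. A hypothetical setup along the same lines might look like the sketch below; the variable choices, the log-difference transformation, and the names keyword are assumptions for illustration, not taken from the original text:

>>> from scikits.statsmodels.tsa.api import VAR
>>> mdata = sm.datasets.macrodata.load().data
>>> names = ['realgdp', 'realcons', 'realinv']
>>> data = np.column_stack([mdata[name] for name in names])
>>> data = np.diff(np.log(data), axis=0)   # growth rates
>>> # hypothetical: the names keyword may differ between releases; with pandas,
>>> # passing a DataFrame with named columns serves the same purpose
>>> res = VAR(data, names=names).fit(2)    # VAR(2), estimated equation by equation with OLS
>>> res.test_causality('realinv', 'realcons')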
Conclusions
statsmodels development over the last few years has been
focused on building correct and tested implementations of
the standard suite of econometric models available in other
statistical computing environments, such as R. However, there