Stationary Process
Basic Concepts
A time series is stationary if the properties of the time series (i.e. the mean,
variance, etc.) are the same when measured from any two starting points in
time. Time series which exhibit a trend or seasonality are clearly not
stationary.
We can make this definition more precise by first laying down a statistical
framework for further discussion.
Definitions
Definition 1: A stochastic process (aka a random process) is a
collection of random variables ordered by time.
In economics, GDP and corporate profits (by year) can be modeled as
stochastic processes. In biology the number of elephants in the wild, in
meteorology the average temperature of the planet, in medicine the number
of Ebola cases, etc. can all be modeled as stochastic processes.
Thus, if we are interested in GDP from 2001 until 2015, we can define the
random variables yi = the GDP in 2000 + i, and so the series y1, y2, …, y15 is a
stochastic process.
Corresponding to the individual populations of the random variables in a
stochastic process are the samples for each random variable. Any such
realization of samples is called a time series. Note that the sample of each
random variable in a time series contains just one element.
Definition 2: A stochastic process is stationary if the mean, variance and autocovariance are all constant; i.e. there are constants μ, σ² and γk so that for all i, E[yi] = μ, var(yi) = E[(yi – μ)²] = σ² and, for any lag k, cov(yi, yi+k) = E[(yi – μ)(yi+k – μ)] = γk.
A time series is stationary if the above properties hold for the time series
(in the same way as we extend properties of a population to its samples). We
will make this more precise shortly.
Observation: The above definition of stationary is what is usually
called weakly stationary, but fortunately it is sufficient for our purposes.
A stochastic process is strictly stationary if not only the mean, variance and autocovariances, but all the properties (i.e. moments) of its distribution, are time-invariant.
Example
Example 1: Determine whether the Dow Jones closing averages for the month of October 2015, as shown in columns A and B of Figure 1, form a stationary time series.
Figure 1 – Dow Jones Time Series
As you can see from Figure 1, there is an upward trend to the data. This is an
indication that the time series is not stationary.
We now take the first differences of the Dow Jones closing averages on
consecutive days, as shown in Figure 2.
Figure 2 – Differences of Lag 1
Here cell N5 contains the formula =M5-M4 and similarly for the other cells
in column N. This time the chart shows what looks like a random pattern.
This is indicative of a stationary time series.
Autocorrelation Function
Definitions
Definition 1: The autocorrelation function (ACF) at lag k, denoted ρk, of a
stationary stochastic process, is defined as ρk = γk/γ0 where γk = cov(yi, yi+k) for any i.
Note that γ0 is the variance of the stochastic process.
Definition 2: The mean of a time series y1, …, yn is ȳ = (y1 + ⋯ + yn)/n. The autocovariance of the time series at lag k is sk = (1/n)·Σ(yi – ȳ)(yi+k – ȳ), where the sum runs from i = 1 to n – k, and the autocorrelation function (ACF) at lag k, for k ≥ 0, is defined by rk = sk/s0.
Figure 1 – ACF at lag 2
The formulas for calculating s2 and r2 using the usual COVARIANCE.S and
CORREL functions are shown in cells G4 and G5.
The formulas for s0, s2, and r2 from Definition 2 are shown in cells G8, G11, and G12
(along with an alternative formula in G13). Note that the values for s2 in cells E4 and E11 are not too different, and the same is true of the values for r2 shown in cells E5 and E12; the larger the sample, the more similar these values are likely to be.
Worksheet Functions
Real Statistics Functions: The Real Statistics Resource Pack supplies the following
functions:
ACF(R1, k) = the ACF value at lag k for the time series in range R1
ACVF(R1, k) = the autocovariance at lag k for the time series in range R1
Note that ACF(R1, k) is equivalent to
=SUMPRODUCT(OFFSET(R1,0,0,COUNT(R1)-k)-AVERAGE(R1),OFFSET(R1,k,0,COUNT(R1)-k)-AVERAGE(R1))/DEVSQ(R1)
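For readers working outside Excel, the same calculation can be sketched in a few lines of Python (an illustration, not part of the Real Statistics software); it mirrors the definition above, dividing by n rather than n–k:

import numpy as np

def acf(y, k):
    # sample ACF at lag k, dividing by n as in Definition 2,
    # matching the SUMPRODUCT/DEVSQ formula above
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    s0 = np.sum((y - ybar) ** 2) / n
    sk = np.sum((y[:n - k] - ybar) * (y[k:] - ybar)) / n
    return sk / s0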
Observations
There are theoretical advantages to using division by n instead of n–k in the definition of sk, namely that the covariance and correlation matrices will always be non-negative definite (see Positive Definite Matrices).
Even though the definition of autocorrelation is slightly different from that of
correlation, ρk (or rk) still takes a value between -1 and 1, as we see in Property 2.
Properties
Property 1: For any stationary process, γ0 ≥ |γi| for any i.
Proof: See Autocorrelation Proof below.
Property 2: For any stationary process, |ρi| ≤ 1 (i.e. -1 ≤ ρi ≤ 1) for any i > 0.
Proof: By Property 1, γ0 ≥ |γi| for any i. Since ρi = γi/γ0 and γ0 ≥ 0 (actually γ0 > 0 since we are assuming that ρi is well-defined), it follows that |ρi| = |γi|/γ0 ≤ 1.
Observation: A rule of thumb is to carry out the above process for lag = 1 to n/4 or n/3, which for the above data is 22/4 ≈ 6 or 22/3 ≈ 7. Our goal is to see whether by this time the ACF is significant (i.e. statistically different from zero). We can do this by using the following property.
Tests
Property 3 (Bartlett): In large samples, if a time series of size n is purely random, then for all k the sample autocorrelation rk is approximately normally distributed with mean 0 and variance 1/n, i.e. rk ∼ N(0, 1/n).
Example 3: Determine whether the ACF at lag 7 is significant for the data from
Example 2.
As we can see from Figure 3, the critical value for the test in Property 3 is .417866.
Since r7 = .031258 < .417866, we conclude that ρ7 is not significantly different from
zero.
Example 4: Use the Box-Pierce and Ljung-Box statistics to determine whether the
ACF values in Example 2 are statistically equal to zero for all lags less than or equal
to 5 (the null hypothesis).
The results are shown in Figure 4.
Figure 4 – Box-Pierce and Ljung-Box Tests
We see from these tests that ACF(k) is significantly different from zero for at least
one k ≤ 5, which is consistent with the correlogram in Figure 2.
Test worksheet functions
Real Statistics Functions: The Real Statistics Resource Pack provides the following
functions to perform the tests described by the above properties.
BARTEST(r, n, lag) = p-value of Bartlett’s test for correlation coefficient r based
on a time series of size n for the specified lag.
BARTEST(R1,, lag) = BARTEST(r, n, lag) where n = the number of elements in
range R1 and r = ACF(R1,lag)
PIERCE(R1,,lag) = Box-Pierce statistic Q for range R1 and the specified lag
BPTEST(R1,,lag) = p-value for the Box-Pierce test for range R1 and the
specified lag
LJUNG(R1,,lag) = Ljung-Box statistic Q for range R1 and the specified lag
LBTEST(R1,,lag) = p-value for the Ljung-Box test for range R1 and the
specified lag
In the above functions where the second argument is missing, the test is performed
using the autocorrelation coefficient (ACF). If the value assigned instead is 1 or "pacf", then the test is performed using the partial autocorrelation coefficient (PACF) as described in the next section. Actually, if the second argument takes any value except 1 or "pacf", then the ACF value is used.
For example, BARTEST(.303809,22,7) = .07708 for Example 3 and LBTEST(B4:B25,"acf",5) = 1.81E-06 for Example 4.
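To make the Q statistics concrete, here is a minimal Python sketch (illustrative only; scipy is assumed available) of the Ljung-Box statistic for lags 1 through h; the Box-Pierce version simply replaces the n(n+2)/(n–k) weights by n:

import numpy as np
from scipy import stats

def ljung_box(y, h):
    # Ljung-Box Q for lags 1..h; under the null hypothesis that all
    # autocorrelations are zero, Q ~ chi-square with h degrees of freedom
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    s0 = np.sum((y - ybar) ** 2) / n
    r = np.array([np.sum((y[:n - k] - ybar) * (y[k:] - ybar)) / n / s0
                  for k in range(1, h + 1)])
    q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, h + 1)))
    return q, stats.chi2.sf(q, h)   # statistic and p-value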
Autocorrelation Proof
Property 1: For any stationary process, γ0 ≥ |γi| for any i
Proof: For any stationary process yi with mean µ, define zi = yi – µ. Then it is easy to see that zi is a stationary process with mean zero. Also cov(zi, zi+k) = cov(yi, yi+k) = γk (including the case where k = 0), which means that it is sufficient to prove the property in the case where the mean is zero.
Now, for any real number c, (zi – c·zi+k)² ≥ 0, and so
0 ≤ E[(zi – c·zi+k)²] = E[zi²] – 2c·E[zi·zi+k] + c²·E[zi+k²] = γ0 – 2cγk + c²γ0
and so
2cγk ≤ (1 + c²)γ0
If γk ≥ 0, let c = 1, while if γk < 0, let c = -1. Then 2|γk| ≤ 2γ0, i.e. γ0 ≥ |γk|, which completes the proof.
Partial Autocorrelation Function
Recall that the partial correlation of y with x1, controlling for x2, x3, x4, can be calculated as the correlation between the residuals of the regression of y on x2, x3, x4 and the residuals of the regression of x1 on x2, x3, x4.
For a time series, the hth order partial autocorrelation is the partial correlation of yi with yi-h, conditional on yi-1, …, yi-h+1. It can be obtained from the Yule-Walker equations, as follows. Let Γk be the k × k autocovariance matrix Γk = [vij] where vij = γ|i-j| and let δk be the k × 1 column vector δk = [γi]. We also define π0 = 1 and πki to be the ith element in the vector Γk^-1 δk, and so πk = πkk.
Provided γ0 > 0, the partial correlation function of order k can equivalently be computed as the kth element in the column matrix Σk^-1 τk.
Here, Σk is the k × k autocorrelation matrix Σk = [ωij] where ωij = ρ|i-j| and τk is
the k × 1 column vector τk = [ρi].
Note that if γ0 > 0, then Σk and Γk are invertible for all k.
The partial autocorrelation function (PACF) of order k, denoted pk, of a time series is defined in a similar manner as the last element in the matrix Rk^-1 Ck. Here Rk is the k × k matrix Rk = [sij] where sij = r|i-j| and Ck is the k × 1 column vector Ck = [ri]. We also define p0 = 1 and pki to be the ith element in the matrix Rk^-1 Ck, and so pk = pkk.
These values can also be calculated from the autocovariance matrix of the time series
in the same manner as described above for stochastic processes.
Observation: Let Y be the k × 1 vector Y = [y1 y2 … yk]T and Z = [y1–µ y2–µ … yk–
µ]T where µ = E[Y]. Then cov(Y) = E[ZZT], which is equal to
the k × k autocovariance matrix Γk = [vij] where vij = γ|i-j|.
Example 1: Calculate PACF for lags 1 to 7 for Example 2 of Autocorrelation
Function.
The result is shown in Figure 1.
Figure 1 – PACF
We now show how to calculate PACF(4) in Figure 2. The calculations of the other PACF values are similar.
Figure 2 – Calculation of PACF(4)
First, we note that range R4:U7 of Figure 2 contains the autocovariance matrix of order 4. This is a symmetric matrix, all of whose values come from range E4:E7 of Figure 1. The values on the main diagonal are s0, the values on the diagonal above and below the main diagonal are s1, the values on the diagonal two units away are s2 and, finally, the values in the upper right and lower left corners of the matrix are s3.
Range R9:R12 is identical to range E5:E8. The values in range T9:T12 can now be
calculated using the array formula
=MMULT(MINVERSE(R4:U7),R9:R12)
The values in range T9:T12 are the p4i values, and so we see that PACF(4) = p4 = -.06685 (cell T12).
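The same Yule-Walker calculation is easy to express in Python (a sketch, not the Real Statistics implementation): build the autocovariance matrix, solve the linear system, and take the last element of the solution.

import numpy as np

def pacf(y, k):
    # PACF at lag k: solve the k x k Yule-Walker system whose matrix is the
    # autocovariance matrix (range R4:U7 above) and whose right-hand side is
    # the vector of autocovariances s1..sk (range R9:R12); PACF(k) is the
    # last element, as in =MMULT(MINVERSE(R4:U7),R9:R12)
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    s = [np.sum((y[:n - j] - ybar) * (y[j:] - ybar)) / n for j in range(k + 1)]
    gamma = np.array([[s[abs(i - j)] for j in range(k)] for i in range(k)])
    return np.linalg.solve(gamma, np.array(s[1:]))[-1]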
Observation: See Correlogram for information about the standard error and
confidence intervals of the pk, as well as how to create a PACF correlogram that
includes the confidence intervals.
Observation: We see from the chart in Figure 1 that the PACF values for lags 2, 3, … are close to zero. As can be seen in Partial Autocorrelation for an AR(p) Process, this is typical for a time series derived from an autoregressive process. Note too that we can use Property 3 of Autocorrelation Function to test whether the PACF values for lags 2 and beyond are statistically equal to zero (see Figure 3).
Figure 3 – Bartlett’s test for PACF
The test shows that PACF(2) is not significantly different from zero. Note that, where applicable, we can also use Properties 4 and 5 of Autocorrelation Function to test PACF values.
Real Statistics Function: The Real Statistics Resource Pack supplies the following
function where R1 is a column range containing time series data:
PACF(R1, k) – the PACF value at lag k
The following array functions are also provided
ACOV(R1, k) – the autocovariance matrix at lag k
ACORR(R1, k) – the autocorrelation matrix at lag k
Property 1: The autocovariance matrix is non-negative definite.
Proof: Let Y = [y1 y2 … yk]T and let Γ be the k × k autocovariance matrix Γ =
[vij] where vij = γ|i-j| . As we observed previously, Γ = E[ZZT].
Now let X be any k × 1 vector. Then
XᵀΓX = XᵀE[ZZᵀ]X = E[(XᵀZ)(ZᵀX)] = E[(XᵀZ)²] ≥ 0
which establishes the property.
White Noise
A purely random time series (white noise) takes the form yi = μ + εi where the εi are independent and identically distributed with mean 0 and variance σ². Clearly, E[yi] = μ, var(yi) = σ² and cov(yi, yj) = 0 for i ≠ j. Since these values are constants, this type of time series is stationary. Also note that ρh = 0 for all h > 0.
Example 1: Simulate 300 white noise data elements with mean zero.
Using the formula =NORM.S.INV(RAND()) we can generate a sample of 300
white noise elements, as displayed in Figure 1.
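The same simulation is easy to reproduce in Python (illustrative only; the 1.96/√n limits come from Bartlett's approximation in Property 3 of Autocorrelation Function):

import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(300)            # like =NORM.S.INV(RAND())
n, bound = len(e), 1.96 / np.sqrt(300)  # 95% limits under Bartlett's approximation
s0 = np.sum((e - e.mean()) ** 2) / n
r = [np.sum((e[:n - k] - e.mean()) * (e[k:] - e.mean())) / n / s0
     for k in range(1, 41)]
print([k + 1 for k, rk in enumerate(r) if abs(rk) > bound])  # expect about 2 of 40 lags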
Figure 2 shows 40 values for rk. We would expect that about 40(.05) = 2 of these values would be outside the 95% confidence interval. In fact, two ACF values are outside this range, namely r9 = .11842 and r19 = .13366.
Using the Ljung-Box test, we see that none of the 40 ACF values is significantly different from zero.
Random Walk
A random walk is a time series of the form yi = yi-1 + δ + εi where the εi form a white noise process. If δ = 0, then the random walk is said to be without drift, while if δ ≠ 0, then the random walk is with drift (i.e. with drift equal to δ).
It is easy to see that for i > 0, yi = y0 + δi + ε1 + ⋯ + εi. It then follows that E[yi] = y0 + δi and var(yi) = σ²i. The variance values are not constants but vary with time i, and so this type of time series is not stationary. Also, the mean values are constant only for a random walk without drift.
Note too that since cov(εi, εj) = 0 for i ≠ j, it follows that cov(yi, yj) = σ²·min(i, j).
Note that the first difference zi = yi – yi-1 of a random walk is stationary since it takes the form zi = δ + εi, which is white noise around the constant δ.
Plot
Example 1: Graph the random walk with drift yi = 1 + yi-1 + εi where the εi ∼ N(0, .5).
The graph is shown in Figure 1. All the cells in column B contain the formula =NORM.INV(RAND(),0,.5), cell C4 contains the formula =1+B4 and cell C5 contains the formula =1+B5+C4.
As we can see, the graph shows a clear upward trend and the ACF shows a slow
descent.
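A hedged Python sketch of the same simulation (drift δ = 1, matching the worksheet formulas above):

import numpy as np

rng = np.random.default_rng(1)
e = rng.normal(0, 0.5, 100)  # epsilon values, like =NORM.INV(RAND(),0,.5)
y = np.cumsum(1 + e)         # y_i = 1 + y_(i-1) + e_i with y_0 = 0, i.e. drift = 1
z = np.diff(y)               # first differences z_i = 1 + e_i are stationary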
Figure 2 – First differences of a random walk
Deterministic Trend
A time series with a (linear) deterministic trend can be modeled as yi = μ + δi + εi, where εi is white noise.
Now E[yi] = μ + δi and var(yi) = σ², and so while the variance is a constant, the mean varies with time i; consequently, this type of time series is also not stationary.
These types of time series can be transformed into a stationary time series
by detrending, i.e. by setting zi = yi – δi. In this case zi = μ + εi, which is a purely
random time series.
In a similar fashion, we can speak about a quadratic deterministic trend (yi = μ + δ1i + δ2i² + εi) or various other varieties of deterministic trends.
The types of random walks described previously are said to have a stochastic trend.
We can also have random walks with a deterministic trend. These take the form yi = δi + yi-1 + εi, where δ is a constant. These are not stationary and require both differencing and detrending to be transformed into a stationary time series.
Example 1: Graph the time series with deterministic trend yi = i + εi where the εi ∼ N(0,1).
The graph is shown in Figure 1. All the cells in column B contain the formula
=NORM.S.INV(RAND()) and cell C4 contains the formula =A4+B4 (and similarly
for the other cells in column C).
As we can see, once again the graph shows a clear upward trend and the ACF shows
a slow descent.
Figure 2 – Detrending
Dickey-Fuller Test
We consider the stochastic process of form yi = φyi-1 + εi where |φ| ≤ 1 and εi is white noise. If |φ| = 1, we have what is called a unit root. In particular, if φ = 1, we have a random walk (without drift), which is not stationary.
In fact, if |φ| = 1, the process is not stationary, while if |φ| < 1, the process is
stationary. We won’t consider the case where |φ| > 1 further since in this case the
process is called explosive and increases over time.
This process is a first-order autoregressive process, AR(1), which we study in more
detail in Autoregressive Processes. We will also see why such processes without a
unit root are stationary and why the term “root” is used.
The Dickey-Fuller test is a way to determine whether the above process has a unit root. The approach used is quite straightforward. First calculate the first difference:
yi – yi-1 = (φ – 1)yi-1 + εi
If we use the delta operator, defined by Δyi = yi – yi-1, and set β = φ – 1, then the equation becomes the linear regression equation
Δyi = βyi-1 + εi
where β ≤ 0, and so the test for φ is transformed into a test that the slope parameter β = 0. Thus, we have a one-tailed test (since β can't be positive) where
H0: β = 0 (equivalent to φ = 1)
H1: β < 0 (equivalent to φ < 1)
Under the alternative hypothesis, if b is the ordinary least squares (OLS) estimate of β, then φ̂ = 1 + b is the OLS estimate of φ.
We can use the usual linear regression approach, except that when the null
hypothesis holds the t coefficient doesn’t follow a normal distribution and so we
can’t use the usual t-test. Instead, this coefficient follows a tau distribution, and so
our test consists of determining whether the tau statistic τ (which is equivalent to the
usual t statistic) is less than τcrit based on a table of critical tau statistics values shown
in Dickey-Fuller Table.
If the calculated tau value is less than the critical value in the table of critical values, then we have a significant result; otherwise, we cannot reject the null hypothesis that there is a unit root, i.e. that the time series is not stationary.
There are the following three versions of the Dickey-Fuller test:
Type 0 (no constant, no trend): Δyi = β1yi-1 + εi
Type 1 (constant, no trend): Δyi = β0 + β1yi-1 + εi
Type 2 (constant and trend): Δyi = β0 + β1i + β2yi-1 + εi
Each version of the test uses a different set of critical values, as shown in the Dickey-
Fuller Table. It is important to select the correct version of the test for the time series
being analyzed. Note that the type 2 test assumes there is a constant term (which may be statistically equal to zero).
Example 1: The net daily earnings of a small-time gambler are listed in column B of Figure 1. Use the Dickey-Fuller test to determine whether the time series is stationary.
We start by assuming that the correct model is type 1 (constant, no trend). We use the Real Statistics Linear Regression data analysis tool with range B4:B27 as the X data range and D5:D28 as the Y data range. Note that the values in column D are calculated by placing the formula =B5-B4 in cell D5, highlighting the range D5:D28 and pressing Ctrl-D.
The output from the regression analysis is shown on the right side of Figure 1. In particular, we see that the t statistic (cell I20) for the β1 coefficient is -1.91613. This is the tau statistic. Looking in the Dickey-Fuller Table, we find that the tau critical value for a type 1 test is -2.986 when n = 25 and α = .05. Since τcrit = -2.986 < -1.91613 = τ, we cannot reject the null hypothesis that the time series is not stationary.
Note that the β1 coefficient (cell G20) is negative as expected. If instead, the
coefficient were positive, then we would know that this type of Dickey-Fuller test
was inappropriate since β1 = φ – 1 ≤ 0.
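For readers outside Excel, here is a minimal Python sketch of the type 1 regression and its tau statistic (illustrative; the resulting tau must be compared against the Dickey-Fuller critical values, not the usual t distribution):

import numpy as np

def df_tau(y):
    # type 1 Dickey-Fuller regression: dy_i = b0 + b1*y_(i-1) + e_i;
    # returns the t statistic of b1, i.e. the tau statistic
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ b
    s2 = resid @ resid / (len(dy) - 2)                  # residual variance
    se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])  # standard error of b1
    return b[1] / se_b1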
We now display in Figure 2 a plot of the time series values from Figure 1.
Figure 2 – Chart of Winnings by Day
We see that there is an apparent downward trend towards the end of the 25-day period, and so it is not surprising that the time series is not stationary. In fact, this leads us to choose the type 2 Dickey-Fuller test (with constant and trend). The result of this test is shown in Figure 3.
Since we are using the regression model
Δyi = β0 + β1i + β2yi-1 + εi
this time, we use A4:B27 from Figure 1 as the X data range and D5:D28 as the Y
data range. We see from Figure 3 that the t statistic (cell I21) for the β2 coefficient is -2.91345. Looking in the Dickey-Fuller Table, we find that the tau critical value is -3.60269 for a type 2 test when n = 25 and α = .05. Since τcrit = -3.60269 < -2.91345 = τ, we cannot reject the null hypothesis that the time series is not stationary.
Real Statistics Function: The Real Statistics Resource Pack provides the following
array function where R1 contains a column of time series data.
ADFTEST(R1, lab, , , type, alpha): returns a 3 × 1 range which contains the
following values: tau-statistic, tau-critical, yes/no (stationary or not)
If lab = TRUE (default is FALSE), the output consists of a 3 × 2 range whose first
column contains labels. type = the test type (0, 1, 2, default is 1). The default value
for alpha is .05.
Note that for the type 2 test for Example 1, the output from the array formula
=ADFTEST(R6:R30,TRUE,,,2,U9)
agrees with the results we obtained above, as displayed in Figure 4.
Note that the ADFCRIT function will return critical values for alpha = .01, .025, .05
and .10 and for values of n found in the Dickey-Fuller Table as well as for values
of alpha and n not included in the table.
Augmented Dickey-Fuller Table
If the calculated tau value is less than the critical value in the table above,
then we have a significant result; otherwise, we accept the null hypothesis
that there is a unit root and the time series is not stationary.
The following is a more precise way of estimating these critical values:
crit = t + u/N + v/N² + w/N³
where t, u, v and w are tabulated constants that depend on the type of test and on α.
Time Series Testing Tools
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack
provides the Time Series Testing data analysis tool which consolidates many
of the capabilities described in this part of the website.
To use this tool for the data in Example 1 of Stationary Process (repeated in Figure 1), press Ctrl-m and choose the Testing option from the Time S tab (or from the Time Series option if using the original user interface) and click on the OK button. Now, fill in the dialog box that appears as shown in Figure 1.
In Figure 1 we have inserted the time series values in the Input Range field,
without column heading or date information. You can optionally request the
ACF, ACVF and/or PACF values by placing a positive integer value in the
corresponding field. In Figure 1 we have requested the ACF(k) values for
lags k = 1, 2, 3, 4, 5. We could have requested any combination of ACF, ACVF
or PACF values, or none at all.
Similarly, we can request any combination of the white noise tests (Bartlett’s,
Box-Pierce, Ljung-Box) or none at all by inserting a positive integer in the
corresponding field. In Figure 1 we have requested only the Ljung-Box test
on ACF for lags up to 5.
Finally, we can optionally request the ADF test by inserting a non-negative
integer value (including 0) in the # of Lags field or by leaving this field
empty and selecting the Schwert option. We can also request to use
the Drift or Trend options of the test. In Figure 1, we have requested the
test with drift based on the number of lags specified by the Schwert criterion.
Figure 1 – Time Series Testing data analysis dialog box
The Schwert criterion calculates the lag based on the Excel formula
=ROUND(12 * (n / 100) ^ (1 / 4), 0)
which in this case is ROUND(12*(22/100)^(1/4),0) = 8. The AIC criterion is then used with lags = 8.
The output for this analysis is shown in Figure 2 (only the first 15 of the 22
data elements are shown in columns D and E).
Figure 2 – Time Series data analysis
The first 5 ACF values are shown in column H. The Ljung-Box test gives a
significant result (cell M6), which means that at least one of the first 5
autocorrelations is significantly different from zero. The Augmented Dickey-
Fuller test shows that the time series is not stationary (cell P13).
Example 2: Repeat Example 1 on the first differences of the data in Example
1.
We fill in the dialog box shown in Figure 1 with two changes, namely we change the # of Diff field from the default value of zero to 1 and use the ADF test without drift. The result is shown in Figure 3 (only the first 15 of 21 data values are shown in columns D and E).
Figure 3 – Time Series data analysis after differencing
This time we see that the first five ACF values are not statistically different
from zero (cell M6) and that the data is stationary (cell P13).
Correlogram
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack
provides the Correlogram data analysis tool which outputs an ACF or
PACF correlogram that includes the confidence intervals.
ACF Correlogram
Example 1: Construct an ACF Correlogram for the data in column A of
Figure 1 (only the first 18 of 56 data elements are visible).
Figure 1 – ACF Correlogram
Press Ctrl-m and choose the Time Series option (or the Time S tab if
using the Multipage interface). Select the Correlogram option and click on
the OK button. Now, fill in the dialog box that appears as shown in Figure 2.
Since the # of Lags field was left blank, the default of 30 was used.
Note that cell D7 contains the formula =ACF($A$4:$A$59,C7), cell E7
contains the formula =-F7 and cell F7 contains the formula
=NORM.S.INV(1-$F$3/2)/SQRT(COUNT($A$4:$A$59))
The remaining values in columns D and E (until row 36, corresponding to lag 30) are calculated using similar formulas. Cell F8 contains the formula
=NORM.S.INV(1-$F$3/2)*SQRT((1+2*SUMSQ(D$7:D7))/COUNT($A$4:$A$59))
and similarly for the other cells in column F. This reflects the fact that the standard error and confidence interval of ACF(k) are
s.e.(rk) = √((1 + 2(r1² + ⋯ + rk-1²))/n) and 0 ± zcrit · s.e.(rk)
PACF Correlogram
Example 2: Construct a PACF Correlogram for the data in column A of
Figure 1.
This time the PACF option from the dialog box in Figure 2 is selected. The output is shown in Figure 3 (only the first 15 of 30 lags are shown).
This reflects the fact that the standard error and confidence interval of PACF(k) are
s.e.(pk) = 1/√n and 0 ± zcrit/√n
Handling Missing Time Series Data
Figure 2 – Imputation Examples
Linear interpolation
The missing value in cell E15 is imputed as shown in cell G15. The missing value in cell E10 is imputed as shown in cell G10. Finally, the missing value in cell E18 is imputed as shown in cell G18.
Spline interpolation
To create the spline interpolation for the four missing values, first create the table in range O3:P14 by removing all the missing values. This can be done
by placing the array formula =DELROWBLANK(D3:E18,TRUE) in range
O3:P14, as shown in Figure 3. Next place the array formula
=SPLINE(R4:R18,O4:O14,P4:P14) in range S4:S18 (or in range H4:H18 of
Figure 2).
Figure 3 – Spline interpolation
The chart of the spline curve is shown on the right side of Figure 3. The
imputed values are shown in red on the chart.
See Spline Fitting and Interpolation for additional information.
Prior/Next
For Next the next non-missing value is imputed (or the last non-missing
value if there is no next non-missing value), while for Prior the previous
non-missing value is imputed (or the first non-missing value if there is no
previous non-missing value).
The missing value in cell E9 is imputed as 23 (cell J9) when using Next and
12 (cell I9) when using Prior. The missing value in cell E18 is imputed as 75
(cell I18 or J18) when using Prior or Next.
Simple Moving Average
The imputed value depends on the span value k which is a positive integer.
To impute the missing values, we first use linear interpolation, as shown in column AE of Figure 4. For any missing values among the first or last k elements in the time series, we simply use the linear interpolation value. For the others, we use the mean of the 2k+1 linearly interpolated values centered at the missing value (i.e. the value itself plus the k values on either side).
In Figure 2 we use a span value of k = 3. To show how the values in column K of Figure 2 are calculated, we calculate the linear interpolated values as shown in column AE of Figure 4. Next, we place the formula =IF(AD4="",AE4,AD4) in cell AF4, highlight range AF4:AF6 (i.e. a column range with k = 3 elements) and press Ctrl-D. Similarly, we copy the formula in cell AF4 into the last 3 cells in column AF.
Next, we place the formula =IF(AD7="",AVERAGE(AE4:AE10),AD7) in cell AF7, highlight the range AF7:AF15 (i.e. all the cells in column AF that haven't yet been filled in), and press Ctrl-D. The imputation shown in column K is identical to that shown in column AF.
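A compact Python sketch of this simple moving average imputation (an illustration; np.nan marks the missing values):

import numpy as np

def sma_impute(y, k=3):
    # fill every gap by linear interpolation (column AE), then replace each
    # originally missing interior value by the mean of the 2k+1 interpolated
    # values centered on it; missing values within k of either end keep the
    # interpolated value
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    ok = ~np.isnan(y)
    lin = np.interp(idx, idx[ok], y[ok])
    out = np.where(ok, y, lin)
    for i in idx[~ok]:
        if k <= i < len(y) - k:
            out[i] = lin[i - k:i + k + 1].mean()
    return out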
Weighted Moving Average
Figure 5 – WMA for t = 6
Here, cell AK4 contains the formula =1/(ABS(AJ$7-AJ4)+1), cell AL4 contains =AE6 and cell AM4 contains =AK4*AL4. We can now highlight the range AK4:AM10 and press Ctrl-D to fill in the other values. Then we sum the weights to obtain the value 3.16667 as shown in cell AK11 and sum the products to obtain the value 60.66667 as shown in cell AM11. The imputed value is thus 60.66667 divided by 3.16667, i.e. 19.15789, as shown in cell AM12.
This is the value shown in cell AG9 of Figure 4. In fact, we can fill in column AG of Figure 4 as follows. First, insert the worksheet formula =1+2*SUMPRODUCT(1/AC5:AC7) in cell AG20. Next, fill in the first 3 and last 3 values in column AG by using the values in column AE. Finally, insert the following formula in cell AG7, highlight range AG7:AG15, and press Ctrl-D.
=IF(AD7="",SUMPRODUCT(AE4:AE10,1/(ABS(AC4:AC10-AC7)+1))/AG$20,AD7)
Exponential (Weighted) Moving Average
The approach is identical to that of the weighted moving average except that the weights are powers of 1/2. Now, we weight the linear imputed values in column AE of Figure 4 by 1 for t = 6, by 1/2 for t = 5 or 7, by 1/4 for t = 4 or 8, and by 1/8 for t = 3 or 9.
The calculation of the imputed value at t = 6 is as shown in Figure 5, except that the formula used for cell AK4 is now =1/2^ABS(AJ$7-AJ4) (and similarly for the other cells in column AK). The result is as shown in column AH of Figure 4. This time, cell AH7 contains the formula
=IF(AD7="",SUMPRODUCT(AE4:AE10,1/2^ABS(AC4:AC10-AC7))/AH$20,AD7)
and cell AH20 contains the formula =1+2*SUMPRODUCT(1/2^AC4:AC6).
Worksheet Function
Real Statistics Function: For a time series represented as a column array
where any non-numeric values are treated as missing, the Real Statistics
Resource Pack supplies the following array function:
TSImputed(R1, itype, k): returns a column array of the same size as R1
where each missing element in R1 is imputed based on the imputation
type itype which is either a number or text string as shown in Figure 2
(default 0 or “linear”) and k = the span (default 2), which is only used with
the three moving average imputation types.
For example, =TSImputed(E4:E18,"ema",3) returns the time series shown in range M4:M18 of Figure 2.
Seasonality
If the time series has a seasonal component, then we can combine one of the
imputation approaches described in Figure 1 with a seasonality imputation
approach as described in Handling Missing Seasonal Time Series Data.
Autoregressive Processes
A first-order autoregressive process, denoted AR(1), takes the form
yi = φ0 + φ1yi-1 + εi
It turns out that such a process is stationary when |φ1| < 1, and so we will make this assumption as well. Note that if |φ1| = 1 we have a random walk.
Similarly, a second-order autoregressive process, denoted AR(2), takes the form
yi = φ0 + φ1yi-1 + φ2yi-2 + εi
and a p-order autoregressive process, AR(p), takes the form
yi = φ0 + φ1yi-1 + ⋯ + φpyi-p + εi
Figure 1 – Simulated AR(1) process
The graph of the y values is shown on the left of Figure 2. As you can see, no
particular pattern is visible. The graph of ACF for the first 15 lags is shown
on the right side of Figure 2. As you can see, the actual and theoretical values
for the first two lags agree, but after that, the ACF values are small but not
particularly consistent.
Figure 2 – Graphs of simulated AR(1) process and ACF
Observation: Based on Property 3, for 0 < φ1 < 1, the theoretical values of
ACF converge to 0. If φ1 is negative, -1 < φ1 < 0, then the theoretical values of
ACF also converge to 0, but alternate in sign between positive and negative.
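The following Python sketch (illustrative only) simulates an AR(1) process like the one in Figure 1 and compares the sample ACF with the theoretical values ρk = φ1^k:

import numpy as np

rng = np.random.default_rng(2)
phi0, phi1, n = 5.0, 0.4, 100
y = np.zeros(n)
for i in range(1, n):
    y[i] = phi0 + phi1 * y[i - 1] + rng.standard_normal()

ybar = y.mean()
s0 = np.sum((y - ybar) ** 2) / n
for k in range(1, 6):
    rk = np.sum((y[:n - k] - ybar) * (y[k:] - ybar)) / n / s0
    print(k, round(rk, 3), round(phi1 ** k, 3))  # sample vs theoretical ACF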
Property 4: For any stationary AR(p) process, the autocovariance at lag k > 0 can be calculated as
γk = φ1γk-1 + ⋯ + φpγk-p
and, dividing by γ0,
ρk = φ1ρk-1 + ⋯ + φpρk-p
(the Yule-Walker equations).
Solving for ρ1 yields
Also
We can also calculate the variance as follows:
Such roots may involve the imaginary number i, defined by i² = -1. For example, the equation z² + 1 = 0 has the roots i and –i, as can be seen by substituting either of these values for z in the equation.
We now give three properties of imaginary numbers, which will help us avoid
discussing imaginary numbers in any further detail:
• all values which involve imaginary numbers can be expressed in the
form a + bi where a and b are real numbers
• if a + bi is a root of a pth degree polynomial with real coefficients, then so is a – bi
• if z = a + bi then the absolute value of z is defined by |z| = √(a² + b²)
Since a and b are real numbers, not involving i, we only need to deal with real numbers.
Property 2: An AR(1) process is stationary provided |φ1| < 1
Property 3: An AR(2) process is stationary provided
|φ2| < 1 and |φ1| + φ2 < 1
Example 1: Determine whether the following AR(2) process is stationary:
yi = 2yi-1 – .5yi-2 + εi
This process is not stationary since the characteristic roots are 1 ± √.5 and 1 + √.5 ≥ 1. You get the same result via Property 3 since |φ1| + φ2 = 2 – .5 = 1.5 ≥ 1.
Example 2: Determine whether the following AR(2) process is stationary.
Since
the roots of the reverse characteristic equation are not real. In fact
Thus
and so we see that this AR(2) process is stationary. We get the same result
via Property 3 since
where w1, …, wp are the unique roots of the reverse characteristic equation 1 – φ1z – φ2z^2 – ⋯ – φpz^p = 0
Real Statistics Function: The Real Statistics Resource Pack supplies the
following array function where R1 is a p × 1 range containing the
phi coefficients of the polynomial where φp is in the first position and φ1 is in
the last position.
ARRoots(R1): returns a p × 3 range where each row contains one root, and
where the first column consists of the real part of the roots, the second
column consists of the imaginary part of the roots and the third column
contains the absolute value of the roots
This function calls the ROOTS function described in Roots of a Polynomial.
Note that just like the ROOTS function, the ARRoots function can take the following optional arguments:
ARRoots(R1, prec, iter, r, s)
prec = the precision of the result, i.e. how close to zero is acceptable. This
value defaults to 0.00000001.
iter = the maximum number of iterations performed when running Bairstow's Method. The default is 50.
r, s = the initial seed values when using Bairstow’s Method. These default to
zero.
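As a cross-check outside Excel, the stationarity condition can be tested with a few lines of Python (a sketch; np.roots does the factoring that ARRoots performs via Bairstow's Method):

import numpy as np

def ar_is_stationary(phi):
    # phi = [phi_1, ..., phi_p]; the AR(p) process is stationary when every
    # root of the reverse characteristic equation
    # 1 - phi_1*z - ... - phi_p*z^p = 0 lies outside the unit circle
    coeffs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    roots = np.roots(coeffs[::-1])  # np.roots expects the highest power first
    return bool(np.all(np.abs(roots) > 1)), roots

For the process of Example 1, ar_is_stationary([2, -0.5]) returns False, matching the result above.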
Partial Autocorrelation for AR(p) Process
Property 1: For an AR(p) process yi = φ0 + φ1yi-1 + ⋯ + φpyi-p + εi, PACF(k) = φk. Thus, for k > p, it follows that PACF(k) = 0 (since φk = 0 for k > p).
Example 1: Chart PACF for the data in Example 1 from Basic Concepts for Autoregressive Processes.
Using the PACF function and Property 1, we get the result shown in Figure
1.
Figure 1 – Graph of PACF for AR(1) process
Observation: We see from Figure 1 that the PACF values for lags > 1 are
close to zero, as is expected, although there is some random fluctuation from
zero.
Example 2: Repeat Example 1 for the AR(2) process yi = 5 + .4yi-1 – .1yi-2 + εi.
Figure 2 – Simulated AR(2) process
This time we place the formula =5+0.4*0-0.1*0+B4 in cell C4, =5+0.4*C4-
0.1*0+B5 in cell C5 and =5+0.4*C5-0.1*C4+B6 in cell C6, highlight the range
C6:C103 and press Ctrl-D.
The ACF and PACF are shown in Figure 3.
Figure 3 – ACF and PACF for AR(2) process
As you can see, there isn’t a perfect fit between the theoretical and actual ACF
and PACF values.
Finding AR(p) coefficients
Suppose that we believe that an AR(p) process is a fit for some time series.
We now show how to calculate the process coefficients using the
following techniques: (1) estimates based on ACF or PACF values, (2) using
linear regression and (3) using Solver. We illustrate the first of these
approaches on this webpage.
One approach is to use the Yule-Walker equations in reverse to calculate the φ0, φ1, …, φp, σ² coefficients based on the values of μ, γ0, …, γp. Alternatively, we can use the values μ, γ0, π1, …, πp (PACF values), which it turns out are equivalent.
Example 1: Use the statistics described above to find the coefficients of the
AR(1) process based on the data in Example 1 of Autoregressive Processes
Basic Concepts.
The first 8 of 100 data elements are shown in column B of Figure 1. We next calculate the mean, variance and PACF(1) values. From these, we can estimate the process coefficients as shown in cells G8:G10. This estimate of the time series is the process yi = 4.983 + .394yi-1 + εi where σ² = 1.421703.
Figure 1 – Estimation of AR(1) coefficients
As we can see, the process coefficients are pretty close to the original coefficients used to generate the data in column B (φ0 = 5, φ1 = .4 and σ² = 1), with the exception of σ², which is a little high.
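These moment-based estimates are easy to reproduce; below is a hedged Python sketch of the same calculation:

import numpy as np

def ar1_estimates(y):
    # method-of-moments estimates for y_i = phi0 + phi1*y_(i-1) + e_i:
    # phi1 = r1 (= PACF(1)), phi0 = mean*(1 - phi1), sigma^2 = s0*(1 - phi1^2)
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    s0 = np.sum((y - ybar) ** 2) / n
    r1 = np.sum((y[:-1] - ybar) * (y[1:] - ybar)) / n / s0
    return ybar * (1 - r1), r1, s0 * (1 - r1 ** 2)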
Observation: We can use this approach for AR(2) processes by noting that, by the Yule-Walker equations, ρ1 = φ1/(1 – φ2) and ρ2 = φ1ρ1 + φ2. Thus
φ1 = ρ1(1 – ρ2)/(1 – ρ1²) and φ2 = (ρ2 – ρ1²)/(1 – ρ1²)
and so the coefficients can be estimated from the sample values r1 and r2.
Example 2: Use the statistics described above to find the coefficients of the AR(2) process based on the data in Example 1.
We show two versions in Figure 2. The lower version is based on the ACF
using the formulas described in the above observation. The upper version is
based on the PACF using Property 1 of Partial Autocorrelation of AR(p)
Processes.
Figure 2 – Estimation of AR(2) coefficients
Finding AR(p) coefficients using Regression
We now show how to calculate the coefficients of an AR(p) process which
represents a time series by using ordinary least squares.
An AR(p) process can be expressed as yi = φ0 + φ1yi-1 + ⋯ + φpyi-p + εi, which is equivalent to a multiple linear regression in which yi is the dependent variable and yi-1, …, yi-p are the independent variables.
Let X be the (n–p) × (p+1) matrix whose ith row is [1 yi-1 yi-2 ⋯ yi-p], i.e. X = [xij] where xi1 = 1 for all i and xij = yi-j+1 for all j > 1. Let Y be the (n–p) × 1 column vector Y = [yp+1 yp+2 ⋯ yn]T, let φ be the (p+1) × 1 column vector φ = [φ0 φ1 ⋯ φp]T and let ε be the (n–p) × 1 column vector ε = [εp+1 εp+2 ⋯ εn]T. Then the AR(p) process can be represented by
Y = Xφ + ε
We are given the values of y1, …, yn, but we also need to initialize the values of y0, …, y1-p (i.e. the values with non-positive subscripts). We will simply initialize these values to zero, although other initializations are possible.
Figure 3 – Regression approach to finding AR(2) coefficients
Here the X values are shown in columns X and Y and the Y values are shown
in column Z. These values are obtained by placing the formulas =B5
(referencing Figure 2) in cell X4, =B4 in cell Y4 and =B6 in cell Z4,
highlighting the range X4:Z101 and pressing Ctrl-D. The predicted values
can now be calculated using the TREND array function.
Observation: The regression approach to calculating the AR(p) model
coefficients is more accurate than the ACF/PACF approach described
in Finding AR(p) Coefficients. Elsewhere we also show how to use Solver to
calculate these coefficients. The coefficients will be identical to those using
linear regression.
Real Statistics Function: The Real Statistics Resource Pack supplies the
following array function:
ARMap(R1,p) – takes the time series in the n × 1 range R1 and outputs
the n–p × p+1 range where the first p columns represent the X values in the
linear regression and the last column represents the Y values.
If we had highlighted the range X4:Z101, entered the formula =ARMap(B4:B103,2) and pressed Ctrl-Shift-Enter, we would get the same values in range X4:Z101 as in Figure 3.
Lag Function
We now define the lag function L by L(yi) = yi-1, together with the linearity property L(cx + z) = c·L(x) + L(z) for any constant c and any variables x and z. We also use the notation L^n(zi) = zi-n for any variable z and non-negative integer n.
An AR(p) process yi = φ0 + φ1yi-1 + ⋯ + φpyi-p + εi can then be written as
yi – φ1L(yi) – ⋯ – φpL^p(yi) = φ0 + εi
or even
(1 – φ1L – ⋯ – φpL^p)yi = φ0 + εi
where 1 is the identity function and we use the notation (f+g)x to mean f(x) + g(x) for any functions f and g.
Note that 1 – φ1L – ⋯ – φpL^p can be factored as (1 – r1L)(1 – r2L)⋯(1 – rpL), where the values r1, r2, …, rp are the characteristic roots of the AR(p) process.
Based on the vector φ = [φ1, …, φp] of coefficients, we can define the operator φ(L) by φ(L) = 1 – φ1L – φ2L^2 – ⋯ – φpL^p, so that the AR(p) process can be written compactly as φ(L)yi = φ0 + εi.
Observation: The lag function is also called the (back) shift operator and
so sometimes the symbol B is used in place of L.
Augmented Dickey-Fuller Test
In Dickey-Fuller Test we describe the Dickey-Fuller test which determines
whether an AR(1) process has a unit root, i.e. whether it is stationary. We
now extend this test to AR(p) processes.
For the AR(1) process yi = φyi-1 + εi, the Dickey-Fuller regression involves only Δyi and yi-1; for an AR(p) process, the regression is augmented with the lagged difference terms Δyi-1, …, Δyi-p+1.
Once you know how many lags to use, the augmented test is identical to the
simple Dickey-Fuller test. We can use the Akaike Information Criterion
(AIC) or Bayesian Information Criteria (BIC) to determine how many lags to
consider, as described in Comparing ARIMA Models.
Thus we can now use the full version of the ADFTEST function which was
introduced in Dickey-Fuller Test.
Real Statistics Function: The Real Statistics Resource Pack provides the
following array function where R1 contains a column of time series data.
ADFTEST(R1, lab, lag, criteria, type, alpha): returns an 8 × 1 range which
contains the following values: tau-statistic, tau-critical, yes/no (stationary
or not), AIC value, BIC value, # of lags (p), the first-order autoregression
coefficient and estimated p-value.
If lab = TRUE (default is FALSE), the output consists of an 8 × 2 range whose first column contains labels. type = the test type (0, 1, 2, default is 1). The default value for alpha is .05.
The arguments lag and criteria, which were not used for the Dickey-Fuller
Test, are defined as follows:
• lag = the maximum number of lags to use in the test (default 0)
• criteria = “none” : no criteria is used, and so p is set to the value
of lag
• criteria = “aic” : the AIC is used to determine the number of
lags p (where p ≤ lag)
• criteria = “bic” : the BIC is used to determine the number of
lags p (where p ≤ lag)
To specify the criteria, you can use “AIC” or 1 instead of “aic”, you can use
“BIC” or 2 instead of “bic” and you can use “” or 0 instead of “none”.
If lag < 0 then lag will automatically be set to value
=Round(12*(n/100)^.25,0), as proposed by Schwert, where n = the number
of elements in the time series.
To specify the test type, you can use “” or “none” instead of 0, you can use
“drift” or “constant” instead of 1 and you can use “trend” or “both” instead of
2.
Example 1: Determine whether the data in column A of Figure 1 has a unit root, based on a model without trend, using the Schwert estimate for the maximum number of lags and the AIC criteria. Also, determine whether there is a unit root based on a model with trend and a maximum number of lags equal to 7, again using the AIC criteria.
Figure 1 – Time Series
Here range J4:K8 contains the array formula =DescStats(A3:A22,TRUE). We see that the mean value of the time series is 2.376, and so we conclude that the time series likely has a non-zero mean. We could confirm this by using a t-test to see whether the population mean is significantly different from zero.
We now use the array formula =ADFTEST(A3:A22,TRUE,-1) to show the
results of the ADF test without trend. The -1 means that we are using the
Schwert estimate for the maximum number of lags. We are also using the
default type = 1, which results in the test for constant without trend. As we
can see from range P4:P11 in Figure 2, since tau-stat > tau-crit, the time
series is not stationary.
Figure 2 – ADF Test
Note that the above formula effectively uses a maximum lag count of 8, which can be seen by using the formula =ROUND(12*(K4/100)^0.25,0) in cell K10 of Figure 1.
Looking at the chart in Figure 1, it appears that the time series has a trend,
and so we repeat the ADF Test with constant and trend to get the results
shown in range S4:T11 of Figure 2 using the array formula
=ADFTEST(A3:A22,TRUE,7,”aic”,2). Here type = 2 (constant and trend) and
maximum number of lags = 7. Note that we didn’t use 8 as the maximum
number of lags since that would produce error values (based on insufficient
degrees of freedom in the underlying regression analysis).
Real Statistics Data Analysis Tool: As explained in Time Series Testing
Tools, the Time Series Testing data analysis tool can be used to perform
the Dickey-Fuller Test. In fact, it can also be used to perform the Augmented
Dickey-Fuller Test.
Real Statistics Functions: The Real Statistics Resource Pack provides the following
array functions where R1 contains a column of time series data.
PPTEST(R1, lab, lags, type, alpha) – an array function that returns a column range for the PP test consisting of tau-stat, tau-crit, stationary (yes/no), lags, the autocorrelation coefficient and the p-value.
KPSSTEST(R1, lab, lags, type, alpha) – an array function that returns a column
range for the KPSS test consisting of test-stat, crit-value, stationary (yes/no), lags
and p-value.
As usual, if lab = TRUE (default is FALSE), the output consists of two columns
whose first column contains labels. type = the test type (0, 1, 2, default is 1). The
default value for alpha is .05.
To specify the test type, you can use “” or “none” instead of 0, you can use “drift”
or “constant” instead of 1 and you can use “trend” or “both” instead of 2.
Note too that the KPSS test does not support the case where there is no constant and
no trend. Thus, type for KPSSTEST is restricted to 1 and 2. If type = 0 is used, then
it is assumed that type = 1.
You can either specify the number of lags to test or use the values “short” or “long”.
If “short” is specified then lags is calculated to be =Round(4*(n/100)^.25,0)
where n = the number of elements in the time series, while if lags = “long” then the
value =Round(12*(n/100)^.25,0) is used.
In Figure 1, we repeat the analysis for Example 1 of Augmented Dickey-Fuller
Test using the PP and KPSS tests, specifying lags = “short” (which is equivalent
to lags = 3).
Figure 1 – PP and KPSS Tests
Recall that for the lag function, L^c(yi) = yi-c, and so (1 – L^c)yi = yi – yi-c. This is the principal way of expressing seasonality for SARIMA models.
Note too that if si is deterministic, then (1 – L^c)si = si – si-c = 0.
SARIMA Models
As described in ARMA Models, an ARMA(p,q) model can be expressed as
yi = φ0 + φ1yi-1 + ⋯ + φpyi-p + εi + θ1εi-1 + ⋯ + θqεi-q
If φ0 = 0 (i.e. the mean of the stochastic process is zero) then this can be expressed using the lag operator as
φ(L)yi = θ(L)εi
where φ(L) = 1 – φ1L – ⋯ – φpL^p and θ(L) = 1 + θ1L + ⋯ + θqL^q.
A seasonal ARIMA model takes the same form, but now there are additional terms that reflect the seasonality part of the model. Specifically, a SARIMA(p,d,q) × (P,D,Q)m model without constant can be expressed as
φ(L)Φ(L^m)(1 – L)^d(1 – L^m)^D yi = θ(L)Θ(L^m)εi
where Φ(L^m) = 1 – Φ1L^m – ⋯ – ΦP(L^m)^P and Θ(L^m) = 1 + Θ1L^m + ⋯ + ΘQ(L^m)^Q.
The forecast can be expressed as follows (where the coefficients should have a hat on them, i.e. they are replaced by their estimates):
And so
which is equivalent to
In the case where there is a constant term φ0 this expression takes the form
This serves as the equation to estimate the forecast at time i (when the
final εi is set to zero). You can also solve for εi to obtain an expression that
can be used to estimate the residuals.
SARIMA Model Example
Example 1: Create a SARIMA(1,1,1) ⨯ (1,1,1)4 model for Amazon’s quarterly
revenues shown in Figure 1 and create a forecast based on this model for the four
quarters starting in Q3 2017.
Note that the range A3:B33 contains all the data, where the second half of the data
is repeated in columns D and E (so that it is easier to display in the figure).
Figure 1 – Amazon Revenues
We start by creating a plot of the time series data by highlighting the range B4:B33
and then selecting Insert > Charts|Line. After making a few modifications we
obtain the result shown in Figure 2.
Figure 3 – Ordinary and seasonal differencing
The original revenue data is repeated in range O3:O33 (with only the first 14 data
elements visible in the figure). Column P contains the detrended data where cell P5
contains the formula =O5-O4, and similarly for the other cells in column P. Column
Q removes the seasonality from the data in column P. This is done by inserting the
formula =P9-P5 in cell Q9, highlighting the range Q9:Q33 and pressing the key
sequence Ctrl-D.
We next plot the data in column Q as shown on the right side of Figure 3.
This time the plot looks like it comes from a stationary time series, although
we would need to perform a unit root test to confirm this.
Figure 4 shows how to calculate the residuals for the SARIMA model of this
time series in terms of the coefficients (only the first 8 of the time series
entries in AG3:AI28 are displayed).
The values in range AH4:AH28 are copied from Q9:Q33 of Figure 3. As we saw in SARIMA Models, the residuals of this time series can be calculated using the formula
To calculate εi we need to know the values of εi-1, εi-4 and εi-5. Thus we arbitrarily set the values of the first five residuals equal to zero and use the above formula to calculate ε6 (cell AI9). This is done in Excel using the following worksheet formula:
=AH9-AL$3-AL$4*AH8-AL$6*AH5+AL$4*AL$6*AH4-AL$5*AI8-AL$7*AI5-AL$5*AL$7*AI4
Next, we highlight range AI9:AI28 and press Ctrl-D to fill in the values of all the
rest of the residuals. Since we initially set all the coefficients to zero, the residuals
initially all take the same value as the data.
Our goal is to find coefficients that minimize the sum of the squares of the residuals
(SSE). We accomplish this by using Excel’s Solver. The value of SSE is calculated
in cell AL9 using the formula =SUMSQ(AI9:AI28).
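The same minimization can be sketched in Python with scipy (an illustration, not the Real Statistics implementation; z stands for the differenced series in column AH and the parameter order matches cells AL3:AL7):

import numpy as np
from scipy.optimize import minimize

def sse(params, z, per=4):
    # residual recursion for the SARMA(1,1)x(1,1)_per model on the
    # differenced series z, mirroring the worksheet formula above;
    # the first per+1 residuals are set to zero
    c, phi, theta, Phi, Theta = params
    e = np.zeros(len(z))
    for i in range(per + 1, len(z)):
        e[i] = (z[i] - c - phi * z[i - 1] - Phi * z[i - per]
                + phi * Phi * z[i - per - 1]
                - theta * e[i - 1] - Theta * e[i - per]
                - theta * Theta * e[i - per - 1])
    return np.sum(e[per + 1:] ** 2)

# result = minimize(sse, x0=np.zeros(5), args=(z,), method="Nelder-Mead")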
Select Data > Analysis|Solver and fill in the dialog box that appears as shown in
Figure 5.
After clicking on the Solve button, the results shown in Figure 6 appear.
SARIMA Forecast Example
Since the last data element is y25, we want to determine the forecasted values of y26, y27, y28 and y29. To do this, we use the above formula with the data values of yi-1, yi-4 and yi-5 when available, and the (previously obtained) forecasted values when the real data values are not available. This is shown in Figure 1.
Figure 1 – Forecast for the differenced time series
This figure shows the data values and residuals for the later portion of the time series (leaving out the middle) plus the forecasted values. E.g. the forecast value in cell AH29 is calculated by the formula
=AL$3+AL$4*AH28+AL$6*AH25-AL$4*AL$6*AH24+AL$5*AI28+AL$7*AI25+AL$5*AL$7*AI24
After entering this formula, you can highlight the range AH29:AH32 and press Ctrl-D to obtain the other three forecast values. Note that the residuals corresponding to the four forecast values are implicitly set to zero.
Now that we have the forecasted values for the time series shown in column
Q of Figure 3 of SARIMA Model Example, we need to translate these into
forecast values for the original time series (column O in Figure 3 of SARIMA
Model Example). To accomplish this, we need to undo the two types of
differencing.
We start by replicating the bottom of the data in Figure 3 of SARIMA Model
Example (i.e. the part that is not displayed) and then inserting the forecast
that we obtained in Figure 1. This is shown in Figure 2.
Figure 2 – Forecast (step 1)
We only need to go in the original time series far enough to produce at least
one value not forecasted in column AQ. Whereas differencing proceeds from
left to right, integrating (i.e. undoing differencing) proceeds from right to
left. If we know the values in cells AP5 and AQ9, we can obtain the value in
cell AP9 using the formula =AP5+AQ9. Similarly, if we know the value in
cells AO8 then we can calculate the value in cell AO9 using the formula
=AO8+AP9 (where the value in AP9 was calculated previously).
In a similar way, we can obtain the value in cell AP10, using the formula
=AP6+AQ10 and the value in cell AO10 using the formula =AO9+AP10. We
highlight the range AO10:AP13 and press Ctrl-D to obtain the other three
forecast values, as shown in Figure 3.
We can now extend the plot shown in Figure 2 of SARIMA Model Example to
include the forecasted values, as shown in Figure 4.
to the number of rows in the highlighted range minus the number of rows
in R1).
SARIMA_PRED(R0, R1, d, D, per): returns a column array with the
forecasted values for the SARIMA(p, d, q) ⨯(P, D, Q)per model of the time
series data in R1 that correspond to the forecast values in R0 for the
SARMA(p, q) ⨯(P, Q)per model.
All the above arrays are column arrays. Any of the Rar, Rma, Rsa, Rsm arrays may be omitted, although at least one of them must be present. per defaults to 12 and cons defaults to 0 (i.e. no constant).
Note that the ADIFF function is an extension to the version described
in ARIMA Differencing.
Observation: Example 1 shows how to create a SARIMA(1, 1, 1) ⨯ (1, 1,
1)4 model and forecast. The above functions make it easier to create any
SARIMA model and forecast. To illustrate this, for Example 1 of SARIMA
Model Example, the following formulas could have been used:
=ADIFF(B4:B33,1,1,4) to create the time series in range AH4:AH28 of
Figure 4 of SARIMA Model Example
=SARMA_RES(AH4:AH28,AL4,AL5,AL6,AL7,4,AL3) to create the array of
residuals in range AI4:AI28 of Figure 4 of SARIMA Model Example
=SARMA_PRED(AH4:AH28,AL4,AL5,AL6,AL7,4,AL3) to create an array
of predicted values that take the values in the array AH4:AH32 – AI4:AI32
of Figure 4 of SARIMA Model Example. In particular, the last 4 of these
values are those found in range AH29:AH32.
=SARIMA_PRED(AH29:AH32,B4:B33,1,1,4) to create the array of forecast
values shown in range AO10:AO13 of Figure 3 of SARIMA Forecast
Example.
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack
provides the Seasonal Arima (Sarima) data analysis tool which creates a
SARIMA model and forecast.
To perform the analysis for Example 1 of SARIMA Model Example,
press Ctrl-m and choose Seasonal Arima (Sarima) from the Time
S tab (or from the Time Series dialog box if using the original user
interface). Now fill in the dialog box that appears as shown in Figure 1.
Figure 1 – SARIMA dialog box
If you leave the # of Forecasts field blank, then its value defaults to the
value in the Seasonal Period field. If that field is blank then no seasonality
is used in the model and # of Forecasts defaults to 5.
After clicking on the OK button, the output shown in Figures 2 and 3 is
displayed (only the first 24 rows of the output in Figure 2 and the first 20
rows in Figure 3 are displayed).
Figure 2 – SARIMA output (part 1)
Figure 3 – SARIMA output (part 2)
Most of the values are produced using the Real Statistics functions described
above. The formulas used for the descriptive statistics in range J13:J24 and
coefficient roots in columns P, Q and R are similar to those used for the
corresponding values in the Arima data analysis tool.
The lower portion of the output, which contains the forecast, is shown in
Figure 4. The values in columns D, E, F and G are the continuation of these
columns from Figure 2 and the values in columns T and U are the
continuation of these columns from Figure 3.
Figure 4 – SARIMA forecast output
Range G29:G32 contains the four-quarter forecast for the differenced time
series, while range U34:U37 contains the corresponding four-quarter
forecast for the revenues for the period Q3 2017 through Q2 2018.
Observation: For this example, we chose to use the Solver approach to
estimating the SARIMA coefficients. The default is to use the Levenberg-
Marquardt approach. This is accomplished by leaving the Solver option
unchecked in Figure 1. In this case, the output is similar to that described
above, except that now the output in Figure 5 is included, which is useful in
that it provides the standard errors of the coefficients and the t-tests that
determine which coefficients are significantly different from zero.
corresponding standard errors. If lab = TRUE (default FALSE) then a
column of labels is appended to the output.
SARIMA_PARAM(R1, ar, ma, diff, per, sar, sma, sdiff, con): returns an
array with four columns, the first column of which contains the SARIMA
coefficients (in the order constant term, phi coefficients, theta coefficients,
Phi coefficients, Theta coefficients) and the remaining columns contain the
corresponding standard errors, t statistics and p-values.
Here, the parameters are ar = p, ma = q, diff = d, per = m, sar = P, sma = Q,
sdiff = D for a (p, d, q) × (P, D, Q)ₘ SARIMA model. con = TRUE (default) if
a constant term is included.
Mann-Kendall Test
Basic Concepts
The Mann-Kendall Test is used to determine whether a time series has a
monotonic upward or downward trend. It does not require that the data be
normally distributed or linear. It does require that there is no
autocorrelation.
The null hypothesis for this test is that there is no trend, and the alternative
hypothesis is that there is a trend in the two-sided test or that there is an
upward trend (or downward trend) in the one-sided test. For the time
series x1, …, xn, the MK Test uses the following statistic:
S = ∑i<j sign(xj − xi)
where the sum is taken over all pairs (i, j) with 1 ≤ i < j ≤ n.
Note that if S > 0 then later observations in the time series tend to be larger
than those that appear earlier in the time series, while the reverse is true
if S < 0.
The variance of S is given by
var(S) = [n(n−1)(2n+5) − ∑t ft(ft−1)(2ft+5)] / 18
where t varies over the set of tied ranks and ft is the number of times (i.e. frequency) that the rank t appears.
The MK Test uses the following test statistic:
z = (S − 1)/se if S > 0, z = 0 if S = 0, z = (S + 1)/se if S < 0
where se = the square root of var(S). If there is no monotonic trend (the null
hypothesis), then for time series with more than 10 elements, z ∼ N(0, 1), i.e.
z has a standard normal distribution.
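A minimal Python sketch of these calculations (the function name mann_kendall is just an illustrative label, not a Real Statistics function):

from collections import Counter
from math import sqrt
import numpy as np
from scipy.stats import norm

def mann_kendall(x, tails=2):
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S = sum of sign(xj - xi) over all pairs with i < j
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    # variance of S, corrected for ties (f = frequency of each tied value)
    ties = sum(f * (f - 1) * (2 * f + 5) for f in Counter(x).values() if f > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    se = sqrt(var_s)
    # continuity-corrected test statistic
    z = (s - 1) / se if s > 0 else (s + 1) / se if s < 0 else 0.0
    return s, se, z, tails * norm.sf(abs(z))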
Examples
Example 1: Determine whether the time series in range A4:A15 of Figure 1
has a monotonic trend.
Figure 2 – Mann-Kendall Test (part 2)
Note that S = -44 (cell R7), which indicates the potential for a downward
trend. This is consistent with the line chart of the time series data shown in
Figure 3.
The following row contains the ties corrections where cell P19 contains the
sum of these corrections. This is done by inserting the formula
=IF(D18=0,0,D18*(D18+1)*(2*D18+7)) in cell D19, highlighting the range
D19:N19, pressing Ctrl-R, and inserting the formula =SUM(D19:N19) in cell
P19.
Worksheet Functions
Real Statistics Function: The Real Statistics Resource Pack supplies the
following array function to automate the steps required to perform the
Mann-Kendall Test.
MKTEST(R1, lab, tails, alpha): returns a column array with the
values S, s.e., z-stat, p-value and trend.
R1 is a column array containing the time series values, if lab = TRUE then an
extra column of labels is appended to the output (default FALSE), tails = 1 or
2 (default) and alpha is the significance level (default .05).
trend takes the values “yes” or “no” in the two-tailed test, and “upward” or
“no” in the one-tailed case where S > 0 and “downward” or “no” in the one-
tailed case where S < 0.
For Example 1, =MKTEST(A4:A15,TRUE) outputs the results shown in range
Q7:R11 of Figure 2.
Data Analysis Tool
The Mann-Kendall Test can also be performed using the Mann-Kendall
and Sen’s Slope data analysis tool, as demonstrated in Sen’s Slope.
Examples Workbook
Click here to download the Excel workbook with the examples described on
this webpage.
Reference
Gocic, M. and Trajkovic, S. (2012) Analysis of changes in meteorological
variables using Mann-Kendall and Sen’s slope estimator statistical tests in
Serbia. Elsevier
https://www.academia.edu/6955354/Trend_Analysis_MK_Sen_Slope
Sen’s Slope
The usual method for estimating the slope of a regression line that fits a set
of (x, y) data elements is based on a least-squares estimate. This approach is
not valid when the data elements don’t fit a straight line; it is also sensitive
to outliers.
Sen’s Slope Definition
We now describe an alternative, more robust, nonparametric estimate of the slope, called Sen's slope, for the set of pairs (i, xi) where xi is a time series. Sen's slope is defined as
b = the median of the N = n(n−1)/2 pairwise slopes (xj − xi)/(j − i) for 1 ≤ i < j ≤ n
A 1–α confidence interval for Sen's slope can be calculated as (lower, upper), where lower = the K1th smallest of these pairwise slopes and upper = the (K2+1)th smallest, with K1 = (N − C)/2, K2 = (N + C)/2 and C = z1−α/2 · se (se as in the Mann-Kendall Test).
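A minimal Python sketch of this calculation (assuming no ties, so that the Mann-Kendall variance takes its simple form; the exact rank conventions for the confidence limits vary slightly between references):

import numpy as np
from scipy.stats import norm

def sen_slope(x, alpha=0.05):
    x = np.asarray(x, dtype=float)
    n = len(x)
    # all pairwise slopes (xj - xi)/(j - i), sorted
    slopes = sorted((x[j] - x[i]) / (j - i)
                    for i in range(n - 1) for j in range(i + 1, n))
    N = len(slopes)                          # N = n(n-1)/2
    var_s = n * (n - 1) * (2 * n + 5) / 18   # Mann-Kendall variance, no ties
    c = norm.ppf(1 - alpha / 2) * np.sqrt(var_s)
    k1 = int(round((N - c) / 2))             # rank of the lower limit
    k2 = int(round((N + c) / 2))             # rank of the upper limit
    lower = slopes[max(k1 - 1, 0)]
    upper = slopes[min(k2, N - 1)]
    return np.median(slopes), lower, upper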
Figure 2 – Sen’s Slope (step 2)
Worksheet Function
Real Statistics Function: The Real Statistics Resource Pack supplies the
following array function to automate the steps required to calculate Sen’s
slope.
SEN_SLOPE(R1, lab, alpha): returns a column array with the values:
Sen’s slope along with the lower and upper limits of the 1–alpha confidence
interval.
R1 is a column array containing the time series values, if lab = TRUE then an
extra column of labels is appended to the output (default FALSE)
and alpha is the significance level (default .05).
For Example 1, =SEN_SLOPE(A4:A15,TRUE) outputs the results shown in
range Q11:R13 of Figure 2.
Data Analysis Tool
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack
provides the Mann-Kendall and Sen’s Slope data analysis tool.
To use this data analysis tool for Example 1 (whose data is repeated on the
left side of Figure 4), press Ctrl-m and select the Mann-Kendall and
Sen’s Slope option from the Time S tab (or from the Time Series dialog
box if using the original user interface) and then fill in the dialog box that
appears as shown in Figure 3.
Cox-Stuart Test
Figure 1 – Cox-Stuart Test
The p-value from the test is shown in cell D9 using the binomial distribution
(or in cell D11 using the Slope Test). Since p-value = .035156 < .05 = α, we
have a significant result provided the time series is decreasing. Since the ratio
of positive differences to total differences in column K is .125 (cell D8), which
is less than .5, we can conclude that the time series is indeed decreasing. Note
that this is a one-tail test since we specified a decreasing trend.
If we had specified an increasing trend, then clearly we would have a non-
significant result (p-value = 1-.035156 = .964844). If we wanted to determine
whether there was a trend in either direction, then we would perform a two-
tailed test with p-value = 2*.035156 = .070312, which is not significant.
Worksheet Function
Real Statistics Function: The Real Statistics Resource Pack provides the
following array function.
COX_STUART(R1, tails): returns a column array with the p-value of the
Cox-Stuart test for the data in the column array R1 along with the ratio of
positive differences to total differences.
tails = 1 or 2 (default). If ratio < .5 then any trend is decreasing, while if ratio
> .5 then any trend is increasing.
Applying this function in range D13:D14, we get the same results as shown
above for Example 1.
Examples Workbook
Click here to download the Excel workbook with the examples described on
this webpage.
Reference
Logos, T. (2009) Trend analysis with Cox-Stuart test in R. R-Bloggers
https://www.r-bloggers.com/2009/08/trend-analysis-with-the-cox-stuart-
test-in-r/
Granger Causality
Granger Causality
As we have learned on many occasions, correlation doesn’t necessarily imply
causality, and while we can measure the degree of association between two
variables, i.e. correlation, it is harder to determine whether one variable
causes another variable.
Although generally, we don’t believe that a present or future event can cause
a past event, we do believe that it is possible that a past event can cause a
present or future event. This is the impetus for the Granger Causality test on time-series data, which gives evidence as to whether variable x causes y. Whether this test really demonstrates causality is open to debate, and so
we will use the phrase “x Granger-causes y” instead of “x causes y”.
As we will see, x Granger-causes y when the prediction of y is improved by
the inclusion of past values of x.
Granger Causality Test
The test is based on the following OLS regression model:
yi = α0 + α1yi-1 + ⋯ + αmyi-m + β1xi-1 + ⋯ + βmxi-m + εi
Here, the αj and βj are the regression coefficients and εi is the error term. The
test is based on the null hypothesis:
H0: β1 = β2 = … = βm = 0
We say that x Granger-causes y when the null hypothesis is rejected.
We use the usual F test described in Adding Extra Variables to a Regression Model to determine whether there is a significant difference between the regression model shown above (the full model) and the reduced model based on the null hypothesis, i.e. the model without the βj terms (where all the βj = 0).
There we demonstrate two equivalent forms of the test:
F = ((SS′E − SSE)/m) / (SSE/df) = ((R2 − Rr2)/m) / ((1 − R2)/df)
where df = the degrees of freedom of the full model. Here, all the terms are based on the full model with the exception of SS′E and Rr2, which are based on the reduced model.
If the p-value for this test is less than the chosen value of α, then we reject
the null hypothesis and conclude that x causes y (at least in the Granger
causality sense).
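The test can be sketched in Python by fitting the full and reduced regressions and forming the F statistic from their SSE values; granger_f_test is an illustrative name, not a Real Statistics function.

import numpy as np
from scipy.stats import f as f_dist

def granger_f_test(x, y, m):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y) - m                           # usable observations
    Y = y[m:]
    lags_y = np.column_stack([y[m - j:len(y) - j] for j in range(1, m + 1)])
    lags_x = np.column_stack([x[m - j:len(x) - j] for j in range(1, m + 1)])
    ones = np.ones((n, 1))

    def sse(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return resid @ resid

    sse_full = sse(np.hstack([ones, lags_y, lags_x]))   # full model
    sse_red = sse(np.hstack([ones, lags_y]))            # reduced model
    df = n - 2 * m - 1
    F = ((sse_red - sse_full) / m) / (sse_full / df)
    return F, f_dist.sf(F, m, df)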
Assumptions
The Granger Causality test assumes that both the x and y time series are
stationary. If this is not the case, then differencing, de-trending or other
techniques must first be employed before using the Granger Causality test.
Note that the number of lags, i.e. the value of m, is critical, in that different
values of m may lead to different test results. One approach to selecting an
appropriate value for m is to choose the value that results in the full model
with the smallest AIC or BIC value.
It is possible that causation is only in one direction, or in both directions
(x Granger-causes y and y Granger-causes x) or in neither direction.
Examples
Example 1: Figure 1 shows the egg production and chicken population
(including only those birds related to egg production) for the years 1931 to
1970. Determine whether the amount of the egg production Granger-causes
the size of the chicken population or the chicken population Granger-causes
the amount of egg production, or both or neither. This example is a tongue-in-cheek exploration of the common question, "Which came first: the chicken or the egg?"
Figure 1 – Chicken and Egg production
A plot of both time series (see Figure 2) shows that neither series is
stationary.
As a result, we will instead study the first differences of each time series. The
data and time series plots for these are shown in Figures 3 and 4.
Figure 3 – Differenced time series
Figure 5 – ADF tests
We now show how to determine whether Eggs Granger-cause Chickens for
lags = 4. To do this we perform regression on the X data in range E2:L37 of
Figure 6 and Y data in range M2:M37 (only the first 12 of 35 rows are shown).
Figure 7 – Test for Granger Causality
Here we use the Real Statistics function RSquare on the full model (cell AP3)
as well as the reduced model (AP4), although we could have gotten all the
values in the figure by actually conducting the regression.
Since p-value = 0.003892 is small, we conclude that Eggs Granger-cause
Chickens for lags = 4. Alternatively, we could have calculated the p-value by
placing the Real Statistics formula =RSquareTest(E3:L37,E3:H37,M3:M37)
in cell AP9.
Worksheet Functions
Real Statistics Functions: The Real Statistics Resource Pack supports the
following two functions that make it easy to determine whether the time
series in the column array Rx Granger-causes the time series in the column
array Ry at the specified number of lags.
GRANGER(Rx, Ry, lags) = the F statistic of the test
GRANGER_TEST(Rx, Ry, lags) = p-value of the test
We can use the GRANGER_TEST function to determine whether Eggs
Granger-causes Chickens and vice versa at various numbers of lags, as shown
in Figure 8.
Examples Workbook
Click here to download the Excel workbook with the examples described on
this webpage.
Reference
Thurman, W. N. and Fisher, M. E. (1988) Chickens, eggs, and causality, or
which came first? American Journal of Agricultural Economics. Vol. 70.
No. 2.
http://web.pdx.edu/~crkl/ec571/eggs.pdf
• Handling Missing Time Series Data
• Autoregressive Processes
• Basic Concepts
• Characteristic Equation
• Partial Autocorrelation
• Finding Model Coefficients using ACF/PACF
• Finding Model Coefficients using Linear Regression
• Lag Function Representation
• Augmented Dickey-Fuller Test
• Other Unit Root Tests
• Moving Average Processes
• Basic Concepts
• Infinite-order Moving Average
• Invertibility
• Finding Model Coefficients using ACF
• Finding Model Coefficients using Solver
• Autoregressive Moving Average Processes (ARMA)
• Basic Concepts
• ARMA(1, 1) processes
• ARMA(p, q) processes
• Calculating model coefficients using maximum likelihood
• Calculating model coefficients using Solver
• Evaluating the ARMA model
• Forecasting
• Real Statistics data analysis tool
• Real Statistics ARMA tool options
• Autoregressive Integrated Moving Average Processes (ARIMA)
• Differencing
• Identification
• Calculating model coefficients
• Comparing models
• Forecasting
• Seasonal ARIMA (SARIMA)
• Seasonality for Time Series
• SARIMA models
• Example of an ARIMA model
• SARIMA forecast example
• Real Statistics support
• Miscellaneous Topics
• Mann-Kendall Test
• Sen’s Slope
• Cox-Stuart Test
• Granger Causality
• Cointegration (Engle-Granger Test)
• Cross Correlations
• ARIMAX Model and Forecast
Autoregressive Processes
A p-order autoregressive process, denoted AR(p), takes the form
yi = φ0 + φ1yi-1 + φ2yi-2 + ⋯ + φpyi-p + εi
Autoregressive Process Proofs
Property 1: The mean of the yi in a stationary AR(p) process is
μ = φ0/(1 − φ1 − φ2 − ⋯ − φp)
Proof: Since the process is stationary, for any k, E[yi] = E[yi-k], a value which we will denote μ.
Since E[εi] = 0, E[φ0] = φ0 and
E[yi] = φ0 + φ1E[yi-1] + ⋯ + φpE[yi-p]
it follows that μ = φ0 + φ1μ + ⋯ + φpμ, and so μ(1 − φ1 − ⋯ − φp) = φ0, from which the result follows.
Proof: Since the yi and εi are independent, by basic properties of variance, it follows that
Proof: First note that for any constant a, cov(a+x, a+y) = cov(x,y). Thus, cov(yi,yj) has the same
value even if we assume that φ0 = 0, and similarly for var(yi) = cov(yi,yi). Thus, it suffices to prove
the property when φ0 = 0. In this case, by Property 1, μ = 0, and so cov(yi,yj) = E[yiyj].
Thus
since by the stationarity property, E[yi-1yi-k] = γk-1. Now, by induction on k, it is easy to see that
Hence
Autoregressive Processes Basic Concepts
In a simple linear regression model, the predicted dependent variable is modeled as
a linear function of the independent variable plus a random error term.
A first-order autoregressive process, denoted AR(1), takes a similar form, except that the value of the time series at time i is modeled as a linear function of its value at time i–1:
yi = φ0 + φ1yi-1 + εi
Thinking of the subscripts i as representing time, we see that the value of y at
time i+1 is a linear function of y at time i plus a fixed constant and a random error
term. Similar to the ordinary linear regression model, we assume that the error terms
are independently distributed based on a normal distribution with zero mean and a
constant variance σ2 and that the error terms are independent of the y values. Thus
It turns out that such a process is stationary when |φ1| < 1, and so we will make this
assumption as well. Note that if |φ1| = 1 we have a random walk.
Similarly, a second-order autoregressive process, denoted AR(2), takes the form
yi = φ0 + φ1yi-1 + φ2yi-2 + εi
The time series ACF values are shown for lags 1 through 15 in column F. These are
calculated from the y values as in Example 1. Note that the ACF value at lag 1 is
.394376. Based on Property 3, the population ACF value at lag 1 is ρ1 = φ1 = .4.
Theoretically, the values for ρh = φ1^h = .4^h should get smaller and smaller
as h increases (as shown in column G of Figure 1).
Figure 2 – Graphs of simulated AR(1) process and ACF
Observation: Based on Property 3, for 0 < φ1 < 1, the theoretical values of ACF
converge to 0. If φ1 is negative, -1 < φ1 < 0, then the theoretical values of ACF also
converge to 0, but alternate in sign between positive and negative.
Property 4 (Yule-Walker): For any stationary AR(p) process, the autocovariance at lag k > 0 can be calculated as
γk = φ1γk-1 + φ2γk-2 + ⋯ + φpγk-p
and, dividing by γ0, the autocorrelation satisfies
ρk = φ1ρk-1 + φ2ρk-2 + ⋯ + φpρk-p
For an AR(2) process, the Yule-Walker equations at lags 1 and 2 are ρ1 = φ1 + φ2ρ1 and ρ2 = φ1ρ1 + φ2. Solving for ρ1 yields
ρ1 = φ1/(1 − φ2)
Also
ρ2 = φ2 + φ1^2/(1 − φ2)
We can also calculate the variance as follows:
var(yi) = γ0 = σ^2/(1 − φ1ρ1 − φ2ρ2)
Figure 1 – Graph of PACF for AR(1) process
Observation: We see from Figure 1 that the PACF values for lags > 1 are close to
zero, as is expected, although there is some random fluctuation from zero.
Example 2: Repeat Example 1 for the AR(2) process yi = 5 + .4yi-1 – .1yi-2 + εi
Figure 2 – Simulated AR(2) process
This time we place the formula =5+0.4*0-0.1*0+B4 in cell C4, =5+0.4*C4-
0.1*0+B5 in cell C5 and =5+0.4*C5-0.1*C4+B6 in cell C6, highlight the range
C6:C103 and press Ctrl-D.
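The same simulation can be sketched in Python; here we assume εi ∼ N(0, 1), since the spreadsheet takes the error terms from column B.

import numpy as np

rng = np.random.default_rng(0)
n = 100
eps = rng.normal(0, 1, n)      # the error terms (column B)
y = np.zeros(n)
# the same recursion as the spreadsheet, with y0 = y-1 = 0 implicitly
for i in range(n):
    y1 = y[i - 1] if i >= 1 else 0.0
    y2 = y[i - 2] if i >= 2 else 0.0
    y[i] = 5 + 0.4 * y1 - 0.1 * y2 + eps[i]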
The ACF and PACF are shown in Figure 3.
Figure 3 – ACF and PACF for AR(2) process
As you can see, there isn’t a perfect fit between the theoretical and actual ACF and
PACF values.
Characteristic Equation for AR(p) Processes
Property 1: An AR(p) process is stationary provided all the roots of the following polynomial equation (called the characteristic equation) have an absolute value greater than 1:
1 − φ1z − φ2z^2 − ⋯ − φpz^p = 0
This is equivalent to saying that |z| > 1 for any z that satisfies the characteristic equation.
In fact, setting w = 1/z, this is equivalent to saying that |w| < 1 for any w that satisfies the following equation (the reverse characteristic equation):
w^p − φ1w^(p-1) − φ2w^(p-2) − ⋯ − φp = 0
By the Fundamental Theorem of Algebra, any pth degree polynomial has p roots; i.e.
there are p values of z that satisfy the above equation. Unfortunately, not all of these roots need to be real; some can involve "imaginary" numbers such as √−1, which is usually abbreviated by the letter i. For example, the equation z^2 + 1 = 0 has the roots i and –i, as can be seen by substituting either of these values for z in the equation z^2 + 1.
We now give three properties of imaginary numbers, which will help us avoid
discussing imaginary numbers in any further detail:
• all values which involve imaginary numbers can be expressed in the form a
+ bi where a and b are real numbers
• if a + bi is a root of a pth degree polynomial, then so is a – bi
• if z = a + bi then the absolute value of z is defined by |z| = √(a^2 + b^2)
Since a and b are real numbers, not involving √−1, we only need to deal with real numbers.
Property 2: An AR(1) process is stationary provided |φ1| < 1
Property 3: An AR(2) process is stationary provided
|φ2| < 1 and |φ1| + φ2 < 1
Example 1: Determine whether the following AR(2) process is stationary:
yi = 2yi-1 − .5yi-2 + εi
The roots of the reverse characteristic equation w^2 − 2w + .5 = 0 are w = 1 ± √.5. This process is not stationary since 1 + √.5 ≥ 1. You get the same result via Property 3 since |φ1| + φ2 = 2 – .5 = 1.5 ≥ 1.
Example 2: Determine whether the following AR(2) process is stationary.
Since
the roots of the reverse characteristic equation are not real. In fact
Thus
and so we see that this AR(2) process is stationary. We get the same result via
Property 3 since
where w1, …, wp are the unique roots of the reverse characteristic equation
Real Statistics Function: The Real Statistics Resource Pack supplies the following
array function where R1 is a p × 1 range containing the phi coefficients of the
polynomial where φp is in the first position and φ1 is in the last position.
ARRoots(R1): returns a p × 3 range where each row contains one root, and where
the first column consists of the real part of the roots, the second column consists of
the imaginary part of the roots and the third column contains the absolute value of
the roots
This function calls the ROOTS function described in Roots of a Polynomial. Note
that just like in the ROOTS functions, the ARRoots function can take the following
optional arguments:
ARRoots(R1, prec, iter, r, s)
prec = the precision of the result, i.e. how close to zero is acceptable. This value
defaults to 0.00000001.
iter = the maximum number of iterations performed when performing Bairstow’s
Method. The default is 50.
r, s = the initial seed values when using Bairstow’s Method. These default to zero.
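If you prefer to check stationarity outside Excel, here is a minimal sketch using numpy's polynomial root finder in place of Bairstow's Method; ar_roots is an illustrative name, not the Real Statistics function.

import numpy as np

def ar_roots(phi):
    # roots of 1 - phi1*z - ... - phip*z^p = 0; np.roots wants the
    # coefficients listed from the highest power of z down to the constant
    coeffs = [-p for p in phi[::-1]] + [1]
    return np.roots(coeffs)

# the AR(2) process of Example 1 (phi1 = 2, phi2 = -.5)
roots = ar_roots([2, -0.5])
print(roots)                          # [3.414..., 0.585...]
print(np.all(np.abs(roots) > 1))      # False: not stationary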
Finding AR(p) coefficients
Suppose that we believe that an AR(p) process is a fit for some time series. We now
show how to calculate the process coefficients using the following techniques: (1)
estimates based on ACF or PACF values, (2) using linear regression and (3) using
Solver. We illustrate the first of these approaches on this webpage.
One approach is to use the Yule-Walker equations in reverse to calculate the φ0, φ1, …, φp, σ2 coefficients based on the values of μ, γ0, …, γp (ACF values). Alternatively, we can use the values μ, γ0, π1, …, πp (PACF values), which it turns out are equivalent.
Example 1: Use the statistics described above to find the coefficients of the AR(1)
process based on the data in Example 1 of Autoregressive Processes Basic Concepts.
The first 8 of 100 data elements are shown in column B of Figure 1. We next
calculate the mean, variance and PACF(1) values. From these, we can estimate the
process coefficients as shown in cells G8:G10. This estimate of the time series is the
process yi = 4.983 + .394yi-1 + εi where σ2 = 1.421703.
Observation: We can use this approach for AR(2) processes, by noting that, by the Yule-Walker equations, ρ1 = φ1 + φ2ρ1 and ρ2 = φ1ρ1 + φ2. Thus
φ2 = (ρ2 − ρ1^2)/(1 − ρ1^2)
and so
φ1 = ρ1(1 − φ2) = ρ1(1 − ρ2)/(1 − ρ1^2)
Example 2: Use the statistics described above, to find the coefficients of the AR(2)
process based on the data in Example 1.
We show two versions in Figure 2. The lower version is based on the ACF using the
formulas described in the above observation. The upper version is based on the
PACF using Property 1 of Partial Autocorrelation of AR(p) Processes.
for any constant c and any variables x and z. We also use the following
notation for any variable z and non-negative integer n:
L^n(zi) = zi-n
or even
where 1 is the identity function and we use the notation (f+g)x to mean f(x)
+ g(x) for any functions f and g. This can also be expressed as
or
Note that
where the values r1, r2, …, rp are the characteristic roots of the AR(p)
process.
Based on the vector φ = [φ1, …, φp] of coefficients, we can define the operator φ(L) by
φ(L) = 1 − φ1L − φ2L^2 − ⋯ − φpL^p
Observation: The lag function is also called the (back) shift operator and
so sometimes the symbol B is used in place of L.
Other Unit Root Tests
Two other unit root tests are commonly used, in addition to or instead of
the Augmented Dickey-Fuller Test, namely:
• Phillips-Perron (PP) test
• Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test
While the ADF test uses a parametric autoregression to estimate the errors, the PP
test uses a non-parametric approach.
The KPSS test uses yet a different approach. Unlike the other tests, the null
hypothesis for the KPSS test is that the time series is stationary, while the alternative
hypothesis is that there is a unit root.
Real Statistics Functions: The Real Statistics Resource Pack provides the following
array functions where R1 contains a column of time series data.
PPTEST(R1, lab, lags, type, alpha) – an array function that returns a column
range for the PP test consisting of tau-stat, tau-crit, stationary (yes/no), lags and the
autocorrelation coefficient and p-value.
KPSSTEST(R1, lab, lags, type, alpha) – an array function that returns a column
range for the KPSS test consisting of test-stat, crit-value, stationary (yes/no), lags
and p-value.
As usual, if lab = TRUE (default is FALSE), the output consists of two columns
whose first column contains labels. type = the test type (0, 1, 2, default is 1). The
default value for alpha is .05.
To specify the test type, you can use “” or “none” instead of 0, you can use “drift”
or “constant” instead of 1 and you can use “trend” or “both” instead of 2.
Note too that the KPSS test does not support the case where there is no constant and
no trend. Thus, type for KPSSTEST is restricted to 1 and 2. If type = 0 is used, then
it is assumed that type = 1.
You can either specify the number of lags to test or use the values “short” or “long”.
If “short” is specified then lags is calculated to be =Round(4*(n/100)^.25,0)
where n = the number of elements in the time series, while if lags = “long” then the
value =Round(12*(n/100)^.25,0) is used.
In Figure 1, we repeat the analysis for Example 1 of Augmented Dickey-Fuller
Test using the PP and KPSS tests, specifying lags = “short” (which is equivalent
to lags = 3).
Figure 1 – PP and KPSS Tests
Moving Average Processes
A q-order moving average process, denoted MA(q), takes the form
yi = μ + εi + θ1εi-1 + θ2εi-2 + ⋯ + θqεi-q
Thinking of the subscripts i as representing time, we see that the value of y at time i+1 is a linear function of past errors. We assume that the error terms are independently distributed with a normal distribution with mean zero and a constant variance σ2.
Topics:
• Basic Concepts
• Infinite-order Moving Average
• Invertibility
• Finding Model Coefficients using ACF
• Finding Model Coefficients using Solver
The mathematical proofs of some of the properties of Moving Average Processes are given in Moving Average Proofs.
Calculating MA Coefficients using Solver
We now show how to use Excel’s Solver to calculate the parameters that best fit an
MA(q) process to some empirical time series data, based on the assumption that the
data does indeed fit an MA(q) process for some specific value of q.
Example 1: Repeat Example 1 of Calculating MA Coefficients using ACF using
Solver.
We created our 200 element time series by simulating the MA(1) process yi = εi –
.4εi-1 with σ2 = .25. The values in the time series are shown in range C4:C203 of
Figure 1.
Our goal is to fit this data to an MA(1) process of the form yi = μ + εi + θ1εi-1 (ignoring the fact that the time series was derived from a simulation of an MA(1) process).
MA(q) Process Basic Concepts
A q-order moving average process, denoted MA(q), takes the form
yi = μ + εi + θ1εi-1 + θ2εi-2 + ⋯ + θqεi-q
Thinking of the subscripts i as representing time, we see that the value of y at time i+1 is a linear function of past errors. We assume that the error terms are independently distributed with a normal distribution with mean zero and a constant variance σ2. Thus
zi = εi + θ1εi-1 + θ2εi-2 + ⋯ + θqεi-q
where zi = yi – μ. Thus, we can often simplify our analyses by restricting ourselves
to the case where the mean is zero.
Using the lag operator, we can express a zero mean MA(q) process as yi = θ(L)εi where
θ(L) = 1 + θ1L + θ2L^2 + ⋯ + θqL^q
Property 1: The mean of an MA(q) process is μ.
Property 2: The variance of an MA(q) process is
var(yi) = σ2(1 + θ1^2 + θ2^2 + ⋯ + θq^2)
Property 3: The autocorrelation of an MA(q) process at lag h is ρh = 0 for h > q, while for 1 ≤ h ≤ q
ρh = (θh + θ1θh+1 + ⋯ + θq-hθq)/(1 + θ1^2 + ⋯ + θq^2)
If the process is invertible (see Invertible MA(q) Processes) then
Example 1: Simulate a sample of size 199 from the MA(1) process yi = 4 + εi + .5εi-1 where εi ∼ N(0,2).
Figure 1 – Simulated MA(1) data
By Properties 1 and 2, the theoretical values for the mean and variance are μ = 4 and var(yi) = σ2(1 + θ1^2) = 2^2(1 + .5^2) = 5. These compare to the actual time series values of y̅ = AVERAGE(C6:C204) = 4.358 and s2 = VAR.P(C6:C204) = 4.401.
The ACF values are shown for lags 1 through 15 in Figure 2. These are calculated
from the y values as in Example 1 of AR(p) Process Basic Concepts. Note that the
ACF value at lag 1 is .301285. Based on Property 3, the population ACF value at lag 1 is
ρ1 = θ1/(1 + θ1^2) = .5/(1 + .5^2) = .4
The ACF values for lags h > 1 vary from about -.18 to .16, compared to the
theoretical value of ρh = 0 (per Property 3). As you can see from Figure 2, the sample
values can be quite different from the theoretical values.
Figure 2 – ACF for MA(1) process
Observation: By Property 3, ρ1 = θ1/(1 + θ1^2) = (1/θ1)/(1 + (1/θ1)^2),
and so the reciprocal of θ1 yields the same ACF. Thus, if we are seeking a
coefficient θ1 that yields a particular ρ1 value, we can always choose a coefficient
whose absolute value is at most 1. In fact, it turns out that there is always a unique
such θ1 whose absolute value is less than 1.
This is also true for any MA(q) process, and so we will restrict our MA(q) processes
to those where |θj| < 1 for all j. This also ensures that the MA(q) process is invertible
(see Invertible MA(q) Processes).
Example 2: Chart PACF for the data in Example 1.
The approach is as described in Example 1 of Partial Autocorrelation Function. The
chart is shown in Figure 3.
Figure 3 – Graph of PACF for MA(1) Process
The theoretical PACF values are calculated using Property 6. In particular, we insert
the following formula in cell G5, highlight the range G5:G19 and press Ctrl-D.
=-((-0.5)^E5*(1-0.5^2)/(1-0.5^(2*E5+2)))
Note that we couldn’t use the following formula
=-(-0.5)^E5*(1-0.5^2)/(1-0.5^(2*E5+2))
This is because Excel incorrectly evaluates any expressions of the form -a^n as if
they were (-a)^n. Thus –1^2 and –(–1)^2 are evaluated as 1 instead of -1.
Observation: We can see from Figure 3 that the absolute value of the PACF values
tends towards zero as the lag increases. This is generally true for an MA(q) model.
Figure 1 – Using Solver to fit an MA(1) process
As we have done elsewhere, we calculate the mean of the time series to provide our estimate of the mean of the process, namely μ = AVERAGE(C4:C203) = .03293, which as noted previously is not significantly different from zero.
We can now either subtract off this value for the mean from the y values in column
C or simply assume that the mean is zero, and proceed assuming that μ = 0, which
is what we will do here.
Since yi = εi + θ1εi-1, it follows that εi = yi – θ1εi-1. Thus, for any estimated value of θ1,
we can calculate the values of the εi for i > 1 based on the data values in the time
series and the assumption that the initial residual value is zero, i.e. ε0 = 0.
Thus, we place 0 in cell D4 of Figure 1. Next, we insert the formula =C5-$G$3*D4
in cell D5, highlight the range D5:D203 and press Ctrl-D.
By Property 2 of Moving Average Processes Basic Concepts, var(yi) = σ2(1 + θ1^2), and so an estimate for σ2 can be calculated from the estimate for θ1 using the formula σ2 = var(yi)/(1 + θ1^2), with VAR.P(C5:C204) or VARP(C5:C204) as an estimate for var(yi). This is the formula in cell G4 of Figure 1.
We use as an initial guess for the value of θ1 the value calculated by using the ACF
estimate from Example 1 of Calculating MA Coefficients using ACF, namely
-0.28958, although we could simply use 0. As usual, we will use Solver to minimize the mean squared error (MSE), which is simply the sum of the squares of the εi values, as shown by the formula in cell G6 of Figure 1.
We now select Data > Analysis|Solver which brings up the dialog box shown on
the right side of Figure 1. We fill in the values shown to minimize MSE (cell G6) by
changing the value of θ1 (cell G3). Note that since we want to restrict θ1 to have an
absolute value less than 1, we add the constraints shown in the dialog box. We could
also add a constraint to ensure that σ2 > 0, although this is not necessary since the
formula in cell G4 already creates this constraint (provided all the data elements in
column C are not equal).
When we click on the Solve button, the results shown in Figure 2 appear.
Figure 2 – Solver output for MA(1) process
We see that MSE is minimized when θ1 = -0.35909, a value that is closer to the
original value than the result from Example 1 of Calculating MA Coefficients using
ACF.
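The same minimization can be sketched in Python, with a bounded one-dimensional optimizer playing the role of Solver; the file name series.txt is a hypothetical stand-in for the (mean-adjusted) data in column C.

import numpy as np
from scipy.optimize import minimize_scalar

def ma1_sse(theta1, y):
    # residual recursion eps_i = y_i - theta1*eps_{i-1} with eps_0 = 0
    eps, total = 0.0, 0.0
    for yi in y:
        eps = yi - theta1 * eps
        total += eps * eps
    return total

y = np.loadtxt("series.txt")          # hypothetical stand-in for column C
res = minimize_scalar(ma1_sse, args=(y,), bounds=(-0.99, 0.99),
                      method="bounded")
theta1 = res.x
sigma2 = np.var(y) / (1 + theta1 ** 2)   # from var(y) = sigma^2(1 + theta1^2)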
We should also make sure that the residual values in column D are consistent with
white noise. We see this by using the Ljung-Box test or looking at the Correlogram,
as shown in Figure 3.
Figure 3 – Check that residuals are white noise
Example 2: Repeat Example 1 trying to fit the data with an MA(2) process.
The approach is the same, except that this time, we calculate the residuals using the
formula εi = yi – θ1εi-1 – θ2εi-2, and assume that ε0 = ε-1 = 0. E.g., this is captured in
Excel by using the formula =C6-$G$3*D5-G$4*D4 in cell D6 and similarly for the
other cells in column D.
The setup for Solver is shown in Figure 4 using initial guesses of θ1 = θ2 = 0.
Figure 4 – Using Solver to fit an MA(2) process
The output is shown in Figure 5.
In Comparing ARIMA Models we discuss how to determine which model is a better
fit, the MA(1) process from Example 1 or the MA(2) process from Example 2. We
also show how to use this model for forecasting in ARIMA Forecasting.
Moving Average Proofs
Moving Average Basic Concepts
The following are proofs of properties found in Moving Averages Basic
Concepts
Property 1: The mean of an MA(q) process is μ.
Proof:
Proof:
Proof:
When h = 1
since E[εi-1] = 0. When h > 1
Proof:
Thus, when h = 1
and when h = 2
and when h > 2
and for h = 2
and
Now define
Then the original AR(1) process can be transformed into the process
which is
But then
and so
which means that
Similarly
which results in
and so
is the desired MA(∞) process.
Property 1: Any stationary AR(1) process can be expressed as an MA(∞) process. In fact
yi = μ + εi + φ1εi-1 + φ1^2εi-2 + φ1^3εi-3 + ⋯
Proof: Using the same approach as in Example 1, we find that the AR(1)
process
can be expressed as
where
Since the original process is a stationary AR(1), |φ1| < 1 and the εi have the desired
properties.
Observation: Another way to see this is to use the lag operator, namely that an
AR(1) process (with zero mean) can be expressed as
where
as well as
where
Substituting the first equation inside the second, we get
i.e.
Thus
etc.
and so we see that
It follows that
We also observed above that
Example 2: Show that the following AR(2) process can be represented by an MA(∞)
process.
By Property 1 of Autoregressive Processes Basic
Concepts, the mean is
Now define
Then the original AR(2) process can be transformed into the process
But then
and so
etc. Thus the first few terms of the MA(∞) process are
Figure 1 – Convert AR(2) model into an MA(∞) model
Infinite Moving Average Processes
The following is a proof of Property 4 in Infinite Moving Average Processes.
Property 4: The following are true for any MA(∞) process
Proof: The assumption that the infinite sum of the absolute values of
the ψi terms is finite is needed to show that all the infinite series listed below
converge. As usual, it is sufficient to demonstrate the above properties in the
case where the mean is 0.
Invertibility of MA(q) Processes
Basic Concepts
Just as we can define an infinite-order moving average process, we can also define
an infinite-order autoregressive process, AR(∞). It turns out that any stationary
MA(q) process can be expressed as an AR(∞) process. E.g. suppose we have an MA(1) process with μ = 0, i.e. yi = εi + θ1εi-1. Thus
εi = yi − θ1εi-1 = yi − θ1(yi-1 − θ1εi-2) = yi − θ1yi-1 + θ1^2εi-2
Continuing in this way, after n steps we have
εi = yi − θ1yi-1 + θ1^2yi-2 − ⋯ + (−θ1)^(n-1)yi-n+1 + (−θ1)^nεi-n
As a result, we have
εi = yi − θ1yi-1 + θ1^2yi-2 − θ1^3yi-3 + ⋯
or equivalently
yi = εi + θ1yi-1 − θ1^2yi-2 + θ1^3yi-3 − ⋯
It turns out that if |θ1| < 1 then this infinite series converges to a finite value. Such MA(q) processes are called invertible.
Properties
Property 1: If |θ1| < 1 then the MA(1) process is invertible
Property 2: The MA(q) process yi = μ + εi + θ1εi-1 + ⋅⋅⋅ + θqεi-q is invertible provided the absolute value of all the roots of the characteristic polynomial 1 + θ1L + θ2L^2 + ⋅⋅⋅ + θqL^q = 0 is greater than 1.
Worksheet Function
Real Statistics Function: The Real Statistics Resource Pack supplies the following
array function where R1 is a q×1 range containing the theta coefficients of the
polynomial where θq is in the first position and θ1 is in the last position.
MARoots(R1): returns a q × 3 range where each row contains one root, and where
the first column consists of the real part of the roots, the second column consists of
the imaginary part of the roots and the third column contains the absolute value of
the roots
This function calls the ROOTS function described in Roots of a Polynomial. Note
that just like in the ROOTS functions, the MARoots function can take the following
optional arguments:
MARoots(R1, prec, iter, r, s)
prec = the precision of the result, i.e. how close to zero is acceptable. This value
defaults to 0.00000001.
iter = the maximum number of iterations performed when performing Bairstow’s
Method. The default is 50.
r, s = the initial seed values when using Bairstow’s Method. These default to zero.
Example
Example 1: Determine whether the following MA(3) process is invertible
yi = 4 + εi + .5εi-1 – .2εi-2 + .6εi–3
We insert the array formula =MARoots(B3:B5) in range D3:F5 to obtain the results
shown in Figure 1.
Figure 1 – Roots of an MA(3) process
We see that the three roots of the characteristic equation are -.605828–1.23715i, -
.605828+1.23715i, and -0.87832. Since the absolute value of the real root is less than
1, we conclude that the process is not invertible.
Example 2: Determine whether the following MA(2) process is invertible
yi = εi – εi-1 + .21εi-2
Using the same approach as for Example 1, we see that the roots of the characteristic
polynomial are 10/3 and 10/7, both of which are greater than one. Thus, we conclude
that this is an invertible process.
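A quick numeric check of Example 2, using numpy's root finder in place of MARoots:

import numpy as np

# characteristic polynomial 1 + theta1*L + theta2*L^2, highest power first
theta1, theta2 = -1.0, 0.21
roots = np.roots([theta2, theta1, 1.0])
print(roots)                          # [3.333..., 1.428...], i.e. 10/3 and 10/7
print(np.all(np.abs(roots) > 1))      # True: the process is invertible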
Examples Workbook
Click here to download the Excel workbook with the examples described on this
webpage.
References
Peiris, M. S. (2013) Invertibility of MA processes. Time series concepts & methods
https://talus.maths.usyd.edu.au/u/UG/SM/STAT3011/r/TS/NOTES10.pdf
Brockwell, P. J. and Davis, R. A. (2002) Introduction to time series and
forecasting, 2nd Ed. Springer.
http://home.iitj.ac.in/~parmod/document/introduction%20time%20series.pdf
Calculating MA Coefficients using ACF
If we know (or assume) that a time series can be fit by an MA(q) process, then we
need to figure out the value of the parameters μ, σ2, q, θ1, …, θq.
The initial approach to determining the value for q is to look at the ACF values for
the time series under consideration. Since we know that for an MA(q) process, ρk =
0 for all k > q, we seek the first value for q where ACF(q) is approximately zero. We
will refine this approach in Comparing ARIMA Models.
We next turn our attention to finding the other parameters that provide the best fit
for the data.
We start by looking at an MA(1) process yi = μ + εi + θ1εi-1. We know that
Property 1: The mean is μ.
Property 2: The variance is var(yi) = σ2(1 + θ1^2).
Property 3: The ACF at lag 1 is ρ1 = θ1/(1 + θ1^2).
We now calculate the mean (cell F4), variance (cell F5) and autocorrelation from the
time series as shown in the upper right-hand side of Figure 1. From these values, we
calculate two possible values for θ1, namely -0.28958 and -3.45331. Note that these
values are reciprocals of one another. Only the value θ1 = -0.28958 yields an
invertible MA(1) process since |θ1| < 1. In this case, we see that σ2 = 0.198967.
The result is an estimate of the MA(1) process, namely
with an estimate of 0.198967 for the variance of the εi. Using a one-sample t-test, we
can see that the mean is not significantly different from zero (t = .97, p-value = .33,
2 tailed test).
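A small Python sketch of this calculation; the lag-1 ACF value −0.2672 used below is the one implied by the two reciprocal solutions quoted above (since their sum is 1/ρ1) and appears here only for illustration.

import numpy as np

def ma1_theta_from_acf(r1):
    # solve r1 = theta/(1 + theta^2); the two roots are reciprocals and
    # only the one with |theta| < 1 yields an invertible process
    disc = np.sqrt(1 - 4 * r1 ** 2)   # requires |r1| <= .5
    t1 = (1 - disc) / (2 * r1)
    t2 = (1 + disc) / (2 * r1)
    return t1 if abs(t1) < 1 else t2

print(ma1_theta_from_acf(-0.2672))    # about -0.2896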
Observation: We can compute a somewhat crude 95% confidence range
for θ1 based on the normal approximation, as shown in Figure 2.
ARMA Processes
An autoregressive moving average (ARMA) process consists of both
autoregressive and moving average terms. If the process has terms from both an
AR(p) and MA(q) process, then the process is called ARMA(p, q) and can be expressed as
yi = φ0 + φ1yi-1 + ⋯ + φpyi-p + εi + θ1εi-1 + ⋯ + θqεi-q
Topics
• Basic Concepts
• ARMA(1, 1) processes
• ARMA(p, q) processes
• Calculating model coefficients using maximum likelihood
• Calculating model coefficients using Solver
• Evaluating the model
• Forecasting
• Real Statistics data analysis tool
• Real Statistics ARMA tool options
For proofs of some of the properties, see ARMA Proofs.
ARMA(1,1) Processes
For an ARMA(1, 1) process
yi = φ1yi-1 + εi + θ1εi-1
Now let's suppose that |φ1| < 1. We show how to create the MA(∞) representation as follows: repeatedly substitute yi-1 = φ1yi-2 + εi-1 + θ1εi-2, and so on, into the right-hand side. Thus
yi = ψ0εi + ψ1εi-1 + ψ2εi-2 + ⋯
where ψ0 = 1 and for j > 0
ψj = φ1^(j-1)(φ1 + θ1)
The MA(∞) representation is therefore
yi = εi + (φ1 + θ1)(εi-1 + φ1εi-2 + φ1^2εi-3 + ⋯)
If |φ1| < 1, then this ARMA(1,1) process is stationary. It also turns out that when |θ1|
< 1, the process is invertible.
Example 1: Find the MA(∞) form of the ARMA(1, 1) process yi = .4yi-1 + εi – .2εi-1.
Here φ1 = .4 and θ1 = –.2, so that ψj = .4^(j-1)(.4 – .2), i.e. ψ1 = .2, ψ2 = .08, ψ3 = .032, etc.
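A minimal Python sketch of these psi weights:

def psi_coeff(phi1, theta1, n):
    # psi weights of the MA(infinity) form of an ARMA(1,1) process:
    # psi_0 = 1 and psi_j = phi1^(j-1) * (phi1 + theta1) for j >= 1
    return [1.0] + [phi1 ** (j - 1) * (phi1 + theta1) for j in range(1, n + 1)]

print(psi_coeff(0.4, -0.2, 4))        # [1.0, 0.2, 0.08, 0.032, 0.0128]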
We get the same result using the Real Statistics PSICoeff array function as shown in
Figure 1.
Figure 3 – ACF for ARMA(1,1) Process
Cell M6 contains the formula =ACF($C$12:$C$111,L6), and similarly, for the other
cells in column M, cell N6 contains the formula
=(Q5+Q6)*(1+Q5*Q6)/(1+2*Q5*Q6+Q5^2) and cell N7 contains the formula
=N6*Q$5, and similarly for the rest of the cells in column N.
Since |φ1| = .7 < 1 and |θ1| = .2 < 1, this process is both stationary and invertible.
Example 3: Simulate a sample of 105 elements from the ARMA(1,1) process
Figure 4 – Simulated ARMA(1,1) Process with non-zero mean
ARMA(p,q) Processes
Property 1: An ARMA(p, q) process is stationary (causal) provided all the roots of the characteristic equation 1 − φ1z − ⋯ − φpz^p = 0 have an absolute value greater than 1.
The causal property implies (and is equivalent to) the fact that there exist constants ψj with ψ0 = 1 and ∑|ψj| < ∞ such that
yi = εi + ψ1εi-1 + ψ2εi-2 + ⋯
Thus all stationary ARMA processes can be expressed as an MA(∞) process.
In fact, the ψj coefficients can be determined as in Property 2.
Property 2: Let
Then
with σ2 = 1, it is not surprising that we can model the time series as an ARMA(1,1)
process. We now see how close the coefficients are to the coefficients of the original
ARMA(1,1) process.
Since we are assuming that we have an ARMA(1,1) process, we know that
Actually, it is sufficient to use the formula =F9-F8*J$8-G8*K$8 in cell G9. We use
the more complicated formula shown above since it is applicable when we get to the
general ARMA(p,q) case.
Figure 2 – Solver output
We see that the values φ1 = .6867541 and θ1 = -0.40305 reduce SSE from 125.98788
to 110.19884. These are the coefficients we are looking for. Note that the phi
coefficient is fairly similar to the original coefficient φ1 = .7, and the theta coefficient
is a little off from the original coefficient θ1 = -0.2.
We can use the Real Statistics T Test and Non-parametric Equivalents data
analysis tool, to determine whether the mean of the data values (in range F8:F112)
is significantly different from zero.
i.e. the same process as in Example 1, except that the constant term is non-zero.
Repeating the steps described in Example 1 (with a few differences, as described
below), we get the output from Solver shown in Figure 4 (only the first 10 data
elements are displayed).
Figure 4 – Solver output for ARMA(1,1) process with constant term
The data in columns B and C are the same as in Figure 2. Whereas column C contains
the original time series data yi, column F contains the data for the time series zi =
yi – µ. This is done by placing the formula =C6-K$7 in cell F6, highlighting the range
F6:F110 and pressing Ctrl-D. Here cell K7 contains the estimate of the mean of the
ARMA(1,1) process which is being estimated.
As in Example 1, now place 0 in cell G6 and the formula =F7-
SUMPRODUCT(F6,J$6)-SUMPRODUCT(G6,K$6) in cell G7. Then highlight the
range G7:G110 and press Ctrl-D. We also insert the formula =SUMSQ(G7:G110)
in cell K9 (SSE).
We now use Excel’s Solver to minimize the value of SSE. The Solver dialog box is
filled in as in Figure 1, except that this time we insert K9 in the Set Objective field
and the range J6:K7 in the By Changing Variable Cells field, where J6 contains the
initial guess for φ1, namely zero and K6 contains the initial guess for θ1, also zero,
K7 contains the initial guess for µ, namely 0, and J7 contains the value of φ0, which
is calculated by the formula =K7*(1-SUM(J6)) since as we saw in defining
ARMA(p,q) processes in ARMA Basic Concepts
Note that we could simply use the formula =K7*(1-J6) in cell J7 in the ARMA(1,1)
case, although the more complicated formula given above is applicable in the more
general ARMA(p,q) case.
After clicking on the Solve button, the output shown in Figure 4 is displayed. You
can see that the estimated values of φ0, φ1, θ1 are similar, but not exactly the same as
the original values of the simulated ARMA(1,1) process.
Evaluating the ARMA model
In Calculating ARMA(p,q) Coefficients using Solver we showed how to create
an ARMA model for time series data. We now present some statistics for
evaluating the fit of the model. All the statistics we present will be for the
ARMA(1,1) created in Example 2 of Calculating ARMA(p,q) Coefficients
using Solver.
Descriptive Statistics
We start with some descriptive statistics, as shown in Figure 1 (with reference
to the cells in Figure 4 of Calculating ARMA(p,q) Coefficients using Solver).
parameters, as long as the LL value (i.e. SSE value) is made as small as
possible.
Akaike’s Information Criterion (AIC) and Bayesian Information
Criteria (BIC) are two such measures. The latter is also called the Schwarz
Bayesian Criterion (SBC) or the Schwarz Information
Criterion (SIC). These are defined by
AIC = 2k + n·LN(SSE/n)
BIC = k·LN(n) + n·LN(SSE/n)
where k = the number of parameters in the model, which for a model without a constant term is k = p + q + 1 (including φ1, …, φp, θ1, …, θq, σ); in the case where there is a constant term, k = p + q + 2 (including φ0).
The value of -2LL is as described at the end of Calculating ARMA(p,q) Coefficients using Maximum Likelihood. Since all the models of a given time series have the same n, the n(1 + LN(2π)) portion of -2LL, and therefore of AICaug and BICaug, is not relevant, and so can be left out of the definitions of AIC and BIC.
For the time series in Example 2 of Calculating ARMA(p,q) Coefficients
using Solver), the values of these statistics are shown in Figure 2.
the two models are significantly different by using the fact that 2(LL1 – LL0) ∼ χ2(1) where LL1 is the LL value for the complete model and LL0 is the LL value for the reduced model.
For ARMA(p,q) models, this is equivalent to testing
n·LN(SSE0/SSE1) ∼ χ2(1)
where SSE0 and SSE1 are the sums of the squared errors of the reduced and
upper portion of Figure 3.
Example 1: Create a forecast for times 106 through 110 based on the
ARMA(1,1) model created in Example 1 of Calculating ARMA Coefficients
using Solver.
The result is shown in Figure 1, where we have omitted the data for times 5
through 102 to save space.
Note that since we don’t have an observed value for ε106, we use the theoretical
mean value, namely zero. The forecasted value at time i = 106 is calculated
in Figure 1 using the formula
=SUMPRODUCT(W112,J$8)+SUMPRODUCT(X112,K$8). For this model,
this formula can be simplified to =W112*J8+X112*K8, but the longer
formula will come in handy when we create forecasts using ARMA(p, q)
where p and/or q is larger than 1.
The forecast at time i = 107 is calculated by
This time, there are no observed values for ε106, ε107, or y106. As before, we
estimate ε106 and ε107 by zero, but we estimate y106 by the forecasted value ŷ106.
This is accomplished in Excel using the formula =SUMPRODUCT(Y113,J8).
The forecasted values at times 108, 109 and 110 are calculated in a similar
manner.
We next look at the standard error for the forecast values. For the observed times, the standard error is σ, which can be estimated by √(SSE/n) = √(110.19884/104) = 1.029371, where SSE = 110.19884 (see Figure 2 of Calculating ARMA Coefficients using Solver).
The standard error of the mth forecasted value (after the n observed values) is given by the formula
σ·√(1 + ψ1^2 + ψ2^2 + ⋯ + ψm-1^2)
where
yi = εi + ψ1εi-1 + ψ2εi-2 + ⋯
is the MA(∞) representation of the ARMA process. As before, we estimate σ by √(SSE/n) = 1.029371.
The psi coefficients for the MA(∞) representation are as shown in Figure 2.
Example 2: Create a forecast for times 106 through 110 based on the
ARMA(1,1) model created in Example 2 of Calculating ARMA Coefficients
using Solver.
The process is identical to that shown in Example 1. The only difference is
that this time there is a constant term in the ARMA(1,1) model. The result is
shown in Figure 3.
Figure 2 – ARMA(2,1) model – part 1
In Figure 2, we see that the best fit ARMA(2,1) process is given by
yi = 2.64 + .68yi-1 + .06yi-2 + εi – .41εi-1
The mean value 10.143026 (cell J6) has been subtracted from all the y values in
column B (shown in Figure 1) to obtain the z values in column E.
Note that the formula in cell F6 is
=E6-SUMPRODUCT(E4:E5,I$4:I$5)-SUMPRODUCT(F4:F5,J$4:J$5)
Similar formulas are used to calculate the other residual values shown in column F.
The formula used to calculate SSE (cell J8) is =SUMSQ(F6:F108). The other cells
are calculated as described in Evaluating the ARMA Model.
Note that AIC = 16.68 (cell J21). This compares with AIC = 13.03 for the
ARMA(1,1) model used to fit the same data as shown in Figure 2 of Evaluating the
ARMA Model. This gives evidence that the ARMA(1,1) model is a better fit for the
data than the ARMA(2,1) model. Similarly, BIC = 29.86 (cell J22) for the
ARMA(2,1) model is greater than BIC = 20.30 for the ARMA(1,1) model shown in
Figure 2 of Evaluating the ARMA Model, giving more evidence that the
ARMA(1,1) is the better, and certainly more parsimonious, fit for the data.
Figure 4 – ARMA(2,1) model – part 3
To save space, we have not included the values for times 4 through 101.
Figure 6 – Time series forecast
See ARMA Tool Options for a description of the following options that are
displayed in Figure 1:
• Make AR(p) agree with OLS
• Include sigma-sq in AIC/BIC
• Reformat for Linear Regression
• Use Solver
Real Statistics Function: The Real Statistics Resource Pack provides the following
array functions. In particular, the first function is used to calculate the ARIMA
coefficients and their standard errors.
ARIMA_Coeff(R1, p, q, d, con, lab) = a p+q+1 × 4 array, each row of which
contains the coefficient, standard error, t-stat and p-value (in order: constant, phi 1,
phi 2, …, theta 1, theta 2, …) of the ARIMA(p,q,d) model for the time series data
in column range R1; if lab = TRUE (default FALSE), then an extra row and
column are appended with labels; if con = TRUE (default) then a constant term is
used, otherwise it is not (i.e. it is set to zero).
Range Q4:R7 of Figure 3 contains the formula
=ARIMA_Coeff(B4:B108,2,1,0,TRUE), where only the first two columns of the
output are used (with no labels). The output from the array formula
=ARIMA_Coeff(B4:B108,2,1,0,TRUE,TRUE) is shown in range P9:T13 of Figure
7.
Figure 7 – Real Statistics ARIMA_Coeff and ARIMA_Stats functions
Note too that there are more options for the con argument than just TRUE and
FALSE. In fact, you can specify a column or row range with up to p+q+1 elements.
Each position in the range specifies the initial guess used for the corresponding
coefficient in the Levenberg-Marquardt algorithm. The initial guess for any
coefficients that are not explicitly specified is .2. Note too that if the element in the
range takes the form “c” followed by a numeric value, then that numeric value will
be fixed and the Levenberg-Marquardt algorithm will not change it.
E.g. for an ARIMA(2,1,0) model, the con range can have up to p+q+1 = 2+1+1 = 4
elements, the first for the constant, the second for phi 1, the third for phi 2 and the
fourth for theta 1. Thus, if cell D1 contains “c1.2” and cell D2 contains .4, then the
formula =ARIMA_Coeff(B4:B108,2,1,0,D1:D2) specifies that the constant term
will be fixed with the value 1.2 and phi 1 will be initialized to .4 (instead of .2)
although its final value will depend on the Levenberg-Marquardt algorithm. The
initial values of phi 2 and theta 1 will be .2 (the default) since these values have not
been specified in range D1:D2.
ARIMA_Stats(R1,R2, p, q, d, con, lab) = 7 × 1 column array containing the
values LL, SSE, MSE, AIC, BIC, AIC augmented and BIC augmented for the
ARIMA(p,q,d) model for the time series data in column range R1 based on the
coefficients in the p+q+1 × 1 column range R2; if lab = TRUE (default FALSE),
then an extra column of labels is appended to the output; if con = TRUE (default)
then a constant term is used, otherwise it is not (unlike ARIMA_Coeff, no other
values are acceptable).
The output from the array formula
=ARIMA_Stats(B4:B108,Q10:Q13,2,1,0,,TRUE) is shown in range P15:Q21 of
Figure 7.
ARMA Tool Options
We now describe the following options that are displayed in Figure 1 of Real
Statistics ARMA Tool.
• Make AR(p) agree with OLS
• Include sigma-sq in AIC/BIC
• Reformat for Linear Regression
• Use Solver
Also, note that the Differences field is described in ARIMA
Differencing and ARIMA Model Coefficients.
Use Solver: If this option is selected then the ARIMA_Coeff function is not used to
estimate the ARIMA coefficients, but instead Solver is used as described
in Calculating ARMA Coefficients using Solver.
Make AR(p) agree with OLS: The usual calculation for the s.e. statistic (e.g. cell J15 of Figure 2) is to take the square root of SSE/n. If we are modelling an AR(p) process, then it is common to use an ordinary least squares (OLS) regression, in which case s.e. = √(SSE/df) = √(SSE/(n−p−1)) if there is a constant term and s.e. = √(SSE/(n−p)) if there is no constant term.
If you check this option then the OLS regression approach is used for an AR(p) model; otherwise the √(SSE/n) value is used. This option does not affect the results for an ARMA(p, q) model where q ≠ 0.
Include sigma-sq in AIC/BIC: When calculating AIC and BIC, as shown
in Evaluating the ARMA Model, there is a term k which represents the number of
parameters in the model. The usual approach is to include σ as one of the model
coefficients, and so k = p + q + 2 for an ARMA(p, q) model with a constant term.
For the AR(p) model using OLS regression, however, the σ parameter is not used in
calculating AIC and BIC, and so k = p + 0 + 1 for an ARMA(p, 0) model with a
constant term. Actually, the σ parameter is not used in the Solver approach described
in Calculating ARMA Coefficients using Solver either.
If you check this option and either q > 0 or Make AR(p) agree with OLS is
unchecked, then σ is included as a parameter when calculating AIC and BIC;
otherwise, it is not.
Reformat for Linear Regression: In the case of a time series that is modeled as an
AR(p) process, you can choose to use ordinary linear regression using the Real
Statistics Linear Regression data analysis tool. To do this the data in the time series
must be reformatted as shown in the following example.
If you check this option then the time series data is reformatted so that it can be used as input to the Linear Regression tool; otherwise, no such reformatted data is output. In either case, the usual ARIMA calculations are made.
Example 1: Perform multiple linear regression for the AR(3) model of the time
series in range B4:B23 of Figure 1.
This time we choose the ARIMA Model and Forecast data analysis tool and insert
the range B4:B23 in the Input Range field, insert 3 in the AR order field, 0 in
the MA order field and check the Reformat for Linear Regression option.
When we click on the OK button, we see output similar to that in Figures 2, 3 and 4
of Calculating ARMA Coefficients using Solver along with the output in columns
AF through AI of Figure 1.
agree with OLS and Include sigma-sq in AIC/BIC fields, then the output would
agree with the linear regression model shown in Figure 1.
Real Statistics Function: The Real Statistics Resource Pack provides the following
array function where R1 is an n × 1 column range containing time series data and p is
a positive integer
ARMap(R1, p): outputs an n–p × p+1 range which contains X and Y data
equivalent to the data in R1 in order to perform multiple linear regression
In fact, the ARIMA Model and Forecast data analysis uses this function to produce
the reformatted data range described above. In particular, range AF6:AI22 in Figure
1 contains the array formula =ARMap(B4:B23,3).
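A sketch of the same reformatting in Python (the ordering of the lag columns is an assumption of this illustration; ARMap itself may order them differently):

import numpy as np

def ar_map(y, p):
    # lag columns y_{i-1}, ..., y_{i-p} as X, current value y_i as Y
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([y[p - j: n - j] for j in range(1, p + 1)])
    return np.column_stack([X, y[p:]])

print(ar_map(np.arange(10.0), 3).shape)   # (7, 4): n-p rows, p+1 columns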
Seasonality for Time Series
A time-series yi with no trend has seasonality of period c if E[yi] = E[yi+c].
If we have a stationary time series yi and a deterministic time series si such
that si = si+c for all i (and so si = si+kc for all integers k), then zi = yi + si would be
a seasonal time series with period c. As shown in Regression with
Seasonality, the seasonality of such time series can be modeled by using c–1
dummy variables.
A second way to model seasonality is to assume that si = μm(i) + εi where εi is a
purely random time series and μ0, …, μc-1 are constants where m(i) =
MOD(i,c).
A third approach is to model seasonality as a sort of random walk,
i.e. si = μm(i) + si-c + εi. If μ0 = … = μc-1 = 0 then there is no drift; otherwise μ0,
…, μc-1 capture the seasonal drift.
Of course, seasonality can be modeled in many other ways.
Recall that for the lag operator, L^c yi = yi-c, and so (1–L^c)yi = yi – yi-c. This is the
principal way of expressing seasonality for SARIMA models.
Note too that if si is deterministic with period c, then (1–L^c)si = si – si-c = 0.
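The following short Python sketch (illustrative only; the seasonal pattern and series are made up) confirms that seasonal differencing removes a deterministic additive seasonal component:

import numpy as np

rng = np.random.default_rng(0)
c = 4                                    # seasonal period
s = np.tile([10.0, -5.0, 3.0, -8.0], 8)  # deterministic pattern with s_i = s_{i+c}
y = rng.normal(0, 1, s.size)             # stationary component
z = y + s                                # seasonal time series z_i = y_i + s_i

dz = z[c:] - z[:-c]                      # (1 - L^c) z_i = z_i - z_{i-c}
# the deterministic component cancels exactly, leaving y_i - y_{i-c}
assert np.allclose(dz, y[c:] - y[:-c])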
SARIMA Models
As described in ARMA Models, an ARMA(p,q) model can be expressed as
yi = φ0 + φ1yi-1 + ⋯ + φpyi-p + εi + θ1εi-1 + ⋯ + θqεi-q
If φ0 = 0 (i.e. the mean of the stochastic process is zero) then this can be
expressed using the lag operator as
φ(L)yi = θ(L)εi
where φ(L) = 1 – φ1L – ⋯ – φpL^p and θ(L) = 1 + θ1L + ⋯ + θqL^q.
A seasonal ARIMA model takes the same form, but now there are additional
terms that reflect the seasonality part of the model. Specifically, a
SARIMA(p,d,q) × (P,D,Q)m model without constant can be expressed as
φ(L)Φ(L^m)(1–L)^d(1–L^m)^D yi = θ(L)Θ(L^m)εi
where Φ(L^m) = 1 – Φ1L^m – ⋯ – ΦPL^(mP) and Θ(L^m) = 1 + Θ1L^m + ⋯ + ΘQL^(mQ).
Setting zi = (1–L)^d(1–L^m)^D yi (the differenced time series), the forecast can be expressed (where the coefficients should have a hat on
them):
φ(L)Φ(L^m)zi = θ(L)Θ(L^m)εi
And so
zi = (1 – φ(L)Φ(L^m))zi + θ(L)Θ(L^m)εi
which is equivalent to
zi = φ*1zi-1 + ⋯ + φ*p+mP zi-p-mP + εi + θ*1εi-1 + ⋯ + θ*q+mQ εi-q-mQ
where φ(L)Φ(L^m) = 1 – φ*1L – ⋯ – φ*p+mP L^(p+mP) and θ(L)Θ(L^m) = 1 + θ*1L + ⋯ + θ*q+mQ L^(q+mQ).
In the case where there is a constant term φ0 this expression takes the form
zi = φ0 + φ*1zi-1 + ⋯ + φ*p+mP zi-p-mP + εi + θ*1εi-1 + ⋯ + θ*q+mQ εi-q-mQ
This serves as the equation to estimate the forecast at time i (when the
final εi is set to zero). You can also solve for εi to obtain an expression that
can be used to estimate the residuals.
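The expanded coefficients φ*j and θ*j are simply the coefficients of the product of the two lag polynomials, which can be computed by polynomial convolution. Here is a small illustrative Python sketch (the coefficient values are made up):

import numpy as np

def lag_poly(coeffs, m, sign):
    # coefficient vector of 1 + sign*c1*L^m + sign*c2*L^(2m) + ...
    # use sign = -1 for AR polynomials and sign = +1 for MA polynomials
    p = np.zeros(m * len(coeffs) + 1)
    p[0] = 1.0
    for j, c in enumerate(coeffs, start=1):
        p[j * m] = sign * c
    return p

# phi*(L) = (1 - 0.5L)(1 - 0.3L^4) for a (1,d,1) x (1,D,1)_4 model
print(np.convolve(lag_poly([0.5], 1, -1), lag_poly([0.3], 4, -1)))
# -> [1, -0.5, 0, 0, -0.3, 0.15]; note the cross term phi1*Phi1 at lag 5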
SARIMA Forecast Example
In SARIMA Model Example we show how to create a SARIMA model for the
following example, step by step, in Excel.
Example 1: Create a SARIMA(1,1,1) ⨯ (1,1,1)4 model for Amazon’s quarterly
revenues shown in Figure 1 and create a forecast based on this model for the
four quarters starting in Q3 2017.
We now show how to use this model to create a forecast.
The coefficients estimated by this model (shown in AK3:AL7 of Figure 1) can
be used to create a forecast based on the following equation:
yi = φ0 + φ1yi-1 + Φ1yi-4 – φ1Φ1yi-5 + θ1εi-1 + Θ1εi-4 + θ1Θ1εi-5
Since the last data element is y25, we want to determine the forecasted values
of y26, y27, y28 and y29. To do this, we use the above formula with the data values
of yi-1, yi-4 and yi-5 when available, and (previously obtained) forecasted values
when the real data values are not available. This is shown in Figure 1.
Figure 1 – Forecast for the differenced time series
This figure shows the data values and residuals for the later portion of the time series
(leaving out the middle) plus the forecasted values. E.g. the forecast value in cell
AH29 is calculated by the formula
=AL$3+AL$4*AH28+AL$6*AH25-AL$4*AL$6*AH24+AL$5*AI28
+AL$7*AI25+AL$5*AL$7*AI24
After entering this formula, you can highlight the range AH29:AK32 and press Ctrl-
D to obtain the other three forecast values. Note that the residuals corresponding to
the four forecast values are implicitly set to zero.
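The recursion that this worksheet formula implements can be sketched in Python as follows (illustrative only; the coefficient ordering const, φ1, θ1, Φ1, Θ1 in AL3:AL7 is inferred from the formula above):

def sarima_forecast(z, eps, coef, m=4, steps=4):
    # z: differenced series, eps: residuals, coef: (phi0, phi1, theta1, Phi1, Theta1)
    phi0, phi1, theta1, Phi1, Theta1 = coef
    z, eps = list(z), list(eps)
    for _ in range(steps):
        i = len(z)
        zhat = (phi0 + phi1 * z[i-1] + Phi1 * z[i-m] - phi1 * Phi1 * z[i-m-1]
                + theta1 * eps[i-1] + Theta1 * eps[i-m]
                + theta1 * Theta1 * eps[i-m-1])
        z.append(zhat)
        eps.append(0.0)  # residuals for forecast periods are implicitly zero
    return z[-steps:]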
Now that we have the forecasted values for the time series shown in column
Q of Figure 3 of SARIMA Model Example, we need to translate these into
forecast values for the original time series (column O in Figure 3 of SARIMA
Model Example). To accomplish this, we need to undo the two types of
differencing.
We start by replicating the bottom of the data in Figure 3 of SARIMA Model
Example (i.e. the part that is not displayed) and then inserting the forecast
that we obtained in Figure 1. This is shown in Figure 2.
Figure 2 – Forecast (step 1)
We only need to go back far enough in the original time series to produce at least
one non-forecasted value in column AQ. Whereas differencing proceeds from
left to right, integrating (i.e. undoing differencing) proceeds from right to
left. If we know the values in cells AP5 and AQ9, we can obtain the value in
cell AP9 using the formula =AP5+AQ9. Similarly, if we know the value in
cell AO8 then we can calculate the value in cell AO9 using the formula
=AO8+AP9 (where the value in AP9 was calculated previously).
In a similar way, we can obtain the value in cell AP10, using the formula
=AP6+AQ10 and the value in cell AO10 using the formula =AO9+AP10. We
highlight the range AO10:AP13 and press Ctrl-D to obtain the other three
forecast values, as shown in Figure 3.
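The same right-to-left integration can be expressed in a few lines of Python (a sketch, assuming the model applies one first difference and one seasonal difference, as in this example):

def undo_differencing(y, w_forecast, m=4):
    # w_i = (y_i - y_{i-1}) - (y_{i-m} - y_{i-m-1}) is the doubly differenced series
    y = list(y)
    dy = [y[i] - y[i-1] for i in range(1, len(y))]  # first differences of the data
    for w in w_forecast:
        dy.append(dy[-m] + w)     # undo seasonal differencing: dy_i = dy_{i-m} + w_i
        y.append(y[-1] + dy[-1])  # undo first differencing: y_i = y_{i-1} + dy_i
    return y[-len(w_forecast):]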
We can now extend the plot shown in Figure 2 of SARIMA Model Example to
include the forecasted values, as shown in Figure 4.
SARMA_RES(R1, Rar, Rma, Rsa, Rsm, per, cons): returns a column array with
the residuals for the SARMA(p, q) ⨯ (P, Q)per model of the time series data
in R1, where Rar, Rma, Rsa, Rsm contain the phi, theta, Phi and Theta
coefficients and cons is the constant term.
SARMA_PRED(R1, Rar, Rma, Rsa, Rsm, per, cons): returns a column array with
the predicted values for this model; if the output is placed in a highlighted
range with more rows than R1, then forecasted values are appended (the
number of forecasts is equal to the number of rows in the highlighted range
minus the number of rows in R1).
SARIMA_PRED(R0, R1, d, D, per): returns a column array with the
forecasted values for the SARIMA(p, d, q) ⨯(P, D, Q)per model of the time
series data in R1 that correspond to the forecast values in R0 for the
SARMA(p, q) ⨯(P, Q)per model.
All the above arrays are column arrays. Any of the Rar, Rma, Rsa, Rsm arrays
may be omitted, although they can't all be omitted. per defaults
to 12 and cons defaults to 0 (i.e. no constant).
Note that the ADIFF function is an extension to the version described
in ARIMA Differencing.
Observation: Example 1 shows how to create a SARIMA(1, 1, 1) ⨯ (1, 1,
1)4 model and forecast. The above functions make it easier to create any
SARIMA model and forecast. To illustrate this, for Example 1 of SARIMA
Model Example, the following formulas could have been used:
=ADIFF(B4:B33,1,1,4) to create the time series in range AH4:AH28 of
Figure 4 of SARIMA Model Example
=SARMA_RES(AH4:AH28,AL4,AL5,AL6,AL7,4,AL3) to create the array of
residuals in range AI4:AI28 of Figure 4 of SARIMA Model Example
=SARMA_PRED(AH4:AH28,AL4,AL5,AL6,AL7,4,AL3) to create an array
of predicted values, which equal the values in range AH4:AH32 minus those
in range AI4:AI32 of Figure 4 of SARIMA Model Example. In particular, the last 4 of these
values are those found in range AH29:AH32.
=SARIMA_PRED(AH29:AH32,B4:B33,1,1,4) to create the array of forecast
values shown in range AO10:AO13 of Figure 3 of SARIMA Forecast
Example.
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack
provides the Seasonal Arima (Sarima) data analysis tool which creates a
SARIMA model and forecast.
To perform the analysis for Example 1 of SARIMA Model Example,
press Ctrl-m and choose Seasonal Arima (Sarima) from the Time
S tab (or from the Time Series dialog box if using the original user
interface). Now fill in the dialog box that appears as shown in Figure 1.
Figure 1 – SARIMA dialog box
If you leave the # of Forecasts field blank, then its value defaults to the
value in the Seasonal Period field. If that field is blank then no seasonality
is used in the model and # of Forecasts defaults to 5.
After clicking on the OK button, the output shown in Figures 2 and 3 is
displayed (only the first 24 rows of the output in Figure 2 and the first 20
rows in Figure 3 are displayed).
Figure 2 – SARIMA output (part 1)
Figure 3 – SARIMA output (part 2)
Most of the values are produced using the Real Statistics functions described
above. The formulas used for the descriptive statistics in range J13:J24 and
coefficient roots in columns P, Q and R are similar to those used for the
corresponding values in the Arima data analysis tool.
The lower portion of the output, which contains the forecast, is shown in
Figure 4. The values in columns D, E, F and G are the continuation of these
columns from Figure 2 and the values in columns T and U are the
continuation of these columns from Figure 3.
Figure 4 – SARIMA forecast output
Range G29:G32 contains the four-quarter forecast for the differenced time
series, while range U34:U37 contains the corresponding four-quarter
forecast for the revenues for the period Q3 2017 through Q2 2018.
Observation: For this example, we chose to use the Solver approach to
estimate the SARIMA coefficients. The default is to use the Levenberg-
Marquardt approach, which is obtained by leaving the Solver option in
Figure 1 unchecked. In that case, the output is similar to that described
above, except that the output in Figure 5 is now included; this output is
useful in that it provides the standard errors of the coefficients and the
t-tests that determine which coefficients are significantly different from zero.
corresponding standard errors. If lab = TRUE (default FALSE) then a
column of labels is appended to the output.
SARIMA_PARAM(R1, ar, ma, diff, per, sar, sma, sdiff, con): returns an
array with four columns, the first column of which contains the SARIMA
coefficients (in the order constant term, phi coefficients, theta coefficients,
Phi coefficients, Theta coefficients) and the remaining columns contain the
corresponding standard errors, t statistics and p-values.
Here, the parameters are ar = p, ma = q, diff = d, per = m, sar = P, sma = Q,
sdiff = D for a (p, d, q) × (P, D, Q)m SARIMA model. con = TRUE (default) if
a constant term is included.
Maximum Likelihood Estimation
We will further assume that the random column vector Y = [y1 y2 ··· yn]T is normally
distributed with pdf f(Y; β, σ2) where β = [φ1 ··· φp θ1 ··· θq]T. For any time series y1,
y2, …, yn the likelihood function is
L(β, σ2) = (2π)^(–n/2) (det Γn)^(–1/2) exp(–½ YT Γn^(–1) Y)
where Γn is the autocovariance matrix. As usual, we treat y1, y2, …, yn as fixed and
seek estimates for β and σ2 that maximize L, or equivalently the log of L, namely
LL(β, σ2) = –(n/2) ln(2π) – ½ ln det Γn – ½ YT Γn^(–1) Y
This produces the maximum likelihood estimates (MLE) B, s2 for the parameters β,
σ2. Equivalently, our goal is to minimize the negative log-likelihood –LL(β, σ2).
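As a minimal NumPy sketch (not the Real Statistics implementation), the log-likelihood above can be evaluated directly from the autocovariances; here it is illustrated with the AR(1) autocovariances γk = σ2φ^k/(1 – φ2) and randomly generated stand-in data:

import numpy as np
from scipy.linalg import toeplitz

def arma_loglik(y, gamma):
    # exact Gaussian log-likelihood of a zero-mean series with autocovariances gamma
    y = np.asarray(y, dtype=float)
    n = len(y)
    G = toeplitz(gamma[:n])            # the autocovariance matrix Gamma_n
    _, logdet = np.linalg.slogdet(G)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(G, y))

phi, sigma2, n = 0.6, 1.0, 50
gamma = sigma2 * phi ** np.arange(n) / (1 - phi**2)
y = np.random.default_rng(1).normal(size=n)
print(arma_loglik(y, gamma))           # minimize its negative over (phi, sigma2)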
Observation: From Property 1, we can conclude that for large enough n, the
following holds, where the parameter with a hat is the maximum likelihood estimate
of the corresponding parameter.
[Asymptotic variance formulas for the maximum likelihood coefficient estimates of the AR(1), AR(2), MA(1), MA(2) and ARMA(1,1) models]
ACF Correlogram
Note that cell D7 contains the formula =ACF($A$4:$A$59,C7), cell E7
contains the formula =-F7 and cell F7 contains the formula
=NORM.S.INV(1-$F$3/2)/SQRT(COUNT($A$4:$A$59))
The remaining values in columns D and E (until row 36, corresponding to
lag 30) are calculated using similar formulas. Cell F8 contains the formula
=NORM.S.INV(1-$F$3/2)*SQRT((1+2*SUMSQ(D$7:D7))/COUNT($A$4:$A$59))
and similarly for the other cells in column F. This reflects the fact that the
standard error and confidence interval of ACF(k) are
s.e.(rk) = √((1 + 2(r1^2 + ⋯ + rk-1^2))/n) and 0 ± zcrit ∙ s.e.(rk)
where zcrit = NORM.S.INV(1–α/2) and n is the number of elements in the time series.
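A short Python sketch of the same calculation (illustrative, using randomly generated data of the same length as A4:A59):

import numpy as np
from scipy.stats import norm

def acf(y, nlags):
    # sample ACF r_k based on the overall mean and the lag-0 autocovariance
    y = np.asarray(y, dtype=float); n = len(y); d = y - y.mean()
    return np.array([(d[:n-k] @ d[k:]) / (d @ d) for k in range(1, nlags + 1)])

def bartlett_band(r, n, alpha=0.05):
    # se(r_k) = sqrt((1 + 2*sum_{j<k} r_j^2)/n); the band is +/- z_(1-alpha/2)*se
    se = np.sqrt((1 + 2 * np.cumsum(np.r_[0.0, r[:-1] ** 2])) / n)
    return norm.ppf(1 - alpha / 2) * se

y = np.random.default_rng(2).normal(size=56)
r = acf(y, 15)
print(np.c_[r, bartlett_band(r, len(y))])  # each r_k with its band half-width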
PACF Correlogram
Example 2: Construct a PACF Correlogram for the data in column A of
Figure 1.
This time the PACF option from the dialog box in Figure 2 is selected. The
output is shown in Figure 3 (only the first 15 of the 30 lags are shown).
This reflects the fact that the standard error and confidence interval of
PACF(k) are
s.e. = 1/√n and 0 ± zcrit/√n
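For completeness, here is a sketch of the sample PACF computed via the Durbin-Levinson recursion; under the null hypothesis each PACF(k) has standard error approximately 1/√n, so the band is the constant zcrit/√n:

import numpy as np

def pacf(y, nlags):
    # sample PACF via the Durbin-Levinson recursion applied to the sample ACF
    y = np.asarray(y, dtype=float); n = len(y); d = y - y.mean()
    r = np.array([(d[:n-k] @ d[k:]) / (d @ d) for k in range(nlags + 1)])
    phi = np.zeros((nlags + 1, nlags + 1)); pac = np.zeros(nlags)
    for k in range(1, nlags + 1):
        num = r[k] - phi[k-1, 1:k] @ r[1:k][::-1]
        den = 1.0 - phi[k-1, 1:k] @ r[1:k]
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k-1, 1:k] - phi[k, k] * phi[k-1, 1:k][::-1]
        pac[k-1] = phi[k, k]
    return pac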
Handling Missing Time Series Data
When data is missing in a time series, we can use some form of imputation
or interpolation to impute a missing value. In particular, we consider the
approaches described in Figure 1.
Figure 1 – Imputation types (numeric label, text label and imputation type for each approach: linear, spline, prior, next, sma, wma, ema)
Figure 2 – Imputation Examples
Linear interpolation
Each missing value is imputed by linear interpolation between the nearest
non-missing values on either side. The missing value in cell E15 is imputed
in this way, as shown in cell G15, as is the missing value in cell E10 (cell
G10) and, finally, the missing value in cell E18 (cell G18).
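In Python, the same linear interpolation of missing values can be sketched as follows (the sample values are made up):

import numpy as np

def linear_impute(y):
    # impute NaNs by straight-line interpolation between the nearest
    # non-missing neighbors; endpoints fall back to the nearest known value
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    ok = ~np.isnan(y)
    return np.interp(idx, idx[ok], y[ok])

print(linear_impute([3, np.nan, 7, np.nan, np.nan, 13]))  # [3. 5. 7. 9. 11. 13.]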
Spline interpolation
To create the spline interpolation for the four missing values, first, create the
table in range O3:P14 by removing all the missing values. This can be done
by placing the array formula =DELROWBLANK(D3:E18,TRUE) in range
O3:P14, as shown in Figure 3. Next place the array formula
=SPLINE(R4:R18,O4:O14,P4:P14) in range S4:S18 (or in range H4:H18 of
Figure 2).
Figure 3 – Spline interpolation
The chart of the spline curve is shown on the right side of Figure 3. The
imputed values are shown in red on the chart.
See Spline Fitting and Interpolation for additional information.
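The same two steps (drop the missing rows, fit a spline, evaluate it everywhere) can be sketched in Python with SciPy's cubic spline; note that CubicSpline's default boundary conditions may differ slightly from those of the SPLINE function:

import numpy as np
from scipy.interpolate import CubicSpline

def spline_impute(t, y):
    # fit a cubic spline through the non-missing (t, y) pairs (the analogue of
    # DELROWBLANK) and evaluate it at every t (the analogue of SPLINE)
    t = np.asarray(t, dtype=float); y = np.asarray(y, dtype=float)
    ok = ~np.isnan(y)
    return CubicSpline(t[ok], y[ok])(t)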
Prior/Next
For Next the next non-missing value is imputed (or the last non-missing
value if there is no next non-missing value), while for Prior the previous
non-missing value is imputed (or the first non-missing value if there is no
previous non-missing value).
The missing value in cell E9 is imputed as 23 (cell J9) when using Next and
12 (cell I9) when using Prior. The missing value in cell E18 is imputed as 75
(cell I18 or J18) when using Prior or Next.
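These two rules are the familiar forward and backward fills; in Python (the series below is made up, echoing the 12, 23 and 75 examples):

import numpy as np
import pandas as pd

y = pd.Series([np.nan, 12, np.nan, 23, np.nan, 75, np.nan])
prior = y.ffill().bfill()  # Prior: previous non-missing value;
                           # a leading gap falls back to the first non-missing value
nxt = y.bfill().ffill()    # Next: next non-missing value;
                           # a trailing gap falls back to the last non-missing value
print(pd.DataFrame({"prior": prior, "next": nxt}))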
Simple Moving Average
The imputed value depends on the span value k, which is a positive integer.
To impute the missing values, we first use linear interpolation, as shown in
column AE of Figure 4. For any missing value in the first or last k elements
of the time series, we simply use the linear interpolation value. For the
others, we use the mean of the 2k+1 linearly interpolated values centered at
the missing value (i.e. the k values on either side plus the value itself).
In Figure 2 we use a span value of k = 3. To show how the values in column
K of Figure 2 are calculated, we calculate the linear interpolated values as
shown in column AE of Figure 4. Next, we place the formula
=IF(AD4="",AE4,AD4) in cell AF4, highlight range AF4:AF6 (i.e. a column
range with k = 3 elements) and press Ctrl-D. Similarly, we copy the formula
in cell AF4 into the last 3 cells in column AF.
Next, we place the formula =IF(AD7="",AVERAGE(AE4:AE10),AD7) in cell
AF7, highlight the range AF7:AF15 (i.e. all the cells in column AF that haven't
yet been filled in), and press Ctrl-D. The imputed values shown in column K are
identical to those shown in column AF.
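The whole procedure can be sketched in Python as follows:

import numpy as np

def sma_impute(y, k=3):
    # linearly interpolate first; then replace each interior missing value by
    # the mean of the 2k+1 interpolated values centered on it; missing values
    # in the first or last k positions keep the plain linear interpolation
    y = np.asarray(y, dtype=float)
    n = len(y); idx = np.arange(n); ok = ~np.isnan(y)
    lin = np.interp(idx, idx[ok], y[ok])
    out = y.copy()
    for i in np.where(~ok)[0]:
        out[i] = lin[i] if (i < k or i >= n - k) else lin[i-k:i+k+1].mean()
    return out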
Weighted Moving Average
Figure 5 – WMA for t = 6
Here, cell AK4 contains the formula =1/(ABS(AJ$7-AJ4)+1), cell AL4
contains =AE6 and cell AM4 contains =AK4*AL4. We can now highlight the
range AK4:AM10 and press Ctrl-D to fill in the other values. Then we sum the
weights to obtain the value 3.16667 as shown in cell AK11 and sum the
products to obtain the value 60.66667 as shown in cell AM11. The imputed
value is thus 60.66667 divided by 3.16667, i.e. 19.15789, as shown in cell
AM12.
This is the value shown in cell AG9 of Figure 4. In fact, we can fill in column
AG of Figure 4 as follows. First, insert the worksheet formula
=1+2*SUMPRODUCT(1/AC5:AC7) in cell AG20. Next, fill in the first 3 and
last 3 values in column AG by using the values in column AE. Finally, insert
the following formula in cell AG7, highlight range AG7:AG15, and press Ctrl-
D.
=IF(AD7="",SUMPRODUCT(AE4:AE10,1/(ABS(AC4:AC10-AC7)+1))/AG$20,AD7)
Exponential (Weighted) Moving Average
The approach is identical to that of the weighted moving average except that
we use weights that are powers of 1/2. Now, we weight the linearly interpolated
values in column AE of Figure 4 by 1 for t = 6, by 1/2 for t = 5 or 7, by 1/4
for t = 4 or 8, and by 1/8 for t = 3 or 9.
The calculation of the imputed value at t = 6 is as shown in Figure 5, except
that the formula used for cell AK4 is now =1/2^ABS(AJ$7-AJ4) (and
similarly for the other cells in column AK). The result is as shown in column
AH of Figure 4. This time, cell AH7 contains the formula
=IF(AD7="",SUMPRODUCT(AE4:AE10,1/2^ABS(AC4:AC10-AC7))/AH$20,AD7)
and cell AH20 contains the formula =1+2*SUMPRODUCT(1/2^AC4:AC6).
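Both weighting schemes can be sketched with a single Python function (illustrative; edge handling follows the simple moving average convention of falling back to the linear interpolation):

import numpy as np

def weighted_impute(y, k=3, scheme="wma"):
    # "wma" uses weights 1/(|t - t0| + 1); "ema" uses weights 1/2^|t - t0|
    y = np.asarray(y, dtype=float)
    n = len(y); idx = np.arange(n); ok = ~np.isnan(y)
    lin = np.interp(idx, idx[ok], y[ok])
    out = y.copy()
    for i in np.where(~ok)[0]:
        if i < k or i >= n - k:
            out[i] = lin[i]
            continue
        d = np.abs(np.arange(i - k, i + k + 1) - i)
        w = 1.0 / (d + 1) if scheme == "wma" else 0.5 ** d
        out[i] = (w @ lin[i-k:i+k+1]) / w.sum()  # e.g. 60.66667/3.16667 at t = 6
    return out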
Worksheet Function
Real Statistics Function: For a time series represented as a column array
where any non-numeric values are treated as missing, the Real Statistics
Resource Pack supplies the following array function:
TSImputed(R1, itype, k): returns a column array of the same size as R1
where each missing element in R1 is imputed based on the imputation
type itype which is either a number or text string as shown in Figure 2
(default 0 or “linear”) and k = the span (default 2), which is only used with
the three moving average imputation types.
For example, =TSImputed(E4:E18,"ema",3) returns the time series shown in
range M4:M18 of Figure 2.
Seasonality
If the time series has a seasonal component, then we can combine one of the
imputation approaches described in Figure 1 with a seasonality imputation
approach as described in Handling Missing Seasonal Time Series Data.
Handling Missing Seasonal Time Series Data
Seasonality
If a time series has a seasonal component, then we can combine one of the
imputation approaches described in Figure 1 of Handling Missing Time
Series Data with either deseasonalizing or split seasonal imputation (as
shown in Figure 1) based on the seasonality period (i.e. 4 for quarterly, 12 for
monthly, etc.).
Numeric label Text label Seasonality type
0 none no seasonality
1 seas deseasonalizing
2 split split seasonal imputation
Figure 1 – Seasonality types
Example 1: Apply the split seasonality approach to impute the missing
elements for the time series in column G of Figure 2.
We illustrate this approach for the same time series shown in Figure 2. This
is repeated in column G of Figure 3.
TSImputed(R1, itype, k, stype, per): returns a column array of the same size
as R1 where each missing element in R1 is imputed based on the
imputation type itype and k, as described in Handling Missing Time Series
Data, plus stype which is either a number or text string as shown in Figure 1
(default 0 or "none") and per which is the seasonal period (default 4 for
quarterly data); per is only used when stype is not "none".
For example, =TSImputed(G4:G19,"wma",2,"split",4) returns the time series
shown in range H4:H19 of Figure 2, and =TSImputed(G4:G19,"wma",2,"seas",4)
returns the time series shown in range K4:K19 of Figure 3.
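As a sketch of the split approach as I read it from the text (each season's subseries is imputed separately and then re-interleaved; the actual TSImputed implementation may differ in detail):

import numpy as np

def split_seasonal_impute(y, per=4):
    # split the series into per subseries (one per season), impute each
    # subseries separately (here by linear interpolation), then re-interleave
    y = np.asarray(y, dtype=float)
    out = y.copy()
    for s in range(per):
        v = y[s::per]
        idx = np.arange(len(v)); ok = ~np.isnan(v)
        out[s::per] = np.interp(idx, idx[ok], v[ok])
    return out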