Time Series Analysis
Histogram of X with a normal density overlaid:
hist X, normal
Shapiro-Wilk test for normality of x:
swilk x
When your data are not normally distributed, use a non-parametric test. Non-parametric tests do not rely on the normality assumption.
Non-parametric test (Wilcoxon rank-sum); supply the grouping variable inside by():
ranksum x, by()
Time series data are data collected on the same observational unit at multiple time periods.
They are used mainly for:
1. Forecasting, e.g. forecasting inflation levels, interest rate levels, etc., using models such as:
a. Autoregressive (AR) models
b. Autoregressive distributed lag (ADL) models
2. Estimating dynamic causal effects, where the concern is to know the effect of a policy shift over time.
Working with time series data raises issues that cross-sectional data do not:
1. Time lags
2. Correlation over time (serial correlation or autocorrelation)
Step 1
Declare the dataset as time series, telling Stata that year is your time variable:
tsset year
If X is stored as a labeled numeric variable, decode creates a string version of it:
decode X, gen(new_X)
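If the time variable is not already numeric, it can be converted before tsset. A minimal sketch, assuming a string variable qdate holding values such as "2001q3" (the variable name and date format are illustrative):
gen t = quarterly(qdate, "YQ")   // convert the string to a Stata quarterly date
format t %tq                     // display it as, e.g., 2001q3
tsset t                          // declare t as the time variable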
Correlation coefficient
Pairwise correlations, with stars marking correlations significant at the 1% level:
pwcorr, star(0.01)
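Without a variable list, pwcorr correlates every variable in the dataset. To restrict it to the variables of interest and also display the p-values, something like the following can be used (variable names are illustrative):
pwcorr gdp inflation interest, star(0.01) sig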
OLS requires that your time series be stationary. It is therefore important to determine stationarity in time series data: you need to work with stationary time series processes. A stationary process is characterized by a time-invariant mean, variance and autocovariance.
Some time series have a stochastic trend, commonly known as a unit root (a deterministic trend, by contrast, is a fixed function of time and is not a unit root). Regressing non-stationary series on one another will result in spurious regression.
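A classic way to see spurious regression is to regress two independently generated random walks on each other: the t statistic frequently looks "significant" even though there is no true relationship. A self-contained simulation sketch (it clears the data in memory; all names and the seed are illustrative):
clear
set obs 200
set seed 12345
gen t = _n
tsset t
gen y = sum(rnormal())   // random walk built from iid shocks
gen x = sum(rnormal())   // an independent random walk
regress y x              // often spuriously "significant"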
When testing for a unit root using the Augmented Dickey-Fuller (ADF) test, you are testing whether the process is a random walk or a random walk with a drift. There are variant forms of the ADF test depending on whether you allow for a non-zero constant and/or a deterministic trend.
Augmented Dickey-Fuller test for the presence of a unit root in X with a trend term
dfuller X, trend
Augmented Dickey-Fuller test for the presence of a unit root in X with a drift term
dfuller X, drift
Augmented Dickey-Fuller test for the presence of a unit root in X with a drift term and two lags
dfuller X, drift lags(2)
You can also use the Phillips-Perron test by issuing the command
pperron X, trend
Differenced models
Generating differences for variable X. You can generate a differenced series with the difference operator: the prefix d2. means second differencing; if you want third differencing you use d3., and so on (d. alone gives the first difference).
gen dX = d2.X
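After differencing, it is common to re-run the unit-root test on the differenced series. Time-series operators can be used directly inside commands, so a new variable is not strictly needed; a short sketch (the lag length is illustrative):
dfuller D.X, lags(2)    // ADF test on the first difference of X
dfuller D2.X, lags(2)   // ADF test on the second difference, if needed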
Autoregressive models
An autoregressive model uses lags for forecasting. AR is another form of a time series model: it uses a variable's own history to predict its future, so no other variable is needed. The regressors in an autoregressive model consist of lagged dependent variables only. Written as AR(p), it denotes the pth-order autoregressive model. Autoregression is based on the premise that time series are serially correlated, i.e. the current level of most series is correlated with previous or earlier levels.
reg Y L.Y
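The same command extends to higher orders. A sketch of an AR(2), which uses the first two lags of Y:
reg Y L(1/2).Y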
A model can also use lagged values of X to account for the lag effect. You might want to see whether previous values of X have any impact on current values of Y.
To generate the lagged values of X by hand you can use the observation subscript, for example:
gen X_lag1 = X[_n-1]
Alternatively, you can use the lag operator L to perform the task after issuing the tsset command:
reg Y L(1/2).X
Here L(1/2).X instructs Stata to use lags 1-2 of the variable X as the regressors; L1 up to L2 refers to lag 1 and lag 2.
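Combining lags of Y with lags of X gives the autoregressive distributed lag (ADL) model mentioned earlier. A sketch of an ADL(2,2):
reg Y L(1/2).Y L(1/2).X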
You first need to install a community-contributed package for the ARDL model by issuing the command:
ssc install ardl
The ardl package will automatically determine the optimum lag length.
ARDL can be used for long-run relationship analysis when the variables are integrated, I(.). The model can produce consistent estimates of the long-run coefficients and helps to study the long-run relationships among time series variables.
In the presence of cointegration the traditional ARDL model is inapplicable, hence the need to account for the long-run relationship: you use an error-correction form of the ARDL model when the variables are cointegrated.
Fit the ARDL in error-correction form:
ardl y x z, ec
Bounds test for the existence of a long-run (levels) relationship:
estat ectest
It is possible for non-stationary time series to move together, following a common stochastic trend. Such time series are said to be cointegrated. If variables X and Y are cointegrated, we cannot simply rely on OLS estimates in levels. We could apply a differenced equation to remove the co-movement, but differencing the series eliminates the history of how Y is pulled into a long-run relationship with X.
If we estimate the variables in levels, our test statistics are unreliable because the variables are
non-stationary.
The appropriate model for a cointegrated case is the error-correction model (ECM) of Hendry and
Sargan
Stata has a suite of commands for fitting, forecasting, interpreting and performing inference on vector error-correction models (VECMs) with cointegrating variables. After fitting a VECM, the irf commands can be used to obtain impulse-response functions (IRFs) and forecast-error variance decompositions (FEVDs).
Select the lag order for the underlying VAR:
varsoc x y
Fit the vector error-correction model:
vec x y
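In practice the number of cointegrating relationships is chosen first with a Johansen-type test, and vec can then be given the lag order and rank explicitly, followed by the irf commands. A sketch (the lag order, rank and file name are illustrative):
vecrank x y, lags(2)          // Johansen tests for the cointegrating rank
vec x y, lags(2) rank(1)      // VECM with 2 lags and 1 cointegrating equation
irf create vecirf, set(vecirf) step(10) replace
irf graph oirf                // orthogonalized impulse-response functions
irf table fevd                // forecast-error variance decompositions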
You can use Engle and Granger's extension of the ADF test.
If X and Y are integrated, that is I(1) or I(2) etc. (integrated of order 1 or order 2), then a two-variable error-correction model is necessary. With two variables there can only be one cointegrating relationship linking the long-run paths. With m variables there can be up to m-1 cointegrating relationships.
Estimate the long-run (levels) relationship:
regress y x
Save the residuals:
predict e, resid
Test the residuals for a unit root (no cointegration under the null):
dfuller e, lags(10)
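If the residual-based test rejects a unit root, the second Engle-Granger step estimates the short-run dynamics with the lagged residual as the error-correction term. A minimal sketch, reusing e from the step above:
regress D.y D.x L.e    // the coefficient on L.e is the speed of adjustment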
Vector autoregressive (VAR) models
These models have been developed to characterize the joint time series of a set (vector) of variables without making the restrictive assumptions that would be needed to identify structural dynamic models. A VAR is a simple generalization of predicting a variable from its own lags: it involves predicting a vector of variables based on lags of all the variables.
y_t = α_0 + α_1 y_{t-1} + β_0 x_t + β_1 x_{t-1} + ε_t^y
x_t = θ_0 + θ_1 x_{t-1} + δ_0 y_t + δ_1 y_{t-1} + ε_t^x
This is called a vector autoregressive model because y is autoregressed on y_{t-1}, in the same way that x is autoregressed on its own lagged values. The variables y_{t-1} and x_{t-1} are the lagged values of y and x respectively.
With VAR models we can carry out Granger causality tests. The null hypothesis is that the lagged values of x do not help to predict y given the presence of lagged values of y, and vice versa.
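A short sketch of fitting the VAR and running the Granger causality tests in Stata (the lag order is illustrative):
var y x, lags(1/2)    // reduced-form VAR with two lags of each variable
vargranger            // Granger causality Wald tests after var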
It is also possible to compute impulse-response functions and variance decompositions. These help to trace how structural shocks affect the variables in our model.
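A minimal sketch of obtaining IRFs and FEVDs after the VAR above (the file name and horizon are illustrative):
irf create varirf, set(varirf) step(8) replace   // save the IRF results to a file
irf graph oirf                                   // orthogonalized impulse responses
irf table fevd                                   // forecast-error variance decompositions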