New Microsoft Word Document4

Uploaded by ADIL NAVEED
Copyright © All Rights Reserved

To identify two linear time series models for the Chinese quarterly GDP data (one with logs and one without logs) using the Box-Jenkins methodology, follow these steps. We'll use Python's statsmodels library for ARIMA modeling and other libraries for data handling and visualization.

### Step-by-Step Process

1. *Load and inspect the data.*
2. *Make the data stationary.*
3. *Identify appropriate ARIMA models using ACF and PACF plots.*
4. *Fit the ARIMA models.*
5. *Validate the models using diagnostics.*

### Python Code Implementation

First, ensure you have the required libraries installed:

```bash
pip install pandas numpy statsmodels matplotlib scikit-learn
```

Now, proceed with the Python code to perform the analysis:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Load the data: 'Date' becomes the index, 'GDP' holds the quarterly values
df = pd.read_csv('GDPChina.csv', index_col='Date', parse_dates=True, decimal='.')
gdp = df['GDP']

# Function to perform the augmented Dickey-Fuller test
def adf_test(series):
    result = adfuller(series)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    print('Critical Values:', result[4])
    # True if the unit-root null hypothesis is rejected at the 5% level
    return result[1] <= 0.05

# Original series
print("ADF test for original series:")
adf_test(gdp)

# Differencing the series to make it stationary
gdp_diff = gdp.diff().dropna()
print("\nADF test for differenced series:")
adf_test(gdp_diff)

# plot_pacf requires fewer lags than half the sample size
lags = min(40, len(gdp_diff) // 2 - 1)

# Plot ACF and PACF for differenced series
fig, ax = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(gdp_diff, lags=lags, ax=ax[0])
plot_pacf(gdp_diff, lags=lags, ax=ax[1])
plt.show()

# Fit ARIMA model without logs (d=1 differences the level series internally)
model = ARIMA(gdp, order=(1, 1, 1))
model_fit = model.fit()
print(model_fit.summary())

# Log transformation
gdp_log = np.log(gdp)

# Differencing the log-transformed series to make it stationary
gdp_log_diff = gdp_log.diff().dropna()
print("\nADF test for log-differenced series:")
adf_test(gdp_log_diff)

# Plot ACF and PACF for log-differenced series
fig, ax = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(gdp_log_diff, lags=lags, ax=ax[0])
plot_pacf(gdp_log_diff, lags=lags, ax=ax[1])
plt.show()

# Fit ARIMA model with logs
model_log = ARIMA(gdp_log, order=(1, 1, 1))
model_log_fit = model_log.fit()
print(model_log_fit.summary())

# Diagnostic plots for models: residual time plot and residual ACF
def diagnostic_plots(residuals, title):
    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    residuals.plot(ax=ax[0], title=f'Residuals of {title}')
    plot_acf(residuals, lags=40, ax=ax[1])
    plt.show()

# With d=1 the first residual is typically close to the first observation
# itself, so it is dropped to keep it from dominating the plots
print("\nDiagnostics for ARIMA(1,1,1) without logs:")
diagnostic_plots(model_fit.resid.iloc[1:], "ARIMA(1,1,1) without logs")
print("\nDiagnostics for ARIMA(1,1,1) with logs:")
diagnostic_plots(model_log_fit.resid.iloc[1:], "ARIMA(1,1,1) with logs")
```

### Explanation:

1. *Load Data*: The data is loaded using pandas.
2. *ADF Test*: The augmented Dickey-Fuller (ADF) test checks for stationarity; its null hypothesis is that the series has a unit root. If the p-value is above 0.05, the null cannot be rejected, so the series is treated as non-stationary and differenced.
3. *Differencing*: Apply first differencing to make the series stationary.
4. *ACF and PACF Plots*: These plots help identify the orders \(p\) (mainly from the PACF) and \(q\) (mainly from the ACF) of the ARIMA model.
5. *ARIMA Models*: Fit an ARIMA(1,1,1) model to both the original and the log-transformed series.
6. *Diagnostics*: Check that the residuals behave like white noise (no remaining autocorrelation) to validate the models.

### Results:

- *Without logs*: an ARIMA(1,1,1) model is identified, i.e. an ARMA(1,1) for the first-differenced GDP series.
- *With logs*: after a log transformation and first differencing, an ARIMA(1,1,1) model is again identified.
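The log model works with differences of log GDP, which are approximately quarter-on-quarter growth rates, since \(\ln y_t - \ln y_{t-1} = \ln(1 + g_t) \approx g_t\) for small growth \(g_t\). A quick numeric check with made-up GDP levels (illustrative values, not taken from GDPChina.csv):

```python
import numpy as np

# Hypothetical quarterly GDP levels (illustrative only)
gdp = np.array([100.0, 102.0, 104.04, 107.16])

growth = gdp[1:] / gdp[:-1] - 1    # exact quarter-on-quarter growth rates
log_diff = np.diff(np.log(gdp))    # first difference of log GDP

print(np.round(growth, 4))
print(np.round(log_diff, 4))
```

The two arrays differ only in the fourth decimal place here, which is why a model for the log-differenced series can be read as a model for GDP growth.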

These steps and code outline the process of fitting and validating ARIMA models for the Chinese quarterly GDP data using the Box-Jenkins methodology. Adjust the model orders based on the ACF/PACF plots and the diagnostics if necessary.

For the models identified in the previous step, hold out, for example, the last 24 actual values to compare the two models in terms of forecasting (an out-of-sample forecasting exercise). What is the best model, and why is this your choice?

To compare the forecasting performance of the identified models, we'll proceed with an out-of-sample forecasting exercise. We'll leave out the last 24 observations from the dataset, fit both models on the training data (the data excluding the last 24 observations), and then compare the forecasted values with the actual values.

Here's the detailed step-by-step process and the Python code for performing this analysis:

### Step-by-Step Process


1. *Split the data into training and test sets.*
2. *Fit the identified models on the training set.*
3. *Generate forecasts for the test period using both models.*
4. *Compare the forecasted values with the actual values.*
5. *Evaluate model performance using appropriate metrics (e.g., RMSE, MAE).*

### Python Code Implementation

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Load the data
df = pd.read_csv('GDPChina.csv', index_col='Date', parse_dates=True, decimal='.')
gdp = df['GDP']

# Split the data chronologically into training and test sets
h = 24  # number of held-out quarters
train_size = len(gdp) - h
train, test = gdp[:train_size], gdp[train_size:]

# Fit ARIMA model without logs on training data
model = ARIMA(train, order=(1, 1, 1))
model_fit = model.fit()

# Forecast h steps ahead from the end of the training data
forecast = model_fit.forecast(steps=h)

# Log transformation of the training data
train_log = np.log(train)

# Fit ARIMA model with logs on log-transformed training data
model_log = ARIMA(train_log, order=(1, 1, 1))
model_log_fit = model_log.fit()

# Forecast the test period with the log model, then undo the log
# transformation so both forecasts are in GDP units
forecast_log = np.exp(model_log_fit.forecast(steps=h))

# Evaluate model performance (compare values, not index labels)
rmse_no_log = np.sqrt(mean_squared_error(test.values, forecast.values))
mae_no_log = mean_absolute_error(test.values, forecast.values)
rmse_log = np.sqrt(mean_squared_error(test.values, forecast_log.values))
mae_log = mean_absolute_error(test.values, forecast_log.values)

print("ARIMA(1,1,1) without logs:")
print(f"RMSE: {rmse_no_log}")
print(f"MAE: {mae_no_log}")

print("\nARIMA(1,1,1) with logs:")
print(f"RMSE: {rmse_log}")
print(f"MAE: {mae_log}")

# Plot the forecasts vs actual values
plt.figure(figsize=(12, 6))
plt.plot(test.index, test.values, label='Actual')
plt.plot(test.index, forecast.values, label='Forecast without logs')
plt.plot(test.index, forecast_log.values, label='Forecast with logs')
plt.legend()
plt.title('Forecast vs Actuals')
plt.show()
```

### Explanation:

1. *Data Splitting*: The dataset is split chronologically into training and test sets, leaving the last 24 observations for testing.
2. *Model Fitting*: Both ARIMA(1,1,1) models (one without logs and one with logs) are fitted on the training set.
3. *Forecasting*: Forecasts for the test period are generated from both models. For the log-transformed model, the forecasts are transformed back using the exponential function; with normally distributed errors in logs, this gives a median rather than a mean forecast of GDP.
4. *Performance Evaluation*: The forecasts are compared with the actual values using RMSE (root mean squared error) and MAE (mean absolute error).
5. *Plotting*: A plot is generated to show the actual values against the forecasts from both models.
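For reference, the two metrics can be written as follows, with \(T\) the last training observation, \(h = 24\) the number of held-out quarters, \(y_t\) the actual values and \(\hat{y}_t\) the forecasts (for the log model, \(\hat{y}_t\) is the exponentiated forecast, so both models are scored in GDP units):

\[
\mathrm{RMSE} = \sqrt{\frac{1}{h}\sum_{t=T+1}^{T+h}\left(y_t - \hat{y}_t\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{h}\sum_{t=T+1}^{T+h}\left|y_t - \hat{y}_t\right|
\]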

### Results and Conclusion:

- *Without logs*:
  - RMSE: (value printed by the code)
  - MAE: (value printed by the code)

- *With logs*:
  - RMSE: (value printed by the code)
  - MAE: (value printed by the code)

The model with the lower RMSE and MAE on the 24 held-out quarters is considered the better forecaster.
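The two metrics do not always agree, because RMSE squares the errors and therefore penalizes a single large miss more than MAE does. A hand-checkable example with made-up numbers (not results from the GDP data): forecast A misses by 3 in both quarters, while forecast B is exact once and misses by 5 once.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean squared error."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(actual, forecast):
    """Mean absolute error."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(err)))


actual = [100.0, 110.0]
forecast_a = [97.0, 113.0]   # errors: +3, -3
forecast_b = [100.0, 105.0]  # errors:  0, +5

print(f"A: RMSE={rmse(actual, forecast_a):.3f}, MAE={mae(actual, forecast_a):.3f}")
print(f"B: RMSE={rmse(actual, forecast_b):.3f}, MAE={mae(actual, forecast_b):.3f}")
```

Here MAE prefers B (2.5 vs 3.0) while RMSE prefers A (3.0 vs about 3.54). If the GDP models split like this, the choice depends on whether occasional large forecast errors should count more heavily.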

### Choosing the Best Model:

- The *ARIMA(1,1,1) model with logs* often performs better for a series such as GDP that grows roughly exponentially: the log transformation turns multiplicative growth into an additive trend and stabilizes the variance, and its first difference is approximately the quarterly growth rate.
- However, the final choice depends on the RMSE and MAE values actually obtained: the model with the lower values has the better out-of-sample forecasting accuracy.
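The variance-stabilization argument can be illustrated on a synthetic series (an assumption, not the GDP data) that grows about 2% per quarter with 1% multiplicative noise. The spread of the level differences grows with the level of the series, while the spread of the log differences stays roughly constant.

```python
import numpy as np

# Synthetic series (assumption): ~2% growth per period, 1% multiplicative noise
rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
y = 100.0 * np.exp(0.02 * t + 0.01 * rng.normal(size=n))

level_diff = np.diff(y)          # differences in levels
log_diff = np.diff(np.log(y))    # differences in logs (approx. growth rates)

# Compare the standard deviation in the second half with the first half
half = len(level_diff) // 2
ratio_level = level_diff[half:].std() / level_diff[:half].std()
ratio_log = log_diff[half:].std() / log_diff[:half].std()

print(f"std ratio, level differences (2nd half / 1st half): {ratio_level:.2f}")
print(f"std ratio, log differences   (2nd half / 1st half): {ratio_log:.2f}")
```

A ratio well above 1 for level differences signals the growing variance that a constant-variance ARIMA model in levels does not capture; whether the actual GDP series behaves this way is what the out-of-sample comparison should confirm.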

Evaluate the RMSE and MAE values from the output to determine which model provides the best forecasts for the Chinese quarterly GDP data.
