06 Machine - Learning.Capstone

Time Series Final Project:
Analysis and Forecasting of Light Weight Vehicle Sales Time Series.
IBM Machine Learning Professional Certificate

Course 06: Specialized Models: Time Series and Survival Analysis
By VIKRAM KUMAR
Contents
• Dataset Description
• Main objectives of the analysis.
• EDA, Data Cleaning, Feature Engineering
• Time series Forecasting & ML/DL Analysis and Findings
• Model flaws and advanced steps.
Dataset Description
Dataset Info:
Label: Light Weight Vehicle Sales (LTOTALNSA)
Source: U.S. Bureau of Economic Analysis
Citation: U.S. Bureau of Economic Analysis, Light Weight Vehicle Sales [LTOTALNSA],
retrieved from FRED, Federal Reserve Bank of St. Louis;
https://fred.stlouisfed.org/series/LTOTALNSA, May 23, 2022.
Dataset Description
Importing the dataset into a data frame and describing the attributes.
The dataset consists of two columns:
• DATE
• LTOTALNSA (relabeled to “SALES”)
• DATE: contains monthly dates from 1976 to 2022.
For instance: 1976-04-01 (YYYY-MM-DD)
• SALES: contains monthly light weight vehicle sales in thousands of units.
For instance: 1163.2 thousand vehicles (1,163,200 units) were sold in that month.
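A minimal sketch of the import step, assuming the series was downloaded from FRED as a CSV with the columns DATE and LTOTALNSA. The helper name load_sales is hypothetical, and the inline sample stands in for the real file: its first row uses the example values quoted above, and its second row is a placeholder, not a real observation.

```python
import io

import pandas as pd


def load_sales(csv_source):
    """Read a FRED-style CSV and return the monthly SALES series indexed by date."""
    df = pd.read_csv(csv_source, parse_dates=["DATE"])
    df = df.rename(columns={"LTOTALNSA": "SALES"})  # relabel as described in the slides
    df = df.set_index("DATE").sort_index()
    return df["SALES"]


# Tiny inline stand-in for the downloaded file; the second row is a placeholder value.
sample_csv = io.StringIO("DATE,LTOTALNSA\n1976-04-01,1163.2\n1976-05-01,1000.0\n")
sales = load_sales(sample_csv)

# SALES is expressed in thousands of units, so multiply by 1000 for a unit count.
units_sold = sales.iloc[0] * 1000
```

With the real file, the same function could be called with the path of the downloaded CSV instead of the in-memory sample.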
Main Objective of the analysis:

In this analysis we explore the dataset of monthly Light Weight Vehicle Sales in the
USA from 1976 to 2022 in more detail, applying time series techniques and models in
order to help the owners of light weight vehicle showrooms in the USA draw
conclusions and insights and make the right decisions for their business.
Exploratory Data Analysis
Time Series General Features:
We can extract the following features from the graph:
1. There is no trend; in other words, the series is stationary in trend, which
indicates a constant mean.
2. The variance is constant.
3. The graph does not show a periodic component (no seasonality).
Exploratory Data Analysis


Augmented Dickey-Fuller Test
This is a statistical procedure to discover whether a time series is stationary or not.
We won't go into all the nitty-gritty details, but here's what you need to know:
1. Null hypothesis: the series is nonstationary.
2. Alternative hypothesis: the series is stationary.
Like any statistical test, you should set a significance level or threshold that
determines whether you reject or fail to reject the null.
• The value 0.05 is common but depends upon numerous factors.
Let's see the result in the next slides.

Augmented Dickey-Fuller Test
A brief discussion about the important outputs from the ADF test is in order.
adf = -2.829277524541586
First, adf is the value of the test statistic. The more negative the value, the more confident
we can be that the series is stationary. Here we see a value of -2.83. That may not mean
anything to you just yet, but the p-value should.
pvalue = 0.05421184907221919
The p-value is interpreted like any p-value. Once we set a threshold, we can compare this p-value
to that threshold. Either we reject or fail to reject the null. Here the p-value is about 0.054,
just above the common 0.05 threshold: strictly, we fail to reject the null at the 5% level, but we
can reject it at the 10% level. On that weaker basis we treat the data as a stationary time series.

critical_values = {'1%': -3.442609129942274, '5%': -2.866947348175723, '10%': -2.569649926626197}
Finally, the critical_values variable provides test statistic thresholds for common significance
levels. Here we see that a test statistic of roughly -2.87 or lower is sufficient to reject the null
at a significance level of 5%. Our statistic of -2.83 falls just short of that threshold but is
below the 10% threshold of roughly -2.57.
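The comparison between the test statistic and the critical values can be written out mechanically. The sketch below uses only the numbers reported on the slide. The helper name adf_decisions is hypothetical, and the rule it applies is the standard one: reject the null when the statistic is more negative than the critical value.

```python
def adf_decisions(test_statistic, critical_values):
    """Return, per significance level, whether the nonstationarity null is rejected.

    The null is rejected when the statistic is more negative than the
    critical value for that level.
    """
    return {level: test_statistic < cv for level, cv in critical_values.items()}


# Values reported on the slide.
adf = -2.829277524541586
pvalue = 0.05421184907221919
critical_values = {
    "1%": -3.442609129942274,
    "5%": -2.866947348175723,
    "10%": -2.569649926626197,
}

decisions = adf_decisions(adf, critical_values)
reject_at_5pct_by_pvalue = pvalue < 0.05
```

With these numbers, the null is rejected at the 10% level but not at the 5% or 1% levels, which agrees with a p-value slightly above 0.05.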
Analysis Summary:
White Noise:
A time series can be considered white noise (and so cannot be modeled) if it satisfies
three conditions:
1. Approximately or exactly zero mean over the time series. [not satisfied]
2. Constant standard deviation over the time series. [satisfied]
3. Correlations between the time series and its lags are not statistically
significant. [not satisfied]
Final Decision: The time series is not considered white noise.
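The three white-noise conditions can be checked numerically with rough rules of thumb. The sketch below is an illustration, not the analysis behind the slides. The function names are hypothetical, the 1.96/sqrt(n) band is the usual approximate 95% band for white-noise autocorrelations, and the demo series is synthetic rather than the sales data.

```python
import numpy as np


def autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))


def white_noise_report(x, max_lag=12):
    """Rough checks of zero mean, constant spread, and absence of autocorrelation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    band = 1.96 / np.sqrt(n)                      # approx. 95% band under white noise
    standard_error = x.std(ddof=1) / np.sqrt(n)   # standard error of the sample mean
    first, second = x[: n // 2], x[n // 2:]
    return {
        "zero_mean": bool(abs(x.mean()) < 1.96 * standard_error),
        "std_ratio_halves": float(first.std(ddof=1) / second.std(ddof=1)),
        "significant_lags": [
            k for k in range(1, max_lag + 1) if abs(autocorrelation(x, k)) > band
        ],
    }


# Synthetic stand-in: a 12-step cycle around a level of 1000.
t = np.arange(240)
demo = 1000 + np.sin(2 * np.pi * t / 12)
report = white_noise_report(demo)
```

For this demo series the mean is far from zero and lag 1 is flagged as significant, so it would not be classed as white noise, while the spread ratio between the two halves stays close to 1.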

Stationarity:
For a time series to be stationary, the data must exhibit four properties over time:
1. constant mean [satisfied]
2. constant variance [satisfied]
3. constant autocorrelation structure [satisfied]
4. no periodic component [satisfied]
Final Decision: The time series is considered stationary.
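One informal way to examine the first two properties is to split the series into consecutive chunks and compare their means and variances. The sketch below assumes that approach. The function name chunk_stats and the choice of four chunks are illustrative, and the demo input is a synthetic trend rather than the sales data.

```python
import numpy as np


def chunk_stats(x, n_chunks=4):
    """Mean and sample variance of consecutive, roughly equal chunks of x."""
    x = np.asarray(x, dtype=float)
    chunks = np.array_split(x, n_chunks)
    means = np.array([c.mean() for c in chunks])
    variances = np.array([c.var(ddof=1) for c in chunks])
    return means, variances


# Synthetic upward trend: chunk means drift while chunk variances stay equal.
trend = np.arange(100)
means, variances = chunk_stats(trend)
```

Chunk means that drift steadily, as in this demo, point to a non-constant mean. Similar chunk means and variances are consistent with properties 1 and 2 but do not prove stationarity on their own.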

Time series Forecasting & ML/DL Analysis and Findings


Comparison between smoothing techniques predictions.
As shown in the DataFrame on the right, single and triple exponential smoothing
achieved the best results, while double exponential smoothing gave the worst forecast.
However, all of these forecasting techniques are considered unreliable since they led to
very high error. In the next slides we will use better forecasting models and
techniques.
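To make the comparison concrete, here is a hand-written sketch of single exponential smoothing and of the mean squared error used to score a forecast. It is not the code behind the slides. The function names are hypothetical, the level is initialised with the first observation (one of several common choices), and the input values are placeholders.

```python
def simple_exponential_smoothing(values, alpha):
    """One-step-ahead fitted values using level = alpha * y + (1 - alpha) * level."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]          # initialise the level with the first observation
    fitted = [level]           # fitted[t] is the forecast of values[t] made at t - 1
    for y in values[:-1]:
        level = alpha * y + (1 - alpha) * level
        fitted.append(level)
    # The forecast is flat: the same value for every future step.
    forecast = alpha * values[-1] + (1 - alpha) * level
    return fitted, forecast


def mean_squared_error(actual, predicted):
    """Average of squared differences between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


observations = [10, 12, 14]    # placeholder values, not sales data
fitted, forecast = simple_exponential_smoothing(observations, alpha=0.5)
mse = mean_squared_error(observations, fitted)
```

Double and triple exponential smoothing extend the same recursion with trend and seasonal components. A flat forecast like the one above cannot follow either, which is one reason smoothing errors can be large.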


Using SARIMA model for forecasting.
Some rules to highlight from the Duke ARIMA Guide:
1. If the series has positive autocorrelations out to a high number of lags, then it
probably needs a higher order of differencing.
2. If the lag-1 autocorrelation is zero or negative, or the autocorrelations are all
small and patternless, then the series does not need a higher order of differencing.
If the lag-1 autocorrelation is -0.5 or more negative, the series may be
over-differenced. BEWARE OF OVERDIFFERENCING!!
3. A model with no orders of differencing assumes that the original series is
stationary (mean-reverting). A model with one order of differencing assumes that
the original series has a constant average trend (e.g. a random walk or SES-type
model, with or without growth). A model with two orders of total differencing assumes
that the original series has a time-varying trend (e.g. a random trend or LES-type
model).
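The first two rules can be turned into a small heuristic on the sample autocorrelations. This is only a sketch. The function names, the cut-off of 0.5 for "positive out to a high number of lags", the 2/sqrt(n) band for "small", and the choice of 10 lags are all assumptions made for illustration, and the inputs are synthetic.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelations of x for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    return np.array(
        [np.dot(centered[:-k], centered[k:]) / denom for k in range(1, max_lag + 1)]
    )


def differencing_hint(x, max_lag=10, high=0.5):
    """Apply the rules of thumb; `high` and the 2/sqrt(n) band are illustrative."""
    r = acf(x, max_lag)
    if r[0] <= -0.5:
        return "possibly over-differenced"
    if np.all(r > high):
        return "consider a higher order of differencing"
    if r[0] <= 0 or np.all(np.abs(r) < 2 / np.sqrt(len(x))):
        return "no further differencing needed"
    return "inconclusive"


rng = np.random.default_rng(0)
noise = rng.normal(size=2000)            # synthetic data, not the sales series
random_walk = np.cumsum(noise)           # needs one order of differencing
hint_walk = differencing_hint(random_walk)
hint_twice = differencing_hint(np.diff(noise, n=2))   # white noise differenced twice
```

The random walk is flagged as needing differencing. White noise that has been differenced twice has a lag-1 autocorrelation near -0.67, so it is flagged as possibly over-differenced.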


Time series Forecasting & ML/DL Analysis and Findings
Mean Square Error: 1085655

Notably, the mean squared error of the SARIMA forecast is about half that of the
smoothing forecasts, which is a measurable improvement.
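Because SALES is measured in thousands of units, the mean squared error is expressed in squared thousands. Taking the square root gives an error in the same units as the series, which is easier to read. The variable names below are illustrative, and the only input is the value reported on the slide.

```python
import math

mse_sarima = 1085655                  # mean squared error reported on the slide
rmse_sarima = math.sqrt(mse_sarima)   # root mean squared error, in thousands of units
```

The result is roughly 1041.9 thousand units.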
2- Forecasting using SARIMA model (Seasonal Autoregressive Integrated Moving Average)
2.2 SARIMA model and parameters tuning: SARIMA (p, d, q) (P, D, Q).

After about 5 minutes of searching for the SARIMA parameters that best fit our car
sales data, we got the following results:
Best model: ARIMA(2,0,2)(2,1,1)[12]
Total fit time: 278.712 seconds
AIC: 6543.363639474435

Now we are going to fit a SARIMA model with these parameters and test how well
it forecasts.



Models' Strengths and Flaws:
Generally, time series modeling and forecasting is considered one of the hardest
problems in data science, since it depends on a large amount of analysis and work.
As shown in the previous slides, we tried several techniques and compared them in
terms of forecasting. Forecasting by smoothing provided the worst results, but it
gave us the intuition that the time series can be predicted by more advanced
techniques: the SARIMA model, which provided very good results, and neural
networks, which gave us the best results. At the same time, these models achieved
good results, to some extent, in forecasting future values of the series.
Advanced steps
Further suggestions:
As shown in the previous slides, deep learning models achieved the best
results in modeling and forecasting the time series. We can go further
by using more advanced techniques such as:
• Recurrent neural network (RNN)
• Long short-term memory (LSTM)
I am currently working on these models to find out their strengths and weaknesses;
the results will be uploaded to my GitHub account soon.
https://github.com/AI-MOO/IBM-Machine-Learning-Professional-Certificate

Thank you
IBM Machine Learning Professional Certificate
By: Mohamad Osman
