06 Machine Learning Capstone


Time Series Final Project:

Analysis and Forecasting of Light Weight Vehicle Sales Time Series.

IBM Machine Learning Professional Certificate


Course 06: Specialized Models: Time Series and Survival Analysis

By VIKRAM KUMAR
Contents
• Dataset Description
• Main objectives of the analysis
• EDA, Data Cleaning, Feature Engineering
• Time series Forecasting & ML/DL Analysis and Findings
• Models' strengths and flaws, and advanced steps

Dataset Description
Dataset Info:

Label: Light Weight Vehicle Sales (LTOTALNSA)

Source: U.S. Bureau of Economic Analysis

Citation: U.S. Bureau of Economic Analysis, Light Weight Vehicle Sales [LTOTALNSA], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/LTOTALNSA, May 23, 2022.

Importing the dataset into a data frame and describing the attributes. The dataset consists of two columns:

• DATE
• LTOTALNSA (relabeled to “SALES”)

• DATE: contains monthly dates from 1976 to 2022.
For instance: 1976-04-01 (YYYY-MM-DD)

• SALES: contains monthly lightweight vehicle sales in thousands of units.
For instance: a value of 1163.2 means 1,163.2 thousand vehicles (1,163,200 units) were sold in that month.
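A minimal pandas sketch of this loading step, assuming the data was saved as a CSV with the column names DATE and LTOTALNSA. The inline rows are placeholders rather than real FRED observations, and `load_sales` is a hypothetical helper name; in practice you would pass the path of the downloaded file instead of the in-memory buffer.

```python
import io

import pandas as pd

# Placeholder rows in the two-column layout described above (DATE, LTOTALNSA).
# The numbers are illustrative only, not actual FRED observations.
RAW_CSV = """DATE,LTOTALNSA
1976-01-01,900.0
1976-02-01,1000.0
1976-03-01,1163.2
"""


def load_sales(source) -> pd.DataFrame:
    """Read the CSV, index it by month, and relabel LTOTALNSA as SALES."""
    df = pd.read_csv(source, parse_dates=["DATE"], index_col="DATE")
    df = df.rename(columns={"LTOTALNSA": "SALES"})
    # Declare a month-start frequency so time series models see a regular index.
    df = df.asfreq("MS")
    # SALES is in thousands of units; derive the raw unit count for reference.
    df["UNITS"] = df["SALES"] * 1000
    return df


sales = load_sales(io.StringIO(RAW_CSV))
print(sales)
```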

Main Objective of the analysis:


In this analysis we explore the dataset of monthly Lightweight Vehicle Sales in the USA from 1976 to 2022 in more detail, applying time series techniques and models to help owners of lightweight vehicle showrooms in the USA draw conclusions and insights and make the right decisions for their business.
Exploratory Data Analysis
Time Series General Features:
We can extract the following features from the graph:

1. There is no trend; in other words, the trend is stationary, which indicates a constant mean.
2. The variance is constant.
3. The graph does not show a periodic component (no seasonality).
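The constant-mean and constant-variance readings above come from eyeballing the plot. One common way to back them up is to compute rolling statistics: if the rolling mean and rolling standard deviation stay roughly flat, the visual reading holds. A sketch with pandas, using a synthetic stand-in series and an assumed 12-month window; `rolling_stats` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def rolling_stats(series: pd.Series, window: int = 12) -> pd.DataFrame:
    """Rolling mean and standard deviation over a trailing window (12 = one year of months)."""
    return pd.DataFrame({
        "rolling_mean": series.rolling(window).mean(),
        "rolling_std": series.rolling(window).std(),
    })


# Synthetic stand-in for SALES: a constant level plus noise (not the real data).
rng = np.random.default_rng(0)
idx = pd.date_range("1976-01-01", periods=120, freq="MS")
demo = pd.Series(1000 + rng.normal(0, 50, len(idx)), index=idx)

stats = rolling_stats(demo)
# The first window-1 rows are NaN because the window is not yet full.
print(stats.dropna().describe())
```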
Exploratory Data Analysis
Augmented Dickey-Fuller Test

This is a statistical procedure to determine whether a time series is stationary.

We won't go into all the nitty-gritty details, but here's what you need to know:
1. Null hypothesis: the series is nonstationary.
2. Alternative hypothesis: the series is stationary.

Like any statistical test, you should set a significance level (threshold) that determines whether you reject or fail to reject the null.

• The value 0.05 is common, but the right choice depends on numerous factors.

Let's see the result in the next slides.


A brief discussion of the important outputs from the ADF test is in order.

adf = -2.829277524541586

First, adf is the value of the test statistic. The more negative the value, the more confident we can be that the series is stationary. Here we see a value of -2.83. That may not mean anything to you just yet, but the p-value should.

pvalue = 0.05421184907221919

The p-value is interpreted like any p-value: once we set a threshold, we compare the p-value to it and either reject or fail to reject the null. Here the p-value is 0.054, which is close to, but slightly above, the common 0.05 threshold. At the 5% level we therefore narrowly fail to reject the null that the data is nonstationary; at the 10% level we would reject it and treat the series as stationary.

critical_values = {'1%': -3.442609129942274, '5%': -2.866947348175723, '10%': -2.569649926626197}

Finally, the critical_values variable provides test statistic thresholds for common significance levels. At the 5% level, a test statistic of roughly -2.87 or lower is needed to reject the null. Our statistic of -2.83 does not cross that threshold, although it does cross the 10% threshold of -2.57, which is consistent with the p-value of 0.054.
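The comparison against the critical values can be made explicit in a few lines of plain Python, plugging in the numbers reported above. This small sketch shows that the two readings agree: both the statistic and the p-value reject the null at 10% but not at 5%. `adf_decisions` is a hypothetical helper name.

```python
# Values reported in the ADF output above.
adf = -2.829277524541586
pvalue = 0.05421184907221919
critical_values = {"1%": -3.442609129942274, "5%": -2.866947348175723, "10%": -2.569649926626197}


def adf_decisions(stat, crit):
    """For each level, reject H0 (nonstationary) when the statistic is below the critical value."""
    return {level: stat < threshold for level, threshold in crit.items()}


decisions = adf_decisions(adf, critical_values)
print(decisions)       # {'1%': False, '5%': False, '10%': True}
print(pvalue < 0.05)   # False
print(pvalue < 0.10)   # True
```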

Analysis Summary:
White Noise:
A time series can be considered white noise (it cannot be modeled) if it satisfies three conditions:
1. Approximately or exactly zero mean over the time series. [not satisfied]
2. Constant standard deviation over the time series. [satisfied]
3. Correlations between the time series and its lags are not statistically significant. [not satisfied]
Final Decision: the time series is not considered white noise.
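The three conditions can also be checked numerically. Below is a sketch with NumPy and SciPy that uses a Ljung-Box test for condition 3 and simple rules of thumb for conditions 1 and 2; those thresholds are illustrative assumptions (labeled in the comments), not the ones used in the original analysis. The helper names are hypothetical, and the input is a synthetic AR(1) series, not the sales data.

```python
import numpy as np
from scipy.stats import chi2


def sample_acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags, normalized by the lag-0 sum of squares."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(d, d)
    return np.array([np.dot(d[:-k], d[k:]) / denom for k in range(1, nlags + 1)])


def ljung_box_pvalue(x, lags=12):
    """Ljung-Box Q test for H0: no autocorrelation up to `lags`."""
    n = len(x)
    r = sample_acf(x, lags)
    q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, lags + 1)))
    return chi2.sf(q, df=lags)


def white_noise_checks(x, lags=12, alpha=0.05):
    x = np.asarray(x, dtype=float)
    n = len(x)
    half = n // 2
    # Condition 1: mean within two standard errors of zero (assumed rule of thumb).
    zero_mean = abs(x.mean()) < 2 * x.std(ddof=1) / np.sqrt(n)
    # Condition 2: the two halves have similar spread (assumed 1.5x cutoff).
    s1, s2 = x[:half].std(ddof=1), x[half:].std(ddof=1)
    constant_std = max(s1, s2) / min(s1, s2) < 1.5
    # Condition 3: no significant autocorrelation according to Ljung-Box.
    no_autocorr = ljung_box_pvalue(x, lags) > alpha
    return {"zero_mean": bool(zero_mean), "constant_std": bool(constant_std),
            "no_autocorr": bool(no_autocorr)}


# Synthetic AR(1) series around a level of 1000: autocorrelated, with a non-zero mean.
rng = np.random.default_rng(1)
e = rng.normal(0, 50, 500)
y = np.empty(500)
y[0] = e[0]
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + e[t]

checks = white_noise_checks(1000 + y)
print(checks)
```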

Stationarity:
For time series data to be stationary, it must exhibit four properties over time:
1. Constant mean [satisfied]
2. Constant variance [satisfied]
3. Constant autocorrelation structure [satisfied]
4. No periodic component [satisfied]
Final Decision: the time series is considered stationary.
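Property 4 (no periodic component) can be probed with a month-of-year profile: average the series by calendar month and see whether some months are systematically higher than others. A pandas sketch on a synthetic series that does contain a yearly cycle, so the spread is clearly visible; `month_profile` is a hypothetical helper, and what counts as a "large" spread is a judgment call.

```python
import numpy as np
import pandas as pd


def month_profile(series: pd.Series) -> pd.Series:
    """Average value for each calendar month (1 = January ... 12 = December)."""
    return series.groupby(series.index.month).mean()


# Synthetic monthly series with a built-in yearly cycle (not the real data).
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
months = idx.month.to_numpy()
demo = pd.Series(1000 + 100 * np.sin(2 * np.pi * (months - 1) / 12), index=idx)

profile = month_profile(demo)
# A spread between calendar months that is large relative to the noise hints at seasonality.
print(profile.max() - profile.min())
```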

Time series Forecasting & ML/DL Analysis and Findings

Comparison between the smoothing techniques' predictions.

As shown in the DataFrame on the right, single and triple exponential smoothing achieved the best results, while double exponential smoothing produced the worst forecast. However, all of these forecasting techniques are considered unreliable because they led to very high errors. In the next slides we will use better forecasting models and techniques.

Using a SARIMA model for forecasting.
Some rules to highlight from the Duke ARIMA Guide:

1. If the series has positive autocorrelations out to a high number of lags, then it probably needs a higher order of differencing.

2. If the lag-1 autocorrelation is zero or negative, or the autocorrelations are all small and patternless, then the series does not need a higher order of differencing. If the lag-1 autocorrelation is -0.5 or more negative, the series may be over-differenced. BEWARE OF OVERDIFFERENCING!!

3. A model with no orders of differencing assumes that the original series is stationary (mean-reverting). A model with one order of differencing assumes that the original series has a constant average trend (e.g. a random walk or SES-type model, with or without growth). A model with two orders of total differencing assumes that the original series has a time-varying trend (e.g. a random trend or LES-type model).
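Rules 1 and 2 can be applied mechanically by computing the lag-1 autocorrelation after each round of differencing. A NumPy sketch on synthetic data (a random walk and white noise); the helper names are hypothetical, and the -0.5 cutoff is the guide's rule of thumb rather than a formal test.

```python
import numpy as np


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(d[:-1], d[1:]) / np.dot(d, d)


def lag1_by_differencing(x, max_d=2):
    """Lag-1 autocorrelation after differencing 0..max_d times (n=0 returns x unchanged)."""
    return {d: lag1_autocorr(np.diff(x, n=d)) for d in range(max_d + 1)}


rng = np.random.default_rng(7)
noise = rng.normal(size=2000)
walk = np.cumsum(noise)  # random walk: needs exactly one difference

for name, series in [("random walk", walk), ("white noise", noise)]:
    for d, r1 in lag1_by_differencing(series).items():
        flag = "possible over-differencing" if r1 <= -0.5 else ""
        print(f"{name:12s} d={d}  r1={r1:+.2f}  {flag}")
```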

Mean Squared Error: 1085655
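Because the MSE is expressed in squared units, taking its square root (the RMSE) puts the error back into the units of the data. Assuming the figure above was computed on SALES in thousands of units, the worked conversion is:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{1085655} \approx 1042
```

Under that assumption, this corresponds to a typical forecast error of roughly 1,042 thousand vehicles per month.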

Notably, the mean squared error of the SARIMA forecast is about half that of the smoothing forecasts, which is a measurable improvement.

2- Forecasting using the SARIMA model (Seasonal AutoRegressive Integrated Moving Average)
2.2 SARIMA model and parameter tuning: SARIMA(p, d, q)(P, D, Q)[m], where m is the seasonal period (12 for monthly data).

After about 5 minutes of searching for the best SARIMA parameters for our car sales data, we got the following results:

Best model: ARIMA(2,0,2)(2,1,1)[12]

Total fit time: 278.712 seconds
AIC: 6543.363639474435

Now we fit a SARIMA model with these parameters and test whether it produces accurate forecasts.

Models' Strengths and Flaws:

Generally, time series modeling and forecasting is considered one of the hardest problems in data science, since it requires a large amount of analysis and work. As shown in the previous slides, we applied several techniques and compared them in terms of forecasting. Forecasting by smoothing provided the worst results, but it also gave us the intuition that the time series can be predicted by more advanced techniques, such as the SARIMA model, which provided very good results, and neural networks, which gave us the best results. Overall, these models achieved good results, to some extent, in forecasting future values of the series.

Advanced steps
Further suggestions:

As shown in the previous slides, deep learning models achieved the best results in modeling and forecasting the time series, and we can push the results further by using more advanced techniques such as:

• Recurrent neural networks (RNN)
• Long short-term memory (LSTM)

I am currently working on these models to find out their strengths and weaknesses, and they will be uploaded to my GitHub account soon.

https://github.com/AI-MOO/IBM-Machine-Learning-Professional-Certificate
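Whichever recurrent architecture is used, the first step is usually the same: reshape the monthly series into sliding windows of past values, each paired with the next value. A framework-agnostic NumPy sketch; `make_windows`, the 12-month lookback, and the one-step horizon are illustrative assumptions, and the output follows the (samples, timesteps, features) layout that Keras recurrent layers expect.

```python
import numpy as np


def make_windows(series, lookback=12, horizon=1):
    """Turn a 1-D series into (samples, lookback, 1) inputs and (samples, horizon) targets."""
    x = np.asarray(series, dtype=np.float32)
    n = len(x) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for this lookback/horizon")
    X = np.stack([x[i : i + lookback] for i in range(n)])
    y = np.stack([x[i + lookback : i + lookback + horizon] for i in range(n)])
    # Recurrent layers typically expect a trailing feature axis: (samples, timesteps, features).
    return X[..., np.newaxis], y


X, y = make_windows(np.arange(30), lookback=12, horizon=1)
print(X.shape, y.shape)  # (18, 12, 1) (18, 1)
```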


Thank you
IBM Machine Learning Professional Certificate
Specialized Models: Time Series and Survival Analysis

By: Mohamad Osman
