06 Machine - Learning.Capstone
By VIKRAM KUMAR
Contents
• Dataset Description
• Main objectives of the analysis.
• EDA, Data Cleaning, Feature Engineering
• Time Series Forecasting & ML/DL Analysis and Findings
• Model flaws and advanced steps
Specialized Models: Time Series and Survival Analysis
Dataset Description
Dataset Info:
Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/LTOTALNSA,
May 23, 2022.
Importing the dataset into a data frame and describing the attributes.
The dataset consists of two columns:
• DATE
• LTOTALNSA (relabeled to “SALES”)
SALES contains monthly light weight vehicle sales in thousands of units.
For instance, a SALES value of 1163.2 means that 1163.2 thousand vehicles (1,163,200 units) were sold in that month.
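The import and relabeling step can be sketched as follows. This is a minimal illustration, assuming pandas is used for the data frame: the inline CSV text stands in for the file downloaded from FRED, the date in it is a hypothetical placeholder, and the UNITS column is an extra added here only to show the unit conversion.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for the FRED download. The date is
# hypothetical; 1163.2 is the example value quoted in the slide above.
csv_text = "DATE,LTOTALNSA\n2000-01-01,1163.2\n"

# Parse DATE as a datetime index so the frame behaves as a time series.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"], index_col="DATE")

# Relabel the FRED series code to a readable column name.
df = df.rename(columns={"LTOTALNSA": "SALES"})

# SALES is in thousands of units; convert to individual vehicles.
# (UNITS is an illustrative column, not part of the original dataset.)
df["UNITS"] = (df["SALES"] * 1000).round().astype(int)

print(df)
```

Rounding before the integer cast avoids off-by-one results caused by floating-point representation of values such as 1163.2.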
Exploratory Data Analysis
Time Series General Features:
We can extract the following features from the graph:
We won't go into all the details of the augmented Dickey-Fuller (ADF) test, but here is what you need to know:
1. Null hypothesis: the series is nonstationary.
2. Alternative hypothesis: the series is stationary.
As with any statistical test, set a significance level (threshold) in advance; it determines whether you reject or fail to reject the null.
A brief discussion of the important outputs from the ADF test is in order.
adf = -2.829277524541586
First, adf is the value of the test statistic. The more negative the value, the stronger the evidence that the series is stationary. Here the value is about -2.83. That may not mean anything to you just yet, but the p-value should.
p-value = 0.05421184907221919
The p-value is interpreted like any other p-value: once a threshold is set, we compare the p-value to it and either reject or fail to reject the null. Here the p-value is about 0.054, slightly above the usual 0.05 threshold, so at the 5% level we fail to reject the null that the series is nonstationary. The result is borderline: at a 10% level the null would be rejected, so the evidence for stationarity is weak rather than conclusive.
Finally, the critical_values variable provides test statistic thresholds for common significance levels. A test statistic of roughly -2.86 or lower is sufficient to reject the null at the 5% significance level; the observed -2.83 falls just short of that.
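The decision rule can be sketched in a few lines, using the test statistic and p-value reported above. The helper name adf_decision and the set of significance levels tried are illustrative assumptions, not part of the original analysis.

```python
# Values reported for the SALES series in the slide above.
adf_stat = -2.829277524541586
p_value = 0.05421184907221919


def adf_decision(p_value: float, alpha: float) -> str:
    """Return the ADF decision for a given significance level.

    The null hypothesis (H0) is that the series is nonstationary,
    so a p-value below alpha rejects nonstationarity.
    """
    if p_value < alpha:
        return "reject H0: evidence that the series is stationary"
    return "fail to reject H0: nonstationarity cannot be ruled out"


# Compare the same p-value against several common thresholds.
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}: {adf_decision(p_value, alpha)}")
```

With a p-value of about 0.054, the outcome depends on the threshold chosen beforehand, which is why the significance level should be fixed before looking at the result.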
Analysis Summary:
White Noise:
A time series can be considered white noise (and therefore can’t be modeled) if it satisfies three conditions:
1. Zero or approximately zero mean over the time series. [not satisfied]
2. Constant standard deviation over the time series. [satisfied]
3. Correlations between the time series and its lags are not statistically significant. [not satisfied]
Final Decision: the time series is not considered white noise.
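The three conditions can be checked roughly as in the sketch below. Everything here is an assumption made for illustration: the function names, the crude thresholds (two standard errors for the mean, a 0.5 to 2.0 ratio between the standard deviations of the two halves, and the approximate 95% band of ±1.96/√n for autocorrelations), and the synthetic trending series, which is not the SALES data.

```python
import math
import statistics


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


def white_noise_checks(x, max_lag=12):
    """Return (zero_mean, constant_sd, uncorrelated) as rough booleans."""
    n = len(x)
    sd = statistics.stdev(x)

    # 1. Mean within about two standard errors of zero.
    zero_mean = abs(statistics.fmean(x)) <= 2 * sd / math.sqrt(n)

    # 2. Similar spread in the first and second halves of the series.
    ratio = statistics.stdev(x[: n // 2]) / statistics.stdev(x[n // 2:])
    constant_sd = 0.5 <= ratio <= 2.0

    # 3. No autocorrelation outside the approximate 95% band.
    band = 1.96 / math.sqrt(n)
    uncorrelated = all(
        abs(autocorr(x, k)) <= band for k in range(1, max_lag + 1)
    )
    return zero_mean, constant_sd, uncorrelated


# Synthetic series with a level and a trend (illustrative only).
series = [1000.0 + 2.0 * t for t in range(120)]
print(white_noise_checks(series))
```

A series fails to be white noise as soon as any one of the three checks fails, so a single False is enough to justify modeling it.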
1. If the series has positive autocorrelations out to a high number of lags, then it
probably needs a higher order of differencing.
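The differencing rule above can be illustrated with a small sketch. The series is synthetic (a linear trend plus a 12-step cycle, not the SALES data), and the helper names and the 24-lag window are assumptions chosen for the example.

```python
import math


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


def difference(x, lag=1):
    """Difference the series: x[t] - x[t - lag]."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def leading_positive_lags(x, max_lag=24):
    """Count consecutive lags, starting at 1, with positive autocorrelation."""
    count = 0
    for k in range(1, max_lag + 1):
        if autocorr(x, k) <= 0:
            break
        count += 1
    return count


# Synthetic series: linear trend plus a 12-step cycle (illustrative only).
series = [2.0 * t + 10.0 * math.sin(2 * math.pi * t / 12) for t in range(120)]
diffed = difference(series)

print("positive lags before differencing:", leading_positive_lags(series))
print("positive lags after differencing: ", leading_positive_lags(diffed))
```

In this example the trend keeps the autocorrelations positive across the whole window, while one round of differencing removes the trend and the run of positive lags becomes short.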
2- Forecasting using SARIMA model (Seasonal AutoRegressive Integrated Moving Average)
2.2 SARIMA model and parameter tuning: SARIMA (p, d, q) (P, D, Q).
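Parameter tuning is often done by enumerating candidate orders and comparing the fitted models. The sketch below only builds the candidate grid; the value ranges and the seasonal period m = 12 (assumed here because the data is monthly) are illustrative assumptions, and the slides do not state which search space was actually used.

```python
from itertools import product


def sarima_grid(p_vals, d_vals, q_vals, P_vals, D_vals, Q_vals, m):
    """Yield (order, seasonal_order) pairs for every parameter combination."""
    for p, d, q, P, D, Q in product(
        p_vals, d_vals, q_vals, P_vals, D_vals, Q_vals
    ):
        yield (p, d, q), (P, D, Q, m)


# Small illustrative search space: AR and MA orders of 0 or 1,
# one regular and one seasonal difference, seasonal period of 12 months.
candidates = list(
    sarima_grid(range(2), [1], range(2), range(2), [1], range(2), m=12)
)

for order, seasonal_order in candidates[:3]:
    print(order, seasonal_order)
print("number of candidate models:", len(candidates))
```

Each pair could then be passed to a fitting routine, for example the order and seasonal_order arguments of statsmodels' SARIMAX, and the candidates compared with a criterion such as AIC.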
Advanced steps
Further suggestions:
As shown in the previous slides, deep learning models achieved the best results in modeling and forecasting the time series, and the results could be pushed further by using more advanced techniques such as:
I am currently working on these models to find out their strengths and weaknesses, and the results will be uploaded to my GitHub account soon.
https://github.com/AI-MOO/IBM-Machine-Learning-Professional-Certificate