Analytics Roadmap
ABC is a retail chain with hundreds of shops all over the country.
Its annual business plan sets two main objectives, focused on
Sales Performance and Stock Management.
• The Marketing team believes that sales performance might increase
by offering cross-products at discounted rates.
• At the same time, the Operations Office aims to decrease the cost
of expired goods.
Business Issue Understanding
Product data
Sales history by products
Stock data
Expiration dates of products
Example:
Product data >> ERP Database Definitions
Sales history by products >> Sales Transaction Data
Stock data >> Stock Management Data
Expiration dates of products >> ERP Database
Data Preparation
• Data Integration
• Data Manipulation
• Missing Values Handling
• Feature Selection
• Feature Generation
• Dimensionality Reduction
• Outlier Removal
• Normalization
• Data Manipulation
Restructuring data so that it is ready for model building
Ex:
True / False values >> 1 / 0
Time / date handling
Product types >> Factor
Tools:
R packages: dplyr, reshape2, tidyr, data.table, …
Python: numpy, pandas
Other tools: SQL, Spark, Alteryx, Knime, Azure
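The three manipulation examples above can be sketched with pandas. This is only an illustration: the data frame, its column names and its values are hypothetical and do not come from ABC's systems.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "product_id": [101, 102, 103],
    "on_discount": [True, False, True],
    "sale_date": ["2018-01-15", "2018-02-03", "2018-02-20"],
    "product_type": ["dairy", "bakery", "dairy"],
})

# True / False values >> 1 / 0
df["on_discount"] = df["on_discount"].astype(int)

# Time / date handling: parse the strings, then derive the month
df["sale_date"] = pd.to_datetime(df["sale_date"])
df["sale_month"] = df["sale_date"].dt.month

# Product types >> categorical (the pandas counterpart of an R factor)
df["product_type"] = df["product_type"].astype("category")
```

In R, the same steps would typically use `as.integer()`, `as.Date()` and `factor()`.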
• Missing Value Handling
  • Finding the missing values
  • Deciding how to treat them
    • Delete the record
    • Fill manually
    • Fill with the mean, the median, the previous value, or the next value
    • Fill with a model (regression, decision tree, etc.)
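A minimal pandas sketch of the simpler treatment options listed above. The series and its values are made up for illustration; model-based filling is left out because it depends on the chosen model.

```python
import pandas as pd

# Hypothetical sales series with gaps (values are illustrative only).
sales = pd.Series([10.0, None, 14.0, None, 18.0], name="units_sold")

# Finding the missing values
n_missing = sales.isna().sum()

# Option: delete the records with missing values
dropped = sales.dropna()

# Option: fill with a summary statistic
filled_mean = sales.fillna(sales.mean())
filled_median = sales.fillna(sales.median())

# Option: fill with the previous value, or with the next value
filled_prev = sales.ffill()
filled_next = sales.bfill()
```

Which option is appropriate depends on why the values are missing; deleting records is the simplest choice but discards information.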
• Feature Selection
  • Which features should be included in the model?
  • Eliminating features with a high ratio of missing values
    • If more than 40% of the values are missing, the feature could be excluded.
  • Deselecting features that are highly correlated or represent the same phenomenon
    • Ex: Temperature: one column in degrees Celsius, another in degrees Fahrenheit
    • Ex: Date of birth and age
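Both selection rules can be sketched in pandas as follows. The table is hypothetical, and the 0.95 correlation cutoff is an assumption chosen for the example; only the 40% missing-value threshold comes from the text.

```python
import pandas as pd

# Hypothetical feature table; names and values are illustrative only.
df = pd.DataFrame({
    "temp_celsius": [0.0, 10.0, 20.0, 30.0, 40.0],
    "temp_fahrenheit": [32.0, 50.0, 68.0, 86.0, 104.0],
    "units_sold": [5.0, 9.0, 4.0, 8.0, 6.0],
    "promo_code": [None, None, None, "A", "B"],  # 60% missing
})

# 1) Exclude features with more than 40% missing values.
missing_ratio = df.isna().mean()
df = df.loc[:, missing_ratio <= 0.40]

# 2) Drop one feature from each highly correlated pair.
#    The 0.95 cutoff is an assumption, not a value from the slides.
corr = df.corr().abs()
cols = list(corr.columns)
to_drop = set()
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if a not in to_drop and b not in to_drop and corr.loc[a, b] > 0.95:
            to_drop.add(b)

selected = df.drop(columns=sorted(to_drop))
```

Here `promo_code` is excluded for missing values and `temp_fahrenheit` is dropped because it carries the same information as `temp_celsius`.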
• Feature Generation
• Deriving features from other features
• Ex: Total Sales Per District
District   2011  2012  2013  2014  2015  2016  2017  2018
Kadıköy     540   650   800   900   910  1105  1200  1400
Optimum      NA   200   400   340   500   590   899   560
Natilus     440   420   450   465   502   520   510   560
Beşiktaş    820   890   905   910   902   900   920   880
Üsküdar      50   200   400   600   800   421   430   500
Kartal       30    50    90   150   250   320   220   150
…
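As a sketch of feature generation, the yearly figures from the table can be combined into new per-district features. The total follows the "Total Sales Per District" example; the count of years with data and the yearly average are additional illustrative features, not ones named in the slides.

```python
import pandas as pd

nan = float("nan")  # stands in for the NA entry in the table
years = list(range(2011, 2019))
sales = pd.DataFrame(
    [
        [540, 650, 800, 900, 910, 1105, 1200, 1400],
        [nan, 200, 400, 340, 500, 590, 899, 560],
        [440, 420, 450, 465, 502, 520, 510, 560],
        [820, 890, 905, 910, 902, 900, 920, 880],
        [50, 200, 400, 600, 800, 421, 430, 500],
        [30, 50, 90, 150, 250, 320, 220, 150],
    ],
    index=["Kadıköy", "Optimum", "Natilus", "Beşiktaş", "Üsküdar", "Kartal"],
    columns=years,
)

# Derive new features from the existing yearly columns.
features = pd.DataFrame(index=sales.index)
features["total_sales"] = sales.sum(axis=1)               # NA is skipped
features["years_with_data"] = sales.notna().sum(axis=1)
features["avg_yearly_sales"] = (
    features["total_sales"] / features["years_with_data"]
)
```

Dividing by the number of years with data keeps the average comparable for Optimum, which has no 2011 figure.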
• Ex:
Kadıköy     540   650   800   900   910  1105  1200  1400
Optimum      NA   200   400   340   500   590   899   560
Natilus     440   420   450   465   502   520   510   560
Beşiktaş    820   890   905   910   902   900   920   880
Üsküdar      50   200   400   600   800   421   430   500
Kartal       30    50    90   150   250   320   220   150
…
• Dimensionality Reduction
• Reducing the number of features in a data model by grouping or eliminating them
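One simple way to reduce features by grouping is to collapse several related columns into fewer summary columns. The sketch below groups eight yearly sales columns into two period averages; the choice of periods and the use of the mean are assumptions made for illustration.

```python
import pandas as pd

years = list(range(2011, 2019))
sales = pd.DataFrame(
    [
        [540, 650, 800, 900, 910, 1105, 1200, 1400],
        [440, 420, 450, 465, 502, 520, 510, 560],
        [820, 890, 905, 910, 902, 900, 920, 880],
    ],
    index=["Kadıköy", "Natilus", "Beşiktaş"],
    columns=years,
)

# Hypothetical grouping: eight yearly features become two period features.
periods = {
    "2011-2014": [2011, 2012, 2013, 2014],
    "2015-2018": [2015, 2016, 2017, 2018],
}
reduced = pd.DataFrame(
    {name: sales[cols].mean(axis=1) for name, cols in periods.items()}
)
```

The reduced table has two columns instead of eight, at the cost of losing year-level detail.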
• Outlier Removal
• Removing observation points that lie far from the other observations
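The slides do not say how "distant" is measured. One common convention, used here purely as an assumption, is the interquartile range (IQR) rule: values more than 1.5 × IQR outside the first or third quartile are treated as outliers. The data is hypothetical.

```python
import pandas as pd

# Hypothetical daily sales; 950 lies far from the other observations.
daily_sales = pd.Series([52, 48, 55, 50, 47, 53, 950, 49])

# IQR rule (an assumed convention, not prescribed by the slides)
q1 = daily_sales.quantile(0.25)
q3 = daily_sales.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the observations inside the fences
cleaned = daily_sales[daily_sales.between(lower, upper)]
```

Before removing a point, it is worth checking whether it is a data error or a genuine event such as a promotion day.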
• Normalization
• Normalization means adjusting values measured on different scales to
a notionally common scale, so that they can be compared with each other
- Standardization (using the standard deviation):
[x - mean(x)] / sd(x)
…
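Standardization subtracts the mean from each value and divides by the standard deviation. A minimal sketch using only the Python standard library follows; the two feature lists are hypothetical, and the sample standard deviation is used, which matches what R's `sd()` computes.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (x - mean(x)) / sd(x), with the sample sd."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]


# Two hypothetical features measured on very different scales
price = [10.0, 20.0, 30.0]
units_sold = [1000.0, 2000.0, 3000.0]

z_price = standardize(price)
z_units = standardize(units_sold)
```

After standardization both features have mean 0 and standard deviation 1, so they can be compared directly.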
Modeling