02 Crispdm
[Figure: CRISP-DM process cycle. Surviving labels: Analysis, Data Preparation, Data, Modelling, Evaluation, Deployment]
20/04/2023 DMA Lecture 02 4
CRISP-DM: Phases and Tasks
(Use this as a Check-list for Your Project)
[Figure: CRISP-DM phases and tasks. Task labels shown: Determine Data Mining Goals; Produce Project Plan; Describe Data; Conduct Initial Data Exploration; Verify Data Quality; Transform Data; Integrate Data; Format Data; Build Model; Assess Model; Produce Final Report; Review Project]
CRISP-DM Phase 1
Business Understanding
• Objectives
– To thoroughly understand, from a business perspective, what the client
really wants to accomplish, their business goals, resources, and constraints
– To translate these goals and restrictions into a data mining problem
definition
– To produce a preliminary plan for achieving the data mining goals and the
business goals
• Outputs
– Statement of primary business objectives
– Statement of data mining objectives
– Statement of success criteria
• Think from a business perspective first, not from an analytical
perspective, so that you have a set of meaningful questions to which
analytics can be applied to find possible answers/causes
CRISP-DM Phase 1
Business Understanding
• Case 1:
– Business goal: To increase first-year student progression rate by 10%
– Data mining goal:
• To identify factors affecting student progression, e.g., entry qualifications, family commitments,
part-time jobs, etc. Possible models include cluster analysis, association analysis, correlation, etc.
• To predict and identify at-risk students. Possible models include various predictive and
descriptive models. R-F-M model?
• Case 2:
– Business goal: Acquire new customers in order to increase sales by 20% for this year
– Data mining goal:
• Identify which existing customers are most/least profitable.
• Identify the shopping patterns of the most profitable customers: what have they purchased,
and in which sequence?
• Identify how customer demographics are linked to their purchasing.
• Case 3:
– Business goal: How many new blood donors should be recruited, and by when?
– Data mining goal: ?
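As an illustration of the first data mining goal in Case 2 (identifying the most and least profitable existing customers), here is a minimal sketch. The transaction records, the field layout (customer id, revenue, cost) and the function names are all assumptions made up for this example; a real project would draw these from the company's own sales data and its own definition of profit.

```python
from collections import defaultdict

# Hypothetical transactions: (customer_id, revenue, cost).
# These values are invented purely for illustration.
transactions = [
    ("C1", 120.0, 80.0),
    ("C2", 40.0, 35.0),
    ("C1", 60.0, 30.0),
    ("C3", 200.0, 210.0),
    ("C2", 15.0, 5.0),
]


def profit_by_customer(rows):
    """Sum (revenue - cost) per customer."""
    totals = defaultdict(float)
    for customer_id, revenue, cost in rows:
        totals[customer_id] += revenue - cost
    return dict(totals)


def rank_customers(rows):
    """Return (customer_id, profit) pairs, most profitable first."""
    totals = profit_by_customer(rows)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_customers(transactions)
most_profitable = ranking[0]
least_profitable = ranking[-1]
print("Most profitable:", most_profitable)
print("Least profitable:", least_profitable)
```

The ranking answers only the "which customers" question; the shopping-pattern and demographic goals in Case 2 would need further analysis on top of it.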
CRISP-DM Phase 2
Data Understanding
• Objectives
– To collect the initial data
– To explore the data, become familiar with it, and discover initial insights by
identifying its “gross” or “surface” properties
– To evaluate the quality of the data (this may require looping back to the previous stage)
• Relevance
• Completeness (coverage)
• Missing values, outliers, extreme values, incomparable value ranges of variables,
inconsistencies, imbalanced classes, etc.
– To detect subsets of the data that are of interest and may directly address the data
mining goals
• Outputs
– Data description report
– Data exploration report
– Data quality report
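Some of the quality checks listed above (missing values, outliers, imbalanced classes) can be sketched in a few lines. The student records, field names and the 1.5 × IQR outlier rule below are assumptions chosen for illustration, loosely following Case 1; they are not a prescribed part of CRISP-DM.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical student records; None marks a missing value.
records = [
    {"entry_points": 120, "hours_worked": 10, "progressed": "yes"},
    {"entry_points": 96, "hours_worked": 12, "progressed": "yes"},
    {"entry_points": None, "hours_worked": 8, "progressed": "yes"},
    {"entry_points": 112, "hours_worked": None, "progressed": "no"},
    {"entry_points": 104, "hours_worked": 11, "progressed": "yes"},
    {"entry_points": 128, "hours_worked": 9, "progressed": "yes"},
    {"entry_points": 80, "hours_worked": 60, "progressed": "no"},
    {"entry_points": 112, "hours_worked": 10, "progressed": "yes"},
]


def missing_counts(rows, fields):
    """Count missing (None) values per field."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def class_balance(rows, target):
    """Share of each class label among rows where the target is present."""
    counts = Counter(r[target] for r in rows if r.get(target) is not None)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


missing = missing_counts(records, ["entry_points", "hours_worked", "progressed"])
hours = [r["hours_worked"] for r in records if r["hours_worked"] is not None]
outliers = iqr_outliers(hours)
balance = class_balance(records, "progressed")

print("Missing values:", missing)
print("Outliers in hours_worked:", outliers)
print("Class balance:", balance)
```

Findings like these would feed the data quality report and may send the project back to the business understanding stage, for example if the data turn out not to cover the students of interest.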
CRISP-DM Phase 2
Data Understanding: “Surface Properties”
[Figure: data flow architecture. Surviving labels: internal/external data sources; transformation; load; enterprise data warehouse; data marts; OLAP cube; interactive queries; data mining; user interface (browser, portal, dashboard, scorecard); present results (VIZ); deploy the codes (model)]
Summary
• The concept of a methodology, and why data mining needs one
• CRISP-DM
– The key stages, and the tasks and inputs/outputs within each stage and
between stages
– Essential:
• Translating a business problem into a data mining problem;
• Evaluating the value of the models created from a business perspective: how can
they be used to address business concerns?
• Data mining is a research process: always ask yourself what you have
found, and what impact your findings will have on the business
Always start your analysis with a set of clearly-defined
questions/problems
No meaningful question, No analysis