Capstone Project
1. Business understanding
• What problem are you trying to solve?
• Every project, whatever its size, begins with
understanding the business.
• Business partners who need the analytics solution
play a critical role in this phase by defining the
problem, the project objectives, and the solution
requirements from a business perspective.
2. Analytic approach
• The problem must be expressed in the context of
statistical learning to identify the appropriate machine
learning techniques to achieve the desired result.
3. Data requirements
• What data do you need to answer the question?
• The analytic approach determines the data
requirements: the specific content, formats, and data
representations, guided by domain knowledge.
4. Data collection
• Where is the data coming from (identify all sources)
and how will you get it?
• The data scientist identifies and collects data
resources (structured, unstructured, and
semi-structured) that are relevant to the problem area.
• If the data scientist finds gaps in the collected data,
they may need to revisit the data requirements and
collect more data.
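One way to spot gaps is to compare what was collected against the data requirements. The sketch below assumes the requirements can be written as a mapping from field name to expected Python type; `REQUIRED_FIELDS`, `find_gaps`, and the sample records are hypothetical illustrations, not part of any particular toolkit.

```python
from typing import Any

# Hypothetical data requirements: field name -> expected Python type.
REQUIRED_FIELDS: dict[str, type] = {
    "customer_id": int,
    "age": int,
    "monthly_spend": float,
}


def find_gaps(records: list[dict[str, Any]],
              required: dict[str, type]) -> dict[str, int]:
    """Count, per required field, how many records lack it or hold the wrong type."""
    gaps = {field: 0 for field in required}
    for record in records:
        for field, expected_type in required.items():
            value = record.get(field)
            if value is None or not isinstance(value, expected_type):
                gaps[field] += 1
    # Report only the fields that actually have gaps.
    return {field: count for field, count in gaps.items() if count > 0}


# Hypothetical collected records.
collected = [
    {"customer_id": 1, "age": 34, "monthly_spend": 52.0},
    {"customer_id": 2, "age": None, "monthly_spend": 17.5},
    {"customer_id": 3, "monthly_spend": "n/a"},
]

print(find_gaps(collected, REQUIRED_FIELDS))
```

Here `age` is empty or absent in two records and `monthly_spend` holds a non-numeric value in one, which would send the data scientist back to the requirements or to the source. This strict type check also flags an integer stored in a float field; a real check might be more lenient.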
5. Data understanding
• Is the data that you collected representative of the
problem to be solved?
• Descriptive statistics and visualization techniques can
help a data scientist understand the content of the
data, assess its quality, and gain initial insights
into the data.
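As a minimal illustration of using descriptive statistics to assess data quality, the sketch below summarizes one hypothetical numeric column with Python's standard `statistics` module and counts missing values. Marking missing values as `None` is an assumption about how gaps are encoded.

```python
import statistics

# Hypothetical sample: monthly spend per customer; None marks a missing value.
monthly_spend = [52.0, 17.5, None, 40.0, 250.0, 33.5, None, 41.0]


def describe(values):
    """Return simple descriptive statistics plus a missing-value count."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }


summary = describe(monthly_spend)
for name, value in summary.items():
    # Format floats to two decimals; print integer counts as-is.
    print(f"{name:>7}: {value:.2f}" if isinstance(value, float) else f"{name:>7}: {value}")
```

For this sample the mean (about 72.33) sits well above the median (40.50), which points at the value 250.0 as a possible outlier worth checking before modelling.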
6. Data preparation
• What additional work is required to manipulate and
work with the data?
• The data preparation step includes all the activities
used to construct the data set for the modelling
phase.
• This includes cleansing data, combining data from
multiple sources, and transforming data into more
useful variables.
• In addition, feature engineering and text analysis can
be used to derive new structured variables.
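The preparation activities listed above can be sketched in plain Python. The two sources, their field names, and the fixed `current_year` default are hypothetical; the example drops incomplete records (cleansing), joins customers to their transactions (combining), and derives `total_spend`, `avg_transaction`, and `tenure_years` (transformation and feature engineering).

```python
# Two hypothetical sources keyed by customer_id.
customers = [
    {"customer_id": 1, "age": 34, "signup_year": 2019},
    {"customer_id": 2, "age": None, "signup_year": 2021},
    {"customer_id": 3, "age": 51, "signup_year": 2015},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 32.0},
    {"customer_id": 3, "amount": 15.0},
    {"customer_id": 3, "amount": None},
]


def prepare(customers, transactions, current_year=2024):
    """Build one modelling row per customer; current_year is a hypothetical fixed value."""
    # Cleansing: ignore transactions with a missing amount.
    clean_tx = [t for t in transactions if t["amount"] is not None]

    # Transformation: aggregate transactions per customer.
    totals, counts = {}, {}
    for t in clean_tx:
        cid = t["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + t["amount"]
        counts[cid] = counts.get(cid, 0) + 1

    rows = []
    for c in customers:
        # Cleansing: skip customers whose age is unknown.
        if c["age"] is None:
            continue
        cid = c["customer_id"]
        n = counts.get(cid, 0)
        total = totals.get(cid, 0.0)
        # Combining sources and deriving new variables (feature engineering).
        rows.append({
            "customer_id": cid,
            "age": c["age"],
            "total_spend": total,
            "avg_transaction": total / n if n else 0.0,
            "tenure_years": current_year - c["signup_year"],
        })
    return rows


prepared = prepare(customers, transactions)
for row in prepared:
    print(row)
```

Customer 2 is dropped because its age is missing, and customer 3's missing transaction amount is ignored, leaving two prepared rows. A library such as pandas would normally handle this with merges and group-bys; plain Python is used here only to keep the sketch self-contained.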
7. Model training
• In what way can the data be visualized to get to the
answer that is required?
• Starting from the first version of the prepared data set,
data scientists use a training data set (historical data in
which the desired outcome is known) to develop predictive
or descriptive models.
• The modelling process is highly iterative.
8. Model evaluation
• Does the model really answer the initial question,
or does it need to be adjusted?
• The data scientist evaluates the quality of the model
and verifies that it addresses the business problem
completely and adequately.
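Evaluation usually measures the model on data it was not trained on and compares it with a simple reference. The sketch below assumes a regression model and uses root mean squared error (RMSE) against a naive baseline that always predicts the mean outcome; the test data and fitted parameters are hypothetical. A metric like this shows statistical quality only; whether the model answers the business question still has to be judged against the objectives set during business understanding.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical held-out test set (outcome known) and previously fitted parameters.
test = [(3, 7.4), (8, 16.6), (12, 25.3), (17, 34.8)]
slope, intercept = 2.0, 1.0

actual = [y for _, y in test]
model_preds = [slope * x + intercept for x, _ in test]

# Naive baseline: always predict the mean outcome.
baseline_preds = [sum(actual) / len(actual)] * len(actual)

model_error = rmse(actual, model_preds)
baseline_error = rmse(actual, baseline_preds)
print(f"model RMSE:    {model_error:.3f}")
print(f"baseline RMSE: {baseline_error:.3f}")
print("model beats baseline" if model_error < baseline_error else "model needs adjustment")
```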
9. Deployment
• Can you put the model into practice?
• Once a satisfactory model has been developed and
approved by the business sponsors, it is deployed
in the production environment or in a comparable
test environment.
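Deployment details vary widely by platform. As a minimal sketch, the approved model's parameters can be written to a versioned artifact and loaded by a separate scoring function in the target environment. The JSON format, the file name `model_v1.json`, and the `save_model`/`load_model`/`predict` helpers below are hypothetical, not a standard deployment mechanism.

```python
import json
import tempfile
from pathlib import Path


def save_model(params: dict, path: Path) -> None:
    """Persist approved model parameters (including a version tag) as JSON."""
    path.write_text(json.dumps(params))


def load_model(path: Path) -> dict:
    """Load model parameters in the target environment."""
    return json.loads(path.read_text())


def predict(model: dict, x: float) -> float:
    """Score one input with the linear model loaded from disk."""
    return model["slope"] * x + model["intercept"]


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model_v1.json"  # hypothetical artifact name
    save_model({"version": "1.0", "slope": 2.0, "intercept": 1.0}, model_path)

    deployed = load_model(model_path)
    print(deployed["version"], predict(deployed, 10.0))
```

Keeping a version tag in the artifact makes it possible to tell which approved model produced a given prediction once feedback starts arriving.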
10. Feedback
• Can you get constructive feedback to refine the answer
to the question?
• By collecting the results of the deployed model, the
organization receives feedback on the model's
performance and its impact on the environment in
which it is deployed.
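A simple form of feedback is to compare the model's error on outcomes observed after deployment with the error measured during evaluation, and to flag the model for review or retraining when it degrades. The reference RMSE, the 50% `tolerance` threshold, and the logged values below are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def check_performance(outcomes, predictions, reference_rmse, tolerance=0.5):
    """Compare live error with the error measured at evaluation time.

    Returns the live RMSE and whether it is worse than the reference
    by more than `tolerance` (a hypothetical relative threshold).
    """
    live = rmse(outcomes, predictions)
    degraded = live > reference_rmse * (1 + tolerance)
    return live, degraded


# Hypothetical feedback: predictions logged in production and outcomes observed later.
logged_predictions = [21.0, 31.0, 41.0, 51.0]
observed_outcomes = [23.5, 34.0, 44.5, 55.0]

live_rmse, degraded = check_performance(observed_outcomes, logged_predictions,
                                        reference_rmse=0.34)
print(f"live RMSE {live_rmse:.2f}; retrain recommended: {degraded}")
```

Here the live error (about 3.30) is far above the evaluation-time reference, so the check recommends retraining; the newly observed outcomes can then feed back into data collection and modelling.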