Unit 2 Notes
Business Understanding:
Analytic Understanding:
Data Requirements:
Data Collection:
Data Understanding:
Data Preparation:
Modelling:
Evaluation:
Model evaluation is done during model development. It assesses
the quality of the model and checks whether it meets the
business requirements.
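As a minimal sketch of what such a check might look like for a binary classifier: the labels, predictions, and threshold values below are made up for illustration, and the function names `evaluate` and `meets_requirements` are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def meets_requirements(metrics, thresholds):
    """True only if every listed metric reaches its minimum value."""
    return all(metrics[name] >= minimum for name, minimum in thresholds.items())


# Hypothetical hold-out labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = evaluate(y_true, y_pred)

# Assumed business requirement: precision and recall of at least 0.7.
thresholds = {"precision": 0.7, "recall": 0.7}
print(metrics, meets_requirements(metrics, thresholds))
```

Keeping the thresholds separate from the metric calculation lets the business side change the requirement without touching the evaluation code.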
Deployment:
The deployment phase checks how well the model holds up in
the external environment and whether it performs better than
the alternatives.
Feedback:
Data Engineer:
Leverages deep technical skills to assist with tuning SQL
queries for data management and data extraction, and
provides support for data ingestion into the analytic
sandbox.
The DBA sets up and configures the databases to be used, while
the data engineer executes the actual data extractions and
performs substantial data manipulation to facilitate the
analytics.
Data Scientist:
Provides subject matter expertise for analytical techniques,
data modeling, and applying valid analytical techniques to
given business problems.
Ensures overall analytics objectives are met.
Performing ETLT
PHASE 2: DATA PREPARATION
OpenRefine: formerly called Google Refine, it is "a free, open
source, powerful tool for working with messy data." It is a
popular GUI-based tool.
Data Wrangler: an interactive tool for data cleaning and
transformation. Wrangler was developed at Stanford University
and can be used to perform many transformations on a given
dataset.
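The kind of cleanup these tools do interactively can also be scripted. The sketch below is only an illustration, not how either tool works internally: it assumes a load-then-transform flow, uses an in-memory SQLite database as a stand-in for the analytic sandbox, and the table names, columns, and rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw extract: stray whitespace, inconsistent casing, a blank amount.
raw_rows = [
    (" Alice ", "NEW YORK", "120.50"),
    ("bob", "new york ", "80"),
    ("Carol", "Boston", ""),
]

# An in-memory SQLite database stands in for the analytic sandbox.
sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE raw_sales (customer TEXT, city TEXT, amount TEXT)")

# Load: keep the raw data untouched so it can be re-examined later.
sandbox.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# Transform inside the sandbox: trim, normalize case, cast, drop blank amounts.
sandbox.execute("""
    CREATE TABLE clean_sales AS
    SELECT TRIM(customer)       AS customer,
           UPPER(TRIM(city))    AS city,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE TRIM(amount) <> ''
""")

totals = sandbox.execute(
    "SELECT city, SUM(amount) FROM clean_sales GROUP BY city ORDER BY city"
).fetchall()
print(totals)
```

Because the raw table is preserved, the team can rerun or revise the transformation step without extracting the data again.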
Model Selection
The team's main goal is to choose an analytical
technique, or a short list of candidate techniques, based
on the end goal of the project.
In the case of machine learning and data mining, the rules and
conditions that make up a model are grouped into several
general sets of techniques, such as classification,
association rules, and clustering.
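One way to narrow a short list of candidates is to score each on held-out data. The sketch below assumes the end goal is classification and compares two deliberately simple candidates; the data points, labels, and function names are hypothetical, and a real project would use richer techniques and more data.

```python
from collections import Counter


def majority_baseline(train, _point):
    """Predict the most common training label, ignoring the features."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def nearest_neighbor(train, point):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    closest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )
    return closest[1]


def accuracy(model, train, holdout):
    """Fraction of hold-out points whose label the model predicts correctly."""
    hits = sum(1 for features, label in holdout if model(train, features) == label)
    return hits / len(holdout)


# Hypothetical labeled data: (features, label).
train = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((3.0, 3.2), "high"), ((3.1, 2.9), "high"),
]
holdout = [((1.1, 0.9), "low"), ((2.9, 3.0), "high"), ((3.2, 3.1), "high")]

candidates = {
    "majority_baseline": majority_baseline,
    "nearest_neighbor": nearest_neighbor,
}
scores = {name: accuracy(model, train, holdout) for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Including a trivial baseline in the short list shows whether a more complex technique is actually adding value.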
Better performance
Longer lifetime
Easier retraining
Speedy production
Deploying on a limited scale first allows the team to learn
from the deployment and make any needed adjustments before
launching the model across the enterprise.
Keep it short.