ETL Specification Table of Contents: Change Log
What follows is a table of contents for the ETL Specification document. This is targeted at organizations
that do not have rigid specification / development procedures in place. Those who already follow clear
development methodologies will find this specification document to be weak. Those who fly by the seat
of their pants will find this specification to be insanely detailed.
Much of this information should exist already. Examples include the logical dimensional model, the
physical database model, the source-to-target map, and the data profiling reports. It’s very helpful to
pull everything together into a single document. Make the specification document readable as a
standalone document, and include links to the detailed external documents (it’s easy to embed links in a
Word doc).
The “mandatory” column represents what we consider the bare minimum of information that you
should have pulled together, and issues to have thought through, before you do any real development.
Any work you do in SSIS before that point should be considered throwaway / learning / prototyping.
Change Log
What Approx page count Mandatory
Overview of changes to this document (who, what, when) 1
Summary
What Approx page count Mandatory
What is this ETL specification for? What system/subsystem/phase? 2
Historical load or incremental load?
1
to document)
Data profiling reports 2 (overview + link to document)
Database physical design 1 (overview + link to DDL script or database definition)
2
For each attribute, Type 1 vs. Type 2 handling
Incremental data volumes, measured as new and updated
rows / load cycle
How to handle late arriving data for facts and dimensions
Load frequency (e.g. daily)
Table partitioning strategy
Overview of data source(s), focusing on any unusual
characteristics (unusually short access window; data lives in
Excel; etc)
Detailed source-to-target mapping (link to location in
existing document)
Detailed source data profiling (link to location in existing
document)
Deviations from default strategies, if any: 0-2
SCD management (e.g. the default is to use the SSIS wizard, but for this
table we’re going to do XYZ for reason ABC)
Extract strategy
Startup
Cleanup
Error handling
Dependencies: Which other tables need to be loaded before this table is processed? 1
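The per-attribute Type 1 vs. Type 2 decision listed above can be illustrated with a small sketch. This is not SSIS code and not a prescribed method: the attribute names, the `SCD_POLICY` mapping, the `apply_change` helper, and the `valid_from` / `valid_to` / `is_current` columns are all hypothetical, and overwriting Type 1 attributes on every historical row is an assumption that your own specification may decide differently.

```python
from datetime import date

# Hypothetical per-attribute policy, as the specification would record it:
# 1 = overwrite in place (Type 1), 2 = keep history with a new row (Type 2).
SCD_POLICY = {"email": 1, "city": 2}


def apply_change(history, incoming, load_date):
    """Apply one incoming source row to the version history of one business key.

    `history` is a list of row dicts ordered oldest to newest; the last row
    is the current version.
    """
    current = history[-1]
    type1 = [a for a, t in SCD_POLICY.items() if t == 1 and incoming[a] != current[a]]
    type2 = [a for a, t in SCD_POLICY.items() if t == 2 and incoming[a] != current[a]]

    # Type 1: overwrite the attribute on every version (assumption: history
    # rows are restated too, not only the current row).
    for attr in type1:
        for row in history:
            row[attr] = incoming[attr]

    # Type 2: expire the current row and append a new current version.
    if type2:
        current["valid_to"] = load_date
        current["is_current"] = False
        new_row = dict(current)
        new_row.update({a: incoming[a] for a in type2})
        new_row.update(valid_from=load_date, valid_to=None, is_current=True)
        history.append(new_row)
    return history


history = [{
    "customer_id": 42, "email": "a@example.com", "city": "Austin",
    "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True,
}]

# Email changes: Type 1, so no new row is created.
apply_change(history, {"email": "b@example.com", "city": "Austin"}, date(2024, 1, 1))
# City changes: Type 2, so the old row is expired and a new row is added.
apply_change(history, {"email": "b@example.com", "city": "Boston"}, date(2024, 2, 1))

for row in history:
    print(row)
```

Writing the policy down as data, one entry per attribute, keeps the Type 1 vs. Type 2 choice reviewable in the specification before any package is built.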
Summary of flow
What Approx page count Mandatory
Describe master packages, and provide a first cut at job sequencing. Create a dependency tree that
specifies which tables must be processed before others. Whether or not you choose to parallelize your
processing, it’s important to know the logical dependencies that cannot be broken. 2-4 per subject area
(1-2 pages for pictures, 1-2 pages for text)
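As a rough illustration of the dependency tree described above, the sketch below derives a load order from a table-to-prerequisites mapping using Python's standard `graphlib` module (Python 3.9 or later). The table names and the `load_waves` helper are hypothetical; the point is only that the logical dependencies can be written down as data and checked, whatever tool eventually runs the packages.

```python
from graphlib import TopologicalSorter

# Hypothetical tables: each key maps to the tables that must be loaded first.
dependencies = {
    "dim_date": set(),
    "dim_customer": set(),
    "dim_product": set(),
    "fact_orders": {"dim_date", "dim_customer", "dim_product"},
    "agg_monthly_orders": {"fact_orders"},
}


def load_waves(deps):
    """Group tables into waves; tables within one wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = load_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Tables in the same wave could be loaded in parallel or one after another; the ordering between waves is the part that cannot be broken.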