ETL Specification Table of Contents: Change Log

This document provides a template for an ETL specification table of contents. It includes sections for an overview, dimensional model, source to target mapping, architecture strategies, a high level map of the data flow, table-level details, and a summary of the overall data flow. The level of detail required depends on the organization, but the document aims to capture all relevant information needed before development. Anything developed before completing this specification should be considered throwaway work.

Uploaded by

Karthik Raparthy
ETL Specification Table of Contents

What follows is a table of contents for the ETL Specification document. It is aimed at organizations
that do not have rigid specification / development procedures in place. Those who already follow clear
development methodologies will find this specification document weak. Those who fly by the seat
of their pants will find it insanely detailed.

Much of this information should exist already. Examples include the logical dimensional model, the
physical database model, the source-to-target map, and the data profiling reports. It’s very helpful to
pull everything together into a single document. Make the specification document readable as a
standalone document, and include links to the detailed external documents (it’s easy to embed links in a
Word doc).

The “mandatory” column represents what we consider the bare minimum of information that you
should have pulled together, and the issues you should have thought through, before you do any real
development. Any time you spend in SSIS before that point should be considered throwaway / learning / prototyping.

Change Log
| What | Approx page count | Mandatory |
|---|---|---|
| Overview of changes to this document (who, what, when) | 1 | ✓ |

Summary
| What | Approx page count | Mandatory |
|---|---|---|
| What is this ETL specification for? What system/subsystem/phase? Historical load or incremental load? | 2 | ✓ |

Detailed Dimensional Model


| What | Approx page count | Mandatory |
|---|---|---|
| Provide the detailed dimensional model (such as Excel spreadsheet or Erwin diagram) • Most details are available as a linked document • Include enough discussion within the text of this document to help the reader understand what’s going on • List and brief description of each target table | 3 (overview + link to spreadsheet and/or data model overview documentation) | ✓ |
| Source to target map | 2 (overview + link to document) | ✓ |
| Data profiling reports | 2 (overview + link to document) | ✓ |
| Database physical design | 1 (overview + link to DDL script or database definition) | ✓ |

Architecture and Default Strategies


| What | Approx page count | Mandatory |
|---|---|---|
| ETL Architecture: software, hardware, and placement of software on hardware • Most details are available as a linked document • Include enough discussion within the text of this document to help the reader understand what’s going on • Location of staging areas (file v. database, target location) | 1 (overview + link to architecture document) | ✓ |
| Data model of the ETL process metadata | 2 | |
| Requirements for system availability, and the basic approach to meeting those requirements | 2 | ✓ |
| Overview of generic error handling | 2 | |
| For each source system, the default strategy for extract • Change Data Capture v. Push from source to flat files v. Separate set of extract packages, staging to tables v. whatever else you think of | 0.5 / source system | ✓ |
| Default slowly changing dimension handling strategy (e.g. use SSIS SCD transform) | 1 | |
| Preconditions (like checking for disk space; closing DW to user queries; dropping or creating indexes; truncating staging tables) | 1 | |
| Postconditions (updating statistics and indexes; cleaning up staging files or database; running a backup) | | |
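The precondition row above can be made concrete early in the specification. What follows is a minimal Python sketch of one such check (free disk space on the staging volume), not SSIS code. The function name `check_preconditions`, the staging location, and the 10 GB threshold are all hypothetical and chosen only for illustration.

```python
import shutil
import tempfile


def check_preconditions(staging_dir: str, min_free_bytes: int) -> list[str]:
    """Return a list of failed preconditions; an empty list means the load may start."""
    failures = []
    # shutil.disk_usage reports total, used, and free bytes for the volume
    # that holds staging_dir.
    free = shutil.disk_usage(staging_dir).free
    if free < min_free_bytes:
        failures.append(
            f"staging volume has {free} bytes free, needs {min_free_bytes}"
        )
    return failures


if __name__ == "__main__":
    # Hypothetical staging location and threshold, for illustration only.
    problems = check_preconditions(tempfile.gettempdir(), 10 * 1024**3)
    print(problems or "all preconditions met")
```

Returning a list of failures rather than stopping at the first one lets the other preconditions named in the table (closing the DW to user queries, index handling, truncating staging tables) be added as further checks and reported together.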

High level map and Table-level details


| What | Approx page count | Mandatory |
|---|---|---|
| High level map picture and brief discussion. Draw a picture that explains where data is sourced from and where it’s going. Annotate the major changes that need to happen along the way. What we’re talking about is something similar to Figure 7-2 in the book, but at a much higher level so you can fit an entire subject area on 1-2 pages | 2-4 per subject area (1-2 pages for pictures, 1-2 pages for text) | ✓ |
| For each table, an estimate of complexity: Is this table’s package hard, medium, easy? | 0.5 | ✓ |
| For each table, a high level map as in Figure 7-2, accompanied by discussion of the issues and pseudocode for any really complicated transformations • Table design (basically the DDL to create the table, can be a link) • For each attribute, Type 1 v. Type 2 handling • Incremental data volumes, measured as new and updated rows / load cycle • How to handle late arriving data for facts and dimensions • Load frequency (e.g. daily) • Table partitioning strategy • Overview of data source(s), focusing on any unusual characteristics (unusually short access window; data lives in Excel; etc.) • Detailed source-to-target mapping (link to location in existing document) • Detailed source data profiling (link to location in existing document) | 2-8 per table | |
| Deviations from default strategies, if any: • SCD management (e.g. default is use SSIS wizard but for this table we’re going to do XYZ for reason ABC) • Extract strategy • Startup • Cleanup • Error handling | 0-2 | |
| Dependencies: Which other tables need to be loaded before this table is processed? | 1 | |
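The per-attribute Type 1 v. Type 2 decision listed above is easier to review when it is written down as data. The following is a rough Python sketch under stated assumptions, not the SSIS SCD transform: the attribute names, the `SCD_TYPE` map, the housekeeping columns (`row_start_date`, `row_end_date`, `is_current`), and the function `apply_scd` are all hypothetical. It also simplifies in two ways: it leaves out surrogate key assignment, and it applies Type 1 overwrites only to the active row.

```python
from datetime import date

# Hypothetical per-attribute handling, as it might be recorded in the spec.
SCD_TYPE = {"customer_name": 1, "email": 1, "city": 2, "segment": 2}


def apply_scd(current: dict, incoming: dict, load_date: date) -> list[dict]:
    """Return the dimension rows that should exist after applying `incoming`.

    `current` is the active row for one business key. Type 1 changes overwrite
    in place; any Type 2 change expires the active row and adds a new version.
    """
    changed = [a for a in SCD_TYPE if current[a] != incoming[a]]
    if any(SCD_TYPE[a] == 2 for a in changed):
        # Close the active row, then open a new version with the incoming values.
        expired = {**current, "row_end_date": load_date, "is_current": False}
        new_row = {
            **current,
            **{a: incoming[a] for a in SCD_TYPE},
            "row_start_date": load_date,
            "row_end_date": None,
            "is_current": True,
        }
        return [expired, new_row]
    if changed:
        # Only Type 1 attributes changed: overwrite, keep the same version row.
        return [{**current, **{a: incoming[a] for a in changed}}]
    return [current]
```

A table that deviates from the default strategy would then need only its own `SCD_TYPE` map plus a note on the reason, which matches the "Deviations from default strategies" row.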

Summary of flow
| What | Approx page count | Mandatory |
|---|---|---|
| Describe master packages, and provide a first cut at job sequencing. Create a dependency tree that specifies which tables must be processed before others. Whether or not you choose to parallelize your processing, it’s important to know the logical dependencies that cannot be broken. | 2-4 per subject area (1-2 pages for pictures, 1-2 pages for text) | ✓ |
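One way to turn the dependency tree into a first cut at job sequencing is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the table names in `DEPENDENCIES` and the function `load_waves` are hypothetical examples, not part of this specification.

```python
from graphlib import TopologicalSorter

# Hypothetical tables: each key maps to the set of tables that must load first.
DEPENDENCIES = {
    "dim_date": set(),
    "dim_customer": set(),
    "dim_product": set(),
    "fact_orders": {"dim_date", "dim_customer", "dim_product"},
    "agg_monthly_sales": {"fact_orders"},
}


def load_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tables into waves; tables within one wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its predecessors already loaded.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(load_waves(DEPENDENCIES), start=1):
        print(number, wave)
```

Each wave can run serially or in parallel; either way the ordering between waves captures the logical dependencies that cannot be broken.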
