Model Driven Logical ETL Design, Part 1
Summary
In this article, the author presents a structured approach for describing ETL
functionality. This approach is platform-independent and, when applied to
average-sized and larger data warehouse environments, leads to cost reductions
in programming and testing, as well as in future design changes.
The volume and change rate of ETL programs demand a high quality
design approach
Most data warehousing projects contain large numbers of ETL (Extract,
Transform, Load) components. An average data warehouse easily contains
dozens if not hundreds of tables (facts, dimensions, aggregates) and the number
of ETL components often exceeds the number of tables in a data warehouse.
This is because at least one separate ETL program is normally designed and built
for each table. In many cases more than one component is used per table: if a
table contains data from more than one source, an ETL component is usually built
for each source of that table.
A data warehouse is subject to change during its whole lifecycle. These changes
come from evolving user requirements, company processes or the market in
which the company operates. This means that new tables and ETL components
have to be built and existing components need adjustments.
An efficient method for the design and development of ETL components will
prove to be invaluable in a data warehousing project.
Current ETL design practices lead to low quality and high costs
Essentially, there is no difference between the design and development process
of ETL components and that of other data processing systems. In both cases,
functional requirements lead to the final product. However, in the practice of
designing or specifying the ETL-transformations, there is no clear distinction
between the logical and the technical design level. One of the reasons for this is
that formal specification languages for ETL are not available or widely
recognized. This is in contrast to the design practice for data structures, where
formal specification languages at the logical level have been available for many
years.
When modelling a data structure, a clear distinction is made between the logical
and technical data model. The logical level describes the structure, the meaning
and relationship of the data, without any knowledge or consideration of the
database management system (DBMS) that will be used. Entity Relationship
modelling is often used as the logical specification language. The technical level
describes the way in which the data structure needs to be implemented in the
specific DBMS.
The technical model originates from a transformation of the logical model, which
then undergoes implementation-specific adjustments needed for various technical
improvements, such as optimisation and I/O performance.
In designing ETL components, the distinction between the logical and the
technical level is rarely made. The design documents usually contain a mixture of
functional and technical descriptions and specifications. The language used is
informal, which often means that it is open to interpretation, incomplete, and
does not comply with standard terminology. This leaves the programmer free to
make assumptions and interpretations, which can endanger the quality of the end
result (the ETL component).
In many data warehousing projects, the design documentation is produced after
the software has been developed. The temptation to document by reverse
engineering is huge. In this type of documentation it is not uncommon to find
many tool specific descriptions included.
High quality ETL designs are purely functional and model driven
So what is a high quality ETL design and why do we need it? In order to answer
this question, the term “quality” needs to be defined. In this article, we use a
definition based on publications by Cavano, McCall and Boehm on the aspects of
quality in software. The term ‘enduring usability’ is the key indicator. The
‘enduring usability’ of a product (in this case an ETL design) is measured by its
usability, maintainability and portability.
The actual design documentation is often completely written in the technical
language of the chosen ETL tool. Unfortunately, this has undesirable
consequences for maintainability on a functional level (structure, modifiability,
legibility, testability) and for portability. Any consideration of a (partial) switch to
a different ETL tool is often obstructed by this tool lock-in of the documentation.
This means that pure functional designs are very important, especially for the
maintainability and portability of the data warehouse software.
A universal functional model for ETL functions forms a solid base for good
functional designs. This principle has already firmly proven itself in data
modeling. The question that remains is what such a model should look like. To
answer this question we have analyzed a large set of ETL programs from
different data warehouse implementations.
The common functioning of most of the ETL programs can be split into two parts:
1. In the first part, a model transformation is applied. It could also be called a
metamorphosis: The shape of the data changes from source model shape to
target model shape.
2. In the second part, the data - which is now in the 'target shape' - is processed
into the data warehouse, where aspects such as denormalisation and history
(slowly changing dimensions, which are dimension-specific) are handled.
This second part is rather commonplace and a lot of ETL tools offer automated
support (wizards). This part can be designed and developed with a small set of
rules.
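To make that "small set of rules" concrete, the sketch below shows one such generic rule as plain Python: a minimal type-2 slowly changing dimension update over dictionaries. It is an illustration only; the function name, the business-key argument and the valid_from/valid_to columns are assumptions, not part of the model described in this article.

from datetime import date

def apply_scd2(dimension_rows, candidate, business_key, today=None):
    # Minimal type-2 history rule: close the current version of a changed
    # record and append the new version (all names are illustrative).
    today = today or date.today()
    for row in dimension_rows:
        if row[business_key] == candidate[business_key] and row["valid_to"] is None:
            if all(row.get(k) == v for k, v in candidate.items()):
                return dimension_rows                  # unchanged: nothing to do
            row["valid_to"] = today                    # close the current version
            break
    dimension_rows.append(dict(candidate, valid_from=today, valid_to=None))
    return dimension_rows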
The actual transformation takes place in the first part. This is where the data
undergoes its change of shape. The data from the source is the candidate data
for the data warehouse. A candidate undergoes a kind of metamorphosis to
become insertable into the target environment. During the transformation process,
the source shape of the candidate is constructed first, and the target shape is then
derived from it.
1. Collect
2. Enrich
3. Filter
4. Link
5. Validate
6. Convert
7. Deliver

Figure 1: The seven transformation steps, from the source model to the target model
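Purely as an illustration (the article itself stays at the functional level), the Python sketch below mirrors Figure 1: each of the seven steps is a function whose input is the output of the preceding step. The step bodies are deliberately trivial placeholders, not the author's rules.

def collect(source_rows):                      # 1. all trigger rows with their direct attributes
    return [dict(row) for row in source_rows]

def enrich(candidates):                        # 2. add the indirectly related attributes
    return candidates

def filter_candidates(candidates):             # 3. keep only the relevant candidates
    return [c for c in candidates if c.get("relevant", True)]

def link(candidates):                          # 4. add the referencing keys of the target model
    return candidates

def validate(candidates):                      # 5. keep only qualitatively approved candidates
    return [c for c in candidates if c.get("valid", True)]

def convert(candidates):                       # 6. derive the remaining target-model attributes
    return candidates

def deliver(candidates, target_rows):          # 7. process the candidates into the target model
    target_rows.extend(candidates)

def run(source_rows, target_rows):
    candidates = collect(source_rows)
    for step in (enrich, filter_candidates, link, validate, convert):
        candidates = step(candidates)
    deliver(candidates, target_rows)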
The fundamental idea behind this classification and ranking is that similar
decisions, or transformations, on the data are brought together in one
transformation step, and that all the necessary input for a step is formed by the
output of the preceding step.
Transformation step   Output
Collect               All the candidates with the directly related, available attributes
Enrich                All candidates with the directly and indirectly related attributes
Filter                All relevant candidates with the directly and indirectly related attributes
Link                  All relevant candidates with the directly and indirectly related attributes, plus the referencing key attributes of the target model
Validate              All relevant and qualitatively approved candidates with the directly and indirectly related attributes, plus the referencing key attributes of the target model
Convert               All relevant and qualitatively approved candidates with all attributes necessary for the target model
Deliver               Candidates processed into the target model

Figure 2: Result of the functional transformation steps
The functional information stored for the first step, Collect, is the name of the
trigger entity type.
In the table metaphor, this is shown in Figure 3. In fact this is nothing more than
a table with contents.
     Trigger-table A
nr   A1   A2   A3
1    A    B    C
2    D    E    F
3    G    H    I
4    J    K    L
5    M    N    O

Figure 3: Table contents after Collection
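A minimal sketch of the Collect step, using the rows of Figure 3 as literal data. The dictionary representation and the 'nr' key are assumptions made for the sake of the example.

# Trigger-table A as shown in Figure 3.
trigger_table_a = [
    {"nr": 1, "A1": "A", "A2": "B", "A3": "C"},
    {"nr": 2, "A1": "D", "A2": "E", "A3": "F"},
    {"nr": 3, "A1": "G", "A2": "H", "A3": "I"},
    {"nr": 4, "A1": "J", "A2": "K", "A3": "L"},
    {"nr": 5, "A1": "M", "A2": "N", "A3": "O"},
]

def collect(trigger_table):
    # Collect: one candidate per trigger row, carrying its directly related attributes.
    return [dict(row) for row in trigger_table]

candidates = collect(trigger_table_a)   # five candidates, as in Figure 3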
The main purpose of Enrichment is that the relationships between the trigger
entity type and the remaining relevant entity types are specified in such a way
that the correct attributes are unambiguously added to the target shape.
     Trigger-table A    Related table B   Related table C
nr   A1   A2   A3       B1   B2           C1   C2
1    A    B    C        P    Q            R    S
2    D    E    F        T    U            V    W
3    G    H    I        X    Y            Z    1
4    J    K    L        2    3            4    5
5    M    N    O        6    7            8    9

Figure 4: Table contents after Enrichment
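Continuing the sketch, the Enrich step can be illustrated as a lookup into the related tables B and C. The join keys and the assignment of the values to B1/B2 and C1/C2 are assumptions; the real relationships are part of the functional design.

# Hypothetical related tables B and C, keyed on the candidate number.
related_b = {1: {"B1": "P", "B2": "Q"}, 2: {"B1": "T", "B2": "U"},
             3: {"B1": "X", "B2": "Y"}, 4: {"B1": "2", "B2": "3"},
             5: {"B1": "6", "B2": "7"}}
related_c = {1: {"C1": "R", "C2": "S"}, 2: {"C1": "V", "C2": "W"},
             3: {"C1": "Z", "C2": "1"}, 4: {"C1": "4", "C2": "5"},
             5: {"C1": "8", "C2": "9"}}

def enrich(candidates):
    # Enrich: add the indirectly related attributes to every candidate.
    return [dict(c, **related_b[c["nr"]], **related_c[c["nr"]]) for c in candidates]

candidates = enrich(candidates)   # candidates now have the shape of Figure 4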
In the Filter step, criteria on the attributes indicate which candidates are relevant
and which are not. Candidates 2 and 5 in Figure 5 do not meet the filter criteria.
After this step, three candidates remain (numbers 1, 3 and 4).
     Trigger-table A    Related table B   Related table C
nr   A1   A2   A3       B1   B2           C1   C2
1    A    B    C        P    Q            R    S
2    D    E    F        T    U            V    W    (does not meet the filter criteria)
3    G    H    I        X    Y            Z    1
4    J    K    L        2    3            4    5
5    M    N    O        6    7            8    9    (does not meet the filter criteria)

Figure 5: Table contents after Filtering
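A sketch of the Filter step. The criterion used here (an invented predicate on attribute A2 that happens to drop candidates 2 and 5) only stands in for whatever attribute criteria the functional design specifies.

def is_relevant(candidate):
    # Illustrative criterion on the attributes; the real criteria are design-specific.
    return candidate["A2"] not in ("E", "N")

def filter_candidates(candidates):
    # Filter: keep only the relevant candidates.
    return [c for c in candidates if is_relevant(c)]

candidates = filter_candidates(candidates)   # candidates 1, 3 and 4 remain, as in Figure 5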
The Link step has many similarities with the Enrich step. As in the enrichment
process, the relationships between the candidates and other entity types need to
comply with specific criteria. The difference is that in this step we look at the
target model, and we are only interested in the referencing key, not in the other
attributes of the entity type we want to link to.
Figure 6 shows that the table is steadily growing. It is also starting to adopt the
shape of the target model.
     Trigger-table A    Related table B   Related table C   Target table D
nr   A1   A2   A3       B1   B2           C1   C2           FK1   FK2   FK3
1    A    B    C        P    Q            R    S            a     b     c
2    D    E    F        T    U            V    W
3    G    H    I        X    Y            Z    1            d     e
4    J    K    L        2    3            4    5            g     h     i
5    M    N    O        6    7            8    9

Figure 6: Table contents after Linking
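A sketch of the Link step, assuming a key lookup against the target model that yields only the referencing keys of target table D (values taken from Figure 6). Which source attributes the keys are looked up on is left out of the illustration.

# Hypothetical lookup of the referencing keys in the target model (values follow Figure 6).
fk_lookup = {
    1: {"FK1": "a", "FK2": "b", "FK3": "c"},
    3: {"FK1": "d", "FK2": "e"},               # one key could not be resolved for candidate 3
    4: {"FK1": "g", "FK2": "h", "FK3": "i"},
}

def link(candidates):
    # Link: add only the referencing key attributes of the target model, no other attributes.
    return [dict(c, **fk_lookup.get(c["nr"], {})) for c in candidates]

candidates = link(candidates)   # candidates now carry FK1..FK3 where they could be resolved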
Figure 7 shows that candidates 3 and 4 do not comply with the quality criteria of
the Validate step. For each criterion a follow-up action is defined. After this step,
one candidate remains (number 1).
     Trigger-table A    Related table B   Related table C   Target table D      Action
nr   A1   A2   A3       B1   B2           C1   C2           FK1   FK2   FK3
1    A    B    C        P    Q            R    S            a     b     c
2    D    E    F        T    U            V    W
3    G    H    I        X    Y            Z    1            d     e              Re-process
4    J    K    L        2    3            4    5            g     h     i        Corrective action
5    M    N    O        6    7            8    9

Figure 7: Table contents after Validation; only candidate 1 remains
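A sketch of the Validate step: every quality criterion has an associated follow-up action, failing candidates are routed to that action, and only approved candidates continue. The two criteria shown are invented so that the outcome matches Figure 7.

def validate(candidates):
    # Validate: apply the quality criteria; each failed criterion has its own follow-up action.
    approved, rejected = [], []
    for c in candidates:
        if not all(key in c for key in ("FK1", "FK2", "FK3")):
            rejected.append((c, "Re-process"))        # e.g. a referencing key is still unresolved
        elif c["nr"] == 4:                            # stand-in for some other quality rule
            rejected.append((c, "Corrective action"))
        else:
            approved.append(c)
    return approved, rejected

candidates, failures = validate(candidates)   # only candidate 1 is approved, as in Figure 7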
In the Convert step, the remaining attributes needed by the target model are
derived. Figure 8 shows the contents of the table with the complete target model.
     Trigger-table A    Related table B   Related table C   Target table D
nr   A1   A2   A3       B1   B2           C1   C2           FK1   FK2   FK3   D1   D2   D3
1    A    B    C        P    Q            R    S            a     b     c     x    x    z
2    D    E    F        T    U            V    W
3    G    H    I        X    Y            Z    1            d     e
4    J    K    L        2    3            4    5            g     h     i
5    M    N    O        6    7            8    9

Figure 8: Table contents after Converting
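Finally, a sketch of the Convert step, which derives the remaining attributes D1..D3 required by the target model. The derivation rules shown are placeholders; the actual rules come from the functional design.

def convert(candidates):
    # Convert: derive all remaining attributes that the target model needs.
    converted = []
    for c in candidates:
        converted.append(dict(c,
                              D1=c["A1"].lower(),    # placeholder derivation rules only;
                              D2=c["B1"].lower(),    # the real rules are design-specific
                              D3=c["C2"].lower()))
    return converted

candidates = convert(candidates)   # candidates now carry every target-model attribute (Figure 8)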
Conclusion
This model driven approach offers a number of advantages over the commonly
used approaches (prose descriptions, reverse engineering):
- This model can be automated, provided that it is complete. CASE tools that can
  automatically or semi-automatically generate data structures and more classical
  data processing components have been available for a considerable time. Such
  automation should also be possible for ETL; further analysis is needed to
  accomplish this.
- This model driven approach generally complies with the quality requirements
described in this article. This approach excels especially in structure,
efficiency, consistency, transferability and testability.
- And, as always, there is also a financial side to this. Ultimately, this approach
  will deliver financial rewards: the model offers a few straightforward methods for
  monitoring completeness and correctness and cuts down the amount of rework.
Mark Zwijsen is Senior Consultant Data Warehousing and Business Intelligence at Atos Origin.
He can be reached at mark.zwijsen@atosorigin.com