The data in a Data Warehouse system is loaded with an ETL (Extract, Transform, Load)
tool. As the name suggests, it performs the following three operations:
Extracts the data from your transactional system, which can be an Oracle, Microsoft, or any other relational database;
Transforms the data by performing data cleansing operations; and
Loads the data into the OLAP data warehouse.
You can also extract data from flat files like spreadsheets and CSV files using an ETL tool
and load it into an OLAP data warehouse for data analysis and reporting. Let us take an
example to understand it better.
Example
Let us assume there is a manufacturing company with multiple departments such as
sales, HR, Material Management, EWM, etc. All these departments maintain separate
databases for the information related to their work, and each database differs in
technology, landscape, table names, columns, etc. Now, if the company wants to analyze
historical data and generate reports, all the data from these data sources has to be
extracted and loaded into a Data Warehouse, where it is kept for analytical work.
An ETL tool extracts the data from all these heterogeneous data sources, transforms the
data (for example, applying calculations, joining fields and keys, removing incorrect data fields, etc.),
and loads it into a Data Warehouse. Later, you can use various Business Intelligence (BI)
tools to generate meaningful reports, dashboards, and visualizations using this data.
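To make the extract-transform-load flow concrete, here is a minimal sketch in Python. It is
only an illustration, not the workflow of any particular ETL tool: the source file sales.csv,
its column names, and the target table fact_sales are hypothetical, and SQLite stands in for
the warehouse database.

    import csv
    import sqlite3

    SOURCE_CSV = "sales.csv"        # hypothetical flat-file source: order_id, quantity, unit_price
    WAREHOUSE_DB = "warehouse.db"   # hypothetical warehouse database

    def extract(path):
        # Extract: read raw rows from the flat-file source.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Transform: drop incomplete rows and derive a calculated field.
        clean = []
        for row in rows:
            if not row.get("order_id") or not row.get("quantity") or not row.get("unit_price"):
                continue  # remove incorrect or incomplete records
            row["total"] = float(row["quantity"]) * float(row["unit_price"])
            clean.append(row)
        return clean

    def load(rows, db_path):
        # Load: write the transformed rows into the warehouse table.
        with sqlite3.connect(db_path) as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS fact_sales "
                "(order_id TEXT, quantity INTEGER, unit_price REAL, total REAL)"
            )
            conn.executemany(
                "INSERT INTO fact_sales VALUES (:order_id, :quantity, :unit_price, :total)",
                rows,
            )

    if __name__ == "__main__":
        load(transform(extract(SOURCE_CSV)), WAREHOUSE_DB)

A real ETL tool adds scheduling, error handling, logging, and connectors for many source
systems on top of this basic pattern.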
The most common ETL tools include SAP BO Data Services (BODS), Informatica PowerCenter,
Microsoft SSIS, Oracle Data Integrator (ODI), Talend Open Studio, CloverETL (open source), etc.
Some popular BI tools include: SAP Business Objects, SAP Lumira, IBM Cognos,
JasperSoft, Microsoft BI Platform, Tableau, Oracle Business Intelligence Enterprise
Edition, etc.
ETL Process
Let us now discuss in a little more detail the key steps involved in an ETL procedure –
Data transformation also involves data correction and cleansing: removing incorrect data,
completing incomplete data, and fixing data errors. It also includes enforcing data
integrity and reformatting incompatible data before it is loaded into a DW system.
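As a rough sketch of such cleansing rules (the field names customer_id, order_date, quantity,
and country are invented for the example), the Python snippet below rejects records whose
mandatory key is missing, fills an incomplete field with an explicit default, normalizes two
incompatible date formats, and fixes an obvious keying error:

    from datetime import datetime

    def cleanse(record):
        # Reject records whose mandatory key is missing (incorrect / incomplete data).
        if not record.get("customer_id"):
            return None

        # Fill incomplete data with an explicit default rather than leaving it blank.
        record.setdefault("country", "UNKNOWN")

        # Reformat incompatible data: normalize two common date layouts to ISO 8601.
        for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
            try:
                record["order_date"] = datetime.strptime(record.get("order_date", ""), fmt).date().isoformat()
                break
            except ValueError:
                continue

        # Fix obvious data errors, e.g. a negative quantity keyed in by mistake.
        record["quantity"] = abs(int(record["quantity"]))
        return record

    rows = [
        {"customer_id": "C1", "order_date": "31/01/2024", "quantity": "-2"},
        {"customer_id": "",   "order_date": "2024-01-31", "quantity": "5"},
    ]
    cleaned = [r for r in (cleanse(dict(row)) for row in rows) if r is not None]
    print(cleaned)   # the second record is rejected; the first is corrected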
Staging Layer – The staging layer or staging database is used to store the data
extracted from different source data systems.
Data Integration Layer – The integration layer transforms the data from the
staging layer and moves the data to a database, where the data is arranged into
hierarchical groups, often called dimensions, and into facts and aggregate
facts. The combination of fact and dimension tables in a DW system is called a
schema.
Access Layer – The access layer is used by end-users to retrieve the data for
analytical reporting and information.
The following illustration shows how the three layers interact with each other.
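The same interaction can also be sketched in code. The following Python/SQLite example, with
table and column names invented for the purpose, stages raw rows, integrates them into a small
dimension-and-fact schema, and finally queries them the way the access layer would:

    import sqlite3

    conn = sqlite3.connect(":memory:")   # stands in for the warehouse database

    # Staging layer: raw extracted rows land here unchanged.
    conn.execute("CREATE TABLE stg_sales (order_id TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)",
                     [("O1", "Widget", 10.0), ("O2", "Widget", 7.5), ("O3", "Gadget", 4.0)])

    # Data integration layer: arrange the staged data into a dimension and a fact table
    # (together they form the schema).
    conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT UNIQUE)")
    conn.execute("CREATE TABLE fact_sales (order_id TEXT, product_key INTEGER, amount REAL)")
    conn.execute("INSERT INTO dim_product (product) SELECT DISTINCT product FROM stg_sales")
    conn.execute("""
        INSERT INTO fact_sales (order_id, product_key, amount)
        SELECT s.order_id, d.product_key, s.amount
        FROM stg_sales s JOIN dim_product d ON d.product = s.product
    """)

    # Access layer: end users query the facts grouped by dimension attributes.
    for product, total in conn.execute("""
            SELECT d.product, SUM(f.amount)
            FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
            GROUP BY d.product
        """):
        print(product, total)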
2. ETL Testing – Tasks
ETL testing is done before data is moved into a production data warehouse system. It is
sometimes also called table balancing or production reconciliation. It differs from
database testing in its scope and in the steps taken to complete it.
The main objective of ETL testing is to identify and mitigate data defects and general
errors that occur prior to processing of data for analytical reporting.
One of the key tasks in ETL testing is comparing sample data between the source and the target system, as sketched below.
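A minimal sketch of such a comparison, assuming both systems expose the same table layout
through sqlite3-style connections (qmark parameter style, conn.execute shortcut) and that the
tester supplies the table and key column names:

    def fetch_sample(conn, table, key_column, sample_size):
        # Return {key: full row} for the first sample_size rows, ordered by the key column.
        cursor = conn.execute(
            f"SELECT * FROM {table} ORDER BY {key_column} LIMIT ?", (sample_size,)
        )
        column_names = [d[0] for d in cursor.description]
        key_index = column_names.index(key_column)
        return {row[key_index]: row for row in cursor.fetchall()}

    def compare_sample(source_conn, target_conn, table, key_column, sample_size=100):
        # Report keys whose sampled rows are missing or different in the target system.
        source_rows = fetch_sample(source_conn, table, key_column, sample_size)
        target_rows = fetch_sample(target_conn, table, key_column, sample_size)
        return [key for key, row in source_rows.items() if target_rows.get(key) != row]

A non-empty result points at rows that were dropped or altered unexpectedly during the load;
in a real project the sample would usually be drawn more carefully, for example randomly or
by date range.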
3. ETL vs Database Testing
Both ETL testing and database testing involve data validation, but they are not the
same. ETL testing is normally performed on data in a data warehouse system, whereas
database testing is commonly performed on transactional systems, into which data flows
from different applications.
Here, we have highlighted the major differences between ETL testing and Database
testing.
ETL Testing
ETL testing involves operations such as verifying that table relations (joins and keys) are
preserved during the transformation, as in the sketch below.
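As an illustration, a preserved relation can be checked by searching for orphaned foreign keys
after the load. The sketch below assumes an sqlite3-style connection to the target warehouse;
the fact and dimension table names in the example call are the hypothetical ones used earlier:

    def orphaned_foreign_keys(conn, fact_table, dim_table, fk_column, pk_column):
        # Return foreign-key values in the fact table that match no row in the dimension table.
        query = (
            f"SELECT DISTINCT f.{fk_column} "
            f"FROM {fact_table} f "
            f"LEFT JOIN {dim_table} d ON f.{fk_column} = d.{pk_column} "
            f"WHERE d.{pk_column} IS NULL"
        )
        return [row[0] for row in conn.execute(query)]

    # Example against the hypothetical schema sketched earlier; an empty list means the
    # relation between the fact and the dimension table survived the transformation:
    # orphaned_foreign_keys(conn, "fact_sales", "dim_product", "product_key", "product_key")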
Database Testing
Database testing stresses data accuracy, correctness, and valid values.
It involves operations such as verifying missing data in columns and checking whether there
are null columns that should actually hold a valid value (see the sketch below).
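A sketch of such a null check, again assuming an sqlite3-style connection and a caller-supplied
list of columns that must always be populated (the table and column names in the example are
invented):

    def null_counts(conn, table, mandatory_columns):
        # Count NULL values per column; any non-zero count for a mandatory column is a defect.
        counts = {}
        for column in mandatory_columns:
            row = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL").fetchone()
            counts[column] = row[0]
        return counts

    # Example: null_counts(conn, "customers", ["customer_id", "country"]) should return all zeros.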