Data Warehouse Testing - Approaches and Standards
Index
1. Introduction
2. Background
3. Aim of Testing
4. Physical Data Model
4.1 Dimension Tables
4.2 Fact Tables
5. Standard Tests
5.1 Source to Target Comparisons
5.2 Presentation Layer Testing
5.3 Automated Testing
6. Test Plan
7. Feedback to Developers
1. Introduction
This document acts as a guide to the testing approaches taken by the Testing teams when undertaking DWH and ETL projects. It provides a standard set of tests to be carried out for all DWH projects, as well as guidance on further areas to investigate in order to help facilitate a thorough testing program. The aim of this guide is to reduce the instances of bugs that arise after the release of any DWH project, so that problems are spotted and addressed during the testing phases.
2. Background
DWH projects largely concentrate on data, and on moving data from one form to another to allow reporting. The sources of this data can be any one of:
o Database of an application
o Flat files which have been processed from other applications
o Flat files produced by a Third Party Vendor
o ODS Databases
3. Aim of Testing
The ultimate aims for ETL and reporting loads are twofold:
o Data is loaded by the ETL process in a timely manner that will allow for system availability the next day, even where problems on the source system occur prior to loading.
o The data presented in the reporting layer is accurate and correct, and reflects the logic expected by the business area.
The types of testing will have to be re-branded so that the same language is used throughout all teams to avoid confusion.
UNIT TESTING
o Testing carried out by the Developer on their individual components
DWH TECHNICAL TESTING
o Testing carried out by the Systems Analyst/Designer to make sure the code has implemented the design.
o This will take into account all components making up the project.
o Impact assessment of the change downstream in the DWH.
END TO END TESTING
o Testing carried out by the Test Teams to make sure source data corresponds to the final data as it appears in reporting.
Key to the End-to-End Testing will be an understanding of the underlying data and the type of information business users are likely to ask for. Included in this testing should be the data in the Dimension and Fact tables in the DWH, as well as the data presented through the presentation layer.
4. Physical Data Model
DWH designs will follow a Star Schema, and all DWH reporting will be done this way. The physical data model should be displayed, and it is the information within these tables that should be the main focus for the testing. For the DWH, the data contained here is key, as the DWH is built to accurately reflect the data from source systems for analysis and reporting. Within this there are different stars to be aware of. Knowing these will help with the overall understanding of the type of fact or dimension table and the type of testing required.

4.1 Dimension Tables
Dimension tables hold the reference data by which the data in the fact tables can be grouped up and reported on. There are 3 basic types of dimensions. The names and descriptions are shown below, with a short illustrative sketch after the table:
Refresh: This is a straightforward refresh. The fields are constantly overwritten and history is not kept for the column. For example, should a description change for a Product number, the old value will be overwritten by the new value.

Type 2: This is known as a slowly changing dimension, as history can be kept. The column(s) where the history is captured has to be defined. In our example of the Product description changing for a product number, if the slowly changing attribute captured is the product description, a new row of data will be created showing the new product description. The old description will still be contained in the old row.

Type 3: This is also a slowly changing dimension. However, instead of a new row, in the example the old product description will be moved to an old value column in the dimension, while the new description will overwrite the existing column. In addition, a date stamp column exists to say when the value was updated. Although there will be no full history here, the previous value prior to the update is captured. No new rows will be created for history, as the attribute measured for the slowly changing value will be moved from the current value column to the previous value column.
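To make the three behaviours concrete, here is a minimal Python sketch of how each type handles a changed product description. It is illustrative only; the column names (product_no, description, prev_description) and the dates are assumptions, not taken from any actual schema.

def apply_refresh(row, new_desc):
    # Refresh (Type 1): overwrite in place; no history is kept.
    row["description"] = new_desc
    return [row]

def apply_type2(rows, product_no, new_desc, load_date):
    # Type 2: keep every old row and add a new row carrying the new value.
    new_row = {"product_no": product_no, "description": new_desc,
               "effective_from": load_date}
    return rows + [new_row]

def apply_type3(row, new_desc, load_date):
    # Type 3: shift the current value into the previous-value column,
    # overwrite the current column and stamp the date of the change.
    row["prev_description"] = row["description"]
    row["description"] = new_desc
    row["updated_on"] = load_date
    return [row]

# Example: the description for product 101 changes.
row = {"product_no": 101, "description": "Blue Pen"}
print(apply_type3(dict(row), "Blue Gel Pen", "2024-01-02"))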
4.2 Fact Tables
Fact tables will hold the actual analytical values. Within fact tables should be the numeric quantities that will give the data required.
The key to fact tables is understanding the grain. This means the lowest level of record the fact table holds. For example, in the case of the sales_fact this is the order_line_number, which is the same as the item line number on a transaction.
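A simple test follows directly from the grain: no combination of the grain key values should appear more than once. A minimal Python sketch, using hypothetical sales_fact rows:

from collections import Counter

def grain_violations(rows, grain_keys):
    # Count each combination of grain key values; any combination seen
    # more than once means the table breaks its declared grain.
    counts = Counter(tuple(row[k] for k in grain_keys) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sales_fact rows at order/order-line grain.
sales_fact = [
    {"order_no": 1, "order_line_number": 1, "qty": 2},
    {"order_no": 1, "order_line_number": 2, "qty": 1},
    {"order_no": 1, "order_line_number": 2, "qty": 1},  # duplicate grain
]
print(grain_violations(sales_fact, ["order_no", "order_line_number"]))
# -> {(1, 2): 2}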
There are different types of fact tables, and the table below gives an overview of the common ones used.
Transactional: Most facts will fall into this category. The transactional fact will capture transactional data such as sales lines or stock movement lines. The measures for these facts can be summed together.

Snapshot: A snapshot fact will capture the current data at a point in time for a day; for example, all the current stock positions (which items are in which branch) at the end of a working day. Snapshot fact measures can be summed for a single day, but cannot be summed across multiple snapshot days, as the resulting data will be incorrect (see the worked example after this table).

Accumulative Snapshot: An accumulative snapshot will sum data up for an attribute, and is not based on time; for example, the accumulative sales quantity for a sale of a particular product. The value for the row will be recalculated each night, giving an accumulative value.
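The non-additivity of snapshot measures across days is worth a worked example (hypothetical figures):

# Stock-on-hand snapshots for one item in one branch.
snapshots = {"day 1": 100, "day 2": 95}

print(snapshots["day 2"])        # valid: 95 units at the end of day 2
print(sum(snapshots.values()))   # invalid: 195 counts the same stock twice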
5. Standard Tests
There are tests that should be carried out as part of testing for every DWH project. These are:
o Initial Load Testing
  o Is the data in the Dimension/Fact table correct, and does it match what is shown in the source system?
o Incremental Testing
  o Once a data component is updated in the source system, does the Dimension/Fact table show the updated value?
  o In the case of slowly changing dimensions, is this behaving as expected?
o Testing the logic of the data in the fields
  o Does each column display what it should?
  o Are the values shown for calculated fields correct?
o Presentation Layer Testing
  o Business Rules restrictions are sometimes applied to the presentation layer, which can mean some rows are not shown on the BO layer.
o Summary Information
  o Summary information is to be tested (i.e. data grouped up to a certain level).
  o This has to be tested over a period of 1-2 weeks minimum to make sure the data is correct.
  o Basic counts on the number of records should match up with the number of records within the source system, if possible.
  o Counts should be carried out after each incremental run to make sure they can be matched up to the sources, where this is possible.
o Report Reconciliation
  o In cases where the DWH has been written to help replace an existing report constructed by other methods, reconciliation testing should be carried out. This may require doing straight comparisons between the report from the DWH and the report it is replacing.
  o It should not be assumed that these reports will directly match up. In some cases there will be differences due to elements that were not visible or clear when the original non-DWH report was developed. For these cases, the differences should be explainable and understandable.
  o If the reports do not match, it does not mean this is an automatic failure of the test case. Such testing should be carried out along with End Users to help the understanding of the data involved.

5.1 Source to Target Comparisons
In most cases the data tested will be a direct comparison from the source to the target. In the case of file extractions and data from a third party source, an element of trust has to be in place that the data received from these sources is correct. If the data comes from an internal application, true end-to-end testing should include updating elements on the front end of the source system when required.

5.2 Presentation Layer Testing
The presentation layer should reflect the information within the tables. The key part of testing for the Business Objects front end will be to make sure the contexts for any universes are correctly set. This is relevant when a universe contains more than a single fact table, where the fact tables may be of a different granularity from one another. In addition, the universe should be logically laid out to make it intuitive for end users.

5.3 Automated Testing
Not all areas of the DWH can be automated, but some of the critical testing involving data synchronization can be achieved using an automated data warehouse testing solution. The aim here is to cover the most critical part of the Data Warehouse, which is the synchronization of source and target data.
Challenges
o Large number of tables
o Multiple source systems involved
o Testing the data synchronization for all these tables
o Testing Business Objects reports through automation
Probable Solution
As verifying the data held on a large number of tables involves a huge amount of effort, the automation should concentrate on this part. The solution can be divided into three major areas:
o Count Verification: Initially, all the tests for Source to Target mapping and Incremental Load that verify initial/incremental/differential data counts can be automated (see the sketch after this list).
o Data Verification: Secondly, the Source to Target mapping and Incremental Load data verification tests can be automated.
o Verifying Data Types: Lastly, logic for checking the data types of source and target fields can be introduced.
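A minimal Python sketch of the count verification area, assuming the source and target are reachable through SQLite connections; the database files, table names and SQL here are hypothetical, and any DB-API connection would serve equally well.

import sqlite3

def row_count(conn, sql):
    # Wrap the supplied SELECT in a dynamically generated count query.
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

def verify_counts(source_conn, target_conn, pairs):
    # Compare source vs target counts for each (name, source SQL, target SQL).
    results = []
    for name, src_sql, tgt_sql in pairs:
        src = row_count(source_conn, src_sql)
        tgt = row_count(target_conn, tgt_sql)
        results.append((name, src, tgt, "PASS" if src == tgt else "FAIL"))
    return results

source = sqlite3.connect("source_system.db")   # hypothetical source extract
target = sqlite3.connect("warehouse.db")       # hypothetical warehouse
checks = [("sales load", "SELECT * FROM orders", "SELECT * FROM sales_fact")]
for result in verify_counts(source, target, checks):
    print(result)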
Architecture
A test parameter file will be used as input to the test script created in the automation tool. It will contain the source and target table SQL scripts along with the database details. There will be a central repository of reusable functions for connecting to and reading the parameter file, connecting to the databases, generating dynamic SQL and creating the test result report. Users can carry out customised test execution based on the inputs given before executing the automation scripts. At the end of each execution, a customised report detailing the execution results will be generated. As part of the test execution, all the test data (source and target) will be dumped in Excel format.
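As an illustration, a parameter file entry might hold something like the following; the format and field names are assumptions, shown here as a Python structure, and any format the automation tool can read would serve.

# Hypothetical parameter file content. Each entry carries the database
# details and the source/target SQL that the reusable functions would
# read, execute and compare.
TEST_PARAMETERS = [
    {
        "test_name":  "sales_fact_initial_load",
        "source_db":  "source_system.db",
        "target_db":  "warehouse.db",
        "source_sql": "SELECT order_no, order_line_number, qty FROM orders",
        "target_sql": "SELECT order_no, order_line_number, qty FROM sales_fact",
        "result_file": "sales_fact_initial_load.xlsx",  # Excel dump of test data
    },
]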
Advantages
o Re-usability: These automated tests can be reused with minimal effort for regression testing when newer versions of the source systems are introduced.
o Customisation: Customised reporting to evaluate the test results, and customised test execution.
6. Test Plan
Test plans should be prepared with the following input:
o Business Requirement Document
o High Level Design Document
o Input from the Key Business User
Here, the input from the Key Business User will be the most important, and relationships should be built up with them on each project. The elements for testing will be data related rather than system related. Details of business rules should be held within the HLD; however, rules there may only be explained in terms of concept. The actual formula or data elements used to get the final figure might be derived from the Detailed Design Document. However, input from the developer to test such rules should be kept to a minimum: data calculation tests should be written and carried out by the tester to make sure the rules have been interpreted as they should have been.
7. Feedback to Developers
The type of feedback given to developers is important. This feedback should be clear and concise, and should give the developer an insight into what they should be looking to address. In many cases data behaviour patterns are not observed until the data is fed into a DWH application, so although the logic of the code seems correct and conforms to the HLD, in real terms the data behaviour may mean that extra processing logic is required to make sure the final field displays what is expected of it. Resolving this may be a collaborative effort between all parties, and such logic problems should be addressed before the live release. If required, such feedback should be added to the defect reporting tool. This will help in understanding the issue with the problem data.