ETL Testing Interview Questions
Answers
Following are frequently asked questions in interviews for freshers as well as experienced ETL
testers and developers.
1) What is ETL?
In data warehousing architecture, ETL is an important component that manages the data for
any business process. ETL stands for Extract, Transform and Load. Extract is the process of
reading data from a source database. Transform is the process of converting the data into a
format suitable for reporting and analysis. Load is the process of writing the data into the
target database.
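The three steps can be illustrated with a minimal sketch. It is not tied to any particular ETL tool: the table names (raw_sales, sales_report), the column names and the sample rows are hypothetical, and in-memory SQLite databases stand in for the real source and target.

```python
import sqlite3

def run_etl(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy rows from a source table into a reporting table; returns rows loaded."""
    # Extract: read the raw rows from the source database.
    rows = source.execute("SELECT name, amount_cents FROM raw_sales").fetchall()

    # Transform: trim and normalise names, convert cents to currency units.
    cleaned = [(name.strip().title(), cents / 100) for name, cents in rows]

    # Load: write the transformed rows into the target database.
    target.execute("CREATE TABLE IF NOT EXISTS sales_report (name TEXT, amount REAL)")
    target.executemany("INSERT INTO sales_report VALUES (?, ?)", cleaned)
    target.commit()
    return len(cleaned)

# Hypothetical sample data held in in-memory databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE raw_sales (name TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)", [("  alice ", 1250), ("BOB", 300)]
)
target = sqlite3.connect(":memory:")

loaded = run_etl(source, target)
report = target.execute("SELECT name, amount FROM sales_report ORDER BY name").fetchall()
```

An ETL test would typically compare the contents of report against the values expected from the source rows and the transformation rules.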
3) Mention what are the types of data warehouse applications and what is the difference
between data mining and data warehousing?
Info Processing
Analytical Processing
Data Mining
Data mining can be defined as the process of extracting hidden predictive information from large
databases and interpreting the data, while data warehousing may make use of a data mine for
faster analytical processing of the data. Data warehousing is the process of aggregating
data from multiple sources into one common repository.
SAS Enterprise ETL server
Additive Facts
Semi-additive Facts
Non-additive Facts
Cubes are data processing units comprised of fact tables and dimensions from the data
warehouse. It provides multi-dimensional analysis.
OLAP stands for Online Analytical Processing. An OLAP cube stores large volumes of data in
multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized
by dimensions.
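As a rough illustration of measures categorized by dimensions, the sketch below sums one measure over different combinations of dimensions. The fact rows, the dimension choice (region, year) and the roll_up helper are hypothetical and only mimic what an OLAP engine does at much larger scale.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, year, sales_amount).
facts = [
    ("East", 2023, 100),
    ("East", 2024, 150),
    ("West", 2023, 80),
    ("West", 2024, 120),
]

def roll_up(rows, dims):
    """Sum the measure (position 2) for each combination of the chosen dimension positions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in dims)
        totals[key] += row[2]
    return dict(totals)

by_region = roll_up(facts, dims=(0,))         # measure per region
by_region_year = roll_up(facts, dims=(0, 1))  # measure per region and year
grand_total = roll_up(facts, dims=())         # measure over the whole cube
```

Each call corresponds to viewing the same facts at a different level of aggregation.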
Tracing level is the amount of data stored in the log files. Tracing levels can be classified into
two types: Normal and Verbose. The Normal level logs information in a detailed manner, while the
Verbose level logs information for each and every row.
Grain fact can be defined as the level at which the fact information is stored. It is also known as
fact granularity.
A fact table without measures is known as a factless fact table. It can be used to count the
number of occurring events. For example, it is used to record an event such as the employee count
in a company.
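A small sketch of the idea, assuming a hypothetical attendance_fact table that stores only dimension keys: because there is no numeric measure, analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A factless fact table holds only dimension keys, with no numeric measures.
conn.execute("CREATE TABLE attendance_fact (date_key INTEGER, employee_key INTEGER)")
conn.executemany(
    "INSERT INTO attendance_fact VALUES (?, ?)",
    [(20240101, 1), (20240101, 2), (20240102, 1)],
)
# Events are analysed by counting rows per dimension value.
per_day = conn.execute(
    "SELECT date_key, COUNT(*) FROM attendance_fact "
    "GROUP BY date_key ORDER BY date_key"
).fetchall()
```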
11) Explain the use of Lookup Transformation?
12) Explain what is partitioning, hash partitioning and round robin partitioning?
To improve performance, transactions are subdivided; this is called partitioning. Partitioning
enables the Informatica server to create multiple connections to various sources.
Round-Robin Partitioning:
Hash Partitioning:
The Informatica server applies a hash function to the partitioning keys to group data
among partitions
It is used when groups of rows with the same partitioning key must be processed in
the same partition
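The two strategies can be compared with a short sketch. This is not Informatica's implementation: the function names, the sample rows and the use of CRC32 as the hash function are assumptions made for illustration. Round-robin hands rows to partitions in turn, so partition sizes stay roughly equal; hash partitioning sends every row with the same key to the same partition.

```python
import zlib

def round_robin_partition(rows, n):
    """Distribute rows across n partitions in turn, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send each row to the partition chosen by hashing its partitioning key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted hash() for strings.
        index = zlib.crc32(str(key(row)).encode("utf-8")) % n
        parts[index].append(row)
    return parts

# Hypothetical rows: (customer_id, amount).
rows = [("c1", 10), ("c2", 20), ("c1", 30), ("c3", 40), ("c2", 50)]
rr_parts = round_robin_partition(rows, 2)
hash_parts = hash_partition(rows, 2, key=lambda r: r[0])
```

With round-robin, the two c1 rows may land in different partitions; with hash partitioning they always share one, which matters for key-based operations such as aggregation.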
The advantage of using the DataReader Destination Adapter is that it populates an ADO
recordset (consisting of records and columns) in memory and exposes the data from the DataFlow
task by implementing the DataReader interface, so that other applications can consume the data.
14) Using SSIS (SQL Server Integration Services), what are the possible ways to update a
table?
15) In case you have non-OLEDB (Object Linking and Embedding Database) source for
the lookup what would you do?
If you have a non-OLEDB source for the lookup, then you have to use a Cache to load the data
and use it as the source.
16) In what case do you use dynamic cache and static cache in connected and unconnected
transformations?
Dynamic cache is used when you have to update master table and slowly changing
dimensions (SCD) type 1
For flat files Static cache is used
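The difference between the two cache types can be sketched as follows. This is a simplified illustration rather than the tool's actual behaviour: the function names and sample data are hypothetical. A static cache is read-only for the whole run, while a dynamic cache is inserted into or updated as rows pass through, which is what an SCD type 1 load (overwrite, no history) needs.

```python
def lookup_with_static_cache(cache, incoming):
    """Look up each row against a snapshot that is never modified during the run."""
    snapshot = dict(cache)  # read-only copy of the lookup data
    return [(key, "found" if key in snapshot else "not found") for key, _ in incoming]

def load_with_dynamic_cache(cache, incoming):
    """Insert into or update the cache while rows pass through (SCD type 1)."""
    actions = []
    for key, value in incoming:
        if key not in cache:
            cache[key] = value
            actions.append((key, "insert"))
        elif cache[key] != value:
            cache[key] = value  # overwrite the old value, no history kept
            actions.append((key, "update"))
        else:
            actions.append((key, "unchanged"))
    return actions

# Hypothetical master data (customer id -> city) and incoming rows.
master = {1: "Pune"}
incoming = [(1, "Delhi"), (2, "Goa"), (2, "Goa")]

static_result = lookup_with_static_cache(master, incoming)
dynamic_result = load_with_dynamic_cache(master, incoming)
```

Note how the static cache reports key 2 as not found both times, whereas the dynamic cache sees the first occurrence as an insert and the second as already present.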
17) Explain what are the differences between Unconnected and Connected lookup?
A data source view allows you to define the relational schema that will be used in the Analysis
Services databases. Dimensions and cubes are created from data source views rather than directly
from data source objects.
19) Explain what is the difference between OLAP tools and ETL tools ?
An ETL tool is meant for extracting data from legacy systems and loading it into a specified
database, with some process of cleansing the data.
An OLAP tool is meant for reporting purposes; in OLAP, data is available in a multi-dimensional model.
With the PowerConnect option you can extract SAP data using Informatica
Install and configure the PowerConnect tool
Import the source into the Source Analyzer. PowerConnect acts as a gateway between
Informatica and SAP. The next step is to generate the ABAP code for the mapping; only then
can Informatica pull data from SAP
PowerConnect is used to connect to and import sources from external systems
21) Mention what is the difference between Power Mart and Power Center?
22) Explain what staging area is and what is the purpose of a staging area?
Data staging is an area where you hold the data temporarily on the data warehouse server. Data
staging includes the following steps
A BUS schema is used to identify the common dimensions across the various business processes. It
comes with conformed dimensions along with a standardized definition of information.
Data purging is the process of deleting data from a data warehouse. It deletes junk data such as
rows with null values or extra spaces.
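A minimal purge could look like the sketch below. The customer table and its rows are hypothetical, and "extra spaces" is interpreted here as values that contain nothing but spaces; a real purge job would follow the warehouse's own retention and quality rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Asha"), (2, None), (3, "   "), (4, "Ravi")],
)

# Purge rows whose name is NULL or contains only spaces.
cursor = conn.execute("DELETE FROM customer WHERE name IS NULL OR TRIM(name) = ''")
purged = cursor.rowcount
conn.commit()

remaining = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```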
Schema objects are the logical structures that directly refer to the database's data. Schema
objects include tables, views, sequences, synonyms, indexes, clusters, functions, packages and
database links.
Using tools is imperative to conduct ETL testing considering the volume of data. Here is a list of
top 5 ETL Testing Tools with Key features and download links :
1) QuerySurge
QuerySurge is an ETL testing solution developed by RTTS. It is built specifically to automate the
testing of Data Warehouses & Big Data. It ensures that the data extracted from data sources
remains intact in the target systems as well.
Features:
Helps to automate manual testing effort
Provides testing across different platforms like Oracle, Teradata, IBM, Amazon, Cloudera, etc.
It speeds up the testing process by up to 1,000x and provides up to 100% data coverage
It integrates an out-of-the-box DevOps solution for most Build, ETL & QA management software
Deliver shareable, automated email reports and data health dashboards
Informatica Data Validation is a popular ETL tool. It integrates with the PowerCenter Repository
and Integration Services. It enables developers and business analysts to create rules to test the
mapped data.
Features:
Informatica Data Validation provides complete solution for data validation along with data
integrity
Reduces programming efforts because of intuitive user interface and built-in operators
Identifies and prevents data issues and provides greater business productivity
It has Wizards to create test Query without the user's need to write SQL
This tool also offers design Library and reusable Query Snippets
It can analyze millions of rows and columns of data in minutes
It helps to compare data from source files and data stores to the target Data Warehouse
It can produce informative reports, updates, and auto-email results
3) QualiDI:
QualiDI automates every aspect of the testing lifecycle. This ETL tool enables clients to reduce
costs, achieve higher ROIs and accelerate time to market.
Features:
Finding bad and non-compliant data
Data integration testing
Testing across platforms
Managing test cycles through dashboards and reports
Meaningful auto test data generation using constraints and referential integrity
Automated test case generation for direct mappings
Central test case repository allows test schedules for regression testing
Test execution maintained in batches for regression and retesting
Test execution results in dashboards and reports available at a click
Built-in defect tracking and monitoring, interfacing with a third-party defect tracking tool
4) ICEDQ:
ICEDQ is an ETL testing platform. It is built to automate Data Migration Testing and Production
Data Monitoring. It helps users to identify all types of data issues generated during ETL
processes. It provides a complete automated solution to audit, validate, and reconcile data.
Features:
5) ETL Validator:
Datagaps ETL Validator is a Data warehouse testing tool. It simplifies the testing of Data
Integration, Data Warehouse, and Data Migration projects. It has an inbuilt ETL engine capable
of comparing millions of records.
Features:
Define rules for automatically validating data in every column in the incoming file
Compare profile of target and source data
Simplifies comparison of database Schema across environments
Capability to assemble and schedule test plan
Baseline and compare data to find differences
Analyzes data across multiple systems
It allows web-based reporting
REST API and continuous integration features.
It offers Data Quality and Data Integration Testing
Wizard Based Test Creation
Enterprise Collaboration
Container based security
It provides scheduling Capabilities to the users
It provides benchmarking Capabilities
Reduce costs associated with testing data projects
Here are the top BI tools with their popular features and download links:
1) Yellowfin BI:
Features:
Access dashboards from anywhere: web page, company intranet, wiki, or mobile device
Mapping and mobile BI features help users to access and monitor business-related data
It allows faster, smarter collective decision-making.
User's insights can be made effective through data-rich presentations and interactive reports
This BI tool also supports business decision-making process
2) Clear Analytics:
Clear Analytics is an accurate, timely and clear business insights system. This business
intelligence tool helps to fulfill business needs. This BI tool provides easy extraction of large
data from reliable sources and presents it in the form of professional reports.
Features:
Features:
The application developed using SAP can integrate with any system
It follows modular concept for the easy setup and space utilization
Allows creating a next-generation database system that combines analytics and transactions
Provide support for On-premise or cloud deployment
Simplified data warehouse architecture
Easy Integration with SAP and non-SAP applications
4) SISENSE:
Sisense is a business intelligence tool. It instantly analyzes and visualizes both big and disparate
datasets. It is an ideal tool for creating dashboards with a wide variety of visualizations.
Features:
5) MicroStrategy:
MicroStrategy is an enterprise analytics software. It empowers people to make better decisions
and transform the way they do business. It offers most advanced and predictive analytics.
Features:
6) BOARD:
Features:
7) Pentaho:
Pentaho is a Data Warehousing and Business Analytics Platform. The tool empowers business
users to access, discover and merge all types and sizes of data.
Features:
8) Jaspersoft:
Jaspersoft is an open source BI tool. It empowers people around the world every day to make
better decisions. It provides flexible, cost-effective, and widely-deployed business intelligence
solutions. It enables better decision making through highly interactive Web-based reports,
dashboards, and analysis.
Features:
9) QlikView:
Qlik allows creating visualizations, dashboards, and apps. It also allows seeing the entire story
that lives within data.
Features:
10) BIRT:
BIRT is open source Business Intelligence and reporting tool. It consists of a visual report
designer and a runtime component for Java environment.
Features:
IBM Cognos Analytics is an interactive business intelligence tool. It allows sharing data-driven
insights in a governed environment. It creates compelling reports and dashboards.
Features:
It offers cloud support and complete governance of data to generate online and offline reports
Accurate and safe reporting
Cross-department predictive analyses
Intent-based process modeling
Dundas is an enterprise-ready Business Intelligence platform. It is used for building and viewing
interactive dashboards, reports, scorecards and more. It is possible to deploy Dundas BI as the
central data portal for the organization or integrate it into an existing website as a custom BI
solution.
Features:
Style Intelligence is a data intelligence platform. It is powerful data mashup software that allows
fast and flexible transformation of data from disparate sources.
Features:
Scale up for large data sets of users using Inbuilt Spark platform
Generate paginated reports with embedded business logic and parameterization
14) Birst:
Birst is a web-based networked BI and analytics solution. It connects insights from various teams
and helps in making informed decisions. It allows decentralized users to augment the enterprise
data model. It also offers a unified semantic layer to maintain definitions and key metrics.
Features:
15) Netlink:
The Netlink Analytics Platform is a leading edge advanced analytics and cloud-based platform. It
can be used as Software-as-a-Service or Platform-as-a-Service. It is available with Private or
Public Cloud options. It offers best-in-class data visualizations to provide a single point of
enterprise truth.
Features:
Quick customization and visualization of dashboards across all major mobile devices
Powerful and quick ecosystem connectors
Analytics feedback to existing enterprise systems
Embedded domain and statistical knowledge via SMEs
Embedded Big Data and end user collaboration
Agility and unlimited scalability
Data scientists mine and deliver the insights that are not visible in operational reports and
dashboards
16) ClicData:
ClicData is a business intelligence dashboard solution. It is designed for use primarily by small
and midsized businesses. The tool enables end users to create reports and dashboards.
Features:
Profitbase is a business intelligence solution that delivers critical business information. It allows
companies to monitor and manage their business performance. It is appropriate for many
commercial markets, including manufacturing and retail.
Features:
It helps make faster decisions based on continuously updated and accurate data
It provides visibility into KPIs in finance, sales, AR/AP, as well as performance measures
It is modular, scalable, and consists of a data warehouse augmented with OLAP cubes
The BI software allows adding new business systems through acquisition or system upgrades
It is a module based BI tool so that customers can select the analytic tools best suited for their
requirements
Download link: http://www.profitbase.no/?lang=en
18) Exago:
Features:
Excel-like design to offer advanced functionality like Charts, Formula Editor, and Conditional
Formatting
It simplifies the process of creating complex tables with summary results
It offers wide variety of animated visualizations to choose from and a Chart Wizard
Its Map wizard makes it easy for users to visualize their data on geographical maps
Allow users to link together an unlimited number of charts and tabular reports
It automates the process of merging data to highly formatted PDF, RTF and Excel templates
Reports can be scheduled for automatic emailing
It offers browser-based support for iPad, iPhone and Android devices
19) Halo:
Halo is a unique self-service business intelligence tool. This business intelligence platform helps
in business planning for supply chain management. It incorporates and combines automated data
transformation processes with assisted manual data manipulation.
Features:
Supplier Management
Customer Management
Visualization, Reporting, and Analytics
Rapid Insight is BI software that allows users to build predictive models. It offers automated
modeling to identify relationships within complex sets of data.
Features:
21) Alteryx:
Alteryx is a business intelligence and analytics solution for enterprise and SMB companies. It is
a desktop-to-cloud Agile BI and analytics solution. It is designed for data artisans and business
leaders.
Features:
22) LongView:
LongView Enterprise is a business intelligence reporting and analytics platform. It allows rapid
creation of custom applications like reports, dashboards, etc.
Features:
23) Splunk:
Splunk is a tool to make machine data accessible, usable, and valuable to everyone. It delivers
operational intelligence to DevOps teams. It also helps companies to be more productive,
competitive, and secure.
Key Features:
24) Oracle BI Standard Edition One
Features:
1) QuerySurge
QuerySurge is an ETL testing solution developed by RTTS. It is built specifically to automate the
testing of Data Warehouses & Big Data. It ensures that the data extracted from data sources
remains intact in the target systems as well.
Features:
2) MarkLogic:
MarkLogic is a data warehousing solution that makes data integration easier and faster using an
array of enterprise features. This tool helps to perform very complex search operations. It can
query data including documents, relationships, and metadata.
Features:
The Optic API can perform joins and aggregates over documents, triples, and rows.
It allows specifying more complex security rules for all the elements within documents
Writing, reading, patching, and deleting documents in JSON, XML, text, or binary formats
Database Replication for Disaster Recovery
Specify Output Options on the App Server Configuration
Importing and Exporting Configuration Information
3) Oracle:
Oracle data warehouse software is a collection of data which is treated as a unit. The purpose of
this database is to store and retrieve related information. It helps the server to reliably manage
huge amounts of data so that multiple users can access the same data.
Features:
Distributes data in the same way across disks to offer uniform performance
Works for single-instance and real application clusters
Offers real application testing
Common architecture between any Private Cloud and Oracle's public cloud
Hi-Speed Connection to move large data
Works seamlessly with UNIX/Linux and Windows platforms
It provides support for virtualization
Allows connecting to the remote database, table, or view
4) Amazon RedShift:
Amazon Redshift is an easy to manage, simple, and cost-effective data warehouse tool. It can
analyze almost every type of data using standard SQL.
Features:
No Up-Front Costs for its installation
It allows automating most of the common administrative tasks to monitor, manage, and scale
your data warehouse
Possible to change the number or type of nodes
Helps to enhance the reliability of the data warehouse cluster
Every data center is fully equipped with climate control
Continuously monitors the health of the cluster. It automatically re-replicates data from failed
drives and replaces nodes when needed
5) Domo:
Domo is a cloud-based Data warehouse management tool that easily integrates various types of
data sources, including spreadsheets, databases, social media and almost all cloud-based or on-
premise Data warehouse solutions.
Features:
6) Teradata Corporation:
The Teradata Database is a commercially available shared-nothing, Massively Parallel
Processing (MPP) data warehousing tool. It is one of the best data warehousing tools for viewing
and managing large amounts of data.
Features:
Quick and most insightful analytics
Get the same Database on multiple deployment options
It allows multiple concurrent users to ask complex questions related to data
It is entirely built on a parallel architecture
Offers High performance, diverse queries, and sophisticated workload management
7) SAP:
Features:
8) SAS:
SAS is a leading data warehousing tool that allows accessing data across multiple sources. It can
perform sophisticated analyses and deliver information across the organization.
Features:
Activities managed from central locations. Hence, user can access applications remotely via the
Internet
Application delivery typically closer to a one-to-many model instead of one-to-one model
Centralized feature updating, allows the users to download patches and upgrades.
Allows viewing raw data files in external databases
Manage data using tools for data entry, formatting, and conversion
Display data using reports and statistical graphics
9) IBM – DataStage:
IBM DataStage is a business intelligence tool for integrating trusted data across various
enterprise systems. It leverages a high-performance parallel framework either in the cloud or on-
premise. This data warehousing tool supports extended metadata management and universal
business connectivity.
Features:
10) Informatica:
Features:
It has a centralized error logging system which facilitates logging errors and rejecting data into
relational tables
Built-in intelligence to improve performance
Limit the Session Log
Ability to Scale up Data Integration
Foundation for Data Architecture Modernization
Better designs with enforced best practices on code development
Code integration with external Software Configuration tools
Synchronization amongst geographically distributed team members
11) MS SSIS:
SQL Server Integration Services is a data warehousing tool that is used to perform ETL
operations, i.e. extract, transform and load data. SQL Server Integration Services also includes
a rich set of built-in tasks.
Features:
Open Studio is an open source data warehousing tool developed by Talend. It is designed to
convert, combine and update data in various locations. This tool provides an intuitive set of tools
which make dealing with data a lot easier. It also allows big data integration, data quality, and
master data management.
Features:
Ab Initio is a data analysis, batch processing, and GUI-based parallel processing data
warehousing tool. It is commonly used to extract, transform and load data.
Features:
14) Dundas:
Dundas is an enterprise-ready Business Intelligence platform. It is used for building and viewing
interactive dashboards, reports, scorecards and more. It is possible to deploy Dundas BI as the
central data portal for the organization or integrate it into an existing website as a custom BI
solution.
Features:
15) Sisense:
Sisense is a business intelligence tool which analyses and visualizes both big and disparate
datasets, in real-time. It is an ideal tool for preparing complex data for creating dashboards with a
wide variety of visualizations.
Features:
16) Tableau:
Tableau is an online data warehousing tool with 3 versions: Desktop, Server, and Online. It is
a secure, shareable and mobile-friendly data warehouse solution.
Features:
17) MicroStrategy:
Features:
18) Pentaho
Pentaho is a Data Warehousing and Business Analytics Platform. The tool has a simplified and
interactive approach which empowers business users to access, discover and merge all types and
sizes of data.
Features:
19) BigQuery:
Google's BigQuery is an enterprise-level data warehousing tool. It reduces the time for storing
and querying massive datasets by enabling super-fast SQL queries. It also controls access to
the project and offers the ability to view or query the data.
Features:
Automatic Data Transfer Service
Full control over access to the data stored
Easy to read and write data in BigQuery via Cloud Dataflow, Spark, and Hadoop
BigQuery provides cost control mechanisms
20) Numetric:
Numetric is a fast and easy BI tool. It offers business intelligence solutions from data
centralization and cleaning, analyzing and publishing. It is powerful enough for anyone to use.
This data warehousing tool helps to measure and improve productivity.
Features:
Data benchmarking
Budgeting & forecasting
Data chart visualizations
Data analysis
Data mapping & dictionary
Key performance indicators
Solver BI360 is a comprehensive business intelligence tool. It gives 360º insights into any
data, using reporting, data warehousing, and interactive dashboards. BI360 drives effective, data-
based productivity.
Features: