ETL Testing Interview Questions


Top 25 ETL Testing Interview Questions & Answers

The following are frequently asked interview questions for freshers as well as experienced ETL
testers and developers.

1) What is ETL?

ETL is an important component of data warehousing architecture; it manages the data for
any business process. ETL stands for Extract, Transform and Load. Extract is the process of
reading data from a source such as a database. Transform converts the data into a format that
is appropriate for reporting and analysis. Load is the process of writing the data into the
target database.
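
As a rough illustration of the three steps, here is a minimal Python sketch that uses the standard-library sqlite3 module with an in-memory database. The table names (src_orders, tgt_orders), the columns and the cleansing rules are hypothetical and chosen only for this example; a real ETL tool would do far more.

```python
import sqlite3

def extract(conn):
    """Extract: read raw rows from the (hypothetical) source table."""
    return conn.execute("SELECT id, customer, amount FROM src_orders").fetchall()

def transform(rows):
    """Transform: trim and upper-case names, round amounts for reporting."""
    return [(row_id, customer.strip().upper(), round(amount, 2))
            for row_id, customer, amount in rows]

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.executemany("INSERT INTO tgt_orders VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (id INTEGER, customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE tgt_orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
                     [(1, "  alice ", 10.456), (2, "bob", 20.0)])
    load(conn, transform(extract(conn)))
    return conn.execute("SELECT * FROM tgt_orders ORDER BY id").fetchall()

result = run_etl()
```

Keeping each step in its own function makes it possible to test the transformation logic without touching either database.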

2) Explain what operations ETL testing includes?

ETL testing includes

 Verifying that the data is transformed correctly according to business requirements
 Verifying that the projected data is loaded into the data warehouse without any truncation
or data loss
 Making sure that the ETL application reports invalid data and replaces it with default values
 Making sure that data loads within the expected time frame to improve scalability and performance
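
A few of these checks can be sketched in plain Python. The row layout, the column names and the default value of -1 are assumptions made up for this example, not part of any particular ETL tool.

```python
def find_truncated(source_rows, target_rows, key, column):
    """Return keys whose target value is shorter than the source value."""
    target_by_key = {row[key]: row for row in target_rows}
    return [row[key] for row in source_rows
            if row[key] in target_by_key
            and len(target_by_key[row[key]][column]) < len(row[column])]

def find_missing(source_rows, target_rows, key):
    """Return keys present in the source but absent from the target (data loss)."""
    target_keys = {row[key] for row in target_rows}
    return [row[key] for row in source_rows if row[key] not in target_keys]

def find_bad_defaults(source_rows, target_rows, key, column, is_valid, default):
    """Return keys where invalid source data was NOT replaced by the default."""
    target_by_key = {row[key]: row for row in target_rows}
    return [row[key] for row in source_rows
            if row[key] in target_by_key
            and not is_valid(row[column])
            and target_by_key[row[key]][column] != default]

# Hypothetical sample data: id 1 is truncated, id 3 was never loaded,
# and the invalid age of id 2 was correctly replaced by the default -1.
source = [
    {"id": 1, "name": "Alexander", "age": "34"},
    {"id": 2, "name": "Bo", "age": "n/a"},
    {"id": 3, "name": "Cy", "age": "41"},
]
target = [
    {"id": 1, "name": "Alexand", "age": 34},
    {"id": 2, "name": "Bo", "age": -1},
]

truncated = find_truncated(source, target, "id", "name")
missing = find_missing(source, target, "id")
bad_defaults = find_bad_defaults(source, target, "id", "age", str.isdigit, -1)
```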

3) Mention what are the types of data warehouse applications and what is the difference
between data mining and data warehousing?

The types of data warehouse applications are

 Info Processing
 Analytical Processing
 Data Mining

Data mining can be defined as the process of extracting hidden predictive information from large
databases and interpreting the data, while data warehousing may make use of a data mine for
analytical processing of the data in a faster way. Data warehousing is the process of aggregating
data from multiple sources into one common repository.

4) What are the various tools used in ETL?

 Cognos Decision Stream
 Oracle Warehouse Builder
 Business Objects XI
 SAP Business Warehouse
 SAS Enterprise ETL server

5) What is fact? What are the types of facts?

A fact is a central component of a multi-dimensional model which contains the measures to be
analysed. Facts are related to dimensions.

Types of facts are

 Additive Facts
 Semi-additive Facts
 Non-additive Facts
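
The three types differ in which dimensions a measure can be summed across. The sketch below uses made-up numbers: a sales amount is treated as additive, an account balance as semi-additive (it can be summed across accounts but not across days), and a margin percentage as non-additive. The row layout is an assumption for illustration only.

```python
# Hypothetical daily rows: (day, account, sales_amount, balance, margin_pct)
rows = [
    ("2024-01-01", "A", 100.0, 500.0, 20.0),
    ("2024-01-01", "B", 300.0, 200.0, 40.0),
    ("2024-01-02", "A", 100.0, 600.0, 30.0),
    ("2024-01-02", "B", 100.0, 250.0, 10.0),
]

# Additive: summing over every dimension (day and account) is meaningful.
total_sales = sum(r[2] for r in rows)

# Semi-additive: balances can be summed across accounts for ONE day...
last_day = max(r[0] for r in rows)
closing_balance = sum(r[3] for r in rows if r[0] == last_day)
# ...but summing across days counts the same money more than once.
misleading_balance = sum(r[3] for r in rows)

# Non-additive: percentages cannot be summed; recompute from additive parts.
margin_amount = sum(r[2] * r[4] / 100 for r in rows)
overall_margin_pct = 100 * margin_amount / total_sales
```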

6) Explain what are Cubes and OLAP Cubes?

Cubes are data processing units composed of fact tables and dimensions from the data
warehouse. They provide multi-dimensional analysis.

OLAP stands for Online Analytical Processing. An OLAP cube stores large amounts of data in
multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by
dimensions.

7) Explain what is tracing level and what are the types?

Tracing level is the amount of data stored in the log files. Tracing levels can be classified into two
types: Normal and Verbose. The Normal level logs information in a detailed manner, while the Verbose
level logs information for each and every row.

8) Explain what is Grain of Fact?

The grain of a fact can be defined as the level at which the fact information is stored. It is also
known as Fact Granularity.

9) Explain what factless fact schema is and what is Measures?

A fact table without measures is known as a factless fact table. It can be used to count the number
of occurring events. For example, it is used to record an event such as employee count in a
company.

The numeric data based on columns in a fact table is known as Measures.
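
A minimal sketch of a factless fact table, using Python's sqlite3 module. The table fact_employment and its key columns are hypothetical; the point is only that the table holds dimension keys and no measure column, so events are counted rather than summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Factless fact table: only keys pointing at dimensions, no numeric measures.
conn.execute(
    "CREATE TABLE fact_employment ("
    "date_key INTEGER, employee_key INTEGER, department_key INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_employment VALUES (?, ?, ?)",
    [(20240101, 1, 10), (20240101, 2, 10), (20240101, 3, 20)],
)

# With no measure to SUM, the employee count comes from COUNT(*).
employees_per_department = conn.execute(
    "SELECT department_key, COUNT(*) FROM fact_employment "
    "GROUP BY department_key ORDER BY department_key"
).fetchall()
```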

10) Explain what is transformation?

A transformation is a repository object which generates, modifies or passes data. Transformations
are of two types: Active and Passive.

11) Explain the use of Lookup Transformation?

The Lookup Transformation is useful for

 Getting a related value from a table using a column value
 Updating a slowly changing dimension table
 Verifying whether records already exist in the table
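
The first and third uses can be mimicked in plain Python with a dictionary acting as the lookup cache. This is only a conceptual sketch, not Informatica's implementation; the helper build_lookup_cache and the sample columns are hypothetical.

```python
def build_lookup_cache(reference_rows, key_column, return_column):
    """Build an in-memory lookup cache keyed on key_column."""
    return {row[key_column]: row[return_column] for row in reference_rows}

# Hypothetical reference (lookup) table and incoming rows.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]
orders = [
    {"order_id": 100, "customer_id": 2},
    {"order_id": 101, "customer_id": 9},
]

region_by_customer = build_lookup_cache(customers, "customer_id", "region")

# Get a related value using a column value; fall back to a default on a miss.
enriched = [
    dict(order, region=region_by_customer.get(order["customer_id"], "UNKNOWN"))
    for order in orders
]

# Verify whether a matching record already exists in the lookup table.
exists = [order["customer_id"] in region_by_customer for order in orders]
```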

12) Explain what is partitioning, hash partitioning and round robin partitioning?

To improve performance, transactions are subdivided; this is called partitioning. Partitioning
enables the Informatica Server to create multiple connections to various sources.

The types of partitions are

Round-Robin Partitioning:

 Informatica distributes data evenly among all partitions
 This partitioning is applicable when the number of rows to process in each partition is
approximately the same

Hash Partitioning:

 The Informatica server applies a hash function to the partitioning keys in order to group
data among partitions
 It is used when groups of rows with the same partitioning key must be processed in
the same partition
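
The difference between the two schemes can be sketched in a few lines of Python. This is a conceptual illustration only: zlib.crc32 is used here as a stand-in hash function, and nothing below reflects how the Informatica server actually implements partitioning.

```python
import zlib

def round_robin_partition(rows, n_partitions):
    """Deal rows out in turn so every partition gets about the same number."""
    partitions = [[] for _ in range(n_partitions)]
    for index, row in enumerate(rows):
        partitions[index % n_partitions].append(row)
    return partitions

def hash_partition(rows, n_partitions, key):
    """Send rows with the same key value to the same partition."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's built-in str hash().
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % n_partitions
        partitions[bucket].append(row)
    return partitions

# Hypothetical input rows.
rows = [
    {"customer": name, "amount": amount}
    for name, amount in [("ann", 1), ("bob", 2), ("ann", 3),
                         ("cat", 4), ("bob", 5), ("ann", 6)]
]

rr = round_robin_partition(rows, 3)
hp = hash_partition(rows, 3, "customer")
```

Round-robin balances the row counts but scatters a customer's rows across partitions, whereas hash partitioning keeps each customer together at the cost of possibly uneven partition sizes.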

13) Mention what is the advantage of using DataReader Destination Adapter?

The advantage of using the DataReader Destination Adapter is that it populates an ADO
recordset (consisting of records and columns) in memory and exposes the data from the DataFlow
task by implementing the DataReader interface, so that other applications can consume the data.

14) Using SSIS (SQL Server Integration Services), what are the possible ways to update a
table?

To update a table using SSIS, the possible ways are:

 Use a SQL command
 Use a staging table
 Use Cache
 Use the Script Task
 Use the full database name for updating if MSSQL is used
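
The staging-table approach can be sketched as follows. SQLite (through Python's sqlite3 module) stands in for SQL Server here, and the tables customer and stg_customer are hypothetical; the idea is that changed rows are bulk-loaded into staging and a single set-based statement then updates the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE stg_customer (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Oslo"), (2, "Rome"), (3, "Lima")])

# Step 1: the data flow loads only the changed rows into the staging table.
conn.executemany("INSERT INTO stg_customer VALUES (?, ?)",
                 [(2, "Milan"), (3, "Cusco")])

# Step 2: one set-based UPDATE applies the staged values to the target,
# instead of issuing one UPDATE statement per row.
conn.execute("""
    UPDATE customer
    SET city = (SELECT s.city FROM stg_customer AS s WHERE s.id = customer.id)
    WHERE id IN (SELECT id FROM stg_customer)
""")
conn.commit()

updated = conn.execute("SELECT id, city FROM customer ORDER BY id").fetchall()
```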

15) In case you have a non-OLEDB (Object Linking and Embedding, Database) source for
the lookup what would you do?

If you have a non-OLEDB source for the lookup, you have to use a Cache to load the data
and use it as the source.

16) In what case do you use dynamic cache and static cache in connected and unconnected
transformations?

 Dynamic cache is used when you have to update a master table and slowly changing
dimensions (SCD) Type 1
 Static cache is used for flat files

17) Explain what are the differences between Unconnected and Connected lookup?

Connected Lookup | Unconnected Lookup
Connected lookup participates in the mapping | It is used when a lookup function is used instead of an expression transformation while mapping
Multiple values can be returned | Only returns one output port
It can be connected to other transformations and returns a value | Another transformation cannot be connected
Static or dynamic cache can be used for connected lookup | Unconnected lookup has only static cache
Connected lookup supports user-defined default values | Unconnected lookup does not support user-defined default values
In connected lookup, multiple columns can be returned from the same row or inserted into the dynamic lookup cache | Unconnected lookup designates one return port and returns one column from each row

18) Explain what is data source view?

A data source view allows you to define the relational schema which will be used in Analysis
Services databases. Dimensions and cubes are created from data source views rather than directly
from data source objects.

19) Explain what is the difference between OLAP tools and ETL tools ?

The difference between ETL and OLAP tools is that

An ETL tool is meant for extracting data from legacy systems and loading it into a specified
database, with some data cleansing along the way.

Example: DataStage, Informatica, etc.

An OLAP tool is meant for reporting purposes; in OLAP, data is available in a multi-dimensional model.

Example: Business Objects, Cognos, etc.

20) How you can extract SAP data using Informatica?

 With the PowerConnect option you can extract SAP data using Informatica
 Install and configure the PowerConnect tool
 Import the source into the Source Analyzer. PowerConnect acts as a gateway between
Informatica and SAP. The next step is to generate the ABAP code for the mapping; only then
can Informatica pull data from SAP
 PowerConnect is used to connect to and import sources from external systems

21) Mention what is the difference between Power Mart and Power Center?

Power Center | Power Mart
Supposed to process huge volumes of data | Supposed to process low volumes of data
It supports ERP sources such as SAP, PeopleSoft, etc. | It does not support ERP sources
It supports local and global repositories | It supports a local repository
It converts a local repository into a global repository | It has no provision to convert a local repository into a global repository

22) Explain what staging area is and what is the purpose of a staging area?

A data staging area is where you hold data temporarily on the data warehouse server. Data
staging includes the following steps

 Source data extraction and data transformation (restructuring)
 Data transformation (data cleansing, value transformation)
 Surrogate key assignments
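
The last step, surrogate key assignment, can be sketched in Python as follows. The function assign_surrogate_keys, the column customer_code and the choice of sequential integers starting at 1 are assumptions for this example; warehouses often use database sequences or identity columns instead.

```python
from itertools import count

def assign_surrogate_keys(staged_rows, natural_key, key_map=None, start=1):
    """Attach a warehouse-generated integer key to each staged row.

    key_map remembers natural key -> surrogate key, so a natural key that
    has been seen before is given the same surrogate key again.
    """
    key_map = {} if key_map is None else key_map
    next_key = count(max(key_map.values(), default=start - 1) + 1)
    keyed_rows = []
    for row in staged_rows:
        natural_value = row[natural_key]
        if natural_value not in key_map:
            key_map[natural_value] = next(next_key)
        keyed_rows.append({**row, "surrogate_key": key_map[natural_value]})
    return keyed_rows, key_map

# Hypothetical staged rows; "C-17" appears twice and must keep one key.
staged = [
    {"customer_code": "C-17"},
    {"customer_code": "C-03"},
    {"customer_code": "C-17"},
]
keyed, key_map = assign_surrogate_keys(staged, "customer_code")
```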

23) What is Bus Schema?

A BUS schema is used to identify the common dimensions across the various business processes. It
comes with conformed dimensions along with a standardized definition of information.

24) Explain what is data purging?

Data purging is the process of deleting data from a data warehouse. It deletes junk data such as
rows with null values or extra spaces.
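
A minimal purge of such junk rows might look like the sketch below, again using Python's sqlite3 module. The table dim_product is hypothetical, and "extra spaces" is interpreted here as values that contain nothing but spaces; a real purge policy would be defined by the business.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(1, "Widget"), (2, None), (3, "   "), (4, "Gadget")],
)

# Purge rows whose name is NULL or consists only of spaces.
cursor = conn.execute(
    "DELETE FROM dim_product WHERE name IS NULL OR TRIM(name) = ''"
)
purged_count = cursor.rowcount
conn.commit()

remaining = conn.execute("SELECT id, name FROM dim_product ORDER BY id").fetchall()
```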

25) Explain what are Schema Objects?

Schema objects are the logical structures that directly refer to the database's data. Schema objects
include tables, views, sequences, synonyms, indexes, clusters, functions, packages and database
links.

26) Explain these terms Session, Worklet, Mapplet and Workflow ?

 Mapplet: It arranges or creates sets of transformations
 Worklet: It represents a specific set of tasks
 Workflow: It's a set of instructions that tell the server how to execute tasks
 Session: It is a set of parameters that tells the server how to move data from sources to
target

5 Best ETL Automation Testing Tools in 2018


ETL testing is performed before data is moved into a production data warehouse system. It is
also known as table balancing or production reconciliation. The main goal of ETL testing is to
identify and mitigate data defects.
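
As a rough sketch of what table balancing can mean in practice, the following Python example compares row counts and column totals between a source table and a warehouse table, then lists the keys that did not arrive. The tables src_sales and dw_sales and the helper balance are hypothetical; the example uses the standard-library sqlite3 module.

```python
import sqlite3

def balance(conn, table, amount_column):
    """Return (row count, column total) for one table.

    Identifiers are interpolated into the SQL text, so pass only trusted
    table and column names.
    """
    query = f"SELECT COUNT(*), COALESCE(SUM({amount_column}), 0) FROM {table}"
    return conn.execute(query).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_sales (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE dw_sales (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO src_sales VALUES (?, ?)",
                 [(1, 100), (2, 250), (3, 75)])
# Row 3 is deliberately left out of the target to simulate a data defect.
conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", [(1, 100), (2, 250)])

source_totals = balance(conn, "src_sales", "amount")
target_totals = balance(conn, "dw_sales", "amount")
is_balanced = source_totals == target_totals

# Keys present in the source but missing from the warehouse.
missing_ids = [row[0] for row in conn.execute(
    "SELECT id FROM src_sales EXCEPT SELECT id FROM dw_sales ORDER BY id")]
```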

Given the volume of data involved, using tools is imperative for ETL testing. Here is a list of the
top 5 ETL testing tools with key features and download links:

1) QuerySurge

QuerySurge is an ETL testing solution developed by RTTS. It is built specifically to automate the
testing of Data Warehouses & Big Data. It ensures that the data extracted from data sources
remains intact in the target systems as well.

Features:

 Improves data quality & data governance
 Accelerates data delivery cycles
 Helps to automate manual testing effort
 Provides testing across different platforms like Oracle, Teradata, IBM, Amazon, Cloudera, etc.
 It speeds up the testing process up to 1,000x and provides up to 100% data coverage
 It integrates an out-of-the-box DevOps solution for most Build, ETL & QA management software
 Delivers shareable, automated email reports and data health dashboards

2) Informatica Data Validation:

Informatica Data Validation is a popular ETL tool. It integrates with the PowerCenter Repository
and Integration Services. It enables developers and business analysts to create rules to test the
mapped data.

Features:

 Informatica Data Validation provides complete solution for data validation along with data
integrity
 Reduces programming efforts because of intuitive user interface and built-in operators
 Identifies and prevents data issues and provides greater business productivity
 It has Wizards to create test Query without the user's need to write SQL
 This tool also offers design Library and reusable Query Snippets
 It can analyze millions of rows and columns of data in minutes
 It helps to compare data from source files and data stores to the target Data Warehouse
 It can produce informative reports, updates, and auto-email results

Download link: https://www.informatica.com/etl-testing.html

3) QualiDI:

QualiDI automates every aspect of the testing lifecycle. This ETL tool enables clients to reduce
costs, achieve higher ROIs and accelerate time to market.

Features:

 Finding bad and non-compliant data
 Data integration testing
 Testing across platforms
 Managing test cycles through dashboards and reports
 Meaningful auto test data generation using constraints and referential integrity
 Automated test case generation for direct mappings
 Central test case repository allows test schedules for regression testing
 Test execution maintained in batches for regression and retesting
 Test execution results in dashboards and reports available at a click
 Built-in defect tracking and monitoring, interfacing with a third-party defect tracking tool

Download link: http://www.bitwiseglobal.com/data-tools/qualidi/

4) ICEDQ:

ICEDQ is an ETL testing platform. It is built to automate Data Migration Testing and Production
Data Monitoring. It helps users to identify all types of data issues generated during ETL
processes. It provides a complete automated solution to audit, validate and reconcile data.

Features:

 ICEDQ reads data from any database or file
 It matches data in memory based on unique columns
 It helps in transformation or business expressions
 It identifies mismatching data based on comparison & expression evaluation
 Can check up to 10000 rows to identify issues
 In-Memory Rules Engine
 It allows advanced Scripting
 User & Connection Security
 Jenkins Integration (Build Integration Tools)
 HP ALM Integration
 Web Services & Command Line Interface

Download link: https://icedq.com/download-icedq-trial

5) ETL Validator:

Datagaps ETL Validator is a Data warehouse testing tool. It simplifies the testing of Data
Integration, Data Warehouse, and Data Migration projects. It has an inbuilt ETL engine capable
of comparing millions of records.

Features:

 Define rules for automatically validating data in every column in the incoming file
 Compare profile of target and source data
 Simplifies comparison of database Schema across environments
 Capability to assemble and schedule test plan
 Baseline and compare data to find differences
 Analyzes data across multiple systems
 It allows web-based reporting
 REST API and continuous integration features.
 It offers Data Quality and Data Integration Testing
 Wizard Based Test Creation
 Enterprise Collaboration
 Container based security
 It provides scheduling Capabilities to the users
 It provides benchmarking Capabilities
 Reduce costs associated with testing data projects

Download link: http://www.datagaps.com/etl-testing-tools/etl-validator-download

24 Best Business Intelligence (BI) Tools List in 2018

Business Intelligence tools help organizations to improve their decision making & social
collaboration. They provide the means for efficient reporting, thorough analysis of data, statistics &
analytics.

Here are the top BI tools with their popular features and download links:

1) Yellowfin BI:

Yellowfin is a business intelligence platform. It is a single integrated solution developed for
companies across varying industries. It also makes it easy to assess, monitor and understand data.

Features:

 Access dashboards from anywhere: web page, company intranet, wiki, or mobile device
 Mapping mobile BI like features helps user to access and monitor business-related data
 It allows faster, smarter collective decision-making.
 User's insights can be made effective through data-rich presentations and interactive reports
 This BI tool also supports business decision-making process

Download link: https://www.yellowfinbi.com/

2) Clear Analytics:

Clear Analytics is an accurate, timely and clear business insights system. This business
intelligence tool helps to fulfill business needs. This BI tool provides easy extraction of large
data from reliable sources and presents it in the form of professional reports.

Features:

 It provides software solutions that require less human resources
 Dashboard creation
 Graphical data presentation
 Key Performance Indicators
 Easy Indication of issues
 Helps to create strategic planning
 It offers predictive analysis

Download link: http://www.clearanalyticsbi.com/

3) SAP BUSINESS INTELLIGENCE:

SAP BI is integrated business intelligence software. It is an enterprise-level application for
open client/server systems. It has set new standards for providing the best business information
management solutions.

Features:

 It provides highly flexible and most transparent business solutions
 The application developed using SAP can integrate with any system
 It follows modular concept for the easy setup and space utilization
 Allows to create next-generation database system that combines analytics and transactions
 Provide support for On-premise or cloud deployment
 Simplified data warehouse architecture
 Easy Integration with SAP and non-SAP applications

Download Link: https://support.sap.com/en/my-support/software-downloads.html

4) SISENSE:

Sisense is a business intelligence tool. It instantly analyzes and visualizes both big and disparate
datasets. It is an ideal tool for creating dashboards with a wide variety of visualizations.

Features:

 Allows building interactive dashboards with no tech skills
 Create a single version of truth with seamless data
 Query big data at very high speed
 Unify unrelated data into one centralized place
 Easy drag-and-drop user interface
 Allows access to dashboards even on mobile devices
 Eye-grabbing visualization
 Ad-hoc analysis of high-volume data
 Exports data to Excel, CSV, PDF, Images and other formats
 Enables delivery of interactive terabyte-scale analytics
 Identifies critical metrics using filtering and calculations
 Handles large-scale data on a single commodity server

Download Link: https://www.sisense.com/get/watch-demo/

5) MicroStrategy:

MicroStrategy is an enterprise analytics software. It empowers people to make better decisions
and transform the way they do business. It offers most advanced and predictive analytics.

Features:

 Advanced and predictive analytics
 Business intelligence
 Easy to use and maintain
 High-performance business intelligence
 Self-service analytics
 Big data solutions
 Software as a service (SaaS)
 Real-time WYSIWYG report design
 Scorecards and dashboards
 Enterprise reporting

Download link: https://www.microstrategy.com/us

6) BOARD:

Board is a Management Intelligence Toolkit. It combines features of business intelligence and
corporate performance management. It is designed to deliver business intelligence and business
analytics in a single package.

Features:

 Analyse, simulate, plan and predict using a single platform
 Build customized analytical and planning applications
 Board All-In-One combines BI, Corporate Performance Management, and Business analytics
 It empowers businesses to develop and maintain sophisticated analytical and planning
applications
 Proprietary platform helps to report by accessing multiple data sources

Download link: http://www.board.com/en

7) Pentaho:

Pentaho is a Data Warehousing and Business Analytics Platform. The tool empowers business
users to access, discover and merge all types and sizes of data.

Features:

 Enterprise platform to accelerate the data pipeline
 Community Dashboard Editor allows fast and efficient development and deployment
 Big data integration without a need for coding
 Simplified embedded analytics
 Visualize data with custom dashboards
 Operational reporting for MongoDB

Download now: http://www.pentaho.com/testdrive

8) Jaspersoft:

Jaspersoft is an open source BI tool. It empowers people around the world every day to make
better decisions. It provides flexible, cost-effective, and widely-deployed business intelligence
solutions. It enables better decision making through highly interactive Web-based reports,
dashboards, and analysis.

Features:

 It offers reporting, data visualization, and data integration
 It can be integrated into any mobile app so that users can access data from anywhere
 It supports the decision-making process through key performance indicators and
problem indicators
 Available as SaaS, on-premise and cloud platform

Download link: https://www.jaspersoft.com/

9) QlikView:

Qlik allows creating visualizations, dashboards, and apps. It also allows seeing the entire story
that lives within data.

Features:

 Simple drag-and-drop interfaces to create flexible, interactive data visualizations
 Use natural search to navigate complex information
 Instantly respond to interactions and changes
 Supports multiple data sources and file types
 It allows easy security for data and content across all devices
 It shares relevant analyses, including apps and stories using centralized hub

Download link: http://www.qlik.com

10) BIRT:

BIRT is an open source Business Intelligence and reporting tool. It consists of a visual report
designer and a runtime component for the Java environment.

Features:

 All your data in a single view
 Analyze billions of records in seconds
 No complex data modeling
 Clean and enrich your data
 Best practices analytical techniques
 No need to write code
 Easy to use visuals
 User autonomy and self-sufficiency

Download link: http://www.eclipse.org/birt/

11) IBM Cognos Analytics

IBM Cognos Analytics is an interactive business intelligence tool. It allows sharing data-driven
insights in a governed environment. It creates compelling reports and dashboards.

Features:

 It is a web-based, proprietary, integrated BI suite developed by IBM
 Helps organizations to get insights effectively and provides a toolset for reporting and analyzing data
 Allows users to create their own dashboards and access information from anywhere
 It offers cloud support and complete governance of data to generate online and offline reports
 Accurate and safe reporting
 Cross-department predictive analyses
 Intent-based process modeling

Download link: http://www-03.ibm.com/software/products/en

12) Dundas BI:

Dundas is an enterprise-ready Business Intelligence platform. It is used for building and viewing
interactive dashboards, reports, scorecards and more. It is possible to deploy Dundas BI as the
central data portal for the organization or integrate it into an existing website as a custom BI
solution.

Features:

 Easy access through web browser
 Allows users to work with sample or Excel data
 Server application with full product functionality
 Integrate and access all kinds of data sources
 Ad hoc reporting tools
 Customizable data visualizations
 Smart drag and drop tools
 Visualize data through maps
 Predictive and advanced data analytics

Download link: http://www.dundas.com/support/dundas-bi-free-trial

13) Style Intelligence:

Style Intelligence is a data intelligence platform. It is powerful data mashup software that allows
fast and flexible transformation of data from disparate sources.

Features:

 It allows accessing structured databases and semi-structured sources, on-premise applications
 Helps to create streamlined apps for data consumption and updating
 Offer customized and secure levels of data exploration and reporting to cloud application users
 Scale up for large data sets of users using Inbuilt Spark platform
 Generate paginated reports with embedded business logic and parameterization

Download link: https://www.inetsoft.com/products/StyleIntelligence/

14) Birst:

Birst is a web-based networked BI and analytics solution. It connects insights from various teams
and helps in making informed decisions. It allows decentralized users to augment the enterprise
data model. It also offers a unified semantic layer to maintain definitions and key metrics.

Features:

 Enable Data as a Service
 Everyone is Cloud-Connected
 Helps end users to access and blend their data with IT-owned data
 Rapidly refine enterprise data
 Create trusted, governed user data
 Create corporate wide metrics
 Create top-down Virtual BI instances
 Blend corporate and local data
 It supports individual agility, transparently governed working with trusted corporate and
departmental data

Download link: https://www.birst.com/

15) Netlink:

The Netlink Analytics Platform is a leading edge advanced analytics and cloud-based platform. It
can be used as Software-as-a-Service or Platform-as-a-Service. It is available with Private or
Public Cloud options. It offers best-in-class data visualizations to provide a single point of
enterprise truth.

Features:

 Quick customization and visualization of dashboards across all major mobile devices
 Powerful and quick ecosystem connectors
 Analytics feedback to existing enterprise systems
 Embedded domain and statistical knowledge via SMEs
 Embedded Big Data and end user collaboration
 Agility and unlimited scalability
 Data scientists mine and deliver the insights that are not visible in operational reports and
dashboards

Download link: http://www.netlink.com/solutions/business-analytics-platform/

16) ClicData:

ClicData is a business intelligence dashboard solution. It is designed for use primarily by small
and midsized businesses. The tool enables end users to create reports and dashboards.

Features:

 Keep complete track of your Business health
 It allows sharing data, insights, metrics, and reports with the groups and individuals
 Import, connect and standardize data into a single, powerful, cloud-based data warehouse
 This BI tool reports progress and performance of your projects
 Live Alerts on Dashboards
 Extensive Data Manipulation

Download link: https://www.clicdata.com

17) Profitbase BI:

Profitbase is a business intelligence solution that delivers critical business information. It allows
companies to monitor and manage their business performance. It is appropriate for many
commercial markets, including manufacturing and retail.

Features:

 It helps make faster decisions based on continuously updated and accurate data
 It provides visibility into KPIs in finance, sales, AR/AP, as well as performance measures
 It is modular, scalable, and consists of a data warehouse augmented with OLAP cubes
 The BI software allows adding new business systems through acquisition or system upgrades
 It is a module based BI tool so that customers can select the analytic tools best suited for their
requirements

Download link: http://www.profitbase.no/?lang=en

18) Exago:

Exago is a full-featured BI solution. It offers reports in any combination to accommodate users of
different levels. It allows users to build and format basic tabular reports.

Features:

 Excel-like design to offer advanced functionality like Charts, Formula Editor, and Conditional
Formatting
 It simplifies the process of creating complex tables with summary results
 It offers a wide variety of animated visualizations to choose from and a Chart Wizard
 Its Map wizard makes it easy for users to visualize their data on geographical maps
 Allow users to link together an unlimited number of charts and tabular reports
 It automates the process of merging data to highly formatted PDF, RTF and Excel templates
 Reports can be scheduled for automatic emailing
 It offers browser-based support for iPad, iPhone and Android devices

Download link: http://www.exagoinc.com/

19) Halo:

Halo is a unique self-service business intelligence tool. This business intelligence platform helps
in business planning for supply chain management. It incorporates and combines automated data
transformation processes with assisted manual data manipulation.

Features:

 A Clean, Intuitive Interface which is helpful for Self-Service Dashboard Customization
 Ability to share, collaborate and take action all while archiving your discussions
 Sales and operations planning
 Demand Planning
 Inventory Planning
 Supplier Management
 Customer Management
 Visualization, Reporting, and Analytics

Download link: https://halobi.com/

20) Rapid Insight Analytics:

Rapid Insight is BI software that allows users to build predictive models. It offers automated
modeling to identify relationships within complex sets of data.

Features:

 It makes complex data integration and predictive modeling easy
 It delivers automated Predictive Modeling to identify relationships within complex sets of data
 It allows users to create repeatable processes based on reports and analyses

Download link: http://www.rapidinsightinc.com/

21) Alteryx:

Alteryx is a business intelligence and analytics solution for enterprise and SMB companies. It is
a desktop-to-cloud Agile BI and analytics solution. It is designed for data artisans and business
leaders.

Features:

 Analytics for Midsize Businesses
 Connects business analysts and decision makers regardless of size, format, or physical location
 It offers big data and customer analytics
 It allows Ad Hoc Analysis
 Online Analytical Processing
 Automatic Scheduled Reporting
 Highly customizable Dashboard

Download link: https://www.alteryx.com/

22) LongView:

LongView Enterprise is a business intelligence reporting and analytics platform. It allows rapid
creation of custom applications like reports, dashboards, etc.

Features:

 Delivers actionable, contextual knowledge to decision-makers at every level
 It analyzes information from multiple data sources such as ERP, OLAP, relational databases, and
web services
 Single-sign-on if integrated with Windows or LDAP
 It is available on all web servers
 It allows exporting data and reports to Excel, PowerPoint, and PDF
 It allows users to share ad-hoc reports with other users
 Live data pulled from server and automatically refreshed in real time
 Automatic alerts based on thresholds
 Display data in animation and motion charts

Download link: http://www.longview.com/

23) Splunk:

Splunk is a tool to make machine data accessible, usable, and valuable to everyone. It delivers
operational intelligence to DevOps teams. It also helps companies to be more productive,
competitive, and secure.

Key Features:

 Data-driven analytics with actionable insights
 Next-generation monitoring and analytics solution
 Delivers a single, unified view of different IT services
 Extend the Splunk platform with purpose-built solutions for security

Download link: https://www.splunk.com/

24) Oracle BI Standard Edition One

Oracle BI is an application that offers an integrated, end-to-end Enterprise Performance
Management System. It is an integrated query, reporting, analysis, alerting, data integration and
management tool.

Features:

 Simplify analytics strategy by standardizing on one integrated platform
 Centralize data models and metrics to offer comprehensive representation of the business
 Helps business leaders to securely access and explore data
 Connect directly to more Oracle sources and Big Data for broader, richer analysis
 Empower key decision-makers to find answers to predictive and statistical questions quickly
 It allows business analysts to create mash-up data sets by running R scripts in batch mode
 It offers hundreds of pre-built functions to speed analysis with R scripts
 Allows creating a story around any business with visually stunning analytics.
 Get unique insights by creating rich data mash-ups.
 View, analyze, and modify data in the cloud or on-premises
 Create mobile analytical apps without writing a single line of code
 Build apps once and distribute them anywhere; apps are responsive to any device and any screen size.
 Offer In-memory enhancements
 Improve performance for mash-up data
 Faster query performance
 Load your data and analyze it from any angle to uncover problems and new opportunities
 Blend all types of local and corporate data

Download link: https://www.oracle.com/index.html

20 Best ETL / Data Warehousing Tools in 2018

With many Data Warehousing tools available in the market, it becomes difficult to select the
top tool for your project. Following is a curated list of the most popular open source/commercial
ETL tools with key features and download links.

1) QuerySurge

QuerySurge is an ETL testing solution developed by RTTS. It is built specifically to automate the
testing of Data Warehouses & Big Data. It ensures that the data extracted from data sources
remains intact in the target systems as well.

Features:

 Improve data quality & data governance
 Accelerate your data delivery cycles
 Helps to automate manual testing effort
 Provides testing across different platforms such as Oracle, Teradata, IBM, Amazon, Cloudera, etc.
 It speeds up the testing process by up to 1,000x and provides up to 100% data coverage
 It integrates an out-of-the-box DevOps solution for most Build, ETL & QA management software
 Deliver shareable, automated email reports and data health dashboards
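
The core check described above, that data extracted from a source remains intact in the target, can
be sketched in plain Python. This is not QuerySurge's API. It is a minimal illustration using the
standard-library sqlite3 module, and the customers table, its columns, and the helper names
make_db, table_rows and compare_tables are assumptions made up for the example.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database holding a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


def table_rows(conn, table, key, columns):
    """Return a dict mapping each key value to a tuple of the other columns."""
    cols = ", ".join([key] + columns)
    rows = conn.execute(f"SELECT {cols} FROM {table}").fetchall()
    return {row[0]: row[1:] for row in rows}


def compare_tables(source, target, table, key, columns):
    """Compare a table in the source and target databases row by row."""
    src = table_rows(source, table, key, columns)
    tgt = table_rows(target, table, key, columns)
    shared = set(src) & set(tgt)
    return {
        "source_count": len(src),
        "target_count": len(tgt),
        "missing_in_target": sorted(set(src) - set(tgt)),
        "unexpected_in_target": sorted(set(tgt) - set(src)),
        "mismatched": sorted(k for k in shared if src[k] != tgt[k]),
    }


# Source has three rows; the target lost row 3 and truncated a value in row 2.
source = make_db([(1, "Ann", "NY"), (2, "Bob", "LA"), (3, "Cy", "SF")])
target = make_db([(1, "Ann", "NY"), (2, "Bob", "L")])

report = compare_tables(source, target, "customers", "id", ["name", "city"])
print(report)
```

A report with empty missing_in_target, unexpected_in_target and mismatched lists would indicate that
the load preserved every row for the compared columns.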

2) MarkLogic:

MarkLogic is a data warehousing solution that makes data integration easier and faster using an
array of enterprise features. This tool helps to perform very complex search operations. It can
query data including documents, relationships, and metadata.

Features:

 The Optic API can perform joins and aggregates over documents, triples, and rows.
 It allows specifying more complex security rules for all the elements within documents
 Writing, reading, patching, and deleting documents in JSON, XML, text, or binary formats
 Database Replication for Disaster Recovery
 Specify Output Options on the App Server Configuration
 Importing and Exporting Configuration Information

Download Link: https://developer.marklogic.com/products

3) Oracle:

Oracle data warehouse software is a collection of data which is treated as a unit. The purpose of
this database is to store and retrieve related information. It helps the server to reliably manage
huge amounts of data so that multiple users can access the same data.

Features:

 Distributes data in the same way across disks to offer uniform performance
 Works for single-instance and real application clusters
 Offers real application testing
 Common architecture between any Private Cloud and Oracle's public cloud
 High-speed connection to move large data
 Works seamlessly with UNIX/Linux and Windows platforms
 It provides support for virtualization
 Allows connecting to the remote database, table, or view

Download Link: https://www.oracle.com/downloads/index.html

4) Amazon Redshift:

Amazon Redshift is an easy to manage, simple, and cost-effective data warehouse tool. It can
analyze almost every type of data using standard SQL.

Features:

 No Up-Front Costs for its installation
 It allows automating most of the common administrative tasks to monitor, manage, and scale
your data warehouse
 Possible to change the number or type of nodes
 Helps to enhance the reliability of the data warehouse cluster
 Every data center is fully equipped with climate control
 Continuously monitors the health of the cluster. It automatically re-replicates data from failed
drives and replaces nodes when needed

Download Link: https://aws.amazon.com/redshift/

5) Domo:

Domo is a cloud-based Data warehouse management tool that easily integrates various types of
data sources, including spreadsheets, databases, social media and almost all cloud-based or on-
premise Data warehouse solutions.

Features:

 Helps you to build your dream dashboard
 Stay connected anywhere you go
 Integrates all existing business data
 Helps you to get true insights into your business data
 Connects all of your existing business data
 Easy Communication & messaging platform
 It provides support for ad-hoc queries using SQL
 It can handle many concurrent users running multiple complex queries

Download Link: https://www.domo.com/product

6) Teradata Corporation:

The Teradata Database is a commercially available shared-nothing, Massively Parallel
Processing (MPP) data warehousing tool. It is one of the best data warehousing tools for viewing
and managing large amounts of data.

Features:

 Simple and cost-effective solutions
 The tool is a suitable option for organizations of any size
 Quick and most insightful analytics
 Get the same Database on multiple deployment options
 It allows multiple concurrent users to ask complex questions related to data
 It is entirely built on a parallel architecture
 Offers High performance, diverse queries, and sophisticated workload management

Download Link: https://downloads.teradata.com/

7) SAP:

SAP is an integrated data management platform that maps all business processes of an
organization. It is an enterprise level application suite for open client/server systems. It has set
new standards for providing the best business information management solutions.

Features:

 It provides highly flexible and most transparent business solutions
 The application developed using SAP can integrate with any system
 It follows a modular concept for easy setup and space utilization
 You can create a Database system that combines analytics and transactions. These
next-generation databases can be deployed on any device
 Provide support for On-premise or cloud deployment
 Simplified data warehouse architecture
 Integration with SAP and non-SAP applications

Download Link: https://support.sap.com/en/my-support/software-downloads.html

8) SAS:

SAS is a leading data warehousing tool that allows accessing data across multiple sources. It can
perform sophisticated analyses and deliver information across the organization.

Features:

 Activities are managed from central locations, so users can access applications remotely via the
Internet
 Application delivery is typically closer to a one-to-many model than a one-to-one model
 Centralized feature updating allows users to download patches and upgrades
 Allows viewing raw data files in external databases
 Manage data using tools for data entry, formatting, and conversion
 Display data using reports and statistical graphics

Download Link: https://www.sas.com/en_in/home.html

9) IBM – DataStage:

IBM DataStage is a business intelligence tool for integrating trusted data across various
enterprise systems. It leverages a high-performance parallel framework either in the cloud or on-
premise. This data warehousing tool supports extended metadata management and universal
business connectivity.

Features:

 Support for Big Data and Hadoop
 Additional storage or services can be accessed without the need to install new software and
hardware
 Real time data integration
 Provide trusted ETL data anytime, anywhere
 Solve complex big data challenges
 Optimize hardware utilization and prioritize mission-critical tasks
 Deploy on-premises or in the cloud

Download Link: http://www-01.ibm.com/support/docview.wss?uid=swg24037518

10) Informatica:

Informatica PowerCenter is a data integration tool developed by Informatica Corporation. The
tool offers the capability to connect to and fetch data from different sources.

Features:

 It has a centralized error logging system which facilitates logging errors and rejecting data into
relational tables
 Built-in intelligence to improve performance
 Limit the Session Log
 Ability to Scale up Data Integration
 Foundation for Data Architecture Modernization
 Better designs with enforced best practices on code development
 Code integration with external Software Configuration tools
 Synchronization amongst geographically distributed team members
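
The idea of logging errors and rejecting bad data into relational tables can be illustrated with a
small, generic sketch. It does not use or imitate Informatica's own interfaces. It relies only on
Python's standard-library sqlite3 module, and the orders and load_errors tables, their columns, and
the function name load_with_rejects are hypothetical names chosen for the example.

```python
import sqlite3


def load_with_rejects(conn, rows):
    """Load valid rows into orders; write rejected rows and reasons to load_errors."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS load_errors (raw_row TEXT, reason TEXT)")
    loaded = rejected = 0
    for row in rows:
        try:
            # Type conversion fails for malformed values; the insert fails for duplicate keys.
            record = (int(row["order_id"]), float(row["amount"]))
            conn.execute("INSERT INTO orders VALUES (?, ?)", record)
            loaded += 1
        except (KeyError, TypeError, ValueError, sqlite3.IntegrityError) as exc:
            reason = f"{type(exc).__name__}: {exc}"
            conn.execute("INSERT INTO load_errors VALUES (?, ?)", (repr(row), reason))
            rejected += 1
    conn.commit()
    return loaded, rejected


incoming = [
    {"order_id": "1", "amount": "9.50"},
    {"order_id": "2", "amount": "n/a"},   # not a number: rejected
    {"order_id": "1", "amount": "3.00"},  # duplicate key: rejected
    {"order_id": "3", "amount": "12"},
]

conn = sqlite3.connect(":memory:")
loaded, rejected = load_with_rejects(conn, incoming)
error_count = conn.execute("SELECT COUNT(*) FROM load_errors").fetchone()[0]
print(loaded, rejected, error_count)
```

Keeping rejected rows in a table rather than a flat log file means they can be queried, counted and
reprocessed with ordinary SQL.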

Download link: https://informatica.com/

11) MS SSIS:

SQL Server Integration Services is a data warehousing tool that is used to perform ETL
operations, i.e. extract, transform and load data. SQL Server Integration Services also includes a
rich set of built-in tasks.

Features:

 Tightly integrated with Microsoft Visual Studio and SQL Server
 Easier maintenance and package configuration
 Allows removing network as a bottleneck for insertion of data
 Data can be loaded in parallel to many varied destinations
 It can handle data from different data sources in the same package
 SSIS can consume data from difficult sources such as FTP, HTTP, MSMQ, and Analysis Services
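
For readers new to the extract, transform and load steps mentioned above, the flow can be sketched
in a few lines of Python. This is a generic illustration, not an SSIS package. The sample CSV text,
the users table, and the function names extract, transform and load are all assumptions invented
for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent spacing and casing in the name column.
RAW_CSV = """id,name,signup_date
1, alice ,2018-01-05
2,BOB,2018-02-17
"""


def extract(text):
    """Extract: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast ids to integers and normalize the names."""
    return [(int(r["id"]), r["name"].strip().title(), r["signup_date"]) for r in rows]


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded_users = conn.execute("SELECT id, name, signup_date FROM users ORDER BY id").fetchall()
print(loaded_users)
```

Keeping the three steps as separate functions makes each one easy to test on its own, which is the
same separation an ETL tester relies on when validating a pipeline stage by stage.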

Download link: https://www.microsoft.com/en-us/download/details.aspx?id=39931

12) Talend Open Studio:

Open Studio is an open source data warehousing tool developed by Talend. It is designed to
convert, combine and update data in various locations. This tool provides an intuitive set of tools
which make dealing with data a lot easier. It also allows big data integration, data quality, and
master data management.

Features:

 It supports extensive data integration transformations and complex process workflows
 Offers seamless connectivity for more than 900 different databases, files, and applications
 It can manage the design, creation, testing, deployment, etc. of integration processes
 Synchronize metadata across database platforms
 Managing and monitoring tools to deploy and supervise the jobs

Download Link: https://www.talend.com/download/

13) The Ab Initio software:

Ab Initio is a data analysis, batch processing, and GUI-based parallel processing data
warehousing tool. It is commonly used to extract, transform and load data.

Features:

 Metadata management
 Business and process metadata management
 Ability to run and debug Ab Initio jobs and trace execution logs
 Manage and run graphs and control the ETL processes
 Components can execute simultaneously on various branches of a graph
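
The last feature, components running simultaneously on different branches of a graph, is a general
parallel-processing idea that can be sketched outside Ab Initio. The example below is not Ab Initio
code. It uses Python's standard-library concurrent.futures module, and the two branch functions
clean_names and total_amount, along with the sample rows, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def clean_names(rows):
    """Branch 1: normalize the name field of every row."""
    return [r["name"].strip().lower() for r in rows]


def total_amount(rows):
    """Branch 2: aggregate the amount field across all rows."""
    return sum(r["amount"] for r in rows)


rows = [{"name": " Ann ", "amount": 10}, {"name": "BOB", "amount": 5}]

# Both branches read the same input and are submitted together, so they may run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    names_future = pool.submit(clean_names, rows)
    total_future = pool.submit(total_amount, rows)
    result = {"names": names_future.result(), "total": total_future.result()}

print(result)
```

The branches are independent because neither modifies the shared input, which is the property that
makes it safe to run them at the same time.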

Download Link: https://www.abinitio.com/en/

14) Dundas:

Dundas is an enterprise-ready Business Intelligence platform. It is used for building and viewing
interactive dashboards, reports, scorecards and more. It is possible to deploy Dundas BI as the
central data portal for the organization or integrate it into an existing website as a custom BI
solution.

Features:

 Data warehousing tool for Business Users and IT Professionals
 Easy access through web browser
 Allows using sample or Excel data
 Server application with full product functionality
 Integrate and access all kinds of data sources
 Ad hoc reporting tools
 Customizable data visualizations
 Smart drag and drop tools
 Visualize data through maps
 Predictive and advanced data analytics

Download link: http://www.dundas.com/support/dundas-bi-free-trial

15) Sisense:

Sisense is a business intelligence tool which analyses and visualizes both big and disparate
datasets, in real-time. It is an ideal tool for preparing complex data for creating dashboards with a
wide variety of visualizations.

Features:

 Unify unrelated data into one centralized place
 Create a single version of truth with seamless data
 Allows building interactive dashboards with no tech skills
 Query big data at very high speed
 Possible to access dashboards even on mobile devices
 Drag-and-drop user interface
 Eye-grabbing visualization
 Enables delivery of interactive terabyte-scale analytics
 Exports data to Excel, CSV, PDF, images and other formats
 Ad-hoc analysis of high-volume data
 Handles data at scale on a single commodity server
 Identifies critical metrics using filtering and calculations

Download Link: https://www.sisense.com/get/watch-demo/

16) Tableau:

Tableau Server is an online data warehousing tool with three versions: Desktop, Server, and Online.
It is a secure, shareable and mobile-friendly data warehouse solution.

Features:

 Connect to any data source securely on-premise or in the cloud
 Ideal tool for flexible deployment
 Big data, live or in-memory
 Designed for mobile-first approach
 Securely share and collaborate on data
 Centrally manage metadata and security rules
 Powerful management and monitoring
 Connect to any data anywhere
 Get maximum value from your data with this business analytics platform
 Share and collaborate in the cloud
 Tableau seamlessly integrates with existing security protocols

Download Link: https://public.tableau.com/en-us/s/download

17) MicroStrategy:

MicroStrategy is an enterprise business intelligence application software. This platform supports
interactive dashboards, scorecards, highly formatted reports, ad hoc query and automated report
distribution.

Features:

 Unmatched speed, performance, and scalability
 Maximize the value of investment made by enterprises
 Eliminating the need to rely on multiple tools
 Support for advanced analytics and big data
 Get insight into complex business processes for strengthening organizational security
 Powerful security and administration features

Download link: https://www.microstrategy.com/us/get-started

18) Pentaho

Pentaho is a Data Warehousing and Business Analytics Platform. The tool has a simplified and
interactive approach which empowers business users to access, discover and merge all types and
sizes of data.

Features:

 Enterprise platform to accelerate the data pipeline
 Community Dashboard Editor allows fast and efficient development and deployment
 Big data integration without a need for coding
 Simplified embedded analytics
 Visualize data with custom dashboards
 Ease of use with the power to integrate all data
 Operational reporting for MongoDB

Download now: http://www.pentaho.com/testdrive

19) BigQuery:

Google's BigQuery is an enterprise-level data warehousing tool. It reduces the time for storing
and querying massive datasets by enabling super-fast SQL queries. It also lets you control access
to both the project and the data, including who can view or query the data.

Features:

 Offers flexible data ingestion
 Read and write data via Cloud Dataflow, Hadoop, and Spark
 Automatic Data Transfer Service
 Full control over access to the data stored
 BigQuery provides cost control mechanisms

Download now: https://cloud.google.com/bigquery/

20) Numetric:

Numetric is a fast and easy BI tool. It offers business intelligence solutions from data
centralization and cleaning to analyzing and publishing. It is powerful enough for anyone to use.
This data warehousing tool helps to measure and improve productivity.

Features:

 Data benchmarking
 Budgeting & forecasting
 Data chart visualizations
 Data analysis
 Data mapping & dictionary
 Key performance indicators

Download Link: https://www.numetric.com/schedule-a-demo/

21) Solver BI360 Suite:

Solver BI360 is a comprehensive business intelligence tool. It gives 360º insights into any
data, using reporting, data warehousing, and interactive dashboards. BI360 drives effective, data-
based productivity.

Features:

 Excel-based reporting with predefined templates
 Currency conversion and inter-company transactions elimination can be automated
 User-friendly budgeting and forecasting feature
 It reduces the amount of time spent on the preparation of reports and planning
 Easy configuration with User-friendly interface
 Automated data loading
 Combine Financial and Operational Data
 Allows viewing data in Data Explorer
 Easily add modules and dimensions
 Unlimited Trees on any dimension
 Support for Microsoft SQL Server/SQL Azure

Download link: http://www.solverglobal.com/products/
