What Is ETL?
ETL gained popularity in the 1970s when organizations began using multiple data
repositories, or databases, to store different types of business information. The need to
integrate data that was spread across these databases grew quickly. ETL became the
standard method for taking data from disparate sources and transforming it before
loading it into a target system, or destination.
In the late 1980s and early 1990s, data warehouses came onto the scene. A distinct
type of database, data warehouses provided integrated access to data from multiple
systems – mainframe computers, minicomputers, personal computers and
spreadsheets. But different departments often chose different ETL tools to use with
different data warehouses.
ETL is an abbreviation of Extract, Transform and Load. In this process, an
ETL tool extracts data from different RDBMS source systems, transforms it
(for example, by applying calculations and concatenations), and then
loads it into the data warehouse system.
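The three steps can be sketched end to end in a few lines of Python. This is a minimal illustration, not how any particular ETL tool works: it assumes SQLite in-memory databases stand in for both the RDBMS source and the data warehouse, and the tables "orders" and "fact_orders" and their columns are hypothetical. The transform step performs one concatenation (first and last name) and one calculation (quantity times unit price), matching the examples in the definition.

```python
import sqlite3

# Hypothetical schema: source table "orders", warehouse table "fact_orders".


def extract(source: sqlite3.Connection) -> list[tuple]:
    """Extract: read raw rows from the source system."""
    return source.execute(
        "SELECT first_name, last_name, quantity, unit_price FROM orders"
    ).fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Transform: one concatenation (full name) and one calculation (line total)."""
    return [(f"{first} {last}", quantity * unit_price)
            for first, last, quantity, unit_price in rows]


def load(target: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the transformed rows into the warehouse table."""
    with target:  # commits on success, rolls back on an exception
        target.executemany(
            "INSERT INTO fact_orders (customer_name, line_total) VALUES (?, ?)",
            rows,
        )


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (first_name TEXT, last_name TEXT, "
                   "quantity INTEGER, unit_price REAL)")
    source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                       [("Ada", "Lovelace", 2, 9.5), ("Alan", "Turing", 1, 20.0)])

    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE fact_orders (customer_name TEXT, line_total REAL)")

    load(warehouse, transform(extract(source)))
    print(warehouse.execute(
        "SELECT * FROM fact_orders ORDER BY line_total").fetchall())
    # → [('Ada Lovelace', 19.0), ('Alan Turing', 20.0)]
```

Keeping the three steps as separate functions mirrors the E, T and L stages, so each one can be tested or replaced on its own.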
Step 1) Extraction
Data can be extracted from the source systems in three ways:
1. Full Extraction
2. Partial Extraction without update notification
3. Partial Extraction with update notification
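The three extraction methods differ mainly in how the job learns which rows to read. The sketch below is one possible interpretation, assuming a SQLite source with a hypothetical "customers" table that carries an "updated_at" timestamp column (used for partial extraction without notification, where the job itself queries for changes), and a hypothetical "change_log" table that the source fills with changed keys (standing in for the update notification).

```python
import sqlite3

# Hypothetical source schema:
#   customers(id, name, updated_at)  -- updated_at stored as ISO-8601 text
#   change_log(customer_id)          -- stands in for "update notification"


def full_extract(conn: sqlite3.Connection) -> list[tuple]:
    """Full extraction: pull every row on every run."""
    return conn.execute(
        "SELECT id, name, updated_at FROM customers ORDER BY id").fetchall()


def partial_extract_without_notification(conn: sqlite3.Connection,
                                         last_run: str) -> list[tuple]:
    """Partial extraction without notification: the job itself asks which
    rows changed since its previous run, using the timestamp column."""
    return conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY id", (last_run,)).fetchall()


def partial_extract_with_notification(conn: sqlite3.Connection) -> list[tuple]:
    """Partial extraction with notification: read only the rows whose keys
    the source logged, then clear exactly those log entries."""
    with conn:
        rows = conn.execute(
            "SELECT id, name, updated_at FROM customers "
            "WHERE id IN (SELECT customer_id FROM change_log) ORDER BY id"
        ).fetchall()
        conn.executemany("DELETE FROM change_log WHERE customer_id = ?",
                         [(row[0],) for row in rows])
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT);
        CREATE TABLE change_log (customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ada', '2024-01-05'), (2, 'Alan', '2024-03-10');
        INSERT INTO change_log VALUES (2);
    """)
    print(full_extract(conn))                                        # both rows
    print(partial_extract_without_notification(conn, "2024-02-01"))  # only id 2
    print(partial_extract_with_notification(conn))                   # only id 2
```

Full extraction is the simplest but reads the most data; the two partial methods trade that cost for extra bookkeeping on either the ETL side (the saved timestamp) or the source side (the change log).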
Irrespective of the method used, extraction should not affect the performance and
response time of the source systems. These source systems are live
production databases, and any slowdown or locking could affect the company's
bottom line.
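One common way to limit the footprint of an extraction job is to read results in fixed-size batches and optionally pause between them. The sketch below assumes a SQLite connection and uses only the standard DB-API cursor method "fetchmany"; the "pause_seconds" throttle and the "orders" table in the demo are illustrative assumptions. Batching keeps the job's memory bounded and lets it pace its reads, but it does not make the underlying query itself cheaper for the source database.

```python
import sqlite3
import time
from typing import Iterator


def extract_in_batches(conn: sqlite3.Connection, query: str,
                       batch_size: int = 1000,
                       pause_seconds: float = 0.0) -> Iterator[list[tuple]]:
    """Yield query results in fixed-size batches, optionally sleeping
    between batches to throttle how fast the job reads from the source."""
    cursor = conn.execute(query)
    try:
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            yield batch
            if pause_seconds:
                time.sleep(pause_seconds)
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(5)])
    for batch in extract_in_batches(conn, "SELECT id FROM orders ORDER BY id",
                                    batch_size=2):
        print(batch)
    # prints [(0,), (1,)], then [(2,), (3,)], then [(4,)]
```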
Step 2) Transformation
Data extracted from the source servers is raw and not usable in its original form.
Therefore, it needs to be cleansed, mapped and transformed. In fact, this is
the key step where the ETL process adds value and changes data so that
insightful BI reports can be generated.
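As a rough illustration of cleansing and mapping, the sketch below drops rows missing their business key, trims and normalizes text, casts a key to an integer, and translates source country codes through a lookup table. The field names and the "COUNTRY_MAP" lookup are hypothetical; real rules depend entirely on the source data and the warehouse model.

```python
# Hypothetical lookup table used to map source codes to warehouse values.
COUNTRY_MAP = {"US": "United States", "USA": "United States", "DE": "Germany"}


def cleanse_and_map(rows: list[dict]) -> list[dict]:
    """Cleanse raw rows and map source codes to standardized values."""
    cleaned = []
    for row in rows:
        # Cleanse: skip rows that lack the business key entirely.
        if row.get("customer_id") in (None, ""):
            continue
        # Cleanse: trim stray whitespace and normalize capitalization.
        name = (row.get("name") or "").strip().title()
        code = (row.get("country") or "").strip().upper()
        cleaned.append({
            # Transform: cast the key from text to an integer.
            "customer_id": int(row["customer_id"]),
            "name": name,
            # Map: translate the code; unknown codes are flagged, not guessed.
            "country": COUNTRY_MAP.get(code, "Unknown"),
        })
    return cleaned


if __name__ == "__main__":
    raw = [
        {"customer_id": "7", "name": "  ada lovelace ", "country": "us"},
        {"customer_id": None, "name": "orphan row", "country": "DE"},
        {"customer_id": "9", "name": "Alan", "country": "fr"},
    ]
    print(cleanse_and_map(raw))
    # → [{'customer_id': 7, 'name': 'Ada Lovelace', 'country': 'United States'},
    #    {'customer_id': 9, 'name': 'Alan', 'country': 'Unknown'}]
```

Flagging unmapped codes as "Unknown" instead of dropping the row keeps the data visible in reports while making the gap in the lookup table easy to find.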
In this step, you apply a set of functions to the extracted data. Data that does not
require any transformation is called direct move or pass-through data.
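The distinction between transformed and pass-through data is often captured in a column mapping. In this hypothetical sketch, each target column names its source column and an optional function; a function of "None" marks a direct-move column that is copied unchanged. The column names and the "COLUMN_MAP" structure are assumptions made for illustration.

```python
from typing import Any, Callable, Optional

# Hypothetical column mapping: target column -> (source column, transform).
# A transform of None marks a direct-move (pass-through) column.
COLUMN_MAP: dict[str, tuple[str, Optional[Callable[[Any], Any]]]] = {
    "order_id": ("id", None),                           # pass-through
    "order_date": ("created", None),                    # pass-through
    "amount_usd": ("amount_cents", lambda c: c / 100),  # calculation
    "status": ("status", str.upper),                    # standardization
}


def apply_mapping(row: dict) -> dict:
    """Build one target row: copy pass-through columns, transform the rest."""
    out = {}
    for target_col, (source_col, fn) in COLUMN_MAP.items():
        value = row[source_col]
        out[target_col] = value if fn is None else fn(value)
    return out


def pass_through_columns() -> list[str]:
    """List the target columns that are moved without transformation."""
    return [col for col, (_, fn) in COLUMN_MAP.items() if fn is None]


if __name__ == "__main__":
    source_row = {"id": 42, "created": "2024-05-01",
                  "amount_cents": 1250, "status": "shipped"}
    print(apply_mapping(source_row))
    # → {'order_id': 42, 'order_date': '2024-05-01', 'amount_usd': 12.5, 'status': 'SHIPPED'}
    print(pass_through_columns())  # → ['order_id', 'order_date']
```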
Step 3) Loading
Loading data into the target data warehouse database is the last step of the
ETL process. In a typical data warehouse, a huge volume of data needs to be
loaded in a relatively short period (often overnight). Hence, the load process
should be optimized for performance.
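A basic optimization is to insert rows in large batches and commit once per batch rather than once per row, since per-row commits add overhead for every record. The sketch below shows this with SQLite's "executemany" and a hypothetical "sales" table. Production warehouses usually offer dedicated bulk-load paths (for example, Amazon Redshift's COPY command), which are typically preferred over row inserts at scale.

```python
import sqlite3
from itertools import islice
from typing import Iterable


def bulk_load(conn: sqlite3.Connection, rows: Iterable[tuple],
              batch_size: int = 10_000) -> int:
    """Insert rows into the hypothetical sales table in batches, committing
    once per batch instead of once per row. Returns the number of rows loaded."""
    it = iter(rows)
    loaded = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        with conn:  # one transaction per batch; rolled back if it fails
            conn.executemany(
                "INSERT INTO sales (product, amount) VALUES (?, ?)", batch)
        loaded += len(batch)
    return loaded


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
    rows = ((f"product-{i}", i * 1.5) for i in range(25_000))
    print(bulk_load(conn, rows))  # → 25000
```

Because the function consumes an iterator, the rows can come straight from a generator, so the whole data set never has to sit in memory at once.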
Types of Loading:
Besides extraction, transformation, and loading, there are some other tasks that are
important for a successful ETL implementation, both in the daily operations of the data warehouse
and in supporting further enhancements. Besides supporting the design of the data warehouse and
its data flow, ETL tools such as Oracle Warehouse Builder (OWB) typically address these tasks.
Oracle is not an ETL tool and does not provide a complete solution for ETL. However, Oracle does
provide a rich set of capabilities that can be used by both ETL tools and customized ETL solutions.
Oracle offers techniques for transporting data between Oracle databases, for transforming large
volumes of data, and for quickly loading new data into a data warehouse.
ETL tools
Many data warehousing tools are available in the market. Here
are some of the most prominent ones:
1. MarkLogic:
http://developer.marklogic.com/products
2. Oracle:
https://www.oracle.com/index.html
3. Amazon RedShift:
https://aws.amazon.com/redshift/?nc2=h_m1