Data Warehouse
Definition:
A data warehouse is a comprehensive repository of
current and historical information that is designed to
support analysis and decision-making across an organization.
Key characteristics of a data warehouse:
• Centralized: Data is gathered from multiple sources and stored in a
single location.
• Integrated: Data goes through a cleaning and transformation process
to ensure consistency across different sources.
• Subject-oriented: The data is organized around specific business
subjects, such as customers, products, or sales.
• Historical: Data warehouses typically store historical data, allowing for
analysis of trends over time.
• Read-only: Data warehouses are primarily used for analysis, so the
data is usually read-only.
Characteristics of Data Warehouse:
Subject-Oriented:
• A data warehouse is subject-oriented because it organizes
information by topic rather than by the day-to-day
operational processes of a business.
• Such subjects may be sales, promotion, inventory, etc.
• For example, if you want to analyze your company’s
sales data, you need to build a data warehouse that
concentrates on sales.
• Such a warehouse would provide valuable information
like ‘who was your best customer last year?’ or ‘who is
likely to be your best customer in the coming year?’
Integrated:
• A data warehouse is developed by integrating data from varied
sources into a consistent format.
• The data must be stored in the warehouse in a consistent and
universally acceptable manner in terms of naming, format, and
coding.
• This facilitates effective data analysis.
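As a rough illustration of integration, the sketch below maps customer records from two sources into one consistent format. The two sources (a point-of-sale export and a web-shop export), their field names, and the gender and date conventions are all hypothetical, chosen only to show naming, format, and coding being unified.

```python
from datetime import datetime

# Two hypothetical source extracts with inconsistent naming, formats, and codes.
pos_rows = [{"cust_id": 1, "sex": "M", "joined": "03/15/2021"}]
web_rows = [{"customerId": 2, "gender": "female", "signup_date": "2022-07-01"}]

# Assumed warehouse convention: gender coded as "M"/"F", dates as ISO 8601.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_pos(row):
    """Map a point-of-sale record to the warehouse format."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "joined_on": datetime.strptime(row["joined"], "%m/%d/%Y").date().isoformat(),
    }


def from_web(row):
    """Map a web-shop record to the warehouse format."""
    return {
        "customer_id": row["customerId"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "joined_on": row["signup_date"],  # already ISO 8601 in this source
    }


integrated = [from_pos(r) for r in pos_rows] + [from_web(r) for r in web_rows]
for record in integrated:
    print(record)
```

After the mapping, both records share the same field names, the same gender codes, and the same date format, so they can be analyzed together.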
Non-Volatile:
• Data once entered into a data warehouse must remain
unchanged.
• All data is read-only.
• Previous data is not erased when current data is entered.
• This helps you to analyze what has happened and when.
Time-Variant:
• The data stored in a data warehouse is documented
with an element of time, either explicitly or implicitly.
• For example, time variance shows up in the primary key
of a warehouse table, which must include an element of
time such as the day, week, or month.
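The non-volatile and time-variant properties can be sketched together with Python's built-in sqlite3 module. The table name, columns, and prices below are made up for illustration; the point is that the primary key contains a date and that new loads append rows instead of overwriting old ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product_price_history (
        product_id    INTEGER NOT NULL,
        snapshot_date TEXT    NOT NULL,  -- time element (ISO 8601 day)
        price         REAL    NOT NULL,
        PRIMARY KEY (product_id, snapshot_date)
    )
    """
)

# Each load appends a new dated row; earlier rows are never updated or deleted.
conn.executemany(
    "INSERT INTO product_price_history VALUES (?, ?, ?)",
    [(101, "2024-01-01", 9.99), (101, "2024-02-01", 10.49), (101, "2024-03-01", 10.99)],
)

# Because history is preserved, we can ask what the price was and when.
history = conn.execute(
    "SELECT snapshot_date, price FROM product_price_history "
    "WHERE product_id = ? ORDER BY snapshot_date",
    (101,),
).fetchall()
print(history)
```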
Database vs. Data Warehouse
• Although a data warehouse and a traditional database share
some similarities, they are not the same thing.
• The main difference is that in a database, data is collected
to support day-to-day transactions.
• However, in a data warehouse, data is collected on an
extensive scale to perform analytics.
• Databases provide real-time data, while warehouses store
data to be accessed by large analytical queries.
• A data warehouse is an example of an OLAP (online
analytical processing) system, which answers queries over data.
• An OLTP (online transaction processing) system modifies data
online, for example, an ATM.
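The contrast between the two workloads can be sketched with sqlite3. For brevity both tables live in one in-memory database here, whereas in practice the transactional database and the warehouse are usually separate systems; the table names and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE withdrawals (account_id INTEGER, month TEXT, amount REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 500.0)")
conn.executemany(
    "INSERT INTO withdrawals VALUES (?, ?, ?)",
    [(1, "2024-01", 40.0), (1, "2024-01", 60.0), (1, "2024-02", 25.0)],
)

# OLTP-style: a short transaction that modifies one row (e.g. an ATM withdrawal).
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 50.0 WHERE account_id = 1")
balance = conn.execute(
    "SELECT balance FROM accounts WHERE account_id = 1"
).fetchone()[0]

# OLAP-style: a read-only query that aggregates many historical rows.
monthly_totals = conn.execute(
    "SELECT month, SUM(amount) FROM withdrawals GROUP BY month ORDER BY month"
).fetchall()

print(balance)
print(monthly_totals)
```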
Data Warehouse Architecture
• A data warehouse architecture uses dimensional
models to organize raw data into an
easy-to-understand structure from which meaningful
information can be extracted.
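One common form of dimensional model is a star schema: a central fact table of measurements surrounded by dimension tables that describe them. The sketch below is only an assumed example (the table names, columns, and data are invented) showing how such a model turns raw sales rows into an easy question: revenue per product category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- The fact table holds the numeric measurements, keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        units_sold  INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_product VALUES (1, 'Espresso', 'Coffee'), (2, 'Green tea', 'Tea');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 10, 30.0),
        (1, 20240201, 12, 36.0),
        (2, 20240101,  5, 10.0);
    """
)

revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```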
There are three main types of architecture for designing a
business-level, real-time data warehouse:
• Single-tier Architecture
• Two-tier Architecture
• Three-tier Architecture
Data Warehouse Architecture:
A data warehouse architecture commonly comprises a three-tier structure:
Bottom Tier:
• The bottom tier or data warehouse server usually represents a relational
database system. Back-end tools are used to cleanse, transform and feed data
into this layer.
Middle Tier:
• The middle tier represents an OLAP server that can be implemented in two ways.
• The ROLAP (relational OLAP) model is an extended relational database
management system that maps operations on multidimensional data to standard
relational operations.
• The MOLAP or multidimensional OLAP directly acts on multidimensional data
and operations.
Top Tier:
• This is the front-end client interface that gets data out from the data warehouse.
• It holds various tools like query tools, analysis tools, reporting tools, and data
mining tools.
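To make the ROLAP idea from the middle tier concrete, the sketch below translates a multidimensional "roll-up" request into an ordinary relational GROUP BY query. The `roll_up` helper, the `sales` table, and its columns are hypothetical; a real ROLAP server is far more elaborate, but the mapping follows the same principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Tea", 2023, 100.0),
        ("North", "Coffee", 2023, 150.0),
        ("South", "Tea", 2023, 80.0),
        ("South", "Tea", 2024, 120.0),
    ],
)

DIMENSIONS = ("region", "product", "year")  # columns allowed as cube dimensions


def roll_up(conn, dimensions):
    """Aggregate `amount` over the given dimensions via a relational GROUP BY."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    unknown = [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    # Column names come from the whitelist above, so interpolation is safe here.
    cols = ", ".join(dimensions)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


by_region = roll_up(conn, ["region"])
by_region_year = roll_up(conn, ["region", "year"])
print(by_region)
print(by_region_year)
```

Dropping a dimension from the list (for example going from region and year to region only) is the roll-up operation; adding one back is the corresponding drill-down.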
How Data Warehouse Works
• Data Warehousing integrates data and information collected
from various sources into one comprehensive database.
• For example, a data warehouse might combine customer
information from an organization’s point-of-sale systems, its
mailing lists, website, and comment cards.
• It might also incorporate confidential information about
employees, salary information, etc.
• Businesses use the combined data in the warehouse to
analyze their customers.
• Data mining, one of the main uses of a data warehouse,
involves looking for meaningful patterns in vast volumes
of data and using them to devise strategies for increased sales
and profits.
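As a toy illustration of pattern discovery, the sketch below counts which pairs of items are bought together most often. The basket data is invented, and real data mining uses far larger data sets and more sophisticated algorithms; this only shows the basic idea of finding a pattern that could inform a sales strategy.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets pulled from warehouse sales data.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]

# Count every unordered pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```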
Types of Data Warehouse:
There are three main types of data warehouse:
Enterprise Data Warehouse (EDW):
• This type of warehouse serves as a key or central database that
facilitates decision-support services throughout the enterprise.
• The advantage to this type of warehouse is that it provides access to
cross-organizational information, offers a unified approach to data
representation, and allows running complex queries.
Operational Data Store (ODS):
• This type of data warehouse is refreshed in real time.
• It is often preferred for routine activities like storing employee
records.
• It is required when data warehouse systems do not support the
reporting needs of the business.
Data Mart:
• A data mart is a subset of a data warehouse built to
serve a particular department, region, or business
unit.
• Every department of a business has a central
repository or data mart to store data.
• The data from the data mart is stored in the ODS
periodically.
• The ODS then sends the data to the EDW, where it is
stored and used.
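The "subset of a data warehouse" idea can be sketched as a view that exposes only one department's rows. This is an assumed, simplified setup (a single `warehouse_sales` table and a `marketing_mart` view, both hypothetical); real data marts are often separate physical stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "North", 120.0),
        ("marketing", "South", 90.0),
        ("finance", "North", 300.0),
    ],
)

# The data mart exposes only the slice that the marketing department needs.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```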
Data Warehouse Examples:
• Investment and insurance companies use data warehouses
primarily to analyze customer and market trends and related
data patterns.
• In sub-sectors like Forex and stock markets, data warehouses
play a significant role because a single point of difference can
result in huge losses across the board.
• Retail chains use data warehouses for marketing and
distribution, so they can track items, examine pricing policies
and analyze buying trends of customers.
• They use data warehouse models for business intelligence
and forecasting needs.
• Healthcare companies use data warehouse concepts
to generate treatment reports, to share data with
insurance companies, and to support research and
medical units.
• Healthcare systems depend heavily upon enterprise
data warehouses because they need the latest,
updated treatment information to save lives.
Data Warehousing Tools:
• Data warehouse tools are software components used to perform
several operations on an extensive data set.
• These tools help to collect, read, write and transfer data from various
sources.
• They are designed to support operations like data sorting, filtering,
merging, etc.
Data warehouse tools can be categorized as:
• Query and reporting tools
• Application Development tools
• Data mining tools
• OLAP tools
Some popular data warehouse tools are Xplenty, Amazon Redshift,
Teradata, Oracle 12c, Informatica, IBM InfoSphere, Cloudera, and
Panoply.
Functions of Data Warehouse Tools and Utilities
• The following are the functions of data warehouse tools and
utilities −
• Data Extraction − Involves gathering data from multiple
heterogeneous sources.
• Data Cleaning − Involves finding and correcting the errors in data.
• Data Transformation − Involves converting the data from legacy
format to warehouse format.
• Data Loading − Involves sorting, summarizing, consolidating,
checking integrity, and building indices and partitions.
• Refreshing − Involves updating from data sources to warehouse.
Note − Data cleaning and data transformation are important steps in
improving the quality of data and data mining results.
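The functions listed above can be sketched as a small pipeline. Everything here is an illustrative assumption: the legacy CSV layout, the DD/MM/YYYY date format, the rule of dropping rows with a missing amount, and the `orders` table. Real tools perform each step at much larger scale.

```python
import csv
import io
import sqlite3

# Hypothetical legacy extract: stray whitespace, a missing amount, DD/MM/YYYY dates.
LEGACY_CSV = """order_id,customer,amount,order_date
1, alice ,19.90,01/02/2024
2,BOB,,03/02/2024
3,carol,5.00,04/02/2024
"""


def extract(text):
    """Extraction: read rows from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: trim whitespace and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["amount"]:
            cleaned.append(row)
    return cleaned


def transform(rows):
    """Transformation: convert legacy text fields to the warehouse format."""
    out = []
    for row in rows:
        day, month, year = row["order_date"].split("/")
        out.append(
            (
                int(row["order_id"]),
                row["customer"].lower(),
                float(row["amount"]),
                f"{year}-{month}-{day}",  # ISO 8601 date
            )
        )
    return out


def load(conn, rows):
    """Loading: insert rows and build an index for later queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    # OR IGNORE skips rows whose order_id is already present, so a refresh
    # that re-reads the same source does not create duplicates.
    conn.executemany("INSERT OR IGNORE INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_orders_date ON orders (order_date)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract(LEGACY_CSV))))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Running the same `load` call again acts as a simple refresh: rows already in the warehouse are left alone and only new order IDs would be added.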
Benefits of Data Warehouse:
• Improved data consistency
• Better business decisions
• Easier access to enterprise data for end-users
• Better documentation of data
• Reduced computer costs and higher productivity
• Enabling end-users to run ad-hoc queries or reports without degrading
the performance of operational systems
• Collection of related data from various sources into one place
Companies with dedicated data warehouse teams can gain an edge over
others in key areas such as product development, pricing, marketing,
production time, historical analysis, forecasting, and customer
satisfaction. Though data warehouses can be expensive, they
pay off in the long run.