Data Warehouse
(R2021)
RIT CCS341 - DATA WAREHOUSING 1
UNIT-I
INTRODUCTION TO DATA WAREHOUSE
Data Warehouse Introduction - Data Warehouse Components - Operational Database Vs Data
Warehouse - Data Warehouse Architecture - Three-tier Data Warehouse Architecture -
Autonomous Data Warehouse - Autonomous Data Warehouse Vs Snowflake - Modern Data
Warehouse
Introduction
A data warehouse is built by combining data from multiple diverse sources to support
analytical reporting, structured and unstructured queries, and decision making for the organization.
Data warehousing is the step-by-step approach for constructing and using a data warehouse.
Many data scientists get their data in raw formats from various sources. However, for many data
scientists as well as business decision-makers, particularly in big enterprises, the main
sources of data and information are corporate data warehouses. A data warehouse holds data from
multiple sources, including internal databases and Software-as-a-Service (SaaS) platforms. After the
data is loaded, it is often cleansed, transformed, and checked for quality before it is used for
analytics reporting, data science, machine learning, or any other purpose.
1. Business User: Business users or customers need a data warehouse to look at summarized
data from the past. Since these people may come from a non-technical background, the
data should be presented to them in an uncomplicated way.
2. Maintains consistency: Data warehouses are designed to apply a uniform format to all
data collected from different sources, which makes it easier for company decision-makers
to analyze and share data insights with their colleagues around the globe. Standardizing
the data also reduces the risk of error in interpretation and improves overall accuracy.
3. Store historical data: Data warehouses are also used to store historical data, that is,
time-variant data from the past, which can be used for various purposes.
4. Make strategic decisions: Data warehouses contribute to making better strategic decisions.
Some business strategies may depend on the data stored within the data warehouse.
5. Fast response time: A data warehouse has to be prepared for somewhat unexpected loads
and types of queries, which demands a high degree of flexibility and low latency.
Characteristics of Data warehouse:
1. Subject Oriented: A data warehouse is subject-oriented because it delivers information
on a particular theme; that is, the data warehousing process is designed to handle a
specific, well-defined theme. These themes can be sales, distribution, marketing, etc.
2. Time-Variant: The data is maintained over different intervals of time, such as
weekly, monthly, or annually. It establishes various time limits that are structured
| Operational Database | Data Warehouse |
|---|---|
| Operational systems are designed to support high-volume transaction processing. | Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP). |
| Operational systems are usually concerned with current data. | Data warehousing systems are usually concerned with historical data. |
| Data within operational systems are updated regularly according to need. | Data are non-volatile: new data may be added regularly, but once added they are rarely changed. |
| It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | It is optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. |
| Operational systems are largely process-oriented. | Data warehousing systems are largely subject-oriented. |
| Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are usually optimized to perform fast retrievals of relatively high volumes of data. |
| Relational databases are created for On-Line Transactional Processing (OLTP). | Data warehouses are designed for On-Line Analytical Processing (OLAP). |
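The contrast in access patterns described above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for both kinds of system; the `orders` table and its sample rows are hypothetical and not part of the course material.

```python
import sqlite3

# In-memory database used only for illustration (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, order_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "North", 2022, 120.0),
        (2, "South", 2022, 80.0),
        (3, "North", 2023, 200.0),
        (4, "South", 2023, 50.0),
        (5, "North", 2023, 100.0),
    ],
)

# Operational (OLTP) style: a short transaction that touches a single row.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (90.0, 2))
single_row = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# Warehouse (OLAP) style: a read-only query that scans many rows and aggregates.
totals = conn.execute(
    "SELECT region, order_year, SUM(amount) FROM orders "
    "GROUP BY region, order_year ORDER BY region, order_year"
).fetchall()

print(single_row)
print(totals)
```

In practice the two workloads usually run on separate systems; a single SQLite database is used here only to keep the sketch self-contained.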
| Feature | OLTP | OLAP |
|---|---|---|
| Data contents | An OLTP system manages current data that are typically too detailed to be easily used for decision making. | An OLAP system manages a large amount of historical data, provides facilities for summarization and aggregation, and stores and manages data at different levels of granularity. These features make the data easier to use in informed decision making. |
| Database design | An OLTP system usually uses an entity-relationship (ER) data model and an application-oriented database design. | An OLAP system typically uses either a star or snowflake model and a subject-oriented database design. |
| View | An OLTP system focuses primarily on the current data within an enterprise or department, without referring to historical information or data in different organizations. | An OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with data that originates from various organizations, integrating information from many data stores. |
| Volume of data | Not very large. | Because of their large volume, OLAP data are stored on multiple storage media. |
| Access patterns | The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery techniques. | Accesses to OLAP systems are mostly read-only operations, because data warehouses store historical data. |
| Inserts and updates | Short and fast inserts and updates initiated by end users. | Periodic long-running batch jobs refresh the data. |
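To make the idea of summarization at different levels of granularity concrete, here is a minimal sketch in plain Python. The `sales` records and the `roll_up` helper are hypothetical names introduced only for illustration; a real OLAP system would perform this kind of aggregation inside the warehouse engine.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detailed sales facts: (sale_date, amount).
sales = [
    (date(2023, 1, 5), 100.0),
    (date(2023, 1, 5), 50.0),
    (date(2023, 1, 20), 25.0),
    (date(2023, 2, 3), 75.0),
    (date(2024, 2, 3), 10.0),
]


def roll_up(facts, key_fn):
    """Sum amounts after mapping each sale date to a coarser grouping key."""
    totals = defaultdict(float)
    for sale_date, amount in facts:
        totals[key_fn(sale_date)] += amount
    return dict(totals)


# The same facts summarized at three levels of granularity.
by_day = roll_up(sales, lambda d: d.isoformat())
by_month = roll_up(sales, lambda d: (d.year, d.month))
by_year = roll_up(sales, lambda d: d.year)

print(by_day)
print(by_month)
print(by_year)
```

Each coarser level hides detail but is easier to read for decision making, which is the trade-off the comparison describes.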
Data Extraction: This stage handles various data sources. Data analysts should employ
suitable techniques for every data source.
Data Transformation: Data for a data warehouse comes from many different sources. If
data extraction for a data warehouse poses huge challenges, data transformation presents
even greater ones. We perform many individual tasks as part of data transformation.
First, we clean the data extracted from every source. Standardization of data elements
forms a large part of data transformation. Data transformation also involves many forms
of combining pieces of data from different sources. It additionally includes purging
source data that is not useful and separating out source records into new combinations.
Once the data transformation function ends, we have a set of integrated data that is
clean, standardized, and summarized.
Data Loading: When we complete the design and construction of the data warehouse and
go live for the first time, we do the initial loading of the data into the data warehouse
storage. The initial load moves high volumes of data and consumes a considerable amount of
time.
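The three stages above can be sketched end to end in a few lines of Python. This is only an illustrative sketch under stated assumptions: the two inline sources, the `sales` table, and the function names `extract_csv`, `extract_json`, `transform`, and `load` are all hypothetical, and sqlite3 stands in for the warehouse storage.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources with different formats and conventions.
CSV_SOURCE = "id,customer,country,amount\n1, alice ,us,10.50\n2,Bob,US,\n"
JSON_SOURCE = '[{"id": 3, "customer": "CAROL", "country": "in", "amount": "7"}]'


def extract_csv(text):
    """Extraction technique for a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extraction technique for a JSON source."""
    return json.loads(text)


def transform(records):
    """Clean, standardize, and purge records that are not usable."""
    cleaned = []
    for rec in records:
        amount = str(rec.get("amount", "")).strip()
        if not amount:
            continue  # purge source records with no usable amount
        cleaned.append(
            (
                int(rec["id"]),
                str(rec["customer"]).strip().title(),  # standardize name casing
                str(rec["country"]).strip().upper(),   # standardize country codes
                float(amount),
            )
        )
    return cleaned


def load(conn, rows):
    """Initial load of the transformed rows in one batch transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "id INTEGER PRIMARY KEY, customer TEXT, country TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
rows = transform(extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE))
load(conn, rows)
loaded = conn.execute(
    "SELECT id, customer, country, amount FROM sales ORDER BY id"
).fetchall()
print(loaded)
```

Keeping one extractor per source and a single shared transform step mirrors the text: extraction is source-specific, while standardization is what makes the combined data uniform.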
3. Data Storage in Warehouse:
Data storage for data warehousing is split into multiple repositories. These data repositories contain
structured data in a highly normalized form for fast and efficient processing.
Metadata: Metadata means data about data, i.e., it summarizes basic details regarding data,
making it easier to find and work with particular instances of data. Metadata is created
either manually or automatically and can contain basic information about data.
Raw Data: Raw data is a set of data and information that was delivered from a particular
data entity to the data supplier and has not yet been processed by machine or human.
This data is gathered from online sources to deliver