Data Warehouse Notes


DATA WAREHOUSE

INTRO
What is a data warehouse?
A data warehouse (DWH) is a large store of data accumulated from a wide
range of sources within a company and used to guide management decisions.
A DWH is typically used to connect and analyze business data from
heterogeneous sources.
The data warehouse is the core of the BI (business intelligence) system, which is built for data
analysis and reporting.
It is a blend of technologies and components that aids the strategic use of
data.
It is the electronic storage of a large amount of business information,
designed for query and analysis rather than transaction processing.
Data Warehouse Architecture
[Figure: several sources feed an integration layer that loads the warehouse, which keeps metadata alongside the data; clients reach the warehouse through a query & analysis layer.]
What is Data Warehousing?
Data warehousing is the process of collecting and
managing data from varied sources to provide
meaningful business insights.

• It is a process of transforming data into information and
making it available to users in a timely manner to make a
difference.
Characteristics of a data warehouse
• Subject-oriented: Data gives information about a
particular subject instead of about a company's ongoing
operations.
• Integrated: Data is gathered into the data warehouse from
a variety of sources and merged into a coherent whole.
• Time-variant: All data in the data warehouse is identified with
a particular time period.
• Non-volatile: Data is stable in a data warehouse. New data is
added, but data is never removed. This enables management to
gain a consistent picture of the business.
• Data warehousing combines data from multiple, usually
varied sources into one comprehensive and easily
manipulated database.
• Common ways of accessing a data warehouse include
queries, analysis, and reporting.
• Because data warehousing produces a single database in the end,
it can draw on any number of sources, provided that the system
can handle the volume.
• The final result is homogeneous data, which is easier to
manipulate.
DATABASE VS DATA WAREHOUSE
Parameter         | Database                                        | Data Warehouse
------------------|-------------------------------------------------|------------------------------------------------------------
Purpose           | Designed to record data                         | Designed to analyze data
Processing method | Online Transactional Processing (OLTP)          | Online Analytical Processing (OLAP)
Usage             | Performs the fundamental operations of your business | Lets you analyze your business
Tables and joins  | Complex, because tables are normalized          | Simple, because tables are denormalized
Orientation       | Application-oriented collection of data         | Subject-oriented collection of data
Storage limit     | Generally limited to a single application       | Stores data from any number of applications
Availability      | Data is available in real time                  | Data is refreshed from source systems as and when needed
Design technique  | ER modeling techniques                          | Dimensional modeling techniques
Technique         | Capture data                                    | Analyze data
Data type         | Data is up to date                              | Current and historical data; may not be up to date
Storage of data   | Flat relational approach                        | Dimensional and normalized approach (e.g., star and snowflake schemas)
Query type        | Simple transaction queries                      | Complex queries for analysis
Data summary      | Detailed data                                   | Highly summarized data
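To make the OLTP/OLAP row of the table concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical, and one in-memory database plays both roles here, whereas in practice the transactional database and the warehouse are usually separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, quarter TEXT, amount REAL)"
)

# OLTP style: short transactions that record or update individual rows.
with conn:  # the connection's context manager commits the transaction on success
    conn.execute("INSERT INTO orders VALUES (1, 'East', 'Q1', 120.0)")
    conn.execute("INSERT INTO orders VALUES (2, 'West', 'Q1', 80.0)")
    conn.execute("INSERT INTO orders VALUES (3, 'East', 'Q2', 200.0)")
    conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 2")

# OLAP style: a read-only query that scans many rows and summarizes them.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('East', 320.0), ('West', 90.0)]
```

The OLTP statements each touch one row by key; the OLAP query touches every row and returns an aggregate, which is the workload a warehouse is designed for.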
Applications of Data Warehousing
Sector     | Usage
-----------|------------------------------------------------------------------------
Airline    | Airline system management operations such as crew assignment, route analysis, and frequent flyer program discount schemes for passengers
Banking    | Managing the resources available on the desk effectively
Healthcare | Strategizing and predicting outcomes, creating patient treatment reports, etc. Combined with machine learning and big data, data warehouse systems can help predict ailments.
Industry                  | Functional areas of use                    | Strategic use
--------------------------|--------------------------------------------|---------------------------------------------------------------
Airline                   | Operations; marketing                      | Crew assignment, aircraft deployment, mix of fares, analysis of route profitability, frequent flyer program promotions
Banking                   | Product development; operations; marketing | Customer service, trend analysis, product and service promotions, reduction of IS expenses
Credit card               | Product development; marketing             | Customer service, new information service, fraud detection
Health care               | Operations                                 | Reduction of operational expenses
Investment and Insurance  | Product development; operations; marketing | Risk management, market movements analysis, customer tendencies analysis, portfolio management
Retail chain              | Distribution; marketing                    | Trend analysis, buying pattern analysis, pricing policy, inventory control, sales promotions, optimal distribution channel
Telecommunications        | Product development; operations; marketing | New product and service promotions, reduction of IS budget, profitability analysis
Personal care             | Distribution; marketing                    | Distribution decisions, product promotions, sales decisions, pricing policy
Public sector             | Operations                                 | Intelligence gathering
Data Warehousing process

• Extraction, transformation, and loading (ETL) – a process that extracts information from
internal and external databases, transforms the information using a common set of enterprise
definitions, and loads the information into a data warehouse.
Data Mart
A subset of a data warehouse that is highly focused and
isolated for a specific population of users.
Examples: marketing data mart, sales data mart, etc.
ETL process
What is ETL?
ETL is a process that extracts data from different
source systems, transforms it (for example, by applying
calculations or concatenations), and finally loads it
into the data warehouse system.
ETL stands for Extract, Transform, and Load.
ETL PROCESS
Extraction, Transformation, and Loading (ETL)
• Data extraction
◦ get data from multiple, heterogeneous, and external sources
• Data cleaning
◦ detect errors in the data and rectify them when possible
• Data transformation
◦ convert data from legacy or host format to warehouse format
• Load
◦ sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
• Refresh
◦ propagate the updates from the data sources to the warehouse
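The extraction, cleaning, transformation, and load steps above can be sketched as a tiny pipeline using only Python's standard library (csv, io, sqlite3). Both sources, their field names, and the cleaning rule (drop rows with no amount) are hypothetical; the refresh step is omitted.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a legacy system.
LEGACY_CSV = """cust,country,amount
 alice ,us,100.50
bob,CA,
carol,mx,75
"""

# Hypothetical source 2: records from another application, in a different format.
APP_RECORDS = [{"customer": "Dave", "country_code": "US", "amount_cents": 2500}]


def extract():
    """Extraction: pull raw rows from both heterogeneous sources."""
    legacy = list(csv.DictReader(io.StringIO(LEGACY_CSV)))
    return legacy, APP_RECORDS


def clean(legacy_rows):
    """Cleaning: drop rows whose amount is missing and trim stray whitespace."""
    cleaned = []
    for row in legacy_rows:
        if not row["amount"].strip():
            continue  # the error cannot be rectified, so the row is skipped
        cleaned.append({k: v.strip() for k, v in row.items()})
    return cleaned


def transform(legacy_rows, app_rows):
    """Transformation: map both source formats onto one warehouse format."""
    unified = []
    for r in legacy_rows:
        unified.append((r["cust"].title(), r["country"].upper(), float(r["amount"])))
    for r in app_rows:
        unified.append((r["customer"], r["country_code"].upper(), r["amount_cents"] / 100))
    return unified


def load(conn, rows):
    """Load: insert the rows, then build an index to speed up analysis."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_country ON sales(country)")
    conn.commit()


conn = sqlite3.connect(":memory:")
legacy, app = extract()
load(conn, transform(clean(legacy), app))
result = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(result)  # [('MX', 75.0), ('US', 125.5)]
```

Keeping each step in its own function mirrors the list above and makes it easy to rerun only the later steps when a source changes.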
Database vs. Data Warehouse
Databases contain information in a series of two-dimensional tables.
In a data warehouse and data mart, information is
multidimensional: it contains layers of columns and rows.
Multidimensional Analysis
Data mining – the process of analyzing data to extract
information not offered by the raw data alone.
Data-mining tool – uses a variety of techniques to find
patterns and relationships in large volumes of information
and infers rules that predict future behavior and guide
decision making.
Data-mining tools include query tools, statistical tools,
intelligent agents, etc.
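As one illustration of how a tool can "infer rules that predict future behavior", the sketch below computes simple association rules of the form X → Y with their confidence (the share of baskets containing X that also contain Y). This is only one data-mining technique among many; the baskets are invented, and real tools use dedicated algorithms such as Apriori over far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (market baskets).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

# How often each item, and each unordered pair of items, appears.
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))


def rules(min_confidence):
    """Return rules (X, Y, confidence) where confidence = count(X and Y) / count(X)."""
    found = []
    for (a, b), together in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                found.append((lhs, rhs, confidence))
    return sorted(found)


print(rules(0.9))  # [('butter', 'bread', 1.0), ('eggs', 'milk', 1.0)]
```

A rule such as butter → bread (confidence 1.0) is the kind of pattern that could guide a decision like shelf placement.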
From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
◦ Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
◦ Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
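The split between dimension tables and a fact table can be sketched with plain Python dictionaries. The dimension attributes follow the bullets above; the keys, brands, and dollar amounts are hypothetical. Each fact row carries foreign keys plus the measure `dollars_sold`, and a view of the cube is obtained by grouping facts on chosen dimension attributes.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attributes (hypothetical data).
item_dim = {
    1: {"item_name": "TV-42", "brand": "Acme", "type": "TV"},
    2: {"item_name": "PC-Pro", "brand": "Bolt", "type": "PC"},
}
time_dim = {
    10: {"day": 5, "month": 1, "quarter": "Q1", "year": 2024},
    11: {"day": 9, "month": 4, "quarter": "Q2", "year": 2024},
}

# Fact table: one key per dimension plus the measure dollars_sold.
sales_fact = [
    {"item_key": 1, "time_key": 10, "dollars_sold": 400.0},
    {"item_key": 1, "time_key": 11, "dollars_sold": 350.0},
    {"item_key": 2, "time_key": 10, "dollars_sold": 900.0},
    {"item_key": 1, "time_key": 10, "dollars_sold": 100.0},
]


def cube_view(dim_attrs):
    """Sum dollars_sold for every combination of the requested dimension attributes.

    dim_attrs is a list of (dimension_table, key_column, attribute) triples.
    """
    totals = defaultdict(float)
    for row in sales_fact:
        cell = tuple(dim[row[key]][attr] for dim, key, attr in dim_attrs)
        totals[cell] += row["dollars_sold"]
    return dict(totals)


view = cube_view([(item_dim, "item_key", "type"), (time_dim, "time_key", "quarter")])
print(view)  # {('TV', 'Q1'): 500.0, ('TV', 'Q2'): 350.0, ('PC', 'Q1'): 900.0}
```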
Conceptual Modeling of Data Warehouses
• Modeling data warehouses: dimensions & measures
◦ Star schema
◦ Snowflake schema
◦ Fact constellation
• Star schema: A fact table in the middle connected to a set of dimension tables
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
Dimension tables:
◦ time (time_key, day, day_of_the_week, month, quarter, year)
◦ item (item_key, item_name, brand, type, supplier_type)
◦ branch (branch_key, branch_name, branch_type)
◦ location (location_key, street, city, state_or_province, country)
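The star schema example can be expressed as SQL and exercised with Python's sqlite3 module. Column names follow the diagram; the `_dim` suffix on table names and all sample rows are hypothetical additions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (names from the diagram, with a hypothetical _dim suffix).
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state_or_province TEXT, country TEXT);

-- Fact table: one foreign key per dimension, plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim,
    item_key INTEGER REFERENCES item_dim,
    branch_key INTEGER REFERENCES branch_dim,
    location_key INTEGER REFERENCES location_dim,
    units_sold INTEGER, dollars_sold REAL, avg_sales REAL);

-- Hypothetical sample rows.
INSERT INTO time_dim VALUES (1, 15, 'Sunday', 1, 'Q1', 2023), (2, 20, 'Thursday', 7, 'Q3', 2023);
INSERT INTO item_dim VALUES (1, 'TV-42', 'Acme', 'TV', 'wholesale');
INSERT INTO branch_dim VALUES (1, 'Downtown', 'retail');
INSERT INTO location_dim VALUES (1, 'Main St', 'Toronto', 'Ontario', 'Canada'),
                                (2, 'Elm St', 'Chicago', 'Illinois', 'USA');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 10, 1000.0, 100.0),
                              (1, 1, 1, 2, 5, 500.0, 100.0),
                              (2, 1, 1, 2, 3, 300.0, 100.0);
""")

# A typical star join: every dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT t.quarter, l.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.country
    ORDER BY t.quarter, l.country
""").fetchall()
print(rows)  # [('Q1', 'Canada', 1000.0), ('Q1', 'USA', 500.0), ('Q3', 'USA', 300.0)]
```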
• Snowflake schema: A refinement of the star schema where some dimensional hierarchies are normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Example of Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
Dimension tables:
◦ time (time_key, day, day_of_the_week, month, quarter, year)
◦ item (item_key, item_name, brand, type, supplier_key) → supplier (supplier_key, supplier_type)
◦ branch (branch_key, branch_name, branch_type)
◦ location (location_key, street, city_key) → city (city_key, city, state_or_province, country)
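The normalization in the snowflake example can be sketched with sqlite3, trimmed to the location → city hierarchy and a reduced fact table for brevity. The `_dim` table names and the sample rows are hypothetical. Reaching `country` now takes two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- City attributes move out of location into their own, smaller table.
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT,
                       state_or_province TEXT, country TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city_dim);
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location_dim,
                         units_sold INTEGER, dollars_sold REAL);

-- Hypothetical sample rows: two locations share the Chicago city row.
INSERT INTO city_dim VALUES (1, 'Toronto', 'Ontario', 'Canada'),
                            (2, 'Chicago', 'Illinois', 'USA');
INSERT INTO location_dim VALUES (10, 'Main St', 1), (11, 'Elm St', 2), (12, 'Oak St', 2);
INSERT INTO sales_fact VALUES (10, 4, 400.0), (11, 2, 250.0), (12, 1, 100.0);
""")

# fact -> location -> city: one extra join compared with a star schema.
rows = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON f.location_key = l.location_key
    JOIN city_dim AS c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(rows)  # [('Canada', 400.0), ('USA', 350.0)]
```

The trade-off: city, state, and country text is stored once per city rather than once per location, at the cost of an additional join in every query that needs it.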
• Fact constellation: Multiple fact tables share dimension tables; viewed as a collection of stars, it is therefore also called a galaxy schema or fact constellation
Example of Fact Constellation
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
Dimension tables:
◦ time (time_key, day, day_of_the_week, month, quarter, year)
◦ item (item_key, item_name, brand, type, supplier_type)
◦ branch (branch_key, branch_name, branch_type)
◦ location (location_key, street, city, province_or_state, country)
◦ shipper (shipper_key, shipper_name, location_key, shipper_type)
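A minimal sqlite3 sketch of two fact tables sharing one dimension, reduced to the `item` dimension and one measure per fact table. Table names and rows are hypothetical. The query combines both fact tables through the shared dimension, in the spirit of a drill-across.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared dimension table...
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- ...referenced by two different fact tables.
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item_dim, units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item_dim, units_shipped INTEGER);

-- Hypothetical sample rows.
INSERT INTO item_dim VALUES (1, 'TV'), (2, 'PC');
INSERT INTO sales_fact VALUES (1, 5), (1, 3), (2, 4);
INSERT INTO shipping_fact VALUES (1, 10), (2, 2), (2, 2);
""")

# Aggregate each fact table on its own first, then join the summaries on the shared key.
rows = conn.execute("""
    WITH s AS (SELECT item_key, SUM(units_sold) AS sold
               FROM sales_fact GROUP BY item_key),
         h AS (SELECT item_key, SUM(units_shipped) AS shipped
               FROM shipping_fact GROUP BY item_key)
    SELECT i.item_name, s.sold, h.shipped
    FROM item_dim AS i
    JOIN s ON s.item_key = i.item_key
    JOIN h ON h.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(rows)  # [('PC', 4, 4), ('TV', 8, 10)]
```

Summarizing each fact table before joining avoids double counting: joining the raw fact rows directly would pair every sale with every shipment of the same item.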
Multidimensional Data
Sales volume as a function of product, month, and region.
Dimensions: Product, Location, Time
Hierarchical summarization paths:
◦ Product: Industry → Category → Product
◦ Location: Region → Country → City → Office
◦ Time: Year → Quarter → Month → Day, and Week → Day
[Figure: sales cube with axes Product, Region, and Month]
A Sample Data Cube
[Figure: a sales cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A, Canada, Mexico, sum); a highlighted cell shows the total annual sales of TVs in U.S.A.]
Typical OLAP Operations
• Roll up (drill-up): summarize data
◦ by climbing up a hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up
◦ from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions
• Slice and dice: project and select
• Pivot (rotate):
◦ reorient the cube for visualization, e.g., 3D to a series of 2D planes
• Other operations
◦ drill across: involving (across) more than one fact table
◦ drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
Roll-up:
Roll-up is also known as "consolidation" or "aggregation." The roll-up operation can be performed
in 2 ways:
◦ Reducing dimensions
◦ Climbing up the concept hierarchy. A concept hierarchy is a system
of grouping things based on their order or level.
[Figure: roll-up example]
In this example, the cities New Jersey and Los Angeles are
rolled up into the country USA.
The sales figures of New Jersey and Los Angeles are 440
and 1560 respectively. They become 2000 after roll-up.
In this aggregation process, data in the location hierarchy
moves up from city to country.
Roll-up can also remove one or more dimensions. In this example,
the Quarter dimension is removed.
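Both ways of rolling up can be sketched in a few lines of Python. The city figures (440 and 1560) come from the example above; the function names and the (quarter, city) cells used for dimension reduction are hypothetical.

```python
from collections import defaultdict

# Concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"New Jersey": "USA", "Los Angeles": "USA"}


def climb_hierarchy(cells, level_map):
    """Roll up by climbing a concept hierarchy: map each member to its parent and sum."""
    totals = defaultdict(int)
    for member, sales in cells.items():
        totals[level_map[member]] += sales
    return dict(totals)


def drop_dimension(cells, index):
    """Roll up by dimension reduction: delete one coordinate from each cell key and sum."""
    totals = defaultdict(int)
    for key, sales in cells.items():
        totals[key[:index] + key[index + 1:]] += sales
    return dict(totals)


city_sales = {"New Jersey": 440, "Los Angeles": 1560}
country_sales = climb_hierarchy(city_sales, CITY_TO_COUNTRY)
print(country_sales)  # {'USA': 2000}

# Hypothetical (quarter, city) cells; dropping index 0 removes the quarter dimension.
quarter_city = {("Q1", "Toronto"): 100, ("Q2", "Toronto"): 50, ("Q1", "Vancouver"): 70}
print(drop_dimension(quarter_city, 0))  # {('Toronto',): 150, ('Vancouver',): 70}
```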
Drill-down
In drill-down, data is fragmented into smaller parts.
It is the opposite of the roll-up process.
It can be done via:
◦ Moving down the concept hierarchy
◦ Increasing a dimension
[Figure: drill-down example]
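A minimal sketch of drill-down by moving down the time hierarchy from quarter to month. The monthly figures and the month-to-quarter mapping are hypothetical. Drill-down is only possible because the finer-grained data is kept; the quarter totals are derived from it.

```python
# Hypothetical base data at the finest level kept for the time dimension (month).
monthly_sales = {"Jan": 100, "Feb": 150, "Mar": 50, "Apr": 200, "May": 120, "Jun": 80}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}


def quarter_view():
    """Summary level: total sales per quarter."""
    totals = {}
    for month, sales in monthly_sales.items():
        quarter = MONTH_TO_QUARTER[month]
        totals[quarter] = totals.get(quarter, 0) + sales
    return totals


def drill_down(quarter):
    """Move down the hierarchy: show the months that make up one quarter."""
    return {m: s for m, s in monthly_sales.items() if MONTH_TO_QUARTER[m] == quarter}


print(quarter_view())    # {'Q1': 300, 'Q2': 400}
print(drill_down("Q1"))  # {'Jan': 100, 'Feb': 150, 'Mar': 50}
```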
Slice
Here, one dimension is selected (fixed to a single value), and a new sub-cube is
created.
The following diagram explains how the slice operation is
performed:
[Figure: slice operation]
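A slice can be sketched on a small cube stored as a Python dict keyed by (quarter, city, item). The cube contents and the `slice_cube` helper are hypothetical. Fixing one dimension to a single value removes that dimension from the result.

```python
# Hypothetical 3-D cube: (quarter, city, item) -> units sold.
cube = {
    ("Q1", "Toronto", "TV"): 10,
    ("Q1", "Toronto", "PC"): 4,
    ("Q1", "Vancouver", "TV"): 7,
    ("Q2", "Toronto", "TV"): 12,
    ("Q2", "Vancouver", "PC"): 3,
}
DIMS = ("quarter", "city", "item")


def slice_cube(cube, dim, value):
    """Select one value on one dimension and drop that dimension from the keys."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}


q1 = slice_cube(cube, "quarter", "Q1")
print(q1)  # {('Toronto', 'TV'): 10, ('Toronto', 'PC'): 4, ('Vancouver', 'TV'): 7}
```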
Dice
This operation is similar to a slice. The difference in dice
is that you select 2 or more dimensions, which results in the
creation of a sub-cube.
[Figure: dice operation]
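A dice can be sketched on a similar hypothetical cube. Unlike a slice, the `dice_cube` helper (a hypothetical name) applies selections on two or more dimensions, each allowing a set of values, and keeps every dimension in the resulting sub-cube.

```python
# Hypothetical 3-D cube: (quarter, city, item) -> units sold.
cube = {
    ("Q1", "Toronto", "TV"): 10,
    ("Q1", "Toronto", "PC"): 4,
    ("Q1", "Vancouver", "TV"): 7,
    ("Q2", "Toronto", "TV"): 12,
    ("Q2", "Vancouver", "PC"): 3,
}
DIMS = ("quarter", "city", "item")


def dice_cube(cube, criteria):
    """Keep cells whose coordinates fall inside the allowed values for every listed dimension."""
    positions = {DIMS.index(dim): allowed for dim, allowed in criteria.items()}
    return {key: v for key, v in cube.items()
            if all(key[i] in allowed for i, allowed in positions.items())}


sub = dice_cube(cube, {"city": {"Toronto"}, "item": {"TV"}})
print(sub)  # {('Q1', 'Toronto', 'TV'): 10, ('Q2', 'Toronto', 'TV'): 12}
```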
Pivot
In pivot, you rotate the data axes to provide an alternative
presentation of the data.
In the following example, the pivot is based on item
types.
[Figure: pivot operation]
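A pivot can be sketched as swapping the row and column dimensions of a 2-D view. The (city, item) data and the helper names are hypothetical; empty cells are shown as 0 here, which is a display choice rather than part of the operation.

```python
# Hypothetical 2-D view: (city, item) -> units sold.
view = {("Toronto", "TV"): 10, ("Toronto", "PC"): 4, ("Vancouver", "TV"): 7}


def to_grid(cells):
    """Lay cells out as a grid: first key component = row, second = column; gaps become 0."""
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    return [[cells.get((r, c), 0) for c in cols] for r in rows], rows, cols


def pivot(cells):
    """Rotate the axes: swap the row and column dimensions of every cell."""
    return {(c, r): v for (r, c), v in cells.items()}


grid, rows, cols = to_grid(view)
print(rows, cols, grid)     # ['Toronto', 'Vancouver'] ['PC', 'TV'] [[4, 10], [0, 7]]
pgrid, prows, pcols = to_grid(pivot(view))
print(prows, pcols, pgrid)  # ['PC', 'TV'] ['Toronto', 'Vancouver'] [[4, 0], [10, 7]]
```

The values are unchanged by the pivot; only their presentation (which dimension runs down the rows) differs.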
