INTRODUCTION TO DATA WAREHOUSING
UNIT-1 TOPICS
• Trends in DW
WHY BUSINESS INTELLIGENCE?
Business intelligence (BI) is an activity that contributes to the growth of any company.
HOW BI TRANSFORMS DATA TO INFORMATION AND INFORMATION TO KNOWLEDGE
BUT WHY DATA WAREHOUSING?
A PRODUCER WANTS TO KNOW…
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
DATA, DATA EVERYWHERE, YET ...
• I can’t find the data I need
- data is scattered over the network
- many versions, subtle differences
• I can’t get the data I need
- need an expert to get the data
• I can’t understand the data I found
- available data poorly documented
• I can’t use the data I found
- results are unexpected
- data needs to be transformed
MOTIVATION
• In most organizations, data about specific parts of the business is there: lots and lots of data, somewhere, in some form.
• Data is available, but not information, and not the right information at the right time.
A data warehouse is meant to:
• bring together information from multiple sources so as to provide a consistent database source for decision support queries.
• off-load decision support applications from the online transaction processing system.
WHAT IS A DATA WAREHOUSE?
• A central location where consolidated data from multiple locations (databases) is stored.
• A DWH is maintained separately from an organization’s operational databases.
• End users access it whenever they need information.
• Note: The data warehouse is not loaded every time new data is added to the operational database.
SCENARIO 1: ABC PVT LTD.
ABC Pvt. Ltd. has branches in Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants the sales per item type per branch for the first quarter.
SOLUTION 1: ABC PVT LTD.
Data from the Mumbai, Delhi, Chennai and Bangalore branches is consolidated in a data warehouse. The Sales Manager uses query and analysis tools on the warehouse to produce the report.
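To make Solution 1 concrete, here is a minimal Python sketch. It assumes each branch system keeps sales as (date, item type, amount) rows; the branch names come from the scenario, but the record layout, the sample figures, and the helper names `load_warehouse` and `q1_sales_per_item_type` are hypothetical. Once every branch's rows are consolidated into one table, the Sales Manager's question becomes a single filter-and-group query instead of four separate extracts.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-branch operational data: (sale date, item type, amount).
branch_sales = {
    "Mumbai": [(date(2024, 1, 15), "Laptop", 900.0), (date(2024, 4, 2), "Laptop", 950.0)],
    "Delhi": [(date(2024, 2, 10), "Phone", 300.0), (date(2024, 3, 5), "Phone", 320.0)],
    "Chennai": [(date(2024, 3, 30), "Laptop", 880.0)],
    "Bangalore": [(date(2024, 1, 3), "Tablet", 450.0)],
}

def load_warehouse(sources):
    """Consolidate every branch's rows into one table, tagging each row with its branch."""
    return [
        {"branch": branch, "date": d, "item_type": item, "amount": amt}
        for branch, rows in sources.items()
        for d, item, amt in rows
    ]

def q1_sales_per_item_type(warehouse, year):
    """Total sales per (branch, item type) for January to March of the given year."""
    totals = defaultdict(float)
    for row in warehouse:
        if row["date"].year == year and 1 <= row["date"].month <= 3:
            totals[(row["branch"], row["item_type"])] += row["amount"]
    return dict(totals)

warehouse = load_warehouse(branch_sales)
report = q1_sales_per_item_type(warehouse, 2024)
for (branch, item), total in sorted(report.items()):
    print(f"{branch:<10} {item:<8} {total:>8.2f}")
```

The April sale in Mumbai is left out because the report covers only January to March.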
SCENARIO 2: ONE STOP SHOPPING
[Figure: data entry operators and a report]
SCENARIO 3
SOLUTION 3
BASIC CONCEPTS OF DATA WAREHOUSING
DATA WAREHOUSE: A BLEND OF TECHNOLOGIES
EVOLUTION OF DATA WAREHOUSE
Operational Systems vs. Decision Support Systems
Operational Systems
Decision Support Systems
FAILURES OF PAST DECISION SUPPORT SYSTEMS
OLTP systems
OPERATIONAL SYSTEM VS. DATA WAREHOUSE SYSTEM
TO SUMMARIZE ...
• OLTP systems are used to “run” a business.
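The difference in workload can be sketched with Python's built-in `sqlite3` module. The `orders` table and its columns are hypothetical, and for brevity both kinds of work run against the same in-memory database; the point made in this unit is that the analytical workload belongs in a separate data warehouse rather than on the OLTP system.

```python
import sqlite3

# Hypothetical orders table, used for both the OLTP and the analytical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (id, region, year, amount, status) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "North", 2023, 100.0, "open"),
        (2, "South", 2023, 250.0, "shipped"),
        (3, "North", 2024, 175.0, "open"),
        (4, "South", 2024, 80.0, "open"),
    ],
)

# OLTP-style work: change one current record by its key inside a short transaction.
with conn:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = ?", (3,))

# Decision-support-style work: read many rows and summarize them along dimensions.
summary = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders GROUP BY region, year ORDER BY region, year"
).fetchall()
for region, year, total in summary:
    print(region, year, total)
```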
OLTP VS. OLAP
OLTP & DATA WAREHOUSE SYSTEM
OPERATIONAL AND INFORMATIONAL
OPERATIONAL V/S INFORMATION SYSTEM
[Table: comparison of features, operational vs. informational systems]
CHAPTER SUMMARY
• Companies are desperate for strategic information to counter fiercer competition, extend market share, and improve profitability.
• In spite of tons of data accumulated by enterprises over the past decades, every enterprise is caught in the middle of an information crisis. Information needed for strategic decision making is not readily available.
• All the past attempts by IT to provide strategic information have been failures. This was mainly because IT has been trying to provide strategic information from operational systems.
• Informational systems are different from the traditional operational systems. Operational systems are not designed for strategic information.
• We need a new type of computing environment to provide strategic information. The data warehouse promises to be this new computing environment.
• Data warehousing is the viable solution. There is a compelling need for data warehousing for every enterprise.
Chapter Objectives
• Review formal definitions of a data warehouse
• Discuss the defining features
A data warehouse is
- subject-oriented,
- integrated,
- time-variant,
- nonvolatile
[Figure: legacy systems and flat files feed data processing (data transformation and integration) into the data warehouse]
Integration
Data from different sources must be made consistent in terms of:
• Encoding structures.
• Measurement of attributes.
• Physical attributes of data.
• Naming conventions.
• Data type formats.
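A minimal Python sketch of integration, assuming two hypothetical source systems that describe customers differently. It covers four of the listed aspects: naming conventions (`cust_name` vs. `CustomerName`), encoding structures (`M`/`F` vs. `1`/`0`), measurement of attributes (centimetres vs. inches), and data type formats (two date layouts). All field names and the `GENDER_B` code table are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical customer records from two source systems with different conventions.
source_a = [{"cust_name": "Asha", "gender": "F", "height_cm": 160.0, "birth": "1990-05-17"}]
source_b = [{"CustomerName": "Ravi", "sex": 1, "height_in": 70.0, "dob": "17/05/1988"}]

GENDER_B = {1: "M", 0: "F"}  # assumed encoding used by source B

def from_a(rec):
    """Map a source-A record to the warehouse's common format."""
    return {
        "customer_name": rec["cust_name"],                                  # naming convention
        "gender": rec["gender"],                                            # already M/F
        "height_cm": rec["height_cm"],                                      # already centimetres
        "birth_date": datetime.strptime(rec["birth"], "%Y-%m-%d").date(),  # date format
    }

def from_b(rec):
    """Map a source-B record to the same common format."""
    return {
        "customer_name": rec["CustomerName"],                               # naming convention
        "gender": GENDER_B[rec["sex"]],                                     # encoding structure
        "height_cm": round(rec["height_in"] * 2.54, 1),                     # unit of measurement
        "birth_date": datetime.strptime(rec["dob"], "%d/%m/%Y").date(),    # date format
    }

integrated = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
for row in integrated:
    print(row)
```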
Time-variant
Non-volatile
[Figure: data in the warehouse is loaded and accessed]
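The two properties can be illustrated with a small Python class, a sketch under simplifying assumptions rather than how any particular product stores data. `SnapshotStore` (a hypothetical name) supports only two operations, `load` and `as_of`, mirroring the load and access labels in the figure: each load appends a dated snapshot (time-variant), and existing snapshots are never updated or deleted (non-volatile). The customer balances are made-up sample data.

```python
from datetime import date

class SnapshotStore:
    """Append-only store: each load adds a dated snapshot; nothing is updated in place."""

    def __init__(self):
        self._snapshots = []  # (load_date, tuple of (key, value) pairs), in load order

    def load(self, load_date, records):
        """Append a new snapshot; loads must arrive in increasing date order."""
        if self._snapshots and load_date <= self._snapshots[-1][0]:
            raise ValueError("loads must arrive in increasing date order")
        self._snapshots.append((load_date, tuple(sorted(records.items()))))

    def as_of(self, when):
        """Read access: return the latest snapshot loaded on or before `when`, else None."""
        result = None
        for load_date, rows in self._snapshots:
            if load_date > when:
                break
            result = dict(rows)
        return result

store = SnapshotStore()
store.load(date(2024, 1, 31), {"C001": 5000.0, "C002": 1200.0})
store.load(date(2024, 2, 29), {"C001": 4500.0, "C002": 1800.0})

print(store.as_of(date(2024, 2, 15)))  # the January snapshot is still available
print(store.as_of(date(2024, 3, 1)))   # the February snapshot
```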
Data Granularity
• Data granularity in a data warehouse refers to the level of detail.
- High (fine) granularity refers to detailed data that is at or near the transaction level.
- Low (coarse) granularity refers to data that is summarized or aggregated.
• In an operational system, data is usually kept at the lowest level of detail.
• In a DW, data is summarized at different levels:
Level 1 - Daily
Level 2 - Weekly
Level 3 - Monthly
Level 4 - Quarterly
Level 5 - Yearly
Level 6 - 5-year summary
etc.
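Rolling detailed data up to the summary levels listed above can be sketched in Python. The sketch assumes level-1 data is a mapping from each day to a sales total (the sample values are invented) and derives the weekly, monthly, quarterly and yearly levels by grouping on a key computed from the date; `roll_up` is a hypothetical helper name. Weeks follow ISO numbering here, which is one choice among several.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical level-1 (daily) sales totals for the first 100 days of 2024.
start = date(2024, 1, 1)
daily = {start + timedelta(days=i): 10.0 for i in range(100)}

def roll_up(daily_totals, key_fn):
    """Summarize daily totals to a coarser level, grouping by key_fn(day)."""
    out = defaultdict(float)
    for day, amount in daily_totals.items():
        out[key_fn(day)] += amount
    return dict(out)

weekly = roll_up(daily, lambda d: tuple(d.isocalendar()[:2]))     # (ISO year, ISO week)
monthly = roll_up(daily, lambda d: (d.year, d.month))
quarterly = roll_up(daily, lambda d: (d.year, (d.month - 1) // 3 + 1))
yearly = roll_up(daily, lambda d: d.year)

print(monthly[(2024, 1)], monthly[(2024, 2)], quarterly[(2024, 1)], yearly[2024])
```

Each coarser level can be computed from the daily data alone, which is why a warehouse can keep detailed data and still serve summaries at several levels.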
• Three data levels in a banking data warehouse (figure)
Use Cases:
• Marketing analysis and reporting favor a data mart approach because these activities are typically performed in a specialized business unit and do not require enterprise-wide data.
• A financial analyst can use a finance data mart to carry out financial reporting.
Two approaches in designing a Data Warehouse
Inmon vs. Kimball
Use Case:
A company considering an expansion needs to incorporate data from a variety
of data sources across the organization to come to an informed decision. This
requires a data warehouse that aggregates data from sales, marketing, store
management, customer loyalty, supply chains, etc.
Data warehouse architecture (figure):
• Execution Systems: CRM, ERP, Legacy, e-Commerce, External Data, Purchased Market Data, Spreadsheets
• Extract, Transformation, and Load (ETL) Layer: Cleanse Data, Filter Records, Standardize Values, Decode Values, Apply Business Rules, Householding, Dedupe Records, Merge Records
• ODS, Enterprise Data Warehouse, Data Marts, Metadata Repository
• Reporting Tools, OLAP Tools, Ad Hoc Query Tools, Data Mining Tools, Custom Tools, HTML Reports
Technologies:
• Execution systems: PeopleSoft, SAP, Siebel, Oracle Applications, Custom Systems
• ETL: Informatica PowerMart, Ab Initio, DataStage, Oracle Warehouse Builder, Custom programs, SQL scripts
• Databases: Oracle, SQL Server, Teradata, DB2
• Reporting and analysis: Cognos, Business Objects, MicroStrategy, Oracle Discoverer, Brio, Data Mining Tools, Portals
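Several of the ETL steps named in the figure (cleanse, filter records, standardize values, decode values, merge and dedupe records) can be sketched as a small Python pipeline. The sample rows, the `CITY_STANDARD` and `STATUS_CODES` lookup tables, and the first-record-wins dedupe rule are assumptions for illustration; production ETL tools such as those listed above apply much richer matching and business rules, and householding is not shown.

```python
# Hypothetical raw customer rows extracted from two source systems.
crm_rows = [
    {"id": "C1", "name": "  asha rao ", "city": "bombay", "status": "A"},
    {"id": "C2", "name": "Ravi Kumar", "city": "Delhi", "status": "I"},
    {"id": "C1", "name": "Asha Rao", "city": "Mumbai", "status": "A"},  # duplicate of C1
]
web_rows = [
    {"id": "C3", "name": "meena iyer", "city": "madras", "status": "A"},
    {"id": "", "name": "", "city": "", "status": ""},  # empty record
]

CITY_STANDARD = {"bombay": "Mumbai", "madras": "Chennai"}  # assumed standardization table
STATUS_CODES = {"A": "Active", "I": "Inactive"}           # assumed code meanings

def cleanse(row):
    """Cleanse: strip stray whitespace from every field."""
    return {k: v.strip() for k, v in row.items()}

def keep(row):
    """Filter: drop records without a business key."""
    return bool(row["id"])

def transform(row):
    """Standardize names and cities, and decode status codes."""
    city = row["city"].lower()
    return {
        "id": row["id"],
        "name": row["name"].title(),
        "city": CITY_STANDARD.get(city, row["city"].title()),
        "status": STATUS_CODES[row["status"]],
    }

def etl(*sources):
    """Merge rows from all sources, keeping the first record seen for each id."""
    merged = {}
    for source in sources:
        for raw in source:
            row = cleanse(raw)
            if keep(row) and row["id"] not in merged:
                merged[row["id"]] = transform(row)
    return list(merged.values())

for record in etl(crm_rows, web_rows):
    print(record)
```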
Components/Building Blocks of DWH
Source Data Component
• Source data can be grouped into four categories:
- Production data
  • Comes from the operational systems of the enterprise
  • Only some segments of it are selected
  • Narrow scope, e.g. order details
- Internal data
  • Private spreadsheets, documents, customer profiles, etc.
  • E.g. customer profiles for a specific offering
  • Needs special strategies to transform it into the DW (e.g. text documents)
- Archived data
  • Old data is periodically taken out and stored in archived files
  • The DW keeps snapshots of historical data
- External data
  • Executives depend upon external sources
  • E.g. market data on competitors; a car rental company may require data on new car manufacturing
Define conversion
Data Staging Component
• Kept in a separate repository.
• Large volumes of historical data are kept.
• Unlike operational data, which is structured in highly normalized formats for fast and efficient processing, the data is kept in structures suitable for analysis.
• These are read-only repositories.
• They must be open to different tools.
Metadata Component
• Operational Metadata
- describes details of the processing and accessing of data
• End-User Metadata
- It is the navigational map of the data warehouse.
- It enables the end users to find information in the data warehouse.
Information Delivery Component
Management and Control Component
• Government deregulation
Key issues:
• Value and expectation
• Risk assessment
• Build or buy
• Overall plan
Guiding principles of a Data Warehouse System
• While planning for your data warehouse, key issues to be considered include setting proper expectations, assessing risks, deciding between top-down and bottom-up approaches, and choosing from vendor solutions.
• Business requirements, not technology, must drive your project.
• A data warehouse project without the full support of top management and without a strong and enthusiastic executive sponsor is doomed to failure from day one.
• A data warehouse project is very different from a typical OLTP system project. The traditional life cycle approach of application development must be changed and adapted for the data warehouse project.
• Standards for organizing the team and assigning roles are still experimental in many projects. Modify the roles to match what is important for your project.
• Participation of the users is mandatory for the success of the data warehouse project. Users can participate in a variety of ways.
• Consider the warning signs and success factors; in the final analysis, adopt a practical approach to building a successful data warehouse.
Defining Business Requirements
• Users can give insights into how they think about the business.
• Users can tell you what measurement units are important for them.
• Users can let you know how they measure success in that particular department.
• Users can give you insights into how they combine the various pieces of information for strategic decision making.
Dimensional Analysis
Managers think in Business Dimensions
Dimensional Nature of Business Data
• Options price
• Full price
• Dealer credits
• Dealer invoice
• Manufacturer proceeds
• Amount financed
Information Package example for analyzing Hotel occupancy
Key Business Metrics or Facts
• Vacant rooms
• Unavailable rooms
• Number of occupants
• Revenue
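An information package records the facts and the business dimensions along which they are analyzed. The Python sketch below models one as a small dataclass and sums its facts along a chosen dimension. The four facts come from the slide; the dimensions `time`, `hotel` and `room_type`, the sample rows, and the `analyze` helper are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class InformationPackage:
    subject: str
    dimensions: tuple  # business dimensions along which the facts are analyzed
    facts: tuple       # key business metrics

# Facts from the slide; the dimensions are an assumption for illustration.
hotel_occupancy = InformationPackage(
    subject="Hotel occupancy",
    dimensions=("time", "hotel", "room_type"),
    facts=("vacant_rooms", "unavailable_rooms", "number_of_occupants", "revenue"),
)

# Hypothetical fact rows: (time, hotel, room type, vacant, unavailable, occupants, revenue).
raw = [
    ("2024-06-01", "H1", "Suite", 2, 0, 10, 2500.0),
    ("2024-06-01", "H2", "Suite", 1, 1, 6, 1800.0),
    ("2024-06-01", "H1", "Single", 5, 1, 20, 2000.0),
]
rows = [dict(zip(hotel_occupancy.dimensions + hotel_occupancy.facts, r)) for r in raw]

def analyze(package, rows, by):
    """Sum every fact of the package along one of its dimensions."""
    if by not in package.dimensions:
        raise ValueError(f"{by!r} is not a dimension of {package.subject}")
    totals = defaultdict(lambda: dict.fromkeys(package.facts, 0))
    for row in rows:
        for fact in package.facts:
            totals[row[by]][fact] += row[fact]
    return dict(totals)

print(analyze(hotel_occupancy, rows, by="room_type"))
```

Asking for a facts-only field such as `by="revenue"` raises an error, reflecting the package's split between what is measured and what it is measured by.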
Information Package helps to…
• Business analysts
Group Sessions
• Groups of twenty or fewer people at a time
• Use only after getting a baseline understanding of the requirements
• Not good for initial data gathering
Requirements Definition Document Outline
1. Introduction
- State the purpose and scope of the project.
- Include broad project justification.
- Provide an executive summary of each subsequent section.
3. Specific requirements
- Include details of source data needed.
- List the data transformation and storage requirements.
- Describe the types of information delivery methods needed by the users.
4. Information packages
- Provide as much detail as possible for each information package.
- Include them in the form of package diagrams.
5. Other requirements
- Cover miscellaneous requirements such as data extract frequencies, data loading methods, and locations to which information must be delivered.
6. User expectations
- State the expectations in terms of problems and opportunities.
- Indicate how the users expect to use the data warehouse.
• Unlike the requirements for an operational system, the requirements for a data warehouse are quite nebulous.
• Business data is dimensional in nature, and the users of the data warehouse think in terms of business dimensions.
• A requirements definition for the data warehouse can, therefore, be based on business dimensions such as product, geography, time, and promotion.
• Information packages, a new concept, are the backbone of the requirements definition. An information package records the critical measurements or facts and the business dimensions along which the facts are normally analyzed.
• Interviews and group sessions are standard methods for collecting requirements.
• Key people to be interviewed or included in group sessions are senior executives (including the sponsors), departmental managers, business analysts, and operational systems DBAs.
• Review all existing documentation of related operational systems.
• Scope and content of the requirements definition document include data sources, data transformation, data storage, information delivery, and information package diagrams.