The document introduces data warehousing, highlighting its necessity for effective business intelligence and decision-making. It outlines the basic elements of data warehouses, their evolution, and the differences between operational and informational systems. Additionally, it discusses the ETL process, data granularity, and the architecture of data warehouses, emphasizing the importance of integrating data from multiple sources for strategic insights.

Uploaded by

saianjani.1025

UNIT-1

INTRODUCTION TO DATA WAREHOUSING

UNIT-1 TOPICS

• Need for data warehousing
• Basic elements of DW
• Trends in DW
• Project planning and management
• Collecting the requirements

Book: Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. Paulraj Ponniah

WHAT IS BUSINESS INTELLIGENCE?
Business intelligence (BI) is a technology-driven process for analyzing data and presenting actionable information to help corporate executives, business managers and other end users make more informed business decisions.

WHY BUSINESS INTELLIGENCE?
Business intelligence contributes to the growth of a company by giving its decision makers actionable information.

HOW BI TRANSFORMS DATA INTO INFORMATION AND INFORMATION INTO KNOWLEDGE

BUT WHY DATA WAREHOUSING?

Let’s understand the challenges in achieving Business Intelligence.

A PRODUCER WANTS TO KNOW...
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?

OUTPUT OF BI

DATA, DATA EVERYWHERE, YET ...
• I can’t find the data I need
  - data is scattered over the network
  - many versions, subtle differences
• I can’t get the data I need
  - need an expert to get the data
• I can’t understand the data I found
  - available data poorly documented
• I can’t use the data I found
  - results are unexpected
  - data needs to be transformed from one form to another

WHY DATA WAREHOUSE?

MOTIVATION
• In most organizations, data about specific parts of the business is there: lots and lots of data, somewhere, in some form.
• Data is available but not information, and not the right information at the right time.
A data warehouse is meant to:
• bring together information from multiple sources so as to provide a consistent database source for decision support queries.
• off-load decision support applications from the on-line transaction system.

WHAT IS A DATA WAREHOUSE?
• A central location where consolidated data from multiple locations (databases) is stored.
• The DWH is maintained separately from an organization’s operational database.
• End users access it whenever any information is needed.
• Note: the data warehouse is not loaded every time new data is added to the operational database; it is refreshed at regular intervals.

SCENARIO 1

ABC Pvt. Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore.
The Sales Manager wants a quarterly sales report.
Each branch has a separate operational system.

SCENARIO 1: ABC PVT LTD.

[Figure: each branch system (Mumbai, Delhi, Chennai, Bangalore) separately has to supply "sales per item type per branch for first quarter" to the Sales Manager.]

SOLUTION 1: ABC PVT LTD.

• Extract sales information from each database.
• Store the information in a common repository at a single site.

SOLUTION 1: ABC PVT LTD.

[Figure: data from Mumbai, Delhi, Chennai and Bangalore flows into the Data Warehouse; query and analysis tools produce the report for the Sales Manager.]

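The two steps of Solution 1 can be sketched in a few lines of Python. This is only an illustration: the branch rows, the item types and the function names (`build_repository`, `quarterly_report`) are made up for the example and are not part of the original slides.

```python
from collections import defaultdict

# Hypothetical extracts, one per branch system: (sale date, item type, units sold).
BRANCH_EXTRACTS = {
    "Mumbai": [("2024-01-15", "Laptop", 2), ("2024-02-03", "Phone", 5)],
    "Delhi": [("2024-03-20", "Laptop", 1), ("2024-04-02", "Phone", 4)],
    "Chennai": [("2024-02-11", "Phone", 3)],
    "Bangalore": [("2024-01-30", "Laptop", 6)],
}


def quarter_of(date_str):
    """Map an ISO date string (YYYY-MM-DD) to its calendar quarter, 1 to 4."""
    return (int(date_str[5:7]) - 1) // 3 + 1


def build_repository(extracts):
    """Step 1 and 2: extract every branch row and store it in one common list."""
    repository = []
    for branch, rows in extracts.items():
        for sale_date, item_type, units in rows:
            repository.append(
                {"branch": branch, "date": sale_date, "item_type": item_type, "units": units}
            )
    return repository


def quarterly_report(repository, quarter):
    """Sales per item type per branch for one quarter."""
    totals = defaultdict(int)
    for row in repository:
        if quarter_of(row["date"]) == quarter:
            totals[(row["branch"], row["item_type"])] += row["units"]
    return dict(totals)


repository = build_repository(BRANCH_EXTRACTS)
report = quarterly_report(repository, quarter=1)
```

Because all branches sit in one repository, the Sales Manager's report is a single query instead of four separate ones.
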
SCENARIO 2

One Stop Shopping Super Market has a huge operational database.
Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait for some time.

SCENARIO 2: ONE STOP SHOPPING

[Figure: data entry operators wait on the operational database while it produces a report for management.]


SOLUTION 2

• Extract data needed for analysis from the operational database.
• Store it in another system, the data warehouse.
• Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
• The warehouse will contain data with a historical perspective.
SOLUTION 2

[Figure: data entry operators send transactions to the operational database; data is extracted from it into the data warehouse, which produces the report for the manager.]

SCENARIO 3

Cakes & Cookies is a small, new company. The chairman of this company wants his company to grow. He needs information so that he can make correct decisions.

SOLUTION 3

• Improve the quality of data before loading it into the warehouse.
• Perform data cleaning and transformation before loading the data.
• Use query analysis tools to support ad hoc queries.

BASIC CONCEPTS OF DATA WAREHOUSING

• Take all the data from the operational systems
• Where necessary, include relevant data from outside, such as industry benchmark indicators
• Integrate all the data from the various sources
• Remove inconsistencies and transform the data
• Store the data in formats suitable for easy access for decision making

DATA WAREHOUSE: A BLEND OF TECHNOLOGIES

EVOLUTION OF DATA WAREHOUSE

Since the 1970s, organizations have gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer.
• This resulted in the accumulation of growing amounts of data in operational databases.
• Operational systems support day-to-day business operations such as order processing, inventory control, billing, and so on.
• They provide online information and produce a variety of reports to monitor and run the business.

EVOLUTION OF DATA WAREHOUSE

By the 1990s, organizations typically had numerous operational systems with overlapping and sometimes contradictory definitions.
• Organizations now focus on ways to use operational data to support decision-making, as a means of gaining competitive advantage.
• Operational systems were never designed to provide strategic information, such as:
  - Which product lines to expand
  - Which markets to strengthen

EVOLUTION OF DATA WAREHOUSE

Organizations need to turn their archives of data into a source of knowledge, so that a single integrated/consolidated view of the organization’s data is presented to the user.

Data warehousing is a new paradigm which is:
• An ideal environment for data analysis and decision support
• Fluid, flexible and interactive
• 100% user driven
• Helpful in answering complex and unpredictable questions

INFORMATION CRISIS
• Organizations have lots of data
• Information technology resources and systems are not effective at turning all that data into useful strategic information

OPERATIONAL SYSTEMS VS. DECISION SUPPORT SYSTEMS
[Figures: Operational Systems; Decision Support Systems]

FAILURES OF PAST DECISION SUPPORT SYSTEMS
[Figure: OLTP systems]

OPERATIONAL SYSTEM VS. DATA WAREHOUSE SYSTEM

TO SUMMARIZE ...
• OLTP systems are used to “run” a business
• The data warehouse helps to “optimize” the business

OLTP VS. OLAP

OLTP & DATA WAREHOUSE SYSTEM

OPERATIONAL AND INFORMATIONAL

OPERATIONAL V/S INFORMATION SYSTEM

Features                 Operational                          Information
Characteristics          Operational processing               Informational processing
Orientation              Transaction                          Analysis
User                     Clerk, DBA, database professional    Knowledge workers
Function                 Day-to-day operation                 Decision support
Data content             Current                              Historical, archived, derived
View                     Detailed, flat relational            Summarized, multidimensional
DB design                Application oriented                 Subject oriented
Unit of work             Short, simple transaction            Complex query
Access                   Read/write                           Read only
Focus                    Data in                              Information out
No. of records accessed  Tens/hundreds                        Millions
Number of users          Thousands                            Hundreds
DB size                  100 MB to GB                         100 GB to TB
Usage                    Predictable, repetitive              Ad hoc, random, heuristic
Response time            Sub-seconds                          Several seconds to minutes
Priority                 High performance, high availability  High flexibility, end-user autonomy
Metric                   Transaction throughput               Query throughput

ETL – EXTRACT, TRANSFORM AND LOAD

• ETL is the process of extracting the data from various sources, transforming this data to meet your requirements and then loading it into a target data warehouse.

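The three ETL steps can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the source rows, the cleaning rules and the `sales` table are hypothetical, and an in-memory SQLite database (from the standard `sqlite3` module) stands in for the target data warehouse.

```python
import sqlite3


def extract():
    """Extract: hypothetical source rows of (customer name, amount as text, currency)."""
    return [(" alice ", "120.50", "usd"), ("BOB", "80", "USD"), ("", "15", "usd")]


def transform(rows):
    """Transform: clean names, convert amounts to numbers, standardize currency codes."""
    cleaned = []
    for name, amount, currency in rows:
        name = name.strip().title()
        if not name:  # drop rows that fail a basic quality check
            continue
        cleaned.append((name, float(amount), currency.upper()))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(conn, transform(extract()))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Keeping the three steps as separate functions mirrors the definition above: each source needs its own `extract`, while `transform` and `load` can be shared.
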
CHAPTER SUMMARY
• Companies are desperate for strategic information to counter fiercer competition, extend market share, and improve profitability.
• In spite of tons of data accumulated by enterprises over the past decades, every enterprise is caught in the middle of an information crisis. Information needed for strategic decision making is not readily available.
• All the past attempts by IT to provide strategic information have been failures. This was mainly because IT has been trying to provide strategic information from operational systems.
• Informational systems are different from the traditional operational systems. Operational systems are not designed for strategic information.
• We need a new type of computing environment to provide strategic information. The data warehouse promises to be this new computing environment.
• Data warehousing is the viable solution. There is a compelling need for data warehousing for every enterprise.

Unit-1
Introduction to Data Warehousing

Unit-1 Topics

• Need for data warehousing - Chapter 1
• Basic elements of DW - Chapter 2
• Trends in DW - Chapter 3
• Project planning and management - Chapter 4
• Collecting the requirements - Chapter 5

Basic elements of DW – Chapter 2

Chapter Objectives
• Review formal definitions of a data warehouse
• Discuss the defining features
• Distinguish between data warehouses and data marts
• Study each component or building block that makes up a data warehouse
• Introduce metadata and highlight its significance

Inmon’s definition

A data warehouse is a
- subject-oriented,
- integrated,
- time-variant,
- nonvolatile

collection of data in support of management’s decision making process.

Subject-oriented

• A data warehouse is organized around subjects such as sales, product, customer.
• It focuses on modeling and analysis of data for decision makers.
• It excludes data not useful in the decision support process.

Integration

• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.

[Figure: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse.]

Integration is needed in terms of:
• Encoding structures
• Measurement of attributes
• Physical attributes of data
• Naming conventions
• Data type formats

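As an illustration of reconciling encoding structures and measurement units, here is a minimal Python sketch. The gender codes, the unit table and the field names are assumptions chosen for the example; real source systems will have their own conventions.

```python
# Hypothetical mappings from source-specific encodings to one warehouse standard.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
TO_CENTIMETRES = {"cm": 1.0, "inch": 2.54, "m": 100.0}


def integrate(record):
    """Convert one source record to the warehouse encoding and unit of measure."""
    gender_key = str(record["gender"]).strip().lower()
    return {
        "gender": GENDER_CODES[gender_key],
        "length_cm": round(record["length"] * TO_CENTIMETRES[record["unit"]], 2),
    }


# Three sources describing the same kind of attribute in three different ways.
source_records = [
    {"gender": "male", "length": 10, "unit": "inch"},
    {"gender": 0, "length": 1.5, "unit": "m"},
    {"gender": "F", "length": 42, "unit": "cm"},
]
integrated = [integrate(r) for r in source_records]
```

After integration every record uses the same gender code and the same unit, so queries across sources compare like with like.
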
Time-variant

• Provides information from a historical perspective, e.g. the past 5-10 years.
• Every key structure contains, either implicitly or explicitly, an element of time, i.e., every record has a timestamp.
• The time-variant nature in a DW
  - Allows for analysis of the past
  - Relates information to the present
  - Enables forecasts for the future

Non-volatile

• Data once recorded cannot be updated.
• A data warehouse requires two operations in data accessing
  - Initial loading of data
  - Incremental loading of data

[Figure: data is loaded into the warehouse and then accessed.]

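The time-variant and non-volatile properties can be sketched together as an append-only table in Python. The class name `WarehouseTable`, its methods and the sample rows are hypothetical; the point is only that rows are added by loads, stamped with a date, and never updated in place.

```python
from datetime import date


class WarehouseTable:
    """Append-only store: rows are loaded with a date and never changed afterwards."""

    def __init__(self):
        self._rows = []

    def load(self, rows, load_date):
        """Used for both the initial load and every later incremental load."""
        for row in rows:
            self._rows.append({**row, "load_date": load_date})

    def as_of(self, day):
        """Read-only access: the rows that had been loaded on or before a given day."""
        return [row for row in self._rows if row["load_date"] <= day]


table = WarehouseTable()
# Initial loading of data
table.load([{"account": "A1", "balance": 100}, {"account": "A2", "balance": 50}], date(2024, 1, 1))
# Incremental loading: a new snapshot is added, the old A1 row is kept as history
table.load([{"account": "A1", "balance": 130}], date(2024, 2, 1))
```

Because the January row for account A1 is kept next to the February one, the table can answer questions about the past as well as the present.
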
Data Granularity
• Data granularity in a data warehouse refers to the level of detail.
  - Higher granularity refers to detailed data that is at or near the transaction level.
  - Low granularity refers to data that is summarized or aggregated.
• In an operational system, data is usually kept at the most detailed level, i.e. the individual transaction.
• In a DW, data is summarized at different levels.
  Level 1 - Daily
  Level 2 - Weekly
  Level 3 - Monthly
  Level 4 - Quarterly
  Level 5 - Yearly
  Level 6 - 5-year summary
  etc.
Data Granularity
• Three data levels in a banking data warehouse

Daily Detail        Monthly Summary       Quarterly Summary
Account             Account               Account
Activity date       Month                 Quarter no.
Amount              No. of transactions   No. of transactions
Deposit/Withdraw    Withdrawals           Withdrawals
                    Deposits              Deposits
                    Beginning balance     Beginning balance
                    Ending balance        Ending balance

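Moving from the daily detail level to the monthly summary level is an aggregation. The Python sketch below shows one way to do it; the account number, dates and amounts are invented, and balances are left out to keep the example short.

```python
from collections import defaultdict

# Hypothetical daily detail rows: (account, activity date, amount, deposit or withdraw).
daily_detail = [
    ("ACC-1", "2024-01-05", 100.0, "deposit"),
    ("ACC-1", "2024-01-20", 40.0, "withdraw"),
    ("ACC-1", "2024-02-02", 25.0, "deposit"),
]


def monthly_summary(rows):
    """Summarize daily rows into one row per account and month."""
    summary = defaultdict(lambda: {"transactions": 0, "deposits": 0.0, "withdrawals": 0.0})
    for account, activity_date, amount, kind in rows:
        bucket = summary[(account, activity_date[:7])]  # key on (account, YYYY-MM)
        bucket["transactions"] += 1
        if kind == "deposit":
            bucket["deposits"] += amount
        else:
            bucket["withdrawals"] += amount
    return dict(summary)


monthly = monthly_summary(daily_detail)
```

The summary is smaller and faster to query, but the individual transactions can no longer be recovered from it, which is the trade-off behind choosing a granularity level.
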
Data Mart
• A data mart is a smaller version of the data warehouse which deals with a single subject.
• Data marts are focused on one area. Hence, they draw data from a limited number of sources.
• The time taken to build a data mart is much less than the time taken to build a data warehouse.

Data Warehouse and Data Mart
Types of Data Mart
Two approaches in designing a Data Warehouse
Inmon vs. Kimball

Use cases for a data mart:
• Marketing analysis and reporting favor a data mart approach because these activities are typically performed in a specialized business unit, and do not require enterprise-wide data.
• A financial analyst can use a finance data mart to carry out financial reporting.

Use case for an enterprise data warehouse:
A company considering an expansion needs to incorporate data from a variety of data sources across the organization to come to an informed decision. This requires a data warehouse that aggregates data from sales, marketing, store management, customer loyalty, supply chains, etc.

Two approaches in designing a Data Warehouse

Top-down approach (Inmon)                      Bottom-up approach (Kimball)
Enterprise view of data                        Narrow view of data
Inherently architected                         Inherently incremental
Single, central storage of data                Faster implementation of manageable parts
Centralized rules and control                  Each data mart is developed independently
Takes longer time to build                     Comparatively less time than a DW
Higher risk of failure                         Less risk of failure
Needs higher level of cross-functional skills  Unmanageable interfaces

Data Warehouse Architecture

Layers, in order of data flow: Operational Source Systems, ETL Layer, Data and Metadata Repository Layer, Presentation Layer

Operational Source Systems
• Execution systems: CRM, ERP, Legacy, e-Commerce
• External data: purchased market data, spreadsheets

Extract, Transformation, and Load (ETL) Layer
• Cleanse data
• Filter records
• Standardize values
• Decode values
• Apply business rules
• Householding
• Dedupe records
• Merge records

Data and Metadata Repository Layer
• ODS
• Enterprise Data Warehouse
• Data Marts
• Metadata Repository

Presentation Layer
• Reporting tools
• OLAP tools
• Ad hoc query tools
• Data mining tools

Technologies:
• Source systems: PeopleSoft, SAP, Siebel, Oracle Applications, custom systems
• ETL: Informatica PowerMart, Ab Initio, Data Stage, Oracle Warehouse Builder, custom programs, SQL scripts
• Repository: Oracle, SQL Server, Teradata, DB2
• Presentation: custom tools, HTML reports, Cognos, Business Objects, MicroStrategy, Oracle Discoverer, Brio, data mining tools, portals

Components/Building Blocks of DWH
Source Data Component
• Source data can be grouped into four categories
  - Production data
    • Comes from the operational systems of the enterprise
    • Some segments are selected from it
    • Narrow scope, e.g. order details
  - Internal data
    • Private datasheets, documents, customer profiles etc.
    • E.g. customer profiles for a specific offering
    • Needs special strategies to bring it into the DW (e.g. text documents)
  - Archived data
    • Old data is periodically taken out and stored in archived files
    • The DW holds snapshots of historical data
  - External data
    • Executives depend upon external sources
    • E.g. market data on competitors; a car rental company needs data on new car manufacturing
    • Conversions must be defined for it
Data Staging Component

• After data is extracted, it has to be prepared.
• Data extracted from sources needs to be changed, converted and made ready in a suitable format.
• Three major functions make the data ready
  - Extract
  - Transform
  - Load
• The staging area provides a place with a set of functions to
  - Clean
  - Change
  - Combine
  - Convert

Data Staging Component (contd.)

Data movements to the data warehouse

Data Storage Component

• Kept in a separate repository.
• Large volumes of historical data are kept.
• Operational databases structure data in highly normalized formats for fast and efficient processing; the warehouse instead keeps data in structures suitable for analysis.
• Data warehouses are read-only repositories.
• They must be open to different tools.
Metadata Component

• Metadata is simply defined as data about data.
  - For example, the index of a book serves as metadata for the contents of the book.
  - Think of metadata as the Yellow Pages® of your town.
• It is the summarized data that leads us to detailed data.
• It is similar to the data dictionary but much more than that.
• It is the source of information for the management module.
• Data is often worthless without it
  - Can’t sell data (cars) without metadata (manuals)
Types of Metadata

• Operational Metadata
  - Describes details of the processing and accessing of data.
• Extraction and Transformation Metadata
  - Contains data about the extraction frequencies, extraction methods, and business rules for the data extraction.
  - Contains information about all the data transformations that take place in the data staging area.
• End-User Metadata
  - It is the navigational map of the data warehouse.
  - It enables the end-users to find information from the data warehouse.
Information Delivery Component

Management and Control Component

• Sits on top of all other components.
• Coordinates the services and activities within the data warehouse.
• Controls the data transformation and data transfer into the data warehouse storage.
• Moderates information delivery to the users.
• Monitors movement of data into the staging area and from there into the data warehouse storage.
• Interacts with the metadata component to perform management and control functions.

Chapter Summary

• Defining features of the data warehouse are: separate, subject-oriented, integrated, time-variant, and nonvolatile.
• You may use a top-down approach and build a large, comprehensive, enterprise data warehouse; or, you may use a bottom-up approach and build small, independent, departmental data marts.
• In spite of some advantages, both approaches have serious shortcomings.
• A viable practical approach is to build conformed data marts, which together form the corporate data warehouse.
• Data warehouse building blocks or components are: source data, data staging, data storage, information delivery, metadata, and management and control.
• In a data warehouse, metadata is especially significant because it acts as the glue holding all the components together and serves as a roadmap for the end-users.

Trends in Data Warehouse
Chapter 3
Chapter Objectives

• Review the continued growth in data warehousing
• Learn how data warehousing is becoming mainstream
• Discuss several major trends, one by one
• Grasp the need for standards and review the progress
• Understand the Web-enabled data warehouse

What are trends in Data Warehousing?

• What is the current scenario?
• What is the state of the market?
• Which businesses have adopted data warehousing?
• What are the technological advances?

In short, what are the “Significant Trends”?

Continued growth in Data Warehousing

Four significant factors drove many companies to move into data warehousing:
• Fierce competition
• Government deregulation
• Need to revamp (optimize) internal processes
• Imperative (crucial) for customized marketing

Vendor Solutions and Products

Significant Trends

Multiple Data Types

Data Visualization
Parallel Processing

• Performance parameters for a data warehouse:
  - handling complex queries
  - handling simultaneous queries efficiently
  - loading data
  - index creation

Parallel Processing Hardware options

Parallel processing software must be capable of the following:

• Analyzing a large task to identify independent units that can be executed in parallel.
• Identifying which of the smaller units must be executed one after the other.
• Executing the independent units in parallel and the dependent units in the proper sequence.
• Collecting, collating, and consolidating the results returned by the smaller units.

Two options provided by the vendors are:
• parallel server option
• parallel query option
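The four software capabilities listed above (split into independent units, identify dependent units, run the independent ones in parallel, consolidate the results) can be shown in miniature with Python's standard `concurrent.futures` module. This is a sketch of the idea only, not of any vendor's parallel server or parallel query option; the helper names are made up, and a thread pool is used for simplicity even though CPython threads do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, parts):
    """Split one large task into independent units (here: interleaved slices)."""
    return [rows[i::parts] for i in range(parts)]


def partial_sum(chunk):
    """An independent unit of work: it needs nothing from the other chunks."""
    return sum(chunk)


def parallel_total(rows, workers=4):
    chunks = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The independent units run in parallel.
        partials = list(pool.map(partial_sum, chunks))
    # The dependent step runs last: it needs every partial result.
    return sum(partials)


grand_total = parallel_total(list(range(1, 101)))
```

The final `sum(partials)` is the "collecting, collating, and consolidating" step; it cannot start until all chunks have finished.
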


Advantages of Parallel Processing

• Performance improvement for query processing, data loading, and index creation
• Scalability, allowing the addition of CPUs and memory modules without any changes to the existing application
• Fault tolerance so that the database would be available even when some of the parallel processors fail
• Single logical view of the database even though the data may reside on the disks of multiple nodes
Query Tools
• Flexible presentation: Easy to use and able to present results online and on reports in many different formats
• Aggregate awareness: Able to recognize the existence of summary or aggregate tables and automatically route queries to the summary tables when summarized results are desired
• Crossing subject areas: Able to cross over from one subject data mart to another automatically
• Multiple heterogeneous sources: Capable of accessing heterogeneous data sources on different platforms
• Integration: Integrate query tools for online queries, batch reports, and data extraction for analysis, and provide seamless interface to go from one type of output to another
• Overcoming SQL limitations: Provide SQL extensions to handle requests that cannot usually be done through standard SQL
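Aggregate awareness, from the list above, amounts to a routing decision. A minimal Python sketch, assuming a hypothetical catalogue of summary tables and made-up table names:

```python
# Hypothetical catalogue of existing summary tables, keyed by (fact, grain).
SUMMARY_TABLES = {
    ("sales", "month"): "sales_monthly",
    ("sales", "quarter"): "sales_quarterly",
}


def route_query(fact, grain):
    """Use a summary table when one exists for the requested grain, else the detail table."""
    return SUMMARY_TABLES.get((fact, grain), f"{fact}_detail")


monthly_table = route_query("sales", "month")
daily_table = route_query("sales", "day")
```

A real query tool would make this choice from its metadata; the user asks for monthly sales and never needs to know which table answered.
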
Browser Tools

• Tools are extensible to allow definition of any type of data or informational object
• Inclusion of open APIs (application program interfaces)
• Provision of several types of browsing functions including navigation through hierarchical groupings
• Allowing users to browse the catalog (data dictionary or metadata), find an informational object of interest, and proceed further to launch the appropriate query tool with the relevant parameters
• Applying Web browsing and search techniques to browse through the information catalogs
Data Fusion

• Data fusion is a technology dealing with the merging of data from disparate sources.
• The more information stored, the more difficult it is to find the right information at the right time. Data fusion technology is expected to address this problem also.
• Data fusion is still in the realm of research.
Multidimensional analysis & Agent technology

• Multidimensional analysis means that users are able to analyze business measurements in many different ways.
• Multidimensional analysis is also synonymous with online analytical processing (OLAP).
• A software agent is a program that is capable of performing a predefined programmable task on behalf of the user.
• Software agents may even be used for routine monitoring of business performance.

Knowledge management & Data Warehouse

The Web to the Warehouse

• Capturing the clickstream of all the visitors to your company’s Web site and performing all the traditional data warehousing functions.
• Clickstream data tracks how people proceeded through your company’s Web site, what triggers purchases, what attracts people, and what makes them come back.
• Clickstream data enables analysis of several key measures including:
  - Customer demand
  - Effectiveness of marketing promotions
  - Effectiveness of affiliate relationship among products
  - Demographic data collection
  - Customer buying patterns
  - Feedback on Web site design

Chapter Summary

• Data warehousing is becoming mainstream with the spread of high-volume data warehouses and the rapid increase in the number of vendor products.
• To be effective, modern data warehouses need to store multiple types of data: structured and unstructured, including documents, images, audio, and video.
• Data visualization deals with displaying information in several types of visual forms: text, numerical arrays, spreadsheets, charts, graphs, and so on. Tremendous progress has been made in data visualization.
• Data warehouse performance may be improved by using parallel processing with appropriate hardware and software options.
• It is critical to adapt data warehousing to work with ERP packages, knowledge management, and customer relationship systems.
• The data warehousing industry is seriously seeking agreed-upon standards for metadata and OLAP. The end is perhaps in sight.
• Web-enabling the data warehouse means using the Web for information delivery and integrating the clickstream data from the corporate Web site for analysis. The convergence of data warehousing and Web technology is crucial to every business in the 21st century.

Project Planning & Management
Planning your Data Warehouse

Key issues:
• Value and expectation
• Risk assessment
• Top-down or bottom-up
• Build or buy
• Single vendor or best of breed
• Business requirements, not technology
• Top management support
• Justifying your data warehouse
• Overall plan

Guiding principles of a Data Warehouse System

• Sponsorship. No data warehouse project succeeds without strong and committed executive sponsorship.
• Project Manager. It is a serious mistake to have a project manager who is more technology-oriented than user-oriented and business-oriented.
• New Paradigm. Data warehousing is new for most companies; innovative project management methods are essential to deal with the unexpected challenges.
• Team Roles. Team roles are not to be assigned arbitrarily; the roles must reflect the needs of each individual data warehouse project.
• Data Quality. Three critical aspects of data in the data warehouse are: quality, quality, and quality.
• User Requirements. Although obvious, user requirements alone form the driving force of every task on the project schedule.
• Building for Growth. Number of users and number of queries shoot up very quickly after deployment; data warehouses not built for growth will crumble swiftly.
• Project Politics. The first data warehouse project in a company poses challenges and threats to users at different levels.
• Realistic Expectations. It is easy to promise the world in the first data warehouse project; setting expectations at the right and attainable levels is the best course.
• Dimensional Data Modeling. A well-designed dimensional data model is a required foundation and blueprint.
• External Data. A data warehouse does not live by internal data alone; data from relevant external sources is an absolutely necessary ingredient.
• Training. Data warehouse user tools are different and new. If the users do not know how to use the tools, they will not use the data warehouse. An unused data warehouse is a failed data warehouse.

Signs of success

• Queries and reports: rapid increase in the number of queries and reports requested by the users directly from the data warehouse
• Query types: queries becoming more sophisticated
• Active users: steady increase in the number of users
• Usage: users spending more and more time in the data warehouse looking for solutions
• Turnaround times: marked decrease in the times required for obtaining strategic information

Practical Approach

• Running a project in a pragmatic way means constantly monitoring the deviations and slippage, and making in-flight corrections to stay the course. Rearrange the priorities as and when necessary.
• Review project task dependencies continuously. Minimize wait times for dependent tasks.
• Let project schedules act as guides for smooth workflow and achieving results, not just to control and inhibit creativity. Please do not try to control each task to the minutest detail. You will then only have time to keep the schedules up-to-date, with less time to do the real job.

Chapter Summary

• While planning for your data warehouse, key issues to be considered include: setting proper expectations, assessing risks, deciding between top-down or bottom-up approaches, choosing from vendor solutions.
• Business requirements, not technology, must drive your project.
• A data warehouse project without the full support of the top management and without a strong and enthusiastic executive sponsor is doomed to failure from day one.
• A data warehouse project is much different from a typical OLTP system project. The traditional life cycle approach of application development must be changed and adapted for the data warehouse project.
• Standards for organization and assignment of team roles are still in the experimental stage in many projects. Modify the roles to match what is important for your project.
• Participation of the users is mandatory for success of the data warehouse project. Users can participate in a variety of ways.
• Consider the warning signs and success factors; in the final analysis, adopt a practical approach to build a successful data warehouse.

Unit-1
Defining Business Requirements
CHAPTER OBJECTIVES

• Discuss how and why defining requirements is different for a data warehouse
• Understand the role of business dimensions
• Learn about information packages and their use in defining requirements
• Review methods for gathering requirements
• Grasp the significance of a formal requirements definition document

Dimensional Analysis
Usage of Information Unpredictable
• Building a data warehouse is very different from building an operational system.
• In an operational system, users give precise details of
  - the required functions
  - the information content
  - the usage patterns
• In a data warehousing system,
  - Users are unable to define their requirements
  - Users are not sure what information they want from the data warehouse
  - Users are not sure how they would like to use the information
• The difference is especially notable in the requirements gathering phase, because the usage of information is unpredictable.

Dimensional Analysis
Dimensional Nature of Business Data

• Users can give insights into how they think about the business.
• Users can tell you what measurement units are important for them.
• Users can let you know how they measure success in that particular department.
• Users can give you insights into how they combine the various pieces of information for strategic decision making.

Managers think in Business Dimensions

[Figure: analysis of sales units along the three business dimensions]

Examples of Business Dimensions

Information Packages - A New Concept

• The new methodology for determining requirements for a data warehouse system is based on business dimensions.
• It flows out of the need of the users to base their analysis on business dimensions.
• The new concept incorporates the basic measurements and the business dimensions along which the users analyze these basic measurements.
• Using the new methodology, you come up with the measurements and the relevant dimensions that must be captured and kept in the data warehouse.

1. Business dimensions

• In the requirements collection phase, the end users can provide the measurements which are important to their department.
• They can also give insights into how they combine the various pieces of information for strategic decision making.
• Managers think of the business in terms of business dimensions.
• Managers try to evaluate the business along different dimensions.

Information Package example for
analyzing Automobile Sales
2. Dimension Hierarchies/Categories

¢ When a user analyzes the measurements along a business
dimension, the user usually would like to see the numbers
first in summary and then at various levels of detail.
l Hierarchies are paths for drilling down or rolling up in our
analysis.
l Non-hierarchical attributes that are important for analysis
are called categories.
• Holiday flag is a category that helps in evaluating sales
on a holiday.
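Rolling up and drilling down along a hierarchy, and splitting by a category, can be illustrated with a small Python sketch. The sample rows, the three-level Time hierarchy (year, quarter, month), and the function names are assumptions invented for this illustration; they do not come from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, month, holiday_flag, sales_units)
SALES = [
    (2023, "Q1", "Jan", True, 12),
    (2023, "Q1", "Jan", False, 40),
    (2023, "Q1", "Feb", False, 35),
    (2023, "Q2", "Apr", False, 50),
    (2024, "Q1", "Jan", True, 15),
]

# Levels of the assumed Time hierarchy, from summary to detail.
TIME_LEVELS = ("year", "quarter", "month")


def roll_up(rows, level):
    """Total sales units grouped down to the given level of the Time hierarchy."""
    depth = TIME_LEVELS.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        # The first `depth` fields of a row form the hierarchy path.
        totals[row[:depth]] += row[4]
    return dict(totals)


def by_holiday_flag(rows):
    """Total sales units split by the non-hierarchical holiday flag (a category)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[3]] += row[4]
    return dict(totals)


yearly = roll_up(SALES, "year")        # summary view
quarterly = roll_up(SALES, "quarter")  # drill down one level
holiday_split = by_holiday_flag(SALES)
```

Calling `roll_up` with a deeper level corresponds to drilling down; calling it with a shallower level corresponds to rolling up.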
2.1 Dimension Hierarchies and
Categories for sales
¢ Product: Model name, model year, package styling,
product line, product category, exterior color, interior
color, first model year
¢ Dealer: Dealer name, city, state, single brand flag, date
first operation
¢ Customer demographics: Age, gender, income range,
marital status, household size, vehicles owned, home
value, own or rent
¢ Payment method: Finance type, term in months, interest
rate, agent
¢ Time: Date, month, quarter, year, day of week, day of
month, season, holiday flag
Information Package example for
analyzing Automobile Sales
Key Business Metrics or Facts

The set of meaningful and useful metrics for analyzing
automobile sales is as follows:
¢ Actual sale price
¢ MRP sale price
¢ Options price
¢ Full price
¢ Dealer credits
¢ Dealer invoice
¢ Amount of down payment
¢ Manufacturer proceeds
¢ Amount financed
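One way to record the automobile sales information package is as a simple data structure. The sketch below is only an illustration: the class name `InformationPackage`, its fields, and the `dimension_names` method are hypothetical, while the dimension attributes and metrics are the ones listed on the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationPackage:
    """A subject, its business dimensions (with attributes), and its facts."""

    subject: str
    dimensions: Dict[str, List[str]] = field(default_factory=dict)
    facts: List[str] = field(default_factory=list)

    def dimension_names(self) -> List[str]:
        """Return the dimension names in the order they were recorded."""
        return list(self.dimensions)


AUTO_SALES = InformationPackage(
    subject="Automobile sales",
    dimensions={
        "Product": ["Model name", "Model year", "Package styling",
                    "Product line", "Product category", "Exterior color",
                    "Interior color", "First model year"],
        "Dealer": ["Dealer name", "City", "State", "Single brand flag",
                   "Date first operation"],
        "Customer demographics": ["Age", "Gender", "Income range",
                                  "Marital status", "Household size",
                                  "Vehicles owned", "Home value",
                                  "Own or rent"],
        "Payment method": ["Finance type", "Term in months",
                           "Interest rate", "Agent"],
        "Time": ["Date", "Month", "Quarter", "Year", "Day of week",
                 "Day of month", "Season", "Holiday flag"],
    },
    facts=["Actual sale price", "MRP sale price", "Options price",
           "Full price", "Dealer credits", "Dealer invoice",
           "Amount of down payment", "Manufacturer proceeds",
           "Amount financed"],
)
```

Keeping dimensions and facts side by side in one record mirrors the layout of an information package diagram, where dimensions head the columns and the facts are listed beneath them.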
Information Package example for
analyzing Hotel occupancy
Key Business Metrics or Facts

The set of meaningful and useful metrics for analyzing hotel
occupancy is as follows:
¢ Occupied rooms
¢ Vacant rooms
¢ Unavailable rooms
¢ Number of occupants
¢ Revenue
Information Packages Help to:

¢ Define the common subject areas
¢ Design key business metrics
¢ Decide how data must be presented
¢ Determine how users will aggregate or roll up
¢ Decide the data quantity for user analysis or query
¢ Decide how data will be accessed
¢ Establish data granularity
¢ Estimate data warehouse size
¢ Determine the frequency for data refreshing
¢ Ascertain how information must be packaged
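To make the link between granularity and warehouse size concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration only: the dimension cardinalities, the byte widths, and the function names are invented, and multiplying dimension cardinalities is just a common rough upper bound, not a method stated on the slides.

```python
from math import prod


def estimate_fact_rows(cardinalities, density=1.0):
    """Rough upper bound on fact rows at a given grain.

    cardinalities: distinct values per dimension at the chosen grain.
    density: assumed fraction of dimension combinations that actually occur.
    """
    return round(prod(cardinalities.values()) * density)


def estimate_bytes(rows, key_count, fact_count,
                   bytes_per_key=4, bytes_per_fact=8):
    """Rough storage estimate: one key per dimension plus the facts, per row."""
    return rows * (key_count * bytes_per_key + fact_count * bytes_per_fact)


# Hypothetical cardinalities, invented for this example.
DAILY_GRAIN = {"time": 365, "product": 100, "dealer": 50}
MONTHLY_GRAIN = {"time": 12, "product": 100, "dealer": 50}

daily_rows = estimate_fact_rows(DAILY_GRAIN)
monthly_rows = estimate_fact_rows(MONTHLY_GRAIN)

# Storage for the daily grain with 3 dimension keys and 9 facts per row.
daily_bytes = estimate_bytes(daily_rows, key_count=3, fact_count=9)
```

Under these assumed numbers, moving from a monthly to a daily grain multiplies the row count by roughly thirty, which is why the grain has to be settled before the size can be estimated.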
Requirements Gathering Methods
Users of the Data Warehouse

Broadly, we can classify the users of the data warehouse as
follows:
¢ Senior executives (including the sponsors)
¢ Key departmental managers
¢ Business analysts
¢ Operational system DBAs
¢ Others nominated by the above


Requirement gathering methods

What requirements do you need to gather?


¢ Data elements: fact classes, dimensions
¢ Recording of data in terms of time
¢ Data extracts from source systems
¢ Business rules: attributes, ranges, domains,
operational records
Requirement gathering methods
Interviews
¢ Two or three persons at a time
¢ Easy to schedule
¢ Good approach when details are intricate
¢ Some users are comfortable only with one-on-one
interviews
¢ Need good preparation to be effective
¢ Always conduct pre-interview research
¢ Also encourage users to prepare for the interview


Requirement gathering methods

Group Sessions
¢ Groups of twenty or fewer persons at a time
¢ Use only after getting a baseline understanding of the
requirements
¢ Not good for initial data gathering
¢ Useful for confirming requirements
¢ Need to be very well organized


Requirement gathering methods

Review of Existing Documentation
¢ Documentation from user departments
¢ Documentation from IT
Requirements Definition Document Outline

1. Introduction
l State the purpose and scope of the project.
l Include broad project justification.
l Provide an executive summary of each subsequent section.

2. General requirements descriptions
l Describe the source systems reviewed.
l Include interview summaries.
l Broadly state what types of information requirements are
needed in the data warehouse.
Requirements Definition Document Outline

3. Specific requirements
l Include details of source data needed.
l List the data transformation and storage requirements.
l Describe the types of information delivery methods needed by
the users.

4. Information packages
l Provide as much detail as possible for each information package.
l Present them in the form of package diagrams.
Requirements Definition Document Outline

5. Other requirements
l Cover miscellaneous requirements such as data extract
frequencies, data loading methods, and locations to which
information must be delivered.

6. User expectations
l State the expectations in terms of problems and opportunities.
l Indicate how the users expect to use the data warehouse.
Requirements Definition Document Outline

7. User participation and sign-off
l List the tasks and activities in which the users are expected to
participate throughout the development life cycle.

8. General implementation plan
l At this stage, give a high-level plan for implementation.
Chapter Summary

¢ Unlike the requirements for an operational system, the requirements for a data
warehouse are quite nebulous.
¢ Business data is dimensional in nature and the users of the data warehouse think in
terms of business dimensions.
¢ A requirements definition for the data warehouse can, therefore, be based on
business dimensions such as product, geography, time, and promotion.
¢ Information packages—a new concept—are the backbone of the requirements
definition. An information package records the critical measurements or facts and
business dimensions along which the facts are normally analyzed.
¢ Interviews and group sessions are standard methods for collecting requirements.
¢ Key people to be interviewed or to be included in group sessions are senior
executives (including the sponsors), departmental managers, business analysts, and
operational systems DBAs.
¢ Review all existing documentation of related operational systems.
¢ Scope and content of the requirements definition document include data sources,
data transformation, data storage, information delivery, and information package
diagrams.
