Chap 2
mohibkhan483@gmail.com
Book
Chapter 2
DEFINING FEATURES
Subject-Oriented Data
• In operational systems, we store data by individual applications
• For example, in an order processing application, we keep the data for
that particular application
• These data sets provide the data for all the functions for entering
orders, checking stock, verifying customers' credit, etc.
• In a data warehouse, data is stored by real-world business subjects or
events, not by applications
• In a data warehouse, all the data sets relating to the same real-world
business subject or event are tied together
• Data is linked and stored by business subjects
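The regrouping described above can be sketched in Python. This is a minimal illustration, not a prescribed design: the application names, field names, and sample records are hypothetical, and `group_by_subject` is a helper invented for this example.

```python
from collections import defaultdict

# Hypothetical operational data, stored separately by each application.
order_processing = [
    {"customer_id": "C1", "order_no": 101, "amount": 250.0},
    {"customer_id": "C2", "order_no": 102, "amount": 75.5},
]
credit_checking = [
    {"customer_id": "C1", "credit_limit": 1000.0},
    {"customer_id": "C2", "credit_limit": 300.0},
]


def group_by_subject(sources, key):
    """Tie together records from several applications under one subject key."""
    subject = defaultdict(dict)
    for app_name, records in sources.items():
        for record in records:
            # Keep each application's records under the shared subject key.
            subject[record[key]].setdefault(app_name, []).append(record)
    return dict(subject)


# Everything known about a customer is now reachable from the customer subject.
customer_subject = group_by_subject(
    {"orders": order_processing, "credit": credit_checking},
    key="customer_id",
)
```

Here the customer, rather than the order or credit application, becomes the unit under which data is linked and stored.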
Integrated Data
• The data in the data warehouse comes from several operational
systems
• Source data reside in different databases, files, and data segments
• The file layouts, character code representations, and field naming
conventions all could be different
• Here are some of the items that would need to be standardized and
made consistent:
• † Naming conventions
• † Codes
• † Data attributes
• † Measurements
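As a rough Python sketch of this standardization, assume two source systems that name the same fields differently, use different status codes, and record weight in different units. All record layouts, codes, and function names below are hypothetical.

```python
# Hypothetical records from two source systems with different conventions.
billing_record = {"cust_nm": "Acme", "status_cd": "A", "weight_lb": 220.0}
shipping_record = {"CUSTOMER-NAME": "Acme", "STATUS": "1", "WEIGHT-KG": 100.0}

LB_PER_KG = 2.20462  # conversion factor used to standardize measurements


def from_billing(rec):
    """Map the billing layout onto the warehouse's standard names, codes, units."""
    return {
        "customer_name": rec["cust_nm"],
        "status": {"A": "active", "I": "inactive"}[rec["status_cd"]],
        "weight_kg": round(rec["weight_lb"] / LB_PER_KG, 1),
    }


def from_shipping(rec):
    """Map the shipping layout onto the same standard names, codes, units."""
    return {
        "customer_name": rec["CUSTOMER-NAME"],
        "status": {"1": "active", "0": "inactive"}[rec["STATUS"]],
        "weight_kg": round(rec["WEIGHT-KG"], 1),
    }


# After integration, both records share one naming, coding, and unit scheme.
integrated = [from_billing(billing_record), from_shipping(shipping_record)]
```

Each source gets its own mapping function, so differences in naming conventions, codes, and measurements are resolved before the data reaches the warehouse.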
Time-Variant Data
• A data warehouse contains historical data, not just current values
• Changes to data are tracked and recorded
• If necessary, reports can be produced to show changes over time
• Every data structure in the data warehouse contains the time
element
• This aspect of the data warehouse is quite significant for both the
design and the implementation phases
• For example, the sales quantity in a record may relate to a specific
date, week, month, or quarter
• The time-variant nature of the data in a data warehouse:
• † Allows for analysis of the past
• † Relates information to the present
• † Enables forecasts for the future
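One way to picture the time element is a table in which every row carries the date it took effect. The sketch below is illustrative only: the product, dates, prices, and the `price_as_of` helper are assumptions made for this example.

```python
from datetime import date

# Hypothetical history rows: every row carries its own time element.
price_history = [
    {"product": "P1", "effective": date(2023, 1, 1), "price": 10.0},
    {"product": "P1", "effective": date(2023, 7, 1), "price": 12.0},
    {"product": "P1", "effective": date(2024, 1, 1), "price": 12.5},
]


def price_as_of(history, product, as_of):
    """Return the price in effect on a given date, or None if none applies."""
    rows = [
        r for r in history
        if r["product"] == product and r["effective"] <= as_of
    ]
    if not rows:
        return None
    # The most recent row on or before the requested date is the one in effect.
    return max(rows, key=lambda r: r["effective"])["price"]
```

Because earlier rows are kept rather than overwritten, a query such as `price_as_of(price_history, "P1", date(2023, 8, 15))` can answer questions about the past as well as the present.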
Nonvolatile Data
• Once the data is captured and committed in the data warehouse,
you do not run individual transactions to change the data there
• Data updates are commonplace in an operational database; not so
in a data warehouse
• The data in a data warehouse is not as volatile as the data in an
operational database
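A minimal Python sketch of this behavior, assuming a hypothetical `WarehouseTable` class that accepts batch loads and offers no update operation. The account data is invented for illustration.

```python
class WarehouseTable:
    """Hypothetical append-only table: rows are loaded in batches, never updated."""

    def __init__(self):
        self._rows = []

    def load_batch(self, batch):
        # Each load adds new rows; existing rows are left untouched.
        self._rows.extend(dict(row) for row in batch)

    def query(self, predicate=lambda row: True):
        # Return copies so that readers cannot modify stored rows.
        return [dict(row) for row in self._rows if predicate(row)]


balances = WarehouseTable()
balances.load_batch([{"account": "A1", "as_of": "2024-01-31", "balance": 500.0}])
balances.load_batch([{"account": "A1", "as_of": "2024-02-29", "balance": 420.0}])

# A reader changing its own copy does not alter what the warehouse holds.
snapshot = balances.query()
snapshot[0]["balance"] = 0.0
```

An operational database would overwrite the January balance with the February one; here the new balance arrives as an additional row in a later load.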
Data Granularity
• Frequently, the analysis begins at a high level and moves down to
lower levels of detail
• It is efficient to keep data summarized at different levels
• Depending on the query, we can go to the particular level of detail
and satisfy the query
• Data granularity in a data warehouse refers to the level of detail
• The lower the level of detail, the finer the data granularity
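The two levels of granularity can be illustrated in Python with daily detail rows and a monthly summary derived from them. The sales figures and both helper functions are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail rows at the finest granularity: one row per day.
daily_sales = [
    {"day": date(2024, 1, 5), "product": "P1", "qty": 3},
    {"day": date(2024, 1, 20), "product": "P1", "qty": 2},
    {"day": date(2024, 2, 2), "product": "P1", "qty": 4},
]


def summarize_by_month(rows):
    """Roll daily rows up to a coarser (year, month, product) level."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["day"].year, r["day"].month, r["product"])] += r["qty"]
    return dict(totals)


def drill_down(rows, year, month, product):
    """Return the detail rows behind one monthly summary figure."""
    return [
        r for r in rows
        if (r["day"].year, r["day"].month, r["product"]) == (year, month, product)
    ]


monthly_sales = summarize_by_month(daily_sales)
```

A query that needs only monthly totals reads `monthly_sales`; one that needs the underlying days calls `drill_down` against the detail rows.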
DATA WAREHOUSES AND DATA MARTS
• Should you build a large data warehouse and then let that
repository feed data into local data marts?
• Should you build individual local data marts, and combine them to
form your overall data warehouse?
• Should these local data marts be independent of one another?
• Or should they be dependent on the overall data warehouse for
data feed?
• Should you build a pilot data mart?
• These are crucial questions
Top-Down Approach
• Data in the data warehouse is stored at the lowest level of
granularity; based on a normalized data model
• In the Inmon vision, the data warehouse is at the center of the
“Corporate Information Factory” (CIF), which delivers business
intelligence to the enterprise
• Business operations provide data to drive the CIF
• The centralized data warehouse would feed the dependent data
marts that may be designed based on a dimensional data model
• The advantages of this approach are:
• † A truly corporate effort, an enterprise view of data
• † Inherently architected, not a union of disparate data marts
• † Single, central storage of data about the content
• † Centralized rules and control
• † May see quick results if implemented with iterations
• The disadvantages are:
• † Takes longer to build even with an iterative method
• † High exposure to risk of failure
• † Needs high level of cross-functional skills
• † High expense without proof of concept
Bottom-Up Approach
• Data marts are created first to provide analytical and reporting
capabilities for specific business subjects
• Data marts may contain detailed data or summaries depending on
the needs for analysis
• These data marts are joined or “unioned” together by conforming
the dimensions
• The advantages of this approach are:
• † Faster and easier implementation of manageable pieces
• † Favorable return on investment and proof of concept
• † Less risk of failure
• † Inherently incremental; can schedule important data marts first
• † Allows project team to learn and grow
• The disadvantages are:
• † Each data mart has its own narrow view of data
• † Permeates redundant data in every data mart
• † Perpetuates inconsistent and irreconcilable data
• † Proliferates unmanageable interfaces
A Practical Approach
• We do not lose sight of the overall big picture for the entire
enterprise
• We base our planning on this overall big picture
• This aspect is from the top-down approach
• Adopt the principles of the bottom-up approach and build the
conformed data marts based on a priority scheme
• The steps in this practical approach are as follows:
• 1. Plan and define requirements at the overall corporate level
• 2. Create a surrounding architecture for a complete warehouse
• 3. Conform and standardize the data content
• 4. Implement the data warehouse as a series of supermarts, one at a
time
• Plan at the enterprise level
• Gather requirements at the overall level
• Establish the architecture for the complete warehouse
• Determine the data content for each supermart
• Supermarts are carefully architected data marts
• Implement these supermarts, one at a time
• Make sure that the data content among the various supermarts is
conformed in terms of data types, field lengths, precision, etc.
• A certain data element must mean the same thing in every
supermart
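A conformance check of this kind could be sketched in Python as follows. The column definitions and the `find_nonconformed` helper are hypothetical, and each definition is assumed to be a (data type, field length, precision) tuple.

```python
# Hypothetical column definitions for two supermarts:
# each entry is (data type, field length, precision).
sales_mart = {"customer_id": ("char", 10, 0), "amount": ("decimal", 12, 2)}
shipping_mart = {"customer_id": ("char", 12, 0), "ship_date": ("date", 8, 0)}


def find_nonconformed(mart_a, mart_b):
    """Report data elements shared by both marts whose definitions differ."""
    shared = mart_a.keys() & mart_b.keys()
    return {
        name: (mart_a[name], mart_b[name])
        for name in sorted(shared)
        if mart_a[name] != mart_b[name]
    }


# customer_id appears in both marts with different field lengths.
mismatches = find_nonconformed(sales_mart, shipping_mart)
```

This only compares physical definitions. Whether a data element means the same thing in every supermart still has to be settled by the people defining the content.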
ARCHITECTURAL TYPES
Federated
• Some companies get into data warehousing with an existing legacy
of an assortment of decision-support structures
• These take the form of operational systems, extracted datasets, and
primitive data marts
• It is unwise to discard everything and start from scratch
• In the federated type, data may be physically or logically integrated
through shared key fields, overall global metadata, distributed queries,
and so on
• In this architectural type, there is no one overall data warehouse
Hub-and-Spoke
• Similar to the centralized data warehouse type, there is an overall
enterprise-wide data warehouse
• Atomic data in the third normal form is stored in the centralized data
warehouse
• This architectural type includes dependent data marts
• Dependent data marts obtain data from the centralized data
warehouse
• The centralized data warehouse forms the hub to feed data to the
data marts on the spokes
• Each dependent data mart may have normalized, denormalized,
summarized, or dimensional data structures based on individual
requirements
• Most queries are directed to the dependent data marts
• The centralized data warehouse may also be used
• This type results from a top-down approach to data warehouse
development
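The hub-and-spoke flow can be pictured with a small Python sketch in which one central set of atomic rows feeds two differently structured dependent marts. The rows and both feed functions are hypothetical.

```python
# Hypothetical atomic rows held in the centralized data warehouse (the hub).
hub = [
    {"region": "East", "product": "P1", "qty": 3},
    {"region": "East", "product": "P2", "qty": 1},
    {"region": "West", "product": "P1", "qty": 7},
]


def feed_regional_mart(hub_rows, region):
    """Dependent mart holding one region's rows at atomic detail."""
    return [dict(r) for r in hub_rows if r["region"] == region]


def feed_product_summary_mart(hub_rows):
    """Dependent mart holding quantities summarized by product."""
    totals = {}
    for r in hub_rows:
        totals[r["product"]] = totals.get(r["product"], 0) + r["qty"]
    return totals


# Both spokes are derived from the same hub, so they cannot disagree on source data.
east_mart = feed_regional_mart(hub, "East")
product_mart = feed_product_summary_mart(hub)
```

Each spoke chooses its own structure, detailed or summarized, while the hub remains the single source that feeds them.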
Data-Mart Bus
• Begin with analyzing requirements for a specific business subject
such as orders, shipments, billings, insurance claims, car rentals
• Build the first data mart using business dimensions and metrics
• These business dimensions will be shared in the future data marts
• By conforming dimensions among the various data marts, the
result would be logically integrated supermarts that will provide an
enterprise view of the data
• The data marts contain atomic data organized as a dimensional
data model
• This architectural type results from adopting an enhanced bottom-up
approach to data warehouse development
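The effect of conforming dimensions can be sketched in Python. Assume two data marts, orders and shipments, built separately but sharing one product dimension. All keys, names, and quantities are hypothetical, as are the two helper functions.

```python
# Hypothetical conformed dimension: both data marts use the same product keys.
product_dim = {1: "Widget", 2: "Gadget"}

# Hypothetical fact rows from two separately built data marts.
orders_fact = [
    {"product_key": 1, "ordered_qty": 10},
    {"product_key": 2, "ordered_qty": 5},
    {"product_key": 1, "ordered_qty": 4},
]
shipments_fact = [
    {"product_key": 1, "shipped_qty": 12},
    {"product_key": 2, "shipped_qty": 5},
]


def total_by_product(fact_rows, measure):
    """Sum one measure per product key."""
    totals = {}
    for row in fact_rows:
        key = row["product_key"]
        totals[key] = totals.get(key, 0) + row[measure]
    return totals


def drill_across(dim, orders, shipments):
    """Combine measures from both marts through the shared dimension."""
    ordered = total_by_product(orders, "ordered_qty")
    shipped = total_by_product(shipments, "shipped_qty")
    return {
        name: {"ordered": ordered.get(key, 0), "shipped": shipped.get(key, 0)}
        for key, name in dim.items()
    }


enterprise_view = drill_across(product_dim, orders_fact, shipments_fact)
```

Because both marts agree on what a product key means, their measures can be lined up side by side, which is the sense in which the marts become logically integrated.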
OVERVIEW OF THE COMPONENTS
Metadata Component
• Metadata in a data warehouse is similar to the data dictionary or the
data catalog in a database management system
• The metadata component is the data about the data in the data
warehouse
• Although similar to a data dictionary, metadata in a data warehouse is
much more than a data dictionary
METADATA IN THE DATA WAREHOUSE
Operational Metadata
• Data for the data warehouse comes from several operational systems
of the enterprise
• These source systems contain different data structures
• The data elements selected for the data warehouse have various field
lengths and data types
• In selecting data from the source systems for the data warehouse,
you split records, combine parts of records from different source
files, and deal with multiple coding schemes and field lengths
• When you deliver information to the end-users, you must be able to
tie that back to the original source data sets
• Operational metadata contains all of this information about the
operational data sources
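As an illustration, operational metadata could be represented in Python as a set of source-to-warehouse mappings. The `SourceMapping` structure, the system names, and the field names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMapping:
    """Hypothetical operational-metadata entry for one warehouse field."""
    warehouse_field: str
    source_system: str
    source_field: str
    source_type: str
    source_length: int


operational_metadata = [
    SourceMapping("customer_name", "billing", "cust_nm", "char", 30),
    SourceMapping("customer_name", "shipping", "CUSTOMER-NAME", "char", 40),
    SourceMapping("order_amount", "orders", "ord_amt", "decimal", 12),
]


def trace_to_sources(metadata, warehouse_field):
    """List the source systems and fields that feed one warehouse field."""
    return [
        (m.source_system, m.source_field)
        for m in metadata
        if m.warehouse_field == warehouse_field
    ]
```

A call such as `trace_to_sources(operational_metadata, "customer_name")` ties a delivered data element back to the original source data sets.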
End-User Metadata
• The end-user metadata is the navigational map of the data
warehouse
• It enables the end-users to find information from the data warehouse
• The end-user metadata allows the end-users to use their own
business terminology and look for information in those ways in which
they normally think of the business
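A minimal Python sketch of such a navigational map, assuming a hypothetical catalog that translates business terms into warehouse table and column names. The terms, tables, and columns are invented for this example.

```python
# Hypothetical end-user metadata: business terms mapped to warehouse locations.
end_user_metadata = {
    "revenue": {"table": "sales_fact", "column": "sale_amt"},
    "customer": {"table": "customer_dim", "column": "cust_name"},
}


def find_by_business_term(catalog, term):
    """Look up where a business term lives; returns None for unknown terms."""
    return catalog.get(term.strip().lower())
```

An end-user can then ask for "Revenue" without knowing that the figure is stored as `sale_amt` in `sales_fact`.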
Special Significance
• Metadata acts as the glue that connects all parts of the data warehouse
• It provides information about the contents and structures to the
developers
• It opens the door to the end-users and makes the contents
recognizable in their own terms