Topic 4 (Data Warehouse)
Information
Support
3 Main Focuses
Data Warehouse
Data Visualization
Business Analytics
1. Data Warehouse
Lesson Outcomes
▸ Understand the basic definitions and concepts of data warehouses
▸ Learn the different types of data warehousing architectures and their comparative
advantages and disadvantages
▸ Describe the processes used in developing and managing data warehouses
▸ Explain data warehousing operations
▸ Explain the role of data warehouses in decision support
▸ Explain data integration and the extraction, transformation, and load (ETL) processes
▸ Describe real-time (a.k.a. right-time and/or active) data warehousing
▸ Understand data warehouse administration and security issues
Main Topics
▸ DW definitions
▸ Characteristics of DW
▸ Data Marts
▸ ODS, EDW, Metadata
▸ DW Framework
▸ DW Architecture & ETL Process
▸ DW Development
▸ DW Issues
“The data warehouse is a collection of
integrated, subject-oriented databases
designed to support DSS functions, where
each unit of data is non-volatile and
relevant to some moment in time”
▸ A data warehouse is a physical repository where relational data are specially
organized to provide enterprise-wide, cleansed data in a
standardized format
Characteristics of DW
▸ Subject orientation: data is organized based on how users
refer to it.
Characteristics of DW
Additional characteristics
▸ Web based: Provides an efficient computing environment for web-based
applications.
▸ Relational/multidimensional: Uses either a relational
structure or a multidimensional structure.
▸ Client/server: Uses the client/server architecture to
provide easy access for end users.
▸ Real time: Newer data warehouses provide real-time
data-access and analysis capabilities.
▸ Includes metadata: Data about how the data are organized.
Characteristics of DW
3 main types of DW are:
▸ Data marts
▹ Smaller and more focused
▸ Operational Data Stores (ODS)
▹ Interim staging area for DW
▸ Enterprise Data Warehouse (EDW)
▹ Large-scale DW
Data Mart
▸ A departmental data warehouse that stores only relevant data
Operational Data Store (ODS)
▸ A type of database that is often used as an interim staging area for a data warehouse
▸ Integration of data from many sources into a
standard format
Metadata
▸ Data about data
▸ In a data warehouse, metadata describe the contents of a
data warehouse and the manner of its acquisition and use.
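To make "data about data" concrete, here is a minimal Python sketch of a metadata record for a single warehouse table. The class name TableMetadata, its fields, and the sample values are all hypothetical illustrations, not part of any particular product: some fields describe the contents, others the acquisition and use.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class TableMetadata:
    """Hypothetical metadata record describing one warehouse table."""
    table_name: str            # contents: which table is described
    description: str           # contents: what the table holds
    columns: Dict[str, str]    # contents: column name -> data type
    source_system: str         # acquisition: where the data came from
    load_method: str           # acquisition: how the data were loaded
    last_loaded: date          # acquisition: most recent load date
    intended_use: List[str] = field(default_factory=list)  # use

    def summary(self) -> str:
        """Return a one-line, human-readable description of the table."""
        cols = ", ".join(f"{name} ({dtype})" for name, dtype in self.columns.items())
        return (f"{self.table_name}: {self.description}. Columns: {cols}. "
                f"Loaded from {self.source_system} via {self.load_method} "
                f"on {self.last_loaded.isoformat()}.")


# Illustrative values only.
sales_meta = TableMetadata(
    table_name="sales_fact",
    description="Daily sales totals per store",
    columns={"store_id": "INTEGER", "sale_date": "TEXT", "amount": "REAL"},
    source_system="POS",
    load_method="nightly ETL batch",
    last_loaded=date(2024, 1, 6),
    intended_use=["routine reporting", "OLAP"],
)
print(sales_meta.summary())
```

In practice such records are usually kept in a metadata repository or catalog rather than in application code; the sketch only shows the kind of information involved.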
A Conceptual Framework for DW
[Figure: Data sources (ERP, legacy, POS, other OLTP/Web, external data) feed an ETL process (select, extract, transform, integrate, load) into the enterprise data warehouse and its metadata. Data are replicated to data marts (marketing, engineering, finance, ...), or served directly in the no-data-marts option. Access is through API/middleware by applications and visualization: routine business reporting, data/text mining, OLAP, dashboard, Web, and custom-built applications.]
Generic DW Architectures
Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access and analyze data from the
warehouse
Two-tier architecture
▸ The first two tiers of the three-tier architecture are combined into one
Generic DW Architectures
Tier 1: Client workstation
Tier 2: Application & database server
Web-based DW Architectures
[Figure: A client (Web browser) connects over the Internet/intranet/extranet to a Web server, which serves Web pages and works with an application server and the data warehouse.]
Alternative DW Architectures
(a) Independent Data Marts Architecture
[Figure: four diagrams. In each, data flow from source systems through a staging area and ETL into a data store, then to end-user access and applications. The data stores shown are:
- Independent data marts (atomic/summarized data)
- Dimensionalized data marts linked by conformed dimensions (atomic/summarized data)
- Normalized relational warehouse (atomic data)
- Normalized relational warehouse (atomic/some summarized data)]
Ten factors that potentially affect the
architecture selection decision:
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors
Enterprise Data Warehouse (by Teradata Corporation)
Data Integration and the Extraction,
Transformation, and Load (ETL) Process
Data Integration
• Integration that comprises three major processes: data access, data federation (the
integration of business views across multiple data stores), and change capture (based on the
identification, capture and delivery of changes made to the enterprise data stores).
[Figure: Sources (packaged application, legacy system, other internal applications) and a transient data source pass through Extract, Transform, Cleanse, and Load steps into the data warehouse and data mart.]
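The extract, transform, cleanse, and load steps can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not a production ETL tool: the sample CSV text, the orders table, and the cleansing rules (drop rows with a missing amount, drop duplicate order ids) are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (e.g., a POS extract).
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,19.99,2024-01-05
2,BOB,,2024-01-06
3,carol,5.00,2024-01-06
3,carol,5.00,2024-01-06
"""


def extract(raw_text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: standardize formats (trimmed title-case names, numeric amounts)."""
    out = []
    for r in rows:
        amount = r["amount"].strip()
        out.append({
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().title(),
            "amount": float(amount) if amount else None,
            "order_date": r["order_date"].strip(),
        })
    return out


def cleanse(rows):
    """Cleanse: drop rows with a missing amount and duplicate order ids."""
    seen = set()
    clean = []
    for r in rows:
        if r["amount"] is None or r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        clean.append(r)
    return clean


def load(rows, conn):
    """Load: write the cleansed rows into the (in-memory) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer, :amount, :order_date)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(cleanse(transform(extract(RAW_CSV))), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)  # [(1, 'Alice', 19.99), (3, 'Carol', 5.0)]
```

Order 2 is rejected because its amount is missing, and the repeated order 3 is loaded only once. A real pipeline would typically log or quarantine rejected rows instead of silently dropping them.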
ETL
Issues affecting the purchase of an ETL tool
• Data transformation tools are expensive
• Data transformation tools may have a long learning
curve
Important criteria in selecting an ETL tool
• Ability to read from and write to an unlimited number of
data sources/architectures
• Automatic capturing and delivery of metadata
• A history of conforming to open standards
• An easy-to-use interface for the developer and the
functional user
Benefits of DW
Direct benefits of a data warehouse
• Allows end users to perform extensive analysis
• Allows a consolidated view of corporate data
• Better and more timely information
• Enhanced system performance
• Simplification of data access
Indirect benefits of a data warehouse
• Enhance business knowledge
• Present competitive advantage
• Enhance customer service and satisfaction
• Facilitate decision making
• Help in reforming business processes
Data Warehouse Development
▸ Data warehouse development approaches
Inmon Model: EDW approach (top-down)
Kimball Model: Data mart approach (bottom-up)
Which model is best?
- There is no one-size-fits-all strategy for DW
- The choice depends on user demand, business requirements, and
enterprise maturity
• One alternative is the hosted warehouse
[Figure: A central Claim Information fact table linked to four dimension tables: Driver, Automotive, Location, and Time.]
Facts: Central table that contains (usually summarized) information; also contains
foreign keys to access each dimension table.
Dimensions: How data will be sliced/diced (e.g., by location, time period, type of
automobile or driver)
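The fact/dimension layout described here (commonly called a star schema) can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below are hypothetical; only the overall shape (one claim fact table with foreign keys to driver, automobile, location, and time dimensions) follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Four dimension tables and one central fact table (illustrative names).
conn.executescript("""
CREATE TABLE driver_dim   (driver_id INTEGER PRIMARY KEY, age_group TEXT);
CREATE TABLE auto_dim     (auto_id INTEGER PRIMARY KEY, auto_type TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE claim_fact (
    claim_id     INTEGER PRIMARY KEY,
    driver_id    INTEGER REFERENCES driver_dim(driver_id),
    auto_id      INTEGER REFERENCES auto_dim(auto_id),
    location_id  INTEGER REFERENCES location_dim(location_id),
    time_id      INTEGER REFERENCES time_dim(time_id),
    claim_amount REAL
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO driver_dim VALUES (?, ?)", [(1, "18-25"), (2, "26-40")])
conn.executemany("INSERT INTO auto_dim VALUES (?, ?)", [(1, "Sedan"), (2, "SUV")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?)", [(1, "Austin"), (2, "Boston")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [(1, 2023, 4), (2, 2024, 1)])
conn.executemany("INSERT INTO claim_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 1, 1200.0),
    (2, 2, 2, 1, 2, 800.0),
    (3, 1, 2, 2, 2, 500.0),
])

# "Slice and dice" by one dimension: total claim amount per city.
by_city = conn.execute("""
    SELECT l.city, SUM(f.claim_amount)
    FROM claim_fact AS f
    JOIN location_dim AS l ON f.location_id = l.location_id
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
print(by_city)  # [('Austin', 2000.0), ('Boston', 500.0)]
```

Swapping location_dim for another dimension table in the join gives the same totals broken down by driver, automobile type, or time period.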
Dimensional Modelling
Data cube
A two-dimensional, three-dimensional, or higher-dimensional object in which
each dimension of the data represents a measure of interest
- Grain: the highest level of detail
- Drill-down/up
- Slicing
OLAP Operations
• Slicing: A slice is a subset of a multidimensional array (usually a two-dimensional
representation) corresponding to a single value set for one (or more) of the
dimensions not in the subset.
• Roll-up: A roll-up involves computing all of the data relationships for one or more
dimensions.
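As a rough illustration of these two operations, the sketch below models a small cube as a Python dictionary keyed by (region, product, quarter). The data, the dimension names, and the helper functions slice_cube and roll_up are hypothetical; roll-up is shown here as summation over the dimensions that are dropped, which is one common form of aggregation.

```python
from collections import defaultdict

# Hypothetical sales cube: (region, product, quarter) -> units sold.
DIMS = ("region", "product", "quarter")
cube = {
    ("East", "Laptop", "Q1"): 10,
    ("East", "Laptop", "Q2"): 12,
    ("East", "Phone", "Q1"): 20,
    ("West", "Laptop", "Q1"): 7,
    ("West", "Phone", "Q1"): 15,
    ("West", "Phone", "Q2"): 18,
}


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and keep the remaining dimensions."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}


def roll_up(cube, keep):
    """Roll-up: sum the measure over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[i] for i in idx)] += v
    return dict(totals)


q1 = slice_cube(cube, "quarter", "Q1")   # two-dimensional slice for Q1
by_region = roll_up(cube, ["region"])    # totals per region

print(q1)
print(by_region)  # {('East',): 42, ('West',): 40}
```

OLAP servers perform the same operations over far larger cubes, typically with precomputed aggregates rather than a full scan per request.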
Be politically aware.
Risks for Implementing DW
▸ No mission or objective
▸ Architectural and design risks
Real-time DW
(a.k.a. Active Data Warehousing)
▸ Enabling real-time data updates for real-time analysis and real-time
decision making is growing rapidly
▹ Push vs. Pull (of data)
▸ Concerns about real-time BI
▹ Not all data should be updated continuously
▹ Mismatch of reports generated minutes apart
▹ May be cost prohibitive
▹ May also be infeasible
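The push vs. pull distinction can be sketched in Python as follows. All class and method names here (SourceSystem, PushWarehouse, PullWarehouse, refresh) are hypothetical: with push, the source delivers each change to the warehouse as it happens; with pull, the warehouse asks the source for new changes when it chooses to refresh.

```python
class SourceSystem:
    """Hypothetical operational source that keeps an ordered change log."""

    def __init__(self):
        self.log = []           # every change, in arrival order
        self.subscribers = []   # callbacks used for push delivery

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def write(self, record):
        self.log.append(record)
        for callback in self.subscribers:
            callback(record)    # push: the source sends the change immediately

    def changes_since(self, position):
        return self.log[position:]  # pull: the consumer asks for what is new


class PushWarehouse:
    """Receives every change as soon as the source writes it."""

    def __init__(self, source):
        self.rows = []
        source.subscribe(self.rows.append)


class PullWarehouse:
    """Fetches new changes only when refresh() is called (e.g., on a schedule)."""

    def __init__(self, source):
        self.source = source
        self.rows = []

    def refresh(self):
        new_rows = self.source.changes_since(len(self.rows))
        self.rows.extend(new_rows)
        return len(new_rows)


source = SourceSystem()
push_dw = PushWarehouse(source)
pull_dw = PullWarehouse(source)

source.write({"sale": 1})
source.write({"sale": 2})

rows_before_refresh = len(pull_dw.rows)  # pull side has not caught up yet
fetched = pull_dw.refresh()              # now it has
print(len(push_dw.rows), rows_before_refresh, fetched)  # 2 0 2
```

The gap between the two warehouses before refresh() is a small-scale version of the "mismatch of reports generated minutes apart" concern: two reports run at nearly the same time can disagree if their data were refreshed at different moments.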
Evolution of DSS & DW
Active Data Warehousing
(by Teradata Corporation)
Comparing Traditional and Active DW
Data Warehouse Administration
▸ Due to its huge size and its intrinsic nature, a DW requires especially strong
monitoring in order to sustain its efficiency, productivity, and security.
▸ The successful administration and management of a data warehouse entails
skills and proficiency that go beyond what is required of a traditional database
administrator.
▹ Requires expertise in high-performance software, hardware, and networking
technologies
DW Scalability and Security
▸ Scalability
▹ The main issues pertaining to scalability:
▹ The amount of data in the warehouse
▹ How quickly the warehouse is expected to grow
▹ The number of concurrent users
▹ The complexity of user queries
▹ Good scalability means that the cost of queries and other data-access functions grows
(ideally) linearly with the size of the warehouse
▸ Security
▹ Emphasis on security and privacy