Topic 4 (Data Warehouse)


ISP688/ITS674: INTELLIGENT DECISION MAKING SUPPORT SYSTEMS

Information Support: 3 Main Focuses
▸ Data Warehouse
▸ Data Visualization
▸ Business Analytics

1 Data Warehouse
Lesson Outcomes
▸ Understand the basic definitions and concepts of data warehouses
▸ Learn different types of data warehousing architectures; their comparative advantages
and disadvantages
▸ Describe the processes used in developing and managing data warehouses
▸ Explain data warehousing operations
▸ Explain the role of data warehouses in decision support
▸ Explain data integration and the extraction, transformation, and load (ETL) processes
▸ Describe real-time (a.k.a. right-time and/or active) data warehousing
▸ Understand data warehouse administration and security issues
Main Topics
▸ DW definitions
▸ Characteristics of DW
▸ Data Marts
▸ ODS, EDW, Metadata
▸ DW Framework
▸ DW Architecture & ETL Process
▸ DW Development
▸ DW Issues

“The data warehouse is a collection of
integrated, subject-oriented databases
designed to support DSS functions, where
each unit of data is non-volatile and
relevant to some moment in time.”
A data warehouse is also a physical repository where relational data are specially
organized to provide enterprise-wide, cleansed data in a
standardized format.

Characteristics of DW
▸ Subject orientation: data are organized by subject, based on how users refer to them.
▸ Integrated: all inconsistencies regarding naming conventions and value representations are removed.
▸ Non-volatile: data are stored in read-only format and do not change over time.
▸ Time variant: data are not necessarily current but are normally kept as a time series (historical data).
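The non-volatile and time-variant characteristics can be illustrated with a small sketch. It assumes an append-only history table in SQLite; the table name `balance_history`, its columns, and the sample values are hypothetical and only serve the illustration.

```python
import sqlite3

def build_history(conn):
    """Create a history table holding one row per customer per snapshot date."""
    conn.execute(
        "CREATE TABLE balance_history ("
        " customer_id INTEGER, as_of TEXT, balance REAL,"
        " PRIMARY KEY (customer_id, as_of))"
    )

def load_snapshot(conn, as_of, balances):
    """Append a dated snapshot; earlier rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO balance_history VALUES (?, ?, ?)",
        [(cid, as_of, bal) for cid, bal in balances.items()],
    )

def balance_as_of(conn, customer_id, as_of):
    """Return the balance from the latest snapshot on or before as_of."""
    row = conn.execute(
        "SELECT balance FROM balance_history"
        " WHERE customer_id = ? AND as_of <= ?"
        " ORDER BY as_of DESC LIMIT 1",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None

# Hypothetical month-end snapshots (ISO dates sort correctly as text).
conn = sqlite3.connect(":memory:")
build_history(conn)
load_snapshot(conn, "2024-01-31", {1: 100.0, 2: 250.0})
load_snapshot(conn, "2024-02-29", {1: 80.0, 2: 300.0})
```

Because rows are only appended, a question such as "what was the balance in mid-February?" can still be answered after later loads, which an operational table that overwrites the current balance could not do.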

Characteristics of DW
Additional characteristics
▸ Web based: designed to provide an efficient computing environment for web-based applications.
▸ Relational/multidimensional: uses either a relational structure or a multidimensional structure.
▸ Client/server: uses the client/server architecture to provide easy access for end users.
▸ Real time: newer data warehouses provide real-time data-access and analysis capabilities.
▸ Includes metadata: data about how the data are organized.

Characteristics of DW
The 3 main types of DW are:
▸ Data marts: smaller and more focused
▸ Operational Data Stores (ODS): interim staging area for DW
▸ Enterprise Data Warehouse (EDW): large-scale DW

Data Mart
▸ A departmental data warehouse that stores only relevant data
▸ Dependent data mart: a subset that is created directly from a data warehouse
▸ Independent data mart: a small data warehouse designed for a strategic business unit or a department
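As a minimal sketch of a dependent data mart, the example below derives a marketing mart directly from a warehouse table. The table names (`warehouse_sales`, `marketing_mart`), columns, and rows are hypothetical, and SQLite stands in for a real warehouse platform.

```python
import sqlite3

def build_warehouse(conn):
    """Create a tiny stand-in for the enterprise warehouse (hypothetical data)."""
    conn.execute(
        "CREATE TABLE warehouse_sales ("
        " sale_id INTEGER PRIMARY KEY, department TEXT, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
        [
            (1, "Marketing", "North", 120.0),
            (2, "Finance", "North", 300.0),
            (3, "Marketing", "South", 80.0),
            (4, "Engineering", "South", 450.0),
        ],
    )

def build_marketing_mart(conn):
    """Dependent data mart: a departmental subset copied from the warehouse."""
    conn.execute(
        "CREATE TABLE marketing_mart ("
        " sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.execute(
        "INSERT INTO marketing_mart"
        " SELECT sale_id, region, amount FROM warehouse_sales"
        " WHERE department = ?",
        ("Marketing",),
    )

conn = sqlite3.connect(":memory:")
build_warehouse(conn)
build_marketing_mart(conn)
mart_rows = conn.execute(
    "SELECT sale_id, region, amount FROM marketing_mart ORDER BY sale_id"
).fetchall()
```

An independent data mart would instead be loaded straight from the source systems, without a warehouse table such as `warehouse_sales` in between.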

Operational Data Store (ODS)
▸ A type of database that is often used as an interim staging area for a data warehouse
▸ Contents are updated: the contents of an ODS are updated throughout the course of business operations
▸ Short-term decisions: similar to short-term memory in that it stores only very recent information
▸ Provides near-real-time current data: data are consolidated from multiple source systems and can be presented in near real time
Enterprise Data Warehouse (EDW)
▸ A large-scale DW that is used across the enterprise
▸ Integration of data: data from many sources are integrated into a standard format
▸ Support for other enterprise applications: provides data for DSS, CRM, SCM, BPM, BAM, PLM, and KMS

Metadata
▸ Data about data
▸ In a data warehouse, metadata describe the contents of a
data warehouse and the manner of its acquisition and use.
▸ Viewed by pattern, metadata can be differentiated into syntactic metadata
(i.e., data describing the syntax of data), structural metadata
(i.e., data describing the structure of the data), and semantic
metadata (i.e., data describing the meaning of the data in a
specific domain).

A Conceptual Framework for DW
[Figure: conceptual framework for DW. Data sources (ERP, legacy, POS, other OLTP/Web, external data) feed the ETL process (select, extract, transform, integrate, load), which populates the enterprise data warehouse and its metadata. Through replication the warehouse feeds data marts (marketing, engineering, finance, ...); a "no data marts" option is also shown. Access is through API/middleware, and the applications (visualization) are routine business reporting, data/text mining, OLAP, dashboard, Web, and custom-built applications.]

Generic DW Architectures
Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access and analyze data from the warehouse
Two-tier architecture
▸ The first 2 tiers of the three-tier architecture are combined into one

Generic DW Architectures
[Figure: three-tier architecture with Tier 1: client workstation, Tier 2: application server, Tier 3: database server; two-tier architecture with Tier 1: client workstation, Tier 2: application & database server.]

Web-based DW Architectures
[Figure: a client (Web browser) connects over the Internet/intranet/extranet to a Web server, which serves Web pages and works with an application server and the data warehouse.]

Alternative DW Architectures
(a) Independent Data Marts Architecture
ETL flow: source systems → staging area → independent data marts (atomic/summarized data) → end-user access and applications

(b) Data Mart Bus Architecture with Linked Dimensional Data Marts
ETL flow: source systems → staging area → dimensionalized data marts linked by conformed dimensions (atomic/summarized data) → end-user access and applications

(c) Hub-and-Spoke Architecture (Corporate Information Factory)
ETL flow: source systems → staging area → normalized relational warehouse (atomic data) → dependent data marts (summarized/some atomic data) → end-user access and applications
Alternative DW Architectures
(d) Centralized Data Warehouse Architecture
ETL flow: source systems → staging area → normalized relational warehouse (atomic/some summarized data) → end-user access and applications

(e) Federated Architecture
Existing data warehouses, data marts, and legacy systems → data mapping/metadata → logical/physical integration of common data elements → end-user access and applications

Ten factors that potentially affect the
architecture selection decision:
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors

Enterprise Data Warehouse (by Teradata Corporation)

Data Integration and the Extraction,
Transformation, and Load (ETL) Process
Data integration
• Integration that comprises three major processes: data access, data federation (the
integration of business views across multiple data stores), and change capture (based on the
identification, capture, and delivery of changes made to the enterprise data stores).
Enterprise application integration (EAI)
• A technology that provides a vehicle for pushing data from source systems into a data
warehouse
• Application Programming Interface (API)
• Service-oriented architecture (SOA)
Enterprise information integration (EII)
• An evolving tool space that promises real-time data integration from a variety of sources
• A mechanism for pulling data from source systems to satisfy requests (queries) for
information (e.g., using XML or query languages)
Extraction, Transformation, and
Load (ETL) Process
[Figure: ETL process. Sources (packaged application, legacy system, other internal applications) pass through extract, transform, cleanse, and load steps, with a transient data source in between, into the data warehouse and data mart.]
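The extract, transform, cleanse, and load steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the CSV text stands in for a hypothetical source system, the `orders` table and its columns are invented for the example, and the cleansing rules (drop rows with a missing amount, drop duplicate order ids) are one possible policy.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with typical quality problems.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,100.50,2024-01-05
2,BOB,,2024-01-06
3,Carol,75,2024-01-07
3,Carol,75,2024-01-07
"""

def extract(text):
    """Extract: read raw records from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and standardize value representations."""
    return [
        {
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().title(),
            "amount": float(r["amount"]) if r["amount"].strip() else None,
            "order_date": r["order_date"].strip(),
        }
        for r in rows
    ]

def cleanse(rows):
    """Cleanse: drop rows with a missing amount and duplicate order ids."""
    seen, clean = set(), []
    for r in rows:
        if r["amount"] is None or r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        clean.append(r)
    return clean

def load(conn, rows):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        " order_id INTEGER PRIMARY KEY, customer TEXT,"
        " amount REAL, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer, :amount, :order_date)",
        rows,
    )
    conn.commit()

def run_etl(conn, text):
    """Run the four steps in order and return the number of rows loaded."""
    rows = cleanse(transform(extract(text)))
    load(conn, rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = run_etl(conn, RAW_CSV)
```

Keeping each step as its own function mirrors the figure and makes it easy to test or replace one step, for example a stricter cleansing rule, without touching the others.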

ETL
Issues affecting the purchase of an ETL tool
• Data transformation tools are expensive
• Data transformation tools may have a long learning
curve
Important criteria in selecting an ETL tool
• Ability to read from and write to an unlimited number of
data sources/architectures
• Automatic capturing and delivery of metadata
• A history of conforming to open standards
• An easy-to-use interface for the developer and the
functional user
Benefits of DW
Direct benefits of a data warehouse
• Allows end users to perform extensive analysis
• Allows a consolidated view of corporate data
• Better and more timely information
• Enhanced system performance
• Simplification of data access
Indirect benefits of a data warehouse
• Enhance business knowledge
• Present competitive advantage
• Enhance customer service and satisfaction
• Facilitate decision making
• Help in reforming business processes
Data Warehouse Development
▸ Data warehouse development approaches
• Inmon Model: EDW approach (top-down)
• Kimball Model: data mart approach (bottom-up)
• Which model is best?
- There is no one-size-fits-all strategy for DW
- It depends on user demand, business requirements, and
enterprise maturity
• One alternative is the hosted warehouse
▸ Data warehouse structure: the star schema vs. relational
▸ Real-time data warehousing?


Star Schema vs. Relational Database
DW Structure: Star Schema
(a.k.a. Dimensional Modeling)
Star Schema Example for an
Automobile Insurance Data Warehouse
▸ Facts (claim information): central table that contains (usually summarized) information; also contains foreign keys to access each dimension table.
▸ Dimensions (driver, automotive, location, time): how data will be sliced/diced (e.g., by location, time period, type of automobile or driver).
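A minimal sketch of this star schema in SQLite is shown below. The four dimensions and the central claim fact table follow the example above, but every table name, column, and data row is a hypothetical choice made for the illustration.

```python
import sqlite3

# Four dimension tables around one central fact table (hypothetical columns).
SCHEMA = """
CREATE TABLE dim_driver (driver_id INTEGER PRIMARY KEY, age_group TEXT);
CREATE TABLE dim_automotive (auto_id INTEGER PRIMARY KEY, vehicle_type TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_claim (
    claim_id INTEGER PRIMARY KEY,
    driver_id INTEGER REFERENCES dim_driver(driver_id),
    auto_id INTEGER REFERENCES dim_automotive(auto_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    claim_amount REAL
);
"""

def build_star(conn):
    """Create the schema and load a few illustrative rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_driver VALUES (?, ?)", [(1, "18-25"), (2, "26-40")])
    conn.executemany("INSERT INTO dim_automotive VALUES (?, ?)", [(1, "Sedan"), (2, "SUV")])
    conn.executemany("INSERT INTO dim_location VALUES (?, ?)", [(1, "OK"), (2, "TX")])
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [(1, 2023, 4), (2, 2024, 1)])
    conn.executemany(
        "INSERT INTO fact_claim VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 1, 1, 1200.0),
            (2, 2, 2, 1, 2, 800.0),
            (3, 1, 2, 2, 2, 500.0),
            (4, 2, 1, 2, 2, 700.0),
        ],
    )

def claims_by_state_and_year(conn):
    """Join the fact table to two dimensions and total the claim amounts."""
    return conn.execute(
        "SELECT l.state, t.year, SUM(f.claim_amount)"
        " FROM fact_claim f"
        " JOIN dim_location l ON l.location_id = f.location_id"
        " JOIN dim_time t ON t.time_id = f.time_id"
        " GROUP BY l.state, t.year"
        " ORDER BY l.state, t.year"
    ).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn)
totals = claims_by_state_and_year(conn)
```

Each analytical question becomes a join from the fact table to only the dimensions it needs, which is the main practical difference from querying a fully normalized transactional design.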

Dimensional Modelling
Data cube
▸ A two-dimensional, three-dimensional, or higher-dimensional object in which
each dimension of the data represents a measure of interest
- Grain: the highest level of detail supported
- Drill-down/up
- Slicing

OLAP Operations
• Slice: A slice is a subset of a multidimensional array (usually a two-dimensional
representation) corresponding to a single value set for one (or more) of the
dimensions not in the subset.
• Dice: A dice is a slice on more than two dimensions of a data cube.
• Drill-down/up: Drilling down or up is a specific OLAP technique whereby the user
navigates among levels of data ranging from the most summarized (up) to the
most detailed (down).
• Roll-up: A roll-up involves computing all of the data relationships for one or more
dimensions.
• Pivot: A pivot is a means of changing the dimensional orientation of a report or ad
hoc query-page display.
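The operations above can be sketched on a tiny in-memory cube. This is an assumption-laden illustration, not an OLAP engine: the cube is a plain list of cells, the dimensions (`region`, `product`, `quarter`) and the `sales` measure are hypothetical, and dice is modeled as restricting several dimensions at once.

```python
from collections import defaultdict

# Hypothetical cube cells: one record per (region, product, quarter).
CELLS = [
    {"region": "North", "product": "Car", "quarter": "Q1", "sales": 10},
    {"region": "North", "product": "Car", "quarter": "Q2", "sales": 12},
    {"region": "North", "product": "Truck", "quarter": "Q1", "sales": 7},
    {"region": "South", "product": "Car", "quarter": "Q1", "sales": 5},
    {"region": "South", "product": "Truck", "quarter": "Q2", "sales": 9},
]

def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    return [c for c in cells if c[dim] == value]

def dice_cube(cells, **allowed):
    """Dice: restrict several dimensions at once, each to a set of values."""
    return [c for c in cells if all(c[d] in vals for d, vals in allowed.items())]

def roll_up(cells, keep_dims):
    """Roll-up: sum the measure over every dimension not in keep_dims."""
    totals = defaultdict(int)
    for c in cells:
        totals[tuple(c[d] for d in keep_dims)] += c["sales"]
    return dict(totals)

def pivot(cells, row_dim, col_dim):
    """Pivot: lay the totals out as rows by columns for the chosen dimensions."""
    table = defaultdict(dict)
    for (row, col), total in roll_up(cells, [row_dim, col_dim]).items():
        table[row][col] = total
    return dict(table)

by_region = roll_up(CELLS, ["region"])                         # summarized (drill-up)
by_region_product = roll_up(CELLS, ["region", "product"])      # more detail (drill-down)
```

Drilling down or up corresponds here to calling `roll_up` with more or fewer dimensions kept, and swapping the two arguments of `pivot` changes the orientation of the same report.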
Best Practices for Implementing DW
▸ The project must fit with corporate strategy
▸ There must be complete buy-in to the project
▸ It is important to manage user expectations
▸ The data warehouse must be built incrementally
▸ Adaptability must be built in from the start
▸ The project must be managed by both IT and business professionals (a business–supplier relationship must be developed)
▸ Only load data that have been cleansed and are of high quality
▸ Do not overlook training requirements
▸ Be politically aware

Risks for Implementing DW
▸ No mission or objective
▸ Quality of source data unknown
▸ Skills not in place
▸ Inadequate budget
▸ Lack of supporting software
▸ Source data not understood
▸ Weak sponsor
▸ Users not computer literate
▸ Political problems or turf wars
▸ Unrealistic user expectations
▸ Architectural and design risks
▸ Scope creep and changing requirements
▸ Vendors out of control
▸ Multiple platforms
▸ Key people leaving the project
▸ Loss of the sponsor
▸ Too much new technology
▸ Having to fix an operational system
▸ Geographically distributed environment
▸ Team geography and language culture


Things to Avoid for Successful
Implementation of DW
▸ Starting with the wrong sponsorship chain
▸ Setting expectations that you cannot meet
▸ Engaging in politically naive behavior
▸ Loading the warehouse with information just because it is available
▸ Believing that data warehousing database design is the same as transactional DB design
▸ Choosing a data warehouse manager who is technology oriented rather than user oriented

Real-time DW
(a.k.a. Active Data Warehousing)
▸ Enabling real-time data updates for real-time analysis and real-time
decision making is growing rapidly
▹ Push vs. pull (of data)
▸ Concerns about real-time BI
▹ Not all data should be updated continuously
▹ Mismatch of reports generated minutes apart
▹ May be cost prohibitive
▹ May also be infeasible
Evolution of DSS & DW

Active Data Warehousing
(by Teradata Corporation)

Comparing Traditional and Active DW

Data Warehouse Administration
▸ Due to its huge size and its intrinsic nature, a DW requires especially strong
monitoring in order to sustain its efficiency, productivity, and security.
▸ The successful administration and management of a data warehouse entails
skills and proficiency that go beyond what is required of a traditional database
administrator.
▹ Requires expertise in high-performance software, hardware, and networking
technologies

DW Scalability and Security
▸ Scalability
▹ The main issues pertaining to scalability:
- The amount of data in the warehouse
- How quickly the warehouse is expected to grow
- The number of concurrent users
- The complexity of user queries
▹ Good scalability means that queries and other data-access functions will grow linearly
with the size of the warehouse

▸ Security
▹ Emphasis on security and privacy