Data Warehousing: Engr. Madeha Mushtaq Department of Computer Science Iqra National University
HISTORICAL OVERVIEW
• 1960: Master Files & Reports
• 1965: Lots of Master files!
• 1970: Direct Access Memory & DBMS
• 1975: Online high-performance transaction processing
• 1980: PCs and 4GL Technology (MIS/DSS)
• In the 1990s, businesses grew more complex, corporations spread globally, and
competition became fiercer.
• The operational computer systems did provide information to run the day-to-
day operations, but what the executives needed were different kinds of
information that could be readily used to make strategic decisions.
• The operational systems, important as they were, could not provide strategic
information.
WHY DATA WAREHOUSE?
• The amount of data the average business collects and stores is doubling every year.
• Total hardware and software cost to store and manage 1 Mbyte of data
• 1990: ~ $15
• All the past attempts by IT to provide strategic information have been failures.
• This was mainly because IT has been trying to provide strategic information from
operational systems.
• These operational systems, such as order processing, inventory control, claims
processing, outpatient billing, and so on, are neither designed nor intended to provide
strategic information.
• Only specially designed decision support systems or informational systems can
provide strategic information.
OPERATIONAL VERSUS DECISION-SUPPORT SYSTEMS
• The data warehouse is thus the decision support system that can provide strategic
information for analysis, discerning trends, and monitoring performance. The DWH
Let us re-examine the basic concept of data warehousing:
• Take all the data from the operational systems
• Where necessary, include relevant data from outside, such as industry benchmark
indicators
• Integrate all the data from the various sources
• Remove inconsistencies and transform the data
• Store the data in formats suitable for easy access for decision making
• Although the concept is simple, it involves several functions: extracting the data,
loading it, transforming it, storing it, and providing user interfaces.
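The steps listed above can be illustrated with a minimal Python sketch. It is not a real warehouse implementation: the source records, field names, region codes, and the dictionary standing in for the warehouse are all hypothetical, chosen only to show extraction, transformation, and loading in sequence.

```python
# Hypothetical operational records; field names and values are illustrative only.
ORDERS_SYSTEM = [
    {"cust": "C1", "amount": "120.50", "region": "NE"},
    {"cust": "C2", "amount": "80.00", "region": "north east"},
]
# Hypothetical outside data, e.g. an industry benchmark indicator.
EXTERNAL_BENCHMARKS = [{"region": "NE", "industry_avg": 95.0}]

# Assumed mapping from the spellings used by the sources to one standard code.
REGION_CODES = {"ne": "NE", "north east": "NE"}


def extract():
    """Take data from the operational system and from an outside source."""
    return list(ORDERS_SYSTEM), list(EXTERNAL_BENCHMARKS)


def transform(orders):
    """Remove inconsistencies: one region code, numeric amounts, one field name."""
    cleaned = []
    for row in orders:
        cleaned.append({
            "customer_id": row["cust"],
            "amount": float(row["amount"]),
            "region": REGION_CODES[row["region"].lower()],
        })
    return cleaned


def load(warehouse, orders, benchmarks):
    """Store the integrated data keyed by region so it is easy to query."""
    for row in orders:
        entry = warehouse.setdefault(row["region"], {"sales": 0.0, "industry_avg": None})
        entry["sales"] += row["amount"]
    for bench in benchmarks:
        entry = warehouse.setdefault(bench["region"], {"sales": 0.0, "industry_avg": None})
        entry["industry_avg"] = bench["industry_avg"]
    return warehouse


orders, benchmarks = extract()
warehouse = load({}, transform(orders), benchmarks)
print(warehouse)
```

Both source rows end up under the single region code NE, next to the external benchmark, which is the kind of integrated view a decision maker would query.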
DEFINING FEATURES OF DATA WAREHOUSE
• Subject-Oriented Data
• Integrated Data
• Time-variant Data
• Nonvolatile Data
• Data Granularity
SUBJECT-ORIENTED DATA
INTEGRATED DATA
• The data in the data warehouse comes from several operational systems.
• Source data reside in different databases, so file layouts, character code
representations, naming conventions, and attributes for data items may all differ.
• Before the data from various disparate sources can be usefully stored in a data
warehouse, data inconsistencies are removed; data from diverse operational
applications is integrated.
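As a rough illustration of integration, the Python sketch below maps two source records that describe the same customer onto one naming convention and one code set. The source system names, field names, and gender encodings are assumptions made up for this example.

```python
# Hypothetical per-source field names mapped to one warehouse naming convention.
FIELD_NAMES = {
    "billing": {"cust_no": "customer_id", "sex": "gender"},
    "claims": {"CUSTOMER-ID": "customer_id", "GENDER_CD": "gender"},
}

# Hypothetical encodings of the same attribute across sources.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}


def integrate(source, record):
    """Rename fields and standardize codes for one source record."""
    renamed = {FIELD_NAMES[source][key]: value for key, value in record.items()}
    renamed["customer_id"] = str(renamed["customer_id"]).strip()
    renamed["gender"] = GENDER_CODES[str(renamed["gender"]).strip().lower()]
    return renamed


billing_row = {"cust_no": 1001, "sex": "male"}
claims_row = {"CUSTOMER-ID": " 1001 ", "GENDER_CD": "M"}

integrated_billing = integrate("billing", billing_row)
integrated_claims = integrate("claims", claims_row)
print(integrated_billing, integrated_claims)
```

After integration the two records are identical, so the warehouse can treat them as the same customer even though the sources disagreed on names, types, and codes.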
TIME-VARIANT DATA
• For an operational system, the stored data contains the current values.
• We might store some past transactions in operational systems, but, essentially,
operational systems reflect current information because these systems support
day-to-day current operations.
• A data warehouse, because of the very nature of its purpose, has to contain
historical data, not just current values.
• For example, if a user wants to find out the reason for the drop in sales in the North
East division, the user needs all the sales data for that division over a period
extending back in time.
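The sales example can be sketched with Python's built-in sqlite3 module. The table name, column names, and figures are invented for illustration. The point is only that the question can be answered because the warehouse keeps one row per period, not just the latest value.

```python
import sqlite3

# In-memory database standing in for the warehouse; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_snapshot (division TEXT, month TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO sales_snapshot VALUES (?, ?, ?)",
    [
        ("North East", "2023-01", 500.0),
        ("North East", "2023-02", 480.0),
        ("North East", "2023-03", 310.0),
        ("South", "2023-03", 400.0),
    ],
)

# Retrieve the whole history for one division, oldest month first.
rows = conn.execute(
    "SELECT month, sales FROM sales_snapshot WHERE division = ? ORDER BY month",
    ("North East",),
).fetchall()

# Change in sales between each pair of consecutive months.
changes = [(cur[0], cur[1] - prev[1]) for prev, cur in zip(rows, rows[1:])]

# The month with the largest decline is where the analysis would start.
worst_month, worst_change = min(changes, key=lambda change: change[1])
print(worst_month, worst_change)
```

An operational system that held only the current month's figure could not support this comparison.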
NONVOLATILE DATA
• Data extracted from the various operational systems and pertinent data obtained from
outside sources are transformed, integrated, and stored in the data warehouse.
• The data in the data warehouse is not intended to run the day-to-day business.
• Data from the operational systems are moved into the data warehouse at specific
intervals. We do not update the data warehouse every time we process a single order.
• Depending on the requirements of the business, these data movements take place twice a
day, once a day, once a week, or once every two weeks.
As illustrated in the figure, individual business transactions do not update the data in the data
warehouse; they update the operational system databases in real time.
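The contrast between real-time operational updates and periodic warehouse loads can be sketched in Python as follows. The in-memory structures, function names, and load dates are hypothetical. A real system would use databases and a scheduler.

```python
# Operational store: holds only the current status of each order (updated in place).
operational_orders = {}

# Warehouse: append-only snapshots, written only by the scheduled load.
warehouse_rows = []


def process_transaction(order_id, status):
    """A business transaction updates the operational store immediately."""
    operational_orders[order_id] = status


def scheduled_load(load_date):
    """At a fixed interval, copy the current operational state into the warehouse."""
    for order_id, status in sorted(operational_orders.items()):
        warehouse_rows.append((load_date, order_id, status))


process_transaction(1, "placed")
process_transaction(1, "shipped")
process_transaction(2, "placed")
rows_before_first_load = len(warehouse_rows)  # transactions alone add nothing

scheduled_load("2024-01-01")
process_transaction(2, "shipped")
scheduled_load("2024-01-02")

for row in warehouse_rows:
    print(row)
```

Earlier snapshots are never overwritten. Each load only appends rows, which is what keeps the warehouse nonvolatile and also preserves history.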
DATA GRANULARITY