Unit-2
MIS
By Shailee Shah
Assistant professor
President Institute of Computer Application
Data warehousing
Data warehousing refers to a technology that helps in aggregating structured data from
various sources so that the data can be easily compared and analyzed. The technology of
data warehousing provides a user with a platform to clean, integrate, and consolidate data,
thus supporting the decision-making process by the management. The data held in a
warehouse is subject-oriented, integrated, time-variant, and non-volatile.
A data warehouse consolidates the available data from various sources while still ensuring
the accuracy, quality, and consistency of the contained information. A warehouse improves
the overall performance of a system. The available data flows into it from a variety of
databases, and it works by organizing this data into schemas. A schema describes the
type and layout of the contained data. The query tools in a warehouse analyze the tables of
data using the available schema.
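As a minimal sketch of how a schema lets query tools analyze the tables of a warehouse, the following uses Python's built-in sqlite3 module. The table names (dim_product, fact_sales), the columns, and the sample rows are illustrative assumptions, not part of any particular warehouse product.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")

# The schema describes the type and layout of the contained data.
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    sale_date  TEXT NOT NULL,
    amount     REAL NOT NULL
);
""")

# Hypothetical sample data loaded from the source systems.
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery"), (3, "Lamp", "Furniture")],
)
conn.executemany(
    "INSERT INTO fact_sales (product_id, sale_date, amount) VALUES (?, ?, ?)",
    [(1, "2024-01-05", 10.0), (2, "2024-01-06", 25.0),
     (3, "2024-02-01", 40.0), (1, "2024-02-03", 5.0)],
)

def sales_by_category(connection):
    """Total sales amount per product category, using the schema above."""
    rows = connection.execute("""
        SELECT p.category, SUM(s.amount)
        FROM fact_sales AS s
        JOIN dim_product AS p ON p.product_id = s.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)

totals = sales_by_category(conn)
```

Here the query joins the sales table to the product table through the key the schema declares, which is the kind of analysis a warehouse query tool performs on a larger scale.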
Data warehousing is a collection of tools and techniques with which more knowledge can
be derived from a large amount of data. This helps with the decision-making process and
with improving information resources.
A data warehouse is essentially a database with distinctive data structures that allow
complex queries over a large amount of data to be performed relatively quickly and easily.
It is created from multiple heterogeneous sources.
Characteristics of Data Warehouse
1. Subject-Oriented
A data warehouse focuses on the modeling and analysis of data for decision-makers. Therefore,
data warehouses typically provide a concise and straightforward view of a particular
subject, such as customer, product, or sales, instead of the organization's ongoing
operations as a whole.
This is done by excluding data that is not useful for the subject and including all data
needed by the users to understand the subject.
2. Integrated
A data warehouse integrates various heterogeneous data sources such as RDBMSs, flat files, and
online transaction records. Data cleaning and integration must be performed during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among the
different data sources.
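The cleaning and integration step can be sketched as follows. This is only an illustration under stated assumptions: the two source layouts, their field names (cust_id, CustomerID, and so on), and the date formats are hypothetical, chosen to show how differing naming conventions and attribute types are mapped onto one consistent record layout.

```python
from datetime import date, datetime

# Hypothetical records from two sources with different naming and typing.
rdbms_rows = [{"cust_id": 101, "cust_name": "Asha Patel", "signup": "2023-04-01"}]
flat_file_rows = [{"CustomerID": "102", "Name": " ravi shah ", "SignupDate": "01/05/2023"}]

def from_rdbms(row):
    """Map an RDBMS row onto the common warehouse layout."""
    return {
        "customer_id": int(row["cust_id"]),
        "customer_name": row["cust_name"].strip().title(),
        "signup_date": date.fromisoformat(row["signup"]),
    }

def from_flat_file(row):
    """Map a flat-file row (string IDs, DD/MM/YYYY dates) onto the same layout."""
    return {
        "customer_id": int(row["CustomerID"]),
        "customer_name": row["Name"].strip().title(),
        "signup_date": datetime.strptime(row["SignupDate"], "%d/%m/%Y").date(),
    }

# After integration, every record has the same names and types.
integrated = [from_rdbms(r) for r in rdbms_rows] + [from_flat_file(r) for r in flat_file_rows]
```

Keeping one small mapping function per source keeps the cleaning rules for each source in a single place.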
3. Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve data from 3
months, 6 months, or 12 months ago, or even older data, from a data warehouse. This contrasts
with a transaction system, where often only the most current data is kept.
4. Non-Volatile
The data warehouse is a physically separate data store, whose contents are transformed from the
source operational RDBMS. The operational updates of data do not occur in the data
warehouse, i.e., update, insert, and delete operations are not performed. It usually requires
only two data operations: the initial loading of data and access to data.
Therefore, the DW does not require transaction processing, recovery, and concurrency
capabilities, which allows for a substantial speedup of data retrieval. Non-volatile means that
once data has entered the warehouse, it should not change.
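The time-variant and non-volatile characteristics can be illustrated together with a small append-only store. This is a sketch under assumptions: the class name WarehouseTable and its methods load and as_of are hypothetical, and a real warehouse would implement the same idea with dated tables, not an in-memory list.

```python
from datetime import date

class WarehouseTable:
    """Append-only store: rows are loaded with a load date and never changed."""

    def __init__(self):
        self._rows = []  # list of (load_date, key, value) tuples

    def load(self, load_date, key, value):
        """Loading: append a new snapshot instead of overwriting the old one."""
        self._rows.append((load_date, key, value))

    def as_of(self, key, when):
        """Access: return the latest value loaded on or before `when`, or None."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda item: item[0])[1]

# Hypothetical price history for one product.
prices = WarehouseTable()
prices.load(date(2024, 1, 1), "pen", 10.0)
prices.load(date(2024, 4, 1), "pen", 12.0)
```

Because the only operations are load and as_of, earlier values stay retrievable: prices.as_of("pen", date(2024, 2, 15)) still returns the January price after the April load.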
Benefits of Data warehousing
1. Understand business trends and make better forecasting decisions.
3. The structure of data warehouses is more accessible for end-users to navigate, understand,
and query.
4. Queries that would be complex in many normalized databases could be easier to build and
maintain in data warehouses.
5. Data warehousing is an efficient method to manage demand for lots of information from
lots of users.
6. Data warehousing provides the capabilities to analyze a large amount of historical data.
Criteria of Data warehousing
1. Scalability: With no on-premise software or hardware, it is easy, cost-effective,
and flexible to scale with cloud services.
2. Low entry cost: With no servers, hardware, IT work, or operational costs, cloud
services cost substantially less up front.
4. Security: Typical cloud providers stay up to date with security patches and
protocols to keep their many customers safe.
6. Speed: Cloud solutions usually rely on far-away servers, and the time that it takes
for data to travel through those servers and reach the end user
can be unacceptable for many businesses. On-premise deployments are less exposed
to this problem: with local servers, speed and latency can be better managed, at
least for businesses based in one geographical location.
The Data warehouse Model
Data warehouse modeling is the process of designing the schemas of the detailed and
summarized information of the data warehouse. The goal of data warehouse modeling is to
develop a schema describing the reality, or at least the part of it, that the data
warehouse is needed to support.
Data warehouse modeling is an essential stage of building a data warehouse for two main
reasons. Firstly, through the schema, data warehouse clients can visualize the relationships
among the warehouse data, to use them with greater ease. Secondly, a well-designed schema
allows an effective data warehouse structure to emerge, to help decrease the cost of
implementing the warehouse and improve the efficiency of using it.
Data modeling in data warehouses is different from data modeling in operational database
systems. The primary function of data warehouses is to support decision support system (DSS)
processes. Data warehouses are designed for users with general knowledge of
the enterprise, whereas operational database systems are more oriented toward use by
software specialists for creating distinct applications.
Older detail data is stored in some form of mass storage. It is infrequently accessed and
kept at a level of detail consistent with current detailed data.
Lightly summarized data is data extracted from the low level of detail found at the current,
detailed level, and it is usually stored on disk storage. When building the data warehouse, one
has to decide what unit of time the summarization is done over and what attributes
the summarized data will contain.
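The two design decisions named above, the unit of time and the attributes kept, can be made concrete with a short sketch. The sample rows, the choice of one month as the unit of time, and the function name summarize_by_month are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical current detail data: (ISO date, product, amount).
detail = [
    ("2024-01-05", "pen", 10.0),
    ("2024-01-20", "pen", 5.0),
    ("2024-01-21", "lamp", 40.0),
    ("2024-02-03", "pen", 7.5),
]

def summarize_by_month(rows):
    """Lightly summarize detail rows: total amount per (month, product)."""
    totals = defaultdict(float)
    for iso_date, product, amount in rows:
        month = iso_date[:7]  # "YYYY-MM" is the chosen unit of time
        totals[(month, product)] += amount  # attributes kept: month, product, total
    return dict(totals)

monthly = summarize_by_month(detail)
```

Summarizing over a different unit of time, such as a week or a quarter, would only change how the grouping key is derived from the date.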
Highly summarized data is compact and directly available and can even be found outside
the warehouse.
Metadata is the final element of the data warehouse. It differs from the other elements in
that it is not drawn from the operational data; instead, it is used as:
A directory to help the DSS analyst locate the items of the data warehouse.
A guide to the mapping of records as the data is transformed from the operational environment
to the data warehouse environment.
A guide to the method used for summarization between the current, accurate data and the lightly
summarized data.
3. What do you mean by Data Warehouse? Explain Characteristics and benefits of Data
Warehouse.
THANK YOU