
DWM UNIT-I NOTES

A data warehouse is a centralized repository that organizes and analyzes large datasets to facilitate business decision-making. It is characterized by subject-oriented, integrated, time-referenced, non-volatile data, and supports various analytical processes. The document also discusses the advantages and disadvantages of data warehousing, the differences between operational and informational systems, and outlines components of Kimball’s DW/BI architecture.

Uploaded by

mohansaigudipalli1234


UNIT-1

Introduction to Data Warehouse

What Is A Data Warehouse?

● It is a central repository that collects information from one or more data
sources, which makes decision making easier.
● A data warehouse is a powerful database model that significantly enhances the user's
ability to quickly analyze large, multidimensional data sets.
● It cleanses and organizes data to allow users to make business decisions based on
facts. Hence, the data in the data warehouse must have strong analytical
characteristics.
● For data to be analytical, it must be subject-oriented, integrated, time-referenced,
and non-volatile.
● It is based on a multidimensional model.
Characteristics of Data Warehouse:
1. Subject-Oriented Data
2. Integrated Data
3. Time-Referenced Data
4. Non-Volatile Data
5. Granularity

Subject-Oriented Data:
● In a data warehouse environment, information used for analysis is organized around
subjects: employees, accounts, sales, products, and so on.
● This subject-specific design reduces query response time, because only the few
records relevant to a subject need to be searched to answer the user's question.
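To make the idea concrete, here is a minimal Python sketch. It assumes hypothetical records that each carry a `subject` tag, and the field names and values are invented for illustration. Grouping the records by subject means a sales question only has to scan the sales records.

```python
from collections import defaultdict

# Hypothetical records from different operational applications,
# each tagged with the business subject it describes.
raw_records = [
    {"subject": "sales", "product": "A", "amount": 120},
    {"subject": "employees", "name": "Ravi", "dept": "HR"},
    {"subject": "sales", "product": "B", "amount": 80},
    {"subject": "accounts", "account_id": 7, "balance": 5000},
]

def organize_by_subject(records):
    """Group records by subject so each subject can be queried on its own."""
    by_subject = defaultdict(list)
    for rec in records:
        by_subject[rec["subject"]].append(rec)
    return dict(by_subject)

warehouse = organize_by_subject(raw_records)

# A sales question only scans the sales records, not every record.
total_sales = sum(r["amount"] for r in warehouse["sales"])
print(total_sales)  # 200
```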

Integrated Data:

● Integrated data refers to de-duplicating information and merging it from many sources
into one consistent location.
● When shortlisting your top 20 customers, you must know that “HAL” and “Hindustan
Aeronautics Limited” are one and the same.
● Much of the transformation and loading work that goes into the data warehouse is
centred on integrating data and standardizing it.
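The “HAL” example can be sketched as a small integration step in Python. The alias table `ALIASES`, the two source lists and the helper names are hypothetical. Real integration usually relies on a master data or matching process rather than a hand-written dictionary.

```python
# Hypothetical alias table mapping name variants to one canonical customer name.
ALIASES = {
    "hal": "Hindustan Aeronautics Limited",
    "hindustan aeronautics limited": "Hindustan Aeronautics Limited",
    "hindustan aeronautics ltd": "Hindustan Aeronautics Limited",
}

def canonical(name):
    """Normalize case, dots and spacing, then look the name up in the alias table."""
    key = " ".join(name.lower().replace(".", "").split())
    return ALIASES.get(key, name.strip())

def integrate(*sources):
    """Merge order totals from several sources under canonical customer names."""
    totals = {}
    for source in sources:
        for customer, amount in source:
            name = canonical(customer)
            totals[name] = totals.get(name, 0) + amount
    return totals

# Two hypothetical source systems that spell the same customer differently.
crm = [("HAL", 500), ("Tata Motors", 300)]
billing = [("Hindustan Aeronautics Limited", 700), ("Hindustan Aeronautics Ltd.", 100)]

totals = integrate(crm, billing)
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:20]
print(top)  # [('Hindustan Aeronautics Limited', 1300), ('Tata Motors', 300)]
```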

Time-Referenced Data:

● Time-referenced data means that every record is tied to a point or period in time,
so the data can be analyzed over time.

● For example, the user may ask “What were the total sales of product 'A' for the past
three years on New Year's Day across region 'Y'?”
● When analyzed, time-referenced data can also help in spotting hidden trends
between different associated data elements that may not be obvious to the naked
eye. This exploration activity is termed “data mining”.
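The New Year's Day question above can be expressed as a filter over time-stamped facts. This minimal sketch uses made-up sales rows. It assumes that “the past three years” means the three calendar years ending with `last_year`.

```python
from datetime import date

# Hypothetical time-stamped sales facts: (sale_date, product, region, amount).
sales = [
    (date(2021, 1, 1), "A", "Y", 100),
    (date(2022, 1, 1), "A", "Y", 150),
    (date(2023, 1, 1), "A", "Y", 120),
    (date(2023, 1, 2), "A", "Y", 999),  # not New Year's Day
    (date(2023, 1, 1), "B", "Y", 50),   # different product
    (date(2023, 1, 1), "A", "X", 70),   # different region
    (date(2019, 1, 1), "A", "Y", 80),   # outside the three-year window
]

def new_year_sales(facts, product, region, last_year, years=3):
    """Total sales of one product in one region on 1 January of each of the last `years` years."""
    first_year = last_year - years + 1
    return sum(
        amount
        for d, p, r, amount in facts
        if p == product and r == region
        and d.month == 1 and d.day == 1
        and first_year <= d.year <= last_year
    )

print(new_year_sales(sales, "A", "Y", last_year=2023))  # 370
```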

Non-Volatile Data:
● The data warehouse is a physically separate data store, populated with data
transformed from the source operational RDBMS. Operational updates do not occur in
the data warehouse. Records are not updated, inserted, or deleted one transaction at a
time as in the source systems. Data is loaded periodically and then only read.
● The non-volatility of data warehouse data enables users to dig deep
into history and arrive at specific business decisions based on facts.
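Here is a rough sketch of non-volatility that assumes a hypothetical in-memory `NonVolatileStore` class. Each periodic load appends date-stamped rows, and nothing is ever updated or deleted. As a result, a customer's earlier city is still available after it changes in the source system.

```python
from datetime import date

class NonVolatileStore:
    """Hypothetical sketch: loads only append rows; there are no update or delete methods."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, rows):
        # Each periodic load adds new, date-stamped rows; existing rows are untouched.
        for row in rows:
            self._rows.append({**row, "snapshot_date": snapshot_date})

    def history(self, key, value):
        # Read-only access to every version that was ever loaded.
        return [r for r in self._rows if r.get(key) == value]

store = NonVolatileStore()
store.load(date(2024, 1, 31), [{"customer": "C1", "city": "Pune"}])
store.load(date(2024, 2, 29), [{"customer": "C1", "city": "Hyderabad"}])

for row in store.history("customer", "C1"):
    print(row["snapshot_date"], row["city"])
```

The source system would typically overwrite the city, but the warehouse keeps both versions for historical analysis.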

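Data Granularity:
● In a data warehouse, data is kept at different levels of detail (granularity), from
individual transactions up to summarized data such as daily or monthly totals.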
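Granularity can be illustrated by rolling the same atomic facts up to different levels. The transactions and the `roll_up` helper below are hypothetical. The point is that the lowest-grain rows are kept and that coarser levels are derived from them.

```python
from collections import defaultdict
from datetime import date

# Hypothetical atomic (lowest-grain) facts: one row per sale.
transactions = [
    (date(2024, 1, 5), 200),
    (date(2024, 1, 5), 50),
    (date(2024, 1, 20), 100),
    (date(2024, 2, 3), 300),
]

def roll_up(facts, grain):
    """Aggregate amounts to the requested grain: 'day', 'month' or 'year'."""
    keys = {
        "day": lambda d: d,
        "month": lambda d: (d.year, d.month),
        "year": lambda d: d.year,
    }
    key = keys[grain]
    totals = defaultdict(int)
    for d, amount in facts:
        totals[key(d)] += amount
    return dict(totals)

print(roll_up(transactions, "day"))
print(roll_up(transactions, "month"))  # {(2024, 1): 350, (2024, 2): 300}
print(roll_up(transactions, "year"))   # {2024: 650}
```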

Why A Data Warehouse?

The Data Access Crisis

● If there is a single key to survival in the 1990s and beyond, it is being able to analyze,
plan, and react to changing business conditions in a much more rapid fashion.
● In order to do this, top managers, analysts, and knowledge workers in our enterprises
need more and better information.
● Every day, organizations large and small create billions of bytes of data about all
aspects of their business: millions of individual facts about their customers, products,
operations, and people.
● But for the most part, this data is locked up in a maze of computer systems and is
exceedingly difficult to get at. This phenomenon has been described as “data in jail”.

Data Warehousing:

● The idea of data warehousing dates to the late 1980s, when IBM researchers Barry
Devlin and Paul Murphy introduced the concept of the "Business Data Warehouse."
● Data warehousing is a field that has grown from the integration of a number of
different technologies and experiences over the past two decades.
● These experiences have allowed the IT industry to identify the key problems that
need to be solved.
● It is the process of developing, managing, and securing the electronic storage of data
by a business or organization in a digital data warehouse.

Operational vs. Informational Systems

● Operational systems, as their name implies, are the systems that support the everyday
operation of the enterprise.
● These are the backbone systems of any enterprise, and include order entry, inventory,
manufacturing, payroll and accounting.
● Due to their importance to the organization, operational systems were almost always
the first parts of the enterprise to be computerized.

● Informational systems deal with analyzing data and making decisions, often major,
about how the enterprise will operate now, and in the future.
● Not only do informational systems have a different focus from operational ones, they
often have a different scope.
● Where operational data needs are normally focused upon a single area, informational
data needs often span a number of different areas and need large amounts of related
operational data.
DATA WAREHOUSE ADVANTAGES & DISADVANTAGES:

DATA WAREHOUSE ADVANTAGES:

1. Makes access to a wide variety of data easier for end users
2. Provides key information for business decision making
3. Improves the quality of decisions made
4. Especially useful in the medium and long term
5. Provides great information-processing power
6. Facilitates decision making in business
7. Increases company productivity
8. Allows more effective planning
9. Reduces response times and operating costs
10. Improves relationships with suppliers and customers

DATA WAREHOUSE DISADVANTAGES:

1. A DW can involve high costs, and its maintenance costs are also high

2. A DW may become obsolete relatively soon
3. Requires continuous data cleaning, transformation, integration, and maintenance
4. Once implemented, it can be difficult to add new data sources
5. Has a complex, multidisciplinary design
6. Requires a restructuring of the OS
7. Requires a review of the data model, objects, transactions, and storage additions
8. During implementation, difficulties may arise from the different objectives that an
organization intends to achieve
Difference between OLTP and OLAP:
OLTP (On-Line Transaction Processing) is characterized by a large number of short online
transactions (INSERT, UPDATE, and DELETE). OLTP systems emphasize very fast query
processing and maintaining data integrity in multi-access environments, and their
effectiveness is measured by the number of transactions per second. An OLTP database
holds detailed, current data, and the schema used to store the transactional database is
the entity model (usually 3NF).
OLAP (On-Line Analytical Processing) is characterized by a relatively low volume of
transactions. Queries are often very complex and involve aggregations. For OLAP systems,
response time is the effectiveness measure. OLAP applications are widely used by data
mining techniques. An OLAP database holds aggregated, historical data, stored in
multidimensional schemas (usually star schema).
| Data Warehouse (OLAP) | Operational Database (OLTP) |
| --- | --- |
| It involves historical processing of information. | It involves day-to-day processing. |
| OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals. |
| It is used to analyze the business. | It is used to run the business. |
| It focuses on information out. | It focuses on data in. |
| It is based on star schema, snowflake schema, and fact constellation schema. | It is based on the entity-relationship model. |
| It is subject-oriented. | It is application-oriented. |
| It contains historical data. | It contains current data. |
| It provides summarized and consolidated data. | It provides primitive and highly detailed data. |
| It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data. |
| The number of users is in hundreds. | The number of users is in thousands. |
| The number of records accessed is in millions. | The number of records accessed is in tens. |
| These are highly flexible. | It provides high performance. |
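The contrast in the table can be sketched with Python's built-in `sqlite3` module. One side runs short INSERT/UPDATE transactions against a normalized operational table. The other side runs an aggregate query over a small star schema with one fact table and two dimension tables. Table and column names are hypothetical, and both databases are in-memory for illustration only.

```python
import sqlite3

# --- OLTP side: a normalized operational table changed by short transactions ---
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, "
             "region TEXT, year INTEGER, amount REAL)")

def place_order(conn, order_id, product, region, year, amount):
    # One short transaction per business event (INSERT); committed by the with-block.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
                     (order_id, product, region, year, amount))

def correct_amount(conn, order_id, amount):
    # Another short transaction touching a single current record (UPDATE).
    with conn:
        conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (amount, order_id))

place_order(oltp, 1, "A", "Y", 2023, 100.0)
place_order(oltp, 2, "A", "X", 2023, 40.0)
place_order(oltp, 3, "B", "Y", 2024, 60.0)
place_order(oltp, 4, "A", "Y", 2024, 30.0)
correct_amount(oltp, 4, 80.0)

# --- OLAP side: a small star schema loaded from the operational data ---
olap = sqlite3.connect(":memory:")
olap.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact_sales  (product_key INTEGER, region_key INTEGER,
                          year INTEGER, amount REAL);
""")
for product, region, year, amount in oltp.execute(
        "SELECT product, region, year, amount FROM orders"):
    olap.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
    olap.execute("INSERT OR IGNORE INTO dim_region (name) VALUES (?)", (region,))
    olap.execute("""INSERT INTO fact_sales
                    SELECT p.product_key, r.region_key, ?, ?
                    FROM dim_product p, dim_region r
                    WHERE p.name = ? AND r.name = ?""",
                 (year, amount, product, region))
olap.commit()

# A typical OLAP query: read many rows, join the dimensions, aggregate.
report = olap.execute("""
    SELECT p.name, r.name, f.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_region  r ON r.region_key  = f.region_key
    GROUP BY p.name, r.name, f.year
    ORDER BY p.name, r.name, f.year
""").fetchall()
print(report)
```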
What are the components in Kimball’s DW/BI Architecture?

Kimball’s DW/BI Architecture:

There are four separate and distinct components to consider in the DW/BI
environment: operational source systems, ETL system, data presentation area, and
business intelligence application.

Fig: Core elements of the Kimball DW/BI architecture.

Operational Source Systems:
● These are the operational systems of record that capture the business’s transactions.
● Think of source systems as outside the data warehouse, because presumably you have
little or no control over the content and format of the data in these
operational systems.
● The main priorities of the source systems are processing performance and
availability.
● Operational queries against source systems are narrow, one-record-at-a-time
queries that are part of the normal transaction flow and severely restricted in
their demands on the operational system.
● In many cases, the source systems are special purpose applications without
any commitment to sharing common data such as product, customer,
geography, or calendar with other operational systems in the organization.
Of course, a broadly adopted cross-application enterprise resource planning
(ERP) system or operational master data management system could help
address these shortcomings.
Extract, Transformation, and Load System:
● The extract, transformation, and load (ETL) system of the DW/BI environment
consists of a work area, instantiated data structures, and a set of processes.
● Extraction is the first step in the process of getting data into the data warehouse
environment.
● Extracting means reading and understanding the source data and copying the data
needed into the ETL system for further manipulation.
● After the data is extracted to the ETL system, there are numerous potential
transformations, such as cleansing the data (correcting misspellings, resolving
domain conflicts, dealing with missing elements, or parsing into standard formats),
combining data from multiple sources, and de-duplicating data.
● The ETL system adds value to the data with these cleansing and conforming tasks
by changing the data and enhancing it.
● The final step of the ETL process is the physical structuring and loading of data into
the presentation area’s target dimensional models.
● When the dimension and fact tables in a dimensional model have been updated,
indexed, supplied with appropriate aggregates, and further quality assured, the
business community is notified that the new data has been published.
● The ETL system is typically dominated by the simple activities of sorting and
sequential processing.
● The creation of both normalized structures for the ETL and dimensional structures
for presentation means that the data is potentially extracted, transformed, and loaded
twice—once into the normalized database and then again when you load the
dimensional model.
● Although enterprise-wide data consistency is a fundamental goal of the DW/BI
environment, there may be effective and less costly approaches than physically
creating normalized tables in the ETL system, if these structures don’t already exist.
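The extract, transform and load steps described above can be sketched in Python. The CSV extract, the misspelling table `CITY_FIXES` and the `Unknown` default for missing cities are all hypothetical cleansing rules. Treating identical rows as duplicates after cleansing is an assumption, not a general rule.

```python
import csv
import io

# Hypothetical extract from a source system (a CSV export) with typical defects.
SOURCE_CSV = """customer,city,amount
 ravi kumar ,hyderbad,100
Ravi Kumar,Hyderabad,250
RAVI KUMAR,Hyderabad,250
Anita Rao,,75
"""

CITY_FIXES = {"hyderbad": "Hyderabad"}  # hypothetical known misspellings
UNKNOWN = "Unknown"                     # hypothetical default for missing values

def extract(text):
    """Read the source data and copy it into the ETL work area (a list of dicts)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Cleanse, standardize and de-duplicate the extracted rows."""
    seen, clean = set(), []
    for row in rows:
        customer = " ".join(row["customer"].split()).title()  # standard name format
        city = row["city"].strip()
        city = CITY_FIXES.get(city.lower(), city.title()) if city else UNKNOWN
        amount = float(row["amount"])
        key = (customer, city, amount)
        if key not in seen:  # drop rows that are identical after cleansing
            seen.add(key)
            clean.append({"customer": customer, "city": city, "amount": amount})
    return clean

def load(rows):
    """Load into a tiny dimensional model: a customer dimension plus sales facts."""
    dim_customer = {}  # customer name -> (surrogate key, city)
    fact_sales = []
    for row in rows:
        if row["customer"] not in dim_customer:
            dim_customer[row["customer"]] = (len(dim_customer) + 1, row["city"])
        key, _ = dim_customer[row["customer"]]
        fact_sales.append((key, row["amount"]))
    return dim_customer, fact_sales

dim, facts = load(transform(extract(SOURCE_CSV)))
print(dim)    # {'Ravi Kumar': (1, 'Hyderabad'), 'Anita Rao': (2, 'Unknown')}
print(facts)  # [(1, 100.0), (1, 250.0), (2, 75.0)]
```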
Presentation Area to Support Business Intelligence:
● The DW/BI presentation area is where data is organized, stored, and made available
for direct querying by users, report writers, and other analytical BI applications.
● Dimensional modeling is the most viable technique for delivering data to DW/BI
users.
● A key requirement of the presentation area is that it must contain detailed, atomic data.
● Atomic data is required to withstand assaults from unpredictable ad hoc user
queries.
● The presentation data area should be structured around business process
measurement events.
● All the dimensional structures must be built using common, conformed dimensions.
● Data in the queryable presentation area of the DW/BI system must be dimensional,
atomic (complemented by performance-enhancing aggregates), business process-
centric, and adhere to the enterprise data warehouse bus architecture. The data must
not be structured according to individual departments’ interpretation of the data.
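The following minimal sketch shows why conformed dimensions matter. It assumes two hypothetical business-process fact tables (orders and shipments) that are stored at atomic grain and share one product dimension. Aggregates are derived from the atomic rows. Results from the two processes can be combined ("drilled across") because both use the same dimension.

```python
from collections import defaultdict

# Hypothetical conformed product dimension shared by every fact table.
dim_product = {1: {"name": "A", "category": "Toys"},
               2: {"name": "B", "category": "Books"}}

# Two business-process fact tables at atomic grain, keyed to the same dimension.
fact_orders = [(1, 10), (1, 5), (2, 7)]     # (product_key, quantity ordered)
fact_shipments = [(1, 12), (2, 4), (2, 3)]  # (product_key, quantity shipped)

def aggregate(facts, attribute):
    """Derive an aggregate from atomic facts, grouped by a dimension attribute."""
    totals = defaultdict(int)
    for product_key, qty in facts:
        totals[dim_product[product_key][attribute]] += qty
    return dict(totals)

def drill_across(attribute):
    """Combine both processes; the rows line up because the dimension is shared."""
    ordered = aggregate(fact_orders, attribute)
    shipped = aggregate(fact_shipments, attribute)
    return {k: (ordered.get(k, 0), shipped.get(k, 0))
            for k in sorted(set(ordered) | set(shipped))}

print(drill_across("category"))  # {'Books': (7, 7), 'Toys': (15, 12)}
```

Because the facts are atomic, the same data also answers a different ad hoc question. For example, `drill_across("name")` regroups the results by product name without any change to the fact tables.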
Business Intelligence Applications:
● The final major component of the Kimball DW/BI architecture is the business
intelligence (BI) application.
● The term BI application loosely refers to the range of capabilities provided to
business users to leverage the presentation area for analytic decision making.
● A BI application can be as simple as an ad hoc query tool or as complex as a
sophisticated data mining or modeling application.

Alternative DW/BI Architectures:

● A data mart is a storage component concerned with a specific department of an
organization. It is a subset of the data stored in the data warehouse. A data mart
focuses only on a particular function of an organization, such as finance or marketing,
and is maintained by a single authority. Data marts are small in size and flexible.
● Types of Data Mart:
There are three types of data marts:
Dependent Data Mart:
● A dependent data mart is created by extracting data from the central repository, the
data warehouse. First, the data warehouse is created by extracting data (through an ETL
tool) from external sources, and then the data mart is created from the data warehouse.
Dependent data marts are created in the top-down approach of data warehouse
architecture. This model of data mart is used by big organizations.

Independent Data Mart Architecture:

● An independent data mart is created directly from external sources instead of from the
data warehouse. First, the data mart is created by extracting data from external sources,
and then the data warehouse is created from the data present in the data mart.
Independent data marts are designed in the bottom-up approach of data warehouse
architecture. This model of data mart is used by small organizations and is comparatively
cost-effective.
● Independent data marts are not difficult to design and develop. They are beneficial for
achieving short-term goals but may become cumbersome to manage (each with its
own ETL tool and logic) as business needs expand and become more complex.
● An advantage of this model is that individual business units can run the data
mart that suits them best.

Hybrid Data Mart:

This type of data mart is created by extracting data from operational sources or
from the data warehouse. Path 1 reflects accessing data directly from external sources,
and Path 2 reflects the dependent data mart model.
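The three data mart types differ mainly in where the mart gets its data. The following Python sketch is hypothetical. In-memory lists stand in for the warehouse and for an operational source, and the ETL step is omitted.

```python
# Hypothetical rows; "dept" marks which business function each row belongs to.
warehouse = [
    {"dept": "finance", "metric": "revenue", "value": 900},
    {"dept": "marketing", "metric": "leads", "value": 40},
    {"dept": "finance", "metric": "cost", "value": 600},
]
operational_source = [
    {"dept": "marketing", "metric": "clicks", "value": 1200},
    {"dept": "finance", "metric": "refunds", "value": 15},
]

def dependent_mart(dept):
    """Top-down: the mart is a departmental subset of the data warehouse."""
    return [r for r in warehouse if r["dept"] == dept]

def independent_mart(dept):
    """Bottom-up: the mart is fed directly from an external/operational source."""
    return [r for r in operational_source if r["dept"] == dept]

def hybrid_mart(dept):
    """Hybrid: Path 1 reads the source directly, Path 2 reads the warehouse."""
    return independent_mart(dept) + dependent_mart(dept)

print([r["metric"] for r in dependent_mart("finance")])    # ['revenue', 'cost']
print([r["metric"] for r in independent_mart("finance")])  # ['refunds']
print([r["metric"] for r in hybrid_mart("finance")])       # ['refunds', 'revenue', 'cost']
```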
