Data Warehouse - DWDM

A data warehouse is a central repository of integrated data from multiple sources, used for reporting and analysis. It stores current and historical data in one place. Data flows into a data warehouse from transactional systems and databases on a regular schedule, and business users then access it through BI tools. A data mart focuses on a specific subject area or department and can provide quicker access to frequently used data than a full data warehouse. Data marts are often used as part of a bottom-up approach to building an enterprise data warehouse.

Uploaded by

Chandresh Padmani

Data Warehouse

What is a Data Warehouse?

• In computing, a data warehouse (DW or DWH), also known as an
enterprise data warehouse (EDW), is a system used for reporting and
data analysis, and is considered a core component of business
intelligence.
--Wikipedia
• DWs are central repositories of integrated data from one or more
disparate sources. They store current and historical data in a single
place, where it is used to create analytical reports for workers
throughout the enterprise.
• A data warehouse is a central repository of information that can be
analyzed to make better informed decisions.
• Data flows into a data warehouse from transactional systems, relational
databases, and other sources, typically on a regular cadence.
• Business analysts, data scientists, and decision makers access the data
through business intelligence (BI) tools, SQL clients, and other analytics
applications.

https://aws.amazon.com/data-warehouse/
Characteristics of a Data Warehouse
A data warehouse has the following characteristics (features):
• Subject-oriented: A data warehouse is organized around major subjects,
such as customer, vendor, product, and sales. Rather than concentrating on
the day-to-day operations and transaction processing of an organization, a
data warehouse focuses on the modeling and analysis of data for decision
makers.
• Integrated: A data warehouse is usually constructed by integrating
multiple heterogeneous sources, such as relational databases, flat files,
and on-line transaction records.
• Data cleaning and data integration techniques are applied to ensure
consistency in naming conventions, encoding structures, attribute
measures, and so on.
• Time-variant: Data are stored to provide information from a historical
perspective (e.g., the past 5-10 years).
• Non-volatile: A data warehouse is always a physically separate store of
data transformed from the application data found in the operational
environment. Due to this separation, a data warehouse does not require
transaction processing, recovery, and concurrency control mechanisms. It
usually requires only two operations in data accessing: initial loading of
data and access of data.
Data Warehouse overview

https://en.wikipedia.org/wiki/Data_warehouse#/media/File:Data_Warehouse_Feeding_Data_Mart.jpg
Data Warehouse Component

https://www.guru99.com/data-warehouse-architecture.html#8
ETL(Extract, Transform and Load)
vs
ELT (Extract, Load and Transform)
ETL based Data warehousing
ETL (Extract, Transform and Load) is a process that extracts data from different
RDBMS source systems, then transforms the data (for example by applying calculations or
concatenations), and finally loads the data into the data warehouse system.
Transformation examples:
• Converting numerical values
• Editing text strings
• Matching rows and columns
• Find and replace
• Changing column names
• Recombining columns from different tables and databases
• Precalculating intermediate aggregates
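A few of the transformations listed above can be illustrated with a minimal Python sketch. The source rows, column names, and the `transform` and `precalculate_totals` helpers are hypothetical and chosen only for illustration; a real ETL pipeline would read from source systems and write to an actual warehouse.

```python
# Hypothetical source rows, as they might arrive from an operational system.
source_rows = [
    {"cust_fname": "ada", "cust_lname": "lovelace", "amount": "120.50", "region": "N/A"},
    {"cust_fname": "alan", "cust_lname": "turing", "amount": "79.50", "region": "EU"},
    {"cust_fname": "grace", "cust_lname": "hopper", "amount": "200.00", "region": "EU"},
]


def transform(row):
    """Apply a few of the listed transformations to one extracted row."""
    return {
        # Changing column names and recombining columns (concatenation).
        "customer_name": f"{row['cust_fname'].title()} {row['cust_lname'].title()}",
        # Converting numerical values: text to float.
        "amount": float(row["amount"]),
        # Find and replace: normalise a placeholder value.
        "region": "UNKNOWN" if row["region"] == "N/A" else row["region"],
    }


def precalculate_totals(rows):
    """Precalculate an intermediate aggregate: total amount per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


transformed = [transform(r) for r in source_rows]  # Transform step
warehouse_table = list(transformed)                # Load step (stand-in for a warehouse write)
region_totals = precalculate_totals(warehouse_table)
```

The point of the sketch is the ordering: every row is cleaned and reshaped before it reaches `warehouse_table`.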
ETL based Data warehousing

https://panoply.io/data-warehouse-guide/etl-tutorial/
ELT based Data warehousing
In this approach, data is extracted from heterogeneous source systems and then
loaded directly into the data warehouse, before any transformation occurs. All necessary
transformations are then handled inside the data warehouse itself. Finally, the transformed
data is loaded into target tables in the same data warehouse.
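The ELT ordering can be sketched with Python's built-in `sqlite3` module, where an in-memory SQLite database stands in for the data warehouse. The table names `staging_orders` and `orders_clean` and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the data warehouse

# Extract + Load: raw rows are loaded unchanged into a staging table.
conn.execute("CREATE TABLE staging_orders (customer TEXT, amount TEXT, country TEXT)")
raw_rows = [("ada", "120.50", "uk"), ("alan", "79.50", "uk"), ("grace", "200.00", "us")]
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

# Transform: performed inside the warehouse with SQL, writing to a target table.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT customer,
           CAST(amount AS REAL) AS amount,
           UPPER(country)       AS country
    FROM staging_orders
    """
)

clean_rows = conn.execute(
    "SELECT customer, amount, country FROM orders_clean ORDER BY customer"
).fetchall()
```

Unlike ETL, the raw data is already inside the warehouse when the transformation runs, so the transformation is expressed in the warehouse's own query language.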
ELT based Data warehousing

https://www.xplenty.com/blog/etl-vs-elt/#overview
ETL vs ELT

https://www.softwareadvice.com/resources/etl-vs-elt-for-your-data-warehouse/
Three-Tier Data Warehouse Architecture
Data Warehouse Design
There are two approaches:
• Top-down approach (Bill Inmon)
• Bottom-up approach (Ralph Kimball)
Data Warehouse Design (Top-Down)
In the top-down approach, the data warehouse is designed first, and data
marts are then built on top of the data warehouse.
Advantages of top-down design are:
• Provides consistent dimensional views of data across data marts, as all data
marts are loaded from the data warehouse.
• This approach is robust against business changes. Creating a new data mart
from the data warehouse is very easy.
Disadvantages of top-down design are:
• This methodology is inflexible to changing departmental needs during the
implementation phase.
• The cost and time required for design and maintenance are very high.
Data Warehouse Design (Bottom-Up)
In this method, data marts are created first to provide reporting and
analytics capability for specific business processes; the enterprise data
warehouse is later built from these data marts.
• A data mart addresses a single business area, such as sales or finance. These data
marts are then integrated to build a complete data warehouse.
• The integration of data marts is implemented using the data warehouse bus
architecture.
• In the bus architecture, a dimension is shared between facts in two or more data
marts. Such dimensions are called conformed dimensions.
• The conformed dimensions from the data marts are integrated, and the data
warehouse is then built.
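A conformed dimension can be sketched as one dimension table referenced by fact tables from two different data marts. The example below uses Python's built-in `sqlite3`; the table names (`dim_product`, `fact_sales`, `fact_inventory`) and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conformed dimension: one product table shared by both data marts.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Fact table of a hypothetical sales data mart.
    CREATE TABLE fact_sales (product_id INTEGER, units_sold INTEGER);
    -- Fact table of a hypothetical inventory data mart.
    CREATE TABLE fact_inventory (product_id INTEGER, units_in_stock INTEGER);

    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7);
    INSERT INTO fact_inventory VALUES (1, 40), (2, 25);
    """
)

# Aggregate each fact table by the shared key first, then combine the results
# through the conformed dimension.
rows = conn.execute(
    """
    SELECT p.name, s.sold, i.in_stock
    FROM dim_product AS p
    JOIN (SELECT product_id, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_id) AS s
      ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(units_in_stock) AS in_stock
          FROM fact_inventory GROUP BY product_id) AS i
      ON i.product_id = p.product_id
    ORDER BY p.name
    """
).fetchall()
```

Because both marts use the same product keys and names, their facts can be compared side by side without any mapping step.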
Advantages of the bottom-up approach:
• Because the data marts are created first, reports can be generated quickly.
• More data marts can be added over time, so the data warehouse can be
extended incrementally.
• The cost and time required to design this model are comparatively low.
Disadvantage of the bottom-up approach:
• This model is not as strong as the top-down approach, because the dimensional
view across data marts is not as consistent as it is in the top-down approach.
Why build a data warehouse at all—why not just run analytics queries
directly on an online transaction processing (OLTP) database, where
the transactions are recorded?
Answer: Data warehouses are optimized for batched write operations
and reading high volumes of data, whereas OLTP databases are
optimized for continuous write operations and high volumes of small
read operations.
Data Warehouse vs Database
| Characteristics | Data Warehouse | Database |
| --- | --- | --- |
| Suitable workloads | Analytics, reporting, big data | Transaction processing |
| Data source | Data collected and normalized from many sources | Data captured as-is from a single source, such as a transactional system |
| Data capture | Bulk write operations, typically on a pre-determined batch schedule | Optimized for continuous write operations as new data is available, to maximize transaction throughput |
| Data normalization | Denormalized schemas, such as the Star schema or Snowflake schema | Highly normalized, static schemas |
| Data storage | Optimized for simplicity of access and high-speed query performance using columnar storage | Optimized for high-throughput write operations to a single row-oriented physical block |
| Data access | Optimized to minimize I/O and maximize data throughput | High volumes of small read operations |
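The difference between row-oriented and columnar storage can be sketched with plain Python data structures. This is only a toy model of the two layouts, with made-up order data; real storage engines add compression, indexing, and on-disk block management.

```python
# Row-oriented layout: each record is stored together (typical for OLTP databases).
row_store = [
    {"order_id": 1, "customer": "ada", "amount": 120.5},
    {"order_id": 2, "customer": "alan", "amount": 79.5},
    {"order_id": 3, "customer": "grace", "amount": 200.0},
]

# Column-oriented layout: each column is stored together (typical for warehouses).
column_store = {
    "order_id": [1, 2, 3],
    "customer": ["ada", "alan", "grace"],
    "amount": [120.5, 79.5, 200.0],
}

# Transactional access: fetch one whole record by key.
order_2 = next(r for r in row_store if r["order_id"] == 2)

# Analytical access: aggregate one column; only that column has to be read.
total_amount = sum(column_store["amount"])
```

In the columnar layout the aggregate touches a single list, which is the intuition behind minimizing I/O for analytical queries.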
Data Mart
Why do we need a Data Mart?
• A data mart improves user response time because it holds a smaller volume of
data.
• It provides easy access to frequently requested data.
• A data mart is simpler to implement than a corporate data warehouse, and the
cost of implementing it is lower than that of a full data warehouse.
• Compared to a data warehouse, a data mart is agile: if the model changes, a data
mart can be rebuilt more quickly because of its smaller size.
• A data mart is defined by a single subject matter expert (SME), whereas a data
warehouse is defined by interdisciplinary SMEs from a variety of domains. Hence,
a data mart is more open to change than a data warehouse.
• Data is partitioned, which allows very granular access control privileges.
• Data can be segmented and stored on different hardware/software platforms.
Types of Data Mart
1. Dependent: A dependent data mart draws its data from an existing central data
warehouse.
2. Independent: An independent data mart is created without the use of a central data
warehouse, drawing data directly from operational sources, external sources, or both.
3. Hybrid: A hybrid data mart can take data from data warehouses or operational
systems.
Types of Data Mart (Dependent Data Mart)
• A dependent data mart (top-down approach) gets its data
from a data warehouse. Data in the data warehouse is
aggregated, restructured, and summarized when it
passes into a dependent data mart.
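As a rough sketch of how warehouse data might be aggregated and summarized on its way into a dependent data mart, the example below uses Python's built-in `sqlite3`. The tables `warehouse_sales` and `mart_sales_monthly` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Detailed data held in the (hypothetical) central warehouse.
    CREATE TABLE warehouse_sales (sale_date TEXT, department TEXT, region TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('2024-01-05', 'sales',   'north', 100.0),
        ('2024-01-20', 'sales',   'north',  50.0),
        ('2024-02-03', 'sales',   'south',  75.0),
        ('2024-02-10', 'finance', 'north', 999.0);

    -- The mart keeps only the sales subject area, summarised by month and region.
    CREATE TABLE mart_sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE department = 'sales'
    GROUP BY substr(sale_date, 1, 7), region;
    """
)

mart_rows = conn.execute(
    "SELECT month, region, total_amount FROM mart_sales_monthly ORDER BY month, region"
).fetchall()
```

The mart ends up smaller than the warehouse table in two ways: it covers one department only, and it stores monthly totals instead of individual sales.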
Types of Data Mart (Independent Data Mart)
• An independent data mart, also known as a stand-alone
data mart, focuses on a particular subject area. It is
not designed in an enterprise context.
• Business intelligence tools or analytic tools query data
directly from the data mart and present information to
the user.
• Independent data marts can be built in a short time.
Types of Data Mart (Hybrid Data Mart)
• A hybrid data mart combines input from a data warehouse
with input from other sources. This can be helpful
when you want ad hoc integration, for example after a new
group or product is added to the organization.
Steps in Implementing a Data Mart
Implementing a data mart involves the following steps:
• Designing
• Building (Constructing)
• Populating
• Accessing
• Managing
Designing: This step includes identifying the subject or topic for which the data mart
will store data.
It involves the following tasks:
• Gathering the business and technical requirements
• Identifying data sources
• Selecting the appropriate subset of data
• Designing the logical and physical architecture of the data mart.
Building (Constructing): This step involves creating the physical database and logical
structures associated with the data mart to provide fast and efficient access to the data.
It involves the following tasks:
• Creating the physical database and logical structures, such as tablespaces, associated
with the data mart.
• Creating the schema objects, such as tables and indexes, described in the design step.
• Determining how best to set up the tables and access structures.
Populating: This step includes all of the tasks related to getting data from the source,
cleaning it up, modifying it to the right format and level of detail, and moving it into the data
mart.
It involves the following tasks:
• Mapping data sources to target data structures
• Extracting data
• Cleansing and transforming the data
• Loading data into the data mart
• Creating and storing metadata
Accessing: This step involves putting the data to use: querying the data, analyzing it, creating
reports, charts and graphs, and publishing them.
It involves the following tasks:
• Setting up a layer that converts database structures into business terms, so that non-technical
users can access data from the data mart easily.
• Setting up and managing database structures, such as summarized tables, that help queries
submitted through the front-end tools execute rapidly and efficiently.
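One simple way to picture a layer that exposes business terms over database structures is a SQL view that renames cryptic columns and pre-summarizes the data. The sketch below uses Python's built-in `sqlite3`; the table `f_sls`, its columns, and the view `customer_sales` are invented for illustration, and real BI tools usually provide their own semantic layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Physical table with terse, technical column names (hypothetical).
    CREATE TABLE f_sls (cst_id INTEGER, sls_amt REAL);
    INSERT INTO f_sls VALUES (1, 100.0), (1, 50.0), (2, 75.0);

    -- View that presents the same data in business terms, already summarised.
    CREATE VIEW customer_sales AS
    SELECT cst_id AS customer_id, SUM(sls_amt) AS total_sales
    FROM f_sls
    GROUP BY cst_id;
    """
)

# A report query written only against the business-friendly names.
report = conn.execute(
    "SELECT customer_id, total_sales FROM customer_sales ORDER BY customer_id"
).fetchall()
```

A view is recomputed at query time; a summarized table, as mentioned in the tasks, would store the result instead to make front-end queries faster.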
Managing: In this step, management functions are performed, such as:
• Providing secure access to the data.
• Managing the growth of the data.
• Optimizing the system for better performance.
• Ensuring the availability of data even with system failures.
Data Cube
What is a Data Cube?
In computer programming contexts, a data cube (or datacube) is a multi-dimensional
("n-D") array of values.

• Typically, the term datacube is applied in contexts where these arrays are massively
larger than the hosting computer's main memory; examples include
multi-terabyte/petabyte data warehouses and time series of image data.
--Wikipedia
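A tiny 3-D data cube can be modelled in Python as a nested list indexed by quarter, item, and city. The dimension values and sales figures below are made up, and the `cell` helper is a hypothetical convenience function; production cubes are far larger than memory, as noted above.

```python
# Dimension members; their positions define the array indices.
quarters = ["Q1", "Q2"]
items = ["phone", "laptop"]
cities = ["Vancouver", "Toronto"]

# cube[quarter][item][city] holds the measure (dollars sold, in thousands).
cube = [
    [[10, 12], [20, 22]],  # Q1: phone, laptop
    [[11, 13], [21, 23]],  # Q2: phone, laptop
]


def cell(quarter, item, city):
    """Look up one cell of the cube by dimension member names."""
    return cube[quarters.index(quarter)][items.index(item)][cities.index(city)]


q2_laptop_vancouver = cell("Q2", "laptop", "Vancouver")
```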
Example
In the 2-D representation, we look at the AllElectronics sales data for items
sold per quarter in the city of Vancouver. The measure displayed is dollars sold (in
thousands).
Example (4-D cuboid)
Computed versus Stored Data Cubes
• Pre-compute all cells in the cube: If the whole cube is pre-computed, then queries run on
the cube will be very fast. The disadvantage is that the pre-computed cube requires a lot of
memory.
• Pre-compute no cells: To minimize memory requirements, we can pre-compute none of the
cells in the cube. The disadvantage here is that queries on the cube will run more slowly
because the cube will need to be rebuilt for each query.
• Pre-compute some of the cells: As a compromise between these two, we can pre-compute
only those cells in the cube which will most likely be used for decision support queries. The
trade-off between memory space and computing time is called the space-time trade-off,
and it often exists in data mining and computer science in general.
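The partial pre-computation strategy can be sketched as follows: a few aggregations (cuboids) are computed once and stored, and all others are computed on demand. The fact rows, the `compute_cuboid` and `query` helpers, and the choice of which cuboid to pre-compute are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, item, city, dollars_sold).
facts = [
    ("Q1", "phone", "Vancouver", 10),
    ("Q1", "laptop", "Vancouver", 20),
    ("Q2", "phone", "Toronto", 13),
    ("Q2", "laptop", "Vancouver", 21),
]
DIMS = ("quarter", "item", "city")


def compute_cuboid(group_by):
    """Aggregate the measure over the given dimensions by scanning all facts."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


# Pre-compute only the cuboids expected to be queried often (costs space).
precomputed = {("quarter",): compute_cuboid(("quarter",))}


def query(group_by):
    """Answer from the stored cuboid if present, otherwise compute it now."""
    if group_by in precomputed:
        return precomputed[group_by]   # fast path: no scan of the facts
    return compute_cuboid(group_by)    # slow path: full scan (costs time)


by_quarter = query(("quarter",))
by_city = query(("city",))
```

Adding more entries to `precomputed` moves queries from the slow path to the fast path at the cost of extra storage, which is the space-time trade-off described above.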
Operations of OLAP(Data Cube)
Four types of analytical operations in OLAP are:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up: Roll-up is also known as "consolidation" or "aggregation". The roll-up operation can
be performed in two ways:
• Reducing dimensions
• Climbing up a concept hierarchy (a concept hierarchy is a system of grouping things based on
their order or level)
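Both forms of roll-up can be sketched on a small 2-D cube keyed by (quarter, city). The sales figures, the city-to-country hierarchy, and the helper function names are hypothetical.

```python
from collections import defaultdict

# Cells keyed by (quarter, city); values are dollars sold (made-up figures).
sales = {
    ("Q1", "Vancouver"): 10,
    ("Q1", "Toronto"): 12,
    ("Q1", "Chicago"): 8,
    ("Q2", "Vancouver"): 11,
    ("Q2", "Chicago"): 9,
}

# Concept hierarchy for the location dimension: city -> country.
city_to_country = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up_to_country(cells):
    """Climb the concept hierarchy: aggregate cities into countries."""
    out = defaultdict(int)
    for (quarter, city), value in cells.items():
        out[(quarter, city_to_country[city])] += value
    return dict(out)


def roll_up_drop_location(cells):
    """Reduce dimensions: aggregate the location dimension away entirely."""
    out = defaultdict(int)
    for (quarter, _city), value in cells.items():
        out[quarter] += value
    return dict(out)


by_country = roll_up_to_country(sales)
by_quarter = roll_up_drop_location(sales)
```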
Drill-down: In drill-down, data is broken down into smaller, more detailed parts. It is the
opposite of the roll-up process. It can be done by:
• Moving down the concept hierarchy
• Increasing a dimension
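Drill-down by moving down a concept hierarchy can be sketched with a time dimension that goes from quarters to months. The monthly figures and the `drill_down` helper are hypothetical; note that drilling down requires the detailed data to be available, since it cannot be recovered from the totals alone.

```python
from collections import defaultdict

# Detailed data at the month level (made-up figures).
monthly = {"Jan": 4, "Feb": 3, "Mar": 5, "Apr": 6, "May": 2, "Jun": 7}

# Concept hierarchy for the time dimension: month -> quarter.
month_to_quarter = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}

# The summarized (rolled-up) view that a user starts from.
quarterly = defaultdict(int)
for month, value in monthly.items():
    quarterly[month_to_quarter[month]] += value


def drill_down(quarter):
    """Move down the hierarchy: show the months that make up one quarter."""
    return {m: v for m, v in monthly.items() if month_to_quarter[m] == quarter}


q1_detail = drill_down("Q1")
```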
Slice: One dimension is selected, and a new sub-cube is created.
Dice: This operation is similar to a slice. The difference is that in a dice you select
two or more dimensions, which results in the creation of a sub-cube.
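Slice and dice can be sketched on a small cube stored as a dictionary keyed by (quarter, item, city). The cube contents and the `slice_cube` and `dice_cube` helpers are hypothetical.

```python
# Cells keyed by (quarter, item, city); values are dollars sold (made-up figures).
cube = {
    ("Q1", "phone", "Vancouver"): 10,
    ("Q1", "laptop", "Vancouver"): 20,
    ("Q1", "phone", "Toronto"): 12,
    ("Q2", "phone", "Vancouver"): 11,
    ("Q2", "laptop", "Toronto"): 23,
}


def slice_cube(cells, quarter):
    """Slice: fix one value on one dimension; the result has one dimension fewer."""
    return {(item, city): v for (q, item, city), v in cells.items() if q == quarter}


def dice_cube(cells, quarters, items):
    """Dice: select values on two (or more) dimensions; the result is a sub-cube."""
    return {k: v for k, v in cells.items() if k[0] in quarters and k[1] in items}


q1_slice = slice_cube(cube, "Q1")
phone_dice = dice_cube(cube, quarters={"Q1", "Q2"}, items={"phone"})
```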
Pivot: You rotate the data axes to provide an alternative presentation
of the data.
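A pivot on a 2-D view amounts to swapping the row and column axes. In the sketch below, the table contents and the `pivot` helper are made up for illustration.

```python
# A 2-D view with quarters as rows and items as columns (made-up figures).
table = {
    "Q1": {"phone": 10, "laptop": 20},
    "Q2": {"phone": 11, "laptop": 21},
}


def pivot(view):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_key, row in view.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_item = pivot(table)
```

The values are unchanged; only the orientation differs, so pivoting twice returns the original view.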
