DWM Chp2 Notes

Uploaded by

sr5824241
Copyright
© © All Rights Reserved

Chp 2

2.1
Data warehouse modeling:-
Data warehouse modeling is the process of designing the schemas for the detailed
and summarized information held in the data warehouse. The goal of data warehouse
modeling is to develop a schema that describes reality, or at least the part of
reality that the data warehouse is needed to support.

Data warehouse modeling is the first step in building a data warehouse system. The
schemas are crafted from the detailed information provided by the client or business
owners and are then refined. The resulting model gathers the available facts about
the database so that the client can visualize the relationships between the
components of the data warehouse, such as the databases, the tables, and the
contents of the tables, including indexes and views. A well-structured model leads
to an efficient data warehouse, which helps lower the overall cost of using the data
warehouse in business decision-making.

Need for Data Warehouse Modeling

 Collecting business requirements
 Improving the performance of the database
 Providing documentation of the source and target systems

Data Cube and OLAP


What is Data Cube?

When data is grouped or combined in multidimensional matrices, the result is called
a data cube. The data cube method goes by a few alternative names and has a few
variants, such as "multidimensional databases," "materialized views," and "OLAP
(On-Line Analytical Processing)."

The general idea of this approach is to materialize certain expensive computations
that are frequently queried.

Dhrupesh Sir 9699692059

A grouping of data in a multidimensional matrix is called a data cube. In data
warehousing we generally work with multidimensional data models, because the data
is described by multiple dimensions and multiple attributes. This multidimensional
data is represented as a data cube, which can have any number of dimensions and is
not limited to three.

A data cube is a data structure that allows fast data analysis. A data cube contains
two basic types of data:-

Facts or Measures:- In an OLAP data cube, Facts or Measures are numerical values
that can be accessed at either a detailed dimension member level or can be
aggregated at higher dimension levels. These central values are pre-processed,
aggregated, and analysed. Examples of typical facts/measures include Sales amount,
Units sold, Cost amount, and Transactions count.

Dimensions:- Dimensions are used to organize the facts or measures, and they
consist of categories. Dimensions include dimension attributes, levels, and
hierarchies.
Dimension Attribute:

An attribute is an additional piece of information that provides more details about
the data. Some attributes are used for display purposes, while others describe
characteristics such as colours, flavours, or sizes.

Hierarchy:

A hierarchy is a logical structure that arranges the members of a dimension into a
tree-like structure. Each member of a hierarchy, apart from the topmost one, has
exactly one parent member, and each member has zero or more child members. For
instance, a Time dimension may have a hierarchy consisting of the levels Year,
Quarter, Month, and Day. In this case, January is a child of Qtr1, the next higher
level in the Time hierarchy. Similarly, Qtr1, which consolidates the values of all
its children, is the parent of January in the same hierarchy.

Level:

Within a hierarchy, data can be organized into different levels of detail, such as Year,
Quarter, Month, and Day in a Time hierarchy, or Country, State, and City in a
geography dimension.
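
As a minimal sketch of how a hierarchy consolidates values, the Python snippet below
maps months to their parent quarters and sums a measure from the Month level up to
the Quarter level. The mapping MONTH_TO_QUARTER, the function consolidate, and all
the numbers are made up for illustration and are not taken from these notes.

```python
from collections import defaultdict

# Hypothetical Time hierarchy: each month (child) points to its quarter (parent).
MONTH_TO_QUARTER = {
    "Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1",
    "Apr": "Qtr2", "May": "Qtr2", "Jun": "Qtr2",
}

# Made-up "units sold" measure recorded at the Month level.
units_by_month = {"Jan": 10, "Feb": 20, "Mar": 5, "Apr": 7}


def consolidate(values, child_to_parent):
    """Sum each child's value into its parent member."""
    totals = defaultdict(int)
    for child, value in values.items():
        totals[child_to_parent[child]] += value
    return dict(totals)


units_by_quarter = consolidate(units_by_month, MONTH_TO_QUARTER)
print(units_by_quarter)  # {'Qtr1': 35, 'Qtr2': 7}
```

The same function could be applied again with a quarter-to-year mapping to reach the
Year level.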

What is OLAP?

OLAP stands for Online Analytical Processing, which is a technology that enables
multi-dimensional analysis of business data. It provides interactive access to large
amounts of data and supports complex calculations and data aggregation. OLAP is
used to support business intelligence and decision-making processes.

OLAP (for online analytical processing) is software for performing multidimensional
analysis at high speeds on large volumes of data from a data warehouse, data mart,
or some other unified, centralized data store.

Data Cube
Data that is grouped or combined in multidimensional matrices forms a data cube.

The data cube can be classified into two categories:

Multidimensional data cube: It stores large amounts of data in a multi-dimensional
array. It gains efficiency by keeping an index for each dimension, which lets it
retrieve data quickly.

Relational data cube: It stores large amounts of data in relational tables. Each
relational table represents a dimension of the data cube. It is slower than a
multidimensional data cube.
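
The difference between the two storage styles can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: the member
names, the numbers, and the functions lookup_array and lookup_rows are invented,
and a real relational engine would normally use indexes rather than the plain scan
shown here.

```python
# Hypothetical dimension members; the position of each member is its array index.
items = ["Car", "Bus"]
cities = ["Delhi", "Kolkata"]
item_index = {name: i for i, name in enumerate(items)}
city_index = {name: i for i, name in enumerate(cities)}

# Multidimensional form: a 2-D array of units sold, addressed as [item][city].
array_cube = [
    [5, 3],  # Car: Delhi, Kolkata
    [2, 0],  # Bus: Delhi, Kolkata
]


def lookup_array(item, city):
    """Direct positional access through the per-dimension indexes."""
    return array_cube[item_index[item]][city_index[city]]


# Relational form: one row per non-empty cell, as in a table.
rows = [
    ("Car", "Delhi", 5),
    ("Car", "Kolkata", 3),
    ("Bus", "Delhi", 2),
]


def lookup_rows(item, city):
    """Scan the rows for a match; a missing row means no data (0 here)."""
    for row_item, row_city, units in rows:
        if row_item == item and row_city == city:
            return units
    return 0


print(lookup_array("Car", "Kolkata"), lookup_rows("Car", "Kolkata"))  # 3 3
```

Note that the array stores the empty (Bus, Kolkata) cell explicitly, while the row
form simply has no row for it.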

A data cube is created from a subset of the attributes in the database. Certain
attributes are chosen to be measure attributes, i.e., the attributes whose values
are of interest. Other attributes are selected as dimensions or functional
attributes. The measure attributes are aggregated according to the dimensions.

The data cube method has many applications. Data cubes can be sparse in many cases,
because not every combination of dimension values has corresponding data in the
database.
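
A small Python sketch may make this concrete. It treats item, city, and quarter as
the dimensions and units sold as the measure, and aggregates made-up records into
cube cells. The records and variable names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Made-up transaction records: (item, city, quarter, units_sold).
records = [
    ("Car", "Delhi", "Q1", 4),
    ("Car", "Delhi", "Q1", 1),
    ("Bus", "Kolkata", "Q2", 2),
]

# The cube maps an (item, city, quarter) cell to the aggregated measure.
cube = defaultdict(int)
for item, city, quarter, units in records:
    cube[(item, city, quarter)] += units
cube = dict(cube)

# 2 items x 2 cities x 2 quarters gives 8 possible cells; only 2 hold data.
possible_cells = 2 * 2 * 2
print(cube[("Car", "Delhi", "Q1")])  # 5
print(len(cube), "of", possible_cells, "cells are populated")
```

Storing only the populated cells, as the dictionary does here, is one simple way to
cope with sparsity.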

Example of Data Cube:-

Advantages of data cubes:

 Multi-dimensional analysis
 Interactivity
 Speed and efficiency
 Data aggregation
 Improved decision-making
 Accessibility

Disadvantages of data cube:

 Complexity
 Data size limitations
 Performance issues
 Data integrity
 Cost
 Inflexibility

Multidimensional Data model:-

The multidimensional data model is a method for organizing the data in a database
so that its contents are well arranged and assembled.

The multidimensional data model allows users to ask analytical questions about
market or business trends, whereas relational databases let users access data
through queries. It lets users receive answers to their requests quickly, because
the data can be created and examined comparatively fast.

Example:

Let us take the example of a firm. The revenue and costs of a firm can be analysed
along different factors, such as the geographical location of the firm's
workplaces, the firm's products, the advertisements run, the time taken to develop
a product, etc.

2.2
Schemas in Data warehouse
Star Schema
 The star schema is the simplest data warehouse schema. It gets its name because
its structure resembles a star. A star schema holds data in the form of facts and
dimensions.
 In a star schema, the center of the star holds one fact table, which is
associated with a number of dimension tables.
 The fact table sits at the center of the star, and the dimension tables are the
points of the star.
 The fact table contains a large amount of data, with no redundancy. Each
dimension table is joined to the fact table through a primary key to foreign key
relationship.
 It is also known as the Star Join Schema and is optimized for querying large data
sets.
 Each dimension table contains a set of attributes.
 The following diagram shows the sales data of a company with respect to four
dimensions, namely time, item, branch, and location.
 There is a fact table at the centre. It contains the keys to each of the four
dimensions.
 The fact table also contains the measures, namely dollars sold and units sold.
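
The sales star schema described above could be sketched with Python's built-in
sqlite3 module as follows. The table names (sales_fact, time_dim, and so on), the
dimension columns other than the keys, and all the rows are assumptions made for
the example, since the notes only name the four dimensions and the two measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
-- Fact table: one foreign key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Car'), (2, 'Bus');
INSERT INTO branch_dim VALUES (1, 'B1');
INSERT INTO location_dim VALUES (1, 'Delhi'), (2, 'Kolkata');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 100.0, 2), (1, 2, 1, 2, 50.0, 1),
                              (2, 1, 1, 1, 150.0, 3);
""")

# A typical star join: the fact table joined to one dimension, then aggregated.
units_by_item = conn.execute("""
    SELECT i.item_name, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_key = f.item_key
    GROUP BY i.item_name
    ORDER BY i.item_name
""").fetchall()
print(units_by_item)  # [('Bus', 1), ('Car', 5)]
```

Every dimension is exactly one join away from the fact table, which is what keeps
star-schema queries simple.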

Advantages:

1. Offers highly optimized performance.

2. Applicable to both small-scale data sources like data marts and large-scale data
sources such as databases.

3. The star schema is simple to implement and maintain.

Disadvantages:

1. May offer less accuracy and consistency.

2. Data may be de-normalized.

3. May have issues with data redundancy.

Snowflake Schema

 The snowflake schema is more complex than the star schema because the dimension
tables of the snowflake schema are normalized.
 The snowflake schema consists of a centralized fact table connected to multiple
dimension tables, and these dimension tables can be normalized into additional
dimension tables.
 The major difference between the snowflake and star schema models is that the
dimension tables of the snowflake model are normalized to reduce redundancy.
 Some dimension tables in the snowflake schema are normalized.
 The normalization splits up the data into additional tables.
 Unlike in the star schema, the dimension tables in a snowflake schema are
normalized. For example, the item dimension table in the snowflake schema is
normalized and split into two dimension tables, namely the item and supplier
tables.
 Now the item dimension table contains the attributes item_key, item_name, type,
brand, and supplier_key.
 The supplier_key is linked to the supplier dimension table. The supplier
dimension table contains the attributes supplier_key and supplier_type.
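
The item and supplier split can be sketched with Python's built-in sqlite3 module.
The column names follow the attributes listed above. The table names, the cut-down
fact table, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier details are split out of the item dimension (normalization).
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
CREATE TABLE sales_fact (
    item_key   INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER
);
INSERT INTO supplier_dim VALUES (1, 'Wholesale'), (2, 'Retail');
INSERT INTO item_dim VALUES (1, 'Car', 'Vehicle', 'BrandA', 1),
                            (2, 'Bus', 'Vehicle', 'BrandB', 2);
INSERT INTO sales_fact VALUES (1, 4), (2, 1), (1, 2);
""")

# Reaching supplier_type needs two joins: fact -> item -> supplier.
units_by_supplier_type = conn.execute("""
    SELECT s.supplier_type, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_key = f.item_key
    JOIN supplier_dim AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(units_by_supplier_type)  # [('Retail', 1), ('Wholesale', 6)]
```

The extra join is the price paid for storing each supplier_type only once.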

Advantages:
1. Offers reduced data redundancy.

2. Uses normalized data.

3. Provides more accurate and consistent results compared to the star schema.

Disadvantages:

1. Can be slower than the star schema due to the use of joins.

2. More complex than the star schema.

3. May require more complex queries due to the use of joins.

Fact Constellation Schema

 A fact constellation has multiple fact tables. It is also known as a galaxy schema.
 The following diagram shows two fact tables, namely sales and shipping.
 The sales fact table is the same as that in the star schema.
 The shipping fact table has five dimensions, namely item_key, time_key,
shipper_key, from_location, and to_location.
 The shipping fact table also contains two measures, namely dollars cost and
units shipped.
 It is also possible to share dimension tables between fact tables. For example,
the time, item, and location dimension tables are shared between the sales and
shipping fact tables.
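
A cut-down sketch of a fact constellation, again using Python's built-in sqlite3
module, is given below. Only one shared dimension (item) is modelled to keep it
short. The table names, the units_shipped column, and the rows are assumptions for
the example rather than details taken from the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table shared by both fact tables.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    item_key   INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER
);
CREATE TABLE shipping_fact (
    item_key      INTEGER REFERENCES item_dim(item_key),
    units_shipped INTEGER
);
INSERT INTO item_dim VALUES (1, 'Car'), (2, 'Bus');
INSERT INTO sales_fact VALUES (1, 5), (2, 1);
INSERT INTO shipping_fact VALUES (1, 4), (1, 1), (2, 1);
""")

# Aggregate each fact table separately, then line the results up by item.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(units_sold) FROM sales_fact s WHERE s.item_key = i.item_key),
           (SELECT SUM(units_shipped) FROM shipping_fact h WHERE h.item_key = i.item_key)
    FROM item_dim AS i
    ORDER BY i.item_name
""").fetchall()
print(sold_vs_shipped)  # [('Bus', 1, 1), ('Car', 5, 5)]
```

Because both fact tables point at the same item_dim rows, their measures can be
compared item by item.
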
Advantages:

1. Offers high flexibility.


2. No data redundancy.
3. Requires low memory/space.

Disadvantages:

1. May have a complicated design.


2. Creating, implementing, and maintaining a galaxy schema can be a challenging
task.
3. May require more complex queries due to the higher number of joins used to
connect fact and dimension tables.
4. Data analysis may be difficult due to the complex structure.

2.3
OLAP
OLAP can be defined as the process of converting raw data into business
information through multi-dimensional analysis. For a data warehouse application,
a well-designed metadata layer provides a multi-dimensional view of the data.

OLAP is a key technology for successful management. The term describes a class of
applications that require multi-dimensional analysis of business data.

Types of OLAP Servers

1. Multi-dimensional OLAP
2. Relational OLAP
3. Hybrid OLAP
4. Specialized SQL Servers
5. Desktop OLAP

1. Multi-dimensional OLAP
 Multi-dimensional OLAP (MOLAP) is based on a multi-dimensional database
architecture.
 It stores data in a multi-dimensional data cube that is already in the OLAP
multi-dimensional format for ‘slicing and dicing’ into analysis views.
 Multi-dimensional architectures provide performance benefits where the data
retrieval paths follow the pre-defined structure of the data cubes.

2. Relational OLAP
 Relational OLAP (ROLAP) is the fastest growing area of OLAP technology, with
new vendors entering the market at an accelerating pace.
 Relational OLAP products are designed to operate directly on a data
warehouse built on relational databases, through a comprehensive metadata
layer.
 ROLAP is suitable for situations:
o where users require unrestricted analysis of a large volume of data,
o where different business areas require different multi-dimensional views
over the same source data,
o where there is a requirement to drill down to a low level of detail
without impacting the operational system, etc.
 It is not suitable where data storage is a limiting factor, because of data
redundancy.

Need of OLAP

 Finance departments use OLAP for applications such as budgeting, activity-based
costing (allocations), financial performance analysis, and financial modeling.
 Sales analysis and forecasting are two of the OLAP applications found in
sales departments.
 Marketing departments use OLAP for market research analysis, sales
forecasting, promotions analysis, and customer analysis and market/customer
segmentation.
 Manufacturing OLAP applications include production planning and defect
analysis.
 The key indicator of a successful OLAP application is its ability to provide
information, as needed, i.e., its ability to provide ‘just-in-time’ information for
effective decision-making.
 OLAP enables analysts, managers, and executives to gain insight into data
through fast, consistent, interactive access to a wide variety of possible views
of information. OLAP transforms raw data so that it reflects the real
dimensionality of the enterprise as understood by the user.
 OLAP plays the role of a mediator to the various types of data sources and
front-end interfaces.

Features of OLAP

The key features of OLAP are:

• Multi-dimensional views of data

• Calculation-intensive capabilities

• Time intelligence

Benefits of OLAP

 Successful OLAP applications increase the productivity of business managers,
developers, and whole organizations.
 IT developers also benefit from using the right OLAP software.
 OLAP reduces the applications backlog still further by making business users
self-sufficient enough to build their own models.
 OLAP enables the organization as a whole to respond more quickly to market
demands. Market responsiveness, in turn, often yields improved revenue and
profitability.

OLAP Guidelines
In 1993, Dr. Edgar F. Codd originated twelve rules as the basis for selecting OLAP
tools.

1. Multidimensional Conceptual View
2. Transparency
3. Accessibility
4. Consistent Reporting Performance
5. Client/Server Architecture
6. Generic Dimensionality
7. Dynamic Sparse Matrix Handling
8. Multi-user Support
9. Unrestricted Cross-dimensional Operations
10. Intuitive Data Manipulation
11. Flexible Reporting
12. Unlimited Dimensions and Aggregation Levels

Multidimensional Conceptual View:

The tool should provide a multidimensional data model that is intuitively
analytical and easy to use. A multidimensional data model corresponds to how users
perceive business problems.

Transparency:

The technology, the underlying data repository, the computing architecture, and
the diverse nature of the source data should be totally transparent to users.

Accessibility:

Access should be provided only to the data that is actually needed to perform the
specific analysis, presenting a single, coherent and consistent view to the users.

Consistent Reporting Performance:

Users should not experience any significant degradation in reporting performance as
the number of dimensions or the size of the database increases. Users should also
perceive a consistent run time, response time, and machine utilization every time a
given query is run.

Client/Server Architecture:

The system should conform to the principles of client/server architecture for
optimum performance, flexibility, adaptability, and interoperability.

Generic Dimensionality:

It should be ensured that every data dimension is equivalent in both structure and
operational capabilities. There should be one logical structure for all dimensions.

Dynamic Sparse Matrix Handling:

The physical schema should adapt to the specific analytical model being created and
loaded, so that sparse matrix handling is optimized.

Multi-user Support:

Support should be provided for end users either to work concurrently with the same
analytical model or to create different models from the same data.

Unrestricted Cross-dimensional Operations:

The system should be able to recognize dimensional hierarchies and automatically
perform roll-up and drill-down operations within a dimension or across dimensions.

Intuitive Data Manipulation:

Consolidation path reorientation, drill-down, roll-up, and other manipulations
should be accomplished intuitively and directly via point-and-click actions.

Flexible Reporting:

Business users should be able to arrange columns, rows, and cells in a manner that
makes manipulation, analysis, and synthesis of information easy.

Unlimited Dimensions and Aggregation Levels:

The tool should support at least fifteen, and preferably twenty, data dimensions
within a common analytical model.

2.4
Typical OLAP Operations:-
There are five OLAP operations that can be applied over the data cube. These are
Roll-up, Drill-down, Slice, Dice, Pivot (rotate).

1. Drill down: In the drill-down operation, less detailed data is converted into
more detailed data. It can be done by:
 Moving down in the concept hierarchy
 Adding a new dimension

In the cube given in the overview section, the drill-down operation is performed by
moving down in the concept hierarchy of the Time dimension (Quarter -> Month).
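
A minimal Python sketch of drilling down from Quarter to Month is shown here. It
assumes the detailed data is kept at the Month level and that the Quarter view is a
summary of it. The figures and the function name drill_down are invented for the
example.

```python
from collections import defaultdict

# Made-up detailed data: units sold per (quarter, month).
sales = {
    ("Q1", "Jan"): 10, ("Q1", "Feb"): 20, ("Q1", "Mar"): 5,
    ("Q2", "Apr"): 7,
}

# Summary view at the Quarter level.
by_quarter = defaultdict(int)
for (quarter, _month), units in sales.items():
    by_quarter[quarter] += units
by_quarter = dict(by_quarter)


def drill_down(quarter):
    """Expand one quarter into its months (Quarter -> Month)."""
    return {month: units for (q, month), units in sales.items() if q == quarter}


print(by_quarter)        # {'Q1': 35, 'Q2': 7}
print(drill_down("Q1"))  # {'Jan': 10, 'Feb': 20, 'Mar': 5}
```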

2. Roll up: It is just the opposite of the drill-down operation. It performs
aggregation on the OLAP cube. It can be done by:
 Climbing up in the concept hierarchy
 Reducing the dimensions

In the cube given in the overview section, the roll-up operation is performed by
climbing up in the concept hierarchy of Location dimension (City -> Country).
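
Both ways of rolling up can be sketched in Python. The city-to-country mapping, the
sample figures, and the two function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical Location hierarchy: City -> Country.
CITY_TO_COUNTRY = {"Delhi": "India", "Kolkata": "India", "Toronto": "Canada"}

# Made-up units sold per (city, item).
sales = {
    ("Delhi", "Car"): 5, ("Kolkata", "Car"): 3,
    ("Toronto", "Car"): 4, ("Delhi", "Bus"): 2,
}


def roll_up_to_country(cells):
    """Climb the Location hierarchy: aggregate city cells into country cells."""
    rolled = defaultdict(int)
    for (city, item), units in cells.items():
        rolled[(CITY_TO_COUNTRY[city], item)] += units
    return dict(rolled)


def drop_item_dimension(cells):
    """Reduce the dimensions: aggregate the Item dimension away entirely."""
    rolled = defaultdict(int)
    for (place, _item), units in cells.items():
        rolled[place] += units
    return dict(rolled)


by_country = roll_up_to_country(sales)
print(by_country)  # {('India', 'Car'): 8, ('Canada', 'Car'): 4, ('India', 'Bus'): 2}
print(drop_item_dimension(by_country))  # {'India': 10, 'Canada': 4}
```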

3. Dice: It selects a sub-cube from the OLAP cube by applying selection criteria on
two or more dimensions. In the cube given in the overview section, a sub-cube is
selected by applying the following criteria:
 Location = “Delhi” or “Kolkata”
 Time = “Q1” or “Q2”
 Item = “Car” or “Bus”
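
The dice criteria above can be sketched in Python as a filter over cube cells. The
cell values and the extra members "Mumbai", "Q3", and "Truck" are invented so that
there is something to filter out. The function name dice is also an assumption.

```python
# Made-up cube cells keyed by (location, time, item) -> units sold.
cube = {
    ("Delhi", "Q1", "Car"): 5,
    ("Kolkata", "Q2", "Bus"): 2,
    ("Mumbai", "Q1", "Car"): 9,
    ("Delhi", "Q3", "Car"): 4,
    ("Kolkata", "Q1", "Truck"): 1,
}


def dice(cells, locations, times, items):
    """Keep only cells whose member is allowed on every listed dimension."""
    return {
        key: units
        for key, units in cells.items()
        if key[0] in locations and key[1] in times and key[2] in items
    }


sub_cube = dice(cube, {"Delhi", "Kolkata"}, {"Q1", "Q2"}, {"Car", "Bus"})
print(sub_cube)  # {('Delhi', 'Q1', 'Car'): 5, ('Kolkata', 'Q2', 'Bus'): 2}
```

The result keeps all three dimensions. Only the set of members on each dimension
has shrunk.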

4. Slice: It fixes a single value on one dimension of the OLAP cube, which produces
a new sub-cube with one fewer dimension. In the cube given in the overview section,
Slice is performed on the dimension Time = “Q1”.
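
A short Python sketch of the slice on Time = "Q1" follows. The cell values and the
function name slice_time are made up for the example.

```python
# Made-up cube cells keyed by (location, time, item) -> units sold.
cube = {
    ("Delhi", "Q1", "Car"): 5,
    ("Delhi", "Q2", "Car"): 6,
    ("Kolkata", "Q1", "Bus"): 2,
}


def slice_time(cells, quarter):
    """Fix Time to one member and drop that dimension from the keys."""
    return {
        (location, item): units
        for (location, time, item), units in cells.items()
        if time == quarter
    }


q1_slice = slice_time(cube, "Q1")
print(q1_slice)  # {('Delhi', 'Car'): 5, ('Kolkata', 'Bus'): 2}
```

The sliced result is two-dimensional (Location by Item), because Time is now fixed.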

5. Pivot: It is also known as the rotation operation, as it rotates the current view
to give a new view of the representation. In the sub-cube obtained after the slice
operation, performing the pivot operation gives a new view of it.
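
For a two-dimensional view, a pivot amounts to swapping the row and column axes.
The Python sketch below shows this on a made-up Location by Item table. The values
and the function name pivot are assumptions for illustration.

```python
# A made-up 2-D slice: rows are locations, columns are items.
locations = ["Delhi", "Kolkata"]
items = ["Car", "Bus"]
view = [
    [5, 2],  # Delhi:   Car, Bus
    [3, 1],  # Kolkata: Car, Bus
]


def pivot(table):
    """Swap the row and column axes (a transpose)."""
    return [list(column) for column in zip(*table)]


pivoted = pivot(view)  # rows are now items, columns are locations
print(pivoted)  # [[5, 3], [2, 1]]
```

The data is unchanged. Only its orientation differs, so pivoting twice returns the
original view.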
