DWM Chp2 Notes
2.1
Data warehouse modeling:-
Data warehouse modeling is the process of designing the schemas of the detailed
and summarized information of the data warehouse. The goal of data warehouse
modeling is to develop a schema describing the reality, or at least the part of reality,
that the data warehouse is needed to support.
Data warehouse modeling is the first step in building a data warehouse system. In
this step, schemas are crafted from the comprehensive information provided by the
client or business owners and are then refined. The model gathers all the available
facts about the database so that the client can visualize the relationships between
the components of the data warehouse, such as the databases, the tables, and the
contents of the tables, including indexes and views. A well-structured model leads to
an efficient data warehouse, which helps lower the overall cost of using the data
warehouse in business decision-making.
A data cube is a data structure that allows fast data analysis. A data cube contains two
basic types of data:-
Facts or Measures:- In an OLAP data cube, Facts or Measures are numerical values
that can be accessed at either a detailed dimension member level or can be
aggregated at higher dimension levels. These central values are pre-processed,
aggregated, and analysed. Examples of typical facts/measures include Sales amount,
Units sold, Cost amount, and Transactions count.
Dimensions:- Dimensions are used to organize the facts or measures, and they
consist of categories. Dimensions include dimension attributes, levels, and
hierarchies.
Dhrupesh Sir 9699692059
Dimension Attribute:
Hierarchy:
Level:
Within a hierarchy, data can be organized into different levels of detail, such as Year,
Quarter, Month, and Day in a Time hierarchy, or Country, State, and City in a
geography dimension.
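The idea of levels within a hierarchy can be sketched in a few lines of Python. This is only an illustration: the daily figures, the `time_levels` helper, and the `aggregate` function are hypothetical names made up for this example. It shows the same measure (units sold) summed at the Year and Quarter levels of a Time hierarchy.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily facts: (day, units_sold). The values are made up.
daily_units = [
    (date(2023, 1, 15), 10),
    (date(2023, 2, 3), 5),
    (date(2023, 4, 20), 7),
    (date(2024, 1, 9), 4),
]

def time_levels(day):
    """Map a day to its member at each level of the Time hierarchy."""
    quarter = (day.month - 1) // 3 + 1
    return {
        "Year": (day.year,),
        "Quarter": (day.year, quarter),
        "Month": (day.year, day.month),
        "Day": (day.year, day.month, day.day),
    }

def aggregate(facts, level):
    """Sum the measure for every member of the chosen level."""
    totals = defaultdict(int)
    for day, units in facts:
        totals[time_levels(day)[level]] += units
    return dict(totals)

by_year = aggregate(daily_units, "Year")        # coarse level
by_quarter = aggregate(daily_units, "Quarter")  # one level finer
```

Each level is just a coarser or finer grouping key over the same facts, which is why a hierarchy can be walked up or down without changing the underlying data.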
What is OLAP?
OLAP stands for Online Analytical Processing, which is a technology that enables
multi-dimensional analysis of business data. It provides interactive access to large
amounts of data and supports complex calculations and data aggregation. OLAP is
used to support business intelligence and decision-making processes.
Data Cube
Data that is grouped or combined into multidimensional matrices is called a data
cube.
Relational data cube: It stores large amounts of data in relational tables. Each
relational table represents the dimensions of the data cube. It is slower than a
Multidimensional Data Cube.
A data cube is created from a subset of attributes in the database. Specific attributes
are chosen to be measure attributes, i.e., the attributes whose values are of interest.
Other attributes are selected as dimensions or functional attributes. The measure
attributes are aggregated according to the dimensions.
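A minimal sketch of this step, assuming a small list of rows where `sales_amount` is the measure attribute and `city`, `item`, and `quarter` are the dimensions. The rows and the `build_cube` helper are hypothetical and only illustrate the idea of aggregating a measure by a chosen set of dimensions.

```python
from collections import defaultdict

# Hypothetical source rows; the attribute names and values are assumptions.
rows = [
    {"city": "Mumbai", "item": "Phone", "quarter": "Q1", "sales_amount": 100.0},
    {"city": "Mumbai", "item": "Phone", "quarter": "Q1", "sales_amount": 50.0},
    {"city": "Pune", "item": "Laptop", "quarter": "Q1", "sales_amount": 200.0},
    {"city": "Pune", "item": "Phone", "quarter": "Q2", "sales_amount": 80.0},
]

def build_cube(rows, dimensions, measure):
    """Aggregate (sum) the measure for each combination of dimension values."""
    cube = defaultdict(float)
    for row in rows:
        cell = tuple(row[d] for d in dimensions)
        cube[cell] += row[measure]
    return dict(cube)

# Full cube over three dimensions, and a coarser view over one dimension.
cube = build_cube(rows, ("city", "item", "quarter"), "sales_amount")
by_city = build_cube(rows, ("city",), "sales_amount")
```

Choosing a different subset of dimensions gives a different view of the same measure; here `by_city` ignores item and quarter altogether.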
Data cube method is an interesting technique with many applications. Data cubes
could be sparse in many cases because not every cell in each dimension may have
corresponding data in the database.
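Sparsity can be made concrete with a small calculation. The sketch below assumes a cube stored as a dictionary that holds only the populated cells; the cell values and the `sparsity` helper are hypothetical. It compares the number of stored cells with the number of cells a dense cube over the same dimension members would need.

```python
# Hypothetical cube: only populated cells are stored, keyed by
# (city, item, quarter). Missing combinations have no data at all.
cells = {
    ("Mumbai", "Phone", "Q1"): 150.0,
    ("Pune", "Laptop", "Q1"): 200.0,
    ("Pune", "Phone", "Q2"): 80.0,
}

def sparsity(cells):
    """Return the fraction of possible cells that hold no data."""
    n_dims = len(next(iter(cells)))
    # Distinct members seen along each dimension.
    members = [{key[i] for key in cells} for i in range(n_dims)]
    possible = 1
    for member_set in members:
        possible *= len(member_set)
    return 1 - len(cells) / possible

# 2 cities x 2 items x 2 quarters = 8 possible cells, 3 populated.
empty_fraction = sparsity(cells)
```

Storing only the populated cells, as the dictionary does here, is one simple way to avoid allocating space for the empty combinations.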
Multi-dimensional analysis
Interactivity
Speed and efficiency
Data aggregation
Improved decision-making
Accessibility
Complexity
Data size limitations
Performance issues
Data integrity
Cost
Inflexibility
The multidimensional data model is a method for organizing data in the database
so that its contents are well arranged and assembled.
Example:
Let us take the example of a firm. The revenue of the firm can be analyzed on
the basis of different factors, such as the geographical location of the firm's
workplace, the products of the firm, the advertisements run, the time taken for
a product to flourish, etc.
Advantages:
2. Applicable to both small-scale data sources like data marts and large-scale data
sources such as databases.
Disadvantages:
Snowflake Schema
Now the item dimension table contains the attributes item_key, item_name,
type, brand, and supplier_key.
The supplier key is linked to the supplier dimension table. The supplier
dimension table contains the attributes supplier_key and supplier_type.
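The link between the two tables can be sketched as follows. The column names come from the description above; the row values and the `supplier_type_of` helper are hypothetical and serve only to show the extra lookup (join) that normalization introduces.

```python
# Supplier dimension table, keyed by supplier_key (values are made up).
supplier_dim = {
    1: {"supplier_key": 1, "supplier_type": "Wholesale"},
    2: {"supplier_key": 2, "supplier_type": "Retail"},
}

# Item dimension table: supplier details are not repeated here,
# only the supplier_key that points into supplier_dim.
item_dim = {
    10: {"item_key": 10, "item_name": "Phone X", "type": "Phone",
         "brand": "Acme", "supplier_key": 1},
    11: {"item_key": 11, "item_name": "Laptop Y", "type": "Laptop",
         "brand": "Acme", "supplier_key": 2},
}

def supplier_type_of(item_key):
    """Follow item -> supplier, the extra join a snowflake schema needs."""
    item = item_dim[item_key]
    supplier = supplier_dim[item["supplier_key"]]
    return supplier["supplier_type"]
```

Because `supplier_type` is stored once per supplier rather than once per item, redundancy drops, at the price of one more lookup per query.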
Advantages:
1. Offers reduced data redundancy.
3. Provides more accurate and consistent results compared to the star schema.
Disadvantages:
1. Can be slower than the star schema due to the use of joins.
A fact constellation has multiple fact tables. It is also known as galaxy schema.
The following diagram shows two fact tables, namely sales and shipping.
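As a rough illustration of the idea, the sketch below models two fact tables, sales and shipping, that share one item dimension. The columns (`units_sold`, `units_shipped`), the row values, and the `total_by_item` helper are assumptions made for this example and are not taken from the diagram.

```python
# Shared dimension table: item_key -> item name (hypothetical values).
item_dim = {10: "Phone", 11: "Laptop"}

# Two separate fact tables that both reference item_dim.
sales_fact = [
    {"item_key": 10, "units_sold": 3},
    {"item_key": 11, "units_sold": 1},
    {"item_key": 10, "units_sold": 2},
]
shipping_fact = [
    {"item_key": 10, "units_shipped": 4},
    {"item_key": 11, "units_shipped": 1},
]

def total_by_item(fact_rows, measure):
    """Sum one measure of a fact table, grouped by the shared item dimension."""
    totals = {}
    for row in fact_rows:
        name = item_dim[row["item_key"]]
        totals[name] = totals.get(name, 0) + row[measure]
    return totals

sold = total_by_item(sales_fact, "units_sold")
shipped = total_by_item(shipping_fact, "units_shipped")
```

The same dimension table serves both fact tables, which is what distinguishes a fact constellation from two independent star schemas.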
Disadvantages:
1. Multi-dimensional OLAP
2. Relational OLAP
3. Hybrid OLAP
4. Specialized SQL Servers
5. Desktop OLAP
1. Multi-dimensional OLAP
Multi-dimensional OLAP (MOLAP) is based on a multi-dimensional database
architecture.
This stores data in a three-dimensional data cube that is already in the OLAP
multi-dimensional format for ‘slicing and dicing’ into analysis views.
Multi-dimensional architectures provide performance benefits where the data
retrieval paths follow the pre-defined structure of the data cubes.
2. Relational OLAP
Relational OLAP (ROLAP) is the fastest growing area of OLAP technology, with
new vendors entering the market at an accelerating pace.
Relational OLAP products are designed to operate directly on a data
warehouse built on relational databases, through a comprehensive metadata
layer.
ROLAP is suitable for situations:
o Where users require unrestricted analysis of a large volume of data,
o Where different business areas require different multi-dimensional views
over the same source data,
o Where there is a requirement to drill down to a low level of detail
without impacting on the operational system, etc.
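The way a ROLAP-style tool answers a multi-dimensional question directly from relational tables can be sketched with Python's built-in `sqlite3` module. The table names (`location_dim`, `sales_fact`), their columns, the data, and the `units_by` helper are all hypothetical; a real ROLAP product would generate such SQL through its metadata layer.

```python
import sqlite3

# In-memory relational store with one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location_dim (
        location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE sales_fact (
        location_key INTEGER, quarter TEXT, units_sold INTEGER);
""")
conn.executemany(
    "INSERT INTO location_dim VALUES (?, ?, ?)",
    [(1, "Mumbai", "India"), (2, "Pune", "India"), (3, "Toronto", "Canada")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(1, "Q1", 10), (2, "Q1", 5), (3, "Q1", 8), (1, "Q2", 7)],
)

def units_by(level):
    """Aggregate units_sold at the requested level of the location dimension."""
    if level not in ("city", "country"):  # whitelist before building SQL
        raise ValueError(f"unknown level: {level}")
    query = f"""
        SELECT d.{level}, SUM(f.units_sold)
        FROM sales_fact AS f
        JOIN location_dim AS d ON d.location_key = f.location_key
        GROUP BY d.{level}
        ORDER BY d.{level}
    """
    return conn.execute(query).fetchall()

detail = units_by("city")      # low level of detail
summary = units_by("country")  # same data, coarser view
```

Both views are computed on demand from the same source tables, so no separate cube has to be built in advance.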
Need of OLAP
• Calculation-intensive capabilities
• Time intelligence
Benefits of OLAP
OLAP Guidelines
In 1993, Dr. Edgar F. Codd originated twelve rules as the basis for selecting OLAP
tools.
A multidimensional data model is provided that is intuitively analytical and easy to use.
The multidimensional data model determines how the users perceive business problems.
Transparency:
Accessibility:
Access should be provided only to the data that is actually needed to perform the
specific analysis, presenting a single, coherent and consistent view to the users.
Client/Server Architecture:
Generic Dimensionality:
It should be ensured that every data dimension is equivalent in both structure and
operational capabilities. There should be one logical structure for all dimensions.
The physical schema should adapt to the specific analytical model being
created and loaded, so that sparse matrix handling is optimized.
Multi-user Support:
Support should be provided for end users to work concurrently with either the same
analytical model or to create different models from the same data.
Flexible Reporting:
The business user is provided with capabilities to arrange columns, rows, and cells in a
manner that allows easy manipulation, analysis and synthesis of information.
2.4
Typical OLAP Operations:-
There are five OLAP operations that can be applied over the data cube. These are
Roll-up, Drill-down, Slice, Dice, Pivot (rotate).
In the cube given in the overview section, the drill-down operation is performed by
moving down in the concept hierarchy of the Time dimension (Quarter -> Month).
In the cube given in the overview section, the roll-up operation is performed by
climbing up in the concept hierarchy of Location dimension (City -> Country).
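The five operations can be sketched on a tiny cube held in a Python dictionary. Everything here is an assumption made for illustration: the cell values, the `CITY_TO_COUNTRY` and `MONTH_TO_QUARTER` hierarchies, and the helper functions. The cube is keyed by (city, month, item), with units sold as the measure.

```python
from collections import defaultdict

# Hypothetical concept hierarchies.
CITY_TO_COUNTRY = {"Mumbai": "India", "Pune": "India", "Toronto": "Canada"}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}

# Base cube at the finest level: (city, month, item) -> units sold.
base = {
    ("Mumbai", "Jan", "Phone"): 10,
    ("Mumbai", "Feb", "Phone"): 4,
    ("Pune", "Jan", "Laptop"): 5,
    ("Toronto", "Apr", "Phone"): 8,
}

def regroup(cube, key_fn):
    """Re-aggregate a cube after mapping every cell key with key_fn."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[key_fn(key)] += value
    return dict(out)

# Cube at Quarter level, the starting point for the operations below.
by_quarter = regroup(base, lambda k: (k[0], MONTH_TO_QUARTER[k[1]], k[2]))

# Roll-up: climb the Location hierarchy (City -> Country).
rolled_up = regroup(by_quarter, lambda k: (CITY_TO_COUNTRY[k[0]], k[1], k[2]))

# Drill-down: Quarter -> Month. Detail cannot be recovered from totals,
# so this goes back to the finer-grained stored data.
drilled_down = regroup(base, lambda k: k)

def slice_cube(cube, axis, member):
    """Slice: fix one dimension to a single member and drop that axis."""
    return {k[:axis] + k[axis + 1:]: v
            for k, v in cube.items() if k[axis] == member}

def dice(cube, allowed):
    """Dice: keep cells whose members fall in the allowed sets per axis."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in members for axis, members in allowed.items())}

def pivot(cube, order):
    """Pivot (rotate): reorder the axes without changing any value."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}

q1_slice = slice_cube(by_quarter, 1, "Q1")
diced = dice(by_quarter, {0: {"Mumbai", "Toronto"}, 2: {"Phone"}})
pivoted = pivot(q1_slice, (1, 0))
```

Roll-up and drill-down change the level of detail, slice and dice select a sub-cube, and pivot only changes the orientation of the view.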