Dimension Modeling
Logical Modeling
Includes entities (tables), attributes (columns/fields) and
relationships (keys).
Uses business names for entities & attributes
Is independent of technology (platform, DBMS)
Is normalized to fourth normal form (4NF)
Physical Modeling
Includes tables, columns, keys, data types, validation rules,
database triggers, stored procedures, domains, and
access constraints
Uses more specific, less generic names for tables and
columns, such as abbreviated column names, limited by the
database management system (DBMS) and any company-defined
standards
Includes primary keys and indices for fast data access.
Logical vs. Physical
Logical          Physical
Entity           Table
Attribute        Column
Primary Key      Primary Key
Alternate Key    Alternate Key
Rule             Rule
Relationship     Foreign Key
Definition       Comment
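As an illustration of how a logical design becomes physical, the following minimal sketch writes DDL for a hypothetical Customer/Sales Order model and runs it on an in-memory SQLite database through Python's built-in sqlite3 module. The table and column names are invented, and implementing the alternate key as a UNIQUE constraint and the rule as a CHECK constraint is one common choice assumed here, not something the comparison prescribes. SQLite has no COMMENT statement, so the Definition -> Comment row is not shown.

```python
import sqlite3

# Hypothetical logical model: entity Customer, entity Sales Order, and the
# relationship "a customer places orders". The SQL comments show which
# physical construct implements each logical element in this sketch.
DDL = """
CREATE TABLE customer (                          -- entity -> table
    customer_id INTEGER PRIMARY KEY,             -- primary key
    email       TEXT NOT NULL UNIQUE,            -- alternate key -> UNIQUE constraint
    cust_nm     TEXT NOT NULL                    -- attribute -> column (abbreviated name)
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id), -- relationship -> foreign key
    order_amt   REAL NOT NULL CHECK (order_amt >= 0) -- rule -> CHECK constraint
);
"""


def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


def accepts(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_schema()
print(accepts(conn, "INSERT INTO customer VALUES (1, 'a@example.com', 'Ann')"))  # True
print(accepts(conn, "INSERT INTO sales_order VALUES (10, 1, -5.0)"))             # False
```

The database, not the application, now enforces the keys, the relationship and the rule, which is the main practical difference between the two models.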
What is ER Modeling?
Entity-relationship (ER) data modeling is used in OLTP systems,
which are transaction-oriented.
Focus of OLTP Design
Individual data elements
Data relationships
Design goals
Accurately model the business
Remove redundancy (normalization)
ER Modeling Shortcomings:
Complex
Unfamiliar to business people
Incomplete history
Slow query performance
Dimensional Modeling
Definition
Logical data model used to represent the measures and
dimensions that pertain to one or more business subject
areas
Dimensional Model = Star Schema
Can easily translate into multi-dimensional database
design if required
Overcomes ER design shortcomings
DM Advantages:
Understandable
Systematically represents history
Reliable join paths
High query performance
Enterprise scalability
ER vs. DM (figure not reproduced; panels labeled "ER", "DM", "Normal Reports")
Dimension tables
A dimension table describes the business entities of an
enterprise, represented as hierarchical, categorical
information such as time, departments, locations, and
products. Dimension tables are sometimes called lookup or
reference tables.
Textual content (Character data)
Characteristics
Hold the dimensional attributes
Usually have a large number of attributes (wide)
Add flags and indicators that make it easy to perform
specific types of reports
Have a small number of rows compared to fact tables
(most of the time)
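To illustrate the point about flags and indicators, here is a minimal sketch that builds a few rows of a time (date) dimension with precomputed quarter, weekend and month-end attributes, so a report can filter on weekend_flag = 'Y' instead of re-deriving calendar rules. The column names and the Y/N convention are assumptions for illustration.

```python
from datetime import date, timedelta


def date_dim_rows(start: date, days: int) -> list:
    """Build date-dimension rows with precomputed attributes and Y/N flags."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        next_day = d + timedelta(days=1)
        rows.append({
            "date_key": offset + 1,                            # sequential surrogate key
            "date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "weekend_flag": "Y" if d.weekday() >= 5 else "N",  # Sat=5, Sun=6
            "month_end_flag": "Y" if next_day.month != d.month else "N",
        })
    return rows


# Reports can now filter on a flag instead of re-deriving calendar rules.
rows = date_dim_rows(date(2024, 1, 31), 4)
weekend_dates = [r["date"] for r in rows if r["weekend_flag"] == "Y"]
```

The wide, textual nature of the table is deliberate: each extra attribute costs storage only once per dimension row, not once per fact row.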
Surrogate Key
A unique primary key generated by the RDBMS that is
not derived from any data in the database and whose only
significance is to act as the primary key. A surrogate key is
frequently a sequential number.
Each table is assigned a unique primary key, generated
specifically for the data warehouse
Example dimension tables, each keyed by a surrogate key:
Time: time_key, year, quarter, month, date
Model: model_key, brand, category, line, model
Dealer: dealer_key, region, state, city, dealer
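The sketch below shows the idea behind surrogate keys for the Dealer dimension: each natural (source-system) key is mapped to a meaningless sequential number the first time it is loaded, and reused afterwards. It assumes the keys are handed out by the load process with an in-memory map, whereas the slide describes the RDBMS generating them (for example via an identity column or sequence); the dealer codes are hypothetical.

```python
from itertools import count


class SurrogateKeyMap:
    """Hand out sequential surrogate keys for natural (source-system) keys.

    The surrogate key means nothing on its own; it only identifies a row in
    the warehouse dimension table.
    """

    def __init__(self, start: int = 1) -> None:
        self._sequence = count(start)
        self._keys: dict = {}

    def key_for(self, natural_key) -> int:
        # Reuse the key if this natural key was already loaded; otherwise
        # take the next number in the sequence.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._sequence)
        return self._keys[natural_key]


# Hypothetical source rows: (region, state, city, dealer code)
source_rows = [
    ("West", "CA", "San Jose", "D-017"),
    ("East", "NY", "Albany", "D-002"),
    ("West", "CA", "San Jose", "D-017"),  # same dealer arrives again
]

dealer_keys = SurrogateKeyMap()
dealer_dim = {}
for region, state, city, dealer in source_rows:
    dealer_key = dealer_keys.key_for(dealer)
    dealer_dim[dealer_key] = {"region": region, "state": state,
                              "city": city, "dealer": dealer}
```

Because fact tables store only dealer_key, a change to the source system's dealer codes never ripples into the facts.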
Degenerate Dimensions
A degenerate dimension is a dimension which is derived
from the fact table and doesn't have its own dimension
table.
Stored in the fact table
Common examples include invoice numbers or order
numbers
Use: the choice of a degenerate dimension is often based on
the desire to provide a direct reference back to a
transactional system without the overhead of maintaining a
separate dimension table.
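A minimal sketch of a degenerate dimension, assuming a hypothetical sales line-item fact: the order number sits directly in the fact row next to the foreign keys and measures, and there is no order dimension table to join to. Grouping by it still works, and it points straight back to the source order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesLineFact:
    date_key: int        # FK to a date dimension
    product_key: int     # FK to a product dimension
    order_number: str    # degenerate dimension: no table of its own
    quantity: int
    amount: float


facts = [
    SalesLineFact(20240105, 1, "SO-1001", 2, 40.0),
    SalesLineFact(20240105, 2, "SO-1001", 1, 15.0),
    SalesLineFact(20240106, 1, "SO-1002", 3, 60.0),
]


def totals_by_order(rows):
    """Group fact rows by the degenerate order number (no join needed)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.order_number] += row.amount
    return dict(totals)
```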
Conformed Dimensions
A dimension that has exactly the same meaning and
content when referenced from different fact tables.
Example: Cube-1 contains fact F1 and dimensions D1, D2, D3;
Cube-2 contains fact F2 and dimensions D1, D2, D4.
Here D1 and D2 are the conformed dimensions.
E.g., the Time dimension
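What a conformed dimension buys you is the ability to "drill across": aggregate two different fact tables separately and line the results up on a shared attribute. This sketch assumes a hypothetical sales fact and shipments fact that both reference the same date dimension and compares them by month.

```python
from collections import defaultdict

# One shared (conformed) date dimension: same keys and attribute values,
# used by both fact tables.
date_dim = {
    1: {"date": "2024-01-30", "month": "2024-01"},
    2: {"date": "2024-01-31", "month": "2024-01"},
    3: {"date": "2024-02-01", "month": "2024-02"},
}

# Two fact tables from different business processes (hypothetical data).
sales_fact = [(1, 100.0), (2, 50.0), (3, 70.0)]   # (date_key, revenue)
shipments_fact = [(2, 4), (3, 6), (3, 1)]         # (date_key, units_shipped)


def by_month(fact_rows):
    totals = defaultdict(float)
    for date_key, measure in fact_rows:
        totals[date_dim[date_key]["month"]] += measure
    return totals


def drill_across():
    """Aggregate each fact separately, then align the results on the
    conformed attribute (month). This works only because both facts
    share date_dim."""
    sales, shipped = by_month(sales_fact), by_month(shipments_fact)
    months = sorted(set(sales) | set(shipped))
    return [(m, sales.get(m, 0.0), shipped.get(m, 0.0)) for m in months]
```

If each fact table had its own, slightly different month definition, the two result sets could not be aligned reliably.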
Fact table
A fact table consists of the measurements, metrics or facts
of a business process.
Fact tables are often defined by their grain.
Grain
The level of detail represented by a row in the fact table
Must be identified early
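One practical consequence of declaring the grain early is that it can be checked: if the grain is "one row per day, per product, per store", that key combination must be unique. This is a minimal sketch with a hypothetical grain and hypothetical rows.

```python
from collections import Counter

# Declared grain (hypothetical): one row per day, per product, per store.
GRAIN = ("date_key", "product_key", "store_key")


def grain_violations(rows, grain=GRAIN):
    """Return the grain-key combinations that occur more than once.

    If the declared grain is right, every combination is unique and the
    result is empty."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = [
    {"date_key": 1, "product_key": 7, "store_key": 3, "qty": 5},
    {"date_key": 1, "product_key": 8, "store_key": 3, "qty": 2},
    {"date_key": 1, "product_key": 7, "store_key": 3, "qty": 1},  # duplicate grain
]
```

A non-empty result means either the load is wrong or the real grain is finer than declared (for example, one row per transaction rather than per day).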
Facts
Fully additive
Can be summed across any and all dimensions
Stored in fact table
Examples: revenue, quantity, Sales_amount
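A small sketch of what "summed across any and all dimensions" means, using hypothetical sales rows with region, month and product dimensions: rolling sales_amount up by any single dimension gives meaningful subtotals, and every roll-up adds back to the same grand total.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, month, product, sales_amount)
facts = [
    ("West", "Jan", "A", 100), ("West", "Feb", "B", 40),
    ("East", "Jan", "A", 60), ("East", "Feb", "A", 25),
]
DIMENSIONS = {"region": 0, "month": 1, "product": 2}


def rollup(rows, dimension):
    """Sum the additive measure (last field) grouped by one dimension."""
    idx = DIMENSIONS[dimension]
    totals = defaultdict(int)
    for row in rows:
        totals[row[idx]] += row[-1]
    return dict(totals)


grand_total = sum(row[-1] for row in facts)
# Because sales_amount is fully additive, every roll-up is meaningful and
# each one adds back up to the same grand total.
for dim in DIMENSIONS:
    assert sum(rollup(facts, dim).values()) == grand_total
```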
Semi-additive
Semi-additive facts are facts that can be
summed up for some of the dimensions in the
fact table, but not the others.
Example: account balances can be summed across
accounts, but not across time.
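The sketch below illustrates a semi-additive fact with hypothetical daily account-balance snapshots. Summing across accounts for a single day is valid; across the time dimension the sketch takes the latest snapshot (or an average) instead, since adding two days' balances would double-count the same money.

```python
# Hypothetical daily balance snapshots: (day, account, balance)
snapshots = [
    (1, "acct-1", 100), (1, "acct-2", 50),
    (2, "acct-1", 120), (2, "acct-2", 30),
]


def total_balance_on(day):
    """Summing across the account dimension is valid for a given day."""
    return sum(bal for d, _, bal in snapshots if d == day)


def closing_balance(account):
    """Across time, take the latest snapshot instead of summing."""
    rows = [(d, bal) for d, a, bal in snapshots if a == account]
    return max(rows)[1]  # row with the highest day


def average_balance(account):
    """Another valid way to aggregate a balance over time."""
    rows = [bal for _, a, bal in snapshots if a == account]
    return sum(rows) / len(rows)
```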
Non-additive
Non-additive facts are facts that cannot be
summed up for any of the dimensions present in
the fact table.
All ratios are non-additive
Examples: Age, weather
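Why ratios are non-additive, shown with a hypothetical profit-margin example: adding the per-store margins gives a number with no meaning. The usual remedy, sketched here, is to store the additive components (revenue and cost) and compute the ratio from their totals at query time.

```python
# Hypothetical per-store rows: (store, revenue, cost)
rows = [("S1", 100.0, 60.0), ("S2", 300.0, 270.0)]


def margin_pct(revenue, cost):
    return (revenue - cost) / revenue * 100


# Wrong: adding stored ratios (40% + 10% is not a margin of anything).
summed_ratios = sum(margin_pct(r, c) for _, r, c in rows)

# Right: sum the additive components, then compute the ratio from the totals.
total_rev = sum(r for _, r, _ in rows)
total_cost = sum(c for _, _, c in rows)
overall_margin = margin_pct(total_rev, total_cost)
```

Here the summed ratios give about 50 while the true overall margin is 17.5%; even averaging the two ratios (25) would be wrong, because the stores have different revenue weights.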
STAR Schema
The star schema (also called star-join schema or multi-dimensional
schema) is the simplest style of data warehouse schema. The star
schema consists of one or more fact tables referencing any number
of dimension tables.
The main advantages of star schemas are that they:
- Provide highly optimized performance for typical star queries.
- Are widely supported by a large number of business intelligence tools.
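A minimal star schema and a typical star query, sketched with Python's built-in sqlite3 and an in-memory database. The table names, columns and sample rows are hypothetical. The fact table joins directly to each dimension (one hop per dimension), the query filters on dimension attributes, and the facts are aggregated.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT, brand TEXT);
CREATE TABLE fact_sales  (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    sales_amount REAL
);
"""

# Typical star query: fact table in the middle, one join per dimension.
STAR_QUERY = """
SELECT d.month, p.category, SUM(f.sales_amount) AS revenue
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = 2024
GROUP BY d.month, p.category
ORDER BY d.month, p.category
"""


def load_sample(conn):
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2024, 1), (2, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(10, "Bikes", "Acme"), (11, "Helmets", "Acme")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 10, 2, 400.0), (1, 11, 5, 150.0), (2, 10, 1, 200.0)])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_sample(conn)
result = conn.execute(STAR_QUERY).fetchall()
```

Because each dimension is a single denormalized table, any dimension attribute (brand, category, year) is at most one join away from the facts.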
Snowflake Schema
The snowflake schema is similar to the star schema. However, in the
snowflake schema, dimensions are normalized into multiple related
tables, whereas the star schema's dimensions are denormalized with
each dimension represented by a single table.
Advantages of Using the Snowflake Schema:
- Easier to maintain.
- Increases flexibility.
Disadvantages of Using the Snowflake Schema:
- Increases the number of tables an end user must work with.
- Makes queries much more difficult to create because more tables
need to be joined.
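A minimal snowflake sketch, again with sqlite3 and hypothetical tables: the category level is normalized out of the product dimension into its own table (product -> category). A revenue-by-category query now needs two joins, fact -> product and product -> category, where a star schema would need one.

```python
import sqlite3

# Snowflaked product dimension: category is split into its own table.
SCHEMA = """
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, model TEXT,
                           category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales   (product_key INTEGER REFERENCES dim_product(product_key),
                           sales_amount REAL);
"""

# The extra hop: fact -> product, then product -> category.
SNOWFLAKE_QUERY = """
SELECT c.category, SUM(f.sales_amount) AS revenue
FROM fact_sales f
JOIN dim_product p  ON f.product_key = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category
ORDER BY c.category
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                 [(1, "Bikes"), (2, "Helmets")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(10, "Road-100", 1), (11, "Road-200", 1), (12, "Lid-1", 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(10, 400.0), (11, 300.0), (12, 150.0)])
result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

Renaming a category touches one row in dim_category, which is the maintenance advantage; the price is the extra join in every category-level query.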
Step One
Step Two
Step Three: Identify dimensions
Step Four: Select facts
Step Five: Identify dimensional attributes