Data Modeling, Star Schema, Snowflake Schema
1. Data Modeling
Definition:
Data modeling is the process of creating a visual representation of data and its relationships to
facilitate database design and ensure data integrity, performance, and usability. It helps
structure data logically and physically for storage and analysis.
o Logical Data Model: An intermediate model that defines the structure of the data,
including entity relationships, attributes, and data types, independently of any specific DBMS.
2. Star Schema
Definition:
The star schema is a data warehouse schema design that consists of a central fact table
connected to multiple dimension tables. It is optimized for analytical queries and
decision-making processes.
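The layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table names (fact_sales, dim_product, dim_customer), the columns, and the sample rows are assumptions made for the example, not part of the original notes.

```python
import sqlite3

# Hypothetical star schema: one central fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    revenue     REAL
);
INSERT INTO dim_product  VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_customer VALUES (101, 'John Smith', 'North'), (102, 'Ana Lima', 'South');
INSERT INTO fact_sales   VALUES (1, 1, 101, 1200.0), (2, 2, 101, 300.0), (3, 1, 102, 1100.0);
""")

# Analytical query: every dimension is exactly one join away from the fact table.
revenue_by_region = conn.execute("""
    SELECT c.region, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(revenue_by_region)
```

Because the region is stored directly in the customer dimension, the query needs a single join per dimension, which is the main reason this layout suits aggregate reporting.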
Components:
1. Fact Table:
2. Dimension Tables:
Example Schema:
Revenue
Advantages:
Disadvantages:
3. Snowflake Schema
Definition:
The snowflake schema is a normalized version of the star schema. Dimension tables are split
into additional tables to reduce redundancy and storage requirements.
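As a rough sketch of that normalization, the example below moves the region out of the customer dimension into its own table. The table and column names (dim_region, dim_customer, fact_sales) and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical snowflake schema: the customer dimension no longer stores the
# region name; it points to a separate, normalized region table instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT,
    region_id     INTEGER REFERENCES dim_region(region_id)
);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    revenue     REAL
);
INSERT INTO dim_region   VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_customer VALUES (101, 'John Smith', 1), (102, 'Ana Lima', 2), (103, 'Wei Chen', 1);
INSERT INTO fact_sales   VALUES (1, 101, 1200.0), (2, 102, 1100.0), (3, 103, 250.0);
""")

# The same kind of report now needs two joins: fact -> customer -> region.
revenue_by_region = conn.execute("""
    SELECT r.region_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    JOIN dim_region   AS r ON r.region_id = c.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(revenue_by_region)
```

Each region name is stored once rather than once per customer, at the cost of an extra join in every query that filters or groups by region.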
Components:
Example Schema:
CustomerID (FK)
Revenue
Advantages:
Disadvantages:
4. Types of Facts
Categories of Facts:
1. Additive Facts:
2. Semi-Additive Facts:
3. Non-Additive Facts:
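The three categories can be illustrated with a small sketch. It assumes a hypothetical daily account snapshot in which deposits is an additive measure (it can be summed across every dimension), balance is semi-additive (it can be summed across accounts but not across days), and a deposit-to-balance ratio is non-additive (it has to be recomputed from its parts). All names and numbers are invented for the example.

```python
from collections import defaultdict

# Hypothetical snapshot rows: (day, account, deposits, balance)
snapshots = [
    ("2023-01-01", "A", 100.0, 500.0),
    ("2023-01-01", "B", 50.0, 300.0),
    ("2023-01-02", "A", 20.0, 520.0),
    ("2023-01-02", "B", 0.0, 300.0),
]

# Additive: deposits can be summed across accounts and across days.
total_deposits = sum(deposits for _, _, deposits, _ in snapshots)

# Semi-additive: balances can be summed across accounts within one day,
# but adding the daily totals together would double count money.
balance_by_day = defaultdict(float)
for day, _, _, balance in snapshots:
    balance_by_day[day] += balance

# Non-additive: a ratio is recomputed from additive parts, never summed.
day1 = [row for row in snapshots if row[0] == "2023-01-01"]
rate_day1 = sum(r[2] for r in day1) / sum(r[3] for r in day1)
naive_rate_sum = sum(r[2] / r[3] for r in day1)  # not a meaningful figure

print(total_deposits, dict(balance_by_day), rate_day1, naive_rate_sum)
```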
5. Dimensions
Definition:
Dimensions provide the descriptive context for facts, enabling users to analyze and filter data
from various perspectives.
Characteristics:
Examples of Dimensions:
1. Time Dimension:
3. Customer Dimension:
Types of Dimensions:
1. Conformed Dimensions:
2. Junk Dimensions:
3. Degenerate Dimensions:
4. Role-Playing Dimensions:
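Of the types listed above, the role-playing dimension is the easiest to show in a few lines: a single dimension table is referenced more than once by the same fact table, each time in a different role. The sketch below assumes a hypothetical dim_date table that serves as both order date and ship date; all names and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT);
CREATE TABLE fact_orders (
    order_id      INTEGER PRIMARY KEY,
    order_date_id INTEGER REFERENCES dim_date(date_id),  -- role 1: order date
    ship_date_id  INTEGER REFERENCES dim_date(date_id),  -- role 2: ship date
    revenue       REAL
);
INSERT INTO dim_date VALUES (20230130, '2023-01-30', 'January'),
                            (20230202, '2023-02-02', 'February');
INSERT INTO fact_orders VALUES (1, 20230130, 20230202, 75.0);
""")

# The same physical table is joined twice under two aliases, one per role.
order_and_ship_month = conn.execute("""
    SELECT od.month_name, sd.month_name
    FROM fact_orders AS f
    JOIN dim_date AS od ON od.date_id = f.order_date_id
    JOIN dim_date AS sd ON sd.date_id = f.ship_date_id
""").fetchone()
print(order_and_ship_month)
```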
Types of SCDs (Slowly Changing Dimensions)
SCD Type 0:
• Definition: The dimension data is static and does not change over time.
• Use Case: For attributes like Product Launch Date or Social Security Number that must
remain constant.
SCD Type 1:
• Definition: When a change occurs, the old data is overwritten with the new data, and no
history is maintained.
• Characteristics:
Example:
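A minimal sketch of the overwrite behavior, assuming a hypothetical dim_customer table and a helper named apply_type1_change (both invented for this illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (101, 'John Smith', 'North')")

def apply_type1_change(conn, customer_id, new_region):
    """Overwrite the attribute in place; the previous value is not kept."""
    conn.execute(
        "UPDATE dim_customer SET region = ? WHERE customer_id = ?",
        (new_region, customer_id),
    )

apply_type1_change(conn, 101, "South")
rows = conn.execute("SELECT * FROM dim_customer").fetchall()
print(rows)  # one row only; 'North' is no longer recoverable
```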
SCD Type 2:
• Definition: Maintains full history by creating a new record for each change in dimension
data.
• Characteristics:
Implementation Options:
1. Row Versioning: Add a Version column to identify different versions of the same
dimension.
2. Date Range: Add StartDate and EndDate columns to define the validity period.
Example:
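A minimal sketch of the date-range option, assuming a hypothetical dim_customer table with a surrogate key and a helper named apply_type2_change. The names, and the convention that the old row ends the day before the change, are assumptions made for the example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key: one per version
    customer_id  INTEGER,                            -- business key: shared by all versions
    name         TEXT,
    region       TEXT,
    start_date   TEXT,
    end_date     TEXT                                -- NULL marks the current version
)""")
conn.execute(
    "INSERT INTO dim_customer (customer_id, name, region, start_date, end_date) "
    "VALUES (101, 'John Smith', 'North', '2023-01-01', NULL)"
)

def apply_type2_change(conn, customer_id, new_region, change_date):
    """Close the current version and insert a new one starting on change_date."""
    current = conn.execute(
        "SELECT customer_key, name FROM dim_customer "
        "WHERE customer_id = ? AND end_date IS NULL",
        (customer_id,),
    ).fetchone()
    if current is None:
        raise LookupError(f"no current row for customer {customer_id}")
    customer_key, name = current
    last_valid_day = (date.fromisoformat(change_date) - timedelta(days=1)).isoformat()
    conn.execute(
        "UPDATE dim_customer SET end_date = ? WHERE customer_key = ?",
        (last_valid_day, customer_key),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, name, region, start_date, end_date) "
        "VALUES (?, ?, ?, ?, NULL)",
        (customer_id, name, new_region, change_date),
    )

apply_type2_change(conn, 101, "South", "2023-07-01")
history = conn.execute(
    "SELECT customer_id, region, start_date, end_date "
    "FROM dim_customer ORDER BY start_date"
).fetchall()
print(history)
```

Fact rows would normally reference customer_key rather than customer_id, so each fact stays attached to the version of the customer that was valid when it was recorded.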
SCD Type 3:
• Characteristics:
o Useful when only a small history is needed (e.g., the last two changes).
Example:
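A minimal sketch, assuming the limited history is kept in a previous_region column next to current_region. The table, the column names, and the helper apply_type3_change are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_id     INTEGER PRIMARY KEY,
    name            TEXT,
    current_region  TEXT,
    previous_region TEXT
)""")
conn.execute("INSERT INTO dim_customer VALUES (101, 'John Smith', 'North', NULL)")

def apply_type3_change(conn, customer_id, new_region):
    """Shift the current value into the 'previous' column, then store the new one."""
    # Both SET expressions read the row as it was before the update.
    conn.execute(
        "UPDATE dim_customer "
        "SET previous_region = current_region, current_region = ? "
        "WHERE customer_id = ?",
        (new_region, customer_id),
    )

apply_type3_change(conn, 101, "South")
after_first = conn.execute("SELECT * FROM dim_customer").fetchone()

apply_type3_change(conn, 101, "East")
after_second = conn.execute("SELECT * FROM dim_customer").fetchone()
print(after_first, after_second)
```

After the second change the original value ('North') is gone, which shows why this approach only fits cases where a short history is enough.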
SCD Type 4:
• Definition: Maintains history in a separate table, while the main dimension table holds
only the current data.
• Characteristics:
Example:
History Table:
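A minimal sketch of the two-table arrangement, assuming hypothetical tables dim_customer (current data only) and dim_customer_history, and a helper named apply_type4_change. Treating valid_to as the exclusive change date is a simplifying assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT, valid_from TEXT
);
CREATE TABLE dim_customer_history (
    customer_id INTEGER, name TEXT, region TEXT, valid_from TEXT, valid_to TEXT
);
INSERT INTO dim_customer VALUES (101, 'John Smith', 'North', '2023-01-01');
""")

def apply_type4_change(conn, customer_id, new_region, change_date):
    """Archive the outgoing version, then overwrite the main table."""
    # 1. Copy the current row into the history table, closing it at change_date.
    conn.execute(
        "INSERT INTO dim_customer_history "
        "SELECT customer_id, name, region, valid_from, ? "
        "FROM dim_customer WHERE customer_id = ?",
        (change_date, customer_id),
    )
    # 2. The main table keeps only the latest values.
    conn.execute(
        "UPDATE dim_customer SET region = ?, valid_from = ? WHERE customer_id = ?",
        (new_region, change_date, customer_id),
    )

apply_type4_change(conn, 101, "South", "2023-07-01")
current_rows = conn.execute("SELECT * FROM dim_customer").fetchall()
history_rows = conn.execute("SELECT * FROM dim_customer_history").fetchall()
print(current_rows, history_rows)
```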
SCD Type 6:
• Definition: Combines elements of SCD Types 1, 2, and 3 to track both historical and
current data while maintaining versioning.
• Characteristics:
Example:
101 | John Smith | South | North | 2 | 2023-07-01 | NULL
101 | John Smith | North | NULL | 1 | 2023-01-01 | 2023-06-30
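The two example rows can be reproduced with a short sketch. The column names (current_region, previous_region, version, start_date, end_date) and the helper apply_type6_change are assumptions inferred from the row values, not taken from the original table headers. The sketch follows the example rows as given: the older row keeps its own region value. Some Type 6 variants additionally overwrite the current value on historical rows.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_id     INTEGER,
    name            TEXT,
    current_region  TEXT,
    previous_region TEXT,     -- Type 3 element: prior value kept alongside
    version         INTEGER,  -- Type 2 element: one row per version
    start_date      TEXT,
    end_date        TEXT      -- NULL marks the current version
)""")
conn.execute(
    "INSERT INTO dim_customer VALUES "
    "(101, 'John Smith', 'North', NULL, 1, '2023-01-01', NULL)"
)

def apply_type6_change(conn, customer_id, new_region, change_date):
    """Close the current version and add a new row that carries the old value."""
    name, old_region, version = conn.execute(
        "SELECT name, current_region, version FROM dim_customer "
        "WHERE customer_id = ? AND end_date IS NULL",
        (customer_id,),
    ).fetchone()
    last_valid_day = (date.fromisoformat(change_date) - timedelta(days=1)).isoformat()
    conn.execute(
        "UPDATE dim_customer SET end_date = ? "
        "WHERE customer_id = ? AND end_date IS NULL",
        (last_valid_day, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer VALUES (?, ?, ?, ?, ?, ?, NULL)",
        (customer_id, name, new_region, old_region, version + 1, change_date),
    )

apply_type6_change(conn, 101, "South", "2023-07-01")
rows = conn.execute("SELECT * FROM dim_customer ORDER BY version DESC").fetchall()
print(rows)
```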
Type 2: When complete historical tracking is critical for analysis and reporting.
Type 4: When maintaining a clean, smaller main table while preserving history is needed.