Unit 2.4 Star and Snowflake Schemas, ETL Process
SNOWFLAKE SCHEMA

SNOWFLAKE VS STAR SCHEMA
• The star schema is the most basic kind of data warehouse schema. It is called a star schema because of its star-like structure.
• The snowflake schema extends the star schema by normalizing its dimension tables into additional sub-dimension tables. It is called a snowflake schema because its diagram resembles a snowflake.
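To make the structural difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_customer, dim_product, dim_product_type) are hypothetical. In the star version the product dimension keeps its type as a plain column; in the snowflake version that column is normalized out into a separate sub-dimension table.

```python
import sqlite3

# Star schema: the central fact table points directly at each
# denormalized dimension table (hypothetical names).
STAR_DDL = """
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    net_amount  REAL
);
"""

# Snowflake schema: the product dimension is normalized, so the product
# type moves into its own sub-dimension table referenced by dim_product.
SNOWFLAKE_DDL = """
CREATE TABLE dim_customer     (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product_type (type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    type_id    INTEGER REFERENCES dim_product_type(type_id)
);
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    net_amount  REAL
);
"""


def table_names(ddl: str) -> list[str]:
    """Create the schema in an in-memory database and list its tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(table_names(STAR_DDL))       # ['dim_customer', 'dim_product', 'fact_sales']
print(table_names(SNOWFLAKE_DDL))  # adds 'dim_product_type'
```

The only difference is the extra sub-dimension table; the fact table itself is identical in both designs.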
STAR VS SNOWFLAKE SCHEMA
• In a star schema, a single join connects the fact table to any dimension table.
• In a star schema, the fact table sits at the center and the dimension tables surround it.
• In a snowflake schema, the dimension tables that surround the fact table are in turn surrounded by further dimension tables.
A star schema may be the best choice if you're looking for a quick and easy cloud data warehousing solution. A snowflake schema, however, can be a better option if you require more adaptability to suit shifting data requirements.
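The practical cost of the extra normalization shows up at query time. The sketch below, again using hypothetical tables in an in-memory SQLite database, computes total net amount per product type: the star schema reaches the type with one join from the fact table, while the snowflake schema needs a second join through the sub-dimension table. Both return the same answer on the same sample data.

```python
import sqlite3


def build(ddl: str, inserts: list[tuple[str, list[tuple]]]) -> sqlite3.Connection:
    """Create a schema in memory and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    for sql, rows in inserts:
        conn.executemany(sql, rows)
    return conn


STAR_DDL = """
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, net_amount REAL);
"""
SNOWFLAKE_DDL = """
CREATE TABLE dim_product_type (type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE dim_product      (product_id INTEGER PRIMARY KEY, type_id INTEGER);
CREATE TABLE fact_sales       (product_id INTEGER, net_amount REAL);
"""

FACTS = [(1, 100.0), (2, 50.0), (1, 25.0)]  # hypothetical sample facts

star = build(STAR_DDL, [
    ("INSERT INTO dim_product VALUES (?, ?)", [(1, "book"), (2, "pen")]),
    ("INSERT INTO fact_sales VALUES (?, ?)", FACTS),
])
snow = build(SNOWFLAKE_DDL, [
    ("INSERT INTO dim_product_type VALUES (?, ?)", [(10, "book"), (20, "pen")]),
    ("INSERT INTO dim_product VALUES (?, ?)", [(1, 10), (2, 20)]),
    ("INSERT INTO fact_sales VALUES (?, ?)", FACTS),
])

# Star: one join from the fact table to the denormalized dimension.
STAR_Q = """
SELECT p.type, SUM(f.net_amount)
FROM fact_sales f JOIN dim_product p ON f.product_id = p.product_id
GROUP BY p.type ORDER BY p.type
"""
# Snowflake: a second join is needed to reach the sub-dimension.
SNOW_Q = """
SELECT t.type_name, SUM(f.net_amount)
FROM fact_sales f
JOIN dim_product p      ON f.product_id = p.product_id
JOIN dim_product_type t ON p.type_id = t.type_id
GROUP BY t.type_name ORDER BY t.type_name
"""

star_result = star.execute(STAR_Q).fetchall()
snow_result = snow.execute(SNOW_Q).fetchall()
print(star_result)  # [('book', 125.0), ('pen', 50.0)]
print(snow_result)  # same result, one extra join
```

The snowflake design avoids repeating the type name on every product row, at the price of one more join per query that needs it.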
STAR SCHEMA Example
Consider an order management operational database that tracks order numbers, dates, requested ship dates, customers and their shipping and billing addresses, products with their quantities and gross dollar amounts, the sales representatives who take and process orders, and the deals (promotions) and discounts offered to customers.
You have to design a data warehouse that will be updated from the above operational database and that should support decision making by helping to answer analytical questions about net order dollar amounts per customer, product, and promotion (deal), as well as the performance of sales representatives (agents). Analysis of requested ship dates is also important. The design should also allow order amounts to be analyzed in various currencies: dollars,
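One common way to support analysis in several currencies, offered here only as an assumption since the steps below do not include it, is to store amounts in a single base currency (dollars) and convert at query time using a currency table. The table names (dim_currency, fact_sales), the function net_amount_per_customer, and the conversion rates are hypothetical placeholders, not real exchange rates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_currency (currency_code TEXT PRIMARY KEY, units_per_usd REAL);
CREATE TABLE fact_sales   (customer_id INTEGER, net_amount_usd REAL);
""")
# Placeholder rates for illustration only; NOT real exchange rates.
conn.executemany("INSERT INTO dim_currency VALUES (?, ?)",
                 [("USD", 1.0), ("EUR", 0.5)])
# Hypothetical facts, all stored in the base currency (dollars).
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(1, 100.0), (1, 20.0), (2, 40.0)])


def net_amount_per_customer(currency: str) -> list[tuple[int, float]]:
    """Sum each customer's dollar amounts converted into `currency`."""
    return conn.execute(
        """
        SELECT f.customer_id, ROUND(SUM(f.net_amount_usd * c.units_per_usd), 2)
        FROM fact_sales f CROSS JOIN dim_currency c
        WHERE c.currency_code = ?
        GROUP BY f.customer_id
        ORDER BY f.customer_id
        """,
        (currency,),
    ).fetchall()


print(net_amount_per_customer("USD"))  # [(1, 120.0), (2, 40.0)]
print(net_amount_per_customer("EUR"))  # [(1, 60.0), (2, 20.0)]
```

An unknown currency code matches no row in the currency table, so the query simply returns an empty list.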
Step 1:
Facts or measures are:
• Net_amount_per_customer
• Net_amount_per_product
• Net_amount_per_promotion

Step 2: Choose the dimensions for the fact table.
Dimensions are:
• Sales Representative
• Time
• Customer
• Product
• Order

Step 3: Choose the attributes of dimension tables.
Attributes of Sales Representative Dimension:
• Sales_rep_id (primary key)
• Name
• Deal
• Discount
Attributes of Time Dimension:
• Time_id (primary key)
• day
• month
• Year
Attributes of Customer Dimension:
• Customer_id (primary key)
• name
Attributes of Product Dimension:
• Product_id (primary key and surrogate key)
• quality
• price
• product_number
• requested_ship_date
• type
Attributes of Order Dimension:
• Order_id (primary key and surrogate key)
• order_number
• date
• amount
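Steps 1 to 3 can be turned into a concrete schema. The sketch below is one possible reading of them in SQLite: it assumes the three Step 1 measures are aggregates of a single net_amount column stored in the fact table and computed with GROUP BY, rather than stored as separate columns. Table names (fact_sales, dim_*) and all sample rows are hypothetical; the dimension attributes follow Step 3.

```python
import sqlite3

# Hypothetical table names; attribute lists follow Step 3.
DDL = """
CREATE TABLE dim_sales_rep (sales_rep_id INTEGER PRIMARY KEY,
                            name TEXT, deal TEXT, discount REAL);
CREATE TABLE dim_time      (time_id INTEGER PRIMARY KEY,
                            day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE dim_customer  (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product   (product_id INTEGER PRIMARY KEY,  -- surrogate key
                            quality TEXT, price REAL, product_number TEXT,
                            requested_ship_date TEXT, type TEXT);
CREATE TABLE dim_order     (order_id INTEGER PRIMARY KEY,    -- surrogate key
                            order_number TEXT, "date" TEXT, amount REAL);
-- Central fact table: one foreign key per dimension plus the measure.
CREATE TABLE fact_sales (
    sales_rep_id INTEGER REFERENCES dim_sales_rep(sales_rep_id),
    time_id      INTEGER REFERENCES dim_time(time_id),
    customer_id  INTEGER REFERENCES dim_customer(customer_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    order_id     INTEGER REFERENCES dim_order(order_id),
    net_amount   REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Hypothetical sample rows for illustration only.
conn.execute("INSERT INTO dim_sales_rep VALUES (1, 'Lee', 'Spring deal', 0.1)")
conn.execute("INSERT INTO dim_time VALUES (1, 15, 1, 2024)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, "A", 10.0, "P-100", "2024-01-20", "book"),
                  (2, "B", 2.5, "P-200", "2024-01-22", "pen")])
conn.executemany("INSERT INTO dim_order VALUES (?, ?, ?, ?)",
                 [(1, "O-1", "2024-01-15", 100.0),
                  (2, "O-2", "2024-01-15", 10.0),
                  (3, "O-3", "2024-01-15", 50.0)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 1, 90.0),   # Acme buys P-100
                  (1, 1, 1, 2, 2, 10.0),   # Acme buys P-200
                  (1, 1, 2, 1, 3, 45.0)])  # Globex buys P-100

# Net_amount_per_customer: aggregate the fact table by the customer dimension.
net_per_customer = conn.execute("""
    SELECT c.name, SUM(f.net_amount)
    FROM fact_sales f JOIN dim_customer c ON f.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name ORDER BY c.name
""").fetchall()

# Net_amount_per_product: the same fact table, grouped by the product dimension.
net_per_product = conn.execute("""
    SELECT p.product_number, SUM(f.net_amount)
    FROM fact_sales f JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.product_id, p.product_number ORDER BY p.product_number
""").fetchall()

print(net_per_customer)  # [('Acme', 100.0), ('Globex', 45.0)]
print(net_per_product)   # [('P-100', 135.0), ('P-200', 10.0)]
```

Net amount per promotion would be grouped the same way by the deal attribute, which Step 3 places in the Sales Representative dimension. Storing one measure at the finest grain and aggregating on demand keeps the fact table small and lets new groupings be added without schema changes.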
ETL
Extract, transform, and load (ETL) is the process of combining data from multiple sources into a large, central repository called a data warehouse.
ETL uses a set of business rules to clean and organize raw data and
prepare it for storage, data analytics, and machine learning (ML).
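The three stages can be illustrated with a small pipeline. This is a minimal sketch, not a production ETL tool: the raw records, the functions extract/transform/load, the business rules (trim and title-case customer names, compute net amount as gross minus discount, skip records whose amounts are not numeric), and the fact_orders table are all hypothetical.

```python
import sqlite3

# Hypothetical raw order records as they might arrive from an operational
# source: inconsistent casing, stray spaces, and one unparseable amount.
RAW_ORDERS = [
    {"order_number": "O-1", "customer": "  acme ", "gross": "100.00", "discount": "10"},
    {"order_number": "O-2", "customer": "GLOBEX", "gross": "50", "discount": "0"},
    {"order_number": "O-3", "customer": "acme", "gross": "n/a", "discount": "0"},
]


def extract() -> list[dict]:
    """Extract: read raw records from the (hypothetical) source."""
    return list(RAW_ORDERS)


def transform(rows: list[dict]) -> list[tuple[str, str, float]]:
    """Transform: apply example business rules to clean and organize rows."""
    clean = []
    for row in rows:
        try:
            gross = float(row["gross"])
            discount = float(row["discount"])
        except ValueError:
            continue  # rule: skip records with non-numeric amounts
        name = row["customer"].strip().title()  # rule: normalize names
        clean.append((row["order_number"], name, gross - discount))
    return clean


def load(rows: list[tuple[str, str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                 "(order_number TEXT PRIMARY KEY, customer TEXT, net_amount REAL)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
loaded = conn.execute("SELECT * FROM fact_orders ORDER BY order_number").fetchall()
print(loaded)  # [('O-1', 'Acme', 90.0), ('O-2', 'Globex', 50.0)]
```

Keeping each stage in its own function means the business rules live only in transform, so they can be changed without touching the source reader or the warehouse loader.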
You can address specific business intelligence needs through data