

Q1) Explain the star schema, snowflake schema and fact constellation schema with examples

Ans: Star Schema: The star schema architecture is easy to design. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star is the fact table, and the points of the star are the dimension tables.
The fact table in a star schema is in third normal form, whereas the dimension tables are de-normalized.

In the following Star Schema example, the fact table is at the center and contains keys to every dimension table, such as Dealer_ID, Model_ID, Date_ID, Product_ID and Branch_ID, along with measures such as units sold and revenue.
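As an illustration only, the following minimal sketch shows how such a star schema could be declared in SQLite from Python; the table and column names (dim_dealer, fact_sales, units_sold, and so on) are hypothetical and simply mirror the example above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# De-normalized dimension tables: all descriptive attributes live in a single table each.
cur.execute("CREATE TABLE dim_dealer  (dealer_id  INTEGER PRIMARY KEY, dealer_name TEXT, city TEXT)")
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, model TEXT, category TEXT)")
cur.execute("CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER)")
cur.execute("CREATE TABLE dim_branch  (branch_id  INTEGER PRIMARY KEY, branch_name TEXT, region TEXT)")

# Fact table at the center of the star: a foreign key to every dimension plus the numeric measures.
cur.execute("""
CREATE TABLE fact_sales (
    dealer_id  INTEGER REFERENCES dim_dealer(dealer_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    branch_id  INTEGER REFERENCES dim_branch(branch_id),
    units_sold INTEGER,
    revenue    REAL
)""")
con.commit()
```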

Snowflake Schema: The snowflake schema is an extension of the star schema. In a snowflake schema, each dimension is normalized and connected to further dimension tables.
In the following Snowflake Schema example, Country is further normalized into an
individual table.
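As a purely illustrative sketch of that normalization (continuing the same hypothetical naming as above), the country attribute is moved out of the dimension table into its own table and referenced by a key:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# In a star schema, country_name would simply be a repeated text column on the dimension.
# In a snowflake schema, it is normalized out into its own table:
cur.execute("CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, country_name TEXT)")
cur.execute("""
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    city        TEXT,
    country_id  INTEGER REFERENCES dim_country(country_id)  -- link to the further-normalized table
)""")
```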

Fact Constellation Schema: Fact constellation is a schema for representing a multidimensional model. It is a collection of multiple fact tables that share some common dimension tables. It can be viewed as a collection of several star schemas and is therefore also known as a galaxy schema. It is one of the widely used schemas for data warehouse design and is much more complex than the star and snowflake schemas. For complex systems, we require fact constellations.
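A rough sketch of the idea, again with hypothetical names: two fact tables (sales and shipping) share the same Date and Product dimension tables, so the diagram is a collection of stars:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Shared (conformed) dimension tables used by more than one fact table.
cur.execute("CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER)")
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, model TEXT, category TEXT)")

# First star: sales facts.
cur.execute("""
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER,
    revenue    REAL
)""")

# Second star: shipping facts, sharing the same Date and Product dimensions.
cur.execute("""
CREATE TABLE fact_shipping (
    date_id       INTEGER REFERENCES dim_date(date_id),
    product_id    INTEGER REFERENCES dim_product(product_id),
    units_shipped INTEGER,
    shipping_cost REAL
)""")
```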
Q2) Discuss the models ROLAP, MOLAP and HOLAP. (6)

Ans: ROLAP: ROLAP stands for Relational Online Analytical Processing. ROLAP stores
data in columns and rows (also known as relational tables) and retrieves the information on
demand through user-submitted queries. A ROLAP database can be accessed through
complex SQL queries to calculate information. ROLAP can handle large data volumes, but
the larger the data, the slower the processing times. 
Because queries are made on-demand, ROLAP does not require the storage and pre-
computation of information. However, the disadvantages of ROLAP implementations are the
potential performance constraints and scalability limitations that result from large and
inefficient join operations between large tables. Examples of popular ROLAP products
include Metacube by Stanford Technology Group, Red Brick Warehouse by Red Brick
Systems, and AXSYS Suite by Information Advantage.
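As a rough illustration of the ROLAP pattern (a toy sketch, not any particular product), the data stays in ordinary relational tables and every aggregate is computed on demand by an SQL query with joins; nothing is pre-summarized:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, units_sold INTEGER, revenue REAL);
INSERT INTO dim_product VALUES (1, 'Sedan'), (2, 'SUV');
INSERT INTO fact_sales  VALUES (1, 10, 200000.0), (2, 5, 175000.0), (1, 3, 60000.0);
""")

# ROLAP style: the aggregate is computed at query time by joining the relational
# tables; nothing is pre-computed or stored ahead of the user's query.
query = """
SELECT p.category, SUM(f.units_sold) AS units, SUM(f.revenue) AS revenue
FROM fact_sales AS f JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY p.category
"""
for row in con.execute(query):
    print(row)
```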

MOLAP: MOLAP stands for Multidimensional Online Analytical Processing. MOLAP uses a multidimensional cube in which stored data is accessed through various combinations of dimensions. Data is pre-
computed, pre-summarized, and stored (a difference from ROLAP, where queries are served
on-demand).
A multicube approach has proved successful in MOLAP products. In this approach, a series
of dense, small, precalculated cubes makes up a hypercube. Tools that incorporate MOLAP
include Oracle Essbase, IBM Cognos, and Apache Kylin.
Its simple interface makes MOLAP easy to use, even for inexperienced users.  Its speedy data
retrieval makes it the best for “slicing and dicing” operations. One major disadvantage of
MOLAP is that it is less scalable than ROLAP, as it can handle a limited amount of data.
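By contrast, a toy MOLAP-style sketch (purely illustrative, made-up data) pre-computes and stores every aggregate cell of a small cube up front, so a later "slice and dice" request is just a lookup:

```python
from collections import defaultdict

# Detail rows: (product, region, month, units_sold) -- made-up example data.
rows = [
    ("Sedan", "East", "Jan", 10),
    ("SUV",   "East", "Jan", 5),
    ("Sedan", "West", "Feb", 3),
]

# MOLAP style: pre-compute and store the aggregates ahead of any query.
cube = defaultdict(int)
for product, region, month, units in rows:
    cube[(product, region, month)] += units   # base cell
    cube[(product, region, "ALL")] += units   # roll up over month
    cube[(product, "ALL", "ALL")] += units    # roll up over region and month

# A later request is answered by a fast lookup in the stored cube, not a scan of the rows.
print(cube[("Sedan", "ALL", "ALL")])   # 13
```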

HOLAP: HOLAP stands for Hybrid Online Analytical Processing. As the name suggests,
the HOLAP storage mode combines attributes of both MOLAP and ROLAP. Since HOLAP involves storing part of the data in a ROLAP store and another part in a MOLAP store, developers get the benefits of both.
With this use of the two OLAPs, the data is stored in both multidimensional databases and
relational databases. The decision to access one of the databases depends on which is most
appropriate for the requested processing application or type. This setup allows much more
flexibility in handling data. For fast, summary-level processing, the data is stored in the multidimensional database; for heavy, detailed processing, the data is stored in the relational database.
Microsoft Analysis Services and SAP AG's BI Accelerator are products that support HOLAP.
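A hypothetical sketch of the HOLAP idea (toy data and names): summary requests are answered from a pre-computed, MOLAP-style store, while detailed requests fall back to the relational, ROLAP-style tables:

```python
import sqlite3

# Relational (ROLAP-style) store holds the detailed rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (product TEXT, region TEXT, units_sold INTEGER);
INSERT INTO fact_sales VALUES ('Sedan', 'East', 10), ('SUV', 'East', 5), ('Sedan', 'West', 3);
""")

# Multidimensional (MOLAP-style) store holds pre-computed summaries.
summary_cube = {("Sedan",): 13, ("SUV",): 5}

def total_units(product, detail=False):
    """Route the request to whichever store suits it best (the HOLAP idea)."""
    if not detail:
        # Summary query: answered from the pre-computed cube.
        return summary_cube[(product,)]
    # Detailed or heavy query: answered from the relational tables on demand.
    return con.execute(
        "SELECT region, units_sold FROM fact_sales WHERE product = ?", (product,)
    ).fetchall()

print(total_units("Sedan"))               # 13, served from the cube
print(total_units("Sedan", detail=True))  # per-region rows, served by SQL
```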

Q3) What is knowledge discovery in databases? How is it related to data mining? (2+3=5)

Ans: Knowledge discovery in databases: Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used data mining technique is a process that includes data preparation and selection, data cleansing, incorporating prior knowledge about the data sets, and interpreting accurate solutions from the observed results.
Major KDD application areas include marketing, fraud detection, telecommunications, and
manufacturing.
The main objective of the KDD process is to extract information from data in the context of
large databases. It does this by using Data Mining algorithms to identify what is deemed
knowledge.
Knowledge discovery in databases can be viewed as the automated, exploratory analysis and modeling of vast data repositories. KDD is the organized procedure of identifying valid, useful, and understandable patterns in huge and complex data sets. Data mining is the core of the KDD process; it involves applying algorithms that explore the data, build the model, and discover previously unknown patterns. The model is then used to extract knowledge from the data, analyze it, and make predictions.
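A highly simplified sketch of those KDD steps on made-up data: selection and cleansing first, then a (deliberately trivial) data-mining step that derives a candidate pattern for the analyst to interpret:

```python
# Raw collection of data (one record is incomplete).
raw = [
    {"age": 25, "bought": True},
    {"age": None, "bought": False},   # missing value to be cleaned out
    {"age": 42, "bought": True},
    {"age": 19, "bought": False},
]

# Selection and cleansing: keep only complete records of interest.
clean = [r for r in raw if r["age"] is not None]

# Data mining step (trivial here): derive a candidate pattern from the prepared data.
buyer_ages = [r["age"] for r in clean if r["bought"]]
pattern = f"buyers' average age is {sum(buyer_ages) / len(buyer_ages):.1f}"

# Interpretation/evaluation: the analyst decides whether the pattern counts as useful knowledge.
print(pattern)
```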

Q4) a) Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28
Standardize the variable as follows:
1. Compute the mean absolute deviation of age.
2. Compute the Z-score for the first four measurements. (6)
b) Explain K-Medoids. (2)
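For part (a), a minimal Python sketch of the calculation, assuming standardization by z = (x - mean) / MAD, where MAD is the mean absolute deviation:

```python
ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]

mean = sum(ages) / len(ages)                        # 33.0
mad = sum(abs(x - mean) for x in ages) / len(ages)  # mean absolute deviation = 8.8

# Z-scores for the first four measurements: z = (x - mean) / MAD
for x in ages[:4]:
    print(x, round((x - mean) / mad, 2))   # -1.7, -1.25, -0.91, 1.02
```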
