CCS341-DW 2 QB Unit 4 Key
M.A.M. SCHOOL OF ENGINEERING
Accredited by NAAC
Approved by AICTE, New Delhi; Affiliated to Anna University, Chennai
Siruganur, Trichy -621 105. www.mamse.in
and components.
3. No redundancy, so it is easier to maintain.
A centralized location that stores information about the data within the warehouse,
including details like table structures, column definitions, data lineage, source systems,
transformations applied, and data quality metrics.
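As a rough illustration of what such a repository might hold, the sketch below models one metadata entry per warehouse table. The class names, field names, and sample values are assumptions made for this example, not part of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    name: str
    data_type: str
    description: str = ""


@dataclass
class TableMetadata:
    table_name: str
    source_system: str
    columns: list[ColumnMetadata] = field(default_factory=list)
    transformations: list[str] = field(default_factory=list)  # lineage steps, in order
    quality_metrics: dict[str, float] = field(default_factory=dict)


# The repository is a single central lookup keyed by table name (hypothetical data).
repository: dict[str, TableMetadata] = {}
repository["fact_sales"] = TableMetadata(
    table_name="fact_sales",
    source_system="pos_db",
    columns=[ColumnMetadata("amount", "REAL", "sale amount")],
    transformations=["deduplicate", "convert currency"],
    quality_metrics={"null_rate_amount": 0.0},
)
print(repository["fact_sales"].source_system)
```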
3. Give syntax for dimension definition.
JOIN KEY
This clause in the dimension definition specifies how to join the levels in the hierarchy. For
example, JOIN KEY (customers.country_id) REFERENCES country.
SKIP WHEN NULL
This clause can be used to link levels that have a missing level in their hierarchy. For
example, JOIN KEY (city.country_id) REFERENCES country
Parallelism is used to support speedup, where queries are executed faster because
more resources, such as processors and disks, are provided. Parallelism is also used to
provide scale-up, where increasing workloads are managed without an increase in response
time, via an increase in the degree of parallelism.
1. Horizontal Parallelism
2. Vertical Parallelism
3. Interquery Parallelism
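Speedup and scale-up can be expressed as simple ratios. The sketch below assumes the commonly used definitions: speedup compares the same workload on a smaller and a larger system, and scale-up compares a small workload on a small system with a proportionally larger workload on a larger system. The function names and timings are illustrative assumptions.

```python
def speedup(time_small_system: float, time_large_system: float) -> float:
    """Same workload, more resources: values above 1 mean the query ran faster."""
    return time_small_system / time_large_system


def scaleup(time_small_load_small_system: float,
            time_large_load_large_system: float) -> float:
    """Bigger workload on a bigger system: 1.0 means response time did not grow."""
    return time_small_load_small_system / time_large_load_large_system


# Hypothetical timings in seconds, chosen only for illustration.
query_speedup = speedup(120.0, 30.0)     # e.g. 4 processors instead of 1
workload_scaleup = scaleup(60.0, 60.0)   # e.g. 4x the data on 4x the processors
print(query_speedup, workload_scaleup)
```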
5. Write short note on Round Robin Partition.
In the round robin technique, when a new partition is needed, the oldest one is archived. It
uses metadata to allow the user access tool to refer to the correct table partition.
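The rotation described above can be sketched as follows. This is a minimal in-memory model, assuming a fixed number of active partitions and a hypothetical table naming scheme (`sales_<period>`); a real warehouse would move data rather than just track names.

```python
from collections import deque


class RoundRobinPartitions:
    """Keeps a fixed number of active partitions; the oldest is archived on rollover."""

    def __init__(self, max_active: int):
        self.max_active = max_active
        self.active = deque()   # partition table names, oldest first
        self.archived = []      # names of partitions moved to the archive
        self.metadata = {}      # logical period -> physical table name

    def add_partition(self, period: str) -> str:
        if len(self.active) == self.max_active:
            oldest = self.active.popleft()
            self.archived.append(oldest)
            # Drop metadata entries that pointed at the archived table.
            self.metadata = {p: t for p, t in self.metadata.items() if t != oldest}
        table = f"sales_{period}"
        self.active.append(table)
        self.metadata[period] = table
        return table

    def table_for(self, period: str) -> str:
        """What a user access tool would call to find the correct partition."""
        return self.metadata[period]


parts = RoundRobinPartitions(max_active=3)
for month in ["2024_01", "2024_02", "2024_03", "2024_04"]:
    parts.add_partition(month)
print(parts.archived, list(parts.active))
```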
The process of designing and structuring the data within a data warehouse by defining
the relationships between different data entities, such as customers, products, and
transactions, to enable efficient analysis and reporting, essentially creating a blueprint for
how data will be organized and stored to support business intelligence needs within the
warehouse.
7. List the benefits of Dimensional Modeling
Dimensional modeling represents data as a cube, which gives a logical data representation
well suited to OLAP data management. The concept of dimensional modeling was developed
by Ralph Kimball and consists of "fact" and "dimension" tables.
8. Define Data cube measure.
Data is grouped or combined in multidimensional matrices called data cubes. The
data cube method has a few alternative names or variants, such as
"multidimensional databases," "materialized views," and "OLAP (On-Line Analytical
Processing)."
9. List out the characteristics of star schema
The star schema is well suited to data warehouse database design because of the following
features:
A Fact constellation means two or more fact tables sharing one or more dimensions. It is
also called Galaxy schema.
Fact Constellation Schema describes a logical structure of data warehouse or data mart.
Fact Constellation Schema can be designed with a collection of de-normalized fact, shared,
and conformed dimension tables.
Part B
Predict the selection criteria that should be considered while
implementing data warehouse tools.
1. The ability to identify the data in the data source environment that can be read by the
tool is necessary.
2. Support for flat files, indexed files, and legacy DBMSs is critical.
3. The capability to merge records from multiple data stores is required in many
installations.
7. Selective data extraction of both data items and records enables users to extract only
the required data.
9. The ability to perform data type and character-set translation is a requirement when
moving data between incompatible systems.
10. The ability to create aggregation, summarization, and derivation fields and records is
necessary.
11. Vendor stability and support for the products are components that must be evaluated
carefully.
Explain the star schema, snowflake schema and fact constellation schema with
examples.
A star schema is the elementary form of a dimensional model, in which data are organized
into facts and dimensions. A fact is an event that is counted or measured, such as a sale or
login. A dimension includes reference data about the fact, such as date, item, or customer.
A star schema is a relational schema whose design represents a multidimensional data
model. The star schema is the simplest data warehouse schema. It is known as a star schema
because its entity-relationship diagram resembles a star, with points diverging from a
central table. The center of the schema consists of a large fact table, and the points of the
star are the dimension tables.
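A minimal sketch of a star schema, using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical and chosen only to mirror the date, item, and customer dimensions mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_item     (item_id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT);
-- Central fact table: one row per sale, with a foreign key to each dimension.
CREATE TABLE fact_sales (
    date_id     INTEGER REFERENCES dim_date(date_id),
    item_id     INTEGER REFERENCES dim_item(item_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount      REAL
);
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO dim_item VALUES (1, 'pen'), (2, 'book');
INSERT INTO dim_customer VALUES (1, 'Trichy'), (2, 'Chennai');
INSERT INTO fact_sales VALUES (1, 1, 1, 10.0), (2, 1, 2, 15.0), (2, 2, 1, 40.0);
""")

# Every dimension is exactly one join away from the fact table.
sales_by_year = conn.execute("""
    SELECT d.year, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_id = d.date_id
    GROUP BY d.year ORDER BY d.year
""").fetchall()
print(sales_by_year)
```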
A snowflake schema is a variant of the star schema. "A schema is known as a snowflake if
one or more dimension tables do not connect directly to the fact table but must join through
other dimension tables."
The snowflake schema is an expansion of the star schema where each point of the star
explodes into more points. It is called a snowflake schema because its diagram
resembles a snowflake. Snowflaking is a method of normalizing the dimension
tables in a star schema. When we normalize all the dimension tables entirely, the resultant
structure resembles a snowflake with the fact table in the middle.
Snowflaking is used to improve the performance of specific queries. The schema is diagrammed
with each fact surrounded by its associated dimensions, and those dimensions are related to
other dimensions, branching out into a snowflake pattern.
The snowflake schema consists of one fact table which is linked to many dimension tables,
which can be linked to other dimension tables through a many-to-one relationship. Tables in a
snowflake schema are generally normalized to the third normal form. Each dimension table
represents exactly one level in a hierarchy.
The following diagram shows a snowflake schema with two dimensions, each having three
levels. A snowflake schema can have any number of dimensions, and each dimension can
have any number of levels.
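A minimal sketch of snowflaking with Python's built-in sqlite3 module. For brevity it normalizes a single dimension into three levels (item, category, department); all table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension normalized into three levels: item -> category -> department.
CREATE TABLE dim_department (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT,
    department_id INTEGER REFERENCES dim_department(department_id)
);
CREATE TABLE dim_item (
    item_id     INTEGER PRIMARY KEY,
    item_name   TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (item_id INTEGER REFERENCES dim_item(item_id), amount REAL);
INSERT INTO dim_department VALUES (1, 'Stationery'), (2, 'Books');
INSERT INTO dim_category VALUES (1, 'Pens', 1), (2, 'Notebooks', 1), (3, 'Novels', 2);
INSERT INTO dim_item VALUES (1, 'pen', 1), (2, 'notebook', 2), (3, 'novel', 3);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 20.0), (3, 30.0), (1, 5.0);
""")

# The department level is reached only by joining through item and category.
sales_by_department = conn.execute("""
    SELECT dp.department_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_item i        ON f.item_id = i.item_id
    JOIN dim_category c    ON i.category_id = c.category_id
    JOIN dim_department dp ON c.department_id = dp.department_id
    GROUP BY dp.department_name ORDER BY dp.department_name
""").fetchall()
print(sales_by_department)
```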
A fact constellation means two or more fact tables sharing one or more dimensions. It is
also called a galaxy schema.
Fact Constellation Schema describes a logical structure of a data warehouse or data mart. Fact
Constellation Schema can be designed with a collection of de-normalized fact, shared, and
conformed dimension tables.
Fact Constellation Schema is a sophisticated database design in which it is difficult to
summarize information. It can be implemented between aggregate fact tables, or by
decomposing a complex fact table into independent simple fact tables.
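A minimal sketch of a fact constellation with Python's built-in sqlite3 module: two fact tables share one dimension table. The sales and shipping facts, and all names and rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension referenced by both fact tables.
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE fact_sales    (item_id INTEGER REFERENCES dim_item(item_id), units_sold INTEGER);
CREATE TABLE fact_shipping (item_id INTEGER REFERENCES dim_item(item_id), units_shipped INTEGER);
INSERT INTO dim_item VALUES (1, 'pen'), (2, 'book');
INSERT INTO fact_sales VALUES (1, 5), (1, 3), (2, 4);
INSERT INTO fact_shipping VALUES (1, 6), (2, 4);
""")

# Aggregate each fact table separately, then line the results up via the shared dimension.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(units_sold)    FROM fact_sales s    WHERE s.item_id = i.item_id),
           (SELECT SUM(units_shipped) FROM fact_shipping h WHERE h.item_id = i.item_id)
    FROM dim_item i ORDER BY i.item_id
""").fetchall()
print(sold_vs_shipped)
```

Aggregating each fact table on its own before combining avoids the double counting that a direct join between the two fact tables would cause.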
Data is grouped or combined in multidimensional matrices called data cubes. The
data cube method has a few alternative names or variants, such as "multidimensional
databases," "materialized views," and "OLAP (On-Line Analytical Processing)."
The general idea of this approach is to materialize certain expensive computations that are
frequently queried.
For example, a relation with the schema sales (part, supplier, customer, sale-price) can
be materialized into a set of eight views as shown in fig, where psc indicates a view consisting
of aggregate function values (such as total-sales) computed by grouping the three attributes part,
supplier, and customer, p indicates a view composed of the corresponding aggregate function
values calculated by grouping part alone, etc.
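The eight views correspond to the 2^3 subsets of the three grouping attributes. The sketch below materializes them in plain Python; the sample rows and the `materialize` helper are assumptions made for illustration, with total sale price as the aggregate.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("part", "supplier", "customer")

# Hypothetical rows of sales(part, supplier, customer, sale-price).
sales = [
    {"part": "p1", "supplier": "s1", "customer": "c1", "sale_price": 10},
    {"part": "p1", "supplier": "s2", "customer": "c1", "sale_price": 20},
    {"part": "p2", "supplier": "s1", "customer": "c2", "sale_price": 5},
]


def materialize(rows, group_by):
    """Total sale price for each combination of values of the group_by attributes."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key] += row["sale_price"]
    return dict(totals)


# One view per subset of the dimensions: psc, ps, pc, sc, p, s, c, and the grand total.
cube = {
    group_by: materialize(sales, group_by)
    for size in range(len(DIMENSIONS), -1, -1)
    for group_by in combinations(DIMENSIONS, size)
}
print(len(cube), cube[("part",)], cube[()])
```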
A data cube is created from a subset of attributes in the database. Specific attributes are
chosen to be measure attributes, i.e., the attributes whose values are of interest. Other
attributes are selected as dimensions or functional attributes. The measure attributes are
aggregated according to the dimensions.
Dimensions are the entities with respect to which a data cube is defined. Facts are generally
quantities, which are used for analyzing the relationship between dimensions.
OLAP is implemented on data warehouses or data marts. The primary objective of OLAP is to support
the ad-hoc querying needed by decision support systems (DSS). The multidimensional view of data is
fundamental to OLAP applications. OLAP is an operational view, not a data structure or schema. The
complex nature of OLAP applications requires a multidimensional view of the data.
14 i Predict the Application areas of the Data warehouse
Information Processing
It deals with querying, statistical analysis, and reporting via tables, charts, or
graphs. Nowadays, the trend in information processing for data warehouses is to
construct low-cost, web-based access tools, typically integrated with web browsers.
Analytical Processing
Data Mining
The process architecture defines an architecture in which the data from the data warehouse is
processed for a particular computation.
In this architecture, the data is collected into a single centralized store and, once collection
is complete, processed by a single machine with large memory, processor, and storage
capacity.
Centralized process architecture evolved with transaction processing and is well suited for
small organizations with one location of service.
It is very successful when the collection and consumption of data occur at the same location.
In this architecture, information and its processing are distributed across data centers: data
is processed locally, and the results are then grouped into centralized storage. Distributed
architectures are used to overcome the limitations of centralized process architectures, where
all the information needs to be collected at one central location, and results are available in
one central location.
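The idea of local processing followed by grouping of results can be sketched as below. The data centers, records, and `process_locally` helper are hypothetical; in practice each call would run on a different machine.

```python
from collections import Counter

# Hypothetical sales records held locally at each data center.
data_centers = {
    "chennai": [("pen", 10), ("book", 40)],
    "trichy": [("pen", 5)],
    "delhi": [("book", 20), ("pen", 15)],
}


def process_locally(records):
    """Runs at one data center: aggregate only the records stored there."""
    local_totals = Counter()
    for item, amount in records:
        local_totals[item] += amount
    return local_totals


# Only the small partial results travel to the central store, where they are merged.
central_store = Counter()
for site, records in data_centers.items():
    central_store.update(process_locally(records))
print(dict(central_store))
```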
Client-Server
In this architecture, the client does all the information collecting and presentation, while the
server does the processing and management of data.
Three-tier Architecture
With client-server architecture, the client machines need to be connected to a server machine,
thus mandating finite states and introducing latencies and overhead in terms of the records to be
carried between clients and servers.
N-tier Architecture
The n-tier or multi-tier architecture is where clients, middleware, applications, and servers are
isolated into tiers.
Cluster Architecture
Peer-to-Peer Architecture
This is a type of architecture where there are no dedicated servers and clients. Instead, all the
processing responsibilities are allocated among all machines, called peers. Each machine can
perform the function of a client or server or just process data.