CCS341-DW 2 QB Unit 4 Key

Key

Uploaded by

ksathishkm

M.A.M.

SCHOOL OF ENGINEERING
Accredited by NAAC
Approved by AICTE, New Delhi; Affiliated to Anna University, Chennai
Siruganur, Trichy -621 105. www.mamse.in

Department of Computer Science and Engineering


Academic Year 2024-2025 (Odd Semester)
UNIT 4
Sub.Code/Sub.Name : CCS341- Data Warehousing Date :
Year/Sem. : III / V

Answer All the Questions


Part A (10x2=20 Marks)
1. Write the advantages of Snowflake schema.

1. The primary advantage of the snowflake schema is the improvement in query
performance due to minimized disk storage requirements and joins against smaller
lookup tables.
2. It provides greater scalability in the interrelationship between dimension levels
and components.
3. Redundancy is minimal, so the schema is easier to maintain.

2. List the metadata repository.

A metadata repository is a centralized location that stores information about the data within the warehouse,
including details such as table structures, column definitions, data lineage, source systems,
transformations applied, and data quality metrics.
3. Give syntax for dimension definition.

○ JOIN KEY
This clause in the dimension definition specifies how to join the levels in the hierarchy. For
example, JOIN KEY (customers.country_id) REFERENCES country.
○ SKIP WHEN NULL
This clause can be used to link levels when a level is missing from the hierarchy: a level
marked SKIP WHEN NULL is skipped whenever its value is null. For
example, a city can then be joined directly to its country: JOIN KEY (city.country_id) REFERENCES country.

4. List the types of Data Base Parallelism.

Parallelism is used to support speedup, where queries are executed faster because
more resources, such as processors and disks, are provided. Parallelism is also used to
provide scale-up, where increasing workloads are managed without an increase in response
time, via an increase in the degree of parallelism.
1. Horizontal Parallelism
2. Vertical Parallelism
3. Interquery Parallelism
5. Write short note on Round Robin Partition.

In the round robin technique, when a new partition is needed, the old one is archived. It
uses metadata to allow the user access tool to refer to the correct table partition.
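
The round robin idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name RoundRobinPartitions, the fixed limit of two active partitions, and the monthly period labels are all hypothetical and are not part of the answer above. The ordered map plays the role of the metadata that routes a lookup to the correct partition.

```python
from collections import OrderedDict


class RoundRobinPartitions:
    """Keeps a fixed number of active partitions; the oldest is archived on overflow."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = OrderedDict()   # metadata: period label -> rows of that partition
        self.archived = []            # period labels that have been archived

    def add_partition(self, period):
        # When the limit is reached, archive the oldest partition first.
        if len(self.active) >= self.max_active:
            oldest, _rows = self.active.popitem(last=False)
            self.archived.append(oldest)
        self.active[period] = []

    def insert(self, period, row):
        # The metadata map routes the row to the correct partition.
        self.active[period].append(row)

    def lookup(self, period):
        return self.active[period]


parts = RoundRobinPartitions(max_active=2)
for month in ["2024-01", "2024-02", "2024-03"]:
    parts.add_partition(month)
parts.insert("2024-03", {"order_id": 1, "amount": 250})
```

After the third partition is added, the oldest one ("2024-01") has moved to the archived list and only the two most recent partitions remain active.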

6. Define Data modeling.

Data modeling is the process of designing and structuring the data within a data warehouse by defining
the relationships between different data entities, such as customers, products, and
transactions, to enable efficient analysis and reporting. In essence, it creates a blueprint for
how data will be organized and stored to support business intelligence needs within the
warehouse.
7. List the benefits of Dimensional Modeling.

Dimensional modeling represents data as a cube, which gives a logical data representation
well suited to OLAP data management. The concept of dimensional
modeling was developed by Ralph Kimball, and a dimensional model consists of "fact" and
"dimension" tables.
8. Define Data cube measure.

A data cube groups or combines data in multidimensional matrices. A measure is an attribute
whose values are of interest and are aggregated according to the dimensions of the cube. The
data cube method has a few alternative names or variants, such as
"Multidimensional databases," "materialized views," and "OLAP (On-Line Analytical
Processing)."
9. List out the characteristics of star schema.

The star schema is well suited to data warehouse database design because of the following
features:

○ It creates a denormalized database that can quickly provide query responses.
○ It provides a flexible design that can be changed easily or added to throughout the
development cycle, and as the database grows.
○ It parallels in design how end-users typically think of and use the data.
○ It reduces the complexity of metadata for both developers and end-users.

10. Define Fact Constellation Schema



A fact constellation consists of two or more fact tables sharing one or more dimensions. It is
also called a galaxy schema.
A fact constellation schema describes a logical structure of a data warehouse or data mart.
It can be designed with a collection of denormalized fact tables and shared,
conformed dimension tables.

Part B
11. Predict the selection criteria that should be considered while
implementing data warehouse tools.

1. The ability to identify the data in the data source environment that can be read by the
tool is necessary.

2. Support for flat files, indexed files, and legacy DBMSs is critical.

3. The capability to merge records from multiple data stores is required in many
installations.

4. The specification interface to indicate the information to be extracted and the
conversion to be applied is essential.

5. The ability to read information from repository products or data dictionaries is
desired.

6. The code developed by the tool should be completely maintainable.

7. Selective data extraction of both data items and records enables users to extract only
the required data.

8. A field-level data examination for the transformation of data into information is
needed.

9. The ability to perform data type and character-set translation is a requirement when
moving data between incompatible systems.

10. The ability to create aggregation, summarization, and derivation fields and records is
necessary.

11. Vendor stability and support for the products must be evaluated
carefully.
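
Several of these criteria (selective extraction, data type and character-set translation, derived and aggregated fields) can be illustrated with a small Python sketch. The record layout, the field names, the Latin-1 source encoding, and the quantity threshold are assumptions made for illustration; they do not describe any particular tool.

```python
# Raw records as they might arrive from a legacy flat file (bytes, Latin-1 encoded).
raw_records = [
    b"1001;Caf\xe9 Paris;12.50;3",
    b"1002;Tea House;4.00;10",
    b"1003;Caf\xe9 Paris;7.25;2",
]


def extract(record):
    """Character-set translation (Latin-1 -> str) and data type conversion."""
    order_id, store, price, qty = record.decode("latin-1").split(";")
    return {"order_id": int(order_id), "store": store,
            "price": float(price), "qty": int(qty)}


rows = [extract(r) for r in raw_records]

# Selective extraction: keep only the records and fields that are required.
selected = [{"store": r["store"], "price": r["price"], "qty": r["qty"]}
            for r in rows if r["qty"] >= 3]

# Derivation: add a computed field.
for r in selected:
    r["revenue"] = r["price"] * r["qty"]

# Aggregation / summarization: total revenue per store.
revenue_by_store = {}
for r in selected:
    revenue_by_store[r["store"]] = revenue_by_store.get(r["store"], 0.0) + r["revenue"]
```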

12. Explain the star schema, snowflake schema and fact constellation schema with
examples.

A star schema is the elementary form of a dimensional model, in which data are organized
into facts and dimensions. A fact is an event that is counted or measured, such as a sale or
a login. A dimension includes reference data about the fact, such as date, item, or customer.

A star schema is a relational schema whose design represents a
multidimensional data model. It is the simplest data warehouse schema. It is
known as a star schema because its entity-relationship diagram resembles a
star, with points diverging from a central table. The center of the schema consists of a large
fact table, and the points of the star are the dimension tables.
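
As an example, the sketch below builds a tiny star schema with Python's built-in sqlite3 module: a central sales fact table surrounded by date, item, and customer dimensions. All table names, column names, and sample rows are hypothetical and chosen only to mirror the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_item     (item_id INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);

-- Central fact table: one row per sale, with a foreign key to every dimension.
CREATE TABLE fact_sales (
    date_id     INTEGER REFERENCES dim_date(date_id),
    item_id     INTEGER REFERENCES dim_item(item_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    units_sold  INTEGER,
    amount      REAL
);

INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_item VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_customer VALUES (1, 'Asha', 'Trichy'), (2, 'Ravi', 'Chennai');
INSERT INTO fact_sales VALUES (1, 1, 1, 2, 1600.0), (1, 2, 2, 1, 300.0), (2, 1, 2, 1, 800.0);
""")

# A typical star query: join the fact table to a dimension and aggregate a measure.
sales_by_category = conn.execute("""
    SELECT i.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_id = f.item_id
    GROUP BY i.category
    ORDER BY i.category
""").fetchall()
```

Every dimension is exactly one join away from the fact table, which is what keeps star queries short.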

A snowflake schema is a variant of the star schema. "A schema is known as a snowflake if
one or more dimension tables do not connect directly to the fact table but must join through
other dimension tables."

The snowflake schema is an expansion of the star schema in which each point of the star
explodes into more points. It is called a snowflake schema because its diagram
resembles a snowflake. Snowflaking is a method of normalizing the dimension
tables in a star schema. When all the dimension tables are normalized entirely, the resultant
structure resembles a snowflake with the fact table in the middle.

Snowflaking is used to improve the performance of specific queries. The schema is diagrammed
with each fact surrounded by its associated dimensions, and those dimensions are related to
other dimensions, branching out into a snowflake pattern.
The snowflake schema consists of one fact table linked to many dimension tables,
which can in turn be linked to other dimension tables through many-to-one relationships. Tables in a
snowflake schema are generally normalized to the third normal form. Each dimension table
represents exactly one level in a hierarchy.

A typical example is a snowflake schema with two dimensions, each having three
levels. A snowflake schema can have any number of dimensions, and each dimension can
have any number of levels.
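
A minimal sketch of snowflaking, again using Python's sqlite3 module: one dimension is normalized into three levels (item, category, department), so the top level can be reached only by joining through the lower ones. The table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Item dimension normalized into three levels: item -> category -> department.
CREATE TABLE dim_department (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT,
    department_id INTEGER REFERENCES dim_department(department_id)
);
CREATE TABLE dim_item (
    item_id     INTEGER PRIMARY KEY,
    item_name   TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (item_id INTEGER REFERENCES dim_item(item_id), amount REAL);

INSERT INTO dim_department VALUES (1, 'Technology'), (2, 'Home');
INSERT INTO dim_category VALUES (1, 'Laptops', 1), (2, 'Phones', 1), (3, 'Desks', 2);
INSERT INTO dim_item VALUES (1, 'Laptop A', 1), (2, 'Phone B', 2), (3, 'Desk C', 3);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 500.0), (3, 300.0), (1, 1000.0);
""")

# The department level is reached only by joining through the lower levels.
sales_by_department = conn.execute("""
    SELECT d.department_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_item       AS i ON i.item_id = f.item_id
    JOIN dim_category   AS c ON c.category_id = i.category_id
    JOIN dim_department AS d ON d.department_id = c.department_id
    GROUP BY d.department_name
    ORDER BY d.department_name
""").fetchall()
```

Compared with a star schema, the department name is stored once per department rather than once per item, at the cost of two extra joins.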

A fact constellation consists of two or more fact tables sharing one or more dimensions. It is
also called a galaxy schema.
A fact constellation schema describes a logical structure of a data warehouse or data mart. It
can be designed with a collection of denormalized fact tables and shared,
conformed dimension tables.
A fact constellation schema is a sophisticated database design in which it is difficult to summarize
information. It can be implemented between aggregate fact tables, or by
decomposing a complex fact table into independent simple fact tables.
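
The following sketch shows a fact constellation with Python's sqlite3 module: two fact tables (sales and shipping) share one item dimension. The table names, column names, and sample rows are hypothetical. It also hints at why summarizing is harder here: each fact table is aggregated on its own before the results are combined through the shared dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared (conformed) dimension used by both fact tables.
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_name TEXT);

CREATE TABLE fact_sales    (item_id INTEGER REFERENCES dim_item(item_id), units_sold INTEGER);
CREATE TABLE fact_shipping (item_id INTEGER REFERENCES dim_item(item_id), units_shipped INTEGER);

INSERT INTO dim_item VALUES (1, 'Laptop'), (2, 'Desk');
INSERT INTO fact_sales VALUES (1, 5), (1, 3), (2, 4);
INSERT INTO fact_shipping VALUES (1, 6), (2, 4);
""")

# Aggregate each fact table separately, then combine on the shared dimension.
# Joining the two fact tables directly would multiply rows and inflate the sums.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name, s.sold, h.shipped
    FROM dim_item AS i
    JOIN (SELECT item_id, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY item_id) AS s ON s.item_id = i.item_id
    JOIN (SELECT item_id, SUM(units_shipped) AS shipped
          FROM fact_shipping GROUP BY item_id) AS h ON h.item_id = i.item_id
    ORDER BY i.item_name
""").fetchall()
```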

13. Discuss about multidimensional database, data mart and data cube.

A data cube groups or combines data in multidimensional matrices. The
data cube method has a few alternative names or variants, such as "multidimensional
databases," "materialized views," and "OLAP (On-Line Analytical Processing)."
The general idea of this approach is to materialize certain expensive computations that are
frequently requested.
For example, a relation with the schema sales (part, supplier, customer, sale-price) can
be materialized into a set of eight views, where psc indicates a view consisting
of aggregate function values (such as total-sales) computed by grouping on the three attributes part,
supplier, and customer, p indicates a view composed of the corresponding aggregate function
values calculated by grouping on part alone, and so on.

A data cube is created from a subset of attributes in the database. Specific attributes are
chosen to be measure attributes, i.e., the attributes whose values are of interest. Other
attributes are selected as dimensions or functional attributes. The measure attributes are
aggregated according to the dimensions.

Dimensions are the attributes that define a data cube. Facts are generally quantities, which are
used for analyzing the relationships between dimensions.
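
The eight views of the sales example can be sketched in Python by grouping on every subset of the three dimensions, with sale price as the measure. The sample rows and the helper name group_total are hypothetical; a real warehouse would materialize these views in the database rather than in memory.

```python
from itertools import combinations

# Hypothetical rows of sales(part, supplier, customer, sale_price).
sales = [
    ("bolt", "acme", "alice", 10),
    ("bolt", "acme", "bob", 20),
    ("nut", "acme", "alice", 5),
    ("nut", "zenith", "bob", 15),
]
dimensions = ("part", "supplier", "customer")


def group_total(rows, dims):
    """Total sale price grouped by the chosen subset of dimensions."""
    positions = [dimensions.index(d) for d in dims]
    totals = {}
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] = totals.get(key, 0) + row[3]
    return totals


# One view per subset of the three dimensions: 2**3 = 8 views in total.
cube = {}
for k in range(len(dimensions) + 1):
    for dims in combinations(dimensions, k):
        cube[dims] = group_total(sales, dims)
```

Here cube[("part", "supplier", "customer")] corresponds to the psc view, cube[("part",)] to the p view, and cube[()] to the grand total.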

OLAP is implemented on data warehouses or data marts. The primary objective of OLAP is to support
the ad-hoc querying needed by decision support systems (DSS). The multidimensional view of data is fundamental to
OLAP applications. OLAP is an operational view, not a data structure or schema. The complex nature of OLAP
applications requires a multidimensional view of the data.
14. (i) Predict the application areas of the data warehouse.

Application areas include customer relationship management (CRM), sales analysis, marketing campaign
effectiveness, financial reporting, operational efficiency monitoring, fraud
detection, healthcare analytics, supply chain optimization, and market trend
analysis across industries such as retail, banking, telecommunications, healthcare, and
manufacturing.

Information Processing

It deals with querying, statistical analysis, and reporting via tables, charts, or
graphs. Nowadays, the trend in information processing for data warehouses is to construct
low-cost, web-based access tools, typically integrated with web browsers.

Analytical Processing

It supports various online analytical processing operations such as drill-down, roll-up, and
pivoting. The historical data is processed in both summarized and
detailed formats.

OLAP is implemented on data warehouses or data marts. The primary objective
of OLAP is to support the ad-hoc querying needed by decision support systems (DSS). The
multidimensional view of data is fundamental to OLAP applications. OLAP
is an operational view, not a data structure or schema. The complex nature of
OLAP applications requires a multidimensional view of the data.
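
The roll-up, drill-down, and pivot operations mentioned above can be sketched in Python over a small in-memory fact list. The sample data, the (year, quarter, region, sales) layout, and the helper name aggregate are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical detailed data: (year, quarter, region, sales).
facts = [
    (2023, "Q1", "North", 100),
    (2023, "Q2", "North", 120),
    (2023, "Q1", "South", 80),
    (2024, "Q1", "North", 130),
    (2024, "Q1", "South", 90),
]


def aggregate(rows, key_fn):
    """Sum the sales measure for each key produced by key_fn."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)


# Roll-up: summarize from (year, quarter) up to year.
by_year = aggregate(facts, lambda r: r[0])

# Drill-down: go back to the more detailed (year, quarter) level.
by_year_quarter = aggregate(facts, lambda r: (r[0], r[1]))

# Pivot: rotate the view so regions become rows and years become columns.
cells = aggregate(facts, lambda r: (r[2], r[0]))
years = sorted({r[0] for r in facts})
pivot = {region: [cells.get((region, y), 0) for y in years]
         for region in sorted({r[2] for r in facts})}
```

The same detailed rows serve both the summarized view (by_year) and the detailed view (by_year_quarter), which matches the point that historical data is processed in both formats.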

Data Mining

It helps in the discovery of hidden patterns and associations, constructing analytical
models, performing classification and prediction, and presenting the mining
results using visualization tools.

(ii) Show the figure of a fact constellation schema.

15. Describe in detail the two fundamental process architectures.

The process architecture defines how the data from the data warehouse is
processed for a particular computation.

Following are the two fundamental process architectures:

Centralized Process Architecture

In this architecture, the data is collected into a single centralized store and processed,
once collection is complete, by a single machine with large capacity in terms of memory, processor, and
storage.

Centralized process architecture evolved with transaction processing and is well suited for
small organizations with one location of service.

It requires minimal resources both from people and system perspectives.

It is very successful when the collection and consumption of data occur at the same location.

Distributed Process Architecture

In this architecture, information and its processing are distributed across data centers:
data is processed locally, and the results are grouped into centralized storage.
Distributed architectures are used to overcome the
limitations of the centralized process architecture, where all the information needs to be
collected at one central location and results are available only at that location.

There are several architectures of the distributed process:

Client-Server
In this architecture, the client does all the information collecting and presentation, while the
server does the processing and management of data.

Three-tier Architecture

With client-server architecture, the client machines need to be connected to a server machine,
thus mandating finite states and introducing latencies and overhead in terms of the records to be
carried between clients and servers.

N-tier Architecture

The n-tier or multi-tier architecture is where clients, middleware, applications, and servers are
isolated into tiers.

Cluster Architecture

In this architecture, machines that are connected in a network (through software or
hardware) work together to process information or compute requirements in
parallel. Each device in a cluster is associated with a function that is processed locally, and the
result sets are collected by a master server that returns them to the user.
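
The local-processing-then-collect pattern can be sketched in Python. In this illustration, threads from the standard concurrent.futures module stand in for the machines of a cluster, which is an assumption made only to keep the example self-contained; the data chunks and the function names local_total and master_total are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales amounts, split into one chunk per cluster node.
node_chunks = [
    [100, 250, 75],
    [300, 125],
    [50, 50, 50, 50],
]


def local_total(chunk):
    """Work done locally on one node: aggregate its own share of the data."""
    return sum(chunk)


def master_total(chunks):
    """Master: dispatch chunks to workers in parallel, then combine the result sets."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(local_total, chunks))
    return partials, sum(partials)


partials, grand_total = master_total(node_chunks)
```

Only the small partial results travel back to the master, not the detailed rows, which is the main benefit of processing data where it is stored.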

Peer-to-Peer Architecture

This is a type of architecture where there are no dedicated servers and clients. Instead, all the
processing responsibilities are allocated among all machines, called peers. Each machine can
perform the function of a client or server or just process data.
