DWM QB Soln

1)Explain features of Data Warehouse
1. Subject-Oriented
 Focus on Subjects: Data warehouses are organized around major subjects

like sales, customers, or products. This helps in analyzing data based on
specific themes rather than just transactions.
 Subject-Specific Data: Data is categorized and stored by subjects, making
it easier to generate reports and conduct analysis.
2. Integrated
 Data Integration: Data from different sources is consolidated, cleaned, and

stored in a consistent format in the data warehouse. This ensures that data
from various operational systems is uniform and comparable.
 Consistency: Different data sources might use different conventions,
formats, and units. The data warehouse standardizes these, ensuring that
all data is consistent.
3. Time-Variant
 Historical Data: Data warehouses store historical data, which means they
can store snapshots of data over time. This is crucial for trend analysis,
forecasting, and reporting.
 Time Stamps: Data in a warehouse is associated with time periods,
enabling the analysis of trends over various periods.
4. Non-Volatile
 Stable Data: Once data is entered into the data warehouse, it is not
changed or deleted. This ensures data consistency over time.
 Read-Only Access: Data warehouses are primarily read-only, meaning data
is rarely altered or deleted after it is loaded.
5. Data Granularity
 Detailed and Summary Data: A data warehouse contains data at various

levels of granularity. It stores both highly detailed data for deep analysis
and summarized data for quick insights.
 Flexible Queries: Users can query data at different levels of detail,
depending on their needs.
6. Optimized for Query Performance
 Efficient Query Processing: Data warehouses are optimized to handle

complex queries quickly, enabling fast retrieval of large datasets.
 Indexing and Partitioning: Data is often indexed and partitioned to improve
query performance, allowing faster access to specific data segments.
7. Support for Business Intelligence Tools
 Integration with BI Tools: Data warehouses are designed to work

seamlessly with business intelligence (BI) tools like reporting software,
OLAP (Online Analytical Processing) tools, and data mining applications.
 Decision Support: They provide the data infrastructure required for
decision-making processes in an organization.
8. Scalability
 Handling Large Volumes of Data: Data warehouses are scalable, meaning

they can grow in storage capacity and processing power as the amount of
data increases.
 Distributed Architecture: Many data warehouses use a distributed
architecture, allowing them to handle large volumes of data efficiently.
9. Data Mart Integration
 Subsets of Data Warehouses: Data marts are smaller, more focused

versions of data warehouses that cater to specific business lines or
departments. They can be integrated into a data warehouse to provide
specialized data access.
 Departmental Focus: Data marts allow departments to access relevant
data quickly without querying the entire data warehouse.
10. Metadata Management
 Data about Data: Metadata in a data warehouse describes the data's

source, transformation processes, structure, and usage. It helps users
understand the data and how to use it effectively.
 User-Friendly Access: Metadata management tools in a data warehouse
make it easier for users to find and understand the data they need.
2)Top down vs bottom up approach for building data

warehouse
Top-Down Approach
 Initiated by: Enterprise-wide data warehouse design.

 Development Sequence: First, create a centralized data warehouse, then
develop data marts.
 Data Flow: Data is extracted from operational systems, transformed, and
loaded into the centralized data warehouse, then segmented into data
marts.
 Complexity: More complex and time-consuming due to the need for
comprehensive design upfront.
 Advantages: Ensures consistency and uniformity across the organization;
ideal for large enterprises.
 Disadvantages: High initial costs, longer implementation time.
Bottom-Up Approach
 Initiated by: Individual data marts development.

 Development Sequence: First, build data marts for specific departments,
then integrate them into a centralized data warehouse.
 Data Flow: Data is extracted and loaded into individual data marts, which
are then combined into a data warehouse.
 Complexity: Simpler and faster to implement, with a focus on specific
business needs first.
 Advantages: Quick implementation, lower initial costs; allows for gradual
expansion.
 Disadvantages: Risk of data inconsistency; potential challenges in
integrating data marts later into a cohesive data warehouse.
Top-Down Design Approach Bottom-Up Design Approach
Breaks the vast problem into Solves the essential low-level

smaller subproblems. problem and integrates them into a
higher one.
Inherently architected- not a union Inherently incremental; can

of several data marts. schedule essential data marts first.
Single, central storage of Departmental information stored.

information about the content.
Centralized rules and control. Departmental rules and control.
It includes redundant information. Redundancy can be removed.
It may see quick results if Less risk of failure, favorable return

implemented with repetitions. on investment, and proof of
techniques.
3)Define metadata and explain the types of metadata
Metadata is "data about data." It provides information that describes other

data, helping to understand, manage, and use the data effectively. Metadata
serves as a blueprint that helps in organizing, finding, and managing data
within a system.
Roles of Metadata
 Data Discovery and Search

o Helps users find relevant data quickly through descriptions, tags, and
classifications.
 Data Understanding
o Provides context and meaning to data, helping users interpret and
understand the information.
 Data Management
o Facilitates data organization, storage, and retrieval by describing data
structures, relationships, and formats.
 Data Quality and Consistency
o Ensures accuracy by documenting data sources, transformations, and
validation rules.
 Data Integration
o Enables the combination of data from different sources by providing a
common reference and mapping of data fields.
 Data Governance
o Supports compliance and security by detailing access rights, usage
restrictions, and data lineage.
 Data Analysis and Reporting
o Assists in generating reports and analyses by providing templates,
business rules, and examples.
 Data Operations Monitoring
o Helps track and optimize data processing workflows through
operational metadata on load times, errors, and performance.
 Data Preservation
o Aids in long-term data storage and retrieval by documenting version
history, formats, and preservation strategies.
Types of Metadata
1. Operational Metadata
 Definition: Describes the processes and operations involved in managing

data within a system.
 Details: Includes information on data load times, data source refresh rates,
error logs, and processing times.
 Usage: Helps in monitoring and optimizing data processing workflows,
ensuring smooth and efficient data operations.
2. Extractional and Transformational Metadata
 Definition: Describes the processes of extracting and transforming data

from various sources before loading it into a data warehouse.
 Extractional Metadata:
o Details: Information about the sources of data, extraction methods,
data formats, and timestamps of data extraction.
o Usage: Helps track where data comes from and how it's extracted,
aiding in troubleshooting and ensuring data accuracy.
 Transformational Metadata:
o Details: Describes the rules and algorithms used to cleanse, format,
and transform data into the desired structure.
o Usage: Critical for understanding how raw data is converted into the
standardized format used in the data warehouse, ensuring data
consistency and quality.
3. End-User Metadata
 Definition: Metadata that provides context and understanding for end

users to effectively interact with and analyze data.
 Details: Includes data definitions, business rules, report templates, and
query examples tailored for user interpretation.
 Usage: Facilitates easier access to data for non-technical users, helping
them understand what the data represents and how to use it effectively in
reports or analyses.
4) Difference between olap and oltp
Criteria OLAP OLTP
Purpose OLAP helps you OLTP helps you

analyze large volumes manage and process
of data to support real-time transactions.
decision-making.
Data source OLAP uses historical OLTP uses real-time

and aggregated data and transactional data
from multiple sources. from a single source.
Data structure OLAP uses OLTP uses relational

multidimensional databases.
(cubes) or relational
databases.
Data model OLAP uses star OLTP uses normalized

schema, snowflake or denormalized
schema, or other models.
analytical models.
Volume of data OLAP has large OLTP has
storage requirements. comparatively smaller
Think terabytes (TB) storage requirements.
and petabytes (PB). Think gigabytes (GB).
Response time OLAP has longer OLTP has shorter

response times, response times,
typically in seconds or typically in
minutes. milliseconds
Example applications OLAP is good for OLTP is good for

analyzing trends, processing payments,
predicting customer customer data
behavior, and management, and
identifying order processing.
profitability.
5) Data Warehouse vs Data Mart
Data Warehouse Data Mart
A Data Warehouse is a vast A data mart is an only

repository of information subtype of a Data
collected from various Warehouses. It is
organizations or architecture to meet the
departments within a requirement of a specific
corporation. user group.
It may hold multiple subject It holds only one subject

areas. area. For example, Finance
or Sales.
It holds very detailed It may hold more

information. summarized data.
Works to integrate all data It concentrates on

sources integrating data from a given
subject area or set of source
systems.
In data warehousing, Fact In Data Mart, Star Schema

constellation is used. and Snowflake Schema are
used.
It is a Centralized System. It is a Decentralized System.
Data Warehousing is the Data Marts is a project-

data-oriented. oriented.
6) Operations and examples of olap cube
OLAP stands for Online Analytical Processing Server. It is a software

technology that allows users to analyze information from multiple
database systems at the same time. It is based on multidimensional
data model and allows the user to query on multi-dimensional data
(eg. Delhi -> 2018 -> Sales data). OLAP databases are divided into one
or more cubes and these cubes are known as Hyper-cubes.
OLAP operations:
There are five basic analytical operations that can be performed on

an OLAP cube:
Drill down: In drill-down operation, the less detailed data is
converted into highly detailed data. It can be done by:
o Moving down in the concept hierarchy
o Adding a new dimension

In the cube given in overview section, the drill down operation is
performed by moving down in the concept hierarchy of Time
dimension (Quarter -> Month).
Roll up: It is just opposite of the drill-down operation. It performs
aggregation on the OLAP cube. It can be done by:
o Climbing up in the concept hierarchy
o Reducing the dimensions

In the cube given in the overview section, the roll-up operation is
performed by climbing up in the concept hierarchy of Location
dimension (City -> Country).
Dice: It selects a sub-cube from the OLAP cube by selecting two or
more dimensions. In the cube given in the overview section, a sub-
cube is selected by selecting following dimensions with criteria:
o Location = “Delhi” or “Kolkata”
o Time = “Q1” or “Q2”
o Item = “Car” or “Bus”

Slice: It selects a single dimension from the OLAP cube which results
in a new sub-cube creation. In the cube given in the overview section,
Slice is performed on the dimension Time = “Q1”.
Pivot: It is also known as rotation operation as it rotates the current
view to get a new view of the representation. In the sub-cube
obtained after the slice operation, performing pivot operation gives a
new view of it.
7) ETL process
The ETL (Extract, Transform, Load) process is a fundamental part of data
warehousing. It involves extracting data from various source systems,
transforming it into a format suitable for analysis, and loading it into a data
warehouse. Here’s a breakdown of each stage:
1. Extract
 Purpose: The extraction phase involves pulling data from various source
systems, which can include databases, ERP systems, CRM systems, flat
files, cloud services, or other data repositories.
 Process:
o Identify Data Sources: Determine the source systems from which data
will be extracted.
o Data Collection: Use connectors or APIs to collect data from these
sources. This might involve querying databases, extracting files, or
streaming data from real-time sources.
o Data Handling: During extraction, it's essential to handle large volumes
of data efficiently and manage different data formats.
 Example: Extracting customer transaction data from an e-commerce
platform’s database and sales data from a CRM system.

2. Transform
 Purpose: The transformation phase cleanses, standardizes, and organizes
the extracted data to make it suitable for analysis in the data warehouse.
 Process:
o Data Cleansing: Remove duplicates, handle missing data, correct
errors, and filter out irrelevant information.
o Data Standardization: Convert data into a consistent format (e.g., date
formats, units of measure) to ensure uniformity across the dataset.
o Data Integration: Combine data from multiple sources, resolve data
conflicts, and consolidate related data into a single unified view.
o Data Enrichment: Add derived or calculated data, such as aggregating
sales data to calculate monthly totals or adding geographic data based
on customer addresses.
o Business Rules Application: Apply business logic to transform data
into a form that aligns with the organization's requirements.
 Example: Converting all date fields into a standard format (e.g., YYYY-MM-
DD), integrating customer data from different regions, and calculating
total sales per region.
3. Load
 Purpose: The final stage involves loading the transformed data into the
data warehouse, where it can be stored, indexed, and made available for
querying and analysis.
 Process:
o Data Loading: Insert the transformed data into the data warehouse,
which might involve bulk loading large datasets or incremental loading
(e.g., updating only changed records).

o Data Indexing: Index the data for faster querying and retrieval.
o Data Validation: Perform checks to ensure that the data has been
loaded correctly and that it meets the necessary quality standards.
o Data Partitioning: Optionally, partition the data in the warehouse to
optimize performance.
 Example: Loading the cleansed and standardized customer transaction
data into a sales fact table in the data warehouse, where it can be analyzed
by business intelligence tools.
8)Central tendancy numerical

(Find mean, median, mode, variance and midrange value)
9) Hierarchical clustering
10)K-means clustering algorithm + numerical

DWM QB Soln

Uploaded by

Copyright:

Available Formats

DWM QB Soln

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

DWM QB Soln

Uploaded by

Copyright:

Available Formats

1)Explain features of Data Warehouse

 Focus on Subjects: Data warehouses are organized around major subjects

 Data Integration: Data from different sources is consolidated, cleaned, and

 Detailed and Summary Data: A data warehouse contains data at various

6. Optimized for Query Performance

 Efficient Query Processing: Data warehouses are optimized to handle

7. Support for Business Intelligence Tools

 Integration with BI Tools: Data warehouses are designed to work

 Handling Large Volumes of Data: Data warehouses are scalable, meaning

9. Data Mart Integration

 Subsets of Data Warehouses: Data marts are smaller, more focused

 Data about Data: Metadata in a data warehouse describes the data's

2)Top down vs bottom up approach for building data

 Initiated by: Enterprise-wide data warehouse design.

 Initiated by: Individual data marts development.

Breaks the vast problem into Solves the essential low-level

Inherently architected- not a union Inherently incremental; can

Single, central storage of Departmental information stored.

Centralized rules and control. Departmental rules and control.

It includes redundant information. Redundancy can be removed.

It may see quick results if Less risk of failure, favorable return

3)Define metadata and explain the types of metadata

Metadata is "data about data." It provides information that describes other

 Data Discovery and Search

 Definition: Describes the processes and operations involved in managing

2. Extractional and Transformational Metadata

 Definition: Describes the processes of extracting and transforming data

 Definition: Metadata that provides context and understanding for end

Criteria OLAP OLTP

Purpose OLAP helps you OLTP helps you

Data source OLAP uses historical OLTP uses real-time

Data structure OLAP uses OLTP uses relational

Data model OLAP uses star OLTP uses normalized

Response time OLAP has longer OLTP has shorter

Example applications OLAP is good for OLTP is good for

5) Data Warehouse vs Data Mart

Data Warehouse Data Mart

A Data Warehouse is a vast A data mart is an only

It may hold multiple subject It holds only one subject

It holds very detailed It may hold more

Works to integrate all data It concentrates on

In data warehousing, Fact In Data Mart, Star Schema

It is a Centralized System. It is a Decentralized System.

Data Warehousing is the Data Marts is a project-

OLAP stands for Online Analytical Processing Server. It is a software

There are five basic analytical operations that can be performed on

Drill down: In drill-down operation, the less detailed data is

converted into highly detailed data. It can be done by:

o Moving down in the concept hierarchy

o Adding a new dimension

performed by moving down in the concept hierarchy of Time

dimension (Quarter -> Month).

Roll up: It is just opposite of the drill-down operation. It performs

aggregation on the OLAP cube. It can be done by:

o Climbing up in the concept hierarchy

o Reducing the dimensions

performed by climbing up in the concept hierarchy of Location

dimension (City -> Country).

Dice: It selects a sub-cube from the OLAP cube by selecting two or

more dimensions. In the cube given in the overview section, a sub-