DWM QB Soln

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 18

1)Explain features of Data Warehouse

1. Subject-Oriented

 Focus on Subjects: Data warehouses are organized around major subjects


like sales, customers, or products. This helps in analyzing data based on
specific themes rather than just transactions.
 Subject-Specific Data: Data is categorized and stored by subjects, making
it easier to generate reports and conduct analysis.

2. Integrated

 Data Integration: Data from different sources is consolidated, cleaned, and


stored in a consistent format in the data warehouse. This ensures that data
from various operational systems is uniform and comparable.
 Consistency: Different data sources might use different conventions,
formats, and units. The data warehouse standardizes these, ensuring that
all data is consistent.

3. Time-Variant

 Historical Data: Data warehouses store historical data, which means they
can store snapshots of data over time. This is crucial for trend analysis,
forecasting, and reporting.
 Time Stamps: Data in a warehouse is associated with time periods,
enabling the analysis of trends over various periods.

4. Non-Volatile

 Stable Data: Once data is entered into the data warehouse, it is not
changed or deleted. This ensures data consistency over time.
 Read-Only Access: Data warehouses are primarily read-only, meaning data
is rarely altered or deleted after it is loaded.

5. Data Granularity

 Detailed and Summary Data: A data warehouse contains data at various


levels of granularity. It stores both highly detailed data for deep analysis
and summarized data for quick insights.
 Flexible Queries: Users can query data at different levels of detail,
depending on their needs.

6. Optimized for Query Performance

 Efficient Query Processing: Data warehouses are optimized to handle


complex queries quickly, enabling fast retrieval of large datasets.
 Indexing and Partitioning: Data is often indexed and partitioned to improve
query performance, allowing faster access to specific data segments.

7. Support for Business Intelligence Tools

 Integration with BI Tools: Data warehouses are designed to work


seamlessly with business intelligence (BI) tools like reporting software,
OLAP (Online Analytical Processing) tools, and data mining applications.
 Decision Support: They provide the data infrastructure required for
decision-making processes in an organization.

8. Scalability

 Handling Large Volumes of Data: Data warehouses are scalable, meaning


they can grow in storage capacity and processing power as the amount of
data increases.
 Distributed Architecture: Many data warehouses use a distributed
architecture, allowing them to handle large volumes of data efficiently.

9. Data Mart Integration

 Subsets of Data Warehouses: Data marts are smaller, more focused


versions of data warehouses that cater to specific business lines or
departments. They can be integrated into a data warehouse to provide
specialized data access.
 Departmental Focus: Data marts allow departments to access relevant
data quickly without querying the entire data warehouse.
10. Metadata Management

 Data about Data: Metadata in a data warehouse describes the data's


source, transformation processes, structure, and usage. It helps users
understand the data and how to use it effectively.
 User-Friendly Access: Metadata management tools in a data warehouse
make it easier for users to find and understand the data they need.

2)Top down vs bottom up approach for building data


warehouse

Top-Down Approach

 Initiated by: Enterprise-wide data warehouse design.


 Development Sequence: First, create a centralized data warehouse, then
develop data marts.
 Data Flow: Data is extracted from operational systems, transformed, and
loaded into the centralized data warehouse, then segmented into data
marts.
 Complexity: More complex and time-consuming due to the need for
comprehensive design upfront.
 Advantages: Ensures consistency and uniformity across the organization;
ideal for large enterprises.
 Disadvantages: High initial costs, longer implementation time.

Bottom-Up Approach

 Initiated by: Individual data marts development.


 Development Sequence: First, build data marts for specific departments,
then integrate them into a centralized data warehouse.
 Data Flow: Data is extracted and loaded into individual data marts, which
are then combined into a data warehouse.
 Complexity: Simpler and faster to implement, with a focus on specific
business needs first.
 Advantages: Quick implementation, lower initial costs; allows for gradual
expansion.
 Disadvantages: Risk of data inconsistency; potential challenges in
integrating data marts later into a cohesive data warehouse.
Top-Down Design Approach Bottom-Up Design Approach

Breaks the vast problem into Solves the essential low-level


smaller subproblems. problem and integrates them into a
higher one.

Inherently architected- not a union Inherently incremental; can


of several data marts. schedule essential data marts first.

Single, central storage of Departmental information stored.


information about the content.

Centralized rules and control. Departmental rules and control.

It includes redundant information. Redundancy can be removed.

It may see quick results if Less risk of failure, favorable return


implemented with repetitions. on investment, and proof of
techniques.

3)Define metadata and explain the types of metadata

Metadata is "data about data." It provides information that describes other


data, helping to understand, manage, and use the data effectively. Metadata
serves as a blueprint that helps in organizing, finding, and managing data
within a system.
Roles of Metadata

 Data Discovery and Search


o Helps users find relevant data quickly through descriptions, tags, and
classifications.
 Data Understanding
o Provides context and meaning to data, helping users interpret and
understand the information.
 Data Management
o Facilitates data organization, storage, and retrieval by describing data
structures, relationships, and formats.
 Data Quality and Consistency
o Ensures accuracy by documenting data sources, transformations, and
validation rules.
 Data Integration
o Enables the combination of data from different sources by providing a
common reference and mapping of data fields.
 Data Governance
o Supports compliance and security by detailing access rights, usage
restrictions, and data lineage.
 Data Analysis and Reporting
o Assists in generating reports and analyses by providing templates,
business rules, and examples.
 Data Operations Monitoring
o Helps track and optimize data processing workflows through
operational metadata on load times, errors, and performance.
 Data Preservation
o Aids in long-term data storage and retrieval by documenting version
history, formats, and preservation strategies.

Types of Metadata
1. Operational Metadata

 Definition: Describes the processes and operations involved in managing


data within a system.
 Details: Includes information on data load times, data source refresh rates,
error logs, and processing times.
 Usage: Helps in monitoring and optimizing data processing workflows,
ensuring smooth and efficient data operations.

2. Extractional and Transformational Metadata

 Definition: Describes the processes of extracting and transforming data


from various sources before loading it into a data warehouse.
 Extractional Metadata:
o Details: Information about the sources of data, extraction methods,
data formats, and timestamps of data extraction.
o Usage: Helps track where data comes from and how it's extracted,
aiding in troubleshooting and ensuring data accuracy.
 Transformational Metadata:
o Details: Describes the rules and algorithms used to cleanse, format,
and transform data into the desired structure.
o Usage: Critical for understanding how raw data is converted into the
standardized format used in the data warehouse, ensuring data
consistency and quality.

3. End-User Metadata

 Definition: Metadata that provides context and understanding for end


users to effectively interact with and analyze data.
 Details: Includes data definitions, business rules, report templates, and
query examples tailored for user interpretation.
 Usage: Facilitates easier access to data for non-technical users, helping
them understand what the data represents and how to use it effectively in
reports or analyses.
4) Difference between olap and oltp

Criteria OLAP OLTP

Purpose OLAP helps you OLTP helps you


analyze large volumes manage and process
of data to support real-time transactions.
decision-making.

Data source OLAP uses historical OLTP uses real-time


and aggregated data and transactional data
from multiple sources. from a single source.

Data structure OLAP uses OLTP uses relational


multidimensional databases.
(cubes) or relational
databases.

Data model OLAP uses star OLTP uses normalized


schema, snowflake or denormalized
schema, or other models.
analytical models.
Volume of data OLAP has large OLTP has
storage requirements. comparatively smaller
Think terabytes (TB) storage requirements.
and petabytes (PB). Think gigabytes (GB).

Response time OLAP has longer OLTP has shorter


response times, response times,
typically in seconds or typically in
minutes. milliseconds

Example applications OLAP is good for OLTP is good for


analyzing trends, processing payments,
predicting customer customer data
behavior, and management, and
identifying order processing.
profitability.

5) Data Warehouse vs Data Mart

Data Warehouse Data Mart

A Data Warehouse is a vast A data mart is an only


repository of information subtype of a Data
collected from various Warehouses. It is
organizations or architecture to meet the
departments within a requirement of a specific
corporation. user group.

It may hold multiple subject It holds only one subject


areas. area. For example, Finance
or Sales.

It holds very detailed It may hold more


information. summarized data.

Works to integrate all data It concentrates on


sources integrating data from a given
subject area or set of source
systems.

In data warehousing, Fact In Data Mart, Star Schema


constellation is used. and Snowflake Schema are
used.

It is a Centralized System. It is a Decentralized System.

Data Warehousing is the Data Marts is a project-


data-oriented. oriented.
6) Operations and examples of olap cube

OLAP stands for Online Analytical Processing Server. It is a software


technology that allows users to analyze information from multiple
database systems at the same time. It is based on multidimensional
data model and allows the user to query on multi-dimensional data
(eg. Delhi -> 2018 -> Sales data). OLAP databases are divided into one
or more cubes and these cubes are known as Hyper-cubes.

OLAP operations:

There are five basic analytical operations that can be performed on


an OLAP cube:

Drill down: In drill-down operation, the less detailed data is

converted into highly detailed data. It can be done by:

o Moving down in the concept hierarchy

o Adding a new dimension


In the cube given in overview section, the drill down operation is

performed by moving down in the concept hierarchy of Time

dimension (Quarter -> Month).

Roll up: It is just opposite of the drill-down operation. It performs

aggregation on the OLAP cube. It can be done by:

o Climbing up in the concept hierarchy

o Reducing the dimensions


In the cube given in the overview section, the roll-up operation is

performed by climbing up in the concept hierarchy of Location

dimension (City -> Country).

Dice: It selects a sub-cube from the OLAP cube by selecting two or

more dimensions. In the cube given in the overview section, a sub-

cube is selected by selecting following dimensions with criteria:

o Location = “Delhi” or “Kolkata”

o Time = “Q1” or “Q2”

o Item = “Car” or “Bus”


Slice: It selects a single dimension from the OLAP cube which results

in a new sub-cube creation. In the cube given in the overview section,

Slice is performed on the dimension Time = “Q1”.

Pivot: It is also known as rotation operation as it rotates the current

view to get a new view of the representation. In the sub-cube

obtained after the slice operation, performing pivot operation gives a

new view of it.

7) ETL process
The ETL (Extract, Transform, Load) process is a fundamental part of data

warehousing. It involves extracting data from various source systems,

transforming it into a format suitable for analysis, and loading it into a data

warehouse. Here’s a breakdown of each stage:

1. Extract

 Purpose: The extraction phase involves pulling data from various source

systems, which can include databases, ERP systems, CRM systems, flat

files, cloud services, or other data repositories.

 Process:

o Identify Data Sources: Determine the source systems from which data

will be extracted.

o Data Collection: Use connectors or APIs to collect data from these

sources. This might involve querying databases, extracting files, or

streaming data from real-time sources.

o Data Handling: During extraction, it's essential to handle large volumes

of data efficiently and manage different data formats.

 Example: Extracting customer transaction data from an e-commerce

platform’s database and sales data from a CRM system.


2. Transform

 Purpose: The transformation phase cleanses, standardizes, and organizes

the extracted data to make it suitable for analysis in the data warehouse.

 Process:

o Data Cleansing: Remove duplicates, handle missing data, correct

errors, and filter out irrelevant information.

o Data Standardization: Convert data into a consistent format (e.g., date

formats, units of measure) to ensure uniformity across the dataset.

o Data Integration: Combine data from multiple sources, resolve data

conflicts, and consolidate related data into a single unified view.

o Data Enrichment: Add derived or calculated data, such as aggregating

sales data to calculate monthly totals or adding geographic data based

on customer addresses.

o Business Rules Application: Apply business logic to transform data

into a form that aligns with the organization's requirements.

 Example: Converting all date fields into a standard format (e.g., YYYY-MM-

DD), integrating customer data from different regions, and calculating

total sales per region.

3. Load

 Purpose: The final stage involves loading the transformed data into the

data warehouse, where it can be stored, indexed, and made available for

querying and analysis.

 Process:

o Data Loading: Insert the transformed data into the data warehouse,

which might involve bulk loading large datasets or incremental loading

(e.g., updating only changed records).


o Data Indexing: Index the data for faster querying and retrieval.

o Data Validation: Perform checks to ensure that the data has been

loaded correctly and that it meets the necessary quality standards.

o Data Partitioning: Optionally, partition the data in the warehouse to

optimize performance.

 Example: Loading the cleansed and standardized customer transaction

data into a sales fact table in the data warehouse, where it can be analyzed

by business intelligence tools.

8)Central tendancy numerical


(Find mean, median, mode, variance and midrange value)
9) Hierarchical clustering

10)K-means clustering algorithm + numerical

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy