unit-1 data warehousing

The document provides a comprehensive overview of data warehousing, including its definition, architecture, components, and differences from traditional databases. It highlights the need for data warehouses in handling large volumes of historical data for analytical purposes, and outlines the benefits, features, and challenges associated with data warehousing. Additionally, it discusses the roles of operational databases versus data warehouses in supporting business operations and decision-making.

Uploaded by

JANANI B 22IT



CCS341- DATA WAREHOUSING

UNIT I INTRODUCTION TO DATA WAREHOUSE 5

Data warehouse Introduction - Data warehouse components - Operational database Vs data warehouse - Data warehouse Architecture - Three-tier Data Warehouse Architecture - Autonomous Data Warehouse - Autonomous Data Warehouse Vs Snowflake - Modern Data Warehouse.

1. Data warehouse
A Database Management System (DBMS) stores data in the form of tables, uses an ER model, and aims to guarantee the ACID properties. For example, the DBMS of a college has tables for students, faculty, etc.

A Data Warehouse is separate from the DBMS. It stores a huge amount of data, typically collected from multiple heterogeneous sources such as files and DBMSs. The goal is to produce statistical results that may help in decision-making. For example, a college might want to quickly see how the placement of CS students has improved over the last 10 years in terms of salaries, counts, etc.
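The college example can be made concrete with a small sketch. The records, field names (`year`, `dept`, `salary_lpa`), and the `placement_trend` helper below are all hypothetical and only illustrate the kind of statistical summary a warehouse is meant to produce.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placement records, imagined as already collected from
# several source systems into one place.
placements = [
    {"year": 2021, "dept": "CS", "salary_lpa": 6.0},
    {"year": 2021, "dept": "CS", "salary_lpa": 8.0},
    {"year": 2022, "dept": "CS", "salary_lpa": 9.0},
    {"year": 2022, "dept": "IT", "salary_lpa": 7.0},
    {"year": 2023, "dept": "CS", "salary_lpa": 10.0},
    {"year": 2023, "dept": "CS", "salary_lpa": 12.0},
]


def placement_trend(records, dept):
    """Return {year: (placement count, average salary)} for one department."""
    by_year = defaultdict(list)
    for record in records:
        if record["dept"] == dept:
            by_year[record["year"]].append(record["salary_lpa"])
    return {year: (len(s), mean(s)) for year, s in sorted(by_year.items())}


cs_trend = placement_trend(placements, "CS")
for year, (count, avg_salary) in cs_trend.items():
    print(year, count, avg_salary)
```

The point of the sketch is the shape of the question: it reads many historical records and returns a few summary rows, rather than looking up or changing a single student.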

Need for Data Warehouse

An ordinary operational database typically holds megabytes to gigabytes of data, and only for one specific purpose. When the data grows to terabytes, storage usually shifts to a Data Warehouse. Besides this, a transactional database does not lend itself to analytics. To perform analytics effectively, an organization keeps a central Data Warehouse to closely study its business by organizing, understanding, and using its historical data for making strategic decisions and analyzing trends.

Benefits of Data Warehouse

Better business analytics: A data warehouse plays an important role in every business by storing all the past data and records of the company and making them available for analysis, which improves the company's understanding of its data.
Faster Queries: The data warehouse is designed to handle large queries, which is why it runs analytical queries faster than an operational database.
Improved data Quality: The data gathered from different sources is stored and analyzed in the data warehouse without being altered or added to by the warehouse itself, so data quality is maintained. If any data quality issue does arise, the data warehouse team resolves it.
Historical Insight: The warehouse stores all the historical data, which contains details about the business, so that one can analyze it at any time and extract insights from it.

Data Warehouse vs DBMS

| Database | Data Warehouse |
| --- | --- |
| A common Database is based on operational or transactional processing. Each operation is an indivisible transaction. | A Data Warehouse is based on analytical processing. |
| Generally, a Database stores current and up-to-date data which is used for daily operations. | A Data Warehouse maintains historical data over time. Historical data is the data kept over years and can be used for trend analysis, future predictions, and decision support. |
| A database is generally application specific. Example: a database stores related data, such as the student details in a school. | A Data Warehouse is generally integrated at the organization level, by combining data from different databases. Example: a data warehouse integrates the data from one or more databases, so that analysis can be done to get results such as the best performing school in a city. |
| Constructing a Database is not so expensive. | Constructing a Data Warehouse can be expensive. |

Example Applications of Data Warehousing

Data Warehousing can be applied anywhere we have a huge amount of data and want to see statistical results that help in decision making.
Social Media Websites: Social networking websites like Facebook, Twitter, LinkedIn, etc. are based on analyzing large data sets. These sites gather data related to members, groups, locations, etc., and store it in a single central repository. Because the amount of data is so large, a Data Warehouse is needed to implement this.
Banking: Most banks these days use warehouses to see the spending patterns of account holders and cardholders. They use this to provide them with special offers, deals, etc.
Government: Governments use a data warehouse to store and analyze tax payments, which helps to detect tax evasion.

Features of Data Warehousing

Data warehousing is essential for modern data management, providing a strong foundation for
organizations to consolidate and analyze data strategically. Its distinguishing features empower
businesses with the tools to make informed decisions and extract valuable insights from their data.
Centralized Data Repository: Data warehousing provides a centralized repository for all
enterprise data from various sources, such as transactional databases, operational systems,
and external sources. This enables organizations to have a comprehensive view of their data,
which can help in making informed business decisions.
Data Integration: Data warehousing integrates data from different sources into a single,
unified view, which can help in eliminating data silos and reducing data inconsistencies.
Historical Data Storage: Data warehousing stores historical data, which enables
organizations to analyze data trends over time. This can help in identifying patterns and
anomalies in the data, which can be used to improve business performance.
Query and Analysis: Data warehousing provides powerful query and analysis capabilities
that enable users to explore and analyze data in different ways. This can help in identifying
patterns and trends, and can also help in making informed business decisions.
Data Transformation: Data warehousing includes a process of data transformation, which
involves cleaning, filtering, and formatting data from various sources to make it consistent
and usable. This can help in improving data quality and reducing data inconsistencies.
Data Mining: Data warehousing provides data mining capabilities, which enable
organizations to discover hidden patterns and relationships in their data. This can help in
identifying new opportunities, predicting future trends, and mitigating risks.
Data Security: Data warehousing provides robust data security features, such as access
controls, data encryption, and data backups, which ensure that the data is secure and
protected from unauthorized access.
Advantages of Data Warehousing
Intelligent Decision-Making: With centralized data in warehouses, decisions may be made
more quickly and intelligently.
Business Intelligence: Provides strong operational insights through business intelligence.
Historical Analysis: Predictions and trend analysis are made easier by storing past data.
Data Quality: Guarantees data quality and consistency for trustworthy reporting.
Scalability: Capable of managing massive data volumes and expanding to meet changing
requirements.
Effective Queries: Fast and effective data retrieval is made possible by an optimized
structure.

Cost reductions: Data warehousing can result in cost savings over time by reducing data
management procedures and increasing overall efficiency, even when there are setup costs
initially.
Data security: Data warehouses employ security protocols to safeguard confidential
information, guaranteeing that only authorized personnel are granted access to certain data.
Disadvantages of Data Warehousing
Cost: Building a data warehouse can be expensive, requiring significant investments in
hardware, software, and personnel.
Complexity: Data warehousing can be complex, and businesses may need to hire
specialized personnel to manage the system.
Time-consuming: Building a data warehouse can take a significant amount of time,
requiring businesses to be patient and committed to the process.
Data integration challenges: Data from different sources can be challenging to integrate,
requiring significant effort to ensure consistency and accuracy.
Data security: Data warehousing can pose data security risks, and businesses must take
measures to protect sensitive data from unauthorized access or breaches.
1.2. Data warehouse components
Architecture is the proper arrangement of the elements. We build a data warehouse with software and hardware components. To suit the requirements of our organization, we arrange these building blocks, and we may want to boost up a particular part with extra tools and services. All of this depends on our circumstances.

Figure: Components or Building Blocks of Data Warehouse (source data: production, internal, archived, external; data staging; data storage: data warehouse DBMS, multidimensional databases, data marts; metadata; management and control; information delivery: report/query, OLAP, data mining)

The figure shows the essential elements of a typical warehouse. The Source Data component is shown on the left. The Data Staging element serves as the next building block. In the middle is the Data Storage component, which handles the data warehouse's data. This element not only stores and manages the data; it also keeps track of the data using the metadata repository. The Information Delivery component, shown on the right, consists of all the different ways of making the information from the data warehouse available to the users.

1.2.1. Source Data Component

Source data coming into the data warehouse may be grouped into four broad categories:
Production Data: This type of data comes from the different operational systems of the enterprise. Based on the data requirements in the data warehouse, we choose segments of the data from the various operational systems.
Internal Data: In each organization, users keep their "private" spreadsheets, reports, customer profiles, and sometimes even departmental databases. This is the internal data, part of which could be useful in a data warehouse.
Archived Data: Operational systems are mainly intended to run the current business. In every operational system, we periodically take the old data and store it in archived files.
External Data: Most executives depend on information from external sources for a large percentage of the information they use. They use statistics relating to their industry produced by external agencies.

1.2.2. Data Staging Component
After we have extracted data from various operational systems and external sources, we have to prepare the data for storing in the data warehouse. The extracted data coming from several different sources needs to be changed, converted, and made ready in a format that is suitable for querying and analysis.
1) Data Extraction: This function has to deal with numerous data sources. We have to employ the appropriate technique for each data source.
2) Data Transformation: As we know, data for a data warehouse comes from many different sources. If data extraction for a data warehouse poses big challenges, data transformation presents even greater challenges. We perform several individual tasks as part of data transformation.
First, we clean the data extracted from each source. Cleaning may be the correction of misspellings, the provision of default values for missing data elements, or the elimination of duplicates when we bring in the same data from various source systems.
Standardization of data components forms a large part of data transformation. Data transformation also involves many forms of combining pieces of data from different sources. We may combine data from a single source record or related data elements from many source records.
On the other hand, data transformation also involves purging source data that is not useful and separating out source records into new combinations. Sorting and merging of data take place on a large scale in the data staging area. When the data transformation function ends, we have a collection of integrated data that is cleaned, standardized, and summarized.
3) Data Loading: Two distinct categories of tasks form the data loading function. When we complete the structure and construction of the data warehouse and go live for the first time, we do the initial loading of the information into the data warehouse storage. The initial load moves high volumes of data and uses up a substantial amount of time. The second category covers the incremental loads that apply ongoing changes after go-live.
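The three staging steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not a production ETL design: the two source lists, the `student_dim` table, and the `transform` and `load` helpers are hypothetical, and an in-memory SQLite database stands in for the warehouse storage.

```python
import sqlite3

# Extraction: hypothetical rows already pulled from two source systems.
# The same student (id 101) appears in both, with inconsistent formatting.
source_a = [("101", " alice ", "chennai"), ("102", "Bob", None)]
source_b = [("101", "Alice", "Chennai"), ("103", "carol", "MADURAI")]


def transform(rows, default_city="UNKNOWN"):
    """Clean, standardize, and de-duplicate extracted rows."""
    seen, cleaned = set(), []
    for student_id, name, city in rows:
        key = int(student_id)
        if key in seen:  # eliminate duplicates brought in from several sources
            continue
        seen.add(key)
        name = name.strip().title()                     # standardize names
        city = (city or default_city).strip().upper()   # default for missing values
        cleaned.append((key, name, city))
    return sorted(cleaned)


def load(conn, rows):
    """Initial load of the staged rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS student_dim "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO student_dim VALUES (?, ?, ?)", rows)
    conn.commit()


staged = transform(source_a + source_b)
conn = sqlite3.connect(":memory:")
load(conn, staged)
loaded = conn.execute("SELECT id, name, city FROM student_dim ORDER BY id").fetchall()
print(loaded)
```

Keeping the first occurrence of a duplicate is an arbitrary choice made for the sketch; a real staging area would apply an explicit survivorship rule.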

1.2.3. Data Storage Components

Data storage for the data warehouse is a separate repository. The data repositories for the operational systems generally include only the current data. Also, these data repositories hold the data in highly normalized structures for fast and efficient processing.

Information Delivery Component
The information delivery element is used to enable the process of subscribing to data warehouse files and having them transferred to one or more destinations according to some customer-specified scheduling algorithm.
Figure: Information delivery component (delivery from the data warehouse and data marts over online, intranet, internet, and e-mail channels to ad hoc reports, complex queries, multidimensional (MD) analysis, statistical analysis, EIS feed, and data mining)

Metadata Component

Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database management system. In the data dictionary, we keep the data about the logical data structures, the data about the records and addresses, the information about the indexes, and so on.
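To see what "data about data" looks like in practice, the sketch below reads SQLite's built-in catalog as a stand-in for a data dictionary. The `student` table, the index name, and the `describe` helper are hypothetical; `PRAGMA table_info` and the `sqlite_master` table are SQLite's own catalog interfaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)")
conn.execute("CREATE INDEX idx_student_dept ON student (dept)")


def describe(conn, table):
    """Collect data-dictionary style facts (columns, indexes) about one table."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so pass only trusted names here.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    indexes = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    ).fetchall()
    return {
        "table": table,
        "columns": [(name, col_type) for _cid, name, col_type, *_rest in columns],
        "indexes": [name for (name,) in indexes],
    }


student_metadata = describe(conn, "student")
print(student_metadata)
```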

Data Marts

A data mart includes a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to particular selected subjects. Data in a data warehouse should be fairly current, but not necessarily up to the minute, although developments in the data warehouse industry have made regular and incremental data loads more achievable. Data marts are smaller than data warehouses and usually serve one part of the organization. The current trend in data warehousing is to develop a data warehouse with several smaller related data marts for particular kinds of queries and reports.

Management and Control Component

The management and control elements coordinate the services and functions within the data warehouse. These components control the data transformation and the data transfer into the data warehouse storage. They also moderate the data delivery to the clients. They work with the database management systems and ensure that data is correctly saved in the repositories. They monitor the movement of information into the staging area and from there into the data warehouse storage itself.

Why do we need a separate Data Warehouse?

Data Warehouse queries are complex because they involve the computation of large groups of data at summarized levels.
It may require the use of distinctive data organization, access, and implementation methods based on multidimensional views.
Performing OLAP queries in an operational database degrades the performance of operational tasks.
A Data Warehouse is used for analysis and decision making, which requires an extensive database, including historical data, which an operational database does not typically maintain.
The separation of an operational database from data warehouses is based on the different structures and uses of data in these systems.
Because the two systems provide different functionalities and require different kinds of data, it is necessary to maintain separate databases.

| Database | Data Warehouse |
| --- | --- |
| 1. It is used for Online Transactional Processing (OLTP) but can be used for other objectives such as Data Warehousing. It records the data from the clients for history. | 1. It is used for Online Analytical Processing (OLAP). It reads the historical information about the customers for business decisions. |
| 2. The tables and joins are complicated since they are normalized for the RDBMS. This is done to reduce redundant data and to save storage space. | 2. The tables and joins are simple since they are de-normalized. This is done to minimize the response time for analytical queries. |
| 3. Data is dynamic. | 3. Data is largely static. |
| 4. Entity-relationship modeling techniques are used for RDBMS database design. | 4. Data modeling techniques are used for the Data Warehouse design. |
| 5. Optimized for write operations. | 5. Optimized for read operations. |
| 6. Performance is low for analysis queries. | 6. High performance for analytical queries. |
| 7. The database is the place where the data is taken as a base and managed to provide fast and efficient access. | 7. The Data Warehouse is the place where the application data is handled for analysis and reporting objectives. |

1.3. Operational database Vs data warehouse

Difference between Operational Database and Data Warehouse

The Operational Database is the source of information for the data warehouse. It includes detailed information used to run the day-to-day operations of the business. The data changes frequently as updates are made and reflects the current value of the last transactions.

Operational Database Management Systems, also called OLTP (Online Transaction Processing) databases, are used to manage dynamic data in real time.

Data Warehouse Systems serve users or knowledge workers for the purpose of data analysis and decision-making. Such systems can organize and present information in specific formats to accommodate the diverse needs of various users. These systems are called Online Analytical Processing (OLAP) systems.
Data Warehouse and OLTP databases are both relational databases. However, the goals of the two are different.
| Operational Database | Data Warehouse |
| --- | --- |
| Operational systems are designed to support high-volume transaction processing. | Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP). |
| Operational systems are usually concerned with current data. | Data warehousing systems are usually concerned with historical data. |
| Data within operational systems are updated regularly according to need. | Non-volatile; new data may be added regularly. Once added, data is rarely changed. |
| It is designed for real-time business dealings and processes. | It is designed for analysis of business measures by subject area, categories, and attributes. |
| It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | It is optimized for extensive loads and large, complex, unpredictable queries that access many rows per table. |
| It is optimized for validation of incoming information during transactions and uses validation data tables. | It is loaded with consistent, valid information and requires no real-time validation. |
| It supports thousands of concurrent clients. | It supports a few concurrent clients relative to OLTP. |
| Operational systems are largely process-oriented. | Data warehousing systems are largely subject-oriented. |
| Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are usually optimized to perform fast retrievals of relatively high volumes of data. |
| Data In | Data Out |
| A small number of records is accessed. | A large number of records is accessed. |
| Relational databases are created for Online Transactional Processing (OLTP). | Data Warehouses are designed for Online Analytical Processing (OLAP). |

Difference between OLTP and OLAP

OLTP System

An OLTP system deals with operational data. Operational data are the data involved in the operation of a particular system, for example ATM transactions and bank transactions.

OLAP System

An OLAP system deals with historical or archival data. Historical data are data that are archived over a long period. For example, if we collect the last 10 years of information about flight reservations, the data can reveal meaningful patterns such as trends in reservations. This may provide useful information like the peak time of travel and what kind of people are traveling in the various classes (Economy/Business), etc.

The major difference between an OLTP and an OLAP system is the amount of data analyzed in a single transaction. An OLTP system manages many concurrent customers and queries, each touching only an individual record or a limited group of records at a time, whereas an OLAP system must have the capability to operate on millions of records to answer a single query.
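The contrast can be illustrated with the flight reservation example. The sketch below is only an illustration: the `reservations` table, its columns, and the sample fares are hypothetical, and a single in-memory SQLite database plays both roles so that the two access patterns can be compared side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations "
    "(id INTEGER PRIMARY KEY, year INTEGER, travel_class TEXT, fare REAL)"
)
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?, ?)",
    [
        (1, 2022, "Economy", 100.0),
        (2, 2022, "Business", 300.0),
        (3, 2023, "Economy", 120.0),
        (4, 2023, "Economy", 110.0),
        (5, 2023, "Business", 320.0),
    ],
)

# OLTP-style access: a short transaction that touches one record by its key.
with conn:
    conn.execute("UPDATE reservations SET fare = ? WHERE id = ?", (125.0, 3))

# OLAP-style access: one read-only query that scans and summarizes many records.
trend = conn.execute(
    "SELECT year, travel_class, COUNT(*), SUM(fare) "
    "FROM reservations GROUP BY year, travel_class "
    "ORDER BY year, travel_class"
).fetchall()
for row in trend:
    print(row)
```

With five rows the difference is invisible, but the update reads one row regardless of table size, while the summary query must read every row.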

| Feature | OLTP | OLAP |
| --- | --- | --- |
| Characteristic | It is a system which is used to manage operational data. | It is a system which is used to manage informational data. |
| Users | Clerks, clients, and information technology professionals. | Knowledge workers, including managers, executives, and analysts. |
| System orientation | An OLTP system is customer-oriented; transaction and query processing are done by clerks, clients, and information technology professionals. | An OLAP system is market-oriented; data analysis is done by knowledge workers, including managers, executives, and analysts. |
| Data contents | An OLTP system manages current data that are typically too detailed to be easily used for decision making. | An OLAP system manages a large amount of historical data, provides facilities for summarization and aggregation, and stores and manages data at different levels of granularity. This makes the data easier to use in informed decision making. |
| Database size | 100 MB to GB | 100 GB to TB |
| Database design | An OLTP system usually uses an entity-relationship (ER) data model and an application-oriented database design. | An OLAP system typically uses either a star or snowflake model and a subject-oriented database design. |
| View | An OLTP system focuses primarily on the current data within an enterprise or department, without referring to historical information or data in different organizations. | An OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with data that originates from various organizations, integrating information from many data stores. |
| Volume of data | Not very large. | Because of their large volume, OLAP data are stored on multiple storage media. |
| Access patterns | The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery techniques. | Accesses to OLAP systems are mostly read-only operations because these data warehouses store historical data. |
| Access mode | Read/write | Mostly read |
| Inserts and updates | Short and fast inserts and updates initiated by end-users. | Periodic long-running batch jobs refresh the data. |
| Number of records accessed | Tens | Millions |
| Normalization | Fully normalized | Partially normalized |
| Processing speed | Very fast | It depends on the amount of data involved; batch data refreshes and complex queries may take many hours, and query speed can be improved by creating indexes. |
Data Warehouse Architecture

Why do Business Analysts need Data Warehouse?

A data warehouse is a repository of an organization's electronically stored data. Data warehouses
are designed to facilitate reporting and analysis. It provides many advantages to business analysts
as follows:

1. A data warehouse may provide a competitive advantage by presenting relevant information from which to measure performance and make critical adjustments in order to help win over competitors.
2. A data warehouse can enhance business productivity since it is able to quickly and efficiently gather information which accurately describes the organization.
3. A data warehouse facilitates customer relationship marketing since it provides a consistent view of customers and items across all lines of business, all departments, and all markets.
4. A data warehouse may bring about cost reduction by tracking trends, patterns, and exceptions over long periods of time in a consistent and reliable manner.
5. A data warehouse provides a common data model for all data of interest, regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models from disparate sources were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
6. Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
Process of Data Warehouse Design
A data warehouse can be built using three approaches:
1. A top-down approach
The top-down approach starts with the overall design and planning. It is useful in cases where the technology is mature and well-known, and where the business problems that must be solved are clear and well-understood.
2. A bottom-up approach
The bottom-up approach starts with experiments and prototypes. This is useful in the early stage of business modeling and technology development. It allows an organisation to move forward at considerably less expense and to evaluate the benefits of the technology before making significant commitments.
3. A combination of both approaches
In the combined approach, an organisation can exploit the planned and strategic nature of the top-down approach while retaining the rapid implementation and opportunistic application of the bottom-up approach.
In general, the warehouse design process consists of the following steps:
1. Choose a business process to model, e.g., orders, invoices, shipments, inventory, account administration, sales, and the general ledger. If the business process is organisational and involves multiple, complex object collections, a data warehouse model should be followed. However, if the process is departmental and focuses on the analysis of one kind of business process, a data mart model should be chosen.
2. Choose the grain of the business process. The grain is the fundamental, atomic level of data to be represented in the fact table for this process, e.g., individual transactions, individual daily snapshots, etc.
3. Choose the dimensions that will apply to each fact table record. Typical dimensions are time, item, customer, supplier, warehouse, transaction type, and status.
4. Choose the measures that will populate each fact table record. Typical measures are numeric additive quantities like dollars-sold and units-sold.
Once a data warehouse is designed and constructed, the initial deployment of the warehouse includes initial installation, rollout planning, training and orientation. Platform upgrades and maintenance must also be considered. Data warehouse administration will include data refreshment, data source synchronisation, planning for disaster recovery, managing access control and security, managing data growth, managing database performance, and data warehouse enhancement and extension.
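The four design steps can be sketched as a tiny star schema. Everything here is an assumption made for illustration: the business process is sales, the grain is one row per item per quarter, the dimensions are time and item, and the measures are `dollars_sold` and `units_sold`. The table names (`dim_time`, `dim_item`, `fact_sales`) are hypothetical, and SQLite stands in for the warehouse server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Step 3: dimensions that describe each fact record.
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
    -- Steps 2 and 4: the grain is one row per item per quarter;
    -- the measures are the additive columns dollars_sold and units_sold.
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time(time_id),
        item_id INTEGER REFERENCES dim_item(item_id),
        dollars_sold REAL,
        units_sold INTEGER
    );
    """
)
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [(1, 2023, "Q1"), (2, 2023, "Q2")])
conn.executemany(
    "INSERT INTO dim_item VALUES (?, ?, ?)",
    [(10, "Laptop", "Computers"), (11, "Mouse", "Accessories")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 10, 2000.0, 2), (1, 11, 40.0, 4), (2, 10, 3000.0, 3)],
)

# Because the measures are additive, they can be summed along any dimension.
sales_by_category = conn.execute(
    """
    SELECT i.category, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM fact_sales AS f JOIN dim_item AS i ON f.item_id = i.item_id
    GROUP BY i.category ORDER BY i.category
    """
).fetchall()
print(sales_by_category)
```

Choosing a finer grain (for example, one row per individual transaction) would keep more detail at the cost of a larger fact table; the query shape stays the same.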

A Three-tier Data Warehouse Architecture

Data Warehouses generally have a three-level (tier) architecture that includes:
1. A bottom tier that consists of the Data Warehouse server, which is almost always an RDBMS. It may include several specialised data marts and a metadata repository.
2. A middle tier that consists of an OLAP server for fast querying of the data warehouse. The OLAP server is typically implemented using either (1) a Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that maps operations on multidimensional data to standard relational operations; or (2) a Multidimensional OLAP (MOLAP) model, i.e., a special purpose server that directly implements multidimensional data and operations.
3. A top tier that includes front-end tools for displaying results provided by OLAP, as well as additional tools for data mining of the OLAP-generated data.
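The ROLAP idea in the middle tier, mapping a multidimensional operation to standard relational operations, can be sketched with a roll-up. This is a simplified illustration rather than how any particular OLAP server works: the `sales` table, the dimension names, and the `rollup_sql` helper are hypothetical.

```python
import sqlite3

ALLOWED_DIMENSIONS = {"year", "region", "product"}  # hypothetical cube dimensions


def rollup_sql(dimensions):
    """Translate a roll-up over the given dimensions into a relational GROUP BY."""
    unknown = set(dimensions) - ALLOWED_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if not dimensions:
        return "SELECT SUM(amount) FROM sales"  # fully rolled up: the grand total
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2022, "North", "A", 10.0),
        (2022, "South", "A", 20.0),
        (2023, "North", "B", 30.0),
        (2023, "North", "A", 5.0),
    ],
)

by_year_region = conn.execute(rollup_sql(["year", "region"])).fetchall()
by_year = conn.execute(rollup_sql(["year"])).fetchall()  # rolled up over region
grand_total = conn.execute(rollup_sql([])).fetchone()[0]
print(by_year_region, by_year, grand_total)
```

Dropping a dimension from the `GROUP BY` list is the relational counterpart of rolling up along that dimension; a MOLAP server would answer the same question from a precomputed multidimensional array instead.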

The overall DW architecture is shown in Figure 1.3.

Figure 1.3 A three-tier Data Warehousing Architecture

Data Warehouse Models

From the architecture point of view, there are three data warehouse models: the virtual warehouse, the data mart, and the enterprise warehouse.
Virtual Warehouse: A virtual warehouse is created based on a set of views defined for an operational RDBMS. This warehouse type is relatively easy to build but requires excess computational capacity of the underlying operational database system. The users directly access operational data via middleware tools. This architecture is feasible only if queries are posed infrequently, and it is usually used as a temporary solution until a permanent data warehouse is developed.
Data Mart: The data mart contains a subset of the organisation-wide data that is of value to a small group of users, e.g., marketing or customer service. This is usually a precursor (and/or a successor) of the actual data warehouse, from which it differs in that its scope is confined to a specific group of users.
Depending on the source of data, data marts can be categorized into the following two classes:
1. Independent data marts are sourced from data captured from one or more operational systems or external information providers, or from data generated locally within a particular department or geographic area.
2. Dependent data marts are sourced directly from enterprise data warehouses.
Enterprise warehouse: This warehouse type holds all information about subjects spanning the entire organisation. For a medium- to large-size company, several years are usually needed to design and build the enterprise warehouse.
The differences between the virtual and the enterprise DWs are shown in Figure 1.4. Data marts can also be created as successors of an enterprise data warehouse. In this case, the DW consists of an enterprise warehouse and (several) data marts.

Figure 1.4: A Virtual Data Warehouse and an Enterprise Data Warehouse
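A virtual warehouse, described above as a set of views over an operational RDBMS, can be sketched as follows. The `orders` table, the `monthly_sales` view, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the operational system.

```python
import sqlite3

# Stand-in for the operational RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "2023-01-15", 250.0),
        (2, "Ravi", "2023-01-20", 100.0),
        (3, "Asha", "2023-02-03", 50.0),
    ],
)

# The "virtual warehouse" is only a view definition: no data is copied.
conn.execute(
    """
    CREATE VIEW monthly_sales AS
    SELECT substr(order_date, 1, 7) AS month,
           COUNT(*) AS order_count,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY substr(order_date, 1, 7)
    """
)
monthly = conn.execute("SELECT * FROM monthly_sales ORDER BY month").fetchall()

# A new operational row is visible immediately, because every query against
# the view is executed on the operational tables themselves.
conn.execute("INSERT INTO orders VALUES (4, 'Ravi', '2023-02-10', 75.0)")
monthly_after = conn.execute("SELECT * FROM monthly_sales ORDER BY month").fetchall()
print(monthly, monthly_after)
```

This also shows why the approach needs spare capacity on the operational system: each analytical query re-aggregates the live tables instead of reading a separately stored copy.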

Autonomous Data Warehouse

Autonomous Data Warehouse (ADW) is a cloud-based database service provided by Oracle. It is
part of Oracle's Autonomous Database offerings, which also include Autonomous Transaction
Processing (ATP). ADW is designed to simplify database management, reduce operational costs,
and improve performance by leveraging automation and cloud technologies.

Key features of Oracle Autonomous Data Warehouse include:

1. Automation: ADW automates various database management tasks, such as provisioning, patching, tuning, and backups. This helps to reduce manual effort, minimize errors, and enhance overall system performance.
2. Self-Driving Capability: The "self-driving" aspect of ADW means that the database can
automatically adapt to changing workloads, optimize itself for performance, and apply
security updates without human intervention.
3. Scalability: ADW provides the ability to easily scale computing resources up or down
based on demand. This ensures that the database can handle varying workloads efficiently.
4. Performance: With features like automatic indexing, performance tuning, and in-memory
processing, ADW aims to deliver high-performance analytics and reporting capabilities.
5. Security: ADW incorporates security measures such as encryption, access controls, and
auditing to protect sensitive data. Oracle also manages and applies security patches
automatically.
6. Compatibility: ADW is compatible with various data integration and analytics tools,
making it easier to integrate into existing workflows and environments.
7. Cloud-Native: Being a cloud-based service, ADW is hosted on Oracle Cloud Infrastructure.
This allows users to take advantage of the scalability, flexibility, and pay-as-you-go pricing
model associated with cloud computing.
8. Support for Multi-Model Data: ADW supports both relational and non-relational data
(for example, JSON documents), making it suitable for a variety of data processing needs.
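The idea of scaling compute up or down with demand can be illustrated with a toy threshold rule. This is not Oracle's actual scaling algorithm; the function name next_cpu_count, the thresholds, and the CPU limits are all assumptions chosen only to show the general idea.

```python
def next_cpu_count(current, utilization, minimum=1, maximum=16,
                   high=0.80, low=0.30):
    """Return the CPU count for the next interval (toy threshold rule)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization > high:
        target = current * 2      # double capacity under heavy load
    elif utilization < low:
        target = current // 2     # halve capacity when mostly idle
    else:
        target = current          # load is in the comfortable band
    # Never go below the minimum or above the maximum allocation.
    return max(minimum, min(maximum, target))


cpus = 2
history = []
for load in [0.9, 0.95, 0.5, 0.1, 0.1, 0.1]:   # hypothetical utilization samples
    cpus = next_cpu_count(cpus, load)
    history.append(cpus)
print(history)
```

A real service would also consider factors such as how long the load has been sustained, but the sketch shows why elastic scaling avoids paying for peak capacity during idle periods.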

Autonomous Data Warehouse Vs Snowflake

Oracle Autonomous Data Warehouse (ADW) and Snowflake are both cloud-based data
warehousing solutions, but they have some differences in terms of architecture, features, and
approach to data management. Here's a comparison between Oracle Autonomous Data Warehouse
and Snowflake:

1. Vendor:
Oracle ADW is a product of Oracle Corporation, a well-established database vendor
with a long history in the industry.
Snowflake is a cloud-native data warehousing platform developed by Snowflake Inc.
(formerly Snowflake Computing), a newer entrant to the market.
2. Architecture:
Oracle ADW is built on Oracle Database technology and is part of the Oracle Cloud
Infrastructure. It utilizes Oracle's Autonomous Database technology, which includes
self-driving, self-securing, and self-repairing capabilities.
Snowflake is built as a multi-cloud, multi-cluster, and multi-region data warehouse
service. It has a unique architecture that separates storage and compute resources,
providing elasticity and scalability.
3. Automation:
Both ADW and Snowflake emphasize automation. ADW, as part of the Oracle
Autonomous Database family, is designed to automate various database management
tasks, including provisioning, patching, and tuning.
Snowflake also offers automation features, such as automatic scaling of compute
resources based on demand and automatic performance optimization.
4. Scalability:
ADW provides the ability to scale computing resources up or down based on
workload demands, allowing for flexibility in resource allocation.
Snowflake's architecture separates compute from storage, so each can be scaled
independently of the other, and it automatically handles the distribution of data
across clusters.
5. Performance:
Both ADW and Snowflake aim to provide high-performance data warehousing.
ADW includes features like automatic indexing and in-memory processing.
Snowflake is known for its ease of scaling, enabling users to achieve high
performance by adding or removing compute resources as needed.
6. Multi-Cloud Support:
Snowflake is designed to work seamlessly across multiple cloud providers, such as
AWS, Azure, and Google Cloud Platform, providing customers with flexibility in
choosing their preferred cloud infrastructure.
Oracle ADW is part of the Oracle Cloud Infrastructure and is primarily hosted on
Oracle's cloud.
7. Pricing Model:
Both ADW and Snowflake offer consumption-based pricing models. Snowflake's
pricing is based on the amount of storage used and the amount of compute resources
consumed.
Oracle ADW follows a similar model, charging users based on the resources they
consume.
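Two points from the comparison above, independent scaling of storage and compute and consumption-based pricing, can be tied together in a small model. The sketch below is illustrative only: the Warehouse class, its methods, and both rates are hypothetical and do not reflect real Oracle or Snowflake prices or APIs.

```python
from dataclasses import dataclass

# Hypothetical rates, for illustration only; not real vendor prices.
STORAGE_RATE_PER_TB_MONTH = 20.0
COMPUTE_RATE_PER_NODE_HOUR = 2.0


@dataclass
class Warehouse:
    storage_tb: float = 0.0    # grows only when data is loaded
    compute_nodes: int = 1     # resized without touching stored data
    node_hours: float = 0.0    # accumulated compute consumption

    def load(self, tb):
        """Add data; storage grows independently of compute."""
        self.storage_tb += tb

    def resize(self, nodes):
        """Change compute capacity; stored data is unaffected."""
        if nodes < 1:
            raise ValueError("at least one compute node is required")
        self.compute_nodes = nodes

    def run(self, hours):
        """Record compute usage for a workload of the given duration."""
        self.node_hours += self.compute_nodes * hours

    def monthly_bill(self):
        """Storage charge plus compute charge for the month."""
        return (self.storage_tb * STORAGE_RATE_PER_TB_MONTH
                + self.node_hours * COMPUTE_RATE_PER_NODE_HOUR)


wh = Warehouse()
wh.load(5.0)      # 5 TB stored
wh.run(10)        # 1 node for 10 hours
wh.resize(4)      # scale compute up for a heavy report
wh.run(2)         # 4 nodes for 2 hours
bill = wh.monthly_bill()
print(bill)
```

Because the two resources are metered separately, a short burst of extra compute raises only the compute part of the bill, and loading more data raises only the storage part.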

Modern Data Warehouse

A modern data warehouse (MDW) is an evolution of traditional data warehousing approaches,
leveraging contemporary technologies, architectures, and best practices to address the growing
challenges and requirements of handling and analyzing large volumes of data. Here are key
characteristics and components of a modern data warehouse:

1. Cloud-Native Architecture:
Modern data warehouses are often built on cloud platforms, such as AWS, Azure, or
Google Cloud, to take advantage of scalable and flexible computing resources, as
well as the ability to pay for resources on a consumption basis.
2. Data Lakes Integration:
Integration with data lakes allows for the storage and analysis of both structured and
unstructured data. This integration supports diverse data types and enables more
comprehensive analytics.
3. Scalability:
Modern data warehouses are designed to scale horizontally and vertically, allowing
organizations to easily add or remove resources based on data volume and
processing needs.
4. Automated Data Management:
Automation is a key aspect, covering various tasks such as data ingestion, data
transformation, and data quality checks. Automated processes reduce manual effort,
enhance efficiency, and improve overall system reliability.
5. Data Virtualization:
Data virtualization enables users to access and analyze data without physically
moving it. This can be particularly useful for integrating data from multiple sources
and providing a unified view without the need for extensive data movement.
6. Advanced Analytics and Machine Learning:
Modern data warehouses often incorporate advanced analytics and machine learning
capabilities directly within the platform. This allows organizations to derive insights
from data and build predictive models without having to move the data to external
systems.
7. Real-Time Data Processing:
The ability to handle real-time data processing and analytics is a crucial aspect of a
modern data warehouse. This is especially important for organizations that require
up-to-the-minute insights for decision-making.
8. Security and Compliance:
Security features are a priority, including robust authentication, encryption, and
compliance with regulatory standards. Modern data warehouses often provide
fine-grained access controls to ensure data privacy and security.
9. Cost Management:
Cost-effective solutions are a focus, with modern data warehouses allowing
organizations to pay for the resources they consume. This pay-as-you-go model is
often more cost-efficient than traditional on-premises solutions.
10. Integration with BI Tools and Visualization:
Seamless integration with business intelligence (BI) tools and visualization
platforms is essential to empower users to easily analyze and visualize data stored in
the warehouse.

11. Flexible Data Models:
Modern data warehouses support flexible data models, including both relational and
non-relational data. This flexibility accommodates diverse data types and structures.
12. Data Governance:
Robust data governance features are included to ensure data quality, lineage, and
compliance with regulatory requirements. This includes metadata management, data
cataloging, and lineage tracking.
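The automated data quality checks mentioned under Automated Data Management can be sketched as a simple validation step during ingestion. This is a minimal illustration, not a feature of any particular product; the function names, the field names (student_id, year, salary), and the rules themselves are assumptions chosen to match a college placement scenario.

```python
def check_row(row, required=("student_id", "year", "salary")):
    """Return a list of rule violations for one record (empty if clean)."""
    problems = []
    for field in required:
        if row.get(field) in (None, ""):
            problems.append(f"missing {field}")
    salary = row.get("salary")
    if isinstance(salary, (int, float)) and salary < 0:
        problems.append("negative salary")
    return problems


def ingest(rows):
    """Split incoming rows into accepted rows and rejected (row, problems) pairs."""
    accepted, rejected = [], []
    for row in rows:
        problems = check_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(row)
    return accepted, rejected


# Hypothetical placement records arriving from a source system.
incoming = [
    {"student_id": 1, "year": 2023, "salary": 50000.0},
    {"student_id": 2, "year": None, "salary": 42000.0},
    {"student_id": 3, "year": 2024, "salary": -10.0},
]
accepted, rejected = ingest(incoming)
print(len(accepted), [problems for _, problems in rejected])
```

Rejected rows are kept together with the reasons for rejection, so they can be corrected at the source rather than silently dropped.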

