1 DWH Concepts


Data Warehouse Overview

What is a Data Warehouse?


A data warehouse is a relational database that is designed for query and analysis rather
than for transaction processing. It usually contains historical data derived from
transaction data, but it can include data from other sources. It separates analysis
workload from transaction workload and enables an organization to consolidate data
from several sources.
In addition to a relational database, a data warehouse environment includes an
extraction, transportation, transformation, and loading (ETL) solution, an online
analytical processing (OLAP) engine, client analysis tools, and other applications that
manage the process of gathering data and delivering it to business users.
A common way of introducing data warehousing is to refer to the characteristics of a
data warehouse as set forth by William Inmon:
Subject Oriented
Integrated
Nonvolatile
Time Variant

Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more
about your company's sales data, you can build a warehouse that concentrates on
sales. Using this warehouse, you can answer questions like "Who was our best
customer for this item last year?" This ability to define a data warehouse by subject
matter, sales in this case, makes the data warehouse subject oriented.

Integrated
Integration is closely related to subject orientation. Data warehouses must put data
from disparate sources into a consistent format. They must resolve such problems as
naming conflicts and inconsistencies among units of measure. When they achieve this,
they are said to be integrated.
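
As a minimal sketch of what resolving naming conflicts and units of measure can look like, the Python snippet below maps two hypothetical source layouts onto one format. The field names (cust_no, customer_id, weight_lb, weight_kg) and the sample values are assumptions made for illustration; they do not come from any particular source system.

```python
# Hypothetical rows from two source systems that describe the same kind of
# shipment but disagree on field names and on the unit of weight.
source_a = [{"cust_no": 101, "weight_lb": 22.0}]
source_b = [{"customer_id": 102, "weight_kg": 5.0}]

LB_TO_KG = 0.45359237  # one international pound expressed in kilograms


def from_source_a(row):
    """Rename cust_no to customer_id and convert pounds to kilograms."""
    return {
        "customer_id": row["cust_no"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
    }


def from_source_b(row):
    """Source B already uses the target names and units."""
    return {"customer_id": row["customer_id"], "weight_kg": row["weight_kg"]}


# One consistent format, regardless of where a row came from.
integrated = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
```

After this step every row carries the same keys and the same unit, which is the property the paragraph above calls integrated.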

Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This
is logical because the purpose of a warehouse is to enable you to analyze what has
occurred.

Time Variant
In order to discover trends in business, analysts need large amounts of data. This is
very much in contrast to online transaction processing (OLTP) systems, where
performance requirements demand that historical data be moved to an archive. A data
warehouse's focus on change over time is what is meant by the term time variant.

Contrasting OLTP and Data Warehousing Environments


Figure 1-1 illustrates key differences between an OLTP system and a data warehouse.
Figure 1-1 Contrasting OLTP and Data Warehousing Environments
One major difference between the types of system is that data warehouses are not
usually in third normal form (3NF), a type of data normalization common in OLTP
environments.
Data warehouses and OLTP systems have very different requirements. Here are some
examples of differences between typical data warehouses and OLTP systems:

Workload

Data warehouses are designed to accommodate ad hoc queries. You might not know
the workload of your data warehouse in advance, so a data warehouse should be
optimized to perform well for a wide variety of possible query operations.
OLTP systems support only predefined operations. Your applications might be
specifically tuned or designed to support only these operations.

Data modifications

A data warehouse is updated on a regular basis by the ETL process (run nightly or
weekly) using bulk data modification techniques. The end users of a data warehouse
do not directly update the data warehouse.
In OLTP systems, end users routinely issue individual data modification statements to
the database. The OLTP database is always up to date, and reflects the current state of
each business transaction.

Schema design

Data warehouses often use denormalized or partially denormalized schemas (such as a
star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete
performance, and to guarantee data consistency.

Typical operations

A typical data warehouse query scans thousands or millions of rows. For example,
"Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example, "Retrieve
the current order for this customer."
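
The two kinds of operation can be contrasted with a small sketch. It uses Python's built-in sqlite3 module only as a convenient stand-in for a real database; the orders table, its columns, and the sample rows are assumptions for illustration, and "last month" is fixed to March 2024.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2024-03-05", 120.0),
        (2, 11, "2024-03-17", 80.0),
        (3, 10, "2024-04-02", 50.0),
    ],
)

# Warehouse-style query: scan every row in a date range and aggregate.
total_march = conn.execute(
    "SELECT SUM(amount) FROM orders "
    "WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01'"
).fetchone()[0]

# OLTP-style operation: fetch the latest order of one customer.
current_order = conn.execute(
    "SELECT order_id, amount FROM orders "
    "WHERE customer_id = ? ORDER BY order_date DESC LIMIT 1",
    (10,),
).fetchone()
```

The first statement touches every row in the range, while the second touches only the rows of one customer; on real data volumes that difference drives the design choices described in this section.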

Historical data

Data warehouses usually store many months or years of data. This is to support
historical analysis.
OLTP systems usually store data from only a few weeks or months. The OLTP
system stores historical data only as needed to successfully meet the requirements of
the current transaction.

Data Warehouse Architectures


Data warehouses and their architectures vary depending upon the specifics of an
organization's situation. Three common architectures are:
Data Warehouse Architecture (Basic)
Data Warehouse Architecture (with a Staging Area)
Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic)


Figure 1-2 shows a simple architecture for a data warehouse. End users directly access
data derived from several source systems through the data warehouse.
Figure 1-2 Architecture of a Data Warehouse

In Figure 1-2, the metadata and raw data of a traditional OLTP system are present, as is
an additional type of data, summary data. Summaries are very valuable in data
warehouses because they pre-compute long operations. For example, a
typical data warehouse query is to retrieve something like August sales. A summary in
Oracle is called a materialized view.
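
The idea of a summary can be sketched as follows. SQLite has no materialized views, so this example builds an ordinary table from an aggregate query as a stand-in; the table names sales and sales_by_month and the sample rows are assumptions for illustration, not Oracle syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2024-07-30", 10.0),
        ("2024-08-01", 25.0),
        ("2024-08-15", 40.0),
        ("2024-09-02", 5.0),
    ],
)

# Pre-compute the monthly totals once, for example at load time.
conn.execute(
    "CREATE TABLE sales_by_month AS "
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total "
    "FROM sales GROUP BY substr(sale_date, 1, 7)"
)

# "August sales" is now a single-row lookup instead of a scan of sales.
august_sales = conn.execute(
    "SELECT total FROM sales_by_month WHERE month = '2024-08'"
).fetchone()[0]
```

Unlike a materialized view, this stand-in table is not refreshed automatically; it would have to be rebuilt whenever the detail rows change.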

Data Warehouse Architecture (with a Staging Area)
In Figure 1-2, you need to clean and process your operational data before putting it
into the warehouse. You can do this programmatically, although most data
warehouses use a staging area instead. A staging area simplifies building summaries
and general warehouse management. Figure 1-3 illustrates this typical architecture.
Figure 1-3 Architecture of a Data Warehouse with a Staging Area

Data Warehouse Architecture (with a Staging Area and Data Marts)


Although the architecture in Figure 1-3 is quite common, you may want to customize
your warehouse's architecture for different groups within your organization. You can
do this by adding data marts, which are systems designed for a particular line of
business. Figure 1-4 illustrates an example where purchasing, sales, and inventories
are separated. In this example, a financial analyst might want to analyze historical
data for purchases and sales.
Figure 1-4 Architecture of a Data Warehouse with a Staging Area and Data Marts

Data Warehousing Schemas

A schema is a collection of database objects, including tables, views, indexes, and
synonyms. You can arrange schema objects in the schema models designed for data
warehousing in a variety of ways. Most data warehouses use a dimensional model.
The model of your source data and the requirements of your users help you design the
data warehouse schema. You can sometimes get the source model from your
company's enterprise data model and reverse-engineer the logical data model for the
data warehouse from this. The physical implementation of the logical data warehouse
model may require some changes to adapt it to your system parameters--size of
machine, number of users, storage capacity, type of network, and software.

Star Schemas
The star schema is the simplest data warehouse schema. It is called a star schema
because the diagram resembles a star, with points radiating from a center. The center
of the star consists of one or more fact tables and the points of the star are the
dimension tables, as shown in Figure 2-1.
Figure 2-1 Star Schema
The most natural way to model a data warehouse is as a star schema, in which only one
join establishes the relationship between the fact table and any one of the dimension tables.
A star schema optimizes performance by keeping queries simple and providing fast
response time. All the information about each level is stored in one row.
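
A small star schema might look like the sketch below, again using sqlite3 as a stand-in database. The table and column names (sales, products, customers, category, city) and the sample rows are assumptions for illustration. Note that the query reaches each dimension table with exactly one join from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: the points of the star.
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);

    -- Fact table: the center of the star.
    CREATE TABLE sales (
        product_id  INTEGER REFERENCES products(product_id),
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL);

    INSERT INTO products VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO customers VALUES (1, 'Cust_1', 'Oslo'), (2, 'Cust_2', 'Lima');
    INSERT INTO sales VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 20.0);
    """
)

# One join per dimension, then group by the descriptive attributes.
sales_by_category_and_city = conn.execute(
    """
    SELECT p.category, c.city, SUM(s.amount)
    FROM sales s
    JOIN products  p ON p.product_id  = s.product_id
    JOIN customers c ON c.customer_id = s.customer_id
    GROUP BY p.category, c.city
    ORDER BY p.category, c.city
    """
).fetchall()
```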

Data Warehousing Objects


Fact tables and dimension tables are the two types of objects commonly used in
dimensional data warehouse schemas.

Fact tables are the large tables in your warehouse schema that store business
measurements. Fact tables typically contain facts and foreign keys to the
dimension tables. Fact tables represent data, usually numeric and additive, that can
be analyzed and examined. Examples include sales, cost, and profit.
Dimension tables, also known as lookup or reference tables, contain the relatively
static data in the warehouse. Dimension tables store the information you normally use
to constrain queries. Dimension tables are usually textual and descriptive and you can
use them as the row headers of the result set. Examples are customers or products.

Fact Tables
A fact table typically has two types of columns: those that contain numeric facts
(often called measurements), and those that are foreign keys to dimension tables. A
fact table contains either detail-level facts or facts that have been aggregated. Fact
tables that contain aggregated facts are often called summary tables. A fact table
usually contains facts with the same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive.
Additive facts can be aggregated by simple arithmetical addition. A common example
of this is sales. Non-additive facts cannot be added at all. An example of this is
averages. Semi-additive facts can be aggregated along some of the dimensions and not
along others. An example of this is inventory levels, which can be summed across
products or locations but not across time.
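
The three kinds of facts can be illustrated with plain Python. All figures below are made-up sample data: sales amounts are additive, inventory levels can be added across products but not across days, and an average of per-store averages is not the overall average.

```python
from statistics import mean

# Additive: sales per (day, product) can be summed along any dimension.
sales = {("Mon", "A"): 10, ("Mon", "B"): 20, ("Tue", "A"): 30, ("Tue", "B"): 40}
total_sales = sum(sales.values())

# Semi-additive: stock per (day, product) can be summed across products
# for one day, but summing across days counts the same goods repeatedly.
inventory = {("Mon", "A"): 5, ("Mon", "B"): 7, ("Tue", "A"): 4, ("Tue", "B"): 6}
stock_on_tuesday = sum(qty for (day, _), qty in inventory.items() if day == "Tue")
sum_over_days = sum(inventory.values())  # not a stock level at any point in time

# Non-additive: averages cannot be combined without the underlying counts.
store_orders = {"store_1": [10, 10, 10, 10], "store_2": [50]}
average_of_averages = mean(mean(values) for values in store_orders.values())
true_average = mean(v for values in store_orders.values() for v in values)
```

Here total_sales is a meaningful number, sum_over_days is not, and average_of_averages differs from true_average because the two stores have different order counts.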

Creating a New Fact Table
You must define a fact table for each star schema. From a modeling standpoint, the
primary key of the fact table is usually a composite key that is made up of all of its
foreign keys.
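
A composite primary key built from the foreign keys can be sketched as below. The sales table, its three key columns, and the sample rows are assumptions for illustration; sqlite3 is used only as a stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        product_id  INTEGER NOT NULL,
        customer_id INTEGER NOT NULL,
        time_id     INTEGER NOT NULL,
        amount      REAL,
        PRIMARY KEY (product_id, customer_id, time_id)
    )
    """
)
conn.execute("INSERT INTO sales VALUES (1, 1, 20240101, 100.0)")
conn.execute("INSERT INTO sales VALUES (1, 2, 20240101, 50.0)")  # other customer

# A second row with the same key combination violates the composite key.
try:
    conn.execute("INSERT INTO sales VALUES (1, 1, 20240101, 75.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

The key therefore fixes the grain of the fact table: at most one row per product, customer, and time period.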

Dimension Tables
A dimension is a structure, often composed of one or more hierarchies, that
categorizes data. Dimensional attributes help to describe the dimensional value. They
are normally descriptive, textual values. Several distinct dimensions, combined with
facts, enable you to answer business questions. Commonly used dimensions are
customers, products, and time.
Dimension data is typically collected at the lowest level of detail and then aggregated
into higher level totals that are more useful for analysis. These natural rollups or
aggregations within a dimension table are called hierarchies.

Hierarchies
Hierarchies are logical structures that use ordered levels as a means of organizing
data. A hierarchy can be used to define data aggregation. For example, in a time
dimension, a hierarchy might aggregate data from the month level to the quarter level
to the year level. A hierarchy can also be used to define a navigational drill path and
to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it.
Data values at lower levels aggregate into the data values at higher levels. A
dimension can be composed of more than one hierarchy. For example, in the product
dimension, there might be two hierarchies--one for product categories and one for
product suppliers.

Dimension hierarchies also group levels from general to granular. Query tools use
hierarchies to enable you to drill down into your data to view different levels of
granularity. This is one of the key benefits of a data warehouse.
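
A month, quarter, year hierarchy can be sketched in a few lines of Python. The monthly figures are made-up sample data, and the helper names quarter_of, year_of, and roll_up are hypothetical. Each level is aggregated from the level directly below it.

```python
from collections import defaultdict

# Made-up sales totals at the lowest level of the hierarchy (month).
monthly_sales = {"2023-11": 10, "2023-12": 20, "2024-01": 5, "2024-02": 15, "2024-04": 30}


def quarter_of(month: str) -> str:
    """Map 'YYYY-MM' to its parent level, 'YYYY-Qn'."""
    year, month_number = month.split("-")
    return f"{year}-Q{(int(month_number) - 1) // 3 + 1}"


def year_of(quarter: str) -> str:
    """Map 'YYYY-Qn' to its parent level, 'YYYY'."""
    return quarter.split("-")[0]


def roll_up(values: dict, parent_of) -> dict:
    """Aggregate child-level totals into their parent level."""
    totals = defaultdict(int)
    for child, value in values.items():
        totals[parent_of(child)] += value
    return dict(totals)


quarterly_sales = roll_up(monthly_sales, quarter_of)
yearly_sales = roll_up(quarterly_sales, year_of)
```

Drilling down is the reverse walk: from a yearly total to the quarters, and from a quarter to the months that produced it.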

Levels
A level represents a position in a hierarchy. For example, a time dimension might
have a hierarchy that represents data at the month, quarter, and year levels. Levels
range from general to specific, with the root level as the highest or most general level.
The levels in a dimension are organized into one or more hierarchies.

Level Relationships
Level relationships specify top-to-bottom ordering of levels from most general (the
root) to most specific information. They define the parent-child relationship between
the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex query rewrites. For
example, the database can aggregate existing sales revenue on a quarterly basis to a
yearly aggregation when the dimensional dependencies between quarter and year are
known.

Typical Dimension Hierarchy


Figure 2-2 illustrates a dimension hierarchy based on customers.
Figure 2-2 Typical Levels in a Dimension Hierarchy

Unique Identifiers
Unique identifiers are specified for one distinct record in a dimension table. Artificial
unique identifiers are often used to avoid the potential problem of unique identifiers
changing. Unique identifiers are represented with the # character. For example,
#customer_id.

Relationships
Relationships guarantee business integrity. An example is that if a business sells
something, there is obviously a customer and a product. Designing a relationship
between the sales information in the fact table and the dimension tables products and
customers enforces the business rules in databases.
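
The sketch below shows one way a database can enforce such a relationship, using foreign key constraints in SQLite. It assumes an SQLite build with foreign key support, which must be switched on per connection; the table names and sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);
    CREATE TABLE sales (
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product_id  INTEGER NOT NULL REFERENCES products(product_id),
        amount      REAL);
    INSERT INTO customers VALUES (1, 'Cust_1');
    INSERT INTO products  VALUES (1, 'Widget');
    """
)

# A sale with a known customer and a known product is accepted.
conn.execute("INSERT INTO sales VALUES (1, 1, 100.0)")

# A sale that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO sales VALUES (99, 1, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```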

Example of Data Warehousing Objects and Their Relationships


Figure 2-3 illustrates a common example of a sales fact table and dimension tables
customers, products, promotions, times, and channels.
Figure 2-3 Typical Data Warehousing Objects

What are Slowly Changing Dimensions?

Slowly Changing Dimensions (SCD) are dimensions that change slowly over time,
rather than on a regular, time-based schedule. A data warehouse needs to
track changes in dimension attributes in order to report historical data. In other
words, implementing one of the SCD types should let users assign the proper
dimension attribute value for a given date. Examples of such dimensions are
customer, geography, and employee.

There are many approaches to dealing with SCD. The most popular are:

Type 1 - Overwriting the old value
Type 2 - Creating a new additional record
Type 3 - Adding a new column

Type 1 - Overwriting the old value. In this method no history of dimension changes is
kept in the database. The old dimension value is simply overwritten by the new one.
This type is easy to maintain and is often used for data whose changes are caused by
processing corrections (e.g. removing special characters, correcting spelling errors).

Before the change:

Customer_ID  Customer_Name  Customer_Type
1            Cust_1         Corporate

After the change:

Customer_ID  Customer_Name  Customer_Type
1            Cust_1         Retail
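
As a minimal sketch, a Type 1 change is a plain in-place update. The table name customer_dim is an assumption; the columns and values follow the example above, and sqlite3 stands in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dim ("
    "customer_id INTEGER PRIMARY KEY, customer_name TEXT, customer_type TEXT)"
)
conn.execute("INSERT INTO customer_dim VALUES (1, 'Cust_1', 'Corporate')")

# Type 1: overwrite the attribute; the old value is lost.
conn.execute(
    "UPDATE customer_dim SET customer_type = ? WHERE customer_id = ?",
    ("Retail", 1),
)

rows_after_type1 = conn.execute("SELECT * FROM customer_dim").fetchall()
```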

Type 2 - Creating a new additional record. In this method the full history of dimension
changes is kept in the database. You capture an attribute change by adding a new row
with a new surrogate key to the dimension table. Both the prior and new rows contain
the natural key (or other durable identifier) as an attribute. 'Effective date' and
'current indicator' columns are also used in this method. Only one record per natural
key can have the current indicator set to 'Y'. For the 'effective date' columns, i.e. start_date and
end_date, the end_date of the current record is usually set to 9999-12-31.
Changing the dimensional model of a type 2 dimension can be a very expensive
database operation, so type 2 is not recommended for dimensions to which a new
attribute could be added in the future.

Before the change:

Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate      22-07-2010  31-12-9999  Y

After the change:

Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate      22-07-2010  17-05-2012  N
2            Cust_1         Retail         18-05-2012  31-12-9999  Y
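
A Type 2 change can be sketched as two steps in one transaction: close the current row, then insert a new current row. In this sketch Customer_ID plays the role of the surrogate key and Customer_Name the role of the natural key, as in the example above. The table name customer_dim and the helper apply_type2_change are hypothetical, and dates are stored in ISO format (YYYY-MM-DD) rather than the DD-MM-YYYY format of the tables.

```python
import sqlite3
from datetime import date, timedelta

HIGH_DATE = "9999-12-31"  # conventional end_date of the current record

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_dim (
        customer_id   INTEGER PRIMARY KEY,  -- surrogate key, assigned by SQLite
        customer_name TEXT,                 -- natural key in this example
        customer_type TEXT,
        start_date    TEXT,
        end_date      TEXT,
        current_flag  TEXT)
    """
)
conn.execute(
    "INSERT INTO customer_dim VALUES (1, 'Cust_1', 'Corporate', '2010-07-22', ?, 'Y')",
    (HIGH_DATE,),
)


def apply_type2_change(conn, customer_name, new_type, change_date):
    """Expire the current row and add a new current row (hypothetical helper)."""
    day_before = (change_date - timedelta(days=1)).isoformat()
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE customer_dim SET end_date = ?, current_flag = 'N' "
            "WHERE customer_name = ? AND current_flag = 'Y'",
            (day_before, customer_name),
        )
        conn.execute(
            "INSERT INTO customer_dim "
            "(customer_name, customer_type, start_date, end_date, current_flag) "
            "VALUES (?, ?, ?, ?, 'Y')",
            (customer_name, new_type, change_date.isoformat(), HIGH_DATE),
        )


apply_type2_change(conn, "Cust_1", "Retail", date(2012, 5, 18))
rows_after_type2 = conn.execute(
    "SELECT * FROM customer_dim ORDER BY customer_id"
).fetchall()
```

Fact rows loaded before the change keep pointing at surrogate key 1, so historical reports still show the customer as Corporate for that period.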

Type 3 - Adding a new column. In this type usually only the current and previous values
of the dimension are kept in the database. The new value is loaded into the 'current/new'
column and the old one into the 'old/previous' column. Generally speaking, the history is
limited to the number of columns created for storing historical data. This is the least
commonly needed technique.

Before the change:

Customer_ID  Customer_Name  Current_Type  Previous_Type
1            Cust_1         Corporate     Corporate

After the change:

Customer_ID  Customer_Name  Current_Type  Previous_Type
1            Cust_1         Retail        Corporate
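
A Type 3 change can be sketched as a single update that shifts the current value into the previous-value column. The table name customer_dim is an assumption; the columns and values follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dim ("
    "customer_id INTEGER PRIMARY KEY, customer_name TEXT, "
    "current_type TEXT, previous_type TEXT)"
)
conn.execute("INSERT INTO customer_dim VALUES (1, 'Cust_1', 'Corporate', 'Corporate')")

# Type 3: the right-hand sides of SET see the row as it was before the
# update, so previous_type receives the old current_type.
conn.execute(
    "UPDATE customer_dim "
    "SET previous_type = current_type, current_type = ? "
    "WHERE customer_id = ?",
    ("Retail", 1),
)

rows_after_type3 = conn.execute("SELECT * FROM customer_dim").fetchall()
```

A second change would overwrite previous_type again, which is why the history is limited to the number of columns set aside for it.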

OBIEE Configuration files

NQSConfig.INI is the first file you need to know when you start working
with the Administration Tool.

This file records which RPD (repository) is currently running online, cache
settings, authentication types, and so on.

After the installation you can see two folders in the installation path:

1. Oracle BI - related to the Administration Tool
2. Oracle BIData - related to the Presentation Server

You can edit the NQSConfig.INI file whenever you want at the
following path:

C:\Oracle BI\Server\Config\NQSConfig.INI

Similarly, if you have created a new catalog in Presentation Services and
want to make it the default, you have to change one more file, i.e. the
instanceconfig.xml file.

The instanceconfig.xml file is available at the following path:

C:\Oracle BIData\Config\instanceconfig.xml

The nqserver.log and nqquery.log files contain the logs for the queries that are
executed while creating an analysis.

[$FMW_HOME]/user_projects/domains/bifoundation_domain/servers/bi_server1/logs
