
eServer i5 and DB2:

Business Intelligence Concepts


Introduction
Business Intelligence (BI) is a popular and powerful concept of applying a set of
technologies to turn data into meaningful information. With Business Intelligence
Applications, large amounts of data originating in many different formats (spreadsheets,
relational databases, web logs) can be consolidated and presented to key Business
Analysts quickly and concisely. Armed with timely, intelligent information that is easily
understood (because it is delivered in business terms), the Business Analyst is enabled to
affect change and develop strategies to drive higher profits.
Business Intelligence Applications cover a broad spectrum of corporate data analysis
needs. Financial departments use it for Corporate Performance Measurement (CPM) and
managing the budgeting process without spending 80% of their time gathering the data or
managing the multitude of non-interconnected spreadsheets. Analytical CRM (Customer
Relationship Management) applies BI practices for marketing purposes. Analytical CRM
allows marketing departments to build intelligent, targeted marketing strategies based on
analysis of customer data such as purchase history and demographics to better understand
consumer buying behaviors. Sales departments use BI applications to track sales vs.
forecast, apply profitability measures to sales tactics, and analyze sales trends to identify
revenue opportunities. Company executives use BI applications to monitor key company
performance indicators through Balanced Scorecard, CPM, or other performance analysis
methodologies.
The need for better information gleaned from the massive amounts of data companies
collect is nothing new. Solutions historically have ranged from simple query tools to
full-scale enterprise data warehouses. The availability of BI tools and applications in
today's market is almost overwhelming. Add to that the challenge of choosing a database
product or server platform, and finding the right solution can become a complex decision process.
This document will cover:

- The Business Intelligence Application Architecture
- Approaches to Business Intelligence
- The IBM eServer i5 as a Business Intelligence foundation
- Additional resources

The Business Intelligence Application Architecture


Data Warehousing is the concept of extracting data from different databases,
consolidating, cleansing, and storing that data in a format optimal for reporting or
analytical purposes. A Business Intelligence Application may be implemented with a
classic enterprise data warehousing approach, including "best of breed" multi-vendor
tools, or through single-vendor applications that snap on to operational systems such as
ERP applications. Regardless, the architectural considerations are typically the same, and
that architecture is consistent with data warehousing concepts.
For instance, data consolidation is a necessity. Key data elements to understand
profitability across the company may reside in many different databases and formats.
Spreadsheets, relational databases or flat files may all be sources of critical data required
to fully understand profitability using key business metrics.
Replication of the data into this consolidated database affords the opportunity to
restructure the data along business lines versus transaction processing lines. For example,
Point-of-Sale (POS) systems collect transaction data associated with a consumer
purchase. Customer master files containing demographic information are stored in
another database or system altogether. To get a single customer centric view of the data
including demographics and historical purchase elements requires the consolidated,
replicated and reformatted (optimized for analysis) database.
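As a sketch, the consolidation step described above might be expressed in SQL; all schema, table, and column names here are hypothetical illustrations, not taken from any particular system:

```sql
-- Build a customer-centric analysis table by joining POS transaction
-- data with demographic data from the customer master.
-- (Schema and column names are illustrative assumptions.)
CREATE TABLE dw.customer_sales AS (
   SELECT c.customer_id,
          c.region,
          c.age_band,                          -- demographic attribute
          t.product_id,
          t.sale_date,
          t.quantity * t.unit_price AS sale_amount
   FROM   pos.transactions t
   JOIN   crm.customer_master c
          ON c.customer_id = t.customer_id
) WITH DATA;
```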
Tools to do the acquisition of data (extract), transformation of the data (applying
business rules, for example) and loading of the data are typically called ETL tools. Data
Replication tools can also play a part in this effort. Data Replication tools typically focus
more on the transportation of data between a single source and target with limited
transformation capabilities.
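A minimal sketch of the kind of transform-and-load statement an ETL tool might generate, assuming hypothetical staging and warehouse tables:

```sql
-- Cleanse source product codes and apply a business rule while loading.
-- (All object names are illustrative assumptions.)
INSERT INTO dw.sales_fact (customer_id, product_id, sale_date, net_amount)
SELECT t.customer_id,
       COALESCE(x.std_product_id, 'UNKNOWN'),   -- standardize codes
       t.sale_date,
       t.gross_amount - t.discount              -- business rule applied here
FROM   staging.pos_extract t
LEFT JOIN ref.product_xref x
       ON x.src_product_id = t.product_id;
```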
Through use of these tools, Business Analysts are freed from data gathering. They are
instead provided the data in a high performance, logical and intuitive manner that allows
them to independently run analyses.
The foundation for the data warehouse is a powerful Relational Database Management
System (RDBMS). As the size of the database or number of users grow, the ability to
scale becomes critical to maintain optimal performance. How the database leverages
parallel processing and query optimization techniques can be a key differentiator in
determining a platform for the BI Application.
Once built, this consolidated database can be analyzed directly through any number of
analysis tools. This is typically considered a two tier data warehouse architecture with the
consolidated database in a relational database format. More common, however, is a three
tier approach, which suggests that the consolidated data goes through one more data
replication/optimization iteration into a Data Mart. Data Marts tend to be a subset of
corporate data associated with a specific BI application or departmental solution. For
instance, it is not unusual to implement a data mart for corporate performance
measurements, a separate data mart for sales analysis, and yet a third data mart for use in
marketing efforts.

The added value of the middle (relational) tier in the three-tier architecture is that it is
where business rules are applied to the data so that consistency and integrity are
maintained across all the data marts. For instance, NET SALES might mean something
different to the Sales Manager than it does to accounting personnel. But if NET SALES is
defined only once, in the middle tier, there is no argument, because the same definition is
preserved across the various data marts.
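One hedged way to picture this: define the measure once as a view in the middle tier, and source every data mart from that view (names are illustrative):

```sql
-- NET SALES is defined exactly once; every downstream data mart
-- inherits the same calculation.
CREATE VIEW dw.net_sales_v AS
   SELECT order_id,
          customer_id,
          gross_amount - discounts - returns AS net_sales
   FROM   dw.sales_fact;
```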
A common data mart implementation leverages a concept called OnLine Analytical
Processing, or OLAP. In many cases, OLAP-based solutions can minimize the complexity
of managing data marts through utilities that can automate:

- Creation of data mart data models (the database definitions)
- Loading of the data into the data mart
- Aggregation of data for performance purposes
- Maintenance of the data mart to handle updates and manage changes in business rules or dimensions

The benefit of OLAP to end users is the ability to slice and dice through data in a
high-performance, intuitive, and iterative manner. For example, a classic OLAP data mart
for sales analysis allows end users easy answers to questions such as:

- What is the comparison of sales versus forecast across sales territories, product sets, time frames, and customers?
- What products are showing more positive revenue trends through e-Commerce channels versus traditional brick-and-mortar channels?
- What is product or customer performance by revenue, cost of goods sold, gross margin, or other measurement by quarter?
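In relational terms, the first of those questions might look like the following query, with ROLLUP producing the subtotals an OLAP tool would let users drill into (table and column names are hypothetical):

```sql
-- Sales versus forecast by territory and quarter, with subtotals.
SELECT territory,
       quarter,
       SUM(actual_sales)                        AS sales,
       SUM(forecast_sales)                      AS forecast,
       SUM(actual_sales) - SUM(forecast_sales)  AS variance
FROM   mart.sales_summary
GROUP BY ROLLUP (territory, quarter);
```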

Meta data is a concept common in an Enterprise Data Warehouse implementation. Meta
data, often stored in a Meta Data Repository, is important information for both Data
Warehouse Administrators as well as end users. Data Warehouse Administrators use this
information to keep track of data transformation rules, job scheduling, job status, data
model definitions and other information pertinent to defining and managing the data
warehouse. End users can benefit from the Meta data in that it provides a description of
the data stored in the data warehouse or data mart. This descriptive information could
include the timing of the data (e.g., when the data warehouse was last refreshed), the
formula or calculation that was applied to a derived data element, or other descriptive
information.
ETL tools will typically provide a Meta data repository and/or management capability,
oftentimes with the ability to extract Meta data definitions from other tools and databases
into its repository.
Where does data mining fit into all of this? Data mining is the use of statistics or
mathematical algorithms to discover patterns or correlations within the data. While the
data warehouse infrastructure isn't a prerequisite to using data mining tools, it often is
necessary because of the need to prepare the data through consolidation and cleansing
processes in order for the data mining tools to be effective.
Whereas with query processes (both ad-hoc and OLAP) you form a hypothesis and use
the technology to prove or disprove the hypothesis, data mining techniques suggest that
you do not know the hypothesis beforehand. Instead, you use the tool to find out
interesting information about the data that you wouldn't think to ask. For instance, you
may use OLAP analysis to see trends of sales, monitor key performance indicators, or
identify exceptions or anomalies in historical data. With data mining, you could discover
information about consumers' purchasing behaviors that would otherwise go unnoticed.
As an example, by mining point of sale transaction data you might discover that
customers have a higher propensity to buy two totally unrelated products together when
they shop (the classic beer and diapers example). Armed with such knowledge, retailers
can determine how best to position products within stores, optimize product pricing, or
build specific marketing programs (coupons) to draw customers and cross sell or up sell.
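A crude SQL approximation of market basket analysis counts how often two different products appear on the same receipt; real data mining tools use far more sophisticated algorithms, and the names below are illustrative:

```sql
-- Count product pairs purchased together on one receipt.
SELECT a.product_id AS product_1,
       b.product_id AS product_2,
       COUNT(*)     AS times_bought_together
FROM   pos.basket_items a
JOIN   pos.basket_items b
       ON  a.receipt_id = b.receipt_id
       AND a.product_id < b.product_id   -- count each pair only once
GROUP BY a.product_id, b.product_id
ORDER BY times_bought_together DESC;
```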
Data mining tools have been successfully used across a broad range of requirements,
including:

Fraud Detection
Customer Segmentation, Scoring
Risk Assessment
Equipment Preventative Maintenance
Market Basket Analysis

Approaches to Implementing BI Applications


There are two common approaches to implementing a Business Intelligence application:
1. Use data warehouse tools to build a customized data warehouse solution: an
Enterprise Data Warehouse
2. Buy a packaged data mart solution: the "plug-in" data mart
Many customers choose to leverage the tools available in the marketplace to
build a customized BI application. Tools that may be combined in this solution
include:

- Data Replication Tools
- ETL (Extract/Transformation/Loading) Tools, including meta data management
- OLAP (Online Analytical Processing) Products
- End User Analysis Tools (Query or Report Writing tools)
- Data Mining Analysis Tools

While the "build" approach may take longer to initially implement the first phase of the
project, the key benefit is in leveraging a set of utilities that are open, extensible, and very
robust. This provides a foundation for future growth, including flexibility to add
additional data sources or applications, or manage changing business requirements or
computing environments without re-inventing the wheel. Challenges with this approach
include the potential need to integrate multiple tools, possibly from multiple vendors, and
the learning curve on these new product sets.
The "buy" approach attempts to simplify the process. Many vendors, in particular
operational system software providers such as ERP or CRM vendors, have chosen to add
plug-in data mart applications as part of their core offerings. These BI applications are
typically single data marts where the extraction, loading, and reporting functions have
been pre-built. While in theory there is no such thing as a "load and go" BI application
(because of data cleansing issues, differing reporting requirements, and customization of
the operational systems), this approach can result in a much shorter implementation time.
In addition, much of the challenge of building a BI application from scratch is in
identifying data elements and business rules in the source systems, and the plug-in data
mart provides a huge benefit here.
One definite consideration with the plug-in data mart solution is how open and
extensible it is. If it is based on industry recognized data warehouse toolsets and conforms
to open standards (for example, SQL), then it will be much easier to leverage the
investment for other analytical applications outside of just the ERP or CRM solution. If
the solution is based on proprietary tools or code, then keep in mind it may be more
difficult and costly to enhance the solution to pull in other data sources, add additional
analysis functionality, or leverage various end user interfaces that different users require
because of unique needs. Finally, if the solution does not allow you to build an

architecturally sound data warehouse infrastructure it may limit the ability to build on top
of the solution without re-architecting the data mart implementation.
Another approach to the plug-in data mart comes from BI software vendors, as opposed
to operational system software providers. Many solutions based on open, extensible data
warehouse toolsets from these BI software providers include pre-built plug-ins for various
operational systems, thereby achieving the best of both the "buy" and "build" worlds.
These solutions can achieve fast payback, as the "buy" solutions do, while typically
providing a foundation for expansion.
Choosing the approach that is right for you will depend on any number of factors,
including:

- Cost
- ROI payback period
- Extensibility of solution
- Ability to leverage existing in-house tools, databases, hardware, skills
- Market position and industry acceptance

The eServer i5: the Foundation for BI Applications


Choosing a platform for hosting the BI application(s) depends on many factors, including:

- Database capabilities
- Tools support
- Scalability
- Total cost of ownership
- Openness

On top of those considerations, reducing risk is critical to ensure success of the project.
Risk can be significantly reduced through:

- Leveraging of existing in-house skills: programming, administration/operations, database administration
- Leveraging of existing hardware, network, OS and database resources
- Maintaining common architectures (databases, servers) for multiple applications across the enterprise
- Strong partnerships between I/T and business users
- Strong project management

The IBM eServer i5 is a great foundation for a BI application. It is an especially great fit
in situations where there is already a strong iSeries or AS/400 presence, for example,

where a customer has an operational application running on that platform today. But it is
also a great fit in situations where customers are looking for a highly reliable, scalable
solution that does not require the administrative costs of other solutions.
Let's look at some of the attributes of the i5 in a BI environment.

Traditional Strengths of i5, iSeries, and AS/400


DB2 continues to shine when it comes to reliability, availability, and low cost of
ownership. The i5's unique integrated operating system results in lower administrative
costs, directly impacting Total Cost of Ownership (TCO) in its favor. The following chart
depicts the TCO advantage of i5 versus UNIX and Wintel based platforms.

Source: The Impact of OS/Platform Selection on the Cost of ERP Implementation, Use and Management - META Group - 07/25/02

The Technology Independent Machine Interface (TIMI) design of the i5 protects


investments by preserving applications and hardware serial numbers as new processor
technologies are developed. For instance, when the AS/400 line introduced 64 bit RISC
computing in 1995, existing systems could be upgraded to the new technology while
preserving machine serial number, which protects the hardware investment and
minimizes costs. More importantly, any application software was automatically
re-encapsulated to run as a full 64-bit capable application, with no requirement for new
versions. Compare this to other systems where, even if new processor technology is
introduced, the operating system, database, and applications don't fully leverage that
capability until software vendors provide new versions enhanced to take full advantage
of the new technology.
The eServer i5 is now on its ninth generation of 64-bit RISC technology, and customers
have been able to move to these processor enhancements with minimal disruption.

DB2
DB2 Universal Database (UDB) is the relational database management system (RDBMS)
that is built into the Operating System for the IBM eServer i5. It is an open, extensible,
high performance, scalable database that leverages the i5 architecture to maintain its
value proposition of a lower total cost of ownership.
DB2 in a BI environment provides many advantages.
1. Open Interfaces
Adherence to industry standards allows for more choices when evaluating tools or BI
applications. Portability of those applications or databases, and interconnectivity to many
different platforms/databases are also key benefits of open standard support. Additionally,
because of common standards support such as the SQL standard, companies are better
positioned to leverage available skills in the workforce. DB2 supports more elements of
the core SQL standard today than any other database.
2. Scalability/Performance
The sophisticated cost-based query optimizer built into DB2 is the basis of achieving
optimal performance in a BI application. The optimizer provides the "brains" behind the
SQL processing on which most BI applications depend. The optimizer's goal is to build
the best plan for accessing data, which will minimize extraneous I/O and therefore
provide optimal performance. IBM has continued investment in DB2's cost-based
optimization technology over time, adding features to enable better efficiencies
in execution of complex queries. The following are some of those features.
Real-time statistics: DB2's cost-based optimizer uses statistics that are captured in real
time and stored in the database, as well as other information, such as system configuration,
SMP settings, and available indexes, in its development of the access plan. Statistics are
extremely important to aid the optimizer's ability to choose the best plan. For example,
data skew relates to the distribution of values within a database table (physical file). For
instance, in a customer table you may have many more customers within one
region/state/zip code than another. The customers are not evenly distributed across the
regions because of population differences, or because only certain products are sold in
certain regions, etc. When an SQL statement that is selecting customers from a specific
set of zip codes is issued, it is important to understand this distribution to build the most
effective plan for retrieving that data. Many optimizers assume equal distribution and that

can result in an inefficient plan being executed. But DB2 maintains information about
data skew, which significantly aids the overall performance of SQL processes.
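To see why skew statistics matter, consider two queries that are identical in form but very different in selectivity (table name and values are invented for illustration):

```sql
-- If most customers share one zip code, a table scan may be cheapest;
-- for a rare zip code, an index probe is far better. An optimizer that
-- assumes even distribution would pick the same plan for both.
SELECT * FROM dw.customers WHERE zip_code = '10001';  -- common value
SELECT * FROM dw.customers WHERE zip_code = '99999';  -- rare value
```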
Database parallelism: A key consideration in any database platform in support of BI
applications is its ability to scale while maintaining acceptable performance. DB2 offers a
variety of techniques to address scalability/performance. Parallel database operations
through the Symmetric Multiprocessing (SMP) feature of i5/OS (OS/400) provide the
ability to split single database tasks/requests across multiple processors within a single i5
system or logical partition (LPAR). The SMP feature provides significant performance
enhancements to database queries, loads, index builds and other operations commonly
used by BI applications.
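As one hedged example, with the SMP feature installed, a job can request parallel processing through the CURRENT DEGREE special register (exact values and release support should be checked in the DB2 documentation):

```sql
-- Allow the optimizer to choose the degree of parallelism for this job.
SET CURRENT DEGREE = 'ANY';
```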
Advanced indexing: A key (IBM patented) performance technology is DB2's Encoded
Vector Index (EVI). EVIs offer a dramatic advancement over bitmap indexing
technology. Bitmap indexing is used by some database solutions to improve performance
in query-heavy environments. Bitmap indexing is based on using a single bit (instead of
multiple bytes of data) to indicate that a specific key value can be found in a row in the
database table. Each distinct key value requires its own bitmap array. EVIs use a single
array for all index values, making them smaller, faster, more scalable, and simpler to
manage, search, and update. For example, a bitmap built over a location field with 250
distinct values would require over 31 GB to store just the index for a file with one billion
records. This same index using EVIs would require only 1 GB, 30 times less storage
than the bitmap array. Using this advanced indexing feature, DB2 was able to run a query
across a 225 GB table in a little over 35 seconds. This same query previously took over
two hours.
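Creating an EVI uses dedicated SQL syntax on DB2 UDB for iSeries; the table and column below are illustrative, and the DISTINCT VALUES clause is an optional sizing hint:

```sql
-- Encoded vector index over a low-cardinality column.
CREATE ENCODED VECTOR INDEX sales_location_evi
   ON dw.sales_fact (location)
   WITH 300 DISTINCT VALUES;
```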
Materialized Query Tables (MQTs) and result set caching: Performance in a data
warehouse environment can be significantly improved for repetitive user queries through
the use of summary tables or caching techniques. Recent DB2 enhancements include
query result set caching, which can mean for repetitive queries DB2 can benefit from
work that has already been done, minimizing the need to re-run the entire query.
Materialized Query Tables are an implementation of DB2-aware materialized views (or
summary tables). DB2 currently supports creation of MQTs through SQL syntax, and will
be enhanced to support automatic maintenance of MQTs by DB2.
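A user-maintained MQT of the kind described above might be declared as follows (summary definition and object names are illustrative):

```sql
-- Summary table of sales by product and month; DB2 can rewrite
-- qualifying queries to use it when query optimization is enabled.
CREATE TABLE dw.sales_by_month AS (
   SELECT product_id,
          YEAR(sale_date)  AS sale_year,
          MONTH(sale_date) AS sale_month,
          SUM(net_amount)  AS total_sales
   FROM   dw.sales_fact
   GROUP BY product_id, YEAR(sale_date), MONTH(sale_date)
) DATA INITIALLY DEFERRED
  REFRESH DEFERRED
  MAINTAINED BY USER
  ENABLE QUERY OPTIMIZATION;
```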
3. Reduced Costs through Simplified Management
The eServer i5's unique architecture, including Single Level Storage and a tightly
integrated, object-based operating system, has been the foundation for its leadership in
Total Cost of Ownership for many years. In addition, DB2 UDB leverages those
architectural benefits by reducing the tasks typically required of a database administrator
(DBA) in other architectures. For example, with DB2 UDB:
- Data partitioning for performance: NOT REQUIRED
- Moving data or indexes to avoid disk hot spots: NOT REQUIRED
- Re-balancing indexes: NOT REQUIRED
- Monitoring table spaces, log buffers, lock contention buffers: NOT REQUIRED
- Running integrity checkers or statistics collection routines: NOT REQUIRED

DB2 UDB minimizes complexity and costs, improving reliability and availability of the
data warehouse, with less manual intervention by DBA staff.

Leveraging an Existing iSeries or AS/400 Environment


BI applications require you to build replicated databases for reporting purposes. But
moving data between different architectures is not easy. Different operating systems
require multiple skills to support different utilities and operating environments (security,
communications, data loads). Different databases cause duplication of query optimization
skills, database administration skills and processes, and requirements to transform data
due to different data types and SQL standards support. Extraction and replication tools
that support heterogeneous environments can be complex and expensive.
The bottom line is that if you are creating reporting repositories because of environmental
factors (e.g., maintaining security of production data, optimizing data for querying,
minimizing impact of query workloads on production systems), it is much easier to
maintain a common architecture for both the production environment and the reporting
environment.
With DB2 UDB there are some fundamental approaches that can make your life easier.
With Remote Journaling, you can capture changed production data and have the system
route the changed data logs (journal receivers) to another i5 logical partition (LPAR) or
system.
To read the database changes from the journal receivers, you could use a low cost utility
called Data Propagator to add the changed records to the reporting repository (on this 2nd
partition/system). This essentially eliminates a key issue with any BI application - the
impact of extraction or data transport on the production environment.
A side benefit of this is the ability to use the 2nd partition/system for other purposes as
well, such as backing up the databases regularly without impacting production, or to
create a High Availability (H/A) scenario.
If there is a requirement for 24 x 7 operations, a combined system for High Availability
and Business Intelligence is an excellent way to optimize hardware costs. High
Availability software solutions offer the ability to automatically mirror production
databases to a backup system (or LPAR). But typically, this backup system runs at
less than full capacity, as its CPU cycles are simply held in reserve for the event that a
failover occurs. A BI application can not only leverage those available CPU cycles,

but the data is already there as well, minimizing the complexity and potential negative
impact of the extraction process. The backup system/LPAR can then also be tuned to
optimize query processing versus having to share its resources with transaction oriented
workloads on the production machine.

Similarly, Logical Partitioning (LPAR) is a huge advantage for leveraging a common
architecture. Logical Partitioning allows you to dynamically allocate resources across
multiple instances of i5/OS (OS/400) within a single system footprint. LPAR allows you
to carve up the system into partitions, and then dynamically allocate resources (CPU, I/O,
and memory) across those partitions.
For example, suppose you have an 8-way i5 configuration. One partition could be
allocated to the production environment, one partition for development, and yet another
partition for the BI application. A typical configuration might consist of 3.25 CPUs being
allocated to production, another 4 allocated to the BI application, and .5 CPUs left over
for development, Linux, or other purposes (Domino, Websphere). During night time
extraction/loading processes, reallocating this assignment of CPU percentages (e.g., add
another 2 CPUs to the BI partition) can occur such that you throw as much resource as
required for the job at hand and thereby minimize batch processing windows.

Business Intelligence Applications can be a very powerful enabler for any company
looking to gain insight into its data. The IBM eServer i5 is an excellent platform for
minimizing risk and costs associated with implementation of this powerful technology.

Additional Resources:

- DB2 UDB for iSeries Home Page:
  http://www.ibm.com/servers/eserver/iseries/db2
- Indexing Strategies for DB2 UDB for iSeries:
  http://www-1.ibm.com/servers/enable/site/education/abstracts/indxng_abs.html
- DB2 UDB for iSeries Education Roadmap:
  http://www-1.ibm.com/servers/eserver/iseries/db2/gettingstarted.html
- For more information on remote journaling and Data Propagator, refer to:
  http://www-1.ibm.com/servers/eserver/iseries/whpapr/data_rep_sol.html
