eServer i5 and DB2: Business Intelligence Concepts
The added value of the middle (relational) tier in the three-tier architecture is that
business rules are applied to the data there, so that consistency and integrity are
maintained across all the data marts. For instance, NET SALES might mean something
different to the Sales Manager than it does to accounting personnel. But if NET SALES is
defined only once, in the middle tier, there is no argument because the same definition is
preserved across the various data marts.
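The idea of defining a measure once in the middle tier can be sketched in a few lines. The example below is only an illustration: it uses Python's built-in SQLite module as a stand-in for the relational tier, and the table name, view name, and the formula chosen for NET SALES (gross minus returns minus discounts) are assumptions made for the sketch, not definitions taken from this paper.

```python
import sqlite3

# In-memory SQLite database standing in for the middle (relational) tier.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (order_id INTEGER, region TEXT,
                        gross REAL, returns REAL, discounts REAL);
    INSERT INTO sales VALUES (1, 'East', 1000.0, 50.0, 100.0),
                             (2, 'East',  500.0,  0.0,  25.0),
                             (3, 'West',  800.0, 80.0,   0.0);
    -- The single, shared business rule (hypothetical definition of NET SALES).
    CREATE VIEW net_sales_v AS
        SELECT order_id, region, gross - returns - discounts AS net_sales
        FROM sales;
""")

# Two "data marts" read from the same view, so both inherit one definition.
sales_mart = conn.execute(
    "SELECT region, SUM(net_sales) FROM net_sales_v "
    "GROUP BY region ORDER BY region"
).fetchall()
finance_total = conn.execute(
    "SELECT SUM(net_sales) FROM net_sales_v"
).fetchone()[0]

print(sales_mart)
print(finance_total)
```

Because both results derive from the same view, the regional figures seen by the sales team always add up to the total seen by accounting.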
A common data mart implementation leverages a concept called Online Analytical
Processing, or OLAP. In many cases, OLAP-based solutions can minimize the complexity of
managing data marts through utilities that can automate:
The benefit of OLAP to end users is the ability to slice and dice through data in a
high-performance, intuitive, and iterative manner. For example, a classic OLAP data mart for
sales analysis gives end users easy answers to questions such as:
What is the comparison of sales versus forecast across sales territories, product sets,
time frames and customers?
What products are showing more positive revenue trends through e-Commerce
channels versus traditional brick and mortar channels?
What is product or customer performance by revenue, cost of goods sold, gross
margin or other measurement by quarter?
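Questions like these amount to grouping and filtering a fact table along its dimensions. The following is a minimal sketch of that slice-and-dice idea in plain Python, not a description of how any OLAP product works internally. The fact rows, the dimension names, and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (territory, product, quarter, channel, sales, forecast)
FACTS = [
    ("East", "Widgets", "Q1", "web",   120.0, 100.0),
    ("East", "Widgets", "Q2", "store",  90.0, 110.0),
    ("West", "Gadgets", "Q1", "web",   200.0, 180.0),
    ("West", "Widgets", "Q1", "store",  60.0,  80.0),
]
DIMS = ("territory", "product", "quarter", "channel")

def roll_up(facts, by, where=None):
    """Sum sales and forecast grouped by the `by` dimensions.

    `where` fixes one or more dimensions to a value (the "slice").
    """
    where = where or {}
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in facts:
        dims = dict(zip(DIMS, row[:4]))
        if any(dims[name] != value for name, value in where.items()):
            continue  # row falls outside the requested slice
        key = tuple(dims[name] for name in by)
        totals[key][0] += row[4]  # sales
        totals[key][1] += row[5]  # forecast
    return {key: tuple(pair) for key, pair in totals.items()}

# Sales versus forecast by territory.
by_territory = roll_up(FACTS, by=("territory",))
# Same measures by channel, sliced to the first quarter only.
q1_by_channel = roll_up(FACTS, by=("channel",), where={"quarter": "Q1"})

print(by_territory)
print(q1_by_channel)
```

Changing the `by` and `where` arguments is the programmatic equivalent of an end user pivoting or drilling into a different view of the same data.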
Fraud Detection
Customer Segmentation, Scoring
Risk Assessment
Equipment Preventative Maintenance
Market Basket Analysis
While the build approach may take longer to initially implement the first phase of the
project, the key benefit is in leveraging a set of utilities that are open, extensible, and very
robust. This provides a foundation for future growth, including flexibility to add
additional data sources or applications, or manage changing business requirements or
computing environments without re-inventing the wheel. Challenges with this approach
include the potential need to integrate multiple tools, possibly from multiple vendors, and
the learning curve on these new product sets.
The buy approach attempts to simplify the process. Many vendors, in particular
operational system software providers such as ERP or CRM vendors, have chosen to add
plug-in data mart applications as part of their core offerings. These BI applications are
typically single data marts where the extraction, loading, and reporting functions have
been pre-built. While in theory there is no such thing as a load-and-go BI application
because of data cleansing issues, differing reporting requirements, and customization of
the operational systems, this approach can result in a much shorter implementation time.
In addition, much of the challenge of building a BI application from scratch is in
identifying data elements and business rules in the source systems, and the plug-in data
mart provides a huge benefit here.
One definite consideration with the plug-in data mart solution is how open and
extensible it is. If it is based on industry-recognized data warehouse toolsets and conforms
to open standards (for example, SQL), then it will be much easier to leverage the
investment for other analytical applications outside of just the ERP or CRM solution. If
the solution is based on proprietary tools or code, then keep in mind it may be more
difficult and costly to enhance the solution to pull in other data sources, add additional
analysis functionality, or leverage various end user interfaces that different users require
because of unique needs. Finally, if the solution does not allow you to build an
architecturally sound data warehouse infrastructure, it may limit the ability to build on top
of the solution without re-architecting the data mart implementation.
Another approach to the plug-in data mart comes from BI software vendors, as opposed
to operational system software providers. Many solutions based on open, extensible data
warehouse toolsets from these BI software providers include pre-built plug-ins for various
operational systems, thereby achieving the best of both the buy and build worlds.
These solutions can achieve fast payback, as the buy solutions do, while typically
providing a foundation for expansion.
Choosing the approach that is right for you will depend on any number of factors,
including:
Cost
ROI payback period
Extensibility of solution
Ability to leverage existing in-house tools, databases, hardware, skills
Market position and industry acceptance
Database capabilities
Tools support
Scalability
Total cost of ownership
Openness
On top of those considerations, reducing risk is critical to ensure success of the project.
Risk can be significantly reduced through:
The IBM eServer i5 is a great foundation for a BI application. It is an especially great fit
in situations where there is already a strong iSeries or AS/400 presence, for example,
where a customer has an operational application running on that platform today. But it is
also a great fit in situations where customers are looking for a highly reliable, scalable
solution that does not require the administrative costs of other solutions.
Let's look at some of the attributes of the i5 in a BI environment.
Source: The Impact of OS/Platform Selection on the Cost of ERP Implementation, Use and Management - META Group - 07/25/02
capability without software vendors providing new versions that have had to be enhanced
to fully take advantage of the new technology.
The eServer i5 is now on its ninth generation of 64-bit RISC technology, and customers have
been able to move to these processor enhancements with minimal disruption.
DB2
DB2 Universal Database (UDB) is the relational database management system (RDBMS)
that is built into the operating system for the IBM eServer i5. It is an open, extensible,
high-performance, scalable database that leverages the i5 architecture to maintain its
value proposition of lower total cost of ownership.
DB2 in a BI environment provides many advantages.
1. Open Interfaces
Adherence to industry standards allows for more choices when evaluating tools or BI
applications. Portability of those applications or databases, and interconnectivity to many
different platforms/databases are also key benefits of open standard support. Additionally,
because of common standards support such as the SQL standard, companies are better
positioned to leverage available skills in the workforce. DB2 supports more elements of
the core SQL standard today than any other database.
2. Scalability/Performance
The sophisticated cost-based query optimizer built into DB2 is the basis of achieving
optimal performance in a BI application. The optimizer provides the brains behind the
SQL processing on which most BI applications depend. The optimizer's goal is to build
the best plan for accessing data, which will minimize extraneous I/O and therefore
provide optimal performance. IBM has continued to invest in DB2's cost-based
optimization technology over time, adding features that make the execution of complex
queries more efficient. The following are some of those features.
Real-time statistics: DB2's cost-based optimizer uses statistics that are captured in real time
and stored in the database, as well as other information such as system configuration,
SMP settings, and available indexes, in its development of the access plan. Statistics are
extremely important to the optimizer's ability to choose the best plan. For example,
data skew relates to the distribution of values within a database table (physical file). For
instance, in a customer table you may have many more customers within one
region/state/zip code than another. The customers are not evenly distributed across the
regions because of population differences, or because only certain products are sold in
certain regions, etc. When an SQL statement that is selecting customers from a specific
set of zip codes is issued, it is important to understand this distribution to build the most
effective plan for retrieving that data. Many optimizers assume equal distribution and that
can result in an inefficient plan being executed. But DB2 maintains information about
data skew, which significantly aids the overall performance of SQL processes.
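To make the effect of data skew concrete, the sketch below compares a row-count estimate that assumes an even distribution with one that uses per-value frequencies. It is a toy model in Python, not DB2's actual statistics or costing logic. The zip codes, the row counts, and both estimator functions are invented for illustration.

```python
from collections import Counter

# Hypothetical customer table: one zip code holds 90% of the customers.
zip_codes = ["10001"] * 900 + ["59001"] * 50 + ["82001"] * 50

# Per-value frequencies, standing in for the statistics an optimizer keeps.
frequency = Counter(zip_codes)

def uniform_estimate(total_rows, distinct_values, wanted):
    """Estimate matching rows assuming every value is equally common."""
    return total_rows * len(wanted) / distinct_values

def skew_aware_estimate(frequency, wanted):
    """Estimate matching rows from the recorded frequency of each value."""
    return sum(frequency.get(value, 0) for value in wanted)

wanted = {"10001"}
naive = uniform_estimate(len(zip_codes), len(frequency), wanted)
informed = skew_aware_estimate(frequency, wanted)

print(f"uniform assumption: about {naive:.0f} rows")
print(f"with skew statistics: {informed} rows")
```

An optimizer working from the uniform estimate would expect roughly a third of the table and might favor an index probe, while the skew-aware figure shows that the predicate matches most of the table, where a scan could be the cheaper plan.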
Database parallelism: A key consideration in any database platform in support of BI
applications is its ability to scale while maintaining acceptable performance. DB2 offers a
variety of techniques to address scalability/performance. Parallel database operations
through the Symmetric Multiprocessing (SMP) feature of i5/OS (OS/400) provide the
ability to split single database tasks/requests across multiple processors within a single i5
system or logical partition (LPAR). The SMP feature provides significant performance
enhancements to database queries, loads, index builds and other operations commonly
used by BI applications.
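The split-and-merge pattern behind this kind of parallelism can be sketched as follows. This is only a structural illustration in Python. The function names and the `degree` parameter are hypothetical, and Python threads are used for simplicity, so the sketch shows how one request is divided and recombined rather than demonstrating a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_partition(rows, lo, hi):
    """Aggregate one contiguous slice of the table."""
    return sum(amount for _, amount in rows[lo:hi])

def parallel_sum(rows, degree=4):
    """Split a single aggregation across `degree` workers and merge the results."""
    chunk = max(1, (len(rows) + degree - 1) // degree)  # ceiling division
    bounds = [(start, min(start + chunk, len(rows)))
              for start in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partials = pool.map(lambda b: scan_partition(rows, *b), bounds)
        return sum(partials)  # merge step: combine the partial aggregates

# Hypothetical table of (row_id, amount) pairs.
rows = [(i, float(i % 10)) for i in range(1000)]
total = parallel_sum(rows)
print(total)
```

The caller issues one request and receives one answer; the partitioning into slices and the final merge are hidden inside the function, which mirrors how a single query is spread over several processors without the application changing.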
Advanced indexing: A key (IBM patented) performance technology is DB2's Encoded
Vector Indexes (EVIs). EVIs offer a dramatic advancement over bitmap indexing
technology. Bitmap indexing is used by some database solutions to improve performance
in query-heavy environments. Bitmap indexing is based on using a single bit (instead of
multiple bytes of data) to indicate that a specific key value can be found in a row in the
database table. Each distinct key value requires its own bitmap array. EVIs use a single
array for all index values, making them smaller, faster, more scalable, and simpler to
manage, search, and update. For example, a bitmap built over a location field with 250
distinct values would require over 31 GB to store just the index for a file with one billion
records. This same index using EVIs would require only 1 GB, about 30 times less storage
than the bitmap array. Using this advanced indexing feature, DB2 was able to run a query
across a 225 GB table in a little over 35 seconds. This same query previously took over
two hours.
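The storage figures quoted above can be reproduced with simple arithmetic. The sketch below assumes one bit per row for each distinct value in the bitmap case and, for the EVI, a single vector holding one fixed-width code per row, with the width being the smallest whole number of bytes that can represent all distinct values. That width rule and the tiny `build_evi` encoder are simplifying assumptions for illustration, not a specification of DB2's on-disk format.

```python
def bitmap_index_bytes(rows, distinct_values):
    """One bitmap per distinct value, one bit per row in each bitmap."""
    return rows * distinct_values // 8

def evi_vector_bytes(rows, distinct_values):
    """One vector for all values, one fixed-width code per row (assumed layout)."""
    bits_needed = (distinct_values - 1).bit_length()
    code_width = max(1, (bits_needed + 7) // 8)  # whole bytes per code
    return rows * code_width

def build_evi(column):
    """Encode a column as a symbol table plus a vector of codes."""
    symbol_table = {}
    vector = []
    for value in column:
        # Assign the next unused code the first time a value is seen.
        code = symbol_table.setdefault(value, len(symbol_table))
        vector.append(code)
    return symbol_table, vector

rows, distinct = 1_000_000_000, 250
print(bitmap_index_bytes(rows, distinct) / 1e9, "GB for the bitmaps")
print(evi_vector_bytes(rows, distinct) / 1e9, "GB for the EVI vector")
print(build_evi(["NY", "CA", "NY", "TX"]))
```

With 250 distinct values every code fits in one byte, so the vector needs one byte per row, while the bitmaps need 250 bits (31.25 bytes) per row. That accounts for the roughly 30-fold difference.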
Materialized Query Tables (MQTs) and result set caching: Performance in a data
warehouse environment can be significantly improved for repetitive user queries through
the use of summary tables or caching techniques. Recent DB2 enhancements include
query result set caching, which means that for repetitive queries DB2 can reuse
work that has already been done, minimizing the need to re-run the entire query.
Materialized Query Tables are an implementation of DB2-aware materialized views (or
summary tables). DB2 currently supports creation of MQTs through SQL syntax, and will
be enhanced to support automatic maintenance of MQTs by DB2.
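The summary-table idea can be illustrated with a small, self-contained sketch. It uses Python's built-in SQLite module, which has no materialized query tables, so the summary is emulated with an ordinary table that is rebuilt on demand. The table names and the `refresh_summary` helper are hypothetical, and the DB2 syntax for creating an MQT is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 'Q1', 100.0), ('East', 'Q1', 50.0),
                             ('East', 'Q2',  70.0), ('West', 'Q1', 200.0);
""")

def refresh_summary(conn):
    """Rebuild the pre-aggregated summary table from the detail rows."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_by_region_qtr;
        CREATE TABLE sales_by_region_qtr AS
            SELECT region, quarter, SUM(amount) AS total
            FROM sales
            GROUP BY region, quarter;
    """)

def west_q1_total(conn):
    """Answer a repetitive query from the summary, not the detail table."""
    return conn.execute(
        "SELECT total FROM sales_by_region_qtr "
        "WHERE region = 'West' AND quarter = 'Q1'"
    ).fetchone()[0]

refresh_summary(conn)
before = west_q1_total(conn)

# New detail rows are not visible in the summary until it is refreshed.
conn.execute("INSERT INTO sales VALUES ('West', 'Q1', 25.0)")
stale = west_q1_total(conn)
refresh_summary(conn)
after = west_q1_total(conn)

print(before, stale, after)
```

The stale reading between the insert and the refresh is the maintenance question that any summary-table scheme has to answer, whether the refresh is triggered by the user or handled by the database.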
3. Reduced Costs through Simplified Management
The eServer i5's unique architecture, including Single Level Storage and a tightly
integrated, object-based operating system, has been the foundation for its leadership in
Total Cost of Ownership for many years. In addition, DB2 UDB leverages those
architectural benefits by reducing the tasks that typically fall to a database administrator
(DBA) in other architectures. For example, with DB2 UDB:
Data partitioning for performance: NOT REQUIRED
DB2 UDB minimizes complexity and costs, improving reliability and availability of the
data warehouse, with less manual intervention by DBA staff.
but the data is already there as well, minimizing the complexity and potential negative
impact of the extraction process. The backup system/LPAR can then also be tuned to
optimize query processing versus having to share its resources with transaction oriented
workloads on the production machine.
Business intelligence applications can be a very powerful enabler for any company
looking to gain insights into its data. The IBM eServer i5 is an excellent platform for
minimizing the risk and costs associated with implementing this powerful technology.