DB2 SQL Tuning

Topics of Discussion

- General Recommendation
- Predicate Evaluation
- Filter Factor
- Column Correlation
- Writing Subqueries
- Special Techniques to Influence the Access Path
- Using DB2 EXPLAIN
- Different Access Types
- Join Methods
- DB2 Data and Index Page Prefetch
- Sorting of Data and RIDs
- View Merge/Materialization
- Query Parallelism
- DB2 V5 Features

General Recommendation
- Keep queries as simple as possible.
- Do not fetch rows the program will not use; let DB2 do the filtering rather than the application program.
- Do not select columns the program will not use.
- Avoid unnecessary ORDER BY and GROUP BY clauses.
- Use page-level locking and keep lock duration as short as possible.
- Handle big tables with care.
- Use indexable predicates wherever possible.
- Do not code redundant predicates.
- Make sure the declared length of a host variable is not greater than the length attribute of the column it is compared with.
- If efficient indexes are available on the tables in a subquery, a correlated subquery is likely to perform better; otherwise a non-correlated subquery is likely to perform better.
- If a query has multiple subqueries, order them in the most efficient manner.

Predicate
Predicates appear in the WHERE, ON and HAVING clauses of an SQL statement. ON predicates are applied first, then WHERE predicates; HAVING predicates are applied after data access, so they are never used to access the data. The predicate type has a great impact on the access path DB2 chooses. The general predicate types are subquery, equal, range, IN-list and NOT predicates.

Predicates are also categorized as indexable or non-indexable, and as stage 1 or stage 2 depending on when they are applied during query evaluation. For an outer join, predicates in the ON clause are treated as stage 2 predicates, and most other predicates are applied after the join, also as stage 2 predicates. Predicates on a table expression can be evaluated before the join as stage 1 predicates.

Order of Predicate Evaluation

Predicates are evaluated in the following sequence:
- Indexable matching predicates are applied first.
- Indexable non-matching predicates are applied next; this processing is called index screening. All indexable predicates are stage 1 predicates.
- After data page access, the other stage 1 predicates are applied.
- Finally, stage 2 predicates are applied to the returned data rows.

Within each stage, predicates are evaluated as follows:
- Equal predicates are applied first.
- Range predicates are applied next.
- All other predicates are applied last.

After both sets of rules have been applied, predicates are evaluated in the order in which they appear in the query.
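
The two sets of rules above can be modelled as a sort key: stage first, then predicate type, then position in the query. This is a minimal sketch under that reading of the rules, not DB2 code; the Predicate class and its stage/kind labels are hypothetical, and in practice DB2 classifies each predicate itself.

```python
from dataclasses import dataclass

# Hypothetical labels for the evaluation sequence described above.
STAGE_RANK = {"matching": 0, "screening": 1, "stage1": 2, "stage2": 3}
# Within a stage: equal predicates first, range predicates next, all others last.
TYPE_RANK = {"equal": 0, "range": 1}


@dataclass
class Predicate:
    text: str      # predicate as coded in the WHERE clause
    stage: str     # "matching", "screening", "stage1" or "stage2"
    kind: str      # "equal", "range", or any other type (e.g. "like")
    position: int  # order of appearance in the query


def evaluation_order(preds):
    """Order predicates by stage, then by type, then by position in the query."""
    return sorted(
        preds,
        key=lambda p: (STAGE_RANK[p.stage], TYPE_RANK.get(p.kind, 2), p.position),
    )


preds = [
    Predicate("C3 LIKE 'A%'", "stage1", "like", 0),
    Predicate("C2 > 10", "stage1", "range", 1),
    Predicate("C1 = 5", "matching", "equal", 2),
    Predicate("C4 = 7", "stage1", "equal", 3),
]
# Prints C1 = 5, C4 = 7, C2 > 10 and C3 LIKE 'A%', one per line.
for p in evaluation_order(preds):
    print(p.text)
```

The matching predicate comes first even though it is coded last; among the stage 1 predicates, the position in the query only breaks ties within the same type.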

Filtering
For stage 1 predicates, DB2 applies the most restrictive predicate first to reduce the processing needed at the next stage. It is therefore better to code the most restrictive predicate (the one with the lowest filter factor) first. DB2 estimates the filter factor from the catalog statistics in the SYSCOLUMNS and SYSCOLDIST tables.

If the distribution of a column's values is not available in the catalog table SYSIBM.SYSCOLDIST, DB2 assumes the values are uniformly distributed when it estimates filtering. Make sure the catalog statistics are current, either by updating them manually or by running RUNSTATS. If no information is available in the catalog at all, DB2 uses a default filter factor. DB2 also uses default filter factors for predicates in static SQL that use host variables, because the values are not known at bind time.
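
The fallback order described above (distribution statistics, then a uniform distribution, then defaults) can be sketched as follows. The default values are the ones documented for simple predicates in DB2 for z/OS (1/25 for an equal predicate, 1/3 for a range comparison, 1/10 for BETWEEN and LIKE); verify them for your DB2 version. The function names and the cap at 1 for IN lists are assumptions of this sketch.

```python
from typing import Optional

# Default filter factors used when the catalog has no statistics
# (values as documented for DB2 for z/OS; check your DB2 version).
DEFAULT_FF = {"=": 1 / 25, "<, <=, >, >=": 1 / 3, "BETWEEN": 1 / 10, "LIKE": 1 / 10}


def equal_ff(colcardf: Optional[float], value_freq: Optional[float] = None) -> float:
    """Estimated filter factor of COL = value.

    colcardf:   column cardinality from SYSCOLUMNS, or None if not collected
    value_freq: frequency of this value from SYSCOLDIST, or None if not recorded
    """
    if value_freq is not None:  # distribution statistics exist for the value
        return value_freq
    if colcardf is not None:    # no distribution: assume uniform distribution
        return 1 / colcardf
    return DEFAULT_FF["="]      # no statistics at all: default filter factor


def in_list_ff(n_items: int, colcardf: Optional[float]) -> float:
    """Estimated filter factor of COL IN (v1, ..., vn), capped at 1."""
    return min(1.0, n_items * equal_ff(colcardf))


print(equal_ff(200))                  # 0.005 (uniform: 1/COLCARDF)
print(equal_ff(200, value_freq=0.3))  # 0.3 (value recorded in SYSCOLDIST)
print(equal_ff(None))                 # 0.04 (default 1/25)
```

The second call shows why RUNSTATS matters for skewed data: the uniform estimate (0.005) and the recorded frequency (0.3) differ by a factor of 60.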

Column Correlation
Two columns of a table are correlated if their values depend on each other (for example, CITY and ZIPCODE). When a query uses highly correlated columns, DB2 might not choose the optimum access path, table order or join method. DB2 combines the filter factors of the individual predicates as if the columns were independent, so correlated columns make a query look cheaper than it actually is. Run RUNSTATS so that the catalog records the correlation and DB2 can estimate the actual filter factor.
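
A small worked example of why correlation misleads the estimate: multiplying two filter factors assumes independence. The ADDRESS rows below are hypothetical data in which ZIPCODE determines CITY.

```python
# Hypothetical ADDRESS rows (CITY, ZIPCODE): ZIPCODE determines CITY.
rows = (
    [("SAN JOSE", "95141")] * 40
    + [("SAN JOSE", "95123")] * 10
    + [("DALLAS", "75201")] * 50
)


def ff(predicate):
    """Actual fraction of rows that satisfy a predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


city_ff = ff(lambda r: r[0] == "SAN JOSE")  # 0.5
zip_ff = ff(lambda r: r[1] == "95141")      # 0.4
estimated = city_ff * zip_ff                # independence assumption: 0.2
actual = ff(lambda r: r[0] == "SAN JOSE" and r[1] == "95141")  # 0.4

# estimated rows: 20, actual rows: 40
print(f"estimated rows: {estimated * len(rows):.0f}, actual rows: {actual * len(rows):.0f}")
```

The predicate on CITY adds no filtering once ZIPCODE is known, yet the product halves the estimate, so the query looks twice as cheap as it is.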

Using host variable effectively

If static SQL uses host variables, DB2 might not select the optimum access path, because it uses default filter factors for predicates that contain host variables. There are two ways to change the access path for such a query:
- Bind with the REOPT(VARS) option so that DB2 determines the access path again at run time, when the host variable values are known.
- Rewrite the SQL in a different way.
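
A toy illustration of the difference REOPT(VARS) can make on skewed data. Everything here is hypothetical: the STATUS frequencies, the 0.15 cut-off and the single-rule cost model are stand-ins for DB2's real cost estimation; only the 1/25 default for an equal predicate comes from the DB2 documentation.

```python
# Hypothetical frequencies of STATUS values, as SYSCOLDIST might record them.
FREQ = {"OPEN": 0.90, "CLOSED": 0.09, "ERROR": 0.01}
DEFAULT_EQUAL_FF = 1 / 25  # default filter factor for COL = :hostvar
INDEX_THRESHOLD = 0.15     # hypothetical cut-off, not a DB2 parameter


def choose_path(ff):
    """Toy rule: an index pays off only when few rows qualify."""
    return "index scan" if ff < INDEX_THRESHOLD else "tablespace scan"


# Bind time without REOPT(VARS): the host variable value is unknown.
bind_time_path = choose_path(DEFAULT_EQUAL_FF)


def run_time_path(value):
    """With REOPT(VARS) the value, and therefore its frequency, is known."""
    return choose_path(FREQ.get(value, DEFAULT_EQUAL_FF))


print(bind_time_path)          # index scan
print(run_time_path("OPEN"))   # tablespace scan
print(run_time_path("ERROR"))  # index scan
```

The bind-time path is right for ERROR but wrong for OPEN, where 90% of the rows qualify; re-optimizing with the actual value picks a different path.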

Writing Subquery
A subquery can be correlated or non-correlated. A correlated subquery is evaluated once for each row returned by the outer query. A non-correlated subquery is evaluated first, and then the outer query is evaluated. DB2 can sometimes transform a subquery into a join; otherwise the application programmer may have to rewrite it as a join for better performance. Many subqueries can be rewritten as joins.

If columns from both tables are needed in the result, a join is the better choice.

Guidelines for writing efficient subqueries:
- If efficient indexes are available on the tables in the subquery, a correlated subquery is likely to be the most efficient kind of subquery.
- If no efficient indexes are available on the tables in the subquery, a non-correlated subquery is likely to perform better.
- If a parent query contains multiple subqueries, make sure they are ordered in the most efficient manner.
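
The evaluation difference between the two kinds of subquery can be sketched with hypothetical EMP and DEPT tables held as Python lists. The SQL in the docstrings is illustrative; the Python only counts how often the subquery is evaluated.

```python
# Hypothetical EMP and DEPT tables.
emp = [{"empno": i, "dept": i % 5} for i in range(100)]
dept = [{"deptno": d, "budget": 50_000 if d < 2 else 5_000} for d in range(5)]


def correlated(emp, dept):
    """SELECT EMPNO FROM EMP E WHERE EXISTS
         (SELECT 1 FROM DEPT D WHERE D.DEPTNO = E.DEPT AND D.BUDGET > 10000)
    The subquery is evaluated once per outer row."""
    evaluations, result = 0, []
    for e in emp:
        evaluations += 1
        # A linear scan of DEPT: with an index on DEPTNO this probe would be cheap.
        if any(d["deptno"] == e["dept"] and d["budget"] > 10_000 for d in dept):
            result.append(e["empno"])
    return result, evaluations


def non_correlated(emp, dept):
    """SELECT EMPNO FROM EMP WHERE DEPT IN
         (SELECT DEPTNO FROM DEPT WHERE BUDGET > 10000)
    The subquery is evaluated once and its result is kept for the outer query."""
    rich = {d["deptno"] for d in dept if d["budget"] > 10_000}
    return [e["empno"] for e in emp if e["dept"] in rich], 1


r1, n1 = correlated(emp, dept)
r2, n2 = non_correlated(emp, dept)
print(r1 == r2, n1, n2)  # True 100 1
```

The correlated form pays one probe per outer row, which is acceptable only when each probe is cheap (an efficient index); otherwise evaluating the subquery once wins.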

Some Special Techniques

- OPTIMIZE FOR n ROWS
- Reducing the number of matching columns for an index scan
- Adding extra local predicates
- Changing an inner join to an outer join
- Updating catalog statistics

OPTIMIZE FOR n ROWS

With OPTIMIZE FOR n ROWS, DB2 chooses the access path that minimizes the response time for retrieving the first few rows. The clause does not stop the user from fetching the whole result set. It is not useful when DB2 has to gather the whole result set before it can return the first n rows (for example, when a sort is needed).
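
The idea behind OPTIMIZE FOR n ROWS can be shown with a toy cost model: a plan with a large start-up cost (such as a sort) can be cheapest for the whole result set but worst for the first few rows. The two plans and their costs are hypothetical, not DB2 cost units.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    startup: float  # cost before the first row can be returned (e.g. a sort)
    per_row: float  # cost of each row returned

    def cost(self, rows: int) -> float:
        return self.startup + self.per_row * rows


# Hypothetical plans for SELECT ... ORDER BY C1 returning 100,000 rows.
plans = [
    Plan("index scan in C1 order, no sort", startup=0, per_row=1.0),
    Plan("tablespace scan + sort", startup=30_000, per_row=0.2),
]


def best_plan(rows_needed: int) -> Plan:
    return min(plans, key=lambda p: p.cost(rows_needed))


print(best_plan(100_000).name)  # tablespace scan + sort
print(best_plan(10).name)       # index scan in C1 order, no sort
```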

Influencing access path

DB2 chooses the access path based on the information available in the catalog tables. Wrong or missing catalog information can lead DB2 to choose the wrong access path, for example:
- the wrong index is selected, or
- an index is selected where a tablespace scan would be more effective.

One way to influence the access path is to code an extra predicate, or change an existing one, so that DB2 selects a different access path.

Other factors
Adding an extra predicate can also influence the choice of join method. An extra predicate lowers the estimated filter factor, so DB2 expects fewer qualifying rows and may select a nested loop join. The proper type of predicate to add is WHERE T1.C1 = T1.C1, which does not change the result when T1.C1 is defined NOT NULL.
Hybrid join is a costlier method, and it is not used for outer joins. So if DB2 uses a hybrid join, you can convert the inner join to an outer join and add extra predicates to remove the unneeded rows that the outer join adds.
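
The result-preserving claims behind both techniques can be checked with any SQL engine. This sketch uses SQLite only because it ships with Python; it checks SQL semantics, not DB2's optimizer or join method choice. The tables T1 and T2 and their data are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (ID INTEGER, C1 INTEGER);  -- C1 is nullable here
    CREATE TABLE T2 (C1 INTEGER, NAME TEXT);
    INSERT INTO T1 VALUES (1, 10), (2, 20), (3, NULL);
    INSERT INTO T2 VALUES (10, 'A'), (30, 'C');
""")


def ids(sql):
    return sorted(row[0] for row in con.execute(sql))


# 1) T1.C1 = T1.C1 keeps every row except those where C1 IS NULL
#    (NULL = NULL is unknown), so it is harmless only for NOT NULL columns.
all_rows = ids("SELECT ID FROM T1")
with_extra = ids("SELECT ID FROM T1 WHERE T1.C1 = T1.C1")

# 2) An inner join equals a left outer join plus a predicate that removes
#    the rows the outer join adds (outer rows with no match in T2).
inner = ids("SELECT T1.ID FROM T1 INNER JOIN T2 ON T1.C1 = T2.C1")
outer = ids("""SELECT T1.ID FROM T1 LEFT OUTER JOIN T2 ON T1.C1 = T2.C1
               WHERE T2.C1 IS NOT NULL""")

print(all_rows, with_extra, inner, outer)  # [1, 2, 3] [1, 2] [1] [1]
```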

Updating catalog tables

- The access path is based on the column statistics stored in the catalog.
- Catalog statistics can be updated manually or by running RUNSTATS with the appropriate options.
- For tables that change frequently, access paths may suffer because the statistics no longer reflect the data.
- Running RUNSTATS is a costly process, so catalog statistics are sometimes updated manually instead.
- Static SQL must be rebound after the catalog statistics are updated.

DB2 EXPLAIN AND TUNING

EXPLAIN is a monitoring tool that produces information about a plan, package or SQL statement when it is bound. The output appears in a user-supplied table called PLAN_TABLE. EXPLAIN helps you to:
- design databases, indexes and application programs;
- determine when to rebind an application;
- determine the access path chosen for a query.

DB2 EXPLAIN OUTPUT

EXPLAIN output is stored in PLAN_TABLE:
- Each plan is identified by the APPLNAME column.
- Each package is identified by the PROGNAME, COLLID and VERSION columns.
- A package can contain multiple SQL statements; each is identified by QUERYNO.
- Each query is evaluated in multiple steps; each query block is identified by QBLOCKNO and each step within it by PLANNO.
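
A sketch of reading the access path of one package in step order. The PLAN_TABLE here is a hypothetical, cut-down copy with only a few of the real columns, loaded into SQLite so the query can run without DB2; the ORDER BY QUERYNO, QBLOCKNO, PLANNO pattern is the part that carries over.

```python
import sqlite3

# A cut-down, hypothetical PLAN_TABLE with only some of the real columns.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE PLAN_TABLE (
    PROGNAME TEXT, COLLID TEXT, QUERYNO INTEGER, QBLOCKNO INTEGER,
    PLANNO INTEGER, TNAME TEXT, METHOD INTEGER, ACCESSTYPE TEXT,
    MATCHCOLS INTEGER, INDEXONLY TEXT, PREFETCH TEXT)""")
con.executemany(
    "INSERT INTO PLAN_TABLE VALUES (?,?,?,?,?,?,?,?,?,?,?)",
    [
        ("PGMA", "COLL1", 120, 1, 2, "DEPT", 1, "I", 1, "N", ""),
        ("PGMA", "COLL1", 120, 1, 1, "EMP", 0, "R", 0, "N", "S"),
        ("PGMA", "COLL1", 100, 1, 1, "EMP", 0, "I", 2, "Y", ""),
    ],
)

# One row per step, in the order DB2 executes them.
steps = con.execute("""
    SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME, METHOD, ACCESSTYPE, MATCHCOLS
    FROM PLAN_TABLE
    WHERE PROGNAME = 'PGMA' AND COLLID = 'COLL1'
    ORDER BY QUERYNO, QBLOCKNO, PLANNO""").fetchall()

for step in steps:
    print(step)
```

Query 120 reads EMP by tablespace scan first (PLANNO 1) and then joins DEPT by nested loop (METHOD = 1) through a one-column matching index scan.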

Type of Access
- Tablespace scan (ACCESSTYPE = R)
- Index scan, which can be categorized into:
  - Index-only access (INDEXONLY = Y)
  - Multiple index scan (ACCESSTYPE = M, MI, MU, MX)
  - Matching index scan (MATCHCOLS > 0)
  - Non-matching index scan (MATCHCOLS = 0)
  - One-fetch access (ACCESSTYPE = I1)
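
The mapping from PLAN_TABLE columns to the access types listed above can be written as a small lookup. This is a simplified sketch covering only the values named in this document (including ACCESSTYPE = N for an IN-list scan); describe_access is a hypothetical helper.

```python
def describe_access(accesstype: str, matchcols: int = 0, indexonly: str = "N") -> str:
    """Translate ACCESSTYPE, MATCHCOLS and INDEXONLY into an access type name."""
    if accesstype == "R":
        kind = "tablespace scan"
    elif accesstype in ("M", "MI", "MU", "MX"):
        kind = "multiple index scan"
    elif accesstype == "I1":
        kind = "one-fetch access"
    elif accesstype == "N":
        kind = "IN-list index scan"
    elif accesstype == "I":
        kind = "matching index scan" if matchcols > 0 else "non-matching index scan"
    else:
        kind = f"other access type ({accesstype})"
    if indexonly == "Y":
        kind += ", index-only"
    return kind


print(describe_access("I", matchcols=2, indexonly="Y"))  # matching index scan, index-only
```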

Tablespace scan (ACCESSTYPE=R)

A tablespace scan is typically used when:
- A matching index scan is not possible, because no index is available or there are no predicates that match the index columns.
- A high percentage of the rows in the table is returned. In this case an index is not really useful, because most rows need to be read anyway.
- The indexes that have matching predicates have low cluster ratios and are therefore efficient only for small amounts of data.

Sequential prefetch is used (PREFETCH = S).

Using Index
Define indexes based solely on how the application fetches data. A properly defined index can avoid a sort. You need to trade off the cost of defining and maintaining an index against the performance gain.

Matching Index Scan

A matching index scan provides filtering. It is possible when predicates are specified on the leading index key columns (or on all of them), and it is efficient when the degree of filtering is high. MATCHCOLS gives the number of matching columns. If more than one index is available, DB2 uses the index with the most restrictive filtering for the matching index scan.

Non-Matching IX scan
In a non-matching index scan (MATCHCOLS = 0, ACCESSTYPE = I), DB2 reads the index without any matching predicates. It is often combined with index screening: DB2 selects index screening when predicates are specified on index key columns that are not part of the matching columns. Index screening predicates improve index access by reducing the number of rows that qualify while the index is searched.
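
Which predicates become matching and which become screening predicates can be sketched with a simplified version of the rule: walk the index key from the leading column; an equal predicate matches and matching continues, a range predicate matches and matching stops, and a column without a predicate stops matching. The function name and the input format are hypothetical, and real DB2 rules have more cases (for example IN lists).

```python
def matching_and_screening(index_cols, predicates):
    """Return (MATCHCOLS, screening columns) for indexable predicates.

    index_cols: index key columns in key order, e.g. ["C1", "C2", "C3"]
    predicates: dict column -> "equal" or "range"
    """
    matchcols = 0
    for col in index_cols:
        kind = predicates.get(col)
        if kind is None:       # gap in the key: no more matching columns
            break
        matchcols += 1
        if kind == "range":    # a range predicate is the last matching column
            break
    # Remaining key columns that still have predicates are used for screening.
    screening = [c for c in index_cols[matchcols:] if c in predicates]
    return matchcols, screening


ix = ["C1", "C2", "C3"]
print(matching_and_screening(ix, {"C1": "equal", "C3": "equal"}))  # (1, ['C3'])
print(matching_and_screening(ix, {"C2": "equal", "C3": "range"}))  # (0, ['C2', 'C3'])
```

The second call is a non-matching index scan (MATCHCOLS = 0) in which both predicates are still used for index screening.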

IN LIST Index Scan

An IN-list index scan is a special case of the matching index scan in which a single indexable IN predicate is used as a matching equal predicate. PLAN_TABLE shows MATCHCOLS > 0 and ACCESSTYPE = N.
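
Treating the IN predicate as a matching equal predicate means one equal probe into the index per list value. The sketch models a single-column index as a sorted list of hypothetical (key, RID) entries and uses the standard bisect module for the probes.

```python
from bisect import bisect_left, bisect_right

# Hypothetical single-column index: (key, RID) entries sorted by key.
keys_by_rid = [5, 3, 9, 3, 7, 1, 9, 5, 2]
index = sorted((key, rid) for rid, key in enumerate(keys_by_rid))


def equal_probe(index, value):
    """Matching equal scan: RIDs of all entries whose key equals value."""
    lo = bisect_left(index, (value,))
    hi = bisect_right(index, (value, float("inf")))
    return [rid for _, rid in index[lo:hi]]


def in_list_scan(index, values):
    """IN-list index scan: one matching equal probe per distinct list value."""
    rids = []
    for v in sorted(set(values)):
        rids.extend(equal_probe(index, v))
    return rids


print(in_list_scan(index, [9, 3]))  # [1, 3, 2, 6]
```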

Multiple Index Scan

Multiple index access uses more than one index to access a table. It is a good access path when:
- no single index provides efficient access, or
- a combination of index accesses provides efficient access.
List sequential prefetch is used, because RIDs are collected from each index scan: ACCESSTYPE = M, MI, MU or MX and PREFETCH = L. The same index may also be scanned more than once.
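
The RID list processing behind multiple index access can be sketched with Python sets: an AND of two index predicates intersects the RID lists (MI), an OR unions them (MU), and the final list is sorted into page order for list prefetch. The RIDs, represented as hypothetical (page, slot) pairs, and the helper name are assumptions.

```python
# Hypothetical RID lists from two index scans; a RID is (page, slot).
rids_ix1 = {(12, 3), (4, 1), (30, 7), (4, 2)}  # e.g. index on C1 for C1 = 'A'
rids_ix2 = {(4, 2), (30, 7), (55, 1)}          # e.g. index on C2 for C2 > 100


def multiple_index_access(rid_sets, op):
    """MI (AND) intersects the RID lists, MU (OR) unions them.
    The result is sorted into page order, ready for list prefetch."""
    result = set(rid_sets[0])
    for rids in rid_sets[1:]:
        result = result & rids if op == "MI" else result | rids
    return sorted(result)


and_rids = multiple_index_access([rids_ix1, rids_ix2], "MI")  # C1 = 'A' AND C2 > 100
or_rids = multiple_index_access([rids_ix1, rids_ix2], "MU")   # C1 = 'A' OR C2 > 100
pages_to_read = sorted({page for page, _ in or_rids})
print(and_rids, pages_to_read)  # [(4, 2), (30, 7)] [4, 12, 30, 55]
```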

One Fetch Access

One-fetch index access requires retrieving only one row. It is the best possible access path when it is available. One-fetch index access is possible when:
- there is only one table in the query;
- the column function is either MIN or MAX;
- there is an ascending index column for MIN, or a descending index column for MAX;
- either there are no predicates, or all predicates are matching predicates for the index; and
- there is no GROUP BY.
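
Why these conditions allow a single fetch: with an ascending index, the minimum is simply the first entry in the qualifying key range. The sketch models a hypothetical ascending index on (DEPT, SALARY) as a sorted list of key tuples; for MAX, a descending index would put the answer first in the same way.

```python
from bisect import bisect_left

# Hypothetical ascending index on (DEPT, SALARY).
index_keys = sorted([("D1", 3000), ("D2", 1500), ("D1", 1200), ("D2", 4000), ("D1", 2500)])


def min_one_fetch(keys):
    """SELECT MIN(DEPT) FROM T: the first index entry is the answer."""
    return keys[0][0] if keys else None


def min_salary_one_fetch(keys, dept):
    """SELECT MIN(SALARY) FROM T WHERE DEPT = :dept (a matching predicate on
    the leading column): position on the first entry for dept, read one entry."""
    pos = bisect_left(keys, (dept,))
    if pos < len(keys) and keys[pos][0] == dept:
        return keys[pos][1]
    return None


print(min_one_fetch(index_keys), min_salary_one_fetch(index_keys, "D2"))  # D1 1500
```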

Index Only access

With index-only access, the access path does not require any data pages, because all the information needed is available in the index. Because the index is almost always smaller than the table itself, an index-only access path usually processes the data efficiently. PLAN_TABLE shows ACCESSTYPE = I and INDEXONLY = Y.
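
The basic condition for index-only access is that every column the query references is a key column of the index. This is a simplified, hypothetical check; DB2 applies additional rules of its own before it chooses INDEXONLY = Y.

```python
def index_only_possible(index_cols, select_cols, where_cols, order_by_cols=()):
    """True when all referenced columns are index key columns,
    so no data page has to be read."""
    referenced = set(select_cols) | set(where_cols) | set(order_by_cols)
    return referenced <= set(index_cols)


ix = ["LASTNAME", "FIRSTNAME"]
# SELECT FIRSTNAME FROM EMP WHERE LASTNAME = 'SMITH'
print(index_only_possible(ix, ["FIRSTNAME"], ["LASTNAME"]))  # True
# SELECT SALARY FROM EMP WHERE LASTNAME = 'SMITH'
print(index_only_possible(ix, ["SALARY"], ["LASTNAME"]))     # False
```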

JOIN
A join operation retrieves rows from more than one table and combines them. The operation specifies at least two tables, but they need not be distinct. The SQL join types are inner join, left outer join, right outer join and full outer join. Internally, DB2 uses three join methods: nested loop join, merge scan join and hybrid join. Hybrid join is not used for outer joins.

Nested Loop Join (METHOD = 1)

Initially two tables are picked: one is used as the outer (composite) table and the other as the inner (new) table. For each row in the outer table, DB2 scans the inner table for matching rows, and the composite table is built from the matches. This composite table is used as the outer table at the next stage, and the process continues until all the tables have been joined. The composite table is sorted when the order of the join columns is not the same in both tables.

Nested loop join is efficient when:
- the outer table is small;
- predicates with small filter factors reduce the number of qualifying rows in the outer table;
- the number of data pages accessed in the inner table is also small;
- a highly clustered index is available on the join columns of the inner table.
In short, this join method is efficient when filtering is high for both the outer and the inner table.
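
A minimal nested loop join over Python lists, showing how the composite table from one step becomes the outer table of the next. The DEPT, EMP and PROJ rows are hypothetical, and the inner "scan" is a plain loop where DB2 would normally use an index on the inner join column.

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    """For each outer row, scan the inner table for rows with the same join
    value and add the combined rows to the composite table."""
    composite = []
    for o in outer:
        for i in inner:  # a real inner scan would normally use an index
            if o[outer_key] == i[inner_key]:
                composite.append({**o, **i})
    return composite


dept = [{"deptno": 1, "dname": "SALES"}, {"deptno": 2, "dname": "HR"}]
emp = [{"empno": 10, "workdept": 1}, {"empno": 20, "workdept": 2}, {"empno": 30, "workdept": 1}]
proj = [{"respemp": 10, "projno": "P1"}, {"respemp": 30, "projno": "P2"}]

step1 = nested_loop_join(dept, emp, "deptno", "workdept")  # DEPT is the outer table
step2 = nested_loop_join(step1, proj, "empno", "respemp")  # composite becomes outer
print([r["projno"] for r in step2])  # ['P1', 'P2']
```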

Merge Scan Join (Method = 2)

DB2 scans both tables in the order of the join column. If there is no efficient index to provide that order, DB2 might sort either or both tables. DB2 reads a row from the outer table and keeps reading rows from the inner table as long as they match. When there is no match, DB2 reads the next row from the outer table. If that row has the same join value, DB2 reads the matching group of inner rows again; if it has a new value, DB2 searches ahead in the inner table.

Merge scan join is used when:
- the qualifying rows of the inner and outer tables are large and the join predicate does not provide much filtering;
- the tables are large and have no indexes with matching columns.
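
A sketch of the merge scan described above for a single join column. Both inputs are sorted first (standing in for the sort DB2 may add when no index provides the order), then scanned in step; the inner pointer stays at the start of the current group, so an outer row with a repeated value reads the same inner group again. The sample rows are hypothetical.

```python
def merge_scan_join(outer, inner, key):
    """Merge scan join of two lists of dicts on one join column."""
    outer = sorted(outer, key=lambda r: r[key])
    inner = sorted(inner, key=lambda r: r[key])
    result, j = [], 0
    for o in outer:
        # Search ahead in the inner table past values lower than the outer value.
        while j < len(inner) and inner[j][key] < o[key]:
            j += 1
        # Read the group of matching inner rows without moving j, so the next
        # outer row with the same value reads this group again.
        k = j
        while k < len(inner) and inner[k][key] == o[key]:
            result.append({**o, **inner[k]})
            k += 1
    return result


outer = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
inner = [{"k": 2, "b": 1}, {"k": 3, "b": 2}, {"k": 2, "b": 3}]
print([(r["a"], r["b"]) for r in merge_scan_join(outer, inner, "k")])
# [('x', 1), ('x', 3), ('z', 1), ('z', 3)]
```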

Hybrid Join (METHOD = 4)
Hybrid join is used only for inner joins and requires an index on the join column of the inner table. It works as follows:
1) Scan the outer table.
2) Join the outer table with RIDs from the index on the inner table; the index of the inner table is scanned for each row of the outer table.
3) Sort the data in RID order.
4) Retrieve the data from the inner table using list prefetch.
5) Concatenate the data from the inner table with the sorted intermediate result to form the result table.

Hybrid join is often used when a non-clustered index is available on the join column of the inner table and there are duplicate qualifying rows in the outer table. Hybrid join handles duplicates in the outer table well, because the inner index is scanned only once for each set of duplicate values. The prefetch method is list sequential (PREFETCH = L).
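
The hybrid join steps can be sketched as follows. The inner index is a hypothetical dict from join value to RIDs, the inner tablespace is a dict of pages, and caching the RIDs per distinct outer value models the "once per set of duplicate values" behaviour; this is an illustration of the steps, not DB2's implementation.

```python
def hybrid_join(outer, outer_key, inner_index, inner_pages):
    """inner_index: join value -> list of inner RIDs (page, slot)
    inner_pages: page -> {slot: inner row}, standing in for the tablespace"""
    # Steps 1-2: scan the outer table and pair each row with inner RIDs.
    # Each distinct outer value probes the index once, however many duplicates.
    rids_for = {}
    phase1 = []
    for o in outer:
        v = o[outer_key]
        if v not in rids_for:
            rids_for[v] = inner_index.get(v, [])
        phase1.extend((rid, o) for rid in rids_for[v])
    # Step 3: sort on RID so inner pages can be read in page order.
    phase1.sort(key=lambda pair: pair[0])
    # Step 4: list prefetch reads each needed inner page once.
    pages_read = sorted({page for (page, _), _ in phase1})
    # Step 5: concatenate inner rows with the outer rows.
    result = [{**o, **inner_pages[page][slot]} for (page, slot), o in phase1]
    return result, pages_read, len(rids_for)


inner_pages = {7: {1: {"deptno": "A", "dname": "SALES"}}, 3: {2: {"deptno": "B", "dname": "HR"}}}
inner_index = {"A": [(7, 1)], "B": [(3, 2)]}
outer = [{"empno": 1, "dept": "A"}, {"empno": 2, "dept": "B"}, {"empno": 3, "dept": "A"}]
result, pages_read, probes = hybrid_join(outer, "dept", inner_index, inner_pages)
print([r["empno"] for r in result], pages_read, probes)  # [2, 1, 3] [3, 7] 2
```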

Sequential Prefetch (Prefetch=S)

Sequential prefetch reads a sequential set of pages. The maximum number of pages read by a request issued from the application program is determined by the size of the buffer pool used. Sequential prefetch is generally used for a tablespace scan. For an index scan that accesses 8 or more consecutive data pages, DB2 requests sequential prefetch at bind time; the index must have a cluster ratio of 80% or above.

List Sequential Prefetch (PREFETCH = L)

List sequential prefetch reads a set of data pages determined by a list of RIDs taken from an index. It is used:
- usually with a single index that has a cluster ratio lower than 80%;
- sometimes with indexes that have a high cluster ratio, if the amount of data to be accessed is too small to make sequential prefetch efficient but large enough to require more than one regular read;
- always to access data by multiple index access;
- always to access data from the inner table during a hybrid join.
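
The core of list prefetch is the RID sort: RIDs collected in index key order point to scattered pages, and sorting them into page order lets each page be read once, in ascending order. In this sketch the RIDs are hypothetical (page, slot) pairs and pages_per_request is an assumed parameter, not a DB2 setting.

```python
def list_prefetch_pages(rids, pages_per_request=32):
    """Sort RIDs into ascending page order, drop duplicate pages and group
    the pages into prefetch requests."""
    pages = sorted({page for page, _slot in rids})
    return [pages[i:i + pages_per_request] for i in range(0, len(pages), pages_per_request)]


# RIDs in key order from an index with a low cluster ratio.
rids = [(90, 1), (3, 4), (90, 2), (15, 1), (3, 1), (42, 7)]
print(list_prefetch_pages(rids, pages_per_request=2))  # [[3, 15], [42, 90]]
```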

Sequential Detection
If DB2 does not choose prefetch at bind time, it can sometimes do so at execution time; this method is called sequential detection. If a table is accessed repeatedly by the same statement (for example, an SQL statement in a do-while loop), the data or index leaf pages of the table can be accessed sequentially. DB2 can use this technique when it did not choose sequential prefetch at bind time because of an inaccurate estimate of the number of pages to be accessed.

Sorting of data
A sort can be performed on a new table or on the composite table:
- A sort is required by an ORDER BY or GROUP BY clause (SORTC_ORDERBY / SORTC_GROUPBY = Y).
- A sort is required to remove duplicates when DISTINCT or UNION is used (SORTC_UNIQ = Y).
- In nested loop and hybrid joins the composite table may be sorted, and in a merge scan join either or both tables might be sorted to make the join efficient (SORTN_JOIN / SORTC_JOIN = Y).

A sort is also needed for subquery processing: the result of the subquery is sorted and put into a work file for later reference by the parent query. DB2 sorts RIDs into ascending page number order to perform list prefetch; this sort is very fast and is done entirely in memory. If a sort is required during cursor processing, it is done at OPEN CURSOR. If the cursor is closed and reopened, the sort is performed again.

View Merge
If a query references a view, the view name is ultimately resolved to table names. In this process, called view merge, the statement that references the view is combined with the subselect that defined the view. This combination creates a logically equivalent statement, which is then executed against the database.

View Materialization
A view cannot always be merged, for example when the view definition involves column functions. In that case the view is materialized in two steps:
1) The view's defining subselect is executed against the database and the results are placed in a temporary copy of a result table.
2) The view's referencing statement is then executed against the temporary copy of the result table to obtain the intended result.
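
The two materialization steps can be sketched in Python with a hypothetical view that uses SUM. The referencing predicate is on the aggregated TOTAL, so it cannot be merged into the WHERE clause on EMP; the defining subselect must run first. The view and table names are illustrative only.

```python
from collections import defaultdict

# Hypothetical base table EMP(DEPT, SALARY).
emp = [("D1", 1000), ("D1", 3000), ("D2", 2000), ("D3", 500), ("D3", 700)]


def dept_total_view(rows):
    """Defining subselect of a view with a column function:
    CREATE VIEW V_DEPT_TOTAL (DEPT, TOTAL) AS
      SELECT DEPT, SUM(SALARY) FROM EMP GROUP BY DEPT"""
    totals = defaultdict(int)
    for dept, salary in rows:
        totals[dept] += salary
    return sorted(totals.items())


# Step 1: execute the defining subselect into a temporary result table.
temp_result = dept_total_view(emp)

# Step 2: run the referencing statement against the temporary copy:
#   SELECT DEPT FROM V_DEPT_TOTAL WHERE TOTAL > 1500
answer = [dept for dept, total in temp_result if total > 1500]
print(answer)  # ['D1', 'D2']
```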

Query Parallelism
When DB2 plans to access data from a table or index in a partitioned tablespace, it can initiate multiple parallel operations to reduce the response time of data-intensive or processor-intensive queries. There are two types of parallelism:
1) Query I/O parallelism manages concurrent I/O requests for a single query.
2) Query CP parallelism enables multitasking within a single query: the query is broken into parts that are processed in parallel.

Parallel processing is enabled as follows:
- For static SQL, bind or rebind with DEGREE(ANY).
- For dynamic SQL, issue SET CURRENT DEGREE = 'ANY'.
- The virtual buffer pool parallel sequential threshold (VPPSEQT) must be large enough to provide adequate buffer pool space for parallel processing.
- A degree of 1 (DEGREE(1) or CURRENT DEGREE = '1') disables parallel processing.
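
The idea of splitting a query by partition and merging the results can be sketched with the standard concurrent.futures module. The partitions and rows are hypothetical, and degree only plays the role of the parallel degree here; Python threads illustrate the structure of the work, not DB2's performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitioned table: one list of (ORDER_NO, AMOUNT) rows per partition.
partitions = [
    [(1, 100), (2, 250)],
    [(3, 75), (4, 500), (5, 20)],
    [(6, 300)],
]


def scan_partition(rows, min_amount):
    """Each parallel task applies the predicate to its own partition."""
    return [r for r in rows if r[1] >= min_amount]


def parallel_scan(partitions, min_amount, degree):
    """One task per partition; results are merged in partition order.
    degree=1 runs the tasks one after another."""
    with ThreadPoolExecutor(max_workers=degree) as pool:
        parts = pool.map(scan_partition, partitions, [min_amount] * len(partitions))
        return [row for part in parts for row in part]


print(parallel_scan(partitions, 100, degree=3))  # [(1, 100), (2, 250), (4, 500), (6, 300)]
```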

DB2 V5 features
The following V5 features can also be useful:
- CASE expressions: replace UNION and UNION ALL with CASE where possible for better performance.
- Global temporary tables: avoid repeated joins where applicable.
- Online REORG: it can run in parallel with applications, because it does not stop application programs from accessing data while the reorganization is in process.
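
The UNION-to-CASE rewrite replaces several passes over a table with one. This sketch checks the equivalence of the two forms on a hypothetical EMP table using SQLite from the Python standard library; it demonstrates the rewrite, not DB2 performance. The two forms agree only when the CASE branches cover exactly the rows the UNION branches select (here SALARY has no NULLs).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMP (EMPNO INTEGER, SALARY INTEGER NOT NULL);
    INSERT INTO EMP VALUES (1, 900), (2, 2500), (3, 5200), (4, 1800);
""")

# Three passes over EMP, one per salary band.
union_sql = """
    SELECT EMPNO, 'LOW' FROM EMP WHERE SALARY < 2000
    UNION ALL
    SELECT EMPNO, 'MID' FROM EMP WHERE SALARY >= 2000 AND SALARY < 5000
    UNION ALL
    SELECT EMPNO, 'HIGH' FROM EMP WHERE SALARY >= 5000
"""

# One pass over EMP; CASE assigns the band row by row.
case_sql = """
    SELECT EMPNO,
           CASE WHEN SALARY < 2000 THEN 'LOW'
                WHEN SALARY < 5000 THEN 'MID'
                ELSE 'HIGH' END
    FROM EMP
"""

union_rows = sorted(con.execute(union_sql).fetchall())
case_rows = sorted(con.execute(case_sql).fetchall())
print(union_rows == case_rows)  # True
```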

Tools
Tools for performance analysis:
- DB2 PM
- Candle OMEGAMON
- SMF/RMF data
- DB2 trace
Tools for access path analysis:
- DB2 EXPLAIN
- Visual Explain

Risks
- There is no golden rule for DB2 SQL tuning.
- Wrong analysis of performance data and access method information may lead to more performance overhead.
- When tuning SQL in a test environment, keep in mind that the amount of data and the DB2 subsystem setup are not the same as in production.
- Someone with good knowledge of DB2 should be involved in the tuning activity.

GLOSSARY

BP: Buffer Pool
CP: Central Processor
CPC: Central Processor Complex
DASD: Direct Access Storage Device
DB2 PM: DB2 Performance Monitor
DBD: Database Descriptor
DS: Dataset
IX: Index
LDS: Linear VSAM Dataset
RI: Referential Integrity
RID: Row Identifier
RMF: Resource Measurement Facility
SMF: System Management Facilities
SQL: Structured Query Language
T: Table
TS: Tablespace
VSAM: Virtual Storage Access Method
