DB2 SQL Tuning
Topics of Discussion
- General Recommendation
- Predicates Evaluation
- Filter Factor
- Column Correlation
- Writing Subquery
- Special Techniques to influence access path
- Using DB2 EXPLAIN
- Different Access Types
- Join Methods
- DB2 Data and Index page prefetch
- Sorting of Data and RIDs
- View Merge/Materialization
- Query Parallelism
- DB2 V5 Features
General Recommendation
- Make sure queries are as simple as possible.
- Do not fetch unused rows. Filtering should be done by DB2, not in the application program.
- Do not select unused columns.
- Avoid unnecessary ORDER BY or GROUP BY clauses.
- Use page-level locking and try to minimize lock duration.
- Handle big tables with care.
General Recommendation
- Use indexable predicates wherever possible.
- Do not code redundant predicates.
- Make sure that the declared length of a host variable is not greater than the length attribute of the data column.
- If efficient indexes are available on the tables in the subquery, a correlated subquery will likely perform better. Otherwise a non-correlated subquery will likely perform better.
- If there are multiple subqueries, make sure that they are ordered in an efficient manner.
Predicate
- Predicates are found in the WHERE, ON and HAVING clauses of an SQL statement.
- ON predicates are applied first, then WHERE predicates; HAVING predicates are applied after data access.
- Predicates in the HAVING clause are not used when accessing data.
- Predicate types have a great impact on the access path DB2 chooses.
- General predicate types: Subquery, Equal, Range, IN-List and NOT.
Predicate
- Predicates are also categorized as indexable or non-indexable.
- Predicates are classified as Stage 1 or Stage 2 depending on when they are applied during query evaluation.
- For an outer join, predicates in the ON clause are treated as stage 2 predicates, and most of the other predicates are applied after the join as stage 2 predicates.
- Predicates on a table expression can be evaluated before the join as stage 1 predicates.
Filtering
- Among stage 1 predicates, DB2 applies the most restrictive predicate first to reduce the processing needed at the next stage.
- It is better to code the predicates that filter out the most rows first.
- DB2 evaluates the filter factor from catalog information in the SYSIBM.SYSCOLUMNS and SYSIBM.SYSCOLDIST tables.
Filtering
- If the distribution of column values is not available in the catalog table SYSIBM.SYSCOLDIST, DB2 assumes a uniform distribution when estimating filtering.
- Make sure the catalog tables are kept up to date, either manually or by running RUNSTATS.
- If no information is available from the catalog tables, DB2 assumes a default filter factor.
- DB2 uses default filter factors for predicates in static SQL that use host variables.
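As a rough illustration of how a filter factor turns into a row estimate, the sketch below assumes a uniform distribution, where an equal predicate on a column with N distinct values qualifies about 1/N of the rows. The function names and the sample cardinalities are invented for this example; DB2's actual calculation depends on the predicate type and on the statistics available.

```python
def equal_predicate_filter_factor(column_cardinality):
    """Filter factor of COL = value, assuming uniformly distributed values."""
    if column_cardinality <= 0:
        raise ValueError("column cardinality must be positive")
    return 1.0 / column_cardinality


def estimated_rows(table_cardinality, filter_factors):
    """Estimate qualifying rows for ANDed predicates on independent columns."""
    combined = 1.0
    for ff in filter_factors:
        combined *= ff
    return table_cardinality * combined


# Hypothetical statistics: 1,000,000 rows, 50 distinct states, 4 distinct statuses.
ff_state = equal_predicate_filter_factor(50)
ff_status = equal_predicate_filter_factor(4)
one_predicate = estimated_rows(1_000_000, [ff_state])
two_predicates = estimated_rows(1_000_000, [ff_state, ff_status])
print(one_predicate, two_predicates)
```

A lower filter factor means fewer qualifying rows, which is why the more restrictive predicate does the most filtering.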
Column Correlation
- Two columns of a table are said to be correlated if their values depend on each other.
- DB2 might not determine the optimum access path, table order or join method when a query uses highly correlated columns.
- Column correlation makes a query look cheaper than it actually is.
- Run RUNSTATS to update the catalog tables with correct correlation information, which helps DB2 find the actual filter factor.
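The effect of correlation on the estimate can be seen with a small model. The sketch below uses made-up data in which CITY determines STATE, and compares the row count obtained by multiplying the two filter factors (the independence assumption) with the actual count. It models the arithmetic only and is not DB2's optimizer.

```python
# Hypothetical sample: CITY determines STATE, so the two columns are correlated.
rows = [
    ("Fresno", "CA"), ("Fresno", "CA"), ("Fresno", "CA"),
    ("Austin", "TX"), ("Austin", "TX"),
    ("Dallas", "TX"), ("Dallas", "TX"), ("Dallas", "TX"),
]


def fraction(predicate):
    """Fraction of rows for which the predicate is true."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


ff_city = fraction(lambda r: r[0] == "Fresno")
ff_state = fraction(lambda r: r[1] == "CA")

# Estimate under the independence assumption: multiply the filter factors.
independent_estimate = ff_city * ff_state * len(rows)

# Actual number of rows satisfying both predicates.
actual = sum(1 for r in rows if r[0] == "Fresno" and r[1] == "CA")

print(independent_estimate, actual)
```

The independent estimate comes out lower than the actual count, so the query appears to return fewer rows, and therefore to cost less, than it really does.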
Writing Subquery
- A subquery can be correlated or non-correlated.
- For a correlated subquery, the subquery is evaluated for each row returned from the outer query.
- For a non-correlated subquery, the inner query is evaluated first and then the outer query.
- DB2 can sometimes transform a subquery into a join; at other times the application programmer has to do it for better performance.
- Many subqueries can be rewritten as a join.
Writing Subquery
- If you use columns from both tables, it is better to use a JOIN.

Guidelines for writing efficient subqueries:
- If efficient indexes are available on the tables in the subquery, a correlated subquery is likely to be the most efficient kind of subquery.
- If no efficient indexes are available on the tables in the subquery, a non-correlated subquery is likely to perform better.
- If there are multiple subqueries in a parent query, make sure that the subqueries are ordered in the most efficient manner.
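The three forms discussed above can be compared side by side. The sketch below uses Python's sqlite3 module only to have something runnable; the tables, columns and data are hypothetical, and SQLite's access paths say nothing about what DB2 would choose. It only shows that the correlated subquery, the non-correlated subquery and the join return the same rows here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno TEXT PRIMARY KEY, location TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, name TEXT, deptno TEXT);
    INSERT INTO dept VALUES ('D01', 'NYC'), ('D02', 'SFO'), ('D03', 'NYC');
    INSERT INTO emp VALUES (1, 'Ann', 'D01'), (2, 'Bob', 'D02'),
                           (3, 'Cy', 'D03'), (4, 'Dee', 'D02');
""")

# Correlated: the subquery refers to the outer row (e.deptno).
correlated = """
    SELECT e.name FROM emp e
    WHERE EXISTS (SELECT 1 FROM dept d
                  WHERE d.deptno = e.deptno AND d.location = 'NYC')
    ORDER BY e.name"""

# Non-correlated: the subquery can be evaluated once, on its own.
non_correlated = """
    SELECT e.name FROM emp e
    WHERE e.deptno IN (SELECT d.deptno FROM dept d WHERE d.location = 'NYC')
    ORDER BY e.name"""

# Join form of the same request.
join = """
    SELECT e.name FROM emp e
    JOIN dept d ON d.deptno = e.deptno
    WHERE d.location = 'NYC'
    ORDER BY e.name"""

results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in [("correlated", correlated),
                              ("non_correlated", non_correlated),
                              ("join", join)]}
print(results)
```

The join is equivalent in this case because deptno is unique in dept; if the subquery table could return several matching rows, the join would produce duplicates that the subquery forms do not.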
Other factors
- Adding an extra predicate may influence the selection of the join method.
- With an extra predicate, nested loop join may be selected because DB2 assumes that more rows will be filtered out.
- The proper type of predicate to add is WHERE T1.C1 = T1.C1.
- Hybrid join is a costlier method, and outer join does not use hybrid join. So if DB2 chooses hybrid join, convert the inner join to an outer join and add extra predicates to remove the unneeded rows.
Type of Access
- Tablespace scan (ACCESSTYPE = R)
- Index scan, which can be categorized into:
  - Index-only access (INDEXONLY = Y)
  - Multiple index scan (ACCESSTYPE = M, MI, MU, MX)
  - Matching index scan (MATCHCOLS > 0)
  - Non-matching index scan (MATCHCOLS = 0)
  - One-fetch access (ACCESSTYPE = I1)
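When reading EXPLAIN output, the codes above can be turned into plain descriptions. The helper below is a hypothetical sketch: the function name is invented, its parameters mirror the PLAN_TABLE columns ACCESSTYPE, MATCHCOLS and INDEXONLY, and it covers only the access types listed on this slide.

```python
def describe_access(accesstype, matchcols=0, indexonly="N"):
    """Describe one PLAN_TABLE row using the access-type codes listed above."""
    if accesstype == "R":
        return "tablespace scan"
    if accesstype == "I1":
        return "one-fetch index access"
    if accesstype in ("M", "MI", "MU", "MX"):
        return "multiple index scan"
    if accesstype == "I":
        kind = "matching index scan" if matchcols > 0 else "non-matching index scan"
        if indexonly == "Y":
            kind += " (index-only)"
        return kind
    # Anything not covered by the slide is reported as-is.
    return "other access type: " + accesstype


print(describe_access("R"))
print(describe_access("I", matchcols=2, indexonly="Y"))
print(describe_access("I", matchcols=0))
```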
Using Index
- Define indexes based solely on how the application fetches data.
- A properly defined index can avoid a sort.
- You need to trade off the cost of defining an index against the performance gain.
Non-Matching IX scan
- This is also called index screening.
- DB2 selects index screening when predicates are specified on index key columns that are not part of the matching columns.
- Index screening predicates improve index access by reducing the number of rows that qualify while DB2 searches the index.
- MATCHCOLS = 0 and ACCESSTYPE = I.
JOIN
- A join operation retrieves rows from more than one table and combines them. The operation specifies at least two tables, but they need not be distinct.
- At the application level, the joins are inner join, left outer join, right outer join and full outer join.
- Internally, DB2 uses three join methods: nested loop join, merge scan join and hybrid join.
- Hybrid join is not used for outer joins.
Hybrid Join(Method=4)
- It is used only for inner joins and requires an index on the join column of the inner table.
- The outer table is joined with RIDs from the index on the inner table. The index of the inner table is scanned for each row in the outer table.
- The data is sorted into RID order and the rows are retrieved from the inner table using list prefetch.
- The data from the inner table is concatenated to form the result table.
Hybrid Join(Method=4)
- Hybrid join is often used when a non-clustered index is available on the join column of the inner table and there are duplicate qualifying rows in the outer table.
- Hybrid join handles duplicates in the outer table efficiently, as the inner table is scanned only once for each set of duplicate values.
- The prefetch method is list sequential prefetch (list prefetch).
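The hybrid join steps can be sketched as a toy model. In the code below, a RID is modelled as a (page, slot) tuple, the inner table is a dictionary keyed by RID, and all names and data are hypothetical. This is a simplification for illustration under those assumptions, not DB2's implementation.

```python
from collections import defaultdict

# Hypothetical data. Outer rows are (join_key, payload); key "A" is duplicated.
outer_rows = [("A", 1), ("B", 2), ("A", 3)]
# Inner table: RID (page, slot) -> (join_key, payload).
inner_table = {(7, 0): ("A", "x"), (2, 1): ("B", "y"), (2, 0): ("A", "z")}

# Index on the inner table's join column: key -> list of RIDs.
inner_index = defaultdict(list)
for rid, (key, _) in inner_table.items():
    inner_index[key].append(rid)


def hybrid_join(outer, index, inner):
    # Step 1: join outer rows with RIDs from the inner index.
    # Each set of duplicate outer keys probes the index only once.
    rid_cache = {}
    phase1 = []
    for key, payload in sorted(outer):
        if key not in rid_cache:
            rid_cache[key] = index.get(key, [])
        for rid in rid_cache[key]:
            phase1.append((rid, key, payload))
    # Step 2: sort the intermediate result into RID (page) order.
    phase1.sort(key=lambda item: item[0])
    # Step 3: fetch inner rows in RID order (standing in for list prefetch)
    # and concatenate them with the outer data.
    return [(key, payload, inner[rid][1]) for rid, key, payload in phase1]


result = hybrid_join(outer_rows, inner_index, inner_table)
print(result)
```

Because the intermediate rows are sorted by RID before the inner table is touched, the rows on page 2 are fetched together before page 7, even though the index returned them in key order.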
Sequential Detection
- If DB2 does not choose prefetch at bind time, it can sometimes do so at execution time. This method is called sequential detection.
- If a table is accessed repeatedly using the same statement (SQL in a do-while loop), the data or index leaf pages of the table can be accessed sequentially.
- DB2 can use this technique if it did not choose sequential prefetch at bind time because of an inaccurate estimate of the number of pages to be
Sorting of data
- A sort can happen on a new table or on the composite table.
- A sort is required by an ORDER BY or GROUP BY clause (SORTC_GROUPBY/SORTC_ORDERBY = Y).
- A sort is required to remove duplicates when DISTINCT or UNION is used (SORTC_UNIQ = Y).
- For nested loop and hybrid joins, the composite table is sorted; for merge scan join, both tables might be sorted to make the join efficient (SORTN_JOIN/SORTC_JOIN = Y).
Sorting of data
- A sort is needed for subquery processing. The result of the subquery is sorted and put into a work file for later reference by the parent query.
- DB2 sorts RIDs into ascending page number order in order to perform list prefetch. This sort is very fast and is done entirely in memory.
- If a sort is required during cursor processing, it is done during OPEN CURSOR. Once the cursor is closed and reopened, the sort has to be performed again.
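Why sorting RIDs into page order helps can be shown with a few lines of code. The sketch below models a RID as a (page, slot) tuple, which is an assumption made for illustration, and groups the sorted RIDs so that each page is visited once.

```python
from itertools import groupby


def list_prefetch_order(rids):
    """Sort (page, slot) RIDs by page and group them so each page is read once."""
    ordered = sorted(rids)
    return [(page, [slot for _, slot in group])
            for page, group in groupby(ordered, key=lambda rid: rid[0])]


# Hypothetical RIDs as they might come out of an index, in key order.
rids_from_index = [(12, 3), (4, 1), (12, 0), (9, 2), (4, 5)]
pages = list_prefetch_order(rids_from_index)
print(pages)
```

In this example five RIDs resolve to three page reads in ascending page order, instead of five reads that jump back and forth between pages.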
View Merge
- If a query uses a view, the view name is ultimately resolved to a table name. This process is called view merge.
- The statement that references the view is combined with the subselect that defined the view.
- This combination creates a logically equivalent statement, which is then executed against the database.
View Materialization
- A view cannot be merged if its definition involves column functions. In that case view materialization is required.
- Materialization is done in two stages:
  1) The view's defining subselect is executed against the database and the results are placed in a temporary copy of a result table.
  2) The view's referencing statement is then executed against the temporary copy of the result table to obtain the intended result.
Query Parallelism
- When DB2 plans to access data from a table or index in a partitioned table space, it can initiate multiple parallel operations to reduce the response time for data-intensive or processor-intensive queries.
- There are two types of parallelism:
  1) Query I/O parallelism - manages concurrent I/O requests for a single query.
  2) Query CP parallelism - enables multitasking within a single query. The query is broken into parts and processed.
Query Parallelism
- Parallel processing is enabled using:
  - DEGREE(ANY) on BIND and REBIND for static SQL
  - SET CURRENT DEGREE = 'ANY' for dynamic SQL
- The virtual buffer pool parallel sequential threshold (VPPSEQT) value must be large enough to provide adequate buffer pool space for parallel processing.
- A degree of 1 disables parallel processing.
DB2 V5 features
The following V5 features could also be useful:
- CASE expression - consider replacing UNION and UNION ALL with CASE, where the query allows it, for better performance.
- GLOBAL TEMPORARY table - avoids repeated joins, where applicable.
- Online REORG - can run in parallel with the application, because it does not stop application programs from accessing data while the reorg is in progress.
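The CASE rewrite can be illustrated with a small runnable comparison. The sketch below uses Python's sqlite3 module as a stand-in engine; the table, the salary threshold and the data are hypothetical, and the example assumes salary is never NULL. It only shows that the two statements return the same rows, with the CASE form reading the table once instead of twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, salary INTEGER);
    INSERT INTO emp VALUES ('Ann', 72000), ('Bob', 41000), ('Cy', 50000);
""")

# Two passes over emp, one per branch of the UNION ALL.
union_all = """
    SELECT name, 'HIGH' AS band FROM emp WHERE salary >= 50000
    UNION ALL
    SELECT name, 'LOW' AS band FROM emp WHERE salary < 50000"""

# One pass over emp, with the branch logic moved into a CASE expression.
case_expr = """
    SELECT name,
           CASE WHEN salary >= 50000 THEN 'HIGH' ELSE 'LOW' END AS band
    FROM emp"""

union_rows = sorted(conn.execute(union_all).fetchall())
case_rows = sorted(conn.execute(case_expr).fetchall())
print(union_rows == case_rows, case_rows)
```

Not every UNION can be rewritten this way; the rewrite applies when all branches read the same table and differ only in their predicates and derived values.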
Tools
Tools for performance analysis:
- DB2 PM
- Candle OMEGAMON
- SMF/RMF data
- DB2 trace

Tools for access path analysis:
- DB2 EXPLAIN
- Visual Explain
Risks
- There is no golden rule for DB2 SQL tuning.
- Wrong analysis of performance data and access method information may lead to more performance overhead.
- While tuning SQL in a test environment, keep in mind that the amount of data and the DB2 subsystem setup are not the same as in production.
- A person with good knowledge of DB2 should be involved in the tuning activity.
GLOSSARY
- BP - Buffer Pool
- CP - Central Processor
- CPC - Central Processor Complex
- DASD - Direct Access Storage Device
- DB2 PM - DB2 Performance Monitor
- DBD - Database Descriptor
- DS - Dataset
- IX - Index
- LDS - Linear VSAM Dataset
- RI - Referential Integrity
- RID - Row Identifier
- RMF - Resource Measurement Facility
- SMF - System Management Facilities
- SQL - Structured Query Language
- T - Table
- TS - Tablespace
- VSAM - Virtual Storage Access Method