Join optimization

Join optimization

A join operation is a complex function that requires special attention in order to achieve good performance. This section describes how DB2 Universal Database™ for iSeries™ implements join queries and how optimization choices are made by the query optimizer. It also describes design tips and techniques which help avoid or solve performance problems.

Nested loop join implementation
DB2 Universal Database for iSeries provides a nested loop join method. For this method, the processing of the tables in the join are ordered. This order is called the join order. The first table in the final join order is called the primary table. The other tables are called secondary tables. Each join table position is called a dial.

Join optimization algorithm
The query optimizer must determine the join columns, join operators, local row selection, dial implementation, and dial ordering for a join query.

Join order optimization
The join order is fixed if any join logical files are referenced. The join order is also fixed if the OPNQRYF JORDER(*FILE) parameter is specified or the query options file (QAQQINI) FORCE_JOIN_ORDER parameter is *YES.

Cost estimation and index selection for join secondary dials
As the query optimizer compares the various possible access choices, it must assign a numeric cost value to each candidate and use that value to determine the implementation which consumes the least amount of processing time. This costing value is a combination of CPU and I/O time

Predicates generated through transitive closure
For join queries, the query optimizer may do some special processing to generate additional selection. When the set of predicates that belong to a query logically infer extra predicates, the query optimizer generates additional predicates. The purpose is to provide more information during join optimization.

Look ahead predicate generation (LPG)
A special type of transitive closure called look ahead predicate generation (LPG) may be costed for joins. In this case, the optimizer attempts to minimize the random I/O costs of a join by pre-applying the results of the query to a large fact table. LPG will typically be used with a class of queries referred to as star join queries, however it can possibly be used with any join query.

Tips for improving performance when selecting data from more than two tables
The following suggestion is only applicable to CQE and is directed specifically to select-statements that access several tables. For joins that involve more than two tables, you might want to provide redundant information about the join columns. The CQE optimizer does not generate transitive closure predicates between 2 columns. If you give the optimizer extra information to work with when requesting a join, it can determine the best way to do the join. The additional information might seem redundant, but is helpful to the optimizer.

Multiple join types for a query
Even though multiple join types (inner, left outer, right outer, left exception, and right exception) can be specified in the query using the JOIN syntax, the iSeries Licensed Internal Code can only support one join type of inner, left outer, or left exception join type for the entire query. This requires the optimizer to determine what the overall join type for the query should be and to reorder files to achieve the correct semantics.

Sources of join query performance problems
The optimization algorithms described above benefit most join queries, but the performance of a few queries may be degraded.

Tips for improving the performance of join queries
If you are looking at a join query which is performing poorly or you are about to create a new application which uses join queries, these tips may be useful.

Parent topic:
Processing queries: Overview