Join optimization algorithm

 

The query optimizer must determine the join columns, join operators, local row selection, dial implementation, and dial ordering for a join query.

The join columns and join operators depend on the following situations:

Join specifications which are not implemented for the dial are either deferred until they can be processed in a later dial or, if an inner join was being performed for this dial, processed as row selection. For a given dial, the only join specifications which are usable as join columns for that dial are those being joined to a previous dial. For example, for the second dial the only join specifications that can be used to satisfy the join condition are join specifications which reference columns in the primary dial. Likewise, the third dial can only use join specifications which reference columns in the primary and the second dials and so on. Join specifications which reference later dials are deferred until the referenced dial is processed.

For OPNQRYF, only one type of join operator is allowed for either a left outer or an exception join. That is, the join operator for all join conditions must be the same.

When looking for an existing index to access a secondary dial, the query optimizer looks at the left-most key columns of the index. For a given dial and index, the join specifications which use the left-most key columns can be used. For example:

    DECLARE BROWSE2 CURSOR FOR
     SELECT * FROM EMPLOYEE, EMP_ACT       WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO         AND EMPLOYEE.HIREDATE = EMP_ACT.EMSTDATE      OPTIMIZE FOR 99999 ROWS

For the index over EMP_ACT with key columns EMPNO, PROJNO, and EMSTDATE, the join operation is performed only on column EMPNO. After the join is performed, index scan-key selection is done using column EMSTDATE.

The query optimizer also uses local row selection when choosing the best use of the index for the secondary dial. If the previous example had been expressed with a local predicate as:

    DECLARE BROWSE2 CURSOR FOR
     SELECT * FROM EMPLOYEE, EMP_ACT       WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO         AND EMPLOYEE.HIREDATE = EMP_ACT.EMSTDATE         AND EMP_ACT.PROJNO = '123456'
     OPTIMIZE FOR 99999 ROWS

The index with key columns EMPNO, PROJNO, and EMSTDATE are fully utilized by combining join and selection into one operation against all three key columns.

When creating a temporary index, the left-most key columns are the usable join columns in that dial position. All local row selection for that dial is processed when selecting entries for inclusion into the temporary index. A temporary index is similar to the index created for a select/omit keyed logical file. The temporary index for the previous example has key columns of EMPNO and EMSTDATE.

Since the query optimizer attempts a combination of join and local row selection when determining access path usage, it is possible to achieve almost all of the same advantages of a temporary index by use of an existing index. In the above example, using either implementation, an existing index may be used or a temporary index may be created. A temporary index is built with the local row selection on PROJNO applied during the index's creation; the temporary index has key columns of EMPNO and EMSTDATE (to match the join selection). If, instead, an existing index was used with key columns of EMPNO, PROJNO, EMSTDATE (or PROJNO, EMP_ACT, EMSTDATE or EMSTDATE, PROJNO, EMP_ACT or ...) the local row selection can be applied at the same time as the join selection (rather than before the join selection, as happens when the temporary index is created, or after the join selection, as happens when only the first key column of the index matches the join column).

The implementation using the existing index is more likely to provide faster performance because join and selection processing are combined without the overhead of building a temporary index. However, the use of the existing index may have just slightly slower I/O processing than the temporary index because the local selection is run many times rather than once. In general, it is a good idea to have existing indexes available with key columns for the combination of join columns and columns using equal selection as the left-most keys.

 

Parent topic:

Join optimization