Two-step grouping with DB2® Multisystem

 

If the partitioning key is not included in the grouping fields, then grouping must be done using two-step grouping, because the like values of a field are not located on the same node.

The following code is an example of two-step grouping.

SQL statement:

      SELECT JOB, AVG(SALARY)
             FROM EMPLOYEE              GROUP BY JOB  

OPNQRYF command:

     OPNQRYF FILE((EMPLOYEE)) FORMAT(GRPFMT2)
                   GRPFLD(JOB)
                   MAPFLD((AVGSAL '%AVG(SALARY)'))

In this example, note that for the group where JOB is Clerk, the value Clerk is on two different nodes in the EMPLOYEE distributed file. Grouping is implemented by first running grouping in parallel on all three nodes. This results in an initial grouping which is placed in a temporary file at the coordinator node. The query is modified, and the grouping is run again at the coordinator node to get the final set of grouping results.

Whole-file grouping (no group by fields) is always implemented using two steps.

If the query contains either a HAVING clause or the group selection expression (GRPSLT) parameter on the OPNQRYF command, all groups from the first grouping step are returned to the coordinator node. The HAVING clause or the GRPSLT parameter is then processed as part of second grouping step.

If the query contains a DISTINCT column (aggregate) function and two-step grouping is required, no grouping is done in the first step. Instead, all records are returned to the coordinator node, and all grouping is run at the coordinator node as part of the second step.

 

Parent topic:

Implementation and optimization of grouping with DB2 Multisystem