Join optimization with DB2® Multisystem

 

The distributed query optimizer generates a plan to join distributed files.

The distributed query optimizer looks at file sizes, expected number of records selected for each file, and the type of distributed joins that are possible; and then the optimizer breaks the query into multiple steps. Each step creates an intermediate result file that is used as input for the next step.

During optimization, a cost is calculated for each join step based on the type of distributed join. The cost reflects, in part, the amount of data movement required for that join step. The cost is used to determine the final distributed plan.

As much processing as possible is completed during each step; for example, record selection isolated to a given step is performed during that step, and as many files as possible are joined for each step. Each join step might involve more than one type of distributed join. A collocated join and a directed join can be combined into one collocated join by directing the necessary file first. A directed join and a re-partitioned join can be combined by directing all the files first and then performing the join. Note that directed and re-partitioned joins are really just a collocated join, with one or more files being directed before the join occurs.

When joining distributed files with local files, the distributed query optimizer calculates a cost, similar to the cost calculated when joining distributed files. Based on this cost, the distributed query optimizer can choose to perform one of the following actions:

 

Parent topic:

Implementation and optimization of join operations with DB2 Multisystem