Implementation and optimization of a single file query with DB2® Multisystem

To run a single file query, the system on which the query was specified, called the coordinator node, determines which nodes of the file must receive the query. Those nodes run the query and return the selected records to the coordinator node.
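As a rough mental model, the coordinator's role can be sketched in Python. This is a hypothetical illustration, not how DB2 Multisystem is implemented: the function `coordinate` and its `choose_nodes` and `run_on_node` callbacks are invented names, and on real systems the query and the returned records travel between systems rather than through function calls.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def coordinate(
    all_nodes: List[str],
    choose_nodes: Callable[[List[str]], List[str]],
    run_on_node: Callable[[str], List[Record]],
) -> Dict[str, List[Record]]:
    """Hypothetical coordinator node for a single file query.

    1. Decide which nodes of the distributed file must receive the query.
    2. Have each chosen node run the query against its own records.
    3. Collect the records each node returns, keyed by node name.
    """
    targets = choose_nodes(all_nodes)
    returned: Dict[str, List[Record]] = {}
    for node in targets:
        returned[node] = run_on_node(node)
    return returned


if __name__ == "__main__":
    node_group = ["SYSA", "SYSB", "SYSC"]
    # Illustrative only: every node is chosen and each returns one dummy record.
    result = coordinate(node_group, lambda nodes: nodes, lambda n: [{"from": n}])
    print(result)
    # {'SYSA': [{'from': 'SYSA'}], 'SYSB': [{'from': 'SYSB'}], 'SYSC': [{'from': 'SYSC'}]}
```

The decision that matters for performance is `choose_nodes`: the examples in this topic show when it must return every node and when it can return just one.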

All of the examples in this topic use the following distributed files: DEPARTMENT and EMPLOYEE. The node group for these files consists of SYSA, SYSB, and SYSC. The data is partitioned on the department number. The following SQL statement creates the DEPARTMENT distributed file.

CREATE TABLE DEPARTMENT
      (DEPTNO CHAR(3) NOT NULL,
       DEPTNAME VARCHAR(20) NOT NULL,
       MGRNO CHAR(6),
       ADMRDEPT CHAR(3) NOT NULL)
      IN NODGRP1 PARTITIONING KEY(DEPTNO)

Table 1. DEPARTMENT table

Node   Record number   DEPTNO   DEPTNAME           MGRNO    ADMRDEPT
SYSA   1               A00      Support services   000010   A00
SYSB   2               A01      Planning           000010   A00
SYSC   3               B00      Accounting         000050   B00
SYSA   4               B01      Programming        000050   B00

The following SQL statement creates the EMPLOYEE distributed file.

CREATE TABLE EMPLOYEE
      (EMPNO CHAR(6) NOT NULL,
       FIRSTNME VARCHAR(12) NOT NULL,
       LASTNAME VARCHAR(15) NOT NULL,
       WORKDEPT CHAR(3) NOT NULL,
       JOB CHAR(8),
       SALARY DECIMAL(9,2))
      IN NODGRP1 PARTITIONING KEY(WORKDEPT)

Table 2. EMPLOYEE table

Node   Record number   EMPNO    FIRSTNME    LASTNAME    WORKDEPT   JOB          SALARY
SYSA   1               000010   Christine   Haas        A00        Manager      41250
SYSA   2               000020   Sally       Kwan        A00        Clerk        25000
SYSB   3               000030   John        Geyer       A01        Planner      35150
SYSB   4               000040   Irving      Stern       A01        Clerk        32320
SYSC   5               000050   Michael     Thompson    B00        Manager      38440
SYSC   6               000060   Eileen      Henderson   B00        Accountant   33790
SYSA   7               000070   Jennifer    Lutz        B01        Programmer   42325
SYSA   8               000080   David       White       B01        Programmer   36450

The following query uses the defined distributed file EMPLOYEE, with index EMPIDX created over the field SALARY. The query is entered on SYSA.

SQL statement:

      SELECT * FROM EMPLOYEE WHERE SALARY > 40000
   

OPNQRYF command:

     OPNQRYF FILE((EMPLOYEE)) QRYSLT('SALARY > 40000')

In this case, SYSA sends the query to all the nodes of EMPLOYEE, including itself, because the record selection does not involve the partitioning key. Each node runs the query and returns the selected records to SYSA. Because a distributed index exists on field SALARY of file EMPLOYEE, the optimization done on each node decides whether that node uses the index.
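The all-nodes case can be traced against the sample data. This Python sketch is a simplified model, not DB2 behavior: the EMPLOYEE rows from Table 2 (with only a few columns kept) are held in an in-memory dictionary keyed by node, the hypothetical function `salary_over` stands in for the query, and each node simply scans its records, whereas the real per-node optimization may instead choose the EMPIDX index.

```python
from typing import Dict, List

# Sample EMPLOYEE records from Table 2, grouped by the node that stores them.
EMPLOYEE: Dict[str, List[dict]] = {
    "SYSA": [
        {"EMPNO": "000010", "LASTNAME": "Haas", "WORKDEPT": "A00", "SALARY": 41250},
        {"EMPNO": "000020", "LASTNAME": "Kwan", "WORKDEPT": "A00", "SALARY": 25000},
        {"EMPNO": "000070", "LASTNAME": "Lutz", "WORKDEPT": "B01", "SALARY": 42325},
        {"EMPNO": "000080", "LASTNAME": "White", "WORKDEPT": "B01", "SALARY": 36450},
    ],
    "SYSB": [
        {"EMPNO": "000030", "LASTNAME": "Geyer", "WORKDEPT": "A01", "SALARY": 35150},
        {"EMPNO": "000040", "LASTNAME": "Stern", "WORKDEPT": "A01", "SALARY": 32320},
    ],
    "SYSC": [
        {"EMPNO": "000050", "LASTNAME": "Thompson", "WORKDEPT": "B00", "SALARY": 38440},
        {"EMPNO": "000060", "LASTNAME": "Henderson", "WORKDEPT": "B00", "SALARY": 33790},
    ],
}


def salary_over(limit: int) -> Dict[str, List[str]]:
    """Model of SELECT * FROM EMPLOYEE WHERE SALARY > limit.

    SALARY is not the partitioning key, so the coordinator cannot tell which
    node holds matching records: every node receives and runs the query, and
    each returns the EMPNO values of its qualifying records.
    """
    returned: Dict[str, List[str]] = {}
    for node, rows in EMPLOYEE.items():
        returned[node] = [r["EMPNO"] for r in rows if r["SALARY"] > limit]
    return returned


if __name__ == "__main__":
    print(salary_over(40000))
    # {'SYSA': ['000010', '000070'], 'SYSB': [], 'SYSC': []}
```

Every node is contacted even though only SYSA has qualifying records; without a predicate on the partitioning key, the coordinator has no way to know that in advance.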

In the next example, the query is again specified on SYSA, but it is sent to only a subset of the nodes where the EMPLOYEE file exists. In this case, the query runs locally on SYSA only.

SQL statement:

    SELECT * FROM EMPLOYEE WHERE WORKDEPT = 'A00'

OPNQRYF command:

    OPNQRYF FILE((EMPLOYEE)) QRYSLT('WORKDEPT = ''A00''')

The distributed query optimizer determines that this query has isolatable record selection, WORKDEPT = 'A00', that involves the partitioning key, WORKDEPT. The optimizer hashes the value 'A00' and, based on the hash value, finds the node at which all of the records that satisfy this condition are located. In this case, all of those records are on SYSA, so the query is sent only to that node. Because the query originated on SYSA, it is run locally on SYSA.
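The partitioning-key shortcut can be sketched the same way. The real hash function and partitioning map belong to DB2 and are not described in this topic, so this hypothetical Python model replaces both with a lookup table, `DEPT_TO_NODE`, that records where each sample department's records reside in Tables 1 and 2. Unlike a real hash, it only knows those four department numbers, and `route_on_partitioning_key` is an invented name.

```python
from typing import Tuple

# Hypothetical stand-in for "hash the partitioning key value, then look up the
# owning node in the node group's partitioning map". The entries come from the
# placement of the sample data in Tables 1 and 2, not from DB2's hash function.
DEPT_TO_NODE = {"A00": "SYSA", "A01": "SYSB", "B00": "SYSC", "B01": "SYSA"}


def route_on_partitioning_key(workdept: str, coordinator: str) -> Tuple[str, bool]:
    """Model of WHERE WORKDEPT = <literal> on the EMPLOYEE file.

    Returns the single node that holds every matching record, and whether the
    query can run locally because that node is the coordinator itself.
    """
    target = DEPT_TO_NODE[workdept]  # raises KeyError for values outside the sample data
    return target, target == coordinator


if __name__ == "__main__":
    print(route_on_partitioning_key("A00", coordinator="SYSA"))  # ('SYSA', True)
    print(route_on_partitioning_key("B00", coordinator="SYSA"))  # ('SYSC', False)
```

The second call shows the other outcome in this model: with WORKDEPT = 'B00', a query entered on SYSA would still go to exactly one node, but that node is SYSC, so SYSA acts only as the coordinator.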

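The following conditions subset the number of nodes at which a query runs: all of the fields in the partitioning key must be isolatable record selection, all predicates must use the equal (=) operator, and all fields in the partitioning key must be compared to a literal.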

For performance reasons, specify record selection predicates that match the partitioning key so that the query is directed to a particular node. Record selection that uses the NODENAME, PARTITION, or NODENUMBER scalar functions can also direct the query to specific nodes.
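The guidance above can be folded into a single routing decision. This Python sketch rests on stated assumptions: all record-selection terms are ANDed, the partitioning key is the single column WORKDEPT, only an equal (=) comparison against a literal narrows the node set, and a NODENAME term compared to a literal names its node directly. `Predicate`, `target_nodes`, and `DEPT_TO_NODE` are hypothetical names, the lookup table again stands in for DB2's hash and partitioning map, and PARTITION and NODENUMBER are left out for brevity.

```python
from typing import List, NamedTuple, Set


class Predicate(NamedTuple):
    """One ANDed record-selection term, such as WORKDEPT = 'A00'."""
    column: str       # a column name, or "NODENAME" for the NODENAME scalar function
    op: str           # comparison operator, for example "=" or ">"
    value: object     # right-hand side of the comparison
    is_literal: bool  # True when the right-hand side is a constant


ALL_NODES: Set[str] = {"SYSA", "SYSB", "SYSC"}
PARTITIONING_KEY = "WORKDEPT"  # assumed single-column partitioning key
# Hypothetical stand-in for hashing WORKDEPT through the partitioning map.
DEPT_TO_NODE = {"A00": "SYSA", "A01": "SYSB", "B00": "SYSC", "B01": "SYSA"}


def target_nodes(predicates: List[Predicate]) -> Set[str]:
    """Return the nodes that must run a query whose terms are all ANDed."""
    candidates = set(ALL_NODES)
    for p in predicates:
        if p.op != "=" or not p.is_literal:
            continue  # ranges, other operators, and column-to-column tests cannot narrow
        if p.column == "NODENAME":
            candidates &= {str(p.value)}
        elif p.column == PARTITIONING_KEY:
            candidates &= {DEPT_TO_NODE[str(p.value)]}
    return candidates


if __name__ == "__main__":
    print(sorted(target_nodes([Predicate("SALARY", ">", 40000, True)])))
    # ['SYSA', 'SYSB', 'SYSC']
    print(sorted(target_nodes([Predicate("WORKDEPT", "=", "A00", True)])))
    # ['SYSA']
    print(sorted(target_nodes([Predicate("NODENAME", "=", "SYSB", True)])))
    # ['SYSB']
```

Intersecting sets keeps the ANDed semantics: in this model, contradictory terms such as WORKDEPT = 'A00' together with NODENAME = 'SYSB' leave an empty set, meaning no node can hold a qualifying record.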
