Implementation and optimization of a single file query with DB2® Multisystem
For a single file query, the system where the query was specified (the coordinator node) determines which nodes of the file receive the query. Those nodes run the query and return the selected records to the coordinator node.
All of the examples in this topic use the following distributed files: DEPARTMENT and EMPLOYEE. The node group for these files consists of SYSA, SYSB, and SYSC. The data is partitioned on the department number. The following SQL statement creates the DEPARTMENT distributed file.
CREATE TABLE DEPARTMENT
  (DEPTNO CHAR(3) NOT NULL,
   DEPTNAME VARCHAR(20) NOT NULL,
   MGRNO CHAR(6),
   ADMRDEPT CHAR(3) NOT NULL)
  IN NODGRP1 PARTITIONING KEY(DEPTNO)

Table 1. DEPARTMENT table

| Node | Record number | DEPTNO | DEPTNAME | MGRNO | ADMRDEPT |
|------|---------------|--------|------------------|--------|----------|
| SYSA | 1 | A00 | Support services | 000010 | A00 |
| SYSB | 2 | A01 | Planning | 000010 | A00 |
| SYSC | 3 | B00 | Accounting | 000050 | B00 |
| SYSA | 4 | B01 | Programming | 000050 | B00 |

The following SQL statement creates the EMPLOYEE distributed file.

CREATE TABLE EMPLOYEE
  (EMPNO CHAR(6) NOT NULL,
   FIRSTNME VARCHAR(12) NOT NULL,
   LASTNAME VARCHAR(15) NOT NULL,
   WORKDEPT CHAR(3) NOT NULL,
   JOB CHAR(8),
   SALARY DECIMAL(9,2))
  IN NODGRP1 PARTITIONING KEY(WORKDEPT)

Table 2. EMPLOYEE table

| Node | Record number | EMPNO | FIRSTNME | LASTNAME | WORKDEPT | JOB | SALARY |
|------|---------------|--------|-----------|-----------|----------|------------|--------|
| SYSA | 1 | 000010 | Christine | Haas | A00 | Manager | 41250 |
| SYSA | 2 | 000020 | Sally | Kwan | A00 | Clerk | 25000 |
| SYSB | 3 | 000030 | John | Geyer | A01 | Planner | 35150 |
| SYSB | 4 | 000040 | Irving | Stern | A01 | Clerk | 32320 |
| SYSC | 5 | 000050 | Michael | Thompson | B00 | Manager | 38440 |
| SYSC | 6 | 000060 | Eileen | Henderson | B00 | Accountant | 33790 |
| SYSA | 7 | 000070 | Jennifer | Lutz | B01 | Programmer | 42325 |
| SYSA | 8 | 000080 | David | White | B01 | Programmer | 36450 |

The following query uses the defined distributed file EMPLOYEE, with index EMPIDX created over the field SALARY. The query is entered on SYSA.
SQL statement:
SELECT * FROM EMPLOYEE WHERE SALARY > 40000
OPNQRYF command:
OPNQRYF FILE((EMPLOYEE)) QRYSLT('SALARY > 40000')
In this case, SYSA sends the query to all the nodes of EMPLOYEE, including SYSA. Each node runs the query and returns the records to SYSA. Because a distributed index exists on field SALARY of file EMPLOYEE, optimization that is done on each node decides whether to use the index.
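The send-to-all-nodes flow described above can be sketched in Python. This is only an illustrative model, not DB2 Multisystem code: the `NODES` dictionary and the `run_on_node` and `coordinator_query` functions are hypothetical names, the records are abbreviated to four fields, and the data mirrors Table 2.

```python
# Illustrative model only: names are hypothetical, data mirrors Table 2.
# Each record is (EMPNO, LASTNAME, WORKDEPT, SALARY).
NODES = {
    "SYSA": [
        ("000010", "Haas", "A00", 41250),
        ("000020", "Kwan", "A00", 25000),
        ("000070", "Lutz", "B01", 42325),
        ("000080", "White", "B01", 36450),
    ],
    "SYSB": [
        ("000030", "Geyer", "A01", 35150),
        ("000040", "Stern", "A01", 32320),
    ],
    "SYSC": [
        ("000050", "Thompson", "B00", 38440),
        ("000060", "Henderson", "B00", 33790),
    ],
}


def run_on_node(node, predicate):
    """Each node evaluates the record selection against its own records."""
    return [rec for rec in NODES[node] if predicate(rec)]


def coordinator_query(predicate, target_nodes=None):
    """Send the query to the target nodes (default: all) and gather the results."""
    targets = sorted(NODES) if target_nodes is None else target_nodes
    results = []
    for node in targets:
        results.extend(run_on_node(node, predicate))
    return results


# SELECT * FROM EMPLOYEE WHERE SALARY > 40000, sent to every node
high_paid = coordinator_query(lambda rec: rec[3] > 40000)
print([rec[0] for rec in high_paid])  # prints ['000010', '000070']
```

In this model only SYSA holds qualifying records, but SYSB and SYSC still receive and run the query because the selection does not involve the partitioning key.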
In the next example, the query is specified on SYSA, but the query is sent to a subset of the nodes where the EMPLOYEE file exists. In this case, the query is run locally on SYSA only.
SQL statement:
SELECT * FROM EMPLOYEE WHERE WORKDEPT = 'A00'
OPNQRYF command:
OPNQRYF FILE((EMPLOYEE)) QRYSLT('WORKDEPT = "A00"')
The distributed query optimizer determines that there is an isolatable record selection, WORKDEPT = 'A00', involving the partitioning key, WORKDEPT, for this query. The optimizer hashes the value 'A00' and, based on the hash value, finds the node at which all of the records satisfying this condition are located. In this case, all of the records satisfying this condition are on SYSA, so the query is sent only to that node. Because the query originated on SYSA, the query is run locally on SYSA.
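The hash-then-look-up step can be modeled as follows. The actual hashing algorithm and partition map are internal to DB2 Multisystem and are not described in this topic, so `toy_hash`, `NUM_PARTITIONS`, and `PARTITION_MAP` below are hypothetical stand-ins, with the map arranged so that the sample department numbers land on the nodes shown in Tables 1 and 2.

```python
# Illustrative model only. The real hashing algorithm and partition map are
# internal to DB2 Multisystem; toy_hash and PARTITION_MAP are hypothetical.
NUM_PARTITIONS = 1024  # assumed size, chosen for illustration


def toy_hash(value):
    """Stand-in hash: maps a partitioning-key value to a partition number."""
    h = 0
    for ch in value:
        h = h * 31 + ord(ch)
    return h % NUM_PARTITIONS


# Partition number -> node. Only the entries needed for the sample
# department numbers are filled in, arranged to match Tables 1 and 2.
PARTITION_MAP = {
    toy_hash("A00"): "SYSA",
    toy_hash("A01"): "SYSB",
    toy_hash("B00"): "SYSC",
    toy_hash("B01"): "SYSA",
}


def node_for_key(literal):
    """Hash the literal compared to the partitioning key and look up its node."""
    return PARTITION_MAP[toy_hash(literal)]


COORDINATOR = "SYSA"                  # node where the query was entered
target = node_for_key("A00")          # WHERE WORKDEPT = 'A00'
runs_locally = target == COORDINATOR  # True: no other node is involved
print(target, runs_locally)           # prints SYSA True
```

Because the same function maps a key value to a node both when records are stored and when a query is routed, a single equality predicate on the partitioning key is enough to identify the one node that can hold matching records.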
A query is directed to a subset of the nodes only when all of the following conditions are met:
- All fields of the partitioning key must appear in isolatable record selection
- All predicates must use the equal (=) operator
- All fields of the partitioning key must be compared to a literal
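The conditions above can be expressed as a small check. This is a sketch under stated assumptions, not the optimizer's actual logic: predicates are modeled as hypothetical `(field, operator, operand_is_literal)` tuples taken from the isolatable record selection, and "all predicates" is read as the predicates on partitioning-key fields, which is an interpretation rather than something the text states.

```python
# Illustrative model only: each predicate is a hypothetical
# (field, operator, operand_is_literal) tuple from the isolatable selection.
def can_subset_nodes(partitioning_key, predicates):
    """Return True when the listed conditions all hold for the key fields."""
    covered = set()
    for field, operator, operand_is_literal in predicates:
        if field not in partitioning_key:
            continue  # assumption: selection on other fields is ignored here
        if operator != "=" or not operand_is_literal:
            return False  # key field not compared to a literal with '='
        covered.add(field)
    # Every field of the partitioning key must have been covered.
    return covered == set(partitioning_key)


key = ("WORKDEPT",)
print(can_subset_nodes(key, [("WORKDEPT", "=", True)]))  # prints True
print(can_subset_nodes(key, [("SALARY", ">", True)]))    # prints False
print(can_subset_nodes(key, [("WORKDEPT", ">", True)]))  # prints False
```

The first call corresponds to WORKDEPT = 'A00', which can be sent to one node; the second corresponds to SALARY > 40000, which must be sent to every node.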
For performance reasons, you should specify record selection predicates that match the partitioning key in order to direct the query to a particular node. Record selection with scalar functions of NODENAME, PARTITION, and NODENUMBER can also direct the query to specific nodes.