Choosing partitioning keys with DB2 Multisystem
For the system to process the partitioned file in the most efficient manner, there are some tips you can consider when setting up or using a partitioning key.
These tips are as follows:
- The best partitioning key is one that has many different values and, therefore, the partitioning activity results in an even distribution of the records of data. Customer numbers, last names, claim numbers, ZIP codes (regional mailing address codes), and telephone area codes are examples of good categories for using as partitioning keys.
Gender, because only two choices exist, male or female, is an example of a poor choice for a partitioning key. Gender causes too much data to be distributed to a single node instead of spread across the nodes. Also, when doing a query, gender as the partitioning key causes the system to process through too many records of data. It is inefficient; another field or fields of data can narrow the scope of the query and make it much more efficient. A partitioning key based on gender is a poor choice in cases where even distribution of data is wanted rather than distribution based on specific values.
When preparing to change a local file into a distributed file, you can use the HASH function to get an idea of how the data is distributed. Because the HASH function can be used against local files and with any variety of columns, you can try different partitioning keys before actually changing the file to be distributed. For example, if you plan to use the ZIP code field of a file, you can run the HASH function using that field to get an idea of the number of records that HASH to each partition number. This helps you in choosing your partitioning key fields, or in creating the partition map in your node groups.
- Do not choose a field that needs to be updated often. A restriction on partitioning key fields is that they can have their values updated only if the update does not force the record to a different node.
- Do not use many fields in the partitioning key; the best choice is to use one field. Using many fields forces the system to do more work at I/O time.
- Choose a simple data type, such as fixed-length character or integer, as your partitioning key. This consideration might help performance because the hashing is done for a single field of a simple data type.
- When choosing a partitioning key, you should consider the join and grouping criteria of the queries you typically run. For example, choosing a field that is never used as a join field for a file that is involved in joins can adversely affect join performance. See Query design for performance with DB2® Multisystem for information about running queries involving distributed files.
Parent topic:
Partitioning with DB2 Multisystem
Related concepts
Query design for performance with DB2 Multisystem