The cluster workload management algorithm
The workload management algorithm uses workload balancing attributes and many rules to select the final destination for messages being put onto cluster queues.
The rules of the algorithm are influenced by the settings of the workload balancing attributes for queues, queue managers, and channels (see Table 1). The algorithm is exercised every time a choice of destination is required:
- It is used at the point a cluster queue is opened, by using the MQOO_BIND_ON_OPEN option.
- It is used each time a message is put to a cluster queue when it is opened with MQOO_BIND_NOT_FIXED.
- It is used each time a new message group is started when MQOO_BIND_ON_GROUP is used to open a cluster queue.
- For topic host routing, it is used each time a message is published to a clustered topic. If the local queue manager is not a host for this topic, the algorithm is used to choose a host queue manager to route the message through.
Table 1. Attributes for cluster workload management (columns: Queues, Queue managers, Channels)
Initially, the queue manager builds a list of possible destinations from two procedures:
- Matching the target ObjectName and ObjectQmgrName with queue manager alias definitions that are shared in the same clusters as the queue manager.
- Finding unique routes (that is, channels) to a queue manager that hosts a queue with the name ObjectName and is in one of the clusters that the queue manager is a member of.
The algorithm first steps through the following rules to eliminate destinations from the list of possible destinations. After the list of valid destinations has been calculated, messages are workload balanced across the remaining destinations by the final rules, which depend on CLWLWGHT and on how often each channel has previously been chosen.
- Remote instances of queues or topics or remote CLUSRCVR channels that do not share a cluster with the local queue manager are eliminated.
- If a queue or topic name is specified, remote CLUSRCVR channels that are not in the same cluster as the queue or topic are eliminated.
Note: All remaining queues, topics, and channels at this stage are made available to the cluster workload exit, if it is configured.
- All channels to queue managers or queue manager aliases that have a CLWLRANK less than the maximum rank of all remaining channels or queue manager aliases are eliminated.
- All queues (not queue manager aliases) with a CLWLRANK less than the maximum rank of all remaining queues are eliminated.
- If more than one instance of a queue, topic, or queue manager alias remains, and if any are put enabled, all those that are put disabled are eliminated.
Note: If only put disabled instances remain, only inquire operations succeed; all other operations fail with MQRC_CLUSTER_PUT_INHIBITED.
- When choosing a queue, if the resulting set of queues contains the local instance of the queue, the local instance is typically used. The local instance of the queue is used if one of the following conditions is true:
- The use-queue attribute of the queue, CLWLUSEQ, is set to LOCAL.
- Both of the following statements are true: the use-queue attribute of the queue, CLWLUSEQ, is set to QMGR, and the use-queue attribute of the queue manager, CLWLUSEQ, is set to LOCAL.
- The message is received over a cluster channel rather than by being put by a local application.
- For locally defined queues that are defined with CLWLUSEQ(ANY), or which inherit that same setting from the queue manager, the following points are true, within the wider set of conditions that apply:
- The local queue is chosen, based on the status of the locally-defined CLUSRCVR channels in the same cluster as the queue. This status is compared to the status of the CLUSSDR channels that would take the message to remotely defined queues of the same name.
For example, suppose there is one CLUSRCVR in the same cluster as the queue. That CLUSRCVR has STOPPING status, whereas the channels to the other queues of the same name in the cluster have RUNNING or INACTIVE status. In this case the remote channels are chosen, and the local queue is not used.
- The local queue is chosen based on the number of CLUSRCVR channels, in any comparison with CLUSSDR channels of the same status, that would take the message to remotely defined queues of the same name.
For example, there are four CLUSRCVR channels in the same cluster as the queue, and one CLUSSDR channel. All the channels have the same status of either INACTIVE or RUNNING. Therefore, there are five channels to choose from, and two instances of the queue. Four-fifths (80 percent) of the messages go to the local queue.
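The four-fifths figure in the example above follows from counting channels rather than queue instances. A minimal sketch of that arithmetic, assuming every channel has the same status and is equally likely to be chosen; the function name `local_share` is hypothetical and is not part of IBM MQ:

```python
from fractions import Fraction


def local_share(local_clusrcvr_count: int, remote_clussdr_count: int) -> Fraction:
    """Expected fraction of messages that reach the local queue instance.

    Assumption: all channels have the same status and an equal chance of
    being chosen, so the split depends only on the number of channels
    that lead to each queue instance.
    """
    total = local_clusrcvr_count + remote_clussdr_count
    if total == 0:
        raise ValueError("no channels to choose from")
    return Fraction(local_clusrcvr_count, total)


# Four local CLUSRCVR channels and one CLUSSDR channel, as in the example.
share = local_share(4, 1)
print(share, float(share))
```

With the counts reversed (one CLUSRCVR and four CLUSSDR channels), the same calculation gives the local queue one-fifth of the messages.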
- If more than one queue manager remains, if any are not suspended then all those that are suspended are eliminated.
- If more than one remote instance of a queue or topic remains, all channels that are inactive or running are included. The state constants are listed:
- MQCHS_INACTIVE
- MQCHS_RUNNING
- If no remote instance of a queue or topic remains, all channels that are in binding, initializing, starting, or stopping state are included. The state constants are listed:
- MQCHS_BINDING
- MQCHS_INITIALIZING
- MQCHS_STARTING
- MQCHS_STOPPING
- If no remote instance of a queue or topic remains, all channels that are being tried again are included. The state constant is listed:
- MQCHS_RETRYING
- If no remote instance of a queue or topic remains, all channels in requesting, paused, stopped, or switching state are included. The state constants are listed:
- MQCHS_REQUESTING
- MQCHS_PAUSED
- MQCHS_STOPPED
- MQCHS_SWITCHING
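The four channel-status rules above behave like a fallback ladder: a lower tier is consulted only when no channel qualifies in a higher tier. A minimal sketch of that idea, using the state names as plain strings rather than the numeric MQCHS_* values; the `TIERS` table and the `usable_channels` function are illustrative names, not IBM MQ APIs, and the sketch ignores the other elimination rules:

```python
# Tiers in the order the rules list them, most preferred first.
TIERS = [
    {"MQCHS_INACTIVE", "MQCHS_RUNNING"},
    {"MQCHS_BINDING", "MQCHS_INITIALIZING", "MQCHS_STARTING", "MQCHS_STOPPING"},
    {"MQCHS_RETRYING"},
    {"MQCHS_REQUESTING", "MQCHS_PAUSED", "MQCHS_STOPPED", "MQCHS_SWITCHING"},
]


def usable_channels(channels: dict[str, str]) -> list[str]:
    """Return the channels in the first tier that has at least one member.

    `channels` maps a channel name to its current state name.
    """
    for tier in TIERS:
        selected = sorted(name for name, state in channels.items() if state in tier)
        if selected:
            return selected
    return []


# Hypothetical channel names; no channel is inactive or running,
# so the second tier supplies the candidates.
candidates = usable_channels({
    "TO.QM2": "MQCHS_RETRYING",
    "TO.QM3": "MQCHS_STARTING",
    "TO.QM4": "MQCHS_STOPPED",
})
print(candidates)
```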
- If more than one remote instance of a queue or topic on any queue manager remains, channels with the highest NETPRTY value for each queue manager are chosen.
- All remaining channels and queue manager aliases other than channels and aliases with the highest priority, CLWLPRTY, are eliminated. If any queue manager aliases remain, channels to the queue manager are kept.
- If a queue is being chosen:
- All queues other than queues with the highest priority, CLWLPRTY, are eliminated, and channels are kept.
- The remaining channels are then reduced to no more than the maximum allowed number of most recently-used channels, CLWLMRUC, by eliminating the channels with the lowest values of MQWDR.DestSeqNumber.
Note: Internal cluster control messages are sent using the same cluster workload algorithm where appropriate.
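The rank, priority, and most-recently-used rules all follow the same pattern: keep only the candidates with the best value of one attribute, then move on to the next. A simplified sketch of that pattern, assuming a single record type for channels and skipping the intermediate rules (put enablement, local queue preference, suspension, channel status, NETPRTY); the `Channel` class and the `keep_highest` and `narrow` functions are hypothetical names for illustration only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    clwlrank: int
    clwlprty: int
    dest_seq_number: int  # stands in for MQWDR.DestSeqNumber


def keep_highest(channels, key):
    """Eliminate every channel whose key is below the maximum key."""
    if not channels:
        return []
    best = max(key(c) for c in channels)
    return [c for c in channels if key(c) == best]


def narrow(channels, clwlmruc):
    """Apply rank, then priority, then the CLWLMRUC limit."""
    remaining = keep_highest(channels, lambda c: c.clwlrank)
    remaining = keep_highest(remaining, lambda c: c.clwlprty)
    # Keep at most clwlmruc channels, dropping the lowest
    # dest_seq_number values first.
    remaining = sorted(remaining, key=lambda c: c.dest_seq_number, reverse=True)
    return remaining[:clwlmruc]


channels = [
    Channel("A", clwlrank=1, clwlprty=5, dest_seq_number=10),
    Channel("B", clwlrank=2, clwlprty=3, dest_seq_number=7),
    Channel("C", clwlrank=2, clwlprty=3, dest_seq_number=9),
    Channel("D", clwlrank=2, clwlprty=1, dest_seq_number=12),
]
print([c.name for c in narrow(channels, clwlmruc=1)])
```

In this sketch channel A is eliminated by rank even though it has the highest priority, because rank is evaluated before priority.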
- When more than one remote instance of a destination remains and all channels to that destination have CLWLWGHT set to the default setting of 50, the least recently used channel is chosen. This approximately equates to a round-robin style of workload balancing when multiple remote instances exist.
- When more than one remote instance of a destination remains and one or more of the channels to those queues has CLWLWGHT set to a non-default setting (even if they all have a matching non-default value), then routing becomes dependent on the relative weightings of each channel and the total number of times each channel has previously been chosen when sending messages.
- When observing the distribution of messages for a single clustered queue with multiple instances, this can appear to lead to an unbalanced distribution across a subset of queue instances. This is because it is the historic use of each cluster sender channel from this queue manager that is being balanced, not just the message traffic for that queue. If this behavior is not desirable, complete one of the following steps:
- Set CLWLWGHT to 50 on all cluster receiver channels if even distribution is required.
- Or, if certain queue instances need to be weighted differently from others, define those queues in a dedicated cluster, with defined dedicated cluster receiver channels. This action isolates the workload balancing of these queues from others in the cluster.
- The historic data that is used to balance the channels is reset if any cluster workload attributes of available cluster receiver channels are altered or the status of a cluster receiver channel becomes available. Modification to the workload attributes of manually defined cluster sender channels does not reset the historic data.
- When you are considering cluster workload exit logic, the chosen channel is the one with the lowest MQWDR.DestSeqFactor. Each time a channel is chosen, this value is increased by approximately 1000/CLWLWGHT. If there is more than one channel with the lowest value, one of the channels with the lowest MQWDR.DestSeqNumber value is chosen.
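The effect of the weighting rule can be illustrated with a small simulation of the MQWDR.DestSeqFactor bookkeeping described above. This is a sketch under stated assumptions, not the queue manager's implementation: the increment is modeled as the integer division 1000 // CLWLWGHT, DestSeqNumber is modeled as a counter stamped on the channel each time it is chosen, and the `Channel`, `choose`, and `simulate` names are hypothetical:

```python
from collections import Counter


class Channel:
    def __init__(self, name, clwlwght):
        self.name = name
        self.clwlwght = clwlwght   # CLWLWGHT, valid range 1 to 99
        self.dest_seq_factor = 0   # stands in for MQWDR.DestSeqFactor
        self.dest_seq_number = 0   # stands in for MQWDR.DestSeqNumber


def choose(channels, clock):
    """Pick the channel with the lowest factor; break ties on the lowest
    sequence number (the least recently used channel in this model)."""
    chosen = min(channels, key=lambda c: (c.dest_seq_factor, c.dest_seq_number))
    chosen.dest_seq_factor += 1000 // chosen.clwlwght  # assumed increment
    chosen.dest_seq_number = clock                     # assumed update rule
    return chosen


def simulate(weights, picks):
    """Count how often each channel is chosen over a number of picks."""
    channels = [Channel(name, weight) for name, weight in weights.items()]
    counts = Counter()
    for clock in range(1, picks + 1):
        counts[choose(channels, clock).name] += 1
    return dict(counts)


# Channel A has twice the weight of channel B.
print(simulate({"A": 50, "B": 25}, 300))
```

Under these assumptions a channel with twice the weight is chosen twice as often, and two channels with equal weights are chosen alternately, which matches the round-robin behavior described for the default weight.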
The distribution of user messages is not always exact because administration and maintenance of the cluster causes messages to flow across channels. The result is an uneven distribution of user messages that can take some time to stabilize. Because of the admixture of administration and user messages, place no reliance on the exact distribution of messages during workload balancing.