Clustering: Migration and modification best practices
This topic provides guidance for planning and administering IBM MQ clusters. This information is a guide based on testing and feedback from customers.
- Moving objects in a cluster (Best practices for moving objects around inside a cluster, without installing any fix packs or new versions of IBM MQ).
- Upgrades and maintenance installations (Best practices for keeping a working cluster architecture up and running, while applying maintenance or upgrades and testing the new architecture).
Moving objects in a cluster
- Applications and their queues
When you must move a queue instance from one queue manager to another, you can use the cluster workload balancing parameters to ensure a smooth transition.
Create an instance of the queue where it is to be newly hosted, but use cluster workload balancing settings to continue sending messages to the original instance until your application is ready to switch. To do so, complete the following steps:
- Set the CLWLRANK property of the existing queue to a high value, for example 5.
- Create the new instance of the queue and set its CLWLRANK property to 0.
- Complete any further configuration of the new system, for example deploy and start consuming applications against the new instance of the queue.
- Set the CLWLRANK property of the new queue instance to be higher than that of the original instance, for example 9.
- Allow applications to consume any messages that remain on the original queue instance, and then delete that instance.
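The steps above can be sketched as a short script that prints the MQSC commands for each phase. This is only a sketch under stated assumptions: the queue name APP.Q, the cluster name DEMO, and the helper queue_move_mqsc are hypothetical, and the queue is assumed to be a local queue whose commands you would run through runmqsc on the queue manager shown.

```python
def queue_move_mqsc(queue, cluster, old_rank=5, new_rank=9):
    """Return (queue manager, MQSC command) pairs in the order they are run.

    All names are illustrative. CLWLRANK accepts values from 0 to 9.
    """
    if not 0 < old_rank < new_rank <= 9:
        raise ValueError("need 0 < old_rank < new_rank <= 9")
    return [
        # 1. Raise the rank of the existing instance so it keeps receiving messages.
        ("original", f"ALTER QLOCAL({queue}) CLWLRANK({old_rank})"),
        # 2. Create the new instance at rank 0 so that it receives nothing yet.
        ("new", f"DEFINE QLOCAL({queue}) CLUSTER({cluster}) CLWLRANK(0)"),
        # 3. (Deploy and start applications against the new instance here.)
        # 4. Outrank the original instance so that new messages switch over.
        ("new", f"ALTER QLOCAL({queue}) CLWLRANK({new_rank})"),
        # 5. Once the original instance has drained, delete it.
        #    DELETE QLOCAL without PURGE is rejected while messages remain.
        ("original", f"DELETE QLOCAL({queue})"),
    ]


if __name__ == "__main__":
    for qmgr, command in queue_move_mqsc("APP.Q", "DEMO"):
        print(f"[{qmgr} queue manager] {command}")
```

Leaving PURGE off the final DELETE QLOCAL is deliberate in this sketch: the command then fails rather than discarding messages if the original instance has not fully drained.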
- Moving entire queue managers
- If the queue manager is staying on the same host, but the IP address is changing, then the process is as follows:
- DNS, when used correctly, can help simplify the process. For information about using DNS by setting the Connection name (CONNAME) channel attribute, see ALTER CHANNEL.
- If you are moving a full repository, ensure that at least one other full repository is running smoothly (for example, with no channel status problems) before you make changes.
- Suspend the queue manager using the SUSPEND QMGR command to avoid traffic buildup.
- Modify the IP address of the computer. If your CLUSRCVR channel definition uses an IP address in the CONNAME field, modify this IP address entry. You might need to flush the DNS cache to ensure that the update is available everywhere.
- When the queue manager reconnects to the full repositories, channel auto-definitions automatically resolve themselves.
- If the queue manager hosts a full repository and its IP address changes, switch the partial repositories over as soon as possible so that any manually defined CLUSSDR channels point to the new location. Until this switch is carried out, these queue managers might be able to contact only the remaining (unchanged) full repository, and warning messages about the incorrect channel definition might be seen.
- Resume the queue manager using the RESUME QMGR command.
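The MQSC side of the address change above can be sketched as follows. The cluster name DEMO, the channel names, the address 192.0.2.10(1414), and both helper functions are hypothetical. The operating system address change and any DNS flush happen outside MQ, between the suspend and the channel update.

```python
def ip_change_mqsc(cluster, clusrcvr, new_conname):
    """MQSC commands to run on the queue manager whose address is changing."""
    return [
        # Ask the cluster to stop routing new work here while the address changes.
        f"SUSPEND QMGR CLUSTER({cluster})",
        # Needed only if the CLUSRCVR definition holds an IP address in CONNAME.
        f"ALTER CHANNEL({clusrcvr}) CHLTYPE(CLUSRCVR) CONNAME('{new_conname}')",
        # Make the queue manager available to the cluster again.
        f"RESUME QMGR CLUSTER({cluster})",
    ]


def repoint_clussdr_mqsc(clussdr, new_conname):
    """MQSC command for a partial repository whose manually defined
    CLUSSDR channel points at a full repository that has moved."""
    return f"ALTER CHANNEL({clussdr}) CHLTYPE(CLUSSDR) CONNAME('{new_conname}')"


if __name__ == "__main__":
    for command in ip_change_mqsc("DEMO", "DEMO.QM1", "192.0.2.10(1414)"):
        print(command)
    print(repoint_clussdr_mqsc("DEMO.QM1", "192.0.2.10(1414)"))
```

If CONNAME holds a DNS host name rather than an IP address, the ALTER CHANNEL commands in this sketch are not needed, which is one reason DNS can simplify the process.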
If the queue manager must be moved to a new host, it is possible to copy the queue manager data and restore it from a backup. However, this process is not recommended unless there are no other options. It might be better to create a queue manager on the new machine and replicate queues and applications as described in the previous section, because that approach gives a smooth rollover and rollback mechanism.
When creating a queue manager and replicating the setup from an existing queue manager in the cluster (as described previously in this topic), never treat the two different queue managers as actually being the same. In particular, do not give a new queue manager the same queue manager name and IP address. Attempting to 'drop in' a replacement queue manager is a frequent cause of problems in IBM MQ clusters. The cluster cache expects to receive updates that include the QMID attribute, and its state can become corrupted.
If you are determined to move a complete queue manager using backup, follow these best practices:
- Treat the whole process as a queue manager restore from backup, applying any processes you would usually use for system recovery as appropriate for your operating system environment.
- Use the REFRESH CLUSTER command after migration to discard all locally held cluster information (including any auto-defined channels that are in doubt), and force it to be rebuilt. Note: For large clusters, the REFRESH CLUSTER command can be disruptive to the cluster while it is in progress, and again at 27-day intervals thereafter when the cluster objects automatically send status updates to all interested queue managers. See Refreshing in a large cluster can affect performance and availability of the cluster.
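As a minimal sketch of the refresh step, the following helper builds the MQSC command to run on the restored queue manager. The cluster name DEMO and the helper name are hypothetical. REPOS(NO) is shown explicitly; it keeps the locally held information about which queue managers are full repositories.

```python
def refresh_cluster_mqsc(cluster, repos=False):
    """Build the REFRESH CLUSTER command for the restored queue manager.

    REPOS(YES) also discards what is known about the full repositories and
    cannot be used on a queue manager that is itself a full repository.
    """
    return f"REFRESH CLUSTER({cluster}) REPOS({'YES' if repos else 'NO'})"


if __name__ == "__main__":
    print(refresh_cluster_mqsc("DEMO"))
```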
If two different queue managers are accidentally created with the same name, it is recommended to use the RESET CLUSTER QMID command to eject the incorrect entry from the cluster.
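One way to sketch the ejection is shown below. The cluster name DEMO, the QMID value, and the helper names are hypothetical; take the real QMID of the unwanted entry from the DISPLAY CLUSQMGR output. RESET CLUSTER is issued from a full repository queue manager.

```python
def list_qmids_mqsc():
    """MQSC command that shows the QMID of every cluster queue manager entry."""
    return "DISPLAY CLUSQMGR(*) QMID"


def reset_cluster_mqsc(cluster, qmid, remove_queues=False):
    """Build the command that forcibly removes one entry, identified by QMID.

    Identifying the entry by QMID rather than by name matters here, because
    the two queue managers share a name.
    """
    queues = "YES" if remove_queues else "NO"
    return (
        f"RESET CLUSTER({cluster}) ACTION(FORCEREMOVE) "
        f"QMID('{qmid}') QUEUES({queues})"
    )


if __name__ == "__main__":
    print(list_qmids_mqsc())
    # The QMID below is an illustrative placeholder, not a real value.
    print(reset_cluster_mqsc("DEMO", "QM1_2002-03-04_11.07.01"))
```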
Upgrades and maintenance installations
Avoid the so-called big bang scenario (for example, stopping all cluster and queue manager activity, applying all upgrades and maintenance to all queue managers, then starting everything at the same time). Clusters are designed to keep working while queue managers at different versions coexist, so a well-planned, phased maintenance approach is recommended.
Have a backup plan:
- On z/OS®, have you applied backwards migration PTFs?
- Have you taken backups?
- Avoid using new cluster functionality immediately: Wait until you are sure that all the queue managers are upgraded to the new level, and are certain that you are not going to roll any of them back. Using new cluster function in a cluster where some queue managers are still at an earlier level can lead to undefined behavior. For example, in the move to IBM WebSphere MQ Version 7.1 from IBM WebSphere MQ Version 6.0, if a queue manager defines a cluster topic, IBM WebSphere MQ Version 6.0 queue managers will not understand the definition or be able to publish on this topic.
Migrate full repositories first. A full repository at an earlier level can forward information that it does not understand, but it cannot persist that information, so leaving full repositories at the earlier level is not the recommended approach unless absolutely necessary. For more information, see Queue manager cluster migration.