Queue Manager Clusters: Resolving Problems

The following problems can all be resolved using the REFRESH CLUSTER command. You are unlikely to need this command in normal circumstances; use it only if you want your queue manager to make a fresh start in a cluster. Issuing the REFRESH CLUSTER command on a queue manager discards all locally held information about a cluster. For example, you might use it if you think your full repository is not up to date, perhaps because you have accidentally restored an out-of-date backup. The format of the command is:
REFRESH CLUSTER(clustername) REPOS(YES/NO)
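The queue manager from which you issue this command loses all the information in its local repository concerning the named cluster. It also loses any auto-defined channels that are not in doubt and that are not attached to a full repository queue manager. With REPOS(NO), which is the default, the queue manager keeps its knowledge of the cluster queue managers that are full repositories; with REPOS(YES), that information is refreshed as well. REPOS(YES) cannot be used on a queue manager that is itself a full repository, which is why some of the procedures below first alter the queue manager so that it is no longer one. After the command, the queue manager has to make a cold start in that cluster: it must reissue all information about itself and renew its requests for updates to other information that it is interested in. It does this automatically.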
Here are some procedures for recovering clusters.

Problem 1 -- Out-of-date information in a restored cluster.

An image backup of PARIS, a partial repository in cluster DEMO, has been restored, and the cluster information it contains is out of date.

On PARIS, issue the command REFRESH CLUSTER(DEMO).
PARIS removes all information it has about the cluster DEMO, except that relating to the cluster queue managers that are the full repositories in the cluster. Assuming that this information is still correct, PARIS contacts the full repositories, informs them about itself and its queues, and recovers the information for queues and queue managers that exist elsewhere in the cluster as they are opened.
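As a minimal sketch only, the recovery on PARIS might look like this in an MQSC session (for example, under runmqsc). The DISPLAY command is an optional check added here for illustration; it is not part of the procedure above.

```mqsc
* On PARIS: discard locally held information about cluster DEMO.
* REPOS(NO) is the default, so knowledge of the full repositories is kept.
REFRESH CLUSTER(DEMO) REPOS(NO)

* Optional check (an addition, not part of the documented procedure):
* list the cluster queue managers that PARIS currently knows about.
DISPLAY CLUSQMGR(*) CLUSTER(DEMO) QMTYPE STATUS
```

Immediately after the refresh, the display is expected to show little beyond the full repositories and PARIS itself; other entries reappear as the corresponding queues are opened.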

Problem 2 -- PARIS force removed from cluster DEMO by mistake.

The command RESET CLUSTER(DEMO) QMNAME(PARIS) ACTION(FORCEREMOVE) was issued on a full repository in cluster DEMO by mistake.

On PARIS, issue the command REFRESH CLUSTER(DEMO).
See the solution to Problem 1.

Problem 3 -- Possible repository messages deleted.

Messages destined for PARIS were removed from the SYSTEM.CLUSTER.TRANSMIT.QUEUE on other queue managers, and they might have been repository messages.

On PARIS, issue the command REFRESH CLUSTER(DEMO).
See the solution to Problem 1.

Problem 4 -- Two full repositories moved at the same time.

Cluster DEMO contains two full repositories, PARIS and LONDON. They were both moved to a new location on the network at the same time.

Alter the CONNAME in the CLUSRCVR and CLUSSDR channel definitions to specify the new network addresses.
Alter one of the queue managers (PARIS or LONDON) so it is no longer a full repository for any cluster.
On the altered queue manager, issue the command REFRESH CLUSTER(*) REPOS(YES).
Alter the queue manager so that it acts as a full repository again.
This procedure forces the altered queue manager (LONDON, in this explanation) to reuse the information from its corrected CLUSSDR to re-establish contact with PARIS and then rebuild its knowledge of the cluster. Additionally, having once again made contact with PARIS, it is given its own correct network address, based on the CONNAME in its CLUSRCVR definition.
This problem could have been avoided by moving the queue managers one at a time. After one of the queue managers (for example, LONDON) is moved to its new network address, alter its CLUSRCVR with the new address and allow the channel to start, so that the rest of the cluster and the other full repository queue manager (PARIS) are informed of the new address of LONDON. The other queue manager (PARIS) can then be moved to its new network address and restarted, and its CLUSRCVR modified to show its new network address. The manually defined CLUSSDR channels should also be modified for the sake of clarity, even though at this stage they are not needed for the correct operation of the cluster.
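The four steps of this procedure can be sketched in MQSC as seen from LONDON. This is an illustration under stated assumptions, not a definitive script: the channel names TO.LONDON and TO.PARIS and both connection names are hypothetical, and LONDON is assumed to have been made a full repository with the REPOS attribute rather than a REPOSNL namelist.

```mqsc
* Step 1, on LONDON: point the channel definitions at the new addresses.
* (Make the equivalent changes on PARIS.)
* Channel names and connection names are hypothetical.
ALTER CHANNEL(TO.LONDON) CHLTYPE(CLUSRCVR) +
      CONNAME('london.new.example.com(1414)')
ALTER CHANNEL(TO.PARIS) CHLTYPE(CLUSSDR) +
      CONNAME('paris.new.example.com(1414)')

* Step 2, on LONDON: stop acting as a full repository for any cluster.
* Assumes REPOS was used; a REPOSNL namelist would be blanked instead.
ALTER QMGR REPOS(' ')

* Step 3, on LONDON: refresh all clusters, including full repository data.
REFRESH CLUSTER(*) REPOS(YES)

* Step 4, on LONDON: become a full repository for DEMO again.
ALTER QMGR REPOS(DEMO)
```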

Problem 5 -- Unknown state of a cluster.

The state of the cluster is unknown, and all of the systems in it must be completely reset.

On every full repository queue manager, follow these steps:

Alter the queue manager so that it is no longer a full repository.
Resolve any in-doubt CLUSSDR channels.
Wait for the CLUSSDR channels to become inactive.
Stop the CLUSRCVR channels.
When all of the CLUSRCVR channels on all of the full repository systems are stopped, issue the command REFRESH CLUSTER(DEMO) REPOS(YES).
Alter the queue manager so that it is a full repository again.
Start the CLUSRCVR channels to re-enable them for communication.
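The full repository steps above can be sketched in MQSC as follows, as they might be run on one full repository (LONDON in this illustration). The channel names TO.LONDON and TO.PARIS are hypothetical, and the use of REPOS rather than REPOSNL is an assumption. The commands are not meant to be run as one uninterrupted script: the REFRESH CLUSTER command must wait until the CLUSRCVR channels on every full repository are stopped.

```mqsc
* 1. Stop acting as a full repository (assumes REPOS, not REPOSNL).
ALTER QMGR REPOS(' ')

* 2. Inspect saved channel status for in-doubt cluster-sender channels,
*    then resolve each one. Whether COMMIT or BACKOUT is right depends
*    on whether the receiving end has the in-doubt messages; BACKOUT
*    is shown here only as an example.
DISPLAY CHSTATUS(*) SAVED ALL
RESOLVE CHANNEL(TO.PARIS) ACTION(BACKOUT)

* 3. Repeat until no cluster-sender channel is shown as running.
DISPLAY CHSTATUS(*) STATUS

* 4. Stop the cluster-receiver channel (hypothetical name).
STOP CHANNEL(TO.LONDON)

* 5. Only when the CLUSRCVR channels on ALL full repositories are stopped:
REFRESH CLUSTER(DEMO) REPOS(YES)

* 6. Become a full repository for DEMO again.
ALTER QMGR REPOS(DEMO)

* 7. Re-enable the cluster-receiver channel.
START CHANNEL(TO.LONDON)
```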

On each partial repository queue manager, carry out the following steps:

Resolve any in-doubt CLUSSDR channels.
Make sure all CLUSSDR channels on the queue manager are stopped or inactive.
Issue the command REFRESH CLUSTER(DEMO) REPOS(YES).
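A corresponding sketch for one partial repository queue manager is shown here. The channel name TO.LONDON is hypothetical, and the ACTION value is only an example; the right choice depends on the state of the receiving end.

```mqsc
* Inspect saved status, then resolve any in-doubt cluster-sender channel.
DISPLAY CHSTATUS(*) SAVED ALL
RESOLVE CHANNEL(TO.LONDON) ACTION(BACKOUT)

* Confirm that no cluster-sender channel is still running.
DISPLAY CHSTATUS(*) STATUS

* Discard all cluster information, including full repository data.
REFRESH CLUSTER(DEMO) REPOS(YES)
```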

Note:

Under normal conditions, the full repositories exchange information about the queues and queue managers in the cluster. If one full repository is refreshed, its cluster information is recovered from the other. To stop this happening, all of the CLUSRCVR channels to full repositories are stopped and the CLUSSDR channels must be inactive. When you refresh the full repository systems, none of them can communicate, so they all start from the same cleared state.
As you refresh the partial repository systems, they rejoin the cluster and rebuild it to the complete set of queue managers and queues.