Queue manager clusters troubleshooting

Use the checklist given here, and the advice given in the subtopics, to help you detect and deal with problems when you use queue manager clusters.


Before you begin

If your problems relate to publish/subscribe messaging using clusters, rather than to clustering in general, see Routing for publish/subscribe clusters: Notes on behavior.


Procedure

  • Check that your cluster channels are all paired.

    Each cluster sender channel connects to a cluster receiver channel of the same name. If the remote queue manager has no cluster receiver channel with the same name as the cluster sender channel, the sender channel has nothing to connect to and the pair does not work.

  • Check that your channels are running. No channels should be in RETRYING state permanently. Show which channels are running by starting runmqsc and entering the following command:
    runmqsc
    display chstatus(*)
    
    If you have channels in RETRYING state, there might be an error in the channel definition, or the remote queue manager might not be running. While channels are in this state, messages are likely to build up on transmission queues. If channels to full repositories are in this state, the definitions of cluster objects (for example, queues and queue managers) become out of date and inconsistent across the cluster.
  • Check that no channels are in STOPPED state. Channels go into STOPPED state when you stop them manually. You can restart a stopped channel by starting runmqsc and entering the following command:
    runmqsc
    start channel(xyz)
    
    A clustered queue manager auto-defines cluster channels to other queue managers in a cluster, as required. The queue manager starts these auto-defined cluster channels automatically as needed, unless they were previously stopped manually. If an auto-defined cluster channel is stopped manually, the queue manager remembers that it was manually stopped and does not start it automatically in the future. If you need to stop a channel, either remember to restart it at a convenient time, or issue the following command:
    stop channel(xyz) status(inactive)
    
    The status(inactive) option allows the queue manager to restart the channel at a later date if it needs to do so.
  • Check that all queue managers in the cluster are aware of all the full repositories. You can do this by starting runmqsc and entering the following command:
    runmqsc
    display clusqmgr(*) qmtype
    
    Partial repositories might not be aware of all other partial repositories. All full repositories should be aware of all queue managers in the cluster. If cluster queue managers are missing, this might mean that certain channels are not running correctly.
  • Check that every queue manager (full repositories and partial repositories) in the cluster has a manually defined cluster receiver channel that is running and is defined in the correct cluster. To see which other queue managers are talking to a cluster receiver channel, start runmqsc and enter the following command (rqmname is a channel status attribute, so it is displayed with chstatus):
    runmqsc
    display chstatus(*) rqmname
    
    Check that each manually defined cluster receiver has a conname parameter defined to be ipaddress(port). Without a correct connection name, the other queue manager does not know the connection details to use when connecting back.
  • Check that every partial repository has a manually defined cluster sender channel running to a full repository, and defined in the correct cluster.

    The cluster sender channel name must match the cluster receiver channel name on the other queue manager.

  • Check that every full repository has a manually defined cluster sender channel running to every other full repository, and defined in the correct cluster.

    The cluster sender channel name must match the cluster receiver channel name on the other queue manager. A full repository does not keep a record of which other full repositories are in the cluster. It assumes that any queue manager to which it has a manually defined cluster sender channel is a full repository.

  • Check the dead letter queue.

    Messages that the queue manager cannot deliver are sent to the dead letter queue.

  • Check that, for each partial repository queue manager, you have defined a single cluster-sender channel to one of the full repository queue managers. This channel acts as a "bootstrap" channel through which the partial repository queue manager initially joins the cluster.
  • Check that the intended full repository queue managers are actual full repositories and are in the correct cluster. You can do this by starting runmqsc and entering the following command:
    runmqsc
    display qmgr repos reposnl
    
  • Check that messages are not building up on transmission queues or system queues. You can check transmission queues by starting runmqsc and entering the following command:
    runmqsc
    display ql(*) curdepth where (usage eq xmitq)
    
    You can check system queues using the following command:
    display ql(system*) curdepth
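
If you capture the output of display chstatus(*) to a file, the RETRYING and STOPPED checks in this procedure can be partly automated. The following Python sketch is an illustration only, not an IBM-supplied tool. It assumes that the captured text consists of KEYWORD(value) pairs grouped under a message line that starts with AMQ. The sample text, the channel names, the addresses, and the function names parse_records and problem_channels are all invented for this example, so adjust the parsing to match the output of your own MQ level.

```python
import re

# Illustrative text only: shaped like runmqsc output for DISPLAY CHSTATUS(*),
# but every channel name, address, and value here is invented for the sketch.
SAMPLE_OUTPUT = """\
AMQ8417I: Display Channel Status details.
   CHANNEL(TO.QM2)                         CHLTYPE(CLUSSDR)
   CONNAME(192.0.2.10(1414))               CURRENT
   RQMNAME(QM2)                            STATUS(RUNNING)
AMQ8417I: Display Channel Status details.
   CHANNEL(TO.QM3)                         CHLTYPE(CLUSSDR)
   CONNAME(192.0.2.11(1414))               CURRENT
   RQMNAME(QM3)                            STATUS(RETRYING)
AMQ8417I: Display Channel Status details.
   CHANNEL(TO.QM4)                         CHLTYPE(CLUSSDR)
   CONNAME(192.0.2.12(1414))               CURRENT
   RQMNAME(QM4)                            STATUS(STOPPED)
"""

# Matches KEYWORD(value). The value may contain one nested pair of
# parentheses, as in CONNAME(host(port)).
ATTRIBUTE = re.compile(r"([A-Z]+)\(((?:[^()]|\([^()]*\))*)\)")

# Channel states that this checklist asks you to investigate.
PROBLEM_STATES = {"RETRYING", "STOPPED"}


def parse_records(text):
    """Split the captured text into one attribute dictionary per channel."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith("AMQ"):
            # Assumption: each message line begins a new status record.
            current = {}
            records.append(current)
        elif current is not None:
            for key, value in ATTRIBUTE.findall(line):
                current[key] = value
    return records


def problem_channels(text):
    """Return (channel, status) pairs for channels that need attention."""
    return [
        (record["CHANNEL"], record["STATUS"])
        for record in parse_records(text)
        if "CHANNEL" in record and record.get("STATUS") in PROBLEM_STATES
    ]


if __name__ == "__main__":
    for name, status in problem_channels(SAMPLE_OUTPUT):
        print(f"{name}: {status}")
```

A channel reported by this sketch is only a starting point. A channel in RETRYING state still requires you to check the channel definition and the remote queue manager, and a channel in STOPPED state still requires a manual start channel command, as described in the procedure.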