Application issues seen when running REFRESH CLUSTER

Issuing REFRESH CLUSTER is disruptive to the cluster. It might make cluster objects invisible for a short time until the REFRESH CLUSTER processing completes. This can affect running applications. These notes describe some of the application issues you might see.

Reason codes that you might see from MQOPEN, MQPUT, or MQPUT1 calls

During REFRESH CLUSTER the following reason codes might be seen. The reason why each of these codes appears is described in a later section of this topic.

2189 MQRC_CLUSTER_RESOLUTION_ERROR
2085 MQRC_UNKNOWN_OBJECT_NAME
2041 MQRC_OBJECT_CHANGED
2082 MQRC_UNKNOWN_ALIAS_BASE_Q
2270 MQRC_NO_DESTINATIONS_AVAILABLE

All these reason codes indicate name lookup failures at one level or another in the IBM MQ code, which is to be expected if applications are running throughout the time of the REFRESH CLUSTER operation.

The REFRESH CLUSTER operation might be happening locally, or remotely, or both, to cause these outcomes. The likelihood of them appearing is especially high if full repositories are very busy. This happens if REFRESH CLUSTER activities are running locally on the full repository, or remotely on other queue managers in the cluster or clusters that the full repository is responsible for.

For cluster queues that are temporarily absent and will shortly be reinstated, all of these reason codes are temporary, retry-able conditions (although for 2041 MQRC_OBJECT_CHANGED it can be a little complicated to decide whether the condition is retry-able). If consistent with application rules (for example, maximum service times), the application should retry for about a minute, to give the REFRESH CLUSTER activities time to complete. For a modest-sized cluster, completion is likely to be much quicker than that.

If any of these reason codes is returned from MQOPEN, then no object handle is created, but a later retry should be successful in creating one.
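The retry advice above can be sketched as a small deadline-based loop. This is only an illustration under stated assumptions, not IBM MQ API code: MQError, retry_transient, and the operation callable are hypothetical names standing in for whatever exception type and open or put call your MQ client binding provides. The 60-second default mirrors the "about a minute" guideline.

```python
import time

# Reason codes listed in this topic as temporary during REFRESH CLUSTER.
# 2041 is included, but see the bind-on-open caveat for MQPUT.
TRANSIENT_REASONS = {2189, 2085, 2041, 2082, 2270}


class MQError(Exception):
    """Hypothetical stand-in for the exception an MQ client binding raises."""

    def __init__(self, reason):
        super().__init__(f"MQ call failed with reason {reason}")
        self.reason = reason


def retry_transient(operation, max_wait=60.0, delay=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it succeeds or the retry budget is used up.

    Errors with a reason code outside TRANSIENT_REASONS are re-raised
    immediately. A transient error is re-raised once another sleep of
    `delay` seconds would go past `max_wait` seconds in total.
    """
    deadline = clock() + max_wait
    while True:
        try:
            return operation()
        except MQError as err:
            if err.reason not in TRANSIENT_REASONS:
                raise
            if clock() + delay > deadline:
                raise
            sleep(delay)
```

An application with a tighter maximum service time would pass a smaller max_wait. The clock and sleep parameters exist only so that the loop can be exercised without real waiting.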

If any of these reason codes is returned from MQPUT, then the object handle is not automatically closed, and retrying should eventually succeed without a need first to close the object handle. However, if the application opened the handle using bind-on-open options, and so requires all messages to go to the same channel, then (contrary to the application's expectations) it is not guaranteed that the retried put would go to the same channel or queue manager as before. It is therefore wise to close the object handle and open a new one, in that case, to regain the bind-on-open semantics.
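For the bind-on-open case, the close-and-reopen approach might look like the following sketch. It is an assumption-laden illustration rather than real MQI code: MQError, put_bind_on_open, the open_queue callable, and the handle object with put() and close() methods are all hypothetical stand-ins for the equivalents in your MQ client binding.

```python
import time

# Reason codes listed in this topic as temporary during REFRESH CLUSTER.
TRANSIENT_REASONS = {2189, 2085, 2041, 2082, 2270}


class MQError(Exception):
    """Hypothetical stand-in for the exception an MQ client binding raises."""

    def __init__(self, reason):
        super().__init__(f"MQ call failed with reason {reason}")
        self.reason = reason


def put_bind_on_open(open_queue, handle, message,
                     attempts=60, delay=1.0, sleep=time.sleep):
    """Put a message, reopening the handle after any transient failure.

    open_queue: hypothetical callable that opens the queue with
                bind-on-open options and returns a new handle.
    handle:     the currently open handle, or None to open one first.
    Returns the handle that was finally used, so the caller can keep
    putting to the same (possibly new) binding.
    """
    for attempt in range(attempts):
        try:
            if handle is None:
                handle = open_queue()  # may itself fail transiently
            handle.put(message)
            return handle
        except MQError as err:
            if err.reason not in TRANSIENT_REASONS or attempt == attempts - 1:
                raise
            if handle is not None:
                handle.close()  # discard the stale binding
                handle = None
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

The caller replaces its stored handle with the returned one. Messages put before and after the reopen might go to different queue managers, so an application that depends on message affinity needs to account for that boundary itself.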

If any of these reason codes is returned from MQPUT1, then it is unknown whether the problem happened during the open or the put part of the operation. Whichever it is, the operation can be retried. There are no bind-on-open semantics to worry about in this case, because the MQPUT1 operation is an open-put-close sequence that is performed in one continuous action.

Multi-hop scenarios
If the message flow incorporates a multi-hop, such as that shown in the following example, then a name lookup failure caused by REFRESH CLUSTER can occur on a queue manager that is remote from the application. In that case, the application receives a success (zero) return code, but the name lookup failure, if it occurs, prevents a CLUSRCVR channel program from routing the message to any proper destination queue. Instead, the CLUSRCVR channel program follows normal rules to write the message to a dead letter queue, based on the persistence of the message. The reason code associated with that operation is this:

2001 MQRC_ALIAS_BASE_Q_TYPE_ERROR

If there are persistent messages, and no dead letter queues have been defined to receive them, the channels end. Here is an example multi-hop scenario:

MQOPEN on queue manager QM1 specifies Q2.
Q2 is defined in the cluster on a remote queue manager QM2, as an alias.
A message reaches QM2, and finds that Q2 is an alias for Q3.
Q3 is defined in the cluster on a remote queue manager QM3, as a qlocal.
The message reaches QM3, and is put to Q3.
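The scenario above can be mimicked with a toy in-memory model, to show why the application sees success while the message ends up on a dead letter queue. Everything here is hypothetical and greatly simplified: ToyQueueManager, app_put, build_example, and the cluster_cache dictionary are illustrative names, not IBM MQ objects, and clearing the cache on QM2 only imitates the temporary loss of cluster knowledge during REFRESH CLUSTER.

```python
MQRC_NONE = 0
MQRC_ALIAS_BASE_Q_TYPE_ERROR = 2001
MQRC_UNKNOWN_OBJECT_NAME = 2085


class ToyQueueManager:
    """Toy model of a queue manager; every queue manager has a dead letter queue."""

    def __init__(self, name, local=(), aliases=None):
        self.name = name
        self.local = {q: [] for q in local}  # qlocal name -> messages
        self.aliases = dict(aliases or {})   # alias name -> target name
        self.cluster_cache = {}              # cluster queue -> hosting queue manager
        self.dead_letter = []                # (reason code, message) pairs

    def receive(self, queue, message):
        """Route a message that arrived over a cluster channel."""
        target = self.aliases.get(queue, queue)
        if target in self.local:
            self.local[target].append(message)
            return
        host = self.cluster_cache.get(target)
        if host is not None:
            host.receive(target, message)
            return
        # Name lookup failed away from the application: dead-letter it.
        self.dead_letter.append((MQRC_ALIAS_BASE_Q_TYPE_ERROR, message))


def app_put(local_qm, queue, message):
    """Return the reason code the application sees for its put."""
    host = local_qm.cluster_cache.get(queue)
    if host is None:
        return MQRC_UNKNOWN_OBJECT_NAME
    host.receive(queue, message)
    return MQRC_NONE


def build_example():
    """Build QM1, QM2 (Q2 is an alias for Q3), and QM3 (Q3 is a qlocal)."""
    qm1 = ToyQueueManager("QM1")
    qm2 = ToyQueueManager("QM2", aliases={"Q2": "Q3"})
    qm3 = ToyQueueManager("QM3", local=["Q3"])
    qm1.cluster_cache["Q2"] = qm2
    qm2.cluster_cache["Q3"] = qm3
    return qm1, qm2, qm3
```

In this model, calling qm2.cluster_cache.clear() before app_put(qm1, "Q2", ...) still returns zero to the application, but the message is recorded on QM2's dead letter list with reason 2001 instead of arriving on Q3.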

When you test the multi-hop, you might see the following queue manager error log entries:

On the sending and receiving sides, when dead letter queues are in place, and there are persistent messages:

AMQ9544: Messages not put to destination queue

During the processing of channel 'CHLNAME' one or more messages could not be put to the destination queue and attempts were made to put them to a dead letter queue. The location of the queue is $, where 1 is the local dead letter queue and 2 is the remote dead letter queue.

On the receiving side, when a dead letter queue is not in place, and there are persistent messages:

AMQ9565: No dead letter queue defined

AMQ9599: Program could not open a queue manager object

AMQ9999: Channel program ended abnormally

On the sending side, when a dead letter queue is not in place, and there are persistent messages:

AMQ9506: Message receipt confirmation failed

AMQ9780: Channel to remote machine 'a.b.c.d(1415)' is ending because of an error

AMQ9999: Channel program ended abnormally

More details about why each of these reason codes might be displayed when running REFRESH CLUSTER

2189 (088D) (RC2189): MQRC_CLUSTER_RESOLUTION_ERROR

The local queue manager asked its full repositories about the existence of a queue name. There was no response from the full repositories within a hard-coded timeout of 10 seconds. This is because the request message or the response message is on a queue for processing, and this condition will be cleared in due course. At the application, the condition is retry-able, and a retry will succeed when those internal mechanisms have completed.

2085 (0825) (RC2085): MQRC_UNKNOWN_OBJECT_NAME

The local queue manager asked (or has previously asked) its full repositories about the existence of a queue name. The full repositories have responded, saying that they did not know about the queue name. In the context of REFRESH CLUSTER taking place on full and partial repositories, the owner of the queue might not yet have told the full repositories about the queue. Or it might have done so, but the internal messages carrying this information are on a queue for processing, in which case this condition will be cleared in due course. At the application, the condition is retry-able, and a retry will succeed when those internal mechanisms have completed.

2041 (07F9) (RC2041): MQRC_OBJECT_CHANGED

This reason code is most likely to be seen from a bind-on-open MQPUT. The local queue manager knows about the existence of a queue name, and about the remote queue manager where it resides. In the context of REFRESH CLUSTER taking place on full and partial repositories, the record of the queue manager has been deleted and is in the process of being queried from the full repositories. At the application, it is a little complicated to decide whether the condition is retry-able. In fact, if the MQPUT is retried, it will succeed when those internal mechanisms have completed the job of learning about the remote queue manager. However, there is no guarantee that the same queue manager will be used. It is safer to follow the approach usually recommended when MQRC_OBJECT_CHANGED is received, which is to close the object handle and open a new one.

2082 (0822) (RC2082): MQRC_UNKNOWN_ALIAS_BASE_Q

Similar in origin to the 2085 MQRC_UNKNOWN_OBJECT_NAME condition, this reason code is seen when a local alias is used, and its TARGET is a cluster queue that is inaccessible for the reasons previously described for reason code 2085.

2001 (07D1) (RC2001): MQRC_ALIAS_BASE_Q_TYPE_ERROR

This reason code is not usually seen at applications. It is only likely to be seen in the queue manager error logs, in relation to attempts to send a message to a dead letter queue. A CLUSRCVR channel program has received a message from its partner CLUSSDR and is deciding where to put it. This scenario is just a variation of the same condition previously described for reason codes 2082 and 2085. In this case, the reason code is seen when an alias is being processed at a different point in the MQ product, compared to where it is processed during an application MQPUT or MQOPEN.

2270 (08DE) (RC2270): MQRC_NO_DESTINATIONS_AVAILABLE

This reason code is seen when an application is using a queue that it opened with MQOO_BIND_NOT_FIXED, and the destination objects are unavailable for a short time until the REFRESH CLUSTER processing completes.
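The reason codes described in this section can be collected into a small lookup table for logging or diagnostics. This is a hedged sketch: REASONS and describe are hypothetical names, and the suggested actions are a condensed reading of the descriptions above rather than an official mapping. The hexadecimal form is derived from the decimal code in the same style as the headings.

```python
# Decimal reason code -> (symbolic name, suggested handling per this topic).
REASONS = {
    2189: ("MQRC_CLUSTER_RESOLUTION_ERROR", "retry"),
    2085: ("MQRC_UNKNOWN_OBJECT_NAME", "retry"),
    2041: ("MQRC_OBJECT_CHANGED", "close and reopen, then retry"),
    2082: ("MQRC_UNKNOWN_ALIAS_BASE_Q", "retry"),
    2270: ("MQRC_NO_DESTINATIONS_AVAILABLE", "retry"),
    2001: ("MQRC_ALIAS_BASE_Q_TYPE_ERROR",
           "check error logs and dead letter queue"),
}


def describe(reason):
    """Format a reason code as 'dddd (XXXX) (RCdddd): NAME -> action'."""
    name, action = REASONS[reason]
    return f"{reason} ({reason:04X}) (RC{reason}): {name} -> {action}"
```

For example, describe(2189) yields "2189 (088D) (RC2189): MQRC_CLUSTER_RESOLUTION_ERROR -> retry".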

Further remarks

If there is any clustered publish/subscribe activity in this environment, then REFRESH CLUSTER can have additional unwanted effects. For example, subscriptions can be temporarily lost, so that subscribers find they have missed a message. See REFRESH CLUSTER considerations for publish/subscribe clusters.

Related information

REFRESH CLUSTER considerations for publish/subscribe clusters
Clustering: Using REFRESH CLUSTER best practices
MQSC Commands reference: REFRESH CLUSTER

Last updated: 2020-10-04