Check that the other end of the channel is still available

We can use the heartbeat interval, the keepalive interval, and the receive timeout to check that the other end of the channel is available.


Heartbeats

We can use the heartbeat interval channel attribute to specify that heartbeat flows are passed from the sending MCA when there are no messages on the transmission queue, as described in Heartbeat interval (HBINT).
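The following Python sketch is a simplified model of this behavior, not the MCA's actual implementation. It assumes that, after each flow, the sending MCA waits up to HBINT seconds for a message to arrive on the transmission queue and sends a heartbeat whenever that wait expires with nothing to send. The function heartbeat_times and its parameters are hypothetical.

```python
def heartbeat_times(message_times, hbint, horizon):
    """Return the times (in seconds) at which heartbeats would flow.

    Simplified model, not the MCA's real algorithm: after each flow the
    sender waits up to `hbint` seconds for a message; if none arrives,
    it sends a heartbeat and starts waiting again.
    """
    if hbint <= 0:
        return []  # HBINT(0): no heartbeat flows in this model
    messages = sorted(message_times)
    beats = []
    last_flow = 0  # time of the most recent message or heartbeat
    i = 0
    while last_flow + hbint <= horizon:
        deadline = last_flow + hbint
        if i < len(messages) and messages[i] <= deadline:
            # A message arrived before the wait expired: no heartbeat needed.
            last_flow = messages[i]
            i += 1
        else:
            # The transmission queue stayed empty for a full interval.
            beats.append(deadline)
            last_flow = deadline
    return beats
```

For example, heartbeat_times([100, 250], 300, 1000) returns [550, 850]: the messages at 100 and 250 seconds restart the wait, so the first heartbeat flows 300 seconds after the last message.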


Keep alive

In IBM MQ for z/OS, if we are using TCP/IP as the transport protocol, we can also specify a value for the Keepalive interval channel attribute (KAINT). We can use this attribute to specify a time-out value for each channel, as described in Keepalive Interval (KAINT). It is recommended that the Keepalive interval is given a higher value than the heartbeat interval and a smaller value than the disconnect interval.
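As an illustration of this recommendation, the following Python sketch checks that a proposed KAINT lies strictly between HBINT and the disconnect interval (DISCINT). The function kaint_warnings is hypothetical, all values are assumed to be positive numbers of seconds, and special values such as zero or AUTO are deliberately not modelled.

```python
def kaint_warnings(hbint: int, kaint: int, discint: int) -> list[str]:
    """Return warnings if KAINT breaks the ordering HBINT < KAINT < DISCINT."""
    # This sketch only handles positive values in seconds; zero and AUTO
    # have special meanings that are not modelled here.
    for name, value in (("HBINT", hbint), ("KAINT", kaint), ("DISCINT", discint)):
        if value <= 0:
            raise ValueError(f"{name} must be a positive number of seconds")
    warnings = []
    if kaint <= hbint:
        warnings.append(f"KAINT ({kaint}s) should be greater than HBINT ({hbint}s)")
    if kaint >= discint:
        warnings.append(f"KAINT ({kaint}s) should be less than DISCINT ({discint}s)")
    return warnings
```

For example, kaint_warnings(300, 360, 6000) returns an empty list, while kaint_warnings(300, 200, 6000) returns one warning because KAINT is not greater than HBINT.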

In IBM MQ for IBM i, UNIX, Linux, and Windows systems, if we are using TCP as our transport protocol, we can set KeepAlive=Yes. If we specify this option, TCP periodically checks that the other end of the connection is still available. If it is not, the channel is terminated. This option is described in Keepalive Interval (KAINT).

If we have unreliable channels that report TCP errors, using the Keepalive option makes our channels more likely to recover.

We can specify time intervals to control the behavior of the Keepalive option. When we change the time interval, only TCP/IP channels started after the change are affected. Ensure that the value that we choose for the time interval is less than the value of the disconnect interval for the channel.
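At the socket level, a TCP keepalive setting corresponds to the standard SO_KEEPALIVE socket option. The Python sketch below enables that option on a plain socket purely to illustrate the mechanism; it is not how IBM MQ configures its channels. The helper enable_tcp_keepalive is hypothetical, the 360-second idle time is an arbitrary example, and TCP_KEEPIDLE (the idle time before the first probe) exists only on some platforms, so it is set only when this Python build exposes it.

```python
import socket


def enable_tcp_keepalive(sock, idle_seconds=None):
    """Turn on TCP keepalive probes for `sock` (illustration only)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is platform specific, so only set it where available.
    if idle_seconds is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_tcp_keepalive(s, idle_seconds=360)
        enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
        print("keepalive enabled:", enabled)
```

Once keepalive is enabled on a connected socket, the operating system sends probes on an idle connection and reports an error to the application if the peer stops answering, which is how a dead partner is eventually detected.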

For more information about using the Keepalive option, see the KAINT parameter in the DEFINE CHANNEL command.


Receive timeout

If we are using TCP as our transport protocol, the receiving end of an idle non-MQI channel connection is also closed if no data is received for a period. This period, the receive time-out value, is determined according to the HBINT (heartbeat interval) value.

In IBM MQ for IBM i, UNIX, Linux, and Windows systems, the receive time-out value is set as follows:
  1. For an initial number of flows, before any negotiation takes place, the receive time-out value is twice the HBINT value from the channel definition.
  2. After the channels negotiate an HBINT value, if HBINT is set to less than 60 seconds, the receive time-out value is set to twice this value. If HBINT is set to 60 seconds or more, the receive time-out value is set to 60 seconds greater than the value of HBINT.
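The two rules above can be expressed as a small Python function. This is a sketch of the arithmetic only; the function name receive_timeout and its negotiated flag are hypothetical, and a zero HBINT is treated as "no timeout", as stated in the notes later in this section.

```python
def receive_timeout(hbint, negotiated=True):
    """Receive time-out in seconds on IBM i, UNIX, Linux, and Windows.

    `hbint` is the channel-definition HBINT before negotiation and the
    negotiated HBINT afterwards. Returns None when there is no timeout.
    """
    if hbint == 0:
        return None  # a zero value means no timeout
    if not negotiated or hbint < 60:
        return 2 * hbint  # step 1, or step 2 with HBINT under 60 seconds
    return hbint + 60  # step 2 with HBINT of 60 seconds or more
```

For example, a negotiated HBINT of 10 seconds gives a receive time-out of 20 seconds, and a negotiated HBINT of 300 seconds gives 360 seconds. At exactly 60 seconds both rules give 120 seconds.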

In IBM MQ for z/OS, the receive time-out value is set as follows:

  1. For an initial number of flows, before any negotiation takes place, the receive time-out value is twice the HBINT value from the channel definition.
  2. If RCVTIME is set, the timeout is set to one of

    • the negotiated HBINT multiplied by a constant
    • the negotiated HBINT plus a constant number of seconds
    • a constant number of seconds

    depending on the RCVTTYPE parameter, and subject to any limit imposed by RCVTMIN if it applies. RCVTMIN does not apply when RCVTTYPE(EQUAL) is configured. If we use a constant value of RCVTIME and we use a heartbeat interval, do not specify an RCVTIME less than the heartbeat interval. For details of the RCVTIME, RCVTMIN and RCVTTYPE attributes, see the ALTER QMGR command.
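The z/OS rule in step 2 can be sketched in Python as follows. This models only the case where RCVTIME is set. The function receive_timeout_zos is hypothetical; it treats RCVTMIN as a lower limit on the result for RCVTTYPE(MULTIPLY) and RCVTTYPE(ADD), and, following Note 1 in this section, treats a zero negotiated HBINT or a zero RCVTIME as "no timeout". Check the ALTER QMGR command for the exact semantics of zero values before relying on this.

```python
def receive_timeout_zos(negotiated_hbint, rcvtime, rcvttype, rcvtmin=0):
    """Receive time-out in seconds on z/OS once HBINT has been negotiated.

    Models only the case where RCVTIME is set. Returns None for no timeout.
    """
    # Assumption, following Note 1: a zero HBINT or RCVTIME means no timeout.
    if negotiated_hbint == 0 or rcvtime == 0:
        return None
    kind = rcvttype.upper()
    if kind == "EQUAL":
        return rcvtime  # a constant number of seconds; RCVTMIN does not apply
    if kind == "MULTIPLY":
        timeout = negotiated_hbint * rcvtime  # HBINT multiplied by a constant
    elif kind == "ADD":
        timeout = negotiated_hbint + rcvtime  # HBINT plus a constant
    else:
        raise ValueError(f"unknown RCVTTYPE: {rcvttype}")
    return max(timeout, rcvtmin)  # RCVTMIN acts as a lower limit
```

For example, with RCVTTYPE(ADD) RCVTIME(60) and a negotiated HBINT of 300 seconds, receive_timeout_zos(300, 60, "ADD") returns 360. With a short HBINT of 5 seconds, RCVTTYPE(MULTIPLY) RCVTIME(2), and RCVTMIN(30), the result is raised from 10 to 30 seconds.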

Note:

  1. If either of the values is zero, there is no timeout.
  2. For connections that do not support heartbeats, the HBINT value is negotiated to zero in step 2 and hence there is no timeout, so we must use TCP/IP KEEPALIVE.
  3. For client connections that use sharing conversations, heartbeats can flow across the channel (from both ends) all the time, not just when an MQGET is outstanding.
  4. For client connections where sharing conversations are not in use, heartbeats are flowed from the server only when the client issues an MQGET call with wait. Therefore, it is not recommended to set the heartbeat interval too small for client channels. For example, if the heartbeat interval is set to 10 seconds, an MQCMIT call fails (with MQRC_CONNECTION_BROKEN) if it takes longer than 20 seconds to commit, because no data flowed during this time. This can happen with large units of work. However, it does not happen if appropriate values are chosen for the heartbeat interval, because only MQGET with wait takes significant periods of time.

    Provided SHARECNV is not zero, the client uses a full-duplex connection, which means that the client can (and does) heartbeat during all MQI calls.

  5. In IBM MQ Version 7 client channels, heartbeats can flow from both the server and the client side. The timeout at either end is based on 2*HBINT for HBINT values of less than 60 seconds and HBINT+60 for HBINT values of 60 seconds or more.
  6. Canceling the connection after twice the heartbeat interval is valid because a data or heartbeat flow is expected at least at every heartbeat interval. Setting the heartbeat interval too small, however, can cause problems, especially if we are using channel exits. For example, if the HBINT value is one second, and a send or receive exit is used, the receiving end waits for only 2 seconds before canceling the channel. If the MCA is performing a task such as encrypting the message, this value might be too short.


Suggested settings

IBM MQ for z/OS

As an initial starting point, we can use:
/cpf ALTER QMGR TCPKEEP(YES) RCVTTYPE(ADD) RCVTIME(60) ADOPTMCA(ALL) ADOPTCHK(ALL)
where cpf is the command prefix for the queue manager subsystem.

See ALTER QMGR and IBM MQ network availability for more information on the various parameters.

If the IP address of the sender could translate to more than one address, you might need to set ADOPTCHK to QMNAME rather than ALL.

IBM MQ for Multiplatforms

In qm.ini, add the following information:
TCP:
   KeepAlive=Yes
CHANNELS:
   AdoptNewMCA=ALL
   AdoptNewMCACheck=ALL

See ALTER QMGR, Configuration file stanzas for distributed queuing, and Channels stanza of the qm.ini file for more information.

If the IP address of the sender could translate to more than one address, you might need to set AdoptNewMCACheck to QMNAME rather than ALL.
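To confirm that an existing qm.ini contains the suggested TCP and CHANNELS settings, a minimal Python sketch such as the following could be used. It assumes a simplified reading of the qm.ini format: a stanza name ending in a colon, followed by Key=Value lines, with lines starting with ; or # treated as comments. It compares names and values case-insensitively and keeps only the last occurrence of a repeated stanza. RECOMMENDED, parse_qm_ini, and missing_settings are hypothetical helpers, not part of IBM MQ.

```python
# Suggested settings from this section; adjust AdoptNewMCACheck if needed.
RECOMMENDED = {
    "TCP": {"KeepAlive": "Yes"},
    "CHANNELS": {"AdoptNewMCA": "ALL", "AdoptNewMCACheck": "ALL"},
}


def parse_qm_ini(text):
    """Parse qm.ini-style text into {STANZA: {KEY: VALUE}} (upper-cased)."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # blank line or comment (simplifying assumption)
        if line.endswith(":") and "=" not in line:
            current = stanzas.setdefault(line[:-1].strip().upper(), {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip().upper()] = value.strip().upper()
    return stanzas


def missing_settings(text):
    """List the recommended settings that are absent or have another value."""
    stanzas = parse_qm_ini(text)
    missing = []
    for stanza, attrs in RECOMMENDED.items():
        found = stanzas.get(stanza.upper(), {})
        for key, value in attrs.items():
            if found.get(key.upper()) != value.upper():
                missing.append(f"{stanza}: {key}={value}")
    return missing
```

If the sender's address can resolve to more than one IP address, as described above, change the AdoptNewMCACheck entry in RECOMMENDED to the value chosen for that queue manager.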
