Components of a high availability solution on IBM i

Construct a high availability solution using multi-instance queue managers by providing robust networked storage for queue manager data, journal replication or robust IASP storage for queue manager journals, and reconnectable clients or applications configured as restartable queue manager services.

A multi-instance queue manager reacts to the detection of queue manager failure by resuming the startup of another queue manager instance on another server. To complete its startup, the instance needs access to the shared queue manager data in networked storage, and to its copy of the local queue manager journal.

To create a high availability solution, you need to manage the availability of the queue manager data and the currency of the local queue manager journal. You also need to either build reconnectable client applications or deploy your applications as queue manager services that restart automatically when the queue manager resumes. Automatic client reconnect is not supported by IBM MQ classes for Java™.


Queue manager data

Place queue manager data onto networked storage that is shared, highly available and reliable, possibly by using RAID level 1 disks or greater. The file system must meet the requirements for shared file systems used by multi-instance queue managers; for details, see Requirements for shared file systems. Network File System Version 4 (NFS4) is a protocol that meets these requirements.


Queue manager journals

You also need to configure the IBM i journals used by the queue manager instances so that the standby instance is able to restore its queue manager data to a consistent state. For uninterrupted service, you must restore the journals to the state they were in when the active instance failed. Unlike in backup or disaster recovery solutions, restoring the journals to an earlier checkpoint is not sufficient.

You cannot physically share journals between multiple IBM i systems on networked storage. To restore queue manager journals to a consistent state at the point of failure, you must either transfer the physical journal that was local to the active queue manager instance at the time of failure to the newly activated instance, or maintain mirrors of the journal on running standby instances. A mirrored journal is a remote journal replica that has been kept exactly in sync with the local journal belonging to the failed instance.

Three configurations are starting points for designing how you manage the journals for a multi-instance queue manager:
  1. Use synchronized journal replication (journal mirroring) from the active instance ASP to the standby instance ASPs.
  2. Transfer an IASP that you have configured to hold the queue manager journal from the active instance to the standby instance that is taking over as the active instance.
  3. Use synchronized secondary IASP mirrors.

For more information about putting queue manager data onto an IASP, see ASP options in the description of the IBM MQ for IBM i CRTMQM command.

Also, see High availability in the IBM i Knowledge Center.


Applications

To build a client that automatically reconnects to the queue manager when the standby queue manager resumes, connect your application to the queue manager using MQCONNX and specify MQCNO_RECONNECT_Q_MGR in the Options field of the MQCNO structure. See High availability sample programs for three sample programs that use reconnectable clients, and Application recovery for information about designing client applications for recovery.