Configure high availability, recovery and restart

We can make our applications highly available by maintaining queue availability if a queue manager fails, and by recovering messages after server or storage failure.

About this task

On z/OS®, high availability is built into the platform. Extra servant regions are spawned as needed to meet increased demand. We can also improve server application availability by using queue-sharing groups. See Shared queues and queue-sharing groups.

On Multiplatforms, we can improve client application availability by using client reconnection to switch a client automatically between a group of queue managers, or to the new active instance of a multi-instance queue manager after a queue manager failure. Automatic client reconnect is not supported by IBM MQ classes for Java™. A multi-instance queue manager is configured to run as a single queue manager on multiple servers. You deploy server applications to this queue manager. If the server running the active instance fails, execution is automatically switched to a standby instance of the same queue manager on a different server. If you configure server applications to run as queue manager services, they are restarted when a standby instance becomes the actively running queue manager instance.
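The reconnection behaviour described above can be pictured with a small, purely conceptual sketch. It is not the IBM MQ client API, and in practice the IBM MQ client performs reconnection for us once it is configured. The names `connect_with_failover`, `fake_connect`, `ConnectionFailed`, and the endpoints `hostA(1414)` and `hostB(1414)` are hypothetical, chosen only for illustration.

```python
import time
from typing import Callable, Sequence, Tuple


class ConnectionFailed(Exception):
    """Raised by the hypothetical connect function when an endpoint is unreachable."""


def connect_with_failover(
    endpoints: Sequence[str],
    connect: Callable[[str], object],
    attempts_per_endpoint: int = 1,
    delay_seconds: float = 0.0,
) -> Tuple[str, object]:
    """Try each endpoint in order and return the first that accepts a connection."""
    last_error = None
    for endpoint in endpoints:
        for _ in range(attempts_per_endpoint):
            try:
                return endpoint, connect(endpoint)
            except ConnectionFailed as exc:
                last_error = exc
                time.sleep(delay_seconds)  # pause before the next attempt
    raise ConnectionFailed(f"no queue manager reachable: {last_error}")


def fake_connect(endpoint: str) -> str:
    # Stand-in for a real client connection (assumption): the server that
    # hosted the active instance is down, the standby server is reachable.
    if endpoint == "hostA(1414)":
        raise ConnectionFailed("active instance stopped")
    return f"connected to {endpoint}"


if __name__ == "__main__":
    # The client ends up on the second endpoint in its list.
    print(connect_with_failover(["hostA(1414)", "hostB(1414)"], fake_connect))
```

The ordered endpoint list stands in for either a group of queue managers or the two servers of a multi-instance queue manager. The sketch shows only the switch-over idea, not how IBM MQ restores an application's state after reconnecting.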
Another way to increase server application availability on Multiplatforms is to deploy server applications to multiple computers in a queue manager cluster. From IBM WebSphere MQ Version 7.1 onwards, cluster error recovery reruns operations that caused problems until the problems are resolved. See Changes to cluster error recovery on servers other than z/OS. We can also configure IBM MQ for Multiplatforms as part of a platform-specific clustering solution such as:

Microsoft Cluster Server
HA clusters on IBM i
PowerHA® for AIX (formerly HACMP on AIX) and other UNIX and Linux clustering solutions
A messaging system ensures that messages entered into the system are delivered to their destination. IBM MQ can trace the route of a message as it moves from one queue manager to another using the dspmqrte command. If a system fails, messages can be recovered in various ways depending on the type of failure, and the way a system is configured. IBM MQ maintains recovery logs of the activities of the queue managers that handle the receipt, transmission, and delivery of messages. It uses these logs for three types of recovery:

Restart recovery, when you stop IBM MQ in a planned way.
Failure recovery, when a failure stops IBM MQ.
Media recovery, to restore damaged objects.
In all cases, the recovery restores the queue manager to the state it was in when the queue manager stopped, except that any in-flight transactions are rolled back, removing from the queues any updates that were in-flight at the time the queue manager stopped. Recovery restores all persistent messages; nonpersistent messages might be lost during the process.
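The recovery rule above (persistent messages restored, in-flight transactions rolled back, nonpersistent messages possibly lost) can be sketched as a toy log replay. This is a simplified model under stated assumptions, not IBM MQ's actual log format or recovery algorithm. The `LogRecord` type, the `recover` function, and the choice to record nonpersistent puts in the log with a flag are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogRecord:
    kind: str                # "put" or "commit" (toy record types)
    txid: int                # transaction the record belongs to
    message: str = ""        # message body, used by "put" records only
    persistent: bool = True  # whether the message was put as persistent


def recover(log: List[LogRecord]) -> List[str]:
    """Rebuild queue contents from the log after a stop or failure."""
    # Only transactions with a commit record survive; anything still
    # in flight when the queue manager stopped is rolled back.
    committed = {r.txid for r in log if r.kind == "commit"}
    return [
        r.message
        for r in log
        if r.kind == "put" and r.txid in committed and r.persistent
    ]


if __name__ == "__main__":
    log = [
        LogRecord("put", txid=1, message="order-1"),
        LogRecord("put", txid=1, message="heartbeat", persistent=False),
        LogRecord("commit", txid=1),
        LogRecord("put", txid=2, message="order-2"),  # never committed
    ]
    print(recover(log))
```

In this model `order-1` is restored, `heartbeat` is dropped because it is nonpersistent, and `order-2` is removed because its transaction was still in flight. The toy always drops nonpersistent messages, whereas the text says only that they might be lost.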

Automatic client reconnection
We can make our client applications reconnect automatically, without writing any additional code, by configuring a number of components.
Console message monitoring
On IBM MQ for z/OS, the queue manager or channel initiator issues a number of information messages that should be considered particularly significant. These messages do not in themselves indicate a problem, but they are worth tracking because they can point to a potential issue that might need addressing.
High availability configurations
To operate IBM MQ queue managers in a high availability (HA) configuration, we can set up the queue managers to work either with a high availability manager, such as PowerHA for AIX (formerly HACMP) or the Microsoft Cluster Service (MSCS), or with IBM MQ multi-instance queue managers. On Linux systems, we can also deploy replicated data queue managers (RDQMs), which use a quorum-based group to provide high availability.
Logging: Making sure that messages are not lost
IBM MQ records all significant changes to the persistent data controlled by the queue manager in a recovery log.
Backing up and restoring IBM MQ queue manager data
We can protect queue managers against possible corruption caused by hardware failures by backing up queue managers and queue manager data, by backing up the queue manager configuration only, and by using a backup queue manager.