Configure high availability, recovery and restart
We can make our applications highly available by maintaining queue availability if a queue manager fails, and by recovering messages after server or storage failure.
About this task
On z/OS®, high availability is built into the platform. Extra servant regions are spawned as needed to meet increased demand. We can also improve server application availability by using queue-sharing groups. See Shared queues and queue-sharing groups.
On Multiplatforms, we can improve client application availability by using client reconnection to switch a client automatically between a group of queue managers, or to the new active instance of a multi-instance queue manager after a queue manager failure. Automatic client reconnect is not supported by IBM MQ classes for Java™.
A multi-instance queue manager is configured to run as a single queue manager on multiple servers. We deploy server applications to this queue manager. If the server running the active instance fails, execution is automatically switched to a standby instance of the same queue manager on a different server. If we configure server applications to run as queue manager services, they are restarted when a standby instance becomes the active queue manager instance.
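The idea behind switching a client between a group of queue managers can be illustrated with a small Python sketch. This is a toy model only: IBM MQ performs reconnection inside the client itself, and the names used here (`connect_with_failover`, `fake_connect`, `ConnectionFailed`, and the endpoint strings) are hypothetical and not part of any IBM MQ API.

```python
class ConnectionFailed(Exception):
    """Raised by the stand-in connect function when an endpoint is down."""


def connect_with_failover(endpoints, connect, max_rounds=3):
    """Try each endpoint in order, repeating the whole list up to max_rounds times."""
    for _ in range(max_rounds):
        for endpoint in endpoints:
            try:
                return connect(endpoint)
            except ConnectionFailed:
                continue  # this endpoint is unavailable, try the next one
    raise ConnectionFailed(f"no endpoint reachable after {max_rounds} rounds")


# Hypothetical stand-in for a real connection call: only the standby server is up.
available = {"serverB(1414)"}


def fake_connect(endpoint):
    if endpoint not in available:
        raise ConnectionFailed(endpoint)
    return f"connected to {endpoint}"


result = connect_with_failover(["serverA(1414)", "serverB(1414)"], fake_connect)
print(result)
```

In this sketch the first endpoint stands for the failed active instance and the second for the standby that has taken over, so the client ends up connected to the second one without any application-level handling.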
Another way to increase server application availability on Multiplatforms is to deploy server applications to multiple computers in a queue manager cluster. From IBM WebSphere MQ Version 7.1 onwards, cluster error recovery reruns operations that caused problems until the problems are resolved. See Changes to cluster error recovery on servers other than z/OS. We can also configure IBM MQ for Multiplatforms as part of a platform-specific clustering solution such as:
- Microsoft Cluster Server
- HA clusters on IBM i
- PowerHA® for AIX (formerly HACMP on AIX) and other UNIX and Linux clustering solutions
A messaging system ensures that messages entered into the system are delivered to their destination. IBM MQ can trace the route of a message as it moves from one queue manager to another using the dspmqrte command. If a system fails, messages can be recovered in various ways depending on the type of failure, and the way a system is configured. IBM MQ maintains recovery logs of the activities of the queue managers that handle the receipt, transmission, and delivery of messages. It uses these logs for three types of recovery:
- Restart recovery, when you stop IBM MQ in a planned way.
- Failure recovery, when a failure stops IBM MQ.
- Media recovery, to restore damaged objects.
In all cases, the recovery restores the queue manager to the state it was in when the queue manager stopped, except that any in-flight transactions are rolled back, removing from the queues any updates that were in-flight at the time the queue manager stopped. Recovery restores all persistent messages; nonpersistent messages might be lost during the process.
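The effect of log-based recovery on queue contents can be sketched with a toy model in Python. It is an illustration under simplifying assumptions, not a description of how IBM MQ implements its recovery logs: the class `ToyQueueManager` and its methods are hypothetical, and the model always discards nonpersistent messages, whereas in IBM MQ they only might be lost.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueueManager:
    """Toy model only; not IBM MQ's actual logging implementation."""
    queue: list = field(default_factory=list)      # (message, persistent) pairs on the queue
    log: list = field(default_factory=list)        # committed persistent messages only
    in_flight: list = field(default_factory=list)  # puts in an uncommitted transaction

    def put(self, message, persistent):
        # A put stays in flight until the transaction is committed.
        self.in_flight.append((message, persistent))

    def commit(self):
        # Committed messages become visible; persistent ones are also logged.
        for message, persistent in self.in_flight:
            self.queue.append((message, persistent))
            if persistent:
                self.log.append(message)
        self.in_flight.clear()

    def recover(self):
        # Roll back in-flight work and rebuild the queue from the log alone.
        self.in_flight.clear()
        self.queue = [(message, True) for message in self.log]


qm = ToyQueueManager()
qm.put("order-1", persistent=True)
qm.put("status-ping", persistent=False)
qm.commit()
qm.put("order-2", persistent=True)  # still in flight when the queue manager stops
qm.recover()

recovered = [message for message, _ in qm.queue]
print(recovered)  # ['order-1']
```

After recovery, only the committed persistent message remains: the uncommitted put is rolled back and the nonpersistent message is gone.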