Reliability and availability on IBM i
Multi-instance queue managers aim to improve the availability of applications. Technological and physical constraints mean we need different solutions to meet the demands of disaster recovery, backing up queue managers and continuous operation.
In configuring for reliability and availability you trade off a large number of factors, resulting in four distinct design points:
- Disaster recovery
- Optimized for recovery after a major disaster that destroys all your local assets.
Disaster recovery on IBM i is often based on geographic mirroring of independent auxiliary storage pools (IASPs).
- Backup
- Optimized for recovery after a localized failure, commonly a human error or some unforeseen technical problem.
IBM MQ provides backup queue managers to back up queue managers periodically. You could also use asynchronous replication of queue manager journals to improve the currency of the backup.
- Availability
- Optimized for restoring operations quickly, giving the appearance of a nearly uninterrupted service following foreseeable technical failures such as a server or disk failure.
Recovery is typically measured in minutes, with detection sometimes taking longer than the recovery process. A multi-instance queue manager assists you in configuring for availability.
- Continuous operation
- Optimized for providing an uninterrupted service.
Continuous operation solutions have to solve the detection problem, and nearly always involve submitting the same work through more than one system and either using the first result, or if correctness is a major consideration, comparing at least two outcomes.
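The two strategies mentioned for continuous operation, using the first result or comparing at least two outcomes, can be sketched as follows. This is a minimal illustration in Python, not part of IBM MQ: the function names `first_result` and `agreed_result`, the `quorum` parameter, and the three stand-in systems are all hypothetical, and each "system" is assumed to be a plain callable that returns a hashable reply.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_result(systems, request):
    """Submit the request to every system and return the first successful reply."""
    errors = []
    # Note: leaving the `with` block still waits for slower systems to finish.
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = [pool.submit(system, request) for system in systems]
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception as exc:  # a failed system is skipped, not fatal
                errors.append(exc)
    raise RuntimeError(f"all {len(systems)} systems failed: {errors}")


def agreed_result(systems, request, quorum=2):
    """Return a reply only after at least `quorum` systems produced the same one."""
    counts = Counter()
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = [pool.submit(system, request) for system in systems]
        for future in as_completed(futures):
            try:
                reply = future.result()
            except Exception:
                continue  # ignore systems that failed outright
            counts[reply] += 1
            if counts[reply] >= quorum:
                return reply
    raise RuntimeError(f"fewer than {quorum} systems agreed on a reply")


# Hypothetical stand-ins for independent systems processing the same work.
def system_a(request):
    return request.upper()


def system_b(request):
    raise ConnectionError("system B is unavailable")


def system_c(request):
    return request.upper()


if __name__ == "__main__":
    systems = [system_a, system_b, system_c]
    print(first_result(systems, "order-42"))
    print(agreed_result(systems, "order-42"))
```

In this sketch a failed system never has to be detected explicitly: its reply is simply absent, which is how redundant submission sidesteps the detection problem. Comparing outcomes costs extra latency, because the caller waits for the second matching reply rather than the first reply.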
A multi-instance queue manager assists you in configuring for availability. One instance of the queue manager is active at a time. Switching over to a standby instance takes from a little more than ten seconds to fifteen minutes or more, depending on how the system is configured, loaded, and tuned.
A multi-instance queue manager can give the appearance of a nearly uninterrupted service if used with reconnectable IBM MQ MQI clients, which are able to continue processing without the application program necessarily being aware of a queue manager outage; see the topic Automated client reconnection.
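The idea behind client reconnection can be illustrated with a small simulation. With IBM MQ the reconnection is carried out by the MQI client itself rather than by application code, so the sketch below is not the IBM MQ API. The names `InstanceUnavailable`, `call_with_reconnect`, `put_message`, and the host names are hypothetical, chosen only to show how a caller can be shielded from the failure of the active instance.

```python
import time


class InstanceUnavailable(Exception):
    """Raised by the simulated connection when an instance is not active."""


def call_with_reconnect(instances, operation, retries=3, delay=0.0):
    """Run operation(instance), moving on to the next instance if one is down."""
    last_error = None
    for _ in range(retries):
        for instance in instances:
            try:
                return operation(instance)
            except InstanceUnavailable as exc:
                last_error = exc  # remember the failure and try the next one
        time.sleep(delay)  # give a standby instance time to become active
    raise RuntimeError("no queue manager instance available") from last_error


# Simulated state: the primary has failed and the standby has taken over.
ACTIVE_INSTANCE = "standby-host"


def put_message(instance):
    """Pretend to put a message; only the active instance accepts it."""
    if instance != ACTIVE_INSTANCE:
        raise InstanceUnavailable(instance)
    return f"message accepted by {instance}"


if __name__ == "__main__":
    print(call_with_reconnect(["primary-host", "standby-host"], put_message))
```

The caller of `call_with_reconnect` receives a normal result even though the first instance it tried was down, which mirrors the way a reconnectable client can hide a queue manager outage from the application program.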