High availability configurations

To operate IBM MQ queue managers in a high availability (HA) configuration, we can set up the queue managers to work either with a high availability manager, such as PowerHA for AIX (formerly HACMP) or the Microsoft Cluster Service (MSCS), or with IBM MQ multi-instance queue managers. On Linux systems, we can also deploy replicated data queue managers (RDQMs), which use a quorum-based group to provide high availability.

Another option for a high availability or disaster recovery solution is to deploy a pair of IBM MQ appliances. See High Availability and Disaster Recovery in the IBM MQ Appliance documentation.

We need to be aware of the following configuration definitions:

Queue manager clusters: Groups of two or more queue managers on one or more computers, providing automatic interconnection, and allowing queues to be shared among them for load balancing and redundancy. From IBM WebSphere MQ Version 7.1 onwards, cluster error recovery reruns operations that caused problems until the problems are resolved.
HA clusters: HA clusters are groups of two or more computers and resources such as disks and networks, connected together and configured in such a way that, if one fails, a high availability manager, such as HACMP (UNIX) or MSCS (Windows), performs a failover. The failover transfers the state data of applications from the failing computer to another computer in the cluster and re-initiates their operation there. This provides high availability of services running within the HA cluster. The relationship between IBM MQ clusters and HA clusters is described in Relationship of HA clusters to queue manager clusters.
Multi-instance queue managers: Instances of the same queue manager configured on two or more computers. When multiple instances are started, one instance becomes the active instance and the other instances become standbys. If the active instance fails, a standby instance running on a different computer automatically takes over. We can use multi-instance queue managers to configure our own highly available messaging systems based on IBM MQ, without requiring a cluster technology such as HACMP or MSCS. HA clusters and multi-instance queue managers are alternative ways of making queue managers highly available. Do not combine them by putting a multi-instance queue manager in an HA cluster.
High availability replicated data queue managers (HA RDQMs): Instances of the same queue manager configured on each node in a group of three Linux servers. One of the three instances is the active instance. Data from the active queue manager is synchronously replicated to the other two instances, so one of these instances can take over if the active instance fails. The grouping of the servers is controlled by Pacemaker, and the replication by DRBD.
Disaster recovery replicated data queue managers (DR RDQMs): A queue manager runs on a primary node at one site, with a secondary instance of that queue manager located on a recovery node at a different site. Data is replicated between the primary instance and the secondary instance, and if the primary node is lost for some reason, the secondary instance can be made into the primary instance and started. Both nodes must be Linux servers. The replication is controlled by DRBD.
Disaster recovery/high availability replicated data queue managers (DR/HA RDQMs): We can configure a replicated data queue manager (RDQM) that runs on a high availability group on one site, but can fail over to another high availability group at another site if some disaster occurs that makes the first group unavailable. This is known as a DR/HA RDQM.
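
The quorum idea behind the three-node HA RDQM group can be illustrated with simple majority arithmetic. The following Python sketch is only a model under stated assumptions: the function names has_quorum and tolerated_failures are hypothetical, and the actual quorum decisions are made by Pacemaker, whose policy is more involved than this.

```python
def has_quorum(reachable_nodes: int, group_size: int = 3) -> bool:
    """Return True when a strict majority of the group is reachable.

    Hypothetical helper: it models plain majority voting only and is
    not how Pacemaker itself is implemented.
    """
    if not 0 <= reachable_nodes <= group_size:
        raise ValueError("reachable_nodes must be between 0 and group_size")
    return reachable_nodes > group_size // 2


def tolerated_failures(group_size: int = 3) -> int:
    """Largest number of node losses that still leaves a strict majority."""
    return (group_size - 1) // 2


if __name__ == "__main__":
    for reachable in range(4):
        print(reachable, "reachable ->", has_quorum(reachable))
    print("failures tolerated:", tolerated_failures())
```

Under this model, a group of three nodes keeps a majority after losing one node but not after losing two, which is consistent with one of the two remaining instances being able to take over.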

Differences between multi-instance queue managers and HA clusters

Multi-instance queue managers and HA clusters are alternative ways to achieve high availability for queue managers. The following points highlight the differences between the two approaches.

Multi-instance queue managers include the following features:

Basic failover support integrated into IBM MQ
Faster failover than an HA cluster
Simple configuration and operation
Integration with IBM MQ Explorer

Limitations of multi-instance queue managers include:

Highly available, high performance networked storage is required
More complex network configuration because the queue manager changes its IP address when it fails over
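
Because the IP address changes on failover, clients of a multi-instance queue manager usually need to know the address of every instance. The following Python sketch shows one way to model that, assuming a comma-separated connection name list in the host(port) style, such as mqhost1(1414),mqhost2(1414). The host names, the default port of 1414, and the functions parse_connection_names and first_reachable are illustrative assumptions. A plain TCP probe only shows that something is listening; a real application would rely on the IBM MQ client libraries rather than on this code.

```python
import re
import socket
from typing import List, Optional, Sequence, Tuple

# One entry looks like "host(port)" or just "host"; surrounding spaces are allowed.
_ENTRY = re.compile(r"^\s*([^\s(),]+)\s*(?:\(\s*(\d+)\s*\))?\s*$")


def parse_connection_names(conname: str, default_port: int = 1414) -> List[Tuple[str, int]]:
    """Split a comma-separated connection name list into (host, port) pairs."""
    endpoints = []
    for entry in conname.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"cannot parse connection name entry: {entry!r}")
        host, port = match.group(1), match.group(2)
        endpoints.append((host, int(port) if port else default_port))
    return endpoints


def first_reachable(endpoints: Sequence[Tuple[str, int]],
                    timeout: float = 2.0) -> Optional[Tuple[str, int]]:
    """Return the first endpoint that accepts a TCP connection, trying them in order."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host, port
        except OSError:
            continue  # this instance is not listening; try the next one
    return None


if __name__ == "__main__":
    # Hypothetical host names for the active and standby servers.
    candidates = parse_connection_names("mqhost1(1414), mqhost2(1414)")
    print(first_reachable(candidates))
```

Trying the entries in order means that a client finds whichever server currently hosts the active instance, without needing a single fixed IP address.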

HA clusters include the following features:

The ability to coordinate multiple resources, such as an application server or database
More flexible configuration options including clusters comprising more than two nodes
The ability to fail over multiple times without operator intervention
Takeover of the queue manager's IP address as part of the failover

Limitations of HA clusters include:

Additional product purchase and skills are required
Disks which can be switched between the nodes of the cluster are required
Configuration of HA clusters is relatively complex
Failover has historically been rather slow, although recent HA cluster products are improving this
Unnecessary failovers can occur if there are shortcomings in the scripts that are used to monitor resources such as queue managers
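
One common way to reduce unnecessary failovers is for a monitor script to require several consecutive failed health probes before it reports a resource as failed. The following Python sketch illustrates that idea only. The function name should_fail_over and the threshold of three probes are assumptions, not settings of any particular HA cluster product.

```python
from typing import Iterable


def should_fail_over(probe_results: Iterable[bool], threshold: int = 3) -> bool:
    """Return True once `threshold` consecutive probes have failed.

    Each item in probe_results is True for a healthy probe and False
    for a failed one. A single healthy probe resets the failure count,
    so isolated glitches do not trigger a failover.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    consecutive_failures = 0
    for healthy in probe_results:
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return True
    return False


if __name__ == "__main__":
    # Isolated failures are ignored; a sustained outage is reported.
    print(should_fail_over([True, False, True, False, False]))
    print(should_fail_over([True, False, False, False]))
```

The trade-off in this design is detection time: a higher threshold makes spurious failovers less likely but delays the reaction to a real failure.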

Relationship of HA clusters to queue manager clusters

Queue manager clusters provide load balancing of messages across available instances of queue manager cluster queues. This offers higher availability than a single queue manager because, following a failure of a queue manager, messaging applications can still send messages to, and access, surviving instances of a queue manager cluster queue. However, although queue manager clusters automatically route new messages to the available queue managers in a cluster, messages currently queued on an unavailable queue manager are not available until that queue manager is restarted. For this reason, queue manager clusters alone do not provide high availability of all message data or provide automatic detection of queue manager failure and automatic triggering of queue manager restart or failover. High Availability (HA) clusters provide these features. The two types of cluster can be used together to good effect. For an introduction to queue manager clusters, see Designing clusters.
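
The behavior described above can be modeled with a small simulation. The following Python sketch is a toy model, not IBM MQ code: the class ClusterQueueModel, its methods, and the round-robin routing are assumptions made for illustration, and the real cluster workload balancing algorithm is more elaborate.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence


class ClusterQueueModel:
    """Toy model of one cluster queue hosted on several queue managers."""

    def __init__(self, queue_managers: Sequence[str]) -> None:
        self.queues: Dict[str, Deque[str]] = {name: deque() for name in queue_managers}
        self.available = set(queue_managers)
        self._next = 0

    def put(self, message: str) -> str:
        """Route a new message round-robin to an available queue manager."""
        targets = [name for name in self.queues if name in self.available]
        if not targets:
            raise RuntimeError("no queue manager is available")
        target = targets[self._next % len(targets)]
        self._next += 1
        self.queues[target].append(message)
        return target

    def fail(self, name: str) -> None:
        self.available.discard(name)

    def restart(self, name: str) -> None:
        if name in self.queues:
            self.available.add(name)

    def retrievable(self) -> List[str]:
        """Messages held on queue managers that are currently running."""
        return [m for n, q in self.queues.items() if n in self.available for m in q]

    def stranded(self) -> List[str]:
        """Messages held on queue managers that are currently down."""
        return [m for n, q in self.queues.items() if n not in self.available for m in q]


if __name__ == "__main__":
    model = ClusterQueueModel(["QM1", "QM2"])
    model.put("m1")
    model.put("m2")
    model.fail("QM2")
    model.put("m3")  # new messages still flow to the surviving instance
    print("retrievable:", model.retrievable())
    print("stranded:", model.stranded())
```

In this model, new messages keep flowing after a failure, but the message already queued on the failed queue manager stays stranded until restart is called. That gap is what an HA cluster is meant to close by detecting the failure and restarting or failing over the queue manager automatically.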

HA clusters on UNIX and Linux
We can use IBM MQ with a high availability (HA) cluster on UNIX and Linux platforms: for example, PowerHA for AIX (formerly HACMP), Veritas Cluster Server, HP Serviceguard, or a Red Hat Enterprise Linux cluster with Red Hat Cluster Suite.
Supporting the Microsoft Cluster Service (MSCS)
Introducing and setting up MSCS to support failover of virtual servers.
Multi-instance queue managers
Multi-instance queue managers are instances of the same queue manager configured on different servers. One instance of the queue manager is defined as the active instance and another instance is defined as the standby instance. If the active instance fails, the multi-instance queue manager restarts automatically on the standby server.
Combining IBM MQ Availability solutions
Applications can use other IBM MQ capabilities to improve availability. Multi-instance queue managers complement these other high availability capabilities.
RDQM high availability
RDQM (replicated data queue manager) is a high availability solution that is available on Linux platforms.
RDQM disaster recovery
RDQM (replicated data queue manager) is available on a subset of Linux platforms and can provide a disaster recovery solution.
RDQM disaster recovery and high availability
We can configure a replicated data queue manager (RDQM) that runs on a high availability group on one site, but can fail over to another high availability group at another site if some disaster occurs that makes the first group unavailable. This is known as a DR/HA RDQM.

Related information

High availability for IBM MQ Advanced certified container