RDQM disaster recovery and high availability

We can configure a replicated data queue manager (RDQM) that runs on a high availability group on one site, but can fail over to another high availability group at another site if some disaster occurs that makes the first group unavailable. This is known as a DR/HA RDQM.

A DR/HA RDQM combines the features of a high availability RDQM (see RDQM high availability) and a disaster recovery RDQM (see RDQM disaster recovery).

The following diagram shows an example DR/HA RDQM.

The replication between the DR/HA RDQMs on the main site and the disaster recovery site is always asynchronous. With asynchronous replication, operations such as IBM MQ PUT or GET complete and return to the application before the event is replicated to the secondary queue manager.
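The effect of asynchronous replication can be illustrated with a small toy model. This is only a sketch of the ordering described above, not how IBM MQ implements replication; the class `AsyncReplicatedQueue` and its methods are hypothetical names introduced for illustration.

```python
class AsyncReplicatedQueue:
    """Toy model: a put completes before the update reaches the recovery site."""

    def __init__(self):
        self.primary = []    # messages visible at the main site
        self.secondary = []  # messages already replicated to the recovery site
        self._pending = []   # updates acknowledged but not yet replicated

    def put(self, message):
        # The operation completes locally and returns to the application.
        self.primary.append(message)
        self._pending.append(message)
        return "completed"

    def replicate(self):
        # Replication happens later, independently of the application call.
        shipped = len(self._pending)
        self.secondary.extend(self._pending)
        self._pending.clear()
        return shipped

    def unreplicated(self):
        # Updates that would be missing at the recovery site if the
        # main site were lost at this moment.
        return list(self._pending)


queue = AsyncReplicatedQueue()
status = queue.put("order-1")          # returns before replication
in_flight = queue.unreplicated()       # "order-1" is not yet at the recovery site
queue.replicate()                      # now both copies hold the message
```

In this model, any update that is still pending when the main site fails is absent from the recovery copy, which is the trade-off that asynchronous replication accepts in exchange for not delaying the application.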

We can have two active sites rather than 'main' and 'recovery' sites, if required, so that some of our DR/HA RDQMs run on one site and some on the other during normal operation. If a disaster occurs and one site becomes unavailable, all DR/HA RDQMs then run on the HA group at the surviving site.
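As a minimal sketch of the two-active-sites arrangement, the placement of queue managers can be modelled as a mapping from queue manager name to site. The names `QM1`, `QM2`, `site-a`, `site-b`, and the function `fail_over` are hypothetical, and the sketch only shows where queue managers end up after failover has been carried out, not how the failover itself is performed.

```python
def fail_over(placement, failed_site, surviving_site):
    """Return the placement after every RDQM on failed_site moves to surviving_site."""
    return {
        qm: (surviving_site if site == failed_site else site)
        for qm, site in placement.items()
    }


# Normal operation: both sites are active, each running some DR/HA RDQMs.
normal = {"QM1": "site-a", "QM2": "site-b", "QM3": "site-a"}

# After site-a is lost, all DR/HA RDQMs run on the HA group at site-b.
after_disaster = fail_over(normal, failed_site="site-a", surviving_site="site-b")
```

The surviving HA group has to be sized to carry the combined workload, since it hosts every queue manager until the other site is restored.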

Each HA group is configured in the same way as an ordinary HA group. We can define floating IP addresses for a DR/HA RDQM in each HA group. The floating IP address can be the same or different for each HA group.
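When the floating IP addresses differ between the two HA groups, a client application needs to know both. The following sketch assumes the application connects with a comma-separated connection name list of the form `host(port)`; the helper `conname_list` is a hypothetical name, the addresses are documentation-range examples, and 1414 is assumed as the listener port.

```python
def conname_list(floating_ips, port=1414):
    """Build a comma-separated connection name list, skipping duplicate addresses."""
    unique = []
    for ip in floating_ips:
        if ip not in unique:
            unique.append(ip)
    return ",".join(f"{ip}({port})" for ip in unique)


# Different floating IP address in each HA group: list both, main site first.
different = conname_list(["192.0.2.10", "198.51.100.10"])

# Same floating IP address in both HA groups: a single entry is enough.
same = conname_list(["192.0.2.10", "192.0.2.10"])
```

Using the same floating IP address at both sites keeps client configuration simpler, but it is only an option if the network allows that address to be routed to whichever site is active.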

We cannot upgrade an existing RDQM to be a DR/HA RDQM; the queue manager must be created as a DR/HA RDQM. (If required, we could back up the data of an existing RDQM, delete it, recreate it as a DR/HA RDQM, and then restore the data; see Backing up and restoring IBM MQ queue manager data.)

To configure DR/HA RDQMs, we must complete the following major steps:

Configure an HA group on the 'main' site.
Configure an HA group on the 'recovery' site.
Create a primary/primary DR/HA RDQM on one node of the HA group in the 'main' site.
Create primary/secondary DR/HA RDQMs on the other two nodes in the 'main' site.
Define a floating IP address for an application to access the DR/HA RDQM when it is running on any of the nodes of the HA group on the 'main' site.
Create a secondary/primary DR/HA RDQM on one node of the HA group on the 'recovery' site.
Create secondary/secondary DR/HA RDQMs on the other two nodes in the 'recovery' site.
Define a floating IP address for an application to access the DR/HA RDQM when it is running on any of the nodes of the HA group on the 'recovery' site.
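The role layout that these steps produce can be sketched as a small plan over the six nodes. The sketch reads the role labels in the steps as DR role first, then HA role (so 'secondary/primary' is the HA primary at the recovery site). The helper names, IP addresses, and port are hypothetical. The `crtmqm` options shown (`-sx`, `-sxs`, `-rr`, `-rl`, `-ri`, `-rp`) reflect our understanding of the command reference and should be checked against the installed version; the code only builds the argument lists and does not run anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePlan:
    site: str     # 'main' or 'recovery'
    node: str     # hypothetical node name
    dr_role: str  # role across sites: 'primary' or 'secondary'
    ha_role: str  # role within the HA group: 'primary' or 'secondary'

    @property
    def label(self):
        return f"{self.dr_role}/{self.ha_role}"


def plan_drha(main_nodes, recovery_nodes):
    """One HA primary per site; all main-site nodes are DR primary."""
    plans = []
    for site, nodes, dr_role in (
        ("main", main_nodes, "primary"),
        ("recovery", recovery_nodes, "secondary"),
    ):
        for index, node in enumerate(nodes):
            ha_role = "primary" if index == 0 else "secondary"
            plans.append(NodePlan(site, node, dr_role, ha_role))
    return plans


def crtmqm_args(plan, local_ips, remote_ips, port, qmname):
    """Argument list for one node; local/remote are from that node's site."""
    ha_flag = "-sx" if plan.ha_role == "primary" else "-sxs"   # assumed options
    dr_flag = "p" if plan.dr_role == "primary" else "s"        # assumed values
    return [
        "crtmqm", ha_flag, "-rr", dr_flag,
        "-rl", ",".join(local_ips),
        "-ri", ",".join(remote_ips),
        "-rp", str(port),
        qmname,
    ]


main_ips = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]      # hypothetical addresses
recovery_ips = ["10.0.2.1", "10.0.2.2", "10.0.2.3"]  # hypothetical addresses
plans = plan_drha(["main1", "main2", "main3"], ["rec1", "rec2", "rec3"])
first_command = crtmqm_args(plans[0], main_ips, recovery_ips, 7001, "QM1")
```

For a node at the recovery site, the same function is called with the recovery addresses as `local_ips` and the main-site addresses as `remote_ips`.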

Details about each of these steps are given in the following topics.

Requirements for a DR/HA RDQM solution
The requirements for the DR/HA RDQM solution are the same as for the HA RDQM solution and the DR RDQM solution.
Configure HA groups for DR/HA RDQMs
We must create an HA group on both our main and recovery sites. If we have an existing HA group on either site, we can create DR/HA RDQMs in that HA group. (Existing RDQMs will continue to operate as before.)
Create DR/HA RDQMs
We use the crtmqm command to create a replicated data queue manager (RDQM) in a DR/HA configuration.
Create a floating IP address
We can create floating IP addresses for each of our HA groups in a DR/HA RDQM configuration.
Starting, stopping, and displaying the state of a DR/HA RDQM
We use variants of standard IBM MQ control commands to start, stop, and view the current state of a DR/HA RDQM.
View DR/HA RDQM and HA group status
We can view the HA status and DR role of DR/HA replicated data queue managers (RDQMs).
Operate in a DR/HA environment
When operating in a DR/HA environment, there are separate considerations for high availability and disaster recovery.
Replacing a failed node in a DR/HA configuration
If one of the nodes in either of our HA groups fails, we can replace it.
DR/HA RDQM worked example
This example shows how to create and delete a DR/HA RDQM.
