Operate in a disaster recovery environment

There are a number of situations in which you might want to switch over to the secondary queue manager in a disaster recovery configuration.

Disaster recovery

Following the complete loss of the primary queue manager at the main site, you start the secondary queue manager at the recovery site. Applications reconnect to the queue manager at the recovery site and the secondary queue manager processes application messages. The steps taken to revert to the previous configuration depend on the cause of the failure, for example, whether the main node was lost permanently or only temporarily.
For steps to take following a temporary loss of the main site, see Switching over to a recovery node. For steps to take following permanent failure, see Replacing a failed node in a disaster recovery configuration.

Disaster recovery test support

You can test the disaster recovery configuration by temporarily switching over to the secondary instance and checking that applications can successfully connect. You follow the same procedure as when you switch over following a temporary failure of the primary node; see Switching over to a recovery node.

Reverting to snapshot

If the primary node fails while a synchronization is in progress, you can revert to the snapshot taken of the secondary queue manager data just before the synchronization started. The secondary is then restored to a consistent state and can be run as the primary. To revert to the snapshot, you make the secondary into the primary, as described in Switching over to a recovery node. You must check that the revert to snapshot has completed (by using the rdqmstatus command) before you start the queue manager.
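
The check before starting the queue manager can be scripted. The following is a minimal sketch only, not a supported procedure. It assumes that rdqmstatus -m prints "DR role" and "DR status" as "Name: value" lines, and that a revert still in progress is reported with the word "Reverting" in the DR status. Confirm both assumptions against the output on your own system. The function names parse_status, safe_to_start, and query_status are hypothetical.

```python
import subprocess


def parse_status(text):
    """Collect 'Name: value' lines from status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def safe_to_start(text):
    """Return True only if this node is the DR primary and no revert is running.

    Assumption: the output has 'DR role' and 'DR status' fields, and an
    in-progress revert shows 'Reverting' somewhere in the DR status value.
    """
    fields = parse_status(text)
    role = fields.get("DR role", "").lower()
    status = fields.get("DR status", "").lower()
    if not status:
        # Missing DR status: treat as unknown and do not start.
        return False
    return role == "primary" and "reverting" not in status


def query_status(qm_name):
    """Run rdqmstatus for one queue manager and return its text output."""
    result = subprocess.run(
        ["rdqmstatus", "-m", qm_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the recovery node, you might call safe_to_start(query_status("QM1")) repeatedly, where QM1 is a placeholder queue manager name, and start the queue manager only after it returns True.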

Switching over to a recovery node
If a disaster occurs in your main site, you take steps to switch over to your recovery site.
Replacing a failed node in a disaster recovery configuration
If you lose one of the nodes in a disaster recovery configuration, you can replace the node and restore the disaster recovery configuration by following this procedure.
Resolving an inconsistent problem in DR RDQM
A DR status of inconsistent can be reported if synchronization fails between the primary and secondary instances of a queue manager.
Resolving a partitioned (split brain) problem in DR RDQM
A partitioned problem can occur if both queue managers in a disaster recovery pair run in the primary role at the same time.
Change IP addresses in disaster recovery configurations
If you change the IP addresses of either of the interfaces in a disaster recovery configuration, replication is no longer possible between the two nodes.

Last updated: 2020-10-04