Resolving a partitioned (split-brain) problem in DR RDQM

A partitioned problem can occur if both queue managers in a disaster recovery pair run in the primary role at the same time.


If you promote the secondary instance of a queue manager on the recovery node while the original primary instance is still running on the main node, you effectively have two instances of the same queue manager running, each with its own view of the queue manager data. The DR status for the queue manager on each node is then reported as Partitioned.
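You can check for this condition from a script. The following is a minimal sketch, assuming that rdqmstatus -m qmname prints a line of the form "DR status: <value>"; confirm this against the output on your own nodes. The function names dr_status and is_partitioned are hypothetical helpers, not part of IBM MQ.

```shell
#!/bin/sh
# Sketch: detect the Partitioned DR status from rdqmstatus output.
# ASSUMPTION: rdqmstatus -m <qmname> prints a line such as
#   DR status:                              Partitioned
# dr_status and is_partitioned are hypothetical helper names.

# dr_status: read rdqmstatus output on stdin and print the DR status value,
# with surrounding whitespace removed. Prints nothing if no such line exists.
dr_status() {
    sed -n -e 's/[[:space:]]*$//' \
           -e 's/^[[:space:]]*DR status:[[:space:]]*//p' | head -n 1
}

# is_partitioned: return 0 if the DR status read from stdin is Partitioned.
is_partitioned() {
    [ "$(dr_status)" = "Partitioned" ]
}
```

For example, running rdqmstatus -m QM1 | is_partitioned && echo "QM1 is partitioned" on each node reports the condition. Reading from standard input keeps the parsing separate from the command itself, so you can test it against saved rdqmstatus output.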

You must decide which of the two queue managers has the most correct view of the data, keep that data, and discard the other. You use the rdqmdr command to do this.


Procedure

  • To keep the data from the queue manager on the recovery node:
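    1. Ensure that both queue manager instances are stopped (for example, by running endmqm qmname on each node where the instance is running).
    2. Specify that the queue manager on the main node is the secondary by entering the following command on the main node:
      rdqmdr -m qmname -s
    3. Specify that the queue manager on the recovery node is the primary by entering the following command on the recovery node:
      rdqmdr -m qmname -p

      Synchronization begins, and the data from the queue manager on the recovery node is copied to the main node.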

    4. Check the status of the synchronization:
      rdqmstatus -m qmname
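    5. When the synchronization is complete, demote the queue manager on the recovery node by entering the following command on the recovery node:
      rdqmdr -m qmname -s
    6. Promote the queue manager on the main node and start it by entering the following commands on the main node:
      rdqmdr -m qmname -p
      strmqm qmname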

  • To keep the data from the queue manager on the main node:
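    1. Ensure that both queue manager instances are stopped (for example, by running endmqm qmname on each node where the instance is running).
    2. Specify that the queue manager on the recovery node is the secondary by entering the following command on the recovery node:
      rdqmdr -m qmname -s
    3. Specify that the queue manager on the main node is the primary by entering the following command on the main node:
      rdqmdr -m qmname -p

      Synchronization begins, and the data from the queue manager on the main node is copied to the recovery node.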

    4. Check the status of the synchronization:
      rdqmstatus -m qmname
    5. When the synchronization is complete, start the queue manager on the main node by entering the following command on the main node:
      strmqm qmname
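The two procedures differ only in which instance is made secondary first and whether the primary role is handed back to the main node afterwards. The following sketch prints the command sequence for either choice without running anything. It rests on stated assumptions: the function name partition_plan and the node labels ("main", "recovery", "both", "either") are hypothetical, each rdqmdr command is assumed to act on the instance on the node where it is entered, and endmqm is shown as one way to stop a running instance.

```shell
#!/bin/sh
# Sketch: print the command order for resolving a partitioned DR RDQM queue
# manager, one "<node>: <command>" line per step. Nothing is executed.
# ASSUMPTIONS: partition_plan and the node labels are hypothetical; each
# rdqmdr command is taken to act on the instance on the node where it runs.

# partition_plan KEEP QMNAME
#   KEEP is "recovery" or "main": the node whose data you want to keep.
partition_plan() {
    if [ $# -ne 2 ]; then
        echo "usage: partition_plan recovery|main QMNAME" >&2
        return 1
    fi
    keep=$1
    qm=$2
    case $keep in
        recovery) discard=main ;;
        main)     discard=recovery ;;
        *) echo "usage: partition_plan recovery|main QMNAME" >&2; return 1 ;;
    esac

    echo "both: endmqm $qm"            # step 1: stop any running instance
    echo "$discard: rdqmdr -m $qm -s"  # step 2: discarded side becomes secondary
    echo "$keep: rdqmdr -m $qm -p"     # step 3: kept side becomes primary; sync starts
    echo "either: rdqmstatus -m $qm"   # step 4: repeat until sync is complete

    if [ "$keep" = recovery ]; then
        # Steps 5-6 of the first procedure: return the primary role to main.
        echo "recovery: rdqmdr -m $qm -s"
        echo "main: rdqmdr -m $qm -p"
    fi
    echo "main: strmqm $qm"            # final step: start on the main node
}
```

For example, partition_plan recovery QM1 prints the seven steps of the first procedure, ending with strmqm QM1 on the main node, and partition_plan main QM1 prints the five steps of the second. Printing rather than executing keeps the sketch safe to run anywhere; the rdqmstatus step still has to be repeated until the synchronization is complete.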
