Example RDQM HA configurations and errors
An example RDQM HA configuration, complete with example errors and information on how to resolve them.
The example RDQM HA group consists of three nodes:
- mqhavm13.gamsworthwilliam.com (referred to as vm13)
- mqhavm14.gamsworthwilliam.com (referred to as vm14)
- mqhavm15.gamsworthwilliam.com (referred to as vm15)
Three RDQM HA queue managers have been created:
- HAQM1 (created on vm13)
- HAQM2 (created on vm14)
- HAQM3 (created on vm15)
Initial conditions
The initial condition on each of the nodes is given in the following listings:
- vm13

[midtownjojo@mqhavm13 ~]$ rdqmstatus -m HAQM1
Node:                       mqhavm13.gamsworthwilliam.com
Queue manager status:       Running
CPU:                        0.00%
Memory:                     135MB
Queue manager file system:  51MB used, 1.0GB allocated [5%]
HA role:                    Primary
HA status:                  Normal
HA control:                 Enabled
HA current location:        This node
HA preferred location:      This node
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.

[midtownjojo@mqhavm13 ~]$ rdqmstatus -m HAQM2
Node:                       mqhavm13.gamsworthwilliam.com
Queue manager status:       Running elsewhere
HA role:                    Secondary
HA status:                  Normal
HA control:                 Enabled
HA current location:        mqhavm14.gamsworthwilliam.com
HA preferred location:      mqhavm14.gamsworthwilliam.com
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.

[midtownjojo@mqhavm13 ~]$ rdqmstatus -m HAQM3
Node:                       mqhavm13.gamsworthwilliam.com
Queue manager status:       Running elsewhere
HA role:                    Secondary
HA status:                  Normal
HA control:                 Enabled
HA current location:        mqhavm15.gamsworthwilliam.com
HA preferred location:      mqhavm15.gamsworthwilliam.com
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.
- vm14

[midtownjojo@mqhavm14 ~]$ rdqmstatus -m HAQM1
Node:                       mqhavm14.gamsworthwilliam.com
Queue manager status:       Running elsewhere
HA role:                    Secondary
HA status:                  Normal
HA control:                 Enabled
HA current location:        mqhavm13.gamsworthwilliam.com
HA preferred location:      mqhavm13.gamsworthwilliam.com
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm13.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.

[midtownjojo@mqhavm14 ~]$ rdqmstatus -m HAQM2
Node:                       mqhavm14.gamsworthwilliam.com
Queue manager status:       Running
CPU:                        0.00%
Memory:                     135MB
Queue manager file system:  51MB used, 1.0GB allocated [5%]
HA role:                    Primary
HA status:                  Normal
HA control:                 Enabled
HA current location:        This node
HA preferred location:      This node
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm13.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.

[midtownjojo@mqhavm14 ~]$ rdqmstatus -m HAQM3
Node:                       mqhavm14.gamsworthwilliam.com
Queue manager status:       Running elsewhere
HA role:                    Secondary
HA status:                  Normal
HA control:                 Enabled
HA current location:        mqhavm15.gamsworthwilliam.com
HA preferred location:      mqhavm15.gamsworthwilliam.com
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm13.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.
- vm15

[midtownjojo@mqhavm15 ~]$ rdqmstatus -m HAQM1
Node:                       mqhavm15.gamsworthwilliam.com
Queue manager status:       Running elsewhere
HA role:                    Secondary
HA status:                  Normal
HA control:                 Enabled
HA current location:        mqhavm13.gamsworthwilliam.com
HA preferred location:      mqhavm13.gamsworthwilliam.com
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm13.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.

[midtownjojo@mqhavm15 ~]$ rdqmstatus -m HAQM2
Node:                       mqhavm15.gamsworthwilliam.com
Queue manager status:       Running elsewhere
HA role:                    Secondary
HA status:                  Normal
HA control:                 Enabled
HA current location:        mqhavm14.gamsworthwilliam.com
HA preferred location:      mqhavm14.gamsworthwilliam.com
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm13.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.

[midtownjojo@mqhavm15 ~]$ rdqmstatus -m HAQM3
Node:                       mqhavm15.gamsworthwilliam.com
Queue manager status:       Running
CPU:                        0.02%
Memory:                     135MB
Queue manager file system:  51MB used, 1.0GB allocated [5%]
HA role:                    Primary
HA status:                  Normal
HA control:                 Enabled
HA current location:        This node
HA preferred location:      This node
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm13.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.
DRBD scenarios
RDQM HA configurations use DRBD for data replication. The following scenarios illustrate possible problems with DRBD:
- Loss of DRBD quorum
- Loss of a single DRBD connection
- Synchronization stuck
DRBD Scenario 1: Loss of DRBD quorum
If the node running an RDQM HA queue manager loses the DRBD quorum for the DRBD resource corresponding to the queue manager, DRBD immediately starts returning errors from I/O operations, which causes the queue manager to produce FDCs and eventually stop.
If the remaining two nodes have a DRBD quorum for the DRBD resource, Pacemaker chooses one of those two nodes to start the queue manager. Because no updates occurred on the original node after the quorum was lost, it is safe to start the queue manager somewhere else.
The two main ways to monitor for a loss of DRBD quorum are:
- By using the rdqmstatus command.
- By monitoring the syslog of the node where the RDQM HA queue manager is initially running.
If you use the rdqmstatus command and the node vm13 loses DRBD quorum for the DRBD resource for HAQM1, you might see status similar to the following example:
[midtownjojo@mqhavm13 ~]$ rdqmstatus -m HAQM1
Node:                       mqhavm13.gamsworthwilliam.com
Queue manager status:       Running elsewhere
HA role:                    Secondary
HA status:                  Remote unavailable
HA control:                 Enabled
HA current location:        mqhavm14.gamsworthwilliam.com
HA preferred location:      This node
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Remote unavailable
HA out of sync data:        0KB

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Remote unavailable
HA out of sync data:        0KB

Command '/opt/mqm/bin/rdqmstatus' run with sudo.
Notice that the HA status has changed to Remote unavailable, which indicates that both DRBD connections to the other nodes have been lost.
In this case the other two nodes have DRBD quorum for the DRBD resource so the RDQM is running somewhere else, on mqhavm14.gamsworthwilliam.com as shown as the value of HA current location.
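Because the rdqmstatus output is made up of regular "label: value" lines, it lends itself to scripted monitoring. The following is a minimal sketch, in Python for illustration only: the helper names are hypothetical, and in practice you would capture the output of rdqmstatus -m <qmname> yourself rather than use the embedded sample.

```python
def parse_rdqmstatus(text):
    """Parse 'Label: value' lines from rdqmstatus output.

    Returns the fields for the local node (everything before the first
    remote 'Node:' entry) plus a list of remote-node field dicts.
    """
    local, remotes = {}, []
    current = local
    for line in text.splitlines():
        if ":" not in line:
            continue
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if label == "Node" and local:   # start of a remote-node section
            current = {}
            remotes.append(current)
        current[label] = value
    return local, remotes

def quorum_lost(remotes):
    """Heuristic: both DRBD connections lost => every remote is unavailable."""
    return bool(remotes) and all(
        r.get("HA status") == "Remote unavailable" for r in remotes
    )

# Sample trimmed from the listing above.
sample = """\
Node:                 mqhavm13.gamsworthwilliam.com
Queue manager status: Running elsewhere
HA role:              Secondary
HA status:            Remote unavailable
Node:                 mqhavm14.gamsworthwilliam.com
HA status:            Remote unavailable
Node:                 mqhavm15.gamsworthwilliam.com
HA status:            Remote unavailable
"""
local, remotes = parse_rdqmstatus(sample)
print(quorum_lost(remotes))  # True: both remote nodes report Remote unavailable
```

A monitoring script built this way can alert on the transition to Remote unavailable rather than requiring someone to run rdqmstatus by hand.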
Monitoring syslog
If you monitor syslog, you will see that DRBD logs a message when it loses quorum for a resource:
Jul 30 09:38:36 mqhavm13 kernel: drbd haqm1/0 drbd100: quorum( yes -> no )
When quorum is restored, a similar message is logged:
Jul 30 10:27:32 mqhavm13 kernel: drbd haqm1/0 drbd100: quorum( no -> yes )
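These kernel log lines follow a fixed pattern, so a log watcher can pick out quorum transitions per resource. The following is a minimal sketch, assuming the message format shown above; the function name and the embedded sample log lines are for illustration only.

```python
import re

# Matches e.g. "drbd haqm1/0 drbd100: quorum( yes -> no )"
QUORUM_RE = re.compile(r"drbd (\S+)/\d+ drbd\d+: quorum\( (yes|no) -> (yes|no) \)")

def quorum_events(lines):
    """Yield (resource, 'lost' | 'restored') for each quorum transition."""
    for line in lines:
        m = QUORUM_RE.search(line)
        if m:
            resource, _, new = m.groups()
            yield resource, "lost" if new == "no" else "restored"

log = [
    "Jul 30 09:38:36 mqhavm13 kernel: drbd haqm1/0 drbd100: quorum( yes -> no )",
    "Jul 30 10:27:32 mqhavm13 kernel: drbd haqm1/0 drbd100: quorum( no -> yes )",
]
for resource, event in quorum_events(log):
    print(resource, event)   # haqm1 lost, then haqm1 restored
```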
DRBD Scenario 2: Loss of a single DRBD connection
If only one of the two DRBD connections from a node running an RDQM HA queue manager is lost then the queue manager does not move.
Starting from the same initial conditions as in the first scenario, after blocking just one of the DRBD replication links, the status reported by rdqmstatus on vm13 is similar to the following example:
Node:                       mqhavm13.gamsworthwilliam.com
Queue manager status:       Running
CPU:                        0.01%
Memory:                     133MB
Queue manager file system:  52MB used, 1.0GB allocated [5%]
HA role:                    Primary
HA status:                  Mixed
HA control:                 Enabled
HA current location:        This node
HA preferred location:      This node
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Remote unavailable
HA out of sync data:        0KB

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.
DRBD Scenario 3: Synchronization stuck
Some versions of DRBD had an issue where a synchronization would appear to be stuck, which prevented an RDQM HA queue manager from failing over to a node while the sync to that node was still in progress.
One way to see this is to use the drbdadm status command. When operating normally, a response similar to the following example is output:
[midtownjojo@mqhavm13 ~]$ drbdadm status
haqm1 role:Primary
  disk:UpToDate
  mqhavm14.gamsworthwilliam.com role:Secondary
    peer-disk:UpToDate
  mqhavm15.gamsworthwilliam.com role:Secondary
    peer-disk:UpToDate

haqm2 role:Secondary
  disk:UpToDate
  mqhavm14.gamsworthwilliam.com role:Primary
    peer-disk:UpToDate
  mqhavm15.gamsworthwilliam.com role:Secondary
    peer-disk:UpToDate

haqm3 role:Secondary
  disk:UpToDate
  mqhavm14.gamsworthwilliam.com role:Secondary
    peer-disk:UpToDate
  mqhavm15.gamsworthwilliam.com role:Primary
    peer-disk:UpToDate
If synchronization gets stuck, the response is similar to the following example:
[midtownjojo@mqhavm13 ~]$ drbdadm status
haqm1 role:Primary
  disk:UpToDate
  mqhavm14.gamsworthwilliam.com role:Secondary
    peer-disk:UpToDate
  mqhavm15.gamsworthwilliam.com role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:90.91

haqm2 role:Secondary
  disk:UpToDate
  mqhavm14.gamsworthwilliam.com role:Primary
    peer-disk:UpToDate
  mqhavm15.gamsworthwilliam.com role:Secondary
    peer-disk:UpToDate

haqm3 role:Secondary
  disk:UpToDate
  mqhavm14.gamsworthwilliam.com role:Secondary
    peer-disk:UpToDate
  mqhavm15.gamsworthwilliam.com role:Primary
    peer-disk:UpToDate
In this case the RDQM HA queue manager HAQM1 cannot move to vm15 as the disk on vm15 is Inconsistent.
The done value is the percentage complete. If that value is not increasing, you can try disconnecting the replica and then connecting it again with the following commands (run as root) on vm13:
drbdadm disconnect haqm1:mqhavm15.gamsworthwilliam.com
drbdadm connect haqm1:mqhavm15.gamsworthwilliam.com
If the replication to both Secondary nodes is stuck, you can run the disconnect and connect commands without specifying a node, which disconnects both connections:
drbdadm disconnect haqm1
drbdadm connect haqm1
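Whether a done value is increasing can only be judged by comparing two samples of drbdadm status taken some time apart. The following is a minimal sketch of that comparison, in Python for illustration; the parsing assumes the status format shown above, the function names are hypothetical, and actually running drbdadm (or the reconnect) is left out.

```python
import re

def sync_progress(status_text):
    """Map peer name -> 'done' percentage for peers still synchronizing."""
    progress, peer = {}, None
    for line in status_text.splitlines():
        m = re.match(r"\s*(\S+)\s+role:", line)
        if m:
            peer = m.group(1)          # resource or peer-node heading line
        m = re.search(r"done:([\d.]+)", line)
        if m and peer is not None:
            progress[peer] = float(m.group(1))
    return progress

def stuck_peers(earlier, later):
    """Peers whose 'done' percentage did not increase between two samples."""
    a, b = sync_progress(earlier), sync_progress(later)
    return [peer for peer in b if peer in a and b[peer] <= a[peer]]

# Sample trimmed from the stuck-synchronization listing above.
sample = """\
haqm1 role:Primary
  disk:UpToDate
  mqhavm14.gamsworthwilliam.com role:Secondary
    peer-disk:UpToDate
  mqhavm15.gamsworthwilliam.com role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:90.91
"""
# Two identical samples taken some time apart: the sync has not progressed.
print(stuck_peers(sample, sample))  # ['mqhavm15.gamsworthwilliam.com']
```

A peer reported by stuck_peers would be the candidate for the disconnect and connect commands shown above.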
Pacemaker scenarios
RDQM HA configurations use Pacemaker to determine where an RDQM HA queue manager runs. The following scenarios illustrate possible problems that involve Pacemaker:
- Corosync main process not scheduled
- RDQM HA queue manager not running where it should
Pacemaker scenario 1: Corosync main process not scheduled
If you see a message in the syslog similar to the following example, it indicates that the system is either too busy to schedule CPU time for the main Corosync process or, more commonly, that the system is a virtual machine and the hypervisor has not scheduled any CPU time for the entire VM:
corosync[10800]: [MAIN ] Corosync main process was not scheduled for 2787.0891 ms (threshold is 1320.0000 ms). Consider token timeout increase.
Both Corosync (and therefore Pacemaker) and DRBD have timers that are used to detect loss of quorum, so messages like the example indicate that the node did not run for so long that it would have been dropped from the quorum. The Corosync timeout is 1.65 seconds, and the threshold of 1.32 seconds is 80% of that, so the message shown in the example is printed when the delay in the scheduling of the main Corosync process reaches 80% of the timeout. In the example, the process was not scheduled for nearly three seconds. Whatever is causing such a problem must be resolved. One thing that might help in a similar situation is to reduce the requirements of the VM, for example, by reducing the number of vCPUs, as this makes it easier for the hypervisor to schedule the VM.
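The 80% relationship between the Corosync token timeout and the warning threshold can be checked with simple arithmetic, using the values from the example message (in milliseconds):

```python
# Values taken from the example syslog message above.
token_timeout_ms = 1650.0            # Corosync token timeout (1.65 seconds)
threshold_ms = 0.8 * token_timeout_ms
print(threshold_ms)                  # 1320.0, matching "threshold is 1320.0000 ms"

delay_ms = 2787.0891                 # delay reported in the example message
print(delay_ms > token_timeout_ms)   # True: the delay exceeded the full timeout,
                                     # so the node would have been dropped
```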
Pacemaker scenario 2: An RDQM HA queue manager is not running where it should be
The main tool to help troubleshooting in this scenario is the crm status command. The following example shows a response for the configuration when everything is working as expected:
Stack: corosync
Current DC: mqhavm13.gamsworthwilliam.com (version 1.1.20.linbit-1+20190404+eab6a2092b71.el7.2-eab6a2092b) - partition with quorum
Last updated: Tue Jul 30 09:11:29 2019
Last change: Tue Jul 30 09:10:34 2019 by root via crm_attribute on mqhavm14.gamsworthwilliam.com

3 nodes configured
18 resources configured

Online: [ mqhavm13.gamsworthwilliam.com mqhavm14.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]

Full list of resources:

 Master/Slave Set: ms_drbd_haqm1 [p_drbd_haqm1]
     Masters: [ mqhavm13.gamsworthwilliam.com ]
     Slaves: [ mqhavm14.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]
 p_fs_haqm1     (ocf::heartbeat:Filesystem):   Started mqhavm13.gamsworthwilliam.com
 p_rdqmx_haqm1  (ocf::ibm:rdqmx):              Started mqhavm13.gamsworthwilliam.com
 haqm1          (ocf::ibm:rdqm):               Started mqhavm13.gamsworthwilliam.com
 Master/Slave Set: ms_drbd_haqm2 [p_drbd_haqm2]
     Masters: [ mqhavm14.gamsworthwilliam.com ]
     Slaves: [ mqhavm13.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]
 p_fs_haqm2     (ocf::heartbeat:Filesystem):   Started mqhavm14.gamsworthwilliam.com
 p_rdqmx_haqm2  (ocf::ibm:rdqmx):              Started mqhavm14.gamsworthwilliam.com
 haqm2          (ocf::ibm:rdqm):               Started mqhavm14.gamsworthwilliam.com
 Master/Slave Set: ms_drbd_haqm3 [p_drbd_haqm3]
     Masters: [ mqhavm15.gamsworthwilliam.com ]
     Slaves: [ mqhavm13.gamsworthwilliam.com mqhavm14.gamsworthwilliam.com ]
 p_fs_haqm3     (ocf::heartbeat:Filesystem):   Started mqhavm15.gamsworthwilliam.com
 p_rdqmx_haqm3  (ocf::ibm:rdqmx):              Started mqhavm15.gamsworthwilliam.com
 haqm3          (ocf::ibm:rdqm):               Started mqhavm15.gamsworthwilliam.com
Note the following points:
- All three nodes are shown as Online.
- Each RDQM HA queue manager is running on the node where it was created, for example, HAQM1 is running on vm13 and so on.
This scenario is constructed by preventing HAQM1 from running on vm14, and then attempting to move HAQM1 to vm14. HAQM1 cannot run on vm14 because the file /var/mqm/mqs.ini on vm14 has an invalid value for the Directory of queue manager HAQM1.
The preferred location for HAQM1 is changed to vm14 by running the following command on vm13:
rdqmadm -m HAQM1 -n mqhavm14.gamsworthwilliam.com -p
This command would normally cause HAQM1 to move to vm14, but in this case checking the status on vm13 returns the following information:
[midtownjojo@mqhavm13 ~]$ rdqmstatus -m HAQM1
Node:                       mqhavm13.gamsworthwilliam.com
Queue manager status:       Running
CPU:                        0.15%
Memory:                     133MB
Queue manager file system:  52MB used, 1.0GB allocated [5%]
HA role:                    Primary
HA status:                  Normal
HA control:                 Enabled
HA current location:        This node
HA preferred location:      mqhavm14.gamsworthwilliam.com
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.
HAQM1 is still running on vm13; it has not moved to vm14 as requested, and the cause needs investigating. Examining the Pacemaker status gives the following response:
[midtownjojo@mqhavm13 ~]$ crm status
Stack: corosync
Current DC: mqhavm13.gamsworthwilliam.com (version 1.1.20.linbit-1+20190404+eab6a2092b71.el7.2-eab6a2092b) - partition with quorum
Last updated: Thu Aug 1 14:16:40 2019
Last change: Thu Aug 1 14:16:35 2019 by hacluster via crmd on mqhavm14.gamsworthwilliam.com

3 nodes configured
18 resources configured

Online: [ mqhavm13.gamsworthwilliam.com mqhavm14.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]

Full list of resources:

 Master/Slave Set: ms_drbd_haqm1 [p_drbd_haqm1]
     Masters: [ mqhavm13.gamsworthwilliam.com ]
     Slaves: [ mqhavm14.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]
 p_fs_haqm1     (ocf::heartbeat:Filesystem):   Started mqhavm13.gamsworthwilliam.com
 p_rdqmx_haqm1  (ocf::ibm:rdqmx):              Started mqhavm13.gamsworthwilliam.com
 haqm1          (ocf::ibm:rdqm):               Started mqhavm13.gamsworthwilliam.com
 Master/Slave Set: ms_drbd_haqm2 [p_drbd_haqm2]
     Masters: [ mqhavm14.gamsworthwilliam.com ]
     Slaves: [ mqhavm13.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]
 p_fs_haqm2     (ocf::heartbeat:Filesystem):   Started mqhavm14.gamsworthwilliam.com
 p_rdqmx_haqm2  (ocf::ibm:rdqmx):              Started mqhavm14.gamsworthwilliam.com
 haqm2          (ocf::ibm:rdqm):               Started mqhavm14.gamsworthwilliam.com
 Master/Slave Set: ms_drbd_haqm3 [p_drbd_haqm3]
     Masters: [ mqhavm15.gamsworthwilliam.com ]
     Slaves: [ mqhavm13.gamsworthwilliam.com mqhavm14.gamsworthwilliam.com ]
 p_fs_haqm3     (ocf::heartbeat:Filesystem):   Started mqhavm15.gamsworthwilliam.com
 p_rdqmx_haqm3  (ocf::ibm:rdqmx):              Started mqhavm15.gamsworthwilliam.com
 haqm3          (ocf::ibm:rdqm):               Started mqhavm15.gamsworthwilliam.com

Failed Resource Actions:
* haqm1_monitor_0 on mqhavm14.gamsworthwilliam.com 'not installed' (5): call=372, status=complete, exitreason='', last-rc-change='Thu Aug 1 14:16:37 2019', queued=0ms, exec=17ms
Take note of the Failed Resource Actions section that has appeared.
The name of the action, haqm1_monitor_0, tells us that a monitor action for the RDQM HAQM1 failed, and that it failed on mqhavm14.gamsworthwilliam.com. So it looks like Pacemaker tried to do what was expected and start HAQM1 on vm14, but for some reason it could not.
You can see when Pacemaker tried to do this by looking at the value of the last-rc-change parameter.
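The fields of a Failed Resource Actions entry (resource, action, node, and reason) can be extracted programmatically, which is useful when scripting checks across several nodes. The following is a minimal sketch, assuming the entry format shown above; the regular expression and names are for illustration only.

```python
import re

# Matches e.g. "haqm1_monitor_0 on mqhavm14.gamsworthwilliam.com 'not installed'"
FAILED_RE = re.compile(
    r"(?P<resource>\w+)_(?P<action>\w+?)_\d+ on (?P<node>\S+) '(?P<reason>[^']+)'"
)

entry = ("haqm1_monitor_0 on mqhavm14.gamsworthwilliam.com 'not installed' (5): "
         "call=372, status=complete, exitreason='', "
         "last-rc-change='Thu Aug 1 14:16:37 2019', queued=0ms, exec=17ms")

m = FAILED_RE.search(entry)
# Tells you which resource failed, doing what, where, and why.
print(m.group("resource"), m.group("action"), m.group("node"), m.group("reason"))
```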
Understand the failure
To understand the failure, we need to look at the syslog for vm14 at the time of the failure:
Aug 1 14:16:37 mqhavm14 crmd[26377]: notice: Result of probe operation for haqm1 on mqhavm14.gamsworthwilliam.com: 5 (not installed)
The entry shows that when Pacemaker tried to check the state of haqm1 on vm14, it received an error because haqm1 is not configured there, as a result of the deliberate misconfiguration in /var/mqm/mqs.ini.
Correcting the failure
To correct the failure, we must correct the underlying problem (in this case, restoring the correct directory value for haqm1 in /var/mqm/mqs.ini on vm14). Then we must clear the failed action by using the command crm resource cleanup on the appropriate resource, which in this case is haqm1 as that is the resource mentioned in the failed action. For example:
[midtownjojo@mqhavm13 ~]$ crm resource cleanup haqm1
Cleaned up haqm1 on mqhavm15.gamsworthwilliam.com
Cleaned up haqm1 on mqhavm14.gamsworthwilliam.com
Cleaned up haqm1 on mqhavm13.gamsworthwilliam.com
Then check the Pacemaker status again:
[midtownjojo@mqhavm13 ~]$ crm status
Stack: corosync
Current DC: mqhavm13.gamsworthwilliam.com (version 1.1.20.linbit-1+20190404+eab6a2092b71.el7.2-eab6a2092b) - partition with quorum
Last updated: Thu Aug 1 14:23:17 2019
Last change: Thu Aug 1 14:23:03 2019 by hacluster via crmd on mqhavm13.gamsworthwilliam.com

3 nodes configured
18 resources configured

Online: [ mqhavm13.gamsworthwilliam.com mqhavm14.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]

Full list of resources:

 Master/Slave Set: ms_drbd_haqm1 [p_drbd_haqm1]
     Masters: [ mqhavm14.gamsworthwilliam.com ]
     Slaves: [ mqhavm13.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]
 p_fs_haqm1     (ocf::heartbeat:Filesystem):   Started mqhavm14.gamsworthwilliam.com
 p_rdqmx_haqm1  (ocf::ibm:rdqmx):              Started mqhavm14.gamsworthwilliam.com
 haqm1          (ocf::ibm:rdqm):               Started mqhavm14.gamsworthwilliam.com
 Master/Slave Set: ms_drbd_haqm2 [p_drbd_haqm2]
     Masters: [ mqhavm14.gamsworthwilliam.com ]
     Slaves: [ mqhavm13.gamsworthwilliam.com mqhavm15.gamsworthwilliam.com ]
 p_fs_haqm2     (ocf::heartbeat:Filesystem):   Started mqhavm14.gamsworthwilliam.com
 p_rdqmx_haqm2  (ocf::ibm:rdqmx):              Started mqhavm14.gamsworthwilliam.com
 haqm2          (ocf::ibm:rdqm):               Started mqhavm14.gamsworthwilliam.com
 Master/Slave Set: ms_drbd_haqm3 [p_drbd_haqm3]
     Masters: [ mqhavm15.gamsworthwilliam.com ]
     Slaves: [ mqhavm13.gamsworthwilliam.com mqhavm14.gamsworthwilliam.com ]
 p_fs_haqm3     (ocf::heartbeat:Filesystem):   Started mqhavm15.gamsworthwilliam.com
 p_rdqmx_haqm3  (ocf::ibm:rdqmx):              Started mqhavm15.gamsworthwilliam.com
 haqm3          (ocf::ibm:rdqm):               Started mqhavm15.gamsworthwilliam.com
The failed action has disappeared and HAQM1 is now running on vm14 as expected. The following example shows the RDQM status:
[midtownjojo@mqhavm13 ~]$ rdqmstatus -m HAQM1
Node:                       mqhavm13.gamsworthwilliam.com
Queue manager status:       Running elsewhere
HA role:                    Secondary
HA status:                  Normal
HA control:                 Enabled
HA current location:        mqhavm14.gamsworthwilliam.com
HA preferred location:      mqhavm14.gamsworthwilliam.com
HA floating IP interface:   None
HA floating IP address:     None

Node:                       mqhavm14.gamsworthwilliam.com
HA status:                  Normal

Node:                       mqhavm15.gamsworthwilliam.com
HA status:                  Normal

Command '/opt/mqm/bin/rdqmstatus' run with sudo.