Failover and switchover

This topic gives concepts and examples of failover and switchover.

Mirror copy failover or switchover

A failover or switchover of the mirror copy to another node at the mirror copy site while the independent disk pool is online results in a synchronization.

When geographic mirroring is suspended

While geographic mirroring is suspended, a switchover or failover to the mirror copy is prohibited because the mirror copy contains back-level data. However, if the production copy is lost, you can convert the back-level mirror copy into the production copy by changing the order of the recovery domain nodes: change the backup node that owns the mirror copy into the primary node. If geographic mirroring is suspended for some, but not all, of the independent disk pools in the disk pool group, you cannot convert the mirror copy into a production copy, even by changing the order of the recovery domain nodes. If geographic mirroring is suspended for all of the independent disk pools in the group, you can change the order of the recovery domain nodes. However, if the independent disk pools were suspended at different times, the mirror copies are inconsistent, and you should not try to convert them into the production copy.
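The conversion rules above can be read as a small decision procedure. The following Python sketch is a hypothetical model, not an IBM i interface: `DiskPoolState` and `can_promote_mirror_copy` are illustrative names, and the sketch assumes that the production copy has already been lost and that the suspension time of each disk pool is known. It only reports whether reordering the recovery domain nodes is appropriate; it does not perform the reorder.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DiskPoolState:
    """Hypothetical snapshot of one independent disk pool in a disk pool group."""
    name: str
    # None means geographic mirroring is not suspended for this pool.
    suspended_at: Optional[datetime] = None


def can_promote_mirror_copy(group: Sequence[DiskPoolState]) -> Tuple[bool, str]:
    """Decide whether the mirror copy of a disk pool group may be made the
    production copy by reordering the recovery domain nodes (production copy
    assumed lost)."""
    suspended = [p for p in group if p.suspended_at is not None]
    if not suspended:
        # Not the suspended case: an ordinary switchover or failover applies.
        return False, "geographic mirroring is not suspended"
    if len(suspended) < len(group):
        return False, "only some disk pools are suspended; reordering cannot convert the mirror copy"
    if len({p.suspended_at for p in suspended}) > 1:
        return False, "disk pools were suspended at different times; mirror copies are inconsistent"
    return True, "change the backup node that owns the mirror copy into the primary node"
```

Treating identical timestamps as "suspended at the same time" is an assumption of this model.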

Examples

The following are examples of failovers and switchovers:

A full or partial synchronization is required once geographic mirroring resumes after being in a suspended state.

Ending clustering

Do not end clustering on a node that is performing geographic mirroring. Such nodes own either a production copy or a mirror copy. Ending clustering while geographic mirroring is being performed has the following results:

  • Ending clustering for the node that owns the production copy when the cluster resource group is active causes a failover.

  • Ending clustering for the node that owns the mirror copy when the cluster resource group is active causes a failover of the mirror copy.

  • Ending clustering for the node that owns the mirror copy when a failover cannot occur (because the cluster resource group is inactive or because no other node at the mirror copy site is active) prevents recovery from TCP/IP connection failures.
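The three outcomes above depend on which copy the node owns, whether the cluster resource group is active, and whether another node at the mirror copy site is active. This hypothetical Python model (`OwnedCopy` and `end_clustering_result` are illustrative names, not IBM i interfaces) encodes that table and returns `None` for combinations the topic does not describe, such as ending clustering on the production copy node while the cluster resource group is inactive.

```python
from enum import Enum
from typing import Optional


class OwnedCopy(Enum):
    PRODUCTION = "production copy"
    MIRROR = "mirror copy"


def end_clustering_result(owned: OwnedCopy, crg_active: bool,
                          other_active_node_at_mirror_site: bool) -> Optional[str]:
    """Hypothetical model of ending clustering on a node that is performing
    geographic mirroring. Returns None for cases this topic does not cover."""
    if owned is OwnedCopy.PRODUCTION:
        return "failover" if crg_active else None
    if crg_active and other_active_node_at_mirror_site:
        return "failover of the mirror copy"
    # The mirror copy cannot fail over: the cluster resource group is
    # inactive or no other node at the mirror copy site is active.
    return "no recovery from TCP/IP connection failures"
```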

After clustering is ended, geographic mirroring cannot recover from certain communications failures until both clustering and geographic mirroring are restarted. Therefore, if you end clustering inadvertently, restart clustering, make the independent disk pools in the cluster resource group unavailable at your first opportunity, and then make them available again.

Shutting down the system

If the system that owns the mirror copy must be shut down while geographic mirroring is being performed, do one of the following to avoid making the application on the production copy wait for the recovery timeout:

  • If another active node is at the mirror copy site, switch the mirror copy to the other node. As part of the switchover, geographic mirroring is suspended, but without the timeout delay.

  • If no other active node is at the mirror copy site, suspend geographic mirroring before shutting down the mirror copy system. This avoids the recovery timeout delay, but a synchronization is required when geographic mirroring is resumed.

After geographic mirroring has been suspended, a synchronization is required when it is resumed: a full synchronization if tracking is not used, or a partial synchronization if tracking is used.
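The shutdown guidance and the synchronization rule combine into a simple plan. The sketch below is hypothetical (`plan_mirror_system_shutdown`, its parameters, and the two action constants are illustrative, not an IBM i API). It assumes, as the text states, that both paths leave geographic mirroring suspended, so the synchronization type on resume depends only on whether tracking is used.

```python
from typing import Tuple

SWITCH_MIRROR_COPY = "switch the mirror copy to another active node at the mirror copy site"
SUSPEND_FIRST = "suspend geographic mirroring, then shut down the mirror copy system"


def plan_mirror_system_shutdown(other_active_node_at_mirror_site: bool,
                                tracking_used: bool) -> Tuple[str, str]:
    """Hypothetical planner: pick the action that avoids the recovery timeout
    and report which synchronization to expect when mirroring is resumed."""
    if other_active_node_at_mirror_site:
        # The switchover suspends geographic mirroring without the timeout delay.
        action = SWITCH_MIRROR_COPY
    else:
        action = SUSPEND_FIRST
    # Both paths leave geographic mirroring suspended, so resuming it needs a
    # synchronization: partial with tracking, full without.
    sync = "partial" if tracking_used else "full"
    return action, sync
```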

Do not shut down TCP/IP on a node that is performing geographic mirroring. Such nodes own either a production copy or a mirror copy. The following results occur if TCP/IP is shut down:

  • If TCP/IP is shut down on the production copy node and the cluster resource group is active, a failover to the mirror copy occurs.

  • If TCP/IP is shut down on the mirror copy node, geographic mirroring is suspended.

Recovery from two production copies

Successive failovers during geographic mirroring can leave you with two production copies. Ordinarily, the production copy and the mirror copy remain consistent, so the next make-available or resume operation automatically converts the former production copy into the mirror copy, and the next make-available operation synchronizes the new mirror copy. However, if the two nodes were not communicating, users might have suspended geographic mirroring and made both production copies available independently. In this case, the system cannot tell which production copy you want to keep, so you must resolve the inconsistency by changing the order of the recovery domain nodes. After you select the node that will own the production copy, the copy on the other node becomes the mirror copy and is synchronized to the production copy.
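Resolving two production copies amounts to choosing which node goes first in the recovery domain. This Python sketch assumes a two-node recovery domain in which the first node is the primary node that owns the production copy; the function name and the node names `NODEA` and `NODEB` are hypothetical. It models the resulting roles only and does not change a real cluster resource group.

```python
from typing import Dict, List, Tuple


def resolve_two_production_copies(recovery_domain: List[str],
                                  keep: str) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical model of a two-node recovery domain where index 0 is the
    primary node. Moving `keep` to the front makes its copy the production
    copy; the other node's copy becomes the mirror copy and is resynchronized."""
    if len(recovery_domain) != 2 or recovery_domain[0] == recovery_domain[1]:
        raise ValueError("this sketch only models two distinct nodes")
    if keep not in recovery_domain:
        raise ValueError(f"{keep!r} is not in the recovery domain")
    new_order = [keep] + [n for n in recovery_domain if n != keep]
    roles = {new_order[0]: "production copy", new_order[1]: "mirror copy"}
    return new_order, roles
```

For example, `resolve_two_production_copies(["NODEA", "NODEB"], "NODEB")` returns the order `["NODEB", "NODEA"]`, and the copy on NODEA becomes the mirror copy.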

Considerations for making a disk pool available at failover or switchover

When you specify *ONLINE for the Configuration object online setting, the system varies on the independent disk pool automatically as part of a failover or switchover, so you do not have to issue the vary-on yourself. However, if a geographic mirroring problem occurs during the vary-on, the system suspends geographic mirroring and completes the vary-on; you might prefer to fix the problem and keep geographic mirroring active. Also, if the vary-on fails, the system attempts to go back to the original primary node and vary on the independent ASP there; you might prefer to fix the problem and vary on the independent ASP on the new primary node.
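The automated behavior described above can be traced as a sequence of steps. This is a hypothetical Python model, not an IBM i interface; `automated_vary_on` and its flags are illustrative. The text does not cover the case where a mirroring problem occurs and the vary-on also fails, so the sketch simply assumes the failure takes precedence.

```python
from typing import List


def automated_vary_on(mirroring_problem: bool, vary_on_fails: bool) -> List[str]:
    """Hypothetical trace of what the system does when Configuration object
    online is *ONLINE and a failover or switchover occurs."""
    steps = ["vary on the independent disk pool on the new primary node"]
    if vary_on_fails:
        # The system tries to fall back to the original primary node.
        steps.append("go back to the original primary node and vary on there")
        return steps
    if mirroring_problem:
        # The vary-on is not stopped; geographic mirroring is suspended instead.
        steps.append("suspend geographic mirroring")
    steps.append("complete the vary-on")
    return steps
```

The manual alternatives mentioned above, fixing the problem and then varying on the disk pool yourself, are outside this model.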