Coupling Facility failure

In the unlikely event of a Coupling Facility failure, any nonpersistent messages stored in the affected CF structures are lost. You can recover persistent messages by using the RECOVER CFSTRUCT command.

To ensure that you can recover a CF structure in a reasonable time, take frequent backups by using the BACKUP CFSTRUCT command. You can either rotate the backups across all the queue managers in the queue-sharing group ('round-robin'), or dedicate one queue manager to taking all the backups.
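The two backup strategies can be illustrated with a minimal Python sketch that only generates MQSC command text; it does not submit anything to a queue manager. The structure name APP1 and the queue manager names QM01 to QM03 are hypothetical, and using CMDSCOPE to direct the command at a particular queue manager is one possible way to do this, not the only one.

```python
from itertools import cycle, islice


def backup_commands(structure, queue_managers):
    """Yield one BACKUP CFSTRUCT command per backup cycle.

    The queue manager that takes the backup rotates through
    queue_managers. Passing a single name models the alternative of
    dedicating one queue manager to all the backups.
    """
    for qmgr in cycle(queue_managers):
        # Hypothetical names; CMDSCOPE routes the command to that queue manager.
        yield f"BACKUP CFSTRUCT({structure}) CMDSCOPE({qmgr})"


# Four consecutive backup cycles (for example, hourly) rotated over three queue managers.
rotated = list(islice(backup_commands("APP1", ["QM01", "QM02", "QM03"]), 4))

# The same four cycles with one dedicated queue manager.
dedicated = list(islice(backup_commands("APP1", ["QM01"]), 4))

for command in rotated:
    print(command)
```

With rotation, the log data for the backups is spread across the queue managers; with a dedicated queue manager, all backup output goes to that queue manager's log.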

Each backup is output to the active log data set of the queue manager taking the backup. The shared queue DB2 repository records the name of the CF structure being backed up, the name of the queue manager doing the backup, the RBA range for this backup on that queue manager's log, and the backup time.
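As an illustration only, the information recorded for each backup can be modeled as a small record type. The field names below are assumptions chosen for readability and do not reflect the actual DB2 table layout; the helper shows how a recovery would pick the most recent backup for a given structure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class BackupRecord:
    """Hypothetical model of one backup entry in the shared repository."""
    cf_structure: str      # name of the CF structure that was backed up
    queue_manager: str     # queue manager that took the backup
    start_rba: int         # start of the backup in that queue manager's log
    end_rba: int           # end of the backup in that queue manager's log
    backup_time: datetime  # when the backup was taken


def latest_backup(records: Iterable[BackupRecord],
                  cf_structure: str) -> Optional[BackupRecord]:
    """Return the most recent backup for cf_structure, or None if there is none."""
    candidates = [r for r in records if r.cf_structure == cf_structure]
    return max(candidates, key=lambda r: r.backup_time, default=None)


# Hypothetical sample data: two backups of APP1 taken by different queue managers.
records = [
    BackupRecord("APP1", "QM01", 0x1000, 0x1FFF, datetime(2024, 1, 1, 9, 0)),
    BackupRecord("APP1", "QM02", 0x5000, 0x5FFF, datetime(2024, 1, 1, 10, 0)),
    BackupRecord("APP2", "QM01", 0x2000, 0x2FFF, datetime(2024, 1, 1, 9, 30)),
]
newest = latest_backup(records, "APP1")
print(newest.queue_manager, hex(newest.start_rba), hex(newest.end_rba))
```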

You recover a CF structure by issuing a RECOVER CFSTRUCT command to the queue manager that you want to do the recovery; you can recover a single CF structure, or several CF structures simultaneously. The command locates the backup through the DB2 repository information, restores it, and then recovers forward to the point of failure. It does this by applying log records from every queue manager in the queue-sharing group that has performed an MQPUT or MQGET, between the start of the backup and the time of failure, on any shared queue that maps to the CF structure. Merging these logs might require reading a considerable amount of log data, so you are strongly advised to make frequent (say, hourly) backups, especially if there are large messages within the backup.
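The forward-recovery idea can be sketched as a time-ordered merge of per-queue-manager logs applied on top of the backup. This is a deliberately simplified conceptual model, not how the queue manager is implemented: the LogRecord type, the integer timestamps, and the message identifiers are all hypothetical, and real recovery works with log RBAs and persistent message data.

```python
import heapq
from typing import NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical, simplified log record for a shared-queue operation."""
    timestamp: int   # logical time of the operation
    operation: str   # "PUT" or "GET"
    message_id: str


def forward_recover(backup_messages, logs_by_qmgr, backup_start):
    """Rebuild structure contents from a backup plus merged logs.

    Each queue manager's log must already be sorted by timestamp.
    Records older than the start of the backup are ignored.
    """
    messages = set(backup_messages)
    merged = heapq.merge(*logs_by_qmgr.values(), key=lambda r: r.timestamp)
    for record in merged:
        if record.timestamp < backup_start:
            continue  # already reflected in the backup
        if record.operation == "PUT":
            messages.add(record.message_id)
        elif record.operation == "GET":
            messages.discard(record.message_id)
    return messages


# Hypothetical logs from two queue managers that used the same structure.
logs = {
    "QM01": [LogRecord(1, "PUT", "m0"), LogRecord(10, "PUT", "m3"), LogRecord(30, "GET", "m1")],
    "QM02": [LogRecord(20, "GET", "m3"), LogRecord(40, "PUT", "m4")],
}
recovered = forward_recover({"m1", "m2"}, logs, backup_start=5)
print(sorted(recovered))
```

The longer the interval since the last backup, the more records the merge has to read from every contributing log, which is why frequent backups shorten recovery.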

If a recoverable application structure has failed, any further application activity is prevented until the structure has been recovered. If the administration structure has also failed, all the queue managers in the queue-sharing group must be started before you can issue the RECOVER CFSTRUCT command.

If a CF structure fails, the action taken by connected queue managers depends on the following:

The following scenarios describe what happens when an administration structure fails:

The following scenarios describe what happens when an application structure fails:

If a CF structure fails, V5.3 and V6.0 queue managers connected to a CFLEVEL(3) or CFLEVEL(4) CF structure continue to run, and applications that do not use the queues in the failed structure can continue normal processing. However, applications that attempt operations on queues in the failed structure receive errors until the RECOVER CFSTRUCT command has successfully rebuilt the failed structure, at which point new requests to open queues in the structure are allowed.