How to back up and recover page sets
Different mechanisms are available for backup and recovery. Use this topic to understand them.
This section describes the following topics:
- Create a point of recovery for non-shared resources
- Backing up page sets
- Recovering page sets
- How to delete page sets
For information about how to create a point of recovery for shared resources, see Recovering shared queues.
Create a point of recovery for non-shared resources
IBM MQ can recover objects and non-shared persistent messages to their current state if both:
- Copies of page sets from an earlier point exist.
- All the IBM MQ logs are available to perform recovery from that point.
These represent a point of recovery for non-shared resources.
Both objects and messages are held on page sets. Multiple objects and messages from different queues can exist on the same page set. For recovery purposes, objects and messages cannot be backed up in isolation, so a page set must be backed up as a whole to ensure the correct recovery of the data.
The IBM MQ recovery log contains a record of all persistent messages and changes made to objects. If IBM MQ fails (for example, due to an I/O error on a page set), you can recover the page set by restoring the backup copy and restarting the queue manager. IBM MQ applies the log changes to the page set from the point of the backup copy.
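The relationship between a backup copy and the log can be pictured with a small model. The following Python sketch is purely illustrative and is not how IBM MQ is implemented: `LogRecord`, `recover`, and the key naming are hypothetical, and a page set is reduced to a dictionary. It shows only that the current state is rebuilt from the backup copy plus every log record from the backup point onward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    rba: int        # position of the record in the log (hypothetical model)
    key: str        # hypothetical identifier of the object or message changed
    value: object   # new state; None models a deletion


def recover(backup: dict, backup_rba: int, log: list) -> dict:
    """Rebuild page set contents from a backup copy plus the log.

    Records older than backup_rba are assumed to be reflected in the
    backup already, so only later records are replayed, in RBA order.
    """
    page_set = dict(backup)  # start from the restored backup copy
    for record in sorted(log, key=lambda r: r.rba):
        if record.rba < backup_rba:
            continue
        if record.value is None:
            page_set.pop(record.key, None)
        else:
            page_set[record.key] = record.value
    return page_set


backup_copy = {"Q1/msg1": "a"}
log_records = [
    LogRecord(100, "Q1/msg1", "a"),   # already in the backup
    LogRecord(200, "Q1/msg2", "b"),   # put after the backup
    LogRecord(300, "Q1/msg1", None),  # got after the backup
]
current_state = recover(backup_copy, 150, log_records)
```

In this model, losing the records at RBA 200 and 300 would leave only the stale backup contents, which is why both conditions listed earlier are needed for recovery to the current state.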
There are two ways of creating a point of recovery:
- Full backup
- Stop the queue manager, which forces all updates on to the page sets.
This allows you to restart from the point of recovery, using only the backed up page set data sets and the logs from that point on.
- Fuzzy backup
- Take fuzzy backup copies of the page sets without stopping the queue manager.
If you use this method and the associated logs later become damaged or lost, you cannot use the fuzzy page set backup copies to recover. This is because the fuzzy page set backup copies contain an inconsistent view of the state of the queue manager and depend on the logs being available. If the logs are not available, you need to return to the last set of backup page set copies taken while the subsystem was inactive (Method 1) and accept the loss of data from that time.
- Method 1: Full backup
- This method involves shutting the queue manager down. This forces all updates on to the page sets so that the page sets are in a consistent state.
- Stop all the IBM MQ applications that are using the queue manager (allowing them to complete first). This can be done by changing the access security or queue settings, for example.
- When all activity has completed, display and resolve any in-doubt units of recovery. (Use the commands DISPLAY CONN and RESOLVE INDOUBT, as described in DISPLAY CONN and RESOLVE INDOUBT.)
This brings the page sets to a consistent state. If you do not do this, your page sets might not be consistent, and you are effectively doing a fuzzy backup.
- Issue the ARCHIVE LOG command to ensure that the latest log data is written out to the log data sets.
- Issue the STOP QMGR MODE(QUIESCE) command. Record the lowest RBA value in the CSQI024I or CSQI025I messages (see CSQI024I and CSQI025I for more information). Keep the log data sets starting from the one indicated by the RBA value up to the current log data set.
- Take backup copies of all the queue manager page sets (see Backing up page sets).
- Method 2: Fuzzy backup
- This method does not involve shutting the queue manager down. Therefore, updates might be in virtual storage buffers during the backup process. This means that the page sets are not in a consistent state, and can only be used for recovery with the logs.
- Issue the DISPLAY USAGE TYPE(ALL) command, and record the RBA value in the CSQI024I or CSQI025I messages (see CSQI024I and CSQI025I for more information).
- Take backup copies of the page sets (see Backing up page sets).
- Issue the ARCHIVE LOG command to ensure that the latest log data is written out to the log data sets. To restart from the point of recovery, you must keep the log data sets starting from the log data set indicated by the RBA value up to the current log data set.
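Both methods end with the same retention rule: keep every log data set from the one containing the recorded RBA up to the current one. The following Python sketch shows that selection. It assumes you have already collected the RBA range of each log data set by some other means. The `LogDataSet` type, the data set names, and the ranges are hypothetical and do not represent an IBM MQ interface.

```python
from typing import NamedTuple


class LogDataSet(NamedTuple):
    name: str        # hypothetical data set name
    start_rba: int   # first RBA held in the data set
    end_rba: int     # last RBA held in the data set


def log_data_sets_to_keep(logs, recorded_rba_hex):
    """Return the names of log data sets needed for recovery, oldest first.

    A data set is needed if it holds any log data at or after the RBA
    recorded from the CSQI024I or CSQI025I message.
    """
    recorded = int(recorded_rba_hex, 16)  # RBAs are displayed in hexadecimal
    ordered = sorted(logs, key=lambda ds: ds.start_rba)
    return [ds.name for ds in ordered if ds.end_rba >= recorded]


example_logs = [
    LogDataSet("EXAMPLE.ARCHLOG.A0000001", 0x0000, 0x0FFF),
    LogDataSet("EXAMPLE.ARCHLOG.A0000002", 0x1000, 0x1FFF),
    LogDataSet("EXAMPLE.ARCHLOG.A0000003", 0x2000, 0x2FFF),
]
needed = log_data_sets_to_keep(example_logs, "18A0")
```

Here the recorded RBA falls inside the second data set, so the first can be discarded as far as this backup is concerned, while the second and third must be kept.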
Backing up page sets
To recover a page set, IBM MQ needs to know how far back in the log to go. IBM MQ maintains a log RBA number in page zero of each page set, called the recovery log sequence number (LSN). This number is the starting RBA in the log from which IBM MQ can recover the page set. When you back up a page set, this number is also copied.
If the copy is later used to recover the page set, IBM MQ must have access to all the log records from this RBA value to the current RBA. That means you must keep enough of the log records to enable IBM MQ to recover from the oldest backup copy of a page set that you intend to keep.
Use the ADRDSSU COPY function to copy the page sets.
For more information, see the COPY DATASET Command Syntax for Logical Data Set documentation.
For example:
//STEP2    EXEC PGM=ADRDSSU,REGION=6M
//SYSPRINT DD SYSOUT=H
//SYSIN    DD *
 COPY -
 DATASET(INCLUDE(SCENDATA.MQPA.PAGESET.*)) -
 RENAMEU(SCENDATA.MQPA.PAGESET.**,SCENDATA.MQPA.BACKUP1.**) -
 SPHERE -
 REPUNC -
 FASTREPLICATION(PREF) -
 CANCELERROR -
 TOL(ENQF)
/*
//
If you copy the page set while the queue manager is running, you must use a copy utility that copies page zero of the page set first. If you do not do this, you could corrupt the data in your page set.
If the process of dynamically expanding a page set is interrupted, for example by power to the system being lost, you can still use ADRDSSU to take a backup of a page set.
If you perform an Access Method Services IDCAMS LISTCAT ENT('page set data set name') ALLOC, you see that the HI-ALLOC-RBA is higher than the HI-USED-RBA.
The next time this page set fills up it is extended again, if possible, and the pages between the high used RBA and the highest allocated RBA are used, along with another new extent.
Backing up your object definitions
You should also back up copies of your object definitions. To do this, use the MAKEDEF feature of the CSQUTIL COMMAND function (described in Issuing commands to IBM MQ (COMMAND)).
Back up your object definitions whenever you take a backup copy of your queue manager, and keep the most current version.
Recovering page sets
If the queue manager has terminated due to a failure, the queue manager can normally be restarted with all recovery being performed during restart. However, such recovery is not possible if any of your page sets or log data sets are not available. The extent to which you can now recover depends on the availability of backup copies of page sets and log data sets.
To restart from a point of recovery you must have:
- A backup copy of the page set that is to be recovered.
- If you used the fuzzy backup process described in Method 2: Fuzzy backup, the log data set that included the recorded RBA value, the log data set that was made by the ARCHIVE LOG command, and all the log data sets between these.
- If you used full backup, but you do not have the log data sets following that made by the ARCHIVE LOG command, you do not need to run the FORMAT TYPE(REPLACE) function of the CSQUTIL utility against all your page sets.
To recover a page set to its current state, you must also have all the log data sets and records since the ARCHIVE LOG command.
There are two methods for recovering a page set. To use either method, the queue manager must be stopped.
- Simple recovery
- This is the simpler method, and is appropriate for most recovery situations.
- Delete the page set that you want to restore from backup.
- Use the ADRDSSU COPY function to recover your page set from the backup copy.
Alternatively, you can rename your backup copy to the original name, or change the CSQP00xx DD statement in your queue manager procedure to point to your backup page set. However, if you then lose or corrupt the page set, you will no longer have a backup copy to restore from.
- Restart the queue manager.
- When the queue manager has restarted successfully, you can restart the applications.
- Reinstate your normal backup procedures for the restored page set.
- Advanced recovery
- This method provides performance advantages if you have a large page set to recover, or if there has been much activity on the page set since the last backup copy was taken. However, it requires more manual intervention than the simple method, which might increase the risk of error and the time taken to perform the recovery.
- Delete and redefine the page set that you want to restore from backup.
- Use ADRDSSU to copy the backup copy of the page set into the new page set. Define the new page set with a secondary extent value so that it can be expanded dynamically.
Alternatively, you can rename your backup copy to the original name, or change the CSQP00xx DD statement in your queue manager procedure to point to your backup page set. However, if you then lose or corrupt the page set, you will no longer have a backup copy to restore from.
- Change the CSQINP1 definitions for the queue manager to make the buffer pool associated with the page set being recovered as large as possible. By making the buffer pool large, you might be able to keep all the changed pages resident in the buffer pool and reduce the amount of I/O to the page set.
- Restart the queue manager.
- When the queue manager has restarted successfully, stop it (using quiesce) and then restart it using the normal buffer pool definition for that page set. After this second restart completes successfully, you can restart the applications.
- Reinstate your normal backup procedures for the restored page set.
- What happens when the queue manager is restarted
When the queue manager is restarted, it applies all changes made to the page set that are registered in the log, beginning at the restart point for the page set. IBM MQ can recover multiple page sets in this way. The page set is dynamically expanded, if required, during media recovery.
During restart, IBM MQ determines the log RBA to start from by taking the lowest value from the following:
- Recovery LSN from the checkpoint log record for each page set.
- Recovery LSN from page zero in each page set.
- The RBA of the oldest incomplete unit of recovery in the system at the time the backup was taken.
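The rule above amounts to taking a minimum over three sources. The following Python sketch is a simplified illustration only. The function name, the dictionaries keyed by page set identifier, and the sample values are hypothetical and are not an IBM MQ interface.

```python
def restart_rba(checkpoint_lsns, page_zero_lsns, oldest_incomplete_ur_rba=None):
    """Return the lowest RBA from which the log must be read at restart.

    checkpoint_lsns and page_zero_lsns map a page set identifier to the
    recovery LSN taken from the checkpoint log record and from page zero
    respectively. oldest_incomplete_ur_rba is None when no unit of
    recovery was incomplete at the time the backup was taken.
    """
    candidates = list(checkpoint_lsns.values()) + list(page_zero_lsns.values())
    if oldest_incomplete_ur_rba is not None:
        candidates.append(oldest_incomplete_ur_rba)
    if not candidates:
        raise ValueError("no recovery LSN or unit-of-recovery RBA supplied")
    return min(candidates)


checkpoint = {0: 0x5000, 1: 0x4200}
page_zero = {0: 0x4800, 1: 0x4100}   # page set 1 restored from an older backup
start = restart_rba(checkpoint, page_zero, oldest_incomplete_ur_rba=0x3F00)
```

In this example the incomplete unit of recovery is older than every recovery LSN, so it determines where log reading starts. Without it, the page zero value of the restored page set would be the lowest.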
All object definitions are stored on page set zero. Messages can be stored on any available page set.
Note: The queue manager cannot restart if page set zero is not available.
How to delete page sets
You delete a page set by using the DELETE PSID command; see DELETE PSID for details of this command.
You cannot delete a page set that is still referenced by any storage class. Use DISPLAY STGCLASS to find out which storage classes reference a page set.
The data set is deallocated from IBM MQ but is not deleted. It remains available for future use, or can be deleted using z/OS facilities.
Remove the page set from the started task procedure for the queue manager.
Remove the definition of the page set from your CSQINP1 initialization data set.
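The storage class check that precedes DELETE PSID can be pictured as a simple lookup. The following Python sketch assumes you have transcribed the DISPLAY STGCLASS results into a mapping of storage class name to page set identifier. The function name, the storage class names, and the mapping are hypothetical.

```python
def storage_classes_blocking_delete(storage_classes, psid):
    """Return the names of storage classes that still reference psid.

    storage_classes maps a storage class name to the page set identifier
    it uses. An empty result means no storage class references the page set.
    """
    return sorted(name for name, used in storage_classes.items() if used == psid)


classes = {"EXAMPLE.DEFAULT": 1, "EXAMPLE.BIGMSGS": 4, "EXAMPLE.AUDIT": 3}
blocking = storage_classes_blocking_delete(classes, 4)
```

Any storage class returned here would need to be altered to use a different page set, or deleted, before the page set can be deleted.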