BSDS problems

Use this topic to investigate and resolve problems with the BSDS.

For background information about the bootstrap data set (BSDS), see Plan the IBM MQ environment on z/OS.

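This topic describes the following BSDS problems:

  • Error occurs while opening the BSDS
  • Log content does not agree with the BSDS information
  • Both copies of the BSDS are damaged
  • Unequal time stamps
  • Out of synchronization
  • I/O error
  • Log range problems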

Normally, there are two copies of the BSDS, but if one is damaged, IBM MQ immediately changes to single BSDS mode. However, the damaged copy of the BSDS must be recovered before restart. If you are in single mode and the only copy of the BSDS is damaged, or if you are in dual mode and both copies are damaged, use the procedure described in Recovering the BSDS.

This section covers some of the BSDS problems that can occur at startup. Problems not covered here include:

  • RECOVER BSDS command errors (messages CSQJ301E - CSQJ307I)
  • Change log inventory utility errors (message CSQJ123E)
  • Errors in the BSDS backup being dumped by offload processing (message CSQJ125E)


Error occurs while opening the BSDS

    Symptoms

    IBM MQ issues the following message:

    CSQJ100E +CSQ1 ERROR OPENING BSDSn DSNAME=..., ERROR STATUS=eeii
    

    where eeii is the VSAM return code. For information about VSAM codes, see the DFSMS/MVS™ Macro Instructions for Data Sets documentation.
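When scanning a job log for this message, it can help to pull out the BSDS copy number, the data set name, and the two halves of the status value before looking them up in the VSAM documentation. The following is a minimal sketch, not an IBM-supplied tool. It assumes the message appears on a single line in the layout shown above and that the status is four hexadecimal characters; the function name and the sample data set name are hypothetical.

```python
import re

# Layout taken from the CSQJ100E text shown above.
# Assumption: the status is four hex characters, kept here as two strings (ee, ii).
CSQJ100E = re.compile(
    r"CSQJ100E\s+(?P<cpf>\S+)\s+ERROR OPENING BSDS(?P<copy>\d)\s+"
    r"DSNAME=(?P<dsname>[^,\s]+),\s*"
    r"ERROR STATUS=(?P<ee>[0-9A-Fa-f]{2})(?P<ii>[0-9A-Fa-f]{2})"
)


def parse_csqj100e(line):
    """Return the fields of a CSQJ100E message, or None if the line does not match."""
    match = CSQJ100E.search(line)
    if match is None:
        return None
    return {
        "cpf": match.group("cpf"),          # command prefix, for example +CSQ1
        "copy": int(match.group("copy")),   # 1 or 2: which BSDS copy failed to open
        "dsname": match.group("dsname"),
        "ee": match.group("ee").upper(),    # first half of the status value
        "ii": match.group("ii").upper(),    # second half of the status value
    }


# Hypothetical sample line, for illustration only.
sample = "CSQJ100E +CSQ1 ERROR OPENING BSDS2 DSNAME=MQ.CSQ1.BSDS02, ERROR STATUS=08A8"
fields = parse_csqj100e(sample)
```

The status halves are kept as strings rather than converted to numbers, so that they can be compared directly with the codes as printed in the VSAM documentation.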

    System action
    During system initialization, the startup is terminated.

    During a RECOVER BSDS command, the system continues in single BSDS mode.

    System programmer action
    None.

    Operator action
    Carry out these steps:
    1. Run the print log map utility on both copies of the BSDS, and compare the lists to determine which copy is accurate or current.
    2. Rename the data set that had the problem, and define a replacement for it.
    3. Copy the accurate data set to the replacement data set, using Access Method Services.
    4. Restart the queue manager.


Log content does not agree with the BSDS information

    Symptoms
    IBM MQ issues the following message:
    CSQJ102E +CSQ1 LOG RBA CONTENT OF LOG DATA SET DSNAME=...,
               STARTRBA=..., ENDRBA=...,
               DOES NOT AGREE WITH BSDS INFORMATION
    

    This message indicates that the change log inventory utility was used incorrectly or that a down-level data set is being used.

    System action
    Queue manager startup processing is terminated.

    System programmer action
    None.

    Operator action
    Run the print log map utility and the change log inventory utility to print and correct the contents of the BSDS.


Both copies of the BSDS are damaged

    Symptoms
    IBM MQ issues the following messages:
    CSQJ107E +CSQ1 READ ERROR ON BSDS
               DSNAME=... ERROR STATUS=0874
    CSQJ117E +CSQ1 REG8 INITIALIZATION ERROR READING BSDS
               DSNAME=... ERROR STATUS=0874
    CSQJ119E +CSQ1 BOOTSTRAP ACCESS INITIALIZATION PROCESSING FAILED
    

    System action
    Queue manager startup processing is terminated.

    System programmer action
    Carry out these steps:
    1. Rename the data set, and define a replacement for it.
    2. Locate the BSDS associated with the most recent archive log data set, and copy it to the replacement data set.
    3. Use the print log map utility to print the contents of the replacement BSDS.
    4. Use the print log records utility to print a summary report of the active log data sets missing from the replacement BSDS, and to establish the RBA range.
    5. Use the change log inventory utility to update the missing active log data set inventory in the replacement BSDS.
    6. If dual BSDS data sets had been in use, copy the updated BSDS to the second copy of the BSDS.
    7. Restart the queue manager.

    Operator action
    None.


Unequal time stamps

    Symptoms
    IBM MQ issues the following message:
    CSQJ120E +CSQ1 DUAL BSDS DATA SETS HAVE UNEQUAL TIME STAMPS,
               SYSTEM BSDS1=...,BSDS2=...,
               UTILITY BSDS1=...,BSDS2=...
    
    The possible causes are:

    • One copy of the BSDS has been restored. All information about the restored BSDS is down-level. The down-level BSDS has the earlier time stamp.
    • One of the volumes containing the BSDS has been restored. All information about the restored volume is down-level. If the volume contains any active log data sets or IBM MQ data, they are also down-level. The down-level volume has the earlier time stamp.
    • Dual logging has degraded to single logging, and you are trying to start without recovering the damaged log.
    • The queue manager terminated abnormally after updating one copy of the BSDS but before updating the second copy.

    System action
    IBM MQ attempts to resynchronize the BSDS data sets using the more recent copy. If this fails, queue manager startup is terminated.

    System programmer action
    None.

    Operator action
    If automatic resynchronization fails, carry out these steps:
    1. Run the print log map utility on both copies of the BSDS, and compare the lists to determine which copy is accurate or current.
    2. Rename the down-level data set and define a replacement for it.
    3. Copy the good data set to the replacement data set, using Access Method Services.
    4. If applicable, determine whether the volume containing the down-level BSDS has been restored. If it has been restored, all data on that volume, such as the active log data, is also down-level.

      If the restored volume contains active log data and you were using dual active logs on separate volumes, you need to copy the current version of the active log to the down-level log data set. See Recovering logs for details of how to do this.


Out of synchronization

    Symptoms
    IBM MQ issues the following message during queue manager initialization:
    CSQJ122E +CSQ1 DUAL BSDS DATA SETS ARE OUT OF SYNCHRONIZATION
    

    The two input copies of the BSDS have different time stamps, or contain a record that is inconsistent. Differences can exist if operator errors occurred while the change log inventory utility was being used (for example, the utility was run on only one copy). The change log inventory utility sets a private time stamp in the BSDS control record when it starts, and a close flag when it ends. IBM MQ checks the change log inventory utility time stamps and, if they are different, or if they are the same but one close flag is not set, compares the two copies of the BSDS. If the copies are different, message CSQJ122E is issued.

    This message is also issued by the BSDS conversion utility if two input BSDSs are specified and a record is found that differs between the two copies. This situation can arise if the queue manager terminated abnormally before the BSDS conversion utility was run.
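The initialization check described above can be read as a two-stage decision: first decide whether the copies need to be compared at all, then compare them. The following sketch models that reading only; it is not how IBM MQ is implemented. The class, field, and function names are hypothetical, and "one close flag is not set" is interpreted as "at least one close flag is not set".

```python
from dataclasses import dataclass


@dataclass
class BsdsCopy:
    """Hypothetical model of the parts of a BSDS copy that the check looks at."""
    utility_timestamp: str   # private time stamp set when the utility starts
    close_flag_set: bool     # close flag set when the utility ends
    records: tuple           # stand-in for the remaining BSDS contents


def csqj122e_would_be_issued(copy1, copy2):
    """Return True when, under the reading above, the copies are out of synchronization."""
    timestamps_differ = copy1.utility_timestamp != copy2.utility_timestamp
    # Assumption: a missing close flag on either copy triggers the comparison.
    close_flag_missing = not (copy1.close_flag_set and copy2.close_flag_set)
    if not (timestamps_differ or close_flag_missing):
        return False  # nothing suspicious, so the copies are not compared
    return copy1.records != copy2.records


# The utility was run on copy 1 only, so the time stamps and the contents differ.
updated = BsdsCopy("T2", True, ("log-a", "log-b"))
stale = BsdsCopy("T1", True, ("log-a",))
result = csqj122e_would_be_issued(updated, stale)
```

In this model, a utility run that is interrupted on one copy leaves the time stamps equal but one close flag unset, which still leads to the comparison.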

    System action
    Queue manager startup or the utility is terminated.

    System programmer action
    None.

    Operator action
    If the error occurred during queue manager initialization, carry out these steps:
    1. Run the print log map utility on both copies of the BSDS, and compare the lists to determine which copy is accurate or current.
    2. Rename the data set that had the problem, and define a replacement for it.
    3. Copy the accurate data set to the replacement data set, using Access Method Services.
    4. Restart the queue manager.

    If the error occurred when running the BSDS conversion utility, carry out these steps:

    1. Attempt to restart the queue manager and shut it down cleanly before attempting to run the BSDS conversion utility again.
    2. If this does not solve the problem, run the print log map utility on both copies of the BSDS, and compare the lists to determine which copy is accurate or current.
    3. Change the JCL used to invoke the BSDS conversion utility to specify the current BSDS in the SYSUT1 DD statement, and remove the SYSUT2 DD statement, before submitting the job again.


I/O error

    Symptoms
    IBM MQ changes to single BSDS mode and issues the user message:
    CSQJ126E +CSQ1 BSDS ERROR FORCED SINGLE BSDS MODE
    

    This is followed by one of the following messages:

    CSQJ107E +CSQ1 READ ERROR ON BSDS
               DSNAME=... ERROR STATUS=...
     
    CSQJ108E +CSQ1 WRITE ERROR ON BSDS
               DSNAME=... ERROR STATUS=...
    

    System action
    The BSDS mode changes from dual to single.

    System programmer action
    None.

    Operator action
    Carry out these steps:
    1. Use Access Method Services to rename or delete the damaged BSDS and to define a new BSDS with the same name as the BSDS that had the error. Example control statements can be found in job CSQ4BREC in thlqual.SCSQPROC.
    2. Issue the IBM MQ command RECOVER BSDS to make a copy of the good BSDS in the newly allocated data set and reinstate dual BSDS mode. See also Recovering the BSDS.


Log range problems

Symptoms

IBM MQ has issued message CSQJ113E when reading its own log, or message CSQJ133E or CSQJ134E when reading the log of a queue manager in the queue sharing group. This can happen when you do not have the archive logs needed to restart the queue manager or recover a CF structure.

System action

Depending upon what log record is being read and why, the requestor might end abnormally with a reason code of X'00D1032A'.

System programmer action

Run the print log map utility (CSQJU004) to determine the cause of the error. When message CSQJ133E or CSQJ134E has been issued, run the utility against the BSDS of the queue manager indicated in the message.

If you have:

  • Deleted the entry with the log range (containing the log RBA or LRSN indicated in the message) from the BSDS, and
  • Not deleted or reused the data set

you can add the entry back into the BSDS using the following procedure:

  1. Identify the data set containing the required RBA or LRSN, by looking at an old copy of the contents of the BSDS, or by running CSQJU004 against a backup of the BSDS.
  2. Add the data set back into the BSDS using the change log inventory utility (CSQJU003).
  3. Restart the queue manager.
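Step 1 of this procedure amounts to a range lookup: find the log data set whose start and end RBA enclose the RBA named in the message. The following is a minimal sketch of that lookup, assuming you have already transcribed the data set names and RBA ranges from an old print log map report into a list. The function name, data set names, and RBA values are hypothetical; the sketch does not read CSQJU004 output itself.

```python
def find_log_data_set(inventory, rba_hex):
    """Return the name of the data set whose RBA range contains rba_hex, or None.

    inventory is a list of (dsname, start_rba_hex, end_rba_hex) tuples.
    RBAs are hexadecimal strings, as they are printed in messages and reports.
    """
    target = int(rba_hex, 16)
    for dsname, start_hex, end_hex in inventory:
        if int(start_hex, 16) <= target <= int(end_hex, 16):
            return dsname
    return None


# Hypothetical entries transcribed from an old copy of the BSDS contents.
old_inventory = [
    ("MQ.CSQ1.ARCHLOG1.A0000001", "000000000000", "0000000FFFFF"),
    ("MQ.CSQ1.ARCHLOG1.A0000002", "000000100000", "0000001FFFFF"),
    ("MQ.CSQ1.ARCHLOG1.A0000003", "000000200000", "0000002FFFFF"),
]

needed = find_log_data_set(old_inventory, "00000012ABCD")
```

Here `needed` names the second data set, which is the entry to add back with the change log inventory utility in step 2.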

If an archive log data set has been deleted, you will not be able to recover the page set or CF structure that needs the archive logs. Identify the reason that the queue manager needs to read the log record, then take one of the following actions depending on the page set or CF structure affected.

Page sets

Message CSQJ113E during the recovery phase of queue manager restart indicates that the log is needed to perform media recovery to bring a page set up to date.

Identify the page sets that need the deleted log data set for media recovery, by looking at the media recovery RBA in the CSQI049I message issued for each page set during queue manager restart, then perform the following actions.

  • Page set zero
    You can recover the objects on page set zero by using the following procedure. Attention: All data in all other page sets will be lost when you carry out the procedure.
    1. Use function SDEFS of the CSQUTIL utility to produce a file of IBM MQ DEFINE commands.
    2. Format page set zero using CSQUTIL, then redefine the other page sets as described in the next section.
    3. Restart the queue manager.
    4. Use CSQUTIL to redefine the objects using the DEFINE commands produced by the utility in step 1.

  • Page sets 1-99
    Use the following procedure to redefine the page sets. Attention: Any data on the page set is lost when you carry out this operation.
    1. If you can access the page set without any I/O errors, reformat the page set using the CSQUTIL utility with the command FORMAT TYPE(NEW).
    2. If I/O errors occurred when accessing the page set, delete the page set and re-create it.

      If you want the page set to be the same size as before, use the command LISTCAT ENT(dsname) ALLOC to obtain the existing space allocations, and use these in the z/OS DEFINE CLUSTER command.

      Format the new page set using the CSQUTIL utility with the command FORMAT TYPE(NEW).

    3. Restart the queue manager. You might have to take certain actions, such as resetting channels or resolving indoubt channels.
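Before starting either procedure, you need the list of page sets that actually depend on the deleted log data set. The following sketch shows one way to work that out from the media recovery RBA reported for each page set at restart. It assumes that media recovery reads the log from the media recovery RBA onward, so a page set is affected when its media recovery RBA is at or below the end RBA of the deleted data set. The function name and all values are hypothetical.

```python
def page_sets_needing_log(deleted_end_rba_hex, media_recovery_rbas):
    """Return the sorted page set IDs whose media recovery would read the deleted log.

    media_recovery_rbas maps a page set ID to its media recovery RBA (hex string).
    Assumption: recovery reads forward from the media recovery RBA, so any page
    set whose RBA is at or before the end of the deleted data set is affected.
    """
    deleted_end = int(deleted_end_rba_hex, 16)
    return sorted(
        psid
        for psid, rba_hex in media_recovery_rbas.items()
        if int(rba_hex, 16) <= deleted_end
    )


# Hypothetical media recovery RBAs noted during queue manager restart.
reported = {
    0: "000000300000",
    1: "000000100000",
    2: "0000001FFFFF",
}

affected = page_sets_needing_log("0000001FFFFF", reported)
```

With these values, page sets 1 and 2 are affected and page set zero is not, so only the procedure for page sets 1-99 would apply.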

CF structures

Messages CSQJ113E, CSQJ133E, or CSQJ134E, during the recovery of a CF structure, indicate that the logs needed to recover the structure are not available on at least one member of the queue sharing group.

Take one of the following actions depending on the structure affected:

    Application CF structure
    Issue the command RECOVER CFSTRUCT(structure-name) TYPE(PURGE).

    This process empties the structure, so any messages on the structure are lost.

    CSQSYSAPPL structure
    Contact the IBM support center.

    Administration structure
    This structure is rebuilt using log data since the last checkpoint on each queue manager, which should be in active logs.
    If you get this error during administration structure recovery, contact the IBM support center as this indicates that the active log is not available.

Once you have recovered the page set or CF structure, perform a backup of the logs, BSDS, page sets, and CF structures.

To prevent this problem from occurring again:

  • Increase the archive log retention (ARCRETN) value.
  • Increase the frequency of the CF structure backups.
