I/O errors occur while reading the active log

Symptoms

WebSphere MQ issues the following message:

CSQJ106E +CSQ1 LOG READ ERROR DSNAME=..., LOGRBA=...,
           ERROR STATUS=ccccffss
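
When scanning a job log for this message, it can help to pull out the data set name and RBA programmatically. The following is a minimal sketch, assuming the message text follows the layout shown above. The function name `parse_csqj106e` and the sample values (data set name, RBA, status) are hypothetical and exist only for illustration.

```python
import re

# Pattern for the CSQJ106E layout shown above; the message may wrap across lines.
CSQJ106E_RE = re.compile(
    r"CSQJ106E\s+(?P<cpf>\S+)\s+LOG READ ERROR\s+"
    r"DSNAME=(?P<dsname>[^,\s]+),\s*"
    r"LOGRBA=(?P<logrba>[0-9A-Fa-f]+),\s*"
    r"ERROR STATUS=(?P<status>\w{8})"
)


def parse_csqj106e(text):
    """Return the message fields as a dict, or None if the text does not match."""
    match = CSQJ106E_RE.search(text)
    return match.groupdict() if match else None


# Hypothetical sample: the data set name, RBA, and status value are invented.
sample = (
    "CSQJ106E +CSQ1 LOG READ ERROR DSNAME=MQ.LOGCOPY1.DS02, LOGRBA=00000A1B2C3D,\n"
    "           ERROR STATUS=00000000"
)
fields = parse_csqj106e(sample)
print(fields["dsname"], fields["logrba"], fields["status"])
```

The sketch only extracts the fields; it does not interpret the ERROR STATUS value.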

System action

This depends on when the error occurred:

  • If the error occurs during the off-load process, the process tries to read the RBA range from a second copy.

    • If no second copy exists, the active log data set is stopped.

    • If the second copy also has an error, only the original data set that triggered the off-load is stopped. The archive log data set is then terminated, leaving a gap in the archived log RBA range.

    • This message is issued:

      CSQJ124E +CSQ1 OFFLOAD OF ACTIVE LOG SUSPENDED FROM
                 RBA xxxxxx TO RBA xxxxxx DUE TO I/O ERROR

    • If the second copy is satisfactory, the first copy is not stopped.

  • If the error occurs during recovery, WebSphere MQ tries to obtain the data for the requested log RBAs from another copy or from an archive. If this is unsuccessful, recovery fails and the queue manager terminates abnormally.

  • If the error occurs during restart and dual logging is used, WebSphere MQ continues with the alternative log data set. Otherwise, the queue manager ends abnormally.
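
The cases above can be summarised as a small decision function. This is only an illustrative model of the behaviour described in this topic, not code that WebSphere MQ exposes; the names `Phase` and `system_action` and the returned strings are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    OFFLOAD = "off-load"
    RECOVERY = "recovery"
    RESTART = "restart"


def system_action(phase, alternative_exists, alternative_ok=False):
    """Summarise the documented system action for a log read error.

    alternative_exists: a second log copy (or, for recovery, an archive) exists.
    alternative_ok: reading from that alternative source succeeded.
    """
    if phase is Phase.OFFLOAD:
        if not alternative_exists:
            return "active log data set stopped"
        if not alternative_ok:
            return "original data set stopped; archive terminated with an RBA gap"
        return "second copy used; first copy not stopped"
    if phase is Phase.RECOVERY:
        if alternative_exists and alternative_ok:
            return "recovery continues"
        return "queue manager terminates abnormally"
    if phase is Phase.RESTART:
        # The text conditions restart only on whether dual logging is in use.
        if alternative_exists:
            return "restart continues with the alternative log data set"
        return "queue manager ends abnormally"
    raise ValueError(f"unknown phase: {phase!r}")


print(system_action(Phase.OFFLOAD, alternative_exists=True, alternative_ok=False))
```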

System programmer action

Look for system messages, such as IEC prefixed messages, and try to resolve the problem using the recommended actions for these messages.

If the active log data set has been stopped, it is not used for logging. The data set is not deallocated; it is still used for reading. Even if the data set is not stopped, an active log data set that gives persistent errors should be replaced.

Operator action

None.

 

Replacing the data set

How you replace the data set depends on whether you are using single or dual active logging.

If you are using dual active logging:

  1. Ensure that the data has been saved.

    The data is saved on the other active log and this can be copied to a replacement active log.

  2. Stop the queue manager and delete the data set with the error using Access Method Services.

  3. Define a new log data set using Access Method Services DEFINE so that you can write to it. Use DFDSS or Access Method Services REPRO to copy the good log into the newly defined data set so that you have two consistent, correct logs again.

  4. Use the change log inventory utility, CSQJU003, to update the information in the BSDS about the corrupt data set as follows:

    1. Use the DELETE function to remove information about the corrupt data set.

    2. Use the NEWLOG function to name the new data set as the new active log data set and give it the RBA range that was successfully copied.

      You can run the DELETE and NEWLOG functions in the same job step. Put the DELETE statement before the NEWLOG statement in the SYSIN input data set.

  5. Restart the queue manager.
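
As an illustration of the ordering rule in step 4, the sketch below builds the two SYSIN control statements with DELETE placed before NEWLOG. It assumes the CSQJU003 keywords `DSNAME`, `COPY1`/`COPY2`, `STARTRBA`, and `ENDRBA`; check the exact statement syntax against the utility reference for your release. The helper name `build_sysin`, the data set name, and the RBA values are hypothetical.

```python
def build_sysin(corrupt_dsname, new_dsname, copy, start_rba, end_rba):
    """Return CSQJU003 SYSIN statements, DELETE first, then NEWLOG.

    The keyword layout is an assumption to verify against the utility reference.
    """
    if copy not in ("COPY1", "COPY2"):
        raise ValueError("copy must be 'COPY1' or 'COPY2'")
    return [
        # Remove the BSDS entry for the corrupt data set first ...
        f" DELETE DSNAME={corrupt_dsname}",
        # ... then register the replacement with the RBA range that was copied.
        f" NEWLOG DSNAME={new_dsname},{copy},"
        f"STARTRBA={start_rba},ENDRBA={end_rba}",
    ]


# Hypothetical values: the replacement reuses the name of the deleted data set.
statements = build_sysin(
    "MQ.LOGCOPY1.DS02", "MQ.LOGCOPY1.DS02", "COPY1",
    "000000000000", "0000000FFFFF",
)
print("\n".join(statements))
```

The sketch does not handle the record-length limits of the SYSIN data set, so long data set names might need the statement to be split according to the utility's own rules.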

If you are using single active logging:

  1. Ensure that the data has been saved.

  2. Stop the queue manager.

  3. Determine whether the data set with the error has been off-loaded:

    1. Use the CSQJU003 utility to list information about the archive log data sets from the BSDS.

    2. Search the list for a data set whose RBA range includes the RBA of the corrupt data set.

  4. If the corrupt data set has been off-loaded, copy its backup in the archive log to a new data set. Then, skip to step 6.

  5. If the corrupt data set has not been off-loaded, which is the case when an active log data set is stopped, use DFDSS or Access Method Services REPRO to copy the data from the corrupt data set to a new data set.

    If further I/O errors prevent you from copying the entire data set, a gap occurs in the log.

    Note:
    Queue manager restart will not be successful if a gap in the log is detected.

  6. Use the change log inventory utility, CSQJU003, to update the information in the BSDS about the corrupt data set as follows:

    1. Use the DELETE function to remove information about the corrupt data set.

    2. Use the NEWLOG function to name the new data set as the new active log data set and to give it the RBA range that was successfully copied.

      You can run the DELETE and NEWLOG functions in the same job step. Put the DELETE statement before the NEWLOG statement in the SYSIN input data set.

  7. Restart the queue manager.
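
The search in step 3 amounts to a range comparison on hexadecimal RBAs. The following is a minimal sketch, assuming you have transcribed the archive log entries from the BSDS listing into a list of tuples; the function names and every data set name and RBA value shown are hypothetical.

```python
def rba(value):
    """Convert a hexadecimal RBA string to an integer for comparison."""
    return int(value, 16)


def find_covering_archive(archives, start_rba, end_rba):
    """Return the first archive whose RBA range includes start_rba..end_rba.

    archives: iterable of (dsname, start_rba, end_rba) with hex-string RBAs.
    Returns None when no archive covers the range, meaning the data set
    has not been off-loaded.
    """
    for dsname, a_start, a_end in archives:
        if rba(a_start) <= rba(start_rba) and rba(end_rba) <= rba(a_end):
            return dsname
    return None


# Hypothetical archive entries transcribed from a BSDS listing.
archives = [
    ("MQ.ARCHLOG1.A0000001", "000000000000", "0000000FFFFF"),
    ("MQ.ARCHLOG1.A0000002", "000000100000", "0000001FFFFF"),
]
print(find_covering_archive(archives, "000000100000", "0000001FFFFF"))
print(find_covering_archive(archives, "000000200000", "0000002FFFFF"))
```

A result of `None` corresponds to step 5, where the data has to be copied from the corrupt data set itself.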