
Recovering CICS units of recovery manually

This topic explains what happens when the CICS adapter restarts, and how to deal with any unresolved units of recovery that arise.


What happens when the CICS adapter restarts

Whenever a connection is broken, the adapter has to go through a restart phase during the reconnect process. The restart phase resynchronizes resources. Resynchronization between CICS and IBM MQ enables in-doubt units of work to be identified and resolved.

Resynchronization can be caused by:

  • An explicit request from the distributed queuing component
  • An implicit request when a connection is made to IBM MQ

If the resynchronization is caused by connecting to IBM MQ, the sequence of events is:

  1. The connection process retrieves a list of in-doubt unit of work (UOW) IDs from IBM MQ.
  2. The UOW IDs are displayed on the console in CSQC313I messages.
  3. The UOW IDs are passed to CICS.
  4. CICS initiates a resynchronization task (CRSY) for each in-doubt UOW ID.
  5. The result of the task for each in-doubt UOW is displayed on the console.

Check the messages that are displayed during the connect process:

    CSQC313I
    Shows that a UOW is in doubt.

    CSQC400I
    Identifies the UOW and is followed by one of these messages:

    • CSQC402I or CSQC403I shows that the UOW was resolved successfully (committed or backed out).
    • CSQC404E, CSQC405E, CSQC406E, or CSQC407E shows that the UOW was not resolved.

    CSQC409I
    Shows that all UOWs were resolved successfully.

    CSQC408I
    Shows that not all UOWs were resolved successfully.

    CSQC314I
    Warns that UOW IDs highlighted with a * are not resolved automatically. These UOWs must be resolved explicitly by the distributed queuing component when it is restarted.

Figure 1 shows an example set of restart messages displayed on the z/OS console.

Figure 1. Example restart messages
CSQ9022I +CSQ1 CSQYASCP ' START QMGR' NORMAL COMPLETION
+CSQC323I VICIC1 CSQCQCON CONNECT received from TERMID=PB62 TRANID=CKCN
+CSQC303I VICIC1 CSQCCON CSQCSERV loaded. Entry point is 850E8918
+CSQC313I VICIC1 CSQCCON UOWID=VICIC1.A6E5A6F0E2178D25 is in doubt
+CSQC313I VICIC1 CSQCCON UOWID=VICIC1.A6E5A6F055B2AC25 is in doubt
+CSQC313I VICIC1 CSQCCON UOWID=VICIC1.A6E5A6EFFD60D425 is in doubt
+CSQC313I VICIC1 CSQCCON UOWID=VICIC1.A6E5A6F07AB56D22 is in doubt
+CSQC307I VICIC1 CSQCCON Successful connection to subsystem VC2
+CSQC472I VICIC1 CSQCSERV Server subtask (TCB address=008BAD18) connect
successful
+CSQC472I VICIC1 CSQCSERV Server subtask (TCB address=008BAA10) connect
successful
+CSQC472I VICIC1 CSQCSERV Server subtask (TCB address=008BA708) connect
successful
+CSQC472I VICIC1 CSQCSERV Server subtask (TCB address=008CAE88) connect
successful
+CSQC472I VICIC1 CSQCSERV Server subtask (TCB address=008CAB80) connect
successful
+CSQC472I VICIC1 CSQCSERV Server subtask (TCB address=008CA878) connect
successful
+CSQC472I VICIC1 CSQCSERV Server subtask (TCB address=008CA570) connect
successful
+CSQC472I VICIC1 CSQCSERV Server subtask (TCB address=008CA268) connect
successful
+CSQC403I VICIC1 CSQCTRUE Resolved BACKOUT for
+CSQC400I VICIC1 CSQCTRUE UOWID=VICIC1.A6E5A6F0E2178D25
+CSQC403I VICIC1 CSQCTRUE Resolved BACKOUT for
+CSQC400I VICIC1 CSQCTRUE UOWID=VICIC1.A6E5A6F055B2AC25
+CSQC403I VICIC1 CSQCTRUE Resolved BACKOUT for
+CSQC400I VICIC1 CSQCTRUE UOWID=VICIC1.A6E5A6F07AB56D22
+CSQC403I VICIC1 CSQCTRUE Resolved BACKOUT for
+CSQC400I VICIC1 CSQCTRUE UOWID=VICIC1.A6E5A6EFFD60D425
+CSQC409I VICIC1 CSQCTRUE Resynchronization completed successfully

The total number of CSQC313I messages should equal the total number of CSQC402I plus CSQC403I messages. If the totals are not equal, there are UOWs that the connection process cannot resolve. UOWs that cannot be resolved are caused by problems with CICS (for example, a cold start), with IBM MQ, or with distributed queuing. When these problems have been fixed, you can initiate another resynchronization by disconnecting and then reconnecting.

Alternatively, you can resolve each outstanding UOW yourself using the RESOLVE INDOUBT command and the UOW ID shown in message CSQC400I. You must know the correct outcome of a UOW before you resolve it manually. Afterward, initiate a disconnect and a connect to clean up the unit of recovery descriptors in CICS.

All messages that are associated with unresolved UOWs are locked by IBM MQ and no Batch, TSO, or CICS task can access them.
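The message-count check described above (CSQC313I messages against CSQC402I plus CSQC403I messages) can be automated against a captured console log. The following Python sketch is illustrative only: the function name summarize_restart and the SAMPLE lines (copied from Figure 1) are assumptions made for this example, not part of IBM MQ or CICS.

```python
import re
from collections import Counter

# Matches a CICS adapter message ID such as CSQC313I or CSQC404E.
# Module names like CSQCCON do not match because three digits are required.
MSG_ID = re.compile(r"\bCSQC\d{3}[IE]\b")

RESOLVED = ("CSQC402I", "CSQC403I")                          # committed or backed out
NOT_RESOLVED = ("CSQC404E", "CSQC405E", "CSQC406E", "CSQC407E")


def summarize_restart(lines):
    """Count restart messages and report whether every in-doubt UOW was resolved."""
    counts = Counter()
    for line in lines:
        match = MSG_ID.search(line)
        if match:
            counts[match.group(0)] += 1
    in_doubt = counts["CSQC313I"]
    resolved = sum(counts[m] for m in RESOLVED)
    failed = sum(counts[m] for m in NOT_RESOLVED)
    return {
        "in_doubt": in_doubt,
        "resolved": resolved,
        "failed": failed,
        # The totals must match; otherwise some UOWs are still unresolved.
        "balanced": in_doubt == resolved,
    }


# Hypothetical input: a few console lines taken from Figure 1.
SAMPLE = [
    "+CSQC313I VICIC1 CSQCCON UOWID=VICIC1.A6E5A6F0E2178D25 is in doubt",
    "+CSQC313I VICIC1 CSQCCON UOWID=VICIC1.A6E5A6F055B2AC25 is in doubt",
    "+CSQC403I VICIC1 CSQCTRUE Resolved BACKOUT for",
    "+CSQC400I VICIC1 CSQCTRUE UOWID=VICIC1.A6E5A6F0E2178D25",
    "+CSQC403I VICIC1 CSQCTRUE Resolved BACKOUT for",
    "+CSQC400I VICIC1 CSQCTRUE UOWID=VICIC1.A6E5A6F055B2AC25",
    "+CSQC409I VICIC1 CSQCTRUE Resynchronization completed successfully",
]

if __name__ == "__main__":
    print(summarize_restart(SAMPLE))
```

If the "balanced" value is false, the difference between the two totals is the number of UOWs that still need attention, either by another resynchronization or by manual resolution.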

If CICS fails and an emergency restart is necessary, do not vary the GENERIC APPLID of the CICS system. If you do and then reconnect to IBM MQ, data integrity with IBM MQ cannot be guaranteed. This is because IBM MQ treats the new instance of CICS as a different CICS (because the APPLID is different). In-doubt resolution is then based on the wrong CICS log.


How to resolve CICS units of recovery manually

If the adapter ends abnormally, CICS and IBM MQ build in-doubt lists either dynamically or during restart, depending on which subsystem caused the abend.

Note: If you use the DFH$INDB sample program to show units of work, you might find that it does not always show IBM MQ UOWs correctly.

When CICS connects to IBM MQ, there might be one or more units of recovery that have not been resolved.

One of the following messages is sent to the console:

  • CSQC404E
  • CSQC405E
  • CSQC406E
  • CSQC407E
  • CSQC408I

For details of what these messages mean, see the CICS adapter and Bridge messages.

CICS retains details of units of recovery that were not resolved during connection startup. An entry is purged when it no longer appears on the list presented by IBM MQ.

Any units of recovery that CICS cannot resolve must be resolved manually using IBM MQ commands. This manual procedure is rarely used within an installation, because it is required only where operational errors or software problems have prevented automatic resolution. Any inconsistencies found during in-doubt resolution must be investigated.

To resolve the units of recovery:
  1. Obtain a list of the units of recovery from IBM MQ using the following command:
    +CSQ1  DISPLAY CONN( * ) WHERE(UOWSTATE EQ UNRESOLVED)
    
    You receive the following message:
    CSQM201I +CSQ1 CSQMDRTC DISPLAY CONN DETAILS
    CONN(BC85772CBE3E0001)
    EXTCONN(C3E2D8C3C7D9F0F94040404040404040)
    TYPE(CONN)
    CONNOPTS(
    MQCNO_STANDARD_BINDING
    )
    UOWLOGDA(2005-02-04)
    UOWLOGTI(10.17.44)
    UOWSTDA(2005-02-04)
    UOWSTTI(10.17.44)
    UOWSTATE(UNRESOLVED)
    NID(IYRCSQ1 .BC8571519B60222D)
    EXTURID(BC8571519B60222D)
    QMURID(0000002BDA50)
    URTYPE(CICS)
    USERID(MQTEST)
    APPLTAG(IYRCSQ1)
    ASID(0000)
    APPLTYPE(CICS)
    TRANSID(GP02)
    TASKNO(0000096)
    END CONN DETAILS
    

    For CICS connections, the NID consists of the CICS applid and a unique number provided by CICS at the time the syncpoint log entries are written. This unique number is stored in records written to both the CICS system log and the IBM MQ log at syncpoint processing time. This value is referred to in CICS as the recovery token.

  2. Scan the CICS log for entries related to a particular unit of recovery.

    Look for a PREPARE record for the task-related installation where the recovery token field (JCSRMTKN) equals the value obtained from the network ID. The network ID is supplied by IBM MQ in the DISPLAY CONN command output.

    The PREPARE record in the CICS log for the units of recovery provides the CICS task number. All other entries on the log for this CICS task can be located using this number.

    You can use the CICS journal print utility DFHJUP when scanning the log. For details of using this program, see the CICS Operations and Utilities Guide.

  3. Scan the IBM MQ log for records with the NID related to a particular unit of recovery. Then use the URID from this record to obtain the rest of the log records for this unit of recovery.

    When scanning the IBM MQ log, note that the IBM MQ startup message CSQJ001I provides the start RBA for this session.

    The print log records program (CSQ1LOGP) can be used for that purpose.

  4. If you need to, do in-doubt resolution in IBM MQ.

    IBM MQ can be directed to take the recovery action for a unit of recovery using an IBM MQ RESOLVE INDOUBT command.

    To recover all threads associated with a specific connection-name, use the NID(*) option.

    The command produces one of the following messages showing whether the thread is committed or backed out:
    CSQV414I +CSQ1 THREAD network-id COMMIT SCHEDULED
    CSQV415I +CSQ1 THREAD network-id ABORT SCHEDULED
    

When performing in-doubt resolution, CICS and the adapter are not aware of the commands to IBM MQ to commit or back out units of recovery, because only IBM MQ resources are affected. However, CICS keeps details about the in-doubt threads that could not be resolved by IBM MQ. This information is purged either when the list presented is empty, or when the list does not include a unit of recovery of which CICS has details.
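The procedure above relies on reading the NID from the DISPLAY CONN output and splitting it into the CICS applid and the recovery token. The following Python sketch shows one way to do this from captured output. It is a minimal illustration under stated assumptions: the function names are hypothetical, the SAMPLE text is a shortened copy of the output shown in step 1, and the command form RESOLVE INDOUBT(connection-name) ACTION(COMMIT or BACKOUT) NID(network-id), with the CICS applid as the connection name, should be verified against the MQSC command reference for your release before use.

```python
import re

# KEYWORD(value) pairs; the value may span lines, as CONNOPTS does.
ATTR = re.compile(r"([A-Z]+)\(([^()]*)\)")


def parse_conn_details(text):
    """Collect KEYWORD(value) pairs from DISPLAY CONN output into a dict."""
    return {key: value.strip() for key, value in ATTR.findall(text)}


def split_nid(nid):
    """Split a CICS NID into (applid, recovery token) at the last period."""
    applid, _, token = nid.rpartition(".")
    return applid.strip(), token


def build_resolve_command(prefix, connection_name, nid, action):
    """Format a RESOLVE INDOUBT command string (assumed syntax, see lead-in).

    The action must come from your own investigation of the CICS and
    IBM MQ logs; this helper does not decide the outcome.
    """
    if action not in ("COMMIT", "BACKOUT"):
        raise ValueError("action must be COMMIT or BACKOUT")
    return f"{prefix} RESOLVE INDOUBT({connection_name}) ACTION({action}) NID({nid})"


# Hypothetical input: a shortened copy of the DISPLAY CONN output in step 1.
SAMPLE = """CSQM201I +CSQ1 CSQMDRTC DISPLAY CONN DETAILS
CONN(BC85772CBE3E0001)
CONNOPTS(
MQCNO_STANDARD_BINDING
)
UOWSTATE(UNRESOLVED)
NID(IYRCSQ1 .BC8571519B60222D)
QMURID(0000002BDA50)
URTYPE(CICS)
END CONN DETAILS
"""

if __name__ == "__main__":
    attrs = parse_conn_details(SAMPLE)
    applid, token = split_nid(attrs["NID"])
    print("CICS applid:", applid)
    print("Recovery token (compare with JCSRMTKN):", token)
    print(build_resolve_command("+CSQ1", applid, attrs["NID"], "BACKOUT"))
```

The recovery token extracted here is the value to look for in the CICS log in step 2, and the QMURID value identifies the unit of recovery in the IBM MQ log in step 3. The BACKOUT action in the usage example is only a placeholder.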


Last updated: 2020-10-04