Dealing with waits on z/OS
Waits can occur in batch or TSO applications, CICS transactions, and other components on IBM MQ for z/OS. Use this topic to determine where waits can occur.
When investigating what appears to be a problem with tasks or subsystems waiting, it is necessary to take into account the environment in which the task or subsystem is running.
It might be that your z/OS system is generally under stress. In this case, there can be many symptoms. If there is not enough real storage, jobs experience waits at paging interrupts or swap-outs. Input/output (I/O) contention or high channel usage can also cause waits.
You can use standard monitoring tools, such as Resource Measurement Facility (RMF), to diagnose such problems. Use normal z/OS tuning techniques to resolve them.
Is a batch or TSO program waiting?
Consider the following points:
- Your program might be waiting on another resource
- For example, a VSAM control interval (CI) that another program is holding for update.
- Your program might be waiting for a message that has not yet arrived
- This condition might be normal behavior if, for example, it is a server program that constantly monitors a queue.
Alternatively, your program might be waiting for a message that has arrived, but has not yet been committed.
Issue the DIS CONN(*) TYPE(HANDLE) command and examine the queues in use by your program.
If you suspect that your program has issued an MQI call that did not involve an MQGET WAIT, and control has not returned from IBM MQ, take an SVC dump of both the batch or TSO job and the IBM MQ subsystem before canceling the batch or TSO program.
Also consider that the wait state might be the result of a problem with another program, such as an abnormal termination (see Messages do not arrive when expected on z/OS), or in IBM MQ itself (see Is IBM MQ waiting for z/OS?). Refer to IBM MQ for z/OS dumps (specifically Figure 1) for information about obtaining a dump.
If the problem persists, refer to Searching the IBM database for similar problems, and solutions and Contacting IBM Software Support for information about reporting the problem to IBM.
Is a CICS transaction waiting?
Consider the following points:
- CICS might be under stress
- This might indicate that the maximum number of tasks allowed (MAXTASK) has been reached, or a short on storage (SOS) condition exists. Check the console log for messages that might explain this (for example, SOS messages), or see the CICS Problem Determination Guide.
- The transaction might be waiting for another resource
- For example, this might be file I/O. You can use CEMT INQ TASK to see what the task is waiting for. If the resource type is MQSERIES, your transaction is waiting on IBM MQ (either in an MQGET WAIT or a task switch). Otherwise, see the CICS Problem Determination Guide to determine the reason for the wait.
- The transaction might be waiting for IBM MQ for z/OS
- This might be normal, for example, if your program is a server program that waits for messages to arrive on a queue. Otherwise it might be the result of a transaction abend, for example (see Messages do not arrive when expected on z/OS). If so, the abend is reported in the CSMT log.
- The transaction might be waiting for a remote message
- If you are using distributed queuing, the program might be waiting for a message that has not yet been delivered from a remote system (for further information, refer to Problems with missing messages when using distributed queuing on z/OS).
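The CEMT INQ TASK check described above can be summarized as a small lookup. This is only a minimal sketch: the function name classify_cics_wait is hypothetical, and it assumes you have already read the resource type from the CEMT INQ TASK output yourself. It does not call CICS or IBM MQ.

```python
def classify_cics_wait(resource_type: str) -> str:
    """Map a resource type read from CEMT INQ TASK to the next step.

    Hypothetical helper: the caller supplies the resource type as text;
    nothing here queries CICS or IBM MQ.
    """
    if resource_type.strip().upper() == "MQSERIES":
        # The transaction is waiting on IBM MQ, either in an MQGET WAIT
        # or in a task switch.
        return "waiting on IBM MQ (MQGET WAIT or task switch)"
    # Any other resource type is a CICS-side wait.
    return "not an IBM MQ wait: see the CICS Problem Determination Guide"


if __name__ == "__main__":
    print(classify_cics_wait("MQSERIES"))
```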
If you suspect that your program has issued an MQI call that did not involve an MQGET WAIT (that is, it is in a task switch), and control has not returned from IBM MQ, take an SVC dump of both the CICS region and the IBM MQ subsystem before canceling the CICS transaction. Refer to Dealing with loops on z/OS for information about waits. Refer to IBM MQ for z/OS dumps (specifically Figure 1) for information about obtaining a dump.
If the problem persists, refer to Searching the IBM database for similar problems, and solutions and Contacting IBM Software Support for information about reporting the problem to IBM.
Is Db2 waiting?
If your investigations indicate that Db2 is waiting, check the following:
- Use the Db2 -DISPLAY THREAD(*) command to determine if any activity is taking place between the queue manager and the Db2 subsystem.
- Try to determine whether any waits are local to the queue manager subsystems or are across the Db2 subsystems.
Is RRS active?
- Use the D RRS command to determine if RRS is active.
Is IBM MQ waiting for z/OS?
If your investigations indicate that IBM MQ itself is waiting, check the following:
- Use the DISPLAY THREAD(*) command to check if anything is connected to IBM MQ.
- Use SDSF DA, or the z/OS command DISPLAY A,xxxxMSTR, to determine whether there is any processor usage (as shown in Has the application or IBM MQ for z/OS stopped processing work?).
- If IBM MQ is using some processor time, reconsider other reasons why IBM MQ might be waiting, or consider whether this is actually a performance problem.
- If there is no processor activity, check whether IBM MQ responds to commands. If you can get a response, reconsider other reasons why IBM MQ might be waiting.
- If you cannot get a response, check the console log for messages that might explain the wait (for example, IBM MQ might have run out of active log data sets, and be waiting for offload processing).
If you are satisfied that IBM MQ has stalled, use the STOP QMGR command in both QUIESCE and FORCE mode to terminate any programs currently being executed.
If the STOP QMGR command fails to respond, cancel the queue manager with a dump, and restart. If the problem recurs, refer to Contacting IBM Software Support for further guidance.
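The processor-usage and command-response checks in this section form a short decision sequence, which can be sketched as follows. This is an illustrative sketch only: the names NextStep and triage_queue_manager are hypothetical, and the two inputs are observations you make yourself (for example, from SDSF DA and from issuing a command to the queue manager). The code does not talk to IBM MQ or z/OS.

```python
from enum import Enum


class NextStep(Enum):
    """Hypothetical labels for the follow-up actions named in the text."""
    RECONSIDER_OR_PERFORMANCE = (
        "Reconsider other reasons for the wait, or treat it as a performance problem"
    )
    RECONSIDER_OTHER_CAUSES = "Reconsider other reasons why IBM MQ might be waiting"
    CHECK_CONSOLE_LOG = "Check the console log for messages that explain the wait"


def triage_queue_manager(using_cpu: bool, responds_to_commands: bool) -> NextStep:
    """Return the next diagnostic step for a queue manager that seems to be waiting.

    using_cpu: whether the xxxxMSTR address space shows processor usage.
    responds_to_commands: whether the queue manager answers a command.
    """
    if using_cpu:
        # Some processor time is being used, so the queue manager is not idle.
        return NextStep.RECONSIDER_OR_PERFORMANCE
    if responds_to_commands:
        # No processor activity, but commands are still answered.
        return NextStep.RECONSIDER_OTHER_CAUSES
    # No processor activity and no response to commands.
    return NextStep.CHECK_CONSOLE_LOG


if __name__ == "__main__":
    print(triage_queue_manager(using_cpu=False, responds_to_commands=False).value)
```

Only the last outcome, combined with a console log that confirms a stall, leads on to the STOP QMGR step described above.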