Timeout conditions: analyzing diagnostic data

The following guidelines provide instructions for finding diagnostic data in an SVC dump that can help you determine what timeout condition occurred.

We should start by finding the task with the EC3 abend:
Format the TCB summary for the servant that was timed out by entering the following command:
ip summ format asid(x'address')
where address is the address space ID of the servant.

Find the TCB that has the EC3 completion code. Ignore any EC3 completion code on the "main" thread, which is the 4th TCB listed in the summary format (the 1st one after the 3 MVS™ TCBs). The WebSphere main thread is the one that is waiting in BBO_BOA::impl_is_ready. No application requests are ever dispatched on this thread, so there is nothing to time out. During timeout processing, the main thread for the server region is also abended with EC3 as a mechanism for bringing the address space down, which is why the EC3 completion code may appear on the main thread. This is never the cause of a timeout, only a result of timeout processing.
If there is no EC3 completion code in the TCB summary, look in systrace. Format the systrace in GMT, because the other timestamps that you will compare it to are in GMT. To format in GMT, enter:
ip systrace all time(gmt)
You may not see the EC3 abend in systrace either, because systrace can cover only a small amount of time.
We can also try looking in ip verbx mtrace or in syslog to see when the EC3 abend occurred. You'll need this time to determine the 'end' time of the request, which is the GMT time at which the timeout value was reached.
We can determine what timeout values are in effect by checking the reason code associated with the EC3 abend.

Reason code	Explanation
04130002	The controller issued an ABTERM for this servant region because a transaction timeout occurred. Code under dispatch could have been in a tight loop.
04130003	The controller issued an ABTERM for this servant region because it was hung trying to move a controller request into the servant region. The target request was timed out, but the servant was currently copying the request. The controller checked the servant for progress at regular intervals, before taking action by issuing an ABTERM.
04130004	The controller issued an ABTERM for this servant region because the WLM queue timeout occurred. Code under dispatch could have been in a tight loop.
04130005	The controller issued an ABTERM for this servant region because a transaction timeout occurred. The transaction has timed out, but no current request associated with the transaction was found. The servant associated with the transaction will be terminated.
04130006	A controller thread encountered a problem while processing a request. The request has been queued to WLM and associated with a servant region. The termination of the associated servant region is needed to complete cleanup for the request.
04130007	The controller issued an ABTERM for this servant region because the HTTP OUTPUT timeout occurred. Code under dispatch could have been in a tight loop.
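
When triaging several dumps, the reason code table can be kept as a small lookup. The following Python sketch is illustrative only: the names EC3_REASON_CODES and describe_ec3_reason, and the short labels, are assumptions introduced here as paraphrases of the explanations in the table. They are not part of IPCS or of any product tooling.

```python
# Short labels paraphrased from the reason code table; the wording is an
# assumption made for this sketch, not product output.
EC3_REASON_CODES = {
    "04130002": "transaction timeout",
    "04130003": "servant hung while a controller request was being moved into it",
    "04130004": "WLM queue timeout",
    "04130005": "transaction timeout, no current request found for the transaction",
    "04130006": "controller thread problem, servant terminated to complete cleanup",
    "04130007": "HTTP output timeout",
}


def describe_ec3_reason(code: str) -> str:
    """Return a short label for an EC3 reason code taken from a dump."""
    # Tolerate surrounding blanks and a dropped leading zero.
    key = code.strip().upper().zfill(8)
    return EC3_REASON_CODES.get(key, "not listed in the table above")


if __name__ == "__main__":
    for reason in ("04130002", "04130004", "04130099"):
        print(reason, "->", describe_ec3_reason(reason))
```

A code that is not in the table returns a fallback string rather than raising an error, so the function can be run over every reason code found in syslog without special handling.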

We can try to find the method name to determine if it was httpRequest, httpsRequest, or DispatchbyURI or some other method. If the request is not specifically a request that came through the HTTP or HTTPS transport handlers, the protocol_http_output_timeout (HTTP) and protocol_https_timeout_output (HTTPS) timeout values will not be a factor. In other words, when the request is a DispatchbyURI method, the request is received through the RMI/IIOP protocol, so the protocol_http variables have no effect.
We can then use the IPCS verbexit LEDATA with the CEEDUMP or NTHREADS option to obtain the stack trace for that request.
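
The rule about method names can be written down as a simple check. This is a minimal Python sketch under stated assumptions: the function name http_output_timeout_applies and the set HTTP_TRANSPORT_METHODS are hypothetical, and the method name is assumed to have been read by hand from the stack trace. It encodes only what is stated above, namely that httpRequest and httpsRequest arrive through the HTTP or HTTPS transport handlers, while other methods such as DispatchbyURI do not.

```python
# Method names that the text above associates with the HTTP and HTTPS
# transport handlers. The set name is hypothetical.
HTTP_TRANSPORT_METHODS = {"httpRequest", "httpsRequest"}


def http_output_timeout_applies(method_name: str) -> bool:
    """Return True if the HTTP/HTTPS output timeout values can be a factor.

    Requests dispatched through other methods (for example DispatchbyURI,
    which arrives over RMI/IIOP) are not affected by those values.
    """
    return method_name.strip() in HTTP_TRANSPORT_METHODS


if __name__ == "__main__":
    for name in ("httpRequest", "httpsRequest", "DispatchbyURI"):
        print(name, http_output_timeout_applies(name))
```

The comparison is deliberately case-sensitive, so a name has to match exactly as it appears in the trace before the HTTP output timeout is considered.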
