
Hung threads in Java Platform, Enterprise Edition applications

WebSphere Application Server monitors thread activity and performs diagnostic actions when a thread appears to be hung.

When WebSphere detects that a thread has been active longer than the time defined by the thread monitor threshold, the application server reports the possible hang: it writes a warning message to the log, issues a notification, and generates PMI events.
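The threshold comparison described above can be illustrated with a minimal sketch. This is not the application server's implementation. The `ThreadActivity` type, the `find_possibly_hung` function, and the sample thread names and times are all hypothetical and only show the idea of comparing active time against a threshold.

```python
from dataclasses import dataclass


@dataclass
class ThreadActivity:
    """Hypothetical record of when a thread began its current unit of work."""
    name: str
    started_at: float  # seconds on an arbitrary clock


def find_possibly_hung(threads, now, threshold_seconds):
    """Return (name, active_seconds) for threads active longer than the threshold."""
    hung = []
    for thread in threads:
        active = now - thread.started_at
        if active > threshold_seconds:
            hung.append((thread.name, active))
    return hung


# Illustrative values only: one thread exceeds a 600-second threshold.
threads = [
    ThreadActivity("WebContainer : 0", started_at=0.0),
    ThreadActivity("WebContainer : 1", started_at=500.0),
]
reported = find_possibly_hung(threads, now=700.0, threshold_seconds=600.0)
```

A thread that appears in `reported` is only a candidate: as the next section explains, the work might still complete.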


False Alarms

If the work actually completes, a second set of messages, notifications and PMI events is produced to identify the false alarm. The following message is written to the log:

WSVR0606W: Thread threadname was previously reported to be 
hung but has completed. It was active for approximately hangtime. 
There are totalthreads threads in total in the server that still 
may be hung.

In this message, threadname is the name that appears in a JVM thread dump, hangtime is approximately how long the thread was active, and totalthreads is the total number of threads in the server that might still be hung.
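When scanning logs for false alarms, it can help to pull the three fields out of the WSVR0606W text. The following is a minimal sketch that assumes the log text follows the template shown above word for word. Real log entries may include extra decoration such as quotes, thread IDs, or timestamps, so the pattern would need adjusting. The sample message, its thread name, and its values are made up for illustration.

```python
import re

# Pattern built from the message template in this topic (an assumption about
# the exact wording in a real log).
FALSE_ALARM = re.compile(
    r"WSVR0606W: Thread (?P<threadname>.+?) was previously reported to be "
    r"hung but has completed\. It was active for approximately "
    r"(?P<hangtime>.+?)\. There are (?P<totalthreads>\d+) threads in total "
    r"in the server that still may be hung\."
)


def parse_false_alarm(text):
    """Return the message fields as a dict, or None if the text does not match."""
    flat = " ".join(text.split())  # collapse line wraps and repeated spaces
    match = FALSE_ALARM.search(flat)
    if match is None:
        return None
    return {
        "threadname": match.group("threadname"),
        "hangtime": match.group("hangtime"),
        "totalthreads": int(match.group("totalthreads")),
    }


# Hypothetical sample that follows the template, wrapped as in this topic.
sample = """WSVR0606W: Thread WebContainer : 3 was previously reported to be
hung but has completed. It was active for approximately 720 seconds.
There are 0 threads in total in the server that still
may be hung."""
fields = parse_false_alarm(sample)
```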


Automatic adjustment of the hang time threshold

If the thread monitor determines that too many false alarms are issued (determined by the number of pairs of hang and clear messages), it can automatically adjust the threshold. When this adjustment occurs, the following message is written to the log:

WSVR0607W: Too many thread hangs have been falsely reported.  The hang 
threshold is now being set to thresholdtime.

In this message, thresholdtime is the time, in seconds, that a thread can be active before it is considered hung.

We can prevent WAS from automatically adjusting the hang time threshold. See Configure the hang detection policy.
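The adjustment behavior can be pictured with a small simulation. This topic does not state how many false alarms count as too many or how the new threshold is computed, so the `false_alarm_limit` and `growth_factor` parameters, the choice to raise the threshold, and the rule that a non-positive limit disables adjustment are all assumptions made for illustration. The class name is hypothetical as well.

```python
class HangThresholdAdjuster:
    """Hypothetical model of automatic hang-threshold adjustment."""

    def __init__(self, threshold_seconds, false_alarm_limit, growth_factor=1.5):
        self.threshold_seconds = threshold_seconds
        self.false_alarm_limit = false_alarm_limit  # assumed: <= 0 disables adjustment
        self.growth_factor = growth_factor          # assumed multiplier
        self.false_alarms = 0

    def record_false_alarm(self):
        """Count one hang/clear message pair.

        Returns the new threshold if an adjustment happened, otherwise None.
        """
        if self.false_alarm_limit <= 0:
            return None
        self.false_alarms += 1
        if self.false_alarms < self.false_alarm_limit:
            return None
        # Limit reached: raise the threshold and start counting again.
        self.false_alarms = 0
        self.threshold_seconds = int(self.threshold_seconds * self.growth_factor)
        return self.threshold_seconds


adjuster = HangThresholdAdjuster(threshold_seconds=600, false_alarm_limit=3)
results = [adjuster.record_false_alarm() for _ in range(3)]

disabled = HangThresholdAdjuster(threshold_seconds=600, false_alarm_limit=0)
disabled_result = disabled.record_false_alarm()
```

In this sketch the third false alarm triggers the adjustment, which corresponds to the point at which a WSVR0607W message would be logged.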


System Alarms

An application server monitors the activity of threads on which system alarms execute. When a system alarm thread has been active longer than the time defined by the alarm thread monitor threshold, the application server logs the following warning in the system log. This message indicates the name of the thread that is not responding, how long the thread has already been active, and the exception stack of the thread, which identifies the system component.

UTLS0008W: The alarm thread threadname has been active for n 
   milliseconds and may be hung. totalthreads threadstack

In this message, threadname is the name that appears in a JVM thread dump, n is approximately how long the thread was active, totalthreads is an overall assessment of the system threads, and threadstack is the exception stack of the thread.

If the alarm work eventually completes, the following message is written to the system log. This message identifies the thread that produced the false alarm.

UTLS0009W: Alarm Thread threadname was previously reported to be hung but has 
   completed.  It was active for approximately n milliseconds.

In this message, threadname is the name that appears in a JVM thread dump, and n is approximately how long the thread was active.
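To see which alarm threads were reported as possibly hung and never cleared, the two messages can be paired by thread name. The sketch below assumes each message occupies a single log line and follows the templates above exactly. Both are assumptions, since real log entries may wrap or carry additional fields. The function name and the sample lines are hypothetical.

```python
import re

# Patterns derived from the message templates in this topic.
HUNG = re.compile(
    r"UTLS0008W: The alarm thread (?P<name>.+?) has been active for "
    r"(?P<ms>\d+) milliseconds"
)
CLEARED = re.compile(
    r"UTLS0009W: Alarm Thread (?P<name>.+?) was previously reported to be hung"
)


def still_reported_hung(lines):
    """Return {threadname: milliseconds} for alarm threads with no later UTLS0009W."""
    open_reports = {}
    for line in lines:
        hung = HUNG.search(line)
        if hung:
            open_reports[hung.group("name")] = int(hung.group("ms"))
            continue
        cleared = CLEARED.search(line)
        if cleared:
            # The earlier report for this thread was a false alarm.
            open_reports.pop(cleared.group("name"), None)
    return open_reports


# Made-up log lines for illustration.
log_lines = [
    "UTLS0008W: The alarm thread Alarm : 0 has been active for 12000 milliseconds and may be hung.",
    "UTLS0008W: The alarm thread Alarm : 1 has been active for 15000 milliseconds and may be hung.",
    "UTLS0009W: Alarm Thread Alarm : 0 was previously reported to be hung but has completed.  It was active for approximately 12500 milliseconds.",
]
unresolved = still_reported_hung(log_lines)
```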

Typically, system alarms do not process heavy loads because such activity might slow the processing of later system alarms, which in turn might impact server behavior.

The UTLS0008W message is intended to help IBM Support personnel investigate problems potentially caused by system alarm behavior.

All system alarms share a common alarm thread pool. The properties that govern the monitoring of this thread pool can be tuned using the dmgr console. We can reduce how often WebSphere generates alarm hung thread messages by adjusting the alarm thread monitor check interval or threshold. See Configure the hang detection policy for a description of how to change these settings.


Related


Configure the hang detection policy
Monitoring performance with Tivoli Performance Viewer


Reference:

Example: Adjusting the thread monitor to affect server hang detection

