Configure the hang detection policy (WAS v8.5)
The hang detection option for WebSphere Application Server (WAS) is turned on by default. We can configure a hang detection policy to suit our applications and environment so that potential hangs are reported, providing earlier detection of failing servers. When WAS detects a hung thread, it issues a notification so that we can troubleshoot the problem.
A hung thread is a common error in Java EE applications. It can result from a simple software defect (such as an infinite loop) or a more complex cause (for example, a resource deadlock). A hung thread that runs an unbounded code path, such as an infinite loop, can consume system resources such as CPU time. Alternatively, a system can become unresponsive even though all resources are idle, as in a deadlock scenario. Unless an end user or a monitoring tool reports the problem, the system might remain in this degraded state indefinitely.
Using the hang detection policy, we can specify how long is too long for a unit of work to complete. The thread monitor checks all managed threads in the system (for example, web container threads and object request broker (ORB) threads). Unmanaged threads, which are threads created by applications, are not monitored. For more information, read Hung threads in Java Platform, Enterprise Edition applications.
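The rule the thread monitor applies can be sketched in a few lines of Python. This is an illustration of the idea only, not WAS code: the `ManagedThread` record, the `find_hung_threads` function, and the sample thread names and times are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ManagedThread:
    """Hypothetical record of a managed thread's current unit of work."""
    name: str
    started_at: float  # seconds; when the thread began its current work


def find_hung_threads(threads, now, threshold_seconds):
    """Return the names of threads that have been active longer than the threshold."""
    return [t.name for t in threads if now - t.started_at > threshold_seconds]


# Illustrative data only: two web container threads and one ORB thread.
threads = [
    ManagedThread("WebContainer : 0", started_at=0.0),
    ManagedThread("WebContainer : 1", started_at=500.0),
    ManagedThread("ORB.thread.pool : 2", started_at=650.0),
]

# One monitor pass at t=700 seconds with a 600-second threshold.
hung = find_hung_threads(threads, now=700.0, threshold_seconds=600.0)
print(hung)
```

Only the first thread is reported, because it is the only one whose current work has run for more than 600 seconds at the time of the check.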
The thread hang detection option is enabled by default. To adjust the hang detection policy values, or to disable hang detection completely:
- From the dmgr console, click Servers > appservers > server_name
- Under Server Infrastructure, click Administration > Custom Properties
- Click New.
- Add the following properties:
Name: com.ibm.websphere.threadmonitor.interval
Value: The frequency, in seconds, at which managed threads in the selected application server are interrogated.
Default: 180 seconds (three minutes)

Name: com.ibm.websphere.threadmonitor.threshold
Value: The length of time, in seconds, for which a thread can be active before it is considered hung. Any thread that is detected as active for longer than this length of time is reported as hung.
Default: 600 seconds (ten minutes)

Name: com.ibm.websphere.threadmonitor.false.alarm.threshold
Value: The number of times (T) that false alarms can occur before the threshold is automatically increased. A thread that is reported as hung might eventually complete its work, resulting in a false alarm. A large number of these events indicates that the threshold value is too small. The hang detection facility can respond to this situation automatically: for every T false alarms, the hang threshold is increased by a factor of 1.5. Set the value to zero (or less) to disable the automatic adjustment.
Default: 100

Name: com.ibm.websphere.threadmonitor.dump.java
Value: Set to true to execute the dumpThreads function when a hung thread is detected and a WSVR0605W message is printed. The threads section of the javacore dump can be analyzed to determine what the reported thread and other related threads are doing. Alternatively, set to an integer value in the range 1 through Integer.MAX_VALUE; the integer is the maximum number of times that dumpThreads will be executed. By default, the dumpThreads function creates a javacore dump. See Java Diagnostics. Beware: "JAVACORES CONTINUOUSLY CREATED ON A HUNG THREAD".
Default: false (0)

Name: com.ibm.websphere.threadmonitor.dump.stack
Value: Set to true to cause a stack trace to be printed when a hung thread is detected and a WSVR0605W message is printed.
Default: true

To disable the hang detection option, set the com.ibm.websphere.threadmonitor.interval property to less than or equal to zero.
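The automatic adjustment described for com.ibm.websphere.threadmonitor.false.alarm.threshold can be worked through numerically. The sketch below assumes the rule is exactly "multiply the hang threshold by 1.5 once per T false alarms"; the function name is hypothetical, T = 3 is chosen only to keep the example short, and any rounding WAS might apply is not modeled.

```python
def adjusted_threshold(initial_threshold, false_alarms, alarms_per_adjustment):
    """Hang threshold in seconds after a given number of false alarms.

    alarms_per_adjustment plays the role of T; zero or less disables adjustment.
    """
    if alarms_per_adjustment <= 0:
        return float(initial_threshold)
    adjustments = false_alarms // alarms_per_adjustment
    return initial_threshold * 1.5 ** adjustments


# Start from the documented default threshold of 600 seconds, with T = 3.
for alarms in (0, 2, 3, 6):
    print(alarms, adjusted_threshold(600, alarms, 3))
```

Under these assumptions the threshold stays at 600 seconds until the third false alarm, then becomes 900 seconds, and reaches 1350 seconds after the sixth.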
- Optional: To monitor the activity of threads on which system alarms execute, add the following JVM generic arguments to the server settings.
Name: -Dcom.ibm.websphere.alarmthreadmonitor.generate.javacore
Value: Set to any value to cause a javacore dump to be created when a hung system alarm thread is detected. The threads section of the javacore dump can be analyzed to determine what the reported thread and other related threads are doing.
Default: Unset

Name: -Dcom.ibm.websphere.alarmthreadmonitor.checkinterval.millis
Value: The frequency, in milliseconds, at which system alarm threads are interrogated. Set the value to zero to disable system alarm hung thread detection. The maximum interval is 600000 (10 minutes).
Default: 10000 (10 seconds)

Name: -Dcom.ibm.websphere.alarmthreadmonitor.threshold.millis
Value: Set to any integer value between 10000 (10 seconds) and 600000 (10 minutes). This argument specifies the length of time, in milliseconds, for which a system alarm thread can be active before it is considered non-responsive. Any system alarm thread that is detected as active for longer than this length of time is reported as hung.
Default: 10000 (10 seconds)

To add these arguments to the server settings:
- Under Server Infrastructure on the server settings page in the dmgr console, click Java and process management > Process definition.
- Select Java virtual machine.
- Add the arguments to the JVM generic arguments section.
- Click Apply.
- Click OK.
- Save the changes.
- Restart the Application Server for the changes to take effect.
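As a rough aid for the steps above, the following Python sketch assembles a string that could be pasted into the JVM generic arguments field, checking the documented ranges first. The helper name and the sample values are hypothetical, and the sketch assumes each setting is passed as a `-D` system property; it does not talk to WAS or validate anything beyond the ranges stated in this topic.

```python
def alarm_monitor_jvm_args(check_interval_millis=None, threshold_millis=None,
                           generate_javacore=False):
    """Build a JVM generic arguments string for system alarm thread monitoring."""
    prefix = "-Dcom.ibm.websphere.alarmthreadmonitor."
    args = []
    if generate_javacore:
        # Documented as "set to any value"; "true" is an arbitrary choice here.
        args.append(prefix + "generate.javacore=true")
    if check_interval_millis is not None:
        # Zero disables detection; 600000 ms is the documented maximum.
        if not 0 <= check_interval_millis <= 600000:
            raise ValueError("check interval must be 0 (disabled) to 600000 ms")
        args.append(prefix + "checkinterval.millis=%d" % check_interval_millis)
    if threshold_millis is not None:
        if not 10000 <= threshold_millis <= 600000:
            raise ValueError("threshold must be 10000 to 600000 ms")
        args.append(prefix + "threshold.millis=%d" % threshold_millis)
    return " ".join(args)


# Example: check every 30 seconds, report after 60 seconds of activity.
jvm_args = alarm_monitor_jvm_args(check_interval_millis=30000, threshold_millis=60000)
print(jvm_args)
```

Rejecting out-of-range values before they reach the console avoids a restart with settings that the server might ignore or interpret differently.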
Subtopics
- Hung threads in Java Platform, Enterprise Edition applications
- Example: Adjusting the thread monitor to affect server hang detection
Related concepts:
- Hung threads in Java Platform, Enterprise Edition applications
- Dumping threads in server processes using scripting
- Example: Adjusting the thread monitor to affect server hang detection