Health management

With the health management feature in Liberty, we can take a policy-driven approach to monitoring the application server environment and take action when unhealthy conditions are detected.

We define health policies, each of which specifies the health condition to monitor in the environment and the health actions to take when that condition is met.


Health conditions

Health conditions define the variables we want to monitor in the environment. The condition element defines what behavior can trigger this health policy. Only one condition element can be defined per health policy. We can choose from the following predefined health conditions:

    Excessive request timeout condition

    Specifies the percentage of HTTP requests that are allowed to time out. When the percentage of timed-out requests exceeds the defined value, the health actions run. The timeout value depends on the environment configuration.

      <excessiveRequestTimeout timeoutPercentage="5"/>

    Excessive response time condition

    Tracks the average amount of time that requests take to complete. If the average time exceeds the defined response time threshold, the health actions run.

      <excessiveResponseTime responseTime="10s"/>

    Note: Requests that exceed the timeout value configured for the excessive request timeout condition are not counted toward this health condition. For example, if the default timeout value is 60 seconds, then any request that exceeds 60 seconds times out and is not included in the average response time calculation. This restriction applies even if we do not define an excessive request timeout condition.

    Memory condition: excessive memory usage

    Tracks the memory usage for a member. When the memory usage exceeds a percentage of the heap size for a specified time, health actions run.

      <excessiveMemoryUsage heapSizePercentage="85" timePeriod="5m"/>

    Memory condition: memory leak

    When a downward trend in free memory is detected, health actions run.

      <memoryLeak/>

Important:

  • Dynamic Routing must be enabled to use either the excessive request timeout or excessive response time conditions.

  • The healthAnalyzer-1.0 feature must be enabled in the server.xml file to use either the excessive memory usage or memory leak conditions. This feature can be enabled only for collective members.
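
As a minimal sketch of the second requirement, the feature can be added to the featureManager element in the server.xml file of a collective member. Only the feature name is taken from the text above; any other features that the member already needs are omitted here.

    <server>
        <featureManager>
            <!-- Needed for the excessive memory usage and memory leak conditions -->
            <feature>healthAnalyzer-1.0</feature>
        </featureManager>
    </server>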


Health actions

Health actions define the activities to perform when a health condition is met. Each action is declared with an <action> element, and its action attribute determines which action is taken. Multiple actions can be defined for each health policy, and they run in the order in which they are specified in the policy. The following table lists the health actions supported in Liberty server environments:

    Health action                                 Liberty servers that run in the same collective controller
    --------------------------------------------  ----------------------------------------------------------
    Restart server                                Supported
    Take thread dumps                             Supported
    Take Java virtual machine (JVM) heap dumps    Supported for servers that are running on the IBM JRE
                                                  or Java Developer Kit
    Enter server into maintenance mode            Supported
    Exit server out of maintenance mode           Supported

    <action action="generateThreadDump"/>
    <action action="generateHeapDump"/>
    <action action="restartServer"/>
    <action action="enterMaintenanceMode"/>
    <action action="exitMaintenanceMode"/>
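
As an illustration of action ordering, the following sketch collects diagnostics before it restarts the members of a cluster that show a memory leak. The healthPolicy wrapper is the element named in this topic; the id attribute and its value are assumptions made for the example, and the cluster name is a placeholder.

    <!-- Hypothetical policy; id="memoryLeakPolicy" is an assumed name -->
    <healthPolicy id="memoryLeakPolicy">
        <!-- Exactly one condition element per policy -->
        <memoryLeak/>
        <!-- Actions run in the order listed: dumps first, restart last -->
        <action action="generateThreadDump"/>
        <action action="generateHeapDump"/>
        <action action="restartServer"/>
        <!-- Scope of the policy: every server in this cluster -->
        <cluster clusterName="someCluster"/>
    </healthPolicy>

Listing the dump actions ahead of restartServer matters because a restart would discard the in-memory state that the dumps are meant to capture.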


Health targets

Target elements define the scope of the topology being monitored for the condition. Three target types are available:

  • A host

      <host hostName="someHost"/>

  • Each of the servers in a cluster

      <cluster clusterName="someCluster"/>

  • A single server

      <server hostName="Host" wlpUsrDirectory="/opt/ibm/liberty/wlp" serverName="Server"/>

Each target type has a unique element used to define it within the healthPolicy element. More than one target can be specified per health policy.
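
The following sketch shows how one policy might combine several targets. It reuses the sample target values from this section; the id attribute and its value are assumptions, and the exact placement of the healthPolicy element within the server configuration should be checked against the product reference.

    <!-- Hypothetical policy; id="slowResponsePolicy" is an assumed name -->
    <healthPolicy id="slowResponsePolicy">
        <!-- Condition: average response time above 10 seconds
             (Dynamic Routing must be enabled for this condition) -->
        <excessiveResponseTime responseTime="10s"/>
        <action action="generateThreadDump"/>
        <!-- More than one target can be listed in the same policy -->
        <cluster clusterName="someCluster"/>
        <server hostName="Host" wlpUsrDirectory="/opt/ibm/liberty/wlp" serverName="Server"/>
    </healthPolicy>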