Enabling process restart on failure

In a distributed environment, we can use the health management feature to monitor the status of application servers, nodes, clusters, dynamic clusters, on demand routers, and cells, so that we can sense and respond to problem areas before an outage occurs. We can manage the health of an application serving environment with a policy-driven approach in which specific actions run when monitored criteria are met. For example, for an application server, when memory usage exceeds a percentage of the heap size for a specified time, health policy actions can run to correct the situation. The following list shows some of the predefined health policy actions that are applicable to excessive memory usage:

All of the listed actions can be grouped and used in a custom sequence to help detect and correct the problem. We can use the dmgr console to set health policies by clicking Operational policies | Health policies.

Actions that you might set in case your server exceeds 90 percent of the JVM heap size for a period of two minutes.
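To make the example condition concrete, the following is a minimal sketch that checks whether heap usage stayed above 90 percent for two minutes, given periodic samples. It is an illustration under assumptions only: the HeapSample type, the heap_condition_breached function, and the sampling model are hypothetical and do not describe how the health controller works internally.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HeapSample:
    t: float        # seconds since monitoring started
    used_mb: float  # JVM heap currently in use
    max_mb: float   # maximum JVM heap size


def heap_condition_breached(samples: Iterable[HeapSample],
                            threshold_pct: float = 90.0,
                            duration_s: float = 120.0) -> bool:
    """Hypothetical helper (not a WebSphere API): True once heap usage has
    stayed above threshold_pct across consecutive samples spanning at
    least duration_s seconds."""
    run_start = None  # timestamp of the first sample in the current above-threshold run
    for s in samples:
        if 100.0 * s.used_mb / s.max_mb > threshold_pct:
            if run_start is None:
                run_start = s.t
            if s.t - run_start >= duration_s:
                return True
        else:
            run_start = None  # usage dropped back below the threshold; restart the window
    return False
```

With samples taken every 30 seconds at 95 percent usage, the condition is met at the 120-second sample; a single sample below 90 percent resets the window, which is what "for a specified time" implies.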

The two reaction modes for the health management monitor are:

Supervise: When the health condition is reached, a task is submitted with a suggested plan of action, which is carried out automatically if the task is approved.
Automatic: When the health condition is reached, the actions are carried out automatically in the order you previously defined.
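The difference between the two reaction modes can be sketched as follows. The ReactionMode enum, the react function, the action names, and the approve callback (standing in for an administrator approving the submitted task) are hypothetical and used only for illustration; in the product, the ordered actions come from the health policy you define in the console.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple

Action = Tuple[str, Callable[[], None]]  # (action name, function that performs it)


class ReactionMode(Enum):
    SUPERVISE = "supervise"
    AUTOMATIC = "automatic"


def react(mode: ReactionMode,
          actions: Sequence[Action],
          approve: Callable[[List[str]], bool]) -> List[str]:
    """Run the policy's actions in their defined order and return the names
    of the actions that ran. In supervise mode the plan is first submitted
    for approval; if it is not approved, nothing runs."""
    plan = [name for name, _ in actions]
    if mode is ReactionMode.SUPERVISE and not approve(plan):
        return []
    ran = []
    for name, run in actions:
        run()
        ran.append(name)
    return ran


# Hypothetical actions for the 90 percent heap example; they only log here.
log: List[str] = []
policy_actions = [
    ("take_heap_dump", lambda: log.append("heap dump taken")),
    ("restart_server", lambda: log.append("server restarted")),
]
print(react(ReactionMode.AUTOMATIC, policy_actions, approve=lambda plan: False))
```

In automatic mode the approve callback is never consulted; in supervise mode the same ordered plan runs only after approval.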

We can define a large number of custom health conditions, and actions to run when those conditions are breached. Although the Intelligent Management features help you recover from the most common operational issues, there is also a more general way to restart server processes: we can use the native operating system functionality to restart a failed process.

The following sections describe how to configure automatic process restart on each operating system.


Windows

The administrator can choose to register one or more of the WAS processes on a machine as a Windows service during profile creation, or afterward by using the WASService command. Once a process is registered as a service, Windows automatically attempts to restart the service if it fails during use. To get a list of the valid formats, enter WASService.exe with no arguments.
WASService command format

Usage: WASService.exe 

-add <service name>
-serverName <Server>
-profilePath <Server's Profile Directory>
[-wasHome <WebSphere Install Directory>]
[-configRoot <Config Repository Directory>]
[-startArgs <additional start arguments>]
[-stopArgs <additional stop arguments>]
[-userid <execution id> -password <password>]
[-logFile <service log file>]
[-logRoot <server's log directory>]
[-encodeParams]
[-restart <true | false>]
[-startType <automatic | manual | disabled>]
|| -remove <service name>
|| -start <service name> [optional startServer.bat parameters]
|| -stop <service name> [optional stopServer.bat parameters]
|| -status <service name>
|| -encodeParams <service name>
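If you register services from scripts, a minimal sketch like the following can build the argument list for the -add and -remove forms shown in the usage above. Only flags from that usage listing are used; the helper names and any paths you pass are hypothetical, and actually running the command requires Windows, a WebSphere installation, and administrator rights.

```python
import subprocess
from typing import List, Optional

START_TYPES = ("automatic", "manual", "disabled")  # values listed for -startType


def wasservice_add_args(service: str, server: str, profile_path: str,
                        exe: str = "WASService.exe",
                        restart: Optional[bool] = True,
                        start_type: Optional[str] = None) -> List[str]:
    """Build the argument list for 'WASService.exe -add ...' using only the
    flags shown in the usage listing. Hypothetical helper."""
    args = [exe, "-add", service, "-serverName", server, "-profilePath", profile_path]
    if restart is not None:
        args += ["-restart", "true" if restart else "false"]
    if start_type is not None:
        if start_type not in START_TYPES:
            raise ValueError(f"start_type must be one of {START_TYPES}")
        args += ["-startType", start_type]
    return args


def wasservice_remove_args(service: str, exe: str = "WASService.exe") -> List[str]:
    """Build the argument list for 'WASService.exe -remove <service name>'."""
    return [exe, "-remove", service]


def run_wasservice(args: List[str]) -> int:
    """Run WASService (Windows only; needs a WAS installation and admin rights).
    subprocess converts the list into a properly quoted command line."""
    return subprocess.run(args, check=True).returncode
```

Passing a list rather than one string lets subprocess handle quoting of paths that contain spaces, which avoids the nested-quote problems that are easy to hit when typing the command by hand.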

Considerations...


Registering a deployment manager as a Windows 7 service

/WAS/AppServer/bin$ runas /user:IBM-CMierlea\admin "/WAS/AppServer/bin\WASService -add "dmgr" -servername dmgr -profilePath "/WAS/AppServer/profiles\Dmgr_85_01" -restart true"

Enter the password for IBM-CMierlea\admin:
Attempting to start /WAS/AppServer/bin\WASService -add dmgr -servername
dmgr -profilePath /WAS/AppServer/profiles\Dmgr_85_01 -restart true as user "IBM-CM
..
/WAS/AppServer/bin$ 

The name of the added service is IBM WAS V8.5 concatenated with the name you specified with -add. To set recovery actions in case of failure, use the Recovery tab in the Properties dialog of the new service.

When you remove the service with the WASService -remove command, specify only the latter portion of the name, that is, the name you passed to -add:


/WAS/AppServer/bin$ runas /user:IBM-CMierlea\admin "/WAS/AppServer/bin\WASService -remove "dmgr""
Enter the password for IBM-CMierlea\admin:
Attempting to start /WAS/AppServer/bin\WASService -remove dmgr as user "IBM-CMierlea\admin" ...
/WAS/AppServer/bin$


UNIX and Linux

The administrator can choose to add entries to /etc/inittab for one or more of the WAS processes on a machine. init then automatically restarts each such process if it fails.

Inittab contents for process restart on deployment manager machine...

On a node machine:

When setting the action for startServer.sh to respawn in /etc/inittab, be aware that init always restarts the process, even if you intended for it to remain stopped. As an alternative, we can use the rc.was script located in...

...which allows you to limit the number of retries.
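The difference between inittab respawn (restart every time) and a bounded-retry approach can be sketched with a small supervisor. This is a hypothetical illustration, not the contents of rc.was: the function name, the retry limit, the delay, and the rule that a clean exit (status 0) counts as an intentional stop are all assumptions, and it assumes the supervised command stays in the foreground until the process ends.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def run_with_bounded_restarts(cmd: List[str], max_retries: int = 3,
                              delay_s: float = 5.0) -> Tuple[int, int]:
    """Run cmd in the foreground and restart it after a failure (nonzero
    exit), at most max_retries times. Returns (restarts, last_exit_code).
    Hypothetical supervisor; not the rc.was script."""
    restarts = 0
    while True:
        code = subprocess.run(cmd).returncode
        if code == 0:
            return restarts, code  # clean exit: assume it was stopped on purpose
        if restarts >= max_retries:
            return restarts, code  # give up instead of respawning forever
        restarts += 1
        time.sleep(delay_s)  # back off briefly before the next attempt


if __name__ == "__main__":
    # Demo with a child process that always fails.
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_with_bounded_restarts(failing, max_retries=2, delay_s=0))  # prints (2, 1)
```

Unlike a respawn entry, the loop stops after a fixed number of failed attempts, so a process that crashes on every start does not restart indefinitely.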

The best solution is to use a monitoring product that implements notification of outages and logic for automatic restart.


z/OS

WebSphere for z/OS takes advantage of z/OS Automatic Restart Management (ARM) to recover application servers. Each application server running on a z/OS system (including servers you create for your business applications) is automatically registered with an ARM group. Each registration uses a special element type called SYSCB. ARM treats SYSCB as restart level 3, which ensures that RRS (a z/OS facility that provides two-phase sync point support across participating resource managers) restarts before any application server.
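The effect of restart levels on ordering can be sketched as grouping elements into waves, assuming (as the passage implies) that ARM restarts elements at lower levels before those at higher levels. The element names and the level given to RRS are hypothetical; only level 3 for SYSCB comes from the text, and the real ordering is governed by the ARM policy on the system.

```python
from collections import defaultdict
from typing import Dict, List


def restart_waves(levels: Dict[str, int]) -> List[List[str]]:
    """Group ARM elements by restart level, lowest level first. In this model,
    elements in the same wave have no ordering constraint between them.
    Hypothetical helper for illustration only."""
    by_level: Dict[int, List[str]] = defaultdict(list)
    for element, level in levels.items():
        by_level[level].append(element)
    return [sorted(by_level[lvl]) for lvl in sorted(by_level)]


# Assumed levels: the text only states that SYSCB elements use level 3 and
# that RRS restarts before them; level 1 for RRS is an illustrative guess.
elements = {"RRS": 1, "SERVER_A": 3, "SERVER_B": 3}
print(restart_waves(elements))
```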

If you have a business-critical application, you need facilities to manage failures. z/OS provides rich automation interfaces, such as automatic restart management, which we can use to detect and recover from failures. Automatic restart management handles restarting servers when failures occur.

Some important things to consider when using automatic restart management:


ARM Behavior and WAS for z/OS server instances

When you issue STOP address_space, ARM does not restart the address space.