Restarting an application server in recovery mode
When an application server instance with active transactions in progress restarts after a failure, the transaction service uses recovery logs to complete the recovery process. These logs, which each transactional resource maintains, are used to rerun any InDoubt transactions and return the overall system to a self-consistent state.
(ZOS)
If we are migrating from a previous version of the product, make sure that the REC parameter is included on the JCL procedure statement for the controller as either REC=N or REC=Y. If the JCL procedure does not specify either REC=N or REC=Y, the server does not restart in recovery mode even if we specify the -recovery option.
If the JCL procedure includes REC=N, the setting automatically changes to REC=Y if we specify -recovery when restarting the server. REC=N is automatically included on the JCL procedure if we did not migrate from a previous version of the product. Following is an example of what the updated PROC statement might look like:
//BBO6ACR PROC PARMS=' ',REC=N,Z=BBO6ACRZ
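As a rough aid for the migration check described above, the following shell sketch tests whether a single PROC statement carries REC=N or REC=Y. The function name has_rec_param is hypothetical and not part of the product. The sketch assumes the statement is passed as one line of text and does not handle JCL continuation lines.

```shell
#!/bin/sh
# Hypothetical helper (not a product command): succeeds (exit status 0)
# when the given PROC statement text contains REC=N or REC=Y as a
# separate parameter, and fails (exit status 1) otherwise.
has_rec_param() {
    # The parameter must be preceded by a blank, a comma, or line start,
    # and followed by a blank, a comma, or line end.
    printf '%s\n' "$1" | grep -Eq '(^|[ ,])REC=[NY]($|[ ,])'
}

# Example: check the sample statement shown above.
if has_rec_param "//BBO6ACR PROC PARMS=' ',REC=N,Z=BBO6ACRZ"; then
    echo "REC parameter present"
else
    echo "REC parameter missing: add REC=N or REC=Y to the PROC statement"
fi
```

A statement that fails this check corresponds to the case in which the server does not restart in recovery mode even when -recovery is specified.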
When we restart an application server in recovery mode:
- Transactional resources complete the actions in their recovery logs and then shut down. This action frees up any resource locks that the application server held prior to the failure.
- During the recovery period, only the subset of application server functions that are necessary for transactional recovery to proceed are available.
- The application server does not accept new work during the recovery process.
- The application server shuts down when the recovery is complete.
This recovery process begins as soon as all of the necessary subsystems within the application server are available. If the application server is not restarted in recovery mode, the application server can start accepting new work as soon as the server is ready, which might occur before the recovery work has completed.
Normally, this process is not a problem. However, situations exist when the operating procedures might not be compatible with supporting recovery work and new work simultaneously. For example, we might have a high availability environment where the work handled by the application server that failed is immediately moved to another application server. This backup application server then exclusively processes the work from the application server that failed until recovery has completed on the failed application server and the two application servers can be re-synchronized. In this situation, we might want the failing application server to only perform its transactional recovery process and then shut down. We might not want this application server to start accepting new work while the recovery process is taking place.
To prevent the assignment of new work to an application server that is going through its transaction recovery process, restart the application server in recovery mode.
When we restart a failed application server, the node agent for the node on which the failed application server resides must be running before we can restart that application server.
When an application server stops as part of normal shutdown processing, message WSVR0024I: Server xxxxxxxx PROCESS xxxxxxxx stopped is sent to the system log file. If the server user IDs have ALTER access to the appropriate MVSADMIN.* profiles in the facility class, the resource manager registration entry for this instance of the application server is removed from the RRS logs. If the server user IDs do not have that access, the entry is not removed from the RRS logs.
If the resource manager registration entry was deleted from the RRS logs, on a subsequent application server start, a cold start is performed. However, we cannot perform a cold start with RRS if we are starting the application server in recovery mode.
(ZOS) With this service release, we can cold start the server in recovery mode only on the system where the server was configured.
To be able to restart an application server in recovery mode, we must perform the following steps before a failure occurs, and then restart the application server to enable the configuration changes:
Tasks
- If the server is monitored by a node agent, we must clear the Automatic restart option for that server. Clearing this option prevents the node agent from automatically restarting the server in normal mode, before we have a chance to start it in recovery mode.
  - In the administrative console, click Servers > Server Types > WebSphere application servers > server.
  - In the Server Infrastructure section, click Java and process management > Monitor Policy.
  - Clear the Automatic restart option.
- If a catastrophic failure occurs that leaves InDoubt transactions, issue the startServer server -recovery command from the command line. This command restarts the server in recovery mode. We must issue the command from the profile_root/bin directory for the profile with which the server is associated.
The application server restarts in recovery mode, performs transactional recovery, and shuts down. Any resource locks that the application server held prior to the failure are released.
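The restart step can be sketched as a small shell wrapper. This is a minimal illustration, not a product-supplied script. The function names, the example profile path, and the server name server1 are hypothetical, and the script file name startServer.sh assumes a UNIX-like platform. By default the wrapper only prints the command; set DRY_RUN=0 to run it.

```shell
#!/bin/sh
# Hypothetical helper: prints the recovery-mode restart command for a
# given profile root and server name. Both values are placeholders.
recovery_command() {
    profile_root=$1
    server_name=$2
    # The command is issued from the bin directory of the profile
    # with which the server is associated.
    printf '%s/bin/startServer.sh %s -recovery\n' "$profile_root" "$server_name"
}

# Hypothetical wrapper: prints the command unless DRY_RUN=0 is set,
# in which case it changes to profile_root/bin and runs the command.
restart_in_recovery() {
    profile_root=$1
    server_name=$2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        recovery_command "$profile_root" "$server_name"
    else
        cd "$profile_root/bin" && ./startServer.sh "$server_name" -recovery
    fi
}

# Example (dry run): show the command for an assumed profile location.
restart_in_recovery /opt/IBM/WebSphere/AppServer/profiles/AppSrv01 server1
```

Keeping the dry run as the default makes it possible to review the exact command before it is issued against a failed server.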
What to do next
Configure the integrated high availability support for the transaction service subcomponent for peer recovery of transactions.
Subtopics
- InFlight work and presumed abort mode
Presumed abort mode is activated when a failure occurs before a distributed transaction starts to commit.
- IMS Connect considerations following server recovery
After InDoubt and InFlight work completes, the product server shuts down. A new application server configured for that system is then started up to accept new work. Special considerations must be taken if we are using IMS™ Connect after recovering to an alternate system.
Related:
- Transactional high availability
- Starting servers using scripting
- startServer command