Set up peer restart and recovery
To allow the product to restart on an alternate system, install the following prerequisites on every system (your original system as well as any systems intended for recovery) before you reconfigure the ARM policies to enable peer restart and recovery.
Deprecated feature: Peer Restart and Recovery (PRR) functionality is deprecated. For transaction recovery, use the integrated high availability support for the transaction service subcomponent instead of Peer Restart and Recovery. See the topic Transaction support in WebSphere Application Server for more information about the integrated high availability support for the transaction service subcomponent and how to configure it for peer recovery of transactions being processed on an application server that fails.
You must also make sure that all of the systems on which you might need to perform a restart are part of the same RRS log group.
- z/OS Version 1.2 or higher
- BCP APAR OA01584
- RRS APARs OA02556 and OA2556
- WebSphere Application Server Version 5 or higher
Installing the prerequisite service updates on all of these systems does not prevent the current running environment from continuing to restart in place only. However, if this service is not installed, the controller might not be able to move back. OTS will attempt to restart on the alternate system and fail. If any URs are still unresolved with RRS when this happens, the controller is not allowed to restart on the home system until RRS is cancelled on the alternate system. For more information on OTS and RRS, see z/OS MVS™ Programming: Resource Recovery.
If you do not plan to use peer restart and recovery, you do not need to meet these functional prerequisites. Your system will instead use the restart-in-place function.
The following products all support RRS. Individually, they also support peer restart and recovery, provided that the previously listed prerequisites are all properly installed:
- DB2 Version 7 or higher
- IMS™ Version 8 or higher
- CICS Version 1.3 or higher
- MQSeries Version 5.2 or higher
In addition to the preceding products, many JTA XAResource Managers can take part in a product peer restart and recovery. Consult the JTA XAResource Manager's documentation to determine if it supports restarting on an alternate system.
When setting up the ARM policy for a sysplex, make sure that both systems have the same level of the Application Server installed. For example, you cannot use an application server running WebSphere Application Server Version 5.1 to perform peer restart and recovery for an application server running WebSphere Application Server Version 6.0.1.
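The level-matching rule above can be checked before an ARM policy is changed. The following Python sketch is only an illustration: the system names, the version strings, and the helper functions `parse_level`, `group_by_level`, and `levels_match` are hypothetical and are not part of the product. It assumes that "same level" means the version numbers are identical once trailing zeros are ignored, and that you collect the installed levels yourself.

```python
from collections import defaultdict


def parse_level(text):
    """Turn a version string such as "6.0.1" into a tuple of integers.

    Trailing zeros are dropped so that "6.0" and "6.0.0" compare as equal.
    (Assumption: levels are written as dot-separated numbers.)
    """
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def group_by_level(levels):
    """Group system names by their parsed product level."""
    groups = defaultdict(list)
    for system, text in levels.items():
        groups[parse_level(text)].append(system)
    return {level: sorted(systems) for level, systems in groups.items()}


def levels_match(levels):
    """Return True when every system reports the same product level."""
    return len(group_by_level(levels)) <= 1


# Hypothetical inventory of the systems named in one ARM policy.
policy_systems = {"SYS1": "6.0.1", "SYS2": "5.1"}

if not levels_match(policy_systems):
    print("Product levels differ; peer restart and recovery cannot be used:")
    for system in sorted(policy_systems):
        print(" ", system, "is at level", policy_systems[system])
```

Grouping by level, rather than comparing every system with one reference system, shows at a glance which systems would have to be upgraded together.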
Prior to using peer restart and recovery:
- You must ensure that the location service daemon and node agent are already running on all systems that might be used for recovery. Otherwise, the recovering system might attempt to recover on a system that is not running the location service daemon and node agent. If this happens, the server will fail to start, and recovery will fail.
Clients will see a performance impact if the systems are running at capacity. In an attempt to minimize the memory and CPU impact on the alternate system, the enterprise bean and web containers are not restarted for servers running in peer-restart mode. This means that application servers that are in the state of being recovered will not be able to accept any inbound work.
After the prerequisites are installed, starting a server on a system for which it was not configured implicitly places the server into peer restart and recovery mode. If you configured the XA partner log to write to a non-shared HFS, or if you are using a JTA XAResource Manager, you need to perform the following steps before starting a server:
- (Required only if you are using a non-shared HFS.) Enable non-shared HFS support. When using a non-shared HFS, the configuration settings must be replicated across the different systems in the sysplex. This is done automatically by the deployment manager and node agent. To enable this support, each node agent in the configuration must be set as a recovery node. This change is made in the console:
- In the console navigation, select System Administration > Node agents.
- Select a node agent from the list.
- In the Additional Properties section, select File Synchronization Service.
- In the Additional Properties section, select Custom properties.
- Select New.
- Enter recoveryNode for Name, and true for Value. The Description field can be left blank.
- Repeat the preceding steps, starting with the selection of a node agent, for each node agent in the configuration.
- Save the configuration.
- (Required only if you are using JTA XAResource Managers.) Make sure that the appropriate logs and classes are available on the alternate system. To use peer restart and recovery when your applications access JTA XAResource Managers, you must ensure that the appropriate logs and classes are available on the alternate system.
- Point the product variable TRANLOG_ROOT to a shared HFS. The TRANLOG_ROOT variable must point to a shared HFS, to which all systems in the cell can write. The XA partner log is stored here, and the alternate system must be able to read and update this log.
- In the console, click Servers > Server Types > WebSphere application servers > server_name.
- Under Container Services, click Transaction Service.
- Enter the directory of the shared HFS in the Transaction log directory field.
- Store the driver (for example, a JDBC driver, JMS provider, or JCA resource adapter) for each JTA XAResource Manager in an HFS that is readable by all systems in the cell. For example, if the connector is a JDBC driver for a database, the driver would likely be stored in a read-only HFS that is accessible by all systems in the sysplex. This allows the alternate system to read the saved classpath for the resource, and reconstruct it during a restart.
If the connector used to access a JTA XAResource Manager is not stored in an HFS that is readable by all systems that might be used for recovery, then when an application server restarts on an alternate system, it will either appear that there is no XA recovery work to do, or it will be impossible to load the classes necessary to communicate with the JTA XAResource Manager.
- Resolve InDoubt units.
During a recovery, there will be instances when manual intervention is required to resolve InDoubt units. You will need to use RRS panels for this manual intervention.
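The configuration steps above lend themselves to a simple pre-start check. The following Python sketch is an illustration under stated assumptions, not a product tool: the `RecoverySetup` structure, the `is_under` and `find_problems` helpers, and all paths and node agent names are hypothetical. It assumes you have already gathered the node agent custom properties, the TRANLOG_ROOT value, the driver locations, and the list of shared HFS mount points by other means, and it reports which of the documented requirements are not met.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class RecoverySetup:
    """Hypothetical snapshot of the settings described in the steps above."""
    uses_non_shared_hfs: bool
    uses_xa_resource_managers: bool
    # Node agent name -> custom properties of its File Synchronization Service.
    node_agent_properties: dict = field(default_factory=dict)
    tranlog_root: str = ""
    driver_paths: list = field(default_factory=list)
    # Mount points of shared HFSs that every system can write to / read from.
    writable_shared_mounts: list = field(default_factory=list)
    readable_shared_mounts: list = field(default_factory=list)


def is_under(path, mounts):
    """Return True when `path` equals one of `mounts` or lies below one of them."""
    if not path:
        return False
    path = posixpath.normpath(path)
    for mount in mounts:
        mount = posixpath.normpath(mount)
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            return True
    return False


def find_problems(setup):
    """List the documented requirements that this setup does not meet."""
    problems = []
    if setup.uses_non_shared_hfs:
        # Every node agent must be marked as a recovery node.
        for agent, props in sorted(setup.node_agent_properties.items()):
            if props.get("recoveryNode") != "true":
                problems.append(f"node agent {agent}: recoveryNode is not set to true")
    if setup.uses_xa_resource_managers:
        # The XA partner log must live on a shared HFS that all systems can write to.
        if not is_under(setup.tranlog_root, setup.writable_shared_mounts):
            problems.append("TRANLOG_ROOT does not point to a shared, writable HFS")
        # Each driver must be stored in an HFS that all systems can read.
        for driver in setup.driver_paths:
            if not is_under(driver, setup.readable_shared_mounts):
                problems.append(f"driver {driver}: not in an HFS readable by all systems")
    return problems


# Hypothetical example values.
setup = RecoverySetup(
    uses_non_shared_hfs=True,
    uses_xa_resource_managers=True,
    node_agent_properties={"nodeA": {"recoveryNode": "true"}, "nodeB": {}},
    tranlog_root="/local/tranlog",
    driver_paths=["/shared/drivers/db.jar"],
    writable_shared_mounts=["/shared/logs"],
    readable_shared_mounts=["/shared/drivers", "/shared/logs"],
)

for problem in find_problems(setup):
    print(problem)
```

The check only compares path prefixes against the mount points you supply; it cannot confirm that a file system is actually mounted and accessible on every system in the cell.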
- (zos) Peer restart and recovery
The goal of every system is to have as little downtime as possible. Sometimes, however, system failures are inevitable. For example, a system failure might occur because the power unexpectedly goes out in the main system. When a system failure occurs, a restart action we can take is to restart on a peer system in the sysplex. This type of restart uses the peer restart and recovery function. Starting a server on a system to which it was not configured implicitly places it into peer restart and recovery mode.
- (zos) Use RRS panels to resolve InDoubt units of recovery
Use this task to better understand messages received when using peer restart and recovery.
- (zos) Recovering with JTA XAResource managers
When a JTA XAResource manager is enlisted in a global transaction, it cannot express an interest in the z/OS Resource Recovery Services unit of recovery (UR) like an RRS resource manager can. Instead, the product transaction service will save information in its RRS interest indicating that a JTA Resource Manager was enlisted in the transaction.
Transactional high availability
Configure transaction properties for peer recovery