Configure transaction properties for peer recovery

Peer recovery for the transaction service enables servers in a cluster to complete outstanding work for a failed cluster member. Follow the steps in this topic to configure the transaction properties required for peer recovery of failed application servers in a cluster.

To enable transaction peer recovery between servers, we must have a common configuration of the resource providers between the participating server members. This means that peer recovery processing can only take place between members of the same server cluster. Although a cluster can contain servers that are at different versions of WebSphere Application Server, enable and configure high availability only if all servers in the cluster are at v6 or later.
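The version rule above can be expressed as a simple pre-check. This is a minimal Python sketch. It assumes you have already collected each cluster member's product version as a dotted string, for example from the administrative console. The function names and the sample data are hypothetical and are not part of any WebSphere API.

```python
from typing import Dict, List


def major_version(version: str) -> int:
    """Return the major release number from a string such as '8.5.5.20'."""
    return int(version.strip().split(".")[0])


def can_enable_ha(member_versions: Dict[str, str]) -> bool:
    """True only if the cluster has members and every member is at v6 or later."""
    return bool(member_versions) and all(
        major_version(v) >= 6 for v in member_versions.values()
    )


def blocking_members(member_versions: Dict[str, str]) -> List[str]:
    """Names of members whose version prevents enabling high availability."""
    return sorted(n for n, v in member_versions.items() if major_version(v) < 6)


if __name__ == "__main__":
    # Hypothetical cluster: member3 is still at v5.
    cluster = {"member1": "8.5.5.20", "member2": "6.1.0.47", "member3": "5.1.1.19"}
    print(can_enable_ha(cluster))     # False
    print(blocking_members(cluster))  # ['member3']
```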

(ZOS) Peer recovery of transactions is in addition to the support for peer restart and recovery, which enables a failed server to be restarted on a peer system in the sysplex. For more information about configuring peer restart and recovery, see Set up peer restart and recovery.

Configuring the transaction properties required for peer recovery is part of the overall task of configuring a cluster to use high availability support.

Tasks

On z/OS platforms, configure the Resource Access Control Facility (RACF) to allow the application servers to call the ATRSRV macro.
The ATRSRV macro allows a server to commit and back out transactions for other servers. This process differs from peer restart and recovery support, where the other server is started on another system. The ATRSRV macro is provided by MVS™ Resource Recovery Services (RRS).

The user ID that the application server controller region runs under must have ALTER access to the MVSADMIN.RRS.COMMANDS.gname.sysname resource in the FACILITY class, where gname is the RRS logging group (usually the SYSPLEX name), and sysname is the system name. To allow access to all logging groups and systems, use wildcards in the resource name, for example MVSADMIN.RRS.COMMANDS.*.
Because the controller region runs as an authorized address space, it implicitly has ALTER access to this resource class, unless the RACF configuration explicitly restricts access. By explicitly allowing access to this resource, we are not relying on the authorized state of the controller region.
For more information about the ATRSRV macro and setting the appropriate RACF permissions, see Chapter 8 of MVS Programming: Resource Recovery, SA22-7616-02.
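As a sketch, the RACF commands for this step could be generated as follows. RDEFINE, PERMIT and SETROPTS are standard RACF commands, but the helper function, the user ID, and the group and system names in the example are hypothetical placeholders. The SETROPTS RACLIST(FACILITY) REFRESH command matters only if the FACILITY class is RACLISTed, and a wildcard profile takes effect only when generic profile checking is active for the FACILITY class; confirm both with your security administrator.

```python
from typing import List, Optional


def rrs_racf_commands(controller_userid: str,
                      gname: Optional[str] = None,
                      sysname: Optional[str] = None) -> List[str]:
    """Build RACF commands granting ALTER on the RRS commands FACILITY profile.

    With both gname (RRS logging group) and sysname, the specific profile
    MVSADMIN.RRS.COMMANDS.gname.sysname is used; with neither, the wildcard
    form MVSADMIN.RRS.COMMANDS.* given in the text is used instead.
    """
    if (gname is None) != (sysname is None):
        raise ValueError("give both gname and sysname, or neither")
    if gname is None:
        profile = "MVSADMIN.RRS.COMMANDS.*"
    else:
        profile = f"MVSADMIN.RRS.COMMANDS.{gname}.{sysname}"
    return [
        # Define the profile; UACC(NONE) denies access to anyone not permitted.
        f"RDEFINE FACILITY {profile} UACC(NONE)",
        # Grant the controller region's user ID the ALTER access RRS requires.
        f"PERMIT {profile} CLASS(FACILITY) ID({controller_userid}) ACCESS(ALTER)",
        # Refresh in-storage profiles (relevant only if FACILITY is RACLISTed).
        "SETROPTS RACLIST(FACILITY) REFRESH",
    ]


if __name__ == "__main__":
    # Hypothetical user ID, logging group and system name.
    for cmd in rrs_racf_commands("WSCRU1", "PLEX1", "SYS1"):
        print(cmd)
```

Explicitly permitting the user ID, rather than relying on the controller region's authorized state, is the point made in the text above.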

Configure the transaction log directory setting for each server in the cluster. We can configure the location of the transaction log directory using either the administrative console or commands. The configuration is stored in the serverindex.xml node-level configuration file.
Each server in the cluster must be able to access the log directories of other servers in the same cluster. For this reason, do not leave this setting unset. If we do not set a directory, the application server assumes a default location within the appropriate profile directory, which might not be accessible to other servers in the cluster.

Each server in the cluster must also have a unique transaction log directory, to avoid attempts by multiple servers to access the same log file. For example, we could use the name of each server as part of the log directory name for that server.
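One way to follow the naming suggestion above is to derive each directory from the server name under a root that every cluster member can reach. The sketch below assumes a hypothetical shared root such as /shared/tranlogs and a <root>/<server>/tranlog layout; neither is a WebSphere default, and the resulting paths would still be entered through the administrative console or commands.

```python
import posixpath
from typing import Dict, List


def transaction_log_dirs(shared_root: str, servers: List[str]) -> Dict[str, str]:
    """Give each server its own transaction log directory under shared_root.

    The <shared_root>/<server>/tranlog layout is an illustrative choice only.
    """
    if len(set(servers)) != len(servers):
        # Duplicate names would make two servers write to the same log directory.
        raise ValueError("server names must be unique within the cluster")
    return {s: posixpath.join(shared_root, s, "tranlog") for s in servers}


if __name__ == "__main__":
    for server, path in transaction_log_dirs("/shared/tranlogs", ["member1", "member2"]).items():
        print(server, path)
```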

The storage mechanism used to host recovery log files, and the means of accessing it (for example, a local area network (LAN)), must support the file-based force operation that the recovery log service uses to force data to disk. For example, we can use IBM Network attached storage (NAS) or shared SCSI drives, but not a simple network share.

(iSeries) The storage mechanism used to host recovery log files, and the means of accessing it, must support the file-based force operation that the recovery log service uses to force data to disk. For example, we can store the logs on another IBM i server by using the NetClient file system (QNTC), which provides access to data on a remote system through the Server Message Block (SMB) protocol.
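Before relying on a storage location, you could run a quick smoke test from each host. This Python sketch uses os.fsync as a stand-in for a file-based force operation; it is not the recovery log service's own code, and a successful call shows only that the force request is accepted, not that the storage really persists data before acknowledging it. The path in the example is hypothetical.

```python
import os
import tempfile


def force_write_smoke_test(log_dir: str) -> bool:
    """Write a small file in log_dir, force it to disk, then delete it.

    Returns False if the directory cannot be used or the force call fails.
    A True result means only that the fsync request was accepted.
    """
    try:
        fd, path = tempfile.mkstemp(dir=log_dir, prefix=".force-test-")
    except OSError:
        return False  # directory missing, not writable, or not reachable
    try:
        os.write(fd, b"peer recovery force test\n")
        os.fsync(fd)  # ask the OS to flush the file's data to stable storage
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    # Hypothetical path; run this on each host against the shared log location.
    print(force_write_smoke_test("/shared/tranlogs"))
```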

In addition, configure the mechanism by which the remote log files are accessed to exploit any fault tolerance in the underlying file system. For example, if we use the Network File System (NFS) and hard-mount the remote directory that contains the log files (using the -o hard option of the NFS mount command), the NFS client retries a failed operation until the NFS server becomes available again.
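On Linux, the options a mount is actually using can be read from /proc/mounts. The following sketch, assuming Linux and a hypothetical mount point, checks that the directory holding the logs is an NFS mount that reports the hard option and not soft or softerr. Other operating systems expose mount information differently.

```python
from typing import List, Optional


def nfs_mount_options(mounts_text: str, mount_point: str) -> Optional[List[str]]:
    """Return the options of the NFS mount at mount_point, or None if there is none.

    Each /proc/mounts line is: device mount_point fstype options dump pass.
    Mount points containing spaces appear escaped (\\040) and are not handled here.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point hides earlier ones.
            options = fields[3].split(",") if fields[2].startswith("nfs") else None
    return options


def is_hard_mounted(mounts_text: str, mount_point: str) -> bool:
    """True only if mount_point is an NFS mount that explicitly reports 'hard'."""
    opts = nfs_mount_options(mounts_text, mount_point)
    if opts is None:
        return False
    return "hard" in opts and "soft" not in opts and "softerr" not in opts
```

On a Linux host you might call it as is_hard_mounted(open("/proc/mounts").read(), "/shared/tranlogs"). Requiring an explicit hard entry is deliberately conservative: an older kernel that omits the option from /proc/mounts yields a false negative rather than a false positive.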

For more information about configuring transaction log directories, see Configure transaction properties for an application server.

If we have migrated from a previous version of WAS, be aware that previous versions stored the recovery log configuration in the server.xml server-level configuration file. If we run existing scripts that configure the original recovery log settings, or migrate v5 application servers to a later version of WAS, the original transaction log directory configuration in the server.xml file is updated. The administrative console detects this condition and prompts you to save the configuration when you view the transaction service panel. This save operation writes the changed configuration to the serverindex.xml file and resets the older fields to null. Change existing scripts to target the serverindex.xml file at the earliest opportunity; new scripts should also target the serverindex.xml file.
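To find servers whose server.xml still carries the older setting, you could scan the file for the legacy attribute. This is a hedged sketch: the attribute name transactionLogDirectory is an assumption, not something this topic confirms, so inspect your own server.xml and adjust the attribute parameter before relying on the result.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ASSUMPTION: name of the legacy attribute; verify against your own server.xml.
ATTRIBUTE = "transactionLogDirectory"


def legacy_log_settings(server_xml: str, attribute: str = ATTRIBUTE) -> List[Tuple[str, str]]:
    """Return (element tag, value) for every element with a non-empty legacy attribute."""
    root = ET.fromstring(server_xml)
    hits = []
    for elem in root.iter():  # iter() includes the root element itself
        value = elem.get(attribute)
        if value:  # skips both missing and empty (already reset) attributes
            hits.append((elem.tag, value))
    return hits
```

For a file on disk, pass the text read from it, for example open(path, encoding="utf-8").read(). An empty result suggests the console's save operation (or an updated script) has already moved the setting to serverindex.xml.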
Enable the high availability function for the cluster by completing the following steps on the cluster configuration panel of the WAS administrative console:

In the administrative console, click Servers > Clusters > WebSphere application server clusters > cluster_name.
Select the Enable failover of transaction log recovery option.
Click OK.
For more information about enabling the high availability function for a cluster, see Server cluster settings.
Decide which kind of transaction peer recovery to use by referring to How to choose between automated and manual transaction peer recovery.
Complete one of the following actions, depending on the configuration that you require.

To use automated peer recovery, follow the steps in Configure automated peer recovery for the transaction service.
To use manual peer recovery, configure a policy for the transaction service, as described in Configure manual peer recovery for the transaction service.

What to do next
We must also configure the compensation log location. Each server must have a unique compensation log directory and the compensation logs must be accessible, in a similar way to the transaction logs.
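A single check can cover both log types. The sketch below assumes you have gathered each server's transaction and compensation log directories into a dictionary (a hypothetical structure, not a WebSphere format) and reports any directory that is unset or used by more than one log. It compares normalized path strings only, so two different paths that reach the same storage are not detected.

```python
import posixpath
from typing import Dict, List


def log_dir_problems(config: Dict[str, Dict[str, str]]) -> List[str]:
    """Report unset log directories and directories used by more than one log.

    config maps server name -> {"transaction": path, "compensation": path}.
    """
    problems = []
    owners: Dict[str, List[str]] = {}
    for server in sorted(config):
        for kind in ("transaction", "compensation"):
            path = config[server].get(kind)
            if not path:
                problems.append(f"{server}: {kind} log directory is not set")
                continue
            # normpath so that "/a/b/" and "/a/b" count as the same directory
            owners.setdefault(posixpath.normpath(path), []).append(f"{server}/{kind}")
    for path in sorted(owners):
        if len(owners[path]) > 1:
            problems.append(f"{path} is shared by {', '.join(owners[path])}")
    return problems
```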

Subtopics

Configure manual peer recovery for the transaction service
Manual peer recovery processing is not the default setting; enable it through configuration. Administrator intervention is then required to trigger any peer recovery processing.
Manage manual peer recovery of the transaction service
After configuring manual peer recovery, you trigger a recovery process using the administrative console; peer recovery can no longer take place automatically. This requirement applies to transaction peer recovery processing only; standard recovery processing of server recovery logs, driven when the server starts, still occurs automatically.
Configure automated peer recovery for the transaction service
Configure automated peer recovery to enable cluster members to automatically complete outstanding work for a failed cluster member. After we have configured automated peer recovery, recovery processing occurs without any intervention on your part.

Related:

Transactional high availability
How to choose between automated and manual transaction peer recovery
(ZOS) Peer restart and recovery
(ZOS) Set up peer restart and recovery
Configure transaction properties for an application server
Configure automated peer recovery for the transaction service
Configure a server to use business activity support
Storing and restoring transaction and compensation logs for high availability
Server cluster settings
Compensation service settings