Network Deployment (Distributed operating systems), v8.0
Disable file locking
If you store transaction recovery logs on Network File System version 3 (NFSv3) and you want to use automated peer recovery, first disable file locking.
Before you complete this task, configure the system to prevent system overloading and network partitioning, as described in the topic about how to choose between automated and manual transaction peer recovery. Either situation can cause a peer recovery process to start for a server that is still active.
Attention: If you do not take this precautionary step, data corruption can occur.
The following list contains some actions that you can take to prevent system overloading and network partitioning:
- Modify the core group heartbeat settings to change the amount of time after which WAS considers a server failed. See the topic about the high availability manager.
- Ensure that your network is safe from network partitioning by, for example, installing backup network adapters.
- Modify the workload management throttling so that the server cannot be overloaded.
WAS obtains an exclusive lock on the physical recovery log files whenever it is instructed to undertake recovery processing, and releases this lock when it is instructed to pass ownership of the logs to another server. Access to a recovery log is performed only when the exclusive lock is held.
NFSv3 supports exclusive file locks, but holds them on behalf of a failed host until that host can restart. In this context, the host is the physical machine that runs the application server that requests the lock. It is the restart of the host, not of the application server, that eventually causes the locks to be released. See the topic about how to choose between automated and manual transaction peer recovery for more information.
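The locking behavior described above can be sketched as a small in-memory model. This is only an illustration of why a peer server cannot take over the logs of a failed host under NFSv3 locking; it is not how WAS or NFS is implemented, and the class name, method names, and log path are hypothetical.

```python
class NfsV3LockModel:
    """Toy model: exclusive locks are held per host and survive host failure."""

    def __init__(self):
        self._holders = {}        # recovery log path -> host that holds the lock
        self._failed_hosts = set()

    def acquire(self, log, host):
        """Try to take the exclusive lock; return True on success."""
        holder = self._holders.get(log)
        if holder is not None and holder != host:
            return False          # another host still owns the lock
        self._holders[log] = host
        return True

    def release(self, log, host):
        """Release the lock, as when ownership of the logs is passed on."""
        if self._holders.get(log) == host:
            del self._holders[log]

    def fail_host(self, host):
        """The host goes down; its locks are deliberately NOT released."""
        self._failed_hosts.add(host)

    def restart_host(self, host):
        """Only a restart of the host frees the locks held on its behalf."""
        self._failed_hosts.discard(host)
        for log in [l for l, h in self._holders.items() if h == host]:
            del self._holders[log]


locks = NfsV3LockModel()
log = "tranlog/server1"                 # hypothetical log location
locks.acquire(log, "hostA")             # server on hostA owns its own log
locks.fail_host("hostA")
print(locks.acquire(log, "hostB"))      # False: peer recovery is blocked
locks.restart_host("hostA")
print(locks.acquire(log, "hostB"))      # True: lock freed by host restart
```

In this model, the peer on hostB can take over the log only after hostA restarts, which is the delay that manual failover or disabled file locking is meant to avoid.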
To obtain more appropriate failover behavior, either use manual failover and configure the system as described in Configure manual peer recovery for the transaction service, or disable the use of exclusive file locking.
Procedure
- In the administrative console, click ... > Server Types > WebSphere application servers > server_name > [Container Settings] Container Services > Transaction Service.
- Clear the Enable file locking check box.
- Click Apply or OK.
- Save your change to the master configuration.
- Repeat the previous steps for every server in the cluster.
- Restart the servers in the cluster for the changes to take effect.
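For clusters with many members, the console steps above could in principle be scripted with wsadmin. The following is a minimal sketch, not a verified procedure: it assumes that the TransactionService configuration object exposes the check box as an attribute named enableFileLocking (confirm this with AdminConfig.attributes('TransactionService') before relying on it), and the function name and cluster name are hypothetical. The AdminConfig object is passed in as a parameter so that the function can be exercised outside wsadmin.

```python
def disable_file_locking(admin_config, cluster_name):
    """Clear file locking on the transaction service of every cluster member.

    admin_config is the wsadmin AdminConfig object (or a stand-in with the
    same getid, list, showAttribute, and modify methods).
    Returns the names of the servers that were modified.
    """
    cluster_id = admin_config.getid('/ServerCluster:%s/' % cluster_name)
    changed = []
    for member_id in admin_config.list('ClusterMember', cluster_id).splitlines():
        node = admin_config.showAttribute(member_id, 'nodeName')
        server = admin_config.showAttribute(member_id, 'memberName')
        server_id = admin_config.getid('/Node:%s/Server:%s/' % (node, server))
        for ts_id in admin_config.list('TransactionService', server_id).splitlines():
            # Assumption: 'enableFileLocking' is the attribute behind the
            # "Enable file locking" check box.
            admin_config.modify(ts_id, [['enableFileLocking', 'false']])
        changed.append(server)
    return changed


# Inside wsadmin (Jython), usage might look like this:
#   disable_file_locking(AdminConfig, 'MyCluster')   # 'MyCluster' is hypothetical
#   AdminConfig.save()                               # save to the master configuration
```

The servers still need to be restarted afterward for the change to take effect, as in the manual procedure.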
Results
Exclusive file locking is disabled for all the servers in the cluster.
What to do next
Having taken steps to mitigate the risk to recovery log integrity when locking is disabled, you can tune the heartbeating parameters of the WAS high availability (HA) framework to change the conditions under which a server is considered failed. By considering the characteristics of applications, network, and peak workloads, determine an acceptable period of time after which the likelihood of an incorrectly diagnosed server failure is acceptably small.
A trade-off exists between reducing the risk of an incorrect diagnosis of server failure and increasing the time that automated failover and peer recovery take to occur. By default, a server is considered failed after it misses 20 heartbeats that are sent at 10-second intervals. These defaults are custom properties of the core group that you can modify.
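The trade-off can be made concrete with simple arithmetic: the time before a server is declared failed is the number of missed heartbeats multiplied by the heartbeat interval. The sketch below uses the default values stated above; the function and parameter names are hypothetical and are not the names of the core group custom properties.

```python
def failure_detection_seconds(missed_heartbeats=20, heartbeat_period_seconds=10):
    """Return how long a server can be silent before it is considered failed."""
    if missed_heartbeats < 1 or heartbeat_period_seconds <= 0:
        raise ValueError("heartbeat count and period must be positive")
    return missed_heartbeats * heartbeat_period_seconds


# Defaults from the text: 20 missed heartbeats at a 10-second interval.
print(failure_detection_seconds())                      # 200

# A hypothetical, more aggressive setting: faster failover, but a pause of
# just over a minute on a healthy server would now be diagnosed as a failure.
print(failure_detection_seconds(missed_heartbeats=6))   # 60
```

A shorter window starts peer recovery sooner, but it also raises the chance that a slow or temporarily partitioned server is treated as failed while it is still active.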
Related topics
- Transactional high availability
- High availability manager
- How to choose between automated and manual transaction peer recovery
- Configure manual peer recovery for the transaction service
- Configure automated peer recovery for the transaction service