Transaction troubleshooting tips

Use these tips to help troubleshoot problems with the WAS transaction service.

Peer recovery fails to acquire a lock

XAER_NOTA exception logged after server fails

Clean shutdown message is not in the message log

(zos) Hung servers following the failover of large cross-cluster or cross-node global transactions in a high availability environment

For messaging problems specific to WebSphere Application Server nodes, see other topics in the information center, such as the topic about messaging troubleshooting tips, and the WAS Support web page.

Peer recovery fails to acquire a lock

If peer recovery of a transaction fails to acquire a file lock that it needs for recovery processing, messages similar to the following might be logged:
[10/26/04 8:41:38:887 CDT] 00000029 CoordinationL A   CWWTR0100_GENERIC_ERROR
[10/26/04 8:41:39:100 CDT] 00000029 RecoveryHandl A   CWWTR0100E: An attempt to  acquire a file lock needed to perform recovery processing failed. Either the  target server is active or the recovery log configuration is incorrect
....
[10/26/04 8:42:34:921 CDT] 00000027 HAGroupImpl   I   CWRHA0130I: The local  member of group GN_PS=fwsitkaCell01\fwwsaix1Node01\GriffinServer3, IBM_hc=GriffinCluster,type =WAS_TRANSACTIONS has indicated that is it not  alive. The JVM will be terminated.
[10/26/04 8:42:34:927 CDT] 00000027 SystemOut     O Panic:component requested  panic from isAlive
To troubleshoot the cause of failure to acquire the file lock, check the following factors:

If you have enabled failover of transaction log recovery on the server cluster and are using a NAS device for the transaction logs, check that the DFS level on the machine is compatible with the DFS level on the NAS device. If the two levels are not compatible, the transaction logs cannot be accessed.

If you are running the server as a non-root user, check that the numeric user ID and group ID of that user match on all machines involved in peer recovery.

If you have a policy defined for the transaction service, review the policy to ensure that it gives control to the correct servers. You might have to add servers to the preferred server list or reorder it.
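The user and group ID check above can be sketched in a few lines of Python. This is only an illustration, not part of the product: the host names and ID values are hypothetical, and it assumes you have already collected the numeric IDs yourself, for example by running `id -u` and `id -g` as the non-root user on each machine.

```python
from collections import Counter


def find_id_mismatches(ids_by_host):
    """Return the hosts whose (uid, gid) pair differs from the most common pair."""
    if not ids_by_host:
        return {}
    # Treat the most common (uid, gid) pair as the reference value.
    reference, _ = Counter(ids_by_host.values()).most_common(1)[0]
    return {host: pair for host, pair in ids_by_host.items() if pair != reference}


# Hypothetical values, gathered manually with `id -u` and `id -g` on each
# machine that takes part in peer recovery.
ids_by_host = {
    "hostA": (1001, 1001),
    "hostB": (1001, 1001),
    "hostC": (1002, 1001),
}

mismatches = find_id_mismatches(ids_by_host)
for host, (uid, gid) in mismatches.items():
    print(f"{host}: uid={uid} gid={gid} differs from the other machines")
```

Any host that the sketch reports is a candidate for the file lock failure, because the non-root user on that machine might not be able to access the shared transaction logs.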

Client requests and web services transaction protocol messages are not routed to the appropriate server

When the client is not part of the same administrative cell as the target service, and you require transaction affinity or transaction high availability, you can use the WAS proxy server topology to route client requests and web services transaction protocol messages to the appropriate server. In this topology, the client communicates with a WAS proxy server, which dynamically routes the client requests and web services transaction protocol messages to the appropriate server in a WebSphere Application Server cluster. For this scenario to work, the proxy server must be configured in the same administrative cell as the target service.
Avoid trouble: WAS does not provide on-demand router (ODR) support for this scenario. Only the WAS proxy server can act as a proxy for web service transaction endpoints.

XAER_NOTA exception logged after server fails

If an application server fails before the end transaction record is written to disk, the transaction log might still contain the transaction when the server restarts, even though the transaction completed.
WAS does not force the end record to the log, so the operating system or network file system decides when to write it to disk. The record is forced only when the server shuts down cleanly. The transaction service is designed to cope with an end record that is never written to disk. In that case, the databases return XAER_NOTA during recovery, and a message like the following is logged:

[date time] 00000057 WSRdbXaResour E CWWRA0302E: XAException occurred. Error code is: XAER_NOTA (-4). Exception is: XAER_NOTA

If there is a transaction without an end record left in the transaction log, the transaction service tries to check with the database. If the transaction has completed, the database indicates that there is nothing to complete (XAER_NOTA). This behavior is normal, and is not an error.
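When you review a message log after a server failure, it can help to pull the XA error code out of each CWWRA0302E message so that the expected XAER_NOTA entries are easy to tell apart from other XA errors. The following Python sketch does that with a regular expression that is based only on the sample message shown above. The function names are hypothetical, and the pattern assumes that other CWWRA0302E messages use the same layout.

```python
import re

# Matches the error-code portion of a CWWRA0302E message,
# for example "Error code is: XAER_NOTA (-4)".
XA_ERROR = re.compile(r"CWWRA0302E:.*?Error code is:\s*(\w+)\s*\((-?\d+)\)")


def xa_error_code(line):
    """Return (name, number) for a CWWRA0302E line, or None if it does not match."""
    match = XA_ERROR.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_xaer_nota(line):
    """True if the line reports XAER_NOTA, which is normal during recovery."""
    code = xa_error_code(line)
    return code is not None and code[0] == "XAER_NOTA"


# The sample message from this topic, with its placeholder timestamp.
sample = (
    "[date time] 00000057 WSRdbXaResour E CWWRA0302E: XAException occurred. "
    "Error code is: XAER_NOTA (-4). Exception is: XAER_NOTA"
)

print(xa_error_code(sample))
print(is_xaer_nota(sample))
```

The sketch cannot tell whether a message was logged during recovery after a failure, so treat a match as a hint and confirm the context in the log yourself.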

Clean shutdown message is not in the message log

When an application server shuts down, any active transactions are rolled back. If all transactions complete successfully, message CWWTR0105I is logged, indicating a clean shutdown of the transaction service, and the next server restart does not need any recovery activity. If an application server shuts down and message CWWTR0105I is not logged, the absence of the message does not indicate a problem, but it does mean that recovery activity is required when the server restarts.
Before you uninstall the product, shut down all application servers cleanly to avoid data integrity problems.
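Checking for the clean shutdown message can be automated with a simple search for the message ID. The following Python sketch is a minimal illustration: the function name is hypothetical, and the sample lines are abbreviated placeholders rather than real log output. It matches on the message ID only, because the full message text is not reproduced in this topic.

```python
def clean_shutdown_logged(log_lines):
    """Return True if any line contains the clean shutdown message ID."""
    return any("CWWTR0105I" in line for line in log_lines)


# Abbreviated placeholder lines, not real log output.
with_message = ["... other messages ...", "... CWWTR0105I ..."]
without_message = ["... other messages ..."]

print(clean_shutdown_logged(with_message))
print(clean_shutdown_logged(without_message))
```

Because the function accepts any iterable of lines, you can pass it an open file object for the server message log. A log can cover several start and stop cycles, so check only the lines written during the most recent shutdown.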

(zos) Ensure that recovery from an RRS or XA resource perspective is not needed

On the z/OS operating system, the clean shutdown message CWWTR0105I is never logged. To ensure that recovery from an RRS or XA resource perspective is not needed, you can restart the application server in recovery mode on the system on which it is configured. In recovery mode, if there are any outstanding units of recovery (URs), the application server completes the URs, then shuts down. If there are no outstanding URs, the application server starts, then shuts down normally. Therefore, to ensure that all recovery has occurred, restart the server in recovery mode and wait for it to shut down normally.

(zos) Hung servers following the failover of large cross-cluster or cross-node global transactions in a high availability environment

After a failover, such as one caused by an LPAR failure, some of the surviving application servers can become unresponsive.
To resolve this problem, cancel and restart the application servers. If necessary, force restart the application servers.

Related tasks

(zos) Restarting an application server in recovery mode

Messaging troubleshooting tips

Related information:

WebSphere Application Server Support