Tips for troubleshooting transactions

This topic provides a set of tips to help you troubleshoot problems with the WebSphere transaction service.

For messaging problems specific to WAS nodes, see the information center and the Application Servers support web site; for example: Tips for troubleshooting WebSphere messaging [v5].

 

Peer recovery fails to acquire a lock

If peer recovery of a transaction fails to acquire a file lock that is needed to perform recovery processing, you should see the following messages:

[10/26/04 8:41:38:887 CDT] 00000029 CoordinationL A   WTRN0100_GENERIC_ERROR
[10/26/04 8:41:39:100 CDT] 00000029 RecoveryHandl A   WTRN0100E: An attempt to 
acquire a file lock need to perform recovery processing failed. Either the target 
server is active or the recovery log configuration is incorrect
....
[10/26/04 8:42:34:921 CDT] 00000027 HAGroupImpl   I   HMGR0130I: The local member 
of group GN_PS=fwsitkaCell01\fwwsaix1Node01\GriffinServer3,IBM_hc=GriffinCluster,type
=WAS_TRANSACTIONS has indicated that is it not alive. The JVM will be terminated.
[10/26/04 8:42:34:927 CDT] 00000027 SystemOut     O Panic:component requested panic 
from isAlive
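When the log is large, it can help to search it for the two message IDs that appear in the excerpt above. The following is a minimal sketch, not a WebSphere tool: the function name is hypothetical, and it assumes you have already read the log (for example SystemOut.log) into a list of lines.

```python
# Message IDs copied from the log excerpt above.
WATCHED_IDS = ("WTRN0100E", "HMGR0130I")


def find_lock_failures(lines):
    """Return (line_number, message_id) pairs for lines that contain a watched ID.

    Hypothetical helper: it does a plain substring search and knows
    nothing else about the log format.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for message_id in WATCHED_IDS:
            if message_id in line:
                hits.append((number, message_id))
    return hits


if __name__ == "__main__":
    sample = [
        "[10/26/04 8:41:38:887 CDT] 00000029 CoordinationL A   WTRN0100_GENERIC_ERROR",
        "[10/26/04 8:41:39:100 CDT] 00000029 RecoveryHandl A   WTRN0100E: An attempt to ",
        "[10/26/04 8:42:34:921 CDT] 00000027 HAGroupImpl   I   HMGR0130I: The local member ",
    ]
    for number, message_id in find_lock_failures(sample):
        print(number, message_id)
```

A WTRN0100E hit followed shortly by HMGR0130I for a WAS_TRANSACTIONS group matches the failure pattern described in this section.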

To troubleshoot the cause of failure to acquire the file lock, check the following factors:

  • If you have "Enable high availability for persistent services" enabled on the server cluster and are using a NAS device for the transaction logs, check that the DFS level on your machine is compatible with the DFS level on the NAS device. If the two levels are not compatible, the transaction logs cannot be accessed.

  • If you are running as non-root, check that the numeric user ID and group ID of the non-root user match on all machines involved with peer recovery.

  • If you have a policy defined for the transaction service, review the policy to ensure that you are giving control to the correct servers (you might need to add to or reorder the preferred server list).
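The non-root check above can be done by running the `id` command as the non-root user on each machine and comparing the numeric values. The following is a minimal sketch under stated assumptions: the host names and user names are invented, the function names are hypothetical, and the `id` output is assumed to have the usual `uid=N(name) gid=N(name) ...` form.

```python
import re

# Matches the leading "uid=1001(name) gid=1001(name)" part of `id` output.
ID_PATTERN = re.compile(r"uid=(\d+)\S*\s+gid=(\d+)")


def parse_id_output(text):
    """Extract the numeric (uid, gid) pair from one line of `id` output."""
    match = ID_PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognized id output: {text!r}")
    return int(match.group(1)), int(match.group(2))


def group_hosts_by_ids(id_output_by_host):
    """Group host names by their (uid, gid) pair.

    More than one key in the result means the IDs do not match everywhere.
    """
    groups = {}
    for host, text in id_output_by_host.items():
        groups.setdefault(parse_id_output(text), []).append(host)
    return groups


if __name__ == "__main__":
    # Hypothetical hosts and `id` output, for illustration only.
    outputs = {
        "hostA": "uid=1001(wasuser) gid=1001(wasgrp) groups=1001(wasgrp)",
        "hostB": "uid=1002(wasuser) gid=1001(wasgrp) groups=1001(wasgrp)",
    }
    groups = group_hosts_by_ids(outputs)
    if len(groups) > 1:
        print("ID mismatch:", groups)
```

The comparison uses the numeric IDs rather than the names, because the same user name can map to different numbers on different machines.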

 

XAER_NOTA exception logged after server fails

If an application server fails, and the end transaction record is not forced to disk immediately, the transaction might or might not be recovered.

WebSphere does not force the end record to the log, so it is up to the operating system or network file system to decide when to write it to disk. The record is forced if the server is shut down cleanly. The transaction service is designed to cope with the case of the end record never being written to disk. In that case, it gets XAER_NOTA returned from the database, and a message like the following is logged:

[date time] 00000057 WSRdbXaResour E   DSRA0302E:  XAException occurred.  
                                        Error code is: XAER_NOTA (-4).  Exception is: XAER_NOTA

If there is a transaction without an end record left in the transaction log, the transaction service tries to check with the database. If the transaction has completed, the database indicates that there is nothing to complete (XAER_NOTA). This is normal behavior, and not an error.
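The reasoning above can be illustrated with a small model. This is only a sketch of the decision logic, not WebSphere code: `XAError`, `resolve_in_doubt`, and the callable passed to it are hypothetical names, and the only value taken from the source is the error code -4 shown in the log message.

```python
# Error code as shown in the log message: "XAER_NOTA (-4)".
XAER_NOTA = -4


class XAError(Exception):
    """Hypothetical stand-in for an XA exception carrying an error code."""

    def __init__(self, error_code):
        super().__init__(f"XA error {error_code}")
        self.error_code = error_code


def resolve_in_doubt(complete_on_database):
    """Model recovery of a transaction that has no end record in the log.

    complete_on_database is a hypothetical callable that asks the database
    to complete the transaction and raises XAError on failure.
    """
    try:
        complete_on_database()
    except XAError as error:
        if error.error_code == XAER_NOTA:
            # The database does not know the transaction, so it has
            # already completed. Nothing is left to do.
            return "already-completed"
        raise  # Any other XA error is a real problem.
    return "completed-now"


if __name__ == "__main__":
    def database_has_no_record():
        raise XAError(XAER_NOTA)

    print(resolve_in_doubt(database_has_no_record))
```

In this model, XAER_NOTA is the only error code treated as benign. Any other code is re-raised so that it is not silently ignored.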