(ZOS) InFlight work and presumed abort mode
Presumed abort mode is activated when a failure occurs before a distributed transaction starts to commit.
When a distributed transaction spans several servers, the resource managers involved in that work may hold transactional locks. If a failure occurs before that distributed transaction has started to commit, the product and the resource managers go into presumed abort mode. In this mode, the resource managers roll back the transaction.
- The effect of a server failure or communications failure will vary depending on which server is running the work at the time of failure.
- An OTS timeout may be required to roll back the subordinate branches of the distributed transaction tree.
Example: A common case is a web client on server A that drives a session bean in server B. That session bean has executed work against entity beans in servers C and D. All of the servers are involved in the same distributed, global transaction. Suddenly, server B fails while the session bean is InFlight (meaning that it had not yet started to commit). Servers C and D are waiting for more work or for the start of the two-phase commit protocol, and while they wait, the resource managers may still hold the transactional locks. The server roles are as follows:
- Server A: Servlet/JavaServer Page executed
- Server B: Session bean accessed
- Server C: Entity bean accessed
- Server D: Entity bean accessed
After the timeout occurs, because the session bean is InFlight at the time of the failure, the product rolls back the transaction branch.
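The rollback decision in this example can be sketched as follows. This is a minimal illustration only, not the product's implementation: the `Branch` class, the `State` enum, and the `onTimeout` method are hypothetical names introduced here, and the only behavior taken from the text is that a branch that is still InFlight when the timeout fires is rolled back and its locks are released.

```java
import java.util.ArrayList;
import java.util.List;

public class PresumedAbortSketch {
    /** Hypothetical branch states; only InFlight is named in the text above. */
    enum State { IN_FLIGHT, PREPARED, ROLLED_BACK }

    /** Hypothetical stand-in for a subordinate branch on server C or D. */
    static final class Branch {
        final String server;
        State state = State.IN_FLIGHT;
        boolean locksHeld = true;

        Branch(String server) {
            this.server = server;
        }

        /** Called when the timeout fires and nothing more has arrived from server B. */
        void onTimeout() {
            if (state == State.IN_FLIGHT) {
                // Commit never started, so abort is presumed:
                // roll back the branch and release its locks.
                state = State.ROLLED_BACK;
                locksHeld = false;
            }
            // Any other state is left untouched; resolving it is outside this sketch.
        }
    }

    public static void main(String[] args) {
        List<Branch> branches = new ArrayList<>();
        branches.add(new Branch("C"));
        branches.add(new Branch("D"));
        for (Branch b : branches) {
            b.onTimeout();
            System.out.println("Server " + b.server + ": " + b.state
                    + ", locksHeld=" + b.locksHeld);
        }
    }
}
```

The sketch deliberately does nothing for a branch that is no longer InFlight, because the text above describes only the case where commit has not started.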
When local resource managers are involved, Resource Recovery Services (RRS) ensures that they are called to perform presumed abort processing. During recovery, RRS works with the resource managers to ensure that the recovery is done properly. When a failure occurs while work is InFlight, RRS directs the resource managers involved in the local unit of recovery (UR) to roll back.
The product always assumes that there is recovery to do. Every time a server comes up, its behavior depends on the mode in which it is running:
- If the server is running in restart/recovery mode, the product checks to see whether there is any recovery required. If recovery is required, the product attempts to complete the recovery and either succeeds or terminates.
- If the server is running normally, the restart/recovery transaction does not have to complete before the server takes on new work. After the server determines what the restart work is, it begins to take in new work items. Processing of the restart/recovery transaction continues along with the processing of new work items.
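The difference between the two modes can be sketched as follows. This is an illustration under stated assumptions, not the product's startup code: `Mode`, `Startup`, `recover`, and `start` are hypothetical names, the pending work is represented as a plain list of strings, and the sketch assumes that a server in restart/recovery mode does not take on new work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class StartupModeSketch {
    enum Mode { RESTART_RECOVERY, NORMAL }

    /** Hypothetical startup result: the recovery task and whether new work is accepted. */
    static final class Startup {
        final CompletableFuture<Integer> recovery;
        final boolean acceptsNewWork;

        Startup(CompletableFuture<Integer> recovery, boolean acceptsNewWork) {
            this.recovery = recovery;
            this.acceptsNewWork = acceptsNewWork;
        }
    }

    /** Stand-in for resolving outstanding work; returns how many items were resolved. */
    static int recover(List<String> pending) {
        return pending.size();
    }

    static Startup start(Mode mode, List<String> pending) {
        if (mode == Mode.RESTART_RECOVERY) {
            // Recovery is attempted to completion before startup returns.
            // Assumption: no new work is accepted in this mode.
            int resolved = recover(pending);
            return new Startup(CompletableFuture.completedFuture(resolved), false);
        }
        // Normal mode: the restart work is known (pending), so new work is accepted
        // immediately while recovery continues in the background.
        CompletableFuture<Integer> background =
                CompletableFuture.supplyAsync(() -> recover(pending));
        return new Startup(background, true);
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("UR-1", "UR-2"); // hypothetical identifiers
        Startup s = start(Mode.NORMAL, pending);
        System.out.println("accepts new work: " + s.acceptsNewWork);
        System.out.println("items recovered: " + s.recovery.join());
    }
}
```

Returning the recovery task as a future keeps the two modes comparable: in restart/recovery mode it is already complete when startup returns, while in normal mode it may still be running alongside new work.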