(ZOS) When might PRR fail to recover servers
The major reason for peer restart and recovery (PRR) failure is a network outage during recovery. If the system cannot reach the superior or subordinate because the network is down, communications cannot be reestablished and the transaction cannot be completely resolved.
Deprecated feature: Peer restart and recovery functionality is deprecated. Use the integrated high availability support for the transaction service subcomponent, instead of peer restart and recovery, for transaction recovery.
When the product cannot automatically resolve all of the units of recovery (URs) returned from RRS at restart, RRS does not allow the application server to move back to its home (original) system. If the application server tries to move back while URs are still incomplete, you receive an error code (C9C2186A) and a message describing an F02 return code from ATRIBRS. To get around this, manual resolution is required so that the server is marked "restart anywhere." RRS applies that marking once all of the URs in which the product is involved are forgotten. If RRS does not mark the server as restart anywhere, the server, upon failure, must start on the recovery system, which prevents you from moving the server back to its true home system.
The ultimate goal is to resolve all transactions in which the application server is involved (the interests owned by the server instance that could not complete recovery) and then, if necessary, remove any application server interests that remain in those URs. Once that is complete, browsing the RM data log shows whether the resource manager is marked "restart anywhere."
You want to see:
RESOURCE MANAGER=BSS00.SY1.BBOASR4A.IBM RESOURCE MANAGER MAY RESTART ON ANY SYSTEM
You do not want to see:
RESOURCE MANAGER=BSS00.SY2.BBOASR4A.IBM RESOURCE MANAGER MUST RESTART ON SYSTEM SY2
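The check on the RM data log entry can be sketched as a small script. This is only an illustration: the pattern is inferred from the two sample lines in this topic, real log output may be laid out differently, and the function name restart_status is hypothetical rather than part of the product or RRS.

```python
import re

# Pattern inferred only from the two sample lines in this topic (assumption);
# actual RM data log output may differ in spacing or line layout.
_RM_LINE = re.compile(
    r"RESOURCE MANAGER=(?P<name>\S+)\s+RESOURCE MANAGER "
    r"(?:MAY RESTART ON ANY SYSTEM|MUST RESTART ON SYSTEM (?P<system>\S+))"
)


def restart_status(line):
    """Hypothetical helper: classify one RM data log entry.

    Returns (rm_name, restart_anywhere, required_system),
    or None when the line does not look like a restart-status entry.
    """
    match = _RM_LINE.search(line)
    if match is None:
        return None
    system = match.group("system")  # None when the RM may restart anywhere
    return match.group("name"), system is None, system


good = "RESOURCE MANAGER=BSS00.SY1.BBOASR4A.IBM RESOURCE MANAGER MAY RESTART ON ANY SYSTEM"
bad = "RESOURCE MANAGER=BSS00.SY2.BBOASR4A.IBM RESOURCE MANAGER MUST RESTART ON SYSTEM SY2"

for entry in (good, bad):
    name, anywhere, system = restart_status(entry)
    if anywhere:
        print(f"{name}: marked restart anywhere")
    else:
        print(f"{name}: must restart on {system}; unresolved interests may remain")
```

A result with restart_anywhere set to False suggests that interests still remain in one or more URs and that manual resolution is not yet complete.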
Related:
Transactional high availability
Configure transaction properties for peer recovery