
Messaging engine recovery from exception conditions

In service integration, exception conditions fall into four categories: those that do not require a messaging engine to restart, those that require an automatic restart of the messaging engine, those that are detected by explicit health monitoring and handled by the HAManager, and those that require user intervention.


Recovery with the messaging engine running

A messaging engine can handle certain exception conditions without requiring the messaging engine to restart or fail over. The exception condition is corrected automatically and an entry is added to the system error log that explains the exception and suggests any user actions. The messaging engine continues to run and to honor the quality of service specified for the messages it is processing.


Recovery with automatic restart of the messaging engine (local exceptions)

A messaging engine can recover from local exceptions by an automatic restart of the messaging engine, either on its current server or on an alternative server. For example, if a messaging engine cannot connect to its data store, the cause might be that the server in which the messaging engine runs cannot create a connection to the data store, although another server in the same cluster can. In a high availability configuration (that is, one in which failover is enabled), the HAManager stops and disables the messaging engine in the current server and fails it over to a new server. The disabled messaging engine is automatically enabled after 30 seconds.


Recovery from exceptions detected by explicit health monitoring

A messaging engine cannot detect exceptions such as a spinning thread (a thread that is trapped in a loop and no longer performs useful work) or a deadlock (two threads that block each other), but explicit health monitoring can. The HAManager provides such monitoring, and periodically tests the health of the messaging engine. If the HAManager detects that a messaging engine that uses a data store cannot run properly, the HAManager stops and disables the messaging engine. If the messaging engine uses a file store, the HAManager instead shuts down the server that hosts the messaging engine. If the server is in a cluster and the policy of the messaging engine allows failover, the HAManager restarts the messaging engine on an alternative server. If the messaging engine uses a data store, the disabled messaging engine is automatically enabled after 30 seconds.
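
The decision rules in this section can be summarized as a small lookup. The following Python sketch is illustrative only: the names EngineState, Store, and health_failure_actions are hypothetical and are not part of any HAManager or service integration API. It assumes only the behavior described in the preceding paragraph.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Store(Enum):
    DATA_STORE = "data store"
    FILE_STORE = "file store"


@dataclass(frozen=True)
class EngineState:
    """Hypothetical description of a messaging engine, for illustration only."""
    store: Store
    in_cluster: bool
    failover_allowed: bool


# From the text: a disabled data store engine is enabled again after 30 seconds.
REENABLE_DELAY_SECONDS = 30


def health_failure_actions(engine: EngineState) -> List[str]:
    """Return, in order, the actions described above for a failed health check."""
    actions = []
    if engine.store is Store.DATA_STORE:
        actions.append("stop and disable messaging engine")
    else:
        actions.append("shut down hosting server")
    if engine.in_cluster and engine.failover_allowed:
        actions.append("restart messaging engine on alternative server")
    if engine.store is Store.DATA_STORE:
        actions.append(
            f"enable disabled messaging engine after {REENABLE_DELAY_SECONDS} seconds"
        )
    return actions
```

For example, a clustered data store engine whose policy allows failover yields all three actions, whereas a file store engine on a server that is not in a cluster yields only the server shutdown.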


Recovery that requires user intervention (global exceptions)

A messaging engine cannot recover from global exceptions by restarting or failing over the messaging engine. For example, if the data store for a messaging engine becomes corrupted, the problem is not resolved by running the messaging engine on a different server, because the messaging engine encounters the same problem there. If a messaging engine in this situation were to be failed over, it would be failed over continually because it could not run in any server. There would be unwanted disruption to the cluster as servers attempted to run the messaging engine and were shut down. To avoid such a situation, if a global exception occurs, the messaging engine logs an error, stops processing messages, and is not failed over. The messaging engine cannot be restarted until you correct the global exception condition and restart the server.
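
The difference between local and global exceptions can be illustrated with a toy simulation. This Python sketch is not product code: the function try_failover, the server names, and the can_run predicates are hypothetical, chosen only to show why failing over helps in the local case and disrupts every server in the global case.

```python
from typing import Callable, List, Optional, Tuple


def try_failover(
    servers: List[str], can_run: Callable[[str], bool]
) -> Tuple[Optional[str], List[str]]:
    """Try servers in order.

    Returns the first server that can run the engine (or None if no server
    can), together with the list of servers on which the attempt failed.
    """
    failed = []
    for server in servers:
        if can_run(server):
            return server, failed
        failed.append(server)
    return None, failed


servers = ["server1", "server2", "server3"]

# Local exception: only server1 cannot connect to the data store,
# so failover to server2 resolves the problem.
local_result = try_failover(servers, lambda s: s != "server1")

# Global exception: the data store is corrupted, so no server can run
# the engine and every attempt disrupts another server.
global_result = try_failover(servers, lambda s: False)
```

In the local case the engine comes to rest on server2 after one failed attempt. In the global case every server in the list fails, which is the disruption that the messaging engine avoids by not failing over.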

  • Injecting failures into a high availability system