9.1.3 Failover terms and mechanisms

An object or an application includes two distinct aspects:

functions - process availability
data - data availability

If a function is not associated with individual data or state, high availability is easy to achieve: simply restart the function process when the old process crashes. In reality, however, functions are associated with individual data or state, and some keep persisted data in a database or in files, as Entity EJBs do. The data management systems must therefore be highly available to all processes, and data integrity must be ensured, because the failed process might have damaged the data.
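
The restart-on-crash case for a function without data or state can be sketched in a few lines of Python. This is only an illustration under stated assumptions: supervise, make_flaky_worker, and max_restarts are hypothetical names, and a plain callable stands in for launching a real process.

```python
def supervise(start, max_restarts=3):
    """Call start() until it finishes cleanly, restarting it after each crash.

    Returns (result, restarts). Re-raises the last error once max_restarts
    restarts have been used up. All names here are hypothetical.
    """
    restarts = 0
    while True:
        try:
            return start(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1  # nothing to repair: the function has no data of its own


def make_flaky_worker(crashes):
    """Build a stand-in worker that crashes a fixed number of times, then succeeds."""
    remaining = {"crashes": crashes}  # test-harness counter, not application state

    def worker():
        if remaining["crashes"] > 0:
            remaining["crashes"] -= 1
            raise RuntimeError("simulated crash")
        return "done"

    return worker


if __name__ == "__main__":
    result, restarts = supervise(make_flaky_worker(2))
    print(result, "after", restarts, "restarts")
```

The sketch works only because the restarted function has nothing to recover. Once a process owns data or state, a restart alone is not enough.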

Failover

Failover refers to moving a single process from the primary system to the backup system in the cluster. Failure recovery includes several steps:

Stop and exit the failed process.
Release the resources.
Detach the disk array.
Reattach the disk array to the backup system.
Check the disk and file system.
Repair data integrity.
Acquire all resources for running the process in the backup system.
Start the process in the backup system.
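
The ordering of these recovery steps can be sketched as a small Python driver. This is a minimal sketch, not the procedure of any particular cluster product: the step names, run_failover, and FailoverError are hypothetical, and each action is a caller-supplied callable standing in for the real operation.

```python
# Hypothetical step names that mirror the recovery steps listed above, in order.
FAILOVER_STEPS = [
    "stop_failed_process",
    "release_resources",
    "detach_disk_array",
    "reattach_disk_array_to_backup",
    "check_disk_and_file_system",
    "repair_data_integrity",
    "acquire_resources_on_backup",
    "start_process_on_backup",
]


class FailoverError(Exception):
    """Raised when a step fails; records how far the failover got."""

    def __init__(self, step, completed):
        super().__init__(f"failover stopped at step {step!r}")
        self.step = step
        self.completed = completed


def run_failover(actions):
    """Run every step strictly in order and stop at the first failure.

    actions maps each step name to a zero-argument callable (an assumption
    made for this sketch). Returns the list of completed steps.
    """
    completed = []
    for step in FAILOVER_STEPS:
        try:
            actions[step]()
        except Exception as exc:
            # Later steps depend on earlier ones, so do not continue.
            raise FailoverError(step, list(completed)) from exc
        completed.append(step)
    return completed


if __name__ == "__main__":
    noop_actions = {step: (lambda: None) for step in FAILOVER_STEPS}
    for done in run_failover(noop_actions):
        print("completed:", done)
```

Stopping at the first failed step reflects the dependency between steps: for example, the process should not be started on the backup system before the disk and file system check has passed.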

This failover process takes several minutes after the fault is detected. The approach can be used for both function-centric and data-centric applications, in both Active/Passive and Active/Active configurations.

Fail back, fallback

Fail back or fallback is similar to failover, but moves the process from the backup system to the primary system once the primary system is back online. In a mutual takeover configuration, the backup node is also running its own original application, as shown in Figure 9-8, so failing back improves the performance of both applications.
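
The effect of failing back in a mutual takeover configuration can be illustrated with a toy placement model in Python. Everything in this sketch is an assumption made for illustration: the Cluster class, its methods, and the node and application names are hypothetical and do not correspond to any product API.

```python
class Cluster:
    """Toy model that tracks which node currently runs each application."""

    def __init__(self, placement):
        self.primary = dict(placement)  # application -> its home (primary) node
        self.current = dict(placement)  # application -> node running it now

    def failover(self, failed_node, backup_node):
        """Move every application on the failed node to the backup node."""
        for app, node in self.current.items():
            if node == failed_node:
                self.current[app] = backup_node

    def failback(self, recovered_node):
        """Return applications to their home node once it is back online."""
        for app, home in self.primary.items():
            if home == recovered_node:
                self.current[app] = home

    def apps_on(self, node):
        """List the applications currently running on a node."""
        return sorted(app for app, n in self.current.items() if n == node)


if __name__ == "__main__":
    # Mutual takeover: each node runs its own application and backs up the other.
    cluster = Cluster({"app1": "nodeA", "app2": "nodeB"})
    cluster.failover("nodeA", "nodeB")
    print("after failover, nodeB runs:", cluster.apps_on("nodeB"))
    cluster.failback("nodeA")
    print("after fail back, nodeA runs:", cluster.apps_on("nodeA"))
    print("after fail back, nodeB runs:", cluster.apps_on("nodeB"))
```

After the failover, one node carries both applications. After the fail back, each node again runs only its own application, which is why both applications benefit.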