7.4 Disaster recovery

High availability and disaster recovery are related but distinct topics, with different goals and different requirements. High availability (HA) provides redundancy and automatic failover so that the service remains available to users when an individual component fails.

Disaster recovery (often referred to as DR) deals with true disasters, such as a power outage that takes down an entire data center or any other form of total, catastrophic system failure. This section discusses disaster recovery.

WebSphere clustering provides high availability. However, WebSphere clustering alone should not be used to provide disaster recovery. WebSphere cells should not span data centers, and a disaster recovery environment must be physically distant from the main production site, so that it is not taken down by the localized effects of the same disaster that took down the production environment.

An advantage of running WebSphere on micropartitions is that the cost of having a "standby" partition is minimal. You can completely automate the addition of LPARs into a WebSphere cluster by preparing each LPAR in advance: with a WebSphere node already installed and configured into the cell, you then only need to activate the LPAR and assign resources to it. This ability allows customers to be very flexible with their WebSphere infrastructures.
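As a rough illustration of what that automation might look like, the following Python sketch builds the command lines that a script could send to the Hardware Management Console (HMC) to activate a prepared standby LPAR and then add processing units and memory to it. This is only a sketch under stated assumptions: it assumes the partition is managed through the HMC command line (chsysstate and chhwres), and the function name and the managed system, partition, and profile names are hypothetical. The exact flags should be checked against the HMC command reference for your release. The sketch only constructs the commands and does not run them.

```python
from typing import List


def standby_activation_commands(
    managed_system: str,
    lpar: str,
    profile: str,
    proc_units: float,
    memory_mb: int,
) -> List[List[str]]:
    """Build (but do not run) HMC commands to bring a standby LPAR online.

    All argument values are placeholders supplied by the caller; this
    helper is hypothetical and not part of any IBM tooling.
    """
    return [
        # Activate the partition using its prepared profile.
        ["chsysstate", "-r", "lpar", "-m", managed_system,
         "-o", "on", "-n", lpar, "-f", profile],
        # Dynamically add shared processing units to the partition.
        ["chhwres", "-r", "proc", "-m", managed_system,
         "-o", "a", "-p", lpar, "--procunits", str(proc_units)],
        # Dynamically add memory (quantity in MB) to the partition.
        ["chhwres", "-r", "mem", "-m", managed_system,
         "-o", "a", "-p", lpar, "-q", str(memory_mb)],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in standby_activation_commands(
        "Server-9117-MMA", "was_standby", "normal", 0.5, 2048
    ):
        print(" ".join(cmd))
```

In practice, a script like this would typically send each command to the HMC over SSH and check the return code before moving on. Because the WebSphere node is already federated into the cell, the remaining step after activation is starting the node agent and application servers on the partition.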

Each customer environment is different and has varying requirements, so no single design covers all possible configurations. For the purpose of illustration, in the following section we describe a simple environment with two System p servers. One server acts as the production environment, and the other server performs the functions of preproduction and testing.