Availability matrix

We all talk about uptime, and everybody wants 100% uptime. In reality, as we discussed previously, a system with 100% uptime is prohibitively expensive to implement. For some applications, 99% uptime is adequate, which allows an average of 14.4 minutes of downtime per day (see Table 8-1). Other applications require 99.9% uptime or higher. Availability levels of 99%, 99.9%, 99.99%, and 99.999% are commonly called two nines, three nines, four nines, and five nines. Five nines is generally regarded as the highest availability achievable at a reasonable cost, and many vendors offer such solutions. These vendors include:
IBM with WebSphere WLM and Cluster
IBM with WebSphere MQ Cluster
IBM with HACMP on AIX
HP with MC/ServiceGuard on HP-UX
Sun Microsystems with Sun Cluster on Solaris
VERITAS with VERITAS Cluster Server
Microsoft with Microsoft Cluster Server on Windows
Oracle with Oracle Parallel Server or Real Application Clusters
DB2 Parallel Server

The next sections describe how to implement a WebSphere HA system with each of these products.

Table 8-1: Downtime allowed at each availability level

Nines     Uptime      Downtime per year   Downtime per week   Downtime per day
-         90%         36.5 days           16.8 hours          2.4 hours
-         95%         18.3 days           8.4 hours           1.2 hours
-         98%         7.3 days            3.4 hours           28.8 minutes
Two 9s    99%         3.7 days            1.7 hours           14.4 minutes
-         99.5%       1.8 days            50.4 minutes        7.2 minutes
-         99.8%       17.5 hours          20.2 minutes        2.9 minutes
Three 9s  99.9%       8.8 hours           10.1 minutes        1.4 minutes
Four 9s   99.99%      52.6 minutes        1 minute            8.6 seconds
Five 9s   99.999%     5.3 minutes         6 seconds           864 milliseconds
Six 9s    99.9999%    31.5 seconds        604.8 milliseconds  86.4 milliseconds
Seven 9s  99.99999%   3.2 seconds         60.5 milliseconds   8.6 milliseconds
Eight 9s  99.999999%  315.4 milliseconds  6 milliseconds      0.9 milliseconds
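Every entry in Table 8-1 comes from one calculation: the unavailable fraction (1 minus the uptime) multiplied by the length of the period. The short Python sketch below reproduces the table's values under the assumption of a 365-day year. The function names are illustrative and are not part of any product.

```python
# Allowed downtime for a given availability level, as tabulated in Table 8-1.
# Assumption: a 365-day year and a 7-day week (leap years ignored).

SECONDS_PER_DAY = 24 * 60 * 60
PERIODS = {
    "year": 365 * SECONDS_PER_DAY,
    "week": 7 * SECONDS_PER_DAY,
    "day": SECONDS_PER_DAY,
}


def allowed_downtime_seconds(uptime_percent: float, period: str) -> float:
    """Return the downtime (in seconds) permitted in one period at the given uptime."""
    unavailable_fraction = 1 - uptime_percent / 100
    return unavailable_fraction * PERIODS[period]


def humanize(seconds: float) -> str:
    """Format a duration in the largest unit that fits, to one decimal place."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60), ("seconds", 1)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds * 1000:.1f} milliseconds"


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        row = [humanize(allowed_downtime_seconds(pct, p)) for p in ("year", "week", "day")]
        print(f"{pct}%: " + ", ".join(row))
```

Because the arithmetic is a plain multiplication, you can extend the same functions to any other period, such as a month or a maintenance window.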

Availability matrix - "nine" rule

Five nines availability allows downtime of 864 milliseconds per day, 6 seconds per week, and 5.3 minutes per year (see Table 8-1). For all of these clustering techniques with IP takeover, a typical database failover takes two to three minutes, so MTTR = 2.5 minutes. Since availability A = MTBF / (MTBF + MTTR), achieving 99.999% availability requires MTBF = MTTR × A / (1 − A) = 2.5 × 99,999 ≈ 250,000 minutes, or about 174 days. In practice, this means no more than two failovers per year. Two 2.5-minute failovers, roughly one every 183 days, consume 5 minutes of the 5.3-minute annual budget. For Oracle Parallel Server and Real Application Clusters, failover can be done in seconds because the surviving instances are already running and IP takeover is not needed.
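You can check the MTBF figure with the steady-state relation A = MTBF / (MTBF + MTTR), which we assume here. The Python sketch below solves that relation for MTBF. It also counts how many 2.5-minute failovers fit into the five-nines annual downtime budget. The helper names are hypothetical.

```python
# Required MTBF for a target availability, assuming A = MTBF / (MTBF + MTTR).

MINUTES_PER_YEAR = 365 * 24 * 60


def required_mtbf(availability: float, mttr: float) -> float:
    """Smallest MTBF (in the same unit as mttr) that achieves the target availability."""
    return mttr * availability / (1 - availability)


def max_failovers_per_year(availability: float, mttr_minutes: float) -> float:
    """How many failovers of length mttr_minutes fit into the annual downtime budget."""
    budget_minutes = (1 - availability) * MINUTES_PER_YEAR
    return budget_minutes / mttr_minutes


if __name__ == "__main__":
    mttr = 2.5  # minutes: a typical IP-takeover database failover
    mtbf = required_mtbf(0.99999, mttr)
    print(f"MTBF must be at least {mtbf:.0f} minutes (about {mtbf / (24 * 60):.0f} days)")
    print(f"Failovers allowed per year: {max_failovers_per_year(0.99999, mttr):.1f}")
```

The second function shows why the target is framed as "two failovers per year". The budget allows slightly more than two 2.5-minute outages, but not three.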

Some businesses require 7x24x365 availability, while others need only 6x20 or 5x12 (days per week × hours per day). A shorter schedule does not reduce the need for high availability if the business requires minimal interruption during its operating hours. Because we cannot predict when outages will occur, clustering techniques that keep MTTR short increase available time even for a business that operates only 5x12.
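The Python sketch below makes the operating-hours point concrete. It computes the downtime budget when availability is measured only over business hours. It assumes 365/7 weeks per year, so that 7x24 matches the 365-day year of Table 8-1, and it reads each pattern as days per week × hours per day. The function names are illustrative.

```python
# Downtime budget measured only over a business's operating hours.
# Assumption: 365/7 weeks per year, so "7x24" equals the 365-day year of Table 8-1.

WEEKS_PER_YEAR = 365 / 7


def operating_hours_per_year(days_per_week: int, hours_per_day: int) -> float:
    """Hours per year during which the business expects the service to be up."""
    return days_per_week * hours_per_day * WEEKS_PER_YEAR


def downtime_budget_hours(uptime_percent: float, days_per_week: int, hours_per_day: int) -> float:
    """Unplanned downtime allowed within operating hours over one year."""
    return (1 - uptime_percent / 100) * operating_hours_per_year(days_per_week, hours_per_day)


if __name__ == "__main__":
    for label, days, hours in (("7x24", 7, 24), ("6x20", 6, 20), ("5x12", 5, 12)):
        budget = downtime_budget_hours(99.9, days, hours)
        print(f"{label}: {operating_hours_per_year(days, hours):.0f} operating hours, "
              f"{budget * 60:.0f} minutes of downtime allowed at 99.9%")
```

At the same uptime percentage, a 5x12 schedule leaves far fewer minutes for outages than 7x24. Each 2.5-minute failover therefore uses up a larger share of the budget.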

Even when clustering keeps a service available, performance may degrade after a failure. The surviving systems must carry the extra load until the failed system is repaired and rejoins the cluster.

Therefore, we suggest describing availability using three factors:
System uptime percentage
Business operation hours and pattern
Performance availability requirement
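One way to express the performance availability requirement is as the capacity that must remain while a member is down. The Python sketch below assumes identical cluster members that share load evenly, which is a simplification. It finds the smallest cluster that still delivers a required fraction of full capacity with one member failed. The 70% figure is a hypothetical example, not a recommendation.

```python
# Performance availability after a node failure, assuming identical members that
# share load evenly (a simplifying assumption, not a rule of any clustering product).

from fractions import Fraction


def surviving_capacity(nodes: int, failed: int = 1) -> Fraction:
    """Fraction of full-cluster capacity left while `failed` members are down."""
    if not 0 <= failed < nodes:
        raise ValueError("need at least one surviving node")
    return Fraction(nodes - failed, nodes)


def min_nodes_for(required_capacity: Fraction, failed: int = 1) -> int:
    """Smallest cluster that still delivers `required_capacity` with `failed` nodes down."""
    if failed > 0 and required_capacity >= 1:
        raise ValueError("no cluster keeps 100% capacity with members down")
    nodes = failed + 1
    while surviving_capacity(nodes, failed) < required_capacity:
        nodes += 1
    return nodes


if __name__ == "__main__":
    # Hypothetical requirement: peak load needs 70% of the full cluster's capacity.
    need = Fraction(7, 10)
    n = min_nodes_for(need)
    print(f"{n} nodes keep {float(surviving_capacity(n)):.0%} capacity after one failure")
```

Exact fractions avoid floating-point surprises at boundary cases. For example, a 75% requirement is met exactly by four nodes.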

We should design a high availability system to satisfy the uptime requirement during operation hours and to meet the performance availability requirement.

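Most business applications do not require 7x24 operation, so software and hardware upgrades can be performed during scheduled maintenance windows. For businesses that require 7x24 service, clustering techniques support rolling upgrades and hot replacement. An administrator manually fails over the workload from one system to another, upgrades or replaces the now-idle system, lets it rejoin the cluster, and then repeats the process for the next system.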
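The Python sketch below models a rolling upgrade as a loop over the cluster members. It is a toy simulation, not WebSphere or cluster-manager tooling. The Node class is hypothetical, and "failing over" is reduced to marking a member out of service while its peers carry the load.

```python
# A toy model of a rolling upgrade: take one member out of service at a time
# (the manual failover), upgrade it, and let it rejoin before moving on.
# The Node class and its fields are hypothetical; real clusters use their own tools.

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    in_service: bool = True


def rolling_upgrade(cluster: list[Node], new_version: str) -> list[int]:
    """Upgrade every node in turn; return how many nodes were serving during each step."""
    serving_during_upgrade = []
    for node in cluster:
        node.in_service = False       # fail over: peers take over this node's workload
        serving_during_upgrade.append(sum(n.in_service for n in cluster))
        node.version = new_version    # upgrade or hot-replace the idle node
        node.in_service = True        # rejoin the cluster before the next step
    return serving_during_upgrade


if __name__ == "__main__":
    nodes = [Node("appsrv1", "v1"), Node("appsrv2", "v1"), Node("appsrv3", "v1")]
    counts = rolling_upgrade(nodes, "v2")
    print("serving nodes per step:", counts)
    print("versions:", [n.version for n in nodes])
```

With three members, two stay in service at every step, and the service is never fully down. This is also the period in which the performance availability requirement applies.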


WebSphere is a trademark of the IBM Corporation in the United States, other countries, or both.

 

IBM is a trademark of the IBM Corporation in the United States, other countries, or both.