IBM


1.2.2 Availability matrix

We all talk about uptime, and everybody wants 100% uptime. In reality, a system with 100% uptime is prohibitively expensive to implement. For some applications, 99% uptime is adequate, which leaves an average downtime of about 14 minutes per day (see Table 1-1). Other applications require 99.9% uptime or higher. Uptime levels of 99%, 99.9%, 99.99%, and 99.999% are commonly called two nines, three nines, four nines, and five nines. Five nines is generally regarded as the best availability achievable at reasonable cost, and many vendors offer such solutions. Examples include:

- IBM with WebSphere WLM and clustering, WebSphere MQ Cluster, HACMP on AIX, TSA, or the Database Partitioning feature in DB2 UDB Enterprise Server Edition

- Sun Microsystems with Sun Cluster on Solaris™

- VERITAS with VERITAS Cluster Server

Table 1-1 Availability matrix - nine rule

9s         Percentage of uptime  Downtime per year    Downtime per week    Downtime per day
-          90%                   36.5 days            16.8 hours           2.4 hours
-          95%                   18.3 days            8.4 hours            1.2 hours
-          98%                   7.3 days             3.4 hours            28.8 minutes
Two 9s     99%                   3.7 days             1.7 hours            14.4 minutes
-          99.5%                 1.8 days             50.4 minutes         7.2 minutes
-          99.8%                 17.5 hours           20.2 minutes         2.9 minutes
Three 9s   99.9%                 8.8 hours            10.1 minutes         1.4 minutes
Four 9s    99.99%                52.5 minutes         1 minute             8.6 seconds
Five 9s    99.999%               5.3 minutes          6 seconds            864 milliseconds
Six 9s     99.9999%              31.5 seconds         604.8 milliseconds   86.4 milliseconds
Seven 9s   99.99999%             3.2 seconds          60.5 milliseconds    8.6 milliseconds
Eight 9s   99.999999%            315.4 milliseconds   6 milliseconds       0.9 milliseconds
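The figures in Table 1-1 follow from multiplying the unavailable fraction by the length of the period. The following is a minimal sketch of that arithmetic, assuming continuous (24x7) operation and a 365-day year. The function and constant names are illustrative and do not come from any product.

```python
# Downtime allowed by an uptime percentage, assuming continuous (24x7)
# operation and a 365-day year. All names here are illustrative.

SECONDS_PER_DAY = 24 * 60 * 60

# Length of each reporting period in seconds.
PERIODS = {
    "year": 365 * SECONDS_PER_DAY,
    "week": 7 * SECONDS_PER_DAY,
    "day": SECONDS_PER_DAY,
}


def allowed_downtime_seconds(uptime_percent: float, period: str) -> float:
    """Return the downtime budget in seconds for the given period."""
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return unavailable_fraction * PERIODS[period]


if __name__ == "__main__":
    # Print the budget in minutes for two nines through five nines.
    for uptime in (99.0, 99.9, 99.99, 99.999):
        parts = [
            f"{allowed_downtime_seconds(uptime, period) / 60:.2f} min/{period}"
            for period in PERIODS
        ]
        print(f"{uptime}%: " + ", ".join(parts))
```

Calling allowed_downtime_seconds(99.0, "day") gives about 864 seconds, which is the 14.4 minutes per day listed for two nines.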

Five nines availability allows a downtime of 864 milliseconds per day, 6 seconds per week, or 5.3 minutes per year, as shown in Table 1-1. With clustering techniques that use IP takeover, a typical database failover takes two to three minutes, so we take MTTR to be 2.5 minutes. Two failovers at 2.5 minutes each consume 5 minutes of the 5.3-minute annual budget, and a third would exceed it. We therefore need an MTBF of about 183 days (half a year) to achieve 99.999% availability, which means no more than two failovers per year.
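The relationship between MTTR, MTBF, and the five nines target can be checked numerically. The sketch below assumes the common steady-state definition, availability = MTBF / (MTBF + MTTR), which this section does not state explicitly. The function names are illustrative.

```python
# Relating MTBF, MTTR, and availability, assuming the steady-state
# definition A = MTBF / (MTBF + MTTR). All names here are illustrative.

MINUTES_PER_DAY = 24 * 60


def availability(mtbf_minutes: float, mttr_minutes: float) -> float:
    """Fraction of time the service is up for the given MTBF and MTTR."""
    return mtbf_minutes / (mtbf_minutes + mttr_minutes)


def required_mtbf_minutes(target: float, mttr_minutes: float) -> float:
    """Smallest MTBF that reaches the target, from solving A = MTBF / (MTBF + MTTR)."""
    return mttr_minutes * target / (1.0 - target)


if __name__ == "__main__":
    mttr = 2.5  # minutes, the midpoint of a two-to-three-minute failover
    minimum_days = required_mtbf_minutes(0.99999, mttr) / MINUTES_PER_DAY
    print(f"Minimum MTBF for five nines: {minimum_days:.1f} days")
    print(f"Availability at MTBF of 183 days: {availability(183 * MINUTES_PER_DAY, mttr):.7f}")
```

Under this definition the strict minimum MTBF is roughly 174 days, so the 183-day figure (two failovers per year) meets the 99.999% target with a small margin.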

Some businesses require 7x24x365 availability, while others require only 6x20 or 5x12 availability. Shorter operating hours do not reduce the need for high availability if the business requires minimal interruption during those hours. Because we cannot predict when an outage will happen, clustering techniques remain valuable even for a business that operates only 5x12: they keep MTTR short and so increase the time the service is available.

Even though clustering techniques can keep a service highly available, service performance might degrade after a failure until the failed system is repaired and rejoins the cluster.

Therefore, we suggest describing availability using three factors:

- System uptime percentage

- Business operation hours and pattern

- Performance availability requirement

You should design a high availability system to satisfy the uptime requirement during operation hours and to meet the performance availability requirement.
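As an illustration of combining the three factors, the sketch below records them in one structure and derives the downtime budget over operating hours only. The class, its field names, and the idea of expressing the performance requirement as a minimum percentage of normal capacity are assumptions made for this example, not definitions from this book.

```python
# A requirement described by the three factors: uptime percentage,
# operating pattern, and performance after a failure.
# The class and field names are hypothetical.

from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityRequirement:
    uptime_percent: float        # for example, 99.9
    days_per_week: int           # for example, 5 for a 5x12 business
    hours_per_day: int           # for example, 12 for a 5x12 business
    min_capacity_percent: float  # assumed measure of performance availability

    def weekly_downtime_budget_minutes(self) -> float:
        """Downtime allowed per week, counted over operating hours only."""
        operating_minutes = self.days_per_week * self.hours_per_day * 60
        return (1.0 - self.uptime_percent / 100.0) * operating_minutes

    def capacity_ok(self, surviving_capacity_percent: float) -> bool:
        """True if the capacity left after a failure meets the requirement."""
        return surviving_capacity_percent >= self.min_capacity_percent


if __name__ == "__main__":
    requirement = AvailabilityRequirement(99.9, 5, 12, 50.0)
    print(f"{requirement.weekly_downtime_budget_minutes():.1f} minutes per week")
    print(requirement.capacity_ok(surviving_capacity_percent=50.0))
```

For a 5x12 business at 99.9% uptime, the budget is about 3.6 minutes per week of operating time, compared with about 10.1 minutes per week for a 7x24 service at the same percentage.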

Most business applications do not require 7x24 operation, so software and hardware upgrades can be performed during scheduled maintenance windows. For businesses that require 7x24 service, clustering techniques support rolling upgrades and hot replacements by manually failing over from one system to another. See Chapter 4, "High availability system administration" and Chapter 5, "High availability application administration" for more information.


Redbooks ibm.com/redbooks
