1.2.2 Availability matrix

We all talk about uptime, and everybody wants 100% uptime. In reality, a system with 100% uptime is prohibitively expensive to implement. For some applications, 99% uptime is adequate, which leaves an average downtime of about 14 minutes per day (see Table 1-1). Other applications require 99.9% uptime or higher. Many people refer to 99%, 99.9%, 99.99%, and 99.999% as two nines, three nines, four nines, and five nines. Five nines is generally thought of as the best availability achievable at reasonable cost, and many vendors offer such solutions. Examples of these solutions are:

- IBM with WebSphere WLM and clustering, WebSphere MQ Cluster, HACMP on AIX, TSA, or the Database Partitioning feature in DB2 UDB Enterprise Server Edition


9s	Percentage of uptime	Downtime per year	Downtime per week	Downtime per day
	90%	36.5 days	16.8 hours	2.4 hours
	95%	18.3 days	8.4 hours	1.2 hours
	98%	7.3 days	3.4 hours	28.8 minutes
Two 9s	99%	3.7 days	1.7 hours	14.4 minutes
	99.5%	1.8 days	50.4 minutes	7.2 minutes
	99.8%	17.5 hours	20.2 minutes	2.9 minutes
Three 9s	99.9%	8.8 hours	10.1 minutes	1.4 minutes
Four 9s	99.99%	52.6 minutes	1 minute	8.6 seconds
Five 9s	99.999%	5.3 minutes	6 seconds	864 milliseconds
Six 9s	99.9999%	31.5 seconds	604.8 milliseconds	86.4 milliseconds
Seven 9s	99.99999%	3.2 seconds	60.5 milliseconds	8.6 milliseconds
Eight 9s	99.999999%	315.4 milliseconds	6 milliseconds	0.9 milliseconds

Table 1-1 Availability matrix - nine rule
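The figures in the table follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of the period. The following is a minimal sketch of that calculation, assuming a 365-day year; the function names `allowed_downtime_seconds` and `downtime_row` are illustrative and do not come from any product.

```python
# Sketch: derive the downtime budget for a given uptime percentage.
# Assumption: a year is 365 days; all names here are illustrative only.
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(uptime_percent: float, period_days: float) -> float:
    """Return the seconds of downtime permitted in a period of period_days."""
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return unavailable_fraction * period_days * SECONDS_PER_DAY


def downtime_row(uptime_percent: float) -> dict:
    """Return the downtime budget in seconds per year, per week, and per day."""
    return {
        "year": allowed_downtime_seconds(uptime_percent, 365),
        "week": allowed_downtime_seconds(uptime_percent, 7),
        "day": allowed_downtime_seconds(uptime_percent, 1),
    }


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        row = downtime_row(pct)
        print(
            f"{pct}%: {row['year'] / 60:.1f} min/year, "
            f"{row['week']:.1f} s/week, {row['day']:.3f} s/day"
        )
```

For five nines, this yields roughly 5.3 minutes per year and 864 milliseconds per day, which matches the corresponding row of the table.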

Five nines availability allows a downtime of 864 milliseconds per day, 6 seconds per week, and 5.3 minutes per year, as shown in Table 1-1. For all clustering techniques with IP takeover, a typical database failover takes two to three minutes, so we take MTTR to be 2.5 minutes. A yearly downtime budget of 5.3 minutes then covers only about two failovers per year, which means that we need an MTBF of roughly 183 days to achieve 99.999% availability.
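The relationship between MTTR, MTBF, and availability can be checked with a short calculation. The sketch below assumes the common steady-state definition, availability = MTBF / (MTBF + MTTR), which this section does not state explicitly; the function names are illustrative.

```python
# Sketch: MTBF needed to reach a target availability for a given MTTR.
# Assumption: availability = MTBF / (MTBF + MTTR), and a 365-day year.
MINUTES_PER_DAY = 24 * 60


def required_mtbf_minutes(availability: float, mttr_minutes: float) -> float:
    """Solve availability = MTBF / (MTBF + MTTR) for MTBF, in minutes."""
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must be strictly between 0 and 1")
    return mttr_minutes * availability / (1.0 - availability)


def failovers_per_year(availability: float, mttr_minutes: float) -> float:
    """Return how many outages of mttr_minutes fit in the yearly downtime budget."""
    budget_minutes = (1.0 - availability) * 365 * MINUTES_PER_DAY
    return budget_minutes / mttr_minutes


if __name__ == "__main__":
    mtbf_days = required_mtbf_minutes(0.99999, 2.5) / MINUTES_PER_DAY
    print(f"Required MTBF: {mtbf_days:.1f} days")
    print(f"Failovers per year: {failovers_per_year(0.99999, 2.5):.2f}")
```

Under this definition, a 2.5-minute MTTR gives an MTBF of about 174 days, or about 2.1 failovers per year. Rounding down to two whole failovers per year gives the half-year interval of roughly 183 days quoted in the text.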

Some businesses require 7x24x365 availability, while others require only 6x20 or 5x12 availability. Shorter operating hours do not reduce the need for high availability if the business requires minimal interruption during its business hours. Because we do not know when outages will happen, clustering techniques that keep MTTR short increase available time even for a business that operates only 5x12.

Even though clustering techniques can keep a service highly available, service performance might degrade after a failure until the repaired system rejoins the cluster.

Therefore, we suggest describing availability using three factors:

- System uptime percentage
- Business operation hours and pattern
- Performance availability requirement

You should design a high availability system to satisfy the uptime requirement during operation hours and to meet the performance availability requirement.
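One way to make the three factors concrete is to record them together and derive the downtime budget from the operating hours instead of from the full calendar. The sketch below is illustrative only: the class `AvailabilityRequirement`, its field names, and the assumption of 52 operating weeks per year are hypothetical and are not taken from any product or from this book.

```python
# Sketch: an availability requirement expressed with the three factors.
# Assumptions: 52 operating weeks per year; all names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityRequirement:
    uptime_percent: float           # factor 1: system uptime percentage
    days_per_week: int              # factor 2: operation pattern, e.g. 5 in 5x12
    hours_per_day: float            # factor 2: operation hours, e.g. 12 in 5x12
    min_performance_percent: float  # factor 3: capacity required while degraded

    def operating_hours_per_year(self) -> float:
        """Return the hours per year during which the service must be available."""
        return self.days_per_week * self.hours_per_day * 52

    def allowed_downtime_minutes_per_year(self) -> float:
        """Return the downtime budget within operating hours, in minutes."""
        unavailable_fraction = 1.0 - self.uptime_percent / 100.0
        return unavailable_fraction * self.operating_hours_per_year() * 60

    def meets_performance(self, observed_percent: float) -> bool:
        """Return True if the observed capacity satisfies factor 3."""
        return observed_percent >= self.min_performance_percent


if __name__ == "__main__":
    office = AvailabilityRequirement(99.9, 5, 12, 80.0)
    print(f"{office.allowed_downtime_minutes_per_year():.1f} minutes per year")
    print(office.meets_performance(50.0))
```

In this sketch, a 5x12 business with a 99.9% requirement has about 187 minutes of budget per year inside its operating hours. A cluster running at half capacity after a failover would still count as up, but it would not meet an 80% performance requirement.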

Most business applications do not require 7x24 availability, so software and hardware upgrades can be performed during scheduled maintenance time. For a business that requires 7x24 service, clustering techniques allow rolling upgrades and hot replacements by manually failing over from one system to another. See Chapter 4, "High availability system administration" and Chapter 5, "High availability application administration" for more information.

ibm.com/redbooks