Everyone talks about uptime, and everyone wants 100% of it. In practice, a system with 100% uptime is prohibitively expensive to implement. For some applications, 99% uptime is adequate, which leaves an average downtime of about 14 minutes per day (see Table 1-1). Other applications require 99.9% uptime or higher. The levels 99%, 99.9%, 99.99%, and 99.999% are commonly called two nines, three nines, four nines, and five nines. Five nines is generally regarded as the best availability achievable at reasonable cost, and many vendors offer such solutions. Examples of these solutions are:
Table 1-1 Availability matrix - nine rule
| 9s | Percentage of uptime | Downtime per year | Downtime per week | Downtime per day |
|---|---|---|---|---|
| | 90% | 36.5 days | 16.9 hours | 2.4 hours |
| | 95% | 18.3 days | 8.4 hours | 1.2 hours |
| | 98% | 7.3 days | 3.4 hours | 28.8 minutes |
| Two 9s | 99% | 3.7 days | 1.7 hours | 14.4 minutes |
| | 99.5% | 1.8 days | 50.4 minutes | 7.2 minutes |
| | 99.8% | 17.5 hours | 20.2 minutes | 2.9 minutes |
| Three 9s | 99.9% | 8.8 hours | 10.1 minutes | 1.4 minutes |
| Four 9s | 99.99% | 52.5 minutes | 1 minute | 8.6 seconds |
| Five 9s | 99.999% | 5.3 minutes | 6 seconds | 864 milliseconds |
| Six 9s | 99.9999% | 31.5 seconds | 604.8 milliseconds | 86.4 milliseconds |
| Seven 9s | 99.99999% | 3.2 seconds | 60.5 milliseconds | 8.6 milliseconds |
| Eight 9s | 99.999999% | 315.4 milliseconds | 6 milliseconds | 0.9 milliseconds |
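The entries in Table 1-1 follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of the period. The following is a minimal sketch of that calculation, assuming a 365-day year and a 7-day week; the function name `downtime_seconds` is illustrative only and does not come from any product.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_seconds(uptime_percent: float, period_days: float) -> float:
    """Return the allowed downtime, in seconds, over a period of period_days."""
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return unavailable_fraction * period_days * SECONDS_PER_DAY


# Print the downtime budget for the commonly quoted "nines" levels.
for pct in (99.0, 99.9, 99.99, 99.999):
    per_year_min = downtime_seconds(pct, 365) / 60
    per_week_s = downtime_seconds(pct, 7)
    per_day_s = downtime_seconds(pct, 1)
    print(f"{pct}%: {per_year_min:.1f} min/year, "
          f"{per_week_s:.1f} s/week, {per_day_s:.3f} s/day")
```

For 99.999%, the sketch should agree with the Five 9s row after rounding: roughly 5.3 minutes per year, 6 seconds per week, and 864 milliseconds per day.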
Five nines availability allows a downtime of 864 milliseconds per day, 6 seconds per week, or 5.3 minutes per year, as shown in Table 1-1. For all clustering techniques with IP takeover, a typical database failover takes two to three minutes, so the mean time to repair (MTTR) is about 2.5 minutes. A yearly downtime budget of 5.3 minutes covers only two such failovers (2 x 2.5 = 5 minutes), so achieving 99.999% availability requires a mean time between failures (MTBF) of about 183 days, that is, half a year.
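The failover budget can be checked with a short calculation. The sketch below assumes the usual steady-state definition, availability = MTBF / (MTBF + MTTR), which this section does not state explicitly, and a 365-day year; the function names are hypothetical.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mtbf_minutes: float, mttr_minutes: float) -> float:
    """Steady-state availability (assumed definition: MTBF / (MTBF + MTTR))."""
    return mtbf_minutes / (mtbf_minutes + mttr_minutes)


def max_failovers_per_year(target: float, mttr_minutes: float) -> int:
    """Whole number of failovers that fit into the yearly downtime budget."""
    budget_minutes = (1.0 - target) * MINUTES_PER_YEAR
    return math.floor(budget_minutes / mttr_minutes)


target = 0.99999      # five nines
mttr = 2.5            # minutes per database failover, as estimated in the text

failovers = max_failovers_per_year(target, mttr)
if failovers > 0:
    mtbf_days = 365 / failovers
    achieved = availability(mtbf_days * 24 * 60, mttr)
    print(f"failovers/year: {failovers}, MTBF: {mtbf_days} days, "
          f"meets target: {achieved >= target}")
else:
    print("The downtime budget does not cover even one failover per year.")
```

With a 2.5-minute MTTR, the budget covers two failovers, which corresponds to an MTBF of 182.5 days, in line with the figure of about 183 days. Solving the assumed formula directly for MTBF gives a slightly smaller minimum (about 174 days), because the formula does not round the number of failovers down to a whole number.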
Some businesses require 7x24x365 availability, while others require 6x20 or 5x12 availability. Shorter operating hours do not reduce the need for high availability if the business requires minimal interruption during those hours. Because outages cannot be predicted, clustering techniques that keep MTTR short increase the available time even for a business that operates only 5x12.
Even though clustering techniques can keep a service highly available, service performance might degrade from the time a failure occurs until the failed system is repaired and rejoins the cluster.
Therefore, we suggest describing availability using three factors:
- System uptime percentage
- Business operation hours and pattern
- Performance availability requirement
You should design a high availability system to satisfy the uptime requirement during operation hours and to meet the performance availability requirement.
Most business applications do not require 7x24 availability, so software and hardware upgrades can be performed during scheduled maintenance windows. For businesses that do require 7x24 service, clustering techniques support rolling upgrades and hot replacements by manually failing over from one system to another. See Chapter 4, High availability system administration and Chapter 5, High availability application administration for more information.