1.1.1 High Availability

High Availability refers to reliably providing services to all the users of the system within a reasonable response time for a long duration of time.

Let us expand on the key attributes in this definition: reliability, response time, number of users, and duration.

A system that is operational but not reliable, or a system that is operational but not acceptably responsive, is not considered to be available. The response times are defined by the business requirements for the site.

The number of users refers to the peak number of users, again, as defined by the business requirements.

Two implementations of High Availability can be distinguished from one another by the duration for which they remain available. A system that is unavailable for a few minutes a year is better than a system that is unavailable for a few minutes every month. Systems implemented for High Availability are usually assigned a percentage indicating their level of availability. For example, 99.999% availability means that the system is unavailable (in outage) for at most 5.26 minutes per year, whereas 99.9% availability means that the system is unavailable for up to 526 minutes (8 hours 46 minutes) per year. 100% availability is also called continuous availability. Such a solution is virtually impossible to achieve, but it is still a valid ideal to aspire to. Any solution that attempts continuous availability tends to be very expensive and very complex to implement and to manage.
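The outage figures quoted above follow from simple arithmetic on the number of minutes in a year. The following is a minimal sketch of that calculation, assuming a 365-day year; the function name max_downtime_minutes is hypothetical and is used only for illustration.

```python
# Illustrative sketch: convert an availability percentage into the
# maximum outage it permits per year. Assumes a 365-day (non-leap) year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum yearly outage, in minutes, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% availability -> {max_downtime_minutes(pct):.2f} minutes per year")
```

Rounded to two decimal places, 99.999% yields 5.26 minutes per year, and 99.9% yields roughly 526 minutes per year.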

Generally, High Availability refers to 99.9% or higher availability. Also, many products and businesses do not count planned outages towards the High Availability calculation, because a planned outage can be managed in such a way that it does not impact business operations.

It is not always an easy task to assign a specific availability number to a given system. The complexity of this task increases as the number of components of a system and dependencies amongst them increase. Furthermore, each component or sub-system may have different availabilities under different states. And, even if a High Availability number can be assigned, it is very difficult to validate it since such a system should rarely fail.

A WebSphere Commerce site may be an aggregation of many components, such as database servers, LDAP servers, IBM WebSphere MQ servers, WebSphere Application Servers, Web servers, load balancers, firewalls, and so on. It is a very complex task to ascribe a High Availability number to such a site. Instead of focusing on a specific availability number, our focus in this book is on the ideal goal of ensuring continuous availability. To reach this goal, we need to constantly watch the performance and workload of the site, and we need to keep reinforcing the components according to their strengths and weaknesses so that no unplanned outage occurs. The reinforcement can take the form of managing the redundancy, the configuration, or the performance of the components.
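To see why a multi-component site is hard to rate, and why redundancy is a common form of reinforcement, consider a deliberately simplified model. The sketch below assumes that components fail independently, that the site is up only when every tier is up, and that a redundant tier is up when at least one of its instances is up. These assumptions rarely hold exactly in a real site, and the 99.9% figure per tier is hypothetical, not a measured value for any product.

```python
# Illustrative sketch under strong assumptions: independent failures,
# all tiers required (series), any one instance of a tier suffices (redundancy).
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of a tier with `copies` identical, independent instances."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    tier = 0.999  # hypothetical availability of one Web, application, or database server

    single = series_availability([tier, tier, tier])
    doubled = series_availability([redundant_availability(tier, 2)] * 3)

    print(f"three single-instance tiers:  {single:.6f}")
    print(f"three tiers with two copies:  {doubled:.6f}")
```

In this model, chaining three tiers lowers the overall figure below that of any single tier, while adding a second instance to each tier raises it again. Dependencies between components and correlated failures, which the model ignores, are part of what makes a real availability number difficult to assign.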