Clustering for high availability

 

Clustering is a fundamental approach for accomplishing high availability. IBM WebSphere Application Server (WAS) Network Deployment V5.1 has a built-in server clustering technique, workload management (WLM). In addition, many other clustering techniques can be used to make the end-to-end WebSphere system highly available.

To achieve 99.x% availability for a WebSphere system, we need to integrate platform-specific clustering solutions with WebSphere to meet the high availability needs of critical applications. Data access is a very important part of most applications, especially transactional ones. Integrating WebSphere with platform-specific clustering software both protects data integrity and enhances the availability of the entire WebSphere processing environment.
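The Python sketch below makes the "99.x%" target concrete by converting an availability percentage into the total downtime it allows per year. It assumes a 365-day year and counts every minute of outage against the budget, whether planned or unplanned. The function name is illustrative only.

```python
"""Convert an availability target into the yearly downtime it allows (a sketch)."""

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year; leap years ignored


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the total downtime per year, in minutes, permitted by an
    availability target such as 99.9 (percent). All outages count."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
```

Under these assumptions, 99.9% allows roughly 526 minutes (about 8.8 hours) a year, and 99.99% allows only about 53 minutes. A handful of one-to-five-minute IP failovers can therefore consume a noticeable part of a four-nines budget.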

We should treat WebSphere high availability as end-to-end system availability. This system includes a database, an LDAP server, a firewall, a Load Balancer, an HTTP server, and WebSphere servers, and it is integrated with platform-specific clustering software to achieve maximum availability.

Generally speaking, failed client requests make up a very small percentage of total client requests. Most of these failures are caused by database failures, because database failover takes minutes to complete when IP failover is used. WebSphere process failures usually contribute little to total client request failures, because WebSphere workload management makes WebSphere process failover instantaneous. Parallel database servers such as Oracle Parallel Server (OPS) or Real Application Cluster (RAC), and techniques such as Transparent Application Failover (TAF), can be used to minimize total client failures.

Several platform clustering software packages are available. The unit of failover usually includes a collection of network definitions and disk storage, plus one or more services such as a DB2 database server, Oracle database server, HTTP server, firewall, LDAP server, WAS, WebSphere Node Agent, or Deployment Manager. However, the unit of failover is not standardized, and different vendors use different terms for it. For example, the unit of failover is called a logical host in Sun Cluster, an appserver in IBM HACMP or HACMP/ES Cluster, and a package in HP MC/ServiceGuard.
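Each vendor names it differently (logical host, appserver, or package), but the unit of failover groups the same kinds of resources. The Python sketch below models one in a vendor-neutral way. Several details are assumptions for illustration only: the class, its method names, the action strings, and the start order (storage, then IP address, then services). They do not reproduce the configuration format or commands of any product named above.

```python
"""Vendor-neutral sketch of a cluster's unit of failover (hypothetical model)."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailoverUnit:
    """Network definitions, disk storage, and services that fail over together."""
    name: str
    ip_addresses: List[str]
    volumes: List[str]
    services: List[str]
    owner: Optional[str] = None  # node currently hosting the unit

    def start_sequence(self) -> List[str]:
        # Assumed order: storage first, then the service IP address,
        # then the services that depend on both.
        return ([f"mount {v}" for v in self.volumes]
                + [f"configure IP {ip}" for ip in self.ip_addresses]
                + [f"start {s}" for s in self.services])

    def stop_sequence(self) -> List[str]:
        # Stopping reverses the assumed start order.
        return ([f"stop {s}" for s in reversed(self.services)]
                + [f"release IP {ip}" for ip in reversed(self.ip_addresses)]
                + [f"unmount {v}" for v in reversed(self.volumes)])

    def fail_over(self, target_node: str) -> List[str]:
        """Bring the whole unit up on target_node after its owner fails.
        The unit moves as one piece; its parts are never split across nodes."""
        self.owner = target_node
        return [f"{target_node}: {step}" for step in self.start_sequence()]


if __name__ == "__main__":
    # Hypothetical DB2 unit: one service IP, one data volume, one service.
    db2 = FailoverUnit("db2_unit", ["10.0.0.50"], ["/db2data"], ["DB2 server"],
                       owner="nodeA")
    for action in db2.fail_over("nodeB"):
        print(action)
```

The point of the model is only that the storage, the address, and the services move to the new node together, so clients keep reaching the service at the same IP address after a failover.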

There are two kinds of cluster failover:
IP-based cluster failover, for example, HACMP, HACMP/ES, MC/ServiceGuard, Sun Cluster, VERITAS Cluster, Microsoft Cluster, and DB2 Parallel Server (IBM DB2 UDB Enterprise Extended Edition)
Non-IP cluster failover, for example, WebSphere WLM, Oracle Parallel Server, and Oracle Real Application Cluster

Usually, IP-based cluster failover is slow (one to five minutes), and non-IP cluster failover is very fast (instantaneous). However, non-IP cluster failover, such as Oracle Parallel Server, still relies on cluster software such as MC/ServiceGuard, Sun Cluster, HACMP, or Microsoft Cluster Server (MSCS) to provide the cluster information. With Oracle 9i on Windows, Oracle integrated the cluster manager into its products.
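Simple arithmetic shows why slow database failover dominates failed client requests: roughly every request that arrives before service resumes fails. The Python sketch below rests on several assumptions. The arrival rate is constant, clients do not retry, and the load is a hypothetical 100 requests per second. Non-IP failover such as WebSphere WLM is treated as having zero outage. The number of requests lost in flight is a hypothetical parameter, not a measured value.

```python
"""Rough estimate of client requests lost during a single failover (a sketch)."""


def failed_requests(rate_per_sec: float, outage_sec: float, in_flight: int = 0) -> float:
    """Requests that fail during one failover: those already in flight on the
    failed component plus every request that arrives before service resumes.
    Assumes a constant arrival rate and no client-side retry."""
    return in_flight + rate_per_sec * outage_sec


def failed_fraction(rate_per_sec: float, outage_sec: float, window_sec: float,
                    in_flight: int = 0) -> float:
    """Failed requests as a fraction of all requests sent during window_sec."""
    return failed_requests(rate_per_sec, outage_sec, in_flight) / (rate_per_sec * window_sec)


if __name__ == "__main__":
    rate = 100.0       # hypothetical load: 100 requests per second
    day = 24 * 3600    # observation window: one day, in seconds
    scenarios = {
        "IP failover, 1 minute": (60.0, 0),
        "IP failover, 5 minutes": (300.0, 0),
        "non-IP (WLM-style) failover": (0.0, 20),  # hypothetical 20 in-flight requests
    }
    for name, (outage, lost_in_flight) in scenarios.items():
        frac = failed_fraction(rate, outage, day, lost_in_flight)
        print(f"{name}: {failed_requests(rate, outage, lost_in_flight):.0f} failed "
              f"({frac:.4%} of the day's requests)")
```

With these numbers, a single IP failover of one to five minutes loses 6,000 to 30,000 requests, while the instantaneous case loses only the requests that were in flight. This matches the observation that database failover, not WebSphere process failover, accounts for most failed requests.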

In this part, we cover clustering considerations, the failover process, clustering configurations, sample scripts, how each WebSphere component reacts to failures, and how to tune the cluster system to improve performance. We discuss:
HACMP and HACMP/ES
MC/ServiceGuard
Sun Cluster
VERITAS Cluster
Microsoft Cluster
Oracle Parallel Server and Real Application Cluster, both Connection-Time Failover and Transparent Application Failover
DB2 Parallel Server Cluster with hardware-based clustering
WAS high availability
WebSphere Node Agent and Deployment Manager high availability
WebSphere high availability enhancements
WebSphere JMS server high availability
WebSphere MQ high availability
Web server high availability
Load Balancer high availability
Firewall high availability
LDAP high availability
Highly available network file system

We also address the techniques that are needed to build an end-to-end, highly available WebSphere production system.

There are two types of highly available data services:
A failover data service runs an application on only one primary node in the cluster at a time. Other nodes might run other applications, but each application runs on only a single node. If a primary node fails, the applications running on the failed node fail over to another node and continue running. We can further divide this category into two subcategories:
Active/Active mode, where two services reside on two nodes that are configured as mutual failover partners.
Active/Standby mode, where one node is configured as the primary to run the service, while the other node is configured as a hot standby.
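A failover data service can be modeled with a small placement table that maps each service to the one node running it. In this Python sketch, the node names, the service names, and the mutual-partner map are hypothetical. The sketch shows two behaviors. In Active/Standby mode, a failure of the standby moves nothing. In Active/Active mode, a failure of either node moves that node's service onto its partner.

```python
"""Sketch of a two-node failover data service (Active/Standby and Active/Active)."""
from typing import Dict, List


def fail_node(placement: Dict[str, str], partner: Dict[str, str],
              failed: str) -> Dict[str, str]:
    """Return the new service -> node placement after `failed` goes down.
    Each service runs on exactly one node at a time; services on the failed
    node are restarted on that node's configured failover partner."""
    return {svc: (partner[node] if node == failed else node)
            for svc, node in placement.items()}


def interrupted(placement: Dict[str, str], failed: str) -> List[str]:
    """Services that are interrupted (and must fail over) when `failed` goes down."""
    return sorted(svc for svc, node in placement.items() if node == failed)


if __name__ == "__main__":
    mutual = {"nodeA": "nodeB", "nodeB": "nodeA"}  # hypothetical two-node cluster

    # Active/Standby: nodeA runs the service, nodeB is an idle hot standby.
    standby = {"database": "nodeA"}
    print(interrupted(standby, "nodeB"))        # standby failure: no service interrupted
    print(fail_node(standby, mutual, "nodeA"))  # primary failure: service moves to nodeB

    # Active/Active: each node runs a service and backs up the other.
    active = {"database": "nodeA", "ldap": "nodeB"}
    print(interrupted(active, "nodeA"))
    print(fail_node(active, mutual, "nodeA"))   # nodeB now runs both services
```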

The configuration for both modes is very similar. The advantage of the Active/Active mode configuration is lower hardware cost; however, service performance is reduced when a failover occurs. The advantage of the Active/Standby mode configuration is steady performance, but redundant hardware is needed. Furthermore, the Active/Active mode configuration may have twice as many interruptions as the Active/Standby mode configuration, since a failure in either of the two nodes may cause a failover.

The second type, a scalable data service, spreads an application across multiple nodes to create a single, logical service. Scalable services take advantage of all the nodes and processors in the cluster on which they run.
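The Active/Active versus Active/Standby trade-off reduces to two numbers: how often a failover happens, and how much capacity each service keeps afterward. The Python sketch below computes both under stated assumptions. The cluster has two equal-sized nodes with equal load per service. Node failures are independent and occur at a hypothetical average rate. Repairs are quick relative to the time between failures.

```python
"""Compare two-node Active/Active and Active/Standby failover configurations."""

MODES = ("active/standby", "active/active")


def failovers_per_year(mode: str, node_failures_per_year: float) -> float:
    """Expected failovers per year in a two-node failover cluster.
    Assumes each node fails independently at the given average rate and is
    repaired quickly compared with the time between failures."""
    if mode == "active/standby":
        # Only a failure of the node running the service moves it;
        # losing the idle hot standby interrupts nothing.
        return node_failures_per_year
    if mode == "active/active":
        # Each node runs a service, so a failure of either node forces a failover.
        return 2 * node_failures_per_year
    raise ValueError(f"unknown mode: {mode!r}")


def capacity_after_failover(mode: str) -> float:
    """Fraction of a full node that each service gets while running on the
    surviving node, assuming equal-sized nodes and equal load per service."""
    if mode == "active/standby":
        return 1.0  # the standby is as large as the primary it replaces
    if mode == "active/active":
        return 0.5  # the survivor now hosts both services
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    rate = 2.0  # hypothetical: each node fails twice a year on average
    for mode in MODES:
        print(f"{mode}: {failovers_per_year(mode, rate):.1f} failovers/year, "
              f"{capacity_after_failover(mode):.0%} capacity per service after failover")
```

At a hypothetical two failures per node per year, Active/Standby sees about two failovers a year, each at full capacity. Active/Active sees about four, and each one leaves both services sharing a single node until the failed node returns.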
