IBM


1.2.4 Possible single points of failure in the WebSphere system

Table 1-2 lists potential single points of failure in the WebSphere system and possible solutions.

Table 1-2 Possible single points of failure in the WebSphere system

Failure point | Possible solutions
Client access | Multiple ISPs.
Firewalls | Firewall clustering, firewall sprayer, HA firewall.
Caching Proxy | Backup Caching Proxy system.
HTTP sprayer (such as WebSphere Edge Components' Load Balancer) | HA solution of the vendor, for example a backup Load Balancer server.
Web server | Multiple Web servers with a network sprayer, hardware-based clustering.
WebSphere master repository data, log files | HA shared file system, Network File System (NFS), hardware-based clustering.
WebSphere Application Server (WAS) | WAS ND appserver clustering:
  1. Horizontal
  2. Vertical
  3. Combination of both
  Additionally for EJBs: backup cluster.
WebSphere Node Agent | Multiple Node Agents in the cluster, OS service, hardware-based clustering.
  Note: The Node Agent is not considered a single point of failure in WebSphere V6. The Node Agent must be running when an appserver on that node is started, so that the appserver can register with the Location Service Daemon (LSD). Because the LSD is HAManager-enabled in WebSphere V6, only one Node Agent in the cluster needs to be running to provide the LSD when the appservers on the node are started. The Node Agent must also be running when you change security-related configuration. Otherwise, you might no longer be able to synchronize the node with the Deployment Manager. Refer to Chapter 3, WebSphere administrative process failures, for more information.
WebSphere Deployment Manager | OS service, hardware-based clustering, backup WebSphere cell.
  Note: The Deployment Manager is not considered a single point of failure in WebSphere V6. You need it to configure your WebSphere environment, to monitor performance using the Tivoli Performance Viewer, or to use backup cluster support. Unless you need these functions, you can run a production environment without an active Deployment Manager. Refer to Chapter 3, WebSphere administrative process failures, for more information.
Entity EJBs, application DB | HA DBs, parallel DBs.
  Note: Make sure your application catches StaleConnectionException and retries the operation. See 15.4, Database server, for more information.
Default messaging provider | WebSphere appserver clustering: HAManager provides failover.
Default messaging provider data store | Clustering, data replication, parallel database.
Application database | Clustering, data replication, parallel database.
Session database | Memory-to-memory replication, DB clustering, parallel database.
Transaction logs | WebSphere appserver clustering: HAManager provides failover; shared file system with horizontal clustering.
WebSphere MQ | WebSphere MQ cluster, combination of WebSphere MQ cluster and clustering.
LDAP | Master-replica, sprayer, HA LDAP (clustering).
Internal network | Dual internal networks.
Hubs | Multiple interconnected network paths.
Disk failures, disk bus failure, disk controller failure | Disk mirroring, RAID-5, multiple buses, multiple disk controllers.
Network service failures (DNS, ARP, DHCP, and so forth) | Multiple network services.
OS or other software crashes | Clustering: switch automatically to a healthy node.
Host dies | WebSphere appserver clustering, hardware-based clustering: switch automatically to a healthy node.
Power outages | UPS, dual-power systems.
Room/floor disaster (fire, flood, and so forth) | Systems in different rooms/different floors.
Building disasters (fire, flood, tornado, and so forth) | Systems in different buildings.
City disasters (earthquake, flood, and so forth) | Remote mirror, replication, geographical clustering.
Region disasters | Two data centers far apart, with geographical clustering or remote mirroring.
Human error | Train people, simplify system management, use clustering and redundant hardware/software.
Software and hardware upgrades | Rolling upgrades with clustering or WLM for 7x24x365 operation; planned maintenance for other systems.
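
The reasoning behind Table 1-2 can be sketched mechanically. Walk each tier of the request path and flag any tier whose instances do not run on at least two separate hosts. The Java sketch below assumes a hypothetical topology model: a map from tier name to the hosts that run an instance of that tier. The class `SpofCheck`, the method `findSinglePoints`, and all tier and host names are illustrative only; they are not part of any WebSphere API. The check considers host failures only. A vertical cluster (several appservers on one host) still protects against an appserver process failure, but that case is not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: flag tiers that would be a single point of failure
 * if one host died. Not a WebSphere API; the topology is illustrative only.
 */
public class SpofCheck {

    /** Returns the names of tiers whose instances run on fewer than two distinct hosts. */
    public static List<String> findSinglePoints(Map<String, List<String>> tiers) {
        List<String> spofs = new ArrayList<>();
        for (Map.Entry<String, List<String>> tier : tiers.entrySet()) {
            // Two members on the same host do not survive that host failing,
            // so count distinct hosts rather than instances.
            long distinctHosts = tier.getValue().stream().distinct().count();
            if (distinctHosts < 2) {
                spofs.add(tier.getKey());
            }
        }
        return spofs;
    }

    public static void main(String[] args) {
        // Hypothetical topology: tier name -> hosts running an instance of that tier.
        Map<String, List<String>> tiers = new LinkedHashMap<>();
        tiers.put("Load Balancer", List.of("lb1", "lb2"));                // primary plus backup
        tiers.put("Web server", List.of("web1", "web2"));                 // behind a sprayer
        tiers.put("Application server cluster", List.of("app1", "app1")); // vertical only
        tiers.put("Application database", List.of("db1"));               // no HA yet
        System.out.println(findSinglePoints(tiers));
    }
}
```

With this topology, `main` prints `[Application server cluster, Application database]`. Those are the two tiers that would still need an entry from the table: horizontal clustering for the appservers, and an HA or parallel database for the application data.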

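The StaleConnectionException note in Table 1-2 comes down to wrapping each unit of database work in a bounded retry loop. The sketch below is not taken from 15.4; it is a minimal illustration under stated assumptions. So that it compiles without WebSphere libraries, it treats the standard `java.sql.SQLRecoverableException` as a stand-in for WebSphere's `com.ibm.websphere.ce.cm.StaleConnectionException`. The `DbWork` interface and the `withRetry` helper are hypothetical names.

```java
import java.sql.SQLException;
import java.sql.SQLRecoverableException;

/**
 * Hypothetical retry sketch. SQLRecoverableException stands in for
 * com.ibm.websphere.ce.cm.StaleConnectionException so this compiles
 * without WebSphere jars.
 */
public class StaleConnectionRetry {

    /** One unit of database work; a real implementation would obtain a fresh Connection on each call. */
    @FunctionalInterface
    public interface DbWork<T> {
        T run() throws SQLException;
    }

    /**
     * Runs the work, retrying up to maxRetries times when a stale-connection
     * style exception is thrown. Any other SQLException propagates immediately.
     */
    public static <T> T withRetry(DbWork<T> work, int maxRetries, long backoffMillis)
            throws SQLException, InterruptedException {
        int attempt = 0;
        while (true) {
            try {
                return work.run();
            } catch (SQLRecoverableException stale) {
                // The connection was invalidated (for example, after a database failover).
                // Give up once the retry budget is spent.
                if (attempt >= maxRetries) {
                    throw stale;
                }
                attempt++;
                Thread.sleep(backoffMillis * attempt); // simple linear backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated work that fails twice with a stale connection, then succeeds.
        int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SQLRecoverableException("stale connection (simulated)");
            }
            return "committed";
        }, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `committed after 3 attempts`. The retry should cover the whole unit of work, including obtaining a new connection from the data source, and should only be used where that work is safe to repeat. A transaction that was in progress on the stale connection cannot simply continue on a new connection, so the retry has to restart the work from the beginning.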

Redbooks ibm.com/redbooks
