Setup and configuration

Seven daemon processes are associated with MC/ServiceGuard: the cluster daemon, the syslog log daemon, the cluster LVM (logical volume management) daemon, the cluster object manager, the SNMP subagent, the service assistant, and the shared tape daemon. The heartbeat mechanism is used to monitor the nodes, the database services, and their dependencies. We installed MC/ServiceGuard on two HP 9000 K-class machines with an AutoRAID disk array connected to both machines, installed DB2 and Oracle on both machines, and created the database instances on the shared disk array.

MC/ServiceGuard supports single-ended SCSI, Fast/Wide SCSI, and Fibre Channel disk interfaces. We used Fast/Wide SCSI interfaces to connect the two nodes to an HP disk array. There are two ways to protect your data:

Use disk mirroring. You can configure logical volumes with MirrorDisk/UX so that the members of each mirrored set contain exactly the same data. If one disk fails, MirrorDisk/UX keeps the data available by accessing the other mirror. To protect against SCSI bus failures, each copy of the data must be accessed over a separate SCSI bus. Mirroring the root disks is optional, because another node can take over the failed database process if the original root disks fail.

Use disk arrays with RAID levels and PV links. The array provides data redundancy for the disks. This protection needs to be combined with redundant SCSI interfaces between each node and the array. Configured with PV links, the redundant interfaces protect against a single point of failure in the I/O channel (see the sketch below). You can monitor the disks through the Event Monitoring Service.
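As an illustration of the PV links approach, the alternate path is added to an existing volume group with vgextend. In this sketch the two device files are assumed to be two paths, over different SCSI buses, to the same array LUN; the volume group and device names are placeholders only:

# c2t0d0 and c4t0d0 address the same LUN through different SCSI interfaces;
# the LUN joined vgarray via c2t0d0, so adding the second path creates the PV link
vgextend /dev/vgarray /dev/dsk/c4t0d0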

For software failures, the database can be restarted on the same node or on another node with minimal disruption. For failures of disk interfaces or other monitored resources, the database can be moved to another node. If a node itself fails, the database is moved from the failed node to another node automatically. If a LAN fails, MC/ServiceGuard switches to a standby LAN or moves the database to a standby node.
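In the package ASCII configuration file, this behavior is governed by the package's node list and failover policies. A representative fragment looks like the following; the package name, node names, script path, and subnet are examples rather than our exact configuration:

PACKAGE_NAME      webspheredb2
FAILOVER_POLICY   CONFIGURED_NODE
FAILBACK_POLICY   MANUAL
NODE_NAME         hp1
NODE_NAME         hp2
RUN_SCRIPT        /etc/cmcluster/db2/control.sh
HALT_SCRIPT       /etc/cmcluster/db2/control.sh
SERVICE_NAME      db2_monitor
SUBNET            192.168.10.0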

We need at least two heartbeat LANs in the MC/ServiceGuard cluster to keep the heartbeat messages highly available. If you have only one heartbeat LAN, a dedicated serial heartbeat is required for two-node heartbeat communication; redundancy is then provided by the primary LAN and the dedicated serial line, which both carry the heartbeat. A dedicated heartbeat line prevents a false diagnosis of heartbeat failure. We used a dedicated private Ethernet LAN for the heartbeat in addition to another heartbeat connection carried by one of the public LANs, as shown in Figure 12-5.
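In the cluster ASCII configuration file, each heartbeat network appears as a HEARTBEAT_IP entry under the corresponding network interface for every node. A fragment along the following lines describes such a two-path heartbeat; the interface names and addresses are examples only:

NODE_NAME           hp1
  NETWORK_INTERFACE lan0          # public LAN that also carries the heartbeat
    HEARTBEAT_IP    192.168.10.1
  NETWORK_INTERFACE lan1          # dedicated private heartbeat LAN
    HEARTBEAT_IP    10.0.0.1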

We set up and configured our test environment as follows:

1. Install MC/ServiceGuard software on each node with swinstall and choose the B3935DA package. For MC/ServiceGuard installation details, see the manual provided with MC/ServiceGuard software.
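From the command line, the installation can be done in one step; the depot location below is an example and should be replaced with the path (or host:/path) of your software depot:

swinstall -s /var/tmp/sgdepot B3935DA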

2. Configure and update each node for the MC/ServiceGuard cluster:
   a. Grant security permissions to both machines by adding entries to the /etc/cmcluster/cmclnodelist file:

Hp1.somecorp.com root # WebSphere database cluster

Hp2.somecorp.com root # WebSphere database cluster

If you want to allow non-root users to run cmviewcl, also add those non-root user IDs to this file.
   b. Define name resolution services:

By default, MC/ServiceGuard uses /etc/resolv.conf to obtain the addresses of the cluster nodes. In case DNS is not available, populate the /etc/hosts file and configure /etc/nsswitch.conf so that the lookup falls back to /etc/hosts when the other lookup methods fail.
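For example, an /etc/nsswitch.conf entry such as the following tries DNS first and falls back to the hosts file; the exact policy string is our assumption, so check the name-service documentation for your HP-UX release:

hosts: dns files    # falls back to /etc/hosts when DNS cannot resolve the name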

3. Set up and configure the shared disk array:
   a. Connect the shared disk array to both nodes.
   b. Create volume groups, logical volumes, and mirrors using pvcreate, vgcreate, vgextend, lvcreate, and lvextend.
   c. Create the cluster lock disks.
   d. Distribute the volume groups to the other node, using either SAM or LVM commands (a command sketch follows this step).
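The following sketch shows the kind of command sequence we mean for steps b and d. The volume group, logical volume, and device file names are examples, and the mirroring command requires MirrorDisk/UX:

# create the volume group on the first disk
pvcreate -f /dev/rdsk/c1t2d0
mkdir /dev/vgdb
mknod /dev/vgdb/group c 64 0x010000
vgcreate /dev/vgdb /dev/dsk/c1t2d0

# add the second disk (on the other SCSI bus) and mirror the logical volume onto it
vgextend /dev/vgdb /dev/dsk/c2t2d0
lvcreate -L 2048 -n lvdb /dev/vgdb
lvextend -m 1 /dev/vgdb/lvdb /dev/dsk/c2t2d0

# export the volume group definition for the other node
vgchange -a n /dev/vgdb
vgexport -p -s -m /tmp/vgdb.map /dev/vgdb
# copy /tmp/vgdb.map to the second node, create the same group file there,
# and run: vgimport -s -m /tmp/vgdb.map /dev/vgdb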

Figure 12-5 shows our two-node cluster. The disks are configured so that resources can be allocated to each node and each node can adopt the database application from the other node. Each database application has one disk volume group assigned to it, and the logical volumes in that volume group are mirrored. This arrangement eliminates single points of failure and keeps either the disks or their mirrors available if one of the buses fails.

4. Configure the MC/ServiceGuard cluster for the WebSphere databases:
   a. Using SAM, select Cluster -> High Availability Cluster.
   b. Choose Cluster Configuration.
   c. Select the Actions menu, choose Create Cluster Configuration, and follow the instructions.
   d. Verify the cluster configuration:
      For DB2:

      cmcheckconf -k -v -C /etc/cmcluster/webspheredb2.config
      For Oracle:

      cmcheckconf -k -v -C /etc/cmcluster/websphereoracle.config
   e. Distribute the binary configuration file to the other node, using either SAM or the command line (see the example below).
   f. Back up the volume group and cluster lock configuration data in case a disk has to be replaced later.
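From the command line, step e can be done with cmapplyconf, which verifies the ASCII file again and then distributes the resulting binary configuration to every node. The file name below is the DB2 configuration file from step d; substitute yours as appropriate:

cmapplyconf -k -v -C /etc/cmcluster/webspheredb2.config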

5. Configure the packages and their services:
   a. Install DB2 or Oracle on both machines and install LDAP on the shared disk.
   b. Create the database instances in the shared volume group.
   c. Use SAM to configure the packages.
   d. Customize the package control scripts for volume group activation, service IP addresses, service start, and service stop. Since the control scripts are very long, we give only the key sections of our sample scripts for DB2 and Oracle here; a sketch of the variable-assignment section is shown after this step.

   For DB2, our sample service start script is:

function customer_defined_run_cmds
{
        # Start the DB2 instance as the instance owner
        su - db2inst4 <<STARTDB
db2start
STARTDB
        test_return 51
}

Our sample DB2 service stop script is:

function customer_defined_halt_cmds
{
        # Disconnect all applications, then stop the DB2 instance
        su - db2inst4 <<STOPDB
db2 force applications all
sleep 1
db2stop
STOPDB
        test_return 52
}

For Oracle, our sample service start script is:

function customer_defined_run_cmds
{
        # The quoted here-document delimiter prevents the control script
        # itself from expanding $SIDS and $ORACLE_SID; they are expanded
        # by the oracle user's shell instead.
        su - oracle <<'STARTDB'
lsnrctl start
export SIDS="APP ADMIN SESSION"
for SID in $SIDS ; do
        export ORACLE_SID=$SID
        echo "connect internal\nstartup\nquit" | svrmgrl
done
STARTDB
        test_return 51
}

Our sample Oracle service stop script is:

function customer_defined_halt_cmds
{
        # Shut down each instance, then stop the listener
        su - oracle <<'STOPDB'
export SIDS="APP ADMIN SESSION"
for SID in $SIDS ; do
        export ORACLE_SID=$SID
        echo "connect internal\nshutdown\nquit" | svrmgrl
done
lsnrctl stop
STOPDB
        test_return 52
}

   e. Distribute the package configuration using SAM.
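As mentioned in step 5d, the variable-assignment section of the control script covers volume group activation, the relocatable service IP address, and the monitored service. A rough sketch of that section follows; all names, paths, and addresses are examples rather than our actual values:

# volume group to activate and file system to mount
VG[0]="vgdb"
LV[0]="/dev/vgdb/lvdb"
FS[0]="/db2data"
FS_MOUNT_OPT[0]="-o rw"

# relocatable IP address for the database service and its subnet
IP[0]="192.168.10.21"
SUBNET[0]="192.168.10.0"

# monitored service started by the package
SERVICE_NAME[0]="db2_monitor"
SERVICE_CMD[0]="/etc/cmcluster/db2/monitor.sh"
SERVICE_RESTART[0]="-r 2"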

6. Verify the cluster operation and configuration to ensure that:
Heartbeat networks are up and working normally
Networks are up and working normally
All nodes are up and working normally
All configured properties are correct
All services, such as DB2, Oracle, and LDAP, are up and working normally
The logs contain no errors
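Most of these checks can be made with cmviewcl and the system log, for example (the log path is the HP-UX default):

cmviewcl -v                          # shows nodes, networks, packages, and services
tail -f /var/adm/syslog/syslog.log   # watch for cluster and package errors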

7. Verify system failover from SAM by moving packages from one node to another.
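The same test can also be driven from the command line. Assuming a package named webspheredb2 (our example name) and hp2 as the adoptive node, the move and the re-enabling of automatic switching would look like this:

cmhaltpkg webspheredb2               # halt the package on its current node
cmrunpkg -n hp2 webspheredb2         # run it on the adoptive node
cmmodpkg -e webspheredb2             # re-enable package switching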
