IBM Tivoli Monitoring, Version 6.3 Fix Pack 2
Add HACMP monitoring for the monitoring server processes
Use HACMP™ monitoring to monitor the processes of any defined HACMP application server. If the processes are not running, HACMP attempts to restart the component. If the restart attempt fails, the application server fails over to the other node; this behavior depends on the script that is defined for the recovery action.
Because the monitoring server requires two running processes, both processes must be stopped before the monitoring server is restarted. The two Tivoli Enterprise Monitoring Server processes are kdsmain and cms.
A modified start script that first checks whether one or both processes are running might resemble the script shown at the end of this topic. The following parameters were used in the monitor setup:
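The two-process requirement can be sketched as a small classifier. This is only an illustration: tems_state is a hypothetical helper name, and the match patterns (kdsmain and "cms start") are the ones used by the start script in this topic, so they are assumed to match the process listing on your system. The function reads ps -ef style text on standard input and reports whether both, only one, or neither process is present.

```bash
#!/bin/bash
# Hypothetical helper (not part of ITM or HACMP): classify the state of the
# two monitoring server processes from "ps -ef" style text on stdin.
# Prints: both | partial | none
tems_state() {
    awk '
        /kdsmain/   && !/awk/ { kds = 1 }   # kdsmain process line seen
        /cms start/ && !/awk/ { cms = 1 }   # cms process line seen
        END {
            if (kds && cms)      print "both"
            else if (kds || cms) print "partial"
            else                 print "none"
        }'
}
```

For example, ps -ef | tems_state could be used to decide between leaving the server alone (both), stopping the leftover process before a start (partial), or starting normally (none).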
- Monitor Mode: both
- Owner: root
- Instance count: 1
- Stabilization interval: 60
- Restart count: default 3 (depends on your environment)
- Restart interval: 3 x 60 = 180
- Action on application failure: fallover
- Restart method: startscript
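The Restart interval value in the list above appears to be the Restart count multiplied by the Stabilization interval. A minimal sketch of that arithmetic follows; restart_interval is a hypothetical helper name, not an HACMP command, and the values for your environment may differ.

```bash
#!/bin/bash
# Hypothetical helper: reproduce the "3 x 60 = 180" arithmetic from the list.
# $1 = restart count, $2 = stabilization interval (result is in the same unit)
restart_interval() {
    echo $(( $1 * $2 ))
}
```

With the values from the list, restart_interval 3 60 prints 180.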
Additional scripts that call this script with the correct parameters must be created.
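One possible shape for these wrapper scripts is sketched below. Everything specific here is an assumption for illustration: the script location in TEMS_CTL, the installation directory in ITM_HOME, and the hub name in HUB_NAME are placeholders that must be replaced with the values for your cluster. Each wrapper passes the three arguments that the script in this topic expects (action, ITM installation directory, hub name) and returns its exit code unchanged.

```bash
#!/bin/bash
# Hypothetical wrapper helpers. All three values below are placeholders.
TEMS_CTL="${TEMS_CTL:-/usr/local/hacmp/tems_ctl.sh}"   # assumed path of the script in this topic
ITM_HOME="${ITM_HOME:-/opt/IBM/ITM}"                   # assumed ITM installation directory
HUB_NAME="${HUB_NAME:-HUB_TEMS}"                       # assumed hub name

# Call the control script with one action plus the fixed arguments.
# The exit code of the control script is passed back to the caller.
tems_action() {
    "$TEMS_CTL" "$1" "$ITM_HOME" "$HUB_NAME"
}

tems_start()  { tems_action start; }    # for the application server start script
tems_stop()   { tems_action stop; }     # for the application server stop script
tems_status() { tems_action status; }   # for the process monitor method
```

In practice each of the three functions would live in its own small script file, so that HACMP can be given a single path without arguments for the start, stop, and monitor methods.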
For additional information on how to set up an application server process monitor, see the HACMP for AIX Administration Guide.
A modified start script:
#!/bin/bash
#---------------------------------------------------------------------
#
# HACMP: HA start/stop/status script for <TEMSSRV>
#
# Control <TEMSSRV>
# ==================
#
# This script is to be used as Start/Stop/MonitorCommand
# Invocation:
#
# <$0>.ksh <Action> itm_installation_directory hubname
#
# arg $1 <Action> is any one of: {start|stop|status}
# arg $2 ITM installation directory
# arg $3 ITM hub name
#---------------------------------------------------------------------
# intended IBM.Application definition
# PersistentResourceAttributes::
# Name="SA-<TEMSSRV>-rs"
# ResourceType=1
# NodeNameList="{${clusternode1},${clusternode2}}"
# StartCommand=
# StartCommandTimeout=
# StopCommand=
# StopCommandTimeout=
# MonitorCommand=
# MonitorCommandPeriod=
# MonitorCommandTimeout=
# UserName=root
# RunCommandsSync=1
#---------------------------------------------------------------------
## Static
#---------------------------------------------------------------------
INVOCATION="$0 $@"
Myname=`/bin/basename $0`
USAGE="Usage: ${Myname} [start|stop|status] itm_installation_directory hubname"
STATUS_UNKNOWN=0
STATUS_ONLINE=1
STATUS_OFFLINE=2
STATUS_FAILED_OFFLINE=3
STATUS_STUCK_ONLINE=4
STATUS_PENDING_ONLINE=5
STATUS_PENDING_OFFLINE=6
STARTSTOP_OK=0
STARTSTOP_ERR=1
RC=0
#---------------------------------------------------------------------
## Arguments
NumArgs=3
#---------------------------------------------------------------------
Action=${1:-status}
CANDLEHOME=${2}
HUBNAME=${3}
#---------------------------------------------------------------------
## Var (non-configurable)
#---------------------------------------------------------------------
## ...
#---------------------------------------------------------------------
## Var (configurable)
#---------------------------------------------------------------------
# SYSLOG_LVL - 0 is least detailed, 2 is most detailed.
# written to syslog
SYSLOG_LVL=1

#--- Verify the OS and set absolute paths to the binaries
OS=`uname -s`
OSVERSION="`uname -v`"

# do distro stuff
if [ -a /etc/SuSE-release ]
then
  loggerPATH=/bin
else
  loggerPATH=/usr/bin
fi

case $OS in
  AIX)
    INST_INTERP="AIX"
    BIN="/usr/bin"
    AWK_CMD="$BIN/awk"
    CAT_CMD="$BIN/cat"
    DATE_CMD="$BIN/date"
    PS_CMD="$BIN/ps"
    SU_CMD="$BIN/su"
    GREP_CMD="$BIN/grep"
    TEE_CMD="$BIN/tee"
    PIDMON_CMD="$BIN/pidmon"
    LOGGER_CMD="$BIN/logger"
    KILL_CMD="$BIN/kill"
    ;;
  Linux)
    USRBIN="/usr/bin"
    BIN="/bin"
    AWK_CMD="$BIN/gawk"
    DATE_CMD="$BIN/date"
    CAT_CMD="$USRBIN/cat"
    PS_CMD="$BIN/ps"
    SU_CMD="$BIN/su"
    KILL_CMD="$BIN/kill"
    GREP_CMD="$USRBIN/grep"
    TEE_CMD="$USRBIN/tee"
    PIDMON_CMD="$USRBIN/pidmon"
    if [ -a /etc/SuSE-release ]; then
      LOGGER_CMD="$BIN/logger"
    else
      LOGGER_CMD="$USRBIN/logger"
    fi
    case `uname -m` in
      *390*)
        INST_INTERP="LINUX_S390"
        ;;
      *86*)
        INST_INTERP="LINUX_I386"
        ;;
      *)
        INST_INTERP="LINUX_OTHER"
        ;;
    esac
    ;;
esac

#----------------------------------------------------
# function: logit
# arg $1 log level
# arg $2 message
#----------------------------------------------------
function logit {
  if [ $SYSLOG_LVL -ge $1 ]; then
    echo ${Myname} "$2"
    ${LOGGER_CMD} -i -t ${Myname}: "$2"
  fi
} #logit

#----------------------------------------------------------------------
## Main Section
#----------------------------------------------------------------------
if [ $# != ${NumArgs} ]; then
  echo ${USAGE}
  logit 0 "Bad Usage returning: 0"
  exit 0
fi

export CANDLEHOME
BINARCH=$CANDLEHOME/*/sy/bin
export BINARCH
export HUBNAME

case ${Action} in
  start)
    logit 1 "Start command issued"
    kdsmainproc=$($PS_CMD -ef | $AWK_CMD '/kdsmain/ && !/awk/ {print $2}')
    cmsproc=$($PS_CMD -ef | $AWK_CMD '/cms start/ && !/awk/ {print $2}')
    restart=0
    start=1
    if [[ $kdsmainproc != "" ]]; then
      if [[ $cmsproc != "" ]]; then
        # both processes are running: nothing to start
        start=0
      else
        # only kdsmain is running: stop it before a clean start
        $KILL_CMD -9 $kdsmainproc
        start=1
      fi
    else
      if [[ $cmsproc != "" ]]; then
        # only cms is running: stop it before a clean start
        $KILL_CMD -9 $cmsproc
        start=1
      fi
    fi
    if [[ $start = 1 ]]; then
      $SU_CMD - root -c "$CANDLEHOME/bin/itmcmd server start $HUBNAME"
      RC=$?
    else
      RC=0
    fi
    logit 0 "Start command returned: $RC"
    ;;
  stop)
    logit 1 "Stop command issued"
    kdsmainproc=$($PS_CMD -ef | $AWK_CMD '/kdsmain/ && !/awk/ {print $2}')
    cmsproc=$($PS_CMD -ef | $AWK_CMD '/cms start/ && !/awk/ {print $2}')
    if [[ $kdsmainproc != "" ]]; then
      if [[ $cmsproc != "" ]]; then
        # both processes are running: stop the server normally
        $SU_CMD - root -c "$CANDLEHOME/bin/itmcmd server stop $HUBNAME"
      else
        $KILL_CMD -9 $kdsmainproc
      fi
    else
      if [[ $cmsproc != "" ]]; then
        $KILL_CMD -9 $cmsproc
      fi
    fi
    RC=0
    logit 0 "Stop command returned: $RC"
    ;;
  status)
    logit 2 "Status command issued"
    # The original listing matched /kdsmain/ here; /cms start/ is used instead
    # so that the cms process is actually checked, as in start and stop.
    cmsprocstat=$($PS_CMD -ef | $AWK_CMD '/cms start/ && !/awk/ {print $2}')
    if [[ $cmsprocstat != "" ]]; then
      # the cms process is running
      echo "cms running"
      cmsStatus=1
    else
      # the cms process isn't running
      cmsStatus=2
    fi
    kdsmainproc=$($PS_CMD -ef | $AWK_CMD '/kdsmain/ && !/awk/ {print $2}')
    start=1;
    if [[ $kdsmainproc != "" ]]; then
      # the kdsmain process is running
      kdsStatus=1
    else
      # the kdsmain process isn't running
      kdsStatus=2
    fi
    if [[ $cmsStatus = "1" && $kdsStatus = "1" ]]; then
      # HACMP expects 0 if application running
      RC=0;
    else
      # and non-zero if not running
      RC=2;
    fi
    logit 2 "Status command returned: $RC"
    ;;
  *)
    # The original listing referenced an undefined variable (UNKNOWN);
    # STATUS_UNKNOWN, defined above, is used instead.
    RC=${STATUS_UNKNOWN}
    echo ${USAGE}
    logit 0 "Bad Action returning: ${RC}"
    ;;
esac
exit $RC