Oracle Cluster Management Software for Linux
Overview
OCMS is included with the Oracle9i Enterprise Edition for Linux. It provides cluster membership services, a global view of clusters, node monitoring, and cluster reconfiguration. It is included as a part of Oracle9i Real Application Clusters on Linux and is installed automatically when you choose Oracle9i Real Application Clusters. OCMS consists of the following components:
- Watchdog Daemon
- Node Monitor
- Cluster Manager
Watchdog Daemon
The Watchdog daemon (watchdogd) uses the standard Linux Watchdog timer to monitor selected system resources to prevent database corruption.
The Watchdog daemon monitors the Node Monitor and the Cluster Manager and passes notifications to the Watchdog timer at defined intervals. The behavior of the Watchdog timer is partially controlled by the CONFIG_WATCHDOG_NOWAYOUT kernel configuration parameter.
Oracle9i Real Application Clusters requires that you set the value of the CONFIG_WATCHDOG_NOWAYOUT configuration parameter to Y (disable watchdog shutdown on close). When the Watchdog timer detects an Oracle instance or service failure, it resets the server to avoid possible corruption of the database. If the value of the CONFIG_WATCHDOG_NOWAYOUT parameter is N and a failure is detected, the Watchdog timer does not reset the server.
Oracle9i Real Application Clusters uses the software implementation of the Watchdog timer provided by the Linux kernel.
Node Monitor
The Node Monitor (oranm) maintains a consistent view of the cluster and reports the status of the nodes in the cluster to the Cluster Manager. The Node Monitor uses a heartbeat mechanism. During normal operations, if the heartbeat mechanism fails, the Node Monitor uses a quorum partition on the shared disk to distinguish between a node failure and a network failure.
The Node Monitors on all nodes in a cluster send heartbeat messages to each other. Each node maintains a database containing status information on the other nodes. The Node Monitors in a cluster mark a node inactive if the node fails to send a heartbeat message within a defined time interval.
The heartbeat message from the Node Monitor on a remote server can fail for the following reasons:
- The termination of the Node Monitor on the remote server
- A network failure
- Abnormally heavy load on the remote server
From each cluster node, the Node Monitor periodically updates the designated block on the quorum partition. Other nodes check the timestamp for each updated block. If the heartbeat is dead and the block timestamp is current, the network has failed.
If a node in a cluster stops sending heartbeat messages but continues writing to the shared raw partition, the Node Monitors on other nodes recognize that a network failure occurred. The Node Monitor reconfigures the cluster to terminate the isolated nodes, ensuring that the remaining nodes in the reconfigured cluster continue to function properly.
Abnormally heavy I/O loads can slow down the transmission of heartbeat messages and might indicate a node failure. The Node Monitor works with the Watchdog daemon to stop the node with the abnormally heavy load.
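The decision described in this section can be sketched as a small shell function. This is only an illustration of the rule, not the oranm implementation: the function name classify_peer, its three inputs, and the age threshold used to decide whether a quorum block is "current" are all assumptions made for the example.

```shell
#!/bin/sh
# classify_peer HEARTBEAT_OK BLOCK_AGE_MS MAX_AGE_MS
#   HEARTBEAT_OK  1 if a heartbeat arrived in time, 0 otherwise
#   BLOCK_AGE_MS  age of the peer's quorum-block timestamp (hypothetical input)
#   MAX_AGE_MS    age up to which the block counts as current (assumed threshold)
classify_peer() {
    heartbeat_ok=$1
    block_age_ms=$2
    max_age_ms=$3
    if [ "$heartbeat_ok" -eq 1 ]; then
        echo "active"
    elif [ "$block_age_ms" -le "$max_age_ms" ]; then
        # No heartbeat, but the node still writes to the shared disk.
        echo "network-failure"
    else
        # No heartbeat and a stale block: the node itself has stopped.
        echo "node-failure"
    fi
}

classify_peer 1 200 3000     # active
classify_peer 0 500 3000     # network-failure
classify_peer 0 9000 3000    # node-failure
```

The second case is the one the quorum partition exists for: without the block timestamp, a silent but still-writing node would be indistinguishable from a dead one.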
Cluster Manager
The Cluster Manager (oracm) maintains the process-level cluster status. The Cluster Manager accepts registration of Oracle instances to the cluster and provides a consistent view of Oracle instances. The Cluster Manager also propagates status information to all the Oracle instances, enabling communication among instances.
If the LMON process or another Oracle process that can write to the shared disk quits abnormally, the Cluster Manager daemon on the node detects it and requests the Watchdog daemon to stop the node completely. This stops the node from issuing physical I/O to the shared disk before Cluster Manager daemons on the other nodes report the cluster reconfiguration to Oracle instances on the nodes. This action prevents database corruption.
Starting OCMS
Oracle Corporation supplies the $ORACLE_HOME/oracm/bin/ocmstart.sh sample startup script. Run the script as the root user using the ORACLE_HOME and PATH environment variables as defined in the Oracle9i Installation Guide Release 1 (9.0.1) for UNIX Systems. After you are familiar with starting the Watchdog daemon, the Node Monitor, and the Cluster Manager, you can use the script to automate the startup process.
Starting the Watchdog Daemon
To start the Watchdog daemon, enter the following commands:
$ su root
# cd $ORACLE_HOME/oracm/bin
# watchdogd -g dba
The default location of the Watchdog log file is $ORACLE_HOME/oracm/log/wdd.log.
The Watchdog daemon does not have configuration files.
Watchdogd Daemon Arguments
| Argument | Valid Values | Default Value | Description |
| --- | --- | --- | --- |
| -l number | 0 or 1 | 1 | If the value is 0, no resources are registered for monitoring. This argument is used for debugging system configuration problems. If the value is 1, the Cluster Manager and the Node Monitor are registered for monitoring. Oracle Corporation recommends using this option for normal operations. |
| -m number | 0 to 180000 milliseconds | 0 | Extends the margin time of the Watchdog daemon. For more information on Watchdog devices, see the /usr/src/linux/Documentation/watchdog.txt file in the Linux kernel source code. |
| -t number | 10 to 3000 milliseconds | 1000 | The time interval at which the Watchdog daemon checks the heartbeat messages from its clients. number must be less than the value of the soft_margin parameter. |
| -d string | | /dev/watchdog | Path of the Watchdog timer file. |
| -e string | | $ORACLE_HOME/oracm/log/wdd.log | Filename of the Watchdog daemon trace file. |
| -g string | | "" (empty string). No group is allowed to connect to the Watchdog daemon. | Makes the Watchdog daemon service available for the processes owned by the group defined by the -g string argument. |
Configuring the Node Monitor
To configure the Node Monitor, create the nmcfg.ora file in the $ORACLE_HOME/oracm/admin directory on each node and set the following parameters:
- Specify the DefinedNodes parameter. This parameter lists all nodes belonging to the cluster. You must define the host names in the /etc/hosts file before installing Oracle9i. For example, enter the following where node1, node2, node3, and node4 are the host names of the nodes in the cluster:
  DefinedNodes=node1 node2 node3 node4
- Specify the quorum partition location in the CmDiskFile parameter. For example, if your quorum partition is /dev/raw1, enter the following:
  CmDiskFile=/dev/raw1
- Specify the CmHostName parameter. This parameter stores the local host name for private networking. You must define the local host name in the /etc/hosts file before installing Oracle9i. For example, enter the following where node1 is the host name used for internode communication:
  CmHostName=node1
- Save the configured file to the $ORACLE_HOME/oracm/admin directory.
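Because every name in DefinedNodes and CmHostName must also appear in the /etc/hosts file, a quick consistency check before starting the Node Monitor can save a failed startup. The following is a minimal sketch and not an Oracle-supplied tool: the function names cfg_value, host_defined, and check_nmcfg are hypothetical, and it assumes that nmcfg.ora uses plain Parameter=value lines as in the examples above.

```shell
#!/bin/sh
# cfg_value FILE KEY: print the value of the first "KEY=value" line in FILE.
cfg_value() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# host_defined HOSTS_FILE NAME: succeed if NAME appears as a host name or
# alias (any field after the address) on a non-comment line of HOSTS_FILE.
host_defined() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$1"
}

# check_nmcfg CFG_FILE HOSTS_FILE: report each name from DefinedNodes and
# CmHostName that is missing from HOSTS_FILE; return non-zero if any is.
check_nmcfg() {
    rc=0
    for name in $(cfg_value "$1" DefinedNodes) $(cfg_value "$1" CmHostName); do
        if ! host_defined "$2" "$name"; then
            echo "missing from $2: $name"
            rc=1
        fi
    done
    return $rc
}
```

A typical call would be check_nmcfg $ORACLE_HOME/oracm/admin/nmcfg.ora /etc/hosts, run on each node after editing the file.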
Node Monitor Parameters of the nmcfg.ora File
| Parameter | Valid Values | Default Value | Description |
| --- | --- | --- | --- |
| AutoJoin | 0 or 1 | 0 | If this parameter is set to 1, the Node Monitor joins the cluster when the Node Monitor starts. The default action is that the Node Monitor joins the cluster when the Cluster Manager requests to join the cluster. |
| CmDiskFile | Up to 256 characters | No default value. Set the value explicitly. | Pathname of the quorum partition. |
| CmHostName | Up to 256 characters | Host name of the local node. | Stores the local host name for private networking. Define the name in the /etc/hosts file. |
| CmServiceName | Up to 256 characters | CmSrvr | Service name to be used for communication among the Node Monitors. If the Node Monitor cannot find the service name in the /etc/services file, it uses the port designated by the CmServicePort parameter. |
| CmServicePort | 1 to 65535 | 60001 | Port number to be used for communication among Node Monitors when the CmServiceName parameter cannot designate the port number. |
| DefinedNodes | Up to 4096 characters | No default value. Set the value explicitly. | List of host names, separated by spaces, of all the nodes in the cluster. |
| MissCount | 2 to 1000 | 3 | When a node fails to send a heartbeat message within the time obtained by multiplying the value of the MissCount parameter by the value of the PollInterval parameter, the Node Monitor defines the node as dead. |
| PollInterval | 10 to 180000 milliseconds | 1000 | Sends heartbeat messages at this interval. |
| WatchdogMarginWait | See "Configuring Timing for Cluster Reconfiguration". | 70000 | Specifies the delay between a node failure and the commencement of Oracle9i Real Application Clusters cluster reconfiguration. |
Starting the Node Monitor
To start the Node Monitor:
- Confirm that the Watchdog daemon is running by entering the following command:
  $ ps -elf | grep watchdogd
- As the root user, start the Node Monitor as a background process. Redirect the output to a log file (although output is not normally expected).
  The following example shows how to start a Node Monitor service:
  $ su root
  # cd $ORACLE_HOME/oracm/bin
  # oranm </dev/null >$ORACLE_HOME/oracm/log/nm.out 2>&1 &
  In the preceding example, all of the output messages and error messages are written to the $ORACLE_HOME/oracm/log/nm.out file.
The oranm process spawns multiple threads. You can list all the threads by entering the ps -elf command.
oranm Arguments
| Argument | Description |
| --- | --- |
| /c | Indicates verbose mode. It prints messages sent from the Cluster Manager to the Node Monitor. |
| /e:file | The name of the trace file for the Node Monitor. The maximum filename length is 192 characters. The default value is $ORACLE_HOME/oracm/log/nm.log. |
| /r | Shows brief help for the Node Monitor parameters. The Node Monitor does not start if you specify this argument. |
| /s | Indicates verbose mode. It prints detailed information about Node Monitor network traffic. |
| /v | Indicates verbose mode. It prints detailed information about every activity of the Node Monitor. |
| /x:MaxLogSize | Specifies the maximum size of the trace file. When the size of the trace file reaches this maximum value and it is the first trace file, the Node Monitor renames the trace file to file.startup and creates a new trace file. When the size of a trace file reaches this maximum value and it is not the first trace file, the Node Monitor renames the trace file to file.bak and creates a new trace file. The minimum value of MaxLogSize is 4096 and its maximum value is 2147483647. (A value of -1 indicates an unlimited maximum size.) The default value is 1000000. |
| /? | Shows help for the arguments of the Node Monitor. The Node Monitor does not start if you specify this argument. |
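The trace file rotation rule for the /x:MaxLogSize argument can be restated as two small shell functions. This is a sketch of the rule as described, not of the Node Monitor's internals: the function names are hypothetical, and the size is assumed to be measured in the same unit as MaxLogSize.

```shell
#!/bin/sh
# needs_rotation SIZE MAX: succeed when a trace file of SIZE has reached MAX.
# A MAX of -1 means the size is unlimited, so rotation never happens.
needs_rotation() {
    [ "$2" -ne -1 ] && [ "$1" -ge "$2" ]
}

# rotated_name FILE IS_FIRST: print the name the full trace file is renamed to.
# IS_FIRST is 1 for the first trace file written since startup, 0 afterwards.
rotated_name() {
    if [ "$2" -eq 1 ]; then
        echo "$1.startup"
    else
        echo "$1.bak"
    fi
}

if needs_rotation 1000000 1000000; then
    rotated_name nm.log 1    # nm.log.startup
fi
rotated_name nm.log 0        # nm.log.bak
```

One consequence of the rule is that the startup trace is preserved for the life of the process, while later rotations overwrite a single .bak file.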
Starting the Cluster Manager
Perform the following steps to start the Cluster Manager:
- Confirm that the Watchdog daemon and Node Monitor are running.
- Confirm that the host name specified by the CmHostName parameter in the nmcfg.ora file is listed in the /etc/hosts file.
- As the root user, start the oracm process as a background process. Redirect any output to a log file. For example, enter the following:
  $ su root
  # cd $ORACLE_HOME/oracm/bin
  # oracm </dev/null >$ORACLE_HOME/oracm/log/cm.out 2>&1 &
  In the preceding example, all of the output messages and error messages are written to the $ORACLE_HOME/oracm/log/cm.out file.
The oracm process spawns multiple threads. To list all the threads, enter the ps -elf command.
Cluster Manager Arguments
| Argument | Description |
| --- | --- |
| /a:action | Defines the action taken when the LMON process or another Oracle process that can write to the shared disk terminates abnormally. If action is 0, no action is taken. If action is 1 (the default), the Cluster Manager requests the Watchdog daemon to stop the node completely. |
| /d | Enables debug mode. If you set this argument, the Cluster Manager prints trace information that is useful for investigating problems. |
| /e:file | The name of the trace file for the Cluster Manager. The maximum filename length is 192 characters. The default value is $ORACLE_HOME/oracm/log/cm.log. |
| /v | Indicates verbose mode. It prints detailed information on every activity of the Cluster Manager. |
| /x:MaxLogSize | Specifies the maximum size of the trace file. When the size of the trace file reaches this maximum value and it is the first trace file, the Cluster Manager renames the trace file to file.startup and creates a new trace file. When the size of a trace file reaches this maximum value and it is not the first trace file, the Cluster Manager renames the trace file to file.bak and creates a new trace file. The minimum value of MaxLogSize is 4096 and its maximum value is 2147483647. (A value of -1 indicates an unlimited maximum size.) The default value is 1000000. |
| /? | Shows help for the arguments of the Cluster Manager. The Cluster Manager does not start if you specify this argument. |
Configuring Timing for Cluster Reconfiguration
To avoid database corruption when a node fails, there is a delay before the Oracle9i Real Application Clusters reconfiguration commences. Without this delay, simultaneous access of the same data block by the failed node and the node performing the recovery can cause database corruption. The length of the delay is defined by the WatchdogMarginWait parameter. By default, the time between when the failure is detected and the start of the cluster reconfiguration is 70 seconds.
The value of the WatchdogMarginWait parameter must be greater than the value of the Watchdog daemon -m argument plus the value of the soft_margin parameter.
If you decrease the value of the WatchdogMarginWait parameter, ensure that the sum of the value of the Watchdog daemon -m argument and the value of the soft_margin parameter is less than the value of the WatchdogMarginWait parameter.
For example, if you decrease the value of the WatchdogMarginWait parameter to 65000 ms, set the value of the soft_margin parameter to 50000 ms and the value of the Watchdog daemon -m argument to 10000 ms.
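The constraint can be checked with a few lines of shell arithmetic before any parameter is changed. This is a minimal sketch with a hypothetical function name, margin_ok. It assumes that soft_margin is supplied in seconds, as it is when passed to the softdog module (soft_margin=50 corresponds to 50000 ms), while WatchdogMarginWait and the -m argument are in milliseconds.

```shell
#!/bin/sh
# margin_ok WATCHDOG_MARGIN_WAIT_MS M_ARG_MS SOFT_MARGIN_S
# Succeeds when WatchdogMarginWait > (-m argument) + soft_margin.
# soft_margin is converted from seconds to milliseconds before comparing.
margin_ok() {
    wait_ms=$1
    m_ms=$2
    soft_ms=$(( $3 * 1000 ))
    [ "$wait_ms" -gt $(( m_ms + soft_ms )) ]
}

if margin_ok 65000 10000 50; then
    echo "65000 ms is safe: 10000 + 50000 = 60000 < 65000"
fi
if ! margin_ok 65000 10000 60; then
    echo "65000 ms is too small: 10000 + 60000 = 70000"
fi
```

The comparison is strict, so a WatchdogMarginWait exactly equal to the sum is rejected as well.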
To avoid database corruption, reduce the value of the soft_margin parameter before you reduce the value of the WatchdogMarginWait parameter:
- Stop the Oracle instance.
- If you load the softdog module from a system startup file, reduce the value of the soft_margin parameter as follows:
  - Edit the script to reduce the value of the soft_margin parameter. For example, enter:
    /sbin/insmod softdog soft_margin=50
  - Reboot the server.
    # shutdown -r now
- If you do not load the softdog module from a system startup file, reduce the value of the soft_margin parameter as follows:
  - If the softdog module is already loaded, reboot the server.
    # shutdown -r now
  - Load the softdog module with a smaller value for the soft_margin parameter. For example:
    # /sbin/insmod softdog soft_margin=50
- Change the value of the WatchdogMarginWait parameter in the $ORACLE_HOME/oracm/admin/nmcfg.ora file. For example, enter the following line:
  WatchdogMarginWait=64000
- Restart watchdogd, oranm, and oracm, and the Oracle instance.
Watchdog Daemon and Cluster Manager Starting Options
This section describes how to disable a system reset caused by a node failure. You can also use this procedure for testing or debugging.
By default, the Watchdog daemon starts with an option of -l 1 and the oracm process starts with an option of /a:1. With these default values, the unexpected termination of the LMON process, oranm, oracm, or watchdogd causes a system reset. Also, in the current version, when the watchdogd daemon is running with an option of -l 1, the only way to stop oracm, oranm, and watchdogd is to reboot the system. Therefore, if you run OCMS to perform testing or debugging, Oracle Corporation recommends using the -l 0 and -d /dev/null options of the watchdogd daemon and the /a:0 option of the oracm command.
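Put together, a test or debugging startup might use command lines like the ones printed by the sketch below. The function only prints the commands and runs nothing. The individual commands and options come from this document, but combining -g, -l 0, and -d /dev/null on one watchdogd command line in this order is an assumption, as is the function name ocms_debug_commands.

```shell
#!/bin/sh
# ocms_debug_commands [GROUP]: print (do not run) the start commands with the
# options recommended for testing or debugging. GROUP defaults to dba.
ocms_debug_commands() {
    group=${1:-dba}
    # -l 0: register no resources; -d /dev/null: no real Watchdog timer file.
    echo "watchdogd -g $group -l 0 -d /dev/null"
    # The Node Monitor is started exactly as in normal operation.
    echo 'oranm </dev/null >$ORACLE_HOME/oracm/log/nm.out 2>&1 &'
    # /a:0: take no action when LMON or another writer terminates abnormally.
    echo 'oracm /a:0 </dev/null >$ORACLE_HOME/oracm/log/cm.out 2>&1 &'
}

ocms_debug_commands dba
```

Run the printed commands as the root user from the $ORACLE_HOME/oracm/bin directory, in the order shown.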
Known Issues and Restrictions
This section describes restrictions that apply when you use the following Oracle9i Real Application Clusters features with OCMS:
Lamport System Change Number Generation
Lamport System Change Number (SCN) generation improves the performance of transactions. You can enable or disable the Lamport SCN generation. When it is enabled, a delay occurs between the time that the Oracle instance commits an update on a node and the time that the update is reflected in queries on other nodes.
To enable or disable the Lamport SCN generation, set the MAX_COMMIT_PROPAGATION_DELAY initialization parameter. The default value is 90000.
Values of the MAX_COMMIT_PROPAGATION_DELAY Parameter
| Value | Delay (seconds) | Lamport Clock |
| --- | --- | --- |
| 0 | 0 | No |
| 1 to 299 | Value divided by 100 | No |
| 300 to 700 | 3 | No |
| Greater than 700 | 6 | Yes (default) |

In the preceding table, the Value column lists values of the MAX_COMMIT_PROPAGATION_DELAY initialization parameter. The Delay column lists the maximum delay between a commit occurrence on an instance and the time a commit becomes valid on all other instances. The Lamport Clock column indicates whether the Lamport SCN generation is enabled.
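The mapping from parameter value to delay and Lamport setting can be written out as a lookup function. This sketch simply encodes the ranges listed for the MAX_COMMIT_PROPAGATION_DELAY parameter in this section. The function name commit_delay is hypothetical, and the input is assumed to be a non-negative integer.

```shell
#!/bin/sh
# commit_delay VALUE: print "<delay in seconds> <Lamport clock yes/no>" for a
# MAX_COMMIT_PROPAGATION_DELAY value, following the ranges in this section.
commit_delay() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "0 no"
    elif [ "$v" -le 299 ]; then
        # Value divided by 100, printed with two decimal places.
        printf '%d.%02d no\n' $(( v / 100 )) $(( v % 100 ))
    elif [ "$v" -le 700 ]; then
        echo "3 no"
    else
        echo "6 yes"
    fi
}

commit_delay 150      # 1.50 no
commit_delay 500      # 3 no
commit_delay 90000    # 6 yes
```

With the default value of 90000, the function reports that the Lamport SCN generation is enabled.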
ARCHIVELOG Mode and Recovery
To enable database recovery, set the THREAD initialization parameter to a value other than 0 when you use the database in ARCHIVELOG mode. Otherwise, database recovery is not possible.
Shared Server
If you use Shared Server with Oracle9i Real Application Clusters, the value of the MAX_SERVERS initialization parameter must be equal to or greater than the value of the TRANSACTIONS parameter or the Oracle instance deadlocks.