
The high availability deployment manager

The high availability (HA) deployment manager function is configured using a shared file system. When this configuration option is chosen, multiple deployment managers are configured. The benefit of the HA deployment manager function is that the deployment manager is no longer the single point of failure for cell administration. This is important in environments relying on automated operations, including application deployment and server monitoring.


dmgr overview

The deployment managers exist as peers. One is considered active, also known as primary, and hosts the administrative function of the cell, while the others are backups in standby mode. If the active manager fails, a standby takes over and is designated the new active deployment manager. A command line utility is provided to clone the original cell deployment manager into additional deployment managers. Each deployment manager is installed and configured to run on a different physical or logical computer. The deployment managers need not be hosted on homogeneous operating platforms, although like platforms are recommended. Each deployment manager shares the same instance of the master configuration repository and workspace area. These must be located on a shared file system.

The file system must support fast lock recovery. The IBM General Parallel File System (GPFS) is recommended, and the Network File System Version 4 (NFSv4) is also an option. If we use the high availability deployment manager on AIX Version 5.3 with NFS Version 4, bos.net.nfs.client Version 5.3.0.60 or later is required.

Avoid trouble: We must stop all deployment managers that are running in the environment before we can perform maintenance on the NFS drive. Use the extended repository service in conjunction with the HA deployment manager feature. In the event of an NFS failure, we can recover the latest configuration changes using the extended repository service.

Normal operation includes starting at least two deployment managers. A new highly available deployment manager component runs in each deployment manager to control which deployment manager is elected as the active one. Any other deployment manager in the configuration is in standby mode. The on demand router (ODR) is configured with the communication endpoints for the console, wsadmin.sh, and scripting. The ODR recognizes which deployment manager instance is active and routes all administrative communication to that instance. The HA deployment manager function supports only use of the JMX SOAP connector. The JMX RMI connector is not supported in this configuration.


Configuration

The deployment managers are initially configured into the same core group. Configuring the deployment managers in the same core group is important so that the routing information that is exposed to the ODR is consistent across all the deployment managers. If the deployment managers are placed into separate core groups, the core groups must be connected with a core group bridge.
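The core group rule above can be expressed as a small connectivity check. The following is only an illustrative sketch, not a product API: the function name, the dictionary of deployment manager names to core group names, and the list of bridge pairs are all hypothetical inputs, and the sketch assumes that a chain of bridges counts as a connection.

```python
def routing_view_is_consistent(core_group_of, bridges):
    """Return True if every deployment manager is in one core group, or if
    all of their core groups are linked (directly or through a chain) by
    core group bridges.

    core_group_of: dict mapping deployment manager name -> core group name
    bridges: iterable of (core_group_a, core_group_b) pairs (hypothetical input)
    """
    groups = set(core_group_of.values())
    if len(groups) <= 1:
        return True  # same core group: no bridge needed

    # Build an undirected graph of bridged core groups.
    adjacency = {g: set() for g in groups}
    for a, b in bridges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Walk the graph from one deployment manager core group.
    start = next(iter(groups))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)

    # Every core group that hosts a deployment manager must be reachable.
    return groups <= seen
```

For example, two deployment managers in the same core group pass the check with an empty bridge list, while two deployment managers in different core groups pass only if a bridge pair links those groups.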

A typical HA deployment manager configuration consists of two deployment managers that are located on separate workstations. The deployment managers share a master repository located on a SAN FS. All administrative operations are performed through the elected active deployment manager. The standby deployment manager is fully initialized and ready to do work but cannot be used for administration. This is because the administrative function does not currently support multiple concurrent server processes writing to the same configuration. Therefore, the standby rejects any login and JMX requests.

However, if the active deployment manager is stopped or fails, the highly available deployment manager component recognizes the loss of the active deployment manager and dynamically switches the standby into active mode so it can take over for the lost deployment manager. The active and standby deployment managers share work spaces. When a deployment manager takeover occurs, work is not lost, because the ODR automatically recognizes the election of the new active deployment manager and reroutes administrative requests to it. Note that the deployment manager is unavailable for a period of less than one minute, until failover to the standby is complete.
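The active/standby behavior described above can be modeled in a few lines. This is a toy simulation under stated assumptions, not the product's implementation: the class and function names (DeploymentManager, Router, elect_active, StandbyRejected) are hypothetical, the election simply promotes the first live peer, and the Router class only stands in for the ODR.

```python
from dataclasses import dataclass


class StandbyRejected(Exception):
    """Raised when a standby peer receives an administrative request."""


@dataclass
class DeploymentManager:
    name: str
    active: bool = False
    alive: bool = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        if not self.active:
            # A standby is initialized but refuses login and JMX requests.
            raise StandbyRejected(f"{self.name} is in standby mode")
        return f"{self.name} handled {request}"


def elect_active(peers):
    """Stand-in for the HA component: keep a live active peer, otherwise
    promote the first live standby (the promotion order is an assumption)."""
    live = [p for p in peers if p.alive]
    if not live:
        raise RuntimeError("no deployment manager is running")
    for p in live:
        if p.active:
            return p
    for p in peers:
        p.active = False
    live[0].active = True
    return live[0]


class Router:
    """Stand-in for the ODR: forwards requests to the active peer only."""

    def __init__(self, peers):
        self.peers = peers

    def route(self, request: str) -> str:
        for p in self.peers:
            if p.alive and p.active:
                return p.handle(request)
        # Models the short window before a standby has been promoted.
        raise RuntimeError("no active deployment manager; failover in progress")
```

In this model, marking the active peer as not alive makes `Router.route` fail until `elect_active` has promoted a standby, which mirrors the short unavailability window during failover.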

Failover to the new active deployment manager is depicted in the following diagram:

While the HA deployment manager component is able to detect deployment manager failure and initiate takeover, there are edge conditions where each deployment manager could temporarily believe it is the active deployment manager. To prevent this situation from occurring, the active deployment manager holds a file lock in the shared file system. Because of this, the takeover of the active deployment manager by the standby will take a brief period of time approximately equal to the time it takes for the shared file system to detect the loss of the active deployment manager and release the lock. SAN FS and NFS both use a lock lease model and have configurable times for lock release for failed lock holders. This can be configured as low as 10 seconds for SAN FS.
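The relationship between lock lease time and takeover time can be illustrated with a toy lease model. This sketch does not use a real file system lock; the LeaseLock class and its methods are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
class LeaseLock:
    """Toy lock-lease model: the holder keeps the lock only while it keeps
    renewing. If the holder fails, others must wait for the lease to expire."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, who: str, now: float) -> bool:
        """Acquire or renew the lock at time `now` (seconds)."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == who:
            self.holder = who
            self.expires_at = now + self.lease_seconds
            return True
        return False  # another peer still holds an unexpired lease


def earliest_takeover(last_renewal: float, lease_seconds: float) -> float:
    """Earliest time a standby can obtain the lock after the active peer
    stops renewing (ignores detection and polling overhead)."""
    return last_renewal + lease_seconds
```

With a 10-second lease, if the active deployment manager last renewed at t=8 and then failed, a standby calling `try_acquire` is refused until t=18. Only one peer can hold the lock at any moment, which is what prevents two deployment managers from both acting as active.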


Related concepts

  • Topology Configurations for Multi-Cell Routing


Related tasks

  • Configure a high availability deployment manager environment


Related information

  • IBM General Parallel File System (GPFS) Information Center