High availability environment troubleshooting tips


Message HMGR0218I is not displayed after a Java virtual machine starts

In a properly set up high availability environment, a high availability manager can reassess the environment it is managing and accept new components as they are added to the environment. For example, when a Java virtual machine (JVM) is added to the infrastructure, a discovery process begins. During startup, the JVM tries to contact the other members of the core group. When it finds another running JVM, it initiates a join process with that JVM, which determines whether the new JVM can join the core group. If the new JVM is accepted as a member of the core group, all of the JVMs, including the new one, log message HMGR0218I. This message is also displayed on the administrative console.
Message HMGR0218I indicates the number of application servers in the core group that are currently online. If this message is not displayed after a JVM starts, either a configuration problem or a communication problem has occurred. To fix this situation, verify that the application server is running on a current configuration, either by using the deployment manager to tell the node agent to synchronize, or by using the syncNode command to perform the synchronization manually. If the JVM still cannot join the core group, a network configuration problem exists.
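
One way to confirm whether the message was logged is to scan a copy of the log for the message ID. The following is a minimal sketch, not a product tool. It assumes that the message ID appears as a plain token in each log line. The find_message helper and the sample lines are hypothetical, and real log text will differ.

```python
import re

# Message ID taken from the text above.
MESSAGE_ID = "HMGR0218I"


def find_message(lines, message_id=MESSAGE_ID):
    """Return the log lines that contain the given message ID as a whole token."""
    pattern = re.compile(r"\b" + re.escape(message_id) + r"\b")
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


# Hypothetical sample lines for illustration only; they are not real log output.
sample = [
    "[ts] 0000000a ExampleComp I   HMGR0218I: sample view-change text",
    "[ts] 0000000b ExampleComp I   OTHER0001I: unrelated message",
]

matches = find_message(sample)
if matches:
    print("Found %d line(s) containing %s" % (len(matches), MESSAGE_ID))
else:
    print("%s not found; check synchronization and network configuration" % MESSAGE_ID)
```

To scan a real file, pass an open file object in place of the sample list. The path to the log file depends on your installation.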

Message HMGR0123I appears in the system log file

Message HMGR0123I might appear in the system log file if the status of core group members changes at the same time as the active coordinator changes. For example, this message might be issued when a core group member restarts and becomes the active coordinator.
This informational message usually does not indicate a serious problem. Even if the message appears in the system log file, the new active coordinator receives the updated group status. To minimize occurrences of this message, select a core group member that does not restart frequently as the preferred core group coordinator.

CPU starvation messages in the system log file

CPU starvation detected error messages are displayed in the system log file whenever there is not enough physical memory available to allow the high availability manager threads to have consistent runtimes. When the CPU spends the majority of its time loading swapped-out processes while also processing incoming work, thread starvation might occur. The high availability manager detects this condition and logs these error messages to inform you that threads are not getting the required runtime.
To achieve good performance and avoid receiving these error messages, allocate at least 512 MB of RAM for each Java process running on a single machine.
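
The sizing guideline above amounts to simple arithmetic, sketched below. The function names and the process count are hypothetical. The result covers only the Java processes and assumes that memory for the operating system and any other software is budgeted separately.

```python
# Guideline figure from the text above: at least 512 MB of RAM per Java process.
MIN_MB_PER_JAVA_PROCESS = 512


def minimum_ram_mb(java_process_count, mb_per_process=MIN_MB_PER_JAVA_PROCESS):
    """Return the minimum RAM, in MB, suggested for the Java processes on one machine."""
    if java_process_count < 0:
        raise ValueError("java_process_count must not be negative")
    return java_process_count * mb_per_process


def meets_guideline(available_mb, java_process_count):
    """Return True if the RAM available to Java processes satisfies the guideline."""
    return available_mb >= minimum_ram_mb(java_process_count)


# Hypothetical machine that runs four Java processes.
print(minimum_ram_mb(4))         # 4 * 512 = 2048
print(meets_guideline(1536, 4))  # False: 1536 MB is less than 2048 MB
```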

High CPU usage in a large cell configuration when security is enabled

With certain configurations and states, the amount of time spent in discovery becomes substantial:

If a large number of processes are defined within a core group, a proportionally large number of connections must be established to support these processes.
If a large number of inactive processes are defined within a core group, a proportionally large number of connections are attempted during each discovery interval.
If administrative security is enabled, the DCS connections are secured, and the impact of opening a connection greatly increases.

Use the Discovery and failure detection page in the administrative console to increase the length of time that the Discovery Protocol waits before it recalculates the set of unconnected core group members and attempts to open connections to those members. Increasing the length of time between consecutive discovery periods decreases the amount of CPU time that is spent in discovery.
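
To see why a longer discovery period helps, consider the rough estimate below. It is a simplified model, not a description of the product's internals: it assumes that every running member attempts one connection to every unconnected member in each discovery period. The function name and the member counts are hypothetical.

```python
def connection_attempts_per_hour(active_members, inactive_members, discovery_period_secs):
    """Estimate connection attempts per hour under the simplified model.

    Assumption: each active member tries each inactive member once per period.
    """
    if discovery_period_secs <= 0:
        raise ValueError("discovery_period_secs must be positive")
    periods_per_hour = 3600 / discovery_period_secs
    return active_members * inactive_members * periods_per_hour


# Hypothetical core group: 50 running members and 20 stopped members.
for period in (60, 120, 300):
    attempts = connection_attempts_per_hour(50, 20, period)
    print("period=%ds -> about %d attempts per hour" % (period, attempts))
```

Under this model, doubling the discovery period halves the number of attempts. With security enabled, each attempt costs more, so the savings in CPU time are correspondingly larger.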

Related concepts
Core group coordinator

Related tasks
Configure core group preferred coordinators
Configure the default Discovery Protocol for a core group
Set up a high availability environment

Related
syncNode