Core group discovery and failure detection protocols

When a core group member starts, no connections to other core group members exist. If a core group is configured to run with either the default Discovery and Failure Detection Protocols or an alternative protocol provider, the corresponding tasks (either the discovery and failure detection tasks or the alternate protocol provider tasks) start as part of the process startup procedure. These tasks establish connectivity to other core group members, monitor this connectivity, and handle connectivity failures for this core group member. They run at regularly scheduled intervals for as long as the core group member is active.


The default Discovery Protocol

The default Discovery Protocol establishes network connectivity with the other members of the core group by retrieving the list of core group members and the associated network information from the product configuration settings. The Discovery Protocol then attempts to open network connections to all of the other core group members. At periodic intervals, the Discovery Protocol recalculates the set of unconnected members and attempts to open connections to those members.

When a connection is made to another core group member, the Discovery Protocol notifies the View Synchrony Protocol, and logs this event as an informational message in the SystemOut.log file.

Connections can fail at any time for a variety of reasons. The Failure Detection Protocol detects connection failures and notifies the Discovery Protocol. The Discovery Protocol then attempts to open a new network connection to that member at the next scheduled interval.
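The discovery cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not product code: the class name DiscoverySketch, its methods, and the try_connect and on_connected callbacks are all hypothetical names introduced for this example.

```python
# Illustrative sketch only: every name here is hypothetical, not a product API.

class DiscoverySketch:
    def __init__(self, other_members, try_connect, on_connected):
        self.other_members = list(other_members)  # taken from configuration
        self.try_connect = try_connect            # returns a connection or None
        self.on_connected = on_connected          # e.g. notify view synchrony, log
        self.connections = {}                     # member name -> open connection

    def unconnected(self):
        """Recalculate the members that have no open connection."""
        return [m for m in self.other_members if m not in self.connections]

    def run_interval(self):
        """One scheduled pass: try to connect to every unconnected member."""
        opened = []
        for member in self.unconnected():
            conn = self.try_connect(member)
            if conn is not None:
                self.connections[member] = conn
                self.on_connected(member)
                opened.append(member)
        return opened

    def connection_failed(self, member):
        """Called on a failure report; the member is retried on the next pass."""
        self.connections.pop(member, None)
```

Each pass only touches members that are currently unconnected, so the work per interval grows with the number of stopped or unreachable members rather than with the size of the core group.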

The number of CPU cycles that the Discovery Protocol task consumes is proportional to the number of core group members that are stopped or unreachable. At the default settings, the CPU cycles that the Discovery Protocol task consumes are negligible.


Default Failure Detection Protocol

The Failure Detection Protocol monitors the core group network connections that the Discovery Protocol establishes. When the Failure Detection Protocol detects a failed network connection, it reports the failure to the View Synchrony Protocol and the Discovery Protocol. The View Synchrony Protocol adjusts the view to exclude the failed member. The Discovery Protocol attempts to reestablish a network connection with the failed member. This task runs as long as the member is active.
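The notification flow in the preceding paragraph can be pictured as a simple fan-out, sketched here in Python. All class and method names are hypothetical and chosen for illustration only. How a failure is actually detected is deliberately left out, so report_failure is simply called by hand.

```python
# Illustrative sketch only: every name here is hypothetical, not a product API.

class ViewSketch:
    """Stands in for the View Synchrony Protocol: tracks the current view."""
    def __init__(self, members):
        self.members = set(members)

    def member_failed(self, member):
        # Adjust the view to exclude the failed member.
        self.members.discard(member)


class ReconnectQueueSketch:
    """Stands in for the Discovery Protocol: remembers whom to reconnect to."""
    def __init__(self):
        self.pending = []

    def member_failed(self, member):
        if member not in self.pending:
            self.pending.append(member)


class FailureDetectorSketch:
    """Reports each detected failure to every registered listener."""
    def __init__(self, listeners):
        self.listeners = list(listeners)

    def report_failure(self, member):
        for listener in self.listeners:
            listener.member_failed(member)
```

In this sketch, a detector built with a view and a reconnect queue as its listeners updates both when report_failure is called for a member: the view drops the member, and the queue records it for a later connection attempt.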

The Failure Detection Protocol uses two distinct mechanisms to find failed members:


Alternative protocol providers

(iSeries) (Dist) Currently, no alternative protocol providers are available for the IBM i and distributed platforms.

Use an alternate protocol provider instead of the default Discovery Protocol and Failure Detection Protocol to monitor and manage communication between core group members. In general, alternate protocol providers, such as the z/OS Cross-system Coupling Facility (XCF)-based provider, use fewer system resources than the default Discovery Protocol and Failure Detection Protocol, especially when the core group members are idle. An alternate protocol provider generally uses fewer system resources because it does not perform the member-to-member TCP/IP pinging that the default protocol providers use to determine whether a core group member is still active.

(ZOS) If we decide to use the z/OS Cross-system Coupling Facility (XCF)-based protocol provider, we should understand that, at startup, the server process is joined, as a member, to an XCF group. The XCF group contains all of the active members for the core group. XCF notifies all of the members of this group whenever a member joins the group, and whenever a member can no longer be contacted, either because the server shut down or because XCF determines that the server process has terminated. Whenever a connection between core group members is established, the z/OS Cross-system Coupling Facility (XCF)-based protocol provider notifies the View Synchrony Protocol, and logs this event as an informational message in the SystemOut.log file.
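The group-membership behaviour described above can be modelled with a small in-memory Python sketch. It is not the XCF interface and uses no z/OS services. GroupSketch and its methods are hypothetical names, and the sketch assumes that a newly joined member also receives its own join notification.

```python
# Illustrative sketch only: an in-memory model, not the XCF interface.

class GroupSketch:
    def __init__(self):
        self.members = {}  # member name -> callback(event, member_name)

    def _notify_all(self, event, name):
        # Every current member of the group is told about the event.
        for callback in list(self.members.values()):
            callback(event, name)

    def join(self, name, callback):
        """A server process joins the group at startup."""
        self.members[name] = callback
        self._notify_all("joined", name)  # assumption: the joiner is notified too

    def lost(self, name):
        """A member can no longer be contacted (shut down or terminated)."""
        self.members.pop(name, None)
        self._notify_all("lost", name)
```

The point of the model is that members learn about joins and departures from the group itself, so no member has to ping every other member to find out who is active.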

Before reconfiguring a specific core group to use an alternative protocol provider, verify that the core group meets the following requirements. If the core group does not meet all of these requirements, we must continue to use the default Discovery Protocol and the default Failure Detection Protocol with this core group.


  • Configure the default Discovery Protocol for a core group
  • Configure the default Failure Detection Protocol for a core group
  • (ZOS) Select an alternate protocol provider for a core group
  • Core group custom properties
  • High Performance Extensible Logging