Configure the default Failure Detection Protocol for a core group
The default Failure Detection Protocol monitors the core group network connections that the default Discovery Protocol establishes, and notifies the default Discovery Protocol if a connection failure occurs.
- Understand the concepts described in the topic Core group discovery and failure detection protocols.
- Check the operating system settings relevant to TCP/IP socket closing events.
- Determine failure detection goals and which settings must change to accomplish these goals.
Heartbeat timeout period = Heartbeat transmission period * Number of missed consecutive heartbeats
The heartbeat transmission period specifies the frequency at which a core group member sends a heartbeat packet over every established connection. The default value for the heartbeat transmission period is 30 seconds.
The heartbeat timeout period specifies the failure detection time. If no packets are received during the specified time period, a failure is declared. The default value for the heartbeat timeout period is 180 seconds.
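The formula above can be checked with a short calculation. This is a minimal sketch, not part of the product: the function name heartbeat_timeout_secs is hypothetical, and the missed-heartbeat count of 6 is inferred from the two documented defaults (180 seconds / 30 seconds).

```python
def heartbeat_timeout_secs(transmission_period_secs, missed_heartbeats):
    """Return the failure detection time implied by the formula:
    timeout = transmission period * number of missed consecutive heartbeats.

    Hypothetical helper for illustration only.
    """
    if transmission_period_secs <= 0 or missed_heartbeats <= 0:
        raise ValueError("both values must be positive")
    return transmission_period_secs * missed_heartbeats


# Documented defaults: a 30 second transmission period and a 180 second
# timeout, which implies 6 missed consecutive heartbeats (inferred value).
print(heartbeat_timeout_secs(30, 6))
```

Shortening the transmission period while keeping the same missed-heartbeat count shortens the failure detection time proportionally; for example, a 10 second period with 6 missed heartbeats gives a 60 second timeout.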
We might want to perform this task:
- To change the failover characteristics of our system.
- To reduce heartbeat overhead when core groups are large and analysis indicates that excessive CPU time is spent monitoring heartbeats.
The heartbeat transmission period and heartbeat timeout period are configurable. Use the administrative console or the wsadmin tool to adjust these settings if the default values are not appropriate for the environment, unless we are running in a mixed cell environment that includes core groups that contain a mixture of v7.0 and v6.x processes.
Mixed-version environment: If we are running in a mixed cell environment, and we have core groups that contain a mixture of v7.0 and v6.x processes, we must continue to use the IBM_CS_FD_PERIOD_SECS and IBM_CS_FD_CONSECUTIVE_MISSED core group custom properties to adjust these settings.
To specify these custom properties:
- In the administrative console, click...
Servers > Core Groups > Core group settings > core_group_name > Additional Properties section, click Custom properties > New
- In the Name field, specify either IBM_CS_FD_PERIOD_SECS or IBM_CS_FD_CONSECUTIVE_MISSED.
Then specify a new value for these properties in the Value field.
The IBM_CS_FD_PERIOD_SECS custom property specifies how frequently the Failure Detection Protocol checks the core group network connections that the discovery protocol establishes.
The IBM_CS_FD_CONSECUTIVE_MISSED custom property specifies the number of consecutive heartbeats that a member can miss before communication with that member is discontinued.
Remember, when we use the administrative console or the wsadmin tool to configure the Failure Detection Protocol, we configure the heartbeat transmission period and the heartbeat timeout period. However, when we use the custom properties to configure the Failure Detection Protocol, we configure the heartbeat transmission period and the number of missed consecutive heartbeats.
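To illustrate the difference between the two sets of settings, the following sketch converts a pair of console values (specified in milliseconds) into the equivalent pair of custom property values. The function name console_to_custom_properties is hypothetical. The sketch assumes that IBM_CS_FD_PERIOD_SECS takes whole seconds, as its name suggests, and that the timeout is an exact multiple of the transmission period; neither assumption is confirmed by this topic.

```python
def console_to_custom_properties(transmission_period_ms, timeout_period_ms):
    """Map console settings (milliseconds) to the custom property pair.

    Hypothetical helper for illustration only. Returns a dict of
    property name -> string value, as typed into the Name and Value fields.
    """
    if transmission_period_ms <= 0 or timeout_period_ms <= 0:
        raise ValueError("both periods must be positive")
    # Assumption: IBM_CS_FD_PERIOD_SECS is expressed in whole seconds.
    if transmission_period_ms % 1000 != 0:
        raise ValueError("transmission period must be a whole number of seconds")
    # Assumption: the timeout must be an exact multiple of the period,
    # because the custom properties express it as a heartbeat count.
    if timeout_period_ms % transmission_period_ms != 0:
        raise ValueError("timeout must be a multiple of the transmission period")
    return {
        "IBM_CS_FD_PERIOD_SECS": str(transmission_period_ms // 1000),
        "IBM_CS_FD_CONSECUTIVE_MISSED": str(timeout_period_ms // transmission_period_ms),
    }


# Console defaults: 30000 ms transmission period, 180000 ms timeout period.
print(console_to_custom_properties(30000, 180000))
```

The timeout period itself never appears among the custom properties; it is implied by the product of the two values.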
Change the settings for the default Failure Detection Protocol
- In the administrative console, click...
Servers > Core Groups > Core group settings > core_group_name > Additional Properties > Discovery and failure detection
The Use the default protocol providers option must be selected. If this option is not selected, do not perform any more of the steps in this task.
- Specify, in milliseconds, a new value for the Heartbeat transmission period property.
The default value for this property is 30000 milliseconds, which equals 30 seconds.
- Specify, in milliseconds, a new value for the Heartbeat timeout period property.
The default value for this property is 180000 milliseconds, which equals 180 seconds.
- Click OK and then click Review.
- Select Synchronize changes with nodes, and then click Save.
- Restart all of the members of the core group.
After the servers restart, the core group members all run with the new Failure Detection Protocol settings.
Related:
Core groups (high availability domains)
Core group discovery and failure detection protocols