Configure the default Failure Detection Protocol for a core group
The default Failure Detection Protocol monitors the core group network connections that the default Discovery Protocol establishes, and notifies the default Discovery Protocol if a connection failure occurs.
- Understand the concepts described in the topic Core group discovery and failure detection protocols.
- Check your operating system settings that are relevant to TCP/IP socket closing events.
- Determine your failure detection goals and which settings must change to accomplish these goals.
The value that you specify for the Heartbeat timeout period property should equal the value of the Heartbeat transmission period property multiplied by the value of the Number of missed consecutive heartbeats property.
- The heartbeat transmission period specifies the frequency at which a core group member sends a heartbeat packet over every established connection. The default value for the heartbeat transmission period is 30 seconds.
- The heartbeat timeout period specifies the failure detection time. If no packets are received during the specified time period, a failure is declared. The default value for the heartbeat timeout period is 180 seconds.
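The relationship between these values can be checked with a short calculation. The following is a minimal Python sketch, not product code: the function name is hypothetical, and the count of six missed heartbeats is inferred from the documented defaults (180 seconds divided by 30 seconds) rather than stated in this topic.

```python
def heartbeat_timeout_ms(transmission_period_ms, missed_heartbeats):
    """Return the heartbeat timeout period, in milliseconds.

    Hypothetical helper: timeout = transmission period x missed heartbeats.
    """
    if transmission_period_ms <= 0 or missed_heartbeats <= 0:
        raise ValueError("both values must be positive")
    return transmission_period_ms * missed_heartbeats


# Documented defaults: 30000 ms transmission period, 180000 ms timeout.
# The missed-heartbeat count of 6 is an inference from those two defaults.
default_timeout = heartbeat_timeout_ms(30000, 6)
print(default_timeout)  # 180000
```

If you shorten the transmission period without changing the timeout, the implied number of missed heartbeats rises accordingly.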
You might want to perform this task if:
- You want to change the failover characteristics of the system.
- Your core groups are large and analysis indicates excessive CPU usage is spent monitoring heartbeats.
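To get a rough sense of how the transmission period affects heartbeat traffic, the following Python sketch estimates how many heartbeats a single member sends per second. It assumes only what is described above, that a member sends one heartbeat over every established connection once per transmission period. The function name and the connection count are hypothetical, and the sketch does not model actual CPU cost.

```python
def heartbeats_sent_per_second(connections, transmission_period_s):
    """Estimate heartbeats sent per second by one core group member.

    Assumes one heartbeat per established connection per period.
    """
    if connections < 0 or transmission_period_s <= 0:
        raise ValueError("connections must be >= 0 and period must be > 0")
    return connections / transmission_period_s


# Hypothetical member with 60 established connections.
print(heartbeats_sent_per_second(60, 30))  # 2.0 with the default 30-second period
print(heartbeats_sent_per_second(60, 60))  # 1.0 if the period is doubled
```

Doubling the transmission period halves the estimated send rate, at the cost of slower failure detection unless the timeout settings are adjusted to match.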
The heartbeat transmission period and heartbeat timeout period are configurable. Use the admin console or wsadmin.sh to adjust these settings if the default values are not appropriate for your environment, unless you are running in a mixed cell environment that includes core groups containing a mixture of v7.0 and v6.x processes.
Mixed-version environment: If you are running in a mixed cell environment, and you have core groups containing a mixture of v7.0 and Version 6.x processes, continue to use the IBM_CS_FD_PERIOD_SECS and IBM_CS_FD_CONSECUTIVE_MISSED core group custom properties to adjust these settings.
To specify these custom properties:
- In the admin console, click Servers > Core Groups > Core group settings > core_group_name. Then, under Additional Properties, click Custom properties > New.
- In the Name field, specify either IBM_CS_FD_PERIOD_SECS or IBM_CS_FD_CONSECUTIVE_MISSED, and then specify a new value for the property in the Value field.
The IBM_CS_FD_PERIOD_SECS custom property specifies how frequently the Failure Detection Protocol checks the core group network connections that the discovery protocol establishes.
The IBM_CS_FD_CONSECUTIVE_MISSED custom property specifies the number of consecutive heartbeats that a member can miss before communication with that member is discontinued.
Remember, when you use the admin console or wsadmin.sh to configure the Failure Detection Protocol, you configure the heartbeat transmission period and the heartbeat timeout period. However, if you use the custom properties to configure the Failure Detection Protocol, you configure the heartbeat transmission period and the number of missed consecutive heartbeats.
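Because the two configuration methods express the same settings in different terms, it can help to convert between them. The following Python sketch converts console-style values (milliseconds) into values for the two custom properties. The function name is hypothetical, and the sketch assumes that IBM_CS_FD_PERIOD_SECS is expressed in whole seconds, as its name suggests; verify the units against your product documentation before relying on it.

```python
def to_custom_properties(transmission_period_ms, timeout_period_ms):
    """Convert console settings (ms) to custom property values.

    Assumption: IBM_CS_FD_PERIOD_SECS takes whole seconds.
    Values are returned as strings, as they would be typed in the Value field.
    """
    if transmission_period_ms <= 0 or timeout_period_ms <= 0:
        raise ValueError("both periods must be positive")
    if transmission_period_ms % 1000 != 0:
        raise ValueError("transmission period must be a whole number of seconds")
    if timeout_period_ms % transmission_period_ms != 0:
        raise ValueError("timeout must be a whole multiple of the transmission period")
    return {
        "IBM_CS_FD_PERIOD_SECS": str(transmission_period_ms // 1000),
        "IBM_CS_FD_CONSECUTIVE_MISSED": str(timeout_period_ms // transmission_period_ms),
    }


# Console defaults: 30000 ms transmission period, 180000 ms timeout period.
print(to_custom_properties(30000, 180000))
```

The sketch rejects a timeout that is not a whole multiple of the transmission period, because such a value cannot be expressed as a count of missed heartbeats.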
To use the admin console to change the settings for the default Failure Detection Protocol, complete the following steps.
Procedure
- In the admin console, click Servers > Core Groups > Core group settings > core_group_name.
- Under Additional Properties, click Discovery and failure detection. The Use the default protocol providers option must be selected. If this option is not selected, do not perform any more of the steps in this task.
- Specify, in milliseconds, a new value for the Heartbeat transmission period property.
The default value for this property is 30000 milliseconds, which equals 30 seconds.
- Specify, in milliseconds, a new value for the Heartbeat timeout period property.
The default value for this property is 180000 milliseconds, which equals 180 seconds.
- Click OK and then click Review.
- Select Synchronize changes with nodes, and then click Save.
- Restart all of the members of the core group.
Results
After the servers restart, the core group members all run with the new Failure Detection Protocol settings.
Core groups (high availability domains)
Core group discovery and failure detection protocols
Set up a high availability environment