

6.5.2 TCP KEEP_ALIVE

If a socket between two peers is closed, the side that receives the closed-socket exception signals its peers that the other JVM is to be regarded as failed. This means that if a JVM panics or exits, the failure is detected as quickly as the TCP implementation allows. If the failure is caused by a power failure or a network failure, however, no close is delivered to the surviving side, and the socket is closed only after the period defined by the KEEP_ALIVE interval of the operating system. By default this interval is normally long (commonly on the order of hours) and should be tuned to a more realistic value in any WebSphere system. A long KEEP_ALIVE interval can cause many undesirable behaviors in a highly available WebSphere environment when systems fail (including database systems).

However, this failure detection method is less affected by processor or memory starvation caused by swapping or thrashing. Used together, the two failure detectors offer a very reliable mechanism of failure detection.

Attention: The TCP KEEP_ALIVE value is a network setting of your operating system. Changing its value might have side effects on other processes running on your system.
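
To give a feel for how long detection can take, the following minimal Java sketch estimates the delay as the idle time before the first probe plus one probe interval per unanswered probe. It also shows how a single socket opts in to keep-alive probing with the standard Socket.setKeepAlive call. The class and method names are hypothetical, the figures 7200/75/9 are assumed to be typical Linux defaults, and 60/10/3 are purely illustrative tuned values, not recommendations. The sketch is not how WebSphere itself configures its sockets.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper class, for illustration only.
class KeepAliveSketch {

    // Approximate time to notice a silent peer: idle time before the first
    // probe, plus one probe interval for each unanswered probe.
    static long worstCaseDetectionSeconds(long idleSeconds, long intervalSeconds, int probes) {
        return idleSeconds + intervalSeconds * probes;
    }

    // Opens a loopback connection and enables SO_KEEPALIVE on the client side.
    // The probe timing itself still comes from the operating system settings.
    static boolean enableKeepAliveOnLoopback() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setKeepAlive(true);
            return client.getKeepAlive();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed typical Linux defaults: 7200 s idle, 75 s interval, 9 probes.
        System.out.println("Default estimate (s): " + worstCaseDetectionSeconds(7200, 75, 9));
        // Illustrative tuned values only.
        System.out.println("Tuned estimate (s): " + worstCaseDetectionSeconds(60, 10, 3));
        System.out.println("SO_KEEPALIVE enabled: " + enableKeepAliveOnLoopback());
    }
}
```

With the assumed defaults the estimate is 7875 seconds, a little over two hours, while the illustrative tuned values bring it down to 90 seconds. Because the timing values are operating-system settings, lowering them affects every process that uses keep-alive on that system, as the note above warns.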


Redbooks ibm.com/redbooks
