Stopping a cluster member

We tested plug-in failover operation with a stopped cluster member, as follows:

1. Verify that all the cluster members are running within the cluster. Check this by following the steps described in Workload management with the plug-in.

Cycle through to make sure that all of the cluster members are available. There is no need to repeat the changes to the workload management policy.

2. In the plugin-cfg.xml file, set the Log tag to trace level and point it at a dedicated log file, so that the line looks like this:

<Log LogLevel="Trace" Name="C:\WebSphere\AppServer\logs\stopclustermember.log"/>

3. Save and close the plugin-cfg.xml file.

4. Open the Administrative Console and select one of your cluster members.

5. Stop this cluster member.

6. Repeat step 1, noting the absence of the cluster member just stopped.

7. Start the cluster member again.

8. Wait 60 seconds (the plug-in retry interval) and repeat step 1. The restarted cluster member returns to serving requests.
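The Log tag change in step 2 can also be scripted. The following is a minimal Python sketch, not part of our test procedure. It assumes the Log element is a direct child of the root Config element, and the helper name set_plugin_log and the trimmed sample configuration are hypothetical. Note that ElementTree drops comments and the XML declaration when it rewrites a file, so keep a backup of the original plugin-cfg.xml.

```python
import xml.etree.ElementTree as ET


def set_plugin_log(xml_text: str, level: str, log_path: str) -> str:
    """Return plugin-cfg.xml content with the Log element's attributes updated."""
    root = ET.fromstring(xml_text)
    # Assumption: the Log element sits directly under the root Config element.
    log = root.find("Log")
    if log is None:
        # No Log element yet: create one under the root.
        log = ET.SubElement(root, "Log")
    log.set("LogLevel", level)
    log.set("Name", log_path)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed configuration used only for illustration.
sample = (
    '<Config>'
    '<Log LogLevel="Error" Name="C:\\WebSphere\\AppServer\\logs\\http_plugin.log"/>'
    '</Config>'
)
updated = set_plugin_log(
    sample, "Trace", r"C:\WebSphere\AppServer\logs\stopclustermember.log"
)
print(updated)
```

Remember to set the log level back when the test is finished, because trace logging produces a large amount of output.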

 

What is happening?

The plug-in uses the round robin method to distribute requests to the cluster members. When it reaches the stopped cluster member, the plug-in attempts to connect and finds that no HTTP process is listening on the port.

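The plug-in marks this cluster member as down and writes an error to the log, as shown in Example 5-17. The OS err=10061 value in the first line is the Windows Sockets error code for a refused connection.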

Example 5-17 Plug-in trace with cluster member down

...
ERROR: ws_common: websphereGetStream: Failed to connect to app server on host 'app1.itso.ibm.com', OS err=10061
ERROR: ws_common: websphereExecute: Failed to create the stream
ERROR: ws_server: serverSetFailoverStatus: Marking was1node_PluginMember1 down
STATS: ws_server: serverSetFailoverStatus: Server was1node_PluginMember1 : pendingConnections 0 failedConnections 1 affinityConnections 0 totalConnections 0.
ERROR: ws_common: websphereHandleRequest: Failed to execute the transaction to 'was1node_PluginMember1'on host 'app1.itso.ibm.com'; will try another one
...
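When the trace log grows large, it can help to pull out only the mark-down events. The following Python sketch does this with a regular expression modeled on the "Marking ... down" line in Example 5-17. The function name servers_marked_down is hypothetical, and the pattern is an assumption based only on the lines shown in that example.

```python
import re

# Pattern modeled on the "Marking ... down" line in Example 5-17.
MARKED_DOWN = re.compile(r"serverSetFailoverStatus: Marking (\S+) down")


def servers_marked_down(lines):
    """Return names of servers marked down, in log order, without duplicates."""
    seen = []
    for line in lines:
        match = MARKED_DOWN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Lines copied from Example 5-17.
trace = [
    "ERROR: ws_common: websphereExecute: Failed to create the stream",
    "ERROR: ws_server: serverSetFailoverStatus: Marking was1node_PluginMember1 down",
    "STATS: ws_server: serverSetFailoverStatus: Server was1node_PluginMember1 : "
    "pendingConnections 0 failedConnections 1 affinityConnections 0 totalConnections 0.",
]
print(servers_marked_down(trace))  # prints ['was1node_PluginMember1']
```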

The plug-in then tries to connect to the next cluster member in the primary server list. When it finds a cluster member that accepts the connection, the request is served from that cluster member instead.

The plug-in does not try the stopped cluster member again for another 60 seconds. If tracing is enabled, the log shows the time remaining each time the round robin algorithm reaches the downed cluster member, as shown in Example 5-18.

Example 5-18 Plug-in trace cluster member retry interval countdown

...
STATS: ws_server_group: serverGroupCheckServerStatus: Checking status of was1node_PluginMember1, ignoreWeights 0, markedDown 1, retryNow 0, wlbAllows 0 reachedMaxConnectionsLimit 0
TRACE: ws_server: serverHasReachedMaxConnections: currentConnectionsCount 0, maxConnectionsCount -1.
TRACE: ws_server_group: serverGroupCheckServerStatus: Server was1node_PluginMember1 is marked down; retry in 55
...

After the cluster member has been restarted and the 60-second retry interval has passed, the plug-in tries to connect to it again the next time the round robin algorithm selects it. This time the connection succeeds and the request is served.
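The behavior described in this section can be modeled in a few lines of Python. This is a toy sketch of round robin with mark-down and a retry interval, not the plug-in's actual code: it ignores server weights, session affinity, and backup servers, and the class name RoundRobinFailover, the member names, and the is_listening callback are all hypothetical.

```python
class RoundRobinFailover:
    """Toy model of round robin with mark-down and a retry interval."""

    def __init__(self, members, retry_interval=60):
        self.members = list(members)
        self.retry_interval = retry_interval  # seconds a downed member is skipped
        self.marked_down_at = {}  # member name -> time it was marked down
        self.next_index = 0

    def route(self, now, is_listening):
        """Return the member that serves a request at time `now`, or None.

        `is_listening(name)` stands in for the real connection attempt.
        """
        for _ in range(len(self.members)):
            member = self.members[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.members)
            down_since = self.marked_down_at.get(member)
            if down_since is not None and now - down_since < self.retry_interval:
                continue  # still inside the retry interval: skip without connecting
            if is_listening(member):
                self.marked_down_at.pop(member, None)  # member is healthy again
                return member
            self.marked_down_at[member] = now  # connection failed: mark down
        return None


balancer = RoundRobinFailover(["PluginMember1", "PluginMember2"])
stopped = {"PluginMember1"}


def up(name):
    return name not in stopped


first = balancer.route(0, up)    # PluginMember1 fails and is marked down
second = balancer.route(10, up)  # PluginMember1 is skipped (retry in 50)
stopped.clear()                  # the cluster member is restarted
third = balancer.route(30, up)   # still inside the retry interval
fourth = balancer.route(61, up)  # interval elapsed: PluginMember1 is tried again
print(first, second, third, fourth)
# prints PluginMember2 PluginMember2 PluginMember2 PluginMember1
```

Passing the current time in as a parameter, rather than reading the system clock, keeps the sketch deterministic and makes the 60-second wait easy to see without actually waiting.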

