Network failures

When the network becomes unavailable, the Node Agent forces its managed application servers to stop if the loopback is configured for 127.0.0.1. If the network becomes available again within 10 minutes (the Node Agent's timeout value), the Node Agent can restart the stopped servers. If the outage lasts longer than 10 minutes, the application servers remain down and you must start them manually. Example 9-4 shows the Node Agent trace after the network cable was disconnected for a long time.

Example 9-4 Network failure Node Agent trace

NodeSyncTask  A ADMS0003I: Configuration synchronization completed 
successfully.
NodeSync      E ADMS0015E: The synchronization request can not be completed 
because the node agent can not communicate with the deployment manager.
NodeSyncTask  A ADMS0016I: Configuration synchronization failed.
NodeAgent     W ADML0063W: Cannot contact server "WebHAbbMember4". Force to 
stop this server if it is still running.
NodeAgent     A ADML0064I: Restarting unreachable server "WebHAbbMember4".
NodeAgent     W ADML0063W: Cannot contact server "WebHAbbMember3". Force to 
stop this server if it is still running.
NodeAgent     A ADML0064I: Restarting unreachable server "WebHAbbMember3".
NodeSync      E ADMS0015E: The synchronization request can not be completed 
because the node agent can not communicate with the deployment manager.
NodeSyncTask  A ADMS0016I: Configuration synchronization failed.
NodeAgent     W ADML0040E: Timed out waiting for server "WebHAbbMember4" 
initialization: 600 seconds
NodeAgent     W ADML0040E: Timed out waiting for server "WebHAbbMember3" 
initialization: 600 seconds
NodeSync      E ADMS0015E: The synchronization request can not be completed 
because the node agent can not communicate with the deployment manager.
NodeSyncTask  A ADMS0016I: Configuration synchronization failed.
NodeSyncTask  A ADMS0003I: Configuration synchronization completed 
successfully.
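
A trace like Example 9-4 can be scanned for the servers that exhausted the restart timeout and therefore need a manual start. The following Python sketch is illustrative only and is not part of the WebSphere tooling. The function name servers_needing_manual_start is hypothetical, and the pattern assumes the ADML0040E wording exactly as it appears in the trace above; other releases may format the message differently.

```python
import re

# Pattern for the timeout message as it appears in Example 9-4.
# \s+ lets the match span the line wrap before "initialization:".
TIMEOUT_RE = re.compile(
    r'ADML0040E: Timed out waiting for server "([^"]+)"\s+'
    r'initialization:\s+(\d+) seconds'
)


def servers_needing_manual_start(trace_text):
    """Return {server_name: timeout_seconds} for servers whose restart timed out."""
    return {name: int(secs) for name, secs in TIMEOUT_RE.findall(trace_text)}


if __name__ == "__main__":
    sample = (
        'NodeAgent     W ADML0040E: Timed out waiting for server "WebHAbbMember4" \n'
        'initialization: 600 seconds\n'
    )
    print(servers_needing_manual_start(sample))
```

The 600 seconds reported in the message corresponds to the 10-minute Node Agent timeout described above.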

Example 9-5 shows the server trace.

Example 9-5 Network failure Server trace

WsServer      E WSVR0003E: Server WebHAbbMember3 failed to start
com.ibm.ejs.EJSException: Could not register with Location Service Daemon; 
nested exception is: 
	org.omg.CORBA.TRANSIENT: Host unreachable: 
connect:host=10.55.90.84,port=9900  minor code: 4942F301  completed: No
org.omg.CORBA.TRANSIENT: Host unreachable: 
connect:host=10.55.90.84,port=9900  minor code: 4942F301  completed: No
	at 
com.ibm.CORBA.transport.TransportConnectionBase.connect(TransportConnection
Base.java:338)
	[10:10:27:674 CDT] 68cf7b9a WsServer      E WSVR0009E: Error occurred 
during startup

To resolve this problem, configure a loopback alias for the system's real IP address rather than the default loopback address of 127.0.0.1.

On AIX, use the following command, where my_ip is the system's real IP address:

ifconfig lo0 alias my_ip netmask 255.255.255.0
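
Because the alias only helps when it points at the real address, it can be worth validating the value substituted for my_ip before running the command. The following Python sketch is one way to do that, under stated assumptions: the function name build_lo0_alias_command is hypothetical, the default netmask is simply the one shown in the command above, and the code only builds the command string; it does not run ifconfig.

```python
import ipaddress


def build_lo0_alias_command(ip, netmask="255.255.255.0"):
    """Return the AIX ifconfig alias command for a validated, non-loopback IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if the address is malformed
    if addr.is_loopback:
        # 127.0.0.0/8 would defeat the purpose of the alias.
        raise ValueError(f"{ip} is a loopback address; use the system's real IP")
    ipaddress.IPv4Network(f"0.0.0.0/{netmask}")  # raises ValueError on a bad netmask
    return f"ifconfig lo0 alias {addr} netmask {netmask}"


if __name__ == "__main__":
    # 10.55.90.84 is the host address that appears in Example 9-5.
    print(build_lo0_alias_command("10.55.90.84"))
```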

After a short network outage, the Node Agent restarts the servers automatically, as shown in Example 9-6.

Example 9-6 Automatic restart of appservers after network failure

NodeSyncTask  A ADMS0003I: Configuration synchronization completed 
successfully.
NodeAgent     A ADML0000I: Server initialization completed. Process id is: 
45848
DiscoveryMBea I ADMD0023I: Process discovered (name: WebHAbbMember4, type: 
ManagedProcess, pid: 45848)
NodeSyncTask  A ADMS0003I: Configuration synchronization completed 
successfully.
NodeAgent     A ADML0000I: Server initialization completed. Process id is: 
116004
DiscoveryMBea I ADMD0023I: Process discovered (name: WebHAbbMember3, type: 
ManagedProcess, pid: 116004)
NodeSyncTask  A ADMS0003I: Configuration synchronization completed 
successfully.
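
To confirm which servers came back after the outage, the ADMD0023I discovery messages in a trace like Example 9-6 can be collected into a name-to-process-id map. This Python sketch is illustrative only: the function name discovered_servers is hypothetical, and the pattern assumes the message wording shown in the trace above.

```python
import re

# Pattern for the discovery message as it appears in Example 9-6.
# \s+ tolerates the line wrap between "type:" and the process type.
DISCOVERED_RE = re.compile(
    r"ADMD0023I: Process discovered \(name:\s+([^,\s]+),\s+"
    r"type:\s+([^,\s]+),\s+pid:\s+(\d+)\)"
)


def discovered_servers(trace_text):
    """Return {server_name: pid} for every rediscovered managed process."""
    return {name: int(pid) for name, _ptype, pid in DISCOVERED_RE.findall(trace_text)}


if __name__ == "__main__":
    sample = (
        "DiscoveryMBea I ADMD0023I: Process discovered (name: WebHAbbMember4, type: \n"
        "ManagedProcess, pid: 45848)\n"
    )
    print(discovered_servers(sample))
```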
