addNode command best practices
Overview
Use addNode.sh to add a stand-alone node into a cell.
The addNode.sh command does the following:
- Copies the base WAS cell configuration to a new cell structure that matches the structure of the dmgr.
- Creates a new node agent definition for the node that the cell incorporates.
- Sends commands to the dmgr to add the documents from the new node to the cell repository.
- Performs the first configuration synchronization for the new node, and verifies that this node is synchronized with the cell.
- Launches the node agent process for the new node.
- Updates the setupCmdLine.bat or setupCmdLine.sh file and the wsadmin.properties file to point to the new cell.
- After federating the node, backs up the plugin-cfg.xml file from...
WAS_HOME/config/cells
...to the directory...
config/backup/base/cells
- Generates a new plugin-cfg.xml file on the dmgr.
- Calls nodeSync to copy files to nodes.
Tips for using addNode.sh
- If you add a node to a cell, the cell name of the node you are federating must be different from the name of the cell to which the node is federated. Otherwise, you receive the ADMU0027E message, and addNode.sh does not add the node to the cell.
- Verify that the dmgr and the nodes are at the same WAS revision level. For example, a dmgr at level 6.0.1 cannot federate nodes at level 6.0.2.
- Do not put WAS .jar files on the generic CLASSPATH variable (default class path) for the overall system.
- If WAS ND cannot resolve the host name of the server, problems can occur when you add or administer nodes, or when the node agent contacts the application server.
To resolve the host name, the product opens a port, or queries for an IP address. The product then waits for the operating system to return the correct information. The operating system might go to multiple places to find the IP address, but the product does not care about the order in which the operating system does this, if the correct information is returned. If the host name of the server cannot be resolved, refer to your network administration documentation to resolve the problem. The following additional information might help you ensure that the host name is resolved.
- Some operating systems create an association between the host name of the machine and the loopback address of 127.0.0.1. Red Hat installations create the association by default. SuSE installations create a similar association with the loopback address 127.0.0.2. The association is in the hosts file.
If the hosts file contains mappings from the 127.0.0.1 or 127.0.0.2 IP address to a host name other than localhost, remove the mappings. The following example illustrates what might happen if the mappings are not removed: When a node agent communicates with the dmgr, it sends its IP address to the dmgr. If the operating system returns a mapping for the node agent host name from the hosts file, the node agent resolves its own host name to 127.0.0.1. The dmgr then cannot send a message to the node agent, because 127.0.0.1 is also the address of the dmgr's own local machine.
(AIX) (Solaris) The hosts file is located at...
/etc/hosts
(Windows) The hosts file is located at...
\WINDOWS\system32\drivers\etc\hosts
- (AIX) By default, AIX checks the domain name server (DNS) first when it resolves a host name, whether for the local server or for another server. If the host name cannot be resolved, or cannot be resolved in a reasonable amount of time, add the following statement to the file...
/etc/netsvc.conf
...so that the AIX operating system checks the local hosts file first for the host name:
hosts=local,bind
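The hosts file check described above can be sketched in Python. This is a minimal illustration, not a product tool: the function name loopback_mappings and the sample host names are hypothetical, and the sketch assumes that any name beginning with "localhost" (for example, localhost.localdomain) is an acceptable loopback mapping.

```python
import ipaddress

# Hypothetical sample in hosts file format; the host names are illustrative only.
SAMPLE_HOSTS = """\
127.0.0.1   localhost localhost.localdomain
127.0.0.1   appnode01.example.com appnode01
127.0.0.2   appnode01
192.0.2.10  dmgr01.example.com dmgr01   # regular address, not flagged
"""


def loopback_mappings(hosts_text):
    """Return (address, name) pairs that map an IPv4 loopback address
    (127.0.0.0/8, which covers 127.0.0.1 and 127.0.0.2) to a name that
    does not begin with "localhost" (an assumption of this sketch)."""
    findings = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address
        if address.version != 4 or not address.is_loopback:
            continue
        for name in fields[1:]:
            if not name.lower().startswith("localhost"):
                findings.append((fields[0], name))
    return findings


if __name__ == "__main__":
    for address, name in loopback_mappings(SAMPLE_HOSTS):
        print(f"remove mapping: {address} -> {name}")
```

To check a real machine, pass the contents of the hosts file (for example, /etc/hosts) to loopback_mappings instead of the sample text. Any pair that the function returns is a candidate for removal.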
- By default, applications that are installed on the node are not copied to the cell. If you install an application after using addNode.sh, the application is installed on the cell. By specifying the -includeapps option, you force addNode.sh to copy applications from the node to the cell. Applications with duplicate names are not copied to the cell.
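As a sketch of how the -includeapps option fits into an addNode.sh invocation, the following Python helper assembles the argument list without running it. The helper name build_addnode_command, the bin directory, the host name, and the port value are hypothetical, and the sketch assumes the positional form of the command (dmgr host followed by an optional dmgr port). Verify the syntax against the addNode command reference for your release.

```python
import posixpath
import shlex


def build_addnode_command(bin_dir, dmgr_host, dmgr_port=None, include_apps=False):
    """Assemble an addNode.sh argument list (hypothetical helper).

    Assumes the form: addNode.sh dmgr_host [dmgr_port] [-includeapps]
    """
    command = [posixpath.join(bin_dir, "addNode.sh"), dmgr_host]
    if dmgr_port is not None:
        command.append(str(dmgr_port))
    if include_apps:
        # Copy applications from the node to the cell during federation.
        command.append("-includeapps")
    return command


if __name__ == "__main__":
    # Illustrative values only; substitute your own paths, host, and port.
    argv = build_addnode_command(
        "/opt/was/profiles/AppSrv01/bin", "dmgr01.example.com", 8879, include_apps=True
    )
    print(" ".join(shlex.quote(part) for part in argv))
```

Building the argument list as a Python list, rather than as one string, lets you pass it to subprocess.run without shell quoting problems when you are ready to run the command.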
- Cell-level documents are not merged.
Any changes that you make to the stand-alone cell-level documents before using addNode.sh must be repeated on the new cell. For example, you must re-create any virtual host changes.
- If you receive an OutOfMemory exception while using addNode.sh, you may need to increase the heap size of the dmgr.
To increase the heap size of the dmgr, adjust the Maximum heap size parameter. For example, in the administrative console, go to...
System administration | Deployment manager | Java and Process Management | Process definition | Java Virtual Machine
...and increase the Maximum heap size value.
(Solaris) Avoid trouble: On HP-UX or Solaris operating systems, a java.lang.OutOfMemoryError: PermGen space problem might occur during large and complex tasks. For example, you might encounter this problem when you run commands such as addNode on nodes with large applications. When the demands for resources exceed the default storage size, the task can fail with a java.lang.OutOfMemoryError: PermGen space error.
To resolve this problem, increase the minimum size of the permanent region. Set the -XX:PermSize JVM option to a value such as 128 MB, which is sufficient for many situations in which this problem occurs:
-XX:PermSize=128m
- In some instances it may take longer than anticipated for the dmgr to respond to addNode.sh.
The default timeout value, which determines how long the client will wait for a server response, is appropriate in the majority of cases. However, you may require more time for the server to respond under heavier processing conditions. For example, if you include the -includeapps option and have a large number of applications, or the applications are very large, the default value of 180 seconds may be insufficient.
To change the default timeout value, open the file...
$PROFILE//properties/soap.client.props
...in any ASCII text editor and find the following line (shown here with default value of 180 seconds):
com.ibm.SOAP.requestTimeout=180
If you need to change the default you can edit this line to set the timeout to a value more appropriate for your situation.
Setting the default timeout value to 0 seconds disables the timeout check.
If the timeout value is set too high, you have to wait a long time to find out whether addNode.sh completes its request to the dmgr. If the value is set too low, the dmgr does not have sufficient time to complete the request before addNode.sh concludes that the dmgr is not responding and reports an error. Other factors that can affect server timeouts include the processing load or excessive paging on the dmgr, and network latency. Some of these conditions might be transient.
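The edit to soap.client.props can also be scripted. The following Python sketch rewrites the com.ibm.SOAP.requestTimeout line in the text of a properties file and leaves every other line untouched. The function name set_request_timeout and the sample content are hypothetical. The sketch assumes the simple key=value layout shown above.

```python
TIMEOUT_KEY = "com.ibm.SOAP.requestTimeout"


def set_request_timeout(props_text, seconds):
    """Return props_text with the SOAP request timeout set to `seconds`.

    A value of 0 disables the timeout check. Negative values are rejected.
    """
    if seconds < 0:
        raise ValueError("timeout must be 0 (disabled) or a positive number of seconds")
    lines = props_text.splitlines(keepends=True)
    updated = False
    for index, line in enumerate(lines):
        key, separator, _ = line.partition("=")
        if separator and key.strip() == TIMEOUT_KEY:
            line_ending = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            lines[index] = f"{TIMEOUT_KEY}={seconds}{line_ending}"
            updated = True
    if not updated:
        raise KeyError(f"{TIMEOUT_KEY} not found")
    return "".join(lines)


if __name__ == "__main__":
    # Illustrative content; a real soap.client.props file has more entries.
    sample = "# sample\ncom.ibm.SOAP.requestTimeout=180\n"
    print(set_request_timeout(sample, 600), end="")
```

To apply the change, read the properties file into a string, call set_request_timeout, and write the result back. Keep a backup copy of the original file first.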
- If you receive an addNode error message about clocks that are out of sync, make sure that the clock on the computer that hosts the node to be federated is synchronized with the clock on the dmgr computer.
- If you run addNode.sh from a node that is already federated to an existing dmgr, and you target a second dmgr, the configuration of the second dmgr becomes corrupted. You cannot start the second dmgr after you stop it. This happens because the addNode command creates a directory...
dmgrProfile/config/cells/dmgrCell/dmgrCell
You encounter this issue when a node is already federated and you run addNode.sh again for a different dmgr. The node directory that is created on that dmgr is incomplete, so the dmgr cannot start afterwards. Use one of the following solutions to resolve this issue:
- If the dmgr is running, run cleanupNode.sh on the dmgr where the incomplete node resides.
- Manually delete the directory created on the dmgr configuration during an addNode command operation that was incomplete. For example:
WAS_HOME/profiles/dmgrProfile/config/cells/dmgrCell/nodeName