Security considerations for a multi-node or multi-process ND environment
WAS ND supports centralized management of distributed nodes and appservers. This support inherently brings complexity, especially when security is included. Because everything is distributed, security plays an even larger role in ensuring that communications are appropriately secure between appservers and node agents, and between node agents (node-specific configuration managers) and the dmgr (a domain-wide, centralized configuration manager).
Because the processes are distributed, the authentication mechanism must be LTPA. LTPA tokens are encrypted, signed, and forwardable to remote processes, but they expire. The SOAP connector, which is the default connector for administrative security, has no retry logic for expired tokens. Because the protocol is stateless, however, a new token is created for each request when the time left in the current token is not sufficient to run the request. The alternative RMI connector is stateful and has some retry logic: it corrects expired tokens by resubmitting the request after the error is detected.
Because tokens expire at a specific time, synchronized system clocks are crucial to token-based validation. If the clocks differ by too much (approximately 10-15 minutes), we can encounter unrecoverable validation failures that synchronized clocks would avoid. Verify that the time, date, and time zone settings are consistent between systems. Nodes can be in different time zones, provided that the time is correct within each zone (for example, 5 PM CST = 6 PM EST).
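The effect of clock skew on token validation can be illustrated with a small sketch. It does not call any WAS API: the function name `token_accepted`, the 10-minute remaining lifetime, and the 15-minute skew are assumptions chosen only for illustration, and the model (an absolute expiration instant compared with the receiver's clock) is a simplification. The sketch also shows the time zone example: clocks in different zones agree as long as they denote the same instant.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used only for illustration (standard time, no DST handling).
CST = timezone(timedelta(hours=-6), "CST")
EST = timezone(timedelta(hours=-5), "EST")


def token_accepted(issued_at, lifetime, receiver_now):
    """Return True if the receiver's clock is still before the token expiry.

    Hypothetical model: the token carries an absolute expiration instant
    (issue time plus lifetime) and the receiver compares it with its own clock.
    """
    expires_at = issued_at + lifetime
    return receiver_now < expires_at


# 5 PM CST and 6 PM EST are the same instant, so different zones are harmless.
sender_now = datetime(2002, 9, 18, 17, 0, tzinfo=CST)
receiver_now = datetime(2002, 9, 18, 18, 0, tzinfo=EST)
same_instant = sender_now == receiver_now

lifetime = timedelta(minutes=10)  # assumed remaining token lifetime

# Receiver clock is correct: the token is accepted.
ok_in_sync = token_accepted(sender_now, lifetime, receiver_now)

# Receiver clock runs 15 minutes fast: the token looks expired on arrival.
ok_skewed = token_accepted(sender_now, lifetime, receiver_now + timedelta(minutes=15))

print(same_instant, ok_in_sync, ok_skewed)
```

In this model a time zone difference alone never invalidates a token, while a skew larger than the remaining lifetime always does.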
Consider the following issues when using or planning for a ND environment.
- When running system management commands such as the stopNode command, explicitly specify administrative credentials to perform the operation. Most commands accept -username and -password parameters to specify the user ID and password, respectively. Specify the user ID and password of an administrative user; for example, a member of the console users with Operator or Administrator privileges, or the admin user ID configured in the user registry. An example of the stopNode command follows:
stopNode -username user -password pass
- Verify that the configuration at the node agents is always synchronized with the dmgr before starting or restarting a node. To synchronize the configuration manually, run the syncNode command on each node that is not synchronized. To synchronize the configuration for node agents that are started, click System Administration > Nodes, select all the started nodes, and then click Synchronize.
- Verify that the clocks on all systems are in sync including the time zone, time and date. If they are out of sync, the tokens expire immediately when they reach the target server due to the time differences.
- Verify that the LTPA token expiration period is long enough to complete the longest downstream request. Some credentials are cached and therefore the timeout does not always include the length of the request.
- The admin connector used by default for system management is SOAP. SOAP is a stateless HTTP protocol. For most situations, this connector is sufficient. If we have a problem using the SOAP connector, we might want to change the default connector on all the servers from SOAP to RMI. The RMI connector uses CSIv2, a stateful, interoperable protocol, and can be configured to use identity assertion (downstream delegation), message-layer authentication (BasicAuth or Token), and client certificate authentication (for server trust isolation). To change the default connector on a given server, go to Administration Services under Additional properties for that server.
- An error message might occur within the admin subsystem security. This error indicates that the sending process did not supply a credential to the receiving process. Typically, the causes of this problem are:
  - The sending process has security disabled while the receiving process has security enabled. This setup typically indicates that one of the two processes is not synchronized with the cell. Having security disabled for a specific appserver does not have any effect on administrative security.
  - The clocks between the systems are not synchronized, which immediately makes the credential tokens not valid. Verify that the time, date, and time zones are consistent between the two machines. An error similar to the following might occur:
[9/18/02 16:48:23:859 CDT] 3b9cef35 RoleBasedAuth A CWSCJ0305I: Role based authorization check failed for security name <null>, accessId NO_CRED_NO_ACCESS_ID while invoking method propagateNotifications:[Ljavax.management.Notification; on resource NotificationService and module NotificationService.
- If an error similar to the following occurs, verify that the clocks are synchronized between all servers within the cell, and that the configurations are synchronized between all nodes and the Deployment Manager:
[9/18/02 16:48:22:859 CDT] 3bd06f34 LTPAServerObj E CWSCJ0372E: Validation of the token failed.
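The two message IDs quoted in the list above can be picked out of a log with a short script. This is a minimal sketch, not a WAS tool: the regular expression assumes the line layout of the two samples above (timestamp, thread ID, component, level, message ID), the names `HINTS` and `diagnose` are hypothetical, the hint texts paraphrase the checks recommended above, and the third sample line is invented to show that unrelated messages are ignored.

```python
import re

# Message IDs taken from the sample log lines above; hints paraphrase the text.
HINTS = {
    "CWSCJ0305I": "No credential supplied: check that both processes are "
                  "synchronized with the cell and that the clocks agree.",
    "CWSCJ0372E": "Token validation failed: check clock synchronization in the "
                  "cell and configuration synchronization with the dmgr.",
}

# Assumed layout: [timestamp] threadId component level messageId: text
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<thread>\S+)\s+(?P<component>\S+)\s+"
    r"(?P<level>\S)\s+(?P<msg_id>[A-Z]{4,5}\d{4}[A-Z]):\s*(?P<text>.*)$"
)


def diagnose(lines):
    """Return (timestamp, message ID, hint) for each recognized log line."""
    findings = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("msg_id") in HINTS:
            msg_id = match.group("msg_id")
            findings.append((match.group("timestamp"), msg_id, HINTS[msg_id]))
    return findings


sample = [
    "[9/18/02 16:48:22:859 CDT] 3bd06f34 LTPAServerObj E CWSCJ0372E: "
    "Validation of the token failed.",
    "[9/18/02 16:48:23:859 CDT] 3b9cef35 RoleBasedAuth A CWSCJ0305I: "
    "Role based authorization check failed for security name <null>, "
    "accessId NO_CRED_NO_ACCESS_ID while invoking method "
    "propagateNotifications:[Ljavax.management.Notification; on resource "
    "NotificationService and module NotificationService.",
    # Invented line, included only to show that other messages are skipped.
    "[9/18/02 16:48:24:001 CDT] 3b9cef35 SomeComponent I XXXX0001I: Unrelated.",
]

findings = diagnose(sample)
for timestamp, msg_id, hint in findings:
    print(timestamp, msg_id, hint)
```

Because the servers are distributed, a script like this would need to be run against the logs of every process involved, not only the one that reported the failure.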
Results
Proper understanding of the security interactions between distributed servers greatly reduces the problems that are encountered with secure communications. Security adds complexity because additional function must be managed. For security to work properly, it needs thorough consideration during the planning of the infrastructure.
Next steps
When we have security problems that are related to the WAS ND environment, see Troubleshooting security configurations to find additional information about the problem. When trace is needed to solve a problem, it is often necessary, because the servers are distributed, to gather trace on all servers simultaneously while recreating the problem. This trace can be enabled dynamically or statically, depending on the type of problem that is occurring.
Troubleshooting security configurations