Security considerations in a multi-node WAS ND environment
Overview
With WebSphere Application Server Network Deployment, we need to ensure secure communications between application servers and node agents, and between node agents and the deployment manager.
Because the processes are distributed, we must select an authentication mechanism that supports an authentication token, such as LTPA. LTPA tokens are encrypted, signed, and forwardable to remote processes.
Tokens have expiration times, which are set on the WAS console. The SOAP connector, which is the default connector used for administrative security, does not have retry logic for expired tokens. However, the protocol is stateless, so a new token is created for each request when the time left in the current token is not sufficient to run the request. An alternative is the Remote Method Invocation (RMI) connector, which is stateful and has some retry logic: it corrects for expired tokens by resubmitting the requests after the error is detected.
Because tokens expire at a specific time, synchronized system clocks are crucial to the proper operation of token-based validation. If the clocks are off by too much (approximately 10-15 minutes), we can encounter unrecoverable validation failures that are avoided by keeping the clocks in sync. Verify that the time, date, and time zone are set correctly on every system. Nodes can be in different time zones, provided that each clock is correct for its own time zone (for example, 5 PM CST = 6 PM EST).
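To make the clock requirement concrete, here is a minimal Python sketch. It does not use any WebSphere API. The function names, the host names, and the 10-minute limit (the low end of the approximate 10-15 minute range above) are illustrative assumptions. Readings are compared as timezone-aware values, so nodes in different time zones compare correctly as long as each local clock is right.

```python
from datetime import datetime, timedelta, timezone

# Assumed limit: the low end of the approximate 10-15 minute range.
SKEW_LIMIT = timedelta(minutes=10)


def max_clock_skew(readings):
    """Return the largest difference between any two node clocks.

    `readings` maps a (hypothetical) host name to a timezone-aware
    datetime sampled from that host at the same instant.
    """
    instants = [ts.astimezone(timezone.utc) for ts in readings.values()]
    return max(instants) - min(instants)


def clocks_in_sync(readings, limit=SKEW_LIMIT):
    """True when the largest clock difference is below `limit`."""
    return max_clock_skew(readings) < limit


# 5 PM CST and 6 PM EST are the same instant, so the skew is zero.
cst = timezone(timedelta(hours=-6))
est = timezone(timedelta(hours=-5))
readings = {
    "nodeA": datetime(2024, 1, 15, 17, 0, tzinfo=cst),
    "nodeB": datetime(2024, 1, 15, 18, 0, tzinfo=est),
}
print(max_clock_skew(readings))  # 0:00:00
print(clocks_in_sync(readings))  # True
```

How the readings are collected (for example, over SSH or from a monitoring agent) is left open here; the sketch only covers the comparison.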
Checklist
- Verify that the configuration at the node agents is always synchronized with the deployment manager prior to starting or restarting a node.
To synchronize the configuration manually, issue the syncNode command from each node that is not synchronized. To synchronize the configuration for node agents that are started, click...
System Administration | Nodes | nodes | Synchronize
- Verify that the clocks on all systems are in sync, including the time and date.
If they are out of sync, tokens are treated as expired as soon as they reach the target server because of the time difference. Coordinated Universal Time (UTC) is used by default, so all machines must agree on the UTC time. Consult the operating system documentation for information about how to ensure this.
- Verify that the LTPA token expiration period is long enough to complete the longest downstream request.
Some credentials are cached, and therefore the timeout does not always include the length of the request. Specifically for cached credentials, we might need to evaluate the settings for the security cache (WSSecureMap) and the LTPA timeout.
- We can change the default connector from SOAP, a stateless HTTP protocol, to RMI, a stateful, interoperable protocol.
RMI can be configured to use...
- Identity assertion (downstream delegation)
- Message-layer authentication (BasicAuth or Token)
- Client certificate authentication (for server trust isolation)
To change the default connector on a given server, go to...
server | Administration Services | Additional properties
- (zos) An error message might occur within the administrative subsystem security. This error indicates that the sending process did not supply a credential to the receiving process. Typically the cause of this problem is that the sending process has security disabled while the receiving process has security enabled. This setup typically indicates that one of the two processes is not synchronized with the cell. Having security disabled for a specific application server does not have any effect on administrative security.
- An error message might occur within the administrative subsystem security. This error indicates that the sending process did not supply a credential to the receiving process. Typically the causes of this problem are:
- The sending process has security disabled while the receiving process has security enabled. This setup typically indicates that one of the two processes is not synchronized with the cell. Having security disabled for a specific application server does not have any effect on administrative security.
- The clocks between the systems are not synchronized; this immediately makes the credential tokens not valid. Verify that the time, date, and time zones are consistent between the two machines. An error similar to the following might occur:
[9/18/02 16:48:23:859 CDT] 3b9cef35 RoleBasedAuth A CWSCJ0305I: Role based authorization check failed for security name <null>, accessId NO_CRED_NO_ACCESS_ID while invoking method propagateNotifications:[Ljavax.management.Notification; on resource NotificationService and module NotificationService.
- If an error similar to the following occurs, validate that the clocks are synchronized between all servers within the cell, and that the configurations are synchronized between all nodes and the dmgr:
[9/18/02 16:48:22:859 CDT] 3bd06f34 LTPAServerObj E CWSCJ0372E: Validation of the token failed.
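When scanning logs from several nodes, it can help to map the two messages above back to the checklist. The following Python sketch is one way to do that. The regular expression assumes the line layout of the two samples shown above (timestamp, thread ID, component, level, message ID), the function name `diagnose` is hypothetical, and the hint text paraphrases this checklist rather than official message documentation.

```python
import re

# Message IDs are taken from the sample log lines above; the hints
# paraphrase the checklist and are not official message text.
HINTS = {
    "CWSCJ0305I": "No credential supplied: check that security settings "
                  "and clocks are synchronized between the two processes.",
    "CWSCJ0372E": "Token validation failed: check clock and configuration "
                  "synchronization across all nodes and the dmgr.",
}

# Assumed layout: [timestamp] threadId component level messageId: text
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<thread>\S+)\s+(?P<component>\S+)\s+"
    r"(?P<level>\S)\s+(?P<msg_id>[A-Z]+\d{4}[A-Z]):"
)


def diagnose(line):
    """Return (message_id, hint) for a known message, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    msg_id = match.group("msg_id")
    hint = HINTS.get(msg_id)
    if hint is None:
        return None
    return msg_id, hint


sample = ("[9/18/02 16:48:22:859 CDT] 3bd06f34 LTPAServerObj E "
          "CWSCJ0372E: Validation of the token failed.")
result = diagnose(sample)
if result is not None:
    print(result[0], "->", result[1])
```

Lines that do not match the assumed layout, or that carry a message ID not listed in `HINTS`, are ignored rather than guessed at.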
Results
Proper understanding of the security interactions between distributed servers greatly reduces the problems that are encountered with secure communications. Security adds complexity because additional function must be managed. For security to work properly, it needs thorough consideration during the planning of the infrastructure.
What to do next
When we have security problems related to the WAS Network Deployment environment, see Troubleshooting security configurations for additional information about the problem. Because servers are distributed, when trace is needed to solve a problem, it is often necessary to gather trace on all servers simultaneously while recreating the problem. This trace can be enabled dynamically or statically, depending on the type of problem that is occurring.
Subtopics
- LTPA token cushion period
Within the LTPA token expiration, there is a cushion period used to validate the tokens before a request is sent to the downstream application servers. This helps prevent the expiration of the tokens in a downstream server. The cushion period is twenty percent of the LTPA token expiration period, and has a maximum default time out value of ten minutes. However, this period should not be lower than the ORB request time out value, which is three minutes.
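The cushion arithmetic described above can be sketched in Python as follows. This is an illustration of the stated rule only, not WebSphere code. The function and constant names are hypothetical, and the sample timeout values are chosen for the example. Integer seconds are used to avoid floating-point rounding.

```python
CUSHION_PERCENT = 20             # twenty percent of the expiration period
MAX_CUSHION_SECONDS = 10 * 60    # ten-minute maximum from the text above
ORB_REQUEST_TIMEOUT = 3 * 60     # three-minute ORB request timeout


def cushion_seconds(ltpa_timeout_seconds):
    """Cushion period: 20% of the token expiration, capped at 10 minutes."""
    if ltpa_timeout_seconds <= 0:
        raise ValueError("LTPA timeout must be positive")
    return min(ltpa_timeout_seconds * CUSHION_PERCENT // 100,
               MAX_CUSHION_SECONDS)


def cushion_covers_orb_timeout(ltpa_timeout_seconds):
    """True when the cushion is not lower than the ORB request timeout."""
    return cushion_seconds(ltpa_timeout_seconds) >= ORB_REQUEST_TIMEOUT


for minutes in (120, 30, 10):
    timeout = minutes * 60
    print(minutes, cushion_seconds(timeout),
          cushion_covers_orb_timeout(timeout))
# 120 600 True
# 30 360 True
# 10 120 False
```

Under these assumptions, an expiration period shorter than 15 minutes yields a cushion below the three-minute ORB request timeout, which is the case the text above advises against.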
Related concepts
(zos) Java thread identity and an operating system thread identity
- LTPA token cushion period
Related tasks
- Troubleshooting security configurations