Liberty collective troubleshooting

You might encounter a number of common issues when troubleshooting the Liberty profile. These issues typically relate to the configuration of the collective controller, a member, or the host system. Browse the list to find your issue and learn how to resolve it.

For fixes to other issues, see Runtime environment known restrictions.


Issues involving CLI, Jython, or MBean connection using the REST connector

CWWKX0217E: No MBean is currently registered with the given object_name

Message:

    Error: CWWKX0217E: No MBean is currently registered with the given ObjectName 'WebSphere:feature=collectiveController,type=CollectiveRegistration,name=CollectiveRegistration'

Cause:

The MBean might not be available yet. Check the server logs to see if the MBean has reported ready.

There might have been a problem starting the collective repository. Check to see if the collective repository has started.

If the target is a collective controller, verify that the replica set is active. If a majority of the collective controller replicas are not started, this message will be seen. Start the remaining replicas.

The server's configuration might be incomplete. Make sure that the server is properly configured.

CWWKX0215E: There was a problem with the user name or password provided.

Message:

    Error: CWWKX0215E: There was a problem with the user name or password provided. The server responded with code 401 and message 'Unauthorized'

Cause:

The username and password might be incorrect. Make sure that the username and password are correct for the target server.

The user might not be granted the Administrator role. Make sure that the user is granted the Administrator role, or choose a different user.

The security configuration for the target server might be incomplete. Make sure that the security configuration is defined and the security service reports as ready (CWWKS0008I).

Error: Connection refused: connect

Message:

    Error: Connection refused: connect

Cause:

The host and port might be incorrect. Make sure that the host and port are correct for the target server.

The server might not be running. Make sure that the server is running.
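A quick reachability check can help tell these two causes apart before you rerun the command. The following is a minimal Python sketch and not part of the Liberty tooling; the function name `can_connect` is a hypothetical helper introduced here for illustration.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("localhost", 9443)` (a hypothetical host and port) returns False when nothing is listening on that address. A True result only shows that some process accepts connections there, not that it is the intended Liberty server.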

java.net.SocketException error

Message:

    java.net.SocketException: java.security.NoSuchAlgorithmException: Error constructing implementation (algorithm: Default, provider: SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext)(possibly others...)

Cause:

The truststore and truststore password might be incorrect. Make sure that the truststore path, truststore password, and contents of the truststore are correct.


Issues involving start and stop commands

Starting or stopping the servers remotely causes a Java not found error

Message:

Starting or stopping the servers remotely (for example, by using ClusterManager.startCluster or ServerCommands.startServer) produces the following error:

    {stderr=java: javaCmd 14: serverCmd 32: ./server 873: FSUM7351 not found, stdout=, returnCode=127}

Solution:

The member servers need a server.env file that specifies a JAVA_HOME variable.
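To confirm that a member's server.env actually defines the variable, you could scan the file contents. The sketch below assumes that server.env consists of simple key=value lines with `#` comments; the function name `read_java_home` and the example path are hypothetical.

```python
def read_java_home(server_env_text):
    """Return the JAVA_HOME value found in server.env content, or None if unset."""
    java_home = None
    for raw_line in server_env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "JAVA_HOME":
            # Assumption: if the key appears more than once, the last entry wins.
            java_home = value.strip()
    # Treat an empty value the same as a missing entry.
    return java_home or None
```

For example, `read_java_home("JAVA_HOME=/opt/java\n")` returns `"/opt/java"`, while content without a JAVA_HOME entry returns None.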

CTGRI0000E: Could not establish a connection to the target machine with the authorization credentials that were provided.

Message:

    CTGRI0000E Could not establish a connection to the target machine with the authorization credentials that were provided.

Cause:

Authentication fails using user name or password:

  • Make sure that the user name and password are correct in the target server's server.xml <hostAuthConfig> element.

  • Update the host authentication configuration using the collective updateHost command.

Authentication fails using SSH keys:

  • Check permissions on:

    • ~/.ssh should be 0700

    • ~/.ssh/authorized_keys should be 0600

  • If SELinux is in use, ~/.ssh and all of its children must also have the correct security context. Use restorecon -R to restore it.
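The permission checks in the list above can be scripted. The following is a minimal Python sketch for POSIX systems; the function name `check_ssh_permissions` is a hypothetical helper, and the sketch does not inspect SELinux contexts.

```python
import os
import stat


def check_ssh_permissions(home_dir):
    """Return (path, actual_mode, expected_mode) tuples for entries with wrong modes.

    actual_mode is None when the path does not exist.
    """
    expected = {
        os.path.join(home_dir, ".ssh"): 0o700,
        os.path.join(home_dir, ".ssh", "authorized_keys"): 0o600,
    }
    problems = []
    for path, want in expected.items():
        try:
            # S_IMODE keeps only the permission bits of the file mode.
            actual = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            problems.append((path, None, want))
            continue
        if actual != want:
            problems.append((path, actual, want))
    return problems
```

Calling `check_ssh_permissions(os.path.expanduser("~"))` as the user that the collective controller connects with returns an empty list when both modes match the values listed above.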

CTGRI0001E: The application could not establish a connection to host_name.

Message:

    {ExceptionMessage=ConnectException caught while performing stopCluster operation on member webp1a.ibm.com,/P1A/WebSphere_LP/usr,memberA1: java.net.ConnectException: 
    CTGRI0001E The application could not establish a connection to webp1a.ibm.com., Exception=java.net.ConnectException: CTGRI0001E The application could not establish a connection to webp1a.ibm.com.}

Cause:

Starting or stopping the servers remotely using commands such as ClusterManager.startCluster or ServerCommands.startServer can cause the error.

Message CTGRI0001E, along with message CTGRI0026E, can indicate that too many concurrent SSH connections are made to a host. Possible causes are:

  • Autonomic features, such as the scaling controller

  • Running ClusterManager.startCluster, ServerCommands.startServer, or other system management commands on a number of servers on a single host that exceeds the maximum number of concurrent unauthenticated connections to the SSH daemon.

Solution:

Confirm that the RPC mechanism (such as SSH) is started. Also confirm that the configured settings, such as host and port, are correct.

If the environment uses SSH, change the settings in the SSH configuration file, /etc/ssh/sshd_config. The MaxStartups setting specifies the maximum number of concurrent unauthenticated connections to the SSH daemon and has a default of 10. Additional connections are dropped until authentication succeeds or the LoginGraceTime expires for a connection. Changing the MaxStartups setting can solve the problem.

To enable random early drop, specify three colon-separated values, start:rate:full (for example, 10:30:60). sshd(8) refuses connection attempts with a probability of rate/100 (30%) if there are currently start (10) unauthenticated connections. The probability increases linearly, and all connection attempts are refused when the number of unauthenticated connections reaches full (60).

The following sample SSH configuration file settings specify MaxStartups and other settings that can alleviate connection problems:

    ClientAliveInterval 60
    ClientAliveCountMax 3
    MaxSessions 100
    MaxStartups 100:30:200
    LoginGraceTime 180

For more information about the Secure Shell (SSH) protocol and changing /etc/ssh/sshd_config settings, see Set up RXA for Liberty collective operations.
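To see how a given start:rate:full value behaves under load, the random early drop rule described above can be modeled in a few lines. This is a sketch, not sshd's implementation. It assumes that the linear increase runs from rate% at start to 100% at full. The function names are hypothetical.

```python
def parse_max_startups(value):
    """Split a 'start:rate:full' MaxStartups value into three integers."""
    start, rate, full = (int(part) for part in value.split(":"))
    return start, rate, full


def refusal_probability(unauthenticated, start, rate, full):
    """Modeled probability (0.0 to 1.0) that a new connection attempt is refused."""
    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 1.0
    # Assumption: linear interpolation from rate% at `start` to 100% at `full`.
    fraction = (unauthenticated - start) / (full - start)
    return (rate + (100 - rate) * fraction) / 100.0
```

With the example value 10:30:60, the model gives 0% below 10 unauthenticated connections, 30% at 10, 65% at 35, and 100% at 60. With the sample setting 100:30:200, no attempts are refused until 100 unauthenticated connections are pending.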

CTGRI0026E: A connection could not be completed to host_name during the specified timeout interval.

Message:

    CTGRI0026E A connection could not be completed to webp1a.ibm.com during the specified timeout interval.

Cause:

Too many concurrent SSH connections to a host can cause this error.

Solution:

See the solution for message CTGRI0001E.

CWWKX7204E: Cannot connect to host host_name with the credentials provided.

Message:

    localhost,C:/wlp,member1 stop operation resulted in an Exception: ConnectException caught while performing stopCluster operation on member localhost,C:/wlp,member1: java.net.ConnectException: 
    CWWKX7204E: Cannot connect to host localhost with the credentials provided.

Solution:

Make sure that the cluster member authentication information is set correctly and that all Remote Execution and Access (RXA) requirements are met. Many RXA operations require access to resources that are not generally accessible by standard user accounts. See Set up RXA for Liberty collective operations for more information.

Concepts:

  • Collective architecture
  • File transfer in a Liberty collective
  • Collective security
  • File transfer

Tasks:

  • Set the default host name of a Liberty server
  • Configure a Liberty collective
  • Register host computers with a Liberty collective
  • Set the JAVA_HOME variable for Liberty collective members
  • Configure Liberty collective replica sets

Reference:

  • Example of setting up a JMX routing environment
  • List of provided MBeans
  • Overriding Liberty server host information