Troubleshooting WebSphere Liberty operators

We might encounter an issue when we install, configure, or manage operators. We can run oc or kubectl commands to check the status of pods, operators, and custom resources (CRs) and to investigate problems. We can also run the MustGather tool to collect information about clusters and send that information to IBM Support.

To run oc commands, we need the Red Hat OpenShift command-line interface (CLI). If Red Hat OpenShift is not installed, we need the Kubernetes command-line tool to run kubectl commands instead.

Tip: The documentation shows oc commands. To run kubectl commands, replace oc with kubectl in the commands.
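
The tip above can be sketched as a small shell helper that selects whichever CLI is available. The function name pick_cli is hypothetical and not part of any product; the sketch only assumes a POSIX-style shell.

```shell
# Hypothetical helper: print the first available CLI name.
# Prefers oc; falls back to kubectl; fails if neither is found.
pick_cli() {
  if command -v oc >/dev/null 2>&1; then
    echo oc
  elif command -v kubectl >/dev/null 2>&1; then
    echo kubectl
  else
    echo "neither oc nor kubectl was found" >&2
    return 1
  fi
}
```

For example, CLI=$(pick_cli) stores the command name, and "$CLI" get pods -l name=websphere-liberty-operator then runs the same check with either tool.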


Troubleshooting operators

Run the following oc commands to investigate problems with pods and operators.

  • Check the WebSphere Liberty operator.

      oc get pods -l name=websphere-liberty-operator

    Output from the get pods command shows the pod name and status.

      NAME                                          READY   STATUS    RESTARTS   AGE
      websphere-liberty-operator-5c4548d98f-xgqtg   1/1     Running   0          2m29s

  • Check the operator events. In the describe pod command, replace pod_name with a pod name from the get pods output.

      oc describe pod pod_name

    The following example command uses the WebSphere Liberty operator pod name.

      oc describe pod websphere-liberty-operator-5c4548d98f-xgqtg

  • Check the operator logs. In the logs command, replace pod_name with a pod name from the get pods output.

      oc logs pod_name

We can change the logging level of the operator dynamically through the operator's ConfigMap.

See Operator ConfigMap.
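
The three checks in this section can be chained so that the pod name does not need to be copied by hand. The following is a minimal sketch, assuming the operator runs in the current namespace; the function name check_operator is hypothetical. It relies on the -o name output option, which prints names in the form pod/pod_name.

```shell
# Hypothetical helper: describe and print logs for every operator pod.
check_operator() {
  # -o name prints one resource name per line, for example pod/<pod_name>
  pods=$(oc get pods -l name=websphere-liberty-operator -o name) || return 1
  if [ -z "$pods" ]; then
    echo "no operator pod found in the current namespace" >&2
    return 1
  fi
  for pod in $pods; do
    echo "=== describe $pod ==="
    oc describe "$pod"
    echo "=== logs $pod ==="
    oc logs "$pod"
  done
}
```

Redirecting the output, for example check_operator > operator-checks.txt, keeps the events and logs together for later review.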


Troubleshooting custom resources

If the operator is running as expected, check the status of the WebSphereLibertyApplication custom resource (CR) instance.

The following commands use wlapp, which is the short name for WebSphereLibertyApplication.

  • Check the CR status. In the get wlapp command, replace app_name with the name of the CR instance.

      oc get wlapp app_name -o wide

    The following example shows the command with my-app for app_name and the output.

      oc get wlapp my-app -o wide
      
      NAME     IMAGE                        EXPOSED   RECONCILED   REASON   MESSAGE   AGE
      my-app   quay.io/my-repo/my-app:1.0   false     True                            1h

  • Check the CR effective fields. In the get wlapp command, replace app_name with the name of the CR instance.

      oc get wlapp app_name -o yaml

    Verify that the effective CR values in the output are the intended values. If the CR reconciled successfully, the status section of the output contains a condition of type Reconciled with a status of "True".

      oc get wlapp my-app -o yaml
      
        apiVersion: liberty.websphere.ibm.com/v1
        kind: WebSphereLibertyApplication
        ...
        status:
          conditions:
          - lastUpdateTime: "2020-01-08T22:06:50Z"
            status: "True"
            type: Reconciled

  • Check the CR events. In the describe wlapp command, replace app_name with the name of the CR instance.

      oc describe wlapp app_name
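
For scripted checks, the Reconciled condition can be read directly instead of scanning the YAML output. The following is a sketch only: the function name is_reconciled is hypothetical, and it assumes that the status section has the conditions layout shown in the earlier YAML example. It uses the JSONPath output option of the get command.

```shell
# Hypothetical helper: succeed only if the Reconciled condition is "True".
is_reconciled() {
  app_name=$1
  # Select the status field of the condition whose type is Reconciled.
  status=$(oc get wlapp "$app_name" \
    -o jsonpath='{.status.conditions[?(@.type=="Reconciled")].status}') || return 1
  [ "$status" = "True" ]
}
```

For example, is_reconciled my-app || oc describe wlapp my-app prints the CR events only when the instance has not reconciled.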

    WebSphereLibertyApplication CR displays Certificate not ready or TLS secret not found message

    When the .spec.manageTLS parameter is enabled in the WebSphereLibertyApplication CR, the Liberty operator manages the application certificate. If the Certificate CRD from the cert-manager.io group is installed on the cluster, then the operator creates instances of the cert-manager.io CR. A cert-manager operator must be installed on the cluster for the instances of cert-manager.io CR to be reconciled and the certificates to be generated. Otherwise, the WebSphereLibertyApplication CR displays one or both of the following messages in its status:

    • Certificate Issuer is not ready
    • Secret <CR_NAME>-svc-tls-cm was not found in namespace <CR_NAMESPACE>, Secret <CR_NAME>-svc-tls-cm not found

    To resolve this issue, complete one of the following options:

    • Check whether a cert-manager operator is installed and whether it manages the namespace of the WebSphereLibertyApplication CR. If not, install a cert-manager operator.

    • On Red Hat OpenShift, set the service annotation in the WebSphereLibertyApplication CR. The Liberty operator switches to using the Red Hat OpenShift Service CA. To set the service annotation, see Generate certificates with Red Hat OpenShift service CA.

    • On Red Hat OpenShift, check if the cert-manager operator is installed in any other namespaces in the cluster. If not, check the instances of the cert-manager.io CR. If no instances are required, then delete the cert-manager.io related CRDs. The Liberty operator switches to using the Red Hat OpenShift Service CA.
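
Before choosing one of these options, it can help to confirm whether the Certificate CRD exists and whether any instances of it are present. The following is a minimal sketch with a hypothetical function name, check_cert_manager. It assumes the CRD is registered under its usual name, certificates.cert-manager.io, and that we have permission to list resources across namespaces. It does not detect whether a cert-manager operator is running.

```shell
# Hypothetical helper: report whether the cert-manager Certificate CRD
# is installed and, if so, list Certificate instances in all namespaces.
check_cert_manager() {
  if oc get crd certificates.cert-manager.io >/dev/null 2>&1; then
    echo "Certificate CRD is installed"
    oc get certificates.cert-manager.io --all-namespaces
  else
    echo "Certificate CRD is not installed"
  fi
}
```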

    Lost connections when using OpenJ9 version openj9-0.33.1

    If we run a WebSphereLibertyApplication custom resource that uses JITServer with OpenJ9 version openj9-0.33.1, the application might lose its connection to the JITServer, with errors in both the application and JITServer containers. To prevent this problem, upgrade OpenJ9 to version 0.35.0.

    • The following code shows an example error in an application pod.

      139745560807168:error:140940F4:SSL routines:ssl3_read_bytes:unexpected message:ssl/record/rec_layer_s3.c:1477:
      #JITServer: t=865045 Lost connection to the server (serverUID=2529813496315317418)
      

    • The following code shows an example error in a JITServer pod.

      139825751693056:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:ssl/record/ssl3_record.c:355:
      

    • The following code shows how to get the version information for OpenJ9.

      sh-4.4$ java -version
      openjdk version "17.0.4.1" 2022-08-12
      IBM Semeru Runtime Open Edition 17.0.4.1 (build 17.0.4.1+1)
      Eclipse OpenJ9 VM 17.0.4.1 (build openj9-0.33.1, JRE 17 Linux amd64-64-Bit Compressed References 20220812_266 (JIT enabled, AOT enabled)
      OpenJ9   - 1d9d16830
      OMR      - b58aa2708
      JCL      - 1f4d354e654 based on jdk-17.0.4.1+1)
      

    • The following code sample shows details of the Liberty version.

      ********************************************************************************
      product = Open Liberty 22.0.0.10 (wlp-1.0.69.cl221020220912-1100)
      wlp.install.dir = /opt/ol/wlp/
      server.output.dir = /opt/ol/wlp/output/defaultServer/
      java.home = /opt/java/openjdk
      java.version = 17.0.4.1
      java.runtime = IBM Semeru Runtime Open Edition (17.0.4.1+1)
      os = Linux (4.18.0-372.19.1.el8_6.x86_64; amd64) (en_US)
      process = 1@daytrader7-7f795bd46b-8vtkl
      Classpath = /opt/ol/wlp/bin/tools/ws-server.jar:/opt/ol/wlp/bin/tools/ws-javaagent.jar
      Java Library path = /opt/java/openjdk/lib/default:/opt/java/openjdk/lib:/usr/lib64:/usr/lib
      ********************************************************************************
      

    The following code sample shows that OpenJ9 version 0.35.0 is installed.

      sh-4.4$ java -version
      openjdk version "17.0.5" 2022-10-18
      IBM Semeru Runtime Open Edition 17.0.5.0 (build 17.0.5+8)
      Eclipse OpenJ9 VM 17.0.5.0 (build openj9-0.35.0, JRE 17 Linux amd64-64-Bit Compressed References 20221018_325 (JIT enabled, AOT enabled)
      OpenJ9   - e04a7f6c1
      OMR      - 85a21674f
      JCL      - 32d2c409a33 based on jdk-17.0.5+8)
      

    • The following code sample shows details of the Liberty version.

      ********************************************************************************
      product = Open Liberty 22.0.0.12 (wlp-1.0.71.cl221220221107-1900)
      wlp.install.dir = /opt/ol/wlp/
      server.output.dir = /opt/ol/wlp/output/defaultServer/
      java.home = /opt/java/openjdk
      java.version = 17.0.5
      java.runtime = IBM Semeru Runtime Open Edition (17.0.5+8)
      os = Linux (4.18.0-305.57.1.el8_4.x86_64; amd64) (en_US)
      process = 1@daytrader7-0
      Classpath = /opt/ol/wlp/bin/tools/ws-server.jar:/opt/ol/wlp/bin/tools/ws-javaagent.jar
      Java Library path = /opt/java/openjdk/lib/default:/opt/java/openjdk/lib:/usr/lib64:/usr/lib
      ********************************************************************************
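
The version check above can be sketched as two small shell functions that parse java -version output. The function names openj9_version and version_at_least are hypothetical. The sketch assumes the build tag appears in the form "build openj9-x.y.z", as in the preceding samples, and that the sort command supports the -V option (a GNU extension). Treating versions later than 0.35.0 as acceptable is also an assumption.

```shell
# Read java -version output on stdin and print the OpenJ9 build version.
openj9_version() {
  sed -n 's/.*build openj9-\([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is at least version $2.
version_at_least() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Because java -version writes to standard error, a call such as oc exec pod_name -- java -version 2>&1 | openj9_version merges the streams before parsing. The result can then be compared, for example version_at_least "$version" 0.35.0.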
      


Gathering information about clusters with MustGather

We can use the MustGather tool to collect information about the cluster. IBM Support uses the collected information to help troubleshoot and fix problems.

To save time, run the Red Hat OpenShift must-gather and describe commands before contacting IBM Support.

  1. Open a command-line prompt in the directory where we want to store the MustGather data.

  2. Log in to the Red Hat OpenShift cluster. The user must have cluster-admin permissions.

      oc login https://your_cluster_hostname -u username -p password

  3. Run the Red Hat OpenShift must-gather command.

    Replace instance_namespace and operator_namespace with the namespace values of the installation.

      oc adm must-gather --image=quay.io/opencloudio/must-gather:4.5.4 -- gather -n instance_namespace,operator_namespace -m failure,overview,cloudpak

    The command compresses the collected data into a .tgz output file and stores it in ./must-gather.local.random_number.

  4. Run Red Hat OpenShift describe commands to gather data about the state of the WebSphereLibertyApplication, WebSphereLibertyDump, and WebSphereLibertyTrace instances.

    The MustGather tool does not gather data about custom resources. Use the describe command to gather data about instances. Replace the instance_namespace variables with the namespace values for your instances.

      oc describe wlapp -n WebSphereLibertyApplication_instance_namespace > wlapp.txt
      oc describe wldump -n WebSphereLibertyDump_instance_namespace > wldump.txt
      oc describe wltrace -n WebSphereLibertyTrace_instance_namespace > wltrace.txt

  5. Open a Support Ticket with IBM Support and attach the MustGather .tgz output file.
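
The describe commands in step 4 can be wrapped in a loop when all three instance kinds are in the same namespace. The following is a minimal sketch under that assumption; the function name describe_liberty_instances is hypothetical. If the instances are in different namespaces, run the individual commands from step 4 instead.

```shell
# Hypothetical helper: write describe output for each Liberty CR kind
# in one namespace to <kind>.txt in the current directory.
describe_liberty_instances() {
  namespace=$1
  for kind in wlapp wldump wltrace; do
    oc describe "$kind" -n "$namespace" > "$kind.txt"
  done
}
```

For example, describe_liberty_instances instance_namespace creates wlapp.txt, wldump.txt, and wltrace.txt, which can be attached to the support ticket together with the .tgz file.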


Getting help from IBM Support

Open a Support Ticket with IBM Support and add information that can help IBM Support troubleshoot and fix the problem.

  1. Click Open a case on the WebSphere Application Server support or Let's troubleshoot page.

  2. Add information that can help IBM Support determine the cause of the error.

    In the ticket, describe the error. If the error is difficult to describe, then provide a screen capture of the error. Also, provide pertinent information, such as a description of the cluster configuration and the component that is failing or having issues.

  3. If we used the MustGather tool to collect information about the cluster, attach the MustGather .tgz output file and the .txt files that contain data about the WebSphereLibertyApplication, WebSphereLibertyDump, and WebSphereLibertyTrace instances.