

Logging and monitoring cluster health

For cluster metrics and app logging and monitoring, Red Hat OpenShift on IBM Cloud clusters include built-in tools to help you manage the health of your single cluster instance. You can also set up IBM Cloud tools for multi-cluster analysis or other use cases, such as the IBM Cloud Kubernetes Service add-ons IBM Log Analysis with LogDNA and IBM Cloud Monitoring with Sysdig.


Understand options for logging and monitoring

To help understand when to use the built-in OpenShift tools or IBM Cloud integrations, review the following comparison.

Cluster and app logging

OpenShift tools (built-in OpenShift logging tools):
  • Built-in view of pod logs in the OpenShift web console.
  • Built-in pod logs are not configured with persistent storage. You must integrate with a cloud database to back up the logging data and make it highly available, and manage the logs yourself.

OpenShift 3.11: You cannot run the Ansible playbook to deploy the OpenShift Container Platform Elasticsearch, Fluentd, and Kibana (EFK) stack because you cannot modify the default configuration of the Red Hat OpenShift on IBM Cloud cluster.

OpenShift 4: To set up an OpenShift Container Platform Elasticsearch, Fluentd, and Kibana (EFK) stack, see installing the cluster logging operator.

IBM Cloud integrations (IBM Log Analysis with LogDNA):
  • Customizable user interface for live streaming of log tailing, real-time troubleshooting, issue alerts, and log archiving.
  • Quick integration with the cluster via a script.
  • Aggregated logs across clusters and cloud providers.
  • Historical access to logs that is based on the plan you choose.
  • Highly available, scalable, and compliant with industry security standards.
  • Integrated with IBM Cloud IAM for user access management.
  • Flexible plans, including a free Lite option.

To get started, see Forwarding cluster and app logs to IBM Log Analysis with LogDNA.
API audit logging

OpenShift tools:
API audit logging to monitor user-initiated activities is currently not supported with the built-in OpenShift audit logging tools.
IBM Cloud integrations (IBM Log Analysis with LogDNA):
  • Customizable user interface for live streaming of log tailing, real-time troubleshooting, issue alerts, and log archiving.
  • Quick integration with the cluster via a script.
  • Aggregated logs across clusters and cloud providers.
  • Historical access to logs that is based on the plan you choose.
  • Highly available, scalable, and compliant with industry security standards.
  • Integrated with IBM Cloud IAM for user access management.
  • Flexible plans, including a free Lite option.

To get started, see Forwarding Kubernetes API audit logs to LogDNA.

Forwarding Kubernetes API audit logs to LogDNA is not supported for version 3.11 clusters.


IBM Cloud integrations (IBM Cloud Activity Tracker with LogDNA):
Use IBM Cloud Activity Tracker with LogDNA to view cluster management events that are generated by the Red Hat OpenShift on IBM Cloud API. To access these logs, provision an instance of IBM Cloud Activity Tracker with LogDNA. For more information about the types of IBM Cloud Kubernetes Service events that we can track, see Activity Tracker events.
Monitoring

OpenShift tools (built-in OpenShift monitoring tools):
  • Built-in Prometheus and Grafana deployments in openshift-monitoring project for cluster metrics.
  • At-a-glance, real-time view of how pods consume cluster resources, which you can access from the OpenShift Cluster Console.
  • Monitoring is on a per-cluster basis.
  • The openshift-monitoring project stack is set up in a single zone only. No persistent storage is available to back up or view metric history.

For more information, see the OpenShift documentation.
IBM Cloud integrations (IBM Cloud Monitoring with Sysdig):
  • Customizable user interface for a unified look at the cluster metrics, container security, resource usage, alerts, and custom events.
  • Quick integration with the cluster via a script.
  • Aggregated metrics and container monitoring across clusters and cloud providers for consistent operations enablement.
  • Historical access to metrics that is based on the timeline and plan, and ability to capture and download trace files.
  • Highly available, scalable, and compliant with industry security standards.
  • Integrated with IBM Cloud IAM for user access management.
  • Free trial to try out the capabilities.

To get started, see Forwarding cluster and app metrics to IBM Cloud Monitoring with Sysdig.
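
If you want to see what the built-in OpenShift monitoring stack is running before you decide on an IBM Cloud integration, you can list its components directly. A minimal check, assuming you are logged in to the cluster with sufficient access:

  oc get pods -n openshift-monitoring

The output typically includes the Prometheus, Alertmanager, and Grafana pods that back the built-in dashboards.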



Forwarding cluster and app logs to IBM Log Analysis with LogDNA

Use the Red Hat OpenShift on IBM Cloud observability plug-in to create a logging configuration for IBM Log Analysis with LogDNA in the cluster, and use this logging configuration to automatically collect and forward pod logs to IBM Log Analysis with LogDNA.

We can have only one logging configuration for IBM Log Analysis with LogDNA in the cluster at a time. To use a different IBM Log Analysis with LogDNA service instance to send logs to, use the ibmcloud ob logging config replace command.

If you created a LogDNA logging configuration in the cluster without using the Red Hat OpenShift on IBM Cloud observability plug-in, we can use the ibmcloud ob logging agent discover command to make the configuration visible to the plug-in. Then, we can use the observability plug-in commands and functionality in the IBM Cloud console to manage the configuration.
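
For reference, the replace and discover commands follow the same pattern as the create command that is shown later in this topic. A hedged sketch, with flags assumed to mirror ibmcloud ob logging config create (run the commands with --help to confirm for your plug-in version):

  # Point the existing logging configuration at a different LogDNA instance
  ibmcloud ob logging config replace --cluster <cluster_name_or_ID> --instance <LogDNA_instance_name_or_ID>

  # Make a manually created LogDNA agent configuration visible to the plug-in
  ibmcloud ob logging agent discover --cluster <cluster_name_or_ID>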

Before beginning:

  • Verify that you are assigned the Editor platform role and Manager service access role for IBM Log Analysis with LogDNA.
  • Verify that you are assigned the Administrator platform role and the Manager service access role for all Kubernetes namespaces in IBM Cloud Kubernetes Service to create the logging configuration. To view a logging configuration or launch the LogDNA dashboard after the logging configuration is created, users must be assigned the Administrator platform role and the Manager service access role for the ibm-observe Kubernetes namespace in IBM Cloud Kubernetes Service.
  • To use the CLI to set up the logging configuration, install the IBM Cloud CLI and the Red Hat OpenShift on IBM Cloud observability plug-in, and log in to your account (a command sketch follows this list).
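
A minimal sketch of the CLI prerequisites, assuming the observability plug-in is named observe-service (check ibmcloud plugin repo-plugins if the name differs in your CLI version):

  # Install the plug-in that provides the ibmcloud ob commands
  ibmcloud plugin install observe-service

  # Log in and target the resource group that contains your cluster
  ibmcloud login
  ibmcloud target -g <resource_group>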

To set up a logging configuration for the cluster:

  1. Create an IBM Log Analysis with LogDNA service instance and note the name of the instance. The service instance must belong to the same IBM Cloud account where you created the cluster, but can be in a different resource group and IBM Cloud region than the cluster.
  2. Set up a logging configuration for the cluster. When you create the logging configuration, an OpenShift project that is named ibm-observe is created and a LogDNA agent is deployed as a daemon set to all worker nodes in the cluster. This agent collects logs with the extension *.log and extensionless files that are stored in the /var/log directory of your pods from all projects, including kube-system. The agent then forwards the logs to the IBM Log Analysis with LogDNA service.

    • From the console:

      1. From the Red Hat OpenShift on IBM Cloud console, select the cluster for which we want to create a LogDNA logging configuration.
      2. On the cluster Overview page, click Connect.
      3. Select the region and the IBM Log Analysis with LogDNA service instance that you created earlier, and click Connect.
    • From the CLI:

      1. Create the LogDNA logging configuration. When you create the LogDNA logging configuration, the ingestion key that was last added is retrieved automatically. To use a different ingestion key, add the --logdna-ingestion-key <ingestion_key> option to the command.

        To use a different ingestion key after you created your logging configuration, use the ibmcloud ob logging config replace command.

        ibmcloud ob logging config create --cluster <cluster_name_or_ID> --instance <LogDNA_instance_name_or_ID>
        

        Example output:

        Creating configuration...
        OK
        
      2. Verify that the logging configuration was added to the cluster.

        ibmcloud ob logging config list --cluster <cluster_name_or_ID>
        

        Example output:

        Listing configurations...
        
        OK
        Instance Name                            Instance ID                            CRN   
        IBM Cloud Log Analysis with LogDNA-opm   1a111a1a-1111-11a1-a1aa-aaa11111a11a   crn:v1:prod:public:logdna:us-south:a/a11111a1aaaaa11a111aa11a1aa1111a:1a111a1a-1111-11a1-a1aa-aaa11111a11a::
        
  3. Optional: Verify that the LogDNA agent was set up successfully.

    1. If you used the console to create the LogDNA logging configuration, log in to the cluster. For more information, see Access the OpenShift cluster.

    2. Verify that the daemon set for the LogDNA agent was created and all instances are listed as AVAILABLE.

      oc get daemonsets -n ibm-observe
      

      Example output:

      NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
      logdna-agent   9         9         9       9            9           <none>          14m
      

      The number of daemon set instances that are deployed equals the number of worker nodes in the cluster.

    3. Review the configmap that was created for the LogDNA agent.

      oc describe configmap -n ibm-observe
      
  4. Access the logs for the pods from the LogDNA dashboard.

    1. From the Red Hat OpenShift on IBM Cloud console, select the cluster that you configured.
    2. On the cluster Overview page, click Launch. The LogDNA dashboard opens.
    3. Review the pod logs that the LogDNA agent collected from the cluster. It might take a few minutes for the first logs to show.
  5. Review how we can search and filter logs in the LogDNA dashboard.
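
If logs do not appear after several minutes, a couple of quick checks can help narrow down the cause. A minimal sketch, assuming the agent pod names start with logdna-agent (the pod name below is illustrative):

  # Confirm that one agent pod is running on each worker node
  oc get pods -n ibm-observe -o wide

  # Compare the pod count against the number of worker nodes
  oc get nodes

  # Inspect an agent pod's logs for ingestion or authentication errors
  oc logs -n ibm-observe <logdna-agent_pod_name>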


Forwarding Kubernetes API audit logs to IBM Log Analysis with LogDNA

Collect and forward any events that are passed through your Kubernetes API server to IBM Log Analysis with LogDNA.

Forwarding Kubernetes API audit logs to LogDNA is not supported for version 3.11 clusters.

In the following steps, you create an audit system in the cluster that consists of an audit webhook, a log collection service and webserver app, and a logging agent. The webhook collects the Kubernetes API server events from the cluster master. The log collection service is a Kubernetes ClusterIP service that is created from an image from the public IBM Cloud registry. This service exposes a simple Node.js HTTP webserver app on the private network only. The webserver app parses the log data from the audit webhook and creates each log as a unique JSON line. Finally, the logging agent forwards the logs from the webserver app to IBM Log Analysis with LogDNA, where you can view the logs.

To see how the audit webhook collects logs, check out the IBM Cloud Kubernetes Service kube-audit policy.

You cannot modify the default kube-audit policy or apply your own custom policy.

Before beginning:

To forward Kubernetes API audit logs to IBM Log Analysis with LogDNA:

  1. Set up an IBM Log Analysis with LogDNA instance and configure the LogDNA agent in the cluster.

  2. Target the global container registry for public IBM Cloud images.

    ibmcloud cr region-set global
    
  3. Optional: For more information about the kube-audit image, inspect icr.io/ibm/ibmcloud-kube-audit-to-logdna.

    ibmcloud cr image-inspect icr.io/ibm/ibmcloud-kube-audit-to-logdna
    
  4. Create a configuration file that is named ibmcloud-kube-audit.yaml. This configuration file creates a log collection service and a deployment that pulls the icr.io/ibm/ibmcloud-kube-audit-to-logdna image to create a log collection container.

     apiVersion: v1
     kind: List
     metadata:
       name: ibmcloud-kube-audit
     items:
       - apiVersion: apps/v1
         kind: Deployment
         metadata:
           name: ibmcloud-kube-audit
           labels:
             app: ibmcloud-kube-audit
         spec:
           replicas: 1
           selector:
             matchLabels:
               app: ibmcloud-kube-audit
           template:
             metadata:
               labels:
                 app: ibmcloud-kube-audit
             spec:
               containers:
                 - name: ibmcloud-kube-audit
                   image: 'icr.io/ibm/ibmcloud-kube-audit-to-logdna:latest'
                   ports:
                     - containerPort: 3000
       - apiVersion: v1
         kind: Service
         metadata:
           name: ibmcloud-kube-audit-service
           labels:
             app: ibmcloud-kube-audit
         spec:
           selector:
             app: ibmcloud-kube-audit
           ports:
             - protocol: TCP
               port: 80
               targetPort: 3000
           type: ClusterIP
    
  5. Create the deployment in the default namespace of the cluster.

    kubectl create -f ibmcloud-kube-audit.yaml
    
  6. Verify that the ibmcloud-kube-audit pod has a STATUS of Running.

    kubectl get pods -l app=ibmcloud-kube-audit
    

    Example output:

    NAME                                             READY   STATUS         RESTARTS   AGE
    ibmcloud-kube-audit-c75cb84c5-qtzqd              1/1     Running        0          21s
    
  7. Verify that the ibmcloud-kube-audit-service service is deployed in the cluster. In the output, note the CLUSTER_IP.

    kubectl get svc -l app=ibmcloud-kube-audit
    

    Example output:

    NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    ibmcloud-kube-audit-service   ClusterIP   172.21.xxx.xxx   <none>        80/TCP           1m
    
  8. Create the audit webhook to collect Kubernetes API server event logs. Add the http:// prefix to the CLUSTER_IP.

    ibmcloud oc cluster master audit-webhook set --cluster <cluster_name_or_ID> --remote-server http://172.21.xxx.xxx
    
  9. Verify that the audit webhook is created in the cluster.

    ibmcloud oc cluster master audit-webhook get --cluster <cluster_name_or_ID>
    

    Example output:

    OK
    Server:            http://172.21.xxx.xxx
    
  10. Apply the webhook to your Kubernetes API server by refreshing the cluster master.

    ibmcloud oc cluster master refresh --cluster <cluster_name_or_ID>
    
  11. After the master refresh completes, access your logs from the LogDNA dashboard.

    1. From the IBM Cloud Observability > Logging console, in the row for the Log Analysis with LogDNA instance, click View LogDNA. The LogDNA dashboard opens.
    2. Wait a few minutes for the logs to display.
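
To confirm that audit events are reaching the log collection service before you check the LogDNA dashboard, you can review the collector pod's output. A minimal sketch, using the label selector from the deployment that you created earlier (what the pod prints depends on the image version):

  kubectl logs -l app=ibmcloud-kube-audit --tail=20

If nothing arrives, re-check that the --remote-server address that you set in step 8 matches the CLUSTER_IP of the ibmcloud-kube-audit-service service.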



Forwarding cluster and app metrics to IBM Cloud Monitoring with Sysdig

Use the Red Hat OpenShift on IBM Cloud observability plug-in to create a monitoring configuration for IBM Cloud Monitoring with Sysdig in the cluster, and use this monitoring configuration to automatically collect and forward metrics to IBM Cloud Monitoring with Sysdig.

With IBM Cloud Monitoring with Sysdig, you can collect cluster and pod metrics, such as the CPU and memory usage of your worker nodes, incoming and outgoing HTTP traffic for the pods, and data about several infrastructure components. In addition, the agent can collect custom application metrics by using either a Prometheus-compatible scraper or a StatsD facade.
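
For custom application metrics, the StatsD route is often the quickest to try. A hedged sketch of emitting a counter from inside an application container, assuming the agent's StatsD facade listens on the default 127.0.0.1:8125 and that nc is available in the container image (the metric name is illustrative):

  # Send a single StatsD counter increment; the Sysdig agent captures it as a custom metric
  echo "myapp.logins:1|c" | nc -u -w1 127.0.0.1 8125

In a real app, you would emit metrics with a StatsD client library rather than from a shell.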

We can have only one monitoring configuration for IBM Cloud Monitoring with Sysdig in the cluster at a time. To use a different IBM Cloud Monitoring with Sysdig service instance to send metrics to, use the ibmcloud ob monitoring config replace command.

If you created a Sysdig monitoring configuration in the cluster without using the Red Hat OpenShift on IBM Cloud observability plug-in, we can use the ibmcloud ob monitoring agent discover command to make the configuration visible to the plug-in. Then, we can use the observability plug-in commands and functionality in the IBM Cloud console to manage the configuration.
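
As with the logging commands, a hedged sketch, with flags assumed to mirror ibmcloud ob monitoring config create:

  # Point the existing monitoring configuration at a different Sysdig instance
  ibmcloud ob monitoring config replace --cluster <cluster_name_or_ID> --instance <Sysdig_instance_name_or_ID>

  # Make a manually created Sysdig agent configuration visible to the plug-in
  ibmcloud ob monitoring agent discover --cluster <cluster_name_or_ID>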

Before beginning:

  • Verify that you are assigned the Editor platform role and Manager service access role for IBM Cloud Monitoring with Sysdig.
  • Verify that we are assigned the Administrator platform role and the Manager service access role for all Kubernetes namespaces in IBM Cloud Kubernetes Service to create the monitoring configuration. To view a monitoring configuration or launch the Sysdig dashboard after the monitoring configuration is created, users must be assigned the Administrator platform role and the Manager service access role for the ibm-observe Kubernetes namespace in IBM Cloud Kubernetes Service.
  • To use the CLI to set up the monitoring configuration, install the IBM Cloud CLI and the observability plug-in, and log in to your account (the same prerequisites as for the logging configuration).

To set up a monitoring configuration for the cluster:

  1. Create an IBM Cloud Monitoring with Sysdig service instance and note the name of the instance. The service instance must belong to the same IBM Cloud account where you created the cluster, but can be in a different resource group and IBM Cloud region than the cluster.
  2. Set up a monitoring configuration for the cluster. When you create the monitoring configuration, an OpenShift project that is named ibm-observe is created and a Sysdig agent is deployed as a Kubernetes daemon set to all worker nodes in your cluster. This agent collects cluster and pod metrics, such as the worker node CPU and memory usage, or the amount of incoming and outgoing network traffic to the pods.

    • From the console:

      1. From the Red Hat OpenShift on IBM Cloud console, select the cluster for which we want to create a Sysdig monitoring configuration.
      2. On the cluster Overview page, click Connect.
      3. Select the region and the IBM Cloud Monitoring with Sysdig service instance that you created earlier, and click Connect.
    • From the CLI:

      1. Create the Sysdig monitoring configuration. When you create the Sysdig monitoring configuration, the access key that was last added is retrieved automatically. To use a different access key, add the --sysdig-access-key <access_key> option to the command.

        To use a different service access key after you created the monitoring configuration, use the ibmcloud ob monitoring config replace command.

        ibmcloud ob monitoring config create --cluster <cluster_name_or_ID> --instance <Sysdig_instance_name_or_ID>
        

        Example output:

        Creating configuration...
        OK
        
      2. Verify that the monitoring configuration was added to the cluster.

        ibmcloud ob monitoring config list --cluster <cluster_name_or_ID>
        

        Example output:

        Listing configurations...
        
        OK
        Instance Name                            Instance ID                            CRN   
        IBM Cloud Monitoring with Sysdig-aaa     1a111a1a-1111-11a1-a1aa-aaa11111a11a   crn:v1:prod:public:sysdig:us-south:a/a11111a1aaaaa11a111aa11a1aa1111a:1a111a1a-1111-11a1-a1aa-aaa11111a11a::
        
  3. Optional: Verify that the Sysdig agent was set up successfully.

    1. If you used the console to create the Sysdig monitoring configuration, log in to the cluster. For more information, see Access the OpenShift cluster.
    2. Verify that the daemon set for the Sysdig agent was created and all instances are listed as AVAILABLE.

      oc get daemonsets -n ibm-observe
      

      Example output:

      NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
      sysdig-agent   9         9         9       9            9           <none>          14m
      

      The number of daemon set instances that are deployed equals the number of worker nodes in the cluster.

    3. Review the configmap that was created for the Sysdig agent.

      oc describe configmap -n ibm-observe
      
  4. Access the metrics for the pods and cluster from the Sysdig dashboard.

    1. From the Red Hat OpenShift on IBM Cloud console, select the cluster that you configured.
    2. On the cluster Overview page, click Launch. The Sysdig dashboard opens.
    3. Review the pod and cluster metrics that the Sysdig agent collected from the cluster. It might take a few minutes for the first metrics to show.
  5. Review how we can work with the Sysdig dashboard to further analyze your metrics.
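
If metrics do not appear after several minutes, a quick sanity check is to confirm that the agent pods are running on every worker node. A minimal sketch, assuming the pod names start with sysdig-agent:

  oc get pods -n ibm-observe -o wide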


Viewing cluster states

Review the state of an OpenShift cluster to get information about the availability and capacity of the cluster, and potential problems that might occur.

To view information about a specific cluster, such as its zones, service endpoint URLs, Ingress subdomain, version, and owner, use the ibmcloud oc cluster get --cluster <cluster_name_or_ID> command. Include the --show-resources flag to view more cluster resources such as add-ons for storage pods or subnet VLANs for public and private IPs.
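
For example, to review a cluster and its attached resources in one call (the value in angle brackets is a placeholder):

  ibmcloud oc cluster get --cluster <cluster_name_or_ID> --show-resources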

We can review information about the overall cluster, the IBM-managed master, and your worker nodes. To troubleshoot the cluster and worker nodes, see Troubleshooting clusters.


Cluster states

We can view the current cluster state by running the ibmcloud oc cluster ls command and locating the State field.

Cluster state Description
Aborted: The deletion of the cluster is requested by the user before the Kubernetes master is deployed. After the deletion of the cluster is completed, the cluster is removed from your dashboard. If the cluster is stuck in this state for a long time, open an IBM Cloud support case.
Critical: The Kubernetes master cannot be reached or all worker nodes in the cluster are down. If you enabled IBM Key Protect in the cluster, the Key Protect container might fail to encrypt or decrypt the cluster secrets. If so, you can view an error with more information when you run oc get secrets.
Delete failed: The Kubernetes master or at least one worker node cannot be deleted. List worker nodes by running ibmcloud oc worker ls --cluster <cluster_name_or_ID>. If worker nodes are listed, see Unable to create or delete worker nodes. If no workers are listed, open an IBM Cloud support case.
Deleted: The cluster is deleted but not yet removed from your dashboard. If the cluster is stuck in this state for a long time, open an IBM Cloud support case.
Deleting: The cluster is being deleted and cluster infrastructure is being dismantled. We cannot access the cluster.
Deploy failed: The deployment of the Kubernetes master could not be completed. We cannot resolve this state. Contact IBM Cloud support by opening an IBM Cloud support case.
Deploying: The Kubernetes master is not fully deployed yet. We cannot access the cluster. Wait until the cluster is fully deployed to review the health of the cluster.
Normal: All worker nodes in a cluster are up and running. We can access the cluster and deploy apps to the cluster. This state is considered healthy and does not require an action from you.

Although the worker nodes might be normal, other infrastructure resources, such as networking and storage, might still need attention. If you just created the cluster, some parts of the cluster that are used by other services such as Ingress secrets or registry image pull secrets, might still be in process.

Pending: The Kubernetes master is deployed. The worker nodes are being provisioned and are not available in the cluster yet. We can access the cluster, but we cannot deploy apps to the cluster.
Requested: A request to create the cluster and order the infrastructure for the Kubernetes master and worker nodes is sent. When the deployment of the cluster starts, the cluster state changes to Deploying. If the cluster is stuck in the Requested state for a long time, open an IBM Cloud support case.
Updating: The Kubernetes API server that runs in your Kubernetes master is being updated to a new Kubernetes API version. During the update, we cannot access or change the cluster. Worker nodes, apps, and resources that the user deployed are not modified and continue to run. Wait for the update to complete to review the health of the cluster.
Unsupported: The Kubernetes version that the cluster runs is no longer supported. Your cluster's health is no longer actively monitored or reported. Additionally, we cannot add or reload worker nodes. To continue receiving important security updates and support, we must update the cluster. Review the version update preparation actions, then update the cluster to a supported Kubernetes version.
Warning:
  • At least one worker node in the cluster is not available, but other worker nodes are available and can take over the workload. Try to reload the unavailable worker nodes.
  • Your cluster has zero worker nodes, such as if you created a cluster without any worker nodes or manually removed all the worker nodes from the cluster. Resize your worker pool to add worker nodes to recover from a Warning state.
  • A control plane operation for the cluster failed. View the cluster in the console or run ibmcloud oc cluster get --cluster <cluster_name_or_ID> to check the Master Status for further debugging.


Master states

Your Red Hat OpenShift on IBM Cloud cluster includes an IBM-managed master with highly available replicas, automatic security patch updates applied for you, and automation in place to recover in case of an incident. We can check the health, status, and state of the cluster master by running ibmcloud oc cluster get --cluster <cluster_name_or_ID>.

Master Health
The Master Health reflects the state of master components and notifies you if something needs your attention. The health might be one of the following:

  • error: The master is not operational. IBM is automatically notified and takes action to resolve this issue. We can continue monitoring the health until the master is normal. We can also open an IBM Cloud support case.
  • normal: The master is operational and healthy. No action is required.
  • unavailable: The master might not be accessible, which means some actions such as resizing a worker pool are temporarily unavailable. IBM is automatically notified and takes action to resolve this issue. We can continue monitoring the health until the master is normal.
  • unsupported: The master runs an unsupported version of Kubernetes. You must update the cluster to return the master to normal health.

Master Status and State
The Master Status provides details of what operation from the master state is in progress. The status includes a timestamp of how long the master has been in the same state, such as Ready (1 month ago). The Master State reflects the lifecycle of possible operations that can be performed on the master, such as deploying, updating, and deleting. Each state is described in the following table.

Master state Description
deployed: The master is successfully deployed. Check the status to verify that the master is Ready or to see if an update is available.
deploying: The master is currently deploying. Wait for the state to become deployed before working with the cluster, such as adding worker nodes.
deploy_failed: The master failed to deploy. IBM Support is notified and works to resolve the issue. Check the Master Status field for more information, or wait for the state to become deployed.
deleting: The master is currently deleting because you deleted the cluster. We cannot undo a deletion. After the cluster is deleted, we can no longer check the master state because the cluster is completely removed.
delete_failed: The master failed to delete. IBM Support is notified and works to resolve the issue. We cannot resolve the issue by trying to delete the cluster again. Instead, check the Master Status field for more information, or wait for the cluster to delete. We can also open an IBM Cloud support case.
updating: The master is updating its Kubernetes version. The update might be a patch update that is automatically applied, or a minor or major version that you applied by updating the cluster. During the update, your highly available master can continue processing requests, and the app workloads and worker nodes continue to run. After the master update is complete, we can update your worker nodes.

If the update is unsuccessful, the master returns to a deployed state and continues running the previous version. IBM Support is notified and works to resolve the issue. We can check if the update failed in the Master Status field.
update_cancelled: The master update is canceled because the cluster was not in a healthy state at the time of the update. Your master remains in this state until the cluster is healthy and you manually update the master. To update the master, use the ibmcloud oc cluster master update command. If you do not want to update the master to the default major.minor version during the update, include the --version flag and specify the latest patch version that is available for the major.minor version that you want, such as 1.18.9. To list available versions, run ibmcloud oc versions.
update_failed: The master update failed. IBM Support is notified and works to resolve the issue. We can continue to monitor the health of the master until the master reaches a normal state. If the master remains in this state for more than 1 day, open an IBM Cloud support case. IBM Support might identify other issues in the cluster that we must fix before the master can be updated.


Worker node states

We can view the current worker node state by running the ibmcloud oc worker ls --cluster <cluster_name_or_ID> command and locating the State and Status fields.
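
For example, to list all worker nodes and then drill into one that is not in a Normal state (values in angle brackets are placeholders):

  ibmcloud oc worker ls --cluster <cluster_name_or_ID>
  ibmcloud oc worker get --cluster <cluster_name_or_ID> --worker <worker_node_ID>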

Worker node state Description
Critical: A worker node can go into a Critical state for many reasons:
  • You initiated a reboot for the worker node without cordoning and draining your worker node. Rebooting a worker node can cause data corruption in containerd, kubelet, kube-proxy, and calico.
  • The pods that are deployed to your worker node do not use proper resource limits for memory and CPU. If you set none or excessive resource limits, pods can consume all available resources, leaving no resources for other pods to run on this worker node. This overcommitment of workload causes the worker node to fail.
    1. List the pods that run on the worker node and review the CPU and memory usage, requests and limits.
      oc describe node <worker_private_IP>
    2. For pods that consume a lot of memory and CPU resources, check if you set proper resource limits for memory and CPU.
      oc get pods <pod_name> -n <namespace> -o json
    3. Optional: Remove the resource-intensive pods to free up compute resources on the worker node.
      oc delete pod <pod_name>
      oc delete deployment <deployment_name>
  • containerd, kubelet, or calico went into an unrecoverable state after it ran hundreds or thousands of containers over time.
  • You set up a Virtual Router Appliance for the worker node that went down and cut off the communication between your worker node and the Kubernetes master.
  • Current networking issues in IBM Cloud Kubernetes Service or IBM Cloud infrastructure cause the communication between your worker node and the Kubernetes master to fail.
  • Your worker node ran out of capacity. Check the Status of the worker node to see whether it shows Out of disk or Out of memory. If your worker node is out of capacity, consider either reducing the workload on the worker node or adding a worker node to the cluster to help load balance the workload.
  • The device was powered off from the IBM Cloud console resource list. Open the resource list and find your worker node ID in the Devices list. In the action menu, click Power On.
In many cases, reloading your worker node can solve the problem. When you reload your worker node, the latest patch version is applied to your worker node. The major and minor version is not changed. Before you reload your worker node, make sure to cordon and drain it so that the existing pods are terminated gracefully and rescheduled onto remaining worker nodes (a command sketch follows this table).

If reloading the worker node does not resolve the issue, go to the next step to continue troubleshooting your worker node.
Deployed: Updates are successfully deployed to your worker node. After updates are deployed, Red Hat OpenShift on IBM Cloud starts a health check on the worker node. After the health check is successful, the worker node goes into a Normal state. Worker nodes in a Deployed state usually are ready to receive workloads, which we can check by running oc get nodes and confirming that the node status shows Ready.
Deploying: When you update the Kubernetes version of your worker node, your worker node is redeployed to install the updates. If you reload or reboot your worker node, the worker node is redeployed to automatically install the latest patch version. If your worker node is stuck in this state for a long time, continue with the next step to see whether a problem occurred during the deployment.
Deploy_failed: Your worker node could not be deployed. List the details for the worker node to find the details for the failure by running ibmcloud oc worker get --cluster <cluster_name_or_id> --worker <worker_node_id>.
Normal: Your worker node is fully provisioned and ready to be used in the cluster. This state is considered healthy and does not require an action from the user. Note: Although the worker nodes might be normal, other infrastructure resources, such as networking and storage, might still need attention.
Provisioning: Your worker node is being provisioned and is not available in the cluster yet. We can monitor the provisioning process in the Status column of your CLI output. If your worker node is stuck in this state for a long time, continue with the next step to see whether a problem occurred during the provisioning.
Provision pending: Another process is completing before the worker node provisioning process starts. We can monitor the other process that must complete first in the Status column of your CLI output. For example, in VPC clusters, the Pending security group creation indicates that the security group for the worker nodes is creating first before the worker nodes can be provisioned. If your worker node is stuck in this state for a long time, continue with the next step to see whether a problem occurred during the other process.
Provision_failed: Your worker node could not be provisioned. List the details for the worker node to find the details for the failure by running ibmcloud oc worker get --cluster <cluster_name_or_id> --worker <worker_node_id>.
Reloading: Your worker node is being reloaded and is not available in the cluster. We can monitor the reloading process in the Status column of your CLI output. If your worker node is stuck in this state for a long time, continue with the next step to see whether a problem occurred during the reloading.
Reloading_failed: Your worker node could not be reloaded. List the details for the worker node to find the details for the failure by running ibmcloud oc worker get --cluster <cluster_name_or_id> --worker <worker_node_id>.
Reload_pending: A request to reload or to update the Kubernetes version of your worker node is sent. When the worker node is being reloaded, the state changes to Reloading.
Unknown: The Kubernetes master is not reachable for one of the following reasons:
  • You requested an update of our Kubernetes master. The state of the worker node cannot be retrieved during the update. If the worker node remains in this state for an extended period of time even after the Kubernetes master is successfully updated, try to reload the worker node.
  • You might have another firewall that is protecting your worker nodes, or you might have changed firewall settings recently. Red Hat OpenShift on IBM Cloud requires certain IP addresses and ports to be opened to allow communication from the worker node to the Kubernetes master and vice versa. For more information, see Firewall prevents worker nodes from connecting.
  • The Kubernetes master is down. Contact IBM Cloud support by opening an IBM Cloud support case.
Warning: Your worker node is reaching the limit for memory or disk space. You can either reduce the workload on the worker node or add a worker node to the cluster to help load balance the workload.
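
For the reload guidance in the Critical state description, a minimal command sketch, assuming the node name is the worker node's private IP as shown by oc get nodes (newer oc clients use --delete-emptydir-data instead of --delete-local-data):

  # Cordon and drain the worker node so that pods are rescheduled gracefully
  oc adm cordon <worker_private_IP>
  oc adm drain <worker_private_IP> --ignore-daemonsets --delete-local-data

  # Reload the worker node; the latest patch version is applied automatically
  ibmcloud oc worker reload --cluster <cluster_name_or_ID> --worker <worker_ID>

  # After the reload completes, allow scheduling again if the node is still cordoned
  oc adm uncordon <worker_private_IP>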



Using the cluster logging operator

To deploy the OpenShift Container Platform cluster logging operator on your Red Hat OpenShift on IBM Cloud cluster, see the OpenShift documentation. Additionally, we must update the cluster logging instance to use the IBM Cloud Block Storage ibmc-block-gold storage class.

To create a cluster logging instance with the ibmc-block-gold storage class:

  1. Access the OpenShift cluster.
  2. From the OpenShift web console Administrator perspective, click Operators > Installed Operators.
  3. Click Cluster Logging.
  4. In the Provided APIs section, Cluster Logging tile, click Create Instance.
  5. Modify the configuration YAML to change the storage class for the ElasticSearch log storage from gp2 to ibmc-block-gold.
    ...
        elasticsearch:
          nodeCount: 3
          redundancyPolicy: SingleRedundancy
          storage:
            storageClassName: ibmc-block-gold
            size: 200G
    ...
    
  6. Click Create.
  7. Verify that the operator, Elasticsearch, Fluentd, and Kibana pods are all Running.
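
A quick way to check, assuming the default openshift-logging project that the cluster logging operator deploys into:

  oc get pods -n openshift-logging

The elasticsearch, fluentd, and kibana pods should all report a Running status before you rely on the stack.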