

Monitor

  1. About cluster monitoring
  2. Configure the monitoring stack
  3. Manage cluster alerts
  4. Access Prometheus, Alertmanager, and Grafana
  5. Expose custom application metrics for horizontal pod autoscaling


About cluster monitoring

OpenShift includes a pre-configured, pre-installed, and self-updating monitoring stack based on the Prometheus open source project and its wider ecosystem. It provides monitoring of cluster components, a set of alerts to immediately notify the cluster administrator about any occurring problems, and a set of Grafana dashboards. The cluster monitoring stack is supported only for monitoring OpenShift clusters.

Important

To ensure compatibility with future OpenShift updates, configuring only the specified monitoring stack options is supported.


Stack components and monitored targets

The monitoring stack includes these components:

Table 1.1. Monitoring stack components

Component Description
Cluster Monitoring Operator The OpenShift Cluster Monitoring Operator (CMO) is the central component of the stack. It controls the deployed monitoring components and resources and ensures that they are always up to date.
Prometheus Operator The Prometheus Operator (PO) creates, configures, and manages Prometheus and Alertmanager instances. It also automatically generates monitoring target configurations based on familiar Kubernetes label queries.
Prometheus Prometheus is the systems and service monitoring system around which the monitoring stack is built.
Prometheus Adapter The Prometheus Adapter exposes the cluster resource metrics API for horizontal pod autoscaling. Resource metrics are CPU and memory utilization.
Alertmanager The Alertmanager service handles alerts sent by Prometheus.
kube-state-metrics The kube-state-metrics exporter agent converts Kubernetes objects to metrics that Prometheus can use.
node-exporter node-exporter is an agent deployed on every node to collect metrics about it.
Grafana The Grafana analytics platform provides dashboards for analyzing and visualizing the metrics. The Grafana instance that is provided with the monitoring stack, along with its dashboards, is read-only.

All the components of the monitoring stack are monitored by the stack and are automatically updated when OpenShift is updated.

In addition to the components of the stack itself, the monitoring stack monitors a number of other OpenShift cluster components.

Each OpenShift component is responsible for its monitoring configuration. For problems with a component's monitoring, open a bug in Bugzilla against that component, not against the general monitoring component.

Other OpenShift framework components might be exposing metrics as well. For details, see their respective documentation.
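
To see which targets the cluster monitoring Prometheus instance is configured to scrape, one optional, read-only check is to list the ServiceMonitor objects managed by the stack, for example:

$ oc -n openshift-monitoring get servicemonitors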

Next steps

Configure the monitoring stack.


Configure the monitoring stack

Prior to OpenShift 4, the Prometheus Cluster Monitoring stack was configured through the Ansible inventory file. For that purpose, the stack exposed a subset of its available configuration options as Ansible variables. We configured the stack before installing OpenShift.

In OpenShift 4, Ansible is no longer the primary technology used to install OpenShift. The installation program provides only a small number of configuration options before installation. Configuring most OpenShift framework components, including the Prometheus Cluster Monitoring stack, happens post-installation.

This section explains what configuration is supported, shows how to configure the monitoring stack, and demonstrates several common configuration scenarios.


Maintenance and support

The supported way of configuring OpenShift Monitoring is by using the options described in this document. Do not use other configurations; they are unsupported. Configuration paradigms might change across Prometheus releases, and such cases can be handled gracefully only if all configuration possibilities are controlled. If you use configurations other than those described in this section, your changes will disappear because the Cluster Monitoring Operator reconciles any differences and resets everything to the defined state by default and by design.

Explicitly unsupported cases include:

  • Creating additional ServiceMonitor objects in the openshift-monitoring namespace. This extends the targets that the cluster monitoring Prometheus instance scrapes, which can cause collisions and load differences that cannot be accounted for. These factors might make the Prometheus setup unstable.
  • Creating unexpected ConfigMap objects or PrometheusRule objects. This causes the cluster monitoring Prometheus instance to include additional alerting and recording rules.
  • Modifying resources of the stack. The Prometheus Cluster Monitoring stack ensures its resources are always in the state it expects them to be. If they are modified, the stack resets them.
  • Using resources of the stack for your own purposes. The resources created by the Prometheus Cluster Monitoring stack are not meant to be used by any other resources, as there are no guarantees about their backward compatibility.
  • Stopping the Cluster Monitoring Operator from reconciling the monitoring stack.
  • Adding new alerting rules.
  • Modifying the monitoring stack Grafana instance.


Create cluster monitoring ConfigMap

To configure the Prometheus Cluster Monitoring stack, create the cluster monitoring ConfigMap.

Prerequisites

  • An installed oc CLI tool
  • Administrative privileges for the cluster

Procedure

  1. Check whether the cluster-monitoring-config ConfigMap object exists:
    $ oc -n openshift-monitoring get configmap cluster-monitoring-config

  2. If it does not exist, create it:
    $ oc -n openshift-monitoring create configmap cluster-monitoring-config

  3. Start editing the cluster-monitoring-config ConfigMap:
    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

  4. Create the data section if it does not exist yet:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
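
As an alternative to editing the ConfigMap interactively, you can keep the same ConfigMap definition in a file and apply it; the file name cluster-monitoring-config.yaml used here is an arbitrary choice:

$ oc apply -f cluster-monitoring-config.yaml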


Configure the cluster monitoring stack

We can configure the Prometheus Cluster Monitoring stack using ConfigMaps. ConfigMaps configure the Cluster Monitoring Operator, which in turn configures components of the stack.

Prerequisites

  • Make sure we have the cluster-monitoring-config ConfigMap object with the data/config.yaml section.

Procedure

  1. Start editing the cluster-monitoring-config ConfigMap:
    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

  2. Put your configuration under data/config.yaml as a key-value pair in the form <component>: <configuration for the component>:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        <component>:
          <configuration for the component>

    Substitute <component> and <configuration for the component> accordingly.

    For example, create this ConfigMap to configure a Persistent Volume Claim (PVC) for Prometheus:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          volumeClaimTemplate:
            spec:
              storageClassName: gluster-block
              resources:
                requests:
                  storage: 40Gi

    Here, prometheusK8s defines the Prometheus component and the following lines define its configuration.

  3. Save the file to apply the changes. The pods affected by the new configuration are restarted automatically.
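
To confirm that the affected pods are recreated with the new configuration, you can optionally watch them:

$ oc -n openshift-monitoring get pods -w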


Configurable monitoring components

This table shows the monitoring components we can configure and the keys used to specify the components in the ConfigMap:

Table 1.2. Configurable monitoring components

Component Key
Prometheus Operator prometheusOperator
Prometheus prometheusK8s
Alertmanager alertmanagerMain
kube-state-metrics kubeStateMetrics
Grafana grafana
Telemeter Client telemeterClient
Prometheus Adapter k8sPrometheusAdapter

From this list, only Prometheus and Alertmanager have extensive configuration options. All other components usually provide only the nodeSelector field, which is used to deploy them on a specified node.


Move monitoring components to different nodes

We can move any of the monitoring stack components to specific nodes.

Prerequisites

  • Make sure we have the cluster-monitoring-config ConfigMap object with the data/config.yaml section.

Procedure

  1. Start editing the cluster-monitoring-config ConfigMap:
    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

  2. Specify the nodeSelector constraint for the component under data/config.yaml:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        <component>:
          nodeSelector:
            <node key>: <node value>
            <node key>: <node value>
            ...

    Substitute <component> accordingly and substitute <node key>: <node value> with the map of key-value pairs that specifies the destination node. Often, only a single key-value pair is used.

    The component can only run on a node that has each of the specified key-value pairs as labels. The node can have additional labels as well.

    For example, to move components to the node that is labeled foo: bar, use:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusOperator:
          nodeSelector:
            foo: bar
        prometheusK8s:
          nodeSelector:
            foo: bar
        alertmanagerMain:
          nodeSelector:
            foo: bar
        kubeStateMetrics:
          nodeSelector:
            foo: bar
        grafana:
          nodeSelector:
            foo: bar
        telemeterClient:
          nodeSelector:
            foo: bar
        k8sPrometheusAdapter:
          nodeSelector:
            foo: bar
  3. Save the file to apply the changes. The components affected by the new configuration are moved to new nodes automatically.
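
To verify where the monitoring pods were rescheduled, you can optionally list them together with their node placement:

$ oc -n openshift-monitoring get pods -o wide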


Configure persistent storage

Running cluster monitoring with persistent storage means that your metrics are stored to a Persistent Volume and can survive a pod being restarted or recreated. This is ideal if you require your metrics or alerting data to be guarded from data loss. For production environments, it is highly recommended to configure persistent storage.

Important

In OpenShift 4.1 deployed on bare metal, Prometheus and Alertmanager cannot have persistent storage and are ephemeral only. For use cases other than bare metal, use dynamic provisioning to use persistent storage. In particular, do not use NFS for the monitoring stack.

Prerequisites

  • Dedicate sufficient persistent storage to ensure that the disk does not become full. How much storage you need depends on the number of pods. For information on system requirements for persistent storage, see Prometheus database storage requirements.
  • Unless you enable dynamically-provisioned storage, make sure we have a Persistent Volume (PV) ready to be claimed by the Persistent Volume Claim (PVC), one PV for each replica. Since Prometheus has two replicas and Alertmanager has three replicas, you need five PVs to support the entire monitoring stack.
  • Use the block type of storage.


Configure a persistent volume claim

For Prometheus or Alertmanager to use a persistent volume (PV), you must first configure a persistent volume claim (PVC).

Prerequisites

  • Make sure we have the necessary storage class configured.
  • Make sure we have the cluster-monitoring-config ConfigMap object with the data/config.yaml section.

Procedure

  1. Edit the cluster-monitoring-config ConfigMap:
    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

  2. Put your PVC configuration for the component under data/config.yaml:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        <component>:
          volumeClaimTemplate:
            metadata:
              name: <PVC name>
            spec:
              storageClassName: <storage class>
              resources:
                requests:
                  storage: 40Gi

    See the Kubernetes documentation on PersistentVolumeClaims for information on how to specify volumeClaimTemplate.

    For example, to configure a PVC that claims any configured OpenShift block PV as a persistent storage for Prometheus, use:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          volumeClaimTemplate:
            metadata:
              name: my-prometheus-claim
            spec:
              storageClassName: gluster-block
              resources:
                requests:
                  storage: 40Gi

    And to configure a PVC that claims any configured OpenShift block PV as a persistent storage for Alertmanager, we can use:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        alertmanagerMain:
          volumeClaimTemplate:
            metadata:
              name: my-alertmanager-claim
            spec:
              storageClassName: gluster-block
              resources:
                requests:
                  storage: 40Gi
  3. Save the file to apply the changes. The pods affected by the new configuration are restarted automatically and the new storage configuration is applied.
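
You can optionally confirm that the persistent volume claims were created and bound:

$ oc -n openshift-monitoring get pvc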


Modify retention time for Prometheus metrics data

By default, the Prometheus Cluster Monitoring stack configures the retention time for Prometheus data to be 15 days. We can modify the retention time to change how soon the data is deleted.

Prerequisites

  • Make sure we have the cluster-monitoring-config ConfigMap object with the data/config.yaml section.

Procedure

  1. Start editing the cluster-monitoring-config ConfigMap:
    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

  2. Put your retention time configuration under data/config.yaml:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          retention: <time specification>

    Substitute <time specification> with a number directly followed by ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), or y (years).

    For example, to configure retention time to be 24 hours, use:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          retention: 24h
  3. Save the file to apply the changes. The pods affected by the new configuration are restarted automatically.
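
A single config.yaml can hold settings for several components, and a component key can carry more than one option. The following sketch combines the earlier examples (it assumes the gluster-block storage class), setting both the retention time and a persistent volume claim for Prometheus:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h
      volumeClaimTemplate:
        spec:
          storageClassName: gluster-block
          resources:
            requests:
              storage: 40Gi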


Configure Alertmanager

The Prometheus Alertmanager is a component that manages incoming alerts, including:

  • Alert silencing
  • Alert inhibition
  • Alert aggregation
  • Reliable deduplication of alerts
  • Grouping of alerts
  • Sending grouped alerts as notifications through receivers such as email, PagerDuty, and HipChat


Alertmanager default configuration

The default configuration of the OpenShift Monitoring Alertmanager cluster is this:

global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
receivers:
- name: default
- name: watchdog

OpenShift monitoring ships with the Watchdog alert that always triggers by default to ensure the availability of the monitoring infrastructure.


Applying custom Alertmanager configuration

We can overwrite the default Alertmanager configuration by editing the alertmanager-main secret inside the openshift-monitoring namespace.

Prerequisites

  • An installed jq tool for processing JSON data

Procedure

  1. Print the currently active Alertmanager configuration into file alertmanager.yaml:
    $ oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' |base64 -d > alertmanager.yaml

  2. Change the configuration in file alertmanager.yaml to the new configuration:
    data:
      config.yaml: |
        global:
          resolve_timeout: 5m
        route:
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: default
          routes:
          - match:
              alertname: Watchdog
            repeat_interval: 5m
            receiver: watchdog
          - match:
              service: <your service> 1
            routes:
            - match:
                <your matching rules> 2
              receiver: <receiver> 3
        receivers:
        - name: default
        - name: watchdog
        - name: <receiver>
          <receiver configuration>

    1 service specifies the service that fires the alerts.

    2 <your matching rules> specify the target alerts.

    3 receiver specifies the receiver to use for the alert.

    For example, this listing configures PagerDuty for notifications:

    data:
      config.yaml: |
        global:
          resolve_timeout: 5m
        route:
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: default
          routes:
          - match:
              alertname: Watchdog
            repeat_interval: 5m
            receiver: watchdog
          - match:
              service: example-app
            routes:
            - match:
                severity: critical
              receiver: team-frontend-page
        receivers:
        - name: default
        - name: watchdog
        - name: team-frontend-page
          pagerduty_configs:
          - service_key: "your-key"

    With this configuration, alerts of critical severity fired by the example-app service are sent using the team-frontend-page receiver, which means that these alerts are paged to a chosen person.

  3. Apply the new configuration in the file:
    $ oc -n openshift-monitoring create secret generic alertmanager-main --from-literal=alertmanager.yaml="$(< alertmanager.yaml)" --dry-run -o=yaml | oc -n openshift-monitoring replace secret --filename=-
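
To confirm that the replacement took effect, you can optionally print the active configuration again, reusing the command from step 1:

$ oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' |base64 -d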


Alerting rules

OpenShift Cluster Monitoring by default ships with a set of pre-defined alerting rules.

Note that:

  • The default alerting rules are used specifically for the OpenShift cluster and nothing else. For example, we get alerts for a persistent volume in the cluster, but we do not get them for a persistent volume in your custom namespace.
  • Currently we cannot add custom alerting rules.
  • Some alerting rules have identical names. This is intentional. They send alerts about the same event with different thresholds, with different severity, or both.
  • With the inhibition rules, the lower severity alert is inhibited when the higher severity alert is firing.
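
Outside of the web console, you can optionally inspect the predefined rules by listing the PrometheusRule objects that ship with the stack; this is a read-only check:

$ oc -n openshift-monitoring get prometheusrules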

Next steps

Manage cluster alerts.


Manage cluster alerts

OpenShift 4 provides a Web interface to the Alertmanager, which enables you to manage alerts. This section demonstrates how to use the Alerting UI.


Contents of the Alerting UI

This section shows and explains the contents of the Alerting UI, a Web interface to the Alertmanager.

The main two pages of the Alerting UI are the Alerts and the Silences pages.

The Alerts page is located in Monitor → Alerts of the web console.

[Figure: Alerts screen]

  1. Filtering alerts by their names.
  2. Filtering the alerts by their states. To fire, some alerts need a certain condition to be true for the duration of a timeout. If a condition of an alert is currently true, but the timeout has not been reached, such an alert is in the Pending state.
  3. Alert name.
  4. Description of an alert.
  5. Current state of the alert and when the alert went into this state.
  6. Value of the Severity label of the alert.
  7. Actions we can do with the alert.

The Silences page is located in Monitor → Silences of the web console.

[Figure: Silences screen]

  1. Create a silence for an alert.
  2. Filtering silences by their name.
  3. Filtering silences by their states. If a silence is pending, it is currently not active because it is scheduled to start at a later time. If a silence expired, it is no longer active because it has reached its end time.
  4. Description of a silence. It includes the specification of alerts that it matches.
  5. Current state of the silence. For active silences, it shows when it ends, and for pending silences, it shows when it starts.
  6. Number of alerts that are being silenced by the silence.
  7. Actions we can do with a silence.

Additionally, next to the title of each of these pages is a link to the old Alertmanager interface.


Get information about alerts and alerting rules

We can find an alert and see information about it or its governing alerting rule.

Procedure

  1. Open the web console and navigate to Monitor → Alerts.
  2. Optional: Filter the alerts by name using the Filter alerts by name field.
  3. Optional: Filter the alerts by state using one or more of the state buttons Firing, Silenced, Pending, Not firing.
  4. Optional: Sort the alerts by clicking one or more of the Name, State, and Severity column headers.

  5. After you find the alert, we can view either the details of the alert or the details of its governing alerting rule.

    To see alert details, click the name of the alert. This is the page with alert details:

    [Figure: Alert overview]

    The page has the graph with timeseries of the alert. It also has information about the alert, including:

    • A link to its governing alerting rule
    • Description of the alert

    To see alerting rule details, click the button in the last column and select View Alerting Rule. This is the page with alerting rule details:

    [Figure: Alerting rule overview]

    The page has information about the alerting rule, including:

    • Alerting rule name, severity, and description
    • The expression that defines the condition for firing the alert
    • The time for which the condition should be true for an alert to fire
    • Graph for each alert governed by the alerting rule, showing the value with which the alert is firing
    • Table of all alerts governed by the alerting rule


Silencing alerts

We can either silence a specific alert or silence alerts that match a specification that you define.

Procedure

To silence a set of alerts by creating an alert specification:

  1. Navigate to the Monitor → Silences page of the web console.
  2. Click Create Silence.
  3. Populate the Create Silence form.
  4. To create the silence, click Create.

To silence a specific alert:

  1. Navigate to the Monitor → Alerts page of the web console.
  2. For the alert that we want to silence, click the button in the last column and click Silence Alert. The Create Silence form will appear with a prepopulated specification of the chosen alert.
  3. Optional: Modify the silence.
  4. To create the silence, click Create.


Get information about silences

We can find a silence and view its details.

Procedure

  1. Open the web console and navigate to Monitor → Silences.
  2. Optional: Filter the silences by name using the Filter Silences by name field.
  3. Optional: Filter the silences by state using one or more of the state buttons Active, Pending, Expired.
  4. Optional: Sort the silences by clicking one or more of the Name, State, and Firing alerts column headers.

  5. After you find the silence, we can click its name to view its details, including:

    • Alert specification
    • State
    • Start time
    • End time
    • Number and list of firing alerts


Editing silences

We can edit a silence, which will expire the existing silence and create a new silence with the changed configuration.

Procedure

  1. Navigate to the Monitor → Silences screen.

  2. For the silence we want to modify, click the button in the last column and click Edit silence.

    Alternatively, we can click Actions → Edit Silence in the Silence Overview screen for a particular silence.

  3. In the Edit Silence screen, enter the changes and click the Save button. This will expire the existing silence and create one with the chosen configuration.


Expiring silences

We can expire a silence. Expiring a silence deactivates it forever.

Procedure

  1. Navigate to the Monitor → Silences page.

  2. For the silence we want to expire, click the button in the last column and click Expire Silence.

    Alternatively, we can click the Actions → Expire Silence button in the Silence Overview page for a particular silence.

  3. Confirm by clicking Expire Silence. This expires the silence.

Next steps

Access Prometheus, Alertmanager, and Grafana.


Access Prometheus, Alertmanager, and Grafana

To work with data gathered by the monitoring stack, we might want to use the Prometheus, Alertmanager, and Grafana interfaces. They are available by default.


Access Prometheus, Alerting UI, and Grafana using the Web console

We can access the Prometheus, Alerting UI, and Grafana web UIs using a Web browser through the OpenShift Web console.

Note

The Alerting UI accessed in this procedure is the new interface for Alertmanager.

Prerequisites

  • Authentication is performed against the OpenShift identity and uses the same credentials or means of authentication as are used elsewhere in OpenShift. Use a role that has read access to all namespaces, such as the cluster-monitoring-view cluster role.

Procedure

  1. Navigate to the OpenShift Web console and authenticate.

  2. To access Prometheus, navigate to "Monitoring" → "Metrics".

    To access the Alerting UI, navigate to "Monitoring" → "Alerts" or "Monitoring" → "Silences".

    To access Grafana, navigate to "Monitoring" → "Dashboards".


Access Prometheus, Alertmanager, and Grafana directly

We can access the Prometheus, Alertmanager, and Grafana web UIs using the oc tool and a Web browser.

Note

The Alertmanager UI accessed in this procedure is the old interface for Alertmanager.

Prerequisites

  • Authentication is performed against the OpenShift identity and uses the same credentials or means of authentication as are used elsewhere in OpenShift. Use a role that has read access to all namespaces, such as the cluster-monitoring-view cluster role.

Procedure

  1. Run:
    $ oc -n openshift-monitoring get routes
    NAME                HOST/PORT                                                     ...
    alertmanager-main   alertmanager-main-openshift-monitoring.apps.url.openshift.com ...
    grafana             grafana-openshift-monitoring.apps.url.openshift.com           ...
    prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.url.openshift.com    ...

  2. Prepend https:// to the address. The web UIs cannot be accessed using an unencrypted connection.

    For example, this is the resulting URL for Alertmanager:

    https://alertmanager-main-openshift-monitoring.apps.url.openshift.com
  3. Navigate to the address using a Web browser and authenticate.
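
If you only need the host name of a single route, you can optionally query it with a JSONPath expression, for example for Prometheus:

$ oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}'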


Expose custom application metrics for horizontal pod autoscaling

We can use the prometheus-adapter resource to expose custom application metrics for the horizontal pod autoscaler.

Important

Prometheus Adapter is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

Prerequisites

  • Make sure we have a custom Prometheus instance installed. In this example, it is presumed that Prometheus was installed in the default namespace.
  • Make sure you configured monitoring for your application. In this example, it is presumed that the application and the service monitor for it were installed in the default namespace.

Procedure

  1. Create a YAML file for your configuration. In this example, it is called deploy.yaml.

  2. Add configuration for creating the service account, necessary roles, and role bindings for prometheus-adapter:
    kind: ServiceAccount
    apiVersion: v1
    metadata:
      name: custom-metrics-apiserver
      namespace: default
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: custom-metrics-server-resources
    rules:
    - apiGroups:
      - custom.metrics.k8s.io
      resources: ["*"]
      verbs: ["*"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: custom-metrics-resource-reader
    rules:
    - apiGroups:
      - ""
      resources:
      - namespaces
      - pods
      - services
      verbs:
      - get
      - list
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: custom-metrics:system:auth-delegator
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:auth-delegator
    subjects:
    - kind: ServiceAccount
      name: custom-metrics-apiserver
      namespace: default
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: custom-metrics-auth-reader
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - kind: ServiceAccount
      name: custom-metrics-apiserver
      namespace: default
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: custom-metrics-resource-reader
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: custom-metrics-resource-reader
    subjects:
    - kind: ServiceAccount
      name: custom-metrics-apiserver
      namespace: default
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: hpa-controller-custom-metrics
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: custom-metrics-server-resources
    subjects:
    - kind: ServiceAccount
      name: horizontal-pod-autoscaler
      namespace: kube-system
    ---

  3. Add configuration for the custom metrics for prometheus-adapter:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: adapter-config
      namespace: default
    data:
      config.yaml: |
        rules:
        - seriesQuery: 'http_requests_total{namespace!="",pod!=""}' 1
          resources:
            overrides:
              namespace: {resource: "namespace"}
              pod: {resource: "pod"}
              service: {resource: "service"}
          name:
            matches: "^(.*)_total"
            as: "${1}_per_second" 2
          metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    ---

    1 Specifies the chosen metric to be the number of HTTP requests.

    2 Specifies the frequency for the metric.

  4. Add configuration for registering prometheus-adapter as an API service:
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        service.alpha.openshift.io/serving-cert-secret-name: prometheus-adapter-tls
      labels:
        name: prometheus-adapter
      name: prometheus-adapter
      namespace: default
    spec:
      ports:
      - name: https
        port: 443
        targetPort: 6443
      selector:
        app: prometheus-adapter
      type: ClusterIP
    ---
    apiVersion: apiregistration.k8s.io/v1beta1
    kind: APIService
    metadata:
      name: v1beta1.custom.metrics.k8s.io
    spec:
      service:
        name: prometheus-adapter
        namespace: default
      group: custom.metrics.k8s.io
      version: v1beta1
      insecureSkipTLSVerify: true
      groupPriorityMinimum: 100
      versionPriority: 100
    ---

  5. Determine the Prometheus Adapter image to use:
    $ kubectl get -n openshift-monitoring deploy/prometheus-adapter -o jsonpath="{..image}"
    quay.io/openshift-release-dev/ocp-v4.1-art-dev@sha256:76db3c86554ad7f581ba33844d6a6ebc891236f7db64f2d290c3135ba81c264c

  6. Add configuration for deploying prometheus-adapter:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: prometheus-adapter
      name: prometheus-adapter
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-adapter
      template:
        metadata:
          labels:
            app: prometheus-adapter
          name: prometheus-adapter
        spec:
          serviceAccountName: custom-metrics-apiserver
          containers:
          - name: prometheus-adapter
            image: openshift-release-dev/ocp-v4.1-art-dev 1
            args:
            - --secure-port=6443
            - --tls-cert-file=/var/run/serving-cert/tls.crt
            - --tls-private-key-file=/var/run/serving-cert/tls.key
            - --logtostderr=true
            - --prometheus-url=http://prometheus-operated.default.svc:9090/
            - --metrics-relist-interval=1m
            - --v=4
            - --config=/etc/adapter/config.yaml
            ports:
            - containerPort: 6443
            volumeMounts:
            - mountPath: /var/run/serving-cert
              name: volume-serving-cert
              readOnly: true
            - mountPath: /etc/adapter/
              name: config
              readOnly: true
            - mountPath: /tmp
              name: tmp-vol
          volumes:
          - name: volume-serving-cert
            secret:
              secretName: prometheus-adapter-tls
          - name: config
            configMap:
              name: adapter-config
          - name: tmp-vol
            emptyDir: {}

    1 image: openshift-release-dev/ocp-v4.1-art-dev specifies the Prometheus Adapter image found in the previous step.

  7. Apply the configuration file to the cluster:
    $ oc apply -f deploy.yaml
  8. Now the application's metrics are exposed and can be used to configure horizontal pod autoscaling.
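
As an optional follow-up, you can check that the adapter serves the converted metric through the custom metrics API and then reference that metric from a HorizontalPodAutoscaler. The following is a minimal sketch, not part of the documented procedure; the metric name http_requests_per_second follows from the adapter configuration above, while the application name example-app and the target value of 500 milli-requests per second per pod are assumptions:

$ oc get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second"

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests_per_second
      targetAverageValue: 500m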
