Use External Labels with Prometheus Alerts

We are going to customise Prometheus alerts by using external labels.

The Problem: One Prometheus Instance per Kubernetes Cluster

I’ve recently deployed a second Kubernetes cluster into the homelab environment, and realised that if both clusters send alerts to the same Slack channel, I can’t tell which environment an alert comes from. I therefore need a way to identify the cluster that fires an alert, ideally by passing the cluster name to Alertmanager.

The Solution: External Labels

Starting with Prometheus 2.27, it is possible to expand environment variables in external labels. When the feature is enabled, Prometheus replaces ${var} or $var in the external_labels values with the values of the current environment variables. According to the documentation, references to undefined variables are replaced by the empty string.
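To illustrate the substitution rule, the sketch below mimics it with the shell's own parameter expansion. This is only an analogy and not how Prometheus implements the feature; UNDEFINED_VAR is a hypothetical name chosen for the example.

```shell
# Analogy only: shell parameter expansion behaves like the
# external_labels expansion described above.
export CLUSTER_NAME=kubernetes-homelab
unset UNDEFINED_VAR   # hypothetical variable, deliberately left undefined

# A defined variable is replaced by its value.
defined_label="cluster: ${CLUSTER_NAME}"

# An undefined variable is replaced by the empty string.
# (The ":-" form keeps the example working in shells run with "set -u".)
undefined_label="cluster: ${UNDEFINED_VAR:-}"

printf '%s\n' "$defined_label"       # cluster: kubernetes-homelab
printf '[%s]\n' "$undefined_label"   # [cluster: ]
```

An empty value is easy to miss, which is why it is worth confirming that the variable really reaches the Prometheus container.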

Pre-requisites

We are using our Kubernetes homelab to configure Prometheus and Alertmanager.

Download Files from GitHub

Prometheus and Alertmanager configuration files used in this article are hosted on GitHub. Clone the following repository:

$ git clone https://github.com/lisenet/kubernetes-homelab.git

Note that this homelab project is under development, so please refer to GitHub for any source code changes.

Use External Labels with Prometheus Alerts

Create Prometheus Secret to Store Cluster Name

Create a secret called prometheus-cluster-name that contains the name of the cluster the Prometheus instance is running in.

$ kubectl -n monitoring create secret generic \
  prometheus-cluster-name --from-literal=CLUSTER_NAME=kubernetes-homelab
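To confirm that the secret holds the expected value, it can be read back and decoded. A minimal sketch, assuming the same namespace and secret name as above; the helper name show_cluster_name is hypothetical.

```shell
# Hypothetical helper: print the decoded CLUSTER_NAME stored in the secret.
# Kubernetes returns secret data base64-encoded, hence the decode step.
show_cluster_name() {
  kubectl -n monitoring get secret prometheus-cluster-name \
    -o jsonpath='{.data.CLUSTER_NAME}' | base64 -d
}
```

Running show_cluster_name against the cluster should print the name that was passed to --from-literal.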

Update Prometheus Deployment Configuration

Edit prometheus-deployment.yml to enable the expand-external-labels feature and to instruct Prometheus to read environment variables from the secret prometheus-cluster-name:

---
apiVersion: apps/v1
kind: Deployment
[...]
      containers:
        - name: prometheus
          image: prom/prometheus:v2.29.0
          imagePullPolicy: IfNotPresent
          args:
            - "--storage.tsdb.retention.time=28d"
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
            - "--enable-feature=expand-external-labels"
          envFrom:
            - secretRef:
                name: prometheus-cluster-name
          ports:
            - containerPort: 9090
              protocol: TCP
[...]

Update Prometheus ConfigMap

Edit prometheus-config-map.yml to add external labels to the global Prometheus configuration and to specify the alert relabel configuration:

---
apiVersion: v1
kind: ConfigMap
[...]
  prometheus.yml: |-
    global:
      evaluation_interval: 60s
      scrape_interval: 15s
      scrape_timeout: 10s
      external_labels:
        cluster: ${CLUSTER_NAME}
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alert_relabel_configs:
      - source_labels: [cluster]
        action: replace
        regex: (.*)
        replacement: "$1"
        target_label: cluster
      alertmanagers:
      - static_configs:
        - targets:
          - 'alertmanager.monitoring.svc:9093'
[...]

For each configured alert, add a blank cluster label. Note that alert relabeling is applied to alerts before they are sent to the Alertmanager.

data:
  prometheus.rules: |-
    groups:
    - name: node.alerts
      rules:
      - alert: KubernetesHostHighCPUUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 15m
        labels:
          severity: warning
          context: node
          cluster:
        annotations:
          summary: High load on node
          description: "Node {{ $labels.instance }} has more than 90% CPU load"

Apply Changes to Prometheus

$ kubectl apply -f ./kubernetes-homelab/prometheus/
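Once the new pod is running, the expanded label can be checked through the Prometheus HTTP API, whose /api/v1/status/config endpoint returns the loaded configuration. A minimal sketch, assuming the API is reachable at the given base URL (for example through kubectl port-forward); the helper name check_external_label is hypothetical.

```shell
# Hypothetical helper: succeed if the loaded Prometheus configuration
# contains the expected cluster external label.
# $1: Prometheus base URL (assumed reachable, e.g. through a port-forward)
# $2: expected cluster name
check_external_label() {
  curl -s "$1/api/v1/status/config" | grep -q "cluster: $2"
}
```

For example, check_external_label http://localhost:9090 kubernetes-homelab should return success if the variable was expanded. A failure suggests that the feature flag or the secret reference is missing from the deployment.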

Configure Alertmanager Slack Receiver

Edit alertmanager-config-map.yml and customise the alerting to include the cluster name:

---
apiVersion: v1
kind: ConfigMap
[...]
    receivers:
    - name: 'slack_homelab'
      slack_configs:
      - api_url: https://hooks.slack.com/services/XYZXYZXYZ/ABCABCABC/1234567890
        channel: '#homelab'
        send_resolved: true
        title: "[{{ .Status | toUpper }}] {{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
        text: |-
          {{ range .Alerts }}*Description:* {{ .Annotations.description }}
          *Context:* {{ .Labels.context }}
          *Cluster:* {{ .Labels.cluster }}
          *Severity:* {{ .Labels.severity }}
          {{ end }}

Apply changes to Alertmanager:

$ kubectl apply -f ./kubernetes-homelab/alertmanager/

When a new alert arrives, it should contain the cluster name.
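Besides watching the Slack channel, the label can be checked directly in Alertmanager, which lists its current alerts at /api/v2/alerts. A minimal sketch, assuming Alertmanager is reachable at the given base URL (for example through a port-forward to port 9093); the helper name alert_has_cluster is hypothetical.

```shell
# Hypothetical helper: succeed if Alertmanager currently holds at least one
# alert labelled with the expected cluster name.
# $1: Alertmanager base URL (assumed reachable, e.g. through a port-forward)
# $2: expected cluster name
alert_has_cluster() {
  curl -s "$1/api/v2/alerts" |
    grep -Eq "\"cluster\":[[:space:]]*\"$2\""
}
```

For example, alert_has_cluster http://localhost:9093 kubernetes-homelab returns success only while at least one alert with that label is present, so it is most useful while an alert is firing.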

References

https://prometheus.io/docs/prometheus/2.29/feature_flags/

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alert_relabel_configs
