dashboard: Add new prometheus alert

We were asked to update our alerting definitions to include a slow OSD
ops health check.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664

Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 2491d4e004)
2021-06-08 09:43:23 +02:00 · a6cf646e45
parent f0413c4a2b
commit a6cf646e45
1 changed file with 8 additions and 0 deletions
--- a/roles/ceph-prometheus/files/ceph_dashboard.yml
+++ b/roles/ceph-prometheus/files/ceph_dashboard.yml
@@ -105,3 +105,11 @@ groups:
    annotations:
      summary: "OSD(s) with High PG Count"
      description: "This indicates there are some OSDs with high PG count (275+)."
+  - alert: Slow OSD Ops
+    expr: ceph_healthcheck_slow_ops > 0
+    for: 1m
+    labels:
+      severity: page
+    annotations:
+      summary: "Slow OSD Ops"
+      description: "OSD requests are taking too long to process (osd_op_complaint_time exceeded)"
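
One way to check the new rule is a `promtool test rules` unit test. The following is a minimal sketch, not part of this commit. It assumes the test file sits next to `ceph_dashboard.yml`, that the rest of that file loads cleanly under promtool, and that the metric carries an `instance` label. The `mgr-a` value and the file name `slow_ops_test.yml` are hypothetical.

```yaml
# slow_ops_test.yml (hypothetical file name)
# Run with: promtool test rules slow_ops_test.yml
rule_files:
  - ceph_dashboard.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Samples at 0m..5m: healthy for two samples, then slow ops reported.
      # The instance label and its value are assumptions for illustration.
      - series: 'ceph_healthcheck_slow_ops{instance="mgr-a"}'
        values: '0 0 1 1 1 1'
    alert_rule_test:
      # At 2m the expression first becomes true, so the alert is only
      # pending ("for: 1m" has not elapsed) and nothing should fire.
      - eval_time: 2m
        alertname: Slow OSD Ops
      # By 5m the condition has held for longer than 1m, so the alert fires.
      - eval_time: 5m
        alertname: Slow OSD Ops
        exp_alerts:
          - exp_labels:
              severity: page
              instance: mgr-a
            exp_annotations:
              summary: "Slow OSD Ops"
              description: "OSD requests are taking too long to process (osd_op_complaint_time exceeded)"
```

The two evaluation points are chosen to exercise the `for: 1m` clause: the first confirms that a single bad sample does not page anyone, and the second confirms that a sustained condition does.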