Alerting
Explore this Page
- Overview
- Requirements
- Installation
- Alertmanager Configuration
- Alerting Rules
- Alert Evaluation and Triggering
- Benefits of Alerting
Overview
Alerting in DataCore Puls8 provides real-time insights into the health and performance of storage resources by integrating with the Prometheus and Alertmanager components of the Kubernetes ecosystem. It enables automated detection of critical issues such as latency spikes, resource saturation, and abnormal behavior in both OpenEBS Local Storage and OpenEBS Replicated Storage.
The alerting system helps you take timely action by triggering customized notifications based on pre-defined thresholds and rules. DataCore Puls8’s alerting capabilities are designed for flexibility, allowing you to tailor alerts to operational standards and integrate with existing monitoring infrastructure.
Requirements
You must have valid credentials for the datacoresoftware Docker Hub registry.
Installation
The monitoring chart is included as a dependency in the DataCore Puls8 umbrella Helm chart.
Monitoring is enabled by default, and the stack installs:
- kube-prometheus-stack v70.10.0
- DataCore Puls8-specific add-ons and configurations for:
  - Prometheus
  - Grafana
  - Alertmanager
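For reference, a default installation that includes the monitoring stack looks like the following; the registry path and version are taken from the example below:

helm install puls8 oci://registry-1.docker.io/datacoresoftware/puls8 \
  -n puls8 --create-namespace --version 4.3.0-develop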
If you want to install DataCore Puls8 without the monitoring stack (for example, if your environment already includes kube-prometheus-stack), use the following Helm command:
helm install puls8 oci://registry-1.docker.io/datacoresoftware/puls8 \
  -n puls8 --create-namespace --version 4.3.0-develop \
  --set monitoring.kube-prometheus-stack.install=false
This installs only DataCore Puls8-specific monitoring custom resources (CRs). If these CRs are installed in a different namespace, some additional configuration is required.
If you already have kube-prometheus-stack installed:
Prometheus Rule Selector Adjustment
To prevent the DataCore Puls8-specific rules from being ignored due to mismatched release labels, adjust the rule selector in the existing Prometheus installation:
helm upgrade <release_name> prometheus-community/kube-prometheus-stack -n monitoring \
  --set prometheus.prometheusSpec.serviceMonitorSelector.matchLabels=null \
  --set prometheus.prometheusSpec.podMonitorSelector.matchLabels=null \
  --set prometheus.prometheusSpec.ruleSelector.matchLabels=null
This allows Prometheus to detect and process rules regardless of label mismatches.
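One way to confirm that Prometheus is now picking up the DataCore Puls8 rules is to list the PrometheusRule resources across namespaces, then check that they appear on the Rules page of the Prometheus UI:

kubectl get prometheusrules -A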
Handling Alertmanager from External Stack
If Alertmanager is installed separately (i.e., not managed by DataCore Puls8), you must manually integrate DataCore Puls8 alerting by adding child routes and receivers specific to DataCore Puls8 alerts in the existing Alertmanager configuration.
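As a sketch, the additions to an existing Alertmanager configuration might look like the following; the matcher and receiver mirror the full example in the next section, and the email address is a placeholder:

route:
  routes:
    - matchers:
        - product="puls8"
      receiver: puls8-receiver
receivers:
  - name: 'puls8-receiver'
    email_configs:
      - to: 'receiver@org.com'
        send_resolved: true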
Alertmanager Configuration
Prometheus handles the evaluation of rules and creation of alerts, but not their delivery. Alertmanager acts as the notification system managing alert grouping, deduplication, silencing, routing, and dispatching to receivers.
The Alertmanager configuration is defined in the values.yaml file.
By default, no receivers are defined. You are expected to configure receivers based on your requirements.
monitoring:
  kube-prometheus-stack:
    alertmanager:
      config:
        global:
          smtp_smarthost: 'smtp.org.com:587'
          smtp_from: 'sender@org.com'
          smtp_auth_username: 'sender@org.com'
          smtp_auth_password: 'hAOS357*XZpqsse'
        route:
          receiver: team-X-mails
          group_by: [alertname, engine]
          routes:
            - matchers:
                - product="puls8"
              receiver: puls8-receiver
        receivers:
          - name: 'team-X-mails'
            email_configs:
              - to: 'team-X+alerts@example.org'
                send_resolved: true
          - name: 'puls8-receiver'
            email_configs:
              - to: 'receiver@org.com'
                send_resolved: true
Refer to the Prometheus Alertmanager Configuration Documentation for more details and other receiver types.
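To apply changes to this configuration, upgrade the release with the updated values file; the release name, chart reference, and namespace below follow the installation example earlier on this page:

helm upgrade puls8 oci://registry-1.docker.io/datacoresoftware/puls8 \
  -n puls8 --version 4.3.0-develop -f values.yaml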
Alerting Rules
DataCore Puls8 includes Prometheus alert rules focused on OpenEBS Replicated PV Mayastor performance and capacity metrics. These rules can be modified or extended based on the needs of each organization.
Performance Rules
Performance rules monitor latency across:
- Volume Targets
- Replicas
- DiskPools
Latency metrics (read/write) are collected as time-series counters exposed by OpenEBS Replicated PV Mayastor. Because these counters are stored in memory, they reset when the service restarts. Refer to the Monitoring Documentation for more information on how latency is calculated.
- alert: MayastorDiskPoolWriteLatencyAvgHigh
  expr: irate(diskpool_write_latency_us[1m]) / irate(diskpool_num_write_ops[1m]) > 500
  for: 5m
  labels:
    severity: warning
    product: puls8
    engine: mayastor
  annotations:
    summary: "High write latency on disk pool"
    description: "The write latency on disk pool {{ $labels.name }} on node {{ $labels.node }} is higher than 0.5ms."
- alert: The name of the rule.
- expr: Calculates the average write latency per operation using the Prometheus irate function.
- for: The condition must hold for 5 minutes before the alert is triggered.
- labels: Categorize and filter alerts in Alertmanager.
- annotations: Provide a summary and description for better visibility.
Performance thresholds vary with application type, workload density, and infrastructure. Benchmark your environment before customizing thresholds.
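For example, the following query reports the average write latency per operation (in microseconds) over a five-minute window, which can serve as a baseline before adjusting thresholds; it reuses the counters from the rule above:

rate(diskpool_write_latency_us[5m]) / rate(diskpool_num_write_ops[5m])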
Capacity Rules
Capacity alerts monitor DiskPool usage. The default behavior is:
- Warning alert when > 75% of capacity is consumed
- Critical alert when > 90% of capacity is consumed
- alert: MayastorDiskPoolUsage
  expr: diskpool_used_size_bytes / diskpool_total_size_bytes > 0.9
  for: 1m
  labels:
    engine: mayastor
    product: puls8
    severity: critical
  annotations:
    summary: "Critical Alert of Disk Pool Usage"
    description: "Mayastor diskpool {{ $labels.name }} on node {{ $labels.node }} has exceeded 90% of total capacity."
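A corresponding warning rule follows the same pattern. The following is a sketch derived from the critical rule above, using the 75% threshold noted earlier:

- alert: MayastorDiskPoolUsage
  expr: diskpool_used_size_bytes / diskpool_total_size_bytes > 0.75
  for: 1m
  labels:
    engine: mayastor
    product: puls8
    severity: warning
  annotations:
    summary: "Warning Alert of Disk Pool Usage"
    description: "Mayastor diskpool {{ $labels.name }} on node {{ $labels.node }} has exceeded 75% of total capacity."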
Alert Evaluation and Triggering
Prometheus evaluates each alerting rule within a rule group at a 30-second interval by default. If the rule expression holds true continuously for the duration defined in the for field, the alert transitions from Pending to Firing and is sent to Alertmanager.
Labels in the alert help group similar alerts, and annotations provide context such as summary and description.
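In a kube-prometheus-stack deployment, alerting rules are delivered as PrometheusRule custom resources, and the evaluation interval can be set per rule group. The following is a minimal sketch; the resource name, namespace, and group name are illustrative:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: puls8-capacity-rules   # hypothetical name
  namespace: puls8             # assumed namespace
spec:
  groups:
    - name: diskpool-capacity  # hypothetical group name
      interval: 30s            # rule group evaluation interval
      rules:
        - alert: MayastorDiskPoolUsage
          expr: diskpool_used_size_bytes / diskpool_total_size_bytes > 0.9
          for: 1m
          labels:
            engine: mayastor
            product: puls8
            severity: critical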
Benefits of Alerting
- Faster Issue Detection and Resolution: Reduce MTTR by acting on alerts in real time.
- Improved Reliability: Proactively manage performance and capacity issues before they impact workloads.
- Customizable: Tailor alert rules and thresholds to suit application-specific needs.
- Seamless Integration: Compatible with existing kube-prometheus-stack deployments.