Monitoring infra should alert when an expected collector can't be scraped #251

Closed
opened 2025-08-05 14:59:32 +00:00 by delroth · 1 comment
delroth commented 2025-08-05 14:59:32 +00:00 (Migrated from git.lix.systems)

When I deployed the smartctl exporter on bm-12, it at first couldn't start because of disk problems, and then stayed broken because of a bug in the collector. Neither failure showed up anywhere in the monitoring, and the absence of metrics could be hiding other conditions. We should figure out how to surface the fact that the local agent can't scrape a target, and alert on it.

raito commented 2025-08-05 15:02:30 +00:00 (Migrated from git.lix.systems)
{
  alert = "FailedScrape";
  labels.severity = "warning";
  annotations.summary = "Scrape failed";
  annotations.description = "The job was not successfully scraped";
  for = "2m";
  expr = ''
    up == 0
  '';
}

A rule like this could do the job.
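
One caveat: `up == 0` only fires for targets the scraper already knows about. If an exporter never ends up in the scrape configuration at all, there is no `up` series to compare against. A minimal sketch of a complementary rule follows, assuming the exporter is scraped under a job label of `smartctl` (a hypothetical name, not taken from this repository) and that a per-job rule is acceptable:

```nix
{
  # Hypothetical companion rule: fires when no `up` series exists at all
  # for the expected job, which `up == 0` cannot detect.
  alert = "MissingScrapeTarget";
  labels.severity = "warning";
  annotations.summary = "Expected scrape target is missing";
  annotations.description = "No up series exists for the expected job";
  for = "5m";
  expr = ''
    absent(up{job="smartctl"})
  '';
}
```

`absent()` returns a single series with value 1 when the selector matches nothing, so one such rule would be needed per expected job. The `5m` duration is an arbitrary placeholder.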
