Alert on flapping / restart-looping systemd units #370
Labels
No labels
Compat/Breaking
Difficulty/Architectural
Difficulty/Easy
Difficulty/Hard
Help Wanted
Kind/Bug
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Testing
Priority/Critical
Priority/High
Priority/Low
Priority/Medium
Reviewed/Confirmed
Reviewed/Duplicate
Reviewed/Invalid
Reviewed/Won't Fix
Security
Silenced Alert
Status/Abandoned
Status/Blocked
Status/Need More Info
Status/Postponed
Tracking Issue
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
afnix/infra#370
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
We recently had a situation on
lix-zulip01
where some of the zulip queue runners were restart-looping at a high rate. Since they were never actually in thefailed
state we never saw this in our alerting until further symptoms ended up causing problems. I don't know off-hand how we could monitor for this, but this is something we should really have imo.