Bring the BMC metrics into our Grafana #185
Reference: afnix/infra#185
We had a strange incident in which some voltage lines on the ARM64 motherboard went down temporarily, causing bm-11 to freeze.
I fixed it by shutting down the machine, waiting for the lines to come back up, then restarting it. This is very unusual and hard to debug properly.
So that other people can notice this kind of issue, we should scrape the OpenBMC metrics in some fashion.
Currently, OpenBMC access for the ARM64 box requires a SOCKS5 proxy. We could work out an architecture for extracting all the metrics we need and pushing them to our monitoring stack.
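
As a starting point for discussion, here is a rough sketch of one possible approach, not a settled design. It assumes the BMC exposes the standard Redfish `Power` resource (with a `Voltages` array carrying `Name` and `ReadingVolts`) and that the `requests` library is installed with SOCKS support (`requests[socks]`). The metric name `bmc_voltage_volts`, the label names, the chassis ID, and the proxy address are all hypothetical placeholders.

```python
def voltages_to_prometheus(power: dict, host: str) -> str:
    """Render the Voltages array of a Redfish Power resource
    in the Prometheus text exposition format.

    The metric and label names are assumptions made for this sketch.
    """
    lines = ["# TYPE bmc_voltage_volts gauge"]
    for sensor in power.get("Voltages", []):
        reading = sensor.get("ReadingVolts")
        if reading is None:
            # Sensor is listed but not reporting; skip it rather than emit a fake 0.
            continue
        # Escape backslashes and quotes so the label value stays valid.
        name = str(sensor.get("Name", "unknown"))
        name = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(
            f'bmc_voltage_volts{{host="{host}",sensor="{name}"}} {float(reading)}'
        )
    return "\n".join(lines) + "\n"


def fetch_power(bmc_url: str, chassis: str, auth: tuple, socks_proxy: str) -> dict:
    """Fetch the Redfish Power resource through a SOCKS5 proxy.

    Assumes `requests[socks]` is installed. `socks_proxy` would look like
    "socks5h://127.0.0.1:1080" (hypothetical address); the "socks5h" scheme
    resolves DNS on the proxy side.
    """
    import requests  # imported lazily so the renderer above has no dependencies

    resp = requests.get(
        f"{bmc_url}/redfish/v1/Chassis/{chassis}/Power",
        auth=auth,
        proxies={"https": socks_proxy},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```

The rendered text could then be pushed to a Prometheus Pushgateway or served by a small exporter that Prometheus scrapes; which of the two fits better depends on where the proxy is reachable from. Skipping sensors with no reading is a choice made for this sketch: for the failure described above, it may be more useful to alert on a sensor disappearing.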
This task is high priority and may require custom development. Please ping me if you want to take it.