Spurious alerts on CI coredumps

raito commented

2025-09-06 22:34:15 +00:00

Owner

Certain CI workloads will generate broken coredumps as part of their integration testing.

This shows up in our alerting as spurious systemd unit failures whenever such a coredump happens.

We should find a way to exclude an entire cgroup tree, e.g. the Nix build cgroup tree.

raito added the Silenced Alert label

2025-09-06 22:34:15 +00:00

delroth commented

2025-09-07 01:32:30 +00:00

Owner

Is there maybe a way to just drop those coredumps on the floor instead of having them processed by systemd-coredump? I'd prefer we isolate the CI workloads from their host as much as possible.

raito commented

2025-09-07 16:10:37 +00:00

Author

Owner

I do not see an option in `coredump.conf` to achieve this.

delroth commented

2025-09-08 06:07:22 +00:00

Owner

I was thinking of e.g. a new Lix option to set RLIMIT_CORE=0.
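
For illustration only, here is a minimal sketch of what "set RLIMIT_CORE=0 for the workload" could look like when launching a child process. It is not how Lix does it; the helper name `run_without_core_dumps` is hypothetical, and the sketch assumes a Unix host where Python's `resource` module is available.

```python
import resource
import subprocess
import sys


def _disable_core_dumps() -> None:
    # Runs in the child between fork() and exec(): lower both the soft and
    # the hard core-size limit to 0, for the child only.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def run_without_core_dumps(argv: list[str]) -> subprocess.CompletedProcess:
    """Run argv with RLIMIT_CORE=0 (hypothetical helper, not a Lix API)."""
    return subprocess.run(
        argv,
        preexec_fn=_disable_core_dumps,
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Ask the child to report the limit it actually inherited.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CORE))"
    result = run_without_core_dumps([sys.executable, "-c", probe])
    print(result.stdout.strip())
```

The parent's own limit is left untouched; only the spawned process and its descendants inherit the lowered value.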

delroth commented

2025-09-08 06:09:01 +00:00

Owner

Wait, actually, looking into this, RLIMIT_CORE=0 should be the default unless we explicitly pass `--option enable-core-dumps true`. What gives?

raito commented

2025-09-08 13:03:42 +00:00

Author

Owner

I think RLIMIT_CORE=0 only has an effect if kernel.core_pattern is a filename and not a pipe, i.e. when the kernel itself writes the coredump, not when systemd processes it.
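
A quick way to check which case a given host is in, as a rough sketch: read `kernel.core_pattern` from procfs and test for a leading `|`. The function name `kernel_enforces_core_limit` is made up for this example, and it only encodes the distinction described in this comment (file pattern versus pipe pattern); it says nothing about what the pipe handler does with the limit afterwards.

```python
import resource
from pathlib import Path

# procfs view of the kernel.core_pattern sysctl (Linux only).
CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def kernel_enforces_core_limit(pattern: str) -> bool:
    """Return True when the pattern names a file, so the kernel writes the
    core itself; a leading '|' means the dump is piped to a helper program."""
    return not pattern.startswith("|")


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print(f"RLIMIT_CORE soft={soft} hard={hard}")
    if CORE_PATTERN_PATH.exists():
        pattern = CORE_PATTERN_PATH.read_text().strip()
        print(f"core_pattern={pattern!r}")
        if kernel_enforces_core_limit(pattern):
            print("file pattern: the kernel writes the core file itself")
        else:
            print("pipe pattern: the dump is handed to a userspace helper")
```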

raito commented

2025-09-08 17:13:39 +00:00

Author

Owner

I noticed `[6736486.194464] coredump: 3234(liblixmain-test): RLIMIT_CORE is set to 1, aborting core` in my EPYC logs BTW.

Spurious alerts on CI coredumps #325