Hercules CI disabled for stuck VM tests #213

Closed
opened 2023-07-22 03:12:29 +00:00 by zowoq · 19 comments
zowoq commented 2023-07-22 03:12:29 +00:00 (Migrated from github.com)

@nix-community/lanzaboote

I've disabled Hercules on this repo as your VM tests are getting stuck and causing OOMs on the infra.

https://hercules-ci.com/accounts/github/nix-community/derivations/%2Fnix%2Fstore%2F2j033ap4rs21iw8c90rr6s8sxqgdbi09-vm-test-run-lanzaboote.drv/log?via-job=42b9d7c2-fb82-47f8-9e66-0c616110034d

RaitoBezarius commented 2023-07-22 13:43:14 +00:00 (Migrated from github.com)

Thank you @zowoq, we will investigate and put timeouts to avoid this in the future. Can we re-enable Hercules by ourselves and do I need to ping once we are ready again?

zowoq commented 2023-07-22 13:51:11 +00:00 (Migrated from github.com)

No, you don't need to ping, and you can re-enable it yourself with the `Build this repository` switch here: https://hercules-ci.com/github/nix-community/lanzaboote.

nikstur commented 2023-07-22 15:09:14 +00:00 (Migrated from github.com)

I have a feeling this is because we are re-compiling half the world when we cross compile our stub. Sounds like another reason to go back to crane. I have a PR in the works already.

RaitoBezarius commented 2023-07-22 15:13:48 +00:00 (Migrated from github.com)

> I have a feeling this is because we are re-compiling half the world when we cross compile our stub. Sounds like another reason to go back to crane. I have a PR in the works already.

This is unrelated because by the time we are running VM tests, the system closure is ready.

zowoq commented 2023-09-25 21:44:11 +00:00 (Migrated from github.com)
I've disabled Hercules as it was causing the same problem again. https://hercules-ci.com/accounts/github/nix-community/derivations/%2Fnix%2Fstore%2Fqwx0i64mljpccxswk852ljhl3rn3y2gk-vm-test-run-lanzaboote.drv/log?via-job=89eea2c4-5bdc-475f-93f9-478ae12965d1
RaitoBezarius commented 2023-09-25 22:59:08 +00:00 (Migrated from github.com)

I see, this is an annoying bug in the test framework; I will try to come up with a fix in nixpkgs.

RaitoBezarius commented 2023-09-27 00:43:31 +00:00 (Migrated from github.com)

I am exploring a **proper** solution in https://github.com/NixOS/nixpkgs/pull/257535. In the meantime, I think I will add a timeout option to the test driver, which is the "easy solution".

RaitoBezarius commented 2023-10-23 00:14:15 +00:00 (Migrated from github.com)
Hopefully https://github.com/NixOS/nixpkgs/pull/262839
RaitoBezarius commented 2023-10-30 11:09:14 +00:00 (Migrated from github.com)

The timeout has been merged and lanzaboote master has been updated to use it.
We will need to rebase all PRs to make use of it in PR CI.

RaitoBezarius commented 2023-10-30 11:09:19 +00:00 (Migrated from github.com)

I re-enabled CI.

RaitoBezarius commented 2023-10-30 11:09:32 +00:00 (Migrated from github.com)

Let's re-open if Hercules CI has to be disabled again.

zowoq commented 2023-10-30 11:41:49 +00:00 (Migrated from github.com)

> Timeout have been merged

How long is the timeout? With the number of NixOS tests this project has, it needs to be quite short to not block other projects.

> We will need to rebase all PRs to make use of it in PR CI.

I've disabled CI again as I don't want PRs pushed without being rebased and end up with stuck tests again.

Please rebase the PRs first, then re-enable CI.

RaitoBezarius commented 2023-10-30 12:01:14 +00:00 (Migrated from github.com)

> How long is the timeout? With the number of NixOS tests this project has, it needs to be quite short to not block other projects.

1 hour, the default for any NixOS test AFAIK. It is up on master.

zowoq commented 2023-10-30 12:18:10 +00:00 (Migrated from github.com)

With this number of VM tests, an hour is too long for our limited resources. They seem to run quickly; can you do a 5-10 minute timeout?

RaitoBezarius commented 2023-10-30 12:38:27 +00:00 (Migrated from github.com)

> With this number of VM tests, an hour is too long for our limited resources. They seem to run quickly; can you do a 5-10 minute timeout?

I can.

I rebased all the PRs, will submit a 10-minute timeout as the default, and then rebase everything again.

nikstur commented 2023-10-30 12:38:38 +00:00 (Migrated from github.com)

Let's try 5 minutes and if we see that that doesn't suffice we can go up to 10.

Edit: In the interest of fair resource sharing.
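The "timeout option in the test driver" discussed in this thread caps how long one stuck VM test can hold builder memory. The sketch below is hypothetical and is not the nixpkgs implementation: it assumes each test can be launched as a separate process and relies on Python's `subprocess.run(..., timeout=...)`, which kills the child process once the limit is exceeded. The name `run_with_timeout` is illustrative, and the 5-minute value mirrors the figure proposed above.

```python
import subprocess
import sys

# Hypothetical cap: 5 minutes, the value proposed in this thread.
GLOBAL_TIMEOUT_SECONDS = 5 * 60


def run_with_timeout(cmd, timeout=GLOBAL_TIMEOUT_SECONDS):
    """Run cmd; return its exit code, or None if it was killed for exceeding timeout."""
    try:
        completed = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired,
        # so a hung test stops holding resources on the builder.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a hung VM test: sleeps far longer than the 1-second cap.
    hung = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(hung, timeout=1))
```

With a per-test cap, the worst case for one stuck test drops from the 1-hour default to the chosen limit; a project with many VM tests multiplies that worst case by the number of tests that hang, which is why a short cap matters for fair resource sharing.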

RaitoBezarius commented 2023-10-30 12:41:09 +00:00 (Migrated from github.com)

https://github.com/nix-community/lanzaboote/pull/250 @nikstur can I let you do a quick review?

RaitoBezarius commented 2023-10-30 12:48:46 +00:00 (Migrated from github.com)

Everything has been rebased, I am turning on the CI again.

zowoq commented 2023-10-30 12:59:56 +00:00 (Migrated from github.com)

Thanks!

Reference: raito/lanzaboote#213