-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Runtime of CircleCI jobs exceed the limit for the free tier #166
Comments
@SWilson4 Here's that CircleCI issue I mentioned. I've got a first pass at porting it to Github Actions here. However, there's some difference in how the container/host handles process reaping which breaks a unit test, but I can patch around that.
|
That config looks reasonable to me. Would it help if I made a draft PR to move the |
Yeah, that'll help. I can put together the PR in OpenSSH to get Github Actions working. |
We are hitting the 1 hour time limit of Circle CI (Issue open-quantum-safe#166). This migrates the existing CircleCI job completely to Github Actions which has a 5 hour time limit. For the most part, this is pretty much a one-to-one migration. The one oddity I noticed is a regression in one of the unit tests. I've root-caused it to a change in how Github's Ubuntu host + OQS' CI container handles zombie processes. In this configuration, they don't seem to reap zombies as aggressively as it did in CircleCI. This causes the test to "fail" because it counts a zombie as "alive". The creation of a zombie is by design due to how ssh-agent's subprocess mode works (ssh-agent forks into a child process to allow an arbitrary parent to take over, the child then becomes a zombie when it recognizes its orphaned status). To work around this, I added a check to the assertion to count zombies as "not alive" and this allows the test to pass on Github Actions. Since upstream OpenSSH provided its own set of Github Actions, I simply moved those to the `upstream-github` directory to avoid conflicts and preserve the source.
We are hitting the 1 hour time limit of Circle CI (Issue open-quantum-safe#166). This migrates the existing CircleCI job completely to Github Actions which has a 5 hour time limit. For the most part, this is pretty much a one-to-one migration. The one oddity I noticed is a regression in one of the unit tests. I've root-caused it to a change in how Github's Ubuntu host + OQS' CI container handles zombie processes. In this configuration, they don't seem to reap zombies as aggressively as it did in CircleCI. This causes the test to "fail" because it counts a zombie as "alive". The creation of a zombie is by design due to how ssh-agent's subprocess mode works (ssh-agent forks into a child process to allow an arbitrary parent to take over, the child then becomes a zombie when it recognizes its orphaned status). To work around this, I added a check to the assertion to count zombies as "not alive" and this allows the test to pass on Github Actions. Since upstream OpenSSH provided its own set of Github Actions, I simply moved those to the `upstream-github` directory to avoid conflicts and preserve the source. Signed-off-by: gcr <[email protected]>
That's the way to go -- we already did in
That was because at the time it had support for platforms other CI systems didn't. For |
We are hitting the 1 hour time limit of Circle CI (Issue open-quantum-safe#166). This migrates the existing CircleCI job completely to Github Actions which has a 5 hour time limit. For the most part, this is pretty much a one-to-one migration. Since upstream OpenSSH provided its own set of Github Actions, I simply moved those to the `upstream-github` directory to avoid conflicts and preserve the source. I did run into two issues with getting the integration tests to pass. Beyond that, I ran into two issues that arose from migrating to Github Actions which need to be partched around. The combination of Github Actions' host with the OQS CI container results in a lazier reaping of zombie processes which breaks this test. In this test, ssh-agent is run as a subprocess to some arbitrary user command. This enables exclusive access to ssh-agent to that specific process. The way this works under the hood is that ssh-agent forks into a child process and the parent process exec's into the arbitrary command ([code ref](https://github.com/open-quantum-safe/openssh/blob/OQS-v9/ssh-agent.c#L2384)) which runs to completion. The child process than polls its parent process until it detects its own orphaned status and terminates itself. This, by design, results in a zombie process which must be reaped. The test's assertion uses `kill -0` to check for liveness, but that counts zombies as "alive". The workaround for this then is to add an additional check to assert that zombies are in fact "dead". The `percent` test tests % expansions inside SSH config files (e.g. home directory, username, port number). The assertion for the home directory uses the `HOME` environmental variable. Unfortunately, when running a container on a Github Runner, they unconditionally override the value of `HOME` with `/github/home` ([issue ref](actions/runner#863)) and this breaks the test assertion. The fix here is to get a more reliable reference for the home directory and use that for the assertion. Signed-off-by: gcr <[email protected]>
We are hitting the 1 hour time limit of Circle CI (Issue open-quantum-safe#166). This migrates the existing CircleCI job completely to Github Actions which has a 5 hour time limit. For the most part, this is pretty much a one-to-one migration. Since upstream OpenSSH provided its own set of Github Actions, I simply moved those to the `upstream-github` directory to avoid conflicts and preserve the source. I did run into two issues with getting the integration tests to pass. Beyond that, I ran into two issues that arose from migrating to Github Actions which need to be partched around. agent-subprocess zombie process reaping The combination of Github Actions' host with the OQS CI container results in a lazier reaping of zombie processes which breaks this test. In this test, ssh-agent is run as a subprocess to some arbitrary user command. This enables exclusive access to ssh-agent to that specific process. The way this works under the hood is that ssh-agent forks into a child process and the parent process exec's into the arbitrary command ([code ref](https://github.com/open-quantum-safe/openssh/blob/OQS-v9/ssh-agent.c#L2384)) which runs to completion. The child process than polls its parent process until it detects its own orphaned status and terminates itself. This, by design, results in a zombie process which must be reaped. The test's assertion uses `kill -0` to check for liveness, but that counts zombies as "alive". The workaround for this then is to add an additional check to assert that zombies are in fact "dead". percent expansion is broken due to Github's HOME override The `percent` test tests % expansions inside SSH config files (e.g. home directory, username, port number). The assertion for the home directory uses the `HOME` environmental variable. Unfortunately, when running a container on a Github Runner, they unconditionally override the value of `HOME` with `/github/home` ([issue ref](actions/runner#863)) and this breaks the test assertion. The fix here is to get a more reliable reference for the home directory and use that for the assertion. Signed-off-by: gcr <[email protected]>
We are hitting the 1 hour time limit of Circle CI (Issue open-quantum-safe#166). This migrates the existing CircleCI job completely to Github Actions which has a 5 hour time limit. For the most part, this is pretty much a one-to-one migration. Since upstream OpenSSH provided its own set of Github Actions, I simply moved those to the `upstream-github` directory to avoid conflicts and preserve the source. I did run into two issues with getting the integration tests to pass. Beyond that, I ran into two issues that arose from migrating to Github Actions which need to be partched around. agent-subprocess zombie process reaping The combination of Github Actions' host with the OQS CI container results in a lazier reaping of zombie processes which breaks this test. In this test, ssh-agent is run as a subprocess to some arbitrary user command. This enables exclusive access to ssh-agent to that specific process. The way this works under the hood is that ssh-agent forks into a child process and the parent process exec's into the arbitrary command ([code ref](https://github.com/open-quantum-safe/openssh/blob/OQS-v9/ssh-agent.c#L2384)) which runs to completion. The child process than polls its parent process until it detects its own orphaned status and terminates itself. This, by design, results in a zombie process which must be reaped. The test's assertion uses `kill -0` to check for liveness, but that counts zombies as "alive". The workaround for this then is to add an additional check to assert that zombies are in fact "dead". percent expansion is broken due to Github's HOME override The `percent` test tests % expansions inside SSH config files (e.g. home directory, username, port number). The assertion for the home directory uses the `HOME` environmental variable. Unfortunately, when running a container on a Github Runner, they unconditionally override the value of `HOME` with `/github/home` ([issue ref](actions/runner#863)) and this breaks the test assertion. The fix here is to get a more reliable reference for the home directory and use that for the assertion. Signed-off-by: Gerardo Ravago <[email protected]>
Merging open-quantum-safe/liboqs#1898 will complete the move to GitHub Actions. |
Thanks everyone for quickly driving this to completion! |
The reverse is more due: Thanks for the quick and thorough update (really, upgrade) to oqs-openssh, @geedo0 ! |
CircleCI's free tier limits a job's run time to 1 hour doc. The CI checks run by OpenSSH have been approaching that 1 hour limit and are now at the point where they just barely run past that limit and fail the CI checks example. Previous runs were already pretty close to the limit, the addition of new algorithms to test was enough to tip the run-time over the limit.
There's no solution that doesn't come with trade-offs, but here are some possibilities:
The text was updated successfully, but these errors were encountered: