ci: add workaround for WSL hanging in CI #993

Merged

austinvazquez
Member

@austinvazquez austinvazquez commented Jun 20, 2024

Issue #, if available:
There is a known issue, microsoft/WSL#8529, where WSL commands can hang. This can cause Windows e2e tests to block until hitting the 2 hour timeout.

Description of changes:
This change adds a workaround that detects the bad state and attempts to mitigate it by killing the WSL service. If the issue cannot be resolved, the test hangs for only 300 seconds before failing.
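
For readers unfamiliar with the pattern, here is a minimal sketch of a "graceful shutdown with a timeout, then force kill" fallback. It is not the code from this PR: the function names and return values are hypothetical, and the commands are passed in as parameters so the sketch does not assume which ones the CI workflow actually runs.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, False if it hangs.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        return False


def shutdown_with_fallback(shutdown_cmd, kill_cmd, timeout_s):
    """Try a graceful shutdown; if it hangs, run a forceful kill command.

    Returns "clean" if the shutdown finished in time, "killed" if the
    fallback finished in time, and "failed" if both timed out.
    (Hypothetical helper, for illustration only.)
    """
    if run_with_timeout(shutdown_cmd, timeout_s):
        return "clean"
    if run_with_timeout(kill_cmd, timeout_s):
        return "killed"
    return "failed"
```

On a Windows runner this might be called as `shutdown_with_fallback(["wsl", "--shutdown"], ["taskkill", "/F", "/IM", "wslservice.exe"], 300)`. The `wslservice.exe` process name and the reuse of the 300 second limit for both steps are assumptions here, not details confirmed by the description above.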

Testing done:
CI run was successful with 8 WSL shutdown failures.
https://github.com/runfinch/finch/actions/runs/9682445232/job/26715743040

  • I've reviewed the guidance in CONTRIBUTING.md

Trade-off analysis
The trade-off for this approach is that the test suite can take longer when multiple VM reset calls are made. Sample runs that previously took ~15 minutes now take up to ~37 minutes with the hang mitigation; however, this is still far shorter than the 2 hour timeout failure that would occur without the mitigation.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@austinvazquez austinvazquez force-pushed the debug-windows-test-vm-lockup branch 8 times, most recently from bc59409 to 73900bc Compare June 21, 2024 14:56
@austinvazquez austinvazquez changed the title ci: add debug code for test vm lockup on windows ci: add workaround for WSL hanging in CI Jun 21, 2024
@austinvazquez austinvazquez force-pushed the debug-windows-test-vm-lockup branch 4 times, most recently from a6180de to 5b9f8ed Compare June 26, 2024 15:24
@austinvazquez austinvazquez marked this pull request as ready for review June 26, 2024 16:36
Member

@pendo324 pendo324 left a comment

Hopefully this works better than the naive shutdown command 👍

@austinvazquez
Member Author

> Hopefully this works better than the naive shutdown command 👍

From the microsoft/WSL issue, some users reported shutdown taking 2-3 minutes. We can also consider being more aggressive than this and killing the WSL service faster.

@austinvazquez austinvazquez merged commit 6fcc73b into runfinch:main Jun 26, 2024
24 checks passed
@austinvazquez austinvazquez deleted the debug-windows-test-vm-lockup branch June 26, 2024 19:55