Fix two tests that fail on a heavily-loaded machine #38825

orlitzky · 2024-10-19T12:19:02Z

The commit messages explain each change in detail, but tl;dr these two tests fail for me when the machine is under heavy load because they are guessing how long it will take some computation to complete and the guess is wrong.

One test in this file sets the timeout to 100ms and expects an easy example to complete within that amount of time. It can still fail, though, if the machine is under load: File "src/sage/parallel/decorate.py", line 519, in sage.parallel.decorate.fork Failed example: g(5, m=5) Expected: 8 Got: Killing subprocess 62130 with input ((5,), {'m': 5}) which took too long 8 This example isn't really the purpose of the doctest -- its intent is to show that there are hard examples that will NOT complete in 100ms. (That is much easier to do.) All things considered, we are better off just dropping this one flaky example.

The test we have for start_workers() is doing a bit more than that. It's starting two threads that each print something, sleep for a bit, and then print something else. To check the expected output, we are basically guessing how long it will take these threads to run, and on a very heavily-loaded machine, we can guess wrong: File "src/sage/parallel/map_reduce.py", line 1149, in sage.parallel.map_reduce.RESetMapReduce.start_workers Failed example: sleep(float(1.5)) Expected: Finished: ... Finished: ... Got: Finished: 2 (We were expecting both threads to finish within 1.5s, but only one did.) To make this a bit more robust, we eliminate the sleep() and subsequent print inside of the threads. As a result, we need only make one guess: how long it will take the threads to start. And for that we now go overkill and wait 5s.

github-actions · 2024-10-19T13:03:40Z

Documentation preview for this PR (built with commit a181883; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

The commit messages explain each change in detail, but tl;dr these two tests fail for me when the machine is under heavy load because they are guessing how long it will take some computation to complete and the guess is wrong. URL: sagemath#38825 Reported by: Michael Orlitzky Reviewer(s): Tobias Diez

tornaria · 2024-10-27T23:21:55Z

src/sage/parallel/map_reduce.py

            ....:     return []
            sage: S = RESetMapReduce(roots=[1, 2], children=children)
            sage: S.setup_workers(2)
-            sage: S.start_workers(); sleep(float(0.4))
+            sage: S.start_workers(); sleep(float(5))  # long time


The # long time label causes a test failure in the cleanup section, when running tests in non-long mode.

orlitzky added 2 commits October 19, 2024 07:57

github-actions bot added the s: needs review label Oct 19, 2024

orlitzky mentioned this pull request Oct 19, 2024

Compile everything with meson #36524

Merged

5 tasks

tobiasdiez approved these changes Oct 19, 2024

View reviewed changes

tobiasdiez added s: positive review and removed s: needs review labels Oct 19, 2024

vbraun merged commit 3232e30 into sagemath:develop Oct 26, 2024
20 checks passed

github-actions bot removed the s: positive review label Oct 26, 2024

tornaria reviewed Oct 27, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix two tests that fail on a heavily-loaded machine #38825

Fix two tests that fail on a heavily-loaded machine #38825

orlitzky commented Oct 19, 2024

github-actions bot commented Oct 19, 2024

tornaria Oct 27, 2024

orlitzky Oct 28, 2024

Fix two tests that fail on a heavily-loaded machine #38825

Fix two tests that fail on a heavily-loaded machine #38825

Conversation

orlitzky commented Oct 19, 2024

github-actions bot commented Oct 19, 2024

tornaria Oct 27, 2024

Choose a reason for hiding this comment

orlitzky Oct 28, 2024

Choose a reason for hiding this comment