Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix two tests that fail on a heavily-loaded machine #38825

Merged
merged 2 commits into from
Oct 26, 2024

Conversation

orlitzky
Copy link
Contributor

The commit messages explain each change in detail, but tl;dr these two tests fail for me when the machine is under heavy load because they are guessing how long it will take some computation to complete and the guess is wrong.

One test in this file sets the timeout to 100ms and expects an easy
example to complete within that amount of time. It can still fail,
though, if the machine is under load:

  File "src/sage/parallel/decorate.py", line 519, in
  sage.parallel.decorate.fork
    Failed example:
      g(5, m=5)
    Expected:
      8
  Got:
      Killing subprocess 62130 with input ((5,), {'m': 5}) which
      took too long
      8

This example isn't really the purpose of the doctest -- its intent is
to show that there are hard examples that will NOT complete in
100ms. (That is much easier to do.) All things considered, we are
better off just dropping this one flaky example.
The test we have for start_workers() is doing a bit more than
that. It's starting two threads that each print something, sleep for a
bit, and then print something else. To check the expected output, we
are basically guessing how long it will take these threads to run, and
on a very heavily-loaded machine, we can guess wrong:

  File "src/sage/parallel/map_reduce.py", line 1149, in
  sage.parallel.map_reduce.RESetMapReduce.start_workers
  Failed example:
      sleep(float(1.5))
  Expected:
      Finished: ...
      Finished: ...
  Got:
      Finished: 2

(We were expecting both threads to finish within 1.5s, but only one
did.)

To make this a bit more robust, we eliminate the sleep() and
subsequent print inside of the threads. As a result, we need only make
one guess: how long it will take the threads to start. And for that we
now go overkill and wait 5s.
Copy link

Documentation preview for this PR (built with commit a181883; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@orlitzky orlitzky mentioned this pull request Oct 19, 2024
5 tasks
vbraun pushed a commit to vbraun/sage that referenced this pull request Oct 23, 2024
    
The commit messages explain each change in detail, but tl;dr these two
tests fail for me when the machine is under heavy load because they are
guessing how long it will take some computation to complete and the
guess is wrong.
    
URL: sagemath#38825
Reported by: Michael Orlitzky
Reviewer(s): Tobias Diez
@vbraun vbraun merged commit 3232e30 into sagemath:develop Oct 26, 2024
20 checks passed
....: return []
sage: S = RESetMapReduce(roots=[1, 2], children=children)
sage: S.setup_workers(2)
sage: S.start_workers(); sleep(float(0.4))
sage: S.start_workers(); sleep(float(5)) # long time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The # long time label causes a test failure in the cleanup section, when running tests in non-long mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants