Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Make more robust timer_test #3953

Merged
merged 1 commit into from
Aug 21, 2023

Conversation

lgritz
Copy link
Collaborator

@lgritz lgritz commented Aug 21, 2023

On overloaded VMs, the timer_test is ureliable and fails spuriously too often.

Two strategies in this patch, hope some combination does the trick:

  1. Increase the absolute error allowed for the timing tests when doing CI runs.

  2. Add just a bit more relative error (via a new unit test macro, OIIO_CHECK_EQUAL_THRESH_REL).

Hopefully this will cut down on CI failures for unit_timer test case.

On overloaded VMs, the timer_test is ureliable and fails spuriously
too often.

Two strategies in this patch, hope some combination does the trick:

1. Increase the absolute error allowed for the timing tests when
   doing CI runs.

2. Add just a bit more relative error (via a new unit test macro,
   OIIO_CHECK_EQUAL_THRESH_REL).

Hopefully this will cut down on CI failures for unit_timer test
case.

Signed-off-by: Larry Gritz <[email protected]>
Copy link
Collaborator

@ThiagoIze ThiagoIze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10% relative error could still be too small since the error is related to the OS's tick size. On Windows this defaults to 15ms, so if a test tries to sleep for 100ms (what interval is set to I think), 10% error is 10ms and so you might still be wrong some of the time. 15% relative error should work on standard Windows machines if you want to go with this approach.

But I believe users can modify the tick size, so a better approach would be to have OS specific queries for tick size and then verify that the timer was equal to or up to 15ms slower. Since this code is for a unit test, it's possible the quick 15% padding is good enough and we can revisit this in the future if anyone reports fails.

I don't believe the timer can return faster than the requested time, so you might want to get rid of the absolutes.

@lgritz
Copy link
Collaborator Author

lgritz commented Aug 21, 2023

In these examples, I'm using an absolute error PLUS relative, so I think it's going to be ok.

Copy link
Collaborator

@ThiagoIze ThiagoIze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So abs + relative is giving you 20ms of padding? Yes, that might be OK for most machines. As I said before, this won't work on all machines but for unit tests it's probably not worth worrying about until we hit it.

@lgritz
Copy link
Collaborator Author

lgritz commented Aug 21, 2023

This takes a stab at improving our hit rate on this test in CI, and I don't see how it hurts anything. If it doesn't fully solve the problem and we have to revisit again, so be it.

@lgritz lgritz merged commit 0fca555 into AcademySoftwareFoundation:master Aug 21, 2023
23 checks passed
@jessey-git
Copy link
Contributor

I can at least verify that this patch allows the unit_timer test to pass for me locally on windows :)

Also, here's output from a quick .exe I threw together to verify the timer resolution from Sleep(0) through Sleep(100). Just a FYI to put into perspective of actual timings you might see:
win_sleep_resolution.txt

@lgritz lgritz deleted the lg-timerfail branch August 23, 2023 21:32
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Aug 25, 2023
On overloaded VMs, the timer_test is ureliable and fails spuriously too
often.

Two strategies in this patch, hope some combination does the trick:

1. Increase the absolute error allowed for the timing tests when doing
CI runs.

2. Add just a bit more relative error (via a new unit test macro,
OIIO_CHECK_EQUAL_THRESH_REL).

Hopefully this will cut down on CI failures for unit_timer test case.

Signed-off-by: Larry Gritz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants