Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent NaN from cropping up in testshade for icx (or anyone else) #1874

Merged
merged 1 commit into from
Sep 25, 2024

Conversation

lgritz
Copy link
Collaborator

@lgritz lgritz commented Sep 24, 2024

No description provided.

Copy link
Contributor

@sfriedmapixar sfriedmapixar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lgritz lgritz merged commit d65c589 into AcademySoftwareFoundation:main Sep 25, 2024
21 of 22 checks passed
@fpsunflower
Copy link
Contributor

I'm not following how icx could lead to a NaN here -- cosf() has to return a number in [0,1] so sqrt(1 - z * z) should be well defined, no?

Was this to address the icx CI failure (which still seems to be there) or another case?

@AlexMWells
Copy link
Contributor

technically, just 1 ulp over 1.0 or below -1.0 could cause 1-z*z to be negative. This was just a "possible" source of nan from a code review, so not indicative that is was triggering a nan, just that theoretically it could.

@fpsunflower
Copy link
Contributor

I understand, but if cosine returned a value above 1.0 for small inputs I would consider that a bug with the math library.

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 25, 2024

Good guess, but we still get the problem even with this fix (which I suppose I can revert, maybe it was premature to merge?).

I think that the NaNs -- which didn't necessarily come from this statement, though it seemed like a good guess -- are the symptoms rather than the cause of the underlying issue.

I think most or all of the "render" tests fail, but the failure modes are different -- some hit that assertion (which didn't involve the NaN, it was only when added another print at the time of the assertion did I see other things being NaN), others crash outright, others do neither but timeout after over 10 minutes for that one test (maybe stuck in an infinite loop while constructing the BVH?). All only on icx.

I think somebody with icx handy is going to need to look at this in a debugger.

@steenax86
Copy link
Contributor

I will investigate. To confirm, all these failures are in testsuite/render-*?

@lgritz
Copy link
Collaborator Author

lgritz commented Sep 25, 2024

Yeah, if you look at the CI logs for any of the icx failure, you should be able to see which tests are failing. Seems to happen every time, so any of the ones that either crash or hit an assertion ought to be trivial to pinpoint with a debug build and any debugger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants