Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR/BatchedFixForce_LLVM_Boolean #1717

Conversation

AlexMWells
Copy link
Contributor

@AlexMWells AlexMWells commented Aug 22, 2023

Description

Fixes #1716: Crash assigning boolean expression to int in batched…mode

Modifying BatchedAnalysis::establish_symbols_forced_llvm_bool to excluded renderer outputs to fix the issue.

Tests

Added new test compassign-bool to recreate the issue and verify it is fixed.

Checklist:

  • I have read the contribution guidelines.
  • I have previously submitted a Contributor License Agreement.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project.

Copy link
Contributor

@sfriedmapixar sfriedmapixar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lgritz
Copy link
Collaborator

lgritz commented Aug 25, 2023

Is this ready to merge?

@AlexMWells
Copy link
Contributor Author

No, I will be updating it to handle the follow on issue reported in the issue 1716

@AlexMWells AlexMWells force-pushed the PR/BatchedFixForce_LLVM_Boolean branch from 0510663 to cf7260f Compare September 8, 2023 22:59
@AlexMWells
Copy link
Contributor Author

Just committed the comprehensive fix(es):

Rearchitected BatchedAnalysis::establish_symbols_forced_llvm_bool to track a BoolStatus per Symbol evaluating all operations that write to the symbol. Only if all ops that write to it always produce a logically boolean result will the BoolStatus be set to Yes.
There is another set of ops that could produce a logically boolean result if their inputs were also boolean. In that case the BoolStatus is set to Maybe and its input symbols tracked (as they may not have a known BoolStatus yet) and deferred for a follow on pass.
After the initial pass is made, if there were any deferred Symbols, we can check their dependencies to see if they have a known BoolStatus and we can set the Symbol's BoolStatus to Yes or No depending on those input dependencies. If the BoolStatus of any dependencies is still Unknown, we just defer it again. We continue making passes over deferred Symbols until we have established a known BoolStatus for them all. Afterwards we use forced_llvm_bool(true) on the Symbols where BoolStatus is Yes.
Improved support for storing forced_llvm_bool Symbol values by automatically converting from int->bool and then a vector of bools (mask) to the platforms native mask type (might be vector of integers on platforms without native mask support).
Fixed bug in BatchedAnalysis where arrays of closures were not being marked as varying, only closures where.
Allow GroupData to hold actual boolean or native mask data type vs. integers when a Symbol is forced_llvm_bool.
Improved batched GroupData layout calculation to use llvm offset and alignment calculations based on the actual DataLayout (might consider upgrading the scalar code gen to do the same).
Fixed bug where results of partial results from RendererServices::get_userdata allowed initops to execute unmasked effectively overwriting the results from RendererServices::get_userdata. Issue stemmed from op_useparam not being modelled properly by BatchedAnalysis.

Also for scalar and batched enabled ability to have interpolated output parameters to match oslc behavior of allowing it.

Added new test compassign-bool to recreate the issue and verify it is fixed.
Added new test closure-layered to recreate the issue and verify it is fixed.
Added new test userdata-partial recreate the issue and verify it is fixed.

@lgritz
Copy link
Collaborator

lgritz commented Sep 14, 2023

Is this still in progress?

Alex, why did a bunch of methods get marked as NOINLINE here?

@johnhaddon
Copy link
Contributor

I tested this PR in Gaffer today, and can confirm that it fixes the original problem involving the boolean assignment. But it also appears to have introduced two new test failures for us. In both cases we're reading an integer via getattribute() and then doing something based on its value :

int shadingIndex;
if( getattribute( "shading:index", shadingIndex ) )
{
    printf( "SHADING INDEX %d", shadingIndex );
    if (shadingIndex % 2 == 0)
    {
         ...

This is the meat of the BatchedRendererServices::get_userdata() call that invokes :

if( wval.type() == OIIO::TypeDesc( OIIO::TypeDesc::INT32 ) )
{
    maskedDataInitWithZeroDerivs( wval );
    wval.mask().foreach ([&wval, pointIndex](ActiveLane lane) -> void {
        int i = pointIndex + lane.value();
        wval.assign_val_lane_from_scalar( lane, &i );
    });
}

Can you see anything wrong with that? If I instrument the C++ then it appears to be running for all the lanes, and assigning the value I expect, but that's not coming through on the OSL side.

What seems to work around the problem is to make sure shadingIndex is initialised before calling getattribute() :

int shadingIndex = -1;
if( getattribute( ...

Then all of a sudden the tests pass. I don't suppose that gives you any clues as to something we might be doing wrong, or as to something that might have been inadvertently broken in this PR?

@AlexMWells
Copy link
Contributor Author

Is this still in progress?

Alex, why did a bunch of methods get marked as NOINLINE here?

Debugging, when inlined its quite difficult to follow through. And for most of these not sure they would benefit from inlining, especially the ones recursive in nature. They can be removed later if a slow down in analysis is observed. But really something with a larger working set of complex shaders to measure the difference between inline and NOINLINE here.

@AlexMWells
Copy link
Contributor Author

@johnhaddon thanks for testing.

For the broken one, what output are you seeing from your
printf( "SHADING INDEX %d", shadingIndex );
vs. what you see when you set
int shadingIndex = -1;
?
Are they all 0 or 1's or garbage?
And you can set attribute in the ShadyingSystem "dump_forced_llvm_bool_symbols" to 1, and run the broken and workaround version and report back on which symbols.
Perhaps getattribute is not participating is in the forced bool analysis correctly.

@AlexMWells
Copy link
Contributor Author

@johnhaddon , I think I was able to reproduce the issue, the getattribute doesn't appear to disqualify the local variable from being forced_llvm_bool.

testshade --options opt_batched_analysis=1,dump_forced_llvm_bool_symbols=1 -t 1 test
shadingIndex is forced llvm bool.
$tmp1 is forced llvm bool.
$tmp3 is forced llvm bool.

But when you initialized shadingIndex to -1, it could tell the initial value couldn't be bool.

testshade --options opt_batched_analysis=1,dump_forced_llvm_bool_symbols=1 -t 1 test
$tmp1 is forced llvm bool.
$tmp3 is forced llvm bool.

I'll see if I can get getattribute to play nicely here.

@johnhaddon
Copy link
Contributor

I think I was able to reproduce the issue, the getattribute doesn't appear to disqualify the local variable from being forced_llvm_bool.

Yep, I think that's it. I see the same output in my original test case here, if I run it with dump_forced_llvm_bool_symbols=1. And for what it's worth, in the broken case, shadingIndex was always 0 for all lanes.

…mode

Modifying BatchedAnalysis::establish_symbols_forced_llvm_bool to excluded renderer outputs to fix the issue.
Added new test compassign-bool to recreate the issue and verify it is fixed.

Signed-off-by: Alex M. Wells <[email protected]>
Signed-off-by: Alex M. Wells <[email protected]>
…ubuntu and macos CI builds had issues

Signed-off-by: Alex M. Wells <[email protected]>
… to int in batched mode

    Rearchitected BatchedAnalysis::establish_symbols_forced_llvm_bool to track a BoolStatus per Symbol evaluating all operations that write to the symbol. Only if all ops that write to it always produce a logically boolean result will the BoolStatus be set to Yes.
There is another set of ops that could produce a logically boolean result if their inputs were also boolean.  In that case the BoolStatus is set to Maybe and its input symbols tracked (as they may not have a known BoolStatus yet) and deferred for a follow on pass.
After the initial pass is made, if there were any deferred Symbols, we can check their dependencies to see if they have a known BoolStatus and we can set the Symbol's BoolStatus to Yes or No depending on those input dependencies.  If the BoolStatus of any dependencies is still Unknown, we just defer it again.  We continue making passes over deferred Symbols until we have established a known BoolStatus for them all.  Afterwards we use forced_llvm_bool(true) on the Symbols where BoolStatus is Yes.
    Improved support for storing forced_llvm_bool Symbol values by automatically converting from int->bool and then a vector of bools (mask) to the platforms native mask type (might be vector of integers on platforms without native mask support).
    Fixed bug in BatchedAnalysis where arrays of closures were not being marked as varying, only closures where.
    Allow GroupData to hold actual boolean or native mask data type vs. integers when a Symbol is forced_llvm_bool.
    Improved batched GroupData layout calculation to use llvm offset and alignment calculations based on the actual DataLayout (might consider upgrading the scalar code gen to do the same).
    Fixed bug where results of partial results from RendererServices::get_userdata allowed initops to execute unmasked effectively overwriting the results from RendererServices::get_userdata.  Issue stemmed from op_useparam not being modelled properly by BatchedAnalysis.

    Added new test compassign-bool to recreate the issue and verify it is fixed.
    Added new test closure-layered to recreate the issue and verify it is fixed.
    Added new test userdata-partial recreate the issue and verify it is fixed.

Signed-off-by: Alex M. Wells <[email protected]>
added missing test output for when batched is not a compiletime enabled
NOTE: in previous commit, ability to have interpolated output parameters was enabled to match oslc behavior of allowing it.

Signed-off-by: Alex M. Wells <[email protected]>
Signed-off-by: Alex M. Wells <[email protected]>
…urn value is always a bool but result parameter when it is an integer should NOT be forced_llvm_bool.

Added test getattribute-shading to recreate the issue and verify its fixed.

Signed-off-by: Alex M. Wells <[email protected]>
…ilds that exclude batched.

Signed-off-by: Alex M. Wells <[email protected]>
Signed-off-by: Alex M. Wells <[email protected]>
@AlexMWells AlexMWells force-pushed the PR/BatchedFixForce_LLVM_Boolean branch from 492f0e8 to edade61 Compare September 20, 2023 20:58
@AlexMWells
Copy link
Contributor Author

@johnhaddon the pr has been updated with a fix for getattribute, whose return value is always bool, but it was treating its result parameter as always being bool as well. Fix was to restrict consideration to write's from return values, vs any argument.

Let us know if it works, otherwise I think the PR is good to go.

@johnhaddon
Copy link
Contributor

Thanks @AlexMWells - I'm happy to confirm that the whole GafferOSL test suite is running successfully with this PR. Many thanks once again!

@lgritz lgritz merged commit e327241 into AcademySoftwareFoundation:main Sep 21, 2023
21 of 22 checks passed
@AlexMWells AlexMWells mentioned this pull request Sep 25, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash assigning boolean expression to int in batched mode
4 participants