Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for building interpolated getter free functions. #1765

Merged
merged 8 commits into from
Feb 15, 2024

Conversation

Uedaki
Copy link
Contributor

@Uedaki Uedaki commented Jan 26, 2024

Description

Adds a new RendererServices API build_interpolated_getter which allows for building custom free functions that provide values for interpolated datas.

The new API is involved at material compilation time and allow the developers to provide specialized functions leveraging the information known at compile time. It will allow for many optimization opportunities for the compiler by replacing runtime branch with direct memory reads.

This PR is a followup to the API build_attribute_getter proposed in this PR: #1704

Tests

TestShade provides an example implementation showing how this compile time information can be used to select an appropriate function to use as the attribute provider, and how to configure the function signature.
Since the feature is enable by default when rs_bitcode is used, the existing testsuite ensure the feature is generating the same result as the previous API.

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format v17 before submitting, I definitely will look at
    the CI test that runs clang-format and fix anything that it highlights as
    being nonconforming.

Copy link

linux-foundation-easycla bot commented Jan 26, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@Uedaki Uedaki marked this pull request as ready for review January 30, 2024 23:55
@@ -320,9 +320,9 @@ BackendLLVM::llvm_type_groupdata()
TypeDesc* types = &group().m_userdata_types[0];
int* offsets = &group().m_userdata_offsets[0];
int sz = (nuserdata + 3) & (~3);
fields.push_back(ll.type_array(ll.type_bool(), sz));
fields.push_back(ll.type_array(ll.type_int8(), sz));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you talk to changing the type of the
bool GroupData::userdata_init_flags;
to
int8 GroupData::userdata_init_flags;
?
Is it to ensure the size of the GroupData structure is consistent across platforms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

userdata_init_flags is a variable used as a char in osl_bind_interpolated_param which can take 3 different values: not initialized (0), not found (1), and initialized (2).
Boolean values are treated as 2-state variables with llvm operators. Due to this limitation, it was impossible to implement the osl_bind_interpolated_param functionality in build_interpolated_getter with llvm operators. Cases 1 and 2 were treated the same way with the bool array.
To solve this problem, I changed the array type from bool to int8. This solution doesn't change the logic, since userdata_init_flags was already used as a char array.
That way I could implement osl_bind_interpolated_param with llvm operators for the build_interpolated_getter functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Uedaki , excellent explanation. Can you add a comment that it is representing 3 states and list them.
Follow on, should it just go to a vanilla int vs int8? Do we gain much by using 1 byte vs 4 bytes, not sure if alignment rules would cause the following variable to start on 4 byte boundary anyway.

std::vector<llvm::Value*> args;
args.reserve(spec.arg_count() + 1);
for (size_t index = 0; index < spec.arg_count(); index++) {
const auto& arg = spec.arg(index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though the code flow says it, maybe add a comment here explaining that the argument could be dynamic value identified by an enum or constant value that can directly be emitted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep the virtual functions in a good logical order when reading the header. Appending to the end is a bit of a suspect way to avoiding ABI break, I don't think it's guaranteed.

We're allowed to break ABI compatibility in main. We just need to adjust the version numbers to reflect it. Which we can do in a subsequent commit (we already need a subsequent commit in order to update the reference commit for ABI compatibility, so this is not a problem).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I put this comment in the wrong place!

}
}
args.push_back(
ll.void_ptr(groupdata_field_ptr(2 + userdata_index)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment identifying which field your are extracting

spec.set(rs_get_interpolated_test);
} else {
spec.set(rs_get_interpolated_generic,
InterpolatedSpecBuiltinArg::OpaqueExecutionContext,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we always know param_name, type, and derivatives at compile time, do we need a fallback at all?
Either we should be able to emit a specific call or not?

const OSL::TypeDesc type = OSL::TypeDesc_from(_type);

OSL::ShaderGlobals* sg = (OSL::ShaderGlobals*)sg_;
return sg->renderer->get_userdata(derivatives, name, type, sg, val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be device compatible, we should break encapsulation of the first parameter OSL::OpaqueExecContextPtr by casting it to ShaderGlobals. In particular we can't make calls to a RendererServices virtual function on the device side.
So what ever functionality you want from sg->renderer->get_userdata should be placed right here with any required supporting data placed inside struct RenderState located testshade/render_state.h

The idea being if you are using free functions version, you wouldn't even need to implement RendererServices::get_userdata, because rs_get_interpolated* has replaced it.

}

OSL_RSOP bool
rs_get_interpolated_generic(void* sg_, OSL::ustringhash_pod _name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void* sg_
should be
OSL::OpaqueExecContextPtr ec

@AlexMWells
Copy link
Contributor

@Uedaki , overall this looks great! There are just some encapsulation/device compatibility issues in the testshade rs_get_interpolated_* functions to clear up and a few more comments.

@AlexMWells
Copy link
Contributor

@lgritz , can you comment on the CI /VFX2023 abi check failing? Is that because a new virtual function was added to RendererServices? Do we need to bump a version number or something to indicate the ABI might be changing?

@AlexMWells
Copy link
Contributor

AlexMWells commented Feb 1, 2024

From the CI build artifacts:

rendererservices.h
namespace OSL_v1_14_0
[−] class RendererServices  1 
  | Change | Effect -- | -- | -- Virtual method build_interpolated_getter ( ShaderGroup const&, OpenImageIO_v2_4::ustring const&, struct OpenImageIO_v2_4::TypeDesc, bool, InterpolatedGetterSpec& ) has been added to this class. | The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.

[+] show v-table (old and new)
[+] affected symbols: 21 (3.7%)

So perhaps the new virtual method needs to be moved after all the other virtual methods, try that. Then on ABI breaking version changes, we could rearrange. Add comment why it's at the end.

@lgritz
Copy link
Collaborator

lgritz commented Feb 2, 2024

So perhaps the new virtual method needs to be moved after all the other virtual methods, try that. Then on ABI breaking version changes, we could rearrange. Add comment why it's at the end.

I'd rather keep the virtual functions in a good logical order when reading the header. Appending to the end is a bit of a suspect way to avoiding ABI break, I don't think it's guaranteed.

We're allowed to break ABI compatibility in main. We just need to adjust the version numbers to reflect it. Which we can do in a subsequent commit (we already need a subsequent commit in order to update the reference commit for ABI compatibility, so this is not a problem).

Unless you need this backported to 1.13, in which case we should do this (even if not guaranteed) to try to preserve ABI compatibility in practice as best we can.

@AlexMWells
Copy link
Contributor

We're allowed to break ABI compatibility in main. We just need to adjust the version numbers to reflect it. Which we can do in a subsequent commit (we already need a subsequent commit in order to update the reference commit for ABI compatibility, so this is not a problem).

@lgritz (asking as I think I might be missing something here) Why wouldn't we just require this PR update the version number, as it will need to to pass CI before we merge right?

@lgritz
Copy link
Collaborator

lgritz commented Feb 14, 2024

@lgritz (asking as I think I might be missing something here) Why wouldn't we just require this PR update the version number, as it will need to to pass CI before we merge right?

Bumping the version is what we do when we have to break the ABI, yes!

But that alone won't make it pass the automated test that compares the ABI to a designated prior commit (named by its hash) that is the "reference" ABI we are trying to match. So the sequence we follow is:

  1. Merge the fix that breaks the ABI. It is at this point expected to fail the ABI CI check, and that's ok because we know why it's failing and are going to immediately fix it.
  2. Change the ABI check to make the "reference ABI" name the commit from step 1, and merge that as a separate fix. At this point, the automated ABI check will pass again.

You can't combine these into just one step, because the actual merge of PR (1) will result in a different commit hash than when the PR was submitted. There's no way to know what the new ABI reference commit's hash will be post-merge, so you simply cannot include it in step 1.

If you can think of a better solution to this, I'm all ears. I haven't been able to think of an easy way to say "make THIS commit, whatever its hash may end up being, be the new ABI reference going forward."

Copy link
Contributor

@AlexMWells AlexMWells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, real test will be enabling in production renderer to see if all scenarios where covered.

@lgritz lgritz merged commit 7316669 into AcademySoftwareFoundation:main Feb 15, 2024
22 of 23 checks passed
chellmuth pushed a commit to chellmuth/OpenShadingLanguage that referenced this pull request Sep 6, 2024
This won't be picked up by the platforms yet, they are still using
1.13.5.x for 2023.10 (as well as still OIIO 2.6.0.x, which is too old
for this OSL release). So really the only client for this build is
likely Arnold.

Highlights:

* Raise OpenImageIO to 2.6.1.0.
* RS::get_texture_info make robust to empty shaderglobals param (AcademySoftwareFoundation#1731)
* LLVM opaque pointers support (AcademySoftwareFoundation#1728)
* LLVM new pass manager support (AcademySoftwareFoundation#1729)
* LLVM 16 and 17 support (AcademySoftwareFoundation#1730)
* fix: SymOverrideInfo bitfields should be the same type (AcademySoftwareFoundation#1745)
* fix: batched pointcloud if "index" is not passed (AcademySoftwareFoundation#1753)
* api: Ustringhash phase 3 (AcademySoftwareFoundation#1732)
* fix: Fix NVPTX TargetMachine leak, etc. (AcademySoftwareFoundation#1763)
* feat(api): Add API for building interpolated getter free functions. (AcademySoftwareFoundation#1765)
* build: add options to use static Cuda libraries (AcademySoftwareFoundation#1772)
* build(deps): Initial support for LLVM-18 (AcademySoftwareFoundation#1773)
* fix: Mute partio error prints (AcademySoftwareFoundation#1774)
* fix: calculatenormal needs fliphandedness (AcademySoftwareFoundation#1783)
* fix: Print closure missing error message at compile time instead of run time.  (AcademySoftwareFoundation#1781)
* fix: GPU interpolated param initialization (AcademySoftwareFoundation#1791)
* oslquery: Simplify include and link needs (AcademySoftwareFoundation#1787)
* Make isconnected() work with downstream renderer "connections." (AcademySoftwareFoundation#1782)

Also along for the ride:

* Remove the `blah-platform/=1.0` preamble, no longer needed.
* Stop building for deprecated Nuke 12 and 13 platforms.
* Raise cmake build version to 3.24

See merge request spi/dev/3rd-party/osl-feedstock!57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants