Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make batched use ustringhash_pod and make batched compile in Windows #1831

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

johnfea
Copy link
Contributor

@johnfea johnfea commented Jun 17, 2024

Description

EDIT:
Make batched and non-batched internally use the same string representation for reasons discussed below.

Tests

New test cases aren't necessarily required.
This PR would fix new failing test case in #1832.

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format v17 before submitting, I definitely will look at
    the CI test that runs clang-format and fix anything that it highlights as
    being nonconforming.

@johnfea johnfea changed the title Add testminimal and a new testsuite case that fails in batched mode. Add testminimal and a new testsuite case. Jun 17, 2024
@johnfea
Copy link
Contributor Author

johnfea commented Jun 17, 2024

I also tried to debug a bit why "closure-string" test fails with batched avx2.

oslexec/wide/wide_opclosure.cpp: in void init_closure_component() casting ClosureComponent* there into MicrofacetParams* printed the same wrong hash for ustringhash as does process_bsdf_closure() in testminimal/oslmaterial.cpp.

oslexec/batched_llvm_gen.cpp and its llvm_gen_closure function: it seems like a promising place to check since it processes all the variables inside microfacet() statement in test.osl, starting with distribution ustringhash. EDIT: This place has the same wrong hash, checked with debug cout: "llvm::outs() << 'loaded: ' << *loaded << '\n';", so the problem happens before that.

@johnfea johnfea force-pushed the testminimal branch 7 times, most recently from 51b795e to 58bb38e Compare June 19, 2024 17:13
@johnfea
Copy link
Contributor Author

johnfea commented Jun 19, 2024

The problem was indeed in llvm_gen_closure function:
src/oslexec/batched_llvm_gen.cpp:

"llvm::Value* loaded = rop.llvm_load_value(sym.."

src/liboslexec/batched_backendllvm.cpp:

"llvm::Value*BatchedBackendLLVM::llvm_load_value(const Symbol& sym.."
"ll.constant(string_val);"

src/liboslexec/llvm_util.cpp:

"llvm::Value* LLVM_Util::constant(ustring s)"

This function already has code to select between ustring and ustringhash based on ustring representation, but
src/include/OSL/llvm_util.h always selects ustring at least in batched mode.

So I added code in CMakeLists.txt to use already existing OSL_USTRINGREP_IS_HASH to conditionally select hash in src/include/OSL/llvm_util.h and then enabled support for hash for batched functions in src/liboslexec/llvm_util.cpp.

This fixes "closure-string" test but introduces new failed tests because batched functions in liboslexec/wide/* still use ustring_pod instead of ustringhash_pod. I enabled ustringhash_pod for wide/wide_opmatrix.hpp as a start.

@johnfea johnfea force-pushed the testminimal branch 6 times, most recently from 2b413f5 to 67eb1cb Compare June 19, 2024 18:19
@johnfea
Copy link
Contributor Author

johnfea commented Jun 19, 2024

I enabled ustringhash_pod for all files in liboslexec/wide/. However, there's issues with the patch. All string parameters there are now ustringhash_pod and according to this error i64 is being inputted for strings, but its giving type mismatch, not sure why.

Call parameter type does not match function signature!
i64 -3801545496781945523
 i8*  %48 = call i32 @osl_b8_AVX2_build_transform_matrix_Wmss_masked(i8* %45, i8* nonnull %37, i64 -3801545496781945523, i64 -2374949479891991091, i32 %47)

EDIT: It was fixed in src/liboslexec/builtindecl_wide_xmacro.h replacing X with s for string parameters.

@johnfea johnfea force-pushed the testminimal branch 8 times, most recently from 47c66e8 to 9f8b3d0 Compare June 20, 2024 10:26
@johnfea johnfea changed the title Add testminimal and a new testsuite case. Add testminimal, a new testsuite case and make batched use ustringhash_pod Jun 20, 2024
@johnfea johnfea force-pushed the testminimal branch 3 times, most recently from efa389f to a3e70cb Compare June 20, 2024 15:15
@johnfea
Copy link
Contributor Author

johnfea commented Jun 20, 2024

There's now more code to support ustringhash_pod for batched. Some string variables from .osl files get passed as ustrings into liboslexec/wide functions while ustringhash is expected, I didn't find where that is set.

@AlexMWells
Copy link
Contributor

I think we left the batched version alone (not using hashes where possible) mostly to just limit the scope of the code changes with the original hashcode changes which were mostly targeting GPU work happening on the scalar (non-batched) code path. And of course there is some shared code which pays with some confusion on which parameters it's actually aking.

I don't think there is problem with moving batched to ustringhash, the tests are comprehensive and will probably catch everything

@@ -31,10 +31,12 @@ namespace pvt {
// Shims to convert llvm gen to rs free function C++ parameter types
// and forward on calls to re free functions.

// TODO this can be removed now that batched supports ustringhash_pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this impl should be changed at all. Just people calling it maybe don't need to call it.
Then if there are no callers left, it can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this impl should be changed at all. Just people calling it maybe don't need to call it. Then if there are no callers left, it can be removed.

Yes. I changed it because I didn't want atm. to go through all the code calling it, and instead made the change here in a single place.

@AlexMWells
Copy link
Contributor

There's now more code to support ustringhash_pod for batched. Some string variables from .osl files get passed as ustrings into liboslexec/wide functions while ustringhash is expected, I didn't find where that is set.

You might need to look at each failing call, most likely batched_llvm_gen.cpp.

@AlexMWells
Copy link
Contributor

Just one comment on converting to hashcode vs. ustrings, for "real" string operations that are not constant folded (CPU only), there will be more overhead when we do the lookup of the hashcode to real string value to perform the string op
Before:

        ustring str        = wS[lane];
        ustring sep        = wSep[lane];

After:

        ustring str        = ustring_from(wS[lane]);
        ustring sep        = ustring_from(wSep[lane]);

Counter argument was if you had dynamic string operations, good chance that is a much bigger performance penalty than a couple lookups.

@sfriedmapixar
Copy link
Contributor

Regarding moving ustrings to hashes -- the primary reason to do this is to have compiled GPU shaders hit in NVidia's compiler cache, thus it is currently of no benefit to the CPU. Because the non-simd CPU code shares a bunch of declarations for this, they ended up getting changed as well. For the specialized case of CPU SIMD code, in the absence of bugs, we've found in our renderer it makes much more sense to leave it as ustrings and avoid the expensive ustring_from transformations and have strings be easy to inspect in a debugger than to change to the slower and more difficult to use ustringhash implementation. Was this only to try and fix bugs you were seeing with hash values being passed in, or were there other reasons you had?

Also, it would be much better to separate the minimal test tool out into a separate pull request from any major ustring<->ustringhash changes.

@johnfea
Copy link
Contributor Author

johnfea commented Jun 21, 2024

Was this only to try and fix bugs you were seeing with hash values being passed in, or were there other reasons you had?

My first reason was performance. For example, hashing due to calling transform("myspace") was taking around 1% of rendering time in batched mode only. EDIT2: It wasn't unnecessary hashing I was seeing in a profiler but unnecessary hash->ustring conversions, anyway all that I could find in a profiler and some by looking at code are now removed by this patch and were partly due to wide/ code expecting user interface functions to use ustring format but they already use ustringhash.

All of OSL interface code for batched seems to already use hashes. Hashes are great to use in a switch {} from performance standpoint, where the expected cases will be compile time constants. This is much faster than doing string comparisons or map lookups with a string key. This is relevant at least in renderer implementations of get_matrices, get_attributes and when parsing closure tree and inside OSL too for example when comparing noise function names and transform spaces names, some of which could be changed to switches. Hashes might be slower when printing and doing some string operations, but is much of that being run when doing production rendering?

EDIT2: Also, ustringhash_pod can be passed in CPU registers into functions which might give a small performance benefit.

However, the reason I ended up changing it here was to fix this testcase so another reason to consider it is maintainability and code complexity. Having non-batched and batched use different string representation internally can lead to more complex and more error prone code.

Also, it would be much better to separate the minimal test tool out into a separate pull request from any major ustring<->ustringhash changes.

Agreed, I can make those a separate PR. EDIT: Done.

@johnfea
Copy link
Contributor Author

johnfea commented Jun 21, 2024

Some string variables from .osl files get passed as ustrings into liboslexec/wide functions while ustringhash is expected, I didn't find where that is set.

You might need to look at each failing call, most likely batched_llvm_gen.cpp.

Here's an example:

testsuite/array-reg/test_varying_index_string.osl:

string indirectR = rarray[varyingIndexR];
Cout[0] = stoi(indirectR)/256.0;

indirectR gets passed as ustring into stoi() in batched mode. Here's some relevant debug output with OSL_DEV_ONLY and some of my debug prints:

llvm_alloca indirectRNAME: indirectR
string
 n=1 t.size()=8 as VARYING
...
call llvm_gen_generic
generic handle symbol: indirectR
...
lvm_gen_generic osl_b8_AVX2_stoi_WiWs
>>return value is pointer osl_b8_AVX2_stoi_WiWs
....pushing $tmp5 as void_ptr
....pushing indirectR as void_ptr

What I'm looking for here is where in OSL codebase variable indirectR gets set to ustring instead of ustringhash.
Some of the places I looked at are:

batched_backendllvm.cpp: llvm_alloca()
batched_llvm_instance.cpp: BatchedBackendLLVM::llvm_assign_initial_value()

@johnfea johnfea changed the title Add testminimal, a new testsuite case and make batched use ustringhash_pod Make batched use ustringhash_pod Jun 21, 2024
@johnfea
Copy link
Contributor Author

johnfea commented Jun 21, 2024

Just one comment on converting to hashcode vs. ustrings, for "real" string operations that are not constant folded (CPU only), there will be more overhead when we do the lookup of the hashcode to real string value to perform the string op

Yes. I already commented this little from performance standpoint above, but additionally: This patch currently uses real ustrings for llvm_gen_printf() variadic arguments just because changing Strutil::vsprintf to support hashes was non-trivial. I'm not sure if this always works, however, unless all of those arguments are always are created in llvm_gen_printf().

But if some mixing can work, then it would be possible to use real ustrings for strings ops or for some string ops on CPU and hashes everywhere else.

@johnfea
Copy link
Contributor Author

johnfea commented Jun 23, 2024

testsuite/array-reg/test_varying_index_string.osl:

I did manage to narrow this one down. The problem is that string array constants are stored as char* pointers or ustrings and then passed as i64 that are actually pointers through everything into stoi().

In non-batched mode, constant allocation happens at llvm_create_constant(), there's even TODO there to use hashes instead. I didn't find where the same thing happens for batched mode, one way to a fix would be to use hashes instead there.

Another way to a possible fix would be to do conversion to hash at BatchedBackendLLVM::llvm_store_value(), non-batched mode already does one conversion like that there. Not sure about what code would achieve that, non-batched mode conversion looked like pointer cast and applying it just seemed to brake the format which was already correct, i64.

Another possibility is to do the conversion to hash at BatchedBackendLLVM::llvm_load_value() which gets called for that const symbol before llvm_store_value(), which also calls llvm_get_pointer() for that symbol.

EDIT: Also, everything I'm using in OSL seems to already work with in batched mode with this patch so I'm no longer in a hurry to get this working. I also tried a smaller patch closurefix that only changes to use ustringhash in batched llvm_gen_closure() llvm::Value* loaded = rop.llvm_load_value() but that didn't work.

@johnfea johnfea force-pushed the testminimal branch 2 times, most recently from 82dfc10 to 2199e96 Compare June 24, 2024 08:19
@johnfea johnfea changed the title Make batched use ustringhash_pod Make batched use ustringhash_pod and make batched compile in Windows Jul 4, 2024
@johnfea
Copy link
Contributor Author

johnfea commented Jul 4, 2024

Now this patch also makes batched OSL compile in Windows with MSVC and Mingw and run at least with Mingw, attempting to fix #1645. There's overlap with make batched use ustringhash_pod patch so they are currently a combined patch.

This part is not exactly ready since it disables functionality, but it does run and highlights the problem parts.

It looks like code in liboslexec/wide depends on Linux way of loading symbols, where loaded shared lib can access symbols from loading shared lib, which somehow differs from how Windows handles this. Unless there's some kind of linker options that solves this, I think a solution would be to make oslexec.dll supply wide dlls with callback functions for e.g. error_fmt() and few other functions that are called this way. As is, this patch just comments out those calls from liboslexec/wide.

@lgritz
Copy link
Collaborator

lgritz commented Sep 3, 2024

I'm not sure I understand why this is failing CI. Can you rebase to current main and try pushing again to trigger a CI run?

@johnfea
Copy link
Contributor Author

johnfea commented Sep 3, 2024

"Make batched use ustringhash_pod" patch works but is still missing string array constant support, that alone fails CI. I would really ask help from someone who knows the code better to fix that. See discussion above.

"make batched compile in Windows" patch works but disables features which likely fail CI. It probably belongs in a separate patch that is worked on only after "make batched use ustringhash_pod" has been merged.

@lgritz
Copy link
Collaborator

lgritz commented Nov 10, 2024

What's the status here?

@johnfea
Copy link
Contributor Author

johnfea commented Nov 12, 2024

Status unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants