Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should 128-bit bit-shift/rotation operators be added? #5

Closed
alexcrichton opened this issue Aug 14, 2024 · 5 comments
Closed

Should 128-bit bit-shift/rotation operators be added? #5

alexcrichton opened this issue Aug 14, 2024 · 5 comments

Comments

@alexcrichton
Copy link
Collaborator

This was bought up at the last CG meeting and wasn't originally evaluated for this proposal. The question is if 128-bit shift-and-rotate operators should be added (IIRC, please correct me if I'm wrong). This would perhaps be i64.{shl,shr_s,shr_u,rotl,rotr}128 for example.

Performance and generated code should be evaluated for these operations today in comparison with what native platforms do. Ideally a benchmark or microbenchmark could be created to compare before/after performance of hypothetical operations.

@alexcrichton
Copy link
Collaborator Author

Digging into this a bit, here's an example of what various native platforms generate for bit-shifting operations. This is the wasm LLVM emits, lightly hand-edited to return two i64 values instead of storing them to memory, and that also showcases what Wasmtime generates for native code today.

Given this it looks like there's significant room for improvement either in Wasmtime or the possibility of adding these operations to this proposal itself. It would be best to validate with a benchmark, however, that these operations are indeed significantly faster with the native lowerings to justify adding them.

@alexcrichton
Copy link
Collaborator Author

Actually no I take back what I said, the assemblies of aarch64 and riscv64 look pretty similar to what the native wasm produces. Only x86_64 seems significantly smaller here through its use of shld and shrd which are not currently pattern-matched by Wasmtime.

So I would update my hypothesis here to aarch64/riscv64 are unlikely to show improvements and x86_64 will likely show improvements for Wasmtime, and maybe other engines too. This should of course be verified, however.

@alexcrichton
Copy link
Collaborator Author

The numbers in #2 (comment) sort of confirm the above hypothesis. On x64 wasm is ~100% slower than native but on aarch64 it's 35% slower than native (numbers for Wasmtime). The 35% number is likely more in the ballpark of "the general delta between Wasmtime and native" rather than specifically related to the shift benchmark in question.

On investigation of the native x64 benchmark though the source code for the algorithm doesn't use simd but the generated code is using vector shift/shuffle instructions. I didn't see sh{l,r}d in the disassembly so looks like LLVM is being quite clever here.

@alexcrichton
Copy link
Collaborator Author

Upon investigating this more I've confirmed that LLVM, for native, is lowering to SIMD bits for the core left/right shift algorithms. If I enable +simd128 for wasm it also uses simd bits as well. The difference here appears to be how LLVM unrolls the loop once on native but not on wasm. Overall it appears that there's not much related to this proposal itself in terms of arithmetic but rather matching performance here would be a combination of improved codegen in LLVM along with possible changes on the simd side of things, which at least for me is out of scope of this proposal.

alexcrichton added a commit that referenced this issue Sep 20, 2024
Summarize #5 and write down some words for this in the overview.
@alexcrichton
Copy link
Collaborator Author

I'm going to close this as "no" with the summary here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant