Should 128-bit bit-shift/rotation operators be added? #5

alexcrichton · 2024-08-14T15:17:41Z

This was brought up at the last CG meeting and wasn't originally evaluated for this proposal. The question is whether 128-bit shift-and-rotate operators should be added (IIRC, please correct me if I'm wrong). These might be spelled i64.{shl,shr_s,shr_u,rotl,rotr}128, for example.
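To make the question concrete, here is a minimal sketch of what such operators might compute, written in Rust against u128. Everything in it is an assumption for illustration: the function names only mirror the spelling floated above, the (lo, hi) operand order is a guess, and taking the shift amount modulo 128 is assumed by analogy with wasm's existing 64-bit shifts. None of this is specified anywhere.

```rust
// Hypothetical semantics only; not part of any proposal text.
// A 128-bit value is passed and returned as a (lo, hi) pair of 64-bit halves.

fn join(lo: u64, hi: u64) -> u128 {
    ((hi as u128) << 64) | (lo as u128)
}

fn split(v: u128) -> (u64, u64) {
    (v as u64, (v >> 64) as u64)
}

// Assumption: the shift amount wraps at 128 (wrapping_shl/shr mask it to 7 bits).
pub fn shl128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    split(join(lo, hi).wrapping_shl(amt))
}

// Logical (zero-filling) right shift.
pub fn shr_u128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    split(join(lo, hi).wrapping_shr(amt))
}

// Arithmetic (sign-filling) right shift, done on the signed view of the value.
pub fn shr_s128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    split((join(lo, hi) as i128).wrapping_shr(amt) as u128)
}

pub fn rotl128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    split(join(lo, hi).rotate_left(amt))
}

pub fn rotr128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    split(join(lo, hi).rotate_right(amt))
}
```

Under these assumptions, shifting the value 1 left by 64 moves the set bit from the low half into the high half, and rotating a value with only the top bit set left by 1 wraps that bit around to the bottom.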

Performance and generated code should be evaluated for these operations today in comparison with what native platforms do. Ideally a benchmark or microbenchmark could be created to compare before/after performance of hypothetical operations.

alexcrichton · 2024-08-14T19:44:24Z

Digging into this a bit, here's an example of what various native platforms generate for bit-shifting operations. The example uses the wasm that LLVM emits, lightly hand-edited to return two i64 values instead of storing them to memory, and it also shows what Wasmtime generates for native code today.

Given this, it looks like there's significant room for improvement, either by improving Wasmtime's code generation or by adding these operations to this proposal itself. Before adding them, however, it would be best to validate with a benchmark that the native lowerings are indeed significantly faster.

alexcrichton · 2024-08-14T19:49:35Z

Actually, no, I take back what I said: the aarch64 and riscv64 assembly looks pretty similar to what Wasmtime produces from the wasm. Only x86_64 seems significantly smaller, through its use of shld and shrd, which Wasmtime does not currently pattern-match.

So I would update my hypothesis here: aarch64/riscv64 are unlikely to show improvements, while x86_64 will likely show improvements for Wasmtime, and maybe for other engines too. This should of course be verified, however.
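As a rough illustration of why x86_64 stands out, here is a sketch of a 128-bit left shift built only from 64-bit operations, which is approximately the shape a wasm producer has to emit today. It is not the LLVM output from the example, and the function name and the choice to wrap the shift amount at 128 are assumptions. The expression marked below is the double-width ("funnel") shift that x86_64 shld computes in a single instruction for shift amounts under 64.

```rust
// Sketch only: a 128-bit left shift over (lo, hi) halves using 64-bit ops.
pub fn shl128_via_u64(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    let amt = amt & 127; // assumption: the shift amount wraps at 128
    if amt == 0 {
        // Handled separately because `lo >> 64` below would overflow.
        (lo, hi)
    } else if amt < 64 {
        // Funnel shift: the high half takes in the bits shifted out of the
        // low half. This is the pattern `shld` implements directly.
        let new_hi = (hi << amt) | (lo >> (64 - amt));
        (lo << amt, new_hi)
    } else {
        // The low half moves entirely into the high half.
        (0, lo << (amt - 64))
    }
}
```

An engine that does not recognize the middle branch as one funnel shift ends up emitting two shifts and an or for it, in addition to the branching or selects around it.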

alexcrichton · 2024-08-28T20:36:51Z

The numbers in #2 (comment) sort of confirm the above hypothesis. On x64, wasm is ~100% slower than native, but on aarch64 it's 35% slower than native (numbers for Wasmtime). The 35% number is likely more in the ballpark of "the general delta between Wasmtime and native" than something specific to the shift benchmark in question.

On investigation of the native x64 benchmark, though, the source code for the algorithm doesn't use simd, but the generated code uses vector shift/shuffle instructions. I didn't see sh{l,r}d in the disassembly, so it looks like LLVM is being quite clever here.

alexcrichton · 2024-09-09T14:29:32Z

Upon investigating this more, I've confirmed that LLVM, for native, is lowering to SIMD bits for the core left/right shift algorithms. If I enable +simd128 for wasm, it uses simd bits as well. The difference here appears to be that LLVM unrolls the loop once on native but not on wasm. Overall it appears that little of this relates to the arithmetic in this proposal itself. Matching native performance would instead take a combination of improved codegen in LLVM and possible changes on the simd side of things, which at least for me is out of scope of this proposal.
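For context on why a scalar shift loop can end up as vector code, here is a hypothetical stand-in for that kind of kernel: a left shift over a little-endian multi-word integer. This is not the benchmark's actual source, and the name shl_limbs is made up for illustration. Each output word depends only on the original values of two adjacent input words, which is the kind of loop a compiler may choose to vectorize or unroll.

```rust
// Hypothetical kernel, not the benchmark's source.
// Shifts `limbs` (least significant word first) left by `amt` bits in place
// and returns the bits shifted out of the most significant word.
pub fn shl_limbs(limbs: &mut [u64], amt: u32) -> u64 {
    // Restricting to 1..=63 keeps `64 - amt` a valid shift amount.
    assert!(amt > 0 && amt < 64);
    let mut carry = 0u64;
    for limb in limbs.iter_mut() {
        // Bits that leave this word and enter the next one.
        let next = *limb >> (64 - amt);
        *limb = (*limb << amt) | carry;
        carry = next;
    }
    carry
}
```

Whether a given compiler and target actually vectorizes a loop like this depends on its cost model and enabled features, so treat this only as the shape of the code under discussion.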

alexcrichton · 2024-10-08T17:37:07Z

I'm going to close this as "no" with the summary here

alexcrichton added a commit that referenced this issue Sep 20, 2024

Summarize discussion of shifting operators in Overview

b89c0ce

Summarize #5 and write down some words for this in the overview.

alexcrichton mentioned this issue Sep 20, 2024

Summarize discussion of shifting operators in Overview #17

Merged

alexcrichton closed this as completed Oct 8, 2024
