Multibyte access to `(array i8)` #395

wingo · 2023-07-07T09:10:53Z

Use case: Your language allows users to define new packed struct types at runtime. Your language toolchain targets wasm/gc. You use an (array i8) to represent the backing store for those packed structs.

Problem: For multi-byte loads you have to emit multiple array.get_s or array.get_u calls and then combine the bytes appropriately. This is inefficient, which is enough of a motivation to add multibyte accessors. There should be something to load and store u16 and i16 from arbitrary offsets in an (array i8), as well as i32, i64, f32, f64, and i128.
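For illustration, here is a minimal sketch of the byte-combining a toolchain has to emit today, modeled in Python with a `bytearray` standing in for the `(array i8)`. The helper names and the choice of little-endian order are assumptions made for this example; `struct.unpack_from` only plays the role of a multibyte accessor, whose actual shape has not been specified in this thread.

```python
import struct

def load_u32_bytewise(buf: bytearray, offset: int) -> int:
    """Emulate four single-byte array.get_u reads combined into one u32."""
    b0 = buf[offset]
    b1 = buf[offset + 1]
    b2 = buf[offset + 2]
    b3 = buf[offset + 3]
    # Little-endian combination (an assumption for this sketch).
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)

def load_u32_multibyte(buf: bytearray, offset: int) -> int:
    """Stand-in for one multibyte load at an arbitrary, unaligned offset."""
    return struct.unpack_from("<I", buf, offset)[0]

backing = bytearray([0x00, 0x78, 0x56, 0x34, 0x12, 0xFF])
value = load_u32_bytewise(backing, 1)
```

Both helpers return the same value for the same offset; the difference is four loads plus shifts and ors versus a single access.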

However I just realized another motivation for multibyte accessors: byte-by-byte access is potentially incorrect in the presence of threads and mutation. Unlike naturally-aligned access to memory in the MVP, access to (array i8) contents with MVP GC ops will tear. Not sure what to do about that: whether to specify that naturally aligned multibyte access does not tear (perhaps with an exception for i128), or whether to ensure atomic access only via specific atomic operations. In any case there is some design work to do here.
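The tearing hazard can be made concrete with a deterministic sketch. It does not use real threads: a callback, which is an assumption of this example, stands in for another thread's store landing between the two single-byte reads.

```python
def torn_read_u16(buf: bytearray, offset: int, interleaved_write) -> int:
    """Read a u16 as two byte loads, letting a writer run between them."""
    low = buf[offset]            # first single-byte read
    interleaved_write()          # another thread's store lands here
    high = buf[offset + 1]       # second single-byte read
    return low | (high << 8)

shared = bytearray([0x11, 0x11])     # currently holds 0x1111

def writer() -> None:
    # The writer intends to replace the value with 0x2222.
    shared[0] = 0x22
    shared[1] = 0x22

observed = torn_read_u16(shared, 0, writer)
```

The reader ends up with a value that is neither the old contents nor the new ones, which is the outcome a non-tearing multibyte access would rule out.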

titzer · 2023-07-07T13:23:38Z

I've been ruminating on this in the background and I've started to think that we should consider a post-MVP feature that allows regular load/store instructions (of which we now have several dozen) to apply to either on-heap GC arrays or vice versa: to allow slices of linear memory to be viewable through (i.e. aliased by) GC arrays. We have bits available in the memarg immediate of loads and stores. Would this fit your use case?

vouillon · 2023-07-07T15:21:52Z

Another post-MVP feature I would be interested in is to be able to access JavaScript typed arrays (or ArrayBuffer objects) directly from WebAssembly. (Sure, one can access the linear memory as an ArrayBuffer, but it is not as convenient as first-class garbage collected objects.) I was concerned that this would involve adding a lot of new instructions. But using regular load/store instructions could be a solution.

wingo · 2023-07-10T08:04:12Z

> I've been ruminating on this in the background and I've started to think that we should consider a post-MVP feature that allows regular load/store instructions (of which we now have several dozen) to apply to either on-heap GC arrays or vice versa: to allow slices of linear memory to be viewable through (i.e. aliased by) GC arrays. We have bits available in the memarg immediate of loads and stores. Would this fit your use case?

Oh I like this! A couple thoughts:

- This could be an alternate way to get to 64-bit array access offsets; but then array.len is 32-bit. Dunno, probably not worth exploring, but needs to be kept in mind.
- There is overlap with the WebGPU use case; perhaps in LLVM toolchains one could associate an (array i8) with an address space for some code segment, and then memory operations on that address space would go through the arrayref. See Proposal: Fine grained control of memory design#1439.
- Probably needs to be restricted to array i8; if you include (array i32) you would then depend on byte order. I suppose you could extend to array-of-struct-with-only-numeric-fields but then we are in the realm of value types rather than reference types.

osa1 · 2023-07-11T08:34:40Z

We have a similar use case in dart2wasm. Dart standard library has a few typed array types like Float32List, Int64List that store the elements unboxed. These types use a byte buffer type (ByteBuffer) to store the elements, and a ByteBuffer can be shared with different lists with different element types, and the lists that share the same ByteBuffer can even use different start offsets as the index of the first element. (for example x[0] reads a 32-bit int at 0x1000, y[0] which shares the same byte buffer reads a 64-bit float at 0x1001)
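A rough model of two lists sharing one byte buffer with different element types and start offsets follows. `TypedView` is a hypothetical class written for this example, not the Dart API, and little-endian layout is assumed.

```python
import struct

class TypedView:
    """Hypothetical typed list over a shared byte buffer (not the Dart API)."""

    def __init__(self, buffer: bytearray, fmt: str, start: int):
        self.buffer = buffer
        self.fmt = "<" + fmt                  # little-endian is an assumption
        self.start = start                    # byte offset of element 0
        self.size = struct.calcsize(self.fmt)

    def __getitem__(self, i: int):
        return struct.unpack_from(self.fmt, self.buffer, self.start + i * self.size)[0]

    def __setitem__(self, i: int, value) -> None:
        struct.pack_into(self.fmt, self.buffer, self.start + i * self.size, value)

shared = bytearray(16)
x = TypedView(shared, "i", 0)   # 32-bit ints starting at byte 0
y = TypedView(shared, "d", 1)   # 64-bit floats starting at byte 1 (unaligned)

y[0] = 1.5                      # writes bytes 1..8 of the shared buffer
```

After the store through `y`, reads through `x` observe the overlapping bytes, so neither view can assume its elements are naturally aligned in the backing store.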

Currently for ByteBuffer we use array i8, which leads to extremely slow code even in the common case where a list is used directly (not via a view) because of the single-byte reads and writes. One benchmark for typed array performance runs at 23% of the same program compiled to JS using JS typed array API.

Multi-byte reads and writes to array i8s would solve the problem with views, but we also want to be able to share these arrays with JS. Post-MVP JS API for GC arrays may help with this, but @titzer's idea of linear-memory-backed arrays would also solve it nicely and as efficiently as possible. An additional benefit is that it would also make it possible to share these arrays with other linear memory programs (e.g. C++ compiled to Wasm with emscripten, sharing the same linear memory with dart2wasm-generated program, sharing the malloc/free implementations).

(Sharing GC references with linear memory application is also possible, but the language-level types for these references need to be more restrictive compared to a type representing a linear memory address, which can just be a uint8_t*.)

Currently, the best we can do for the common case of using a list directly (not via a view), while still implementing the Dart typed data API (e.g. the ability to get the byte buffer of a list and use it as storage for another list with a different element type, possibly at an offset into the buffer), is to implement multiple ByteBuffer subclasses, each with a differently typed array:

```dart
class _I32ByteBuffer implements ByteBuffer {
  final WasmIntArray<WasmI32> _data; // array i32
  ...
}

class _F64ByteBuffer implements ByteBuffer {
  final WasmFloatArray<WasmF64> _data; // array f64
  ...
}
```

We need one such class for each of i8, i16, i32, i64, f32, and f64, plus the SIMD types f64x2, f32x4, i32x4, and i64x2.

A ByteBuffer implementation for e.g. f64 will have efficient read_f64 and write_f64 iff the offset into the Wasm GC array is also a multiple of the element size. All the other read/write methods (and f64 accesses where the offset is not a multiple of the element size) need to read/write one byte at a time, which is better for code size because we can inherit these methods from a base class. Alternatively, we can improve specific cases, such as reading an i32 when the array type is i64, or doing two i32 reads when the array type is i32 and read_i64 is called.
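The aligned fast path and the byte-at-a-time fallback can be sketched as follows. `F64ByteBuffer` here is a hypothetical Python model, with `array("d")` standing in for `(array f64)`; it is not the dart2wasm implementation, and little-endian byte order is assumed.

```python
import struct
from array import array

class F64ByteBuffer:
    """Hypothetical model of a byte buffer backed by an f64 array."""

    def __init__(self, length_in_doubles: int):
        self.data = array("d", [0.0] * length_in_doubles)  # stands in for (array f64)

    def read_f64(self, byte_offset: int) -> float:
        if byte_offset % 8 == 0:
            # Fast path: the offset maps to exactly one element.
            return self.data[byte_offset // 8]
        # Slow path: reassemble the value from eight individual bytes.
        raw = bytes(self._read_byte(byte_offset + i) for i in range(8))
        return struct.unpack("<d", raw)[0]

    def _read_byte(self, byte_offset: int) -> int:
        # Extract one byte from the element that contains it.
        element = struct.pack("<d", self.data[byte_offset // 8])
        return element[byte_offset % 8]

buf = F64ByteBuffer(2)
buf.data[0] = 2.5
buf.data[1] = -1.0
```

Only `read_f64` at offsets 0 and 8 hits the single-element path in this model; every other offset goes through eight byte extractions.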

However, (1) there will be a lot of code, (2) ByteBuffer accesses will have to be virtualized, and (3) this doesn't solve the problem with sharing these arrays with JS (and maybe also with linear-memory applications).

rossberg · 2023-07-11T08:54:37Z

Just to mention it: an alternative that we have thrown around in the past was to have a form of reinterpret cast on (transparent) array types, such that array(i8) can be viewed as array(f64) etc. That may be useful for other purposes. But it might add some extra complexity around unaligned array sizes.
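As a loose analogy for the size problem, Python's `memoryview.cast` reinterprets a byte buffer as wider elements and refuses when the byte length is not a multiple of the element size. This is only an illustration of the concern, not a model of how a Wasm reinterpret cast would behave; the helper names are made up for the example, and `cast` uses the host's native byte order.

```python
def view_as_f64(buf: bytearray) -> memoryview:
    """Reinterpret a byte buffer as 8-byte floats without copying."""
    return memoryview(buf).cast("d")

def can_view_as_f64(buf: bytearray) -> bool:
    """Return True if the buffer length permits an f64 view."""
    try:
        view_as_f64(buf)
        return True
    except TypeError:
        # Raised when the length is not a multiple of the element size.
        return False

aligned = bytearray(16)
floats = view_as_f64(aligned)
floats[1] = 3.0                 # writes through to bytes 8..15 of `aligned`
```

A 16-byte buffer yields two elements, while a 10-byte buffer cannot be viewed at all; a Wasm design would have to choose between rejecting such casts and truncating the view.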

osa1 · 2023-07-11T11:49:05Z

Would multi-byte read and write instructions for array i8 cause any redundancy in the current MVP GC spec? At least for aligned reads I think a 4-byte read from an array i8 will have the same runtime performance as a read from array i32, so it seems like other array types with unboxed elements become less useful.

titzer · 2023-12-18T19:31:50Z

> Just to mention it: an alternative that we have thrown around in the past was to have a form of reinterpret cast on (transparent) array types, such that array(i8) can be viewed as array(f64) etc. That may be useful for other purposes. But it might add some extra complexity around unaligned array sizes.

I think we want to avoid exposing the byte order of array elements and struct fields, so I'd be fine restricting the scope of this feature to array i8.
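To make the byte-order concern concrete: the same 32-bit element produces different byte sequences depending on how it is laid out in memory. The two layouts below are shown with explicit `struct` format prefixes, purely as an illustration of what a byte-level view of a wider element would reveal.

```python
import struct

value = 0x11223344
little = struct.pack("<I", value)   # layout with the least significant byte first
big = struct.pack(">I", value)      # layout with the most significant byte first

first_byte_little = little[0]
first_byte_big = big[0]
```

A byte read at offset 0 returns 0x44 under one layout and 0x11 under the other, so allowing byte access to anything wider than i8 would make the layout observable.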

Another possibility that I've ruminated on is to allow "pinning" all or part of an array i8 by temporarily binding a memory declaration to it. For example, suppose a module declares a separate, empty, but "pinnable" memory, and we introduce instructions memory.pin_array(a: array i8, offset: i32/64, length: i32/i64) and memory.unpin. The semantics of memory.pin_array would be to update mutable state in the instance to refer to the specified inner portion of the array. Then all load and store instructions that target that memory can actually work on the raw storage of the array. This has the advantage of not introducing new immediate flags for instructions related to memories. (Though these memories would be byte-sized, rather than page-sized, and cannot be grown).
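A toy model of the pinning idea, under stated assumptions: `PinnableMemory` and its method names are hypothetical, Python exceptions stand in for Wasm traps, and only an i32 load/store pair is modeled. It is meant to show the intended semantics (addresses relative to the pinned window, bounds checked against the window length), not a proposed design.

```python
import struct

class PinnableMemory:
    """Toy model of a byte-sized memory that can be bound to part of an array."""

    def __init__(self):
        self._window = None  # nothing pinned initially: the memory is empty

    def pin_array(self, arr: bytearray, offset: int, length: int) -> None:
        if offset < 0 or length < 0 or offset + length > len(arr):
            raise IndexError("pinned range out of bounds")  # would trap in Wasm
        # The window aliases the array's storage; no bytes are copied.
        self._window = memoryview(arr)[offset:offset + length]

    def unpin(self) -> None:
        self._window = None

    def _check(self, addr: int, size: int) -> memoryview:
        if self._window is None or addr < 0 or addr + size > len(self._window):
            raise IndexError("out-of-bounds memory access")
        return self._window

    def i32_store(self, addr: int, value: int) -> None:
        struct.pack_into("<i", self._check(addr, 4), addr, value)

    def i32_load(self, addr: int) -> int:
        return struct.unpack_from("<i", self._check(addr, 4), addr)[0]
```

With a 16-byte array pinned at offset 4 for 8 bytes, a store to address 1 lands in bytes 5..8 of the array, and any access after `unpin` fails because the memory is empty again.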

yuri91 · 2024-07-29T13:41:50Z

We have a similar use case in CheerpJ (JVM running in the browser):

C++ JNI code that accesses Java byte arrays expects to be able to do arbitrarily sized loads and stores on them.

Currently we compile Java bytecode to JavaScript, but we would like to eventually target WasmGC.

This issue and the inability to expose GC arrays as JS typed arrays (or being able to pass them directly to Web APIs) are the main roadblocks.

mkustermann · 2024-10-03T09:20:46Z

Another use case where this would be very helpful:

When compiling to WasmGC instead of JavaScript, one may use WasmGC arrays instead of JS typed data, as WasmGC arrays are faster to allocate and operate on inside wasm code. On some occasions, though, one has to pass data across the boundary to JavaScript.

=> Right now there's no efficient way to memcpy from WasmGC arrays to JS typed arrays (also not to linear memory). One has to do that byte-by-byte
=> If we had multi-byte access we could have much faster bulk copy of data - load&store 8 bytes at a time instead of 1. It may not be the fastest possible way to copy memory (which may be using vectorized instructions), but it would get much closer to it.
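The difference in operation count can be sketched as below. The function names are made up for this example, `bytearray` stands in for both source and destination, and the returned step count is the point of the comparison; Python itself will not show the speedup.

```python
import struct

def copy_bytewise(src: bytearray, dst: bytearray) -> int:
    """Copy one byte per step; returns the number of load/store pairs."""
    for i in range(len(src)):
        dst[i] = src[i]
    return len(src)

def copy_wordwise(src: bytearray, dst: bytearray) -> int:
    """Copy 8 bytes per step, then finish the tail one byte at a time."""
    steps = 0
    n = len(src)
    i = 0
    while i + 8 <= n:
        (word,) = struct.unpack_from("<Q", src, i)   # one 64-bit load
        struct.pack_into("<Q", dst, i, word)         # one 64-bit store
        i += 8
        steps += 1
    while i < n:                                     # remaining 0..7 bytes
        dst[i] = src[i]
        i += 1
        steps += 1
    return steps

src = bytearray(range(20))
dst_bytes = bytearray(20)
dst_words = bytearray(20)
byte_steps = copy_bytewise(src, dst_bytes)
word_steps = copy_wordwise(src, dst_words)
```

For a 20-byte source, the byte-wise loop performs 20 load/store pairs while the word-wise loop performs 6 (two 8-byte words plus a 4-byte tail).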

rossberg added the Post-MVP label (Ideas for Post-MVP extensions) Jul 11, 2023

yjbanov mentioned this issue Aug 24, 2023: [dart2wasm] finalize approach for typed data dart-lang/sdk#53345 (Open)

tlively mentioned this issue Oct 17, 2023: Post-MVP feature requests #452 (Open)

lax1dude mentioned this issue Oct 26, 2024: Efficient bulk transfer of a i8 array's contents into a JavaScript ArrayBuffer? #568 (Open)
