Add a `bytes` tensor type #55

Comments
The initial support can be for 1x1 tensors, as you can then provide multiple 1x1 input tensors to a model if you really need to get it to work.

I think the longer-term solution is the idea we discussed, but I want to clarify it slightly, as what you described above isn't quite what I had in mind. I am proposing we handle it the same way most of the ML frameworks handle it. More specifically, most frameworks allow passing a linear array that is re-interpreted based on the shape parameter. You can apply the same mapping to a `list<list>`, where each element in the outer list represents one element being mapped. A simple example is a 25-element `list<list>` being mapped to a 5x5 tensor by row-major mapping. The only thing that WIT needs to preserve is the length of the linear array. This has the benefit of being in line with how other frameworks handle mapping dimensions, and it fits all tensor types well (especially if we expand `set_input` so that it has forms for primitive types).

I think (3) is an overly strong assumption, and there are currently issues in how we handle this as well. For example, WIT currently does a copy when passing in a tensor, and since we lack a […]
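A minimal sketch of the row-major mapping described here, written in Rust rather than WIT; the function name is illustrative and not part of wasi-nn:

```rust
/// Compute the flat, row-major index of an element, the way most ML
/// frameworks re-interpret a linear array against a shape parameter.
/// Illustrative only; not part of the wasi-nn API.
fn row_major_index(shape: &[usize], coords: &[usize]) -> usize {
    assert_eq!(shape.len(), coords.len());
    coords.iter().zip(shape).fold(0, |flat, (&c, &dim)| {
        assert!(c < dim);
        flat * dim + c
    })
}

fn main() {
    // A 25-element linear array viewed as a 5x5 tensor.
    let elements: Vec<u32> = (0..25).collect();
    let shape = [5, 5];
    // The only invariant WIT would need to preserve is the array length.
    assert_eq!(elements.len(), shape.iter().product::<usize>());
    // Row 2, column 3 maps to flat index 2 * 5 + 3 = 13.
    assert_eq!(row_major_index(&shape, &[2, 3]), 13);
    assert_eq!(elements[row_major_index(&shape, &[2, 3])], 13);
}
```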
I agree that […]. The problem with changing the type of the tensor data is […].

Did I understand what you were getting at correctly? If so, we probably need to look at some of the other options above instead of changing the type of the tensor data.
Tensors are n-dimensional and can have several distinguishing factors beyond type and shape. In GPU memory environments, there may also commonly be strides. Images and video have channel and depth formats describing the shape. Leaving […]. The current […]
That is not what I was getting at at all in your 5x5x5 tensor example. The simplest way I can put it is that `list<list>` is the linear array for the `bytes` type.

Mostly agree with @shschaefer. The channel and depth formats determine the shape, but once the shape is determined, that will determine the number of bytes in the linear array. The encoding and arrangement of those bytes may be impacted by what the model expects, but as you point out, we lack sufficiently detailed tensor metadata to convey that information to the underlying system.

This convinces me that we should just leave the API as is for now and leave it up to each implementation to decide how it wants to encode that information into the input tensor. This implicitly means we won't be able to add things like range checks (i.e., if the shape is 5x5x5, then the array should have 125 elements), because there is no guarantee the number of bytes is connected to the shape of the tensor.
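As a concrete illustration of the range check mentioned above (a sketch, not spec behavior): for fixed-size element types the expected data length follows directly from the shape, which is exactly the relationship a `bytes` dtype would break.

```rust
// Sketch of a shape-based size check that is possible for fixed-size dtypes
// (e.g., a 5x5x5 tensor has 125 elements) but not for a `bytes` dtype, where
// the byte count has no fixed relationship to the shape. Illustrative only.
fn expected_element_count(shape: &[u32]) -> u32 {
    shape.iter().product()
}

fn expected_byte_len(shape: &[u32], bytes_per_element: u32) -> u32 {
    expected_element_count(shape) * bytes_per_element
}

fn main() {
    assert_eq!(expected_element_count(&[5, 5, 5]), 125);
    // For f32 (4 bytes per element) the data buffer must be 500 bytes long.
    assert_eq!(expected_byte_len(&[5, 5, 5], 4), 500);
}
```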
Transformer networks for LLMs take input sequences with a fixed length, so in that regard our current wasi-nn spec is sufficient. However, the data preprocessing step, where text of arbitrary length is tokenized with padding and truncation, is not covered in the spec. We had similar issues with image classification when converting an image to a tensor and needing some helper functions/metadata. We talked about options for incorporating that into the spec or leaving it to implementers in the SDKs. Maybe we should revisit this topic?
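For context, a toy sketch of the padding/truncation step being described; the function and the pad ID are hypothetical, and real tokenization is backend-specific:

```rust
// Toy illustration of padding/truncation: token IDs of arbitrary length are
// forced into the fixed sequence length the model's input tensor expects.
// Real tokenization is backend-specific and not shown here.
fn pad_or_truncate(mut token_ids: Vec<u32>, seq_len: usize, pad_id: u32) -> Vec<u32> {
    token_ids.truncate(seq_len);      // drop tokens past the fixed length
    token_ids.resize(seq_len, pad_id); // pad short sequences up to it
    token_ids
}

fn main() {
    assert_eq!(pad_or_truncate(vec![7, 8, 9], 5, 0), vec![7, 8, 9, 0, 0]);
    assert_eq!(pad_or_truncate(vec![1, 2, 3, 4, 5, 6], 5, 0), vec![1, 2, 3, 4, 5]);
}
```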
I think I have a new suggestion to add to this thread: why don't we use […]? The problem here is how to communicate the lengths of these variable-length […]
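One generic way to communicate the lengths of variable-length items inside the existing flat `u8` data buffer would be a length-prefixed encoding. The sketch below is purely hypothetical and not something the wasi-nn spec defines:

```rust
// Hypothetical length-prefixed encoding for packing N variable-length byte
// strings into a single flat `u8` buffer; purely illustrative, with no error
// handling for malformed input.
fn pack(items: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for item in items {
        // 4-byte little-endian length prefix, then the bytes themselves.
        out.extend_from_slice(&(item.len() as u32).to_le_bytes());
        out.extend_from_slice(item);
    }
    out
}

fn unpack(mut data: &[u8]) -> Vec<Vec<u8>> {
    let mut items = Vec::new();
    while data.len() >= 4 {
        let len = u32::from_le_bytes(data[..4].try_into().unwrap()) as usize;
        items.push(data[4..4 + len].to_vec());
        data = &data[4 + len..];
    }
    items
}

fn main() {
    let packed = pack(&[b"hello".as_slice(), b"wasi-nn".as_slice()]);
    assert_eq!(unpack(&packed), vec![b"hello".to_vec(), b"wasi-nn".to_vec()]);
}
```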
I would like to propose a new, fifth option, based on the fourth: […]
As @mingqiusun said, LLMs take fixed-size tokens in a sequence. This is only for supporting frameworks that have a string or object dtype for the tensor.
Your new suggestion only supports 1xN-dimensional tensors and not arbitrary shapes. It's probably still possible to embed that information into the shape field: for example, say the first entry in the shape is the number of tensor dimensions, followed by the tensor dimensions themselves, with the rest mapped as the byte length of each element in row-major order. But this means you will need language bindings to make it reasonable to use, as that is some complicated logic for constructing those calls. It might be better to do nothing than to introduce this much mapping complexity for what is probably a less common dtype.
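To make the complexity concrete, here is a sketch of that shape-field encoding (rank, then dimensions, then one byte length per element in row-major order); the function and layout are hypothetical, not anything the spec defines:

```rust
// Sketch of the shape-field encoding described above (and argued against):
// the first entry is the rank, then the dimensions, then one byte length per
// element in row-major order. Names and layout are illustrative only.
fn encode_shape(dims: &[u32], byte_lens: &[u32]) -> Vec<u32> {
    assert_eq!(
        byte_lens.len() as u32,
        dims.iter().product::<u32>(),
        "one byte length per tensor element"
    );
    let mut shape = Vec::with_capacity(1 + dims.len() + byte_lens.len());
    shape.push(dims.len() as u32);      // rank
    shape.extend_from_slice(dims);      // actual dimensions
    shape.extend_from_slice(byte_lens); // per-element byte lengths
    shape
}

fn main() {
    // A 1x3 tensor of `bytes` whose three items are 5, 2, and 9 bytes long.
    assert_eq!(encode_shape(&[1, 3], &[5, 2, 9]), vec![2, 1, 3, 5, 2, 9]);
}
```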
Quick correction on (3): I think we could solve just this issue with the minimal amount of complexity by adding `list<list>` or `list`, but my preference is to make no changes, as it isn't required for generative AI and it's not clear that those dtypes are broadly used in the ecosystem. Making no change seems in line with what the other solutions are attempting to accomplish, but avoids introducing additional complexity for something we may not even want to support. I also think the energy going into the bytes discussion would be more fruitful going into tensor metadata along the lines of what @shschaefer brought up.

@squillace This seems most relevant to the ONNX backend. You all probably have a better idea of how frequently the string dtype is used by models. Do you all see the need for this additional complexity to shoehorn in non-fixed-size dtypes?
Usually, an LLM expects input tensors in fixed shapes such as [batch, sequence, feature]. This maps well to our current spec for tensors. Maybe what is needed is a helper function such as text2tensor? But the challenge is that this conversion process is highly customizable, not exposed by all frameworks, and hard to standardize.
LLMs are not the justification for adding this type unless you are trying to support ggml string input or wrapper models around LLMs. This still reduces to the question of which dtypes we should support. Without any principles, this seems like a fairly arbitrary decision that shouldn't be made until a compelling use case demonstrates otherwise. Complicated encoding schemes to shoehorn this into the current ABI add a lot of baggage to the spec and mean they have to be supported for a very long time. Better to do nothing than to lock in some brittle, limited pattern forever.
@geekbeast's use case was the original motivation for this issue; since he feels like he can make do without a new type, let's park this issue until someone absolutely needs it. Like he mentions above, we don't want to lock in some "complicated encoding scheme," so some caution is warranted here. If anyone does end up looking at this in the future, my current take is that options 3 (…) […]
Original issue description:

Some models accept tensors whose items are `bytes`. In order to add these to `enum tensor-type`, we need to figure out how to represent these `bytes` items as tensor data, which is currently a `u8` array:

wasi-nn/wit/wasi-nn.wit, line 44 in 747d8df

Imagine the situation where a model's input is a `1x10` tensor of `bytes`; this means 10 byte arrays need to be stored in the tensor data section. Unfortunately, these byte arrays could all be of different sizes; how should the specification handle this? Some options: only `1x1` tensors are possible with `bytes`; […] or something of that nature; […] `1x10xN` […]. There might be other options — let's discuss them in this issue.

@geekbeast has floated the idea that tensor data should be represented as a `list<list<u8>>`: this way we can use the WIT/WITX type system for encoding each of the lengths of the `bytes` arrays. This has some problems: (1) what about tensors with more dimensions? We don't know how many `list<...>` wrappers we need. (2) This representation doesn't fit other tensor types well: e.g., we don't need to know that `f32` is a 4-byte `list<u8>`. (3) Coercing tensors into a specific WIT/WITX type could involve some copying; ideally we just want to be able to pass some pre-existing bytes (e.g., from a decoded image) as tensor data without additional overhead.

Your feedback is appreciated to figure this out!
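For reference, a rough Rust analogue of the two representations under discussion; the real definitions are WIT types in wasi-nn.wit, and these struct names are illustrative only:

```rust
// Illustrative Rust analogues of the two representations discussed in this
// issue; the real definitions are WIT types in wasi-nn.wit, not these structs.

// Today: tensor data is one flat byte array, so ten variable-length `bytes`
// items in a 1x10 tensor have nowhere to record their individual boundaries.
struct FlatTensor {
    shape: Vec<u32>,
    data: Vec<u8>,
}

// The floated alternative: one inner list per item, so each item's length is
// carried by the type system, at the cost of extra nesting (and potentially
// an extra copy when lowering through the ABI).
struct NestedTensor {
    shape: Vec<u32>,
    data: Vec<Vec<u8>>,
}

fn main() {
    // A 1x10 tensor of `bytes` whose items have lengths 1 through 10.
    let items: Vec<Vec<u8>> = (0..10u8).map(|i| vec![0u8; 1 + i as usize]).collect();
    let nested = NestedTensor { shape: vec![1, 10], data: items.clone() };
    // Flattening the same items loses the per-item lengths, which is exactly
    // the problem described in the issue.
    let flat = FlatTensor { shape: vec![1, 10], data: items.concat() };
    assert_eq!(nested.shape, flat.shape);
    assert_eq!(nested.data.len(), 10);
    assert_eq!(flat.data.len(), (1..=10).sum::<usize>());
}
```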