Feature request
It would be very nice to be able to implement one's own batching logic.
Motivation
AFAIK, the adaptive batching algorithm functions as a black box. The parameters (max batch size and max latency) offer very limited control over it.
For instance, I have a use case where the inputs vary greatly in length, and batching them together does not make sense. In some cases it is even slower than processing them sequentially.
I would love to be able to batch together only inputs that are close in length, by writing my own logic.
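For example, here is a minimal sketch of the kind of policy I have in mind (the helper and its `tolerance` parameter are hypothetical, not an existing API):

```python
from typing import Any, List

def bucket_by_length(inputs: List[Any], tolerance: int = 16) -> List[List[Any]]:
    """Hypothetical policy: group inputs whose lengths differ by at most `tolerance`."""
    buckets: List[List[Any]] = []
    for item in sorted(inputs, key=len):
        # Inputs are visited in increasing length order, so comparing against the
        # first element of the current bucket bounds the length spread of the bucket.
        if buckets and len(item) - len(buckets[-1][0]) <= tolerance:
            buckets[-1].append(item)
        else:
            buckets.append([item])
    return buckets
```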
Other
No response
Hi, if you are still interested in this, we have made some changes to the batching recently.
Now the batch might be split into smaller pieces to fit the max batch size.
For example, if max_batch_size is 10 and we send 3 requests of sizes [7, 7, 6] in receiving order, the batch engine will execute batches of 7+3 and 4+6 sequentially; the second request is split into two parts of sizes (3, 4).
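In code, the splitting works roughly like this (a simplified sketch for illustration, not the actual engine implementation):

```python
from typing import List

def split_into_batches(request_sizes: List[int], max_batch_size: int) -> List[List[int]]:
    """Fill each batch up to max_batch_size, splitting a request when it does not fit."""
    batches: List[List[int]] = []
    current: List[int] = []
    room = max_batch_size
    for size in request_sizes:
        while size > 0:
            take = min(size, room)  # take as much of the request as fits
            current.append(take)
            size -= take
            room -= take
            if room == 0:  # batch is full, start a new one
                batches.append(current)
                current, room = [], max_batch_size
    if current:
        batches.append(current)
    return batches

# split_into_batches([7, 7, 6], 10) -> [[7, 3], [4, 6]]
```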
@frostming
Thanks for keeping me updated :)
If I understand correctly, your comment applies to the case where the client sends data in batches. In that case, the batches are now rebuilt to better fit the max batch size argument.
This is not the feature I wish existed. I would really like more options to customize the way batches are made, for instance only batching together inputs that share the same metadata field. This is key to getting the best performance in some cases, like the example in my original post.
Maybe it is already possible to do something by subclassing the class that handles batching? It is not mentioned in the documentation.
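Something like the following is what I am imagining; every name here is hypothetical, since I could not find such a hook in the docs:

```python
from itertools import groupby
from typing import Any, Dict, List

def group_key(request: Dict[str, Any]) -> str:
    # Hypothetical: the metadata field that decides which requests may share a batch.
    return request["metadata"]["language"]

def make_batches(pending: List[Dict[str, Any]], max_batch_size: int) -> List[List[Dict[str, Any]]]:
    """Hypothetical user-supplied hook: only requests sharing the key are batched together."""
    batches: List[List[Dict[str, Any]]] = []
    for _, group in groupby(sorted(pending, key=group_key), key=group_key):
        members = list(group)
        # Respect max_batch_size within each metadata group.
        for i in range(0, len(members), max_batch_size):
            batches.append(members[i:i + max_batch_size])
    return batches
```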