Support stream input in Whisper serving and stream ffmpeg chunks #361
Closes #261.
This allows the Whisper serving to accept a stream of consecutive chunks. Importantly, this improves the `{:file, path}` input, such that we read the file in chunks using ffmpeg, rather than loading it into memory all at once.

This PR drops support for a list of inputs, such as `Nx.Serving.batched_run(MyServing, [{:file, path1}, {:file, path2}])`. This serving works on a higher level than usual, because a single chunked input is already multiple inputs to the model, so I think this is sane. Multiple inputs can be processed concurrently by calling `batched_run` from multiple processes.

I noticed a bug: when streaming with timestamps and using a `batch_size`, we would ignore small segments after every batch.

@josevalim I went with the stream of consecutive chunks and handle accumulation + overlapping internally. I think it is preferable to keep the chunking details internal and I don't see a benefit of exposing them. If we shift accumulation to the user, they would basically need to implement exactly the same logic. For ffmpeg we would do that in bumblebee anyway, because accumulation is better than duplicating ffmpeg decoding work.
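
For illustration only, here is a rough sketch of what "accumulation + overlapping" over a stream of consecutive chunks can look like. It is not the code in this PR: the `ChunkAccumulator` module, its `windows/3` function, and the use of plain lists in place of audio tensors are all assumptions made to keep the example small.

```elixir
defmodule ChunkAccumulator do
  # Hypothetical helper, not Bumblebee's implementation.
  #
  # Takes an enumerable of consecutive sample lists of arbitrary length and
  # lazily emits windows of `window` samples, where each window shares its
  # first `overlap` samples with the end of the previous window.
  def windows(chunks, window, overlap)
      when is_integer(window) and is_integer(overlap) and overlap >= 0 and window > overlap do
    step = window - overlap

    Stream.transform(
      chunks,
      # State: {buffered samples, number of leading samples already emitted}
      fn -> {[], 0} end,
      fn chunk, {buffer, seen} -> split(buffer ++ chunk, seen, window, step, []) end,
      # At the end of the input, flush the tail only if it holds new samples.
      fn {buffer, seen} ->
        tail = if length(buffer) > seen, do: [buffer], else: []
        {tail, {[], 0}}
      end,
      fn _state -> :ok end
    )
  end

  # Emit as many full windows as the buffer allows, keeping the overlap.
  defp split(buffer, _seen, window, step, acc) when length(buffer) >= window do
    split(Enum.drop(buffer, step), window - step, window, step, [Enum.take(buffer, window) | acc])
  end

  defp split(buffer, seen, _window, _step, acc), do: {Enum.reverse(acc), {buffer, seen}}
end
```

With incoming chunks of uneven size, `ChunkAccumulator.windows([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]], 4, 1) |> Enum.to_list()` yields `[[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]]`. The caller only supplies consecutive chunks, and the window size and overlap stay an internal detail, which is the trade-off argued for above.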