Support stream input in Whisper serving and stream ffmpeg chunks #361

Merged (6 commits) on Mar 12, 2024


jonatanklosko
Member

Closes #261.

This allows the Whisper serving to accept a stream of consecutive chunks. Importantly, this improves the {:file, path} input: we now read the file in chunks using ffmpeg, rather than loading it into memory all at once.

This PR drops support for a list of inputs, such as Nx.Serving.batched_run(MyServing, [{:file, path1}, {:file, path2}]). This serving works on a higher level than usual, because a single chunked input is already multiple inputs to the model, so I think this is sane. Multiple inputs can still be processed concurrently by calling batched_run from multiple processes.
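For illustration, a minimal sketch of the "call batched_run from multiple processes" pattern. The module `ConcurrentRuns` and the `run_fun` argument are hypothetical names used only here; `run_fun` stands in for whatever performs a single run, so the snippet works without a started serving.

```elixir
defmodule ConcurrentRuns do
  # Hypothetical helper, not part of Bumblebee or Nx.
  # Runs `run_fun` for every input in a separate task and
  # returns the results in the same order as the inputs.
  def run_all(inputs, run_fun) when is_function(run_fun, 1) do
    inputs
    |> Task.async_stream(run_fun, ordered: true, timeout: :infinity)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# With a serving started under the name MyServing (as in the example above),
# the call would look roughly like:
#
#     ConcurrentRuns.run_all(
#       [{:file, path1}, {:file, path2}],
#       &Nx.Serving.batched_run(MyServing, &1)
#     )
```

Each task is a separate caller of the serving, so the inputs are processed concurrently instead of being passed as one list.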

I noticed a bug: when streaming with timestamps and using a batch_size, we would ignore small segments after every batch.


@josevalim I went with a stream of consecutive chunks and handle accumulation + overlapping internally. I think it is preferable to keep the chunking details internal and I don't see a benefit in exposing them. If we shifted accumulation to the user, they would basically need to do exactly that themselves. For ffmpeg we would do that in bumblebee anyway, because accumulation is better than duplicating ffmpeg decoding work.
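To make "accumulation + overlapping" concrete, here is a rough sketch of the idea on plain lists of samples. It is not the code from this PR: the module `ChunkSketch`, the function `rechunk/3` and its parameters are hypothetical, and the actual serving works on tensors with its own chunk and context sizes. The sketch only assumes that incoming chunks are consecutive and may have arbitrary sizes.

```elixir
defmodule ChunkSketch do
  # Hypothetical illustration, not Bumblebee's implementation.
  #
  # Takes an enumerable of consecutive sample lists and lazily emits
  # windows of `chunk_size` samples, where each window shares `overlap`
  # samples with the previous one. Leftover samples that were not part
  # of any window are emitted as a final, shorter window.
  def rechunk(stream, chunk_size, overlap)
      when is_integer(chunk_size) and is_integer(overlap) and
             overlap >= 0 and overlap < chunk_size do
    step = chunk_size - overlap

    stream
    # Sentinel marking the end of the input, so the buffer can be flushed.
    |> Stream.concat([:done])
    |> Stream.transform({[], 0}, fn
      :done, {buffer, covered} ->
        # `covered` counts leading buffer samples already emitted earlier.
        if length(buffer) > covered, do: {[buffer], {[], 0}}, else: {[], {[], 0}}

      samples, {buffer, covered} ->
        emit_windows(buffer ++ samples, covered, chunk_size, step, [])
    end)
  end

  # Emits as many full windows as the buffer allows and keeps the rest.
  defp emit_windows(buffer, covered, chunk_size, step, acc) do
    if length(buffer) >= chunk_size do
      window = Enum.take(buffer, chunk_size)
      # After dropping `step` samples, the first `chunk_size - step`
      # samples left in the buffer are the overlap with this window.
      emit_windows(Enum.drop(buffer, step), chunk_size - step, chunk_size, step, [window | acc])
    else
      {Enum.reverse(acc), {buffer, covered}}
    end
  end
end

[[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
|> ChunkSketch.rechunk(4, 1)
|> Enum.to_list()
# => [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]]
```

The caller only provides consecutive chunks; buffering and window boundaries stay inside the function, which is the argument for keeping these details internal.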

Contributor

@josevalim josevalim left a comment


Tiny comments but ship as you prefer!

@jonatanklosko jonatanklosko merged commit f6791b2 into main Mar 12, 2024
2 checks passed
@jonatanklosko jonatanklosko deleted the jk-whisper-input-stream branch March 12, 2024 08:21
Successfully merging this pull request may close these issues.

Stream audio chunk by chunk to Whisper