
Load features onto the GPU in batches to support arbitrarily long audio #129

Merged
merged 3 commits into main from eric/gpu-batching
Oct 24, 2023

Conversation

lachesis
Collaborator

We tested with the large model on an NVIDIA 1070 (8 GB) with default settings. We also verified that it works with chunking disabled and with language detection enabled.

  • 1 hour of 16-bit stereo PCM audio at 48kHz takes about 600MiB of CPU RAM, so there is no issue with preprocessing all of the audio at once.
  • GPU batches default to 2 chunks, but this can be tweaked (and should be for cards with more RAM).
  • Language detection only looks at the first GPU batch worth of audio.
  • This feature conflicts with translation: if long audio is submitted with translation enabled, translation is disabled on the fly and skipped.
  • Chunking really doesn't need to be disabled any more, even for relatively low RAM cards. We made the chunking memory threshold configurable in settings.py and defaulted it to 4GB.
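A minimal sketch of the batching approach described above: features for the whole file are preprocessed on the CPU up front, then handed to the GPU a few chunks at a time. Names here (`GPU_BATCH_CHUNKS`, `iter_gpu_batches`) are illustrative, not the PR's actual identifiers, and the device transfer is elided.

```python
# Default batch size from the PR description; raise this for cards with
# more VRAM. (Hypothetical name -- the real setting lives in settings.py.)
GPU_BATCH_CHUNKS = 2

def iter_gpu_batches(chunks, batch_size=GPU_BATCH_CHUNKS):
    """Yield successive batches of preprocessed feature chunks.

    Each yielded batch would be moved to the GPU, run through the model,
    and freed before the next batch is loaded, so total audio length is
    bounded by CPU RAM rather than GPU RAM.
    """
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]

# Example: 7 chunks are processed as batches of sizes 2, 2, 2, 1.
batches = list(iter_gpu_batches(list(range(7))))

# Sanity check on the first bullet's memory figure: one hour of 16-bit
# stereo PCM at 48 kHz, i.e. rate * bytes-per-sample * channels * seconds.
bytes_per_hour = 48_000 * 2 * 2 * 3600
mib = bytes_per_hour / 2**20  # ~659 MiB, consistent with "about 600 MiB"
```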

Processing all of "12 Angry Men" (1 hour 36 minutes) with the large model and beam size 5 took 656,666 ms (about 11 minutes) on a 1070 Ti, and the results were reasonable.

Thanks @richardklafter for pair programming this patch.

@kristiankielhofner kristiankielhofner merged commit dac07ff into main Oct 24, 2023
1 of 2 checks passed
@kristiankielhofner kristiankielhofner deleted the eric/gpu-batching branch October 24, 2023 15:58