How to configure a text-to-speech model forced_token_ids? #210

Closed
rhcarvalho opened this issue May 19, 2023 · 3 comments

Comments

@rhcarvalho

Thanks for Bumblebee and the provided examples! I'm trying out the example at https://github.com/elixir-nx/bumblebee/blob/main/examples/phoenix/speech_to_text.exs.

It works well for audio input in English. For audio input in other languages, it seems to be automatically translating the output to English.

I read https://huggingface.co/openai/whisper-tiny#usage and, if I understood it correctly, I'd need to use forced_token_ids to specify the input language and to set the task to transcribe rather than translate, as in:

from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe")  ## <<<<<

# ...

predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids)
['<|startoftranscript|><|fr|><|transcribe|><|notimestamps|> Un vrai travail intéressant va enfin être mené sur ce sujet.<|endoftext|>']
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
[' Un vrai travail intéressant va enfin être mené sur ce sujet.']

How can I do that with Bumblebee?

@josevalim (Contributor)

Not possible yet, see #187. :)

@jonatanklosko (Member)

@rhcarvalho you can customize forced_token_ids, see #107 (comment). We want to streamline this in #187 with higher-level options :)

@rhcarvalho (Author)

@jonatanklosko 👏 thanks for the pointer! I think the argument types have changed since then, as the original example in that comment throws an error. This is what worked for me, in case someone ends up checking this issue for a solution:

diff --git examples/phoenix/speech_to_text.exs examples/phoenix/speech_to_text.exs
index 99f72cb..94e8989 100644
--- examples/phoenix/speech_to_text.exs
+++ examples/phoenix/speech_to_text.exs
@@ -314,6 +314,15 @@ end
 {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
 {:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

+generation_config = %{
+  generation_config
+  | forced_token_ids: [
+      {1, Bumblebee.Tokenizer.token_to_id(tokenizer, "<|pt|>")},
+      {2, Bumblebee.Tokenizer.token_to_id(tokenizer, "<|transcribe|>")},
+      {3, Bumblebee.Tokenizer.token_to_id(tokenizer, "<|notimestamps|>")}
+    ]
+}
+
 serving =
   Bumblebee.Audio.speech_to_text(model_info, featurizer, tokenizer, generation_config,
     compile: [batch_size: 10],
