
Documentation for exporting openai/whisper-large-v3 to ONNX #1752

Open
mmingo848 opened this issue Mar 10, 2024 · 10 comments
Labels
feature-request New feature or request onnx Related to the ONNX export

Comments

@mmingo848

mmingo848 commented Mar 10, 2024

Feature request

Hello, I am exporting the OpenAI Whisper-large-v3 model to ONNX and see that it exports several files, most importantly the encoder (encoder_model.onnx & encoder_model.onnx.data) and decoder (decoder_model.onnx, decoder_model.onnx.data, decoder_with_past_model.onnx, decoder_with_past_model.onnx.data) files. I'd also like to be able to reuse as much as possible of the following pipeline with the new ONNX files:

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

Is there documentation that explains how to tie all of these pieces together? I know transformer models go through a rather different export process, and I cannot find a clear A -> B guide on how to export this model and then perform tasks such as quantization. I see I can do the following for the tokenizer, but I'd like more insight into the rest mentioned above (how to use the separate ONNX files and how to reuse as much of the preexisting pipeline as possible).

processor.tokenizer.save_pretrained(onnx_path)

I also see I can do:

model = ORTModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)

but I cannot find documentation on how to specify where the model is exported to, which makes me think I am either missing something fairly simple or it is just not linked from the documentation.
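
Something along these lines is what I am hoping exists (just a sketch; I am assuming that calling save_pretrained on the loaded model writes the exported ONNX files to the given folder, but I have not found this spelled out in the docs):

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

# Export the PyTorch checkpoint to ONNX on load...
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3", export=True)

# ...then (assumption) write the resulting ONNX files and configs to a chosen directory.
model.save_pretrained("whisper_onnx")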

Motivation

I'd love to see further documentation on the entire export process for this highly popular model. Deployment is significantly slowed by the lack of an easy-to-find A -> B guide for exporting the model and reusing the pipeline provided for the vanilla model.

Your contribution

I am able to provide additional information to make this process easier.

@fxmarty
Contributor

fxmarty commented Mar 19, 2024

@mmingo848 You can use:

optimum-cli export onnx --help
optimum-cli export onnx --model openai/whisper-large-v3 whisper_onnx

and then use ORTModelForSpeechSeq2Seq.

Although decoder_model.onnx and decoder_with_past_model.onnx are saved in the output folder, they are not required for inference; you can just use decoder_model_merged.onnx for the decoder, which handles both the case without KV cache (first decoding step) and the case with KV cache (following decoding steps). ORTModelForSpeechSeq2Seq does not use decoder_model.onnx and decoder_with_past_model.onnx by default.
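
For example, a minimal sketch of loading the exported folder and reusing the transformers pipeline (assuming the tokenizer/preprocessor files were exported alongside the ONNX files, and using a hypothetical sample.wav):

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import AutoProcessor, pipeline

# Load the ONNX files produced by optimum-cli above.
model = ORTModelForSpeechSeq2Seq.from_pretrained("whisper_onnx")
processor = AutoProcessor.from_pretrained("whisper_onnx")

# Same transformers pipeline as with the PyTorch model.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
print(pipe("sample.wav"))  # hypothetical audio file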

Feel free to refer to:

Let me know if this documentation is helpful!

@MrRace

MrRace commented Mar 29, 2024

@fxmarty Here is the export log:

Validating ONNX model /share_model_zoo/LLM/openai/onnx_whisper-large-v3/encoder_model.onnx...
        -[✓] ONNX model output names match reference model (last_hidden_state)
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 1500, 1280) matches (2, 1500, 1280)
                -[x] values not close enough, max diff: 0.019733428955078125 (atol: 0.001)
Validating ONNX model /share_model_zoo/LLM/openai/onnx_whisper-large-v3/decoder_model.onnx...
        -[✓] ONNX model output names match reference model (logits)
        - Validating ONNX Model output "logits":
                -[✓] (2, 16, 51866) matches (2, 16, 51866)
                -[✓] all values close (atol: 0.001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.001:
- last_hidden_state: max diff = 0.019733428955078125.

You can see [x] values not close enough, max diff: 0.019733428955078125 (atol: 0.001) for "last_hidden_state". Is it normal for this to occur?

@fxmarty
Contributor

fxmarty commented Mar 29, 2024

@MrRace Yes it can happen, I would not be worried. We should improve the warning.

@MrRace

MrRace commented Apr 1, 2024


@fxmarty I exported the Whisper ONNX model files using the following command:

optimum-cli export onnx --model /share_model_zoo/LLM/openai/whisper-large-v3/ --task automatic-speech-recognition --device cuda:0 /share_model_zoo/LLM/openai/onnx_gpu_whisper-large-v3/

Under the export directory /share_model_zoo/LLM/openai/onnx_gpu_whisper-large-v3/, there are four ONNX model files: encoder_model.onnx, encoder_model.onnx_data, decoder_model.onnx, and decoder_model.onnx_data.


However, the decoder_model_merged.onnx and decoder_with_past_model.onnx files you mentioned are not present. Why?

@fxmarty
Contributor

fxmarty commented Apr 2, 2024

@MrRace You need --task automatic-speech-recognition-with-past. There should be a log during the export about this (specifying --task automatic-speech-recognition explicitly disables the KV cache).
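
For example, a command along these lines (reusing your paths; the output directory is just a placeholder) should also produce decoder_model_merged.onnx:

optimum-cli export onnx --model /share_model_zoo/LLM/openai/whisper-large-v3/ --task automatic-speech-recognition-with-past --device cuda:0 /share_model_zoo/LLM/openai/onnx_gpu_with-past-whisper-large-v3/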

@MrRace

MrRace commented Apr 2, 2024


@fxmarty Thank you very much for your response. However, after following the commands you provided, the following error occurred. How can I fix this error? Thanks again.

Validation for the model /share_model_zoo/LLM/openai/onnx_gpu_with-past-whisper-large-v3/decoder_model_merged.onnx raised: [ONNXRuntimeError] : 1 : FAIL : Load model from /b4-ai/share_model_zoo/LLM/openai/onnx_gpu_with-past-whisper-large-v3/decoder_model_merged.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model.cc:180 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9

Traceback (most recent call last):
  File "/share/opt/minicoda/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 1207, in onnx_export_from_model
    validate_models_outputs(
  File "/share/opt/minicoda/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 182, in validate_models_outputs
    raise exceptions[-1][1]
  File "/share/opt/minicoda/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 165, in validate_models_outputs
    validate_model_outputs(
  File "/share/opt/minicoda/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 233, in validate_model_outputs
    raise error
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /share_model_zoo/LLM/openai/onnx_gpu_with-past-whisper-large-v3/decoder_model_merged.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model.cc:180 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9

optimum 1.18.0
onnx 1.16.0
onnxruntime 1.17.1
onnxruntime_extensions 0.10.1
onnxruntime-gpu 1.17.1

@fxmarty
Contributor

fxmarty commented Apr 2, 2024

Yes, this was fixed in #1780, which is not yet in a release.

Please downgrade to onnx 1.15 or use optimum from source.
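
For example (assuming a pip-based environment), either of the following should work:

pip install onnx==1.15.0

or

pip install git+https://github.com/huggingface/optimum.git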

@MrRace

MrRace commented Apr 3, 2024


@fxmarty Thanks a lot, it works now. After obtaining the decoder_model_merged.onnx and decoder_with_past_model.onnx files, how can I perform inference on test audio? Could you provide a complete example, or advise on how to modify my attempt below? Thank you very much.

import os

from onnxruntime import InferenceSession
import onnxruntime as ort
from transformers import WhisperProcessor
import time
import soundfile as sf

print("onnxruntime device=", ort.get_device())

onnx_model_dir = "/share_model_zoo/LLM/openai/onnx_gpu_with-past-whisper-large-v3/"
onnx_model_file = "decoder_model_merged.onnx"
onnx_model_file_path = os.path.join(onnx_model_dir, onnx_model_file)
print("Use onnx file=", onnx_model_file_path)
is_use_gpu = True
if is_use_gpu:
    session = InferenceSession(onnx_model_file_path, providers=['CUDAExecutionProvider'])
    print("Use onnxruntime-GPU")
else:
    session = InferenceSession(onnx_model_file_path, providers=['CPUExecutionProvider'])
    print("Use onnxruntime-CPU")


processor = WhisperProcessor.from_pretrained(onnx_model_dir)

test_audio_file = "./samples/jfk.wav"
array, sampling_rate = sf.read(test_audio_file)

input_features = processor(array, sampling_rate=sampling_rate, return_tensors="pt").input_features

# for i in range(len(session.get_inputs())):
#     print("session.get_inputs()[{}].name={}".format(i, session.get_inputs()[i].name))

# inference
decoder_input = {session.get_inputs()[0].name: input_features}
decoder_output = session.run(None, decoder_input)
print("decoder_output=", decoder_output)

The above code will raise an error, such as ValueError: Required inputs (['encoder_hidden_states', 'past_key_values.0.decoder.key', 'past_key_values.0.decoder.value', 'past_key_values.0.encoder.key', 'past_key_values.0.encoder.value', 'past_key_values.1.decod ... and so on.

@fxmarty
Contributor

fxmarty commented Apr 5, 2024

Hi @MrRace, if you don't want to reimplement the inference code from scratch, I advise you to use https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModelForSpeechSeq2Seq. An example is available there. By default, only encoder_model.onnx and decoder_model_merged.onnx will be used at inference.

I advise you to use https://github.com/lutzroeder/netron if you would like to visualize the ONNX graphs and understand their inputs/outputs.
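
If you do want to go the manual route, note that decoder_model_merged.onnx expects the encoder output (plus the KV cache inputs), not the raw mel features, so encoder_model.onnx has to run first. A rough sketch of that first step, with the input/output names assumed from the export above (the full autoregressive decoding loop is not shown):

import os

import numpy as np
import soundfile as sf
from onnxruntime import InferenceSession
from transformers import WhisperProcessor

onnx_model_dir = "/share_model_zoo/LLM/openai/onnx_gpu_with-past-whisper-large-v3/"

processor = WhisperProcessor.from_pretrained(onnx_model_dir)
encoder = InferenceSession(
    os.path.join(onnx_model_dir, "encoder_model.onnx"),
    providers=["CPUExecutionProvider"],
)

array, sampling_rate = sf.read("./samples/jfk.wav")
input_features = processor(array, sampling_rate=sampling_rate, return_tensors="np").input_features

# Run the encoder once; its output is the "encoder_hidden_states" the decoder error asks for.
(encoder_hidden_states,) = encoder.run(None, {"input_features": input_features.astype(np.float32)})

# The merged decoder is then called in a loop: the first step without past key/values,
# and later steps feeding the "present.*" outputs back in as the "past_key_values.*"
# inputs (together with the use_cache_branch flag of the merged graph).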

@MrRace

MrRace commented Apr 7, 2024

Hi @MrRace, if you don't want to reimplement the inference code from scratch, I advise you to use https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModelForSpeechSeq2Seq. An example is available there. By default, only encoder_model.onnx and decoder_model_merged.onnx will be used at inference.

I advise you to use https://github.com/lutzroeder/netron if you would like to visualize the ONNX graphs and understand their inputs/outputs.

@fxmarty Thanks a lot for your reply. Yes, I want to implement it from scratch to better understand the overall inference process.

@tengomucho tengomucho added feature-request New feature or request onnx Related to the ONNX export labels Oct 9, 2024