Musicgen ONNX export (text-conditional only) #1779
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Need #1780 for the CI.
It would be great to do this for Musicgen Melody as well (where the music conditioning is different, so it could maybe work).
@ylacombe What are the main differences?
I've been testing this out in transformers.js and will report back when the output matches!
I upgraded to transformers main + this branch (latest onnx, onnxruntime, and optimum too), and running the export results in an error (from huggingface/transformers#29939; cc @ylacombe). Downgrading to a version before that commit works.
Can confirm this works for transformers.js (v3 branch)! 🚀
Example code:
```js
import { AutoTokenizer, MusicgenForConditionalGeneration } from '@xenova/transformers';
import wavefile from 'wavefile';
import fs from 'fs';

// Load tokenizer and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
const model = await MusicgenForConditionalGeneration.from_pretrained(
  'Xenova/musicgen-small', { dtype: 'fp32' },
);

// Prepare text input
const prompt = '80s pop track with bassy drums and synth';
const inputs = tokenizer(prompt);

// Generate audio
const audio_values = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: true,
  guidance_scale: 3,
});

// (Optional) Write the output to a WAV file
const wav = new wavefile.WaveFile();
wav.fromScratch(1, model.config.audio_encoder.sampling_rate, '32f', audio_values.data);
fs.writeFileSync('musicgen_out.wav', wav.toBuffer());
```
Samples:
- sample_1.mp4
- sample_2.mp4
- sample_3.mp4
* WIP but need to work on encodec first
* musicgen onnx export
* better logs
* add tests
* rename audio_encoder_decode.onnx to encodec_decode.onnx
* fix num heads in pkv
* nits
* add build_delay_pattern_mask
* fix wrong hidden_size for cross attention pkv
* fix tests
* update doc
Exports Musicgen conditioned by a text prompt. If we want to condition on audio, it is trickier: we first need to be able to export EncodecModel.encode, which requires a combination of jit.script/jit.trace as it has some unrollable loops, unfortunately.

Only KV cache export is tested & supported.
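A minimal sketch of running the export through optimum's programmatic entry point (main_export, the Python equivalent of optimum-cli export onnx); the task name "text-to-audio" and the output directory are assumptions here, not taken from this PR:

```python
# Hedged sketch: export Musicgen to ONNX with optimum.
# The task name "text-to-audio" and the output path are assumptions.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="facebook/musicgen-small",
    output="musicgen_onnx",
    task="text-to-audio",
)
```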
The following subcomponents are exported:
- text_encoder.onnx: corresponds to the text encoder part in https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/models/musicgen/modeling_musicgen.py#L1457.
- encodec_decode.onnx: corresponds to the decode pass of the Encodec audio encoder in https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/models/musicgen/modeling_musicgen.py#L2472-L2480.
- decoder_model.onnx: the Musicgen decoder, without past key values input, computing cross attention.
- decoder_with_past_model.onnx: the Musicgen decoder, with past_key_values input (KV cache filled), not computing cross attention.
- decoder_model_merged.onnx: the two previous models fused into one, to avoid duplicating weights. A boolean input use_cache_branch selects which branch to use. On the first forward pass, where the KV cache is still empty, dummy past key values inputs need to be passed and are ignored thanks to use_cache_branch=False (see the inspection sketch below).
- build_delay_pattern_mask.onnx: corresponds to https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/models/musicgen/modeling_musicgen.py#L1054-L1125.

Partially fixes #1297
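To sanity-check the merged decoder, one option is to list the exported session's inputs and look for the use_cache_branch selector. A hedged sketch, assuming onnxruntime is installed and the "musicgen_onnx" output directory from the export sketch above:

```python
# Hedged sketch: inspect decoder_model_merged.onnx with onnxruntime.
import onnxruntime as ort

session = ort.InferenceSession(
    "musicgen_onnx/decoder_model_merged.onnx",
    providers=["CPUExecutionProvider"],
)
for inp in session.get_inputs():
    # Expect the decoder inputs, the past_key_values.* entries, and the
    # boolean use_cache_branch input described above.
    print(inp.name, inp.shape, inp.type)
```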