Onnx granite #2043
base: main
Conversation
Force-pushed from e4b94a4 to 29fc359
Can you add this architecture to the test suite using a small tiny-random model, and push it to the hub? (You can create it with …
I've created a tiny-random model: https://huggingface.co/hf-internal-testing/tiny-random-GraniteForCausalLM. Unfortunately, ONNX export currently fails
with the following error: See log
Investigating why 🤔
Turns out it is a known (and fixed) bug: pytorch/pytorch#135615. Upgrading to torch nightly fixes it 👍
Oooh great! It's the error I've been seeing when exporting CLIP with sdpa.
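Since the fix for the export bug landed upstream, exporters typically guard against older torch versions with a version gate. The sketch below is illustrative (stdlib only, names are assumptions; the real check lives in optimum's export configs), assuming the sdpa export fix shipped in torch 2.5:

```python
# Minimal version-gate sketch (illustrative; not the actual optimum code).
def parse_version(v: str) -> tuple:
    """Turn a version string like '2.4.1+cu121' into (2, 4, 1),
    ignoring local/build suffixes and non-numeric parts."""
    core = v.split("+")[0]
    return tuple(int(p) for p in core.split(".") if p.isdigit())

# Assumption: the fix for pytorch/pytorch#135615 is in torch >= 2.5
MIN_TORCH_FOR_EXPORT = (2, 5)

def torch_supports_export(torch_version: str) -> bool:
    return parse_version(torch_version) >= MIN_TORCH_FOR_EXPORT

print(torch_supports_export("2.4.1"))  # False: would hit the export bug
print(torch_supports_export("2.5.0"))  # True: contains the fix
```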
Thanks for all the quick reviews! I realized I missed a lot of context on the issue itself, so I'll update with details of how I tested this locally.
And just to confirm: I tested the ONNX model with Transformers.js, and it matches the Python version exactly 👍
Perfect, thanks a lot @xenova
For the minimum PyTorch version, it can be set with …
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@gabe-l-hart Let me know if you need any help with the final steps (as outlined above)! 🤗 PyTorch 2.5 was just released, which should fix the above issue (untested; the previous nightly version did fix the bug).
Thank you! I've been sidetracked on other threads, but hope to get back to this one later in the week. |
No worries! I'm doing the conversions of the latest set of released models with this branch. Will report back when complete. Also, it looks like https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-instruct is tagged as …
FYI: I can confirm that https://huggingface.co/onnx-community/granite-3.0-2b-instruct exported correctly and works in Transformers.js 👍 The MoE model is slightly more complicated, and throws an error when performing top_k on a tensor whose dim is < k. Looks like we need to insert a min op there. |
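The MoE failure mode described above (`top_k` on a tensor whose dimension is smaller than `k`) and the suggested guard can be sketched in plain Python. This is a stand-in for `torch.topk`, not the actual optimum patch; function names are illustrative:

```python
# Sketch of the "insert a min op before top_k" fix described above.
def safe_topk(values, k):
    """Return the k largest values and their indices, clamping k to the
    available dimension so small expert counts don't raise an error."""
    k = min(k, len(values))  # the "min op" that guards top_k
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = order[:k]
    return [values[i] for i in top], top

# k > dim would fail in a raw top_k; here it is clamped to 3
vals, idx = safe_topk([0.1, 0.9, 0.4], k=8)
print(vals)  # [0.9, 0.4, 0.1]
print(idx)   # [1, 2, 0]
```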
Force-pushed from 29fc359 to 5925174
huggingface#2043 (comment) Branch: OnnxGranite Signed-off-by: Gabe Goodhart <[email protected]>
OK, I've added … When I looked at this myself, I got stuck on the "more complicated" part and punted on it for this PR. Is adding this …
You can also add the architecture for testing with onnxruntime 😄
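In optimum's export test suites, architectures are typically registered by mapping a `model_type` to a tiny checkpoint on the Hub. The sketch below is schematic (the dict name and the llama entry are reproduced from memory and may differ from the actual test utilities); the granite checkpoint is the tiny-random repo linked earlier in this thread:

```python
# Schematic registry mapping model_type -> tiny test checkpoint.
# Names here are assumptions, not a verbatim copy of optimum's tests.
PYTORCH_EXPORT_MODELS_TINY = {
    "llama": "fxmarty/tiny-llama-fast-tokenizer",  # illustrative entry
}

# Adding granite means pointing its model_type at the tiny-random checkpoint:
PYTORCH_EXPORT_MODELS_TINY["granite"] = (
    "hf-internal-testing/tiny-random-GraniteForCausalLM"
)

print("granite" in PYTORCH_EXPORT_MODELS_TINY)  # True
```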
Branch: OnnxGranite Signed-off-by: Gabe Goodhart <[email protected]>
…ttention Branch: OnnxGranite Signed-off-by: Gabe Goodhart <[email protected]>
huggingface#2043 (comment) Branch: OnnxGranite Signed-off-by: Gabe Goodhart <[email protected]>
Force-pushed from f936bad to 22de7b7
Thanks for pointing that out! I think I've got that added too. I tried running the …
LGTM ✅
I'll do some final tests in my conversion script, and will merge soon. Thanks @gabe-l-hart 👍
What does this PR do?
This PR adds support for models using IBM's GraniteForCausalLM architecture when converting to ONNX. The key changes are:
- ~~Allow users to opt into using `transformers>=4.45` for `onnx` conversions~~ No longer needed
- Add `"granite"` to model configs and tasks
- Add `"granite"` as a `model_type` that uses grouped attention

NOTE: I encountered an issue very similar to the one discussed in #1835. The root cause for me was the need to add `"granite"` to the list of models requiring Grouped Query Attention in `modeling_decoder.py`. I don't believe that is the root cause for #1835 since `"llama"` is already present there, but it is likely a similar issue showing up in the inference module using `num_attention_heads` instead of `num_key_value_heads`.
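The distinction between the two head counts matters because, under Grouped Query Attention, the KV cache has fewer heads than the queries and each KV head is shared across a group of query heads. A numpy sketch (shapes and variable names are illustrative, not the optimum implementation):

```python
import numpy as np

# Why num_key_value_heads matters under Grouped Query Attention (GQA):
# the KV cache is allocated per KV head, then expanded to match query heads.
num_attention_heads = 8
num_key_value_heads = 2            # GQA: 4 query heads share each KV head
head_dim, seq_len = 16, 5

# The KV cache must be sized with num_key_value_heads, NOT num_attention_heads
kv = np.zeros((num_key_value_heads, seq_len, head_dim))

# At attention time, KV heads are repeated to line up with the query heads:
groups = num_attention_heads // num_key_value_heads
kv_expanded = np.repeat(kv, groups, axis=0)

print(kv.shape)           # (2, 5, 16)
print(kv_expanded.shape)  # (8, 5, 16)
```

Sizing the cache with `num_attention_heads` instead produces shape mismatches at inference time, which is the class of failure the NOTE above describes.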
Rationale
This PR specifically addresses the `"GraniteForCausalLM"` architecture for IBM's forthcoming `Granite` family of models. The current ibm/PowerLM-3b model uses this architecture and can be used as a placeholder for testing until the new models are released. The one exception is that the `PowerLM` model has `num_attention_heads` and `num_key_value_heads` set to match (no Grouped Query Attention), whereas the new models will use GQA (hence the change to ensure GQA is used for `"granite"` at inference time).

Testing
When testing locally, I had the following dependency versions:
To test the conversion, I did the following:
To evaluate the output side-by-side with the source model, I used the following script:
side_by_side.py
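The `side_by_side.py` script itself is not reproduced above; the core of such a comparison typically boils down to an element-wise check of the two models' logits. A numpy stand-in (function name and tolerance are illustrative, not from the actual script):

```python
import numpy as np

# Stand-in for the elided side_by_side.py: compare logits from the source
# PyTorch model against the exported ONNX model's output.
def compare_logits(ref: np.ndarray, onnx: np.ndarray, atol: float = 1e-4):
    """Return the max absolute difference and whether the outputs match
    within the given absolute tolerance."""
    max_diff = float(np.max(np.abs(ref - onnx)))
    return max_diff, bool(np.allclose(ref, onnx, atol=atol))

ref = np.array([[0.12, -1.30, 2.41]])
onnx_out = ref + 1e-6                  # tiny numerical drift from export
max_diff, ok = compare_logits(ref, onnx_out)
print(ok)  # True
```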
Before submitting
Who can review?