Fixed switched token_type_ids and attention_mask #412

rolshoven · 2023-08-25T15:36:49Z

I was having the same error as mentioned in #338 where I could not export my model with model_base stsb-xlm-roberta-base. After some debugging, I noticed that the attention_mask and token_type_ids were switched in the function forward (line 50) in setfit/exporters/onnx.py. The error then occurs because we are trying to look up both the token_type_id embedding with index 0 and the one with index 1, but there is only one embedding in the matrix. I believe that this did not happen with other model bases because they have more than two token_type embeddings.

However, I must confess that I was not yet able to test this fix with other models that previously worked. We should definitely do this before we merge this code. To make the code safer, I also made us of kwargs when calling self.model_body instead of positional arguments. In my case, I was able to export the model after this small fix.

rolshoven · 2023-08-28T07:47:46Z

Just noticed that the exported onnx model does only work if we switch the attention_mask and token_type_ids of the generated dcitionary after tokenization, which is probably caused by my change. I will investigate further and report back soon.

rolshoven · 2023-08-28T09:25:22Z

I reverted my previous changes and implemented the fix directly in function export_onnx_setfit_model of src/setfit/exporters/onnx.py. The problem was that the tokenizer dictionary keys were not ordered. I implemented a generic solution that uses the signature function from module inspect to make as few assumptions on the parameters as possible while ensuring the correct order of the input values.

rolshoven · 2023-09-18T06:42:20Z

I forgot to include the import ~~but now the tests should work~~

Edit: okay, there are still errors, I'll analyse and address them soon!

andreeapricopi · 2023-09-25T14:18:39Z

Tested with distiluse-base-multilingual-cased-v2, which leads to: ValueError: tuple.index(x): x not in tuple in setfit/exporters/onnx.py: 94 in <lambda>

Code snippet for reproductibility:

from setfit import SetFitModel
from sentence_transformers import SentenceTransformer
from setfit import SetFitHead, SetFitHead, SetFitModel
from setfit.exporters.onnx import export_onnx

model_id = "sentence-transformers/distiluse-base-multilingual-cased-v2" 

model_body = SentenceTransformer(model_id)
model_head = SetFitHead(in_features = model_body.get_sentence_embedding_dimension(), out_features = 4)
model = SetFitModel(model_body = model_body, model_head = model_head)

export_onnx(model.model_body,
            model.model_head,
            opset=12,
            output_path="dummy_path")

tomaarsen · 2023-11-24T12:12:16Z

Perhaps we can adopt the approach from #435 for ONNX, rather than sticking with the current export_onnx. It seems more consistent.

rolshoven mentioned this pull request Aug 25, 2023

error export onxx with body roberta #338

Open

rolshoven marked this pull request as draft August 28, 2023 07:46

rolshoven force-pushed the main branch from 937c408 to 1ca3207 Compare August 28, 2023 09:13

Fixed order of input parameters for onnx export

70610f5

rolshoven force-pushed the main branch from 1ca3207 to 70610f5 Compare September 18, 2023 06:40

rolshoven marked this pull request as ready for review September 18, 2023 06:41

rolshoven marked this pull request as draft September 18, 2023 08:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed switched token_type_ids and attention_mask #412

Fixed switched token_type_ids and attention_mask #412

rolshoven commented Aug 25, 2023

rolshoven commented Aug 28, 2023

rolshoven commented Aug 28, 2023

rolshoven commented Sep 18, 2023 •

edited

Loading

andreeapricopi commented Sep 25, 2023 •

edited

Loading

tomaarsen commented Nov 24, 2023

Fixed switched token_type_ids and attention_mask #412

Are you sure you want to change the base?

Fixed switched token_type_ids and attention_mask #412

Conversation

rolshoven commented Aug 25, 2023

rolshoven commented Aug 28, 2023

rolshoven commented Aug 28, 2023

rolshoven commented Sep 18, 2023 • edited Loading

andreeapricopi commented Sep 25, 2023 • edited Loading

tomaarsen commented Nov 24, 2023

rolshoven commented Sep 18, 2023 •

edited

Loading

andreeapricopi commented Sep 25, 2023 •

edited

Loading