I fine-tuned the GLiNER small v2.1 model and created an ONNX version of it using the convert_to_onnx.ipynb example code.
When I compared the inference time of the two models, the ONNX version took 50% longer.
This is how I'm loading the model:
model = GLiNER.from_pretrained(model_path, load_onnx_model=True, load_tokenizer=True)
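To make the comparison reproducible, here is a minimal timing sketch. The helper is plain Python; the commented usage assumes the GLiNER API from the issue (`GLiNER.from_pretrained`, `predict_entities`), and the `text` and `labels` values are placeholders, not taken from the original report:

```python
import time

def avg_inference_time(predict_fn, n_runs=10, warmup=2):
    """Average wall-clock time per call, excluding warmup runs
    (warmup matters: the first ONNX Runtime calls can be slower
    while the session initializes)."""
    for _ in range(warmup):
        predict_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        predict_fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical usage, mirroring the loading code above:
# pt_model = GLiNER.from_pretrained(model_path)
# onnx_model = GLiNER.from_pretrained(model_path, load_onnx_model=True, load_tokenizer=True)
# labels = ["person", "organization"]  # placeholder label set
# t_pt = avg_inference_time(lambda: pt_model.predict_entities(text, labels))
# t_onnx = avg_inference_time(lambda: onnx_model.predict_entities(text, labels))
# print(f"PyTorch: {t_pt:.4f}s  ONNX: {t_onnx:.4f}s")
```

Averaging over several runs with warmup excluded rules out one-off session-startup cost as the cause of the slowdown.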