Add OpenVINO support #2712

Open
wants to merge 2 commits into master
Conversation

helena-intel

Add OpenVINO support for SentenceTransformer models.

  • Add backend="openvino" to use OpenVINO. OpenVINO models can be loaded directly, or converted on the fly from PyTorch models on the Hugging Face Hub.
  • Pass an OpenVINO config with model_kwargs={"ov_config": config}, where config can be either a dictionary or a path to a .json file.
  • Use an Intel iGPU or dGPU for inference with model_kwargs={"device": "GPU"}. (The device argument of SentenceTransformer expects a PyTorch device; supporting Intel GPU through that argument directly would require more code modifications with if backend checks. If that is preferred, I'm happy to add it.) A usage sketch follows this list.
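
A minimal sketch of the usage described above, assuming sentence-transformers with this PR applied; the model name is only an example:

```python
# pip install "optimum[openvino]"  # installs optimum-intel and OpenVINO
from sentence_transformers import SentenceTransformer

# backend="openvino": a PyTorch model from the Hugging Face Hub is
# converted to OpenVINO IR on the fly; an OpenVINO model loads directly.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="openvino")
embeddings = model.encode(["OpenVINO backend example sentence"])

# Optional: pass an OpenVINO config (dict or path to a .json file) and
# run inference on an Intel iGPU/dGPU instead of the CPU.
model_gpu = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"ov_config": {"CACHE_DIR": "ov_cache"}, "device": "GPU"},
)
```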

Documentation is still to be done. Should I add an .rst file to docs/sentence_transformer/usage? Here is basic documentation on how to use the OpenVINO backend, and an example of how to quantize a sentence-transformers model with NNCF and use it with sentence-transformers and the OpenVINO backend: https://gist.github.com/helena-intel/fe7ea16bc015a3d581f3a7417a35a87e

Limitations:

  • T5 models are not yet supported. optimum-intel plans to refactor seq2seq models; T5 support can be added once that refactoring is done.
  • This PR only supports SentenceTransformer. CrossEncoder support could be added in a new PR.

michaelfeil (Contributor) commented Jun 9, 2024

@helena-intel

@helena-intel Thanks! I am not really a reviewer; I just saw this PR by chance.

A few concerns:

  • OVModelForFeatureExtraction -> Doesn't this require an ONNX model, or a re-exported model?
  • How well would the abstractions you introduced hold up for other providers (plain ONNX / the AWS Neuron stuff / other implementations)?
  • Doesn't OpenVINO ship with optimum-intel? Or at least via pip install optimum-intel[openvino] or similar?

helena-intel (Author)

@michaelfeil Thanks for your comments!

OVModelForFeatureExtraction -> Doesn't this require an ONNX model, or a re-exported model?

No, it supports both PyTorch models and OpenVINO IR models. If a path to a PyTorch model is provided, it will be converted to OpenVINO IR on the fly.
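
For illustration, a minimal sketch of the underlying optimum-intel loading path (the model name is an example; export=True forces on-the-fly conversion, while a directory already containing OpenVINO IR loads as-is):

```python
from optimum.intel import OVModelForFeatureExtraction
from transformers import AutoTokenizer

model_id = "sentence-transformers/all-MiniLM-L6-v2"  # example model
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
# pointing at an already-exported OpenVINO model skips the conversion.
model = OVModelForFeatureExtraction.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)  # same output structure as the transformers model
```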

How good would the abstractions you introduced hold for other providers (plain Onnx / the AWS neuron stuff / other impls?)

I added a backend parameter instead of hardcoding OpenVINO, to make it easy to add other backends too; it should be straightforward for all Optimum backends. There are some OpenVINO specifics (e.g. configuration settings, support for exporting on the fly), so the _load_openvino_model() method handles those, but the principle of loading models with Optimum is the same for all backends. A hypothetical sketch of this dispatch pattern is shown below.
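
Apart from _load_openvino_model(), which this PR adds, the names in this sketch are made up for illustration:

```python
from transformers import AutoModel

def load_model(model_name_or_path: str, backend: str = "torch", **model_kwargs):
    """Hypothetical backend dispatcher, not the PR's actual code."""
    if backend == "torch":
        return AutoModel.from_pretrained(model_name_or_path, **model_kwargs)
    if backend == "openvino":
        # OpenVINO specifics (ov_config handling, on-the-fly export)
        # would live in a dedicated helper such as _load_openvino_model().
        from optimum.intel import OVModelForFeatureExtraction
        return OVModelForFeatureExtraction.from_pretrained(
            model_name_or_path, export=True
        )
    raise ValueError(f"Unsupported backend: {backend!r}")
```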

I'm also open to suggestions for a different implementation!

Doesn't OpenVINO ship with optimum-intel? Or at least via pip install optimum-intel[openvino] or similar?

Yes, pip install optimum[openvino] and pip install optimum-intel[openvino] both install optimum-intel and all recommended dependencies for running OpenVINO models, including NNCF for model quantization and openvino-tokenizers. For running the tests I added, just OpenVINO is enough.
