Add OpenVINO support #2712

Open
wants to merge 2 commits into master
Conversation

helena-intel

Add OpenVINO support for SentenceTransformer models.

  • Add backend="openvino" to use OpenVINO. OpenVINO models can be loaded directly, or converted on the fly from PyTorch models on the Hugging Face Hub.
  • Pass an OpenVINO config with model_kwargs={"ov_config": config}, where config can be either a dictionary or a path to a .json file.
  • Use an Intel iGPU or dGPU for inference with model_kwargs={"device": "GPU"}. (The device argument of SentenceTransformer expects a PyTorch device; supporting Intel GPU through that argument directly would require more code modifications with if backend checks. If that is preferred, I'm happy to add it.) A usage sketch follows this list.
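
A minimal sketch of the usage described above, assuming sentence-transformers with this PR applied; the model name is only an example:

```python
# pip install "optimum[openvino]"  # installs optimum-intel and OpenVINO
from sentence_transformers import SentenceTransformer

# backend="openvino": a PyTorch model from the Hugging Face Hub is
# converted to OpenVINO IR on the fly; an OpenVINO model loads directly.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="openvino")
embeddings = model.encode(["OpenVINO backend example sentence"])

# Optional: pass an OpenVINO config (dict or path to a .json file) and
# run inference on an Intel iGPU/dGPU instead of the CPU.
model_gpu = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"ov_config": {"CACHE_DIR": "ov_cache"}, "device": "GPU"},
)
```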

Documentation is still to be done. Should I add an .rst file to docs/sentence_transformer/usage? Here is basic documentation on how to use the OpenVINO backend, and an example of how to quantize a sentence-transformers model with NNCF and use it with sentence-transformers and the OpenVINO backend: https://gist.github.com/helena-intel/fe7ea16bc015a3d581f3a7417a35a87e

Limitations:

  • T5 models are not yet supported. optimum-intel plans to refactor seq2seq models; T5 support can be added once that refactoring is done.
  • This PR only supports SentenceTransformer. CrossEncoder support could be added in a new PR.

michaelfeil (Contributor) commented Jun 9, 2024

@helena-intel

@helena-intel Thanks! I am not really a reviewer; I just saw this PR by chance.

A few concerns:

  • OVModelForFeatureExtraction -> Doesn't this require an ONNX model, or a re-exported model?
  • How well would the abstractions you introduced hold up for other providers (plain ONNX / the AWS Neuron stuff / other implementations)?
  • Doesn't OpenVINO ship with optimum-intel? Or at least via pip install optimum-intel[openvino] or similar?

helena-intel (Author)

@michaelfeil Thanks for your comments!

OVModelForFeatureExtraction -> Doesn't this require an ONNX model, or a re-exported model?

No, it supports both PyTorch models and OpenVINO IR models. If a path to a PyTorch model is provided, it will be converted to OpenVINO IR on the fly.
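
For illustration, a minimal sketch of the underlying optimum-intel loading path (the model name is an example; export=True forces on-the-fly conversion, while a directory already containing OpenVINO IR loads as-is):

```python
from optimum.intel import OVModelForFeatureExtraction
from transformers import AutoTokenizer

model_id = "sentence-transformers/all-MiniLM-L6-v2"  # example model
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
# pointing at an already-exported OpenVINO model skips the conversion.
model = OVModelForFeatureExtraction.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)  # same output structure as the transformers model
```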

How good would the abstractions you introduced hold for other providers (plain Onnx / the AWS neuron stuff / other impls?)

I added a backend parameter instead of hardcoding OpenVINO, to make it easy to add other backends too; it should be straightforward for all Optimum backends. There are some OpenVINO specifics (e.g. configuration settings, support for exporting on the fly), so the _load_openvino_model() method handles those, but the principle of loading models with Optimum is the same for all backends. A hypothetical sketch of this dispatch pattern is shown below.
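
Apart from _load_openvino_model(), which this PR adds, the names in this sketch are made up for illustration:

```python
from transformers import AutoModel

def load_model(model_name_or_path: str, backend: str = "torch", **model_kwargs):
    """Hypothetical backend dispatcher, not the PR's actual code."""
    if backend == "torch":
        return AutoModel.from_pretrained(model_name_or_path, **model_kwargs)
    if backend == "openvino":
        # OpenVINO specifics (ov_config handling, on-the-fly export)
        # would live in a dedicated helper such as _load_openvino_model().
        from optimum.intel import OVModelForFeatureExtraction
        return OVModelForFeatureExtraction.from_pretrained(
            model_name_or_path, export=True
        )
    raise ValueError(f"Unsupported backend: {backend!r}")
```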

I'm also open to suggestions for a different implementation!

Doesn't OpenVINO ship with optimum-intel? Or at least via pip install optimum-intel[openvino] or similar?

Yes, pip install optimum[openvino] and pip install optimum-intel[openvino] both install optimum-intel and all recommended dependencies for running OpenVINO models, including NNCF for model quantization and openvino-tokenizers. For running the tests I added, just OpenVINO is enough.
