This package provides an MLServer runtime compatible with OpenVINO. It offers the following features:
- If the server detects that the model file is in ONNX format, it automatically converts it to the OpenVINO IR format (xml, bin) with a dynamic batch size (see the conversion sketch just after this list).
- OpenVINO dynamic batch size
- gRPC ready
- V2 Inference Protocol
- Model metrics
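The automatic conversion above can also be reproduced by hand. The snippet below is a minimal sketch, not the package's internal code, assuming OpenVINO >= 2023.1 (which provides openvino.convert_model and openvino.save_model) and that the first dimension of each input is the batch dimension:

```python
import openvino as ov

# Read the ONNX graph and convert it to an in-memory OpenVINO model.
ov_model = ov.convert_model("model.onnx")

# Make the batch (first) dimension of every input dynamic, keeping the rest.
new_shapes = {}
for index, model_input in enumerate(ov_model.inputs):
    shape = model_input.get_partial_shape()
    dims = [ov.Dimension()] + [shape[i] for i in range(1, shape.rank.get_length())]
    new_shapes[index] = ov.PartialShape(dims)
ov_model.reshape(new_shapes)

# Serialize to the IR format: this writes model.xml and model.bin.
ov.save_model(ov_model, "model.xml")
```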
I chose MLServer for serving OpenVINO because this framework provides the V2 Inference Protocol (https://kserve.github.io/website/modelserving/inference_api/), gRPC, and metrics out of the box.
pip install mlserver mlserver-openvino
If no content type is present on the request or in the model metadata, the OpenVINO runtime will try to decode the payload as a NumPy array. To avoid this, either send a different content type explicitly, or define the correct one as part of your model's metadata.
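For instance, the NumPy content type can be requested explicitly on a V2 REST call. The snippet below is only a hedged illustration; the endpoint, model name and tensor shape are placeholders borrowed from the settings example further down:

```python
import requests

# V2 inference request with an explicit NumPy ("np") content type on the input.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "datatype": "FP32",
            "shape": [1, 28, 28, 1],
            "parameters": {"content_type": "np"},
            "data": [0.0] * (28 * 28),
        }
    ]
}

response = requests.post(
    "http://0.0.0.0:8080/v2/models/mnist-onnx-openvino/infer",
    json=payload,
)
print(response.json())
```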
Add your models to the models folder. Accepted files: ["model.xml", "model.onnx"]
/example
/models/your-model-name/
/tests
setup.py
README.md
Training and serving example: https://mlserver.readthedocs.io/en/latest/examples/sklearn/README.html
To download (Prometheus) metrics, use the links below:
GET http://<your-endpoint>/metrics
GET http://0.0.0.0:8080/metrics
# Build docker image
mlserver build . -t test
# Start the server and pass MLSERVER_MODELS_DIR
docker run -it --rm -e MLSERVER_MODELS_DIR=/opt/mlserver/models/ -p 8080:8080 -p 8081:8081 test
For example scripts, see the files below:
/example/grpc-example.py
/example/rest-example.py
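For quick orientation, here is a hedged gRPC sketch in the spirit of /example/grpc-example.py. It assumes MLServer's generated V2 stubs are importable as mlserver.grpc.dataplane_pb2 / dataplane_pb2_grpc and that gRPC listens on the default port 8081; the model and tensor names are placeholders from the settings example further down:

```python
import grpc
from mlserver.grpc import dataplane_pb2 as dataplane
from mlserver.grpc import dataplane_pb2_grpc as dataplane_grpc

channel = grpc.insecure_channel("0.0.0.0:8081")
stub = dataplane_grpc.GRPCInferenceServiceStub(channel)

# Build a V2 ModelInferRequest with a single FP32 input tensor.
request = dataplane.ModelInferRequest(
    model_name="mnist-onnx-openvino",
    inputs=[
        dataplane.ModelInferRequest.InferInputTensor(
            name="input-0",
            datatype="FP32",
            shape=[1, 28, 28, 1],
            contents=dataplane.InferTensorContents(fp32_contents=[0.0] * (28 * 28)),
        )
    ],
)

response = stub.ModelInfer(request)
print(response.outputs)
```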
- First, create the KServe cluster runtime (one time) from the file kserve/cluster-runtime.yaml
- Create an InferenceService from the template:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "my-openvino-model"
spec:
  predictor:
    model:
      modelFormat:
        name: openvino
      runtime: kserve-mlserver-openvino
      #storageUri: "gs://kfserving-examples/models/xgboost/iris"
      storageUri: https://github.com/myrepo/models/mymodel.joblib?raw=true
Example model-settings.json:
{
    "name": "mnist-onnx-openvino",
    "implementation": "mlserver_openvino.OpenvinoRuntime",
    "parameters": {
        "uri": "./model.onnx",
        "version": "v0.1.0",
        "extra": {
            "transform": [
                {
                    "name": "Prepare Metadata",
                    "pipeline_file_path": "./pipeline.cloudpickle",
                    "input_index": 0
                }
            ]
        }
    },
    "inputs": [
        {
            "name": "input-0",
            "datatype": "FP32",
            "shape": [28, 28, 1]
        }
    ],
    "outputs": [
        {
            "name": "output",
            "datatype": "FP32",
            "shape": [10]
        }
    ]
}
If you add a transformer pipeline in the extra properties, you must dump (pickle) it with the same Python version that runs MLServer.
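As an illustration of that requirement, the pipeline.cloudpickle file referenced in the settings above could be produced like this; the scikit-learn scaler is only a placeholder for your own preprocessing steps:

```python
import cloudpickle
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A placeholder preprocessing pipeline; replace with your own transform logic.
pipeline = Pipeline([("scale", StandardScaler())])
# ... fit `pipeline` on your training data here ...

# Dump with the same Python (and library) versions that MLServer will run,
# otherwise unpickling inside the server may fail.
with open("pipeline.cloudpickle", "wb") as f:
    cloudpickle.dump(pipeline, f)
```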
make test