# KServe Multi Model Serving
InferenceDB supports KServe's Multi Model Serving using the `filters` object in the InferenceLogger resource. See the full example here.
First, we will need a Kafka broker to collect all KServe inference requests and responses:
```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: sklearn-mms-broker
  namespace: default
  annotations:
    eventing.knative.dev/broker.class: Kafka
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: inferencedb-kafka-broker-config
    namespace: knative-eventing
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferencedb-kafka-broker-config
  namespace: knative-eventing
data:
  # Number of topic partitions
  default.topic.partitions: "8"
  # Replication factor of topic messages
  default.topic.replication.factor: "1"
  # A comma-separated list of bootstrap servers (inside or outside the k8s cluster)
  bootstrap.servers: "kafka-cp-kafka.default.svc.cluster.local:9092"
```
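As noted in the InferenceLogger manifests below, the Knative Kafka broker writes events to a topic named `knative-broker-<namespace>-<brokerName>`. A tiny helper (hypothetical, for illustration only) that derives the topic name from this convention:

```python
def broker_topic(namespace: str, broker_name: str) -> str:
    """Derive the Kafka topic name created by a Knative Kafka broker.

    The naming convention is knative-broker-<namespace>-<brokerName>.
    """
    return f"knative-broker-{namespace}-{broker_name}"


print(broker_topic("default", "sklearn-mms-broker"))
# knative-broker-default-sklearn-mms-broker
```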
Next, we will create an InferenceService that will serve multiple TrainedModel objects:
```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-mms
spec:
  predictor:
    minReplicas: 1
    logger:
      mode: all
      url: http://kafka-broker-ingress.knative-eventing.svc.cluster.local/default/sklearn-mms-broker
    sklearn:
      name: sklearn-mms
      protocolVersion: v2
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 1Gi
```
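The `logger.url` above follows the Knative broker ingress convention, `http://kafka-broker-ingress.knative-eventing.svc.cluster.local/<namespace>/<brokerName>`. A small helper (hypothetical, for illustration only) that builds the URL from a namespace and broker name:

```python
def broker_logger_url(namespace: str, broker_name: str) -> str:
    # The Knative Kafka broker ingress routes requests on /<namespace>/<brokerName>.
    host = "kafka-broker-ingress.knative-eventing.svc.cluster.local"
    return f"http://{host}/{namespace}/{broker_name}"


print(broker_logger_url("default", "sklearn-mms-broker"))
# http://kafka-broker-ingress.knative-eventing.svc.cluster.local/default/sklearn-mms-broker
```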
Note the `logger` section; you can read more about it in the KServe documentation.
You can now add some models to the new InferenceService:
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: TrainedModel
metadata:
  name: model1
spec:
  inferenceService: sklearn-mms
  model:
    storageUri: gs://seldon-models/sklearn/mms/lr_model
    framework: sklearn
    memory: 512Mi
---
apiVersion: serving.kserve.io/v1alpha1
kind: TrainedModel
metadata:
  name: model2
spec:
  inferenceService: sklearn-mms
  model:
    storageUri: gs://seldon-models/sklearn/mms/lr_model
    framework: sklearn
    memory: 512Mi
```
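Each TrainedModel declares `spec.model.memory`, and the scheduled models should fit within the predictor's memory limit, so the two 512Mi models above just fit in the 1Gi predictor. A quick sanity check, assuming only the binary `Mi`/`Gi` suffixes used in these manifests:

```python
UNITS = {"Mi": 2**20, "Gi": 2**30}


def to_bytes(quantity: str) -> int:
    # Parse a Kubernetes quantity like "512Mi" or "1Gi" (binary suffixes only).
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


models = {"model1": "512Mi", "model2": "512Mi"}
total = sum(to_bytes(m) for m in models.values())
assert total <= to_bytes("1Gi")  # both models fit in the predictor's 1Gi limit
```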
Finally, we can log the predictions of our new models using InferenceDB:
```yaml
apiVersion: inferencedb.aporia.com/v1alpha1
kind: InferenceLogger
metadata:
  name: sklearn-mms-model-1
  namespace: default
spec:
  # NOTE: The format is knative-broker-<namespace>-<brokerName>
  topic: knative-broker-default-sklearn-mms-broker
  schema:
    type: avro
    config:
      columnNames:
        inputs: [sepal_width, petal_width, sepal_length, petal_length]
        outputs: [flower]
  events:
    type: kserve
    config: {}
  filters:
    modelName: model1
    # modelVersion: v1
  destination:
    type: confluent-s3
    config:
      url: s3://aporia-data/inferencedb
      format: parquet
---
apiVersion: inferencedb.aporia.com/v1alpha1
kind: InferenceLogger
metadata:
  name: sklearn-mms-model-2
  namespace: default
spec:
  # NOTE: The format is knative-broker-<namespace>-<brokerName>
  topic: knative-broker-default-sklearn-mms-broker
  schema:
    type: avro
    config:
      columnNames:
        inputs: [sepal_width, petal_width, sepal_length, petal_length]
        outputs: [flower]
  events:
    type: kserve
    config: {}
  filters:
    modelName: model2
    # modelVersion: v2
  destination:
    type: confluent-s3
    config:
      url: s3://aporia-data/inferencedb
      format: parquet
```
Note the use of `filters`: both InferenceLoggers consume the same Kafka topic, but each one logs only the predictions of the model it matches.
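Conceptually, each InferenceLogger keeps an event only if every key in its `filters` (such as `modelName` or `modelVersion`) matches the corresponding event attribute. A simplified sketch of that matching logic (an illustration, not InferenceDB's actual implementation):

```python
def matches(filters: dict, event: dict) -> bool:
    # An event passes only if every filter key matches the event's attributes.
    return all(event.get(key) == value for key, value in filters.items())


events = [
    {"modelName": "model1"},
    {"modelName": "model2"},
]
model1_events = [e for e in events if matches({"modelName": "model1"}, e)]
print(model1_events)
# [{'modelName': 'model1'}]
```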
First, we will need to port-forward the Istio service so we can access it from our local machine:
```shell
kubectl port-forward --namespace istio-system svc/istio-ingressgateway 8080:80
```
Prepare a payload in a file called `mms-input.json`:
```json
{
  "inputs": [
    {
      "name": "input-0",
      "shape": [2, 4],
      "datatype": "FP32",
      "data": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }
  ]
}
```
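In the KServe V2 inference protocol, the `shape` field must agree with the nested `data` dimensions (here, 2 rows of 4 features each). A quick sanity check of the payload before sending it:

```python
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [2, 4],
            "datatype": "FP32",
            "data": [
                [6.8, 2.8, 4.8, 1.4],
                [6.0, 3.4, 4.5, 1.6],
            ],
        }
    ]
}

# Verify each input tensor's declared shape matches its actual data layout.
for tensor in payload["inputs"]:
    rows, cols = tensor["shape"]
    assert len(tensor["data"]) == rows
    assert all(len(row) == cols for row in tensor["data"])
```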
And finally, you can send some inference requests:
```shell
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-mms -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# Send a request to model1
curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./mms-input.json \
  http://localhost:8080/v2/models/model1/infer

# Repeat for model2 so both InferenceLoggers have predictions to log
curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./mms-input.json \
  http://localhost:8080/v2/models/model2/infer
```
If everything was configured correctly, these predictions should have been logged to two Parquet datasets in S3, one per model:
```python
import pandas as pd

# Reading s3:// paths requires an S3 backend such as s3fs
df = pd.read_parquet("s3://aporia-data/inferencedb/default-sklearn-mms-model1/")
print(df)

df = pd.read_parquet("s3://aporia-data/inferencedb/default-sklearn-mms-model2/")
print(df)
```