From 689e4304943a61fea0861a34ecf600374570cf41 Mon Sep 17 00:00:00 2001
From: Antoine Chaffin
Date: Tue, 1 Oct 2024 18:04:20 +0200
Subject: [PATCH] Add a readme

---
 pylate/server/README.md | 62 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 pylate/server/README.md

diff --git a/pylate/server/README.md b/pylate/server/README.md
new file mode 100644
index 0000000..0923329
--- /dev/null
+++ b/pylate/server/README.md
@@ -0,0 +1,62 @@
+# Serve the embeddings of a PyLate model
+The `server.py` script creates a FastAPI server that serves the embeddings of a PyLate model.
+To use it, run `python server.py`.
+You can then send requests to the API like so:
+```
+curl -X POST http://localhost:8002/v1/embeddings \
+    -H "Content-Type: application/json" \
+    -d '{
+        "input": ["Query 1", "Query 2"],
+        "model": "lightonai/colbertv2.0",
+        "is_query": false
+    }'
+```
+To encode queries instead of documents, set `is_query` to `true` (a Python client sketch follows at the end of this README).
+
+Note that the server leverages [batched](https://github.com/mixedbread-ai/batched): you can send multiple separate requests concurrently and the server will batch them dynamically to fill up the GPU, as shown in the concurrency sketch below.
+
+For now, the server only supports one loaded model, which you can select with the `--model` argument when launching the server, e.g. `python server.py --model lightonai/colbertv2.0`.
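+
+As an alternative to `curl`, the same request can be sent from Python. Below is a minimal client sketch using the `requests` library (an assumption, not a PyLate requirement); it mirrors the curl call above and prints the raw JSON rather than assuming a particular response schema.
+```python
+import requests
+
+# Same payload as the curl example above.
+payload = {
+    "input": ["Query 1", "Query 2"],
+    "model": "lightonai/colbertv2.0",
+    "is_query": False,  # set to True to encode queries instead of documents
+}
+
+response = requests.post("http://localhost:8002/v1/embeddings", json=payload)
+response.raise_for_status()
+
+# Print the raw JSON to inspect how the embeddings are returned.
+print(response.json())
+```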
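+
+To actually benefit from the dynamic batching, several requests need to be in flight at once. Here is a sketch that fires independent requests concurrently so the server can group them into batches on the fly (the thread pool size and request counts are illustrative assumptions):
+```python
+from concurrent.futures import ThreadPoolExecutor
+
+import requests
+
+URL = "http://localhost:8002/v1/embeddings"
+
+def embed(texts):
+    # One request per call; concurrent calls are batched server-side.
+    response = requests.post(
+        URL,
+        json={"input": texts, "model": "lightonai/colbertv2.0", "is_query": False},
+    )
+    response.raise_for_status()
+    return response.json()
+
+# Fire several independent requests at once; the server batches them
+# dynamically instead of processing them one by one.
+with ThreadPoolExecutor(max_workers=8) as pool:
+    results = list(pool.map(embed, [[f"Document {i}"] for i in range(32)]))
+
+print(f"Received {len(results)} responses")
+```
+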