From 689e4304943a61fea0861a34ecf600374570cf41 Mon Sep 17 00:00:00 2001
From: Antoine Chaffin
Date: Tue, 1 Oct 2024 18:04:20 +0200
Subject: [PATCH] Add a readme

---
 pylate/server/README.md | 62 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 pylate/server/README.md

diff --git a/pylate/server/README.md b/pylate/server/README.md
new file mode 100644
index 0000000..0923329
--- /dev/null
+++ b/pylate/server/README.md
@@ -0,0 +1,62 @@
+# Serve the embeddings of a PyLate model
+The `server.py` script creates a FastAPI server that serves the embeddings of a PyLate model.
+To use it, run `python server.py`.
+You can then send requests to the API like so:
+```
+curl -X POST http://localhost:8002/v1/embeddings \
+    -H "Content-Type: application/json" \
+    -d '{
+        "input": ["Query 1", "Query 2"],
+        "model": "lightonai/colbertv2.0",
+        "is_query": false
+    }'
+```
+To encode queries instead of documents, set `is_query` to `true` (a Python client sketch follows at the end of this README).
+
+Note that the server leverages [batched](https://github.com/mixedbread-ai/batched): you can send multiple separate requests concurrently and the server will batch them dynamically to fill up the GPU, as shown in the concurrency sketch below.
+
+For now, the server only supports one loaded model, which you can select with the `--model` argument when launching the server, e.g. `python server.py --model lightonai/colbertv2.0`.
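+
+As an alternative to `curl`, the same request can be sent from Python. Below is a minimal client sketch using the `requests` library (an assumption, not a PyLate requirement); it mirrors the curl call above and prints the raw JSON rather than assuming a particular response schema.
+```python
+import requests
+
+# Same payload as the curl example above.
+payload = {
+    "input": ["Query 1", "Query 2"],
+    "model": "lightonai/colbertv2.0",
+    "is_query": False,  # set to True to encode queries instead of documents
+}
+
+response = requests.post("http://localhost:8002/v1/embeddings", json=payload)
+response.raise_for_status()
+
+# Print the raw JSON to inspect how the embeddings are returned.
+print(response.json())
+```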
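+
+To actually benefit from the dynamic batching, several requests need to be in flight at once. Here is a sketch that fires independent requests concurrently so the server can group them into batches on the fly (the thread pool size and request counts are illustrative assumptions):
+```python
+from concurrent.futures import ThreadPoolExecutor
+
+import requests
+
+URL = "http://localhost:8002/v1/embeddings"
+
+def embed(texts):
+    # One request per call; concurrent calls are batched server-side.
+    response = requests.post(
+        URL,
+        json={"input": texts, "model": "lightonai/colbertv2.0", "is_query": False},
+    )
+    response.raise_for_status()
+    return response.json()
+
+# Fire several independent requests at once; the server batches them
+# dynamically instead of processing them one by one.
+with ThreadPoolExecutor(max_workers=8) as pool:
+    results = list(pool.map(embed, [[f"Document {i}"] for i in range(32)]))
+
+print(f"Received {len(results)} responses")
+```
+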