From 2583f5e0e9dc79a725e457cd94ef07d6574392a4 Mon Sep 17 00:00:00 2001
From: Antoine Chaffin
Date: Wed, 2 Oct 2024 15:01:11 +0200
Subject: [PATCH] Enhancement of the documentation

---
 docs/documentation/.pages                    | 2 +-
 docs/documentation/{server.md => fastapi.md} | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)
 rename docs/documentation/{server.md => fastapi.md} (73%)

diff --git a/docs/documentation/.pages b/docs/documentation/.pages
index 0732036..5aa0204 100644
--- a/docs/documentation/.pages
+++ b/docs/documentation/.pages
@@ -4,4 +4,4 @@ nav:
   - Datasets: datasets.md
   - Retrieval: retrieval.md
   - Evaluation: evaluation.md
-  - Serving: server.md
+  - FastAPI: fastapi.md
diff --git a/docs/documentation/server.md b/docs/documentation/fastapi.md
similarity index 73%
rename from docs/documentation/server.md
rename to docs/documentation/fastapi.md
index 3b85a46..9649b68 100644
--- a/docs/documentation/server.md
+++ b/docs/documentation/fastapi.md
@@ -1,4 +1,4 @@
-# Serve the embeddings of a PyLate model
+# Serve the embeddings of a PyLate model using FastAPI
 The ```server.py``` script (located in the ```server``` folder) allows to create a FastAPI server to serve the embeddings of a PyLate model.
 To use it, you need to install the api dependencies: ```pip install "pylate[api]"```
 Then, run ```python server.py``` to launch the server.
@@ -15,7 +15,8 @@ curl -X POST http://localhost:8002/v1/embeddings \
 ```
 If you want to encode queries, simply set ```ìs_query``` to ```True```.
 
-Note that the server leverages [batched](https://github.com/mixedbread-ai/batched), so you can do batch processing by sending multiple separate calls and it will create batches dynamically to fill up the GPU.
+???+ tip
+    Note that the server leverages [batched](https://github.com/mixedbread-ai/batched), so you can do batch processing by sending multiple separate calls and it will create batches dynamically to fill up the GPU.
 
 For now, the server only support one loaded model, which you can define by using the ```--model``` argument when launching the server.