# Serve the embeddings of a PyLate model
The `server.py` script creates a FastAPI server that serves the embeddings of a PyLate model.
To use it, simply run `python server.py`.
You can then send requests to the API like so:
```bash
curl -X POST http://localhost:8002/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Query 1", "Query 2"],
    "model": "lightonai/colbertv2.0",
    "is_query": false
  }'
```
If you want to encode queries, simply set `is_query` to `true`.
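The same request can be sent from Python using only the standard library. This is a hypothetical client sketch (the `embed` helper is not part of the repository), assuming the server is running on `localhost:8002`:

```python
import json
import urllib.request


def embed(texts, model="lightonai/colbertv2.0", is_query=False,
          url="http://localhost:8002/v1/embeddings"):
    """Post texts to the embeddings endpoint and return the parsed JSON response."""
    # Same payload shape as the curl example above.
    payload = json.dumps(
        {"input": texts, "model": model, "is_query": is_query}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the server to be running):
# embeddings = embed(["Query 1", "Query 2"], is_query=True)
```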
Note that the server leverages [batched](https://github.com/mixedbread-ai/batched): you can do batch processing by sending multiple separate calls concurrently, and the server will group them into batches dynamically to fill up the GPU.
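One way to exploit this dynamic batching is to fire independent single-text requests from a thread pool and let the server group them. This is a sketch under assumptions (the helper names are hypothetical, the server is assumed to be running on `localhost:8002`, and the model name is illustrative):

```python
import concurrent.futures
import json
import urllib.request

URL = "http://localhost:8002/v1/embeddings"


def build_payload(text, model="lightonai/colbertv2.0", is_query=True):
    # One text per request: concurrent calls are batched server-side by `batched`.
    return json.dumps(
        {"input": [text], "model": model, "is_query": is_query}
    ).encode("utf-8")


def embed_one(text):
    request = urllib.request.Request(
        URL, data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running server):
# with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
#     results = list(pool.map(embed_one, [f"Query {i}" for i in range(32)]))
```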
For now, the server only supports one loaded model at a time, which you can choose with the `--model` argument when launching the server.
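For example (the model name is illustrative; any PyLate-compatible model should work):

```shell
# Launch the server with an explicitly chosen model
python server.py --model lightonai/colbertv2.0
```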