Merge pull request #44 from lightonai/reranking_doc
Adding reranking documentation and reworking the benchmark documentation
raphaelsty authored Aug 28, 2024
2 parents eb591f4 + 0608d33 commit b52880c
Showing 8 changed files with 129 additions and 17 deletions.
36 changes: 36 additions & 0 deletions README.md
@@ -351,6 +351,42 @@ Sample Output:
]
```

## Rerank

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline, without building an index, you can simply use the `rank` function and pass the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint.
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```

## Contributing

We welcome contributions! To get started:
2 changes: 1 addition & 1 deletion docs/api/losses/Contrastive.md
@@ -10,7 +10,7 @@ Contrastive loss. Expects as input two texts and a label of either 0 or 1. If th

ColBERT model.

- **score_metric** – defaults to `<function colbert_scores at 0x1526b8fe0>`
- **score_metric** – defaults to `<function colbert_scores at 0x17af08fe0>`

ColBERT scoring function. Defaults to colbert_scores.
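For intuition, the late-interaction (MaxSim) score that a ColBERT scoring function computes can be sketched in plain Python. The `colbert_score` helper below is an illustrative sketch, not PyLate's implementation: for each query token embedding, take the maximum dot product over all document token embeddings, then sum over query tokens.

```python
def colbert_score(query_emb, doc_emb):
    """MaxSim: sum over query tokens of the max dot product
    against any document token."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)


# Toy 2-dimensional token embeddings.
q = [[1.0, 0.0], [0.0, 1.0]]                    # 2 query tokens
d = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]        # 3 document tokens
print(colbert_score(q, d))  # 1.0 + 1.0 = 2.0
```

Each query token is matched to its best document token, so the score rewards documents that cover every aspect of the query.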

2 changes: 1 addition & 1 deletion docs/api/losses/Distillation.md
@@ -10,7 +10,7 @@ Distillation loss for ColBERT model. The loss is computed with respect to the fo

SentenceTransformer model.

- **score_metric** (*Callable*) – defaults to `<function colbert_kd_scores at 0x1526b91c0>`
- **score_metric** (*Callable*) – defaults to `<function colbert_kd_scores at 0x17af70360>`

Function that returns a score between two sequences of embeddings.
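For distillation, a batched variant scores each query against each of its candidate documents, producing one row of MaxSim scores per query that can be aligned with the teacher's scores. The sketch below is illustrative only, not PyLate's `colbert_kd_scores`:

```python
def maxsim(query_emb, doc_emb):
    # Late-interaction score: sum over query tokens of the
    # max dot product against any document token.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)


def kd_scores(queries_emb, documents_emb):
    # One row per query; each row holds that query's MaxSim
    # score against each of its candidate documents.
    return [
        [maxsim(q, docs) for docs in candidates]
        for q, candidates in zip(queries_emb, documents_emb)
    ]


q = [[[1.0, 0.0]]]                       # 1 query with 1 token
docs = [[[[1.0, 0.0]], [[0.0, 1.0]]]]    # 2 candidate documents
print(kd_scores(q, docs))  # [[1.0, 0.0]]
```

The student's score matrix is then trained to match the teacher's scores for the same (query, candidates) pairs.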

4 changes: 2 additions & 2 deletions docs/benchmarks/.pages
@@ -1,3 +1,3 @@
title: Benchmarks
title: Models
nav:
- Benchmarks: benchmarks.md
- Models: models.md
11 changes: 0 additions & 11 deletions docs/benchmarks/benchmarks.md

This file was deleted.

15 changes: 15 additions & 0 deletions docs/benchmarks/models.md
@@ -0,0 +1,15 @@
# Available models

Here is a list of the pre-trained ColBERT models available in PyLate along with their results on BEIR:

=== "Table"


| Model | BEIR AVG | NFCorpus | SciFact | SCIDOCS | FiQA2018 | TRECCOVID | HotpotQA | Touche2020 | ArguAna | ClimateFEVER | FEVER | QuoraRetrieval | NQ | DBPedia |
|---------------------------------------|----------|----------|---------|---------|----------|-----------|----------|------------|---------|--------------|-------|----------------|------|---------|
| answerdotai/answerai-colbert-small-v1 | 53.79 | 37.3 | 74.77 | 18.42 | 41.15 | 84.59 | 76.11 | 25.69 | 50.09 | 33.07 | 90.96 | 87.72 | 59.1 | 45.58 |
| lightonai/colbertv2.0 | 50.02 | 33.8 | 69.3 | 15.4 | 35.6 | 73.3 | 66.7 | 26.3 | 46.3 | 17.6 | 78.5 | 85.2 | 56.2 | 44.6 |

Please note that `lightonai/colbertv2.0` is simply a translation of the original [ColBERTv2 model](https://huggingface.co/colbert-ir/colbertv2.0/tree/main) so that it works with PyLate; we thank Omar Khattab for allowing us to share the model on PyLate.

We are planning to release various strong models in the near future, but feel free to contact us if you want to make your existing ColBERT models compatible with PyLate!
40 changes: 38 additions & 2 deletions docs/documentation/retrieval.md
@@ -133,10 +133,46 @@ scores = retriever.retrieve(
)

```
## Remove documents from the index
### Remove documents from the index

To remove documents from the index, use the `remove_documents` method and provide the IDs of the documents to remove:

```python
index.remove_documents(["1", "2"])
```

## ColBERT reranking

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline, without building an index, you can simply use the `rank` function and pass the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint.
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
36 changes: 36 additions & 0 deletions docs/index.md
@@ -351,6 +351,42 @@ Sample Output:
]
```

## Rerank

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline, without building an index, you can simply use the `rank` function and pass the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint.
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```

## Contributing

We welcome contributions! To get started: