Merge pull request #44 from lightonai/reranking_doc
Adding reranking documentation and reworking the benchmark documentation
raphaelsty authored Aug 28, 2024
2 parents eb591f4 + 0608d33 commit b52880c
Showing 8 changed files with 129 additions and 17 deletions.
36 changes: 36 additions & 0 deletions README.md
@@ -351,6 +351,42 @@ Sample Output:
]
```

## Rerank

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline, without building an index, you can simply use the `rank` function and pass the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint.
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```

## Contributing

We welcome contributions! To get started:
2 changes: 1 addition & 1 deletion docs/api/losses/Contrastive.md
@@ -10,7 +10,7 @@ Contrastive loss. Expects as input two texts and a label of either 0 or 1. If th

ColBERT model.

- **score_metric** – defaults to `<function colbert_scores at 0x1526b8fe0>`
- **score_metric** – defaults to `<function colbert_scores at 0x17af08fe0>`

ColBERT scoring function. Defaults to colbert_scores.
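For intuition, the late-interaction (MaxSim) score that a ColBERT scoring function computes can be sketched in plain Python. The `colbert_score` helper below is an illustrative sketch, not PyLate's implementation: for each query token embedding, take the maximum dot product over all document token embeddings, then sum over query tokens.

```python
def colbert_score(query_emb, doc_emb):
    """MaxSim: sum over query tokens of the max dot product
    against any document token."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)


# Toy 2-dimensional token embeddings.
q = [[1.0, 0.0], [0.0, 1.0]]                    # 2 query tokens
d = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]        # 3 document tokens
print(colbert_score(q, d))  # 1.0 + 1.0 = 2.0
```

Each query token is matched to its best document token, so the score rewards documents that cover every aspect of the query.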

2 changes: 1 addition & 1 deletion docs/api/losses/Distillation.md
@@ -10,7 +10,7 @@ Distillation loss for ColBERT model. The loss is computed with respect to the fo

SentenceTransformer model.

- **score_metric** (*Callable*) – defaults to `<function colbert_kd_scores at 0x1526b91c0>`
- **score_metric** (*Callable*) – defaults to `<function colbert_kd_scores at 0x17af70360>`

Function that returns a score between two sequences of embeddings.
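For distillation, a batched variant scores each query against each of its candidate documents, producing one row of MaxSim scores per query that can be aligned with the teacher's scores. The sketch below is illustrative only, not PyLate's `colbert_kd_scores`:

```python
def maxsim(query_emb, doc_emb):
    # Late-interaction score: sum over query tokens of the
    # max dot product against any document token.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)


def kd_scores(queries_emb, documents_emb):
    # One row per query; each row holds that query's MaxSim
    # score against each of its candidate documents.
    return [
        [maxsim(q, docs) for docs in candidates]
        for q, candidates in zip(queries_emb, documents_emb)
    ]


q = [[[1.0, 0.0]]]                       # 1 query with 1 token
docs = [[[[1.0, 0.0]], [[0.0, 1.0]]]]    # 2 candidate documents
print(kd_scores(q, docs))  # [[1.0, 0.0]]
```

The student's score matrix is then trained to match the teacher's scores for the same (query, candidates) pairs.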

4 changes: 2 additions & 2 deletions docs/benchmarks/.pages
@@ -1,3 +1,3 @@
title: Benchmarks
title: Models
nav:
- Benchmarks: benchmarks.md
- Models: models.md
11 changes: 0 additions & 11 deletions docs/benchmarks/benchmarks.md

This file was deleted.

15 changes: 15 additions & 0 deletions docs/benchmarks/models.md
@@ -0,0 +1,15 @@
# Available models

Here is a list of the pre-trained ColBERT models available in PyLate along with their results on BEIR:

=== "Table"


| Model | BEIR AVG | NFCorpus | SciFact | SCIDOCS | FiQA2018 | TRECCOVID | HotpotQA | Touche2020 | ArguAna | ClimateFEVER | FEVER | QuoraRetrieval | NQ | DBPedia |
|---------------------------------------|----------|----------|---------|---------|----------|-----------|----------|------------|---------|--------------|-------|----------------|------|---------|
| answerdotai/answerai-colbert-small-v1 | 53.79 | 37.3 | 74.77 | 18.42 | 41.15 | 84.59 | 76.11 | 25.69 | 50.09 | 33.07 | 90.96 | 87.72 | 59.1 | 45.58 |
| lightonai/colbertv2.0 | 50.02 | 33.8 | 69.3 | 15.4 | 35.6 | 73.3 | 66.7 | 26.3 | 46.3 | 17.6 | 78.5 | 85.2 | 56.2 | 44.6 |

Please note that `lightonai/colbertv2.0` is simply a translation of the original [ColBERTv2 model](https://huggingface.co/colbert-ir/colbertv2.0/tree/main) so that it works with PyLate; we thank Omar Khattab for allowing us to share the model on PyLate.

We are planning to release various strong models in the near future, but feel free to contact us if you want to make your existing ColBERT models compatible with PyLate!
40 changes: 38 additions & 2 deletions docs/documentation/retrieval.md
@@ -133,10 +133,46 @@ scores = retriever.retrieve(
)

```
## Remove documents from the index
### Remove documents from the index

To remove documents from the index, use the `remove_documents` method and provide the IDs of the documents to remove:

```python
index.remove_documents(["1", "2"])
```

## ColBERT reranking

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline, without building an index, you can simply use the `rank` function and pass the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint.
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
36 changes: 36 additions & 0 deletions docs/index.md
@@ -351,6 +351,42 @@ Sample Output:
]
```

## Rerank

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline, without building an index, you can simply use the `rank` function and pass the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint.
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```

## Contributing

We welcome contributions! To get started: