Adding reranking documentation and reworking the benchmark documentation #44

Merged 1 commit on Aug 28, 2024
36 changes: 36 additions & 0 deletions README.md
@@ -351,6 +351,42 @@ Sample Output:
]
```

## Rerank

If you only want to use the ColBERT model to rerank the output of a first-stage retrieval pipeline without building an index, you can simply use the `rank.rerank` function and pass it the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]
documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]
documents_ids = [
    [1, 2],
    [1, 3, 2],
]

# Encode queries and documents separately: ColBERT processes
# them differently, hence the is_query flag
queries_embeddings = model.encode(
    queries,
    is_query=True,
)
documents_embeddings = model.encode(
    documents,
    is_query=False,
)

# Score each candidate list against its query and sort by relevance
reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
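
The `rank.rerank` call returns one ranked list per query, sorted by descending relevance. A sketch of the expected shape, assuming entries carry the document id and its late-interaction score as in the retrieval output (the scores below are illustrative):

```python
# Sample Output:
[
    [
        {"id": 2, "score": 31.2},
        {"id": 1, "score": 28.7},
    ],
    [
        {"id": 3, "score": 33.5},
        {"id": 2, "score": 30.1},
        {"id": 1, "score": 27.4},
    ],
]
```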

## Contributing

We welcome contributions! To get started:
2 changes: 1 addition & 1 deletion docs/api/losses/Contrastive.md
@@ -10,7 +10,7 @@ Contrastive loss. Expects as input two texts and a label of either 0 or 1. If th

ColBERT model.

- **score_metric** – defaults to `<function colbert_scores at 0x1526b8fe0>`
- **score_metric** – defaults to `<function colbert_scores at 0x17af08fe0>`

ColBERT scoring function. Defaults to colbert_scores.
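
For context, here is a minimal sketch of instantiating this loss for training, assuming the `models.ColBERT` and `losses.Contrastive` entry points used elsewhere in the PyLate docs (the base checkpoint name is illustrative):

```python
from pylate import losses, models

model = models.ColBERT(model_name_or_path="bert-base-uncased")

# Contrastive loss over the ColBERT model, using the default
# colbert_scores metric
train_loss = losses.Contrastive(model=model)
```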

2 changes: 1 addition & 1 deletion docs/api/losses/Distillation.md
@@ -10,7 +10,7 @@ Distillation loss for ColBERT model. The loss is computed with respect to the fo

SentenceTransformer model.

- **score_metric** (*Callable*) – defaults to `<function colbert_kd_scores at 0x1526b91c0>`
- **score_metric** (*Callable*) – defaults to `<function colbert_kd_scores at 0x17af70360>`

Function that returns a score between two sequences of embeddings.
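
Similarly, a minimal sketch of instantiating the distillation loss, under the same assumptions (the base checkpoint name is illustrative):

```python
from pylate import losses, models

model = models.ColBERT(model_name_or_path="bert-base-uncased")

# Distillation loss over the ColBERT model, using the default
# colbert_kd_scores metric
train_loss = losses.Distillation(model=model)
```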

4 changes: 2 additions & 2 deletions docs/benchmarks/.pages
@@ -1,3 +1,3 @@
title: Benchmarks
title: Models
nav:
- Benchmarks: benchmarks.md
- Models: models.md
11 changes: 0 additions & 11 deletions docs/benchmarks/benchmarks.md

This file was deleted.

15 changes: 15 additions & 0 deletions docs/benchmarks/models.md
@@ -0,0 +1,15 @@
# Available models

Here is a list of the pre-trained ColBERT models available in PyLate along with their results on BEIR:

=== "Table"


| Model | BEIR AVG | NFCorpus | SciFact | SCIDOCS | FiQA2018 | TRECCOVID | HotpotQA | Touche2020 | ArguAna | ClimateFEVER | FEVER | QuoraRetrieval | NQ | DBPedia |
|---------------------------------------|----------|----------|---------|---------|----------|-----------|----------|------------|---------|--------------|-------|----------------|------|---------|
| answerdotai/answerai-colbert-small-v1 | 53.79 | 37.3 | 74.77 | 18.42 | 41.15 | 84.59 | 76.11 | 25.69 | 50.09 | 33.07 | 90.96 | 87.72 | 59.1 | 45.58 |
| lightonai/colbertv2.0 | 50.02 | 33.8 | 69.3 | 15.4 | 35.6 | 73.3 | 66.7 | 26.3 | 46.3 | 17.6 | 78.5 | 85.2 | 56.2 | 44.6 |

Please note that `lightonai/colbertv2.0` is simply a translation of the original [ColBERTv2 model](https://huggingface.co/colbert-ir/colbertv2.0/tree/main) to work with PyLate, and we thank Omar Khattab for allowing us to share the model on PyLate.

We are planning to release various strong models in the near future, but feel free to contact us if you want to make your existing ColBERT models compatible with PyLate!
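
Any of the checkpoints above can be loaded by name. A minimal sketch, assuming the standard `models.ColBERT` constructor:

```python
from pylate import models

# Load one of the pre-trained checkpoints from the table above
model = models.ColBERT(
    model_name_or_path="answerdotai/answerai-colbert-small-v1",
)
```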
40 changes: 38 additions & 2 deletions docs/documentation/retrieval.md
@@ -133,10 +133,46 @@ scores = retriever.retrieve(
)

```
## Remove documents from the index
### Remove documents from the index

To remove documents from the index, use the `remove_documents` method and provide the IDs of the documents you want to delete:

```python
index.remove_documents(["1", "2"])
```

## ColBERT reranking

If you only want to use the ColBERT model to rerank the output of a first-stage retrieval pipeline without building an index, you can simply use the `rank.rerank` function and pass it the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]
documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]
documents_ids = [
    [1, 2],
    [1, 3, 2],
]

# Encode queries and documents separately: ColBERT processes
# them differently, hence the is_query flag
queries_embeddings = model.encode(
    queries,
    is_query=True,
)
documents_embeddings = model.encode(
    documents,
    is_query=False,
)

# Score each candidate list against its query and sort by relevance
reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
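
As with retrieval, the result is one ranked list per query, sorted by descending relevance. A sketch of the expected shape, assuming entries carry the document id and its score (values illustrative):

```python
# Sample Output:
[
    [
        {"id": 2, "score": 31.2},
        {"id": 1, "score": 28.7},
    ],
    [
        {"id": 3, "score": 33.5},
        {"id": 2, "score": 30.1},
        {"id": 1, "score": 27.4},
    ],
]
```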
36 changes: 36 additions & 0 deletions docs/index.md
@@ -351,6 +351,42 @@ Sample Output:
]
```

## Rerank

If you only want to use the ColBERT model to rerank the output of a first-stage retrieval pipeline without building an index, you can simply use the `rank.rerank` function and pass it the queries and documents to rerank:

```python
from pylate import models, rank

# Load a PyLate-compatible ColBERT checkpoint
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

queries = [
    "query A",
    "query B",
]
documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]
documents_ids = [
    [1, 2],
    [1, 3, 2],
]

# Encode queries and documents separately: ColBERT processes
# them differently, hence the is_query flag
queries_embeddings = model.encode(
    queries,
    is_query=True,
)
documents_embeddings = model.encode(
    documents,
    is_query=False,
)

# Score each candidate list against its query and sort by relevance
reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
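
The `rank.rerank` call returns one ranked list per query, sorted by descending relevance. A sketch of the expected shape, assuming entries carry the document id and its late-interaction score as in the retrieval output (the scores below are illustrative):

```python
# Sample Output:
[
    [
        {"id": 2, "score": 31.2},
        {"id": 1, "score": 28.7},
    ],
    [
        {"id": 3, "score": 33.5},
        {"id": 2, "score": 30.1},
        {"id": 1, "score": 27.4},
    ],
]
```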

## Contributing

We welcome contributions! To get started: