Replace giga_cherche by pylate #39

Merged · 1 commit · Aug 22, 2024
2 changes: 1 addition & 1 deletion .github/workflows/python-tests.yml
@@ -29,4 +29,4 @@ jobs:

- name: Run tests library
run: |
pytest giga_cherche
pytest pylate
14 changes: 7 additions & 7 deletions README.md
@@ -15,7 +15,7 @@ For example, to run the BEIR evaluations using giga-cherche indexes:
# Modeling
The modeling of giga-cherche is based on sentence-transformers, which allows building a ColBERT model from any available encoder by appending a projection layer to the encoder output to reduce the embedding dimension.
```
from giga_cherche import models
from pylate import models
model_name = "bert-base-uncased"
model = models.ColBERT(model_name_or_path=model_name)
```
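
The snippet above builds the model but does not show the projection-layer dimension that the paragraph describes. Below is a minimal, hedged sketch of how one might set it; the `embedding_size` argument name is an assumption for illustration and is not confirmed by this diff.

```python
from pylate import models

# Hypothetical sketch: the projection layer appended to the encoder output is
# assumed to be sized via an `embedding_size` argument (name not confirmed here).
model = models.ColBERT(
    model_name_or_path="bert-base-uncased",
    embedding_size=128,  # assumed: target dimension of the token embeddings
)

print(model)  # inspect the modules, including the appended projection layer
```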
@@ -40,7 +40,7 @@ from sentence_transformers import (
SentenceTransformerTrainingArguments,
)

from giga_cherche import losses, models, datasets, evaluation
from pylate import losses, models, datasets, evaluation

model_name = "bert-base-uncased"
batch_size = 32
@@ -134,7 +134,7 @@ Note that query embeddings cannot be pooled.
You can then compute the ColBERT max-sim scores like this:

```python
from giga_cherche import scores
from pylate import scores
similarity_scores = scores.colbert_scores(query_embeddings, document_embeddings)
```
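
For context, here is a hedged sketch of how the `query_embeddings` and `document_embeddings` used above might be produced; the `is_query` flag on `encode` is an assumption for illustration and is not shown in this diff.

```python
from pylate import models, scores

model = models.ColBERT(model_name_or_path="bert-base-uncased")

# Assumed encode API: `is_query` switches between query and document encoding.
query_embeddings = model.encode(["what is late interaction?"], is_query=True)
document_embeddings = model.encode(
    ["ColBERT scores queries against documents token by token."],
    is_query=False,
)

# Max-sim scoring as in the snippet above.
similarity_scores = scores.colbert_scores(query_embeddings, document_embeddings)
```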

@@ -147,7 +147,7 @@ Before being able to create and use an index, you need to launch the Weaviate
To populate an index, simply create it and then add the computed embeddings with their corresponding ids:

```python
from giga_cherche import indexes
from pylate import indexes

index = indexes.Weaviate(name="test_index")

@@ -171,7 +171,7 @@ index.remove_documents(["1"])
To retrieve documents from the index, you can use the following code snippet:

```python
from giga_cherche import retrieve
from pylate import retrieve

retriever = retrieve.ColBERT(Weaviate)

Expand All @@ -185,7 +185,7 @@ retrieved_chunks = retriever.retrieve(queries_embeddings, k=10)
You can also simply rerank a list of ids produced by an upstream retrieval module (such as BM25):

```python
from giga_cherche import rerank
from pylate import rerank

reranker = rerank.ColBERT(Weaviate)

@@ -199,7 +199,7 @@ reranked_chunks = reranker.rerank(
We can evaluate the performance of the model using the BEIR evaluation framework. The following code snippet shows how to evaluate the model on the SciFact dataset:

```python
from giga_cherche import evaluation, indexes, models, retrieve, utils
from pylate import evaluation, indexes, models, retrieve, utils

model = models.ColBERT(
model_name_or_path="NohTow/colbertv2_sentence_transformer",
2 changes: 1 addition & 1 deletion evaluation/beir.py
@@ -1,6 +1,6 @@
"""Evaluation script for the SciFact dataset using the Beir library."""

from giga_cherche import evaluation, indexes, models, retrieve
from pylate import evaluation, indexes, models, retrieve

model = models.ColBERT(
model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
2 changes: 1 addition & 1 deletion evaluation/miracl.py
@@ -2,7 +2,7 @@

from beir.datasets.data_loader import GenericDataLoader

from giga_cherche import evaluation, indexes, models, retrieve
from pylate import evaluation, indexes, models, retrieve

model = models.ColBERT(
model_name_or_path="NohTow/colbert_xml-r-english",
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions giga_cherche/evaluation/beir.py → pylate/evaluation/beir.py
@@ -44,7 +44,7 @@ def load_beir(dataset_name: str, split: str = "test") -> tuple[list, list, dict]

Examples
--------
>>> from giga_cherche import evaluation
>>> from pylate import evaluation

>>> documents, queries, qrels = evaluation.load_beir(
... "scifact",
@@ -111,7 +111,7 @@ def get_beir_triples(

Examples
--------
>>> from giga_cherche import evaluation
>>> from pylate import evaluation

>>> documents, queries, qrels = evaluation.load_beir(
... "scifact",
@@ -41,7 +41,7 @@ class ColBERTDistillationEvaluator(SentenceEvaluator):
Examples
--------

>>> from giga_cherche import models, evaluation
>>> from pylate import models, evaluation

>>> model = models.ColBERT(
... model_name_or_path="sentence-transformers/all-MiniLM-L6-v2", device="cpu"
@@ -104,7 +104,7 @@ class ColBERTTripletEvaluator(TripletEvaluator):

Examples
--------
>>> from giga_cherche import evaluation, models
>>> from pylate import evaluation, models

>>> model = models.ColBERT(
... model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
File renamed without changes.
File renamed without changes.
@@ -46,7 +46,7 @@ class Voyager(Base):

Examples
--------
>>> from giga_cherche import indexes, models
>>> from pylate import indexes, models

>>> index = indexes.Voyager(
... index_folder="test_indexes",
File renamed without changes.
@@ -81,7 +81,7 @@ class Contrastive(nn.Module):

Examples
--------
>>> from giga_cherche import models, losses
>>> from pylate import models, losses

>>> model = models.ColBERT(
... model_name_or_path="sentence-transformers/all-MiniLM-L6-v2", device="cpu"
@@ -21,7 +21,7 @@ class Distillation(torch.nn.Module):

Examples
--------
>>> from giga_cherche import models, losses
>>> from pylate import models, losses

>>> model = models.ColBERT(
... model_name_or_path="sentence-transformers/all-MiniLM-L6-v2", device="cpu"
2 changes: 1 addition & 1 deletion giga_cherche/models/Dense.py → pylate/models/Dense.py
@@ -28,7 +28,7 @@ class Dense(DenseSentenceTransformer):

Examples
--------
>>> from giga_cherche import models
>>> from pylate import models

>>> model = models.Dense(
... in_features=768,
File renamed without changes.
4 changes: 2 additions & 2 deletions giga_cherche/models/colbert.py → pylate/models/colbert.py
@@ -148,7 +148,7 @@ class ColBERT(SentenceTransformer):

Examples
--------
>>> from giga_cherche import models
>>> from pylate import models

>>> model = models.ColBERT(
... model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
@@ -808,7 +808,7 @@ def encode_multi_process(

Examples
--------
>>> from giga_cherche import models
>>> from pylate import models

>>> model = models.ColBERT(
... "sentence-transformers/all-MiniLM-L6-v2",
File renamed without changes.
2 changes: 1 addition & 1 deletion giga_cherche/rank/rank.py → pylate/rank/rank.py
@@ -41,7 +41,7 @@ def rerank(

Examples
--------
>>> from giga_cherche import models, rank
>>> from pylate import models, rank

>>> model = models.ColBERT(
... model_name_or_path="sentence-transformers/all-MiniLM-L6-v2", device="cpu"
File renamed without changes.
@@ -15,7 +15,7 @@ class ColBERT:

Examples
--------
>>> from giga_cherche import indexes, models, retrieve
>>> from pylate import indexes, models, retrieve

>>> model = models.ColBERT(
... model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
File renamed without changes.
2 changes: 0 additions & 2 deletions giga_cherche/scores/scores.py → pylate/scores/scores.py
@@ -1,5 +1,3 @@
"""ColBERT scores computation."""

import numpy as np
import torch

File renamed without changes.
@@ -16,7 +16,7 @@ class ColBERTCollator:

Examples
--------
>>> from giga_cherche import models, utils
>>> from pylate import models, utils

>>> model = models.ColBERT(
... model_name_or_path="sentence-transformers/all-MiniLM-L6-v2", device="cpu"
@@ -8,7 +8,7 @@ def iter_batch(

Examples
-------
>>> from giga_cherche import utils
>>> from pylate import utils

>>> X = [
... "element 0",
File renamed without changes.
@@ -21,7 +21,7 @@ class KDProcessing:
Examples
--------
>>> from datasets import load_dataset
>>> from giga_cherche import utils
>>> from pylate import utils

>>> train = load_dataset(
... path="lightonai/lighton-ms-marco-mini",
@@ -121,7 +121,7 @@ def map(self, example: dict) -> dict:
Examples
--------
>>> from datasets import load_dataset
>>> from giga_cherche import utils
>>> from pylate import utils

>>> train = load_dataset(
... path="lightonai/lighton-ms-marco-mini",
File renamed without changes.
4 changes: 2 additions & 2 deletions setup.py
@@ -1,6 +1,6 @@
import setuptools

from giga_cherche.__version__ import __version__
from pylate.__version__ import __version__

with open(file="README.md", mode="r", encoding="utf-8") as fh:
long_description = fh.read()
@@ -20,7 +20,7 @@


setuptools.setup(
name="giga_cherche",
name="pylate",
version=f"{__version__}",
license="",
author="LightON",
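
As a quick sanity check of the rename, the sketch below uses only the `pylate.__version__` import that appears in this diff; it is an illustrative snippet under the assumption that the package is installed locally, not part of the PR itself.

```python
# Verify that the renamed package imports and exposes its version string.
from pylate.__version__ import __version__

print(__version__)
```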
2 changes: 1 addition & 1 deletion tests/test_contrastive.py
@@ -11,7 +11,7 @@
)
from sentence_transformers.training_args import BatchSamplers

from giga_cherche import evaluation, losses, models, utils
from pylate import evaluation, losses, models, utils


def test_contrastive_training() -> None:
2 changes: 1 addition & 1 deletion tests/test_kd.py
@@ -9,7 +9,7 @@
SentenceTransformerTrainingArguments,
)

from giga_cherche import losses, models, utils
from pylate import losses, models, utils


def test_kd_training() -> None:
2 changes: 1 addition & 1 deletion tests/test_retriever.py
@@ -1,4 +1,4 @@
from giga_cherche import indexes, models, retrieve
from pylate import indexes, models, retrieve


def test_voyager_index(**kwargs) -> None:
2 changes: 1 addition & 1 deletion train/knowledge_distillation.py
@@ -4,7 +4,7 @@
SentenceTransformerTrainingArguments,
)

from giga_cherche import losses, models, utils
from pylate import losses, models, utils

train = load_dataset(
path="./datasets/msmarco_fr_full",
2 changes: 1 addition & 1 deletion train/triplet.py
@@ -5,7 +5,7 @@
)
from sentence_transformers.training_args import BatchSamplers

from giga_cherche import evaluation, losses, models, utils
from pylate import evaluation, losses, models, utils

model_name = "NohTow/colbertv2_sentence_transformer" # "distilroberta-base" # Choose the model you want
batch_size = 32 # The larger you select this, the better the results (usually). But it requires more GPU memory