v0.2.0: New Features Integrated with BEIR

@thakur-nandan thakur-nandan released this 06 Jul 16:38
· 168 commits to main since this release

FAISS Indexes and Search Integration

  • FAISS indexes can now be created and used for evaluation within the BEIR repository. We have added support for Flat-IP, HNSW, PQ, PCAMatrix, and BinaryFlat indexes.
  • FAISS indexes use various compression algorithms that reduce index memory size or improve retrieval speed.
  • You can also save your corpus embeddings as a FAISS index, which wasn't possible with the original exact search.
  • Check out how to evaluate dense retrieval using a FAISS index [here] and dimension reduction using PCA [here].
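
As a rough illustration of what a Flat-IP index computes, the sketch below performs exact (brute-force) inner-product search in plain Python. It does not use FAISS or the BEIR API; the function name `flat_ip_search` and the toy embeddings are assumptions made for this example only.

```python
from typing import List, Tuple


def flat_ip_search(
    corpus_embeddings: List[List[float]],
    query_embedding: List[float],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Return the top_k (doc_index, score) pairs ranked by inner product.

    Hypothetical helper: it mimics the exhaustive scoring a flat
    inner-product index performs, with no compression or approximation.
    """
    scored = []
    for doc_index, doc_embedding in enumerate(corpus_embeddings):
        # Inner product between the query and one document embedding.
        score = sum(q * d for q, d in zip(query_embedding, doc_embedding))
        scored.append((doc_index, score))
    # Highest score first; ties are broken by the lower document index.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy data: three 2-dimensional document embeddings and one query.
corpus = [[1, 0], [0, 1], [1, 1]]
query = [2, 1]
top_hits = flat_ip_search(corpus, query, top_k=2)
```

Compressed indexes such as PQ or HNSW trade some of this exactness for lower memory use or faster search, which is why a flat index is a useful baseline to compare them against.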

Multilingual Datasets and Evaluation

  • Thanks to @julian-risch, we have added our first multilingual dataset to the BEIR repository - GermanQuAD (German SQuAD dataset).
  • We have updated the Elasticsearch integration to allow evaluation on languages other than English; check it out [here].
  • We have also added a DPR model class that lets you load DPR models from the Hugging Face repository. You can now use this class for evaluation, for example with the GermanDPR model [link].

DeepCT Evaluation

  • We have ported the original DeepCT code to work with TensorFlow (tf) >v2.0 and now host the updated repo [here].
  • Using the hosted code, DeepCT can now be evaluated in BEIR with Anserini retrieval; check [here].

Training Latest MSMARCO v3 Models

  • We have integrated the latest MSMARCO training code from the SentenceTransformers repository, which trains on custom, manually provided hard negatives. This yields the state-of-the-art SBERT models trained on MSMARCO; check [here].

Using Multiple GPUs for Question Generation

  • A big challenge was using multiple GPUs to speed up question generation. We have added process pools so that questions are now generated much faster, across multiple GPUs in parallel; check [here].
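
The process-pool idea can be sketched roughly as follows: split the corpus into one shard per GPU and let a pool of worker processes handle the shards in parallel. This is not the BEIR implementation; `shard_corpus`, `generate_on_device`, and `generate_parallel` are hypothetical names, and the worker returns placeholder strings where a real one would load the question-generation model onto its GPU.

```python
from multiprocessing import Pool
from typing import List, Tuple


def shard_corpus(doc_ids: List[str], num_gpus: int) -> List[List[str]]:
    """Split document ids round-robin into one shard per GPU."""
    return [doc_ids[i::num_gpus] for i in range(num_gpus)]


def generate_on_device(task: Tuple[int, List[str]]) -> List[Tuple[str, str]]:
    """Worker for one shard (hypothetical).

    A real worker would load the generator model onto the GPU with index
    device_id and run generation; here a placeholder string stands in for
    each generated question.
    """
    device_id, shard = task
    return [
        (doc_id, f"question for {doc_id} (device {device_id})")
        for doc_id in shard
    ]


def generate_parallel(doc_ids: List[str], num_gpus: int) -> List[Tuple[str, str]]:
    """Run one worker process per GPU and merge the per-shard results."""
    tasks = list(enumerate(shard_corpus(doc_ids, num_gpus)))
    with Pool(processes=num_gpus) as pool:
        per_shard = pool.map(generate_on_device, tasks)
    # Flatten the list of per-shard result lists into one list.
    return [pair for shard_result in per_shard for pair in shard_result]
```

Calling `generate_parallel(doc_ids, num_gpus=2)` from a script's main guard would process two shards concurrently. With real CUDA workers, the process start method usually needs care, which this sketch leaves out.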

Integration of Binary Passage Retrievers (BPR)

  • BPR (ACL'21, link) is now integrated within the BEIR benchmark. You can now easily train a state-of-the-art BPR model on MSMARCO using the loss function described in the original paper; check [here].
  • You can also easily evaluate BPR in a zero-shot fashion; check [here].
  • We will soon open-source the public BPR models trained on MSMARCO.
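
For readers new to BPR, here is a minimal sketch of its two-stage search as we understand it from the paper: candidates are generated by Hamming distance between binary codes, then reranked with the inner product between the continuous query embedding and the binary passage codes. The function names and toy vectors are assumptions for illustration, not the BEIR or BPR API.

```python
from typing import List


def binarize(vector: List[float]) -> List[int]:
    """Sign-based hashing: 1 where a component is positive, else 0."""
    return [1 if value > 0 else 0 for value in vector]


def hamming_distance(code_a: List[int], code_b: List[int]) -> int:
    """Number of positions where the two binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def bpr_search(
    query_vector: List[float],
    passage_codes: List[List[int]],
    num_candidates: int,
    top_k: int,
) -> List[int]:
    """Two-stage retrieval sketch; returns the ranked passage indices."""
    query_code = binarize(query_vector)

    # Stage 1: cheap candidate generation using Hamming distance.
    candidates = sorted(
        range(len(passage_codes)),
        key=lambda i: (hamming_distance(query_code, passage_codes[i]), i),
    )[:num_candidates]

    # Stage 2: rerank candidates with the inner product between the
    # continuous query vector and passage codes mapped to {-1, +1}.
    def rerank_score(i: int) -> float:
        return sum(
            q * (1.0 if bit else -1.0)
            for q, bit in zip(query_vector, passage_codes[i])
        )

    return sorted(candidates, key=lambda i: (-rerank_score(i), i))[:top_k]


# Toy continuous passage embeddings, hashed once at indexing time.
passages = [
    [0.9, 0.8, -0.1, 0.3],
    [-0.5, 0.2, 0.7, -0.9],
    [0.4, -0.6, -0.2, 0.1],
]
codes = [binarize(p) for p in passages]
query = [1.0, 0.5, -0.3, 0.2]
ranking = bpr_search(query, codes, num_candidates=2, top_k=2)
```

Only the binary codes are stored for passages, which is where the memory saving over a dense float index comes from.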