
PairDistill: Pairwise Relevance Distillation for Dense Retrieval

📃 Paper • 🤗 Models & Datasets

Source code, trained models, and data of our paper "PairDistill: Pairwise Relevance Distillation for Dense Retrieval", accepted to EMNLP 2024 Main Conference.

Please cite the following reference if you find our code, models, or datasets useful.

@inproceedings{huang2024pairdistill,
    title={PairDistill: Pairwise Relevance Distillation for Dense Retrieval},
    author={Chao-Wei Huang and Yun-Nung Chen},
    year={2024},
    booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)}
}

Overview

PairDistill is a pairwise relevance distillation framework designed to enhance the retrieval performance of dense retrieval models. It distills pairwise relevance signals into the student retriever to guide training, and achieves superior performance on MS MARCO, BEIR, and LoTTE.
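To illustrate the idea of pairwise distillation, here is a minimal, hypothetical sketch (not the paper's actual implementation): the student's score gap for each document pair is pushed toward the teacher's pairwise preference probability with a cross-entropy objective. The function names and data layout below are illustrative assumptions.

```python
import math

def pairwise_distill_loss(student_scores, teacher_pair_probs):
    """Illustrative pairwise distillation objective.

    student_scores: dict mapping doc id -> student relevance score
    teacher_pair_probs: dict mapping (doc_i, doc_j) -> teacher's P(doc_i > doc_j)
    """
    loss = 0.0
    for (i, j), p_teacher in teacher_pair_probs.items():
        # Student's probability that doc i outranks doc j,
        # modeled as a sigmoid of the score difference (Bradley-Terry style).
        p_student = 1.0 / (1.0 + math.exp(-(student_scores[i] - student_scores[j])))
        # Cross-entropy between teacher and student pair preferences.
        loss -= p_teacher * math.log(p_student) + (1.0 - p_teacher) * math.log(1.0 - p_student)
    return loss / len(teacher_pair_probs)
```

A student whose score gaps agree with the teacher's preferences incurs a lower loss than one that scores the pair equally.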

Install Dependencies

Create a new Python 3.9+ environment using virtualenv or conda.

conda create -n pair-distill python=3.10
conda activate pair-distill
# Install Python dependencies. Versions are pinned in requirements.txt, but newer versions should generally work.
pip install -r requirements.txt

PairDistill supports two dense retrieval models: ColBERT, and DPR via the dpr-scale library. Install the dependencies for the model you want to use.

# Install ColBERT dependencies
pip install -r ColBERT/requirements.txt

# Install DPR dependencies
pip install -r dpr-scale/requirements.txt

Training

Navigate to the ColBERT directory to train ColBERT. You can launch training directly with

python3 train.py
