This guide provides instructions to reproduce the DistilBERT KD dense retrieval model on the MS MARCO passage ranking task, described in the following paper:
Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, and Allan Hanbury. Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation. arXiv:2010.02666, October 2020.
Note that we often observe minor differences in scores between different computing environments (e.g., Linux vs. macOS). However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective. Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.
Dense retrieval with a brute-force index:
python -m pyserini.search.faiss \
  --index msmarco-v1-passage.distilbert-dot-margin-mse-t2 \
  --topics msmarco-passage-dev-subset \
  --encoded-queries distilbert_kd-msmarco-passage-dev-subset \
  --output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv \
  --output-format msmarco \
  --batch-size 512 --threads 16
Replace --encoded-queries with --encoder sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco for on-the-fly query encoding.
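Retrieval can also be run programmatically through Pyserini's Python API. Below is a minimal sketch using FaissSearcher with AutoQueryEncoder for on-the-fly query encoding; it assumes the encoder's default CLS pooling (without L2 normalization) matches how this dot-product model encodes queries:

# A minimal sketch of the same retrieval step via Pyserini's Python API.
# Assumption: AutoQueryEncoder's default CLS pooling (no L2 normalization)
# matches how this dot-product model encodes queries.
from pyserini.search.faiss import AutoQueryEncoder, FaissSearcher

encoder = AutoQueryEncoder('sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco')
searcher = FaissSearcher.from_prebuilt_index('msmarco-v1-passage.distilbert-dot-margin-mse-t2', encoder)

# Retrieve the top 10 passages for a sample query.
hits = searcher.search('what is the capital of france?', k=10)
for i, hit in enumerate(hits):
    print(f'{i + 1:2} {hit.docid:10} {hit.score:.5f}')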
To evaluate:
python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
  runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv
Results:
#####################
MRR @10: 0.3251
QueriesRanked: 6980
#####################
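MRR@10 is the reciprocal rank of the first relevant passage within the top 10, averaged over the ranked queries. As a sanity check, it can be recomputed directly from the run file; below is a minimal sketch, assuming the standard MS MARCO tab-separated formats (qrels lines: qid, 0, docid, label; run lines: qid, docid, rank) and using a placeholder path for the dev qrels:

# A minimal sketch of the MRR@10 computation, assuming the standard
# MS MARCO tab-separated formats. The qrels path is a placeholder.
from collections import defaultdict

# Relevant passage ids per query (qrels format: qid, 0, docid, label).
qrels = defaultdict(set)
with open('qrels.dev.small.tsv') as f:  # placeholder path to the dev qrels
    for line in f:
        qid, _, docid, label = line.strip().split('\t')
        if int(label) > 0:
            qrels[qid].add(docid)

# Ranked passage ids per query (run format: qid, docid, rank).
run = defaultdict(dict)
with open('runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv') as f:
    for line in f:
        qid, docid, rank = line.strip().split('\t')
        run[qid][int(rank)] = docid

# Sum the reciprocal rank of the first relevant passage in the top 10,
# then average over the queries in the run (6,980 on the dev subset).
total = 0.0
for qid, ranked in run.items():
    for rank in sorted(ranked):
        if rank > 10:
            break
        if ranked[rank] in qrels[qid]:
            total += 1.0 / rank
            break

print(f'MRR @10: {total / len(run):.4f}')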
We can also use the official TREC evaluation tool, trec_eval, to compute metrics other than MRR@10. For that, we first need to convert the run file to TREC format:
python -m pyserini.eval.convert_msmarco_run_to_trec_run \
  --input runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv \
  --output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.trec
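The conversion itself is mechanical: each MS MARCO line (qid, docid, rank) becomes a six-column TREC line (qid, Q0, docid, rank, score, tag). A minimal sketch, using 1/rank as a synthetic descending score (an assumption; the actual conversion tool may assign scores differently):

# A minimal sketch of the run conversion, assuming MS MARCO run lines of
# the form 'qid<TAB>docid<TAB>rank'. TREC format requires a score column
# that decreases with rank, so 1/rank is used here as a synthetic score
# (an assumption; the actual tool may assign scores differently).
with open('runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv') as f_in, \
     open('runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.trec', 'w') as f_out:
    for line in f_in:
        qid, docid, rank = line.strip().split('\t')
        f_out.write(f'{qid} Q0 {docid} {rank} {1.0 / int(rank):.6f} pyserini\n')

We can then evaluate the converted run with trec_eval: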
python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
  runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.trec
Results:
map                     all     0.3309
recall_1000             all     0.9553