DKRR (Distilling Knowledge from Reader to Retriever) is a technique for training retriever models, described in the following paper:
Gautier Izacard and Edouard Grave. Distilling Knowledge from Reader to Retriever for Question Answering. arXiv:2012.04584, 2020.
We have incorporated this work into Pyserini.
In particular, we have taken the pretrained nq_retriever
and tqa_retriever
models from the DKRR repo, used them to index English Wikipedia, and then incorporated them into the dense retrieval framework in Pyserini.
This guide provides instructions to reproduce our results.
Running DKRR retrieval on the dpr-nq-dev and nq-test splits of the Natural Questions dataset:
```bash
python -m pyserini.search.faiss \
  --index wikipedia-dpr-100w.dkrr-nq \
  --topics dpr-nq-dev \
  --encoded-queries dkrr-dpr-nq-retriever-dpr-nq-dev \
  --output runs/run.dpr-dkrr-nq.dev.trec \
  --query-prefix question: \
  --batch-size 512 --threads 16

python -m pyserini.search.faiss \
  --index wikipedia-dpr-100w.dkrr-nq \
  --topics nq-test \
  --encoded-queries dkrr-dpr-nq-retriever-nq-test \
  --output runs/run.dpr-dkrr-nq.test.trec \
  --query-prefix question: \
  --batch-size 512 --threads 16
```
Alternatively, replace --encoded-queries ...
with --encoder castorini/dkrr-dpr-nq-retriever
for on-the-fly query encoding.
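The same retrieval can also be scripted through Pyserini's Python API. Below is a minimal sketch; the DkrrDprQueryEncoder class name and whether it applies the "question:" prefix internally are assumptions to verify against the Pyserini source, and the CLI invocations above remain the reference for reproducing our runs:

```python
from pyserini.search.faiss import FaissSearcher, DkrrDprQueryEncoder

# Assumption: DkrrDprQueryEncoder wraps the HuggingFace checkpoint and handles
# the "question:" prefix that the CLI adds via --query-prefix.
encoder = DkrrDprQueryEncoder(encoder_dir='castorini/dkrr-dpr-nq-retriever')
searcher = FaissSearcher.from_prebuilt_index('wikipedia-dpr-100w.dkrr-nq', encoder)

hits = searcher.search('who got the first nobel prize in physics', k=10)
for i, hit in enumerate(hits):
    print(f'{i + 1:2} {hit.docid:15} {hit.score:.5f}')
```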
To evaluate, convert the TREC output format to DPR's json format:
```bash
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-nq-dev \
  --index wikipedia-dpr \
  --input runs/run.dpr-dkrr-nq.dev.trec \
  --output runs/run.dpr-dkrr-nq.dev.json

python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics nq-test \
  --index wikipedia-dpr \
  --input runs/run.dpr-dkrr-nq.test.trec \
  --output runs/run.dpr-dkrr-nq.test.json
```
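For reference, the .trec run files consumed by the conversion follow the standard six-column TREC run format (query id, the literal Q0, passage id, rank, score, run tag). A quick way to inspect one:

```python
# Peek at the first few lines of a TREC run file: qid Q0 docid rank score tag.
with open('runs/run.dpr-dkrr-nq.dev.trec') as f:
    for line in list(f)[:5]:
        qid, _, docid, rank, score, tag = line.split()
        print(qid, docid, rank, score)
```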
Evaluating:
```bash
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval runs/run.dpr-dkrr-nq.dev.json \
  --topk 5 20 100 500 1000

python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval runs/run.dpr-dkrr-nq.test.json \
  --topk 5 20 100 500 1000
```
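The script reports top-k retrieval accuracy, i.e., the fraction of questions for which at least one of the top-k retrieved passages contains an answer string. As a rough sketch of the computation, assuming the converted JSON stores, for each question, a ranked "contexts" list with boolean "has_answer" flags:

```python
import json

# Assumption about the converted file's schema: each entry holds a ranked
# "contexts" list whose items carry a boolean "has_answer" flag.
with open('runs/run.dpr-dkrr-nq.dev.json') as f:
    data = json.load(f)

for k in [5, 20, 100, 500, 1000]:
    hits = sum(any(ctx['has_answer'] for ctx in entry['contexts'][:k])
               for entry in data.values())
    print(f'Top-{k} accuracy: {hits / len(data):.4f}')
```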
The expected results are as follows, shown in the "ours" column:
| Metric   | dpr-nq-dev (ours) | dpr-nq-dev (orig) | nq-test (ours) |
|:---------|------------------:|------------------:|---------------:|
| Top-5    |             72.40 |                   |          73.80 |
| Top-20   |             82.36 |              82.4 |          84.27 |
| Top-100  |             87.87 |              87.9 |          89.34 |
| Top-500  |             90.37 |                   |          92.24 |
| Top-1000 |             91.30 |                   |          93.43 |
For reference, reported results from the paper (Table 8) are shown in the "orig" column.
Running DKRR retrieval on the dpr-trivia-dev and dpr-trivia-test splits of the TriviaQA dataset:
```bash
python -m pyserini.search.faiss \
  --index wikipedia-dpr-100w.dkrr-tqa \
  --topics dpr-trivia-dev \
  --encoded-queries dkrr-dpr-tqa-retriever-dpr-tqa-dev \
  --output runs/run.dpr-dkrr-trivia.dev.trec \
  --query-prefix question: \
  --batch-size 512 --threads 16

python -m pyserini.search.faiss \
  --index wikipedia-dpr-100w.dkrr-tqa \
  --topics dpr-trivia-test \
  --encoded-queries dkrr-dpr-tqa-retriever-dpr-tqa-test \
  --output runs/run.dpr-dkrr-trivia.test.trec \
  --query-prefix question: \
  --batch-size 512 --threads 16
```
Alternatively, replace --encoded-queries ...
with --encoder castorini/dkrr-dpr-tqa-retriever
for on-the-fly query encoding.
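As with Natural Questions, retrieval can be driven from Python; the sketch below uses FaissSearcher.batch_search to score several questions at once (the DkrrDprQueryEncoder interface is again an assumption, as noted above):

```python
from pyserini.search.faiss import FaissSearcher, DkrrDprQueryEncoder

# Assumption: same encoder interface as in the Natural Questions sketch above.
encoder = DkrrDprQueryEncoder(encoder_dir='castorini/dkrr-dpr-tqa-retriever')
searcher = FaissSearcher.from_prebuilt_index('wikipedia-dpr-100w.dkrr-tqa', encoder)

queries = [
    'Who was the man behind The Chipmunks?',
    'Which country does the airline Air Pacific come from?',
]
qids = ['q1', 'q2']

# batch_search returns a dict mapping each query id to its ranked hits.
results = searcher.batch_search(queries, qids, k=10, threads=16)
for qid in qids:
    top_hit = results[qid][0]
    print(qid, top_hit.docid, f'{top_hit.score:.5f}')
```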
To evaluate, convert the TREC output format to DPR's json format:
```bash
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-dev \
  --index wikipedia-dpr \
  --input runs/run.dpr-dkrr-trivia.dev.trec \
  --output runs/run.dpr-dkrr-trivia.dev.json

python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-test \
  --index wikipedia-dpr \
  --input runs/run.dpr-dkrr-trivia.test.trec \
  --output runs/run.dpr-dkrr-trivia.test.json
```
Evaluating:
```bash
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval runs/run.dpr-dkrr-trivia.dev.json \
  --topk 5 20 100 500 1000

python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval runs/run.dpr-dkrr-trivia.test.json \
  --topk 5 20 100 500 1000
```
The expected results are as follows, shown in the "ours" column:
| Metric   | dpr-trivia-dev (ours) | dpr-trivia-dev (orig) | dpr-trivia-test (ours) |
|:---------|----------------------:|----------------------:|-----------------------:|
| Top-5    |                 77.31 |                       |                   77.23 |
| Top-20   |                 83.63 |                  83.5 |                   83.74 |
| Top-100  |                 87.39 |                  87.4 |                   87.78 |
| Top-500  |                 89.77 |                       |                   89.87 |
| Top-1000 |                 90.35 |                       |                   90.63 |
For reference, reported results from the paper (Table 8) are shown in the "orig" column.
Running hybrid sparse-dense retrieval with DKRR and GAR-T5 is detailed in experiments-gar-t5.md.