# Experiments for Conversational Query Reformulation

## Data Preparation

1. Download either the training or evaluation input query JSON files. These files can be found under `data/treccastweb/2019/data` if you cloned the submodules for this repo.

   Export the file path to an environment variable (a quick sanity check of the file is sketched after this list):

   ```
   export input_query_json=data/treccastweb/2019/data
   ```

2. Download the evaluation answer (qrels) files for training or evaluation. The training answer file is found under `data/treccastweb/2019/data`; its path is used as `$qrel` in the evaluation step below.
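If you want to verify the query file before running retrieval, a minimal sketch like the one below can load and print a few turns. The filename `evaluation_topics_v1.0.json` and the JSON layout (a list of topics, each with a `number` and a `turn` list containing `raw_utterance` fields) are assumptions based on the CAsT 2019 release, not something this guide guarantees; adjust to match the files you actually downloaded.

```python
import json

# Hypothetical filename: the CAsT 2019 evaluation topics are assumed to live
# at this path inside the treccastweb submodule; point this at your actual file.
path = "data/treccastweb/2019/data/evaluation/evaluation_topics_v1.0.json"

with open(path) as f:
    topics = json.load(f)  # assumed layout: a list of topics with "turn" lists

print(f"Loaded {len(topics)} topics")
for topic in topics[:2]:
    for turn in topic["turn"]:
        # Each turn is assumed to carry the raw conversational utterance.
        print(f'{topic["number"]}_{turn["number"]}: {turn["raw_utterance"]}')
```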

## Run CQR retrieval

The following command is for HQE, but you can also run other CQR methods by passing `t5` or `fusion` instead of `hqe` to the `--experiment` flag. Running the command for the first time will download the CAsT 2019 index (or whatever index is specified by the `--sparse_index` flag). It is also possible to supply a path to a local directory containing the index.

```
python -m experiments.run_retrieval \
    --experiment hqe \
    --hits 1000 \
    --sparse_index cast2019 \
    --qid_queries $input_query_json \
    --output ./output/hqe_bm25
```

The experiment will output the retrieval results at the specified location in TSV format. By default, this performs retrieval using only BM25, but you can add the `--rerank` flag to further rerank the results with BERT. For other command line arguments, see `run_retrieval.py`.

## Evaluate CQR results

Convert the TSV file from above to TREC format (a conversion sketch follows) and use the trec_eval tool to evaluate the results in terms of Recall@1000, mAP, and NDCG@{1,3}. Here `$qrel` is the path to the answer (qrels) file from the data preparation step.
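The exact column layout of the TSV run file is not documented here, so the sketch below assumes the common MS MARCO-style `query_id<TAB>doc_id<TAB>rank` layout and hypothetical filenames; inspect your output and adjust accordingly. (If the run really is MS MARCO-formatted, Pyserini's `pyserini.eval.convert_msmarco_run_to_trec_run` utility should do the same job.)

```python
# Sketch: convert an MS MARCO-style TSV run to TREC run format.
# Assumed input line: "query_id \t doc_id \t rank" (no score column); the
# input/output filenames are hypothetical and should match your --output path.
with open("output/hqe_bm25", "r") as tsv, open("output/hqe_bm25.trec", "w") as trec:
    for line in tsv:
        qid, docid, rank = line.strip().split("\t")
        # TREC run format: qid Q0 docid rank score run_tag. A descending
        # pseudo-score is derived from the rank since no score is assumed.
        trec.write(f"{qid} Q0 {docid} {rank} {1000 - int(rank)} hqe_bm25\n")
```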

```
python -m pyserini.eval.trec_eval -c -mndcg_cut.3,1 -mrecall.1000 -mmap $qrel ./output/hqe_bm25.trec
```
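As an optional alternative to the trec_eval CLI, the same measures can be computed in Python with the `pytrec_eval` package. This is a sketch, not part of the original pipeline; it assumes standard TREC-format qrels and run files, and the `ndcg_cut.1,3` measure string mirrors the CLI flag above (split it into separate measures if your pytrec_eval version rejects the comma form).

```python
import pytrec_eval

def load_qrels(path):
    # Standard TREC qrels line: "qid 0 docid relevance".
    qrels = {}
    with open(path) as f:
        for line in f:
            qid, _, docid, rel = line.split()
            qrels.setdefault(qid, {})[docid] = int(rel)
    return qrels

def load_run(path):
    # Standard TREC run line: "qid Q0 docid rank score tag".
    run = {}
    with open(path) as f:
        for line in f:
            qid, _, docid, _, score, _ = line.split()
            run.setdefault(qid, {})[docid] = float(score)
    return run

qrels = load_qrels("path/to/qrels.txt")  # hypothetical path; use your $qrel file
run = load_run("output/hqe_bm25.trec")

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "recall.1000", "ndcg_cut.1,3"})
results = evaluator.evaluate(run)

# Average each measure over all evaluated queries.
for measure in ["map", "recall_1000", "ndcg_cut_1", "ndcg_cut_3"]:
    mean = sum(q[measure] for q in results.values()) / len(results)
    print(f"{measure}: {mean:.4f}")
```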

## Evaluation results

Results for the CAsT 2019 evaluation dataset are provided below. The results may differ slightly from the numbers reported in the paper due to version differences in Hugging Face Transformers and spaCy. As of writing, we use `spacy==2.2.4` with the English model `en_core_web_sm==2.2.5`, and `transformers==4.0.0`.

| Metric      | HQE BM25 | HQE BM25 + BERT | T5 BM25 | T5 BM25 + BERT | Fusion BM25 | Fusion BM25 + BERT |
|-------------|----------|-----------------|---------|----------------|-------------|--------------------|
| mAP         | 0.2109   | 0.3058          | 0.2250  | 0.3555         | 0.2584      | 0.3739             |
| Recall@1000 | 0.7322   | 0.7322          | 0.7392  | 0.7392         | 0.8028      | 0.8028             |
| NDCG@1      | 0.2640   | 0.4745          | 0.2842  | 0.5751         | 0.3353      | 0.5838             |
| NDCG@3      | 0.2606   | 0.4798          | 0.2954  | 0.5464         | 0.3247      | 0.5640             |

## Reproduction Log