NeuCLIR22 mT5

This repository contains the steps to reproduce the winning reranker model proposed in "NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval".

Requirements

Running the model with 8-bit (int8) weights requires at least 40 GB of GPU memory. With 16-bit (fp16) weights, it requires 80 GB.

Installation

Install the InPars toolkit, the jsonlines reader, and ir-measures with pip:

pip install inpars jsonlines ir-measures

If you want to use int8 for inference, install the bitsandbytes library. Refer to their installation documentation for more information.

pip install bitsandbytes

Usage

Clone this repository:

git clone https://github.com/vjeronymo2/neuCLIR2022-mT5.git

From the repository root, copy each language's docs.jsonl into the corresponding empty directory (./fa, ./ru, or ./zh).
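Before moving on to the notebook, it can help to confirm that every corpus file is where it is expected. The following is a minimal sketch, not part of this repository: the helper name `missing_corpora` is hypothetical, and it only assumes the directory names and the docs.jsonl file name given above.

```python
from pathlib import Path

# Language directories named in this README.
LANGS = ("fa", "ru", "zh")


def missing_corpora(root=".", langs=LANGS):
    """Return the languages whose directory has no docs.jsonl file."""
    root = Path(root)
    return [lang for lang in langs if not (root / lang / "docs.jsonl").is_file()]


if __name__ == "__main__":
    missing = missing_corpora()
    if missing:
        print("docs.jsonl not found for:", ", ".join(missing))
    else:
        print("All corpora are in place.")
```

Run it from the repository root. An empty result means all three corpora were copied.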

Open the make_corpus&queries.ipynb notebook and run all cells to generate the corpus and topics for each language.

To rerank a given run (e.g. runs/run.organizers.fa.txt), use the following InPars command:

python -m inpars.rerank \
        --model="unicamp-dl/mt5-13b-mmarco-100k" \
        --corpus="fa/docs_parsed.jsonl" \
        --queries="fa/topics/topics_mt_desc_title.tsv" \
        --input_run="runs/run.organizers.fa.txt" \
        --output_run="runs/run.organizers.mt5_mt_desc_title.txt"

To run inference in fp16 or int8, append the corresponding flag to the command above:

        --fp16 \
        --int8 \
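To rerank all three languages, the command can be assembled programmatically. The sketch below makes two assumptions that this README does not confirm: the ru and zh directories and run files follow the same naming as the fa example, and the language code is added to the output file name so that runs do not overwrite each other. The helper `build_rerank_command` is hypothetical. It only prints the commands and uses just the options shown above.

```python
import shlex

MODEL = "unicamp-dl/mt5-13b-mmarco-100k"


def build_rerank_command(lang, precision=None):
    """Build the inpars.rerank command for one language as an argument list.

    precision: None (no extra flag), "fp16", or "int8".
    """
    if precision not in (None, "fp16", "int8"):
        raise ValueError(f"unsupported precision: {precision!r}")
    cmd = [
        "python", "-m", "inpars.rerank",
        f"--model={MODEL}",
        # Assumption: ru and zh use the same file layout as the fa example.
        f"--corpus={lang}/docs_parsed.jsonl",
        f"--queries={lang}/topics/topics_mt_desc_title.tsv",
        f"--input_run=runs/run.organizers.{lang}.txt",
        # Assumption: the language code in the output name keeps runs apart.
        f"--output_run=runs/run.organizers.{lang}.mt5_mt_desc_title.txt",
    ]
    if precision is not None:
        cmd.append(f"--{precision}")
    return cmd


for lang in ("fa", "ru", "zh"):
    print(shlex.join(build_rerank_command(lang, precision="int8")))
```

To execute a command instead of printing it, pass the returned list to `subprocess.run(cmd, check=True)`.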

For metrics, you can use the ir_measures library:

ir_measures qrels_modified.fa \
        runs/run.organizers.mt5_mt_desc_title.txt \
        'nDCG@10 nDCG@20 MAP RBP(rel=1) R@100 R@1000 Judged@10 Judged@20'
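The same evaluation can also be run from Python, which is convenient for collecting scores across several runs. This is a sketch under assumptions: the qrels and run files are taken to be in TREC format, and the helper name `evaluate` is hypothetical. It relies on the ir_measures Python functions `parse_measure`, `read_trec_qrels`, `read_trec_run`, and `calc_aggregate`.

```python
# Measures taken from the command line above.
MEASURES = [
    "nDCG@10", "nDCG@20", "MAP", "RBP(rel=1)",
    "R@100", "R@1000", "Judged@10", "Judged@20",
]


def evaluate(qrels_path, run_path, measures=MEASURES):
    """Return {measure name: aggregate score} for a TREC-format run."""
    # Imported here so this module can be loaded without ir-measures installed.
    import ir_measures

    parsed = [ir_measures.parse_measure(m) for m in measures]
    qrels = ir_measures.read_trec_qrels(qrels_path)  # assumes TREC qrels format
    run = ir_measures.read_trec_run(run_path)        # assumes TREC run format
    scores = ir_measures.calc_aggregate(parsed, qrels, run)
    return {str(measure): value for measure, value in scores.items()}
```

For example, `evaluate("qrels_modified.fa", "runs/run.organizers.mt5_mt_desc_title.txt")` would return a dictionary with one aggregate score per measure.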

Cite this work

Please cite our paper if you use this repository.

@misc{jeronymo2023neuralmindunicamp,
      title={NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval}, 
      author={Vitor Jeronymo and Roberto Lotufo and Rodrigo Nogueira},
      year={2023},
      eprint={2303.16145},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}