If you plan to run the model with weights in 8-bit (int8), you will need at least 40GB of GPU RAM. For 16-bit (fp16), you will need 80GB.
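If you are unsure how much memory your GPUs have, a quick check (a minimal sketch, assuming PyTorch is already installed in your environment) is:

import torch

# Report the total memory of each visible GPU so you can decide between int8 (>=40GB) and fp16 (>=80GB).
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")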
To install the necessary packages, use pip to install the InPars toolkit, the jsonlines reader, and ir-measures:
pip install inpars jsonlines ir-measures
If you want to use int8 for inference, install the bitsandbytes library. Refer to their installation documentation for more information.
pip install bitsandbytes
Clone this repository:
git clone https://github.com/vjeronymo2/neuCLIR2022-mT5.git
In the root directory, copy the docs.jsonl from each language into the appropriate empty directories (./fa, ./ru, and ./zh).
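To sanity-check the copied files, you can peek at the first few records with the jsonlines package installed above. This is a minimal sketch; the exact fields inside docs.jsonl are an assumption:

import jsonlines

# Print the first three documents of the Persian collection to verify the copy worked.
with jsonlines.open("fa/docs.jsonl") as reader:
    for i, doc in enumerate(reader):
        print(doc)  # each line is one JSON document (e.g. an id plus title/text fields)
        if i == 2:
            break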
Open the make_corpus&queries.ipynb notebook and run all cells to generate the corpus and topics for each language.
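If you want to verify what the notebook produced, the generated topics file used below is tab-separated; here is a minimal sketch for inspecting it (the query-id/query-text column layout is an assumption):

import csv

# Show the first few generated queries from the Persian topics file.
with open("fa/topics/topics_mt_desc_title.tsv", encoding="utf-8") as f:
    for i, row in enumerate(csv.reader(f, delimiter="\t")):
        print(row)
        if i == 2:
            break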
For a given run you wish to rerank (e.g. runs/run.organizers.fa.txt), use the following command from InPars:
python -m inpars.rerank \
--model="unicamp-dl/mt5-13b-mmarco-100k" \
--corpus="fa/docs_parsed.jsonl" \
--queries="fa/topics/topics_mt_desc_title.tsv" \
--input_run="runs/run.organizers.fa.txt" \
--output_run="runs/run.organizers.mt5_mt_desc_title.txt"
To run inference in either fp16 or int8, add the corresponding flag to the rerank command above:
--fp16 \
--int8 \
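Both the input and output runs are plain-text ranking files; assuming the standard TREC run format (query_id Q0 doc_id rank score tag), you can inspect the top-ranked document per query with a short sketch like this:

# Collect the rank-1 document for each query from the reranked run.
top1 = {}
with open("runs/run.organizers.mt5_mt_desc_title.txt") as f:
    for line in f:
        qid, _, docid, rank, score, _ = line.split()
        if int(rank) == 1:
            top1[qid] = (docid, float(score))

for qid, (docid, score) in list(top1.items())[:5]:
    print(qid, docid, score)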
For evaluation metrics, you can use the ir_measures command-line tool:
ir_measures qrels_modified.fa \
runs/run.organizers.mt5_mt_desc_title.txt \
'nDCG@10 nDCG@20 MAP RBP(rel=1) R@100 R@1000 Judged@10 Judged@20'
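The same metrics can also be computed from Python, which is convenient when comparing several runs; here is a minimal sketch using the qrels and run files above:

import ir_measures

# Load the TREC-format qrels and run used in the command-line example.
qrels = ir_measures.read_trec_qrels("qrels_modified.fa")
run = ir_measures.read_trec_run("runs/run.organizers.mt5_mt_desc_title.txt")

# Same measures as the command-line call.
measures = [ir_measures.parse_measure(m) for m in
            ["nDCG@10", "nDCG@20", "MAP", "RBP(rel=1)", "R@100", "R@1000", "Judged@10", "Judged@20"]]
print(ir_measures.calc_aggregate(measures, qrels, run))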
Please cite our paper if you use this repository.
@misc{jeronymo2023neuralmindunicamp,
title={NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval},
author={Vitor Jeronymo and Roberto Lotufo and Rodrigo Nogueira},
year={2023},
eprint={2303.16145},
archivePrefix={arXiv},
primaryClass={cs.IR}
}