lucene-inverted.msmarco-v1-doc-segmented.unicoil.20221005.252b5e.README.md

File metadata and controls

16 lines (12 loc) · 786 Bytes

msmarco-v1-doc-segmented-unicoil

Lucene impact index of the MS MARCO V1 segmented document corpus for uniCOIL.

This index was generated on 2022/10/05 at Anserini commit 252b5e on tuna with the following command:

nohup target/appassembler/bin/IndexCollection \
 -collection JsonVectorCollection \
 -input /tuna1/collections/msmarco/msmarco-doc-segmented-unicoil \
 -index indexes/lucene-index.msmarco-v1-doc-segmented-unicoil.20221005.252b5e/ \
 -generator DefaultLuceneDocumentGenerator \
 -threads 16 -impact -pretokenized -optimize >& logs/log.msmarco-v1-doc-segmented-unicoil.20221005.252b5e &
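
The index directory and the log file in the command above share one suffix made of the index name, the build date, and the short commit hash. As a minimal sketch of that pattern (the variable names are hypothetical and not part of Anserini; the values are copied from the command above), the two paths can be assembled like this:

```shell
# Hypothetical variable names; the values come from the command above.
name="msmarco-v1-doc-segmented-unicoil"
date_stamp="20221005"   # build date, YYYYMMDD
commit="252b5e"         # short Anserini commit hash

# Shared suffix: <name>.<date>.<commit>
tag="${name}.${date_stamp}.${commit}"

index_dir="indexes/lucene-index.${tag}/"
log_file="logs/log.${tag}"

echo "$index_dir"
echo "$log_file"
```

Deriving both paths from a single tag keeps the index and its log file paired when either the date or the commit changes.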

In April 2024, the index was repackaged to adopt a more consistent naming scheme.