diff --git a/README.md b/README.md
index b847eec57..5c0647616 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ Retrieval using sparse representations is provided via integration with our grou
 Retrieval using dense representations is provided via integration with Facebook's [Faiss](https://github.com/facebookresearch/faiss) library.

 Pyserini is primarily designed to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture.
-Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, [pre-built indexes](docs/prebuilt-indexes.md), and evaluation scripts for many commonly used IR test collections.
+Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, [prebuilt indexes](docs/prebuilt-indexes.md), and evaluation scripts for many commonly used IR test collections.

 With Pyserini, it's easy to reproduce runs on a number of standard IR test collections!
 For additional details, [our paper](https://dl.acm.org/doi/10.1145/3404835.3463238) in SIGIR 2021 provides a nice overview.
@@ -86,7 +86,7 @@ The steps are different for different classes of models:
 ## ⚗️ Reproducibility

 With Pyserini, it's easy to [reproduce](docs/reproducibility.md) runs on a number of standard IR test collections!
-We provide a number of [pre-built indexes](docs/prebuilt-indexes.md) that directly support reproducibility "out of the box".
+We provide a number of [prebuilt indexes](docs/prebuilt-indexes.md) that directly support reproducibility "out of the box".
 In our [SIGIR 2022 paper](https://dl.acm.org/doi/10.1145/3477495.3531749), we introduced "two-click reproductions" that allow anyone to reproduce experimental runs with only two clicks (i.e., copy and paste).

 Documentation is organized into reproduction matrices for different corpora that provide a summary of different experimental conditions and query sets:
@@ -177,7 +177,7 @@ Additional reproduction guides below provide detailed step-by-step instructions.

 ## 📃 Additional Documentation

-+ [Guide to pre-built indexes](docs/prebuilt-indexes.md)
++ [Guide to prebuilt indexes](docs/prebuilt-indexes.md)
 + [Guide to interactive searching](docs/usage-interactive-search.md)
 + [Guide to text classification with the 20Newsgroups dataset](docs/experiments-20newgroups.md)
 + [Guide to working with the COVID-19 Open Research Dataset (CORD-19)](docs/working-with-cord19.md)
@@ -236,7 +236,7 @@ Additional reproduction guides below provide detailed step-by-step instructions.

 ⁉️ **Lucene 8 to Lucene 9 Transition.**
 In 2022, Pyserini underwent a transition from Lucene 8 to Lucene 9.
-Most of the pre-built indexes have been rebuilt using Lucene 9, but there are a few still based on Lucene 8.
+Most of the prebuilt indexes have been rebuilt using Lucene 9, but there are a few still based on Lucene 8.

 More details:

diff --git a/docs/2cr/miracl.html b/docs/2cr/miracl.html
index 19588ce62..a32b5a936 100644
--- a/docs/2cr/miracl.html
+++ b/docs/2cr/miracl.html
@@ -131,21 +131,40 @@ ">
-MIRACL
+Pyserini Reproductions: MIRACL

-
+
+ +

This page provides two-click reproductions* for a number of experimental runs on the MIRACL dataset. Instructions for programmatic execution are shown at the bottom of this page. The dataset is described in the following paper:

+ +

Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, and Jimmy Lin. MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages. Transactions of the Association for Computational Linguistics, 11:1114–1131, 2023.

+ +

Many of the models presented on this page are described in the following paper:

+ +

Xinyu Zhang, Kelechi Ogueji, Xueguang Ma, and Jimmy Lin. Towards Best Practices for Training Multilingual Dense Retrieval Models. ACM Transactions on Information Systems, 42(2), Article No. 39, 2023.

+ +

Key:

+ +
- + @@ -650,10 +669,10 @@

MIRACL

- + - + @@ -1131,30 +1150,30 @@

MIRACL

- + - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + - + - + - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + - + - + - + @@ -2486,10 +2505,10 @@

MIRACL

- + - + @@ -2976,7 +2995,7 @@

MIRACL

- + @@ -3481,10 +3500,10 @@

MIRACL

- + - + @@ -3962,30 +3981,30 @@

MIRACL

- + - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + - + - + - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + - + - + - + @@ -5317,10 +5336,10 @@

MIRACL

- + - +
diff --git a/docs/2cr/mrtydi.html b/docs/2cr/mrtydi.html
index 507b6f946..daa5980a5 100644
--- a/docs/2cr/mrtydi.html
+++ b/docs/2cr/mrtydi.html
@@ -131,14 +131,28 @@ ">
-Mr.TyDi
+Pyserini Reproductions: Mr.TyDi

-
+
+ +

This page provides two-click reproductions* for a number of experimental runs on the Mr. TyDi dataset. Instructions for programmatic execution are shown at the bottom of this page. The dataset is described in the following paper:

+ +

Xinyu Zhang, Xueguang Ma, Peng Shi, and Jimmy Lin. Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval. Proceedings of 1st Workshop on Multilingual Representation Learning, pages 127-137, November 2021, Punta Cana, Dominican Republic.

+ +

Key:

+ +
    +
  • BM25
  • +
  • mDPR (split) pFT NQ: mDPR (split encoders), pre-FT w/ NQ
  • +
  • mDPR (tied) pFT NQ: mDPR (tied encoders), pre-FT w/ NQ
  • +
  • mDPR (tied) pFT MS MARCO: mDPR (tied encoders), pre-FT w/ MS MARCO
  • +
  • mDPR (tied) pFT MS MARCO + FT all: mDPR (tied encoders), pre-FT w/ MS MARCO, FT w/ all
  • +
nDCG@10, dev queriesnDCG@10, dev queries ar bn en
@@ -1226,20 +1245,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-ar-dev \
-  --index miracl-v1.0-ar-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.ar.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ar.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-ar-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt
@@ -1247,20 +1262,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-bn-dev \
-  --index miracl-v1.0-bn-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.bn.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.bn.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-bn-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt
@@ -1268,20 +1279,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-en-dev \
-  --index miracl-v1.0-en-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.en.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.en.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-en-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt
@@ -1289,20 +1296,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-es-dev \
-  --index miracl-v1.0-es-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.es.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.es.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-es-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt
@@ -1310,20 +1313,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-fa-dev \
-  --index miracl-v1.0-fa-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.fa.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fa.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-fa-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt
@@ -1331,20 +1330,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-fi-dev \
-  --index miracl-v1.0-fi-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.fi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fi.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-fi-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt
@@ -1352,20 +1347,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-fr-dev \
-  --index miracl-v1.0-fr-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.fr.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fr.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-fr-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt
@@ -1373,20 +1364,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-hi-dev \
-  --index miracl-v1.0-hi-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.hi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.hi.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-hi-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt
@@ -1394,20 +1381,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-id-dev \
-  --index miracl-v1.0-id-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.id.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.id.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-id-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt
@@ -1415,20 +1398,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-ja-dev \
-  --index miracl-v1.0-ja-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.ja.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ja.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-ja-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt
@@ -1436,20 +1415,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-ko-dev \
-  --index miracl-v1.0-ko-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.ko.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ko.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-ko-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt
@@ -1457,20 +1432,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-ru-dev \
-  --index miracl-v1.0-ru-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.ru.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ru.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-ru-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt
@@ -1478,20 +1449,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-sw-dev \
-  --index miracl-v1.0-sw-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.sw.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.sw.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-sw-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt
@@ -1500,20 +1467,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-te-dev \
-  --index miracl-v1.0-te-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.te.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.te.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-te-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt
@@ -1522,20 +1485,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-th-dev \
-  --index miracl-v1.0-th-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.th.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.th.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-th-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt
@@ -1544,20 +1503,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-zh-dev \
-  --index miracl-v1.0-zh-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.zh.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.zh.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-zh-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt
@@ -1566,20 +1521,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-de-dev \
-  --index miracl-v1.0-de-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.de.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.de.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-de-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt
@@ -1588,20 +1539,16 @@

MIRACL

Command to generate run:
-python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-yo-dev \
-  --index miracl-v1.0-yo-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt --hits 1000
+python -m pyserini.fusion \
+  --runs  run.miracl.bm25.yo.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.yo.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-yo-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt
@@ -1612,30 +1559,30 @@

MIRACL

@@ -1707,16 +1654,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.ar.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ar.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-ar-dev \
+  --index miracl-v1.0-ar-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-ar-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt
@@ -1724,16 +1675,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.bn.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.bn.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-bn-dev \
+  --index miracl-v1.0-bn-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-bn-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt
@@ -1741,16 +1696,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.en.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.en.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-en-dev \
+  --index miracl-v1.0-en-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-en-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt
@@ -1758,16 +1717,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.es.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.es.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-es-dev \
+  --index miracl-v1.0-es-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-es-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt
@@ -1775,16 +1738,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.fa.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fa.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-fa-dev \
+  --index miracl-v1.0-fa-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-fa-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt
@@ -1792,16 +1759,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.fi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fi.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-fi-dev \
+  --index miracl-v1.0-fi-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-fi-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt
@@ -1809,16 +1780,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.fr.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fr.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-fr-dev \
+  --index miracl-v1.0-fr-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-fr-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt
@@ -1826,16 +1801,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.hi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.hi.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-hi-dev \
+  --index miracl-v1.0-hi-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-hi-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt
@@ -1843,16 +1822,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.id.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.id.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-id-dev \
+  --index miracl-v1.0-id-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-id-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt
@@ -1860,16 +1843,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.ja.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ja.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-ja-dev \
+  --index miracl-v1.0-ja-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-ja-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt
@@ -1877,16 +1864,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.ko.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ko.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-ko-dev \
+  --index miracl-v1.0-ko-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-ko-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt
@@ -1894,16 +1885,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.ru.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ru.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-ru-dev \
+  --index miracl-v1.0-ru-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-ru-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt
@@ -1911,16 +1906,20 @@

MIRACL

Command to generate run:
-python -m pyserini.fusion \
-  --runs  run.miracl.bm25.sw.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.sw.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-sw-dev \
+  --index miracl-v1.0-sw-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-sw-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt
@@ -1929,16 +1928,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.te.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.te.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-te-dev \
+  --index miracl-v1.0-te-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-te-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt
@@ -1947,16 +1950,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.th.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.th.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-th-dev \
+  --index miracl-v1.0-th-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-th-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt
@@ -1965,16 +1972,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.zh.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.zh.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-zh-dev \
+  --index miracl-v1.0-zh-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-zh-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt
@@ -1983,16 +1994,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.de.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.de.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-de-dev \
+  --index miracl-v1.0-de-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-de-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt
@@ -2001,16 +2016,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.yo.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.yo.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-yo-dev \
+  --index miracl-v1.0-yo-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -M 100 -m ndcg_cut.10 miracl-v1.0-yo-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt
@@ -2021,10 +2040,10 @@

MIRACL

Recall@100, dev queries
ar bn en
@@ -4057,20 +4076,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-ar-dev \
-  --index miracl-v1.0-ar-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.ar.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ar.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-ar-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt
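
For readers unfamiliar with the `recall.100` measure used in these evaluation commands, the following is a minimal sketch of Recall@k, not the trec_eval implementation. It assumes binary relevance and, in line with the usual reading of the `-c` flag, averages over every query in the judgments so that a query missing from the run contributes 0. The function names and data layout are illustrative assumptions.

```python
def recall_at_k(ranked_docids, relevant_docids, k=100):
    """Fraction of the relevant documents that appear in the top-k of a ranking."""
    relevant = set(relevant_docids)
    if not relevant:
        return 0.0
    retrieved = set(ranked_docids[:k])
    return len(retrieved & relevant) / len(relevant)


def mean_recall(run, qrels, k=100):
    """Average Recall@k over all judged queries.

    run:   dict mapping query id -> list of docids, best first
    qrels: dict mapping query id -> set of relevant docids
    Queries absent from `run` score 0 (assumed to mirror trec_eval's -c).
    """
    scores = [recall_at_k(run.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    qrels = {"q1": {"d1", "d2"}, "q2": {"d5"}}
    run = {"q1": ["d3", "d1", "d9"]}  # q2 is missing from the run
    print(mean_recall(run, qrels, k=100))
```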
@@ -4078,20 +4093,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-bn-dev \
-  --index miracl-v1.0-bn-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.bn.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.bn.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-bn-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt
@@ -4099,20 +4110,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-en-dev \
-  --index miracl-v1.0-en-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.en.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.en.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-en-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt
@@ -4120,20 +4127,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-es-dev \
-  --index miracl-v1.0-es-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.es.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.es.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-es-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt
@@ -4141,20 +4144,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-fa-dev \
-  --index miracl-v1.0-fa-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.fa.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fa.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-fa-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt
@@ -4162,20 +4161,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-fi-dev \
-  --index miracl-v1.0-fi-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.fi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fi.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-fi-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt
@@ -4183,20 +4178,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-fr-dev \
-  --index miracl-v1.0-fr-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.fr.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fr.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-fr-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt
@@ -4204,20 +4195,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-hi-dev \
-  --index miracl-v1.0-hi-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.hi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.hi.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-hi-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt
@@ -4225,20 +4212,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-id-dev \
-  --index miracl-v1.0-id-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.id.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.id.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-id-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt
@@ -4246,20 +4229,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-ja-dev \
-  --index miracl-v1.0-ja-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.ja.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ja.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-ja-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt
@@ -4267,20 +4246,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-ko-dev \
-  --index miracl-v1.0-ko-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.ko.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ko.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-ko-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt
@@ -4288,20 +4263,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-ru-dev \
-  --index miracl-v1.0-ru-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.ru.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ru.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-ru-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt
@@ -4309,20 +4280,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-sw-dev \
-  --index miracl-v1.0-sw-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.sw.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.sw.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-sw-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt
@@ -4331,20 +4298,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-te-dev \
-  --index miracl-v1.0-te-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.te.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.te.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-te-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt
@@ -4353,20 +4316,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-th-dev \
-  --index miracl-v1.0-th-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.th.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.th.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-th-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt
@@ -4375,20 +4334,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-zh-dev \
-  --index miracl-v1.0-zh-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.zh.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.zh.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-zh-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt
@@ -4397,20 +4352,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-de-dev \
-  --index miracl-v1.0-de-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.de.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.de.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-de-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt
@@ -4419,20 +4370,16 @@

MIRACL

Command to generate run:
-
python -m pyserini.search.faiss \
-  --threads 16 --batch-size 512 \
-  --encoder-class auto \
-  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
-  --topics miracl-v1.0-yo-dev \
-  --index miracl-v1.0-yo-mdpr-tied-pft-msmarco-ft-all \
-  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt --hits 1000
+
python -m pyserini.fusion \
+  --runs  run.miracl.bm25.yo.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.yo.dev.top1000.txt \
+  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-yo-dev \
-  run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt
+ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt
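
The fusion commands combine a BM25 run and a dense run with `--method interpolation --alpha 0.5`. As a rough illustration only, the sketch below treats interpolation as a weighted sum of the two scores for each document. Several points are assumptions rather than documented behaviour of `pyserini.fusion`: that `alpha` weights the first run, that a document missing from one run contributes 0 from that run, that `--depth` truncates each input and `--k` truncates the output, and that no score normalization is applied.

```python
def interpolate(run_a, run_b, alpha=0.5, depth=1000, k=1000):
    """Fuse two single-query runs by score interpolation (illustrative sketch).

    run_a, run_b: dicts mapping docid -> score.
    Returns a list of (docid, fused_score), best first.
    """
    def top(run, n):
        # Keep the n highest-scoring documents; ties broken by docid for determinism.
        return dict(sorted(run.items(), key=lambda kv: (-kv[1], kv[0]))[:n])

    a, b = top(run_a, depth), top(run_b, depth)
    fused = {
        # Assumption: a document absent from one run gets 0 from that run.
        docid: alpha * a.get(docid, 0.0) + (1 - alpha) * b.get(docid, 0.0)
        for docid in set(a) | set(b)
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    bm25 = {"d1": 12.0, "d2": 8.0}
    dense = {"d2": 0.9, "d3": 0.7}
    for docid, score in interpolate(bm25, dense, alpha=0.5):
        print(docid, score)
```

Because BM25 and dense scores are on different scales, the unnormalized sum in this sketch lets the larger-valued run dominate; the toolkit itself may handle this differently.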
@@ -4443,30 +4390,30 @@

MIRACL

@@ -4538,16 +4485,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.ar.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ar.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-ar-dev \
+  --index miracl-v1.0-ar-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-ar-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt
@@ -4555,16 +4506,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.bn.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.bn.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-bn-dev \
+  --index miracl-v1.0-bn-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-bn-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt
@@ -4572,16 +4527,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.en.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.en.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-en-dev \
+  --index miracl-v1.0-en-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-en-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt
@@ -4589,16 +4548,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.es.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.es.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-es-dev \
+  --index miracl-v1.0-es-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-es-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt
@@ -4606,16 +4569,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.fa.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fa.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-fa-dev \
+  --index miracl-v1.0-fa-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-fa-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt
@@ -4623,16 +4590,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.fi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fi.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-fi-dev \
+  --index miracl-v1.0-fi-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-fi-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt
@@ -4640,16 +4611,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.fr.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fr.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-fr-dev \
+  --index miracl-v1.0-fr-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-fr-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt
@@ -4657,16 +4632,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.hi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.hi.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-hi-dev \
+  --index miracl-v1.0-hi-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-hi-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt
@@ -4674,16 +4653,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.id.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.id.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-id-dev \
+  --index miracl-v1.0-id-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-id-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt
@@ -4691,16 +4674,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.ja.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ja.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-ja-dev \
+  --index miracl-v1.0-ja-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-ja-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt
@@ -4708,16 +4695,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.ko.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ko.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-ko-dev \
+  --index miracl-v1.0-ko-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-ko-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt
@@ -4725,16 +4716,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.ru.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ru.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-ru-dev \
+  --index miracl-v1.0-ru-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-ru-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt
@@ -4742,16 +4737,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.sw.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.sw.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-sw-dev \
+  --index miracl-v1.0-sw-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-sw-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt
@@ -4760,16 +4759,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.te.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.te.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-te-dev \
+  --index miracl-v1.0-te-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-te-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt
@@ -4778,16 +4781,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.th.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.th.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-th-dev \
+  --index miracl-v1.0-th-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-th-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt
@@ -4796,16 +4803,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.zh.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.zh.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-zh-dev \
+  --index miracl-v1.0-zh-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-zh-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt
@@ -4814,16 +4825,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.de.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.de.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-de-dev \
+  --index miracl-v1.0-de-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-de-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt
@@ -4832,16 +4847,20 @@

MIRACL

Command to generate run:
-
python -m pyserini.fusion \
-  --runs  run.miracl.bm25.yo.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.yo.dev.top1000.txt \
-  --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000
+
python -m pyserini.search.faiss \
+  --threads 16 --batch-size 512 \
+  --encoder-class auto \
+  --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
+  --topics miracl-v1.0-yo-dev \
+  --index miracl-v1.0-yo-mdpr-tied-pft-msmarco-ft-all \
+  --output run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt --hits 1000
 
Evaluation commands:
python -m pyserini.eval.trec_eval \
   -c -m recall.100 miracl-v1.0-yo-dev \
-  run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt
+ run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt
@@ -4852,10 +4871,10 @@

MIRACL

@@ -452,10 +466,10 @@

Mr.TyDi

- + - + @@ -741,10 +755,10 @@

Mr.TyDi

- + - + @@ -1041,10 +1055,10 @@

Mr.TyDi

- + - + @@ -1341,10 +1355,10 @@

Mr.TyDi

- + - + @@ -1957,10 +1971,10 @@

Mr.TyDi

- + - + @@ -2246,10 +2260,10 @@

Mr.TyDi

- + - + @@ -2546,10 +2560,10 @@

Mr.TyDi

- + - + @@ -2846,10 +2860,10 @@

Mr.TyDi

- + - + diff --git a/docs/prebuilt-indexes.md b/docs/prebuilt-indexes.md index 8096ecebe..5371f50e7 100644 --- a/docs/prebuilt-indexes.md +++ b/docs/prebuilt-indexes.md @@ -49,6 +49,8 @@ Detailed configuration information for the pre-built indexes are stored in [`pys ## Standard Lucene Indexes +
+MS MARCO
msmarco-v1-doc [readme] @@ -199,6 +201,9 @@ Detailed configuration information for the pre-built indexes are stored in [`pys
Lucene index (+docvectors) of the MS MARCO V2 augmented passage corpus with doc2query-T5 expansions.
+
+
+BEIR
beir-v1.0.0-trec-covid.flat [readme] @@ -433,6 +438,9 @@ Detailed configuration information for the pre-built indexes are stored in [`pys
Lucene multifield index of BEIR (v1.0.0): SciFact.
+
+
+Mr.TyDi
mrtydi-v1.1-ar [readme] @@ -479,6 +487,9 @@ Detailed configuration information for the pre-built indexes are stored in [`pys
Lucene index for Mr.TyDi v1.1 (Thai).
+
+
+MIRACL
miracl-v1.0-ar [readme] @@ -553,6 +564,9 @@ Detailed configuration information for the pre-built indexes are stored in [`pys
Lucene index for MIRACL v1.0 (Yoruba).
+
+
+Other
ciral-v1.0-ha [readme] @@ -726,9 +740,12 @@ Detailed configuration information for the pre-built indexes are stored in [`pys
Lucene index for AToMiC Images v0.2 large setting on validation set
+
## Lucene Impact Indexes +
+MS MARCO
msmarco-v1-passage.slimr [readme] @@ -810,98 +827,133 @@ Detailed configuration information for the pre-built indexes are stored in [`pys
Lucene impact index of the MS MARCO V2 segmented document corpus for uniCOIL (noexp) with title prepended.
+
+
+BEIR
beir-v1.0.0-trec-covid.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): TREC-COVID, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-bioasq.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): BioASQ, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-nfcorpus.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): NFCorpus, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-nq.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): NQ, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-hotpotqa.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): HotpotQA, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-fiqa.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): FiQA-2018, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-signal1m.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): Signal-1M, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-trec-news.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): TREC-NEWS, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-robust04.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): Robust04, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-arguana.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): ArguAna, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-webis-touche2020.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): Webis-Touche2020, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-android.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-android, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-english.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-english, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-gaming.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-gaming, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-gis.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-gis, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-mathematica, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-physics.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-physics, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-programmers.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-programmers, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-stats.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-stats, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-tex.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-tex, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-unix.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-unix, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-webmasters, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-wordpress, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-quora.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): Quora, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-dbpedia-entity.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): DBPedia, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-scidocs.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): SCIDOCS, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-fever.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): FEVER, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-climate-fever.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): Climate-FEVER, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
beir-v1.0.0-scifact.splade-pp-ed +[readme]
Lucene impact index of BEIR (v1.0.0): SciFact, encoded by SPLADE++ (CoCondenser-EnsembleDistil).
+
## Faiss Indexes +
+MS MARCO
msmarco-v1-passage.cosdpr-distil
Faiss flat index of the MS MARCO passage corpus encoded by cosDPR-distil. @@ -961,6 +1013,11 @@ Detailed configuration information for the pre-built indexes are stored in [`pys
msmarco-v1-doc-segmented.tct_colbert-v2-hnp
Faiss flat index of the MS MARCO document corpus encoded by TCT-ColBERT-V2-HNP
+
+
+
+BEIR +
beir-v1.0.0-trec-covid.contriever [readme]
Faiss flat index for BEIR (v1.0.0): TREC-COVID, encoded by Contriever. @@ -1396,6 +1453,11 @@ Detailed configuration information for the pre-built indexes are stored in [`pys [readme]
Faiss index for BEIR v1.0.0 (SciFact) corpus encoded by cohere-embed-english-v3.0 encoder.
+
+
+
+Mr.TyDi +
mrtydi-v1.1-arabic-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Arabic) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ. @@ -1572,6 +1634,11 @@ Detailed configuration information for the pre-built indexes are stored in [`pys [readme]
Faiss index for Mr.TyDi v1.1 (Thai) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
+
+
+
+MIRACL +
miracl-v1.0-ar-mdpr-tied-pft-msmarco [readme]
Faiss index for MIRACL v1.0 (Arabic) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO. @@ -1852,28 +1919,11 @@ Detailed configuration information for the pre-built indexes are stored in [`pys [readme]
Faiss index for MIRACL v1.0 (Yoruba) corpus encoded by mContriever passage encoder pre-fine-tuned on MS MARCO.
-
wikipedia-dpr-100w.dpr-multi -
Faiss FlatIP index of Wikipedia encoded by the DPR doc encoder trained on multiple QA datasets -
-
wikipedia-dpr-100w.dpr-single-nq -
Faiss FlatIP index of Wikipedia encoded by the DPR doc encoder trained on NQ -
-
wikipedia-dpr-100w.bpr-single-nq -
Faiss binary index of Wikipedia encoded by the BPR doc encoder trained on NQ -
-
wikipedia-dpr-100w.ance-multi -
Faiss FlatIP index of Wikipedia encoded by the ANCE-multi encoder -
-
wikipedia-dpr-100w.dkrr-nq -
Faiss FlatIP index of Wikipedia DPR encoded by the retriever model from 'Distilling Knowledge from Reader to Retriever for Question Answering' trained on NQ -
-
wikipedia-dpr-100w.dkrr-tqa -
Faiss FlatIP index of Wikipedia DPR encoded by the retriever model from 'Distilling Knowledge from Reader to Retriever for Question Answering' trained on TriviaQA -
-
wiki-all-6-3.dpr2-multi-retriever -[readme] -
Faiss FlatIP index of wiki-all-6-3-tamber encoded by a 2nd iteration DPR model trained on multiple QA datasets -
+
+
+
+Other +
ciral-v1.0-ha-mdpr-tied-pft-msmarco [readme]
Faiss index for CIRAL v1.0 (Hausa) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO. @@ -1906,6 +1956,32 @@ Detailed configuration information for the pre-built indexes are stored in [`pys [readme]
Faiss index for CIRAL v1.0 (Yoruba) corpus encoded by Afriberta-DPR passage encoder pre-fine-tuned on MS MARCO and fine-tuned on Latin languages in Mr. TyDi.
+
+
+
wikipedia-dpr-100w.dpr-multi +
Faiss FlatIP index of Wikipedia encoded by the DPR doc encoder trained on multiple QA datasets +
+
wikipedia-dpr-100w.dpr-single-nq +
Faiss FlatIP index of Wikipedia encoded by the DPR doc encoder trained on NQ +
+
wikipedia-dpr-100w.bpr-single-nq +
Faiss binary index of Wikipedia encoded by the BPR doc encoder trained on NQ +
+
wikipedia-dpr-100w.ance-multi +
Faiss FlatIP index of Wikipedia encoded by the ANCE-multi encoder +
+
wikipedia-dpr-100w.dkrr-nq +
Faiss FlatIP index of Wikipedia DPR encoded by the retriever model from 'Distilling Knowledge from Reader to Retriever for Question Answering' trained on NQ +
+
wikipedia-dpr-100w.dkrr-tqa +
Faiss FlatIP index of Wikipedia DPR encoded by the retriever model from 'Distilling Knowledge from Reader to Retriever for Question Answering' trained on TriviaQA +
+
wiki-all-6-3.dpr2-multi-retriever +[readme] +
Faiss FlatIP index of wiki-all-6-3-tamber encoded by a 2nd iteration DPR model trained on multiple QA datasets +
+
+
cast2019-tct_colbert-v2.hnsw [readme]
Faiss HNSW index of the CAsT2019 passage corpus encoded by the tct_colbert-v2 passage encoder @@ -2007,3 +2083,4 @@ Detailed configuration information for the pre-built indexes are stored in [`pys
Faiss index for AToMiC Texts v0.2.1 on large corpus encoded by laion/CLIP-Salesforce.blip-itm-large-coco
+
diff --git a/pyserini/2cr/miracl.py b/pyserini/2cr/miracl.py index d657e391e..b88d4600d 100644 --- a/pyserini/2cr/miracl.py +++ b/pyserini/2cr/miracl.py @@ -57,11 +57,11 @@ html_display = OrderedDict() html_display['bm25'] = 'BM25' -html_display['mdpr-tied-pft-msmarco'] = 'mDPR (tied encoders), pre-FT w/ MS MARCO' -html_display['mdpr-tied-pft-msmarco-ft-all'] = 'mDPR (tied encoders), pre-FT w/ MS MARCO then FT w/ all Mr. TyDi' -html_display['bm25-mdpr-tied-pft-msmarco-hybrid'] = 'Hybrid of `bm25` and `mdpr-tied-pft-msmarco`' -html_display['mdpr-tied-pft-msmarco-ft-miracl'] = 'mDPR (tied encoders), pre-FT w/ MS MARCO then in-lang FT w/ MIRACL' -html_display['mcontriever-tied-pft-msmarco'] = 'mContriever (tied encoders), pre-FT w/ MS MARCO' +html_display['mdpr-tied-pft-msmarco'] = 'mDPR pFT' +html_display['bm25-mdpr-tied-pft-msmarco-hybrid'] = 'BM25+mDPR pFT' +html_display['mdpr-tied-pft-msmarco-ft-all'] = 'mDPR pFT+FT1' +html_display['mdpr-tied-pft-msmarco-ft-miracl'] = 'mDPR pFT+FT2' +html_display['mcontriever-tied-pft-msmarco'] = 'mContriever' models = list(html_display) @@ -285,12 +285,12 @@ def generate_report(args): # Build the table for MRR@100, test queries html_rows = generate_table_rows(table, row_template, commands, eval_commands, 1, split, 'nDCG@10') all_rows = '\n'.join(html_rows) - tables_html.append(Template(table_template).substitute(desc=f'nDCG@10, {split} queries', rows=all_rows)) + tables_html.append(Template(table_template).substitute(desc=f'nDCG@10, {split} queries', rows=all_rows)) # Build the table for R@100, test queries html_rows = generate_table_rows(table, row_template, commands, eval_commands, 2, split, 'R@100') all_rows = '\n'.join(html_rows) - tables_html.append(Template(table_template).substitute(desc=f'Recall@100, {split} queries', rows=all_rows)) + tables_html.append(Template(table_template).substitute(desc=f'Recall@100, {split} queries', rows=all_rows)) with open(args.output, 'w') as out: 
out.write(Template(html_template).substitute(title='MIRACL', tables=' '.join(tables_html))) diff --git a/pyserini/2cr/miracl_html.template b/pyserini/2cr/miracl_html.template index 2c1688185..27ce9fce2 100644 --- a/pyserini/2cr/miracl_html.template +++ b/pyserini/2cr/miracl_html.template @@ -131,14 +131,33 @@ pre[class*="prettyprint"] { ">
-

$title

+

Pyserini Reproductions: $title

-
+
+ +

This page provides two-click reproductions* for a number of experimental runs on the MIRACL dataset. Instructions for programmatic execution are shown at the bottom of this page. The dataset is described in the following paper:

+ +

Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, and Jimmy Lin. MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages. Transactions of the Association for Computational Linguistics, 11:1114–1131, 2023.

+ +

Many of the models presented on this page are described in the following paper:

+ +

Xinyu Zhang, Kelechi Ogueji, Xueguang Ma, and Jimmy Lin. Towards Best Practices for Training Multilingual Dense Retrieval Models. ACM Transactions on Information Systems, 42(2), Article No. 39, 2023.

+ +

Key:

+ +
    +
  • BM25: BM25
  • +
  • mDPR pFT: mDPR (tied encoders), pre-FT w/ MS MARCO
  • +
  • BM25+mDPR pFT: hybrid of BM25 and mDPR (tied encoders), pre-FT w/ MS MARCO
  • +
  • mDPR pFT+FT1: mDPR (tied encoders), pre-FT w/ MS MARCO then FT w/ all Mr. TyDi
  • +
  • mDPR pFT+FT2: mDPR (tied encoders), pre-FT w/ MS MARCO then in-lang FT w/ MIRACL
  • +
  • mContriever: mContriever (tied encoders), pre-FT w/ MS MARCO
  • +
$tables diff --git a/pyserini/2cr/mrtydi.py b/pyserini/2cr/mrtydi.py index 85147d7e8..cd1f31b56 100644 --- a/pyserini/2cr/mrtydi.py +++ b/pyserini/2cr/mrtydi.py @@ -51,10 +51,10 @@ html_display = { 'bm25': 'BM25', - 'mdpr-split-pft-nq': 'mDPR (split encoders), pre-FT w/ NQ', - 'mdpr-tied-pft-nq': 'mDPR (tied encoders), pre-FT w/ NQ', - 'mdpr-tied-pft-msmarco': 'mDPR (tied encoders), pre-FT w/ MS MARCO', - 'mdpr-tied-pft-msmarco-ft-all': 'mDPR (tied encoders), pre-FT w/ MS MARCO, FT w/ all' + 'mdpr-split-pft-nq': 'mDPR (split) pFT NQ', + 'mdpr-tied-pft-nq': 'mDPR (tied) pFT NQ', + 'mdpr-tied-pft-msmarco': 'mDPR (tied) pFT MS MARCO', + 'mdpr-tied-pft-msmarco-ft-all': 'mDPR (tied) pFT MS MARCO + FT all' } trec_eval_metric_definitions = { diff --git a/pyserini/2cr/mrtydi_html.template b/pyserini/2cr/mrtydi_html.template index 21ec6fcd0..597a6933c 100644 --- a/pyserini/2cr/mrtydi_html.template +++ b/pyserini/2cr/mrtydi_html.template @@ -131,14 +131,28 @@ pre[class*="prettyprint"] { ">
-

$title

+

Pyserini Reproductions: $title

-
+
+ +

This page provides two-click reproductions* for a number of experimental runs on the Mr. TyDi dataset. Instructions for programmatic execution are shown at the bottom of this page. The dataset is described in the following paper:

+ +

Xinyu Zhang, Xueguang Ma, Peng Shi, and Jimmy Lin. Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval. Proceedings of 1st Workshop on Multilingual Representation Learning, pages 127-137, November 2021, Punta Cana, Dominican Republic.

+ +

Key:

+ +
    +
  • BM25
  • +
  • mDPR (split) pFT NQ: mDPR (split encoders), pre-FT w/ NQ
  • +
  • mDPR (tied) pFT NQ: mDPR (tied encoders), pre-FT w/ NQ
  • +
  • mDPR (tied) pFT MS MARCO: mDPR (tied encoders), pre-FT w/ MS MARCO
  • +
  • mDPR (tied) pFT MS MARCO + FT all: mDPR (tied encoders), pre-FT w/ MS MARCO, FT w/ all
  • +
$tables diff --git a/pyserini/prebuilt_index_info.py b/pyserini/prebuilt_index_info.py index a30875e9c..0f6770e3c 100644 --- a/pyserini/prebuilt_index_info.py +++ b/pyserini/prebuilt_index_info.py @@ -2801,6 +2801,7 @@ "beir-v1.0.0-trec-covid.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): TREC-COVID, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-trec-covid.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-trec-covid.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2814,6 +2815,7 @@ "beir-v1.0.0-bioasq.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): BioASQ, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-bioasq.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-bioasq.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2827,6 +2829,7 @@ "beir-v1.0.0-nfcorpus.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): NFCorpus, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-nfcorpus.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-nfcorpus.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2840,6 +2843,7 @@ "beir-v1.0.0-nq.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): NQ, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-nq.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", 
"urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-nq.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2853,6 +2857,7 @@ "beir-v1.0.0-hotpotqa.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): HotpotQA, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-hotpotqa.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-hotpotqa.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2866,6 +2871,7 @@ "beir-v1.0.0-fiqa.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): FiQA-2018, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-fiqa.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-fiqa.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2879,6 +2885,7 @@ "beir-v1.0.0-signal1m.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): Signal-1M, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-signal1m.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-signal1m.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2892,6 +2899,7 @@ "beir-v1.0.0-trec-news.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): TREC-NEWS, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-trec-news.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ 
"https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-trec-news.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2905,6 +2913,7 @@ "beir-v1.0.0-robust04.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): Robust04, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-robust04.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-robust04.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2918,6 +2927,7 @@ "beir-v1.0.0-arguana.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): ArguAna, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-arguana.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-arguana.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2931,6 +2941,7 @@ "beir-v1.0.0-webis-touche2020.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): Webis-Touche2020, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-webis-touche2020.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-webis-touche2020.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2944,6 +2955,7 @@ "beir-v1.0.0-cqadupstack-android.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-android, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-android.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": 
"lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-android.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2957,6 +2969,7 @@ "beir-v1.0.0-cqadupstack-english.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-english, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-english.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-english.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2970,6 +2983,7 @@ "beir-v1.0.0-cqadupstack-gaming.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-gaming, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-gaming.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-gaming.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2983,6 +2997,7 @@ "beir-v1.0.0-cqadupstack-gis.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-gis, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-gis.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-gis.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -2996,6 +3011,7 @@ "beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-mathematica, encoded by SPLADE++ 
(CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3009,6 +3025,7 @@ "beir-v1.0.0-cqadupstack-physics.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-physics, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-physics.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-physics.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3022,6 +3039,7 @@ "beir-v1.0.0-cqadupstack-programmers.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-programmers, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-programmers.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-programmers.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3035,6 +3053,7 @@ "beir-v1.0.0-cqadupstack-stats.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-stats, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-stats.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-stats.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3048,6 
+3067,7 @@ "beir-v1.0.0-cqadupstack-tex.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-tex, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-tex.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-tex.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3061,6 +3081,7 @@ "beir-v1.0.0-cqadupstack-unix.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-unix, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-unix.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-unix.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3074,6 +3095,7 @@ "beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-webmasters, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3087,6 +3109,7 @@ "beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): CQADupStack-wordpress, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ 
"https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3100,6 +3123,7 @@ "beir-v1.0.0-quora.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): Quora, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-quora.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-quora.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3113,6 +3137,7 @@ "beir-v1.0.0-dbpedia-entity.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): DBPedia, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-dbpedia-entity.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-dbpedia-entity.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3126,6 +3151,7 @@ "beir-v1.0.0-scidocs.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): SCIDOCS, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-scidocs.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-scidocs.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3139,6 +3165,7 @@ "beir-v1.0.0-fever.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): FEVER, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-fever.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ 
"https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-fever.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3152,6 +3179,7 @@ "beir-v1.0.0-climate-fever.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): Climate-FEVER, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-climate-fever.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-climate-fever.splade-pp-ed.20231124.a66f86f.tar.gz" ], @@ -3165,6 +3193,7 @@ "beir-v1.0.0-scifact.splade-pp-ed": { "description": "Lucene impact index of BEIR (v1.0.0): SciFact, encoded by SPLADE++ (CoCondenser-EnsembleDistil).", "filename": "lucene-inverted.beir-v1.0.0-scifact.splade-pp-ed.20231124.a66f86f.tar.gz", + "readme": "lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md", "urls": [ "https://rgw.cs.uwaterloo.ca/pyserini/indexes/lucene/lucene-inverted.beir-v1.0.0-scifact.splade-pp-ed.20231124.a66f86f.tar.gz" ], diff --git a/pyserini/resources/index-metadata/faiss-flat.beir-v1.0.0.contriever-msmarco.20230124.README.md b/pyserini/resources/index-metadata/faiss-flat.beir-v1.0.0.contriever-msmarco.20230124.README.md index 5da58d837..0e0d59d16 100644 --- a/pyserini/resources/index-metadata/faiss-flat.beir-v1.0.0.contriever-msmarco.20230124.README.md +++ b/pyserini/resources/index-metadata/faiss-flat.beir-v1.0.0.contriever-msmarco.20230124.README.md @@ -16,4 +16,6 @@ python -m tevatron.driver.encode \ where the `subdataset` is one of the BEIR dataset, e.g. `scifact`. -The Embedding is then converted to Pyserini index format. \ No newline at end of file +The Embedding is then converted to Pyserini index format. + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/faiss-flat.beir-v1.0.0.contriever.20230124.README.md b/pyserini/resources/index-metadata/faiss-flat.beir-v1.0.0.contriever.20230124.README.md index 761bf6627..8fbeefc83 100644 --- a/pyserini/resources/index-metadata/faiss-flat.beir-v1.0.0.contriever.20230124.README.md +++ b/pyserini/resources/index-metadata/faiss-flat.beir-v1.0.0.contriever.20230124.README.md @@ -16,4 +16,6 @@ python -m tevatron.driver.encode \ where the `subdataset` is one of the BEIR dataset, e.g. `scifact`. -The Embedding is then converted to Pyserini index format. \ No newline at end of file +The Embedding is then converted to Pyserini index format. + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/faiss-flat.msmarco-v1-passage.openai-text-embedding-3-large.20240410.c13cd6.README.md b/pyserini/resources/index-metadata/faiss-flat.msmarco-v1-passage.openai-text-embedding-3-large.20240410.c13cd6.README.md index 6c511f132..3528221ce 100644 --- a/pyserini/resources/index-metadata/faiss-flat.msmarco-v1-passage.openai-text-embedding-3-large.20240410.c13cd6.README.md +++ b/pyserini/resources/index-metadata/faiss-flat.msmarco-v1-passage.openai-text-embedding-3-large.20240410.c13cd6.README.md @@ -5,4 +5,6 @@ This index was generated on 2024/04/10 on `orca` at commit: + Pyserini commit [`c13cd6`](https://github.com/castorini/pyserini/commit/c13cd630136c7290ee95ee2cba74aeee3c5cbe07) (2024/04/10) -The corpora was encoded through the Azure OpenAI Service. Embeddings have 3072 dimensions. \ No newline at end of file +The corpora was encoded through the Azure OpenAI Service. Embeddings have 3072 dimensions. + +In April 2024, index was repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/index-msmarco-doc-20201117-f87c94-readme.txt b/pyserini/resources/index-metadata/index-msmarco-doc-20201117-f87c94-readme.txt deleted file mode 100644 index cd7fe0374..000000000 --- a/pyserini/resources/index-metadata/index-msmarco-doc-20201117-f87c94-readme.txt +++ /dev/null @@ -1,15 +0,0 @@ -This index was generated on 2020/11/17 at commit f87c945fd1c1e4174468194c72e3c05688dc45dd Mon Nov 16 16:17:20 2020 -0500 -with the following command: - -sh target/appassembler/bin/IndexCollection -collection CleanTrecCollection \ - -generator DefaultLuceneDocumentGenerator -input collections/msmarco-doc \ - -index index-msmarco-doc-20201117-f87c94 -threads 1 -storeRaw -optimize - -Note that to reduce index size: - -+ positions are not indexed (so no phrase queries) -+ document vectors are not stored (so no query expansion) - -However, the raw documents are stored, so they can be fetched and fed to further downstream reranking components. - -index-msmarco-doc-20201117-f87c94.tar.gz MD5 checksum = ac747860e7a37aed37cc30ed3990f273 diff --git a/pyserini/resources/index-metadata/index-msmarco-doc-expanded-per-doc-20201126-1b4d0a-readme.txt b/pyserini/resources/index-metadata/index-msmarco-doc-expanded-per-doc-20201126-1b4d0a-readme.txt deleted file mode 100644 index db57732f8..000000000 --- a/pyserini/resources/index-metadata/index-msmarco-doc-expanded-per-doc-20201126-1b4d0a-readme.txt +++ /dev/null @@ -1,14 +0,0 @@ -This index was generated on 2020/11/26 at - -+ docTTTTTquery commit d2704c025c2bf6db652b4b27f49c4e59714ba898 (2020/11/24). -+ anserini commit 1b4d0a29879a867ca5d1f003f924acc3279455ba (2020/11/25). 
- -with the following command: - -sh anserini/target/appassembler/bin/IndexCollection -collection JsonCollection \ - -generator DefaultLuceneDocumentGenerator -threads 1 \ - -input msmarco-doc-expanded -index index-msmarco-doc-expanded-per-doc-20201126-1b4d0a -optimize - -Note that this index does not store any "extras" (positions, document vectors, raw documents, etc.). - -index-msmarco-doc-expanded-per-doc-20201126-1b4d0a.tar.gz MD5 checksum = f7056191842ab77a01829cff68004782 diff --git a/pyserini/resources/index-metadata/index-msmarco-doc-expanded-per-passage-20201126-1b4d0a-readme.txt b/pyserini/resources/index-metadata/index-msmarco-doc-expanded-per-passage-20201126-1b4d0a-readme.txt deleted file mode 100644 index 29362ba57..000000000 --- a/pyserini/resources/index-metadata/index-msmarco-doc-expanded-per-passage-20201126-1b4d0a-readme.txt +++ /dev/null @@ -1,14 +0,0 @@ -This index was generated on 2020/11/26 at - -+ docTTTTTquery commit d2704c025c2bf6db652b4b27f49c4e59714ba898 (2020/11/24). -+ anserini commit 1b4d0a29879a867ca5d1f003f924acc3279455ba (2020/11/25). - -with the following command: - -sh anserini/target/appassembler/bin/IndexCollection -collection JsonCollection \ - -generator DefaultLuceneDocumentGenerator -threads 1 \ - -input msmarco-doc-expanded-passage -index index-msmarco-doc-expanded-per-passage-20201126-1b4d0a -optimize - -Note that this index does not store any "extras" (positions, document vectors, raw documents, etc.). 
- -index-msmarco-doc-expanded-per-passage-20201126-1b4d0a.tar.gz MD5 checksum = 54ea30c64515edf3c3741291b785be53 diff --git a/pyserini/resources/index-metadata/index-msmarco-doc-per-passage-20201204-f50dcc-readme.txt b/pyserini/resources/index-metadata/index-msmarco-doc-per-passage-20201204-f50dcc-readme.txt deleted file mode 100644 index 6f250a5de..000000000 --- a/pyserini/resources/index-metadata/index-msmarco-doc-per-passage-20201204-f50dcc-readme.txt +++ /dev/null @@ -1,19 +0,0 @@ -This index was generated on 2020/12/04 at - -+ docTTTTTquery commit 5be1af130b4657ea117781f761c4e5d15c77cb42 (2020/12/01). -+ anserini commit f50dcceb6cd0ec3403c1e77066aa51bb3275d24e (2020/12/04). - -with the following command: - -sh anserini/target/appassembler/bin/IndexCollection -collection JsonCollection \ - -generator DefaultLuceneDocumentGenerator -threads 1 \ - -input msmarco-doc-passage -index index-msmarco-doc-per-passage-20201204-f50dcc -storeRaw -optimize - -Note that to reduce index size: - -+ positions are not indexed (so no phrase queries) -+ document vectors are not stored (so no query expansion) - -However, the raw documents are stored, so they can be fetched and fed to further downstream reranking components. - -index-msmarco-doc-per-passage-20201204-f50dcc.tar.gz MD5 checksum = 797367406a7542b649cefa6b41cf4c33 diff --git a/pyserini/resources/index-metadata/index-msmarco-doc-per-passage-slim-20201204-f50dcc-readme.txt b/pyserini/resources/index-metadata/index-msmarco-doc-per-passage-slim-20201204-f50dcc-readme.txt deleted file mode 100644 index 565915c8b..000000000 --- a/pyserini/resources/index-metadata/index-msmarco-doc-per-passage-slim-20201204-f50dcc-readme.txt +++ /dev/null @@ -1,14 +0,0 @@ -This index was generated on 2020/12/04 at - -+ docTTTTTquery commit 5be1af130b4657ea117781f761c4e5d15c77cb42 (2020/12/01). -+ anserini commit f50dcceb6cd0ec3403c1e77066aa51bb3275d24e (2020/12/04). 
- -with the following command: - -sh anserini/target/appassembler/bin/IndexCollection -collection JsonCollection \ - -generator DefaultLuceneDocumentGenerator -threads 1 \ - -input msmarco-doc-passage -index index-msmarco-doc-per-passage-slim-20201204-f50dcc -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). - -index-msmarco-doc-per-passage-slim-20201204-f50dcc.tar.gz MD5 checksum = 77c2409943a8c9faffabf57cb6adca69 diff --git a/pyserini/resources/index-metadata/index-msmarco-doc-slim-20201202-ab6e28-readme.txt b/pyserini/resources/index-metadata/index-msmarco-doc-slim-20201202-ab6e28-readme.txt deleted file mode 100644 index 7e79f60ca..000000000 --- a/pyserini/resources/index-metadata/index-msmarco-doc-slim-20201202-ab6e28-readme.txt +++ /dev/null @@ -1,10 +0,0 @@ -This index was generated on 2020/12/02 at commit ab6e280b06a7a6476d001a5eb2319c191010c0e1 (2020/12/01) -with the following command: - -sh target/appassembler/bin/IndexCollection -collection CleanTrecCollection \ - -generator DefaultLuceneDocumentGenerator -input collections/msmarco-doc \ - -index index-msmarco-doc-slim-20201202-ab6e28 -threads 1 -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). 
- -index-msmarco-doc-slim-20201202-ab6e28.tar.gz MD5 checksum = c56e752f7992bf6149761097641d515a diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-arguana.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-arguana.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index edf1ec154..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-arguana.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - ArguAna - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/arguana \ - -index indexes/lucene-index.beir-v1.0.0-arguana.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--arguana.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-bioasq.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-bioasq.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 5982b89a4..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-bioasq.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - BioASQ - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection 
JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/bioasq \ - -index indexes/lucene-index.beir-v1.0.0-bioasq.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--bioasq.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-climate-fever.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-climate-fever.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 93b79d5f4..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-climate-fever.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - Climate-FEVER - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/climate-fever \ - -index indexes/lucene-index.beir-v1.0.0-climate-fever.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--climate-fever.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-android.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-android.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 4b71eeea7..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-android.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-android - -This Lucene 
impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-android \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-android.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-android.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-english.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-english.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 99e6fb2b6..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-english.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-english - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-english \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-english.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-english.splade-pp-ed.20231124.a66f86f & -``` diff --git 
a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-gaming.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-gaming.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index fa467adda..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-gaming.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-gaming - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-gaming \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-gaming.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-gaming.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-gis.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-gis.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 2681a4dc6..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-gis.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-gis - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection 
JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-gis \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-gis.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-gis.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 914d50450..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-mathematica - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-mathematica \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-physics.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-physics.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index bcc452def..000000000 --- 
a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-physics.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-physics - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-physics \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-physics.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-physics.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-programmers.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-programmers.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index b9f15c7f7..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-programmers.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-programmers - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-programmers \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-programmers.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact 
-pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-programmers.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-stats.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-stats.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index e6dff042c..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-stats.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-stats - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-stats \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-stats.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-stats.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-tex.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-tex.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index f62809ede..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-tex.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-tex - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on 
`orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-tex \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-tex.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-tex.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-unix.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-unix.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 5b70c88fa..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-unix.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-unix - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-unix \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-unix.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-unix.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index ac87f1514..000000000 --- 
a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-webmasters - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-webmasters \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 49f97f995..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - CQADupStack-wordpress - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-wordpress \ - -index indexes/lucene-index.beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact 
-pretokenized -optimize \ - >& logs/log.beir-v1.0.0--cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-dbpedia-entity.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-dbpedia-entity.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index bebf9fbbe..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-dbpedia-entity.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - DBPedia - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/dbpedia-entity \ - -index indexes/lucene-index.beir-v1.0.0-dbpedia-entity.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--dbpedia-entity.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-fever.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-fever.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 34a5ee8ab..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-fever.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - FEVER - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup 
target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/fever \ - -index indexes/lucene-index.beir-v1.0.0-fever.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--fever.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-fiqa.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-fiqa.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 1e8bb5d1b..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-fiqa.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - FiQA-2018 - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/fiqa \ - -index indexes/lucene-index.beir-v1.0.0-fiqa.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--fiqa.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-hotpotqa.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-hotpotqa.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 1bcb4d39d..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-hotpotqa.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - HotpotQA - -This Lucene impact index for SPLADE++ 
(CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/hotpotqa \ - -index indexes/lucene-index.beir-v1.0.0-hotpotqa.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--hotpotqa.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-nfcorpus.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-nfcorpus.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 328f9ac8a..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-nfcorpus.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - NFCorpus - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/nfcorpus \ - -index indexes/lucene-index.beir-v1.0.0-nfcorpus.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--nfcorpus.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-nq.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-nq.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 
a4df82c57..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-nq.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - NQ - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/nq \ - -index indexes/lucene-index.beir-v1.0.0-nq.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--nq.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-quora.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-quora.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index b6150cdc1..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-quora.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - Quora - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/quora \ - -index indexes/lucene-index.beir-v1.0.0-quora.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--quora.splade-pp-ed.20231124.a66f86f & -``` diff --git 
a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-robust04.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-robust04.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 43b339aa7..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-robust04.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - Robust04 - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/robust04 \ - -index indexes/lucene-index.beir-v1.0.0-robust04.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--robust04.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-scidocs.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-scidocs.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 1f7ba4dbf..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-scidocs.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - SCIDOCS - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input 
/store/collections/beir-v1.0.0/splade-pp-ed/scidocs \ - -index indexes/lucene-index.beir-v1.0.0-scidocs.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--scidocs.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-scifact.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-scifact.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 54efb3ed8..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-scifact.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - SciFact - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/scifact \ - -index indexes/lucene-index.beir-v1.0.0-scifact.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--scifact.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-signal1m.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-signal1m.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 04672439c..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-signal1m.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - Signal-1M - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit 
[`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/signal1m \ - -index indexes/lucene-index.beir-v1.0.0-signal1m.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--signal1m.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-trec-covid.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-trec-covid.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 4b210fc42..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-trec-covid.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - TREC-COVID - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/trec-covid \ - -index indexes/lucene-index.beir-v1.0.0-trec-covid.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--trec-covid.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-trec-news.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-trec-news.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 724bf3d8d..000000000 --- 
a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-trec-news.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - TREC-NEWS - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/trec-news \ - -index indexes/lucene-index.beir-v1.0.0-trec-news.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& logs/log.beir-v1.0.0--trec-news.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-webis-touche2020.splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-webis-touche2020.splade-pp-ed.20231124.a66f86f.README.md deleted file mode 100644 index 68f31536c..000000000 --- a/pyserini/resources/index-metadata/lucene-index.beir-v1.0.0-webis-touche2020.splade-pp-ed.20231124.a66f86f.README.md +++ /dev/null @@ -1,13 +0,0 @@ -# BEIR (v1.0.0) - Webis-Touche2020 - -This Lucene impact index for SPLADE++ (CoCondenser-EnsembleDistil)" was generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following command: - -``` -nohup target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -generator DefaultLuceneDocumentGenerator \ - -input /store/collections/beir-v1.0.0/splade-pp-ed/webis-touche2020 \ - -index indexes/lucene-index.beir-v1.0.0-webis-touche2020.splade-pp-ed.20231124.a66f86f \ - -threads 16 -impact -pretokenized -optimize \ - >& 
logs/log.beir-v1.0.0--webis-touche2020.splade-pp-ed.20231124.a66f86f & -``` diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-doc-per-passage-expansion.unicoil-d2q.20211012.58d286.readme.txt b/pyserini/resources/index-metadata/lucene-index.msmarco-doc-per-passage-expansion.unicoil-d2q.20211012.58d286.readme.txt deleted file mode 100644 index e8d2cc17f..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-doc-per-passage-expansion.unicoil-d2q.20211012.58d286.readme.txt +++ /dev/null @@ -1,12 +0,0 @@ -This index was generated on 2021/10/12 at commit 58d286c3f9fe845e261c271f2a0f514462844d97 (2021/10/05) -with the following command: - -python -m pyserini.index -collection JsonVectorCollection \ - -input collections/msmarco-doc-per-passage-expansion-unicoil-d2q-b8/ \ - -index indexes/lucene-index.msmarco-doc-per-passage-expansion.unicoil-d2q.20211012.58d286 \ - -generator DefaultLuceneDocumentGenerator -impact -pretokenized \ - -threads 36 -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). 
- -lucene-index.msmarco-doc-per-passage-expansion.unicoil-d2q.20211012.58d286.tar.gz MD5 checksum = 44bfc848f9a77302b10a59c5b136eb95 diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-passage.deepimpact.20211012.58d286.readme.txt b/pyserini/resources/index-metadata/lucene-index.msmarco-passage.deepimpact.20211012.58d286.readme.txt deleted file mode 100644 index e35652518..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-passage.deepimpact.20211012.58d286.readme.txt +++ /dev/null @@ -1,12 +0,0 @@ -This index was generated on 2021/10/12 at commit 58d286c3f9fe845e261c271f2a0f514462844d97 (2021/10/05) -with the following command: - -python -m pyserini.index -collection JsonVectorCollection \ - -input collections/msmarco-passage-deepimpact-b8/ \ - -index indexes/lucene-index.msmarco-passage.deepimpact.20211012.58d286 \ - -generator DefaultLuceneDocumentGenerator -impact -pretokenized \ - -threads 36 -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). 
- -lucene-index.msmarco-passage.deepimpact.20211012.58d286.tar.gz MD5 checksum = 9938f5529fee5cdb405b8587746c9e93 diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-passage.distill-splade-max.20211012.58d286.readme.txt b/pyserini/resources/index-metadata/lucene-index.msmarco-passage.distill-splade-max.20211012.58d286.readme.txt deleted file mode 100644 index 8a3b451a3..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-passage.distill-splade-max.20211012.58d286.readme.txt +++ /dev/null @@ -1,12 +0,0 @@ -This index was generated on 2021/10/12 at commit 58d286c3f9fe845e261c271f2a0f514462844d97 (2021/10/05) -with the following command: - -python -m pyserini.index -collection JsonVectorCollection \ - -input collections/msmarco-passage-distill-splade-max \ - -index indexes/lucene-index.msmarco-passage.distill-splade-max.20211012.58d286 \ - -generator DefaultLuceneDocumentGenerator -impact -pretokenized \ - -threads 36 -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). 
- -lucene-index.msmarco-passage.distill-splade-max.20211012.58d286.tar.gz MD5 checksum = 957c0dd1b78b61aeddc8685150fd8360 diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-passage.unicoil-d2q.20211012.58d286.readme.txt b/pyserini/resources/index-metadata/lucene-index.msmarco-passage.unicoil-d2q.20211012.58d286.readme.txt deleted file mode 100644 index b9b427dc7..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-passage.unicoil-d2q.20211012.58d286.readme.txt +++ /dev/null @@ -1,12 +0,0 @@ -This index was generated on 2021/10/12 at commit 58d286c3f9fe845e261c271f2a0f514462844d97 (2021/10/05) -with the following command: - -python -m pyserini.index -collection JsonVectorCollection \ - -input collections/msmarco-passage-unicoil-b8/ \ - -index indexes/lucene-index.msmarco-passage.unicoil-d2q.20211012.58d286 \ - -generator DefaultLuceneDocumentGenerator -impact -pretokenized \ - -threads 36 -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). 
- -lucene-index.msmarco-passage.unicoil-d2q.20211012.58d286.tar.gz MD5 checksum = 4a8cb3b86a0d9085a0860c7f7bb7fe99 diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-passage.unicoil-tilde.20211012.58d286.readme.txt b/pyserini/resources/index-metadata/lucene-index.msmarco-passage.unicoil-tilde.20211012.58d286.readme.txt deleted file mode 100644 index 817abea4c..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-passage.unicoil-tilde.20211012.58d286.readme.txt +++ /dev/null @@ -1,12 +0,0 @@ -This index was generated on 2021/10/12 at commit 58d286c3f9fe845e261c271f2a0f514462844d97 (2021/10/05) -with the following command: - -python -m pyserini.index -collection JsonVectorCollection \ - -input collections/msmarco-passage-unicoil-tilde-expansion-b8/ \ - -index indexes/lucene-index.msmarco-passage.unicoil-tilde.20211012.58d286 \ - -generator DefaultLuceneDocumentGenerator -impact -pretokenized \ - -threads 36 -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). - -lucene-index.msmarco-passage.unicoil-tilde.20211012.58d286.tar.gz MD5 checksum = cc19cfe241053f5a303f7f05a7ac40a5 diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-d2q-t5-docvectors.20220525.30c997.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-d2q-t5-docvectors.20220525.30c997.README.md deleted file mode 100644 index 19538faf1..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-d2q-t5-docvectors.20220525.30c997.README.md +++ /dev/null @@ -1,16 +0,0 @@ -# msmarco-v2-passage-augmented-d2q-t5-docvectors - -Lucene index (+docvectors) of the MS MARCO V2 augmented passage corpus, with doc2query-T5 expansions. 
- -This index was generated on 2022/05/25 at Anserini commit [`30c997`](https://github.com/castorini/anserini/commit/30c9974f495a06c94d576d0e9c2c5861515e0e19) on `damiano` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /scratch2/collections/msmarco/msmarco_v2_passage_augmented_d2q-t5/ \ - -index indexes/lucene-index.msmarco-v2-passage-augmented-d2q-t5-docvectors.20220525.30c997/ \ - -storeDocvectors -optimize -``` - -Note that this index stores term frequencies along with the docvectors: bag-of-words queries and relevance feedback are supported, but not phrase queries. -The raw text is not stored. diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-d2q-t5.20220201.9ea315.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-d2q-t5.20220201.9ea315.README.md deleted file mode 100644 index 27fe35125..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-d2q-t5.20220201.9ea315.README.md +++ /dev/null @@ -1,15 +0,0 @@ -# msmarco-v2-passage-augmented-d2q-t5 - -Lucene index of the MS MARCO V2 augmented passage corpus, with doc2query-T5 expansions. - -This index was generated on 2022/02/01 at Anserini commit [`06fb4f`](https://github.com/castorini/anserini/commit/9ea3159adeeffd84e10e197af4c36febb5b74c7b) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /store/collections/msmarco/msmarco_v2_passage_augmented_d2q-t5/ \ - -index indexes/lucene-index.msmarco-v2-passage-augmented-d2q-t5.20220201.9ea315/ \ - -optimize -``` - -Note that this index stores term frequencies only, which supports bag-of-words queries, but no phrase queries and no relevance feedback. 
In addition, there is no way to fetch the raw text. diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-full.20220111.06fb4f.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-full.20220111.06fb4f.README.md deleted file mode 100644 index ee627583a..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-full.20220111.06fb4f.README.md +++ /dev/null @@ -1,21 +0,0 @@ -# msmarco-v2-passage-augmented-full - -Lucene index of the MS MARCO V2 augmented passage corpus. - -This index was generated on 2022/01/11 at Anserini commit [`06fb4f`](https://github.com/castorini/anserini/commit/06fb4f9947ff2167c276d8893287453af7680786) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /store/collections/msmarco/msmarco_v2_passage_augmented/ \ - -index indexes/lucene-index.msmarco-v2-passage-augmented-full.20220111.06fb4f/ \ - -storePositions -storeDocvectors -storeRaw -optimize -``` - -Note that there are three variants of this index: - -+ `msmarco-v2-passage-augmented` (82G uncompressed): the "default" version, which stores term frequencies and the raw text. This supports bag-of-words queries, but no phrase queries and no relevance feedback. -+ `msmarco-v2-passage-augmented-slim` (18G uncompressed): the "slim" version, which stores term frequencies only. This supports bag-of-words queries, but no phrase queries and no relevance feedback. There is no way to fetch the raw text from this index. -+ `msmarco-v2-passage-augmented-full` (142G uncompressed): the "full" version, which stores term frequencies, term positions, document vectors, and the raw text. This supports bag-of-words queries, phrase queries, and relevance feedback. - -This is the "full" version. 
\ No newline at end of file diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-slim.20220111.06fb4f.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-slim.20220111.06fb4f.README.md deleted file mode 100644 index 61e5d0090..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented-slim.20220111.06fb4f.README.md +++ /dev/null @@ -1,21 +0,0 @@ -# msmarco-v2-passage-augmented-slim - -Lucene index of the MS MARCO V2 augmented passage corpus. - -This index was generated on 2022/01/11 at Anserini commit [`06fb4f`](https://github.com/castorini/anserini/commit/06fb4f9947ff2167c276d8893287453af7680786) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /store/collections/msmarco/msmarco_v2_passage_augmented/ \ - -index indexes/lucene-index.msmarco-v2-passage-augmented-slim.20220111.06fb4f/ \ - -optimize -``` - -Note that there are three variants of this index: - -+ `msmarco-v2-passage-augmented` (82G uncompressed): the "default" version, which stores term frequencies and the raw text. This supports bag-of-words queries, but no phrase queries and no relevance feedback. -+ `msmarco-v2-passage-augmented-slim` (18G uncompressed): the "slim" version, which stores term frequencies only. This supports bag-of-words queries, but no phrase queries and no relevance feedback. There is no way to fetch the raw text from this index. -+ `msmarco-v2-passage-augmented-full` (142G uncompressed): the "full" version, which stores term frequencies, term positions, document vectors, and the raw text. This supports bag-of-words queries, phrase queries, and relevance feedback. - -This is the "slim" version. 
\ No newline at end of file diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented.20220111.06fb4f.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented.20220111.06fb4f.README.md deleted file mode 100644 index 3aff53355..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-augmented.20220111.06fb4f.README.md +++ /dev/null @@ -1,21 +0,0 @@ -# msmarco-v2-passage-augmented - -Lucene index of the MS MARCO V2 augmented passage corpus. - -This index was generated on 2022/01/11 at Anserini commit [`06fb4f`](https://github.com/castorini/anserini/commit/06fb4f9947ff2167c276d8893287453af7680786) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /store/collections/msmarco/msmarco_v2_passage_augmented/ \ - -index indexes/lucene-index.msmarco-v2-passage-augmented.20220111.06fb4f/ \ - -storeRaw -optimize -``` - -Note that there are three variants of this index: - -+ `msmarco-v2-passage-augmented` (82G uncompressed): the "default" version, which stores term frequencies and the raw text. This supports bag-of-words queries, but no phrase queries and no relevance feedback. -+ `msmarco-v2-passage-augmented-slim` (18G uncompressed): the "slim" version, which stores term frequencies only. This supports bag-of-words queries, but no phrase queries and no relevance feedback. There is no way to fetch the raw text from this index. -+ `msmarco-v2-passage-augmented-full` (142G uncompressed): the "full" version, which stores term frequencies, term positions, document vectors, and the raw text. This supports bag-of-words queries, phrase queries, and relevance feedback. - -This is the "default" version. 
\ No newline at end of file diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-d2q-t5-docvectors.20220525.30c997.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-d2q-t5-docvectors.20220525.30c997.README.md deleted file mode 100644 index 59fd7e47b..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-d2q-t5-docvectors.20220525.30c997.README.md +++ /dev/null @@ -1,16 +0,0 @@ -# msmarco-v2-passage-d2q-t5-docvectors - -Lucene index (+docvectors) of the MS MARCO V2 passage corpus, with doc2query-T5 expansions. - -This index was generated on 2022/05/25 at Anserini commit [`30c997`](https://github.com/castorini/anserini/commit/30c9974f495a06c94d576d0e9c2c5861515e0e19) on `damiano` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /scratch2/collections/msmarco/msmarco_v2_passage_d2q-t5/ \ - -index indexes/lucene-index.msmarco-v2-passage-d2q-t5-docvectors.20220525.30c997/ \ - -storeDocvectors -optimize -``` - -Note that this index stores term frequencies along with the docvectors: bag-of-words queries and relevance feedback are supported, but not phrase queries. -The raw text is not stored. diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-d2q-t5.20220201.9ea315.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-d2q-t5.20220201.9ea315.README.md deleted file mode 100644 index 37f9b289a..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-d2q-t5.20220201.9ea315.README.md +++ /dev/null @@ -1,15 +0,0 @@ -# msmarco-v2-passage-d2q-t5 - -Lucene index of the MS MARCO V2 passage corpus, with doc2query-T5 expansions. 
- -This index was generated on 2022/02/01 at Anserini commit [`06fb4f`](https://github.com/castorini/anserini/commit/9ea3159adeeffd84e10e197af4c36febb5b74c7b) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /store/collections/msmarco/msmarco_v2_passage_d2q-t5/ \ - -index indexes/lucene-index.msmarco-v2-passage-d2q-t5.20220201.9ea315/ \ - -optimize -``` - -Note that this index stores term frequencies only, which supports bag-of-words queries, but no phrase queries and no relevance feedback. In addition, there is no way to fetch the raw text. diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-full.20220111.06fb4f.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-full.20220111.06fb4f.README.md deleted file mode 100644 index 8fd87fdff..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-full.20220111.06fb4f.README.md +++ /dev/null @@ -1,21 +0,0 @@ -# msmarco-v2-passage-full - -Lucene index of the MS MARCO V2 passage corpus. - -This index was generated on 2022/01/11 at Anserini commit [`06fb4f`](https://github.com/castorini/anserini/commit/06fb4f9947ff2167c276d8893287453af7680786) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /store/collections/msmarco/msmarco_v2_passage/ \ - -index indexes/lucene-index.msmarco-v2-passage-full.20220111.06fb4f/ \ - -storePositions -storeDocvectors -storeRaw -optimize -``` - -Note that there are three variants of this index: - -+ `msmarco-v2-passage` (45G uncompressed): the "default" version, which stores term frequencies and the raw text. This supports bag-of-words queries, but no phrase queries and no relevance feedback. 
-+ `msmarco-v2-passage-slim` (11G uncompressed): the "slim" version, which stores term frequencies only. This supports bag-of-words queries, but no phrase queries and no relevance feedback. There is no way to fetch the raw text from this index. -+ `msmarco-v2-passage-full` (69G uncompressed): the "full" version, which stores term frequencies, term positions, document vectors, and the raw text. This supports bag-of-words queries, phrase queries, and relevance feedback. - -This is the "full" version. \ No newline at end of file diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-slim.20220111.06fb4f.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-slim.20220111.06fb4f.README.md deleted file mode 100644 index e3f5e1b57..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-slim.20220111.06fb4f.README.md +++ /dev/null @@ -1,21 +0,0 @@ -# msmarco-v2-passage-slim - -Lucene index of the MS MARCO V2 passage corpus. - -This index was generated on 2022/01/11 at Anserini commit [`06fb4f`](https://github.com/castorini/anserini/commit/06fb4f9947ff2167c276d8893287453af7680786) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /store/collections/msmarco/msmarco_v2_passage/ \ - -index indexes/lucene-index.msmarco-v2-passage-slim.20220111.06fb4f/ \ - -optimize -``` - -Note that there are three variants of this index: - -+ `msmarco-v2-passage` (45G uncompressed): the "default" version, which stores term frequencies and the raw text. This supports bag-of-words queries, but no phrase queries and no relevance feedback. -+ `msmarco-v2-passage-slim` (11G uncompressed): the "slim" version, which stores term frequencies only. This supports bag-of-words queries, but no phrase queries and no relevance feedback. There is no way to fetch the raw text from this index. 
-+ `msmarco-v2-passage-full` (69G uncompressed): the "full" version, which stores term frequencies, term positions, document vectors, and the raw text. This supports bag-of-words queries, phrase queries, and relevance feedback. - -This is the "slim" version. \ No newline at end of file diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-unicoil-0shot.20220219.6a7080.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-unicoil-0shot.20220219.6a7080.README.md deleted file mode 100644 index af3c94f37..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-unicoil-0shot.20220219.6a7080.README.md +++ /dev/null @@ -1,14 +0,0 @@ -# msmarco-v2-passage-unicoil-0shot - -Lucene impact index of the MS MARCO V2 passage corpus for uniCOIL. - -This index was generated on 2022/02/19 at Anserini commit [`9ea315`](https://github.com/castorini/anserini/commit/6a708047f71528f7d516c0dd45485204a36e6b1d) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -input /store/collections/msmarco/msmarco_v2_passage_unicoil_0shot \ - -index indexes/lucene-index.msmarco-v2-passage-unicoil-0shot.20220219.6a7080/ \ - -generator DefaultLuceneDocumentGenerator \ - -threads 18 -impact -pretokenized -optimize -``` diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-unicoil-noexp-0shot.20220219.6a7080.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-unicoil-noexp-0shot.20220219.6a7080.README.md deleted file mode 100644 index a7959c822..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage-unicoil-noexp-0shot.20220219.6a7080.README.md +++ /dev/null @@ -1,14 +0,0 @@ -# msmarco-v2-passage-unicoil-noexp-0shot - -Lucene impact index of the MS MARCO V2 passage corpus for uniCOIL (noexp). 
- -This index was generated on 2022/02/19 at Anserini commit [`9ea315`](https://github.com/castorini/anserini/commit/6a708047f71528f7d516c0dd45485204a36e6b1d) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection \ - -collection JsonVectorCollection \ - -input /store/collections/msmarco/msmarco_v2_passage_unicoil_noexp_0shot \ - -index indexes/lucene-index.msmarco-v2-passage-unicoil-noexp-0shot.20220219.6a7080/ \ - -generator DefaultLuceneDocumentGenerator \ - -threads 18 -impact -pretokenized -optimize -``` diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.20220111.06fb4f.README.md b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.20220111.06fb4f.README.md deleted file mode 100644 index 9cbb2e23e..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.20220111.06fb4f.README.md +++ /dev/null @@ -1,21 +0,0 @@ -# msmarco-v2-passage - -Lucene index of the MS MARCO V2 passage corpus. - -This index was generated on 2022/01/11 at Anserini commit [`06fb4f`](https://github.com/castorini/anserini/commit/06fb4f9947ff2167c276d8893287453af7680786) on `orca` with the following command: - -``` -target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollection \ - -generator DefaultLuceneDocumentGenerator -threads 18 \ - -input /store/collections/msmarco/msmarco_v2_passage/ \ - -index indexes/lucene-index.msmarco-v2-passage.20220111.06fb4f/ \ - -storeRaw -optimize -``` - -Note that there are three variants of this index: - -+ `msmarco-v2-passage` (45G uncompressed): the "default" version, which stores term frequencies and the raw text. This supports bag-of-words queries, but no phrase queries and no relevance feedback. -+ `msmarco-v2-passage-slim` (11G uncompressed): the "slim" version, which stores term frequencies only. This supports bag-of-words queries, but no phrase queries and no relevance feedback. There is no way to fetch the raw text from this index. 
-+ `msmarco-v2-passage-full` (69G uncompressed): the "full" version, which stores term frequencies, term positions, document vectors, and the raw text. This supports bag-of-words queries, phrase queries, and relevance feedback. - -This is the "default" version. \ No newline at end of file diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.unicoil-noexp-0shot.20211012.58d286.readme.txt b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.unicoil-noexp-0shot.20211012.58d286.readme.txt deleted file mode 100644 index fea735fec..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.unicoil-noexp-0shot.20211012.58d286.readme.txt +++ /dev/null @@ -1,12 +0,0 @@ -This index was generated on 2021/10/12 at commit 58d286c3f9fe845e261c271f2a0f514462844d97 (2021/10/05) -with the following command: - -python -m pyserini.index -collection JsonVectorCollection \ - -input collections/msmarco-v2-passage-unicoil-noexp-0shot-b8 \ - -index indexes/lucene-index.msmarco-v2-passage.unicoil-noexp-0shot.20211012.58d286 \ - -generator DefaultLuceneDocumentGenerator -impact -pretokenized \ - -threads 36 -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). 
- -lucene-index.msmarco-v2-passage.unicoil-noexp-0shot.20211012.58d286.tar.gz MD5 checksum = 8886a8d9599838bc6d8d61464da61086 diff --git a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.unicoil-tilde.20211012.58d286.readme.txt b/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.unicoil-tilde.20211012.58d286.readme.txt deleted file mode 100644 index 5e772713c..000000000 --- a/pyserini/resources/index-metadata/lucene-index.msmarco-v2-passage.unicoil-tilde.20211012.58d286.readme.txt +++ /dev/null @@ -1,12 +0,0 @@ -This index was generated on 2021/10/12 at commit 58d286c3f9fe845e261c271f2a0f514462844d97 (2021/10/05) -with the following command: - -python -m pyserini.index -collection JsonVectorCollection \ - -input collections/msmarco-v2-passage-unicoil-tilde-expansion-b8/ \ - -index indexes/lucene-index.msmarco-v2-passage.unicoil-tilde.20211012.58d286 \ - -generator DefaultLuceneDocumentGenerator -impact -pretokenized \ - -threads 36 -optimize - -This minimal index does not store any "extras" (positions, document vectors, raw documents, etc.). - -lucene-index.msmarco-v2-passage.unicoil-tilde.20211012.58d286.tar.gz MD5 checksum = 562f9534eefe04ab8c07beb304074d41 diff --git a/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-flat.20221116.505594.README.md b/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-flat.20221116.505594.README.md index 2ef11bbf6..d4a33687a 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-flat.20221116.505594.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-flat.20221116.505594.README.md @@ -235,3 +235,5 @@ nohup target/appassembler/bin/IndexCollection \ -threads 16 -storePositions -storeDocvectors -storeRaw -optimize \ >& logs/log.beir-v1.0.0-scifact-flat.20221116.505594 & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-multifield.20221116.505594.README.md b/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-multifield.20221116.505594.README.md index 453718050..5aec5765f 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-multifield.20221116.505594.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-multifield.20221116.505594.README.md @@ -235,3 +235,5 @@ nohup target/appassembler/bin/IndexCollection \ -threads 16 -fields title -storePositions -storeDocvectors -storeRaw -optimize \ >& logs/log.beir-v1.0.0-scifact-multifield.20221116.505594 & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md b/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md new file mode 100644 index 000000000..d94fdd9a5 --- /dev/null +++ b/pyserini/resources/index-metadata/lucene-inverted.beir-v1.0.0-splade-pp-ed.20231124.a66f86f.README.md @@ -0,0 +1,239 @@ +# BEIR (v1.0.0): SPLADE++ (CoCondenser-EnsembleDistil) Indexes + +The Lucene impact indexes for SPLADE++ (CoCondenser-EnsembleDistil) were generated on 2023/11/24 at Anserini commit [`a66f86f`](https://github.com/castorini/anserini/commit/a66f86fb463db76df521f58992b000dd4ab39548) on `orca` with the following commands: + +``` +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/arguana \ + -index indexes/lucene-index.beir-v1.0.0-arguana.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--arguana.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator 
DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/bioasq \ + -index indexes/lucene-index.beir-v1.0.0-bioasq.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--bioasq.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/climate-fever \ + -index indexes/lucene-index.beir-v1.0.0-climate-fever.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--climate-fever.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-android \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-android.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-android.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-english \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-english.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-english.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-gaming \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-gaming.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-gaming.splade-pp-ed.20231124.a66f86f & + +nohup 
target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-gis \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-gis.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-gis.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-mathematica \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-mathematica.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-physics \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-physics.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-physics.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-programmers \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-programmers.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-programmers.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-stats \ + -index 
indexes/lucene-index.beir-v1.0.0-cqadupstack-stats.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-stats.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-tex \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-tex.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-tex.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-unix \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-unix.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-unix.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-webmasters \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-webmasters.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/cqadupstack-wordpress \ + -index indexes/lucene-index.beir-v1.0.0-cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--cqadupstack-wordpress.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection 
JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/dbpedia-entity \ + -index indexes/lucene-index.beir-v1.0.0-dbpedia-entity.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--dbpedia-entity.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/fever \ + -index indexes/lucene-index.beir-v1.0.0-fever.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--fever.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/fiqa \ + -index indexes/lucene-index.beir-v1.0.0-fiqa.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--fiqa.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/hotpotqa \ + -index indexes/lucene-index.beir-v1.0.0-hotpotqa.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--hotpotqa.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/nfcorpus \ + -index indexes/lucene-index.beir-v1.0.0-nfcorpus.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--nfcorpus.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection 
JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/nq \ + -index indexes/lucene-index.beir-v1.0.0-nq.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--nq.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/quora \ + -index indexes/lucene-index.beir-v1.0.0-quora.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--quora.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/robust04 \ + -index indexes/lucene-index.beir-v1.0.0-robust04.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--robust04.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/scidocs \ + -index indexes/lucene-index.beir-v1.0.0-scidocs.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--scidocs.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/scifact \ + -index indexes/lucene-index.beir-v1.0.0-scifact.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--scifact.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator 
DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/signal1m \ + -index indexes/lucene-index.beir-v1.0.0-signal1m.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--signal1m.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/trec-covid \ + -index indexes/lucene-index.beir-v1.0.0-trec-covid.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--trec-covid.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/trec-news \ + -index indexes/lucene-index.beir-v1.0.0-trec-news.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--trec-news.splade-pp-ed.20231124.a66f86f & + +nohup target/appassembler/bin/IndexCollection \ + -collection JsonVectorCollection \ + -generator DefaultLuceneDocumentGenerator \ + -input /store/collections/beir-v1.0.0/splade-pp-ed/webis-touche2020 \ + -index indexes/lucene-index.beir-v1.0.0-webis-touche2020.splade-pp-ed.20231124.a66f86f \ + -threads 16 -impact -pretokenized -optimize \ + >& logs/log.beir-v1.0.0--webis-touche2020.splade-pp-ed.20231124.a66f86f & +``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.20221004.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.20221004.252b5e.README.md index 5766d69d5..d0533dbf8 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.20221004.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.20221004.252b5e.README.md @@ -29,3 +29,5 @@ target/appassembler/bin/IndexCollection -collection JsonCollection \ -index indexes/lucene-index.msmarco-v1-doc-segmented-full.20221004.252b5e/ \ -storePositions -storeDocvectors -storeRaw -optimize >& logs/log.msmarco-v1-doc-segmented-full.20221004.252b5e & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.d2q-t5.20221004.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.d2q-t5.20221004.252b5e.README.md index 00fa81f12..12ce1e951 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.d2q-t5.20221004.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.d2q-t5.20221004.252b5e.README.md @@ -22,3 +22,5 @@ target/appassembler/bin/IndexCollection -collection JsonCollection \ -index indexes/lucene-index.msmarco-v1-doc-segmented-d2q-t5-docvectors.20221004.252b5e/ \ -storeDocvectors -optimize >& logs/log.msmarco-v1-doc-segmented-d2q-t5-docvectors.20221004.252b5e & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.ltr.20211031.33e4151.README.txt b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.ltr.20211031.33e4151.README.txt index 65aec7251..da9b3f7fb 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.ltr.20211031.33e4151.README.txt +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.ltr.20211031.33e4151.README.txt @@ -10,3 +10,7 @@ This is built with spacy 3.0.6. The max length is 3 and stride is 1. index-msmarco-passage-ltr-20210519-e25e33f MD5 checksum = bd60e89041b4ebbabc4bf0cfac608a87 + +--- + +In May 2024, index was repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.unicoil-noexp.20221005.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.unicoil-noexp.20221005.252b5e.README.md index 23964afda..1be4dbf3e 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.unicoil-noexp.20221005.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.unicoil-noexp.20221005.252b5e.README.md @@ -12,3 +12,5 @@ nohup target/appassembler/bin/IndexCollection \ -generator DefaultLuceneDocumentGenerator \ -threads 16 -impact -pretokenized -optimize >& logs/log.msmarco-v1-doc-segmented-unicoil-noexp.20221005.252b5e & ``` + +In April 2024, index was repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.unicoil.20221005.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.unicoil.20221005.252b5e.README.md index dbf04dd2c..fe78efadc 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.unicoil.20221005.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc-segmented.unicoil.20221005.252b5e.README.md @@ -12,3 +12,5 @@ nohup target/appassembler/bin/IndexCollection \ -generator DefaultLuceneDocumentGenerator \ -threads 16 -impact -pretokenized -optimize >& logs/log.msmarco-v1-doc-segmented-unicoil.20221005.252b5e & ``` + +In April 2024, index was repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc.20221004.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc.20221004.252b5e.README.md index 1719bab44..8f92a5f71 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc.20221004.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc.20221004.252b5e.README.md @@ -29,3 +29,5 @@ target/appassembler/bin/IndexCollection -collection JsonCollection \ -index indexes/lucene-index.msmarco-v1-doc-full.20221004.252b5e/ \ -storePositions -storeDocvectors -storeRaw -optimize >& logs/log.msmarco-v1-doc-full.20221004.252b5e & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc.d2q-t5.20221004.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc.d2q-t5.20221004.252b5e.README.md index 34aaad437..7eabbaf28 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc.d2q-t5.20221004.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-doc.d2q-t5.20221004.252b5e.README.md @@ -22,3 +22,5 @@ target/appassembler/bin/IndexCollection -collection JsonCollection \ -index indexes/lucene-index.msmarco-v1-doc-d2q-t5-docvectors.20221004.252b5e/ \ -storeDocvectors -optimize >& logs/log.msmarco-v1-doc-d2q-t5-docvectors.20221004.252b5e & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.20221004.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.20221004.252b5e.README.md index 6de41d6e0..ee247492c 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.20221004.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.20221004.252b5e.README.md @@ -29,3 +29,5 @@ nohup target/appassembler/bin/IndexCollection -collection JsonCollection \ -index indexes/lucene-index.msmarco-v1-passage-full.20221004.252b5e/ \ -storePositions -storeDocvectors -storeRaw -optimize >& logs/log.msmarco-v1-passage-full.20221004.252b5e & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.d2q-t5.20221004.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.d2q-t5.20221004.252b5e.README.md index 013ec39c0..f17b3c83b 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.d2q-t5.20221004.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.d2q-t5.20221004.252b5e.README.md @@ -22,3 +22,5 @@ nohup target/appassembler/bin/IndexCollection -collection JsonCollection \ -index indexes/lucene-index.msmarco-v1-passage-d2q-t5-docvectors.20221004.252b5e/ \ -storeDocvectors -optimize >& logs/log.msmarco-v1-passage-d2q-t5-docvectors.20221004.252b5e & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.deepimpact.20221005.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.deepimpact.20221005.252b5e.README.md index f640c01e3..37ef9b938 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.deepimpact.20221005.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.deepimpact.20221005.252b5e.README.md @@ -12,3 +12,5 @@ nohup target/appassembler/bin/IndexCollection \ -generator DefaultLuceneDocumentGenerator \ -threads 16 -impact -pretokenized -optimize >& logs/log.msmarco-v1-passage-deepimpact.20221005.252b5e & ``` + +In April 2024, index was repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.distill-splade-max.20221005.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.distill-splade-max.20221005.252b5e.README.md index 78594299e..e3149fae1 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.distill-splade-max.20221005.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.distill-splade-max.20221005.252b5e.README.md @@ -12,3 +12,5 @@ nohup target/appassembler/bin/IndexCollection \ -generator DefaultLuceneDocumentGenerator \ -threads 16 -impact -pretokenized -optimize >& logs/log.msmarco-v1-passage-distill-splade-max.20221005.252b5e & ``` + +In April 2024, index was repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.ltr.20210519.e25e33f.README.txt b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.ltr.20210519.e25e33f.README.txt index 4a5e758a8..257695e5a 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.ltr.20210519.e25e33f.README.txt +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.ltr.20210519.e25e33f.README.txt @@ -9,3 +9,7 @@ Note, pretokenized option is used to keep preprocessed tokenization. This is built with spacy 3.0.6. index-msmarco-passage-ltr-20210519-e25e33f MD5 checksum = a5de642c268ac1ed5892c069bdc29ae3 + +--- + +In May 2024, index was repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.slimr-pp.20230925.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.slimr-pp.20230925.md index efe02b93e..d2b6064f7 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.slimr-pp.20230925.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.slimr-pp.20230925.md @@ -1,5 +1,6 @@ This index was generated on 2023/02/20 with the following command: +``` python -m pyserini.index.lucene \ --collection JsonVectorCollection \ --input collections/slimr_qtopk20_ptopk20_hardneg7_nobalanced_hardneg_distilled \ @@ -7,5 +8,8 @@ python -m pyserini.index.lucene \ --generator DefaultLuceneDocumentGenerator \ --threads 48 \ --impact --pretokenized +``` lucene-index.msmarco-v1-passage-slimr-pp.20230925.tar.gz MD5 checksum = 5badbe47b6a50cf252cafb8a648743f1 + +In April 2024, index was repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.slimr.20230925.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.slimr.20230925.md index a12443f83..131a63dcc 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.slimr.20230925.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.slimr.20230925.md @@ -1,5 +1,6 @@ This index was generated on 2023/02/20 with the following command: +``` python -m pyserini.index.lucene \ --collection JsonVectorCollection \ --input collections/slimr_qtopk20_ptopk20_hardneg7_nobalanced \ @@ -7,5 +8,8 @@ python -m pyserini.index.lucene \ --generator DefaultLuceneDocumentGenerator \ --threads 48 \ --impact --pretokenized +``` lucene-index.msmarco-v1-passage-slimr.20230925.tar.gz MD5 checksum = 3532a09a4a47f862d63b8df81b39ecc9 + +In April 2024, index was repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.splade-pp.20230524.a59610.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.splade-pp.20230524.a59610.README.md index 057f8190d..f91d47668 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.splade-pp.20230524.a59610.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.splade-pp.20230524.a59610.README.md @@ -61,3 +61,5 @@ target/appassembler/bin/IndexCollection \ -threads 16 -impact -pretokenized -storeRaw -optimize \ >& logs/log.msmarco-v1-passage-splade-pp-sd-text.20230524.a59610 & ``` + +In April 2024, indexes were repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil-noexp.20221005.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil-noexp.20221005.252b5e.README.md index e661cd20e..223a15926 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil-noexp.20221005.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil-noexp.20221005.252b5e.README.md @@ -12,3 +12,5 @@ nohup target/appassembler/bin/IndexCollection \ -generator DefaultLuceneDocumentGenerator \ -threads 16 -impact -pretokenized -optimize >& logs/log.msmarco-v1-passage-unicoil-noexp.20221005.252b5e & ``` + +In April 2024, index was repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil-tilde.20221005.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil-tilde.20221005.252b5e.README.md index 1652109cd..dc0fb7f44 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil-tilde.20221005.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil-tilde.20221005.252b5e.README.md @@ -12,3 +12,5 @@ nohup target/appassembler/bin/IndexCollection \ -generator DefaultLuceneDocumentGenerator \ -threads 16 -impact -pretokenized -optimize >& logs/log.msmarco-v1-passage-unicoil-tilde.20221005.252b5e & ``` + +In April 2024, index was repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil.20221005.252b5e.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil.20221005.252b5e.README.md index 2813b6cfa..2daa532c1 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil.20221005.252b5e.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v1-passage.unicoil.20221005.252b5e.README.md @@ -12,3 +12,5 @@ nohup target/appassembler/bin/IndexCollection \ -generator DefaultLuceneDocumentGenerator \ -threads 16 -impact -pretokenized -optimize >& logs/log.msmarco-v1-passage-unicoil.20221005.252b5e & ``` + +In April 2024, index was repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.20220808.4d6d2a.README.md index c64d860b9..ddf7bad0d 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.20220808.4d6d2a.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.20220808.4d6d2a.README.md @@ -32,3 +32,5 @@ nohup target/appassembler/bin/IndexCollection -collection MsMarcoV2DocCollection -storePositions -storeDocvectors -storeRaw -optimize \ >& logs/log.msmarco-v2-doc-segmented-full.20220808.4d6d2a.txt & ``` + +In May 2024, indexes were repackaged to adopt a more consistent naming scheme. diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.d2q-t5.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.d2q-t5.20220808.4d6d2a.README.md index 631e85c82..a83a7ce15 100644 --- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.d2q-t5.20220808.4d6d2a.README.md +++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.d2q-t5.20220808.4d6d2a.README.md @@ -24,3 +24,5 @@ nohup target/appassembler/bin/IndexCollection -collection MsMarcoV2DocCollection -storeDocvectors -optimize \ >& logs/log.msmarco-v2-doc-segmented-d2q-t5-docvectors.20220808.4d6d2a.txt & ``` + +In May 2024, indexes were repackaged to adopt a more consistent naming scheme. 
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.unicoil-0shot.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.unicoil-0shot.20220808.4d6d2a.README.md
index 09d20888e..df2fdc75e 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.unicoil-0shot.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.unicoil-0shot.20220808.4d6d2a.README.md
@@ -13,3 +13,5 @@ nohup target/appassembler/bin/IndexCollection \
 -threads 18 -impact -pretokenized -optimize \
 >& logs/log.msmarco-v2-doc-segmented-unicoil-0shot.20220808.4d6d2a.txt &
 ```
+
+In May 2024, index was repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.unicoil-noexp-0shot.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.unicoil-noexp-0shot.20220808.4d6d2a.README.md
index 2727c9314..3440d0663 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.unicoil-noexp-0shot.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc-segmented.unicoil-noexp-0shot.20220808.4d6d2a.README.md
@@ -13,3 +13,5 @@ nohup target/appassembler/bin/IndexCollection \
 -threads 18 -impact -pretokenized -optimize \
 >& logs/log.msmarco-v2-doc-segmented-unicoil-noexp-0shot.20220808.4d6d2a.txt &
 ```
+
+In May 2024, index was repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc.20220808.4d6d2a.README.md
index 77c05dcaf..17e5a9023 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc.20220808.4d6d2a.README.md
@@ -32,3 +32,5 @@ nohup target/appassembler/bin/IndexCollection -collection MsMarcoV2DocCollection
 -storePositions -storeDocvectors -storeRaw -optimize \
 >& logs/log.msmarco-v2-doc-full.20220808.4d6d2a.txt &
 ```
+
+In May 2024, indexes were repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc.d2q-t5.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc.d2q-t5.20220808.4d6d2a.README.md
index 9a12701b0..de63cfbcc 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc.d2q-t5.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-doc.d2q-t5.20220808.4d6d2a.README.md
@@ -24,3 +24,5 @@ nohup target/appassembler/bin/IndexCollection -collection MsMarcoV2DocCollection
 -storeDocvectors -optimize \
 >& logs/log.msmarco-v2-doc-d2q-t5-docvectors.20220808.4d6d2a.txt &
 ```
+
+In May 2024, indexes were repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage-augmented.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage-augmented.20220808.4d6d2a.README.md
index 6be4bee58..2ce1dc255 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage-augmented.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage-augmented.20220808.4d6d2a.README.md
@@ -32,3 +32,5 @@ nohup target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollec
 -storePositions -storeDocvectors -storeRaw -optimize \
 >& logs/log.msmarco-v2-passage-augmented-full.20220808.4d6d2a.txt &
 ```
+
+In May 2024, indexes were repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage-augmented.d2q-t5.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage-augmented.d2q-t5.20220808.4d6d2a.README.md
index 782e3fe14..337f0dd6a 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage-augmented.d2q-t5.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage-augmented.d2q-t5.20220808.4d6d2a.README.md
@@ -24,3 +24,5 @@ nohup target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollec
 -storeDocvectors -optimize \
 >& logs/log.msmarco-v2-passage-augmented-d2q-t5-docvectors.20220808.4d6d2a.txt &
 ```
+
+In May 2024, indexes were repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.20220808.4d6d2a.README.md
index 4ea429f61..ba869baa4 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.20220808.4d6d2a.README.md
@@ -32,3 +32,5 @@ nohup target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollec
 -storePositions -storeDocvectors -storeRaw -optimize \
 >& logs/log.msmarco-v2-passage-full.20220808.4d6d2a.txt &
 ```
+
+In May 2024, indexes were repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.d2q-t5.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.d2q-t5.20220808.4d6d2a.README.md
index 23c350f0b..4e2a59d07 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.d2q-t5.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.d2q-t5.20220808.4d6d2a.README.md
@@ -24,3 +24,5 @@ nohup target/appassembler/bin/IndexCollection -collection MsMarcoV2PassageCollec
 -storeDocvectors -optimize \
 >& logs/log.msmarco-v2-passage-d2q-t5-docvectors.20220808.4d6d2a.txt &
 ```
+
+In May 2024, indexes were repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.unicoil-0shot.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.unicoil-0shot.20220808.4d6d2a.README.md
index 6c837f8a0..dd96e719c 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.unicoil-0shot.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.unicoil-0shot.20220808.4d6d2a.README.md
@@ -13,3 +13,5 @@ nohup target/appassembler/bin/IndexCollection \
 -threads 18 -impact -pretokenized -optimize \
 >& logs/log.msmarco-v2-passage-unicoil-0shot.20220808.4d6d2a.txt &
 ```
+
+In May 2024, index was repackaged to adopt a more consistent naming scheme.
diff --git a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.unicoil-noexp-0shot.20220808.4d6d2a.README.md b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.unicoil-noexp-0shot.20220808.4d6d2a.README.md
index 97dd1db97..6f26e6dcb 100644
--- a/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.unicoil-noexp-0shot.20220808.4d6d2a.README.md
+++ b/pyserini/resources/index-metadata/lucene-inverted.msmarco-v2-passage.unicoil-noexp-0shot.20220808.4d6d2a.README.md
@@ -13,3 +13,5 @@ nohup target/appassembler/bin/IndexCollection \
 -threads 18 -impact -pretokenized -optimize \
 >& logs/log.msmarco-v2-passage-unicoil-noexp-0shot.20220808.4d6d2a.txt &
 ```
+
+In May 2024, index was repackaged to adopt a more consistent naming scheme.
diff --git a/scripts/generate_docs_from_prebuilt_indexes.py b/scripts/generate_docs_from_prebuilt_indexes.py
index 732477bc5..8d1af1ab6 100644
--- a/scripts/generate_docs_from_prebuilt_indexes.py
+++ b/scripts/generate_docs_from_prebuilt_indexes.py
@@ -88,14 +88,70 @@ def generate_prebuilt(index):
 if __name__ == '__main__':
     print(__boilerplate__)
     print('\n\n## Standard Lucene Indexes')
+
+    print('
') + print('MS MARCO') generate_prebuilt(TF_INDEX_INFO_MSMARCO) + print('
') + + print('
') + print('BEIR') generate_prebuilt(TF_INDEX_INFO_BEIR) + print('
') + + print('
') + print('Mr.TyDi') generate_prebuilt(TF_INDEX_INFO_MRTYDI) + print('
') + + print('
') + print('MIRACL') generate_prebuilt(TF_INDEX_INFO_MIRACL) + print('
') + + print('
') + print('Other') generate_prebuilt(TF_INDEX_INFO_CIRAL) generate_prebuilt(TF_INDEX_INFO_OTHER) + print('
') + print('\n\n## Lucene Impact Indexes') + + print('
') + print('MS MARCO') generate_prebuilt(IMPACT_INDEX_INFO_MSMARCO) + print('
') + + print('
') + print('BEIR') generate_prebuilt(IMPACT_INDEX_INFO_BEIR) + print('
') + print('\n\n## Faiss Indexes') - generate_prebuilt(FAISS_INDEX_INFO) + + print('
') + print('MS MARCO') + generate_prebuilt(FAISS_INDEX_INFO_MSMARCO) + print('
') + + print('
') + print('BEIR') + generate_prebuilt(FAISS_INDEX_INFO_BEIR) + print('
') + + print('
') + print('Mr.TyDi') + generate_prebuilt(FAISS_INDEX_INFO_MRTYDI) + print('
') + + print('
') + print('MIRACL') + generate_prebuilt(FAISS_INDEX_INFO_MIRACL) + print('
') + + print('
') + print('Other') + generate_prebuilt(FAISS_INDEX_INFO_CIRAL) + generate_prebuilt(FAISS_INDEX_INFO_WIKIPEDIA) + generate_prebuilt(FAISS_INDEX_INFO_OTHER) + print('
')