Format mismatch when trying to load trec-car index. #962
-
Hi, I was trying to build the trec-car index and access it with Pyserini. Here is what I did. First, I cloned and installed Anserini and anserini-tools.
Then I indexed my collection (one .cbor file from http://trec-car.cs.unh.edu/datareleases/v2.0/paragraphCorpus.v2.0.tar.xz) with the following command: `-generator DefaultLuceneDocumentGenerator -threads 1 -input ./paragraphCorpus -index ./lucene-index.car17 -storeRaw`. This generated the index correctly, but when I tried to access the index with Pyserini's `SimpleSearcher`, I got the following Java exception:
Some other information that might be useful
I am not sure what the problem is here, but I would appreciate any help.
-
You're getting Lucene index incompatibility issues.
Anserini master is based on Lucene 8.11 now: https://github.com/castorini/anserini/blob/master/docs/release-notes/release-notes-v0.14.0.md
Previously it was based on Lucene 8.3, and the Pyserini PyPI artifact 0.14.0 is still based on Lucene 8.3.
Are you indexing from Pyserini (Python) or Anserini (Java)?
If you're in Java-land exclusively (both indexing and search), you should be fine. You're probably mixing Python and Java in a way that's exposing this incompatibility.