Refactored/ Corrected a part of Documentation #1851

Merged 2 commits on Apr 3, 2024
14 changes: 7 additions & 7 deletions docs/usage-index.md
@@ -242,7 +242,7 @@ python -m pyserini.encode \
input --corpus tests/resources/simple_cacm_corpus.json \
--fields text \
output --embeddings path/to/output/dir \
- encoder --encoder castorini/unicoil-d2q-msmarco-passage \
+ encoder --encoder castorini/unicoil-msmarco-passage \
--fields text \
--batch 32 \
--fp16 # if inference with autocast()
@@ -256,14 +256,14 @@ The output will be stored in jsonl format. Each line contains following info:
}
```

- Once the collections are [encoded](usage-encode.md) into vectors,
+ Once the collections are encoded into vectors,
we can start to build the index.

Pyserini supports four types of index so far:
1. [HNSWPQ](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexHNSWPQ.html#struct-faiss-indexhnswpq)
```bash
python -m pyserini.index.faiss \
- --input path/to/encoded/corpus \ # either in the Faiss or the jsonl format
+ --input path/to/encoded/corpus \ # Folder containing file either in the Faiss or the jsonl format
--output path/to/output/index \
--hnsw \
--pq
@@ -272,15 +272,15 @@ python -m pyserini.index.faiss \
2. [HNSW](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexHNSW.html#struct-faiss-indexhnsw)
```bash
python -m pyserini.index.faiss \
- --input path/to/encoded/corpus \ # either in the Faiss or the jsonl format
+ --input path/to/encoded/corpus \ # The folder containing file either in the Faiss or the jsonl format
--output path/to/output/index \
--hnsw
```

3. [PQ](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexPQ.html)
```bash
python -m pyserini.index.faiss \
- --input path/to/encoded/corpus \ # either in the Faiss or the jsonl format
+ --input path/to/encoded/corpus \ # The folder containing file either in the Faiss or the jsonl format
--output path/to/output/index \
--pq
```
@@ -290,8 +290,8 @@ This command is for converting the `.jsonl` format into Faiss flat format,
and generates the same files with `pyserini.encode` with `--to-faiss` specified.
```bash
python -m pyserini.index.faiss \
- --input path/to/encoded/corpus \ # in jsonl format
- --output path/to/output/index \
+ --input path/to/encoded/corpus \ # The folder containing file in jsonl format
+ --output path/to/output/index
```
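
A note for anyone copying the commands in this diff: the `--input` lines carry a `#` comment after the backslash. In bash, a backslash continues the line only when it is the very last character, so a trailing comment there is likely to cut the command short and leave the later flags to run as a separate command. The sketch below shows one way to keep the same explanation while preserving the continuation, using the HNSWPQ flags from above. The wrapper function `build_hnswpq_index` is a hypothetical helper introduced only for illustration; it is not part of Pyserini.

```bash
# Hypothetical wrapper: any leading arguments are prepended to the command,
# so `build_hnswpq_index echo` prints it and `build_hnswpq_index` runs it.
build_hnswpq_index() {
  # --input: folder containing the encoded corpus, in either Faiss or jsonl format.
  # The note sits on its own line so that each backslash below ends its line.
  "$@" python -m pyserini.index.faiss \
    --input path/to/encoded/corpus \
    --output path/to/output/index \
    --hnsw \
    --pq
}

build_hnswpq_index echo   # dry run: print the assembled command
```

Calling `build_hnswpq_index` with no arguments would run the indexer itself, which assumes Pyserini is installed and the placeholder paths have been replaced with real ones.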

Once the index is built, you can use `FaissSearcher` to search in the collection: