Commit

Refactored/Corrected a part of documentation (#1851)
ashishakkumar committed Apr 3, 2024
1 parent cbdb598 commit be7d750
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions docs/usage-index.md
@@ -242,7 +242,7 @@ python -m pyserini.encode \
input --corpus tests/resources/simple_cacm_corpus.json \
--fields text \
output --embeddings path/to/output/dir \
- encoder --encoder castorini/unicoil-d2q-msmarco-passage \
+ encoder --encoder castorini/unicoil-msmarco-passage \
--fields text \
--batch 32 \
--fp16 # if inference with autocast()
@@ -256,14 +256,14 @@ The output will be stored in jsonl format. Each line contains following info:
}
```

- Once the collections are [encoded](usage-encode.md) into vectors,
+ Once the collections are encoded into vectors,
we can start to build the index.

Pyserini supports four types of index so far:
1. [HNSWPQ](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexHNSWPQ.html#struct-faiss-indexhnswpq)
```bash
python -m pyserini.index.faiss \
- --input path/to/encoded/corpus \ # either in the Faiss or the jsonl format
+ --input path/to/encoded/corpus \ # Folder containing file either in the Faiss or the jsonl format
--output path/to/output/index \
--hnsw \
--pq
@@ -272,15 +272,15 @@ python -m pyserini.index.faiss \
2. [HNSW](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexHNSW.html#struct-faiss-indexhnsw)
```bash
python -m pyserini.index.faiss \
- --input path/to/encoded/corpus \ # either in the Faiss or the jsonl format
+ --input path/to/encoded/corpus \ # The folder containing file either in the Faiss or the jsonl format
--output path/to/output/index \
--hnsw
```

3. [PQ](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexPQ.html)
```bash
python -m pyserini.index.faiss \
- --input path/to/encoded/corpus \ # either in the Faiss or the jsonl format
+ --input path/to/encoded/corpus \ # The folder containing file either in the Faiss or the jsonl format
--output path/to/output/index \
--pq
```
@@ -290,8 +290,8 @@ This command is for converting the `.jsonl` format into Faiss flat format,
and generates the same files with `pyserini.encode` with `--to-faiss` specified.
```bash
python -m pyserini.index.faiss \
- --input path/to/encoded/corpus \ # in jsonl format
- --output path/to/output/index \
+ --input path/to/encoded/corpus \ # The folder containing file in jsonl format
+ --output path/to/output/index
```

Once the index is built, you can use `FaissSearcher` to search in the collection:
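
The four indexing commands in this diff differ only in whether `--hnsw` and `--pq` are passed. As written, they also carry a comment after the trailing backslash, which in bash stops the backslash from acting as a line continuation, so they may not run when pasted verbatim. One way around both points is to assemble the command in Python, where each argument can carry its own comment. The following is a minimal sketch, not part of the commit: `build_faiss_index_command` is a hypothetical helper, the paths are the placeholders from the diff, and only flags that appear in the diff are used.

```python
import shlex


def build_faiss_index_command(input_dir, output_dir, hnsw=False, pq=False):
    """Assemble the argument list for `python -m pyserini.index.faiss`.

    Hypothetical helper for illustration; it only uses flags shown in the diff.
    """
    cmd = [
        "python", "-m", "pyserini.index.faiss",
        "--input", input_dir,    # folder containing the encoded corpus (Faiss or jsonl)
        "--output", output_dir,  # destination path for the index
    ]
    if hnsw:
        cmd.append("--hnsw")
    if pq:
        cmd.append("--pq")
    return cmd


# The four variants listed in the documentation; "Flat" passes neither flag.
variants = {
    "HNSWPQ": {"hnsw": True, "pq": True},
    "HNSW": {"hnsw": True},
    "PQ": {"pq": True},
    "Flat": {},
}

for name, flags in variants.items():
    cmd = build_faiss_index_command(
        "path/to/encoded/corpus", "path/to/output/index", **flags
    )
    # Print a shell-safe, single-line form of the command.
    print(f"{name}: {shlex.join(cmd)}")
```

To actually build an index rather than print the command, the returned list could be passed to `subprocess.run(cmd, check=True)` in an environment where Pyserini is installed.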
