Documentation for Binary Quantization Support with KNN Vector Search #8281

Merged
merged 8 commits on Sep 17, 2024
172 changes: 171 additions & 1 deletion _search-plugins/knn/knn-vector-quantization.md
@@ -11,7 +11,7 @@

By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization.

OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ).
OpenSearch supports many varieties of quantization. In general, the level of quantization provides a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, product quantization (PQ), and binary quantization (BQ).

## Lucene byte vector

@@ -310,3 +310,173 @@
```r
1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB
```

## Binary quantization

Starting with version 2.17, OpenSearch supports binary quantization (BQ) with binary vector support for the Faiss engine. Binary quantization compresses vectors into a binary format (0s and 1s), making it highly efficient in terms of memory usage. You can choose to represent each vector dimension using 1, 2, or 4 bits, depending on the desired precision. One advantage of binary quantization is that the training process is handled automatically during indexing, so, unlike with other quantization techniques such as product quantization, no separate training step is required.
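
For a sense of scale, consider a single 768-dimensional vector (the dimension here is arbitrary and chosen only for illustration). The per-vector storage at each precision works out as follows:

```r
768 * 4       # float32 vector:            3,072 bytes
768 * 1 / 8   # 1-bit binary quantization:    96 bytes
768 * 2 / 8   # 2-bit binary quantization:   192 bytes
768 * 4 / 8   # 4-bit binary quantization:   384 bytes
```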


### Using binary quantization
To configure binary quantization for the Faiss engine, define a `knn_vector` field and specify the `mode` as `on_disk`. This configuration defaults to 1-bit binary quantization, with both `ef_search` and `ef_construction` set to `100`:
```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "l2",
        "data_type": "float",
        "mode": "on_disk"
      }
    }
  }
}
```
{% include copy-curl.html %}
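
After the index is created, you can ingest documents as usual; quantization (and any training it requires) is handled automatically at indexing time. The following is a minimal example (the document ID and vector values are arbitrary):

```json
PUT my-vector-index/_doc/1
{
  "my_vector_field": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5]
}
```
{% include copy-curl.html %}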

To further optimize the configuration, you can specify additional parameters, such as the compression level, and fine-tune the search parameters. For example, you can override the `ef_construction` value or specify the compression level, which corresponds to the number of bits used for quantization:

- **32x compression** for 1-bit quantization
- **16x compression** for 2-bit quantization
- **8x compression** for 4-bit quantization

This allows for greater control over memory usage and recall performance, providing flexibility to balance between precision and storage efficiency.
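
These compression factors follow from replacing the 32 bits of each `float` dimension with 1, 2, or 4 bits:

```r
32 / 1   # = 32x compression with 1-bit quantization
32 / 2   # = 16x compression with 2-bit quantization
32 / 4   # =  8x compression with 4-bit quantization
```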

To specify the compression level, set the `compression_level` parameter:

```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "l2",
        "data_type": "float",
        "mode": "on_disk",
        "compression_level": "16x",
        "method": {
          "parameters": {
            "ef_construction": 16
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

The following example further fine-tunes the configuration by defining `ef_construction`, `encoder`, and the number of bits (`bits`):


```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "m": 16,
            "ef_construction": 512,
            "encoder": {
              "name": "binary",
              "parameters": {
                "bits": 1 // Can be 1, 2, or 4
              }
            }
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

### Search using binary quantized vectors
You can perform a k-NN search on your index by providing a vector and specifying the number of nearest neighbors (`k`) to return:

```json
GET my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5],
        "k": 10
      }
    }
  }
}
```
{% include copy-curl.html %}

You can also fine-tune the search by providing the `ef_search` and `oversample_factor` parameters.

The `oversample_factor` parameter controls the factor by which the search oversamples the candidate vectors before ranking them. A higher oversample factor means that more candidates are considered before ranking, which improves accuracy but increases search time. When selecting an `oversample_factor` value, consider the trade-off between accuracy and efficiency. For example, setting `oversample_factor` to `2.0` doubles the number of candidates considered during the ranking phase, which may help achieve better results.

The following request specifies the `ef_search` and `oversample_factor` parameters:

```json
GET my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5],
        "k": 10,
        "method_parameters": {
          "ef_search": 10
        },
        "rescore": {
          "oversample_factor": 10.0
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
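
In the preceding request, `k` is `10` and `oversample_factor` is `10.0`. As a simplified sketch of the mechanism (assuming the first pass retrieves about `k * oversample_factor` candidates, an approximation rather than an exact accounting), roughly 100 quantized candidates are rescored using full-precision vectors before the top 10 results are returned:

```r
10 * 10.0   # ~100 candidates rescored; the top 10 results are returned
```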


#### HNSW memory estimation

The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension * bits / 8 + 8 * m)` bytes/vector, where `bits` is the number of bits used for quantization and `m` is the maximum number of bidirectional links created for each element during the construction of the graph.

As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16. The following sections provide memory requirement estimations for various compression values.

##### 1-bit quantization (32x compression)

In 1-bit quantization, each dimension is represented using 1 bit, equivalent to a 32x compression factor. The memory requirement can be estimated as follows:

```r
Memory = 1.1 * ((256 * 1 / 8) + 8 * 16) * 1,000,000
~= 0.176 GB
```

##### 2-bit quantization (16x compression)

In 2-bit quantization, each dimension is represented using 2 bits, equivalent to a 16x compression factor. The memory requirement can be estimated as follows:

```r
Memory = 1.1 * ((256 * 2 / 8) + 8 * 16) * 1,000,000
~= 0.211 GB
```

##### 4-bit quantization (8x compression)

In 4-bit quantization, each dimension is represented using 4 bits, equivalent to an 8x compression factor. The memory requirement can be estimated as follows:

```r
Memory = 1.1 * ((256 * 4 / 8) + 8 * 16) * 1,000,000
~= 0.282 GB
```