From ecd2232ac2b6c97dad9f5be5ab4e528a629c5a71 Mon Sep 17 00:00:00 2001 From: zhichao-aws Date: Wed, 14 Aug 2024 04:54:00 +0800 Subject: [PATCH] Refactor of the neural sparse search tutorial (#7922) * refactor Signed-off-by: zhichao-aws * fix Signed-off-by: zhichao-aws * Doc review Signed-off-by: Fanit Kolchina * Link fix Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: zhichao-aws Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../processors/sparse-encoding.md | 2 +- _ml-commons-plugin/pretrained-models.md | 8 +- _search-plugins/neural-sparse-search.md | 421 +-------------- .../neural-sparse-with-pipelines.md | 486 ++++++++++++++++++ .../neural-sparse-with-raw-vectors.md | 99 ++++ 5 files changed, 607 insertions(+), 409 deletions(-) create mode 100644 _search-plugins/neural-sparse-with-pipelines.md create mode 100644 _search-plugins/neural-sparse-with-raw-vectors.md diff --git a/_ingest-pipelines/processors/sparse-encoding.md b/_ingest-pipelines/processors/sparse-encoding.md index 38b44320b1..3af6f4e987 100644 --- a/_ingest-pipelines/processors/sparse-encoding.md +++ b/_ingest-pipelines/processors/sparse-encoding.md @@ -141,7 +141,7 @@ The response confirms that in addition to the `passage_text` field, the processo } ``` -Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Step 2: Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-2-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-3-ingest-documents-into-the-index) of [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). +Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/#step-2b-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/#step-2c-ingest-documents-into-the-index) of [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). --- diff --git a/_ml-commons-plugin/pretrained-models.md b/_ml-commons-plugin/pretrained-models.md index 30540cfe49..154b8b530f 100644 --- a/_ml-commons-plugin/pretrained-models.md +++ b/_ml-commons-plugin/pretrained-models.md @@ -46,11 +46,13 @@ The following table provides a list of sentence transformer models and artifact Sparse encoding models transfer text into a sparse vector and convert the vector to a list of `` pairs representing the text entry and its corresponding weight in the sparse vector. You can use these models for use cases such as clustering or sparse neural search. -We recommend the following models for optimal performance: +We recommend the following combinations for optimal performance: - Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model during both ingestion and search. 
- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model during ingestion and the -`amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` model during search. +`amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search. + +For more information about the preceding options for running neural sparse search, see [Generating sparse vector embeddings within OpenSearch]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/). The following table provides a list of sparse encoding models and artifact links you can use to download them. @@ -58,7 +60,7 @@ The following table provides a list of sparse encoding models and artifact links |:---|:---|:---|:---|:---| | `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indexes of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [HuggingFace documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1). |
| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indexes of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [HuggingFace documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v1). |
-| `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-tokenizer-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/config.json) | A neural sparse tokenizer model. The model tokenizes text into tokens and assigns each token a predefined weight, which is the token's inverse document frequency (IDF). If the IDF file is not provided, the weight defaults to 1. For more information, see [Preparing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/#preparing-a-model). | +| `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-tokenizer-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/config.json) | A neural sparse tokenizer. The tokenizer splits text into tokens and assigns each token a predefined weight, which is the token's inverse document frequency (IDF). If the IDF file is not provided, the weight defaults to 1. For more information, see [Preparing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/#preparing-a-model). | ### Cross-encoder models **Introduced 2.12** diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index 8aa2ff7dbf..0beee26ef0 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -2,7 +2,7 @@ layout: default title: Neural sparse search nav_order: 50 -has_children: false +has_children: true redirect_from: - /search-plugins/neural-sparse-search/ - /search-plugins/sparse-search/ @@ -14,261 +14,20 @@ Introduced 2.11 [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) relies on dense retrieval that is based on text embedding models. However, dense methods use k-NN search, which consumes a large amount of memory and CPU resources. An alternative to semantic search, neural sparse search is implemented using an inverted index and is thus as efficient as BM25. Neural sparse search is facilitated by sparse embedding models. When you perform a neural sparse search, it creates a sparse vector (a list of `token: weight` key-value pairs representing an entry and its weight) and ingests data into a rank features index. -When selecting a model, choose one of the following options: +To further boost search relevance, you can combine neural sparse search with dense [semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) using a [hybrid query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/). -- Use a sparse encoding model at both ingestion time and search time for better search relevance at the expense of relatively high latency. -- Use a sparse encoding model at ingestion time and a tokenizer at search time for lower search latency at the expense of relatively lower search relevance. Tokenization doesn't involve model inference, so you can deploy and invoke a tokenizer using the ML Commons Model API for a more streamlined experience. +You can configure neural sparse search in the following ways: -**PREREQUISITE**
-Before using neural sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). -{: .note} +- Generate vector embeddings within OpenSearch: Configure an ingest pipeline to generate and store sparse vector embeddings from document text at ingestion time. At query time, input plain text, which will be automatically converted into vector embeddings for search. For complete setup steps, see [Configuring ingest pipelines for neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/). +- Ingest raw sparse vectors and search using sparse vectors directly. For complete setup steps, see [Ingesting and searching raw vectors]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-raw-vectors/). -## Using neural sparse search +To learn more about splitting long text into passages for neural search, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). -To use neural sparse search, follow these steps: +## Accelerating neural sparse search -1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline). -1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). -1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). -1. [Search the index using neural search](#step-4-search-the-index-using-neural-sparse-search). -1. _Optional_ [Create and enable the two-phase processor](#step-5-create-and-enable-the-two-phase-processor-optional). +Starting with OpenSearch version 2.15, you can significantly accelerate the search process by creating a search pipeline with a `neural_sparse_two_phase_processor`. -## Step 1: Create an ingest pipeline - -To generate vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`sparse_encoding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/sparse-encoding/), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings. - -The following example request creates an ingest pipeline where the text from `passage_text` will be converted into text embeddings and the embeddings will be stored in `passage_embedding`: - -```json -PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse -{ - "description": "An sparse encoding ingest pipeline", - "processors": [ - { - "sparse_encoding": { - "model_id": "aP2Q8ooBpBj3wT4HVS8a", - "field_map": { - "passage_text": "passage_embedding" - } - } - } - ] -} -``` -{% include copy-curl.html %} - -To split long text into passages, use the `text_chunking` ingest processor before the `sparse_encoding` processor. For more information, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). - - -## Step 2: Create an index for ingestion - -In order to use the text embedding processor defined in your pipeline, create a rank features index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. 
Continuing with the example, the `passage_embedding` field must be mapped as [`rank_features`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/#rank-features). Similarly, the `passage_text` field should be mapped as `text`. - -The following example request creates a rank features index that is set up with a default ingest pipeline: - -```json -PUT /my-nlp-index -{ - "settings": { - "default_pipeline": "nlp-ingest-pipeline-sparse" - }, - "mappings": { - "properties": { - "id": { - "type": "text" - }, - "passage_embedding": { - "type": "rank_features" - }, - "passage_text": { - "type": "text" - } - } - } -} -``` -{% include copy-curl.html %} - -To save disk space, you can exclude the embedding vector from the source as follows: - -```json -PUT /my-nlp-index -{ - "settings": { - "default_pipeline": "nlp-ingest-pipeline-sparse" - }, - "mappings": { - "_source": { - "excludes": [ - "passage_embedding" - ] - }, - "properties": { - "id": { - "type": "text" - }, - "passage_embedding": { - "type": "rank_features" - }, - "passage_text": { - "type": "text" - } - } - } -} -``` -{% include copy-curl.html %} - -Once the `` pairs are excluded from the source, they cannot be recovered. Before applying this optimization, make sure you don't need the `` pairs for your application. -{: .important} - -## Step 3: Ingest documents into the index - -To ingest documents into the index created in the previous step, send the following requests: - -```json -PUT /my-nlp-index/_doc/1 -{ - "passage_text": "Hello world", - "id": "s1" -} -``` -{% include copy-curl.html %} - -```json -PUT /my-nlp-index/_doc/2 -{ - "passage_text": "Hi planet", - "id": "s2" -} -``` -{% include copy-curl.html %} - -Before the document is ingested into the index, the ingest pipeline runs the `sparse_encoding` processor on the document, generating vector embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings. - -## Step 4: Search the index using neural sparse search - -To perform a neural sparse search on your index, use the `neural_sparse` query clause in [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. - -The following example request uses a `neural_sparse` query to search for relevant documents using a raw text query: - -```json -GET my-nlp-index/_search -{ - "query": { - "neural_sparse": { - "passage_embedding": { - "query_text": "Hi world", - "model_id": "aP2Q8ooBpBj3wT4HVS8a" - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains the matching documents: - -```json -{ - "took" : 688, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 2, - "relation" : "eq" - }, - "max_score" : 30.0029, - "hits" : [ - { - "_index" : "my-nlp-index", - "_id" : "1", - "_score" : 30.0029, - "_source" : { - "passage_text" : "Hello world", - "passage_embedding" : { - "!" 
: 0.8708904, - "door" : 0.8587369, - "hi" : 2.3929274, - "worlds" : 2.7839446, - "yes" : 0.75845814, - "##world" : 2.5432441, - "born" : 0.2682308, - "nothing" : 0.8625516, - "goodbye" : 0.17146169, - "greeting" : 0.96817183, - "birth" : 1.2788506, - "come" : 0.1623208, - "global" : 0.4371151, - "it" : 0.42951578, - "life" : 1.5750692, - "thanks" : 0.26481047, - "world" : 4.7300377, - "tiny" : 0.5462298, - "earth" : 2.6555297, - "universe" : 2.0308156, - "worldwide" : 1.3903781, - "hello" : 6.696973, - "so" : 0.20279501, - "?" : 0.67785245 - }, - "id" : "s1" - } - }, - { - "_index" : "my-nlp-index", - "_id" : "2", - "_score" : 16.480486, - "_source" : { - "passage_text" : "Hi planet", - "passage_embedding" : { - "hi" : 4.338913, - "planets" : 2.7755864, - "planet" : 5.0969057, - "mars" : 1.7405145, - "earth" : 2.6087382, - "hello" : 3.3210192 - }, - "id" : "s2" - } - } - ] - } -} -``` - -You can also use the `neural_sparse` query with sparse vector embeddings: -```json -GET my-nlp-index/_search -{ - "query": { - "neural_sparse": { - "passage_embedding": { - "query_tokens": { - "hi" : 4.338913, - "planets" : 2.7755864, - "planet" : 5.0969057, - "mars" : 1.7405145, - "earth" : 2.6087382, - "hello" : 3.3210192 - } - } - } - } -} -``` -## Step 5: Create and enable the two-phase processor (Optional) - - -The `neural_sparse_two_phase_processor` is a new feature introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries. - -To quickly launch a search pipeline with neural sparse search, use the following example pipeline: +To create a search pipeline with a two-phase processor for neural sparse search, use the following request: ```json PUT /_search/pipeline/two_phase_search_pipeline @@ -277,7 +36,7 @@ PUT /_search/pipeline/two_phase_search_pipeline { "neural_sparse_two_phase_processor": { "tag": "neural-sparse", - "description": "This processor is making two-phase processor." + "description": "Creates a two-phase processor for neural sparse search." } } ] @@ -286,166 +45,18 @@ PUT /_search/pipeline/two_phase_search_pipeline {% include copy-curl.html %} Then choose the index you want to configure with the search pipeline and set the `index.search.default_pipeline` to the pipeline name, as shown in the following example: -```json -PUT /index-name/_settings -{ - "index.search.default_pipeline" : "two_phase_search_pipeline" -} -``` -{% include copy-curl.html %} - - - -## Setting a default model on an index or field - -A [`neural_sparse`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural-sparse/) query requires a model ID for generating sparse embeddings. To eliminate passing the model ID with each neural_sparse query request, you can set a default model on index-level or field-level. - -First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model for an index, provide the model ID in the `default_model_id` parameter. To set a default model for a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. 
If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence: - -```json -PUT /_search/pipeline/default_model_pipeline -{ - "request_processors": [ - { - "neural_query_enricher" : { - "default_model_id": "bQ1J8ooBpBj3wT4HVUsb", - "neural_field_default_id": { - "my_field_1": "uZj0qYoBMtvQlfhaYeud", - "my_field_2": "upj0qYoBMtvQlfhaZOuM" - } - } - } - ] -} -``` -{% include copy-curl.html %} - -Then set the default model for your index: - -```json -PUT /my-nlp-index/_settings -{ - "index.search.default_pipeline" : "default_model_pipeline" -} -``` -{% include copy-curl.html %} - -You can now omit the model ID when searching: ```json -GET /my-nlp-index/_search +PUT /my-nlp-index/_settings { - "query": { - "neural_sparse": { - "passage_embedding": { - "query_text": "Hi world" - } - } - } + "index.search.default_pipeline" : "two_phase_search_pipeline" } ``` {% include copy-curl.html %} -The response contains both documents: - -```json -{ - "took" : 688, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 2, - "relation" : "eq" - }, - "max_score" : 30.0029, - "hits" : [ - { - "_index" : "my-nlp-index", - "_id" : "1", - "_score" : 30.0029, - "_source" : { - "passage_text" : "Hello world", - "passage_embedding" : { - "!" : 0.8708904, - "door" : 0.8587369, - "hi" : 2.3929274, - "worlds" : 2.7839446, - "yes" : 0.75845814, - "##world" : 2.5432441, - "born" : 0.2682308, - "nothing" : 0.8625516, - "goodbye" : 0.17146169, - "greeting" : 0.96817183, - "birth" : 1.2788506, - "come" : 0.1623208, - "global" : 0.4371151, - "it" : 0.42951578, - "life" : 1.5750692, - "thanks" : 0.26481047, - "world" : 4.7300377, - "tiny" : 0.5462298, - "earth" : 2.6555297, - "universe" : 2.0308156, - "worldwide" : 1.3903781, - "hello" : 6.696973, - "so" : 0.20279501, - "?" : 0.67785245 - }, - "id" : "s1" - } - }, - { - "_index" : "my-nlp-index", - "_id" : "2", - "_score" : 16.480486, - "_source" : { - "passage_text" : "Hi planet", - "passage_embedding" : { - "hi" : 4.338913, - "planets" : 2.7755864, - "planet" : 5.0969057, - "mars" : 1.7405145, - "earth" : 2.6087382, - "hello" : 3.3210192 - }, - "id" : "s2" - } - } - ] - } -} -``` - -## Next steps - -- To learn more about splitting long text into passages for neural search, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). - -## FAQ - -Refer to the following frequently asked questions for more information about neural sparse search. - -### How do I mitigate remote connector throttling exceptions? - -When using connectors to call a remote service like SageMaker, ingestion and search calls sometimes fail due to remote connector throttling exceptions. - -To mitigate throttling exceptions, modify the connector's [`client_config`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/#configuration-parameters) parameter to decrease the number of maximum connections, using the `max_connection` setting to prevent the maximum number of concurrent connections from exceeding the threshold of the remote service. You can also modify the retry settings to flatten the request spike during ingestion. 
- -For versions earlier than OpenSearch 2.15, the SageMaker throttling exception will be thrown as the following "error": - -``` - { - "type": "status_exception", - "reason": "Error from remote service: {\"message\":null}" - } -``` - +For information about `two_phase_search_pipeline`, see [Neural sparse query two-phase processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/). -## Next steps +## Further reading -- To learn more about splitting long text into passages for neural search, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). +- Learn more about how sparse encoding models work and explore OpenSearch neural sparse search benchmarks in [Improving document retrieval with sparse semantic encoders](https://opensearch.org/blog/improving-document-retrieval-with-sparse-semantic-encoders/). +- Learn the fundamentals of neural sparse search and its efficiency in [A deep dive into faster semantic sparse retrieval in OpenSearch 2.12](https://opensearch.org/blog/A-deep-dive-into-faster-semantic-sparse-retrieval-in-OS-2.12/). diff --git a/_search-plugins/neural-sparse-with-pipelines.md b/_search-plugins/neural-sparse-with-pipelines.md new file mode 100644 index 0000000000..fea2f0d795 --- /dev/null +++ b/_search-plugins/neural-sparse-with-pipelines.md @@ -0,0 +1,486 @@ +--- +layout: default +title: Configuring ingest pipelines +parent: Neural sparse search +nav_order: 10 +has_children: false +--- + +# Configuring ingest pipelines for neural sparse search + +Generating sparse vector embeddings within OpenSearch enables neural sparse search to function like lexical search. To take advantage of this encapsulation, set up an ingest pipeline to create and store sparse vector embeddings from document text during ingestion. At query time, input plain text, which will be automatically converted into vector embeddings for search. + +For this tutorial, you'll use neural sparse search with OpenSearch's built-in machine learning (ML) model hosting and ingest pipelines. Because the transformation of text to embeddings is performed within OpenSearch, you'll use text when ingesting and searching documents. + +At ingestion time, neural sparse search uses a sparse encoding model to generate sparse vector embeddings from text fields. + +At query time, neural sparse search operates in one of two search modes: + +- **Bi-encoder mode** (requires a sparse encoding model): A sparse encoding model generates sparse vector embeddings from query text. This approach provides better search relevance at the cost of a slight increase in latency. + +- **Doc-only mode** (requires a sparse encoding model and a tokenizer): A sparse encoding model generates sparse vector embeddings from query text. In this mode, neural sparse search tokenizes query text using a tokenizer and obtains the token weights from a lookup table. This approach provides faster retrieval at the cost of a slight decrease in search relevance. The tokenizer is deployed and invoked using the [Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/index/) for a uniform neural sparse search experience. + +For more information about choosing the neural sparse search mode that best suits your workload, see [Choose the search mode](#step-1a-choose-the-search-mode). + +## Tutorial + +This tutorial consists of the following steps: + +1. [**Configure a sparse encoding model/tokenizer**](#step-1-configure-a-sparse-encoding-modeltokenizer). + 1. 
[Choose the search mode](#step-1a-choose-the-search-mode) + 1. [Register the model/tokenizer](#step-1b-register-the-modeltokenizer) + 1. [Deploy the model/tokenizer](#step-1c-deploy-the-modeltokenizer) +1. [**Ingest data**](#step-2-ingest-data) + 1. [Create an ingest pipeline](#step-2a-create-an-ingest-pipeline) + 1. [Create an index for ingestion](#step-2b-create-an-index-for-ingestion) + 1. [Ingest documents into the index](#step-2c-ingest-documents-into-the-index) +1. [**Search the data**](#step-3-search-the-data) + +### Prerequisites + +Before you start, complete the [prerequisites]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/#prerequisites) for neural search. + +## Step 1: Configure a sparse encoding model/tokenizer + +Both the bi-encoder and doc-only search modes require you to configure a sparse encoding model. Doc-only mode requires you to configure a tokenizer in addition to the model. + +### Step 1(a): Choose the search mode + +Choose the search mode and the appropriate model/tokenizer combination: + +- **Bi-encoder**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model during both ingestion and search. + +- **Doc-only**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model during ingestion and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search. + +The following table provides a search relevance comparison for the two search modes so that you can choose the best mode for your use case. + +| Mode | Ingestion model | Search model | Avg search relevance on BEIR | Model parameters | +|-----------|---------------------------------------------------------------|---------------------------------------------------------------|------------------------------|------------------| +| Doc-only | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 0.49 | 133M | +| Bi-encoder| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 0.524 | 133M | + +### Step 1(b): Register the model/tokenizer + +When you register a model/tokenizer, OpenSearch creates a model group for the model/tokenizer. You can also explicitly create a model group before registering models. For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/). + +#### Bi-encoder mode + +When using bi-encoder mode, you only need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model. + +Register the sparse encoding model: + +```json +POST /_plugins/_ml/models/_register?deploy=true +{ + "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1", + "version": "1.0.1", + "model_format": "TORCH_SCRIPT" +} +``` +{% include copy-curl.html %} + +Registering a model is an asynchronous task. 
OpenSearch returns a task ID for every model you register:
+
+```json
+{
+  "task_id": "aFeif4oB5Vm0Tdw8yoN7",
+  "status": "CREATED"
+}
+```
+
+You can check the status of the task by calling the Tasks API:
+
+```json
+GET /_plugins/_ml/tasks/aFeif4oB5Vm0Tdw8yoN7
+```
+{% include copy-curl.html %}
+
+Once the task is complete, the task state will change to `COMPLETED` and the Tasks API response will contain the model ID of the registered model:
+
+```json
+{
+  "model_id": "<bi-encoder model ID>",
+  "task_type": "REGISTER_MODEL",
+  "function_name": "SPARSE_ENCODING",
+  "state": "COMPLETED",
+  "worker_node": [
+    "4p6FVOmJRtu3wehDD74hzQ"
+  ],
+  "create_time": 1694358489722,
+  "last_update_time": 1694358499139,
+  "is_async": true
+}
+```
+
+Note the `model_id` of the model you've created; you'll need it for the following steps.
+
+#### Doc-only mode
+
+When using doc-only mode, you need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model, which you'll use at ingestion time, and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer, which you'll use at search time.
+
+Register the sparse encoding model:
+
+```json
+POST /_plugins/_ml/models/_register?deploy=true
+{
+  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
+  "version": "1.0.1",
+  "model_format": "TORCH_SCRIPT"
+}
+```
+{% include copy-curl.html %}
+
+Register the tokenizer:
+
+```json
+POST /_plugins/_ml/models/_register?deploy=true
+{
+  "name": "amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1",
+  "version": "1.0.1",
+  "model_format": "TORCH_SCRIPT"
+}
+```
+{% include copy-curl.html %}
+
+As in bi-encoder mode, use the Tasks API to check the status of the registration tasks. Once the Tasks API returns the task state as `COMPLETED`, note the `model_id` of the model and the tokenizer you've created; you'll need them for the following steps.
+
+### Step 1(c): Deploy the model/tokenizer
+
+Next, you'll need to deploy the model/tokenizer you registered. Deploying a model creates a model instance and caches the model in memory.
+
+#### Bi-encoder mode
+
+To deploy the model, provide its model ID to the `_deploy` endpoint:
+
+```json
+POST /_plugins/_ml/models/<bi-encoder model ID>/_deploy
+```
+{% include copy-curl.html %}
+
+As with the register operation, the deploy operation is asynchronous, so you'll get a task ID in the response:
+
+```json
+{
+  "task_id": "ale6f4oB5Vm0Tdw8NINO",
+  "status": "CREATED"
+}
+```
+
+You can check the status of the task by using the Tasks API:
+
+```json
+GET /_plugins/_ml/tasks/ale6f4oB5Vm0Tdw8NINO
+```
+{% include copy-curl.html %}
+
+Once the task is complete, the task state will change to `COMPLETED`:
+
+```json
+{
+  "model_id": "<bi-encoder model ID>",
+  "task_type": "DEPLOY_MODEL",
+  "function_name": "SPARSE_ENCODING",
+  "state": "COMPLETED",
+  "worker_node": [
+    "4p6FVOmJRtu3wehDD74hzQ"
+  ],
+  "create_time": 1694360024141,
+  "last_update_time": 1694360027940,
+  "is_async": true
+}
+```
+
+#### Doc-only mode
+
+To deploy the model, provide its model ID to the `_deploy` endpoint:
+
+```json
+POST /_plugins/_ml/models/<doc-only model ID>/_deploy
+```
+{% include copy-curl.html %}
+
+You can deploy the tokenizer in the same way:
+
+```json
+POST /_plugins/_ml/models/<tokenizer ID>/_deploy
+```
+{% include copy-curl.html %}
+
+As with bi-encoder mode, you can check the status of both deploy tasks by using the Tasks API. Once the tasks are complete, the task state will change to `COMPLETED`.
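+
+Before moving on, you can optionally confirm that the model (and, in doc-only mode, the tokenizer) finished deploying by calling the Get Model API with the corresponding ID. This is a quick sanity check rather than part of the required tutorial flow; the exact response fields may vary by OpenSearch version, but a successfully deployed model is expected to report a `model_state` of `DEPLOYED`:
+
+```json
+GET /_plugins/_ml/models/<bi-encoder or doc-only model ID>
+```
+{% include copy-curl.html %}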
+
+## Step 2: Ingest data
+
+In both the bi-encoder and doc-only modes, you'll use a sparse encoding model at ingestion time to generate sparse vector embeddings.
+
+### Step 2(a): Create an ingest pipeline
+
+To generate sparse vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`sparse_encoding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/sparse-encoding/), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings.
+
+The following example request creates an ingest pipeline where the text from `passage_text` will be converted into sparse vector embeddings, which will be stored in `passage_embedding`. Provide the model ID of the registered model in the request:
+
+```json
+PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse
+{
+  "description": "A sparse encoding ingest pipeline",
+  "processors": [
+    {
+      "sparse_encoding": {
+        "model_id": "<bi-encoder or doc-only model ID>",
+        "field_map": {
+          "passage_text": "passage_embedding"
+        }
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+To split long text into passages, use the `text_chunking` ingest processor before the `sparse_encoding` processor. For more information, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).
+
+### Step 2(b): Create an index for ingestion
+
+In order to use the sparse encoding processor defined in your pipeline, create a rank features index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as [`rank_features`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/#rank-features). Similarly, the `passage_text` field must be mapped as `text`.
+
+The following example request creates a rank features index configured with a default ingest pipeline:
+
+```json
+PUT /my-nlp-index
+{
+  "settings": {
+    "default_pipeline": "nlp-ingest-pipeline-sparse"
+  },
+  "mappings": {
+    "properties": {
+      "id": {
+        "type": "text"
+      },
+      "passage_embedding": {
+        "type": "rank_features"
+      },
+      "passage_text": {
+        "type": "text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+To save disk space, you can exclude the embedding vector from the source as follows:
+
+```json
+PUT /my-nlp-index
+{
+  "settings": {
+    "default_pipeline": "nlp-ingest-pipeline-sparse"
+  },
+  "mappings": {
+    "_source": {
+      "excludes": [
+        "passage_embedding"
+      ]
+    },
+    "properties": {
+      "id": {
+        "type": "text"
+      },
+      "passage_embedding": {
+        "type": "rank_features"
+      },
+      "passage_text": {
+        "type": "text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Once the `<token, weight>` pairs are excluded from the source, they cannot be recovered. Before applying this optimization, make sure you don't need the `<token, weight>` pairs for your application.
+{: .important} + +### Step 2(c): Ingest documents into the index + +To ingest documents into the index created in the previous step, send the following requests: + +```json +PUT /my-nlp-index/_doc/1 +{ + "passage_text": "Hello world", + "id": "s1" +} +``` +{% include copy-curl.html %} + +```json +PUT /my-nlp-index/_doc/2 +{ + "passage_text": "Hi planet", + "id": "s2" +} +``` +{% include copy-curl.html %} + +Before the document is ingested into the index, the ingest pipeline runs the `sparse_encoding` processor on the document, generating vector embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings. + +## Step 3: Search the data + +To perform a neural sparse search on your index, use the `neural_sparse` query clause in [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. + +The following example request uses a `neural_sparse` query to search for relevant documents using a raw text query. Provide the model ID for bi-encoder mode or the tokenizer ID for doc-only mode: + +```json +GET my-nlp-index/_search +{ + "query": { + "neural_sparse": { + "passage_embedding": { + "query_text": "Hi world", + "model_id": "" + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the matching documents: + +```json +{ + "took" : 688, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 2, + "relation" : "eq" + }, + "max_score" : 30.0029, + "hits" : [ + { + "_index" : "my-nlp-index", + "_id" : "1", + "_score" : 30.0029, + "_source" : { + "passage_text" : "Hello world", + "passage_embedding" : { + "!" : 0.8708904, + "door" : 0.8587369, + "hi" : 2.3929274, + "worlds" : 2.7839446, + "yes" : 0.75845814, + "##world" : 2.5432441, + "born" : 0.2682308, + "nothing" : 0.8625516, + "goodbye" : 0.17146169, + "greeting" : 0.96817183, + "birth" : 1.2788506, + "come" : 0.1623208, + "global" : 0.4371151, + "it" : 0.42951578, + "life" : 1.5750692, + "thanks" : 0.26481047, + "world" : 4.7300377, + "tiny" : 0.5462298, + "earth" : 2.6555297, + "universe" : 2.0308156, + "worldwide" : 1.3903781, + "hello" : 6.696973, + "so" : 0.20279501, + "?" : 0.67785245 + }, + "id" : "s1" + } + }, + { + "_index" : "my-nlp-index", + "_id" : "2", + "_score" : 16.480486, + "_source" : { + "passage_text" : "Hi planet", + "passage_embedding" : { + "hi" : 4.338913, + "planets" : 2.7755864, + "planet" : 5.0969057, + "mars" : 1.7405145, + "earth" : 2.6087382, + "hello" : 3.3210192 + }, + "id" : "s2" + } + } + ] + } +} +``` + +## Accelerating neural sparse search + +To learn more about improving retrieval time for neural sparse search, see [Accelerating neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#accelerating-neural-sparse-search). + +## Creating a search pipeline for neural sparse search + +You can create a search pipeline that augments neural sparse search functionality by: + +- Accelerating neural sparse search for faster retrieval. +- Setting the default model ID on an index for easier use. + +To configure the pipeline, add a [`neural_sparse_two_phase_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/) or a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) processor. 
The following request creates a pipeline with both processors: + +```json +PUT /_search/pipeline/neural_search_pipeline +{ + "request_processors": [ + { + "neural_sparse_two_phase_processor": { + "tag": "neural-sparse", + "description": "Creates a two-phase processor for neural sparse search." + } + }, + { + "neural_query_enricher" : { + "default_model_id": "" + } + } + ] +} +``` +{% include copy-curl.html %} + +Then set the default pipeline for your index to the newly created search pipeline: + +```json +PUT /my-nlp-index/_settings +{ + "index.search.default_pipeline" : "neural_search_pipeline" +} +``` +{% include copy-curl.html %} + +For more information about setting a default model on an index, or to learn how to set a default model on a specific field, see [Setting a default model on an index or field]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/#setting-a-default-model-on-an-index-or-field). + +## Troubleshooting + +This section contains information about resolving common issues encountered while running neural sparse search. + +### Remote connector throttling exceptions + +When using connectors to call a remote service such as Amazon SageMaker, ingestion and search calls sometimes fail because of remote connector throttling exceptions. + +For OpenSearch versions earlier than 2.15, a throttling exception will be returned as an error from the remote service: + +```json +{ + "type": "status_exception", + "reason": "Error from remote service: {\"message\":null}" +} +``` + +To mitigate throttling exceptions, decrease the maximum number of connections specified in the `max_connection` setting in the connector's [`client_config`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/#configuration-parameters) object. Doing so will prevent the maximum number of concurrent connections from exceeding the threshold of the remote service. You can also modify the retry settings to avoid a request spike during ingestion. \ No newline at end of file diff --git a/_search-plugins/neural-sparse-with-raw-vectors.md b/_search-plugins/neural-sparse-with-raw-vectors.md new file mode 100644 index 0000000000..d69a789a1d --- /dev/null +++ b/_search-plugins/neural-sparse-with-raw-vectors.md @@ -0,0 +1,99 @@ +--- +layout: default +title: Using raw vectors +parent: Neural sparse search +nav_order: 20 +has_children: false +--- + +# Using raw vectors for neural sparse search + +If you're using self-hosted sparse embedding models, you can ingest raw sparse vectors and use neural sparse search. + +## Tutorial + +This tutorial consists of the following steps: + +1. [**Ingest sparse vectors**](#step-1-ingest-sparse-vectors) + 1. [Create an index](#step-1a-create-an-index) + 1. [Ingest documents into the index](#step-1b-ingest-documents-into-the-index) +1. [**Search the data using raw sparse vector**](#step-2-search-the-data-using-a-sparse-vector). + + +## Step 1: Ingest sparse vectors + +Once you have generated sparse vector embeddings, you can directly ingest them into OpenSearch. 
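+
+Throughout this tutorial, a raw sparse vector is represented as a JSON object that maps each token to its weight. For illustration only (the tokens and weights below are made-up sample values, not the output of any particular model), a document embedding might look like the following:
+
+```json
+{
+  "hello" : 3.3210192,
+  "world" : 2.7755864
+}
+```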
+ +### Step 1(a): Create an index + +In order to ingest documents containing raw sparse vectors, create a rank features index: + +```json +PUT /my-nlp-index +{ + "mappings": { + "properties": { + "id": { + "type": "text" + }, + "passage_embedding": { + "type": "rank_features" + }, + "passage_text": { + "type": "text" + } + } + } +} +``` +{% include copy-curl.html %} + +### Step 1(b): Ingest documents into the index + +To ingest documents into the index created in the previous step, send the following request: + +```json +PUT /my-nlp-index/_doc/1 +{ + "passage_text": "Hello world", + "id": "s1", + "passage_embedding": { + "hi" : 4.338913, + "planets" : 2.7755864, + "planet" : 5.0969057, + "mars" : 1.7405145, + "earth" : 2.6087382, + "hello" : 3.3210192 + } +} +``` +{% include copy-curl.html %} + +## Step 2: Search the data using a sparse vector + +To search the documents using a sparse vector, provide the sparse embeddings in the `neural_sparse` query: + +```json +GET my-nlp-index/_search +{ + "query": { + "neural_sparse": { + "passage_embedding": { + "query_tokens": { + "hi" : 4.338913, + "planets" : 2.7755864, + "planet" : 5.0969057, + "mars" : 1.7405145, + "earth" : 2.6087382, + "hello" : 3.3210192 + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Accelerating neural sparse search + +To learn more about improving retrieval time for neural sparse search, see [Accelerating neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#accelerating-neural-sparse-search).