From bc34f81dd3fda57644cd5c77fce1c005528ca4c5 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Mon, 28 Aug 2023 15:57:08 +0000
Subject: [PATCH] Refactor search pipeline documentation (#4908)

* Refactor search pipeline documentation

Signed-off-by: Fanit Kolchina

* Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
(cherry picked from commit 87ed0af8515e74f165f241fc2d5155a0ff87fdce)
Signed-off-by: github-actions[bot]
---
 .../filter-query-processor.md                 |   4 +-
 _search-plugins/search-pipelines/index.md     | 382 +-----------------
 .../personalize-search-ranking.md             |   4 +-
 .../rename-field-processor.md                 |   4 +-
 .../search-pipelines/script-processor.md      |   4 +-
 .../search-pipeline-metrics.md                | 168 ++++++++
 .../search-pipelines/search-processors.md     | 101 +++++
 .../search-pipelines/using-search-pipeline.md | 160 ++++++++
 8 files changed, 444 insertions(+), 383 deletions(-)
 create mode 100644 _search-plugins/search-pipelines/search-pipeline-metrics.md
 create mode 100644 _search-plugins/search-pipelines/search-processors.md
 create mode 100644 _search-plugins/search-pipelines/using-search-pipeline.md

diff --git a/_search-plugins/search-pipelines/filter-query-processor.md b/_search-plugins/search-pipelines/filter-query-processor.md
index b18faea3c5..b358bbb542 100644
--- a/_search-plugins/search-pipelines/filter-query-processor.md
+++ b/_search-plugins/search-pipelines/filter-query-processor.md
@@ -3,8 +3,8 @@ layout: default
 title: Filter query processor
 nav_order: 10
 has_children: false
-parent: Search pipelines
-grand_parent: Search
+parent: Search processors
+grand_parent: Search pipelines
 ---
 
 # Filter query processor
diff --git a/_search-plugins/search-pipelines/index.md b/_search-plugins/search-pipelines/index.md
index 3edae52fa4..26f663704a 100644
--- a/_search-plugins/search-pipelines/index.md
+++ b/_search-plugins/search-pipelines/index.md
@@ -24,87 +24,9 @@ The following is a list of search pipeline terminology:
 Both request and response processing for the pipeline are performed on the coordinator node, so there is no shard-level processing.
 {: .note}
 
-## Search request processors
+## Processors
 
-OpenSearch supports the following search request processors:
-
-- [`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/): Adds a script that is run on newly indexed documents.
-- [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/): Adds a filtering query that is used to filter requests.
-
-## Search response processors
-
-OpenSearch supports the following search response processors:
-
-- [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/): Renames an existing field.
-- [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/): Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service).
-
-## Viewing available processor types
-
-You can use the Nodes Search Pipelines API to view the available processor types:
-
-```json
-GET /_nodes/search_pipelines
-```
-{% include copy-curl.html %}
-
-The response contains the `search_pipelines` object that lists the available request and response processors:
-
-
-
-    Response
-
-  {: .text-delta}
-
-```json
-{
-  "_nodes" : {
-    "total" : 1,
-    "successful" : 1,
-    "failed" : 0
-  },
-  "cluster_name" : "runTask",
-  "nodes" : {
-    "36FHvCwHT6Srbm2ZniEPhA" : {
-      "name" : "runTask-0",
-      "transport_address" : "127.0.0.1:9300",
-      "host" : "127.0.0.1",
-      "ip" : "127.0.0.1",
-      "version" : "3.0.0",
-      "build_type" : "tar",
-      "build_hash" : "unknown",
-      "roles" : [
-        "cluster_manager",
-        "data",
-        "ingest",
-        "remote_cluster_client"
-      ],
-      "attributes" : {
-        "testattr" : "test",
-        "shard_indexing_pressure_enabled" : "true"
-      },
-      "search_pipelines" : {
-        "request_processors" : [
-          {
-            "type" : "filter_query"
-          },
-          {
-            "type" : "script"
-          }
-        ],
-        "response_processors" : [
-          {
-            "type" : "rename_field"
-          }
-        ]
-      }
-    }
-  }
-}
-```
-
-
-In addition to the processors provided by OpenSearch, additional processors may be provided by plugins.
-{: .note}
+To learn more about available search processors, see [Search processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/).
 
 ## Creating a search pipeline
 
@@ -161,46 +83,16 @@ By default, a search pipeline stops if one of its processors fails. If you want
 
 If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics](#search-pipeline-metrics).
 
-## Using a temporary search pipeline for a request
+## Using search pipelines
 
-As an alternative to creating a search pipeline, you can define a temporary search pipeline to be used for only the current query:
+To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:
 
 ```json
-POST /my-index/_search
-{
-  "query" : {
-    "match" : {
-      "text_field" : "some search text"
-    }
-  },
-  "pipeline" : {
-    "request_processors": [
-      {
-        "filter_query" : {
-          "tag" : "tag1",
-          "description" : "This processor is going to restrict to publicly visible documents",
-          "query" : {
-            "term": {
-              "visibility": "public"
-            }
-          }
-        }
-      }
-    ],
-    "response_processors": [
-      {
-        "rename_field": {
-          "field": "message",
-          "target_field": "notification"
-        }
-      }
-    ]
-  }
-}
+GET /my_index/_search?search_pipeline=my_pipeline
 ```
 {% include copy-curl.html %}
 
-With this syntax, the pipeline does not persist and is used only for the query for which it is specified.
+Alternatively, you can use a temporary pipeline with a request or set a default pipeline for an index. To learn more, see [Using a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/).
 
 ## Retrieving search pipelines
 
@@ -255,110 +147,6 @@ GET /_search/pipeline/my*
 ```
 {% include copy-curl.html %}
 
-
-## Using a search pipeline
-
-To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:
-
-```json
-GET /my_index/_search?search_pipeline=my_pipeline
-```
-{% include copy-curl.html %}
-
-For a complete example of using a search pipeline with a `filter_query` processor, see [`filter_query` processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor#example).
-
-## Default search pipeline
-
-For convenience, you can set a default search pipeline for an index. Once your index has a default pipeline, you don't need to specify the `search_pipeline` query parameter in every search request.
-
-### Setting a default search pipeline for an index
-
-To set a default search pipeline for an index, specify the `index.search.default_pipeline` in the index's settings:
-
-```json
-PUT /my_index/_settings
-{
-  "index.search.default_pipeline" : "my_pipeline"
-}
-```
-{% include copy-curl.html %}
-
-After setting the default pipeline for `my_index`, you can try the same search for all documents:
-
-```json
-GET /my_index/_search
-```
-{% include copy-curl.html %}
-
-The response contains only the public document, indicating that the pipeline was applied by default:
-
-
-
-    Response
-
-  {: .text-delta}
-
-```json
-{
-  "took" : 19,
-  "timed_out" : false,
-  "_shards" : {
-    "total" : 1,
-    "successful" : 1,
-    "skipped" : 0,
-    "failed" : 0
-  },
-  "hits" : {
-    "total" : {
-      "value" : 1,
-      "relation" : "eq"
-    },
-    "max_score" : 0.0,
-    "hits" : [
-      {
-        "_index" : "my_index",
-        "_id" : "1",
-        "_score" : 0.0,
-        "_source" : {
-          "message" : "This is a public message",
-          "visibility" : "public"
-        }
-      }
-    ]
-  }
-}
-```
-
-
-### Disabling the default pipeline for a request
-
-If you want to run a search request without applying the default pipeline, you can set the `search_pipeline` query parameter to `_none`:
-
-```json
-GET /my_index/_search?search_pipeline=_none
-```
-{% include copy-curl.html %}
-
-### Removing the default pipeline
-
-To remove the default pipeline from an index, set it to `null` or `_none`:
-
-```json
-PUT /my_index/_settings
-{
-  "index.search.default_pipeline" : null
-}
-```
-{% include copy-curl.html %}
-
-```json
-PUT /my_index/_settings
-{
-  "index.search.default_pipeline" : "_none"
-}
-```
-{% include copy-curl.html %}
-
 ## Updating a search pipeline
 
 To update a search pipeline dynamically, replace the search pipeline using the Search Pipeline API.
@@ -454,160 +242,4 @@ The response contains the pipeline version:
 
 ## Search pipeline metrics
 
-To view search pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/):
-
-```json
-GET /_nodes/stats/search_pipeline
-```
-{% include copy-curl.html %}
-
-The response contains statistics for all search pipelines:
-
-```json
-{
-  "_nodes" : {
-    "total" : 1,
-    "successful" : 1,
-    "failed" : 0
-  },
-  "cluster_name" : "runTask",
-  "nodes" : {
-    "CpvTK7KuRD6Oww8TTp8g2Q" : {
-      "timestamp" : 1689007282929,
-      "name" : "runTask-0",
-      "transport_address" : "127.0.0.1:9300",
-      "host" : "127.0.0.1",
-      "ip" : "127.0.0.1:9300",
-      "roles" : [
-        "cluster_manager",
-        "data",
-        "ingest",
-        "remote_cluster_client"
-      ],
-      "attributes" : {
-        "testattr" : "test",
-        "shard_indexing_pressure_enabled" : "true"
-      },
-      "search_pipeline" : {
-        "total_request" : {
-          "count" : 5,
-          "time_in_millis" : 158,
-          "current" : 0,
-          "failed" : 0
-        },
-        "total_response" : {
-          "count" : 2,
-          "time_in_millis" : 1,
-          "current" : 0,
-          "failed" : 0
-        },
-        "pipelines" : {
-          "public_info" : {
-            "request" : {
-              "count" : 3,
-              "time_in_millis" : 71,
-              "current" : 0,
-              "failed" : 0
-            },
-            "response" : {
-              "count" : 0,
-              "time_in_millis" : 0,
-              "current" : 0,
-              "failed" : 0
-            },
-            "request_processors" : [
-              {
-                "filter_query:abc" : {
-                  "type" : "filter_query",
-                  "stats" : {
-                    "count" : 1,
-                    "time_in_millis" : 0,
-                    "current" : 0,
-                    "failed" : 0
-                  }
-                }
-              },
-              {
-                "filter_query" : {
-                  "type" : "filter_query",
-                  "stats" : {
-                    "count" : 4,
-                    "time_in_millis" : 2,
-                    "current" : 0,
-                    "failed" : 0
-                  }
-                }
-              }
-            ],
-            "response_processors" : [ ]
-          },
-          "guest_pipeline" : {
-            "request" : {
-              "count" : 2,
-              "time_in_millis" : 87,
-              "current" : 0,
-              "failed" : 0
-            },
-            "response" : {
-              "count" : 2,
-              "time_in_millis" : 1,
-              "current" : 0,
-              "failed" : 0
-            },
-            "request_processors" : [
-              {
-                "script" : {
-                  "type" : "script",
-                  "stats" : {
-                    "count" : 2,
-                    "time_in_millis" : 86,
-                    "current" : 0,
-                    "failed" : 0
-                  }
-                }
-              },
-              {
-                "filter_query:abc" : {
-                  "type" : "filter_query",
-                  "stats" : {
-                    "count" : 1,
-                    "time_in_millis" : 0,
-                    "current" : 0,
-                    "failed" : 0
-                  }
-                }
-              },
-              {
-                "filter_query" : {
-                  "type" : "filter_query",
-                  "stats" : {
-                    "count" : 3,
-                    "time_in_millis" : 0,
-                    "current" : 0,
-                    "failed" : 0
-                  }
-                }
-              }
-            ],
-            "response_processors" : [
-              {
-                "rename_field" : {
-                  "type" : "rename_field",
-                  "stats" : {
-                    "count" : 2,
-                    "time_in_millis" : 1,
-                    "current" : 0,
-                    "failed" : 0
-                  }
-                }
-              }
-            ]
-          }
-        }
-      }
-    }
-  }
-}
-```
-
-For descriptions of each field in the response, see the [Nodes Stats search pipeline section]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#search_pipeline).
\ No newline at end of file
+For information about retrieving search pipeline statistics, see [Search pipeline metrics]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-pipeline-metrics/).
\ No newline at end of file
diff --git a/_search-plugins/search-pipelines/personalize-search-ranking.md b/_search-plugins/search-pipelines/personalize-search-ranking.md
index f913475ed4..c008bb155b 100644
--- a/_search-plugins/search-pipelines/personalize-search-ranking.md
+++ b/_search-plugins/search-pipelines/personalize-search-ranking.md
@@ -3,8 +3,8 @@ layout: default
 title: Personalize search ranking processor
 nav_order: 40
 has_children: false
-parent: Search pipelines
-grand_parent: Search
+parent: Search processors
+grand_parent: Search pipelines
 ---
 
 # Personalize search ranking processor
diff --git a/_search-plugins/search-pipelines/rename-field-processor.md b/_search-plugins/search-pipelines/rename-field-processor.md
index 3ac3c541bc..47d445f093 100644
--- a/_search-plugins/search-pipelines/rename-field-processor.md
+++ b/_search-plugins/search-pipelines/rename-field-processor.md
@@ -3,8 +3,8 @@ layout: default
 title: Rename field processor
 nav_order: 20
 has_children: false
-parent: Search pipelines
-grand_parent: Search
+parent: Search processors
+grand_parent: Search pipelines
 ---
 
 # Rename field processor
diff --git a/_search-plugins/search-pipelines/script-processor.md b/_search-plugins/search-pipelines/script-processor.md
index f190053641..4c25dd490e 100644
--- a/_search-plugins/search-pipelines/script-processor.md
+++ b/_search-plugins/search-pipelines/script-processor.md
@@ -3,8 +3,8 @@ layout: default
 title: Script processor
 nav_order: 30
 has_children: false
-parent: Search pipelines
-grand_parent: Search
+parent: Search processors
+grand_parent: Search pipelines
 ---
 
 # Script processor
diff --git a/_search-plugins/search-pipelines/search-pipeline-metrics.md b/_search-plugins/search-pipelines/search-pipeline-metrics.md
new file mode 100644
index 0000000000..840db42238
--- /dev/null
+++ b/_search-plugins/search-pipelines/search-pipeline-metrics.md
@@ -0,0 +1,168 @@
+---
+layout: default
+title: Search pipeline metrics
+nav_order: 40
+has_children: false
+parent: Search pipelines
+grand_parent: Search
+---
+
+# Search pipeline metrics
+
+To view search pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/):
+
+```json
+GET /_nodes/stats/search_pipeline
+```
+{% include copy-curl.html %}
+
+The response contains statistics for all search pipelines:
+
+```json
+{
+  "_nodes" : {
+    "total" : 1,
+    "successful" : 1,
+    "failed" : 0
+  },
+  "cluster_name" : "runTask",
+  "nodes" : {
+    "CpvTK7KuRD6Oww8TTp8g2Q" : {
+      "timestamp" : 1689007282929,
+      "name" : "runTask-0",
+      "transport_address" : "127.0.0.1:9300",
+      "host" : "127.0.0.1",
+      "ip" : "127.0.0.1:9300",
+      "roles" : [
+        "cluster_manager",
+        "data",
+        "ingest",
+        "remote_cluster_client"
+      ],
+      "attributes" : {
+        "testattr" : "test",
+        "shard_indexing_pressure_enabled" : "true"
+      },
+      "search_pipeline" : {
+        "total_request" : {
+          "count" : 5,
+          "time_in_millis" : 158,
+          "current" : 0,
+          "failed" : 0
+        },
+        "total_response" : {
+          "count" : 2,
+          "time_in_millis" : 1,
+          "current" : 0,
+          "failed" : 0
+        },
+        "pipelines" : {
+          "public_info" : {
+            "request" : {
+              "count" : 3,
+              "time_in_millis" : 71,
+              "current" : 0,
+              "failed" : 0
+            },
+            "response" : {
+              "count" : 0,
+              "time_in_millis" : 0,
+              "current" : 0,
+              "failed" : 0
+            },
+            "request_processors" : [
+              {
+                "filter_query:abc" : {
+                  "type" : "filter_query",
+                  "stats" : {
+                    "count" : 1,
+                    "time_in_millis" : 0,
+                    "current" : 0,
+                    "failed" : 0
+                  }
+                }
+              },
+              {
+                "filter_query" : {
+                  "type" : "filter_query",
+                  "stats" : {
+                    "count" : 4,
+                    "time_in_millis" : 2,
+                    "current" : 0,
+                    "failed" : 0
+                  }
+                }
+              }
+            ],
+            "response_processors" : [ ]
+          },
+          "guest_pipeline" : {
+            "request" : {
+              "count" : 2,
+              "time_in_millis" : 87,
+              "current" : 0,
+              "failed" : 0
+            },
+            "response" : {
+              "count" : 2,
+              "time_in_millis" : 1,
+              "current" : 0,
+              "failed" : 0
+            },
+            "request_processors" : [
+              {
+                "script" : {
+                  "type" : "script",
+                  "stats" : {
+                    "count" : 2,
+                    "time_in_millis" : 86,
+                    "current" : 0,
+                    "failed" : 0
+                  }
+                }
+              },
+              {
+                "filter_query:abc" : {
+                  "type" : "filter_query",
+                  "stats" : {
+                    "count" : 1,
+                    "time_in_millis" : 0,
+                    "current" : 0,
+                    "failed" : 0
+                  }
+                }
+              },
+              {
+                "filter_query" : {
+                  "type" : "filter_query",
+                  "stats" : {
+                    "count" : 3,
+                    "time_in_millis" : 0,
+                    "current" : 0,
+                    "failed" : 0
+                  }
+                }
+              }
+            ],
+            "response_processors" : [
+              {
+                "rename_field" : {
+                  "type" : "rename_field",
+                  "stats" : {
+                    "count" : 2,
+                    "time_in_millis" : 1,
+                    "current" : 0,
+                    "failed" : 0
+                  }
+                }
+              }
+            ]
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+For descriptions of each field in the response, see the [Nodes Stats search pipeline section]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#search_pipeline).
\ No newline at end of file
diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md
new file mode 100644
index 0000000000..d35f243bbf
--- /dev/null
+++ b/_search-plugins/search-pipelines/search-processors.md
@@ -0,0 +1,101 @@
+---
+layout: default
+title: Search processors
+nav_order: 50
+has_children: true
+parent: Search pipelines
+grand_parent: Search
+---
+
+# Search processors
+
+Search processors can be of the following types:
+
+- [Search request processors](#search-request-processors)
+- [Search response processors](#search-response-processors)
+
+## Search request processors
+
+The following table lists all supported search request processors.
+
+Processor | Description | Earliest available version
+:--- | :--- | :---
+[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8
+[`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8
+
+## Search response processors
+
+The following table lists all supported search response processors.
+
+Processor | Description | Earliest available version
+:--- | :--- | :---
+[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
+[`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
+
+## Viewing available processor types
+
+You can use the Nodes Search Pipelines API to view the available processor types:
+
+```json
+GET /_nodes/search_pipelines
+```
+{% include copy-curl.html %}
+
+The response contains the `search_pipelines` object that lists the available request and response processors:
+
+
+
+    Response
+
+  {: .text-delta}
+
+```json
+{
+  "_nodes" : {
+    "total" : 1,
+    "successful" : 1,
+    "failed" : 0
+  },
+  "cluster_name" : "runTask",
+  "nodes" : {
+    "36FHvCwHT6Srbm2ZniEPhA" : {
+      "name" : "runTask-0",
+      "transport_address" : "127.0.0.1:9300",
+      "host" : "127.0.0.1",
+      "ip" : "127.0.0.1",
+      "version" : "3.0.0",
+      "build_type" : "tar",
+      "build_hash" : "unknown",
+      "roles" : [
+        "cluster_manager",
+        "data",
+        "ingest",
+        "remote_cluster_client"
+      ],
+      "attributes" : {
+        "testattr" : "test",
+        "shard_indexing_pressure_enabled" : "true"
+      },
+      "search_pipelines" : {
+        "request_processors" : [
+          {
+            "type" : "filter_query"
+          },
+          {
+            "type" : "script"
+          }
+        ],
+        "response_processors" : [
+          {
+            "type" : "rename_field"
+          }
+        ]
+      }
+    }
+  }
+}
+```
+
+
+In addition to the processors provided by OpenSearch, additional processors may be provided by plugins.
+{: .note}
diff --git a/_search-plugins/search-pipelines/using-search-pipeline.md b/_search-plugins/search-pipelines/using-search-pipeline.md
new file mode 100644
index 0000000000..e01d1dad51
--- /dev/null
+++ b/_search-plugins/search-pipelines/using-search-pipeline.md
@@ -0,0 +1,160 @@
+---
+layout: default
+title: Using a search pipeline
+nav_order: 20
+has_children: false
+parent: Search pipelines
+grand_parent: Search
+---
+
+# Using a search pipeline
+
+You can use a search pipeline in the following ways:
+
+- [Specify an existing pipeline](#specifying-an-existing-search-pipeline-for-a-request) for a request.
+- [Use a temporary pipeline](#using-a-temporary-search-pipeline-for-a-request) for a request.
+- Set a [default pipeline](#default-search-pipeline) for all requests in an index.
+
+## Specifying an existing search pipeline for a request
+
+After you [create a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index#creating-a-search-pipeline), you can use the pipeline with a query by specifying the pipeline name in the `search_pipeline` query parameter:
+
+```json
+GET /my_index/_search?search_pipeline=my_pipeline
+```
+{% include copy-curl.html %}
+
+For a complete example of using a search pipeline with a `filter_query` processor, see [`filter_query` processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor#example).
+
+## Using a temporary search pipeline for a request
+
+As an alternative to creating a search pipeline, you can define a temporary search pipeline to be used for only the current query:
+
+```json
+POST /my-index/_search
+{
+  "query" : {
+    "match" : {
+      "text_field" : "some search text"
+    }
+  },
+  "pipeline" : {
+    "request_processors": [
+      {
+        "filter_query" : {
+          "tag" : "tag1",
+          "description" : "This processor is going to restrict to publicly visible documents",
+          "query" : {
+            "term": {
+              "visibility": "public"
+            }
+          }
+        }
+      }
+    ],
+    "response_processors": [
+      {
+        "rename_field": {
+          "field": "message",
+          "target_field": "notification"
+        }
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+With this syntax, the pipeline does not persist and is used only for the query for which it is specified.
+
+## Default search pipeline
+
+For convenience, you can set a default search pipeline for an index. Once your index has a default pipeline, you don't need to specify the `search_pipeline` query parameter in every search request.
+
+### Setting a default search pipeline for an index
+
+To set a default search pipeline for an index, specify the `index.search.default_pipeline` in the index's settings:
+
+```json
+PUT /my_index/_settings
+{
+  "index.search.default_pipeline" : "my_pipeline"
+}
+```
+{% include copy-curl.html %}
+
+After setting the default pipeline for `my_index`, you can try the same search for all documents:
+
+```json
+GET /my_index/_search
+```
+{% include copy-curl.html %}
+
+The response contains only the public document, indicating that the pipeline was applied by default:
+
+
+
+    Response
+
+  {: .text-delta}
+
+```json
+{
+  "took" : 19,
+  "timed_out" : false,
+  "_shards" : {
+    "total" : 1,
+    "successful" : 1,
+    "skipped" : 0,
+    "failed" : 0
+  },
+  "hits" : {
+    "total" : {
+      "value" : 1,
+      "relation" : "eq"
+    },
+    "max_score" : 0.0,
+    "hits" : [
+      {
+        "_index" : "my_index",
+        "_id" : "1",
+        "_score" : 0.0,
+        "_source" : {
+          "message" : "This is a public message",
+          "visibility" : "public"
+        }
+      }
+    ]
+  }
+}
+```
+
+
+### Disabling the default pipeline for a request
+
+If you want to run a search request without applying the default pipeline, you can set the `search_pipeline` query parameter to `_none`:
+
+```json
+GET /my_index/_search?search_pipeline=_none
+```
+{% include copy-curl.html %}
+
+### Removing the default pipeline
+
+To remove the default pipeline from an index, set it to `null` or `_none`:
+
+```json
+PUT /my_index/_settings
+{
+  "index.search.default_pipeline" : null
+}
+```
+{% include copy-curl.html %}
+
+```json
+PUT /my_index/_settings
+{
+  "index.search.default_pipeline" : "_none"
+}
+```
+{% include copy-curl.html %}
\ No newline at end of file