Merge branch 'main' into collapse-search-results-opensearch-project#7507

leanneeliatra · Aug 15, 2024 · 53f5f3f
2 parents 3b21a0c + ecd2232
commit 53f5f3f

Showing 32 changed files with 2,300 additions and 476 deletions.
diff --git a/_about/version-history.md b/_about/version-history.md
@@ -9,6 +9,7 @@ permalink: /version-history/
 
 OpenSearch version | Release highlights | Release date  
 :--- | :--- | :--- 
+[2.16.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.16.0.md) | Includes built-in byte vector quantization and binary vector support in k-NN. Adds new sort, split, and ML inference search processors for search pipelines. Provides application-based configuration templates and additional plugins to integrate multiple data sources in OpenSearch Dashboards. Includes an experimental Batch Predict ML Commons API. For a full list of release highlights, see the Release Notes. | 6 August 2024
 [2.15.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.15.0.md) | Includes parallel ingestion processing, SIMD support for exact search, and the ability to disable doc values for the k-NN field. Adds wildcard and derived field types. Improves performance for single-cardinality aggregations, rolling upgrades to remote-backed clusters, and more metrics for top N queries. For a full list of release highlights, see the Release Notes. | 25 June 2024
 [2.14.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.14.0.md) | Includes performance improvements to hybrid search and date histogram queries with multi-range traversal, ML model integration within the Ingest API, semantic cache for LangChain applications, low-level vector query interface for neural sparse queries, and improved k-NN search filtering. Provides an experimental tiered cache feature. For a full list of release highlights, see the Release Notes. | 14 May 2024
 [2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) | Makes agents and tools and the OpenSearch Assistant Toolkit generally available. Introduces vector quantization within OpenSearch. Adds LLM guardrails and hybrid search with aggregations. Adds the Bloom filter skipping index for Apache Spark data sources, I/O-based admission control, and the ability to add an alerting cluster that manages all alerting tasks. For a full list of release highlights, see the Release Notes. | 2 April 2024

diff --git a/_analyzers/token-filters/apostrophe.md b/_analyzers/token-filters/apostrophe.md
@@ -2,7 +2,7 @@
 layout: default
 title: Apostrophe
 parent: Token filters
-nav_order: 110
+nav_order: 10
 ---
 
 # Apostrophe token filter
@@ -22,7 +22,7 @@ PUT /custom_text_index
       "analyzer": {
         "custom_analyzer": {
           "type": "custom",
-          "tokenizer": "standard", // splits text into words
+          "tokenizer": "standard",
           "filter": [
             "lowercase",
             "apostrophe"

diff --git a/_api-reference/document-apis/get-documents.md b/_api-reference/document-apis/get-documents.md
@@ -11,29 +11,28 @@ redirect_from:
 **Introduced 1.0**
 {: .label .label-purple }
 
-After adding a JSON document to your index, you can use the get document API operation to retrieve the document's information and data.
+After adding a JSON document to your index, you can use the Get Document API operation to retrieve the document's information and data.
 
-## Example
-
-```json
-GET sample-index1/_doc/1
-```
-{% include copy-curl.html %}
 
 ## Path and HTTP methods
 
-```
+Use the GET method to retrieve a document and its source or stored fields from a particular index. Use the HEAD method to verify that a document exists:
+
+```json
 GET <index>/_doc/<_id>
 HEAD <index>/_doc/<_id>
 ```
-```
+
+Use `_source` to retrieve the document source or to verify that it exists:
+
+```json
 GET <index>/_source/<_id>
 HEAD <index>/_source/<_id>
 ```
 
-## URL parameters
+## Query parameters
 
-All get document URL parameters are optional.
+All query parameters are optional.
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -48,6 +47,83 @@ _source_includes | String | A comma-separated list of source fields to include i
 version | Integer | The version of the document to return, which must match the current version of the document.
 version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to retrieve version 3 of a document, use `/_doc/1?version=3&version_type=external`.
 
+### Real time
+
+The OpenSearch Get Document API operates in real time by default, which means that it retrieves the latest version of the document regardless of the index's refresh rate or the rate at which new data becomes searchable. However, if you request stored fields (using the `stored_fields` parameter) for a document that has been updated but not yet refreshed, then the Get Document API parses and analyzes the document's source to extract those stored fields. 
+
+To disable the real-time behavior and retrieve the document based on the last refreshed state of the index, set the `realtime` parameter to `false`.
+
+### Source filtering
+
+By default, the Get Document API returns the entire contents of the `_source` field for the requested document. To exclude the `_source` field from the response, set the `_source` query parameter to `false`, as shown in the following example:
+
+```json
+GET test-index/_doc/0?_source=false
+```
+
+#### `_source` includes and excludes
+
+If you only want to retrieve specific fields from the source, use the `_source_includes` or `_source_excludes` parameters to include or exclude particular fields, respectively. For large documents, retrieving only the required fields can reduce network overhead.
+
+Both parameters accept a comma-separated list of fields and wildcard expressions. In the following example, fields matching the `*.play` pattern are included in the response, and the `entities` field is excluded:
+
+```json
+GET test-index/_doc/0?_source_includes=*.play&_source_excludes=entities
+```
+
+#### Shorter notation
+
+If you only want to include certain fields and don't need to exclude any, you can use a shorter notation by specifying the desired fields directly in the `_source` parameter:
+
+```json
+GET test-index/_doc/0?_source=*.id
+```
+
+### Routing
+
+When indexing documents in OpenSearch, you can specify a `routing` value to control the shard assignments for documents. If routing was used during indexing, you must provide the same routing value when retrieving the document using the Get Document API, as shown in the following example:
+
+```json
+GET test-index/_doc/1?routing=user1
+```
+
+This request retrieves the document with the ID `1`, using the routing value `user1` to determine the shard on which the document is stored. If the correct routing value is not specified, the Get Document API cannot locate and fetch the requested document.
+
+### Preference
+
+The Get Document API allows you to control which shard replica handles the request. By default, the operation is randomly distributed across the available shard replicas.
+
+However, you can use the `preference` parameter to influence replica selection. Set it to one of the following values:
+
+- `_local`: The operation attempts to execute on a locally allocated shard replica, if possible. This can improve performance by reducing network overhead.
+- Custom (string) value: Specifying a custom string value ensures that requests with the same value are routed to the same set of shards. This consistency can be beneficial when managing shards in different refresh states because it prevents "jumping values" that may occur when hitting shards with varying data visibility. A common practice is to use a web session ID or a user name as the custom value.
+
+### Refresh
+
+Set the `refresh` parameter to `true` to refresh the relevant shard before running the Get Document API operation. This ensures that the most recent changes are searchable and visible to the API. However, use `refresh` judiciously: refreshing can impose a heavy load on the cluster and slow down indexing, so weigh data freshness against system load before enabling it.
+
+### Distributed
+
+When running the Get Document API, OpenSearch first calculates a hash of the document ID (or of the custom `routing` value, if one is provided) to determine the shard on which the document resides. The request is then redirected to one of that shard's copies (the primary shard or one of its replicas), and the result is returned from that copy.
+
+Increasing the number of replicas improves the scalability and throughput of Get Document API requests because the load can be distributed across more shard copies.
+
+### Versioning support
+
+Use the `version` parameter to retrieve a document only if its current version matches the specified version number. This can be useful for ensuring data consistency and preventing conflicts when working with versioned documents.
+
+Internally, when a document is updated in OpenSearch, the original version is marked as deleted, and a new version of the document is added. The original version doesn't disappear immediately: although you can't access it through the Get Document API, OpenSearch cleans up deleted document versions in the background as you continue indexing new data.
+
+## Example request
+
+The following example request retrieves the document with the ID `1` from `sample-index1`:
+
+```json
+GET sample-index1/_doc/1
+```
+{% include copy-curl.html %}
+
 
 ## Example response
 ```json

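To illustrate how the Get Document query parameters described in this diff combine into a single request, the following Python sketch assembles a request path and query string. `build_get_document_path` is a hypothetical helper written for this example, not part of any OpenSearch client; it only formats the string and does not contact a cluster. The parameter names mirror the documented query parameters (`_source`, `_source_includes`, `_source_excludes`, `routing`, `preference`, `realtime`, `refresh`, `version`, `version_type`).

```python
from urllib.parse import quote, urlencode


def _to_param(value):
    # Boolean query parameters are sent as lowercase `true`/`false`.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_get_document_path(index, doc_id, *, source_only=False, source=None,
                            source_includes=None, source_excludes=None,
                            routing=None, preference=None, realtime=None,
                            refresh=None, version=None, version_type=None):
    """Hypothetical helper: format a Get Document API path and query string."""
    # `<index>/_source/<_id>` returns only the source; `<index>/_doc/<_id>` returns the document.
    endpoint = "_source" if source_only else "_doc"
    path = f"{quote(index, safe='')}/{endpoint}/{quote(str(doc_id), safe='')}"

    params = {}
    if source is not None:
        # `_source=false` omits the source; a string is the shorter include-only notation.
        params["_source"] = _to_param(source)
    if source_includes:
        params["_source_includes"] = ",".join(source_includes)
    if source_excludes:
        params["_source_excludes"] = ",".join(source_excludes)
    for name, value in (("routing", routing), ("preference", preference),
                        ("realtime", realtime), ("refresh", refresh),
                        ("version", version), ("version_type", version_type)):
        if value is not None:
            params[name] = _to_param(value)

    if not params:
        return path
    # Keep `*` and `,` readable instead of percent-encoding them.
    return f"{path}?{urlencode(params, safe='*,')}"


print(build_get_document_path("test-index", 0,
                              source_includes=["*.play"],
                              source_excludes=["entities"]))
# test-index/_doc/0?_source_includes=*.play&_source_excludes=entities
```

Passing `routing="user1"` yields `test-index/_doc/1?routing=user1`, matching the routing example; keyword-only arguments keep each optional query parameter explicit at the call site.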
diff --git a/_api-reference/index-apis/blocks.md b/_api-reference/index-apis/blocks.md
@@ -0,0 +1,59 @@
+---
+layout: default
+title: Blocks
+parent: Index APIs
+nav_order: 6
+---
+
+# Blocks
+**Introduced 1.0**
+{: .label .label-purple }
+
+Use the Blocks API to limit certain operations on a specified index. Different types of blocks allow you to restrict index write, read, or metadata operations.
+For example, when you add a `write` block, the API waits until all index shards have accounted for the block before returning a successful response. Any in-flight write operations to the index must complete before the `write` block takes effect.
+
+## Path and HTTP methods
+
+```json
+PUT /<index>/_block/<block>
+```
+
+## Path parameters
+
+| Parameter | Data type | Description |
+| :--- | :--- | :--- |
+| `<index>` | String | A comma-delimited list of index names. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
+| `<block>` | String | Specifies the type of block to apply to the index. Valid values are: <br> `metadata`: Disables all metadata changes, such as closing the index. <br> `read`: Disables any read operations. <br> `read_only`: Disables any write operations and metadata changes. <br> `write`: Disables write operations. However, metadata changes are still allowed. |
+
+## Query parameters
+
+The following table lists the available query parameters. All query parameters are optional.
+
+| Parameter | Data type | Description |
+| :--- | :--- | :--- |
+| `ignore_unavailable` | Boolean | When `false`, the request returns an error when it targets a missing or closed index. Default is `false`. |
+| `allow_no_indices` | Boolean | When `false`, the request returns an error when a wildcard expression, index alias, or `_all` targets only closed or missing indexes, even if the request also targets open indexes. Default is `true`. |
+| `expand_wildcards` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. |
+| `cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. |
+| `timeout` | Time | The amount of time to wait for the request to return. Default is `30s`. |
+
+## Example request
+
+The following example request disables any `write` operations to `test-index`:
+
+```json
+PUT /test-index/_block/write
+```
+
+## Example response
+
+```json
+{
+  "acknowledged" : true,
+  "shards_acknowledged" : true,
+  "indices" : [ {
+    "name" : "test-index",
+    "blocked" : true
+  } ]
+}
+```
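The block types in the Path parameters table can be summarized as a small lookup of which operation categories each block disables. The following Python sketch encodes that table and formats a Blocks API path; `BLOCKED_OPERATIONS`, `build_block_path`, and `is_allowed` are hypothetical names for illustration only. This is a simplified model of the documented behavior, not OpenSearch's implementation.

```python
from urllib.parse import quote

# Operation categories disabled by each block type, taken from the Path parameters table.
BLOCKED_OPERATIONS = {
    "metadata": {"metadata"},        # e.g. closing the index
    "read": {"read"},
    "read_only": {"write", "metadata"},
    "write": {"write"},              # metadata changes remain allowed
}


def build_block_path(indexes, block):
    """Hypothetical helper: format `PUT /<index>/_block/<block>` for the given indexes."""
    if block not in BLOCKED_OPERATIONS:
        raise ValueError(f"unsupported block type: {block!r}")
    # Index names are comma-delimited; `*` is kept for wildcard expressions.
    index_expr = ",".join(quote(name, safe="*") for name in indexes)
    return f"/{index_expr}/_block/{block}"


def is_allowed(operation, applied_blocks):
    """Return True if `operation` is not disabled by any of the applied blocks."""
    return all(operation not in BLOCKED_OPERATIONS[b] for b in applied_blocks)


print(build_block_path(["test-index"], "write"))  # /test-index/_block/write
```

Under this model, `is_allowed("metadata", {"write"})` is `True` while `is_allowed("write", {"read_only"})` is `False`, reflecting the difference between the `write` and `read_only` blocks described in the table.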
diff --git a/_api-reference/index-apis/segment.md b/_api-reference/index-apis/segment.md
@@ -34,7 +34,6 @@ The Segment API supports the following optional query parameters.
 Parameter | Data type | Description
 :--- | :--- | :---
 `allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`.
-`allow_partial_search_results` | Boolean | Whether to return partial results if the request encounters an error or times out. Default is `true`.
 `expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`.
 `ignore_unavailable` | Boolean | When `true`, OpenSearch ignores missing or closed indexes. If `false`, OpenSearch returns an error if the request encounters missing or closed indexes. Default is `false`.
 `verbose` | Boolean | When `true`, provides information about Lucene's memory usage. Default is `false`.

diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md
@@ -10,13 +10,13 @@ nav_order: 100
 # operations
 <!-- vale on -->
 
-The `operations` element contains a list of all available operations for specifying a schedule.
+The `operations` element contains a list of all available operations for specifying a schedule. 
 
 <!-- vale off -->
 ## bulk
 <!-- vale on -->
 
-The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task. 
+The `bulk` operation type allows you to run [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) requests as a task. 
 
 ### Usage
 
@@ -82,7 +82,7 @@ If `detailed-results` is `true`, the following metadata is returned:
 ## create-index
 <!-- vale on -->
 
-The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation:
+The `create-index` operation runs the [Create Index API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/). It supports the following two index creation modes:
 
 - Creating all indexes specified in the workload's `indices` section
 - Creating one specific index defined within the operation itself
@@ -157,7 +157,7 @@ The `create-index` operation returns the following metadata:
 ## delete-index
 <!-- vale on -->
 
-The `delete-index` operation runs the [Delete Index API](api-reference/index-apis/delete-index/). Like with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.
+The `delete-index` operation runs the [Delete Index API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.
 
 ### Usage
 
@@ -215,7 +215,7 @@ The `delete-index` operation returns the following metadata:
 ## cluster-health
 <!-- vale on -->
 
-The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.
+The `cluster-health` operation runs the [Cluster Health API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, then the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.
 
 
 ### Usage
@@ -285,7 +285,7 @@ Parameter | Required | Type | Description
 ## search
 <!-- vale on -->
 
-The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes.
+The `search` operation runs the [Search API]({{site.url}}{{site.baseurl}}/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes.
 
 ### Usage
 

diff --git a/_config.yml b/_config.yml
@@ -5,10 +5,10 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog
 url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
 permalink: /:path/
 
-opensearch_version: '2.15.0'
-opensearch_dashboards_version: '2.15.0'
-opensearch_major_minor_version: '2.15'
-lucene_version: '9_10_0'
+opensearch_version: '2.16.0'
+opensearch_dashboards_version: '2.16.0'
+opensearch_major_minor_version: '2.16'
+lucene_version: '9_11_1'
 
 # Build settings
 markdown: kramdown

diff --git a/_data/versions.json b/_data/versions.json
@@ -1,10 +1,11 @@
 {
-  "current": "2.15",
+  "current": "2.16",
   "all": [
-    "2.15",
+    "2.16",
     "1.3"
   ],
   "archived": [
+    "2.15",
     "2.14",
     "2.13",
     "2.12",
@@ -24,7 +25,7 @@
     "1.1",
     "1.0"
     ],
-  "latest": "2.15"
+  "latest": "2.16"
 }
 
 
diff --git a/_ingest-pipelines/processors/sparse-encoding.md b/_ingest-pipelines/processors/sparse-encoding.md
@@ -141,7 +141,7 @@ The response confirms that in addition to the `passage_text` field, the processo
 }
 ```
 
-Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Step 2: Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-2-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-3-ingest-documents-into-the-index) of [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
+Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/#step-2b-create-an-index-for-ingestion) and [Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/#step-2c-ingest-documents-into-the-index) of [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
 
 ---