Merge branch 'main' into collapse-search-results-opensearch-project#7507
leanneeliatra committed Aug 15, 2024
2 parents 3b21a0c + ecd2232 commit 53f5f3f
Showing 32 changed files with 2,300 additions and 476 deletions.
1 change: 1 addition & 0 deletions _about/version-history.md
@@ -9,6 +9,7 @@ permalink: /version-history/

OpenSearch version | Release highlights | Release date
:--- | :--- | :---
[2.16.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.16.0.md) | Includes built-in byte vector quantization and binary vector support in k-NN. Adds new sort, split, and ML inference search processors for search pipelines. Provides application-based configuration templates and additional plugins to integrate multiple data sources in OpenSearch Dashboards. Includes an experimental Batch Predict ML Commons API. For a full list of release highlights, see the Release Notes. | 06 August 2024
[2.15.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.15.0.md) | Includes parallel ingestion processing, SIMD support for exact search, and the ability to disable doc values for the k-NN field. Adds wildcard and derived field types. Improves performance for single-cardinality aggregations, rolling upgrades to remote-backed clusters, and more metrics for top N queries. For a full list of release highlights, see the Release Notes. | 25 June 2024
[2.14.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.14.0.md) | Includes performance improvements to hybrid search and date histogram queries with multi-range traversal, ML model integration within the Ingest API, semantic cache for LangChain applications, low-level vector query interface for neural sparse queries, and improved k-NN search filtering. Provides an experimental tiered cache feature. For a full list of release highlights, see the Release Notes. | 14 May 2024
[2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) | Makes agents and tools and the OpenSearch Assistant Toolkit generally available. Introduces vector quantization within OpenSearch. Adds LLM guardrails and hybrid search with aggregations. Adds the Bloom filter skipping index for Apache Spark data sources, I/O-based admission control, and the ability to add an alerting cluster that manages all alerting tasks. For a full list of release highlights, see the Release Notes. | 2 April 2024
4 changes: 2 additions & 2 deletions _analyzers/token-filters/apostrophe.md
@@ -2,7 +2,7 @@
layout: default
title: Apostrophe
parent: Token filters
nav_order: 110
nav_order: 10
---

# Apostrophe token filter
@@ -22,7 +22,7 @@ PUT /custom_text_index
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard", // splits text into words
"tokenizer": "standard",
"filter": [
"lowercase",
"apostrophe"
98 changes: 87 additions & 11 deletions _api-reference/document-apis/get-documents.md
@@ -11,29 +11,28 @@ redirect_from:
**Introduced 1.0**
{: .label .label-purple }

After adding a JSON document to your index, you can use the get document API operation to retrieve the document's information and data.
After adding a JSON document to your index, you can use the Get Document API operation to retrieve the document's information and data.

## Example

```json
GET sample-index1/_doc/1
```
{% include copy-curl.html %}

## Path and HTTP methods

```
Use the GET method to retrieve a document and its source or stored fields from a particular index. Use the HEAD method to verify that a document exists:

```json
GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
```
```

Use `_source` to retrieve the document source or to verify that it exists:

```json
GET <index>/_source/<_id>
HEAD <index>/_source/<_id>
```

## URL parameters
## Query parameters

All get document URL parameters are optional.
All query parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
@@ -48,6 +47,83 @@ _source_includes | String | A comma-separated list of source fields to include i
version | Integer | The version of the document to return, which must match the current version of the document.
version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to retrieve version 3 of a document, use `/_doc/1?version=3&version_type=external`.

### Real time

The OpenSearch Get Document API operates in real time by default, which means that it retrieves the latest version of the document regardless of the index's refresh rate or the rate at which new data becomes searchable. However, if you request stored fields (using the `stored_fields` parameter) for a document that has been updated but not yet refreshed, then the Get Document API parses and analyzes the document's source to extract those stored fields.

To disable the real-time behavior and retrieve the document based on the last refreshed state of the index, set the `realtime` parameter to `false`.

### Source filtering

By default, the Get Document API returns the entire contents of the `_source` field for the requested document. To exclude the `_source` field from the response, set the `_source` query parameter to `false`, as shown in the following example:

```json
GET test-index/_doc/0?_source=false
```

#### `_source` includes and excludes

If you only want to retrieve specific fields from the source, use the `_source_includes` or `_source_excludes` parameters to include or exclude particular fields, respectively. This can be beneficial for large documents because retrieving only the required fields can reduce network overhead.

Both parameters accept a comma-separated list of fields and wildcard expressions. In the following example, any source field that matches `*.play` is included in the response, but the `entities` field is excluded:

```json
GET test-index/_doc/0?_source_includes=*.play&_source_excludes=entities
```
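The include and exclude behavior described above can be approximated locally. The following Python sketch is only an emulation under stated assumptions, not how OpenSearch implements source filtering: the helper names `flatten` and `filter_source` are hypothetical, the sample document is invented, and Python's `fnmatch` wildcard rules stand in for the server's pattern matching.

```python
from fnmatch import fnmatchcase


def flatten(source, prefix=""):
    """Flatten a nested dict into {dotted.path: value} pairs."""
    flat = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def filter_source(source, includes=(), excludes=()):
    """Keep paths matching any include pattern, then drop paths matching any exclude pattern."""
    kept = {}
    for path, value in flatten(source).items():
        # An empty include list means "include everything".
        included = not includes or any(fnmatchcase(path, p) for p in includes)
        # Excluding an object field also excludes everything nested under it.
        excluded = any(
            fnmatchcase(path, p) or fnmatchcase(path, f"{p}.*") for p in excludes
        )
        if included and not excluded:
            kept[path] = value
    return kept


# Invented sample document, used only for illustration.
doc = {
    "title": {"play": "Hamlet", "id": 7},
    "entities": {"play": "Ophelia"},
    "year": 1603,
}
result = filter_source(doc, includes=["*.play"], excludes=["entities"])
print(result)
```

In this sketch, `title.play` survives because it matches `*.play`, while `entities.play` is dropped because the exclude list is applied after the include list.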

#### Shorter notation

If you only want to include certain fields and don't need to exclude any, you can use a shorter notation by specifying the desired fields directly in the `_source` parameter:

```json
GET test-index/_doc/0?_source=*.id
```

### Routing

When indexing documents in OpenSearch, you can specify a `routing` value to control the shard assignments for documents. If routing was used during indexing, you must provide the same routing value when retrieving the document using the Get Document API, as shown in the following example:

```json
GET test-index/_doc/1?routing=user1
```

This request retrieves the document with the ID `1` and uses the routing value `user1` to determine the shard on which the document is stored. If the correct routing value is not specified, the Get Document API cannot locate and fetch the requested document.
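To see why the routing value must match, it can help to model shard selection as a hash of the routing value taken modulo the number of primary shards. The following Python sketch rests on loud assumptions: `pick_shard` is a hypothetical helper, `zlib.crc32` is a stand-in hash chosen only because it is deterministic and in the standard library (it is not the hash OpenSearch uses), and the shard count is invented.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number (stand-in hash, not OpenSearch's own)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # hypothetical number of primary shards

indexed_on = pick_shard("user1", NUM_SHARDS)    # shard chosen at index time
looked_up_on = pick_shard("user1", NUM_SHARDS)  # same routing value at read time
default_lookup = pick_shard("1", NUM_SHARDS)    # no routing given: the document ID is hashed instead

# The same routing value always maps to the same shard.
print(indexed_on == looked_up_on)
# These two may differ, in which case a lookup without routing checks the wrong shard.
print(indexed_on, default_lookup)
```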

### Preference

The Get Document API allows you to control which shard replica handles the request. By default, the operation is randomly distributed across the available shard replicas.

However, you can specify a preference to influence the replica selection. The preference can be set to one of the following values:

- `_local`: The operation attempts to execute on a locally allocated shard replica, if possible. This can improve performance by reducing network overhead.
- Custom (string) value: Specifying a custom string value ensures that requests with the same value are routed to the same set of shards. This consistency can be beneficial when managing shards in different refresh states because it prevents "jumping values" that may occur when hitting shards with varying data visibility. A common practice is to use a web session ID or a user name as the custom value.
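As a minimal sketch of how a client might set the preference, the following Python snippet only builds the request path and query string and does not send anything. The helper name `get_document_path` and the session ID value are hypothetical.

```python
from urllib.parse import quote, urlencode


def get_document_path(index: str, doc_id: str, **params: str) -> str:
    """Build the path and query string for a Get Document request."""
    path = f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    return f"{path}?{urlencode(params)}" if params else path


# Prefer a shard copy on the node that receives the request.
print(get_document_path("test-index", "1", preference="_local"))

# Reuse one custom string (here a made-up session ID) so that
# repeated reads are routed to the same set of shard copies.
session_id = "session-4f2a"  # hypothetical value
print(get_document_path("test-index", "1", preference=session_id))
```

Using a stable per-user string keeps successive reads from one user on the same shard copies, which is the consistency benefit described in the list above.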


### Refresh

Set the `refresh` parameter to `true` to force a refresh of the relevant shard before running the Get Document API operation. This ensures that the most recent data changes are made searchable and visible to the API. However, a refresh should be performed judiciously because it can potentially impose a heavy load on the system and slow down indexing performance. It's recommended to carefully evaluate the trade-off between data freshness and system load before enabling the `refresh` parameter.

### Distributed

When running the Get Document API, OpenSearch first calculates a hash value based on the document ID, which determines the specific ID of the shard on which the document resides. The operation is then redirected to one of the replicas (including the primary shard and its replica shards) in that shard ID group, and the result is returned from that replica.

A higher number of shard replicas improves the scalability and performance of GET operations because the load can be distributed across multiple replica shards. This means that as the number of replicas increases, you can achieve better scaling and throughput for Get Document API requests.

### Versioning support

Use the `version` parameter to retrieve a document only if its current version matches the specified version number. This can be useful for ensuring data consistency and preventing conflicts when working with versioned documents.

Internally, when a document is updated in OpenSearch, the original version is marked as deleted, and a new version of the document is added. However, the original version doesn't immediately disappear from the system. While you won't be able to access it through the Get Document API, OpenSearch manages the cleanup of deleted document versions in the background as you continue indexing new data.
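The version check can be pictured with a toy in-memory model. This Python sketch is an illustration only: the `store` dictionary, the `VersionConflict` exception, and the `get_with_version` helper are all hypothetical and do not mirror OpenSearch internals or its exact error format.

```python
class VersionConflict(Exception):
    """Raised when the requested version is not the current one."""


def get_with_version(store, doc_id, version=None):
    """Return the stored document only if `version` is omitted or matches the current version."""
    current = store[doc_id]
    if version is not None and version != current["_version"]:
        raise VersionConflict(
            f"current version [{current['_version']}] differs from requested [{version}]"
        )
    return current


# A toy "index" holding one document that has been updated twice.
store = {"1": {"_version": 3, "_source": {"title": "Hamlet"}}}

print(get_with_version(store, "1", version=3)["_source"])

try:
    get_with_version(store, "1", version=2)  # an older version is no longer retrievable
except VersionConflict as err:
    print("conflict:", err)
```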

## Example request

The following example request retrieves the document with the ID `1`:

```json
GET sample-index1/_doc/1
```
{% include copy-curl.html %}


## Example response
```json
Expand Down
59 changes: 59 additions & 0 deletions _api-reference/index-apis/blocks.md
@@ -0,0 +1,59 @@
---
layout: default
title: Blocks
parent: Index APIs
nav_order: 6
---

# Blocks
**Introduced 1.0**
{: .label .label-purple }

Use the Blocks API to limit certain operations on a specified index. Different types of blocks allow you to restrict index write, read, or metadata operations.
For example, adding a `write` block through the API ensures that all index shards have properly accounted for the block before returning a successful response. Any in-flight write operations to the index must be complete before the `write` block takes effect.

## Path and HTTP methods

```json
PUT /<index>/_block/<block>
```

## Path parameters

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `index` | String | A comma-delimited list of index names. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
| `<block>` | String | Specifies the type of block to apply to the index. Valid values are: <br> `metadata`: Disables all metadata changes, such as closing the index. <br> `read`: Disables any read operations. <br> `read_only`: Disables any write operations and metadata changes. <br> `write`: Disables write operations. However, metadata changes are still allowed. |
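A client can validate the block type before sending a request. The following Python sketch only builds the request path from the two path parameters; the helper name `block_path` is hypothetical, and the set of valid values is taken from the table above.

```python
from urllib.parse import quote

# Block types listed in the path parameters table.
VALID_BLOCKS = ("metadata", "read", "read_only", "write")


def block_path(index: str, block: str) -> str:
    """Return the request path for adding `block` to `index`, rejecting unknown block types."""
    if block not in VALID_BLOCKS:
        raise ValueError(f"unknown block type {block!r}; expected one of {VALID_BLOCKS}")
    # Leave commas and wildcards unescaped because the index parameter accepts lists and `*`.
    return f"/{quote(index, safe=',*')}/_block/{block}"


print("PUT", block_path("test-index", "write"))
print("PUT", block_path("logs-*,metrics-*", "read_only"))
```

Rejecting an unknown block type on the client side avoids a round trip for a request that cannot succeed.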

## Query parameters

The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `ignore_unavailable` | Boolean | When `false`, the request returns an error when it targets a missing or closed index. Default is `false`. |
| `allow_no_indices` | Boolean | When `false`, the request returns an error when a wildcard expression, index alias, or `_all` targets only closed or missing indexes, even when the request is made against open indexes. Default is `true`. |
| `expand_wildcards` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. |
| `cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. |
| `timeout` | Time | The amount of time to wait for the request to return. Default is `30s`. |

## Example request

The following example request disables any `write` operations made to the `test-index` index:

```json
PUT /test-index/_block/write
```

## Example response

```json
{
"acknowledged" : true,
"shards_acknowledged" : true,
"indices" : [ {
"name" : "test-index",
"blocked" : true
} ]
}
```
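A caller will usually want to confirm that the block was applied to every targeted index. The following Python sketch parses the example response body shown above; the helper name `block_fully_applied` is hypothetical, and treating a missing field as `false` is an assumption made for the sketch.

```python
import json

# The example response body, as a client would receive it.
response_text = """
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "indices": [{"name": "test-index", "blocked": true}]
}
"""


def block_fully_applied(body: dict) -> bool:
    """True when the request was acknowledged and every listed index reports blocked."""
    return (
        body.get("acknowledged", False)
        and body.get("shards_acknowledged", False)
        and all(entry.get("blocked", False) for entry in body.get("indices", []))
    )


body = json.loads(response_text)
print(block_fully_applied(body))
```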
1 change: 0 additions & 1 deletion _api-reference/index-apis/segment.md
@@ -34,7 +34,6 @@ The Segment API supports the following optional query parameters.
Parameter | Data type | Description
:--- | :--- | :---
`allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`.
`allow_partial_search_results` | Boolean | Whether to return partial results if the request encounters an error or times out. Default is `true`.
`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`.
`ignore_unavailable` | Boolean | When `true`, OpenSearch ignores missing or closed indexes. If `false`, OpenSearch returns an error if the force merge operation encounters missing or closed indexes. Default is `false`.
`verbose` | Boolean | When `true`, provides information about Lucene's memory usage. Default is `false`.
12 changes: 6 additions & 6 deletions _benchmark/reference/workloads/operations.md
@@ -10,13 +10,13 @@ nav_order: 100
# operations
<!-- vale on -->

The `operations` element contains a list of all available operations for specifying a schedule.
The `operations` element contains a list of all available operations for specifying a schedule.

<!-- vale off -->
## bulk
<!-- vale on -->

The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task.
The `bulk` operation type allows you to run [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) requests as a task.

### Usage

@@ -82,7 +82,7 @@ If `detailed-results` is `true`, the following metadata is returned:
## create-index
<!-- vale on -->

The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation:
The `create-index` operation runs the [Create Index API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/). It supports the following two index creation modes:

- Creating all indexes specified in the workload's `indices` section
- Creating one specific index defined within the operation itself
@@ -157,7 +157,7 @@ The `create-index` operation returns the following metadata:
## delete-index
<!-- vale on -->

The `delete-index` operation runs the [Delete Index API](api-reference/index-apis/delete-index/). Like with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.
The `delete-index` operation runs the [Delete Index API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.

### Usage

@@ -215,7 +215,7 @@ The `delete-index` operation returns the following metadata:
## cluster-health
<!-- vale on -->

The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.
The `cluster-health` operation runs the [Cluster Health API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, then the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.


### Usage
@@ -285,7 +285,7 @@ Parameter | Required | Type | Description
## search
<!-- vale on -->

The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes.
The `search` operation runs the [Search API]({{site.url}}{{site.baseurl}}/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes.

### Usage

8 changes: 4 additions & 4 deletions _config.yml
@@ -5,10 +5,10 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog
url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
permalink: /:path/

opensearch_version: '2.15.0'
opensearch_dashboards_version: '2.15.0'
opensearch_major_minor_version: '2.15'
lucene_version: '9_10_0'
opensearch_version: '2.16.0'
opensearch_dashboards_version: '2.16.0'
opensearch_major_minor_version: '2.16'
lucene_version: '9_11_1'

# Build settings
markdown: kramdown
7 changes: 4 additions & 3 deletions _data/versions.json
@@ -1,10 +1,11 @@
{
"current": "2.15",
"current": "2.16",
"all": [
"2.15",
"2.16",
"1.3"
],
"archived": [
"2.15",
"2.14",
"2.13",
"2.12",
@@ -24,7 +25,7 @@
"1.1",
"1.0"
],
"latest": "2.15"
"latest": "2.16"
}


2 changes: 1 addition & 1 deletion _ingest-pipelines/processors/sparse-encoding.md
@@ -141,7 +141,7 @@ The response confirms that in addition to the `passage_text` field, the processo
}
```

Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Step 2: Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-2-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-3-ingest-documents-into-the-index) of [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/#step-2b-create-an-index-for-ingestion) and [Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/#step-2c-ingest-documents-into-the-index) of [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).

---
