Merge branch 'main' into has-parent-query
kolchfa-aws committed Sep 26, 2024
2 parents 9eb614f + d2e4e37 commit 44e697e
Showing 9 changed files with 40 additions and 14 deletions.
24 changes: 23 additions & 1 deletion _analyzers/index.md
@@ -170,6 +170,28 @@ The response provides information about the analyzers for each field:
}
```

## Normalizers
Tokenization divides text into individual terms, but it does not address variations in token forms. Normalization resolves these issues by converting tokens into a standard format. This ensures that similar terms are matched appropriately, even if they are not identical.

### Normalization techniques

The following normalization techniques can help address variations in token forms:
1. **Case normalization**: Converts all tokens to lowercase to ensure case-insensitive matching. For example, "Hello" is normalized to "hello".

2. **Stemming**: Reduces words to their root form. For instance, "cars" is stemmed to "car", and "running" is stemmed to "run".

3. **Synonym handling**: Treats synonyms as equivalent. For example, "jogging" and "running" can be indexed under a common term, such as "run".

### Normalization

A search for `Hello` will match documents containing `hello` because of case normalization.

A search for `cars` will also match documents containing `car` because of stemming.

A query for `running` can retrieve documents containing `jogging` using synonym handling.

Normalization ensures that searches are not limited to exact term matches, allowing for more relevant results. For instance, a search for `Cars running` can be normalized to match `car run`.
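
The three techniques described above can be sketched as a toy pipeline. The following Python example is a minimal illustration only: the `SYNONYMS` map, the `stem` rules, and the `normalize` function are hypothetical and are far simpler than the token filters that an OpenSearch analyzer applies.

```python
# Toy normalization pipeline (illustrative only; not OpenSearch's implementation).

# Hypothetical synonym map: "jogging" is treated as "running".
SYNONYMS = {"jogging": "running"}


def stem(token: str) -> str:
    """Naive suffix stripping that covers only the examples in this section."""
    if token.endswith("ing") and len(token) > 5:
        base = token[:-3]
        # Collapse a doubled final consonant: "runn" -> "run".
        if len(base) >= 2 and base[-1] == base[-2]:
            base = base[:-1]
        return base
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]  # "cars" -> "car"
    return token


def normalize(text: str) -> list[str]:
    """Lowercase each token, map synonyms, and then stem."""
    normalized = []
    for token in text.split():
        token = token.lower()               # case normalization
        token = SYNONYMS.get(token, token)  # synonym handling
        normalized.append(stem(token))      # stemming
    return normalized


print(normalize("Cars running"))
print(normalize("jogging") == normalize("running"))
```

Because the same function is applied to both the indexed text and the query text in this sketch, `Cars running` and `car run` produce the same tokens and therefore match.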

## Next steps

- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
6 changes: 3 additions & 3 deletions _api-reference/index-apis/update-alias.md
@@ -10,7 +10,7 @@ nav_order: 5
**Introduced 1.0**
{: .label .label-purple }

The Create or Update Alias API adds a data stream or index to an alias or updates the settings for an existing alias. For more alias API operations, see [Index aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/).
The Create or Update Alias API adds one or more indexes to an alias or updates the settings for an existing alias. For more alias API operations, see [Index aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/).

The Create or Update Alias API is distinct from the [Alias API]({{site.url}}{{site.baseurl}}/opensearch/rest-api/alias/), which supports the addition and removal of aliases and the removal of alias indexes. In contrast, the following API only supports adding or updating an alias without updating the index itself. Each API also uses different request body parameters.
{: .note}
@@ -35,7 +35,7 @@ PUT /_alias

| Parameter | Type | Description |
:--- | :--- | :---
| `target` | String | A comma-delimited list of data streams and indexes. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
| `target` | String | A comma-delimited list of indexes. Wildcard expressions (`*`) are supported. To target all indexes in a cluster, use `_all` or `*`. Optional. |
| `alias-name` | String | The alias name to be created or updated. Optional. |

## Query parameters
@@ -53,7 +53,7 @@ In the request body, you can specify the index name, the alias name, and the set

Field | Type | Description
:--- | :--- | :---
`index` | String | A comma-delimited list of data streams or indexes that you want to associate with the alias. If this field is set, it will override the index name specified in the URL path.
`index` | String | A comma-delimited list of indexes that you want to associate with the alias. If this field is set, it will override the index name specified in the URL path.
`alias` | String | The name of the alias. If this field is set, it will override the alias name specified in the URL path.
`is_write_index` | Boolean | Specifies whether the index should be a write index. An alias can only have one write index at a time. If a write request is submitted to an alias that links to multiple indexes, then OpenSearch runs the request only on the write index.
`routing` | String | Assigns a custom value to a shard for specific operations.
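
As a rough illustration of the request body fields listed above, the following Python sketch assembles a JSON body for `PUT /_alias`. The index name `my-index`, the alias name `my-alias`, and the `build_alias_body` helper are assumptions made for this example. The sketch only builds the body and does not send it to a cluster.

```python
import json


def build_alias_body(index, alias, is_write_index=None, routing=None):
    """Build a JSON request body from the fields described in the table.

    Optional fields are omitted when they are not provided.
    """
    body = {"index": index, "alias": alias}
    if is_write_index is not None:
        body["is_write_index"] = is_write_index
    if routing is not None:
        body["routing"] = routing
    return json.dumps(body)


# Hypothetical names, used for illustration only.
body = build_alias_body("my-index", "my-alias", is_write_index=True)
print(body)
```
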
4 changes: 2 additions & 2 deletions _benchmark/quickstart.md
@@ -116,7 +116,7 @@ You can now run your first benchmark. The following benchmark uses the [percolat
Benchmarks are run using the [`execute-test`]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) command with the following command flags:
For additional `execute_test` command flags, see the [execute-test]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) reference. Some commonly used options are `--workload-params`, `--exclude-tasks`, and `--include-tasks`.
For additional `execute-test` command flags, see the [execute-test]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) reference. Some commonly used options are `--workload-params`, `--exclude-tasks`, and `--include-tasks`.
{: .tip}
* `--pipeline=benchmark-only` : Informs OSB that the user wants to provide their own OpenSearch cluster.
@@ -136,7 +136,7 @@ opensearch-benchmark execute-test --pipeline=benchmark-only --workload=percolato
```
{% include copy.html %}
When the `execute_test` command runs, all tasks and operations in the `percolator` workload run sequentially.
When the `execute-test` command runs, all tasks and operations in the `percolator` workload run sequentially.
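
If you script benchmark runs, the hyphenated subcommand name is easy to mistype. The following Python sketch assembles the command line from the flags shown in this guide. The `build_execute_test_command` helper is hypothetical and is not part of OpenSearch Benchmark. The sketch only builds and prints the command and does not run it.

```python
import shlex


def build_execute_test_command(workload, test_mode=False):
    """Assemble an `opensearch-benchmark execute-test` argument list."""
    args = [
        "opensearch-benchmark",
        "execute-test",  # hyphenated, not `execute_test`
        "--pipeline=benchmark-only",
        f"--workload={workload}",
    ]
    if test_mode:
        args.append("--test-mode")
    return args


command = build_execute_test_command("percolator")
print(shlex.join(command))
```
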
### Validating the test
4 changes: 2 additions & 2 deletions _benchmark/user-guide/creating-custom-workloads.md
@@ -263,7 +263,7 @@ opensearch-benchmark list workloads --workload-path=</path/to/workload/>
Use the `opensearch-benchmark execute-test` command to invoke your new workload and run a benchmark test against your OpenSearch cluster, as shown in the following example. Replace `--workload-path` with the path to your custom workload, `--target-host` with the `host:port` pairs for your cluster, and `--client-options` with any authorization options required to access the cluster.
```
opensearch-benchmark execute_test \
opensearch-benchmark execute-test \
--pipeline="benchmark-only" \
--workload-path="<PATH OUTPUTTED IN THE OUTPUT OF THE CREATE-WORKLOAD COMMAND>" \
--target-host="<CLUSTER ENDPOINT>" \
@@ -289,7 +289,7 @@ head -n 1000 <index>-documents.json > <index>-documents-1k.json
Then, run `opensearch-benchmark execute-test` with the option `--test-mode`. Test mode runs a quick version of the workload test.
```
opensearch-benchmark execute_test \
opensearch-benchmark execute-test \
--pipeline="benchmark-only" \
--workload-path="<PATH OUTPUTTED IN THE OUTPUT OF THE CREATE-WORKLOAD COMMAND>" \
--target-host="<CLUSTER ENDPOINT>" \
2 changes: 1 addition & 1 deletion _benchmark/user-guide/distributed-load.md
@@ -64,7 +64,7 @@ With OpenSearch Benchmark running on all three nodes and the worker nodes set to
On **Node 1**, run a benchmark test with the `worker-ips` set to the IP addresses for your worker nodes, as shown in the following example:

```
opensearch-benchmark execute_test --pipeline=benchmark-only --workload=eventdata --worker-ips=198.52.100.0,198.53.100.0 --target-hosts=<DOMAIN_ENDPOINT> --client-options=<STANDARD_CLIENT_OPTIONS> --kill-running-processes
opensearch-benchmark execute-test --pipeline=benchmark-only --workload=eventdata --worker-ips=198.52.100.0,198.53.100.0 --target-hosts=<DOMAIN_ENDPOINT> --client-options=<STANDARD_CLIENT_OPTIONS> --kill-running-processes
```

After the test completes, the logs generated by the test appear on your worker nodes.
8 changes: 5 additions & 3 deletions _benchmark/user-guide/target-throughput.md
@@ -16,13 +16,15 @@ OpenSearch Benchmark has two testing modes, both of which are related to through

## Benchmarking mode

When you do not specify a `target-throughput`, OpenSearch Benchmark latency tests are performed in *benchmarking mode*. In this mode, the OpenSearch client sends requests to the OpenSearch cluster as fast as possible. After the cluster receives a response from the previous request, OpenSearch Benchmark immediately sends the next request to the OpenSearch client. In this testing mode, latency is identical to service time.
When `target-throughput` is set to `0`, OpenSearch Benchmark latency tests are performed in *benchmarking mode*. In this mode, the OpenSearch client sends requests to the OpenSearch cluster as fast as possible. After the cluster receives a response from the previous request, OpenSearch Benchmark immediately sends the next request to the OpenSearch client. In this testing mode, latency is identical to service time.

OpenSearch Benchmark issues one request at a time per client. The number of clients is set by the `search-clients` setting in the workload parameters.

## Throughput-throttled mode

**Throughput** measures the rate at which OpenSearch Benchmark issues requests, assuming that responses will be returned instantaneously. However, users can set a `target-throughput`, which is a common workload parameter that can be set for each test and is measured in operations per second.
If the `target-throughput` is not set to `0`, then OpenSearch Benchmark issues the next request in accordance with the `target-throughput`, assuming that responses are returned instantaneously.

OpenSearch Benchmark issues one request at a time for a single-client thread, which is specified as `search-clients` in the workload parameters. If `target-throughput` is set to `0`, then OpenSearch Benchmark issues a request immediately after it receives the response from the previous request. If the `target-throughput` is not set to `0`, then OpenSearch Benchmark issues the next request in accordance with the `target-throughput`, assuming that responses are returned instantaneously.
**Throughput** measures the rate at which OpenSearch Benchmark issues requests, assuming that responses are returned instantaneously. To configure the request rate, you can set the `target-throughput` workload parameter to the desired number of operations per second for each test.
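
The difference between the two modes can be illustrated with a small single-client simulation. This is a simplified sketch under stated assumptions, not OpenSearch Benchmark's actual scheduler: the `simulate` function and the service times are hypothetical, and latency is modeled as the time from a request's scheduled start to its completion.

```python
def simulate(service_times, target_throughput):
    """Return per-request latencies (in seconds) for a single client.

    A target_throughput of 0 models benchmarking mode: each request is
    issued as soon as the previous response arrives. Otherwise, request i
    is scheduled at i / target_throughput seconds.
    """
    latencies = []
    client_free_at = 0.0  # time at which the client can send again
    for i, service_time in enumerate(service_times):
        if target_throughput == 0:
            scheduled = client_free_at
        else:
            scheduled = i / target_throughput
        # A request cannot start before the previous one has finished.
        start = max(scheduled, client_free_at)
        client_free_at = start + service_time
        # Latency includes any time spent waiting past the scheduled start.
        latencies.append(client_free_at - scheduled)
    return latencies


service_times = [0.5, 0.5, 0.5]     # each request takes 0.5 s to serve
print(simulate(service_times, 0))   # benchmarking mode
print(simulate(service_times, 1))   # 1 operation per second
print(simulate(service_times, 4))   # 4 operations per second
```

In this model, latency equals service time in benchmarking mode and whenever the cluster keeps up with the schedule. When the target rate exceeds what the cluster can serve (4 operations per second against a 0.5 s service time), requests fall behind their schedule and latency grows with each request.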

When you want to simulate the type of traffic you might encounter when deploying a production cluster, set the `target-throughput` in your benchmark test to match the number of requests you estimate that the production cluster might receive. The following examples show how the `target-throughput` setting affects the latency measurement.

2 changes: 1 addition & 1 deletion _search-plugins/knn/disk-based-vector-search.md
@@ -167,7 +167,7 @@ GET my-vector-index/_search
For [model-based indexes]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model), you can specify the `on_disk` parameter in the training request in the same way that you would specify it during index creation. By default, `on_disk` mode will use the [Faiss IVF method]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#supported-faiss-methods) and a compression level of `32x`. To run the training API, send the following request:

```json
POST /_plugins/_knn/models/_train/test-model
POST /_plugins/_knn/models/test-model/_train
{
"training_index": "train-index-name",
"training_field": "train-field-name",
2 changes: 2 additions & 0 deletions _security/access-control/document-level-security.md
@@ -13,6 +13,8 @@ Document-level security lets you restrict a role to a subset of documents in an

![Document- and field-level security screen in OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/images/security-dls.png)

The maximum size for the document-level security configuration is 1024 KB (1,048,404 characters).
{: .warning}

## Simple roles

@@ -18,7 +18,7 @@ The searchable snapshot feature incorporates techniques like caching frequently

To configure the searchable snapshots feature, create a node in your `opensearch.yml` file and define the node role as `search`. Optionally, you can also configure the `cache.size` property for the node.

A `search` node reserves storage for the cache to perform searchable snapshot queries. In the case of a dedicated search node where the node exclusively has the `search` role, this value defaults to a fixed percentage of available storage. In other cases, the value needs to be configured by the user using the `node.search.cache.size` setting.
A `search` node reserves storage for the cache to perform searchable snapshot queries. In the case of a dedicated search node where the node exclusively has the `search` role, this value defaults to a fixed percentage (80%) of available storage. In other cases, the value needs to be configured by the user using the `node.search.cache.size` setting.

Parameter | Type | Description
:--- | :--- | :---