diff --git a/_analyzers/index.md b/_analyzers/index.md index 95f97ec8ce..9b999e5c3d 100644 --- a/_analyzers/index.md +++ b/_analyzers/index.md @@ -170,6 +170,28 @@ The response provides information about the analyzers for each field: } ``` +## Normalizers +Tokenization divides text into individual terms, but it does not address variations in token forms. Normalization resolves these issues by converting tokens into a standard format. This ensures that similar terms are matched appropriately, even if they are not identical. + +### Normalization techniques + +The following normalization techniques can help address variations in token forms: +1. **Case normalization**: Converts all tokens to lowercase to ensure case-insensitive matching. For example, "Hello" is normalized to "hello". + +2. **Stemming**: Reduces words to their root form. For instance, "cars" is stemmed to "car", and "running" is normalized to "run". + +3. **Synonym handling:** Treats synonyms as equivalent. For example, "jogging" and "running" can be indexed under a common term, such as "run". + +### Normalization + +A search for `Hello` will match documents containing `hello` because of case normalization. + +A search for `cars` will also match documents containing `car` because of stemming. + +A query for `running` can retrieve documents containing `jogging` using synonym handling. + +Normalization ensures that searches are not limited to exact term matches, allowing for more relevant results. For instance, a search for `Cars running` can be normalized to match `car run`. + ## Next steps -- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/). \ No newline at end of file +- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/). diff --git a/_api-reference/index-apis/update-alias.md b/_api-reference/index-apis/update-alias.md index f32d34025e..c069703bf3 100644 --- a/_api-reference/index-apis/update-alias.md +++ b/_api-reference/index-apis/update-alias.md @@ -10,7 +10,7 @@ nav_order: 5 **Introduced 1.0** {: .label .label-purple } -The Create or Update Alias API adds a data stream or index to an alias or updates the settings for an existing alias. For more alias API operations, see [Index aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/). +The Create or Update Alias API adds one or more indexes to an alias or updates the settings for an existing alias. For more alias API operations, see [Index aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/). The Create or Update Alias API is distinct from the [Alias API]({{site.url}}{{site.baseurl}}/opensearch/rest-api/alias/), which supports the addition and removal of aliases and the removal of alias indexes. In contrast, the following API only supports adding or updating an alias without updating the index itself. Each API also uses different request body parameters. {: .note} @@ -35,7 +35,7 @@ PUT /_alias | Parameter | Type | Description | :--- | :--- | :--- -| `target` | String | A comma-delimited list of data streams and indexes. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. | +| `target` | String | A comma-delimited list of indexes. Wildcard expressions (`*`) are supported. To target all indexes in a cluster, use `_all` or `*`. Optional. | | `alias-name` | String | The alias name to be created or updated. Optional. | ## Query parameters @@ -53,7 +53,7 @@ In the request body, you can specify the index name, the alias name, and the set Field | Type | Description :--- | :--- | :--- | :--- -`index` | String | A comma-delimited list of data streams or indexes that you want to associate with the alias. If this field is set, it will override the index name specified in the URL path. +`index` | String | A comma-delimited list of indexes that you want to associate with the alias. If this field is set, it will override the index name specified in the URL path. `alias` | String | The name of the alias. If this field is set, it will override the alias name specified in the URL path. `is_write_index` | Boolean | Specifies whether the index should be a write index. An alias can only have one write index at a time. If a write request is submitted to an alias that links to multiple indexes, then OpenSearch runs the request only on the write index. `routing` | String | Assigns a custom value to a shard for specific operations. diff --git a/_benchmark/quickstart.md b/_benchmark/quickstart.md index a6bcd59819..9ab6d25c77 100644 --- a/_benchmark/quickstart.md +++ b/_benchmark/quickstart.md @@ -116,7 +116,7 @@ You can now run your first benchmark. The following benchmark uses the [percolat Benchmarks are run using the [`execute-test`]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) command with the following command flags: -For additional `execute_test` command flags, see the [execute-test]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) reference. Some commonly used options are `--workload-params`, `--exclude-tasks`, and `--include-tasks`. +For additional `execute-test` command flags, see the [execute-test]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) reference. Some commonly used options are `--workload-params`, `--exclude-tasks`, and `--include-tasks`. {: .tip} * `--pipeline=benchmark-only` : Informs OSB that users wants to provide their own OpenSearch cluster. @@ -136,7 +136,7 @@ opensearch-benchmark execute-test --pipeline=benchmark-only --workload=percolato ``` {% include copy.html %} -When the `execute_test` command runs, all tasks and operations in the `percolator` workload run sequentially. +When the `execute-test` command runs, all tasks and operations in the `percolator` workload run sequentially. ### Validating the test diff --git a/_benchmark/user-guide/creating-custom-workloads.md b/_benchmark/user-guide/creating-custom-workloads.md index ee0dca1ce9..6effa9a265 100644 --- a/_benchmark/user-guide/creating-custom-workloads.md +++ b/_benchmark/user-guide/creating-custom-workloads.md @@ -263,7 +263,7 @@ opensearch-benchmark list workloads --workload-path= Use the `opensearch-benchmark execute-test` command to invoke your new workload and run a benchmark test against your OpenSearch cluster, as shown in the following example. Replace `--workload-path` with the path to your custom workload, `--target-host` with the `host:port` pairs for your cluster, and `--client-options` with any authorization options required to access the cluster. ``` -opensearch-benchmark execute_test \ +opensearch-benchmark execute-test \ --pipeline="benchmark-only" \ --workload-path="" \ --target-host="" \ @@ -289,7 +289,7 @@ head -n 1000 -documents.json > -documents-1k.json Then, run `opensearch-benchmark execute-test` with the option `--test-mode`. Test mode runs a quick version of the workload test. ``` -opensearch-benchmark execute_test \ +opensearch-benchmark execute-test \ --pipeline="benchmark-only" \ --workload-path="" \ --target-host="" \ diff --git a/_benchmark/user-guide/distributed-load.md b/_benchmark/user-guide/distributed-load.md index 60fc98500f..f16de29f88 100644 --- a/_benchmark/user-guide/distributed-load.md +++ b/_benchmark/user-guide/distributed-load.md @@ -64,7 +64,7 @@ With OpenSearch Benchmark running on all three nodes and the worker nodes set to On **Node 1**, run a benchmark test with the `worker-ips` set to the IP addresses for your worker nodes, as shown in the following example: ``` -opensearch-benchmark execute_test --pipeline=benchmark-only --workload=eventdata --worker-ips=198.52.100.0,198.53.100.0 --target-hosts= --client-options= --kill-running-processes +opensearch-benchmark execute-test --pipeline=benchmark-only --workload=eventdata --worker-ips=198.52.100.0,198.53.100.0 --target-hosts= --client-options= --kill-running-processes ``` After the test completes, the logs generated by the test appear on your worker nodes. diff --git a/_benchmark/user-guide/target-throughput.md b/_benchmark/user-guide/target-throughput.md index 63832de595..1be0a8be39 100644 --- a/_benchmark/user-guide/target-throughput.md +++ b/_benchmark/user-guide/target-throughput.md @@ -16,13 +16,15 @@ OpenSearch Benchmark has two testing modes, both of which are related to through ## Benchmarking mode -When you do not specify a `target-throughput`, OpenSearch Benchmark latency tests are performed in *benchmarking mode*. In this mode, the OpenSearch client sends requests to the OpenSearch cluster as fast as possible. After the cluster receives a response from the previous request, OpenSearch Benchmark immediately sends the next request to the OpenSearch client. In this testing mode, latency is identical to service time. +When `target-throughput` is set to `0`, OpenSearch Benchmark latency tests are performed in *benchmarking mode*. In this mode, the OpenSearch client sends requests to the OpenSearch cluster as fast as possible. After the cluster receives a response from the previous request, OpenSearch Benchmark immediately sends the next request to the OpenSearch client. In this testing mode, latency is identical to service time. + +OpenSearch Benchmark issues one request at a time per a single client. The number of clients is set by the `search-clients` setting in the workload parameters. ## Throughput-throttled mode -**Throughput** measures the rate at which OpenSearch Benchmark issues requests, assuming that responses will be returned instantaneously. However, users can set a `target-throughput`, which is a common workload parameter that can be set for each test and is measured in operations per second. +If the `target-throughput` is not set to `0`, then OpenSearch Benchmark issues the next request in accordance with the `target-throughput`, assuming that responses are returned instantaneously. -OpenSearch Benchmark issues one request at a time for a single-client thread, which is specified as `search-clients` in the workload parameters. If `target-throughput` is set to `0`, then OpenSearch Benchmark issues a request immediately after it receives the response from the previous request. If the `target-throughput` is not set to `0`, then OpenSearch Benchmark issues the next request in accordance with the `target-throughput`, assuming that responses are returned instantaneously. +**Throughput** measures the rate at which OpenSearch Benchmark issues requests, assuming that responses are returned instantaneously. To configure the request rate, you can set the `target-throughput` workload parameter to the desired number of operations per second for each test. When you want to simulate the type of traffic you might encounter when deploying a production cluster, set the `target-throughput` in your benchmark test to match the number of requests you estimate that the production cluster might receive. The following examples show how the `target-throughput` setting affects the latency measurement. diff --git a/_search-plugins/knn/disk-based-vector-search.md b/_search-plugins/knn/disk-based-vector-search.md index 82da30a0ac..dfb9262db5 100644 --- a/_search-plugins/knn/disk-based-vector-search.md +++ b/_search-plugins/knn/disk-based-vector-search.md @@ -167,7 +167,7 @@ GET my-vector-index/_search For [model-based indexes]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model), you can specify the `on_disk` parameter in the training request in the same way that you would specify it during index creation. By default, `on_disk` mode will use the [Faiss IVF method]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#supported-faiss-methods) and a compression level of `32x`. To run the training API, send the following request: ```json -POST /_plugins/_knn/models/_train/test-model +POST /_plugins/_knn/models/test-model/_train { "training_index": "train-index-name", "training_field": "train-field-name", diff --git a/_security/access-control/document-level-security.md b/_security/access-control/document-level-security.md index 352fe06a61..b17b60e147 100644 --- a/_security/access-control/document-level-security.md +++ b/_security/access-control/document-level-security.md @@ -13,6 +13,8 @@ Document-level security lets you restrict a role to a subset of documents in an ![Document- and field-level security screen in OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/images/security-dls.png) +The maximum size for the document-level security configuration is 1024 KB (1,048,404 characters). +{: .warning} ## Simple roles diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md index b9e35b2697..d13955f3f0 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md @@ -18,7 +18,7 @@ The searchable snapshot feature incorporates techniques like caching frequently To configure the searchable snapshots feature, create a node in your `opensearch.yml file` and define the node role as `search`. Optionally, you can also configure the `cache.size` property for the node. -A `search` node reserves storage for the cache to perform searchable snapshot queries. In the case of a dedicated search node where the node exclusively has the `search` role, this value defaults to a fixed percentage of available storage. In other cases, the value needs to be configured by the user using the `node.search.cache.size` setting. +A `search` node reserves storage for the cache to perform searchable snapshot queries. In the case of a dedicated search node where the node exclusively has the `search` role, this value defaults to a fixed percentage (80%) of available storage. In other cases, the value needs to be configured by the user using the `node.search.cache.size` setting. Parameter | Type | Description :--- | :--- | :---