From f03cad50a96ea9a0c4cf4c0d12c0f9886b54998e Mon Sep 17 00:00:00 2001
From: Aria Marble <111301581+ariamarble@users.noreply.github.com>
Date: Fri, 4 Nov 2022 13:02:35 -0600
Subject: [PATCH] [DOC] [v2.4.0] Update documentation for Searchable Snapshot Feature (#1795)

* Update documentation for Searchable Snapshot Feature

Signed-off-by: ariamarble

* tons of updates

Signed-off-by: ariamarble

* minor change to return page to default

Signed-off-by: ariamarble

* made suggested changes

Signed-off-by: ariamarble
Signed-off-by: ariamarble
---
 _opensearch/snapshots/searchable_snapshot.md | 119 +++++++++++++++++++
 _opensearch/snapshots/snapshot-restore.md    |   1 -
 2 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 _opensearch/snapshots/searchable_snapshot.md

diff --git a/_opensearch/snapshots/searchable_snapshot.md b/_opensearch/snapshots/searchable_snapshot.md
new file mode 100644
index 0000000000..81c639e632
--- /dev/null
+++ b/_opensearch/snapshots/searchable_snapshot.md
@@ -0,0 +1,119 @@
+---
+layout: default
+title: Searchable snapshots
+parent: Snapshots
+nav_order: 40
+has_children: false
+---
+
+# Searchable snapshots
+
+Searchable snapshots is an experimental feature in OpenSearch 2.4. We therefore do not recommend using it in a production environment. To follow progress or leave feedback that could help improve the feature, see the [searchable snapshot GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/2919).
+{: .warning }
+
+A searchable snapshot is an index where data is read from a [snapshot repository]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/#register-repository) on demand at search time, rather than downloading all index data to cluster storage at restore time. Because the index data remains in the snapshot format in the repository, searchable snapshot indexes are inherently read-only. Any attempt to write to a searchable snapshot index will result in an error.
+
+To enable the searchable snapshots feature, follow the steps below.
+
+## Enabling the feature flag
+
+There are several methods for enabling searchable snapshots, depending on the install type.
+
+### Enable on a node using a tarball install
+
+The feature is enabled with a JVM parameter that is set either in `OPENSEARCH_JAVA_OPTS` or in config/jvm.options.
+
+- Option 1: Update config/jvm.options by adding the following line:
+
+    ```
+    -Dopensearch.experimental.feature.searchable_snapshot.enabled=true
+    ```
+
+- Option 2: Use the `OPENSEARCH_JAVA_OPTS` environment variable:
+
+    ```bash
+    export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.searchable_snapshot.enabled=true"
+    ```
+- Option 3: For developers using Gradle, update run.gradle by adding the following lines:
+
+    ```groovy
+    testClusters {
+        runTask {
+            testDistribution = 'archive'
+            if (numZones > 1) numberOfZones = numZones
+            if (numNodes > 1) numberOfNodes = numNodes
+            systemProperty 'opensearch.experimental.feature.searchable_snapshot.enabled', 'true'
+        }
+    }
+    ```
+
+- Finally, whichever option you use, define a node with the `search` role in your opensearch.yml file:
+
+    ```yml
+    node.name: snapshots-node
+    node.roles: [ search ]
+    ```
+
+### Enable with Docker containers
+
+If you're running Docker, add the following line to the `environment` section of the `opensearch-node` service in docker-compose.yml:
+
+```yml
+- "OPENSEARCH_JAVA_OPTS=-Dopensearch.experimental.feature.searchable_snapshot.enabled=true" # Enables searchable snapshot
+```
+
+To create a node with the `search` node role, add the line `- node.roles=search` to your docker-compose.yml file:
+
+```yml
+version: '3'
+services:
+  opensearch-node1:
+    image: opensearchproject/opensearch:2.4.0
+    container_name: opensearch-node1
+    environment:
+      - cluster.name=opensearch-cluster
+      - node.name=opensearch-node1
+      - node.roles=search
+```
+
+## Create a searchable snapshot index
+
+To create a searchable snapshot index, specify the `remote_snapshot` storage type in a request to the [restore snapshots API]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/#restore-snapshots).
+
+Request field | Description
+:--- | :---
+`storage_type` | `local` indicates that all snapshot metadata and index data will be downloaded to local storage. <br /><br /> `remote_snapshot` indicates that snapshot metadata will be downloaded to the cluster, but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured with the `search` node role in order to restore a snapshot using the type `remote_snapshot`. <br /><br /> Defaults to `local`.
+
+## Listing indexes
+
+To determine whether an index is a searchable snapshot index, look for a store type with the value `remote_snapshot`:
+
+```
+GET /my-index/_settings?pretty
+```
+
+```json
+{
+  "my-index": {
+    "settings": {
+      "index": {
+        "store": {
+          "type": "remote_snapshot"
+        }
+      }
+    }
+  }
+}
+```
+
+## Potential use cases
+
+- Users who wish to offload indexes from cluster-based storage, yet retain the ability to search them.
+- Users who wish to keep a large number of indexes searchable on lower-cost storage media.
+
+## Known limitations
+
+- Accessing data from a remote repository is slower than local disk reads, so higher latencies on search queries are expected.
+- Data is discarded immediately after being read, so subsequent searches for the same data must download it again. Future work will address this by implementing a disk-based cache for frequently accessed data.
+- Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
+- Searching remote data can impact the performance of other queries running on the same node. We recommend provisioning dedicated nodes with the `search` role for performance-critical applications.
\ No newline at end of file
diff --git a/_opensearch/snapshots/snapshot-restore.md b/_opensearch/snapshots/snapshot-restore.md
index b258c225c4..3ac3537470 100644
--- a/_opensearch/snapshots/snapshot-restore.md
+++ b/_opensearch/snapshots/snapshot-restore.md
@@ -343,7 +343,6 @@ Request fields | Description
 `rename_replacement` | If you want to rename indices as you restore them, use this option to specify the replacement pattern. Use `$0` to include the entire matching index name, `$1` to include the content of the first capture group, etc.
 `index_settings` | If you want to change index settings on restore, specify them here.
 `ignore_index_settings` | Rather than explicitly specifying new settings with `index_settings`, you can ignore certain index settings in the snapshot and use the cluster defaults on restore.
-`storage_type` | `local` indicates that all snapshot metadata and index data will be downloaded to local storage. <br /><br /> `remote_snapshot` indicates that snapshot metadata will be downloaded to the cluster, but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured with the [search role]({{site.url}}{{site.baseurl}}/security-plugin/access-control/users-roles/) in order to restore a snapshot using the type `remote_snapshot`. <br /><br /> Defaults to `local`.

 ### Conflicts and compatibility