[DOC] [v2.4.0] Update documentation for Searchable Snapshot Feature #1795

Merged 4 commits on Nov 4, 2022
119 changes: 119 additions & 0 deletions _opensearch/snapshots/searchable_snapshot.md
@@ -0,0 +1,119 @@
---
layout: default
title: Searchable snapshots
parent: Snapshots
nav_order: 40
has_children: false
---

# Searchable snapshots

Searchable snapshots is an experimental feature in OpenSearch 2.4, so we do not recommend using it in a production environment. For updates on progress, or to leave feedback that could help improve the feature, see the [searchable snapshot GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/2919).
{: .warning }

A searchable snapshot is an index whose data is read from a [snapshot repository]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/#register-repository) on demand at search time, rather than being downloaded in full to cluster storage at restore time. Because the index data remains in the snapshot format in the repository, searchable snapshot indexes are inherently read-only. Any attempt to write to a searchable snapshot index results in an error.

To enable the searchable snapshots feature, follow the steps below.

## Enabling the feature flag

There are several methods for enabling searchable snapshots, depending on the install type.

### Enable on a node using a tarball install

The flag is toggled using a new JVM parameter that is set either in `OPENSEARCH_JAVA_OPTS` or in `config/jvm.options`.

- Option 1: Update `config/jvm.options` by adding the following line:

```
-Dopensearch.experimental.feature.searchable_snapshot.enabled=true
```

- Option 2: Use the `OPENSEARCH_JAVA_OPTS` environment variable:

```bash
export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.searchable_snapshot.enabled=true"
```

- Option 3: For developers using Gradle, update `run.gradle` by adding the following lines:

```groovy
testClusters {
  runTask {
    testDistribution = 'archive'
    if (numZones > 1) numberOfZones = numZones
    if (numNodes > 1) numberOfNodes = numNodes
    systemProperty 'opensearch.experimental.feature.searchable_snapshot.enabled', 'true'
  }
}
```

- Finally, in your `opensearch.yml` file, name the node and assign it the `search` node role:

```yaml
node.name: snapshots-node
node.roles: [ search ]
```

### Enable with Docker containers

If you're running Docker, add the following line to `docker-compose.yml`, in the `environment` section of the `opensearch-node` service:

```yaml
- "OPENSEARCH_JAVA_OPTS=-Dopensearch.experimental.feature.searchable_snapshot.enabled=true" # Enables searchable snapshot
```

To create a node with the `search` node role, add the line `- node.roles=search` to the `environment` section of your `docker-compose.yml` file. Entries in an `environment` list take the `KEY=value` form:

```yaml
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.4.0
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - node.roles=search
```
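
Whichever install type you use, one way to confirm that a node picked up the `search` role is to inspect the `roles` array that the nodes info API (`GET /_nodes`) returns for each node. The following Python sketch filters an already parsed response. The helper name `nodes_with_search_role`, the node IDs, and the second node are hypothetical, and the sample response is heavily abbreviated.

```python
def nodes_with_search_role(nodes_info):
    """Return the sorted names of nodes whose roles include "search".

    `nodes_info` is the parsed JSON body of a nodes info (GET /_nodes) response.
    """
    return sorted(
        node.get("name", node_id)
        for node_id, node in nodes_info.get("nodes", {}).items()
        if "search" in node.get("roles", [])
    )


# Abbreviated, hypothetical response; real responses carry many more fields.
sample_nodes_info = {
    "nodes": {
        "node-id-1": {"name": "opensearch-node1", "roles": ["search"]},
        "node-id-2": {"name": "opensearch-node2", "roles": ["cluster_manager", "data"]},
    }
}

search_nodes = nodes_with_search_role(sample_nodes_info)
print(search_nodes)
```

An empty result would suggest that no node is configured with the `search` role yet.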

## Create a searchable snapshot index

To create a searchable snapshot index, specify the `remote_snapshot` storage type in a request to the [restore snapshots API]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/#restore-snapshots).

Request field | Description
:--- | :---
`storage_type` | `local` indicates that all snapshot metadata and index data will be downloaded to local storage. <br /><br /> `remote_snapshot` indicates that snapshot metadata will be downloaded to the cluster, but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured with the `search` node role in order to restore a snapshot using the type `remote_snapshot`. <br /><br /> Defaults to `local`.
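
As a minimal sketch of such a restore request, the following Python snippet builds, but does not send, a `POST` to the restore endpoint with `storage_type` set to `remote_snapshot`. The helper name `build_restore_request`, the host `http://localhost:9200`, and the names `my-repository`, `my-snapshot`, and `my-index` are placeholders chosen for illustration. A cluster that runs the security plugin would also need HTTPS and credentials, which are omitted here.

```python
import json
import urllib.request


def build_restore_request(base_url, repository, snapshot, indices):
    """Build (but do not send) a restore request that uses remote_snapshot storage."""
    body = {
        "indices": indices,
        # Read index data from the repository on demand instead of copying it locally.
        "storage_type": "remote_snapshot",
    }
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repository}/{snapshot}/_restore",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, repository, snapshot, and index names.
request = build_restore_request(
    "http://localhost:9200", "my-repository", "my-snapshot", "my-index"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Leaving out `storage_type`, or setting it to `local`, gives the regular restore behavior described in the table.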

## Listing indexes

To determine if an index is a searchable snapshot index, look for a store type with the value of `remote_snapshot`:

```
GET /my-index/_settings?pretty
```

```json
{
  "my-index": {
    "settings": {
      "index": {
        "store": {
          "type": "remote_snapshot"
        }
      }
    }
  }
}
```
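
To run this check from a script, a small helper can walk the parsed settings response shown above. This is a sketch under stated assumptions: the function name `is_searchable_snapshot` is hypothetical, and the input is assumed to be the JSON body of `GET /<index>/_settings` already parsed into a dictionary.

```python
def is_searchable_snapshot(settings_response, index_name):
    """Return True if the index's store type is "remote_snapshot"."""
    store = (
        settings_response.get(index_name, {})
        .get("settings", {})
        .get("index", {})
        .get("store", {})
    )
    return store.get("type") == "remote_snapshot"


# Same shape as the example response above.
sample_settings = {
    "my-index": {
        "settings": {"index": {"store": {"type": "remote_snapshot"}}}
    }
}

print(is_searchable_snapshot(sample_settings, "my-index"))
print(is_searchable_snapshot(sample_settings, "other-index"))
```

An index that is missing from the response, or that has no `store.type` setting, is treated as a regular index.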

## Potential use cases

Review comment (Collaborator): We need an intro sentence here. Something like "The following are potential use cases for the searchable snapshots feature:" is fine. Also, the two bulleted items would need to start with something other than "Users", as users are not use cases. Per the comments below, I would consider removing this section or folding it into the introduction.

- Users who wish to offload indexes from cluster-based storage, yet retain the ability to search them.
- Users who wish to keep a large number of indexes searchable on lower-cost storage media.

## Known limitations

- Accessing data from a remote repository is slower than local disk reads, so higher latencies on search queries are expected.
- Data is discarded immediately after being read, so subsequent searches for the same data must download it again. Future work will address this by implementing a disk-based cache for storing frequently accessed data.
- Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
- Searching remote data can impact the performance of other queries running on the same node. We recommend provisioning dedicated nodes with the `search` role for performance-critical applications.
1 change: 0 additions & 1 deletion _opensearch/snapshots/snapshot-restore.md
@@ -343,7 +343,6 @@ Request fields | Description
`rename_replacement` | If you want to rename indices as you restore them, use this option to specify the replacement pattern. Use `$0` to include the entire matching index name, `$1` to include the content of the first capture group, etc.
`index_settings` | If you want to change index settings on restore, specify them here.
`ignore_index_settings` | Rather than explicitly specifying new settings with `index_settings`, you can ignore certain index settings in the snapshot and use the cluster defaults on restore.
`storage_type` | `local` indicates that all snapshot metadata and index data will be downloaded to local storage. <br /><br > `remote_snapshot` indicates that snapshot metadata will be downloaded to the cluster, but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured with the [search role]({{site.url}}{{site.baseurl}}/security-plugin/access-control/users-roles/) in order to restore a snapshot using the type `remote_snapshot`. <br /><br > Defaults to `local`.


### Conflicts and compatibility