[DOC] Searchable Snapshot Feature #1684

Closed
1 of 4 tasks
andrross opened this issue Oct 25, 2022 · 6 comments
Assignees
Labels
snapshots v2.4.0 'Issues and PRs related to version v2.4.0'

Comments

@andrross
Member

andrross commented Oct 25, 2022

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request.
We would like to develop the documentation for the initial experimental release of the searchable snapshot feature in OpenSearch 2.4. The feature design proposal is here: opensearch-project/OpenSearch#3895

What other resources are available?

The searchable snapshot feature consists primarily of two API changes:

  • A new parameter is added to the snapshot restore API indicating the snapshot should be restored as a "searchable snapshot" as opposed to downloading all index data at restore time
  • A new node role is added to indicate that a node is capable of hosting a searchable snapshot index and serving queries.

I've included some thoughts on the documentation about these two areas below:

Snapshot Restore API

A new parameter will be introduced in the snapshot restore API: storage_type.

| Setting | Description |
| :--- | :--- |
| storage_type | Must be one of local or remote_snapshot. local is the default if not specified, and indicates that all snapshot metadata and index data will be downloaded to local instance storage. remote_snapshot indicates that snapshot metadata will be downloaded to the cluster but the remote repository will remain the authoritative store of the index data. Data will be downloaded on-demand as necessary to service queries. At least one node in the cluster must be configured for the search role in order to restore a snapshot of type remote_snapshot. |
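
As a rough illustration of the parameter described above, here is a minimal Python sketch that builds a restore request with storage_type set. The repository, snapshot, and index names are hypothetical placeholders, and `build_restore_request` is an illustrative helper, not part of any OpenSearch client. It only constructs the request and does not send it.

```python
import json
from urllib.parse import quote

# The two values the issue lists for storage_type.
VALID_STORAGE_TYPES = ("local", "remote_snapshot")


def build_restore_request(repository, snapshot, indices, storage_type="local"):
    """Return (method, path, body) for a snapshot restore call.

    Illustrative helper only; names passed in are placeholders.
    """
    if storage_type not in VALID_STORAGE_TYPES:
        raise ValueError(f"storage_type must be one of {VALID_STORAGE_TYPES}")
    # URL-encode the path segments so unusual names cannot break the path.
    path = f"/_snapshot/{quote(repository, safe='')}/{quote(snapshot, safe='')}/_restore"
    body = {"indices": indices, "storage_type": storage_type}
    return "POST", path, json.dumps(body)


# Hypothetical names, for illustration only.
method, path, body = build_restore_request(
    "my-repository", "my-snapshot", "my-index", storage_type="remote_snapshot"
)
print(method, path)
print(body)
```

Leaving storage_type out of the call falls back to local, matching the default described in the table.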

Node Role

This feature adds a new search role to the list of possible node roles. I can't seem to find existing documentation about all the existing built-in roles (data, cluster_manager, ingest, and remote_cluster_client), but here is an example of documentation for a dynamic role added by a plugin.
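
Since restoring with remote_snapshot requires at least one node with the search role, a small check like the following could be run before attempting a restore. This is a sketch that assumes the input is the parsed JSON of a nodes info response (a "nodes" map whose entries carry "name" and "roles"); the sample data is hand-written for illustration, not output from a real cluster.

```python
def nodes_with_role(nodes_info, role="search"):
    """Return the sorted names of nodes whose roles list contains `role`."""
    nodes = nodes_info.get("nodes", {})
    return sorted(
        info.get("name", node_id)
        for node_id, info in nodes.items()
        if role in info.get("roles", [])
    )


# Hand-written sample in the assumed response shape.
sample = {
    "nodes": {
        "a1": {"name": "node-1", "roles": ["cluster_manager", "data", "ingest"]},
        "b2": {"name": "node-2", "roles": ["search"]},
    }
}

search_nodes = nodes_with_role(sample)
if not search_nodes:
    print("No node has the search role; a remote_snapshot restore would be rejected.")
else:
    print("Nodes with the search role:", search_nodes)
```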

@kolchfa-aws kolchfa-aws added this to the v2.4 milestone Oct 25, 2022
@kolchfa-aws kolchfa-aws added the v2.4.0 'Issues and PRs related to version v2.4.0' label Oct 25, 2022
@ariamarble ariamarble self-assigned this Oct 26, 2022
@Naarcha-AWS Naarcha-AWS added the 2 - In progress Issue/PR: The issue or PR is in progress. label Oct 26, 2022
@ariamarble ariamarble added Closed - Complete Issue: Work is done and associated PRs closed and removed 2 - In progress Issue/PR: The issue or PR is in progress. labels Oct 27, 2022
@ariamarble
Contributor

Closing with PR #1749

@andrross
Member Author

andrross commented Oct 27, 2022

Left a comment on the PR right as it was being merged, but I think we'll want to model this documentation on the segment replication documentation. Specifically, this should be a stand-alone page with all the experimental feature caveats. We should probably wait until the feature is fully ready in a subsequent release before updating the main "restore snapshot" documentation with the new parameter.

@ariamarble ariamarble reopened this Nov 2, 2022
@ariamarble
Contributor

We'll work on breaking this off into its own section.

@ariamarble ariamarble added 2 - In progress Issue/PR: The issue or PR is in progress. and removed Closed - Complete Issue: Work is done and associated PRs closed labels Nov 2, 2022
@andrross
Member Author

andrross commented Nov 2, 2022

@ariamarble Here's a rough draft of the content I had in mind. I don't have a strong opinion as to exactly how it should be structured or what page it should be on.

Searchable snapshots

Searchable snapshots is an experimental feature in OpenSearch 2.4.
Therefore, we do not recommend the use of this feature in a production
environment. For updates on progress or if you want to leave feedback that
could help improve the feature, see the [searchable snapshot GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/2919).
{: .warning}
As an experimental feature, searchable snapshots will be behind a feature
flag and must be enabled on **each node** of a cluster.
{: .note }

(All the feature flag text can be copied from https://opensearch.org/docs/latest/opensearch/segment-replication/configuration/#enabling-the-feature-flag
just need to substitute in the following text for the actual flag value)

opensearch.experimental.feature.searchable_snapshot.enabled

A searchable snapshot index is an index where data is read from a snapshot repository on-demand at search time, rather than downloading all index data to cluster storage at restore time. Because the index data remains in the snapshot format in the repository, searchable snapshot indexes are inherently read-only. Any attempt to write to a searchable snapshot index will result in an error.

Create an index

Creating a searchable snapshot index is done by specifying the remote_snapshot storage type in the restore API:

| Request field | Description |
| :--- | :--- |
| storage_type | local indicates that all snapshot metadata and index data will be downloaded to local storage. remote_snapshot indicates that snapshot metadata will be downloaded to the cluster, but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured with the search role in order to restore a snapshot using the type remote_snapshot. Defaults to local. |

Listing indexes

To determine if an index is a searchable snapshot index, look for the store setting in the index settings with the value of remote_snapshot:

curl -X GET "https://localhost:9200/my-index/_settings?pretty" -ku admin:admin
{
  "my-index": {
    "settings": {
      "index": {
        "store": {
          "type": "remote_snapshot"
        }
      }
    }
  }
}
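
The same check can be scripted. The sketch below assumes the settings response has already been parsed into a dict with the shape shown above; `is_searchable_snapshot` is an illustrative helper name, not an OpenSearch API.

```python
def is_searchable_snapshot(settings_response, index_name):
    """Return True if the index's store type is remote_snapshot."""
    store_type = (
        settings_response.get(index_name, {})
        .get("settings", {})
        .get("index", {})
        .get("store", {})
        .get("type")
    )
    return store_type == "remote_snapshot"


# Parsed form of the example response above.
response = {
    "my-index": {
        "settings": {"index": {"store": {"type": "remote_snapshot"}}}
    }
}

print(is_searchable_snapshot(response, "my-index"))
```

An index with no store type in its settings is treated as not being a searchable snapshot index.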

Potential use cases

  • Users who wish to offload indexes from cluster-based storage yet retain the ability to search them

Known limitations

  • Accessing data from a remote repository is slower than local disk reads, so higher latencies on search queries are expected.
  • Data is discarded immediately after being read. Subsequent searches for the same data will have to download data again. Future work will address this by implementing a disk-based cache for storing frequently-accessed data.
  • Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
  • Searching remote data can impact the performance of other queries running on the same node. We recommend provisioning dedicated nodes with the search role for performance-critical applications.

@ariamarble
Contributor

Created PR - #1795 - with my early rough draft. I will incorporate these changes into that PR.

@ariamarble ariamarble added 3 - Tech review PR: Tech review in progress and removed 2 - In progress Issue/PR: The issue or PR is in progress. labels Nov 2, 2022
@ariamarble
Contributor

PR - #1795 - is ready for Tech Review if you want to review @andrross

@ariamarble ariamarble added 5 - Editorial review PR: Editorial review in progress and removed 3 - Tech review PR: Tech review in progress labels Nov 4, 2022
@ariamarble ariamarble removed the 5 - Editorial review PR: Editorial review in progress label Nov 8, 2022
4 participants