[DOC] Remote Store feature documentation changes for 2.10.0 release #4875

Closed
2 of 4 tasks
sachinpkale opened this issue Aug 24, 2023 · 4 comments · Fixed by #5037 or #5078
Labels
  • 3 - Done: Issue is done/complete
  • Sev2: High-medium priority. Upcoming release or incorrect information.
  • untriaged
  • v2.10.0

Comments

@sachinpkale
Member

sachinpkale commented Aug 24, 2023

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request. Provide a summary of the request and all versions that are affected.

  • We are planning to make the Remote Store feature generally available (GA) in the 2.10.0 release.
  • This feature is already documented here, but the documentation needs some changes for the 2.10.0 release.
  • We may also need to create separate pages to cover new changes, such as index metadata and benchmark numbers.
@hdhalter hdhalter added 1 - Backlog Issue: The issue is unassigned or assigned but not started v2.10.0 and removed untriaged labels Aug 24, 2023
@hdhalter hdhalter added this to the v2.10 milestone Aug 25, 2023
@hdhalter hdhalter added the Sev2 High-medium priority. Upcoming release or incorrect information. label Aug 25, 2023
@hdhalter hdhalter added 2 - In progress Issue/PR: The issue or PR is in progress. and removed 1 - Backlog Issue: The issue is unassigned or assigned but not started labels Sep 13, 2023
@hdhalter hdhalter added 3 - Done Issue is done/complete and removed 2 - In progress Issue/PR: The issue or PR is in progress. labels Sep 19, 2023
@sachinpkale
Member Author

@Naarcha-AWS The main page changes are still pending. Details are provided below.

@sachinpkale
Member Author

sachinpkale commented Sep 22, 2023

Remote Backed Storage: Main Page

Overview

Remote-backed storage offers OpenSearch users a new way to protect against data loss by automatically creating backups of all index operations and sending them to the configured remote store. When remote-backed storage is enabled for a cluster, the cluster uses segment replication as its replication type. See Segment replication for additional information.

With remote-backed storage, when a write request lands on the primary shard, the request is indexed to Lucene on the primary shard only. The corresponding translog is then uploaded to the remote translog store. OpenSearch does not send the write request to the replicas; instead, it performs a primary term validation to confirm that the shard that originated the request is still the primary shard. Primary term validation ensures that the acting primary shard fails if it becomes isolated and is unaware that the cluster manager has elected a new primary.
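Primary term validation can be pictured with a toy model. This is a simplified sketch, not OpenSearch's implementation: it assumes the check reduces to comparing the acting primary's term with the terms known to the other shard copies, relying on the fact that a newly promoted primary always receives a higher primary term. The function `validate_primary_term` is hypothetical.

```shell
#!/usr/bin/env bash
# Toy model only (hypothetical, not OpenSearch code): decide whether an acting
# primary may acknowledge a write, given its own primary term followed by the
# primary terms currently known to the other shard copies.
validate_primary_term() {
  local my_term="$1" other
  shift
  for other in "$@"; do
    # Primary terms only increase, so a higher term anywhere means a newer
    # primary was elected and this (possibly isolated) copy must fail.
    if [ "$other" -gt "$my_term" ]; then
      echo "fail"
      return 1
    fi
  done
  echo "ack"
}

validate_primary_term 3 3 3          # prints "ack"
validate_primary_term 3 4 3 || true  # prints "fail"
```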

Once segments are created on the primary shard as part of the refresh, flush, or merge flow, they are uploaded to the remote segment store, and the replica shards copy them from that store. This frees the primary shard from copying data to the replicas.
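The upload-then-copy flow can be simulated with local directories standing in for the primary's shard directory, the remote segment store, and a replica's shard directory. This is a toy sketch under that assumption: the file names are illustrative, and real segment replication involves checkpoints and metadata that the sketch omits.

```shell
#!/usr/bin/env bash
# Toy simulation: local directories stand in for the primary shard, the remote
# segment store, and a replica shard. File names are illustrative only.
set -euo pipefail

work_dir="$(mktemp -d)"
trap 'rm -rf "$work_dir"' EXIT
primary="$work_dir/primary"
remote="$work_dir/remote-segment-store"
replica="$work_dir/replica"
mkdir -p "$primary" "$remote" "$replica"

# 1. A refresh on the primary writes new segment files locally.
printf 'segment data\n' > "$primary/_0.cfs"
printf 'commit point\n' > "$primary/segments_1"

# 2. The primary uploads the new segment files to the remote segment store.
cp "$primary"/* "$remote"/

# 3. The replica copies the segments from the remote store; the primary
#    does not send any segment data to the replica directly.
cp "$remote"/* "$replica"/

ls "$replica"
```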

Because both the translog and the segment data are uploaded to the remote store, remote-backed storage achieves request-level durability.

Configuration

Remote-backed storage is a cluster-level setting and can only be enabled when the cluster is bootstrapped. Once the cluster is bootstrapped, the feature cannot be enabled or disabled. This helps provide durability guarantees at the cluster level.

Communication with the configured remote store happens through the repository plugin interface. As a result, all existing implementations of the repository plugin (Azure Blob Storage, Google Cloud Storage, Amazon S3, and so on) are compatible with remote-backed storage.

To enable remote-backed storage for a cluster, provide the repository details as node attributes in each node’s opensearch.yml file. A sample set of node attributes looks like the following:

# Repository names
node.attr.remote_store.segment.repository: my-repo-1
node.attr.remote_store.translog.repository: my-repo-2
node.attr.remote_store.state.repository: my-repo-3

# Segment repository settings
node.attr.remote_store.repository.my-repo-1.type: s3
node.attr.remote_store.repository.my-repo-1.settings.bucket: <Bucket Name 1>
node.attr.remote_store.repository.my-repo-1.settings.base_path: <Bucket Base Path 1>
node.attr.remote_store.repository.my-repo-1.settings.region: us-east-1

# Translog repository settings
node.attr.remote_store.repository.my-repo-2.type: s3
node.attr.remote_store.repository.my-repo-2.settings.bucket: <Bucket Name 2>
node.attr.remote_store.repository.my-repo-2.settings.base_path: <Bucket Base Path 2>
node.attr.remote_store.repository.my-repo-2.settings.region: us-east-1

# Cluster state repository settings
node.attr.remote_store.repository.my-repo-3.type: s3
node.attr.remote_store.repository.my-repo-3.settings.bucket: <Bucket Name 3>
node.attr.remote_store.repository.my-repo-3.settings.base_path: <Bucket Base Path 3>
node.attr.remote_store.repository.my-repo-3.settings.region: us-east-1
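Because the feature can only be enabled at bootstrap and every node needs these attributes, a quick pre-flight check of each node's opensearch.yml can catch a missing key early. A minimal sketch; the helper `check_remote_store_attrs` is hypothetical, and it only checks that the keys shown in the sample exist, not that their values are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that an opensearch.yml file names a segment,
# translog, and state repository and declares a type for each named repository.
# It only checks that the keys are present; it does not validate their values.
check_remote_store_attrs() {
  local file="$1" kind repo missing=0
  for kind in segment translog state; do
    # Read the repository name, for example "my-repo-1".
    repo="$(sed -n "s/^node\.attr\.remote_store\.${kind}\.repository:[[:space:]]*//p" "$file" \
      | head -n 1 | tr -d '[:space:]')"
    if [ -z "$repo" ]; then
      echo "missing: node.attr.remote_store.${kind}.repository"
      missing=1
    elif ! grep -q "^node\.attr\.remote_store\.repository\.${repo}\.type:" "$file"; then
      echo "missing: node.attr.remote_store.repository.${repo}.type"
      missing=1
    fi
  done
  # Non-zero exit status if any store is incompletely configured.
  return "$missing"
}
```

Run it against each node's file, for example `check_remote_store_attrs config/opensearch.yml`; it prints one line per incompletely configured store and returns a non-zero status if anything is missing.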

You don’t have to provide three different repositories for segment, translog, and state data; all three stores can share the same repository. Once a cluster is created with the remote_store settings, all indices created in that cluster start uploading data to the configured remote store.
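When all three stores share one repository, the node attributes reduce to a single repository definition referenced three times. A sketch that prints such a block, following the key pattern of the sample configuration above and assuming an S3 repository; `shared_repo_attrs` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints node attributes in which the segment, translog,
# and state stores all point at one S3 repository.
# Arguments: repository name, bucket, base path, region.
shared_repo_attrs() {
  local repo="$1" bucket="$2" base_path="$3" region="$4"
  cat <<EOF
node.attr.remote_store.segment.repository: ${repo}
node.attr.remote_store.translog.repository: ${repo}
node.attr.remote_store.state.repository: ${repo}
node.attr.remote_store.repository.${repo}.type: s3
node.attr.remote_store.repository.${repo}.settings.bucket: ${bucket}
node.attr.remote_store.repository.${repo}.settings.base_path: ${base_path}
node.attr.remote_store.repository.${repo}.settings.region: ${region}
EOF
}
```

For example, `shared_repo_attrs my-repo my-bucket remote-store us-east-1 >> config/opensearch.yml` (placeholder values) appends the block; run it with the same arguments on every node.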

Restore

To restore an index from a remote backup, such as in the event of a node failure, use one of the following two options:

Restore only unassigned shards

curl -X POST "https://localhost:9200/_remotestore/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": ["my-index-1", "my-index-2"]
}
'
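Before choosing the first option, you may want to see which indices actually have unassigned shards. A sketch, assuming you fetch shard states with the `_cat/shards` API restricted to the `index`, `shard`, `prirep`, and `state` columns, for example `curl -s "https://localhost:9200/_cat/shards?h=index,shard,prirep,state"`; the helper `unassigned_indices` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `_cat/shards` output with the columns
# index, shard, prirep, and state (in that order) from stdin and prints each
# index that has at least one UNASSIGNED shard, once per index.
unassigned_indices() {
  awk '$4 == "UNASSIGNED" { print $1 }' | sort -u
}
```

Piping the `_cat/shards` output into `unassigned_indices` lists each affected index once, ready for the `indices` array of the restore request. The list also includes indices whose only unassigned shards are replicas; check the `prirep` column (`p` or `r`) if you only care about primaries.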

Restore all the shards of a given index

curl -X POST "https://localhost:9200/_remotestore/_restore?restore_all_shards=true" -H 'Content-Type: application/json' -d'
{
  "indices": ["my-index-1"]
}
'
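When the list of indices is long or comes from a script, building the request body programmatically avoids hand-editing JSON. A minimal sketch; the helper `restore_body` is hypothetical, and it does not escape JSON because valid index names cannot contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the JSON body for a `_remotestore/_restore`
# request from the index names passed as arguments.
restore_body() {
  local body='{"indices": [' sep='' index
  for index in "$@"; do
    body+="${sep}\"${index}\""
    sep=', '
  done
  printf '%s]}\n' "$body"
}
```

For example, `restore_body my-index-1 my-index-2 | curl -X POST "https://localhost:9200/_remotestore/_restore" -H 'Content-Type: application/json' --data-binary @-` sends the same body as the first restore request; append `?restore_all_shards=true` to the URL for the second option.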

Related Settings

You can use the following cluster-level settings to tune your cluster for your workload. For more information about these settings, see https://opensearch.org/docs/latest/api-reference/cluster-api/cluster-settings/

  1. cluster.default.index.refresh_interval
  2. cluster.minimum.index.refresh_interval
  3. cluster.remote_store.translog.buffer_interval
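These settings are updated through the cluster settings API. A minimal sketch, assuming the setting you pick is dynamic in your OpenSearch version and that you want a persistent update; the helper `cluster_setting_body` is hypothetical, and any value you pass (such as `300ms` below) is an example, not a tuning recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a persistent cluster settings update body
# for one setting key and value.
cluster_setting_body() {
  local key="$1" value="$2"
  printf '{"persistent": {"%s": "%s"}}\n' "$key" "$value"
}
```

For example, `cluster_setting_body cluster.remote_store.translog.buffer_interval 300ms | curl -X PUT "https://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' --data-binary @-` applies one setting.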

Next Steps

  1. Track future enhancements to remote-backed storage: [Meta] Remote Store: Future Enhancements OpenSearch#10181

@Naarcha-AWS Naarcha-AWS reopened this Sep 22, 2023
@Naarcha-AWS
Collaborator

Reopening this issue for now.
