Add a Partition Store snapshot restore policy #1999

Draft · wants to merge 1 commit into base: pcholakov/stack/1 from pcholakov/stack/2
Conversation

@pcholakov (Contributor) commented Sep 27, 2024

pcholakov added a commit that referenced this pull request Sep 27, 2024
stack-info: PR: #1999, branch: pcholakov/stack/2
@pcholakov pcholakov changed the title Add a Partition Store snapshot restore policy [Snapshots] Add a Partition Store snapshot restore policy Sep 27, 2024

github-actions bot commented Sep 27, 2024

Test Results

15 files  ±0  15 suites  ±0   9m 46s ⏱️ +2s
 6 tests ±0   6 ✅ ±0  0 💤 ±0  0 ❌ ±0 
18 runs  ±0  18 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 66c9f49. ± Comparison against base commit 13b6fda.

♻️ This comment has been updated with latest results.

@pcholakov pcholakov changed the base branch from pcholakov/stack/1 to main September 27, 2024 16:36
pcholakov added a commit that referenced this pull request Sep 27, 2024
stack-info: PR: #1999, branch: pcholakov/stack/2
@pcholakov pcholakov changed the title [Snapshots] Add a Partition Store snapshot restore policy Add a Partition Store snapshot restore policy Sep 27, 2024
@pcholakov pcholakov changed the base branch from main to pcholakov/stack/1 September 27, 2024 16:37
@pcholakov (Contributor, Author) commented:

Testing

With this change, we introduce the ability to restore from a snapshot when the partition store is empty. We can test this by dropping the partition's column family and restarting restate-server with restore enabled. Start a restate-server with the relevant config (the restore policy setting is shown further down), then walk through the steps below.
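In outline, the full test sequence (using the same partition ids and paths as the examples that follow) looks like this:

> restatectl snapshots create -p 0      # create a snapshot for the partition
> restatectl logs trim -l 0 -t 1000     # optional: trim the log so replay from Bifrost is impossible
# with restate-server stopped, drop the partition's column family:
> rocksdb_ldb drop_column_family --db=../test/n1/db data-0
# restart restate-server with snapshot-restore-policy = "on-init" under [worker]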

Create a snapshot:

> restatectl snapshots
Partition snapshots

Usage: restatectl snapshots [OPTIONS] <COMMAND>

Commands:
  create-snapshot  Create [aliases: create]
  help             Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose...                               Increase logging verbosity
  -q, --quiet...                                 Decrease logging verbosity
      --table-style <TABLE_STYLE>                Which table output style to use [default: compact] [possible values: compact, borders]
      --time-format <TIME_FORMAT>                [default: human] [possible values: human, iso8601, rfc2822]
  -y, --yes                                      Auto answer "yes" to confirmation prompts
      --connect-timeout <CONNECT_TIMEOUT>        Connection timeout for network calls, in milliseconds [default: 5000]
      --request-timeout <REQUEST_TIMEOUT>        Overall request timeout for network calls, in milliseconds [default: 13000]
      --cluster-controller <CLUSTER_CONTROLLER>  Cluster Controller host:port (e.g. http://localhost:5122/) [default: http://localhost:5122/]
  -h, --help                                     Print help (see more with '--help')
> restatectl snapshots create -p 1
Snapshot created: snap_12PclG04SN8eVSKYXCFgXx7

The server writes the snapshot on demand:

2024-09-26T07:31:49.261080Z INFO restate_admin::cluster_controller::service
  Create snapshot command received
    partition_id: PartitionId(1)
on rs:worker-0
2024-09-26T07:31:49.261133Z INFO restate_admin::cluster_controller::service
  Asking node to snapshot partition
    node_id: GenerationalNodeId(PlainNodeId(0), 3)
    partition_id: PartitionId(1)
on rs:worker-0
2024-09-26T07:31:49.261330Z INFO restate_worker::partition_processor_manager
  Received 'CreateSnapshotRequest { partition_id: PartitionId(1) }' from N0:3
on rs:worker-9
  in restate_core::network::connection_manager::network-reactor
    peer_node_id: N0:3
    protocol_version: 1
    task_id: 32
2024-09-26T07:31:49.264763Z INFO restate_worker::partition::snapshot_producer
  Partition snapshot written
    lsn: 3
    metadata: "/Users/pavel/restate/test/n1/db-snapshots/1/snap_12PclG04SN8eVSKYXCFgXx7/metadata.json"
on rt:pp-1

Sample metadata file: snap_12PclG04SN8eVSKYXCFgXx7/metadata.json

{
  "version": "V1",
  "cluster_name": "snap-test",
  "partition_id": 1,
  "node_name": "n1",
  "created_at": "2024-09-26T07:31:49.264522000Z",
  "snapshot_id": "snap_12PclG04SN8eVSKYXCFgXx7",
  "key_range": {
    "start": 9223372036854775808,
    "end": 18446744073709551615
  },
  "min_applied_lsn": 3,
  "db_comparator_name": "leveldb.BytewiseComparator",
  "files": [
    {
      "column_family_name": "",
      "name": "/000030.sst",
      "directory": "/Users/pavel/restate/test/n1/db-snapshots/1/snap_12PclG04SN8eVSKYXCFgXx7",
      "size": 1267,
      "level": 0,
      "start_key": "64650000000000000001010453454c46",
      "end_key": "667300000000000000010000000000000002",
      "smallest_seqno": 11,
      "largest_seqno": 12,
      "num_entries": 0,
      "num_deletions": 0
    },
    {
      "column_family_name": "",
      "name": "/000029.sst",
      "directory": "/Users/pavel/restate/test/n1/db-snapshots/1/snap_12PclG04SN8eVSKYXCFgXx7",
      "size": 1142,
      "level": 6,
      "start_key": "64650000000000000001010453454c46",
      "end_key": "667300000000000000010000000000000002",
      "smallest_seqno": 0,
      "largest_seqno": 0,
      "num_entries": 0,
      "num_deletions": 0
    }
  ]
}
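As an aside, the key_range bounds in the metadata decode cleanly. A minimal Rust check (illustrative only, not part of this PR) shows they cover exactly the upper half of the unsigned 64-bit key space, which is consistent with partition 1 when the key space is split across two partitions (the two-partition split is an assumption here):

fn main() {
    // Values copied from the metadata above.
    let start: u64 = 9223372036854775808;
    let end: u64 = 18446744073709551615;

    // start == 2^63 and end == u64::MAX, i.e. the upper half of the
    // 64-bit key space (assumption: keys are range-partitioned and this
    // cluster splits the space across two partitions).
    assert_eq!(start, 1u64 << 63);
    assert_eq!(end, u64::MAX);
}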

Likewise, create a snapshot for partition 0, which is the partition restored later in this test:

> restatectl snapshots create -p 0

Optionally, we can also trim the log to prevent replay from Bifrost.

> restatectl logs trim -l 0 -t 1000

With Restate stopped, we drop the partition store:

> rocksdb_ldb drop_column_family --db=../test/n1/db data-0

Using this config:

[worker]
snapshot-restore-policy = "on-init"

When the Restate server comes back up, we can see that it successfully restores from the latest snapshot:

2024-09-27T15:39:27.704350Z INFO restate_partition_store::partition_store_manager
  Restoring partition from snapshot
    partition_id: PartitionId(0)
    snapshot_id: snap_16mzxFw4Ve8MPbfVRKOwBON
    lsn: Lsn(9636)
on rt:pp-0
2024-09-27T15:39:27.704415Z INFO restate_partition_store::partition_store_manager
  Initializing partition store from snapshot
    partition_id: PartitionId(0)
    min_applied_lsn: Lsn(9636)
on rt:pp-0
2024-09-27T15:39:27.717951Z INFO restate_worker::partition
  PartitionProcessor starting up.
on rt:pp-0
  in restate_worker::partition::run
    partition_id: 0

@pcholakov pcholakov changed the base branch from pcholakov/stack/1 to main September 30, 2024 17:07
pcholakov added a commit that referenced this pull request Sep 30, 2024
stack-info: PR: #1999, branch: pcholakov/stack/2
@pcholakov pcholakov changed the base branch from main to pcholakov/stack/1 September 30, 2024 17:07
@AhmedSoliman (Contributor) commented:

I'm not sure how this ties into the bigger picture for partition store recovery, so maybe we should hide the configuration option until we have an end-to-end design specced out. The primary unanswered question is who makes the decision, and where the knowledge about the snapshot comes from: one option is the cluster controller passing this information down through the attachment plan; the other is that it's self-decided, as you are proposing here.

I can see one fallback strategy that follows your proposal: if we don't have a local partition store and we didn't get information about a snapshot to restore, then try to fetch one. But I guess we'll need to check the trim point of the log to figure out whether the snapshot we have is good enough before we commit to being a follower or leader. A rough sketch of that decision follows below.
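To make that fallback concrete, here is a sketch of the decision logic (purely illustrative: none of these types, fields, or function names are actual Restate APIs):

// Purely illustrative -- these types and names are not actual Restate APIs.
struct Lsn(u64);

struct Snapshot {
    min_applied_lsn: Lsn,
}

enum Bootstrap {
    UseLocalStore,
    RestoreFromSnapshot(Snapshot),
    CannotBootstrap, // neither local state nor a usable snapshot
}

fn decide_bootstrap(
    has_local_store: bool,
    snapshot: Option<Snapshot>, // from the attachment plan, or self-discovered
    log_trim_point: Lsn,
) -> Bootstrap {
    if has_local_store {
        return Bootstrap::UseLocalStore;
    }
    match snapshot {
        // The snapshot is only good enough if nothing needed for catch-up
        // has been trimmed away: every record after its applied LSN must
        // still be readable from the log.
        Some(s) if s.min_applied_lsn.0 >= log_trim_point.0 => {
            Bootstrap::RestoreFromSnapshot(s)
        }
        _ => Bootstrap::CannotBootstrap,
    }
}

fn main() {
    // Using numbers from this PR's test run: snapshot at LSN 9636, log
    // trimmed up to LSN 1000 -- the snapshot is usable.
    let decision = decide_bootstrap(
        false,
        Some(Snapshot { min_applied_lsn: Lsn(9636) }),
        Lsn(1000),
    );
    assert!(matches!(decision, Bootstrap::RestoreFromSnapshot(_)));
}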

@pcholakov pcholakov changed the base branch from pcholakov/stack/1 to main October 4, 2024 08:00
pcholakov added a commit that referenced this pull request Oct 4, 2024
stack-info: PR: #1999, branch: pcholakov/stack/2
@pcholakov pcholakov changed the base branch from main to pcholakov/stack/1 October 4, 2024 08:00
@pcholakov pcholakov changed the base branch from pcholakov/stack/1 to main October 4, 2024 08:02
pcholakov added a commit that referenced this pull request Oct 4, 2024
stack-info: PR: #1999, branch: pcholakov/stack/2
@pcholakov pcholakov changed the base branch from main to pcholakov/stack/1 October 4, 2024 08:02
pcholakov added a commit that referenced this pull request Oct 4, 2024
stack-info: PR: #1999, branch: pcholakov/stack/2
@pcholakov pcholakov changed the base branch from pcholakov/stack/1 to main October 4, 2024 08:07
@pcholakov pcholakov changed the base branch from main to pcholakov/stack/1 October 4, 2024 08:07
@pcholakov (Contributor, Author) commented:

Chatting with @tillrohrmann this morning, we figured it's probably better to park this PR for now, until we have a better idea of how the bootstrap process will fit into the cluster control plane overall. This was useful to demo that restoring partition stores works, but it's likely not the long-term experience we want.

@pcholakov pcholakov marked this pull request as draft October 4, 2024 08:46