
Expose a partition worker CreateSnapshot RPC #1998

Open
Wants to merge 1 commit into main from pcholakov/stack/1

Conversation

@pcholakov (Contributor) commented Sep 27, 2024

@pcholakov pcholakov changed the title Expose a partition worker CreateSnapshot RPC [Snapshots] Expose a partition worker CreateSnapshot RPC Sep 27, 2024

github-actions bot commented Sep 27, 2024

Test Results

15 files  ±0  15 suites  ±0   9m 15s ⏱️ -29s
 6 tests ±0   6 ✅ ±0  0 💤 ±0  0 ❌ ±0 
18 runs  ±0  18 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 8df89b3. ± Comparison against base commit 13b6fda.

♻️ This comment has been updated with latest results.

@pcholakov pcholakov changed the title [Snapshots] Expose a partition worker CreateSnapshot RPC Expose a partition worker CreateSnapshot RPC Sep 27, 2024
@pcholakov (Contributor · Author) commented:

Testing notes

With the new admin RPC exposed, we can now request a snapshot of the worker's partition store on demand:

> restatectl snapshots
Partition snapshots

Usage: restatectl snapshots [OPTIONS] <COMMAND>

Commands:
  create-snapshot  Create [aliases: create]
  help             Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose...                               Increase logging verbosity
  -q, --quiet...                                 Decrease logging verbosity
      --table-style <TABLE_STYLE>                Which table output style to use [default: compact] [possible values: compact, borders]
      --time-format <TIME_FORMAT>                [default: human] [possible values: human, iso8601, rfc2822]
  -y, --yes                                      Auto answer "yes" to confirmation prompts
      --connect-timeout <CONNECT_TIMEOUT>        Connection timeout for network calls, in milliseconds [default: 5000]
      --request-timeout <REQUEST_TIMEOUT>        Overall request timeout for network calls, in milliseconds [default: 13000]
      --cluster-controller <CLUSTER_CONTROLLER>  Cluster Controller host:port (e.g. http://localhost:5122/) [default: http://localhost:5122/]
  -h, --help                                     Print help (see more with '--help')
> restatectl snapshots create -p 1
Snapshot created: snap_12PclG04SN8eVSKYXCFgXx7

The server writes the snapshot to db-snapshots, relative to the base directory:

2024-09-26T07:31:49.261080Z INFO restate_admin::cluster_controller::service
  Create snapshot command received
    partition_id: PartitionId(1)
on rs:worker-0
2024-09-26T07:31:49.261133Z INFO restate_admin::cluster_controller::service
  Asking node to snapshot partition
    node_id: GenerationalNodeId(PlainNodeId(0), 3)
    partition_id: PartitionId(1)
on rs:worker-0
2024-09-26T07:31:49.261330Z INFO restate_worker::partition_processor_manager
  Received 'CreateSnapshotRequest { partition_id: PartitionId(1) }' from N0:3
on rs:worker-9
  in restate_core::network::connection_manager::network-reactor
    peer_node_id: N0:3
    protocol_version: 1
    task_id: 32
2024-09-26T07:31:49.264763Z INFO restate_worker::partition::snapshot_producer
  Partition snapshot written
    lsn: 3
    metadata: "/Users/pavel/restate/test/n1/db-snapshots/1/snap_12PclG04SN8eVSKYXCFgXx7/metadata.json"
on rt:pp-1

The snapshot metadata contains output similar to the following:

{
  "version": "V1",
  "cluster_name": "snap-test",
  "partition_id": 1,
  "node_name": "n1",
  "created_at": "2024-09-26T07:31:49.264522000Z",
  "snapshot_id": "snap_12PclG04SN8eVSKYXCFgXx7",
  "key_range": {
    "start": 9223372036854775808,
    "end": 18446744073709551615
  },
  "min_applied_lsn": 3,
  "db_comparator_name": "leveldb.BytewiseComparator",
  "files": [
    {
      "column_family_name": "",
      "name": "/000030.sst",
      "directory": "/Users/pavel/restate/test/n1/db-snapshots/1/snap_12PclG04SN8eVSKYXCFgXx7",
      "size": 1267,
      "level": 0,
      "start_key": "64650000000000000001010453454c46",
      "end_key": "667300000000000000010000000000000002",
      "smallest_seqno": 11,
      "largest_seqno": 12,
      "num_entries": 0,
      "num_deletions": 0
    },
    {
      "column_family_name": "",
      "name": "/000029.sst",
      "directory": "/Users/pavel/restate/test/n1/db-snapshots/1/snap_12PclG04SN8eVSKYXCFgXx7",
      "size": 1142,
      "level": 6,
      "start_key": "64650000000000000001010453454c46",
      "end_key": "667300000000000000010000000000000002",
      "smallest_seqno": 0,
      "largest_seqno": 0,
      "num_entries": 0,
      "num_deletions": 0
    }
  ]
}

This can later be used to restore the column family to the same state.
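For reference, a minimal sketch of how this metadata file could be parsed; the structs below simply mirror the JSON above and are not the actual types from the codebase:

// Minimal sketch of reading the snapshot metadata shown above. The struct and
// field names mirror the JSON but are illustrative, not the actual Restate types.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct KeyRange {
    start: u64,
    end: u64,
}

#[derive(Debug, Deserialize)]
struct SnapshotFile {
    column_family_name: String,
    name: String,
    directory: String,
    size: u64,
    level: u32,
    start_key: String,
    end_key: String,
    smallest_seqno: u64,
    largest_seqno: u64,
    num_entries: u64,
    num_deletions: u64,
}

#[derive(Debug, Deserialize)]
struct SnapshotMetadata {
    version: String,
    cluster_name: String,
    partition_id: u64,
    node_name: String,
    created_at: String,
    snapshot_id: String,
    key_range: KeyRange,
    min_applied_lsn: u64,
    db_comparator_name: String,
    files: Vec<SnapshotFile>,
}

// Read and deserialize a metadata.json file from a snapshot directory.
fn read_snapshot_metadata(path: &std::path::Path) -> anyhow::Result<SnapshotMetadata> {
    let raw = std::fs::read_to_string(path)?;
    let metadata: SnapshotMetadata = serde_json::from_str(&raw)?;
    Ok(metadata)
}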

type MessageType = CreateSnapshotRequest;

async fn on_message(&self, msg: Incoming<Self::MessageType>) {
info!("Received '{:?}' from {}", msg.body(), msg.peer());
Contributor:

Intentional? If so, does this need to be at INFO level?

Contributor Author:

It was, but I'm happy to turn it down or eliminate it altogether. It seemed like something that happens infrequently enough that it wouldn't hurt to log, but perhaps debug is much more appropriate here.

/// Default timeout for internal cluster RPC calls.
#[serde_as(as = "serde_with::DisplayFromStr")]
#[cfg_attr(feature = "schemars", schemars(with = "String"))]
pub rpc_call_timeout: humantime::Duration,
Contributor:

Unsure if this can be a single universal value. Expectations might vary a lot depending on the RPC itself. Perhaps we can define the timeout at the operation level instead?

Contributor Author:

I read that as: it shouldn't even be configurable then? For the create-snapshot internal operation, a sane value of a couple of seconds should suffice. I think it might make sense to have a general intra-cluster RPC timeout, but if we don't use it now, there's no need to pollute the config keys with it yet.

Contributor Author:

I wonder if we could integrate a default timeout into the define_rpc! macro, which could optionally be overridden per operation in config. The right config section then seems more like [worker], since it is the "server" handling the RPC. Thoughts on this approach?
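For illustration only, a rough sketch of what a per-operation timeout could look like on the worker options (hypothetical struct and field names, loosely modelled on the rpc_call_timeout field quoted above):

// Hypothetical per-operation timeout, sketched after the existing
// rpc_call_timeout field; these are illustrative names, not actual
// Restate configuration keys.
use serde::{Deserialize, Serialize};
use serde_with::serde_as;

#[serde_as]
#[derive(Debug, Serialize, Deserialize)]
pub struct WorkerOptions {
    /// Timeout for handling a create-snapshot request. A built-in default keeps
    /// this out of the config surface until somebody actually needs to tune it.
    #[serde_as(as = "serde_with::DisplayFromStr")]
    pub create_snapshot_timeout: humantime::Duration,
}

impl Default for WorkerOptions {
    fn default() -> Self {
        Self {
            // A couple of seconds should be plenty for producing a local snapshot.
            create_snapshot_timeout: std::time::Duration::from_secs(5).into(),
        }
    }
}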

@tillrohrmann (Contributor) left a comment:

Thanks for creating this PR @pcholakov. The changes look good to me. I think if we remove the general-purpose RPC timeout from the scope of this PR, it is good to get merged.

    node_id: GenerationalNodeId,
    partition_id: PartitionId,
) -> anyhow::Result<SnapshotId> {
    let snapshot_timeout = self.networking_options.rpc_call_timeout.as_ref();
Contributor:

Wondering whether the generic RPC call timeout is a good fit for the snapshot timeout. If the snapshot is large, then uploading it to S3 will probably take a bit of time.

Contributor Author:

Initially, I had the idea that the scope of the RPC is purely to produce the snapshot and write it to the filesystem, with a completely separate feedback mechanism for communicating the "offloaded" snapshot LSN back to (possibly) a metadata store entry. But I haven't given this a lot of thought, and we might well prefer for the upload to happen as part of CreateSnapshot. In any event, I have already made the change to eliminate the generic timeout - will push it shortly.
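For illustration, a minimal sketch of bounding the snapshot request with an operation-specific timeout; request_create_snapshot here is a hypothetical stand-in for the actual networking call, not the real API:

// Illustrative sketch of bounding the snapshot RPC with an operation-specific
// timeout rather than a generic rpc_call_timeout. request_create_snapshot is
// a hypothetical placeholder for the real networking call.
use std::time::Duration;

use anyhow::Context;

// Placeholder for the actual RPC to the partition processor manager.
async fn request_create_snapshot(partition_id: u16) -> anyhow::Result<String> {
    let _ = partition_id;
    Ok("snap_example".to_owned())
}

async fn create_snapshot_with_timeout(
    partition_id: u16,
    timeout: Duration,
) -> anyhow::Result<String> {
    // Fail with a clear error if producing (or later uploading) the snapshot
    // exceeds the budget, instead of hanging the admin command indefinitely.
    tokio::time::timeout(timeout, request_create_snapshot(partition_id))
        .await
        .context("timed out waiting for the partition worker to create a snapshot")?
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let snapshot_id = create_snapshot_with_timeout(1, Duration::from_secs(5)).await?;
    println!("Snapshot created: {snapshot_id}");
    Ok(())
}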

Three further review comments on crates/worker/src/partition_processor_manager.rs were marked outdated and resolved.
@pcholakov force-pushed the pcholakov/stack/1 branch 2 times, most recently from d621a61 to 76ddc3b on October 4, 2024 08:02
Comment on lines +68 to +69
PARTITION_CREATE_SNAPSHOT_REQUEST = 42;
PARTITION_CREATE_SNAPSHOT_RESPONSE = 43;
Contributor Author:

@AhmedSoliman I see you left some gaps elsewhere - would you like me to start these in a new range, e.g. 50+? I think we'll have more RPCs for managing worker nodes that could be grouped together.

stack-info: PR: #1998, branch: pcholakov/stack/1