From 3b405c84f5c8f08c2a40fe239b6fa692ac1f9f08 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E5=BC=A0=E7=82=8E=E6=B3=BC?= Date: Mon, 13 Nov 2023 10:35:46 +0800 Subject: [PATCH] Doc: refine FAQ, add "What actions are required when a node restarts" --- openraft/src/docs/faq/faq.md | 125 ++++++++++++++++++++--------------- 1 file changed, 71 insertions(+), 54 deletions(-) diff --git a/openraft/src/docs/faq/faq.md b/openraft/src/docs/faq/faq.md index 7ad1d5089..90008fc0a 100644 --- a/openraft/src/docs/faq/faq.md +++ b/openraft/src/docs/faq/faq.md @@ -1,76 +1,93 @@ # FAQ -- **🤔 Why is log id a tuple of `(term, node_id, log_index)`, while standard Raft uses just - `(term, log_index)`**? +### Why is log id a tuple of `(term, node_id, log_index)`? - 💡 The log id `(term, node_id, log_index)` is used to minimize the chance of election conflicts. - This way in every term there could be more than one leaders elected, and the last one is valid. - See: [`leader-id`](`crate::docs::data::leader_id`) for details. -

+In standard Raft log id is `(term, log_index)`, in Openraft he log id `(term, +node_id, log_index)` is used to minimize the chance of election conflicts. +This way in every term there could be more than one leaders elected, and the last one is valid. +See: [`leader-id`](`crate::docs::data::leader_id`) for details. +
-- **🤔 How to remove node-2 safely from a cluster `{1, 2, 3}`**? +### How to remove node-2 safely from a cluster `{1, 2, 3}`? - 💡 Call `Raft::change_membership(btreeset!{1, 3})` to exclude node-2 from - the cluster. Then wipe out node-2 data. - **NEVER** modify/erase the data of any node that is still in a raft cluster, unless you know what you are doing. -

+Call `Raft::change_membership(btreeset!{1, 3})` to exclude node-2 from +the cluster. Then wipe out node-2 data. +**NEVER** modify/erase the data of any node that is still in a raft cluster, unless you know what you are doing. +
-- **🤔 Can I wipe out the data of ONE node and wait for the leader to replicate all data to it again**? +### What actions are required when a node restarts? - 💡 Avoid doing this. Doing so will panic the leader. But it is permitted - if [`loosen-follower-log-revert`] feature flag is enabled. +None. No calls, e.g., to either [`add_learner()`][] or [`change_membership()`][] +are necessary. - In a raft cluster, although logs are replicated to multiple nodes, - wiping out a node and restarting it is still possible to cause data loss. - Assumes the leader is `N1`, followers are `N2, N3, N4, N5`: - - A log(`a`) that is replicated by `N1` to `N2, N3` is considered committed. - - At this point, if `N3` is replaced with an empty node, and at once the leader `N1` is crashed. Then `N5` may elected as a new leader with granted vote by `N3, N4`; - - Then the new leader `N5` will not have log `a`. +Openraft maintains the membership configuration in [`Membership`][] for for all +nodes in the cluster, including voters and non-voters (learners). When a +`follower` or `learner` restarts, the leader will automatically re-establish +replication. - ```text - Ni: Node i - Lj: Leader at term j - Fj: Follower at term j - N1 | L1 a crashed - N2 | F1 a - N3 | F1 a erased F2 - N4 | F2 - N5 | elect L2 - ----------------------------+---------------> time - Data loss: N5 does not have log `a` - ``` +### Can I wipe out the data of ONE node and wait for the leader to replicate all data to it again? - But for even number nodes cluster, Erasing **exactly one** node won't cause data loss. - Thus, in a special scenario like this, or for testing purpose, you can use - `--feature loosen-follower-log-revert` to permit erasing a node. -

+Avoid doing this. Doing so will panic the leader. But it is permitted +if [`loosen-follower-log-revert`] feature flag is enabled. +In a raft cluster, although logs are replicated to multiple nodes, +wiping out a node and restarting it is still possible to cause data loss. +Assumes the leader is `N1`, followers are `N2, N3, N4, N5`: +- A log(`a`) that is replicated by `N1` to `N2, N3` is considered committed. +- At this point, if `N3` is replaced with an empty node, and at once the leader + `N1` is crashed. Then `N5` may elected as a new leader with granted vote by + `N3, N4`; +- Then the new leader `N5` will not have log `a`. -- **🤔 Is Openraft resilient to incorrectly configured clusters?** +```text +Ni: Node i +Lj: Leader at term j +Fj: Follower at term j - 💡 No, Openraft, like standard raft, cannot identify errors in cluster configuration. +N1 | L1 a crashed +N2 | F1 a +N3 | F1 a erased F2 +N4 | F2 +N5 | elect L2 +----------------------------+---------------> time + Data loss: N5 does not have log `a` +``` - A common error is the assigning a wrong network addresses to a node. In such - a scenario, if this node becomes the leader, it will attempt to replicate - logs to itself. This will cause Openraft to panic because replication - messages can only be received by a follower. +But for even number nodes cluster, Erasing **exactly one** node won't cause data loss. +Thus, in a special scenario like this, or for testing purpose, you can use +`--feature loosen-follower-log-revert` to permit erasing a node. +
- ```text - thread 'main' panicked at openraft/src/engine/engine_impl.rs:793:9: - assertion failed: self.internal_server_state.is_following() - ``` - ```ignore - // openraft/src/engine/engine_impl.rs:793 - pub(crate) fn following_handler(&mut self) -> FollowingHandler { - debug_assert!(self.internal_server_state.is_following()); - // ... - } - ``` +### Is Openraft resilient to incorrectly configured clusters? -

+No, Openraft, like standard raft, cannot identify errors in cluster configuration. + +A common error is the assigning a wrong network addresses to a node. In such +a scenario, if this node becomes the leader, it will attempt to replicate +logs to itself. This will cause Openraft to panic because replication +messages can only be received by a follower. + +```text +thread 'main' panicked at openraft/src/engine/engine_impl.rs:793:9: +assertion failed: self.internal_server_state.is_following() +``` + +```ignore +// openraft/src/engine/engine_impl.rs:793 +pub(crate) fn following_handler(&mut self) -> FollowingHandler { + debug_assert!(self.internal_server_state.is_following()); + // ... +} +``` + +
[`loosen-follower-log-revert`]: `crate::docs::feature_flags#loosen_follower_log_revert` + +[`add_learner()`]: `crate::Raft::add_learner` +[`change_membership()`]: `crate::Raft::change_membership` +[`Membership`]: `crate::Membership`