Doc: refine FAQ, add "What actions are required when a node restarts"
drmingdrmer committed Nov 13, 2023
1 parent 85719cc commit 3b405c8
Showing 1 changed file with 71 additions and 54 deletions.
125 changes: 71 additions & 54 deletions openraft/src/docs/faq/faq.md
@@ -1,76 +1,93 @@
# FAQ

- **🤔 Why is log id a tuple of `(term, node_id, log_index)`, while standard Raft uses just
`(term, log_index)`**?
### Why is log id a tuple of `(term, node_id, log_index)`?

💡 The log id `(term, node_id, log_index)` is used to minimize the chance of election conflicts.
This way in every term there could be more than one leaders elected, and the last one is valid.
See: [`leader-id`](`crate::docs::data::leader_id`) for details.
<br/><br/>
In standard Raft the log id is `(term, log_index)`. In Openraft the log id `(term,
node_id, log_index)` is used to minimize the chance of election conflicts.
This way more than one leader may be elected in the same term, and the last one elected is the valid one.
See: [`leader-id`](`crate::docs::data::leader_id`) for details.
<br/>
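
As a rough illustration of how a three-field log id can be ordered, the sketch below models it as a plain struct with a derived lexicographic ordering. The `LogId` struct here is a hypothetical stand-in written for this example, not Openraft's actual type; it only assumes that ids compare by `term`, then `node_id`, then `log_index`.

```rust
/// Hypothetical model of a log id; NOT the Openraft type.
/// The derived `Ord` compares fields in declaration order:
/// `term` first, then `node_id`, then `index`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct LogId {
    term: u64,
    node_id: u64,
    index: u64,
}

fn main() {
    // Two leaders elected in the same term 5, on node 1 and node 2.
    let from_node_1 = LogId { term: 5, node_id: 1, index: 10 };
    let from_node_2 = LogId { term: 5, node_id: 2, index: 3 };

    // Same term: `node_id` breaks the tie, even though node 1 has a larger index.
    assert!(from_node_1 < from_node_2);

    // A higher term always wins, regardless of `node_id` or `index`.
    let next_term = LogId { term: 6, node_id: 0, index: 0 };
    assert!(next_term > from_node_2);

    println!("{:?} < {:?} < {:?}", from_node_1, from_node_2, next_term);
}
```

Under this assumed ordering, two candidates that win in the same term still produce totally ordered log ids, so the conflict can be resolved without forcing another election round.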


- **🤔 How to remove node-2 safely from a cluster `{1, 2, 3}`**?
### How to remove node-2 safely from a cluster `{1, 2, 3}`?

💡 Call `Raft::change_membership(btreeset!{1, 3})` to exclude node-2 from
the cluster. Then wipe out node-2 data.
**NEVER** modify/erase the data of any node that is still in a raft cluster, unless you know what you are doing.
<br/><br/>
Call `Raft::change_membership(btreeset!{1, 3})` to exclude node-2 from
the cluster. Then wipe out node-2 data.
**NEVER** modify/erase the data of any node that is still in a raft cluster, unless you know what you are doing.
<br/>


- **🤔 Can I wipe out the data of ONE node and wait for the leader to replicate all data to it again**?
### What actions are required when a node restarts?

💡 Avoid doing this. Doing so will panic the leader. But it is permitted
if [`loosen-follower-log-revert`] feature flag is enabled.
None. No calls, e.g., to either [`add_learner()`][] or [`change_membership()`][]
are necessary.

In a raft cluster, although logs are replicated to multiple nodes,
wiping out a node and restarting it is still possible to cause data loss.
Assumes the leader is `N1`, followers are `N2, N3, N4, N5`:
- A log(`a`) that is replicated by `N1` to `N2, N3` is considered committed.
- At this point, if `N3` is replaced with an empty node, and at once the leader `N1` is crashed. Then `N5` may elected as a new leader with granted vote by `N3, N4`;
- Then the new leader `N5` will not have log `a`.
Openraft maintains the membership configuration in [`Membership`][] for all
nodes in the cluster, including voters and non-voters (learners). When a
`follower` or `learner` restarts, the leader will automatically re-establish
replication.

```text
Ni: Node i
Lj: Leader at term j
Fj: Follower at term j

N1 | L1 a crashed
N2 | F1 a
N3 | F1 a erased F2
N4 | F2
N5 | elect L2
----------------------------+---------------> time
Data loss: N5 does not have log `a`
```
### Can I wipe out the data of ONE node and wait for the leader to replicate all data to it again?

But for even number nodes cluster, Erasing **exactly one** node won't cause data loss.
Thus, in a special scenario like this, or for testing purpose, you can use
`--feature loosen-follower-log-revert` to permit erasing a node.
<br/><br/>
Avoid doing this. Doing so will panic the leader. It is permitted only
if the [`loosen-follower-log-revert`] feature flag is enabled.

In a raft cluster, although logs are replicated to multiple nodes,
wiping out a node and restarting it can still cause data loss.
Assume the leader is `N1` and the followers are `N2, N3, N4, N5`:
- A log(`a`) that is replicated by `N1` to `N2, N3` is considered committed.
- At this point, if `N3` is replaced with an empty node and the leader
`N1` crashes at the same moment, `N5` may be elected as the new leader with votes granted by
`N3, N4`;
- Then the new leader `N5` will not have log `a`.

- **🤔 Is Openraft resilient to incorrectly configured clusters?**
```text
Ni: Node i
Lj: Leader at term j
Fj: Follower at term j
💡 No, Openraft, like standard raft, cannot identify errors in cluster configuration.
N1 | L1 a crashed
N2 | F1 a
N3 | F1 a erased F2
N4 | F2
N5 | elect L2
----------------------------+---------------> time
Data loss: N5 does not have log `a`
```

A common error is the assigning a wrong network addresses to a node. In such
a scenario, if this node becomes the leader, it will attempt to replicate
logs to itself. This will cause Openraft to panic because replication
messages can only be received by a follower.
But in a cluster with an even number of nodes, erasing **exactly one** node won't cause data loss.
Thus, in a special scenario like this, or for testing purposes, you can use
`--feature loosen-follower-log-revert` to permit erasing a node.
<br/>
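
The difference between odd-sized and even-sized clusters can be checked with a small counting model. This is a sketch under stated assumptions, not Openraft code: `quorum` and `can_lose_committed_log` are hypothetical helpers, the committed log is assumed to be stored on exactly a majority of nodes (the worst case), and a node that still holds the log is assumed never to vote for a candidate that lacks it.

```rust
/// Majority size for a cluster of `n` voters.
fn quorum(n: usize) -> usize {
    n / 2 + 1
}

/// Hypothetical helper: returns true if, after erasing exactly one node,
/// a candidate WITHOUT the committed log could still collect a majority of votes.
///
/// Assumptions of this toy model:
/// - the committed log is stored on exactly `quorum(n)` nodes;
/// - one of those nodes is erased;
/// - nodes that still hold the log refuse to vote for a candidate lacking it
///   (so a crashed holder, such as the old leader, does not change the count).
fn can_lose_committed_log(n: usize) -> bool {
    let q = quorum(n);
    let holders_after_erase = q - 1;
    let nodes_without_log = n - holders_after_erase;
    // The candidate needs `q` votes, all from nodes without the log (itself included).
    nodes_without_log >= q
}

fn main() {
    for n in 3..=6 {
        println!(
            "{} nodes: quorum = {}, data loss possible = {}",
            n,
            quorum(n),
            can_lose_committed_log(n)
        );
    }
}
```

For 5 nodes the model matches the scenario above: the three nodes without the log (`N3` after erasure, `N4`, `N5`) form a majority. For 4 nodes only two nodes lack the log, which is less than the required three votes.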

```text
thread 'main' panicked at openraft/src/engine/engine_impl.rs:793:9:
assertion failed: self.internal_server_state.is_following()
```

```ignore
// openraft/src/engine/engine_impl.rs:793
pub(crate) fn following_handler(&mut self) -> FollowingHandler<C> {
debug_assert!(self.internal_server_state.is_following());
// ...
}
```
### Is Openraft resilient to incorrectly configured clusters?

<br/><br/>
No, Openraft, like standard raft, cannot identify errors in cluster configuration.

A common error is assigning a wrong network address to a node. In such
a scenario, if this node becomes the leader, it will attempt to replicate
logs to itself. This will cause Openraft to panic because replication
messages can only be received by a follower.

```text
thread 'main' panicked at openraft/src/engine/engine_impl.rs:793:9:
assertion failed: self.internal_server_state.is_following()
```

```ignore
// openraft/src/engine/engine_impl.rs:793
pub(crate) fn following_handler(&mut self) -> FollowingHandler<C> {
debug_assert!(self.internal_server_state.is_following());
// ...
}
```

<br/>
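
Because Openraft cannot detect this kind of mistake, an application may want to validate its own configuration before starting a node. The sketch below shows one possible check, assuming the application keeps a `node id -> address` map; the function `find_duplicate_addresses` and the map layout are hypothetical and not part of Openraft. It only catches the case where two node ids share one address, not every possible wrong address.

```rust
use std::collections::BTreeMap;

/// Hypothetical pre-start check (not an Openraft API): returns every
/// `(first_id, duplicate_id, address)` where two node ids share one address.
fn find_duplicate_addresses(nodes: &BTreeMap<u64, String>) -> Vec<(u64, u64, String)> {
    // Maps each address to the first node id that claimed it.
    let mut seen: BTreeMap<&str, u64> = BTreeMap::new();
    let mut duplicates = Vec::new();

    // BTreeMap iterates in ascending key order, so the result is deterministic.
    for (id, addr) in nodes {
        match seen.get(addr.as_str()).copied() {
            Some(first_id) => duplicates.push((first_id, *id, addr.clone())),
            None => {
                seen.insert(addr.as_str(), *id);
            }
        }
    }
    duplicates
}

fn main() {
    let mut nodes = BTreeMap::new();
    nodes.insert(1, "10.0.0.1:5000".to_string());
    nodes.insert(2, "10.0.0.2:5000".to_string());
    // Misconfiguration: node 3 was given node 1's address.
    nodes.insert(3, "10.0.0.1:5000".to_string());

    for (first, dup, addr) in find_duplicate_addresses(&nodes) {
        eprintln!("nodes {} and {} share address {}", first, dup, addr);
    }
}
```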

[`loosen-follower-log-revert`]: `crate::docs::feature_flags#loosen_follower_log_revert`

[`add_learner()`]: `crate::Raft::add_learner`
[`change_membership()`]: `crate::Raft::change_membership`
[`Membership`]: `crate::Membership`
