Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error : it has to be a leader!!! #920

Closed
Tracked by #923
web3creator opened this issue Nov 3, 2023 · 5 comments · Fixed by #924
Closed
Tracked by #923

error : it has to be a leader!!! #920

web3creator opened this issue Nov 3, 2023 · 5 comments · Fixed by #924
Assignees

Comments

@web3creator
Copy link

Describe the bug
I started a node, then set the node as leader, then ctrl+c crashed the node, and then the following error occurred when restarting the node.

thread 'tokio-runtime-worker' panicked at /Users/simonyi/.cargo/git/checkouts/openraft-0677bbb78cc57964/4b78dd3/openraft/src/core/raft_core.rs:797:13:
internal error: entered unreachable code: it has to be a leader!!!

Is this normal logic or a bug?

Copy link

github-actions bot commented Nov 3, 2023

👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

@drmingdrmer drmingdrmer self-assigned this Nov 4, 2023
@drmingdrmer
Copy link
Member

A panic is a logic bug.

Could you provide the debug level logs? This way I can see what's happening when it panicked.

@web3creator
Copy link
Author

@drmingdrmer

2023-11-04T03:03:21.296813Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main: openraft::core::raft_core: raft node is initializing
2023-11-04T03:03:21.296848Z  INFO tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:startup: openraft::engine::engine_impl: startup: state: RaftState { vote: UTime { data: Vote { leader_id: LeaderId { term: 1, node_id: 7395302447718541494 }, committed: true }, utime: Some(Instant { tv_sec: 464375, tv_nsec: 700082041 }) }, committed: Some(LogId { leader_id: LeaderId { term: 10000, node_id: 2 }, index: 10000 }), purged_next: 10001, log_ids: LogIdList { key_log_ids: [LogId { leader_id: LeaderId { term: 10000, node_id: 2 }, index: 10000 }] }, membership_state: MembershipState { committed: EffectiveMembership { log_id: None, membership: Membership { configs: [], nodes: {} }, voter_ids: {} }, effective: EffectiveMembership { log_id: None, membership: Membership { configs: [], nodes: {} }, voter_ids: {} } }, snapshot_meta: SnapshotMeta { last_log_id: Some(LogId { leader_id: LeaderId { term: 10000, node_id: 2 }, index: 10000 }), last_membership: StoredMembership { log_id: None, membership: Membership { configs: [], nodes: {} } }, snapshot_id: "10000-2-10000-1" }, server_state: Learner, accepted: Accepted { leader_id: LeaderId { term: 0, node_id: 0 }, log_id: None }, io_state: IOState { building_snapshot: false, vote: Vote { leader_id: LeaderId { term: 1, node_id: 7395302447718541494 }, committed: true }, flushed: LogIOId { leader_id: LeaderId { term: 0, node_id: 0 }, log_id: None }, applied: Some(LogId { leader_id: LeaderId { term: 10000, node_id: 2 }, index: 10000 }), purged: Some(LogId { leader_id: LeaderId { term: 10000, node_id: 2 }, index: 10000 }) }, snapshot_streaming: None, purge_upto: Some(LogId { leader_id: LeaderId { term: 10000, node_id: 2 }, index: 10000 }) }
2023-11-04T03:03:21.297285Z  INFO tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:startup: openraft::engine::engine_impl: startup: is_leader: true
2023-11-04T03:03:21.297298Z  INFO tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:startup: openraft::engine::engine_impl: startup: is_voter: false
2023-11-04T03:03:21.299352Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:startup:calc_server_state: openraft::raft_state: states is_member=false is_leader=true is_leading=true
2023-11-04T03:03:21.299383Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:startup: openraft::engine::handler::server_state_handler: update_server_state_if_changed id=7395302447718541494 prev_server_state=Learner server_state=Learner
2023-11-04T03:03:21.300196Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:startup: openraft::engine::handler::replication_handler: openraft::engine::handler::replication_handler::ReplicationHandler<_>::update_local_progress upto=10000-2-10000
2023-11-04T03:03:21.300225Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:startup:initiate_replication: openraft::engine::handler::replication_handler: openraft::engine::handler::replication_handler::ReplicationHandler<_>::initiate_replication progress=VecProgress { quorum_set: Joint { data: [], _p: PhantomData<(u64, alloc::vec::Vec<u64>)> }, granted: None, voter_count: 0, vector: [], stat: Stat { update_count: 0, move_count: 0, is_quorum_count: 0 } }
2023-11-04T03:03:21.300296Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:run_engine_commands: openraft::core::raft_core: queued commands: start...
2023-11-04T03:03:21.300307Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:run_engine_commands: openraft::core::raft_core: queued commands: RebuildReplicationStreams { targets: [] }
2023-11-04T03:03:21.300317Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:run_engine_commands: openraft::core::raft_core: queued commands: end...
2023-11-04T03:03:21.300325Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:run_engine_commands: openraft::core::raft_core: run command: RebuildReplicationStreams { targets: [] }
2023-11-04T03:03:21.300346Z DEBUG tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:run_engine_commands: openraft::core::raft_core: condition: None
2023-11-04T03:03:21.300382Z  INFO tokio-runtime-worker ThreadId(06) new{cluster=morse_brain}:RaftCore{id=7395302447718541494 cluster=morse_brain}:main:run_engine_commands:remove_all_replication: openraft::core::raft_core: remove all replication
thread 'tokio-runtime-worker' panicked at /Users/simonyi/.cargo/git/checkouts/openraft-0677bbb78cc57964/4b78dd3/openraft/src/core/raft_core.rs:797:13:
internal error: entered unreachable code: it has to be a leader!!!

@drmingdrmer
Copy link
Member

drmingdrmer commented Nov 4, 2023

What steps did you take to arrive at this particular status?
Did you manually fakes these status for Openraft?

From the provided log, it appears that the membership is empty:

Membership { configs: [], nodes: {} }, voter_ids: {} }

and the vote is non-empty:

Vote { leader_id: LeaderId { term: 1, node_id: 7395302447718541494 }

The lack of initialization for Openraft is evident, since an empty membership is not permissible when initializing Openraft.

Reason and Solution

Openraft's handling of this unexpected scenario seems inconsistent. In this context:

Given that calc_server_state() returns Learner, Openraft does not initialize the Leader state, which accounts for the encountered panic.

To resolve this issue, I will modify calc_server_state() to return Lender, which should help achieve internal consistency.

Added 2023-11-06:

To resolve this issue, I will update calc_server_state() and is_leader() so that they return consistent result according to the following table:

For node-2:

  • Node-2 with vote (term=1, node_id=2, committed=true):

    • is a leader if it is present in config, either a voter or non-voter.
    • is a learner if it is absent in config.
  • Node-2 with vote (term=1, node_id=2, committed=false):

    • is a candidate if it is present in config, either a voter or non-voter.
    • is a learner if it is absent in config.
  • Node-3 with vote (term=1, node_id=99, committed=false|true):

    • is a follower if it is a voter in config,
    • is a learner if it is a non-voter or absent in config.
vote \ membership Voter Non-voter Absent
(term=1, node_id=2, committed=true) leader leader learner
(term=1, node_id=2, committed=false) candidate candidate learner
(term=1, node_id=99, committed=true) follower learner learner
(term=1, node_id=99, committed=false) follower learner learner

drmingdrmer added a commit to drmingdrmer/openraft that referenced this issue Nov 6, 2023
Openraft does not require Leader/Candidate to be a voter.
Before this commit, Openraft's handling of non-voter leader is
inconsistent:

For a node that has a vote proposed by this node and is committed, and is not a voter:

- [`calc_server_state()`](https://github.com/datafuselabs/openraft/blob/97254a31c673e52429ee7187de96fd2ea2a5ce98/openraft/src/raft_state/mod.rs#L323) returns `Learner`.
- But [`is_leader()`](https://github.com/datafuselabs/openraft/blob/97254a31c673e52429ee7187de96fd2ea2a5ce98/openraft/src/raft_state/mod.rs#L353) returns true.

Since this commit,
`calc_server_state()` and `is_leader()` returns consistent result according to the following table:

For node-2:

- Node-2 with vote `(term=1, node_id=2, committed=true)`:
  - is a leader if it is **present** in config, either a voter or non-voter.
  - is a learner if it is **absent** in config.

- Node-2 with vote `(term=1, node_id=2, committed=false)`:
  - is a candidate if it is **present** in config, either a voter or non-voter.
  - is a learner if it is **absent** in config.

- Node-3 with vote `(term=1, node_id=99, committed=false|true)`:
  - is a follower if it is a **voter** in config,
  - is a learner if it is a **non-voter** or **absent** in config.

| vote \ membership                     | Voter     | Non-voter | Absent  |
|---------------------------------------|-----------|-----------|---------|
| (term=1, node_id=2, committed=true)   | leader    | leader    | learner |
| (term=1, node_id=2, committed=false)  | candidate | candidate | learner |
| (term=1, node_id=99, committed=true)  | follower  | learner   | learner |
| (term=1, node_id=99, committed=false) | follower  | learner   | learner |

- Fix: databendlabs#920
drmingdrmer added a commit to drmingdrmer/openraft that referenced this issue Nov 6, 2023
Openraft does not require Leader/Candidate to be a voter.
Before this commit, Openraft's handling of non-voter leader is
inconsistent:

For a node that has a vote proposed by this node and is committed, and is not a voter:

- [`calc_server_state()`](https://github.com/datafuselabs/openraft/blob/97254a31c673e52429ee7187de96fd2ea2a5ce98/openraft/src/raft_state/mod.rs#L323) returns `Learner`.
- But [`is_leader()`](https://github.com/datafuselabs/openraft/blob/97254a31c673e52429ee7187de96fd2ea2a5ce98/openraft/src/raft_state/mod.rs#L353) returns true.

Since this commit,
`calc_server_state()` and `is_leader()` returns consistent result according to the following table:

For node-2:

- Node-2 with vote `(term=1, node_id=2, committed=true)`:
  - is a leader if it is **present** in config, either a voter or non-voter.
  - is a learner if it is **absent** in config.

- Node-2 with vote `(term=1, node_id=2, committed=false)`:
  - is a candidate if it is **present** in config, either a voter or non-voter.
  - is a learner if it is **absent** in config.

- Node-3 with vote `(term=1, node_id=99, committed=false|true)`:
  - is a follower if it is a **voter** in config,
  - is a learner if it is a **non-voter** or **absent** in config.

| vote \ membership                     | Voter     | Non-voter | Absent  |
|---------------------------------------|-----------|-----------|---------|
| (term=1, node_id=2, committed=true)   | leader    | leader    | learner |
| (term=1, node_id=2, committed=false)  | candidate | candidate | learner |
| (term=1, node_id=99, committed=true)  | follower  | learner   | learner |
| (term=1, node_id=99, committed=false) | follower  | learner   | learner |

- Fix: databendlabs#920
@drmingdrmer
Copy link
Member

The following PR is merged. In your case, the node should start as a Learner now. Please confirm that the issue is addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants