recvAppendEntries: Assertion `r->state == RAFT_FOLLOWER || r->state == RAFT_CANDIDATE' failed #386
recvAppendEntries: Assertion `r->state == RAFT_FOLLOWER || r->state == RAFT_CANDIDATE' failed.
Here's the failing assertion: raft/src/recv_append_entries.c, lines 58 to 87 in 98070bd.
If the invariants described in that comment are being violated, that seems pretty serious.
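For context, here's a minimal self-contained sketch (illustrative names only, not the actual recv_append_entries.c code) of the invariant that assertion encodes: once the usual term comparison has run, an AppendEntries carrying the receiver's own current term can only come from the leader of that term, so the receiver must be a follower or a candidate about to step down. Two leaders ending up in the same term, as reported here, is exactly what makes it fire.

```c
#include <assert.h>

enum state { RAFT_FOLLOWER, RAFT_CANDIDATE, RAFT_LEADER };

struct server {
    enum state state;
    unsigned long long current_term;
};

/* Called after the usual term handling: if the message term was higher the
 * server has already stepped down to follower, and if it was lower the
 * request was rejected, so here msg_term == current_term. */
static void on_append_entries_same_term(struct server *s,
                                        unsigned long long msg_term)
{
    assert(msg_term == s->current_term);
    /* Election safety: at most one leader per term, and it is the sender,
     * so the receiver must be a follower or a candidate that now steps down. */
    assert(s->state == RAFT_FOLLOWER || s->state == RAFT_CANDIDATE);
    if (s->state == RAFT_CANDIDATE) {
        s->state = RAFT_FOLLOWER;
    }
}

int main(void)
{
    struct server s = {RAFT_CANDIDATE, 11};
    on_append_entries_same_term(&s, 11); /* fine: candidate steps down */
    s.state = RAFT_LEADER;
    on_append_entries_same_term(&s, 11); /* the situation behind #386: fires */
    return 0;
}
```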
This continues to happen; here's a job where you can also see a consistency violation (stale reads): https://github.com/canonical/jepsen.dqlite/actions/runs/5187761474/jobs/9350528154
Looking at the logs, it looks like the second term-11 leader was n3, elected about 8 seconds later at:
Perhaps n3 was partitioned before n2 was elected and was stuck at term 10; at some point it became candidate, bumped to term 11, and won the election. I will investigate further to see what happened exactly.
It seems that the
@freeekanayaka I think it's a bug.
Happened again here. Not on a master run, but I don't think it matters ...
It seems what happens in the logs Mathieu posted is that n3 goes offline at some point, comes back, gets its log up to speed, but somehow keeps operating with a very old configuration (several ASSIGNs ago). So when the election for term 4 happens, n2 thinks that the voters are {n1, n2, n4}, while n3 thinks the voters are {n1, n3, n5}. They both receive one other vote and become leader, and some time later n2 gets an unexpected request from n3. I will have to dig further to figure out why n3 doesn't switch to a newer configuration as it's updating its log.
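To make the split concrete, here's a tiny illustrative sketch (the has_quorum helper is hypothetical, not library code) of how two candidates that disagree on the voter set can both count a majority for the same term:

```c
#include <stdbool.h>
#include <stdio.h>

/* Each candidate needs a majority of the voters listed in *its own*
 * configuration; nothing forces the two views of the voter set to agree. */
static bool has_quorum(unsigned votes_granted, unsigned n_voters)
{
    return votes_granted > n_voters / 2;
}

int main(void)
{
    /* n2's view: voters {n1, n2, n4}; its own vote plus, say, n4's. */
    unsigned n2_votes = 2, n2_voters = 3;
    /* n3's stale view: voters {n1, n3, n5}; its own vote plus, say, n5's. */
    unsigned n3_votes = 2, n3_voters = 3;

    printf("n2 wins term 4: %d\n", has_quorum(n2_votes, n2_voters)); /* prints 1 */
    printf("n3 wins term 4: %d\n", has_quorum(n3_votes, n3_voters)); /* prints 1 */
    /* Both become leader for term 4; later n2 receives an AppendEntries
     * from n3 for its own current term and the assertion above fires. */
    return 0;
}
```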
I think the root of the problem might be the code in appendFollowerCb (replication.c) that can update the last_stored index for a configuration log entry without applying that configuration locally, if the node has become a candidate since appendFollowerCb was installed. This makes it possible for n3 to appear up to date to other nodes based on the data in its RequestVote messages, while still operating with an old configuration. That raises the question of what a candidate node should do if it finds itself applying an (uncommitted) configuration change during its candidacy. It seems like it would be best for it to just end its candidacy in this case; otherwise counting votes turns into a mess. Or maybe followers shouldn't convert to candidate until all their in-flight append requests have completed successfully, so that configuration changes get a chance to be applied.
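A rough sketch of that first option, with all names hypothetical rather than taken from replication.c: when a disk append started as a follower completes and the server has since become a candidate, a configuration entry among the written entries would make it abandon the candidacy instead of silently advancing last_stored with a stale voter set.

```c
#include <stdbool.h>
#include <stddef.h>

enum state { FOLLOWER, CANDIDATE, LEADER };
enum entry_type { COMMAND, CONFIGURATION };

struct entry {
    enum entry_type type;
};

struct server {
    enum state state;
    unsigned long long last_stored;
};

/* Completion callback for a disk append that was started while the server
 * was still a follower. */
static void append_follower_done(struct server *s,
                                 const struct entry *entries, size_t n,
                                 unsigned long long last_index)
{
    bool has_config = false;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].type == CONFIGURATION) {
            has_config = true;
        }
    }

    s->last_stored = last_index;

    if (s->state == CANDIDATE && has_config) {
        /* The voter set this election was started with may be stale, so
         * step back to follower (and apply the new configuration) rather
         * than keep counting votes against the old one. */
        s->state = FOLLOWER;
    }
}

int main(void)
{
    struct entry entries[] = {{COMMAND}, {CONFIGURATION}};
    struct server s = {CANDIDATE, 10};
    append_follower_done(&s, entries, 2, 12);
    /* s.state is now FOLLOWER and s.last_stored is 12. */
    return 0;
}
```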
I think this is a candidate for review by corporate DB vendors, where replication problems with a lot of theory were turned into real products on the market... There is also Postgres, but I think they are a century behind...
Found by Jepsen, here's the job:
https://github.com/canonical/jepsen.dqlite/actions/runs/4418832275/jobs/7746473216
Hat tip to @nurturenature, whose recent improvements to our Jepsen harness helped uncover this.