Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Read State #497

Closed
3 tasks done
bsbds opened this issue Nov 2, 2023 · 4 comments
Closed
3 tasks done

[Bug]: Read State #497

bsbds opened this issue Nov 2, 2023 · 4 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@bsbds
Copy link
Collaborator

bsbds commented Nov 2, 2023

There are several read-state related bugs found in the jepsen tests.

  • 1. Lost cmd ids
    Sometimes the cmd ids are lost during the test, this is caused by the current read state only collect sp, which is not sufficient, because the sp only store the current speculative execute cmd, but later conflict command will goes into ucp. The solution is to add ucp cmds to read state response.
  • 2. Index barrier not triggers in order
    Currently the IndexBarrier uses last_trigger_index to track the last execute index. However the after sync in execute concurrently so a command with a larger index may execute before a command with a smaller index. This requires a refactor.
    /// Trigger all barriers whose index is less than or equal to the given index.
    pub(crate) fn trigger(&self, index: u64) {
        let mut inner_l = self.inner.lock();
        if inner_l.last_trigger_index < index {
            inner_l.last_trigger_index = index;
        }
        let mut split_barriers = inner_l.barriers.split_off(&(index.overflow_add(1)));
        std::mem::swap(&mut inner_l.barriers, &mut split_barriers);
        for (_, barrier) in split_barriers {
            barrier.notify(usize::MAX);
        }
    }
  • 3. Commit index on leader may newer than uncommited cmds
    The commit index may newer because the after sync in execute concurrently. And a cmd with higher log index may complete before a cmd with lower index. To resolve this problem we should combine wait commit index and wait cmd ids, after both returns could we proceed serializable read on current node.
@bsbds
Copy link
Collaborator Author

bsbds commented Nov 2, 2023

The fix for the first and third is trivial. I'll provide some details about the refactor for the second issue.

As the after sync stage is execute concurrently, we need to first wait for the lower index to complete, before we trigger the higher index.

I uses a binary heap(min heap), aka priority_queue, to track the index triggered. And we use next to store the lowest index that haven't triggered.

struct IndexBarrierInner {
    next: u64,
    indices: BinaryHeap<IndexRevision>,
}

Each time the after sync triggers a index, we push it into the heap and do the following procedure repeatedly: check the heap top if it's equal to next, if they are the same, we pop remove the top element and update `next, other wise we break the loop.

    pub(crate) fn trigger(&self, index: u64, revision: i64) {
        let mut inner_l = self.inner.lock();
        inner_l.indices.push(IndexRevision::new(index, revision));
        while inner_l
            .indices
            .peek()
            .map_or(false, |i| i.index.eq(&inner_l.next))
        {
            let next = inner_l.next;
            let IndexRevision { index, revision } = inner_l
                .indices
                .pop()
                .unwrap_or_else(|| unreachable!("IndexRevision should be Some"));
            inner_l.next = next.overflow_add(1);
        }
    }

@Phoenix500526 Phoenix500526 added the bug Something isn't working label Nov 10, 2023
@Phoenix500526 Phoenix500526 added this to the v0.7.0 milestone Nov 22, 2023
@bsbds
Copy link
Collaborator Author

bsbds commented Jan 4, 2024

Issue 2 and 3 are no longer valid after recent refactor, I will only fix the first one.

@Phoenix500526
Copy link
Collaborator

Refer to: #782

@Phoenix500526
Copy link
Collaborator

Closed in #857

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants