(Partial) Jepsen test analysis #467

Open
2 tasks done
bsbds opened this issue Sep 28, 2023 · 2 comments

Comments

@bsbds
Collaborator

bsbds commented Sep 28, 2023

Tests

I have partially completed Jepsen tests on Xline, based on https://github.com/jepsen-io/etcd.

I tested the following workloads, without a nemesis to inject failures:

  • Register
    Tests single registers, using Knossos for linearizability checking. This test contains read/write/cas operations.
  • Set
    Uses a compare-and-set transaction to read a set of integers from a single key and append a value to that set.
  • Append
    Tests append/read transactions over lists. In order to provide append
    transactions, we need to first read the current state, then perform a second
    transaction to perform all writes (and reads).
  • Wr
    Tests transactional writes and reads to registers using Elle.
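To make the Set workload concrete, here is a minimal sketch of its read, append, compare-and-set loop against a toy in-memory store. ToyStore, txn_cas and add_to_set are hypothetical names for illustration only. The sketch guards each write with a per-key revision check, which is an assumption (the workload could equally compare on the stored value), and it runs a single client, so the retry branch never fires here.

```rust
use std::collections::HashMap;

/// Hypothetical toy KV store: each key maps to its set contents plus a
/// modification revision, loosely modeled on etcd's per-key `mod_revision`.
#[derive(Default)]
struct ToyStore {
    data: HashMap<String, (Vec<i64>, u64)>,
    revision: u64,
}

impl ToyStore {
    /// Returns the current set and its revision (empty set, revision 0 if absent).
    fn get(&self, key: &str) -> (Vec<i64>, u64) {
        self.data.get(key).cloned().unwrap_or((Vec::new(), 0))
    }

    /// Compare-and-set txn: write `value` only if the key's revision still
    /// equals `expected_rev`. Returns whether the txn committed.
    fn txn_cas(&mut self, key: &str, expected_rev: u64, value: Vec<i64>) -> bool {
        let (_, current_rev) = self.get(key);
        if current_rev != expected_rev {
            return false;
        }
        self.revision += 1;
        self.data.insert(key.to_string(), (value, self.revision));
        true
    }
}

/// Set-workload add: read the set, append `x`, and retry until the CAS commits.
fn add_to_set(store: &mut ToyStore, key: &str, x: i64) {
    loop {
        let (mut set, rev) = store.get(key);
        set.push(x);
        if store.txn_cas(key, rev, set) {
            return;
        }
    }
}

fn main() {
    let mut store = ToyStore::default();
    for x in 0..5 {
        add_to_set(&mut store, "set", x);
    }
    println!("{:?}", store.get("set").0);
}
```

With concurrent clients, a CAS that loses the race fails its compare and the client simply re-reads and retries, so no append is lost.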

Result

  • Register
    Failed once; I haven't investigated what happened yet.
  • Set
    Ok
  • Append
    Mostly failed
  • Wr
    Mostly failed

Anomalies

The most obvious anomalies I found are txn inconsistencies in Append and Wr.
After some investigation, I found two basic types of anomalies.

G2-item #0
Let:
  T1 = {:index 469, :time 13951607366, :type :ok, :process 3, :f :txn, :value [[:w 7 30] [:r 9 nil] [:w 9 1]]}
  T2 = {:index 472, :time 13956078295, :type :ok, :process 5, :f :txn, :value [[:r 9 nil] [:w 9 3] [:r 8 8]]}

Then:
  - T1 < T2, because T1 read key 9 = nil, and T2 set it to 3, which came later in the version order.
  - However, T2 < T1, because T2 read key 9 = nil, and T1 set it to 1, which came later in the version order: a contradiction!
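The contradiction above is a cycle of two read-write anti-dependencies: each txn read key 9 before the other's write to it. Below is a simplified sketch of how such an edge can be inferred, assuming (as in this example) that the read returned the initial nil version, so any write to the same key must come later in the version order. Mop and rw_edge are hypothetical names; Elle's real inference handles arbitrary versions and version orders.

```rust
/// A micro-operation inside a txn, mirroring Elle's [:r k v] / [:w k v].
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Mop {
    Read(u32, Option<u32>),
    Write(u32, u32),
}

/// Simplified rw anti-dependency a -> b: `a` read some key as nil (the initial
/// version) and `b` wrote that key, so `b`'s write follows `a`'s read.
fn rw_edge(a: &[Mop], b: &[Mop]) -> bool {
    a.iter().any(|ma| match *ma {
        Mop::Read(k, None) => b
            .iter()
            .any(|mb| matches!(*mb, Mop::Write(kb, _) if kb == k)),
        _ => false,
    })
}

fn main() {
    // T1 = [[:w 7 30] [:r 9 nil] [:w 9 1]]
    let t1 = [Mop::Write(7, 30), Mop::Read(9, None), Mop::Write(9, 1)];
    // T2 = [[:r 9 nil] [:w 9 3] [:r 8 8]]
    let t2 = [Mop::Read(9, None), Mop::Write(9, 3), Mop::Read(8, Some(8))];
    // Edges in both directions form a cycle, which no serial order can satisfy.
    let cycle = rw_edge(&t1, &t2) && rw_edge(&t2, &t1);
    println!("G2-item cycle between T1 and T2: {}", cycle); // prints: ... true
}
```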

This is caused by the following code in command_from_request_wrapper, which constructs the key ranges used for the conflict check:

        RequestWrapper::TxnRequest(ref req) => req
            .compare
            .iter()
            .map(|cmp| KeyRange::new(cmp.key.as_slice(), cmp.range_end.as_slice()))
            .collect(),

The code uses only the compare keys for the conflict check. The keys of the txn's child operations are not added, so commands that touch those keys may execute out of order. A fix would be to add all keys accessed by the txn to the command.
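A minimal sketch of the suggested fix: collect the compare keys plus the keys of every child operation in both the success and the failure branch, recursing into nested txns. Both branches are included because which one runs is only known once the compare is evaluated at execution time. The types below are hypothetical, simplified stand-ins for the etcd-style request messages (the real protobuf-generated types in Xline have a different shape), and txn_keys / collect_op_keys are illustrative names, not existing Xline functions.

```rust
/// Hypothetical simplified key range; an empty `end` means a single key.
#[allow(dead_code)]
#[derive(Debug)]
struct KeyRange {
    start: Vec<u8>,
    end: Vec<u8>,
}

struct Compare {
    key: Vec<u8>,
    range_end: Vec<u8>,
}

#[allow(dead_code)]
enum RequestOp {
    Range { key: Vec<u8>, range_end: Vec<u8> },
    Put { key: Vec<u8> },
    DeleteRange { key: Vec<u8>, range_end: Vec<u8> },
    Txn(TxnRequest), // nested txn
}

struct TxnRequest {
    compare: Vec<Compare>,
    success: Vec<RequestOp>,
    failure: Vec<RequestOp>,
}

/// Append the key range(s) touched by one child operation.
fn collect_op_keys(op: &RequestOp, out: &mut Vec<KeyRange>) {
    match op {
        RequestOp::Range { key, range_end } | RequestOp::DeleteRange { key, range_end } => {
            out.push(KeyRange { start: key.clone(), end: range_end.clone() });
        }
        RequestOp::Put { key } => out.push(KeyRange { start: key.clone(), end: Vec::new() }),
        RequestOp::Txn(inner) => out.extend(txn_keys(inner)),
    }
}

/// All key ranges a txn may touch: compare keys plus both branches.
fn txn_keys(req: &TxnRequest) -> Vec<KeyRange> {
    let mut keys: Vec<KeyRange> = req
        .compare
        .iter()
        .map(|cmp| KeyRange { start: cmp.key.clone(), end: cmp.range_end.clone() })
        .collect();
    for op in req.success.iter().chain(req.failure.iter()) {
        collect_op_keys(op, &mut keys);
    }
    keys
}

fn main() {
    // Compare on "a", but the success branch writes "b": both must be conflict keys.
    let req = TxnRequest {
        compare: vec![Compare { key: b"a".to_vec(), range_end: Vec::new() }],
        success: vec![RequestOp::Put { key: b"b".to_vec() }],
        failure: vec![RequestOp::Range { key: b"c".to_vec(), range_end: Vec::new() }],
    };
    for range in txn_keys(&req) {
        println!("{:?}", range);
    }
}
```

Including the failure branch makes the key set conservative: it may report conflicts that never materialize, but it can no longer miss one.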

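The second type is an internal anomaly, where a read inside a txn does not observe a write made earlier in the same txn:

            :anomalies {:internal ({:op #jepsen.history.Op{:index 43,
                                                           :time 12262130884,
                                                           :type :ok,
                                                           :process 29,
                                                           :f :txn,
                                                           :value [[:w 1 7] [:r 0 nil] [:r 1 4] [:r 2 6]]},
                                    :mop [:r 1 4],
                                    :expected 7}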

The operations in a single txn should be executed sequentially. However, Xline does not check for conflicts inside a single txn: the results of all operations are based on the storage state before the txn is executed. In the example above, the txn writes key 1 = 7 and then reads key 1, but the read returns the old value 4 instead of the expected 7. This behaviour is inconsistent with etcd and needs further discussion. Maybe we could statically check the txn after the compare is completed.
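To illustrate the difference, here is a sketch that replays the txn from the internal anomaly ([:w 1 7] [:r 0 nil] [:r 1 4] [:r 2 6]) under the two execution models. It assumes the pre-txn state was key 1 = 4 and key 2 = 6 with key 0 absent, which is what the observed reads suggest; Mop, exec_sequential and exec_snapshot are hypothetical names.

```rust
use std::collections::HashMap;

/// Micro-ops of the txn: a write [:w k v] or a read [:r k].
#[derive(Clone, Copy)]
enum Mop {
    Write(u32, u32),
    Read(u32),
}

/// Sequential execution: each read sees writes made earlier in the same txn.
fn exec_sequential(state: &HashMap<u32, u32>, ops: &[Mop]) -> Vec<Option<u32>> {
    let mut view = state.clone();
    let mut reads = Vec::new();
    for op in ops {
        match *op {
            Mop::Write(k, v) => {
                view.insert(k, v);
            }
            Mop::Read(k) => reads.push(view.get(&k).copied()),
        }
    }
    reads
}

/// Snapshot execution: every read sees only the state from before the txn.
fn exec_snapshot(state: &HashMap<u32, u32>, ops: &[Mop]) -> Vec<Option<u32>> {
    ops.iter()
        .filter_map(|op| match *op {
            Mop::Read(k) => Some(state.get(&k).copied()),
            Mop::Write(..) => None,
        })
        .collect()
}

fn main() {
    // Assumed pre-txn state, inferred from the observed reads.
    let state: HashMap<u32, u32> = HashMap::from([(1, 4), (2, 6)]);
    let ops = [Mop::Write(1, 7), Mop::Read(0), Mop::Read(1), Mop::Read(2)];
    println!("sequential: {:?}", exec_sequential(&state, &ops)); // [None, Some(7), Some(6)]
    println!("snapshot:   {:?}", exec_snapshot(&state, &ops)); // [None, Some(4), Some(6)]
}
```

The snapshot model reproduces the observed [:r 1 4], while Elle expects the sequential result 7.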

@liangyuanpeng
Contributor

The operations in a single txn should be executed sequentially. However in Xline, we do not check for conflicts inside a single txn, all commands result are based on the storage state before the txn is executed

This seems to be the cause of #468. Would PR #472 close #468? @bsbds

@bsbds
Collaborator Author

bsbds commented Nov 1, 2023

The operations in a single txn should be executed sequentially. However in Xline, we do not check for conflicts inside a single txn, all commands result are based on the storage state before the txn is executed

This seems to be the cause of #468. Would PR #472 close #468? @bsbds

Sorry for the late response. Indeed, that's the root cause. I'll fix it in another PR. Thanks for the test case!
