(Partial) Jepsen test analysis #467

bsbds · 2023-09-28T12:09:53Z

Tests

I have partially completed Jepsen tests on Xline, which are based on https://github.com/jepsen-io/etcd.

I tested the following workloads, without a nemesis to inject failures:

Register
Tests single registers, using Knossos for linearizability checking. The workload consists of read/write/cas operations.
Set
Uses a compare-and-set transaction to read a set of integers from a single key and append a value to that set.
Append
Tests append/read transactions over lists. In order to provide append
transactions, we need to read the current states, then perform a second
transaction to perform all writes (and reads).
Wr
Tests transactional writes and reads to registers using Elle.

Result

Register
Failed once; I haven't investigated what happened yet.
Set
Ok
Append
Mostly Failed
Wr
Mostly Failed

Anomalies

The most obvious anomalies I found are txn inconsistencies in append and wr.
After some investigation, I found two basic types of anomalies.

[Bug]: Txn child request keys get ignored during conflict check #470

G2-item #0
Let:
  T1 = {:index 469, :time 13951607366, :type :ok, :process 3, :f :txn, :value [[:w 7 30] [:r 9 nil] [:w 9 1]]}
  T2 = {:index 472, :time 13956078295, :type :ok, :process 5, :f :txn, :value [[:r 9 nil] [:w 9 3] [:r 8 8]]}

Then:
  - T1 < T2, because T1 read key 9 = nil, and T2 set it to 3, which came later in the version order.
  - However, T2 < T1, because T2 read key 9 = nil, and T1 set it to 1, which came later in the version order: a contradiction!
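The contradiction above can be reproduced mechanically. The following is a minimal sketch, not Elle's algorithm: it assumes every key starts as nil and only considers reads that observed nil, so a transaction that read nil on a key must precede any other transaction that wrote that key. The names `Mop`, `rw_before`, and `is_rw_cycle` are hypothetical and exist only for this illustration.

```rust
/// One micro-operation of a Jepsen-style transaction (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mop {
    /// Read of `key` that observed `value` (None stands for nil).
    Read(u64, Option<i64>),
    /// Write of `value` to `key`.
    Write(u64, i64),
}

/// Returns true if `a` must be ordered before `b` by an anti-dependency:
/// `a` read some key as nil and `b` wrote that key, so `b`'s write comes
/// later in the version order. Simplification: only nil reads are handled.
pub fn rw_before(a: &[Mop], b: &[Mop]) -> bool {
    a.iter().any(|ra| match *ra {
        Mop::Read(k, None) => b
            .iter()
            .any(|wb| matches!(*wb, Mop::Write(k2, _) if k2 == k)),
        _ => false,
    })
}

/// Two transactions form a G2-item style cycle when each must precede the other.
pub fn is_rw_cycle(a: &[Mop], b: &[Mop]) -> bool {
    rw_before(a, b) && rw_before(b, a)
}
```

Feeding T1 and T2 from the report into `is_rw_cycle` yields `true`: both read key 9 as nil and both wrote it, so no serial order can explain the history.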

Cause:
When constructing the key ranges for the conflict check in command_from_request_wrapper, we have the following code:

        RequestWrapper::TxnRequest(ref req) => req
            .compare
            .iter()
            .map(|cmp| KeyRange::new(cmp.key.as_slice(), cmp.range_end.as_slice()))
            .collect(),

The code only uses the compare keys for the conflict check; the keys of the child operations are not added, so conflicting commands may execute out of order. A fix would be to add all keys of the txn to the command.
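A rough sketch of what "add all keys of the txn" could look like is given below. It is not the actual Xline patch: `KeyRange`, `Compare`, `Request`, and `TxnRequest` are simplified, hypothetical stand-ins for the real (protobuf-generated) types, and `txn_key_ranges` is an illustrative helper name. The idea is only to walk the success and failure branches, including nested txns, in addition to the compares.

```rust
/// Hypothetical stand-in for Xline's `KeyRange`; only the fields used here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KeyRange {
    pub key: Vec<u8>,
    pub range_end: Vec<u8>,
}

impl KeyRange {
    pub fn new(key: &[u8], range_end: &[u8]) -> Self {
        Self { key: key.to_vec(), range_end: range_end.to_vec() }
    }
}

/// Simplified compare clause (assumption: only the key range matters here).
pub struct Compare {
    pub key: Vec<u8>,
    pub range_end: Vec<u8>,
}

/// Simplified child request of a txn (hypothetical model of the request ops).
pub enum Request {
    Range { key: Vec<u8>, range_end: Vec<u8> },
    Put { key: Vec<u8> },
    DeleteRange { key: Vec<u8>, range_end: Vec<u8> },
    Txn(TxnRequest),
}

pub struct TxnRequest {
    pub compare: Vec<Compare>,
    pub success: Vec<Request>,
    pub failure: Vec<Request>,
}

/// Collects the key ranges of the compares and of every child request,
/// so the conflict check sees all keys the txn may touch.
pub fn txn_key_ranges(txn: &TxnRequest) -> Vec<KeyRange> {
    let mut ranges: Vec<KeyRange> = txn
        .compare
        .iter()
        .map(|cmp| KeyRange::new(&cmp.key, &cmp.range_end))
        .collect();
    // Both branches are included because the branch taken is unknown
    // at the time the conflict check runs.
    for req in txn.success.iter().chain(txn.failure.iter()) {
        collect_request_ranges(req, &mut ranges);
    }
    ranges
}

fn collect_request_ranges(req: &Request, out: &mut Vec<KeyRange>) {
    match req {
        Request::Range { key, range_end } | Request::DeleteRange { key, range_end } => {
            out.push(KeyRange::new(key, range_end));
        }
        // A put touches a single key, modelled here with an empty range_end.
        Request::Put { key } => out.push(KeyRange::new(key, &[])),
        // Nested txns contribute their own compares and children.
        Request::Txn(nested) => out.extend(txn_key_ranges(nested)),
    }
}
```

With this shape, a txn such as T1 above (no compares, children touching keys 7 and 9) would report ranges for both keys instead of an empty list, so it would conflict with T2 on key 9.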

[Bug]: requests in a single txn do not execute in sequence #471

            :anomalies {:internal ({:op #jepsen.history.Op{:index 43,
                                                           :time 12262130884,
                                                           :type :ok,
                                                           :process 29,
                                                           :f :txn,
                                                           :value [[:w
                                                                    1
                                                                    7]
                                                                   [:r
                                                                    0
                                                                    nil]
                                                                   [:r
                                                                    1
                                                                    4]
                                                                   [:r
                                                                    2
                                                                    6]]},
                                    :mop [:r 1 4],
                                    :expected 7}

The operations in a single txn should be executed sequentially. However, in Xline we do not check for conflicts inside a single txn: all command results are based on the storage state before the txn is executed. This behaviour is inconsistent with etcd and needs further discussion. Maybe we could statically check the txn after the compare is completed.
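To make the expected behaviour concrete, here is a minimal sketch of sequential execution inside one txn, assuming a plain in-memory map as storage. It is an illustration of read-your-own-writes via a write overlay, not a description of how Xline or etcd implement it; `Mop` and `execute_txn` are hypothetical names.

```rust
use std::collections::HashMap;

/// One micro-operation inside a txn (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mop {
    Read(u64),
    Write(u64, i64),
}

/// Executes `ops` in order against a snapshot of `storage`.
/// Writes go into an overlay, and reads consult the overlay first,
/// so a read after a write in the same txn observes that write.
/// Returns the read results in order, plus the pending writes.
pub fn execute_txn(
    storage: &HashMap<u64, i64>,
    ops: &[Mop],
) -> (Vec<Option<i64>>, HashMap<u64, i64>) {
    let mut overlay: HashMap<u64, i64> = HashMap::new();
    let mut reads = Vec::new();
    for op in ops {
        match *op {
            Mop::Read(k) => {
                // Prefer the txn's own pending write, else the pre-txn state.
                let value = overlay.get(&k).or_else(|| storage.get(&k)).copied();
                reads.push(value);
            }
            Mop::Write(k, v) => {
                overlay.insert(k, v);
            }
        }
    }
    (reads, overlay)
}
```

Applied to the reported txn `[[:w 1 7] [:r 0] [:r 1] [:r 2]]` with pre-txn storage `{1 4, 2 6}`, the read of key 1 returns 7 (the value Elle expected) rather than the stale 4.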


liangyuanpeng · 2023-10-08T03:38:36Z

The operations in a single txn should be executed sequentially. However, in Xline we do not check for conflicts inside a single txn: all command results are based on the storage state before the txn is executed.

This seems to be the cause of #468. Would PR #472 close #468? @bsbds

bsbds · 2023-11-01T11:29:57Z

The operations in a single txn should be executed sequentially. However, in Xline we do not check for conflicts inside a single txn: all command results are based on the storage state before the txn is executed.

This seems to be the cause of #468. Would PR #472 close #468? @bsbds

Sorry for the late response. Indeed it's the root cause of that. I'll fix it in another PR. Thanks for the test case!

This was referenced Oct 8, 2023

[Bug]: Txn child request keys get ignored during conflict check #470

Closed

[Bug]: requests in a single txn do not execute in sequence #471

Closed
