Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avet index out of sync with datoms #386

Closed
filipesilva opened this issue Mar 1, 2021 · 29 comments
Closed

avet index out of sync with datoms #386

filipesilva opened this issue Mar 1, 2021 · 29 comments

Comments

@filipesilva
Copy link
Contributor

Hi there 👋

I'm seeing some odd behaviour with Datascript where two databases with the same datoms and schema show different results for entity lookups.

For some background context, we keep a persistent log of serialized datascript transactions on a server and also save a serialized version of database every ~500 txs. Serialization used datascript-transit.

When the app starts, it will load the most recent serialized db and then apply all txs since then.

We noticed this problem because the we saw a serialized db render differently than the previously db plus the applied transactions.

In the code examples below, old-db is the db that had a series of transactions applied to it in-memory, and new-db is that same db after a serialization round trip.

To check these two databases were the same, I compared the datoms and schema.

;; Datoms and schema are the same
(let [[only-in-old
        only-in-new
        in-both] (clojure.data/diff (seq old-db) (seq new-db))]
  (empty? only-in-old)       ;; true
  (empty? only-in-new)       ;; true
  (count in-both)            ;; 371900
  (= (:schema old-db) 
     (:schema new-db))       ;; true
  (identical? old-db new-db) ;; false
  )

I tracked it down to the entity lookup on these two databases.

;; old-db seems to not have this entity, but the datom exists
(let [uid "UN1JwSR7t"
      eid [:block/uid uid]
      db-id 45587
      has-uid (fn [[_ a v _]] (and (= a :block/uid) (= v uid)))
      q '[:find ?e
          :in $ ?block-uid
          :where
          [?e :block/uid ?block-uid]]]
  ;; d/entity doesn't find it on old-db
  (d/entity old-db eid) ;; nil
  (d/entity new-db eid) ;; {:db/id 45587}

  ;; datom is there
  (filter has-uid (seq old-db)) ;; (#datascript/Datom [45587 :block/uid "UN1JwSR7t" 537057490 true])
  (filter has-uid (seq new-db)) ;; (#datascript/Datom [45587 :block/uid "UN1JwSR7t" 537057490 true])

  ;; Pull can't find it either on old-db
  (d/pull old-db '[:db/id] eid) ;; throws :entity-id/missing error
  (d/pull new-db '[:db/id] eid) ;; #:db{:id 45587}

  ;; Query finds it though.
  (d/q q old-db uid)            ;; #{[45587]}
  (d/q q new-db uid)            ;; #{[45587]}
  )

I thought this was really weird so then I compared the indexes for a sanity check.

;; these two indexes are the same
(let [[only-in-old
        only-in-new
        in-both] (clojure.data/diff (:aevt old-db) (:aevt new-db))]
  (count only-in-old) ;; 0
  (count only-in-new) ;; 0
  (count in-both)     ;; 371900
  )

(let [[only-in-old
        only-in-new
        in-both] (clojure.data/diff (:eavt old-db) (:eavt new-db))]
  (count only-in-old) ;; 0
  (count only-in-new) ;; 0
  (count in-both)     ;; 371900
  )

;; but this index has a bunch of entries that are different!
(let [[only-in-old
        only-in-new
        in-both] (clojure.data/diff (:avet old-db) (:avet new-db))]
  (count only-in-old) ;; 717
  (count only-in-new) ;; 177
  (count in-both)     ;; 231502 
  )

Worth mentioning in case it's relevant: this db includes a few js/NaN values that I filtered out of the results, and nested maps as values.

I'm trying to get a standalone minimal repro up but this is not trivial since this DB contains a lot of sensitive information. I think I'm able to pare it down though.

I wanted to open the issue as soon as possible in case you had some tips on what might be happening or what I should focus on for the repro.

I couldn't find any other open issues that seemed to be the same as this one, but #373 sounds similar.

I think this is a bug because I expect databases that have the same datoms and schema to also have the same index, and because the entity lookup should return the entity if it exists.
If there are legitimate cases that don't conform to these expectations, please let me know.

Cheers, and thanks for the awesome database!

@tonsky
Copy link
Owner

tonsky commented Mar 2, 2021

Interesting! I suspect NaNs might be the reason, since they are pretty weird:

1 < NaN
=> false
1 > NaN
=> false

The fact that you only see difference in :avet supports this theory: it’s the only index where DS gets to compare values.

Can you try modified Datascript build? I suggest you insert NaN checks around here

(defn value-compare [x y]
(cond
(= x y) 0
#?@(:clj [(instance? Number x) (clojure.lang.Numbers/compare x y)])
#?@(:clj [(instance? Comparable x) (.compareTo ^Comparable x y)]
:cljs [(satisfies? IComparable x) (-compare x y)])
#?@(:cljs [(and (or (string? x) (array? x) (true? x) (false? x))
(identical? (type x) (type y))) (garray/defaultCompare x y)])
:else (- (hash x) (hash y))))
. Something to make NaN to sort stable (e.g. always bigger or always smaller than any other number)?

@filipesilva
Copy link
Contributor Author

filipesilva commented Mar 4, 2021

Hi again @tonsky!

I didn't pursue the NaN issue yet, because my results so far didn't lead towards it being the root of the problem. Our NaNs are within maps that we store as values. I do think it's something around value comparison though.

At the moment I'm able to isolate the transaction that takes a db's avet index to a broken state (I'll describe what a broken state is later), but attempting to do the transaction in isolation on a toy db does not seem to break it.

The only thing odd about this transaction is that it has a number value instead of a string (like other existing entities) for a {:db/unique :db.unique/identity} schema property. In fact, the CLJS version of datascript accepts this transaction and ends up with broken indexes, but the CLJ version throws on transact with an error about comparing string and long, and does not end up with broken indexes.

Regarding what constitutes a broken db, I define it as a db that's not equal to a clone of itself, where equality is defined by schemas, indexes, and datoms being the same:

(ns relemma.shared.logic.datascript.equality
  (:require
    [clojure.data :as data]
    [clojure.walk :as walk]
    [datascript.core :as datascript]))

(defn NaN? [x]
  ;; NaN is the only number that isn't equal to itself.
  ;; NB: both (= ##NaN ##NaN) and (not= ##NaN ##NaN) are false in CLJ (but not CLJS) for some reason.
  (and (number? x) (false? (= x x))))

(defn strip-NaN [x]
  (walk/postwalk #(if (NaN? %) 0 %) x))

(defn datom->vec [[e a v t]]
  [e a v t])

(defn normalized-datoms [db]
  (->> db
       seq
       (map datom->vec)
       set
       strip-NaN))

(defn datom-diff [db1 db2]
  (data/diff (normalized-datoms db1) (normalized-datoms db2)))

(defn datoms= [db1 db2]
  (= (normalized-datoms db1) (normalized-datoms db2)))

(defn diff-idx [idx x y]
  (data/diff (idx x) (idx y)))

(defn index= [db1 db2]
  (let [aevt-diff (diff-idx :aevt db1 db2)
        eavt-diff (diff-idx :eavt db1 db2)
        avet-diff (diff-idx :avet db1 db2)]
     (= (first aevt-diff) (second aevt-diff)
       (first eavt-diff) (second eavt-diff)
       (first avet-diff) (second avet-diff)
       nil)))

(defn schema= [db1 db2]
  (= (:schema db1) (:schema db2)))

(defn db= [db1 db2]
  (and (schema= db1 db2)
       (index= db1 db2)
       (datoms= db1 db2)))

(defn clone-db [db]
  ;; NB: (seq db) is the same as (d/datoms db :eavt)
  (datascript/init-db (seq db) (:schema db)))

(defn broken-db? [db] (not (db= db (clone-db db))))

In the case I describe above, the db before the transaction returns false from (broken-db? db) and the db after the transaction returns true.

There's something else interesting about the avet index diff though. Below is the way by which the indexes between the db and the clone differ. I've redacted the uids, but I think the only relevant thing was that they didn't repeat across or inside the sets. Notice how the second set has the numeric uid in the middle.

(take 2 (de/diff-idx :avet broken-db broken-db-clone)) => 
(#{#datascript/Datom [45652 :block/uid 45619 537058459 true]
   #datascript/Datom [8905 :block/uid "redacted-001" 536911536 true]
   #datascript/Datom [27110 :block/uid "redacted-002" 536987307 true]}
 #{#datascript/Datom [45646 :block/uid "redacted-003" 537058402 true]
   #datascript/Datom [45645 :block/uid "redacted-004" 537058389 true]
   #datascript/Datom [45650 :block/uid "redacted-005" 537058422 true]
   #datascript/Datom [45651 :block/uid "redacted-006" 537058459 true]
   #datascript/Datom [45647 :block/uid "redacted-007" 537058406 true]
   #datascript/Datom [45642 :block/uid "redacted-008" 537058365 true]
   #datascript/Datom [45643 :block/uid "redacted-009" 537058372 true]
   #datascript/Datom [45648 :block/uid "redacted-010" 537058410 true]
   #datascript/Datom [45649 :block/uid "redacted-011" 537058418 true]
   #datascript/Datom [45644 :block/uid "redacted-012" 537058383 true]
   #datascript/Datom [45652 :block/uid 45619 537058459 true]
   #datascript/Datom [20803 :block/uid "redacted-013" 536958221 true]
   #datascript/Datom [45626 :block/uid "redacted-014" 537058242 true]
   #datascript/Datom [28557 :block/uid "redacted-015" 536997349 true]
   #datascript/Datom [10154 :block/uid "redacted-016" 536920698 true]
   #datascript/Datom [6977 :block/uid "redacted-017" 536907281 true]
   #datascript/Datom [8621 :block/uid "redacted-018" 536909593 true]
   #datascript/Datom [23392 :block/uid "redacted-019" 536966137 true]
   #datascript/Datom [4100 :block/uid "redacted-020" 536890498 true]
   #datascript/Datom [32731 :block/uid "redacted-021" 537016082 true]
   #datascript/Datom [21212 :block/uid "redacted-022" 536958472 true]
   #datascript/Datom [5594 :block/uid "redacted-023" 536897359 true]})

For the second set, datascript/entity behaves differently for lookups before or after the numeric uid:

;; before the numeric block uid d/entity doesn't work
(datascript.core/entity broken-db-clone [:block/uid "redacted-006"]) => nil
;; after the numeric block uid d/entity works
(datascript.core/entity broken-db-clone [:block/uid "redacted-020"]) => {:db/id 4100}

This behaviour makes me thing there's something related to order going on here. Worth mentioning that the diff result set is always the same, so I don't think there's any race conditions going on.

I'm still trying to make an isolated repro but like I mentioned the toy example I attempted did not reproduce the issue. I was hoping you might have some hints of what I could be looking for to narrow it down to a usable repro.

@tonsky
Copy link
Owner

tonsky commented Mar 4, 2021

it has a number value instead of a string (like other existing entities)

Probably not a good idea either, same problem as with NaNs: they violate comparison invariants

2 < "a"
false
2 > "a"
false

@filipesilva
Copy link
Contributor Author

That makes sense with what I'm seeing then. I guess what's happening is that one of the protocols for search that Datascript implements is hitting that comparison and then thinks the element isn't there at all?

The question then is what the intended behaviour from Datascript is at this point. Should CLJS Datascript fail on transact (like the CLJ version) when the comparison invariants are violated, or should the user attempt to enforce this?

@tonsky
Copy link
Owner

tonsky commented Mar 4, 2021

I think it should fail, but it’s hard to figure out which values are comparable and which are not. That’s why it was left as a “loose constraint” for users to enforce. If values are not comparable, don’t put them in the same indexed attribute (non-indexed should be fine).

If you have an idea how to enforce it, I’ll be happy to implement.

@filipesilva
Copy link
Contributor Author

filipesilva commented Mar 4, 2021

Here's the minimal repro I could make without any dependencies other than datascript.

It shows how ent lookups stop working for some entities, but not all, and how the avet index is no longer equal between cloned dbs.

(ns filipe.playground
  (:require
   [datascript.core :as ds]
   [clojure.data :as data]))

;; I'm not sure why only some sets of txs reproduce this problem, but
;; guess it has to do with how Datascript implements searches.
(def txs [[{:block/uid "2LB4tlJGy"}]
          [{:block/uid "2ON453J0Z"}]
          [{:block/uid "2KqLLNbPg"}]
          [{:block/uid "2L0dcD7yy"}]
          [{:block/uid "2KqFNrhTZ"}]
          [{:block/uid "2KdQmItUD"}]
          [{:block/uid "2O8BcBfIL"}]
          [{:block/uid "2L4ZbI7nK"}]
          [{:block/uid "2KotiW36Z"}]
          [{:block/uid "2O4o-y5J8"}]
          [{:block/uid "2KimvuGko"}]
          [{:block/uid "dTR20ficj"}]
          [{:block/uid "wRmp6bXAx"}]
          [{:block/uid "rfL-iQOZm"}]
          [{:block/uid "tya6s422-"}]
          ;; This is the transaction that breaks the db.
          ;; It has a number instead of a string for a unique attr.
          [{:block/uid 45619}]])

(def schema {:block/uid {:db/unique :db.unique/identity}})
(def conn (ds/create-conn schema))

;; Make a working db by applying all txs but the one with a different type,
;; then also make the broken db.
(doseq [tx (butlast txs)]
  (ds/transact! conn tx))

(def working-db @conn)

;; CLJ will throw on this transact (can't compare long to string), but CLJS will accept it.
(ds/transact! conn (last txs))

(def broken-db @conn)

;; We can find broken lookups by attempting to look up all the datoms.
(defn broken-ent-lookups [db]
  (->> (seq db)
       (map (fn [[_ a v]] [a v]))
       (remove #(ds/entity db %))))

;; The broken db cannot lookup a few of the entities in its datoms.
(broken-ent-lookups broken-db) ;; => ([:block/uid "wRmp6bXAx"] [:block/uid "rfL-iQOZm"])

;; The working db doesn't have this problem.
(broken-ent-lookups working-db) ;; => ()

;; Just to reiterate how adding that last transaction broke existing lookups:
(ds/entity working-db [:block/uid "wRmp6bXAx"]) ;; => #:db{:id 13}
(ds/entity broken-db  [:block/uid "wRmp6bXAx"]) ;; => nil
(ds/entity working-db [:block/uid "rfL-iQOZm"]) ;; => #:db{:id 14}
(ds/entity broken-db  [:block/uid "rfL-iQOZm"]) ;; => nil

;; Another way to identify that the db is broken is by diffing the avet index on db clones.
(defn clone-db [db]
  (ds/init-db (seq db) (:schema db)))

(defn diff-avet-idx [x y]
  (data/diff (:avet x) (:avet y)))

;; The broken db diff will say there's two datoms that look identical but are exclusive to each index.
(diff-avet-idx (clone-db broken-db) (clone-db broken-db))
;; => [#{#datascript/Datom [16 :block/uid 45619 536870928 true]
;;       #datascript/Datom [14 :block/uid "rfL-iQOZm" 536870926 true]}
;;     #{#datascript/Datom [16 :block/uid 45619 536870928 true]
;;       #datascript/Datom [14 :block/uid "rfL-iQOZm" 536870926 true]}
;;     ...]

;; The working db diff will say the indexes don't have any differences.
(diff-avet-idx (clone-db working-db) (clone-db working-db))
;; => [nil nil ...]

@filipesilva
Copy link
Contributor Author

I think it should fail, but it’s hard to figure out which values are comparable and which are not. That’s why it was left as a “loose constraint” for users to enforce. If values are not comparable, don’t put them in the same indexed attribute (non-indexed should be fine).

If you have an idea how to enforce it, I’ll be happy to implement.

The first thing that comes to mind is (identical? (type x) (type y)), like you have for the string/array/true/false cases in value-compare. If the types are not identical, throw.

But I'm not sure if this has some weird unforeseen interaction that ends up being a regression. And I think the principle behind this change should be to prevent cases where the indexes are broken without affecting any other cases.

So maybe the better approach would be to verify the invariant, and throw if it is not ensured when adding the index entry. e.g. when adding 45619 to the index via either transact! or init-db (maybe others, I don't know), and the right insertion point was found, also check the invariant still holds for that insertion point. WDYT?

@filipesilva
Copy link
Contributor Author

Also relevant is how such a change would affect serialization with datascript-transit. I guess write-transit-str would be fine, but read-transit-str would probably throw when recreating the indexes, whereas before it would not throw.

How would a user recover from such a case? Probably by trying to remove the problematic datoms. But I'm not sure if datascript-transit is able to provide some intermediate state where a user could trim the datoms before proceeding with deserialization.

This is especially bad because a user can have a partially broken index without knowing, since the symptom is that some entities are unavailable for lookups, but others are still fine.

@tonsky
Copy link
Owner

tonsky commented Mar 4, 2021

The issue is very nasty, but tricky part is that it’s hard to figure out that values are comparable in Clojure. Same class might implement comparison incorrectly. Different classes might be comparable (e.g. Vector and Seq). Different classes might be not comparable and still return some value (string and number).

It’s also tricky to decide where this check should be inserted. When adding new value, which values should I compare it with? With all already added is too expensive. Even a check during a binary search is taxing.

The best would probably be to fix type in the schema, just like Datomic does. That would limit some valid use-cases (e.g. storing vectors and seqs next to each other, or storing user-defined types), but we’ll solve that issue. Unfortunately, this route is too backwards-incompatible to actually implement.

@filipesilva
Copy link
Contributor Author

I'm looking around the Datascript codebase to see if there's something I could do, and I noticed something interesting: trying to evaluate the code below in a CIDER cljs lein nodes repl actually errors out:

(let [conn (d/create-conn {:block/uid {:db/unique :db.unique/identity}})
      tx1 [{:block/uid "tya6s422-"}]
      _ (d/transact! conn tx1)
      tx2 [{:block/uid 45619}]
      _ (d/transact! conn tx2)]
  )

  Show: Project-Only All 
  Hide: Clojure Java REPL Tooling Duplicates  (12 frames hidden)

1. Unhandled clojure.lang.ExceptionInfo
   Execution error (Error) at (<cljs repl>:1). Cannot compare tya6s422- to 45619

   {:type :js-eval-exception,
    :error
    {:status :exception,
     :value
     "Execution error (Error) at (<cljs repl>:1).\nCannot compare tya6s422- to 45619\n"},
    :form
    (let
     [conn
      (d/create-conn {:block/uid {:db/unique :db.unique/identity}})
      tx1
      [{:block/uid "tya6s422-"}]
      _
      (d/transact! conn tx1)
      tx2
      [{:block/uid 45619}]
      _
      (d/transact! conn tx2)]),
    :js
    "try{cljs.core.pr_str.call(null,(function (){var ret__6694__auto__ = (function (){var conn = datascript.core.create_conn.call(null,new cljs.core.PersistentArrayMap(null, 1, [new cljs.core.Keyword(\"block\",\"uid\",\"block/uid\",-1623585167),new cljs.core.PersistentArrayMap(null, 1, [new cljs.core.Keyword(\"db\",\"unique\",\"db/unique\",329396388),new cljs.core.Keyword(\"db.unique\",\"identity\",\"db.unique/identity\",1675950722)], null)], null));\nvar tx1 = new cljs.core.PersistentVector(null, 1, 5, cljs.core.PersistentVector.EMPTY_NODE, [new cljs.core.PersistentArrayMap(null, 1, [new cljs.core.Keyword(\"block\",\"uid\",\"block/uid\",-1623585167),\"tya6s422-\"], null)], null);\nvar _ = datascript.core.transact_BANG_.call(null,conn,tx1);\nvar tx2 = new cljs.core.PersistentVector(null, 1, 5, cljs.core.PersistentVector.EMPTY_NODE, [new cljs.core.PersistentArrayMap(null, 1, [new cljs.core.Keyword(\"block\",\"uid\",\"block/uid\",-1623585167),(45619)], null)], null);\nvar ___$1 = datascript.core.transact_BANG_.call(null,conn,tx2);\nreturn null;\n})();\ncljs.core._STAR_3 = cljs.core._STAR_2;\n\ncljs.core._STAR_2 = cljs.core._STAR_1;\n\ncljs.core._STAR_1 = ret__6694__auto__;\n\nreturn ret__6694__auto__;\n})());\n}catch (e11095){var e__6695__auto___11096 = e11095;\ncljs.core._STAR_e = e__6695__auto___11096;\n\nthrow e__6695__auto___11096;\n}"}
                 repl.cljc:  578  cljs.repl$evaluate_form/invokeStatic
                 repl.cljc:  499  cljs.repl$evaluate_form/invoke
       piggieback_impl.clj:  250  cider.piggieback/eval-cljs
       piggieback_impl.clj:  249  cider.piggieback/eval-cljs
       piggieback_impl.clj:  289  cider.piggieback/do-eval/fn
                  AFn.java:  152  clojure.lang.AFn/applyToHelper
                  AFn.java:  144  clojure.lang.AFn/applyTo
                  core.clj:  665  clojure.core/apply
                  core.clj: 1973  clojure.core/with-bindings*
                  core.clj: 1973  clojure.core/with-bindings*
               RestFn.java:  425  clojure.lang.RestFn/invoke
       piggieback_impl.clj:  266  cider.piggieback/do-eval
       piggieback_impl.clj:  265  cider.piggieback/do-eval
       piggieback_impl.clj:  323  cider.piggieback/evaluate
       piggieback_impl.clj:  321  cider.piggieback/evaluate
                  Var.java:  384  clojure.lang.Var/invoke
       piggieback_impl.clj:  354  cider.piggieback/wrap-cljs-repl/fn/fn/fn
       piggieback_impl.clj:  201  cider.piggieback/enqueue/fn
                  AFn.java:   22  clojure.lang.AFn/run
               session.clj:  202  nrepl.middleware.session/session-exec/main-loop/fn
               session.clj:  201  nrepl.middleware.session/session-exec/main-loop
                  AFn.java:   22  clojure.lang.AFn/run
               Thread.java:  831  java.lang.Thread/run

The verbiage around the real fns themselves is java, but the eval'd string looks like it's javascript. I'm not really sure what throws the error either. But this looked odd and interesting.

@filipesilva
Copy link
Contributor Author

Another different approach here might be to actually allows these "bad" comparisons via the fallback hash in value-compare instead of preventing them. Which is to say, make it work on cljs instead of preventing it.

I'm still trying to understand where the comparison goes wrong, because that hash compare is there, but this seems like an interesting approach.

@filipesilva
Copy link
Contributor Author

I'm looking at this fn:

(defn- cmp-attr-quick [a1 a2]
  ;; either both are keywords or both are strings
  #?(:cljs
     (if (keyword? a1)
       (-compare a1 a2)
       (garray/defaultCompare a1 a2))
     :clj
     (.compareTo ^Comparable a1 a2)))

Should this be using garray/defaultCompare without the (identical? (type x) (type y)) as in the value-compare fn?

@tonsky
Copy link
Owner

tonsky commented Mar 5, 2021

This compares attribute keywords themselves, not the values

@filipesilva
Copy link
Contributor Author

Oh I think I figured out why in #386 (comment) I saw the value comparison error - I was on an old version of the Datascript repo, 0.18.8 actually. In the latest master there's no error.

So in 0.18.8 transacting different data types was actually an error but it's not anymore. Is this issue a regression then?

@filipesilva
Copy link
Contributor Author

filipesilva commented Mar 5, 2021

Also figured out where the Cannot compare tya6s422- to 45619 message came from. 0.18.8 was using Clojure's compare, and (compare "tya6s422-" to 45619) errors out with that message.

(defn cmp-datoms-avet-quick [^Datom d1, ^Datom d2]
(combine-cmp
(cmp-attr-quick (.-a d1) (.-a d2))
(compare (.-v d1) (.-v d2))
(#?(:clj Integer/compare :cljs -) (.-e d1) (.-e d2))
(#?(:clj Integer/compare :cljs -) (datom-tx d1) (datom-tx d2))))

@tonsky
Copy link
Owner

tonsky commented Mar 5, 2021

Looking at changelog, I started using hashes to compare non-comparable values in 1.0.0 release (#274 #356). Not sure if that solves your problem, since both string and number are both pretty much comparable

@filipesilva
Copy link
Contributor Author

That's interesting, they are comparable by hash just fine. Then this probably has nothing to do with insertion, and is about the entity lookup instead. Maybe that's not using the hash comparison? Will look into that next.

@tonsky
Copy link
Owner

tonsky commented Mar 5, 2021

Sure. It’s not the question of comparing hashes, that’s the simple part. The hard part is figuring out that your default compare is no good and you need to fall back to hashes

@filipesilva
Copy link
Contributor Author

I'm a bit confused about that part.

My impression from reading bits around datascript.db was that the AVET index is a sorted set using the cmp-datoms-avet fn, which uses value-cmp to compare the values, which in turn goes to value-compare.

(defn value-compare [x y]
  (cond
    (= x y) 0
    #?@(:clj  [(instance? Number x)       (clojure.lang.Numbers/compare x y)])
    #?@(:clj  [(instance? Comparable x)   (.compareTo ^Comparable x y)]
        :cljs [(satisfies? IComparable x) (-compare x y)])
    #?@(:cljs [(and (or (string? x) (array? x) (true? x) (false? x))
                 (identical? (type x) (type y))) (garray/defaultCompare x y)])
    :else (- (hash x) (hash y))))

This in turn does the fallback to hashes for the string/number case, and seems to compare them correctly:

(value-compare 45619 "tya6s422-")   ;; => -458678613
(value-compare "tya6s422-" 45619)   ;; => 458678613
(- (hash 45619) (hash "tya6s422-")) ;; => -458678613
(- (hash "tya6s422-") (hash 45619)) ;; => 458678613

So at first blush this seems like it should be fine? That's the part that confuses me. Is there some other value comparator elsewhere that I'm missing?

@tonsky
Copy link
Owner

tonsky commented Mar 5, 2021

Hmm, yes, if it gets to hash, it should be fine.

@filipesilva
Copy link
Contributor Author

So far I've traced the bad behaviour down to datascript.db/-datoms, which calls me.tonsky.persistent-sorted-set/slice on the sorted set that has the avet comparator.

I added some logging to the comparator fn:

(defn cmp-datoms-avet [^Datom d1, ^Datom d2]
  (print "cmp-datoms-avet value-cmp" (.-v d1) (.-v d2) (value-cmp (.-v d1) (.-v d2)))
  (combine-cmp
    (cmp (.-a d1) (.-a d2))
    (value-cmp (.-v d1) (.-v d2))
    (#?(:clj Integer/compare :cljs -) (.-e d1) (.-e d2))
    (#?(:clj Integer/compare :cljs -) (datom-tx d1) (datom-tx d2))))

Then, still using the repro code in #386 (comment), I called the following:

(def eid1 [:block/uid "wRmp6bXAx"])

(datascript.db/-datoms working-db :avet eid1)
;; => (#datascript/Datom[13 :block/uid "wRmp6bXAx" 536870925 true])
;; prints
;;  cmp-datoms-avet value-cmp 2LB4tlJGy wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp dTR20ficj wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp tya6s422- wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp wRmp6bXAx wRmp6bXAx 0
;;  cmp-datoms-avet value-cmp 2LB4tlJGy wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp dTR20ficj wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp tya6s422- wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp wRmp6bXAx wRmp6bXAx 0


(datascript.db/-datoms broken-db :avet eid1)
;; => nil
;; prints
;;  cmp-datoms-avet value-cmp 2LB4tlJGy wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp 45619 wRmp6bXAx 905417766
;;  cmp-datoms-avet value-cmp 2O8BcBfIL wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp 2ON453J0Z wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp 2LB4tlJGy wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp 45619 wRmp6bXAx 905417766
;;  cmp-datoms-avet value-cmp 2O8BcBfIL wRmp6bXAx -1
;;  cmp-datoms-avet value-cmp 2ON453J0Z wRmp6bXAx -1

I find odd here is that the comparator is being called a second set of times after finding the right thing in the first case. But then I looked at the source in https://github.com/tonsky/persistent-sorted-set/blob/e29c02cb1991e29f968c78eb9760566f3d7231b3/src-clojure/me/tonsky/persistent_sorted_set.cljs#L744-L768 and -rseek is indeed called twice so maybe that's normal.

The other thing thing that crossed my mind was "it's common enough to get - operations the other way around so why don't I just try flipping the :else (- (hash x) (hash y))".

And that did the trick.

  ;; Now the broken-db lookups work instead of retuning nil
  (d/entity working-db [:block/uid "wRmp6bXAx"]) ;; => #:db{:id 13}
  (d/entity broken-db  [:block/uid "wRmp6bXAx"]) ;; => #:db{:id 13}
  (d/entity working-db [:block/uid "rfL-iQOZm"]) ;; => #:db{:id 14}
  (d/entity broken-db  [:block/uid "rfL-iQOZm"]) ;; => #:db{:id 13}

I guess this never came up in tests because the problem was the flipped comparator value needed to take you down a wrong path in a b-tree to reproduce.

Putting together a PR and tests.

@filipesilva
Copy link
Contributor Author

Thinking about it some more, I'm still confused as to why the hash comparison needed to be flipped. As long as it was self-consistent, there shouldn't be a problem right?

@filipesilva
Copy link
Contributor Author

I haven't done the PR because I'm pretty sure flipping the hash comparison is the wrong fix and would just move the problem to some other path of the binary tree.

Not really sure how to proceed from here, but at this point the problem looks related to where some values are ending up in the binary tree given the comparator.

It's interesting that the two lookups that fail share a lot of the comparisons with the insertion of the number value:

(d/db-with working-db (last txs))
;; prints
;; cmp-datoms-avet value-cmp 2LB4tlJGy 45619 -1231765406
;; cmp-datoms-avet value-cmp dTR20ficj 45619 736635941
;; cmp-datoms-avet value-cmp 2O8BcBfIL 45619 -828129617
;; cmp-datoms-avet value-cmp 2ON453J0Z 45619 -1747753137

 (d/entity broken-db  [:block/uid "wRmp6bXAx"]) ;; => nil
;; prints
;; cmp-datoms-avet value-cmp 2LB4tlJGy wRmp6bXAx -1
;; cmp-datoms-avet value-cmp 45619 wRmp6bXAx 905417766
;; cmp-datoms-avet value-cmp 2O8BcBfIL wRmp6bXAx -1
;; cmp-datoms-avet value-cmp 2ON453J0Z wRmp6bXAx -1

(d/entity broken-db  [:block/uid "rfL-iQOZm"]) ;; => nil
;; prints
;; cmp-datoms-avet value-cmp 2LB4tlJGy rfL-iQOZm -1
;; cmp-datoms-avet value-cmp 45619 rfL-iQOZm 2087952925
;; cmp-datoms-avet value-cmp 2O8BcBfIL rfL-iQOZm -1
;; cmp-datoms-avet value-cmp 2ON453J0Z rfL-iQOZm -1

@tonsky
Copy link
Owner

tonsky commented Mar 5, 2021

I suppose hash comparison is not going to work out. We lose transitivity:

45619 < dTR20ficj
dTR20ficj < rfL-iQOZm
rfL-iQOZm < 45619

E.g. comparing string-string and number-number gives you different order than comparing string-number or number-string.

To make this order correct I think we should do this: compare type first (as string, for example), if equal, then hashes inside that type. That should give us a total order and get rid of the error. Can you test it out? Something like

(defn value-compare [x y]
  (cond
    (= x y) 0
    #?@(:clj  [(instance? Number x)       (clojure.lang.Numbers/compare x y)])
    #?@(:clj  [(instance? Comparable x)   (.compareTo ^Comparable x y)]
        :cljs [(satisfies? IComparable x) (-compare x y)])
    (not (identical? (type x) (type y)))  (garray/defaultCompare (type x) (type y))
    #?@(:cljs [(or (string? x) (array? x) (true? x) (false? x)) (garray/defaultCompare x y)])
    :else (- (hash x) (hash y))))

@filipesilva
Copy link
Contributor Author

I think I see the transitivity problem. So the solution you're proposing is basically saying "anything in this type is always bigger/smaller than anything in that type", right? Going to give it a try.

@tonsky
Copy link
Owner

tonsky commented Mar 5, 2021

Yes, well put!

@filipesilva
Copy link
Contributor Author

Yes I can confirm that in the sample I put above this approach does indeed fix it. I'll put up a PR with the fix, and a test with the repro.

filipesilva added a commit to filipesilva/datascript that referenced this issue Mar 5, 2021
@filipesilva
Copy link
Contributor Author

Here it is, PTAL: #388

Also, it was very nice to have the back and forth here with you while trying to get to the bottom of this, thank you 😀 .

@tonsky tonsky closed this as completed in 262f266 Mar 6, 2021
@tonsky
Copy link
Owner

tonsky commented Mar 6, 2021

Fixed in 1.0.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants