Adding documentation regarding the importance of a single time source… #110

Open: gilesw wants to merge 2 commits into base: bdr-plugin/next


@gilesw gilesw commented Jul 10, 2015

… on conflict handlers

see https://github.com/2ndQuadrant/bdr/issues/109

@anarazel

Hi,

On 2015-07-10 03:14:16 -0700, gilesw wrote:

> Adding documentation regarding the importance of a single time source on conflict handlers
>
> If you have a conflict when the time on two nodes is out of sync the conflict
> may never be resolved because the last update time will never match
> even after the handler has run. This will manifest itself as row updates only
> syncing in one direction.

That shouldn't actually be happening - this should result in the "wrong
row" winning, but the conflict should nevertheless be resolved.
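To illustrate the expectation described above, here is a minimal sketch of a generic last-update-wins rule under clock skew. It is not BDR's implementation: the `Version` class, the `last_update_wins` function, and the 45-minute skew are all hypothetical, chosen only to show that both nodes can still converge on the same row even when that row is the older write.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Version:
    node: str
    value: str
    commit_time: datetime  # stamped with the committing node's own clock


def last_update_wins(local: Version, remote: Version) -> Version:
    """Keep whichever version carries the later commit timestamp (hypothetical rule)."""
    return remote if remote.commit_time > local.commit_time else local


true_time = datetime(2015, 7, 8, 15, 0, tzinfo=timezone.utc)
skew_b = timedelta(minutes=-45)  # assumption: node b's clock runs 45 minutes slow

# Node a writes first; node b writes ten minutes later in real time,
# but its slow clock stamps the write with an earlier timestamp.
on_a = Version("a", "first write", true_time)
on_b = Version("b", "second write", true_time + timedelta(minutes=10) + skew_b)

# Each node resolves the same conflict from its own side.
winner_seen_on_a = last_update_wins(local=on_a, remote=on_b)
winner_seen_on_b = last_update_wins(local=on_b, remote=on_a)

print(winner_seen_on_a.value, "|", winner_seen_on_b.value)
```

In this sketch both nodes keep the write from node a, so the conflict is resolved consistently, but the write that really happened last is discarded. A failure to resolve at all, as reported, would point to something beyond this simple comparison.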

@gilesw
Author

gilesw commented Jul 10, 2015

Hi anarazel,

I've corrected the time source now, but the steps I used to create the conflict were:

For an update/update conflict, I powered down node a and updated the row on node b. Then I powered down node b, updated the same row on node a, and powered node b back on.

conflict_id              | 860
local_node_sysid         | 6166345561721046825
local_conflict_xid       | 4990
local_conflict_lsn       | 0/CA06DD98
local_conflict_time      | 2015-07-08 16:30:29.276713+00
object_schema            | public
object_name              | table
remote_node_sysid        | 6166334043667378995
remote_txid              | 3114
remote_commit_time       | 2015-07-08 15:45:44.999168+00
remote_commit_lsn        | 1/4104FF98
conflict_type            | update_update
conflict_resolution      | last_update_wins_keep_local
local_tuple              | {"table_id":1452776,"last_update_id":"xxx","password":"obs","username":"final7","acc_id":1,"last_update_time":"2015-07-08T16:16:34.854137+00:00","make_public":false}
remote_tuple             | {"table_id":1452776,"last_update_id":"xxx","password":"obs","username":"final8","acc_id":1,"last_update_time":"2015-07-08T15:45:44.994954+00:00","make_public":false}
local_tuple_xmin         | 4988
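As a rough aid for reading the record above, the sketch below computes the gaps between its timestamps. The values are copied from the record, with the `+00` offsets written as `+00:00` so that `datetime.fromisoformat` accepts them on older Python versions. Two caveats: `last_update_time` is a column of the application table, not necessarily what the resolver compares, and the commit time of the local tuple (xmin 4988) does not appear in the record, so this cannot confirm how `last_update_wins_keep_local` was decided.

```python
from datetime import datetime

# Values copied from the conflict record; "+00" normalised to "+00:00".
remote_commit_time = datetime.fromisoformat("2015-07-08 15:45:44.999168+00:00")
local_conflict_time = datetime.fromisoformat("2015-07-08 16:30:29.276713+00:00")
local_last_update = datetime.fromisoformat("2015-07-08T16:16:34.854137+00:00")
remote_last_update = datetime.fromisoformat("2015-07-08T15:45:44.994954+00:00")

# Gap between the application-level timestamps stored in the two tuples.
tuple_gap = local_last_update - remote_last_update

# Gap between the remote commit and the moment the local node logged the conflict.
detection_gap = local_conflict_time - remote_commit_time

print("local tuple newer by:", tuple_gap)
print("conflict logged after remote commit by:", detection_gap)
```

Because the node clocks were out of sync when these values were recorded, each gap mixes real elapsed time with clock skew and should not be read as a measurement of either one alone.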

It did have me stumped for a good while, which is why I submitted the issue for the doc update. As soon as the time source was corrected, though, the syncing was bi-directional again.

I did try to do some more diagnosis by clearing out the conflict history to try and log each step, but I got into an infinite conflict loop. If you delete the conflict history on each node, you actually generate a delete/delete conflict. Do you want me to submit this as a bug, or is there a purge function that I'm missing?

@ringerc

ringerc commented Mar 18, 2016

> If you delete the conflict history on each node, you actually generate a delete/delete conflict

Hm. We don't replicate inserts into the conflict history table from the conflict tracking code, but maybe we don't filter out subsequent SQL-level update/delete on the table? I'll need to check. Creating new bug.

@ringerc

ringerc commented Mar 18, 2016

It sounds like we need to reproduce this and fix the underlying bug with desynchronized time causing failure to resolve.

@gilesw Can you supply a more detailed set of steps to reproduce this? BDR setup commands, DDL, and the SQL run on each node to create the issue?

3 participants