Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sync): prevent race condition by relying on autoincrement #4938

Merged
merged 3 commits into from
Nov 8, 2023

Conversation

max-nextcloud
Copy link
Collaborator

@max-nextcloud max-nextcloud commented Oct 30, 2023

Prevent a possible race condition when two clients add steps at the same time.

See #4600.

Rely on the autoincrementing id in order to provide a canonical order that steps can be retrieved in.

When two clients push steps at the same time
the entries receive destinct ids that increment.
So if another client fetches steps in between
it will see the smaller id as the version of the fetched step and fetch the other step later on.

Transition:
In the future we can drop the version column entirely but currently there are still steps stored in the database that make use of the old column.
So we need to transition away from that.

In order to find entries that are newer than version x we select those that have both a version and an id larger than x.

Entries of the new format are newer than any entry of the old format. So we set their version to the largest possible value. This way they will always fulfill the version condition and the condition on the id is more strict and therefore effective.

For the old format the version will be smaller than the id as it's incremented per document while the id is unique accross documents. Therefore the version condition is the more strict one and effective.

The only scenario where the version might be larger than the id would be if there's very few documents in the database and they have had a lot of steps stored in single database entries.

📝 Summary

🏁 Checklist

  • Code is properly formatted (npm run lint / npm run stylelint / composer run cs:check)
  • Sign-off message is added to all commits
  • Tests (unit, integration and/or end-to-end) passing and the changes are covered with tests
  • Documentation is not required

@max-nextcloud max-nextcloud force-pushed the fix/4600-prevent-race-condition branch 2 times, most recently from 9182a19 to e9dcefc Compare October 30, 2023 19:49
@max-nextcloud max-nextcloud marked this pull request as draft October 30, 2023 19:57
@max-nextcloud
Copy link
Collaborator Author

Obviously WIP - opened the PR to be able to discuss the approach.

lib/Db/Step.php Outdated Show resolved Hide resolved
lib/Db/Step.php Outdated Show resolved Hide resolved
@mejo- mejo- force-pushed the fix/4600-prevent-race-condition branch 2 times, most recently from 1503b54 to a5175cf Compare October 31, 2023 11:36
@mejo-
Copy link
Member

mejo- commented Oct 31, 2023

I like the approach, it's a way cleaner solution than my approach to make read+write of version atomic. I pushed some fixes (force-pushed code-style fixes into the original commit and added another commit for logical fixes).

@mejo-
Copy link
Member

mejo- commented Oct 31, 2023

Successfully tested the following scenario:

  • main branch checked out
  • client1 creates document and creates steps
  • client2 joins document session and creates steps
  • client2 leaves document session
  • switch to PR branch
  • client3 joins document session and creates steps
  • client2 joins document session and creates steps
  • client1 creates steps

All change are synced between all clients as expected.

@juliusknorr
Copy link
Member

juliusknorr commented Oct 31, 2023

Found some weirdness when giving it a quick test, but not sure if related to this PR or not:

  • Two separate browsers
  • Take browser A offline via dev tools and quickly click back and type one character
  • Edit a bit in browser B
  • Wait until the close request in browser A is failing
  • Set browser A online again
  • Click reconnect
  • 🐛 See that reconnect seems successful and changes from B are picked up, however changes from A do not propagate to B
  • 🐛 Reconnect error seems to be showing and hiding randomly
  • Editing is not stable after that

Edit: Also happens on main it seems

@mejo- mejo- marked this pull request as ready for review October 31, 2023 22:07
@mejo- mejo- added bug Something isn't working 3. to review feature: sync labels Oct 31, 2023
Copy link
Member

@mejo- mejo- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for merge from my perspective.

@max-nextcloud
Copy link
Collaborator Author

There are two things that i did not look into yet that might be worthwhile ensuring:

  • I'm not sure how the database queries perform now. We're now limiting by both version and id and I think there's an index on each of them but no combined index. Since these endpoints are called very often would be good to ensure there's no performance regression. Maybe we can get hold of the sql query that's run right now and run EXPLAIN on it to see which indexes it would use.
  • There's a theoretical possibility that the highest version in the steps table is larger than the largest id. Versions can increment by more than one so if i only have one document that has a few steps and some of them increase the version by more than one the resulting version will be higher than the max id. This is unlikely but i am unsure if we can completely ignore it. It could be mitigated by inserting a step with a higher id that does not serve any other purpose. Not sure that cornercase is worth the effort though.

max-nextcloud and others added 2 commits November 2, 2023 22:46
Prevent a possible race condition when two clients add steps at the same time.

See #4600.

Rely on the autoincrementing id in order to provide a canonical order
that steps can be retrieved in.

When two clients push steps at the same time
the entries receive destinct ids that increment.
So if another client fetches steps in between
it will see the smaller id as the version of the fetched step
and fetch the other step later on.

Transition:
In the future we can drop the version column entirely
but currently there are still steps stored in the database
that make use of the old column.
So we need to transition away from that.

In order to find entries that are newer than version x
we select those that have both a version and an id larger than x.

Entries of the new format are newer than any entry of the old format.
So we set their version to the largest possible value.
This way they will always fulfill the version condition
and the condition on the id is more strict and therefore effective.

For the old format the version will be smaller than the id
as it's incremented per document while the id is unique accross documents.
Therefore the version condition is the more strict one and effective.

The only scenario where the version might be larger than the id
would be if there's very few documents in the database
and they have had a lot of steps stored in single database entries.

Signed-off-by: Max <[email protected]>
Signed-off-by: Jonas <[email protected]>
The value used before (largest possible MySQL BIGINT value) was too big
for PHP int. Since we still support 32-bit platforms on Nextcloud, let's
stick to the largest possible 32-bit PHP integer value.

Besides, setting the value as default for `Step::version` doesn't work
as `QBMapper->insert()` doesn't recognize the `version` field as changed
in that case. So let's default to `0` again and set it using
`Step->setVersion()` later.

Signed-off-by: Jonas <[email protected]>
@juliusknorr juliusknorr force-pushed the fix/4600-prevent-race-condition branch from a5175cf to 308d640 Compare November 2, 2023 21:46
@mejo-
Copy link
Member

mejo- commented Nov 7, 2023

There's a theoretical possibility that the highest version in the steps table is larger than the largest id. Versions can increment by more than one so if i only have one document that has a few steps and some of them increase the version by more than one the resulting version will be higher than the max id. This is unlikely but i am unsure if we can completely ignore it. It could be mitigated by inserting a step with a higher id that does not serve any other purpose. Not sure that cornercase is worth the effort though.

I thought a bit about this and I think we should ignore this corner case. Nextcloud instances with just one (or very few) document sessions likely aren't used collaboratively. Sure, there's a theoretical possibility that people setup a Nextcloud instance to work together on one file, but that's not very likely. And even in this scenario, the autosave should have saved the last document state. So worst case in this very unlikely scenario people will loose a few editing steps. This is not worth adding complex migration logic in my opinion.

@mejo-
Copy link
Member

mejo- commented Nov 8, 2023

I'm not sure how the database queries perform now. We're now limiting by both version and id and I think there's an index on each of them but no combined index. Since these endpoints are called very often would be good to ensure there's no performance regression. Maybe we can get hold of the sql query that's run right now and run EXPLAIN on it to see which indexes it would use.

@juliushaertl and me looked into this. The index on document_id is always used first and the following where statements are only executed on its subset. In our tests, the new queries didn't look worse regarding performance than the old ones.

@mejo- mejo- merged commit c49d129 into main Nov 8, 2023
46 checks passed
@mejo- mejo- deleted the fix/4600-prevent-race-condition branch November 8, 2023 11:24
@mejo-
Copy link
Member

mejo- commented Nov 8, 2023

/backport ea208ee,308d64069f8903a113985d7d71d027b127a3b0ce to stable27

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants