Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue-652: do not acquire barriers from the past upon FlushBytes #1919

Merged
merged 6 commits into from
Sep 16, 2024

Conversation

debnatkh
Copy link
Collaborator

@debnatkh debnatkh commented Aug 30, 2024

References (and, hopefully, fixes) #652

The main problem is that FlushBytes acquires collect barrier, which is less than the last collect commit id:


  1. Consider that there were the following sequence of writes:
Write(0,       256 KiB, 'a') -> Blob(commitId = 42)
Write(256 KiB, 256 KiB, 'b') -> Blob(commitId = 43)
Write(512 KiB, 1,       'f') -> FreshBytes(commitId = 44)
Write(0,       256 KiB, 'c') -> Blob(commitId = 45)

This will lead to the following file layout: [ccccccc][bbbbbbb][f]

  1. After execution of the CollectGarbage, all three new blobs will get a KeepFlag and the last collect commit id will be equal to 44
CommitId:     41      42        43       44
            Blob(a) Blob(b) FreshBytes Blob(c)
                                          |
                                 LastCollectCommitId
  1. After execution of the Cleanup operation, the first blob will be marked as garbage

  2. Let us execute FlushBytes operation, It will acquire collect barrier, equal to the minimal commitId, associated with FreshBlobs:

    for (const auto& bytes: args.Bytes) {
    args.CollectCommitId = Min(args.CollectCommitId, bytes.MinCommitId);
    }

After this acquisition there will be one barrier, equal to 43:

CommitId:     41      42        43       44
            Blob(a) Blob(b) FreshBytes Blob(c)
                                |         |
                             Barrier  LastCollectCommitId
  1. When the next CollectGarbage operation is to be executed, it will choose 42 as a collectCommitId:

const auto& barrier = *Impl->Barriers.begin();
return barrier.CommitId - 1;

After it the CollectGarbage request with one new grabage will be sent, leading to a decrease in collectCommitIds sequence: 42 after 44

@debnatkh debnatkh added the filestore Add this label to run only cloud/filestore build and tests on PR label Aug 30, 2024
@debnatkh debnatkh added the large-tests Launch large tests for PR label Aug 30, 2024
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 29b86c8.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
1815 1815 0 0 0 0

@debnatkh debnatkh marked this pull request as ready for review August 30, 2024 17:43
@debnatkh debnatkh requested a review from qkrorlqr August 30, 2024 17:43
@debnatkh debnatkh changed the title issue-652: add test reproducing the error issue-652: do not acquire barriers from the past upon FlushBytes Aug 30, 2024
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 5f6ce5f.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
1815 1815 0 0 0 0

Copy link
Collaborator

@qkrorlqr qkrorlqr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Хотя нет, unship, теперь у нас FlushBytes начинает отличаться от Compaction. Compaction генерирует CommitId один раз и использует его как для своих DstBlobs, так и для установки барьера. Не хочется, чтобы FlushBytes отличался от Compaction в этом месте, т.к. по своей сути он делает то же самое, что и Compaction.

Также смущает следующее. Compaction и FlushBytes читают индекс на Prepare-стадии. А CollectBarrier устанавливают на Complete-стадии. Между этими стадиями могут случиться другие события. Сейчас код устроен так, что Cleanup / Compaction / FlushBytes не работают одновременно. При этом что-то поменять в индексе так, что в определенном диапазоне по определенному CommitId могут перестать находиться блобы, которые на Prepare-стадии нашел Compaction или FlushBytes, могут только операции Cleanup / Compaction / FlushBytes. Но это, во-первых, хрупко и заслуживает как минимум комментария в коде. Во-вторых, когда мы начнем реализовывать #1923 , такой операцией также станет DeleteCheckpoint, и надо написать в коде //TODO(#1923) про вот эту ситуацию. Ну и подумать еще раз, а не может ли уже прямо сейчас быть проблемой тот факт, что блобы ищутся в индексе на Prepare-стадии, а барьер ставится позже - на Complete.

Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit f68106f.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
1815 1815 0 0 0 0

Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit cf06289.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
1815 1815 0 0 0 0

@debnatkh debnatkh merged commit 4282375 into main Sep 16, 2024
11 of 12 checks passed
@debnatkh debnatkh deleted the users/debnatkh/issue-652-repro-ut branch September 16, 2024 11:41
debnatkh added a commit that referenced this pull request Sep 18, 2024
debnatkh added a commit that referenced this pull request Sep 18, 2024
…t upon, fix ls command in client without --json flag (#2063)

* issue-652: do not acquire barriers from the past upon `FlushBytes` (#1919)

issue-652: add test reproducing the error

* [Filestore] fix ls command in client without `--json` flag (#2041)

[Filestore] fix ls command without `--json`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
filestore Add this label to run only cloud/filestore build and tests on PR large-tests Launch large tests for PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants