[Filestore] Invalid CollectGarbage requests to blobstorage.
#652
nbs/cloud/filestore/libs/storage/tablet/tablet_actor_cleanup.cpp Lines 68 to 72 in 0555c7d
nbs/cloud/filestore/libs/storage/tablet/tablet_state_data.cpp Lines 935 to 939 in 0555c7d
nbs/cloud/filestore/libs/storage/tablet/model/garbage_queue.cpp Lines 224 to 232 in 0555c7d
Generating a new CommitId on the |
The main problem is that FlushBytes acquires a collect barrier that is lower than the last collect commit id:
This will lead to the following file layout:
After this acquisition there will be one barrier, equal to 43:
nbs/cloud/filestore/libs/storage/tablet/model/garbage_queue.cpp Lines 227 to 228 in 836a516
After that, a CollectGarbage request with one new garbage blob is sent, which makes the sequence of collect commit ids decrease: 42 after 44.
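To make the 44 then 42 sequence easier to follow, here is a minimal standalone sketch of the effect. It is not the actual `TGarbageQueue` code: `TBarrierModel` and its methods are hypothetical names, and the "lowest barrier minus one" rule is only inferred from the numbers 43 and 42 quoted above. The guarded variant shows one possible way to keep the sent values from decreasing; it is not claimed to be the fix from #1919, and clamping alone would not protect data that a late barrier was meant to cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical model of collect barriers; not the real filestore classes.
class TBarrierModel {
public:
    void AcquireBarrier(uint64_t commitId) {
        assert(commitId > 0);  // commitId - 1 must not wrap around
        Barriers.insert(commitId);
    }

    void ReleaseBarrier(uint64_t commitId) {
        auto it = Barriers.find(commitId);
        assert(it != Barriers.end());
        Barriers.erase(it);  // erase a single instance only
    }

    // Assumed rule: collect everything strictly below the lowest barrier,
    // or up to lastCommitId when no barrier is held.
    uint64_t NaiveCollectCommitId(uint64_t lastCommitId) const {
        if (Barriers.empty()) {
            return lastCommitId;
        }
        return std::min(*Barriers.begin() - 1, lastCommitId);
    }

    // Same rule, but never returns a value below the one already sent.
    uint64_t GuardedCollectCommitId(uint64_t lastCommitId) {
        LastSent = std::max(LastSent, NaiveCollectCommitId(lastCommitId));
        return LastSent;
    }

private:
    std::multiset<uint64_t> Barriers;
    uint64_t LastSent = 0;
};
```

With no barriers held, a collect at commit id 44 yields 44. If a barrier at 43 is acquired afterwards, the naive rule yields 42 for the next request, which is the decrease described above, while the guarded variant keeps returning 44 until the barrier is released.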
To reproduce the issue, one can use fio:
fio --name=random-write-test \
    --ioengine=libaio \
    --rw=randwrite \
    --bs=512-4k \
    --size=1G \
    --direct=1 \
    --iodepth=16 \
    --numjobs=4 \
    --offset_increment=512 \
    --do_verify=0 \
    --time_based \
    --runtime=$[120*60*60]
AppCriticalEvents/CollectGarbageError errors after starting the aforementioned fio:
AppCriticalEvents/CollectGarbageError errors after deploying fix #1919:
Errors like the following started causing IndexTablet to restart
Looks like CollectGarbage requests sent by TIndexTablet do not guarantee an increasing order of (gen, step).
Started seeing this error much more often after enabling vhost-side reads on the whole cluster.
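A small sketch of how the ordering claim could be checked against a log of sent requests. `TGenStep` and `FindFirstDecrease` are hypothetical names, not part of the filestore code, and the sketch assumes that repeating the same (gen, step) is acceptable while a lower pair is not.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// (generation, step) of a CollectGarbage request; std::pair compares
// lexicographically, so generation is compared first, then step.
using TGenStep = std::pair<uint32_t, uint32_t>;

// Returns the index of the first request whose (gen, step) is lower than
// that of the request sent right before it, or -1 if there is none.
std::ptrdiff_t FindFirstDecrease(const std::vector<TGenStep>& sent) {
    auto it = std::adjacent_find(
        sent.begin(), sent.end(), std::greater<TGenStep>());
    if (it == sent.end()) {
        return -1;
    }
    return (it - sent.begin()) + 1;
}
```

For the sequence described in this issue, steps 44 then 42 within the same generation, the function reports index 1.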