Failed to build delete range aggregator Magic number mismatch: expected #10494

Closed
kwannoel opened this issue Jun 23, 2023 · 6 comments · Fixed by #10700
Labels: found-by-sqlsmith, help wanted, type/bug

Comments

@kwannoel
Contributor

2022-08-24T01:04:14.584219Z  WARN node{id=13 name="compactor-2"}:task{id=6083}: risingwave_storage::hummock::compactor: Failed to build delete range aggregator Magic number mismatch: expected 1468377971, found: 23981349.

See https://gist.github.com/kwannoel/7f34fff04ee8f3eecdd8d10ab25bdc7f?permalink_comment_id=4608036#gistcomment-4608036 for more info.

github-actions bot added this to the release-0.20 milestone on Jun 23, 2023
lmatz added the type/bug label and removed the type/feature label on Jun 23, 2023
kwannoel added the help wanted label on Jun 23, 2023
hzxa21 self-assigned this on Jun 23, 2023
@hzxa21
Collaborator

hzxa21 commented Jun 23, 2023

I successfully reproduced this issue locally and verified several things:

  1. The encoding size and meta offset are both correct on the writer side and the reader side.
  2. The content seen by the writer and the content seen by the reader are indeed different.

I suspected something was wrong in the madsim s3 simulator, so I turned on debug logging in madsim and saw these debug logs relevant to the problematic object (id=93):

2022-08-24T01:04:09.509693Z DEBUG node{id=3 name="s3"}:task{id=4273}: madsim_aws_sdk_s3::sim::server::service: upload_part bucket="hummock001" key="hummock_001/32/93.data" upload_id="121596723" part_number=2
2022-08-24T01:04:09.511002Z DEBUG node{id=3 name="s3"}:task{id=4274}: madsim_aws_sdk_s3::sim::server::service: upload_part bucket="hummock001" key="hummock_001/32/93.data" upload_id="121596723" part_number=1
2022-08-24T01:04:09.528232Z DEBUG node{id=3 name="s3"}:task{id=4280}: madsim_aws_sdk_s3::sim::server::service: complete_multipart_upload bucket="hummock001" key="hummock_001/32/93.data" upload_id="121596723"

We can see that UploadPart for part_number=2 finishes before UploadPart for part_number=1, which is valid behavior for S3 multipart upload since parts can be uploaded concurrently over different connections. However, after checking the madsim s3 simulator code, the simulator seems to ignore the part numbers and puts part 2 before part 1 in the object body.
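To illustrate why out-of-order arrival must not affect the final object, here is a minimal sketch of a multipart store that assembles parts by part number. It is not the madsim or S3 implementation: `MultipartUpload`, `check_magic`, and the toy layout (payload followed by a 4-byte little-endian magic) are assumptions made for illustration; only the magic value is taken from the log above.

```rust
use std::collections::BTreeMap;

/// Expected magic value, copied from the error message in this issue.
const MAGIC: u32 = 1468377971;

/// Hypothetical in-memory multipart upload. Keying by part number means
/// iteration order is independent of the order in which parts arrive.
#[derive(Default)]
struct MultipartUpload {
    parts: BTreeMap<u32, Vec<u8>>,
}

impl MultipartUpload {
    /// Store (or overwrite) one part.
    fn upload_part(&mut self, part_number: u32, body: Vec<u8>) {
        self.parts.insert(part_number, body);
    }

    /// Concatenate all parts in ascending part-number order.
    fn complete(self) -> Vec<u8> {
        self.parts.into_values().flatten().collect()
    }
}

/// Toy reader check (layout is an assumption, not Hummock's real format):
/// the last 4 bytes of the object must hold MAGIC in little-endian.
fn check_magic(object: &[u8]) -> Result<(), String> {
    if object.len() < 4 {
        return Err("object too short".to_string());
    }
    let mut tail = [0u8; 4];
    tail.copy_from_slice(&object[object.len() - 4..]);
    let found = u32::from_le_bytes(tail);
    if found == MAGIC {
        Ok(())
    } else {
        Err(format!(
            "Magic number mismatch: expected {}, found: {}.",
            MAGIC, found
        ))
    }
}
```

With this sketch, uploading part 2 (ending in the magic bytes) before part 1 still yields an object that passes `check_magic`, whereas concatenating the same parts in arrival order leaves arbitrary payload bytes where the reader expects the magic number.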

Root cause: although the simulator first sorts the parts by part number and populates a selection idx, it then sorts the selection idx as well. This makes the selection idx ordering identical to the parts' original (arrival) ordering, so the parts are never actually ordered by part number.
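The pattern described in the root cause can be sketched as follows. This is a hypothetical reconstruction, not the actual madsim code: `Part`, `selection_by_part_number`, `assemble_buggy`, and `assemble_fixed` are names invented for illustration, and the real simulator may build its selection differently.

```rust
/// Hypothetical stored part; `stored` slices below are in arrival order.
#[derive(Clone, Debug)]
struct Part {
    part_number: u32,
    body: Vec<u8>,
}

/// Positions into `stored`, ordered by ascending part number.
fn selection_by_part_number(stored: &[Part]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..stored.len()).collect();
    idx.sort_by_key(|&i| stored[i].part_number);
    idx
}

/// Concatenate part bodies in the order given by `idx`.
fn concat(stored: &[Part], idx: &[usize]) -> Vec<u8> {
    let mut out = Vec::new();
    for &i in idx {
        out.extend_from_slice(&stored[i].body);
    }
    out
}

/// Buggy variant: sorting the selection indices turns them back into
/// 0, 1, 2, ..., which is simply the arrival order of the parts.
fn assemble_buggy(stored: &[Part]) -> Vec<u8> {
    let mut idx = selection_by_part_number(stored);
    idx.sort(); // undoes the ordering by part number
    concat(stored, &idx)
}

/// Fixed variant: keep the selection in part-number order.
fn assemble_fixed(stored: &[Part]) -> Vec<u8> {
    let idx = selection_by_part_number(stored);
    concat(stored, &idx)
}
```

For parts that arrive as part 2 (`"world"`) followed by part 1 (`"hello "`), `assemble_buggy` returns `"worldhello "` while `assemble_fixed` returns `"hello world"`.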

@kwannoel
Contributor Author

cc @wangrunji0408

@hzxa21
Collaborator

hzxa21 commented Jun 23, 2023

Potential fix: madsim-rs/madsim#149

@kwannoel
Contributor Author

fuzzing-37.log

Similar error encountered today, but from monitored_store iter instead:

risingwave_storage::monitor::monitored_store: Failed in iter: Hummock error: Magic number mismatch: expected 1468377971, found: 564084722.
  backtrace of inner error:

https://buildkite.com/risingwavelabs/generate-sqlsmith-snapshots-weekly/builds/108#0188f428-2377-4dcf-a342-456063a88aa3

@hzxa21
Collaborator

hzxa21 commented Jul 3, 2023

> fuzzing-37.log
>
> Similar error encountered today, but from monitored_store iter instead:
>
> risingwave_storage::monitor::monitored_store: Failed in iter: Hummock error: Magic number mismatch: expected 1468377971, found: 564084722.
>   backtrace of inner error:
>
> https://buildkite.com/risingwavelabs/generate-sqlsmith-snapshots-weekly/builds/108#0188f428-2377-4dcf-a342-456063a88aa3

@kwannoel Is the issue resolved after the madsim fix?

@kwannoel
Contributor Author

kwannoel commented Jul 3, 2023

Yup, tested this issue + fuzzing-37 locally, in #10700.
Passes both. Thanks for your help!
