[enhancement](MOW) refactor delete bitmap calculator and enhance the deletion bitmap calculation for full duplicate load #22566
Conversation
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
(From new machine)TeamCity pipeline, clickbench performance test result: |
da1452b
to
b5185dc
Compare
b6449a2
to
4bc5be0
Compare
run p0 |
4bc5be0
to
3800592
Compare
We're closing this PR because it hasn't been updated in a while. |
Proposed changes
Issue Number: close #xxx
We propose a new approach, based on multi-way merge sort, for calculating the delete bitmap between new and old records. Under the Unique-Key model, the previous implementation looked up each newly added record individually in the existing rowsets, using the primary key index, to identify the records that needed to be marked for deletion. That method becomes much less efficient for full duplicate loads, especially as the number of rowsets grows.
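To illustrate the idea only (this is not the code in this patch), here is a minimal Python sketch of a merge-based delete bitmap calculation. It assumes each rowset is a list of keys sorted ascending and unique within the rowset, that a higher version number means a newer rowset, and that the newest copy of a key always wins. The name `delete_bitmap_by_merge` and the data layout are hypothetical; existing delete bitmaps and sequence columns are ignored.

```python
import heapq
from collections import defaultdict

_NO_KEY = object()  # sentinel that never equals a real key


def delete_bitmap_by_merge(rowsets):
    """Compute rows to mark deleted with a single k-way merge.

    rowsets: dict mapping version (int, higher = newer) to a list of
             keys sorted ascending and unique within that rowset.
    Returns: dict mapping version to the set of row positions whose key
             also appears in a newer rowset.
    """

    def tagged(version, keys):
        for row_id, key in enumerate(keys):
            # Negating the version makes the newest rowset sort first
            # among entries that share the same key.
            yield (key, -version, row_id)

    streams = [tagged(version, keys) for version, keys in rowsets.items()]
    delete_bitmap = defaultdict(set)
    last_key = _NO_KEY
    for key, neg_version, row_id in heapq.merge(*streams):
        if key == last_key:
            # An entry from a newer rowset with this key was already seen,
            # so this older row is superseded.
            delete_bitmap[-neg_version].add(row_id)
        else:
            last_key = key
    return dict(delete_bitmap)


# Toy data: composite keys of 3 integers, three rowsets (version 3 is newest).
example_rowsets = {
    1: [(1, 1, 1), (1, 1, 2), (1, 1, 3)],
    2: [(1, 1, 2)],
    3: [(1, 1, 1), (1, 1, 2), (1, 1, 4)],
}
example_bitmap = delete_bitmap_by_merge(example_rowsets)
```

Every rowset is scanned once in key order, so the work per key is a heap operation over the number of rowsets rather than a separate index probe into each existing rowset for every new record.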
On average, this patch causes no significant performance degradation, and in 't+1 load' scenarios (where multiple minor updates are applied to a substantial existing dataset before a full-scale replacement update) it yields a notable performance improvement.
For example, suppose we create a unique-key table whose composite key consists of 3 integers, and load it as follows: first a full import of all the data (40M rows), then several (say 20) smaller imports (1M rows), and finally another full import (40M rows). On our machine, the final import took 89 seconds with the unoptimized code and 50 seconds with the optimization, a reduction of about 44%.
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...