[fix](clone) Fix clone and alter tablet use same tablet path #34889
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
clang-tidy made some suggestions
@@ -263,6 +263,9 @@ Status EngineCloneTask::_do_clone() {
             &store, _clone_req.partition_id));
     auto tablet_dir = fmt::format("{}/{}/{}", local_shard_root_path, _clone_req.tablet_id,
                                   _clone_req.schema_hash);
+    auto tablet_manager = _engine.tablet_manager();
warning: 'auto tablet_manager' can be declared as 'auto *tablet_manager' [readability-qualified-auto]
-    auto tablet_manager = _engine.tablet_manager();
+    auto *tablet_manager = _engine.tablet_manager();
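The suggestion above does not change behavior: `auto` and `auto *` deduce the same type when the initializer is a pointer, and the check only asks for the pointer to be visible at the declaration. A minimal sketch of that equivalence follows. `Engine`, `TabletManager`, and `count_tablets` are hypothetical stand-ins rather than the real Doris classes, and the raw-pointer return type is an assumption inferred from the warning itself.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the real Doris classes; only the shape matters.
struct TabletManager {
    int tablet_count = 0;
};

struct Engine {
    TabletManager manager;
    // Assumption: the accessor returns a raw pointer, as the warning implies.
    TabletManager* tablet_manager() { return &manager; }
};

inline int count_tablets(Engine& engine) {
    auto tablet_manager_plain = engine.tablet_manager();  // deduces TabletManager*
    auto* tablet_manager = engine.tablet_manager();       // same type, pointer spelled out

    // Both spellings deduce exactly the same type.
    static_assert(std::is_same_v<decltype(tablet_manager_plain), decltype(tablet_manager)>,
                  "auto and auto* deduce the same pointer type here");
    assert(tablet_manager_plain == tablet_manager);
    return tablet_manager->tablet_count;
}
```

The `auto*` form also fails to compile if the accessor is later changed to return a non-pointer type, which makes such a change visible at the call site.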
run buildall
TeamCity be ut coverage result:
TPC-DS: Total hot run time: 172537 ms
ClickBench: Total hot run time: 30.25 s
run buildall
LGTM
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 42083 ms
TeamCity be ut coverage result:
TPC-DS: Total hot run time: 168084 ms
ClickBench: Total hot run time: 30.57 s
need update
clang-tidy review says "All clean, LGTM! 👍"
TeamCity be ut coverage result:
TPC-H: Total hot run time: 39967 ms
TPC-DS: Total hot run time: 172966 ms
ClickBench: Total hot run time: 30.55 s
run buildall
clang-tidy review says "All clean, LGTM! 👍"
LGTM
PR approved by at least one committer and no changes requested.
LGTM
TPC-H: Total hot run time: 40312 ms
TPC-DS: Total hot run time: 172807 ms
TeamCity be ut coverage result:
ClickBench: Total hot run time: 31.16 s
The entire process is as follows: 1. Drop the tablet. 2. Successfully clone the tablet in full. 3. Start the incremental clone. 4. Start to move the tablet to the trash (the process of actually cleaning the data begins from step 1, where the tablet was dropped). 5. The incremental clone fails. 6. The incremental clone is successfully retried.

Step 4 moved the data that was just pulled from the full clone to the trash, leading to data loss. The failure in step 5 of the incremental clone was also due to the deletion of the just-pulled snapshot data.

Fix: When cloning, check the tablet status and determine if the tablet directory has already been moved to the trash directory. If it has not been moved to the trash, the clone thread should help move it to the trash directory.
Proposed changes
Issue Number: close #xxx
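The entire process is as follows:
1. Drop the tablet.
2. The full clone of the tablet succeeds.
3. The incremental clone starts.
4. The tablet starts being moved to the trash (this cleanup originates from the drop in step 1).
5. The incremental clone fails.
6. The incremental clone is retried and succeeds.

Step 4 moved the data that the full clone had just pulled into the trash, causing data loss.
The incremental clone in step 5 failed for the same reason: the snapshot data it had just pulled was deleted.
Fix:
When cloning, check the tablet's status and determine whether the tablet directory has already been moved to the trash directory. If it has not, the clone thread moves it to the trash directory itself.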
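The check described in the fix can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from this PR: `TabletState`, `prepare_tablet_dir_for_clone`, and the flat trash layout are hypothetical, and the sketch uses only `std::filesystem` rather than Doris's own tablet manager and storage APIs.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical tablet states; the real Doris enum differs.
enum class TabletState { kRunning, kShutdown };

// Hypothetical helper: before a clone writes into `tablet_dir`, make sure no
// leftovers of a dropped tablet remain there. If the tablet was dropped
// (kShutdown) but its directory has not been moved yet, the clone thread moves
// it into `trash_dir` itself, so a later sweep cannot remove freshly cloned data.
// Returns true when `tablet_dir` is safe to clone into.
inline bool prepare_tablet_dir_for_clone(TabletState state, const fs::path& tablet_dir,
                                         const fs::path& trash_dir) {
    std::error_code ec;
    const bool present = fs::exists(tablet_dir, ec);
    if (ec) {
        return false;  // could not inspect the path; do not clone
    }
    if (!present) {
        return true;  // already moved to the trash (or never existed)
    }
    if (state != TabletState::kShutdown) {
        return false;  // a live tablet still owns this path; leave it alone
    }
    fs::create_directories(trash_dir, ec);
    if (ec) {
        return false;
    }
    // Simplification: move under the same directory name inside the trash.
    fs::rename(tablet_dir, trash_dir / tablet_dir.filename(), ec);
    return !ec;
}
```

Doing the move on the clone thread closes the window between steps 2 and 4: once the helper returns true, nothing belonging to the dropped tablet is left at the path, so the deferred cleanup has nothing to move when it eventually runs.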