-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improvement](spill) fuzzy spill and fix bugs #33291
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
return Status::OK(); | ||
} | ||
|
||
Status PartitionedHashJoinProbeOperatorX::close(RuntimeState* state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method 'close' can be made static [readability-convert-member-functions-to-static]
be/src/pipeline/exec/partitioned_hash_join_probe_operator.h:155:
- Status close(RuntimeState* state) override;
+ static Status close(RuntimeState* state) override;
} | ||
void SpillSortSharedState::clear() { | ||
|
||
void SpillSortSharedState::close() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method 'close' can be made static [readability-convert-member-functions-to-static]
be/src/pipeline/pipeline_x/dependency.h:514:
- void close();
+ static void close();
be/src/vec/spill/spill_stream.cpp
Outdated
@@ -103,9 +113,9 @@ Status SpillStream::wait_spill() { | |||
return Status::OK(); | |||
} | |||
|
|||
Status SpillStream::spill_block(const Block& block, bool eof) { | |||
Status SpillStream::spill_block(RuntimeState* state, const Block& block, bool eof) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method 'spill_block' can be made static [readability-convert-member-functions-to-static]
be/src/vec/spill/spill_stream.h:55:
- Status spill_block(RuntimeState* state, const Block& block, bool eof);
+ static Status spill_block(RuntimeState* state, const Block& block, bool eof);
PrettyPrinter::print_bytes(_used_bytes), | ||
PrettyPrinter::print_bytes(_available_bytes), | ||
PrettyPrinter::print_bytes(incoming_data_size)); | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/spill/spill_stream_manager.cpp:305:
- if (_used_bytes + incoming_data_size > _limit_bytes ||
- left_bytes <= config::storage_flood_stage_left_capacity_bytes) {
- LOG(WARNING) << fmt::format(
- "spill data reach limit, path: {}, limit: {}, used: {}, available: {}, "
- "incoming "
- "bytes: {}",
- _path, PrettyPrinter::print_bytes(_limit_bytes),
- PrettyPrinter::print_bytes(_used_bytes),
- PrettyPrinter::print_bytes(_available_bytes),
- PrettyPrinter::print_bytes(incoming_data_size));
- return true;
- }
- return false;
+ return _used_bytes + incoming_data_size > _limit_bytes ||
+ left_bytes <= config::storage_flood_stage_left_capacity_bytes;
_path, PrettyPrinter::print_bytes(_disk_capacity_bytes), | ||
PrettyPrinter::print_bytes(_available_bytes), | ||
PrettyPrinter::print_bytes(incoming_data_size)); | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/spill/spill_stream_manager.cpp:328:
- if (used_pct >= config::spill_storage_usage_percent / 100.0) {
- LOG(WARNING) << fmt::format(
- "spill data reach limit, path: {}, capacity: {}, available: {}, incoming "
- "bytes: {}",
- _path, PrettyPrinter::print_bytes(_disk_capacity_bytes),
- PrettyPrinter::print_bytes(_available_bytes),
- PrettyPrinter::print_bytes(incoming_data_size));
- return true;
- }
- return false;
+ return used_pct >= config::spill_storage_usage_percent / 100.0;
} | ||
} | ||
|
||
int64_t get_used_bytes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method 'get_used_bytes' can be made const [readability-make-member-function-const]
int64_t get_used_bytes() { | |
int64_t get_used_bytes() const { |
} | ||
} | ||
|
||
double get_usage(int64_t incoming_data_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method 'get_usage' can be made const [readability-make-member-function-const]
double get_usage(int64_t incoming_data_size) { | |
double get_usage(int64_t incoming_data_size) const { |
} | ||
} | ||
|
||
int64_t storage_limit() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method 'storage_limit' can be made const [readability-make-member-function-const]
int64_t storage_limit() { | |
int64_t storage_limit() const { |
TPC-H: Total hot run time: 38736 ms
|
TPC-DS: Total hot run time: 180630 ms
|
TeamCity be ut coverage result: |
ClickBench: Total hot run time: 29.44 s
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
run buildall |
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 38561 ms
|
TPC-DS: Total hot run time: 180915 ms
|
ClickBench: Total hot run time: 29.84 s
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
4293418
to
5c3196f
Compare
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
@@ -570,6 +563,15 @@ Status PartitionedHashJoinProbeOperatorX::push(RuntimeState* state, vectorized:: | |||
return Status::OK(); | |||
} | |||
|
|||
Status PartitionedHashJoinProbeOperatorX::_setup_internal_operator_for_non_spill( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method '_setup_internal_operator_for_non_spill' can be made static [readability-convert-member-functions-to-static]
be/src/pipeline/exec/partitioned_hash_join_probe_operator.h:197:
- [[nodiscard]] Status _setup_internal_operator_for_non_spill(
+ [[nodiscard]] static Status _setup_internal_operator_for_non_spill(
@@ -55,10 +57,69 @@ Status PartitionedHashJoinSinkLocalState::close(RuntimeState* state, Status exec | |||
return PipelineXSpillSinkLocalState::close(state, exec_status); | |||
} | |||
|
|||
size_t PartitionedHashJoinSinkLocalState::revocable_mem_size(RuntimeState* state) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method 'revocable_mem_size' can be made static [readability-convert-member-functions-to-static]
size_t PartitionedHashJoinSinkLocalState::revocable_mem_size(RuntimeState* state) const { | |
size_t PartitionedHashJoinSinkLocalState::revocable_mem_size(RuntimeState* state) { |
be/src/pipeline/exec/partitioned_hash_join_sink_operator.h:50:
- size_t revocable_mem_size(RuntimeState* state) const;
+ static size_t revocable_mem_size(RuntimeState* state) ;
@@ -124,14 +185,53 @@ | |||
return Status::OK(); | |||
} | |||
|
|||
Status PartitionedHashJoinSinkLocalState::_partition_block(RuntimeState* state, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method '_partition_block' can be made static [readability-convert-member-functions-to-static]
be/src/pipeline/exec/partitioned_hash_join_sink_operator.h:59:
- Status _partition_block(RuntimeState* state, vectorized::Block* in_block, size_t begin,
+ static Status _partition_block(RuntimeState* state, vectorized::Block* in_block, size_t begin,
return _inner_sink_operator->open(state); | ||
} | ||
|
||
Status PartitionedHashJoinSinkOperatorX::_setup_internal_operator(RuntimeState* state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method '_setup_internal_operator' can be made static [readability-convert-member-functions-to-static]
be/src/pipeline/exec/partitioned_hash_join_sink_operator.h:138:
- Status _setup_internal_operator(RuntimeState* state);
+ static Status _setup_internal_operator(RuntimeState* state);
TPC-H: Total hot run time: 38419 ms
|
TPC-DS: Total hot run time: 181269 ms
|
ClickBench: Total hot run time: 30.13 s
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
TeamCity be ut coverage result: |
run builall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
_path, PrettyPrinter::print_bytes(_disk_capacity_bytes), | ||
PrettyPrinter::print_bytes(_available_bytes), | ||
PrettyPrinter::print_bytes(incoming_data_size)); | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/spill/spill_stream_manager.cpp:329:
- if (used_pct >= config::spill_storage_usage_percent / 100.0) {
- LOG(WARNING) << fmt::format(
- "spill data reach limit, path: {}, capacity: {}, available: {}, incoming "
- "bytes: {}",
- _path, PrettyPrinter::print_bytes(_disk_capacity_bytes),
- PrettyPrinter::print_bytes(_available_bytes),
- PrettyPrinter::print_bytes(incoming_data_size));
- return true;
- }
- return false;
+ return used_pct >= config::spill_storage_usage_percent / 100.0;
54b9f4e
to
bc31130
Compare
run buildall |
TeamCity be ut coverage result: |
run buildall |
TPC-H: Total hot run time: 41702 ms
|
TPC-DS: Total hot run time: 187804 ms
|
run buildall |
TPC-H: Total hot run time: 39558 ms
|
TPC-DS: Total hot run time: 187332 ms
|
run buildall |
TPC-H: Total hot run time: 41646 ms
|
TPC-DS: Total hot run time: 187662 ms
|
@@ -1972,7 +1972,6 @@ public void initFuzzyModeVariables() { | |||
this.topnOptLimitThreshold = (int) Math.pow(10, random.nextInt(5)); | |||
|
|||
// for spill to disk | |||
/* | |||
if (Config.pull_request_id > 10000) { | |||
if (Config.pull_request_id % 2 == 1) { | |||
this.enablePipelineXEngine = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use default value.
97d65da
to
5e1cae7
Compare
run buildall |
TPC-H: Total hot run time: 40788 ms
|
TPC-DS: Total hot run time: 187548 ms
|
run buildall |
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
run performance |
Proposed changes
Issue Number: close #xxx
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...