[Refactor](inverted index) refactor inverted index compound predicates evaluate logic #38908 #40574

Merged


airborne12
Member

Cherry-pick of #38908.

…s evaluate logic (apache#38908)

This PR addresses several key issues in the inverted index's support for
compound conditions, and optimizes index-based filtering so that rows can
be skipped without reading back from the table:

1. **Unified Handling of `expr` and `column predicate`**:
- Combined the processing of inverted index-related `column predicate`
and `expr`.
- Ensured that compound conditions involving both `column predicate` and
`expr` are processed uniformly to reduce complexity and improve
robustness.

2. **Optimized the Execution of Compound Conditions**:
- Removed the logic in `scan_operator` that normalized compound
predicates, pushing that logic down to `_common_expr_ctxs_push_down`,
where `expr` contexts are managed.
- Added `evaluate_inverted_index` support to the `vexpr` and function
layers, such as `function comparison` and `function collection_in`.
- Introduced new data structures in `VExprContext` to store results from
`evaluate_inverted_index`, thus facilitating quick lookup and
application of these results during execution.
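A minimal sketch of the idea behind items 1 and 2, assuming a much-simplified model: each expression node's `evaluate_inverted_index` result is stored in a per-context cache keyed by the node, and a compound AND/OR node caches its own result only when every child was answered by the index (compare `all_pass` in `vcompound_pred.h`). `RowBitmap` (a `std::vector<bool>` standing in for a roaring bitmap), `Expr`, `Op`, `InvertedIndexContext`, and `get_inverted_index_result_for_expr` are hypothetical; only the name `set_inverted_index_result_for_expr` comes from this PR. NOT and NULL handling are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a roaring bitmap: one flag per row of the segment.
using RowBitmap = std::vector<bool>;

enum class Op { AND, OR };

// Hypothetical expression node; a node without children is a leaf predicate.
struct Expr {
    Op op = Op::AND;
    std::vector<const Expr*> children;
};

// Hypothetical per-context cache: expression node -> rows matched via the index.
class InvertedIndexContext {
public:
    void set_inverted_index_result_for_expr(const Expr* expr, RowBitmap rows) {
        _results[expr] = std::move(rows);
    }

    // Returns nullptr when the index could not answer `expr`.
    const RowBitmap* get_inverted_index_result_for_expr(const Expr* expr) const {
        auto it = _results.find(expr);
        return it == _results.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<const Expr*, RowBitmap> _results;
};

// Returns true when `expr` is fully answered by the index; its rows are then cached.
// Leaf predicates are assumed to have cached their own results beforehand.
bool evaluate_inverted_index(const Expr& expr, InvertedIndexContext& ctx, uint32_t num_rows) {
    if (expr.children.empty()) {
        return ctx.get_inverted_index_result_for_expr(&expr) != nullptr;
    }
    const bool is_and = expr.op == Op::AND;
    // AND starts from "every row", OR starts from "no row".
    RowBitmap res(num_rows, is_and);
    for (const Expr* child : expr.children) {
        if (!evaluate_inverted_index(*child, ctx, num_rows)) {
            return false;  // a child needs row-by-row evaluation, so cache nothing
        }
        const RowBitmap& rows = *ctx.get_inverted_index_result_for_expr(child);
        assert(rows.size() == num_rows);
        for (uint32_t i = 0; i < num_rows; ++i) {
            res[i] = is_and ? (res[i] && rows[i]) : (res[i] || rows[i]);
        }
    }
    ctx.set_inverted_index_result_for_expr(&expr, std::move(res));
    return true;
}
```

With this layout, a later execution step can look a node up in the cache before falling back to row-by-row evaluation.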
@airborne12
Member Author

run buildall

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Contributor

@github-actions github-actions bot left a comment


clang-tidy made some suggestions

    if (!st.ok()) {
        bitmap->addRange(0, num_rows);
        return st;
    Status evaluate_inverted_index(VExprContext* context, uint32_t segment_num_rows) override {
Contributor


warning: function 'evaluate_inverted_index' exceeds recommended size/complexity thresholds [readability-function-size]

    Status evaluate_inverted_index(VExprContext* context, uint32_t segment_num_rows) override {
           ^
Additional context

be/src/vec/exprs/vcompound_pred.h:56: 95 lines including whitespace and comments (threshold 80)

    Status evaluate_inverted_index(VExprContext* context, uint32_t segment_num_rows) override {
           ^

        if (all_pass && !res.is_empty()) {
            // set fast_execute when expr evaluated by inverted index correctly
            _can_fast_execute = true;
            context->get_inverted_index_context()->set_inverted_index_result_for_expr(this, res);
        }
        return Status::OK();
    }
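The snippet above caches the compound result and sets `_can_fast_execute`. As a sketch of why that pays off, assuming each block row carries its segment row id (an assumption of this example only), a fast path can build the predicate's UInt8 filter straight from the cached bitmap instead of executing the child expressions. `RowBitmap` and `fast_execute_filter` are hypothetical; the PR's own `fast_execute` may obtain the result differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a roaring bitmap: one flag per row of the segment.
using RowBitmap = std::vector<bool>;

// Builds a UInt8-style filter for one block from a cached index result.
// Assumption of this sketch: block_row_ids[i] is the segment row id of block row i,
// so blocks whose rows were already thinned out (non-contiguous ids) work too.
std::vector<uint8_t> fast_execute_filter(const RowBitmap& cached,
                                         const std::vector<uint32_t>& block_row_ids) {
    std::vector<uint8_t> filter(block_row_ids.size(), 0);
    for (size_t i = 0; i < block_row_ids.size(); ++i) {
        const uint32_t row = block_row_ids[i];
        assert(row < cached.size());
        filter[i] = cached[row] ? 1 : 0;  // 1 = row passes the predicate
    }
    return filter;
}
```

The per-block cost becomes one bitmap probe per row, independent of how many children the compound predicate has.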

    Status execute(VExprContext* context, Block* block, int* result_column_id) override {
Contributor


warning: function 'execute' has cognitive complexity of 112 (threshold 50) [readability-function-cognitive-complexity]

    Status execute(VExprContext* context, Block* block, int* result_column_id) override {
           ^
Additional context

be/src/vec/exprs/vcompound_pred.h:154: +1, including nesting penalty of 0, nesting level increased to 1

        if (_can_fast_execute && fast_execute(context, block, result_column_id)) {
        ^

be/src/vec/exprs/vcompound_pred.h:154: +1

        if (_can_fast_execute && fast_execute(context, block, result_column_id)) {
                              ^

be/src/vec/exprs/vcompound_pred.h:157: +1, including nesting penalty of 0, nesting level increased to 1

        if (children().size() == 1 || !_all_child_is_compound_and_not_const()) {
        ^

be/src/vec/exprs/vcompound_pred.h:163: +1, including nesting penalty of 0, nesting level increased to 1

        RETURN_IF_ERROR(_children[0]->execute(context, block, &lhs_id));
        ^

be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/exprs/vcompound_pred.h:163: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(_children[0]->execute(context, block, &lhs_id));
        ^

be/src/common/status.h:630: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/exprs/vcompound_pred.h:175: +1, including nesting penalty of 0, nesting level increased to 1

        if (lhs_is_nullable) {
        ^

be/src/vec/exprs/vcompound_pred.h:189: nesting level increased to 1

        auto get_rhs_colum = [&]() {
                             ^

be/src/vec/exprs/vcompound_pred.h:190: +2, including nesting penalty of 1, nesting level increased to 2

            if (rhs_id == -1) {
            ^

be/src/vec/exprs/vcompound_pred.h:191: +3, including nesting penalty of 2, nesting level increased to 3

                RETURN_IF_ERROR(_children[1]->execute(context, block, &rhs_id));
                ^

be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/exprs/vcompound_pred.h:191: +4, including nesting penalty of 3, nesting level increased to 4

                RETURN_IF_ERROR(_children[1]->execute(context, block, &rhs_id));
                ^

be/src/common/status.h:630: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/exprs/vcompound_pred.h:201: +3, including nesting penalty of 2, nesting level increased to 3

                if (rhs_is_nullable) {
                ^

be/src/vec/exprs/vcompound_pred.h:209: nesting level increased to 1

        auto return_result_column_id = [&](ColumnPtr res_column, int res_id) -> int {
                                       ^

be/src/vec/exprs/vcompound_pred.h:210: +2, including nesting penalty of 1, nesting level increased to 2

            if (result_is_nullable && !res_column->is_nullable()) {
            ^

be/src/vec/exprs/vcompound_pred.h:210: +1

            if (result_is_nullable && !res_column->is_nullable()) {
                                   ^

be/src/vec/exprs/vcompound_pred.h:219: nesting level increased to 1

        auto create_null_map_column = [&](ColumnPtr& null_map_column,
                                      ^

be/src/vec/exprs/vcompound_pred.h:221: +2, including nesting penalty of 1, nesting level increased to 2

            if (null_map_data == nullptr) {
            ^

be/src/vec/exprs/vcompound_pred.h:230: nesting level increased to 1

        auto vector_vector_null = [&]<bool is_and_op>() {
                                  ^

be/src/vec/exprs/vcompound_pred.h:240: +2, including nesting penalty of 1, nesting level increased to 2

            if constexpr (is_and_op) {
            ^

be/src/vec/exprs/vcompound_pred.h:241: +3, including nesting penalty of 2, nesting level increased to 3

                for (size_t i = 0; i < size; ++i) {
                ^

be/src/vec/exprs/vcompound_pred.h:246: +1, nesting level increased to 2

            } else {
              ^

be/src/vec/exprs/vcompound_pred.h:247: +3, including nesting penalty of 2, nesting level increased to 3

                for (size_t i = 0; i < size; ++i) {
                ^

be/src/vec/exprs/vcompound_pred.h:260: +1, including nesting penalty of 0, nesting level increased to 1

        if (_op == TExprOpcode::COMPOUND_AND) {
        ^

be/src/vec/exprs/vcompound_pred.h:263: +2, including nesting penalty of 1, nesting level increased to 2

            if ((lhs_all_false && !lhs_is_nullable) || (lhs_all_false && lhs_all_is_not_null)) {
            ^

be/src/vec/exprs/vcompound_pred.h:263: +1

            if ((lhs_all_false && !lhs_is_nullable) || (lhs_all_false && lhs_all_is_not_null)) {
                                                    ^

be/src/vec/exprs/vcompound_pred.h:263: +1

            if ((lhs_all_false && !lhs_is_nullable) || (lhs_all_false && lhs_all_is_not_null)) {
                               ^

be/src/vec/exprs/vcompound_pred.h:263: +1

            if ((lhs_all_false && !lhs_is_nullable) || (lhs_all_false && lhs_all_is_not_null)) {
                                                                      ^

be/src/vec/exprs/vcompound_pred.h:266: +1, nesting level increased to 2

            } else {
              ^

be/src/vec/exprs/vcompound_pred.h:267: +3, including nesting penalty of 2, nesting level increased to 3

                RETURN_IF_ERROR(get_rhs_colum());
                ^

be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/exprs/vcompound_pred.h:267: +4, including nesting penalty of 3, nesting level increased to 4

                RETURN_IF_ERROR(get_rhs_colum());
                ^

be/src/common/status.h:630: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/exprs/vcompound_pred.h:269: +3, including nesting penalty of 2, nesting level increased to 3

                if ((lhs_all_true && !lhs_is_nullable) ||    //not null column
                ^

be/src/vec/exprs/vcompound_pred.h:269: +1

                if ((lhs_all_true && !lhs_is_nullable) ||    //not null column
                                                       ^

be/src/vec/exprs/vcompound_pred.h:269: +1

                if ((lhs_all_true && !lhs_is_nullable) ||    //not null column
                                  ^

be/src/vec/exprs/vcompound_pred.h:270: +1

                    (lhs_all_true && lhs_all_is_not_null)) { //nullable column
                                  ^

be/src/vec/exprs/vcompound_pred.h:273: +1, nesting level increased to 3

                } else if ((rhs_all_false && !rhs_is_nullable) ||
                       ^

be/src/vec/exprs/vcompound_pred.h:273: +1

                } else if ((rhs_all_false && !rhs_is_nullable) ||
                                                               ^

be/src/vec/exprs/vcompound_pred.h:273: +1

                } else if ((rhs_all_false && !rhs_is_nullable) ||
                                          ^

be/src/vec/exprs/vcompound_pred.h:274: +1

                           (rhs_all_false && rhs_all_is_not_null)) {
                                          ^

be/src/vec/exprs/vcompound_pred.h:277: +1, nesting level increased to 3

                } else if ((rhs_all_true && !rhs_is_nullable) ||
                       ^

be/src/vec/exprs/vcompound_pred.h:277: +1

                } else if ((rhs_all_true && !rhs_is_nullable) ||
                                                              ^

be/src/vec/exprs/vcompound_pred.h:277: +1

                } else if ((rhs_all_true && !rhs_is_nullable) ||
                                         ^

be/src/vec/exprs/vcompound_pred.h:278: +1

                           (rhs_all_true && rhs_all_is_not_null)) {
                                         ^

be/src/vec/exprs/vcompound_pred.h:281: +1, nesting level increased to 3

                } else {
                  ^

be/src/vec/exprs/vcompound_pred.h:282: +4, including nesting penalty of 3, nesting level increased to 4

                    if (!result_is_nullable) {
                    ^

be/src/vec/exprs/vcompound_pred.h:284: +5, including nesting penalty of 4, nesting level increased to 5

                        for (size_t i = 0; i < size; i++) {
                        ^

be/src/vec/exprs/vcompound_pred.h:287: +1, nesting level increased to 4

                    } else {
                      ^

be/src/vec/exprs/vcompound_pred.h:292: +1, nesting level increased to 1

        } else if (_op == TExprOpcode::COMPOUND_OR) {
               ^

be/src/vec/exprs/vcompound_pred.h:295: +2, including nesting penalty of 1, nesting level increased to 2

            if ((lhs_all_true && !lhs_is_nullable) || (lhs_all_true && lhs_all_is_not_null)) {
            ^

be/src/vec/exprs/vcompound_pred.h:295: +1

            if ((lhs_all_true && !lhs_is_nullable) || (lhs_all_true && lhs_all_is_not_null)) {
                                                   ^

be/src/vec/exprs/vcompound_pred.h:295: +1

            if ((lhs_all_true && !lhs_is_nullable) || (lhs_all_true && lhs_all_is_not_null)) {
                              ^

be/src/vec/exprs/vcompound_pred.h:295: +1

            if ((lhs_all_true && !lhs_is_nullable) || (lhs_all_true && lhs_all_is_not_null)) {
                                                                    ^

be/src/vec/exprs/vcompound_pred.h:298: +1, nesting level increased to 2

            } else {
              ^

be/src/vec/exprs/vcompound_pred.h:299: +3, including nesting penalty of 2, nesting level increased to 3

                RETURN_IF_ERROR(get_rhs_colum());
                ^

be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/exprs/vcompound_pred.h:299: +4, including nesting penalty of 3, nesting level increased to 4

                RETURN_IF_ERROR(get_rhs_colum());
                ^

be/src/common/status.h:630: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/exprs/vcompound_pred.h:300: +3, including nesting penalty of 2, nesting level increased to 3

                if ((lhs_all_false && !lhs_is_nullable) || (lhs_all_false && lhs_all_is_not_null)) {
                ^

be/src/vec/exprs/vcompound_pred.h:300: +1

                if ((lhs_all_false && !lhs_is_nullable) || (lhs_all_false && lhs_all_is_not_null)) {
                                                        ^

be/src/vec/exprs/vcompound_pred.h:300: +1

                if ((lhs_all_false && !lhs_is_nullable) || (lhs_all_false && lhs_all_is_not_null)) {
                                   ^

be/src/vec/exprs/vcompound_pred.h:300: +1

                if ((lhs_all_false && !lhs_is_nullable) || (lhs_all_false && lhs_all_is_not_null)) {
                                                                          ^

be/src/vec/exprs/vcompound_pred.h:303: +1, nesting level increased to 3

                } else if ((rhs_all_true && !rhs_is_nullable) ||
                       ^

be/src/vec/exprs/vcompound_pred.h:303: +1

                } else if ((rhs_all_true && !rhs_is_nullable) ||
                                                              ^

be/src/vec/exprs/vcompound_pred.h:303: +1

                } else if ((rhs_all_true && !rhs_is_nullable) ||
                                         ^

be/src/vec/exprs/vcompound_pred.h:304: +1

                           (rhs_all_true && rhs_all_is_not_null)) {
                                         ^

be/src/vec/exprs/vcompound_pred.h:307: +1, nesting level increased to 3

                } else if ((rhs_all_false && !rhs_is_nullable) ||
                       ^

be/src/vec/exprs/vcompound_pred.h:307: +1

                } else if ((rhs_all_false && !rhs_is_nullable) ||
                                                               ^

be/src/vec/exprs/vcompound_pred.h:307: +1

                } else if ((rhs_all_false && !rhs_is_nullable) ||
                                          ^

be/src/vec/exprs/vcompound_pred.h:308: +1

                           (rhs_all_false && rhs_all_is_not_null)) {
                                          ^

be/src/vec/exprs/vcompound_pred.h:311: +1, nesting level increased to 3

                } else {
                  ^

be/src/vec/exprs/vcompound_pred.h:312: +4, including nesting penalty of 3, nesting level increased to 4

                    if (!result_is_nullable) {
                    ^

be/src/vec/exprs/vcompound_pred.h:314: +5, including nesting penalty of 4, nesting level increased to 5

                        for (size_t i = 0; i < size; i++) {
                        ^

be/src/vec/exprs/vcompound_pred.h:317: +1, nesting level increased to 4

                    } else {
                      ^

be/src/vec/exprs/vcompound_pred.h:322: +1, nesting level increased to 1

        } else {
          ^
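The branches listed in this trace implement SQL three-valued AND/OR with short-circuiting on all-TRUE and all-FALSE inputs. The sketch below restates that logic in isolation as a simplified model, not the Doris code: `BoolColumn`, `all_non_null_equal`, and `evaluate_compound` are hypothetical, `vector_vector_null` is only named after the lambda in `vcompound_pred.h`, and the wrapping of non-nullable results into nullable columns is omitted. Factoring the branches into small helpers like these is one way to bring `execute` under the cognitive-complexity threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified nullable boolean column: null_map (same length as data) marks NULL rows.
// A non-nullable column simply has an all-zero null_map here.
struct BoolColumn {
    std::vector<uint8_t> data;
    std::vector<uint8_t> null_map;
};

// Row-wise SQL three-valued logic. The "dominant" value (FALSE for AND, TRUE for OR)
// decides the result even against NULL; otherwise NULL wins over the other value.
template <bool is_and_op>
BoolColumn vector_vector_null(const BoolColumn& lhs, const BoolColumn& rhs) {
    assert(lhs.data.size() == rhs.data.size());
    const size_t n = lhs.data.size();
    const bool dominant = !is_and_op;
    BoolColumn res{std::vector<uint8_t>(n, 0), std::vector<uint8_t>(n, 0)};
    for (size_t i = 0; i < n; ++i) {
        const bool l_null = lhs.null_map[i] != 0;
        const bool r_null = rhs.null_map[i] != 0;
        const bool l_dom = !l_null && (lhs.data[i] != 0) == dominant;
        const bool r_dom = !r_null && (rhs.data[i] != 0) == dominant;
        if (l_dom || r_dom) {
            res.data[i] = dominant;
        } else if (l_null || r_null) {
            res.null_map[i] = 1;
        } else {
            res.data[i] = !dominant;  // both sides hold the non-dominant value
        }
    }
    return res;
}

// True when every row is non-NULL and equal to `value` (vacuously true when empty).
bool all_non_null_equal(const BoolColumn& c, bool value) {
    for (size_t i = 0; i < c.data.size(); ++i) {
        if (c.null_map[i] != 0 || (c.data[i] != 0) != value) return false;
    }
    return true;
}

// Short-circuit evaluation following the branch order in the trace above.
// `eval_rhs` is only invoked when the left side alone does not decide the result.
template <bool is_and_op, typename EvalRhs>
BoolColumn evaluate_compound(const BoolColumn& lhs, EvalRhs&& eval_rhs) {
    const bool dominant = !is_and_op;
    if (all_non_null_equal(lhs, dominant)) return lhs;   // e.g. all FALSE for AND
    BoolColumn rhs = eval_rhs();
    if (all_non_null_equal(lhs, !dominant)) return rhs;  // lhs is the identity
    if (all_non_null_equal(rhs, dominant)) return rhs;   // rhs decides every row
    if (all_non_null_equal(rhs, !dominant)) return lhs;  // rhs is the identity
    return vector_vector_null<is_and_op>(lhs, rhs);
}
```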

    if (it == column_names.end()) {
        return Status::Error<ErrorCode::INTERNAL_ERROR>("fast_execute failed: {}",
                                                        result_column_name);
    Status VExpr::_evaluate_inverted_index(VExprContext* context, const FunctionBasePtr& function,
Contributor


warning: function '_evaluate_inverted_index' exceeds recommended size/complexity thresholds [readability-function-size]

Status VExpr::_evaluate_inverted_index(VExprContext* context, const FunctionBasePtr& function,
              ^
Additional context

be/src/vec/exprs/vexpr.cpp:603: 106 lines including whitespace and comments (threshold 80)

Status VExpr::_evaluate_inverted_index(VExprContext* context, const FunctionBasePtr& function,
              ^
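For function-level predicates such as comparison and `collection_in`, the index result is the set of matching rows. A toy sketch, assuming a hypothetical in-memory index that maps each value to the ids of the rows holding it (`ToyInvertedIndex`, `index_equals`, and `index_in` are illustrative names, not Doris APIs): `col = v` reads one posting list, and `col IN (...)` takes the union of several. NULL semantics and `NOT IN` are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a roaring bitmap: one flag per row of the segment.
using RowBitmap = std::vector<bool>;

// Hypothetical toy index for one column: value -> ids of rows holding that value.
// Doris's real index readers and result types differ.
using ToyInvertedIndex = std::map<std::string, std::vector<uint32_t>>;

// col = value: mark every row in the posting list of `value`.
RowBitmap index_equals(const ToyInvertedIndex& index, const std::string& value,
                       uint32_t num_rows) {
    RowBitmap rows(num_rows, false);
    auto it = index.find(value);
    if (it != index.end()) {
        for (uint32_t row : it->second) {
            assert(row < num_rows);
            rows[row] = true;
        }
    }
    return rows;
}

// col IN (v1, v2, ...): union of the posting lists of all listed values.
RowBitmap index_in(const ToyInvertedIndex& index, const std::vector<std::string>& values,
                   uint32_t num_rows) {
    RowBitmap rows(num_rows, false);
    for (const std::string& v : values) {
        RowBitmap one = index_equals(index, v, num_rows);
        for (uint32_t i = 0; i < num_rows; ++i) {
            rows[i] = rows[i] || one[i];
        }
    }
    return rows;
}
```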

    }
    VLOG_DEBUG << "begin to execute match directly, column_name=" << column_name
               << ", match_query_str=" << match_query_str;
    InvertedIndexCtx* inverted_index_ctx = reinterpret_cast<InvertedIndexCtx*>(
Contributor


warning: use auto when initializing with a cast to avoid duplicating the type name [modernize-use-auto]

Suggested change:

    - InvertedIndexCtx* inverted_index_ctx = reinterpret_cast<InvertedIndexCtx*>(
    + auto* inverted_index_ctx = reinterpret_cast<InvertedIndexCtx*>(

@airborne12 airborne12 merged commit 1e7884f into apache:branch-3.0 Sep 10, 2024
19 of 25 checks passed
@airborne12 airborne12 deleted the pick_38908_to_origin_branch-3.0 branch September 10, 2024 06:11