[feature](datalake) Add BucketShuffleJoin support for bucketed hive tables #27784

Open: wants to merge 6 commits into base: master

@Nitin-Kashyap Nitin-Kashyap commented Nov 29, 2023

Add BucketShuffleJoin support for bucketed hive tables generated by Spark. (27783)

Proposed changes

Issue Number: close #27783

1. Original planner updated to consider BucketShuffle for bucketed hive tables.
2. Nereids planner updated for BucketShuffle join on hive tables.
3. Added spark style hash calculation in BE for shuffle on one side.
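
As a rough illustration of what "spark style hash calculation" in item 3 means (this is not the PR's BE code): Spark assigns a row to a bucket by taking a 32-bit Murmur3 (x86) hash of the bucket columns with seed 42 and then a non-negative modulo by the bucket count. The sketch below covers only a single INT bucket column; the function names are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit unsigned value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _to_signed32(x):
    """Reinterpret a 32-bit unsigned value as a signed Java-style int."""
    return x - (1 << 32) if x & 0x80000000 else x


def spark_hash_int(value, seed=42):
    """Murmur3 x86_32 hash of one 4-byte int, as Spark does for INT columns."""
    # Mix the single 4-byte block.
    k1 = ((value & MASK32) * 0xCC9E2D51) & MASK32
    k1 = _rotl32(k1, 15)
    k1 = (k1 * 0x1B873593) & MASK32
    h1 = (seed & MASK32) ^ k1
    h1 = _rotl32(h1, 13)
    h1 = (h1 * 5 + 0xE6546B64) & MASK32
    # Finalization; the input length is 4 bytes.
    h1 ^= 4
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return _to_signed32(h1)


def spark_bucket_id(value, num_buckets):
    """Non-negative modulo of the signed hash (Python's % already behaves like pmod)."""
    return spark_hash_int(value) % num_buckets
```

For a table declared with 3 buckets, `spark_bucket_id(user_id, 3)` gives the bucket a row would land in. With several bucket columns, Spark feeds the hash of one column in as the seed for the next; that case is left out here.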

### Sample Output:
NereidsPlanner
OldPlanner

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

be/src/vec/columns/column_decimal.cpp
be/src/vec/columns/column_map.cpp
be/src/vec/columns/column_string.cpp

clang-tidy review says "All clean, LGTM! 👍"

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

be/src/vec/columns/column_vector.cpp
@Nitin-Kashyap Nitin-Kashyap force-pushed the feature-hiveBucketShuffle branch 2 times, most recently from ed212e1 to eaf29b0 Compare November 30, 2023 05:47
@morningman morningman self-assigned this Nov 30, 2023
@morningman

Hi @Nitin-Kashyap , thanks for your contribution.
Could you please provide some create table stmt of hive table on spark side,
so that we can test this case?

@morningman

BTW, is it only suitable for "spark created" hive bucket table?
What if the hive table is created by other system with different hash function?

@Nitin-Kashyap

Nitin-Kashyap commented Dec 1, 2023

Hi @Nitin-Kashyap , thanks for your contribution. Could you please provide some create table stmt of hive table on spark side, so that we can test this case?

@morningman Please find below the sample test I used for this case:

CREATE TABLE parquet_test (
     user_id INT,
     key       VARCHAR(20),
     part      VARCHAR(10)
)
USING parquet
PARTITIONED BY (part)
CLUSTERED BY (user_id) INTO 3 BUCKETS;

INSERT INTO parquet_test VALUES (31, 'U31', 'IN'), (11, 'U11', 'IN'), (21, 'U21', 'IN');

@Nitin-Kashyap

Nitin-Kashyap commented Dec 1, 2023

BTW, is it only suitable for "spark created" hive bucket table? What if the hive table is created by other system with different hash function?

@morningman Yes, in the current scope it only recognizes Spark-created bucketed tables. It identifies them by the table properties that Spark defines for the bucket specification.
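
To make the detection idea concrete, here is a minimal sketch of reading a bucket specification from table properties. The property keys are the ones Spark is understood to write into the Hive metastore for bucketed data source tables; they are an assumption here and are not quoted from this PR, and the helper name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed Spark-written metastore keys (not taken from this PR).
NUM_BUCKETS_KEY = "spark.sql.sources.schema.numBuckets"
NUM_BUCKET_COLS_KEY = "spark.sql.sources.schema.numBucketCols"
BUCKET_COL_PREFIX = "spark.sql.sources.schema.bucketCol."


def spark_bucket_spec(props: Dict[str, str]) -> Optional[Tuple[int, List[str]]]:
    """Return (num_buckets, bucket_columns), or None if the table does not
    carry a Spark bucket specification."""
    num_buckets = props.get(NUM_BUCKETS_KEY)
    if num_buckets is None:
        return None
    num_cols = int(props.get(NUM_BUCKET_COLS_KEY, "0"))
    # Bucket columns are stored one per key, indexed from 0.
    cols = [props[BUCKET_COL_PREFIX + str(i)] for i in range(num_cols)]
    return int(num_buckets), cols
```

A table without these keys returns `None`, which a planner could treat as "not eligible for BucketShuffle on the Spark hash".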

I plan to add support for Hive and Hudi as well, hopefully in the next PR. For this I have left a placeholder in THashType (HIVE_MOD: Hive and Hudi use the same hash method). For Hudi, however, some more changes are needed on the FE side to identify the bucket id from the file path.
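
The reason a hash-type switch is needed can be sketched as follows. Given the same signed 32-bit hash, Spark takes a non-negative modulo, while classic Hive bucketing is understood to clear the sign bit before the modulo, so the two can disagree for negative hashes. The enum below only loosely mirrors THashType: `HIVE_MOD` is the name mentioned above, `SPARK_MURMUR32` and the function name are hypothetical.

```python
from enum import Enum


class HashType(Enum):
    SPARK_MURMUR32 = 1  # hypothetical name for the Spark-style variant
    HIVE_MOD = 2        # placeholder named in the comment above


def bucket_from_hash(hash32: int, num_buckets: int, hash_type: HashType) -> int:
    """Map a signed 32-bit hash to a bucket id under the given convention."""
    if hash_type is HashType.SPARK_MURMUR32:
        # pmod: Python's % is already non-negative for a positive divisor.
        return hash32 % num_buckets
    if hash_type is HashType.HIVE_MOD:
        # Clear the sign bit first, then take the remainder.
        return (hash32 & 0x7FFFFFFF) % num_buckets
    raise ValueError(f"unsupported hash type: {hash_type}")
```

For a hash of -1 and 3 buckets, the two conventions pick different buckets, which is why the shuffling side has to use the same convention as the table it is joined against.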

be/src/vec/utils/util.hpp

@Nitin-Kashyap

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.68% (8549/23961)
Line Coverage: 27.51% (69374/252223)
Region Coverage: 26.65% (35975/135012)
Branch Coverage: 23.46% (18393/78404)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a13919ca473b2abfe5bf7b177061c4da8416f4d9_a13919ca473b2abfe5bf7b177061c4da8416f4d9/report/index.html

@Nitin-Kashyap

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.60% (8550/24014)
Line Coverage: 27.42% (69395/253124)
Region Coverage: 26.57% (36001/135482)
Branch Coverage: 23.39% (18394/78630)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a5ce2395a2ea0752c987d764edc0b39f902f82ce_a5ce2395a2ea0752c987d764edc0b39f902f82ce/report/index.html

@morningman

Hi @Nitin-Kashyap , I submitted a PR to your branch
Nitin-Kashyap#1
Please review

@morrySnow morrySnow left a comment

LGTM

@Nitin-Kashyap

run buildall


sh-checker report

To get the full details, please check in the job output.

shellcheck errors

'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In build-support/shell-check.sh line 30:
    source "${DORIS_HOME}/env.sh" &>/dev/null
           ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In build.sh line 34:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In build.sh line 108:
    "${MVN_CMD}" clean
     ^--------^ SC2154 (warning): MVN_CMD is referenced but not assigned.


In build.sh line 280:
if [[ ! -f "${DORIS_THIRDPARTY}/installed/lib/libbacktrace.a" ]]; then
            ^-----------------^ SC2154 (warning): DORIS_THIRDPARTY is referenced but not assigned.


In build.sh line 474:
    ENABLE_PCH                  -- ${ENABLE_PCH}
                                   ^-----------^ SC2154 (warning): ENABLE_PCH is referenced but not assigned.


In build.sh line 538:
    MAKE_PROGRAM="$(command -v "${BUILD_SYSTEM}")"
                                ^-------------^ SC2154 (warning): BUILD_SYSTEM is referenced but not assigned.


In build.sh line 545:
    echo "-- Use ccache: ${CMAKE_USE_CCACHE}"
                         ^-----------------^ SC2154 (warning): CMAKE_USE_CCACHE is referenced but not assigned.


In build.sh line 551:
    "${CMAKE_CMD}" -G "${GENERATOR}" \
     ^----------^ SC2154 (warning): CMAKE_CMD is referenced but not assigned.
                       ^----------^ SC2154 (warning): GENERATOR is referenced but not assigned.


In build_plugin.sh line 25:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In build_plugin.sh line 105:
        "${MVN_CMD}" clean
         ^--------^ SC2154 (warning): MVN_CMD is referenced but not assigned.


In contrib/udf/build_udf.sh line 39:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In contrib/udf/build_udf.sh line 123:
    "${CMAKE_CMD}" -G "${GENERATOR}" -DCMAKE_BUILD_TYPE="${CMAKE_BUILD_TYPE}" ../
     ^----------^ SC2154 (warning): CMAKE_CMD is referenced but not assigned.
                       ^----------^ SC2154 (warning): GENERATOR is referenced but not assigned.


In contrib/udf/build_udf.sh line 124:
    "${BUILD_SYSTEM}" -j "${PARALLEL}"
     ^-------------^ SC2154 (warning): BUILD_SYSTEM is referenced but not assigned.


In fe_plugins/auditloader/build.sh line 25:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In fe_plugins/auditloader/build.sh line 29:
"${MVN_CMD}" clean package -DskipTests
 ^--------^ SC2154 (warning): MVN_CMD is referenced but not assigned.


In fs_brokers/apache_hdfs_broker/build.sh line 25:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In fs_brokers/apache_hdfs_broker/build.sh line 35:
"${MVN_CMD}" package -DskipTests
 ^--------^ SC2154 (warning): MVN_CMD is referenced but not assigned.


In generated-source.sh line 29:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In run-be-ut.sh line 42:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In run-be-ut.sh line 132:
    ENABLE_PCH          -- ${ENABLE_PCH}
                           ^-----------^ SC2154 (warning): ENABLE_PCH is referenced but not assigned.


In run-be-ut.sh line 218:
MAKE_PROGRAM="$(command -v "${BUILD_SYSTEM}")"
                            ^-------------^ SC2154 (warning): BUILD_SYSTEM is referenced but not assigned.


In run-be-ut.sh line 220:
echo "-- Use ccache: ${CMAKE_USE_CCACHE}"
                     ^-----------------^ SC2154 (warning): CMAKE_USE_CCACHE is referenced but not assigned.


In run-be-ut.sh line 224:
"${CMAKE_CMD}" -G "${GENERATOR}" \
 ^----------^ SC2154 (warning): CMAKE_CMD is referenced but not assigned.
                   ^----------^ SC2154 (warning): GENERATOR is referenced but not assigned.


In run-be-ut.sh line 241:
    -DDORIS_JAVA_HOME="${JAVA_HOME}" \
                       ^----------^ SC2154 (warning): JAVA_HOME is referenced but not assigned.


In run-be-ut.sh line 303:
if [[ -d "${DORIS_THIRDPARTY}/installed/lib/hadoop_hdfs/" ]]; then
          ^-----------------^ SC2154 (warning): DORIS_THIRDPARTY is referenced but not assigned.


In run-be-ut.sh line 424:
        cmd1="${LLVM_PROFDATA} merge -o ${profdata} ${profraw}"
              ^--------------^ SC2154 (warning): LLVM_PROFDATA is referenced but not assigned.


In run-be-ut.sh line 427:
        cmd2="${LLVM_COV} show -output-dir=${DORIS_TEST_BINARY_DIR}/report -format=html \
              ^---------^ SC2154 (warning): LLVM_COV is referenced but not assigned.


In run-cloud-ut.sh line 140:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In run-cloud-ut.sh line 160:
MAKE_PROGRAM="$(command -v "${BUILD_SYSTEM}")"
                            ^-------------^ SC2154 (warning): BUILD_SYSTEM is referenced but not assigned.


In run-cloud-ut.sh line 166:
"${CMAKE_CMD}" -G "${GENERATOR}" \
 ^----------^ SC2154 (warning): CMAKE_CMD is referenced but not assigned.
                   ^----------^ SC2154 (warning): GENERATOR is referenced but not assigned.


In run-cloud-ut.sh line 177:
    "${CMAKE_USE_CCACHE}" \
     ^-----------------^ SC2154 (warning): CMAKE_USE_CCACHE is referenced but not assigned.


In run-fe-ut.sh line 25:
. "${DORIS_HOME}/env.sh"
  ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In run-fe-ut.sh line 117:
        "${MVN_CMD}" test jacoco:report -DfailIfNoTests=false -Dtest="$1"
         ^--------^ SC2154 (warning): MVN_CMD is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 42:
    . "${DORIS_HOME}/env.sh"
      ^--------------------^ SC1091 (info): Not following: ./env.sh: openBinaryFile: does not exist (No such file or directory)


In thirdparty/build-thirdparty.sh line 155:
if [[ "${CC}" == *gcc ]]; then
       ^---^ SC2154 (warning): CC is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 208:
check_prerequest "${CMAKE_CMD} --version" "cmake"
                  ^----------^ SC2154 (warning): CMAKE_CMD is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 340:
        "${CMAKE_CMD}" -G "${GENERATOR}" -DCMAKE_INSTALL_PREFIX="${TP_INSTALL_DIR}" -DEVENT__DISABLE_TESTS=ON \
                           ^----------^ SC2154 (warning): GENERATOR is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 343:
    "${BUILD_SYSTEM}" -j "${PARALLEL}"
     ^-------------^ SC2154 (warning): BUILD_SYSTEM is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1123:
    local ld="${DORIS_BIN_UTILS}/ld"
              ^----------------^ SC2154 (warning): DORIS_BIN_UTILS is referenced but not assigned.

For more information:
  https://www.shellcheck.net/wiki/SC2154 -- BUILD_SYSTEM is referenced but no...
  https://www.shellcheck.net/wiki/SC1091 -- Not following: ./env.sh: openBina...
----------

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.



shfmt errors
'shfmt ' found no issues.

Nitin-Kashyap and others added 6 commits July 5, 2024 11:08
… generated by Spark. (27783)

    1. Original planner updated to consider BucketShuffle for bucketed hive table
    2. Nereids planner updated for bucketShuffle join on hive tables.
    3. Added spark style hash calculation in BE for shuffle on one side.
    4. Added shuffle hash selection based on left(non-shuffling) side.
Successfully merging this pull request may close these issues.

[Feature] Enable BucketShuffle Join for Hive tables
6 participants