[feature](datalake) Add BucketShuffleJoin support for bucketed hive tables #27784
base: master
clang-tidy made some suggestions
clang-tidy review says "All clean, LGTM! 👍"
Hi @Nitin-Kashyap, thanks for your contribution.
BTW, is it only suitable for "Spark-created" Hive bucketed tables?
@morningman Please find the sample test I used for this case:
CREATE TABLE parquet_test (
    user_id INT,
    key VARCHAR(20),
    part VARCHAR(10)
)
USING parquet
PARTITIONED BY (part)
CLUSTERED BY (user_id) INTO 3 BUCKETS;
INSERT INTO parquet_test2 VALUES (31, 'U31', 'IN'), (11, 'U11', 'IN'), (21, 'U21', 'IN');
@morningman Yes, in the current scope it only recognizes Spark-created bucketed tables, which it identifies by the table properties that Spark defines for the bucket specification. I plan to add support for Hive and Hudi as well (hopefully in the next PR); for this I have left a placeholder in THashType [HIVE_MOD: Hive and Hudi use the same hash method]. For Hudi, however, some more changes are needed on the FE side to identify the bucket id from the file path.
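To make the "identified by table properties" point concrete, here is a minimal Python sketch of how a reader of Hive Metastore table parameters might detect a Spark bucket specification. The property keys are the ones Spark is understood to write for bucketed data source tables; they, along with the `BucketSpec` type and the `spark_bucket_spec` helper, are assumptions for illustration and are not taken from this PR's FE code.

```python
from typing import Mapping, NamedTuple, Optional, Tuple

# Assumed property keys: what Spark is understood to store in the Hive
# Metastore table parameters for a bucketed table. Verify against your catalog.
NUM_BUCKETS_KEY = "spark.sql.sources.schema.numBuckets"
NUM_BUCKET_COLS_KEY = "spark.sql.sources.schema.numBucketCols"
BUCKET_COL_PREFIX = "spark.sql.sources.schema.bucketCol."


class BucketSpec(NamedTuple):
    """Hypothetical container for a parsed bucket specification."""
    num_buckets: int
    bucket_columns: Tuple[str, ...]


def spark_bucket_spec(params: Mapping[str, str]) -> Optional[BucketSpec]:
    """Return the bucket spec if the parameters describe one, else None."""
    raw_buckets = params.get(NUM_BUCKETS_KEY)
    if raw_buckets is None:
        return None  # no Spark bucketing metadata on this table
    try:
        num_buckets = int(raw_buckets)
        num_cols = int(params.get(NUM_BUCKET_COLS_KEY, "0"))
    except ValueError:
        return None  # malformed metadata: treat the table as not bucketed
    if num_buckets <= 0 or num_cols <= 0:
        return None
    columns = []
    for i in range(num_cols):
        col = params.get(f"{BUCKET_COL_PREFIX}{i}")
        if col is None:
            return None  # incomplete column list: do not guess
        columns.append(col)
    return BucketSpec(num_buckets, tuple(columns))
```

Returning `None` on any missing or malformed property keeps the fallback simple: a table that cannot be proven to be bucketed is planned as an ordinary, non-bucketed table.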
Outdated, resolved review threads:
fe/fe-core/src/main/java/org/apache/doris/catalog/external/HMSExternalTable.java (6 threads)
fe/fe-core/src/main/java/org/apache/doris/planner/external/HiveScanNode.java
fe/fe-core/src/main/java/org/apache/doris/planner/DataPartition.java
fe/fe-core/src/main/java/org/apache/doris/planner/DistributedPlanner.java
fe/fe-core/src/main/java/org/apache/doris/planner/external/FileQueryScanNode.java
run buildall
Hi @Nitin-Kashyap, I submitted a PR to your branch.
LGTM
… generated by Spark. (27783)
1. Updated the original planner to consider BucketShuffle for bucketed Hive tables.
2. Updated the Nereids planner for BucketShuffle join on Hive tables.
3. Added Spark-style hash calculation in BE for shuffling on one side.
4. Added shuffle hash selection based on the left (non-shuffling) side.
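As a rough illustration of what "Spark-style hash calculation" means for a single INT bucket column such as `user_id`: Spark is understood to assign a row to bucket `pmod(murmur3_32(value, seed = 42), numBuckets)`. The Python sketch below follows that rule under those assumptions. The function names are hypothetical, it is not the BE implementation from this PR, and it does not cover strings or multi-column keys.

```python
MASK32 = 0xFFFFFFFF
C1 = 0xCC9E2D51
C2 = 0x1B873593
SPARK_HASH_SEED = 42  # assumed default seed of Spark's Murmur3 hash expression


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_hash_int(value: int, seed: int) -> int:
    """Murmur3 x86 32-bit hash of one 4-byte integer, returned as a signed int32."""
    # Mix the single 4-byte block.
    k1 = ((value & MASK32) * C1) & MASK32
    k1 = _rotl32(k1, 15)
    k1 = (k1 * C2) & MASK32
    # Fold the block into the running hash state.
    h1 = (seed & MASK32) ^ k1
    h1 = _rotl32(h1, 13)
    h1 = (h1 * 5 + 0xE6546B64) & MASK32
    # Finalization: mix in the input length (4 bytes) and avalanche.
    h1 ^= 4
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    # Reinterpret as signed 32-bit, matching a Java int.
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def spark_bucket_id(value: int, num_buckets: int) -> int:
    """Bucket index for an INT key; Python's % is already non-negative (pmod)."""
    return murmur3_hash_int(value, SPARK_HASH_SEED) % num_buckets


if __name__ == "__main__":
    # user_id values from the sample INSERT, with 3 buckets as in the DDL.
    for user_id in (31, 11, 21):
        print(user_id, spark_bucket_id(user_id, 3))
```

The point of matching the writer's hash is that only the non-bucketed side of the join needs to be shuffled: its rows are routed with the same function that placed the bucketed side's rows into files.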
Add BucketShuffleJoin support for bucketed hive tables generated by Spark. (27783)
Proposed changes
Issue Number: close #27783
### Sample Output: