[GLUTEN-3378][CORE] DeltaScanTransformer to support delta table #3982

Merged
merged 3 commits into from
Dec 12, 2023

YannByron

What changes were proposed in this pull request?

Based on the new framework defined in #3843, modify the implementation of delta scan.

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

github-actions bot commented Dec 8, 2023

#3378

github-actions bot commented Dec 8, 2023

Run Gluten Clickhouse CI


  def createDataSourceTransformer(
      batchScan: FileSourceScanExec,
      partitionFilters: Seq[Expression]): FileSourceScanExecTransformer = {
Could you rename partitionFilters to newPartitionFilters? FileSourceScanExec already has a member with the same name, so the current name is easy to confuse with it.
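
To illustrate the concern, here is a minimal sketch. The types are simplified stand-ins rather than the real Spark classes, and `effectiveFilters` with its fallback rule is purely hypothetical; the point is only that the renamed parameter reads clearly next to the member of the same kind.

```scala
object RenameSketch {
  // Simplified stand-ins for the Spark types; not the real classes.
  final case class Expression(sql: String)
  final case class FileSourceScanExec(partitionFilters: Seq[Expression])

  // Hypothetical helper: with the parameter called `newPartitionFilters`,
  // it cannot be mistaken for `scan.partitionFilters` inside the body.
  def effectiveFilters(
      scan: FileSourceScanExec,
      newPartitionFilters: Seq[Expression]): Seq[Expression] =
    if (newPartitionFilters.nonEmpty) newPartitionFilters
    else scan.partitionFilters
}
```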

yma11 commented Dec 10, 2023

By the way, is it reasonable to move previous column mapping overrides into DeltaScanTransformer?

@YannByron

> By the way, is it reasonable to move previous column mapping overrides into DeltaScanTransformer?

They are two separate things. DeltaScanTransformer is used to replace this line:

case "DeltaParquetFileFormat" => ReadFileFormat.ParquetReadFormat

so that all code tied to a specific lake format can be removed from gluten-core.
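
Roughly, the idea can be sketched as follows. This is a simplified illustration, not the actual Gluten code: `FileSourceScanTransformer`, `formatName`, `fileReadFormat`, and `UnknownFormat` are assumed names, and only `DeltaScanTransformer` and `ReadFileFormat.ParquetReadFormat` come from the discussion above.

```scala
object FormatDispatchSketch {
  sealed trait ReadFileFormat
  object ReadFileFormat {
    case object ParquetReadFormat extends ReadFileFormat
    case object UnknownFormat extends ReadFileFormat // assumed fallback value
  }

  // Generic transformer in the core module: it only knows built-in formats,
  // so there is no "DeltaParquetFileFormat" case here any more.
  class FileSourceScanTransformer(val formatName: String) {
    def fileReadFormat: ReadFileFormat = formatName match {
      case "ParquetFileFormat" => ReadFileFormat.ParquetReadFormat
      case _ => ReadFileFormat.UnknownFormat
    }
  }

  // Delta-specific subclass, meant to live outside the core module.
  // It declares that Delta data files are read as Parquet.
  class DeltaScanTransformer(formatName: String)
      extends FileSourceScanTransformer(formatName) {
    override def fileReadFormat: ReadFileFormat =
      ReadFileFormat.ParquetReadFormat
  }
}
```

With this shape, the core module no longer needs to mention the Delta format name; the Delta module supplies the mapping itself.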

@leesf leesf left a comment

+1

@yma11 yma11 merged commit 0ff8a42 into apache:main Dec 12, 2023
17 checks passed
@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3982_time.csv | log/native_master_12_11_2023_71ec720d4_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 33.22 | 32.10 | -1.125 | 96.61% |
| q2 | 24.80 | 24.79 | -0.007 | 99.97% |
| q3 | 37.89 | 38.42 | 0.533 | 101.41% |
| q4 | 39.07 | 39.90 | 0.835 | 102.14% |
| q5 | 70.12 | 72.41 | 2.289 | 103.26% |
| q6 | 5.31 | 7.02 | 1.703 | 132.05% |
| q7 | 84.71 | 85.38 | 0.664 | 100.78% |
| q8 | 88.85 | 86.65 | -2.196 | 97.53% |
| q9 | 125.18 | 126.76 | 1.579 | 101.26% |
| q10 | 46.64 | 46.79 | 0.146 | 100.31% |
| q11 | 20.48 | 20.34 | -0.136 | 99.33% |
| q12 | 23.58 | 26.95 | 3.375 | 114.31% |
| q13 | 46.68 | 45.56 | -1.117 | 97.61% |
| q14 | 17.49 | 18.31 | 0.819 | 104.68% |
| q15 | 28.60 | 27.78 | -0.823 | 97.12% |
| q16 | 15.83 | 15.83 | -0.000 | 100.00% |
| q17 | 103.50 | 101.57 | -1.927 | 98.14% |
| q18 | 149.63 | 151.54 | 1.909 | 101.28% |
| q19 | 14.28 | 12.91 | -1.370 | 90.40% |
| q20 | 28.58 | 26.84 | -1.747 | 93.89% |
| q21 | 226.87 | 228.06 | 1.194 | 100.53% |
| q22 | 14.59 | 13.81 | -0.778 | 94.67% |
| total | 1245.89 | 1249.71 | 3.818 | 100.31% |

sezruby commented Dec 14, 2023

@YannByron are you planning to add a protocol reader version check?
As it stands, this won't support a scan on a table with deletion vectors.
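
Something along these lines might work as a guard. It is a minimal sketch under stated assumptions: `Protocol` here is a simplified stand-in for the Delta protocol action, and `supportedReaderFeatures` and `canOffload` are hypothetical names, not existing Gluten or Delta APIs. It relies on the Delta protocol rule that deletion vectors are a reader table feature (`deletionVectors`) on tables with reader version 3; which features the native scan can really handle is an assumption.

```scala
object ProtocolCheckSketch {
  // Simplified stand-in for the table's protocol action.
  final case class Protocol(
      minReaderVersion: Int,
      readerFeatures: Set[String] = Set.empty)

  // Assumption: reader features the native scan is able to handle.
  val supportedReaderFeatures: Set[String] = Set("columnMapping")

  // Returns true when the scan may be offloaded; otherwise the caller
  // should fall back to the vanilla Spark scan.
  def canOffload(p: Protocol): Boolean =
    if (p.minReaderVersion <= 2) true
    else if (p.minReaderVersion == 3)
      p.readerFeatures.subsetOf(supportedReaderFeatures)
    else false // unknown future reader version: be conservative
}
```

A table that enables deletion vectors would report `deletionVectors` in its reader features, so `canOffload` returns false and the scan is left to Spark.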
