[CORE] Add GlutenImplicits to get FallbackSummary easily #3599

Merged on Nov 6, 2023 (2 commits)

ulysses-you (Contributor) commented Nov 2, 2023

What changes were proposed in this pull request?

This PR adds a helper class to get the Gluten fallback summary from a Spark Dataset. If AQE is enabled but the query has not been materialized, this method re-plans the query execution with AQE disabled. This is a workaround to get the final plan, and the resulting summary may be inconsistent with that of the materialized query. However, we have no other choice.

For example:

import org.apache.spark.sql.execution.GlutenImplicits._
val df = spark.sql("SELECT * FROM t")
df.fallbackSummary
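
The AQE caveat in the description suggests one way to avoid the re-planning path: materialize the query before asking for the summary. The following is a minimal sketch, assuming a Spark session with the Gluten plugin already configured and an existing table `t` (both are assumptions, as is the choice of `collect()` to trigger execution). It only prints the returned summary, since the fields of the summary type are not shown in this thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.GlutenImplicits._

// Assumption: Gluten is already enabled in this session's configuration
// and a table named `t` exists.
val spark = SparkSession.builder().getOrCreate()

val df = spark.sql("SELECT * FROM t")

// Run the query once so that, with AQE enabled, the adaptive plan is finalized.
// collect() brings every row to the driver, so this suits small results only.
df.collect()

// Per the PR description, re-planning with AQE disabled happens only when the
// query has not been materialized, so this call should reflect the executed plan.
val summary = df.fallbackSummary
println(summary)
```

Without the `df.collect()` step the call still works, but, as described above, the summary then comes from a plan built with AQE disabled and may differ from what actually runs.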

How was this patch tested?

Added tests.

github-actions bot commented Nov 2, 2023

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions bot commented Nov 2, 2023

Run Gluten Clickhouse CI

ulysses-you (Contributor, Author) commented

cc @zhouyuan @PHILO-HE thank you

PHILO-HE (Contributor) left a comment

Thanks for your work!

@@ -2014,6 +2014,9 @@ class ClickHouseTestSettings extends BackendTestSettings {
"SELECT structFieldSimple.key, arrayFieldSimple[1] FROM tableWithSchema a where int_Field=1")
.exclude("SELECT structFieldComplex.Value.`value_(2)` FROM tableWithSchema")
enableSuite[SparkFunctionStatistics]

enableSuite[GlutenImplicitsTest]
.exclude("fallbackSummary with shuffle")

Does this feature not work with the CH backend? cc @zzcclp.

ulysses-you (Contributor, Author) replied

It should work with the CH backend. Some behaviors differ between the backends, so I disabled these tests for the CH backend. For example, the Velox backend adds one more project before the shuffle, and the Velox backend supports columnar cache.

PHILO-HE (Contributor) commented Nov 6, 2023

Hi @ulysses-you, could you also update the doc? This may be a good place: https://github.com/oap-project/gluten/blob/main/docs/get-started/Velox.md

ulysses-you (Contributor, Author) commented

@PHILO-HE yes, added the docs

PHILO-HE (Contributor) left a comment

Thanks!

ulysses-you merged commit 05c7435 into apache:main on Nov 6, 2023 (17 checks passed)
ulysses-you deleted the fallback branch on November 6, 2023 09:19
GlutenPerfBot (Contributor) commented

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

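| query | log/native_3599_time.csv | log/native_master_11_05_2023_16e287450_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1    | 32.73   | 34.85   | 2.126  | 106.49% |
| q2    | 24.76   | 25.12   | 0.356  | 101.44% |
| q3    | 37.88   | 40.11   | 2.231  | 105.89% |
| q4    | 36.83   | 37.09   | 0.264  | 100.72% |
| q5    | 69.01   | 71.44   | 2.431  | 103.52% |
| q6    | 6.49    | 9.10    | 2.608  | 140.18% |
| q7    | 85.59   | 87.61   | 2.027  | 102.37% |
| q8    | 86.81   | 85.22   | -1.592 | 98.17%  |
| q9    | 123.02  | 121.12  | -1.894 | 98.46%  |
| q10   | 51.97   | 51.83   | -0.138 | 99.74%  |
| q11   | 20.23   | 19.68   | -0.549 | 97.29%  |
| q12   | 25.58   | 28.01   | 2.433  | 109.51% |
| q13   | 49.44   | 48.61   | -0.833 | 98.32%  |
| q14   | 17.57   | 19.14   | 1.571  | 108.94% |
| q15   | 30.50   | 33.03   | 2.522  | 108.27% |
| q16   | 16.71   | 16.10   | -0.611 | 96.35%  |
| q17   | 102.53  | 102.08  | -0.447 | 99.56%  |
| q18   | 147.61  | 149.04  | 1.430  | 100.97% |
| q19   | 14.80   | 18.25   | 3.449  | 123.31% |
| q20   | 30.65   | 30.91   | 0.269  | 100.88% |
| q21   | 222.07  | 222.95  | 0.878  | 100.40% |
| q22   | 13.76   | 13.47   | -0.291 | 97.88%  |
| total | 1246.52 | 1264.76 | 18.242 | 101.46% |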
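
The report does not say how its last two columns are computed. Judging from the figures, `difference` appears to be the master time minus the PR time, and `percentage` the master time divided by the PR time. The sketch below checks that reading against two rows. The `Timing` class and its field names are hypothetical, and small deviations in the last digit are expected because the report seems to use unrounded timings.

```scala
// Hypothetical model of one report row; names are illustrative only.
final case class Timing(query: String, prTime: Double, masterTime: Double) {
  // Positive when the PR build was faster than master (inferred from the report).
  def difference: Double = masterTime - prTime

  // Master time as a percentage of PR time; above 100 means the PR build was faster.
  def percentage: Double = masterTime / prTime * 100.0
}

object PerfReportCheck {
  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Timing("q8", 86.81, 85.22),
      Timing("total", 1246.52, 1264.76)
    )
    rows.foreach { r =>
      println(f"${r.query}%-6s ${r.difference}%9.3f ${r.percentage}%7.2f%%")
    }
  }
}
```

Under this reading, the total row says the two builds differ by about 1.5% overall, which fits the report's "for reference only" framing.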

ulysses-you added a commit to ulysses-you/gluten that referenced this pull request Dec 7, 2023