
[GLUTEN-3594] [VL] Allow users to set bloom filter configurations #3610

Merged: 1 commit merged into apache:main on Nov 8, 2023.

zhli1142015 (Contributor):

What changes were proposed in this pull request?

Fixes: #3594



github-actions bot commented Nov 3, 2023

#3594


github-actions bot commented Nov 3, 2023

Run Gluten Clickhouse CI

```diff
@@ -1267,4 +1272,12 @@ object GlutenConfig {
+ "partial aggregation may be early abandoned.")
.intConf
.createOptional

val COLUMNAR_VELOX_BLOOM_FILTER_MAX_NUM_BITS =
buildConf("spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits")
```
zhli1142015 (Contributor Author), Nov 3, 2023:

This adds a new config because Velox's `spark.bloom_filter.max_num_bits` behaves differently from Spark's `RUNTIME_BLOOM_FILTER_MAX_NUM_BITS` conf:
https://github.com/facebookincubator/velox/blob/73d4279a14744cf4d038d3a967a49dcddbad9d39/velox/core/QueryConfig.h#L632C5-L632C24

```diff
@@ -62,6 +62,9 @@ const std::string kAbandonPartialAggregationMinPct =
"spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinPct";
const std::string kAbandonPartialAggregationMinRows =
"spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinRows";
const std::string kBloomFilterExpectedNumItems = "spark.sql.optimizer.runtime.bloomFilter.expectedNumItems";
const std::string kBloomFilterNumBits = "spark.sql.optimizer.runtime.bloomFilter.numBits";
```
Contributor:

How about adding the "spark.gluten" prefix to the two parameters above? For instance, we could use "spark.gluten.sql.optimizer.runtime.bloomFilter.expectedNumItems" and "spark.gluten.sql.optimizer.runtime.bloomFilter.numBits".

Contributor Author:

OK with me; I'll add new config entries for them.


github-actions bot commented Nov 6, 2023

Run Gluten Clickhouse CI


JkSelf (Contributor) left a comment:

LGTM. Just one small question.

```cpp
getConfigValue(confMap_, kBloomFilterExpectedNumItems, "1000000");
configs[velox::core::QueryConfig::kSparkBloomFilterNumBits] =
    getConfigValue(confMap_, kBloomFilterNumBits, "8388608");
configs[velox::core::QueryConfig::kSparkBloomFilterMaxNumBits] =
```
Contributor:

It seems the default value of the equivalent setting in vanilla Spark is 67108864L here. Why is the default for kSparkBloomFilterMaxNumBits 4194304 in this PR?

Contributor Author:

@jinchengchenghh
Contributor:

Please also update the documentation: https://github.com/oap-project/gluten/blob/main/docs/Configuration.md

zhli1142015 (Contributor Author):

Added, thanks.


github-actions bot commented Nov 7, 2023

Run Gluten Clickhouse CI

JkSelf (Contributor) commented Nov 8, 2023:

@zhli1142015 Could you rebase again? Thanks.


github-actions bot commented Nov 8, 2023

Run Gluten Clickhouse CI

zhli1142015 (Contributor Author):

Rebased, thanks.

JkSelf (Contributor) left a comment:

LGTM.

zhli1142015 merged commit 4a72871 into apache:main on Nov 8, 2023 (17 checks passed).
GlutenPerfBot (Contributor):

===== Performance report for TPCH SF2000 with Velox backend, for reference only =====

| query | log/native_3610_time.csv | log/native_master_11_07_2023_e3eff1d8f_time.csv | difference | percentage |
| --- | ---: | ---: | ---: | ---: |
| q1 | 34.24 | 34.38 | 0.137 | 100.40% |
| q2 | 25.01 | 25.03 | 0.012 | 100.05% |
| q3 | 39.76 | 38.14 | -1.612 | 95.95% |
| q4 | 37.72 | 37.57 | -0.148 | 99.61% |
| q5 | 70.81 | 71.50 | 0.692 | 100.98% |
| q6 | 7.98 | 6.26 | -1.720 | 78.45% |
| q7 | 84.36 | 82.22 | -2.134 | 97.47% |
| q8 | 85.73 | 86.95 | 1.222 | 101.43% |
| q9 | 120.45 | 119.81 | -0.639 | 99.47% |
| q10 | 52.57 | 51.26 | -1.306 | 97.52% |
| q11 | 19.94 | 19.73 | -0.213 | 98.93% |
| q12 | 27.37 | 24.39 | -2.980 | 89.11% |
| q13 | 48.50 | 50.30 | 1.793 | 103.70% |
| q14 | 16.80 | 17.67 | 0.865 | 105.15% |
| q15 | 31.96 | 30.35 | -1.609 | 94.97% |
| q16 | 16.34 | 16.20 | -0.141 | 99.14% |
| q17 | 102.47 | 101.51 | -0.961 | 99.06% |
| q18 | 145.74 | 148.26 | 2.519 | 101.73% |
| q19 | 14.79 | 16.17 | 1.386 | 109.37% |
| q20 | 30.14 | 30.31 | 0.173 | 100.57% |
| q21 | 222.04 | 224.88 | 2.842 | 101.28% |
| q22 | 13.84 | 14.08 | 0.245 | 101.77% |
| total | 1248.56 | 1246.98 | -1.576 | 99.87% |
