[CH] AQE cannot coalesce partitions when Exchange hash partitioning exists non-attribute expressions #3486

Closed
exmy opened this issue Oct 23, 2023 · 9 comments · Fixed by #3941
Labels
bug Something isn't working triage

Comments

@exmy
Contributor

exmy commented Oct 23, 2023

Backend

CH (ClickHouse)

Bug description

image

A pre-project operator is added before the exchange operator when hash partitioning involves non-attribute expressions, which prevents AQE from coalescing shuffle partitions.

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@exmy exmy added bug Something isn't working triage labels Oct 23, 2023
@lgbo-ustc
Contributor

@rui-mo @zzcclp Should we extend the Substrait pb to support expressions in the exchange operator?

@rui-mo
Contributor

rui-mo commented Oct 23, 2023

@exmy Thanks for noticing this issue. Could you explain how much this issue affects performance?

@rui-mo
Contributor

rui-mo commented Oct 25, 2023

cc @PHILO-HE

@exmy
Contributor Author

exmy commented Nov 24, 2023

@exmy Thanks for noticing this issue. Could you explain how much this issue affects performance?

We've recently noticed that this issue also prevents AQE's OptimizeSkewedJoin rule from taking effect, which has a significant impact on performance.

If OptimizeSkewedJoin is disabled due to this issue:
image
image

If OptimizeSkewedJoin is enabled:
image
image

@rui-mo

@rui-mo
Contributor

rui-mo commented Nov 27, 2023

@exmy Thanks for the profiling. On which query can we reproduce the performance result? Does that mean we shouldn't add an extra project before the exchange?

@exmy
Contributor Author

exmy commented Dec 1, 2023

@exmy Thanks for the profiling. On which query can we reproduce the performance result? Does that mean we shouldn't add an extra project before the exchange?

Sorry, this issue is specific to the CH backend. It's not present in the Velox backend.

@exmy exmy changed the title [CORE] AQE cannot coalesce partitions when Exchange hash partitioning exists non-attribute expressions [CH] AQE cannot coalesce partitions when Exchange hash partitioning exists non-attribute expressions Dec 1, 2023
@rui-mo
Contributor

rui-mo commented Dec 6, 2023

Sorry, this issue is specific to the CH backend. It's not present in the Velox backend.

@exmy Thanks for your work. The Velox backend also inserts a Project transformer before the exchange if the hash keys are expressions. I assume a similar issue also occurs for the Velox backend, right? cc @PHILO-HE

@exmy
Contributor Author

exmy commented Dec 6, 2023

@exmy Thanks for your work. The Velox backend also inserts a Project transformer before the exchange if the hash keys are expressions. I assume a similar issue also occurs for the Velox backend, right? cc @PHILO-HE

The Velox backend doesn't have this issue, because it adds a pre-project to calculate the hash value but doesn't change the shuffle hash expressions in the ShuffleExchange operator. I have tested it. @rui-mo
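
To illustrate the difference between the two approaches, here is a toy Python model, not Gluten or Spark code. It rests on an assumption this thread does not confirm: that the problem comes from the exchange's partitioning expressions no longer matching the keys the parent operator clusters on, checked here with a simplified version of the rule that every partitioning expression must appear among the required clustering keys. The names `HashPartitioning`, `plan_exchange`, `rewrite_keys` and the `_pre_` alias prefix are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in for an exchange's hash partitioning (not the Spark class)."""
    expressions: Tuple[str, ...]
    num_partitions: int

    def satisfies(self, required_clustering: Sequence[str]) -> bool:
        # Simplified clustered-distribution check: every partitioning
        # expression must appear among the required clustering keys.
        # Plain string equality stands in for semantic equality.
        return all(e in required_clustering for e in self.expressions)


def plan_exchange(
    keys: Sequence[str], num_partitions: int, rewrite_keys: bool
) -> Tuple[Dict[str, str], HashPartitioning]:
    """Plan a pre-project plus exchange for the given hash keys.

    Any key that is not a bare column name is computed by the pre-project.
    rewrite_keys=False keeps the original expressions on the exchange
    (the behaviour described for Velox above); rewrite_keys=True points the
    exchange at the projected columns instead (assumed here for illustration).
    """
    projected: Dict[str, str] = {}
    exchange_keys = []
    for i, key in enumerate(keys):
        if key.isidentifier():  # bare attribute, no pre-project needed
            exchange_keys.append(key)
            continue
        alias = f"_pre_{i}"  # hypothetical alias for the projected column
        projected[alias] = key
        exchange_keys.append(alias if rewrite_keys else key)
    return projected, HashPartitioning(tuple(exchange_keys), num_partitions)


# The parent operator clusters on one expression and one plain attribute.
required = ("a + 1", "b")
_, kept = plan_exchange(required, 200, rewrite_keys=False)
_, rewritten = plan_exchange(required, 200, rewrite_keys=True)

print(kept.satisfies(required))       # True
print(rewritten.satisfies(required))  # False
```

In this model, the exchange that keeps `a + 1` still satisfies the parent's requirement, while the one rewritten to `_pre_0` does not, even though both use the same pre-project to compute the value.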

@rui-mo
Contributor

rui-mo commented Dec 7, 2023

@exmy Thanks for checking!

3 participants