[Gluten-1.2] Port #10534 to branch-1.2: Fix hash build memory overuse (#10534) #500
Summary:
In a parallel join build, each build operator currently reserves enough memory for duplicate rows to accommodate the total number of rows across the hash tables of all build operators. Each build operator should instead reserve only enough memory for the rows of its own hash table.
This optimization reduced hash build operator memory usage by 10x, and we see total memory usage of some queries drop by 70%.
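
The reservation arithmetic described above can be illustrated with a minimal sketch. It is not the Velox implementation: the function names, the per-row cost `kBytesPerDuplicateRow`, and the row counts are all hypothetical and chosen only to show how the two reservation policies differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical cost of tracking one duplicate row, in bytes (assumption for
// illustration only; the real per-row cost is not stated in this PR).
constexpr uint64_t kBytesPerDuplicateRow = 8;

// Old policy: a build operator reserves for the rows of ALL hash tables.
uint64_t reserveForAllTables(const std::vector<uint64_t>& rowsPerTable) {
  const uint64_t totalRows = std::accumulate(
      rowsPerTable.begin(), rowsPerTable.end(), uint64_t{0});
  return totalRows * kBytesPerDuplicateRow;
}

// New policy: a build operator reserves only for the rows of its own table.
uint64_t reserveForOwnTable(
    const std::vector<uint64_t>& rowsPerTable,
    std::size_t operatorIndex) {
  return rowsPerTable.at(operatorIndex) * kBytesPerDuplicateRow;
}

// Sum of reservations across all build operators under the old policy.
uint64_t totalReservedBefore(const std::vector<uint64_t>& rowsPerTable) {
  return rowsPerTable.size() * reserveForAllTables(rowsPerTable);
}

// Sum of reservations across all build operators under the new policy.
uint64_t totalReservedAfter(const std::vector<uint64_t>& rowsPerTable) {
  uint64_t total = 0;
  for (std::size_t i = 0; i < rowsPerTable.size(); ++i) {
    total += reserveForOwnTable(rowsPerTable, i);
  }
  return total;
}
```

With four hypothetical build operators holding 1000 rows each, `totalReservedBefore` yields 128000 bytes and `totalReservedAfter` yields 32000 bytes. In this toy model the over-reservation grows with the number of parallel build operators; the 10x figure reported above is a measured result and is not derived from this sketch.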
Pull Request resolved: facebookincubator#10534
Reviewed By: zacw7
Differential Revision: D60131886
Pulled By: tanjialiang
fbshipit-source-id: a8c1c777df557dfcfc754ef31164a116fdb917c3
(cherry picked from commit 3fb9657)