-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-7802][CORE] Use original total memory size if enable dynamic offheap #7803
Conversation
Run Gluten Clickhouse CI |
@supermem613 @zhztheplayer @ulysses-you @PHILO-HE Could you please take a look? |
Run Gluten Clickhouse CI |
Be careful, the dynamic offheap feature is experimental. |
Thank you for reminding me, yea, I was just testing the feature in a test environment, and I found that this feature ignores the user-configured offheap memory. |
Noted. So the work is related to #7605? Since I didn't see the tickets linked. |
I didn't notice it before, but I understand the magic of #5439 lies in using an on-heap memory quota to control on-heap and off-heap memory. Off-heap memory is floating and reserved for OFFHEAP_SIZING_MEMORY_FRACTION to prevent overlapping behavior when on-heap and off-heap memory are allocated simultaneously. #7605 may have been intentionally designed by #5439, but the current quota ignores the specified off-heap memory. This PR is used to make the quota's memory include the user-specified off-heap+on-heap. |
Right - we purposefully ignore the user-configured offheap memory because we are going to size it based on the executor memory set. I don't think we should consider a manually considered off-heap setting as additional memory space we should use, especially if there may be use of off-heap memory outside of Gluten - which the user would be setting it up for... since this feature would preclude manually setting that, no? |
Putting it differently - if what we are looking for is the capability to reduce the dynamic memory allocated with an offset, not just a factor, we should add a setting that does that, not use the off-heap setting. |
Thank you for your explanation. If a user specifies "onheap 10G offheap 5G", was the available memory for the app 15G before, and now only 10G? If I missed something, please correct me. |
Fundamentally, if the dynamic sizing off-heap memory feature is on, the user manually setting the off-heap size - which will be overwritten - could only be interpreted in two ways: (1) The user wants to manually control the overall off-heap memory usage (Gluten + any other uses). For (1), it directly conflicts with the feature. Either one manually sets the off-heap or one lets the feature do its thing. For (2), the current implementation of the feature is counter-productive as it will size the off-heap based solely on Gluten calculation and will use it ignoring any other user off-heap usage. This scenario is essentially not well addressed right now. I'd argue that really, we should separate the off-heap tracking in Gluten vs. Spark off-heap (maybe only when the dynamic sizing feature is on), so that customers can choose to set aside off-heap for their custom JARs, etc, and separately set the memory for Gluten OR use the dynamic sizing feature, but without conflicting with the off-heap setting. Now, going back to your example of "onheap 10G offheap 5G". Fundamentally the question is what the use is attempting to do. If the user wants to use 10GB for JVM and 5GB for Gluten + custom JARs, it would do so with the dynamic sizing feature off (scenario (1) above). If they want 10GB for JVM or Gluten and 5GB for custom JARs, we hit the (2) scenario above - so right now they would turn off the dynamic sizing feature. Overall, IMO, this is a strong argument for pursuing next what I described above - separating off-heap usage, perhaps only when the dynamic sizing feature is on, for Gluten vs. everything else. This would cover the user scenario of setting aside memory for things other than Gluten nicely. |
@supermem613 Thank you for your very detailed explanation. you are right. thanks! |
What changes were proposed in this pull request?
This PR aims to fix #7802.
How was this patch tested?
manual tests