-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce execution hint for Cardinality aggregation #15764
base: main
Are you sure you want to change the base?
Introduce execution hint for Cardinality aggregation #15764
Conversation
❌ Gradle check result for b7560d6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 5a71577: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 7accb1d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 96f9b83: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for b101ace: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 2658aa2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Siddharth Rayabharam <[email protected]>
Signed-off-by: Siddharth Rayabharam <[email protected]>
Signed-off-by: Siddharth Rayabharam <[email protected]>
Signed-off-by: Siddharth Rayabharam <[email protected]>
Signed-off-by: Siddharth Rayabharam <[email protected]>
Signed-off-by: Siddharth Rayabharam <[email protected]>
Signed-off-by: Siddharth Rayabharam <[email protected]>
Signed-off-by: Siddharth Rayabharam <[email protected]>
Signed-off-by: Siddharth Rayabharam <[email protected]>
2658aa2
to
fd02f7d
Compare
❌ Gradle check result for fd02f7d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@rishabhmaurya gradle check failing with |
@maitreya2954 looks like the build is failing due to failure of a task. Please check the console output for full build output.
|
@rishabhmaurya @msfroh @maitreya2954 we have a similar precedent [1] recently where we decided that offloading such types of decisions to the user is not a viable path forward. The server should be the one to figure out the best way to perform the aggregation in question, not users. [1] #15012 |
Signed-off-by: Siddharth Rayabharam <[email protected]>
❌ Gradle check result for a6f3c65: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for e205165: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
From #15269, it sounds like the original proposal was to add a memory threshold (which a user could adjust up/down) to cluster settings that would let the server decide which mode to use based on available memory. Would that be a better option? |
@msfroh @reta I think adding a better heuristic for selection of ordinal vs direct collector could be a follow up to it. This setting is for the power users who understand the impact of ordinal collector behaviour and understand the implications of it and still want to use it. This is similar to execution hint we expose for term aggregation. Also, sometimes cardinality agg is the major usecase for users and they may want to tweak this setting directly to understand the memory impact in their non-prod environments to better scale the prod environments. Heuristic based solutions doesn't gives this flexibility. And I'm not making a hypothetical use case as this became a blocker for one of the managed service user who wanted to tweak this setting in their non-prod environment to understand the performance gains and memory impact where they were evaluating eager_global ordinals, murmur hash, direct and ordinal collector but didn't have a way to override the behaviour for direct vs ordinal collector. |
@rishabhmaurya changing and maintaining public APIs for a feature that (as per you comment) only one single user may benefit from is not justified (in my opinion). The idea behind heuristic based solutions is to have a reliable way to pick the best path for vast majority of the users, the execution hint offloads this pain to the user. If we could avoid that by spending more time understanding "the performance gains and memory impact evaluating eager_global ordinals, murmur hash, direct and ordinal collector" (as you mentioned), it would be win / win option. |
@reta @msfroh @rishabhmaurya I think best way to move forward is to develop a reliable heuristic but also give a user ability to set a memory threshold or choose a desired behavior. There is no doubt, A good heuristic will benefit vast majority of the users. However, on the other hand, users who are willing to expend more memory for OrdinalsCollector to improve aggregation times (I believe ordinals are faster, but correct me if I am wrong). So, isolating few power users with this usecase might also not be best way for us. I suggest, we keep the current heuristic (here) as the default behaviour and give option to the user to choose "execution_hint" AND option to set a "memory threshold" in the cluster settings. @reta Since we do not have any new heuristic other than the existing one, we can create an seperate issue to come up with an enhanced heuristic. Wdyt? |
@maitreya2954 for this specific execution hint, the user has to be a real expert to understand a) what these DirectCollector and OrdinalsCollector are b) how they differ c) when to use one or another. I am fine to have this new hint added but I would like to have it documented in a way that user (and not only real expert) would be able to understand when one may need to use it and why. |
This PR is stalled because it has been open for 30 days with no activity. |
Description
This PR tries to introduce
execution_hint
field for Cardinality aggregation. Theexecution_hint
field currently accepts two values:direct
andordinals
. Specifyingordinals
execution hint on non-ordinal fields will have no effect i.e DirectCollector is used. Similarly, specifyingdirect
on ordinals fields will have no effect i.e OrdinalsCollector is used.Related Issues
Resolves #15269
Related Documentation issue:
opensearch-project/documentation-website#8264
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.