Introduce execution hint for Cardinality aggregation #15764

maitreya2954 · 2024-09-05T19:30:56Z

Description

This PR introduces an execution_hint field for the Cardinality aggregation. The execution_hint field currently accepts two values: direct and ordinals. Specifying the ordinals execution hint on a non-ordinal field has no effect, i.e., DirectCollector is used. Similarly, specifying direct on an ordinals field has no effect, i.e., OrdinalsCollector is used.

Related Issues

Resolves #15269

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2024-09-05T19:49:42Z

❌ Gradle check result for b7560d6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-09-05T19:54:25Z

❌ Gradle check result for 5a71577: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-09-12T01:01:35Z

❌ Gradle check result for 7accb1d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-09-13T19:48:02Z

❌ Gradle check result for 96f9b83: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-09-13T19:57:29Z

❌ Gradle check result for b101ace: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-09-13T21:58:07Z

❌ Gradle check result for 2658aa2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-09-14T03:08:45Z

❌ Gradle check result for fd02f7d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

maitreya2954 · 2024-09-14T15:03:57Z

@rishabhmaurya The Gradle check is failing with "Ownership is not configured for this Run". What does this mean?

rishabhmaurya · 2024-09-16T15:54:20Z

@maitreya2954 It looks like the build is failing because one of its tasks failed. Please check the console output for the full build log.

> Task :server:forbiddenApisMain
Forbidden method invocation: java.lang.String#toUpperCase() [Uses default locale]
  in org.opensearch.search.aggregations.metrics.CardinalityAggregatorFactory$ExecutionMode (CardinalityAggregatorFactory.java:75)
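
The forbidden-API check flags `String#toUpperCase()` because its result depends on the JVM's default locale; under a Turkish default locale, for instance, `"direct".toUpperCase()` yields `"DİRECT"`, which would not match an enum constant. A minimal sketch of a locale-independent version follows. The enum below is a hypothetical stand-in for the `ExecutionMode` named in the build output, not the PR's actual code, and the `fromString` helper is an assumption for illustration.

```java
import java.util.Locale;

public class ExecutionHintParsing {

    // Hypothetical stand-in for CardinalityAggregatorFactory.ExecutionMode.
    public enum ExecutionMode {
        DIRECT,
        ORDINALS;

        // Assumed helper: maps a user-supplied hint to a constant.
        // Locale.ROOT makes the case conversion independent of the default locale.
        public static ExecutionMode fromString(String value) {
            return valueOf(value.toUpperCase(Locale.ROOT));
        }

        @Override
        public String toString() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        System.out.println(ExecutionMode.fromString("ordinals"));
        try {
            ExecutionMode.fromString("bogus");
        } catch (IllegalArgumentException e) {
            // Enum.valueOf rejects names that match no constant.
            System.out.println("invalid execution hint rejected");
        }
    }
}
```

In this sketch an unknown hint surfaces as the `IllegalArgumentException` thrown by `valueOf`; a real implementation would likely wrap it in a more descriptive error.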

reta · 2024-09-16T16:22:29Z

@rishabhmaurya @msfroh @maitreya2954 we have a similar precedent [1] recently where we decided that offloading such types of decisions to the user is not a viable path forward. The server should be the one to figure out the best way to perform the aggregation in question, not users.

[1] #15012

github-actions · 2024-09-16T19:13:40Z

❌ Gradle check result for a6f3c65: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-09-16T20:14:56Z

❌ Gradle check result for e205165: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

msfroh · 2024-09-16T21:00:26Z

@rishabhmaurya @msfroh @maitreya2954 we have a similar precedent [1] recently where we decided that offloading such types of decisions to the user is not a viable path forward. The server should be the one to figure out the best way to perform the aggregation in question, not users.

[1] #15012

From #15269, it sounds like the original proposal was to add a memory threshold (which a user could adjust up/down) to cluster settings that would let the server decide which mode to use based on available memory. Would that be a better option?
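
To make the threshold idea concrete, here is a minimal sketch of a server-side decision driven by an adjustable memory limit. Every name in it is hypothetical: `thresholdBytes` stands in for a value that would be read from a cluster setting, and `ordinalsMemoryBytes` for an estimate of the ordinals collector's overhead. It does not reproduce the existing heuristic in OpenSearch or the proposal in #15269.

```java
public class CollectorSelectionSketch {

    public enum CollectorKind { DIRECT, ORDINALS }

    // Hypothetical decision rule: use ordinals only when the field supports them
    // and their estimated memory cost stays within the configured threshold.
    public static CollectorKind choose(boolean fieldHasOrdinals, long ordinalsMemoryBytes, long thresholdBytes) {
        if (!fieldHasOrdinals) {
            return CollectorKind.DIRECT; // no ordinals available, nothing to decide
        }
        return ordinalsMemoryBytes <= thresholdBytes ? CollectorKind.ORDINALS : CollectorKind.DIRECT;
    }

    public static void main(String[] args) {
        long threshold = 64L * 1024 * 1024; // assumed 64 MiB limit, for illustration only
        System.out.println(choose(true, 8L * 1024 * 1024, threshold));
        System.out.println(choose(true, 512L * 1024 * 1024, threshold));
        System.out.println(choose(false, 0L, threshold));
    }
}
```

Raising the threshold makes the server pick the ordinals collector more often, so the user tunes a memory budget rather than naming a collector.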

rishabhmaurya · 2024-10-02T16:52:28Z

@msfroh @reta I think adding a better heuristic for selection of ordinal vs direct collector could be a follow up to it. This setting is for power users who understand the behaviour of the ordinal collector and its implications and still want to use it. It is similar to the execution hint we expose for the terms aggregation. Also, cardinality aggregation is sometimes the major use case for users, and they may want to tweak this setting directly to understand the memory impact in their non-prod environments so they can better scale their prod environments. Heuristic-based solutions don't give this flexibility. This is not a hypothetical use case: it became a blocker for a managed service user who wanted to tweak this setting in their non-prod environment to understand the performance gains and memory impact. They were evaluating eager_global ordinals, murmur hash, and the direct and ordinal collectors, but had no way to override the choice between the direct and ordinal collectors.

reta · 2024-10-02T17:12:23Z

@msfroh @reta I think adding a better heuristic for selection of ordinal vs direct collector could be a follow up to it

@rishabhmaurya Changing and maintaining public APIs for a feature that (as per your comment) only a single user may benefit from is not justified, in my opinion. The idea behind heuristic-based solutions is to have a reliable way to pick the best path for the vast majority of users; the execution hint offloads this pain to the user. If we could avoid that by spending more time understanding "the performance gains and memory impact evaluating eager_global ordinals, murmur hash, direct and ordinal collector" (as you mentioned), it would be a win-win option.

maitreya2954 · 2024-10-02T18:16:10Z

@reta @msfroh @rishabhmaurya I think best way to move forward is to develop a reliable heuristic but also give a user ability to set a memory threshold or choose a desired behavior. There is no doubt that a good heuristic will benefit the vast majority of users. On the other hand, some users are willing to spend more memory on OrdinalsCollector to improve aggregation times (I believe ordinals are faster, but correct me if I am wrong). So leaving out the few power users with this use case might not be the best way forward either.

I suggest we keep the current heuristic (here) as the default behaviour and give the user both an "execution_hint" option and an option to set a "memory threshold" in the cluster settings. @reta Since we do not have any new heuristic other than the existing one, we can create a separate issue to come up with an enhanced heuristic. WDYT?

reta · 2024-10-02T18:29:39Z

@reta @msfroh @rishabhmaurya I think best way to move forward is to develop a reliable heuristic but also give a user ability to set a memory threshold or choose a desired behavior

@maitreya2954 For this specific execution hint, the user has to be a real expert to understand a) what DirectCollector and OrdinalsCollector are, b) how they differ, and c) when to use one or the other. I am fine with adding this new hint, but I would like it documented in a way that a user (and not only a real expert) can understand when and why they may need it.

opensearch-trigger-bot · 2024-11-09T15:22:12Z

This PR is stalled because it has been open for 30 days with no activity.

github-actions bot added the enhancement, good first issue, and Search:Aggregations labels Sep 5, 2024

maitreya2954 mentioned this pull request Sep 13, 2024

[DOC] Documentation change for Cardinality Aggregator: execution_hint field added opensearch-project/documentation-website#8264

Open

4 tasks

maitreya2954 marked this pull request as ready for review September 13, 2024 20:12

maitreya2954 requested review from anasalkouz, andrross, ashking94, Bukhtawar, CEHENKLE, dblock, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, nknize, owaiskazi19, reta, Rishikesh1159, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners September 13, 2024 20:12

This was referenced Sep 13, 2024

[Feature Request] Add a cluster setting for memory threshold to pick OrdinalsCollector in Cardinality aggregation #15269

Open

Execution hint documentation added to cardinality agg opensearch-project/documentation-website#8265

Draft

maitreya2954 added 9 commits September 13, 2024 22:56

Introduce execution hint for Cardinality aggregation

cabf900

Signed-off-by: Siddharth Rayabharam <[email protected]>

Cleanup ExecutionMode enum declarations

0101e9f

Signed-off-by: Siddharth Rayabharam <[email protected]>

Remove unwanted lines

6f7ba1d

Signed-off-by: Siddharth Rayabharam <[email protected]>

toString overrided for ExecutionHint

fc14339

Signed-off-by: Siddharth Rayabharam <[email protected]>

Testcases added for CardinalityAggregator execution hint

7bca633

Signed-off-by: Siddharth Rayabharam <[email protected]>

Test case for invalid execution hint added

7e62f5d

Signed-off-by: Siddharth Rayabharam <[email protected]>

Java tags added for ExecutionMode

521939a

Signed-off-by: Siddharth Rayabharam <[email protected]>

Test method names corrected

0bd75bf

Signed-off-by: Siddharth Rayabharam <[email protected]>

gradle format checks

fd02f7d

Signed-off-by: Siddharth Rayabharam <[email protected]>

maitreya2954 force-pushed the cardinality_agg_collectors_hint branch from 2658aa2 to fd02f7d Compare September 14, 2024 02:56

Forbidden Apis fixed

a6f3c65

Signed-off-by: Siddharth Rayabharam <[email protected]>

Merge branch 'main' into cardinality_agg_collectors_hint

e205165

opensearch-ci-bot mentioned this pull request Sep 17, 2024

[AUTOCUT] Gradle Check Flaky Test Report for SearchRestCancellationIT #14311

Open

sandeshkr419 added the v2.18.0 label Oct 9, 2024

opensearch-trigger-bot bot added the stalled label Nov 9, 2024
