[FEATURE] Add support to recommend threshold tuning for heap based task cancellations by SearchBackpressureService #455

Open
kaushalmahi12 opened this issue Jul 14, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@kaushalmahi12
Contributor

kaushalmahi12 commented Jul 14, 2023

Is your feature request related to a problem?

OpenSearch recently introduced a feature called search backpressure to make the service more resilient to node drops and performance degradation. It does so by cancelling resource-intensive search queries at the shard level and the coordinator node level. To decide which queries to cancel, it uses several settings, each tied to the resource the query is using heavily. As part of this feature, we would like to add support for recommending threshold tuning of the settings that control heap-based query cancellation at the shard and coordinator levels.

What solution would you like?

Since there are multiple settings for each resource-based cancellation, we will recommend only a single value (a multiplier) by which the thresholds for a resource (in this case heap) should increase or decrease. Recommending a separate value per setting would complicate the solution and increase the number of RCAs we need to create. We will emit separate actions for the search task (coordinator) level and the shard level.

Logic to mark the RCA unhealthy to increase the thresholds (Node level)

  • If the max heap used by the OpenSearch process is below 85% for a minute. Since the RCA runs at a 5-second interval, we will keep a sliding window of heapUsed values covering one minute.
  • And the heap-based task cancellations are more than 3%. (Rate limiters cap the number of cancellations: no more than 10% of all successful tasks can be cancelled, at both the shard level and the coordinator level.)

Logic to mark the RCA unhealthy to decrease the thresholds (Node level)

  • If the max heap used by the OpenSearch process is above 90% for a minute. Since the RCA runs at a 5-second interval, we will keep a sliding window of heapUsed values covering one minute.
  • And the heap-based task cancellations are less than 3%. (Rate limiters cap the number of cancellations: no more than 10% of all successful tasks can be cancelled, at both the shard level and the coordinator level.)
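
To make the two node-level rules above concrete, here is a rough Python sketch (the RCA framework itself is Java; Python is used only for brevity). The class and method names are hypothetical. It assumes "for a minute" means every sample in the one-minute window satisfies the condition, and that the cancellation percentage is supplied by the caller, since the issue does not say how it is computed.

```python
from collections import deque

RCA_INTERVAL_SECONDS = 5       # from the issue: the RCA runs every 5 seconds
WINDOW_SECONDS = 60            # from the issue: one-minute sliding window
INCREASE_HEAP_CEILING = 85.0   # heap must stay below this to suggest an increase
DECREASE_HEAP_FLOOR = 90.0     # heap must stay above this to suggest a decrease
CANCELLATION_PERCENT = 3.0     # cancellation percentage used by both rules


class HeapCancellationNodeRca:
    """Hypothetical node-level check; not the actual RCA framework API."""

    def __init__(self):
        # 60 s / 5 s = 12 samples; deque drops the oldest sample automatically.
        self._samples = deque(maxlen=WINDOW_SECONDS // RCA_INTERVAL_SECONDS)

    def record(self, heap_used_percent):
        """Store one heapUsed reading (as a percentage of max heap)."""
        self._samples.append(heap_used_percent)

    def evaluate(self, cancellation_percent):
        """Return 'increase', 'decrease', or 'healthy'."""
        if len(self._samples) < self._samples.maxlen:
            return "healthy"  # assumption: no verdict until the window is full
        if (max(self._samples) < INCREASE_HEAP_CEILING
                and cancellation_percent > CANCELLATION_PERCENT):
            return "increase"
        if (min(self._samples) > DECREASE_HEAP_FLOOR
                and cancellation_percent < CANCELLATION_PERCENT):
            return "decrease"
        return "healthy"
```

With a full window of readings at 70% heap, `evaluate(5.0)` suggests increasing the thresholds, because the node is cancelling tasks while heap is not under pressure. Returning "healthy" while the window is still filling is a design choice of this sketch, not something the issue states.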

Marking the cluster level RCAs unhealthy

We will mark the cluster-level RCA as unhealthy if any node in the cluster has had an unhealthy node-level RCA for an hour, with a cool-off period of one day.
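
One possible reading of this rule, sketched in Python with hypothetical names: a node must be continuously unhealthy for an hour before the cluster-level RCA is flagged, and once flagged, the cluster-level RCA is not flagged again for a day. Both interpretations are assumptions; the issue does not define either term more precisely. Timestamps are plain seconds passed in by the caller.

```python
UNHEALTHY_DURATION_SECONDS = 60 * 60   # from the issue: unhealthy for an hour
COOL_OFF_SECONDS = 24 * 60 * 60        # from the issue: one-day cool-off


class ClusterRca:
    """Hypothetical cluster-level aggregation; not the actual RCA framework API."""

    def __init__(self):
        self._unhealthy_since = {}    # node id -> start of current unhealthy streak
        self._last_flagged_at = None  # when the cluster RCA was last marked unhealthy

    def update(self, node_id, node_unhealthy, now):
        """Record the latest node-level verdict for one node."""
        if node_unhealthy:
            # Keep the original start time if the streak is already running.
            self._unhealthy_since.setdefault(node_id, now)
        else:
            # Assumption: a single healthy reading resets the streak.
            self._unhealthy_since.pop(node_id, None)

    def is_unhealthy(self, now):
        """Return True if the cluster-level RCA should be marked unhealthy now."""
        if (self._last_flagged_at is not None
                and now - self._last_flagged_at < COOL_OFF_SECONDS):
            return False  # still cooling off from the previous flag
        for since in self._unhealthy_since.values():
            if now - since >= UNHEALTHY_DURATION_SECONDS:
                self._last_flagged_at = now
                return True
        return False
```

The cool-off keeps the recommendation from being re-emitted every 5 seconds while a node remains unhealthy, which gives an applied threshold change time to take effect before the next one.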

Adjusted SBP Settings

  • search_backpressure.search_task.total_heap_percent_threshold
  • search_backpressure.search_task.heap_percent_threshold
  • search_backpressure.search_task.heap_variance
  • search_backpressure.search_task.heap_moving_average_window_size
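
As an illustration of the single-multiplier idea, the sketch below scales the two percent-style thresholds from the list above. The function name and the starting values are made up for the example. It assumes the thresholds are expressed as fractions capped at 1.0. The issue does not say how the multiplier would apply to heap_variance or heap_moving_average_window_size, so the sketch leaves them out.

```python
def scale_thresholds(current, multiplier):
    """Return new threshold values scaled by one multiplier (hypothetical helper).

    `current` maps setting names to fractional thresholds in (0, 1].
    A multiplier above 1 relaxes the thresholds; below 1 tightens them.
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    # Cap at 1.0 so a scaled threshold never exceeds 100% of heap.
    return {name: min(value * multiplier, 1.0) for name, value in current.items()}


# Illustrative starting values, not necessarily the OpenSearch defaults.
current = {
    "search_backpressure.search_task.total_heap_percent_threshold": 0.05,
    "search_backpressure.search_task.heap_percent_threshold": 0.02,
}
relaxed = scale_thresholds(current, 2.0)    # recommendation to increase
tightened = scale_thresholds(current, 0.5)  # recommendation to decrease
```

Because both thresholds move by the same factor, their ratio is preserved, which is the simplification the single-multiplier proposal relies on.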

What alternatives have you considered?
The RCA framework is already in place; it runs as a sidecar and does not share the OpenSearch process's resources. The alternative would have been to place this logic in OpenSearch itself, but that could create resource scarcity and degrade the performance of an OpenSearch process that is already under duress.

Do you have any additional context?
Add any other context or screenshots about the feature request here.

@kaushalmahi12 kaushalmahi12 added enhancement New feature or request untriaged labels Jul 14, 2023
@kaushalmahi12 kaushalmahi12 changed the title [FEATURE] Add autotune feature for heap based task cancellations by SearchBackpressureService [FEATURE] Add support to recommend threshold tuning for heap based task cancellations by SearchBackpressureService Jul 14, 2023
@dblock dblock removed the untriaged label Jun 6, 2024
@dblock
Member

dblock commented Jun 6, 2024

[Triage -- attendees 1, 2, 3, 4, 5, 6, 7]

Looks like a legit feature request, thanks for opening it.
