Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: mute collectors based on exception count #56

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

kartva
Copy link
Contributor

@kartva kartva commented Nov 10, 2023

Is your feature request related to a problem? Please provide an existing Issue # , or describe.
As discussed at opensearch-project/performance-analyzer#549

Describe the solution you are proposing
We use the exception counts maintained by StatsCollector to dynamically mute collectors based on the number of exceptions they throw in the last minute of sampling.

Additional context
Currently in the proof of concept stage. If this is successful, further avenues of investigation may be:

  • tracking the number of bytes written by each collector and using that data to dynamically control collectors
  • collecting more nuanced performance information about collectors such as their CPU consumption

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

// since other collectors run every 5 sec, we can obtain an estimate
// of the exceptions they've thrown in the last ~10 runs or so.
int exceptionCount = StatsCollector.instance().exceptionCount(collector.getErrorMetric());
if (!collector.inProgress() || exceptionCount > 5) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do exceptionCount > 5 in the POC, but in the final implementation, this threshold should be configurable.

@@ -21,9 +21,7 @@
*/
public class ServiceMetrics {
public static SampleAggregator READER_METRICS_AGGREGATOR,
STAT_METRICS_AGGREGATOR = new SampleAggregator(StatMetrics.values()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reasons we removed those aggregators? Are those just for testing?

Copy link
Contributor Author

@kartva kartva Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aggregators are unused in the entire Performance Analyzer codebase. They seem to be leftover from an earlier refactoring possibly involving RCA.

Do you think this is a change that would benefit from being its own PR?

and use them in deciding whether to mute a collector
caused by Mockito

ref: https://stackoverflow.com/questions/68113065/mockito-inaccessible-object-exception-while-creating-a-mock-object

To quote:
"This can happen if Mockito requires reflective access to non-public parts in a Java module.
you can get around this by explicitly allowing access via --add-opens in your java call"
invokePrivilegedAndLogError already exists for that purpose

consider transitioning current uses to invokePrivilegedAndLogError

this is important as the exception counting mechanism relies on
exceptions not being swallowed earlier in call stack
by adding repetition options to runBehavior
@kartva
Copy link
Contributor Author

kartva commented Nov 29, 2023

What sort of configuration setting do you think would be appropriate for exceptionMuteLimit? Would this have a default value (like 5) or be off by default? I imagine having it off by default will be sensible.

@kartva kartva requested a review from ansjcy November 29, 2023 22:59
@kartva
Copy link
Contributor Author

kartva commented Nov 29, 2023

summary of changes so far:

  • added exception counting mechanism to mute collectors
  • changed invokePrivileged to not catch errors, since exception counting relies on the exceptions not being caught and logged earlier in the chain. invokePrivilegedAndLogError is more appropriate for when exceptions being caught is desired.
    • migrated some uses of invokePrivileged to invokePrivilegedAndLogError
  • added testing for muting behavior
    • this required adding a flag to allow Mockito access to certain parts of java.base (see commit)
    • test passes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants