Support training for latency based anomalies specifically for perf-anomaly #409

shashank10456 · 2024-09-10T19:55:22Z

Explain what this PR does.

This PR adds anomaly detection support for fields stored as valuesDoublesSketch. It adds aggregations and post-aggregations that run natively on Druid; the post-aggregations convert the sketches to numeric values inside Druid.
This lets us run anomaly detection on inputs backed by sketches (https://datasketches.apache.org/), for example latency-based anomalies.
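
For illustration, here is a rough sketch of the query pieces involved, written as plain dictionaries in Druid's native JSON form rather than through this PR's helpers. The function name and the names `latency_sketch` and `latency_quantile` are hypothetical; only the column name valuesDoublesSketch, the fraction 0.90, and the value 64 come from this PR, and treating 64 as the sketch parameter `k` is an assumption.

```python
import json


def sketch_quantile_query_parts(sketch_column, fraction, k=128):
    """Build the aggregation and post-aggregation entries of a Druid native query.

    Hypothetical helper for illustration; it is not part of numalogic.
    """
    agg_name = "latency_sketch"  # hypothetical intermediate name
    # Merges the stored sketches for each time bucket into one sketch.
    aggregation = {
        "type": "quantilesDoublesSketch",
        "name": agg_name,
        "fieldName": sketch_column,
        "k": k,
    }
    # Converts the merged sketch into a single numeric quantile inside Druid.
    post_aggregation = {
        "type": "quantilesDoublesSketchToQuantile",
        "name": "latency_quantile",  # hypothetical output name
        "field": {"type": "fieldAccess", "fieldName": agg_name},
        "fraction": fraction,
    }
    return {
        "aggregations": [aggregation],
        "postAggregations": [post_aggregation],
    }


parts = sketch_quantile_query_parts("valuesDoublesSketch", fraction=0.90, k=64)
print(json.dumps(parts, indent=2))
```

The post-aggregation refers to the aggregation by name, so the value that reaches the model is a plain number per time bucket instead of a serialized sketch.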

I have also made a few changes to the Dockerfile and added a patch for Numalogic 0.9.1 to avoid CVE issues. This matters for the perf-anomaly team because it lets them avoid moving to Numaflow 1.2.1 and updating all the UDFs and UDSinks. They save a lot of time by upgrading only the ML vertices.

Signed-off-by: shashank10456 <[email protected]>

codecov · 2024-09-10T19:59:20Z

Codecov Report

Attention: Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.17%. Comparing base (f29f771) to head (883a32c).
Report is 20 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| numalogic/connectors/_config.py | 90.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #409      +/-   ##
==========================================
- Coverage   93.07%   92.17%   -0.90%     
==========================================
  Files          97       97              
  Lines        4492     4781     +289     
  Branches      387      430      +43     
==========================================
+ Hits         4181     4407     +226     
- Misses        231      276      +45     
- Partials       80       98      +18


qhuai · 2024-09-10T20:59:20Z

Please replace "Explain what this PR does." with the real description & purpose of this PR.

qhuai · 2024-09-30T01:12:38Z

numalogic/connectors/_config.py

+        if not self.post_aggregations:
+            self.post_aggregations = {
+                "p90": _post_agg.QuantilesDoublesSketchToQuantile(
+                    output_name="agg_out", field=postaggregator.Field("agg_out"), fraction=0.90


Since this is a library feature, the percentile should be configurable.
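
One possible shape for this, as a minimal sketch only: a small config object that takes the fractions as input and emits one post-aggregation entry per fraction. The class `QuantilePostAggConf` and its fields are hypothetical and are not existing numalogic API; the entries are plain dictionaries in Druid's native JSON form rather than the `_post_agg` helpers used in the diff.

```python
from dataclasses import dataclass


@dataclass
class QuantilePostAggConf:
    """Hypothetical config: which quantiles to extract from a sketch aggregation."""

    sketch_field: str = "agg_out"  # name of the sketch aggregation, as in the diff
    fractions: tuple[float, ...] = (0.90,)  # default mirrors the hard-coded p90

    def __post_init__(self):
        # A quantile fraction outside [0, 1] has no meaning, so fail early.
        for fraction in self.fractions:
            if not 0.0 <= fraction <= 1.0:
                raise ValueError(f"fraction must be within [0, 1], got {fraction}")

    def to_post_aggregations(self) -> dict[str, dict]:
        """Return one post-aggregation per fraction, keyed like 'p90', 'p99'."""
        post_aggs = {}
        for fraction in self.fractions:
            name = f"p{fraction * 100:g}"
            post_aggs[name] = {
                "type": "quantilesDoublesSketchToQuantile",
                "name": name,
                "field": {"type": "fieldAccess", "fieldName": self.sketch_field},
                "fraction": fraction,
            }
        return post_aggs


conf = QuantilePostAggConf(fractions=(0.90, 0.99))
print(list(conf.to_post_aggregations()))
```

Keeping 0.90 as the default would preserve the behaviour in this PR while letting other use cases ask for p95 or p99.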

qhuai · 2024-09-30T01:14:39Z

numalogic/connectors/_config.py


        if not self.aggregations:
-            self.aggregations = {"count": doublesum("count")}
+            self.aggregations = {
+                "agg_out": _agg.quantiles_doubles_sketch("valuesDoublesSketch", "agg0", 64)


What does this value 64 imply? Does it need to be configurable for different latency anomaly use cases?
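
If the 64 here is the `k` parameter of Druid's quantilesDoublesSketch aggregator (an assumption; the signature of `_agg.quantiles_doubles_sketch` is not shown in this diff), then it trades accuracy against sketch size: a larger `k` gives more accurate quantiles but larger sketches. As I read the Druid DataSketches documentation, `k` must be a power of 2 from 2 to 32768. If it becomes configurable, a guard along these lines might help; the function name is hypothetical.

```python
def validate_sketch_k(k: int) -> int:
    """Return k if it is a power of 2 in [2, 32768], else raise ValueError.

    Hypothetical helper; the bounds are taken from the Druid DataSketches
    quantiles documentation and should be checked against the Druid version in use.
    """
    if isinstance(k, bool) or not isinstance(k, int):
        raise TypeError(f"k must be an int, got {type(k).__name__}")
    # A positive power of 2 has exactly one bit set, so k & (k - 1) is 0.
    if not 2 <= k <= 32768 or k & (k - 1) != 0:
        raise ValueError(f"k must be a power of 2 between 2 and 32768, got {k}")
    return k


print(validate_sketch_k(64))
```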

qhuai · 2024-09-30T01:23:30Z

Dockerfile

@@ -3,9 +3,9 @@
 ####################################################################################################

 ARG PYTHON_VERSION=3.11
-FROM python:${PYTHON_VERSION}-slim-bookworm AS builder
+FROM python:${PYTHON_VERSION}-bookworm AS builder


Here -slim-bookworm is replaced with -bookworm, but the same -slim-bookworm at line 31 remains unchanged. Why the difference?

ssrigiri1 and others added 4 commits May 14, 2024 00:19

Support training for latency based anomalies specifically for perf-an…

2045115

…omaly

Delete ML_comp_10hrs.ipynb

7b42e2f

Signed-off-by: shashank10456 <[email protected]>

Support training for latency based anomalies specifically for perf-an…

bb07663

…omaly

Cleanup docker file

883a32c

shashank10456 mentioned this pull request Sep 12, 2024

Feat: Support Anomaly Detection for fields using valuesDoublesSketch. #411

Open

shashank10456 changed the title ~~Support trainer perf anomaly 4~~ Support training for latency based anomalies specifically for perf-anomaly Sep 12, 2024

qhuai reviewed Sep 30, 2024
