Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(alerts): Update Snuba queries to match events-stats more closely #77755

Merged
merged 18 commits into from
Sep 24, 2024

Conversation

ceorourke
Copy link
Member

@ceorourke ceorourke commented Sep 18, 2024

When a user creates an anomaly detection alert we need to query snuba for 28 days worth of historical data to send to Seer to calculate the anomalies. Originally (#74614) I'd tried to pull out the relevant parts of the events-stats endpoint to mimic the data we see populated in metric alert preview charts (but for a larger time period, and it's happening after the rule is saved so I can't use any of the request object stuff) but I think I missed some things, so this PR aims to make that data be the same.

Closes https://getsentry.atlassian.net/browse/ALRT-288 (hopefully)

@@ -42,6 +42,27 @@
from sentry.utils.snuba import MAX_FIELDS, SnubaTSResult


def get_query_columns(columns, rollup):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to be reused by anomaly detection

"""
serializer = SnubaTSResultSerializer(organization=organization, lookup=None, user=None)
Copy link
Member Author

@ceorourke ceorourke Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using the same serializer the events-stats endpoint uses and just pulling that data off to format into a list of TimeSeriesPoints for Seer's API. I clicked through every alert type and it always has the timestamp and count

data,
resolve_axis_column(query_columns[0]),
allow_partial_buckets=False,
zerofill_results=False,
Copy link
Member Author

@ceorourke ceorourke Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting strange results in tests with this set to True, and for our purposes I think it doesn't matter that much since we default to sending Seer a 0 if we don't find a count anyway

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "strange" I mean it was hitting this line and overwriting data with a count I had in a test as an empty array.

Copy link

codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 82.35294% with 12 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/seer/anomaly_detection/utils.py 78.43% 6 Missing and 5 partials ⚠️
...seer/anomaly_detection/get_historical_anomalies.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #77755      +/-   ##
==========================================
+ Coverage   78.09%   78.10%   +0.01%     
==========================================
  Files        6979     6988       +9     
  Lines      309710   310197     +487     
  Branches    50695    50744      +49     
==========================================
+ Hits       241857   242291     +434     
- Misses      61378    61420      +42     
- Partials     6475     6486      +11     

stats_period=None,
environments=environments,
)
snuba_query_string = get_snuba_query_string(snuba_query)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the key changes here - the front end constructs a stringified query based on snuba_query.query AND snuba_query.event_types. This adds a join to the table for things like errors count with the is:unresolved query, or when you're using the dropdown to select "errors", "default", or "errors OR default" event types

@ceorourke
Copy link
Member Author

The users experiencing errors query is selecting data as a different name but it's otherwise the same, I don't know if that makes a difference to the outcome?
events-stats:

SELECT (events._snuba_events.time AS _snuba_events.time), (uniq((events._snuba_events.tags[sentry:user] AS _snuba_events.tags[sentry:user])) AS _snuba_count_unique_user)

anomaly detection:
SELECT (events._snuba_events.time AS _snuba_events.time), (uniq((events._snuba_events.tags[sentry:user] AS _snuba_events.tags[sentry:user])) AS _snuba_count_unique_tags_sentry_user)

if dataset_label == "events":
# DATASET_OPTIONS expects the name 'errors'
dataset_label = "errors"
elif dataset_label == "generic_metrics":
# XXX: performance alerts in prod
dataset_label = "transactions"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if I need to be using the discover dataset here, it's hard to know since it's different locally

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do want to use the discover dataset - unfortunately this differs in prod vs. locally so it's confusing but locally we make this request:

api/0/organizations/sentry/events-stats/?interval=60m&project=1&query=event.type%3Atransaction&referrer=api.organization-event-stats&statsPeriod=9998m&yAxis=count%28%29

and in prod we make this request:

api/0/organizations/sentry/events-stats/?dataset=metricsEnhanced&interval=60m&project=1&query=event.type%3Atransaction&referrer=api.organization-event-stats&statsPeriod=9998m&yAxis=count%28%29

the events-stats endpoint gets the dataset label from the request here and falls back to using discover if nothing is passed (like in production).

In our case we're not getting the dataset label from the request but rather the snuba query object, so instead of it being "discover" it's "generic_metrics" in production and "transactions" in my local db.

All that to say for performance metrics alerts we need to use the discover dataset for both generic_metrics and transaction dataset labels

@ceorourke ceorourke marked this pull request as ready for review September 20, 2024 21:28
@ceorourke ceorourke requested a review from a team as a code owner September 20, 2024 21:28
@ceorourke ceorourke requested a review from a team September 20, 2024 21:54
Copy link
Contributor

@saponifi3d saponifi3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! I just had some nitpicks - may want to have someone with more insights into seer take another glance too.

src/sentry/seer/anomaly_detection/utils.py Outdated Show resolved Hide resolved
src/sentry/seer/anomaly_detection/utils.py Outdated Show resolved Hide resolved

if dataset == metrics_performance:
nested_data = data.data.get("data", [])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this is... data.data.data? 😅 is there anywhere we have control over any of those that we could rename the variable? i think adding context like what kind of data it is would help a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly no, this is the format of SnubaTSResult which is typed in the function signature.

src/sentry/seer/anomaly_detection/utils.py Outdated Show resolved Hide resolved
@ceorourke ceorourke merged commit 583a084 into master Sep 24, 2024
49 of 50 checks passed
@ceorourke ceorourke deleted the ceorourke/anomaly-detection-no-none-values branch September 24, 2024 16:29
0Calories pushed a commit that referenced this pull request Sep 25, 2024
…77755)

When a user creates an anomaly detection alert we need to query snuba
for 28 days worth of historical data to send to Seer to calculate the
anomalies. Originally (#74614)
I'd tried to pull out the relevant parts of the `events-stats` endpoint
to mimic the data we see populated in metric alert preview charts (but
for a larger time period, and it's happening after the rule is saved so
I can't use any of the `request` object stuff) but I think I missed some
things, so this PR aims to make that data be the same.

Closes https://getsentry.atlassian.net/browse/ALRT-288 (hopefully)
saponifi3d added a commit that referenced this pull request Sep 26, 2024
…tch events-stats more closely (#77755)"

I just want to test if ci is green if this PR is reverted - seems like
an issue there...

This reverts commit 583a084.
@github-actions github-actions bot locked and limited conversation to collaborators Oct 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants