-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-44875: Handle ambiguous calibration lookups on older postgres #1029
base: main
Are you sure you want to change the base?
Conversation
Fix an issue where using order_by in the new query system for a data ID query or dataset query would sometimes fail with the postgres error "SELECT DISTINCT ON expressions must match initial ORDER BY expressions". This was occurring because the data ID de-duplication logic sometimes uses DISTINCT ON. Postgres requires that the leftmost ORDER BY expressions match the DISTINCT ON clause, and we were not enforcing that.
For Postgres older than version 16, we now throw an error for find-first calibration searches instead of silently returning a potentially-incorrect result. This is similar to the behavior of the old query system. The query requires the ANY_VALUE aggregate function to generate the validity match count column used by postprocessing to check for ambiguous results. This function is not available on older Postgres.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1029 +/- ##
==========================================
+ Coverage 89.38% 89.39% +0.01%
==========================================
Files 358 358
Lines 45473 45481 +8
Branches 9346 9347 +1
==========================================
+ Hits 40648 40660 +12
+ Misses 3523 3522 -1
+ Partials 1302 1299 -3 ☔ View full report in Codecov by Sentry. |
@@ -715,11 +717,11 @@ def apply_query_projection( | |||
unique_keys.append(builder.joiner.fields[dataset_type]["collection_key"]) | |||
else: | |||
derived_fields.append((dataset_type, "collection_key")) | |||
elif dataset_field == "timespan" and plan.datasets[dataset_type].is_calibration_search: | |||
elif dataset_field == "timespan" and is_calibration_search: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dataset_field == "timespan"
case was already not covered before I got here. I wanted to add a test but can't figure out what circumstances would trigger it, since the validity range comparison is handled in-DB, and DatasetRefs don't include the calibration collection timespan field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this can't be tested until we add support for "general" results.
# If we're doing a non-find-first query against a | ||
# CALIBRATION collection, the timespan is also a unique | ||
# key... | ||
if dataset_type == plan.find_first_dataset: | ||
if is_find_first_search: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also I realized last night that we need to add dataset_id to unique keys for non-find-first search, I'll do that today before you review this for real hopefully
c818712
to
ac3a767
Compare
For Postgres older than version 16, we now throw an error for find-first calibration searches instead of silently returning a potentially-incorrect result. This is similar to the behavior of the old query system.
The query requires the ANY_VALUE aggregate function to generate the validity match count column used by postprocessing to check for ambiguous results. This function is not available on older Postgres.
Checklist
doc/changes
configs/old_dimensions