DM-44875: Handle ambiguous calibration lookups on older postgres #1029

dhirving · 2024-06-25T22:17:01Z

For Postgres older than version 16, we now throw an error for find-first calibration searches instead of silently returning a potentially-incorrect result. This is similar to the behavior of the old query system.
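A minimal sketch of the kind of version gate described above, not the actual daf_butler code. The function names and the choice of `NotImplementedError` are hypothetical. The sketch assumes that PostgreSQL's `server_version_num` encodes the major version as `num // 10000` (true for PostgreSQL 10 and later) and that the caller already knows whether the query is a find-first calibration search.

```python
# Hypothetical names throughout; this is not the daf_butler implementation.
ANY_VALUE_MIN_MAJOR_VERSION = 16  # ANY_VALUE was added in PostgreSQL 16.


def postgres_major_version(server_version_num: int) -> int:
    """Extract the major version from PostgreSQL's ``server_version_num``.

    For PostgreSQL 10 and later the value is major * 10000 + minor,
    e.g. 150004 for 15.4 and 160001 for 16.1.
    """
    return server_version_num // 10000


def check_calibration_search_supported(
    server_version_num: int,
    is_calibration_search: bool,
    is_find_first_search: bool,
) -> None:
    """Raise instead of running a query whose result could be ambiguous."""
    if not (is_calibration_search and is_find_first_search):
        # Other query types are not affected by the missing function.
        return
    major = postgres_major_version(server_version_num)
    if major < ANY_VALUE_MIN_MAJOR_VERSION:
        raise NotImplementedError(
            "Find-first calibration searches need ANY_VALUE, which requires "
            f"PostgreSQL {ANY_VALUE_MIN_MAJOR_VERSION} or newer "
            f"(server is version {major})."
        )
```

Failing early like this trades availability for correctness: the query is refused outright rather than run without the ambiguity check.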

The query requires the ANY_VALUE aggregate function to generate the validity match count column used by postprocessing to check for ambiguous results. This function is not available on older Postgres.
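As an illustration of the postprocessing step mentioned above, the following sketch checks a per-row match count and rejects ambiguous results. It is an assumption-laden example, not the code in this PR: the column name `validity_match_count`, the exception class, and the use of plain dictionaries for rows are all hypothetical.

```python
from collections.abc import Iterable, Iterator, Mapping
from typing import Any

# Hypothetical column name for the count of matching validity ranges.
MATCH_COUNT_KEY = "validity_match_count"


class AmbiguousCalibrationLookupError(RuntimeError):
    """Raised when more than one validity range matched a lookup."""


def reject_ambiguous_rows(
    rows: Iterable[Mapping[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield rows with the count column removed, raising on ambiguity."""
    for row in rows:
        if row[MATCH_COUNT_KEY] > 1:
            raise AmbiguousCalibrationLookupError(
                f"{row[MATCH_COUNT_KEY]} validity ranges matched this lookup."
            )
        # The count is only bookkeeping, so strip it from the returned row.
        yield {k: v for k, v in row.items() if k != MATCH_COUNT_KEY}
```

Because the check is a generator, an ambiguous row only raises when iteration reaches it; a caller that needs all-or-nothing behavior would have to materialize the results first.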

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes
(if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

Fix an issue where using order_by in the new query system for a data ID query or dataset query would sometimes fail with the Postgres error "SELECT DISTINCT ON expressions must match initial ORDER BY expressions". This was occurring because the data ID de-duplication logic sometimes uses DISTINCT ON. Postgres requires that the leftmost ORDER BY expressions match the DISTINCT ON clause, and we were not enforcing that.
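One way to satisfy this constraint can be sketched as follows. This is only an illustration of the ordering rule, not necessarily how the fix is implemented: the helper name is hypothetical, and sort terms are simplified to bare column names with no ASC/DESC modifiers.

```python
def order_by_with_distinct_on_first(
    distinct_on: list[str], order_by: list[str]
) -> list[str]:
    """Build an ORDER BY list whose leftmost terms are the DISTINCT ON columns.

    Any requested sort column that is not already a DISTINCT ON column is
    appended afterwards, preserving the order in which it was requested.
    """
    leading = list(distinct_on)
    already_present = set(leading)
    trailing = [term for term in order_by if term not in already_present]
    return leading + trailing


# The caller asked to sort by timespan_begin first, which Postgres would
# reject when combined with DISTINCT ON (detector, exposure).
terms = order_by_with_distinct_on_first(
    ["detector", "exposure"], ["timespan_begin", "detector"]
)
print(", ".join(terms))
```

Reordering the terms like this changes the final row order, so a real implementation would likely need to reapply the caller's requested ordering in an outer query.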

codecov · 2024-06-25T22:32:11Z

Codecov Report

Attention: Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.

Project coverage is 89.39%. Comparing base (aadb42e) to head (7a73223).
Report is 23 commits behind head on main.

Files	Patch %	Lines
...hon/lsst/daf/butler/direct_query_driver/_driver.py	66.66%	3 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1029      +/-   ##
==========================================
+ Coverage   89.38%   89.39%   +0.01%     
==========================================
  Files         358      358              
  Lines       45473    45481       +8     
  Branches     9346     9347       +1     
==========================================
+ Hits        40648    40660      +12     
+ Misses       3523     3522       -1     
+ Partials     1302     1299       -3


dhirving · 2024-06-27T22:02:48Z

python/lsst/daf/butler/direct_query_driver/_driver.py

@@ -715,11 +717,11 @@ def apply_query_projection(
                        unique_keys.append(builder.joiner.fields[dataset_type]["collection_key"])
                    else:
                        derived_fields.append((dataset_type, "collection_key"))
-                elif dataset_field == "timespan" and plan.datasets[dataset_type].is_calibration_search:
+                elif dataset_field == "timespan" and is_calibration_search:


This dataset_field == "timespan" case was already not covered before I got here. I wanted to add a test but can't figure out what circumstances would trigger it, since the validity range comparison is handled in-DB, and DatasetRefs don't include the calibration collection timespan field.

TallJimbo · 2024-06-28

Right, this can't be tested until we add support for "general" results.

dhirving · 2024-06-28T16:38:44Z

python/lsst/daf/butler/direct_query_driver/_driver.py

                    # If we're doing a non-find-first query against a
                    # CALIBRATION collection, the timespan is also a unique
                    # key...
-                    if dataset_type == plan.find_first_dataset:
+                    if is_find_first_search:


Also, I realized last night that we need to add dataset_id to the unique keys for non-find-first searches. I'll do that today, hopefully before you review this for real.
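The unique-key rule discussed in this thread can be sketched roughly as below. This is a deliberately simplified assumption, not the logic in `_driver.py`: the function name and the string key names are hypothetical, and the real code also deals with fields such as `collection_key` that are ignored here.

```python
def unique_keys_for_dataset_search(
    dimension_keys: list[str],
    is_find_first_search: bool,
    is_calibration_search: bool,
) -> list[str]:
    """Return the columns assumed to identify one result row.

    A find-first search yields at most one dataset per data ID, so the
    dimension keys are enough. Otherwise the dataset ID is needed as well,
    and for a CALIBRATION collection the validity timespan too.
    """
    keys = list(dimension_keys)
    if not is_find_first_search:
        keys.append("dataset_id")
        if is_calibration_search:
            keys.append("timespan")
    return keys
```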

dhirving added 2 commits June 25, 2024 11:31

dhirving marked this pull request as ready for review June 27, 2024 22:16

dhirving marked this pull request as draft June 28, 2024 18:12

dhirving force-pushed the tickets/DM-44868 branch from c818712 to ac3a767 Compare July 2, 2024 17:21

Base automatically changed from tickets/DM-44868 to main July 2, 2024 17:59
