
DM-45489: Add more test coverage for RemoteButler queries #1042

Merged: 6 commits into main from tickets/DM-45489, Jul 31, 2024
Conversation

@dhirving (Contributor) commented Jul 30, 2024

Fixed an issue where the "Hybrid" query results object for queryDataIds was testing the DirectButler implementation in some cases where it should have been testing the RemoteButler implementation. This revealed some small missing exception handling and minor discrepancies between the two implementations, which are now fixed.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

Commits

  • Match the DirectButler behavior in RemoteButler queryDataIds by throwing an ArgumentError if collections is specified without datasets.
  • The query shims for RemoteButler cannot determine whether there are any errors until they go to resolve the query, so resolve the query in a test to allow this instance of MissingDatasetTypeError to be raised lazily.
  • Matching the DirectButler behavior, return a more user-friendly error when an empty string is passed to order_by.
  • Update unit tests related to interpreting identifiers in queries to account for differences between the old and new query systems.
  • Tweak the registry query unit tests to handle minor differences between the old and new query systems.
  • Fix an issue where the "Hybrid" query results object for queryDataIds was testing the DirectButler implementation in some cases where it should have been testing the RemoteButler implementation.
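The first commit's argument check can be illustrated with a minimal, self-contained sketch. Note that `validate_query_args` and this `ArgumentError` class are hypothetical stand-ins, not the real butler API; the actual check lives inside RemoteButler's queryDataIds shim.

```python
class ArgumentError(Exception):
    """Hypothetical stand-in for the butler registry's ArgumentError."""


def validate_query_args(datasets=None, collections=None):
    # Matching DirectButler: collections only make sense as a constraint on
    # a dataset search, so reject them when no datasets were requested.
    if collections is not None and datasets is None:
        raise ArgumentError(
            "'collections' cannot be used without 'datasets' in queryDataIds"
        )
```

The same pattern (validate eagerly, fail with a specific exception type) is what lets both implementations behave identically from the caller's point of view.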

codecov bot commented Jul 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.38%. Comparing base (88f4f2d) to head (66e3efc).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1042   +/-   ##
=======================================
  Coverage   89.37%   89.38%           
=======================================
  Files         359      359           
  Lines       45630    45642   +12     
  Branches     9349     9357    +8     
=======================================
+ Hits        40783    40795   +12     
  Misses       3521     3521           
  Partials     1326     1326           


@dhirving dhirving marked this pull request as ready for review July 30, 2024 22:40
Comment on lines +2676 to +2679
with self.assertRaises(NotImplementedError):
    registry.queryDataIds(
        ["instrument", "detector", "exposure"], datasets="bias", collections=coll_list
    ).count()
dhirving (Contributor, Author):

In the new query system this is returning zero rows, which I don't think is right; it should probably be returning the data IDs for all the bias datasets instead?

Member:

I think the behavior is correct, and the comment above this test is wrong: I don't see anything that inserts exposure dimension records, so the query is going to return no rows because it's got a join to that empty table. Note that the dimensions of bias are just {instrument, detector}; the match to exposure is going to be a temporal join between the CALIBRATION collection's validity range and exposure.timespan.

So I suspect there's supposed to be an exposure record inserted with a timespan that overlaps the validity range of exactly one bias for one or two detectors in this test, as that'd make the test much more interesting for the new system (and it wouldn't affect the behavior of the old query system, which is probably why it wasn't done). And if we do that, the new query system should return rows for that exposure and whatever detectors have a bias with the matching timespan. But it's quite possible that behavior is already covered in other tests of the new query system, and hence there's no need to re-check it here.

Note that we don't really care about the case where the exposure's timespan overlaps the validity ranges of multiple biases; this query might still be sound, but it'd make any find-first search for the bias fail, and hence it represents a practically useless validity range.
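The temporal-join semantics described above can be sketched with a toy model. This `Timespan` is an illustrative stand-in, not the butler's real Timespan class: an exposure joins to a bias when its timespan overlaps that bias's validity range, and overlapping more than one validity range would break any find-first search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timespan:
    """Toy half-open [begin, end) interval; the real butler uses astropy times."""

    begin: int
    end: int

    def overlaps(self, other: "Timespan") -> bool:
        # Two half-open intervals overlap iff each begins before the other ends.
        return self.begin < other.end and other.begin < self.end


# An exposure whose timespan overlaps exactly one bias validity range joins
# to a single bias, which is the well-behaved case for find-first lookups.
exposure = Timespan(10, 14)
bias_validity_ranges = [Timespan(0, 15), Timespan(15, 30)]
matches = [t for t in bias_validity_ranges if exposure.overlaps(t)]
```

Here `matches` contains only the first validity range; shifting the exposure to straddle the boundary at 15 would produce two matches, the "practically useless validity range" case the comment warns about.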

@dhirving dhirving merged commit c108883 into main Jul 31, 2024
18 checks passed
@dhirving dhirving deleted the tickets/DM-45489 branch July 31, 2024 17:35