DM-38498: improve follow-up query support for QG generation use cases #876

TallJimbo · 2023-08-08T16:49:13Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2023-08-08T17:04:57Z

Codecov Report

Patch coverage: 88.61% and project coverage change: +0.01% 🎉

Comparison is base (7c9a229) 87.65% compared to head (fd2474a) 87.67%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #876      +/-   ##
==========================================
+ Coverage   87.65%   87.67%   +0.01%     
==========================================
  Files         272      272              
  Lines       36005    36107     +102     
  Branches     7529     7552      +23     
==========================================
+ Hits        31560    31656      +96     
  Misses       3270     3270              
- Partials     1175     1181       +6

Files Changed	Coverage Δ
python/lsst/daf/butler/core/persistenceContext.py	`59.25% <ø> (ø)`
python/lsst/daf/butler/core/dimensions/_records.py	`82.14% <50.00%> (-1.71%)`	⬇️
.../butler/registry/datasets/byDimensions/_storage.py	`91.83% <80.00%> (-0.54%)`	⬇️
...ython/lsst/daf/butler/registry/queries/_results.py	`89.72% <83.33%> (-0.22%)`	⬇️
python/lsst/daf/butler/registry/queries/_query.py	`79.25% <86.66%> (+0.82%)`	⬆️
...lsst/daf/butler/registry/queries/_query_backend.py	`92.45% <94.44%> (+0.14%)`	⬆️
...hon/lsst/daf/butler/core/dimensions/_coordinate.py	`88.73% <100.00%> (+0.28%)`	⬆️
.../daf/butler/registry/queries/_sql_query_backend.py	`86.53% <100.00%> (+0.13%)`	⬆️
python/lsst/daf/butler/registry/tests/_registry.py	`98.21% <100.00%> (+0.02%)`	⬆️

... and 2 files with indirect coverage changes


andy-slac

Looks good, just a few minor questions.

andy-slac · 2023-08-22T17:59:49Z

python/lsst/daf/butler/registry/datasets/byDimensions/_storage.py

                raise ConflictingDefinitionError(
                    f"Existing dataset type or run do not match new dataset: {row._asdict()}"
                )


This looks like unreachable code; all branches above raise exceptions.

andy-slac · 2023-08-22T18:00:42Z

python/lsst/daf/butler/registry/datasets/byDimensions/_storage.py

                raise ConflictingDefinitionError(
                    f"Existing dataset type and dataId does not match new dataset: {row._asdict()}"
                )


andy-slac · 2023-08-22T18:04:15Z

python/lsst/daf/butler/registry/queries/_query.py

+    def iter_data_ids_and_dataset_refs(
+        self, dataset_type: DatasetType, dimensions: DimensionGraph | None = None
+    ) -> Iterator[tuple[DataCoordinate, DatasetRef]]:
+        """Iterate over pairs of data IDs and dataset refs.


Doesn't a dataset ref include its data ID?

TallJimbo · 2023-08-22

It does, but these data IDs may have different dimensions from the dataset type. I'll point that out in the docs.
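As a rough illustration of why the pairs are still useful, here is a minimal sketch in which the query's data IDs carry more dimensions than the dataset type does. `FakeRef`, `iter_pairs`, and the `HypoCam` instrument are hypothetical stand-ins invented for this example; they are not the real `DatasetRef`, `DataCoordinate`, or `iter_data_ids_and_dataset_refs` implementations.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical stand-in for DatasetRef (not the real butler class)."""

    dataset_type: str
    data_id: dict[str, object]


def iter_pairs(
    query_data_ids: list[dict[str, object]],
    refs_by_key: dict[tuple[object, ...], FakeRef],
    dataset_dimensions: tuple[str, ...],
) -> Iterator[tuple[dict[str, object], FakeRef]]:
    """Yield (query data ID, matching ref) pairs."""
    for data_id in query_data_ids:
        # Project the wider query data ID onto the dataset type's dimensions.
        key = tuple(data_id[d] for d in dataset_dimensions)
        ref = refs_by_key.get(key)
        if ref is not None:
            yield data_id, ref


# A "bias"-like dataset identified only by instrument and detector.
bias_dims = ("instrument", "detector")
bias = FakeRef("bias", {"instrument": "HypoCam", "detector": 4})
refs = {("HypoCam", 4): bias}

# Query data IDs that additionally include an exposure dimension.
query_ids = [
    {"instrument": "HypoCam", "exposure": 100, "detector": 4},
    {"instrument": "HypoCam", "exposure": 101, "detector": 4},
]
pairs = list(iter_pairs(query_ids, refs, bias_dims))
```

Both query data IDs map to the same ref, and the ref's own data ID has no `exposure` key, so returning the pair keeps the information about which query row matched which dataset.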

andy-slac · 2023-08-22T18:06:16Z

python/lsst/daf/butler/registry/queries/_query.py

+
+        Returns
+        -------
+        pairs : `~collections.abc.Iterable` [ `tuple` [ `DataCoordinate`,


Iterator instead of Iterable?
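For context on the distinction being asked about, the short sketch below uses only the standard library: a generator function returns an object that is both an `Iterator` and an `Iterable`, while a list is only an `Iterable`. The function name and the placeholder strings are made up for this example.

```python
from collections.abc import Iterable, Iterator


def pairs_as_generator():
    """Generator function: the returned object is a single-pass Iterator."""
    yield ("data_id_a", "ref_a")
    yield ("data_id_b", "ref_b")


gen = pairs_as_generator()
as_list = [("data_id_a", "ref_a"), ("data_id_b", "ref_b")]

gen_is_iterator = isinstance(gen, Iterator)
gen_is_iterable = isinstance(gen, Iterable)
list_is_iterator = isinstance(as_list, Iterator)

# An iterator is exhausted after one pass; a list can be traversed again.
first_pass = list(gen)
second_pass = list(gen)
```

Documenting the return type as an iterator tells callers they get exactly one pass over the results, which matches the `Iterator[...]` annotation in the signature.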

andy-slac · 2023-08-22T18:16:09Z

python/lsst/daf/butler/registry/queries/_query.py

+                initial_join_max_columns=frozenset(self._relation.columns),
+                governor_constraints=self._governor_constraints,
+                spatial_joins=spatial_joins,
+            )


I'd need another day to understand the details in the code above, so I just trust it is OK. 🙂

Just had to debug some of these, and found the integer primary keys embedded in the old messages hard to use. Unfortunately I couldn't get rid of all of them: this code has no access to the general mapping from dataset type ID to dataset type name. But I think that's the least likely field to be causing the conflict here, so it's not a huge loss.

TallJimbo force-pushed the tickets/DM-38498 branch 2 times, most recently from c2ecaa9 to 9570f7c, on August 10, 2023 06:04

TallJimbo force-pushed the tickets/DM-38498 branch 6 times, most recently from 4d5a0bb to 1cf1130, on August 21, 2023 19:51

TallJimbo marked this pull request as ready for review August 21, 2023 21:08

andy-slac approved these changes Aug 22, 2023

Fix doc quotation and rewrap.

3a8a388

TallJimbo force-pushed the tickets/DM-38498 branch from aa95802 to 2c33bb0 on August 22, 2023 23:12

TallJimbo and others added 7 commits August 24, 2023 10:59

Add support for data ID follow-up queries for calibration datasets.

a1c3ca3

This is the butler functionality that will finally let us avoid the death-by-many-tiny-queries performance problem in QuantumGraph generation for (in particular) ISR.
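The difference between many tiny queries and a single follow-up query can be sketched with a toy SQLite schema. The tables, column names, and dataset IDs below are invented for illustration and are not the butler registry schema; real calibration lookups also match on validity ranges, which this sketch leaves out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE exposure_detector (exposure INTEGER, detector INTEGER);
    CREATE TABLE bias (detector INTEGER PRIMARY KEY, dataset_id TEXT);
    INSERT INTO exposure_detector VALUES (100, 1), (100, 2), (101, 1), (101, 2);
    INSERT INTO bias VALUES (1, 'bias-det1'), (2, 'bias-det2');
    """
)

# The initial query: all data IDs we need calibrations for.
data_ids = conn.execute(
    "SELECT exposure, detector FROM exposure_detector ORDER BY exposure, detector"
).fetchall()

# Many tiny queries: one database round trip per data ID.
one_by_one = []
for exposure, detector in data_ids:
    row = conn.execute(
        "SELECT dataset_id FROM bias WHERE detector = ?", (detector,)
    ).fetchone()
    one_by_one.append((exposure, detector, row[0]))

# Follow-up query: a single join returns every (data ID, dataset) pair.
joined = conn.execute(
    """
    SELECT e.exposure, e.detector, b.dataset_id
    FROM exposure_detector AS e
    JOIN bias AS b ON b.detector = e.detector
    ORDER BY e.exposure, e.detector
    """
).fetchall()
```

Both approaches produce the same rows, but the loop issues one query per data ID while the join issues one query in total, which is where the savings come from when there are many data IDs.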

Fix missing word in code comment.

edffe23

Relax requirement on dimensions in followup dataset queries.

1ea680d

This will allow reference catalog queries in QG generation to be vectorized as long as they use the common skypix system.

Add DataCoordinate.values_tuple() and optimizations based on it.

16e67b0

Cache dimension records when serializing

1e767ac

Make use of the serialization caches when calling `to_simple` on dimension records.
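The general idea of reusing serialized forms can be sketched as a simple memoization keyed on a record's identity. This is only an assumption about the shape of such a cache: `FakeRecord`, this free-standing `to_simple` function, and the dictionary cache are hypothetical and do not reflect how the butler's serialization caches are actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRecord:
    """Hypothetical stand-in for a dimension record."""

    element: str
    key: int
    name: str


def to_simple(record: FakeRecord, cache: dict[tuple[str, int], dict]) -> dict:
    """Return a plain-dict form of the record, reusing a cached copy if present."""
    cache_key = (record.element, record.key)
    cached = cache.get(cache_key)
    if cached is not None:
        return cached
    simple = {"element": record.element, "key": record.key, "name": record.name}
    cache[cache_key] = simple
    return simple


cache: dict[tuple[str, int], dict] = {}
record = FakeRecord("detector", 4, "R00_S11")
first = to_simple(record, cache)
second = to_simple(record, cache)
```

The second call returns the same object as the first without rebuilding it, which matters when the same record is attached to many data IDs being serialized together.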

Add changelog entry.

fd2474a

TallJimbo force-pushed the tickets/DM-38498 branch from 2c33bb0 to fd2474a on August 24, 2023 14:59

TallJimbo merged commit 1585a23 into main Aug 24, 2023
16 checks passed

TallJimbo deleted the tickets/DM-38498 branch August 24, 2023 15:20
