-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-38498: improve follow-up query support for QG generation use cases #876
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #876 +/- ##
==========================================
+ Coverage 87.65% 87.67% +0.01%
==========================================
Files 272 272
Lines 36005 36107 +102
Branches 7529 7552 +23
==========================================
+ Hits 31560 31656 +96
Misses 3270 3270
- Partials 1175 1181 +6
☔ View full report in Codecov by Sentry. |
c2ecaa9
to
9570f7c
Compare
4d5a0bb
to
1cf1130
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, few minor questions.
raise ConflictingDefinitionError( | ||
f"Existing dataset type or run do not match new dataset: {row._asdict()}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an unreachable code, all branches above raise exceptions.
raise ConflictingDefinitionError( | ||
f"Existing dataset type and dataId does not match new dataset: {row._asdict()}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too.
def iter_data_ids_and_dataset_refs( | ||
self, dataset_type: DatasetType, dimensions: DimensionGraph | None = None | ||
) -> Iterator[tuple[DataCoordinate, DatasetRef]]: | ||
"""Iterate over pairs of data IDs and dataset refs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't dataset ref include its data ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, but these data IDs may have different dimensions from the dataset type. I'll point that out in the docs.
|
||
Returns | ||
------- | ||
pairs : `~collections.abc.Iterable` [ `tuple` [ `DataCoordinate`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterator
instead of Iterable
?
initial_join_max_columns=frozenset(self._relation.columns), | ||
governor_constraints=self._governor_constraints, | ||
spatial_joins=spatial_joins, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd need another day to understand the details in the code above, so I just trust it is OK. 🙂
aa95802
to
2c33bb0
Compare
This is the butler functionality that will finally let us avoid the death-by-many-tiny-queries performance problem in QuantumGraph generation for (in particular) ISR.
This will allow reference catalog queries in QG generation to be vectorized as long as they use the common skypix system.
Just had to debug some of these, and found the integer primary keys embedded in the old messages hard to use. Unfortunately I couldn't get rid of all of them: this code has no access to the general mapping from dataset type ID to dataset type name. But I think that's the least likely field to be causing the conflict here, so it's not a huge loss.
Make use of the serialization caches when calling `to_simple` on dimension records.
2c33bb0
to
fd2474a
Compare
Checklist
doc/changes