-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-41165: Inconsistent handling of expanded datasetRefs #895
Conversation
Codecov ReportAll modified lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #895 +/- ##
==========================================
- Coverage 87.60% 87.60% -0.01%
==========================================
Files 272 272
Lines 36600 36600
Branches 7641 7641
==========================================
- Hits 32064 32062 -2
- Misses 3350 3351 +1
- Partials 1186 1187 +1
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I fully agree with your diagnosis and fix.
As for testing, one thing you don't have to worry about is Registry
implementations other than SqlRegistry
; one of the decisions coming out of the middleware meeting a couple of weeks ago is to collapse those back into a single final/concrete Registry
. But there are indeed lots of ways a registry (or a butler, for that matter) can be configured and testing all of the various permutations is unsurprisingly difficult.
But I don't see any tests right now that both set up a full Butler
and enable the ObsCore stuff - test_obscore.py
only makes a Registry
and then operates directly on that. We're going to want to address that at some point, because we're also going to move in the direction of putting more public interfaces directly on Butler
(rather than Butler.registry
) and hence that's what we'll want to test in places like test_obscore.py
. But doing that on this ticket is a heavy lift. I'm okay with punting test coverage for this bugfix to another ticket (e.g. one mostly about migrating registry-focused tests to butler-focused tests). @andy-slac, any thoughts?
Thinking a bit more, I wonder if this says that something down in the ObsCore implementation should also be able to expand its own data IDs; if it's not too inefficient, that would make it easier to test and (more importantly) make overall Registry behavior much less dependent on whether ObsCore is enabled. Curious to hear @andy-slac 's thoughts on that, too. |
Doing that at obscore plugin level will likely be inefficient, and probably non-trivial (obscore plugin would have to depend on other things at registry level). Still there will be an opportunity to revisit obscore implementation when we do big refactoring, we can think about it then. For now I think we can merge this ticket and remember that there may be potential optimizations. |
Regarding |
Thanks for the tip about From a contract point of view, ObsCore requiring expanded IDs in its API was not the problem -- the implementation of |
The previous documentation implied that _importDatasets requires expanded data IDs in a specific context, when implementations can and do assume expanded IDs for a variety of reasons. In particular, the introduction of optional ObsCore makes Registry a tagged class, so two instances of the same Registry implementation can now behave very differently. Removing the misleading context will prevent developers from trying to make implementation- or instance-dependent optimizations when calling the Registry interface, improving the stability of future code.
We definitely do. It's just difficult right now because the set of permutations we test now (registries with different database backends, butlers with different kinds of datastores) is incomplete, already quite large, and about to be quite obsolete (since we are collapsing registry the registry hierarchy, expanding polymorphism in the butler hierarchy, and dropping InMemoryDatastore). We have a lot of work to do on the tests. |
Should I open a ticket for that (to be blocked by the big redesigns)? |
f10f94d
to
ab40e2c
Compare
Previously, transfer_from did not impose requirements on dataset refs, but set a flag on _importDatasets that (per its docs) should only have been set if the caller could guarantee expanded datasetRefs. This commit keeps the open-ended precondition on transfer_from, and allows internal expansion instead. This should be a no-op if the input datasetRefs are already expanded. The expand flag is left at its default instead of being explicitly set to True to give _importDatasets ultimate control over expansion (e.g., in case it no longer requires expanded IDs in the future).
ab40e2c
to
2eae16f
Compare
Yes, I think that makes sense, especially since it seems there's a particular combination of operations and configurations that are quite important to Prompt Processing that got missed in our test matrix so far, and it'd be good to get your best guess at that combination in writing somewhere. |
Done as DM-41234, though I'm not sure I fully answered the question about "particular combination". |
This PR fixes a bug where some
Registry
objects require expanded datasetRefs for transfers and others do not, but refs would not be expanded when necessary because the transfer code tried to be clever and second-guess the spec. The formal spec is unchanged, but the code is now more robust about following it.I have not tried to improve
daf_butler
's test coverage to catch this bug, as the intersecting variables (SqlRegistry
vs. other implementations, ObsCore vs. non-ObsCore registries, expanded vs. non-expanded data IDs) are too complex for me to follow. Any pointers on where this should be covered would be appreciated.Checklist
doc/changes