-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-47325: Add API for parsing butler dataset URIs (butler and ivo) #1113
base: main
Are you sure you want to change the base?
Conversation
9dd5098
to
57d6ec3
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1113 +/- ##
==========================================
+ Coverage 89.37% 89.38% +0.01%
==========================================
Files 362 362
Lines 48312 48384 +72
Branches 5872 5879 +7
==========================================
+ Hits 43179 43250 +71
Misses 3718 3718
- Partials 1415 1416 +1 ☔ View full report in Codecov by Sentry. |
57d6ec3
to
19d4c15
Compare
python/lsst/daf/butler/_butler.py
Outdated
return label, dataset_id | ||
|
||
@classmethod | ||
def get_dataset_from_uri(cls, uri: str) -> DatasetRef | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhirving I was thinking that this should take a LabeledButlerFactory
as an optional parameter (so that a service like vo-cutouts can call it directly with the URI) but do you have any thoughts on how to pass in the token? In the general case the caller won't know which butler the URI is referring to. Can I assume that **kwargs
should be forwarded as factory.create_butler(label=label, **kwargs)
and assume the token would be shared? Or should there be a dict indexed by label that has the additional parameters that the service can set up once and pass in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your idea of a config object that could be set up by the caller and passed through.
One other idea would be a dependency injection pattern. Maybe declare a protocol like
class ButlerFactoryProtocol(Protocol):
create(label: str) -> Butler
# (or just `Callable[[str], Butler]
which get_dataset_from_uri
could take as a parameter. Then LabeledButlerFactory
could grow a method bind(access_token: str) -> ButlerFactoryProtocol
to create a factory with the access token already bound in. That would let callers do whatever they want to customize the instance creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment the access token is in fact shared since it comes from Gafaelfawr and just represents the user's identity. Though no guarantees it will stay they way if we end up needing to delegate authentication between IDF and USDF, or some other unfortunate thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to think through what you said above. For:
my_factory = factory.bind(access_token=token)
ref = Butler.get_dataset_from_uri(dataset_uri, my_factory)
doesn't this mean that the access_token is required to be the same for every butler label?
Don't we need:
my_factory = factory.bind("dp02": {"access_token": token}, "dp02-direct": {"access_token": None})
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one access token per HTTP request -- Gafaelfawr generates it to identify the user. It is not associated with the repository in any way. So at least at the moment there's only one access token. If you pass an access token to LabeledButlerFactory
for a DirectButler label it just ignores it.
The advantage to the dependency injection pattern is that this code wouldn't have to know anything at all about authentication or where Butlers come from -- it makes that the caller's responsibility. This code just knows that if it has a label, the caller will work out how to get it the matching Butler. So if things get more complicated later, this interface doesn't have to change. (One example might be caching the Butler if someone wants to call get_dataset_from_uri
multiple times in the same request -- though that might be better solved by wrapping this function in a separate object.)
Your config object idea would work well too, allowing us to add more options later if LabeledButlerFactory
gets more complicated. It's just a little less flexible and it couples this function tightly to LabeledButlerFactory
.
can include ``/`` and the leading ``/`` is always stripped. If the | ||
repository label starts with ``/`` then it must be doubled up. e.g., | ||
|
||
ivo://rubin//repo/main/82d79caa-0823-4300-9874-67b737367ee0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gpdf do you have any thoughts as to what the netloc field for our IVO IDs might look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following discussion with @timj today, the suggestion is that the netloc be treated leniently -- it will have a standard value (e.g., "rubin", still to be confirmed with the IVOA), and then the path will be:
butler/(butler-identity)/(uuid)
If the butler-identity begins with a "/", e.g., "/repo/main", then an extra "/" is required in the pathname, e.g.,
ivo://rubin/butler//repo/main/82d79caa-0823-4300-9874-67b737367ee0
To be explicit the butler-identity is everything between the first and last slashes in the path.
335acdb
to
1a3d455
Compare
1a3d455
to
957f200
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responded to request for feedback on ivo: URIs.
can include ``/`` and the leading ``/`` is always stripped. If the | ||
repository label starts with ``/`` then it must be doubled up. e.g., | ||
|
||
ivo://rubin//repo/main/82d79caa-0823-4300-9874-67b737367ee0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following discussion with @timj today, the suggestion is that the netloc be treated leniently -- it will have a standard value (e.g., "rubin", still to be confirmed with the IVOA), and then the path will be:
butler/(butler-identity)/(uuid)
If the butler-identity begins with a "/", e.g., "/repo/main", then an extra "/" is required in the pathname, e.g.,
ivo://rubin/butler//repo/main/82d79caa-0823-4300-9874-67b737367ee0
To be explicit the butler-identity is everything between the first and last slashes in the path.
Checklist
doc/changes
configs/old_dimensions