Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-47325: Add API for parsing butler dataset URIs (butler and ivo) #1113

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

timj
Copy link
Member

@timj timj commented Nov 1, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

Copy link

codecov bot commented Nov 1, 2024

Codecov Report

Attention: Patch coverage is 98.64865% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.38%. Comparing base (e75d1d5) to head (00b1254).

Files with missing lines Patch % Lines
python/lsst/daf/butler/_labeled_butler_factory.py 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1113      +/-   ##
==========================================
+ Coverage   89.37%   89.38%   +0.01%     
==========================================
  Files         362      362              
  Lines       48312    48384      +72     
  Branches     5872     5879       +7     
==========================================
+ Hits        43179    43250      +71     
  Misses       3718     3718              
- Partials     1415     1416       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

return label, dataset_id

@classmethod
def get_dataset_from_uri(cls, uri: str) -> DatasetRef | None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhirving I was thinking that this should take a LabeledButlerFactory as an optional parameter (so that a service like vo-cutouts can call it directly with the URI) but do you have any thoughts on how to pass in the token? In the general case the caller won't know which butler the URI is referring to. Can I assume that **kwargs should be forwarded as factory.create_butler(label=label, **kwargs) and assume the token would be shared? Or should there be a dict indexed by label that has the additional parameters that the service can set up once and pass in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your idea of a config object that could be set up by the caller and passed through.

One other idea would be a dependency injection pattern. Maybe declare a protocol like

class ButlerFactoryProtocol(Protocol):
    create(label: str) -> Butler

# (or just `Callable[[str], Butler]

which get_dataset_from_uri could take as a parameter. Then LabeledButlerFactory could grow a method bind(access_token: str) -> ButlerFactoryProtocol to create a factory with the access token already bound in. That would let callers do whatever they want to customize the instance creation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment the access token is in fact shared since it comes from Gafaelfawr and just represents the user's identity. Though no guarantees it will stay they way if we end up needing to delegate authentication between IDF and USDF, or some other unfortunate thing.

Copy link
Member Author

@timj timj Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to think through what you said above. For:

my_factory = factory.bind(access_token=token) 
ref = Butler.get_dataset_from_uri(dataset_uri, my_factory)

doesn't this mean that the access_token is required to be the same for every butler label?

Don't we need:

my_factory = factory.bind("dp02": {"access_token": token}, "dp02-direct": {"access_token": None})

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one access token per HTTP request -- Gafaelfawr generates it to identify the user. It is not associated with the repository in any way. So at least at the moment there's only one access token. If you pass an access token to LabeledButlerFactory for a DirectButler label it just ignores it.

The advantage to the dependency injection pattern is that this code wouldn't have to know anything at all about authentication or where Butlers come from -- it makes that the caller's responsibility. This code just knows that if it has a label, the caller will work out how to get it the matching Butler. So if things get more complicated later, this interface doesn't have to change. (One example might be caching the Butler if someone wants to call get_dataset_from_uri multiple times in the same request -- though that might be better solved by wrapping this function in a separate object.)

Your config object idea would work well too, allowing us to add more options later if LabeledButlerFactory gets more complicated. It's just a little less flexible and it couples this function tightly to LabeledButlerFactory.

can include ``/`` and the leading ``/`` is always stripped. If the
repository label starts with ``/`` then it must be doubled up. e.g.,

ivo://rubin//repo/main/82d79caa-0823-4300-9874-67b737367ee0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gpdf do you have any thoughts as to what the netloc field for our IVO IDs might look like?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following discussion with @timj today, the suggestion is that the netloc be treated leniently -- it will have a standard value (e.g., "rubin", still to be confirmed with the IVOA), and then the path will be:

butler/(butler-identity)/(uuid)

If the butler-identity begins with a "/", e.g., "/repo/main", then an extra "/" is required in the pathname, e.g.,

ivo://rubin/butler//repo/main/82d79caa-0823-4300-9874-67b737367ee0

To be explicit the butler-identity is everything between the first and last slashes in the path.

@timj timj force-pushed the tickets/DM-47325 branch 2 times, most recently from 335acdb to 1a3d455 Compare November 5, 2024 23:03
Copy link
Contributor

@gpdf gpdf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded to request for feedback on ivo: URIs.

can include ``/`` and the leading ``/`` is always stripped. If the
repository label starts with ``/`` then it must be doubled up. e.g.,

ivo://rubin//repo/main/82d79caa-0823-4300-9874-67b737367ee0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following discussion with @timj today, the suggestion is that the netloc be treated leniently -- it will have a standard value (e.g., "rubin", still to be confirmed with the IVOA), and then the path will be:

butler/(butler-identity)/(uuid)

If the butler-identity begins with a "/", e.g., "/repo/main", then an extra "/" is required in the pathname, e.g.,

ivo://rubin/butler//repo/main/82d79caa-0823-4300-9874-67b737367ee0

To be explicit the butler-identity is everything between the first and last slashes in the path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants