DRS bulk requests #334

Open
briandoconnor opened this issue Oct 20, 2020 · 3 comments

Comments

@briandoconnor
Contributor

briandoconnor commented Oct 20, 2020

Goal

We want to have this merged into DRS 1.4 ahead of the fall 2023 Plenary.

Background

Some DRS implementers/users have requested the ability to make DRS requests in bulk for multiple DRS URIs. Examples include NHGRI AnVIL (Terra) and the use of Galaxy in that project, Gen3 (for multiple projects), and Velsera.

As of 5/22/23 we have the complete set of bulk endpoints for authorization information, DRS IDs, and DRS access methods. This PR includes neither pagination nor explicit pairing of passports to the output of a bulk response. See the PR for more info.

Feature Branch/PR

We opened PR #365.

@luke-c-sargent

Late to the party, but to fill in some Galaxy-centric details: there are AnVIL workspaces that contain tables with thousands of DRS URIs. Sometimes these tables carry associated metadata that Galaxy can use to pre-populate fields and delay resolution until the point of actual file acquisition; sometimes they do not. In the latter case, showing the user the nature of the data they are browsing (and not just a big GUID) requires thousands of individual HTTP connections to the resolver. Batch resolution seems like a win for everyone re: infrastructure strain and user experience in any circumstance where there is more than one DRS URI to resolve.

Bulk resolution could be as simple as a request body containing a list of DRS URIs. If there is some concern about this being non-conformant to the DRS spec (I vaguely recall this being mentioned when I brought it up in a call), perhaps implementing something like Google storage batching via multipart message types would allow the underlying system to operate on an individual basis while still providing the time / bandwidth savings. This might also be useful in situations where a system wants to aggregate requests from multiple users, with each part having its own auth token. (Example: AnVIL users can click a button to resolve DRS URIs to see their metadata; if there were tens of thousands of concurrent users making these requests, the underlying systems could bundle all of their requests into multipart packets with individual auth tokens, sort the results, and shave off precious UI lag seconds.)
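
To make the "body containing a list" idea concrete, here is a rough client-side sketch, not a proposal for the actual spec. It groups hostname-based DRS URIs by resolver host and splits each group into JSON bodies of bounded size. The field name `object_ids`, the `max_per_request` limit, and the example hosts are all assumptions made up for illustration; compact-identifier DRS URIs are deliberately not handled.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(drs_uris):
    """Group hostname-based DRS URIs (drs://<host>/<id>) by resolver host."""
    groups = defaultdict(list)
    for uri in drs_uris:
        parsed = urlparse(uri)
        object_id = parsed.path.lstrip("/")
        if parsed.scheme != "drs" or not parsed.netloc or not object_id:
            raise ValueError(f"not a hostname-based DRS URI: {uri!r}")
        groups[parsed.netloc].append(object_id)
    return dict(groups)


def build_bulk_bodies(object_ids, max_per_request=1000):
    """Split IDs into JSON request bodies of at most max_per_request IDs.

    "object_ids" is a hypothetical field name, not taken from the DRS spec.
    """
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    return [
        json.dumps({"object_ids": object_ids[i:i + max_per_request]})
        for i in range(0, len(object_ids), max_per_request)
    ]
```

With something like this, a table of thousands of URIs pointing at one resolver turns into a handful of request bodies rather than one HTTP connection per URI. How each body is sent and how auth is attached is left open here.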

These are just some thoughts; of course the best folks to formulate the correct approach are you all! Thanks for keeping this in your dev thought processes, I'm keen to see how this progresses and happy to answer any questions re: the Galaxy on AnVIL side of things.

@ianfore ianfore mentioned this issue Apr 12, 2021
@jb-adams
Member

This was discussed at the June 2021 FASP hackathon; see Section 2a of the hackathon notes. The main outcome was a strawperson gist developed by @mbarkley, @ianfore, and others outlining the API endpoint and payload format to facilitate batch requests.

@briandoconnor
Contributor Author

See notes from the June 2021 FASP hackathon.
