-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-47375: Run query_all_datasets in a single request for RemoteButler #1114
Open
dhirving
wants to merge
10
commits into
main
Choose a base branch
from
tickets/DM-47375
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1114 +/- ##
==========================================
+ Coverage 89.41% 89.44% +0.02%
==========================================
Files 363 366 +3
Lines 48440 48598 +158
Branches 5879 5890 +11
==========================================
+ Hits 43315 43469 +154
- Misses 3716 3717 +1
- Partials 1409 1412 +3 ☔ View full report in Codecov by Sentry. |
dhirving
force-pushed
the
tickets/DM-47375
branch
3 times, most recently
from
November 5, 2024 22:54
03351be
to
79c520d
Compare
dhirving
force-pushed
the
tickets/DM-45873
branch
from
November 8, 2024 21:23
2cd8e3e
to
c077b29
Compare
Switch from the query_datasets convenience method to the advanced query system in query_all_datasets. This lets us get the results one page at a time, which will be needed to prevent memory exhaustion when running these queries on the server.
It turns out that the query-datasets CLI was not actually using dimension records, and it will simplify the implementation to not support this.
The backend for querying multiple dataset types will not support "order by", so restrict the CLI to match the implementation.
The upcoming implementation of query_all_datasets will not support order_by, so remove it. This requires modifying the query-datasets CLI to use the single dataset type query_datasets when order by needs to be supported.
In preparation for implementing query_all_datasets on the server, make the streaming response and timeout logic from the existing query handler re-usable.
After the refactor in the previous commit, this is somewhat independent of the query routes.
This will be shared by the RemoteButler query_all_datasets implementation in an upcoming commit.
This will be used in an upcoming commit to prevent excessive duplication of function parameters between implementations of query_all_datasets.
query_all_datasets can potentially involve hundreds or thousands of separate dataset queries. We don't want clients slamming the server with that many HTTP requests, so add a server-side endpoint that can handle these queries in a single request.
dhirving
force-pushed
the
tickets/DM-47375
branch
from
November 12, 2024 21:15
07afc52
to
32c647e
Compare
It turns out the QueryDatasets class is shared by multiple CLI scripts, some of which need dimension records included. So add back `with_dimension_records` to the internal implementation of query_all_datasets.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Added a server-side endpoint to handle
query_all_datasets
in a single request.query_all_datasets
can potentially involve hundreds or thousands of separate dataset queries, and we don't want clients slamming the server with that many HTTP requests.The new endpoint streams results in the same manner as the existing query endpoints used by
QueryDriver
, but it is separate from theQuery
/QueryDriver
framework.This is not yet used in the CLI tools and
Butler._query_all_datasets
is still private -- we need to deploy an updated server with this change before we can release the client side.--order-by
in the CLI tools is now restricted to queries for a single dataset type -- future implementations ofquery_all_datasets
may not support it.Checklist
doc/changes
configs/old_dimensions