DM-45993: Optimize DirectButlerCollections.query_info to avoid too many queries #1075
Sounds good. I'll rebase #1074 after this merges and add the new parameters.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##             main    #1075    +/-   ##
=======================================
  Coverage   89.65%   89.66%
=======================================
  Files         359      359
  Lines       46885    46925    +40
  Branches     9637     9650    +13
=======================================
+ Hits        42036    42073    +37
- Misses       3482     3485     +3
  Partials     1367     1367
☔ View full report in Codecov by Sentry.
Force-pushed from c88a4bb to bcba021.
@timj, I have added a new private method to
Looks great. Thanks for unifying the logic in query-dimension-record and data-ids as well.
It makes sense not to include the doc strings in queries by default.
… (DM-45993) This drastically reduces the number of queries that query_info needs to run.
Filter both dataset types and collections to query from collection summaries.
`query_info` now accepts an optional `include_doc` parameter to allow explicit loading of doc strings.
This allows more efficient filtering, with a per-dataset-type list of collection names returned.
Co-authored-by: Tim Jenness <[email protected]>
Force-pushed from 4b25396 to 76d34d6.
Direct butler reimplements the `query_info` method to avoid multiple queries, which makes it significantly faster. This patch also adds two optional parameters to `query_info` to allow further optimizations. There is still an inefficiency in the `fetch_summaries` method when the number of potential collections is very large (when collections are `*`). Further optimization would probably need more work (and I think that we'll have to optimize it, as the number of collections grows every day).

@dhirving, I added the same parameters to the remote butler interface, but they are not used for now. I know you are working on DM-46129; maybe you can add forwarding of those parameters to the remote server?
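For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the general idea behind the description above: replacing one query per collection with a single batched query, and loading doc strings only when asked. It is not the butler implementation. The table layout and the `make_db`, `info_one_by_one`, and `info_batched` helpers are hypothetical and use only the standard-library `sqlite3` module; only the `include_doc` parameter name is taken from this PR.

```python
import sqlite3

# Hypothetical schema, for illustration only (not the butler registry schema).
ROWS = [
    ("a/run1", "RUN", "first run"),
    ("a/run2", "RUN", "second run"),
    ("b/tagged", "TAGGED", "a tag"),
]


def make_db():
    """Create an in-memory database with a toy collection table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE collection (name TEXT PRIMARY KEY, type TEXT, doc TEXT)")
    conn.executemany("INSERT INTO collection VALUES (?, ?, ?)", ROWS)
    return conn


def info_one_by_one(conn, names):
    """Slow pattern: issue one SELECT per collection name."""
    result = []
    for name in names:
        row = conn.execute(
            "SELECT name, type, doc FROM collection WHERE name = ?", (name,)
        ).fetchone()
        if row is not None:
            result.append(dict(zip(("name", "type", "doc"), row)))
    return result


def info_batched(conn, names, include_doc=False):
    """Faster pattern: one SELECT for all names; doc is fetched only on request."""
    names = list(names)
    if not names:
        return []
    columns = ["name", "type"] + (["doc"] if include_doc else [])
    placeholders = ", ".join("?" for _ in names)
    sql = (
        f"SELECT {', '.join(columns)} FROM collection "
        f"WHERE name IN ({placeholders}) ORDER BY name"
    )
    return [dict(zip(columns, row)) for row in conn.execute(sql, names)]


if __name__ == "__main__":
    conn = make_db()
    print(info_batched(conn, ["a/run1", "b/tagged"]))
    print(info_batched(conn, ["a/run1", "b/tagged"], include_doc=True))
```

The batched version makes one round trip regardless of how many names are requested, and leaving the doc column out by default keeps the result small when callers do not need it.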
Checklist
doc/changes
configs/old_dimensions