Support for concurrent metadata fetching #138
Comments
Thank you for creating the issue; it gives us a place to continue the conversation. I look forward to learning from your deep experience and hearing which performance issues concern you most. If I understand correctly, Orogene does not need to do any backtracking. This means that once it has decided a version is needed, it will definitely need the information about that version's dependencies. Furthermore, it can process those dependencies in whatever order they show up. An NP-complete resolver also needs to handle cancellation, it may have considered Given cancellation, I don't see an elegant API where the dependency provider can push work into the resolver. If some response comes in and it's still relevant, everything is happy. If that response comes in and it's no longer relevant, what is the resolver supposed to do? It can drop it on the floor, but that seems wasteful. It can put it in a cache, in case the resolver needs to request it again. But in that case, the caching behavior can just as well live outside the resolver's code. I should make a fully worked-out example to figure out what the problems are, though. My fundamental question, I think, is: what API would you like a library like PubGrub to have?
The backtracking is kind of irrelevant: what needs an async API is "I ran into a new-to-the-resolver dependency name, and I need to get its list of available versions", although even this might vary by package manager. This is just the kind of API that Orogene would be able to use, because it works such that you get the list of versions, then you do a (synchronous) resolution based on that version list, which is now already in memory. Once you've requested the metadata with all the versions for one package name, you never need to repeat the process: you just memoize the metadata. So the async thing that PubGrub would need to have is something like:

```rust
trait DependencyProvider {
    // Whatever names a package (e.g. a registry package name).
    type Identifier;
    // Everything known about that package, including all of its versions.
    type Metadata;
    // Called once per new-to-the-resolver identifier; the result is memoized.
    async fn get_metadata(&self, id: Self::Identifier) -> Self::Metadata;
}
```
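To make the "you just memoize the metadata" part concrete, here is a minimal sketch using only the standard library. It is synchronous so that it runs without an async runtime; `MetadataSource`, `Memoized`, and `CountingSource` are hypothetical names invented for this illustration and are not part of PubGrub or Orogene.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical synchronous stand-in for the async trait sketched above.
pub trait MetadataSource {
    type Identifier: Clone + Eq + Hash;
    type Metadata: Clone;
    fn get_metadata(&self, id: &Self::Identifier) -> Self::Metadata;
}

/// Wraps a source and remembers every answer, so each identifier is
/// fetched at most once.
pub struct Memoized<S: MetadataSource> {
    pub source: S,
    cache: HashMap<S::Identifier, S::Metadata>,
}

impl<S: MetadataSource> Memoized<S> {
    pub fn new(source: S) -> Self {
        Self { source, cache: HashMap::new() }
    }

    pub fn get(&mut self, id: &S::Identifier) -> S::Metadata {
        if let Some(hit) = self.cache.get(id) {
            return hit.clone();
        }
        // Cache miss: ask the underlying source once and keep the result.
        let fetched = self.source.get_metadata(id);
        self.cache.insert(id.clone(), fetched.clone());
        fetched
    }
}

/// Toy source (an assumption for the demo): counts how often it is asked,
/// and invents a version list from the length of the package name.
pub struct CountingSource {
    pub calls: Cell<usize>,
}

impl MetadataSource for CountingSource {
    type Identifier = String;
    type Metadata = Vec<u32>;

    fn get_metadata(&self, id: &String) -> Vec<u32> {
        self.calls.set(self.calls.get() + 1);
        (1..=id.len() as u32).collect()
    }
}
```

Asking `Memoized` for the same identifier twice only reaches the underlying source once; an async version would have the same shape, with the cache lookup in front of the `.await`.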
That makes a lot of sense. I think the existing API fits in your model as the synchronous resolution algorithm, which is called inside some wrapper that does the asynchronous data retrieval and memoization. (And we should add an example of how to make a wrapper like that.) The place I think backtracking fits in... Let's say we have a complex package
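One possible shape for such a wrapper, sketched with a toy resolver rather than PubGrub's real API: the synchronous resolver only reads an in-memory cache and reports the first package it is missing, and the wrapper fetches that package and runs the resolver again. Every name here (`Outcome`, `resolve_in_memory`, `resolve_with_fetch`, `toy_index`) is hypothetical, and the "pick the highest version" rule is a simplification, not what PubGrub does.

```rust
use std::collections::HashMap;

/// All known versions of one package: (version, dependency names).
pub type Versions = Vec<(u32, Vec<String>)>;

pub enum Outcome {
    Done(HashMap<String, u32>),
    /// The resolver needs metadata that is not in the cache yet.
    Missing(String),
}

/// Purely synchronous toy resolver: it never fetches, it only reads `cache`.
pub fn resolve_in_memory(root: &str, cache: &HashMap<String, Versions>) -> Outcome {
    let mut picked = HashMap::new();
    let mut todo = vec![root.to_string()];
    while let Some(name) = todo.pop() {
        if picked.contains_key(&name) {
            continue;
        }
        let Some(versions) = cache.get(&name) else {
            return Outcome::Missing(name);
        };
        // Simplification: always take the highest version, never backtrack.
        if let Some((version, deps)) = versions.iter().max_by_key(|(v, _)| *v) {
            picked.insert(name, *version);
            todo.extend(deps.iter().cloned());
        }
    }
    Outcome::Done(picked)
}

/// Wrapper: owns the cache, fetches whatever the resolver reports missing,
/// then retries. Returns the solution and the number of fetches performed.
pub fn resolve_with_fetch(
    root: &str,
    mut fetch: impl FnMut(&str) -> Versions,
) -> (HashMap<String, u32>, usize) {
    let mut cache = HashMap::new();
    let mut fetches = 0;
    loop {
        match resolve_in_memory(root, &cache) {
            Outcome::Done(picked) => return (picked, fetches),
            Outcome::Missing(name) => {
                let versions = fetch(name.as_str());
                cache.insert(name, versions);
                fetches += 1;
            }
        }
    }
}

/// Toy registry used only for the demo.
pub fn toy_index(name: &str) -> Versions {
    match name {
        "app" => vec![(1, vec!["lib".to_string()])],
        "lib" => vec![(1, vec![]), (2, vec!["util".to_string()])],
        "util" => vec![(7, vec![])],
        _ => vec![],
    }
}
```

Because the cache lives in the wrapper, a rerun of the resolver never refetches anything, which is the same point made earlier in the thread about caching living outside the resolver's code. The cost of this naive version is that fetches happen one at a time.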
The way Orogene works is that it queues up multiple concurrent metadata requests based on the latest resolved version of their parent package. So if you have just resolved This is essentially the core of the Orogene resolver (which, again, doesn't backtrack, but I don't think that changes the game much as far as the key point of concurrent metadata fetches goes): https://github.com/orogene/orogene/blob/main/crates/node-maintainer/src/resolver.rs#L52-L276 It's a bit of a trick to make what is conceptually a sequential process parallelize the things that it can.
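As a rough illustration of the idea (not a transcription of the linked Orogene code), the sketch below resolves a package, then fetches the metadata of all its newly discovered dependencies at the same time. It uses scoped threads from the standard library instead of async tasks so it runs without a runtime; `Metadata`, `resolve_greedy`, and `toy_registry` are made-up names, and "latest version wins" is an assumed, non-backtracking placement rule.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};
use std::thread;

/// version -> names of that version's dependencies
pub type Metadata = BTreeMap<u32, Vec<String>>;

/// Non-backtracking sketch: pick the highest version of each package and
/// fetch every newly discovered dependency of a "wave" concurrently.
pub fn resolve_greedy<F>(root: &str, fetch: F) -> HashMap<String, u32>
where
    F: Fn(&str) -> Metadata + Sync,
{
    let mut resolved = HashMap::new();
    let mut seen: HashSet<String> = HashSet::from([root.to_string()]);
    let mut wave = vec![root.to_string()];

    while !wave.is_empty() {
        // One thread per pending name; all fetches of the wave overlap.
        let fetched: Vec<(String, Metadata)> = thread::scope(|s| {
            let handles: Vec<_> = wave
                .iter()
                .map(|name| {
                    let fetch = &fetch;
                    s.spawn(move || (name.clone(), fetch(name.as_str())))
                })
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("fetch thread panicked"))
                .collect()
        });

        // Sequential part: place each package, queue its unseen dependencies.
        let mut next = Vec::new();
        for (name, meta) in fetched {
            if let Some((version, deps)) = meta.iter().next_back() {
                resolved.insert(name, *version);
                for dep in deps {
                    if seen.insert(dep.clone()) {
                        next.push(dep.clone());
                    }
                }
            }
        }
        wave = next;
    }
    resolved
}

/// Toy registry used only for the demo.
pub fn toy_registry(name: &str) -> Metadata {
    match name {
        "app" => BTreeMap::from([(1, vec!["a".to_string(), "b".to_string()])]),
        "a" => BTreeMap::from([(1, vec![]), (2, vec!["c".to_string()])]),
        "b" => BTreeMap::from([(3, vec!["c".to_string()])]),
        "c" => BTreeMap::from([(5, vec![])]),
        _ => BTreeMap::new(),
    }
}
```

With the toy registry, `a` and `b` are fetched in the same wave, and `c` is requested only once even though both depend on it, because `seen` deduplicates names before they are queued.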
That is 100% possible with wrappers around PubGrub! I absolutely need to document, as an example (and in the book), how to put the pieces together.
Fetching metadata synchronously and blocking on its resolution is bound to be extremely slow. In Orogene, our resolver is able to optimistically parallelize dependency metadata fetches before actual placement, so you can have e.g. 50 different packages looking up version information while the resolver works on the data that's already fetched.
The perf benefits of this are enormous.
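A small sketch of what "the resolver works on the data that's already fetched" can look like, assuming a channel-based design that is only one of several possible ones: fetches run on background threads and send their results over a channel, and the resolver handles whichever result arrives first while the others are still in flight. `Manifest`, `resolve_streaming`, and `sample_registry` are hypothetical names, and a real implementation would cap the number of concurrent requests and handle failed fetches.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};
use std::sync::mpsc;
use std::thread;

/// version -> names of that version's dependencies
pub type Manifest = BTreeMap<u32, Vec<String>>;

pub fn resolve_streaming(root: &str, fetch: fn(&str) -> Manifest) -> HashMap<String, u32> {
    let (tx, rx) = mpsc::channel::<(String, Manifest)>();
    let mut seen: HashSet<String> = HashSet::from([root.to_string()]);
    let mut resolved = HashMap::new();
    let mut in_flight = 0usize;
    let mut pending = vec![root.to_string()];

    loop {
        // Kick off a background fetch for every newly discovered name.
        for name in pending.drain(..) {
            let tx = tx.clone();
            in_flight += 1;
            thread::spawn(move || {
                let manifest = fetch(name.as_str());
                let _ = tx.send((name, manifest));
            });
        }
        if in_flight == 0 {
            break;
        }
        // Handle whichever fetch finishes first; the rest keep running.
        // (Sketch only: a panicking fetch would leave this waiting forever.)
        let (name, manifest) = rx.recv().expect("`tx` is still alive");
        in_flight -= 1;
        if let Some((version, deps)) = manifest.iter().next_back() {
            resolved.insert(name, *version);
            for dep in deps {
                if seen.insert(dep.clone()) {
                    pending.push(dep.clone());
                }
            }
        }
    }
    resolved
}

/// Toy registry used only for the demo.
pub fn sample_registry(name: &str) -> Manifest {
    match name {
        "app" => BTreeMap::from([(1, vec!["a".to_string(), "b".to_string()])]),
        "a" => BTreeMap::from([(2, vec!["c".to_string()])]),
        "b" => BTreeMap::from([(3, vec!["c".to_string()])]),
        "c" => BTreeMap::from([(5, vec![])]),
        _ => BTreeMap::new(),
    }
}
```

The resolver loop itself stays sequential; only the waiting on the network is overlapped, which is where the time goes when every lookup is a round trip to a registry.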
Anyway just creating this issue on @Eh2406's request :)