Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A faster InstalledFirst policy (remake of #55) #65

Closed
wants to merge 29 commits into from

Conversation

johntyree
Copy link
Contributor

This is a remake of the second half of #55.

This PR introduces a Policy that provides "installed first" behavior to the dependency solver, as well as a few helper classes.

Benchmarks are in issue #52.

It is implemented as a giant priority queue where the priorities split into three bands, in order from highest to lowest:

  1. Installed packages
    • Packages which are already installed on the system.
  2. Required packages
    • Packages which have been explicitly requested.
    • This is lower priority than the installed packages because we don't want to unnecessarily upgrade things.
  3. Everything else.

Within each group, packages are ordered descending by version, such that newer packages are tried first.

I'm currently using the notion that version objects are always comparable to define an ordering over all packages and assign priorities based on that, rather than grouping packages by name. In addition to being fast, easy to write, and easy to understand, it leads to a few interesting properties:

Things it does

  • it suggests packages with large version numbers earlier, i.e. curl-7.3.0 comes well before Biggus-0.7.0.
  • This was very easy to implement, but there's no good reason to do it this way. At best it doesn't hurt us. There are plenty of ways to make this smarter.
  • it suggests packages interleaved with each other, i.e. pep8 1.4.6 then scipy 0.15.0 then pep8 0.6.1,
  • high priority packages are suggested repeatedly until all options are exhausted,
  • not crash on Crash with simple >= request #42,
  • it can totally ignore the clause literals and still be quite fast.

To see this play out, there's a PolicyLogger class that is currently wrapping InstalledFirstPolicy and keeping track of things.

Things it doesn't do

  • Anything smart by examining clause instances themselves,
  • Anything clever with package metadata (i.e. should we suggest packages with few or many dependencies first?)

If this is suitable, I would think that this closes #42 and #52.

@johntyree
Copy link
Contributor Author

I'll address comments from #55 here as well.

@johntyree
Copy link
Contributor Author

From #55 (comment)

If I understand correctly the logic, we have one priority queue divided in 3 non-overlapping bands, whose boundaries are defined w/ REQUIRED and CURRENT. What's the rationale to use this instead of 3 priority queues ?

If it is about getting the priority of an arbitrary pkg id more easily (one instead of 3 function calls), then I think it would be more readable to have a separate class for the low-level priority handling, in which case I don't mind how it is implemented internally (3 bands vs 3 priority queues).

This is indeed to make the code simpler and faster since we are pushing and removing from the priority queue tens of thousands of times for a reasonably sized problem. Given our recent discussion from #62, I will think about how to cheaply split this up to support different behaviors at runtime.

@johntyree
Copy link
Contributor Author

@cournape I found it difficult to split the functionality apart cleanly, but hopefully this is something like what you had in mind 💭

@johntyree
Copy link
Contributor Author

So #68 updates the original InstalledFirstPolicy in a way that it is competitive with this one and out performs it on certain problems.

Since this PR preserves that policy as UndeterminedClausePolicy, I still think this PR is relevant and should be merged. It's also not clear whether either policy behaves as outlined in #62 yet, but I believe the priority queue approach is easier to understand in that respect.

@@ -82,12 +140,16 @@ def _group_packages_by_name(self, decision_set):

def _handle_empty_decision_set(self, assignments, clauses):
# TODO inefficient and verbose

# The assignments with value None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are here because it took me quite a while to understand what this policy was doing. For someone who knows the code base, they do not add much value. I'll remove them now.

assigned_ids = set(assignments.keys()) - unassigned_ids

# Magnitude is literal, sign is Truthiness
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@johntyree
Copy link
Contributor Author

@cournape ready for re-review.

@johntyree
Copy link
Contributor Author

This PR is getting dismantled and broken into at least two, maybe more.

@johntyree johntyree closed this Dec 10, 2015
@cournape cournape modified the milestone: 0.1 Dec 16, 2015
@cournape cournape deleted the enh/faster-installed-first branch January 7, 2016 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash with simple >= request
2 participants