Consistently use `get_requirement` to cache `Requirement` construction #12663
Conversation
Disclaimer: I am not a pip committer; please do not take any suggestions in my review as binding.
A few things worth highlighting:
- There's already a caching wrapper function for `Requirement` construction in the utilities; let's use it instead of creating another one. The `maxsize` argument should probably be bumped, however.¹

  pip/src/pip/_internal/utils/packaging.py, lines 37 to 45 in 9ef0fdf:

  ```python
  @functools.lru_cache(maxsize=512)
  def get_requirement(req_string: str) -> Requirement:
      """Construct a packaging.Requirement object with caching"""
      # Parsing requirement strings is expensive, and is also expected to happen
      # with a low diversity of different arguments (at least relative the number
      # constructed). This method adds a cache to requirement object creation to
      # minimize repeated parsing of the same string to construct equivalent
      # Requirement objects.
      return Requirement(req_string)
  ```

- Packaging 22.0+ has a hand-rolled parser (replacing pyparsing) which is apparently 95% faster. Will the caching remain beneficial once "Upgrade vendored packaging lib" #12300 lands? (To test this, the easiest way is likely to import `Requirement` from your pip dev environment and install the latest version of packaging, bypassing the vendored copy.) I'd wager that the caching is still worth it, but it would be good to know by how much after packaging is bumped.
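A rough sketch of that comparison, assuming a pip dev environment where `pip._internal` is importable and the latest packaging is installed (the requirement string is just an example, not from the PR):

```python
import timeit

from packaging.requirements import Requirement  # the non-vendored copy
from pip._internal.utils.packaging import get_requirement

REQ = "apache-airflow[all]>=2.8"  # example string for illustration

# Uncached: parse the string from scratch on every call.
uncached = timeit.timeit(lambda: Requirement(REQ), number=10_000)

# Cached: after the first miss, lru_cache returns the same object.
cached = timeit.timeit(lambda: get_requirement(REQ), number=10_000)

print(f"uncached: {uncached:.3f}s  cached: {cached:.3f}s")
print(get_requirement.cache_info())
```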
Footnotes

1. It is used globally, so there would likely be more cache pressure (though only sporadically). ↩
Awesome, I will pull that PR branch and reprofile on it and see if this still takes a significant amount of time.
I didn't realize that existed; I can also profile switching all `Requirement` construction over to `get_requirement`.
Force-pushed from 0c89f1d to 172b0a7
Force-pushed from 172b0a7 to 3ccb656
Fixed, made usage consistent across the pip codebase, and bumped `maxsize` while testing cache pressure.
Indeed, I found that parsing is significantly faster. With #12300 applied I still saw very small improvements for simple requirements that didn't involve backtracking, but instead of running dozens of times I had to run hundreds of times for the statistics to approach significance. So this PR is a big improvement if #12300 doesn't land, and still a significant improvement for pathological requirements if #12300 does land.
Force-pushed from ed2c8db to 34bcf02
Unfortunately, `Requirement` instance attributes are mutable, so mayhem is possible if a requirement is ever modified in place, but it appears that the codebase does not do that. It'll just have to be something to be careful of in future code reviews.
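To make the hazard concrete, a minimal illustration (exact attribute types depend on the packaging version in use): the cache hands every caller the same object, so an in-place change leaks to all of them.

```python
from pip._internal.utils.packaging import get_requirement

a = get_requirement("requests>=2.0")
b = get_requirement("requests>=2.0")
assert a is b  # the lru_cache returns the identical object

# Mutating "a" silently changes what every other caller sees:
a.extras.add("socks")
print(b.extras)  # {'socks'}, even though b was never touched
```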
Thanks @notatallshaw for the in-depth investigation here!
I made the same comment recently on another PR.
I'm not sure the two actions are actually that counterproductive. And likely #12300 reduces the constant factor of construction cost significantly more than the constant-factor increase that switching to a frozen data class might incur.
To be clear though, this PR doesn't introduce the assumption that `Requirement` objects are never mutated; the existing `get_requirement` cache already relied on that.
Fair point; as a static typing skeptic, I should be the one arguing the "consenting adults" position 🙂 Consider my reservations withdrawn.
Force-pushed from 34bcf02 to b453c38
Rebased now that #12300 has landed.
I'm mildly concerned about the number of caches we've thrown into the resolver context, especially since we have nothing to enforce mutation prevention. We have control over all the pieces here, so this is theoretically "safe", but we don't have anything to prevent us from mutating these objects. We should likely look into ensuring we don't mutate these objects once they've been entered into the cache.
Is there a preferred way you have to prevent mutation? I would be happy to do a follow-up PR that turned the objects I've recently touched into data classes with the appropriate flags set. The current resolution algorithm is O(n^2), and if caches aren't used then pip can end up calling things O(n^3) or worse. So while this is the way the resolution works, I foresee myself and others finding more places where caches improve the resolution.
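For reference, the "appropriate flags" would presumably be something like `dataclass(frozen=True)`; a hypothetical sketch, not pip's actual code:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FrozenReq:
    """Hypothetical immutable stand-in for a cached requirement."""
    name: str
    specifier: str = ""
    extras: frozenset = field(default_factory=frozenset)

req = FrozenReq("urllib3", ">2")
try:
    req.specifier = "<2"
except AttributeError as exc:  # FrozenInstanceError subclasses AttributeError
    print(f"mutation blocked: {exc}")
```

A side benefit: with `frozen=True` and the default `eq=True`, instances are hashable, which suits them for use as cache keys.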
None at the moment, hence the "we should look into" statement. To be clear, I am not concerned about this specific PR -- I'm on board with that; it's the broader pattern we have that I am concerned about. 😅
Any objections to landing this for 24.2 once conflicts are fixed? I can look into introducing an immutable wrapper/version of `Requirement` afterwards.
I'll fix the conflicts as soon as I get a chance, probably tomorrow evening. |
Force-pushed from 7304556 to 53729cc
Branch is updated and CI is green now.
Of course this has conflicts again, but under the understanding that I'll pursue #12806 after this PR (#12663 (comment)), I'm going to schedule this PR for 24.2. Please feel free to remove it from the milestone if you object.
Force-pushed from 53729cc to 4eefd36
Unblocked, CI is green.
I have been investigating other easy performance improvements when encountering long backtracking times, so I profiled `pip install --dry-run --ignore-installed apache-airflow[all]`, where all packages are cached, with #12657 already applied.

[Before call graph]

For this scenario, `_parseNoCache` takes almost 28% of the run time during profiling, partly because it is called many times while constructing `Requirement` in a loop.

Caching the construction of `Requirement`, the performance of `apache-airflow[all]` went from 174 seconds to 143 seconds, and `_parseNoCache` dropped to 3% of the run time during profiling:

[After call graph]
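The call graphs above came from profiling runs; a rough way to reproduce that kind of measurement (the exact invocation here is an assumption, not taken from the PR):

```python
import cProfile
import pstats

from pip._internal.cli.main import main as pip_main

# Profile a dry-run resolution; assumes the wheels are already in pip's cache.
with cProfile.Profile() as prof:
    pip_main(["install", "--dry-run", "--ignore-installed", "apache-airflow[all]"])

stats = pstats.Stats(prof)
stats.sort_stats("cumulative").print_stats(20)  # top 20 frames by cumulative time
```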
Further, testing performance of the highly pathological requirements `"urllib3>2" "boto3<1.20"`, total time went from 559 seconds to 117 seconds.

Testing very simple requirements such as `requests` and `pandas`, performance appeared to be about 1-3% better, but the results were within the margin of error. However, I did observe that for these requirements all `Requirement` objects were constructed at least twice: for `requests` the final `CacheInfo` was `CacheInfo(hits=11, misses=11, maxsize=None, currsize=11)`, and for `pandas` the final one was `CacheInfo(hits=86, misses=86, maxsize=None, currsize=86)`. So it makes sense that there would be a small saving even for requirements that don't backtrack.

I set the maxsize to 1024 preemptively to head off questions about unbounded memory usage. Out of the requirements I tested, only `apache-airflow[all]` exceeded the cache size, so I tested a few data points:
exceeded the cache size, so I tested a few data points:CacheInfo(hits=2176, misses=60088, maxsize=128, currsize=128)
CacheInfo(hits=2402, misses=59862, maxsize=256, currsize=256)
CacheInfo(hits=55408, misses=6856, maxsize=512, currsize=512)
CacheInfo(hits=56087, misses=6127, maxsize=1024, currsize=1024)
CacheInfo(hits=56307, misses=5906, maxsize=2048, currsize=2048)
CacheInfo(hits=59185, misses=3079, maxsize=None, currsize=3079)
It appears there's some threshold, which depends on the dependency graph, above which the hit rate goes up significantly once the maxsize exceeds it. Out of all the graphs I tested, 1024 seemed to be firmly above that number.
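The `CacheInfo` numbers above can be read straight off the `lru_cache` wrapper after a resolution run; a sketch (trying a different maxsize means editing the decorator in pip's source):

```python
from pip._internal.cli.main import main as pip_main
from pip._internal.utils.packaging import get_requirement

# Resolve without installing, then inspect the cache statistics.
pip_main(["install", "--dry-run", "--ignore-installed", "pandas"])
print(get_requirement.cache_info())
# prints something like CacheInfo(hits=86, misses=86, maxsize=1024, currsize=86)
```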
For `apache-airflow[all]`, the time difference between 1024 and None on my machine was 146 seconds vs. 143 seconds.

Happy to take a different approach if someone spots one.