-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the lock file more merge-friendly #496
Comments
+1 Resolving the |
Workaround:
or some other similar innocuous package. Will this issue be resolved soon? Does a missing content-hash hurt security or not? |
Yarn's implementation of this is way less insane than I was expecting:
If the merge conflict resulted in a syntax error, it fails. Yarn's lockfile structure is designed to make this easier: it's flat, it's sorted alphabetically and node is okay with duplicated dependencies in the tree. |
is |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I think this is still a valid issue and should not be closed. |
I have written a stand-alone script as a stopgap. It only goes half the way, resolving the merge conflict for https://github.com/cjolowicz/scripts/blob/master/python/poetry-merge-lock.py Update: Please use poetry-merge-lock from PyPI instead (see comment below). This should allow you to merge without manual conflict resolution, in most cases. |
Here is a stand-alone tool which should handle most merge conflicts in the lock file: pip install --user --upgrade poetry-merge-lock This tool is in early development. If you're interested in trying it out, please let me know on its issue tracker if you encounter any problems. @sdispater, @finswimmer : Would you be interested in a PR to add this as a Poetry command? |
Hello @cjolowicz , your contribution is very welcome. I find this feature very useful. 👍 However, I cannot promise if and when this can be included. Including new features is decided by @sdispater . But please go on! fin swimmer |
as long as there is a line The repository I just tried this on, 5 MR were created. If it uses tox, it takes no human time and 5 CI runs. Multiple this by my 30 repos, and make it a daily thing to have to deal with and it becomes intractable. The solution cannot be in a new poetry command to resolve the lock, that would still require all the manual work. Beacuse all these MR are created at once, in parallel, we cannot teach dependabot to do it either, at the time of each MR they all come from the same spot on master so there is no conflict, its only when the first is accepted that we need to rebase the 2nd, and when it is accepted we need to rebase (twice) the 3rd, and so on. Can we not make the issuance of event without dependabot its troublesome. Ideas:
so perhaps if there was a --no-output-hash option to the |
@donbowman Doesn't Dependabot rebase your PRs automatically if they conflict? This is what happens for me. I only need to resolve conflicts when rebasing or cherry-picking my own commits, in which case poetry-merge-lock mostly works fine. I sometimes follow it by |
Currently when creating a lockfile, a metadata.content-hash field is generated, containing the hash of the sorted contents of pyproject.toml. However (see python-poetry#496) this causes merge conflicts, particularly when using a tool such as dependabot that makes multiple parallel branches with dependency changes. Add a --no-hash option to poetry update. This causes metadata.content-hash to be omitted. In turn poetry will later complain that the lock file may not be fresh, (although it is correct), and the user can run poetry lock to update it, after all the merges are complete. dependabot currently runs `poetry update <DEPENDENCY> --lock`, I would propose to incrporate this PR to poetry and then modify dependabot to have teh --no-hash option.
Currently when creating a lockfile, a metadata.content-hash field is generated, containing the hash of the sorted contents of pyproject.toml. However (see python-poetry#496) this causes merge conflicts, particularly when using a tool such as dependabot that makes multiple parallel branches with dependency changes. Add a --no-hash option to poetry update. This causes metadata.content-hash to be omitted. In turn poetry will later complain that the lock file may not be fresh, (although it is correct), and the user can run poetry lock to update it, after all the merges are complete. dependabot currently runs `poetry update <DEPENDENCY> --lock`, I would propose to incrporate this PR to poetry and then modify dependabot to have teh --no-hash option.
this is on gitlab. I added #2654 to poetry as a suggested solution. |
@donbowman I assumed you were referring to GitHub Dependabot. The bot recreates the PR (not a rebase, sorry for the sloppy terminology) when the changes conflict with the target branch. |
i see. i guess, inefficiently, i could create some webhook that when the first mr is accepted, it could somehow run the dependabot again to recreate all the remaining updates, and then repeat every hour or so until all are done. |
It looks like Dependabot actually does automatically rebase (well, rewrite-force-pushes) the PR by default, according to the documentation. Judging by the docs and that issue, it sounds like it does it with webhooks or another timely mechanism rather than waiting for the next run, though neither one is explicit on how quick the turnaround is. Maybe you're seeing merge conflicts triggered by |
content-hash conflicts even w/o dependabot. if 2 people change 2 branches, its in conflict. the hosted dependabot of github may in fact have some faster trigger. |
Currently when creating a lockfile, a metadata.content-hash field is generated, containing the hash of the sorted contents of pyproject.toml. However (see python-poetry#496) this causes merge conflicts, particularly when using a tool such as dependabot that makes multiple parallel branches with dependency changes. Add a --no-hash option to poetry update. This causes metadata.content-hash to be omitted. In turn poetry will later complain that the lock file may not be fresh, (although it is correct), and the user can run poetry lock to update it, after all the merges are complete. dependabot currently runs `poetry update <DEPENDENCY> --lock`, I would propose to incrporate this PR to poetry and then modify dependabot to have teh --no-hash option.
My 2 cents, if the [metadata]
content-hash = [
"astroid:03472c30eb2c53",
"flask:bb564576db6a918",
#...
] |
Here's an approach that has worked well for me and only uses git and (recent) Poetry: git restore --staged --worktree poetry.lock
poetry lock --no-update When rebasing a feature branch on main, this preserves pins from the main branch, and recomputes pins for your feature branch. You would then follow up with these commands to continue the rebase: git add poetry.lock
git rebase --continue |
Most the answers here are answering what to do on a merge conflict, but we should be addressing why we're having a merge conflict in the first place. It is not reasonable to have a merge conflict whenever two developers modify different dependencies. Having a conflict means you will need to realize that, merge master into your branch, recreate the lock file (which takes at least 3 minutes, because python...), re-push, re-run your CI (which could take a lot of time as well), etc. This is very bad UX. @lephuongbg's comment is a possible solution. Is it possible to look into this? Thank you a lot! |
@neersighted I understand the suggestion is quite dangerous, but as more and more people are reporting that the conflict issue is turning into a major pain point, I was hoping that some kind of workaround could be made possible (+1 on giving this a very clear name). Over the weekend I wanted to understand how other ecosystems (specifically Node.js) are solving the same problem (while keeping in mind that the metadata cost in Python is different). NPM stores the "self" package in the lock file as well (Yarn only does when using workspaces it seems): // package-lock.json
{
"name": "poetry-test",
"version": "1.0.0",
"lockfileVersion": 2,
"requires": true,
"packages": {
"": {
"name": "poetry-test",
"version": "1.0.0",
"license": "MIT",
"dependencies": {
"react": "^18.2.0"
}
}, Poetry instead stores a hash of
If we compare these two approaches, NPM's lockfile can be merged by git most of the time (and otherwise humans can merge it by hand) while Poetry is guaranteed to conflict when two commits modify the While it seems quite possible to render
Generally, a git-merged lock file is not guaranteed to be "correct" regarding to the That said, the correctness argument seems to be ignoring the fact that:
The thing that makes (1) a bit unique in Poetry is that That said, this should more or less also work if Poetry stored the Regarding (2) to me it seems that this hash is giving a false sense of correctness. The only true way to ensure things are correct is to run To summarize, I find it quite hard to defend the existence of the content-hash. (I personally believe that Poetry's UX would benefit if |
This is easy to assert from a non-maintainer standpoint, but we already pay a very high maintenance/support burden for other poorly designed/second-class features. I suspect you will not be able to find any consensus to merge another one.
You're not incorrect, but this is not the intended way to use Poetry/a primary workflow. In general what is intended is If we drop hashing and make the lock file self-descriptive, we'll greatly increase the maintenance burden for the lock file, but that may be a price worth paying. However, we make the common case much more risky/dangerous. People are going to blindly trust Dependabot/Git to update the lock file, and will be opening bugs against Poetry when they get unexpected results. I'm not sure that "make sure you run I'd still like to hear what some of the SMEs in this area think, but since we're now talking about a "no-hash" approach instead of trying to change the solver, I'll also ping @abn @finswimmer @sdispater for opinions. |
A final thought: we can never come up with the 'canonical' (what the solver would pick, given all information was available to it) solution to lock file merges, but maybe we can at least start validating the lock file makes sense? That is to say, we store the dependency tree, so we could at least validate that it still makes sense/the solution doesn't have internal conflicts. We might already do that, but it certainly doesn't happen in the parts of the installer I am familiar with... cc @radoering @dimbleby |
This command just checks for the correct hash. (That's why it is quite fast.)
I disagree. Even for large projects
More or less. Actually, each install contains a solver run for the target environment with the lockfile being the only available repository. If the lockfile is not consistent with the pyproject.toml you get some sort of a solver error many users don't understand. The only hint for the reason of these errors is the warning about an inconsistent lock file (based on the hash). Just a quick idea, but we could derive "solver error during install" -> "inconsistent lock file". (It's probably not that easy because there are probably a bunch of different exceptions we have to consider but it might work.) Anyway, it's less expensive to validate the lockfile for the current environment (during install) than to validate it in general. |
Right, this matches my current recollection. If it's a full solver run (and not abbreviated like I thought), then it's more robust than I realized. The missing piece is inferring that a solver error during install is likely caused by an inconsistent lock file; I wonder if re-trying with an automatic re-lock makes sense in that case? |
In general it is not possible safely to merge two independent changes to a lockfile without re-solving. One can construct examples in which an upgrade to A is possible, an upgrade to B is possible, but the upgraded A and B are incompatible with one another. So if some bot has proposed both upgrades then it would be an error to merge both, even if they are compatible so far as git can tell. In that sense, being merge-friendly would actually be an anti-feature!
Edit: I mostly withdraw this last comment. Looking back over some dependabot MRs: it is capable of making lockfile-only changes, but it also sometimes changes pyproject.toml - even when it didn't have to. Perhaps it would be a good idea to ask dependabot to behave as I had believed it behaved all along, ie have it not change pyproject.toml when the update that it is making is compatible with the existing requirements |
I wouldn't want to have an automatic re-lock. IMO, |
This has always been my general objection. However, I wonder if users might find it more desirable to use a "best-guess" (Git merge-based) solution when possible, and to bail out when an incompatible merge happened (since we in theory have the metadata to detect that/already run the solver). However, I just noticed a seeming non-sequitur from @Mogost:
I don't understand what you mean -- a conflicted |
I've been looking more carefully at my dependabot MRs to understand why sometimes dependabot updates pyproject.toml and sometimes it is content simply to update poetry.lock I think the difference is that So perhaps there's some code already in dependabot that is trying to be smart about this, but which could be taught about Maybe someone on this thread who finds this bothersome would like to raise an issue (or merge request) at dependabot to investigate (or fix) this. |
Even if you don’t want renovate to group dependency updates, you can still do things like ask it to only create one PR at a time, for example. This can reduce the burden of wasted CI time. No matter which way you look at it, it will still make your life easier in dealing with this poetry lock file problem. |
Well, it's a non-non-sequitur. Right now, if you want a completely automated solution to resolve merge conflicts with Poetry lockfiles, you have to tell dependency bots to "rebase" their PRs every time a PR touching the lockfile is merged. I'm putting "rebase" in quotes, because in the context of dependency bots like dependabot or Renovate it's not about git rebase, but rather about re-locking the lockfile. So, imagine, you have 4 dependabot PRs. You merge 1 PR, and trigger 3 PRs to re-lock and re-run the CI (not only Poetry, which might take an insignificant amount of time as compared to tests, but everything). Then you merge the 2nd one and the dance repeats. This is what drains CI-time, not a possibly failing Poetry call, which (if it indeed fails) will actually conserve CI time, because downstream jobs just won't be able to proceed. So I think your thinking goes into the right direction in terms of saving CI time. |
Maybe this is a matter of differing workflows, but as when I use automated tools there is one PR or they are batched, the prospect of "random" failures requiring human intervention during the install stage seems much more likely to waste CI time. |
Have Poetry developers looked at other lock file implementations to help with creating a more merge friendly format? What about taking a look at Elixir's |
The content-hash remains to be a big point of frustration. When three engineers add dependencies it's a constant race who gets their merge in first. The other two will have to rebase either once or twice to fix the |
The issue I'm most afraid over right now over how the If there was even an imperfect way poetry could decrease the surface of a conflict (and just having a built-in way to resolve In a tool that I otherwise really enjoy using, this is a painful pain point. |
since there is no content-hash change when pyproject.toml is unchanged, a "built-in way to resolve |
At the moment, I am the only developer managing dependencies with Poetry, and I actually don't know how to manage I feel Poetry is absolutely essential to use, but I don't know how to manage the |
Ah, I explained badly:
But as been said earlier in the discussion: Of course, to go as far as possible, any solution would need to look at both sides of the merge, and try to solve them together (like the NPM way someone described, or like the now-outdated "poetry-merge-file" package). And this is of course much harder to maintain. |
@neersighted Have you thought any more about this idea? To me, it sounds ideal if possible. (Sorry if this became spammy, but there's been so much back and forth in this issue that I felt a moments suggestion/thought may have been lost.) |
This is not just a dependabot issue: if you work on a long-lived branch where you add/remove/upgrade dependencies good luck with rebasing against external dependency upgrades (for instance, merged dependabot PRs)! |
This is more for myself, but having to rebase all the time, just accept the incoming rebase:
then re-lock all the lock files. This primarily assumes the development branch's PR is JUST for adding a new package. |
Issue
When two devs install dependencies on separate branches, it is very easy to end up merge-conflicted, in particular, the
metadata.content-hash
key often changes. It is very unclear how to resolve this manually, so I often delete the lockfile (or perhaps just that key) and rebuild it and, basically, hope that it comes out the same.It seems like in some scenarios that merge conflicts could be resolved automatically based on
pyproject.toml
. Yarn does this, for instance.The text was updated successfully, but these errors were encountered: