Avoid cloning dependencies full repo if possible #318

Open
martinetd opened this issue Aug 19, 2023 · 1 comment

martinetd commented Aug 19, 2023

hi!

cryptpad has recently removed its bower dependency and I tried node2nix on it, and it looks like it generates things as expected. Thanks for this great tool!

One of the dependencies is specified as a git repo (cryptpad/drawio-npm#npm, with the full URL in the lock file), and node2nix does a full git clone for it.

That repo is fairly big (1.3GB download according to the git clone log message), while cloning with --depth=1 reduces the download to 56MB (or 50MB for a tar.gz archive). So most of that download isn't useful, as we basically just clone to compute the sha256 hash of the dependency (nix-hash on the dir after cleaning up .git etc.)?

In practice, without a package lock we could just clone with --depth=1 to get the latest commit, as I think that's how the package.json format works; if there's a lock, we need to get a specific commit. There are a couple of ways to make this work:

  • If we assume GitHub here (owner/repo shorthand is automatically expanded to GitHub), then one can download https://github.com/owner/repo/archive/[commit].tar.gz -- this is probably the easiest to implement, but it's a bit sad to only support GitHub...
  • I wasn't able to make git clone work with a specific commit, but fetch will happily work: mkdir <dir> && git -C <dir> init && git -C <dir> fetch --depth=1 <repo> <commit> && git -C <dir> checkout FETCH_HEAD will "clone" an individual commit. It also accepts symbolic refs: <commit> can be a full rev, but refs/pull/4779/head would also work.
    I remembered this only working with advertised commits (e.g. the tip of a branch or a tag), but that doesn't seem to be the case anymore, and it also worked for me on other repos I tried on other platforms (cgit, GitLab). In case of failure we probably want to fall back to a full clone... A bit annoying.
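
For illustration, the two options above could be sketched in shell roughly like this. The function names github_archive_url and fetch_commit are made up for this example and are not part of node2nix; the fallback path is a plain full clone, which is an assumption about what "fall back" should mean here.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not node2nix code.

# Option 1: build the GitHub archive URL for owner/repo at a given commit.
github_archive_url() {
    # $1 = owner/repo, $2 = commit
    printf 'https://github.com/%s/archive/%s.tar.gz\n' "$1" "$2"
}

# Option 2: fetch a single commit (or ref) into a fresh directory with
# --depth=1, falling back to a full clone if the shallow fetch fails.
fetch_commit() {
    # $1 = repo URL, $2 = commit or ref, $3 = destination directory
    repo=$1 rev=$2 dir=$3
    mkdir -p "$dir" &&
    git -C "$dir" init --quiet &&
    if git -C "$dir" fetch --quiet --depth=1 "$repo" "$rev"; then
        git -C "$dir" checkout --quiet FETCH_HEAD
    else
        # Fallback: full clone, then check out the requested revision.
        # Note: this only works for revisions a normal clone brings in
        # (e.g. not refs/pull/... refs).
        rm -rf "$dir" &&
        git clone --quiet "$repo" "$dir" &&
        git -C "$dir" checkout --quiet "$rev"
    fi
}
```

Usage would be something like fetch_commit <repo> <commit> <dir>, after which <dir> holds a checkout with a single commit of history when the shallow fetch succeeded.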

Hmm, now that I'm looking, there are also submodules to cater for; the latter approach's commands produce a git directory, so git submodule update --init --recursive --depth=1 should also work just fine, and only fetch a single commit per submodule.
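
As a rough sketch of the submodule step, assuming the checkout was already created by the fetch approach above (fetch_submodules is a hypothetical name, not something node2nix provides):

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not node2nix code.

# Initialise all submodules of an existing checkout, shallowly and recursively.
fetch_submodules() {
    # $1 = directory containing a git checkout
    git -C "$1" submodule update --init --recursive --depth=1
}
```

If a submodule's recorded commit cannot be fetched shallowly from its server, this will fail, so the same "fall back to a full clone" caveat presumably applies here too.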

What do you think?

I've had a quick look at the code and I don't see any huge blocker as long as we can untangle which commit needs to be fetched, but I'm really not familiar with how node works, so a sanity check would be appreciated.

This isn't a priority as everything works, but if I could avoid downloading over 1GB of data every time I want to update cryptpad, it'd be appreciated in the long run :)

martinetd (author) commented

Never mind, I ran into a bug after this (it complains that cache mode is 'only-if-cached' but no cached response is available, which looks like a problem with the package.json/lock file not being up to date?) and was too lazy to look further -- I switched to buildNpmPackage as that seems to work out, so I won't be looking into this further.
(... Just have to look at how to configure this thing again now...)

Thanks again / sorry for the noise; the idea of this issue is still valid so I'm leaving this open, but I no longer have any incentive to actually do it now so won't be following up.
Feel free to close anytime if not interested.
