Git speedups for large repos #207

Open
gjost opened this issue Jan 24, 2022 · 1 comment

gjost commented Jan 24, 2022

Spend no more than 2 days on this.

The ddr-densho-1000 repo is very large, which causes usability problems even when it is checked out locally. In particular, git status takes a very long time to run.
The repo has a very large number of files and also a long history (~4000 commits).

IDEA cp ddr-densho-1000 ddr-densho-1000new, remove .git/, git init
where does the slow come from?
TODO research git performance (num objects, size, repo age)
TODO can we set git caching interval?
TODO profile git operations
does not correlate with the number of objects or the physical size of the repo
seems to be the length of the commit history
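
For the profiling TODOs above, a minimal sketch of how the three candidate factors (history length, file count, object count and size) could be measured next to a traced git status run. The scratch repo and the demo identity are placeholders and not part of the original notes; on a real collection the measurement commands would be run inside the existing checkout instead.

```shell
#!/bin/sh
set -e

# Hypothetical scratch repo standing in for a real collection.
REPO=$(mktemp -d)
TRACE=$(mktemp)
git init -q "$REPO"
cd "$REPO"
echo "sample" > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"

# Length of the commit history reachable from HEAD.
COMMITS=$(git rev-list --count HEAD)
echo "commits: $COMMITS"

# Number of tracked files in the index.
TRACKED=$(git ls-files | wc -l | tr -d ' ')
echo "tracked files: $TRACKED"

# Loose and packed object counts plus on-disk size.
git count-objects -v

# Write per-region timings for one git status run to a trace file
# (GIT_TRACE2_PERF accepts an absolute path).
GIT_TRACE2_PERF="$TRACE" git status > /dev/null
echo "trace written to $TRACE"
```

Comparing these numbers between the full repo and the fresh git init copy from the IDEA note would show which factor moves together with the git status time.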

Ways to improve git status performance (2012)
https://stackoverflow.com/questions/4994772/ways-to-improve-git-status-performance
10 GB repo on NFS on Linux. First time git status ~36min, subsequent 8min

Slow Git Performance (2021)
https://support.purestorage.com/Knowledge_Base/FlashBlade_KB/Slow_Git_Performance

OPTIONS

Shallow clone
git clone --depth=50 --no-single-branch COLLECTION
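
A minimal sketch of what the shallow clone does to history length, using a throwaway local repo rather than a real collection (the paths, the commit count, and the depth of 2 are all illustrative assumptions). Note that --depth is ignored for plain local-path clones, so the sketch uses a file:// URL.

```shell
#!/bin/sh
set -e

# Hypothetical source repo with five commits.
WORK=$(mktemp -d)
git init -q "$WORK/source"
cd "$WORK/source"
for i in 1 2 3 4 5; do
  echo "$i" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# Shallow clone: only the most recent commits are transferred.
cd "$WORK"
git clone -q --depth=2 --no-single-branch "file://$WORK/source" shallow
cd shallow
IS_SHALLOW=$(git rev-parse --is-shallow-repository)
DEPTH=$(git rev-list --count HEAD)
echo "shallow: $IS_SHALLOW, commits visible: $DEPTH"

# The rest of the history can be fetched later if it turns out to be needed.
git fetch -q --unshallow
FULL=$(git rev-list --count HEAD)
echo "commits after unshallow: $FULL"
```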

Sparse checkout
https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/
git clone COLLECTION
git sparse-checkout init --cone
git sparse-checkout set ...
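
A minimal sketch of the sparse-checkout steps on a throwaway repo. The directory names files/ and docs/ are hypothetical and stand in for whatever subset of a collection someone actually works on; git 2.25 or later is assumed.

```shell
#!/bin/sh
set -e

# Hypothetical source repo with two directories and one top-level file.
WORK=$(mktemp -d)
git init -q "$WORK/source"
cd "$WORK/source"
mkdir files docs
echo "a" > files/a.json
echo "b" > docs/b.txt
echo "top" > README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

cd "$WORK"
git clone -q "$WORK/source" sparse
cd sparse

# Cone mode keeps top-level files plus the listed directories.
git sparse-checkout init --cone
git sparse-checkout set files
LISTED=$(git sparse-checkout list)
echo "sparse directories: $LISTED"
ls
```

After this, docs/ is still tracked but no longer populated on disk, so git status has fewer working-tree files to examine.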

Partial clones
https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
Blobless clones: git clone --filter=blob:none
Treeless clones: git clone --filter=tree:0
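
A minimal sketch of a blobless clone against a throwaway local repo. The two uploadpack settings are only needed here because the sketch serves the repo itself over file://; whether the real hosting side already allows filters is an assumption that would need checking.

```shell
#!/bin/sh
set -e

# Hypothetical source repo acting as the "server".
WORK=$(mktemp -d)
git init -q "$WORK/source"
cd "$WORK/source"
echo "data" > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# The serving side must permit filtered fetches and on-demand object requests.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Blobless clone: commits and trees up front, file contents fetched on demand.
cd "$WORK"
git clone -q --filter=blob:none "file://$WORK/source" blobless
cd blobless

# The clone records the filter and marks origin as a promisor remote.
FILTER=$(git config remote.origin.partialclonefilter)
PROMISOR=$(git config remote.origin.promisor)
echo "filter: $FILTER, promisor: $PROMISOR"
```

A treeless clone would use --filter=tree:0 in the same place.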

TODO Test shallow,sparse clones
TODO test on Dana's machine

@gjost gjost added the question label Jan 24, 2022
@gjost gjost self-assigned this Jan 24, 2022
@gjost gjost added the WORKING label Jun 17, 2022

gjost commented Jun 17, 2022

We Put Half a Million files in One git Repository, Here’s What We Learned
https://canvatechblog.com/we-put-half-a-million-files-in-one-git-repository-heres-what-we-learned-ec734a764181

To reduce the amount of work git needs to do to find changes, we used the fsmonitor hook with Watchman so we capture changes as they happen instead of having to scan all files in the repository every time a command is run.
We also enabled feature.manyFiles, which under the hood enables the untracked cache to skip directories and files that haven’t been modified.
Git also has a built-in command (maintenance) to optimize a repository’s data, speeding up commands and reducing disk space. This isn’t enabled by default, so we register it with a schedule for daily and hourly routines.
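
A minimal sketch of how the settings described in the three paragraphs above might be applied to one repo. It uses a scratch repo and only the per-repo config plus a one-off maintenance run; the fsmonitor and scheduling steps are left as comments because they depend on tools and system state not described in these notes. git 2.29 or later is assumed for git maintenance.

```shell
#!/bin/sh
set -e

# Hypothetical scratch repo standing in for a real collection.
REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"
echo "sample" > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# feature.manyFiles changes several defaults, including the untracked cache.
git config feature.manyFiles true
# Setting the untracked cache explicitly as well, for clarity.
git config core.untrackedCache true

# One-off maintenance run. "git maintenance start" would instead register the
# repo and set up scheduled runs, which edits the user's scheduler, so it is
# not executed in this sketch.
git maintenance run --quiet

# fsmonitor with Watchman needs Watchman installed and core.fsmonitor pointed
# at a hook script (git ships hooks/fsmonitor-watchman.sample); not set here.

MANY=$(git config feature.manyFiles)
UNTRACKED=$(git config core.untrackedCache)
echo "feature.manyFiles=$MANY core.untrackedCache=$UNTRACKED"
```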
Sparse checkout
If an engineer can tell us what they usually work on, we can craft a checkout pattern that includes all the required dependencies to run and test their code locally while keeping the checkout as small as possible.
Sparse checkout drawbacks:

  • Tracked files not physically populated on disk can’t be searched through or interacted with. Accidental changes or an erroneous merge conflict might leave these files in a bad state.
  • Overhead to every git checkout to check if updated file should be populated or ignored. This overhead is small with simple patterns but becomes significant with more complex ones.

https://news.ycombinator.com/item?id=31762245
Interesting

The Case Against Monorepos (Infoworld)

Trunk-Based Development: Monorepos (https://trunkbaseddevelopment.com/monorepos)
monorepo.tools - Everything you need to know about monorepos, and the tools to build them (https://monorepo.tools)

@gjost gjost removed the WORKING label Mar 17, 2023