The ddr-densho-1000 repo is really huge, and this causes usability problems even when the repo is checked out locally. In particular, git status takes forever to run.
Repo has tons of files and also a long history (~4000 commits).
IDEA cp ddr-densho-1000 ddr-densho-1000new, remove .git/, git init
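A minimal sketch of that idea, to test whether history length is what makes things slow. The function name `fresh_history_copy` is hypothetical, and it assumes a git identity (user.name, user.email) is already configured. The copy keeps the current files but starts over with a single commit:

```shell
# Hypothetical helper: copy a repo's working tree into a new repo with no history.
# Usage: fresh_history_copy ddr-densho-1000 ddr-densho-1000new
fresh_history_copy() {
    src="$1"
    dst="$2"
    # Copy everything, preserving permissions and timestamps
    cp -a "$src" "$dst" || return 1
    # Drop the old object store and history
    rm -rf "$dst/.git"
    # Start a new repository and commit the current snapshot
    git -C "$dst" init -q || return 1
    git -C "$dst" add -A || return 1
    git -C "$dst" commit -q -m "Import snapshot without history"
}
```

Timing git status in both directories afterwards (for example with `time git -C ddr-densho-1000new status`) would show whether the fresh copy is actually faster.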
Where does the slowness come from?
TODO research git performance (num objects, size, repo age)
TODO can we set git caching interval?
TODO profile git operations
Does not correlate with number of objects or physical size of repo
Seems to correlate with length of commit history
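One way to approach the profiling TODO is to collect the numbers in question (object count, size, commit count, tracked files) alongside git's own timing trace. This is only a sketch; the function name `profile_repo` is made up for illustration:

```shell
# Hypothetical helper: print size/history stats for a repo and trace `git status`.
# Usage: profile_repo /path/to/ddr-densho-1000
profile_repo() {
    repo="${1:-.}"
    # Object counts and on-disk size of the object store
    git -C "$repo" count-objects -vH
    # Number of commits reachable from HEAD
    echo "commits: $(git -C "$repo" rev-list --count HEAD)"
    # Number of files tracked in the index
    echo "tracked files: $(git -C "$repo" ls-files | wc -l | tr -d ' ')"
    # GIT_TRACE_PERFORMANCE makes git print per-step timings to stderr
    GIT_TRACE_PERFORMANCE=1 git -C "$repo" status --short >/dev/null
}
```

Running it against several collection repos of different sizes and history lengths would give something concrete to compare.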
We Put Half a Million files in One git Repository, Here’s What We Learned
https://canvatechblog.com/we-put-half-a-million-files-in-one-git-repository-heres-what-we-learned-ec734a764181
To reduce the amount of work git needs to do to find changes, we used the fsmonitor hook with Watchman so we capture changes as they happen instead of having to scan all files in the repository every time a command is run.
We also enabled feature.manyFiles, which under the hood enables the untracked cache to skip directories and files that haven’t been modified.
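A rough sketch of how the settings from the Canva post might be applied to one local repo. The function names are hypothetical. The Watchman part assumes Watchman is installed and that the repo still has git's stock fsmonitor-watchman.sample hook; the maintenance part (which the post also describes) changes the user's scheduler, so it is kept separate:

```shell
# Hypothetical helper: enable the many-files settings for one repository.
# Usage: tune_large_repo /path/to/ddr-densho-1000
tune_large_repo() {
    repo="${1:-.}"
    # feature.manyFiles turns on defaults aimed at repos with many files
    git -C "$repo" config feature.manyFiles true
    # Set the untracked cache explicitly as well, so it is visible in the config
    git -C "$repo" config core.untrackedCache true
}

# Hypothetical helper: use the Watchman fsmonitor hook (assumes Watchman is installed).
enable_watchman_hook() {
    repo="${1:-.}"
    hooks="$repo/.git/hooks"
    # The sample hook is only present if the repo was created from the default templates
    cp "$hooks/fsmonitor-watchman.sample" "$hooks/fsmonitor-watchman" || return 1
    git -C "$repo" config core.fsmonitor .git/hooks/fsmonitor-watchman
}

# Hypothetical helper: register the repo for git's scheduled background maintenance.
schedule_maintenance() {
    repo="${1:-.}"
    git -C "$repo" maintenance start
}
```

Comparing `time git status` before and after each step would show which setting, if any, helps for this repo.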
Git also has a built-in command (maintenance) to optimize a repository’s data, speeding up commands and reducing disk space. This isn’t enabled by default, so we register it with a schedule for daily and hourly routines.
Sparse checkout
If an engineer can tell us what they usually work on, we can craft a checkout pattern that includes all the required dependencies to run and test their code locally while keeping the checkout as small as possible.
Sparse checkout drawbacks:
Tracked files not physically populated on disk can’t be searched through or interacted with. Accidental changes or an erroneous merge conflict might leave these files in a bad state.
Overhead to every git checkout to check if updated file should be populated or ignored. This overhead is small with simple patterns but becomes significant with more complex ones.
Spend no more than 2 days on this.
Ways to improve git status performance (2012)
https://stackoverflow.com/questions/4994772/ways-to-improve-git-status-performance
10 GB repo on NFS on Linux. First git status took ~36 min, subsequent runs 8 min.
Slow Git Performance (2021)
https://support.purestorage.com/Knowledge_Base/FlashBlade_KB/Slow_Git_Performance
OPTIONS
Shallow clone
git clone --depth=50 --no-single-branch COLLECTION
Sparse checkout
https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/
git clone COLLECTION
git sparse-checkout init --cone
git sparse-checkout set ...
Partial clones
https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
Blobless clones: git clone --filter=blob:none
Treeless clones: git clone --filter=tree:0
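The sparse and blobless options can also be combined, which may be worth including in the tests. A sketch, assuming Git 2.25 or newer; the function name and the directory arguments are placeholders, not actual collection paths:

```shell
# Hypothetical helper: blobless clone that only populates the given directories.
# Usage: sparse_blobless_clone <url> <dest> <dir>...
sparse_blobless_clone() {
    url="$1"
    dest="$2"
    shift 2
    # --filter=blob:none defers downloading file contents until they are needed;
    # --sparse starts with only the top-level files checked out
    git clone --filter=blob:none --sparse "$url" "$dest" || return 1
    # Cone mode restricts patterns to whole directories, which keeps matching cheap
    git -C "$dest" sparse-checkout init --cone || return 1
    # Populate only the requested directories
    git -C "$dest" sparse-checkout set "$@"
}
```

If the server does not support filtering, git prints a warning and falls back to a normal full clone, so the sparse part still applies.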
TODO test shallow, sparse clones
TODO test on Dana's machine