Describe caching of git repositories #89

lachmanfrantisek · 2021-04-20T08:26:18Z

Preview of the markdown content

caching_of_git_repos/README.md

TomasTomecek

well written!

TomasTomecek · 2021-04-20T11:00:36Z

caching_of_git_repos/README.md

+## Pros&Cons
+
+- Cloning is much faster for repositories in the cache.
+- Cloning is slower for repositories not present in the cache.


We could keep a list of repositories that we know are cached: cockpit, systemd and kernel sound like great candidates, especially since we could reuse the cached copy both upstream and in c9s.
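A minimal sketch of such a list, assuming projects are matched by the last path component of the clone URL. The names `CACHED_PROJECTS`, `project_name` and `is_cached` are hypothetical and not part of packit; the `example.com` URLs are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; the project names are the candidates from the comment.
CACHED_PROJECTS = frozenset({"cockpit", "systemd", "kernel"})


def project_name(clone_url: str) -> str:
    """Return the last path component of a clone URL without a '.git' suffix."""
    path = urlparse(clone_url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def is_cached(clone_url: str) -> bool:
    """Assumption: a project is looked up in the cache by its bare name."""
    return project_name(clone_url) in CACHED_PROJECTS


if __name__ == "__main__":
    print(is_cached("https://example.com/systemd/systemd.git"))
    print(is_cached("https://example.com/someone/small-project"))
```

Matching on the bare name keeps the list short, but forks with the same name would also match; keying on the full URL would be stricter.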

TomasTomecek · 2021-04-20T11:01:36Z

caching_of_git_repos/README.md

+- Cloning is slower for repositories not present in the cache.
+- Less memory is needed to clone repositories in the cache.
+  (Which makes it possible to clone kernel for example.)
+- More memory is needed to clone repositories not present in the cache.


It depends on what the workflow for populating the cache would be, but we should do it outside sandcastle/worker.

TomasTomecek · 2021-04-20T11:02:03Z

caching_of_git_repos/README.md

+- Less storage is needed for the cloned repo if it is in the cache.
+  (Only the current state of the repo is saved, historical commits reference the cache repo.)


Storage is so much cheaper than memory.
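For context on the hunk above: git can clone a repository while borrowing objects from a local copy via `git clone --reference <repository>`. Whether the proposal uses exactly this mechanism is an assumption here, and the helper name `clone_command` is hypothetical. The sketch only builds the command line and does not run git.

```python
from pathlib import Path
from typing import List, Optional


def clone_command(url: str, target: Path, reference: Optional[Path] = None) -> List[str]:
    """Build a `git clone` command, borrowing objects from `reference` if it exists.

    Assumption: `reference` points at a cached copy of the same project.
    """
    cmd = ["git", "clone"]
    if reference is not None and reference.is_dir():
        # With --reference, the new clone reuses objects stored in the cached
        # repository instead of copying the whole history.
        cmd += ["--reference", str(reference)]
    cmd += [url, str(target)]
    return cmd


if __name__ == "__main__":
    # Pass the returned list to subprocess.run(..., check=True) to actually clone.
    print(clone_command("https://example.com/kernel.git", Path("/tmp/kernel"), Path(".")))
```

A clone made this way depends on the cache staying available, which fits the point that the cache only needs to be readable while cloning.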

TomasTomecek · 2021-04-20T11:06:29Z

caching_of_git_repos/README.md

+- The cache does not need to be writable for cloning.
+  Only for creating/updating.
+- Persistent volumes can be used.
+- How much storage we can afford?


Our current cost in online is 25€ for 1G of memory and 1€ for 1G of storage, so we can easily start with 16G of storage.
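The comparison can be made explicit with a few lines of arithmetic. The prices are the ones quoted in the comment; the billing period is not stated there, so the function below leaves it out, and its name `cost_eur` is hypothetical.

```python
# Prices quoted in the comment above (EUR per 1G); the billing period is not stated.
MEMORY_EUR_PER_GB = 25
STORAGE_EUR_PER_GB = 1


def cost_eur(memory_gb: int, storage_gb: int) -> int:
    """Total price for the given amount of memory and storage."""
    return memory_gb * MEMORY_EUR_PER_GB + storage_gb * STORAGE_EUR_PER_GB


if __name__ == "__main__":
    # 16G of storage is still cheaper than a single extra 1G of memory.
    print(cost_eur(0, 16), "EUR for 16G of storage")
    print(cost_eur(1, 0), "EUR for 1G of memory")
```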

TomasTomecek · 2021-04-20T11:08:07Z

caching_of_git_repos/README.md

+
+- Manually on request. Mount the volume once with more memory and fetch the needed repository.
+- Manually on sentry issue. As previous but gather the problematic repos in sentry.
+- Start with kernel manually and add new ones on the go.


It would be delightful if we could create a workflow for populating the cache (e.g. regenerating it weekly) and keep metadata about which repositories are in the cache and when they should be "attached", so that they are used transparently.
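A rough sketch of what such metadata could look like, assuming one record per cached project and a weekly refresh. `CacheEntry`, `stale_entries` and `attached_projects` are hypothetical names and nothing here reflects an existing packit schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Assumption: the cache is regenerated weekly, as suggested in the comment.
REFRESH_INTERVAL = timedelta(weeks=1)


@dataclass
class CacheEntry:
    project: str
    last_refreshed: datetime
    attached: bool = True  # whether clones should transparently use this entry


def stale_entries(entries: List[CacheEntry], now: datetime) -> List[CacheEntry]:
    """Entries whose last refresh is at least one interval old."""
    return [e for e in entries if now - e.last_refreshed >= REFRESH_INTERVAL]


def attached_projects(entries: List[CacheEntry]) -> List[str]:
    """Names of the projects that clones are allowed to use from the cache."""
    return [e.project for e in entries if e.attached]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    entries = [
        CacheEntry("kernel", now - timedelta(days=8)),
        CacheEntry("systemd", now - timedelta(days=2), attached=False),
    ]
    print([e.project for e in stale_entries(entries, now)])
    print(attached_projects(entries))
```

A periodic job outside the worker could call `stale_entries` and refetch only those repositories.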

TomasTomecek · 2021-04-20T11:09:13Z

caching_of_git_repos/README.md

+- Just kernel.
+- A group of hardcoded/configured repositories.
+- All repositories matching some condition (at least some commits, some size, ...)
+- All repositories. (Add if not present.)


Using kernel for a PoC would be very nice; we could prototype this in the SIG.

Once it's proven, we could introduce this upstream for projects with at least N runs a week?
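The "at least N runs a week" condition could be evaluated roughly like this, assuming we have one project name per run from the last week. The function name `projects_to_cache` and the input shape are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def projects_to_cache(runs_last_week: Iterable[str], min_runs: int) -> List[str]:
    """Return projects that had at least `min_runs` runs in the given week.

    `runs_last_week` holds one project name per run (an assumed input format).
    """
    counts = Counter(runs_last_week)
    return sorted(name for name, count in counts.items() if count >= min_runs)


if __name__ == "__main__":
    runs = ["kernel", "systemd", "kernel", "cockpit", "kernel", "systemd"]
    print(projects_to_cache(runs, min_runs=2))
```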

TomasTomecek · 2021-04-20T11:11:15Z

caching_of_git_repos/README.md

+4. Or, we can forward some method for handling the cloning.
+   (Defined in the service repo, run in the packit.)
+
+## Is this relevant for the CLI users?


I'm sorry but I don't see any value here since people already have those projects cloned.

lachmanfrantisek · 2021-04-21

I agree that it does not make sense to spend much time on it, but (as I wrote below, in this part):

- The second repo (upstream/downstream) is temporarily cloned by default.
- Both repos are temporarily cloned when a URL is used as an argument. (That's what I use, for example: you don't need to care about the state of the local repository.)

TomasTomecek · 2021-04-22

I really liked how you explained it during the arch meeting; it indeed makes perfect sense for those temporary clones, +1.

Signed-off-by: Frantisek Lachman <[email protected]>

jpopelka approved these changes Apr 20, 2021


TomasTomecek reviewed Apr 20, 2021


Describe caching of git repositories

b4f4ef9

Signed-off-by: Frantisek Lachman <[email protected]>

lachmanfrantisek force-pushed the caching-of-git-repos branch from 630bbbe to b4f4ef9 on April 20, 2021 12:35


