Describe caching of git repositories #89

Open: wants to merge 1 commit into main

lachmanfrantisek
Member

caching_of_git_repos/README.md (outdated)
Member

TomasTomecek left a comment

Well written!

## Pros&Cons

- Cloning is much faster for repositories in the cache.
- Cloning is slower for repositories not present in the cache.
Member

We can keep a list of repositories that we know are cached. cockpit, systemd, and kernel sound like great candidates, especially since we can reuse the cached copy for both upstream and c9s.
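A minimal sketch of such a list, assuming repositories are matched by the last path component of their clone URL. `KNOWN_CACHED`, `repo_name`, and `is_known_cached` are hypothetical names, not existing packit code. Matching by name is only one possible way to let the upstream and c9s clones share a cached copy.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the names come from the comment above,
# the data structure and the matching rule are assumptions.
KNOWN_CACHED = {"cockpit", "systemd", "kernel"}


def repo_name(url: str) -> str:
    """Return the last path component of a clone URL without a '.git' suffix."""
    name = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def is_known_cached(url: str) -> bool:
    """True if the repository is on the list of repositories expected in the cache."""
    return repo_name(url) in KNOWN_CACHED
```

A real list would probably need an explicit URL-to-cache mapping, since a repository's URL does not always end in the name used here.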

- Less memory is needed to clone repositories in the cache.
(Which makes it possible to clone kernel for example.)
- More memory is needed to clone repositories not present in the cache.
Member

It depends on the workflow used to populate the cache, but we should do that outside sandcastle/worker.

Comment on lines +32 to +33
- Less storage is needed for the cloned repo if it is in the cache.
(Only the current state of the repo is saved, historical commits reference the cache repo.)
Member

Storage is so much cheaper than memory.
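The storage saving described in the quoted lines is what `git clone --reference` provides: the new clone borrows objects from the reference repository instead of copying them. A minimal sketch that only builds the command; `clone_command` and the paths are hypothetical.

```python
from typing import List, Optional


def clone_command(url: str, target: str, cached_repo: Optional[str] = None) -> List[str]:
    """Build a `git clone` command, borrowing objects from the cache if given."""
    cmd = ["git", "clone"]
    if cached_repo is not None:
        # --reference makes the new clone borrow objects from cached_repo,
        # so only objects missing there have to be fetched and stored.
        cmd += ["--reference", cached_repo]
    return cmd + [url, target]
```

The result can be passed to `subprocess.run(cmd, check=True)`. A clone created this way depends on the cached repository staying available at the same path.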

- The cache does not need to be writable for cloning.
Only for creating/updating.
- Persistent volumes can be used.
- How much storage can we afford?
Member

Our current cost in online is 25€ for 1G of memory and 1€ for 1G of storage, so we can easily start with 16G of storage.
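To make the comparison concrete, here is the arithmetic for 16G at the prices quoted above. The billing period is not stated in the comment, so the sketch leaves it out; `cost_eur` is a hypothetical helper.

```python
# Prices quoted in the comment above, in EUR per GB (billing period not stated).
MEMORY_EUR_PER_GB = 25
STORAGE_EUR_PER_GB = 1


def cost_eur(size_gb: int, eur_per_gb: int) -> int:
    """Cost of size_gb gigabytes at a flat per-gigabyte price."""
    return size_gb * eur_per_gb


storage_cost = cost_eur(16, STORAGE_EUR_PER_GB)
memory_cost = cost_eur(16, MEMORY_EUR_PER_GB)
print(f"16G of storage: {storage_cost} EUR, 16G of memory: {memory_cost} EUR")
```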


- Manually on request. Mount the volume once with more memory and fetch the needed repository.
- Manually on a Sentry issue. Same as the previous option, but gather the problematic repos in Sentry.
- Start with kernel manually and add new ones on the go.
Member

It would be delightful if we could create a workflow for populating the cache (e.g. regenerating it weekly) and keep metadata about which repositories are in the cache and when they should be "attached", so that they are used transparently.
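One possible shape for that workflow, sketched under assumptions: cache entries are mirror clones, the metadata is a small record per repository, and an entry is refreshed once it is older than a week. `CacheEntry`, `populate_command`, and `needs_refresh` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CacheEntry:
    """Hypothetical metadata record for one cached repository."""

    url: str
    path: str
    last_updated: Optional[datetime] = None  # None: not populated yet


def populate_command(entry: CacheEntry) -> List[str]:
    """Build the git command that creates or refreshes the cached copy."""
    if entry.last_updated is None:
        # First run: create a mirror clone inside the cache volume.
        return ["git", "clone", "--mirror", entry.url, entry.path]
    # Later runs: fetch updates into the existing mirror.
    return ["git", "-C", entry.path, "remote", "update", "--prune"]


def needs_refresh(entry: CacheEntry, now: datetime,
                  max_age: timedelta = timedelta(days=7)) -> bool:
    """True if the entry was never populated or is older than max_age."""
    return entry.last_updated is None or now - entry.last_updated > max_age
```

Only the job that runs these commands needs the cache volume mounted writable; the clones that use the cache can mount it read-only.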

Comment on lines +53 to +56
- Just kernel.
- A group of hardcoded/configured repositories.
- All repositories matching some condition (a minimum number of commits, a minimum size, ...).
- All repositories. (Add if not present.)
Member

Using kernel for a PoC would be very nice; we could prototype this in the SIG.

Once it's proven, we could introduce this upstream for projects with at least N runs a week?
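The "at least N runs a week" rule could be expressed as a simple filter. A sketch with made-up inputs: `select_for_cache` is a hypothetical name, and N stays a parameter because the comment does not fix its value.

```python
from typing import Dict, List


def select_for_cache(runs_last_week: Dict[str, int], min_runs: int) -> List[str]:
    """Return the repositories with at least min_runs runs, sorted by name."""
    return sorted(name for name, runs in runs_last_week.items() if runs >= min_runs)
```

For example, `select_for_cache({"kernel": 40, "systemd": 12, "small-tool": 1}, min_runs=10)` keeps `kernel` and `systemd` and drops `small-tool`.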

4. Or, we can pass in a method for handling the cloning.
(Defined in the service repo, run in packit.)

## Is this relevant for the CLI users?
Member

I'm sorry but I don't see any value here since people already have those projects cloned.

Member Author

I agree that it does not make sense to spend much time on it, but, as I wrote below in this part:

  • the second repo (upstream/downstream) is temporarily cloned by default
  • both repos are temporarily cloned when a URL is used as an argument (that's what I use, for example; you don't need to care about the state of the local repository)

Member

I really liked how you explained it during the arch meeting, it indeed makes perfect sense for those temporary clones, +1
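For those temporary clones, the CLI could use a cache opportunistically. A sketch, assuming an optional local cache directory: `--reference-if-able` lets git fall back to a plain clone when the reference repository is missing. `temporary_clone_command`, `cache_dir`, and the directory prefix are hypothetical, not an existing packit API.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def temporary_clone_command(url: str, cache_dir: Optional[str] = None) -> Tuple[str, List[str]]:
    """Create a temporary directory and build the command that clones into it."""
    target = tempfile.mkdtemp(prefix="packit-clone-")
    cmd = ["git", "clone"]
    if cache_dir is not None:
        name = url.rstrip("/").rsplit("/", 1)[-1]
        # Git warns and clones normally if this path is not a usable repository.
        cmd += ["--reference-if-able", os.path.join(cache_dir, name)]
    return target, cmd + [url, target]
```

The caller is responsible for removing the temporary directory once the clone is no longer needed.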

Signed-off-by: Frantisek Lachman <[email protected]>
3 participants