Describe caching of git repositories #89
well written!
## Pros&Cons

- Cloning is much faster for repositories in the cache.
- Cloning is slower for repositories not present in the cache.
We can keep a list of repos that we know are being cached: cockpit, systemd, and kernel sound like great candidates, especially when we can reuse the cached copy in both upstream and c9s.
- Less memory is needed to clone repositories in the cache.
  (Which makes it possible to clone kernel for example.)
- More memory is needed to clone repositories not present in the cache.
It depends on what the workflow to populate the cache would be, but we should do it outside sandcastle/worker.
- Less storage is needed for the cloned repo if it is in the cache.
  (Only the current state of the repo is saved, historical commits reference the cache repo.)
storage is so much cheaper in comparison to memory
- The cache does not need to be writable for cloning.
  Only for creating/updating.
- Persistent volumes can be used.
- How much storage we can afford?
our current cost in online is 25€ for 1G of mem and 1€ for 1G of storage - so we can easily start with 16G of storage
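The behaviour described in the quoted list above, where a clone borrows historical objects from the cache and only reads from it, matches what `git clone --reference` offers. A minimal sketch of how the clone command could be built, assuming a hypothetical cache layout with one directory per repository name on a read-only mount; `CACHE_ROOT`, `cache_path_for`, and `build_clone_command` are illustrative names, not part of packit or sandcastle:

```python
from pathlib import Path
from typing import List

# Hypothetical mount point of the read-only cache volume (an assumption).
CACHE_ROOT = Path("/repository-cache")


def cache_path_for(url: str, cache_root: Path = CACHE_ROOT) -> Path:
    """Map a clone URL to a cache directory named after the repository."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return cache_root / name


def build_clone_command(
    url: str, target: str, cache_root: Path = CACHE_ROOT
) -> List[str]:
    """Build a `git clone` command that borrows objects from the cache.

    `--reference-if-able` lets git fall back to a plain clone (with a
    warning) when the reference repository does not exist.
    """
    reference = cache_path_for(url, cache_root)
    return ["git", "clone", "--reference-if-able", str(reference), url, target]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would perform the clone. A repository cloned this way keeps depending on the cache for its borrowed objects unless `--dissociate` is also given.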
- Manually on request. Mount the volume once with more memory and fetch the needed repository.
- Manually on sentry issue. As previous but gather the problematic repos in sentry.
- Start with kernel manually and add new ones on the go.
It would be delightful if we could create a workflow for populating the cache (e.g. regenerating it weekly) and keep metadata on which repositories are in the cache and when they should be "attached", so that they are used transparently.
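One way to picture the metadata and weekly regeneration idea from this comment is sketched below. It assumes a hypothetical metadata record per cached repository; `CachedRepo`, `repos_to_refresh`, and the seven-day default are illustrative choices, not an agreed design:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class CachedRepo:
    """Hypothetical metadata entry for one repository in the cache."""

    name: str
    url: str
    last_updated: datetime


def repos_to_refresh(
    entries: List[CachedRepo],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
) -> List[CachedRepo]:
    """Return the entries whose cached copy is at least `max_age` old."""
    return [entry for entry in entries if now - entry.last_updated >= max_age]


def build_update_command(cache_path: str) -> List[str]:
    """Build a fetch command that refreshes an existing cached repository."""
    return ["git", "-C", cache_path, "fetch", "--prune", "origin"]
```

A periodic job running outside the worker could call `repos_to_refresh` and run the fetch command for each returned entry, which keeps the cache volume read-only for everything else.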
- Just kernel.
- A group of hardcoded/configured repositories.
- All repositories matching some condition (at least some commits, some size, ...)
- All repositories. (Add if not present.)
using kernel for a PoC would be very nice, we could prototype this in the SIG
after it's proven, we could introduce this in the upstream to projects with at least N runs a week?
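The selection options quoted above (a configured group of repositories, or any repository meeting a condition such as at least N runs a week) can be combined in a small policy function. This is only a sketch; `select_repos_to_cache` and its parameters are hypothetical names, and the run counts are assumed to come from some existing usage statistics:

```python
from typing import Iterable, Mapping, Set


def select_repos_to_cache(
    runs_per_week: Mapping[str, int],
    always_cached: Iterable[str],
    min_runs: int,
) -> Set[str]:
    """Pick repositories to cache.

    Repositories in `always_cached` (for example kernel during a PoC) are
    always selected; others are selected once they reach `min_runs` runs
    per week.
    """
    selected = set(always_cached)
    selected.update(
        name for name, runs in runs_per_week.items() if runs >= min_runs
    )
    return selected
```

Starting with `always_cached=["kernel"]` and a very high `min_runs` would reproduce the "just kernel" option, and lowering the threshold later widens the cache without changing the code.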
4. Or, we can forward some method for handling the cloning.
   (Defined in the service repo, run in the packit.)

## Is this relevant for the CLI users?
I'm sorry but I don't see any value here since people already have those projects cloned.
I agree that it does not make sense to spend much time on it, but (as I wrote below, in this part),
- second repo (upstream/downstream) is temporarily cloned by default
- both repos are temporarily cloned when URL is used as an argument (That's what I use for example. You don't need to care about the state of the local repository.)
I really liked how you explained it during the arch meeting, it indeed makes perfect sense for those temporary clones, +1
Signed-off-by: Frantisek Lachman <[email protected]>
630bbbe
to
b4f4ef9
Preview of the markdown content