Provide a way to remove old commits #681
Conversation
Motivation: Central Dogma uses jGit to store data. Due to the nature of Git, which stores unlimited history, Central Dogma will eventually get in trouble managing disk usage. We can handle this by maintaining a primary and a secondary Git repository internally. This works in the following way (a short sketch of this flow follows after this description):
1. Commits are pushed to the primary Git repository.
2. If the number of commits exceeds the threshold (`minRetentionCommits`), the secondary Git repository is created.
3. Commits are pushed to both the primary and secondary Git repositories.
4. If the number of commits in the secondary Git repository exceeds the threshold:
   - The secondary Git repository is promoted to the primary Git repository.
   - The old primary Git repository is removed completely.
   - Another secondary Git repository is created.
5. Back to 3.

Modifications:
- TBD

Result:
- Close #575
- TBD

Todo:
- Provide a way to set `minRetentionCommits` and `minRetentionDay` for each repository.
- Support mirroring from CD to external Git.
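To make the flow above concrete, here is a minimal, self-contained Java sketch of the rolling decision. All names (`RollingRepoSketch`, `Repo`, `push`) are hypothetical and only illustrate the five steps; they are not Central Dogma's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rolling scheme; not Central Dogma's implementation.
final class RollingRepoSketch {
    // Stand-in for a jGit repository: it only counts commits here.
    static final class Repo {
        final List<String> commits = new ArrayList<>();
        void commit(String c) { commits.add(c); }
        int commitCount() { return commits.size(); }
    }

    private final int minRetentionCommits;
    private Repo primary = new Repo();
    private Repo secondary; // created lazily once the threshold is crossed

    RollingRepoSketch(int minRetentionCommits) {
        this.minRetentionCommits = minRetentionCommits;
    }

    void push(String commit) {
        primary.commit(commit);                       // step 1: always push to the primary
        if (secondary == null) {
            if (primary.commitCount() >= minRetentionCommits) {
                secondary = new Repo();               // step 2: threshold crossed, create the secondary
            }
        } else {
            secondary.commit(commit);                 // step 3: mirror the commit into the secondary
            if (secondary.commitCount() >= minRetentionCommits) {
                primary = secondary;                  // step 4: promote the secondary; the old primary is discarded
                secondary = null;                     // step 5: a new secondary will be created again later
            }
        }
    }
}
```

In a real implementation the discarded primary would also be deleted from disk, which is what keeps the disk usage bounded.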
Codecov Report
Patch coverage:
@@ Coverage Diff @@
## main #681 +/- ##
============================================
- Coverage 65.69% 63.53% -2.17%
- Complexity 3351 3473 +122
============================================
Files 358 366 +8
Lines 13893 15093 +1200
Branches 1496 1687 +191
============================================
+ Hits 9127 9589 +462
- Misses 3916 4591 +675
- Partials 850 913 +63
☔ View full report in Codecov by Sentry.
This somewhat sounds like surprising behavior to me. This means a diff command that once worked as expected starts to return completely different results once rolling occurs. Can we instead treat any revision between `Revision.INIT` and the `firstRevision` as an empty state?
This issue seems to be a necessary feature to keep the cluster running reliably.
@aidanbae Sorry for making you wait for this. I'm working on it now. Please stay tuned.
That makes sense. Let me apply that approach.
Had a chat with the team, and we decided to raise an exception instead of treating it as an empty state. This PR is ready, so PTAL. 😉
Motivation: Before we apply #681, which removes old commits, it would be nice if we could find any usages that access data with the INIT Revision or with revisions that are more than 5000 commits (the default minimum number of retained commits) behind the head.
Modifications:
- Add a counter that is increased when the INIT Revision or revisions that are more than 5000 commits behind the head are used.
Result:
- Track the number of such usages.
- This commit will be reverted after we find the usages and fix them.
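A minimal sketch of what such a counter could look like; the class and method names below are hypothetical, and the real change may report through Central Dogma's metrics rather than a bare `AtomicLong`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker; not the actual counter added by the referenced commit.
final class OldRevisionUsageTracker {
    private static final int DEFAULT_MIN_RETENTION_COMMITS = 5000;

    private final AtomicLong oldRevisionAccesses = new AtomicLong();

    /** Counts accesses that use the INIT revision or a revision far behind the head. */
    void record(int revision, int headRevision) {
        if (revision == 1 /* Revision.INIT */ ||
            headRevision - revision > DEFAULT_MIN_RETENTION_COMMITS) {
            oldRevisionAccesses.incrementAndGet();
        }
    }

    long count() {
        return oldRevisionAccesses.get();
    }
}
```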
Looks good overall! Left some minor questions 🙇
requireNonNull(initialRevision, "initialRevision");
checkArgument(minRetentionCommits > 0, "minRetentionCommits: %s (expected: > 0)",
              minRetentionCommits);
checkArgument(minRetentionDays > 0, "minRetentionDays: %s (expected: > 0)", minRetentionDays);
Question) Isn't it possible for `minRetentionDays` to be 0?
Suggested change:
- checkArgument(minRetentionDays > 0, "minRetentionDays: %s (expected: > 0)", minRetentionDays);
+ checkArgument(minRetentionDays >= 0, "minRetentionDays: %s (expected: >= 0)", minRetentionDays);
this.parent = requireNonNull(parent, "parent");
originalRepoName = requireNonNull(repositoryDir, "repositoryDir").getName();
this.repositoryWorker = requireNonNull(repositoryWorker, "repositoryWorker");
requireNonNull(author, "author");
And we can remove the assignment at L171
Suggested change:
- requireNonNull(author, "author");
+ this.author = requireNonNull(author, "author");
private int applyChanges(@Nullable Revision baseRevision, @Nullable ObjectId baseTreeId,
It would honestly be very helpful if these kinds of methods were deduplicated instead of copy-pasted so that we don't have to review them again 😅
Anyway, let me just assume that these methods were copy-pasted, since I don't think it's realistic to invest more time into looking through this file in detail.
if (!primaryRepoDir.mkdirs()) {
    throw new StorageException("failed to create " + primaryRepoDir + " while migrating to V2.");
}
I think it might be safer to do this before L249 so that we run less of a risk. (If this step fails, then `repoDir` doesn't exist and only `tmpRepoDir` exists at this point.)
final boolean fetchContent = FindOption.FETCH_CONTENT.get(options);
We can compute this once at L441
this.headRevision = res.revision;
final InternalRepository secondaryRepo = this.secondaryRepo;
if (secondaryRepo != null) {
    assert headRevision.equals(secondaryRepo.headRevision());
Question) Can this be guaranteed?
For instance, if a `CentralDogma` server is shut down forcefully due to a JVM crash before `secondaryRepo.commit` is invoked, will the server be recoverable? Or do we need to manually delete the `secondaryRepo` folder?
This scenario could also be possible if `secondaryRepo == null` while a commit is occurring (due to secondary repo rolling).
Rather, would a mechanism which catches up to the primary repo be a better idea?
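For illustration only, here is a rough sketch of that catch-up idea, using hypothetical `Repo`/`Commit` interfaces rather than the real `InternalRepository` API; whether the missing commits can actually be replayed this way is part of the open question.

```java
// Hypothetical catch-up mechanism: replay into the secondary whatever it is missing.
final class SecondaryCatchUp {
    interface Commit {}

    interface Repo {
        int headRevision();
        Commit commitAt(int revision);   // reads an already-applied commit from history (hypothetical)
        void apply(Commit commit);       // re-applies that commit (hypothetical)
    }

    /** Called on startup, or before the next push, if the secondary repo fell behind. */
    static void catchUp(Repo primary, Repo secondary) {
        for (int rev = secondary.headRevision() + 1; rev <= primary.headRevision(); rev++) {
            secondary.apply(primary.commitAt(rev));
        }
    }

    private SecondaryCatchUp() {}
}
```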
assert headRevision.equals(secondaryRepo.headRevision());

// Push the same commit to the secondary repo.
secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
If a `StorageException` is thrown for whatever reason (e.g. insufficient disk), should the overall call to the commit also fail?
Suggested change:
- secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
+ try {
+     secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
+ } catch (...) {
+ }
primarySuffix = newPrimarySuffix;
writeSuffix(newPrimarySuffix);
I don't think we should update the suffix in case `writeSuffix` fails (`primaryRepoDir` will return the wrong value).
Suggested change:
- primarySuffix = newPrimarySuffix;
- writeSuffix(newPrimarySuffix);
+ writeSuffix(newPrimarySuffix);
+ primarySuffix = newPrimarySuffix;
@Override
public void createRollingRepository(Revision rollingRepositoryInitialRevision,
Question) I'm assuming that we don't have a way to synchronize two `CreateRollingRepositoryCommand`s being issued on a single repository simultaneously (and judging from the current implementation, I don't think it should be allowed).
Do you think this scenario is impossible, or do we introduce a separate lock for this command?
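One possible way to serialize the command per repository would be a per-repository lock, sketched below with hypothetical names; whether this fits how Central Dogma actually executes commands is exactly the open question above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-repository lock map; not Central Dogma's actual command executor.
final class RollingRepositoryLocks {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the rolling-repository creation while holding the lock of that repository. */
    void runExclusively(String projectAndRepoName, Runnable createRollingRepository) {
        final ReentrantLock lock = locks.computeIfAbsent(projectAndRepoName, key -> new ReentrantLock());
        lock.lock();
        try {
            createRollingRepository.run();
        } finally {
            lock.unlock();
        }
    }
}
```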
Motivation:
Central Dogma uses jGit to store data. Due to the nature of Git, which stores unlimited history, Central Dogma will eventually get in trouble managing disk usage.
We can handle this by maintaining the primary and secondary Git repositories internally.
This works in the following way:
1. Commits are pushed to the primary Git repository.
2. If the number of commits exceeds the threshold (`minRetentionCommits`), the secondary Git repository is created.
3. Commits are pushed to both the primary and secondary Git repositories.
4. If the number of commits in the secondary Git repository exceeds the threshold:
   - The secondary Git repository is promoted to the primary Git repository.
   - The old primary Git repository is removed completely.
   - Another secondary Git repository is created.
5. Back to 3.

Modifications:
- Add `CreateRollingRepositoryCommand` that creates the rolling repository by the `CommitRetentionManagementPlugin`.
- Add `GitRepositoryV2` that manages the rolling jGit repositories.
  - The rolling jGit repositories are named `foo_0000000000`, `foo_0000000001`, `foo_0000000002` and so on.
  - `RepositoryMetadataDatabase` has the suffix in its file database.
  - `GitRepository` is not removed for the migration test.
- Add `InternalRepository` that has a jGit repository and a `CommitIdDatabase`.
- What happens if the revision used by an operation (`diff`, `watch`, `history`, etc.) is lower than the `firstRevision` of the current primary repo? (A small sketch of this handling follows below.)
  - If `Revision.INIT(1)` is used, the `firstRevision` is used instead.
    - e.g. `diff(INIT, new Revision(100) ...)` is equal to `diff(new Revision(firstRevisionNumber), new Revision(100) ...)`
  - If a `Revision` between `Revision.INIT(1)` and the `firstRevision` is used, a `RevisionNotFoundException` is raised.
  - `watch` and `findLatestRevision`.

Result:
- Old commits exceeding `minRetentionCommits` are removed while keeping the commits made in the recent `minRetentionDay`.

Todo:
- Provide a way to set `minRetentionCommits` and `minRetentionDay` for each repository.
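Based on the revision handling described in the Modifications above, here is a small illustrative sketch; the names are hypothetical, plain ints stand in for `Revision`, and `IllegalArgumentException` stands in for `RevisionNotFoundException`.

```java
// Hypothetical sketch: INIT is mapped to the first retained revision,
// anything between INIT and that revision is rejected.
final class RetainedRevisions {
    private RetainedRevisions() {}

    static int normalize(int requested, int firstRevision, int headRevision) {
        if (requested == 1) {                        // Revision.INIT
            return firstRevision;                    // e.g. diff(INIT, ...) behaves like diff(firstRevision, ...)
        }
        if (requested < firstRevision || requested > headRevision) {
            throw new IllegalArgumentException(      // stand-in for RevisionNotFoundException
                    "revision " + requested + " is not retained");
        }
        return requested;
    }
}
```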