-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't search/list every single repo reference when explicit references are provided #112
base: master
Are you sure you want to change the base?
Don't search/list every single repo reference when explicit references are provided #112
Conversation
It would be a nice optimization not to run But this PR as currently written introduces a couple of rough edges:
|
…s are specified When explicit roots and reference filters are both provided, we want to make sure that the optimisation don't kick in. As we still want to iterate over all references in order to apply the filters. Here is an example with git-sizer being run with explicit roots and reference filters ``` git-sizer main~20^{tree} main~10^{tree} --tags --branches ```
135d97e
to
721e353
Compare
We printed out the reference count, even when explicit roots were provided. ``` git-sizer master~50^{tree} 'HEAD@{1.week.ago}' -v [TRUNCATED_OUTPUT] | * References | | | | * Count | 187 | | | * Ignored | 187 | | | | | | [TRUNCATED_OUTPUT] ``` This is not very useful, as the reference count depends on the list of references extracted by running `git for-each-ref`. This commit changes the output to not print the reference count when explicit roots and no reference filters are provided to git-sizer
The reference count is still displayed when the -j flag is passed to git-sizer. ``` git-sizer master~50^{tree} 'HEAD@{1.week.ago}' -v -j [TRUNCATED_OUTPUT] "reference_count": 15, "reference_groups": { "ignored": 15 }, [TRUNCATED_OUTPUT] ``` We don't want to print the reference count for json v1, when explicit root are not provided git-sizer
41cd01a
to
d407c86
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the line comments, I noticed that this version of the code emits confusing counts in the "references" progress meter, like:
$ git-sizer HEAD^
[…]
Processing references: 1
I think that this count should be zero, since no references were scanned. But the progress meter is actually counting roots, not references. So this is misleading. (I know it's "just" a progress meter, but I know that I glance at those numbers all the time and rely on them.)
I suggest instead iterating over the references and roots separately, with two separate progress meters, like
diff --git a/sizes/graph.go b/sizes/graph.go
index 0fb1c8a..2fb1b71 100644
--- a/sizes/graph.go
+++ b/sizes/graph.go
@@ -269,14 +269,22 @@ func ScanRepositoryUsingGraph(
return HistorySize{}, err
}
+ // Process roots in two passes just to make the progress meter
+ // output more useful:
+
progressMeter.Start("Processing references: %d")
for _, root := range roots {
- progressMeter.Inc()
if refRoot, ok := root.(ReferenceRoot); ok {
+ progressMeter.Inc()
graph.RegisterReference(refRoot.Reference(), refRoot.Groups())
}
+ }
+ progressMeter.Done()
+ progressMeter.Start("Processing roots: %d")
+ for _, root := range roots {
if root.Walk() {
+ progressMeter.Inc()
graph.pathResolver.RecordName(root.Name(), root.OID())
}
}
for _, arg := range flags.Args() { | ||
oid, err := repo.ResolveObject(arg) | ||
// If no reference filters and no explicit roots were provided | ||
if git.IsNoReferencesFilter(rgb.GetTopLevelGroup().GetFilter()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of revealing internal details via both RefGroupBuilder.GetTopLevelGroup()
and refGroup.GetFilter()
plus needing the strange function IsNoReferencesFilter()
, how about adding a special-purpose method to RefGroupBuilder
, something like the following:
// RefoptsSeen returns `true` iff any reference selection options were
// used. It is only valid after `Finish()` has been called.
func (rgb *RefGroupBuilder) RefoptsSeen() bool {
return rgb.topLevelGroup.filter != git.NoReferencesFilter
}
git.NoReferencesFilter
is a singleton, so I think that a simple comparison like this should suffice.
|
||
func IsNoReferencesFilter(val interface{}) bool { | ||
_, ok := val.(noReferencesFilter) | ||
return ok | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RefGroupBuilder.RefoptsSeen()
could make this could go away.
func (rg *refGroup) GetFilter() git.ReferenceFilter { | ||
return rg.filter | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RefGroupBuilder.RefoptsSeen()
could make this could go away.
func (rgb *RefGroupBuilder) GetTopLevelGroup() *refGroup { | ||
return rgb.topLevelGroup | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RefGroupBuilder.RefoptsSeen()
could make this could go away.
var refCountSection section | ||
if !opts.WithoutReferenceCount { | ||
refCountSection = S( | ||
"References", | ||
I("referenceCount", "Count", | ||
"The total number of references", | ||
nil, s.ReferenceCount, metric, "", 25e3), | ||
S( | ||
"", | ||
rgis..., | ||
), | ||
) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach seems to depend on the fact that even though the uninitialized section
is inserted into the table, it doesn't output anything. I think that's correct ✔️
(To be clear, the progress meter was already inaccurate before this PR. It just wasn't so obvious, because the references were being scanned and counted as well.) |
elhmn-limit-tree-traversal-to-refs-passed-by-arguments |
Now that we can provide explicit roots to git-sizer, we don't want to look for other references than the one provided.