Enumerate GitHub wikis concurrently #2371
Comments
I'm leaning towards option 2 ("just use another goroutine" always worries me a little), but I'll let others weigh in. While we wait for that, though, what's the latency penalty of the code as it is? Is it bad enough that we should back it out while we decide?

Give it a try yourself and let me know. I live in a century-old building with slow Internet, plus WSL's networking stack constantly falls over itself. Perhaps my results are a confluence of bad luck and not reflective of a typical experience.

I just checked myself, and you're right, that performance cost is drastic. I'm going to open a revert PR to back it out until we figure out how we want to solve this. EDIT: we could also just default the flag to off, right?

This was fixed in #2379, albeit with the flag still disabled by default.
Description
PR #2233 added the ability to scan GitHub wikis by default, as wikis are just repos and scanning them has no rate limit. Regrettably, I overlooked how dispatching a request per repo can increase startup time. This does not have a noticeable effect when scanning individual repos or smaller orgs; however, it has a noticeable impact for larger orgs.
(See trufflehog/pkg/sources/github/repo.go, lines 238 to 240 in 27b30e6.)
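The referenced snippet is the per-repo existence check. As context, here is a minimal sketch of what such a check might look like, relying on the fact that GitHub serves a repo's wiki as a separate git remote at `<repo>.wiki.git`; the function name, signature, and the HEAD-against-`info/refs` probe are illustrative assumptions, not the actual trufflehog code:

```go
package github

import (
	"context"
	"net/http"
	"strings"
)

// wikiExists probes whether a repo's wiki is actually clonable. GitHub
// exposes a wiki as a separate git remote at <repo>.wiki.git, so a HEAD
// request against its smart-HTTP endpoint answers the question without
// cloning anything.
func wikiExists(ctx context.Context, client *http.Client, cloneURL string) bool {
	wikiURL := strings.TrimSuffix(cloneURL, ".git") + ".wiki.git"
	req, err := http.NewRequestWithContext(ctx, http.MethodHead,
		wikiURL+"/info/refs?service=git-upload-pack", nil)
	if err != nil {
		return false
	}
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}
```

The cost comes from issuing one such request per repo, serially, during enumeration.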
Without this check, enumeration takes ~3s per page (I have terrible Internet). With this check, enumeration takes closer to ~15s per page.
Preferred Solution
Two potential solutions come to mind:
(See trufflehog/pkg/sources/github/github.go, lines 804 to 815 in 27b30e6.)
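From the discussion above, option 2 is to run the checks in another goroutine. A minimal sketch of bounded-concurrency wiki checks, reusing the hypothetical `wikiExists` helper from the earlier sketch (the function name and the worker cap of 10 are arbitrary assumptions, not the final design):

```go
package github

import (
	"context"
	"net/http"
	"strings"
	"sync"
)

// checkWikisConcurrently runs the per-repo wiki checks in parallel with a
// bounded number of in-flight requests, so the extra round trips overlap
// instead of stacking serially across a page of repos.
func checkWikisConcurrently(ctx context.Context, client *http.Client, cloneURLs []string) []string {
	var (
		mu  sync.Mutex
		out []string
		wg  sync.WaitGroup
	)
	sem := make(chan struct{}, 10) // arbitrary cap on concurrent requests
	for _, cloneURL := range cloneURLs {
		cloneURL := cloneURL // capture per-iteration (pre-Go 1.22 semantics)
		wg.Add(1)
		sem <- struct{}{}
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			if wikiExists(ctx, client, cloneURL) { // helper from the sketch above
				mu.Lock()
				out = append(out, strings.TrimSuffix(cloneURL, ".git")+".wiki.git")
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return out
}
```

In principle, the extra round trips then overlap with one another instead of stacking, keeping per-page latency close to the no-check baseline.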
A third possibility would be to simply try to clone the wiki for any repo that has `"has_wiki": true` and ignore clone errors. (The theory behind the existing check is that a HEAD request would be more efficient than attempting to clone a repo that turns out to have no wiki.)
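A hedged sketch of that third option, using go-github's `Repository` accessors (the import version and the `cloneAndScan` helper are assumptions, not trufflehog's actual API):

```go
package github

import (
	"context"
	"log"
	"strings"

	"github.com/google/go-github/v57/github"
)

// scanWikiIfClaimed attempts the wiki clone whenever the API claims a wiki
// exists, treating a failed clone as expected noise: has_wiki can be true
// for repos that have no wiki content. cloneAndScan is a hypothetical
// stand-in for the source's clone-and-scan path.
func scanWikiIfClaimed(ctx context.Context, repo *github.Repository,
	cloneAndScan func(context.Context, string) error) {
	if !repo.GetHasWiki() {
		return
	}
	wikiURL := strings.TrimSuffix(repo.GetCloneURL(), ".git") + ".wiki.git"
	if err := cloneAndScan(ctx, wikiURL); err != nil {
		log.Printf("skipping wiki %s: %v", wikiURL, err)
	}
}
```

The trade-off is one wasted clone attempt per false positive instead of one extra network request per repo.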
Additional Context
A network request is necessary because GitHub's `has_wiki` property lies: it can report true for repos that have no wiki content. See #2233 (comment).
References
#2233