Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider making "skip git" the default in v2 #1277

Open
G-Rath opened this issue Sep 25, 2024 · 4 comments · May be fixed by #1311
Open

Consider making "skip git" the default in v2 #1277

G-Rath opened this issue Sep 25, 2024 · 4 comments · May be fixed by #1311
Assignees
Labels
V2 Wishlist Enhancements that require a breaking change

Comments

@G-Rath
Copy link
Collaborator

G-Rath commented Sep 25, 2024

(I've been meaning to raise this to discuss for a few months and we're getting closer to v2 so I'm making a public issue to force my own hand 😄)

Currently when I run the scanner without any config it will include scanning the commit if it sees its in a git repository which generally it always will be for me as everything I do that involves code and dependencies uses git for source control, however most of the time these git repositories will not be public because they're proprietary applications meaning there will never a result of scanning the commit and arguably that's a bit of private data being sent to the API (which unlike e.g. the dependency tree, will never give a positive result).

My understanding is that generally there are two main criteria for git scanning to be useful:

  1. the codebase is public (meaning its commits can be indexed, and whatnot)
  2. the codebase is using a very "commit driven" language like C++ (i.e. while possible for other ecosystems to have commits related to vulnerabilities, it's far more likely in just about every ecosystem that version numbers/semver will be being used instead)

Given the number of ecosystems the scanner supports and frankly just how much better/easier non-git based vuln info is to handle right now, I strongly suspect more than half of the uses of the scanner are in a context where at least one of those conditions are not true meaning this current default is not useful for most people and arguably a little negative (though I admit it's not a huge privacy concern).

Personally I'd prefer if this was opt-in (i.e. --check-git or --include-git), though maybe there's another way to try make this more automatic - for example, maybe it would make sense for the scanner to fallback to checking git if it doesn't find any other lockfiles, or to make this managed through a config property so codebases could opt-in (i.e. open source repos could create an osv-scanner.toml with a marker effective telling the scanner "hey this is a public repo so feel free to be more 'aggressive' in what you check")

@G-Rath G-Rath added the V2 Wishlist Enhancements that require a breaking change label Sep 25, 2024
@G-Rath
Copy link
Collaborator Author

G-Rath commented Oct 8, 2024

Discussed offline and agreed that we should make it disabled by default

@G-Rath G-Rath self-assigned this Oct 8, 2024
@oliverchang
Copy link
Collaborator

This generally makes sense to me that we shouldn't need to scan the git hash of projects generally. However, we do want C/C++ scanning to work out of the box for third party / open source dependencies.

Should we look at still accounting for git repos if they live inside one of these directories by default?

vendoredLibNames = map[string]struct{}{

@G-Rath
Copy link
Collaborator Author

G-Rath commented Oct 10, 2024

@oliverchang it looks like we're already doing that - if I'm understanding the code right, vendor scanning is about finding git commit shas, and is currently in no way tied to the --skip-git flag ok no it looks like the current logic is to look for C/C++ files and determine a hash, so we'd want to extend that logic to also check if there's an initialized .git directory and if so include the sha.

On an aside, that list includes vendor which is a common directory used by bundler/rails, which was another thing I was going to look into after learning about it the other day (as i.e right now on our Rails apps the scanner keeps finding those as an empty directory and then complaining that there's nothing in them), but overall I think it's a lot fair for the scanner to be automatically looking into a directory with that name looking for .git directory

@another-rex
Copy link
Collaborator

Agree with not enabling scan the base git repository by default, though I think scanning submodules git hashes, and hashes under vendored lib names is still valuable to enable by default?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
V2 Wishlist Enhancements that require a breaking change
Projects
None yet
3 participants