Add new Apache or MIT license rule #3738 #3750

Merged
merged 3 commits into from
Jun 10, 2024

vasily-pozdnyakov commented Apr 26, 2024

Fixes #3738

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Commits are in a uniquely named feature branch and have no merge conflicts 📁

Signed-off-by: Vasily Pozdnyakov <[email protected]>

vasily-pozdnyakov commented Apr 26, 2024

Hi, I am new to creating rules and have a couple of questions:

  1. What does scancode-reindex-licenses do? It does not introduce any visible changes (what validation does is clear, but what about reindexing?).
  2. Why are most of the rules weak (without "{{}}")? If I understand correctly, this might produce a lot of false positives. For example, if a text matches an MIT license notice but names a different license (Apache), scancode might detect it incorrectly. Is that right?

AyanSinhaMahapatra commented

@vasily-pozdnyakov welcome and thanks for the PR!

> What does scancode-reindex-licenses do? It does not introduce any visible changes (validation is clear, what about reindex?).

When you get SCTK from a git checkout and install scancode to run locally (or get scancode from GitHub releases or via pip install), the license index lives at scancode-toolkit/src/licensedcode/data/cache/license_index/. This index is what license detection uses when you run scancode, and it comes pre-built if you download a release or install via pip. If you update or add rules or licenses in scancode, run scancode-reindex-licenses so that the next time you run a scan locally these rules are in the index and are used in license detection. Reindexing does not change anything in the repository of rules.
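To illustrate why a new rule has no effect until the index is rebuilt, here is a toy sketch of a cached index. It is not ScanCode's actual index code: `build_index`, `load_index`, the JSON cache file, and the rule file names are all hypothetical and exist only for this illustration.

```python
import json
import tempfile
from pathlib import Path


def build_index(rules_dir: Path) -> dict:
    """Map each rule file name to the lowercased tokens of its text."""
    return {
        path.name: path.read_text().lower().split()
        for path in sorted(rules_dir.glob("*.RULE"))
    }


def load_index(rules_dir: Path, cache_file: Path, force: bool = False) -> dict:
    """Return the cached index; rebuild only when it is missing or forced."""
    if cache_file.exists() and not force:
        return json.loads(cache_file.read_text())
    index = build_index(rules_dir)
    cache_file.write_text(json.dumps(index))
    return index


def demo() -> tuple:
    """Show that a rule added after caching is invisible until a rebuild."""
    with tempfile.TemporaryDirectory() as tmp:
        rules_dir = Path(tmp) / "rules"
        rules_dir.mkdir()
        cache_file = Path(tmp) / "index.json"

        # Hypothetical existing rule; the first load builds and caches the index.
        (rules_dir / "mit_1.RULE").write_text("licensed under the MIT license")
        before = set(load_index(rules_dir, cache_file))

        # Add a new (hypothetical) rule file without rebuilding the cache.
        (rules_dir / "new_rule.RULE").write_text("licensed under either of Apache or MIT")
        stale = set(load_index(rules_dir, cache_file))

        # Forcing a rebuild plays the role of a reindex step in this toy model.
        fresh = set(load_index(rules_dir, cache_file, force=True))
        return before, stale, fresh


if __name__ == "__main__":
    print(demo())
```

In this toy model the rule files themselves are never modified by the rebuild; only the cache changes, which mirrors the point that reindexing leaves the repository of rules untouched.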

> Why most of the rules are weak (without "{{}}")? If I understand correctly, it might bring a lot of false positives (let's say, there is a matching license notice for MIT, but with a different license name (Apache) - the scancode might detect it incorrectly), is it like that?

That's mostly correct: this can, and in many cases does, generate a lot of false positives. See #2878

We are working on resolving this. I have a pending PR at #3254 that tests adding required phrases automatically and at scale across all the rules, drawing on:

  1. license names and other license keywords
  2. other instances of these required phrases already added in other rules.

This would improve accuracy significantly for these rules, and would also ensure that when we add a required phrase, the same phrase is marked wherever it appears in all rules. There is just a bit of work remaining on testing this and verifying that it works correctly across all the rules. With this in place, adding required phrases can be continuous; otherwise it would be a massive one-time effort.
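The false-positive concern and the effect of a "{{}}" required phrase can be sketched with a toy matcher. This is only an illustration under simplifying assumptions, not ScanCode's matching algorithm: the token-coverage threshold, the helper names (`tokens`, `parse_rule`, `contains`, `matches`), and the example rule texts are all made up for this sketch.

```python
import re

# Text wrapped in {{ }} is treated as a required phrase in this toy model.
REQUIRED = re.compile(r"\{\{(.+?)\}\}")


def tokens(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_rule(rule_text: str) -> tuple:
    """Return (rule tokens without markers, required phrases as token lists)."""
    required = [tokens(phrase) for phrase in REQUIRED.findall(rule_text)]
    plain = tokens(rule_text.replace("{{", "").replace("}}", ""))
    return plain, required


def contains(haystack: list, needle: list) -> bool:
    """True if needle occurs as a contiguous run of tokens in haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def matches(rule_text: str, candidate: str, min_coverage: float = 0.8) -> bool:
    """Approximate match: enough rule tokens present AND all required phrases present."""
    plain, required = parse_rule(rule_text)
    cand = tokens(candidate)
    coverage = sum(1 for tok in plain if tok in cand) / len(plain)
    if coverage < min_coverage:
        return False
    return all(contains(cand, phrase) for phrase in required)


WEAK_RULE = "This project is licensed under the MIT license"
STRONG_RULE = "This project is licensed under the {{MIT license}}"
APACHE_NOTICE = "This project is licensed under the Apache license"
MIT_NOTICE = "This project is licensed under the MIT license"

if __name__ == "__main__":
    print(matches(WEAK_RULE, APACHE_NOTICE))    # True: false positive, 7 of 8 tokens match
    print(matches(STRONG_RULE, APACHE_NOTICE))  # False: required phrase is missing
    print(matches(STRONG_RULE, MIT_NOTICE))     # True: required phrase is present
```

In this sketch the weak rule accepts the Apache notice because only one token differs, while the rule with a required phrase rejects it.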

Signed-off-by: Vasily Pozdnyakov <[email protected]>

pombredanne left a comment

Thanks. Just a minor but important nit with respect to the license expression. The norm for MIT and Apache in the Rust ecosystem is to be a choice, and buck2 and this new rule are no exception.
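The practical difference between a choice ("OR") and a conjunction ("AND") can be shown with a small sketch. It handles only flat two-operand expressions, the `is_satisfied` helper is hypothetical, and the keys "mit" and "apache-2.0" are used purely for illustration; it is not how ScanCode evaluates license expressions.

```python
def is_satisfied(expression: str, acceptable: set) -> bool:
    """Evaluate a flat 'a OR b' or 'a AND b' expression against acceptable license keys.

    With OR, accepting any one license is enough (the recipient chooses).
    With AND, every listed license must be acceptable.
    """
    if " OR " in expression:
        return any(key.strip() in acceptable for key in expression.split(" OR "))
    return all(key.strip() in acceptable for key in expression.split(" AND "))


if __name__ == "__main__":
    only_mit = {"mit"}
    print(is_satisfied("mit OR apache-2.0", only_mit))   # True
    print(is_satisfied("mit AND apache-2.0", only_mit))  # False
```

A recipient who can only accept MIT terms is fine under the choice but not under the conjunction, which is why the distinction in the rule's license expression matters.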

We could also later add more rules for derived notices such as https://github.com/facebook/buck2?tab=readme-ov-file#license

Review thread on src/licensedcode/data/rules/mit_and_apache-2.0_10.RULE (outdated, resolved)
pombredanne changed the title from "Add new rule to fix #3738" to "Add new Apache or MIT license rule #3738" on Jun 8, 2024
AyanSinhaMahapatra commented

Thanks++ @vasily-pozdnyakov, merging!

AyanSinhaMahapatra merged commit b2968dc into aboutcode-org:develop on Jun 10, 2024
32 checks passed
Development

Successfully merging this pull request may close these issues.

Surprising false positives of mit_or_gpl-3.0_17.RULE