Improve Copyright Detection #3929

Merged
merged 6 commits into from
Oct 4, 2024

AyanSinhaMahapatra
Contributor

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely named feature branch and have no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Add a new matcher_order attribute to LicenseMatch and use it for sorting
matches rather than the matcher string.
This way we can ensure that there is a proper precedence between
matchers when two matches cover exactly the same text.

The new sort order for matchers is:
- 0: 1-hash
- 1: 2-aho
- 2: 1-spdx-id
- 3: 3-seq
- 4: 5-undetected
- 5: 5-aho-frag
- 6: 6-unknown

The outcome is that a hash or aho match for the same text at the same
position will take precedence over the SPDX id match, making it possible
to curate and correct some incorrect license expressions if needed.
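
As a rough illustration of the precedence described above, here is a minimal sketch. It assumes a simplified `Match` class and a `MATCHER_ORDER` table, both hypothetical stand-ins invented for this example and not the actual `LicenseMatch` implementation. It only shows why sorting on an integer order differs from sorting on the matcher string.

```python
from dataclasses import dataclass

# Hypothetical precedence table mirroring the sort order listed above.
MATCHER_ORDER = {
    "1-hash": 0,
    "2-aho": 1,
    "1-spdx-id": 2,
    "3-seq": 3,
    "5-undetected": 4,
    "5-aho-frag": 5,
    "6-unknown": 6,
}


@dataclass(frozen=True)
class Match:
    """Simplified stand-in for a license match (not ScanCode's LicenseMatch)."""
    license_expression: str
    matcher: str
    start: int
    end: int

    @property
    def matcher_order(self) -> int:
        # Integer precedence: lower values sort first.
        return MATCHER_ORDER[self.matcher]


def sort_matches(matches):
    """Sort by position, then by matcher precedence for identical spans."""
    return sorted(matches, key=lambda m: (m.start, m.end, m.matcher_order))


def sort_matches_by_string(matches):
    """Sort on the matcher string instead, for comparison."""
    return sorted(matches, key=lambda m: (m.start, m.end, m.matcher))


# Two matches over exactly the same text span.
spdx = Match("gpl-2.0", "1-spdx-id", start=0, end=10)
aho = Match("gpl-2.0-plus", "2-aho", start=0, end=10)

by_order = sort_matches([spdx, aho])
by_string = sort_matches_by_string([spdx, aho])
```

With the string key, `"1-spdx-id"` sorts ahead of `"2-aho"` because of its leading `1`. With the integer key, the aho match comes first, which is the precedence the commit message describes.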

Reference: #3912
Reported-by: Ayan Sinha Mahapatra <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
And also improve other copyright detections

Signed-off-by: Philippe Ombredanne <[email protected]>
Enable CREDITs detection in main authors loop
And detect more copyrights

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne
Contributor

The Azure CI is flaky and randomly fails with:

E               ERROR: Unknown error:
E               Traceback (most recent call last):
E                 File "/home/vsts/work/1/s/src/scancode/interrupt.py", line 89, in interruptible
E                   create_signal(SIGALRM, handler)
E                 File "/opt/hostedtoolcache/Python/3.9.20/x64/lib/python3.9/signal.py", line 56, in signal
E                   handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
E               ValueError: signal only works in main thread of the main interpreter

This is a heisenbug that occurs only on Azure, so I am going to merge anyway.
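
For context, the `ValueError` in the traceback reflects a general Python constraint: `signal.signal()` can only be called from the main thread. The sketch below reproduces that constraint in isolation. It is not the code from `interrupt.py`, the helper names are made up for illustration, and it makes no claim about why the Azure test runs end up outside the main thread.

```python
import signal
import threading


def try_install_handler(results):
    """Attempt to install a signal handler and record the outcome."""
    try:
        # SIGINT is used here for portability; the traceback above used SIGALRM.
        signal.signal(signal.SIGINT, signal.default_int_handler)
        results.append("installed")
    except ValueError as e:
        # Raised when called from any thread other than the main thread.
        results.append(str(e))


def run_in_worker_thread():
    """Run the attempt in a non-main thread and return what happened."""
    results = []
    worker = threading.Thread(target=try_install_handler, args=(results,))
    worker.start()
    worker.join()
    return results


outcome = run_in_worker_thread()
```

One common guard, assuming a fallback that does not rely on signals is available, is to check `threading.current_thread() is threading.main_thread()` before installing a handler.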

@pombredanne pombredanne merged commit 7d0d91a into develop Oct 4, 2024
37 of 39 checks passed
@pombredanne pombredanne deleted the misc-copyrights2 branch October 4, 2024 08:59
2 participants