Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve copyright detection #3910

Merged
merged 45 commits into from
Sep 12, 2024
Merged

Improve copyright detection #3910

merged 45 commits into from
Sep 12, 2024

Conversation

pombredanne
Copy link
Contributor

@pombredanne pombredanne commented Sep 6, 2024

This PR improves many areas of copyright detection for correctness and false positive.
In particular:

  • The way we strip text from markup tags has been entirely reworked, replacing an obscure and untested regex with a simpler splitter regex and a set of known tags to strip.
  • Handling of various tag-like strings such as <https://some.url.com> and emails has been modified and improved
  • A large number of issues and false positive have been fixed, in particular by re-scanning Linux.
  • Many new tests have been added along the way
  • The code to select candidate lines and prepare text lines has been streamlined and simplified

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Do not use specific approach.

Signed-off-by: Philippe Ombredanne <[email protected]>
Some of these need love to fix. They are under to_improve

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne and others added 16 commits September 7, 2024 08:39
Adapted trailing is no longer incorrectly detected

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Empty lines can stop a notice continuity.

Signed-off-by: Philippe Ombredanne <[email protected]>
Empty lines can stop a notice continuity.

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Somehow we were NOT running tests tests at all. We now do.
Also ensure that <Red as in <Red Hat Inc> is not treated as markup.

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
This is testing encoding which is tested elsewhere. We care only
about markup here

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
They were not used in any tests and serve no purpose

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
We had code in tow places. Now all is combined in one place.
The code has further been streamlined and refactored for clarity
and simplicity, in combination with copyright cndaidate selection.

Also amend tests as needed.

Signed-off-by: Philippe Ombredanne <[email protected]>
Rather than using complex regexes, we now use a simple set of known
tagsto strip from markup.

Signed-off-by: Philippe Ombredanne <[email protected]>
* Use new improved markup cleaner
* Refine detection for corner cases
* Update test accordingly

Signed-off-by: Philippe Ombredanne <[email protected]>
* Discard some placeholder tags such as <year>
* Remove specific code for Debian tags
* Streamline code for clarity

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Strip leading copyright signs from holders

Signed-off-by: Philippe Ombredanne <[email protected]>
Handle more edge cases with new preparation

Signed-off-by: Philippe Ombredanne <[email protected]>
We now drop leading <> around a URL

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Handle more edge cases

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne pombredanne changed the title Misc copyrights Improve copyright detection Sep 11, 2024
Do not lower tags first

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne
Copy link
Contributor Author

@AyanSinhaMahapatra ready to merge

Copy link
Contributor

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pombredanne ! Merging this

@AyanSinhaMahapatra AyanSinhaMahapatra merged commit 6e756c4 into develop Sep 12, 2024
33 checks passed
@AyanSinhaMahapatra AyanSinhaMahapatra deleted the misc-copyrights branch September 12, 2024 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants